Article

The Entropy Gain of Linear Systems and Some of Its Implications

1 Department of Electronic Engineering, Universidad Técnica Federico Santa María, Av. España 1680, Valparaíso 2390123, Chile
2 Department of Electronic Systems, Aalborg University, 9220 Aalborg, Denmark
* Authors to whom correspondence should be addressed.
Entropy 2021, 23(8), 947; https://doi.org/10.3390/e23080947
Submission received: 23 May 2021 / Revised: 12 July 2021 / Accepted: 20 July 2021 / Published: 24 July 2021
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract

We study the increase in per-sample differential entropy rate of random sequences and processes after being passed through a non-minimum-phase (NMP) discrete-time, linear time-invariant (LTI) filter G. For LTI discrete-time filters and random processes, it has long been established by Theorem 14 in Shannon's seminal paper that this entropy gain, $\mathcal{G}(G)$, equals the integral of $\log|G(e^{j\omega})|$. In this note, we first show that Shannon's Theorem 14 does not hold in general. Then, we prove that, when comparing the input differential entropy to that of the entire (longer) output of G, the entropy gain equals $\mathcal{G}(G)$. We show that the entropy gain between equal-length input and output sequences is upper bounded by $\mathcal{G}(G)$ and arises if and only if there exists an output additive disturbance with finite differential entropy (no matter how small) or a random initial state. Unlike what happens with linear maps, the entropy gain in this case depends on the distribution of all the signals involved. We illustrate some of the consequences of these results by presenting their implications in three different problems. Specifically: conditions for equality in an information inequality of importance in networked control problems; extending to a much broader class of sources the existing results on the rate-distortion function for non-stationary Gaussian sources; and an observation on the capacity of auto-regressive Gaussian channels with feedback.

1. Introduction

We study the difference between the differential entropy rate of a random process $u_1^\infty = \{u_1, u_2, \ldots\}$ entering a discrete-time linear time-invariant (LTI) system G and the differential entropy rate of its (possibly noisy) output $y_1^\infty$, as depicted in Figure 1.
Recall that the differential entropy rate of a random process $x_1^\infty$ is given by $\bar{h}(x_1^\infty) \triangleq \lim_{n\to\infty} n^{-1} h(x_1, x_2, \ldots, x_n)$, provided the limit exists, where $h(x_1,\ldots,x_n) = -\mathbb{E}\log f(x_1,\ldots,x_n)$ is the differential entropy of the ensemble $x_1,\ldots,x_n$ with probability density function (PDF) f [1]. The system G is supposed to satisfy the following:
Assumption 1.
The LTI system G in Figure 1 is causal and stable and such that
1.
G has a rational p-th order transfer function $G(z)$ with m zeros $\{\rho_i\}_{i=1}^{m}$ outside the unit circle, i.e., non-minimum-phase (NMP) zeros, where $m \in \{0, 1, \ldots, p\}$, indexed in non-increasing magnitude order, i.e., $|\rho_1| \geq |\rho_2| \geq \cdots \geq |\rho_m| > 1$.
2.
The unit-impulse response of G, say, $g_0, g_1, \ldots$, satisfies $|g_0| = 1$.
In this general setup, G may have a random initial state vector $\mathbf{x}_0 \in \mathbb{R}^p$, $p \in \mathbb{N}$, and a real-valued random output disturbance $z_1^\infty$. Our main purpose is to characterize the limit
$\mathcal{G}(G, \mathbf{x}_0, u_1^\infty, z_1^\infty) \triangleq \lim_{n\to\infty} \frac{1}{n}\big[h(y_1^n) - h(u_1^n)\big]$,   (1)
evaluating the possible effect produced by $\mathbf{x}_0$ and $z_1^\infty$. This difference can be interpreted as the entropy gain (entropy amplification or entropy boost) introduced by the filter G and (as apparent from the other variables in the argument of $\mathcal{G}$) the statistics of $\mathbf{x}_0$, $u_1^\infty$, $z_1^\infty$. We shall refer to the special case in which $\mathbf{x}_0$ and $z_1^\infty$ are both zero (or deterministic) as the noise-less case, and write $\mathcal{G}(G, 0, u_1^\infty, 0)$ accordingly.
The earliest reference related to this problem corresponds to a noise-less continuous-time counterpart considered by Shannon. In his seminal 1948 paper [2], Shannon gave a formula for the change in differential entropy per degree of freedom that a continuous-time random process $u_c$, band-limited to a frequency range $[0, B)$ (in Hz), experiences after passing through an LTI continuous-time filter $G_c$ (without considering a random initial state or an output disturbance). Such entropy per degree of freedom is defined in terms of uniformly taken samples as
$\bar{h}(u_c) \triangleq \lim_{n\to\infty} \frac{1}{n}\, h\big(u_c(T), u_c(2T), \ldots, u_c(nT)\big)$,   (2)
with $T \triangleq 1/(2B)$. In this formula, if the LTI filter has frequency response $G_c(\xi)$ (with $\xi$ in Hz), then the resulting differential entropy rate of the output process $y_c$ is given by the following theorem:
Theorem 1
(Reference [2], Theorem 14). If an ensemble having an entropy $\bar{h}(u_c)$ per degree of freedom in band B is passed through a filter with characteristic $G_c(\xi)$, the output ensemble has an entropy
$\bar{h}(y_c) = \bar{h}(u_c) + \frac{2}{B}\int_0^B \log\big|G_c(\xi)\big|\, d\xi$.   (3)
Shannon arrived at (3) by arguing that an LTI filter can be seen as a linear operator that selectively scales its input signal along infinitely many frequencies, each of them representing an orthogonal component of the source. He then obtained the result by writing down the determinant of the Jacobian of this operator as the product of the squared frequency response magnitude of the filter over n frequency bands, applying logarithm, dividing by n, and then taking the limit as n tends to infinity.
Remark 1.
There is a factor of two in excess in the integral on the right-hand side (RHS) of (3). To see this, consider a filter with a constant gain a over $[0, B)$ (i.e., a simple multiplicative factor). In such a case, the entropy rate of $y_c$ should exceed that of $u_c$ by $\log a$ [1]. However, (3) yields an entropy gain equal to $2\log a$. This error arises because the determinant of the Jacobian of the transformation is actually the product of $|G_c|$ over the n frequency bands considered in Shannon's argument. Such excess factor of two is also present in the entropy losses appearing in Reference [2], Table 1.
Theorem 14 in Reference [2] has found application in works ranging from traditional themes, such as linear prediction [3] and source coding [4], to molecular communication systems [5,6].
The available literature treating the phenomenon itself of the entropy gain (loss, boost, or amplification) induced by LTI systems seems to be rather scarce. This is not surprising given that (3) was published in Reference [2], Theorem 14, the work which gave birth to Information Theory.
The next publication addressing this problem is Reference [7], which follows a time-domain analysis for the corresponding discrete-time problem. In this approach, one can obtain $y_1^n \triangleq \{y(1), y(2), \ldots, y(n)\}$ as a function of $u_1^n$, for every $n \in \mathbb{N}$, and evaluate the difference between the limits $\bar{h}(y_1^\infty)$ and $\bar{h}(u_1^\infty)$, obtained by letting $n \to \infty$. More precisely, for an LTI discrete-time filter G with impulse response $g_0^\infty = \{g_0, g_1, \ldots\}$, we can write
$y_n^1 = \underbrace{\begin{bmatrix} g_0 & 0 & \cdots & 0 \\ g_1 & g_0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ g_{n-1} & g_{n-2} & \cdots & g_0 \end{bmatrix}}_{\mathbf{G}_n}\, u_n^1$,   (4)
where we adopt the notation $y_n^1$ for column vectors to avoid the abuse of notation incurred by treating the sequence $y_1^n$ as a vector, and because, by writing $y_n^1$, it is easier to remember that its samples are ordered from top to bottom; here $y_n^1 \triangleq [y(1)\ y(2)\ \cdots\ y(n)]^T$, and the random vector $u_n^1$ is defined likewise. From this, it is clear (see, e.g., the corollary after Theorem 8.6.4 in Reference [1]) that
$h(y_n^1) = h(u_n^1) + \log\big|\det(\mathbf{G}_n)\big|$,   (5)
where $\det(\mathbf{G}_n)$ (or simply $\det\mathbf{G}_n$) stands for the determinant of $\mathbf{G}_n$. This result is utilized in Reference [7] to show that no entropy gain is produced by a stable minimum-phase LTI system G if and only if the first sample of its impulse response has unit magnitude.
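As a quick numerical illustration of (5) (added here; the filter is an assumed example, not one from the paper), the following Python sketch builds the lower-triangular Toeplitz matrix $\mathbf{G}_n$ of (4) for a filter with $g_0 = 1$ and an NMP zero, and confirms that $\log|\det(\mathbf{G}_n)|/n = \log|g_0| = 0$ for every n, despite the NMP zero.

```python
import numpy as np

# Assumed example filter: g = (1, -2), i.e., G(z) = 1 - 2 z^{-1}, with an NMP zero at z = 2.
g = np.array([1.0, -2.0])
n = 8

# Lower-triangular Toeplitz matrix G_n of (4): [G_n]_{i,j} = g_{i-j} for i >= j.
Gn = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1):
        if i - j < len(g):
            Gn[i, j] = g[i - j]

# Since G_n is lower triangular with g_0 on its diagonal, det(G_n) = g_0^n = 1,
# so the per-sample gain log|det(G_n)|/n in (5) is log|g_0| = 0 for every n.
print(np.log(np.abs(np.linalg.det(Gn))) / n)   # 0.0
```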
In Reference [8], p. 568, the entropy gain of a discrete-time LTI system G (the noise-less version of the setup depicted in Figure 1) is found to be
$\bar{h}(y_1^\infty) = \bar{h}(u_1^\infty) + \frac{1}{2\pi}\int_{-\pi}^{\pi} \log\big|G(e^{j\omega})\big|\, d\omega$,   (6)
where $y_1^\infty$ is the filter's discrete-time output process (without the effect of a random initial state or an output disturbance) and
$\bar{h}(y_1^\infty) \triangleq \lim_{n\to\infty} \frac{1}{n}\, h(y_1^n)$.   (7)
This result was obtained starting from the fact that, for a Gaussian stationary process $u_1^\infty$ with power spectral density (PSD) $S_u(e^{j\omega})$, $\bar{h}(u_1^\infty) = \frac{1}{2}\log(2\pi e) + \frac{1}{4\pi}\int_{-\pi}^{\pi}\log S_u(e^{j\omega})\, d\omega$. If $u_1^\infty$ enters a discrete-time LTI system with frequency response $G(e^{j\omega})$, then the PSD of its output $y_1^\infty$ is $S_y(e^{j\omega}) = S_u(e^{j\omega})\,|G(e^{j\omega})|^2$; thus, it is argued that (6) follows for Gaussian stationary inputs. Then, Reference [8] extends the result to non-Gaussian inputs with a proof sketch which uses a time-domain relation, like (4), to point out that the filter is a linear operator and, as such, the differential entropy of its output exceeds that of its input by a quantity that is independent of the input distribution. (It is worth noting that (6) is the discrete-time equivalent of (3) (without its erroneous factor of 2), which follows directly from the correspondence between sampled band-limited continuous-time systems and discrete-time systems.)
It is in Reference [9], Section II-C, where, for the first time, it is shown that, for a stationary Gaussian input $u_1^\infty$, the full entropy gain predicted by (6) takes place if the system output $y_1^\infty$ is contaminated by an additive output disturbance of length p and positive-definite covariance matrix, where p is the order of $G(z)$.
The integral $\frac{1}{2\pi}\int_{-\pi}^{\pi}\log|G(e^{j\omega})|\,d\omega$ can be related to the structure of the filter G. It is well known (from Jensen's formula) that if G has a causal and stable rational transfer function $G(z)$ and an impulse response whose first sample is $g_0 = \lim_{z\to\infty} G(z)$, then
$\frac{1}{2\pi}\int_{-\pi}^{\pi} \log\big|G(e^{j\omega})\big|\, d\omega = \log|g_0| + \sum_{i:\,|\rho_i|>1} \log|\rho_i|$,   (8)
where $\{\rho_i\}$ are the zeros of $G(z)$ (see, e.g., References [10,11]). This provides a straightforward formula to evaluate $\frac{1}{2\pi}\int_{-\pi}^{\pi}\log|G(e^{j\omega})|\,d\omega$ for a given LTI filter with rational transfer function $G(z)$. When combined with (6), this equation also reveals that if the entropy gain $\mathcal{G}(u_1^\infty, y_1^\infty)$ is negative (i.e., if it corresponds to an entropy loss), then $|g_0| < 1$ (with the corresponding change of variables, this is the case in all the examples given by Shannon in Reference [2], Table 1). More importantly, (8) allows us to concentrate, without loss of generality, on LTI systems $G(z)$ whose first impulse-response sample has unit magnitude, as required by Assumption 1. Under the latter condition, (8) shows that the entropy gain is greater than zero if and only if $G(z)$ has zeros outside the unit disk $\mathbb{D} \triangleq \{\rho \in \mathbb{C} : |\rho| \leq 1\}$. A system with the latter property is said to be non-minimum phase (NMP); conversely, a system with all its zeros inside $\mathbb{D}$ is said to be minimum phase (MP) [11].
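As a numeric sanity check of (8) (an illustration added here, with an arbitrarily chosen filter rather than one from the paper), the integral can be approximated by averaging $\log|G(e^{j\omega})|$ over a fine frequency grid and compared with $\log|g_0|$ plus the log-magnitudes of the NMP zeros:

```python
import numpy as np

# Assumed example: G(z) = 1 - 2.5 z^{-1} + z^{-2} = (1 - 2 z^{-1})(1 - 0.5 z^{-1}),
# so g_0 = 1, with one NMP zero at z = 2 and one MP zero at z = 0.5.
g = np.array([1.0, -2.5, 1.0])

w = np.linspace(0, 2 * np.pi, 200_000, endpoint=False)
Gw = np.polyval(g[::-1], np.exp(-1j * w))        # G(e^{jw}) = sum_k g_k e^{-jwk}
lhs = np.mean(np.log(np.abs(Gw)))                # ~ (1/2pi) * integral of log|G(e^{jw})|

rhs = np.log(abs(g[0])) + np.log(2.0)            # log|g_0| + sum over NMP zeros of log|rho_i|
print(lhs, rhs)                                  # both ~ 0.6931 = log 2
```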

1.1. Main Contributions of this Paper

The main contributions of this paper can be summarized as follows:
1. Our first main result is showing that (6) and (3) do not hold for a large class of continuous-time filters and inputs. To see this, notice that
   $|g_0| = 1 \implies |\det(\mathbf{G}_n)| = 1, \quad \forall n \in \mathbb{N}$,   (9)
   which, in view of (5), is equivalent to $h(y_n^1) = h(u_n^1)$, $\forall n \in \mathbb{N}$. In turn, this implies that $\bar{h}(y_1^\infty) - \bar{h}(u_1^\infty) = 0$, regardless of whether $G(z)$ (i.e., the polynomial $g_0 + g_1 z^{-1} + \cdots$) has zeros with magnitude greater than one (choose, for example, $g_0 = 1$, $g_1 = 2$, and $g_k = 0$ for $k \geq 2$). This reveals that (6) holds if and only if $G(z)$ is MP. But (6) and (3) are equivalent (correcting for the excess factor of 2 discussed in Remark 1); thus, Theorem 14 in Reference [2] also does not hold for a class of continuous-time filters. However, the transfer function $G_c(s)$ of a band-limited continuous-time filter $G_c$ is defined only for imaginary values of s (because the bilateral Laplace transform of $\sin(t)/t$ converges only on the imaginary axis), so one cannot classify such filters as MP or NMP. Instead, we consider a class of continuous-time filters limited to the frequencies in the band $[0, B)$, where $B > 0$ is in [Hz], defined by having a unit-impulse response of the form
   $g(t) \triangleq \sum_{k=0}^{\eta} g_k\, \phi_k(t)$,   (10)
   for some absolutely summable sequence of real-valued coefficients $\{g_i\}_{i=0}^{\eta}$, $\eta = 1, 2, \ldots$, where the sinc functions
   $\phi_k(t) \triangleq \dfrac{\sin\big(2\pi B\,[t - k/(2B)]\big)}{\pi\,[t - k/(2B)]}$.   (11)
   Since every such g satisfies $g(k/(2B)) = 0$ for $k < 0$, it makes sense to refer to such filters as "sample-wise causal". For this class of band-limited filters, we show that Theorem 14 holds if and only if the z-transform of $\{g_i\}_{i=0}^{\eta}$ is MP:
Theorem 2.
Suppose $G_c$ is a low-pass continuous-time filter with unit-impulse response as in (10). Let the continuous-time random input of $G_c$ be
$u_c(t) = \frac{1}{2B}\sum_{k=1}^{\infty} u(k)\, \phi_k(t)$,   (12)
for some random sequence $\{u(k)\}_{k=1}^{\infty}$, with $\phi_k$ as in (11), and denote its output as $y_c$. Then,
$\bar{h}(y_c) - \bar{h}(u_c) = \log|g_0| = \log\left|\frac{1}{B}\int_0^B \Re\{G_c(\xi)\}\, d\xi\right| \overset{(a)}{\leq} \frac{1}{B}\int_0^B \log\big|G_c(\xi)\big|\, d\xi$,   (13)
with equality in (a) if and only if the polynomial $g_0 + g_1 z^{-1} + g_2 z^{-2} + \cdots$ has no roots outside the unit circle.
2.
We show that $\frac{1}{2\pi}\int_{-\pi}^{\pi}\log|G(e^{j\omega})|\,d\omega$ actually corresponds to the entropy gain introduced by G when considering the new notion of effective differential entropy rate of $y_1^\infty$ proposed in this paper, defined next.
Definition 1
(The Effective Differential Entropy). Let $\mathbf{y} \in \mathbb{R}^{\ell}$ be a random vector. If $\mathbf{y}$ can be written as a linear transformation $\mathbf{y} = \mathbf{S}\mathbf{u}$, for some $\mathbf{u} \in \mathbb{R}^{n}$ ($n \leq \ell$) with bounded differential entropy, $\mathbf{S} \in \mathbb{R}^{\ell\times n}$, then the effective differential entropy of $\mathbf{y}$ is defined as
$\breve{h}(\mathbf{y}) \triangleq h(\mathbf{A}\mathbf{y})$,   (14)
where $\mathbf{S} = \mathbf{A}^T\mathbf{T}\mathbf{C}$ is an SVD for $\mathbf{S}$, with $\mathbf{T} \in \mathbb{R}^{n\times n}$.
We can now state our second main result, the proof of which is in Appendix A:
Theorem 3.
Let $u_1^\infty$ be the input of an LTI system G with transfer function $G(z)$ without zeros on the unit circle and with an absolutely summable unit-impulse response $\{g_i\}_{i=0}^{\eta-1}$, with $\eta = \infty$ if G has an infinite impulse response. Denote the output of G as $y_1^\infty$. Suppose $h(u_1^n) < \infty$ for every finite n. Then,
$\lim_{n\to\infty} \frac{1}{n}\Big[\breve{h}\big(y_1^{n+\eta}(u_1^n)\big) - h(u_1^n)\Big] = \frac{1}{2\pi}\int_{-\pi}^{\pi} \log\big|G(e^{j\omega})\big|\, d\omega$,   (15)
where $y_1^{n+\eta}(u_1^n)$ denotes the entire response of G to the input $u_1^n$.
Theorem 3 states that, when considering the full-length output of a system, the effective entropy gain is introduced by the system itself.
Section 4 provides a geometrical description of the phenomenon behind Definition 1 and Theorem 3.
3.
We show that $\frac{1}{2\pi}\int_{-\pi}^{\pi}\log|G(e^{j\omega})|\,d\omega$ is a tight upper bound on the entropy gain of G (as defined in (1)) when the output is contaminated by some additional additive signal, such as a random initial state (represented by $\mathbf{x}_0$ in Figure 1) or an output disturbance (such as $z_1^\infty$ in Figure 1) with sufficiently many degrees of freedom (a condition formally stated in Assumption 2 below). Moreover, we show that an entropy gain equal to the latter upper bound can appear even when these disturbances or the random initial state have infinitesimally small variances. To the best of our knowledge, the latter phenomenon has been discussed in the literature first (and only) in Reference [9], Section II-C, for Gaussian stationary inputs and an LTI filter. We go beyond the latter result by explicitly and fully characterizing the entropy gain of LTI systems for a large class of input processes that are not necessarily Gaussian or stationary. We refer to this class as entropy-balanced processes, formally specified in the following definition:
Definition 2.
A random process $\{v(k)\}_{k=1}^{\infty}$ is said to be entropy balanced if the following two conditions are satisfied:
(i) 
Its sample variances $\sigma_{v(n)}^2$ are finite for finite n and
$\lim_{n\to\infty} \frac{1}{n}\log\big(\sigma_{v(n)}^2\big) = 0$.   (16)
(ii) 
For every $\nu \in \mathbb{N}$ and for every sequence of matrices $\{\boldsymbol{\Phi}_n\}_{n=\nu+1}^{\infty}$, $\boldsymbol{\Phi}_n \in \mathbb{R}^{(n-\nu)\times n}$, with orthonormal rows,
$\lim_{n\to\infty} \frac{1}{n}\big[h(\boldsymbol{\Phi}_n v_n^1) - h(v_n^1)\big] = 0$.   (17)
The second condition guarantees that projecting an entropy-balanced process onto any subspace having finitely fewer dimensions yields a process with the same differential entropy rate.
The entropy gain induced by finite-length output disturbances is characterized by our next theorem.
Theorem 4.
In the system of Figure 1, let G satisfy Assumption 1 and suppose that $u_1^\infty$ is entropy balanced. Suppose the random output disturbance $z_1^\infty$ is such that $z(i) = 0$ for all $i > \kappa$, and that $|h(z_1^\kappa)| < \infty$. Let $\bar{\kappa} \triangleq \min\{\kappa, m\}$, where m is the number of NMP zeros of $G(z)$. Then,
$\sum_{i=m-\bar{\kappa}+1}^{m} \log|\rho_i| \;\leq\; \limsup_{n\to\infty} \frac{1}{n}\big(h(y_n^1) - h(u_n^1)\big) \;\leq\; \sum_{i=1}^{\bar{\kappa}} \log|\rho_i| \;\overset{(a)}{\leq}\; \frac{1}{2\pi}\int_{-\pi}^{\pi} \log\big|G(e^{j\omega})\big|\, d\omega$   (18)
with equality in (a) if and only if $\kappa \geq m$.
The proof is presented in Section 6.4, and we provide geometrical insight explaining the phenomenon underlying Definition 2 and Theorem 4 in Section 5.1.
4.
We illustrate the relevance of the results summarized above by applying them to three problems in three areas, namely:
(a)
Networked Control: We show that equality holds in the inequality stated in Reference [12], Lemma 3.2 (a fundamental piece for the performance limitation results further developed in Reference [13]), under very general conditions. In addition, we extend the validity of a related equality for the perfect-feedback case, given by Reference [14], Theorem 14, for Gaussian signals, to the much larger class of entropy-balanced processes.
(b)
The rate-distortion function for non-stationary Gaussian sources: This problem has been previously solved in References [15,16,17]. We provide a simpler proof based upon the results described above. This proof extends the result stated in References [16,17] to a broader class of non-stationary sources.
(c)
Gaussian channel capacity with feedback: We show that capacity results based on using a short random sequence as channel input and relying on a feedback filter which boosts the entropy rate of the end-to-end channel noise (such as the one proposed in Reference [9]), crucially depend upon the complete absence of any additional disturbance anywhere in the system. Specifically, we show that the information rate of such capacity-achieving schemes drops to zero in the presence of any such additional disturbance. As a consequence, the relevance of characterizing the robust (i.e., in the presence of disturbances) feedback capacity of Gaussian channels, which appears to be a fairly unexplored problem, becomes evident.

1.2. Paper Outline

The remainder of this paper begins with some necessary definitions and preliminary results in Section 2. It continues with our detailed exposition in Section 3 of why Shannon's reasoning fails to yield the right expression for the entropy gain. We present an intuitive discussion leading to the definition of effective differential entropy in Section 4, which ends with the proof of Theorem 3. Section 5 gives a geometric interpretation of how an arbitrarily small additive perturbation is able to boost the differential entropy rate of the process coming out of an NMP LTI filter. This exposition helps in understanding, and justifies the introduction of, entropy-balanced random processes, which are also characterized there. Section 6 and Section 7 contain our results for the entropy gain produced by an output disturbance and a random initial state, respectively. Our illustrative application results are presented in Section 8, followed by our conclusions in Section 9. Except when presented right after a statement or in its own section, all proofs are given in Appendix B.

2. Preliminaries

2.1. Notation

The sets of natural, real, and complex numbers are denoted $\mathbb{N}$, $\mathbb{R}$, and $\mathbb{C}$, respectively. For a complex x, $\Re\{x\}$ is the real part of x. For a set $\mathcal{S}$, the indicator function $\mathbb{1}_{\mathcal{S}}(x)$ equals 1 if $x \in \mathcal{S}$ and 0 otherwise. For any LTI system G, the transfer function $G(z)$ corresponds to the z-transform of the impulse response $g_0, g_1, \ldots$, i.e., $G(z) = \sum_{i=0}^{\infty} g_i z^{-i}$. For a transfer function $G(z)$, we denote by $\mathbf{G}_n \in \mathbb{R}^{n\times n}$ the lower-triangular Toeplitz matrix having $[g_0\ \cdots\ g_{n-1}]^T$ as its first column. We write $x_1^n$ as a shorthand for the sequence $\{x_1, \ldots, x_n\}$, and, when convenient, we write $x_1^n$ in vector form as $x_n^1 \triangleq [x_1\ x_2\ \cdots\ x_n]^T$, where $(\cdot)^T$ denotes transposition. Random scalars are denoted using non-italic characters, such as x, and random vectors using non-italic boldface characters, such as $\mathbf{x}$. The notation $x \perp y$ means x and y are independent. If x and z are conditionally independent given y, we write $x \leftrightarrow y \leftrightarrow z$. For matrices, we use upper-case boldface symbols, such as $\mathbf{A}$. We write $\lambda_i(\mathbf{A})$ to denote the i-th eigenvalue of $\mathbf{A}$, sorted in increasing magnitude. If $\mathbf{A} \in \mathbb{C}^{m\times n}$, $\mathbf{A}^H$ is its conjugate transpose, and $\sigma_i(\mathbf{A}) \triangleq \sqrt{\lambda_i(\mathbf{A}^H\mathbf{A})}$ if $m \geq n$, and $\sigma_i(\mathbf{A}) \triangleq \sqrt{\lambda_i(\mathbf{A}\mathbf{A}^H)}$ if $m < n$. We define $\sigma_{\min}(\mathbf{A}) \triangleq \sigma_1(\mathbf{A})$ and $\sigma_{\max}(\mathbf{A}) \triangleq \sigma_{\min\{m,n\}}(\mathbf{A})$. The term $\mathbf{A}_{i,j}$ denotes the entry in the intersection of the i-th row and the j-th column. If $\mathbf{A} \in \mathbb{C}^{m\times n}$, then $\mathbf{A}^T$ and $\mathbf{A}^*$ denote the transpose and conjugate transpose of $\mathbf{A}$, respectively. We write $[\mathbf{A}]_{i_1}^{i_2}$, with $1 \leq i_1 \leq i_2 \leq m$, to refer to the matrix formed by selecting the rows $i_1$ to $i_2$ of $\mathbf{A}$. Likewise, for $1 \leq j_1 \leq j_2 \leq n$, $\mathbf{A}_{j_1:j_2}$ is the matrix built with columns $j_1$ to $j_2$ of $\mathbf{A}$. The expression ${}_{m_1}[\mathbf{A}]_{m_2}$ corresponds to the square sub-matrix along the main diagonal of $\mathbf{A}$ with its top-left and bottom-right corners on $\mathbf{A}_{m_1,m_1}$ and $\mathbf{A}_{m_2,m_2}$, respectively. A diagonal matrix whose entries are the elements in a set $\mathcal{D}$ (wherein elements may be repeated) is denoted as $\mathrm{diag}\,\mathcal{D}$. If $\mathbf{A} \in \mathbb{R}^{n\times m_1}$ and $\mathbf{B} \in \mathbb{R}^{n\times m_2}$, we write $[\mathbf{A}\,|\,\mathbf{B}] \in \mathbb{R}^{n\times(m_1+m_2)}$ to denote the augmented matrix built by placing the columns of $\mathbf{A}$ followed by those of $\mathbf{B}$.

2.2. Mutual Information and Differential Entropy

Let x, y, and z be random variables with joint PDF $f_{x,y,z}$ and marginal PDFs $f_x$, $f_y$, and $f_z$, respectively. The mutual information between x and y is defined as $I(x;y) \triangleq \int f_{x,y}(x,y)\log\frac{f_{x,y}(x,y)}{f_x(x)f_y(y)}\,dx\,dy$. The conditional mutual information between x and y given z is defined as $I(x;y|z) \triangleq \int f_{x,y,z}(x,y,z)\log\frac{f_{x,y|z}(x,y|z)}{f_{x|z}(x|z)f_{y|z}(y|z)}\,dx\,dy\,dz$, where $f_{x,y|z}$ is the joint PDF of x and y given z, and $f_{x|z}$, $f_{y|z}$ are defined likewise. The conditional differential entropy of x given y is defined as $h(x|y) \triangleq -\int f_{x,y}(x,y)\log\big(f_{x|y}(x|y)\big)\,dx\,dy$.
From these definitions, it is easy to verify the following properties (Reference [1], Sections 2.4–2.6 and 8.4–8.6):
  • Shift invariance: for every deterministic function f,
    $h(x + f(y)\,|\,y) = h(x|y)$.   (19)
  • Non-negativity:
    $I(x;y) \geq 0$,   (20)
    with equality if and only if x and y are independent.
  • Chain Rule:
    $I(x; y, z) = I(x;y) + I(x; z|y)$.   (21)
  • Relationship with entropy:
    $I(x;y) = h(x) - h(x|y) = h(y) - h(y|x)$.   (22)

2.3. System Model and Assumptions

Consider the discrete-time system depicted in Figure 1. In this setup, the block G satisfies Assumption 1.
It is worth noting that there is no loss of generality in considering $g_0 = 1$, since one can otherwise write $G(z)$ as $G(z) = g_0\cdot(G(z)/g_0)$; thus, the entropy gain introduced by $G(z)$ would be $\log|g_0|$ plus the entropy gain due to $G(z)/g_0$ (in agreement with (6)), which has an impulse response with its first sample equal to 1.
The following assumption is made about the output disturbance z 1 :
Assumption 2.
The disturbance $z_1^\infty$ is independent of $u_1^\infty$ and belongs to a κ-dimensional linear subspace, for some finite $\kappa \in \mathbb{N}$. This subspace is spanned by the κ orthonormal columns of a matrix $\boldsymbol{\Phi} \in \mathbb{R}^{|\mathbb{N}|\times\kappa}$ (where $|\mathbb{N}|$ stands for the countably infinite size of $\mathbb{N}$), such that $|h(\boldsymbol{\Phi}^T z_1^\infty)| < \infty$. Moreover, $z_1^\infty = \boldsymbol{\Phi}\, s_\kappa^1$, where the random vector $s_\kappa^1 \triangleq \boldsymbol{\Phi}^T z_1^\infty$ has finite differential entropy, its covariance matrix $\mathbf{K}_{s_\kappa^1}$ satisfies $\lambda_{\max}(\mathbf{K}_{s_\kappa^1}) < \infty$, and it is independent of $u_1^\infty$.

3. Revisiting Theorem 14 in Reference [2]

In this section, after presenting the proof of Theorem 2, we develop Shannon’s approach into a more detailed and formal exposition. This allows us to explain why, for part of the continuous-time filters considered in Theorem 2, the approach chosen by Shannon to prove Theorem 14 in Reference [2] is unable to predict the correct value for the entropy gain.

3.1. Proof of Theorem 2

To begin with, the Fourier transform of $\phi_k$ is
$\Phi_k(\xi) \triangleq \int_{-\infty}^{\infty} \phi_k(t)\, e^{-j2\pi\xi t}\, dt = \mathbb{1}_{[-B,B]}(\xi)\, e^{-j2\pi\xi k/(2B)}$.   (23)
It is easy to verify that the functions $\phi_k$ satisfy the following orthogonality property:
$\int_{-\infty}^{\infty} \phi_k(t)\,\phi_i(t)\, dt = \begin{cases} 2B, & k = i \\ 0, & k \neq i \end{cases}$   (24)
and
$\phi_k(-t) = \phi_{-k}(t)$.   (25)
Notice that $u(k) = u_c\big(\tfrac{k}{2B}\big)$, $k \in \mathbb{N}$.
The output of $G_c$ sampled at time $t = \ell/(2B)$, $\ell \in \mathbb{N}$, is
$y(\ell) \triangleq y_c\big(\tfrac{\ell}{2B}\big) = \int_{-\infty}^{\infty} g(\tau)\, u_c\big(\tfrac{\ell}{2B} - \tau\big)\, d\tau$   (26)
$\quad = \frac{1}{2B}\sum_{k=1}^{\infty}\sum_{i=0}^{\eta} g_i\, u(k) \int_{-\infty}^{\infty} \phi_i(\tau)\, \phi_k\big(\tfrac{\ell}{2B} - \tau\big)\, d\tau$   (27)
$\quad = \frac{1}{2B}\sum_{k=1}^{\infty}\sum_{i=0}^{\eta} g_i\, u(k) \int_{-\infty}^{\infty} \phi_i(\tau)\, \phi_{\ell-k}(\tau)\, d\tau$   (28)
$\quad = \sum_{i=0}^{\eta} g_i\, u(\ell - i)$,   (29)
with $u(k) = 0$ for $k \leq 0$. This means that the output samples $y_1^\infty$ are the discrete-time convolution between $u_1^\infty$ and the filter coefficients $\{g_i\}_{i=0}^{\eta}$. Therefore, the matrix relation (4) holds, and we then obtain that $\bar{h}(y_c) = \bar{h}(u_c) + \log|g_0|$.
The frequency response of $G_c$ is given by
$G_c(\xi) = \int_{-\infty}^{\infty} g(t)\, e^{-j2\pi\xi t}\, dt = \sum_{k=0}^{\eta} g_k\, \Phi_k(\xi) = \sum_{k=0}^{\eta} g_k\, e^{-j\pi\xi k/B}$,   (30)
where $\xi$ is in [Hz]. This means that
$g_0 = \frac{1}{2B}\int_{-B}^{B} G_c(\xi)\, d\xi = \frac{1}{B}\int_0^B \Re\{G_c(\xi)\}\, d\xi$,   (31)
where the last equality holds because $G_c(\xi)$ is conjugate symmetric. Thus, the entropy gain introduced by $G_c$ satisfies (13), concluding the proof.   □

3.2. Formalizing Shannon’s Argument

In the approach followed by Shannon, it is argued that the entropy gain is the limit as $n\to\infty$ of $n^{-1}\sum_{r=0}^{n-1}\log|G_c(\xi_r)|$ over uniformly spaced frequencies $\xi_0, \ldots, \xi_{n-1}$. Here, we show that this summation corresponds to $\log|\det(\tilde{\mathbf{G}}_n)|$, where $\tilde{\mathbf{G}}_n$ is an n-by-n circulant Toeplitz matrix. Moreover, the sequences of Hermitian matrices $\{\mathbf{G}_n\mathbf{G}_n^*\}_{n=1}^{\infty}$ and $\{\tilde{\mathbf{G}}_n\tilde{\mathbf{G}}_n^*\}_{n=1}^{\infty}$ are asymptotically equivalent (as defined in Reference [18], Section 2.3), which would yield $\lim_{n\to\infty} n^{-1}\log|\det(\mathbf{G}_n)| = \lim_{n\to\infty} n^{-1}\log|\det(\tilde{\mathbf{G}}_n)|$ if the eigenvalues of $\mathbf{G}_n\mathbf{G}_n^*$ were bounded between constants $0 < \zeta_m < \zeta_M < \infty$ for all $n \in \mathbb{N}$. However, if $G(z)$ (the z-transform of $\{g_k\}_{k=0}^{\infty}$) has NMP zeros, then $\mathbf{G}_n\mathbf{G}_n^*$ has eigenvalues tending to zero exponentially as $n\to\infty$, which precludes these two limits from coinciding.
To prove the above claims, we first apply the change of variable $\omega \triangleq \pi\xi/B$, with which (30) becomes
$G_c(B\omega/\pi) = G(e^{j\omega}) \triangleq \sum_{k=0}^{\eta} g_k\, e^{-j\omega k}$,   (32)
where $G(e^{j\omega})$ is the frequency response of the discrete-time filter G with unit-impulse response $\{g_i\}_{i=0}^{\eta}$ and ω is in radians per second. Now, following Shannon's approach, we uniformly sample $G(e^{j\omega})$ at the n frequencies
$\omega_r \triangleq \begin{cases} r\frac{2\pi}{n}, & r/n \leq 0.5 \\ r\frac{2\pi}{n} - 2\pi, & r/n > 0.5 \end{cases}, \quad r = 0, 1, \ldots, n-1$,   (33)
which, from (32), yields the spectral samples
$G(e^{j\omega_r}) = \sum_{k=0}^{\eta} g_k\, e^{-j\frac{2\pi}{n} r k}$.   (34)
We will cast the reason why (3) fails to coincide with the correct expression for the entropy gain provided by (5) as a disagreement between the asymptotic behavior of the logarithm of the determinants of two sequences of asymptotically equivalent matrices. For that purpose, since (34) coincides with Reference [18], Equation 4.34, we have that the spectral samples $\{G(e^{j\omega_r})\}_{r=0}^{n-1}$ are the eigenvalues of the circulant Toeplitz matrix (Reference [18], Chapter 3)
$\tilde{\mathbf{G}}_n \triangleq \begin{bmatrix} \tilde{g}_{n,0} & \tilde{g}_{n,n-1} & \cdots & \tilde{g}_{n,1} \\ \tilde{g}_{n,1} & \tilde{g}_{n,0} & \cdots & \tilde{g}_{n,2} \\ \vdots & & \ddots & \vdots \\ \tilde{g}_{n,n-1} & \tilde{g}_{n,n-2} & \cdots & \tilde{g}_{n,0} \end{bmatrix} = \mathbf{U}_n^*\,\mathrm{diag}\{G(e^{j\omega_0}), \ldots, G(e^{j\omega_{n-1}})\}\,\mathbf{U}_n$,   (35)
where $\mathbf{U}_n \in \mathbb{C}^{n\times n}$ is the n-point discrete Fourier transform (DFT) matrix, defined as
$[\mathbf{U}_n]_{k,r} \triangleq \frac{1}{\sqrt{n}}\, e^{-j\frac{2\pi}{n} k r}, \quad k, r = 0, 1, \ldots, n-1$.   (36)
From Reference [18], Lemma 4.5, $\tilde{g}_{n,k} \triangleq \sum_{i\in\mathbb{N}_0:\, k+ni\leq\eta} g_{k+in}$, corresponding to the (possibly) aliased impulse response $g_0, g_1, \ldots, g_\eta$ resulting from the sampling in frequency.
We can now see that the discrepancy between the entropy gains predicted by (3) and (5) is the disagreement between the following limits:
$\lim_{n\to\infty} \frac{1}{2n}\log\big(\det(\mathbf{G}_n\mathbf{G}_n^*)\big) \overset{(31)}{=} \log\left|\frac{1}{B}\int_0^B \Re\{G_c(\xi)\}\, d\xi\right|$,
$\lim_{n\to\infty} \frac{1}{2n}\log\big(\det(\tilde{\mathbf{G}}_n\tilde{\mathbf{G}}_n^*)\big) = \frac{1}{B}\int_0^B \log\big|G_c(\xi)\big|\, d\xi$,   (37)
where, due to (8), the expressions on both right-hand sides differ if and only if $G(z)$ has NMP zeros. According to Reference [18], Lemma 4.6, the sequences $\{\mathbf{G}_n\}_{n=1}^{\infty}$ and $\{\tilde{\mathbf{G}}_n\}_{n=1}^{\infty}$ are asymptotically equivalent, which is written as $\mathbf{G}_n \sim \tilde{\mathbf{G}}_n$. Then, from Reference [18], Theorem 2.1, the Hermitian matrices $\mathbf{G}_n\mathbf{G}_n^* \sim \tilde{\mathbf{G}}_n\tilde{\mathbf{G}}_n^*$, which, from Reference [18], Theorem 2.4, implies that
$\lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} f\big(\lambda_i(\mathbf{G}_n\mathbf{G}_n^*)\big) = \lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} f\big(\lambda_i(\tilde{\mathbf{G}}_n\tilde{\mathbf{G}}_n^*)\big)$,   (38)
for any function f continuous over a finite interval $[\zeta_m, \zeta_M]$ such that
$\zeta_m \leq \lambda_i(\mathbf{G}_n\mathbf{G}_n^*),\ \lambda_i(\tilde{\mathbf{G}}_n\tilde{\mathbf{G}}_n^*) \leq \zeta_M, \quad i = 1, 2, \ldots, n, \quad n = 1, 2, \ldots$   (39)
However, when $G(z)$ has m NMP zeros, Lemma 7 (in Section 6.3) establishes that there are exactly m eigenvalues of $\mathbf{G}_n\mathbf{G}_n^*$ that tend to zero exponentially as $n\to\infty$. Crucially, $\log(\cdot)$ is discontinuous at 0, which precludes the limits in (37) from coinciding.
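To make the discrepancy concrete, the following sketch (using an assumed NMP example filter, $g = (1, -2)$, not taken from the paper's own experiments) compares $\frac{1}{n}\log\det(\mathbf{G}_n\mathbf{G}_n^*)$, which is exactly zero, with the corresponding circulant quantity $\frac{1}{n}\sum_r \log|G(e^{j\omega_r})|^2$, and also prints the smallest eigenvalue of $\mathbf{G}_n\mathbf{G}_n^T$, which vanishes exponentially:

```python
import numpy as np

g = np.array([1.0, -2.0])            # assumed example filter with one NMP zero at z = 2
for n in [8, 16, 32]:
    # Lower-triangular Toeplitz G_n of (4).
    Gn = sum(np.diag(np.full(n - k, g[k]), -k) for k in range(len(g)))
    # Eigenvalues of the circulant matrix in (35) are the frequency samples G(e^{j w_r}).
    w = 2 * np.pi * np.arange(n) / n
    samples = g[0] + g[1] * np.exp(-1j * w)
    toeplitz_rate = np.log(np.abs(np.linalg.det(Gn @ Gn.T))) / n      # = 0 exactly
    circulant_rate = np.mean(np.log(np.abs(samples) ** 2))            # -> 2 log 2 ~ 1.386
    lam_min = np.linalg.eigvalsh(Gn @ Gn.T).min()                     # -> 0 exponentially
    print(n, round(toeplitz_rate, 6), round(circulant_rate, 4), lam_min)
```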

4. The Effective Differential Entropy

Theorem 3 establishes that the effective differential entropy rate of the entire (complete) output of an LTI system exceeds that of the (shorter) input sequence by the RHS of (15). This section provides a geometrical interpretation of this phenomenon and builds intuition about the effective differential entropy already introduced in Definition 1.
Consider the random vectors $\mathbf{u} \triangleq [u_1\ u_2]^T$ and $\mathbf{y} \triangleq [y_1\ y_2\ y_3]^T$ related via
$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \underbrace{\begin{bmatrix} 1 & 0 \\ -2 & 1 \\ 0 & -2 \end{bmatrix}}_{\breve{\mathbf{G}}_2} \begin{bmatrix} u_1 \\ u_2 \end{bmatrix}$.   (40)
Suppose $\mathbf{u}$ is uniformly distributed over $[0,1]\times[0,1]$. Applying the conventional definition of differential entropy of a random sequence, we would have that
$h(y_1, y_2, y_3) = h(y_1, y_2) + h(y_3\,|\,y_1, y_2) = -\infty$,
because $y_3$ is a deterministic function of $y_1$ and $y_2$:
$y_3 = [\,0\ \ {-2}\,]\,[u_1\ u_2]^T = [\,0\ \ {-2}\,]\begin{bmatrix} 1 & 0 \\ -2 & 1 \end{bmatrix}^{-1}\begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$.
In other words, the problem lies in that, although the output is a three-dimensional vector, it has only two degrees of freedom, i.e., it is restricted to a 2-dimensional subspace of $\mathbb{R}^3$. This is illustrated in Figure 2, where the set $[0,1]\times[0,1]$ is shown (coinciding with the u-v plane), together with its image through $\breve{\mathbf{G}}_2$ (as defined in (40)).
As can be seen in this figure, the image of the square $[0,1]^2$ through $\breve{\mathbf{G}}_2$ is a 2-dimensional rhombus over which $\{y_1, y_2, y_3\}$ distributes uniformly. Since the intuitive notion of differential entropy of an ensemble of random variables relates to the size of the region spanned by the associated random vector (and determines how difficult it is to compress it in a lossy fashion with a given precision), one could argue that the differential entropy of $\{y_1, y_2, y_3\}$, far from being $-\infty$, should be somewhat larger than that of $\{u_1, u_2\}$ (since the rhombus $\breve{\mathbf{G}}_2[0,1]^2$ has a larger area than $[0,1]^2$). So, what does it mean that (and why should) $h(y_1, y_2, y_3) = -\infty$? Simply put, the differential entropy relates to the volume spanned by the support of the probability density function. For $\mathbf{y}$ in our example, the latter (three-dimensional) volume is clearly zero.
From the above discussion, the comparison between the differential entropies of $\mathbf{y} \in \mathbb{R}^3$ and $\mathbf{u} \in \mathbb{R}^2$ in our previous example should take into account that $\mathbf{y}$ actually lives in a two-dimensional subspace of $\mathbb{R}^3$. Indeed, since multiplication by a unitary matrix does not alter differential entropies, we could consider the differential entropy of
$\begin{bmatrix} \tilde{\mathbf{y}} \\ 0 \end{bmatrix} \triangleq \begin{bmatrix} \breve{\mathbf{Q}} \\ \bar{q}^T \end{bmatrix}\mathbf{y}$,
where $\breve{\mathbf{Q}}^T \in \mathbb{R}^{3\times 2}$ (so that $\breve{\mathbf{Q}}$ has orthonormal rows) is the matrix of left singular vectors in the SVD of $\breve{\mathbf{G}}_2$,
$\breve{\mathbf{G}}_2 = \breve{\mathbf{Q}}^T\breve{\mathbf{D}}\breve{\mathbf{R}}$,
and $\bar{q}$ is a unit-norm vector orthogonal to the rows of $\breve{\mathbf{Q}}$ (and thus orthogonal to $\mathbf{y}$, as well). We are now able to compute the differential entropy in $\mathbb{R}^2$ for $\tilde{\mathbf{y}}$, which corresponds to a rotated version of $\mathbf{y}$ whose support is aligned with $\mathbb{R}^2$.
The preceding discussion motivates the use of a modified version of the notion of differential entropy for a random vector $\mathbf{y} \in \mathbb{R}^n$ which considers the number of dimensions actually spanned by $\mathbf{y}$ instead of its length.
It is worth mentioning that Shannon's differential entropy of a vector $\mathbf{y} \in \mathbb{R}^{\ell}$, whose support's ℓ-volume is greater than zero, arises from considering it as the difference between its (absolute) entropy and that of a random variable uniformly distributed over an ℓ-dimensional, unit-volume region of $\mathbb{R}^{\ell}$. More precisely, if in this case the probability density function (PDF) of $\mathbf{y} = [y_1\ y_2\ \cdots\ y_{\ell}]^T$ is Riemann integrable, then (Reference [1], Theorem 9.3.1)
$h(\mathbf{y}) = \lim_{\Delta\to 0}\big[H(\mathbf{y}^{\Delta}) + \ell\log\Delta\big]$,   (44)
where $\mathbf{y}^{\Delta}$ is the discrete-valued random vector resulting when $\mathbf{y}$ is quantized using an ℓ-dimensional uniform quantizer with ℓ-cubic quantization cells of volume $\Delta^{\ell}$. However, if we consider a variable $\mathbf{y}$ whose support belongs to an n-dimensional subspace of $\mathbb{R}^{\ell}$, $n < \ell$ (i.e., $\mathbf{y} = \mathbf{S}\mathbf{u} = \mathbf{A}^T\mathbf{T}\mathbf{C}\mathbf{u}$, as in Definition 1), then the entropy of its quantized version in $\mathbb{R}^{\ell}$, say $H_{\ell}(\mathbf{y}^{\Delta})$, is distinct from $H_n((\mathbf{A}\mathbf{y})^{\Delta})$, the entropy of $\mathbf{A}\mathbf{y}$ quantized in $\mathbb{R}^n$. Moreover, it turns out that, in general,
$\lim_{\Delta\to 0}\big[H_{\ell}(\mathbf{y}^{\Delta}) - H_n((\mathbf{A}\mathbf{y})^{\Delta})\big] \neq 0$,   (45)
despite the fact that $\mathbf{A}$ has orthonormal rows. Thus, the definition given by (44) does not yield consistent results for the case wherein the dimension of a random vector's support (i.e., its number of degrees of freedom) is smaller than its length (the mentioned inconsistency refers to (45)), which reveals that the asymptotic behavior of $H_{\ell}(\mathbf{y}^{\Delta})$ changes if $\mathbf{y}$ is rotated. (If this were not the case, then we could redefine (44) replacing ℓ by n, in a spirit similar to the one behind Rényi's d-dimensional entropy [19].) To see this, consider the case in which $u \in \mathbb{R}$ distributes uniformly over $[0,1]$ and $\mathbf{y} = [1\ 1]^T u/\sqrt{2}$. Clearly, $\mathbf{y}$ distributes uniformly over the unit-length segment connecting the origin with the point $(1,1)/\sqrt{2}$. Then,
$H_2(\mathbf{y}^{\Delta}) = -\left\lfloor\tfrac{1}{\sqrt{2}\Delta}\right\rfloor\sqrt{2}\Delta\,\log\big(\sqrt{2}\Delta\big) - \left(1 - \left\lfloor\tfrac{1}{\sqrt{2}\Delta}\right\rfloor\sqrt{2}\Delta\right)\log\left(1 - \left\lfloor\tfrac{1}{\sqrt{2}\Delta}\right\rfloor\sqrt{2}\Delta\right)$.   (46)
On the other hand, since, in this case, $\mathbf{A}\mathbf{y} = u$, we have that
$H_1((\mathbf{A}\mathbf{y})^{\Delta}) = H_1(u^{\Delta}) = -\left\lfloor\tfrac{1}{\Delta}\right\rfloor\Delta\log\Delta - \left(1 - \left\lfloor\tfrac{1}{\Delta}\right\rfloor\Delta\right)\log\left(1 - \left\lfloor\tfrac{1}{\Delta}\right\rfloor\Delta\right)$.   (47)
Thus, the d-dimensional entropy would not, in general, be equal to the effective differential entropy, that is,
$\lim_{\Delta\to 0}\big[H_1((\mathbf{A}\mathbf{y})^{\Delta}) - H_2(\mathbf{y}^{\Delta})\big] = \lim_{\Delta\to 0}\left[\left\lfloor\tfrac{1}{\sqrt{2}\Delta}\right\rfloor\sqrt{2}\Delta\,\log\big(\sqrt{2}\Delta\big) - \left\lfloor\tfrac{1}{\Delta}\right\rfloor\Delta\log\Delta\right] = \log\sqrt{2}$.   (48)
The latter example further illustrates why the notion of effective entropy is appropriate in the setup considered in this section, where the effective dimension of the random sequences does not coincide with their length (it is easy to verify that the effective entropy of $\mathbf{y}$ does not change if one rotates $\mathbf{y}$ in $\mathbb{R}^{\ell}$).
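A minimal numeric check of this discrepancy (added here for illustration; the quantizer is the axis-aligned one assumed in the example above): evaluating the two quantized entropies for shrinking Δ shows their difference approaching $\log\sqrt{2}$, in agreement with (48).

```python
import numpy as np

# y = [1 1]^T u / sqrt(2) with u ~ U[0,1]: a unit-length diagonal segment in R^2.
# H2: quantize y with square cells of side Delta in R^2 (the segment puts mass sqrt(2)*Delta
#     in each fully traversed diagonal cell, plus a remainder cell).
# H1: quantize A y = u with cells of length Delta in R^1.
for Delta in [1e-1, 1e-2, 1e-3, 1e-4]:
    s = np.sqrt(2) * Delta
    k2, k1 = np.floor(1 / s), np.floor(1 / Delta)
    r2, r1 = 1 - k2 * s, 1 - k1 * Delta
    H2 = -k2 * s * np.log(s) - (r2 * np.log(r2) if r2 > 0 else 0.0)
    H1 = -k1 * Delta * np.log(Delta) - (r1 * np.log(r1) if r1 > 0 else 0.0)
    print(Delta, H1 - H2)            # approaches log(sqrt(2)) ~ 0.3466
```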
We finish this section with an example to illustrate the usefulness of the notion of effective differential entropy beyond the context of entropy gain.

Application Example: Shannon Lower Bound

The rate-distortion function (RDF) $R(D)$ is the infimum, among all codes, of the expected number of bits per sample necessary to reconstruct a given random source with distortion not greater than D [1]. Let the source and reconstruction be the vectors $x_1^{\ell}$ and $x_1^{\ell} + v_1^{\ell}$, respectively, and suppose the distortion is assessed using the mean-squared error (MSE) $d(v_1^{\ell}) \triangleq \mathbb{E}\big[\|v_1^{\ell}\|^2\big]$. Then, restricting our attention to uniquely decodable codes (Reference [1], p. 105), the Shannon Lower Bound (SLB) [20] establishes that
$R(D) \geq h(x_1^{\ell}) - \max_{d(v_1^{\ell})\leq D} h(v_1^{\ell})$,   (49)
provided $h(x_1^{\ell})$ is bounded. Therefore, if $x_1^{\ell}$ is the entire forced response of an FIR filter G of order p to an input $u_1^n$, then $\ell = n + p$ and $h(x_1^{\ell})$ is minus infinity, which precludes one from using (49). We will show next that, in this case, the SLB can still be stated by using the effective differential entropy $\breve{h}(x_1^{\ell})$ instead of $h(x_1^{\ell})$. Following Definition 1, we can write the source vector as $x_1^{\ell} = \mathbf{A}^T\mathbf{T}\mathbf{C}\, u_n^1$, where $\mathbf{A} \in \mathbb{R}^{n\times\ell}$ has orthonormal rows, $\mathbf{T} \in \mathbb{R}^{n\times n}$ is diagonal with non-negative entries, and $\mathbf{C} \in \mathbb{R}^{n\times n}$ is unitary. Let $\mathbf{H} \triangleq [\mathbf{A}^T\,|\,\bar{\mathbf{A}}^T] \in \mathbb{R}^{\ell\times\ell}$ be a unitary matrix, which means that $\bar{\mathbf{A}}\mathbf{A}^T = \mathbf{0}_{p\times n}$. Then,
$R(D) \overset{(a)}{\geq} I(x_1^{\ell};\, x_1^{\ell} + v_1^{\ell})$
$\; = I(\mathbf{H}x_1^{\ell};\, \mathbf{H}x_1^{\ell} + \mathbf{H}v_1^{\ell})$
$\; = I(\mathbf{A}x_1^{\ell};\, \mathbf{H}x_1^{\ell} + \mathbf{H}v_1^{\ell})$
$\; = I(\mathbf{A}x_1^{\ell};\, \mathbf{A}x_1^{\ell} + \mathbf{A}v_1^{\ell},\, \bar{\mathbf{A}}v_1^{\ell})$
$\; \overset{(21)}{=} I(\mathbf{A}x_1^{\ell};\, \mathbf{A}x_1^{\ell} + \mathbf{A}v_1^{\ell}) + I(\mathbf{A}x_1^{\ell};\, \bar{\mathbf{A}}v_1^{\ell}\,|\, \mathbf{A}x_1^{\ell} + \mathbf{A}v_1^{\ell})$
$\; \overset{(20)}{\geq} I(\mathbf{A}x_1^{\ell};\, \mathbf{A}x_1^{\ell} + \mathbf{A}v_1^{\ell})$
$\; \overset{(22)}{=} h(\mathbf{A}x_1^{\ell}) - h(\mathbf{A}x_1^{\ell}\,|\, \mathbf{A}x_1^{\ell} + \mathbf{A}v_1^{\ell})$
$\; \overset{(19)}{=} h(\mathbf{A}x_1^{\ell}) - h(\mathbf{A}v_1^{\ell}\,|\, \mathbf{A}x_1^{\ell} + \mathbf{A}v_1^{\ell})$
$\; \overset{(b)}{\geq} h(\mathbf{A}x_1^{\ell}) - h(\mathbf{A}v_1^{\ell})$
$\; \geq h(\mathbf{A}x_1^{\ell}) - \max_{w_n^1:\, d(w_n^1)\leq D} h(w_n^1)$
$\; \overset{(c)}{=} \breve{h}(x_1^{\ell}) - \max_{w_n^1:\, d(w_n^1)\leq D} h(w_n^1)$,
where (a) stems from Reference [1], Theorems 5.4.1 and 5.5.1 and Equations (10.58)–(10.61), (b) holds because conditioning does not increase entropy, and (c) is from the definition of effective differential entropy.

5. Entropy-Balanced Processes: Geometric Interpretation and Properties

In the first part of this section, we provide a geometric interpretation of the effect that a non-minimum phase LTI system has on its input random process. This will give an intuitive meaning to the notion of an entropy-balanced random process (introduced in Definition 2 above) and provide insights into why and how the entropy gain defined in (1) arises as a consequence of an output random disturbance or a random initial state (the themes of Section 6 and Section 7, respectively).
The second part of this section identifies several entropy-balanced processes and establishes two properties satisfied by this class of processes.

5.1. Geometric Interpretation

We begin our discussion with a simple example.
Example 1.
Suppose that G in Figure 1 is a finite impulse response (FIR) filter with impulse response $g_0 = 1$, $g_1 = -2$, $g_i = 0$ for $i \geq 2$. Notice that this choice yields $G(z) = (z-2)/z$; thus, $G(z)$ has one non-minimum-phase zero, at $z = 2$. The associated matrix $\mathbf{G}_n$ for $n = 3$ is
$\mathbf{G}_3 = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & -2 & 1 \end{bmatrix}$,
whose determinant is clearly one (indeed, all its eigenvalues are 1). Hence, as discussed in the introduction, $h(\mathbf{G}_3 u_3^1) = h(u_3^1)$; thus, $\mathbf{G}_3$ (and $\mathbf{G}_n$, in general) does not introduce an entropy gain by itself. However, an interesting phenomenon becomes evident by looking at the SVD of $\mathbf{G}_3$, given by $\mathbf{G}_3 = \mathbf{Q}_3^T\mathbf{D}_3\mathbf{R}_3$, where $\mathbf{Q}_3$ and $\mathbf{R}_3$ are unitary matrices and $\mathbf{D}_3 \triangleq \mathrm{diag}\{d_1, d_2, d_3\}$. In this case, $\mathbf{D}_3 = \mathrm{diag}\{0.19394, 1.90321, 2.70928\}$; thus, one of the singular values of $\mathbf{G}_3$ is much smaller than the others (although the product of all singular values equals 1, as expected). As will be shown in Section 6, for a stable $G(z)$, such uneven distribution of singular values arises only when $G(z)$ has non-minimum-phase zeros. The effect of this can be visualized by looking at the image of the cube $[0,1]^3$ through $\mathbf{G}_3$, shown in Figure 3.
If the input $u_3^1$ were uniformly distributed over this cube (of unit volume), then $\mathbf{G}_3 u_3^1$ would distribute uniformly over the unit-volume parallelepiped depicted in Figure 3; hence, $h(\mathbf{G}_3 u_3^1) = h(u_3^1)$.
Now, if we add to $\mathbf{G}_3 u_3^1$ a disturbance $z_3^1 = \boldsymbol{\Phi}s$, with the scalar s uniformly distributed over $[-0.5, 0.5]$ and independent of $u_3^1$, and with $\boldsymbol{\Phi} \in \mathbb{R}^{3\times 1}$, the effect would be to "thicken" the support over which the resulting random vector $y_3^1 = \mathbf{G}_3 u_3^1 + z_3^1$ is distributed, along the direction pointed to by $\boldsymbol{\Phi}$. If $\boldsymbol{\Phi}$ is aligned with the direction along which the support of $\mathbf{G}_3 u_3^1$ is thinnest (given by $q_{3,1}$, the first row of $\mathbf{Q}_3$), then the resulting support would have its volume significantly increased, which can be associated with a large increase in the differential entropy of $y_3^1$ with respect to $u_3^1$. Indeed, a relatively small variance of s and an approximately aligned $\boldsymbol{\Phi}$ would still produce a significant entropy gain.
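The singular values quoted in the example are easy to reproduce numerically; the following snippet (illustrative only) computes the SVD of $\mathbf{G}_3$ and confirms that the product of the singular values is 1:

```python
import numpy as np

G3 = np.array([[ 1.0,  0.0, 0.0],
               [-2.0,  1.0, 0.0],
               [ 0.0, -2.0, 1.0]])
d = np.linalg.svd(G3, compute_uv=False)
print(np.sort(d))      # ~ [0.19394, 1.90321, 2.70928]
print(np.prod(d))      # ~ 1.0 = |det(G3)|: no entropy gain from G3 alone
```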
The above example suggests that the entropy gain from $u_n^1$ to $y_n^1$ appears as a combination of two factors. The first of these is the uneven way in which the random vector $\mathbf{G}_n u_n^1$ is distributed over $\mathbb{R}^n$. The second factor is the alignment of the disturbance vector $z_n^1$ with respect to the span of the subset $\{q_{n,i}\}_{i\in\Omega_n}$ of rows of $\mathbf{Q}_n$ associated with the smallest singular values of $\mathbf{G}_n$, indexed by the elements in the set $\Omega_n$. As we shall discuss in the next section, if G has m non-minimum-phase zeros, then, as n increases, there will be m singular values of $\mathbf{G}_n$ going to zero exponentially. Since the product of the singular values of $\mathbf{G}_n$ equals 1 for all n, it follows that $\prod_{i\notin\Omega_n} d_{n,i}$ must grow exponentially with n, where $d_{n,i}$ is the i-th diagonal entry of $\mathbf{D}_n$. This implies that $\mathbf{G}_n u_n^1$ expands with n along the span of $\{q_{n,i}\}_{i\notin\Omega_n}$, compensating its shrinkage along the span of $\{q_{n,i}\}_{i\in\Omega_n}$, thus keeping $h(\mathbf{G}_n u_n^1) = h(u_n^1)$ for all n. Thus, as n grows, any small disturbance distributed over the span of $\{q_{n,i}\}_{i\in\Omega_n}$, added to $\mathbf{G}_n u_n^1$, will keep the support of the resulting distribution from shrinking along this subspace. Consequently, the expansion of $\mathbf{G}_n u_n^1$ with n along the span of $\{q_{n,i}\}_{i\notin\Omega_n}$ is no longer compensated, yielding an entropy increase proportional to $\log\big(\prod_{i\notin\Omega_n} d_{n,i}\big)$.
The above analysis allows one to anticipate a situation in which no entropy gain would take place even when some singular values of $\mathbf{G}_n$ tend to zero as $n\to\infty$. Since the increase in entropy is made possible by the fact that, as n grows, the support of the distribution of $\mathbf{G}_n u_n^1$ shrinks along the span of $\{q_{n,i}\}_{i\in\Omega_n}$, no such entropy gain should arise if the support of the distribution of the input $u_n^1$ expands accordingly along the directions pointed to by the rows $\{r_{n,i}\}_{i\in\Omega_n}$ of $\mathbf{R}_n$.
An example of such a situation can easily be constructed as follows: let $G(z)$ in Figure 1 have non-minimum-phase zeros and suppose that $u_1^\infty$ is generated as $G^{-1}\tilde{u}_1^\infty$, where $\tilde{u}_1^\infty$ is an i.i.d. random process with bounded entropy rate. Since the determinant of $\mathbf{G}_n^{-1}$ equals 1 for all n, we have that $h(u_n^1) = h(\tilde{u}_n^1)$ for all n. On the other hand, $y_n^1 = \mathbf{G}_n\mathbf{G}_n^{-1}\tilde{u}_n^1 + z_n^1 = \tilde{u}_n^1 + z_n^1$. Since $z_n^1 = [\boldsymbol{\Phi}]_1^n s_\kappa^1$ for some finite κ (recall Assumption 2), it is easy to show that $\lim_{n\to\infty}\frac{1}{n} h(y_n^1) = \lim_{n\to\infty}\frac{1}{n} h(\tilde{u}_n^1) = \lim_{n\to\infty}\frac{1}{n} h(u_n^1)$; thus, no entropy gain appears.
The preceding discussion reveals that the entropy gain produced by G in the situation shown in Figure 1 depends on the distribution of the input and on the support and distribution of the disturbance. This stands in stark contrast with the well known fact that the increase in differential entropy produced by an invertible linear operator depends only on its Jacobian, and not on the statistics of the input [2]. We have also seen that the distribution of a random process along the different directions within the Euclidean space which contains it plays a key role, as well. This motivates the need to specify a class of random processes which distribute more or less evenly over all directions. This is precisely the intuitive meaning of an entropy-balanced process.
The following section identifies a large family of processes belonging to this class, as well as two properties which greatly expand this family.

5.2. Characterization of Entropy-Balanced Processes

We have defined the notion of an "entropy-balanced" process in Section 1.1. In words, the first condition in this definition guarantees that the orthogonal projection of an entropy-balanced process onto any ν-dimensional linear subspace has a differential entropy whose magnitude remains bounded or grows at most sub-linearly with n. The second condition states that the projection of an entropy-balanced process $v_1^\infty$ onto any linear subspace having ν fewer dimensions has the same differential entropy rate as the original process. This condition is equivalent to requiring that every unitary transformation on $v_1^n$ yields a random sequence $y_1^n$ such that $\lim_{n\to\infty}\frac{1}{n} h(y_{n-\nu+1}^n\,|\,y_1^{n-\nu}) = 0$. This property of the resulting random sequence $y_1^n$ means that one cannot predict its last ν samples with arbitrary accuracy by using its previous n − ν samples, even as n goes to infinity.
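For intuition (an illustration added here, not part of the paper), condition (ii) can be checked in closed form for an i.i.d. unit-variance Gaussian process, for which differential entropies reduce to log-determinants of covariance matrices; the projection below is a single, hypothetical, randomly generated matrix with orthonormal rows, but for Gaussian vectors the same computation goes through for any such projection, which is essentially the content of Lemma 1 below.

```python
import numpy as np

rng = np.random.default_rng(0)
nu = 3                                      # number of removed dimensions
for n in [10, 100, 1000]:
    # A hypothetical (n - nu) x n matrix Phi_n with orthonormal rows.
    Q, _ = np.linalg.qr(rng.standard_normal((n, n - nu)))
    Phi = Q.T
    # For v ~ N(0, I_n): h(v) = n/2 log(2 pi e), and Phi v ~ N(0, Phi Phi^T = I_{n-nu}).
    h_v = 0.5 * n * np.log(2 * np.pi * np.e)
    h_proj = 0.5 * (n - nu) * np.log(2 * np.pi * np.e) \
             + 0.5 * np.linalg.slogdet(Phi @ Phi.T)[1]
    print(n, (h_proj - h_v) / n)            # -> 0 as n grows, as required by (17)
```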
We now characterize a large family of entropy-balanced random processes and establish some of their properties. Although intuition may suggest that most random processes (such as i.i.d. or stationary processes) should be entropy balanced, that statement seems rather difficult to prove. In the following, we show that the entropy-balanced condition is met by i.i.d. processes with per-sample probability density function (PDF) being uniform, piece-wise constant, or Gaussian. It is also shown that adding to an entropy-balanced process an independent random process yields another entropy-balanced process, and that filtering an entropy-balanced process by a stable and minimum-phase filter yields an entropy-balanced process, as well. The proofs can be found in Appendix B.
Lemma 1.
Let $u_1^\infty$ be a Gaussian random process with independent elements having positive and bounded variance, i.e., there exist $0 < \check{\sigma}^2 \leq \hat{\sigma}^2 < \infty$ such that $\check{\sigma}^2 \leq \sigma_{u(n)}^2 \leq \hat{\sigma}^2$, $\forall n \in \mathbb{N}$. Then, $u_1^\infty$ is entropy balanced.
Lemma 2.
Let $u_1^\infty$ be a random process with independent elements satisfying Condition (i) in Definition 2, in which each $u_i$ is distributed according to a (possibly different) piece-wise constant PDF such that each interval where this PDF is constant has measure less than θ and greater than ϵ, for some constants $0 < \epsilon < \theta < \infty$. Then, $u_1^\infty$ is entropy balanced.
Lemma 3.
Let $u_1^\infty$ and $v_1^\infty$ be mutually independent random processes. If $u_1^\infty$ is entropy balanced, and $w_1^\infty \triangleq u_1^\infty + v_1^\infty$ satisfies $\sigma_{w(n)}^2 < \infty$ for finite n and $\lim_{n\to\infty} n^{-1}\log(\sigma_{w(n)}^2) = 0$, then $w_1^\infty$ is also entropy balanced.
The proof of Lemma 3 is on page 33. The reasoning behind this lemma can be understood intuitively by noting that adding to a random process another, independent, random process can only increase the "spread" of the distribution of the former, which tends to balance the entropy of the resulting process along all dimensions of the Euclidean space. In addition, it follows from Lemma 3 that all i.i.d. processes having a per-sample PDF which can be constructed by convolving uniform, piece-wise constant, or Gaussian PDFs as many times as required are entropy balanced. It also implies that one can have non-stationary processes which are entropy balanced, since Lemma 3 imposes no requirements on the process $v_1^\infty$.
The next lemma related to the properties of entropy-balanced processes shows that filtering by a stable and minimum phase LTI filter preserves the entropy balanced condition of its input.
Lemma 4.
Let $u_1^\infty$ be an entropy-balanced process and G an LTI, stable, and minimum-phase filter. Then, the output $w_1^\infty \triangleq G u_1^\infty$ is also an entropy-balanced process.
This result implies that any stable moving-average auto-regressive process constructed from entropy-balanced innovations is also entropy balanced, provided the coefficients of the averaging and regression correspond to a stable MP filter.
The last lemma of this section states a crucial property of entropy-balanced processes (the proof is in Appendix B, page 34).
Lemma 5.
Let $u_1^\infty$ be an entropy-balanced process. Consider a disturbance $z_1^\infty$ satisfying Assumption 2 and define $y_1^\infty \triangleq u_1^\infty + z_1^\infty$. Then, $\lim_{n\to\infty} n^{-1}\big(h(y_1^n) - h(u_1^n)\big) = 0$.
We finish this section by pointing out two examples of processes which are not entropy balanced, namely the output of an NMP filter driven by an entropy-balanced input and the output of an unstable filter driven by an entropy-balanced input. The first of these cases plays a central role in the next section.

6. Entropy Gain Due to External Disturbances

In this section, we formalize the ideas qualitatively outlined in the previous section. Specifically, for the system shown in Figure 1, we characterize the entropy gain $\mathcal{G}(G, \mathbf{x}_0, u_1^\infty, z_1^\infty)$ defined in (1) for the case in which the initial state $\mathbf{x}_0$ is zero (or deterministic) and there exists a random disturbance $z_1^\infty$ (of possibly infinite length) which satisfies Assumption 2.

6.1. Input Disturbances Do Not Produce Entropy Gain

In this section, we show that random disturbances satisfying Assumption 2, when added to the input $u_1^\infty$ (i.e., before G), do not introduce an entropy gain. This result follows from Lemma 5, as stated in the following theorem:
Theorem 5
(Input Disturbances Do Not Introduce Entropy Gain). Let G and $z_1^\infty$ satisfy Assumptions 1 and 2, respectively. Suppose that $u_1^\infty$ is entropy balanced and consider the output
$y_1^\infty = G(u_1^\infty + z_1^\infty)$.
Then,
$\lim_{n\to\infty} \frac{1}{n}\big[h(y_1^n) - h(u_1^n)\big] = 0$.
Proof. 
From Lemma 5, the differential entropy rate of u 1 equals that of u 1 + z 1 . The proof is completed by recalling that G yields no entropy gain for its input u 1 + z 1 because it corresponds to the noise-less scenario.    □

6.2. The Entropy Gain Introduced by Output Disturbances when G is MP is Zero

The results from the previous section yield the following corollary, which states that an LTI system with transfer function G ( z ) without zeros outside the unit circle (i.e., an MP transfer function) cannot introduce entropy gain.
Corollary 1
(Minimum-Phase Filters Do Not Introduce Entropy Gain). Consider the system shown in Figure 1, wherein the input $u_1^\infty$ is an entropy-balanced random process and the output disturbance $z_1^\infty$ satisfies Assumption 2. Besides Assumption 1, suppose that $G(z)$ is minimum phase. Then,
$\lim_{n\to\infty} \frac{1}{n}\big[h(y_1^n) - h(u_1^n)\big] = 0$.
Proof. 
Since G ( z ) is minimum phase and stable, the result follows directly from Lemmas 4 and 5.    □

6.3. The Entropy Gain Introduced by Output Disturbances when G ( z ) is NMP

We show here that the entropy gain of an LTI system with transfer function $G(z)$ and an output disturbance is at most the sum of the logarithms of the magnitudes of the zeros of $G(z)$ outside the unit circle.
The following lemma will be instrumental for that purpose.
Lemma 6.
Consider the system in Figure 1, suppose $z_1^\infty$ satisfies Assumption 2, and suppose the input process $u_1^\infty$ is entropy balanced. Let $\mathbf{G}_n = \mathbf{Q}_n^T\mathbf{D}_n\mathbf{R}_n$ be the SVD of $\mathbf{G}_n$, where $\mathbf{D}_n = \mathrm{diag}\{d_{n,1}, \ldots, d_{n,n}\}$ holds the singular values of $\mathbf{G}_n$, with $d_{n,1} \leq d_{n,2} \leq \cdots \leq d_{n,n}$, such that $|\det\mathbf{G}_n| = \prod_{i=1}^{n} d_{n,i} = 1$. Let m be the number of these singular values which tend to zero exponentially as $n\to\infty$. Then,
$\lim_{n\to\infty} \frac{1}{n}\big[h(y_1^n) - h(u_1^n)\big] = \lim_{n\to\infty} \frac{1}{n}\left[h\Big([\mathbf{D}_n]_1^m\,\mathbf{R}_n u_n^1 + [\mathbf{Q}_n]_1^m z_n^1\Big) - \sum_{i=1}^{m}\log d_{n,i}\right]$.   (64)
The proof of this lemma can be found on page 34, in Appendix B.
Lemma 6 leaves the need to characterize the asymptotic behavior of the singular values of G n . This is accomplished in the following lemma, which relates these singular values to the zeros of G ( z ) . It is a generalization of the unnumbered lemma in the proof of Reference [16], Theorem 1 (restated in Appendix C as Lemma A3), which holds for FIR transfer functions, to the case of infinite-impulse response (IIR) transfer functions (i.e., transfer functions having poles).
Lemma 7.
Let $G(z)$ be a transfer function satisfying Assumption 1, whose zeros $\{\rho_i\}_{i=1}^{p}$ satisfy $|\rho_1| \geq \cdots \geq |\rho_m| > 1 \geq |\rho_{m+1}| \geq \cdots \geq |\rho_p|$. Then,
$\lambda_l(\mathbf{G}_n\mathbf{G}_n^T) = \begin{cases} \alpha_{n,l}^2\,|\rho_l|^{-2n}, & \text{if } l \leq m, \\ \alpha_{n,l}^2, & \text{otherwise}, \end{cases}$   (65)
where the elements in the sequences $\{\alpha_{n,l}\}$ are positive and increase or decrease at most polynomially with n.
(The proof of this lemma can be found in Appendix B, page 36).
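A quick numerical illustration of Lemma 7 (with an assumed example filter, $G(z) = (z-2)/z$, so $m = 1$ and $\rho_1 = 2$, chosen here for illustration): the smallest eigenvalue of $\mathbf{G}_n\mathbf{G}_n^T$ should decay essentially as $|\rho_1|^{-2n}$, so rescaling it by $|\rho_1|^{2n}$ should leave only a slowly varying factor $\alpha_{n,1}^2$.

```python
import numpy as np

g = np.array([1.0, -2.0])                    # assumed example: one NMP zero at rho_1 = 2
for n in [5, 10, 15, 20]:
    Gn = sum(np.diag(np.full(n - k, g[k]), -k) for k in range(len(g)))
    lam_min = np.linalg.eigvalsh(Gn @ Gn.T).min()
    # lam_min ~ alpha_{n,1}^2 * |rho_1|^{-2n}; the rescaled value varies only slowly with n.
    print(n, lam_min, lam_min * 2.0 ** (2 * n))
```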
Lemma 6 also precisely formulates the geometric idea outlined in Section 5.1. To see this, notice that no entropy gain is obtained if the output disturbance vector $z_n^1$ becomes orthogonal (with probability 1) to the space spanned by the first m rows of $\mathbf{Q}_n$ sufficiently fast as $n\to\infty$. Recalling from Assumption 2 that
$z_n^1 = [\boldsymbol{\Phi}]_1^n\, s_\kappa^1$,   (66)
where the matrix $\boldsymbol{\Phi}$ has κ orthonormal columns of infinite length, such an orthogonality condition can be formally stated, by defining
$\kappa_n \triangleq \mathrm{rank}\big([\mathbf{Q}_n]_1^m\,[\boldsymbol{\Phi}]_1^n\big)$,   (67)
$\hat{\kappa} \triangleq \limsup_{n\to\infty} \kappa_n$,   (68)
$\check{\kappa} \triangleq \liminf_{n\to\infty} \kappa_n$,   (69)
as the requirement that $\hat{\kappa} = 0$.
If this were the case, then the disturbance would not be able to fill the subspace along which $\mathbf{G}_n u_n^1$ is shrinking exponentially. Indeed, if $\kappa_n = 0$ for all n, then $h\big([\mathbf{D}_n]_1^m\mathbf{R}_n u_n^1 + [\mathbf{Q}_n]_1^m z_n^1\big) = h\big({}_1[\mathbf{D}_n]_m\,[\mathbf{R}_n]_1^m u_n^1\big) = \sum_{i=1}^{m}\log d_{n,i} + h\big([\mathbf{R}_n]_1^m u_n^1\big)$, and the latter sum cancels out the one on the RHS of (64), while $\lim_{n\to\infty}\frac{1}{n} h\big([\mathbf{R}_n]_1^m u_n^1\big) = 0$ since $u_1^\infty$ is entropy balanced. On the contrary (and loosely speaking), if the projection of the support of $z_n^1$ onto the subspace spanned by the first m rows of $\mathbf{Q}_n$ has dimension m (i.e., if $\kappa_n = m$) for all n, then $h\big([\mathbf{D}_n]_1^m\mathbf{R}_n u_n^1 + [\mathbf{Q}_n]_1^m z_n^1\big)$ remains bounded for all n, and the limit $-\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{m}\log d_{n,i}$ on the RHS of (64) yields the largest possible entropy gain. Notice that $-\sum_{i=1}^{m}\log d_{n,i} = \sum_{i=m+1}^{n}\log d_{n,i}$ (because $\det(\mathbf{G}_n) = 1$); thus, this entropy gain stems from the uncompensated expansion of $\mathbf{G}_n u_n^1$ along the space spanned by the rows of $[\mathbf{Q}_n]_{m+1}^n$. Beyond these extreme cases (i.e., for general values of $\check{\kappa}$ and $\hat{\kappa}$), the following theorem provides tight bounds on the entropy gain.
Theorem 6.
In the system of Figure 1, suppose that $u_1^\infty$ is entropy balanced, and that $G(z)$ and $z_1^\infty$ satisfy Assumptions 1 and 2, respectively, where the zeros $\{\rho_i\}_{i=1}^{p}$ of $G(z)$ satisfy $|\rho_1| \geq \cdots \geq |\rho_m| > 1 \geq |\rho_{m+1}| \geq \cdots \geq |\rho_p|$. For each $n \in \mathbb{N}$, let $\mathbf{Q}_n^T \in \mathbb{R}^{n\times n}$ be the unitary matrix holding the left singular vectors of $\mathbf{G}_n \in \mathbb{R}^{n\times n}$ (as in Lemma 6), where $\mathbf{G}_n$ is as defined in (4).
1.
Then,
$0 \leq \liminf_{n\to\infty} \frac{1}{n}\big[h(y_1^n) - h(u_1^n)\big] \leq \limsup_{n\to\infty} \frac{1}{n}\big[h(y_1^n) - h(u_1^n)\big] \leq \sum_{i=1}^{\hat{\kappa}} \log|\rho_i| \overset{(8)}{\leq} \frac{1}{2\pi}\int_{-\pi}^{\pi}\log\big|G(e^{j\omega})\big|\, d\omega$.   (70)
The bounds on both extremes are tight. Moreover, the lower bound is reached if $\hat{\kappa} = 0$.
2.
If $\liminf_{n\to\infty}\frac{1}{n}\log\big(\sigma_{\min}([\mathbf{Q}_n]_1^m[\boldsymbol{\Phi}]_1^n)\big) = 0$, then
$\sum_{i=m-\check{\kappa}+1}^{m}\log|\rho_i| \leq \liminf_{n\to\infty}\frac{1}{n}\big[h(y_1^n) - h(u_1^n)\big]$.   (71)
Thus, the rightmost upper bound in (70) is achieved if $\check{\kappa} = m$.
Proof. 
See Appendix B, page 37.    □
The next technical result is very useful for finding conditions under which the requirements of point 2 in Theorem 6 are satisfied (the proof is in Appendix B, page 39).
Lemma 8.
Let F be an FIR, LTI, causal system of order m such that all m zeros of $F(z)$ are NMP, and let $\mathbf{F}_n = \mathbf{Q}_n^T\mathbf{D}_n\mathbf{R}_n$ be an SVD of $\mathbf{F}_n$, for every $n \in \{m, m+1, \ldots\}$. For each $\kappa \in \{1, \ldots, n\}$, define
$\kappa_n \triangleq \mathrm{rank}\Big(\big([\mathbf{Q}_n]_1^m\big)_{1:\kappa}\Big)$,   (72)
and $\bar{\kappa} \triangleq \min\{m, \kappa\}$. Then,
$\lim_{n\to\infty} \sigma_{\min}\Big(\big([\mathbf{Q}_n]_1^m\big)_{1:\kappa}\Big) > 0$,   (73)
and $\lim_{n\to\infty} \kappa_n = \bar{\kappa}$.
Now, we can prove Theorem 4.

6.4. Proof of Theorem 4

Factorize $G(z)$ as $G(z) = F(z)\tilde{G}(z)$, where $\tilde{G}(z)$ is stable and minimum phase and $F(z)$ is a stable FIR transfer function containing all the m non-minimum-phase zeros of $G(z)$. Letting $\tilde{u}_n^1 \triangleq \tilde{\mathbf{G}}_n u_n^1$, we have that $h(y_n^1) = h(\mathbf{F}_n\tilde{u}_n^1 + z_n^1)$, $h(\tilde{u}_n^1) = h(u_n^1)$, and that $\tilde{u}_1^\infty$ is entropy balanced (from Lemma 4). Thus,
$h(y_n^1) - h(u_n^1) = h(\mathbf{G}_n u_n^1 + z_n^1) - h(u_n^1) = h(\mathbf{F}_n\tilde{u}_n^1 + z_n^1) - h(\tilde{u}_n^1)$.   (74)
This means that the entropy gain of G due to the output disturbance $z_1^\infty$ corresponds to the entropy gain of F due to the same output disturbance.
Clearly, $u_1^\infty$, $F(z)$, and $z_1^\infty$ satisfy the assumptions of Theorem 6 with $\boldsymbol{\Phi} = [\mathbf{I}_\kappa\,|\,\mathbf{0}]^T$ (see Assumption 2). Therefore,
$[\mathbf{Q}_n]_1^m\,[\boldsymbol{\Phi}]_1^n = \big([\mathbf{Q}_n]_1^m\big)_{1:\kappa}$.   (75)
Combining this with Lemma 8, it readily follows that, for every $\kappa \geq 1$, the condition in point 2 of Theorem 6 is met, and also that $\lim_{n\to\infty}\kappa_n = \bar{\kappa}$. The proof is then completed by substituting $\liminf_{n\to\infty}\kappa_n = \limsup_{n\to\infty}\kappa_n = \bar{\kappa}$ into (70) and (71).

7. Entropy Gain Due to a Random Initial State

Here, we analyze the scenario illustrated in Figure 1 for the case in which there exists a random initial state $\mathbf{x}_0$ independent of the input $u_1^\infty$, and a zero (or deterministic) output disturbance.
The treatment of an initial state of the LTI system G requires one to first define an internal model for it. For this purpose, in this section, we consider the state-space realization of G in the Kalman canonical form, given by
$\mathbf{x}(k) \triangleq \begin{bmatrix} \mathbf{x}_{co}(k) \\ \mathbf{x}_{\bar{c}o}(k) \\ \mathbf{x}_{c\bar{o}}(k) \\ \mathbf{x}_{\bar{c}\bar{o}}(k) \end{bmatrix} = \begin{bmatrix} \mathbf{A}_{co} & \mathbf{A}_{12} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{A}_{\bar{c}o} & \mathbf{0} & \mathbf{0} \\ \mathbf{A}_{31} & \mathbf{A}_{32} & \mathbf{A}_{c\bar{o}} & \mathbf{A}_{34} \\ \mathbf{0} & \mathbf{A}_{42} & \mathbf{0} & \mathbf{A}_{\bar{c}\bar{o}} \end{bmatrix}\begin{bmatrix} \mathbf{x}_{co}(k-1) \\ \mathbf{x}_{\bar{c}o}(k-1) \\ \mathbf{x}_{c\bar{o}}(k-1) \\ \mathbf{x}_{\bar{c}\bar{o}}(k-1) \end{bmatrix} + \begin{bmatrix} \mathbf{b}_{co} \\ \mathbf{0} \\ \mathbf{b}_{c\bar{o}} \\ \mathbf{0} \end{bmatrix} u(k)$,   (76)
$y(k) = \begin{bmatrix} \mathbf{c}_{co}^T & \mathbf{c}_{\bar{c}o}^T & \mathbf{0} & \mathbf{0} \end{bmatrix}\mathbf{x}(k-1) + u(k)$,   (77)
(see, e.g., Reference [21] or Reference [22], Chapter 6), where the column state vectors $\mathbf{x}_{co}(k)$, $\mathbf{x}_{\bar{c}o}(k)$, $\mathbf{x}_{c\bar{o}}(k)$, $\mathbf{x}_{\bar{c}\bar{o}}(k)$ are, respectively, controllable and observable, non-controllable and observable, controllable and non-observable, and non-controllable and non-observable. There is no loss of generality in choosing this state-space representation, because every state-space representation consistent with a rational transfer function $G(z)$ can be written in this form (Reference [22], Theorem 6.7).
Since our interest is in the effect of the random initial state of G on its output, we only need to consider the observable subsystem within (76), with its input removed, given by
x_o(k) ≜ [ x_co(k) ; x_c̄o(k) ] = A_o x_o(k−1),  A_o ≜ [ A_co  A_12 ; 0  A_c̄o ],
ỹ(k) = c_o^T x_o(k−1),  c_o^T ≜ [ c_co^T  c_c̄o^T ],
where y ˜ is the natural response of G to its initial state x o ( 0 ) and x c o R p and x c ¯ o R q . We shall decompose y ˜ as
ỹ_1^∞ = [ỹ_c̄o]_1^∞ + [ỹ_co]_1^∞,
where y ˜ c ¯ o and y ˜ c o are the natural responses of G to initial states [ 0 1 × p x c ¯ o ( 0 ) T ] T and [ x c o ( 0 ) T 0 1 × q ] T , respectively. The natural response component y ˜ c o can be generated by the following minimal state-space representation of G ( z ) , without the effect of its input u:
x_co(k) ≜ [ x_co,1(k) ; x_co,2(k) ; x_co,3(k) ; ⋯ ; x_co,p(k) ] = A_co x_co(k−1),
A_co ≜ [ b_1  b_2  b_3  ⋯  b_p ;
         1    0    0    ⋯  0   ;
         0    1    0    ⋯  0   ;
         ⋮              ⋱      ;
         0    0    ⋯    1  0   ],   (79)
ỹ_co(k) = a^T x_co(k−1) + p^T x_co(k−1),  a^T ≜ [ a_1  a_2  ⋯  a_p ],  p^T ≜ [ b_1  b_2  ⋯  b_p ].   (80)
Now, we can state and prove the main result of this section:
Theorem 7.
Suppose G satisfies Assumption 1 and u 1 is entropy balanced. Assume that x o ( 0 ) (the observable part of the initial state of G) is independent of the input u 1 , | h ( x o ( 0 ) ) | < and that tr { K x o ( 0 ) } < . Then,
lim_{n→∞} (1/n)( h(y_1^n) − h(u_1^n) ) = ∑_{i=1}^{m} log|ρ_i|.
Proof. 
Both G and u 1 satisfy the conditions of Theorem 6. Thus, as in its statement, we write G ( z ) = F ( z ) G ˜ ( z ) , where G ˜ ( z ) is stable and minimum phase and F ( z ) is a stable FIR transfer function with only the m non-minimum-phase zeros of G ( z ) .
Defining w n 1 G ˜ n u n 1 , we have
y n 1 = F n G ˜ n u n 1 + y ˜ n 1 = F n w n 1 + y ˜ n 1 ,
h ( w n 1 ) = h ( u n 1 ) ,
and ỹ_1^n ⊥ w_1^n. In addition, the fact that G is stable guarantees that the sample second moment of ỹ_1^∞ decays exponentially, which means that ỹ_1^∞ satisfies Assumption 2. Thus, the conditions of Lemma 6 are met considering G_n = F_n, where now F_n = Q_n^T D_n R_n is the SVD of F_n, with d_{n,1} ≤ d_{n,2} ≤ ⋯ ≤ d_{n,n}. Consequently, the proof would be completed if we can show that lim_{n→∞} (1/n) h( [D_n]_1^m R_n w_1^n + [Q_n]_1^m ỹ_1^n ) = 0. But all the involved variables have bounded variance, while R_n is unitary, [Q_n]_1^m has orthonormal rows, and the entries of [D_n]_1^m decay exponentially with n. This implies that lim_{n→∞} (1/n) h( [D_n]_1^m R_n w_1^n + [Q_n]_1^m ỹ_1^n ) ≤ 0. Therefore, it only remains to prove that
lim_{n→∞} (1/n) h( [D_n]_1^m R_n w_1^n + [Q_n]_1^m ỹ_1^n ) ≥ 0.   (84)
Recalling (78), let us decompose [ y ˜ c o ] n 1 so that
y ˜ n 1 = F n P ˜ n x c o ( 0 ) + P n x c o ( 0 ) + [ y ˜ c ¯ o ] n 1 ,
where P ˜ n , P n R n × ( p + q ) , the sequences F n P ˜ n x c o ( 0 ) and P n x c o ( 0 ) , respectively, are the natural responses of G ˜ and F to the controllable and observable initial state x c o , and [ y ˜ c ¯ o ] n 1 is the natural response of G to the non-controllable and observable initial state x c ¯ o ( 0 ) . Then,
h( [D_n]_1^m R_n w_1^n + [Q_n]_1^m ỹ_1^n ) ≥(a) h( [Q_n]_1^m ỹ_1^n ) =(85) h( [Q_n]_1^m ( F_n P̃_n x_co(0) + P_n x_co(0) + [ỹ_c̄o]_1^n ) )
≥(b) h( [Q_n]_1^m ( F_n P̃_n x_co(0) + P_n x_co(0) ) | x_c̄o(0) ) = h( [Q_n]_1^m ( F_n P̃_n + P_n ) x_co(0) | x_c̄o(0) ),
where ( a ) is from the entropy-power inequality [1] and ( b ) holds because conditioning does not increase entropy and [ y ˜ c ¯ o ] n 1 is a deterministic function of x c ¯ o ( 0 ) . Let the SVD of [ Q n ] m 1 ( F n P ˜ n + P n ) be
[ Q n ] m 1 ( F n P ˜ n + P n ) = S n T n H n , n = m , m + 1 , ,
where S n R m × m is unitary, T n = diag { t 1 , t 2 , , t m } holds the singular values of [ Q n ] m 1 ( F n P ˜ n + P n ) and H n R m × p has orthonormal rows. Substituting this SVD into (87) we obtain
h ( [ D n ] m 1 R n w n 1 + [ Q n ] m 1 y ˜ n 1 ) h ( S n T n H n x c o ( 0 ) | x c ¯ o ( 0 ) ) = log ( det ( T n ) ) + h ( H n x c o ( 0 ) | x c ¯ o ( 0 ) ) .
This last differential entropy is bounded because | h ( x o ) | < and tr { K x o } < , which implies (thanks to Proposition A1) that | h ( H n x c o , x c ¯ o ) | < , and by the chain rule of entropy,
h ( H n x c o , x c ¯ o ) = h ( x c ¯ o ) + h ( H n x c o ( 0 ) | x c ¯ o ( 0 ) ) ,
so | h ( H n x c o ( 0 ) | x c ¯ o ( 0 ) ) | < because | h ( x c ¯ o ( 0 ) ) | < (again from Proposition A1). Thus, in view of (89) and (84), all that remains to prove is that
lim n σ min [ Q n ] m 1 F n P ˜ n + P n > 0 ,
For that purpose, notice that [ Q n ] m 1 F n P ˜ n + P n = 1 [ D n ] m [ R n ] m 1 P ˜ n + [ Q n ] m 1 P n . Therefore, from Lemma A4 (in Appendix C), it follows that (91) holds if
lim n σ min [ Q n ] m 1 P n > 0 ,
and
lim n σ max 1 [ D n ] m [ R n ] m 1 P ˜ n = 0 .
To prove (93), recall that the entries in the diagonal matrix 1 [ D n ] m decay exponentially with n. On the other hand, the rows of [ R n ] m 1 are orthonormal. Finally, the fact that G ˜ is stable implies that the p + q columns of P ˜ n have norms which are bounded for all n. These three observations readily yield that (93) holds.
To prove that (92) holds, write the rational transfer function of G (described by (80)) as
G(z) = (1 + a_1 z^{−1} + ⋯ + a_p z^{−p}) / (1 + b_1 z^{−1} + ⋯ + b_p z^{−p}) = F(z) · G̃(z),
with F(z) ≜ 1 + f_1 z^{−1} + ⋯ + f_m z^{−m} and G̃(z) ≜ (1 + ã_1 z^{−1} + ⋯ + ã_m̃ z^{−m̃}) / (1 + b_1 z^{−1} + ⋯ + b_p z^{−p}),   (94)
where m̃ ≜ p − m. The coefficients in the numerator of G(z) are related to those of F(z) and G̃(z) by the convolution
a_i = ∑_{j=0}^{m} f_j ã_{i−j},  i = 1, …, p,   (95)
where a ˜ 0 = f 0 = 1 .
Denote the natural response of F (up to time n) to its initial state x F ( 0 ) (which is a linear function of x c o ( 0 ) ) as
y ¨ n 1 P n x c o ( 0 ) .
Let w ˜ n 1 P ˜ n x c o ( 0 ) be the natural response of G ˜ to its initial state x c o ( 0 ) . Following the structure of (80), w ˜ ( k ) can be written as
w ˜ ( k ) = [ a ˜ 1 a ˜ m ˜ 0 0 ] x c o ( k 1 ) + p T x c o ( k 1 ) , k = 1 , 2 , ,
where x c o satisfies (79). Considering the following minimal state-space representation of F
x_F(k) ≜ [ x_F,1(k) ; x_F,2(k) ; x_F,3(k) ; ⋯ ; x_F,m(k) ] = A_F x_F(k−1) + [ 1 ; 0 ; ⋮ ; 0 ] w(k),
A_F ≜ [ 0  0  ⋯  0  0 ;
        1  0  ⋯  0  0 ;
        0  1  ⋯  0  0 ;
        ⋮           ⋱  ;
        0  0  ⋯  1  0 ],
ỹ_co(k) = c_F^T x_F(k−1) + w̃(k),  c_F^T ≜ [ f_1  f_2  ⋯  f_m ],
it can be seen that the natural response of F to its own initial state x F ( 0 ) can be written as
y¨(k) = ỹ_co(k) − w̃(k) − (the effect of f_1, …, f_{k−1}).
But, from (80) and (95),
ỹ_co(k) = ã^T diag{ [ f_1 ⋯ f_m ] }_{m̃+1} x_co(k−1) + [ ã_1 ⋯ ã_m̃ 0 ⋯ 0 ] x_co(k−1) + p^T x_co(k−1), in which the last two terms add up to w̃(k),
where ã ≜ [ 1  ã_1  ⋯  ã_m̃ ], and
therefore,
y ¨ ( 1 ) = a ˜ T diag { [ f 1 f 2 f m ] } m ˜ + 1 x c o ( 0 ) ,
y ¨ ( 2 ) = a ˜ T diag { [ 0 f 2 f m ] } m ˜ + 1 A c o x c o ( 0 ) ,
= a ˜ T diag { [ f 2 f m 0 ] } m ˜ + 1 x c o ( 0 )
y ¨ ( m ) = a ˜ T diag { [ 0 0 f m ] } m ˜ + 1 A c o m 1 x c o ,
= a ˜ T diag { [ f m 0 0 ] } m ˜ + 1 x c o ( 0 ) ,
with y ¨ ( k ) = 0 for k > m . Therefore,
ÿ_1^m = E x_co(0),  E ≜ [ M | N ],
where M ∈ R^{m×(p−m)} and N ∈ R^{m×m} is a lower anti-triangular Toeplitz matrix with ã_m̃ f_m along its main anti-diagonal.
This implies that P n = [ E T | 0 p × ( n m ) ] T and
σ min ( E ) > 0 .
Thus, resuming the reasoning before (94), we have that
[Q_n]_1^m P_n = [[Q_n]_1^m]_1^m E.
It then follows from (110) and Lemma 8 that
lim_{n→∞} σ_min( [Q_n]_1^m P_n ) = lim_{n→∞} σ_min( [[Q_n]_1^m]_1^m E ) > 0.
Hence, (91) is satisfied. Substituting (91) into (89) and the latter into (84) yields
lim_{n→∞} (1/n) h( [D_n]_1^m R_n w_1^n + [Q_n]_1^m ỹ_1^n ) = 0.
The proof is completed by invoking Lemma 6.   □
Theorem 7 allows us to formalize the effect that the presence or absence of a random initial state has on the entropy gain using arguments similar to those utilized in Section 6.
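Theorem 7 can also be illustrated numerically in the Gaussian case. The sketch below is our own (the first-order FIR filter, the unit-variance Gaussian initial state, and the i.i.d. N(0,1) input are illustrative assumptions, not part of the theorem statement); it evaluates the per-sample entropy gain via log-determinants and contrasts an NMP filter with its minimum-phase counterpart:

```python
import numpy as np
from scipy.linalg import toeplitz

def gain_with_initial_state(rho, n, var_x0=1.0):
    """(1/n)[h(y_1^n) - h(u_1^n)] for y = F_n u + natural response, F(z) = 1 - rho z^{-1},
    u i.i.d. N(0,1), and a Gaussian initial state x_0 ~ N(0, var_x0) independent of u.
    For this first-order F the natural response to x_0 is (-rho*x_0, 0, 0, ...)."""
    col = np.zeros(n); col[0], col[1] = 1.0, -rho
    Fn = toeplitz(col, np.zeros(n))
    Ky = Fn @ Fn.T
    Ky[0, 0] += (rho ** 2) * var_x0
    _, logdet = np.linalg.slogdet(Ky)
    return 0.5 * logdet / n                     # log det K_u = 0

for n in (50, 200, 800):
    print(n, gain_with_initial_state(2.0, n), gain_with_initial_state(0.5, n))
# NMP case (rho = 2): gain tends to log 2 = 0.693; MP case (rho = 0.5): gain tends to 0.
```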

8. Some Implications

The purpose of this section is to illustrate how the results obtained in the previous section can be applied to other problems. To do so, we present next some of the implications of these results on three different problems previously addressed in the literature, namely finding the rate-distortion function for non-stationary processes, an inequality in networked control theory, and the feedback capacity of Gaussian stationary channels. The common feature in these three problems is that, in all of them, non-minimum phase transfer functions play a role (either explicitly or implicitly).

8.1. Networked Control

The analysis developed in Reference [13] considers an LTI system P within a noisy feedback loop, as the one depicted in Figure 4. In this scheme, C represents a causal feedback channel which combines the output of P with an exogenous (noise) random process c 1 to generate its output. The process c 1 is assumed independent of the initial state of P, represented by the random vector x 0 , which has finite differential entropy.
For this system, it is shown in Reference [13], Theorem 4.2, that
h̄(y_1^∞) ≥ h̄(u_1^∞) + lim_{n→∞} (1/n) I(x_0; y_1^n),   (114a)
where I ( x 0 ; y 1 n ) is the mutual information (see Reference [1], Section 8.5) between x 0 and y 1 n , with equality if w is a deterministic function of v. Furthermore, it is shown in Reference [12], Lemma 3.2, that, if | h ( x 0 ) | < and the steady state variance of system P remains asymptotically bounded as k , then
lim_{n→∞} (1/n) I(x_0; y_1^n) ≥ ∑_{p_i : |p_i|>1} log|p_i|,   (114b)
where { p i } are the poles of P. Thus, for the (simplest) case in which w = v , the output y 1 is the result of filtering u 1 by a filter G = 1 1 P (as shown in Figure 4 right), and the resulting entropy rate of y 1 will exceed that of u 1 only if there is a random initial state with bounded differential entropy (see (114a)). Moreover, if w = v and G ( z ) is stable, (114) (as well as Reference [13], Lemma 4.3) implies that this entropy gain is lower bounded by the right-hand side (RHS) of (8), which is greater than zero if and only if G is NMP. However, both [12,13] do not provide conditions under which this lower bound is reached.
In Reference [14], Theorem 14, it is shown that, when there is perfect feedback (i.e., when v = w), as in Figure 4 (right), with P being the concatenation of a stabilizing LTI controller and an LTI plant, and assuming that u_1^∞ is i.i.d. Gaussian and that the initial state is Gaussian, then
h̄(y_1^∞) − h̄(u_1^∞) = ∑_{p_i : |p_i|>1} log|p_i|.   (115)
Notice that this implies reaching equality in both (114a) and (114b).
By using the results obtained in Section 7 we show next that equality holds in (114b) provided the feedback channel satisfies the following assumption:
Assumption 3.
The feedback channel in Figure 4 can be written as
w = A B v + B F ( c ) ,
where:
1.
A and B are stable rational transfer functions such that A B is biproper, A B P has the same unstable poles as P, and the feedback A B stabilizes the plant P.
2.
F is any (possibly non-linear) operator such that c ˜ F ( c ) has finite variance σ c ˜ ( n ) 2 for finite n, lim n n 1 log ( σ c ˜ ( n ) 2 ) = 0 , and
3.
c 1 x 0 .
We also extend Reference [14], Theorem 14, to situations including a feedback channel satisfying Assumption 3. For the perfect-feedback case, this extends the validity of (115) to a much larger class of distributions for u 1 .
An illustration of the class of feedback channels satisfying this assumption is depicted at the top of Figure 5. Trivial examples of channels satisfying Assumption 3 are a Gaussian additive channel preceded and followed by linear operators [23]. Indeed, when F is an LTI system with a strictly causal transfer function, the feedback channel satisfying Assumption 3 is widely known as a noise shaper with input pre- and post-filters, used in, e.g., References [24,25,26,27].
Theorem 8.
In the networked control system of Figure 4, suppose that the feedback channel satisfies Assumption 3, that the plant P ( z ) has poles { p i } i p , and that the input u 1 is entropy balanced. If the random initial states of A B and P, namely s 0 R q and x 0 R p , respectively, are independent, have finite variance and | h ( x 0 ) | < , then
lim_{n→∞} (1/n) I(x_0; y_1^n) = ∑_{|p_i|>1} log|p_i|.   (117a)
Moreover,
lim_{n→∞} (1/n)( h(y_1^n) − h(ũ_1^n) ) = ∑_{|p_i|>1} log|p_i|,   (117b)
where ũ ≜ u + B c̃ (see Figure 5, bottom).
Proof. 
Let P ( z ) = N ( z ) / G ( z ) and T ( z ) A ( z ) B ( z ) = Γ ( z ) / Θ ( z ) . We will first show that the output y n 1 can be written as
y n 1 = G n G ˜ n u ˜ n 1 + G n P ˜ n [ x 0 T s 0 T ] T + P n x 0 ,
where G ˜ is the stable LTI system with biproper and MP transfer function
G̃(z) ≜ Θ(z) / ( Θ(z) G(z) + N(z) Γ(z) ),
with s 0 R q , x 0 R p and [ x 0 T s 0 T ] T being the random initial states of T, G, and G ˜ , respectively, and
u ˜ u + B c ˜
(see Figure 5, bottom). The matrices P̃_n ∈ R^{n×p} and P_n ∈ R^{n×(p+q)}. From Figure 5, it is clear that the transfer function from ũ to y is G(z)Θ(z) / (Θ(z)G(z) + N(z)Γ(z)), validating the first term on the RHS of (118). In addition, it is evident that the initial state of G̃ is a linear combination of x_0 and s_0, justifying the term P̃_n [x_0^T s_0^T]^T as the natural response of G̃. Thus, it is only left to prove that the initial state of G is x_0. For that purpose, let G(z) = 1 − ∑_{i=1}^{p} g_i z^{−i} and N(z) = ∑_{i=1}^{p} n_i z^{−i}. Define the following variables:
o ≜ (1/G) y,  w ≜ N o.
Then, the recursion corresponding to P ( z ) is
o k = i = 1 p g i o k i + y k , k 1 ,
w k = i = 1 p n i o k i , k 1 .
This reveals that the initial state of P ( z ) corresponds to
x_0 = [ o_{1−p}  o_{2−p}  ⋯  o_0 ].
But, from (121), o is also the output of G ˜ to the input u ˜ , and
y_k = o_k − ∑_{i=1}^{p} g_i o_{k−i},  k ≥ 1,
which means that the initial state of G is x 0 .
Now, using (118), we have that
I ( x 0 ; y n 1 ) = h ( y n 1 ) h ( y n 1 | x 0 ) ,
= h ( y n 1 ) h ( F n [ G ˜ n u ˜ n 1 + P ˜ n s 0 ] ) ,
= h ( F n u ˜ n 1 + P n x 0 ) h ( u ˜ n 1 ) ,
where the first equality is because s_0 ⊥ x_0 and u¯_1^n ≜ G̃_n ũ_1^n + P̃_n s_0. The last equality holds since the first sample of the unit-impulse response of G is 1. Since u_1^∞ is entropy balanced, G̃(z) is biproper, stable, and MP, and both c̃_1^∞ and P̃_n s_0 have finite variance, it follows from Lemmas 3 and 4 that u¯_1^∞ is entropy balanced, as well. Thus, the proof of the first claim is completed by direct application of Theorem 7.
For the second claim,
h ( y n 1 ) h ( u ˜ n 1 ) = ( a ) h ( y n 1 ) h ( G ˜ n u ˜ n 1 ) = h ( y n 1 ) h ( u ¯ n 1 ) + ( h ( G ˜ n u ˜ n 1 ) h ( u ˜ n 1 ) ) ,
where ( a ) holds because the first sample of the unit-impulse response of G ˜ is g ˜ 0 = lim z G ˜ ( z ) = 1 . Then,
lim n 1 n ( h ( y n 1 ) h ( u ˜ n 1 ) ) = lim n 1 n ( h ( y n 1 ) h ( u ¯ n 1 ) ) + lim n 1 n ( h ( G ˜ n u ˜ n 1 ) h ( u ˜ n 1 ) ) ,
= ( a ) lim n 1 n ( h ( y n 1 ) h ( u ¯ n 1 ) ) ,
= ( b ) p i > 1 log p i ,
where ( a ) holds because G ˜ u ˜ is entropy balanced (from Lemma 4), and P ˜ n s 0 has finite variance, allowing us to apply Proposition A3. In turn, ( b ) follows from (128) and (117a). This completes the proof.   □
Remark 2.
If A(z) has poles outside the unit circle, then Theorem 8 can still be applied by associating those poles with P.
Remark 3.
Under the conditions of Theorem 8, one has that if either h̄(u_1^∞) or h̄(c̃_1^∞) exists, then the other entropy rate exists too. In that case, if c ⊥ u and defining c̄ ≜ B c̃, (117) yields
h̄(y_1^∞) − h̄(u_1^∞) − h̄(c̄_1^∞) = lim_{n→∞} (1/n) I(x_0; y_1^n) = ∑_{|p_i|>1} log|p_i|,
revealing that the gap in (114a) is exactly h ¯ ( c ¯ 1 ) . In addition, in the perfect-feedback scenario, Theorem 8 extends the validity of (115) from the Gaussian i.i.d. u and Gaussian x 0 considered in Reference [14], Theorem 14, to an entropy-balanced u and an x 0 with finite variance and finite differential entropy.
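As a sanity check of the quantity appearing in (115) and (117), note that in the perfect-feedback case the NMP zeros of G = 1/(1 − P) are precisely the unstable poles of P, so ∑_{|p_i|>1} log|p_i| can also be computed as (1/(2π)) ∫ log|G(e^{jω})| dω. The following minimal numerical sketch is our own; the first-order plant is an arbitrary choice for which G happens to be stable:

```python
import numpy as np

# Illustrative plant P(z) = -1.5 z^{-1} / (1 - 2 z^{-1}): one unstable pole at p = 2.
# Then G(z) = 1/(1 - P(z)) = (1 - 2 z^{-1})/(1 - 0.5 z^{-1}) is stable, biproper (g_0 = 1),
# and has an NMP zero exactly at the unstable pole p = 2.
p = 2.0
w = np.linspace(-np.pi, np.pi, 1_000_000, endpoint=False)
z = np.exp(1j * w)
P = -1.5 * z**-1 / (1 - 2.0 * z**-1)
G = 1.0 / (1.0 - P)

bode_integral = np.mean(np.log(np.abs(G)))   # (1/2pi) * integral of log|G(e^{jw})| dw
print(bode_integral, np.log(p))              # both approximately 0.6931
```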

8.2. Rate Distortion Function for Non-Stationary Processes

In this section, we obtain a simpler proof of a result by Gray, Hashimoto and Arimoto [15,16,17], which compares the rate distortion function (RDF) of a non-stationary auto-regressive Gaussian process x 1 (of a certain class to be defined shortly) to that of a corresponding stationary version, under MSE distortion. Our proof is based upon the ideas developed in the previous sections, and extends the class of non-stationary sources for which the results in References [15,16,17] are valid.
To be more precise, let { a i } i = 1 and { a ˜ i } i = 1 be, respectively, the impulse responses of two linear time-invariant filters A and A ˜ with rational transfer functions
A(z) = z^M / ∏_{i=1}^{M} (z − p_i),
Ã(z) = z^M / ( ∏_{i=1}^{M} |p_i^*| (z − 1/p_i^*) ),
where |p_i| > 1, i = 1, …, M. From these definitions, it is clear that A(z) is unstable, Ã(z) is stable, and
| A ( e j ω ) | = | A ˜ ( e j ω ) | , ω [ π , π ] .
Notice also that lim_{z→∞} A(z) = 1 and lim_{z→∞} Ã(z) = 1 / ∏_{i=1}^{M} |p_i|; thus,
a_0 = 1,  ã_0 = ∏_{i=1}^{M} |p_i|^{−1}.
Consider the non-stationary random sequence (source) x 1 and the asymptotically stationary source x ˜ 1 generated by passing a stationary Gaussian process w 1 through A ( z ) and A ˜ ( z ) , respectively, which can be written as
x n 1 = A n w 1 n , n = 1 , ,
x ˜ n 1 = A ˜ n w 1 n , n = 1 , .
(A block-diagram associated with the construction of x is presented in Figure 6.)
Define the rate-distortion functions for these two sources as
R x ( D ) lim n R x , n ( D ) , R x , n ( D ) min 1 n I ( x 1 n ; x 1 n + u 1 n ) ,
R x ˜ ( D ) lim n R x ˜ , n ( D ) , R x ˜ , n ( D ) min 1 n I ( x ˜ 1 n ; x ˜ 1 n + u ˜ 1 n ) ,
where, for each n, the minima are taken over all the conditional probability density functions f u 1 n | x 1 n and f u ˜ 1 n | x ˜ 1 n yielding E [ u n 1 2 ] / n D and E [ u ˜ n 1 2 ] / n D , respectively.
The above rate-distortion functions have been characterized in References [15,16,17] for the case in which w 1 is an i.i.d. Gaussian process. In particular, it is explicitly stated in References [16,17] that, for that case,
R_x(D) − R_x̃(D) = (1/(2π)) ∫_{−π}^{π} log|A^{−1}(e^{jω})| dω = ∑_{i=1}^{M} log|p_i|.   (142)
We will next provide an alternative and simpler proof of this result, and extend its validity to general (not necessarily stationary) Gaussian w_1^∞, using the entropy-gain properties of non-minimum-phase filters established in Section 6. Indeed, the approach in References [15,16,17] is based upon asymptotically equivalent Toeplitz matrices for the signals' covariance matrices. This restricts w_1^∞ to be Gaussian and i.i.d. and A(z) to be an all-pole unstable transfer function, so that the only non-stationarity allowed is that arising from unstable poles. For instance, a cyclo-stationary innovation followed by an unstable filter A(z) would yield a source which cannot be treated using Gray and Hashimoto's approach. By contrast, the reasoning behind our proof lets w_1^∞ be any entropy-balanced Gaussian process with bounded differential entropy rate, and lets the source be A w, with A(z) having unstable poles (and possibly zeros and stable poles, as well).
The statement is as follows:
Theorem 9.
Let w 1 be any Gaussian entropy-balanced process with bounded differential entropy rate, and let x 1 and x ˜ 1 be as defined in (138) and (139), respectively. Then, (142) holds.
Thanks to the ideas developed in the previous sections, it is possible to give an intuitive outline of the proof of this theorem (given in Appendix B, page 40) by using a sequence of block diagrams. More precisely, consider the diagrams shown in Figure 7.
In the top diagram in this figure, suppose that y = Cx + u realizes the RDF for the non-stationary source x. The sequence u is independent of x, and the linear filter C(z) is such that the error (y − x) ⊥ y (a necessary condition for minimum-MSE optimality). The filter B(z) is the Blaschke product of A(z) (see (A83) in Appendix B): a stable, NMP filter with unit frequency-response magnitude such that x̃ = Bx.
If one moves the filter B ( z ) towards the source, then the middle diagram in Figure 7 is obtained. By doing this, the stationary source x ˜ appears with an additive error signal u ˜ that has the same asymptotic variance as u, reconstructed as y ˜ = C x ˜ + u ˜ . From the invertibility of B ( z ) , it also follows that the mutual information rate between x ˜ and y ˜ equals that between x and y. Thus, the channel y ˜ = C x ˜ + u ˜ has the same rate and distortion as the channel y = C x + u .
However, if one now adds a short disturbance d to the error signal ũ (as depicted in the bottom diagram of Figure 7), then the resulting additive error term ū = ũ + d will be independent of x̃ and will have the same asymptotic variance as ũ. Nonetheless, the differential entropy rate of ū will exceed that of ũ by the RHS of (142). This makes the mutual information rate between x̃ and ȳ smaller than that between x̃ and ỹ by the same amount. Hence, R_x̃(D) is at most R_x(D) − ∑_{i=1}^{M} log|p_i|. A similar reasoning can be followed to prove that R_x(D) − R_x̃(D) ≤ ∑_{i=1}^{M} log|p_i|.
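A numerical illustration of (142) (our own sketch, not part of the proof) can be obtained by reverse water-filling on the eigenvalues of the covariance matrices of x_1^n and x̃_1^n for moderate n. It assumes a single unstable pole p = 2, unit-variance i.i.d. Gaussian innovations, and two arbitrary distortion levels:

```python
import numpy as np
from scipy.linalg import toeplitz

p = 2.0                                              # single unstable pole, M = 1

def source_cov(n, stable):
    """Covariance of x_1^n = A_n w_1^n (or x~_1^n = A~_n w_1^n) for i.i.d. N(0,1) innovations."""
    k = np.arange(n)
    h = (1.0 / p) * (1.0 / p) ** k if stable else p ** k   # impulse response of A~ / of A
    An = toeplitz(h, np.zeros(n))
    return An @ An.T

def rdf(eigs, D):
    """Reverse water-filling RDF (nats/sample) of a Gaussian vector with eigenvalues eigs."""
    lo, hi = 0.0, float(np.max(eigs))
    for _ in range(200):                             # bisection on the water level theta
        theta = 0.5 * (lo + hi)
        if np.mean(np.minimum(theta, eigs)) > D:
            hi = theta
        else:
            lo = theta
    return float(np.mean(0.5 * np.maximum(0.0, np.log(eigs / theta))))

for n in (10, 15, 20):
    lx = np.linalg.eigvalsh(source_cov(n, stable=False))
    lxt = np.linalg.eigvalsh(source_cov(n, stable=True))
    for D in (0.05, 0.3):
        print(n, D, rdf(lx, D) - rdf(lxt, D))
# For D = 0.05 no eigenvalue is clipped and the gap equals log 2 exactly;
# for D = 0.3 it approaches log 2 = 0.693 as n grows.
```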

8.3. The Feedback Channel Capacity of (Non-White) Gaussian Channels

Consider a non-white additive Gaussian channel of the form
y k = x k + z k ,
where the input x is subject to the power constraint
lim_{n→∞} (1/n) E[ ‖x_1^n‖² ] ≤ P,
and z 1 is a stationary Gaussian process.
The feedback information capacity of this channel is realized by a Gaussian input x, and is given by
C_FB = lim_{n→∞} max_{ K_{x_1^n} : (1/n) tr{K_{x_1^n}} ≤ P } (1/n) I(x_1^n; y_1^n),
where K x n 1 is the covariance matrix of x n 1 , and, for every k N , the input x k is allowed to depend upon the channel outputs y 1 k 1 (since there exists a causal, noise-less feedback channel with one-step delay).
In Reference [9], it was shown that, if z is an auto-regressive moving-average process of M-th order, then C_FB can be achieved by the scheme shown in Figure 8. In this system, B is a strictly causal and stable finite-order filter, and v_1^∞ is Gaussian with v_k = 0 for all k > M and such that v_1^M has a positive-definite covariance matrix K_{v_1^M}.
Here, we use the ideas developed in Section 6 to show that the information rate achieved by the capacity-achieving scheme proposed in Reference [9] drops to zero if there exists any additive disturbance of length at least M and finite differential entropy affecting the output, no matter how small.
To see this, notice that, in this case, and for all n > M ,
I(x_1^n; y_1^n) = I(v_1^M; y_1^n) = h(y_1^n) − h(y_1^n | v_1^n)
= h(y_1^n) − h( (I_n + B_n) z_1^n + v_1^n | v_1^M )
= h(y_1^n) − h( (I_n + B_n) z_1^n | v_1^M )
= h(y_1^n) − h( (I_n + B_n) z_1^n ) = h(y_1^n) − h(z_1^n)
= h( (I_n + B_n) z_1^n + v_1^n ) − h(z_1^n),
since det(I_n + B_n) = 1. From Theorem 4, this gap between differential entropies is precisely the entropy gain introduced by I_n + B_n to an input z_1^n when the output is affected by the disturbance v_1^M. Thus, from Theorem 4, the capacity of this scheme corresponds to (1/(2π)) ∫_{−π}^{π} log|1 + B(e^{jω})| dω = ∑_{|ρ_i|>1} log|ρ_i|, where {ρ_i}_{i=1}^M are the zeros of 1 + B(z), which is precisely the result stated in Reference [9], Theorem 4.1.
However, if the output is now affected by an additive disturbance d_1^∞ not passing through B(z), such that d_k = 0 for all k > M and |h(d_1^M)| < ∞, with d_1^∞ ⊥ (v_1^M, z_1^∞), then we will have
y n 1 = v n 1 + ( I n + B n ) z n 1 + d n 1 .
In this case,
I(x_1^n; y_1^n) = I(v_1^M; y_1^n) = h(y_1^n) − h(y_1^n | v_1^n)
= h(y_1^n) − h( (I_n + B_n) z_1^n + v_1^n + d_1^n | v_1^M )
= h(y_1^n) − h( (I_n + B_n) z_1^n + d_1^n | v_1^M )
= h(y_1^n) − h( (I_n + B_n) z_1^n + d_1^n ).
But lim_{n→∞} (1/n)( h( (I_n + B_n) z_1^n + v_1^n + d_1^n ) − h( (I_n + B_n) z_1^n + d_1^n ) ) = 0, which follows directly from applying Theorem 4 to each of the two differential entropies: both exceed h(z_1^n) by n ∑_{|ρ_i|>1} log|ρ_i| + o(n), since in both cases the output is affected by a finite-length disturbance of length at least M and with finite differential entropy. Notice that this result holds irrespective of how small the power of the disturbance may be.
Thus, the capacity-achieving scheme proposed in Reference [9] (and further studied in Reference [28]), although of groundbreaking theoretical importance, would yield zero rate in any practical situation, since in every physically implemented scheme, signals are unavoidably affected by some amount of noise.
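The collapse just described is easy to reproduce numerically in a Gaussian setting. The following sketch is our own illustration; the filter 1 + B(z), the noise statistics, and the disturbance variances are arbitrary choices (not the construction of Reference [9]), and both per-sample information quantities are evaluated via Gaussian log-determinants:

```python
import numpy as np
from scipy.linalg import toeplitz

# 1 + B(z) = 1 - 2.5 z^{-1} + z^{-2}: B strictly causal, one NMP zero at rho = 2 (illustrative)
coeffs = np.array([1.0, -2.5, 1.0])
M = 2                                   # length of v (and of the extra disturbance d)
var_v, var_d = 1.0, 1e-4

def rates(n):
    col = np.zeros(n); col[:len(coeffs)] = coeffs
    IB = toeplitz(col, np.zeros(n))     # Toeplitz matrix of 1 + B(z); det = 1
    Kz = IB @ IB.T                      # covariance of (I_n + B_n) z_1^n, z i.i.d. N(0,1)
    Kv = np.zeros((n, n)); Kv[:M, :M] = var_v * np.eye(M)
    Kd = np.zeros((n, n)); Kd[:M, :M] = var_d * np.eye(M)
    ld = lambda K: np.linalg.slogdet(K)[1]
    # Without d: (1/n) I(x;y) = (1/n)[h((I+B)z + v) - h(z)], and log det K_z = 0
    rate_clean = 0.5 * ld(Kz + Kv) / n
    # With d:    (1/n) I(x;y) = (1/n)[h((I+B)z + v + d) - h((I+B)z + d)]
    rate_noisy = 0.5 * (ld(Kz + Kv + Kd) - ld(Kz + Kd)) / n
    return rate_clean, rate_noisy

for n in (100, 400, 1600):
    print(n, rates(n))   # first value tends to log 2 = 0.693, second tends to 0
```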

9. Conclusions

We have provided an intuitive explanation and a rigorous characterization of the entropy gain of a linear time-invariant (LTI) system, defined as the difference between the differential entropy rates of its output and input random signals. The continuous-time version of this problem, considered by Shannon in Theorem 14 of his 1948 landmark paper, involves an LTI system G c band limited to B [Hz]. For this scenario, we restricted our attention to systems such that the samples of its unit-impulse response, taken ( 2 B ) 1 seconds apart, correspond to the unit-impulse response g 0 , g 1 , of a causal and stable discrete-time system G. We show that the entropy gain in this case is log | g 0 | , which implies that, for this class of systems, Shannon’s Theorem 14 holds if and only if G c has a corresponding discrete-time G that is minimum phase (MP).
For the discrete-time case, we introduced a new notion referred to as effective differential entropy, which quantifies the amount of uncertainty in vector signals that are confined to subspaces of lower dimensionality than that of the signals themselves. (Note that this is not possible with the conventional notion of differential entropy, which simply diverges to minus infinity.) It turns out that the difference in effective differential entropy rate between an n-length input to an LTI discrete-time system with frequency response G(e^{jω}) and its full-length output, as n tends to infinity, equals (1/(2π)) ∫_{−π}^{π} log|G(e^{jω})| dω.
When comparing input and output sequences of equal length, our analysis revealed that, in the absence of external random disturbances, the entropy gain of a discrete-time LTI system G with unit-impulse response g_0, g_1, … is simply log|g_0|. An entropy gain greater than log|g_0| can be obtained only if a random signal is added to the output of G and if the output process has statistical properties that make it susceptible to the added random signal. In order to characterize the role of G, its input has been assumed to be entropy balanced (EB), a notion introduced herein. Crucially, the differential entropy rate of an EB process is not susceptible to random signals. EB processes constitute a large family that includes Gaussian processes with bounded, non-vanishing variance. We also show that (i) the sum of an EB process and any independent bounded-variance process is EB, too, and (ii) passing an EB process through a stable MP filter yields an EB process. When the input is EB, we show that if G has NMP zeros ρ_1, ρ_2, …, ρ_m, then the largest possible entropy gain is log|g_0| + ∑_{i=1}^{m} log|ρ_i|, which equals (1/(2π)) ∫_{−π}^{π} log|G(e^{jω})| dω. This upper bound is achieved by adding a finite-length output disturbance with finite variance and bounded differential entropy if and only if its length is at least m, no matter how tiny its variance may be. The same entropy gain is also obtained if G has a random initial state with bounded differential entropy and finite variance.
We used these fundamental insights about when the entropy gain occurs in order to establish a new and more general proof of the quadratic rate-distortion function for non-stationary Gaussian sources. Moreover, we demonstrated that the information rate of the capacity-achieving scheme proposed in Reference [9] for the auto regressive Gaussian channel with feedback drops to zero in the presence of any additive disturbance in the channel input or output of sufficient (finite) length, no matter how small it may be. This has crucial implications in any physical setup, where noise is unavoidable.

Author Contributions

Conceptualization, M.S.D. and M.M.; Investigation, M.S.D., M.M. and J.Ø.; Writing—original draft, M.S.D.; Writing—review and editing, M.M. and J.Ø. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Comisión Nacional de Investigación Científica y Tecnológica grant number FB0008.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proof of Theorem 3

The total length of the output, ℓ, will grow with the length n of the input if G is FIR, and will be infinite if G is IIR. Letting η + 1 be the length of the impulse response of G in the FIR case, we define the output-length function
ℓ(n) ≜ (length of y when the input is u_1^n) = n + η if G is FIR, and ℓ(n) = ∞ if G is IIR.
It is also convenient to define the sequence of matrices {Ğ_n}_{n=1}^∞, where Ğ_n ∈ R^{ℓ(n)×n} is Toeplitz with [Ğ_n]_{i,j} = 0 for i < j and [Ğ_n]_{i,j} = g_{i−j} for i ≥ j. This allows one to write the entire output y_1^∞ of a causal LTI filter G with impulse response {g_k}_{k=0}^{η} to an input u_1^∞ as
y ( n ) 1 ( u n 1 ) = G ˘ n u n 1 .
Let the SVD of G ˘ n be G ˘ n = Q ˘ n T D ˘ n R ˘ n , where Q ˘ n R n × ( n ) has orthonormal rows, D ˘ n R n × n is diagonal with positive elements, and R ˘ n R n × n is unitary.
The effective differential entropy of y 1 ( n ) ( u 1 n ) exceeds the differential entropy of u 1 n by
h̆( y_1^{ℓ(n)}(u_1^n) ) − h(u_1^n) = h( Q̆_n Ğ_n u_1^n ) − h(u_1^n) = h( D̆_n R̆_n u_1^n ) − h(u_1^n) = log det( D̆_n ).
The determinant of D̆_n can be related to that of Ğ_n^T Ğ_n by noticing that
G ˘ n T G ˘ n = ( Q ˘ n T D ˘ n R ˘ n ) T ( Q ˘ n T D ˘ n R ˘ n ) = R ˘ n T D ˘ n Q ˘ n Q ˘ n T D ˘ n R ˘ n = R ˘ n T D ˘ n 2 R ˘ n .
Since R ˘ n is unitary, it follows that det D ˘ n 2 = det G ˘ n T G ˘ n , which from (A3) means that
h̆( y_1^{ℓ(n)}(u_1^n) ) − h(u_1^n) = (1/2) log( det( Ğ_n^T Ğ_n ) ).
The product H_n ≜ Ğ_n^T Ğ_n is a symmetric Toeplitz matrix, with its first column, [h_0 h_1 ⋯ h_{n−1}]^T, given by
h_i = ∑_{k=0}^{n} g_k g_{k−i}. Thus, the sequence {h_i}_{i=0}^{n−1} corresponds to samples 0 to n−1 of the complete convolution of g with its time-reversed (possibly infinitely long) version, even when the filter G is IIR. Consequently, and since G(z) has no zeros on the unit circle and g is absolutely summable, we can use Grenander and Szegő's theorem [29], and Reference [18], Theorem 4.2, to obtain that
lim_{n→∞} log( det( Ğ_n^T Ğ_n )^{1/n} ) = (1/(2π)) ∫_{−π}^{π} log|G(e^{jω})|² dω.
In order to finish the proof, we divide (A5) by n, take the limit as n , and replace (A6) in the latter.
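The limit (A6) can also be checked numerically for a concrete filter: the per-sample effective entropy gain (1/(2n)) log det(Ğ_n^T Ğ_n) indeed approaches (1/(2π)) ∫ log|G(e^{jω})| dω, even when G is NMP. The sketch below is our own; the FIR filter is an arbitrary illustrative choice:

```python
import numpy as np
from scipy.linalg import toeplitz

# Illustrative NMP FIR filter: G(z) = 1 - 2.5 z^{-1} + z^{-2}, so eta = 2
g = np.array([1.0, -2.5, 1.0])

def effective_gain(n):
    """(1/n)[h_eff(y_1^{l(n)}) - h(u_1^n)] = (1/2n) log det(Gbreve_n^T Gbreve_n), cf. (A5)."""
    eta = len(g) - 1
    col = np.zeros(n + eta); col[:len(g)] = g
    Gb = toeplitz(col, np.zeros(n))          # tall (n+eta) x n convolution matrix
    _, logdet = np.linalg.slogdet(Gb.T @ Gb)
    return 0.5 * logdet / n

w = np.linspace(-np.pi, np.pi, 1_000_000, endpoint=False)
target = np.mean(np.log(np.abs(np.polyval(g, np.exp(1j * w)))))   # (1/2pi) int log|G| dw

for n in (50, 200, 800):
    print(n, effective_gain(n), target)      # effective gain approaches the integral (~0.693 here)
```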

Appendix B. Proofs of Results Stated in the Previous Sections

Proof of Lemma 1.
Let σ u ( k ) 2 be the variance of u ( k ) . Thus, h ( u n 1 ) = 1 2 log ( ( 2 π e ) n det ( diag { σ u ( k ) 2 } k = 1 n ) ) . Let y n ν + 1 Φ n u n 1 . Then, K y n ν + 1 = Φ n diag { σ u ( k ) 2 } k = 1 n Φ n T . As a consequence,
h ( y n ν + 1 ) = 1 2 log ( 2 π e ) [ n ν ] det Φ n diag { σ u ( k ) 2 } k = 1 n Φ n T .
But from the Courant-Fischer theorem [30],
log det ( diag { σ u ( k ) 2 } k = 1 n ) ν log ( σ ^ 2 ) log det ( Φ n diag { σ u ( k ) 2 } k = 1 n Φ n T ) log det ( diag { σ u ( k ) 2 } k = 1 n ) ν log ( σ ˇ 2 ) ;
thus, lim n 1 n ( h ( y n ν + 1 ) h ( u n 1 ) ) = 0 , satisfying Condition ii) in Definition 2. Adding to this the fact that, in this case, σ u ( n ) 2 σ ^ 2 < for all n, Condition i) in Definition 2 is satisfied, as well, completing the proof.   □
Proof of Lemma 2.
Let { b i , } = 1 be the intervals (bins) in R where the sample u ( i ) has constant PDF. Define the discrete random process c 1 , where c ( i ) = if and only if u ( i ) b i , . Let y n ν + 1 Φ n u n 1 where Φ n R ( n ν ) × n has orthonormal rows. Then,
h ( y n ν + 1 ) = h ( y n ν + 1 | c n 1 ) + I ( c n 1 ; y n ν + 1 )
h ( y n ν + 1 | c n 1 ) + I ( c n 1 ; u n 1 ) ,
where the inequality is due to the fact that u n 1 and y n ν + 1 are deterministic functions of u n 1 ; hence, c n 1 u n 1 y n ν + 1 . Subtracting h ( u n 1 ) from (A9) we obtain
h ( y n ν + 1 ) h ( u n 1 ) h ( y n ν + 1 | c n 1 ) + I ( c n 1 ; u n 1 ) h ( u n 1 )
= h ( y n ν + 1 | c n 1 ) h ( u n 1 | c n 1 ) .
Hence,
lim n 1 n h ( y n ν + 1 ) h ( u n 1 ) lim n 1 n h ( y n ν + 1 | c n 1 ) h ( u n 1 | c n 1 ) = 0 ,
where the last equality follows from Lemma A1 (in Appendix C) whose conditions are met because, given c n 1 , the sequence u n 1 has independent entries each of them distributed uniformly over a possibly different interval with finite and positive measure. The opposite inequality is obtained by following the same steps as in the proof of Lemma A1, from (A124) onwards, which completes the proof.   □
Proof of Lemma 3.
Let y n 1 [ Ψ n T | Φ n T ] T w n 1 , where [ Ψ n T | Φ n T ] T R n × n is a unitary matrix and where Ψ n R ν × n and Φ n R ( n ν ) × n have orthonormal rows.
Then,
h ( y n ν + 1 ) = h ( y n 1 ) h ( y ν 1 | y n ν + 1 ) = h ( w n 1 ) h ( y ν 1 | y n ν + 1 ) .
We can lower bound h ( y ν 1 | y n ν + 1 ) as follows:
h ( y ν 1 | y n ν + 1 ) = ( c ) h ( Ψ n u n 1 + Ψ n v n 1 | Φ n u n 1 + Φ n v n 1 )
( a ) h ( Ψ n u n 1 + Ψ n v n 1 | Φ n u n 1 + Φ n v n 1 , v n 1 )
= ( c ) h ( Ψ n u n 1 | Φ n u n 1 + Φ n v n 1 , v n 1 )
= ( c ) h ( Ψ n u n 1 | Φ n u n 1 , v n 1 )
= ( b ) h ( Ψ n u n 1 | Φ n u n 1 )
= ( c ) h ( u n 1 ) h ( Φ n u n 1 ) ,
where ( a ) holds because conditioning does not increase entropy, ( b ) is from the fact that u n 1 v n 1 , and ( c ) follows from the chain rule of entropy.
Substituting this result into (A14), dividing by n, taking the limit as n , and recalling that u 1 is entropy balanced, we conclude that lim n 1 n ( h ( Φ n w n 1 ) h ( w n 1 ) ) 0 .
The opposite bound over h ( y ν 1 | y n ν + 1 ) can be obtained from
h ( y ν 1 | y n ν + 1 ) = h ( Ψ n u n 1 + Ψ n v n 1 | Φ n u n 1 + Φ n v n 1 ) h ( Ψ n u n 1 + Ψ n v n 1 ) h ( Ψ n ( w G ) n 1 ) ,
where ( w G ) n 1 is a jointly Gaussian sequence with the same second-order moment as w n 1 . Therefore, h ( Ψ n ( w G ) n 1 ) = 1 2 log ( ( 2 π e ) ν det ( Ψ n K w n 1 Ψ n T ) ) ν 2 log ( 2 π e λ max ( K w n 1 ) ) . But w n 1 satisfies the assumptions of Proposition A2; thus, lim n n 1 log ( λ max ( K w n 1 ) ) = 0 . Therefore, lim n n 1 h ( Ψ n ( w G ) n 1 ) 0 , which substituted in (A14) yields
lim n 1 n h ( y ν 1 | y n ν + 1 ) = lim n 1 n ( h ( Φ n w n 1 ) h ( w n 1 ) ) 0 .
Hence, w 1 satisfies Condition ii) of Definition 2. Since w 1 also satisfies Condition i) of Definition 2, it follows that w 1 is entropy balanced, completing the proof.   □
Proof of Lemma 4.
Pick any ν N and let y n 1 [ Φ n T | Ψ n T ] T w n 1 where [ Φ n T | Ψ n T ] T R n × n is a unitary matrix and the matrices Ψ n R ν × n and Φ n R ( n ν ) × n have orthonormal rows. Since w n 1 = G n u n 1 , we have that
Φ n w n 1 = Φ n G n u n 1 .
Let Φ n G n = A n Σ n B n be the SVD of Φ n G n , where A n R ( n ν ) × ( n ν ) is an orthogonal matrix, B n R ( n ν ) × n has orthonormal rows and Σ n R ( n ν ) × ( n ν ) is a diagonal matrix with the singular values of Φ n G n .
Hence
h ( Φ n w n 1 ) = h ( Φ n G n u n 1 ) = h ( A n Σ n B n u n 1 ) = log det ( Σ n ) + h ( B n u n 1 ) .
The singular values of Φ n G n are σ i ( Φ n G n ) = λ i ( Φ n G n G n T Φ n T ) , i = 1 , 2 , , n ν . Now, notice that
λ i ( [ Φ n T | Ψ n T ] T G G T [ Φ n T | Ψ n T ] ) = λ i ( G G T )
and that
[ Φ n T | Ψ n T ] T G G T [ Φ n T | Ψ n T ] = Φ n G G T Φ n T Φ n G G T Ψ n T Ψ n G G T Φ n T Ψ n G G T Ψ n T .
Thus, from (A24) and the Cauchy eigenvalue interlacing theorem [30],
λ i ( G n G n T ) λ i ( Φ n G G T Φ n T ) λ i + ν ( G n G n T ) , i = 1 , , n ν .
Hence,
ν 2 n log λ max ( G n G n T ) 1 n log det ( Σ n ) 1 n log det ( G n ) ν 2 n log λ min ( G n G n T ) .
Recalling that G is minimum phase (which guarantees that its singular values change at most polynomially with n, due to Lemma 7), we conclude that
lim n 1 n log det ( Σ n ) = 1 n log | det ( G n ) | .
Substituting back into (A23), we arrive to
lim n 1 n h ( Φ n w n 1 ) = ( a ) lim n 1 n log | det ( G n ) | + lim n 1 n h ( u 1 n ) = lim n 1 n h ( w 1 n ) ,
where ( a ) holds because u 1 is entropy balanced. This completes the proof.   □
Proof of Lemma 5.
Let { Ψ n } n = 1 be a sequence of matrices, each Ψ n R κ × n with orthonormal rows spanning a subspace of R n that contains the span of the columns of [ Φ ] n 1 . For each n N , let Ψ ¯ n R ( n κ ) × n be such that H n [ Ψ n T | Ψ ¯ n T ] T is a unitary matrix. Then,
h ( Ψ ¯ n y n 1 ) = h ( Ψ ¯ n u n 1 ) .
Thus,
lim n 1 n h ( y n 1 ) h ( u n 1 ) = lim n 1 n ( h ( y n 1 ) h ( Ψ ¯ n y n 1 ) ) ( h ( u n 1 ) h ( Ψ ¯ n u n 1 ) ) = 0 ,
where the last equality holds because u 1 is entropy balanced and y 1 is entropy balanced (from Lemma 3). This completes the proof.   □
Proof of Lemma 6.
Since Q n is unitary, we have that
h ( y n 1 ) = h ( Q n y n 1 ) = h ( D n R n u n 1 v n 1 + Q n z n 1 z ¯ n 1 w n 1 ) = h ( w n 1 ) ,
where
w n 1 Q n y n 1 = v n 1 + z ¯ n 1 ,
v n 1 D n R n u n 1 ,
z ¯ n 1 Q n z n 1 .
Thus,
h ( y n 1 ) = h ( w 1 n ) = ( a ) h ( w 1 m ) + h ( w m + 1 n | w 1 m ) = h ( [ D n ] m 1 R n u n 1 + [ Q n ] m 1 z n 1 ) + h ( w m + 1 n | w 1 m ) ,
where ( a ) follows from the chain rule of differential entropy. It only remains to show that the limit of ( 1 / n ) h ( w m + 1 n | w 1 m ) as n equals the entropy rate of u 1 . We will do this by deriving a lower and an upper bounds which converge to the same expression as n .
A lower bound for h ( w m + 1 n | w 1 m ) can be obtained by noticing that
h ( w m + 1 n | w 1 m ) = h ( v m + 1 n + z ¯ m + 1 n | v 1 m + z ¯ 1 m )
( a ) h ( v m + 1 n + z ¯ m + 1 n | v 1 m , z ¯ 1 n )
= ( b ) h ( v m + 1 n | v 1 m , z ¯ 1 n )
= ( c ) h ( v m + 1 n | v 1 m )
= ( d ) h ( v 1 n ) h ( v 1 m )
= ( e ) h ( u 1 n ) h ( v 1 m ) ,
where ( a ) follows from the fact that conditioning on more information does not increase differential entropy, ( b ) is due to the fact that h ( x + a ) = h ( x ) , for any constant a, ( c ) holds because z ¯ 1 v 1 , ( d ) is a direct application of the chain rule of differential entropy, and ( e ) stems from (A34) and the fact that det ( D n R n ) = 1 . On the other hand,
h ( v 1 m ) = h ( [ D n ] m 1 R n u n 1 ) = i = 1 m log d n , i + h ( [ R n ] m 1 u n 1 ) .
Then, by inserting (A43) and (A42) in (A37), dividing by n, and taking the limit n , we obtain
lim n 1 n h ( w m + 1 n | w 1 m ) lim n 1 n h ( u 1 n ) i = 1 m log d n , i h ( [ R n ] m 1 u n 1 )
= h ¯ ( u 1 ) lim n 1 n i = 1 m log d n , i ,
where the last equality is a consequence of the fact that u 1 is entropy balanced (specifically, from Proposition A3).
We now derive an upper bound for h ( w m + 1 n | w 1 m ) . Defining the random vector
x n m + 1 [ R n ] n m + 1 u n 1 ,
and since D n is diagonal, we can write
v m 1 = [ D n ] n m + 1 R n u n 1 = m + 1 [ D n ] n x n m + 1 ,
where
m + 1 [ D n ] n diag { d n , m + 1 , d n , m + 2 , , d n , n } .
Therefore,
h ( w m + 1 n | w 1 m ) h ( w n m + 1 ) = h ( m + 1 [ D n ] n x n m + 1 + z ¯ n m + 1 )
= log det ( m + 1 [ D n ] n ) + h ( x n m + 1 + ( m + 1 [ D n ] n ) 1 z ¯ n m + 1 ) .
Notice that, by Assumption 2, z ¯ n m + 1 = [ Q n ] n m + 1 z n 1 = [ Q n ] n m + 1 [ Φ ] n 1 s κ 1 and, thus, is restricted to the span of [ Q n ] n m + 1 [ Φ ] n 1 of dimension κ n κ , for all n m + κ . Then, for every n > m + κ n , one can construct a unitary matrix H n ( A n T | B n T ) T R ( n m ) × ( n m ) , with A n R κ × ( n m ) and B n R ( n m κ ) × ( n m ) , such that the rows of A n span the space spanned by the columns of ( m + 1 [ D n ] n ) 1 [ Q n ] n m + 1 [ Φ ] n 1 and such that B n ( m + 1 [ D n ] n ) 1 [ Q n ] n m + 1 [ Φ ] n 1 = 0 . Therefore, from (A49),
h ( w m + 1 n | w 1 m ) log det ( m + 1 [ D n ] n ) + h ( H n x n m + 1 + H n ( m + 1 [ D n ] n ) 1 z ¯ n m + 1 ) = log det ( m + 1 [ D n ] n ) + h ( B n x n m + 1 ) + h ( A n x n m + 1 + A n ( m + 1 [ D n ] n ) 1 z ¯ n m + 1 | B n x n m + 1 ) log det ( m + 1 [ D n ] n ) + h ( B n x n m + 1 ) + h ( A n x n m + 1 + A n ( m + 1 [ D n ] n ) 1 z ¯ n m + 1 ) log det ( m + 1 [ D n ] n ) + h ( B n x n m + 1 ) + 1 2 log ( 2 π e ) κ det K A n x n m + 1 + K A n ( m + 1 [ D n ] n ) 1 z ¯ n m + 1 log det ( m + 1 [ D n ] n ) + h ( B n x n m + 1 ) + 1 2 log ( 2 π e ) κ λ max ( K x n m + 1 ) + λ max ( K z ¯ n m + 1 ) λ min ( m + 1 [ D n ] n ) 2 κ ,
where K A n x n m + 1 and K A n ( m + 1 [ D n ] n ) 1 z ¯ n m + 1 are the covariance matrices of A n x n m + 1 and A n ( m + 1 [ D n ] n ) 1 z ¯ n m + 1 , respectively, and where the last inequality follows from [31]. The fact that λ max ( K x n m + 1 ) and λ max ( K z ¯ n m + 1 ) are upper bounded for all n, and the fact that λ min ( m + 1 [ D n ] n ) either grows with n or decreases sub-exponentially (from Lemma 7), imply that
lim n 1 n h ( w m + 1 n | w 1 m ) lim n 1 n log det ( m + 1 [ D n ] n ) + lim n 1 n h ( B n x n m + 1 ) .
But the fact that det D n = 1 implies that log det ( m + 1 [ D n ] n ) = i = 1 m log d n , i . On the other hand, recalling that x n m + 1 = [ R n ] n m + 1 u n 1 and noting that B n [ R n ] n m + 1 has orthonormal rows, reveals that lim n 1 n h ( B n x n m + 1 ) = h ¯ ( u 1 ) (from the assumption that u 1 is entropy balanced). Therefore,
lim n 1 n h ( w m + 1 n | w 1 m ) h ¯ ( u 1 ) lim n 1 n i = 1 m log d n , i ,
which coincides with the lower bound found in (A45), completing the proof.   □
Proof of Lemma 7.
The transfer function G ( z ) can be factored as G ( z ) = G ˜ ( z ) F ( z ) , where G ˜ ( z ) is stable and minimum phase and F ( z ) is stable with all the non-minimum phase zeros of G ( z ) , both being biproper rational functions. From Lemma A2 (in Appendix C), in the limit as n , the eigenvalues of G ˜ n T G ˜ n are lower and upper bounded by λ min ( G ˜ T G ˜ ) and λ max ( G ˜ T G ˜ ) , respectively, where 0 < λ min ( G ˜ T G ˜ ) λ max ( G ˜ T G ˜ ) < . Let G ˜ n = Q ˜ n T D ˜ n R ˜ n and F n = Q n T D n R n be the SVDs of G ˜ n and F n , respectively, with d ˜ n , 1 d ˜ n , 2 d ˜ n , n and d n , 1 d n , 2 d n , n being the diagonal entries of the diagonal matrices D ˜ n , D n , respectively. Then,
G n T G n = F n T G ˜ n T G ˜ n F n = ( D ˜ n R ˜ n Q n T D n R n ) T D ˜ n R ˜ n Q n T D n R n .
Denoting the i-th row of R n by r n , i T be, we have that, from the Courant-Fischer theorem [30] that
λ i ( G n T G n ) max v span { r n , k } k = 1 i : v = 1 G v 2
= max v span { r n , k } k = 1 i : v = 1 D ˜ n R ˜ n T Q n T D n R n v 2
d n , i 2 d ˜ n , n 2 .
Likewise,
λ i ( G n T G n ) min v span { r n , k } k = i n : v = 1 G v
= min v span { r n , k } k = i n : v = 1 D ˜ n R ˜ n T Q n T D n R n v 2
d n , i 2 d ˜ n , 1 2 .
Thus,
lim n λ i ( G n T G n ) d n , i 2 λ min ( G ˜ T G ˜ ) , λ max ( G ˜ T G ˜ ) .
The result now follows directly from Lemma A3 (in Appendix C).   □
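Lemma 7 can be visualized numerically: for an NMP FIR filter, the smallest singular value of its Toeplitz matrix decays essentially as |ρ|^{−n}, while the remaining singular values stay bounded away from zero; for a minimum-phase filter, all singular values remain bounded away from zero. The sketch below is our own, with first-order illustrative filters:

```python
import numpy as np
from scipy.linalg import toeplitz

def toep(g, n):
    col = np.zeros(n); col[:len(g)] = g
    return toeplitz(col, np.zeros(n))

g_nmp = np.array([1.0, -2.0])      # F(z) = 1 - 2 z^{-1}, NMP zero at rho = 2
g_mp  = np.array([1.0, -0.5])      # minimum-phase counterpart

for n in (10, 20, 30, 40):
    s_nmp = np.linalg.svd(toep(g_nmp, n), compute_uv=False)
    s_mp  = np.linalg.svd(toep(g_mp, n), compute_uv=False)
    # The smallest singular value of the NMP matrix, rescaled by rho^n, stays bounded
    # (i.e., it decays essentially as rho^{-n}); the second smallest, and the MP matrix's
    # smallest, remain bounded away from zero.
    print(n, s_nmp[-1] * 2.0 ** n, s_nmp[-2], s_mp[-1])
```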
Proof of Theorem 6.
To begin with, the entropy power inequality [1] gives h(y_1^n) = h(G_n u_1^n + z_1^n) ≥ h(G_n u_1^n) = h(u_1^n), proving the lower bound in (70).
To obtain the other bounds on the entropy gain of G n , we will use Lemma 6. Recalling the structure of z 1 specified in Assumption 2, the random vector whose differential entropy appears on the RHS of (64) takes the form
[ D n ] m 1 R n u n 1 + [ Q n ] m 1 z n 1 = [ D n ] m 1 R n u n 1 + [ Q n ] m 1 [ Φ ] n 1 s κ 1 .
Notice that, for every n κ , the columns of the matrix [ Q n ] m 1 [ Φ ] n 1 R m × κ span a space of dimension κ n { 0 , 1 , , κ ¯ } , with κ ¯ min { m , κ } . If κ n = 0 (i.e., [ Q n ] m 1 [ Φ ] n 1 = 0 ), then
h ( [ D n ] m 1 R n u n 1 + [ Q n ] m 1 z n 1 ) = h ( [ D n ] m 1 R n u n 1 ) .
If that is that case for every n κ , the lower bound in (70) is reached by inserting the latter expression into (64) and invoking Lemma 7.
Let [ Q n ] m 1 [ Φ ] n 1 = A n T T n B n be an SVD for [ Q n ] m 1 [ Φ ] n 1 , where A n R κ n × m has orthonormal rows,
T n = diag { t 1 ( n ) , t 2 ( n ) , , t κ n ( n ) } ,
where 0 < t 1 ( n ) t 2 ( n ) t κ n ( n ) 1 are the singular values of [ Q n ] m 1 [ Φ ] n 1 , and B n R κ n × κ has orthonormal rows. Construct a unitary matrix H n R m × m such that
H n A n A ¯ n ,
where A n R κ n × m is as before, and A ¯ n R ( m κ n ) × m has orthonormal rows, and its row span is the orthogonal complement of that of A n . Thus,
H n [ Q n ] m 1 [ Φ ] n 1 = A n [ Q n ] m 1 [ Φ ] n 1 0 ( m κ n ) × κ , n N .
From (A63) and (A60), we obtain
h [ D n ] m 1 R n u n 1 + [ Q n ] m 1 z n 1 = h [ D n ] m 1 R n u n 1 + [ Q n ] m 1 [ Φ ] n 1 s κ 1
= h H n ( [ D n ] m 1 R n u n 1 + [ Q n ] m 1 [ Φ ] n 1 s κ 1 )
= h A n [ D n ] m 1 R n u n 1 + A n [ Q n ] m 1 [ Φ ] n 1 s κ 1 | A ¯ n [ D n ] m 1 R n u n 1 + ( 1 1 { m } ( κ n ) ) h ( A ¯ n [ D n ] m 1 R n u n 1 ) ,
where the indicator function 1 { m } ( κ n ) = 1 if κ n = m and 0 otherwise. The first differential entropy on the RHS of (A66) can be lower bounded as
h A n [ D n ] m 1 R n u n 1 + A n [ Q n ] m 1 [ Φ ] n 1 s κ 1 | A ¯ n [ D n ] m 1 R n u n 1 ( a ) h A n [ Q n ] m 1 [ Φ ] n 1 s κ 1 | A ¯ n [ D n ] m 1 R n u n 1 = ( b ) h A n [ Q n ] m 1 [ Φ ] n 1 s κ 1 = h T n B n s κ 1 ( c ) κ n log ( t 1 ( n ) ) + h ( s κ 1 ) κ κ n 2 log ( λ max ( K s κ 1 ) ) ,
where ( a ) is from the entropy power inequality [1], ( b ) holds because s κ 1 u n 1 and ( c ) is from Proposition A1. An upper bound can be obtained as
h A n [ D n ] m 1 R n u n 1 + A n [ Q n ] m 1 [ Φ ] n 1 s κ 1 | A ¯ n [ D n ] m 1 R n u n 1 ( a ) h A n [ D n ] m 1 R n u n 1 + T n B n s κ 1 ( b ) 1 2 log ( 2 π e ) m det ( K A n [ D n ] m 1 R n u n 1 + K T n B n s κ 1 ) ( c ) κ n 2 log ( 2 π e ) ( d m ) 2 λ max ( K u n 1 ) + ( t κ n ( n ) ) 2 λ max ( K s κ 1 ) ,
where ( a ) holds because conditioning does not increase entropy, ( b ) is because a Gaussian distribution maximizes the differential entropy for a given covariance matrix, and ( c ) is due to Reference [31]. Notice that u 1 satisfies the requirements of Proposition A2, implying that lim n n 1 λ max ( K u n 1 ) = 0 . Thus, since t κ n ( n ) 1 , it follows from (A67), (A68), and (A66) that
lim n 1 1 { m } ( κ n ) n h ( A ¯ n [ D n ] m 1 R n u n 1 ) + lim n κ n n log ( t 1 ( n ) ) lim n 1 n h [ D n ] m 1 R n u n 1 + [ Q n ] m 1 z n 1 lim n ( 1 1 { m } ( κ n ) ) h ( A ¯ n [ D n ] m 1 R n u n 1 ) .
For the last differential entropy on the RHS of (A66), notice that [ D n ] m 1 R n = 1 [ D n ] m [ R n ] m 1 . Consider the SVD A ¯ n 1 [ D n ] m [ R n ] m 1 = V n T Σ n W n , with V n R ( m κ n ) × ( m κ n ) being unitary, Σ n R ( m κ n ) × ( m κ n ) being diagonal, and W n R ( m κ n ) × n having orthonormal rows. We can then conclude that
h ( A ¯ n [ D n ] m 1 R n u n 1 ) = h ( Σ n W n u n 1 ) = log det ( Σ n ) + h ( W n u n 1 ) .
Now, the fact that
A ¯ n 1 [ D n ] m 1 [ D n ] m A ¯ n T = A ¯ n 1 [ D n ] m [ R n ] m 1 ( A ¯ n 1 [ D n ] m [ R n ] m 1 ) T = V T Σ W W T Σ T V = V T Σ Σ T V
reveals that
log det Σ = 1 2 log | det ( A ¯ n ( 1 [ D n ] m ) 2 A ¯ n T ) | .
Recalling that A ¯ n = [ H n ] m κ n + 1 and that H n R m × m is unitary, it is easy to show (by using the Courant-Fischer theorem [30]) that
i = 1 m κ n log d n , i ( a ) 1 2 log det ( A ¯ n ( 1 [ D n ] m ) 2 A ¯ n T ) ( b ) i = κ n + 1 m log d n , i ,
with equality in ( a ) and ( b ) if and only if A ¯ n = [ I m κ n | 0 ] and A ¯ n = [ 0 | I m κ n ] , respectively. Substituting this into (A71) and then the latter into (A70), we arrive to
h ( [ W n ] m 1 u n 1 ) + i = 1 m κ n log d n , i h ( A ¯ n [ D n ] m 1 R n u n 1 ) h ( [ W n ] m 1 u n 1 ) + i = κ n + 1 m log d n , i .
Substituting the upper bound from this equation and from (A68) into (A66) and the latter in (64), exploiting the fact that u 1 is entropy balanced (which ensures that u 1 satisfies Condition i) in Definition 2) and invoking Lemma 7 yields the upper bound in (70).
Doing the same substitutions but with the lower bounds in (A73) and (A67), and using the assumption that lim n 1 n log ( t 1 ( n ) ) = 0 , gives the lower bound of (71). This completes the proof.   □
Proof of Lemma 8.
We will consider first the case κ = m and show that lim n σ min ( 1 [ Q n ] m ) > 0 , where now Q n T is the left unitary matrix in the SVD F n = Q n T D n R n . We will prove that this is the case by using a contradiction argument. Thus, suppose the contrary, i.e., that
lim n σ min ( 1 [ Q n ] m ) = 0 .
Then, there exists a sequence of unit-norm vectors { v n } n = 1 , with v n R m for all n, such that
lim n v n T 1 [ Q n ] m = 0 .
For each n N , define the n-length unit-norm image vectors t n T v n T [ Q n ] m 1 . Then,
F n T t n = R n T D n Q n t n = D n Q n t n = 1 [ D n ] m v n ,
where the last equality follows from the fact that, by construction, t n T is in the span of the first m rows of Q n , together with the fact that Q n is unitary (which implies that [ Q n ] n m + 1 t n = 0 ). Since the top m entries in D n decay exponentially as n increases, we have that
F n T t n O ( ζ n | ρ M | n ) ,
where ζ n is a finite-order polynomial of n (from Lemma A3, in Appendix C).
Now, notice that [ F n ] n m + 1 ( [ F n ] n m + 1 ) T is a Toeplitz matrix with the convolution of f and f (the impulse response of F and its time-reversed version, respectively) on its first row and column. It then follows from Reference [18], Lemma 4.1, that
lim n λ min ( [ F n ] n m + 1 ( [ F n ] n m + 1 ) T ) = min ω : ω [ π , π ] | F ( e j ω ) | 2 > 0
(the inequality is strict because all the zeros of F ( z ) are strictly outside the unit disk). Then, we conclude that
lim n σ min ( [ F n ] n m + 1 ) > 0 .
Recall that ‖t_n‖ = 1; thus, from (A75), lim_{n→∞} ‖[t_n]_1^m‖ = 0 and lim_{n→∞} ‖[t_n]_{m+1}^n‖ = 1, which means that lim_{n→∞} ‖F_n^T t_n‖ = lim_{n→∞} ‖([F_n]_{m+1}^n)^T [t_n]_{m+1}^n‖ ≥ lim_{n→∞} σ_min([F_n]_{m+1}^n) > 0 by (A79), which contradicts (A77). Therefore,
lim n σ min ( 1 [ Q n ] m ) > 0 .
Now, consider an arbitrary κ 1 . Since
σ min [ Q n ] m 1 1 κ σ min 1 [ Q n ] m ,
it follows from (A80) that
lim n σ min [ Q n ] m 1 [ Φ ] n 1 = lim n σ min [ Q n ] m 1 1 κ > 0 ;
thus, lim n κ n = κ ¯ . This completes the proof.   □
Proof of Theorem 9.
Denote the Blaschke product [11] of A ( z ) as
B(z) ≜ ∏_{i=1}^{m} (z − p_i) / ( ∏_{i=1}^{m} p_i^* (z − 1/p_i^*) ),
which clearly satisfies
| B ( e j ω ) | = 1 , ω [ π , π ] ,
b_0 ≜ lim_{|z|→∞} B(z) = 1 / ∏_{i=1}^{m} p_i^*,
where b 0 is the first sample in the impulse response of B ( z ) . Notice that (A84) implies that lim n 1 n E [ B n u n 1 2 ] = lim n 1 n E [ u n 1 2 ] for every sequence of random variables u 1 with uniformly bounded variance. Since B ( z ) has only stable poles and its zeros coincide exactly with the poles of A ( z ) , it follows that B ( z ) A ( z ) is an MP stable transfer function. Thus, the asymptotically stationary process x ˜ 1 defined in (139) can be constructed as
x ˜ n 1 B n x n 1 ,
where B n is a Toeplitz lower triangular matrix with its main diagonal entries equal to b 0 . Since w 1 is entropy balanced, so is x ˜ 1 , thanks to Lemma 4.
The fact that B ( z ) is biproper with b 0 as in (A85) implies that, for any u n 1 with finite differential entropy,
h(B_n u_1^n) = h(u_1^n) − n ∑_{i=1}^{m} log|p_i| = h(u_1^n) − nG,  where G ≜ ∑_{i=1}^{m} log|p_i|,   (A87)
which will be utilized next.
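The all-pass property (A84) and the determinant identity behind (A87) are easy to confirm numerically. The following minimal sketch is our own, for a single real pole p = 2, using a truncated Toeplitz matrix B_n built from the impulse response of B(z):

```python
import numpy as np
from scipy.signal import freqz, lfilter
from scipy.linalg import toeplitz

p = 2.0                                   # single pole of A(z), i.e., single "zero" of the Blaschke factor
num, den = [1.0, -p], [p, -1.0]           # B(z) = (1 - p z^{-1}) / (p - z^{-1}) for a real pole p

# (A84): B is all-pass
w, Bw = freqz(num, den, worN=4096)
print(np.max(np.abs(np.abs(Bw) - 1.0)))   # ~1e-15: |B(e^{jw})| = 1 for all w

# (A85) and (A87): b_0 = 1/p, so B_n is lower-triangular Toeplitz with diagonal 1/p and
# log|det B_n| = -n log p, i.e., h(B_n u_1^n) = h(u_1^n) - nG with G = log p in this example.
n = 50
imp = np.zeros(n); imp[0] = 1.0
b_imp = lfilter(num, den, imp)            # first n samples of the impulse response of B(z)
Bn = toeplitz(b_imp, np.zeros(n))
print(np.linalg.slogdet(Bn)[1], -n * np.log(p))   # both equal -n log 2
```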
For any given n m , suppose that C ( z ) is chosen and x n 1 and u n 1 are distributed so as to minimize I ( x n 1 ; C n x n 1 + u n 1 ) subject to the constraint E [ y n 1 x n 1 2 ] = E [ ( C n I ) x n 1 2 ] + E [ u n 1 ] 2 D (i.e., x n 1 , u n 1 is a realization of R x , n ( D ) ), yielding the reconstruction
y n 1 = C n x n 1 + u n 1 .
Since we are considering mean-squared error distortion, it follows that, for rate-distortion optimality, u n 1 must be jointly Gaussian with x n 1 . In addition, there is no loss of rate-distortion optimality if u 1 is entropy balanced (otherwise, it would have a lower entropy rate than its entropy-balanced counterpart, which differs from the former only on a finite number of samples and has the same asymptotic MSE). From these vectors, define
u ˜ n 1 B n u n 1 ,
y ˜ n 1 B n y n 1 = B n C n ( B n ) 1 x ˜ n 1 + u ˜ n 1 ,
y ¯ n 1 y ˜ n 1 + d n 1 = B n C n ( B n ) 1 x ˜ n 1 + u ˜ n 1 + d n 1 ,
where d n 1 is a zero-mean Gaussian vector independent of ( u ˜ n 1 , x ˜ n 1 ) with finite differential entropy and finite variance such that d k = 0 , k > m . Then, we have that (the change of variables and the steps in this chain of equations is represented by the block diagrams shown in Figure 7)
n R x , n ( D ) = I ( x n 1 ; y n 1 ) = ( a ) I ( B n x n 1 ; B n y n 1 ) = I ( x ˜ n 1 ; y ˜ n 1 )
= h ( y ˜ n 1 ) h ( y ˜ n 1 | x ˜ n 1 )
= ( b ) h ( y ˜ n 1 ) h ( u ˜ n 1 | x ˜ n 1 )
= ( c ) h ( y ˜ n 1 ) h ( u ˜ n 1 )
= ( d ) h ( y ˜ n 1 ) h ( u ˜ n 1 + d n 1 ) + [ h ( u n 1 ) h ( u ˜ n 1 + d n 1 ) ] n G
= ( e ) h ( y ˜ n 1 ) h ( u ˜ n 1 + d n 1 | x ˜ n 1 ) + n G [ h ( u n 1 ) h ( u ˜ n 1 + d n 1 ) ]
= ( f ) h ( y ˜ n 1 ) h ( y ¯ n 1 | x ¯ n 1 ) + n G [ h ( u n 1 ) h ( u ˜ n 1 + d n 1 ) ]
= h ( y ˜ n 1 ) h ( y ¯ n 1 ) + I ( x ˜ n 1 ; y ¯ n 1 ) + n G [ h ( u n 1 ) h ( u ˜ n 1 + d n 1 ) ]
= I ( x ˜ n 1 ; y ¯ n 1 ) + n G [ h ( u n 1 ) h ( u ˜ n 1 + d n 1 ) ] + [ h ( y ˜ n 1 ) h ( y ˜ n 1 + d n 1 ) ] ,
where ( a ) follows from B n being invertible, ( b ) is due to the fact that y ˜ n 1 = P n x ˜ n 1 + u ˜ n 1 , ( c ) holds because u n 1 x n 1 . The equality ( d ) stems from h ( u ˜ n 1 ) = h ( u n 1 ) n G (see (A87)). Equality holds in ( e ) because x ˜ n 1 ( u ˜ n 1 , d n 1 ) and in ( f ) because of (A91). But from Theorem 4 and since u 1 is entropy balanced, lim n 1 n ( h ( u ˜ n 1 + d m 1 ) h ( u n 1 ) ) = 0 . From Lemma 3 and because u 1 is entropy balanced, so is y ˜ 1 . This guarantees, from Lemma 5, that lim n n 1 [ h ( y ˜ n 1 ) h ( y ˜ n 1 + d n 1 ) ] = 0 . Thus, R x , n ( D ) = lim n 1 n ( x ˜ n 1 ; y ¯ n 1 ) + G R x ˜ , n ( D ) + G .
At the same time, the distortion for the source x ˜ n 1 when reconstructed as y ¯ n 1 is
lim n 1 n E y ¯ n 1 x ˜ n 1 2 = lim n 1 n E y ˜ x ˜ n 1 2 + E d n 1 2 = ( a ) lim n 1 n E y ˜ x ˜ n 1 2
= lim n 1 n E B n ( y n 1 x n 1 ) 2 = ( b ) lim n 1 n E y n 1 x n 1 2 ,
where (a) holds because ‖d_1^n‖ = ‖d_1^m‖ is bounded, and (b) is due to the fact that, in the limit, B(z) is a unitary operator. Recalling the definitions of R_x̃(D) and R_{x̃,n}(D), we conclude that lim_{n→∞} (1/n) I(x̃_1^n; ȳ_1^n) ≥ R_x̃(D); therefore,
R_x(D) ≥ R_x̃(D) + ∑_{i=1}^{m} log|p_i|.
In order to complete the proof, it suffices to show that R_x(D) ≤ R_x̃(D) + ∑_{i=1}^{m} log|p_i|. For this purpose, consider now the (asymptotically) stationary source x̃_1^n, and suppose that ŷ_1^n = x̃_1^n + u_1^n realizes R_{x̃,n}(D). Again, x̃_1^n and u_1^n will be jointly Gaussian, satisfying ŷ_1^n ⊥ u_1^n (the latter condition is required for minimum-MSE optimality). From this, one can propose an alternative realization in which the error sequence is ũ ≜ B_n u_1^n, yielding an output ỹ_1^n = x̃_1^n + ũ_1^n with ỹ_1^n ⊥ ũ_1^n. Then,
n R x ˜ , n ( D ) = I ( x ˜ n 1 ; y ^ n 1 ) = h ( x ˜ n 1 ) h ( x ˜ n 1 | y ^ n 1 )
= ( a ) h ( x ˜ n 1 ) h ( u n 1 )
= ( b ) h ( x ˜ n 1 ) h ( u ˜ n 1 ) n G
= ( c ) h ( x ˜ n 1 ) h ( u ˜ n 1 | y ˜ n 1 ) n G
= ( d ) h ( x ˜ n 1 ) h ( x ˜ n 1 | y ˜ n 1 ) n G
= ( a ) I ( x ˜ n 1 ; y ˜ n 1 ) n G
= ( a ) I ( B n x n 1 ; B n y n 1 ) n G
= ( e ) I ( x n 1 ; y n 1 ) n G ,
where (a) follows by recalling that ŷ_1^n = x̃_1^n + u_1^n and because ŷ_1^n ⊥ u_1^n, (b) stems from (A87), (c) is a consequence of ỹ_1^n ⊥ ũ_1^n, and (d) follows from the fact that ỹ_1^n = x̃_1^n + ũ_1^n. Finally, (e) holds because B_n is invertible for all n. Since, asymptotically as n → ∞, the distortion yielded by y_1^n for the non-stationary source x_1^n is the same as that obtained when x̃_1^n is reconstructed as ŷ_1^n (recall (A84)), we conclude that R_x(D) ≤ R_x̃(D) + ∑_{i=1}^{M} log|p_i|, completing the proof.   □

Appendix C. Technical Lemmas and Propositions

Proposition A1.
Let the random vector s κ 1 have finite differential entropy, and suppose its covariance matrix K s κ 1 satisfies λ max ( K s κ 1 ) < . Then, for any unitary matrix A R κ × κ and i = 1 , 2 , , κ
h ( s κ 1 ) κ i 2 log ( 2 π e λ max ( K s κ 1 ) ) h ( [ A ] i 1 s κ 1 ) i 2 log ( 2 π e λ max ( K s κ 1 ) ) .
Proof. 
Define $r_1^\kappa \triangleq A s_1^\kappa$. Since $A$ is unitary, it follows that $h(r_1^\kappa) = h(s_1^\kappa)$ and that $K_{r_1^\kappa}$ and $K_{s_1^\kappa}$ have the same eigenvalues. Therefore,
$$h\big([A]_1^i\, s_1^\kappa\big) = h(r_1^i) \overset{(a)}{\leq} \frac{1}{2}\log\big((2\pi e)^i \det(K_{r_1^i})\big) \overset{(b)}{\leq} \frac{i}{2}\log\big(2\pi e\,\lambda_{\max}(K_{r_1^i})\big) \overset{(c)}{\leq} \frac{i}{2}\log\big(2\pi e\,\lambda_{\max}(K_{r_1^\kappa})\big) = \frac{i}{2}\log\big(2\pi e\,\lambda_{\max}(K_{s_1^\kappa})\big), \quad \text{(A113)}$$
where (a) holds because a Gaussian distribution yields the largest differential entropy for a given covariance matrix, (b) is from the fact that $\det(K_{r_1^i}) = \prod_{k=1}^{i}\lambda_k(K_{r_1^i})$, and (c) is due to the Cauchy interlacing theorem [30]. This proves the upper bound in (A112). For the lower bound, we have
$$h(r_1^i) \overset{(a)}{\geq} h(r_1^\kappa) - h(r_{i+1}^\kappa) \overset{(b)}{\geq} h(r_1^\kappa) - \frac{\kappa - i}{2}\log\big(2\pi e\,\lambda_{\max}(K_{s_1^\kappa})\big),$$
where (a) stems from the fact that $h(a, b) \leq h(a) + h(b)$ and (b) follows from (A113) applied to the last $\kappa - i$ entries of $r_1^\kappa$. Since $h(r_1^\kappa) = h(s_1^\kappa)$, this is the lower bound in (A112), which completes the proof.   □
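As a quick sanity check of Proposition A1, the following sketch (illustrative only; it assumes a Gaussian $s_1^\kappa$ with a randomly generated covariance and a random orthogonal $A$, so both differential entropies admit closed forms) evaluates the two bounds in (A112) numerically.

```python
import numpy as np

rng = np.random.default_rng(1)
kappa, i = 8, 3

def gauss_h(cov):
    # Differential entropy (nats) of a zero-mean Gaussian with covariance `cov`.
    k = cov.shape[0]
    return 0.5 * (k * np.log(2 * np.pi * np.e) + np.linalg.slogdet(cov)[1])

# Gaussian s_1^kappa with an arbitrary positive-definite covariance.
M = rng.standard_normal((kappa, kappa))
K_s = M @ M.T + 0.1 * np.eye(kappa)
lam_max = np.linalg.eigvalsh(K_s)[-1]

# Random orthogonal (real unitary) A obtained from a QR decomposition.
A, _ = np.linalg.qr(rng.standard_normal((kappa, kappa)))
A_i = A[:i, :]                              # [A]_1^i: the first i rows of A

h_s = gauss_h(K_s)
h_proj = gauss_h(A_i @ K_s @ A_i.T)         # h([A]_1^i s_1^kappa)

upper = (i / 2) * np.log(2 * np.pi * np.e * lam_max)
lower = h_s - ((kappa - i) / 2) * np.log(2 * np.pi * np.e * lam_max)
print(lower <= h_proj <= upper)             # True: both bounds of (A112) hold
```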
Proposition A2.
Let $u_1^\infty$ be a random process such that the variance of $u(n)$, $\sigma^2_{u(n)}$, is finite for every finite $n$, and
$$\lim_{n\to\infty}\frac{1}{n}\log\big(\sigma^2_{u(n)}\big) = 0.$$
Then, $\lim_{n\to\infty} n^{-1}\log\big(\lambda_{\max}(K_{u_1^n})\big) = 0$.          ▲
Proof. 
The assumptions on $u_1^\infty$ imply that, for every $\epsilon > 0$, there exists a finite $N_\epsilon$ such that, for every $n \geq N_\epsilon$, $\sigma^2_{u(n)} < e^{n\epsilon}$ and $S(N_\epsilon) \triangleq \max\{\sigma^2_{u(1)}, \sigma^2_{u(2)}, \ldots, \sigma^2_{u(N_\epsilon)}\} < \infty$. Then,
$$\frac{1}{n}\ln\big(\lambda_{\max}(K_{u_1^n})\big) \overset{(a)}{\leq} \frac{1}{n}\ln\Big(\sum_{k=1}^{n}\sigma^2_{u(k)}\Big) < \frac{1}{n}\ln\big(N_\epsilon S(N_\epsilon) + (n - N_\epsilon)e^{n\epsilon}\big) \overset{(b)}{<} \frac{1}{n}\ln\big((n - N_\epsilon)e^{n\epsilon}\big) + \frac{N_\epsilon S(N_\epsilon)}{n(n - N_\epsilon)e^{n\epsilon}},$$
where (a) holds because $\sum_{k=1}^{n}\lambda_k(K_{u_1^n}) = \mathrm{tr}\{K_{u_1^n}\} = \sum_{k=1}^{n}\sigma^2_{u(k)}$, while (b) stems from the fact that, for every $x, y > 0$, $\ln(x + y) < \ln(x) + y/x$. Thus, for every $\epsilon > 0$, $\lim_{n\to\infty} n^{-1}\log\big(\lambda_{\max}(K_{u_1^n})\big) \leq \epsilon$, which means that $\lim_{n\to\infty} n^{-1}\log\big(\lambda_{\max}(K_{u_1^n})\big) = 0$, completing the proof.   □
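To illustrate Proposition A2, the following sketch uses an assumed toy process (not taken from the paper): AR(1)-type correlation scaled by linearly growing standard deviations, so that the sub-exponential growth condition on $\sigma^2_{u(n)}$ holds while the covariance matrix is far from diagonal.

```python
import numpy as np
from scipy.linalg import toeplitz

# Hypothetical non-stationary process: correlation 0.9^|i-j| scaled by
# standard deviations sigma_u(k) = k, so (1/n) log(sigma_u(n)^2) -> 0.
for n in [50, 200, 800, 1600]:
    R = toeplitz(0.9 ** np.arange(n))        # correlation matrix (Toeplitz)
    s = np.arange(1, n + 1, dtype=float)     # per-sample standard deviations
    K = np.outer(s, s) * R                   # covariance matrix of u_1^n
    lam_max = np.linalg.eigvalsh(K)[-1]
    print(n, np.log(lam_max) / n)            # decreases toward 0, as claimed
```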
Proposition A3.
Let $v_1^\infty$ be an entropy-balanced random process. Then, for each $\nu \in \mathbb{N}$ and for every sequence of matrices $\{\Psi_n\}_{n=\nu}^{\infty}$, with $\Psi_n \in \mathbb{R}^{\nu\times n}$ having orthonormal rows,
$$\lim_{n\to\infty}\frac{1}{n}\, h(\Psi_n v_1^n) = 0.$$
Proof of Proposition A3.
We will first show that
$$\lim_{n\to\infty}\frac{1}{n}\, h(\Psi_n v_1^n) \geq 0. \quad \text{(A117)}$$
To see this, notice that, for every $\Psi_n \in \mathbb{R}^{\nu\times n}$ with orthonormal rows, there exists a matrix $\Phi_n \in \mathbb{R}^{(n-\nu)\times n}$ with orthonormal rows which are also orthogonal to those of $\Psi_n$. This means that the matrix $[\Psi_n^T \,|\, \Phi_n^T]^T \in \mathbb{R}^{n\times n}$ is unitary; thus,
$$h(v_1^n) = h\big([\Psi_n^T \,|\, \Phi_n^T]^T v_1^n\big) \overset{(a)}{=} h(\Phi_n v_1^n) + h(\Psi_n v_1^n \mid \Phi_n v_1^n) \overset{(b)}{\leq} h(\Phi_n v_1^n) + h(\Psi_n v_1^n),$$
where (a) holds due to the chain rule of differential entropy and (b) follows because conditioning does not increase differential entropy. Therefore, $h(\Psi_n v_1^n) \geq h(v_1^n) - h(\Phi_n v_1^n)$. Dividing this by $n$, taking the limit as $n\to\infty$ and recalling that $v_1^\infty$ satisfies (17) yields (A117).
We will now prove that $\lim_{n\to\infty}\frac{1}{n}\, h(\Psi_n v_1^n) \leq 0$. For this purpose, let $\tilde{v}_1^\infty$ be a jointly Gaussian random process with the same second-order statistics as $v_1^\infty$. Then,
$$h(\Psi_n v_1^n) \leq h(\Psi_n \tilde{v}_1^n) = \frac{1}{2}\log\big((2\pi e)^{\nu}\det(\Psi_n K_{\tilde{v}_1^n}\Psi_n^T)\big) \leq \frac{1}{2}\log\big((2\pi e)^{\nu}\,\lambda_{\max}(K_{\tilde{v}_1^n})^{\nu}\big),$$
with the last inequality due to the fact that $\Psi_n$ has orthonormal rows. But $v_1^\infty$ meets the requirements of Proposition A2; thus, $\lim_{n\to\infty}\frac{1}{n}\, h(\Psi_n v_1^n) \leq \lim_{n\to\infty}\frac{\nu}{2n}\log\big(2\pi e\,\lambda_{\max}(K_{\tilde{v}_1^n})\big) = 0$. The proof is completed by combining this result with (A117).   □
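The upper-bound part of this argument is easy to reproduce numerically. The sketch below (illustrative only; it assumes an i.i.d. unit-variance Gaussian $v_1^\infty$, a simple entropy-balanced process, and builds $\Psi_n$ from a QR factorization) computes $h(\Psi_n v_1^n)/n$ in closed form and shows it vanishing.

```python
import numpy as np

rng = np.random.default_rng(2)
nu = 3                                       # fixed number of orthonormal rows

def gauss_h(cov):
    # Differential entropy (nats) of a zero-mean Gaussian with covariance `cov`.
    k = cov.shape[0]
    return 0.5 * (k * np.log(2 * np.pi * np.e) + np.linalg.slogdet(cov)[1])

for n in [10, 100, 1000]:
    Q, _ = np.linalg.qr(rng.standard_normal((n, nu)))
    Psi = Q.T                                # nu x n matrix with orthonormal rows
    K_v = np.eye(n)                          # i.i.d. unit-variance Gaussian v_1^n
    h_proj = gauss_h(Psi @ K_v @ Psi.T)      # stays constant at (nu/2) log(2*pi*e)
    print(n, h_proj / n)                     # tends to 0, as Proposition A3 states
```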
Lemma A1.
Let $u_1^\infty$ be a random process with independent elements, where each element $u_i$ is uniformly distributed over a possibly different interval $[-\tfrac{a_i}{2}, \tfrac{a_i}{2}]$, such that $a_{\max} > a_i > a_{\min} > 0$ for all $i \in \mathbb{N}$, for some positive and bounded $a_{\min} < a_{\max}$. Then, $u_1^\infty$ is entropy balanced.
Proof. 
Without loss of generality, we can assume that $a_i \geq 1$ for all $i$ (otherwise, we could scale the input by $1/a_{\min}$, which would scale the output by the same proportion, increasing the input entropy by $n\log(1/a_{\min})$ and the output entropy by $(n-\nu)\log(1/a_{\min})$, without changing the result). The input vector $u_1^n$ is confined to an $n$-box $\mathbb{U}^n$ (the support of $u_1^n$) of volume $V_n(\mathbb{U}^n) = \prod_{i=1}^{n} a_i$ and has entropy $\log\big(\prod_{i=1}^{n} a_i\big)$. This support is an $n$-box which contains $\binom{n}{k}\, 2^{n-k}$ $k$-boxes of different $k$-volume. Each of these $k$-boxes is determined by fixing $n-k$ entries of $u_1^n$ at $\pm a_i/2$ and letting the remaining $k$ entries sweep freely over $[-\tfrac{a_i}{2}, \tfrac{a_i}{2}]$. Thus, the $k$-volume of each $k$-box is the product of the $k$ support sizes $a_i$ of the associated free-sweeping entries. But recalling that $a_i \geq 1$ for all $i$, the volume of each $k$-box can be upper bounded by $\prod_{i=1}^{n} a_i$. With this, the total volume of all the $k$-boxes contained in the original $n$-box can be upper bounded as
$$V_k(\mathbb{U}^n) \leq \binom{n}{k}\, 2^{n-k} \prod_{i=1}^{n} a_i. \quad \text{(A120)}$$
We now use this result to upper bound the differential entropy of $y_{\nu+1}^n$.
Let $y_1^n \triangleq [\Psi_n^T \,|\, \Phi_n^T]^T u_1^n$, where $[\Psi_n^T \,|\, \Phi_n^T]^T \in \mathbb{R}^{n\times n}$ is a unitary matrix and where $\Psi_n \in \mathbb{R}^{\nu\times n}$ and $\Phi_n \in \mathbb{R}^{(n-\nu)\times n}$ have orthonormal rows. From this definition, $y_{\nu+1}^n$ will be distributed over a finite region $\mathbb{Y}_{\nu+1}^n \subset \mathbb{R}^{n-\nu}$, corresponding to the projection of $\mathbb{U}^n$ onto the $(n-\nu)$-dimensional span of the rows of $\Phi_n$. Hence, $h(y_{\nu+1}^n)$ is upper bounded by the entropy of a vector uniformly distributed over the same support, i.e., by $\log V_{n-\nu}(\mathbb{Y}_{\nu+1}^n)$, where $V_{n-\nu}(\mathbb{Y}_{\nu+1}^n)$ is the $(n-\nu)$-dimensional volume of this support. In turn, $V_{n-\nu}(\mathbb{Y}_{\nu+1}^n)$ is upper bounded by the sum of the volumes of all $(n-\nu)$-dimensional boxes contained in the $n$-box in which $u_1^n$ is confined, which we already denoted by $V_{n-\nu}(\mathbb{U}^n)$ and which is upper bounded as in (A120). Therefore,
$$h(y_{\nu+1}^n) \leq \log V_{n-\nu}(\mathbb{Y}_{\nu+1}^n) \leq \log V_{n-\nu}(\mathbb{U}^n) \leq \log\!\left(\frac{n!}{(n-\nu)!\,\nu!}\, 2^{\nu}\prod_{i=1}^{n} a_i\right)$$
$$= \log\big(n^{\nu} 2^{\nu}\big) + \log\!\left(\frac{n!}{(n-\nu)!\, n^{\nu}\,\nu!}\right) + \log\prod_{i=1}^{n} a_i.$$
Recalling that $h(u_1^n) = \log\big(\prod_{i=1}^{n} a_i\big)$, noting that the first term above grows only logarithmically with $n$ and that the second term is non-positive, dividing by $n$ and taking the limit as $n\to\infty$ yields
$$\lim_{n\to\infty}\frac{1}{n}\big(h(y_{\nu+1}^n) - h(u_1^n)\big) \leq 0. \quad \text{(A123)}$$
On the other hand,
$$h(y_{\nu+1}^n) = h(y_1^n) - h(y_1^{\nu} \mid y_{\nu+1}^n) \overset{(a)}{=} h(u_1^n) - h(y_1^{\nu} \mid y_{\nu+1}^n) \geq h(u_1^n) - h(y_1^{\nu}), \quad \text{(A124)}$$
where (a) follows because $[\Psi_n^T \,|\, \Phi_n^T]^T$ is an orthogonal matrix. Letting $(y_G)_1^{\nu}$ denote the jointly Gaussian random vector with the same second-order moments as $y_1^{\nu}$, and recalling that the Gaussian distribution maximizes differential entropy for a given covariance, we obtain the upper bound
$$h(y_1^{\nu}) \leq h\big((y_G)_1^{\nu}\big) \overset{(a)}{=} \frac{1}{2}\log\Big((2\pi e)^{\nu}\det\big(\Psi_n\,\mathrm{diag}\{\sigma^2_{u_i}\}_{i=1}^{n}\,\Psi_n^T\big)\Big) \overset{(b)}{\leq} \frac{\nu}{2}\log\Big(2\pi e\,\max\{\sigma^2_{u_i}\}_{i=1}^{n}\Big), \quad \text{(A125)}$$
where (a) follows since the $\{u_i\}_{i=1}^{n}$ are independent, and (b) stems from the fact that $\Psi_n \in \mathbb{R}^{\nu\times n}$ has orthonormal rows. Since $\max\{\sigma^2_{u_i}\}_{i=1}^{n}$ is bounded for all $n$, substituting (A125) into (A124) gives $\lim_{n\to\infty}\frac{1}{n}\big(h(y_{\nu+1}^n) - h(u_1^n)\big) \geq 0$. Combining this with (A123) yields $\lim_{n\to\infty}\frac{1}{n}\big(h(y_{\nu+1}^n) - h(u_1^n)\big) = 0$, so $u_1^\infty$ satisfies Condition ii) in Definition 2. The proof is completed by noting that $u_1^\infty$ also satisfies Condition i) in Definition 2.   □
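The combinatorial overhead that separates $h(y_{\nu+1}^n)$ from $h(u_1^n)$ in the chain leading to (A123) is at most $\log\big(\binom{n}{\nu}2^{\nu}\big)$, which grows only logarithmically in $n$. A small sketch (illustrative arithmetic only) confirms that this per-sample overhead vanishes:

```python
import math

nu = 4
# Per-sample overhead between h(y_{nu+1}^n) and h(u_1^n) in the chain leading
# to (A123): at most (1/n) * log( C(n, nu) * 2^nu ), which vanishes with n.
for n in [10, 100, 1000, 10_000, 100_000]:
    overhead = (math.log(math.comb(n, nu)) + nu * math.log(2)) / n
    print(n, overhead)
```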
Lemma A2.
Let $A(z)$ be a causal, finite-order, stable and strictly minimum-phase rational transfer function with impulse response $a_0, a_1, \ldots$ such that $a_0 = 1$. Then, $\lim_{n\to\infty}\lambda_1(A_n A_n^T) > 0$ and $\lim_{n\to\infty}\lambda_n(A_n A_n^T) < \infty$.
Proof of Lemma A2.
The fact that $\lim_{n\to\infty}\lambda_n(A_n A_n^T)$ is upper bounded follows directly from the fact that $A(z)$ is a stable transfer function. On the other hand, $A_n A_n^T$ is positive definite, with $\lim_{n\to\infty}\lambda_1(A_n A_n^T) \geq 0$. Suppose that $\lim_{n\to\infty}\lambda_1(A_n A_n^T) = 0$. If this were true, then it would hold that $\lim_{n\to\infty}\lambda_n(A_n^{-1} A_n^{-T}) = \infty$. But $A_n^{-1}$ is the lower-triangular Toeplitz matrix associated with $A^{-1}(z)$, which is stable (since $A(z)$ is minimum phase), implying that $\lim_{n\to\infty}\lambda_n(A_n^{-1} A_n^{-T}) < \infty$, thus leading to a contradiction. This completes the proof.   □
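A quick numerical illustration of Lemma A2 (a sketch under the assumption of the simple minimum-phase example $A(z) = 1 + 0.5\,z^{-1}$, chosen here only for concreteness): the extreme eigenvalues of $A_n A_n^T$ remain bounded away from $0$ and from $\infty$ as $n$ grows.

```python
import numpy as np
from scipy.linalg import toeplitz

# A(z) = 1 + 0.5 z^{-1}: causal, stable, strictly minimum phase (zero at -0.5), a_0 = 1.
a = [1.0, 0.5]

for n in [10, 50, 200, 800]:
    col = np.zeros(n)
    col[:len(a)] = a
    A_n = toeplitz(col, np.zeros(n))         # lower-triangular Toeplitz matrix of A(z)
    eig = np.linalg.eigvalsh(A_n @ A_n.T)
    print(n, eig[0], eig[-1])                # approach 0.25 and 2.25, the extrema of |A(e^{jw})|^2
```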
For completeness and convenience, we restate the unnumbered lemma appearing in the proof of Theorem 1 of Reference [16], as follows:
Lemma A3.
Let the transfer function $G(z)$ satisfy Assumption 1 and suppose it has no poles. Then,
$$\lambda_{\ell}(G_n G_n^T) = \begin{cases} \alpha_{n,\ell}^2\, (\rho_{\ell})^{-2n}, & \text{if } \ell \leq m,\\ \alpha_{n,\ell}^2, & \text{otherwise}, \end{cases}$$
where the elements in the sequence $\{\alpha_{n,\ell}\}$ are positive and increase or decrease at most polynomially with $n$.
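Lemma A3 can also be visualized numerically. The sketch below (illustrative only; it assumes the FIR example $G(z) = 1 - 2\,z^{-1}$, which has no poles, $|g_0| = 1$, and a single NMP zero at $\rho_1 = 2$) shows the smallest eigenvalue of $G_n G_n^T$ decaying essentially as $|\rho_1|^{-2n} = 4^{-n}$, so the ratio between values two samples of $n$ apart approaches $4^{-2} = 1/16$.

```python
import numpy as np
from scipy.linalg import toeplitz

# G(z) = 1 - 2 z^{-1}: FIR (no poles), |g_0| = 1, one NMP zero at rho_1 = 2.
g = [1.0, -2.0]
prev = None
for n in range(4, 17, 2):
    col = np.zeros(n)
    col[:len(g)] = g
    G_n = toeplitz(col, np.zeros(n))         # lower-triangular Toeplitz matrix of G(z)
    lam_min = np.linalg.eigvalsh(G_n @ G_n.T)[0]
    ratio = None if prev is None else lam_min / prev
    print(n, lam_min, ratio)                 # ratio approaches 4^{-2} = 0.0625
    prev = lam_min
```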
Lemma A4.
Let $A, B$ be matrices with the same dimensions. Then,
$$\lambda_{\min}\big((A+B)(A+B)^T\big) \;\geq\; \lambda_{\min}(A A^T) + \lambda_{\min}(B B^T) - 2\,\sigma_{\max}(A)\,\sigma_{\max}(B).$$
Proof. 
For every $x$ such that $\|x\| = 1$,
$$x^T (A+B)(A+B)^T x = x^T A A^T x + x^T B B^T x + x^T A B^T x + x^T B A^T x \;\geq\; \lambda_{\min}(A A^T) + \lambda_{\min}(B B^T) - 2\,\sigma_{\max}(A)\,\sigma_{\max}(B), \quad \text{(A128)}$$
where the last inequality holds because $A A^T$ and $B B^T$ are symmetric and because of the Cauchy–Schwarz inequality. The proof is completed by noting that (A128) holds for the $x$ that minimizes $x^T(A+B)(A+B)^T x$, and $\lambda_{\min}\big((A+B)(A+B)^T\big) = \min_{x:\|x\|=1} x^T (A+B)(A+B)^T x$.   □
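A randomized spot-check of Lemma A4 (a sketch only; it draws square Gaussian matrices, although the lemma merely requires matching dimensions):

```python
import numpy as np

rng = np.random.default_rng(3)

# Randomized check of Lemma A4 with square matrices of matching dimensions.
for _ in range(5):
    k = int(rng.integers(2, 8))
    A = rng.standard_normal((k, k))
    B = rng.standard_normal((k, k))
    lhs = np.linalg.eigvalsh((A + B) @ (A + B).T)[0]
    rhs = (np.linalg.eigvalsh(A @ A.T)[0] + np.linalg.eigvalsh(B @ B.T)[0]
           - 2 * np.linalg.svd(A, compute_uv=False)[0]
               * np.linalg.svd(B, compute_uv=False)[0])
    print(lhs >= rhs)                        # True (the bound is often loose or even negative)
```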

References

1. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; Wiley-Interscience: Hoboken, NJ, USA, 2006.
2. Shannon, C.E.; Weaver, W. The Mathematical Theory of Communication; University of Illinois Press: Urbana, IL, USA, 1949.
3. Itô, H. Principle of the minimum entropy in information theory. Proc. Jpn. Acad. 1953, 29, 194–197.
4. O’Neal, J. Bounds on subjective performance measures for source encoding systems. IEEE Trans. Inf. Theory 1971, 17, 224–231.
5. Pierobon, M.; Akyildiz, I.F. Capacity of a Diffusion-Based Molecular Communication System with Channel Memory and Molecular Noise. IEEE Trans. Inf. Theory 2013, 59, 942–954.
6. Akyildiz, I.F.; Pierobon, M.; Balasubramaniam, S. An Information Theoretic Framework to Analyze Molecular Communication Systems Based on Statistical Mechanics. Proc. IEEE 2019, 107, 1230–1255.
7. Aaron, M.R.; McDonald, R.A.; Protonotarios, E. Entropy power loss in linear sampled data filters. Proc. IEEE 1967, 55, 1093–1094.
8. Papoulis, A.; Pillai, S.U. Probability, Random Variables and Stochastic Processes, 3rd ed.; McGraw-Hill: New York, NY, USA, 1991.
9. Kim, Y.H. Feedback capacity of stationary Gaussian channels. IEEE Trans. Inf. Theory 2010, 56, 57–85.
10. Rudin, W. Real and Complex Analysis; McGraw-Hill: New York, NY, USA, 1987.
11. Serón, M.M.; Braslavsky, J.H.; Goodwin, G.C. Fundamental Limitations in Filtering and Control; Springer: London, UK, 1997.
12. Martins, N.C.; Dahleh, M.M.A. Fundamental limitations of performance in the presence of finite capacity feedback. In Proceedings of the 2005 American Control Conference, Portland, OR, USA, 8–10 June 2005.
13. Martins, N.; Dahleh, M. Feedback control in the presence of noisy channels: “Bode-like” fundamental limitations of performance. IEEE Trans. Autom. Control 2008, 53, 1604–1615.
14. Yu, S.; Mehta, P.G. Bode-Like Fundamental Performance Limitations in Control of Nonlinear Systems. IEEE Trans. Autom. Control 2010, 55, 1390–1405.
15. Gray, R.M. Information rates of autoregressive processes. IEEE Trans. Inf. Theory 1970, IT-16, 412–421.
16. Hashimoto, T.; Arimoto, S. On the rate-distortion function for the nonstationary Gaussian autoregressive process. IEEE Trans. Inf. Theory 1980, IT-26, 478–480.
17. Gray, R.M.; Hashimoto, T. A note on rate-distortion functions for nonstationary Gaussian autoregressive processes. IEEE Trans. Inf. Theory 2008, 54, 1319–1322.
18. Gray, R.M. Toeplitz and circulant matrices: A review. Found. Trends Commun. Inf. Theory 2006, 2, 155–239.
19. Rényi, A. On the dimension and entropy of probability distributions. Acta Math. Hung. 1959, 10, 193–215.
20. Shannon, C. Coding theorems for a discrete source with a fidelity criterion. IRE Nat. Conv. Rec. 1959, 4, 143–163.
21. Goodwin, G.C.; Graebe, S.F.; Salgado, M.E. Control System Design, 1st ed.; Prentice Hall PTR: Upper Saddle River, NJ, USA, 2000.
22. Chen, C.T. Linear System Theory and Design, 3rd ed.; The Oxford Series in Electrical and Computing Engineering; Oxford University Press: Oxford, UK, 1999.
23. Elia, N. When Bode meets Shannon: Control-oriented feedback communication schemes. IEEE Trans. Autom. Control 2004, 49, 1477–1488.
24. Silva, E.I.; Derpich, M.S.; Østergaard, J. A framework for control system design subject to average data-rate constraints. IEEE Trans. Autom. Control 2011, 56, 1886–1899.
25. Silva, E.I.; Derpich, M.S.; Østergaard, J. An achievable data-rate region subject to a stationary performance constraint for LTI plants. IEEE Trans. Autom. Control 2011, 56, 1968–1973.
26. Yüksel, S. Characterization of Information Channels for Asymptotic Mean Stationarity and Stochastic Stability of Nonstationary/Unstable Linear Systems. IEEE Trans. Inf. Theory 2012, 58, 6332–6354.
27. Freudenberg, J.S.; Middleton, R.H.; Braslavsky, J.H. Minimum Variance Control Over a Gaussian Communication Channel. IEEE Trans. Autom. Control 2011, 56, 1751–1765.
28. Ardestanizadeh, E.; Franceschetti, M. Control-theoretic approach to communication with feedback. IEEE Trans. Autom. Control 2012, 57, 2576–2587.
29. Grenander, U.; Szegö, G. Toeplitz Forms and Their Applications; University of California Press: Berkeley, CA, USA, 1958.
30. Horn, R.A.; Johnson, C.R. Matrix Analysis; Cambridge University Press: Cambridge, UK, 1985.
31. Fiedler, M. Bounds for the determinant of the sum of Hermitian matrices. Proc. Am. Math. Soc. 1971, 30, 27–31.
Figure 1. A causal, stable, linear and time-invariant system G with input and output processes, initial state, and output disturbance.
Figure 2. Support of $u$ (lying in the $u$–$v$ plane) compared to that of $y = \breve{G}u$ (the rhombus in $\mathbb{R}^3$).
Figure 3. Image of the cube $[0,1]^3$ through the square matrix with columns $[1\;2\;0]^T$, $[0\;1\;2]^T$, and $[0\;0\;1]^T$.
Figure 4. (Left): LTI system P within a noisy feedback loop. (Right): equivalent system when the feedback channel is noiseless and has unit gain.
Figure 5. (Top): The class of feedback channels described by Assumption 3. (Bottom): an equivalent form.
Figure 6. Block-diagram representation of how the non-stationary source $x_1^\infty$ is built and then reconstructed as $y = x + u$.
Figure 7. Block-diagram representation of the changes of variables in the proof of Theorem 9.
Figure 8. Block-diagram representation of a non-white Gaussian channel $y = x + z$ and the coding scheme considered in Reference [9].