Article

The Entropy Gain of Linear Systems and Some of Its Implications

1 Department of Electronic Engineering, Universidad Técnica Federico Santa María, Av. España 1680, Valparaíso 2390123, Chile
2 Department of Electronic Systems, Aalborg University, 9220 Aalborg, Denmark
* Authors to whom correspondence should be addressed.
Entropy 2021, 23(8), 947; https://doi.org/10.3390/e23080947
Submission received: 23 May 2021 / Revised: 12 July 2021 / Accepted: 20 July 2021 / Published: 24 July 2021
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract

We study the increase in per-sample differential entropy rate of random sequences and processes after being passed through a non-minimum-phase (NMP) discrete-time, linear time-invariant (LTI) filter G. For LTI discrete-time filters and random processes, it has long been established by Theorem 14 in Shannon's seminal paper that this entropy gain, $\mathcal{G}(G)$, equals the integral of $\log|G(e^{j\omega})|$. In this note, we first show that Shannon's Theorem 14 does not hold in general. Then, we prove that, when comparing the input differential entropy to that of the entire (longer) output of G, the entropy gain equals $\mathcal{G}(G)$. We show that the entropy gain between equal-length input and output sequences is upper bounded by $\mathcal{G}(G)$ and arises if and only if there exists an output additive disturbance with finite differential entropy (no matter how small) or a random initial state. Unlike what happens with linear maps, the entropy gain in this case depends on the distribution of all the signals involved. We illustrate some of the consequences of these results by presenting their implications in three different problems. Specifically: conditions for equality in an information inequality of importance in networked control problems; extending to a much broader class of sources the existing results on the rate-distortion function for non-stationary Gaussian sources; and an observation on the capacity of auto-regressive Gaussian channels with feedback.

1. Introduction

We study the difference between the differential entropy rate of a random process $u_1^\infty = \{u_1, u_2, \ldots\}$ entering a discrete-time linear time-invariant (LTI) system G and the differential entropy rate of its (possibly noisy) output $y_1^\infty$, as depicted in Figure 1.
Recall that the differential entropy rate of a random process $x_1^\infty$ is given by $\bar{h}(x_1^\infty) \triangleq \lim_{n\to\infty} n^{-1} h(x_1, x_2, \ldots, x_n)$, provided the limit exists, where $h(x_1,\ldots,x_n) = -\mathbb{E}\log f(x_1,\ldots,x_n)$ is the differential entropy of the ensemble $x_1,\ldots,x_n$ with probability density function (PDF) f [1]. The system G is supposed to satisfy the following:
Assumption 1.
The LTI system G in Figure 1 is causal and stable and such that
1.
G has a rational p-th order transfer function $G(z)$ with m zeros $\{\rho_i\}_{i=1}^{m}$ outside the unit circle, i.e., non-minimum-phase (NMP) zeros, where $m \in \{0, 1, \ldots, p\}$, indexed in non-increasing magnitude order, i.e., $|\rho_1| \geq |\rho_2| \geq \cdots \geq |\rho_m| > 1$.
2.
The unit-impulse response of G, say, $g_0, g_1, \ldots$, satisfies $|g_0| = 1$.
In this general setup, G may have a random initial state vector $\mathbf{x}_0 \in \mathbb{R}^p$, $p \in \mathbb{N}$, and a real-valued random output disturbance $z_1^\infty$. Our main purpose is to characterize the limit
$\mathcal{G}(G, \mathbf{x}_0, u_1^\infty, z_1^\infty) \triangleq \lim_{n\to\infty} \frac{1}{n}\big[h(y_1^n) - h(u_1^n)\big]$,   (1)
evaluating the possible effect produced by $\mathbf{x}_0$ and $z_1^\infty$. This difference can be interpreted as the entropy gain (entropy amplification or entropy boost) introduced by the filter G and (as apparent from the other variables in the argument of $\mathcal{G}$) the statistics of $\mathbf{x}_0$, $u_1^\infty$, $z_1^\infty$. We shall refer to the special case in which $\mathbf{x}_0$ and $z_1^\infty$ are both zero (or deterministic) as the noise-less case, and write $\mathcal{G}(G, 0, u_1^\infty, 0)$ accordingly.
The earliest reference related to this problem corresponds to a noise-less continuous-time counterpart considered by Shannon. In his seminal 1948 paper [2], Shannon gave a formula for the change in differential entropy per degree of freedom that a continuous-time random process $u_c$, band-limited to a frequency range $[0, B)$ (in Hz), experiences after passing through an LTI continuous-time filter $G_c$ (without considering a random initial state or an output disturbance). Such entropy per degree of freedom is defined in terms of uniformly taken samples as
$\bar{h}(u_c) \triangleq \lim_{n\to\infty} \frac{1}{n}\, h\big(u_c(T), u_c(2T), \ldots, u_c(nT)\big)$,   (2)
with $T \triangleq 1/(2B)$. In this formula, if the LTI filter has frequency response $G_c(\xi)$ (with $\xi$ in Hz), then the resulting differential entropy rate of the output process $y_c$ is given by the following theorem:
Theorem 1
(Reference [2], Theorem 14). If an ensemble having an entropy $\bar{h}(u_c)$ per degree of freedom in band B is passed through a filter with characteristic $G_c(\xi)$, the output ensemble has an entropy
$\bar{h}(y_c) = \bar{h}(u_c) + \frac{2}{B}\int_0^B \log\big|G_c(\xi)\big|\, d\xi$.   (3)
Shannon arrived at (3) by arguing that an LTI filter can be seen as a linear operator that selectively scales its input signal along infinitely many frequencies, each of them representing an orthogonal component of the source. He then obtained the result by writing down the determinant of the Jacobian of this operator as the product of the squared frequency response magnitude of the filter over n frequency bands, applying logarithm, dividing by n, and then taking the limit as n tends to infinity.
Remark 1.
There is a factor of two in excess in the integral on the right-hand side (RHS) of (3). To see this, consider a filter with a constant gain a over $[0, B)$ (i.e., a simple multiplicative factor). In such a case, the entropy rate of $y_c$ should exceed that of $u_c$ by $\log a$ [1]. However, (3) yields an entropy gain equal to $2\log a$. This error arises because the determinant of the Jacobian of the transformation is actually the product of $|G_c|$ over the n frequency bands considered in Shannon's argument. Such excess factor of two is also present in the entropy losses appearing in Reference [2], Table 1.
Theorem 14 in Reference [2] has found application in works ranging from traditional themes, such as linear prediction [3] and source coding [4], to molecular communication systems [5,6].
The available literature treating the phenomenon itself of the entropy gain (loss, boost, or amplification) induced by LTI systems seems to be rather scarce. This is not surprising given that (3) was published in Reference [2], Theorem 14, the work which gave birth to Information Theory.
The next publication addressing this problem is Reference [7], which follows a time-domain analysis for the corresponding discrete-time problem. In this approach, one can obtain $y_1^n \triangleq \{y(1), y(2), \ldots, y(n)\}$ as a function of $u_1^n$, for every $n \in \mathbb{N}$, and evaluate the difference between the limits $\bar{h}(y_1^\infty)$ and $\bar{h}(u_1^\infty)$, obtained by letting $n \to \infty$. More precisely, for an LTI discrete-time filter G with impulse response $g_0^\infty = \{g_0, g_1, \ldots\}$, we can write
$y_n^1 = \underbrace{\begin{bmatrix} g_0 & 0 & \cdots & 0 \\ g_1 & g_0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ g_{n-1} & g_{n-2} & \cdots & g_0 \end{bmatrix}}_{\mathbf{G}_n}\, u_n^1$,   (4)
where we adopt the notation $y_n^1$ for column vectors to avoid the abuse of notation incurred by treating the sequence $y_1^n$ as a vector, and because, by writing $y_n^1$, it is easier to remember that its samples are ordered from top to bottom; here $y_n^1 \triangleq [y(1)\ y(2)\ \cdots\ y(n)]^T$, and the random vector $u_n^1$ is defined likewise. From this, it is clear (see, e.g., the corollary after Theorem 8.6.4 in Reference [1]) that
$h(y_n^1) = h(u_n^1) + \log\big|\det(\mathbf{G}_n)\big|$,   (5)
where $\det(\mathbf{G}_n)$ (or simply $\det\mathbf{G}_n$) stands for the determinant of $\mathbf{G}_n$. This result is utilized in Reference [7] to show that no entropy gain is produced by a stable minimum-phase LTI system G if and only if the first sample of its impulse response has unit magnitude.
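As a quick numerical illustration of (5) (added here; the filter is an assumed example, not one from the paper), the following Python sketch builds the lower-triangular Toeplitz matrix $\mathbf{G}_n$ of (4) for a filter with $g_0 = 1$ and an NMP zero, and confirms that $\log|\det(\mathbf{G}_n)|/n = \log|g_0| = 0$ for every n, despite the NMP zero.

```python
import numpy as np

# Assumed example filter: g = (1, -2), i.e., G(z) = 1 - 2 z^{-1}, with an NMP zero at z = 2.
g = np.array([1.0, -2.0])
n = 8

# Lower-triangular Toeplitz matrix G_n of (4): [G_n]_{i,j} = g_{i-j} for i >= j.
Gn = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1):
        if i - j < len(g):
            Gn[i, j] = g[i - j]

# Since G_n is lower triangular with g_0 on its diagonal, det(G_n) = g_0^n = 1,
# so the per-sample gain log|det(G_n)|/n in (5) is log|g_0| = 0 for every n.
print(np.log(np.abs(np.linalg.det(Gn))) / n)   # 0.0
```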
In Reference [8], p. 568, the entropy gain of a discrete-time LTI system G (the noise-less version of the setup depicted in Figure 1) is found to be
$\bar{h}(y_1^\infty) = \bar{h}(u_1^\infty) + \frac{1}{2\pi}\int_{-\pi}^{\pi} \log\big|G(e^{j\omega})\big|\, d\omega$,   (6)
where $y_1^\infty$ is the filter's discrete-time output process (without the effect of a random initial state or an output disturbance) and
$\bar{h}(y_1^\infty) \triangleq \lim_{n\to\infty} \frac{1}{n}\, h(y_1^n)$.   (7)
This result was obtained starting from the fact that, for a Gaussian stationary process $u_1^\infty$ with power spectral density (PSD) $S_u(e^{j\omega})$, $\bar{h}(u_1^\infty) = \frac{1}{2}\log(2\pi e) + \frac{1}{4\pi}\int_{-\pi}^{\pi}\log S_u(e^{j\omega})\, d\omega$. If $u_1^\infty$ enters a discrete-time LTI system with frequency response $G(e^{j\omega})$, then the PSD of its output $y_1^\infty$ is $S_y(e^{j\omega}) = S_u(e^{j\omega})\,|G(e^{j\omega})|^2$; thus, it is argued that (6) follows for Gaussian stationary inputs. Then, Reference [8] extends the result to non-Gaussian inputs with a proof sketch which uses a time-domain relation, like (4), to point out that the filter is a linear operator and, as such, the differential entropy of its output exceeds that of its input by a quantity that is independent of the input distribution. (It is worth noting that (6) is the discrete-time equivalent of (3) (without its erroneous factor of 2), which follows directly from the correspondence between sampled band-limited continuous-time systems and discrete-time systems.)
It is in Reference [9], Section II-C, where, for the first time, it is shown that, for a stationary Gaussian input $u_1^\infty$, the full entropy gain predicted by (6) takes place if the system output $y_1^\infty$ is contaminated by an additive output disturbance of length p and positive-definite covariance matrix, where p is the order of $G(z)$.
The integral $\frac{1}{2\pi}\int_{-\pi}^{\pi}\log|G(e^{j\omega})|\,d\omega$ can be related to the structure of the filter G. It is well known (from Jensen's formula) that if G has a causal and stable rational transfer function $G(z)$ and an impulse response whose first sample is $g_0 = \lim_{z\to\infty} G(z)$, then
$\frac{1}{2\pi}\int_{-\pi}^{\pi} \log\big|G(e^{j\omega})\big|\, d\omega = \log|g_0| + \sum_{i:\,|\rho_i|>1} \log|\rho_i|$,   (8)
where $\{\rho_i\}$ are the zeros of $G(z)$ (see, e.g., References [10,11]). This provides a straightforward formula to evaluate $\frac{1}{2\pi}\int_{-\pi}^{\pi}\log|G(e^{j\omega})|\,d\omega$ for a given LTI filter with rational transfer function $G(z)$. When combined with (6), this equation also reveals that if the entropy gain $\mathcal{G}(u_1^\infty, y_1^\infty)$ is negative (i.e., if it corresponds to an entropy loss), then $|g_0| < 1$ (with the corresponding change of variables, this is the case in all the examples given by Shannon in Reference [2], Table 1). More importantly, (8) allows us to concentrate, without loss of generality, on LTI systems $G(z)$ whose first impulse-response sample has unit magnitude, as required by Assumption 1. Under the latter condition, (8) shows that the entropy gain is greater than zero if and only if $G(z)$ has zeros outside the unit disk $\mathbb{D} \triangleq \{\rho \in \mathbb{C} : |\rho| \leq 1\}$. A system with the latter property is said to be non-minimum phase (NMP); conversely, a system with all its zeros inside $\mathbb{D}$ is said to be minimum phase (MP) [11].
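As a numeric sanity check of (8) (an illustration added here, with an arbitrarily chosen filter rather than one from the paper), the integral can be approximated by averaging $\log|G(e^{j\omega})|$ over a fine frequency grid and compared with $\log|g_0|$ plus the log-magnitudes of the NMP zeros:

```python
import numpy as np

# Assumed example: G(z) = 1 - 2.5 z^{-1} + z^{-2} = (1 - 2 z^{-1})(1 - 0.5 z^{-1}),
# so g_0 = 1, with one NMP zero at z = 2 and one MP zero at z = 0.5.
g = np.array([1.0, -2.5, 1.0])

w = np.linspace(0, 2 * np.pi, 200_000, endpoint=False)
Gw = np.polyval(g[::-1], np.exp(-1j * w))        # G(e^{jw}) = sum_k g_k e^{-jwk}
lhs = np.mean(np.log(np.abs(Gw)))                # ~ (1/2pi) * integral of log|G(e^{jw})|

rhs = np.log(abs(g[0])) + np.log(2.0)            # log|g_0| + sum over NMP zeros of log|rho_i|
print(lhs, rhs)                                  # both ~ 0.6931 = log 2
```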

1.1. Main Contributions of this Paper

The main contributions of this paper can be summarized as follows:
1. Our first main result is showing that (6) and (3) do not hold for a large class of continuous-time filters and inputs. To see this, notice that
   $|g_0| = 1 \implies |\det(\mathbf{G}_n)| = 1, \quad \forall n \in \mathbb{N}$,   (9)
   which, in view of (5), is equivalent to $h(y_n^1) = h(u_n^1)$, $\forall n \in \mathbb{N}$. In turn, this implies that $\bar{h}(y_1^\infty) - \bar{h}(u_1^\infty) = 0$, regardless of whether $G(z)$ (i.e., the polynomial $g_0 + g_1 z^{-1} + \cdots$) has zeros with magnitude greater than one (choose, for example, $g_0 = 1$, $g_1 = 2$, and $g_k = 0$ for $k \geq 2$). This reveals that (6) holds if and only if $G(z)$ is MP. But (6) and (3) are equivalent (correcting for the excess factor of 2 discussed in Remark 1); thus, Theorem 14 in Reference [2] also does not hold for a class of continuous-time filters. However, the transfer function $G_c(s)$ of a band-limited continuous-time filter $G_c$ is defined only for imaginary values of s (because the bilateral Laplace transform of $\sin(t)/t$ converges only on the imaginary axis), so one cannot classify such filters as MP or NMP. Instead, we consider a class of continuous-time filters limited to the frequencies in the band $[0, B)$, where $B > 0$ is in [Hz], defined by having a unit-impulse response of the form
   $g(t) \triangleq \sum_{k=0}^{\eta} g_k\, \phi_k(t)$,   (10)
   for some absolutely summable sequence of real-valued coefficients $\{g_i\}_{i=0}^{\eta}$, $\eta = 1, 2, \ldots$, where the sinc functions
   $\phi_k(t) \triangleq \dfrac{\sin\big(2\pi B\,[t - k/(2B)]\big)}{\pi\,[t - k/(2B)]}$.   (11)
   Since every such g satisfies $g(k/(2B)) = 0$ for $k < 0$, it makes sense to refer to such filters as "sample-wise causal". For this class of band-limited filters, we show that Theorem 14 holds if and only if the z-transform of $\{g_i\}_{i=0}^{\eta}$ is MP:
Theorem 2.
Suppose $G_c$ is a low-pass continuous-time filter with unit-impulse response as in (10). Let the continuous-time random input of $G_c$ be
$u_c(t) = \frac{1}{2B}\sum_{k=1}^{\infty} u(k)\, \phi_k(t)$,   (12)
for some random sequence $\{u(k)\}_{k=1}^{\infty}$, with $\phi_k$ as in (11), and denote its output as $y_c$. Then,
$\bar{h}(y_c) - \bar{h}(u_c) = \log|g_0| = \log\left|\frac{1}{B}\int_0^B \Re\{G_c(\xi)\}\, d\xi\right| \overset{(a)}{\leq} \frac{1}{B}\int_0^B \log\big|G_c(\xi)\big|\, d\xi$,   (13)
with equality in (a) if and only if the polynomial $g_0 + g_1 z^{-1} + g_2 z^{-2} + \cdots$ has no roots outside the unit circle.
2.
We show that $\frac{1}{2\pi}\int_{-\pi}^{\pi}\log|G(e^{j\omega})|\,d\omega$ actually corresponds to the entropy gain introduced by G when considering the new notion of effective differential entropy rate of $y_1^\infty$ proposed in this paper, defined next.
Definition 1
(The Effective Differential Entropy). Let $\mathbf{y} \in \mathbb{R}^{\ell}$ be a random vector. If $\mathbf{y}$ can be written as a linear transformation $\mathbf{y} = \mathbf{S}\mathbf{u}$, for some $\mathbf{u} \in \mathbb{R}^{n}$ ($n \leq \ell$) with bounded differential entropy, $\mathbf{S} \in \mathbb{R}^{\ell\times n}$, then the effective differential entropy of $\mathbf{y}$ is defined as
$\breve{h}(\mathbf{y}) \triangleq h(\mathbf{A}\mathbf{y})$,   (14)
where $\mathbf{S} = \mathbf{A}^T\mathbf{T}\mathbf{C}$ is an SVD for $\mathbf{S}$, with $\mathbf{T} \in \mathbb{R}^{n\times n}$.
We can now state our second main result, the proof of which is in Appendix A:
Theorem 3.
Let $u_1^\infty$ be the input of an LTI system G with transfer function $G(z)$ without zeros on the unit circle and with an absolutely summable unit-impulse response $\{g_i\}_{i=0}^{\eta-1}$, with $\eta = \infty$ if G has an infinite impulse response. Denote the output of G as $y_1^\infty$. Suppose $h(u_1^n) < \infty$ for every finite n. Then,
$\lim_{n\to\infty} \frac{1}{n}\Big[\breve{h}\big(y_1^{n+\eta}(u_1^n)\big) - h(u_1^n)\Big] = \frac{1}{2\pi}\int_{-\pi}^{\pi} \log\big|G(e^{j\omega})\big|\, d\omega$,   (15)
where $y_1^{n+\eta}(u_1^n)$ denotes the entire response of G to the input $u_1^n$.
Theorem 3 states that, when considering the full-length output of a system, the effective entropy gain is introduced by the system itself.
Section 4 provides a geometrical description of the phenomenon behind Definition 1 and Theorem 3.
3.
We show that $\frac{1}{2\pi}\int_{-\pi}^{\pi}\log|G(e^{j\omega})|\,d\omega$ is a tight upper bound on the entropy gain of G (as defined in (1)) when the output is contaminated by some additional additive signal, such as a random initial state (represented by $\mathbf{x}_0$ in Figure 1) or an output disturbance (such as $z_1^\infty$ in Figure 1) with sufficiently many degrees of freedom (a condition formally stated in Assumption 2 below). Moreover, we show that an entropy gain equal to the latter upper bound can appear even when these disturbances or the random initial state have infinitesimally small variances. To the best of our knowledge, the latter phenomenon has been discussed in the literature first (and only) in Reference [9], Section II-C, for Gaussian stationary inputs and an LTI filter. We go beyond the latter result by explicitly and fully characterizing the entropy gain of LTI systems for a large class of input processes that are not necessarily Gaussian or stationary. We refer to this class as entropy-balanced processes, formally specified in the following definition:
Definition 2.
A random process $\{v(k)\}_{k=1}^{\infty}$ is said to be entropy balanced if the following two conditions are satisfied:
(i) 
Its sample variances $\sigma_{v(n)}^2$ are finite for finite n and
$\lim_{n\to\infty} \frac{1}{n}\log\big(\sigma_{v(n)}^2\big) = 0$.   (16)
(ii) 
For every $\nu \in \mathbb{N}$ and for every sequence of matrices $\{\boldsymbol{\Phi}_n\}_{n=\nu+1}^{\infty}$, $\boldsymbol{\Phi}_n \in \mathbb{R}^{(n-\nu)\times n}$, with orthonormal rows,
$\lim_{n\to\infty} \frac{1}{n}\big[h(\boldsymbol{\Phi}_n v_n^1) - h(v_n^1)\big] = 0$.   (17)
The second condition guarantees that projecting an entropy-balanced process onto any subspace having finitely fewer dimensions yields a process with the same differential entropy rate.
The entropy gain induced by finite-length output disturbances is characterized by our next theorem.
Theorem 4.
In the system of Figure 1, let G satisfy Assumption 1 and suppose that $u_1^\infty$ is entropy balanced. Suppose the random output disturbance $z_1^\infty$ is such that $z(i) = 0$ for all $i > \kappa$, and that $|h(z_1^\kappa)| < \infty$. Let $\bar{\kappa} \triangleq \min\{\kappa, m\}$, where m is the number of NMP zeros of $G(z)$. Then,
$\sum_{i=m-\bar{\kappa}+1}^{m} \log|\rho_i| \;\leq\; \limsup_{n\to\infty} \frac{1}{n}\big(h(y_n^1) - h(u_n^1)\big) \;\leq\; \sum_{i=1}^{\bar{\kappa}} \log|\rho_i| \;\overset{(a)}{\leq}\; \frac{1}{2\pi}\int_{-\pi}^{\pi} \log\big|G(e^{j\omega})\big|\, d\omega$   (18)
with equality in (a) if and only if $\kappa \geq m$.
The proof is presented in Section 6.4, and we provide geometrical insight explaining the phenomenon underlying Definition 2 and Theorem 4 in Section 5.1.
4.
We illustrate the relevance of the results summarized above by applying them to three problems in three areas, namely:
(a)
Networked Control: We show that equality holds in the inequality stated in Reference [12], Lemma 3.2 (a fundamental piece for the performance limitation results further developed in Reference [13]), under very general conditions. In addition, we extend the validity of a related equality for the perfect-feedback case, given by Reference [14], Theorem 14, for Gaussian signals, to the much larger class of entropy-balanced processes.
(b)
The rate-distortion function for non-stationary Gaussian sources: This problem has been previously solved in References [15,16,17]. We provide a simpler proof based upon the results described above. This proof extends the result stated in References [16,17] to a broader class of non-stationary sources.
(c)
Gaussian channel capacity with feedback: We show that capacity results based on using a short random sequence as channel input and relying on a feedback filter which boosts the entropy rate of the end-to-end channel noise (such as the one proposed in Reference [9]), crucially depend upon the complete absence of any additional disturbance anywhere in the system. Specifically, we show that the information rate of such capacity-achieving schemes drops to zero in the presence of any such additional disturbance. As a consequence, the relevance of characterizing the robust (i.e., in the presence of disturbances) feedback capacity of Gaussian channels, which appears to be a fairly unexplored problem, becomes evident.

1.2. Paper Outline

The remainder of this paper begins with some necessary definitions and preliminary results in Section 2. It continues with our detailed exposition in Section 3 of why Shannon's reasoning fails to yield the right expression for the entropy gain. We present an intuitive discussion leading to the definition of effective differential entropy in Section 4, which ends with the proof of Theorem 3. Section 5 gives a geometric interpretation of how an arbitrarily small additive perturbation is able to boost the differential entropy rate of the process coming out of an NMP LTI filter. This exposition helps in understanding, and justifies the introduction of, entropy-balanced random processes, which are also characterized there. Section 6 and Section 7 contain our results for the entropy gain produced by an output disturbance and a random initial state, respectively. Our illustrative application results are presented in Section 8, followed by our conclusions in Section 9. Except when presented right after a statement or in its own section, all proofs are given in Appendix B.

2. Preliminaries

2.1. Notation

The sets of natural, real, and complex numbers are denoted $\mathbb{N}$, $\mathbb{R}$, and $\mathbb{C}$, respectively. For a complex x, $\Re\{x\}$ is the real part of x. For a set $\mathcal{S}$, the indicator function $\mathbb{1}_{\mathcal{S}}(x)$ equals 1 if $x \in \mathcal{S}$ and 0 otherwise. For any LTI system G, the transfer function $G(z)$ corresponds to the z-transform of the impulse response $g_0, g_1, \ldots$, i.e., $G(z) = \sum_{i=0}^{\infty} g_i z^{-i}$. For a transfer function $G(z)$, we denote by $\mathbf{G}_n \in \mathbb{R}^{n\times n}$ the lower-triangular Toeplitz matrix having $[g_0\ \cdots\ g_{n-1}]^T$ as its first column. We write $x_1^n$ as a shorthand for the sequence $\{x_1, \ldots, x_n\}$, and, when convenient, we write $x_1^n$ in vector form as $x_n^1 \triangleq [x_1\ x_2\ \cdots\ x_n]^T$, where $(\cdot)^T$ denotes transposition. Random scalars are denoted using non-italic characters, such as x, and random vectors using non-italic boldface characters, such as $\mathbf{x}$. The notation $x \perp y$ means x and y are independent. If x and z are conditionally independent given y, we write $x \leftrightarrow y \leftrightarrow z$. For matrices, we use upper-case boldface symbols, such as $\mathbf{A}$. We write $\lambda_i(\mathbf{A})$ to denote the i-th eigenvalue of $\mathbf{A}$, sorted in increasing magnitude. If $\mathbf{A} \in \mathbb{C}^{m\times n}$, $\mathbf{A}^H$ is its conjugate transpose, and $\sigma_i(\mathbf{A}) \triangleq \sqrt{\lambda_i(\mathbf{A}^H\mathbf{A})}$ if $m \geq n$, and $\sigma_i(\mathbf{A}) \triangleq \sqrt{\lambda_i(\mathbf{A}\mathbf{A}^H)}$ if $m < n$. We define $\sigma_{\min}(\mathbf{A}) \triangleq \sigma_1(\mathbf{A})$ and $\sigma_{\max}(\mathbf{A}) \triangleq \sigma_{\min\{m,n\}}(\mathbf{A})$. The term $\mathbf{A}_{i,j}$ denotes the entry in the intersection of the i-th row and the j-th column. If $\mathbf{A} \in \mathbb{C}^{m\times n}$, then $\mathbf{A}^T$ and $\mathbf{A}^*$ denote the transpose and conjugate transpose of $\mathbf{A}$, respectively. We write $[\mathbf{A}]_{i_1}^{i_2}$, with $1 \leq i_1 \leq i_2 \leq m$, to refer to the matrix formed by selecting the rows $i_1$ to $i_2$ of $\mathbf{A}$. Likewise, for $1 \leq j_1 \leq j_2 \leq n$, $\mathbf{A}_{j_1:j_2}$ is the matrix built with columns $j_1$ to $j_2$ of $\mathbf{A}$. The expression ${}_{m_1}[\mathbf{A}]_{m_2}$ corresponds to the square sub-matrix along the main diagonal of $\mathbf{A}$ with its top-left and bottom-right corners on $\mathbf{A}_{m_1,m_1}$ and $\mathbf{A}_{m_2,m_2}$, respectively. A diagonal matrix whose entries are the elements in a set $\mathcal{D}$ (wherein elements may be repeated) is denoted as $\mathrm{diag}\,\mathcal{D}$. If $\mathbf{A} \in \mathbb{R}^{n\times m_1}$ and $\mathbf{B} \in \mathbb{R}^{n\times m_2}$, we write $[\mathbf{A}\,|\,\mathbf{B}] \in \mathbb{R}^{n\times(m_1+m_2)}$ to denote the augmented matrix built by placing the columns of $\mathbf{A}$ followed by those of $\mathbf{B}$.

2.2. Mutual Information and Differential Entropy

Let x, y, and z be random variables with joint PDF $f_{x,y,z}$ and marginal PDFs $f_x$, $f_y$, and $f_z$, respectively. The mutual information between x and y is defined as $I(x;y) \triangleq \int f_{x,y}(x,y)\log\frac{f_{x,y}(x,y)}{f_x(x)f_y(y)}\,dx\,dy$. The conditional mutual information between x and y given z is defined as $I(x;y|z) \triangleq \int f_{x,y,z}(x,y,z)\log\frac{f_{x,y|z}(x,y|z)}{f_{x|z}(x|z)f_{y|z}(y|z)}\,dx\,dy\,dz$, where $f_{x,y|z}$ is the joint PDF of x and y given z, and $f_{x|z}$, $f_{y|z}$ are defined likewise. The conditional differential entropy of x given y is defined as $h(x|y) \triangleq -\int f_{x,y}(x,y)\log\big(f_{x|y}(x|y)\big)\,dx\,dy$.
From these definitions, it is easy to verify the following properties (Reference [1], Sections 2.4–2.6 and 8.4–8.6):
  • Shift invariance: for every deterministic function f,
    $h(x + f(y)\,|\,y) = h(x|y)$.   (19)
  • Non-negativity:
    $I(x;y) \geq 0$,   (20)
    with equality if and only if x and y are independent.
  • Chain Rule:
    $I(x; y, z) = I(x;y) + I(x; z|y)$.   (21)
  • Relationship with entropy:
    $I(x;y) = h(x) - h(x|y) = h(y) - h(y|x)$.   (22)

2.3. System Model and Assumptions

Consider the discrete-time system depicted in Figure 1. In this setup, the block G satisfies Assumption 1.
It is worth noting that there is no loss of generality in considering $g_0 = 1$, since one can otherwise write $G(z)$ as $G(z) = g_0\cdot(G(z)/g_0)$; thus, the entropy gain introduced by $G(z)$ would be $\log|g_0|$ plus the entropy gain due to $G(z)/g_0$ (in agreement with (6)), which has an impulse response with its first sample equal to 1.
The following assumption is made about the output disturbance z 1 :
Assumption 2.
The disturbance $z_1^\infty$ is independent of $u_1^\infty$ and belongs to a κ-dimensional linear subspace, for some finite $\kappa \in \mathbb{N}$. This subspace is spanned by the κ orthonormal columns of a matrix $\boldsymbol{\Phi} \in \mathbb{R}^{|\mathbb{N}|\times\kappa}$ (where $|\mathbb{N}|$ stands for the countably infinite size of $\mathbb{N}$), such that $|h(\boldsymbol{\Phi}^T z_1^\infty)| < \infty$. Moreover, $z_1^\infty = \boldsymbol{\Phi}\, s_\kappa^1$, where the random vector $s_\kappa^1 \triangleq \boldsymbol{\Phi}^T z_1^\infty$ has finite differential entropy, its covariance matrix $\mathbf{K}_{s_\kappa^1}$ satisfies $\lambda_{\max}(\mathbf{K}_{s_\kappa^1}) < \infty$, and it is independent of $u_1^\infty$.

3. Revisiting Theorem 14 in Reference [2]

In this section, after presenting the proof of Theorem 2, we develop Shannon’s approach into a more detailed and formal exposition. This allows us to explain why, for part of the continuous-time filters considered in Theorem 2, the approach chosen by Shannon to prove Theorem 14 in Reference [2] is unable to predict the correct value for the entropy gain.

3.1. Proof of Theorem 2

To begin with, the Fourier transform of $\phi_k$ is
$\Phi_k(\xi) \triangleq \int_{-\infty}^{\infty} \phi_k(t)\, e^{-j2\pi\xi t}\, dt = \mathbb{1}_{[-B,B]}(\xi)\, e^{-j2\pi\xi k/(2B)}$.   (23)
It is easy to verify that the functions $\phi_k$ satisfy the following orthogonality property:
$\int_{-\infty}^{\infty} \phi_k(t)\,\phi_i(t)\, dt = \begin{cases} 2B, & k = i \\ 0, & k \neq i \end{cases}$   (24)
and
$\phi_k(-t) = \phi_{-k}(t)$.   (25)
Notice that $u(k) = u_c\big(\tfrac{k}{2B}\big)$, $k \in \mathbb{N}$.
The output of $G_c$ sampled at time $t = \ell/(2B)$, $\ell \in \mathbb{N}$, is
$y(\ell) \triangleq y_c\big(\tfrac{\ell}{2B}\big) = \int_{-\infty}^{\infty} g(\tau)\, u_c\big(\tfrac{\ell}{2B} - \tau\big)\, d\tau$   (26)
$\quad = \frac{1}{2B}\sum_{k=1}^{\infty}\sum_{i=0}^{\eta} g_i\, u(k) \int_{-\infty}^{\infty} \phi_i(\tau)\, \phi_k\big(\tfrac{\ell}{2B} - \tau\big)\, d\tau$   (27)
$\quad = \frac{1}{2B}\sum_{k=1}^{\infty}\sum_{i=0}^{\eta} g_i\, u(k) \int_{-\infty}^{\infty} \phi_i(\tau)\, \phi_{\ell-k}(\tau)\, d\tau$   (28)
$\quad = \sum_{i=0}^{\eta} g_i\, u(\ell - i)$,   (29)
with $u(k) = 0$ for $k \leq 0$. This means that the output samples $y_1^\infty$ are the discrete-time convolution between $u_1^\infty$ and the filter coefficients $\{g_i\}_{i=0}^{\eta}$. Therefore, the matrix relation (4) holds, and we then obtain that $\bar{h}(y_c) = \bar{h}(u_c) + \log|g_0|$.
The frequency response of $G_c$ is given by
$G_c(\xi) = \int_{-\infty}^{\infty} g(t)\, e^{-j2\pi\xi t}\, dt = \sum_{k=0}^{\eta} g_k\, \Phi_k(\xi) = \sum_{k=0}^{\eta} g_k\, e^{-j\pi\xi k/B}$,   (30)
where $\xi$ is in [Hz]. This means that
$g_0 = \frac{1}{2B}\int_{-B}^{B} G_c(\xi)\, d\xi = \frac{1}{B}\int_0^B \Re\{G_c(\xi)\}\, d\xi$,   (31)
where the last equality holds because $G_c(\xi)$ is conjugate symmetric. Thus, the entropy gain introduced by $G_c$ satisfies (13), concluding the proof.   □

3.2. Formalizing Shannon’s Argument

In the approach followed by Shannon, it is argued that the entropy gain is the limit as $n\to\infty$ of $n^{-1}\sum_{r=0}^{n-1}\log|G_c(\xi_r)|$ over uniformly spaced frequencies $\xi_0, \ldots, \xi_{n-1}$. Here, we show that this summation corresponds to $\log|\det(\tilde{\mathbf{G}}_n)|$, where $\tilde{\mathbf{G}}_n$ is an n-by-n circulant Toeplitz matrix. Moreover, the sequences of Hermitian matrices $\{\mathbf{G}_n\mathbf{G}_n^*\}_{n=1}^{\infty}$ and $\{\tilde{\mathbf{G}}_n\tilde{\mathbf{G}}_n^*\}_{n=1}^{\infty}$ are asymptotically equivalent (as defined in Reference [18], Section 2.3), which would yield $\lim_{n\to\infty} n^{-1}\log|\det(\mathbf{G}_n)| = \lim_{n\to\infty} n^{-1}\log|\det(\tilde{\mathbf{G}}_n)|$ if the eigenvalues of $\mathbf{G}_n\mathbf{G}_n^*$ were bounded between constants $0 < \zeta_m < \zeta_M < \infty$ for all $n \in \mathbb{N}$. However, if $G(z)$ (the z-transform of $\{g_k\}_{k=0}^{\infty}$) has NMP zeros, then $\mathbf{G}_n\mathbf{G}_n^*$ has eigenvalues tending to zero exponentially as $n\to\infty$, which precludes these two limits from coinciding.
To prove the above claims, we first apply the change of variable $\omega \triangleq \pi\xi/B$, with which (30) becomes
$G_c(B\omega/\pi) = G(e^{j\omega}) \triangleq \sum_{k=0}^{\eta} g_k\, e^{-j\omega k}$,   (32)
where $G(e^{j\omega})$ is the frequency response of the discrete-time filter G with unit-impulse response $\{g_i\}_{i=0}^{\eta}$ and ω is in radians per second. Now, following Shannon's approach, we uniformly sample $G(e^{j\omega})$ at the n frequencies
$\omega_r \triangleq \begin{cases} r\frac{2\pi}{n}, & r/n \leq 0.5 \\ r\frac{2\pi}{n} - 2\pi, & r/n > 0.5 \end{cases}, \quad r = 0, 1, \ldots, n-1$,   (33)
which, from (32), yields the spectral samples
$G(e^{j\omega_r}) = \sum_{k=0}^{\eta} g_k\, e^{-j\frac{2\pi}{n} r k}$.   (34)
We will cast the reason why (3) fails to coincide with the correct expression for the entropy gain provided by (5) as a disagreement between the asymptotic behavior of the logarithm of the determinants of two sequences of asymptotically equivalent matrices. For that purpose, since (34) coincides with Reference [18], Equation 4.34, we have that the spectral samples $\{G(e^{j\omega_r})\}_{r=0}^{n-1}$ are the eigenvalues of the circulant Toeplitz matrix (Reference [18], Chapter 3)
$\tilde{\mathbf{G}}_n \triangleq \begin{bmatrix} \tilde{g}_{n,0} & \tilde{g}_{n,n-1} & \cdots & \tilde{g}_{n,1} \\ \tilde{g}_{n,1} & \tilde{g}_{n,0} & \cdots & \tilde{g}_{n,2} \\ \vdots & & \ddots & \vdots \\ \tilde{g}_{n,n-1} & \tilde{g}_{n,n-2} & \cdots & \tilde{g}_{n,0} \end{bmatrix} = \mathbf{U}_n^*\,\mathrm{diag}\{G(e^{j\omega_0}), \ldots, G(e^{j\omega_{n-1}})\}\,\mathbf{U}_n$,   (35)
where $\mathbf{U}_n \in \mathbb{C}^{n\times n}$ is the n-point discrete Fourier transform (DFT) matrix, defined as
$[\mathbf{U}_n]_{k,r} \triangleq \frac{1}{\sqrt{n}}\, e^{-j\frac{2\pi}{n} k r}, \quad k, r = 0, 1, \ldots, n-1$.   (36)
From Reference [18], Lemma 4.5, $\tilde{g}_{n,k} \triangleq \sum_{i\in\mathbb{N}_0:\, k+ni\leq\eta} g_{k+in}$, corresponding to the (possibly) aliased impulse response $g_0, g_1, \ldots, g_\eta$ resulting from the sampling in frequency.
We can now see that the discrepancy between the entropy gains predicted by (3) and (5) is the disagreement between the following limits:
$\lim_{n\to\infty} \frac{1}{2n}\log\big(\det(\mathbf{G}_n\mathbf{G}_n^*)\big) \overset{(31)}{=} \log\left|\frac{1}{B}\int_0^B \Re\{G_c(\xi)\}\, d\xi\right|$,
$\lim_{n\to\infty} \frac{1}{2n}\log\big(\det(\tilde{\mathbf{G}}_n\tilde{\mathbf{G}}_n^*)\big) = \frac{1}{B}\int_0^B \log\big|G_c(\xi)\big|\, d\xi$,   (37)
where, due to (8), the expressions on both right-hand sides differ if and only if $G(z)$ has NMP zeros. According to Reference [18], Lemma 4.6, the sequences $\{\mathbf{G}_n\}_{n=1}^{\infty}$ and $\{\tilde{\mathbf{G}}_n\}_{n=1}^{\infty}$ are asymptotically equivalent, which is written as $\mathbf{G}_n \sim \tilde{\mathbf{G}}_n$. Then, from Reference [18], Theorem 2.1, the Hermitian matrices $\mathbf{G}_n\mathbf{G}_n^* \sim \tilde{\mathbf{G}}_n\tilde{\mathbf{G}}_n^*$, which, from Reference [18], Theorem 2.4, implies that
$\lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} f\big(\lambda_i(\mathbf{G}_n\mathbf{G}_n^*)\big) = \lim_{n\to\infty} \frac{1}{n}\sum_{i=1}^{n} f\big(\lambda_i(\tilde{\mathbf{G}}_n\tilde{\mathbf{G}}_n^*)\big)$,   (38)
for any function f continuous over a finite interval $[\zeta_m, \zeta_M]$ such that
$\zeta_m \leq \lambda_i(\mathbf{G}_n\mathbf{G}_n^*),\ \lambda_i(\tilde{\mathbf{G}}_n\tilde{\mathbf{G}}_n^*) \leq \zeta_M, \quad i = 1, 2, \ldots, n, \quad n = 1, 2, \ldots$   (39)
However, when $G(z)$ has m NMP zeros, Lemma 7 (in Section 6.3) establishes that there are exactly m eigenvalues of $\mathbf{G}_n\mathbf{G}_n^*$ that tend to zero exponentially as $n\to\infty$. Crucially, $\log(\cdot)$ is discontinuous at 0, which precludes the limits in (37) from coinciding.
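To make the discrepancy concrete, the following sketch (using an assumed NMP example filter, $g = (1, -2)$, not taken from the paper's own experiments) compares $\frac{1}{n}\log\det(\mathbf{G}_n\mathbf{G}_n^*)$, which is exactly zero, with the corresponding circulant quantity $\frac{1}{n}\sum_r \log|G(e^{j\omega_r})|^2$, and also prints the smallest eigenvalue of $\mathbf{G}_n\mathbf{G}_n^T$, which vanishes exponentially:

```python
import numpy as np

g = np.array([1.0, -2.0])            # assumed example filter with one NMP zero at z = 2
for n in [8, 16, 32]:
    # Lower-triangular Toeplitz G_n of (4).
    Gn = sum(np.diag(np.full(n - k, g[k]), -k) for k in range(len(g)))
    # Eigenvalues of the circulant matrix in (35) are the frequency samples G(e^{j w_r}).
    w = 2 * np.pi * np.arange(n) / n
    samples = g[0] + g[1] * np.exp(-1j * w)
    toeplitz_rate = np.log(np.abs(np.linalg.det(Gn @ Gn.T))) / n      # = 0 exactly
    circulant_rate = np.mean(np.log(np.abs(samples) ** 2))            # -> 2 log 2 ~ 1.386
    lam_min = np.linalg.eigvalsh(Gn @ Gn.T).min()                     # -> 0 exponentially
    print(n, round(toeplitz_rate, 6), round(circulant_rate, 4), lam_min)
```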

4. The Effective Differential Entropy

Theorem 3 establishes that the effective differential entropy rate of the entire (complete) output of an LTI system exceeds that of the (shorter) input sequence by the RHS of (15). This section provides a geometrical interpretation of this phenomenon and builds intuition about the effective differential entropy already introduced in Definition 1.
Consider the random vectors $\mathbf{u} \triangleq [u_1\ u_2]^T$ and $\mathbf{y} \triangleq [y_1\ y_2\ y_3]^T$ related via
$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \underbrace{\begin{bmatrix} 1 & 0 \\ -2 & 1 \\ 0 & -2 \end{bmatrix}}_{\breve{\mathbf{G}}_2} \begin{bmatrix} u_1 \\ u_2 \end{bmatrix}$.   (40)
Suppose $\mathbf{u}$ is uniformly distributed over $[0,1]\times[0,1]$. Applying the conventional definition of differential entropy of a random sequence, we would have that
$h(y_1, y_2, y_3) = h(y_1, y_2) + h(y_3\,|\,y_1, y_2) = -\infty$,
because $y_3$ is a deterministic function of $y_1$ and $y_2$:
$y_3 = [\,0\ \ {-2}\,]\,[u_1\ u_2]^T = [\,0\ \ {-2}\,]\begin{bmatrix} 1 & 0 \\ -2 & 1 \end{bmatrix}^{-1}\begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$.
In other words, the problem lies in that, although the output is a three-dimensional vector, it has only two degrees of freedom, i.e., it is restricted to a 2-dimensional subspace of $\mathbb{R}^3$. This is illustrated in Figure 2, where the set $[0,1]\times[0,1]$ is shown (coinciding with the u-v plane), together with its image through $\breve{\mathbf{G}}_2$ (as defined in (40)).
As can be seen in this figure, the image of the square $[0,1]^2$ through $\breve{\mathbf{G}}_2$ is a 2-dimensional rhombus over which $\{y_1, y_2, y_3\}$ distributes uniformly. Since the intuitive notion of differential entropy of an ensemble of random variables relates to the size of the region spanned by the associated random vector (and determines how difficult it is to compress it in a lossy fashion with a given precision), one could argue that the differential entropy of $\{y_1, y_2, y_3\}$, far from being $-\infty$, should be somewhat larger than that of $\{u_1, u_2\}$ (since the rhombus $\breve{\mathbf{G}}_2[0,1]^2$ has a larger area than $[0,1]^2$). So, what does it mean that (and why should) $h(y_1, y_2, y_3) = -\infty$? Simply put, the differential entropy relates to the volume spanned by the support of the probability density function. For $\mathbf{y}$ in our example, the latter (three-dimensional) volume is clearly zero.
From the above discussion, the comparison between the differential entropies of $\mathbf{y} \in \mathbb{R}^3$ and $\mathbf{u} \in \mathbb{R}^2$ in our previous example should take into account that $\mathbf{y}$ actually lives in a two-dimensional subspace of $\mathbb{R}^3$. Indeed, since multiplication by a unitary matrix does not alter differential entropies, we could consider the differential entropy of
$\begin{bmatrix} \tilde{\mathbf{y}} \\ 0 \end{bmatrix} \triangleq \begin{bmatrix} \breve{\mathbf{Q}} \\ \bar{q}^T \end{bmatrix}\mathbf{y}$,
where $\breve{\mathbf{Q}}^T \in \mathbb{R}^{3\times 2}$ (so that $\breve{\mathbf{Q}}$ has orthonormal rows) is the matrix of left singular vectors in the SVD of $\breve{\mathbf{G}}_2$,
$\breve{\mathbf{G}}_2 = \breve{\mathbf{Q}}^T\breve{\mathbf{D}}\breve{\mathbf{R}}$,
and $\bar{q}$ is a unit-norm vector orthogonal to the rows of $\breve{\mathbf{Q}}$ (and thus orthogonal to $\mathbf{y}$, as well). We are now able to compute the differential entropy in $\mathbb{R}^2$ for $\tilde{\mathbf{y}}$, which corresponds to a rotated version of $\mathbf{y}$ whose support is aligned with $\mathbb{R}^2$.
The preceding discussion motivates the use of a modified version of the notion of differential entropy for a random vector $\mathbf{y} \in \mathbb{R}^n$ which considers the number of dimensions actually spanned by $\mathbf{y}$ instead of its length.
It is worth mentioning that Shannon's differential entropy of a vector $\mathbf{y} \in \mathbb{R}^{\ell}$, whose support's ℓ-volume is greater than zero, arises from considering it as the difference between its (absolute) entropy and that of a random variable uniformly distributed over an ℓ-dimensional, unit-volume region of $\mathbb{R}^{\ell}$. More precisely, if in this case the probability density function (PDF) of $\mathbf{y} = [y_1\ y_2\ \cdots\ y_{\ell}]^T$ is Riemann integrable, then (Reference [1], Theorem 9.3.1)
$h(\mathbf{y}) = \lim_{\Delta\to 0}\big[H(\mathbf{y}^{\Delta}) + \ell\log\Delta\big]$,   (44)
where $\mathbf{y}^{\Delta}$ is the discrete-valued random vector resulting when $\mathbf{y}$ is quantized using an ℓ-dimensional uniform quantizer with ℓ-cubic quantization cells of volume $\Delta^{\ell}$. However, if we consider a variable $\mathbf{y}$ whose support belongs to an n-dimensional subspace of $\mathbb{R}^{\ell}$, $n < \ell$ (i.e., $\mathbf{y} = \mathbf{S}\mathbf{u} = \mathbf{A}^T\mathbf{T}\mathbf{C}\mathbf{u}$, as in Definition 1), then the entropy of its quantized version in $\mathbb{R}^{\ell}$, say $H_{\ell}(\mathbf{y}^{\Delta})$, is distinct from $H_n((\mathbf{A}\mathbf{y})^{\Delta})$, the entropy of $\mathbf{A}\mathbf{y}$ quantized in $\mathbb{R}^n$. Moreover, it turns out that, in general,
$\lim_{\Delta\to 0}\big[H_{\ell}(\mathbf{y}^{\Delta}) - H_n((\mathbf{A}\mathbf{y})^{\Delta})\big] \neq 0$,   (45)
despite the fact that $\mathbf{A}$ has orthonormal rows. Thus, the definition given by (44) does not yield consistent results for the case wherein the dimension of a random vector's support (i.e., its number of degrees of freedom) is smaller than its length (the mentioned inconsistency refers to (45)), which reveals that the asymptotic behavior of $H_{\ell}(\mathbf{y}^{\Delta})$ changes if $\mathbf{y}$ is rotated. (If this were not the case, then we could redefine (44) replacing ℓ by n, in a spirit similar to the one behind Rényi's d-dimensional entropy [19].) To see this, consider the case in which $u \in \mathbb{R}$ distributes uniformly over $[0,1]$ and $\mathbf{y} = [1\ 1]^T u/\sqrt{2}$. Clearly, $\mathbf{y}$ distributes uniformly over the unit-length segment connecting the origin with the point $(1,1)/\sqrt{2}$. Then,
$H_2(\mathbf{y}^{\Delta}) = -\left\lfloor\tfrac{1}{\sqrt{2}\Delta}\right\rfloor\sqrt{2}\Delta\,\log\big(\sqrt{2}\Delta\big) - \left(1 - \left\lfloor\tfrac{1}{\sqrt{2}\Delta}\right\rfloor\sqrt{2}\Delta\right)\log\left(1 - \left\lfloor\tfrac{1}{\sqrt{2}\Delta}\right\rfloor\sqrt{2}\Delta\right)$.   (46)
On the other hand, since, in this case, $\mathbf{A}\mathbf{y} = u$, we have that
$H_1((\mathbf{A}\mathbf{y})^{\Delta}) = H_1(u^{\Delta}) = -\left\lfloor\tfrac{1}{\Delta}\right\rfloor\Delta\log\Delta - \left(1 - \left\lfloor\tfrac{1}{\Delta}\right\rfloor\Delta\right)\log\left(1 - \left\lfloor\tfrac{1}{\Delta}\right\rfloor\Delta\right)$.   (47)
Thus, the d-dimensional entropy would not, in general, be equal to the effective differential entropy, that is,
$\lim_{\Delta\to 0}\big[H_1((\mathbf{A}\mathbf{y})^{\Delta}) - H_2(\mathbf{y}^{\Delta})\big] = \lim_{\Delta\to 0}\left[\left\lfloor\tfrac{1}{\sqrt{2}\Delta}\right\rfloor\sqrt{2}\Delta\,\log\big(\sqrt{2}\Delta\big) - \left\lfloor\tfrac{1}{\Delta}\right\rfloor\Delta\log\Delta\right] = \log\sqrt{2}$.   (48)
The latter example further illustrates why the notion of effective entropy is appropriate in the setup considered in this section, where the effective dimension of the random sequences does not coincide with their length (it is easy to verify that the effective entropy of $\mathbf{y}$ does not change if one rotates $\mathbf{y}$ in $\mathbb{R}^{\ell}$).
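A minimal numeric check of this discrepancy (added here for illustration; the quantizer is the axis-aligned one assumed in the example above): evaluating the two quantized entropies for shrinking Δ shows their difference approaching $\log\sqrt{2}$, in agreement with (48).

```python
import numpy as np

# y = [1 1]^T u / sqrt(2) with u ~ U[0,1]: a unit-length diagonal segment in R^2.
# H2: quantize y with square cells of side Delta in R^2 (the segment puts mass sqrt(2)*Delta
#     in each fully traversed diagonal cell, plus a remainder cell).
# H1: quantize A y = u with cells of length Delta in R^1.
for Delta in [1e-1, 1e-2, 1e-3, 1e-4]:
    s = np.sqrt(2) * Delta
    k2, k1 = np.floor(1 / s), np.floor(1 / Delta)
    r2, r1 = 1 - k2 * s, 1 - k1 * Delta
    H2 = -k2 * s * np.log(s) - (r2 * np.log(r2) if r2 > 0 else 0.0)
    H1 = -k1 * Delta * np.log(Delta) - (r1 * np.log(r1) if r1 > 0 else 0.0)
    print(Delta, H1 - H2)            # approaches log(sqrt(2)) ~ 0.3466
```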
We finish this section with an example to illustrate the usefulness of the notion of effective differential entropy beyond the context of entropy gain.

Application Example: Shannon Lower Bound

The rate-distortion function (RDF) $R(D)$ is the infimum, among all codes, of the expected number of bits per sample necessary to reconstruct a given random source with distortion not greater than D [1]. Let the source and reconstruction be the vectors $x_1^{\ell}$ and $x_1^{\ell} + v_1^{\ell}$, respectively, and suppose the distortion is assessed using the mean-squared error (MSE) $d(v_1^{\ell}) \triangleq \mathbb{E}\big[\|v_1^{\ell}\|^2\big]$. Then, restricting our attention to uniquely decodable codes (Reference [1], p. 105), the Shannon Lower Bound (SLB) [20] establishes that
$R(D) \geq h(x_1^{\ell}) - \max_{d(v_1^{\ell})\leq D} h(v_1^{\ell})$,   (49)
provided $h(x_1^{\ell})$ is bounded. Therefore, if $x_1^{\ell}$ is the entire forced response of an FIR filter G of order p to an input $u_1^n$, then $\ell = n + p$ and $h(x_1^{\ell})$ is minus infinity, which precludes one from using (49). We will show next that, in this case, the SLB can still be stated by using the effective differential entropy $\breve{h}(x_1^{\ell})$ instead of $h(x_1^{\ell})$. Following Definition 1, we can write the source vector as $x_1^{\ell} = \mathbf{A}^T\mathbf{T}\mathbf{C}\, u_n^1$, where $\mathbf{A} \in \mathbb{R}^{n\times\ell}$ has orthonormal rows, $\mathbf{T} \in \mathbb{R}^{n\times n}$ is diagonal with non-negative entries, and $\mathbf{C} \in \mathbb{R}^{n\times n}$ is unitary. Let $\mathbf{H} \triangleq [\mathbf{A}^T\,|\,\bar{\mathbf{A}}^T] \in \mathbb{R}^{\ell\times\ell}$ be a unitary matrix, which means that $\bar{\mathbf{A}}\mathbf{A}^T = \mathbf{0}_{p\times n}$. Then,
$R(D) \overset{(a)}{\geq} I(x_1^{\ell};\, x_1^{\ell} + v_1^{\ell})$
$\; = I(\mathbf{H}x_1^{\ell};\, \mathbf{H}x_1^{\ell} + \mathbf{H}v_1^{\ell})$
$\; = I(\mathbf{A}x_1^{\ell};\, \mathbf{H}x_1^{\ell} + \mathbf{H}v_1^{\ell})$
$\; = I(\mathbf{A}x_1^{\ell};\, \mathbf{A}x_1^{\ell} + \mathbf{A}v_1^{\ell},\, \bar{\mathbf{A}}v_1^{\ell})$
$\; \overset{(21)}{=} I(\mathbf{A}x_1^{\ell};\, \mathbf{A}x_1^{\ell} + \mathbf{A}v_1^{\ell}) + I(\mathbf{A}x_1^{\ell};\, \bar{\mathbf{A}}v_1^{\ell}\,|\, \mathbf{A}x_1^{\ell} + \mathbf{A}v_1^{\ell})$
$\; \overset{(20)}{\geq} I(\mathbf{A}x_1^{\ell};\, \mathbf{A}x_1^{\ell} + \mathbf{A}v_1^{\ell})$
$\; \overset{(22)}{=} h(\mathbf{A}x_1^{\ell}) - h(\mathbf{A}x_1^{\ell}\,|\, \mathbf{A}x_1^{\ell} + \mathbf{A}v_1^{\ell})$
$\; \overset{(19)}{=} h(\mathbf{A}x_1^{\ell}) - h(\mathbf{A}v_1^{\ell}\,|\, \mathbf{A}x_1^{\ell} + \mathbf{A}v_1^{\ell})$
$\; \overset{(b)}{\geq} h(\mathbf{A}x_1^{\ell}) - h(\mathbf{A}v_1^{\ell})$
$\; \geq h(\mathbf{A}x_1^{\ell}) - \max_{w_n^1:\, d(w_n^1)\leq D} h(w_n^1)$
$\; \overset{(c)}{=} \breve{h}(x_1^{\ell}) - \max_{w_n^1:\, d(w_n^1)\leq D} h(w_n^1)$,
where (a) stems from Reference [1], Theorems 5.4.1 and 5.5.1 and Equations (10.58)–(10.61), (b) holds because conditioning does not increase entropy, and (c) is from the definition of effective differential entropy.

5. Entropy-Balanced Processes: Geometric Interpretation and Properties

In the first part of this section, we provide a geometric interpretation of the effect that a non-minimum phase LTI system has on its input random process. This will give an intuitive meaning to the notion of an entropy-balanced random process (introduced in Definition 2 above) and provide insights into why and how the entropy gain defined in (1) arises as a consequence of an output random disturbance or a random initial state (the themes of Section 6 and Section 7, respectively).
The second part of this section identifies several entropy-balanced processes and establishes two properties satisfied by this class of processes.

5.1. Geometric Interpretation

We begin our discussion with a simple example.
Example 1.
Suppose that G in Figure 1 is a finite impulse response (FIR) filter with impulse response $g_0 = 1$, $g_1 = -2$, $g_i = 0$ for $i \geq 2$. Notice that this choice yields $G(z) = (z-2)/z$; thus, $G(z)$ has one non-minimum-phase zero, at $z = 2$. The associated matrix $\mathbf{G}_n$ for $n = 3$ is
$\mathbf{G}_3 = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & -2 & 1 \end{bmatrix}$,
whose determinant is clearly one (indeed, all its eigenvalues are 1). Hence, as discussed in the introduction, $h(\mathbf{G}_3 u_3^1) = h(u_3^1)$; thus, $\mathbf{G}_3$ (and $\mathbf{G}_n$, in general) does not introduce an entropy gain by itself. However, an interesting phenomenon becomes evident by looking at the SVD of $\mathbf{G}_3$, given by $\mathbf{G}_3 = \mathbf{Q}_3^T\mathbf{D}_3\mathbf{R}_3$, where $\mathbf{Q}_3$ and $\mathbf{R}_3$ are unitary matrices and $\mathbf{D}_3 \triangleq \mathrm{diag}\{d_1, d_2, d_3\}$. In this case, $\mathbf{D}_3 = \mathrm{diag}\{0.19394, 1.90321, 2.70928\}$; thus, one of the singular values of $\mathbf{G}_3$ is much smaller than the others (although the product of all singular values equals 1, as expected). As will be shown in Section 6, for a stable $G(z)$, such uneven distribution of singular values arises only when $G(z)$ has non-minimum-phase zeros. The effect of this can be visualized by looking at the image of the cube $[0,1]^3$ through $\mathbf{G}_3$, shown in Figure 3.
If the input $u_3^1$ were uniformly distributed over this cube (of unit volume), then $\mathbf{G}_3 u_3^1$ would distribute uniformly over the unit-volume parallelepiped depicted in Figure 3; hence, $h(\mathbf{G}_3 u_3^1) = h(u_3^1)$.
Now, if we add to $\mathbf{G}_3 u_3^1$ a disturbance $z_3^1 = \boldsymbol{\Phi}s$, with the scalar s uniformly distributed over $[-0.5, 0.5]$ and independent of $u_3^1$, and with $\boldsymbol{\Phi} \in \mathbb{R}^{3\times 1}$, the effect would be to "thicken" the support over which the resulting random vector $y_3^1 = \mathbf{G}_3 u_3^1 + z_3^1$ is distributed, along the direction pointed to by $\boldsymbol{\Phi}$. If $\boldsymbol{\Phi}$ is aligned with the direction along which the support of $\mathbf{G}_3 u_3^1$ is thinnest (given by $q_{3,1}$, the first row of $\mathbf{Q}_3$), then the resulting support would have its volume significantly increased, which can be associated with a large increase in the differential entropy of $y_3^1$ with respect to $u_3^1$. Indeed, a relatively small variance of s and an approximately aligned $\boldsymbol{\Phi}$ would still produce a significant entropy gain.
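The singular values quoted in the example are easy to reproduce numerically; the following snippet (illustrative only) computes the SVD of $\mathbf{G}_3$ and confirms that the product of the singular values is 1:

```python
import numpy as np

G3 = np.array([[ 1.0,  0.0, 0.0],
               [-2.0,  1.0, 0.0],
               [ 0.0, -2.0, 1.0]])
d = np.linalg.svd(G3, compute_uv=False)
print(np.sort(d))      # ~ [0.19394, 1.90321, 2.70928]
print(np.prod(d))      # ~ 1.0 = |det(G3)|: no entropy gain from G3 alone
```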
The above example suggests that the entropy gain from $u_n^1$ to $y_n^1$ appears as a combination of two factors. The first of these is the uneven way in which the random vector $\mathbf{G}_n u_n^1$ is distributed over $\mathbb{R}^n$. The second factor is the alignment of the disturbance vector $z_n^1$ with respect to the span of the subset $\{q_{n,i}\}_{i\in\Omega_n}$ of rows of $\mathbf{Q}_n$ associated with the smallest singular values of $\mathbf{G}_n$, indexed by the elements in the set $\Omega_n$. As we shall discuss in the next section, if G has m non-minimum-phase zeros, then, as n increases, there will be m singular values of $\mathbf{G}_n$ going to zero exponentially. Since the product of the singular values of $\mathbf{G}_n$ equals 1 for all n, it follows that $\prod_{i\notin\Omega_n} d_{n,i}$ must grow exponentially with n, where $d_{n,i}$ is the i-th diagonal entry of $\mathbf{D}_n$. This implies that $\mathbf{G}_n u_n^1$ expands with n along the span of $\{q_{n,i}\}_{i\notin\Omega_n}$, compensating its shrinkage along the span of $\{q_{n,i}\}_{i\in\Omega_n}$, thus keeping $h(\mathbf{G}_n u_n^1) = h(u_n^1)$ for all n. Thus, as n grows, any small disturbance distributed over the span of $\{q_{n,i}\}_{i\in\Omega_n}$, added to $\mathbf{G}_n u_n^1$, will keep the support of the resulting distribution from shrinking along this subspace. Consequently, the expansion of $\mathbf{G}_n u_n^1$ with n along the span of $\{q_{n,i}\}_{i\notin\Omega_n}$ is no longer compensated, yielding an entropy increase proportional to $\log\big(\prod_{i\notin\Omega_n} d_{n,i}\big)$.
The above analysis allows one to anticipate a situation in which no entropy gain would take place even when some singular values of $\mathbf{G}_n$ tend to zero as $n\to\infty$. Since the increase in entropy is made possible by the fact that, as n grows, the support of the distribution of $\mathbf{G}_n u_n^1$ shrinks along the span of $\{q_{n,i}\}_{i\in\Omega_n}$, no such entropy gain should arise if the support of the distribution of the input $u_n^1$ expands accordingly along the directions pointed to by the rows $\{r_{n,i}\}_{i\in\Omega_n}$ of $\mathbf{R}_n$.
An example of such a situation can easily be constructed as follows: let $G(z)$ in Figure 1 have non-minimum-phase zeros and suppose that $u_1^\infty$ is generated as $G^{-1}\tilde{u}_1^\infty$, where $\tilde{u}_1^\infty$ is an i.i.d. random process with bounded entropy rate. Since the determinant of $\mathbf{G}_n^{-1}$ equals 1 for all n, we have that $h(u_n^1) = h(\tilde{u}_n^1)$ for all n. On the other hand, $y_n^1 = \mathbf{G}_n\mathbf{G}_n^{-1}\tilde{u}_n^1 + z_n^1 = \tilde{u}_n^1 + z_n^1$. Since $z_n^1 = [\boldsymbol{\Phi}]_1^n s_\kappa^1$ for some finite κ (recall Assumption 2), it is easy to show that $\lim_{n\to\infty}\frac{1}{n} h(y_n^1) = \lim_{n\to\infty}\frac{1}{n} h(\tilde{u}_n^1) = \lim_{n\to\infty}\frac{1}{n} h(u_n^1)$; thus, no entropy gain appears.
The preceding discussion reveals that the entropy gain produced by G in the situation shown in Figure 1 depends on the distribution of the input and on the support and distribution of the disturbance. This stands in stark contrast with the well known fact that the increase in differential entropy produced by an invertible linear operator depends only on its Jacobian, and not on the statistics of the input [2]. We have also seen that the distribution of a random process along the different directions within the Euclidean space which contains it plays a key role, as well. This motivates the need to specify a class of random processes which distribute more or less evenly over all directions. This is precisely the intuitive meaning of an entropy-balanced process.
The following section identifies a large family of processes belonging to this class, as well as two properties which greatly expand this family.

5.2. Characterization of Entropy-Balanced Processes

We have defined the notion of an "entropy-balanced" process in Section 1.1. In words, the first condition in this definition guarantees that the orthogonal projection of an entropy-balanced process onto any ν-dimensional linear subspace has a differential entropy whose magnitude remains bounded or grows at most sub-linearly with n. The second condition states that the projection of an entropy-balanced process $v_1^\infty$ onto any linear subspace having ν fewer dimensions has the same differential entropy rate as the original process. This condition is equivalent to requiring that every unitary transformation on $v_1^n$ yields a random sequence $y_1^n$ such that $\lim_{n\to\infty}\frac{1}{n} h(y_{n-\nu+1}^n\,|\,y_1^{n-\nu}) = 0$. This property of the resulting random sequence $y_1^n$ means that one cannot predict its last ν samples with arbitrary accuracy by using its previous n − ν samples, even as n goes to infinity.
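For intuition (an illustration added here, not part of the paper), condition (ii) can be checked in closed form for an i.i.d. unit-variance Gaussian process, for which differential entropies reduce to log-determinants of covariance matrices; the projection below is a single, hypothetical, randomly generated matrix with orthonormal rows, but for Gaussian vectors the same computation goes through for any such projection, which is essentially the content of Lemma 1 below.

```python
import numpy as np

rng = np.random.default_rng(0)
nu = 3                                      # number of removed dimensions
for n in [10, 100, 1000]:
    # A hypothetical (n - nu) x n matrix Phi_n with orthonormal rows.
    Q, _ = np.linalg.qr(rng.standard_normal((n, n - nu)))
    Phi = Q.T
    # For v ~ N(0, I_n): h(v) = n/2 log(2 pi e), and Phi v ~ N(0, Phi Phi^T = I_{n-nu}).
    h_v = 0.5 * n * np.log(2 * np.pi * np.e)
    h_proj = 0.5 * (n - nu) * np.log(2 * np.pi * np.e) \
             + 0.5 * np.linalg.slogdet(Phi @ Phi.T)[1]
    print(n, (h_proj - h_v) / n)            # -> 0 as n grows, as required by (17)
```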
We now characterize a large family of entropy-balanced random processes and establish some of their properties. Although intuition may suggest that most random processes (such as i.i.d. or stationary processes) should be entropy balanced, that statement seems rather difficult to prove. In the following, we show that the entropy-balanced condition is met by i.i.d. processes with per-sample probability density function (PDF) being uniform, piece-wise constant, or Gaussian. It is also shown that adding to an entropy-balanced process an independent random process yields another entropy-balanced process, and that filtering an entropy-balanced process by a stable and minimum-phase filter yields an entropy-balanced process, as well. The proofs can be found in Appendix B.
Lemma 1.
Let $u_1^\infty$ be a Gaussian random process with independent elements having positive and bounded variance, i.e., there exist $0 < \check{\sigma}^2 \leq \hat{\sigma}^2 < \infty$ such that $\check{\sigma}^2 \leq \sigma_{u(n)}^2 \leq \hat{\sigma}^2$, $\forall n \in \mathbb{N}$. Then, $u_1^\infty$ is entropy balanced.
Lemma 2.
Let $u_1^\infty$ be a random process with independent elements satisfying Condition (i) in Definition 2, in which each $u_i$ is distributed according to a (possibly different) piece-wise constant PDF such that each interval where this PDF is constant has measure less than θ and greater than ϵ, for some constants $0 < \epsilon < \theta < \infty$. Then, $u_1^\infty$ is entropy balanced.
Lemma 3.
Let $u_1^\infty$ and $v_1^\infty$ be mutually independent random processes. If $u_1^\infty$ is entropy balanced, and $w_1^\infty \triangleq u_1^\infty + v_1^\infty$ satisfies $\sigma_{w(n)}^2 < \infty$ for finite n and $\lim_{n\to\infty} n^{-1}\log(\sigma_{w(n)}^2) = 0$, then $w_1^\infty$ is also entropy balanced.
The proof of Lemma 3 is on page 33. The reasoning behind this lemma can be understood intuitively by noting that adding to a random process another, independent, random process can only increase the "spread" of the distribution of the former, which tends to balance the entropy of the resulting process along all dimensions of the Euclidean space. In addition, it follows from Lemma 3 that all i.i.d. processes having a per-sample PDF which can be constructed by convolving uniform, piece-wise constant, or Gaussian PDFs as many times as required are entropy balanced. It also implies that one can have non-stationary processes which are entropy balanced, since Lemma 3 imposes no requirements on the process $v_1^\infty$.
The next lemma related to the properties of entropy-balanced processes shows that filtering by a stable and minimum phase LTI filter preserves the entropy balanced condition of its input.
Lemma 4.
Let $u_1^\infty$ be an entropy-balanced process and G an LTI, stable, and minimum-phase filter. Then, the output $w_1^\infty \triangleq G u_1^\infty$ is also an entropy-balanced process.
This result implies that any stable moving-average auto-regressive process constructed from entropy-balanced innovations is also entropy balanced, provided the coefficients of the averaging and regression correspond to a stable MP filter.
The last lemma of this section states a crucial property of entropy-balanced processes (the proof is in Appendix B, page 34).
Lemma 5.
Let $u_1^\infty$ be an entropy-balanced process. Consider a disturbance $z_1^\infty$ satisfying Assumption 2 and define $y_1^\infty \triangleq u_1^\infty + z_1^\infty$. Then, $\lim_{n\to\infty} n^{-1}\big(h(y_1^n) - h(u_1^n)\big) = 0$.
We finish this section by pointing out two examples of processes which are not entropy balanced, namely the output of an NMP filter driven by an entropy-balanced input and the output of an unstable filter driven by an entropy-balanced input. The first of these cases plays a central role in the next section.

6. Entropy Gain Due to External Disturbances

In this section, we formalize the ideas qualitatively outlined in the previous section. Specifically, for the system shown in Figure 1, we characterize the entropy gain $\mathcal{G}(G, \mathbf{x}_0, u_1^\infty, z_1^\infty)$ defined in (1) for the case in which the initial state $\mathbf{x}_0$ is zero (or deterministic) and there exists a random disturbance $z_1^\infty$ (of possibly infinite length) which satisfies Assumption 2.

6.1. Input Disturbances Do Not Produce Entropy Gain

In this section, we show that random disturbances satisfying Assumption 2, when added to the input $u_1^\infty$ (i.e., before G), do not introduce an entropy gain. This result follows from Lemma 5, as stated in the following theorem:
Theorem 5
(Input Disturbances Do Not Introduce Entropy Gain). Let G and $z_1^\infty$ satisfy Assumptions 1 and 2, respectively. Suppose that $u_1^\infty$ is entropy balanced and consider the output
$y_1^\infty = G(u_1^\infty + z_1^\infty)$.
Then,
$\lim_{n\to\infty} \frac{1}{n}\big[h(y_1^n) - h(u_1^n)\big] = 0$.
Proof. 
From Lemma 5, the differential entropy rate of u 1 equals that of u 1 + z 1 . The proof is completed by recalling that G yields no entropy gain for its input u 1 + z 1 because it corresponds to the noise-less scenario.    □

6.2. The Entropy Gain Introduced by Output Disturbances when G is MP is Zero

The results from the previous section yield the following corollary, which states that an LTI system with transfer function G ( z ) without zeros outside the unit circle (i.e., an MP transfer function) cannot introduce entropy gain.
Corollary 1
(Minimum-Phase Filters Do Not Introduce Entropy Gain). Consider the system shown in Figure 1, wherein the input $u_1^\infty$ is an entropy-balanced random process and the output disturbance $z_1^\infty$ satisfies Assumption 2. Besides Assumption 1, suppose that $G(z)$ is minimum phase. Then,
$\lim_{n\to\infty} \frac{1}{n}\big[h(y_1^n) - h(u_1^n)\big] = 0$.
Proof. 
Since G ( z ) is minimum phase and stable, the result follows directly from Lemmas 4 and 5.    □

6.3. The Entropy Gain Introduced by Output Disturbances when G ( z ) is NMP

We show here that the entropy gain of an LTI system with transfer function $G(z)$ and an output disturbance is at most the sum of the logarithms of the magnitudes of the zeros of $G(z)$ outside the unit circle.
The following lemma will be instrumental for that purpose.
Lemma 6.
Consider the system in Figure 1, suppose $z_1^\infty$ satisfies Assumption 2, and suppose the input process $u_1^\infty$ is entropy balanced. Let $\mathbf{G}_n = \mathbf{Q}_n^T\mathbf{D}_n\mathbf{R}_n$ be the SVD of $\mathbf{G}_n$, where $\mathbf{D}_n = \mathrm{diag}\{d_{n,1}, \ldots, d_{n,n}\}$ holds the singular values of $\mathbf{G}_n$, with $d_{n,1} \leq d_{n,2} \leq \cdots \leq d_{n,n}$, such that $|\det\mathbf{G}_n| = \prod_{i=1}^{n} d_{n,i} = 1$. Let m be the number of these singular values which tend to zero exponentially as $n\to\infty$. Then,
$\lim_{n\to\infty} \frac{1}{n}\big[h(y_1^n) - h(u_1^n)\big] = \lim_{n\to\infty} \frac{1}{n}\left[h\Big([\mathbf{D}_n]_1^m\,\mathbf{R}_n u_n^1 + [\mathbf{Q}_n]_1^m z_n^1\Big) - \sum_{i=1}^{m}\log d_{n,i}\right]$.   (64)
The proof of this lemma can be found on page 34, in Appendix B.
Lemma 6 leaves the need to characterize the asymptotic behavior of the singular values of G n . This is accomplished in the following lemma, which relates these singular values to the zeros of G ( z ) . It is a generalization of the unnumbered lemma in the proof of Reference [16], Theorem 1 (restated in Appendix C as Lemma A3), which holds for FIR transfer functions, to the case of infinite-impulse response (IIR) transfer functions (i.e., transfer functions having poles).
Lemma 7.
Let $G(z)$ be a transfer function satisfying Assumption 1, whose zeros $\{\rho_i\}_{i=1}^{p}$ satisfy $|\rho_1| \geq \cdots \geq |\rho_m| > 1 \geq |\rho_{m+1}| \geq \cdots \geq |\rho_p|$. Then,
$\lambda_l(\mathbf{G}_n\mathbf{G}_n^T) = \begin{cases} \alpha_{n,l}^2\,|\rho_l|^{-2n}, & \text{if } l \leq m, \\ \alpha_{n,l}^2, & \text{otherwise}, \end{cases}$   (65)
where the elements in the sequences $\{\alpha_{n,l}\}$ are positive and increase or decrease at most polynomially with n.
(The proof of this lemma can be found in Appendix B, page 36).
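A quick numerical illustration of Lemma 7 (with an assumed example filter, $G(z) = (z-2)/z$, so $m = 1$ and $\rho_1 = 2$, chosen here for illustration): the smallest eigenvalue of $\mathbf{G}_n\mathbf{G}_n^T$ should decay essentially as $|\rho_1|^{-2n}$, so rescaling it by $|\rho_1|^{2n}$ should leave only a slowly varying factor $\alpha_{n,1}^2$.

```python
import numpy as np

g = np.array([1.0, -2.0])                    # assumed example: one NMP zero at rho_1 = 2
for n in [5, 10, 15, 20]:
    Gn = sum(np.diag(np.full(n - k, g[k]), -k) for k in range(len(g)))
    lam_min = np.linalg.eigvalsh(Gn @ Gn.T).min()
    # lam_min ~ alpha_{n,1}^2 * |rho_1|^{-2n}; the rescaled value varies only slowly with n.
    print(n, lam_min, lam_min * 2.0 ** (2 * n))
```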
Lemma 6 also precisely formulates the geometric idea outlined in Section 5.1. To see this, notice that no entropy gain is obtained if the output disturbance vector $z_n^1$ becomes orthogonal (with probability 1) to the space spanned by the first m rows of $\mathbf{Q}_n$ sufficiently fast as $n\to\infty$. Recalling from Assumption 2 that
$z_n^1 = [\boldsymbol{\Phi}]_1^n\, s_\kappa^1$,   (66)
where the matrix $\boldsymbol{\Phi}$ has κ orthonormal columns of infinite length, such an orthogonality condition can be formally stated, by defining
$\kappa_n \triangleq \mathrm{rank}\big([\mathbf{Q}_n]_1^m\,[\boldsymbol{\Phi}]_1^n\big)$,   (67)
$\hat{\kappa} \triangleq \limsup_{n\to\infty} \kappa_n$,   (68)
$\check{\kappa} \triangleq \liminf_{n\to\infty} \kappa_n$,   (69)
as the requirement that $\hat{\kappa} = 0$.
If this were the case, then the disturbance would not be able to fill the subspace along which $\mathbf{G}_n u_n^1$ is shrinking exponentially. Indeed, if $\kappa_n = 0$ for all n, then $h\big([\mathbf{D}_n]_1^m\mathbf{R}_n u_n^1 + [\mathbf{Q}_n]_1^m z_n^1\big) = h\big({}_1[\mathbf{D}_n]_m\,[\mathbf{R}_n]_1^m u_n^1\big) = \sum_{i=1}^{m}\log d_{n,i} + h\big([\mathbf{R}_n]_1^m u_n^1\big)$, and the latter sum cancels out the one on the RHS of (64), while $\lim_{n\to\infty}\frac{1}{n} h\big([\mathbf{R}_n]_1^m u_n^1\big) = 0$ since $u_1^\infty$ is entropy balanced. On the contrary (and loosely speaking), if the projection of the support of $z_n^1$ onto the subspace spanned by the first m rows of $\mathbf{Q}_n$ has dimension m (i.e., if $\kappa_n = m$) for all n, then $h\big([\mathbf{D}_n]_1^m\mathbf{R}_n u_n^1 + [\mathbf{Q}_n]_1^m z_n^1\big)$ remains bounded for all n, and the limit $-\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{m}\log d_{n,i}$ on the RHS of (64) yields the largest possible entropy gain. Notice that $-\sum_{i=1}^{m}\log d_{n,i} = \sum_{i=m+1}^{n}\log d_{n,i}$ (because $\det(\mathbf{G}_n) = 1$); thus, this entropy gain stems from the uncompensated expansion of $\mathbf{G}_n u_n^1$ along the space spanned by the rows of $[\mathbf{Q}_n]_{m+1}^n$. Beyond these extreme cases (i.e., for general values of $\check{\kappa}$ and $\hat{\kappa}$), the following theorem provides tight bounds on the entropy gain.
Theorem 6.
In the system of Figure 1, suppose that $u_1^\infty$ is entropy balanced, and that $G(z)$ and $z_1^\infty$ satisfy Assumptions 1 and 2, respectively, where the zeros $\{\rho_i\}_{i=1}^{p}$ of $G(z)$ satisfy $|\rho_1| \geq \cdots \geq |\rho_m| > 1 \geq |\rho_{m+1}| \geq \cdots \geq |\rho_p|$. For each $n \in \mathbb{N}$, let $\mathbf{Q}_n^T \in \mathbb{R}^{n\times n}$ be the unitary matrix holding the left singular vectors of $\mathbf{G}_n \in \mathbb{R}^{n\times n}$ (as in Lemma 6), where $\mathbf{G}_n$ is as defined in (4).
1.
Then,
$0 \leq \liminf_{n\to\infty} \frac{1}{n}\big[h(y_1^n) - h(u_1^n)\big] \leq \limsup_{n\to\infty} \frac{1}{n}\big[h(y_1^n) - h(u_1^n)\big] \leq \sum_{i=1}^{\hat{\kappa}} \log|\rho_i| \overset{(8)}{\leq} \frac{1}{2\pi}\int_{-\pi}^{\pi}\log\big|G(e^{j\omega})\big|\, d\omega$.   (70)
The bounds on both extremes are tight. Moreover, the lower bound is reached if $\hat{\kappa} = 0$.
2.
If $\liminf_{n\to\infty}\frac{1}{n}\log\big(\sigma_{\min}([\mathbf{Q}_n]_1^m[\boldsymbol{\Phi}]_1^n)\big) = 0$, then
$\sum_{i=m-\check{\kappa}+1}^{m}\log|\rho_i| \leq \liminf_{n\to\infty}\frac{1}{n}\big[h(y_1^n) - h(u_1^n)\big]$.   (71)
Thus, the rightmost upper bound in (70) is achieved if $\check{\kappa} = m$.
Proof. 
See Appendix B, page 37.    □
The next technical result is very useful for finding conditions under which the requirements of point 2 in Theorem 6 are satisfied (the proof is in Appendix B, page 39).
Lemma 8.
Let F be an FIR, LTI, causal system of order m such that all m zeros of $F(z)$ are NMP, and let $\mathbf{F}_n = \mathbf{Q}_n^T\mathbf{D}_n\mathbf{R}_n$ be an SVD of $\mathbf{F}_n$, for every $n \in \{m, m+1, \ldots\}$. For each $\kappa \in \{1, \ldots, n\}$, define
$\kappa_n \triangleq \mathrm{rank}\Big(\big([\mathbf{Q}_n]_1^m\big)_{1:\kappa}\Big)$,   (72)
and $\bar{\kappa} \triangleq \min\{m, \kappa\}$. Then,
$\lim_{n\to\infty} \sigma_{\min}\Big(\big([\mathbf{Q}_n]_1^m\big)_{1:\kappa}\Big) > 0$,   (73)
and $\lim_{n\to\infty} \kappa_n = \bar{\kappa}$.
Now, we can prove Theorem 4.

6.4. Proof of Theorem 4

Factorize $G(z)$ as $G(z) = F(z)\tilde{G}(z)$, where $\tilde{G}(z)$ is stable and minimum phase and $F(z)$ is a stable FIR transfer function containing all the m non-minimum-phase zeros of $G(z)$. Letting $\tilde{u}_n^1 \triangleq \tilde{\mathbf{G}}_n u_n^1$, we have that $h(y_n^1) = h(\mathbf{F}_n\tilde{u}_n^1 + z_n^1)$, $h(\tilde{u}_n^1) = h(u_n^1)$, and that $\tilde{u}_1^\infty$ is entropy balanced (from Lemma 4). Thus,
$h(y_n^1) - h(u_n^1) = h(\mathbf{G}_n u_n^1 + z_n^1) - h(u_n^1) = h(\mathbf{F}_n\tilde{u}_n^1 + z_n^1) - h(\tilde{u}_n^1)$.   (74)
This means that the entropy gain of G due to the output disturbance $z_1^\infty$ corresponds to the entropy gain of F due to the same output disturbance.
Clearly, $u_1^\infty$, $F(z)$, and $z_1^\infty$ satisfy the assumptions of Theorem 6 with $\boldsymbol{\Phi} = [\mathbf{I}_\kappa\,|\,\mathbf{0}]^T$ (see Assumption 2). Therefore,
$[\mathbf{Q}_n]_1^m\,[\boldsymbol{\Phi}]_1^n = \big([\mathbf{Q}_n]_1^m\big)_{1:\kappa}$.   (75)
Combining this with Lemma 8, it readily follows that, for every $\kappa \geq 1$, the condition in point 2 of Theorem 6 is met, and also that $\lim_{n\to\infty}\kappa_n = \bar{\kappa}$. The proof is then completed by substituting $\liminf_{n\to\infty}\kappa_n = \limsup_{n\to\infty}\kappa_n = \bar{\kappa}$ into (70) and (71).

7. Entropy Gain Due to a Random Initial State

Here, we analyze the scenario illustrated in Figure 1 for the case in which there exists a random initial state $\mathbf{x}_0$ independent of the input $u_1^\infty$, and a zero (or deterministic) output disturbance.
The treatment of an initial state of the LTI system G requires one to first define an internal model for it. For this purpose, in this section, we consider the state-space realization of G in the Kalman canonical form, given by
$\mathbf{x}(k) \triangleq \begin{bmatrix} \mathbf{x}_{co}(k) \\ \mathbf{x}_{\bar{c}o}(k) \\ \mathbf{x}_{c\bar{o}}(k) \\ \mathbf{x}_{\bar{c}\bar{o}}(k) \end{bmatrix} = \begin{bmatrix} \mathbf{A}_{co} & \mathbf{A}_{12} & \mathbf{0} & \mathbf{0} \\ \mathbf{0} & \mathbf{A}_{\bar{c}o} & \mathbf{0} & \mathbf{0} \\ \mathbf{A}_{31} & \mathbf{A}_{32} & \mathbf{A}_{c\bar{o}} & \mathbf{A}_{34} \\ \mathbf{0} & \mathbf{A}_{42} & \mathbf{0} & \mathbf{A}_{\bar{c}\bar{o}} \end{bmatrix}\begin{bmatrix} \mathbf{x}_{co}(k-1) \\ \mathbf{x}_{\bar{c}o}(k-1) \\ \mathbf{x}_{c\bar{o}}(k-1) \\ \mathbf{x}_{\bar{c}\bar{o}}(k-1) \end{bmatrix} + \begin{bmatrix} \mathbf{b}_{co} \\ \mathbf{0} \\ \mathbf{b}_{c\bar{o}} \\ \mathbf{0} \end{bmatrix} u(k)$,   (76)
$y(k) = \begin{bmatrix} \mathbf{c}_{co}^T & \mathbf{c}_{\bar{c}o}^T & \mathbf{0} & \mathbf{0} \end{bmatrix}\mathbf{x}(k-1) + u(k)$,   (77)
(see, e.g., Reference [21] or Reference [22], Chapter 6), where the column state vectors $\mathbf{x}_{co}(k)$, $\mathbf{x}_{\bar{c}o}(k)$, $\mathbf{x}_{c\bar{o}}(k)$, $\mathbf{x}_{\bar{c}\bar{o}}(k)$ are, respectively, controllable and observable, non-controllable and observable, controllable and non-observable, and non-controllable and non-observable. There is no loss of generality in choosing this state-space representation, because every state-space representation consistent with a rational transfer function $G(z)$ can be written in this form (Reference [22], Theorem 6.7).
Since our interest is in the effect of the random initial state of G on its output, we only need to consider the observable subsystem within (76), with its input removed, given by
x_o(k) ≜ [ x_co(k) ; x_c̄o(k) ] = A_o x_o(k−1),  A_o ≜ [ A_co  A_12 ; 0  A_c̄o ],
ỹ(k) = c_o^T x_o(k−1),  c_o^T ≜ [ c_co^T  c_c̄o^T ],
where y ˜ is the natural response of G to its initial state x o ( 0 ) and x c o R p and x c ¯ o R q . We shall decompose y ˜ as
ỹ_1^∞ = [ỹ_c̄o]_1^∞ + [ỹ_co]_1^∞,
where y ˜ c ¯ o and y ˜ c o are the natural responses of G to initial states [ 0 1 × p x c ¯ o ( 0 ) T ] T and [ x c o ( 0 ) T 0 1 × q ] T , respectively. The natural response component y ˜ c o can be generated by the following minimal state-space representation of G ( z ) , without the effect of its input u:
x_co(k) ≜ [ x_co,1(k) ; x_co,2(k) ; x_co,3(k) ; ⋯ ; x_co,p(k) ] = A_co x_co(k−1),
A_co ≜ [ b_1  b_2  b_3  ⋯  b_p ;
         1    0    0    ⋯  0   ;
         0    1    0    ⋯  0   ;
         ⋮              ⋱      ;
         0    0    ⋯    1  0   ],   (79)
ỹ_co(k) = a^T x_co(k−1) + p^T x_co(k−1),  a^T ≜ [ a_1  a_2  ⋯  a_p ],  p^T ≜ [ b_1  b_2  ⋯  b_p ].   (80)
Now, we can state and prove the main result of this section:
Theorem 7.
Suppose G satisfies Assumption 1 and u 1 is entropy balanced. Assume that x o ( 0 ) (the observable part of the initial state of G) is independent of the input u 1 , | h ( x o ( 0 ) ) | < and that tr { K x o ( 0 ) } < . Then,
lim_{n→∞} (1/n)( h(y_1^n) − h(u_1^n) ) = ∑_{i=1}^{m} log|ρ_i|.
Proof. 
Both G and u 1 satisfy the conditions of Theorem 6. Thus, as in its statement, we write G ( z ) = F ( z ) G ˜ ( z ) , where G ˜ ( z ) is stable and minimum phase and F ( z ) is a stable FIR transfer function with only the m non-minimum-phase zeros of G ( z ) .
Defining w n 1 G ˜ n u n 1 , we have
y n 1 = F n G ˜ n u n 1 + y ˜ n 1 = F n w n 1 + y ˜ n 1 ,
h ( w n 1 ) = h ( u n 1 ) ,
and ỹ_1^n ⊥ w_1^n. In addition, the fact that G is stable guarantees that the sample second moment of ỹ_1^∞ decays exponentially, which means that ỹ_1^∞ satisfies Assumption 2. Thus, the conditions of Lemma 6 are met considering G_n = F_n, where now F_n = Q_n^T D_n R_n is the SVD of F_n, with d_{n,1} ≤ d_{n,2} ≤ ⋯ ≤ d_{n,n}. Consequently, the proof would be completed if we can show that lim_{n→∞} (1/n) h( [D_n]_1^m R_n w_1^n + [Q_n]_1^m ỹ_1^n ) = 0. But all the involved variables have bounded variance, while R_n is unitary, [Q_n]_1^m has orthonormal rows, and the entries of [D_n]_1^m decay exponentially with n. This implies that lim_{n→∞} (1/n) h( [D_n]_1^m R_n w_1^n + [Q_n]_1^m ỹ_1^n ) ≤ 0. Therefore, it only remains to prove that
lim_{n→∞} (1/n) h( [D_n]_1^m R_n w_1^n + [Q_n]_1^m ỹ_1^n ) ≥ 0.   (84)
Recalling (78), let us decompose [ y ˜ c o ] n 1 so that
y ˜ n 1 = F n P ˜ n x c o ( 0 ) + P n x c o ( 0 ) + [ y ˜ c ¯ o ] n 1 ,
where P ˜ n , P n R n × ( p + q ) , the sequences F n P ˜ n x c o ( 0 ) and P n x c o ( 0 ) , respectively, are the natural responses of G ˜ and F to the controllable and observable initial state x c o , and [ y ˜ c ¯ o ] n 1 is the natural response of G to the non-controllable and observable initial state x c ¯ o ( 0 ) . Then,
h( [D_n]_1^m R_n w_1^n + [Q_n]_1^m ỹ_1^n ) ≥(a) h( [Q_n]_1^m ỹ_1^n ) =(85) h( [Q_n]_1^m ( F_n P̃_n x_co(0) + P_n x_co(0) + [ỹ_c̄o]_1^n ) )
≥(b) h( [Q_n]_1^m ( F_n P̃_n x_co(0) + P_n x_co(0) ) | x_c̄o(0) ) = h( [Q_n]_1^m ( F_n P̃_n + P_n ) x_co(0) | x_c̄o(0) ),
where ( a ) is from the entropy-power inequality [1] and ( b ) holds because conditioning does not increase entropy and [ y ˜ c ¯ o ] n 1 is a deterministic function of x c ¯ o ( 0 ) . Let the SVD of [ Q n ] m 1 ( F n P ˜ n + P n ) be
[ Q n ] m 1 ( F n P ˜ n + P n ) = S n T n H n , n = m , m + 1 , ,
where S n R m × m is unitary, T n = diag { t 1 , t 2 , , t m } holds the singular values of [ Q n ] m 1 ( F n P ˜ n + P n ) and H n R m × p has orthonormal rows. Substituting this SVD into (87) we obtain
h ( [ D n ] m 1 R n w n 1 + [ Q n ] m 1 y ˜ n 1 ) h ( S n T n H n x c o ( 0 ) | x c ¯ o ( 0 ) ) = log ( det ( T n ) ) + h ( H n x c o ( 0 ) | x c ¯ o ( 0 ) ) .
This last differential entropy is bounded because | h ( x o ) | < and tr { K x o } < , which implies (thanks to Proposition A1) that | h ( H n x c o , x c ¯ o ) | < , and by the chain rule of entropy,
h ( H n x c o , x c ¯ o ) = h ( x c ¯ o ) + h ( H n x c o ( 0 ) | x c ¯ o ( 0 ) ) ,
so | h ( H n x c o ( 0 ) | x c ¯ o ( 0 ) ) | < because | h ( x c ¯ o ( 0 ) ) | < (again from Proposition A1). Thus, in view of (89) and (84), all that remains to prove is that
lim n σ min [ Q n ] m 1 F n P ˜ n + P n > 0 ,
For that purpose, notice that [ Q n ] m 1 F n P ˜ n + P n = 1 [ D n ] m [ R n ] m 1 P ˜ n + [ Q n ] m 1 P n . Therefore, from Lemma A4 (in Appendix C), it follows that (91) holds if
lim n σ min [ Q n ] m 1 P n > 0 ,
and
lim n σ max 1 [ D n ] m [ R n ] m 1 P ˜ n = 0 .
To prove (93), recall that the entries in the diagonal matrix 1 [ D n ] m decay exponentially with n. On the other hand, the rows of [ R n ] m 1 are orthonormal. Finally, the fact that G ˜ is stable implies that the p + q columns of P ˜ n have norms which are bounded for all n. These three observations readily yield that (93) holds.
To prove that (92) holds, write the rational transfer function of G (described by (80)) as
G(z) = (1 + a_1 z^{−1} + ⋯ + a_p z^{−p}) / (1 + b_1 z^{−1} + ⋯ + b_p z^{−p}) = F(z) · G̃(z),
with F(z) ≜ 1 + f_1 z^{−1} + ⋯ + f_m z^{−m} and G̃(z) ≜ (1 + ã_1 z^{−1} + ⋯ + ã_m̃ z^{−m̃}) / (1 + b_1 z^{−1} + ⋯ + b_p z^{−p}),   (94)
where m̃ ≜ p − m. The coefficients in the numerator of G(z) are related to those of F(z) and G̃(z) by the convolution
a_i = ∑_{j=0}^{m} f_j ã_{i−j},  i = 1, …, p,   (95)
where a ˜ 0 = f 0 = 1 .
Denote the natural response of F (up to time n) to its initial state x F ( 0 ) (which is a linear function of x c o ( 0 ) ) as
y ¨ n 1 P n x c o ( 0 ) .
Let w ˜ n 1 P ˜ n x c o ( 0 ) be the natural response of G ˜ to its initial state x c o ( 0 ) . Following the structure of (80), w ˜ ( k ) can be written as
w ˜ ( k ) = [ a ˜ 1 a ˜ m ˜ 0 0 ] x c o ( k 1 ) + p T x c o ( k 1 ) , k = 1 , 2 , ,
where x c o satisfies (79). Considering the following minimal state-space representation of F
x_F(k) ≜ [ x_F,1(k) ; x_F,2(k) ; x_F,3(k) ; ⋯ ; x_F,m(k) ] = A_F x_F(k−1) + [ 1 ; 0 ; ⋮ ; 0 ] w(k),
A_F ≜ [ 0  0  ⋯  0  0 ;
        1  0  ⋯  0  0 ;
        0  1  ⋯  0  0 ;
        ⋮           ⋱  ;
        0  0  ⋯  1  0 ],
ỹ_co(k) = c_F^T x_F(k−1) + w̃(k),  c_F^T ≜ [ f_1  f_2  ⋯  f_m ],
it can be seen that the natural response of F to its own initial state x F ( 0 ) can be written as
y¨(k) = ỹ_co(k) − w̃(k) − (the effect of f_1, …, f_{k−1}).
But, from (80) and (95),
ỹ_co(k) = ã^T diag{ [ f_1 ⋯ f_m ] }_{m̃+1} x_co(k−1) + [ ã_1 ⋯ ã_m̃ 0 ⋯ 0 ] x_co(k−1) + p^T x_co(k−1), in which the last two terms add up to w̃(k),
where ã ≜ [ 1  ã_1  ⋯  ã_m̃ ], and
therefore,
y ¨ ( 1 ) = a ˜ T diag { [ f 1 f 2 f m ] } m ˜ + 1 x c o ( 0 ) ,
y ¨ ( 2 ) = a ˜ T diag { [ 0 f 2 f m ] } m ˜ + 1 A c o x c o ( 0 ) ,
= a ˜ T diag { [ f 2 f m 0 ] } m ˜ + 1 x c o ( 0 )
y ¨ ( m ) = a ˜ T diag { [ 0 0 f m ] } m ˜ + 1 A c o m 1 x c o ,
= a ˜ T diag { [ f m 0 0 ] } m ˜ + 1 x c o ( 0 ) ,
with y ¨ ( k ) = 0 for k > m . Therefore,
ÿ_1^m = E x_co(0),  E ≜ [ M | N ],
where M ∈ R^{m×(p−m)} and N ∈ R^{m×m} is a lower anti-triangular Toeplitz matrix with ã_m̃ f_m along its main anti-diagonal.
This implies that P n = [ E T | 0 p × ( n m ) ] T and
σ min ( E ) > 0 .
Thus, resuming the reasoning before (94), we have that
[Q_n]_1^m P_n = [[Q_n]_1^m]_1^m E.
It then follows from (110) and Lemma 8 that
lim_{n→∞} σ_min( [Q_n]_1^m P_n ) = lim_{n→∞} σ_min( [[Q_n]_1^m]_1^m E ) > 0.
Hence, (91) is satisfied. Substituting (91) into (89) and the latter into (84) yields
lim_{n→∞} (1/n) h( [D_n]_1^m R_n w_1^n + [Q_n]_1^m ỹ_1^n ) = 0.
The proof is completed by invoking Lemma 6.   □
Theorem 7 allows us to formalize the effect that the presence or absence of a random initial state has on the entropy gain using arguments similar to those utilized in Section 6.
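Theorem 7 can also be illustrated numerically in the Gaussian case. The sketch below is our own (the first-order FIR filter, the unit-variance Gaussian initial state, and the i.i.d. N(0,1) input are illustrative assumptions, not part of the theorem statement); it evaluates the per-sample entropy gain via log-determinants and contrasts an NMP filter with its minimum-phase counterpart:

```python
import numpy as np
from scipy.linalg import toeplitz

def gain_with_initial_state(rho, n, var_x0=1.0):
    """(1/n)[h(y_1^n) - h(u_1^n)] for y = F_n u + natural response, F(z) = 1 - rho z^{-1},
    u i.i.d. N(0,1), and a Gaussian initial state x_0 ~ N(0, var_x0) independent of u.
    For this first-order F the natural response to x_0 is (-rho*x_0, 0, 0, ...)."""
    col = np.zeros(n); col[0], col[1] = 1.0, -rho
    Fn = toeplitz(col, np.zeros(n))
    Ky = Fn @ Fn.T
    Ky[0, 0] += (rho ** 2) * var_x0
    _, logdet = np.linalg.slogdet(Ky)
    return 0.5 * logdet / n                     # log det K_u = 0

for n in (50, 200, 800):
    print(n, gain_with_initial_state(2.0, n), gain_with_initial_state(0.5, n))
# NMP case (rho = 2): gain tends to log 2 = 0.693; MP case (rho = 0.5): gain tends to 0.
```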

8. Some Implications

The purpose of this section is to illustrate how the results obtained in the previous section can be applied to other problems. To do so, we present next some of the implications of these results on three different problems previously addressed in the literature, namely finding the rate-distortion function for non-stationary processes, an inequality in networked control theory, and the feedback capacity of Gaussian stationary channels. The common feature in these three problems is that, in all of them, non-minimum phase transfer functions play a role (either explicitly or implicitly).

8.1. Networked Control

The analysis developed in Reference [13] considers an LTI system P within a noisy feedback loop, as the one depicted in Figure 4. In this scheme, C represents a causal feedback channel which combines the output of P with an exogenous (noise) random process c 1 to generate its output. The process c 1 is assumed independent of the initial state of P, represented by the random vector x 0 , which has finite differential entropy.
For this system, it is shown in Reference [13], Theorem 4.2, that
h̄(y_1^∞) ≥ h̄(u_1^∞) + lim_{n→∞} (1/n) I(x_0; y_1^n),   (114a)
where I ( x 0 ; y 1 n ) is the mutual information (see Reference [1], Section 8.5) between x 0 and y 1 n , with equality if w is a deterministic function of v. Furthermore, it is shown in Reference [12], Lemma 3.2, that, if | h ( x 0 ) | < and the steady state variance of system P remains asymptotically bounded as k , then
lim_{n→∞} (1/n) I(x_0; y_1^n) ≥ ∑_{p_i : |p_i|>1} log|p_i|,   (114b)
where { p i } are the poles of P. Thus, for the (simplest) case in which w = v , the output y 1 is the result of filtering u 1 by a filter G = 1 1 P (as shown in Figure 4 right), and the resulting entropy rate of y 1 will exceed that of u 1 only if there is a random initial state with bounded differential entropy (see (114a)). Moreover, if w = v and G ( z ) is stable, (114) (as well as Reference [13], Lemma 4.3) implies that this entropy gain is lower bounded by the right-hand side (RHS) of (8), which is greater than zero if and only if G is NMP. However, both [12,13] do not provide conditions under which this lower bound is reached.
In Reference [14], Theorem 14, it is shown that, when there is perfect feedback (i.e., when v = w), as in Figure 4 (right), with P being the concatenation of a stabilizing LTI controller and an LTI plant, and assuming that u_1^∞ is i.i.d. Gaussian and that the initial state is Gaussian, then
h̄(y_1^∞) − h̄(u_1^∞) = ∑_{p_i : |p_i|>1} log|p_i|.   (115)
Notice that this implies reaching equality in both (114a) and (114b).
By using the results obtained in Section 7 we show next that equality holds in (114b) provided the feedback channel satisfies the following assumption:
Assumption 3.
The feedback channel in Figure 4 can be written as
w = A B v + B F ( c ) ,
where:
1.
A and B are stable rational transfer functions such that A B is biproper, A B P has the same unstable poles as P, and the feedback A B stabilizes the plant P.
2.
F is any (possibly non-linear) operator such that c ˜ F ( c ) has finite variance σ c ˜ ( n ) 2 for finite n, lim n n 1 log ( σ c ˜ ( n ) 2 ) = 0 , and
3.
c 1 x 0 .
We also extend Reference [14], Theorem 14, to situations including a feedback channel satisfying Assumption 3. For the perfect-feedback case, this extends the validity of (115) to a much larger class of distributions for u 1 .
An illustration of the class of feedback channels satisfying this assumption is depicted at the top of Figure 5. Trivial examples of channels satisfying Assumption 3 are a Gaussian additive channel preceded and followed by linear operators [23]. Indeed, when F is an LTI system with a strictly causal transfer function, the feedback channel satisfying Assumption 3 is widely known as a noise shaper with input pre- and post-filters, used in, e.g., References [24,25,26,27].
Theorem 8.
In the networked control system of Figure 4, suppose that the feedback channel satisfies Assumption 3, that the plant P ( z ) has poles { p i } i p , and that the input u 1 is entropy balanced. If the random initial states of A B and P, namely s 0 R q and x 0 R p , respectively, are independent, have finite variance and | h ( x 0 ) | < , then
lim_{n→∞} (1/n) I(x_0; y_1^n) = ∑_{|p_i|>1} log|p_i|.   (117a)
Moreover,
lim_{n→∞} (1/n)( h(y_1^n) − h(ũ_1^n) ) = ∑_{|p_i|>1} log|p_i|,   (117b)
where ũ ≜ u + B c̃ (see Figure 5, bottom).
Proof. 
Let P ( z ) = N ( z ) / G ( z ) and T ( z ) A ( z ) B ( z ) = Γ ( z ) / Θ ( z ) . We will first show that the output y n 1 can be written as
y n 1 = G n G ˜ n u ˜ n 1 + G n P ˜ n [ x 0 T s 0 T ] T + P n x 0 ,
where G ˜ is the stable LTI system with biproper and MP transfer function
G̃(z) ≜ Θ(z) / ( Θ(z) G(z) + N(z) Γ(z) ),
with s 0 R q , x 0 R p and [ x 0 T s 0 T ] T being the random initial states of T, G, and G ˜ , respectively, and
u ˜ u + B c ˜
(see Figure 5, bottom). The matrices P̃_n ∈ R^{n×p} and P_n ∈ R^{n×(p+q)}. From Figure 5, it is clear that the transfer function from ũ to y is G(z)Θ(z) / (Θ(z)G(z) + N(z)Γ(z)), validating the first term on the RHS of (118). In addition, it is evident that the initial state of G̃ is a linear combination of x_0 and s_0, justifying the term P̃_n [x_0^T s_0^T]^T as the natural response of G̃. Thus, it is only left to prove that the initial state of G is x_0. For that purpose, let G(z) = 1 − ∑_{i=1}^{p} g_i z^{−i} and N(z) = ∑_{i=1}^{p} n_i z^{−i}. Define the following variables:
o ≜ (1/G) y,  w ≜ N o.
Then, the recursion corresponding to P ( z ) is
o k = i = 1 p g i o k i + y k , k 1 ,
w k = i = 1 p n i o k i , k 1 .
This reveals that the initial state of P ( z ) corresponds to
x_0 = [ o_{1−p}  o_{2−p}  ⋯  o_0 ].
But, from (121), o is also the output of G ˜ to the input u ˜ , and
y_k = o_k − ∑_{i=1}^{p} g_i o_{k−i},  k ≥ 1,
which means that the initial state of G is x 0 .
Now, using (118), we have that
I ( x 0 ; y n 1 ) = h ( y n 1 ) h ( y n 1 | x 0 ) ,
= h ( y n 1 ) h ( F n [ G ˜ n u ˜ n 1 + P ˜ n s 0 ] ) ,
= h ( F n u ˜ n 1 + P n x 0 ) h ( u ˜ n 1 ) ,
where the first equality is because s_0 ⊥ x_0 and u¯_1^n ≜ G̃_n ũ_1^n + P̃_n s_0. The last equality holds since the first sample of the unit-impulse response of G is 1. Since u_1^∞ is entropy balanced, G̃(z) is biproper, stable, and MP, and both c̃_1^∞ and P̃_n s_0 have finite variance, it follows from Lemmas 3 and 4 that u¯_1^∞ is entropy balanced, as well. Thus, the proof of the first claim is completed by direct application of Theorem 7.
For the second claim,
h ( y n 1 ) h ( u ˜ n 1 ) = ( a ) h ( y n 1 ) h ( G ˜ n u ˜ n 1 ) = h ( y n 1 ) h ( u ¯ n 1 ) + ( h ( G ˜ n u ˜ n 1 ) h ( u ˜ n 1 ) ) ,
where ( a ) holds because the first sample of the unit-impulse response of G ˜ is g ˜ 0 = lim z G ˜ ( z ) = 1 . Then,
lim n 1 n ( h ( y n 1 ) h ( u ˜ n 1 ) ) = lim n 1 n ( h ( y n 1 ) h ( u ¯ n 1 ) ) + lim n 1 n ( h ( G ˜ n u ˜ n 1 ) h ( u ˜ n 1 ) ) ,
= ( a ) lim n 1 n ( h ( y n 1 ) h ( u ¯ n 1 ) ) ,
= ( b ) p i > 1 log p i ,
where ( a ) holds because G ˜ u ˜ is entropy balanced (from Lemma 4), and P ˜ n s 0 has finite variance, allowing us to apply Proposition A3. In turn, ( b ) follows from (128) and (117a). This completes the proof.   □
Remark 2.
If A(z) has poles outside the unit circle, then Theorem 8 can still be applied by associating those poles with P.
Remark 3.
Under the conditions of Theorem 8, one has that if either h̄(u_1^∞) or h̄(c̃_1^∞) exists, then the other entropy rate exists too. In that case, if c ⊥ u and defining c̄ ≜ B c̃, (117) yields
h̄(y_1^∞) − h̄(u_1^∞) − h̄(c̄_1^∞) = lim_{n→∞} (1/n) I(x_0; y_1^n) = ∑_{|p_i|>1} log|p_i|,
revealing that the gap in (114a) is exactly h ¯ ( c ¯ 1 ) . In addition, in the perfect-feedback scenario, Theorem 8 extends the validity of (115) from the Gaussian i.i.d. u and Gaussian x 0 considered in Reference [14], Theorem 14, to an entropy-balanced u and an x 0 with finite variance and finite differential entropy.
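As a sanity check of the quantity appearing in (115) and (117), note that in the perfect-feedback case the NMP zeros of G = 1/(1 − P) are precisely the unstable poles of P, so ∑_{|p_i|>1} log|p_i| can also be computed as (1/(2π)) ∫ log|G(e^{jω})| dω. The following minimal numerical sketch is our own; the first-order plant is an arbitrary choice for which G happens to be stable:

```python
import numpy as np

# Illustrative plant P(z) = -1.5 z^{-1} / (1 - 2 z^{-1}): one unstable pole at p = 2.
# Then G(z) = 1/(1 - P(z)) = (1 - 2 z^{-1})/(1 - 0.5 z^{-1}) is stable, biproper (g_0 = 1),
# and has an NMP zero exactly at the unstable pole p = 2.
p = 2.0
w = np.linspace(-np.pi, np.pi, 1_000_000, endpoint=False)
z = np.exp(1j * w)
P = -1.5 * z**-1 / (1 - 2.0 * z**-1)
G = 1.0 / (1.0 - P)

bode_integral = np.mean(np.log(np.abs(G)))   # (1/2pi) * integral of log|G(e^{jw})| dw
print(bode_integral, np.log(p))              # both approximately 0.6931
```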

8.2. Rate Distortion Function for Non-Stationary Processes

In this section, we obtain a simpler proof of a result by Gray, Hashimoto and Arimoto [15,16,17], which compares the rate distortion function (RDF) of a non-stationary auto-regressive Gaussian process x 1 (of a certain class to be defined shortly) to that of a corresponding stationary version, under MSE distortion. Our proof is based upon the ideas developed in the previous sections, and extends the class of non-stationary sources for which the results in References [15,16,17] are valid.
To be more precise, let { a i } i = 1 and { a ˜ i } i = 1 be, respectively, the impulse responses of two linear time-invariant filters A and A ˜ with rational transfer functions
A(z) = z^M / ∏_{i=1}^{M} (z − p_i),
Ã(z) = z^M / ( ∏_{i=1}^{M} |p_i^*| (z − 1/p_i^*) ),
where |p_i| > 1, i = 1, …, M. From these definitions, it is clear that A(z) is unstable, Ã(z) is stable, and
| A ( e j ω ) | = | A ˜ ( e j ω ) | , ω [ π , π ] .
Notice also that lim_{z→∞} A(z) = 1 and lim_{z→∞} Ã(z) = 1 / ∏_{i=1}^{M} |p_i|; thus,
a_0 = 1,  ã_0 = ∏_{i=1}^{M} |p_i|^{−1}.
Consider the non-stationary random sequence (source) x 1 and the asymptotically stationary source x ˜ 1 generated by passing a stationary Gaussian process w 1 through A ( z ) and A ˜ ( z ) , respectively, which can be written as
x n 1 = A n w 1 n , n = 1 , ,
x ˜ n 1 = A ˜ n w 1 n , n = 1 , .
(A block-diagram associated with the construction of x is presented in Figure 6.)
Define the rate-distortion functions for these two sources as
R x ( D ) lim n R x , n ( D ) , R x , n ( D ) min 1 n I ( x 1 n ; x 1 n + u 1 n ) ,
R x ˜ ( D ) lim n R x ˜ , n ( D ) , R x ˜ , n ( D ) min 1 n I ( x ˜ 1 n ; x ˜ 1 n + u ˜ 1 n ) ,
where, for each n, the minima are taken over all the conditional probability density functions f u 1 n | x 1 n and f u ˜ 1 n | x ˜ 1 n yielding E [ u n 1 2 ] / n D and E [ u ˜ n 1 2 ] / n D , respectively.
The above rate-distortion functions have been characterized in References [15,16,17] for the case in which w 1 is an i.i.d. Gaussian process. In particular, it is explicitly stated in References [16,17] that, for that case,
R_x(D) − R_x̃(D) = (1/(2π)) ∫_{−π}^{π} log|A^{−1}(e^{jω})| dω = ∑_{i=1}^{M} log|p_i|.   (142)
We will next provide an alternative and simpler proof of this result, and extend its validity to general (not necessarily stationary) Gaussian w_1^∞, using the entropy-gain properties of non-minimum-phase filters established in Section 6. Indeed, the approach in References [15,16,17] is based upon asymptotically equivalent Toeplitz matrices for the signals' covariance matrices. This restricts w_1^∞ to be Gaussian and i.i.d. and A(z) to be an all-pole unstable transfer function, so that the only non-stationarity allowed is that arising from unstable poles. For instance, a cyclo-stationary innovation followed by an unstable filter A(z) would yield a source which cannot be treated using Gray and Hashimoto's approach. By contrast, the reasoning behind our proof lets w_1^∞ be any entropy-balanced Gaussian process with bounded differential entropy rate, and lets the source be A w, with A(z) having unstable poles (and possibly zeros and stable poles, as well).
The statement is as follows:
Theorem 9.
Let w 1 be any Gaussian entropy-balanced process with bounded differential entropy rate, and let x 1 and x ˜ 1 be as defined in (138) and (139), respectively. Then, (142) holds.
Thanks to the ideas developed in the previous sections, it is possible to give an intuitive outline of the proof of this theorem (given in Appendix B, page 40) by using a sequence of block diagrams. More precisely, consider the diagrams shown in Figure 7.
In the top diagram in this figure, suppose that y = Cx + u realizes the RDF for the non-stationary source x. The sequence u is independent of x, and the linear filter C(z) is such that the error (y − x) ⊥ y (a necessary condition for minimum-MSE optimality). The filter B(z) is the Blaschke product of A(z) (see (A83) in Appendix B): a stable, NMP filter with unit frequency-response magnitude such that x̃ = Bx.
If one moves the filter B ( z ) towards the source, then the middle diagram in Figure 7 is obtained. By doing this, the stationary source x ˜ appears with an additive error signal u ˜ that has the same asymptotic variance as u, reconstructed as y ˜ = C x ˜ + u ˜ . From the invertibility of B ( z ) , it also follows that the mutual information rate between x ˜ and y ˜ equals that between x and y. Thus, the channel y ˜ = C x ˜ + u ˜ has the same rate and distortion as the channel y = C x + u .
However, if one now adds a short disturbance d to the error signal ũ (as depicted in the bottom diagram of Figure 7), then the resulting additive error term ū = ũ + d will be independent of x̃ and will have the same asymptotic variance as ũ. Nonetheless, the differential entropy rate of ū will exceed that of ũ by the RHS of (142). This makes the mutual information rate between x̃ and ȳ smaller than that between x̃ and ỹ by the same amount. Hence, R_x̃(D) is at most R_x(D) − ∑_{i=1}^{M} log|p_i|. A similar reasoning can be followed to prove that R_x(D) − R_x̃(D) ≤ ∑_{i=1}^{M} log|p_i|.
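A numerical illustration of (142) (our own sketch, not part of the proof) can be obtained by reverse water-filling on the eigenvalues of the covariance matrices of x_1^n and x̃_1^n for moderate n. It assumes a single unstable pole p = 2, unit-variance i.i.d. Gaussian innovations, and two arbitrary distortion levels:

```python
import numpy as np
from scipy.linalg import toeplitz

p = 2.0                                              # single unstable pole, M = 1

def source_cov(n, stable):
    """Covariance of x_1^n = A_n w_1^n (or x~_1^n = A~_n w_1^n) for i.i.d. N(0,1) innovations."""
    k = np.arange(n)
    h = (1.0 / p) * (1.0 / p) ** k if stable else p ** k   # impulse response of A~ / of A
    An = toeplitz(h, np.zeros(n))
    return An @ An.T

def rdf(eigs, D):
    """Reverse water-filling RDF (nats/sample) of a Gaussian vector with eigenvalues eigs."""
    lo, hi = 0.0, float(np.max(eigs))
    for _ in range(200):                             # bisection on the water level theta
        theta = 0.5 * (lo + hi)
        if np.mean(np.minimum(theta, eigs)) > D:
            hi = theta
        else:
            lo = theta
    return float(np.mean(0.5 * np.maximum(0.0, np.log(eigs / theta))))

for n in (10, 15, 20):
    lx = np.linalg.eigvalsh(source_cov(n, stable=False))
    lxt = np.linalg.eigvalsh(source_cov(n, stable=True))
    for D in (0.05, 0.3):
        print(n, D, rdf(lx, D) - rdf(lxt, D))
# For D = 0.05 no eigenvalue is clipped and the gap equals log 2 exactly;
# for D = 0.3 it approaches log 2 = 0.693 as n grows.
```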

8.3. The Feedback Channel Capacity of (Non-White) Gaussian Channels

Consider a non-white additive Gaussian channel of the form
y k = x k + z k ,
where the input x is subject to the power constraint
lim_{n→∞} (1/n) E[ ‖x_1^n‖² ] ≤ P,
and z 1 is a stationary Gaussian process.
The feedback information capacity of this channel is realized by a Gaussian input x, and is given by
C_FB = lim_{n→∞} max_{ K_{x_1^n} : (1/n) tr{K_{x_1^n}} ≤ P } (1/n) I(x_1^n; y_1^n),
where K x n 1 is the covariance matrix of x n 1 , and, for every k N , the input x k is allowed to depend upon the channel outputs y 1 k 1 (since there exists a causal, noise-less feedback channel with one-step delay).
In Reference [9], it was shown that, if z is an auto-regressive moving-average process of M-th order, then C_FB can be achieved by the scheme shown in Figure 8. In this system, B is a strictly causal and stable finite-order filter, and v_1^∞ is Gaussian with v_k = 0 for all k > M and such that v_1^M has a positive-definite covariance matrix K_{v_1^M}.
Here, we use the ideas developed in Section 6 to show that the information rate achieved by the capacity-achieving scheme proposed in Reference [9] drops to zero if there exists any additive disturbance of length at least M and finite differential entropy affecting the output, no matter how small.
To see this, notice that, in this case, and for all n > M ,
I(x_1^n; y_1^n) = I(v_1^M; y_1^n) = h(y_1^n) − h(y_1^n | v_1^n)
= h(y_1^n) − h( (I_n + B_n) z_1^n + v_1^n | v_1^M )
= h(y_1^n) − h( (I_n + B_n) z_1^n | v_1^M )
= h(y_1^n) − h( (I_n + B_n) z_1^n ) = h(y_1^n) − h(z_1^n)
= h( (I_n + B_n) z_1^n + v_1^n ) − h(z_1^n),
since det(I_n + B_n) = 1. From Theorem 4, this gap between differential entropies is precisely the entropy gain introduced by I_n + B_n to an input z_1^n when the output is affected by the disturbance v_1^M. Thus, from Theorem 4, the capacity of this scheme corresponds to (1/(2π)) ∫_{−π}^{π} log|1 + B(e^{jω})| dω = ∑_{|ρ_i|>1} log|ρ_i|, where {ρ_i}_{i=1}^M are the zeros of 1 + B(z), which is precisely the result stated in Reference [9], Theorem 4.1.
However, if the output is now affected by an additive disturbance d_1^∞ not passing through B(z), such that d_k = 0 for all k > M and |h(d_1^M)| < ∞, with d_1^∞ ⊥ (v_1^M, z_1^∞), then we will have
y n 1 = v n 1 + ( I n + B n ) z n 1 + d n 1 .
In this case,
I(x_1^n; y_1^n) = I(v_1^M; y_1^n) = h(y_1^n) − h(y_1^n | v_1^n)
= h(y_1^n) − h( (I_n + B_n) z_1^n + v_1^n + d_1^n | v_1^M )
= h(y_1^n) − h( (I_n + B_n) z_1^n + d_1^n | v_1^M )
= h(y_1^n) − h( (I_n + B_n) z_1^n + d_1^n ).
But lim_{n→∞} (1/n)( h( (I_n + B_n) z_1^n + v_1^n + d_1^n ) − h( (I_n + B_n) z_1^n + d_1^n ) ) = 0, which follows directly from applying Theorem 4 to each of the two differential entropies: both exceed h(z_1^n) by n ∑_{|ρ_i|>1} log|ρ_i| + o(n), since in both cases the output is affected by a finite-length disturbance of length at least M and with finite differential entropy. Notice that this result holds irrespective of how small the power of the disturbance may be.
Thus, the capacity-achieving scheme proposed in Reference [9] (and further studied in Reference [28]), although of groundbreaking theoretical importance, would yield zero rate in any practical situation, since in every physically implemented scheme, signals are unavoidably affected by some amount of noise.
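The collapse just described is easy to reproduce numerically in a Gaussian setting. The following sketch is our own illustration; the filter 1 + B(z), the noise statistics, and the disturbance variances are arbitrary choices (not the construction of Reference [9]), and both per-sample information quantities are evaluated via Gaussian log-determinants:

```python
import numpy as np
from scipy.linalg import toeplitz

# 1 + B(z) = 1 - 2.5 z^{-1} + z^{-2}: B strictly causal, one NMP zero at rho = 2 (illustrative)
coeffs = np.array([1.0, -2.5, 1.0])
M = 2                                   # length of v (and of the extra disturbance d)
var_v, var_d = 1.0, 1e-4

def rates(n):
    col = np.zeros(n); col[:len(coeffs)] = coeffs
    IB = toeplitz(col, np.zeros(n))     # Toeplitz matrix of 1 + B(z); det = 1
    Kz = IB @ IB.T                      # covariance of (I_n + B_n) z_1^n, z i.i.d. N(0,1)
    Kv = np.zeros((n, n)); Kv[:M, :M] = var_v * np.eye(M)
    Kd = np.zeros((n, n)); Kd[:M, :M] = var_d * np.eye(M)
    ld = lambda K: np.linalg.slogdet(K)[1]
    # Without d: (1/n) I(x;y) = (1/n)[h((I+B)z + v) - h(z)], and log det K_z = 0
    rate_clean = 0.5 * ld(Kz + Kv) / n
    # With d:    (1/n) I(x;y) = (1/n)[h((I+B)z + v + d) - h((I+B)z + d)]
    rate_noisy = 0.5 * (ld(Kz + Kv + Kd) - ld(Kz + Kd)) / n
    return rate_clean, rate_noisy

for n in (100, 400, 1600):
    print(n, rates(n))   # first value tends to log 2 = 0.693, second tends to 0
```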

9. Conclusions

We have provided an intuitive explanation and a rigorous characterization of the entropy gain of a linear time-invariant (LTI) system, defined as the difference between the differential entropy rates of its output and input random signals. The continuous-time version of this problem, considered by Shannon in Theorem 14 of his 1948 landmark paper, involves an LTI system G c band limited to B [Hz]. For this scenario, we restricted our attention to systems such that the samples of its unit-impulse response, taken ( 2 B ) 1 seconds apart, correspond to the unit-impulse response g 0 , g 1 , of a causal and stable discrete-time system G. We show that the entropy gain in this case is log | g 0 | , which implies that, for this class of systems, Shannon’s Theorem 14 holds if and only if G c has a corresponding discrete-time G that is minimum phase (MP).
For the discrete-time case, we introduced a new notion referred to as effective differential entropy, which quantifies the amount of uncertainty in vector signals that are confined to subspaces of lower dimensionality than that of the signals themselves. (Note that this is not possible with the conventional notion of differential entropy, which simply diverges to minus infinity.) It turns out that the difference in effective differential entropy rate between an n-length input to an LTI discrete-time system with frequency response G(e^{jω}) and its full-length output, as n tends to infinity, equals (1/(2π)) ∫_{−π}^{π} log|G(e^{jω})| dω.
When comparing input and output sequences of equal length, our analysis revealed that, in the absence of external random disturbances, the entropy gain of a discrete-time LTI system G with unit-impulse response g_0, g_1, … is simply log|g_0|. An entropy gain greater than log|g_0| can be obtained only if a random signal is added to the output of G and if the output process has statistical properties that make it susceptible to the added random signal. In order to characterize the role of G, its input has been assumed to be entropy balanced (EB), a notion introduced herein. Crucially, the differential entropy rate of an EB process is not susceptible to random signals. EB processes constitute a large family that includes Gaussian processes with bounded, non-vanishing variance. We also show that (i) the sum of an EB process and any independent bounded-variance process is EB, too, and (ii) passing an EB process through a stable MP filter yields an EB process. When the input is EB, we show that if G has NMP zeros ρ_1, ρ_2, …, ρ_m, then the largest possible entropy gain is log|g_0| + ∑_{i=1}^{m} log|ρ_i|, which equals (1/(2π)) ∫_{−π}^{π} log|G(e^{jω})| dω. This upper bound is achieved by adding a finite-length output disturbance with finite variance and bounded differential entropy if and only if its length is at least m, no matter how tiny its variance may be. The same entropy gain is also obtained if G has a random initial state with bounded differential entropy and finite variance.
We used these fundamental insights about when the entropy gain occurs in order to establish a new and more general proof of the quadratic rate-distortion function for non-stationary Gaussian sources. Moreover, we demonstrated that the information rate of the capacity-achieving scheme proposed in Reference [9] for the auto regressive Gaussian channel with feedback drops to zero in the presence of any additive disturbance in the channel input or output of sufficient (finite) length, no matter how small it may be. This has crucial implications in any physical setup, where noise is unavoidable.

Author Contributions

Conceptualization, M.S.D. and M.M.; Investigation, M.S.D., M.M. and J.Ø.; Writing—original draft, M.S.D.; Writing—review and editing, M.M. and J.Ø. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Comisión Nacional de Investigación Científica y Tecnológica grant number FB0008.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proof of Theorem 3

The total length of the output, ℓ, will grow with the length n of the input if G is FIR, and will be infinite if G is IIR. Letting η + 1 be the length of the impulse response of G in the FIR case, we define the output-length function
ℓ(n) ≜ (length of y when the input is u_1^n) = n + η if G is FIR, and ℓ(n) = ∞ if G is IIR.
It is also convenient to define the sequence of matrices {Ğ_n}_{n=1}^∞, where Ğ_n ∈ R^{ℓ(n)×n} is Toeplitz with [Ğ_n]_{i,j} = 0 for i < j and [Ğ_n]_{i,j} = g_{i−j} for i ≥ j. This allows one to write the entire output y_1^∞ of a causal LTI filter G with impulse response {g_k}_{k=0}^{η} to an input u_1^∞ as
y ( n ) 1 ( u n 1 ) = G ˘ n u n 1 .
Let the SVD of G ˘ n be G ˘ n = Q ˘ n T D ˘ n R ˘ n , where Q ˘ n R n × ( n ) has orthonormal rows, D ˘ n R n × n is diagonal with positive elements, and R ˘ n R n × n is unitary.
The effective differential entropy of y 1 ( n ) ( u 1 n ) exceeds the differential entropy of u 1 n by
h̆( y_1^{ℓ(n)}(u_1^n) ) − h(u_1^n) = h( Q̆_n Ğ_n u_1^n ) − h(u_1^n) = h( D̆_n R̆_n u_1^n ) − h(u_1^n) = log det( D̆_n ).
The determinant of D̆_n can be related to that of Ğ_n^T Ğ_n by noticing that
G ˘ n T G ˘ n = ( Q ˘ n T D ˘ n R ˘ n ) T ( Q ˘ n T D ˘ n R ˘ n ) = R ˘ n T D ˘ n Q ˘ n Q ˘ n T D ˘ n R ˘ n = R ˘ n T D ˘ n 2 R ˘ n .
Since R ˘ n is unitary, it follows that det D ˘ n 2 = det G ˘ n T G ˘ n , which from (A3) means that
h̆( y_1^{ℓ(n)}(u_1^n) ) − h(u_1^n) = (1/2) log( det( Ğ_n^T Ğ_n ) ).
The product H_n ≜ Ğ_n^T Ğ_n is a symmetric Toeplitz matrix, with its first column, [h_0 h_1 ⋯ h_{n−1}]^T, given by
h_i = ∑_{k=0}^{n} g_k g_{k−i}. Thus, the sequence {h_i}_{i=0}^{n−1} corresponds to samples 0 to n−1 of the complete convolution of g with its time-reversed (possibly infinitely long) version, even when the filter G is IIR. Consequently, and since G(z) has no zeros on the unit circle and g is absolutely summable, we can use Grenander and Szegő's theorem [29], and Reference [18], Theorem 4.2, to obtain that
lim_{n→∞} log( det( Ğ_n^T Ğ_n )^{1/n} ) = (1/(2π)) ∫_{−π}^{π} log|G(e^{jω})|² dω.
In order to finish the proof, we divide (A5) by n, take the limit as n , and replace (A6) in the latter.
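The limit (A6) can also be checked numerically for a concrete filter: the per-sample effective entropy gain (1/(2n)) log det(Ğ_n^T Ğ_n) indeed approaches (1/(2π)) ∫ log|G(e^{jω})| dω, even when G is NMP. The sketch below is our own; the FIR filter is an arbitrary illustrative choice:

```python
import numpy as np
from scipy.linalg import toeplitz

# Illustrative NMP FIR filter: G(z) = 1 - 2.5 z^{-1} + z^{-2}, so eta = 2
g = np.array([1.0, -2.5, 1.0])

def effective_gain(n):
    """(1/n)[h_eff(y_1^{l(n)}) - h(u_1^n)] = (1/2n) log det(Gbreve_n^T Gbreve_n), cf. (A5)."""
    eta = len(g) - 1
    col = np.zeros(n + eta); col[:len(g)] = g
    Gb = toeplitz(col, np.zeros(n))          # tall (n+eta) x n convolution matrix
    _, logdet = np.linalg.slogdet(Gb.T @ Gb)
    return 0.5 * logdet / n

w = np.linspace(-np.pi, np.pi, 1_000_000, endpoint=False)
target = np.mean(np.log(np.abs(np.polyval(g, np.exp(1j * w)))))   # (1/2pi) int log|G| dw

for n in (50, 200, 800):
    print(n, effective_gain(n), target)      # effective gain approaches the integral (~0.693 here)
```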

Appendix B. Proofs of Results Stated in the Previous Sections

Proof of Lemma 1.
Let σ u ( k ) 2 be the variance of u ( k ) . Thus, h ( u n 1 ) = 1 2 log ( ( 2 π e ) n det ( diag { σ u ( k ) 2 } k = 1 n ) ) . Let y n ν + 1 Φ n u n 1 . Then, K y n ν + 1 = Φ n diag { σ u ( k ) 2 } k = 1 n Φ n T . As a consequence,
h ( y n ν + 1 ) = 1 2 log ( 2 π e ) [ n ν ] det Φ n diag { σ u ( k ) 2 } k = 1 n Φ n T .
But from the Courant-Fischer theorem [30],
log det ( diag { σ u ( k ) 2 } k = 1 n ) ν log ( σ ^ 2 ) log det ( Φ n diag { σ u ( k ) 2 } k = 1 n Φ n T ) log det ( diag { σ u ( k ) 2 } k = 1 n ) ν log ( σ ˇ 2 ) ;
thus, lim n 1 n ( h ( y n ν + 1 ) h ( u n 1 ) ) = 0 , satisfying Condition ii) in Definition 2. Adding to this the fact that, in this case, σ u ( n ) 2 σ ^ 2 < for all n, Condition i) in Definition 2 is satisfied, as well, completing the proof.   □
Proof of Lemma 2.
Let { b i , } = 1 be the intervals (bins) in R where the sample u ( i ) has constant PDF. Define the discrete random process c 1 , where c ( i ) = if and only if u ( i ) b i , . Let y n ν + 1 Φ n u n 1 where Φ n R ( n ν ) × n has orthonormal rows. Then,
h ( y n ν + 1 ) = h ( y n ν + 1 | c n 1 ) + I ( c n 1 ; y n ν + 1 )
h ( y n ν + 1 | c n 1 ) + I ( c n 1 ; u n 1 ) ,
where the inequality is due to the fact that u n 1 and y n ν + 1 are deterministic functions of u n 1 ; hence, c n 1 u n 1 y n ν + 1 . Subtracting h ( u n 1 ) from (A9) we obtain
h ( y n ν + 1 ) h ( u n 1 ) h ( y n ν + 1 | c n 1 ) + I ( c n 1 ; u n 1 ) h ( u n 1 )
= h ( y n ν + 1 | c n 1 ) h ( u n 1 | c n 1 ) .
Hence,
lim n 1 n h ( y n ν + 1 ) h ( u n 1 ) lim n 1 n h ( y n ν + 1 | c n 1 ) h ( u n 1 | c n 1 ) = 0 ,
where the last equality follows from Lemma A1 (in Appendix C) whose conditions are met because, given c n 1 , the sequence u n 1 has independent entries each of them distributed uniformly over a possibly different interval with finite and positive measure. The opposite inequality is obtained by following the same steps as in the proof of Lemma A1, from (A124) onwards, which completes the proof.   □
Proof of Lemma 3.
Let y n 1 [ Ψ n T | Φ n T ] T w n 1 , where [ Ψ n T | Φ n T ] T R n × n is a unitary matrix and where Ψ n R ν × n and Φ n R ( n ν ) × n have orthonormal rows.
Then,
h ( y n ν + 1 ) = h ( y n 1 ) h ( y ν 1 | y n ν + 1 ) = h ( w n 1 ) h ( y ν 1 | y n ν + 1 ) .
We can lower bound h ( y ν 1 | y n ν + 1 ) as follows:
h ( y ν 1 | y n ν + 1 ) = ( c ) h ( Ψ n u n 1 + Ψ n v n 1 | Φ n u n 1 + Φ n v n 1 )
( a ) h ( Ψ n u n 1 + Ψ n v n 1 | Φ n u n 1 + Φ n v n 1 , v n 1 )
= ( c ) h ( Ψ n u n 1 | Φ n u n 1 + Φ n v n 1 , v n 1 )
= ( c ) h ( Ψ n u n 1 | Φ n u n 1 , v n 1 )
= ( b ) h ( Ψ n u n 1 | Φ n u n 1 )
= ( c ) h ( u n 1 ) h ( Φ n u n 1 ) ,
where ( a ) holds because conditioning does not increase entropy, ( b ) is from the fact that u n 1 v n 1 , and ( c ) follows from the chain rule of entropy.
Substituting this result into (A14), dividing by n, taking the limit as n , and recalling that u 1 is entropy balanced, we conclude that lim n 1 n ( h ( Φ n w n 1 ) h ( w n 1 ) ) 0 .
The opposite bound over h ( y ν 1 | y n ν + 1 ) can be obtained from
h ( y ν 1 | y n ν + 1 ) = h ( Ψ n u n 1 + Ψ n v n 1 | Φ n u n 1 + Φ n v n 1 ) h ( Ψ n u n 1 + Ψ n v n 1 ) h ( Ψ n ( w G ) n 1 ) ,
where ( w G ) n 1 is a jointly Gaussian sequence with the same second-order moment as w n 1 . Therefore, h ( Ψ n ( w G ) n 1 ) = 1 2 log ( ( 2 π e ) ν det ( Ψ n K w n 1 Ψ n T ) ) ν 2 log ( 2 π e λ max ( K w n 1 ) ) . But w n 1 satisfies the assumptions of Proposition A2; thus, lim n n 1 log ( λ max ( K w n 1 ) ) = 0 . Therefore, lim n n 1 h ( Ψ n ( w G ) n 1 ) 0 , which substituted in (A14) yields
lim n 1 n h ( y ν 1 | y n ν + 1 ) = lim n 1 n ( h ( Φ n w n 1 ) h ( w n 1 ) ) 0 .
Hence, w 1 satisfies Condition ii) of Definition 2. Since w 1 also satisfies Condition i) of Definition 2, it follows that w 1 is entropy balanced, completing the proof.   □
Proof of Lemma 4.
Pick any ν N and let y n 1 [ Φ n T | Ψ n T ] T w n 1 where [ Φ n T | Ψ n T ] T R n × n is a unitary matrix and the matrices Ψ n R ν × n and Φ n R ( n ν ) × n have orthonormal rows. Since w n 1 = G n u n 1 , we have that
Φ n w n 1 = Φ n G n u n 1 .
Let Φ n G n = A n Σ n B n be the SVD of Φ n G n , where A n R ( n ν ) × ( n ν ) is an orthogonal matrix, B n R ( n ν ) × n has orthonormal rows and Σ n R ( n ν ) × ( n ν ) is a diagonal matrix with the singular values of Φ n G n .
Hence
h ( Φ n w n 1 ) = h ( Φ n G n u n 1 ) = h ( A n Σ n B n u n 1 ) = log det ( Σ n ) + h ( B n u n 1 ) .
The singular values of Φ n G n are σ i ( Φ n G n ) = λ i ( Φ n G n G n T Φ n T ) , i = 1 , 2 , , n ν . Now, notice that
λ i ( [ Φ n T | Ψ n T ] T G G T [ Φ n T | Ψ n T ] ) = λ i ( G G T )
and that
[ Φ n T | Ψ n T ] T G G T [ Φ n T | Ψ n T ] = Φ n G G T Φ n T Φ n G G T Ψ n T Ψ n G G T Φ n T Ψ n G G T Ψ n T .
Thus, from (A24) and the Cauchy eigenvalue interlacing theorem [30],
λ i ( G n G n T ) λ i ( Φ n G G T Φ n T ) λ i + ν ( G n G n T ) , i = 1 , , n ν .
Hence,
ν 2 n log λ max ( G n G n T ) 1 n log det ( Σ n ) 1 n log det ( G n ) ν 2 n log λ min ( G n G n T ) .
Recalling that G is minimum phase (which guarantees that its singular values change at most polynomially with n, due to Lemma 7), we conclude that
lim n 1 n log det ( Σ n ) = 1 n log | det ( G n ) | .
Substituting back into (A23), we arrive to
lim n 1 n h ( Φ n w n 1 ) = ( a ) lim n 1 n log | det ( G n ) | + lim n 1 n h ( u 1 n ) = lim n 1 n h ( w 1 n ) ,
where ( a ) holds because u 1 is entropy balanced. This completes the proof.   □
Proof of Lemma 5.
Let { Ψ n } n = 1 be a sequence of matrices, each Ψ n R κ × n with orthonormal rows spanning a subspace of R n that contains the span of the columns of [ Φ ] n 1 . For each n N , let Ψ ¯ n R ( n κ ) × n be such that H n [ Ψ n T | Ψ ¯ n T ] T is a unitary matrix. Then,
h ( Ψ ¯ n y n 1 ) = h ( Ψ ¯ n u n 1 ) .
Thus,
lim n 1 n h ( y n 1 ) h ( u n 1 ) = lim n 1 n ( h ( y n 1 ) h ( Ψ ¯ n y n 1 ) ) ( h ( u n 1 ) h ( Ψ ¯ n u n 1 ) ) = 0 ,
where the last equality holds because u 1 is entropy balanced and y 1 is entropy balanced (from Lemma 3). This completes the proof.   □
Proof of Lemma 6.
Since Q n is unitary, we have that
h ( y n 1 ) = h ( Q n y n 1 ) = h ( D n R n u n 1 v n 1 + Q n z n 1 z ¯ n 1 w n 1 ) = h ( w n 1 ) ,
where
w n 1 Q n y n 1 = v n 1 + z ¯ n 1 ,
v n 1 D n R n u n 1 ,
z ¯ n 1 Q n z n 1 .
Thus,
h ( y n 1 ) = h ( w 1 n ) = ( a ) h ( w 1 m ) + h ( w m + 1 n | w 1 m ) = h ( [ D n ] m 1 R n u n 1 + [ Q n ] m 1 z n 1 ) + h ( w m + 1 n | w 1 m ) ,
where ( a ) follows from the chain rule of differential entropy. It only remains to show that the limit of ( 1 / n ) h ( w m + 1 n | w 1 m ) as n equals the entropy rate of u 1 . We will do this by deriving a lower and an upper bounds which converge to the same expression as n .
A lower bound for h ( w m + 1 n | w 1 m ) can be obtained by noticing that
h ( w m + 1 n | w 1 m ) = h ( v m + 1 n + z ¯ m + 1 n | v 1 m + z ¯ 1 m )
( a ) h ( v m + 1 n + z ¯ m + 1 n | v 1 m , z ¯ 1 n )
= ( b ) h ( v m + 1 n | v 1 m , z ¯ 1 n )
= ( c ) h ( v m + 1 n | v 1 m )
= ( d ) h ( v 1 n ) h ( v 1 m )
= ( e ) h ( u 1 n ) h ( v 1 m ) ,
where ( a ) follows from the fact that conditioning on more information does not increase differential entropy, ( b ) is due to the fact that h ( x + a ) = h ( x ) , for any constant a, ( c ) holds because z ¯ 1 v 1 , ( d ) is a direct application of the chain rule of differential entropy, and ( e ) stems from (A34) and the fact that det ( D n R n ) = 1 . On the other hand,
h ( v 1 m ) = h ( [ D n ] m 1 R n u n 1 ) = i = 1 m log d n , i + h ( [ R n ] m 1 u n 1 ) .
Then, by inserting (A43) and (A42) in (A37), dividing by n, and taking the limit n , we obtain
lim n 1 n h ( w m + 1 n | w 1 m ) lim n 1 n h ( u 1 n ) i = 1 m log d n , i h ( [ R n ] m 1 u n 1 )
= h ¯ ( u 1 ) lim n 1 n i = 1 m log d n , i ,
where the last equality is a consequence of the fact that u 1 is entropy balanced (specifically, from Proposition A3).
We now derive an upper bound for h ( w m + 1 n | w 1 m ) . Defining the random vector
x n m + 1 [ R n ] n m + 1 u n 1 ,
and since D n is diagonal, we can write
v m 1 = [ D n ] n m + 1 R n u n 1 = m + 1 [ D n ] n x n m + 1 ,
where
m + 1 [ D n ] n diag { d n , m + 1 , d n , m + 2 , , d n , n } .
Therefore,
h ( w m + 1 n | w 1 m ) h ( w n m + 1 ) = h ( m + 1 [ D n ] n x n m + 1 + z ¯ n m + 1 )
= log det ( m + 1 [ D n ] n ) + h ( x n m + 1 + ( m + 1 [ D n ] n ) 1 z ¯ n m + 1 ) .
Notice that, by Assumption 2, z ¯ n m + 1 = [ Q n ] n m + 1 z n 1 = [ Q n ] n m + 1 [ Φ ] n 1 s κ 1 and, thus, is restricted to the span of [ Q n ] n m + 1 [ Φ ] n 1 of dimension κ n κ , for all n m + κ . Then, for every n > m + κ n , one can construct a unitary matrix H n ( A n T | B n T ) T R ( n m ) × ( n m ) , with A n R κ × ( n m ) and B n R ( n m κ ) × ( n m ) , such that the rows of A n span the space spanned by the columns of ( m + 1 [ D n ] n ) 1 [ Q n ] n m + 1 [ Φ ] n 1 and such that B n ( m + 1 [ D n ] n ) 1 [ Q n ] n m + 1 [ Φ ] n 1 = 0 . Therefore, from (A49),
h ( w m + 1 n | w 1 m ) log det ( m + 1 [ D n ] n ) + h ( H n x n m + 1 + H n ( m + 1 [ D n ] n ) 1 z ¯ n m + 1 ) = log det ( m + 1 [ D n ] n ) + h ( B n x n m + 1 ) + h ( A n x n m + 1 + A n ( m + 1 [ D n ] n ) 1 z ¯ n m + 1 | B n x n m + 1 ) log det ( m + 1 [ D n ] n ) + h ( B n x n m + 1 ) + h ( A n x n m + 1 + A n ( m + 1 [ D n ] n ) 1 z ¯ n m + 1 ) log det ( m + 1 [ D n ] n ) + h ( B n x n m + 1 ) + 1 2 log ( 2 π e ) κ det K A n x n m + 1 + K A n ( m + 1 [ D n ] n ) 1 z ¯ n m + 1 log det ( m + 1 [ D n ] n ) + h ( B n x n m + 1 ) + 1 2 log ( 2 π e ) κ λ max ( K x n m + 1 ) + λ max ( K z ¯ n m + 1 ) λ min ( m + 1 [ D n ] n ) 2 κ ,
where K A n x n m + 1 and K A n ( m + 1 [ D n ] n ) 1 z ¯ n m + 1 are the covariance matrices of A n x n m + 1 and A n ( m + 1 [ D n ] n ) 1 z ¯ n m + 1 , respectively, and where the last inequality follows from [31]. The fact that λ max ( K x n m + 1 ) and λ max ( K z ¯ n m + 1 ) are upper bounded for all n, and the fact that λ min ( m + 1 [ D n ] n ) either grows with n or decreases sub-exponentially (from Lemma 7), imply that
lim n 1 n h ( w m + 1 n | w 1 m ) lim n 1 n log det ( m + 1 [ D n ] n ) + lim n 1 n h ( B n x n m + 1 ) .
But the fact that det D n = 1 implies that log det ( m + 1 [ D n ] n ) = i = 1 m log d n , i . On the other hand, recalling that x n m + 1 = [ R n ] n m + 1 u n 1 and noting that B n [ R n ] n m + 1 has orthonormal rows, reveals that lim n 1 n h ( B n x n m + 1 ) = h ¯ ( u 1 ) (from the assumption that u 1 is entropy balanced). Therefore,
lim n 1 n h ( w m + 1 n | w 1 m ) h ¯ ( u 1 ) lim n 1 n i = 1 m log d n , i ,
which coincides with the lower bound found in (A45), completing the proof.   □
Proof of Lemma 7.
The transfer function G ( z ) can be factored as G ( z ) = G ˜ ( z ) F ( z ) , where G ˜ ( z ) is stable and minimum phase and F ( z ) is stable with all the non-minimum phase zeros of G ( z ) , both being biproper rational functions. From Lemma A2 (in Appendix C), in the limit as n , the eigenvalues of G ˜ n T G ˜ n are lower and upper bounded by λ min ( G ˜ T G ˜ ) and λ max ( G ˜ T G ˜ ) , respectively, where 0 < λ min ( G ˜ T G ˜ ) λ max ( G ˜ T G ˜ ) < . Let G ˜ n = Q ˜ n T D ˜ n R ˜ n and F n = Q n T D n R n be the SVDs of G ˜ n and F n , respectively, with d ˜ n , 1 d ˜ n , 2 d ˜ n , n and d n , 1 d n , 2 d n , n being the diagonal entries of the diagonal matrices D ˜ n , D n , respectively. Then,
G n T G n = F n T G ˜ n T G ˜ n F n = ( D ˜ n R ˜ n Q n T D n R n ) T D ˜ n R ˜ n Q n T D n R n .
Denoting the i-th row of R n by r n , i T be, we have that, from the Courant-Fischer theorem [30] that
λ i ( G n T G n ) max v span { r n , k } k = 1 i : v = 1 G v 2
= max v span { r n , k } k = 1 i : v = 1 D ˜ n R ˜ n T Q n T D n R n v 2
d n , i 2 d ˜ n , n 2 .
Likewise,
λ i ( G n T G n ) min v span { r n , k } k = i n : v = 1 G v
= min v span { r n , k } k = i n : v = 1 D ˜ n R ˜ n T Q n T D n R n v 2
d n , i 2 d ˜ n , 1 2 .
Thus,
lim n λ i ( G n T G n ) d n , i 2 λ min ( G ˜ T G ˜ ) , λ max ( G ˜ T G ˜ ) .
The result now follows directly from Lemma A3 (in Appendix C).   □
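Lemma 7 can be visualized numerically: for an NMP FIR filter, the smallest singular value of its Toeplitz matrix decays essentially as |ρ|^{−n}, while the remaining singular values stay bounded away from zero; for a minimum-phase filter, all singular values remain bounded away from zero. The sketch below is our own, with first-order illustrative filters:

```python
import numpy as np
from scipy.linalg import toeplitz

def toep(g, n):
    col = np.zeros(n); col[:len(g)] = g
    return toeplitz(col, np.zeros(n))

g_nmp = np.array([1.0, -2.0])      # F(z) = 1 - 2 z^{-1}, NMP zero at rho = 2
g_mp  = np.array([1.0, -0.5])      # minimum-phase counterpart

for n in (10, 20, 30, 40):
    s_nmp = np.linalg.svd(toep(g_nmp, n), compute_uv=False)
    s_mp  = np.linalg.svd(toep(g_mp, n), compute_uv=False)
    # The smallest singular value of the NMP matrix, rescaled by rho^n, stays bounded
    # (i.e., it decays essentially as rho^{-n}); the second smallest, and the MP matrix's
    # smallest, remain bounded away from zero.
    print(n, s_nmp[-1] * 2.0 ** n, s_nmp[-2], s_mp[-1])
```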
Proof of Theorem 6.
To begin with, the entropy power inequality [1] gives h(y_1^n) = h(G_n u_1^n + z_1^n) ≥ h(G_n u_1^n) = h(u_1^n), proving the lower bound in (70).
To obtain the other bounds on the entropy gain of G n , we will use Lemma 6. Recalling the structure of z 1 specified in Assumption 2, the random vector whose differential entropy appears on the RHS of (64) takes the form
[ D n ] m 1 R n u n 1 + [ Q n ] m 1 z n 1 = [ D n ] m 1 R n u n 1 + [ Q n ] m 1 [ Φ ] n 1 s κ 1 .
Notice that, for every n κ , the columns of the matrix [ Q n ] m 1 [ Φ ] n 1 R m × κ span a space of dimension κ n { 0 , 1 , , κ ¯ } , with κ ¯ min { m , κ } . If κ n = 0 (i.e., [ Q n ] m 1 [ Φ ] n 1 = 0 ), then
h ( [ D n ] m 1 R n u n 1 + [ Q n ] m 1 z n 1 ) = h ( [ D n ] m 1 R n u n 1 ) .
If that is that case for every n κ , the lower bound in (70) is reached by inserting the latter expression into (64) and invoking Lemma 7.
Let [ Q n ] m 1 [ Φ ] n 1 = A n T T n B n be an SVD for [ Q n ] m 1 [ Φ ] n 1 , where A n R κ n × m has orthonormal rows,
T n = diag { t 1 ( n ) , t 2 ( n ) , , t κ n ( n ) } ,
where 0 < t 1 ( n ) t 2 ( n ) t κ n ( n ) 1 are the singular values of [ Q n ] m 1 [ Φ ] n 1 , and B n R κ n × κ has orthonormal rows. Construct a unitary matrix H n R m × m such that
H n A n A ¯ n ,
where A n R κ n × m is as before, and A ¯ n R ( m κ n ) × m has orthonormal rows, and its row span is the orthogonal complement of that of A n . Thus,
H n [ Q n ] m 1 [ Φ ] n 1 = A n [ Q n ] m 1 [ Φ ] n 1 0 ( m κ n ) × κ , n N .
From (A63) and (A60), we obtain
h [ D n ] m 1 R n u n 1 + [ Q n ] m 1 z n 1 = h [ D n ] m 1 R n u n 1 + [ Q n ] m 1 [ Φ ] n 1 s κ 1
= h H n ( [ D n ] m 1 R n u n 1 + [ Q n ] m 1 [ Φ ] n 1 s κ 1 )
= h A n [ D n ] m 1 R n u n 1 + A n [ Q n ] m 1 [ Φ ] n 1 s κ 1 | A ¯ n [ D n ] m 1 R n u n 1 + ( 1 1 { m } ( κ n ) ) h ( A ¯ n [ D n ] m 1 R n u n 1 ) ,
where the indicator function 1 { m } ( κ n ) = 1 if κ n = m and 0 otherwise. The first differential entropy on the RHS of (A66) can be lower bounded as
h A n [ D n ] m 1 R n u n 1 + A n [ Q n ] m 1 [ Φ ] n 1 s κ 1 | A ¯ n [ D n ] m 1 R n u n 1 ( a ) h A n [ Q n ] m 1 [ Φ ] n 1 s κ 1 | A ¯ n [ D n ] m 1 R n u n 1 = ( b ) h A n [ Q n ] m 1 [ Φ ] n 1 s κ 1 = h T n B n s κ 1 ( c ) κ n log ( t 1 ( n ) ) + h ( s κ 1 ) κ κ n 2 log ( λ max ( K s κ 1 ) ) ,
where ( a ) is from the entropy power inequality [1], ( b ) holds because s κ 1 u n 1 and ( c ) is from Proposition A1. An upper bound can be obtained as
h A n [ D n ] m 1 R n u n 1 + A n [ Q n ] m 1 [ Φ ] n 1 s κ 1 | A ¯ n [ D n ] m 1 R n u n 1 ( a ) h A n [ D n ] m 1 R n u n 1 + T n B n s κ 1 ( b ) 1 2 log ( 2 π e ) m det ( K A n [ D n ] m 1 R n u n 1 + K T n B n s κ 1 ) ( c ) κ n 2 log ( 2 π e ) ( d m ) 2 λ max ( K u n 1 ) + ( t κ n ( n ) ) 2 λ max ( K s κ 1 ) ,
where ( a ) holds because conditioning does not increase entropy, ( b ) is because a Gaussian distribution maximizes the differential entropy for a given covariance matrix, and ( c ) is due to Reference [31]. Notice that u 1 satisfies the requirements of Proposition A2, implying that lim n n 1 λ max ( K u n 1 ) = 0 . Thus, since t κ n ( n ) 1 , it follows from (A67), (A68), and (A66) that
lim n 1 1 { m } ( κ n ) n h ( A ¯ n [ D n ] m 1 R n u n 1 ) + lim n κ n n log ( t 1 ( n ) ) lim n 1 n h [ D n ] m 1 R n u n 1 + [ Q n ] m 1 z n 1 lim n ( 1 1 { m } ( κ n ) ) h ( A ¯ n [ D n ] m 1 R n u n 1 ) .
For the last differential entropy on the RHS of (A66), notice that [ D n ] m 1 R n = 1 [ D n ] m [ R n ] m 1 . Consider the SVD A ¯ n 1 [ D n ] m [ R n ] m 1 = V n T Σ n W n , with V n R ( m κ n ) × ( m κ n ) being unitary, Σ n R ( m κ n ) × ( m κ n ) being diagonal, and W n R ( m κ n ) × n having orthonormal rows. We can then conclude that
h ( A ¯ n [ D n ] m 1 R n u n 1 ) = h ( Σ n W n u n 1 ) = log det ( Σ n ) + h ( W n u n 1 ) .
Now, the fact that
A ¯ n 1 [ D n ] m 1 [ D n ] m A ¯ n T = A ¯ n 1 [ D n ] m [ R n ] m 1 ( A ¯ n 1 [ D n ] m [ R n ] m 1 ) T = V T Σ W W T Σ T V = V T Σ Σ T V
reveals that
log det Σ = 1 2 log | det ( A ¯ n ( 1 [ D n ] m ) 2 A ¯ n T ) | .
Recalling that A ¯ n = [ H n ] m κ n + 1 and that H n R m × m is unitary, it is easy to show (by using the Courant-Fischer theorem [30]) that
i = 1 m κ n log d n , i ( a ) 1 2 log det ( A ¯ n ( 1 [ D n ] m ) 2 A ¯ n T ) ( b ) i = κ n + 1 m log d n , i ,
with equality in ( a ) and ( b ) if and only if A ¯ n = [ I m κ n | 0 ] and A ¯ n = [ 0 | I m κ n ] , respectively. Substituting this into (A71) and then the latter into (A70), we arrive to
h ( [ W n ] m 1 u n 1 ) + i = 1 m κ n log d n , i h ( A ¯ n [ D n ] m 1 R n u n 1 ) h ( [ W n ] m 1 u n 1 ) + i = κ n + 1 m log d n , i .
Substituting the upper bound from this equation and from (A68) into (A66) and the latter in (64), exploiting the fact that u 1 is entropy balanced (which ensures that u 1 satisfies Condition i) in Definition 2) and invoking Lemma 7 yields the upper bound in (70).
Doing the same substitutions but with the lower bounds in (A73) and (A67), and using the assumption that lim n 1 n log ( t 1 ( n ) ) = 0 , gives the lower bound of (71). This completes the proof.   □
Proof of Lemma 8.
We will consider first the case κ = m and show that lim n σ min ( 1 [ Q n ] m ) > 0 , where now Q n T is the left unitary matrix in the SVD F n = Q n T D n R n . We will prove that this is the case by using a contradiction argument. Thus, suppose the contrary, i.e., that
lim n σ min ( 1 [ Q n ] m ) = 0 .
Then, there exists a sequence of unit-norm vectors { v n } n = 1 , with v n R m for all n, such that
lim n v n T 1 [ Q n ] m = 0 .
For each n N , define the n-length unit-norm image vectors t n T v n T [ Q n ] m 1 . Then,
F n T t n = R n T D n Q n t n = D n Q n t n = 1 [ D n ] m v n ,
where the last equality follows from the fact that, by construction, t n T is in the span of the first m rows of Q n , together with the fact that Q n is unitary (which implies that [ Q n ] n m + 1 t n = 0 ). Since the top m entries in D n decay exponentially as n increases, we have that
F n T t n O ( ζ n | ρ M | n ) ,
where ζ n is a finite-order polynomial of n (from Lemma A3, in Appendix C).
Now, notice that [ F n ] n m + 1 ( [ F n ] n m + 1 ) T is a Toeplitz matrix with the convolution of f and f (the impulse response of F and its time-reversed version, respectively) on its first row and column. It then follows from Reference [18], Lemma 4.1, that
lim n λ min ( [ F n ] n m + 1 ( [ F n ] n m + 1 ) T ) = min ω : ω [ π , π ] | F ( e j ω ) | 2 > 0
(the inequality is strict because all the zeros of F ( z ) are strictly outside the unit disk). Then, we conclude that
lim n σ min ( [ F n ] n m + 1 ) > 0 .
Recall that ‖t_n‖ = 1; thus, from (A75), lim_{n→∞} ‖[t_n]_1^m‖ = 0 and lim_{n→∞} ‖[t_n]_{m+1}^n‖ = 1, which means that lim_{n→∞} ‖F_n^T t_n‖ = lim_{n→∞} ‖([F_n]_{m+1}^n)^T [t_n]_{m+1}^n‖ ≥ lim_{n→∞} σ_min([F_n]_{m+1}^n) > 0 by (A79), which contradicts (A77). Therefore,
lim n σ min ( 1 [ Q n ] m ) > 0 .
Now, consider an arbitrary κ 1 . Since
σ min [ Q n ] m 1 1 κ σ min 1 [ Q n ] m ,
it follows from (A80) that
lim n σ min [ Q n ] m 1 [ Φ ] n 1 = lim n σ min [ Q n ] m 1 1 κ > 0 ;
thus, lim n κ n = κ ¯ . This completes the proof.   □
Proof of Theorem 9.
Denote the Blaschke product [11] of A ( z ) as
B(z) ≜ ∏_{i=1}^{m} (z − p_i) / ( ∏_{i=1}^{m} p_i^* (z − 1/p_i^*) ),
which clearly satisfies
| B ( e j ω ) | = 1 , ω [ π , π ] ,
b_0 ≜ lim_{|z|→∞} B(z) = 1 / ∏_{i=1}^{m} p_i^*,
where b 0 is the first sample in the impulse response of B ( z ) . Notice that (A84) implies that lim n 1 n E [ B n u n 1 2 ] = lim n 1 n E [ u n 1 2 ] for every sequence of random variables u 1 with uniformly bounded variance. Since B ( z ) has only stable poles and its zeros coincide exactly with the poles of A ( z ) , it follows that B ( z ) A ( z ) is an MP stable transfer function. Thus, the asymptotically stationary process x ˜ 1 defined in (139) can be constructed as
x ˜ n 1 B n x n 1 ,
where B n is a Toeplitz lower triangular matrix with its main diagonal entries equal to b 0 . Since w 1 is entropy balanced, so is x ˜ 1 , thanks to Lemma 4.
The fact that B ( z ) is biproper with b 0 as in (A85) implies that, for any u n 1 with finite differential entropy,
h(B_n u_1^n) = h(u_1^n) − n ∑_{i=1}^{m} log|p_i| = h(u_1^n) − nG,  where G ≜ ∑_{i=1}^{m} log|p_i|,   (A87)
which will be utilized next.
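The all-pass property (A84) and the determinant identity behind (A87) are easy to confirm numerically. The following minimal sketch is our own, for a single real pole p = 2, using a truncated Toeplitz matrix B_n built from the impulse response of B(z):

```python
import numpy as np
from scipy.signal import freqz, lfilter
from scipy.linalg import toeplitz

p = 2.0                                   # single pole of A(z), i.e., single "zero" of the Blaschke factor
num, den = [1.0, -p], [p, -1.0]           # B(z) = (1 - p z^{-1}) / (p - z^{-1}) for a real pole p

# (A84): B is all-pass
w, Bw = freqz(num, den, worN=4096)
print(np.max(np.abs(np.abs(Bw) - 1.0)))   # ~1e-15: |B(e^{jw})| = 1 for all w

# (A85) and (A87): b_0 = 1/p, so B_n is lower-triangular Toeplitz with diagonal 1/p and
# log|det B_n| = -n log p, i.e., h(B_n u_1^n) = h(u_1^n) - nG with G = log p in this example.
n = 50
imp = np.zeros(n); imp[0] = 1.0
b_imp = lfilter(num, den, imp)            # first n samples of the impulse response of B(z)
Bn = toeplitz(b_imp, np.zeros(n))
print(np.linalg.slogdet(Bn)[1], -n * np.log(p))   # both equal -n log 2
```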
For any given n m , suppose that C ( z ) is chosen and x n 1 and u n 1 are distributed so as to minimize I ( x n 1 ; C n x n 1 + u n 1 ) subject to the constraint E [ y n 1 x n 1 2 ] = E [ ( C n I ) x n 1 2 ] + E [ u n 1 ] 2 D (i.e., x n 1 , u n 1 is a realization of R x , n ( D ) ), yielding the reconstruction
y n 1 = C n x n 1 + u n 1 .
Since we are considering mean-squared error distortion, it follows that, for rate-distortion optimality, u n 1 must be jointly Gaussian with x n 1 . In addition, there is no loss of rate-distortion optimality if u 1 is entropy balanced (otherwise, it would have a lower entropy rate than its entropy-balanced counterpart, which differs from the former only on a finite number of samples and has the same asymptotic MSE). From these vectors, define
u ˜ n 1 B n u n 1 ,
y ˜ n 1 B n y n 1 = B n C n ( B n ) 1 x ˜ n 1 + u ˜ n 1 ,
y ¯ n 1 y ˜ n 1 + d n 1 = B n C n ( B n ) 1 x ˜ n 1 + u ˜ n 1 + d n 1 ,
where d n 1 is a zero-mean Gaussian vector independent of ( u ˜ n 1 , x ˜ n 1 ) with finite differential entropy and finite variance such that d k = 0 , k > m . Then, we have that (the change of variables and the steps in this chain of equations is represented by the block diagrams shown in Figure 7)
n R x , n ( D ) = I ( x n 1 ; y n 1 ) = ( a ) I ( B n x n 1 ; B n y n 1 ) = I ( x ˜ n 1 ; y ˜ n 1 )
= h ( y ˜ n 1 ) h ( y ˜ n 1 | x ˜ n 1 )
= ( b ) h ( y ˜ n 1 ) h ( u ˜ n 1 | x ˜ n 1 )
= ( c ) h ( y ˜ n 1 ) h ( u ˜ n 1 )
= ( d ) h ( y ˜ n 1 ) h ( u ˜ n 1 + d n 1 ) + [ h ( u n 1 ) h ( u ˜ n 1 + d n 1 ) ] n G
= ( e ) h ( y ˜ n 1 ) h ( u ˜ n 1 + d n 1 | x ˜ n 1 ) + n G [ h ( u n 1 ) h ( u ˜ n 1 + d n 1 ) ]
= ( f ) h ( y ˜ n 1 ) h ( y ¯ n 1 | x ¯ n 1 ) + n G [ h ( u n 1 ) h ( u ˜ n 1 + d n 1 ) ]
= h ( y ˜ n 1 ) h ( y ¯ n 1 ) + I ( x ˜ n 1 ; y ¯ n 1 ) + n G [ h ( u n 1 ) h ( u ˜ n 1 + d n 1 ) ]
= I ( x ˜ n 1 ; y ¯ n 1 ) + n G [ h ( u n 1 ) h ( u ˜ n 1 + d n 1 ) ] + [ h ( y ˜ n 1 ) h ( y ˜ n 1 + d n 1 ) ] ,
where ( a ) follows from B n being invertible, ( b ) is due to the fact that y ˜ n 1 = P n x ˜ n 1 + u ˜ n 1 , ( c ) holds because u n 1 x n 1 . The equality ( d ) stems from h ( u ˜ n 1 ) = h ( u n 1 ) n G (see (A87)). Equality holds in ( e ) because x ˜ n 1 ( u ˜ n 1 , d n 1 ) and in ( f ) because of (A91). But from Theorem 4 and since u 1 is entropy balanced, lim n 1 n ( h ( u ˜ n 1 + d m 1 ) h ( u n 1 ) ) = 0 . From Lemma 3 and because u 1 is entropy balanced, so is y ˜ 1 . This guarantees, from Lemma 5, that lim n n 1 [ h ( y ˜ n 1 ) h ( y ˜ n 1 + d n 1 ) ] = 0 . Thus, R x , n ( D ) = lim n 1 n ( x ˜ n 1 ; y ¯ n 1 ) + G R x ˜ , n ( D ) + G .
At the same time, the distortion for the source x ˜ n 1 when reconstructed as y ¯ n 1 is
lim n 1 n E y ¯ n 1 x ˜ n 1 2 = lim n 1 n E y ˜ x ˜ n 1 2 + E d n 1 2 = ( a ) lim n 1 n E y ˜ x ˜ n 1 2
= lim n 1 n E B n ( y n 1 x n 1 ) 2 = ( b ) lim n 1 n E y n 1 x n 1 2 ,
where (a) holds because ‖d_1^n‖ = ‖d_1^m‖ is bounded, and (b) is due to the fact that, in the limit, B(z) is a unitary operator. Recalling the definitions of R_x̃(D) and R_{x̃,n}(D), we conclude that lim_{n→∞} (1/n) I(x̃_1^n; ȳ_1^n) ≥ R_x̃(D); therefore,
R_x(D) ≥ R_x̃(D) + ∑_{i=1}^{m} log|p_i|.
In order to complete the proof, it suffices to show that R_x(D) ≤ R_x̃(D) + ∑_{i=1}^{m} log|p_i|. For this purpose, consider now the (asymptotically) stationary source x̃_1^n, and suppose that ŷ_1^n = x̃_1^n + u_1^n realizes R_{x̃,n}(D). Again, x̃_1^n and u_1^n will be jointly Gaussian, satisfying ŷ_1^n ⊥ u_1^n (the latter condition is required for minimum-MSE optimality). From this, one can propose an alternative realization in which the error sequence is ũ ≜ B_n u_1^n, yielding an output ỹ_1^n = x̃_1^n + ũ_1^n with ỹ_1^n ⊥ ũ_1^n. Then,
n R x ˜ , n ( D ) = I ( x ˜ n 1 ; y ^ n 1 ) = h ( x ˜ n 1 ) h ( x ˜ n 1 | y ^ n 1 )
= ( a ) h ( x ˜ n 1 ) h ( u n 1 )
= ( b ) h ( x ˜ n 1 ) h ( u ˜ n 1 ) n G
= ( c ) h ( x ˜ n 1 ) h ( u ˜ n 1 | y ˜ n 1 ) n G
= ( d ) h ( x ˜ n 1 ) h ( x ˜ n 1 | y ˜ n 1 ) n G
= ( a ) I ( x ˜ n 1 ; y ˜ n 1 ) n G
= ( a ) I ( B n x n 1 ; B n y n 1 ) n G
= ( e ) I ( x n 1 ; y n 1 ) n G ,
where (a) follows by recalling that ŷ_1^n = x̃_1^n + u_1^n and because ŷ_1^n ⊥ u_1^n, (b) stems from (A87), (c) is a consequence of ỹ_1^n ⊥ ũ_1^n, and (d) follows from the fact that ỹ_1^n = x̃_1^n + ũ_1^n. Finally, (e) holds because B_n is invertible for all n. Since, asymptotically as n → ∞, the distortion yielded by y_1^n for the non-stationary source x_1^n is the same as that obtained when x̃_1^n is reconstructed as ŷ_1^n (recall (A84)), we conclude that R_x(D) ≤ R_x̃(D) + ∑_{i=1}^{M} log|p_i|, completing the proof.   □

Appendix C. Technical Lemmas and Propositions

Proposition A1.
Let the random vector s κ 1 have finite differential entropy, and suppose its covariance matrix K s κ 1 satisfies λ max ( K s κ 1 ) < . Then, for any unitary matrix A R κ × κ and i = 1 , 2 , , κ
h ( s κ 1 ) κ i 2 log ( 2 π e λ max ( K s κ 1 ) ) h ( [ A ] i 1 s κ 1 ) i 2 log ( 2 π e λ max ( K s κ 1 ) ) .
Proof. 
Define $r_1^\kappa \triangleq A s_1^\kappa$. Since $A$ is unitary, it follows that $h(r_1^\kappa) = h(s_1^\kappa)$ and that $K_{r_1^\kappa}$ and $K_{s_1^\kappa}$ have the same eigenvalues. Therefore,
$$h\big([A]_1^i\, s_1^\kappa\big) = h(r_1^i) \overset{(a)}{\leq} \frac{1}{2}\log\big((2\pi e)^i \det(K_{r_1^i})\big) \overset{(b)}{\leq} \frac{i}{2}\log\big(2\pi e\,\lambda_{\max}(K_{r_1^i})\big) \overset{(c)}{\leq} \frac{i}{2}\log\big(2\pi e\,\lambda_{\max}(K_{r_1^\kappa})\big) = \frac{i}{2}\log\big(2\pi e\,\lambda_{\max}(K_{s_1^\kappa})\big), \quad \text{(A113)}$$
where (a) holds because a Gaussian distribution yields the largest differential entropy for a given covariance matrix, (b) is from the fact that $\det(K_{r_1^i}) = \prod_{k=1}^{i}\lambda_k(K_{r_1^i})$, and (c) is due to the Cauchy interlacing theorem [30]. This proves the upper bound in (A112). For the lower bound, we have
$$h(r_1^i) \overset{(a)}{\geq} h(r_1^\kappa) - h(r_{i+1}^\kappa) \overset{(b)}{\geq} h(r_1^\kappa) - \frac{\kappa - i}{2}\log\big(2\pi e\,\lambda_{\max}(K_{s_1^\kappa})\big),$$
where (a) stems from the fact that $h(a, b) \leq h(a) + h(b)$ and (b) follows from (A113) applied to the last $\kappa - i$ entries of $r_1^\kappa$. Since $h(r_1^\kappa) = h(s_1^\kappa)$, this is the lower bound in (A112), which completes the proof.   □
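As a quick sanity check of Proposition A1, the following sketch (illustrative only; it assumes a Gaussian $s_1^\kappa$ with a randomly generated covariance and a random orthogonal $A$, so both differential entropies admit closed forms) evaluates the two bounds in (A112) numerically.

```python
import numpy as np

rng = np.random.default_rng(1)
kappa, i = 8, 3

def gauss_h(cov):
    # Differential entropy (nats) of a zero-mean Gaussian with covariance `cov`.
    k = cov.shape[0]
    return 0.5 * (k * np.log(2 * np.pi * np.e) + np.linalg.slogdet(cov)[1])

# Gaussian s_1^kappa with an arbitrary positive-definite covariance.
M = rng.standard_normal((kappa, kappa))
K_s = M @ M.T + 0.1 * np.eye(kappa)
lam_max = np.linalg.eigvalsh(K_s)[-1]

# Random orthogonal (real unitary) A obtained from a QR decomposition.
A, _ = np.linalg.qr(rng.standard_normal((kappa, kappa)))
A_i = A[:i, :]                              # [A]_1^i: the first i rows of A

h_s = gauss_h(K_s)
h_proj = gauss_h(A_i @ K_s @ A_i.T)         # h([A]_1^i s_1^kappa)

upper = (i / 2) * np.log(2 * np.pi * np.e * lam_max)
lower = h_s - ((kappa - i) / 2) * np.log(2 * np.pi * np.e * lam_max)
print(lower <= h_proj <= upper)             # True: both bounds of (A112) hold
```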
Proposition A2.
Let $u_1^\infty$ be a random process such that the variance of $u(n)$, $\sigma^2_{u(n)}$, is finite for every finite $n$, and
$$\lim_{n\to\infty}\frac{1}{n}\log\big(\sigma^2_{u(n)}\big) = 0.$$
Then, $\lim_{n\to\infty} n^{-1}\log\big(\lambda_{\max}(K_{u_1^n})\big) = 0$.          ▲
Proof. 
The assumptions on $u_1^\infty$ imply that, for every $\epsilon > 0$, there exists a finite $N_\epsilon$ such that, for every $n \geq N_\epsilon$, $\sigma^2_{u(n)} < e^{n\epsilon}$ and $S(N_\epsilon) \triangleq \max\{\sigma^2_{u(1)}, \sigma^2_{u(2)}, \ldots, \sigma^2_{u(N_\epsilon)}\} < \infty$. Then,
$$\frac{1}{n}\ln\big(\lambda_{\max}(K_{u_1^n})\big) \overset{(a)}{\leq} \frac{1}{n}\ln\Big(\sum_{k=1}^{n}\sigma^2_{u(k)}\Big) < \frac{1}{n}\ln\big(N_\epsilon S(N_\epsilon) + (n - N_\epsilon)e^{n\epsilon}\big) \overset{(b)}{<} \frac{1}{n}\ln\big((n - N_\epsilon)e^{n\epsilon}\big) + \frac{N_\epsilon S(N_\epsilon)}{n(n - N_\epsilon)e^{n\epsilon}},$$
where (a) holds because $\sum_{k=1}^{n}\lambda_k(K_{u_1^n}) = \mathrm{tr}\{K_{u_1^n}\} = \sum_{k=1}^{n}\sigma^2_{u(k)}$, while (b) stems from the fact that, for every $x, y > 0$, $\ln(x + y) < \ln(x) + y/x$. Thus, for every $\epsilon > 0$, $\lim_{n\to\infty} n^{-1}\log\big(\lambda_{\max}(K_{u_1^n})\big) \leq \epsilon$, which means that $\lim_{n\to\infty} n^{-1}\log\big(\lambda_{\max}(K_{u_1^n})\big) = 0$, completing the proof.   □
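To illustrate Proposition A2, the following sketch uses an assumed toy process (not taken from the paper): AR(1)-type correlation scaled by linearly growing standard deviations, so that the sub-exponential growth condition on $\sigma^2_{u(n)}$ holds while the covariance matrix is far from diagonal.

```python
import numpy as np
from scipy.linalg import toeplitz

# Hypothetical non-stationary process: correlation 0.9^|i-j| scaled by
# standard deviations sigma_u(k) = k, so (1/n) log(sigma_u(n)^2) -> 0.
for n in [50, 200, 800, 1600]:
    R = toeplitz(0.9 ** np.arange(n))        # correlation matrix (Toeplitz)
    s = np.arange(1, n + 1, dtype=float)     # per-sample standard deviations
    K = np.outer(s, s) * R                   # covariance matrix of u_1^n
    lam_max = np.linalg.eigvalsh(K)[-1]
    print(n, np.log(lam_max) / n)            # decreases toward 0, as claimed
```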
Proposition A3.
Let $v_1^\infty$ be an entropy-balanced random process. Then, for each $\nu \in \mathbb{N}$ and for every sequence of matrices $\{\Psi_n\}_{n=\nu}^{\infty}$, with $\Psi_n \in \mathbb{R}^{\nu\times n}$ having orthonormal rows,
$$\lim_{n\to\infty}\frac{1}{n}\, h(\Psi_n v_1^n) = 0.$$
Proof of Proposition A3.
We will first show that
$$\lim_{n\to\infty}\frac{1}{n}\, h(\Psi_n v_1^n) \geq 0. \quad \text{(A117)}$$
To see this, notice that, for every $\Psi_n \in \mathbb{R}^{\nu\times n}$ with orthonormal rows, there exists a matrix $\Phi_n \in \mathbb{R}^{(n-\nu)\times n}$ with orthonormal rows which are also orthogonal to those of $\Psi_n$. This means that the matrix $[\Psi_n^T \,|\, \Phi_n^T]^T \in \mathbb{R}^{n\times n}$ is unitary; thus,
$$h(v_1^n) = h\big([\Psi_n^T \,|\, \Phi_n^T]^T v_1^n\big) \overset{(a)}{=} h(\Phi_n v_1^n) + h(\Psi_n v_1^n \mid \Phi_n v_1^n) \overset{(b)}{\leq} h(\Phi_n v_1^n) + h(\Psi_n v_1^n),$$
where (a) holds due to the chain rule of differential entropy and (b) follows because conditioning does not increase differential entropy. Therefore, $h(\Psi_n v_1^n) \geq h(v_1^n) - h(\Phi_n v_1^n)$. Dividing this by $n$, taking the limit as $n\to\infty$ and recalling that $v_1^\infty$ satisfies (17) yields (A117).
We will now prove that $\lim_{n\to\infty}\frac{1}{n}\, h(\Psi_n v_1^n) \leq 0$. For this purpose, let $\tilde{v}_1^\infty$ be a jointly Gaussian random process with the same second-order statistics as $v_1^\infty$. Then,
$$h(\Psi_n v_1^n) \leq h(\Psi_n \tilde{v}_1^n) = \frac{1}{2}\log\big((2\pi e)^{\nu}\det(\Psi_n K_{\tilde{v}_1^n}\Psi_n^T)\big) \leq \frac{1}{2}\log\big((2\pi e)^{\nu}\,\lambda_{\max}(K_{\tilde{v}_1^n})^{\nu}\big),$$
with the last inequality due to the fact that $\Psi_n$ has orthonormal rows. But $v_1^\infty$ meets the requirements of Proposition A2; thus, $\lim_{n\to\infty}\frac{1}{n}\, h(\Psi_n v_1^n) \leq \lim_{n\to\infty}\frac{\nu}{2n}\log\big(2\pi e\,\lambda_{\max}(K_{\tilde{v}_1^n})\big) = 0$. The proof is completed by combining this result with (A117).   □
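The upper-bound part of this argument is easy to reproduce numerically. The sketch below (illustrative only; it assumes an i.i.d. unit-variance Gaussian $v_1^\infty$, a simple entropy-balanced process, and builds $\Psi_n$ from a QR factorization) computes $h(\Psi_n v_1^n)/n$ in closed form and shows it vanishing.

```python
import numpy as np

rng = np.random.default_rng(2)
nu = 3                                       # fixed number of orthonormal rows

def gauss_h(cov):
    # Differential entropy (nats) of a zero-mean Gaussian with covariance `cov`.
    k = cov.shape[0]
    return 0.5 * (k * np.log(2 * np.pi * np.e) + np.linalg.slogdet(cov)[1])

for n in [10, 100, 1000]:
    Q, _ = np.linalg.qr(rng.standard_normal((n, nu)))
    Psi = Q.T                                # nu x n matrix with orthonormal rows
    K_v = np.eye(n)                          # i.i.d. unit-variance Gaussian v_1^n
    h_proj = gauss_h(Psi @ K_v @ Psi.T)      # stays constant at (nu/2) log(2*pi*e)
    print(n, h_proj / n)                     # tends to 0, as Proposition A3 states
```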
Lemma A1.
Let $u_1^\infty$ be a random process with independent elements, where each element $u_i$ is uniformly distributed over a possibly different interval $[-\tfrac{a_i}{2}, \tfrac{a_i}{2}]$, such that $a_{\max} > a_i > a_{\min} > 0$ for all $i \in \mathbb{N}$, for some positive and bounded $a_{\min} < a_{\max}$. Then, $u_1^\infty$ is entropy balanced.
Proof. 
Without loss of generality, we can assume that $a_i \geq 1$ for all $i$ (otherwise, we could scale the input by $1/a_{\min}$, which would scale the output by the same proportion, increasing the input entropy by $n\log(1/a_{\min})$ and the output entropy by $(n-\nu)\log(1/a_{\min})$, without changing the result). The input vector $u_1^n$ is confined to an $n$-box $\mathbb{U}^n$ (the support of $u_1^n$) of volume $V_n(\mathbb{U}^n) = \prod_{i=1}^{n} a_i$ and has entropy $\log\big(\prod_{i=1}^{n} a_i\big)$. This support is an $n$-box which contains $\binom{n}{k}\, 2^{n-k}$ $k$-boxes of different $k$-volume. Each of these $k$-boxes is determined by fixing $n-k$ entries of $u_1^n$ at $\pm a_i/2$ and letting the remaining $k$ entries sweep freely over $[-\tfrac{a_i}{2}, \tfrac{a_i}{2}]$. Thus, the $k$-volume of each $k$-box is the product of the $k$ support sizes $a_i$ of the associated free-sweeping entries. But recalling that $a_i \geq 1$ for all $i$, the volume of each $k$-box can be upper bounded by $\prod_{i=1}^{n} a_i$. With this, the total volume of all the $k$-boxes contained in the original $n$-box can be upper bounded as
$$V_k(\mathbb{U}^n) \leq \binom{n}{k}\, 2^{n-k} \prod_{i=1}^{n} a_i. \quad \text{(A120)}$$
We now use this result to upper bound the differential entropy of $y_{\nu+1}^n$.
Let $y_1^n \triangleq [\Psi_n^T \,|\, \Phi_n^T]^T u_1^n$, where $[\Psi_n^T \,|\, \Phi_n^T]^T \in \mathbb{R}^{n\times n}$ is a unitary matrix and where $\Psi_n \in \mathbb{R}^{\nu\times n}$ and $\Phi_n \in \mathbb{R}^{(n-\nu)\times n}$ have orthonormal rows. From this definition, $y_{\nu+1}^n$ will be distributed over a finite region $\mathbb{Y}_{\nu+1}^n \subset \mathbb{R}^{n-\nu}$, corresponding to the projection of $\mathbb{U}^n$ onto the $(n-\nu)$-dimensional span of the rows of $\Phi_n$. Hence, $h(y_{\nu+1}^n)$ is upper bounded by the entropy of a vector uniformly distributed over the same support, i.e., by $\log V_{n-\nu}(\mathbb{Y}_{\nu+1}^n)$, where $V_{n-\nu}(\mathbb{Y}_{\nu+1}^n)$ is the $(n-\nu)$-dimensional volume of this support. In turn, $V_{n-\nu}(\mathbb{Y}_{\nu+1}^n)$ is upper bounded by the sum of the volumes of all $(n-\nu)$-dimensional boxes contained in the $n$-box in which $u_1^n$ is confined, which we already denoted by $V_{n-\nu}(\mathbb{U}^n)$ and which is upper bounded as in (A120). Therefore,
$$h(y_{\nu+1}^n) \leq \log V_{n-\nu}(\mathbb{Y}_{\nu+1}^n) \leq \log V_{n-\nu}(\mathbb{U}^n) \leq \log\!\left(\frac{n!}{(n-\nu)!\,\nu!}\, 2^{\nu}\prod_{i=1}^{n} a_i\right)$$
$$= \log\big(n^{\nu} 2^{\nu}\big) + \log\!\left(\frac{n!}{(n-\nu)!\, n^{\nu}\,\nu!}\right) + \log\prod_{i=1}^{n} a_i.$$
Recalling that $h(u_1^n) = \log\big(\prod_{i=1}^{n} a_i\big)$, noting that the first term above grows only logarithmically with $n$ and that the second term is non-positive, dividing by $n$ and taking the limit as $n\to\infty$ yields
$$\lim_{n\to\infty}\frac{1}{n}\big(h(y_{\nu+1}^n) - h(u_1^n)\big) \leq 0. \quad \text{(A123)}$$
On the other hand,
$$h(y_{\nu+1}^n) = h(y_1^n) - h(y_1^{\nu} \mid y_{\nu+1}^n) \overset{(a)}{=} h(u_1^n) - h(y_1^{\nu} \mid y_{\nu+1}^n) \geq h(u_1^n) - h(y_1^{\nu}), \quad \text{(A124)}$$
where (a) follows because $[\Psi_n^T \,|\, \Phi_n^T]^T$ is an orthogonal matrix. Letting $(y_G)_1^{\nu}$ denote the jointly Gaussian random vector with the same second-order moments as $y_1^{\nu}$, and recalling that the Gaussian distribution maximizes differential entropy for a given covariance, we obtain the upper bound
$$h(y_1^{\nu}) \leq h\big((y_G)_1^{\nu}\big) \overset{(a)}{=} \frac{1}{2}\log\Big((2\pi e)^{\nu}\det\big(\Psi_n\,\mathrm{diag}\{\sigma^2_{u_i}\}_{i=1}^{n}\,\Psi_n^T\big)\Big) \overset{(b)}{\leq} \frac{\nu}{2}\log\Big(2\pi e\,\max\{\sigma^2_{u_i}\}_{i=1}^{n}\Big), \quad \text{(A125)}$$
where (a) follows since the $\{u_i\}_{i=1}^{n}$ are independent, and (b) stems from the fact that $\Psi_n \in \mathbb{R}^{\nu\times n}$ has orthonormal rows. Since $\max\{\sigma^2_{u_i}\}_{i=1}^{n}$ is bounded for all $n$, substituting (A125) into (A124) gives $\lim_{n\to\infty}\frac{1}{n}\big(h(y_{\nu+1}^n) - h(u_1^n)\big) \geq 0$. Combining this with (A123) yields $\lim_{n\to\infty}\frac{1}{n}\big(h(y_{\nu+1}^n) - h(u_1^n)\big) = 0$, so $u_1^\infty$ satisfies Condition ii) in Definition 2. The proof is completed by noting that $u_1^\infty$ also satisfies Condition i) in Definition 2.   □
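The combinatorial overhead that separates $h(y_{\nu+1}^n)$ from $h(u_1^n)$ in the chain leading to (A123) is at most $\log\big(\binom{n}{\nu}2^{\nu}\big)$, which grows only logarithmically in $n$. A small sketch (illustrative arithmetic only) confirms that this per-sample overhead vanishes:

```python
import math

nu = 4
# Per-sample overhead between h(y_{nu+1}^n) and h(u_1^n) in the chain leading
# to (A123): at most (1/n) * log( C(n, nu) * 2^nu ), which vanishes with n.
for n in [10, 100, 1000, 10_000, 100_000]:
    overhead = (math.log(math.comb(n, nu)) + nu * math.log(2)) / n
    print(n, overhead)
```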
Lemma A2.
Let $A(z)$ be a causal, finite-order, stable and strictly minimum-phase rational transfer function with impulse response $a_0, a_1, \ldots$ such that $a_0 = 1$. Then, $\lim_{n\to\infty}\lambda_1(A_n A_n^T) > 0$ and $\lim_{n\to\infty}\lambda_n(A_n A_n^T) < \infty$.
Proof of Lemma A2.
The fact that $\lim_{n\to\infty}\lambda_n(A_n A_n^T)$ is upper bounded follows directly from the fact that $A(z)$ is a stable transfer function. On the other hand, $A_n A_n^T$ is positive definite, with $\lim_{n\to\infty}\lambda_1(A_n A_n^T) \geq 0$. Suppose that $\lim_{n\to\infty}\lambda_1(A_n A_n^T) = 0$. If this were true, then it would hold that $\lim_{n\to\infty}\lambda_n(A_n^{-1} A_n^{-T}) = \infty$. But $A_n^{-1}$ is the lower-triangular Toeplitz matrix associated with $A^{-1}(z)$, which is stable (since $A(z)$ is minimum phase), implying that $\lim_{n\to\infty}\lambda_n(A_n^{-1} A_n^{-T}) < \infty$, thus leading to a contradiction. This completes the proof.   □
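A quick numerical illustration of Lemma A2 (a sketch under the assumption of the simple minimum-phase example $A(z) = 1 + 0.5\,z^{-1}$, chosen here only for concreteness): the extreme eigenvalues of $A_n A_n^T$ remain bounded away from $0$ and from $\infty$ as $n$ grows.

```python
import numpy as np
from scipy.linalg import toeplitz

# A(z) = 1 + 0.5 z^{-1}: causal, stable, strictly minimum phase (zero at -0.5), a_0 = 1.
a = [1.0, 0.5]

for n in [10, 50, 200, 800]:
    col = np.zeros(n)
    col[:len(a)] = a
    A_n = toeplitz(col, np.zeros(n))         # lower-triangular Toeplitz matrix of A(z)
    eig = np.linalg.eigvalsh(A_n @ A_n.T)
    print(n, eig[0], eig[-1])                # approach 0.25 and 2.25, the extrema of |A(e^{jw})|^2
```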
For completeness and convenience, we restate the unnumbered lemma appearing in the proof of Theorem 1 of Reference [16], as follows:
Lemma A3.
Let the transfer function $G(z)$ satisfy Assumption 1 and suppose it has no poles. Then,
$$\lambda_{\ell}(G_n G_n^T) = \begin{cases} \alpha_{n,\ell}^2\, (\rho_{\ell})^{-2n}, & \text{if } \ell \leq m,\\ \alpha_{n,\ell}^2, & \text{otherwise}, \end{cases}$$
where the elements in the sequence $\{\alpha_{n,\ell}\}$ are positive and increase or decrease at most polynomially with $n$.
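Lemma A3 can also be visualized numerically. The sketch below (illustrative only; it assumes the FIR example $G(z) = 1 - 2\,z^{-1}$, which has no poles, $|g_0| = 1$, and a single NMP zero at $\rho_1 = 2$) shows the smallest eigenvalue of $G_n G_n^T$ decaying essentially as $|\rho_1|^{-2n} = 4^{-n}$, so the ratio between values two samples of $n$ apart approaches $4^{-2} = 1/16$.

```python
import numpy as np
from scipy.linalg import toeplitz

# G(z) = 1 - 2 z^{-1}: FIR (no poles), |g_0| = 1, one NMP zero at rho_1 = 2.
g = [1.0, -2.0]
prev = None
for n in range(4, 17, 2):
    col = np.zeros(n)
    col[:len(g)] = g
    G_n = toeplitz(col, np.zeros(n))         # lower-triangular Toeplitz matrix of G(z)
    lam_min = np.linalg.eigvalsh(G_n @ G_n.T)[0]
    ratio = None if prev is None else lam_min / prev
    print(n, lam_min, ratio)                 # ratio approaches 4^{-2} = 0.0625
    prev = lam_min
```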
Lemma A4.
Let $A, B$ be matrices with the same dimensions. Then,
$$\lambda_{\min}\big((A+B)(A+B)^T\big) \;\geq\; \lambda_{\min}(A A^T) + \lambda_{\min}(B B^T) - 2\,\sigma_{\max}(A)\,\sigma_{\max}(B).$$
Proof. 
For every $x$ such that $\|x\| = 1$,
$$x^T (A+B)(A+B)^T x = x^T A A^T x + x^T B B^T x + x^T A B^T x + x^T B A^T x \;\geq\; \lambda_{\min}(A A^T) + \lambda_{\min}(B B^T) - 2\,\sigma_{\max}(A)\,\sigma_{\max}(B), \quad \text{(A128)}$$
where the last inequality holds because $A A^T$ and $B B^T$ are symmetric and because of the Cauchy–Schwarz inequality. The proof is completed by noting that (A128) holds for the $x$ that minimizes $x^T(A+B)(A+B)^T x$, and $\lambda_{\min}\big((A+B)(A+B)^T\big) = \min_{x:\|x\|=1} x^T (A+B)(A+B)^T x$.   □
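A randomized spot-check of Lemma A4 (a sketch only; it draws square Gaussian matrices, although the lemma merely requires matching dimensions):

```python
import numpy as np

rng = np.random.default_rng(3)

# Randomized check of Lemma A4 with square matrices of matching dimensions.
for _ in range(5):
    k = int(rng.integers(2, 8))
    A = rng.standard_normal((k, k))
    B = rng.standard_normal((k, k))
    lhs = np.linalg.eigvalsh((A + B) @ (A + B).T)[0]
    rhs = (np.linalg.eigvalsh(A @ A.T)[0] + np.linalg.eigvalsh(B @ B.T)[0]
           - 2 * np.linalg.svd(A, compute_uv=False)[0]
               * np.linalg.svd(B, compute_uv=False)[0])
    print(lhs >= rhs)                        # True (the bound is often loose or even negative)
```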

References

1. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; Wiley-Interscience: Hoboken, NJ, USA, 2006.
2. Shannon, C.E.; Weaver, W. The Mathematical Theory of Communication; University of Illinois Press: Urbana, IL, USA, 1949.
3. Itô, H. Principle of the minimum entropy in information theory. Proc. Jpn. Acad. 1953, 29, 194–197.
4. O’Neal, J. Bounds on subjective performance measures for source encoding systems. IEEE Trans. Inf. Theory 1971, 17, 224–231.
5. Pierobon, M.; Akyildiz, I.F. Capacity of a Diffusion-Based Molecular Communication System with Channel Memory and Molecular Noise. IEEE Trans. Inf. Theory 2013, 59, 942–954.
6. Akyildiz, I.F.; Pierobon, M.; Balasubramaniam, S. An Information Theoretic Framework to Analyze Molecular Communication Systems Based on Statistical Mechanics. Proc. IEEE 2019, 107, 1230–1255.
7. Aaron, M.R.; McDonald, R.A.; Protonotarios, E. Entropy power loss in linear sampled data filters. Proc. IEEE 1967, 55, 1093–1094.
8. Papoulis, A.; Pillai, S.U. Probability, Random Variables and Stochastic Processes, 3rd ed.; McGraw-Hill: New York, NY, USA, 1991.
9. Kim, Y.H. Feedback capacity of stationary Gaussian channels. IEEE Trans. Inf. Theory 2010, 56, 57–85.
10. Rudin, W. Real and Complex Analysis; McGraw-Hill: New York, NY, USA, 1987.
11. Serón, M.M.; Braslavsky, J.H.; Goodwin, G.C. Fundamental Limitations in Filtering and Control; Springer: London, UK, 1997.
12. Martins, N.C.; Dahleh, M.M.A. Fundamental limitations of performance in the presence of finite capacity feedback. In Proceedings of the 2005 American Control Conference, Portland, OR, USA, 8–10 June 2005.
13. Martins, N.; Dahleh, M. Feedback control in the presence of noisy channels: “Bode-like” fundamental limitations of performance. IEEE Trans. Autom. Control 2008, 53, 1604–1615.
14. Yu, S.; Mehta, P.G. Bode-Like Fundamental Performance Limitations in Control of Nonlinear Systems. IEEE Trans. Autom. Control 2010, 55, 1390–1405.
15. Gray, R.M. Information rates of autoregressive processes. IEEE Trans. Inf. Theory 1970, IT-16, 412–421.
16. Hashimoto, T.; Arimoto, S. On the rate-distortion function for the nonstationary Gaussian autoregressive process. IEEE Trans. Inf. Theory 1980, IT-26, 478–480.
17. Gray, R.M.; Hashimoto, T. A note on rate-distortion functions for nonstationary Gaussian autoregressive processes. IEEE Trans. Inf. Theory 2008, 54, 1319–1322.
18. Gray, R.M. Toeplitz and circulant matrices: A review. Found. Trends Commun. Inf. Theory 2006, 2, 155–239.
19. Rényi, A. On the dimension and entropy of probability distributions. Acta Math. Hung. 1959, 10, 193–215.
20. Shannon, C. Coding theorems for a discrete source with a fidelity criterion. IRE Nat. Conv. Rec. 1959, 4, 143–163.
21. Goodwin, G.C.; Graebe, S.F.; Salgado, M.E. Control System Design, 1st ed.; Prentice Hall PTR: Upper Saddle River, NJ, USA, 2000.
22. Chen, C.T. Linear System Theory and Design, 3rd ed.; The Oxford Series in Electrical and Computing Engineering; Oxford University Press: Oxford, UK, 1999.
23. Elia, N. When Bode meets Shannon: Control-oriented feedback communication schemes. IEEE Trans. Autom. Control 2004, 49, 1477–1488.
24. Silva, E.I.; Derpich, M.S.; Østergaard, J. A framework for control system design subject to average data-rate constraints. IEEE Trans. Autom. Control 2011, 56, 1886–1899.
25. Silva, E.I.; Derpich, M.S.; Østergaard, J. An achievable data-rate region subject to a stationary performance constraint for LTI plants. IEEE Trans. Autom. Control 2011, 56, 1968–1973.
26. Yüksel, S. Characterization of Information Channels for Asymptotic Mean Stationarity and Stochastic Stability of Nonstationary/Unstable Linear Systems. IEEE Trans. Inf. Theory 2012, 58, 6332–6354.
27. Freudenberg, J.S.; Middleton, R.H.; Braslavsky, J.H. Minimum Variance Control Over a Gaussian Communication Channel. IEEE Trans. Autom. Control 2011, 56, 1751–1765.
28. Ardestanizadeh, E.; Franceschetti, M. Control-theoretic approach to communication with feedback. IEEE Trans. Autom. Control 2012, 57, 2576–2587.
29. Grenander, U.; Szegö, G. Toeplitz Forms and Their Applications; University of California Press: Berkeley, CA, USA, 1958.
30. Horn, R.A.; Johnson, C.R. Matrix Analysis; Cambridge University Press: Cambridge, UK, 1985.
31. Fiedler, M. Bounds for the determinant of the sum of Hermitian matrices. Proc. Am. Math. Soc. 1971, 30, 27–31.
Figure 1. A causal, stable, linear and time-invariant system G with input and output processes, initial state, and output disturbance.
Figure 2. Support of $u$ (lying in the $u$–$v$ plane) compared to that of $y = \breve{G}u$ (the rhombus in $\mathbb{R}^3$).
Figure 3. Image of the cube $[0,1]^3$ through the square matrix with columns $[1\;2\;0]^T$, $[0\;1\;2]^T$, and $[0\;0\;1]^T$.
Figure 4. (Left): LTI system P within a noisy feedback loop. (Right): equivalent system when the feedback channel is noiseless and has unit gain.
Figure 5. (Top): The class of feedback channels described by Assumption 3. (Bottom): an equivalent form.
Figure 6. Block-diagram representation of how the non-stationary source $x_1^\infty$ is built and then reconstructed as $y = x + u$.
Figure 7. Block-diagram representation of the changes of variables in the proof of Theorem 9.
Figure 8. Block-diagram representation of a non-white Gaussian channel $y = x + z$ and the coding scheme considered in Reference [9].