Probability, Statistics,
and Random Processes
for Electrical Engineering
Third Edition
Alberto Leon-Garcia
University of Toronto
Upper Saddle River, NJ 07458
Contents

Preface  ix

CHAPTER 1  Probability Models in Electrical and Computer Engineering  1
1.1  Mathematical Models as Tools in Analysis and Design  2
1.2  Deterministic Models  4
1.3  Probability Models  4
1.4  A Detailed Example: A Packet Voice Transmission System  9
1.5  Other Examples  11
1.6  Overview of Book  16
     Summary  17
     Problems  18

CHAPTER 2  Basic Concepts of Probability Theory  21
2.1  Specifying Random Experiments  21
2.2  The Axioms of Probability  30
*2.3  Computing Probabilities Using Counting Methods  41
2.4  Conditional Probability  47
2.5  Independence of Events  53
2.6  Sequential Experiments  59
*2.7  Synthesizing Randomness: Random Number Generators  67
*2.8  Fine Points: Event Classes  70
*2.9  Fine Points: Probabilities of Sequences of Events  75
     Summary  79
     Problems  80

CHAPTER 3  Discrete Random Variables  96
3.1  The Notion of a Random Variable  96
3.2  Discrete Random Variables and Probability Mass Function  99
3.3  Expected Value and Moments of Discrete Random Variable  104
3.4  Conditional Probability Mass Function  111
3.5  Important Discrete Random Variables  115
3.6  Generation of Discrete Random Variables  127
     Summary  129
     Problems  130

CHAPTER 4  One Random Variable  141
4.1  The Cumulative Distribution Function  141
4.2  The Probability Density Function  148
4.3  The Expected Value of X  155
4.4  Important Continuous Random Variables  163
4.5  Functions of a Random Variable  174
4.6  The Markov and Chebyshev Inequalities  181
4.7  Transform Methods  184
4.8  Basic Reliability Calculations  189
4.9  Computer Methods for Generating Random Variables  194
*4.10  Entropy  202
     Summary  213
     Problems  215

CHAPTER 5  Pairs of Random Variables  233
5.1  Two Random Variables  233
5.2  Pairs of Discrete Random Variables  236
5.3  The Joint cdf of X and Y  242
5.4  The Joint pdf of Two Continuous Random Variables  248
5.5  Independence of Two Random Variables  254
5.6  Joint Moments and Expected Values of a Function of Two Random Variables  257
5.7  Conditional Probability and Conditional Expectation  261
5.8  Functions of Two Random Variables  271
5.9  Pairs of Jointly Gaussian Random Variables  278
5.10  Generating Independent Gaussian Random Variables  284
     Summary  286
     Problems  288

CHAPTER 6  Vector Random Variables  303
6.1  Vector Random Variables  303
6.2  Functions of Several Random Variables  309
6.3  Expected Values of Vector Random Variables  318
6.4  Jointly Gaussian Random Vectors  325
6.5  Estimation of Random Variables  332
6.6  Generating Correlated Vector Random Variables  342
     Summary  346
     Problems  348

CHAPTER 7  Sums of Random Variables and Long-Term Averages  359
7.1  Sums of Random Variables  360
7.2  The Sample Mean and the Laws of Large Numbers  365
     Weak Law of Large Numbers  367
     Strong Law of Large Numbers  368
7.3  The Central Limit Theorem  369
     Central Limit Theorem  370
*7.4  Convergence of Sequences of Random Variables  378
*7.5  Long-Term Arrival Rates and Associated Averages  387
7.6  Calculating Distributions Using the Discrete Fourier Transform  392
     Summary  400
     Problems  402

CHAPTER 8  Statistics  411
8.1  Samples and Sampling Distributions  411
8.2  Parameter Estimation  415
8.3  Maximum Likelihood Estimation  419
8.4  Confidence Intervals  430
8.5  Hypothesis Testing  441
8.6  Bayesian Decision Methods  455
8.7  Testing the Fit of a Distribution to Data  462
     Summary  469
     Problems  471

CHAPTER 9  Random Processes  487
9.1  Definition of a Random Process  488
9.2  Specifying a Random Process  491
9.3  Discrete-Time Processes: Sum Process, Binomial Counting Process, and Random Walk  498
9.4  Poisson and Associated Random Processes  507
9.5  Gaussian Random Processes, Wiener Process and Brownian Motion  514
9.6  Stationary Random Processes  518
9.7  Continuity, Derivatives, and Integrals of Random Processes  529
9.8  Time Averages of Random Processes and Ergodic Theorems  540
*9.9  Fourier Series and Karhunen-Loeve Expansion  544
9.10  Generating Random Processes  550
     Summary  554
     Problems  557

CHAPTER 10  Analysis and Processing of Random Signals  577
10.1  Power Spectral Density  577
10.2  Response of Linear Systems to Random Signals  587
10.3  Bandlimited Random Processes  597
10.4  Optimum Linear Systems  605
*10.5  The Kalman Filter  617
*10.6  Estimating the Power Spectral Density  622
10.7  Numerical Techniques for Processing Random Signals  628
     Summary  633
     Problems  635

CHAPTER 11  Markov Chains  647
11.1  Markov Processes  647
11.2  Discrete-Time Markov Chains  650
11.3  Classes of States, Recurrence Properties, and Limiting Probabilities  660
11.4  Continuous-Time Markov Chains  673
*11.5  Time-Reversed Markov Chains  686
11.6  Numerical Techniques for Markov Chains  692
     Summary  700
     Problems  702

CHAPTER 12  Introduction to Queueing Theory  713
12.1  The Elements of a Queueing System  714
12.2  Little's Formula  715
12.3  The M/M/1 Queue  718
12.4  Multi-Server Systems: M/M/c, M/M/c/c, and M/M/∞  727
12.5  Finite-Source Queueing Systems  734
12.6  M/G/1 Queueing Systems  738
12.7  M/G/1 Analysis Using Embedded Markov Chains  745
12.8  Burke's Theorem: Departures From M/M/c Systems  754
12.9  Networks of Queues: Jackson's Theorem  758
12.10  Simulation and Data Analysis of Queueing Systems  771
     Summary  782
     Problems  784

Appendices
A.  Mathematical Tables  797
B.  Tables of Fourier Transforms  800
C.  Matrices and Linear Algebra  802

Index  805
CHAPTER 2

Basic Concepts of Probability Theory
This chapter presents the basic concepts of probability theory. The remainder of the book builds on and elaborates the concepts introduced here, so a good understanding of this chapter will prepare you well for the rest of the book.
The following basic concepts will be presented. First, set theory is used to specify
the sample space and the events of a random experiment. Second, the axioms of probability specify rules for computing the probabilities of events. Third, the notion of conditional probability allows us to determine how partial information about the outcome
of an experiment affects the probabilities of events. Conditional probability also allows
us to formulate the notion of “independence” of events and of experiments. Finally, we
consider “sequential” random experiments that consist of performing a sequence of
simple random subexperiments. We show how the probabilities of events in these experiments can be derived from the probabilities of the simpler subexperiments. Throughout
the book it is shown that complex random experiments can be analyzed by decomposing them into simple subexperiments.
2.1
SPECIFYING RANDOM EXPERIMENTS
A random experiment is an experiment in which the outcome varies in an unpredictable fashion when the experiment is repeated under the same conditions. A random experiment is specified by stating an experimental procedure and a set of one or
more measurements or observations.
Example 2.1
Experiment E1: Select a ball from an urn containing balls numbered 1 to 50. Note the number of
the ball.
Experiment E2 : Select a ball from an urn containing balls numbered 1 to 4. Suppose that balls 1
and 2 are black and that balls 3 and 4 are white. Note the number and color of the ball you select.
Experiment E3: Toss a coin three times and note the sequence of heads and tails.
Experiment E4: Toss a coin three times and note the number of heads.
Experiment E5 : Count the number of voice packets containing only silence produced from a
group of N speakers in a 10-ms period.
Experiment E6 : A block of information is transmitted repeatedly over a noisy channel until an
error-free block arrives at the receiver. Count the number of transmissions required.
Experiment E7: Pick a number at random between zero and one.
Experiment E8: Measure the time between page requests in a Web server.
Experiment E9: Measure the lifetime of a given computer memory chip in a specified environment.
Experiment E10: Determine the value of an audio signal at time t1 .
Experiment E11: Determine the values of an audio signal at times t1 and t2 .
Experiment E12: Pick two numbers at random between zero and one.
Experiment E13 : Pick a number X at random between zero and one, then pick a number Y at
random between zero and X.
Experiment E14: A system component is installed at time t = 0. For t ≥ 0 let X(t) = 1 as long as the component is functioning, and let X(t) = 0 after the component fails.
The specification of a random experiment must include an unambiguous statement
of exactly what is measured or observed. For example, random experiments may consist
of the same procedure but differ in the observations made, as illustrated by E3 and E4 .
A random experiment may involve more than one measurement or observation,
as illustrated by E2 , E3 , E11 , E12 , and E13 . A random experiment may even involve a
continuum of measurements, as shown by E14 .
Experiments E3 , E4 , E5 , E6 , E12 , and E13 are examples of sequential experiments that can be viewed as consisting of a sequence of simple subexperiments. Can
you identify the subexperiments in each of these? Note that in E13 the second subexperiment depends on the outcome of the first subexperiment.
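The dependence of the second subexperiment on the first in E13 can be seen in a short simulation. The following Python sketch (the function name, seed, and number of trials are our own choices, not part of the text) draws X uniformly from [0, 1] and then Y uniformly from [0, X]; every outcome then falls in the triangular region 0 ≤ y ≤ x ≤ 1.

```python
import random

rng = random.Random(1)

def trial_e13():
    """One trial of E13: pick x uniformly in [0, 1],
    then pick y uniformly in [0, x]."""
    x = rng.uniform(0.0, 1.0)   # first subexperiment
    y = rng.uniform(0.0, x)     # second subexperiment depends on x
    return x, y

outcomes = [trial_e13() for _ in range(10_000)]
# Every outcome lies in the triangular region 0 <= y <= x <= 1.
print(all(0.0 <= y <= x <= 1.0 for x, y in outcomes))
```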
2.1.1
The Sample Space
Since random experiments do not consistently yield the same result, it is necessary to
determine the set of possible results. We define an outcome or sample point of a random experiment as a result that cannot be decomposed into other results. When we
perform a random experiment, one and only one outcome occurs. Thus outcomes are
mutually exclusive in the sense that they cannot occur simultaneously. The sample
space S of a random experiment is defined as the set of all possible outcomes.
We will denote an outcome of an experiment by ζ, where ζ is an element or point in S. Each performance of a random experiment can then be viewed as the selection at random of a single point (outcome) from S.
The sample space S can be specified compactly by using set notation. It can be visualized by drawing tables, diagrams, intervals of the real line, or regions of the plane. There
are two basic ways to specify a set:
1. List all the elements, separated by commas, inside a pair of braces:

     A = {0, 1, 2, 3},

2. Give a property that specifies the elements of the set:

     A = {x : x is an integer such that 0 ≤ x ≤ 3}.

Note that the order in which items are listed does not change the set, e.g., {0, 1, 2, 3} and {1, 2, 3, 0} are the same set.
Example 2.2
The sample spaces corresponding to the experiments in Example 2.1 are given below using set notation:

S1 = {1, 2, …, 50}
S2 = {(1, b), (2, b), (3, w), (4, w)}
S3 = {HHH, HHT, HTH, THH, TTH, THT, HTT, TTT}
S4 = {0, 1, 2, 3}
S5 = {0, 1, 2, …, N}
S6 = {1, 2, 3, …}
S7 = {x : 0 ≤ x ≤ 1} = [0, 1]          See Fig. 2.1(a).
S8 = {t : t ≥ 0} = [0, ∞)
S9 = {t : t ≥ 0} = [0, ∞)              See Fig. 2.1(b).
S10 = {v : −∞ < v < ∞} = (−∞, ∞)
S11 = {(v1, v2) : −∞ < v1 < ∞ and −∞ < v2 < ∞}
S12 = {(x, y) : 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1}   See Fig. 2.1(c).
S13 = {(x, y) : 0 ≤ y ≤ x ≤ 1}             See Fig. 2.1(d).
S14 = set of functions X(t) for which X(t) = 1 for 0 ≤ t < t0 and X(t) = 0 for t ≥ t0, where t0 > 0 is the time when the component fails.
Random experiments involving the same experimental procedure may have different sample spaces as shown by Experiments E3 and E4 . Thus the purpose of an experiment affects the choice of sample space.
FIGURE 2.1
Sample spaces for Experiments E7, E9, E12, and E13: (a) S7 is the interval [0, 1]; (b) S9 is the half line [0, ∞); (c) S12 is the unit square; (d) S13 is the triangular region 0 ≤ y ≤ x ≤ 1.
There are three possibilities for the number of outcomes in a sample space. A
sample space can be finite, countably infinite, or uncountably infinite. We call S a
discrete sample space if S is countable; that is, its outcomes can be put into one-to-one
correspondence with the positive integers. We call S a continuous sample space if S is
not countable. Experiments E1 , E2 , E3 , E4 , and E5 have finite discrete sample spaces.
Experiment E6 has a countably infinite discrete sample space. Experiments E7 through
E13 have continuous sample spaces.
Since an outcome of an experiment can consist of one or more observations or measurements, the sample space S can be multi-dimensional. For example, the outcomes in Experiments E2, E11, E12, and E13 are two-dimensional, and those in Experiment E3 are three-dimensional. In some instances, the sample space can be written as the Cartesian product of other sets.¹ For example, S11 = R × R, where R is the set of real numbers, and S3 = S × S × S, where S = {H, T}.
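The Cartesian-product structure of S3 can be generated directly. The following Python sketch (variable names are our own) builds S3 = S × S × S with itertools and then derives S4 from the same outcomes by counting heads, illustrating how one procedure yields different sample spaces under different observations.

```python
from itertools import product

S = ("H", "T")
S3 = list(product(S, repeat=3))              # S3 = S x S x S: all 3-toss sequences
S4 = sorted({seq.count("H") for seq in S3})  # same procedure, different observation

print(len(S3))   # 8 outcomes: HHH, HHT, ..., TTT
print(S4)        # [0, 1, 2, 3]
```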
It is sometimes convenient to let the sample space include outcomes that are
impossible. For example, in Experiment E9 it is convenient to define the sample
space as the positive real line, even though a device cannot have an infinite lifetime.
2.1.2
Events
We are usually not interested in the occurrence of specific outcomes, but rather in the occurrence of some event (i.e., whether the outcome satisfies certain conditions). This requires that we consider subsets of S. We say that A is a subset of B if every element of A also belongs to B. For example, in Experiment E10, which involves the measurement of a voltage, we might be interested in the event "signal voltage is negative." The conditions of interest define a subset of the sample space, namely, the set of points ζ from S that satisfy the given conditions. For example, "voltage is negative" corresponds to the set {ζ : −∞ < ζ < 0}. The event occurs if and only if the outcome of the experiment ζ is in this subset. For this reason events correspond to subsets of S.
Two events of special interest are the certain event, S, which consists of all outcomes and hence always occurs, and the impossible or null event, ∅, which contains no outcomes and hence never occurs.
Example 2.3
In the following examples, Ak refers to an event corresponding to Experiment Ek in Example 2.1.

E1: "An even-numbered ball is selected," A1 = {2, 4, …, 48, 50}.
E2: "The ball is white and even-numbered," A2 = {(4, w)}.
E3: "The three tosses give the same outcome," A3 = {HHH, TTT}.
E4: "The number of heads equals the number of tails," A4 = ∅.
E5: "No active packets are produced," A5 = {0}.

¹The Cartesian product of the sets A and B consists of the set of all ordered pairs (a, b), where the first element is taken from A and the second from B.
E6: "Fewer than 10 transmissions are required," A6 = {1, …, 9}.
E7: "The number selected is nonnegative," A7 = S7.
E8: "Less than t0 seconds elapse between page requests," A8 = {t : 0 ≤ t < t0} = [0, t0).
E9: "The chip lasts more than 1000 hours but fewer than 1500 hours," A9 = {t : 1000 < t < 1500} = (1000, 1500).
E10: "The absolute value of the voltage is less than 1 volt," A10 = {v : −1 < v < 1} = (−1, 1).
E11: "The two voltages have opposite polarities," A11 = {(v1, v2) : (v1 < 0 and v2 > 0) or (v1 > 0 and v2 < 0)}.
E12: "The two numbers differ by less than 1/10," A12 = {(x, y) : (x, y) in S12 and |x − y| < 1/10}.
E13: "The two numbers differ by less than 1/10," A13 = {(x, y) : (x, y) in S13 and |x − y| < 1/10}.
E14: "The system is functioning at time t1," A14 = subset of S14 for which X(t1) = 1.
An event may consist of a single outcome, as in A2 and A5. An event from a discrete sample space that consists of a single outcome is called an elementary event. Events A2 and A5 are elementary events. An event may also consist of the entire sample space, as in A7. The null event, ∅, arises when none of the outcomes satisfy the conditions that specify a given event, as in A4.
2.1.3
Review of Set Theory
In random experiments we are interested in the occurrence of events that are represented by sets. We can combine events using set operations to obtain other events. We
can also express complicated events as combinations of simple events. Before proceeding with further discussion of events and random experiments, we present some essential concepts from set theory.
A set is a collection of objects and will be denoted by capital letters S, A, B, …. We define U as the universal set that consists of all possible objects of interest in a given setting or application. In the context of random experiments we refer to the universal set as the sample space. For example, the universal set in Experiment E6 is U = {1, 2, …}. A set A is a collection of objects from U, and these objects are called the elements or points of the set A and will be denoted by lowercase letters, ζ, a, b, x, y, …. We use the notation:

x ∈ A   and   x ∉ A

to indicate that "x is an element of A" or "x is not an element of A," respectively.
We use Venn diagrams when discussing sets. A Venn diagram is an illustration of
sets and their interrelationships. The universal set U is usually represented as the set of
all points within a rectangle as shown in Fig. 2.2(a). The set A is then the set of points
within an enclosed region inside the rectangle.
We say A is a subset of B if every element of A also belongs to B, that is, if x ∈ A implies x ∈ B. We say that "A is contained in B" and we write:

A ⊂ B.

If A is a subset of B, then the Venn diagram shows the region for A to be inside the region for B, as shown in Fig. 2.2(e).
FIGURE 2.2
Set operations and set relations: (a) A ∪ B; (b) A ∩ B; (c) Aᶜ; (d) two mutually exclusive sets, A ∩ B = ∅; (e) A ⊂ B; (f) A − B; (g) (A ∪ B)ᶜ; (h) Aᶜ ∩ Bᶜ.
Example 2.4
In Experiment E6 three sets of interest might be A = {x : x ≥ 10} = {10, 11, …}, that is, 10 or more transmissions are required; B = {2, 4, 6, …}, the number of transmissions is an even number; and C = {x : x ≥ 20} = {20, 21, …}. Which of these sets are subsets of the others?
Clearly, C is a subset of A (C ⊂ A). However, C is not a subset of B, and B is not a subset of C, because both sets contain elements the other set does not contain. Similarly, B is not a subset of A, and A is not a subset of B.

The empty set ∅ is defined as the set with no elements. The empty set is a subset of every set, that is, for any set A, ∅ ⊂ A.
We say sets A and B are equal if they contain the same elements. Since every element in A is also in B, x ∈ A implies x ∈ B, so A ⊂ B. Similarly, every element in B is also in A, so x ∈ B implies x ∈ A and hence B ⊂ A. Therefore:

A = B   if and only if   A ⊂ B and B ⊂ A.

The standard method to show that two sets, A and B, are equal is to show that A ⊂ B and B ⊂ A. A second method is to list all the items in A and all the items in B, and to show that the items are the same. A variation of this second method is to use a
Venn diagram to identify the region that corresponds to A and to then show that the
Venn diagram for B occupies the same region. We provide examples of both methods
shortly.
We will use three basic operations on sets. The union and the intersection operations are applied to two sets and produce a third set. The complement operation is applied to a single set to produce another set.
The union of two sets A and B is denoted by A ∪ B and is defined as the set of outcomes that are either in A or in B, or both:

A ∪ B = {x : x ∈ A or x ∈ B}.

The operation A ∪ B corresponds to the logical "or" of the properties that define set A and set B, that is, x is in A ∪ B if x satisfies the property that defines A, or x satisfies the property that defines B, or both. The Venn diagram for A ∪ B consists of the shaded region in Fig. 2.2(a).
The intersection of two sets A and B is denoted by A ∩ B and is defined as the set of outcomes that are in both A and B:

A ∩ B = {x : x ∈ A and x ∈ B}.

The operation A ∩ B corresponds to the logical "and" of the properties that define set A and set B. The Venn diagram for A ∩ B consists of the doubly shaded region in Fig. 2.2(b). Two sets are said to be disjoint or mutually exclusive if their intersection is the null set, A ∩ B = ∅. Figure 2.2(d) shows two mutually exclusive sets A and B.
The complement of a set A is denoted by Aᶜ and is defined as the set of all elements not in A:

Aᶜ = {x : x ∉ A}.

The operation Aᶜ corresponds to the logical "not" of the property that defines set A. Figure 2.2(c) shows Aᶜ. Note that Sᶜ = ∅ and ∅ᶜ = S.
The relative complement or difference of sets A and B is the set of elements in A that are not in B:

A − B = {x : x ∈ A and x ∉ B}.

A − B is obtained by removing from A all the elements that are also in B, as illustrated in Fig. 2.2(f). Note that A − B = A ∩ Bᶜ. Note also that Bᶜ = S − B.
Example 2.5
Let A, B, and C be the events from Experiment E6 in Example 2.4. Find the following events: A ∪ B, A ∩ B, Aᶜ, Bᶜ, A − B, and B − A.

A ∪ B = {2, 4, 6, 8, 10, 11, 12, …};   A ∩ B = {10, 12, 14, …};
Aᶜ = {x : x < 10} = {1, 2, …, 9};      Bᶜ = {1, 3, 5, …};
A − B = {11, 13, 15, …};   and B − A = {2, 4, 6, 8}.
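These set operations map directly onto Python's built-in set type. The sketch below (truncating the countably infinite sample space of E6 to a finite universe is our own device, needed to represent the sets on a computer) reproduces the results of Example 2.5.

```python
N = 100  # truncate the countably infinite sample space of E6 (our choice)
U = set(range(1, N + 1))

A = {x for x in U if x >= 10}     # 10 or more transmissions required
B = {x for x in U if x % 2 == 0}  # even number of transmissions

print(sorted(A | B)[:6])   # union starts 2, 4, 6, 8, 10, 11, ...
print(sorted(A & B)[:3])   # intersection starts 10, 12, 14, ...
print(sorted(U - A))       # complement of A: 1 through 9
print(sorted(A - B)[:3])   # difference starts 11, 13, 15, ...
print(sorted(B - A))       # B - A: 2, 4, 6, 8
```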
The three basic set operations can be combined to form other sets. The following
properties of set operations are useful in deriving new expressions for combinations
of sets:
Commutative properties:

A ∪ B = B ∪ A   and   A ∩ B = B ∩ A.                          (2.1)

Associative properties:

A ∪ (B ∪ C) = (A ∪ B) ∪ C   and   A ∩ (B ∩ C) = (A ∩ B) ∩ C.  (2.2)

Distributive properties:

A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)   and
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).                              (2.3)

By applying the above properties we can derive new identities. DeMorgan's rules provide an important example:

DeMorgan's rules:

(A ∪ B)ᶜ = Aᶜ ∩ Bᶜ   and   (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ                 (2.4)
Example 2.6
Prove DeMorgan's rules by using Venn diagrams and by demonstrating set equality.
First we use a Venn diagram to show the first equality. The shaded region in Fig. 2.2(g) shows the complement of A ∪ B, the left-hand side of the equation. The cross-hatched region in Fig. 2.2(h) shows the intersection of Aᶜ and Bᶜ. The two regions are the same, so the sets are equal. Try sketching the Venn diagrams for the second equality in Eq. (2.4).
Next we prove DeMorgan's rules by proving set equality. The proof has two parts: First we show that (A ∪ B)ᶜ ⊂ Aᶜ ∩ Bᶜ; then we show that Aᶜ ∩ Bᶜ ⊂ (A ∪ B)ᶜ. Together these results imply (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ.
First, suppose that x ∈ (A ∪ B)ᶜ; then x ∉ A ∪ B. In particular, we have x ∉ A, which implies x ∈ Aᶜ. Similarly, we have x ∉ B, which implies x ∈ Bᶜ. Hence x is in both Aᶜ and Bᶜ, that is, x ∈ Aᶜ ∩ Bᶜ. We have shown that (A ∪ B)ᶜ ⊂ Aᶜ ∩ Bᶜ.
To prove inclusion in the other direction, suppose that x ∈ Aᶜ ∩ Bᶜ. This implies that x ∈ Aᶜ, so x ∉ A. Similarly, x ∈ Bᶜ and so x ∉ B. Therefore, x ∉ A ∪ B and so x ∈ (A ∪ B)ᶜ. We have shown that Aᶜ ∩ Bᶜ ⊂ (A ∪ B)ᶜ. This proves that (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ.
To prove the second DeMorgan rule, apply the first DeMorgan rule to Aᶜ and Bᶜ to obtain:

(Aᶜ ∪ Bᶜ)ᶜ = (Aᶜ)ᶜ ∩ (Bᶜ)ᶜ = A ∩ B,

where we used the identity A = (Aᶜ)ᶜ. Now take complements of both sides of the above equation:

Aᶜ ∪ Bᶜ = (A ∩ B)ᶜ.
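DeMorgan's rules can also be checked exhaustively on a small universal set. The following Python sketch (the size of the universe is an arbitrary choice of ours) tests both rules over every pair of subsets, using set difference from U to form complements.

```python
from itertools import chain, combinations

U = frozenset(range(6))  # a small universal set (size is our choice)

def subsets(s):
    """Generate all subsets of s, from the empty set up to s itself."""
    s = list(s)
    return (frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1)))

# Check (A u B)^c = A^c n B^c and (A n B)^c = A^c u B^c for all pairs.
ok = all((U - (A | B)) == (U - A) & (U - B) and
         (U - (A & B)) == (U - A) | (U - B)
         for A in subsets(U) for B in subsets(U))
print(ok)  # both rules hold for all 64 x 64 pairs of subsets
```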
Example 2.7
For Experiment E10, let the sets A, B, and C be defined by

A = {v : |v| > 10},   "magnitude of v is greater than 10 volts,"
B = {v : v < −5},     "v is less than −5 volts,"
C = {v : v > 0},      "v is positive."

You should then verify that

A ∪ B = {v : v < −5 or v > 10},
A ∩ B = {v : v < −10},
Cᶜ = {v : v ≤ 0},
(A ∪ B) ∩ C = {v : v > 10},
A ∩ B ∩ C = ∅,   and
(A ∪ B)ᶜ = {v : −5 ≤ v ≤ 10}.
The union and intersection operations can be repeated for an arbitrary number of sets. Thus the union of n sets

⋃_{k=1}^{n} A_k = A1 ∪ A2 ∪ ⋯ ∪ An                          (2.5)

is the set that consists of all elements that are in Ak for at least one value of k. The same definition applies to the union of a countably infinite sequence of sets:

⋃_{k=1}^{∞} A_k.                                            (2.6)

The intersection of n sets

⋂_{k=1}^{n} A_k = A1 ∩ A2 ∩ ⋯ ∩ An                          (2.7)

is the set that consists of elements that are in all of the sets A1, …, An. The same definition applies to the intersection of a countably infinite sequence of sets:

⋂_{k=1}^{∞} A_k.                                            (2.8)

We will see that countable unions and intersections of sets are essential in dealing with sample spaces that are not finite.
2.1.4
Event Classes
We have introduced the sample space S as the set of all possible outcomes of the random experiment. We have also introduced events as subsets of S. Probability theory also requires that we state the class F of events of interest. Only events in this class are assigned probabilities. We expect that any set operation on events in F will produce a set that is also an event in F. In particular, we insist that complements, as well as countable unions and intersections of events in F, i.e., Eqs. (2.1) and (2.5) through (2.8), result in events in F. When the sample space S is finite or countable, we simply let F consist of all subsets of S and we can proceed without further concerns about F. However, when S is the real line R (or an interval of the real line), we cannot let F be all possible subsets of R and still satisfy the axioms of probability. Fortunately, we can obtain all the events of practical interest by letting F be the class of events obtained as complements and countable unions and intersections of intervals of the real line, e.g., (a, b] or (−∞, b]. We will refer to this class of events as the Borel field. In the remainder of the book, we will refer to the event class F from time to time. For an introductory-level course in probability you will not need to know more than what is stated in this paragraph.
When we speak of a class of events we are referring to a collection (set) of events (sets), that is, we are speaking of a "set of sets." We refer to the collection of sets as a class to remind us that the elements of the class are sets. We use script capital letters to refer to a class, e.g., C, F, G. If the class C consists of the collection of sets A1, …, Ak, then we write C = {A1, …, Ak}.
Example 2.8
Let S = {T, H} be the sample space of a coin toss. Let every subset of S be an event. Find all possible events of S.
An event is a subset of S, so we need to find all possible subsets of S. These are:

𝒮 = {∅, {H}, {T}, {H, T}}.

Note that 𝒮 includes both the empty set and S. Let iT and iH be binary digits where a value of 1 indicates that the corresponding element of S is in a given subset. We generate all possible subsets by taking all possible values of the pair (iT, iH). Thus iT = 0, iH = 1 corresponds to the subset {H}. Clearly there are 2² possible subsets, as listed above.
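The binary-indexing construction of Example 2.8 translates directly into code. The following Python sketch (the function name is our own) enumerates the power set of a finite sample space by running an index over bit patterns, one bit per element of S.

```python
def power_set(S):
    """Enumerate all subsets of S: bit k of the index i plays the
    role of the indicator i_k in Example 2.8."""
    S = list(S)
    n = len(S)
    return [frozenset(S[k] for k in range(n) if (i >> k) & 1)
            for i in range(2 ** n)]

events = power_set(["T", "H"])
print(len(events))  # 2^2 = 4 subsets: the empty set, {T}, {H}, and {T, H}
```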
For a finite sample space, S = {1, 2, …, k},² we usually allow all subsets of S to be events. This class of events is called the power set of S and we will denote it by 𝒮. We can index all possible subsets of S with binary numbers i1, i2, …, ik, and we find that the power set of S has 2^k members. Because of this, the power set is also denoted by 𝒮 = 2^S. Section 2.8 discusses some of the fine points on event classes.
2.2
THE AXIOMS OF PROBABILITY
Probabilities are numbers assigned to events that indicate how “likely” it is that the
events will occur when an experiment is performed. A probability law for a random experiment is a rule that assigns probabilities to the events of the experiment that belong
to the event class F. Thus a probability law is a function that assigns a number to sets
(events). In Section 1.3 we found a number of properties of relative frequency that any
definition of probability should satisfy. The axioms of probability formally state that a
probability law must satisfy these properties. In this section, we develop a number of results that follow from this set of axioms.

²The discussion applies to any finite sample space with arbitrary objects S = {x1, …, xk}, but we consider {1, 2, …, k} for notational simplicity.
Let E be a random experiment with sample space S and event class F. A probability law for the experiment E is a rule that assigns to each event A ∈ F a number P[A], called the probability of A, that satisfies the following axioms:

Axiom I     0 ≤ P[A]
Axiom II    P[S] = 1
Axiom III   If A ∩ B = ∅, then P[A ∪ B] = P[A] + P[B].
Axiom III′  If A1, A2, … is a sequence of events such that Ai ∩ Aj = ∅ for all i ≠ j, then

            P[⋃_{k=1}^{∞} A_k] = Σ_{k=1}^{∞} P[A_k].

Axioms I, II, and III are enough to deal with experiments with finite sample spaces. In order to handle experiments with infinite sample spaces, Axiom III needs to be replaced by Axiom III′. Note that Axiom III′ includes Axiom III as a special case, by letting A_k = ∅ for k ≥ 3. Thus we really only need Axioms I, II, and III′. Nevertheless we will gain greater insight by starting with Axioms I, II, and III.
The axioms allow us to view events as objects possessing a property (i.e., their
probability) that has attributes similar to physical mass. Axiom I states that the probability (mass) is nonnegative, and Axiom II states that there is a fixed total amount of
probability (mass), namely 1 unit. Axiom III states that the total probability (mass) in
two disjoint objects is the sum of the individual probabilities (masses).
The axioms provide us with a set of consistency rules that any valid probability
assignment must satisfy. We now develop several properties stemming from the axioms
that are useful in the computation of probabilities.
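For a finite sample space the axioms can be verified mechanically for an explicit probability law. The following Python sketch (the fair-die assignment is our own example, not from the text) defines P[A] as the sum of the elementary outcome probabilities in A and checks Axioms I, II, and III over every event in the power set; exact arithmetic with `Fraction` avoids floating-point round-off.

```python
from fractions import Fraction
from itertools import chain, combinations

S = (1, 2, 3, 4, 5, 6)                 # fair die (our example)
p = {s: Fraction(1, 6) for s in S}     # probability of each elementary outcome

def P(A):
    """Probability law: P[A] = sum of outcome probabilities in A."""
    return sum(p[s] for s in A)

events = [frozenset(c) for c in chain.from_iterable(
    combinations(S, r) for r in range(len(S) + 1))]

assert all(P(A) >= 0 for A in events)          # Axiom I
assert P(frozenset(S)) == 1                    # Axiom II
assert all(P(A | B) == P(A) + P(B)             # Axiom III (disjoint events)
           for A in events for B in events if not (A & B))
print("all axioms verified on all", len(events), "events")
```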
The first result states that if we partition the sample space into two mutually exclusive events, A and Aᶜ, then the probabilities of these two events add up to one.

Corollary 1
P[Aᶜ] = 1 − P[A]

Proof: Since an event A and its complement Aᶜ are mutually exclusive, A ∩ Aᶜ = ∅, we have from Axiom III that

P[A ∪ Aᶜ] = P[A] + P[Aᶜ].

Since S = A ∪ Aᶜ, by Axiom II,

1 = P[S] = P[A ∪ Aᶜ] = P[A] + P[Aᶜ].

The corollary follows after solving for P[Aᶜ].
The next corollary states that the probability of an event is always less than or equal to one. Corollary 2 combined with Axiom I provides a good check in problem solving: If your probabilities are negative or greater than one, you have made a mistake somewhere!

Corollary 2
P[A] ≤ 1

Proof: From Corollary 1,

P[A] = 1 − P[Aᶜ] ≤ 1,

since P[Aᶜ] ≥ 0.
Corollary 3 states that the impossible event has probability zero.

Corollary 3
P[∅] = 0

Proof: Let A = S and Aᶜ = ∅ in Corollary 1:

P[∅] = 1 − P[S] = 0.
Corollary 4 provides us with the standard method for computing the probability of a complicated event A. The method involves decomposing the event A into the union of disjoint events A1, A2, …, An. The probability of A is the sum of the probabilities of the Ak's.

Corollary 4
If A1, A2, …, An are pairwise mutually exclusive, then

P[⋃_{k=1}^{n} A_k] = Σ_{k=1}^{n} P[A_k]   for n ≥ 2.

Proof: We use mathematical induction. Axiom III implies that the result is true for n = 2. Next we need to show that if the result is true for some n, then it is also true for n + 1. This, combined with the fact that the result is true for n = 2, implies that the result is true for all n ≥ 2.
Suppose that the result is true for some n > 2; that is,

P[⋃_{k=1}^{n} A_k] = Σ_{k=1}^{n} P[A_k],                      (2.9)

and consider the n + 1 case:

P[⋃_{k=1}^{n+1} A_k] = P[{⋃_{k=1}^{n} A_k} ∪ A_{n+1}]
                     = P[⋃_{k=1}^{n} A_k] + P[A_{n+1}],        (2.10)

where we have applied Axiom III to the second expression after noting that the union of events A1 to An is mutually exclusive with A_{n+1}. The distributive property then implies

{⋃_{k=1}^{n} A_k} ∩ A_{n+1} = ⋃_{k=1}^{n} (A_k ∩ A_{n+1}) = ⋃_{k=1}^{n} ∅ = ∅.

Substitution of Eq. (2.9) into Eq. (2.10) gives the n + 1 case:

P[⋃_{k=1}^{n+1} A_k] = Σ_{k=1}^{n+1} P[A_k].
Corollary 5 gives an expression for the union of two events that are not necessarily mutually exclusive.

Corollary 5

P[A ∪ B] = P[A] + P[B] − P[A ∩ B]

Proof: First we decompose A ∪ B, A, and B as unions of disjoint events. From the Venn diagram
in Fig. 2.3,

P[A ∪ B] = P[A ∩ Bᶜ] + P[B ∩ Aᶜ] + P[A ∩ B]
P[A] = P[A ∩ Bᶜ] + P[A ∩ B]
P[B] = P[B ∩ Aᶜ] + P[A ∩ B]

By substituting P[A ∩ Bᶜ] and P[B ∩ Aᶜ] from the two lower equations into the top equation,
we obtain the corollary.

By looking at the Venn diagram in Fig. 2.3, you will see that the sum P[A] + P[B]
counts the probability (mass) of the set A ∩ B twice. The expression in Corollary 5
makes the appropriate correction.
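Corollary 5 is easy to check numerically on a small discrete sample space. The sketch below assumes a fair six-sided die (so each face has probability 1/6); the particular events A and B are arbitrary choices for illustration.

```python
from fractions import Fraction

# Assumed model: a fair six-sided die, each face with probability 1/6.
P = {s: Fraction(1, 6) for s in range(1, 7)}

def prob(event):
    # Corollary 4: the probability of an event is the sum of the
    # probabilities of its elementary outcomes.
    return sum(P[s] for s in event)

A = {1, 2, 3, 4}
B = {3, 4, 5}

# Corollary 5: P[A ∪ B] = P[A] + P[B] - P[A ∩ B]
lhs = prob(A | B)
rhs = prob(A) + prob(B) - prob(A & B)
print(lhs, rhs)  # both 5/6
```

Note that without the correction term, P[A] + P[B] = 7/6 would exceed one, violating Corollary 2.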
Corollary 5 is easily generalized to three events,

P[A ∪ B ∪ C] = P[A] + P[B] + P[C] − P[A ∩ B]
               − P[A ∩ C] − P[B ∩ C] + P[A ∩ B ∩ C],    (2.11)

and in general to n events, as shown in Corollary 6.

FIGURE 2.3  Decomposition of A ∪ B into the three disjoint sets A ∩ Bᶜ, A ∩ B, and Aᶜ ∩ B.
Corollary 6

P[A₁ ∪ … ∪ Aₙ] = Σ_{j=1}^{n} P[Aⱼ] − Σ_{j<k} P[Aⱼ ∩ Aₖ] + … + (−1)^{n+1} P[A₁ ∩ … ∩ Aₙ].
Proof is by induction (see Problems 2.26 and 2.27).
Since probabilities are nonnegative, Corollary 5 implies that the probability
of the union of two events is no greater than the sum of the individual event probabilities:

P[A ∪ B] ≤ P[A] + P[B].    (2.12)

The above inequality is a special case of the fact that a subset of another set must
have probability no greater than that of the larger set. This result is frequently used to obtain upper bounds for
probabilities of interest. In the typical situation, we are interested in an event A whose
probability is difficult to find; so we find an event B for which the probability can be
found and that includes A as a subset.
Corollary 7

If A ⊂ B, then P[A] ≤ P[B].

Proof: In Fig. 2.4, B is the union of A and Aᶜ ∩ B, thus

P[B] = P[A] + P[Aᶜ ∩ B] ≥ P[A],

since P[Aᶜ ∩ B] ≥ 0.
The axioms together with the corollaries provide us with a set of rules for computing the probability of certain events in terms of other events. However, we still need an
initial probability assignment for some basic set of events from which the probability of
all other events can be computed. This problem is dealt with in the next two subsections.
FIGURE 2.4  If A ⊂ B, then P[A] ≤ P[B]: B is the union of the disjoint sets A and Aᶜ ∩ B.
2.2.1  Discrete Sample Spaces
In this section we show that the probability law for an experiment with a countable sample space can be specified by giving the probabilities of the elementary events. First, suppose that the sample space is finite, S = {a₁, a₂, …, aₙ}, and let F consist of all subsets
of S. All distinct elementary events are mutually exclusive, so by Corollary 4 the probability of any event B = {a₁′, a₂′, …, aₘ′} is given by

P[B] = P[{a₁′, a₂′, …, aₘ′}]
     = P[{a₁′}] + P[{a₂′}] + … + P[{aₘ′}];    (2.13)

that is, the probability of an event is equal to the sum of the probabilities of the outcomes
in the event. Thus we conclude that the probability law for a random experiment with a finite sample space is specified by giving the probabilities of the elementary events.
If the sample space has n elements, S = {a₁, …, aₙ}, a probability assignment of
particular interest is the case of equally likely outcomes. The probability of the elementary events is

P[{a₁}] = P[{a₂}] = … = P[{aₙ}] = 1/n.    (2.14)

The probability of any event that consists of k outcomes, say B = {a₁′, …, aₖ′}, is

P[B] = P[{a₁′}] + … + P[{aₖ′}] = k/n.    (2.15)

Thus if outcomes are equally likely, then the probability of an event is equal to the number of outcomes in the event divided by the total number of outcomes in the sample
space. Section 2.3 discusses counting methods that are useful in finding probabilities in
experiments that have equally likely outcomes.
Consider the case where the sample space is countably infinite, S = {a₁, a₂, …}.
Let the event class F be the class of all subsets of S. Note that F must now satisfy Eq. (2.8)
because events can consist of countable unions of sets. Axiom III′ implies that the
probability of an event such as D = {b₁′, b₂′, b₃′, …} is given by

P[D] = P[{b₁′, b₂′, b₃′, …}] = P[{b₁′}] + P[{b₂′}] + P[{b₃′}] + …

The probability of an event with a countably infinite sample space is determined from
the probabilities of the elementary events.
Example 2.9
An urn contains 10 identical balls numbered 0, 1, …, 9. A random experiment involves selecting a
ball from the urn and noting the number of the ball. Find the probability of the following events:
A = “number of ball selected is odd,”
B = “number of ball selected is a multiple of 3,”
C = “number of ball selected is less than 5,”
and of A ´ B and A ´ B ´ C.
36
Chapter 2
Basic Concepts of Probability Theory
The sample space is S = {0, 1, …, 9}, so the sets of outcomes corresponding to the above
events are

A = {1, 3, 5, 7, 9},   B = {3, 6, 9},   and   C = {0, 1, 2, 3, 4}.

If we assume that the outcomes are equally likely, then
P[A] = P[{1}] + P[{3}] + P[{5}] + P[{7}] + P[{9}] = 5/10.
P[B] = P[{3}] + P[{6}] + P[{9}] = 3/10.
P[C] = P[{0}] + P[{1}] + P[{2}] + P[{3}] + P[{4}] = 5/10.

From Corollary 5,

P[A ∪ B] = P[A] + P[B] − P[A ∩ B] = 5/10 + 3/10 − 2/10 = 6/10,

where we have used the fact that A ∩ B = {3, 9}, so P[A ∩ B] = 2/10. From Corollary 6,

P[A ∪ B ∪ C] = P[A] + P[B] + P[C] − P[A ∩ B] − P[A ∩ C] − P[B ∩ C] + P[A ∩ B ∩ C]
             = 5/10 + 3/10 + 5/10 − 2/10 − 2/10 − 1/10 + 1/10
             = 9/10.
You should verify the answers for P3A ´ B4 and P3A ´ B ´ C4 by enumerating the outcomes in
the events.
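That enumeration can be done mechanically. A short sketch, assuming the equally likely model of Eq. (2.15), recomputes the probabilities of Example 2.9 by counting outcomes:

```python
from fractions import Fraction

S = set(range(10))   # balls numbered 0, 1, ..., 9

def prob(event):
    # Eq. (2.15): equally likely outcomes, so P[B] = |B| / |S|.
    return Fraction(len(event), len(S))

A = {1, 3, 5, 7, 9}    # odd
B = {3, 6, 9}          # multiple of 3
C = {0, 1, 2, 3, 4}    # less than 5

print(prob(A | B))      # 3/5, i.e., 6/10
print(prob(A | B | C))  # 9/10
```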
Many probability models can be devised for the same sample space and events by
varying the probability assignment; in the case of finite sample spaces all we need to do
is come up with n nonnegative numbers that add up to one for the probabilities of the
elementary events. Of course, in any particular situation, the probability assignment
should be selected to reflect experimental observations to the extent possible. The following example shows that situations can arise where there is more than one “reasonable” probability assignment and where experimental evidence is required to decide
on the appropriate assignment.
Example 2.10
Suppose that a coin is tossed three times. If we observe the sequence of heads and tails, then
there are eight possible outcomes S₃ = {HHH, HHT, HTH, THH, TTH, THT, HTT, TTT}. If
we assume that the outcomes of S₃ are equiprobable, then the probability of each of the eight elementary events is 1/8. This probability assignment implies that the probability of obtaining two
heads in three tosses is, by Corollary 4,

P["2 heads in 3 tosses"] = P[{HHT, HTH, THH}]
                         = P[{HHT}] + P[{HTH}] + P[{THH}] = 3/8.
Now suppose that we toss a coin three times but we count the number of heads in three
tosses instead of observing the sequence of heads and tails. The sample space is now
S₄ = {0, 1, 2, 3}. If we assume the outcomes of S₄ to be equiprobable, then each of the elementary events of S₄ has probability 1/4. This second probability assignment predicts that the probability of obtaining two heads in three tosses is

P["2 heads in 3 tosses"] = P[{2}] = 1/4.
The first probability assignment implies that the probability of two heads in three tosses is 3/8, and the second probability assignment predicts that the probability is 1/4. Thus the
two assignments are not consistent with each other. As far as the theory is concerned, either
one of the assignments is acceptable. It is up to us to decide which assignment is more appropriate. Later in the chapter we will see that only the first assignment is consistent with
the assumption that the coin is fair and that the tosses are “independent.” This assignment
correctly predicts the relative frequencies that would be observed in an actual coin tossing
experiment.
Finally we consider an example with a countably infinite sample space.
Example 2.11
A fair coin is tossed repeatedly until the first heads shows up; the outcome of the experiment is
the number of tosses required until the first heads occurs. Find a probability law for this experiment.
It is conceivable that an arbitrarily large number of tosses will be required until heads
occurs, so the sample space is S = {1, 2, 3, …}. Suppose the experiment is repeated n times.
Let Nⱼ be the number of trials in which the jth toss results in the first heads. If n is very large,
we expect N₁ to be approximately n/2 since the coin is fair. This implies that a second toss is
necessary about n − N₁ ≈ n/2 times, and again we expect that about half of these, that is,
n/4, will result in heads, and so on, as shown in Fig. 2.5. Thus for large n, the relative frequencies are

fⱼ ≈ Nⱼ/n = (1/2)^j,    j = 1, 2, ….

We therefore conclude that a reasonable probability law for this experiment is

P[j tosses till first heads] = (1/2)^j,    j = 1, 2, ….    (2.16)
We can verify that these probabilities add up to one by using the geometric series with a = 1/2:

Σ_{j=1}^{∞} a^j = a/(1 − a) |_{a=1/2} = 1.
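The relative-frequency argument behind Eq. (2.16) can be replayed in simulation. The sketch below (the seed and number of repetitions are arbitrary choices) tosses a fair coin until the first heads and tabulates the observed frequencies against (1/2)^j:

```python
import random
from collections import Counter

random.seed(1)      # arbitrary seed, for reproducibility
n = 100_000         # arbitrary number of repetitions

counts = Counter()
for _ in range(n):
    j = 1
    while random.random() >= 0.5:   # tails with probability 1/2; toss again
        j += 1
    counts[j] += 1                  # first heads occurred on toss j

for j in range(1, 6):
    print(j, counts[j] / n, 0.5 ** j)   # relative frequency vs. (1/2)^j
```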
2.2.2  Continuous Sample Spaces
Continuous sample spaces arise in experiments in which the outcomes are numbers
that can assume a continuum of values, so we let the sample space S be the entire real
line R (or some interval of the real line). We could consider letting the event class consist of all subsets of R. But it turns out that this class is “too large” and it is impossible
to assign probabilities to all the subsets of R. Fortunately, it is possible to assign probabilities to all events in a smaller class that includes all events of practical interest. This
class, denoted by B, is called the Borel field, and it contains all open and closed intervals
of the real line as well as all events that can be obtained as countable unions, intersections, and complements.³ Axiom III′ is once again the key to calculating probabilities of
events. Let A₁, A₂, … be a sequence of mutually exclusive events that are represented
by intervals of the real line; then

P[⋃_{k=1}^{∞} Aₖ] = Σ_{k=1}^{∞} P[Aₖ]

where each P[Aₖ] is specified by the probability law. For this reason, probability laws
in experiments with continuous sample spaces specify a rule for assigning numbers to intervals of the real line.

FIGURE 2.5  In n trials heads comes up in the first toss approximately n/2 times, in
the second toss approximately n/4 times, and so on.
Example 2.12
Consider the random experiment "pick a number x at random between zero and one." The sample
space S for this experiment is the unit interval [0, 1], which is uncountably infinite. If we suppose that
all the outcomes in S are equally likely to be selected, then we would guess that the probability that the
outcome is in the interval [0, 1/2] is the same as the probability that the outcome is in the interval
[1/2, 1]. We would also guess that the probability of the outcome being exactly equal to 1/2 would be
zero since there are an uncountably infinite number of equally likely outcomes.
³ Section 2.9 discusses B in more detail.
Consider the following probability law: "The probability that the outcome falls in a subinterval of S is equal to the length of the subinterval," that is,

P[[a, b]] = (b − a)  for 0 ≤ a ≤ b ≤ 1,    (2.17)

where by P[[a, b]] we mean the probability of the event corresponding to the interval [a, b].
Clearly, Axiom I is satisfied since b ≥ a ≥ 0. Axiom II follows from S = [a, b] with a = 0 and
b = 1.
We now show that the probability law is consistent with the previous guesses about the
probabilities of the events [0, 1/2], [1/2, 1], and {1/2}:

P[[0, 0.5]] = 0.5 − 0 = .5
P[[0.5, 1]] = 1 − 0.5 = .5

In addition, if x₀ is any point in S, then P[[x₀, x₀]] = 0 since individual points have zero width.
Now suppose that we are interested in an event that is the union of several intervals; for
example, "the outcome is at least 0.3 away from the center of the unit interval," that is,
A = [0, 0.2] ∪ [0.8, 1]. Since the two intervals are disjoint, we have by Axiom III

P[A] = P[[0, 0.2]] + P[[0.8, 1]] = .4.
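A Monte Carlo check agrees with the length-based assignment: the fraction of uniform draws landing in A should approach the total length of the two intervals. The sketch below uses an arbitrary seed and trial count.

```python
import random

random.seed(2)      # arbitrary seed, for reproducibility
n = 200_000         # arbitrary number of trials
hits = 0
for _ in range(n):
    x = random.random()          # uniform outcome in [0, 1)
    if x <= 0.2 or x >= 0.8:     # event A = [0, 0.2] ∪ [0.8, 1]
        hits += 1
print(hits / n)                  # should be close to 0.4
```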
The next example shows that an initial probability assignment that specifies the
probability of semi-infinite intervals also suffices to specify the probabilities of all
events of interest.
Example 2.13
Suppose that the lifetime of a computer memory chip is measured, and we find that "the proportion of chips whose lifetime exceeds t decreases exponentially at a rate α." Find an appropriate
probability law.
Let the sample space in this experiment be S = (0, ∞). If we interpret the above finding
as "the probability that a chip's lifetime exceeds t decreases exponentially at a rate α," we then
obtain the following assignment of probabilities to events of the form (t, ∞):

P[(t, ∞)] = e^{−αt}  for t > 0,    (2.18)

where α > 0. Note that the exponential is a number between 0 and 1 for t > 0, so Axiom I is satisfied. Axiom II is satisfied since

P[S] = P[(0, ∞)] = 1.

The probability that the lifetime is in the interval (r, s] is found by noting in Fig. 2.6 that
(r, s] ∪ (s, ∞) = (r, ∞), so by Axiom III,

P[(r, ∞)] = P[(r, s]] + P[(s, ∞)].
FIGURE 2.6  (r, ∞) = (r, s] ∪ (s, ∞).
By rearranging the above equation we obtain

P[(r, s]] = P[(r, ∞)] − P[(s, ∞)] = e^{−αr} − e^{−αs}.

We thus obtain the probability of arbitrary intervals in S.
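The interval calculation from Eq. (2.18) is mechanical. A small sketch, with an assumed rate α = 0.5 (any α > 0 gives a valid assignment), checks that it is consistent with Axiom III:

```python
import math

alpha = 0.5   # assumed decay rate; any alpha > 0 works

def p_exceeds(t):
    # P[(t, ∞)] = e^(-alpha*t), Eq. (2.18)
    return math.exp(-alpha * t)

def p_interval(r, s):
    # P[(r, s]] = e^(-alpha*r) - e^(-alpha*s), for 0 < r < s
    return p_exceeds(r) - p_exceeds(s)

# Axiom III check: (r, s] and (s, ∞) are disjoint with union (r, ∞).
r, s = 1.0, 3.0
print(p_interval(r, s) + p_exceeds(s))   # equals p_exceeds(r)
print(p_exceeds(r))
```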
In both Example 2.12 and Example 2.13, the probability that the outcome takes on
a specific value is zero. You may ask: If an outcome (or event) has probability zero, doesn’t
that mean it cannot occur? And you may then ask: How can all the outcomes in a sample space have probability zero? We can explain this paradox by using the relative
frequency interpretation of probability. An event that occurs only once in an infinite number of trials will have relative frequency zero. Hence the fact that an event or outcome has
relative frequency zero does not imply that it cannot occur, but rather that it occurs very
infrequently. In the case of continuous sample spaces, the set of possible outcomes is so
rich that all outcomes occur infrequently enough that their relative frequencies are zero.
We end this section with an example where the events are regions in the plane.
Example 2.14
Consider Experiment E₁₂, where we picked two numbers x and y at random between zero and
one. The sample space is then the unit square shown in Fig. 2.7(a). If we suppose that all pairs of
numbers in the unit square are equally likely to be selected, then it is reasonable to use a probability assignment in which the probability of any region R inside the unit square is equal to the
area of R. Find the probability of the following events: A = {x > 0.5}, B = {y > 0.5}, and
C = {x > y}.
FIGURE 2.7  A two-dimensional sample space and three events: (a) the sample space S,
(b) the event {x > 1/2}, (c) the event {y > 1/2}, (d) the event {x > y}.
Figures 2.7(b) through 2.7(d) show the regions corresponding to the events A, B, and C.
Clearly each of these regions has area 1/2. Thus

P[A] = 1/2,   P[B] = 1/2,   P[C] = 1/2.
We reiterate how to proceed from a problem statement to its probability model.
The problem statement implicitly or explicitly defines a random experiment, which
specifies an experimental procedure and a set of measurements and observations.
These measurements and observations determine the set of all possible outcomes and
hence the sample space S.
An initial probability assignment that specifies the probability of certain events
must be determined next. This probability assignment must satisfy the axioms of probability. If S is discrete, then it suffices to specify the probabilities of elementary events.
If S is continuous, it suffices to specify the probabilities of intervals of the real line or
regions of the plane. The probability of other events of interest can then be determined
from the initial probability assignment and the axioms of probability and their corollaries. Many probability assignments are possible, so the choice of probability assignment must reflect experimental observations and/or previous experience.
*2.3  COMPUTING PROBABILITIES USING COUNTING METHODS⁴
In many experiments with finite sample spaces, the outcomes can be assumed to be
equiprobable. The probability of an event is then the ratio of the number of outcomes in
the event of interest to the total number of outcomes in the sample space (Eq. (2.15)).
The calculation of probabilities reduces to counting the number of outcomes in an
event. In this section, we develop several useful counting (combinatorial) formulas.
Suppose that a multiple-choice test has k questions and that for question i the
student must select one of nᵢ possible answers. What is the total number of ways of answering the entire test? The answer to question i can be viewed as specifying the ith
component of a k-tuple, so the above question is equivalent to: How many distinct ordered k-tuples (x₁, …, xₖ) are possible if xᵢ is an element from a set with nᵢ distinct elements?
Consider the k = 2 case. If we arrange all possible choices for x₁ and for x₂ along
the sides of a table as shown in Fig. 2.8, we see that there are n₁n₂ distinct ordered pairs.
For triplets we could arrange the n₁n₂ possible pairs (x₁, x₂) along the vertical side of
the table and the n₃ choices for x₃ along the horizontal side. Clearly, the number of possible triplets is n₁n₂n₃.
In general, the number of distinct ordered k-tuples (x₁, …, xₖ) with components
xᵢ from a set with nᵢ distinct elements is

number of distinct ordered k-tuples = n₁n₂ ⋯ nₖ.    (2.19)
Many counting problems can be posed as sampling problems where we select
“balls” from “urns” or “objects” from “populations.” We will now use Eq. (2.19) to develop combinatorial formulas for various types of sampling.
⁴ This section and all sections marked with an asterisk may be skipped without loss of continuity.
FIGURE 2.8  If there are n₁ distinct choices for x₁ and n₂ distinct choices
for x₂, then there are n₁n₂ distinct ordered pairs (x₁, x₂).
2.3.1  Sampling with Replacement and with Ordering
Suppose we choose k objects from a set A that has n distinct objects, with replacement; that is, after selecting an object and noting its identity in an ordered list, the object is placed back in the set before the next choice is made. We will refer to the set A
as the "population." The experiment produces an ordered k-tuple

(x₁, …, xₖ),

where xᵢ ∈ A and i = 1, …, k. Equation (2.19) with n₁ = n₂ = … = nₖ = n implies that

number of distinct ordered k-tuples = nᵏ.    (2.20)
Example 2.15
An urn contains five balls numbered 1 to 5. Suppose we select two balls from the urn with replacement. How many distinct ordered pairs are possible? What is the probability that the two
draws yield the same number?
Equation (2.20) states that the number of ordered pairs is 5² = 25. Table 2.1 shows the 25
possible pairs. Five of the 25 outcomes have the two draws yielding the same number; if we suppose that all pairs are equiprobable, then the probability that the two draws yield the same number is 5/25 = .2.
2.3.2  Sampling without Replacement and with Ordering
Suppose we choose k objects in succession without replacement from a population A of
n distinct objects. Clearly, k ≤ n. The number of possible outcomes in the first draw is
n₁ = n; the number of possible outcomes in the second draw is n₂ = n − 1, namely all
n objects except the one selected in the first draw; and so on, up to nₖ = n − (k − 1) in
the final draw. Equation (2.19) then gives

number of distinct ordered k-tuples = n(n − 1) ⋯ (n − k + 1).    (2.21)
TABLE 2.1  Enumeration of possible outcomes in various types of
sampling of two balls from an urn containing five distinct balls.

(a) Ordered pairs for sampling with replacement.

(1, 1)  (1, 2)  (1, 3)  (1, 4)  (1, 5)
(2, 1)  (2, 2)  (2, 3)  (2, 4)  (2, 5)
(3, 1)  (3, 2)  (3, 3)  (3, 4)  (3, 5)
(4, 1)  (4, 2)  (4, 3)  (4, 4)  (4, 5)
(5, 1)  (5, 2)  (5, 3)  (5, 4)  (5, 5)

(b) Ordered pairs for sampling without replacement.

(1, 2)  (1, 3)  (1, 4)  (1, 5)
(2, 1)  (2, 3)  (2, 4)  (2, 5)
(3, 1)  (3, 2)  (3, 4)  (3, 5)
(4, 1)  (4, 2)  (4, 3)  (4, 5)
(5, 1)  (5, 2)  (5, 3)  (5, 4)

(c) Pairs for sampling without replacement or ordering.

(1, 2)  (1, 3)  (1, 4)  (1, 5)
(2, 3)  (2, 4)  (2, 5)
(3, 4)  (3, 5)
(4, 5)
Example 2.16
An urn contains five balls numbered 1 to 5. Suppose we select two balls in succession without replacement. How many distinct ordered pairs are possible? What is the probability that the first
ball has a number larger than that of the second ball?
Equation (2.21) states that the number of ordered pairs is 5(4) = 20. The 20 possible ordered pairs are shown in Table 2.1(b). Ten ordered pairs in Table 2.1(b) have the first number larger than the second number; thus the probability of this event is 10/20 = 1/2.
Example 2.17
An urn contains five balls numbered 1, 2, …, 5. Suppose we draw three balls with replacement.
What is the probability that all three balls are different?
From Eq. (2.20) there are 5³ = 125 possible outcomes, which we will suppose are
equiprobable. The number of these outcomes for which the three draws are different is given
by Eq. (2.21): 5(4)(3) = 60. Thus the probability that all three balls are different is
60/125 = .48.
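Examples 2.15 through 2.17 reduce to evaluating Eqs. (2.20) and (2.21), and the Python standard library does this counting directly (math.perm requires Python 3.8 or later):

```python
from math import perm

# Example 2.17: three draws with replacement from five distinct balls.
total = 5 ** 3           # Eq. (2.20): number of ordered triples, 125
favorable = perm(5, 3)   # Eq. (2.21): 5*4*3 = 60 triples with no repeats
print(favorable, total, favorable / total)   # 60 125 0.48
```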
2.3.3  Permutations of n Distinct Objects
Consider sampling without replacement with k = n. This is simply drawing objects
from an urn containing n distinct objects until the urn is empty. Thus, the number of
possible orderings (arrangements, permutations) of n distinct objects is equal to the
number of ordered n-tuples in sampling without replacement with k = n. From Eq. (2.21),
we have

number of permutations of n objects = n(n − 1) ⋯ (2)(1) ≜ n!.    (2.22)

We refer to n! as n factorial.
We will see that n! appears in many of the combinatorial formulas. For large n,
Stirling's formula is very useful:

n! ∼ √(2π) n^{n+1/2} e^{−n},    (2.23)

where the sign ∼ indicates that the ratio of the two sides tends to unity as n → ∞
[Feller, p. 52].
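One way to see what Stirling's formula promises is to compute the ratio of n! to the approximation for a few values of n; a quick sketch:

```python
import math

def stirling(n):
    # Stirling's approximation, Eq. (2.23): sqrt(2*pi) * n^(n + 1/2) * e^(-n)
    return math.sqrt(2 * math.pi) * n ** (n + 0.5) * math.exp(-n)

for n in (5, 10, 20):
    print(n, math.factorial(n) / stirling(n))   # ratio approaches 1 as n grows
```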
Example 2.18
Find the number of permutations of three distinct objects {1, 2, 3}. Equation (2.22) gives
3! = 3(2)(1) = 6. The six permutations are

123   312   231   132   213   321.
Example 2.19
Suppose that 12 balls are placed at random into 12 cells, where more than 1 ball is allowed to occupy a cell. What is the probability that all cells are occupied?
The placement of each ball into a cell can be viewed as the selection of a cell number between 1 and 12. Equation (2.20) implies that there are 12¹² possible placements of the 12 balls in
the 12 cells. In order for all cells to be occupied, the first ball selects from any of the 12 cells, the
second ball from the remaining 11 cells, and so on. Thus the number of placements that occupy
all cells is 12!. If we suppose that all 12¹² possible placements are equiprobable, we find that the
probability that all cells are occupied is

12!/12¹² = (12/12)(11/12) ⋯ (1/12) = 5.37(10⁻⁵).

This answer is surprising if we reinterpret the question as follows. Given that 12 airplane
crashes occur at random in a year, what is the probability that there is exactly 1 crash each
month? The above result shows that this probability is very small. Thus a model that assumes
that crashes occur randomly in time does not predict that they tend to occur uniformly over time
[Feller, p. 32].
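The arithmetic of Example 2.19 can be checked directly:

```python
from math import factorial

# Example 2.19: 12 balls dropped at random into 12 cells, no cell empty.
p = factorial(12) / 12 ** 12
print(p)   # about 5.37e-05
```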
2.3.4  Sampling without Replacement and without Ordering
Suppose we pick k objects from a set of n distinct objects without replacement and that
we record the result without regard to order. (You can imagine putting each selected
object into another jar, so that when the k selections are completed we have no record
of the order in which the selection was done.) We call the resulting subset of k selected
objects a “combination of size k.”
From Eq. (2.22), there are k! possible orders in which the k objects in the second
jar could have been selected. Thus if Cⁿₖ denotes the number of combinations of size k
from a set of size n, then Cⁿₖ k! must be the total number of distinct ordered samples of
k objects, which is given by Eq. (2.21). Thus

Cⁿₖ k! = n(n − 1) ⋯ (n − k + 1),    (2.24)

and the number of different combinations of size k from a set of size n, k ≤ n, is

Cⁿₖ = n(n − 1) ⋯ (n − k + 1)/k! = n!/(k! (n − k)!) ≜ (n choose k).    (2.25)

The expression (n choose k) is called a binomial coefficient and is read "n choose k."
Note that choosing k objects out of a set of n is equivalent to choosing the n − k
objects that are to be left out. It then follows that (also see Problem 2.60):

(n choose k) = (n choose n − k).
Example 2.20
Find the number of ways of selecting two objects from A = {1, 2, 3, 4, 5} without regard to order.
Equation (2.25) gives

(5 choose 2) = 5!/(2! 3!) = 10.

Table 2.1(c) gives the 10 pairs.
Example 2.21
Find the number of distinct permutations of k white balls and n − k black balls.
This problem is equivalent to the following sampling problem: Put n tokens numbered 1 to
n in an urn, where each token represents a position in the arrangement of balls; pick a combination of k tokens and put the k white balls in the corresponding positions. Each combination of
size k leads to a distinct arrangement (permutation) of k white balls and n − k black balls. Thus
the number of distinct permutations of k white balls and n − k black balls is Cⁿₖ.
As a specific example let n = 4 and k = 2. The number of combinations of size 2 from a
set of four distinct objects is

(4 choose 2) = 4!/(2! 2!) = 4(3)/2(1) = 6.

The 6 distinct permutations with 2 whites (zeros) and 2 blacks (ones) are

1100   0110   0011   1001   1010   0101.
Example 2.22 Quality Control
A batch of 50 items contains 10 defective items. Suppose 10 items are selected at random and
tested. What is the probability that exactly 5 of the items tested are defective?
The number of ways of selecting 10 items out of a batch of 50 is the number of combinations of size 10 from a set of 50 objects:

(50 choose 10) = 50!/(10! 40!).

The number of ways of selecting 5 defective and 5 nondefective items from the batch of 50 is the
product N₁N₂, where N₁ is the number of ways of selecting the 5 items from the set of 10 defective items, and N₂ is the number of ways of selecting 5 items from the 40 nondefective items. Thus
the probability that exactly 5 tested items are defective is

(10 choose 5)(40 choose 5)/(50 choose 10) = (10! 40! 10! 40!)/(5! 5! 5! 35! 50!) = .016.
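The probability in Example 2.22 is a ratio of binomial coefficients, which math.comb evaluates exactly:

```python
from math import comb

# Example 2.22: batch of 50 with 10 defectives; 10 items tested at random.
# P[exactly 5 defective] = C(10,5) * C(40,5) / C(50,10)
p = comb(10, 5) * comb(40, 5) / comb(50, 10)
print(p)   # about 0.016
```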
Example 2.21 shows that sampling without replacement and without ordering is
equivalent to partitioning the set of n distinct objects into two sets: B, containing the k
items that are picked from the urn, and Bᶜ, containing the n − k left behind. Suppose
we partition a set of n distinct objects into J subsets B₁, B₂, …, B_J, where Bⱼ is assigned kⱼ elements and k₁ + k₂ + … + k_J = n.
In Problem 2.61, it is shown that the number of distinct partitions is

n!/(k₁! k₂! ⋯ k_J!).    (2.26)

Equation (2.26) is called the multinomial coefficient. The binomial coefficient is the
J = 2 case of the multinomial coefficient.
Example 2.23
A six-sided die is tossed 12 times. How many distinct sequences of faces (numbers from the set
{1, 2, 3, 4, 5, 6}) have each number appearing exactly twice? What is the probability of obtaining
such a sequence?
The number of distinct sequences in which each face of the die appears exactly twice is the
same as the number of partitions of the set {1, 2, …, 12} into 6 subsets of size 2, namely

12!/(2! 2! 2! 2! 2! 2!) = 12!/2⁶ = 7,484,400.

From Eq. (2.20) we have that there are 6¹² possible outcomes in 12 tosses of a die. If we suppose
that all of these have equal probabilities, then the probability of obtaining a sequence in which
each face appears exactly twice is

(12!/2⁶)/6¹² = 7,484,400/2,176,782,336 ≈ 3.4(10⁻³).
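The count and probability of Example 2.23 can be verified directly:

```python
from math import factorial

# Example 2.23: 12 die tosses with each face appearing exactly twice.
sequences = factorial(12) // 2 ** 6   # multinomial coefficient, Eq. (2.26)
total = 6 ** 12                       # Eq. (2.20): all sequences of 12 tosses
print(sequences, total, sequences / total)
```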
2.3.5  Sampling with Replacement and without Ordering
Suppose we pick k objects from a set of n distinct objects with replacement and we
record the result without regard to order. This can be done by filling out a form which
has n columns, one for each distinct object. Each time an object is selected, an "x" is
placed in the corresponding column. For example, if we are picking 5 objects from 4
distinct objects, one possible form would look like this:

Object 1 / Object 2 / Object 3 / Object 4
   xx    /          /    x     /    xx

where the slash symbol ("/") is used to separate the entries for different columns. Note
that this form can be summarized by the sequence

xx//x/xx

where the n − 1 /'s indicate the lines between columns, and where nothing appears between consecutive /'s if the corresponding object was not selected. Each different
arrangement of 5 x's and 3 /'s leads to a distinct form. If we identify x's with "white
balls" and /'s with "black balls," then this problem was considered in Example 2.21, and
the number of different arrangements is given by (8 choose 3).
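This count can be cross-checked by brute-force enumeration for small n and k; itertools generates exactly the unordered samples with replacement:

```python
from itertools import combinations_with_replacement
from math import comb

# k = 5 objects from n = 4 distinct objects, with replacement, order ignored.
n, k = 4, 5
samples = list(combinations_with_replacement(range(n), k))
print(len(samples), comb(n - 1 + k, k))   # both 56
```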
In the general case the form will involve k x's and n − 1 /'s. Thus the number of
different ways of picking k objects from a set of n distinct objects with replacement and
without ordering is given by

(n − 1 + k choose k) = (n − 1 + k choose n − 1).

2.4  CONDITIONAL PROBABILITY
Quite often we are interested in determining whether two events, A and B, are related in
the sense that knowledge about the occurrence of one, say B, alters the likelihood of occurrence of the other, A. This requires that we find the conditional probability, P[A | B],
of event A given that event B has occurred. The conditional probability is defined by

P[A | B] = P[A ∩ B]/P[B]  for P[B] > 0.    (2.27)

Knowledge that event B has occurred implies that the outcome of the experiment is in the set B. In computing P[A | B] we can therefore view the experiment as
now having the reduced sample space B as shown in Fig. 2.9. The event A occurs in the
reduced sample space if and only if the outcome ζ is in A ∩ B. Equation (2.27) simply
renormalizes the probability of events that occur jointly with B. Thus if we let A = B,
Eq. (2.27) gives P[B | B] = 1, as required. It is easy to show that P[A | B], for fixed B,
satisfies the axioms of probability. (See Problem 2.74.)
If we interpret probability as relative frequency, then P[A | B] should be the relative frequency of the event A ∩ B in experiments where B occurred. Suppose that the
experiment is performed n times, and suppose that event B occurs n_B times, and that
event A ∩ B occurs n_{A∩B} times. The relative frequency of interest is then

n_{A∩B}/n_B = (n_{A∩B}/n)/(n_B/n) → P[A ∩ B]/P[B],

where we have implicitly assumed that P[B] > 0. This is in agreement with Eq. (2.27).

FIGURE 2.9  If B is known to have occurred, then A can occur only
if A ∩ B occurs.
Example 2.24
A ball is selected from an urn containing two black balls, numbered 1 and 2, and two white balls,
numbered 3 and 4. The number and color of the ball is noted, so the sample space is
{(1, b), (2, b), (3, w), (4, w)}. Assuming that the four outcomes are equally likely, find P[A | B]
and P[A | C], where A, B, and C are the following events:

A = {(1, b), (2, b)}, "black ball selected,"
B = {(2, b), (4, w)}, "even-numbered ball selected," and
C = {(3, w), (4, w)}, "number of ball is greater than 2."

Since P[A ∩ B] = P[(2, b)] and P[A ∩ C] = P[∅] = 0, Eq. (2.27) gives

P[A | B] = P[A ∩ B]/P[B] = .25/.5 = .5 = P[A]
P[A | C] = P[A ∩ C]/P[C] = 0/.5 = 0 ≠ P[A].

In the first case, knowledge of B did not alter the probability of A. In the second case, knowledge
of C implied that A had not occurred.
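Equation (2.27) amounts to counting in the reduced sample space. The sketch below assumes the equally likely model of Example 2.24:

```python
from fractions import Fraction

# Example 2.24: four equally likely outcomes (number, color).
S = {(1, 'b'), (2, 'b'), (3, 'w'), (4, 'w')}

def prob(event):
    return Fraction(len(event & S), len(S))

A = {(1, 'b'), (2, 'b')}    # black ball selected
B = {(2, 'b'), (4, 'w')}    # even-numbered ball selected
C = {(3, 'w'), (4, 'w')}    # number of ball greater than 2

def cond(A, B):
    # Eq. (2.27): P[A | B] = P[A ∩ B] / P[B], for P[B] > 0
    return prob(A & B) / prob(B)

print(cond(A, B), cond(A, C))   # 1/2 0
```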
If we multiply both sides of the definition of P[A | B] by P[B] we obtain

P[A ∩ B] = P[A | B] P[B].    (2.28a)

Similarly we also have that

P[A ∩ B] = P[B | A] P[A].    (2.28b)

In the next example we show how this equation is useful in finding probabilities
in sequential experiments. The example also introduces a tree diagram that facilitates
the calculation of probabilities.
Example 2.25
An urn contains two black balls and three white balls. Two balls are selected at random from the
urn without replacement and the sequence of colors is noted. Find the probability that both balls
are black.
This experiment consists of a sequence of two subexperiments. We can imagine working our way down the tree shown in Fig. 2.10 from the topmost node to one of the bottom nodes: We reach node 1 in the tree if the outcome of the first draw is a black ball; then the next subexperiment consists of selecting a ball from an urn containing one black ball and three white balls. On the other hand, if the outcome of the first draw is white, then we reach node 2 in the tree and the second subexperiment consists of selecting a ball from an urn that contains two black balls and two white balls. Thus if we know which node is reached after the first draw, then we can state the probabilities of the outcome in the next subexperiment.

Let B1 and B2 be the events that the outcome is a black ball in the first and second draw, respectively. From Eq. (2.28b) we have

    P[B1 ∩ B2] = P[B2|B1]P[B1].

In terms of the tree diagram in Fig. 2.10, P[B1] is the probability of reaching node 1 and P[B2|B1] is the probability of reaching the leftmost bottom node from node 1. Now P[B1] = 2/5 since the first draw is from an urn containing two black balls and three white balls; P[B2|B1] = 1/4 since, given B1, the second draw is from an urn containing one black ball and three white balls. Thus

    P[B1 ∩ B2] = (1/4)(2/5) = 1/10.

In general, the probability of any sequence of colors is obtained by multiplying the probabilities corresponding to the node transitions in the tree in Fig. 2.10.
FIGURE 2.10
The paths from the top node to a bottom node correspond to the possible outcomes in the drawing of two balls from an urn without replacement. The probability of a path is the product of the probabilities in the associated transitions.
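The tree calculation in Example 2.25 can be double-checked by brute-force simulation. This is a minimal sketch of the two-draw urn experiment, not part of the text.

```python
import random

random.seed(2)

# Urn: 2 black ('b') and 3 white ('w') balls; draw two without replacement.
# Eq. (2.28b) gives P[B1 ∩ B2] = P[B2|B1]P[B1] = (1/4)(2/5) = 1/10.
trials = 200_000
both_black = 0
for _ in range(trials):
    urn = ['b', 'b', 'w', 'w', 'w']
    first = urn.pop(random.randrange(len(urn)))
    second = urn.pop(random.randrange(len(urn)))
    if first == 'b' and second == 'b':
        both_black += 1

estimate = both_black / trials
print(estimate)  # close to 0.1
```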
Example 2.26 Binary Communication System
Many communication systems can be modeled in the following way. First, the user inputs a 0 or a 1
into the system, and a corresponding signal is transmitted. Second, the receiver makes a decision
about what was the input to the system, based on the signal it received. Suppose that the user sends 0s with probability 1 − p and 1s with probability p, and suppose that the receiver makes random decision errors with probability ε. For i = 0, 1, let Ai be the event "input was i," and let Bi be the event "receiver decision was i." Find the probabilities P[Ai ∩ Bj] for i = 0, 1 and j = 0, 1.

The tree diagram for this experiment is shown in Fig. 2.11. We then readily obtain the desired probabilities

    P[A0 ∩ B0] = (1 − p)(1 − ε),
    P[A0 ∩ B1] = (1 − p)ε,
    P[A1 ∩ B0] = pε, and
    P[A1 ∩ B1] = p(1 − ε).
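The four joint probabilities are just products along the tree branches, so they are one dictionary comprehension away. The values of p and ε below are arbitrary illustration choices, not from the text.

```python
# Joint input-output probabilities for the binary channel of Example 2.26,
# computed by multiplying along the tree branches.
p, eps = 0.4, 0.05   # illustrative values (assumed, not from the text)

joint = {
    (0, 0): (1 - p) * (1 - eps),
    (0, 1): (1 - p) * eps,
    (1, 0): p * eps,
    (1, 1): p * (1 - eps),
}

total = sum(joint.values())
print(joint)
print(total)  # the four joint probabilities sum to 1
```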
Let B1, B2, …, Bn be mutually exclusive events whose union equals the sample space S as shown in Fig. 2.12. We refer to these sets as a partition of S. Any event A can be represented as the union of mutually exclusive events in the following way:

    A = A ∩ S = A ∩ (B1 ∪ B2 ∪ … ∪ Bn)
      = (A ∩ B1) ∪ (A ∩ B2) ∪ … ∪ (A ∩ Bn).

(See Fig. 2.12.) By Corollary 4, the probability of A is

    P[A] = P[A ∩ B1] + P[A ∩ B2] + … + P[A ∩ Bn].

By applying Eq. (2.28a) to each of the terms on the right-hand side, we obtain the theorem on total probability:

    P[A] = P[A|B1]P[B1] + P[A|B2]P[B2] + … + P[A|Bn]P[Bn].          (2.29)

This result is particularly useful when the experiments can be viewed as consisting of a sequence of two subexperiments as shown in the tree diagram in Fig. 2.10.
FIGURE 2.11
Probabilities of input-output pairs in a binary transmission system.

FIGURE 2.12
A partition of S into n disjoint sets.
Example 2.27
In the experiment discussed in Example 2.25, find the probability of the event W2 that the second
ball is white.
The events B1 = {(b, b), (b, w)} and W1 = {(w, b), (w, w)} form a partition of the sample space, so applying Eq. (2.29) we have

    P[W2] = P[W2|B1]P[B1] + P[W2|W1]P[W1]
          = (3/4)(2/5) + (2/4)(3/5) = 3/5.
It is interesting to note that this is the same as the probability of selecting a white ball in the first
draw. The result makes sense because we are computing the probability of a white ball in the second draw under the assumption that we have no knowledge of the outcome of the first draw.
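The total-probability calculation in Example 2.27 can be done exactly with rational arithmetic. This is a minimal sketch following Eq. (2.29).

```python
from fractions import Fraction as F

# Theorem on total probability applied to Example 2.27:
# P[W2] = P[W2|B1]P[B1] + P[W2|W1]P[W1].
P_B1 = F(2, 5)            # first draw black
P_W1 = F(3, 5)            # first draw white
P_W2_given_B1 = F(3, 4)   # urn then holds 1 black, 3 white
P_W2_given_W1 = F(2, 4)   # urn then holds 2 black, 2 white

P_W2 = P_W2_given_B1 * P_B1 + P_W2_given_W1 * P_W1
print(P_W2)  # 3/5, the same as P[white] on the first draw
```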
Example 2.28
A manufacturing process produces a mix of "good" memory chips and "bad" memory chips. The lifetime of good chips follows the exponential law introduced in Example 2.13, with a rate of failure α. The lifetime of bad chips also follows the exponential law, but the rate of failure is 1000α. Suppose that the fraction of good chips is 1 − p and of bad chips, p. Find the probability that a randomly selected chip is still functioning after t seconds.

Let C be the event "chip still functioning after t seconds," let G be the event "chip is good," and let B be the event "chip is bad." By the theorem on total probability we have

    P[C] = P[C|G]P[G] + P[C|B]P[B]
         = P[C|G](1 − p) + P[C|B]p
         = (1 − p)e^(−αt) + pe^(−1000αt),

where we used the fact that P[C|G] = e^(−αt) and P[C|B] = e^(−1000αt).
2.4.1  Bayes' Rule
Let B1, B2, …, Bn be a partition of a sample space S. Suppose that event A occurs; what is the probability of event Bj? By the definition of conditional probability we have

    P[Bj|A] = P[A ∩ Bj]/P[A] = P[A|Bj]P[Bj] / Σ(k=1 to n) P[A|Bk]P[Bk],          (2.30)

where we used the theorem on total probability to replace P[A]. Equation (2.30) is called Bayes' rule.
Bayes' rule is often applied in the following situation. We have some random experiment in which the events of interest form a partition. The "a priori probabilities" of these events, P[Bj], are the probabilities of the events before the experiment is performed. Now suppose that the experiment is performed, and we are informed that event A occurred; the "a posteriori probabilities" are the probabilities of the events in the partition, P[Bj|A], given this additional information. The following two examples illustrate this situation.
Example 2.29 Binary Communication System
In the binary communication system in Example 2.26, find which input is more probable given that the receiver has output a 1. Assume that, a priori, the input is equally likely to be 0 or 1.

Let Ak be the event that the input was k, k = 0, 1; then A0 and A1 are a partition of the sample space of input-output pairs. Let B1 be the event "receiver output was a 1." The probability of B1 is

    P[B1] = P[B1|A0]P[A0] + P[B1|A1]P[A1]
          = ε(1/2) + (1 − ε)(1/2) = 1/2.

Applying Bayes' rule, we obtain the a posteriori probabilities

    P[A0|B1] = P[B1|A0]P[A0]/P[B1] = (ε/2)/(1/2) = ε

    P[A1|B1] = P[B1|A1]P[A1]/P[B1] = ((1 − ε)/2)/(1/2) = 1 − ε.

Thus, if ε is less than 1/2, then input 1 is more likely than input 0 when a 1 is observed at the output of the channel.
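The a posteriori computation generalizes directly to code. The sketch below applies Bayes' rule to the binary channel with equally likely inputs; the value of ε is an arbitrary illustration choice.

```python
# A posteriori probabilities for Example 2.29 via Bayes' rule.
eps = 0.1                      # channel error probability (assumed value)
priors = {0: 0.5, 1: 0.5}      # a priori input probabilities
# Likelihoods P[output 1 | input k]:
likelihood = {0: eps, 1: 1 - eps}

# Theorem on total probability gives P[B1], the denominator in Eq. (2.30).
P_B1 = sum(likelihood[k] * priors[k] for k in priors)
posterior = {k: likelihood[k] * priors[k] / P_B1 for k in priors}
print(P_B1)       # 1/2
print(posterior)  # {0: eps, 1: 1 - eps}
```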
Example 2.30 Quality Control
Consider the memory chips discussed in Example 2.28. Recall that a fraction p of the chips are
bad and tend to fail much more quickly than good chips. Suppose that in order to “weed out”
the bad chips, every chip is tested for t seconds prior to leaving the factory. The chips that fail
are discarded and the remaining chips are sent out to customers. Find the value of t for which
99% of the chips sent out to customers are good.
Let C be the event "chip still functioning after t seconds," let G be the event "chip is good," and let B be the event "chip is bad." The problem requires that we find the value of t for which

    P[G|C] = .99.

We find P[G|C] by applying Bayes' rule:

    P[G|C] = P[C|G]P[G] / (P[C|G]P[G] + P[C|B]P[B])
           = (1 − p)e^(−αt) / ((1 − p)e^(−αt) + pe^(−1000αt))
           = 1 / (1 + pe^(−1000αt)/((1 − p)e^(−αt))) = .99.

The above equation can then be solved for t:

    t = (1/999α) ln(99p/(1 − p)).

For example, if 1/α = 20,000 hours and p = .10, then t = 48 hours.
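The closed-form answer can be verified numerically. This sketch solves the burn-in equation of Example 2.30 in the slightly more general form where the 99% target is a parameter; the function name is our own.

```python
import math

# Burn-in time from Example 2.30: solving P[G|C] = target for t gives
# t = (1/(999 alpha)) * ln((target/(1 - target)) * p/(1 - p)),
# which reduces to (1/(999 alpha)) ln(99p/(1 - p)) for target = .99.
def burn_in_time(alpha, p, target=0.99):
    r = (target / (1 - target)) * p / (1 - p)
    return math.log(r) / (999 * alpha)

alpha = 1 / 20_000   # failure rate, in 1/hours
p = 0.10             # fraction of bad chips
t = burn_in_time(alpha, p)
print(t)  # about 48 hours
```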
2.5  INDEPENDENCE OF EVENTS
If knowledge of the occurrence of an event B does not alter the probability of some other event A, then it would be natural to say that event A is independent of B. In terms of probabilities this situation occurs when

    P[A] = P[A|B] = P[A ∩ B]/P[B].

The above equation has the problem that the right-hand side is not defined when P[B] = 0.

We will define two events A and B to be independent if

    P[A ∩ B] = P[A]P[B].          (2.31)

Equation (2.31) then implies both

    P[A|B] = P[A]          (2.32a)

and

    P[B|A] = P[B].          (2.32b)

Note also that Eq. (2.32a) implies Eq. (2.31) when P[B] ≠ 0 and Eq. (2.32b) implies Eq. (2.31) when P[A] ≠ 0.
Example 2.31
A ball is selected from an urn containing two black balls, numbered 1 and 2, and two white balls, numbered 3 and 4. Let the events A, B, and C be defined as follows:

    A = {(1, b), (2, b)}, "black ball selected";
    B = {(2, b), (4, w)}, "even-numbered ball selected"; and
    C = {(3, w), (4, w)}, "number of ball is greater than 2."

Are events A and B independent? Are events A and C independent?

First, consider events A and B. The probabilities required by Eq. (2.31) are

    P[A] = P[B] = 1/2,

and

    P[A ∩ B] = P[{(2, b)}] = 1/4.

Thus

    P[A ∩ B] = 1/4 = P[A]P[B],

and the events A and B are independent. Equation (2.32b) gives more insight into the meaning of independence:

    P[A|B] = P[A ∩ B]/P[B] = P[{(2, b)}]/P[{(2, b), (4, w)}] = (1/4)/(1/2) = 1/2

    P[A] = P[A]/P[S] = P[{(1, b), (2, b)}]/P[{(1, b), (2, b), (3, w), (4, w)}] = (1/2)/1.

These two equations imply that P[A] = P[A|B] because the proportion of outcomes in S that lead to the occurrence of A is equal to the proportion of outcomes in B that lead to A. Thus knowledge of the occurrence of B does not alter the probability of the occurrence of A.

Events A and C are not independent since P[A ∩ C] = P[∅] = 0, so

    P[A|C] = 0 ≠ P[A] = .5.

In fact, A and C are mutually exclusive since A ∩ C = ∅, so the occurrence of C implies that A has definitely not occurred.
In general if two events have nonzero probability and are mutually exclusive,
then they cannot be independent. For suppose they were independent and mutually
exclusive; then
    0 = P[A ∩ B] = P[A]P[B],
which implies that at least one of the events must have zero probability.
Example 2.32
Two numbers x and y are selected at random between zero and one. Let the events A, B, and C be defined as follows:

    A = {x > 0.5},   B = {y > 0.5},   and   C = {x > y}.

Are the events A and B independent? Are A and C independent?

Figure 2.13 shows the regions of the unit square that correspond to the above events. Using Eq. (2.32a), we have

    P[A|B] = P[A ∩ B]/P[B] = (1/4)/(1/2) = 1/2 = P[A],

so events A and B are independent. Again we have that the "proportion" of outcomes in S leading to A is equal to the "proportion" in B that lead to A.

Using Eq. (2.32b), we have

    P[A|C] = P[A ∩ C]/P[C] = (3/8)/(1/2) = 3/4 ≠ 1/2 = P[A],

so events A and C are not independent. Indeed from Fig. 2.13(b) we can see that knowledge of the fact that x is greater than y increases the probability that x is greater than 0.5.
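Both conclusions of Example 2.32 can be confirmed with a Monte Carlo sketch over the unit square (a quick numerical check, not from the text):

```python
import random

random.seed(3)

# Example 2.32: A = {x > .5}, B = {y > .5}, C = {x > y} on the unit square.
n = 200_000
nB = nC = nAB = nAC = 0
for _ in range(n):
    x, y = random.random(), random.random()
    A, B, C = x > 0.5, y > 0.5, x > y
    nB += B
    nC += C
    nAB += A and B
    nAC += A and C

print(nAB / nB)  # ≈ 1/2 = P[A]: A and B independent
print(nAC / nC)  # ≈ 3/4 ≠ P[A]: A and C dependent
```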
What conditions should three events A, B, and C satisfy in order for them to be
independent? First, they should be pairwise independent, that is,
    P[A ∩ B] = P[A]P[B],   P[A ∩ C] = P[A]P[C],   and   P[B ∩ C] = P[B]P[C].
FIGURE 2.13
Examples of independent and nonindependent events: (a) events A and B are independent; (b) events A and C are not independent.
In addition, knowledge of the joint occurrence of any two, say A and B, should not affect the probability of the third, that is,

    P[C|A ∩ B] = P[C].

In order for this to hold, we must have

    P[C|A ∩ B] = P[A ∩ B ∩ C]/P[A ∩ B] = P[C].

This in turn implies that we must have

    P[A ∩ B ∩ C] = P[A ∩ B]P[C] = P[A]P[B]P[C],

where we have used the fact that A and B are pairwise independent. Thus we conclude that three events A, B, and C are independent if the probability of the intersection of any pair or triplet of events is equal to the product of the probabilities of the individual events.

The following example shows that if three events are pairwise independent, it does not necessarily follow that P[A ∩ B ∩ C] = P[A]P[B]P[C].
Example 2.33
Consider the experiment discussed in Example 2.32 where two numbers are selected at random from the unit interval. Let the events B, D, and F be defined as follows:

    B = {y > 1/2},   D = {x < 1/2},
    F = {x < 1/2 and y < 1/2} ∪ {x > 1/2 and y > 1/2}.

The three events are shown in Fig. 2.14. It can be easily verified that any pair of these events is independent:

    P[B ∩ D] = 1/4 = P[B]P[D],
    P[B ∩ F] = 1/4 = P[B]P[F], and
    P[D ∩ F] = 1/4 = P[D]P[F].

However, the three events are not independent, since B ∩ D ∩ F = ∅, so

    P[B ∩ D ∩ F] = P[∅] = 0 ≠ P[B]P[D]P[F] = 1/8.
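The geometry of Example 2.33 makes the probabilities exact quarter-square areas, so the pairwise-but-not-jointly-independent structure can be written out directly (a small sketch, with events as in the text):

```python
from fractions import Fraction as F

# Example 2.33 on the unit square:
# B = upper half, D = left half, F = lower-left ∪ upper-right quadrants.
# Each event has probability 1/2; each pairwise intersection is a single
# quadrant of probability 1/4, but B ∩ D ∩ F is empty.
P_B = P_D = P_F = F(1, 2)
P_BD = P_BF = P_DF = F(1, 4)
P_BDF = F(0)

# Pairwise independence holds:
assert P_BD == P_B * P_D and P_BF == P_B * P_F and P_DF == P_D * P_F
# ...but joint independence fails:
print(P_BDF, P_B * P_D * P_F)  # 0 versus 1/8
```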
In order for a set of n events to be independent, the probability of an event
should be unchanged when we are given the joint occurrence of any subset of the other
events. This requirement naturally leads to the following definition of independence.
The events A1, A2, …, An are said to be independent if for k = 2, …, n,

    P[Ai1 ∩ Ai2 ∩ … ∩ Aik] = P[Ai1]P[Ai2] … P[Aik],          (2.33)
where 1 ≤ i1 < i2 < … < ik ≤ n. For a set of n events we need to verify that the probabilities of all 2^n − n − 1 possible intersections factor in the right way.

FIGURE 2.14
(a) B = {y > 1/2}; (b) D = {x < 1/2}; (c) F = {x < 1/2 and y < 1/2} ∪ {x > 1/2 and y > 1/2}. Events B, D, and F are pairwise independent, but the triplet B, D, F are not independent events.
The above definition of independence appears quite cumbersome because it requires that so many conditions be verified. However, the most common application of
the independence concept is in making the assumption that the events of separate experiments are independent. We refer to such experiments as independent experiments.
For example, it is common to assume that the outcome of a coin toss is independent of
the outcomes of all prior and all subsequent coin tosses.
Example 2.34
Suppose a fair coin is tossed three times and we observe the resulting sequence of heads and
tails. Find the probability of the elementary events.
The sample space of this experiment is S = {HHH, HHT, HTH, THH, TTH, THT, HTT, TTT}. The assumption that the coin is fair means that the outcomes of a single toss are equiprobable, that is, P[H] = P[T] = 1/2. If we assume that the outcomes of the coin tosses are independent, then

    P[{HHH}] = P[{H}]P[{H}]P[{H}] = 1/8,
    P[{HHT}] = P[{H}]P[{H}]P[{T}] = 1/8,
    P[{HTH}] = P[{H}]P[{T}]P[{H}] = 1/8,
    P[{THH}] = P[{T}]P[{H}]P[{H}] = 1/8,
    P[{TTH}] = P[{T}]P[{T}]P[{H}] = 1/8,
    P[{THT}] = P[{T}]P[{H}]P[{T}] = 1/8,
    P[{HTT}] = P[{H}]P[{T}]P[{T}] = 1/8, and
    P[{TTT}] = P[{T}]P[{T}]P[{T}] = 1/8.
Example 2.35 System Reliability
A system consists of a controller and three peripheral units. The system is said to be "up" if the controller and at least two of the peripherals are functioning. Find the probability that the system is up, assuming that all components fail independently.

Define the following events: A is "controller is functioning" and Bi is "peripheral i is functioning," where i = 1, 2, 3. The event F, "two or more peripheral units are functioning," occurs if all three units are functioning or if exactly two units are functioning. Thus

    F = (B1 ∩ B2 ∩ B3ᶜ) ∪ (B1 ∩ B2ᶜ ∩ B3) ∪ (B1ᶜ ∩ B2 ∩ B3) ∪ (B1 ∩ B2 ∩ B3).

Note that the events in the above union are mutually exclusive. Thus

    P[F] = P[B1]P[B2]P[B3ᶜ] + P[B1]P[B2ᶜ]P[B3] + P[B1ᶜ]P[B2]P[B3] + P[B1]P[B2]P[B3]
         = 3(1 − a)²a + (1 − a)³,

where we have assumed that each peripheral fails with probability a, so that P[Bi] = 1 − a and P[Biᶜ] = a.

The event "system is up" is then A ∩ F. If we assume that the controller fails with probability p, then

    P["system up"] = P[A ∩ F] = P[A]P[F] = (1 − p)P[F]
                   = (1 − p){3(1 − a)²a + (1 − a)³}.

Let a = 10%; then all three peripherals are functioning (1 − a)³ = 72.9% of the time and two are functioning and one is "down" 3(1 − a)²a = 24.3% of the time. Thus two or more peripherals are functioning 97.2% of the time. Suppose that the controller is not very reliable, say p = 20%; then the system is up only 77.8% of the time, mostly because of controller failures.
Suppose a second identical controller with p = 20% is added to the system, and that the system is "up" if at least one of the controllers is functioning and if two or more of the peripherals are functioning. In Problem 2.94, you are asked to show that at least one of the controllers is functioning 96% of the time, and that the system is up 93.3% of the time. This is an increase of 16% over the system with a single controller.
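The reliability expressions of Example 2.35 are easy to encode. The sketch below (function names are ours) also handles the two-controller variant by replacing P[A] = 1 − p with 1 − p², the probability that at least one of two independent controllers works.

```python
# Reliability calculation from Example 2.35: the system is "up" when the
# controller works and at least two of three peripherals work.
def p_two_or_more(a):
    # a = probability that a peripheral fails
    return 3 * (1 - a) ** 2 * a + (1 - a) ** 3

def p_system_up(p, a, controllers=1):
    # p = probability that a controller fails; with several independent
    # controllers the system needs at least one of them working.
    p_ctrl = 1 - p ** controllers
    return p_ctrl * p_two_or_more(a)

print(p_system_up(0.20, 0.10))                 # ≈ 0.778
print(p_system_up(0.20, 0.10, controllers=2))  # ≈ 0.933
```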
2.6  SEQUENTIAL EXPERIMENTS

Many random experiments can be viewed as sequential experiments that consist of a sequence of simpler subexperiments. These subexperiments may or may not be independent. In this section we discuss methods for obtaining the probabilities of events in sequential experiments.
2.6.1  Sequences of Independent Experiments
Suppose that a random experiment consists of performing experiments E1, E2, …, En. The outcome of this experiment will then be an n-tuple s = (s1, …, sn), where sk is the outcome of the kth subexperiment. The sample space of the sequential experiment is defined as the set that contains the above n-tuples and is denoted by the Cartesian product of the individual sample spaces S1 × S2 × … × Sn.

We can usually determine, because of physical considerations, when the subexperiments are independent, in the sense that the outcome of any given subexperiment cannot affect the outcomes of the other subexperiments. Let A1, A2, …, An be events such that Ak concerns only the outcome of the kth subexperiment. If the subexperiments are independent, then it is reasonable to assume that the above events A1, A2, …, An are independent. Thus

    P[A1 ∩ A2 ∩ … ∩ An] = P[A1]P[A2] … P[An].          (2.34)

This expression allows us to compute all probabilities of events of the sequential experiment.
Example 2.36
Suppose that 10 numbers are selected at random from the interval [0, 1]. Find the probability that the first 5 numbers are less than 1/4 and the last 5 numbers are greater than 1/2. Let x1, x2, …, x10 be the sequence of 10 numbers; then the events of interest are

    Ak = {xk < 1/4}   for k = 1, …, 5
    Ak = {xk > 1/2}   for k = 6, …, 10.

If we assume that each selection of a number is independent of the other selections, then

    P[A1 ∩ A2 ∩ … ∩ A10] = P[A1]P[A2] … P[A10] = (1/4)⁵(1/2)⁵.
We will now derive several important models for experiments that consist of sequences of independent subexperiments.
2.6.2  The Binomial Probability Law
A Bernoulli trial involves performing an experiment once and noting whether a particular event A occurs. The outcome of the Bernoulli trial is said to be a "success" if A occurs and a "failure" otherwise. In this section we are interested in finding the probability of k successes in n independent repetitions of a Bernoulli trial.

We can view the outcome of a single Bernoulli trial as the outcome of a toss of a coin for which the probability of heads (success) is p = P[A]. The probability of k successes in n Bernoulli trials is then equal to the probability of k heads in n tosses of the coin.
Example 2.37
Suppose that a coin is tossed three times. If we assume that the tosses are independent and the probability of heads is p, then the probabilities for the sequences of heads and tails are

    P[{HHH}] = P[{H}]P[{H}]P[{H}] = p³,
    P[{HHT}] = P[{H}]P[{H}]P[{T}] = p²(1 − p),
    P[{HTH}] = P[{H}]P[{T}]P[{H}] = p²(1 − p),
    P[{THH}] = P[{T}]P[{H}]P[{H}] = p²(1 − p),
    P[{TTH}] = P[{T}]P[{T}]P[{H}] = p(1 − p)²,
    P[{THT}] = P[{T}]P[{H}]P[{T}] = p(1 − p)²,
    P[{HTT}] = P[{H}]P[{T}]P[{T}] = p(1 − p)², and
    P[{TTT}] = P[{T}]P[{T}]P[{T}] = (1 − p)³,

where we used the fact that the tosses are independent. Let k be the number of heads in three trials; then

    P[k = 0] = P[{TTT}] = (1 − p)³,
    P[k = 1] = P[{TTH, THT, HTT}] = 3p(1 − p)²,
    P[k = 2] = P[{HHT, HTH, THH}] = 3p²(1 − p), and
    P[k = 3] = P[{HHH}] = p³.
The result in Example 2.37 is the n = 3 case of the binomial probability law.
Theorem

Let k be the number of successes in n independent Bernoulli trials; then the probabilities of k are given by the binomial probability law:

    pn(k) = C(n, k) p^k (1 − p)^(n−k)   for k = 0, …, n,          (2.35)

where pn(k) is the probability of k successes in n trials, and

    C(n, k) = n! / (k!(n − k)!)          (2.36)

is the binomial coefficient.

The term n! in Eq. (2.36) is called n factorial and is defined by n! = n(n − 1) … (2)(1). By definition 0! is equal to 1.

We now prove the above theorem. Following Example 2.34 we see that each of the sequences with k successes and n − k failures has the same probability, namely p^k(1 − p)^(n−k). Let Nn(k) be the number of distinct sequences that have k successes and n − k failures; then

    pn(k) = Nn(k) p^k (1 − p)^(n−k).          (2.37)

The expression Nn(k) is the number of ways of picking k positions out of n for the successes. It can be shown that⁵

    Nn(k) = C(n, k).          (2.38)

The theorem follows by substituting Eq. (2.38) into Eq. (2.37).
Example 2.38
Verify that Eq. (2.35) gives the probabilities found in Example 2.37.
In Example 2.37, let "toss results in heads" correspond to a "success"; then

    p3(0) = (3!/(0! 3!)) p⁰(1 − p)³ = (1 − p)³,
    p3(1) = (3!/(1! 2!)) p¹(1 − p)² = 3p(1 − p)²,
    p3(2) = (3!/(2! 1!)) p²(1 − p)¹ = 3p²(1 − p), and
    p3(3) = (3!/(3! 0!)) p³(1 − p)⁰ = p³,

which are in agreement with our previous results.
You were introduced to the binomial coefficient in an introductory calculus course when the binomial theorem was discussed:

    (a + b)ⁿ = Σ(k=0 to n) C(n, k) a^k b^(n−k).          (2.39a)

If we let a = b = 1, then

    2ⁿ = Σ(k=0 to n) C(n, k) = Σ(k=0 to n) Nn(k),

which is in agreement with the fact that there are 2ⁿ distinct possible sequences of successes and failures in n trials. If we let a = p and b = 1 − p in Eq. (2.39a), we then obtain

    1 = Σ(k=0 to n) C(n, k) p^k (1 − p)^(n−k) = Σ(k=0 to n) pn(k),          (2.39b)

which confirms that the binomial probabilities sum to 1.

⁵See Example 2.21.
The term n! grows very quickly with n, so numerical problems are encountered for relatively small values of n if one attempts to compute pn(k) directly using Eq. (2.35). The following recursive formula avoids the direct evaluation of n! and thus extends the range of n for which pn(k) can be computed before encountering numerical difficulties:

    pn(k + 1) = [(n − k)p / ((k + 1)(1 − p))] pn(k).          (2.40)

Later in the book, we present two approximations for the binomial probabilities for the case when n is large.
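The recursion of Eq. (2.40) translates into a short loop. This is a minimal sketch (the function name is ours): it starts from pn(0) = (1 − p)ⁿ and builds the whole pmf without ever forming a factorial.

```python
# Binomial pmf via the recursion of Eq. (2.40):
# p_n(k+1) = [(n - k) p / ((k + 1)(1 - p))] p_n(k), with p_n(0) = (1 - p)^n.
def binomial_pmf(n, p):
    pk = (1 - p) ** n
    probs = [pk]
    for k in range(n):
        pk *= (n - k) * p / ((k + 1) * (1 - p))
        probs.append(pk)
    return probs

probs = binomial_pmf(3, 0.5)
print(probs)       # [0.125, 0.375, 0.375, 0.125], matching Example 2.37
print(sum(probs))  # the probabilities sum to 1, as Eq. (2.39b) requires
```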
Example 2.39
Let k be the number of active (nonsilent) speakers in a group of eight noninteracting (i.e., independent) speakers. Suppose that a speaker is active with probability 1/3. Find the probability that the number of active speakers is greater than six.

For i = 1, …, 8, let Ai denote the event "ith speaker is active." The number of active speakers is then the number of successes in eight Bernoulli trials with p = 1/3. Thus the probability that more than six speakers are active is

    P[k = 7] + P[k = 8] = C(8, 7)(1/3)⁷(2/3) + C(8, 8)(1/3)⁸
                        = .00244 + .00015 = .00259.
Example 2.40 Error Correction Coding
A communication system transmits binary information over a channel that introduces random bit errors with probability ε = 10⁻³. The transmitter transmits each information bit three times, and a decoder takes a majority vote of the received bits to decide on what the transmitted bit was. Find the probability that the receiver will make an incorrect decision.

The receiver can correct a single error, but it will make the wrong decision if the channel introduces two or more errors. If we view each transmission as a Bernoulli trial in which a "success" corresponds to the introduction of an error, then the probability of two or more errors in three Bernoulli trials is

    P[k ≥ 2] = C(3, 2)(.001)²(.999) + C(3, 3)(.001)³ ≈ 3(10⁻⁶).
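The binomial tail in Example 2.40 is a one-line function. A minimal sketch (function name is ours) comparing the coded error rate against the uncoded channel:

```python
from math import comb

# Probability of a wrong majority-vote decision (Example 2.40):
# two or more bit errors in three repetitions.
def p_wrong_decision(eps):
    return comb(3, 2) * eps ** 2 * (1 - eps) + comb(3, 3) * eps ** 3

print(p_wrong_decision(1e-3))  # ≈ 3e-6, versus 1e-3 without the repetition code
```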
2.6.3  The Multinomial Probability Law
The binomial probability law can be generalized to the case where we note the occurrence of more than one event. Let B1, B2, …, BM be a partition of the sample space S of some random experiment and let P[Bj] = pj. The events are mutually exclusive, so

    p1 + p2 + … + pM = 1.

Suppose that n independent repetitions of the experiment are performed. Let kj be the number of times event Bj occurs; then the vector (k1, k2, …, kM) specifies the number of times each of the events Bj occurs. The probability of the vector (k1, …, kM) satisfies the multinomial probability law:

    P[(k1, k2, …, kM)] = [n! / (k1! k2! … kM!)] p1^k1 p2^k2 … pM^kM,          (2.41)

where k1 + k2 + … + kM = n. The binomial probability law is the M = 2 case of the multinomial probability law. The derivation of the multinomial probabilities is identical to that of the binomial probabilities. We only need to note that the number of different sequences with k1, k2, …, kM instances of the events B1, B2, …, BM is given by the multinomial coefficient in Eq. (2.26).
Example 2.41
A dart is thrown nine times at a target consisting of three areas. Each throw has a probability of
.2, .3, and .5 of landing in areas 1, 2, and 3, respectively. Find the probability that the dart lands
exactly three times in each of the areas.
This experiment consists of nine independent repetitions of a subexperiment that has three possible outcomes. The probability for the number of occurrences of each outcome is given by the multinomial probabilities with parameters n = 9 and p1 = .2, p2 = .3, and p3 = .5:

    P[(3, 3, 3)] = [9!/(3! 3! 3!)] (.2)³(.3)³(.5)³ = .04536.
Example 2.42
Suppose we pick 10 telephone numbers at random from a telephone book and note the last digit in each of the numbers. What is the probability that we obtain each of the integers from 0 to 9 only once?

The probabilities for the number of occurrences of the integers are given by the multinomial probabilities with parameters M = 10, n = 10, and pj = 1/10, if we assume that the 10 integers in the range 0 to 9 are equiprobable. The probability of obtaining each integer once in 10 draws is then

    [10!/(1! 1! … 1!)] (.1)¹⁰ ≈ 3.6(10⁻⁴).
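Equation (2.41) can be checked against both examples with a direct implementation. A minimal sketch (function name is ours):

```python
from math import factorial

# Multinomial probability, Eq. (2.41):
# P[(k1, ..., kM)] = n!/(k1! ... kM!) * p1^k1 ... pM^kM.
def multinomial_prob(counts, probs):
    n = sum(counts)
    coef = factorial(n)
    for k in counts:
        coef //= factorial(k)
    prod = 1.0
    for k, p in zip(counts, probs):
        prod *= p ** k
    return coef * prod

print(multinomial_prob([3, 3, 3], [0.2, 0.3, 0.5]))  # .04536 (Example 2.41)
print(multinomial_prob([1] * 10, [0.1] * 10))        # ≈ 3.63e-4 (Example 2.42)
```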
2.6.4  The Geometric Probability Law
Consider a sequential experiment in which we repeat independent Bernoulli trials until the occurrence of the first success. Let the outcome of this experiment be m, the number of trials carried out until the occurrence of the first success. The sample space for this experiment is the set of positive integers. The probability, p(m), that m trials are required is found by noting that this can only happen if the first m − 1 trials result in failures and the mth trial in success.⁶ The probability of this event is

    p(m) = P[A1ᶜ A2ᶜ … A(m−1)ᶜ Am] = (1 − p)^(m−1) p,   m = 1, 2, …,          (2.42a)

where Ai is the event "success in ith trial." The probability assignment specified by Eq. (2.42a) is called the geometric probability law.

The probabilities in Eq. (2.42a) sum to 1:

    Σ(m=1 to ∞) p(m) = p Σ(m=1 to ∞) q^(m−1) = p · 1/(1 − q) = 1,          (2.42b)

where q = 1 − p, and where we have used the formula for the summation of a geometric series. The probability that more than K trials are required before a success occurs has a simple form:

    P[{m > K}] = p Σ(m=K+1 to ∞) q^(m−1) = pq^K Σ(j=0 to ∞) q^j
               = pq^K · 1/(1 − q) = q^K.          (2.43)
Example 2.43 Error Control by Retransmission
Computer A sends a message to computer B over an unreliable radio link. The message is encoded so that B can detect when errors have been introduced into the message during transmission. If B detects an error, it requests A to retransmit it. If the probability of a message transmission error is q = .1, what is the probability that a message needs to be transmitted more than two times?

Each transmission of a message is a Bernoulli trial with probability of success p = 1 − q. The Bernoulli trials are repeated until the first success (error-free transmission). The probability that more than two transmissions are required is given by Eq. (2.43):

    P[m > 2] = q² = 10⁻².
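The tail formula q^K of Eq. (2.43) is easy to verify by simulating the retransmission loop of Example 2.43 (a quick numerical check, not from the text):

```python
import random

random.seed(4)

# Geometric law check for Example 2.43: with message error probability
# q = 0.1, P[more than 2 transmissions] = q^2 = 0.01 by Eq. (2.43).
q = 0.1
trials = 200_000
more_than_two = 0
for _ in range(trials):
    m = 1
    while random.random() < q:   # transmission in error: retransmit
        m += 1
    if m > 2:
        more_than_two += 1

print(more_than_two / trials)  # close to 0.01
```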
2.6.5  Sequences of Dependent Experiments

In this section we consider a sequence or "chain" of subexperiments in which the outcome of a given subexperiment determines which subexperiment is performed next. We first give a simple example of such an experiment and show how diagrams can be used to specify the sample space.
Example 2.44
A sequential experiment involves repeatedly drawing a ball from one of two urns, noting the number on the ball, and replacing the ball in its urn. Urn 0 contains a ball with the number 1 and two balls with the number 0, and urn 1 contains five balls with the number 1 and one ball with the number 0. The urn from which the first draw is made is selected at random by flipping a fair coin. Urn 0 is used if the outcome is heads and urn 1 if the outcome is tails. Thereafter the urn used in a subexperiment corresponds to the number on the ball selected in the previous subexperiment.

The sample space of this experiment consists of sequences of 0s and 1s. Each possible sequence corresponds to a path through the "trellis" diagram shown in Fig. 2.15(a). The nodes in the diagram denote the urn used in the nth subexperiment, and the labels on the branches denote the outcome of a subexperiment. Thus the path 0011 corresponds to the sequence: The coin toss was heads so the first draw was from urn 0; the outcome of the first draw was 0, so the second draw was from urn 0; the outcome of the second draw was 1, so the third draw was from urn 1; and the outcome of the third draw was 1, so the fourth draw was from urn 1.

⁶See Example 2.11 in Section 2.2 for a relative frequency interpretation of how the geometric probability law comes about.
Now suppose that we want to compute the probability of a particular sequence of outcomes, say s0, s1, s2. Denote this probability by P[{s0} ∩ {s1} ∩ {s2}]. Let A = {s2} and B = {s0} ∩ {s1}; then since P[A ∩ B] = P[A|B]P[B] we have

    P[{s0} ∩ {s1} ∩ {s2}] = P[{s2} | {s0} ∩ {s1}] P[{s0} ∩ {s1}]
                          = P[{s2} | {s0} ∩ {s1}] P[{s1} | {s0}] P[{s0}].          (2.44)

Now note that in the above urn example the probability P[{sn} | {s0} ∩ … ∩ {sn−1}] depends only on {sn−1} since the most recent outcome determines which subexperiment is performed:

    P[{sn} | {s0} ∩ … ∩ {sn−1}] = P[{sn} | {sn−1}].          (2.45)

FIGURE 2.15
Trellis diagram for a Markov chain: (a) each sequence of outcomes corresponds to a path through the trellis diagram; (b) the probability of a sequence of outcomes is the product of the probabilities along the associated path.
Therefore for the sequence of interest we have that

    P[{s0} ∩ {s1} ∩ {s2}] = P[{s2} | {s1}] P[{s1} | {s0}] P[{s0}].          (2.46)

Sequential experiments that satisfy Eq. (2.45) are called Markov chains. For these experiments, the probability of a sequence s0, s1, …, sn is given by

    P[s0, s1, …, sn] = P[sn | sn−1] P[sn−1 | sn−2] … P[s1 | s0] P[s0],          (2.47)

where we have simplified notation by omitting braces. Thus the probability of the sequence s0, …, sn is given by the product of the probability of the first outcome s0 and the probabilities of all subsequent transitions, s0 to s1, s1 to s2, and so on. Chapter 11 deals with Markov chains.
Example 2.45
Find the probability of the sequence 0011 for the urn experiment introduced in Example 2.44.
Recall that urn 0 contains two balls with label 0 and one ball with label 1, and that urn 1
contains five balls with label 1 and one ball with label 0. We can readily compute the probabilities
of sequences of outcomes by labeling the branches in the trellis diagram with the probability of
the corresponding transition as shown in Fig. 2.15(b). Thus the probability of the sequence 0011 is
given by
P[0011] = P[1 | 1]P[1 | 0]P[0 | 0]P[0],
where the transition probabilities are given by
P[1 | 0] = 1/3 and P[0 | 0] = 2/3
P[1 | 1] = 5/6 and P[0 | 1] = 1/6,
and the initial probabilities are given by
P[0] = 1/2 = P[1].
If we substitute these values into the expression for P[0011], we obtain
P[0011] = (5/6)(1/3)(2/3)(1/2) = 5/54.
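The computation in Example 2.45 can be checked mechanically with exact arithmetic. Below is a short Python sketch, offered as an illustration alongside the chapter's MATLAB/Octave examples; the dictionaries simply transcribe the branch labels of Fig. 2.15(b), and the function name seq_prob is our own.

```python
from fractions import Fraction as F

# Initial probabilities: a fair coin toss selects urn 0 or urn 1.
p_init = {0: F(1, 2), 1: F(1, 2)}

# Transition probabilities P[next | current], transcribed from Fig. 2.15(b).
p_trans = {(0, 0): F(2, 3), (0, 1): F(1, 3),   # draws from urn 0
           (1, 0): F(1, 6), (1, 1): F(5, 6)}   # draws from urn 1

def seq_prob(seq):
    """Probability of an outcome sequence via the product rule, Eq. (2.47)."""
    p = p_init[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        p *= p_trans[(prev, cur)]
    return p

print(seq_prob([0, 0, 1, 1]))   # 5/54, agreeing with Example 2.45
```

The exact result 5/54 ≈ 0.0926 matches the hand computation above.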
The two-urn experiment in Examples 2.44 and 2.45 is the simplest example of the
Markov chain models that are discussed in Chapter 11. The two-urn experiment discussed here is used to model situations in which there are only two outcomes, and in
which the outcomes tend to occur in bursts. For example, the two-urn model has been
used to model the “bursty” behavior of the voice packets generated by a single speaker where bursts of active packets are separated by relatively long periods of silence.
The model has also been used for the sequence of black and white dots that result from
scanning a black and white image line by line.
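The bursty behavior is easy to observe in a simulation. The following Python sketch of the two-urn experiment is an illustration under the rule of Example 2.44 (the label of each outcome selects the urn for the next draw); the function name two_urn_sequence is our own invention.

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

def two_urn_sequence(n):
    """Simulate n draws of the two-urn Markov model of Example 2.44."""
    seq = []
    urn = random.randint(0, 1)   # the fair coin toss selects the first urn
    for _ in range(n):
        if urn == 0:
            out = 1 if random.random() < 1/3 else 0   # urn 0: P[1 | 0] = 1/3
        else:
            out = 1 if random.random() < 5/6 else 0   # urn 1: P[1 | 1] = 5/6
        seq.append(out)
        urn = out   # the outcome label selects the urn for the next draw
    return seq

seq = two_urn_sequence(10000)
# Bursts: a 1 is followed by another 1 about 5/6 of the time.
pairs_11 = sum(1 for a, b in zip(seq, seq[1:]) if a == 1 and b == 1)
ones = sum(1 for a in seq[:-1] if a == 1)
print(pairs_11 / ones)
```

The printed ratio estimates P[1 | 1] = 5/6, which is what makes the runs of 1s (the "bursts") long on average.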
*2.7
A COMPUTER METHOD FOR SYNTHESIZING RANDOMNESS: RANDOM NUMBER GENERATORS
This section introduces the basic method for generating sequences of “random” numbers using a computer. Any computer simulation of a system that involves randomness
must include a method for generating sequences of random numbers. These random
numbers must satisfy long-term average properties of the processes they are simulating.
In this section we focus on the problem of generating random numbers that are “uniformly distributed” in the interval [0, 1]. In the next chapter we will show how these random numbers can be used to generate numbers with arbitrary probability laws.
The first problem we must confront in generating a random number in the interval [0, 1] is the fact that there are an uncountably infinite number of points in the interval, but the computer is limited to representing numbers with finite precision only.
We must therefore be content with generating equiprobable numbers from some finite
set, say {0, 1, …, M − 1} or {1, 2, …, M}. By dividing these numbers by M, we obtain
numbers in the unit interval. These numbers can be made increasingly dense in the unit
interval by making M very large.
The next step involves finding a mechanism for generating random numbers. The
direct approach involves performing random experiments. For example, we can generate integers in the range 0 to 2^m − 1 by flipping a fair coin m times and replacing the
sequence of heads and tails by 0s and 1s to obtain the binary representation of an integer. Another example would involve drawing a ball from an urn containing balls numbered 1 to M. Computer simulations involve the generation of long sequences of
random numbers. If we were to use the above mechanisms to generate random numbers, we would have to perform the experiments a large number of times and store the
outcomes in computer storage for access by the simulation program. It is clear that this
approach is cumbersome and quickly becomes impractical.
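The coin-flipping mechanism just described is straightforward to mimic in software. The Python sketch below simulates m fair coin flips to build an integer in {0, …, 2^m − 1} and then divides by 2^m, as in the text; the simulation itself, of course, already relies on a pseudo-random generator, which is the subject of the next subsection, and the helper name coin_flip_integer is our own.

```python
import random

random.seed(0)  # fixed seed for a reproducible run

def coin_flip_integer(m):
    """Build an m-bit integer from m simulated fair coin flips (heads = 1)."""
    x = 0
    for _ in range(m):
        x = (x << 1) | random.randint(0, 1)   # append one coin flip as a bit
    return x

m = 16
x = coin_flip_integer(m)   # integer in {0, ..., 2**m - 1}
u = x / 2**m               # corresponding point in [0, 1)
print(x, u)
```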
2.7.1
Pseudo-Random Number Generation
The preferred approach for the computer generation of random numbers involves the
use of recursive formulas that can be implemented easily and quickly. These pseudorandom number generators produce a sequence of numbers that appear to be random
but that in fact repeat after a very long period. The currently preferred pseudo-random
number generator is the so-called Mersenne Twister, which is based on a matrix linear
recurrence over a binary field. This algorithm can yield sequences with an extremely
long period of 2^19937 − 1. The Mersenne Twister generates 32-bit integers, so
M = 2^32 − 1 in terms of our previous discussion. We obtain a sequence of numbers in
the unit interval by dividing the 32-bit integers by 2^32. The sequence of such numbers
should be equidistributed over unit cubes of very high dimensionality. The
Mersenne Twister has been shown to meet this condition up to 623 dimensions. In
addition, the algorithm is fast and efficient in terms of storage.
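As an aside, Python's standard random module is also based on the Mersenne Twister, so the integer-to-unit-interval recipe above can be tried directly; random.getrandbits is a standard-library call that returns the requested number of random bits.

```python
import random

random.seed(12345)  # any fixed seed gives a reproducible sequence

# Draw 32-bit integers and map them to the unit interval, as in the text.
us = [random.getrandbits(32) / 2**32 for _ in range(100000)]

# Long-term averages should match those of a uniform law on [0, 1).
mean = sum(us) / len(us)
print(min(us), max(us), mean)   # the mean is close to 1/2
```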
Software implementations of the Mersenne Twister are widely available and incorporated into numerical packages such as MATLAB® and Octave.7 Both MATLAB and
Octave provide a means to generate random numbers from the unit interval using the
7
MATLAB® and Octave are interactive computer programs for numerical computations involving matrices.
MATLAB® is a commercial product sold by The Mathworks, Inc. Octave is a free, open-source program that is
mostly compatible with MATLAB in terms of computation. Long [9] provides an introduction to Octave.
rand command. The rand(n,m) operator returns an n row by m column matrix with
elements that are random numbers from the interval [0, 1). This operator is the starting
point for generating all types of random numbers.
Example 2.46 Generation of Numbers from the Unit Interval
First, generate 6 numbers from the unit interval. Next, generate 10,000 numbers from the unit interval. Plot the histogram and empirical distribution function for the sequence of 10,000 numbers.
The following command results in the generation of six numbers from the unit interval.
>rand(1,6)
ans =
Columns 1 through 6:
0.642667 0.147811 0.317465 0.512824 0.710823 0.406724
The following set of commands will generate 10000 numbers and produce the histogram
shown in Fig. 2.16.
>X=rand(10000,1);
% Return result in a 10,000-element column vector X.
>K=0.005:0.01:0.995;
% Produce vector K consisting of the midpoints
% for 100 bins of width 0.01 in the unit interval.
>hist(X,K)
% Produce the desired histogram in Fig. 2.16.
>plot(K,empirical_cdf(K,X))
% Plot the proportion of elements in the array X less
% than or equal to k, where k is an element of K.
The empirical cdf is shown in Fig. 2.17. It is evident that the array of random numbers is uniformly distributed in the unit interval.
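For readers without MATLAB or Octave, the same experiment can be sketched in plain Python; the binning and empirical-cdf code below is an illustrative stand-in for the hist and empirical_cdf calls, not those routines themselves.

```python
import random

random.seed(42)  # fixed seed for a reproducible run
N = 10000
xs = [random.random() for _ in range(N)]

# Histogram over 10 equal-width bins of [0, 1); each bin expects N/10 = 1000.
counts = [0] * 10
for x in xs:
    counts[min(int(10 * x), 9)] += 1
print(counts)

# Empirical cdf at the points k/10: the proportion of samples <= k/10,
# which for uniform numbers should be close to k/10 itself.
ecdf = [sum(1 for x in xs if x <= k / 10) / N for k in range(11)]
print(ecdf)
```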
FIGURE 2.16
Histogram resulting from experiment to generate 10,000 numbers in the unit interval.

FIGURE 2.17
Empirical cdf of experiment that generates 10,000 numbers.
2.7.2
Simulation of Random Experiments
MATLAB® and Octave provide functions that are very useful in carrying out numerical evaluation of probabilities involving the most common distributions. Functions
are also provided for the generation of random numbers with specific probability distributions. In this section we consider Bernoulli trials and binomial distributions. In
Chapter 3 we consider experiments with discrete sample spaces.
Example 2.47 Bernoulli Trials and Binomial Probabilities
First, generate the outcomes of eight Bernoulli trials. Next, generate the outcomes of 100 repetitions of a random experiment that counts the number of successes in 16 Bernoulli trials with
probability of success 1/2. Plot the histogram of the outcomes in the 100 experiments and compare
to the binomial probabilities with n = 16 and p = 1/2 .
The following command will generate the outcomes of eight Bernoulli trials, as shown by
the answer that follows.
>X=rand(1,8)<0.5;
% Generate 1 row of Bernoulli trials with p = 0.5
X =
0 1 1 0 0 0 1 1
If the number produced by rand for a given Bernoulli trial is less than p = 0.5, then the outcome
of the Bernoulli trial is 1.
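The same comparison rule carries over directly to Python; the bernoulli helper below is our own illustrative function, not a library routine.

```python
import random

random.seed(7)  # fixed seed for a reproducible run

def bernoulli(p, n):
    """n Bernoulli trials: outcome 1 when a uniform sample falls below p."""
    return [1 if random.random() < p else 0 for _ in range(n)]

print(bernoulli(0.5, 8))   # one row of eight Bernoulli trials

# The relative frequency of successes approaches p for long runs.
freq = sum(bernoulli(0.5, 100000)) / 100000
print(freq)
```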
Next we show the set of commands to generate the outcomes of 100 repetitions of random
experiments where each involves 16 Bernoulli trials.
>X=rand(100,16)<0.5;
% Generate 100 rows of 16 Bernoulli trials with
% p = 0.5.
>Y=sum(X,2);
% Add the results of each row to obtain the number of
% successes in each experiment. Y contains the 100
% outcomes.
>K=0:16;
>Z=empirical_pdf(K,Y);
% Find the relative frequencies of the outcomes in Y.
>bar(K,Z)
% Produce a bar graph of the relative frequencies.
>hold on
% Retains the graph for next command.
>stem(K,binomial_pdf(K,16,0.5))
% Plot the binomial probabilities along
% with the corresponding relative frequencies.
Figure 2.18 shows that there is good agreement between the relative frequencies and
the binomial probabilities.
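A Python sketch of the same experiment, using only the standard library, is given below; math.comb supplies the binomial coefficient, and the seed and variable names are our own choices.

```python
import random
from math import comb

random.seed(16)  # fixed seed for a reproducible run
n, p, reps = 16, 0.5, 100

# 100 repetitions of counting the successes in 16 Bernoulli trials.
counts = [sum(1 for _ in range(n) if random.random() < p) for _ in range(reps)]
rel_freq = [counts.count(k) / reps for k in range(n + 1)]

# Binomial probabilities p_k = C(n, k) p^k (1 - p)^(n - k) for comparison.
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

for k in range(n + 1):
    print(k, rel_freq[k], round(pmf[k], 4))
```

With only 100 repetitions the relative frequencies track the probabilities roughly; the agreement improves as the number of repetitions grows.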
*2.8
FINE POINTS: EVENT CLASSES8
If the sample space S is discrete, then the event class can consist of all subsets of S.
There are situations where we may wish or are compelled to let the event class F be a
smaller class of subsets of S. In these situations, only the subsets that belong to this
class are considered events. In this section we explain how these situations arise.
Let C be the class of events of interest in a random experiment. It is reasonable to
expect that any set operation on events in C will produce a set that is also an event in C.
We can then ask any question regarding events of the random experiment, express it
using set operations, and obtain an event that is in C. Mathematically, we require that C
be a field.
A collection of sets F is called a field if it satisfies the following conditions:
(i) ∅ ∈ F     (2.48a)
(ii) if A ∈ F and B ∈ F, then A ∪ B ∈ F     (2.48b)
(iii) if A ∈ F, then A^c ∈ F.     (2.48c)
Using DeMorgan's rule we can show that (ii) and (iii) imply that if A ∈ F and
B ∈ F, then A ∩ B ∈ F. Conditions (ii) and (iii) then imply that any finite union or intersection of events in F will result in an event that is also in F.
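For a small sample space, conditions (i)–(iii) can be checked mechanically. The Python sketch below does so for classes of subsets of S = {H, T}; the helper is_field is our own illustrative function.

```python
from itertools import chain, combinations

S = frozenset({'H', 'T'})

def is_field(F, S):
    """Check conditions (2.48a)-(2.48c): the class contains the empty set and
    is closed under pairwise unions and complements (relative to S)."""
    if frozenset() not in F:
        return False                 # (i) fails
    for A in F:
        if S - A not in F:
            return False             # (iii) fails: complement missing
        for B in F:
            if A | B not in F:
                return False         # (ii) fails: union missing
    return True

# The power set of S is a field; the class consisting of only the empty set
# and {H} is not, since the complement {T} is missing.
power_set = {frozenset(c) for c in
             chain.from_iterable(combinations(S, r) for r in range(len(S) + 1))}
print(is_field(power_set, S))                          # True
print(is_field({frozenset(), frozenset({'H'})}, S))    # False
```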
Example 2.48
Let S = {T, H}. Find the field generated by set operations on the class consisting of the elementary
events of S: C = {{H}, {T}}.
8
The “Fine Points” sections elaborate on concepts and distinctions that are not required in an introductory
course. The material in these sections is not necessarily more mathematical, but rather is not usually covered
in a first course in probability.
7. W. Feller, An Introduction to Probability Theory and Its Applications, 3d ed.,
Wiley, New York, 1968.
8. A. N. Kolmogorov and S. V. Fomin, Introductory Real Analysis, Dover Publications, New York, 1970.
9. P. J. G. Long, “Introduction to Octave,” University of Cambridge, September
2005, available online.
10. A. M. Law and W. D. Kelton, Simulation Modeling and Analysis, McGraw-Hill,
New York, 2000.
PROBLEMS
Section 2.1: Specifying Random Experiments
2.1.
The (loose) minute hand in a clock is spun hard and the hour at which the hand comes to
rest is noted.
(a) What is the sample space?
(b) Find the sets corresponding to the events: A = “hand is in first 4 hours”; B = “hand
is between 2nd and 8th hours inclusive”; and D = “hand is in an odd hour.”
(c) Find the events: A ∩ B ∩ D, A^c ∩ B, A ∪ (B ∩ D^c), (A ∪ B) ∩ D^c.
2.2.
A die is tossed twice and the number of dots facing up in each toss is counted and noted
in the order of occurrence.
(a) Find the sample space.
(b) Find the set A corresponding to the event “number of dots in first toss is not less than
number of dots in second toss.”
(c) Find the set B corresponding to the event “number of dots in first toss is 6.”
(d) Does A imply B or does B imply A?
(e) Find A ∩ B^c and describe this event in words.
(f) Let C correspond to the event "number of dots in dice differs by 2." Find A ∩ C.
2.3.
Two dice are tossed and the magnitude of the difference in the number of dots facing up
in the two dice is noted.
(a) Find the sample space.
(b) Find the set A corresponding to the event “magnitude of difference is 3.”
(c) Express each of the elementary events in this experiment as the union of elementary
events from Problem 2.2.
2.4. A binary communication system transmits a signal X that is either a +2 voltage signal
or a -2 voltage signal. A malicious channel reduces the magnitude of the received
signal by the number of heads it counts in two tosses of a coin. Let Y be the resulting
signal.
(a) Find the sample space.
(b) Find the set of outcomes corresponding to the event “transmitted signal was definitely +2.”
(c) Describe in words the event corresponding to the outcome Y = 0.
2.5.
A desk drawer contains six pens, four of which are dry.
(a) The pens are selected at random one by one until a good pen is found. The sequence
of test results is noted. What is the sample space?
(b) Suppose that only the number, and not the sequence, of pens tested in part a is noted.
Specify the sample space.
(c) Suppose that the pens are selected one by one and tested until both good pens have
been identified, and the sequence of test results is noted. What is the sample space?
(d) Specify the sample space in part c if only the number of pens tested is noted.
2.6. Three friends (Al, Bob, and Chris) put their names in a hat and each draws a name from
the hat. (Assume Al picks first, then Bob, then Chris.)
(a) Find the sample space.
(b) Find the sets A, B, and C that correspond to the events "Al draws his name," "Bob
draws his name," and "Chris draws his name."
(c) Find the set corresponding to the event "no one draws his own name."
(d) Find the set corresponding to the event "everyone draws his own name."
(e) Find the set corresponding to the event "one or more draws his own name."
2.7. Let M be the number of message transmissions in Experiment E6.
(a) What is the set A corresponding to the event "M is even"?
(b) What is the set B corresponding to the event "M is a multiple of 3"?
(c) What is the set C corresponding to the event "6 or fewer transmissions are required"?
(d) Find the sets A ∩ B, A − B, A ∩ B ∩ C and describe the corresponding events in
words.
2.8. A number U is selected at random from the unit interval. Let the events A and B be:
A = "U differs from 1/2 by more than 1/4" and B = "1 − U is less than 1/2." Find the
events A ∩ B, A^c ∩ B, A ∪ B.
2.9. The sample space of an experiment is the real line. Let the events A and B correspond to
the following subsets of the real line: A = (−∞, r] and B = (−∞, s], where r ≤ s. Find
an expression for the event C = (r, s] in terms of A and B. Show that B = A ∪ C and
A ∩ C = ∅.
2.10. Use Venn diagrams to verify the set identities given in Eqs. (2.2) and (2.3). You will need
to use different colors or different shadings to denote the various regions clearly.
2.11. Show that:
(a) If event A implies B, and B implies C, then A implies C.
(b) If event A implies B, then B^c implies A^c.
2.12. Show that if A ∪ B = A and A ∩ B = A, then A = B.
2.13. Let A and B be events. Find an expression for the event "exactly one of the events A and
B occurs." Draw a Venn diagram for this event.
2.14. Let A, B, and C be events. Find expressions for the following events:
(a) Exactly one of the three events occurs.
(b) Exactly two of the events occur.
(c) One or more of the events occur.
(d) Two or more of the events occur.
(e) None of the events occur.
2.15. Figure P2.1 shows three systems of three components, C1, C2, and C3. Figure P2.1(a) is a
"series" system in which the system is functioning only if all three components are functioning. Figure P2.1(b) is a "parallel" system in which the system is functioning as long as
at least one of the three components is functioning. Figure P2.1(c) is a "two-out-of-three"
system in which the system is functioning as long as at least two components are functioning. Let A_k be the event "component k is functioning." For each of the three system
configurations, express the event "system is functioning" in terms of the events A_k.
FIGURE P2.1
(a) Series system. (b) Parallel system. (c) Two-out-of-three system.
2.16. A system has two key subsystems. The system is "up" if both of its subsystems are functioning. Triple redundant systems are configured to provide high reliability. The overall
system is operational as long as one of the three systems is "up." Let A_jk correspond to the
event "unit k in system j is functioning," for j = 1, 2, 3 and k = 1, 2.
(a) Write an expression for the event “overall system is up.”
(b) Explain why the above problem is equivalent to the problem of having a connection
in the network of switches shown in Fig. P2.2.
FIGURE P2.2
Network of switches: three parallel branches, with branch j consisting of switches A_j1 and A_j2 in series.
2.17. In a specified 6-AM-to-6-AM 24-hour period, a student wakes up at time t1 and goes to
sleep at some later time t2 .
(a) Find the sample space and sketch it on the x-y plane if the outcome of this experiment consists of the pair (t1, t2).
(b) Specify the set A and sketch the region on the plane corresponding to the event “student is asleep at noon.”
(c) Specify the set B and sketch the region on the plane corresponding to the event “student sleeps through breakfast (7–9 AM).”
(d) Sketch the region corresponding to A ∩ B and describe the corresponding event in
words.
2.18. A road crosses a railroad track at the top of a steep hill. The train cannot stop for oncoming
cars, and cars cannot see the train until it is too late. Suppose a train begins crossing the road
at time t1 and the car begins crossing the track at time t2, where 0 < t1 < T and 0 < t2 < T.
(a) Find the sample space of this experiment.
(b) Suppose that it takes the train d1 seconds to cross the road and it takes the car d2 seconds to cross the track. Find the set that corresponds to a collision taking place.
(c) Find the set that corresponds to a collision being missed by 1 second or less.
2.19. A random experiment has sample space S = {−1, 0, +1}.
(a) Find all the subsets of S.
(b) The outcome of a random experiment consists of pairs of outcomes from S where the
elements of the pair cannot be equal. Find the sample space S′ of this experiment.
How many subsets does S′ have?
2.20. (a) A coin is tossed twice and the sequence of heads and tails is noted. Let S be the sample space of this experiment. Find all subsets of S.
(b) A coin is tossed twice and the number of heads is noted. Let S′ be the sample space
of this experiment. Find all subsets of S′.
(c) Consider parts a and b if the coin is tossed 10 times. How many subsets do S and
S′ have? How many bits are needed to assign a binary number to each possible
subset?
Section 2.2: The Axioms of Probability
2.21. A die is tossed and the number of dots facing up is noted.
(a) Find the probability of the elementary events under the assumption that all faces of
the die are equally likely to be facing up after a toss.
(b) Find the probability of the events: A = {more than 3 dots}; B = {odd number
of dots}.
(c) Find the probability of A ∪ B, A ∩ B, A^c.
2.22. In Problem 2.2, a die is tossed twice and the number of dots facing up in each toss is
counted and noted in the order of occurrence.
(a) Find the probabilities of the elementary events.
(b) Find the probabilities of events A, B, C, A ∩ B^c, and A ∩ C defined in Problem 2.2.
2.23. A random experiment has sample space S = {a, b, c, d}. Suppose that P[{c, d}] = 3/8,
P[{b, c}] = 6/8, and P[{d}] = 1/8. Use the axioms of probability to
find the probabilities of the elementary events.
2.24. Find the probabilities of the following events in terms of P[A], P[B], and P[A ∩ B]:
(a) A occurs and B does not occur; B occurs and A does not occur.
(b) Exactly one of A or B occurs.
(c) Neither A nor B occur.
2.25. Let the events A and B have P[A] = x, P[B] = y, and P[A ∪ B] = z. Use Venn diagrams to find P[A ∩ B], P[A^c ∩ B^c], P[A^c ∪ B^c], P[A ∩ B^c], P[A^c ∪ B].
2.26. Show that
P[A ∪ B ∪ C] = P[A] + P[B] + P[C] − P[A ∩ B] − P[A ∩ C] − P[B ∩ C]
+ P[A ∩ B ∩ C].
2.27. Use the argument from Problem 2.26 to prove Corollary 6 by induction.
2.28. An octal character consists of a group of three bits. Let A_i be the event "ith bit in a
character is a 1."
(a) Find the probabilities for the following events: A_1, A_1 ∩ A_3, A_1 ∩ A_2 ∩ A_3, and
A_1 ∪ A_2 ∪ A_3. Assume that the values of bits are determined by tosses of a fair coin.
(b) Repeat part a if the coin is biased.
2.29. Let M be the number of message transmissions in Problem 2.7. Find the probabilities of
the events A, B, C, C^c, A ∩ B, A − B, A ∩ B ∩ C. Assume the probability of successful
transmission is 1/2.
2.30. Use Corollary 7 to prove the following:
(a) P[A ∪ B ∪ C] ≤ P[A] + P[B] + P[C].
(b) P[A1 ∪ A2 ∪ … ∪ An] ≤ P[A1] + P[A2] + … + P[An].
(c) P[A1 ∩ A2 ∩ … ∩ An] ≥ 1 − (P[A1^c] + P[A2^c] + … + P[An^c]).
The second expression is called the union bound.
2.31. Let p be the probability that a single character appears incorrectly in this book. Use the
union bound for the probability of there being any errors in a page with n characters.
2.32. A die is tossed and the number of dots facing up is noted.
(a) Find the probability of the elementary events if faces with an even number of dots
are twice as likely to come up as faces with an odd number.
(b) Repeat parts b and c of Problem 2.21.
2.33. Consider Problem 2.1 where the minute hand in a clock is spun. Suppose that we now
note the minute at which the hand comes to rest.
(a) Suppose that the minute hand is very loose so the hand is equally likely to come to
rest anywhere in the clock. What are the probabilities of the elementary events?
(b) Now suppose that the minute hand is somewhat sticky and so the hand is 1/2 as likely to land in the second minute as in the first, 1/3 as likely to land in the third
minute as in the first, and so on. What are the probabilities of the elementary events?
(c) Now suppose that the minute hand is very sticky and so the hand is 1/2 as likely to
land in the second minute as in the first, 1/2 as likely to land in the third minute as
in the second, and so on. What are the probabilities of the elementary events?
(d) Compare the probabilities that the hand lands in the last minute in parts a, b, and c.
2.34. A number x is selected at random in the interval [−1, 2]. Let the events A = {x < 0},
B = {|x − 0.5| < 0.5}, and C = {x > 0.75}.
(a) Find the probabilities of A, B, A ∩ B, and A ∩ C.
(b) Find the probabilities of A ∪ B, A ∪ C, and A ∪ B ∪ C, first, by directly evaluating
the sets and then their probabilities, and second, by using the appropriate axioms or
corollaries.
2.35. A number x is selected at random in the interval [−1, 2]. Numbers from the subinterval
[0, 2] occur half as frequently as those from [−1, 0).
(a) Find the probability assignment for an interval completely within [−1, 0); completely within [0, 2]; and partly in each of the above intervals.
(b) Repeat Problem 2.34 with this probability assignment.
2.36. The lifetime of a device behaves according to the probability law P[(t, ∞)] = 1/t for t > 1.
Let A be the event "lifetime is greater than 4," and B the event "lifetime is greater than 8."
(a) Find the probability of A ∩ B and A ∪ B.
(b) Find the probability of the event “lifetime is greater than 6 but less than or equal to 12.”
2.37. Consider an experiment for which the sample space is the real line. A probability law assigns probabilities to subsets of the form (−∞, r].
(a) Show that we must have P[(−∞, r]] ≤ P[(−∞, s]] when r < s.
(b) Find an expression for P[(r, s]] in terms of P[(−∞, r]] and P[(−∞, s]].
(c) Find an expression for P[(s, ∞)].
2.38. Two numbers (x, y) are selected at random from the interval [0, 1].
(a) Find the probability that the pair of numbers are inside the unit circle.
(b) Find the probability that y > 2x.
*Section 2.3: Computing Probabilities Using Counting Methods
2.39. The combination to a lock is given by three numbers from the set {0, 1, …, 59}. Find the
number of combinations possible.
2.40. How many seven-digit telephone numbers are possible if the first number is not allowed
to be 0 or 1?
2.41. A pair of dice is tossed, a coin is flipped twice, and a card is selected at random from a
deck of 52 distinct cards. Find the number of possible outcomes.
2.42. A lock has two buttons: a “0” button and a “1” button. To open a door you need to push
the buttons according to a preset 8-bit sequence. How many sequences are there? Suppose you press an arbitrary 8-bit sequence; what is the probability that the door opens? If
the first try does not succeed in opening the door, you try another number; what is the
probability of success?
2.43. A Web site requires that users create a password with the following specifications:
• Length of 8 to 10 characters
• Includes at least one special character {!, @, #, $, %, ^, &, *, (, ), +, =, {, }, |, <, >,
\, ~, −, [, ], /, ?}
• No spaces
• May contain numbers (0–9), lower and upper case letters (a–z, A–Z)
• Is case-sensitive.
How many passwords are there? How long would it take to try all passwords if a password can be tested in 1 microsecond?
2.44. A multiple choice test has 10 questions with 3 choices each. How many ways are there to
answer the test? What is the probability that two papers have the same answers?
2.45. A student has five different t-shirts and three pairs of jeans (“brand new,” “broken in,”
and “perfect”).
(a) How many days can the student dress without repeating the combination of jeans
and t-shirt?
(b) How many days can the student dress without repeating the combination of jeans
and t-shirt and without wearing the same t-shirt on two consecutive days?
2.46. Ordering a “deluxe” pizza means you have four choices from 15 available toppings. How
many combinations are possible if toppings can be repeated? If they cannot be repeated?
Assume that the order in which the toppings are selected does not matter.
2.47. A lecture room has 60 seats. In how many ways can 45 students occupy the seats in the
room?
2.48. List all possible permutations of two distinct objects; three distinct objects; four distinct
objects. Verify that the number is n!.
2.49. A toddler pulls three volumes of an encyclopedia from a bookshelf and, after being scolded, places them back in random order. What is the probability that the books are in the
correct order?
2.50. Five balls are placed at random in five buckets. What is the probability that each bucket
has a ball?
2.51. List all possible combinations of two objects from two distinct objects; three distinct objects; four distinct objects. Verify that the number is given by the binomial coefficient.
2.52. A dinner party is attended by four men and four women. How many unique ways can the
eight people sit around the table? How many unique ways can the people sit around the
table with men and women alternating seats?
2.53. A hot dog vendor provides onions, relish, mustard, ketchup, Dijon ketchup, and hot peppers for your hot dog. How many variations of hot dogs are possible using one condiment? Two condiments? None, some, or all of the condiments?
2.54. A lot of 100 items contains k defective items. M items are chosen at random and tested.
(a) What is the probability that m are found defective? This is called the hypergeometric
distribution.
(b) A lot is accepted if 1 or fewer of the M items are defective. What is the probability
that the lot is accepted?
2.55. A park has N raccoons of which eight were previously captured and tagged. Suppose that
20 raccoons are captured. Find the probability that four of these are found to be tagged.
Denote this probability, which depends on N, by p(N). Find the value of N that maximizes
this probability. Hint: Compare the ratio p(N)/p(N − 1) to unity.
2.56. A lot of 50 items has 40 good items and 10 bad items.
(a) Suppose we test five samples from the lot, with replacement. Let X be the number of
defective items in the sample. Find P[X = k].
(b) Suppose we test five samples from the lot, without replacement. Let Y be the number
of defective items in the sample. Find P[Y = k].
2.57. How many distinct permutations are there of four red balls, two white balls, and three
black balls?
2.58. A hockey team has 6 forwards, 4 defensemen, and 2 goalies. At any time, 3 forwards, 2 defensemen, and 1 goalie can be on the ice. How many combinations of players can a coach
put on the ice?
2.59. Find the probability that in a class of 28 students exactly four were born in each of the
seven days of the week.
2.60. Show that
(n choose k) = (n choose n − k).
2.61. In this problem we derive the multinomial coefficient. Suppose we partition a set of n distinct objects into J subsets B1, B2, …, BJ of size k1, …, kJ, respectively, where ki ≥ 0
and k1 + k2 + … + kJ = n.
(a) Let Ni denote the number of possible outcomes when the ith subset is selected.
Show that
N1 = (n choose k1), N2 = (n − k1 choose k2), …, N_{J−1} = (n − k1 − … − k_{J−2} choose k_{J−1}).
(b) Show that the number of partitions is then:
N1 N2 … N_{J−1} = n! / (k1! k2! … kJ!).
Section 2.4: Conditional Probability
2.62. A die is tossed twice and the number of dots facing up is counted and noted in the order
of occurrence. Let A be the event “number of dots in first toss is not less than number of
dots in second toss," and let B be the event "number of dots in first toss is 6." Find P[A | B]
and P[B | A].
2.63. Use conditional probabilities and tree diagrams to find the probabilities for the elementary events in the random experiments defined in parts a to d of Problem 2.5.
2.64. In Problem 2.6 (name in hat), find P[B ∩ C | A] and P[C | A ∩ B].
2.65. In Problem 2.29 (message transmissions), find P[B | A] and P[A | B].
2.66. In Problem 2.8 (unit interval), find P[B | A] and P[A | B].
2.67. In Problem 2.36 (device lifetime), find P[B | A] and P[A | B].
2.68. In Problem 2.33, let A = {hand rests in last 10 minutes} and B = {hand rests in last
5 minutes}. Find P[B | A] for parts a, b, and c.
2.69. A number x is selected at random in the interval [−1, 2]. Let the events A = {x < 0},
B = {|x − 0.5| < 0.5}, and C = {x > 0.75}. Find P[A | B], P[B | C], P[A | C^c], P[B | C^c].
2.70. In Problem 2.36, let A be the event “lifetime is greater than t,” and B the event “lifetime
is greater than 2t." Find P[B | A]. Does the answer depend on t? Comment.
2.71. Find the probability that two or more students in a class of 20 students have the same
birthday. Hint: Use Corollary 1. How big should the class be so that the probability that
two or more students have the same birthday is 1/2?
2.72. A cryptographic hash takes a message as input and produces a fixed-length string as output, called the digital fingerprint. A brute force attack involves computing the hash for a
large number of messages until a pair of distinct messages with the same hash is found.
Find the number of attempts required so that the probability of obtaining a match is 1/2.
How many attempts are required to find a matching pair if the digital fingerprint is 64 bits
long? 128 bits long?
2.73. (a) Find P[A | B] if A ∩ B = ∅; if A ⊂ B; if A ⊃ B.
(b) Show that if P[A | B] > P[A], then P[B | A] > P[B].
2.74. Show that P[A | B] satisfies the axioms of probability.
(i) 0 ≤ P[A | B] ≤ 1
(ii) P[S | B] = 1
(iii) If A ∩ C = ∅, then P[A ∪ C | B] = P[A | B] + P[C | B].
2.75. Show that P[A ∩ B ∩ C] = P[A | B ∩ C]P[B | C]P[C].
2.76. In each lot of 100 items, two items are tested, and the lot is rejected if either of the tested
items is found defective.
(a) Find the probability that a lot with k defective items is accepted.
(b) Suppose that when the production process malfunctions, 50 out of 100 items are defective. In order to identify when the process is malfunctioning, how many items
should be tested so that the probability that one or more items are found defective is
at least 99%?
2.77. A nonsymmetric binary communications channel is shown in Fig. P2.3. Assume the input
is "0" with probability p and "1" with probability 1 - p.
(a) Find the probability that the output is 0.
(b) Find the probability that the input was 0 given that the output is 1. Find the
probability that the input is 1 given that the output is 1. Which input is more
probable?
FIGURE P2.3
[Channel diagram: input 0 goes to output 0 with probability 1 - ε1 and to output 1 with probability ε1; input 1 goes to output 1 with probability 1 - ε2 and to output 0 with probability ε2.]
2.78. The transmitter in Problem 2.4 is equally likely to send X = +2 as X = -2. The malicious channel counts the number of heads in two tosses of a fair coin to decide by how
much to reduce the magnitude of the input to produce the output Y.
(a) Use a tree diagram to find the set of possible input-output pairs.
(b) Find the probabilities of the input-output pairs.
(c) Find the probabilities of the output values.
(d) Find the probability that the input was X = +2 given that Y = k.
2.79. One of two coins is selected at random and tossed three times. The first coin comes up
heads with probability p1 and the second coin with probability p2 = 2/3 > p1 = 1/3.
(a) What is the probability that the number of heads is k?
(b) Find the probability that coin 1 was tossed given that k heads were observed, for
k = 0, 1, 2, 3.
(c) In part b, which coin is more probable when k heads have been observed?
(d) Generalize the solution in part b to the case where the selected coin is tossed m times.
In particular, find a threshold value T such that when k > T heads are observed, coin
1 is more probable, and when k < T are observed, coin 2 is more probable.
(e) Suppose that p2 = 1 (that is, coin 2 is two-headed) and 0 < p1 < 1. What is the
probability that we do not determine with certainty whether the coin is 1 or 2?
2.80. A computer manufacturer uses chips from three sources. Chips from sources A, B, and C
are defective with probabilities .005, .001, and .010, respectively. If a randomly selected
chip is found to be defective, find the probability that the manufacturer was A; that the
manufacturer was C. Assume that the proportions of chips from A, B, and C are 0.5, 0.1,
and 0.4, respectively.
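A quick numerical check of the kind of calculation Problem 2.80 asks for (this sketch is ours, not part of the text): the theorem on total probability gives P[defective], and Bayes' rule then gives the posterior source probabilities.

```python
# Posterior source probabilities for Problem 2.80 via Bayes' rule.
priors = {"A": 0.5, "B": 0.1, "C": 0.4}        # proportions of chips
p_def = {"A": 0.005, "B": 0.001, "C": 0.010}   # P[defective | source]

# Theorem on total probability: P[defective].
p_defective = sum(priors[s] * p_def[s] for s in priors)

# Bayes' rule: P[source | defective] = P[defective | source] P[source] / P[defective].
posterior = {s: priors[s] * p_def[s] / p_defective for s in priors}
print(posterior["A"], posterior["C"])
```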
2.81. A ternary communication system is shown in Fig. P2.4. Suppose that input symbols 0, 1,
and 2 each occur with probability 1/3.
(a) Find the probabilities of the output symbols.
(b) Suppose that a 1 is observed at the output. What is the probability that the input was
0? 1? 2?
90
Chapter 2
Basic Concepts of Probability Theory
FIGURE P2.4
[Channel diagram: each input symbol 0, 1, 2 is received correctly with probability 1 - ε and is converted to one other symbol with probability ε, as labeled in the figure.]
Section 2.5: Independence of Events
2.82. Let S = {1, 2, 3, 4} and A = {1, 2}, B = {1, 3}, C = {1, 4}. Assume the outcomes are
equiprobable. Are A, B, and C independent events?
2.83. Let U be selected at random from the unit interval. Let A = {0 < U < 1/2},
B = {1/4 < U < 3/4}, and C = {1/2 < U < 1}. Are any of these events independent?
2.84. Alice and Mary practice free throws at the basketball court after school. Alice makes free
throws with probability pa and Mary makes them with probability pm . Find the probability of the following outcomes when Alice and Mary each take one shot: Alice scores a
basket; Either Alice or Mary scores a basket; both score; both miss.
2.85. Show that if A and B are independent events, then the pairs A and B^c, A^c and B, and A^c
and B^c are also independent.
2.86. Show that events A and B are independent if P[A | B] = P[A | B^c].
2.87. Let A, B, and C be events with probabilities P[A], P[B], and P[C].
(a) Find P[A ∪ B] if A and B are independent.
(b) Find P[A ∪ B] if A and B are mutually exclusive.
(c) Find P[A ∪ B ∪ C] if A, B, and C are independent.
(d) Find P[A ∪ B ∪ C] if A, B, and C are pairwise mutually exclusive.
2.88. An experiment consists of picking one of two urns at random and then selecting a ball
from the urn and noting its color (black or white). Let A be the event “urn 1 is selected”
and B the event “a black ball is observed.” Under what conditions are A and B independent?
2.89. Find the probabilities in Problem 2.14 assuming that events A, B, and C are independent.
2.90. Find the probabilities that the three types of systems are “up” in Problem 2.15. Assume that all units in the system fail independently and that a type k unit fails with
probability pk .
2.91. Find the probabilities that the system is “up” in Problem 2.16. Assume that all units in the
system fail independently and that a type k unit fails with probability pk .
2.92. A random experiment is repeated a large number of times and the occurrence of events
A and B is noted. How would you test whether events A and B are independent?
2.93. Consider a very long sequence of hexadecimal characters. How would you test whether
the relative frequencies of the four bits in the hex characters are consistent with independent tosses of a fair coin?
2.94. Compute the probability of the system in Example 2.35 being “up” when a second controller is added to the system.
2.95. In the binary communication system in Example 2.26, find the value of ε for which the
input of the channel is independent of the output of the channel. Can such a channel be
used to transmit information?
2.96. In the ternary communication system in Problem 2.81, is there a choice of ε for which the
input of the channel is independent of the output of the channel?
Section 2.6: Sequential Experiments
2.97. A block of 100 bits is transmitted over a binary communication channel with probability
of bit error p = 10^-2.
(a) If the block has 1 or fewer errors then the receiver accepts the block. Find the probability that the block is accepted.
(b) If the block has more than 1 error, then the block is retransmitted. Find the probability that M retransmissions are required.
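Problem 2.97 can be checked numerically with the binomial probabilities of Section 2.6. A Python sketch (our own, with p = 10^-2 as given):

```python
from math import comb

def binomial_pmf(k, n, p):
    """P[X = k] for X binomial with parameters n and p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Problem 2.97(a): a 100-bit block is accepted if it has 0 or 1 errors.
n, p = 100, 1e-2
p_accept = binomial_pmf(0, n, p) + binomial_pmf(1, n, p)
print(p_accept)

# Problem 2.97(b): blocks are independent trials with "success" =
# block accepted, so the number of retransmissions is geometric:
# P[M retransmissions] = (1 - p_accept) ** M * p_accept.
```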
2.98. A fraction p of items from a certain production line is defective.
(a) What is the probability that there is more than one defective item in a batch of n
items?
(b) During normal production p = 10^-3 but when production malfunctions p = 10^-1.
Find the size of a batch that should be tested so that if any items are found defective
we are 99% sure that there is a production malfunction.
2.99. A student needs eight chips of a certain type to build a circuit. It is known that 5% of
these chips are defective. How many chips should he buy for there to be a greater than
90% probability of having enough chips for the circuit?
2.100. Each of n terminals broadcasts a message in a given time slot with probability p.
(a) Find the probability that exactly one terminal transmits so the message is received by
all terminals without collision.
(b) Find the value of p that maximizes the probability of successful transmission in part a.
(c) Find the asymptotic value of the probability of successful transmission as n becomes
large.
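Parts (b) and (c) of Problem 2.100 can be explored numerically. In this illustrative sketch (function name ours), the grid search should land near p = 1/n, and for large n the success probability approaches 1/e:

```python
import math

def p_success(n, p):
    """P[exactly one of n terminals transmits] = n * p * (1 - p)**(n - 1)."""
    return n * p * (1 - p) ** (n - 1)

n = 10
# Part (b): setting d/dp [n p (1 - p)^(n-1)] = 0 gives p = 1/n; a grid
# search over p confirms this numerically.
best_p = max((p_success(n, k / 1000.0), k / 1000.0) for k in range(1, 1000))[1]
print(best_p)                      # numerically close to 1/n = 0.1
# Part (c): with p = 1/n, the success probability (1 - 1/n)^(n-1) -> 1/e.
print(p_success(n, 1 / n), math.exp(-1))
```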
2.101. A system contains eight chips. The lifetime of each chip has a Weibull probability law
with parameters λ and k = 2: P[(t, ∞)] = e^-(λt)^k for t ≥ 0. Find the probability that at
least two chips are functioning after 2/λ seconds.
2.102. A machine makes errors in a certain operation with probability p. There are two types of
errors. The fraction of errors that are type 1 is α, and of type 2 is 1 - α.
(a) What is the probability of k errors in n operations?
(b) What is the probability of k1 type 1 errors in n operations?
(c) What is the probability of k2 type 2 errors in n operations?
(d) What is the joint probability of k1 and k2 type 1 and 2 errors, respectively, in n operations?
2.103. Three types of packets arrive at a router port. Ten percent of the packets are “expedited
forwarding (EF),” 30 percent are “assured forwarding (AF),” and 60 percent are “best effort (BE).”
(a) Find the probability that k of N packets are not expedited forwarding.
(b) Suppose that packets arrive one at a time. Find the probability that k packets are
received before an expedited forwarding packet arrives.
(c) Find the probability that out of 20 packets, 4 are EF packets, 6 are AF packets, and 10
are BE.
2.104. A run-length coder segments a binary information sequence into strings that consist of
either a "run" of k "zeros" punctuated by a "one", for k = 0, ..., m - 1, or a string of m
"zeros." The m = 3 case is:
String    Run-length k
1         0
01        1
001       2
000       3

Suppose that the information is produced by a sequence of Bernoulli trials with
P["one"] = P[success] = p.
(a) Find the probability of run-length k in the m = 3 case.
(b) Find the probability of run-length k for general m.
2.105. The amount of time cars are parked in a parking lot follows a geometric probability law
with p = 1/2. The charge for parking in the lot is $1 for each half-hour or less.
(a) Find the probability that a car pays k dollars.
(b) Suppose that there is a maximum charge of $6. Find the probability that a car pays k
dollars.
2.106. A biased coin is tossed repeatedly until heads has come up three times. Find the probability that k tosses are required. Hint: Show that {"k tosses are required"} = A ∩ B,
where A = {"kth toss is heads"} and B = {"2 heads occurs in k - 1 tosses"}.
2.107. An urn initially contains two black balls and two white balls. The following experiment is
repeated indefinitely: A ball is drawn from the urn; if the color of the ball is the same as
the majority of balls remaining in the urn, then the ball is put back in the urn. Otherwise
the ball is left out.
(a) Draw the trellis diagram for this experiment and label the branches by the transition
probabilities.
(b) Find the probabilities for all sequences of outcomes of length 2 and length 3.
(c) Find the probability that the urn contains no black balls after three draws; no white
balls after three draws.
(d) Find the probability that the urn contains two black balls after n trials; two white
balls after n trials.
2.108. In Example 2.45, let p0(n) and p1(n) be the probabilities that urn 0 or urn 1 is used in the
nth subexperiment.
(a) Find p0(1) and p1(1).
(b) Express p0(n + 1) and p1(n + 1) in terms of p0(n) and p1(n).
(c) Evaluate p0(n) and p1(n) for n = 2, 3, 4.
(d) Find the solution to the recursion in part b with the initial conditions given in part a.
(e) What are the urn probabilities as n approaches infinity?
*Section 2.7: Synthesizing Randomness: Random Number Generators
2.109. An urn experiment is to be used to simulate a random experiment with sample
space S = {1, 2, 3, 4, 5} and probabilities p1 = 1/3, p2 = 1/5, p3 = 1/4, p4 = 1/7, and
p5 = 1 - (p1 + p2 + p3 + p4). How many balls should the urn contain? Generalize
the result to show that an urn experiment can be used to simulate any random experiment with finite sample space and with probabilities given by rational numbers.
2.110. Suppose we are interested in using tosses of a fair coin to simulate a random experiment
in which there are six equally likely outcomes, where S = {0, 1, 2, 3, 4, 5}. The following
version of the "rejection method" is proposed:
1. Toss a fair coin three times and obtain a binary number by identifying heads with
zero and tails with one.
2. If the outcome of the coin tosses in step 1 is the binary representation for a number in S, output the number. Otherwise, return to step 1.
(a) Find the probability that a number is produced in step 2.
(b) Show that the numbers that are produced in step 2 are equiprobable.
(c) Generalize the above algorithm to show how coin tossing can be used to simulate
any random urn experiment.
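The rejection method of Problem 2.110 is easy to simulate. The text's computer exercises use Octave; the following equivalent Python sketch (names ours) accepts a 3-bit number only when it lies in S:

```python
import random

def fair_die_0_to_5(toss=lambda: random.randint(0, 1)):
    """Rejection method of Problem 2.110: encode three fair-coin tosses
    as a 3-bit number; accept it only if it lies in S = {0, ..., 5}."""
    while True:
        value = 4 * toss() + 2 * toss() + toss()
        if value <= 5:          # 110 and 111 are rejected, so each
            return value        # accepted value has probability 1/6

random.seed(1)
counts = [0] * 6
for _ in range(6000):
    counts[fair_die_0_to_5()] += 1
print(counts)   # each count should be near 1000
```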
2.111. Use the rand function in Octave to generate 1000 pairs of numbers in the unit square.
Plot an x-y scattergram to confirm that the resulting points are uniformly distributed in
the unit square.
2.112. Apply the rejection method introduced above to generate points that are uniformly distributed in the x > y portion of the unit square. Use the rand function to generate a pair
of numbers in the unit square. If x > y, accept the number. If not, select another pair.
Plot an x-y scattergram for the pairs of accepted numbers and confirm that the resulting
points are uniformly distributed in the x > y region of the unit square.
2.113. The sample mean-squared value of the numerical outcomes X(1), X(2), ..., X(n) of a series of n repetitions of an experiment is defined by

⟨X²⟩_n = (1/n) Σ_{j=1}^{n} X²(j).

(a) What would you expect this expression to converge to as the number of repetitions n
becomes very large?
(b) Find a recursion formula for ⟨X²⟩_n similar to the one found in Problem 1.9.
2.114. The sample variance is defined as the mean-squared value of the variation of the samples
about the sample mean

⟨V²⟩_n = (1/n) Σ_{j=1}^{n} (X(j) - ⟨X⟩_n)².

Note that ⟨X⟩_n also depends on the sample values. (It is customary to replace the n in
the denominator with n - 1 for technical reasons that will be discussed in Chapter 8. For
now we will use the above definition.)
(a) Show that the sample variance satisfies the following expression:

⟨V²⟩_n = ⟨X²⟩_n - ⟨X⟩_n².

(b) Show that the sample variance satisfies the following recursion formula:

⟨V²⟩_n = (1 - 1/n)⟨V²⟩_{n-1} + (1/n)(1 - 1/n)(X(n) - ⟨X⟩_{n-1})²,

with ⟨V²⟩_0 = 0.
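The recursion in part (b) can be checked against the direct definition. A Python sketch (ours, not the text's):

```python
def sample_stats(xs):
    """Direct sample mean and sample variance <V^2>_n with n in the
    denominator, as defined in Problem 2.114."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return mean, var

def recursive_variance(xs):
    """One-pass recursion of part (b): update
    <V^2>_n = (1 - 1/n)<V^2>_{n-1} + (1/n)(1 - 1/n)(X(n) - <X>_{n-1})^2
    before updating the running mean <X>_n."""
    mean, var = 0.0, 0.0
    for n, x in enumerate(xs, start=1):
        var = (1 - 1 / n) * var + (1 / n) * (1 - 1 / n) * (x - mean) ** 2
        mean = mean + (x - mean) / n    # mean recursion from Problem 1.9
    return mean, var

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(sample_stats(data))        # (5.0, 4.0)
print(recursive_variance(data))  # matches the direct computation
```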
2.115. Suppose you have a program to generate a sequence of numbers Un that is uniformly distributed in [0, 1]. Let Yn = aUn + b.
(a) Find a and b so that Yn is uniformly distributed in the interval [a, b].
(b) Let a = -5 and b = 15. Use Octave to generate Yn and to compute the sample mean
and sample variance in 1000 repetitions. Compare the sample mean and sample variance to (a + b)/2 and (b - a)²/12, respectively.
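For Problem 2.115, the text suggests Octave; an equivalent Python sketch is given below. We take Y = (b - a)U + a with a = -5 and b = 15, which is uniform on [a, b]; the sample statistics should be close to (a + b)/2 = 5 and (b - a)²/12 ≈ 33.3.

```python
import random
import statistics

# A Python stand-in for the Octave run suggested in Problem 2.115(b).
a, b = -5.0, 15.0
random.seed(0)
ys = [(b - a) * random.random() + a for _ in range(1000)]

sample_mean = statistics.fmean(ys)
sample_var = statistics.pvariance(ys)   # n in the denominator
print(sample_mean, (a + b) / 2)         # should be close to 5
print(sample_var, (b - a) ** 2 / 12)    # should be close to 33.33
```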
2.116. Use Octave to simulate 100 repetitions of the random experiment where a coin is tossed
16 times and the number of heads is counted.
(a) Confirm that your results are similar to those in Figure 2.18.
(b) Rerun the experiment with p = 0.25 and p = 0.75. Are the results as expected?
*Section 2.8: Fine Points: Event Classes
2.117. In Example 2.49, Homer maps the outcomes from Lisa's sample space S_L = {r, g, t} into
a smaller sample space S_H = {R, G}: f(r) = R, f(g) = G, and f(t) = G.
Define the inverse image events as follows:
f^-1({R}) = A_1 = {r} and f^-1({G}) = A_2 = {g, t}.
Let A and B be events in Homer's sample space.
(a) Show that f^-1(A ∪ B) = f^-1(A) ∪ f^-1(B).
(b) Show that f^-1(A ∩ B) = f^-1(A) ∩ f^-1(B).
(c) Show that f^-1(A^c) = f^-1(A)^c.
(d) Show that the results in parts a, b, and c hold for a general mapping f from a sample
space S to a set S′.
2.118. Let f be a mapping from a sample space S to a finite set S′ = {y_1, y_2, ..., y_n}.
(a) Show that the set of inverse images A_k = f^-1({y_k}) forms a partition of S.
(b) Show that any event B of S′ can be related to a union of A_k's.
2.119. Let A be any subset of S. Show that the class of sets {∅, A, A^c, S} is a field.
*Section 2.9: Fine Points: Probabilities of Sequences of Events
2.120. Find the countable union of the following sequences of events:
(a) A_n = [a + 1/n, b - 1/n].
(b) B_n = (-n, b - 1/n].
(c) C_n = [a + 1/n, b).
2.121. Find the countable intersection of the following sequences of events:
(a) A_n = (a - 1/n, b + 1/n).
(b) B_n = [a, b + 1/n).
(c) C_n = (a - 1/n, b].
2.122. (a) Show that the Borel field can be generated from the complements and countable
intersections and unions of open sets (a, b).
(b) Suggest other classes of sets that can generate the Borel field.
2.123. Find expressions for the probabilities of the events in Problem 2.120.
2.124. Find expressions for the probabilities of the events in Problem 2.121.
Problems Requiring Cumulative Knowledge
2.125. Compare the binomial probability law and the hypergeometric law introduced in Problem 2.54 as follows.
(a) Suppose a lot has 20 items of which five are defective. A batch of ten items is tested
without replacement. Find the probability that k are found defective for k = 0, ..., 10.
Compare this to the binomial probabilities with n = 10 and p = 5/20 = .25.
(b) Repeat but with a lot of 1000 items of which 250 are defective. A batch of ten items is
tested without replacement. Find the probability that k are found defective for
k = 0, ..., 10. Compare this to the binomial probabilities with n = 10 and p = 250/1000
= .25.
2.126. Suppose that in Example 2.43, computer A sends each message to computer B simultaneously over two unreliable radio links. Computer B can detect when errors have occurred in either link. Let the probability of message transmission error in link 1 and link
2 be q1 and q2 respectively. Computer B requests retransmissions until it receives an
error-free message on either link.
(a) Find the probability that more than k transmissions are required.
(b) Find the probability that in the last transmission, the message on link 2 is received
free of errors.
2.127. In order for a circuit board to work, seven identical chips must be in working order. To
improve reliability, an additional chip is included in the board, and the design allows it to
replace any of the seven other chips when they fail.
(a) Find the probability pb that the board is working in terms of the probability p that an
individual chip is working.
(b) Suppose that n circuit boards are operated in parallel, and that we require a 99.9%
probability that at least one board is working. How many boards are needed?
2.128. Consider a well-shuffled deck of cards consisting of 52 distinct cards, of which four are
aces and four are kings.
(a) Find the probability of obtaining an ace in the first draw.
(b) Draw a card from the deck and look at it. What is the probability of obtaining an
ace in the second draw? Does the answer change if you had not observed the first
draw?
(c) Suppose we draw seven cards from the deck. What is the probability that the seven
cards include three aces? What is the probability that the seven cards include two
kings? What is the probability that the seven cards include three aces and/or two
kings?
(d) Suppose that the entire deck of cards is distributed equally among four players. What
is the probability that each player gets an ace?
CHAPTER 3
Discrete Random Variables
In most random experiments we are interested in a numerical attribute of the outcome
of the experiment. A random variable is defined as a function that assigns a numerical
value to the outcome of the experiment. In this chapter we introduce the concept of a
random variable and methods for calculating probabilities of events involving a random variable. We focus on the simplest case, that of discrete random variables, and introduce the probability mass function. We define the expected value of a random
variable and relate it to our intuitive notion of an average. We also introduce the conditional probability mass function for the case where we are given partial information
about the random variable. These concepts and their extension in Chapter 4 provide us
with the tools to evaluate the probabilities and averages of interest in the design of systems involving randomness.
Throughout the chapter we introduce important random variables and discuss
typical applications where they arise. We also present methods for generating random
variables. These methods are used in computer simulation models that predict the behavior and performance of complex modern systems.
3.1
THE NOTION OF A RANDOM VARIABLE
The outcome of a random experiment need not be a number. However, we are usually
interested not in the outcome itself, but rather in some measurement or numerical attribute of the outcome. For example, in n tosses of a coin, we may be interested in the
total number of heads and not in the specific order in which heads and tails occur. In a
randomly selected Web document, we may be interested only in the length of the document. In each of these examples, a measurement assigns a numerical value to the outcome of the random experiment. Since the outcomes are random, the results of the
measurements will also be random. Hence it makes sense to talk about the probabilities of the resulting numerical values. The concept of a random variable formalizes this
notion.
A random variable X is a function that assigns a real number, X(ζ), to each outcome ζ in the sample space of a random experiment. Recall that a function is simply a
rule for assigning a numerical value to each element of a set, as shown pictorially in
FIGURE 3.1
A random variable assigns a number X(ζ) to each outcome ζ in the
sample space S of a random experiment.
Fig. 3.1. The specification of a measurement on the outcome of a random experiment
defines a function on the sample space, and hence a random variable. The sample space
S is the domain of the random variable, and the set S_X of all values taken on by X is the
range of the random variable. Thus S_X is a subset of the set of all real numbers. We will
use the following notation: capital letters denote random variables, e.g., X or Y, and
lower case letters denote possible values of the random variables, e.g., x or y.
Example 3.1 Coin Tosses
A coin is tossed three times and the sequence of heads and tails is noted. The sample space for this
experiment is S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}. Let X be the number of
heads in the three tosses. X assigns each outcome ζ in S a number from the set S_X = {0, 1, 2, 3}.
The table below lists the eight outcomes of S and the corresponding values of X.

ζ:      HHH  HHT  HTH  THH  HTT  THT  TTH  TTT
X(ζ):    3    2    2    2    1    1    1    0

X is then a random variable taking on values in the set S_X = {0, 1, 2, 3}.
Example 3.2 A Betting Game
A player pays $1.50 to play the following game: A coin is tossed three times and the number of
heads X is counted. The player receives $1 if X = 2 and $8 if X = 3, but nothing otherwise. Let
Y be the reward to the player. Y is a function of the random variable X and its outcomes can be
related back to the sample space of the underlying random experiment as follows:

ζ:      HHH  HHT  HTH  THH  HTT  THT  TTH  TTT
X(ζ):    3    2    2    2    1    1    1    0
Y(ζ):    8    1    1    1    0    0    0    0

Y is then a random variable taking on values in the set S_Y = {0, 1, 8}.
The above example shows that a function of a random variable produces another
random variable.
For random variables, the function or rule that assigns values to each outcome is
fixed and deterministic, as, for example, in the rule "count the total number of dots facing up in the toss of two dice." The randomness in the experiment is complete as soon
as the toss is done. The process of counting the dots facing up is deterministic. Therefore the distribution of the values of a random variable X is determined by the probabilities of the outcomes ζ in the random experiment. In other words, the randomness in
the observed values of X is induced by the underlying random experiment, and we
should therefore be able to compute the probabilities of the observed values of X in
terms of the probabilities of the underlying outcomes.
Example 3.3 Coin Tosses and Betting
Let X be the number of heads in three independent tosses of a fair coin. Find the probability of
the event {X = 2}. Find the probability that the player in Example 3.2 wins $8.
Note that X(ζ) = 2 if and only if ζ is in {HHT, HTH, THH}. Therefore

P[X = 2] = P[{HHT, HTH, THH}]
= P[{HHT}] + P[{HTH}] + P[{THH}]
= 3/8.

The event {Y = 8} occurs if and only if the outcome ζ is HHH; therefore

P[Y = 8] = P[{HHH}] = 1/8.
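The equivalent-event calculation in Example 3.3 can be reproduced by enumerating the sample space. A short Python sketch (ours):

```python
from itertools import product

# Example 3.3 by brute-force enumeration: the probability of an event
# involving X is the probability of the equivalent event in S.
S = ["".join(t) for t in product("HT", repeat=3)]   # 8 equiprobable outcomes

def X(outcome):
    return outcome.count("H")      # number of heads

def Y(outcome):                    # reward in Example 3.2
    return {2: 1, 3: 8}.get(X(outcome), 0)

p_X2 = sum(1 for z in S if X(z) == 2) / len(S)
p_Y8 = sum(1 for z in S if Y(z) == 8) / len(S)
print(p_X2, p_Y8)    # 0.375 0.125
```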
Example 3.3 illustrates a general technique for finding the probabilities of events
involving the random variable X. Let the underlying random experiment have sample
space S and event class F. To find the probability of a subset B of R, e.g., B = {x_k}, we
need to find the outcomes in S that are mapped to B, that is,

A = {ζ : X(ζ) ∈ B}     (3.1)

as shown in Fig. 3.2. If event A occurs then X(ζ) ∈ B, so event B occurs. Conversely, if
event B occurs, then the value X(ζ) implies that ζ is in A, so event A occurs. Thus the
probability that X is in B is given by:

P[X ∈ B] = P[A] = P[{ζ : X(ζ) ∈ B}].     (3.2)
FIGURE 3.2
P[X ∈ B] = P[ζ ∈ A].
We refer to A and B as equivalent events.
In some random experiments the outcome ζ is already the numerical value we
are interested in. In such cases we simply let X(ζ) = ζ, that is, the identity function, to
obtain a random variable.
* 3.1.1 Fine Point: Formal Definition of a Random Variable
In going from Eq. (3.1) to Eq. (3.2) we actually need to check that the event A is in F,
because only events in F have probabilities assigned to them. The formal definition of
a random variable in Chapter 4 will explicitly state this requirement.
If the event class F consists of all subsets of S, then the set A will always be in F,
and any function from S to R will be a random variable. However, if the event class F
does not consist of all subsets of S, then some functions from S to R may not be random
variables, as illustrated by the following example.
Example 3.4 A Function That Is Not a Random Variable
This example shows why the definition of a random variable requires that we check that the set
A is in F. An urn contains three balls. One ball is electronically coded with a label 00. Another
ball is coded with 01, and the third ball has a 10 label. The sample space for this experiment is
S = {00, 01, 10}. Let the event class F consist of all unions, intersections, and complements of
the events A_1 = {00, 10} and A_2 = {01}. In this event class, the outcome 00 cannot be distinguished from the outcome 10. For example, this could result from a faulty label reader that cannot distinguish between 00 and 10. The event class has four events: F = {∅, {00, 10}, {01},
{00, 01, 10}}. Let the probability assignment for the events in F be P[{00, 10}] = 2/3 and
P[{01}] = 1/3.
Consider the following function X from S to R: X(00) = 0, X(01) = 1, X(10) = 2. To
find the probability of {X = 0}, we need the probability of {ζ : X(ζ) = 0} = {00}. However,
{00} is not in the class F, and so X is not a random variable because we cannot determine the
probability that X = 0.
3.2
DISCRETE RANDOM VARIABLES AND PROBABILITY MASS FUNCTION
A discrete random variable X is defined as a random variable that assumes values from
a countable set, that is, S_X = {x_1, x_2, x_3, ...}. A discrete random variable is said to be
finite if its range is finite, that is, S_X = {x_1, x_2, ..., x_n}. We are interested in finding the
probabilities of events involving a discrete random variable X. Since the sample space S_X
is discrete, we only need to obtain the probabilities for the events A_k = {ζ : X(ζ) = x_k}
in the underlying random experiment. The probabilities of all events involving X can be
found from the probabilities of the A_k's.
The probability mass function (pmf) of a discrete random variable X is defined as:

p_X(x) = P[X = x] = P[{ζ : X(ζ) = x}]  for x a real number.     (3.3)

Note that p_X(x) is a function of x over the real line, and that p_X(x) can be nonzero
only at the values x_1, x_2, x_3, .... For x_k in S_X, we have p_X(x_k) = P[A_k].
FIGURE 3.3
Partition of sample space S associated with a discrete random variable.
The events A_1, A_2, ... form a partition of S as illustrated in Fig. 3.3. To see this,
we first show that the events are disjoint. Let j ≠ k; then

A_j ∩ A_k = {ζ : X(ζ) = x_j and X(ζ) = x_k} = ∅

since each ζ is mapped into one and only one value in S_X. Next we show that S is the
union of the A_k's. Every ζ in S is mapped into some x_k so that every ζ belongs to an
event A_k in the partition. Therefore:

S = A_1 ∪ A_2 ∪ ⋯.

All events involving the random variable X can be expressed as the union of
events A_k's. For example, suppose we are interested in the event X in B = {x_2, x_5};
then

P[X in B] = P[{ζ : X(ζ) = x_2} ∪ {ζ : X(ζ) = x_5}]
= P[A_2 ∪ A_5] = P[A_2] + P[A_5]
= p_X(x_2) + p_X(x_5).
The pmf p_X(x) satisfies three properties that provide all the information required to calculate probabilities for events involving the discrete random variable X:

(i) p_X(x) ≥ 0 for all x     (3.4a)
(ii) Σ_{x ∈ S_X} p_X(x) = Σ_{all k} p_X(x_k) = Σ_{all k} P[A_k] = 1     (3.4b)
(iii) P[X in B] = Σ_{x ∈ B} p_X(x), where B ⊂ S_X.     (3.4c)

Property (i) is true because the pmf values are defined as a probability, p_X(x) =
P[X = x]. Property (ii) follows because the events A_k = {X = x_k} form a partition
of S. Note that the summations in Eqs. (3.4b) and (3.4c) will have a finite or infinite
number of terms depending on whether the random variable is finite or not. Next consider property (iii). Any event B involving X is the union of elementary events, so by
Axiom III′ we have:

P[X in B] = P[∪_{x ∈ B} {ζ : X(ζ) = x}] = Σ_{x ∈ B} P[X = x] = Σ_{x ∈ B} p_X(x).
The pmf of X gives us the probabilities for all the elementary events from S_X.
The probability of any subset of S_X is obtained from the sum of the corresponding elementary events. In fact we have everything required to specify a probability law for the
outcomes in S_X. If we are only interested in events concerning X, then we can forget
about the underlying random experiment and its associated probability law and just
work with S_X and the pmf of X.
Example 3.5 Coin Tosses and Binomial Random Variable
Let X be the number of heads in three independent tosses of a coin. Find the pmf of X.
Proceeding as in Example 3.3, we find:

p_0 = P[X = 0] = P[{TTT}] = (1 - p)³,
p_1 = P[X = 1] = P[{HTT}] + P[{THT}] + P[{TTH}] = 3(1 - p)²p,
p_2 = P[X = 2] = P[{HHT}] + P[{HTH}] + P[{THH}] = 3(1 - p)p²,
p_3 = P[X = 3] = P[{HHH}] = p³.

Note that p_X(0) + p_X(1) + p_X(2) + p_X(3) = 1.
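The pmf in Example 3.5 can be verified by enumerating the eight outcomes and comparing with the closed forms. A Python sketch (ours; p = 0.3 is an arbitrary illustration):

```python
from itertools import product
from math import comb

p = 0.3    # an arbitrary head probability for illustration

# pmf of X = number of heads in three tosses, by enumerating outcomes...
pmf = {k: 0.0 for k in range(4)}
for tosses in product([1, 0], repeat=3):     # 1 = heads, 0 = tails
    prob = 1.0
    for t in tosses:
        prob *= p if t else (1 - p)
    pmf[sum(tosses)] += prob

# ...and by the closed forms found in Example 3.5.
for k in range(4):
    formula = comb(3, k) * p ** k * (1 - p) ** (3 - k)
    assert abs(pmf[k] - formula) < 1e-12
print(sum(pmf.values()))     # total probability mass is 1
```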
Example 3.6 A Betting Game
A player receives $1 if the number of heads in three coin tosses is 2, $8 if the number is 3, but
nothing otherwise. Find the pmf of the reward Y.

p_Y(0) = P[ζ ∈ {TTT, TTH, THT, HTT}] = 4/8 = 1/2
p_Y(1) = P[ζ ∈ {THH, HTH, HHT}] = 3/8
p_Y(8) = P[ζ ∈ {HHH}] = 1/8.

Note that p_Y(0) + p_Y(1) + p_Y(8) = 1.
Figures 3.4(a) and (b) show the graph of p_X(x) versus x for the random variables
in Examples 3.5 and 3.6, respectively. In general, the graph of the pmf of a discrete random variable has vertical arrows of height p_X(x_k) at the values x_k in S_X. We may view
the total probability as one unit of mass and p_X(x) as the amount of probability mass
that is placed at each of the discrete points x_1, x_2, .... The relative values of the pmf at different points give an indication of the relative likelihoods of occurrence.
Example 3.7 Random Number Generator
A random number generator produces an integer number X that is equally likely to be any element in the set S_X = {0, 1, 2, ..., M - 1}. Find the pmf of X.
For each k in S_X, we have p_X(k) = 1/M. Note that

p_X(0) + p_X(1) + ⋯ + p_X(M - 1) = 1.

We call X the uniform random variable in the set {0, 1, ..., M - 1}.
FIGURE 3.4
(a) Graph of pmf in three coin tosses; (b) Graph of pmf in betting game.
Example 3.8 Bernoulli Random Variable
Let A be an event of interest in some random experiment, e.g., a device is not defective. We
say that a "success" occurs if A occurs when we perform the experiment. The Bernoulli random variable I_A is equal to 1 if A occurs and zero otherwise, and is given by the indicator
function for A:

I_A(ζ) = 0 if ζ not in A; 1 if ζ in A.     (3.5a)

Find the pmf of I_A.
I_A is a finite discrete random variable with values from S_I = {0, 1}, with pmf:

p_I(0) = P[{ζ : ζ ∈ A^c}] = 1 - p
p_I(1) = P[{ζ : ζ ∈ A}] = p.     (3.5b)

We call I_A the Bernoulli random variable. Note that p_I(0) + p_I(1) = 1.
Example 3.9 Message Transmissions
Let X be the number of times a message needs to be transmitted until it arrives correctly at its
destination. Find the pmf of X. Find the probability that X is an even number.
X is a discrete random variable taking on values from S_X = {1, 2, 3, ...}. The event
{X = k} occurs if the underlying experiment finds k - 1 consecutive erroneous transmissions
("failures") followed by an error-free one ("success"):

p_X(k) = P[X = k] = P[00...01] = (1 - p)^(k-1) p = q^(k-1) p,  k = 1, 2, ....     (3.6)

We call X the geometric random variable, and we say that X is geometrically distributed. In
Eq. (2.42b), we saw that the sum of the geometric probabilities is 1.
1
1
.
=
P3X is even4 = a pX12k2 = p a q2k - 1 = p
2
1
+
q
1 - q
k=1
k=1
q
q
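As a sanity check on the closed form q/(1 + q) (a sketch under my own naming, not part of the text), we can compare it against a truncated sum of the even-index pmf values:

```python
def p_even_partial(p, terms=200):
    """Truncated sum of P[X = 2k], k = 1..terms, for a geometric rv on {1, 2, ...}."""
    q = 1 - p
    return sum(q ** (2 * k - 1) * p for k in range(1, terms + 1))

def p_even_closed(p):
    """Closed form q/(1 + q) derived in Example 3.9."""
    q = 1 - p
    return q / (1 + q)
```

For p = 1/2 both give 1/3: even values are less likely than odd ones because X = 1 carries the most probability mass.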
Example 3.10 Transmission Errors
A binary communications channel introduces a bit error in a transmission with probability p. Let
X be the number of errors in n independent transmissions. Find the pmf of X. Find the probability of one or fewer errors.
X takes on values in the set S_X = {0, 1, ..., n}. Each transmission results in a "0" if there is no error and a "1" if there is an error, with P["1"] = p and P["0"] = 1 − p. The probability of k errors in n bit transmissions is given by the probability of an error pattern that has k 1's and n − k 0's:
p_X(k) = P[X = k] = C(n, k) p^k (1 − p)^{n−k},  k = 0, 1, ..., n,   (3.7)
where C(n, k) denotes the binomial coefficient. We call X the binomial random variable, with parameters n and p. In Eq. (2.39b), we saw that the sum of the binomial probabilities is 1.
P[X ≤ 1] = C(n, 0) p⁰(1 − p)^{n−0} + C(n, 1) p¹(1 − p)^{n−1} = (1 − p)ⁿ + np(1 − p)^{n−1}.
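A short check of Eq. (3.7) and the P[X ≤ 1] expression (illustrative code of mine, not from the text; Python's `math.comb` supplies the binomial coefficient):

```python
from math import comb

def binom_pmf(k, n, p):
    """Binomial pmf, Eq. (3.7)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def p_at_most_one(n, p):
    """P[X <= 1] summed from the pmf; matches (1-p)^n + n p (1-p)^(n-1)."""
    return binom_pmf(0, n, p) + binom_pmf(1, n, p)
```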
Finally, let's consider the relationship between relative frequencies and the pmf p_X(x_k). Suppose we perform n independent repetitions to obtain n observations of the discrete random variable X. Let N_k(n) be the number of times the event X = x_k occurs and let f_k(n) = N_k(n)/n be the corresponding relative frequency. As n becomes large we expect that f_k(n) → p_X(x_k). Therefore the graph of relative frequencies should approach the graph of the pmf. Figure 3.5(a) shows the graph of relative
FIGURE 3.5
(a) Relative frequencies and corresponding uniform pmf; (b) relative frequencies and corresponding geometric pmf.
frequencies for 1000 repetitions of an experiment that generates a uniform random variable from the set {0, 1, ..., 7} and the corresponding pmf. Figure 3.5(b) shows the graph of relative frequencies and pmf for a geometric random variable with p = 1/2 and n = 1000 repetitions. In both cases we see that the graph of relative frequencies approaches that of the pmf.
3.3 EXPECTED VALUE AND MOMENTS OF DISCRETE RANDOM VARIABLE
In order to completely describe the behavior of a discrete random variable, an entire function, namely p_X(x), must be given. In some situations we are interested in a few parameters that summarize the information provided by the pmf. For example, Fig. 3.6 shows the results of many repetitions of an experiment that produces two random variables. The random variable Y varies about the value 0, whereas the random variable X varies around the value 5. It is also clear that X is more spread out than Y. In this section we introduce parameters that quantify these properties.
The expected value or mean of a discrete random variable X is defined by
m_X = E[X] = ∑_{x∈S_X} x p_X(x) = ∑_k x_k p_X(x_k).   (3.8)
The expected value E[X] is defined if the above sum converges absolutely, that is,
E[|X|] = ∑_k |x_k| p_X(x_k) < ∞.   (3.9)
There are random variables for which Eq. (3.9) does not converge. In such cases, we say that the expected value does not exist.
FIGURE 3.6
The graphs show 150 repetitions of the experiments yielding X and Y. It is clear that X is centered about the value 5 while Y is centered about 0. It is also clear that X is more spread out than Y.
If we view p_X(x) as the distribution of mass on the points x_1, x_2, ... on the real line, then E[X] represents the center of mass of this distribution. For example, in Fig. 3.5(a), we can see that the pmf of a discrete random variable that is uniformly distributed in {0, ..., 7} has a center of mass at 3.5.
Example 3.11 Mean of Bernoulli Random Variable
Find the expected value of the Bernoulli random variable IA .
From Example 3.8, we have
E[I_A] = 0·p_I(0) + 1·p_I(1) = p,
where p is the probability of success in the Bernoulli trial.
Example 3.12 Three Coin Tosses and Binomial Random Variable
Let X be the number of heads in three tosses of a fair coin. Find E[X].
Equation (3.8) and the pmf of X that was found in Example 3.5 give:
E[X] = ∑_{k=0}^3 k p_X(k) = 0(1/8) + 1(3/8) + 2(3/8) + 3(1/8) = 1.5.
Note that the above is the n = 3, p = 1/2 case of a binomial random variable, which we will see has E[X] = np.
Example 3.13 Mean of a Uniform Discrete Random Variable
Let X be the random number generator in Example 3.7. Find E[X].
From Example 3.7 we have p_X(j) = 1/M for j = 0, ..., M − 1, so
E[X] = ∑_{k=0}^{M−1} k(1/M) = (1/M){0 + 1 + 2 + ... + (M − 1)} = (M − 1)M/(2M) = (M − 1)/2,
where we used the fact that 1 + 2 + ... + L = L(L + 1)/2. Note that for M = 8, E[X] = 3.5, which is consistent with our observation of the center of mass in Fig. 3.5(a).
The use of the term “expected value” does not mean that we expect to observe
E[X] when we perform the experiment that generates X. For example, the expected
value of a Bernoulli trial is p, but its outcomes are always either 0 or 1.
E[X] corresponds to the “average of X” in a large number of observations of X.
Suppose we perform n independent repetitions of the experiment that generates X,
and we record the observed values as x(1), x(2), ..., x(n), where x(j) is the observation in the jth experiment. Let N_k(n) be the number of times x_k is observed, and let f_k(n) = N_k(n)/n be the corresponding relative frequency. The arithmetic average, or sample mean, of the observations is:
⟨X⟩_n = (x(1) + x(2) + ... + x(n))/n = (x_1 N_1(n) + x_2 N_2(n) + ... + x_k N_k(n) + ...)/n
= x_1 f_1(n) + x_2 f_2(n) + ... + x_k f_k(n) + ...
= ∑_k x_k f_k(n).   (3.10)
The first numerator adds the observations in the order in which they occur, and the second numerator counts how many times each x_k occurs and then computes the total. As n becomes large, we expect relative frequencies to approach the probabilities p_X(x_k):
lim_{n→∞} f_k(n) = p_X(x_k) for all k.   (3.11)
Equation (3.10) then implies that:
⟨X⟩_n = ∑_k x_k f_k(n) → ∑_k x_k p_X(x_k) = E[X].   (3.12)
Thus we expect the sample mean to converge to E[X] as n becomes large.
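The convergence in Eq. (3.12) can be watched numerically. The sketch below (my own construction, not from the text) simulates a geometric random variable with p = 1/2 and compares the sample mean with E[X] = 1/p = 2:

```python
import random

def sample_mean_geometric(p=0.5, n=100_000, seed=2):
    """Average of n simulated geometric(p) values on {1, 2, ...}."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n):
        k = 1
        while rng.random() >= p:  # count trials until the first "success"
            k += 1
        total += k
    return total / n

m = sample_mean_geometric()
# m should be close to E[X] = 1/p = 2 for large n.
```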
Example 3.14 A Betting Game
A player at a fair pays $1.50 to toss a coin three times. The player receives $1 if the number of
heads is 2, $8 if the number is 3, but nothing otherwise. Find the expected value of the reward Y.
What is the expected value of the gain?
The expected reward is:
E[Y] = 0·p_Y(0) + 1·p_Y(1) + 8·p_Y(8) = 0(4/8) + 1(3/8) + 8(1/8) = 11/8.
The expected gain is:
E[Y − 1.5] = 11/8 − 12/8 = −1/8.
Players lose 12.5 cents on average per game, so the house makes a nice profit over the long run.
In Example 3.18 we will see that some engineering designs also “bet” that users will behave a
certain way.
Example 3.15 Mean of a Geometric Random Variable
Let X be the number of bytes in a message, and suppose that X has a geometric distribution with
parameter p. Find the mean of X.
X can take on arbitrarily large values since S_X = {1, 2, ...}. The expected value is:
E[X] = ∑_{k=1}^∞ k p q^{k−1} = p ∑_{k=1}^∞ k q^{k−1}.
This expression is readily evaluated by differentiating the series
1/(1 − x) = ∑_{k=0}^∞ x^k   (3.13)
to obtain
1/(1 − x)² = ∑_{k=0}^∞ k x^{k−1}.   (3.14)
Letting x = q, we obtain
E[X] = p · 1/(1 − q)² = 1/p.   (3.15)
We see that X has a finite expected value as long as p > 0.
For certain random variables large values occur sufficiently frequently that the
expected value does not exist, as illustrated by the following example.
Example 3.16 St. Petersburg Paradox
A fair coin is tossed repeatedly until a tail comes up. If X tosses are needed, then the casino pays the gambler Y = 2^X dollars. How much should the gambler be willing to pay to play this game?
If the gambler plays this game a large number of times, then the payoff should be the expected value of Y = 2^X. If the coin is fair, P[X = k] = (1/2)^k and P[Y = 2^k] = (1/2)^k, so:
E[Y] = ∑_{k=1}^∞ 2^k p_Y(2^k) = ∑_{k=1}^∞ 2^k (1/2)^k = 1 + 1 + ... = ∞.
This game does indeed appear to offer the gambler a sweet deal, and so the gambler should be
willing to pay any amount to play the game! The paradox is that a sane person would not pay a
lot to play this game. Problem 3.34 discusses ways to resolve the paradox.
Random variables with unbounded expected value are not uncommon and appear in models where outcomes that have extremely large values are not that rare. Examples include the sizes of files in Web transfers, frequencies of words in large bodies
of text, and various financial and economic problems.
3.3.1 Expected Value of Functions of a Random Variable
Let X be a discrete random variable, and let Z = g(X). Since X is discrete, Z = g(X) will assume a countable set of values of the form g(x_k) where x_k ∈ S_X. Denote the set of values assumed by g(X) by {z_1, z_2, ...}. One way to find the expected value of Z is to use Eq. (3.8), which requires that we first find the pmf of Z. Another way is to use the following result:
E[Z] = E[g(X)] = ∑_k g(x_k) p_X(x_k).   (3.16)
To show Eq. (3.16), group the terms x_k that are mapped to each value z_j:
∑_k g(x_k) p_X(x_k) = ∑_j z_j {∑_{x_k: g(x_k)=z_j} p_X(x_k)} = ∑_j z_j p_Z(z_j) = E[Z].
The sum inside the braces is the probability of all terms x_k for which g(x_k) = z_j, which is the probability that Z = z_j, that is, p_Z(z_j).
Example 3.17 Square-Law Device
Let X be a noise voltage that is uniformly distributed in S_X = {−3, −1, +1, +3} with p_X(k) = 1/4 for k in S_X. Find E[Z] where Z = X².
Using the first approach we find the pmf of Z:
p_Z(9) = P[X ∈ {−3, +3}] = p_X(−3) + p_X(3) = 1/2
p_Z(1) = p_X(−1) + p_X(1) = 1/2
and so
E[Z] = 1(1/2) + 9(1/2) = 5.
The second approach gives:
E[Z] = E[X²] = ∑_k k² p_X(k) = {(−3)² + (−1)² + 1² + 3²}/4 = 20/4 = 5.
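Both approaches are easy to mechanize. The fragment below (variable names are mine, not from the text) computes E[Z] once by deriving the pmf of Z and once directly via Eq. (3.16):

```python
from collections import defaultdict

# X uniform on {-3, -1, +1, +3}, Z = X^2, as in Example 3.17.
pmf_x = {-3: 0.25, -1: 0.25, 1: 0.25, 3: 0.25}

# Approach 1: derive the pmf of Z, then apply Eq. (3.8).
pmf_z = defaultdict(float)
for x, px in pmf_x.items():
    pmf_z[x * x] += px
ez_1 = sum(z * pz for z, pz in pmf_z.items())

# Approach 2: apply Eq. (3.16) directly, without forming the pmf of Z.
ez_2 = sum((x * x) * px for x, px in pmf_x.items())
# Both equal 5.
```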
Equation (3.16) implies several very useful results. Let Z be the function
Z = a g(X) + b h(X) + c,
where a, b, and c are real numbers; then:
E[Z] = a E[g(X)] + b E[h(X)] + c.   (3.17a)
From Eq. (3.16) we have:
E[Z] = E[a g(X) + b h(X) + c] = ∑_k (a g(x_k) + b h(x_k) + c) p_X(x_k)
= a ∑_k g(x_k) p_X(x_k) + b ∑_k h(x_k) p_X(x_k) + c ∑_k p_X(x_k)
= a E[g(X)] + b E[h(X)] + c.
Equation (3.17a), by setting a, b, and/or c to 0 or 1, implies the following expressions:
E[g(X) + h(X)] = E[g(X)] + E[h(X)].   (3.17b)
E[aX] = a E[X].   (3.17c)
E[X + c] = E[X] + c.   (3.17d)
E[c] = c.   (3.17e)
Example 3.18 Square-Law Device
The noise voltage X in the previous example is amplified and shifted to obtain Y = 2X + 10, and then squared to produce Z = Y² = (2X + 10)². Find E[Z].
E[Z] = E[(2X + 10)²] = E[4X² + 40X + 100] = 4E[X²] + 40E[X] + 100 = 4(5) + 40(0) + 100 = 120.
Example 3.19 Voice Packet Multiplexer
Let X be the number of voice packets containing active speech produced by n = 48 independent
speakers in a 10-millisecond period as discussed in Section 1.4. X is a binomial random variable
with parameter n and probability p = 1/3. Suppose a packet multiplexer transmits up to
M = 20 active packets every 10 ms, and any excess active packets are discarded. Let Z be the
number of packets discarded. Find E[Z].
The number of packets discarded every 10 ms is the following function of X:
Z = (X − M)⁺ ≜ 0 if X ≤ M; X − M if X > M.
E[Z] = ∑_{k=21}^{48} (k − 20) C(48, k) (1/3)^k (2/3)^{48−k} = 0.182.
Every 10 ms, E[X] = np = 16 active packets are produced on average, so the fraction of active packets discarded is 0.182/16 = 1.1%, which users will tolerate. This example shows that engineered systems also play "betting" games where favorable statistics are exploited to use resources efficiently. In this example, the multiplexer transmits 20 packets per period instead of 48, for a reduction of 28/48 = 58%.
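The quoted value of E[Z] can be reproduced directly from the binomial pmf. The following sketch (the function name is mine; Python's `math.comb` provides the binomial coefficient) evaluates the sum term by term:

```python
from math import comb

def expected_discards(n=48, p=1/3, M=20):
    """E[(X - M)^+] for X ~ binomial(n, p), summed over k > M."""
    return sum((k - M) * comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(M + 1, n + 1))

ez = expected_discards()
# The text reports E[Z] = 0.182 for n = 48, p = 1/3, M = 20.
```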
3.3.2 Variance of a Random Variable
The expected value E[X], by itself, provides us with limited information about X. For example, if we know that E[X] = 0, then it could be that X is zero all the time. However, it is also possible that X can take on extremely large positive and negative values. We are therefore interested not only in the mean of a random variable, but also in the extent of the random variable's variation about its mean. Let the deviation of the random variable X about its mean be X − E[X], which can take on positive and negative values. Since we are interested in the magnitude of the variations only, it is convenient to work with the square of the deviation, which is always positive, D(X) = (X − E[X])². The expected value is a constant, so we will denote it by m_X = E[X]. The variance of the random variable X is defined as the expected value of D:
σ_X² = VAR[X] = E[(X − m_X)²] = ∑_{x∈S_X} (x − m_X)² p_X(x) = ∑_k (x_k − m_X)² p_X(x_k).   (3.18)
The standard deviation of the random variable X is defined by:
σ_X = STD[X] = VAR[X]^{1/2}.   (3.19)
By taking the square root of the variance we obtain a quantity with the same units as X.
An alternative expression for the variance can be obtained as follows:
VAR[X] = E[(X − m_X)²] = E[X² − 2m_X X + m_X²] = E[X²] − 2m_X E[X] + m_X² = E[X²] − m_X².   (3.20)
E[X²] is called the second moment of X. The nth moment of X is defined as E[X^n].
Equations (3.17c), (3.17d), and (3.17e) imply the following useful expressions for the variance. Let Y = X + c; then
VAR[X + c] = E[(X + c − (E[X] + c))²] = E[(X − E[X])²] = VAR[X].   (3.21)
Adding a constant to a random variable does not affect the variance. Let Z = cX; then:
VAR[cX] = E[(cX − cE[X])²] = E[c²(X − E[X])²] = c² VAR[X].   (3.22)
Scaling a random variable by c scales the variance by c² and the standard deviation by |c|.
Now let X = c, a random variable that is equal to a constant with probability 1; then
VAR[X] = E[(X − c)²] = E[0] = 0.   (3.23)
A constant random variable has zero variance.
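Equations (3.21)-(3.23) are easy to confirm on a small pmf. The sketch below (an illustration of mine, using the three-coin-toss pmf) checks the shift and scale rules:

```python
# pmf of the number of heads in three fair coin tosses.
pmf = {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def var(pmf):
    m = mean(pmf)
    return sum((x - m) ** 2 * p for x, p in pmf.items())

c = 7.0
shifted = {x + c: p for x, p in pmf.items()}  # Y = X + c: same variance
scaled = {c * x: p for x, p in pmf.items()}   # Z = cX: variance c^2 VAR[X]
```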
Example 3.20 Three Coin Tosses
Let X be the number of heads in three tosses of a fair coin. Find VAR[X].
E[X²] = 0(1/8) + 1²(3/8) + 2²(3/8) + 3²(1/8) = 3, and
VAR[X] = E[X²] − m_X² = 3 − 1.5² = 0.75.
Recall that this is an n = 3, p = 1/2 binomial random variable. We see later that the variance of the binomial random variable is npq.
Example 3.21 Variance of Bernoulli Random Variable
Find the variance of the Bernoulli random variable IA .
E[I_A²] = 0·p_I(0) + 1²·p_I(1) = p, and so
VAR[I_A] = p − p² = p(1 − p) = pq.
Example 3.22 Variance of Geometric Random Variable
Find the variance of the geometric random variable.
Differentiate the term (1 − x)^{−2} in Eq. (3.14) to obtain
2/(1 − x)³ = ∑_{k=0}^∞ k(k − 1) x^{k−2}.
Let x = q and multiply both sides by pq to obtain:
2pq/(1 − q)³ = ∑_{k=0}^∞ k(k − 1) p q^{k−1} = E[X²] − E[X].
So the second moment is
E[X²] = 2pq/(1 − q)³ + E[X] = 2q/p² + 1/p = (1 + q)/p²   (3.24)
and the variance is
VAR[X] = E[X²] − E[X]² = (1 + q)/p² − 1/p² = q/p².

3.4 CONDITIONAL PROBABILITY MASS FUNCTION
In many situations we have partial information about a random variable X or about
the outcome of its underlying random experiment. We are interested in how this information changes the probability of events involving the random variable. The conditional probability mass function addresses this question for discrete random variables.
3.4.1 Conditional Probability Mass Function
Let X be a discrete random variable with pmf p_X(x), and let C be an event that has nonzero probability, P[C] > 0. See Fig. 3.7. The conditional probability mass function of X is defined by the conditional probability:
p_X(x | C) = P[X = x | C] for x a real number.   (3.25)
Applying the definition of conditional probability we have:
p_X(x | C) = P[{X = x} ∩ C] / P[C].   (3.26)
The above expression has a nice intuitive interpretation: the conditional probability of the event {X = x_k} is given by the probabilities of outcomes ζ for which both X(ζ) = x_k and ζ are in C, normalized by P[C].
The conditional pmf satisfies Eqs. (3.4a)-(3.4c). Consider Eq. (3.4b). The set of events A_k = {X = x_k} is a partition of S, so
C = ∪_k (A_k ∩ C), and
∑_{x_k∈S_X} p_X(x_k | C) = ∑_{all k} P[{X = x_k} ∩ C]/P[C] = (1/P[C]) ∑_{all k} P[A_k ∩ C] = P[C]/P[C] = 1.
FIGURE 3.7
Conditional pmf of X given event C.
Similarly we can show that:
P[X in B | C] = ∑_{x∈B} p_X(x | C), where B ⊂ S_X.
Example 3.23 A Random Clock
The minute hand in a clock is spun and the outcome ζ is the minute where the hand comes to rest. Let X be the hour where the hand comes to rest. Find the pmf of X. Find the conditional pmf of X given B = {first 4 hours}; given D = {1 < ζ ≤ 11}.
We assume that the hand is equally likely to rest at any of the minutes in the range S = {1, 2, ..., 60}, so P[ζ = k] = 1/60 for k in S. X takes on values from S_X = {1, 2, ..., 12} and it is easy to show that p_X(j) = 1/12 for j in S_X. Since B = {1, 2, 3, 4}:
p_X(j | B) = P[{X = j} ∩ B]/P[B] = P[X ∈ {j} ∩ {1, 2, 3, 4}]/P[X ∈ {1, 2, 3, 4}]
= (1/12)/(1/3) = 1/4 if j ∈ {1, 2, 3, 4}, and 0 otherwise.
The event B above involves X only. The event D, however, is stated in terms of the outcomes in the underlying experiment (i.e., minutes, not hours), so the probability of the intersection has to be expressed accordingly:
p_X(j | D) = P[{X = j} ∩ D]/P[D] = P[ζ : X(ζ) = j and ζ ∈ {2, ..., 11}]/P[ζ ∈ {2, ..., 11}]
= P[ζ ∈ {2, 3, 4, 5}]/(10/60) = 4/10 for j = 1
= P[ζ ∈ {6, 7, 8, 9, 10}]/(10/60) = 5/10 for j = 2
= P[ζ ∈ {11}]/(10/60) = 1/10 for j = 3.
Most of the time the event C is defined in terms of X, for example C = {X > 10} or C = {a ≤ X ≤ b}. For x_k in S_X, we have the following general result:
p_X(x_k | C) = p_X(x_k)/P[C] if x_k ∈ C, and 0 if x_k ∉ C.   (3.27)
The above expression is determined entirely by the pmf of X.
Example 3.24 Residual Waiting Times
Let X be the time required to transmit a message, where X is a uniform random variable with S_X = {1, 2, ..., L}. Suppose that a message has already been transmitting for m time units; find the probability that the remaining transmission time is j time units.
We are given C = {X > m}, so for m + 1 ≤ m + j ≤ L:
p_X(m + j | X > m) = P[X = m + j]/P[X > m] = (1/L)/((L − m)/L) = 1/(L − m).   (3.28)
X is equally likely to be any of the remaining L − m possible values. As m increases, 1/(L − m) increases, implying that the end of the message transmission becomes increasingly likely.
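Equation (3.27) makes this computation mechanical. A small sketch (the function name is my own), specialized to the residual-waiting-time example:

```python
def residual_pmf(L, m):
    """Conditional pmf of X given {X > m}, with X uniform on {1, ..., L}.
    Implements Eq. (3.28): each remaining value has probability 1/(L - m)."""
    assert 0 <= m < L
    p_c = (L - m) / L                        # P[X > m]
    return {x: (1 / L) / p_c for x in range(m + 1, L + 1)}

pmf = residual_pmf(L=10, m=4)
# Values 5..10 each have conditional probability 1/6.
```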
Many random experiments have natural ways of partitioning the sample space S into the union of disjoint events B_1, B_2, ..., B_n. Let p_X(x | B_i) be the conditional pmf of X given event B_i. The theorem on total probability allows us to find the pmf of X in terms of the conditional pmfs:
p_X(x) = ∑_{i=1}^n p_X(x | B_i) P[B_i].   (3.29)
Example 3.25 Device Lifetimes
A production line yields two types of devices. Type 1 devices occur with probability α and work for a relatively short time that is geometrically distributed with parameter r. Type 2 devices work much longer, occur with probability 1 − α, and have a lifetime that is geometrically distributed with parameter s. Let X be the lifetime of an arbitrary device. Find the pmf of X.
The random experiment that generates X involves selecting a device type and then observing its lifetime. We can partition the set of outcomes in this experiment into the event B_1, consisting of those outcomes in which the device is type 1, and B_2, consisting of those outcomes in which the device is type 2. The conditional pmfs of X given the device type are:
p_{X|B_1}(k) = (1 − r)^{k−1} r for k = 1, 2, ...
and
p_{X|B_2}(k) = (1 − s)^{k−1} s for k = 1, 2, ....
We obtain the pmf of X from Eq. (3.29):
p_X(k) = p_X(k | B_1)P[B_1] + p_X(k | B_2)P[B_2] = (1 − r)^{k−1} r α + (1 − s)^{k−1} s (1 − α) for k = 1, 2, ....

3.4.2 Conditional Expected Value
Let X be a discrete random variable, and suppose that we know that event B has occurred. The conditional expected value of X given B is defined as:
m_{X|B} = E[X | B] = ∑_{x∈S_X} x p_X(x | B) = ∑_k x_k p_X(x_k | B),   (3.30)
where we apply the absolute convergence requirement on the summation. The conditional variance of X given B is defined as:
VAR[X | B] = E[(X − m_{X|B})² | B] = ∑_k (x_k − m_{X|B})² p_X(x_k | B) = E[X² | B] − m_{X|B}².
Note that the variation is measured with respect to m_{X|B}, not m_X.
Let B_1, B_2, ..., B_n be a partition of S, and let p_X(x | B_i) be the conditional pmf of X given event B_i. E[X] can be calculated from the conditional expected values E[X | B_i]:
E[X] = ∑_{i=1}^n E[X | B_i] P[B_i].   (3.31a)
By the theorem on total probability we have:
E[X] = ∑_k x_k p_X(x_k) = ∑_k x_k {∑_{i=1}^n p_X(x_k | B_i) P[B_i]}
= ∑_{i=1}^n {∑_k x_k p_X(x_k | B_i)} P[B_i] = ∑_{i=1}^n E[X | B_i] P[B_i],
where we first express p_X(x_k) in terms of the conditional pmfs, and we then change the order of summation. Using the same approach we can also show
E[g(X)] = ∑_{i=1}^n E[g(X) | B_i] P[B_i].   (3.31b)
Example 3.26 Device Lifetimes
Find the mean and variance for the devices in Example 3.25.
The conditional mean and second moment of each device type are those of a geometric random variable with the corresponding parameter (see Eqs. (3.15) and (3.24)):
m_{X|B_1} = 1/r, E[X² | B_1] = (2 − r)/r²
m_{X|B_2} = 1/s, E[X² | B_2] = (2 − s)/s².
The mean and the second moment of X are then:
m_X = m_{X|B_1} α + m_{X|B_2}(1 − α) = α/r + (1 − α)/s
E[X²] = E[X² | B_1] α + E[X² | B_2](1 − α) = α(2 − r)/r² + (1 − α)(2 − s)/s².
Finally, the variance of X is:
VAR[X] = E[X²] − m_X² = α(2 − r)/r² + (1 − α)(2 − s)/s² − (α/r + (1 − α)/s)².
Note that we do not use the conditional variances to find VAR[X], because Eq. (3.31b) does not apply to conditional variances. (See Problem 3.40.) However, the equation does apply to the conditional second moments.
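A Monte Carlo check of Eq. (3.31a) for the device-lifetime model (the parameter values below are illustrative choices of mine, not from the text):

```python
import random

def simulate_lifetime_mean(alpha=0.3, r=0.5, s=0.05, n=100_000, seed=3):
    """Estimate E[X] for the two-type device model of Examples 3.25-3.26."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n):
        p = r if rng.random() < alpha else s  # choose the device type
        k = 1
        while rng.random() >= p:              # geometric lifetime on {1, 2, ...}
            k += 1
        total += k
    return total / n

m_sim = simulate_lifetime_mean()
m_exact = 0.3 / 0.5 + 0.7 / 0.05  # alpha/r + (1 - alpha)/s = 14.6
```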
3.5 IMPORTANT DISCRETE RANDOM VARIABLES
Certain random variables arise in many diverse, unrelated applications. The pervasiveness of these random variables is due to the fact that they model fundamental mechanisms that underlie random behavior. In this section we present the most important of the discrete random variables and discuss how they arise and how they are interrelated. Table 3.1 summarizes the basic properties of the discrete random variables discussed in this section. By the end of this chapter, most of the properties presented in the table will have been introduced.
TABLE 3.1 Discrete random variables

Bernoulli Random Variable
S_X = {0, 1}
p_0 = q = 1 − p, p_1 = p, 0 ≤ p ≤ 1
E[X] = p, VAR[X] = p(1 − p), G_X(z) = q + pz
Remarks: The Bernoulli random variable is the value of the indicator function I_A for some event A; X = 1 if A occurs and 0 otherwise.

Binomial Random Variable
S_X = {0, 1, ..., n}
p_k = C(n, k) p^k (1 − p)^{n−k}, k = 0, 1, ..., n
E[X] = np, VAR[X] = np(1 − p), G_X(z) = (q + pz)^n
Remarks: X is the number of successes in n Bernoulli trials and hence the sum of n independent, identically distributed Bernoulli random variables.

Geometric Random Variable
First Version: S_X = {0, 1, 2, ...}
p_k = p(1 − p)^k, k = 0, 1, ...
E[X] = (1 − p)/p, VAR[X] = (1 − p)/p², G_X(z) = p/(1 − qz)
Remarks: X is the number of failures before the first success in a sequence of independent Bernoulli trials. The geometric random variable is the only discrete random variable with the memoryless property.
Second Version: S_X′ = {1, 2, ...}
p_k = p(1 − p)^{k−1}, k = 1, 2, ...
E[X′] = 1/p, VAR[X′] = (1 − p)/p², G_X′(z) = pz/(1 − qz)
Remarks: X′ = X + 1 is the number of trials until the first success in a sequence of independent Bernoulli trials.

Negative Binomial Random Variable
S_X = {r, r + 1, ...}, where r is a positive integer
p_k = C(k − 1, r − 1) p^r (1 − p)^{k−r}, k = r, r + 1, ...
E[X] = r/p, VAR[X] = r(1 − p)/p², G_X(z) = (pz/(1 − qz))^r
Remarks: X is the number of trials until the rth success in a sequence of independent Bernoulli trials.

Poisson Random Variable
S_X = {0, 1, 2, ...} and a > 0
p_k = (a^k/k!) e^{−a}, k = 0, 1, ...
E[X] = a, VAR[X] = a, G_X(z) = e^{a(z−1)}
Remarks: X is the number of events that occur in one time unit when the time between events is exponentially distributed with mean 1/a.

Uniform Random Variable
S_X = {1, 2, ..., L}
p_k = 1/L, k = 1, 2, ..., L
E[X] = (L + 1)/2, VAR[X] = (L² − 1)/12, G_X(z) = (z/L)(1 − z^L)/(1 − z)
Remarks: The uniform random variable occurs whenever outcomes are equally likely. It plays a key role in the generation of random numbers.

Zipf Random Variable
S_X = {1, 2, ..., L}, where L is a positive integer
p_k = 1/(c_L k), k = 1, 2, ..., L, where c_L is given by Eq. (3.45)
E[X] = L/c_L, VAR[X] = L(L + 1)/(2c_L) − L²/c_L²
Remarks: The Zipf random variable has the property that a few outcomes occur frequently but most outcomes occur rarely.
Discrete random variables arise mostly in applications where counting is involved. We begin with the Bernoulli random variable as a model for a single coin toss.
By counting the outcomes of multiple coin tosses we obtain the binomial, geometric,
and Poisson random variables.
3.5.1 The Bernoulli Random Variable
Let A be an event related to the outcomes of some random experiment. The Bernoulli random variable I_A (defined in Example 3.8) equals one if the event A occurs, and zero otherwise. I_A is a discrete random variable since it assigns a number to each outcome of S. Its range is {0, 1}, and its pmf is
p_I(0) = 1 − p and p_I(1) = p,   (3.32)
where P[A] = p.
In Example 3.11 we found the mean of I_A:
m_I = E[I_A] = p.
The sample mean in n independent Bernoulli trials is simply the relative frequency of successes and converges to p as n increases:
⟨I_A⟩_n = (0·N_0(n) + 1·N_1(n))/n = f_1(n) → p.
In Example 3.21 we found the variance of I_A:
σ_I² = VAR[I_A] = p(1 − p) = pq.
The variance is quadratic in p, with value zero at p = 0 and p = 1 and maximum at p = 1/2. This agrees with intuition, since values of p close to 0 or to 1 imply a preponderance of successes or failures and hence less variability in the observed values. The maximum variability occurs when p = 1/2, which corresponds to the case that is most difficult to predict.
Every Bernoulli trial, regardless of the event A, is equivalent to the tossing of a
biased coin with probability of heads p. In this sense, coin tossing can be viewed as representative of a fundamental mechanism for generating randomness, and the Bernoulli random variable is the model associated with it.
3.5.2 The Binomial Random Variable
Suppose that a random experiment is repeated n independent times. Let X be the number of times a certain event A occurs in these n trials. X is then a random variable with range S_X = {0, 1, ..., n}. For example, X could be the number of heads in n tosses of a coin. If we let I_j be the indicator function for the event A in the jth trial, then
X = I_1 + I_2 + ... + I_n,
that is, X is the sum of the Bernoulli random variables associated with each of the n independent trials.
In Section 2.6, we found that X has probabilities that depend on n and p:
P[X = k] = p_X(k) = C(n, k) p^k (1 − p)^{n−k} for k = 0, ..., n.   (3.33)
X is called the binomial random variable. Figure 3.8 shows the pmf of X for n = 24 and p = 0.2 and p = 0.5. Note that P[X = k] is maximum at k_max = [(n + 1)p], where [x] denotes the largest integer that is smaller than or equal to x. When (n + 1)p is an integer, the maximum is achieved at both k_max and k_max − 1. (See Problem 3.50.)

FIGURE 3.8
Probability mass functions of the binomial random variable: (a) p = 0.2; (b) p = 0.5.
The factorial terms grow large very quickly and cause overflow problems in the calculation of C(n, k). Equation (2.40) for the ratio of successive terms in the pmf allows us to calculate p_X(k + 1) in terms of p_X(k) and delays the onset of overflow:
p_X(k + 1) = [(n − k)/(k + 1)] [p/(1 − p)] p_X(k), where p_X(0) = (1 − p)^n.   (3.34)
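The recursion (3.34) translates directly into code. The sketch below (the function name is mine) builds the whole pmf for the n = 24, p = 0.2 case plotted in Fig. 3.8(a); note that (n + 1)p = 5 is an integer, so the maximum is attained at both k = 4 and k = 5:

```python
def binomial_pmf_recursive(n, p):
    """Binomial pmf via the ratio recursion (3.34), avoiding large factorials.
    Assumes 0 < p < 1."""
    pmf = [0.0] * (n + 1)
    pmf[0] = (1 - p) ** n
    for k in range(n):
        pmf[k + 1] = pmf[k] * (n - k) / (k + 1) * p / (1 - p)
    return pmf

pmf = binomial_pmf_recursive(24, 0.2)
# pmf sums to 1, and pmf[4] and pmf[5] agree since (n + 1)p = 5 is an integer.
```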
The binomial random variable arises in applications where there are two types of
objects (i.e., heads/tails, correct/erroneous bits, good/defective items, active/silent speakers), and we are interested in the number of type 1 objects in a randomly selected batch
of size n, where the type of each object is independent of the types of the other objects in
the batch. Examples involving the binomial random variable were given in Section 2.6.
Example 3.27 Mean of a Binomial Random Variable
The expected value of X is:
E[X] = ∑_{k=0}^n k p_X(k) = ∑_{k=0}^n k C(n, k) p^k (1 − p)^{n−k} = ∑_{k=1}^n k [n!/(k!(n − k)!)] p^k (1 − p)^{n−k}
= np ∑_{k=1}^n [(n − 1)!/((k − 1)!(n − k)!)] p^{k−1} (1 − p)^{n−k}
= np ∑_{j=0}^{n−1} [(n − 1)!/(j!(n − 1 − j)!)] p^j (1 − p)^{n−1−j} = np,   (3.35)
where the first line uses the fact that the k = 0 term in the sum is zero, the second line cancels out the k and factors np outside the summation, and the last line uses the fact that the summation is equal to one since it adds all the terms in a binomial pmf with parameters n − 1 and p.
The expected value E[X] = np agrees with our intuition since we expect a fraction p of the outcomes to result in success.
Example 3.28 Variance of a Binomial Random Variable
To find E[X²] below, we remove the k = 0 term and then let k′ = k − 1:
E[X²] = ∑_{k=0}^n k² [n!/(k!(n − k)!)] p^k (1 − p)^{n−k} = ∑_{k=1}^n k [n!/((k − 1)!(n − k)!)] p^k (1 − p)^{n−k}
= np ∑_{k′=0}^{n−1} (k′ + 1) C(n − 1, k′) p^{k′} (1 − p)^{n−1−k′}
= np {∑_{k′=0}^{n−1} k′ C(n − 1, k′) p^{k′} (1 − p)^{n−1−k′} + ∑_{k′=0}^{n−1} C(n − 1, k′) p^{k′} (1 − p)^{n−1−k′}}
= np{(n − 1)p + 1} = np(np + q).
In the third line we see that the first sum is the mean of a binomial random variable with parameters (n − 1) and p, and hence equal to (n − 1)p. The second sum is the sum of the binomial probabilities and hence equal to 1.
We obtain the variance as follows:
σ_X² = E[X²] − E[X]² = np(np + q) − (np)² = npq = np(1 − p).
We see that the variance of the binomial is n times the variance of a Bernoulli random variable. We observe that values of p close to 0 or to 1 imply smaller variance, and that the maximum variability is when p = 1/2.
Example 3.29 Redundant Systems
A system uses triple redundancy for reliability: three microprocessors are installed, and the system is designed so that it operates as long as at least one microprocessor is still functional. Suppose that the probability that a microprocessor is still active after t seconds is p = e^{−λt}. Find the probability that the system is still operating after t seconds.
Let X be the number of microprocessors that are functional at time t. X is a binomial random variable with parameters n = 3 and p. Therefore:
P[X ≥ 1] = 1 − P[X = 0] = 1 − (1 − e^{−λt})³.
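Numerically (an illustrative sketch of mine, with λt = 1 chosen arbitrarily), triple redundancy raises the survival probability substantially over a single processor:

```python
from math import exp

def p_system_alive(lam, t, n=3):
    """P[at least one of n processors alive at time t], each alive w.p. e^{-lam*t}."""
    p = exp(-lam * t)
    return 1 - (1 - p) ** n

single = exp(-1.0)                  # one processor, lam * t = 1: about 0.37
triple = p_system_alive(1.0, 1.0)   # triple redundancy: about 0.75
```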
3.5.3 The Geometric Random Variable
The geometric random variable arises when we count the number M of independent Bernoulli trials until the first occurrence of a success. M is called the geometric random variable, and it takes on values from the set {1, 2, ...}. In Section 2.6, we found that the pmf of M is given by
P[M = k] = p_M(k) = (1 − p)^{k−1} p, k = 1, 2, ...,   (3.36)
where p = P[A] is the probability of "success" in each Bernoulli trial. Figure 3.5(b) shows the geometric pmf for p = 1/2. Note that P[M = k] decays geometrically with k, and that the ratio of consecutive terms is p_M(k + 1)/p_M(k) = (1 − p) = q. As p increases, the pmf decays more rapidly.
The probability that M ≤ k can be written in closed form:
P[M ≤ k] = ∑_{j=1}^k p q^{j−1} = p ∑_{j′=0}^{k−1} q^{j′} = p (1 − q^k)/(1 − q) = 1 − q^k.   (3.37)
Sometimes we are interested in M′ = M − 1, the number of failures before a success occurs. We also refer to M′ as a geometric random variable. Its pmf is:
P[M′ = k] = P[M = k + 1] = (1 − p)^k p, k = 0, 1, 2, ....   (3.38)
In Examples 3.15 and 3.22, we found the mean and variance of the geometric random variable:
m_M = E[M] = 1/p, VAR[M] = (1 − p)/p².
We see that the mean and variance increase as p, the success probability, decreases.
The geometric random variable is the only discrete random variable that satisfies
the memoryless property:
P[M ≥ k + j | M > j] = P[M ≥ k]  for all j, k > 1.
(See Problems 3.54 and 3.55.) The above expression states that if a success has not occurred in the first j trials, then the probability of having to perform at least k more trials is the same as the probability of initially having to perform at least k trials. Thus,
each time a failure occurs, the system “forgets” and begins anew as if it were performing the first trial.
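The memoryless property follows from P[M ≥ k] = q^(k−1) and P[M > j] = q^j, and can be verified directly from the pmf. A brief Python check (illustrative; p = 0.3 is an assumed value):

```python
p = 0.3                      # assumed success probability
q = 1 - p

def tail(k):
    # P[M >= k] = q^(k-1) for the geometric pmf P[M = k] = q^(k-1) * p.
    return q ** (k - 1)

# Verify P[M >= k + j | M > j] = P[M >= k]; note P[M > j] = q^j.
for j in range(1, 5):
    for k in range(2, 6):
        conditional = tail(k + j) / (q ** j)
        assert abs(conditional - tail(k)) < 1e-12
print("memoryless property holds")
```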
The geometric random variable arises in applications where one is interested in
the time (i.e., number of trials) that elapses between the occurrence of events in a sequence of independent experiments, as in Examples 2.11 and 2.43. Examples where the
modified geometric random variable M¿ arises are: number of customers awaiting service in a queueing system; number of white dots between successive black dots in a
scan of a black-and-white document.
3.5.4 The Poisson Random Variable
In many applications, we are interested in counting the number of occurrences of an
event in a certain time period or in a certain region in space. The Poisson random variable arises in situations where the events occur “completely at random” in time or
space. For example, the Poisson random variable arises in counts of emissions from radioactive substances, in counts of demands for telephone connections, and in counts of
defects in a semiconductor chip.
The pmf for the Poisson random variable is given by

P[N = k] = p_N(k) = (α^k / k!) e^(−α)  for k = 0, 1, 2, …,  (3.39)
where α is the average number of event occurrences in a specified time interval or region in space. Figure 3.9 shows the Poisson pmf for several values of α. For α < 1, P[N = k] is maximum at k = 0; for α > 1, P[N = k] is maximum at ⌊α⌋; if α is a positive integer, then P[N = k] is maximum at k = α and at k = α − 1.
Section 3.5
Important Discrete Random Variables
[Figure 3.9: Probability mass functions of the Poisson random variable for (a) α = 0.75; (b) α = 3; (c) α = 9.]
The pmf of the Poisson random variable sums to one, since

Σ_{k=0}^{∞} (α^k / k!) e^(−α) = e^(−α) Σ_{k=0}^{∞} (α^k / k!) = e^(−α) e^α = 1,

where we used the fact that the second summation is the infinite series expansion for e^α.
It is easy to show that the mean and variance of a Poisson random variable are given by:

E[N] = α  and  σ²_N = VAR[N] = α.
Example 3.30 Queries at a Call Center
The number N of queries arriving in t seconds at a call center is a Poisson random variable with α = λt, where λ is the average arrival rate in queries/second. Assume that the arrival rate is four queries per minute. Find the probability of the following events: (a) more than 4 queries in 10 seconds; (b) fewer than 5 queries in 2 minutes.
The arrival rate in queries/second is λ = 4 queries/60 sec = 1/15 queries/sec. In part a, the time interval is 10 seconds, so we have a Poisson random variable with α = (1/15 queries/sec) × (10 seconds) = 10/15 = 2/3. The probability of interest is evaluated numerically:

P[N > 4] = 1 − P[N ≤ 4] = 1 − Σ_{k=0}^{4} ((2/3)^k / k!) e^(−2/3) = 6.33 × 10^(−4).

In part b, the time interval of interest is t = 120 seconds, so α = (1/15) × 120 = 8. The probability of interest is:

P[N < 5] = P[N ≤ 4] = Σ_{k=0}^{4} (8^k / k!) e^(−8) = 0.10.
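Both numerical answers are easy to reproduce; the Python sketch below (illustrative) evaluates the two cumulative sums:

```python
import math

def poisson_cdf(alpha, k):
    # P[N <= k] = sum_{j=0}^{k} (alpha^j / j!) e^(-alpha)
    return sum(alpha**j / math.factorial(j) for j in range(k + 1)) * math.exp(-alpha)

p_a = 1 - poisson_cdf(2/3, 4)   # (a) alpha = 10/15 = 2/3 for 10 seconds
p_b = poisson_cdf(8, 4)         # (b) alpha = 8; "fewer than 5" is P[N <= 4]

print(p_a)                      # ~ 6.33e-4
print(p_b)                      # ~ 0.0996, i.e., about 0.10
```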
Example 3.31 Arrivals at a Packet Multiplexer
The number N of packet arrivals in t seconds at a multiplexer is a Poisson random variable with α = λt, where λ is the average arrival rate in packets/second. Find the probability that there are no packet arrivals in t seconds.

P[N = 0] = ((λt)^0 / 0!) e^(−λt) = e^(−λt).

This equation has an interesting interpretation. Let Z be the time until the first packet arrival. Suppose we ask, “What is the probability that Z > t, that is, the next arrival occurs t or more seconds later?” Note that {N = 0} implies {Z > t} and vice versa, so P[Z > t] = e^(−λt). The probability of no arrival decreases exponentially with t.
Note that we can also show that

P[N(t) ≥ n] = 1 − P[N(t) < n] = 1 − Σ_{k=0}^{n−1} ((λt)^k / k!) e^(−λt).
One of the applications of the Poisson probabilities in Eq. (3.39) is to approximate the binomial probabilities in the case where p is very small and n is very large,
that is, where the event A of interest is very rare but the number of Bernoulli trials is very large. We show that if α = np is fixed, then as n becomes large:

p_k = (n choose k) p^k (1 − p)^(n−k) ≈ (α^k / k!) e^(−α)  for k = 0, 1, ….  (3.40)
Equation (3.40) is obtained by taking the limit n → ∞ in the expression for p_k, while keeping α = np fixed. First, consider the probability that no events occur in n trials:

p_0 = (1 − p)^n = (1 − α/n)^n → e^(−α)  as n → ∞,  (3.41)
where the limit in the last expression is a well-known result from calculus. Consider the ratio of successive binomial probabilities:

p_(k+1)/p_k = ((n − k) p) / ((k + 1) q) = ((1 − k/n) α) / ((k + 1)(1 − α/n)) → α/(k + 1)  as n → ∞.
Thus the limiting probabilities satisfy

p_(k+1) = (α/(k + 1)) p_k = (α/(k + 1)) (α/k) ⋯ (α/1) p_0 = (α^(k+1) / (k + 1)!) e^(−α),  (3.42)

that is, p_k = (α^k / k!) e^(−α) in the limit.
Thus the Poisson pmf can be used to approximate the binomial pmf for large n and small p, using α = np.
Example 3.32 Errors in Optical Transmission
An optical communication system transmits information at a rate of 10^9 bits/second. The probability of a bit error in the optical communication system is 10^(−9). Find the probability of five or more errors in 1 second.
Each bit transmission corresponds to a Bernoulli trial with a “success” corresponding to a bit error in transmission. The probability of k errors in n = 10^9 transmissions (1 second) is then given by the binomial probability with n = 10^9 and p = 10^(−9). The Poisson approximation uses α = np = 10^9 × 10^(−9) = 1. Thus
P[N ≥ 5] = 1 − P[N < 5] = 1 − Σ_{k=0}^{4} (α^k / k!) e^(−α)
= 1 − e^(−1) {1 + 1/1! + 1/2! + 1/3! + 1/4!} = 0.00366.
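The numerical value can be reproduced with a few lines of Python (illustrative sketch):

```python
import math

alpha = 1.0    # alpha = np = 10^9 * 10^-9
p_lt5 = sum(alpha**k / math.factorial(k) for k in range(5)) * math.exp(-alpha)
p_ge5 = 1 - p_lt5
print(p_ge5)   # ~ 0.00366
```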
The Poisson random variable appears in numerous physical situations because
many models are very large in scale and involve very rare events. For example, the
Poisson pmf gives an accurate prediction for the relative frequencies of the number of
particles emitted by a radioactive mass during a fixed time period. This correspondence can be explained as follows. A radioactive mass is composed of a large number
of atoms, say n. In a fixed time interval each atom has a very small probability p of disintegrating and emitting a radioactive particle. If atoms disintegrate independently of
[Figure 3.10: Event occurrences in n subintervals of [0, T].]
other atoms, then the number of emissions in a time interval can be viewed as the number of successes in n trials. For example, one microgram of radium contains about n = 10^16 atoms, and the probability that a single atom will disintegrate during a one-millisecond time interval is p = 10^(−15) [Rozanov, p. 58]. Thus it is an understatement to say that the conditions for the approximation in Eq. (3.40) hold: n is so large and p so small that one could argue that the limit n → ∞ has been carried out and that the number of emissions is exactly a Poisson random variable.
The Poisson random variable also comes up in situations where we can imagine a
sequence of Bernoulli trials taking place in time or space. Suppose we count the number of event occurrences in a T-second interval. Divide the time interval into a very
large number, n, of subintervals as shown in Fig. 3.10. A pulse in a subinterval indicates
the occurrence of an event. Each subinterval can be viewed as one in a sequence of independent Bernoulli trials if the following conditions hold: (1) At most one event can
occur in a subinterval, that is, the probability of more than one event occurrence is negligible; (2) the outcomes in different subintervals are independent; and (3) the probability of an event occurrence in a subinterval is p = α/n, where α is the average number of events observed in a T-second interval. The number N of events in the T-second interval is then a binomial random variable with parameters n and p = α/n. Thus as n → ∞, N becomes a Poisson random variable with parameter α. In Chapter 9 we will revisit this result when we discuss the Poisson random process.
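This limiting behavior can be illustrated by computing the binomial pmf with p = α/n for increasing n and comparing it with the Poisson pmf. A Python sketch (illustrative; α = 2 is an assumed value):

```python
import math

alpha = 2.0

def poisson(k):
    return alpha**k / math.factorial(k) * math.exp(-alpha)

def binom(n, k):
    p = alpha / n
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# The worst-case pmf discrepancy over k = 0..10 shrinks as n grows.
for n in (10, 100, 1000, 10000):
    err = max(abs(binom(n, k) - poisson(k)) for k in range(11))
    print(n, err)
```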
3.5.5 The Uniform Random Variable
The discrete uniform random variable Y takes on values in a set of consecutive integers S_Y = {j + 1, …, j + L} with equal probability:

p_Y(k) = 1/L  for k ∈ {j + 1, …, j + L}.  (3.43)
This humble random variable occurs whenever outcomes are equally likely, e.g., toss of
a fair coin or a fair die, spinning of an arrow in a wheel divided into equal segments, selection of numbers from an urn. It is easy to show that the mean and variance are:
E[Y] = j + (L + 1)/2  and  VAR[Y] = (L² − 1)/12.
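Both formulas are straightforward to confirm numerically (Python sketch; j = 5 and L = 10 are arbitrary choices):

```python
j, L = 5, 10
values = range(j + 1, j + L + 1)     # S_Y = {j+1, ..., j+L}, each with prob 1/L

mean = sum(values) / L
var = sum(k * k for k in values) / L - mean**2

print(mean)   # j + (L + 1)/2 = 10.5
print(var)    # (L^2 - 1)/12 = 8.25
```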
Example 3.33 Discrete Uniform Random Variable in Unit Interval
Let X be a uniform random variable in S_X = {0, 1, …, L − 1}. We define the discrete uniform random variable in the unit interval by

U = X/L  so that  S_U = {0, 1/L, 2/L, 3/L, …, 1 − 1/L}.
U has pmf:

p_U(k/L) = 1/L  for k = 0, 1, …, L − 1.
The pmf of U puts equal probability mass 1/L on equally spaced points xk = k/L in the unit interval. The probability of a subinterval of the unit interval is equal to the number of points in the
subinterval multiplied by 1/L. As L becomes very large, this probability is essentially the length
of the subinterval.
3.5.6 The Zipf Random Variable
The Zipf random variable is named for George Zipf, who observed that the frequency of words in a large body of text is proportional to their rank. Suppose that words are ranked from most frequent, to next most frequent, and so on. Let X be the rank of a word; then S_X = {1, 2, …, L}, where L is the number of distinct words. The pmf of X is:
p_X(k) = (1/c_L)(1/k)  for k = 1, 2, …, L,  (3.44)
where c_L is a normalization constant. The second word has 1/2 the frequency of occurrence of the first, the third word has 1/3 the frequency of the first, and so on. The normalization constant c_L is given by the sum:
c_L = Σ_{j=1}^{L} 1/j = 1 + 1/2 + 1/3 + ⋯ + 1/L.  (3.45)
The constant c_L appears frequently in calculus and is called the Lth harmonic number; it increases approximately as ln L. For example, for L = 100, c_L = 5.187378 and c_L − ln(L) = 0.582207. It can be shown that as L → ∞, c_L − ln L → 0.57721…, Euler's constant.
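These values are easy to reproduce (illustrative Python sketch):

```python
import math

def harmonic(L):
    # c_L = 1 + 1/2 + ... + 1/L, the Lth harmonic number
    return sum(1.0 / j for j in range(1, L + 1))

c100 = harmonic(100)
print(c100)                                # ~ 5.187378
print(c100 - math.log(100))                # ~ 0.582207
print(harmonic(10**6) - math.log(10**6))   # ~ 0.577216, Euler's constant
```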
The mean of X is given by:

E[X] = Σ_{j=1}^{L} j p_X(j) = Σ_{j=1}^{L} j/(c_L j) = L/c_L.  (3.46)
The second moment and variance of X are:

E[X²] = Σ_{j=1}^{L} j²/(c_L j) = (1/c_L) Σ_{j=1}^{L} j = L(L + 1)/(2 c_L)

and

VAR[X] = L(L + 1)/(2 c_L) − L²/c_L².  (3.47)
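For L = 100 these formulas give E[X] = 100/c_100 ≈ 19.28 (the value used in Example 3.34) and VAR[X] ≈ 602. A quick Python check (illustrative):

```python
L = 100
cL = sum(1.0 / j for j in range(1, L + 1))     # c_100 ~ 5.187378

mean = L / cL                                  # E[X] = L / c_L
second = L * (L + 1) / (2 * cL)                # E[X^2] = L(L+1) / (2 c_L)
var = second - mean**2                         # Eq. (3.47)

print(mean)   # ~ 19.28
print(var)    # ~ 601.9
```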
The Zipf and related random variables have gained prominence with the
growth of the Internet where they have been found in a variety of measurement
studies involving Web page sizes, Web access behavior, and Web page interconnectivity. These random variables had previously been found extensively in studies on the
distribution of wealth and, not surprisingly, are now found in Internet video rentals
and book sales.
[Figure 3.11: Zipf distribution and its long tail: P[X > k] versus k for the Zipf and a geometric random variable.]
[Figure 3.12: Lorenz curve (% wealth versus % population) for the Zipf random variable with L = 100.]
Example 3.34 Rare Events and Long Tails
The Zipf random variable X has the property that a few outcomes (words) occur frequently but
most outcomes occur rarely. Find the probability of words with rank higher than m.
P[X > m] = 1 − P[X ≤ m] = 1 − (1/c_L) Σ_{j=1}^{m} 1/j = 1 − c_m/c_L  for m ≤ L.  (3.48)
We call P[X > m] the probability of the tail of the distribution of X. Figure 3.11 shows P[X > m] with L = 100, which has E[X] = 100/c_100 = 19.28. Figure 3.11 also shows P[Y > m] for a geometric random variable with the same mean, that is, 1/p = 19.28. It can be seen that P[Y > m] for the geometric random variable drops off much more quickly than P[X > m]. The Zipf distribution is said to have a “long tail” because rare events are more likely to occur than in traditional probability models.
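The comparison behind Fig. 3.11 can be reproduced directly. The sketch below (Python, illustrative) builds both tails for L = 100, with the geometric parameter chosen to match the Zipf mean:

```python
L = 100
cL = sum(1.0 / j for j in range(1, L + 1))
p = cL / L                       # geometric with mean 1/p = L/c_L ~ 19.28
q = 1 - p

def zipf_tail(m):
    # P[X > m] = 1 - c_m / c_L
    return 1.0 - sum(1.0 / j for j in range(1, m + 1)) / cL

def geo_tail(m):
    # P[Y > m] = q^m
    return q ** m

# For large m the geometric tail falls far below the Zipf tail.
for m in (20, 40, 60, 80):
    print(m, zipf_tail(m), geo_tail(m))
```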
Example 3.35 80/20 Rule and the Lorenz Curve
Let X correspond to a level of wealth and p_X(k) be the proportion of a population that has wealth k. Suppose that X is a Zipf random variable. Thus p_X(1) is the proportion of the population with wealth 1, p_X(2) the proportion with wealth 2, and so on. The long tail of the Zipf distribution suggests that very rich individuals are not very rare. We frequently hear statements such as “20% of the population owns 80% of the wealth.” The Lorenz curve plots the proportion
of wealth owned by the poorest fraction x of the population, as x varies from 0 to 1. Find the Lorenz curve for L = 100.
For k in {1, 2, …, L}, the fraction of the population with wealth k or less is:

F_k = P[X ≤ k] = (1/c_L) Σ_{j=1}^{k} 1/j = c_k/c_L.  (3.49)
The proportion of wealth owned by the population that has wealth k or less is:

W_k = (Σ_{j=1}^{k} j p_X(j)) / (Σ_{i=1}^{L} i p_X(i)) = ((1/c_L) Σ_{j=1}^{k} (j/j)) / ((1/c_L) Σ_{i=1}^{L} (i/i)) = k/L.  (3.50)
The denominator in the above expression is the total wealth of the entire population. The Lorenz
curve consists of the plot of the points (F_k, W_k), which is shown in Fig. 3.12 for L = 100. In the graph, the poorest 70% of the population owns only 20% of the total wealth; conversely, the wealthiest 30% of the population owns 80% of the wealth. See Problem 3.75 for a discussion of what the Lorenz curve should look like in the cases of extreme fairness and extreme unfairness.
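The points (F_k, W_k) behind Fig. 3.12 can be generated in a few lines. For L = 100, the point nearest F = 0.7 is k = 21, where W = 0.21, essentially the 70%/20% split described in the text. A Python sketch (illustrative):

```python
L = 100
cL = sum(1.0 / j for j in range(1, L + 1))

def F(k):
    # fraction of the population with wealth k or less: c_k / c_L  (Eq. 3.49)
    return sum(1.0 / j for j in range(1, k + 1)) / cL

def W(k):
    # fraction of the total wealth owned by that group: k / L  (Eq. 3.50)
    return k / L

print(F(21), W(21))   # ~ 0.703, 0.21
```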
The explosive growth in the Internet has led to systems of huge scale. For probability models this growth has implied random variables that can attain very large values. Measurement studies have revealed many instances of random variables with long
tail distributions.
If we try to let L approach infinity in Eq. (3.45), c_L grows without bound since the series does not converge. However, if we make the pmf proportional to (1/k)^α, then the series converges as long as α > 1. We define the Zipf or zeta random variable with range {1, 2, 3, …} to have pmf:
p_Z(k) = (1/z_α)(1/k^α)  for k = 1, 2, …,  (3.51)

where z_α is a normalization constant given by the zeta function, which is defined by:

z_α = Σ_{j=1}^{∞} 1/j^α = 1 + 1/2^α + 1/3^α + ⋯  for α > 1.  (3.52)
The convergence of the above series is discussed in standard calculus books.
The mean of Z is given by:

E[Z] = Σ_{j=1}^{∞} j p_Z(j) = (1/z_α) Σ_{j=1}^{∞} 1/j^(α−1) = z_(α−1)/z_α  for α > 2,

where the sum Σ_j 1/j^(α−1) converges only if α − 1 > 1, that is, α > 2. We can similarly show that the second moment (and hence the variance) exists only if α > 3.
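For instance, with α = 3 the mean is E[Z] = z_2/z_3 = (π²/6)/1.20206 ≈ 1.368. Truncated sums confirm this (illustrative Python sketch):

```python
import math

def zeta(a, terms=200000):
    # truncated zeta sum; the tail is negligible here for a >= 2
    return sum(1.0 / j**a for j in range(1, terms + 1))

mean = zeta(2) / zeta(3)            # E[Z] = z_(alpha-1) / z_alpha, alpha = 3
print(mean)                         # ~ 1.3684
print(zeta(2), math.pi**2 / 6)      # truncated z_2 vs the exact value pi^2/6
```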
3.6 GENERATION OF DISCRETE RANDOM VARIABLES
Suppose we wish to generate the outcomes of a random experiment that has sample space S = {a_1, a_2, …, a_n} with probability of elementary events p_j = P[{a_j}]. We divide the unit interval into n subintervals. The jth subinterval has length p_j and
[Figure 3.13: Generating a binomial random variable with n = 5, p = 1/2: the unit interval on the U axis is partitioned into subintervals of length p_X(k).]
corresponds to outcome a_j. Each trial of the experiment first uses rand to obtain a number U in the unit interval. The outcome of the experiment is a_j if U is in the jth subinterval. Figure 3.13 shows the partitioning of the unit interval according to the pmf of an n = 5, p = 0.5 binomial random variable.
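The partitioning method is straightforward to implement from scratch. The sketch below is in Python (illustrative; Octave users can simply call discrete_rnd, as in the examples that follow). It draws U uniform on [0, 1) and returns the outcome whose subinterval contains U:

```python
import random

def discrete_sample(values, pmf, n_samples, rng=random.random):
    """Sample from `values` with probabilities `pmf` by inverting the cdf."""
    # Cumulative boundaries: the jth subinterval has length pmf[j].
    bounds = []
    acc = 0.0
    for prob in pmf:
        acc += prob
        bounds.append(acc)
    bounds[-1] = 1.0              # guard against floating-point round-off
    samples = []
    for _ in range(n_samples):
        u = rng()                 # U uniform in [0, 1)
        for value, bound in zip(values, bounds):
            if u < bound:         # U landed in this subinterval
                samples.append(value)
                break
    return samples

# Example: 10000 tosses of a fair die; the sample mean should be near 3.5.
tosses = discrete_sample([1, 2, 3, 4, 5, 6], [1/6] * 6, 10000)
print(sum(tosses) / len(tosses))
```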
The Octave function discrete_rnd implements the above method and can be
used to generate random numbers with desired probabilities. Functions to generate
random numbers with common distributions are also available. For example,
poisson_rnd (lambda, r, c) can be used to generate an array of Poisson-distributed
random numbers with rate lambda.
Example 3.36 Generation of Tosses of a Die
Use discrete_rnd to generate 20 samples of a toss of a die.
> V=1:6;                              % Define S_X = {1, 2, 3, 4, 5, 6}.
> P=[1/6, 1/6, 1/6, 1/6, 1/6, 1/6];   % Set all the pmf values for X to 1/6.
> discrete_rnd (20, V, P)             % Generate 20 samples from S_X with pmf P.
ans =
6 2 2 6 5 2 6 1 3 6 3 1 6 3 4 2 5 3 4 1
Example 3.37 Generation of Poisson Random Variable
Use the built-in function to generate 20 samples of a Poisson random variable with α = 2.
> poisson_rnd (2, 1, 20)   % Generate a 1 x 20 array of samples of a Poisson
                           % random variable with alpha = 2.
ans =
4 3 0 2 3 2 1 2 1 4 0 1 2 2 3 4 0 1 3
The problems at the end of the chapter elaborate on the rich set of experiments
that can be simulated using these basic capabilities of MATLAB or Octave. In the remainder of this book, we will use Octave in examples because it is freely available.
SUMMARY
• A random variable is a function that assigns a real number to each outcome of a
random experiment. A random variable is defined if the outcome of a random experiment is a number, or if a numerical attribute of an outcome is of interest.
• The notion of an equivalent event enables us to derive the probabilities of events
involving a random variable in terms of the probabilities of events involving the
underlying outcomes.
• A random variable is discrete if it assumes values from some countable set. The
probability mass function is sufficient to calculate the probability of all events
involving a discrete random variable.
• The probability of events involving a discrete random variable X can be expressed as sums of the probability mass function p_X(x).
• If X is a random variable, then Y = g(X) is also a random variable.
• The mean, variance, and moments of a discrete random variable summarize some
of the information about the random variable X. These parameters are useful in
practice because they are easier to measure and estimate than the pmf.
• The conditional pmf allows us to calculate the probability of events given partial
information about the random variable X.
• There are a number of methods for generating discrete random variables with
prescribed pmf’s in terms of a random variable that is uniformly distributed in
the unit interval.
CHECKLIST OF IMPORTANT TERMS
Discrete random variable
Equivalent event
Expected value of X
Function of a random variable
nth moment of X
Probability mass function
Random variable
Standard deviation of X
Variance of X
ANNOTATED REFERENCES
Reference [1] is the standard reference for electrical engineers for the material on random variables. Reference [2] discusses some of the finer points regarding the concepts
of a random variable at a level accessible to students of this course. Reference [3] is a
classic text, rich in detailed examples. Reference [4] presents detailed discussions of the
various methods for generating random numbers with specified distributions. Reference [5] is entirely focused on discrete random variables.
1. A. Papoulis and S. U. Pillai, Probability, Random Variables, and Stochastic
Processes, 4th ed., McGraw-Hill, New York, 2002.
2. K. L. Chung, Elementary Probability Theory, Springer-Verlag, New York, 1974.
3. W. Feller, An Introduction to Probability Theory and Its Applications, Wiley, New
York, 1968.
4. A. M. Law and W. D. Kelton, Simulation Modeling and Analysis, McGraw-Hill,
New York, 2000.
5. N. L. Johnson, A. W. Kemp, and S. Kotz, Univariate Discrete Distributions, Wiley,
New York, 2005.
6. Y. A. Rozanov, Probability Theory: A Concise Course, Dover Publications, New
York, 1969.
PROBLEMS
Section 3.1: The Notion of a Random Variable
3.1. Let X be the maximum of the number of heads obtained when Carlos and Michael each
flip a fair coin twice.
(a) Describe the underlying space S of this random experiment and specify the probabilities of its elementary events.
(b) Show the mapping from S to SX , the range of X.
(c) Find the probabilities for the various values of X.
3.2. A die is tossed and the random variable X is defined as the number of full pairs of dots in
the face showing up.
(a) Describe the underlying space S of this random experiment and specify the probabilities of its elementary events.
(b) Show the mapping from S to SX , the range of X.
(c) Find the probabilities for the various values of X.
(d) Repeat parts a, b, and c, if Y is the number of full or partial pairs of dots in the face
showing up.
(e) Explain why P3X = 04 and P3Y = 04 are not equal.
3.3. The loose minute hand of a clock is spun hard. The coordinates (x, y) of the point where the tip of the hand comes to rest are noted. Z is defined as the sgn function of the product of x and y, where sgn(t) is 1 if t > 0, 0 if t = 0, and −1 if t < 0.
(a) Describe the underlying space S of this random experiment and specify the probabilities of its events.
(b) Show the mapping from S to S_Z, the range of Z.
(c) Find the probabilities for the various values of Z.
3.4. A data source generates hexadecimal characters. Let X be the integer value corresponding to a hex character. Suppose that the four binary digits in the character are independent and each is equally likely to be 0 or 1.
(a) Describe the underlying space S of this random experiment and specify the probabilities of its elementary events.
(b) Show the mapping from S to SX , the range of X.
(c) Find the probabilities for the various values of X.
(d) Let Y be the integer value of a hex character but suppose that the most significant bit
is three times as likely to be a “0” as a “1”. Find the probabilities for the values of Y.
3.5. Two transmitters send messages through bursts of radio signals to an antenna. During each time slot each transmitter sends a message with probability 1/2. Simultaneous transmissions result in loss of the messages. Let X be the number of time slots until the first message gets through.
(a) Describe the underlying sample space S of this random experiment and specify the
probabilities of its elementary events.
(b) Show the mapping from S to SX , the range of X.
(c) Find the probabilities for the various values of X.
3.6. An information source produces binary triplets {000, 111, 010, 101, 001, 110, 100, 011} with corresponding probabilities {1/4, 1/4, 1/8, 1/8, 1/16, 1/16, 1/16, 1/16}. A binary code assigns a codeword of length −log₂ p_k to triplet k. Let X be the length of the string assigned to the output of the information source.
(a) Show the mapping from S to SX , the range of X.
(b) Find the probabilities for the various values of X.
3.7. An urn contains 9 $1 bills and one $50 bill. Let the random variable X be the total
amount that results when two bills are drawn from the urn without replacement.
(a) Describe the underlying space S of this random experiment and specify the probabilities of its elementary events.
(b) Show the mapping from S to SX , the range of X.
(c) Find the probabilities for the various values of X.
3.8. An urn contains 9 $1 bills and one $50 bill. Let the random variable X be the total
amount that results when two bills are drawn from the urn with replacement.
(a) Describe the underlying space S of this random experiment and specify the probabilities of its elementary events.
(b) Show the mapping from S to SX , the range of X.
(c) Find the probabilities for the various values of X.
3.9. A coin is tossed n times. Let the random variable Y be the difference between the number of heads and the number of tails in the n tosses. Assume P[heads] = p.
(a) Describe the sample space S.
(b) Find the probability of the event {Y = 0}.
(c) Find the probabilities for the other values of Y.
3.10. An m-bit password is required to access a system. A hacker systematically works through
all possible m-bit patterns. Let X be the number of patterns tested until the correct password is found.
(a) Describe the sample space S.
(b) Show the mapping from S to SX , the range of X.
(c) Find the probabilities for the various values of X.
Section 3.2: Discrete Random Variables and Probability Mass Function
3.11. Let X be the maximum of the coin tosses in Problem 3.1.
(a) Compare the pmf of X with the pmf of Y, the number of heads in two tosses of a fair
coin. Explain the difference.
(b) Suppose that Carlos uses a coin with probability of heads p = 3/4. Find the pmf
of X.
3.12. Consider an information source that produces binary pairs that we designate as S_X = {1, 2, 3, 4}. Find and plot the pmf in the following cases:
(a) p_k = p_1/k for all k in S_X.
(b) p_(k+1) = p_k/2 for k = 2, 3, 4.
(c) p_(k+1) = p_k/2^k for k = 2, 3, 4.
(d) Can the random variables in parts a, b, and c be extended to take on values in the set {1, 2, …}? If yes, specify the pmf of the resulting random variables. If no, explain why not.
3.13. Let X be a random variable with pmf p_k = c/k² for k = 1, 2, ….
(a) Estimate the value of c numerically. Note that the series converges.
(b) Find P[X > 4].
(c) Find P[6 ≤ X ≤ 8].
3.14. Compare P[X ≥ 8] and P[Y ≥ 8] for outputs of the data source in Problem 3.4.
3.15. In Problem 3.5 suppose that terminal 1 transmits with probability 1/2 in a given time slot, but terminal 2 transmits with probability p.
(a) Find the pmf for the number of transmissions X until a message gets through.
(b) Given a successful transmission, find the probability that terminal 2 transmitted.
3.16. (a) In Problem 3.7 what is the probability that the amount drawn from the urn is more than $2? More than $50?
(b) Repeat part a for Problem 3.8.
3.17. A modem transmits a +2 voltage signal into a channel. The channel adds to this signal a noise term that is drawn from the set {0, −1, −2, −3} with respective probabilities {4/10, 3/10, 2/10, 1/10}.
(a) Find the pmf of the output Y of the channel.
(b) What is the probability that the output of the channel is equal to the input of the channel?
(c) What is the probability that the output of the channel is positive?
3.18. A computer reserves a path in a network for 10 minutes. To extend the reservation, the computer must successfully send a “refresh” message before the expiry time. However, messages are lost with probability 1/2. Suppose that it takes 10 seconds to send a refresh request and receive an acknowledgment. When should the computer start sending refresh messages in order to have a 99% chance of successfully extending the reservation time?
3.19. A modem transmits over an error-prone channel, so it repeats every “0” or “1” bit transmission five times. We call each such group of five bits a “codeword.” The channel changes an input bit to its complement with probability p = 1/10, and it does so independently of its treatment of other input bits. The modem receiver takes a majority vote of the five received bits to estimate the input signal. Find the probability that the receiver makes the wrong decision.
3.20. Two dice are tossed and we let X be the difference in the number of dots facing up.
(a) Find and plot the pmf of X.
(b) Find the probability that |X| ≤ k for all k.
Section 3.3: Expected Value and Moments of Discrete Random Variable
3.21. (a) In Problem 3.11, compare E[Y] to E[X] where X is the maximum of coin tosses.
(b) Compare VAR[X] and VAR[Y].
3.22. Find the expected value and variance of the output of the information sources in Problem
3.12, parts a, b, and c.
3.23. (a) Find E[X] for the hex integers in Problem 3.4.
(b) Find VAR[X].
3.24. Find the mean codeword length in Problem 3.6. How can this average be interpreted in a
very large number of encodings of binary triplets?
3.25. (a) Find the mean and variance of the amount drawn from the urn in Problem 3.7.
(b) Find the mean and variance of the amount drawn from the urn in Problem 3.8.
3.26. Find E[Y] and VAR[Y] for the difference between the number of heads and tails in Problem
3.9. In a large number of repetitions of this random experiment, what is the meaning of E[Y]?
3.27. Find E[X] and VAR[X] in Problem 3.13.
3.28. Find the expected value and variance of the modem signal in Problem 3.17.
3.29. Find the mean and variance of the time that it takes to renew the reservation in Problem 3.18.
3.30. The modem in Problem 3.19 transmits 1000 5-bit codewords. What is the average number
of codewords in error? If the modem transmits 1000 bits individually without repetition,
what is the average number of bits in error? Explain how error rate is traded off against
transmission speed.
3.31. (a) Suppose a fair coin is tossed n times. Each coin toss costs d dollars and the reward in obtaining X heads is aX² + bX. Find the expected value of the net reward.
(b) Suppose that the reward in obtaining X heads is a^X, where a > 0. Find the expected value of the reward.
3.32. Let g(X) = I_A, where A = {X > 10}.
(a) Find E[g(X)] for X as in Problem 3.12a with S_X = {1, 2, …, 15}.
(b) Repeat part a for X as in Problem 3.12b with S_X = {1, 2, …, 15}.
(c) Repeat part a for X as in Problem 3.12c with S_X = {1, 2, …, 15}.
3.33. Let g(X) = (X − 10)⁺ (see Example 3.19).
(a) Find E[g(X)] for X as in Problem 3.12a with S_X = {1, 2, …, 15}.
(b) Repeat part a for X as in Problem 3.12b with S_X = {1, 2, …, 15}.
(c) Repeat part a for X as in Problem 3.12c with S_X = {1, 2, …, 15}.
3.34. Consider the St. Petersburg Paradox in Example 3.16. Suppose that the casino has a total of M = 2^m dollars, and so it can only afford a finite number of coin tosses.
(a) How many tosses can the casino afford?
(b) Find the expected payoff to the player.
(c) How much should a player be willing to pay to play this game?
Section 3.4: Conditional Probability Mass Function
3.35. (a) In Problem 3.11a, find the conditional pmf of X, the maximum of coin tosses, given that X > 0.
(b) Find the conditional pmf of X given that Michael got one head in two tosses.
(c) Find the conditional pmf of X given that Michael got one head in the first toss.
(d) In Problem 3.11b, find the probability that Carlos got the maximum given that X = 2.
3.36. Find the conditional pmf for the quaternary information source in Problem 3.12, parts a, b, and c given that X < 4.
3.37. (a) Find the conditional pmf of the hex integer X in Problem 3.4 given that X < 8.
(b) Find the conditional pmf of X given that the first bit is 0.
(c) Find the conditional pmf of X given that the 4th bit is 0.
3.38. (a) Find the conditional pmf of X in Problem 3.5 given that no message gets through in
time slot 1.
(b) Find the conditional pmf of X given that the first transmitter transmitted in time slot 1.
134
Chapter 3
Discrete Random Variables
3.39. (a) Find the conditional expected value of X in Problem 3.5 given that no message gets through in the first time slot. Show that E[X | X > 1] = E[X] + 1.
(b) Find the conditional expected value of X in Problem 3.5 given that a message gets
through in the first time slot.
(c) Find E[X] by using the results of parts a and b.
(d) Find E[X²] and VAR[X] using the approach in parts b and c.
3.40. Explain why Eq. (3.31b) can be used to find E[X²], but it cannot be used to directly find
VAR[X].
3.41. (a) Find the conditional pmf for X in Problem 3.7 given that the first draw produced k
dollars.
(b) Find the conditional expected value corresponding to part a.
(c) Find E[X] using the results from part b.
(d) Find E[X²] and VAR[X] using the approach in parts b and c.
3.42. Find E[Y] and VAR[Y] for the difference between the number of heads and tails in n
tosses in Problem 3.9. Hint: Condition on the number of heads.
3.43. (a) In Problem 3.10 find the conditional pmf of X given that the password has not been
found after k tries.
(b) Find the conditional expected value of X given X > k.
(c) Find E[X] from the results in part b.
Section 3.5: Important Discrete Random Variables
3.44. Indicate the value of the indicator function for the event A, I_A(z), for each z in the sample space S. Find the pmf and expected value of I_A.
(a) S = 51, 2, 3, 4, 56 and A = 5z 7 36.
(b) S = 30, 14 and A = 50.3 6 z … 0.76.
(c) S = 5z = 1x, y2 : 0 6 x 6 1, 0 6 y 6 16 and
A = 5z = 1x, y2 : 0.25 6 x + y 6 1.256.
(d) S = 1- q , q 2 and A = 5z 7 a6.
3.45. Let A and B be events for a random experiment with sample space S. Show that the
Bernoulli random variable satisfies the following properties:
(a) IS = 1 and I∅ = 0.
(b) IA∩B = IAIB and IA∪B = IA + IB − IAIB.
(c) Find the expected value of the indicator functions in parts a and b.
3.46. Heat must be removed from a system according to how fast it is generated. Suppose the
system has eight components each of which is active with probability 0.25, independently
of the others. The design of the heat removal system requires finding the probabilities of
the following events:
(a) None of the components is active.
(b) Exactly one is active.
(c) More than four are active.
(d) More than two and fewer than six are active.
3.47. Eight numbers are selected at random from the unit interval.
(a) Find the probability that the first four numbers are less than 0.25 and the last four
are greater than 0.25.
Problems
135
(b) Find the probability that four numbers are less than 0.25 and four are greater than 0.25.
(c) Find the probability that the first three numbers are less than 0.25, the next two are
between 0.25 and 0.75, and the last three are greater than 0.75.
(d) Find the probability that three numbers are less than 0.25, two are between 0.25 and
0.75, and three are greater than 0.75.
(e) Find the probability that the first four numbers are less than 0.25 and the last four
are greater than 0.75.
(f) Find the probability that four numbers are less than 0.25 and four are greater than 0.75.
3.48. (a) Plot the pmf of the binomial random variable with n = 4 and n = 5, and
p = 0.10, p = 0.5, and p = 0.90.
(b) Use Octave to plot the pmf of the binomial random variable with n = 100 and
p = 0.10, p = 0.5, and p = 0.90.
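Although Problem 3.48 asks for Octave, the same pmf values can be tabulated with a few lines of standard-library Python; this is an illustrative sketch, not part of the original problem:

```python
from math import comb

def binomial_pmf(n, p):
    """Return the list [P[X = k] for k = 0, ..., n] for X ~ binomial(n, p)."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

# Tabulate the cases in part (a); plotting is left to whatever tool is at hand.
for n in (4, 5):
    for p in (0.10, 0.5, 0.90):
        pmf = binomial_pmf(n, p)
        print(f"n={n}, p={p}: {[round(q, 4) for q in pmf]}")
```

Note how the pmf for p = 0.90 mirrors the one for p = 0.10, as the symmetry of the binomial coefficients suggests.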
3.49. Let X be a binomial random variable that results from the performance of n Bernoulli
trials with probability of success p.
(a) Suppose that X = 1. Find the probability that the single event occurred in the kth
Bernoulli trial.
(b) Suppose that X = 2. Find the probability that the two events occurred in the jth and
kth Bernoulli trials where j < k.
(c) In light of your answers to parts a and b in what sense are the successes distributed
“completely at random” over the n Bernoulli trials?
3.50. Let X be the binomial random variable.
(a) Show that

pX(k + 1)/pX(k) = [(n − k)/(k + 1)] · [p/(1 − p)],   where pX(0) = (1 − p)^n.

(b) Show that part a implies that: (1) P[X = k] is maximum at k_max = [(n + 1)p],
where [x] denotes the largest integer that is smaller than or equal to x; and (2) when
(n + 1)p is an integer, then the maximum is achieved at k_max and k_max − 1.
3.51. Consider the expression (a + b + c)^n.
(a) Use the binomial expansion for (a + b) and c to obtain an expression for (a + b + c)^n.
(b) Now expand all terms of the form (a + b)^k and obtain an expression that involves
the multinomial coefficient for M = 3 mutually exclusive events, A₁, A₂, A₃.
(c) Let p₁ = P[A₁], p₂ = P[A₂], p₃ = P[A₃]. Use the result from part b to show that
the multinomial probabilities add to one.
3.52. A sequence of characters is transmitted over a channel that introduces errors with probability p = 0.01.
(a) What is the pmf of N, the number of error-free characters between erroneous characters?
(b) What is E[N]?
(c) Suppose we want to be 99% sure that at least 1000 characters are received correctly
before a bad one occurs. What is the appropriate value of p?
3.53. Let N be a geometric random variable with SN = {1, 2, …}.
(a) Find P[N = k | N ≤ m].
(b) Find the probability that N is odd.
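Answers to Problem 3.53 can be sanity-checked numerically. The sketch below (ours, not the text's) sums the geometric pmf over odd k and compares against 1/(2 − p), which follows from summing the geometric series over odd indices:

```python
def geometric_pmf(k, p):
    """pmf of a geometric random variable on S_N = {1, 2, ...}."""
    return (1 - p)**(k - 1) * p

p = 0.3
# Part (b): P[N odd] by truncated summation; compare with 1/(2 - p),
# obtained by summing the geometric series over odd k.
p_odd = sum(geometric_pmf(k, p) for k in range(1, 2001, 2))
print(p_odd, 1 / (2 - p))
```

The truncation error after 2000 terms is negligible because (1 − p)^2000 underflows to zero.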
3.54. Let M be a geometric random variable. Show that M satisfies the memoryless property:
P[M ≥ k + j | M ≥ j + 1] = P[M ≥ k] for all j, k > 1.
3.55. Let X be a discrete random variable that assumes only nonnegative integer values and
that satisfies the memoryless property. Show that X must be a geometric random variable. Hint: Find an equation that must be satisfied by g(m) = P[M ≥ m].
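The memoryless identity in Problem 3.54 is easy to check numerically before proving it. This Python sketch (with an assumed p = 0.4 chosen only for illustration) computes the tail probabilities of the geometric law directly:

```python
def tail(m, p):
    """P[M >= m] for a geometric random variable on {1, 2, ...}."""
    return (1 - p)**(m - 1)

p = 0.4
for j in range(1, 6):
    for k in range(1, 6):
        lhs = tail(k + j, p) / tail(j + 1, p)  # P[M >= k+j | M >= j+1]
        rhs = tail(k, p)                       # P[M >= k]
        assert abs(lhs - rhs) < 1e-12
print("memoryless property verified on the sampled (j, k) grid")
```

The ratio of tails collapses because (1 − p)^(k+j−1)/(1 − p)^j = (1 − p)^(k−1), which is exactly the claim to be proved.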
3.56. An audio player uses a low-quality hard drive. The initial cost of building the player is
$50. The hard drive fails after each month of use with probability 1/12. The cost to repair
the hard drive is $20. If a 1-year warranty is offered, how much should the manufacturer
charge so that the probability of losing money on a player is 1% or less? What is the average cost per player?
3.57. A Christmas fruitcake has Poisson-distributed independent numbers of sultana raisins,
iridescent red cherry bits, and radioactive green cherry bits with respective averages 48,
24, and 12 bits per cake. Suppose you politely accept 1/12 of a slice of the cake.
(a) What is the probability that you get lucky and get no green bits in your slice?
(b) What is the probability that you get really lucky and get no green bits and two or
fewer red bits in your slice?
(c) What is the probability that you get extremely lucky and get no green or red bits and
more than five raisins in your slice?
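A numerical sketch for Problem 3.57, assuming (as the scaling of Poisson means suggests) that a 1/12 slice carries independent Poisson counts with means 48/12 = 4 raisins, 24/12 = 2 red bits, and 12/12 = 1 green bit; this slice-mean assumption is ours, not stated in the problem:

```python
from math import exp, factorial

def poisson_pmf(k, a):
    """P[N = k] for a Poisson random variable with mean a."""
    return a**k * exp(-a) / factorial(k)

green, red, raisin = 1.0, 2.0, 4.0   # assumed slice means: 12/12, 24/12, 48/12
p_a = poisson_pmf(0, green)                                    # part (a)
p_b = p_a * sum(poisson_pmf(k, red) for k in range(3))         # part (b)
p_c = (poisson_pmf(0, green) * poisson_pmf(0, red)
       * (1 - sum(poisson_pmf(k, raisin) for k in range(6))))  # part (c)
print(round(p_a, 4), round(p_b, 4), round(p_c, 4))
```

Independence of the three counts lets the joint probabilities factor into products of marginal Poisson probabilities.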
3.58. The number of orders waiting to be processed is given by a Poisson random variable with
parameter α = λ/nμ, where λ is the average number of orders that arrive in a day, μ is
the number of orders that can be processed by an employee per day, and n is the number
of employees. Let λ = 5 and μ = 1. Find the number of employees required so the probability that more than four orders are waiting is less than 10%. What is the probability
that there are no orders waiting?
3.59. The number of page requests that arrive at a Web server is a Poisson random variable
with an average of 6000 requests per minute.
(a) Find the probability that there are no requests in a 100-ms period.
(b) Find the probability that there are between 5 and 10 requests in a 100-ms period.
3.60. Use Octave to plot the pmf of the Poisson random variable with α = 0.1, 0.75, 2, 20.
3.61. Find the mean and variance of a Poisson random variable.
3.62. For the Poisson random variable, show that for α < 1, P[N = k] is maximum at k = 0;
for α > 1, P[N = k] is maximum at [α]; and if α is a positive integer, then P[N = k] is
maximum at k = α, and at k = α − 1. Hint: Use the approach of Problem 3.50.
3.63. Compare the Poisson approximation and the binomial probabilities for k = 0, 1, 2, 3 and
n = 10, p = 0.1; n = 20 and p = 0.05; and n = 100 and p = 0.01.
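The comparison in Problem 3.63 can be tabulated directly; in each case the Poisson mean is matched to α = np = 1. An illustrative Python sketch:

```python
from math import comb, exp, factorial

def binom(k, n, p):
    """Binomial probability P[X = k]."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson(k, a):
    """Poisson probability P[N = k] with mean a."""
    return a**k * exp(-a) / factorial(k)

for n, p in ((10, 0.1), (20, 0.05), (100, 0.01)):
    a = n * p   # matched mean, here a = 1 in all three cases
    for k in range(4):
        print(n, p, k, round(binom(k, n, p), 5), round(poisson(k, a), 5))
```

The agreement visibly improves as n grows and p shrinks with np held fixed, which is the content of the Poisson approximation.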
3.64. At a given time, the number of households connected to the Internet is a Poisson random
variable with mean 50. Suppose that the transmission bit rate available for the household
is 20 Megabits per second.
(a) Find the probability distribution of the transmission bit rate per user.
(b) Find the transmission bit rate that is available to a user with probability 90% or
higher.
(c) What is the probability that a user has a share of 1 Megabit per second or higher?
3.65. An LCD display has 1000 × 750 pixels. A display is accepted if it has 15 or fewer faulty
pixels. The probability that a pixel is faulty coming out of the production line is 10⁻⁵. Find
the proportion of displays that are accepted.
3.66. A data center has 10,000 disk drives. Suppose that a disk drive fails in a given day with
probability 10⁻³.
(a) Find the probability that there are no failures in a given day.
(b) Find the probability that there are fewer than 10 failures in two days.
(c) Find the number of spare disk drives that should be available so that all failures in a
day can be replaced with probability 99%.
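Part (c) of Problem 3.66 amounts to finding the smallest s with P[N ≤ s] ≥ 0.99. A brute-force Python sketch using the exact binomial model (n = 10,000, p = 10⁻³), offered as an illustration rather than the book's intended method:

```python
from math import comb

n, p = 10_000, 1e-3

def binom_pmf(k):
    """P[N = k] for N ~ binomial(n, p), the number of failures in a day."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Smallest s such that P[N <= s] >= 0.99.
cdf, s = 0.0, 0
while True:
    cdf += binom_pmf(s)
    if cdf >= 0.99:
        break
    s += 1
print("spares needed:", s)
```

Since np = 10 with n large and p small, a Poisson(10) model gives essentially the same answer, which connects this problem to the preceding ones.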
3.67. A binary communication channel has a probability of bit error of 10⁻⁶. Suppose that
transmissions occur in blocks of 10,000 bits. Let N be the number of errors introduced by
the channel in a transmission block.
(a) Find P[N = 0] and P[N ≤ 3].
(b) For what value of p will the probability of 1 or more errors in a block be 99%?
3.68. Find the mean and variance of the uniform discrete random variable that takes on values
in the set {1, 2, …, L} with equal probability. You will need the following formulas:

Σ_{i=1}^n i = n(n + 1)/2   and   Σ_{i=1}^n i² = n(n + 1)(2n + 1)/6.
3.69. A voltage X is uniformly distributed in the set {−3, …, 3, 4}.
(a) Find the mean and variance of X.
(b) Find the mean and variance of Y = −2X² + 3.
(c) Find the mean and variance of W = cos(πX/8).
(d) Find the mean and variance of Z = cos²(πX/8).
3.70. Ten news Web sites are ranked in terms of popularity, and the frequency of requests to
these sites is known to follow a Zipf distribution.
(a) What is the probability that a request is for the top-ranked site?
(b) What is the probability that a request is for one of the bottom five sites?
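Assuming the Zipf pmf used in this chapter, pX(k) = (1/k)/H_L for ranks k = 1, …, L with H_L the harmonic sum (adjust the normalization if the text defines it differently), Problem 3.70 reduces to a few lines of Python:

```python
def zipf_pmf(L):
    """pmf pX(k) proportional to 1/k over ranks k = 1, ..., L."""
    h = sum(1.0 / k for k in range(1, L + 1))   # harmonic sum H_L
    return [(1.0 / k) / h for k in range(1, L + 1)]

pmf = zipf_pmf(10)
print("top-ranked site:", round(pmf[0], 4))
print("bottom five sites:", round(sum(pmf[5:]), 4))
```

Note how heavily the distribution concentrates on the top ranks: the single top site outweighs the bottom five combined.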
3.71. A collection of 1000 words is known to have a Zipf distribution.
(a) What is the probability of the 10 top-ranked words?
(b) What is the probability of the 10 lowest-ranked words?
3.72. What is the shape of the log of the Zipf probability vs. the log of the rank?
3.73. Plot the mean and variance of the Zipf random variable for L = 1 to L = 100.
3.74. An online video store has 10,000 titles. In order to provide fast response, the store caches
the most popular titles. How many titles should be in the cache so that with probability
99% an arriving video request will be in the cache?
3.75. (a) Income distribution is perfectly equal if every individual has the same income. What
is the Lorenz curve in this case?
(b) In a perfectly unequal income distribution, one individual has all the income and all
others have none. What is the Lorenz curve in this case?
3.76. Let X be a geometric random variable in the set {1, 2, …}.
(a) Find the pmf of X.
(b) Find the Lorenz curve of X. Assume L is infinite.
(c) Plot the curve for p = 0.1, 0.5, 0.9.
3.77. Let X be a zeta random variable with parameter α.
(a) Find an expression for P[X ≤ k].
(b) Plot the pmf of X for α = 1.5, 2, and 3.
(c) Plot P[X ≤ k] for α = 1.5, 2, and 3.
Section 3.6: Generation of Discrete Random Variables
3.78. Octave provides function calls to evaluate the pmf of important discrete random variables. For example, the function Poisson_pdf(x, lambda) computes the pmf at x for the
Poisson random variable.
(a) Plot the Poisson pmf for λ = 0.5, 5, 50, as well as P[X ≤ k] and P[X > k].
(b) Plot the binomial pmf for n = 48 and p = 0.10, 0.30, 0.50, 0.75, as well as P[X ≤ k]
and P[X > k].
(c) Compare the binomial probabilities with the Poisson approximation for n = 100,
p = 0.01.
3.79. The discrete_pdf function in Octave makes it possible to specify an arbitrary pmf for a
specified SX.
(a) Plot the pmf for Zipf random variables with L = 10, 100, 1000, as well as P[X ≤ k]
and P[X > k].
(b) Plot the pmf for the reward in the St. Petersburg Paradox for m = 20 in Problem 3.34, as
well as P[X ≤ k] and P[X > k]. (You will need to use a log scale for the values of k.)
3.80. Use Octave to plot the Lorenz curve for the Zipf random variables in Problem 3.79a.
3.81. Repeat Problem 3.80 for the binomial random variable with n = 100 and p = 0.1, 0.5,
and 0.9.
3.82. (a) Use the discrete_rnd function in Octave to simulate the urn experiment discussed in
Section 1.3. Compute the relative frequencies of the outcomes in 1000 draws from the urn.
(b) Use the discrete_pdf function in Octave to specify a pmf for a binomial random
variable with n = 5 and p = 0.2. Use discrete_rnd to generate 100 samples and
plot the relative frequencies.
(c) Use binomial_rnd to generate the 100 samples in part b.
3.83. Use the discrete_rnd function to generate 200 samples of the Zipf random variable in Problem 3.79a. Plot the sequence of outcomes as well as the overall relative
frequencies.
3.84. Use the discrete_rnd function to generate 200 samples of the St. Petersburg Paradox
random variable in Problem 3.79b. Plot the sequence of outcomes as well as the overall
relative frequencies.
3.85. Use Octave to generate 200 pairs of numbers, (Xi, Yi), in which the components are independent, and each component is uniform in the set {1, 2, …, 9, 10}.
(a) Plot the relative frequencies of the X and Y outcomes.
(b) Plot the relative frequencies of the random variable Z = X + Y. Can you discern
the pmf of Z?
(c) Plot the relative frequencies of W = XY. Can you discern the pmf of W?
(d) Plot the relative frequencies of V = X/Y. Is the pmf discernable?
3.86. Use Octave function binomial_rnd to generate 200 pairs of numbers, (Xi, Yi), in which
the components are independent, and where Xi are binomial with parameter
n = 8, p = 0.5 and Yi are binomial with parameter n = 4, p = 0.5.
(a) Plot the relative frequencies of the X and Y outcomes.
(b) Plot the relative frequencies of the random variable Z = X + Y. Does this correspond to the pmf you would expect? Explain.
3.87. Use Octave function Poisson_rnd to generate 200 pairs of numbers, (Xi, Yi), in which
the components are independent, and where Xi are the number of arrivals to a system in
one second and Yi are the number of arrivals to the system in the next two seconds. Assume that the arrival rate is five customers per second.
(a) Plot the relative frequencies of the X and Y outcomes.
(b) Plot the relative frequencies of the random variable Z = X + Y. Does this correspond to the pmf you would expect? Explain.
Problems Requiring Cumulative Knowledge
3.88. The fraction of defective items in a production line is p. Each item is tested and defective
items are identified correctly with probability a.
(a) Assume nondefective items always pass the test. What is the probability that k items
are tested until a defective item is identified?
(b) Suppose that the identified defective items are removed. What proportion of the
remaining items is defective?
(c) Now suppose that nondefective items are identified as defective with probability b.
Repeat part b.
3.89. A data transmission system uses messages of duration T seconds. After each message
transmission, the transmitter stops and waits T seconds for a reply from the receiver. The receiver immediately replies with a message indicating that a message was received correctly.
The transmitter proceeds to send a new message if it receives a reply within T seconds; otherwise, it retransmits the previous message. Suppose that messages can be completely garbled while in transit and that this occurs with probability p. Find the maximum possible rate
at which messages can be successfully transmitted from the transmitter to the receiver.
3.90. An inspector selects every nth item in a production line for a detailed inspection. Suppose that the time between item arrivals is an exponential random variable with mean 1
minute, and suppose that it takes 2 minutes to inspect an item. Find the smallest value of
n such that with a probability of 90% or more, the inspection is completed before the arrival of the next item that requires inspection.
3.91. The number X of photons counted by a receiver in an optical communication system is a
Poisson random variable with rate λ₁ when a signal is present and a Poisson random variable
with rate λ₀ < λ₁ when a signal is absent. Suppose that a signal is present with probability p.
(a) Find P[signal present | X = k] and P[signal absent | X = k].
(b) The receiver uses the following decision rule:
If P[signal present | X = k] > P[signal absent | X = k], decide signal present;
otherwise, decide signal absent.
Show that this decision rule leads to the following threshold rule:
If X > T, decide signal present; otherwise, decide signal absent.
(c) What is the probability of error for the above decision rule?
3.92. A binary information source (e.g., a document scanner) generates very long strings of 0’s
followed by occasional 1’s. Suppose that symbols are independent and that p = P[symbol = 0]
is very close to one. Consider the following scheme for encoding the run X of 0’s between
consecutive 1’s:
1. If X = n, express n as a multiple of an integer M = 2^m and a remainder r, that is, find
k and r such that n = kM + r, where 0 ≤ r ≤ M − 1;
2. The binary codeword for n then consists of a prefix consisting of k 0’s followed by a 1,
and a suffix consisting of the m-bit representation of the remainder r. The decoder can
deduce the value of n from this binary string.
(a) Find the probability that the prefix has k zeros, assuming that p^M = 1/2.
(b) Find the average codeword length when p^M = 1/2.
(c) Find the compression ratio, which is defined as the ratio of the average run length
to the average codeword length when p^M = 1/2.
CHAPTER 4
One Random Variable
In Chapter 3 we introduced the notion of a random variable and we developed methods for calculating probabilities and averages for the case where the random variable is
discrete. In this chapter we consider the general case where the random variable may
be discrete, continuous, or of mixed type. We introduce the cumulative distribution
function which is used in the formal definition of a random variable, and which can
handle all three types of random variables. We also introduce the probability density
function for continuous random variables. The probabilities of events involving a random variable can be expressed as integrals of its probability density function. The expected value of continuous random variables is also introduced and related to our
intuitive notion of average. We develop a number of methods for calculating probabilities and averages that are the basic tools in the analysis and design of systems that involve randomness.
4.1
THE CUMULATIVE DISTRIBUTION FUNCTION
The probability mass function of a discrete random variable was defined in terms of
events of the form {X = b}. The cumulative distribution function is an alternative approach which uses events of the form {X ≤ b}. The cumulative distribution function
has the advantage that it is not limited to discrete random variables and applies to all
types of random variables. We begin with a formal definition of a random variable.

Definition: Consider a random experiment with sample space S and event
class F. A random variable X is a function from the sample space S to R with
the property that the set Ab = {ζ : X(ζ) ≤ b} is in F for every b in R.

The definition simply requires that every set Ab have a well-defined probability in
the underlying random experiment, and this is not a problem in the cases we will consider.
Why does the definition use sets of the form {ζ : X(ζ) ≤ b} and not {ζ : X(ζ) = b}?
We will see that all events of interest in the real line can be expressed in terms of sets of
the form {ζ : X(ζ) ≤ b}.
The cumulative distribution function (cdf) of a random variable X is defined as
the probability of the event {X ≤ x}:

FX(x) = P[X ≤ x]   for −∞ < x < +∞,   (4.1)
that is, it is the probability that the random variable X takes on a value in the set
(−∞, x]. In terms of the underlying sample space, the cdf is the probability of the
event {ζ : X(ζ) ≤ x}. The event {X ≤ x} and its probability vary as x is varied; in
other words, FX(x) is a function of the variable x.
The cdf is simply a convenient way of specifying the probability of all semi-infinite intervals of the real line of the form (−∞, b]. The events of interest when dealing
with numbers are intervals of the real line, and their complements, unions, and intersections. We show below that the probabilities of all of these events can be expressed in
terms of the cdf.
The cdf has the following interpretation in terms of relative frequency. Suppose
that the experiment that yields the outcome ζ, and hence X(ζ), is performed a large
number of times. FX(b) is then the long-term proportion of times in which X(ζ) ≤ b.
Before developing the general properties of the cdf, we present examples of the
cdfs for three basic types of random variables.
Example 4.1 Three Coin Tosses
Figure 4.1(a) shows the cdf of X, the number of heads in three tosses of a fair coin. From Example 3.1
we know that X takes on only the values 0, 1, 2, and 3 with probabilities 1/8, 3/8, 3/8, and 1/8, respectively, so FX(x) is simply the sum of the probabilities of the outcomes from {0, 1, 2, 3} that are less
than or equal to x. The resulting cdf is seen to be a nondecreasing staircase function that grows from
0 to 1. The cdf has jumps at the points 0, 1, 2, 3 of magnitudes 1/8, 3/8, 3/8, and 1/8, respectively.
Let us take a closer look at one of these discontinuities, say, in the vicinity of
x = 1. For δ a small positive number, we have

FX(1 − δ) = P[X ≤ 1 − δ] = P[0 heads] = 1/8,

so the limit of the cdf as x approaches 1 from the left is 1/8. However,

FX(1) = P[X ≤ 1] = P[0 or 1 heads] = 1/8 + 3/8 = 1/2,

and furthermore the limit from the right is

FX(1 + δ) = P[X ≤ 1 + δ] = P[0 or 1 heads] = 1/2.

FIGURE 4.1
cdf (a) and pdf (b) of a discrete random variable.
Thus the cdf is continuous from the right and equal to 1/2 at the point x = 1. Indeed,
we note the magnitude of the jump at the point x = 1 is equal to P[X = 1] = 1/2 − 1/8 = 3/8.
Henceforth we will use dots in the graph to indicate the value of the cdf at
the points of discontinuity.
The cdf can be written compactly in terms of the unit step function:

u(x) = 0 for x < 0, and u(x) = 1 for x ≥ 0,   (4.2)

then

FX(x) = (1/8)u(x) + (3/8)u(x − 1) + (3/8)u(x − 2) + (1/8)u(x − 3).
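The weighted-step expression for FX(x) can be machine-checked. The following Python sketch (illustrative, not from the text) evaluates the cdf at a few points and recovers the jump P[X = 1] = 3/8:

```python
def u(x):
    """Unit step function of Eq. (4.2)."""
    return 1.0 if x >= 0 else 0.0

def F(x):
    """cdf of X, the number of heads in three fair coin tosses."""
    return (1/8)*u(x) + (3/8)*u(x - 1) + (3/8)*u(x - 2) + (1/8)*u(x - 3)

print(F(-1), F(0.9), F(1), F(2.5), F(3))   # 0.0 0.125 0.5 0.875 1.0
# The jump at x = 1 is F(1) minus the value just to the left of 1:
print(F(1) - F(0.999))                     # 0.375 = P[X = 1]
```

All the values above are exact in binary floating point since each term is a multiple of 1/8.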
Example 4.2 Uniform Random Variable in the Unit Interval
Spin an arrow attached to the center of a circular board. Let θ be the final angle of the arrow,
where 0 < θ ≤ 2π. The probability that θ falls in a subinterval of (0, 2π] is proportional to
the length of the subinterval. The random variable X is defined by X(θ) = θ/2π. Find the cdf
of X.
As θ increases from 0 to 2π, X increases from 0 to 1. No outcomes θ lead to values x ≤ 0, so

FX(x) = P[X ≤ x] = P[∅] = 0   for x < 0.

For 0 < x ≤ 1, {X ≤ x} occurs when {θ ≤ 2πx}, so

FX(x) = P[X ≤ x] = P[{θ ≤ 2πx}] = 2πx/2π = x,   0 < x ≤ 1.   (4.3)

Finally, for x > 1, all outcomes θ lead to {X(θ) ≤ 1 < x}, therefore:

FX(x) = P[X ≤ x] = P[0 < θ ≤ 2π] = 1   for x > 1.

We say that X is a uniform random variable in the unit interval. Figure 4.2(a) shows the cdf
of the general uniform random variable X. We see that FX(x) is a nondecreasing continuous
function that grows from 0 to 1 as x ranges from its minimum value to its maximum value.
FIGURE 4.2
cdf (a) and pdf (b) of a continuous random variable.
Example 4.3
The waiting time X of a customer at a taxi stand is zero if the customer finds a taxi parked at the
stand, and a uniformly distributed random length of time in the interval [0, 1] (in hours) if no
taxi is found upon arrival. The probability that a taxi is at the stand when the customer arrives is
p. Find the cdf of X.
The cdf is found by applying the theorem on total probability:

FX(x) = P[X ≤ x] = P[X ≤ x | find taxi]p + P[X ≤ x | no taxi](1 − p).

Note that P[X ≤ x | find taxi] = 1 when x ≥ 0 and 0 otherwise. Furthermore P[X ≤ x | no taxi]
is given by Eq. (4.3), therefore

FX(x) = 0 for x < 0,   FX(x) = p + (1 − p)x for 0 ≤ x ≤ 1,   FX(x) = 1 for x > 1.

The cdf, shown in Fig. 4.3(a), combines some of the properties of the cdf in Example 4.1
(discontinuity at 0) and the cdf in Example 4.2 (continuity over intervals). Note that FX(x) can
be expressed as the sum of a step function with amplitude p and a continuous function of x.
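A small Python sketch of the taxi-stand cdf (the choice p = 0.5 is ours, purely for illustration); it exhibits the jump of size p at x = 0 and the continuous growth over [0, 1]:

```python
p = 0.5   # assumed probability that a taxi is waiting

def F(x):
    """cdf of the waiting time X in Example 4.3 (mixed type)."""
    if x < 0:
        return 0.0
    if x <= 1:
        return p + (1 - p) * x
    return 1.0

print(F(-0.001), F(0.0), F(0.5), F(1.0))   # 0.0 0.5 0.75 1.0
```

The discontinuity F(0) − F(0⁻) = p is the discrete part of the distribution, while the slope (1 − p) over [0, 1] is its continuous part.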
We are now ready to state the basic properties of the cdf. The axioms of probability and their corollaries imply that the cdf has the following properties:
(i) 0 ≤ FX(x) ≤ 1.
(ii) lim_{x→∞} FX(x) = 1.
(iii) lim_{x→−∞} FX(x) = 0.
(iv) FX(x) is a nondecreasing function of x, that is, if a < b, then FX(a) ≤ FX(b).
(v) FX(x) is continuous from the right, that is, for h > 0, FX(b) = lim_{h→0} FX(b + h) = FX(b⁺).
These five properties confirm that, in general, the cdf is a nondecreasing function that
grows from 0 to 1 as x increases from −∞ to ∞. We already observed these properties
in Examples 4.1, 4.2, and 4.3. Property (v) implies that at points of discontinuity, the cdf
FIGURE 4.3
cdf (a) and pdf (b) of a random variable of mixed type.
is equal to the limit from the right. We observed this property in Examples 4.1 and 4.3.
In Example 4.2 the cdf is continuous for all values of x, that is, the cdf is continuous both
from the right and from the left for all x.
The cdf has the following properties which allow us to calculate the probability of
events involving intervals and single values of X:
(vi) P[a < X ≤ b] = FX(b) − FX(a).
(vii) P[X = b] = FX(b) − FX(b⁻).
(viii) P[X > x] = 1 − FX(x).
Property (vii) states that the probability that X = b is given by the magnitude of the
jump of the cdf at the point b. This implies that if the cdf is continuous at a point b, then
P[X = b] = 0. Properties (vi) and (vii) can be combined to compute the probabilities
of other types of intervals. For example, since {a ≤ X ≤ b} = {X = a} ∪ {a < X ≤ b}, then

P[a ≤ X ≤ b] = P[X = a] + P[a < X ≤ b]
= FX(a) − FX(a⁻) + FX(b) − FX(a) = FX(b) − FX(a⁻).   (4.4)

If the cdf is continuous at the endpoints of an interval, then the endpoints have zero
probability, and therefore they can be included in, or excluded from, the interval without affecting the probability.
Example 4.4
Let X be the number of heads in three tosses of a fair coin. Use the cdf to find the probability of
the events A = {1 < X ≤ 2}, B = {0.5 ≤ X < 2.5}, and C = {1 ≤ X < 2}.
From property (vi) and Fig. 4.1 we have

P[1 < X ≤ 2] = FX(2) − FX(1) = 7/8 − 1/2 = 3/8.

The cdf is continuous at x = 0.5 and x = 2.5, so

P[0.5 ≤ X < 2.5] = FX(2.5) − FX(0.5) = 7/8 − 1/8 = 6/8.

Since {1 ≤ X < 2} ∪ {X = 2} = {1 ≤ X ≤ 2}, from Eq. (4.4) we have

P[1 ≤ X < 2] + P[X = 2] = FX(2) − FX(1⁻),

and using property (vii) for P[X = 2]:

P[1 ≤ X < 2] = FX(2) − FX(1⁻) − P[X = 2] = FX(2) − FX(1⁻) − (FX(2) − FX(2⁻))
= FX(2⁻) − FX(1⁻) = 4/8 − 1/8 = 3/8.
Example 4.5
Let X be the uniform random variable from Example 4.2. Use the cdf to find the probability of
the events {−0.5 < X < 0.25}, {0.3 < X < 0.65}, and {|X − 0.4| > 0.2}.
The cdf of X is continuous at every point so we have:

P[−0.5 < X ≤ 0.25] = FX(0.25) − FX(−0.5) = 0.25 − 0 = 0.25,
P[0.3 < X < 0.65] = FX(0.65) − FX(0.3) = 0.65 − 0.3 = 0.35,
P[|X − 0.4| > 0.2] = P[{X < 0.2} ∪ {X > 0.6}] = P[X < 0.2] + P[X > 0.6]
= FX(0.2) + (1 − FX(0.6)) = 0.2 + 0.4 = 0.6.
We now consider the proof of the properties of the cdf.
• Property (i) follows from the fact that the cdf is a probability and hence must satisfy Axiom I and Corollary 2.
• To obtain property (iv), we note that the event {X ≤ a} is a subset of {X ≤ b},
and so it must have smaller or equal probability (Corollary 7).
• To show property (vi), we note that {X ≤ b} can be expressed as the union of
mutually exclusive events: {X ≤ a} ∪ {a < X ≤ b} = {X ≤ b}, and so by
Axiom III, FX(a) + P[a < X ≤ b] = FX(b).
• Property (viii) follows from {X > x} = {X ≤ x}ᶜ and Corollary 1.
While intuitively clear, properties (ii), (iii), (v), and (vii) require more advanced limiting arguments that are discussed at the end of this section.
4.1.1
The Three Types of Random Variables
The random variables in Examples 4.1, 4.2, and 4.3 are typical of the three most basic
types of random variable that we are interested in.
Discrete random variables have a cdf that is a right-continuous, staircase function
of x, with jumps at a countable set of points x0, x1, x2, …. The random variable in
Example 4.1 is a typical example of a discrete random variable. The cdf FX(x) of a discrete random variable is the sum of the probabilities of the outcomes less than or equal to x and
can be written as the weighted sum of unit step functions as in Example 4.1:

FX(x) = Σ_{xk ≤ x} pX(xk) = Σ_k pX(xk) u(x − xk),   (4.5)

where the pmf pX(xk) = P[X = xk] gives the magnitude of the jumps in the cdf. We
see that the pmf can be obtained from the cdf and vice versa.
A continuous random variable is defined as a random variable whose cdf FX(x)
is continuous everywhere, and which, in addition, is sufficiently smooth that it can be
written as an integral of some nonnegative function f(x):

FX(x) = ∫_{−∞}^{x} f(t) dt.   (4.6)

The random variable discussed in Example 4.2 can be written as an integral of the function
shown in Fig. 4.2(b). The continuity of the cdf and property (vii) implies that continuous
random variables have P[X = x] = 0 for all x. Every possible outcome has probability
zero! An immediate consequence is that the pmf cannot be used to characterize the probabilities of X. A comparison of Eqs. (4.5) and (4.6) suggests how we can proceed to characterize continuous random variables. For discrete random variables (Eq. 4.5), we calculate
probabilities as summations of probability masses at discrete points. For continuous random variables (Eq. 4.6), we calculate probabilities as integrals of “probability densities”
over intervals of the real line.
A random variable of mixed type is a random variable with a cdf that has jumps
on a countable set of points x0, x1, x2, …, but that also increases continuously over at
least one interval of values of x. The cdf for these random variables has the form

FX(x) = pF1(x) + (1 − p)F2(x),

where 0 < p < 1, and F1(x) is the cdf of a discrete random variable and F2(x) is the cdf
of a continuous random variable. The random variable in Example 4.3 is of mixed type.
Random variables of mixed type can be viewed as being produced by a two-step
process: A coin is tossed; if the outcome of the toss is heads, a discrete random variable
is generated according to F1(x); otherwise, a continuous random variable is generated
according to F2(x).
*4.1.2 Fine Point: Limiting properties of cdf
Properties (ii), (iii), (v), and (vii) require the continuity property of the probability
function discussed in Section 2.9. For example, for property (ii), we consider the sequence of events {X ≤ n}, which increases to include all of the sample space S as n approaches ∞, that is, all outcomes lead to a value of X less than infinity. The continuity
property of the probability function (Corollary 8) implies that:

lim_{n→∞} FX(n) = lim_{n→∞} P[X ≤ n] = P[lim_{n→∞} {X ≤ n}] = P[S] = 1.

For property (iii), we take the sequence {X ≤ −n}, which decreases to the empty set ∅,
that is, no outcome leads to a value of X less than −∞:

lim_{n→∞} FX(−n) = lim_{n→∞} P[X ≤ −n] = P[lim_{n→∞} {X ≤ −n}] = P[∅] = 0.

For property (v), we take the sequence of events {X ≤ x + 1/n}, which decreases to
{X ≤ x} from the right:

lim_{n→∞} FX(x + 1/n) = lim_{n→∞} P[X ≤ x + 1/n] = P[lim_{n→∞} {X ≤ x + 1/n}] = P[{X ≤ x}] = FX(x).

Finally, for property (vii), we take the sequence of events {b − 1/n < X ≤ b}, which
decreases to {b} from the left:

lim_{n→∞} (FX(b) − FX(b − 1/n)) = lim_{n→∞} P[b − 1/n < X ≤ b] = P[lim_{n→∞} {b − 1/n < X ≤ b}] = P[X = b].
4.2
THE PROBABILITY DENSITY FUNCTION
The probability density function of X (pdf), if it exists, is defined as the derivative of
FX(x):

fX(x) = dFX(x)/dx.   (4.7)
In this section we show that the pdf is an alternative, and more useful, way of specifying the information contained in the cumulative distribution function.
The pdf represents the “density” of probability at the point x in the following
sense: The probability that X is in a small interval in the vicinity of x, that is, {x < X ≤ x + h}, is

P[x < X ≤ x + h] = FX(x + h) − FX(x)
= {[FX(x + h) − FX(x)]/h} h.   (4.8)

If the cdf has a derivative at x, then as h becomes very small,

P[x < X ≤ x + h] ≈ fX(x)h.   (4.9)
Thus fX(x) represents the “density” of probability at the point x in the sense that the probability that X is in a small interval in the vicinity of x is approximately fX(x)h. The derivative of the cdf, when it exists, is positive since the cdf is a nondecreasing function of x, thus

(i) fX(x) ≥ 0.   (4.10)
Equations (4.9) and (4.10) provide us with an alternative approach to specifying the probabilities involving the random variable X. We can begin by stating a nonnegative function f_X(x), called the probability density function, which specifies the probabilities of events of the form “X falls in a small interval of width dx about the point x,” as shown in Fig. 4.4(a). The probabilities of events involving X are then expressed in terms of the pdf by adding the probabilities of intervals of width dx. As the widths of the intervals approach zero, we obtain an integral in terms of the pdf. For example, the probability of an interval [a, b] is

(ii)  P[a ≤ X ≤ b] = ∫_a^b f_X(x) dx.        (4.11)

The probability of an interval is therefore the area under f_X(x) in that interval, as shown in Fig. 4.4(b). The probability of any event that consists of the union of disjoint intervals can thus be found by adding the integrals of the pdf over each of the intervals.
The cdf of X can be obtained by integrating the pdf:

(iii)  F_X(x) = ∫_−∞^x f_X(t) dt.        (4.12)
In Section 4.1, we defined a continuous random variable as a random variable X whose
cdf was given by Eq. (4.12). Since the probabilities of all events involving X can be
written in terms of the cdf, it then follows that these probabilities can be written in
terms of the pdf.

[FIGURE 4.4  (a) The probability density function specifies the probability of intervals of infinitesimal width: P[x < X ≤ x + dx] ≅ f_X(x) dx. (b) The probability of an interval [a, b] is the area under the pdf in that interval: P[a ≤ X ≤ b] = ∫_a^b f_X(x) dx.]

Thus the pdf completely specifies the behavior of continuous random
variables.
By letting x tend to infinity in Eq. (4.12), we obtain a normalization condition for pdf’s:

(iv)  1 = ∫_−∞^{+∞} f_X(t) dt.        (4.13)
The pdf reinforces the intuitive notion of probability as having attributes similar
to “physical mass.” Thus Eq. (4.11) states that the probability “mass” in an interval is
the integral of the “density of probability mass” over the interval. Equation (4.13)
states that the total mass available is one unit.
A valid pdf can be formed from any nonnegative, piecewise continuous function g(x) that has a finite integral:

∫_−∞^∞ g(x) dx = c < ∞.        (4.14)

By letting f_X(x) = g(x)/c, we obtain a function that satisfies the normalization condition. Note that the pdf must be defined for all real values of x; if X does not take on values from some region of the real line, we simply set f_X(x) = 0 in the region.
Example 4.6 Uniform Random Variable
The pdf of the uniform random variable is given by:

f_X(x) = { 1/(b − a)   for a ≤ x ≤ b
         { 0           for x < a and x > b        (4.15a)

and is shown in Fig. 4.2(b). The cdf is found from Eq. (4.12):

F_X(x) = { 0                 for x < a
         { (x − a)/(b − a)   for a ≤ x ≤ b
         { 1                 for x > b.        (4.15b)

The cdf is shown in Fig. 4.2(a).
Example 4.7 Exponential Random Variable
The transmission time X of messages in a communication system has an exponential distribution:

P[X > x] = e^{−λx},   x > 0.

Find the cdf and pdf of X.
The cdf is given by F_X(x) = 1 − P[X > x]:

F_X(x) = { 0             for x < 0
         { 1 − e^{−λx}   for x ≥ 0.        (4.16a)

The pdf is obtained by applying Eq. (4.7):

f_X(x) = F′_X(x) = { 0          for x < 0
                   { λe^{−λx}   for x ≥ 0.        (4.16b)

Example 4.8  Laplacian Random Variable

The pdf of the samples of the amplitude of speech waveforms is found to decay exponentially at a rate α, so the following pdf is proposed:

f_X(x) = ce^{−α|x|},   −∞ < x < ∞.        (4.17)

Find the constant c, and then find the probability P[|X| < v].
We use the normalization condition in (iv) to find c:

1 = ∫_−∞^∞ ce^{−α|x|} dx = 2∫_0^∞ ce^{−αx} dx = 2c/α.

Therefore c = α/2. The probability P[|X| < v] is found by integrating the pdf:

P[|X| < v] = (α/2) ∫_−v^v e^{−α|x|} dx = 2(α/2) ∫_0^v e^{−αx} dx = 1 − e^{−αv}.
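The normalization constant and the interval probability above are easy to check numerically. The sketch below (a small Python illustration of ours, not part of the text) integrates the Laplacian pdf with the midpoint rule and compares the result against the closed form 1 − e^{−αv}:

```python
import math

def laplacian_pdf(x, alpha):
    """Laplacian pdf f_X(x) = (alpha / 2) * exp(-alpha * |x|)."""
    return 0.5 * alpha * math.exp(-alpha * abs(x))

def prob_abs_less(v, alpha, n=100_000):
    """Midpoint-rule estimate of P[|X| < v], the integral of the pdf over (-v, v)."""
    h = 2.0 * v / n
    return sum(laplacian_pdf(-v + (k + 0.5) * h, alpha) for k in range(n)) * h

alpha, v = 2.0, 1.5
numeric = prob_abs_less(v, alpha)
closed_form = 1.0 - math.exp(-alpha * v)   # the result derived above
```

The two values agree to many decimal places, which also confirms that c = α/2 makes the pdf integrate to one.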
4.2.1  pdf of Discrete Random Variables
The derivative of the cdf does not exist at points where the cdf is not continuous. Thus
the notion of pdf as defined by Eq. (4.7) does not apply to discrete random variables
at the points where the cdf is discontinuous. We can generalize the definition of the
probability density function by noting the relation between the unit step function and
the delta function. The unit step function is defined as
u(x) = { 0   for x < 0
       { 1   for x ≥ 0.        (4.18a)

The delta function δ(t) is related to the unit step function by the following equation:

u(x) = ∫_−∞^x δ(t) dt.        (4.18b)
A translated unit step function is then:
u(x − x₀) = ∫_−∞^{x−x₀} δ(t) dt = ∫_−∞^x δ(t′ − x₀) dt′.        (4.18c)
Substituting Eq. (4.18c) into the cdf of a discrete random variable:

F_X(x) = Σ_k p_X(x_k) u(x − x_k) = Σ_k p_X(x_k) ∫_−∞^x δ(t − x_k) dt

       = ∫_−∞^x Σ_k p_X(x_k) δ(t − x_k) dt.        (4.19)
This suggests that we define the pdf for a discrete random variable by
f_X(x) = (d/dx) F_X(x) = Σ_k p_X(x_k) δ(x − x_k).        (4.20)

Thus the generalized definition of pdf places a delta function of weight P[X = x_k] at the points x_k where the cdf is discontinuous.
To provide some intuition on the delta function, consider a narrow rectangular pulse of unit area and width Δ centered at t = 0:

p_Δ(t) = { 1/Δ   for −Δ/2 ≤ t ≤ Δ/2
         { 0     for |t| > Δ/2.
Consider the integral of p_Δ(t):

∫_−∞^x p_Δ(t) dt = { ∫_−∞^x 0 dt = 0                 for x < −Δ/2
                   { ∫_{−Δ/2}^{Δ/2} (1/Δ) dt = 1     for x > Δ/2   →  u(x).        (4.21)
As Δ → 0, we see that the integral of the narrow pulse approaches the unit step function. For this reason, we visualize the delta function δ(t) as being zero everywhere except at x = 0, where it is unbounded. The above equation does not apply at the value x = 0. To maintain the right continuity in Eq. (4.18a), we use the convention:
u(0) = 1 = ∫_−∞^0 δ(t) dt.
If we replace p_Δ(t) in the above derivation with g(t)p_Δ(t), we obtain the “sifting” property of the delta function:

g(0) = ∫_−∞^∞ g(t)δ(t) dt   and   g(x₀) = ∫_−∞^∞ g(t)δ(t − x₀) dt.        (4.22)

The delta function is viewed as sifting through x and picking out the value of g at the point where the delta function is centered, that is, g(x₀) for the expression on the right.
The pdf for the discrete random variable discussed in Example 4.1 is shown in
Fig. 4.1(b). The pdf of a random variable of mixed type will also contain delta functions
at the points where its cdf is not continuous. The pdf for the random variable discussed
in Example 4.3 is shown in Fig. 4.3(b).
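Because the generalized pdf of a discrete random variable is a train of delta functions, integrating it over an interval reduces to summing the pmf weights p_X(x_k) of the mass points inside the interval, with endpoint deltas included or excluded as the interval dictates. A small Python sketch of this bookkeeping (the helper function and its flags are our own illustration), using the three-coin-toss pmf of Example 4.1:

```python
def integrate_discrete_pdf(pmf, a, b, include_a=False, include_b=True):
    """Integrate a delta-train pdf over an interval from a to b by summing
    the weights p_X(x_k) of the mass points inside; the flags say whether
    delta functions sitting exactly at the endpoints are included."""
    total = 0.0
    for xk, p in pmf.items():
        left_ok = a < xk or (include_a and xk == a)
        right_ok = xk < b or (include_b and xk == b)
        if left_ok and right_ok:
            total += p
    return total

# pmf of X = number of heads in three fair coin tosses (Example 4.1)
pmf = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}

p1 = integrate_discrete_pdf(pmf, 1, 2)                        # P[1 < X <= 2]
p2 = integrate_discrete_pdf(pmf, 2, 3,
                            include_a=True, include_b=False)  # P[2 <= X < 3]
```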
Example 4.9
Let X be the number of heads in three coin tosses as in Example 4.1. Find the pdf of X. Find P[1 < X ≤ 2] and P[2 ≤ X < 3] by integrating the pdf.
In Example 4.1 we found that the cdf of X is given by
F_X(x) = (1/8)u(x) + (3/8)u(x − 1) + (3/8)u(x − 2) + (1/8)u(x − 3).

It then follows from Eqs. (4.18) and (4.19) that

f_X(x) = (1/8)δ(x) + (3/8)δ(x − 1) + (3/8)δ(x − 2) + (1/8)δ(x − 3).
When delta functions appear in the limits of integration, we must indicate whether the delta functions are to be included in the integration. Thus in P[1 < X ≤ 2] = P[X in (1, 2]], the delta function located at 1 is excluded from the integral and the delta function at 2 is included:

P[1 < X ≤ 2] = ∫_{1⁺}^{2⁺} f_X(x) dx = 3/8.

Similarly, we have that

P[2 ≤ X < 3] = ∫_{2⁻}^{3⁻} f_X(x) dx = 3/8.

4.2.2  Conditional cdf’s and pdf’s
Conditional cdf’s can be defined in a straightforward manner using the same approach
we used for conditional pmf’s. Suppose that event C is given and that P[C] > 0. The conditional cdf of X given C is defined by

F_X(x | C) = P[{X ≤ x} ∩ C] / P[C]   if P[C] > 0.        (4.23)
It is easy to show that F_X(x | C) satisfies all the properties of a cdf. (See Problem 4.29.) The conditional pdf of X given C is then defined by

f_X(x | C) = (d/dx) F_X(x | C).        (4.24)
Example 4.10
The lifetime X of a machine has a continuous cdf F_X(x). Find the conditional cdf and pdf given the event C = {X > t} (i.e., “machine is still working at time t”).
The conditional cdf is

F_X(x | X > t) = P[X ≤ x | X > t] = P[{X ≤ x} ∩ {X > t}] / P[X > t].

The intersection of the two events in the numerator is equal to the empty set when x < t and to {t < X ≤ x} when x ≥ t. Thus

F_X(x | X > t) = { 0                                 for x ≤ t
                 { (F_X(x) − F_X(t)) / (1 − F_X(t))  for x > t.

The conditional pdf is found by differentiating with respect to x:

f_X(x | X > t) = f_X(x) / (1 − F_X(t)),   x ≥ t.
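The conditional pdf formula can be checked by simulation for a concrete lifetime model. The sketch below is our own: it assumes an exponential lifetime with rate λ, for which 1 − F_X(t) = e^{−λt}, and compares f_X(x)/(1 − F_X(t)) against the fraction of surviving samples that land in a small bin:

```python
import math
import random

random.seed(1)
lam, t = 0.5, 2.0             # lifetime rate and conditioning time (arbitrary)

def cond_pdf(x):
    """f_X(x | X > t) = f_X(x) / (1 - F_X(t)) for an exponential lifetime."""
    return lam * math.exp(-lam * x) / math.exp(-lam * t)

# Empirical estimate: among samples with X > t, the fraction landing in
# (x0, x0 + h], divided by the bin width h.
samples = [random.expovariate(lam) for _ in range(100_000)]
survivors = [x for x in samples if x > t]
x0, h = 3.0, 0.2
frac = sum(1 for x in survivors if x0 < x <= x0 + h) / len(survivors)
empirical = frac / h
theoretical = cond_pdf(x0 + h / 2)
```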
Now suppose that we have a partition of the sample space S into the union of disjoint events B₁, B₂, …, Bₙ. Let F_X(x | Bᵢ) be the conditional cdf of X given event Bᵢ. The theorem on total probability allows us to find the cdf of X in terms of the conditional cdf’s:

F_X(x) = P[X ≤ x] = Σ_{i=1}^n P[X ≤ x | Bᵢ]P[Bᵢ] = Σ_{i=1}^n F_X(x | Bᵢ)P[Bᵢ].        (4.25)
The pdf is obtained by differentiation:
f_X(x) = (d/dx) F_X(x) = Σ_{i=1}^n f_X(x | Bᵢ)P[Bᵢ].        (4.26)
Example 4.11
A binary transmission system sends a “0” bit by transmitting a −v voltage signal, and a “1” bit by transmitting a +v. The received signal is corrupted by Gaussian noise and given by:

Y = X + N

where X is the transmitted signal, and N is a noise voltage with pdf f_N(x). Assume that P[“1”] = p = 1 − P[“0”]. Find the pdf of Y.
Let B₀ be the event “0” is transmitted and B₁ be the event “1” is transmitted; then B₀, B₁ form a partition, and

F_Y(x) = F_Y(x | B₀)P[B₀] + F_Y(x | B₁)P[B₁]
       = P[Y ≤ x | X = −v](1 − p) + P[Y ≤ x | X = v]p.

Since Y = X + N, the event {Y ≤ x | X = v} is equivalent to {v + N ≤ x} and {N ≤ x − v}, and the event {Y ≤ x | X = −v} is equivalent to {N ≤ x + v}. Therefore the conditional cdf’s are:

F_Y(x | B₀) = P[N ≤ x + v] = F_N(x + v)

and

F_Y(x | B₁) = P[N ≤ x − v] = F_N(x − v).

The cdf is:

F_Y(x) = F_N(x + v)(1 − p) + F_N(x − v)p.
The pdf of Y is then:

f_Y(x) = (d/dx) F_Y(x)
       = (d/dx) F_N(x + v)(1 − p) + (d/dx) F_N(x − v)p
       = f_N(x + v)(1 − p) + f_N(x − v)p.

The Gaussian random variable has pdf:

f_N(x) = (1/(√(2π) σ)) e^{−x²/2σ²},   −∞ < x < ∞.

The conditional pdfs are:

f_Y(x | B₀) = f_N(x + v) = (1/(√(2π) σ)) e^{−(x+v)²/2σ²}
[FIGURE 4.5  The conditional pdfs given the input signal: f_N(x + v) centered at −v and f_N(x − v) centered at +v.]
and

f_Y(x | B₁) = f_N(x − v) = (1/(√(2π) σ)) e^{−(x−v)²/2σ²}.

The pdf of the received signal Y is then:

f_Y(x) = (1/(√(2π) σ)) e^{−(x+v)²/2σ²} (1 − p) + (1/(√(2π) σ)) e^{−(x−v)²/2σ²} p.
Figure 4.5 shows the two conditional pdfs. We can see that the transmitted signal X shifts the center of mass of the Gaussian pdf.
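The received-signal pdf is a two-component Gaussian mixture, and it is straightforward to evaluate numerically. In the sketch below (the parameter values v, σ, and p are our own choices for illustration) a trapezoid-rule integration confirms that f_Y(x) integrates to one:

```python
import math

def gaussian_pdf(x, mean, sigma):
    return math.exp(-(x - mean) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def received_pdf(x, v, sigma, p):
    """f_Y(x) = f_N(x + v)(1 - p) + f_N(x - v)p: a mixture of two Gaussians
    whose centers are shifted to -v and +v by the transmitted signal."""
    return (1 - p) * gaussian_pdf(x, -v, sigma) + p * gaussian_pdf(x, v, sigma)

v, sigma, p = 1.0, 0.5, 0.6
# Trapezoid-rule check that the mixture pdf integrates to one.
n = 4000
xs = [-8.0 + 16.0 * k / n for k in range(n + 1)]
area = sum((received_pdf(xs[k], v, sigma, p) + received_pdf(xs[k + 1], v, sigma, p)) / 2
           * (xs[k + 1] - xs[k]) for k in range(n))
```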
4.3  THE EXPECTED VALUE OF X
We discussed the expected value for discrete random variables in Section 3.3, and found
that the sample mean of independent observations of a random variable approaches
E[X]. Suppose we perform a series of such experiments for continuous random variables. Since continuous random variables have P[X = x] = 0 for any specific value of x, we divide the real line into small intervals and count the number of times N_k(n) the observations fall in the interval {x_k < X < x_k + Δ}. As n becomes large, the relative frequency f_k(n) = N_k(n)/n will approach f_X(x_k)Δ, the probability of the interval. We calculate the sample mean in terms of the relative frequencies and let n → ∞:

⟨X⟩ₙ = Σ_k x_k f_k(n) → Σ_k x_k f_X(x_k)Δ.

The expression on the right-hand side approaches an integral as we decrease Δ.
The expected value or mean of a random variable X is defined by

E[X] = ∫_−∞^{+∞} t f_X(t) dt.        (4.27)

The expected value E[X] is defined if the above integral converges absolutely, that is,

E[|X|] = ∫_−∞^{+∞} |t| f_X(t) dt < ∞.
If we view f_X(x) as the distribution of mass on the real line, then E[X] represents the center of mass of this distribution.
We already discussed E[X] for discrete random variables in detail, but it is worth
noting that the definition in Eq. (4.27) is applicable if we express the pdf of a discrete
random variable using delta functions:
E[X] = ∫_−∞^{+∞} t Σ_k p_X(x_k) δ(t − x_k) dt = Σ_k p_X(x_k) ∫_−∞^{+∞} t δ(t − x_k) dt = Σ_k p_X(x_k) x_k.
Example 4.12 Mean of a Uniform Random Variable
The mean for a uniform random variable is given by

E[X] = (b − a)⁻¹ ∫_a^b t dt = (a + b)/2,

which is exactly the midpoint of the interval [a, b]. The results shown in Fig. 3.6 were obtained by repeating experiments in which outcomes were random variables Y and X that had uniform cdf’s in the intervals [−1, 1] and [3, 7], respectively. The respective expected values, 0 and 5, correspond to the values about which Y and X tend to vary.
The result in Example 4.12 could have been found immediately by noting that E[X] = m when the pdf is symmetric about a point m. That is, if

f_X(m − x) = f_X(m + x)   for all x,

then, assuming that the mean exists,

0 = ∫_−∞^{+∞} (m − t) f_X(t) dt = m − ∫_−∞^{+∞} t f_X(t) dt.

The first equality above follows from the symmetry of f_X(t) about t = m and the odd symmetry of (m − t) about the same point. We then have that E[X] = m.
Example 4.13 Mean of a Gaussian Random Variable
The pdf of a Gaussian random variable is symmetric about the point x = m. Therefore E[X] = m.
The following expressions are useful when X is a nonnegative random variable:

E[X] = ∫_0^∞ (1 − F_X(t)) dt   if X is continuous and nonnegative,        (4.28)

and

E[X] = Σ_{k=0}^∞ P[X > k]   if X is nonnegative, integer-valued.        (4.29)
The derivation of these formulas is discussed in Problem 4.47.
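Both tail-sum formulas are easy to confirm numerically. The sketch below (grid sizes and the choice of distributions are ours) checks Eq. (4.28) for an exponential random variable, where 1 − F_X(t) = e^{−λt} and the mean is 1/λ, and Eq. (4.29) for a geometric random variable on {1, 2, …} with P[X > k] = (1 − q)^k and mean 1/q:

```python
import math

# Eq. (4.28): E[X] = integral from 0 to infinity of (1 - F_X(t)) dt,
# for X exponential with rate lam: 1 - F_X(t) = exp(-lam * t), E[X] = 1/lam.
lam = 0.25
h, T = 1e-3, 200.0                     # step size and truncation point
n = int(T / h)
mean_428 = sum(math.exp(-lam * (k + 0.5) * h) for k in range(n)) * h

# Eq. (4.29): E[X] = sum over k >= 0 of P[X > k], for X geometric on
# {1, 2, ...} with success probability q: P[X > k] = (1 - q)**k, E[X] = 1/q.
q = 0.2
mean_429 = sum((1 - q) ** k for k in range(2000))
```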
Example 4.14 Mean of Exponential Random Variable
The time X between customer arrivals at a service station has an exponential distribution. Find
the mean interarrival time.
Substituting Eq. (4.16b) into Eq. (4.27) we obtain

E[X] = ∫_0^∞ tλe^{−λt} dt.

We evaluate the integral using integration by parts (∫ u dv = uv − ∫ v du), with u = t and dv = λe^{−λt} dt:

E[X] = −te^{−λt} |₀^∞ + ∫_0^∞ e^{−λt} dt

     = −lim_{t→∞} te^{−λt} − 0 + [−e^{−λt}/λ]₀^∞

     = −lim_{t→∞} e^{−λt}/λ + 1/λ = 1/λ,

where we have used the fact that e^{−λt} and te^{−λt} go to zero as t approaches infinity.
For this example, Eq. (4.28) is much easier to evaluate:

E[X] = ∫_0^∞ e^{−λt} dt = 1/λ.
Recall that λ is the customer arrival rate in customers per second. The result that the mean interarrival time E[X] = 1/λ seconds per customer then makes sense intuitively.
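The sample mean of simulated interarrival times converges to 1/λ, in agreement with both evaluations of the integral. A quick Monte Carlo sketch (the seed and sample size are arbitrary choices of ours):

```python
import random

random.seed(42)
lam = 2.0                    # arrival rate, in customers per second
n = 300_000
interarrivals = [random.expovariate(lam) for _ in range(n)]
sample_mean = sum(interarrivals) / n    # should approach 1/lam = 0.5 seconds
```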
4.3.1  The Expected Value of Y = g(X)
Suppose that we are interested in finding the expected value of Y = g(X). As in the case of discrete random variables (Eq. (3.16)), E[Y] can be found directly in terms of the pdf of X:

E[Y] = ∫_−∞^∞ g(x) f_X(x) dx.        (4.30)
To see how Eq. (4.30) comes about, suppose that we divide the y-axis into intervals of length h, we index the intervals with the index k, and we let y_k be the value in the center of the kth interval. The expected value of Y is approximated by the following sum:

E[Y] ≈ Σ_k y_k f_Y(y_k) h.

Suppose that g(x) is strictly increasing; then the kth interval in the y-axis has a unique corresponding equivalent event of width h_k in the x-axis as shown in Fig. 4.6. Let x_k be the value in the kth interval such that g(x_k) = y_k; then since f_Y(y_k)h = f_X(x_k)h_k,

E[Y] ≈ Σ_k g(x_k) f_X(x_k) h_k.
By letting h approach zero, we obtain Eq. (4.30). This equation is valid even if g(x) is
not strictly increasing.
[FIGURE 4.6  Two infinitesimal equivalent events: an interval of width h centered at y_k = g(x_k) on the y-axis and the corresponding interval of width h_k at x_k on the x-axis.]
Example 4.15 Expected Values of a Sinusoid with Random Phase
Let Y = a cos(ωt + Θ) where a, ω, and t are constants, and Θ is a uniform random variable in the interval (0, 2π). The random variable Y results from sampling the amplitude of a sinusoid with random phase Θ. Find the expected value of Y and the expected value of the power of Y, Y².
E[Y] = E[a cos(ωt + Θ)] = ∫_0^{2π} a cos(ωt + θ) dθ/2π

     = (a/2π) sin(ωt + θ) |₀^{2π} = (a/2π)[sin(ωt + 2π) − sin(ωt)] = 0.

The average power is

E[Y²] = E[a² cos²(ωt + Θ)] = E[a²/2 + (a²/2) cos(2ωt + 2Θ)]

      = a²/2 + (a²/2) ∫_0^{2π} cos(2ωt + 2θ) dθ/2π = a²/2.
Note that these answers are in agreement with the time averages of sinusoids: the time average (“dc” value) of the sinusoid is zero; the time-average power is a²/2.
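Both moments can be checked by Monte Carlo: draw Θ uniformly from (0, 2π) and average Y and Y² over many trials. The constants a, ω, and t below are arbitrary choices for the illustration:

```python
import math
import random

random.seed(0)
a, w, t = 2.0, 3.0, 0.7          # amplitude, frequency, sampling time (arbitrary)
n = 400_000

# Y = a*cos(w*t + Theta) with Theta uniform on (0, 2*pi)
samples = [a * math.cos(w * t + random.uniform(0.0, 2.0 * math.pi))
           for _ in range(n)]

mean_Y = sum(samples) / n                   # theory: E[Y] = 0
power_Y = sum(y * y for y in samples) / n   # theory: E[Y^2] = a**2 / 2
```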
Example 4.16 Expected Values of the Indicator Function
Let g(X) = I_C(X) be the indicator function for the event {X in C}, where C is some interval or union of intervals in the real line:

g(X) = { 0   X not in C
       { 1   X in C,

then

E[Y] = ∫_−∞^{+∞} g(x) f_X(x) dx = ∫_C f_X(x) dx = P[X in C].
Thus the expected value of the indicator of an event is equal to the probability of the event.
It is easy to show that Eqs. (3.17a)–(3.17e) hold for continuous random variables
using Eq. (4.30). For example, let c be some constant, then
E[c] = ∫_−∞^∞ c f_X(x) dx = c ∫_−∞^∞ f_X(x) dx = c        (4.31)

and

E[cX] = ∫_−∞^∞ cx f_X(x) dx = c ∫_−∞^∞ x f_X(x) dx = cE[X].        (4.32)
The expected value of a sum of functions of a random variable is equal to the sum
of the expected values of the individual functions:
E[Y] = E[Σ_{k=1}^n g_k(X)] = ∫_−∞^∞ Σ_{k=1}^n g_k(x) f_X(x) dx

     = Σ_{k=1}^n ∫_−∞^∞ g_k(x) f_X(x) dx = Σ_{k=1}^n E[g_k(X)].        (4.33)
Example 4.17
Let Y = g(X) = a₀ + a₁X + a₂X² + ⋯ + aₙXⁿ, where the aₖ are constants; then

E[Y] = E[a₀] + E[a₁X] + ⋯ + E[aₙXⁿ]
     = a₀ + a₁E[X] + a₂E[X²] + ⋯ + aₙE[Xⁿ],

where we have used Eq. (4.33), and Eqs. (4.31) and (4.32). A special case of this result is that

E[X + c] = E[X] + c,

that is, we can shift the mean of a random variable by adding a constant to it.
4.3.2  Variance of X
The variance of the random variable X is defined by
VAR[X] = E[(X − E[X])²] = E[X²] − E[X]².        (4.34)

The standard deviation of the random variable X is defined by

STD[X] = VAR[X]^{1/2}.        (4.35)
Example 4.18 Variance of Uniform Random Variable
Find the variance of the random variable X that is uniformly distributed in the interval [a, b].
Since the mean of X is (a + b)/2,

VAR[X] = (1/(b − a)) ∫_a^b (x − (a + b)/2)² dx.

Let y = x − (a + b)/2; then

VAR[X] = (1/(b − a)) ∫_{−(b−a)/2}^{(b−a)/2} y² dy = (b − a)²/12.

The random variables in Fig. 3.6 were uniformly distributed in the intervals [−1, 1] and [3, 7], respectively. Their variances are then 1/3 and 4/3. The corresponding standard deviations are 0.577 and 1.155.
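The closed form (b − a)²/12 can be confirmed by numerically integrating (x − E[X])² against the uniform pdf. The sketch below uses the interval [3, 7] from Fig. 3.6, for which the variance should be 16/12 = 4/3:

```python
a, b = 3.0, 7.0
mean = (a + b) / 2
n = 200_000
h = (b - a) / n

# Midpoint-rule evaluation of VAR[X] = (1/(b-a)) * integral over [a, b]
# of (x - mean)**2, which should equal (b - a)**2 / 12 = 4/3.
var = sum((a + (k + 0.5) * h - mean) ** 2 for k in range(n)) * h / (b - a)
```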
Example 4.19 Variance of Gaussian Random Variable
Find the variance of a Gaussian random variable.
First multiply the integral of the pdf of X by √(2π) σ to obtain

∫_−∞^∞ e^{−(x−m)²/2σ²} dx = √(2π) σ.

Differentiate both sides with respect to σ:

∫_−∞^∞ ((x − m)²/σ³) e^{−(x−m)²/2σ²} dx = √(2π).

By rearranging the above equation, we obtain

VAR[X] = (1/(√(2π) σ)) ∫_−∞^∞ (x − m)² e^{−(x−m)²/2σ²} dx = σ².

This result can also be obtained by direct integration. (See Problem 4.46.) Figure 4.7 shows the Gaussian pdf for several values of σ; it is evident that the “width” of the pdf increases with σ.
The following properties were derived in Section 3.3:
VAR[c] = 0        (4.36)
VAR[X + c] = VAR[X]        (4.37)
VAR[cX] = c² VAR[X],        (4.38)

where c is a constant.
[FIGURE 4.7  Probability density function of the Gaussian random variable, shown for σ = 1/2 and σ = 1 and centered at m.]
The mean and variance are the two most important parameters used in summarizing the pdf of a random variable. Other parameters are occasionally used. For example, the skewness, defined by E[(X − E[X])³]/STD[X]³, measures the degree of asymmetry about the mean. It is easy to show that if a pdf is symmetric about its mean, then its skewness is zero. The point to note with these parameters of the pdf is that each involves the expected value of a higher power of X. Indeed we show in a later section that, under certain conditions, a pdf is completely specified if the expected values of all the powers of X are known. These expected values are called the moments of X.
The nth moment of the random variable X is defined by

E[Xⁿ] = ∫_−∞^∞ xⁿ f_X(x) dx.        (4.39)

The mean and variance can be seen to be defined in terms of the first two moments, E[X] and E[X²].
*Example 4.20 Analog-to-Digital Conversion: A Detailed Example
A quantizer is used to convert an analog signal (e.g., speech or audio) into digital form. A quantizer maps a random voltage X into the nearest point q(X) from a set of 2^R representation values as shown in Fig. 4.8(a). The value X is then approximated by q(X), which is identified by an R-bit binary number. In this manner, an “analog” voltage X that can assume a continuum of values is converted into an R-bit number.
The quantizer introduces an error Z = X − q(X) as shown in Fig. 4.8(b). Note that Z is a function of X and that it ranges in value between −d/2 and d/2, where d is the quantizer step size. Suppose that X has a uniform distribution in the interval [−x_max, x_max], that the quantizer has 2^R levels, and that 2x_max = 2^R d. It is easy to show that Z is uniformly distributed in the interval [−d/2, d/2] (see Problem 4.93).
[FIGURE 4.8  (a) A uniform quantizer maps the input x into the closest point from the set {±d/2, ±3d/2, ±5d/2, ±7d/2}. (b) The uniform quantizer error x − q(x) for the input x, a sawtooth ranging between −d/2 and d/2.]
Therefore from Example 4.12,

E[Z] = (d/2 + (−d/2))/2 = 0.

The error Z thus has mean zero.
By Example 4.18,

VAR[Z] = (d/2 − (−d/2))²/12 = d²/12.
This result is approximately correct for any pdf that is approximately flat over each quantizer interval. This is the case when 2^R is large.
The approximation q(X) can be viewed as a “noisy” version of X since

q(X) = X − Z,

where Z is the quantization error. The measure of goodness of a quantizer is specified by the SNR ratio, which is defined as the ratio of the variance of the “signal” X to the variance of the distortion or “noise” Z:

SNR = VAR[X]/VAR[Z] = VAR[X]/(d²/12) = (VAR[X]/(x²_max/3)) 2^{2R},

where we have used the fact that d = 2x_max/2^R. When X is nonuniform, the value x_max is selected so that P[|X| > x_max] is small. A typical choice is x_max = 4 STD[X]. The SNR is then

SNR = (3/16) 2^{2R}.

This important formula is often quoted in decibels:

SNR dB = 10 log₁₀ SNR = 6R − 7.3 dB.
The SNR increases by a factor of 4 (6 dB) with each additional bit used to represent X. This makes sense since each additional bit doubles the number of quantizer levels, which in turn reduces the step size by a factor of 2. The variance of the error should then be reduced by the square of this, namely 2² = 4.
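The SNR formula can be reproduced in simulation. The sketch below is our own: it uses a uniform input on [−x_max, x_max], for which VAR[X] = x²_max/3 and the predicted SNR is exactly 2^{2R}, quantizes samples to the nearest representation point, and measures the error variance:

```python
import random

random.seed(7)

def quantize(x, xmax, R):
    """Map x to the nearest of 2**R representation points (the midpoints
    of 2**R equal cells spanning [-xmax, xmax], step d = 2*xmax / 2**R)."""
    d = 2 * xmax / 2 ** R
    k = min(2 ** R - 1, max(0, int((x + xmax) / d)))  # cell index, clamped
    return -xmax + (k + 0.5) * d

xmax, R, n = 1.0, 4, 200_000
xs = [random.uniform(-xmax, xmax) for _ in range(n)]
errors = [x - quantize(x, xmax, R) for x in xs]

var_x = sum(x * x for x in xs) / n        # ~ xmax**2 / 3
var_z = sum(z * z for z in errors) / n    # ~ d**2 / 12
snr = var_x / var_z                       # ~ 2**(2*R) = 256 for R = 4
```

Rerunning with R + 1 bits roughly quadruples the measured SNR, the 6-dB-per-bit rule.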
4.4  IMPORTANT CONTINUOUS RANDOM VARIABLES
We are always limited to measurements of finite precision, so in effect, every random
variable found in practice is a discrete random variable. Nevertheless, there are several
compelling reasons for using continuous random variable models. First, in general, continuous random variables are easier to handle analytically. Second, the limiting form of
many discrete random variables yields continuous random variables. Finally, there are
a number of “families” of continuous random variables that can be used to model a
wide variety of situations by adjusting a few parameters. In this section we continue
our introduction of important random variables. Table 4.1 lists some of the more important continuous random variables.
4.4.1  The Uniform Random Variable
The uniform random variable arises in situations where all values in an interval of the real line are equally likely to occur. The uniform random variable U in the interval [a, b] has pdf:

f_U(x) = { 1/(b − a)   for a ≤ x ≤ b
         { 0           for x < a and x > b        (4.40)

and cdf

F_U(x) = { 0                 for x < a
         { (x − a)/(b − a)   for a ≤ x ≤ b
         { 1                 for x > b.        (4.41)

See Figure 4.2. The mean and variance of U are given by:

E[U] = (a + b)/2   and   VAR[U] = (b − a)²/12.        (4.42)
The uniform random variable appears in many situations that involve equally
likely continuous random variables. Obviously U can only be defined over intervals
that are finite in length. We will see in Section 4.9 that the uniform random variable
plays a crucial role in generating random variables in computer simulation models.
4.4.2  The Exponential Random Variable
The exponential random variable arises in the modeling of the time between occurrence of events (e.g., the time between customer demands for call connections), and in
the modeling of the lifetime of devices and systems. The exponential random variable
X with parameter λ has pdf
TABLE 4.1  Continuous random variables.

Uniform Random Variable
S_X = [a, b]
f_X(x) = 1/(b − a),   a ≤ x ≤ b
E[X] = (a + b)/2    VAR[X] = (b − a)²/12    Φ_X(ω) = (e^{jωb} − e^{jωa})/(jω(b − a))

Exponential Random Variable
S_X = [0, ∞)
f_X(x) = λe^{−λx},   x ≥ 0 and λ > 0
E[X] = 1/λ    VAR[X] = 1/λ²    Φ_X(ω) = λ/(λ − jω)
Remarks: The exponential random variable is the only continuous random variable with the memoryless property.

Gaussian (Normal) Random Variable
S_X = (−∞, +∞)
f_X(x) = (1/(√(2π) σ)) e^{−(x−m)²/2σ²},   −∞ < x < +∞ and σ > 0
E[X] = m    VAR[X] = σ²    Φ_X(ω) = e^{jmω − σ²ω²/2}
Remarks: Under a wide range of conditions X can be used to approximate the sum of a large number of independent random variables.

Gamma Random Variable
S_X = (0, +∞)
f_X(x) = λ(λx)^{α−1} e^{−λx} / Γ(α),   x > 0 and α > 0, λ > 0,
where Γ(z) is the gamma function (Eq. 4.56).
E[X] = α/λ    VAR[X] = α/λ²    Φ_X(ω) = 1/(1 − jω/λ)^α

Special Cases of Gamma Random Variable
m-Erlang Random Variable: α = m, a positive integer
f_X(x) = λe^{−λx}(λx)^{m−1}/(m − 1)!,   x > 0
Φ_X(ω) = (1/(1 − jω/λ))^m
Remarks: An m-Erlang random variable is obtained by adding m independent exponentially distributed random variables with parameter λ.

Chi-Square Random Variable with k degrees of freedom: α = k/2, k a positive integer, and λ = 1/2
f_X(x) = x^{(k−2)/2} e^{−x/2} / (2^{k/2} Γ(k/2)),   x > 0
Φ_X(ω) = (1/(1 − 2jω))^{k/2}
Remarks: The sum of k mutually independent, squared zero-mean, unit-variance Gaussian random variables is a chi-square random variable with k degrees of freedom.
TABLE 4.1  Continuous random variables (continued).

Laplacian Random Variable
S_X = (−∞, ∞)
f_X(x) = (α/2) e^{−α|x|},   −∞ < x < +∞ and α > 0
E[X] = 0    VAR[X] = 2/α²    Φ_X(ω) = α²/(ω² + α²)

Rayleigh Random Variable
S_X = [0, ∞)
f_X(x) = (x/α²) e^{−x²/2α²},   x ≥ 0 and α > 0
E[X] = α√(π/2)    VAR[X] = (2 − π/2)α²

Cauchy Random Variable
S_X = (−∞, +∞)
f_X(x) = (α/π)/(x² + α²),   −∞ < x < +∞ and α > 0
Φ_X(ω) = e^{−α|ω|}
Mean and variance do not exist.

Pareto Random Variable
S_X = [x_m, ∞), x_m > 0
f_X(x) = { 0                   for x < x_m
         { α x_m^α / x^{α+1}   for x ≥ x_m
E[X] = αx_m/(α − 1) for α > 1    VAR[X] = αx_m²/((α − 2)(α − 1)²) for α > 2
Remarks: The Pareto random variable is the most prominent example of random variables with “long tails,” and can be viewed as a continuous version of the Zipf discrete random variable.

Beta Random Variable
f_X(x) = { Γ(α + β)/(Γ(α)Γ(β)) x^{α−1}(1 − x)^{β−1}   for 0 < x < 1 and α > 0, β > 0
         { 0                                           otherwise
E[X] = α/(α + β)    VAR[X] = αβ/((α + β)²(α + β + 1))
Remarks: The beta random variable is useful for modeling a variety of pdf shapes for random variables that range over finite intervals.
f_X(x) = { 0          for x < 0
         { λe^{−λx}   for x ≥ 0        (4.43)

and cdf

F_X(x) = { 0             for x < 0
         { 1 − e^{−λx}   for x ≥ 0.        (4.44)
The cdf and pdf of X are shown in Fig. 4.9.
The parameter λ is the rate at which events occur, so in Eq. (4.44) the probability of an event occurring by time x increases as the rate λ increases. Recall from Example 3.31 that the interarrival times between events in a Poisson process (Fig. 3.10) are exponential random variables.
The mean and variance of X are given by:

E[X] = 1/λ   and   VAR[X] = 1/λ².        (4.45)

In event interarrival situations, λ is in units of events/second and 1/λ is in units of seconds per event interarrival.
The exponential random variable satisfies the memoryless property:

P[X > t + h | X > t] = P[X > h].        (4.46)
The expression on the left side is the probability of having to wait at least h additional
seconds given that one has already been waiting t seconds. The expression on the right
side is the probability of waiting at least h seconds when one first begins to wait. Thus
the probability of waiting at least an additional h seconds is the same regardless of how
long one has already been waiting! We see later in the book that the memoryless property of the exponential random variable makes it the cornerstone for the theory of
Markov chains, which is used extensively in evaluating the performance of computer systems and communications networks.

[FIGURE 4.9  An example of a continuous random variable—the exponential random variable. Part (a) is the cdf and part (b) is the pdf.]
We now prove the memoryless property:
P[X > t + h | X > t] = P[{X > t + h} ∩ {X > t}] / P[X > t]

                     = P[X > t + h] / P[X > t]   for h > 0

                     = e^{−λ(t+h)} / e^{−λt} = e^{−λh} = P[X > h].
It can be shown that the exponential random variable is the only continuous random
variable that satisfies the memoryless property.
Examples 2.13, 2.28, and 2.30 dealt with the exponential random variable.
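The memoryless property is easy to see empirically: among exponential samples that exceed t, the probability of exceeding t + h matches the unconditional probability of exceeding h. A simulation sketch (the parameters are our own choices):

```python
import math
import random

random.seed(3)
lam, t, h = 1.0, 1.5, 0.8
samples = [random.expovariate(lam) for _ in range(300_000)]

survivors = [x for x in samples if x > t]
# Estimate P[X > t + h | X > t] from the samples that survived past t.
cond = sum(1 for x in survivors if x > t + h) / len(survivors)
fresh = math.exp(-lam * h)    # P[X > h], the memoryless prediction
```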
4.4.3  The Gaussian (Normal) Random Variable
There are many situations in manmade and in natural phenomena where one deals with a
random variable X that consists of the sum of a large number of “small” random variables.
The exact description of the pdf of X in terms of the component random variables can become quite complex and unwieldy. However, one finds that under very general conditions,
as the number of components becomes large, the cdf of X approaches that of the Gaussian
(normal) random variable.1 This random variable appears so often in problems involving
randomness that it has come to be known as the “normal” random variable.
The pdf for the Gaussian random variable X is given by

f_X(x) = (1/(√(2π) σ)) e^{−(x−m)²/2σ²},   −∞ < x < ∞,        (4.47)

where m and σ > 0 are real numbers, which we showed in Examples 4.13 and 4.19 to be the mean and standard deviation of X. Figure 4.7 shows that the Gaussian pdf is a “bell-shaped” curve centered and symmetric about m and whose “width” increases with σ.
The cdf of the Gaussian random variable is given by

F_X(x) = P[X ≤ x] = (1/(√(2π) σ)) ∫_−∞^x e^{−(x′−m)²/2σ²} dx′.        (4.48)

The change of variable t = (x′ − m)/σ results in

F_X(x) = (1/√(2π)) ∫_−∞^{(x−m)/σ} e^{−t²/2} dt = Φ((x − m)/σ),        (4.49)

where Φ(x) is the cdf of a Gaussian random variable with m = 0 and σ = 1:

Φ(x) = (1/√(2π)) ∫_−∞^x e^{−t²/2} dt.        (4.50)

¹This result, called the central limit theorem, will be discussed in Chapter 7.
Therefore any probability involving an arbitrary Gaussian random variable can be expressed in terms of Φ(x).
Example 4.21
Show that the Gaussian pdf integrates to one. Consider the square of the integral of the pdf:

[(1/√(2π)) ∫_−∞^∞ e^{−x²/2} dx]² = (1/2π) ∫_−∞^∞ e^{−x²/2} dx ∫_−∞^∞ e^{−y²/2} dy

                                 = (1/2π) ∫_−∞^∞ ∫_−∞^∞ e^{−(x²+y²)/2} dx dy.

Let x = r cos θ and y = r sin θ and carry out the change from Cartesian to polar coordinates; then we obtain:

(1/2π) ∫_0^{2π} ∫_0^∞ e^{−r²/2} r dr dθ = ∫_0^∞ r e^{−r²/2} dr

                                        = [−e^{−r²/2}]₀^∞ = 1.
In electrical engineering it is customary to work with the Q-function, which is defined by
Q(x) = 1 − Φ(x)        (4.51)

     = (1/√(2π)) ∫_x^∞ e^{−t²/2} dt.        (4.52)

Q(x) is simply the probability of the “tail” of the pdf. The symmetry of the pdf implies that

Q(0) = 1/2   and   Q(−x) = 1 − Q(x).        (4.53)
The integral in Eq. (4.50) does not have a closed-form expression. Traditionally
the integrals have been evaluated by looking up tables that list Q(x) or by using approximations that require numerical evaluation [Ross]. The following expression has
been found to give good accuracy for Q(x) over the entire range 0 < x < ∞:

Q(x) ≈ [1/((1 − a)x + a√(x² + b))] (1/√(2π)) e^{−x²/2},        (4.54)

where a = 1/π and b = 2π [Gallager]. Table 4.2 shows Q(x) and the value given by the above approximation. In some problems, we are interested in finding the value of x for which Q(x) = 10⁻ᵏ. Table 4.3 gives these values for k = 1, …, 10.
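Eq. (4.54) is straightforward to code. The sketch below compares it against an exact value computed from the complementary error function, using the identity Q(x) = ½ erfc(x/√2); math.erfc is part of the Python standard library:

```python
import math

def Q_exact(x):
    """Q(x) = 1 - Phi(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def Q_approx(x):
    """The approximation of Eq. (4.54), with a = 1/pi and b = 2*pi."""
    a, b = 1 / math.pi, 2 * math.pi
    return (1.0 / ((1 - a) * x + a * math.sqrt(x * x + b))) \
        * math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

test_points = [0.1, 0.5, 1.0, 2.0, 3.0, 5.0, 7.0]
rel_errors = [abs(Q_approx(x) - Q_exact(x)) / Q_exact(x) for x in test_points]
```

Over this range the relative error stays within a couple of percent, consistent with the side-by-side values in Table 4.2.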
The Gaussian random variable plays a very important role in communication systems, where transmission signals are corrupted by noise voltages resulting from the
thermal motion of electrons. It can be shown from physical principles that these voltages will have a Gaussian pdf.
TABLE 4.2  Comparison of Q(x) and approximation given by Eq. (4.54).

x     Q(x)       Approximation      x      Q(x)       Approximation
0.0   5.00E-01   5.00E-01           2.7    3.47E-03   3.46E-03
0.1   4.60E-01   4.58E-01           2.8    2.56E-03   2.55E-03
0.2   4.21E-01   4.17E-01           2.9    1.87E-03   1.86E-03
0.3   3.82E-01   3.78E-01           3.0    1.35E-03   1.35E-03
0.4   3.45E-01   3.41E-01           3.1    9.68E-04   9.66E-04
0.5   3.09E-01   3.05E-01           3.2    6.87E-04   6.86E-04
0.6   2.74E-01   2.71E-01           3.3    4.83E-04   4.83E-04
0.7   2.42E-01   2.39E-01           3.4    3.37E-04   3.36E-04
0.8   2.12E-01   2.09E-01           3.5    2.33E-04   2.32E-04
0.9   1.84E-01   1.82E-01           3.6    1.59E-04   1.59E-04
1.0   1.59E-01   1.57E-01           3.7    1.08E-04   1.08E-04
1.1   1.36E-01   1.34E-01           3.8    7.24E-05   7.23E-05
1.2   1.15E-01   1.14E-01           3.9    4.81E-05   4.81E-05
1.3   9.68E-02   9.60E-02           4.0    3.17E-05   3.16E-05
1.4   8.08E-02   8.01E-02           4.5    3.40E-06   3.40E-06
1.5   6.68E-02   6.63E-02           5.0    2.87E-07   2.87E-07
1.6   5.48E-02   5.44E-02           5.5    1.90E-08   1.90E-08
1.7   4.46E-02   4.43E-02           6.0    9.87E-10   9.86E-10
1.8   3.59E-02   3.57E-02           6.5    4.02E-11   4.02E-11
1.9   2.87E-02   2.86E-02           7.0    1.28E-12   1.28E-12
2.0   2.28E-02   2.26E-02           7.5    3.19E-14   3.19E-14
2.1   1.79E-02   1.78E-02           8.0    6.22E-16   6.22E-16
2.2   1.39E-02   1.39E-02           8.5    9.48E-18   9.48E-18
2.3   1.07E-02   1.07E-02           9.0    1.13E-19   1.13E-19
2.4   8.20E-03   8.17E-03           9.5    1.05E-21   1.05E-21
2.5   6.21E-03   6.19E-03           10.0   7.62E-24   7.62E-24
2.6   4.66E-03   4.65E-03
Example 4.22
A communication system accepts a positive voltage V as input and outputs a voltage Y = aV + N, where a = 10^−2 and N is a Gaussian random variable with parameters m = 0 and σ = 2. Find the value of V that gives P[Y < 0] = 10^−6.

The probability P[Y < 0] is written in terms of N as follows:

P[Y < 0] = P[aV + N < 0] = P[N < −aV] = Φ(−aV/σ) = Q(aV/σ) = 10^−6.

From Table 4.3 we see that the argument of the Q-function should be aV/σ = 4.753. Thus V = (4.753)σ/a = 950.6.
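Example 4.22 can be reproduced without tables. The sketch below is our own illustration (the function names are hypothetical, not from the text); it recovers Q⁻¹(10⁻⁶) by bisection, exploiting the fact that Q is decreasing, and then solves for V:

```python
import math

def Q(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def Q_inv(p, lo=0.0, hi=40.0, tol=1e-10):
    """Invert the (decreasing) Q-function on [lo, hi] by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if Q(mid) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

a, sigma = 1e-2, 2.0          # attenuation and noise parameters of Example 4.22
x = Q_inv(1e-6)               # ~4.753, matching Table 4.3 at k = 6
V = x * sigma / a
print(x, V)                   # V is about 950.7
```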
TABLE 4.3  Q(x) = 10^−k

  k      x = Q^−1(10^−k)
  1      1.2815
  2      2.3263
  3      3.0902
  4      3.7190
  5      4.2649
  6      4.7535
  7      5.1993
  8      5.6120
  9      5.9978
  10     6.3613

4.4.4  The Gamma Random Variable
The gamma random variable is a versatile random variable that appears in many applications. For example, it is used to model the time required to service customers in queueing systems, the lifetime of devices and systems in reliability studies, and the defect
clustering behavior in VLSI chips.
The pdf of the gamma random variable has two parameters, α > 0 and λ > 0, and is given by

f_X(x) = λ(λx)^(α−1) e^(−λx) / Γ(α)        0 < x < ∞,                             (4.55)

where Γ(z) is the gamma function, which is defined by the integral

Γ(z) = ∫_0^∞ x^(z−1) e^(−x) dx        z > 0.                                      (4.56)

The gamma function has the following properties:

Γ(1/2) = √π,
Γ(z + 1) = zΓ(z)        for z > 0, and
Γ(m + 1) = m!           for m a nonnegative integer.
The versatility of the gamma random variable is due to the richness of the gamma function Γ(z). The pdf of the gamma random variable can assume a variety of shapes as shown in Fig. 4.10. By varying the parameters α and λ it is possible to fit the gamma pdf to many types of experimental data. In addition, many random variables are special cases of the gamma random variable. The exponential random variable is obtained by letting α = 1. By letting λ = 1/2 and α = k/2, where k is a positive integer, we obtain the chi-square random variable, which appears in certain statistical problems. The m-Erlang random variable is obtained when α = m, a positive integer. The m-Erlang random variable is used in system reliability models and in queueing system models. Both of these random variables are discussed in later examples.
[Figure 4.10: Probability density function of the gamma random variable, shown for λ = 1 with α = 1/2, α = 1, and α = 2.]
Example 4.23
Show that the pdf of a gamma random variable integrates to one.

The integral of the pdf is

∫_0^∞ f_X(x) dx = ∫_0^∞ λ(λx)^(α−1) e^(−λx) / Γ(α) dx
                = (λ^α/Γ(α)) ∫_0^∞ x^(α−1) e^(−λx) dx.

Let y = λx, then dx = dy/λ and the integral becomes

(λ^α/(Γ(α)λ^α)) ∫_0^∞ y^(α−1) e^(−y) dy = 1,

where we used the fact that the integral equals Γ(α).
In general, the cdf of the gamma random variable does not have a closed-form expression. We will show that the special case of the m-Erlang random variable does have a closed-form expression for the cdf by using its close relation to the exponential and Poisson random variables. The cdf can also be obtained by integration of the pdf (see Problem 4.74).

Consider once again the limiting procedure that was used to derive the Poisson random variable. Suppose that we observe the time S_m that elapses until the occurrence of the mth event. The times X_1, X_2, …, X_m between events are exponential random variables, so we must have

S_m = X_1 + X_2 + … + X_m.
We will show that S_m is an m-Erlang random variable. To find the cdf of S_m, let N(t) be the Poisson random variable for the number of events in t seconds. Note that the mth event occurs before time t—that is, S_m ≤ t—if and only if m or more events occur in t seconds, namely N(t) ≥ m. The reasoning goes as follows. If the mth event has occurred before time t, then it follows that m or more events will occur in time t. On the other hand, if m or more events occur in time t, then it follows that the mth event occurred by time t. Thus

F_Sm(t) = P[S_m ≤ t] = P[N(t) ≥ m]                                                (4.57)
        = 1 − Σ_{k=0}^{m−1} ((λt)^k / k!) e^(−λt),                                (4.58)

where we have used the result of Example 3.31. If we take the derivative of the above cdf, we finally obtain the pdf of the m-Erlang random variable. Thus we have shown that S_m is an m-Erlang random variable.
Example 4.24
A factory has two spares of a critical system component that has an average lifetime of 1/λ = 1 month. Find the probability that the three components (the operating one and the two spares) will last more than 6 months. Assume the component lifetimes are exponential random variables.

The remaining lifetime of the component in service is an exponential random variable with rate λ by the memoryless property. The total lifetime X of the three components is therefore the sum of three exponential random variables with parameter λ = 1, so X has a 3-Erlang distribution with λ = 1. From Eq. (4.58) the probability that X is greater than 6 is

P[X > 6] = 1 − P[X ≤ 6] = Σ_{k=0}^{2} (6^k/k!) e^(−6) = 0.06197.
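The closed-form Erlang tail of Eq. (4.58) used in Example 4.24 can be checked against a direct simulation of the three exponential lifetimes. This sketch is our own illustration; the sample size and seed are arbitrary choices:

```python
import math
import random

def erlang_tail(m, lam, t):
    """P[S_m > t] from Eq. (4.58): sum_{k=0}^{m-1} (lam*t)^k e^{-lam*t} / k!."""
    return sum((lam * t) ** k * math.exp(-lam * t) / math.factorial(k)
               for k in range(m))

exact = erlang_tail(3, 1.0, 6.0)          # 25 * e**-6, about 0.06197

# Monte Carlo check: X is the sum of three independent Exp(1) lifetimes.
random.seed(1)
n = 200_000
hits = sum(1 for _ in range(n)
           if sum(random.expovariate(1.0) for _ in range(3)) > 6.0)
print(exact, hits / n)
```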
4.4.5  The Beta Random Variable

The beta random variable X assumes values over a closed interval and has pdf

f_X(x) = c x^(α−1) (1 − x)^(β−1)        for 0 < x < 1,                            (4.59)

where the normalization constant c is the reciprocal of the beta function,

1/c = B(α, β) = ∫_0^1 x^(α−1) (1 − x)^(β−1) dx,

and where the beta function is related to the gamma function by the following expression:

B(α, β) = Γ(α)Γ(β) / Γ(α + β).

When α = β = 1, we have the uniform random variable. Other choices of α and β give pdfs over finite intervals that can differ markedly from the uniform. See Problem 4.75. If α = β > 1, then the pdf is symmetric about x = 1/2 and is concentrated about x = 1/2 as well. When α = β < 1, the pdf is symmetric but the density is concentrated at the edges of the interval. When α < β (or α > β) the pdf is skewed to the right (or left).

The mean and variance are given by:

E[X] = α/(α + β)   and   VAR[X] = αβ / ((α + β)²(α + β + 1)).                     (4.60)
The versatility of the pdf of the beta random variable makes it useful to model a
variety of behaviors for random variables that range over finite intervals. For example,
in a Bernoulli trial experiment, the probability of success p could itself be a random
variable. The beta pdf is frequently used to model p.
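Equation (4.60) can be checked by numerical integration of the beta pdf. The sketch below is our own illustration; the choices α = 2, β = 5 and the midpoint-rule grid are arbitrary:

```python
import math

a, b = 2.0, 5.0
c = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))    # 1 / B(a, b)

def f(x):
    """Beta(a, b) pdf of Eq. (4.59)."""
    return c * x ** (a - 1.0) * (1.0 - x) ** (b - 1.0)

# Midpoint-rule moments of the beta pdf on (0, 1).
n = 100_000
dx = 1.0 / n
xs = [(k + 0.5) * dx for k in range(n)]
m1 = sum(x * f(x) for x in xs) * dx
m2 = sum(x * x * f(x) for x in xs) * dx
print(m1, a / (a + b))                                      # both about 0.2857
print(m2 - m1 ** 2, a * b / ((a + b) ** 2 * (a + b + 1)))   # both about 0.0255
```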
4.4.6  The Cauchy Random Variable

The Cauchy random variable X assumes values over the entire real line and has pdf

f_X(x) = (1/π) / (1 + x²).                                                        (4.61)

It is easy to verify that this pdf integrates to 1. However, X does not have any moments since the associated integrals do not converge. The Cauchy random variable arises as the tangent of a uniform random variable in the unit interval.
4.4.7  The Pareto Random Variable
The Pareto random variable arises in the study of the distribution of wealth where it
has been found to model the tendency for a small portion of the population to own a
large portion of the wealth. Recently the Pareto distribution has been found to capture the behavior of many quantities of interest in the study of Internet behavior,
e.g., sizes of files, packet delays, audio and video title preferences, session times in
peer-to-peer networks, etc. The Pareto random variable can be viewed as a continuous
version of the Zipf discrete random variable.
The Pareto random variable X takes on values in the range x > x_m, where x_m is a positive real number. X has complementary cdf with shape parameter α > 0 given by:

P[X > x] = { 1                   x < x_m
           { x_m^α / x^α         x ≥ x_m.                                         (4.62)

The tail of X decays algebraically with x, which is much slower than the decay of the exponential and Gaussian tails. The Pareto random variable is the most prominent example of random variables with “long tails.”
The cdf and pdf of X are:

F_X(x) = { 0                     x < x_m
         { 1 − x_m^α / x^α       x ≥ x_m.                                         (4.63)
Because of its long tail, the cdf of X approaches 1 rather slowly as x increases.
f_X(x) = { 0                     x < x_m
         { α x_m^α / x^(α+1)     x ≥ x_m.                                         (4.64)
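Because the complementary cdf (4.62) is easily inverted, Pareto samples can be generated by the inverse-transform method: if U is uniform on (0, 1), then X = x_m U^(−1/α) satisfies P[X > x] = (x_m/x)^α. The sketch below is our own illustration with arbitrary parameter choices; it compares the empirical tail with Eq. (4.62):

```python
import random

def pareto_sample(alpha, xm, rng):
    """Inverse-transform sampling: X = xm * U**(-1/alpha) has
    P[X > x] = (xm / x)**alpha for x >= xm, per Eq. (4.62)."""
    # 1 - rng.random() lies in (0, 1], avoiding a zero argument.
    return xm * (1.0 - rng.random()) ** (-1.0 / alpha)

rng = random.Random(7)
alpha, xm = 2.5, 1.0
samples = [pareto_sample(alpha, xm, rng) for _ in range(100_000)]

x = 3.0
emp = sum(s > x for s in samples) / len(samples)
print(emp, (xm / x) ** alpha)      # both about 0.064
```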
Example 4.25  Mean and Variance of Pareto Random Variable
Find the mean and variance of the Pareto random variable.

E[X] = ∫_{x_m}^∞ t (α x_m^α / t^(α+1)) dt = α x_m^α ∫_{x_m}^∞ (1/t^α) dt
     = α x_m^α x_m^(1−α)/(α − 1) = α x_m/(α − 1)        for α > 1,                (4.65)

where the integral is defined for α > 1, and

E[X²] = ∫_{x_m}^∞ t² (α x_m^α / t^(α+1)) dt = α x_m^α ∫_{x_m}^∞ (1/t^(α−1)) dt
      = α x_m^α x_m^(2−α)/(α − 2) = α x_m²/(α − 2)        for α > 2,

where the second moment is defined for α > 2.

The variance of X is then:

VAR[X] = α x_m²/(α − 2) − (α x_m/(α − 1))² = α x_m² / ((α − 2)(α − 1)²)        for α > 2.   (4.66)

4.5  FUNCTIONS OF A RANDOM VARIABLE
Let X be a random variable and let g(x) be a real-valued function defined on the real line. Define Y = g(X), that is, Y is determined by evaluating the function g(x) at the value assumed by the random variable X. Then Y is also a random variable. The probabilities with which Y takes on various values depend on the function g(x) as well as the cumulative distribution function of X. In this section we consider the problem of finding the cdf and pdf of Y.
Example 4.26
Let the function h(x) = (x)⁺ be defined as follows:

(x)⁺ = { 0    if x < 0
       { x    if x ≥ 0.

For example, let X be the number of active speakers in a group of N speakers, and let Y be the number of active speakers in excess of M; then Y = (X − M)⁺. In another example, let X be a voltage input to a halfwave rectifier; then Y = (X)⁺ is the output.
Example 4.27
Let the function q(x) be defined as shown in Fig. 4.8(a), where the points on the real line are mapped into the nearest representation point from the set S_Y = {−3.5d, −2.5d, −1.5d, −0.5d, 0.5d, 1.5d, 2.5d, 3.5d}. Thus, for example, all the points in the interval (0, d) are mapped into the point d/2. The function q(x) represents an eight-level uniform quantizer.
Example 4.28
Consider the linear function c(x) = ax + b, where a and b are constants. This function arises in many situations. For example, c(x) could be the cost associated with the quantity x, with the constant a being the cost per unit of x, and b being a fixed cost component. In a signal processing context, c(x) = ax could be the amplified version (if a > 1) or attenuated version (if a < 1) of the voltage x.
The probability of an event C involving Y is equal to the probability of the equivalent event B of values of X such that g(X) is in C:

P[Y in C] = P[g(X) in C] = P[X in B].

Three types of equivalent events are useful in determining the cdf and pdf of Y = g(X): (1) the event {g(X) = y_k} is used to determine the magnitude of the jump at a point y_k where the cdf of Y is known to have a discontinuity; (2) the event {g(X) ≤ y} is used to find the cdf of Y directly; and (3) the event {y < g(X) ≤ y + h} is useful in determining the pdf of Y. We will demonstrate the use of these three methods in a series of examples.

The next two examples demonstrate how the pmf is computed in cases where Y = g(X) is discrete. In the first example, X is discrete. In the second example, X is continuous.
Example 4.29
Let X be the number of active speakers in a group of N independent speakers. Let p be the probability that a speaker is active. In Example 2.39 it was shown that X has a binomial distribution with parameters N and p. Suppose that a voice transmission system can transmit up to M voice signals at a time, and that when X exceeds M, X − M randomly selected signals are discarded. Let Y be the number of signals discarded; then

Y = (X − M)⁺.

Y takes on values from the set S_Y = {0, 1, …, N − M}. Y will equal zero whenever X is less than or equal to M, and Y will equal k > 0 when X is equal to M + k. Therefore

P[Y = 0] = P[X in {0, 1, …, M}] = Σ_{j=0}^{M} p_j

and

P[Y = k] = P[X = M + k] = p_{M+k}        0 < k ≤ N − M,

where p_j is the pmf of X.
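The pmf derived in Example 4.29 can be evaluated directly from the binomial pmf of X. The sketch below is our own illustration, with hypothetical values of N, p, and M:

```python
from math import comb

def clipped_pmf(N, p, M):
    """pmf of Y = (X - M)+ for X ~ binomial(N, p), per Example 4.29."""
    pj = [comb(N, j) * p ** j * (1.0 - p) ** (N - j) for j in range(N + 1)]
    pY = {0: sum(pj[:M + 1])}                 # P[Y = 0] = P[X <= M]
    for k in range(1, N - M + 1):
        pY[k] = pj[M + k]                     # P[Y = k] = P[X = M + k]
    return pY

pY = clipped_pmf(N=48, p=0.1, M=8)            # hypothetical N, p, M
print(pY[0], sum(pY.values()))                # the pmf sums to 1
```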
Example 4.30
Let X be a sample voltage of a speech waveform, and suppose that X has a uniform distribution in the interval [−4d, 4d]. Let Y = q(X), where the quantizer input-output characteristic is as shown in Fig. 4.10. Find the pmf for Y.

The event {Y = q} for q in S_Y is equivalent to the event {X in I_q}, where I_q is an interval of points mapped into the representation point q. The pmf of Y is therefore found by evaluating

P[Y = q] = ∫_{I_q} f_X(t) dt.

It is easy to see that each representation point has an interval of length d mapped into it. Thus the eight possible outputs are equiprobable, that is, P[Y = q] = 1/8 for q in S_Y.
In Example 4.30, each constant section of the function q(X) produces a delta function in the pdf of Y. In general, if the function g(X) is constant during certain intervals and if the pdf of X is nonzero in these intervals, then the pdf of Y will contain delta functions. Y will then be either discrete or of mixed type.

The cdf of Y is defined as the probability of the event {Y ≤ y}. In principle, it can always be obtained by finding the probability of the equivalent event {g(X) ≤ y}, as shown in the next examples.
Example 4.31  A Linear Function
Let the random variable Y be defined by

Y = aX + b,

where a is a nonzero constant. Suppose that X has cdf F_X(x); find F_Y(y).

The event {Y ≤ y} occurs when A = {aX + b ≤ y} occurs. If a > 0, then A = {X ≤ (y − b)/a} (see Fig. 4.11), and thus

F_Y(y) = P[X ≤ (y − b)/a] = F_X((y − b)/a)        a > 0.

On the other hand, if a < 0, then A = {X ≥ (y − b)/a}, and

F_Y(y) = P[X ≥ (y − b)/a] = 1 − F_X((y − b)/a)        a < 0.
We can obtain the pdf of Y by differentiating with respect to y. To do this we need to use the chain rule for derivatives:

dF/dy = (dF/du)(du/dy),

where u is the argument of F. In this case, u = (y − b)/a, and we then obtain

f_Y(y) = (1/a) f_X((y − b)/a)        a > 0
[Figure 4.11: The equivalent event for {Y ≤ y}, with Y = aX + b and a > 0, is the event {X ≤ (y − b)/a}.]
and

f_Y(y) = (1/(−a)) f_X((y − b)/a)        a < 0.

The above two results can be written compactly as

f_Y(y) = (1/|a|) f_X((y − b)/a).                                                  (4.67)
Example 4.32  A Linear Function of a Gaussian Random Variable
Let X be a random variable with a Gaussian pdf with mean m and standard deviation σ:

f_X(x) = (1/(√(2π) σ)) e^(−(x−m)²/2σ²)        −∞ < x < ∞.                         (4.68)

Let Y = aX + b; find the pdf of Y.

Substitution of Eq. (4.68) into Eq. (4.67) yields

f_Y(y) = (1/(√(2π) |a|σ)) e^(−(y−b−am)²/2(aσ)²).

Note that Y also has a Gaussian distribution with mean b + am and standard deviation |a|σ. Therefore a linear function of a Gaussian random variable is also a Gaussian random variable.
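The conclusion of Example 4.32 is easy to verify by simulation: samples of Y = aX + b should have mean b + am and standard deviation |a|σ. A sketch with arbitrary parameter values (our own code, not from the text):

```python
import random
import statistics

# Y = a*X + b with X ~ N(m, sigma^2) should be N(b + a*m, (|a|*sigma)^2).
random.seed(3)
m, sigma, a, b = 1.0, 2.0, -3.0, 5.0
ys = [a * random.gauss(m, sigma) + b for _ in range(100_000)]
print(statistics.mean(ys), statistics.stdev(ys))   # about 2.0 and 6.0
```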
Example 4.33
Let the random variable Y be defined by

Y = X²,

where X is a continuous random variable. Find the cdf and pdf of Y.
[Figure 4.12: The equivalent event for {Y ≤ y}, with Y = X², is the event {−√y ≤ X ≤ √y}, if y ≥ 0.]
The event {Y ≤ y} occurs when {X² ≤ y} or, equivalently, when {−√y ≤ X ≤ √y} for y nonnegative; see Fig. 4.12. The event is null when y is negative. Thus

F_Y(y) = { 0                            y < 0
         { F_X(√y) − F_X(−√y)           y > 0,

and differentiating with respect to y,

f_Y(y) = d/dy [F_X(√y) − F_X(−√y)]
       = f_X(√y)/(2√y) + f_X(−√y)/(2√y)        y > 0.                             (4.69)
Example 4.34  A Chi-Square Random Variable
Let X be a Gaussian random variable with mean m = 0 and standard deviation σ = 1. X is then said to be a standard normal random variable. Let Y = X². Find the pdf of Y.

Substitution of Eq. (4.68) into Eq. (4.69) yields

f_Y(y) = e^(−y/2) / √(2πy)        y ≥ 0.                                          (4.70)

From Table 4.1 we see that f_Y(y) is the pdf of a chi-square random variable with one degree of freedom.
The result in Example 4.33 suggests that if the equation y₀ = g(x) has n solutions, x₁, x₂, …, x_n, then f_Y(y₀) will be equal to n terms of the type on the right-hand side of Eq. (4.69). We now show that this is generally true by using a method for directly obtaining the pdf of Y in terms of the pdf of X.

[Figure 4.13: The equivalent event of {y < Y < y + dy} is {x₁ < X < x₁ + dx₁} ∪ {x₂ + dx₂ < X < x₂} ∪ {x₃ < X < x₃ + dx₃}.]

Consider a nonlinear function Y = g(X) such as the one shown in Fig. 4.13. Consider the event C_y = {y < Y < y + dy} and let B_y be its equivalent event. For the y indicated in the figure, the equation g(x) = y has three solutions, x₁, x₂, and x₃, and the equivalent event B_y has a segment corresponding to each solution:

B_y = {x₁ < X < x₁ + dx₁} ∪ {x₂ + dx₂ < X < x₂} ∪ {x₃ < X < x₃ + dx₃}.
The probability of the event C_y is approximately

P[C_y] = f_Y(y)|dy|,                                                              (4.71)

where |dy| is the length of the interval y < Y ≤ y + dy. Similarly, the probability of the event B_y is approximately

P[B_y] = f_X(x₁)|dx₁| + f_X(x₂)|dx₂| + f_X(x₃)|dx₃|.                              (4.72)

Since C_y and B_y are equivalent events, their probabilities must be equal. By equating Eqs. (4.71) and (4.72) we obtain

f_Y(y) = Σ_k f_X(x)/|dy/dx|  evaluated at x = x_k                                 (4.73)
       = Σ_k f_X(x) |dx/dy|  evaluated at x = x_k.                                (4.74)

It is clear that if the equation g(x) = y has n solutions, the expression for the pdf of Y at that point is given by Eqs. (4.73) and (4.74), and contains n terms.
Example 4.35
Let Y = X² as in Example 4.34. For y ≥ 0, the equation y = x² has two solutions, x₀ = √y and x₁ = −√y, so Eq. (4.73) has two terms. Since dy/dx = 2x, Eq. (4.73) yields

f_Y(y) = f_X(√y)/(2√y) + f_X(−√y)/(2√y).

This result is in agreement with Eq. (4.69). To use Eq. (4.74), we note that

dx/dy = d(±√y)/dy = ±1/(2√y),

which when substituted into Eq. (4.74) then yields Eq. (4.69) again.
Example 4.36  Amplitude Samples of a Sinusoidal Waveform
Let Y = cos(X), where X is uniformly distributed in the interval (0, 2π]. Y can be viewed as the sample of a sinusoidal waveform at a random instant of time that is uniformly distributed over the period of the sinusoid. Find the pdf of Y.

It can be seen in Fig. 4.14 that for −1 < y < 1 the equation y = cos(x) has two solutions in the interval of interest, x₀ = cos⁻¹(y) and x₁ = 2π − x₀. Since (see an introductory calculus textbook)

dy/dx |_{x₀} = −sin(x₀) = −sin(cos⁻¹(y)) = −√(1 − y²),

and since f_X(x) = 1/2π in the interval of interest, Eq. (4.73) yields

f_Y(y) = 1/(2π√(1 − y²)) + 1/(2π√(1 − y²)) = 1/(π√(1 − y²))        for −1 < y < 1.
[Figure 4.14: y = cos x has two roots in the interval (0, 2π), at cos⁻¹(y) and 2π − cos⁻¹(y).]
The cdf of Y is found by integrating the above:

F_Y(y) = { 0                        y < −1
         { 1/2 + sin⁻¹(y)/π         −1 ≤ y ≤ 1
         { 1                        y > 1.

Y is said to have the arcsine distribution.
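The arcsine cdf can be compared with the empirical cdf of simulated samples Y = cos(X). The following sketch is our own illustration; the seed, sample size, and check points are arbitrary:

```python
import math
import random

random.seed(11)
ys = [math.cos(random.uniform(0.0, 2.0 * math.pi)) for _ in range(100_000)]

def F(y):
    """Arcsine cdf on [-1, 1]."""
    return 0.5 + math.asin(y) / math.pi

# Compare the empirical cdf with F at a few points:
for y in (-0.9, 0.0, 0.5):
    emp = sum(v <= y for v in ys) / len(ys)
    print(y, emp, F(y))
```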
4.6  THE MARKOV AND CHEBYSHEV INEQUALITIES

In general, the mean and variance of a random variable do not provide enough information to determine the cdf/pdf. However, the mean and variance of a random variable X do allow us to obtain bounds for probabilities of the form P[|X| ≥ t]. Suppose first that X is a nonnegative random variable with mean E[X]. The Markov inequality then states that

P[X ≥ a] ≤ E[X]/a        for X nonnegative.                                       (4.75)
We obtain Eq. (4.75) as follows:

E[X] = ∫_0^a t f_X(t) dt + ∫_a^∞ t f_X(t) dt ≥ ∫_a^∞ t f_X(t) dt
     ≥ ∫_a^∞ a f_X(t) dt = a P[X ≥ a].

The first inequality results from discarding the integral from zero to a; the second inequality results from replacing t with the smaller number a.
Example 4.37
The mean height of children in a kindergarten class is 3 feet, 6 inches. Find the bound on the probability that a kid in the class is taller than 9 feet. The Markov inequality gives P[H ≥ 9] ≤ 3.5/9 = 42/108 = 0.389.
The bound in the above example appears to be ridiculous. However, a bound, by
its very nature, must take the worst case into consideration. One can easily construct a
random variable for which the bound given by the Markov inequality is exact. The reason we know that the bound in the above example is ridiculous is that we have knowledge about the variability of the children’s height about their mean.
Now suppose that the mean E[X] = m and the variance VAR[X] = σ² of a random variable are known, and that we are interested in bounding P[|X − m| ≥ a]. The Chebyshev inequality states that

P[|X − m| ≥ a] ≤ σ²/a².                                                           (4.76)
The Chebyshev inequality is a consequence of the Markov inequality. Let D² = (X − m)² be the squared deviation from the mean. Then the Markov inequality applied to D² gives

P[D² ≥ a²] ≤ E[(X − m)²]/a² = σ²/a².

Equation (4.76) follows when we note that {D² ≥ a²} and {|X − m| ≥ a} are equivalent events.

Suppose that a random variable X has zero variance; then the Chebyshev inequality implies that

P[X = m] = 1,                                                                     (4.77)

that is, the random variable is equal to its mean with probability one. In other words, X is equal to the constant m in almost all experiments.
Example 4.38
The mean response time and the standard deviation in a multi-user computer system are known to be 15 seconds and 3 seconds, respectively. Estimate the probability that the response time is more than 5 seconds from the mean.

The Chebyshev inequality with m = 15 seconds, σ = 3 seconds, and a = 5 seconds gives

P[|X − 15| ≥ 5] ≤ 9/25 = 0.36.
Example 4.39
If X has mean m and variance σ², then the Chebyshev inequality for a = kσ gives

P[|X − m| ≥ kσ] ≤ 1/k².

Now suppose that we know that X is a Gaussian random variable. Then for k = 2, P[|X − m| ≥ 2σ] = 0.0456, whereas the Chebyshev inequality gives the upper bound 0.25.
Example 4.40  Chebyshev Bound Is Tight
Let the random variable X have P[X = −v] = P[X = v] = 0.5. The mean is zero and the variance is VAR[X] = E[X²] = (−v)²(0.5) + v²(0.5) = v².

Note that P[|X| ≥ v] = 1. The Chebyshev inequality states:

P[|X| ≥ v] ≤ VAR[X]/v² = 1.

We see that the bound and the exact value are in agreement, so the bound is tight.
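The gap noted in Example 4.39 between the Chebyshev bound and the exact Gaussian tail can be tabulated for several k. This is our own sketch; 2Q(k) is the exact two-sided tail probability for a Gaussian random variable:

```python
import math

def Q(x):
    return 0.5 * math.erfc(x / math.sqrt(2.0))

# Chebyshev: P[|X - m| >= k*sigma] <= 1/k**2; Gaussian exact value is 2*Q(k).
for k in (1.5, 2.0, 3.0):
    print(k, 2.0 * Q(k), 1.0 / k ** 2)
```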
We see from Example 4.38 that for certain random variables, the Chebyshev inequality can give rather loose bounds. Nevertheless, the inequality is useful in situations
in which we have no knowledge about the distribution of a given random variable other
than its mean and variance. In Section 7.2, we will use the Chebyshev inequality to prove
that the arithmetic average of independent measurements of the same random variable
is highly likely to be close to the expected value of the random variable when the number of measurements is large. Problems 4.100 and 4.101 give examples of this result.
If more information is available than just the mean and variance, then it is possible to obtain bounds that are tighter than the Markov and Chebyshev inequalities. Consider the Markov inequality again. The region of interest is A = {t ≥ a}, so let I_A(t) be the indicator function, that is, I_A(t) = 1 if t ∈ A and I_A(t) = 0 otherwise. The key step in the derivation is to note that t/a ≥ 1 in the region of interest. In effect we bounded I_A(t) by t/a as shown in Fig. 4.15. We then have:

P[X ≥ a] = ∫_0^∞ I_A(t) f_X(t) dt ≤ ∫_0^∞ (t/a) f_X(t) dt = E[X]/a.
By changing the upper bound on I_A(t), we can obtain different bounds on P[X ≥ a]. Consider the bound I_A(t) ≤ e^(s(t−a)), also shown in Fig. 4.15, where s > 0. The resulting bound is:

P[X ≥ a] = ∫_0^∞ I_A(t) f_X(t) dt ≤ ∫_0^∞ e^(s(t−a)) f_X(t) dt
         = e^(−sa) ∫_0^∞ e^(st) f_X(t) dt = e^(−sa) E[e^(sX)].                    (4.78)

This bound is called the Chernoff bound, which can be seen to depend on the expected value of an exponential function of X. This function is called the moment generating function and is related to the transforms that are introduced in the next section. We develop the Chernoff bound further in the next section.
[Figure 4.15: Bounds on the indicator function for A = {t ≥ a}: the linear bound t/a and the exponential bound e^(s(t−a)).]
4.7  TRANSFORM METHODS
In the old days, before calculators and computers, it was very handy to have logarithm tables around if your work involved performing a large number of multiplications. If you wanted to multiply the numbers x and y, you looked up log(x) and
log(y), added log(x) and log(y), and then looked up the inverse logarithm of the
result. You probably remember from grade school that longhand multiplication is
more tedious and error-prone than addition. Thus logarithms were very useful as a
computational aid.
Transform methods are extremely useful computational aids in the solution of equations that involve derivatives and integrals of functions. In many of these problems, the solution is given by the convolution of two functions: f₁(x) * f₂(x). We will define the convolution operation later. For now, all you need to know is that finding the convolution of two functions can be more tedious and error-prone than longhand multiplication! In this section we introduce transforms that map the function f_k(x) into another function F_k(ω), and that satisfy the property that the transform of f₁(x) * f₂(x) equals F₁(ω)F₂(ω). In other words, the transform of the convolution is equal to the product of the individual transforms. Therefore transforms allow us to replace the convolution operation by the much simpler multiplication operation. The transform expressions introduced in this section will prove very useful when we consider sums of random variables in Chapter 7.
4.7.1
The Characteristic Function
The characteristic function of a random variable X is defined by

Φ_X(ω) = E[e^(jωX)]                                                               (4.79a)
       = ∫_−∞^∞ f_X(x) e^(jωx) dx,                                                (4.79b)

where j = √−1 is the imaginary unit number. The two expressions on the right-hand side motivate two interpretations of the characteristic function. In the first expression, Φ_X(ω) can be viewed as the expected value of a function of X, e^(jωX), in which the parameter ω is left unspecified. In the second expression, Φ_X(ω) is simply the Fourier transform of the pdf f_X(x) (with a reversal in the sign of the exponent). Both of these interpretations prove useful in different contexts.

If we view Φ_X(ω) as a Fourier transform, then we have from the Fourier transform inversion formula that the pdf of X is given by

f_X(x) = (1/2π) ∫_−∞^∞ Φ_X(ω) e^(−jωx) dω.                                        (4.80)

It then follows that every pdf and its characteristic function form a unique Fourier transform pair. Table 4.1 gives the characteristic function of some continuous random variables.
Example 4.41  Exponential Random Variable
The characteristic function for an exponentially distributed random variable with parameter λ is given by

Φ_X(ω) = ∫_0^∞ λe^(−λx) e^(jωx) dx = ∫_0^∞ λe^(−(λ−jω)x) dx
       = λ/(λ − jω).
If X is a discrete random variable, substitution of Eq. (4.20) into the definition of Φ_X(ω) gives

Φ_X(ω) = Σ_k p_X(x_k) e^(jωx_k)        discrete random variables.

Most of the time we deal with discrete random variables that are integer-valued. The characteristic function is then

Φ_X(ω) = Σ_{k=−∞}^∞ p_X(k) e^(jωk)        integer-valued random variables.        (4.81)

Equation (4.81) is the Fourier transform of the sequence p_X(k). Note that the Fourier transform in Eq. (4.81) is a periodic function of ω with period 2π, since e^(j(ω+2π)k) = e^(jωk) e^(jk2π) and e^(jk2π) = 1. Therefore the characteristic function of integer-valued random variables is a periodic function of ω. The following inversion formula allows us to recover the probabilities p_X(k) from Φ_X(ω):

p_X(k) = (1/2π) ∫_0^2π Φ_X(ω) e^(−jωk) dω        k = 0, ±1, ±2, …                 (4.82)

Indeed, a comparison of Eqs. (4.81) and (4.82) shows that the p_X(k) are simply the coefficients of the Fourier series of the periodic function Φ_X(ω).
Example 4.42  Geometric Random Variable
The characteristic function for a geometric random variable is given by

Φ_X(ω) = Σ_{k=0}^∞ p q^k e^(jωk) = p Σ_{k=0}^∞ (q e^(jω))^k
       = p/(1 − q e^(jω)).
Since f_X(x) and Φ_X(ω) form a transform pair, we would expect to be able to obtain the moments of X from Φ_X(ω). The moment theorem states that the moments of X are given by

E[Xⁿ] = (1/jⁿ) dⁿ/dωⁿ Φ_X(ω) |_{ω=0}.                                             (4.83)

To show this, first expand e^(jωx) in a power series in the definition of Φ_X(ω):

Φ_X(ω) = ∫_−∞^∞ f_X(x) { 1 + jωx + (jωx)²/2! + … } dx.

Assuming that all the moments of X are finite and that the series can be integrated term by term, we obtain

Φ_X(ω) = 1 + jωE[X] + (jω)²E[X²]/2! + … + (jω)ⁿE[Xⁿ]/n! + … .

If we differentiate the above expression once and evaluate the result at ω = 0, we obtain

d/dω Φ_X(ω) |_{ω=0} = jE[X].

If we differentiate n times and evaluate at ω = 0, we finally obtain

dⁿ/dωⁿ Φ_X(ω) |_{ω=0} = jⁿE[Xⁿ],

which yields Eq. (4.83).

Note that when the above power series converges, the characteristic function, and hence the pdf by Eq. (4.80), are completely determined by the moments of X.
Example 4.43
To find the mean of an exponentially distributed random variable, we differentiate Φ_X(ω) = λ(λ − jω)^−1 once, and obtain

Φ′_X(ω) = λj/(λ − jω)².

The moment theorem then implies that E[X] = Φ′_X(0)/j = 1/λ.

If we take two derivatives, we obtain

Φ″_X(ω) = −2λ/(λ − jω)³,

so the second moment is then E[X²] = Φ″_X(0)/j² = 2/λ². The variance of X is then given by

VAR[X] = E[X²] − E[X]² = 2/λ² − 1/λ² = 1/λ².
Example 4.44  Chernoff Bound for Gaussian Random Variable
Let X be a Gaussian random variable with mean m and variance σ². Find the Chernoff bound for X.

The Chernoff bound (Eq. 4.78) depends on the moment generating function:

E[e^(sX)] = Φ_X(−js).

In terms of the characteristic function the bound is given by:

P[X ≥ a] ≤ e^(−sa) Φ_X(−js)        for s ≥ 0.

The parameter s can be selected to minimize the upper bound.

The bound for the Gaussian random variable is:

P[X ≥ a] ≤ e^(−sa) e^(ms + s²σ²/2) = e^(−s(a−m) + s²σ²/2)        for s ≥ 0.

We minimize the upper bound by minimizing the exponent:

0 = d/ds (−s(a − m) + s²σ²/2),   which implies   s = (a − m)/σ².

The resulting upper bound is:

P[X ≥ a] = Q((a − m)/σ) ≤ e^(−(a−m)²/2σ²).

This bound is much better than the Chebyshev bound and is similar to the estimate given in Eq. (4.54).
4.7.2
The Probability Generating Function
In problems where random variables are nonnegative, it is usually more convenient to use the z-transform or the Laplace transform. The probability generating function G_N(z) of a nonnegative integer-valued random variable N is defined by

G_N(z) = E[z^N]                                                                   (4.84a)
       = Σ_{k=0}^∞ p_N(k) z^k.                                                    (4.84b)

The first expression is the expected value of the function of N, z^N. The second expression is the z-transform of the pmf (with a sign change in the exponent). Table 3.1 shows the probability generating function for some discrete random variables. Note that the characteristic function of N is given by Φ_N(ω) = G_N(e^(jω)).

Using a derivation similar to that used in the moment theorem, it is easy to show that the pmf of N is given by

p_N(k) = (1/k!) d^k/dz^k G_N(z) |_{z=0}.                                          (4.85)
This is why G_N(z) is called the probability generating function. By taking the first two derivatives of G_N(z) and evaluating the result at z = 1, it is possible to find the first two moments of N:

d/dz G_N(z) |_{z=1} = Σ_{k=0}^∞ p_N(k) k z^(k−1) |_{z=1} = Σ_{k=0}^∞ k p_N(k) = E[N]

and

d²/dz² G_N(z) |_{z=1} = Σ_{k=0}^∞ p_N(k) k(k − 1) z^(k−2) |_{z=1}
                      = Σ_{k=0}^∞ k(k − 1) p_N(k) = E[N(N − 1)] = E[N²] − E[N].

Thus the mean and variance of N are given by

E[N] = G′_N(1)                                                                    (4.86)

and

VAR[N] = G″_N(1) + G′_N(1) − (G′_N(1))².                                          (4.87)
Example 4.45  Poisson Random Variable

The probability generating function for the Poisson random variable with parameter α is given by

    G_N(z) = Σ_{k=0}^∞ (α^k/k!) e^{-α} z^k = e^{-α} Σ_{k=0}^∞ (αz)^k/k!
           = e^{-α} e^{αz} = e^{α(z-1)}.

The first two derivatives of G_N(z) are given by

    G_N′(z) = α e^{α(z-1)}

and

    G_N″(z) = α² e^{α(z-1)}.

Therefore the mean and variance of the Poisson are

    E[N] = α   and   VAR[N] = α² + α - α² = α.
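The moment relations (4.86) and (4.87) can be checked numerically by differentiating the Poisson pgf G_N(z) = e^{α(z−1)} at z = 1 with finite differences. This Python sketch is ours, not from the text:

```python
from math import exp

def pgf_poisson(z, a):
    """Poisson pgf G_N(z) = exp(a*(z - 1))."""
    return exp(a * (z - 1.0))

def derivative(f, z, h=1e-5):
    """Central first difference."""
    return (f(z + h) - f(z - h)) / (2 * h)

def second_derivative(f, z, h=1e-4):
    """Central second difference."""
    return (f(z + h) - 2 * f(z) + f(z - h)) / (h * h)

a = 2.5
g = lambda z: pgf_poisson(z, a)
mean = derivative(g, 1.0)               # G'(1) = E[N], Eq. (4.86)
second_fact = second_derivative(g, 1.0) # G''(1) = E[N(N-1)]
var = second_fact + mean - mean ** 2    # Eq. (4.87)
print(mean, var)                        # both should be close to a = 2.5
```

Both estimates come out close to α, consistent with E[N] = VAR[N] = α for the Poisson random variable.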
4.7.3 The Laplace Transform of the pdf

In queueing theory one deals with service times, waiting times, and delays. All of these are nonnegative continuous random variables. It is therefore customary to work with the Laplace transform of the pdf,

    X*(s) = ∫_0^∞ f_X(x) e^{-sx} dx = E[e^{-sX}].       (4.88)

Note that X*(s) can be interpreted as a Laplace transform of the pdf or as an expected value of a function of X, e^{-sX}.

The moment theorem also holds for X*(s):

    E[X^n] = (-1)^n (d^n/ds^n) X*(s) |_{s=0}.           (4.89)
Example 4.46  Gamma Random Variable

The Laplace transform of the gamma pdf is given by

    X*(s) = ∫_0^∞ (λ^α x^{α-1} e^{-λx}/Γ(α)) e^{-sx} dx = (λ^α/Γ(α)) ∫_0^∞ x^{α-1} e^{-(λ+s)x} dx
          = (λ^α/(Γ(α)(λ+s)^α)) ∫_0^∞ y^{α-1} e^{-y} dy = λ^α/(λ+s)^α,

where we used the change of variable y = (λ+s)x. We can then obtain the first two moments of X as follows:

    E[X] = -(d/ds)(λ^α/(λ+s)^α) |_{s=0} = αλ^α/(λ+s)^{α+1} |_{s=0} = α/λ

and

    E[X²] = (d²/ds²)(λ^α/(λ+s)^α) |_{s=0} = α(α+1)λ^α/(λ+s)^{α+2} |_{s=0} = α(α+1)/λ².

Thus the variance of X is

    VAR[X] = E[X²] - E[X]² = α/λ².
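The moment theorem (4.89) can be checked numerically for this transform by differencing X*(s) = (λ/(λ+s))^α near s = 0. This Python sketch (ours, not the text's) uses α = 2 and λ = 3:

```python
def laplace_gamma(s, alpha=2.0, lam=3.0):
    """X*(s) = (lam/(lam + s))**alpha, the gamma Laplace transform above."""
    return (lam / (lam + s)) ** alpha

h1, h2 = 1e-5, 1e-4
# E[X] = -dX*/ds at s = 0 (central first difference)
m1 = -(laplace_gamma(h1) - laplace_gamma(-h1)) / (2 * h1)
# E[X^2] = d^2 X*/ds^2 at s = 0 (central second difference)
m2 = (laplace_gamma(h2) - 2 * laplace_gamma(0.0) + laplace_gamma(-h2)) / h2**2
var = m2 - m1 ** 2
print(m1, var)   # should be near alpha/lam = 2/3 and alpha/lam**2 = 2/9
```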
4.8  BASIC RELIABILITY CALCULATIONS

In this section we apply some of the tools developed so far to the calculation of measures that are of interest in assessing the reliability of systems. We also show how the reliability of a system can be determined in terms of the reliability of its components.

4.8.1 The Failure Rate Function

Let T be the lifetime of a component, a subsystem, or a system. The reliability at time t is defined as the probability that the component, subsystem, or system is still functioning at time t:

    R(t) = P[T > t].       (4.90)
The relative frequency interpretation implies that, in a large number of components or systems, R(t) is the fraction that fail after time t. The reliability can be expressed in terms of the cdf of T:

    R(t) = 1 - P[T ≤ t] = 1 - F_T(t).       (4.91)

Note that the derivative of R(t) gives the negative of the pdf of T:

    R′(t) = -f_T(t).       (4.92)

The mean time to failure (MTTF) is given by the expected value of T:

    E[T] = ∫_0^∞ t f_T(t) dt = ∫_0^∞ R(t) dt,

where the second expression was obtained using Eqs. (4.28) and (4.91).
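The identity E[T] = ∫_0^∞ R(t) dt is easy to confirm numerically. For an exponential lifetime with rate λ = 0.5, a simple Riemann sum of R(t) = e^{−λt} recovers the MTTF of 1/λ (a Python sketch of ours, not from the text):

```python
from math import exp

lam = 0.5
dt, t_max = 1e-3, 60.0   # t_max chosen far into the tail for lam = 0.5
# MTTF as the area under the reliability curve R(t) = exp(-lam*t)
mttf = sum(exp(-lam * i * dt) * dt for i in range(int(t_max / dt)))
print(mttf)              # close to 1/lam = 2.0
```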
Suppose that we know a system is still functioning at time t; what is its future behavior? In Example 4.10, we found that the conditional cdf of T given that T > t is given by

    F_T(x | T > t) = P[T ≤ x | T > t]
                   = 0                                    for x < t
                   = (F_T(x) - F_T(t))/(1 - F_T(t))       for x ≥ t.       (4.93)

The pdf associated with F_T(x | T > t) is

    f_T(x | T > t) = f_T(x)/(1 - F_T(t))   for x ≥ t.                      (4.94)

Note that the denominator of Eq. (4.94) is equal to R(t).

The failure rate function r(t) is defined as f_T(x | T > t) evaluated at x = t:

    r(t) = f_T(t | T > t) = -R′(t)/R(t),                                   (4.95)

since by Eq. (4.92), R′(t) = -f_T(t). The failure rate function has the following meaning:

    P[t < T ≤ t + dt | T > t] = f_T(t | T > t) dt = r(t) dt.               (4.96)

In words, r(t) dt is the probability that a component that has functioned up to time t will fail in the next dt seconds.
Example 4.47  Exponential Failure Law

Suppose a component has a constant failure rate function, say r(t) = λ. Find the pdf and the MTTF for its lifetime T.

Equation (4.95) implies that

    R′(t)/R(t) = -λ.       (4.97)

Equation (4.97) is a first-order differential equation with initial condition R(0) = 1. If we integrate both sides of Eq. (4.97) from 0 to t, we obtain

    -∫_0^t λ dt′ + k = ∫_0^t (R′(t′)/R(t′)) dt′ = ln R(t),

which implies that

    R(t) = K e^{-λt},   where K = e^k.

The initial condition R(0) = 1 implies that K = 1. Thus

    R(t) = e^{-λt},   t > 0

and

    f_T(t) = λ e^{-λt},   t > 0.       (4.98)

Thus if T has a constant failure rate function, then T is an exponential random variable. This is not surprising, since the exponential random variable satisfies the memoryless property. The MTTF = E[T] = 1/λ.
The derivation that was used in Example 4.47 can be used to show that, in general, the failure rate function and the reliability are related by

    R(t) = exp{ -∫_0^t r(t′) dt′ }       (4.99)

and from Eq. (4.92),

    f_T(t) = r(t) exp{ -∫_0^t r(t′) dt′ }.       (4.100)

Figure 4.16 shows the failure rate function for a typical system. Initially there may be a high failure rate due to defective parts or installation. After the "bugs" have been worked out, the system is stable and has a low failure rate. At some later point, ageing and wear effects set in, resulting in an increased failure rate. Equations (4.99) and (4.100) allow us to postulate reliability functions and the associated pdf's in terms of the failure rate function, as shown in the following example.

FIGURE 4.16  Failure rate function for a typical system.
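Equations (4.99) and (4.100) can be applied numerically to any postulated failure rate function. The Python sketch below (ours, not the text's; `bathtub_rate` is a hypothetical bathtub-shaped curve like the one in Fig. 4.16) recovers R(t) and f_T(t) by numerical integration:

```python
from math import exp

def bathtub_rate(t):
    """Hypothetical bathtub failure rate: early failures, flat middle, wear-out."""
    return 2.0 * exp(-4.0 * t) + 0.1 + 0.05 * t ** 2

def reliability(t, r, n=2000):
    """R(t) = exp(-integral_0^t r(u) du), Eq. (4.99), via the trapezoidal rule."""
    dt = t / n
    integral = 0.5 * (r(0.0) + r(t)) * dt + sum(r(i * dt) for i in range(1, n)) * dt
    return exp(-integral)

def lifetime_pdf(t, r):
    """f_T(t) = r(t) * R(t), Eq. (4.100)."""
    return r(t) * reliability(t, r)

print(reliability(1.0, bathtub_rate), lifetime_pdf(1.0, bathtub_rate))
```

As expected, R(0) = 1 and R(t) decreases monotonically in t.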
Example 4.48  Weibull Failure Law

The Weibull failure law has failure rate function given by

    r(t) = αβ t^{β-1},       (4.101)

where α and β are positive constants. Equation (4.99) implies that the reliability is given by

    R(t) = e^{-αt^β}.

Equation (4.100) then implies that the pdf for T is

    f_T(t) = αβ t^{β-1} e^{-αt^β},   t > 0.       (4.102)

Figure 4.17 shows f_T(t) for α = 1 and several values of β. Note that β = 1 yields the exponential failure law, which has a constant failure rate. For β > 1, Eq. (4.101) gives a failure rate function that increases with time. For β < 1, Eq. (4.101) gives a failure rate function that decreases with time. Further properties of the Weibull random variable are developed in the problems.
4.8.2 Reliability of Systems

Suppose that a system consists of several components or subsystems. We now show how the reliability of a system can be computed in terms of the reliability of its subsystems if the components are assumed to fail independently of each other.
FIGURE 4.17  Probability density function of Weibull random variable, α = 1 and β = 1, 2, 4.
FIGURE 4.18  (a) System consisting of n components in series. (b) System consisting of n components in parallel.
Consider first a system that consists of the series arrangement of n components as shown in Fig. 4.18(a). This system is considered to be functioning only if all the components are functioning. Let A_s be the event "system functioning at time t," and let A_j be the event "jth component is functioning at time t"; then the probability that the system is functioning at time t is

    R(t) = P[A_s] = P[A_1 ∩ A_2 ∩ … ∩ A_n] = P[A_1] P[A_2] … P[A_n]
         = R_1(t) R_2(t) … R_n(t),       (4.103)

since P[A_j] = R_j(t), the reliability function of the jth component. Since probabilities are numbers that are less than or equal to one, we see that R(t) can be no more reliable than the least reliable of the components, that is, R(t) ≤ min_j R_j(t).

If we apply Eq. (4.99) to each of the R_j(t) in Eq. (4.103), we then find that the failure rate function of a series system is given by the sum of the component failure rate functions:

    R(t) = exp{ -∫_0^t r_1(t′) dt′ } exp{ -∫_0^t r_2(t′) dt′ } … exp{ -∫_0^t r_n(t′) dt′ }
         = exp{ -∫_0^t [r_1(t′) + r_2(t′) + … + r_n(t′)] dt′ }.
Example 4.49

Suppose that a system consists of n components in series and that the component lifetimes are exponential random variables with rates λ_1, λ_2, …, λ_n. Find the system reliability.

From Eqs. (4.98) and (4.103), we have

    R(t) = e^{-λ_1 t} e^{-λ_2 t} … e^{-λ_n t} = e^{-(λ_1 + … + λ_n)t}.

Thus the system lifetime is exponentially distributed with rate λ_1 + λ_2 + … + λ_n.
Now suppose that a system consists of n components in parallel, as shown in Fig. 4.18(b). This system is considered to be functioning as long as at least one of the components is functioning. The system will not be functioning if and only if all the components have failed, that is,

    P[A_s^c] = P[A_1^c] P[A_2^c] … P[A_n^c].

Thus

    1 - R(t) = (1 - R_1(t))(1 - R_2(t)) … (1 - R_n(t)),

and finally,

    R(t) = 1 - (1 - R_1(t))(1 - R_2(t)) … (1 - R_n(t)).       (4.104)
Example 4.50

Compare the reliability of a single-unit system against that of a system that operates two units in parallel. Assume all units have exponentially distributed lifetimes with rate 1.

The reliability of the single-unit system is

    R_s(t) = e^{-t}.

The reliability of the two-unit system is

    R_p(t) = 1 - (1 - e^{-t})(1 - e^{-t}) = e^{-t}(2 - e^{-t}).

The parallel system is more reliable by a factor of (2 - e^{-t}) > 1.
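A Monte Carlo simulation gives an independent check of the series and parallel formulas for two unit-rate exponential components (a Python sketch of ours, not from the text):

```python
import random

random.seed(1)

def simulate(n_trials=20000, t=1.0):
    """Estimate R_series(t) and R_parallel(t) for two unit-rate components."""
    series_up = parallel_up = 0
    for _ in range(n_trials):
        # Two independent exponential(1) lifetimes.
        t1 = random.expovariate(1.0)
        t2 = random.expovariate(1.0)
        if t1 > t and t2 > t:
            series_up += 1      # series: both components must survive
        if t1 > t or t2 > t:
            parallel_up += 1    # parallel: at least one survives
    return series_up / n_trials, parallel_up / n_trials

r_series, r_parallel = simulate()
# theory at t = 1: R_series = e^{-2} ~ 0.135, R_parallel = e^{-1}(2 - e^{-1}) ~ 0.600
print(r_series, r_parallel)
```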
More complex configurations can be obtained by combining subsystems consisting
of series and parallel components. The reliability of such systems can then be computed in
terms of the subsystem reliabilities. See Example 2.35 for an example of such a calculation.
4.9  COMPUTER METHODS FOR GENERATING RANDOM VARIABLES

The computer simulation of any random phenomenon involves the generation of random variables with prescribed distributions. For example, the simulation of a queueing system involves generating the time between customer arrivals as well as the service times of each customer. Once the cdf's that model these random quantities have been selected, an algorithm for generating random variables with these cdf's must be found. MATLAB and Octave have built-in functions for generating random variables for all of the well-known distributions. In this section we present the methods that are used for generating random variables. All of these methods are based on the availability of random numbers that are uniformly distributed between zero and one. Methods for generating these numbers were discussed in Section 2.7.

All of the methods for generating random variables require the evaluation of either the pdf, the cdf, or the inverse of the cdf of the random variable of interest. We can write programs to perform these evaluations, or we can use the functions available in programs such as MATLAB and Octave. The following example shows some typical evaluations for the Gaussian random variable.
Example 4.51  Evaluation of pdf, cdf, and Inverse cdf

Let X be a Gaussian random variable with mean 1 and variance 2. Find the pdf at x = 7. Find the cdf at x = -2. Find the value of x at which the cdf = 0.25.

The following commands show how these results are obtained using Octave:

> normal_pdf (7, 1, 2)
ans = 3.4813e-05
> normal_cdf (-2, 1, 2)
ans = 0.016947
> normal_inv (0.25, 1, 2)
ans = 0.046127
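The same three evaluations can be done in Python (3.8+) with `statistics.NormalDist`. Note one difference from the Octave calls above: `NormalDist` takes the standard deviation, while the Octave functions here take the variance, hence `sigma = sqrt(2)`:

```python
from math import sqrt
from statistics import NormalDist

X = NormalDist(mu=1, sigma=sqrt(2))  # mean 1, variance 2
print(X.pdf(7))          # ~3.4813e-05
print(X.cdf(-2))         # ~0.016947
print(X.inv_cdf(0.25))   # ~0.046127
```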
4.9.1 The Transformation Method

Suppose that U is uniformly distributed in the interval [0, 1]. Let F_X(x) be the cdf of the random variable we are interested in generating. Define the random variable Z = F_X^{-1}(U); that is, first U is selected and then Z is found as indicated in Fig. 4.19. The cdf of Z is

    P[Z ≤ x] = P[F_X^{-1}(U) ≤ x] = P[U ≤ F_X(x)].

But if U is uniformly distributed in [0, 1] and 0 ≤ h ≤ 1, then P[U ≤ h] = h (see Example 4.6). Thus

    P[Z ≤ x] = F_X(x),

and Z = F_X^{-1}(U) has the desired cdf.

Transformation Method for Generating X:
1. Generate U uniformly distributed in [0, 1].
2. Let Z = F_X^{-1}(U).
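The two steps above apply to any invertible cdf. As an illustration (ours, not from the text), the Weibull cdf of Problem 4.15, F(x) = 1 − e^{−(x/λ)^β}, inverts in closed form, so Weibull samples can be generated directly:

```python
import random
from math import log

random.seed(7)

def weibull_inverse_cdf(u, lam=2.0, beta=1.5):
    """Invert F(x) = 1 - exp(-(x/lam)**beta): x = lam * (-ln(1 - u))**(1/beta)."""
    return lam * (-log(1.0 - u)) ** (1.0 / beta)

# Step 1: uniform U; Step 2: Z = F^{-1}(U).
samples = [weibull_inverse_cdf(random.random()) for _ in range(50000)]
# Sanity check: the sample median should be near lam * (ln 2)**(1/beta) ~ 1.567.
print(sorted(samples)[len(samples) // 2])
```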
Example 4.52  Exponential Random Variable

To generate an exponentially distributed random variable X with parameter λ, we need to invert the expression u = F_X(x) = 1 - e^{-λx}. We obtain

    X = -(1/λ) ln(1 - U).
FIGURE 4.19  Transformation method for generating a random variable with cdf F_X(x).
Note that we can use the simpler expression X = -ln(U)/λ, since 1 - U is also uniformly distributed in [0, 1]. The first two lines of the Octave commands below show how to implement the transformation method to generate 1000 exponential random variables with λ = 1. Figure 4.20 shows the histogram of values obtained. In addition, the figure shows the probability that samples of the random variables fall in the corresponding histogram bins. Good correspondence between the histogram and these probabilities is observed. In Chapter 8 we introduce methods for assessing the goodness-of-fit of data to a given distribution. Both MATLAB and Octave use the transformation method in their function exponential_rnd.

> U = rand(1, 1000);       % Generate 1000 uniform random variables.
> X = -log(U);             % Compute 1000 exponential RVs.
> K = 0.25:0.5:6;          % The remaining lines show how to generate
> P(1) = 1 - exp(-0.5);    % the histogram bins.
> for i = 2:12,
>   P(i) = P(i-1)*exp(-0.5);
> end;
> stem(K, P)
> hold on
> hist(X, K, 1)
4.9.2 The Rejection Method

We first consider the simple version of this algorithm and explain why it works; then we present it in its general form. Suppose that we are interested in generating a random variable Z with pdf f_X(x) as shown in Fig. 4.21. In particular, we assume that: (1) the pdf is nonzero only in the interval [0, a], and (2) the pdf takes on values in the range [0, b]. The rejection method in this case works as follows:
FIGURE 4.20  Histogram of 1000 exponential random variables using transformation method.
FIGURE 4.21  Rejection method for generating a random variable with pdf f_X(x).
1. Generate X_1 uniform in the interval [0, a].
2. Generate Y uniform in the interval [0, b].
3. If Y ≤ f_X(X_1), then output Z = X_1; else, reject X_1 and return to step 1.
Note that this algorithm will perform a random number of steps before it produces the output Z.

We now show that the output Z has the desired pdf. Steps 1 and 2 select a point at random in a rectangle of width a and height b. The probability of selecting a point in any region is simply the area of the region divided by the total area of the rectangle, ab. Thus the probability of accepting X_1 is the probability of the region below f_X(x) divided by ab. But the area under any pdf is 1, so we conclude that the probability of success (i.e., acceptance) is 1/ab. Consider now the following probability:

    P[x < X_1 ≤ x + dx | X_1 is accepted]
        = P[{x < X_1 ≤ x + dx} ∩ {X_1 accepted}] / P[X_1 accepted]
        = (shaded area/ab) / (1/ab) = (f_X(x) dx/ab) / (1/ab) = f_X(x) dx.

Therefore X_1, when accepted, has the desired pdf. Thus Z has the desired pdf.
Example 4.53  Generating Beta Random Variables

Show that the beta random variable with α′ = β′ = 2 can be generated using the rejection method.

The pdf of the beta random variable with α′ = β′ = 2 is similar to that shown in Fig. 4.21. This beta pdf is maximum at x = 1/2, and the maximum value is:

    (1/2)^{2-1} (1/2)^{2-1} / B(2, 2) = (1/4) / (Γ(2)Γ(2)/Γ(4)) = (1/4) / (1!1!/3!) = 3/2.

Therefore we can generate this beta random variable using the rejection method with b = 1.5.
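The three rejection steps for this beta random variable (a = 1, b = 1.5) can be sketched directly in Python (our illustration, not the text's code):

```python
import random

random.seed(3)

def beta22_pdf(x):
    """Beta pdf with alpha' = beta' = 2: f(x) = 6x(1 - x) on [0, 1]."""
    return 6.0 * x * (1.0 - x)

def beta22_rejection():
    """Simple rejection: uniform point in the rectangle [0, 1] x [0, 1.5]."""
    while True:
        x1 = random.uniform(0.0, 1.0)   # step 1: X1 uniform in [0, a]
        y = random.uniform(0.0, 1.5)    # step 2: Y uniform in [0, b]
        if y <= beta22_pdf(x1):         # step 3: accept or reject
            return x1

samples = [beta22_rejection() for _ in range(20000)]
mean = sum(samples) / len(samples)
print(mean)   # near 0.5, the mean of the beta(2, 2) pdf
```

The acceptance probability is 1/(ab) = 2/3 per trial, so on average 1.5 candidate points are generated per output sample.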
The algorithm as stated above can have two problems. First, if the rectangle does not fit snugly around f_X(x), the number of X_1's that need to be generated before acceptance may be excessive. Second, the above method cannot be used if f_X(x) is unbounded or if its range is not finite. The general version of this algorithm overcomes both problems. Suppose we want to generate Z with pdf f_X(x). Let W be a random variable with pdf f_W(x) that is easy to generate and such that for some constant K > 1,

    K f_W(x) ≥ f_X(x)   for all x,

that is, the region under K f_W(x) contains f_X(x) as shown in Fig. 4.22.

Rejection Method for Generating X:
1. Generate X_1 with pdf f_W(x). Define B(X_1) = K f_W(X_1).
2. Generate Y uniform in [0, B(X_1)].
3. If Y ≤ f_X(X_1), then output Z = X_1; else reject X_1 and return to step 1.

See Problem 4.143 for a proof that Z has the desired pdf.
FIGURE 4.22  Rejection method for generating a random variable with gamma pdf and with 0 < α < 1.
Example 4.54  Gamma Random Variable

We now show how the rejection method can be used to generate X with gamma pdf and parameters 0 < α < 1 and λ = 1. A function K f_W(x) that "covers" f_X(x) is easily obtained (see Fig. 4.22):

    f_X(x) = x^{α-1} e^{-x} / Γ(α)  ≤  K f_W(x) = x^{α-1}/Γ(α)   for 0 ≤ x ≤ 1
                                                = e^{-x}/Γ(α)    for x > 1.

The pdf f_W(x) that corresponds to the function on the right-hand side is

    f_W(x) = αe x^{α-1} / (α + e)   for 0 ≤ x ≤ 1
           = αe e^{-x} / (α + e)    for x ≥ 1.

The cdf of W is

    F_W(x) = e x^α / (α + e)           for 0 ≤ x ≤ 1
           = 1 - αe e^{-x} / (α + e)   for x > 1.

W is easy to generate using the transformation method, with

    F_W^{-1}(u) = [(α + e) u / e]^{1/α}           for u ≤ e/(α + e)
                = -ln[(α + e)(1 - u) / (αe)]      for u > e/(α + e).

We can therefore use the transformation method to generate this f_W(x), and then the rejection method to generate any gamma random variable X with parameters 0 < α < 1 and λ = 1. Finally we note that if we let W = X/λ, then W will be gamma with parameters α and λ. The generation of gamma random variables with α > 1 is discussed in Problem 4.142.
Example 4.55  Implementing Rejection Method for Gamma Random Variables

Given below is an Octave function definition to implement the rejection method using the above transformation. (Note that the uniform variable Y must be drawn over [0, K f_W(X)], the covering function x^{α-1}/Γ(α) or e^{-x}/Γ(α), not over [0, f_W(X)]; special_pdf below returns the covering function.)

% Generate random numbers from the gamma distribution for 0 < alpha < 1.
function X = gamma_rejection_method_altone(alpha)
  while (true),
    X = special_inverse(alpha);           % Step 1: Generate X with pdf fW(x).
    B = special_pdf(X, alpha);            % Step 2: Generate Y uniform
    Y = rand.*B;                          %         in [0, K*fW(X)].
    if (Y <= fx_gamma_pdf(X, alpha)),     % Step 3: Accept or reject ...
      break;
    end
  end

% Helper function to generate random variables with pdf fW(x)
% using the transformation method.
function X = special_inverse(alpha)
  u = rand;
  if (u <= e./(alpha+e)),
    X = ((alpha+e).*u./e).^(1./alpha);
  elseif (u > e./(alpha+e)),
    X = -log((alpha+e).*(1-u)./(alpha.*e));
  end

% Return B = K*fW(X), the covering function, in order to generate
% uniform variables in [0, K*fW(X)].
function B = special_pdf (X, alpha)
  if (X >= 0 && X <= 1),
    B = X.^(alpha-1)./gamma(alpha);
  elseif (X > 1),
    B = e.^(-X)./gamma(alpha);
  end

% pdf of the gamma distribution with lambda = 1. Could also use the
% built-in gamma_pdf (X, A, B) function supplied with Octave, setting B = 1.
function Y = fx_gamma_pdf (x, alpha)
  Y = (x.^(alpha-1)).*(e.^(-x))./gamma(alpha);

Figure 4.23 shows the histogram of 1000 samples obtained using this function. The figure also shows the probability that the samples fall in the bins of the histogram.
FIGURE 4.23  1000 samples of gamma random variable using rejection method (expected vs. empirical frequencies).

We have presented the most common methods that are used to generate random variables. These methods are incorporated in the functions provided by programs such as MATLAB and Octave, so in practice you do not need to write programs to generate the most common random variables. You simply need to invoke the appropriate functions.
Example 4.56  Generating Gamma Random Variables

Use Octave to obtain eight gamma random variables with α = 0.25 and λ = 1.

The Octave command and the corresponding answer are given below:

> gamma_rnd (0.25, 1, 1, 8)
ans =
Columns 1 through 6:
  0.00021529  0.09331491  0.00013400  0.23384718  0.24606757  0.08665787
Columns 7 and 8:
  1.72940941  1.29599702
4.9.3 Generation of Functions of a Random Variable

Once we have a simple method of generating a random variable X, we can easily generate any random variable that is defined by Y = g(X) or even Z = h(X_1, X_2, …, X_n), where X_1, …, X_n are n outputs of the random variable generator.

Example 4.57  m-Erlang Random Variable

Let X_1, X_2, … be independent, exponentially distributed random variables with parameter λ. In Chapter 7 we show that the random variable

    Y = X_1 + X_2 + … + X_m

has an m-Erlang pdf with parameter λ. We can therefore generate an m-Erlang random variable by first generating m exponentially distributed random variables using the transformation method, and then taking the sum. Since the m-Erlang random variable is a special case of the gamma random variable, for large m it may be preferable to use the rejection method described in Problem 4.142.
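The sum-of-exponentials construction in Example 4.57 takes only a few lines in Python (our sketch, not the text's code); the sample mean should be close to m/λ:

```python
import random

random.seed(11)

def erlang(m, lam):
    """m-Erlang sample as the sum of m independent exponential(lam) variables."""
    return sum(random.expovariate(lam) for _ in range(m))

m, lam = 4, 2.0
samples = [erlang(m, lam) for _ in range(20000)]
mean = sum(samples) / len(samples)
print(mean)   # near m/lam = 2.0
```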
4.9.4 Generating Mixtures of Random Variables

We have seen in previous sections that sometimes a random variable consists of a mixture of several random variables. In other words, the generation of the random variable can be viewed as first selecting a random variable type according to some pmf, and then generating a random variable from the selected pdf type. This procedure can be simulated easily.

Example 4.58  Hyperexponential Random Variable

A two-stage hyperexponential random variable has pdf

    f_X(x) = p a e^{-ax} + (1 - p) b e^{-bx}.

It is clear from the above expression that X consists of a mixture of two exponential random variables with parameters a and b, respectively. X can be generated by first performing a Bernoulli trial with probability of success p. If the outcome is a success, we then use the transformation method to generate an exponential random variable with parameter a. If the outcome is a failure, we generate an exponential random variable with parameter b instead.
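The two-step procedure of Example 4.58 can be sketched as follows (a Python illustration of ours, with hypothetical parameter values p = 0.3, a = 1, b = 5); the mean of the mixture is p/a + (1 − p)/b:

```python
import random

random.seed(5)

def hyperexponential(p=0.3, a=1.0, b=5.0):
    """Two-stage hyperexponential: with prob. p draw exponential(a), else exponential(b)."""
    if random.random() < p:          # Bernoulli trial selects the stage
        return random.expovariate(a)
    return random.expovariate(b)

samples = [hyperexponential() for _ in range(50000)]
mean = sum(samples) / len(samples)
print(mean)   # near p/a + (1 - p)/b = 0.3 + 0.14 = 0.44
```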
*4.10  ENTROPY

Entropy is a measure of the uncertainty in a random experiment. In this section, we first introduce the notion of the entropy of a random variable and develop several of its fundamental properties. We then show that entropy quantifies uncertainty by the amount of information required to specify the outcome of a random experiment. Finally, we discuss the method of maximum entropy, which has found wide use in characterizing random variables when only some parameters, such as the mean or variance, are known.

4.10.1 The Entropy of a Random Variable

Let X be a discrete random variable with S_X = {1, 2, …, K} and pmf p_k = P[X = k]. We are interested in quantifying the uncertainty of the event A_k = {X = k}. Clearly, the uncertainty of A_k is low if the probability of A_k is close to one, and it is high if the
SUMMARY
• The cumulative distribution function F_X(x) is the probability that X falls in the interval (-∞, x]. The probability of any event consisting of the union of intervals can be expressed in terms of the cdf.
• A random variable is continuous if its cdf can be written as the integral of a nonnegative function. A random variable is mixed if it is a mixture of a discrete and a
continuous random variable.
• The probability of events involving a continuous random variable X can be expressed as integrals of the probability density function f_X(x).
• If X is a random variable, then Y = g(X) is also a random variable. The notion of equivalent events allows us to derive expressions for the cdf and pdf of Y in terms of the cdf and pdf of X.
• The cdf and pdf of the random variable X are sufficient to compute all probabilities involving X alone. The mean, variance, and moments of a random variable
summarize some of the information about the random variable X. These parameters are useful in practice because they are easier to measure and estimate than
the cdf and pdf.
• Conditional cdf’s or pdf’s incorporate partial knowledge about the outcome of an
experiment in the calculation of probabilities of events.
• The Markov and Chebyshev inequalities allow us to bound probabilities involving X in terms of its first two moments only.
• Transforms provide an alternative but equivalent representation of the pmf and
pdf. In certain types of problems it is preferable to work with the transforms
rather than the pmf or pdf. The moments of a random variable can be obtained
from the corresponding transform.
• The reliability of a system is the probability that it is still functioning after t hours
of operation. The reliability of a system can be determined from the reliability of
its subsystems.
• There are a number of methods for generating random variables with prescribed
pmf’s or pdf’s in terms of a random variable that is uniformly distributed in the
unit interval. These methods include the transformation and the rejection methods as well as methods that simulate random experiments (e.g., functions of random variables) and mixtures of random variables.
• The entropy of a random variable X is a measure of the uncertainty of X in terms
of the average amount of information required to identify its value.
• The maximum entropy method is a procedure for estimating the pmf or pdf of a
random variable when only partial information about X, in the form of expected
values of functions of X, is available.
CHECKLIST OF IMPORTANT TERMS
Characteristic function
Chebyshev inequality
Chernoff bound
Conditional cdf, pdf
Continuous random variable
Cumulative distribution function
Differential entropy
Discrete random variable
Entropy
Equivalent event
Expected value of X
Failure rate function
Function of a random variable
Laplace transform of the pdf
Markov inequality
Maximum entropy method
Mean time to failure (MTTF)
Moment theorem
nth moment of X
Probability density function
Probability generating function
Probability mass function
Random variable
Random variable of mixed type
Rejection method
Reliability
Standard deviation of X
Transformation method
Variance of X
ANNOTATED REFERENCES
Reference [1] is the standard reference for electrical engineers for the material on random variables. Reference [2] is entirely devoted to continuous distributions. Reference
[3] discusses some of the finer points regarding the concept of a random variable at a
level accessible to students of this course. Reference [4] presents detailed discussions
of the various methods for generating random numbers with specified distributions.
Reference [5] also discusses the generation of random variables. Reference [9] is focused on signal processing. Reference [11] discusses entropy in the context of information theory.
1. A. Papoulis and S. Pillai, Probability, Random Variables, and Stochastic Processes,
McGraw-Hill, New York, 2002.
2. N. Johnson et al., Continuous Univariate Distributions, vol. 2, Wiley, New York,
1995.
3. K. L. Chung, Elementary Probability Theory, Springer-Verlag, New York, 1974.
4. A. M. Law and W. D. Kelton, Simulation Modeling and Analysis, McGraw-Hill,
New York, 2000.
5. S. M. Ross, Introduction to Probability Models, Academic Press, New York, 2003.
6. H. Cramer, Mathematical Methods of Statistics, Princeton University Press,
Princeton, N.J., 1946.
7. M. Abramowitz and I. Stegun, Handbook of Mathematical Functions, National Bureau of Standards, Washington, D.C., 1964. Downloadable: www.math.sfu.ca/~cbm/aands/.
8. R. C. Cheng, “The Generation of Gamma Variables with Nonintegral Shape Parameter,” Appl. Statist., 26: 71–75, 1977.
9. R. Gray and L.D. Davisson, An Introduction to Statistical Signal Processing,
Cambridge Univ. Press, Cambridge, UK, 2005.
10. P. O. Börjesson and C. E. W. Sundberg, “Simple Approximations of the Error
Function Q(x) for Communications Applications,” IEEE Trans. on Communications, March 1979, 639–643.
11. R. G. Gallager, Information Theory and Reliable Communication, Wiley, New
York, 1968.
PROBLEMS
Section 4.1: The Cumulative Distribution Function
4.1. An information source produces binary pairs that we designate as S_X = {1, 2, 3, 4} with the following pmf's:
     (i) p_k = p_1/k for all k in S_X.
     (ii) p_k = p_{k-1}/2 for k = 2, 3, 4.
     (iii) p_k = p_{k-1}/2^k for k = 2, 3, 4.
     (a) Plot the cdf of these three random variables.
     (b) Use the cdf to find the probability of the events: {X ≤ 1}, {X < 2.5}, {0.5 < X ≤ 2}, {1 < X < 4}.
4.2. A die is tossed. Let X be the number of full pairs of dots in the face showing up, and Y be the number of full or partial pairs of dots in the face showing up. Find and plot the cdf of X and Y.
4.3. The loose minute hand of a clock is spun hard. The coordinates (x, y) of the point where the tip of the hand comes to rest is noted. Z is defined as the sgn function of the product of x and y, where sgn(t) is 1 if t > 0, 0 if t = 0, and -1 if t < 0.
     (a) Find and plot the cdf of the random variable Z.
     (b) Does the cdf change if the clock hand has a propensity to stop at 3, 6, 9, and 12 o'clock?
4.4. An urn contains 8 $1 bills and two $5 bills. Let X be the total amount that results when two bills are drawn from the urn without replacement, and let Y be the total amount that results when two bills are drawn from the urn with replacement.
     (a) Plot and compare the cdf's of the random variables.
     (b) Use the cdf to compare the probabilities of the following events in the two problems: {X = $2}, {X < $7}, {X ≥ 6}.
4.5. Let Y be the difference between the number of heads and the number of tails in the 3 tosses of a fair coin.
     (a) Plot the cdf of the random variable Y.
     (b) Express P[|Y| < y] in terms of the cdf of Y.
4.6. A dart is equally likely to land at any point inside a circular target of radius 2. Let R be the distance of the landing point from the origin.
     (a) Find the sample space S and the sample space of R, S_R.
     (b) Show the mapping from S to S_R.
     (c) The "bull's eye" is the central disk in the target of radius 0.25. Find the event A in S_R corresponding to "dart hits the bull's eye." Find the equivalent event in S and P[A].
     (d) Find and plot the cdf of R.
4.7. A point is selected at random inside a square defined by {(x, y): 0 ≤ x ≤ b, 0 ≤ y ≤ b}. Assume the point is equally likely to fall anywhere in the square. Let the random variable Z be given by the minimum of the two coordinates of the point where the dart lands.
     (a) Find the sample space S and the sample space of Z, S_Z.
     (b) Show the mapping from S to S_Z.
     (c) Find the region in the square corresponding to the event {Z ≤ z}.
     (d) Find and plot the cdf of Z.
     (e) Use the cdf to find: P[Z > 0], P[Z > b], P[Z ≤ b/2], P[Z > b/4].
4.8. Let z be a point selected at random from the unit interval. Consider the random variable X = (1 - z)^{-1/2}.
     (a) Sketch X as a function of z.
     (b) Find and plot the cdf of X.
     (c) Find the probability of the events {X > 1}, {5 < X < 7}, {X ≤ 20}.
4.9. The loose hand of a clock is spun hard and the outcome z is the angle in the range [0, 2π) where the hand comes to rest. Consider the random variable X(z) = 2 sin(z/4).
     (a) Sketch X as a function of z.
     (b) Find and plot the cdf of X.
     (c) Find the probability of the events {X > 1}, {-1/2 < X < 1/2}, {X ≤ 1/12}.
4.10. Repeat Problem 4.9 if 80% of the time the hand comes to rest anywhere in the circle, but 20% of the time the hand comes to rest at 3, 6, 9, or 12 o'clock.
4.11. The random variable X is uniformly distributed in the interval [-1, 2].
     (a) Find and plot the cdf of X.
     (b) Use the cdf to find the probabilities of the following events: {X ≤ 0}, {|X - 0.5| < 1}, and C = {X > -0.5}.
4.12. The cdf of the random variable X is given by:

     F_X(x) = 0             for x < -1
            = 0.5           for -1 ≤ x ≤ 0
            = (1 + x)/2     for 0 ≤ x ≤ 1
            = 1             for x ≥ 1.

     (a) Plot the cdf and identify the type of random variable.
     (b) Find P[X ≤ -1], P[X = -1], P[X < 0.5], P[-0.5 < X < 0.5], P[X > -1], P[X ≤ 2], P[X > 3].
4.13. A random variable X has cdf:

     F_X(x) = 0                      for x < 0
            = 1 - (1/4) e^{-2x}      for x ≥ 0.

     (a) Plot the cdf and identify the type of random variable.
     (b) Find P[X ≤ 2], P[X = 0], P[X < 0], P[2 < X < 6], P[X > 10].
4.14. The random variable X has cdf shown in Fig. P4.1.
     (a) What type of random variable is X?
     (b) Find the following probabilities: P[X < -1], P[X ≤ -1], P[-1 < X < -0.75], P[-0.5 ≤ X < 0], P[-0.5 ≤ X ≤ 0.5], P[|X - 0.5| < 0.5].
FIGURE P4.1

4.15. For β > 0 and λ > 0, the Weibull random variable Y has cdf:

     F_X(x) = 0                       for x < 0
            = 1 - e^{-(x/λ)^β}        for x ≥ 0.

     (a) Plot the cdf of Y for β = 0.5, 1, and 2.
     (b) Find the probability P[jλ < X < (j + 1)λ] and P[X > jλ].
     (c) Plot log P[X > x] vs. log x.
4.16. The random variable X has cdf:

     F_X(x) = 0                        for x < 0
            = 0.5 + c sin²(πx/2)       for 0 ≤ x ≤ 1
            = 1                        for x > 1.

     (a) What values can c assume?
     (b) Plot the cdf.
     (c) Find P[X > 0].
Section 4.2: The Probability Density Function
4.17. A random variable X has pdf:

f_X(x) = { c(1 − x²),    −1 ≤ x ≤ 1
         { 0,            elsewhere.

(a) Find c and plot the pdf.
(b) Plot the cdf of X.
(c) Find P[X = 0], P[0 < X < 0.5], and P[|X − 0.5| < 0.25].
4.18. A random variable X has pdf:

f_X(x) = { cx(1 − x²),    0 ≤ x ≤ 1
         { 0,             elsewhere.

(a) Find c and plot the pdf.
(b) Plot the cdf of X.
(c) Find P[0 < X < 0.5], P[X = 1], P[0.25 < X < 0.5].
4.19. (a) In Problem 4.6, find and plot the pdf of the random variable R, the distance from the dart to the center of the target.
(b) Use the pdf to find the probability that the dart is outside the bull's eye.
4.20. (a) Find and plot the pdf of the random variable Z in Problem 4.7.
(b) Use the pdf to find the probability that the minimum is greater than b/3.
4.21. (a) Find and plot the pdf in Problem 4.8.
(b) Use the pdf to find the probabilities of the events {X > a} and {X > 2a}.
4.22. (a) Find and plot the pdf in Problem 4.12.
(b) Use the pdf to find P[−1 ≤ X < 0.25].
4.23. (a) Find and plot the pdf in Problem 4.13.
(b) Use the pdf to find P[X = 0], P[X > 8].
4.24. (a) Find and plot the pdf of the random variable in Problem 4.14.
(b) Use the pdf to calculate the probabilities in Problem 4.14b.
4.25. Find and plot the pdf of the Weibull random variable in Problem 4.15a.
4.26. Find the cdf of the Cauchy random variable, which has pdf:

f_X(x) = (α/π)/(x² + α²),   −∞ < x < ∞.

4.27. A voltage X is uniformly distributed in the set {−3, −2, …, 3, 4}.
(a) Find the pdf and cdf of the random variable X.
(b) Find the pdf and cdf of the random variable Y = −2X² + 3.
(c) Find the pdf and cdf of the random variable W = cos(πX/8).
(d) Find the pdf and cdf of the random variable Z = cos²(πX/8).
4.28. Find the pdf and cdf of the Zipf random variable in Problem 3.70.
4.29. Let C be an event for which P[C] > 0. Show that F_X(x | C) satisfies the eight properties of a cdf.
4.30. (a) In Problem 4.13, find F_X(x | C) where C = {X > 0}.
(b) Find F_X(x | C) where C = {X = 0}.
4.31. (a) In Problem 4.10, find F_X(x | B) where B = {hand does not stop at 3, 6, 9, or 12 o'clock}.
(b) Find F_X(x | Bᶜ).
4.32. In Problem 4.13, find f_X(x | B) and F_X(x | B) where B = {X > 0.25}.
4.33. Let X be the exponential random variable.
(a) Find and plot F_X(x | X > t). How does F_X(x | X > t) differ from F_X(x)?
(b) Find and plot f_X(x | X > t).
(c) Show that P[X > t + x | X > t] = P[X > x]. Explain why this is called the memoryless property.
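Part (c) can be checked numerically; the sketch below assumes an arbitrary rate λ = 0.7, since the memoryless identity holds for every λ > 0:

```python
import math

lam = 0.7                      # assumed rate; any lambda > 0 works

def tail(x):                   # P[X > x] for an exponential random variable
    return math.exp(-lam * x)

t, x = 2.0, 1.5
lhs = tail(t + x) / tail(t)    # P[X > t + x | X > t]
rhs = tail(x)                  # P[X > x]
# memoryless: conditioning on survival past t "resets the clock"
assert abs(lhs - rhs) < 1e-12
```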
4.34. The Pareto random variable X has cdf:

F_X(x) = { 0,                  x < x_m
         { 1 − x_m^α/x^α,      x ≥ x_m.

(a) Find and plot the pdf of X.
(b) Repeat Problem 4.33 parts a and b for the Pareto random variable.
(c) What happens to P[X > t + x | X > t] as t becomes large? Interpret this result.
4.35. (a) Find and plot F_X(x | a ≤ X ≤ b). Compare F_X(x | a ≤ X ≤ b) to F_X(x).
(b) Find and plot f_X(x | a ≤ X ≤ b).
4.36. In Problem 4.6, find F_R(r | R > 1) and f_R(r | R > 1).
4.37. (a) In Problem 4.7, find F_Z(z | b/4 ≤ Z ≤ b/2) and f_Z(z | b/4 ≤ Z ≤ b/2).
(b) Find F_Z(z | B) and f_Z(z | B), where B = {Z > b/2}.
4.38. A binary transmission system sends a “0” bit using a -1 voltage signal and a “1” bit by
transmitting a +1. The received signal is corrupted by noise N that has a Laplacian distribution with parameter α. Assume that "0" bits and "1" bits are equiprobable.
(a) Find the pdf of the received signal Y = X + N, where X is the transmitted signal,
given that a “0” was transmitted; that a “1” was transmitted.
(b) Suppose that the receiver decides a “0” was sent if Y 6 0, and a “1” was sent if
Y Ú 0. What is the probability that the receiver makes an error given that a +1 was
transmitted? a -1 was transmitted?
(c) What is the overall probability of error?
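The error probabilities in parts (b) and (c) can be estimated by simulation. The sketch below assumes the Laplacian pdf (α/2)e^(−α|x|) with α = 1 and the threshold-at-zero receiver; note that NumPy's `laplace` is parameterized by the scale 1/α:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 1.0                      # assumed Laplacian parameter
n = 200_000
# numpy uses pdf (1/2b) e^{-|x|/b}, so b = 1/alpha
noise = rng.laplace(scale=1.0 / alpha, size=n)

# a +1 was sent; the receiver decides "1" iff Y = 1 + N >= 0
err_plus = np.mean(1.0 + noise < 0)
exact = 0.5 * np.exp(-alpha)     # P[N < -1] = (1/2) e^{-alpha}
assert abs(err_plus - exact) < 0.01
```

By symmetry of the Laplacian noise, the error probability given a −1 is the same, so the overall error probability equals the conditional one.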
Section 4.3: The Expected Value of X
4.39. Find the mean and variance of X in Problem 4.17.
4.40. Find the mean and variance of X in Problem 4.18.
4.41. Find the mean and variance of Y, the distance from the dart to the origin, in Problem 4.19.
4.42. Find the mean and variance of Z, the minimum of the coordinates in a square, in Problem 4.20.
4.43. Find the mean and variance of X = (1 − ζ)^(−1/2) in Problem 4.21. Find E[X] using Eq. (4.28).
4.44. Find the mean and variance of X in Problems 4.12 and 4.22.
4.45. Find the mean and variance of X in Problems 4.13 and 4.23. Find E[X] using Eq. (4.28).
4.46. Find the mean and variance of the Gaussian random variable by direct integration of Eqs. (4.27) and (4.34).
4.47. Prove Eqs. (4.28) and (4.29).
4.48. Find the variance of the exponential random variable.
4.49. (a) Show that the mean of the Weibull random variable in Problem 4.15 is λΓ(1 + 1/β), where Γ(x) is the gamma function defined in Eq. (4.56).
(b) Find the second moment and the variance of the Weibull random variable.
4.50. Explain why the mean of the Cauchy random variable does not exist.
4.51. Show that E[X] does not exist for the Pareto random variable with α = 1 and x_m = 1.
4.52. Verify Eqs. (4.36), (4.37), and (4.38).
4.53. Let Y = A cos(ωt) + c, where A has mean m and variance σ², and ω and c are constants. Find the mean and variance of Y. Compare the results to those obtained in Example 4.15.
4.54. A limiter is shown in Fig. P4.2.
[FIGURE P4.2: the limiter characteristic g(x), with output levels ±a (plot not reproduced).]
(a) Find an expression for the mean and variance of Y = g(X) for an arbitrary continuous random variable X.
(b) Evaluate the mean and variance if X is a Laplacian random variable with λ = a = 1.
(c) Repeat part (b) if X is from Problem 4.17 with a = 1/2.
(d) Evaluate the mean and variance if X = U³, where U is a uniform random variable in the interval [−1, 1], and a = 1/2.
4.55. A limiter with center-level clipping is shown in Fig. P4.3.
(a) Find an expression for the mean and variance of Y = g(X) for an arbitrary continuous random variable X.
(b) Evaluate the mean and variance if X is Laplacian with λ = a = 1 and b = 2.
(c) Repeat part (b) if X is from Problem 4.22, with a = 1/2, b = 3/2.
(d) Evaluate the mean and variance if X = b cos(2πU), where U is a uniform random variable in the interval [−1, 1], and a = 3/4, b = 1/2.

[FIGURE P4.3: the limiter with center-level clipping, with parameters a and b (plot not reproduced).]
4.56. Let Y = 3X + 2.
(a) Find the mean and variance of Y in terms of the mean and variance of X.
(b) Evaluate the mean and variance of Y if X is Laplacian.
(c) Evaluate the mean and variance of Y if X is an arbitrary Gaussian random variable.
(d) Evaluate the mean and variance of Y if X = b cos(2πU), where U is a uniform random variable in the unit interval.
4.57. Find the nth moment of U, the uniform random variable in the unit interval. Repeat for X
uniform in [a, b].
4.58. Consider the quantizer in Example 4.20.
(a) Find the conditional pdf of X given that X is in the interval (d, 2d).
(b) Find the conditional expected value and conditional variance of X given that X is in
the interval (d, 2d).
(c) Now suppose that when X falls in (d, 2d), it is mapped onto the point c, where d < c < 2d. Find an expression for the mean square error E[(X − c)² | d < X < 2d].
(d) Find the value c that minimizes the above mean square error. Is c the midpoint of
the interval? Explain why or why not by sketching possible conditional pdf shapes.
(e) Find an expression for the overall mean square error using the approach in parts c and d.
Section 4.4: Important Continuous Random Variables
4.59. Let X be a uniform random variable in the interval [−2, 2]. Find and plot P[|X| > x].
4.60. In Example 4.20, let the input to the quantizer be a uniform random variable in the interval [−4d, 4d]. Show that Z = X − Q(X) is uniformly distributed in [−d/2, d/2].
4.61. Let X be an exponential random variable with parameter λ.
(a) For d > 0 and k a nonnegative integer, find P[kd < X < (k + 1)d].
(b) Segment the positive real line into four equiprobable disjoint intervals.
4.62. The rth percentile, p(r), of a random variable X is defined by P[X ≤ p(r)] = r/100.
(a) Find the 90%, 95%, and 99% percentiles of the exponential random variable with parameter λ.
(b) Repeat part a for the Gaussian random variable with parameters m = 0 and σ².
4.63. Let X be a Gaussian random variable with m = 5 and σ² = 16.
(a) Find P[X > 4], P[X ≥ 7], P[6.72 < X < 10.16], P[2 < X < 7], P[6 ≤ X ≤ 8].
(b) If P[X < a] = 0.8869, find a.
(c) If P[X > b] = 0.11131, find b.
(d) If P[13 < X ≤ c] = 0.0123, find c.
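Two of the probabilities in part (a) can be checked numerically with the standard normal cdf Φ (a Python sketch using only the standard library; the book's tables use the Q-function Q(x) = 1 − Φ(x)):

```python
from math import erf, sqrt

def Phi(z):                       # standard normal cdf
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

m, sigma = 5.0, 4.0               # sigma^2 = 16

def P(lo, hi):                    # P[lo < X < hi] after standardizing
    return Phi((hi - m) / sigma) - Phi((lo - m) / sigma)

p1 = P(4, float('inf'))           # P[X > 4] = Phi(0.25) ≈ 0.5987
p2 = P(6.72, 10.16)               # Phi(1.29) - Phi(0.43) ≈ 0.2351
```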
4.64. Show that the Q-function for the Gaussian random variable satisfies Q(−x) = 1 − Q(x).
4.65. Use Octave to generate Tables 4.2 and 4.3.
4.66. Let X be a Gaussian random variable with mean m and variance σ².
(a) Find P[X ≤ m].
(b) Find P[|X − m| < kσ] for k = 1, 2, 3, 4, 5, 6.
(c) Find the value of k for which Q(k) = P[X > m + kσ] = 10^(−j) for j = 1, 2, 3, 4, 5, 6.
4.67. A binary transmission system transmits a signal X (−1 to send a "0" bit; +1 to send a "1" bit). The received signal is Y = X + N, where the noise N has a zero-mean Gaussian distribution with variance σ². Assume that "0" bits are three times as likely as "1" bits.
(a) Find the conditional pdfs of Y given the input value: f_Y(y | X = +1) and f_Y(y | X = −1).
(b) The receiver decides a "0" was transmitted if the observed value of y satisfies

f_Y(y | X = −1)P[X = −1] > f_Y(y | X = +1)P[X = +1]

and it decides a "1" was transmitted otherwise. Use the results from part a to show that this decision rule is equivalent to: If y < T, decide "0"; if y ≥ T, decide "1".
(c) What is the probability that the receiver makes an error given that a +1 was transmitted? A −1 was transmitted? Assume σ² = 1/16.
(d) What is the overall probability of error?
4.68. Two chips are being considered for use in a certain system. The lifetime of chip 1 is modeled by a Gaussian random variable with mean 20,000 hours and standard deviation
5000 hours. (The probability of negative lifetime is negligible.) The lifetime of chip 2 is
also a Gaussian random variable but with mean 22,000 hours and standard deviation
1000 hours. Which chip is preferred if the target lifetime of the system is 20,000 hours?
24,000 hours?
4.69. Passengers arrive at a taxi stand at an airport at a rate of one passenger per minute. The
taxi driver will not leave until seven passengers arrive to fill his van. Suppose that passenger interarrival times are exponential random variables, and let X be the time to fill a
van. Find the probability that more than 10 minutes elapse until the van is full.
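Assuming the fill time X is the sum of seven independent Exp(1) interarrival times (a 7-Erlang random variable), the event {X > 10} is the event that at most six passengers arrive in the first 10 minutes, which gives a quick numerical answer via the Poisson count:

```python
import math

rate, t, m = 1.0, 10.0, 7        # one arrival/minute, 10 minutes, 7 passengers
# P[X > t] = P[fewer than m Poisson(rate*t) arrivals in (0, t]]
p = sum(math.exp(-rate * t) * (rate * t) ** k / math.factorial(k)
        for k in range(m))
# p ≈ 0.1301
```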
4.70. (a) Show that the gamma random variable has mean:

E[X] = α/λ.

(b) Show that the gamma random variable has second moment and variance given by:

E[X²] = α(α + 1)/λ²   and   VAR[X] = α/λ².

(c) Use parts a and b to obtain the mean and variance of an m-Erlang random variable.
(d) Use parts a and b to obtain the mean and variance of a chi-square random variable.
4.71. The time X to complete a transaction in a system is a gamma random variable with mean 4 and variance 8. Use Octave to plot P[X > x] as a function of x. Note: Octave uses b = 1/2.
4.72. (a) Plot the pdf of an m-Erlang random variable for m = 1, 2, 3 and λ = 1.
(b) Plot the chi-square pdf for k = 1, 2, 3.
4.73. A repair person keeps four widgets in stock. What is the probability that the widgets in stock will last 15 days if the repair person needs to replace widgets at an average rate of one widget every three days, where the time between widget failures is an exponential random variable?
4.74. (a) Find the cdf of the m-Erlang random variable by integration of the pdf. Hint: Use integration by parts.
(b) Show that the derivative of the cdf given by Eq. (4.58) gives the pdf of an m-Erlang random variable.
4.75. Plot the pdf of a beta random variable with: a = b = 1/4, 1, 4, 8; a = 5, b = 1; a = 1, b = 3; a = 2, b = 5.
Section 4.5: Functions of a Random Variable
4.76. Let X be a Gaussian random variable with mean 2 and variance 4. The reward in a system is given by Y = (X)⁺. Find the pdf of Y.
4.77. The amplitude of a radio signal X is a Rayleigh random variable with pdf:

f_X(x) = (x/α²) e^(−x²/2α²),   x > 0, α > 0.

(a) Find the pdf of Z = (X − r)⁺.
(b) Find the pdf of Z = X².
4.78. A wire has length X, an exponential random variable with mean 5π cm. The wire is cut to make rings of diameter 1 cm. Find the probability for the number of complete rings produced by each length of wire.
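A sketch of the ring-count pmf, assuming each complete ring consumes exactly π cm of wire (the circumference of a 1-cm-diameter ring), so the number of rings is N = ⌊X/π⌋; the result is a geometric-type pmf:

```python
import math

mean = 5.0 * math.pi                       # E[X] = 5*pi cm

def p_rings(k):
    # P[N = k] = P[k*pi <= X < (k+1)*pi] for the exponential cdf
    F = lambda x: 1.0 - math.exp(-x / mean)
    return F((k + 1) * math.pi) - F(k * math.pi)

# closed form: p_rings(k) = e^{-k/5} (1 - e^{-1/5}), a geometric pmf
assert abs(p_rings(3) - math.exp(-3 / 5) * (1 - math.exp(-1 / 5))) < 1e-12
```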
4.79. A signal that has amplitudes with a Gaussian pdf with zero mean and unit variance is applied to the quantizer in Example 4.27.
(a) Pick d so that the probability that X falls outside the range of the quantizer is 1%.
(b) Find the probability of the output levels of the quantizer.
4.80. The signal X is amplified and shifted as follows: Y = 2X + 3, where X is the random
variable in Problem 4.12. Find the cdf and pdf of Y.
4.81. The net profit in a transaction is given by Y = 2 - 4X where X is the random variable in
Problem 4.13. Find the cdf and pdf of Y.
4.82. Find the cdf and pdf of the output of the limiter in Problem 4.54 parts b, c, and d.
4.83. Find the cdf and pdf of the output of the limiter with center-level clipping in Problem 4.55
parts b, c, and d.
4.84. Find the cdf and pdf of Y = 3X + 2 in Problem 4.56 parts b, c, and d.
4.85. The exam grades in a certain class have a Gaussian pdf with mean m and standard deviation σ. Find the constants a and b so that the random variable Y = aX + b has a Gaussian pdf with mean m′ and standard deviation σ′.
4.86. Let X = Uⁿ, where n is a positive integer and U is a uniform random variable in the unit interval. Find the cdf and pdf of X.
4.87. Repeat Problem 4.86 if U is uniform in the interval [−1, 1].
4.88. Let Y = |X| be the output of a full-wave rectifier with input voltage X.
(a) Find the cdf of Y by finding the equivalent event of {Y ≤ y}. Find the pdf of Y by differentiation of the cdf.
(b) Find the pdf of Y by finding the equivalent event of {y < Y ≤ y + dy}. Does the answer agree with part a?
(c) What is the pdf of Y if f_X(x) is an even function of x?
4.89. Find and plot the cdf of Y in Example 4.34.
4.90. A voltage X is a Gaussian random variable with mean 1 and variance 2. Find the pdf of the power dissipated by an R-ohm resistor, P = RX².
4.91. Let Y = e^X.
(a) Find the cdf and pdf of Y in terms of the cdf and pdf of X.
(b) Find the pdf of Y when X is a Gaussian random variable. In this case Y is said to be
a lognormal random variable. Plot the pdf and cdf of Y when X is zero-mean with
variance 1/8; repeat with variance 8.
4.92. Let a radius be given by the random variable X in Problem 4.18.
(a) Find the pdf of the area covered by a disc with radius X.
(b) Find the pdf of the volume of a sphere with radius X.
(c) Find the pdf of the volume of a sphere in Rⁿ:

Y = { (2π)^(n/2) Xⁿ/(2 · 4 · ⋯ · n),          n even
    { 2(2π)^((n−1)/2) Xⁿ/(1 · 3 · ⋯ · n),     n odd.

4.93. In the quantizer in Example 4.20, let Z = X − q(X). Find the pdf of Z if X is a Laplacian random variable with parameter α = d/2.
4.94. Let Y = a tan(πX), where X is uniformly distributed in the interval (−1, 1).
(a) Show that Y is a Cauchy random variable.
(b) Find the pdf of Y = 1/X.
4.95. Let X be the Weibull random variable in Problem 4.15. Let Y = (X/λ)^β. Find the cdf and pdf of Y.
4.96. Find the pdf of X = −ln(1 − U), where U is a uniform random variable in (0, 1).
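This is the inverse-transform method: −ln(1 − u) is the inverse of the Exp(1) cdf, so the transformation should produce exponential samples with mean 1. A quick empirical sketch:

```python
import math, random

random.seed(1)
n = 100_000
# map Uniform(0,1) through the inverse cdf of Exp(1)
xs = [-math.log(1.0 - random.random()) for _ in range(n)]

mean = sum(xs) / n
assert abs(mean - 1.0) < 0.02      # sample mean ≈ E[X] = 1
assert min(xs) >= 0.0              # exponential support is [0, inf)
```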
Section 4.6: The Markov and Chebyshev Inequalities
4.97. Compare the Markov inequality and the exact probability for the event {X > c} as a function of c for:
(a) X is a uniform random variable in the interval [0, b].
(b) X is an exponential random variable with parameter λ.
(c) X is a Pareto random variable with α > 1.
(d) X is a Rayleigh random variable.
4.98. Compare the Markov inequality and the exact probability for the event {X > c} as a function of c for:
(a) X is a uniform random variable in {1, 2, …, L}.
(b) X is a geometric random variable.
(c) X is a Zipf random variable with L = 10; L = 100.
(d) X is a binomial random variable with n = 10, p = 0.5; n = 50, p = 0.5.
4.99. Compare the Chebyshev inequality and the exact probability for the event {|X − m| > c} as a function of c for:
(a) X is a uniform random variable in the interval [−b, b].
(b) X is a Laplacian random variable with parameter α.
(c) X is a zero-mean Gaussian random variable.
(d) X is a binomial random variable with n = 10, p = 0.5; n = 50, p = 0.5.
4.100. Let X be the number of successes in n Bernoulli trials, where the probability of success is p. Let Y = X/n be the average number of successes per trial. Apply the Chebyshev inequality to the event {|Y − p| > a}. What happens as n → ∞?
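A numerical comparison for this problem with assumed values p = 0.5 and a = 0.1: the Chebyshev bound p(1 − p)/(na²) dominates the exact binomial tail, and both vanish as n → ∞:

```python
from math import comb

p, a = 0.5, 0.1                  # assumed values for illustration

def exact(n):
    # P[|X/n - p| > a] for X ~ Binomial(n, p), summed directly
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1) if abs(k / n - p) > a)

def chebyshev(n):
    # VAR[Y]/a^2 = p(1-p)/(n a^2)
    return p * (1 - p) / (n * a * a)

for n in (10, 100, 1000):
    assert exact(n) <= chebyshev(n)   # the bound always holds
```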
4.101. Suppose that light bulbs have exponentially distributed lifetimes with unknown mean
E[X]. Suppose we measure the lifetime of n light bulbs, and we estimate the mean E[X]
by the arithmetic average Y of the measurements. Apply the Chebyshev inequality to the
event {|Y − E[X]| > a}. What happens as n → ∞? Hint: Use the m-Erlang random
variable.
Section 4.7: Transform Methods
4.102. (a) Find the characteristic function of the uniform random variable in [−b, b].
(b) Find the mean and variance of X by applying the moment theorem.
4.103. (a) Find the characteristic function of the Laplacian random variable.
(b) Find the mean and variance of X by applying the moment theorem.
4.104. Let Φ_X(ω) be the characteristic function of an exponential random variable. What random variable does Φ_X(ω)ⁿ correspond to?
4.105. Find the mean and variance of the Gaussian random variable by applying the moment
theorem to the characteristic function given in Table 4.1.
4.106. Find the characteristic function of Y = aX + b where X is a Gaussian random variable.
Hint: Use Eq. (4.79).
4.107. Show that the characteristic function of the Cauchy random variable is e^(−|ω|).
4.108. Find the Chernoff bound for the exponential random variable with λ = 1. Compare the bound to the exact value of P[X > 5].
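For λ = 1 the Chernoff bound is min over 0 < s < 1 of E[e^(sX)]e^(−5s) = e^(−5s)/(1 − s), minimized at s = 4/5 (set d/ds[−5s − ln(1 − s)] = 0). A quick check against the exact tail e^(−5):

```python
import math

s = 0.8                                 # optimal s = 1 - 1/5
bound = math.exp(-5 * s) / (1 - s)      # 5 e^{-4} ≈ 0.0916
exact = math.exp(-5.0)                  # P[X > 5] = e^{-5} ≈ 0.0067

assert exact < bound                    # the bound holds but is loose here
```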
4.109. (a) Find the probability generating function of the geometric random variable.
(b) Find the mean and variance of the geometric random variable from its pgf.
4.110. (a) Find the pgf for the binomial random variable X with parameters n and p.
(b) Find the mean and variance of X from the pgf.
4.111. Let G_X(z) be the pgf of a binomial random variable with parameters n and p, and let G_Y(z) be the pgf of a binomial random variable with parameters m and p. Consider the function G_X(z)G_Y(z). Is this a valid pgf? If so, to what random variable does it correspond?
4.112. Let G_N(z) be the pgf of a Poisson random variable with parameter α, and let G_M(z) be the pgf of a Poisson random variable with parameter β. Consider the function G_N(z)G_M(z). Is this a valid pgf? If so, to what random variable does it correspond?
4.113. Let N be a Poisson random variable with parameter α = 1. Compare the Chernoff bound and the exact value of P[N ≥ 5].
4.114. (a) Find the pgf G_U(z) of the discrete uniform random variable U.
(b) Find the mean and variance from the pgf.
(c) Consider G_U(z)². Does this function correspond to a pgf? If so, find the mean of the corresponding random variable.
4.115. (a) Find P[X = r] for the negative binomial random variable from the pgf in Table 3.1.
(b) Find the mean of X.
4.116. Derive Eq. (4.89).
4.117. Obtain the nth moment of a gamma random variable from the Laplace transform of
its pdf.
4.118. Let X be the mixture of two exponential random variables (see Example 4.58). Find the
Laplace transform of the pdf of X.
4.119. The Laplace transform of the pdf of a random variable X is given by:

X*(s) = ab/((s + a)(s + b)).

Find the pdf of X. Hint: Use a partial fraction expansion of X*(s).
4.120. Find a relationship between the Laplace transform of a gamma pdf with parameters α and λ and the Laplace transform of a gamma pdf with parameters α − 1 and λ. What does this imply if X is an m-Erlang random variable?
4.121. (a) Find the Chernoff bound for P[X > t] for the gamma random variable.
(b) Compare the bound to the exact value of P[X ≥ 9] for an m = 3, λ = 1 Erlang random variable.
Section 4.8: Basic Reliability Calculations
4.122. The lifetime T of a device has pdf

f_T(t) = { 1/(10T₀),             0 < t < T₀
         { 0.9λe^(−λ(t − T₀)),   t ≥ T₀
         { 0,                    t < 0.

(a) Find the reliability and MTTF of the device.
(b) Find the failure rate function.
(c) How many hours of operation can be considered to achieve 99% reliability?
4.123. The lifetime T of a device has pdf

f_T(t) = { 1/T₀,    a ≤ t ≤ a + T₀
         { 0,       elsewhere.

(a) Find the reliability and MTTF of the device.
(b) Find the failure rate function.
(c) How many hours of operation can be considered to achieve 99% reliability?
4.124. The lifetime T of a device is a Rayleigh random variable.
(a) Find the reliability of the device.
(b) Find the failure rate function. Does r(t) increase with time?
(c) Find the reliability of two devices that are in series.
(d) Find the reliability of two devices that are in parallel.
4.125. The lifetime T of a device is a Weibull random variable.
(a) Plot the failure rates for α = 1 and β = 0.5; for α = 1 and β = 2.
(b) Plot the reliability functions in part a.
(c) Plot the reliability of two devices that are in series.
(d) Plot the reliability of two devices that are in parallel.
4.126. A system starts with m devices, 1 active and m − 1 on standby. Each device has an exponential lifetime. When a device fails it is immediately replaced with another device (if one is still available).
(a) Find the reliability of the system.
(b) Find the failure rate function.
4.127. Find the failure rate function of the memory chips discussed in Example 2.28. Plot ln(r(t)) versus αt.
4.128. A device comes from two sources. Devices from source 1 have mean m and exponentially distributed lifetimes. Devices from source 2 have mean m and Pareto-distributed lifetimes with α > 1. Assume a fraction p is from source 1 and a fraction 1 − p from source 2.
(a) Find the reliability of an arbitrarily selected device.
(b) Find the failure rate function.
4.129. A device has the failure rate function:

r(t) = { 1 + 9(1 − t),       0 ≤ t < 1
       { 1,                  1 ≤ t < 10
       { 1 + 10(t − 10),     t ≥ 10.

Find the reliability function and the pdf of the device.
4.130. A system has three identical components, and the system is functioning if two or more components are functioning.
(a) Find the reliability and MTTF of the system if the component lifetimes are exponential random variables with mean 1.
(b) Find the reliability of the system if one of the components has mean 2.
4.131. Repeat Problem 4.130 if the component lifetimes are Weibull distributed with β = 3.
4.132. A system consists of two processors and three peripheral units. The system is functioning as long as one processor and two peripherals are functioning.
(a) Find the system reliability and MTTF if the processor lifetimes are exponential random variables with mean 5 and the peripheral lifetimes are Rayleigh random variables with mean 10.
(b) Find the system reliability and MTTF if the processor lifetimes are exponential random variables with mean 10 and the peripheral lifetimes are exponential random variables with mean 5.
4.133. An operation is carried out by a subsystem consisting of three units that operate in a series configuration.
(a) The units have exponentially distributed lifetimes with mean 1. How many subsystems should be operated in parallel to achieve a reliability of 99% in T hours of operation?
(b) Repeat part a with Rayleigh-distributed lifetimes.
(c) Repeat part a with Weibull-distributed lifetimes with β = 3.
Section 4.9: Computer Methods for Generating Random Variables
4.134. Octave provides function calls to evaluate the pdf and cdf of important continuous random variables. For example, the functions normal_cdf(x, m, var) and normal_pdf(x, m,
var) compute the cdf and pdf, respectively, at x for a Gaussian random variable with
mean m and variance var.
(a) Plot the conditional pdfs in Example 4.11 if v = ±2 and the noise is zero-mean and
unit variance.
(b) Compare the cdf of the Gaussian random variable with the Chernoff bound obtained in Example 4.44.
4.135. Plot the pdf and cdf of the gamma random variable for the following cases:
(a) λ = 1 and α = 1, 2, 4.
(b) λ = 1/2 and α = 1/2, 1, 3/2, 5/2.
4.136. The random variable X has the triangular pdf shown in Fig. P4.4.
(a) Find the transformation needed to generate X.
(b) Use Octave to generate 100 samples of X. Compare the empirical pdf of the samples
with the desired pdf.
[FIGURE P4.4: the triangular pdf f_X(x) on [−a, a], with peak value c at x = 0 (plot not reproduced).]
4.137. For each of the following random variables: Find the transformation needed to generate
the random variable X; use Octave to generate 1000 samples of X; Plot the sequence of
outcomes; compare the empirical pdf of the samples with the desired pdf.
(a) Laplacian random variable with α = 1.
(b) Pareto random variable with α = 1.5, 2, 2.5.
(c) Weibull random variable with β = 0.5, 2, 3 and λ = 1.
4.138. A random variable Y of mixed type has pdf

f_Y(x) = pδ(x) + (1 − p)f_X(x),

where X is a Laplacian random variable and p is a number between zero and one. Find the transformation required to generate Y.
4.139. Specify the transformation method needed to generate the geometric random variable with parameter p = 1/2. Find the average number of comparisons needed in the search to determine each outcome.
4.140. Specify the transformation method needed to generate the Poisson random variable with small parameter α. Compute the average number of comparisons needed in the search.
4.141. The following rejection method can be used to generate Gaussian random variables:
1. Generate U1, a uniform random variable in the unit interval.
2. Let X1 = −ln(U1).
3. Generate U2, a uniform random variable in the unit interval. If U2 ≤ exp{−(X1 − 1)²/2}, accept X1. Otherwise, reject X1 and go to step 1.
4. Generate a random sign (+ or −) with equal probability. Output X equal to X1 with the resulting sign.
(a) Show that if X1 is accepted, then its pdf corresponds to the pdf of the absolute value of a Gaussian random variable with mean 0 and variance 1.
(b) Show that X is a Gaussian random variable with mean 0 and variance 1.
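The four steps above can be sketched directly (a Python version; the acceptance test of step 3 and the random sign of step 4 are implemented verbatim, and the sample moments are checked against mean 0 and variance 1):

```python
import math, random

random.seed(7)

def abs_gauss():
    # steps 1-3: propose X1 ~ Exp(1), accept with prob exp(-(X1-1)^2/2)
    while True:
        x1 = -math.log(1.0 - random.random())   # (1 - U) avoids log(0)
        if random.random() <= math.exp(-(x1 - 1.0) ** 2 / 2.0):
            return x1

def gauss():
    # step 4: attach a random sign with equal probability
    return abs_gauss() * random.choice((-1.0, 1.0))

xs = [gauss() for _ in range(50_000)]
m = sum(xs) / len(xs)
v = sum(x * x for x in xs) / len(xs) - m * m
assert abs(m) < 0.02 and abs(v - 1.0) < 0.03    # ≈ N(0, 1) moments
```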
4.142. Cheng (1977) has shown that the function Kf_Z(x) bounds the pdf of a gamma random variable with α > 1, where

f_Z(x) = λα^λ x^(λ−1)/(α^λ + x^λ)²   and   K = (2α − 1)^(1/2).

Find the cdf corresponding to f_Z(x) and the transformation needed to generate Z.
4.143. (a) Show that in the modified rejection method, the probability of accepting X1 is 1/K.
Hint: Use conditional probability.
(b) Show that Z has the desired pdf.
4.144. Two methods for generating binomial random variables are: (1) Generate n Bernoulli
random variables and add the outcomes; (2) Divide the unit interval according to binomial probabilities. Compare the methods under the following conditions:
(a) p = 1/2, n = 5, 25, 50;
(b) p = 0.1, n = 5, 25, 50.
(c) Use Octave to implement the two methods by generating 1000 binomially distributed samples.
4.145. Let the number of event occurrences in a time interval be a Poisson random variable. In
Section 3.4, it was found that the time between events for a Poisson random variable is an
exponentially distributed random variable.
(a) Explain how one can generate Poisson random variables from a sequence of exponentially distributed random variables.
(b) How does this method compare with the one presented in Problem 4.140?
(c) Use Octave to implement the two methods when α = 3, α = 25, and α = 100.
4.146. Write a program to generate the gamma pdf with α > 1 using the rejection method discussed in Problem 4.142. Use this method to generate m-Erlang random variables with m = 2, 10 and λ = 1, and compare the method to the straightforward generation of m exponential random variables as discussed in Example 4.57.
*Section 4.10: Entropy
4.147. Let X be the outcome of the toss of a fair die.
(a) Find the entropy of X.
(b) Suppose you are told that X is even. What is the reduction in entropy?
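A numerical check for this problem: the fair die has entropy log₂6 bits, and conditioning on {X even} leaves a uniform choice among {2, 4, 6}:

```python
import math

H_die = math.log2(6)            # fair die: H = log2(6) ≈ 2.585 bits
H_even = math.log2(3)           # uniform over {2, 4, 6}
reduction = H_die - H_even      # learning "X is even" removes exactly 1 bit
assert abs(reduction - 1.0) < 1e-12
```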
4.148. A biased coin is tossed three times.
(a) Find the entropy of the outcome if the sequence of heads and tails is noted.
(b) Find the entropy of the outcome if the number of heads is noted.
(c) Explain the difference between the entropies in parts a and b.
4.149. Let X be the number of tails until the first heads in a sequence of tosses of a biased coin.
(a) Find the entropy of X given that X ≥ k.
(b) Find the entropy of X given that X ≤ k.
4.150. One of two coins is selected at random: Coin A has P[heads] = 1/10 and coin B has
P[heads] = 9/10.
(a) Suppose the coin is tossed once. Find the entropy of the outcome.
(b) Suppose the coin is tossed twice and the sequence of heads and tails is observed.
Find the entropy of the outcome.
4.151. Suppose that the randomly selected coin in Problem 4.150 is tossed until the first occurrence of heads. Suppose that heads occurs in the kth toss. Find the entropy regarding the
identity of the coin.
4.152. A communication channel accepts input I from the set {0, 1, 2, 3, 4, 5, 6}. The channel
output is X = I + N mod 7, where N is equally likely to be +1 or -1.
(a) Find the entropy of I if all inputs are equiprobable.
(b) Find the entropy of I given that X = 4.
4.153. Let X be a discrete random variable with entropy H_X.
(a) Find the entropy of Y = 2X.
(b) Find the entropy of any invertible transformation of X.
4.154. Let (X, Y) be the pair of outcomes from two independent tosses of a die.
(a) Find the entropy of X.
(b) Find the entropy of the pair (X, Y).
(c) Find the entropy in n independent tosses of a die. Explain why entropy is additive in
this case.
4.155. Let X be the outcome of the toss of a die, and let Y be a randomly selected integer less
than or equal to X.
(a) Find the entropy of Y.
(b) Find the entropy of the pair (X, Y) and denote it by H(X, Y).
(c) Find the entropy of Y given X = k and denote it by g(k) = H(Y | X = k). Find E[g(X)] = E[H(Y | X)].
(d) Show that H(X, Y) = H_X + E[H(Y | X)]. Explain the meaning of this equation.
4.156. Let X take on values from {1, 2, …, K}. Suppose that P[X = K] = p, and let H_Y be the entropy of X given that X is not equal to K. Show that

H_X = −p ln p − (1 − p) ln(1 − p) + (1 − p)H_Y.
4.157. Let X be a uniform random variable in Example 4.62. Find and plot the entropy of Q as a function of the variance of the error X − Q(X). Hint: Express the variance of the error in terms of d and substitute into the expression for the entropy of Q.
4.158. A communication channel accepts as input either 000 or 111. The channel transmits each
binary input correctly with probability 1 - p and erroneously with probability p. Find
the entropy of the input given that the output is 000; given that the output is 010.
4.159. Let X be a uniform random variable in the interval [−a, a]. Suppose we are told that X is positive. Use the approach in Example 4.62 to find the reduction in entropy. Show that this is equal to the difference between the differential entropy of X and the differential entropy of X given {X > 0}.
4.160. Let X be uniform in [a, b], and let Y = 2X. Compare the differential entropies of X and
Y. How does this result differ from the result in Problem 4.153?
4.161. Find the pmf for the random variable X for which the sequence of questions in Fig. 4.26(a)
is optimum.
4.162. Let the random variable X have S_X = {1, 2, 3, 4, 5, 6} and pmf (3/8, 3/8, 1/8, 1/16, 1/32, 1/32). Find the entropy of X. What is the best code you can find for X?
4.163. Seven cards are drawn from a deck of 52 distinct cards. How many bits are required to
represent all possible outcomes?
4.164. Find the optimum encoding for the geometric random variable with p = 1/2.
4.165. An urn experiment has 10 equiprobable distinct outcomes. Find the performance of the
best tree code for encoding (a) a single outcome of the experiment; (b) a sequence of n
outcomes of the experiment.
4.166. A binary information source produces n outputs. Suppose we are told that there are k 1’s
in these n outputs.
(a) What is the best code to indicate which pattern of k 1’s and n - k 0’s occurred?
(b) How many bits are required to specify the value of k using a code with a fixed number of bits?
4.167. The random variable X takes on values from the set {1, 2, 3, 4}. Find the maximum entropy pmf for X given that E[X] = 2.
4.168. The random variable X is nonnegative. Find the maximum entropy pdf for X given that
E[X] = 10.
4.169. Find the maximum entropy pdf of X given that E[X²] = c.
4.170. Suppose we are given two parameters of the random variable X, E[g₁(X)] = c₁ and
E[g₂(X)] = c₂.
(a) Show that the maximum entropy pdf for X has the form
f_X(x) = C e^(−λ₁g₁(x) − λ₂g₂(x)).
(b) Find the entropy of X.
4.171. Find the maximum entropy pdf of X given that E[X] = m and VAR[X] = σ².
Problems Requiring Cumulative Knowledge
4.172. Three types of customers arrive at a service station. The time required to service type 1
customers is an exponential random variable with mean 2. Type 2 customers have a Pareto distribution with α = 3 and x_m = 1. Type 3 customers require a constant service time
of 2 seconds. Suppose that the proportions of type 1, 2, and 3 customers are 1/2, 1/8, and 3/8,
respectively. Find the probability that an arbitrary customer requires more than 15 seconds of service time. Compare this probability to the bound provided by the
Markov inequality.
4.173. The lifetime X of a light bulb is a random variable with
P[X > t] = 2/(2 + t) for t > 0.
Suppose three new light bulbs are installed at time t = 0. At time t = 1 all three light
bulbs are still working. Find the probability that at least one light bulb is still working at
time t = 9.
4.174. The random variable X is uniformly distributed in the interval [0, a]. Suppose a is unknown, so we estimate a by the maximum value observed in n independent repetitions of
the experiment; that is, we estimate a by Y = max{X₁, X₂, …, Xₙ}.
(a) Find P[Y ≤ y].
(b) Find the mean and variance of Y, and explain why Y is a good estimate for a when n
is large.
4.175. The sample X of a signal is a Gaussian random variable with m = 0 and σ² = 1. Suppose
that X is quantized by a nonuniform quantizer consisting of four intervals:
(−∞, −a], (−a, 0], (0, a], and (a, ∞).
(a) Find the value of a so that X is equally likely to fall in each of the four intervals.
(b) Find the representation point x₁ = q(X) for X in (0, a] that minimizes the mean-squared error, that is, so that
∫₀ᵃ (x − x₁)² f_X(x) dx is minimized.
Hint: Differentiate the above expression with respect to x₁. Find the representation
points for the other intervals.
(c) Evaluate the mean-squared error of the quantizer, E[(X − q(X))²].
4.176. The output Y of a binary communication system is a unit-variance Gaussian random variable with
mean zero when the input is “0” and mean one when the input is “1”. Assume the input
is 1 with probability p.
(a) Find P[input is 1 | y < Y ≤ y + h] and P[input is 0 | y < Y ≤ y + h].
(b) The receiver uses the following decision rule:
If P[input is 1 | y < Y ≤ y + h] > P[input is 0 | y < Y ≤ y + h], decide the input
was 1; otherwise, decide the input was 0.
Show that this decision rule leads to the following threshold rule:
If Y > T, decide the input was 1; otherwise, decide the input was 0.
(c) What is the probability of error for this decision rule?
CHAPTER 5
Pairs of Random Variables
Many random experiments involve several random variables. In some experiments a
number of different quantities are measured. For example, the voltage signals at several points in a circuit at some specific time may be of interest. Other experiments involve the repeated measurement of a certain quantity such as the repeated
measurement (“sampling”) of the amplitude of an audio or video signal that varies
with time. In Chapter 4 we developed techniques for calculating the probabilities of
events involving a single random variable in isolation. In this chapter, we extend the
concepts already introduced to two random variables:
• We use the joint pmf, cdf, and pdf to calculate the probabilities of events that involve the joint behavior of two random variables;
• We use expected value to define joint moments that summarize the behavior of
two random variables;
• We determine when two random variables are independent, and we quantify
their degree of “correlation” when they are not independent;
• We obtain conditional probabilities involving a pair of random variables.
In a sense we have already covered all the fundamental concepts of probability
and random variables, and we are “simply” elaborating on the case of two or more random variables. Nevertheless, there are significant analytical techniques that need to be
learned, e.g., double summations of pmf’s and double integration of pdf’s, so we first
discuss the case of two random variables in detail because we can draw on our geometric intuition. Chapter 6 considers the general case of vector random variables. Throughout these two chapters you should be mindful of the forest (fundamental concepts) and
the trees (specific techniques)!
5.1
TWO RANDOM VARIABLES
The notion of a random variable as a mapping is easily generalized to the case where
two quantities are of interest. Consider a random experiment with sample space S and
event class F. We are interested in a function that assigns a pair of real numbers
[FIGURE 5.1: (a) A function assigns a pair of real numbers to each outcome in S. (b) Equivalent events for two random variables.]
X(z) = (X(z), Y(z)) to each outcome z in S. Basically we are dealing with a vector
function that maps S into R², the real plane, as shown in Fig. 5.1(a). We are ultimately interested in events involving the pair (X, Y).
Example 5.1
Let a random experiment consist of selecting a student’s name from an urn. Let z denote the
outcome of this experiment, and define the following two functions:
H(z) = height of student z in centimeters
W(z) = weight of student z in kilograms
(H(z), W(z)) assigns a pair of numbers to each z in S.
We are interested in events involving the pair (H, W). For example, the event
B = {H ≤ 183, W ≤ 82} represents students with height less than 183 cm (6 feet) and weight less
than 82 kg (180 lb).
Example 5.2
A Web page provides the user with a choice either to watch a brief ad or to move directly to the
requested page. Let z be the patterns of user arrivals in T seconds, e.g., number of arrivals, and
listing of arrival times and types. Let N₁(z) be the number of times the Web page is directly requested and let N₂(z) be the number of times that the ad is chosen. (N₁(z), N₂(z)) assigns a pair
of nonnegative integers to each z in S. Suppose that a type 1 request brings 0.001¢ in revenue
and a type 2 request brings in 1¢. Find the event “revenue in T seconds is less than $100.”
The total revenue in T seconds is 0.001 N₁ + 1 N₂ (in cents), and so the event of interest is
B = {0.001 N₁ + 1 N₂ < 10,000}.
Example 5.3
Let the outcome z in a random experiment be the length of a randomly selected message. Suppose that messages are broken into packets of maximum length M bytes. Let Q be the number of
full packets in a message and let R be the number of bytes left over. (Q(z), R(z)) assigns a pair
of numbers to each z in S. Q takes on values in the range 0, 1, 2, …, and R takes on values in the
range 0, 1, …, M − 1. An event of interest may be B = {R < M/2}, “the last packet is less than
half full.”
Example 5.4
Let the outcome of a random experiment result in a pair z = (z₁, z₂) that results from two independent spins of a wheel. Each spin of the wheel results in a number in the interval (0, 2π].
Define the pair of numbers (X, Y) in the plane as follows:
X(z) = (2 ln(2π/z₁))^(1/2) cos z₂
Y(z) = (2 ln(2π/z₁))^(1/2) sin z₂.
The vector function (X(z), Y(z)) assigns a pair of numbers in the plane to each z in S. The
square root term corresponds to a radius and z₂ to an angle.
We will see that (X, Y) models the noise voltages encountered in digital communication
systems. An event of interest here may be B = {X² + Y² < r²}, “total noise power is less
than r².”
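The transformation in Example 5.4 is easy to simulate. The Python sketch below is our addition, not part of the text; the seed and sample size are arbitrary. The closed form used as a reference follows from P[2 ln(2π/z₁) > t] = P[z₁ < 2πe^(−t/2)] = e^(−t/2), so P[X² + Y² < r²] = 1 − e^(−r²/2).

```python
import math
import random

def noise_sample(rng):
    # Two independent "spins of the wheel", each uniform on (0, 2*pi]
    z1 = 2 * math.pi * (1.0 - rng.random())   # in (0, 2*pi], avoids log(0)
    z2 = 2 * math.pi * (1.0 - rng.random())
    r = math.sqrt(2 * math.log(2 * math.pi / z1))  # the radius term
    return r * math.cos(z2), r * math.sin(z2)      # the pair (X, Y)

rng = random.Random(1234)
n, r2 = 200_000, 1.0          # estimate P[X^2 + Y^2 < r^2] for r^2 = 1
hits = 0
for _ in range(n):
    x, y = noise_sample(rng)
    if x * x + y * y < r2:
        hits += 1
est = hits / n
exact = 1 - math.exp(-r2 / 2)  # since R^2 = 2 ln(2*pi/z1) is exponential with mean 2
print(est, exact)
```

With 200,000 samples the empirical frequency should agree with the exact probability to within about 0.01.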
The events involving a pair of random variables (X, Y) are specified by conditions
that we are interested in and can be represented by regions in the plane. Figure 5.2
shows three examples of events:
A = {X + Y ≤ 10}
B = {min(X, Y) ≤ 5}
C = {X² + Y² ≤ 100}.
[FIGURE 5.2: Examples of two-dimensional events.]
Event A divides the plane into two regions according to a straight line. Note that the
event in Example 5.2 is of this type. Event C identifies a disk centered at the origin and
corresponds to the event in Example 5.4. Event B is found by noting that
{min(X, Y) ≤ 5} = {X ≤ 5} ∪ {Y ≤ 5}, that is, the minimum of X and Y is less
than or equal to 5 if either X and/or Y is less than or equal to 5.
To determine the probability that the pair X = (X, Y) is in some region B in the
plane, we proceed as in Chapter 3 to find the equivalent event for B in the underlying
sample space S:
A = X⁻¹(B) = {z : (X(z), Y(z)) in B}.  (5.1a)
The relationship between A = X⁻¹(B) and B is shown in Fig. 5.1(b). If A is in F, then
it has a probability assigned to it, and we obtain:
P[X in B] = P[A] = P[{z : (X(z), Y(z)) in B}].  (5.1b)
The approach is identical to what we followed in the case of a single random variable.
The only difference is that we are considering the joint behavior of X and Y that is induced by the underlying random experiment.
A scattergram can be used to deduce the joint behavior of two random variables.
A scattergram plot simply places a dot at every observation pair (x, y) that results from
performing the experiment that generates (X, Y). Figure 5.3 shows the scattergram for
200 observations of four different pairs of random variables. The pairs in Fig. 5.3(a) appear to be uniformly distributed in the unit square. The pairs in Fig. 5.3(b) are clearly
confined to a disc of unit radius and appear to be more concentrated near the origin.
The pairs in Fig. 5.3(c) are concentrated near the origin, and appear to have circular
symmetry, but are not bounded to an enclosed region. The pairs in Fig. 5.3(d) again are
concentrated near the origin and appear to have a clear linear relationship of some
sort, that is, larger values of x tend to have linearly proportional increasing values of y.
We later introduce various functions and moments to characterize the behavior of
pairs of random variables illustrated in these examples.
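The four behaviors just described can be reproduced with simple generators. The Python sketch below uses plausible distributions that we have assumed for illustration (the text does not specify them): uniform on the unit square for (a); uniform angle with uniform radius for (b), which concentrates points near the origin inside the unit disk; independent standard Gaussians for (c); and correlated Gaussians with an assumed ρ = 0.9 for (d).

```python
import math
import random

rng = random.Random(7)
n = 200   # same number of observations as in Fig. 5.3

# (a) uniform on the unit square
a = [(rng.random(), rng.random()) for _ in range(n)]

# (b) confined to the unit disk and denser near the origin
#     (uniform angle and uniform radius -- an assumed generator)
b = []
for _ in range(n):
    r, th = rng.random(), rng.uniform(0.0, 2 * math.pi)
    b.append((r * math.cos(th), r * math.sin(th)))

# (c) circular symmetry, unbounded: independent standard Gaussians
c = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]

# (d) a linear tendency: Y correlated with X (rho = 0.9 assumed)
rho = 0.9
d = []
for _ in range(n):
    x = rng.gauss(0, 1)
    d.append((x, rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)))

print(len(a), len(b), len(c), len(d))
```

Feeding these four lists of (x, y) pairs to any plotting routine produces scattergrams of the four types shown in Fig. 5.3.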
The joint probability mass function, joint cumulative distribution function, and
joint probability density function provide approaches to specifying the probability law
that governs the behavior of the pair (X, Y). Our general approach is as follows. We
first focus on events that correspond to rectangles in the plane:
B = {X in A₁} ∩ {Y in A₂}  (5.2)
where A_k is a one-dimensional event (i.e., subset of the real line). We say that these
events are of product form. The event B occurs when both {X in A₁} and {Y in A₂}
occur jointly. Figure 5.4 shows some two-dimensional product-form events. We use Eq.
(5.1b) to find the probability of product-form events:
P[B] = P[{X in A₁} ∩ {Y in A₂}] ≜ P[X in A₁, Y in A₂].  (5.3)
By defining A₁ and A₂ appropriately we then obtain the joint pmf, joint cdf, and joint pdf of
(X, Y).
5.2
PAIRS OF DISCRETE RANDOM VARIABLES
Let the vector random variable X = (X, Y) assume values from some countable set
S_X,Y = {(x_j, y_k), j = 1, 2, …, k = 1, 2, …}. The joint probability mass function of X
specifies the probabilities of the event {X = x} ∩ {Y = y}:
[FIGURE 5.3: A scattergram for 200 observations of four different pairs of random variables.]
[FIGURE 5.4: Some two-dimensional product-form events.]
p_X,Y(x, y) = P[{X = x} ∩ {Y = y}] ≜ P[X = x, Y = y] for (x, y) ∈ R².  (5.4a)
The values of the pmf on the set S_X,Y provide the essential information:
p_X,Y(x_j, y_k) = P[{X = x_j} ∩ {Y = y_k}] ≜ P[X = x_j, Y = y_k], (x_j, y_k) ∈ S_X,Y.  (5.4b)
There are several ways of showing the pmf graphically: (1) For small sample
spaces we can present the pmf in the form of a table as shown in Fig. 5.5(a). (2) We can
present the pmf using arrows of height p_X,Y(x_j, y_k) placed at the points {(x_j, y_k)} in
the plane, as shown in Fig. 5.5(b), but this can be difficult to draw. (3) We can place dots
at the points {(x_j, y_k)} and label these with the corresponding pmf value as shown in
Fig. 5.5(c).
The probability of any event B is the sum of the pmf over the outcomes in B:
P[X in B] = Σ Σ_{(x_j, y_k) in B} p_X,Y(x_j, y_k).  (5.5)
Frequently it is helpful to sketch the region that contains the points in B as shown, for
example, in Fig. 5.6. When the event B is the entire sample space S_X,Y, we have:
Σ_{j=1}^∞ Σ_{k=1}^∞ p_X,Y(x_j, y_k) = 1.  (5.6)
Example 5.5
A packet switch has two input ports and two output ports. At a given time slot a packet arrives at
each input port with probability 1/2, and is equally likely to be destined to output port 1 or 2. Let
X and Y be the number of packets destined for output ports 1 and 2, respectively. Find the pmf
of X and Y, and show the pmf graphically.
The outcome I_j for an input port j can take the following values: “n”, no packet arrival
(with probability 1/2); “a1”, packet arrival destined for output port 1 (with probability 1/4); “a2”,
packet arrival destined for output port 2 (with probability 1/4). The underlying sample space S
consists of the pair of input outcomes z = (I₁, I₂). The mapping for (X, Y) is shown in the table
below:

z:      (n, n)   (n, a1)  (n, a2)  (a1, n)  (a1, a1)  (a1, a2)  (a2, n)  (a2, a1)  (a2, a2)
(X, Y): (0, 0)   (1, 0)   (0, 1)   (1, 0)   (2, 0)    (1, 1)    (0, 1)   (1, 1)    (0, 2)
The pmf of (X, Y) is then:
p_X,Y(0, 0) = P[z = (n, n)] = (1/2)(1/2) = 1/4,
p_X,Y(0, 1) = P[z ∈ {(n, a2), (a2, n)}] = 2 × (1/8) = 1/4,
[FIGURE 5.5: Graphical representations of pmf’s: (a) in table format; (b) use of arrows to show height; (c) labeled dots corresponding to pmf value.]
[FIGURE 5.6: Showing the pmf via a sketch containing the points in B. The table gives the joint pmf of the loaded dice in Example 5.6: p_X,Y(j, k) = 2/42 when j = k and 1/42 when j ≠ k, for j, k = 1, …, 6.]
p_X,Y(1, 0) = P[z ∈ {(n, a1), (a1, n)}] = 1/4,
p_X,Y(1, 1) = P[z ∈ {(a1, a2), (a2, a1)}] = 1/8,
p_X,Y(0, 2) = P[z = (a2, a2)] = 1/16,
p_X,Y(2, 0) = P[z = (a1, a1)] = 1/16.
Figure 5.5(a) shows the pmf in tabular form where the number of rows and columns accommodate the range of X and Y respectively. Each entry in the table gives the pmf value for the
corresponding x and y. Figure 5.5(b) shows the pmf using arrows in the plane. An arrow of height
p_X,Y(j, k) is placed at each of the points in S_X,Y = {(0, 0), (0, 1), (1, 0), (1, 1), (0, 2), (2, 0)}.
Figure 5.5(c) shows the pmf using labeled dots in the plane. A dot with label p_X,Y(j, k) is placed
at each of the points in S_X,Y.
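The pmf of Example 5.5 can be verified by brute-force enumeration of the nine input pairs. The following Python sketch (our addition; it uses exact rational arithmetic) rebuilds the joint pmf from the underlying experiment:

```python
from itertools import product
from fractions import Fraction

# Input-port outcomes: no arrival, arrival for output port 1, arrival for output port 2
p_in = {"n": Fraction(1, 2), "a1": Fraction(1, 4), "a2": Fraction(1, 4)}

pmf = {}
for i1, i2 in product(p_in, repeat=2):          # all pairs z = (I1, I2)
    x = (i1 == "a1") + (i2 == "a1")             # packets destined to output port 1
    y = (i1 == "a2") + (i2 == "a2")             # packets destined to output port 2
    pmf[(x, y)] = pmf.get((x, y), Fraction(0)) + p_in[i1] * p_in[i2]

print(pmf)
```

The result reproduces the six pmf values computed above, e.g. p_X,Y(1, 1) = 1/8 and p_X,Y(2, 0) = 1/16, and the values sum to 1.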
Example 5.6
A random experiment consists of tossing two “loaded” dice and noting the pair of numbers
(X, Y) facing up. The joint pmf p_X,Y(j, k) for j = 1, …, 6 and k = 1, …, 6 is given by the two-dimensional table shown in Fig. 5.6. The (j, k) entry in the table contains the value p_X,Y(j, k).
Find P[min(X, Y) = 3].
Figure 5.6 shows the region that corresponds to the set {min(x, y) = 3}. The probability
of this event is given by:
P[min(X, Y) = 3] = p_X,Y(6, 3) + p_X,Y(5, 3) + p_X,Y(4, 3)
+ p_X,Y(3, 3) + p_X,Y(3, 4) + p_X,Y(3, 5) + p_X,Y(3, 6)
= 6(1/42) + 2/42 = 8/42.
5.2.1 Marginal Probability Mass Function
The joint pmf of X provides the information about the joint behavior of X and Y. We
are also interested in the probabilities of events involving each of the random variables
in isolation. These can be found in terms of the marginal probability mass functions:
p_X(x_j) = P[X = x_j]
= P[X = x_j, Y = anything]
= P[{X = x_j and Y = y₁} ∪ {X = x_j and Y = y₂} ∪ …]
= Σ_{k=1}^∞ p_X,Y(x_j, y_k),  (5.7a)
and similarly,
p_Y(y_k) = P[Y = y_k]
= Σ_{j=1}^∞ p_X,Y(x_j, y_k).  (5.7b)
The marginal pmf’s satisfy all the properties of one-dimensional pmf’s, and they
supply the information required to compute the probability of events involving the
corresponding random variable.
The probability p_X,Y(x_j, y_k) can be interpreted as the long-term relative frequency
of the joint event {X = x_j} ∩ {Y = y_k} in a sequence of repetitions of the random
experiment. Equation (5.7a) corresponds to the fact that the relative frequency of the
event {X = x_j} is found by adding the relative frequencies of all outcome pairs in which
x_j appears. In general, it is impossible to deduce the relative frequencies of pairs of values
X and Y from the relative frequencies of X and Y in isolation. The same is true for pmf’s:
In general, knowledge of the marginal pmf’s is insufficient to specify the joint pmf.
Example 5.7
Find the marginal pmf for the output ports (X, Y) in Example 5.5.
Figure 5.5(a) shows that the marginal pmf is found by adding entries along a row or column
in the table. For example, by adding along the x = 1 column we have:
p_X(1) = P[X = 1] = p_X,Y(1, 0) + p_X,Y(1, 1) = 1/4 + 1/8 = 3/8.
Similarly, by adding along the y = 0 row:
p_Y(0) = P[Y = 0] = p_X,Y(0, 0) + p_X,Y(1, 0) + p_X,Y(2, 0) = 1/4 + 1/4 + 1/16 = 9/16.
Figure 5.5(b) shows the marginal pmf using arrows on the real line.
Example 5.8
Find the marginal pmf’s in the loaded dice experiment in Example 5.6.
The probability that X = 1 is found by summing over the first row:
P[X = 1] = 2/42 + 1/42 + … + 1/42 = 1/6.
Similarly, we find that P[X = j] = 1/6 for j = 2, …, 6. The probability that Y = k is found by
summing over the kth column. We then find that P[Y = k] = 1/6 for k = 1, 2, …, 6. Thus each
die, in isolation, appears to be fair in the sense that each face is equiprobable. If we knew only
these marginal pmf’s we would have no idea that the dice are loaded.
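Examples 5.6 and 5.8 can be checked with a few lines of Python (our addition; the joint pmf is read off Fig. 5.6, with 2/42 on the diagonal and 1/42 elsewhere):

```python
from fractions import Fraction

# Joint pmf of the loaded dice (Fig. 5.6)
pmf = {(j, k): Fraction(2 if j == k else 1, 42)
       for j in range(1, 7) for k in range(1, 7)}

# Marginals: sum each row for X, each column for Y
px = {j: sum(pmf[j, k] for k in range(1, 7)) for j in range(1, 7)}
py = {k: sum(pmf[j, k] for j in range(1, 7)) for k in range(1, 7)}

# The event {min(X, Y) = 3} from Example 5.6
p_min3 = sum(p for (j, k), p in pmf.items() if min(j, k) == 3)
print(px[1], py[6], p_min3)
```

Every marginal probability comes out 1/6 (each die looks fair in isolation), while P[min(X, Y) = 3] = 8/42 = 4/21 reflects the loading.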
Example 5.9
In Example 5.3, let the number of bytes N in a message have a geometric distribution with parameter 1 − p and range S_N = {0, 1, 2, …}. Find the joint pmf and the marginal pmf’s of Q and R.
If a message has N bytes, then the number of full packets is the quotient Q in the division
of N by M, and the number of remaining bytes is the remainder R. The probability of the pair
{(q, r)} is given by
P[Q = q, R = r] = P[N = qM + r] = (1 − p)p^(qM + r).
The marginal pmf of Q is
P[Q = q] = P[N ∈ {qM, qM + 1, …, qM + (M − 1)}]
= Σ_{k=0}^{M−1} (1 − p)p^(qM + k)
= (1 − p)p^(qM) (1 − p^M)/(1 − p)
= (1 − p^M)(p^M)^q,  q = 0, 1, 2, …
The marginal pmf of Q is geometric with parameter p^M. The marginal pmf of R is:
P[R = r] = P[N ∈ {r, M + r, 2M + r, …}]
= Σ_{q=0}^∞ (1 − p)p^(qM + r) = ((1 − p)/(1 − p^M)) p^r,  r = 0, 1, …, M − 1.
R has a truncated geometric pmf. As an exercise, you should verify that all the above marginal
pmf’s add to 1.
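The two closed-form marginals can be checked numerically by summing the joint pmf directly. In the Python sketch below (our addition), the values p = 0.9 and M = 4 are arbitrary choices for illustration:

```python
# Assumed illustration values: success parameter p and packet size M
p, M = 0.9, 4

def joint(q, r):
    # P[Q = q, R = r] = P[N = qM + r] = (1 - p) p^(qM + r)
    return (1 - p) * p ** (q * M + r)

# Marginal of Q: summing the joint pmf over r should give (1 - p^M)(p^M)^q
for q in range(6):
    direct = sum(joint(q, r) for r in range(M))
    closed = (1 - p ** M) * (p ** M) ** q
    assert abs(direct - closed) < 1e-12

# Marginal of R: summing over q (truncated deep in the geometric tail)
# should give (1 - p) p^r / (1 - p^M)
for r in range(M):
    direct = sum(joint(q, r) for q in range(2000))
    closed = (1 - p) * p ** r / (1 - p ** M)
    assert abs(direct - closed) < 1e-9

print("marginals match the closed forms")
```

The truncation at q = 2000 is harmless here because p^(qM) decays geometrically.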
5.3
THE JOINT CDF OF X AND Y
In Chapter 3 we saw that semi-infinite intervals of the form (−∞, x] are a basic building block from which other one-dimensional events can be built. By defining the cdf
F_X(x) as the probability of (−∞, x], we were then able to express the probabilities of
other events in terms of the cdf. In this section we repeat the above development for
two-dimensional random variables.
[FIGURE 5.7: The joint cumulative distribution function F_X,Y(x₁, y₁) = P[X ≤ x₁, Y ≤ y₁] is defined as the probability of the semi-infinite rectangle defined by the point (x₁, y₁).]
A basic building block for events involving two-dimensional random variables is
the semi-infinite rectangle defined by {(x, y) : x ≤ x₁ and y ≤ y₁}, as shown in Fig. 5.7.
We also use the more compact notation {x ≤ x₁, y ≤ y₁} to refer to this region. The
joint cumulative distribution function of X and Y is defined as the probability of the
event {X ≤ x₁} ∩ {Y ≤ y₁}:
F_X,Y(x₁, y₁) = P[X ≤ x₁, Y ≤ y₁].  (5.8)
In terms of relative frequency, F_X,Y(x₁, y₁) represents the long-term proportion
of time in which the outcome of the random experiment yields a point X that falls in
the rectangular region shown in Fig. 5.7. In terms of probability “mass,” F_X,Y(x₁, y₁)
represents the amount of mass contained in the rectangular region.
The joint cdf satisfies the following properties.
(i) The joint cdf is a nondecreasing function of x and y:
F_X,Y(x₁, y₁) ≤ F_X,Y(x₂, y₂)  if x₁ ≤ x₂ and y₁ ≤ y₂.  (5.9a)
(ii) F_X,Y(x₁, −∞) = 0, F_X,Y(−∞, y₁) = 0, F_X,Y(∞, ∞) = 1.  (5.9b)
(iii) We obtain the marginal cumulative distribution functions by removing the
constraint on one of the variables. The marginal cdf’s are the probabilities of
the regions shown in Fig. 5.8:
F_X(x₁) = F_X,Y(x₁, ∞) and F_Y(y₁) = F_X,Y(∞, y₁).  (5.9c)
(iv) The joint cdf is continuous from the “north” and from the “east,” that is,
lim_{x→a⁺} F_X,Y(x, y) = F_X,Y(a, y) and lim_{y→b⁺} F_X,Y(x, y) = F_X,Y(x, b).  (5.9d)
(v) The probability of the rectangle {x₁ < x ≤ x₂, y₁ < y ≤ y₂} is given by:
P[x₁ < X ≤ x₂, y₁ < Y ≤ y₂] =
F_X,Y(x₂, y₂) − F_X,Y(x₂, y₁) − F_X,Y(x₁, y₂) + F_X,Y(x₁, y₁).  (5.9e)
[FIGURE 5.8: The marginal cdf’s are the probabilities of half-planes: F_X(x₁) = P[X ≤ x₁, Y < ∞] and F_Y(y₁) = P[X < ∞, Y ≤ y₁].]
Property (i) follows by noting that the semi-infinite rectangle defined by (x₁, y₁) is
contained in that defined by (x₂, y₂) and applying Corollary 7. Properties (ii) to (iv)
are obtained by limiting arguments. For example, the sequence {x ≤ x₁ and y ≤ −n}
is decreasing and approaches the empty set ∅, so
F_X,Y(x₁, −∞) = lim_{n→∞} F_X,Y(x₁, −n) = P[∅] = 0.
For property (iii) we take the sequence {x ≤ x₁ and y ≤ n}, which increases to
{x ≤ x₁}, so
lim_{n→∞} F_X,Y(x₁, n) = P[X ≤ x₁] = F_X(x₁).
For property (v) note in Fig. 5.9(a) that B = {x₁ < X ≤ x₂, Y ≤ y₁} = {X ≤ x₂,
Y ≤ y₁} − {X ≤ x₁, Y ≤ y₁}, so P[B] = P[x₁ < X ≤ x₂, Y ≤ y₁] = F_X,Y(x₂, y₁)
− F_X,Y(x₁, y₁). In Fig. 5.9(b), note that F_X,Y(x₂, y₂) = P[A] + P[B] + F_X,Y(x₁, y₂).
Property (v) follows by solving for P[A] and substituting the expression for P[B].
[FIGURE 5.9: The joint cdf can be used to determine the probability of various events.]
[FIGURE 5.10: Joint cdf for the packet switch example.]
Example 5.10
Plot the joint cdf of X and Y from Example 5.5. Find the marginal cdf of X.
To find the cdf of X, we identify the regions in the plane according to which points in S_X,Y
are included in the rectangular region defined by (x, y). For example,
• The regions outside the first quadrant do not include any of the points, so F_X,Y(x, y) = 0.
• The region {0 ≤ x < 1, 0 ≤ y < 1} contains the point (0, 0), so F_X,Y(x, y) = 1/4.
Figure 5.10 shows the cdf after all possible regions are examined.
We need to consider several cases to find F_X(x). For x < 0, we have F_X(x) = 0. For
0 ≤ x < 1, we have F_X(x) = F_X,Y(x, ∞) = 9/16. For 1 ≤ x < 2, we have F_X(x) = F_X,Y(x, ∞) = 15/16. Finally, for x ≥ 2, we have F_X(x) = F_X,Y(x, ∞) = 1. Therefore F_X(x) is a
staircase function and X is a discrete random variable with p_X(0) = 9/16, p_X(1) = 6/16, and
p_X(2) = 1/16.
Example 5.11
The joint cdf for the pair of random variables X = (X, Y) is given by
F_X,Y(x, y) =
  0    if x < 0 or y < 0
  xy   if 0 ≤ x ≤ 1, 0 ≤ y ≤ 1
  x    if 0 ≤ x ≤ 1, y > 1
  y    if 0 ≤ y ≤ 1, x > 1
  1    if x ≥ 1, y ≥ 1.        (5.10)
Plot the joint cdf and find the marginal cdf of X.
Figure 5.11 shows a plot of the joint cdf of X and Y. F_X,Y(x, y) is continuous for all points
in the plane. F_X,Y(x, y) = 1 for all x ≥ 1 and y ≥ 1, which implies that X and Y each assume
values less than or equal to one.
[FIGURE 5.11: Joint cdf for two uniform random variables.]
The marginal cdf of X is:
F_X(x) = F_X,Y(x, ∞) =
  0   if x < 0
  x   if 0 ≤ x ≤ 1
  1   if x ≥ 1.
X is uniformly distributed in the unit interval.
Example 5.12
The joint cdf for the vector of random variables X = (X, Y) is given by
F_X,Y(x, y) =
  (1 − e^(−αx))(1 − e^(−βy))   if x ≥ 0, y ≥ 0
  0                            elsewhere.
Find the marginal cdf’s.
The marginal cdf’s are obtained by letting one of the variables approach infinity:
F_X(x) = lim_{y→∞} F_X,Y(x, y) = 1 − e^(−αx),  x ≥ 0
F_Y(y) = lim_{x→∞} F_X,Y(x, y) = 1 − e^(−βy),  y ≥ 0.
X and Y individually have exponential distributions with parameters α and β, respectively.
Example 5.13
Find the probability of the events A = {X ≤ 1, Y ≤ 1}, B = {X > x, Y > y}, where x > 0
and y > 0, and D = {1 < X ≤ 2, 2 < Y ≤ 5} in Example 5.12.
The probability of A is given directly by the cdf:
P[A] = P[X ≤ 1, Y ≤ 1] = F_X,Y(1, 1) = (1 − e^(−α))(1 − e^(−β)).
The probability of B requires more work. By DeMorgan’s rule:
Bᶜ = ({X > x} ∩ {Y > y})ᶜ = {X ≤ x} ∪ {Y ≤ y}.
Corollary 5 in Section 2.2 gives the probability of the union of two events:
P[Bᶜ] = P[X ≤ x] + P[Y ≤ y] − P[X ≤ x, Y ≤ y]
= (1 − e^(−αx)) + (1 − e^(−βy)) − (1 − e^(−αx))(1 − e^(−βy))
= 1 − e^(−αx)e^(−βy).
Finally we obtain the probability of B:
P[B] = 1 − P[Bᶜ] = e^(−αx)e^(−βy).
You should sketch the region B on the plane and identify the events involved in the calculation
of the probability of Bᶜ.
The probability of event D is found by applying property (v) of the joint cdf:
P[1 < X ≤ 2, 2 < Y ≤ 5]
= F_X,Y(2, 5) − F_X,Y(2, 2) − F_X,Y(1, 5) + F_X,Y(1, 2)
= (1 − e^(−2α))(1 − e^(−5β)) − (1 − e^(−2α))(1 − e^(−2β))
− (1 − e^(−α))(1 − e^(−5β)) + (1 − e^(−α))(1 − e^(−2β)).
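The rectangle probability in Example 5.13 follows directly from property (v). The Python sketch below (our addition; the parameter values α = 1.0 and β = 0.5 are arbitrary choices) also exploits the fact that this particular cdf factors into F_X(x)F_Y(y), so the rectangle probability must equal the product of the two one-dimensional interval probabilities:

```python
import math

alpha, beta = 1.0, 0.5   # assumed parameter values

def F(x, y):
    # Joint cdf of Example 5.12
    if x < 0 or y < 0:
        return 0.0
    return (1 - math.exp(-alpha * x)) * (1 - math.exp(-beta * y))

def rect_prob(x1, x2, y1, y2):
    # Property (v): probability of the rectangle {x1 < X <= x2, y1 < Y <= y2}
    return F(x2, y2) - F(x2, y1) - F(x1, y2) + F(x1, y1)

# Event D = {1 < X <= 2, 2 < Y <= 5}
pd = rect_prob(1, 2, 2, 5)
px = math.exp(-alpha * 1) - math.exp(-alpha * 2)   # P[1 < X <= 2]
py = math.exp(-beta * 2) - math.exp(-beta * 5)     # P[2 < Y <= 5]
print(pd, px * py)
```

The two printed values agree to machine precision, as they must when the joint cdf is a product of the marginals.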
5.3.1 Random Variables That Differ in Type
In some problems it is necessary to work with joint random variables that differ in
type, that is, one is discrete and the other is continuous. Usually it is rather clumsy to
work with the joint cdf, and so it is preferable to work with either P[X = k, Y ≤ y] or
P[X = k, y₁ < Y ≤ y₂]. These probabilities are sufficient to compute the joint cdf
should we have to.
Example 5.14 Communication Channel with Discrete Input and Continuous Output
The input X to a communication channel is +1 volt or −1 volt with equal probability. The output
Y of the channel is the input plus a noise voltage N that is uniformly distributed in the interval
from −2 volts to +2 volts. Find P[X = +1, Y ≤ 0].
This problem lends itself to the use of conditional probability:
P[X = +1, Y ≤ y] = P[Y ≤ y | X = +1]P[X = +1],
where P[X = +1] = 1/2. When the input X = 1, the output Y is uniformly distributed in the
interval [−1, 3]; therefore
P[Y ≤ y | X = +1] = (y + 1)/4 for −1 ≤ y ≤ 3.
Thus P[X = +1, Y ≤ 0] = P[Y ≤ 0 | X = +1]P[X = +1] = (1/4)(1/2) = 1/8.
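A Monte Carlo check of this result (our addition; seed and sample size are arbitrary choices):

```python
import random

rng = random.Random(42)
n = 400_000
count = 0
for _ in range(n):
    x = 1 if rng.random() < 0.5 else -1   # input: +1 or -1 with equal probability
    y = x + rng.uniform(-2.0, 2.0)        # output: input plus uniform noise
    if x == 1 and y <= 0:
        count += 1
est = count / n
print(est)   # should be near P[X = +1, Y <= 0] = 1/8 = 0.125
```

With 400,000 trials the empirical frequency is within a few thousandths of 1/8.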
5.4
THE JOINT PDF OF TWO CONTINUOUS RANDOM VARIABLES
The joint cdf allows us to compute the probability of events that correspond to “rectangular” shapes in the plane. To compute the probability of events corresponding to regions
other than rectangles, we note that any reasonable shape (i.e., disk, polygon, or half-plane)
can be approximated by the union of disjoint infinitesimal rectangles, B_j,k. For example,
Fig. 5.12 shows how the events A = {X + Y ≤ 1} and B = {X² + Y² ≤ 1} are
approximated by rectangles of infinitesimal width. The probability of such events can
therefore be approximated by the sum of the probabilities of infinitesimal rectangles, and
if the cdf is sufficiently smooth, the probability of each rectangle can be expressed in
terms of a density function:
P[B] ≈ Σ_j Σ_k P[B_j,k] = Σ_{(x_j, y_k)∈B} f_X,Y(x_j, y_k) Δx Δy.
As Δx and Δy approach zero, the above equation becomes an integral of a probability
density function over the region B.
[FIGURE 5.12: Some two-dimensional non-product-form events.]
[FIGURE 5.13: The probability of A is the integral of f_X,Y(x, y) over the region defined by A.]
We say that the random variables X and Y are jointly continuous if the probabilities of events involving (X, Y) can be expressed as an integral of a probability density
function. In other words, there is a nonnegative function f_X,Y(x, y), called the joint
probability density function, that is defined on the real plane such that for every event
B, a subset of the plane,
P[X in B] = ∫∫_B f_X,Y(x′, y′) dx′ dy′,  (5.11)
as shown in Fig. 5.13. Note the similarity to Eq. (5.5) for discrete random variables.
When B is the entire plane, the integral must equal one:
1 = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f_X,Y(x′, y′) dx′ dy′.  (5.12)
Equations (5.11) and (5.12) again suggest that the probability “mass” of an event is
found by integrating the density of probability mass over the region corresponding to
the event.
The joint cdf can be obtained in terms of the joint pdf of jointly continuous random variables by integrating over the semi-infinite rectangle defined by (x, y):
F_X,Y(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f_X,Y(x′, y′) dy′ dx′.  (5.13)
It then follows that if X and Y are jointly continuous random variables, then the pdf
can be obtained from the cdf by differentiation:
f_X,Y(x, y) = ∂²F_X,Y(x, y) / ∂x ∂y.  (5.14)
Note that if X and Y are not jointly continuous, then it is possible that the above partial
derivative does not exist. In particular, if F_X,Y(x, y) is discontinuous or if its partial derivatives are discontinuous, then the joint pdf as defined by Eq. (5.14) will not exist.
The probability of a rectangular region is obtained by letting B = {(x, y) : a₁ < x ≤ b₁ and a₂ < y ≤ b₂} in Eq. (5.11):
P[a₁ < X ≤ b₁, a₂ < Y ≤ b₂] = ∫_{a₁}^{b₁} ∫_{a₂}^{b₂} f_X,Y(x′, y′) dy′ dx′.  (5.15)
It then follows that the probability of an infinitesimal rectangle is the product of the
pdf and the area of the rectangle:
P[x < X ≤ x + dx, y < Y ≤ y + dy] = ∫_x^{x+dx} ∫_y^{y+dy} f_X,Y(x′, y′) dy′ dx′
≅ f_X,Y(x, y) dx dy.  (5.16)
Equation (5.16) can be interpreted as stating that the joint pdf specifies the probability
of the product-form events
{x < X ≤ x + dx} ∩ {y < Y ≤ y + dy}.
The marginal pdf’s f_X(x) and f_Y(y) are obtained by taking the derivative of the
corresponding marginal cdf’s, F_X(x) = F_X,Y(x, ∞) and F_Y(y) = F_X,Y(∞, y). Thus
f_X(x) = (d/dx) ∫_{−∞}^{x} [ ∫_{−∞}^{∞} f_X,Y(x′, y′) dy′ ] dx′
= ∫_{−∞}^{∞} f_X,Y(x, y′) dy′.  (5.17a)
Similarly,
f_Y(y) = ∫_{−∞}^{∞} f_X,Y(x′, y) dx′.  (5.17b)
Thus the marginal pdf’s are obtained by integrating out the variables that are not of
interest.
Note that f_X(x) dx ≅ P[x < X ≤ x + dx, Y < ∞] is the probability of the
infinitesimal strip shown in Fig. 5.14(a). This reminds us of the interpretation of
the marginal pmf’s as the probabilities of columns and rows in the case of discrete
random variables. It is not surprising then that Eqs. (5.17a) and (5.17b) for the
marginal pdf’s and Eqs. (5.7a) and (5.7b) for the marginal pmf’s are identical
except for the fact that one contains an integral and the other a summation. As in
the case of pmf’s, we note that, in general, the joint pdf cannot be obtained from
the marginal pdf’s.
[FIGURE 5.14: Interpretation of marginal pdf’s: (a) f_X(x) dx ≅ P[x < X ≤ x + dx, Y < ∞]; (b) f_Y(y) dy ≅ P[X < ∞, y < Y ≤ y + dy].]
Example 5.15 Jointly Uniform Random Variables
A randomly selected point (X, Y) in the unit square has the uniform joint pdf given by
$$f_{X,Y}(x,y) = \begin{cases} 1 & 0 \le x \le 1 \text{ and } 0 \le y \le 1 \\ 0 & \text{elsewhere.} \end{cases}$$
The scattergram in Fig. 5.3(a) corresponds to this pair of random variables. Find the joint cdf of
X and Y.
The cdf is found by evaluating Eq. (5.13). You must be careful with the limits of the integral: the limits should define the region consisting of the intersection of the semi-infinite rectangle defined by (x, y) and the region where the pdf is nonzero. There are five cases in this problem, corresponding to the five regions shown in Fig. 5.15.
1. If $x < 0$ or $y < 0$, the pdf is zero and Eq. (5.13) implies
$$F_{X,Y}(x,y) = 0.$$
2. If $(x, y)$ is inside the unit square,
$$F_{X,Y}(x,y) = \int_0^x\!\int_0^y 1\,dx'\,dy' = xy.$$
3. If $0 \le x \le 1$ and $y > 1$,
$$F_{X,Y}(x,y) = \int_0^x\!\int_0^1 1\,dy'\,dx' = x.$$
4. Similarly, if $x > 1$ and $0 \le y \le 1$,
$$F_{X,Y}(x,y) = y.$$
FIGURE 5.15 Regions (I through V) that need to be considered separately in computing the cdf in Example 5.15.
5. Finally, if $x > 1$ and $y > 1$,
$$F_{X,Y}(x,y) = \int_0^1\!\int_0^1 1\,dx'\,dy' = 1.$$
We see that this is the joint cdf of Example 5.11.
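The five cases collapse to a compact clip-and-multiply rule. The following sketch (ours, not from the text; the function name is our choice) evaluates the piecewise cdf and cross-checks one value against a Monte Carlo estimate from uniform points on the unit square:

```python
import random

def joint_cdf_uniform(x, y):
    """Piecewise joint cdf of Example 5.15 (uniform pdf on the unit square).

    Clipping each argument to [0, 1] reproduces all five regions: 0 outside,
    x*y inside the square, x or y in the side strips, and 1 beyond the square.
    """
    if x < 0 or y < 0:
        return 0.0
    return min(x, 1.0) * min(y, 1.0)

# Monte Carlo cross-check: the fraction of uniform points falling in the
# semi-infinite rectangle (-inf, x0] x (-inf, y0] estimates F_XY(x0, y0).
random.seed(0)
x0, y0 = 0.3, 0.7
n = 200_000
hits = 0
for _ in range(n):
    u, v = random.random(), random.random()
    hits += (u <= x0) and (v <= y0)
estimate = hits / n
```

With 200,000 samples the estimate should be within about 0.003 of the exact value $F_{X,Y}(0.3, 0.7) = 0.21$.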
Example 5.16
Find the normalization constant c and the marginal pdf’s for the following joint pdf:
$$f_{X,Y}(x,y) = \begin{cases} c\,e^{-x}e^{-y} & 0 \le y \le x < \infty \\ 0 & \text{elsewhere.} \end{cases}$$
The pdf is nonzero in the shaded region shown in Fig. 5.16(a). The constant c is found from
the normalization condition specified by Eq. (5.12):
$$1 = \int_0^{\infty}\!\int_0^{x} c\,e^{-x}e^{-y}\,dy\,dx = \int_0^{\infty} c\,e^{-x}(1 - e^{-x})\,dx = \frac{c}{2}.$$
Therefore $c = 2$. The marginal pdf's are found by evaluating Eqs. (5.17a) and (5.17b):

$$f_X(x) = \int_0^{\infty} f_{X,Y}(x,y)\,dy = \int_0^{x} 2e^{-x}e^{-y}\,dy = 2e^{-x}(1 - e^{-x}), \qquad 0 \le x < \infty,$$

and

$$f_Y(y) = \int_0^{\infty} f_{X,Y}(x,y)\,dx = \int_y^{\infty} 2e^{-x}e^{-y}\,dx = 2e^{-2y}, \qquad 0 \le y < \infty.$$
You should fill in the steps in the evaluation of the integrals as well as verify that the marginal
pdf’s integrate to 1.
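The integrals above can also be spot-checked numerically. This sketch (ours, not the book's) implements the joint pdf with $c = 2$ and recovers both marginals by midpoint-rule integration of Eqs. (5.17a) and (5.17b):

```python
import math

def joint_pdf(x, y):
    """Joint pdf of Example 5.16 with the normalization constant c = 2."""
    return 2.0 * math.exp(-x - y) if 0.0 <= y <= x else 0.0

def marginal_x(x, n=20000):
    """Midpoint-rule integration of the joint pdf over y in [0, x] (Eq. 5.17a)."""
    h = x / n
    return sum(joint_pdf(x, (j + 0.5) * h) for j in range(n)) * h

def marginal_y(y, hi=50.0, n=20000):
    """Midpoint-rule integration over x in [y, hi] (Eq. 5.17b); the
    exponential tail beyond hi = 50 is negligible."""
    h = (hi - y) / n
    return sum(joint_pdf(y + (i + 0.5) * h, y) for i in range(n)) * h
```

At any point the numeric marginals should match the closed forms $2e^{-x}(1 - e^{-x})$ and $2e^{-2y}$ to several decimal places.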
FIGURE 5.16 The random variables X and Y in Examples 5.16 and 5.17 have a pdf that is nonzero only in the shaded region shown in part (a), the region $0 \le y \le x$; part (b) shows the intersection of that region with the event $\{X + Y \le 1\}$, bounded by the lines $x = y$ and $x + y = 1$.
Example 5.17
Find $P[X + Y \le 1]$ in Example 5.16.
Figure 5.16(b) shows the intersection of the event $\{X + Y \le 1\}$ and the region where the pdf is nonzero. We obtain the probability of the event by "adding" (actually integrating) infinitesimal rectangles of width $dy$ as indicated in the figure:
$$P[X + Y \le 1] = \int_0^{0.5}\!\int_y^{1-y} 2e^{-x}e^{-y}\,dx\,dy = \int_0^{0.5} 2e^{-y}\bigl[e^{-y} - e^{-(1-y)}\bigr]\,dy = 1 - 2e^{-1}.$$
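A Monte Carlo cross-check is straightforward once the joint pdf can be sampled. The factorization used below — Y exponential with rate 2, and $X - Y$ exponential with rate 1 given Y — is consistent with the conditional pdfs worked out later in Example 5.32; the code itself is our sketch, not part of the text:

```python
import math, random

def sample_xy(rng):
    """Draw (X, Y) with joint pdf 2 e^{-x-y} on 0 <= y <= x by factoring
    f(x, y) = f_Y(y) f_X(x | y): Y ~ Exp(rate 2), then X = Y + Exp(rate 1)."""
    y = -math.log(1.0 - rng.random()) / 2.0   # exponential, rate 2
    x = y - math.log(1.0 - rng.random())      # shifted exponential, rate 1
    return x, y

rng = random.Random(1)
n = 400_000
hits = sum(1 for _ in range(n) if sum(sample_xy(rng)) <= 1.0)
estimate = hits / n
exact = 1.0 - 2.0 / math.e    # 1 - 2 e^{-1}, about 0.264
```

The estimate should agree with $1 - 2e^{-1}$ to roughly three decimal places at this sample size.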
Example 5.18 Jointly Gaussian Random Variables
The joint pdf of X and Y, shown in Fig. 5.17, is

$$f_{X,Y}(x,y) = \frac{1}{2\pi\sqrt{1 - \rho^2}}\, e^{-(x^2 - 2\rho xy + y^2)/2(1 - \rho^2)}, \qquad -\infty < x, y < \infty. \qquad (5.18)$$
We say that X and Y are jointly Gaussian.¹ Find the marginal pdf's.
The marginal pdf of X is found by integrating $f_{X,Y}(x,y)$ over y:

$$f_X(x) = \frac{e^{-x^2/2(1-\rho^2)}}{2\pi\sqrt{1 - \rho^2}} \int_{-\infty}^{\infty} e^{-(y^2 - 2\rho xy)/2(1 - \rho^2)}\,dy.$$
¹This is an important special case of jointly Gaussian random variables. The general case is discussed in Section 5.9.
FIGURE 5.17 Joint pdf of two jointly Gaussian random variables.
We complete the square of the argument of the exponent by adding and subtracting $\rho^2 x^2$, that is, $y^2 - 2\rho xy + \rho^2 x^2 - \rho^2 x^2 = (y - \rho x)^2 - \rho^2 x^2$. Therefore

$$f_X(x) = \frac{e^{-x^2/2(1-\rho^2)}}{2\pi\sqrt{1 - \rho^2}} \int_{-\infty}^{\infty} e^{-[(y - \rho x)^2 - \rho^2 x^2]/2(1 - \rho^2)}\,dy = \frac{e^{-x^2/2}}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \frac{e^{-(y - \rho x)^2/2(1 - \rho^2)}}{\sqrt{2\pi(1 - \rho^2)}}\,dy = \frac{e^{-x^2/2}}{\sqrt{2\pi}},$$
where we have noted that the last integral equals one since its integrand is a Gaussian pdf with mean $\rho x$ and variance $1 - \rho^2$. The marginal pdf of X is therefore a one-dimensional Gaussian pdf with mean 0 and variance 1. From the symmetry of $f_{X,Y}(x,y)$ in x and y, we conclude that the marginal pdf of Y is also a one-dimensional Gaussian pdf with zero mean and unit variance.
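The marginalization can be verified numerically for a specific correlation value. This sketch (ours; $\rho = 0.6$ is an arbitrary choice, not from the text) integrates Eq. (5.18) over y with a midpoint rule and compares the result to the standard normal pdf:

```python
import math

RHO = 0.6   # arbitrary correlation value for the check

def joint_gauss(x, y, r=RHO):
    """Jointly Gaussian pdf of Eq. (5.18)."""
    return math.exp(-(x * x - 2.0 * r * x * y + y * y)
                    / (2.0 * (1.0 - r * r))) \
        / (2.0 * math.pi * math.sqrt(1.0 - r * r))

def marginal_x(x, lo=-10.0, hi=10.0, n=4000):
    """Integrate the joint pdf over y; per Example 5.18 the result must be
    the standard normal pdf regardless of the value of RHO."""
    h = (hi - lo) / n
    return sum(joint_gauss(x, lo + (i + 0.5) * h) for i in range(n)) * h

def std_normal(x):
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
```

Changing `RHO` to any other value in $(-1, 1)$ should leave the computed marginal unchanged, which is exactly the point of the example.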
5.5 INDEPENDENCE OF TWO RANDOM VARIABLES
X and Y are independent random variables if any event $A_1$ defined in terms of X is independent of any event $A_2$ defined in terms of Y; that is,

$$P[X \text{ in } A_1, Y \text{ in } A_2] = P[X \text{ in } A_1]\,P[Y \text{ in } A_2]. \qquad (5.19)$$
In this section we present a simple set of conditions for determining when X and Y are
independent.
Suppose that X and Y are a pair of discrete random variables, and suppose we are interested in the probability of the event $A = A_1 \cap A_2$, where $A_1$ involves only X and $A_2$ involves only Y. In particular, if X and Y are independent, then $A_1$ and $A_2$ are independent events. If we let $A_1 = \{X = x_j\}$ and $A_2 = \{Y = y_k\}$, then the
independence of X and Y implies that

$$p_{X,Y}(x_j, y_k) = P[X = x_j, Y = y_k] = P[X = x_j]\,P[Y = y_k] = p_X(x_j)\,p_Y(y_k) \qquad \text{for all } x_j \text{ and } y_k. \qquad (5.20)$$
Therefore, if X and Y are independent discrete random variables, then the joint pmf is
equal to the product of the marginal pmf’s.
Now suppose that we don't know if X and Y are independent, but we do know that the pmf satisfies Eq. (5.20). Let $A = A_1 \cap A_2$ be a product-form event as above; then

$$P[A] = \sum_{x_j \in A_1} \sum_{y_k \in A_2} p_{X,Y}(x_j, y_k) = \sum_{x_j \in A_1} \sum_{y_k \in A_2} p_X(x_j)\,p_Y(y_k) = \sum_{x_j \in A_1} p_X(x_j) \sum_{y_k \in A_2} p_Y(y_k) = P[A_1]\,P[A_2], \qquad (5.21)$$
which implies that $A_1$ and $A_2$ are independent events. Therefore, if the joint pmf of X and Y equals the product of the marginal pmf's, then X and Y are independent. We have just proved that the statement "X and Y are independent" is equivalent to the statement "the joint pmf is equal to the product of the marginal pmf's." In mathematical language, we say that the discrete random variables X and Y are independent if and only if the joint pmf is equal to the product of the marginal pmf's for all $x_j$, $y_k$.
Example 5.19
Is the pmf in Example 5.6 consistent with an experiment that consists of the independent tosses
of two fair dice?
The probability of each face in a toss of a fair die is 1/6. If two fair dice are tossed and if the
tosses are independent, then the probability of any pair of faces, say j and k, is:
$$P[X = j, Y = k] = P[X = j]\,P[Y = k] = \frac{1}{36}.$$
Thus all possible pairs of outcomes should be equiprobable. This is not the case for the joint pmf
given in Example 5.6. Therefore the tosses in Example 5.6 are not independent.
Example 5.20
Are Q and R in Example 5.9 independent? From Example 5.9 we have
$$P[Q = q]\,P[R = r] = (1 - p^M)(p^M)^q \cdot \frac{1 - p}{1 - p^M}\,p^r = (1 - p)\,p^{Mq + r} = P[Q = q, R = r]$$

for all $q = 0, 1, \ldots$ and $r = 0, \ldots, M - 1$.
Therefore Q and R are independent.
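Example 5.9 is not reproduced in this section, but the factorization can be checked numerically under the setup it assumes: X geometric with pmf $(1-p)p^x$ on $x = 0, 1, \ldots$, with Q and R the quotient and remainder of X divided by M. The following sketch (ours) verifies $P[Q = q]P[R = r] = P[Q = q, R = r]$ by direct summation:

```python
# Assumed setup from Example 5.9: X geometric, Q = X // M, R = X % M.
p, M = 0.7, 4

def joint(q, r):
    """P[Q = q, R = r] = P[X = M q + r]."""
    return (1.0 - p) * p ** (M * q + r)

def p_q(q):
    """Marginal of Q; should equal (1 - p**M) * (p**M)**q."""
    return sum(joint(q, r) for r in range(M))

def p_r(r):
    """Marginal of R; should equal (1 - p) * p**r / (1 - p**M).
    The geometric tail beyond q = 200 is negligible."""
    return sum(joint(q, r) for q in range(200))
```

The marginals match the closed forms in the example, and their product reproduces the joint pmf exactly, confirming independence.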
In general, it can be shown that the random variables X and Y are independent if and only if their joint cdf is equal to the product of the marginal cdf's:

$$F_{X,Y}(x,y) = F_X(x)\,F_Y(y) \qquad \text{for all } x \text{ and } y. \qquad (5.22)$$

Similarly, if X and Y are jointly continuous, then X and Y are independent if and only if their joint pdf is equal to the product of the marginal pdf's:

$$f_{X,Y}(x,y) = f_X(x)\,f_Y(y) \qquad \text{for all } x \text{ and } y. \qquad (5.23)$$
Equation (5.23) is obtained from Eq. (5.22) by differentiation. Conversely, Eq. (5.22) is
obtained from Eq. (5.23) by integration.
Example 5.21
Are the random variables X and Y in Example 5.16 independent?
Note that $f_X(x)$ and $f_Y(y)$ are nonzero for all $x > 0$ and all $y > 0$. Hence $f_X(x)f_Y(y)$ is nonzero in the entire positive quadrant. However, $f_{X,Y}(x,y)$ is nonzero only in the region $y < x$ inside the positive quadrant. Hence Eq. (5.23) does not hold for all x, y, and the random variables are not independent. You should note that in this example the joint pdf appears to factor, but nevertheless it is not the product of the marginal pdf's.
Example 5.22
Are the random variables X and Y in Example 5.18 independent? The product of the marginal
pdf’s of X and Y in Example 5.18 is
$$f_X(x)\,f_Y(y) = \frac{1}{2\pi}\,e^{-(x^2 + y^2)/2}, \qquad -\infty < x, y < \infty.$$
By comparing to Eq. (5.18) we see that the product of the marginals is equal to the joint pdf if and only if $\rho = 0$. Therefore the jointly Gaussian random variables X and Y are independent if and only if $\rho = 0$. We see in a later section that $\rho$ is the correlation coefficient between X and Y.
Example 5.23
Are the random variables X and Y independent in Example 5.12? If we multiply the marginal
cdf’s found in Example 5.12 we find
$$F_X(x)\,F_Y(y) = (1 - e^{-\alpha x})(1 - e^{-\beta y}) = F_{X,Y}(x,y) \qquad \text{for all } x \text{ and } y.$$
Therefore Eq. (5.22) is satisfied so X and Y are independent.
If X and Y are independent random variables, then the random variables defined
by any pair of functions g(X) and h(Y) are also independent. To show this, consider the
one-dimensional events A and B. Let $A'$ be the set of all values of x such that if x is in $A'$ then g(x) is in A, and let $B'$ be the set of all values of y such that if y is in $B'$ then h(y) is in B. (In Chapter 3 we called $A'$ and $B'$ the equivalent events of A and B.) Then

$$P[g(X) \text{ in } A, h(Y) \text{ in } B] = P[X \text{ in } A', Y \text{ in } B'] = P[X \text{ in } A']\,P[Y \text{ in } B'] = P[g(X) \text{ in } A]\,P[h(Y) \text{ in } B]. \qquad (5.24)$$
The first and third equalities follow from the fact that A and $A'$, and B and $B'$, are equivalent events. The second equality follows from the independence of X and Y. Thus g(X) and h(Y) are independent random variables.
5.6 JOINT MOMENTS AND EXPECTED VALUES OF A FUNCTION OF TWO RANDOM VARIABLES
The expected value of X identifies the center of mass of the distribution of X. The variance, which is defined as the expected value of $(X - m)^2$, provides a measure of the spread of the distribution. In the case of two random variables we are interested in how X and Y vary together. In particular, we are interested in whether the variations of X and Y are correlated. For example, if X increases, does Y tend to increase or to decrease? The joint moments of X and Y, which are defined as expected values of functions of X and Y, provide this information.
5.6.1 Expected Value of a Function of Two Random Variables
The problem of finding the expected value of a function of two or more random variables is similar to that of finding the expected value of a function of a single random variable. It can be shown that the expected value of $Z = g(X, Y)$ can be found using the following expressions:

$$E[Z] = \begin{cases} \displaystyle\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} g(x,y)\,f_{X,Y}(x,y)\,dx\,dy & X, Y \text{ jointly continuous} \\[2ex] \displaystyle\sum_i \sum_n g(x_i, y_n)\,p_{X,Y}(x_i, y_n) & X, Y \text{ discrete.} \end{cases} \qquad (5.25)$$
Example 5.24 Sum of Random Variables
Let Z = X + Y. Find E[Z].
$$E[Z] = E[X + Y] = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} (x' + y')\,f_{X,Y}(x', y')\,dx'\,dy'$$
$$= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} x'\,f_{X,Y}(x', y')\,dy'\,dx' + \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} y'\,f_{X,Y}(x', y')\,dx'\,dy'$$
$$= \int_{-\infty}^{\infty} x'\,f_X(x')\,dx' + \int_{-\infty}^{\infty} y'\,f_Y(y')\,dy' = E[X] + E[Y]. \qquad (5.26)$$
Thus the expected value of the sum of two random variables is equal to the sum of the individual
expected values. Note that X and Y need not be independent.
The result in Example 5.24 and a simple induction argument show that the expected value of a sum of n random variables is equal to the sum of the expected values:
$$E[X_1 + X_2 + \cdots + X_n] = E[X_1] + \cdots + E[X_n]. \qquad (5.27)$$
Note that the random variables do not have to be independent.
Example 5.25 Product of Functions of Independent Random Variables
Suppose that X and Y are independent random variables, and let $g(X, Y) = g_1(X)g_2(Y)$. Find $E[g(X, Y)] = E[g_1(X)g_2(Y)]$.

$$E[g_1(X)g_2(Y)] = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} g_1(x')g_2(y')\,f_X(x')f_Y(y')\,dx'\,dy' = \biggl\{\int_{-\infty}^{\infty} g_1(x')\,f_X(x')\,dx'\biggr\}\biggl\{\int_{-\infty}^{\infty} g_2(y')\,f_Y(y')\,dy'\biggr\} = E[g_1(X)]\,E[g_2(Y)].$$
5.6.2 Joint Moments, Correlation, and Covariance
The joint moments of two random variables X and Y summarize information about
their joint behavior. The jkth joint moment of X and Y is defined by
$$E[X^j Y^k] = \begin{cases} \displaystyle\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} x^j y^k f_{X,Y}(x,y)\,dx\,dy & X, Y \text{ jointly continuous} \\[2ex] \displaystyle\sum_i \sum_n x_i^j y_n^k\, p_{X,Y}(x_i, y_n) & X, Y \text{ discrete.} \end{cases} \qquad (5.28)$$
If j = 0, we obtain the moments of Y, and if k = 0, we obtain the moments of X. In electrical engineering, it is customary to call the j = 1, k = 1 moment, E[XY], the correlation of X and Y. If $E[XY] = 0$, then we say that X and Y are orthogonal.
The jkth central moment of X and Y is defined as the joint moment of the centered random variables, $X - E[X]$ and $Y - E[Y]$:

$$E[(X - E[X])^j (Y - E[Y])^k].$$

Note that j = 2, k = 0 gives VAR(X) and j = 0, k = 2 gives VAR(Y).
The covariance of X and Y is defined as the j = k = 1 central moment:

$$\mathrm{COV}(X, Y) = E[(X - E[X])(Y - E[Y])].$$

The following form for COV(X, Y) is sometimes more convenient to work with:

$$\mathrm{COV}(X, Y) = E[XY - X E[Y] - Y E[X] + E[X]E[Y]] \qquad (5.29)$$
$$= E[XY] - 2E[X]E[Y] + E[X]E[Y] = E[XY] - E[X]E[Y]. \qquad (5.30)$$
Note that $\mathrm{COV}(X, Y) = E[XY]$ if either of the random variables has mean zero.
Example 5.26 Covariance of Independent Random Variables
Let X and Y be independent random variables. Find their covariance.
$$\mathrm{COV}(X, Y) = E[(X - E[X])(Y - E[Y])] = E[X - E[X]]\,E[Y - E[Y]] = 0,$$

where the second equality follows from the fact that X and Y are independent, and the third equality follows from $E[X - E[X]] = E[X] - E[X] = 0$. Therefore pairs of independent random variables have covariance zero.
Let's see how the covariance measures the correlation between X and Y. The covariance measures the deviations from $m_X = E[X]$ and $m_Y = E[Y]$. If a positive value of $(X - m_X)$ tends to be accompanied by a positive value of $(Y - m_Y)$, and a negative $(X - m_X)$ tends to be accompanied by a negative $(Y - m_Y)$, then $(X - m_X)(Y - m_Y)$ will tend to be positive, and its expected value, COV(X, Y), will be positive. This is the case for the scattergram in Fig. 5.3(d), where the observed points tend to cluster along a line of positive slope. On the other hand, if $(X - m_X)$ and $(Y - m_Y)$ tend to have opposite signs, then COV(X, Y) will be negative; a scattergram for this case would have observation points clustered along a line of negative slope. Finally, if $(X - m_X)$ and $(Y - m_Y)$ sometimes have the same sign and sometimes have opposite signs, then COV(X, Y) will be close to zero. The three scattergrams in Figs. 5.3(a), (b), and (c) fall into this category.
Multiplying either X or Y by a large number will increase the covariance, so we
need to normalize the covariance to measure the correlation in an absolute scale. The
correlation coefficient of X and Y is defined by

$$\rho_{X,Y} = \frac{\mathrm{COV}(X, Y)}{\sigma_X \sigma_Y} = \frac{E[XY] - E[X]E[Y]}{\sigma_X \sigma_Y}, \qquad (5.31)$$

where $\sigma_X = \sqrt{\mathrm{VAR}(X)}$ and $\sigma_Y = \sqrt{\mathrm{VAR}(Y)}$ are the standard deviations of X and Y, respectively.
The correlation coefficient is a number that is at most 1 in magnitude:

$$-1 \le \rho_{X,Y} \le 1. \qquad (5.32)$$
To show Eq. (5.32), we begin with an inequality that results from the fact that the
expected value of the square of a random variable is nonnegative:
$$0 \le E\Biggl[\biggl(\frac{X - E[X]}{\sigma_X} \pm \frac{Y - E[Y]}{\sigma_Y}\biggr)^{\!2}\Biggr] = 1 \pm 2\rho_{X,Y} + 1 = 2(1 \pm \rho_{X,Y}).$$
The last equation implies Eq. (5.32).
The extreme values of $\rho_{X,Y}$ are achieved when X and Y are related linearly, $Y = aX + b$: $\rho_{X,Y} = 1$ if $a > 0$ and $\rho_{X,Y} = -1$ if $a < 0$. In Section 6.5 we show that $\rho_{X,Y}$ can be viewed as a statistical measure of the extent to which Y can be predicted by a linear function of X.
X and Y are said to be uncorrelated if $\rho_{X,Y} = 0$. If X and Y are independent, then $\mathrm{COV}(X, Y) = 0$, so $\rho_{X,Y} = 0$. Thus if X and Y are independent, then X and Y are uncorrelated. In Example 5.22, we saw that if X and Y are jointly Gaussian and $\rho_{X,Y} = 0$, then X and Y are independent random variables. Example 5.27 shows that this is not always true for non-Gaussian random variables: it is possible for X and Y to be uncorrelated but not independent.
Example 5.27 Uncorrelated but Dependent Random Variables
Let $\Theta$ be uniformly distributed in the interval $(0, 2\pi)$. Let

$$X = \cos\Theta \qquad \text{and} \qquad Y = \sin\Theta.$$

The point (X, Y) then corresponds to the point on the unit circle specified by the angle $\Theta$, as shown in Fig. 5.18. In Example 4.36, we saw that the marginal pdf's of X and Y are arcsine pdf's, which are nonzero in the interval $(-1, 1)$. The product of the marginals is nonzero in the square defined by $-1 \le x \le 1$ and $-1 \le y \le 1$, so if X and Y were independent the point (X, Y) would assume all values in this square. This is not the case, so X and Y are dependent.
We now show that X and Y are uncorrelated:

$$E[XY] = E[\sin\Theta\cos\Theta] = \frac{1}{2\pi}\int_0^{2\pi} \sin\phi\cos\phi\,d\phi = \frac{1}{4\pi}\int_0^{2\pi} \sin 2\phi\,d\phi = 0.$$

Since $E[X] = E[Y] = 0$, Eq. (5.30) then implies that X and Y are uncorrelated.
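A short simulation (our sketch) makes both halves of the example concrete: the sample correlation of $X = \cos\Theta$ and $Y = \sin\Theta$ is near zero, while the unit-circle constraint makes an event such as $\{|X| > 0.9\} \cap \{|Y| > 0.9\}$ impossible even though each event alone has positive probability:

```python
import math, random

rng = random.Random(7)
n = 200_000
pairs = []
for _ in range(n):
    t = rng.uniform(0.0, 2.0 * math.pi)   # Theta uniform on (0, 2*pi)
    pairs.append((math.cos(t), math.sin(t)))

mean_xy = sum(x * y for x, y in pairs) / n   # sample E[XY], close to 0

# Dependence: on the unit circle |X| > 0.9 forces |Y| < sqrt(1 - 0.81) < 0.44,
# so the joint event below never occurs, although each marginal event can.
both_large = sum(1 for x, y in pairs if abs(x) > 0.9 and abs(y) > 0.9)
frac_y_large = sum(1 for _, y in pairs if abs(y) > 0.9) / n
```

If X and Y were independent, `both_large / n` would have to be close to the product of the two marginal probabilities, which is strictly positive; instead it is exactly zero.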
Example 5.28
Let X and Y be the random variables discussed in Example 5.16. Find E[XY], COV(X, Y), and $\rho_{X,Y}$.
Equations (5.30) and (5.31) require that we find the mean, variance, and correlation of X and Y. From the marginal pdf's of X and Y obtained in Example 5.16, we find that $E[X] = 3/2$ and $\mathrm{VAR}[X] = 5/4$, and that $E[Y] = 1/2$ and $\mathrm{VAR}[Y] = 1/4$. The correlation of X and Y is

$$E[XY] = \int_0^{\infty}\!\int_0^{x} xy\,2e^{-x}e^{-y}\,dy\,dx = \int_0^{\infty} 2x e^{-x}(1 - e^{-x} - x e^{-x})\,dx = 1.$$
FIGURE 5.18 $(X, Y) = (\cos\theta, \sin\theta)$ is a point selected at random on the unit circle. X and Y are uncorrelated but not independent.
Thus the correlation coefficient is given by

$$\rho_{X,Y} = \frac{1 - \frac{3}{2}\cdot\frac{1}{2}}{\sqrt{\frac{5}{4}}\sqrt{\frac{1}{4}}} = \frac{1}{\sqrt{5}}.$$
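These moments can be cross-checked by simulation, again sampling the joint pdf through the factorization Y ~ Exp(2), $X - Y$ ~ Exp(1) given Y (a decomposition consistent with the conditional pdfs of Example 5.32; the code is our sketch):

```python
import math, random

rng = random.Random(3)
n = 300_000
xs, ys = [], []
for _ in range(n):
    # Sample the pdf 2 e^{-x-y} on 0 <= y <= x: Y ~ Exp(2), X - Y ~ Exp(1).
    y = -math.log(1.0 - rng.random()) / 2.0
    x = y - math.log(1.0 - rng.random())
    xs.append(x)
    ys.append(y)

ex = sum(xs) / n                                  # theory: 3/2
ey = sum(ys) / n                                  # theory: 1/2
exy = sum(x * y for x, y in zip(xs, ys)) / n      # theory: 1
vx = sum(x * x for x in xs) / n - ex * ex         # theory: 5/4
vy = sum(y * y for y in ys) / n - ey * ey         # theory: 1/4
rho = (exy - ex * ey) / math.sqrt(vx * vy)        # theory: 1/sqrt(5)
```

All six sample quantities should land within a few hundredths of the values computed in the example.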
5.7 CONDITIONAL PROBABILITY AND CONDITIONAL EXPECTATION
Many random variables of practical interest are not independent: the output Y of a communication channel must depend on the input X in order to convey information; consecutive samples of a waveform that varies slowly are likely to be close in value and hence are not independent. In this section we are interested in computing the probability of events concerning the random variable Y given that we know $X = x$. We are also interested in the expected value of Y given $X = x$. We show that the notions of conditional probability and conditional expectation are extremely useful tools in solving problems, even in situations where we are only concerned with one of the random variables.
5.7.1 Conditional Probability
The definition of conditional probability in Section 2.4 allows us to compute the probability that Y is in A given that we know that $X = x$:

$$P[Y \text{ in } A \mid X = x] = \frac{P[Y \text{ in } A, X = x]}{P[X = x]} \qquad \text{for } P[X = x] > 0. \qquad (5.33)$$
Case 1: X Is a Discrete Random Variable
For X and Y discrete random variables, the conditional pmf of Y given X = x is defined by:

$$p_Y(y \mid x) = P[Y = y \mid X = x] = \frac{P[X = x, Y = y]}{P[X = x]} = \frac{p_{X,Y}(x, y)}{p_X(x)} \qquad (5.34)$$

for x such that $P[X = x] > 0$. We define $p_Y(y \mid x) = 0$ for x such that $P[X = x] = 0$. Note that $p_Y(y \mid x)$ is a function of y over the real line, and that $p_Y(y \mid x) > 0$ only for y in a discrete set $\{y_1, y_2, \ldots\}$.
The conditional pmf satisfies all the properties of a pmf, that is, it assigns nonnegative values to every y and these values add to 1. Note from Eq. (5.34) that $p_Y(y \mid x_k)$ is simply the cross section of $p_{X,Y}(x_k, y)$ along the $X = x_k$ column in Fig. 5.6, but normalized by the probability $p_X(x_k)$.
The probability of an event A given $X = x_k$ is found by adding the pmf values of the outcomes in A:

$$P[Y \text{ in } A \mid X = x_k] = \sum_{y_j \in A} p_Y(y_j \mid x_k). \qquad (5.35)$$
If X and Y are independent, then using Eq. (5.20)

$$p_Y(y_j \mid x_k) = \frac{P[X = x_k, Y = y_j]}{P[X = x_k]} = P[Y = y_j] = p_Y(y_j). \qquad (5.36)$$

In other words, knowledge that $X = x_k$ does not affect the probability of events A involving Y.
Equation (5.34) implies that the joint pmf $p_{X,Y}(x, y)$ can be expressed as the product of a conditional pmf and a marginal pmf:

$$p_{X,Y}(x_k, y_j) = p_Y(y_j \mid x_k)\,p_X(x_k) \qquad \text{and} \qquad p_{X,Y}(x_k, y_j) = p_X(x_k \mid y_j)\,p_Y(y_j). \qquad (5.37)$$

This expression is very useful when we can view the pair (X, Y) as being generated sequentially, e.g., first X, and then Y given $X = x$. We find the probability that Y is in A as follows:

$$P[Y \text{ in } A] = \sum_{\text{all } x_k} \sum_{y_j \in A} p_{X,Y}(x_k, y_j) = \sum_{\text{all } x_k} \sum_{y_j \in A} p_Y(y_j \mid x_k)\,p_X(x_k) = \sum_{\text{all } x_k} p_X(x_k) \sum_{y_j \in A} p_Y(y_j \mid x_k) = \sum_{\text{all } x_k} P[Y \text{ in } A \mid X = x_k]\,p_X(x_k). \qquad (5.38)$$

Equation (5.38) is simply a restatement of the theorem on total probability discussed in Chapter 2. In other words, to compute P[Y in A] we can first compute $P[Y \text{ in } A \mid X = x_k]$ and then "average" over $x_k$.
Example 5.29 Loaded Dice
Find $p_Y(y \mid 5)$ in the loaded dice experiment considered in Examples 5.6 and 5.8.
In Example 5.8 we found that $p_X(5) = 1/6$. Therefore:

$$p_Y(y \mid 5) = \frac{p_{X,Y}(5, y)}{p_X(5)},$$

and so $p_Y(5 \mid 5) = 2/7$ and $p_Y(1 \mid 5) = p_Y(2 \mid 5) = p_Y(3 \mid 5) = p_Y(4 \mid 5) = p_Y(6 \mid 5) = 1/7$.
Clearly this die is loaded.
Example 5.30 Number of Defects in a Region; Random Splitting of Poisson Counts
The total number of defects X on a chip is a Poisson random variable with mean $\alpha$. Each defect has a probability p of falling in a specific region R, and the location of each defect is independent of the locations of other defects. Find the pmf of the number of defects Y that fall in the region R.
We can imagine performing a Bernoulli trial each time a defect occurs with a “success”
occurring when the defect falls in the region R. If the total number of defects is X = k, then Y
is a binomial random variable with parameters k and p:
$$p_Y(j \mid k) = \begin{cases} \dbinom{k}{j} p^j (1 - p)^{k - j} & 0 \le j \le k \\ 0 & j > k. \end{cases}$$
From Eq. (5.38) and noting that $k \ge j$, we have

$$p_Y(j) = \sum_{k=0}^{\infty} p_Y(j \mid k)\,p_X(k) = \sum_{k=j}^{\infty} \frac{k!}{j!(k - j)!}\, p^j (1 - p)^{k - j}\, e^{-\alpha}\frac{\alpha^k}{k!}$$
$$= \frac{(\alpha p)^j e^{-\alpha}}{j!} \sum_{k=j}^{\infty} \frac{\{(1 - p)\alpha\}^{k - j}}{(k - j)!} = \frac{(\alpha p)^j e^{-\alpha}}{j!}\, e^{(1 - p)\alpha} = \frac{(\alpha p)^j}{j!}\, e^{-\alpha p}.$$

Thus Y is a Poisson random variable with mean $\alpha p$.
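The random-splitting result is easy to confirm by simulation. The sketch below (ours; it uses Knuth's product-of-uniforms method to draw Poisson variates with only the standard library) thins a Poisson($\alpha$) count with probability p and checks that the result behaves like a Poisson($\alpha p$) random variable:

```python
import math, random

def poisson(lam, rng):
    """Knuth's product-of-uniforms method for a Poisson(lam) draw."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        prod *= rng.random()
        k += 1
    return k

rng = random.Random(42)
alpha, p, n = 3.0, 0.25, 100_000
counts = []
for _ in range(n):
    x = poisson(alpha, rng)                                      # total defects
    counts.append(sum(1 for _ in range(x) if rng.random() < p))  # defects in R

mean_y = sum(counts) / n                      # Poisson(alpha*p) => ~0.75
var_y = sum(c * c for c in counts) / n - mean_y ** 2   # also ~0.75
frac_zero = counts.count(0) / n               # ~exp(-alpha*p)
```

A Poisson random variable has equal mean and variance, so the agreement of `mean_y` and `var_y` with $\alpha p$ (and of `frac_zero` with $e^{-\alpha p}$) is a meaningful check.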
Suppose Y is a continuous random variable. Eq. (5.33) can be used to define the conditional cdf of Y given $X = x_k$:

$$F_Y(y \mid x_k) = \frac{P[Y \le y, X = x_k]}{P[X = x_k]}, \qquad \text{for } P[X = x_k] > 0. \qquad (5.39)$$

It is easy to show that $F_Y(y \mid x_k)$ satisfies all the properties of a cdf. The conditional pdf of Y given $X = x_k$, if the derivative exists, is given by

$$f_Y(y \mid x_k) = \frac{d}{dy} F_Y(y \mid x_k). \qquad (5.40)$$
If X and Y are independent, $P[Y \le y, X = x_k] = P[Y \le y]\,P[X = x_k]$, so $F_Y(y \mid x) = F_Y(y)$ and $f_Y(y \mid x) = f_Y(y)$. The probability of event A given $X = x_k$ is obtained by integrating the conditional pdf:

$$P[Y \text{ in } A \mid X = x_k] = \int_{y \in A} f_Y(y \mid x_k)\,dy. \qquad (5.41)$$

We obtain P[Y in A] using Eq. (5.38).
Example 5.31 Binary Communications System
The input X to a communication channel assumes the values +1 or −1 with probabilities 1/3 and 2/3. The output Y of the channel is given by $Y = X + N$, where N is a zero-mean, unit-variance Gaussian random variable. Find the conditional pdf of Y given $X = +1$, and given $X = -1$. Find $P[X = +1 \mid Y > 0]$.
The conditional cdf of Y given $X = +1$ is:

$$F_Y(y \mid +1) = P[Y \le y \mid X = +1] = P[N + 1 \le y] = P[N \le y - 1] = \int_{-\infty}^{y-1} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\,dx,$$

where we noted that if $X = +1$, then $Y = N + 1$ and Y depends only on N. Thus, if $X = +1$, then Y is a Gaussian random variable with mean 1 and unit variance. Similarly, if $X = -1$, then Y is Gaussian with mean −1 and unit variance.
The probabilities that $Y > 0$ given $X = +1$ and $X = -1$ are:

$$P[Y > 0 \mid X = +1] = \int_0^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-(x-1)^2/2}\,dx = \int_{-1}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\,dt = 1 - Q(1) = 0.841.$$

$$P[Y > 0 \mid X = -1] = \int_0^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-(x+1)^2/2}\,dx = \int_1^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\,dt = Q(1) = 0.159.$$
Applying Eq. (5.38), we obtain:

$$P[Y > 0] = P[Y > 0 \mid X = +1]\,\frac{1}{3} + P[Y > 0 \mid X = -1]\,\frac{2}{3} = 0.386.$$

From Bayes' theorem we find:

$$P[X = +1 \mid Y > 0] = \frac{P[Y > 0 \mid X = +1]\,P[X = +1]}{P[Y > 0]} = \frac{(1 - Q(1))/3}{(1 + Q(1))/3} = 0.726.$$

We conclude that if $Y > 0$, then $X = +1$ is more likely than $X = -1$. Therefore the receiver should decide that the input is $X = +1$ when it observes $Y > 0$.
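The numbers in this example follow directly from the Gaussian tail function, which the Python standard library can evaluate through `math.erfc`. This sketch (ours) reproduces $P[Y > 0]$ and the posterior probability:

```python
import math

def Q(x):
    """Gaussian tail probability Q(x) = P[N > x] for standard normal N."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

p_plus, p_minus = 1.0 / 3.0, 2.0 / 3.0
# P[Y > 0 | X = +1] = P[N > -1] = 1 - Q(1);  P[Y > 0 | X = -1] = Q(1)
p_y_pos = (1.0 - Q(1.0)) * p_plus + Q(1.0) * p_minus
posterior_plus = (1.0 - Q(1.0)) * p_plus / p_y_pos
```

The computed values match the example's 0.386 and 0.726 to three decimal places.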
In the previous example, we made an interesting step that is worth elaborating on because it comes up quite frequently: $P[Y \le y \mid X = +1] = P[N + 1 \le y]$, where $Y = X + N$. Let's take a closer look:

$$P[Y \le z \mid X = x] = \frac{P[\{X + N \le z\} \cap \{X = x\}]}{P[X = x]} = \frac{P[\{x + N \le z\} \cap \{X = x\}]}{P[X = x]} = P[x + N \le z \mid X = x] = P[N \le z - x \mid X = x].$$

In the first line, the events $\{X + N \le z\}$ and $\{x + N \le z\}$ are quite different. The first involves the two random variables X and N, whereas the second only involves N and consequently is much simpler. We can then apply an expression such as Eq. (5.38) to obtain $P[Y \le z]$. The step we made in the example, however, is even more interesting. Since X and N are independent random variables, we can take the expression one step further:

$$P[Y \le z \mid X = x] = P[N \le z - x \mid X = x] = P[N \le z - x].$$

The independence of X and N allows us to dispense with the conditioning on x altogether!
Case 2: X Is a Continuous Random Variable
If X is a continuous random variable, then $P[X = x] = 0$, so Eq. (5.33) is undefined for all x. If X and Y have a joint pdf that is continuous and nonzero over some region of the plane, we define the conditional cdf of Y given X = x by the following limiting procedure:

$$F_Y(y \mid x) = \lim_{h \to 0} F_Y(y \mid x < X \le x + h). \qquad (5.42)$$

The conditional cdf on the right side of Eq. (5.42) is:

$$F_Y(y \mid x < X \le x + h) = \frac{P[Y \le y, x < X \le x + h]}{P[x < X \le x + h]} = \frac{\displaystyle\int_{-\infty}^{y}\!\int_x^{x+h} f_{X,Y}(x', y')\,dx'\,dy'}{\displaystyle\int_x^{x+h} f_X(x')\,dx'} = \frac{\displaystyle\int_{-\infty}^{y} f_{X,Y}(x, y')\,dy'\; h}{f_X(x)\,h}. \qquad (5.43)$$
As we let h approach zero, Eqs. (5.42) and (5.43) imply that

$$F_Y(y \mid x) = \frac{\int_{-\infty}^{y} f_{X,Y}(x, y')\,dy'}{f_X(x)}. \qquad (5.44)$$

The conditional pdf of Y given X = x is then:

$$f_Y(y \mid x) = \frac{d}{dy} F_Y(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)}. \qquad (5.45)$$
FIGURE 5.19 Interpretation of the conditional pdf: $f_Y(y \mid x)\,dy$ is the ratio of the probability $f_{X,Y}(x, y)\,dx\,dy$ of an infinitesimal rectangle at (x, y) to the probability $f_X(x)\,dx$ of the infinitesimal strip at x.
It is easy to show that $f_Y(y \mid x)$ satisfies the properties of a pdf. We can interpret $f_Y(y \mid x)\,dy$ as the probability that Y is in the infinitesimal strip defined by $(y, y + dy)$ given that X is in the infinitesimal strip defined by $(x, x + dx)$, as shown in Fig. 5.19.
The probability of event A given X = x is obtained as follows:

$$P[Y \text{ in } A \mid X = x] = \int_{y \in A} f_Y(y \mid x)\,dy. \qquad (5.46)$$
There is a strong resemblance between Eq. (5.34) for the discrete case and Eq. (5.45) for the continuous case. Indeed many of the same properties hold. For example, we obtain the multiplication rule from Eq. (5.45):

$$f_{X,Y}(x, y) = f_Y(y \mid x)\,f_X(x) \qquad \text{and} \qquad f_{X,Y}(x, y) = f_X(x \mid y)\,f_Y(y). \qquad (5.47)$$

If X and Y are independent, then $f_{X,Y}(x, y) = f_X(x)f_Y(y)$ and $f_Y(y \mid x) = f_Y(y)$, $f_X(x \mid y) = f_X(x)$, $F_Y(y \mid x) = F_Y(y)$, and $F_X(x \mid y) = F_X(x)$.
By combining Eqs. (5.46) and (5.47), we can show that:

$$P[Y \text{ in } A] = \int_{-\infty}^{\infty} P[Y \text{ in } A \mid X = x]\,f_X(x)\,dx. \qquad (5.48)$$

You can think of Eq. (5.48) as the "continuous" version of the theorem on total probability. The following examples show the usefulness of the above results in calculating the probabilities of complicated events.
Example 5.32
Let X and Y be the random variables in Example 5.16. Find $f_X(x \mid y)$ and $f_Y(y \mid x)$.
Using the marginal pdf's obtained in Example 5.16, we have

$$f_X(x \mid y) = \frac{2e^{-x}e^{-y}}{2e^{-2y}} = e^{-(x - y)} \qquad \text{for } x \ge y,$$

$$f_Y(y \mid x) = \frac{2e^{-x}e^{-y}}{2e^{-x}(1 - e^{-x})} = \frac{e^{-y}}{1 - e^{-x}} \qquad \text{for } 0 < y < x.$$
The conditional pdf of X is an exponential pdf shifted by y to the right. The conditional pdf of Y
is an exponential pdf that has been truncated to the interval [0, x].
Example 5.33 Number of Arrivals During a Customer’s Service Time
The number N of customers that arrive at a service station during a time t is a Poisson random variable with parameter $\beta t$. The time T required to service each customer is an exponential random variable with parameter $\alpha$. Find the pmf for the number N that arrive during the service time T of a specific customer. Assume that the customer arrivals are independent of the customer service time.
Equation (5.48) holds even if Y is a discrete random variable; thus

$$P[N = k] = \int_0^{\infty} P[N = k \mid T = t]\,f_T(t)\,dt = \int_0^{\infty} \frac{(\beta t)^k}{k!}\, e^{-\beta t}\,\alpha e^{-\alpha t}\,dt = \frac{\alpha \beta^k}{k!} \int_0^{\infty} t^k e^{-(\alpha + \beta)t}\,dt.$$

Let $r = (\alpha + \beta)t$; then

$$P[N = k] = \frac{\alpha \beta^k}{k!\,(\alpha + \beta)^{k+1}} \int_0^{\infty} r^k e^{-r}\,dr = \frac{\alpha \beta^k}{(\alpha + \beta)^{k+1}} = \biggl(\frac{\alpha}{\alpha + \beta}\biggr)\biggl(\frac{\beta}{\alpha + \beta}\biggr)^{\!k},$$

where we have used the fact that the last integral is a gamma function and is equal to k!. Thus N is a geometric random variable with probability of "success" $\alpha/(\alpha + \beta)$. Each time a customer arrives we can imagine that a new Bernoulli trial begins where "success" occurs if the customer's service time is completed before the next arrival.
Example 5.34
X is selected at random from the unit interval; Y is then selected at random from the interval (0, X). Find the cdf of Y.
When $X = x$, Y is uniformly distributed in (0, x), so the conditional cdf given $X = x$ is

$$F_Y(y \mid x) = P[Y \le y \mid X = x] = \begin{cases} y/x & 0 \le y \le x \\ 1 & x < y. \end{cases}$$

Equation (5.48) and the above conditional cdf yield:

$$F_Y(y) = P[Y \le y] = \int_0^1 P[Y \le y \mid X = x]\,f_X(x)\,dx = \int_0^y 1\,dx' + \int_y^1 \frac{y}{x'}\,dx' = y - y \ln y.$$

The corresponding pdf is obtained by taking the derivative of the cdf:

$$f_Y(y) = -\ln y, \qquad 0 < y \le 1.$$
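The two-stage sampling description translates directly into a simulation, which can be used to check the cdf $y - y\ln y$ (our sketch):

```python
import bisect, math, random

rng = random.Random(5)
n = 200_000
ys = []
for _ in range(n):
    x = rng.random()                  # X uniform on (0, 1)
    ys.append(rng.uniform(0.0, x))    # Y | X = x uniform on (0, x)
ys.sort()

def cdf_empirical(y):
    """Fraction of simulated Y values at or below y."""
    return bisect.bisect_right(ys, y) / n

def cdf_exact(y):
    return y - y * math.log(y)
```

The empirical cdf should track $y - y\ln y$ to about two decimal places everywhere on (0, 1).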
Example 5.35 Maximum A Posteriori Receiver
For the communications system in Example 5.31, find the probability that the input was X = +1
given that the output of the channel is Y = y.
This is a tricky version of Bayes' rule. Condition on the event $\{y < Y \le y + \Delta\}$ instead of $\{Y = y\}$:

$$P[X = +1 \mid y < Y \le y + \Delta] = \frac{P[y < Y \le y + \Delta \mid X = +1]\,P[X = +1]}{P[y < Y \le y + \Delta]} = \frac{f_Y(y \mid +1)\,\Delta\,(1/3)}{f_Y(y \mid +1)\,\Delta\,(1/3) + f_Y(y \mid -1)\,\Delta\,(2/3)}$$

$$= \frac{\frac{1}{\sqrt{2\pi}} e^{-(y-1)^2/2}(1/3)}{\frac{1}{\sqrt{2\pi}} e^{-(y-1)^2/2}(1/3) + \frac{1}{\sqrt{2\pi}} e^{-(y+1)^2/2}(2/3)} = \frac{e^{-(y-1)^2/2}}{e^{-(y-1)^2/2} + 2e^{-(y+1)^2/2}} = \frac{1}{1 + 2e^{-2y}}.$$

The above expression is equal to 1/2 when $y_T = \frac{1}{2}\ln 2 = 0.3466$. For $y > y_T$, $X = +1$ is more likely, and for $y < y_T$, $X = -1$ is more likely. A receiver that selects the input X that is more likely given $Y = y$ is called a maximum a posteriori receiver.
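The posterior and the decision threshold can be checked numerically (our sketch; it also confirms that the simplified form $1/(1 + 2e^{-2y})$ agrees with the unsimplified Bayes ratio):

```python
import math

def posterior_plus(y):
    """Simplified posterior P[X = +1 | Y = y] from Example 5.35."""
    return 1.0 / (1.0 + 2.0 * math.exp(-2.0 * y))

def posterior_direct(y):
    """Unsimplified Bayes ratio with the two Gaussian likelihoods and
    priors 1/3 and 2/3; must agree with posterior_plus."""
    like_p = math.exp(-(y - 1.0) ** 2 / 2.0) / 3.0
    like_m = 2.0 * math.exp(-(y + 1.0) ** 2 / 2.0) / 3.0
    return like_p / (like_p + like_m)

y_T = 0.5 * math.log(2.0)   # threshold where the posterior equals 1/2
```

Note that the common factor $1/\sqrt{2\pi}$ cancels out of the ratio, so it can be omitted.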
5.7.2 Conditional Expectation
The conditional expectation of Y given X = x is defined by

$$E[Y \mid x] = \int_{-\infty}^{\infty} y\,f_Y(y \mid x)\,dy. \qquad (5.49a)$$

In the special case where X and Y are both discrete random variables we have:

$$E[Y \mid x_k] = \sum_{y_j} y_j\, p_Y(y_j \mid x_k). \qquad (5.49b)$$
Clearly, $E[Y \mid x]$ is simply the center of mass associated with the conditional pdf or pmf.
The conditional expectation $E[Y \mid x]$ can be viewed as defining a function of x: $g(x) = E[Y \mid x]$. It therefore makes sense to talk about the random variable $g(X) = E[Y \mid X]$. We can imagine that a random experiment is performed and a value for X is obtained, say $X = x_0$, and then the value $g(x_0) = E[Y \mid x_0]$ is produced. We are interested in $E[g(X)] = E[E[Y \mid X]]$. In particular, we now show that

$$E[Y] = E[E[Y \mid X]], \qquad (5.50)$$

where the right-hand side is

$$E[E[Y \mid X]] = \int_{-\infty}^{\infty} E[Y \mid x]\,f_X(x)\,dx \qquad X \text{ continuous} \qquad (5.51a)$$

$$E[E[Y \mid X]] = \sum_{x_k} E[Y \mid x_k]\,p_X(x_k) \qquad X \text{ discrete.} \qquad (5.51b)$$
We prove Eq. (5.50) for the case where X and Y are jointly continuous random variables; then

$$E[E[Y \mid X]] = \int_{-\infty}^{\infty} E[Y \mid x]\,f_X(x)\,dx = \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} y\,f_Y(y \mid x)\,dy\,f_X(x)\,dx = \int_{-\infty}^{\infty} y \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx\,dy = \int_{-\infty}^{\infty} y\,f_Y(y)\,dy = E[Y].$$

The above result also holds for the expected value of a function of Y:

$$E[h(Y)] = E[E[h(Y) \mid X]].$$

In particular, the kth moment of Y is given by

$$E[Y^k] = E[E[Y^k \mid X]].$$
Example 5.36 Average Number of Defects in a Region
Find the mean of Y in Example 5.30 using conditional expectation.
$$E[Y] = \sum_{k=0}^{\infty} E[Y \mid X = k]\,P[X = k] = \sum_{k=0}^{\infty} kp\,P[X = k] = pE[X] = p\alpha.$$

The second equality uses the fact that $E[Y \mid X = k] = kp$ since Y is binomial with parameters k and p. Note that the second-to-last equality holds for any pmf of X. The fact that X is Poisson with mean $\alpha$ is not used until the last equality.
Example 5.37 Binary Communications Channel
Find the mean of the output Y in the communications channel in Example 5.31.
Since Y is a Gaussian random variable with mean +1 when $X = +1$, and −1 when $X = -1$, the conditional expected values of Y given X are:

$$E[Y \mid +1] = 1 \qquad \text{and} \qquad E[Y \mid -1] = -1.$$

Equation (5.51b) implies

$$E[Y] = \sum_{k} E[Y \mid X = k]\,P[X = k] = (+1)(1/3) + (-1)(2/3) = -1/3.$$

The mean is negative because the $X = -1$ inputs occur twice as often as the $X = +1$ inputs.
Example 5.38 Average Number of Arrivals in a Service Time
Find the mean and variance of the number of customer arrivals N during the service time T of a specific customer in Example 5.33.
N is a Poisson random variable with parameter $\beta t$ when $T = t$ is given, so the first two conditional moments are:

$$E[N \mid T = t] = \beta t, \qquad E[N^2 \mid T = t] = \beta t + (\beta t)^2.$$

The first two moments of N are obtained from Eq. (5.50):

$$E[N] = \int_0^{\infty} E[N \mid T = t]\,f_T(t)\,dt = \int_0^{\infty} \beta t\,f_T(t)\,dt = \beta E[T]$$

$$E[N^2] = \int_0^{\infty} E[N^2 \mid T = t]\,f_T(t)\,dt = \int_0^{\infty} \{\beta t + \beta^2 t^2\}\,f_T(t)\,dt = \beta E[T] + \beta^2 E[T^2].$$

The variance of N is then

$$\mathrm{VAR}[N] = E[N^2] - (E[N])^2 = \beta^2 E[T^2] + \beta E[T] - \beta^2 (E[T])^2 = \beta^2\,\mathrm{VAR}[T] + \beta E[T].$$

Note that if T is not random (i.e., $E[T]$ = constant and $\mathrm{VAR}[T] = 0$), then the mean and variance of N are those of a Poisson random variable with parameter $\beta E[T]$. When T is random, the mean of N remains the same but the variance of N increases by the term $\beta^2\,\mathrm{VAR}[T]$; that is, the variability of T causes greater variability in N. Up to this point, we have intentionally avoided using the fact that T has an exponential distribution to emphasize that the above results hold
Section 5.8
Functions of Two Random Variables
271
for any service time distribution fT1t2. If T is exponential with parameter a, then E3T4 = 1/a and
VAR3T4 = 1/a2, so
E3N4 =
5.8
b
a
VAR3N4 =
and
b2
a2
+
b
.
a
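These moment formulas can be confirmed by simulation. The Python sketch below (the values β = 2 and α = 1 are illustrative assumptions) draws an exponential service time, counts the Poisson arrivals that fall within it, and compares the sample mean and variance with β/α and β²/α² + β/α:

```python
import random

# Monte Carlo check of Example 5.38: E[N] = beta/alpha and
# VAR[N] = beta^2/alpha^2 + beta/alpha when T ~ Exponential(alpha).
def simulate_arrivals(beta, alpha, trials=200_000, seed=1):
    rng = random.Random(seed)
    counts = []
    for _ in range(trials):
        t = rng.expovariate(alpha)              # service time T
        # N | T = t is Poisson(beta*t): count exponential inter-arrival gaps
        n, total = 0, rng.expovariate(beta)
        while total <= t:
            n += 1
            total += rng.expovariate(beta)
        counts.append(n)
    m = sum(counts) / trials
    v = sum((c - m) ** 2 for c in counts) / trials
    return m, v

m, v = simulate_arrivals(beta=2.0, alpha=1.0)
print(m, v)   # near beta/alpha = 2 and beta^2/alpha^2 + beta/alpha = 6
```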
5.8  FUNCTIONS OF TWO RANDOM VARIABLES
Quite often we are interested in one or more functions of the random variables associated with some experiment. For example, if we make repeated measurements of the same
random quantity, we might be interested in the maximum and minimum value in the set,
as well as the sample mean and sample variance. In this section we present methods of
determining the probabilities of events involving functions of two random variables.
5.8.1
One Function of Two Random Variables
Let the random variable Z be defined as a function of two random variables:

$$Z = g(X, Y). \tag{5.52}$$

The cdf of Z is found by first finding the equivalent event of {Z ≤ z}, that is, the set R_z = {x = (x, y) such that g(x) ≤ z}; then

$$F_Z(z) = P[\mathbf{X} \text{ in } R_z] = \iint_{(x, y) \in R_z} f_{X,Y}(x', y')\, dx'\, dy'. \tag{5.53}$$

The pdf of Z is then found by taking the derivative of F_Z(z).
Example 5.39  Sum of Two Random Variables

Let Z = X + Y. Find F_Z(z) and f_Z(z) in terms of the joint pdf of X and Y.

The cdf of Z is found by integrating the joint pdf of X and Y over the region of the plane corresponding to the event {Z ≤ z}, as shown in Fig. 5.20.

[FIGURE 5.20: The region below the line y = -x + z, over which P[Z ≤ z] = P[X + Y ≤ z] is computed.]

$$F_Z(z) = \int_{-\infty}^{\infty} \int_{-\infty}^{z - x'} f_{X,Y}(x', y')\, dy'\, dx'.$$

The pdf of Z is

$$f_Z(z) = \frac{d}{dz} F_Z(z) = \int_{-\infty}^{\infty} f_{X,Y}(x', z - x')\, dx'. \tag{5.54}$$

Thus the pdf for the sum of two random variables is given by a superposition integral. If X and Y are independent random variables, then by Eq. (5.23) the pdf is given by the convolution integral of the marginal pdf's of X and Y:

$$f_Z(z) = \int_{-\infty}^{\infty} f_X(x')\, f_Y(z - x')\, dx'. \tag{5.55}$$

In Chapter 7 we show how transform methods are used to evaluate convolution integrals such as Eq. (5.55).
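A quick numerical sanity check of the convolution formula: for independent X, Y ~ Uniform(0, 1) (a choice made here purely for illustration), Eq. (5.55) yields the triangular pdf on (0, 2). The Python sketch below approximates the convolution integral with a Riemann sum:

```python
# Numerical illustration of Eq. (5.55): the pdf of Z = X + Y for independent
# X, Y is the convolution of the marginals. With X, Y ~ Uniform(0,1) the
# result is the triangular pdf on (0, 2). The grid step dx is an assumption.
def f_uniform(x):
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def f_sum(z, dx=0.001):
    # Riemann-sum approximation of int f_X(x') f_Y(z - x') dx'
    n = int(2 / dx)
    return sum(f_uniform(i * dx) * f_uniform(z - i * dx) for i in range(n)) * dx

print(round(f_sum(1.0), 2))   # peak of the triangle, near 1.0
print(round(f_sum(0.5), 2))   # near 0.5
```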
Example 5.40  Sum of Nonindependent Gaussian Random Variables

Find the pdf of the sum Z = X + Y of two zero-mean, unit-variance Gaussian random variables with correlation coefficient ρ = -1/2.

The joint pdf for this pair of random variables was given in Example 5.18. The pdf of Z is obtained by substituting the pdf for the joint Gaussian random variables into the superposition integral found in Example 5.39:

$$f_Z(z) = \int_{-\infty}^{\infty} f_{X,Y}(x', z - x')\, dx'
= \frac{1}{2\pi(1 - \rho^2)^{1/2}} \int_{-\infty}^{\infty} e^{-[x'^2 - 2\rho x'(z - x') + (z - x')^2]/2(1 - \rho^2)}\, dx'
= \frac{1}{2\pi(3/4)^{1/2}} \int_{-\infty}^{\infty} e^{-(x'^2 - x'z + z^2)/2(3/4)}\, dx'.$$

After completing the square of the argument in the exponent we obtain

$$f_Z(z) = \frac{e^{-z^2/2}}{\sqrt{2\pi}}.$$

Thus the sum of these two nonindependent Gaussian random variables is also a zero-mean, unit-variance Gaussian random variable.
Example 5.41  A System with Standby Redundancy

A system with standby redundancy has a single key component in operation and a duplicate of that component in standby mode. When the first component fails, the second component is put into operation. Find the pdf of the lifetime of the standby system if the components have independent exponentially distributed lifetimes with the same mean.

Let T1 and T2 be the lifetimes of the two components; then the system lifetime is T = T1 + T2, and the pdf of T is given by Eq. (5.55). The terms in the integrand are

$$f_{T_1}(x) = \begin{cases} \lambda e^{-\lambda x} & x \ge 0 \\ 0 & x < 0 \end{cases} \qquad
f_{T_2}(z - x) = \begin{cases} \lambda e^{-\lambda(z - x)} & z - x \ge 0 \\ 0 & x > z. \end{cases}$$

Note that the first equation sets the lower limit of integration to 0 and the second equation sets the upper limit to z. Equation (5.55) becomes

$$f_T(z) = \int_0^z \lambda e^{-\lambda x}\, \lambda e^{-\lambda(z - x)}\, dx = \lambda^2 e^{-\lambda z} \int_0^z dx = \lambda^2 z\, e^{-\lambda z}.$$

Thus T is an Erlang random variable with parameter m = 2.
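The Erlang result can be checked by simulation. The Python sketch below (λ = 0.5 is an illustrative assumption) sums two independent exponential lifetimes and compares the sample mean and variance with the Erlang (m = 2) values 2/λ and 2/λ²:

```python
import random

# Monte Carlo check of Example 5.41: the sum of two independent
# Exponential(lam) lifetimes is Erlang with m = 2, i.e. mean 2/lam
# and variance 2/lam^2.
def standby_lifetimes(lam, trials=100_000, seed=7):
    rng = random.Random(seed)
    return [rng.expovariate(lam) + rng.expovariate(lam) for _ in range(trials)]

lam = 0.5
ts = standby_lifetimes(lam)
mean = sum(ts) / len(ts)
var = sum((t - mean) ** 2 for t in ts) / len(ts)
print(mean, var)   # near 2/lam = 4 and 2/lam^2 = 8
```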
The conditional pdf can be used to find the pdf of a function of several random variables. Let Z = g(X, Y), and suppose we are given that Y = y; then Z = g(X, y) is a function of one random variable. Therefore we can use the methods developed in Section 4.5 for single random variables to find the pdf of Z given Y = y: f_Z(z | Y = y). The pdf of Z is then found from

$$f_Z(z) = \int_{-\infty}^{\infty} f_Z(z \mid y')\, f_Y(y')\, dy'.$$

Example 5.42

Let Z = X/Y. Find the pdf of Z if X and Y are independent and both exponentially distributed with mean one.

Assume Y = y; then Z = X/y is simply a scaled version of X. Therefore, from Example 4.31,

$$f_Z(z \mid y) = |y|\, f_X(yz \mid y).$$

The pdf of Z is therefore

$$f_Z(z) = \int_{-\infty}^{\infty} |y'|\, f_X(y'z \mid y')\, f_Y(y')\, dy' = \int_{-\infty}^{\infty} |y'|\, f_{X,Y}(y'z, y')\, dy'.$$

We now use the fact that X and Y are independent and exponentially distributed with mean one:

$$f_Z(z) = \int_0^{\infty} y'\, f_X(y'z)\, f_Y(y')\, dy' = \int_0^{\infty} y'\, e^{-y'z} e^{-y'}\, dy' = \frac{1}{(1 + z)^2}, \qquad z > 0.$$
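The result implies the cdf F_Z(z) = z/(1 + z), which is easy to verify by Monte Carlo. A Python sketch (the sample size and test points are arbitrary choices):

```python
import random

# Monte Carlo check of Example 5.42: for independent Exponential(1) X and Y,
# Z = X/Y has pdf 1/(1+z)^2 and hence cdf F_Z(z) = z/(1+z).
def empirical_cdf(z, n=200_000, seed=42):
    rng = random.Random(seed)
    count = sum(rng.expovariate(1.0) / rng.expovariate(1.0) <= z
                for _ in range(n))
    return count / n

for z in (0.5, 1.0, 3.0):
    print(z, empirical_cdf(z), z / (1 + z))  # empirical vs. exact cdf
```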
5.8.2  Transformations of Two Random Variables
Let X and Y be random variables associated with some experiment, and let the random variables Z1 and Z2 be defined by two functions of X = (X, Y):

$$Z_1 = g_1(\mathbf{X}) \quad \text{and} \quad Z_2 = g_2(\mathbf{X}).$$

We now consider the problem of finding the joint cdf and pdf of Z1 and Z2.

The joint cdf of Z1 and Z2 at the point z = (z1, z2) is equal to the probability of the region of x where g_k(x) ≤ z_k for k = 1, 2:

$$F_{Z_1, Z_2}(z_1, z_2) = P[g_1(\mathbf{X}) \le z_1,\, g_2(\mathbf{X}) \le z_2]. \tag{5.56a}$$

If X, Y have a joint pdf, then

$$F_{Z_1, Z_2}(z_1, z_2) = \iint_{\mathbf{x}':\, g_k(\mathbf{x}') \le z_k} f_{X,Y}(x', y')\, dx'\, dy'. \tag{5.56b}$$
Example 5.43

Let the random variables W and Z be defined by

$$W = \min(X, Y) \quad \text{and} \quad Z = \max(X, Y).$$

Find the joint cdf of W and Z in terms of the joint cdf of X and Y.

Equation (5.56a) implies that

$$F_{W,Z}(w, z) = P[\{\min(X, Y) \le w\} \cap \{\max(X, Y) \le z\}].$$

The region corresponding to this event is shown in Fig. 5.21.

[FIGURE 5.21: The events {min(X, Y) ≤ w} = {X ≤ w} ∪ {Y ≤ w} and {max(X, Y) ≤ z} = {X ≤ z} ∩ {Y ≤ z}, with the square region A between the points (w, w) and (z, z).]

From the figure it is clear that if z > w, the above probability is the probability of the semi-infinite rectangle defined by the point (z, z) minus the square region denoted by A. Thus if z > w,

$$F_{W,Z}(w, z) = F_{X,Y}(z, z) - P[A]
= F_{X,Y}(z, z) - \{F_{X,Y}(z, z) - F_{X,Y}(w, z) - F_{X,Y}(z, w) + F_{X,Y}(w, w)\}
= F_{X,Y}(w, z) + F_{X,Y}(z, w) - F_{X,Y}(w, w).$$

If z < w, then

$$F_{W,Z}(w, z) = F_{X,Y}(z, z).$$
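For a concrete check, take X and Y independent Uniform(0, 1) (an assumption for illustration), so F_{X,Y}(x, y) = xy on the unit square. The formula above then predicts P[W ≤ w, Z ≤ z] = 2wz - w² for z > w, which the Python sketch below compares against a direct Monte Carlo estimate:

```python
import random

# Check of Example 5.43: F_{W,Z}(w,z) = F(w,z) + F(z,w) - F(w,w) for z > w,
# using X, Y independent Uniform(0,1), for which F_{X,Y}(x, y) = x*y.
rng = random.Random(3)
n = 100_000
w, z = 0.4, 0.7
hits = 0
for _ in range(n):
    x, y = rng.random(), rng.random()
    if min(x, y) <= w and max(x, y) <= z:
        hits += 1
empirical = hits / n
formula = w * z + z * w - w * w    # = 2*w*z - w^2
print(empirical, formula)
```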
Example 5.44  Radius and Angle of Independent Gaussian Random Variables

Let X and Y be zero-mean, unit-variance independent Gaussian random variables. Find the joint cdf and pdf of R and Θ, the radius and angle of the point (X, Y):

$$R = (X^2 + Y^2)^{1/2} \qquad \Theta = \tan^{-1}(Y/X).$$

The joint cdf of R and Θ is:

$$F_{R,\Theta}(r_0, \theta_0) = P[R \le r_0,\, \Theta \le \theta_0] = \iint_{(x, y) \in R(r_0, \theta_0)} \frac{e^{-(x^2 + y^2)/2}}{2\pi}\, dx\, dy,$$

where

$$R(r_0, \theta_0) = \{(x, y):\ \sqrt{x^2 + y^2} \le r_0,\ 0 < \tan^{-1}(y/x) \le \theta_0\}.$$

The region R(r_0, θ_0) is the pie-shaped region in Fig. 5.22. We change variables from Cartesian to polar coordinates to obtain:

$$F_{R,\Theta}(r_0, \theta_0) = \int_0^{\theta_0} \int_0^{r_0} \frac{e^{-r^2/2}}{2\pi}\, r\, dr\, d\theta
= \frac{\theta_0}{2\pi}\left(1 - e^{-r_0^2/2}\right), \qquad 0 < \theta_0 < 2\pi,\ 0 < r_0 < \infty. \tag{5.57}$$

[FIGURE 5.22: Region of integration R(r_0, θ_0) in Example 5.44: the pie-shaped sector of radius r_0 and angle θ_0.]

R and Θ are independent random variables, where R has a Rayleigh distribution and Θ is uniformly distributed in (0, 2π). The joint pdf is obtained by taking partial derivatives with respect to r and θ:

$$f_{R,\Theta}(r, \theta) = \frac{\partial^2}{\partial r\, \partial\theta} \frac{\theta}{2\pi}\left(1 - e^{-r^2/2}\right)
= \frac{1}{2\pi}\left(r e^{-r^2/2}\right), \qquad 0 < \theta < 2\pi,\ 0 < r < \infty.$$

This transformation maps every point in the plane from Cartesian coordinates to polar coordinates. We can also go backwards from polar to Cartesian coordinates. First we generate independent Rayleigh R and uniform Θ random variables. We then transform R and Θ into Cartesian coordinates to obtain an independent pair of zero-mean, unit-variance Gaussians. Neat!
5.8.3
pdf of Linear Transformations
The joint pdf of Z can be found directly in terms of the joint pdf of X by finding the equivalent events of infinitesimal rectangles. We consider the linear transformation of two random variables:

$$V = aX + bY \qquad W = cX + eY$$

or

$$\begin{bmatrix} V \\ W \end{bmatrix} = \begin{bmatrix} a & b \\ c & e \end{bmatrix} \begin{bmatrix} X \\ Y \end{bmatrix}.$$

Denote the above matrix by A. We will assume that A has an inverse, that is, it has determinant |ae - bc| ≠ 0, so each point (v, w) has a unique corresponding point (x, y) obtained from

$$\begin{bmatrix} x \\ y \end{bmatrix} = A^{-1} \begin{bmatrix} v \\ w \end{bmatrix}. \tag{5.58}$$

Consider the infinitesimal rectangle shown in Fig. 5.23. The points in this rectangle are mapped into the parallelogram shown in the figure. The infinitesimal rectangle and the parallelogram are equivalent events, so their probabilities must be equal. Thus

$$f_{X,Y}(x, y)\, dx\, dy \approx f_{V,W}(v, w)\, dP,$$

where dP is the area of the parallelogram. The joint pdf of V and W is thus given by

$$f_{V,W}(v, w) = \frac{f_{X,Y}(x, y)}{\left|\dfrac{dP}{dx\, dy}\right|}, \tag{5.59}$$

where x and y are related to (v, w) by Eq. (5.58). Equation (5.59) states that the joint pdf of V and W at (v, w) is the pdf of X and Y at the corresponding point (x, y), but rescaled by the "stretch factor" dP/(dx dy). It can be shown that dP = (|ae - bc|) dx dy, so the "stretch factor" is

$$\left|\frac{dP}{dx\, dy}\right| = \frac{|ae - bc|\,(dx\, dy)}{(dx\, dy)} = |ae - bc| = |A|,$$

where |A| is the determinant of A.

[FIGURE 5.23: Image of an infinitesimal rectangle under the linear transformation v = ax + by, w = cx + ey. The rectangle with corners (x, y), (x + dx, y), (x, y + dy), (x + dx, y + dy) maps to the parallelogram with corners (v, w), (v + a dx, w + c dx), (v + b dy, w + e dy), (v + a dx + b dy, w + c dx + e dy).]

The above result can be written compactly using matrix notation. Let the vector Z be

$$\mathbf{Z} = A\mathbf{X},$$

where A is an n × n invertible matrix. The joint pdf of Z is then

$$f_{\mathbf{Z}}(\mathbf{z}) = \frac{f_{\mathbf{X}}(A^{-1}\mathbf{z})}{|A|}. \tag{5.60}$$
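Equation (5.60) can be illustrated with a sampling experiment. The Python sketch below takes (X, Y) uniform on the unit square (so the joint pdf is 1) and a matrix A with det A = 2 (both assumptions chosen for illustration); the density of the transformed pair, estimated by counting samples in a small box around an interior image point, should be near 1/|A| = 0.5:

```python
import random

# Sketch verifying Eq. (5.60): f_Z(z) = f_X(A^{-1} z) / |A|. With (X, Y)
# uniform on the unit square and A = [[2, 0], [1, 1]] (det A = 2), the image
# density at any interior image point should be 1/2.
rng = random.Random(5)
n = 400_000
a, b, c, e = 2.0, 0.0, 1.0, 1.0
z0, dz = (1.0, 0.8), 0.2          # small box around an interior image point
hits = 0
for _ in range(n):
    x, y = rng.random(), rng.random()
    v, w = a * x + b * y, c * x + e * y
    if abs(v - z0[0]) < dz / 2 and abs(w - z0[1]) < dz / 2:
        hits += 1
density = hits / n / dz ** 2
print(density)   # near 1/|det A| = 0.5
```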
Example 5.45  Linear Transformation of Jointly Gaussian Random Variables

Let X and Y be the jointly Gaussian random variables introduced in Example 5.18. Let V and W be obtained from (X, Y) by

$$\begin{bmatrix} V \\ W \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} X \\ Y \end{bmatrix} = A \begin{bmatrix} X \\ Y \end{bmatrix}.$$

Find the joint pdf of V and W.

The determinant of the matrix is |A| = 1, and the inverse mapping is given by

$$\begin{bmatrix} X \\ Y \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} V \\ W \end{bmatrix},$$

so X = (V - W)/√2 and Y = (V + W)/√2. Therefore the pdf of V and W is

$$f_{V,W}(v, w) = f_{X,Y}\!\left(\frac{v - w}{\sqrt{2}}, \frac{v + w}{\sqrt{2}}\right),$$

where

$$f_{X,Y}(x, y) = \frac{1}{2\pi\sqrt{1 - \rho^2}}\, e^{-(x^2 - 2\rho xy + y^2)/2(1 - \rho^2)}.$$

By substituting for x and y, the argument of the exponent becomes

$$\frac{(v - w)^2/2 - 2\rho(v - w)(v + w)/2 + (v + w)^2/2}{2(1 - \rho^2)} = \frac{v^2}{2(1 + \rho)} + \frac{w^2}{2(1 - \rho)}.$$

Thus

$$f_{V,W}(v, w) = \frac{1}{2\pi(1 - \rho^2)^{1/2}}\, e^{-\{v^2/[2(1 + \rho)] + w^2/[2(1 - \rho)]\}}.$$

It can be seen that the transformed variables V and W are independent, zero-mean Gaussian random variables with variance 1 + ρ and 1 - ρ, respectively. Figure 5.24 shows contours of equal value of the joint pdf of (X, Y). It can be seen that the pdf has elliptical symmetry about the origin with principal axes at 45° with respect to the axes of the plane. In Section 5.9 we show that the above linear transformation corresponds to a rotation of the coordinate system so that the axes of the plane are aligned with the axes of the ellipse.
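A simulation sketch of this example in Python (ρ = -1/2 is an illustrative choice): generate correlated unit-variance Gaussians, apply the rotation, and check that the component variances come out near 1 + ρ and 1 - ρ:

```python
import math, random

# Check of Example 5.45: V = (X+Y)/sqrt(2), W = (-X+Y)/sqrt(2) applied to
# zero-mean, unit-variance Gaussians with correlation rho yields components
# with variances 1 + rho and 1 - rho.
def rotated_variances(rho, n=200_000, seed=9):
    rng = random.Random(seed)
    vs, ws = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        # build Y with correlation rho to X
        y = rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        vs.append((x + y) / math.sqrt(2))
        ws.append((-x + y) / math.sqrt(2))
    var = lambda s: sum(t * t for t in s) / n   # zero-mean sample variance
    return var(vs), var(ws)

print(rotated_variances(-0.5))   # near (1 + rho, 1 - rho) = (0.5, 1.5)
```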
5.9
PAIRS OF JOINTLY GAUSSIAN RANDOM VARIABLES
The jointly Gaussian random variables appear in numerous applications in electrical
engineering. They are frequently used to model signals in signal processing applications,
and they are the most important model used in communication systems that involve
dealing with signals in the presence of noise. They also play a central role in many statistical methods.
The random variables X and Y are said to be jointly Gaussian if their joint pdf has the form

$$f_{X,Y}(x, y) = \frac{\exp\left\{\dfrac{-1}{2(1 - \rho_{X,Y}^2)}\left[\left(\dfrac{x - m_1}{\sigma_1}\right)^2 - 2\rho_{X,Y}\left(\dfrac{x - m_1}{\sigma_1}\right)\left(\dfrac{y - m_2}{\sigma_2}\right) + \left(\dfrac{y - m_2}{\sigma_2}\right)^2\right]\right\}}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho_{X,Y}^2}} \tag{5.61a}$$

for -∞ < x < ∞ and -∞ < y < ∞.

The pdf is centered at the point (m_1, m_2), and it has a bell shape that depends on the values of σ_1, σ_2, and ρ_{X,Y} as shown in Fig. 5.25. As shown in the figure, the pdf is constant for values x and y for which the argument of the exponent is constant:

$$\left(\frac{x - m_1}{\sigma_1}\right)^2 - 2\rho_{X,Y}\left(\frac{x - m_1}{\sigma_1}\right)\left(\frac{y - m_2}{\sigma_2}\right) + \left(\frac{y - m_2}{\sigma_2}\right)^2 = \text{constant}. \tag{5.61b}$$
[FIGURE 5.24: Contours of equal value of the joint Gaussian pdf discussed in Example 5.45, with the (v, w) axes at 45° to the (x, y) axes.]

[FIGURE 5.25: Jointly Gaussian pdf: (a) ρ = 0; (b) ρ = -0.9.]
Figure 5.26 shows the orientation of these elliptical contours for various values of σ_1, σ_2, and ρ_{X,Y}. When ρ_{X,Y} = 0, that is, when X and Y are independent, the equal-pdf contour is an ellipse with principal axes aligned with the x- and y-axes. When ρ_{X,Y} ≠ 0, the major axis of the ellipse is oriented along the angle [Edwards and Penney, pp. 570–571]

$$\theta = \frac{1}{2}\arctan\left(\frac{2\rho_{X,Y}\sigma_1\sigma_2}{\sigma_1^2 - \sigma_2^2}\right). \tag{5.62}$$

Note that the angle is 45° when the variances are equal.
[FIGURE 5.26: Orientation of contours of equal value of the joint Gaussian pdf for ρ_{X,Y} > 0, centered at (m_1, m_2): (a) 0 < θ < π/4; (b) θ = π/4 when σ_1 = σ_2; (c) π/4 < θ < π/2.]
The marginal pdf of X is found by integrating f_{X,Y}(x, y) over all y. The integration is carried out by completing the square in the exponent as was done in Example 5.18. The result is that the marginal pdf of X is

$$f_X(x) = \frac{e^{-(x - m_1)^2/2\sigma_1^2}}{\sqrt{2\pi}\,\sigma_1}, \tag{5.63}$$

that is, X is a Gaussian random variable with mean m_1 and variance σ_1². Similarly, the marginal pdf of Y is found to be Gaussian with mean m_2 and variance σ_2².
The conditional pdf's f_X(x | y) and f_Y(y | x) give us information about the interrelation between X and Y. The conditional pdf of X given Y = y is

$$f_X(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}
= \frac{\exp\left\{\dfrac{-1}{2(1 - \rho_{X,Y}^2)\sigma_1^2}\left[x - \rho_{X,Y}\dfrac{\sigma_1}{\sigma_2}(y - m_2) - m_1\right]^2\right\}}{\sqrt{2\pi}\,\sigma_1\sqrt{1 - \rho_{X,Y}^2}}. \tag{5.64}$$

Equation (5.64) shows that the conditional pdf of X given Y = y is also Gaussian, but with conditional mean m_1 + ρ_{X,Y}(σ_1/σ_2)(y - m_2) and conditional variance σ_1²(1 - ρ_{X,Y}²). Note that when ρ_{X,Y} = 0, the conditional pdf of X given Y = y equals the marginal pdf of X. This is consistent with the fact that X and Y are independent when ρ_{X,Y} = 0. On the other hand, as |ρ_{X,Y}| → 1 the variance of X about the conditional mean approaches zero, so the conditional pdf approaches a delta function at the conditional mean. Thus when |ρ_{X,Y}| = 1, the conditional variance is zero and X is equal to the conditional mean with probability one. We note that, similarly, f_Y(y | x) is Gaussian with conditional mean m_2 + ρ_{X,Y}(σ_2/σ_1)(x - m_1) and conditional variance σ_2²(1 - ρ_{X,Y}²).
We now show that the ρ_{X,Y} in Eq. (5.61a) is indeed the correlation coefficient between X and Y. The covariance between X and Y is defined by

$$\mathrm{COV}(X, Y) = E[(X - m_1)(Y - m_2)] = E[E[(X - m_1)(Y - m_2) \mid Y]].$$

Now the conditional expectation of (X - m_1)(Y - m_2) given Y = y is

$$E[(X - m_1)(Y - m_2) \mid Y = y] = (y - m_2)\, E[X - m_1 \mid Y = y]
= (y - m_2)(E[X \mid Y = y] - m_1)
= (y - m_2)\left(\rho_{X,Y}\frac{\sigma_1}{\sigma_2}(y - m_2)\right),$$

where we have used the fact that the conditional mean of X given Y = y is m_1 + ρ_{X,Y}(σ_1/σ_2)(y - m_2). Therefore

$$E[(X - m_1)(Y - m_2) \mid Y] = \rho_{X,Y}\frac{\sigma_1}{\sigma_2}(Y - m_2)^2$$

and

$$\mathrm{COV}(X, Y) = E[E[(X - m_1)(Y - m_2) \mid Y]] = \rho_{X,Y}\frac{\sigma_1}{\sigma_2}E[(Y - m_2)^2] = \rho_{X,Y}\sigma_1\sigma_2.$$

The above equation is consistent with the definition of the correlation coefficient, ρ_{X,Y} = COV(X, Y)/σ_1σ_2. Thus the ρ_{X,Y} in Eq. (5.61a) is indeed the correlation coefficient between X and Y.
Example 5.46

The amount of yearly rainfall in city 1 and in city 2 is modeled by a pair of jointly Gaussian random variables, X and Y, with pdf given by Eq. (5.61a). Find the most likely value of X given that we know Y = y.

The most likely value of X given Y = y is the value of x for which f_X(x | y) is maximum. The conditional pdf of X given Y = y is given by Eq. (5.64), which is maximum at the conditional mean

$$E[X \mid y] = m_1 + \rho_{X,Y}\frac{\sigma_1}{\sigma_2}(y - m_2).$$

Note that this "maximum likelihood" estimate is a linear function of the observation y.
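The linearity of this estimate can be seen in simulation: among sample pairs whose Y falls near a chosen value y, the average of X approaches m_1 + ρ_{X,Y}(σ_1/σ_2)(y - m_2). A Python sketch (all parameter values below are illustrative assumptions):

```python
import math, random

# Check of Eq. (5.64)/Example 5.46: the average of X over samples with Y
# near y0 approaches m1 + rho*(s1/s2)*(y0 - m2).
m1, m2, s1, s2, rho = 1.0, 2.0, 1.0, 0.5, 0.8
rng = random.Random(13)
xs, ys = [], []
for _ in range(400_000):
    u = rng.gauss(0, 1)
    x = m1 + s1 * u
    y = m2 + s2 * (rho * u + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1))
    xs.append(x)
    ys.append(y)
y0 = 2.5
sel = [x for x, y in zip(xs, ys) if abs(y - y0) < 0.05]
print(sum(sel) / len(sel))                  # empirical E[X | Y near y0]
print(m1 + rho * (s1 / s2) * (y0 - m2))     # predicted conditional mean: 1.8
```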
Example 5.47  Estimation of Signal in Noise

Let Y = X + N, where X (the "signal") and N (the "noise") are independent zero-mean Gaussian random variables with different variances. Find the correlation coefficient between the observed signal Y and the desired signal X. Find the value of x that maximizes f_X(x | y).

The mean and variance of Y and the covariance of X and Y are:

$$E[Y] = E[X] + E[N] = 0$$

$$\sigma_Y^2 = E[Y^2] = E[(X + N)^2] = E[X^2 + 2XN + N^2] = E[X^2] + E[N^2] = \sigma_X^2 + \sigma_N^2$$

$$\mathrm{COV}(X, Y) = E[(X - E[X])(Y - E[Y])] = E[XY] = E[X(X + N)] = \sigma_X^2.$$

Therefore, the correlation coefficient is:

$$\rho_{X,Y} = \frac{\mathrm{COV}(X, Y)}{\sigma_X\sigma_Y} = \frac{\sigma_X}{\sigma_Y} = \frac{\sigma_X}{(\sigma_X^2 + \sigma_N^2)^{1/2}} = \frac{1}{\left(1 + \dfrac{\sigma_N^2}{\sigma_X^2}\right)^{1/2}}.$$

Note that ρ_{X,Y}² = σ_X²/σ_Y² = 1 - σ_N²/σ_Y².

To find the joint pdf of X and Y, consider the following linear transformation:

$$X = X \qquad Y = X + N,$$

which has inverse

$$X = X \qquad N = -X + Y.$$

From Eq. (5.60) we have:

$$f_{X,Y}(x, y) = \left.\frac{f_{X,N}(x, n)}{|\det A|}\right|_{x = x,\, n = y - x}
= \left.\frac{e^{-x^2/2\sigma_X^2}}{\sqrt{2\pi}\,\sigma_X}\,\frac{e^{-n^2/2\sigma_N^2}}{\sqrt{2\pi}\,\sigma_N}\right|_{x = x,\, n = y - x}
= \frac{e^{-x^2/2\sigma_X^2}}{\sqrt{2\pi}\,\sigma_X}\,\frac{e^{-(y - x)^2/2\sigma_N^2}}{\sqrt{2\pi}\,\sigma_N}.$$

The conditional pdf of the signal X given the observation Y is then:

$$f_X(x \mid y) = \frac{f_{X,Y}(x, y)}{f_Y(y)}
= \frac{e^{-x^2/2\sigma_X^2}\, e^{-(y - x)^2/2\sigma_N^2}\,\sqrt{2\pi}\,\sigma_Y}{\sqrt{2\pi}\,\sigma_X\,\sqrt{2\pi}\,\sigma_N\, e^{-y^2/2\sigma_Y^2}}
= \frac{1}{\sqrt{2\pi}\,\sigma_N\sigma_X/\sigma_Y}\exp\left\{-\frac{1}{2}\left[\left(\frac{x}{\sigma_X}\right)^2 + \left(\frac{y - x}{\sigma_N}\right)^2 - \left(\frac{y}{\sigma_Y}\right)^2\right]\right\}
= \frac{1}{\sqrt{2\pi}\,\sigma_N\sigma_X/\sigma_Y}\exp\left\{-\frac{\left(x - \dfrac{\sigma_X^2}{\sigma_Y^2}\,y\right)^2}{2\sigma_X^2\sigma_N^2/\sigma_Y^2}\right\}.$$

This pdf has its maximum value when the argument of the exponent is zero, that is,

$$x = \left(\frac{\sigma_X^2}{\sigma_X^2 + \sigma_N^2}\right)y = \frac{y}{1 + \dfrac{\sigma_N^2}{\sigma_X^2}}.$$
[FIGURE 5.27: A rotation of the coordinate system by the angle θ transforms a pair of dependent Gaussian random variables (x, y) into a pair of independent Gaussian random variables (v, w).]
The signal-to-noise ratio (SNR) is defined as the ratio of the variance of X to the variance of N. At high SNRs this estimator gives x ≈ y, and at very low signal-to-noise ratios it gives x ≈ 0.
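The behavior of this estimator can be demonstrated numerically. The Python sketch below (the variance values are assumptions) compares the mean squared error of the estimate x̂ = (σ_X²/σ_Y²)y with that of the naive estimate x̂ = y:

```python
import random

# Sketch of Example 5.47's estimator x_hat = (sx2/(sx2+sn2)) * y: at high SNR
# it passes y through; at low SNR it shrinks toward 0. Here we compare its
# mean squared error with that of the naive estimate x_hat = y.
def mse(sx2, sn2, trials=100_000, seed=21):
    rng = random.Random(seed)
    gain = sx2 / (sx2 + sn2)
    err_est = err_naive = 0.0
    for _ in range(trials):
        x = rng.gauss(0, sx2 ** 0.5)
        y = x + rng.gauss(0, sn2 ** 0.5)
        err_est += (gain * y - x) ** 2
        err_naive += (y - x) ** 2
    return err_est / trials, err_naive / trials

print(mse(sx2=1.0, sn2=1.0))   # estimator MSE near 0.5, naive near 1.0
```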
Example 5.48  Rotation of Jointly Gaussian Random Variables

The ellipse corresponding to an arbitrary two-dimensional Gaussian vector forms an angle

$$\theta = \frac{1}{2}\arctan\left(\frac{2\rho\sigma_1\sigma_2}{\sigma_1^2 - \sigma_2^2}\right)$$

relative to the x-axis. Suppose we define a new coordinate system whose axes are aligned with those of the ellipse as shown in Fig. 5.27. This is accomplished by using the following rotation matrix:

$$\begin{bmatrix} V \\ W \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} X \\ Y \end{bmatrix}.$$

To show that the new random variables are independent, it suffices to show that they have covariance zero:

$$\mathrm{COV}(V, W) = E[(V - E[V])(W - E[W])]
= E[\{(X - m_1)\cos\theta + (Y - m_2)\sin\theta\}\{-(X - m_1)\sin\theta + (Y - m_2)\cos\theta\}]
= -\sigma_1^2\sin\theta\cos\theta + \mathrm{COV}(X, Y)\cos^2\theta - \mathrm{COV}(X, Y)\sin^2\theta + \sigma_2^2\sin\theta\cos\theta
= \frac{(\sigma_2^2 - \sigma_1^2)\sin 2\theta + 2\,\mathrm{COV}(X, Y)\cos 2\theta}{2}
= \frac{\cos 2\theta\,[(\sigma_2^2 - \sigma_1^2)\tan 2\theta + 2\,\mathrm{COV}(X, Y)]}{2}.$$
284
Chapter 5
Pairs of Random Variables
If we let the angle of rotation θ be such that

$$\tan 2\theta = \frac{2\,\mathrm{COV}(X, Y)}{\sigma_1^2 - \sigma_2^2},$$

then the covariance of V and W is zero as required.
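This decorrelating rotation can be checked numerically. A Python sketch (the variances and covariance below are illustrative assumptions):

```python
import math, random

# Check of Example 5.48: rotating by theta with tan(2*theta) =
# 2*COV(X,Y)/(s1^2 - s2^2) makes the sample covariance of (V, W) vanish.
s1, s2, cov = 2.0, 1.0, 0.9
theta = 0.5 * math.atan2(2 * cov, s1 ** 2 - s2 ** 2)
rho = cov / (s1 * s2)
rng = random.Random(17)
n = 200_000
acc = 0.0
for _ in range(n):
    x = rng.gauss(0, s1)
    # build Y with the desired covariance with X
    y = s2 * (rho * (x / s1) + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1))
    v = math.cos(theta) * x + math.sin(theta) * y
    w = -math.sin(theta) * x + math.cos(theta) * y
    acc += v * w
sample_cov = acc / n
print(sample_cov)   # near 0
```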
*5.10
GENERATING INDEPENDENT GAUSSIAN RANDOM VARIABLES
We now present a method for generating unit-variance, uncorrelated (and hence independent) jointly Gaussian random variables. Suppose that X and Y are two independent zero-mean, unit-variance jointly Gaussian random variables with pdf:

$$f_{X,Y}(x, y) = \frac{1}{2\pi}\, e^{-(x^2 + y^2)/2}.$$

In Example 5.44 we saw that the transformation

$$R = \sqrt{X^2 + Y^2} \quad \text{and} \quad \Theta = \tan^{-1}(Y/X)$$

leads to the pair of independent random variables with joint pdf

$$f_{R,\Theta}(r, \theta) = \frac{1}{2\pi}\, r e^{-r^2/2} = f_R(r)\, f_\Theta(\theta),$$

where R is a Rayleigh random variable and Θ is a uniform random variable. The above transformation is invertible. Therefore we can also start with independent Rayleigh and uniform random variables and produce zero-mean, unit-variance independent Gaussian random variables through the transformation:

$$X = R\cos\Theta \quad \text{and} \quad Y = R\sin\Theta. \tag{5.65}$$

Consider W = R², where R is a Rayleigh random variable. From Example 5.41 we then have that W has pdf

$$f_W(w) = \frac{f_R(\sqrt{w})}{2\sqrt{w}} = \frac{\sqrt{w}\, e^{-w/2}}{2\sqrt{w}} = \frac{1}{2} e^{-w/2}.$$

Thus W = R² has an exponential distribution with λ = 1/2.

Therefore we can generate R² by generating an exponential random variable with parameter 1/2, and we can generate Θ by generating a random variable that is uniformly distributed in the interval (0, 2π). If we substitute these random variables into Eq. (5.65), we then obtain a pair of independent zero-mean, unit-variance Gaussian random variables. The above discussion thus leads to the following algorithm:

1. Generate U_1 and U_2, two independent random variables uniformly distributed in the unit interval.
2. Let R² = -2 log U_1 and Θ = 2πU_2.
3. Let X = R cos Θ = (-2 log U_1)^{1/2} cos 2πU_2 and Y = R sin Θ = (-2 log U_1)^{1/2} sin 2πU_2.
Then X and Y are independent, zero-mean, unit-variance Gaussian random variables. By repeating the above procedure we can generate any number of such random variables.
Example 5.49

Use Octave or MATLAB to generate 1000 independent zero-mean, unit-variance Gaussian random variables. Compare a histogram of the observed values with the pdf of a zero-mean, unit-variance Gaussian random variable.

The Octave commands below show the steps for generating the Gaussian random variables. A set of histogram range values K from -4 to 4 is created and used to build a normalized histogram Z. The points in Z are then plotted and compared to the value predicted to fall in each interval by the Gaussian pdf. These plots are shown in Fig. 5.28, which shows excellent agreement.

> U1=rand(1000,1);       % Create a 1000-element vector U1 (step 1).
> U2=rand(1000,1);       % Create a 1000-element vector U2 (step 1).
> R2=-2*log(U1);         % Find R^2 (step 2).
> TH=2*pi*U2;            % Find Theta (step 2).
> X=sqrt(R2).*sin(TH);   % Generate X (step 3).
[FIGURE 5.28: Histogram of 1000 observations of a Gaussian random variable, compared with the values predicted by the Gaussian pdf.]
[FIGURE 5.29: Scattergram of 5000 pairs of jointly Gaussian random variables.]
> Y=sqrt(R2).*cos(TH);            % Generate Y (step 3).
> K=-4:.2:4;                      % Create histogram range values K.
> Z=hist(X,K)/1000                % Create normalized histogram Z based on K.
> bar(K,Z)                        % Plot Z.
> hold on
> stem(K,.2*normal_pdf(K,0,1))    % Compare to values predicted by pdf.
We also plotted the X values vs. the Y values for 5000 pairs of generated random variables
in a scattergram as shown in Fig. 5.29. Good agreement with the circular symmetry of the jointly
Gaussian pdf of zero-mean, unit-variance pairs is observed.
In the next chapter we will show how to generate a vector of jointly Gaussian random
variables with an arbitrary covariance matrix.
SUMMARY
• The joint statistical behavior of a pair of random variables X and Y is specified
by the joint cumulative distribution function, the joint probability mass function, or the joint probability density function. The probability of any event involving the joint behavior of these random variables can be computed from
these functions.
• The statistical behavior of individual random variables from X is specified by the
marginal cdf, marginal pdf, or marginal pmf that can be obtained from the joint
cdf, joint pdf, or joint pmf of X.
• Two random variables are independent if the probability of a product-form event
is equal to the product of the probabilities of the component events. Equivalent
conditions for the independence of a set of random variables are that the joint
cdf, joint pdf, or joint pmf factors into the product of the corresponding marginal
functions.
• The covariance and the correlation coefficient of two random variables are measures of the linear dependence between the random variables.
• If X and Y are independent, then X and Y are uncorrelated, but not vice versa. If
X and Y are jointly Gaussian and uncorrelated, then they are independent.
• The statistical behavior of X, given the exact value of Y, is specified by the conditional cdf, conditional pmf, or conditional pdf. Many problems lend themselves to a solution that involves conditioning on the value of one of the random variables. In these problems, the expected value of random variables can be obtained by conditional expectation.
• The joint pdf of a pair of jointly Gaussian random variables is determined by the
means, variances, and covariance. All marginal pdf’s and conditional pdf’s are
also Gaussian pdf’s.
• Independent Gaussian random variables can be generated by a transformation of
uniform random variables.
CHECKLIST OF IMPORTANT TERMS
Central moments of X and Y
Conditional cdf
Conditional expectation
Conditional pdf
Conditional pmf
Correlation of X and Y
Covariance of X and Y
Independent random variables
Joint cdf
Joint moments of X and Y
Joint pdf
Joint pmf
Jointly continuous random variables
Jointly Gaussian random variables
Linear transformation
Marginal cdf
Marginal pdf
Marginal pmf
Orthogonal random variables
Product-form event
Uncorrelated random variables
ANNOTATED REFERENCES
Papoulis [1] is the standard reference for electrical engineers for the material on random variables. References [2] and [3] present many interesting examples involving
multiple random variables. The book by Jayant and Noll [4] gives numerous applications of probability concepts to the digital coding of waveforms.
1. A. Papoulis and S. Pillai, Probability, Random Variables, and Stochastic Processes,
McGraw-Hill, New York, 2002.
2. L. Breiman, Probability and Stochastic Processes, Houghton Mifflin, Boston,
1969.
3. H. J. Larson and B. O. Shubert, Probabilistic Models in Engineering Sciences, vol. 1,
Wiley, New York, 1979.
4. N. S. Jayant and P. Noll, Digital Coding of Waveforms, Prentice Hall, Englewood
Cliffs, N.J., 1984.
5. N. Johnson et al., Continuous Multivariate Distributions, Wiley, New York, 2000.
6. H. Stark and J. W. Woods, Probability, Random Processes, and Estimation Theory
for Engineers, Prentice Hall, Englewood Cliffs, N.J., 1986.
7. H. Anton, Elementary Linear Algebra, 9th ed., Wiley, New York, 2005.
8. C. H. Edwards, Jr., and D. E. Penney, Calculus and Analytic Geometry, 4th ed.,
Prentice Hall, Englewood Cliffs, N.J., 1994.
PROBLEMS
Section 5.1: Two Random Variables
5.1. Let X be the maximum and let Y be the minimum of the number of heads obtained when
Carlos and Michael each flip a fair coin twice.
(a) Describe the underlying space S of this random experiment and show the mapping
from S to SXY , the range of the pair (X, Y).
(b) Find the probabilities for all values of (X, Y).
(c) Find P[X = Y].
(d) Repeat parts b and c if Carlos uses a biased coin with P[heads] = 3/4.
5.2. Let X be the difference and let Y be the sum of the number of heads obtained when Carlos and Michael each flip a fair coin twice.
(a) Describe the underlying space S of this random experiment and show the mapping
from S to SXY , the range of the pair (X, Y).
(b) Find the probabilities for all values of (X, Y).
(c) Find P[X + Y = 1] and P[X + Y = 2].
5.3. The input X to a communication channel is -1 or +1, with respective probabilities 1/4 and 3/4. The output Y of the channel is equal to: the corresponding input X with probability 1 - p - p_e; -X with probability p; 0 with probability p_e.
(a) Describe the underlying space S of this random experiment and show the mapping from S to S_{X,Y}, the range of the pair (X, Y).
(b) Find the probabilities for all values of (X, Y).
(c) Find P[X ≠ Y] and P[Y = 0].
5.4. (a) Specify the range of the pair 1N1 , N22 in Example 5.2.
(b) Specify and sketch the event “more revenue comes from type 1 requests than type 2
requests.”
5.5. (a) Specify the range of the pair (Q, R) in Example 5.3.
(b) Specify and sketch the event “last packet is more than half full.”
5.6. Let the pair of random variables H and W be the height and weight in Example 5.1.
The body mass index is a measure of body fat and is defined by BMI = W/H², where W is in kilograms and H is in meters. Determine and sketch on the plane the following events: A = {"obese," BMI ≥ 30}; B = {"overweight," 25 ≤ BMI < 30}; C = {"normal," 18.5 ≤ BMI < 25}; and D = {"underweight," BMI < 18.5}.
Problems
289
5.7. Let (X, Y) be the two-dimensional noise signal in Example 5.4. Specify and sketch the
events:
(a) “Maximum noise magnitude is greater than 5.”
(b) "The noise power X² + Y² is greater than 4."
(c) "The noise power X² + Y² is greater than 4 and less than 9."
5.8. For the pair of random variables (X, Y) sketch the region of the plane corresponding to
the following events. Identify which events are of product form.
(a) {X + Y > 3}.
(b) {e^X > Y e³}.
(c) {min(X, Y) > 0} ∪ {max(X, Y) < 0}.
(d) {|X - Y| ≥ 1}.
(e) {|X/Y| > 2}.
(f) {X/Y < 2}.
(g) {X³ > Y}.
(h) {XY < 0}.
(i) {max(|X|, Y) < 3}.
Section 5.2: Pairs of Discrete Random Variables
5.9. (a) Find and sketch p_{X,Y}(x, y) in Problem 5.1 when using a fair coin.
(b) Find p_X(x) and p_Y(y).
(c) Repeat parts a and b if Carlos uses a biased coin with P[heads] = 3/4.
5.10. (a) Find and sketch p_{X,Y}(x, y) in Problem 5.2 when using a fair coin.
(b) Find p_X(x) and p_Y(y).
(c) Repeat parts a and b if Carlos uses a biased coin with P[heads] = 3/4.
5.11. (a) Find the marginal pmf's for the pairs of random variables with the indicated joint pmf.

(i)
X\Y  |  -1  |   0  |   1
 -1  | 1/6  |   0  | 1/6
  0  | 1/6  |   0  | 1/6
  1  |  0   | 1/3  |  0

(ii)
X\Y  |  -1  |   0  |   1
 -1  | 1/9  | 1/9  | 1/9
  0  | 1/9  | 1/9  | 1/9
  1  | 1/9  | 1/9  | 1/9

(iii)
X\Y  |  -1  |   0  |   1
 -1  | 1/3  |  0   |  0
  0  |  0   | 1/3  |  0
  1  |  0   |  0   | 1/3

(b) Find the probability of the events A = {X > 0}, B = {X ≥ Y}, and C = {X = -Y} for the above joint pmf's.
5.12. A modem transmits a two-dimensional signal (X, Y) given by:
X = r cos(2πΘ/8) and Y = r sin(2πΘ/8),
where Θ is a discrete uniform random variable in the set {0, 1, 2, ..., 7}.
(a) Show the mapping from S to S_{X,Y}, the range of the pair (X, Y).
(b) Find the joint pmf of X and Y.
(c) Find the marginal pmf of X and of Y.
(d) Find the probability of the following events: A = {X = 0}, B = {Y ≤ r/√2}, C = {X ≥ r/√2, Y ≥ r/√2}, D = {X < -r/√2}.
5.13. Let N1 be the number of Web page requests arriving at a server in a 100-ms period and let
N2 be the number of Web page requests arriving at a server in the next 100-ms period.
Assume that in a 1-ms interval either zero or one page request takes place with respective probabilities 1 - p = 0.95 and p = 0.05, and that the requests in different 1-ms intervals are independent of each other.
(a) Describe the underlying space S of this random experiment and show the mapping
from S to SXY , the range of the pair (X, Y).
(b) Find the joint pmf of X and Y.
(c) Find the marginal pmf for X and for Y.
(d) Find the probability of the events A = {X ≥ Y}, B = {X = Y = 0}, C = {X > 5, Y > 3}.
(e) Find the probability of the event D = {X + Y = 10}.
5.14. Let N1 be the number of Web page requests arriving at a server in the period (0, 100) ms
and let N2 be the total combined number of Web page requests arriving at a server in the
period (0, 200) ms. Assume arrivals occur as in Problem 5.13.
(a) Describe the underlying space S of this random experiment and show the mapping
from S to SXY , the range of the pair (X, Y).
(b) Find the joint pmf of N1 and N2 .
(c) Find the marginal pmf for N1 and N2 .
(d) Find the probability of the events A = {N1 < N2}, B = {N2 = 0}, C = {N1 > 5, N2 > 3}, D = {|N2 - 2N1| < 2}.
5.15. At even time instants, a robot moves either +Δ cm or -Δ cm in the x-direction according
to the outcome of a coin flip; at odd time instants, a robot moves similarly according to
another coin flip in the y-direction. Assuming that the robot begins at the origin, let X
and Y be the coordinates of the location of the robot after 2n time instants.
(a) Describe the underlying space S of this random experiment and show the mapping
from S to SXY , the range of the pair (X, Y).
(b) Find the marginal pmf of the coordinates X and Y.
(c) Find the probability that the robot is within distance √2 of the origin after 2n time instants.
Section 5.3: The Joint cdf of X and Y
5.16. (a) Sketch the joint cdf for the pair (X, Y) in Problem 5.1 and verify that the properties of
the joint cdf are satisfied. You may find it helpful to first divide the plane into regions
where the cdf is constant.
(b) Find the marginal cdf of X and of Y.
5.17. A point (X, Y) is selected at random inside a triangle defined by {(x, y) : 0 ≤ y ≤ x ≤ 1}.
Assume the point is equally likely to fall anywhere in the triangle.
(a) Find the joint cdf of X and Y.
(b) Find the marginal cdf of X and of Y.
(c) Find the probabilities of the following events in terms of the joint cdf:
A = {X ≤ 1/2, Y ≤ 3/4}; B = {1/4 < X ≤ 3/4, 1/4 < Y ≤ 3/4}.
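Numerical cross-checks are useful for joint-cdf problems such as 5.17. The sketch below (Python; the evaluation point (1/2, 3/4) and the rejection method are illustrative choices, not part of the text) estimates the joint cdf at one point and compares it with the analytic value F(1/2, 3/4) = (1/8)/(1/2) = 1/4:

```python
import random

random.seed(2)

def sample_triangle():
    # rejection sampling: draw uniformly in the unit square, keep points with y <= x
    while True:
        x, y = random.random(), random.random()
        if y <= x:
            return x, y

trials = 100_000
count = 0
for _ in range(trials):
    x, y = sample_triangle()
    if x <= 0.5 and y <= 0.75:
        count += 1
F_hat = count / trials
# analytically F(1/2, 3/4) = area{y <= x <= 1/2} / area(triangle) = (1/8)/(1/2) = 0.25
print(F_hat)
```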
5.18. A dart is equally likely to land at any point (X1, X2) inside a circular target of unit radius.
Let R and Θ be the radius and angle of the point (X1, X2).
(a) Find the joint cdf of R and Θ.
(b) Find the marginal cdfs of R and Θ.
Problems
291
(c) Use the joint cdf to find the probability that the point is in the first quadrant of the
real plane and that the radius is greater than 0.5.
5.19. Find an expression for the probability of the events in Problem 5.8 parts c, h, and i in
terms of the joint cdf of X and Y.
5.20. The pair (X, Y) has joint cdf given by:

F_X,Y(x, y) = (1 − 1/x²)(1 − 1/y²)  for x > 1, y > 1,
F_X,Y(x, y) = 0                     elsewhere.

(a) Sketch the joint cdf.
(b) Find the marginal cdf of X and of Y.
(c) Find the probability of the following events: {X < 3, Y ≤ 5}, {X > 4, Y > 3}.
5.21. Is the following a valid cdf? Why?

F_X,Y(x, y) = (1 − 1/x²y²)  for x > 1, y > 1,
F_X,Y(x, y) = 0             elsewhere.
5.22. Let F_X(x) and F_Y(y) be valid one-dimensional cdfs. Show that F_X,Y(x, y) = F_X(x)F_Y(y)
satisfies the properties of a two-dimensional cdf.
5.23. The number of users N logged onto a system and the time T until the next user logs off
have joint probability given by:

P[N = n, T ≤ t] = (1 − ρ)ρ^(n−1)(1 − e^(−nλt))  for n = 1, 2, …, t > 0.

(a) Sketch the above joint probability.
(b) Find the marginal pmf of N.
(c) Find the marginal cdf of T.
(d) Find P[N ≤ 3, T > 3/λ].
5.24. A factory has n machines of a certain type. Let p be the probability that a machine is
working on any given day, and let N be the total number of machines working on a certain
day. The time T required to manufacture an item is an exponentially distributed random
variable with rate kα if k machines are working. Find P[N = k, T ≤ t] and P[T ≤ t]. Find
P[T ≤ t] as t → ∞ and explain the result.
Section 5.4: The Joint pdf of Two Continuous Random Variables
5.25. The amplitudes of two signals X and Y have joint pdf:

f_X,Y(x, y) = y e^(−x/2) e^(−y²)  for x > 0, y > 0.

(a) Find the joint cdf.
(b) Find P[X^(1/2) > Y].
(c) Find the marginal pdfs.
5.26. Let X and Y have joint pdf:

f_X,Y(x, y) = k(x + y)  for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1.

(a) Find k.
(b) Find the joint cdf of (X, Y).
(c) Find the marginal pdf of X and of Y.
(d) Find P[X < Y], P[Y < X²], P[X + Y > 0.5].
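Constants such as k in Problem 5.26 can be cross-checked numerically. The sketch below (Python; an illustration, not part of the text) approximates the normalization integral of x + y over the unit square with a midpoint rule, which is exact for linear integrands, and also checks P[X < Y] = 1/2, which follows from the symmetry of the pdf:

```python
# midpoint-rule check of the normalization and of P[X < Y] for f(x, y) = k(x + y)
m = 400
h = 1.0 / m
total = 0.0
below = 0.0              # unnormalized mass of the region x < y
for i in range(m):
    x = (i + 0.5) * h
    for j in range(m):
        y = (j + 0.5) * h
        w = (x + y) * h * h
        total += w
        if x < y:
            below += w
k_est = 1.0 / total      # k must make the total mass equal to 1
p_x_lt_y = below * k_est
print(k_est, p_x_lt_y)   # k = 1, and P[X < Y] = 1/2 by symmetry
```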
292
Chapter 5
Pairs of Random Variables
5.27. Let X and Y have joint pdf:

f_X,Y(x, y) = kx(1 − x)y  for 0 < x < 1, 0 < y < 1.

(a) Find k.
(b) Find the joint cdf of (X, Y).
(c) Find the marginal pdf of X and of Y.
(d) Find P[Y < X^(1/2)], P[X < Y].
5.28. The random vector (X, Y) is uniformly distributed (i.e., f(x, y) = k) in the regions shown
in Fig. P5.1 and zero elsewhere.

[FIGURE P5.1 shows the three regions, cases (i), (ii), and (iii), each drawn in x–y axes
with unit scale.]

(a) Find the value of k in each case.
(b) Find the marginal pdf for X and for Y in each case.
(c) Find P[X > 0, Y > 0].
5.29. (a) Find the joint cdf for the vector random variable introduced in Example 5.16.
(b) Use the result of part a to find the marginal cdf of X and of Y.
5.30. Let X and Y have the joint pdf:

f_X,Y(x, y) = y e^(−y(1 + x))  for x > 0, y > 0.

Find the marginal pdf of X and of Y.
5.31. Let X and Y be the pair of random variables in Problem 5.17.
(a) Find the joint pdf of X and Y.
(b) Find the marginal pdf of X and of Y.
(c) Find P[Y < X²].
5.32. Let R and Θ be the pair of random variables in Problem 5.18.
(a) Find the joint pdf of R and Θ.
(b) Find the marginal pdf of R and of Θ.
5.33. Let (X, Y) be the jointly Gaussian random variables discussed in Example 5.18. Find
P[X² + Y² > r²] when ρ = 0. Hint: Use polar coordinates to compute the integral.
5.34. The general form of the joint pdf for two jointly Gaussian random variables is given by
Eq. (5.61a). Show that X and Y have marginal pdfs that correspond to Gaussian random
variables with means m1 and m2 and variances σ1² and σ2², respectively.
5.35. The input X to a communication channel is +1 or −1 with probability p and 1 − p,
respectively. The received signal Y is the sum of X and noise N, which has a Gaussian
distribution with zero mean and variance σ² = 0.25.
(a) Find the joint probability P[X = j, Y ≤ y].
(b) Find the marginal pmf of X and the marginal pdf of Y.
(c) Suppose we are given that Y > 0. Which is more likely, X = 1 or X = −1?
5.36. A modem sends a two-dimensional signal X from the set {(1, 1), (1, −1), (−1, 1), (−1, −1)}.
The channel adds a noise signal (N1, N2), so the received signal is
Y = X + N = (X1 + N1, X2 + N2). Assume that (N1, N2) have the jointly Gaussian
pdf in Example 5.18 with ρ = 0. Let the distance between X and Y be
d(X, Y) = {(X1 − Y1)² + (X2 − Y2)²}^(1/2).
(a) Suppose that X = (1, 1). Find and sketch the region for the event {Y is closer to (1, 1)
than to the other possible values of X}. Evaluate the probability of this event.
(b) Suppose that X = (1, 1). Find and sketch the region for the event {Y is closer to
(1, −1) than to the other possible values of X}. Evaluate the probability of this
event.
(c) Suppose that X = (1, 1). Find and sketch the region for the event {d(X, Y) > 1}.
Evaluate the probability of this event. Explain why this probability is an upper
bound on the probability that Y is closer to a signal other than X = (1, 1).
Section 5.5: Independence of Two Random Variables
5.37. Let X be the number of full pairs and let Y be the remainder when the number of dots observed in a toss of a fair die is divided by 2. Are X and Y independent random variables?
5.38. Let X and Y be the coordinates of the robot in Problem 5.15 after 2n time instants. Determine whether X and Y are independent random variables.
5.39. Let X and Y be the coordinates of the two-dimensional modem signal (X, Y) in
Problem 5.12.
(a) Determine if X and Y are independent random variables.
(b) Repeat part a if even values of Θ are twice as likely as odd values.
5.40. Determine which of the joint pmfs in Problem 5.11 correspond to independent pairs of
random variables.
5.41. Michael takes the 7:30 bus every morning. The arrival time of the bus at the stop is uniformly distributed in the interval [7:27, 7:37]. Michael’s arrival time at the stop is also uniformly distributed in the interval [7:25, 7:40]. Assume that Michael’s and the bus’s arrival
times are independent random variables.
(a) What is the probability that Michael arrives more than 5 minutes before the bus?
(b) What is the probability that Michael misses the bus?
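Problem 5.41 can be checked by direct simulation. In the sketch below (Python; not part of the original text), times are measured in minutes after 7:25, so Michael is uniform on [0, 15] and the bus on [2, 12]; "misses the bus" is read as Michael arriving after the bus:

```python
import random

random.seed(4)
trials = 200_000
miss = early = 0
for _ in range(trials):
    michael = random.uniform(0.0, 15.0)   # minutes after 7:25
    bus = random.uniform(2.0, 12.0)       # bus arrives between 7:27 and 7:37
    if michael > bus:
        miss += 1                         # arrives after the bus has come and gone
    if bus - michael > 5.0:
        early += 1                        # arrives more than 5 minutes before the bus
p_early = early / trials
p_miss = miss / trials
print(p_early, p_miss)   # analytic values are 49/300 and 8/15
```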
5.42. Are R and Θ independent in Problem 5.18?
5.43. Are X and Y independent in Problem 5.20?
5.44. Are the signal amplitudes X and Y independent in Problem 5.25?
5.45. Are X and Y independent in Problem 5.26?
5.46. Are X and Y independent in Problem 5.27?
5.47. Let X and Y be independent random variables. Find an expression for the probability of
the following events in terms of F_X(x) and F_Y(y).
(a) {a < X ≤ b} ∩ {Y > d}.
(b) {a < X ≤ b} ∩ {c ≤ Y < d}.
(c) {|X| < a} ∩ {c ≤ Y ≤ d}.
5.48. Let X and Y be independent random variables that are uniformly distributed in [−1, 1].
Find the probability of the following events:
(a) P[X² < 1/2, |Y| < 1/2].
(b) P[4X < 1, Y < 0].
(c) P[XY < 1/2].
(d) P[max(X, Y) < 1/3].
5.49. Let X and Y be random variables that take on values from the set {−1, 0, 1}.
(a) Find a joint pmf for which X and Y are independent.
(b) Are X² and Y² independent random variables for the pmf in part a?
(c) Find a joint pmf for which X and Y are not independent, but for which X² and Y²
are independent.
5.50. Let X and Y be the jointly Gaussian random variables introduced in Problem 5.34.
(a) Show that X and Y are independent random variables if and only if ρ = 0.
(b) Suppose ρ = 0; find P[XY < 0].
5.51. Two fair dice are tossed repeatedly until a pair occurs. Let K be the number of tosses required and let X be the number showing up in the pair. Find the joint pmf of K and X and
determine whether K and X are independent.
5.52. The number of devices L produced in a day is geometrically distributed with probability
of success p. Let N be the number of working devices and let M be the number of
defective devices produced in a day.
(a) Are N and M independent random variables?
(b) Find the joint pmf of N and M.
(c) Find the marginal pmfs of N and M. (See hint in Problem 5.87b.)
(d) Are L and M independent random variables?
5.53. Let N1 be the number of Web page requests arriving at a server in a 100-ms period and let
N2 be the number of Web page requests arriving at a server in the next 100-ms period.
Use the result of Problem 5.13 parts a and b to develop a model where N1 and N2 are
independent Poisson random variables.
5.54. (a) Show that Eq. (5.22) implies Eq. (5.21).
(b) Show that Eq. (5.21) implies Eq. (5.22).
5.55. Verify that Eqs. (5.22) and (5.23) can be obtained from each other.
Section 5.6: Joint Moments and Expected Values of a Function of Two Random
Variables
5.56. (a) Find E[(X + Y)²].
(b) Find the variance of X + Y.
(c) Under what condition is the variance of the sum equal to the sum of the individual
variances?
5.57. Find E[|X − Y|] if X and Y are independent exponential random variables with parameters λ1 = 1 and λ2 = 2, respectively.
5.58. Find E[X²e^Y] where X and Y are independent random variables, X is a zero-mean,
unit-variance Gaussian random variable, and Y is a uniform random variable in the
interval [0, 3].
5.59. For the discrete random variables X and Y in Problem 5.1, find the correlation and covariance,
and indicate whether the random variables are independent, orthogonal, or uncorrelated.
5.60. For the discrete random variables X and Y in Problem 5.2, find the correlation and
covariance, and indicate whether the random variables are independent, orthogonal,
or uncorrelated.
5.61. For the three pairs of discrete random variables in Problem 5.11, find the correlation and
covariance of X and Y, and indicate whether the random variables are independent, orthogonal, or uncorrelated.
5.62. Let N1 and N2 be the number of Web page requests in Problem 5.13. Find the correlation
and covariance of N1 and N2 , and indicate whether the random variables are independent, orthogonal, or uncorrelated.
5.63. Repeat Problem 5.62 for N1 and N2 , the number of Web page requests in Problem 5.14.
5.64. Let N and T be the number of users logged on and the time till the next logoff in
Problem 5.23. Find the correlation and covariance of N and T, and indicate whether
the random variables are independent, orthogonal, or uncorrelated.
5.65. Find the correlation and covariance of X and Y in Problem 5.26. Determine whether X
and Y are independent, orthogonal, or uncorrelated.
5.66. Repeat Problem 5.65 for X and Y in Problem 5.27.
5.67. For the three pairs of continuous random variables X and Y in Problem 5.28, find the correlation and covariance, and indicate whether the random variables are independent, orthogonal, or uncorrelated.
5.68. Find the correlation coefficient between X and Y = aX + b. Does the answer depend
on the sign of a?
5.69. Propose a method for estimating the covariance of two random variables.
5.70. (a) Complete the calculations for the correlation coefficient in Example 5.28.
(b) Repeat the calculations if X and Y have the pdf:

f_X,Y(x, y) = e^(−(x + |y|))  for x > 0, −x < y < x.
5.71. The output of a channel is Y = X + N, where the input X and the noise N are independent,
zero-mean random variables.
(a) Find the correlation coefficient between the input X and the output Y.
(b) Suppose we estimate the input X by a linear function g(Y) = aY. Find the value of a
that minimizes the mean squared error E[(X − aY)²].
(c) Express the resulting mean-squared error in terms of σX/σN.
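The minimizer in Problem 5.71(b) can be checked numerically. The sketch below (Python; the values σX = 2, σN = 1 are illustrative, and a* = σX²/(σX² + σN²) is the standard linear-MMSE answer, treated here as a conjecture to verify) confirms that the empirical mean squared error is smallest at a*:

```python
import random

random.seed(5)
sigma_x, sigma_n = 2.0, 1.0
# conjectured optimum for part (b): a* = VAR[X] / (VAR[X] + VAR[N])
a_star = sigma_x ** 2 / (sigma_x ** 2 + sigma_n ** 2)

trials = 50_000
xs = [random.gauss(0.0, sigma_x) for _ in range(trials)]
ys = [x + random.gauss(0.0, sigma_n) for x in xs]   # channel output Y = X + N

def mse(a):
    # empirical mean squared error of the estimator aY
    return sum((x - a * y) ** 2 for x, y in zip(xs, ys)) / trials

print(a_star, mse(a_star), mse(a_star - 0.1), mse(a_star + 0.1))
```

The printed MSE at a* should be smaller than at the two perturbed values of a.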
5.72. In Example 5.27 let X = cos Θ/4 and Y = sin Θ/4. Are X and Y uncorrelated?
5.73. (a) Show that COV(X, E[Y | X]) = COV(X, Y).
(b) Show that E[Y | X = x] = E[Y], for all x, implies that X and Y are uncorrelated.
5.74. Use the fact that E[(tX + Y)²] ≥ 0 for all t to prove the Cauchy-Schwarz inequality:

(E[XY])² ≤ E[X²]E[Y²].

Hint: Consider the discriminant of the quadratic equation in t that results from the above
inequality.
Section 5.7: Conditional Probability and Conditional Expectation
5.75. (a) Find p_Y(y | x) and p_X(x | y) in Problem 5.1 assuming fair coins are used.
(b) Find p_Y(y | x) and p_X(x | y) in Problem 5.1 assuming Carlos uses a coin with
p = 3/4.
(c) What is the effect on p_X(x | y) of Carlos using a biased coin?
(d) Find E[Y | X = x] and E[X | Y = y] in part a; then find E[X] and E[Y].
(e) Find E[Y | X = x] and E[X | Y = y] in part b; then find E[X] and E[Y].
5.76. (a) Find p_X(x | y) for the communication channel in Problem 5.3.
(b) For each value of y, find the value of x that maximizes p_X(x | y). State any
assumptions about p and p_e.
(c) Find the probability of error if a receiver uses the decision rule from part b.
5.77. (a) In Problem 5.11(i), which conditional pmf given X provides the most information
about Y: p_Y(y | −1), p_Y(y | 0), or p_Y(y | +1)? Explain why.
(b) Compare the conditional pmfs in Problems 5.11(ii) and (iii) and explain which of
these two cases is “more random.”
(c) Find E[Y | X = x] and E[X | Y = y] in Problems 5.11(i), (ii), (iii); then find E[X]
and E[Y].
(d) Find E[Y² | X = x] and E[X² | Y = y] in Problems 5.11(i), (ii), (iii); then find
VAR[X] and VAR[Y].
5.78. (a) Find the conditional pmf of N1 given N2 in Problem 5.14.
(b) Find P[N1 = k | N2 = 2k] for k = 5, 10, 20. Hint: Use Stirling’s formula.
(c) Find E[N1 | N2 = k]; then find E[N1].
5.79. In Example 5.30, let Y be the number of defects inside the region R and let Z be the number of defects outside the region.
(a) Find the pmf of Z given Y.
(b) Find the joint pmf of Y and Z.
(c) Are Y and Z independent random variables? Is the result intuitive?
5.80. (a) Find f_Y(y | x) in Problem 5.26.
(b) Find P[Y > X | X = x].
(c) Find P[Y > X] using part b.
(d) Find E[Y | X = x].
5.81. (a) Find f_Y(y | x) in Problem 5.28(i).
(b) Find E[Y | X = x] and E[Y].
(c) Repeat parts a and b for Problem 5.28(ii).
(d) Repeat parts a and b for Problem 5.28(iii).
5.82. (a) Find f_Y(y | x) in Example 5.27.
(b) Find E[Y | X = x].
(c) Find E[Y].
(d) Find E[XY | X = x].
(e) Find E[XY].
5.83. Find f_Y(y | x) and f_X(x | y) for the jointly Gaussian pdf in Problem 5.34.
5.84. (a) Find f_T(t | N = n) in Problem 5.23.
(b) Find E[T | N = n].
(c) Find the value of n that maximizes P[N = n | t < T ≤ t + dt].
5.85. (a) Find p_Y(y | x) and p_X(x | y) in Problem 5.12.
(b) Find E[Y | X = x].
(c) Find E[XY | X = x] and E[XY].
5.86. A customer enters a store and is equally likely to be served by one of three clerks. The
time taken by clerk 1 is constant and equal to two minutes; the time for clerk 2 is
exponentially distributed with mean two minutes; and the time for clerk 3 is
Pareto distributed with mean two minutes and α = 2.5.
(a) Find the pdf of T, the time taken to service a customer.
(b) Find E[T] and VAR[T].
5.87. A message requires N time units to be transmitted, where N is a geometric random
variable with pmf p_i = (1 − α)α^(i−1), i = 1, 2, …. A single new message arrives during
a time unit with probability p, and no messages arrive with probability 1 − p. Let K be
the number of new messages that arrive during the transmission of a single message.
(a) Find E[K] and VAR[K] using conditional expectation.
(b) Find the pmf of K. Hint: (1 − β)^(−(k+1)) = Σ_{n=k to ∞} C(n, k) β^(n−k).
(c) Find the conditional pmf of N given K = k.
(d) Find the value of n that maximizes P[N = n | K = k].
5.88. The number of defects in a VLSI chip is a Poisson random variable with rate R. However,
R is itself a gamma random variable with parameters α and λ.
(a) Use conditional expectation to find E[N] and VAR[N].
(b) Find the pmf for N, the number of defects.
5.89. (a) In Problem 5.35, find the conditional pmf of the input X of the communication
channel given that the output is in the interval y < Y ≤ y + dy.
(b) Find the value of X that is more probable given y < Y ≤ y + dy.
(c) Find an expression for the probability of error if we use the result of part b to decide
what the input to the channel was.
Section 5.8: Functions of Two Random Variables
5.90. Two toys are started at the same time each with a different battery. The first battery has a
lifetime that is exponentially distributed with mean 100 minutes; the second battery has a
Rayleigh-distributed lifetime with mean 100 minutes.
(a) Find the cdf of the time T until the battery in a toy first runs out.
(b) Suppose that both toys are still operating after 100 minutes. Find the cdf of the time
T2 that subsequently elapses until the battery in a toy first runs out.
(c) In part b, find the cdf of the total time that elapses until a battery first fails.
5.91. (a) Find the cdf of the time that elapses until both batteries run out in Problem 5.90a.
(b) Find the cdf of the remaining time until both batteries run out in Problem 5.90b.
5.92. Let K and N be independent random variables with nonnegative integer values.
(a) Find an expression for the pmf of M = K + N.
(b) Find the pmf of M if K and N are binomial random variables with parameters (k, p)
and (n, p).
(c) Find the pmf of M if K and N are Poisson random variables with parameters α1 and
α2, respectively.
5.93. The number X of goals the Bulldogs score against the Flames has a geometric
distribution with mean 2; the number of goals Y that the Flames score against the
Bulldogs is also geometrically distributed but with mean 4.
(a) Find the pmf of Z = X − Y. Assume X and Y are independent.
(b) What is the probability that the Bulldogs beat the Flames? Tie the Flames?
(c) Find E[Z].
5.94. Passengers arrive at an airport taxi stand every minute according to a Bernoulli random
variable. A taxi will not leave until it has two passengers.
(a) Find the pmf of the time T until the taxi has two passengers.
(b) Find the pmf for the time that the first customer waits.
5.95. Let X and Y be independent random variables that are uniformly distributed in the interval [0, 1]. Find the pdf of Z = XY.
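The product Z = XY of two independent uniforms has the well-known pdf f_Z(z) = −ln z on (0, 1), so its cdf is F_Z(z) = z − z ln z. The sketch below (Python; not part of the text, and the test point z = 0.5 is an illustrative choice) checks this claim against simulation:

```python
import math
import random

random.seed(7)
trials = 200_000
z0 = 0.5
count = 0
for _ in range(trials):
    if random.random() * random.random() <= z0:   # event {XY <= z0}
        count += 1
F_hat = count / trials
F_exact = z0 - z0 * math.log(z0)   # cdf implied by f_Z(z) = -ln z on (0, 1)
print(F_hat, F_exact)
```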
5.96. Let X1, X2, and X3 be independent and uniformly distributed in [−1, 1].
(a) Find the cdf and pdf of Y = X1 + X2 .
(b) Find the cdf of Z = Y + X3 .
5.97. Let X and Y be independent random variables with gamma distributions and parameters
(α1, λ) and (α2, λ), respectively. Show that Z = X + Y is gamma-distributed with
parameters (α1 + α2, λ). Hint: See Eq. (4.59).
5.98. Signals X and Y are independent, and each is exponentially distributed with mean 1.
(a) Find the cdf of Z = |X − Y|.
(b) Use the result of part a to find E[Z].
5.99. The random variables X and Y have the joint pdf

f_X,Y(x, y) = e^(−(x + y))  for 0 < y < x < 1.
Find the pdf of Z = X + Y.
5.100. Let X and Y be independent Rayleigh random variables with parameters a = b = 1.
Find the pdf of Z = X/Y.
5.101. Let X and Y be independent Gaussian random variables that are zero mean and unit
variance. Show that Z = X/Y is a Cauchy random variable.
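The claim in Problem 5.101 is easy to probe numerically: for a standard Cauchy random variable, P[|Z| ≤ 1] = (2/π) arctan(1) = 1/2. The sketch below (Python; an illustration, not part of the text) forms the ratio of two independent standard Gaussians and checks this probability:

```python
import random

random.seed(8)
trials = 200_000
inside = 0
for _ in range(trials):
    z = random.gauss(0.0, 1.0) / random.gauss(0.0, 1.0)   # ratio of Gaussians
    if abs(z) <= 1.0:
        inside += 1
p_hat = inside / trials
# standard Cauchy: P[|Z| <= 1] = (2/pi) * arctan(1) = 1/2
print(p_hat)
```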
5.102. Find the joint cdf of W = min(X, Y) and Z = max(X, Y) if X and Y are independent
and each is uniformly distributed in [0, 1].
5.103. Find the joint cdf of W = min(X, Y) and Z = max(X, Y) if X and Y are independent
exponential random variables with the same mean.
5.104. Find the joint cdf of W = min(X, Y) and Z = max(X, Y) if X and Y are independent
Pareto random variables with the same distribution.
5.105. Let W = X + Y and Z = X − Y.
(a) Find an expression for the joint pdf of W and Z.
(b) Find f_W,Z(w, z) if X and Y are independent exponential random variables with
parameter λ = 1.
(c) Find f_W,Z(w, z) if X and Y are independent Pareto random variables with the same
distribution.
5.106. The pair (X, Y) is uniformly distributed in a ring centered about the origin with inner and
outer radii r1 < r2. Let R and Θ be the radius and angle corresponding to (X, Y). Find the
joint pdf of R and Θ.
5.107. Let X and Y be independent, zero-mean, unit-variance Gaussian random variables. Let
V = aX + bY and W = cX + eY.
(a) Find the joint pdf of V and W, assuming the transformation matrix A is invertible.
(b) Suppose A is not invertible. What is the joint pdf of V and W?
5.108. Let X and Y be independent Gaussian random variables that are zero mean and unit
variance. Let W = X² + Y² and let Θ = tan⁻¹(Y/X). Find the joint pdf of W and Θ.
5.109. Let X and Y be the random variables introduced in Example 5.4. Let R = (X² + Y²)^(1/2)
and let Θ = tan⁻¹(Y/X).
(a) Find the joint pdf of R and Θ.
(b) What is the joint pdf of X and Y?
Section 5.9: Pairs of Jointly Gaussian Variables
5.110. Let X and Y be jointly Gaussian random variables with pdf

f_X,Y(x, y) = exp{−2x² − y²/2} / (2πc)  for all x, y.

Find VAR[X], VAR[Y], and COV(X, Y).
5.111. Let X and Y be jointly Gaussian random variables with pdf

f_X,Y(x, y) = exp{−(1/2)[x² + 4y² − 3xy + 3y − 2x + 1]} / (2π)  for all x, y.

Find E[X], E[Y], VAR[X], VAR[Y], and COV(X, Y).
5.112. Let X and Y be jointly Gaussian random variables with E[Y] = 0, σ1 = 1, σ2 = 2, and
E[X | Y] = Y/4 + 1. Find the joint pdf of X and Y.
5.113. Let X and Y be zero-mean, independent Gaussian random variables with σ² = 1.
(a) Find the value of r for which the probability that (X, Y) falls inside a circle of
radius r is 1/2.
(b) Find the conditional pdf of (X, Y) given that (X, Y) is not inside a ring with inner
radius r1 and outer radius r2.
5.114. Use a plotting program (as provided by Octave or MATLAB) to show the pdf for jointly
Gaussian zero-mean random variables with the following parameters:
(a) σ1 = 1, σ2 = 1, ρ = 0.
(b) σ1 = 1, σ2 = 1, ρ = 0.8.
(c) σ1 = 1, σ2 = 1, ρ = −0.8.
(d) σ1 = 1, σ2 = 2, ρ = 0.
(e) σ1 = 1, σ2 = 2, ρ = 0.8.
(f) σ1 = 1, σ2 = 10, ρ = 0.8.
5.115. Let X and Y be zero-mean, jointly Gaussian random variables with σ1 = 1, σ2 = 2, and
correlation coefficient ρ.
(a) Plot the principal axes of the constant-pdf ellipse of (X, Y).
(b) Plot the conditional expectation of Y given X = x.
(c) Are the plots in parts a and b the same or different? Why?
5.116. Let X and Y be zero-mean, unit-variance jointly Gaussian random variables for which
ρ = 1. Sketch the joint cdf of X and Y. Does a joint pdf exist?
5.117. Let h(x, y) be a joint Gaussian pdf for zero-mean, unit-variance Gaussian random
variables with correlation coefficient ρ1. Let g(x, y) be a joint Gaussian pdf for zero-mean,
unit-variance Gaussian random variables with correlation coefficient ρ2 ≠ ρ1. Suppose
the random variables X and Y have joint pdf

f_X,Y(x, y) = [h(x, y) + g(x, y)]/2.

(a) Find the marginal pdf for X and for Y.
(b) Explain why X and Y are not jointly Gaussian random variables.
5.118. Use conditional expectation to show that for X and Y zero-mean, jointly Gaussian
random variables, E[X²Y²] = E[X²]E[Y²] + 2E[XY]².
5.119. Let X = (X, Y) be the zero-mean jointly Gaussian random variables in Problem 5.110.
Find a transformation A such that Z = AX has components that are zero-mean,
unit-variance Gaussian random variables.
5.120. In Example 5.47, suppose we estimate the value of the signal X from the noisy
observation Y by:

X̂ = Y / (1 + σN²/σX²).

(a) Evaluate the mean square estimation error E[(X − X̂)²].
(b) How does the estimation error in part a vary with the signal-to-noise ratio σX/σN?
Section 5.10: Generating Independent Gaussian Random Variables
5.121. Find the inverse of the cdf of the Rayleigh random variable to derive the transformation
method for generating Rayleigh random variables. Show that this method leads to the same
algorithm that was presented in Section 5.10.
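The transformation method of Problem 5.121 can be tried out directly. The sketch below (Python; not part of the text) assumes the unit-parameter Rayleigh cdf F(r) = 1 − e^(−r²/2), whose inverse gives R = √(−2 ln(1 − U)) for U uniform on [0, 1); the sample mean is compared against the theoretical mean √(π/2):

```python
import math
import random

random.seed(9)

def rayleigh():
    # inverse-cdf method for the unit-parameter Rayleigh: F(r) = 1 - exp(-r^2/2)
    u = random.random()
    return math.sqrt(-2.0 * math.log(1.0 - u))

samples = [rayleigh() for _ in range(200_000)]
mean_hat = sum(samples) / len(samples)
mean_exact = math.sqrt(math.pi / 2.0)   # theoretical mean, about 1.2533
print(mean_hat, mean_exact)
```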
5.122. Reproduce the results presented in Example 5.49.
5.123. Consider the two-dimensional modem in Problem 5.36.
(a) Generate 10,000 discrete random variables uniformly distributed in the set
{1, 2, 3, 4}. Assign each outcome in this set to one of the signals
{(1, 1), (1, −1), (−1, 1), (−1, −1)}. The sequence of discrete random variables
then produces a sequence of 10,000 signal points X.
(b) Generate 10,000 noise pairs N of independent zero-mean, unit-variance jointly
Gaussian random variables.
(c) Form the sequence of 10,000 received signals Y = (Y1, Y2) = X + N.
(d) Plot the scattergram of received signal vectors. Is the plot what you expected?
(e) Estimate the transmitted signal by the quadrant that Y falls in:
X̂ = (sgn(Y1), sgn(Y2)).
(f) Compare the estimates with the actually transmitted signals to estimate the
probability of error.
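One possible sketch of this simulation is shown below (Python rather than the Octave/MATLAB suggested elsewhere in the problems; the trial count and seed are illustrative, and the plotting step is omitted). With unit-variance noise per component, the per-axis success probability is Φ(1), so the error probability should be near 1 − Φ(1)² ≈ 0.292:

```python
import math
import random

random.seed(10)
signals = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
trials = 50_000
errors = 0
for _ in range(trials):
    x = random.choice(signals)                                  # transmitted point
    y = (x[0] + random.gauss(0.0, 1.0), x[1] + random.gauss(0.0, 1.0))
    x_hat = (1 if y[0] >= 0 else -1, 1 if y[1] >= 0 else -1)    # quadrant decision
    if x_hat != x:
        errors += 1
p_err = errors / trials
phi1 = 0.5 * (1.0 + math.erf(1.0 / math.sqrt(2.0)))   # Phi(1)
p_exact = 1.0 - phi1 ** 2
print(p_err, p_exact)
```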
5.124. Generate a sequence of 1000 pairs of independent zero-mean Gaussian random variables, where X has variance 2 and N has variance 1. Let Y = X + N be the noisy signal
from Example 5.47.
(a) Estimate X using the estimator in Problem 5.120, and calculate the sequence of estimation errors.
(b) What is the pdf of the estimation error?
(c) Compare the mean, variance, and relative frequencies of the estimation error with
the result from part b.
5.125. Let X1, X2, …, X1000 be a sequence of zero-mean, unit-variance independent Gaussian
random variables. Suppose that the sequence is “smoothed” as follows:
Yn = (Xn + Xn−1)/2, where X0 = 0.
(a) Find the pdf of (Yn, Yn+1).
(b) Generate the sequence Xn and the corresponding sequence Yn. Plot the scattergram
of (Yn, Yn+1). Does it agree with the result from part a?
(c) Repeat parts a and b for Zn = (Xn − Xn−1)/2.
5.126. Let X and Y be independent, zero-mean, unit-variance Gaussian random variables. Find
the linear transformation to generate jointly Gaussian random variables with means m1, m2,
variances σ1², σ2², and correlation coefficient ρ. Hint: Use the conditional pdf in Eq. (5.64).
5.127. (a) Use the method developed in Problem 5.126 to generate 1000 pairs of jointly
Gaussian random variables with m1 = 1, m2 = −1, variances σ1² = 1, σ2² = 2, and
correlation coefficient ρ = −1/2.
(b) Plot a two-dimensional scattergram of the 1000 pairs and compare to equal-pdf
contour lines for the theoretical pdf.
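One standard linear transformation that solves Problem 5.126 (a sketch, not necessarily the book's derivation; the conditional-pdf route in the hint leads to the same coefficients) is X = m1 + σ1 U and Y = m2 + σ2(ρU + √(1 − ρ²) V) for independent unit Gaussians U, V. The Python sketch below applies it with the parameters of Problem 5.127 and checks the empirical moments (the plotting step is omitted):

```python
import math
import random

random.seed(11)
m1, m2 = 1.0, -1.0
s1, s2 = 1.0, math.sqrt(2.0)
rho = -0.5
n = 100_000
xs, ys = [], []
for _ in range(n):
    u, v = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    xs.append(m1 + s1 * u)
    ys.append(m2 + s2 * (rho * u + math.sqrt(1.0 - rho * rho) * v))
mx = sum(xs) / n
my = sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
vx = sum((x - mx) ** 2 for x in xs) / n
vy = sum((y - my) ** 2 for y in ys) / n
rho_hat = cov / math.sqrt(vx * vy)
print(mx, my, rho_hat)   # should be near 1, -1, and -0.5
```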
5.128. Let H and W be the height and weight of adult males. Studies have shown that H (in cm)
and V = ln W (W in kg) are jointly Gaussian with parameters mH = 174 cm, mV = 4.4,
σH² = 42.36, σV² = 0.021, and COV(H, V) = 0.458.
(a) Use the method in Problem 5.126 to generate 1000 pairs (H, V). Plot a scattergram
to check the joint pdf.
(b) Convert the (H, V) pairs into (H, W) pairs.
(c) Calculate the body mass index for each outcome, and estimate the proportion of the
population that is underweight, normal, overweight, or obese. (See Problem 5.6.)
Problems Requiring Cumulative Knowledge
5.129. The random variables X and Y have joint pdf:

f_X,Y(x, y) = c sin(x + y)  for 0 ≤ x ≤ π/2, 0 ≤ y ≤ π/2.

(a) Find the value of the constant c.
(b) Find the joint cdf of X and Y.
(c) Find the marginal pdfs of X and of Y.
(d) Find the mean, variance, and covariance of X and Y.
5.130. An inspector selects an item for inspection according to the outcome of a coin flip with
probability of heads p: the item is inspected if the outcome is heads. Suppose that the
time between item arrivals is an exponential random variable with mean one. Assume
the time to inspect an item is a constant value t.
(a) Find the pmf for the number of item arrivals between consecutive inspections.
(b) Find the pdf for the time X between item inspections. Hint: Use conditional
expectation.
(c) Find the value of p so that, with probability 90%, an inspection is completed before
the next item is selected for inspection.
5.131. The lifetime X of a device is an exponential random variable with mean 1/R. Suppose
that, due to irregularities in the production process, the parameter R is random and has a
gamma distribution.
(a) Find the joint pdf of X and R.
(b) Find the pdf of X.
(c) Find the mean and variance of X.
5.132. Let X and Y be samples of a random signal at two time instants. Suppose that X and Y are
independent zero-mean Gaussian random variables with the same variance. When signal
“0” is present the variance is σ0², and when signal “1” is present the variance is σ1² > σ0².
Suppose signals 0 and 1 occur with probabilities p and 1 − p, respectively. Let
R² = X² + Y² be the total energy of the two observations.
(a) Find the pdf of R² when signal 0 is present; when signal 1 is present. Find the pdf
of R².
(b) Suppose we use the following “signal detection” rule: If R² > T, then we decide
signal 1 is present; otherwise, we decide signal 0 is present. Find an expression for
the probability of error in terms of T.
(c) Find the value of T that minimizes the probability of error.
5.133. Let U0, U1, … be a sequence of independent zero-mean, unit-variance Gaussian random
variables. A “low-pass filter” takes the sequence Ui and produces the output sequence
Xn = (Un + Un−1)/2, and a “high-pass filter” produces the output sequence
Yn = (Un − Un−1)/2.
(a) Find the joint pdf of Xn and Xn−1; of Xn and Xn+m, m > 1.
(b) Repeat part a for Yn.
(c) Find the joint pdf of Xn and Ym.
CHAPTER 6
Vector Random Variables
In the previous chapter we presented methods for dealing with two random variables.
In this chapter we extend these methods to the case of n random variables in the following ways:
• By representing n random variables as a vector, we obtain a compact notation for
the joint pmf, cdf, and pdf as well as marginal and conditional distributions.
• We present a general method for finding the pdf of transformations of vector random variables.
• Summary information of the distribution of a vector random variable is provided
by an expected value vector and a covariance matrix.
• We use linear transformations and characteristic functions to find alternative
representations of random vectors and their probabilities.
• We develop optimum estimators for estimating the value of a random variable
based on observations of other random variables.
• We show how jointly Gaussian random vectors have a compact and easy-to-work-with pdf and characteristic function.
6.1
VECTOR RANDOM VARIABLES
The notion of a random variable is easily generalized to the case where several quantities are of interest. A vector random variable X is a function that assigns a vector of
real numbers to each outcome z in S, the sample space of the random experiment. We
use uppercase boldface notation for vector random variables. By convention X is a column vector (n rows by 1 column), so the vector random variable with components
X1, X2, …, Xn corresponds to

X = [X1, X2, …, Xn]^T,
where “T” denotes the transpose of a matrix or vector. We will sometimes write
X = (X1, X2, …, Xn) to save space and omit the transpose unless dealing with matrices.
Possible values of the vector random variable are denoted by x = (x1, x2, …, xn), where
xi corresponds to the value of Xi.
Example 6.1
Arrivals at a Packet Switch
Packets arrive at each of three input ports of a packet switch according to independent Bernoulli
trials with p = 1/2. Each arriving packet is equally likely to be destined to any of three output
ports. Let X = 1X1 , X2 , X32 where Xi is the total number of packets arriving for output port i.
X is a vector random variable whose values are determined by the pattern of arrivals at the
input ports.
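Example 6.1 is easy to simulate. The sketch below (Python; not part of the text, and the single-time-slot reading of the example is an illustrative modeling choice) generates the vector X = (X1, X2, X3) and checks that each output port sees on average 3 · (1/2) · (1/3) = 1/2 packets per slot:

```python
import random

random.seed(12)

def arrivals():
    # one time slot: Bernoulli(1/2) arrival on each of 3 input ports;
    # each arriving packet is routed uniformly to one of 3 output ports
    x = [0, 0, 0]
    for _ in range(3):
        if random.random() < 0.5:
            x[random.randrange(3)] += 1
    return x

trials = 100_000
totals = [0, 0, 0]
for _ in range(trials):
    x = arrivals()
    for i in range(3):
        totals[i] += x[i]
means = [t / trials for t in totals]
print(means)   # each component should be near 0.5
```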
Example 6.2 Joint Poisson Counts
A random experiment consists of finding the number of defects in a semiconductor chip and
identifying their locations. The outcome of this experiment consists of the vector
z = (n, y1, y2, …, yn), where the first component specifies the total number of defects and
the remaining components specify the coordinates of their location. Suppose that the chip
consists of M regions. Let N1(z), N2(z), …, NM(z) be the number of defects in each of these
regions, that is, Nk(z) is the number of y’s that fall in region k. The vector
N(z) = (N1, N2, …, NM) is then a vector random variable.
Example 6.3 Samples of an Audio Signal
Let the outcome $\zeta$ of a random experiment be an audio signal $X(t)$. Let the random variable $X_k = X(kT)$ be the sample of the signal taken at time $kT$. An MP3 codec processes the audio in blocks of $n$ samples $\mathbf{X} = (X_1, X_2, \dots, X_n)$. $\mathbf{X}$ is a vector random variable.
6.1.1 Events and Probabilities
Each event $A$ involving $\mathbf{X} = (X_1, X_2, \dots, X_n)$ has a corresponding region in the $n$-dimensional real space $R^n$. As before, we use "rectangular" product-form sets in $R^n$ as building blocks. For the $n$-dimensional random variable $\mathbf{X} = (X_1, X_2, \dots, X_n)$, we are interested in events that have the product form

$$A = \{X_1 \text{ in } A_1\} \cap \{X_2 \text{ in } A_2\} \cap \dots \cap \{X_n \text{ in } A_n\}, \qquad (6.1)$$

where each $A_k$ is a one-dimensional event (i.e., subset of the real line) that involves $X_k$ only. The event $A$ occurs when all of the events $\{X_k \text{ in } A_k\}$ occur jointly.
We are interested in obtaining the probabilities of these product-form events:

$$P[A] = P[\mathbf{X} \in A] = P[\{X_1 \text{ in } A_1\} \cap \{X_2 \text{ in } A_2\} \cap \dots \cap \{X_n \text{ in } A_n\}] \triangleq P[X_1 \text{ in } A_1, X_2 \text{ in } A_2, \dots, X_n \text{ in } A_n]. \qquad (6.2)$$

In principle, the probability in Eq. (6.2) is obtained by finding the probability of the equivalent event in the underlying sample space, that is,

$$P[A] = P[\{\zeta \text{ in } S : \mathbf{X}(\zeta) \text{ in } A\}] = P[\{\zeta \text{ in } S : X_1(\zeta) \in A_1, X_2(\zeta) \in A_2, \dots, X_n(\zeta) \in A_n\}]. \qquad (6.3)$$

Equation (6.2) forms the basis for the definition of the $n$-dimensional joint probability mass function, cumulative distribution function, and probability density function. The probabilities of other events can be expressed in terms of these three functions.
6.1.2 Joint Distribution Functions
The joint cumulative distribution function of $X_1, X_2, \dots, X_n$ is defined as the probability of an $n$-dimensional semi-infinite rectangle associated with the point $(x_1, \dots, x_n)$:

$$F_{\mathbf{X}}(\mathbf{x}) \triangleq F_{X_1, X_2, \dots, X_n}(x_1, x_2, \dots, x_n) = P[X_1 \le x_1, X_2 \le x_2, \dots, X_n \le x_n]. \qquad (6.4)$$

The joint cdf is defined for discrete, continuous, and random variables of mixed type. The probability of product-form events can be expressed in terms of the joint cdf.
The joint cdf generates a family of marginal cdf's for subcollections of the random variables $X_1, \dots, X_n$. These marginal cdf's are obtained by setting the appropriate entries to $+\infty$ in the joint cdf in Eq. (6.4). For example:

The joint cdf for $X_1, \dots, X_{n-1}$ is given by $F_{X_1, X_2, \dots, X_n}(x_1, x_2, \dots, x_{n-1}, \infty)$, and
the joint cdf for $X_1$ and $X_2$ is given by $F_{X_1, X_2, \dots, X_n}(x_1, x_2, \infty, \dots, \infty)$.
Example 6.4
A radio transmitter sends a signal to a receiver using three paths. Let $X_1$, $X_2$, and $X_3$ be the signals that arrive at the receiver along each path. Find $P[\max(X_1, X_2, X_3) \le 5]$.
The maximum of three numbers is less than or equal to 5 if and only if each of the three numbers is less than or equal to 5; therefore

$$P[A] = P[\{X_1 \le 5\} \cap \{X_2 \le 5\} \cap \{X_3 \le 5\}] = F_{X_1, X_2, X_3}(5, 5, 5).$$
The joint probability mass function of $n$ discrete random variables is defined by

$$p_{\mathbf{X}}(\mathbf{x}) \triangleq p_{X_1, X_2, \dots, X_n}(x_1, x_2, \dots, x_n) = P[X_1 = x_1, X_2 = x_2, \dots, X_n = x_n]. \qquad (6.5)$$

The probability of any $n$-dimensional event $A$ is found by summing the pmf over the points in the event:

$$P[\mathbf{X} \text{ in } A] = \sum_{\mathbf{x} \text{ in } A} \cdots \sum p_{X_1, X_2, \dots, X_n}(x_1, x_2, \dots, x_n). \qquad (6.6)$$

The joint pmf generates a family of marginal pmf's that specifies the joint probabilities for subcollections of the $n$ random variables. For example, the one-dimensional pmf of $X_j$ is found by adding the joint pmf over all variables other than $x_j$:

$$p_{X_j}(x_j) = P[X_j = x_j] = \sum_{x_1} \cdots \sum_{x_{j-1}} \sum_{x_{j+1}} \cdots \sum_{x_n} p_{X_1, X_2, \dots, X_n}(x_1, x_2, \dots, x_n). \qquad (6.7)$$

The two-dimensional joint pmf of any pair $X_j$ and $X_k$ is found by adding the joint pmf over the $n - 2$ other variables, and so on. Thus, the marginal pmf for $X_1, \dots, X_{n-1}$ is given by

$$p_{X_1, \dots, X_{n-1}}(x_1, x_2, \dots, x_{n-1}) = \sum_{x_n} p_{X_1, \dots, X_n}(x_1, x_2, \dots, x_n). \qquad (6.8)$$

A family of conditional pmf's is obtained from the joint pmf by conditioning on different subcollections of the random variables. For example, if $p_{X_1, \dots, X_{n-1}}(x_1, \dots, x_{n-1}) > 0$:

$$p_{X_n}(x_n \mid x_1, \dots, x_{n-1}) = \frac{p_{X_1, \dots, X_n}(x_1, \dots, x_n)}{p_{X_1, \dots, X_{n-1}}(x_1, \dots, x_{n-1})}. \qquad (6.9a)$$

Repeated applications of Eq. (6.9a) yield the following very useful expression:

$$p_{X_1, \dots, X_n}(x_1, \dots, x_n) = p_{X_n}(x_n \mid x_1, \dots, x_{n-1})\, p_{X_{n-1}}(x_{n-1} \mid x_1, \dots, x_{n-2}) \cdots p_{X_2}(x_2 \mid x_1)\, p_{X_1}(x_1). \qquad (6.9b)$$
Example 6.5 Arrivals at a Packet Switch
Find the joint pmf of $\mathbf{X} = (X_1, X_2, X_3)$ in Example 6.1. Find $P[X_1 > X_3]$.
Let $N$ be the total number of packets arriving at the three input ports. Each input port has an arrival with probability $p = 1/2$, so $N$ is binomial with pmf:

$$p_N(n) = \binom{3}{n} \frac{1}{2^3} \quad \text{for } 0 \le n \le 3.$$

Given $N = n$, the numbers of packets destined for the three output ports have a multinomial distribution:

$$p_{X_1, X_2, X_3}(i, j, k \mid N = n) = \begin{cases} \dfrac{n!}{i!\,j!\,k!} \dfrac{1}{3^n} & \text{for } i + j + k = n,\; i \ge 0,\; j \ge 0,\; k \ge 0 \\ 0 & \text{otherwise.} \end{cases}$$

The joint pmf of $\mathbf{X}$ is then:

$$p_{\mathbf{X}}(i, j, k) = \frac{n!}{i!\,j!\,k!} \frac{1}{3^n} \binom{3}{n} \frac{1}{2^3} \quad \text{for } i \ge 0,\; j \ge 0,\; k \ge 0,\; i + j + k = n \le 3.$$

The explicit values of the joint pmf are:

$$p_{\mathbf{X}}(0, 0, 0) = \frac{0!}{0!\,0!\,0!}\frac{1}{3^0} \binom{3}{0} \frac{1}{2^3} = \frac{1}{8}$$
$$p_{\mathbf{X}}(1, 0, 0) = p_{\mathbf{X}}(0, 1, 0) = p_{\mathbf{X}}(0, 0, 1) = \frac{1!}{0!\,0!\,1!}\frac{1}{3^1} \binom{3}{1} \frac{1}{2^3} = \frac{3}{24}$$
$$p_{\mathbf{X}}(1, 1, 0) = p_{\mathbf{X}}(1, 0, 1) = p_{\mathbf{X}}(0, 1, 1) = \frac{2!}{0!\,1!\,1!}\frac{1}{3^2} \binom{3}{2} \frac{1}{2^3} = \frac{6}{72}$$
$$p_{\mathbf{X}}(2, 0, 0) = p_{\mathbf{X}}(0, 2, 0) = p_{\mathbf{X}}(0, 0, 2) = 3/72$$
$$p_{\mathbf{X}}(1, 1, 1) = 6/216$$
$$p_{\mathbf{X}}(0, 1, 2) = p_{\mathbf{X}}(0, 2, 1) = p_{\mathbf{X}}(1, 0, 2) = p_{\mathbf{X}}(1, 2, 0) = p_{\mathbf{X}}(2, 0, 1) = p_{\mathbf{X}}(2, 1, 0) = 3/216$$
$$p_{\mathbf{X}}(3, 0, 0) = p_{\mathbf{X}}(0, 3, 0) = p_{\mathbf{X}}(0, 0, 3) = 1/216.$$

Finally:

$$P[X_1 > X_3] = p_{\mathbf{X}}(1, 0, 0) + p_{\mathbf{X}}(1, 1, 0) + p_{\mathbf{X}}(2, 0, 0) + p_{\mathbf{X}}(1, 2, 0) + p_{\mathbf{X}}(2, 0, 1) + p_{\mathbf{X}}(2, 1, 0) + p_{\mathbf{X}}(3, 0, 0) = 8/27.$$
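The values above are easy to confirm by brute force. The following sketch (an illustration, not part of the text) enumerates every arrival pattern at the three input ports and every destination assignment, accumulating the exact joint pmf with rational arithmetic:

```python
# Exhaustive check of Example 6.5: each input port sees an arrival with
# probability 1/2, and each arriving packet picks one of 3 output ports
# uniformly. Accumulate the exact joint pmf of (X1, X2, X3).
from fractions import Fraction
from itertools import product

pmf = {}
for arrivals in product([0, 1], repeat=3):       # which input ports see a packet
    n = sum(arrivals)
    p_arrival = Fraction(1, 2) ** 3              # each arrival pattern has prob (1/2)^3
    for dests in product([0, 1, 2], repeat=n):   # destination of each arriving packet
        counts = tuple(dests.count(port) for port in range(3))
        p = p_arrival * Fraction(1, 3) ** n      # each assignment has prob (1/3)^n
        pmf[counts] = pmf.get(counts, Fraction(0)) + p

p_000 = pmf[(0, 0, 0)]
p_x1_gt_x3 = sum(p for (i, j, k), p in pmf.items() if i > k)
```

The enumeration reproduces $p_{\mathbf{X}}(0,0,0) = 1/8$ and $P[X_1 > X_3] = 8/27$ exactly.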
We say that the random variables $X_1, X_2, \dots, X_n$ are jointly continuous random variables if the probability of any $n$-dimensional event $A$ is given by an $n$-dimensional integral of a probability density function:

$$P[\mathbf{X} \text{ in } A] = \int_{\mathbf{x} \text{ in } A} \cdots \int f_{X_1, \dots, X_n}(x_1', \dots, x_n')\, dx_1' \cdots dx_n', \qquad (6.10)$$

where $f_{X_1, \dots, X_n}(x_1, \dots, x_n)$ is the joint probability density function.
The joint cdf of $\mathbf{X}$ is obtained from the joint pdf by integration:

$$F_{\mathbf{X}}(\mathbf{x}) = F_{X_1, X_2, \dots, X_n}(x_1, x_2, \dots, x_n) = \int_{-\infty}^{x_1} \cdots \int_{-\infty}^{x_n} f_{X_1, \dots, X_n}(x_1', \dots, x_n')\, dx_1' \cdots dx_n'. \qquad (6.11)$$

The joint pdf (if the derivative exists) is given by

$$f_{\mathbf{X}}(\mathbf{x}) \triangleq f_{X_1, X_2, \dots, X_n}(x_1, x_2, \dots, x_n) = \frac{\partial^n}{\partial x_1 \cdots \partial x_n} F_{X_1, \dots, X_n}(x_1, \dots, x_n). \qquad (6.12)$$

A family of marginal pdf's is associated with the joint pdf in Eq. (6.12). The marginal pdf for a subset of the random variables is obtained by integrating the other variables out. For example, the marginal pdf of $X_1$ is

$$f_{X_1}(x_1) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{X_1, X_2, \dots, X_n}(x_1, x_2', \dots, x_n')\, dx_2' \cdots dx_n'. \qquad (6.13)$$

As another example, the marginal pdf for $X_1, \dots, X_{n-1}$ is given by

$$f_{X_1, \dots, X_{n-1}}(x_1, \dots, x_{n-1}) = \int_{-\infty}^{\infty} f_{X_1, \dots, X_n}(x_1, \dots, x_{n-1}, x_n')\, dx_n'. \qquad (6.14)$$

A family of conditional pdf's is also associated with the joint pdf. For example, the pdf of $X_n$ given the values of $X_1, \dots, X_{n-1}$ is given by

$$f_{X_n}(x_n \mid x_1, \dots, x_{n-1}) = \frac{f_{X_1, \dots, X_n}(x_1, \dots, x_n)}{f_{X_1, \dots, X_{n-1}}(x_1, \dots, x_{n-1})} \qquad (6.15a)$$

if $f_{X_1, \dots, X_{n-1}}(x_1, \dots, x_{n-1}) > 0$.
Repeated applications of Eq. (6.15a) yield an expression analogous to Eq. (6.9b):

$$f_{X_1, \dots, X_n}(x_1, \dots, x_n) = f_{X_n}(x_n \mid x_1, \dots, x_{n-1})\, f_{X_{n-1}}(x_{n-1} \mid x_1, \dots, x_{n-2}) \cdots f_{X_2}(x_2 \mid x_1)\, f_{X_1}(x_1). \qquad (6.15b)$$
Example 6.6
The random variables $X_1$, $X_2$, and $X_3$ have the joint Gaussian pdf

$$f_{X_1, X_2, X_3}(x_1, x_2, x_3) = \frac{e^{-(x_1^2 + x_2^2 - \sqrt{2}\,x_1 x_2 + \frac{1}{2}x_3^2)}}{2\pi\sqrt{\pi}}.$$

Find the marginal pdf of $X_1$ and $X_3$. Find the conditional pdf of $X_2$ given $X_1$ and $X_3$.
The marginal pdf for the pair $X_1$ and $X_3$ is found by integrating the joint pdf over $x_2$:

$$f_{X_1, X_3}(x_1, x_3) = \frac{e^{-x_3^2/2}}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \frac{e^{-(x_1^2 + x_2^2 - \sqrt{2}\,x_1 x_2)}}{2\pi/\sqrt{2}}\, dx_2.$$

The above integral was carried out in Example 5.18 with $\rho = 1/\sqrt{2}$. By substituting the result of the integration above, we obtain

$$f_{X_1, X_3}(x_1, x_3) = \frac{e^{-x_1^2/2}}{\sqrt{2\pi}} \frac{e^{-x_3^2/2}}{\sqrt{2\pi}}.$$

Therefore $X_1$ and $X_3$ are independent zero-mean, unit-variance Gaussian random variables.
The conditional pdf of $X_2$ given $X_1$ and $X_3$ is:

$$f_{X_2}(x_2 \mid x_1, x_3) = \frac{e^{-(x_1^2 + x_2^2 - \sqrt{2}\,x_1 x_2 + \frac{1}{2}x_3^2)}}{2\pi\sqrt{\pi}} \cdot \frac{\sqrt{2\pi}\,\sqrt{2\pi}}{e^{-x_3^2/2}\, e^{-x_1^2/2}} = \frac{e^{-(\frac{1}{2}x_1^2 + x_2^2 - \sqrt{2}\,x_1 x_2)}}{\sqrt{\pi}} = \frac{e^{-(x_2 - x_1/\sqrt{2})^2}}{\sqrt{\pi}}.$$

We conclude that $X_2$ given $X_1$ and $X_3$ is a Gaussian random variable with mean $x_1/\sqrt{2}$ and variance $1/2$.
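The marginal computation can be confirmed numerically. The sketch below (an illustration added here, not from the text) integrates the joint pdf over $x_2$ with a simple midpoint rule and compares the result against the product of two standard normal pdfs:

```python
# Numerical check of Example 6.6: integrating the joint pdf over x2 at a
# fixed (x1, x3) should give phi(x1) * phi(x3), where phi is the standard
# normal pdf. The evaluation point (0.5, -0.3) is an arbitrary choice.
import math

def joint_pdf(x1, x2, x3):
    return math.exp(-(x1**2 + x2**2 - math.sqrt(2) * x1 * x2 + 0.5 * x3**2)) \
        / (2 * math.pi * math.sqrt(math.pi))

def std_normal_pdf(x):
    return math.exp(-x**2 / 2) / math.sqrt(2 * math.pi)

def marginal_x1_x3(x1, x3, lo=-10.0, hi=10.0, n=4000):
    h = (hi - lo) / n                      # midpoint rule over x2
    return sum(joint_pdf(x1, lo + (i + 0.5) * h, x3) for i in range(n)) * h

x1, x3 = 0.5, -0.3
approx = marginal_x1_x3(x1, x3)
exact = std_normal_pdf(x1) * std_normal_pdf(x3)
```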
Example 6.7 Multiplicative Sequence
Let $X_1$ be uniform in $[0, 1]$, $X_2$ be uniform in $[0, X_1]$, and $X_3$ be uniform in $[0, X_2]$. (Note that $X_3$ then has the same distribution as the product of three independent uniform random variables.) Find the joint pdf of $\mathbf{X}$ and the marginal pdf of $X_3$.
For $0 < z < y < x < 1$, the joint pdf is nonzero and given by

$$f_{X_1, X_2, X_3}(x, y, z) = f_{X_3}(z \mid x, y)\, f_{X_2}(y \mid x)\, f_{X_1}(x) = \frac{1}{y} \cdot \frac{1}{x} \cdot 1 = \frac{1}{xy}.$$

The joint pdf of $X_2$ and $X_3$ is nonzero for $0 < z < y < 1$ and is obtained by integrating $x$ from $y$ to 1:

$$f_{X_2, X_3}(y, z) = \int_y^1 \frac{1}{xy}\, dx = \frac{1}{y} \ln x \Big|_y^1 = \frac{1}{y} \ln \frac{1}{y}.$$

We obtain the pdf of $X_3$ by integrating $y$ from $z$ to 1:

$$f_{X_3}(z) = -\int_z^1 \frac{1}{y} \ln y\, dy = -\frac{1}{2} (\ln y)^2 \Big|_z^1 = \frac{1}{2} (\ln z)^2 \quad \text{for } 0 < z < 1.$$

Note that the pdf of $X_3$ is concentrated at values close to $z = 0$.
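Since $X_3$ has the same distribution as the product of three independent uniforms, the derived pdf implies the cdf $F_{X_3}(z) = \tfrac{1}{2}z((\ln z)^2 - 2\ln z + 2)$, which a quick Monte Carlo run can check. This sketch is illustrative (the sample size and evaluation point are arbitrary choices):

```python
# Monte Carlo check of Example 6.7: simulate the product of three independent
# Uniform(0,1) variables and compare the empirical cdf at z0 = 0.1 with the
# antiderivative of f(z) = (ln z)^2 / 2.
import math
import random

def cdf_x3(z):
    return 0.5 * z * (math.log(z) ** 2 - 2 * math.log(z) + 2)

random.seed(1)
n = 100_000
z0 = 0.1
hits = sum(random.random() * random.random() * random.random() <= z0 for _ in range(n))
empirical = hits / n
theoretical = cdf_x3(z0)   # = 0.05 * ((ln 0.1)^2 - 2 ln 0.1 + 2), about 0.595
```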
6.1.3 Independence
The collection of random variables $X_1, \dots, X_n$ is independent if

$$P[X_1 \text{ in } A_1, X_2 \text{ in } A_2, \dots, X_n \text{ in } A_n] = P[X_1 \text{ in } A_1]\, P[X_2 \text{ in } A_2] \cdots P[X_n \text{ in } A_n]$$

for any one-dimensional events $A_1, \dots, A_n$. It can be shown that $X_1, \dots, X_n$ are independent if and only if

$$F_{X_1, \dots, X_n}(x_1, \dots, x_n) = F_{X_1}(x_1) \cdots F_{X_n}(x_n) \qquad (6.16)$$

for all $x_1, \dots, x_n$. If the random variables are discrete, Eq. (6.16) is equivalent to

$$p_{X_1, \dots, X_n}(x_1, \dots, x_n) = p_{X_1}(x_1) \cdots p_{X_n}(x_n) \quad \text{for all } x_1, \dots, x_n.$$

If the random variables are jointly continuous, Eq. (6.16) is equivalent to

$$f_{X_1, \dots, X_n}(x_1, \dots, x_n) = f_{X_1}(x_1) \cdots f_{X_n}(x_n) \quad \text{for all } x_1, \dots, x_n.$$

Example 6.8
The $n$ samples $X_1, X_2, \dots, X_n$ of a noise signal have joint pdf given by

$$f_{X_1, \dots, X_n}(x_1, \dots, x_n) = \frac{e^{-(x_1^2 + \dots + x_n^2)/2}}{(2\pi)^{n/2}} \quad \text{for all } x_1, \dots, x_n.$$

It is clear that the above is the product of $n$ one-dimensional Gaussian pdf's. Thus $X_1, \dots, X_n$ are independent Gaussian random variables.
6.2 FUNCTIONS OF SEVERAL RANDOM VARIABLES
Functions of vector random variables arise naturally in random experiments. For example, $\mathbf{X} = (X_1, X_2, \dots, X_n)$ may correspond to observations from $n$ repetitions of an experiment that generates a given random variable; we are almost always interested in the sample mean and the sample variance of the observations. In another example, $\mathbf{X} = (X_1, X_2, \dots, X_n)$ may correspond to samples of a speech waveform, and we may be interested in extracting features that are defined as functions of $\mathbf{X}$ for use in a speech recognition system.
6.2.1 One Function of Several Random Variables
Let the random variable $Z$ be defined as a function of several random variables:

$$Z = g(X_1, X_2, \dots, X_n). \qquad (6.17)$$

The cdf of $Z$ is found by first finding the equivalent event of $\{Z \le z\}$, that is, the set $R_z = \{\mathbf{x} : g(\mathbf{x}) \le z\}$; then

$$F_Z(z) = P[\mathbf{X} \text{ in } R_z] = \int_{\mathbf{x} \text{ in } R_z} \cdots \int f_{X_1, \dots, X_n}(x_1', \dots, x_n')\, dx_1' \cdots dx_n'. \qquad (6.18)$$

The pdf of $Z$ is then found by taking the derivative of $F_Z(z)$.
Example 6.9 Maximum and Minimum of n Random Variables
Let $W = \max(X_1, X_2, \dots, X_n)$ and $Z = \min(X_1, X_2, \dots, X_n)$, where the $X_i$ are independent random variables with the same distribution. Find $F_W(w)$ and $F_Z(z)$.
The maximum of $X_1, X_2, \dots, X_n$ is less than or equal to $w$ if and only if each $X_i$ is less than or equal to $w$, so:

$$F_W(w) = P[\max(X_1, X_2, \dots, X_n) \le w] = P[X_1 \le w]\, P[X_2 \le w] \cdots P[X_n \le w] = (F_X(w))^n.$$

The minimum of $X_1, X_2, \dots, X_n$ is greater than $z$ if and only if each $X_i$ is greater than $z$, so:

$$1 - F_Z(z) = P[\min(X_1, X_2, \dots, X_n) > z] = P[X_1 > z]\, P[X_2 > z] \cdots P[X_n > z] = (1 - F_X(z))^n$$

and

$$F_Z(z) = 1 - (1 - F_X(z))^n.$$
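These closed forms are easy to check by simulation for a concrete distribution. The following sketch (illustrative; $n = 5$ and the uniform distribution are arbitrary choices, not from the text) compares empirical frequencies against $F_W(w) = w^n$ and $F_Z(z) = 1 - (1 - z)^n$:

```python
# Simulation of Example 6.9 for n = 5 independent Uniform(0,1) variables:
# F_W(w) = w^5 for the maximum and F_Z(z) = 1 - (1 - z)^5 for the minimum.
import random

random.seed(2)
n, trials, w0, z0 = 5, 100_000, 0.7, 0.1
count_max = count_min = 0
for _ in range(trials):
    xs = [random.random() for _ in range(n)]
    count_max += max(xs) <= w0
    count_min += min(xs) <= z0
p_max = count_max / trials     # compare with 0.7**5
p_min = count_min / trials     # compare with 1 - 0.9**5
```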
Example 6.10 Merging of Independent Poisson Arrivals
Web page requests arrive at a server from $n$ independent sources. Source $j$ generates requests with exponentially distributed interarrival times with rate $\lambda_j$. Find the distribution of the interarrival times between consecutive requests at the server.
Let the interarrival times for the different sources be given by $X_1, X_2, \dots, X_n$. Each $X_j$ satisfies the memoryless property, so the time that has elapsed since the last arrival from each source is irrelevant. The time until the next arrival at the server is then

$$Z = \min(X_1, X_2, \dots, X_n).$$

Therefore the cdf of $Z$ satisfies:

$$1 - F_Z(z) = P[\min(X_1, X_2, \dots, X_n) > z] = P[X_1 > z]\, P[X_2 > z] \cdots P[X_n > z] = (1 - F_{X_1}(z))(1 - F_{X_2}(z)) \cdots (1 - F_{X_n}(z)) = e^{-\lambda_1 z}\, e^{-\lambda_2 z} \cdots e^{-\lambda_n z} = e^{-(\lambda_1 + \lambda_2 + \dots + \lambda_n)z}.$$

The interarrival time is an exponential random variable with rate $\lambda_1 + \lambda_2 + \dots + \lambda_n$.
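A short simulation makes the rate-addition property concrete. The rates below are arbitrary illustrative choices:

```python
# Simulation sketch for Example 6.10: the minimum of independent exponential
# interarrival times with rates 1.0, 2.0, and 3.0 should itself be
# exponential with rate 6.0, i.e. mean 1/6.
import random

random.seed(3)
rates = [1.0, 2.0, 3.0]
trials = 200_000
total = 0.0
for _ in range(trials):
    total += min(random.expovariate(lam) for lam in rates)
mean_interarrival = total / trials   # compare with 1/6
```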
Example 6.11 Reliability of Redundant Systems
A computing cluster has $n$ independent redundant subsystems. Each subsystem has an exponentially distributed lifetime with parameter $\lambda$. The cluster will operate as long as at least one subsystem is functioning. Find the cdf of the time until the system fails.
Let the lifetimes of the subsystems be given by $X_1, X_2, \dots, X_n$. The time until the last subsystem fails is

$$W = \max(X_1, X_2, \dots, X_n).$$

Therefore the cdf of $W$ is:

$$F_W(w) = (F_X(w))^n = (1 - e^{-\lambda w})^n = 1 - \binom{n}{1} e^{-\lambda w} + \binom{n}{2} e^{-2\lambda w} - \cdots.$$
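The binomial expansion of the cdf can be verified deterministically. The parameter values below ($\lambda = 0.5$, $n = 4$) are illustrative choices, not from the text:

```python
# Deterministic check of the expansion in Example 6.11: (1 - e^{-lam*w})^n
# equals the alternating binomial sum  sum_k (-1)^k C(n, k) e^{-k*lam*w}.
import math

def cdf_direct(w, lam, n):
    return (1 - math.exp(-lam * w)) ** n

def cdf_expanded(w, lam, n):
    return sum((-1) ** k * math.comb(n, k) * math.exp(-k * lam * w)
               for k in range(n + 1))

vals = [(cdf_direct(w, 0.5, 4), cdf_expanded(w, 0.5, 4)) for w in (0.1, 1.0, 5.0)]
```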
6.2.2 Transformations of Random Vectors
Let $X_1, \dots, X_n$ be random variables in some experiment, and let the random variables $Z_1, \dots, Z_n$ be defined by a transformation that consists of $n$ functions of $\mathbf{X} = (X_1, \dots, X_n)$:

$$Z_1 = g_1(\mathbf{X}), \quad Z_2 = g_2(\mathbf{X}), \quad \dots, \quad Z_n = g_n(\mathbf{X}).$$

The joint cdf of $\mathbf{Z} = (Z_1, \dots, Z_n)$ at the point $\mathbf{z} = (z_1, \dots, z_n)$ is equal to the probability of the region of $\mathbf{x}$ where $g_k(\mathbf{x}) \le z_k$ for $k = 1, \dots, n$:

$$F_{Z_1, \dots, Z_n}(z_1, \dots, z_n) = P[g_1(\mathbf{X}) \le z_1, \dots, g_n(\mathbf{X}) \le z_n]. \qquad (6.19a)$$

If $X_1, \dots, X_n$ have a joint pdf, then

$$F_{Z_1, \dots, Z_n}(z_1, \dots, z_n) = \int \cdots \int_{\mathbf{x}' : g_k(\mathbf{x}') \le z_k} f_{X_1, \dots, X_n}(x_1', \dots, x_n')\, dx_1' \cdots dx_n'. \qquad (6.19b)$$

Example 6.12
Given a random vector $\mathbf{X}$, find the joint pdf of the following transformation:

$$Z_1 = g_1(X_1) = a_1 X_1 + b_1, \quad Z_2 = g_2(X_2) = a_2 X_2 + b_2, \quad \dots, \quad Z_n = g_n(X_n) = a_n X_n + b_n.$$
Note that $Z_k = a_k X_k + b_k \le z_k$ if and only if $X_k \le (z_k - b_k)/a_k$ when $a_k > 0$, so

$$F_{Z_1, Z_2, \dots, Z_n}(z_1, z_2, \dots, z_n) = P\!\left[X_1 \le \frac{z_1 - b_1}{a_1}, X_2 \le \frac{z_2 - b_2}{a_2}, \dots, X_n \le \frac{z_n - b_n}{a_n}\right] = F_{X_1, X_2, \dots, X_n}\!\left(\frac{z_1 - b_1}{a_1}, \frac{z_2 - b_2}{a_2}, \dots, \frac{z_n - b_n}{a_n}\right)$$

and

$$f_{Z_1, Z_2, \dots, Z_n}(z_1, z_2, \dots, z_n) = \frac{\partial^n}{\partial z_1 \cdots \partial z_n} F_{Z_1, Z_2, \dots, Z_n}(z_1, z_2, \dots, z_n) = \frac{1}{a_1 \cdots a_n}\, f_{X_1, X_2, \dots, X_n}\!\left(\frac{z_1 - b_1}{a_1}, \frac{z_2 - b_2}{a_2}, \dots, \frac{z_n - b_n}{a_n}\right).$$
*6.2.3 pdf of General Transformations
We now introduce a general method for finding the pdf of a transformation of $n$ jointly continuous random variables. We first develop the two-dimensional case. Let the random variables $V$ and $W$ be defined by two functions of $X$ and $Y$:

$$V = g_1(X, Y) \quad \text{and} \quad W = g_2(X, Y). \qquad (6.20)$$

Assume that the functions $v(x, y)$ and $w(x, y)$ are invertible in the sense that the equations $v = g_1(x, y)$ and $w = g_2(x, y)$ can be solved for $x$ and $y$, that is,

$$x = h_1(v, w) \quad \text{and} \quad y = h_2(v, w).$$

The joint pdf of $V$ and $W$ is found by finding the equivalent event of an infinitesimal rectangle. The image of the infinitesimal rectangle is shown in Fig. 6.1(a). The image can be approximated by the parallelogram shown in Fig. 6.1(b) by making the approximation

$$g_k(x + dx, y) \approx g_k(x, y) + \frac{\partial}{\partial x} g_k(x, y)\, dx, \qquad k = 1, 2,$$

and similarly for the $y$ variable. The probabilities of the infinitesimal rectangle and the parallelogram are approximately equal, therefore

$$f_{X,Y}(x, y)\, dx\, dy = f_{V,W}(v, w)\, dP$$

and

$$f_{V,W}(v, w) = \frac{f_{X,Y}(h_1(v, w), h_2(v, w))}{\left|\dfrac{dP}{dx\, dy}\right|}, \qquad (6.21)$$

where $dP$ is the area of the parallelogram. By analogy with the case of a linear transformation (see Eq. 5.59), we can match the derivatives in the above approximations with the coefficients in the linear transformation and conclude that the
[Figure 6.1: (a) Image of an infinitesimal rectangle under a general transformation $v = g_1(x, y)$, $w = g_2(x, y)$. (b) Approximation of the image by a parallelogram.]
"stretch factor" at the point $(v, w)$ is given by the determinant of a matrix of partial derivatives:

$$J(x, y) = \det \begin{bmatrix} \dfrac{\partial v}{\partial x} & \dfrac{\partial v}{\partial y} \\[1ex] \dfrac{\partial w}{\partial x} & \dfrac{\partial w}{\partial y} \end{bmatrix}.$$

The determinant $J(x, y)$ is called the Jacobian of the transformation. The Jacobian of the inverse transformation is given by

$$J(v, w) = \det \begin{bmatrix} \dfrac{\partial x}{\partial v} & \dfrac{\partial x}{\partial w} \\[1ex] \dfrac{\partial y}{\partial v} & \dfrac{\partial y}{\partial w} \end{bmatrix}.$$

It can be shown that

$$|J(v, w)| = \frac{1}{|J(x, y)|}.$$

We therefore conclude that the joint pdf of $V$ and $W$ can be found using either of the following expressions:

$$f_{V,W}(v, w) = \frac{f_{X,Y}(h_1(v, w), h_2(v, w))}{|J(x, y)|} \qquad (6.22a)$$
$$\phantom{f_{V,W}(v, w)} = f_{X,Y}(h_1(v, w), h_2(v, w))\, |J(v, w)|. \qquad (6.22b)$$

It should be noted that Eq. (6.21) is applicable even if Eq. (6.20) has more than one solution; the pdf is then equal to the sum of terms of the form given by Eqs. (6.22a) and (6.22b), with each solution providing one such term.
Example 6.13
Server 1 receives $m$ Web page requests and server 2 receives $k$ Web page requests. Web page transmission times are exponential random variables with mean $1/\mu$. Let $X$ be the total time to transmit files from server 1 and let $Y$ be the total time for server 2. Find the joint pdf of $T$, the total transmission time, and $W$, the proportion of the total transmission time contributed by server 1:

$$T = X + Y \quad \text{and} \quad W = \frac{X}{X + Y}.$$

From Chapter 4, the sum of $j$ independent exponential random variables is an Erlang random variable with parameters $j$ and $\mu$. Therefore $X$ and $Y$ are independent Erlang random variables with parameters $m$ and $\mu$, and $k$ and $\mu$, respectively:

$$f_X(x) = \frac{\mu e^{-\mu x} (\mu x)^{m-1}}{(m - 1)!} \quad \text{and} \quad f_Y(y) = \frac{\mu e^{-\mu y} (\mu y)^{k-1}}{(k - 1)!}.$$

We solve for $X$ and $Y$ in terms of $T$ and $W$:

$$X = TW \quad \text{and} \quad Y = T(1 - W).$$

The Jacobian of the transformation is:

$$J(x, y) = \det \begin{bmatrix} 1 & 1 \\[0.5ex] \dfrac{y}{(x + y)^2} & \dfrac{-x}{(x + y)^2} \end{bmatrix} = \frac{-x}{(x + y)^2} - \frac{y}{(x + y)^2} = \frac{-1}{x + y} = \frac{-1}{t}.$$

The joint pdf of $T$ and $W$ is then:

$$f_{T,W}(t, w) = \frac{1}{|J(x, y)|} \left[ \frac{\mu e^{-\mu x} (\mu x)^{m-1}}{(m - 1)!} \frac{\mu e^{-\mu y} (\mu y)^{k-1}}{(k - 1)!} \right]_{x = tw,\; y = t(1-w)}$$
$$= t\, \frac{\mu e^{-\mu t w} (\mu t w)^{m-1}}{(m - 1)!} \frac{\mu e^{-\mu t (1-w)} (\mu t (1 - w))^{k-1}}{(k - 1)!}$$
$$= \frac{\mu e^{-\mu t} (\mu t)^{m+k-1}}{(m + k - 1)!} \cdot \frac{(m + k - 1)!}{(m - 1)!\,(k - 1)!}\, w^{m-1} (1 - w)^{k-1}.$$

We see that $T$ and $W$ are independent random variables. As expected, $T$ is Erlang with parameters $m + k$ and $\mu$, since it is the sum of $m + k$ independent exponential random variables. $W$ is the beta random variable introduced in Chapter 3.
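A simulation makes the factorization plausible. The sketch below (illustrative; the parameter values $m = 3$, $k = 2$, $\mu = 1$ are arbitrary choices) checks the means implied by the Erlang and beta factors:

```python
# Simulation sketch for Example 6.13 with m = 3, k = 2, mu = 1:
# W = X/(X+Y) should be Beta(m, k) with mean m/(m+k) = 0.6, and
# T = X + Y should be Erlang(m+k, mu) with mean (m+k)/mu = 5.
import random

random.seed(4)
m, k, mu, trials = 3, 2, 1.0, 100_000
sum_w = sum_t = 0.0
for _ in range(trials):
    x = sum(random.expovariate(mu) for _ in range(m))   # Erlang(m, mu)
    y = sum(random.expovariate(mu) for _ in range(k))   # Erlang(k, mu)
    sum_t += x + y
    sum_w += x / (x + y)
mean_w = sum_w / trials
mean_t = sum_t / trials
```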
The method developed above can be used even if we are interested in only one function of several random variables. By defining "auxiliary" variables, we can use the transformation method to find the joint pdf of all the variables, and then we can find the marginal pdf of the random variable of interest. The following example demonstrates the method.
Example 6.14 Student's t-distribution
Let $X$ be a zero-mean, unit-variance Gaussian random variable and let $Y$ be a chi-square random variable with $n$ degrees of freedom. Assume that $X$ and $Y$ are independent. Find the pdf of $V = X / \sqrt{Y/n}$.
Define the auxiliary variable $W = Y$. The variables $X$ and $Y$ are then related to $V$ and $W$ by

$$X = V\sqrt{W/n} \quad \text{and} \quad Y = W.$$

The Jacobian of the inverse transformation is

$$|J(v, w)| = \left| \det \begin{bmatrix} \sqrt{w/n} & \dfrac{v}{2\sqrt{wn}} \\[0.5ex] 0 & 1 \end{bmatrix} \right| = \sqrt{w/n}.$$

Since $f_{X,Y}(x, y) = f_X(x) f_Y(y)$, the joint pdf of $V$ and $W$ is thus

$$f_{V,W}(v, w) = \frac{e^{-x^2/2}}{\sqrt{2\pi}}\, \frac{(y/2)^{n/2 - 1} e^{-y/2}}{2\Gamma(n/2)}\, |J(v, w)| \bigg|_{x = v\sqrt{w/n},\; y = w} = \frac{(w/2)^{(n-1)/2}\, e^{-(w/2)(1 + v^2/n)}}{2\sqrt{n\pi}\, \Gamma(n/2)}.$$

The pdf of $V$ is found by integrating the joint pdf over $w$:

$$f_V(v) = \frac{1}{2\sqrt{n\pi}\, \Gamma(n/2)} \int_0^{\infty} (w/2)^{(n-1)/2}\, e^{-(w/2)(1 + v^2/n)}\, dw.$$

If we let $w' = (w/2)(1 + v^2/n)$, the integral becomes

$$f_V(v) = \frac{(1 + v^2/n)^{-(n+1)/2}}{\sqrt{n\pi}\, \Gamma(n/2)} \int_0^{\infty} (w')^{(n-1)/2}\, e^{-w'}\, dw'.$$

By noting that the above integral is the gamma function evaluated at $(n + 1)/2$, we finally obtain the Student's t-distribution:

$$f_V(v) = \frac{\Gamma((n + 1)/2)\, (1 + v^2/n)^{-(n+1)/2}}{\sqrt{n\pi}\, \Gamma(n/2)}.$$

This pdf is used extensively in statistical calculations. (See Chapter 8.)
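As a sanity check on the normalization constant, the derived pdf can be integrated numerically. This sketch (illustrative; $n = 5$ degrees of freedom is an arbitrary choice) uses a midpoint rule over a wide interval:

```python
# Numerical check that the Student's t pdf derived in Example 6.14
# integrates to 1, here for n = 5 degrees of freedom.
import math

def t_pdf(v, n):
    c = math.gamma((n + 1) / 2) / (math.sqrt(n * math.pi) * math.gamma(n / 2))
    return c * (1 + v * v / n) ** (-(n + 1) / 2)

def integrate(f, lo, hi, steps):
    h = (hi - lo) / steps                      # composite midpoint rule
    return sum(f(lo + (i + 0.5) * h) for i in range(steps)) * h

total = integrate(lambda v: t_pdf(v, 5), -200.0, 200.0, 200_000)
```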
Next consider the problem of finding the joint pdf of $n$ functions of $n$ random variables $\mathbf{X} = (X_1, \dots, X_n)$:

$$Z_1 = g_1(\mathbf{X}), \quad Z_2 = g_2(\mathbf{X}), \quad \dots, \quad Z_n = g_n(\mathbf{X}).$$

We assume as before that the set of equations

$$z_1 = g_1(\mathbf{x}), \quad z_2 = g_2(\mathbf{x}), \quad \dots, \quad z_n = g_n(\mathbf{x}) \qquad (6.23)$$

has a unique solution given by

$$x_1 = h_1(\mathbf{z}), \quad x_2 = h_2(\mathbf{z}), \quad \dots, \quad x_n = h_n(\mathbf{z}).$$

The joint pdf of $\mathbf{Z}$ is then given by

$$f_{Z_1, \dots, Z_n}(z_1, \dots, z_n) = \frac{f_{X_1, \dots, X_n}(h_1(\mathbf{z}), h_2(\mathbf{z}), \dots, h_n(\mathbf{z}))}{|J(x_1, x_2, \dots, x_n)|} \qquad (6.24a)$$
$$\phantom{f_{Z_1, \dots, Z_n}(z_1, \dots, z_n)} = f_{X_1, \dots, X_n}(h_1(\mathbf{z}), h_2(\mathbf{z}), \dots, h_n(\mathbf{z}))\, |J(z_1, z_2, \dots, z_n)|, \qquad (6.24b)$$

where $J(x_1, \dots, x_n)$ and $J(z_1, \dots, z_n)$ are the determinants of the transformation and the inverse transformation, respectively:

$$J(x_1, \dots, x_n) = \det \begin{bmatrix} \dfrac{\partial g_1}{\partial x_1} & \cdots & \dfrac{\partial g_1}{\partial x_n} \\ \vdots & & \vdots \\ \dfrac{\partial g_n}{\partial x_1} & \cdots & \dfrac{\partial g_n}{\partial x_n} \end{bmatrix}
\quad \text{and} \quad
J(z_1, \dots, z_n) = \det \begin{bmatrix} \dfrac{\partial h_1}{\partial z_1} & \cdots & \dfrac{\partial h_1}{\partial z_n} \\ \vdots & & \vdots \\ \dfrac{\partial h_n}{\partial z_1} & \cdots & \dfrac{\partial h_n}{\partial z_n} \end{bmatrix}.$$

In the special case of a linear transformation we have:

$$\mathbf{Z} = \mathbf{A}\mathbf{X} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{bmatrix}.$$

The components of $\mathbf{Z}$ are:

$$Z_j = a_{j1} X_1 + a_{j2} X_2 + \dots + a_{jn} X_n.$$

Since $\partial z_j / \partial x_i = a_{ji}$, the Jacobian is then simply:

$$J(x_1, x_2, \dots, x_n) = \det \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} = \det \mathbf{A}.$$

Assuming that $\mathbf{A}$ is invertible,¹ we then have that:

$$f_{\mathbf{Z}}(\mathbf{z}) = \frac{f_{\mathbf{X}}(\mathbf{x})}{|\det \mathbf{A}|} \bigg|_{\mathbf{x} = \mathbf{A}^{-1}\mathbf{z}} = \frac{f_{\mathbf{X}}(\mathbf{A}^{-1}\mathbf{z})}{|\det \mathbf{A}|}.$$
Example 6.15 Sum of Random Variables
Given a random vector $\mathbf{X} = (X_1, X_2, X_3)$, find the pdf of the sum

$$Z = X_1 + X_2 + X_3.$$

We use the transformation method by introducing auxiliary variables as follows:

$$Z_1 = X_1, \quad Z_2 = X_1 + X_2, \quad Z_3 = X_1 + X_2 + X_3.$$

The inverse transformation is given by:

$$X_1 = Z_1, \quad X_2 = Z_2 - Z_1, \quad X_3 = Z_3 - Z_2.$$

The Jacobian is:

$$J(x_1, x_2, x_3) = \det \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix} = 1.$$

Therefore the joint pdf of $\mathbf{Z}$ is

$$f_{\mathbf{Z}}(z_1, z_2, z_3) = f_{\mathbf{X}}(z_1, z_2 - z_1, z_3 - z_2).$$

The pdf of $Z_3$ is obtained by integrating with respect to $z_1$ and $z_2$:

$$f_{Z_3}(z) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{\mathbf{X}}(z_1, z_2 - z_1, z - z_2)\, dz_1\, dz_2.$$

This expression can be simplified further if $X_1$, $X_2$, and $X_3$ are independent random variables.
¹ Appendix C provides a summary of definitions and useful results from linear algebra.
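The double-integral formula of Example 6.15 can be exercised numerically for a concrete choice of $f_{\mathbf{X}}$. The sketch below (illustrative; independent unit exponentials are an arbitrary choice, not from the text) should reproduce the Erlang pdf $z^2 e^{-z}/2$ for the sum of three unit exponentials:

```python
# Numerical illustration of Example 6.15: for three independent unit
# exponentials, the double integral for f_{Z3}(z) should give z^2 e^{-z}/2.
import math

def f_x(x1, x2, x3):
    if x1 < 0 or x2 < 0 or x3 < 0:
        return 0.0
    return math.exp(-(x1 + x2 + x3))

def f_z3(z, steps=400):
    # integrate f_x(z1, z2 - z1, z - z2) over the square [0, z]^2 (midpoint rule);
    # the integrand is nonzero only on the region 0 < z1 < z2 < z
    h = z / steps
    total = 0.0
    for i in range(steps):
        z1 = (i + 0.5) * h
        for j in range(steps):
            z2 = (j + 0.5) * h
            total += f_x(z1, z2 - z1, z - z2)
    return total * h * h

z0 = 2.0
approx = f_z3(z0)
exact = z0 ** 2 * math.exp(-z0) / 2    # Erlang with parameters 3 and 1
```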
6.3 EXPECTED VALUES OF VECTOR RANDOM VARIABLES
In this section we are interested in the characterization of a vector random variable through the expected values of its components and of functions of its components. We focus on the characterization of a vector random variable through its mean vector and its covariance matrix. We then introduce the joint characteristic function for a vector random variable.
The expected value of a function $g(\mathbf{X}) = g(X_1, \dots, X_n)$ of a vector random variable $\mathbf{X} = (X_1, X_2, \dots, X_n)$ is given by:

$$E[g(\mathbf{X})] = \begin{cases} \displaystyle \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} g(x_1, x_2, \dots, x_n)\, f_{\mathbf{X}}(x_1, x_2, \dots, x_n)\, dx_1\, dx_2 \cdots dx_n & \mathbf{X} \text{ jointly continuous} \\[1ex] \displaystyle \sum_{x_1} \cdots \sum_{x_n} g(x_1, x_2, \dots, x_n)\, p_{\mathbf{X}}(x_1, x_2, \dots, x_n) & \mathbf{X} \text{ discrete.} \end{cases} \qquad (6.25)$$

An important example is $g(\mathbf{X})$ equal to a sum of functions of $\mathbf{X}$. The procedure leading to Eq. (5.26) and a simple induction argument show that:

$$E[g_1(\mathbf{X}) + g_2(\mathbf{X}) + \dots + g_n(\mathbf{X})] = E[g_1(\mathbf{X})] + \dots + E[g_n(\mathbf{X})]. \qquad (6.26)$$

Another important example is $g(\mathbf{X})$ equal to a product of $n$ individual functions of the components. If $X_1, \dots, X_n$ are independent random variables, then

$$E[g_1(X_1)\, g_2(X_2) \cdots g_n(X_n)] = E[g_1(X_1)]\, E[g_2(X_2)] \cdots E[g_n(X_n)]. \qquad (6.27)$$
6.3.1 Mean Vector and Covariance Matrix
The mean, variance, and covariance provide useful information about the distribution of a random variable and are easy to estimate, so we are frequently interested in characterizing multiple random variables in terms of their first and second moments. We now introduce the mean vector and the covariance matrix. We then investigate the mean vector and the covariance matrix of a linear transformation of a random vector.
For $\mathbf{X} = (X_1, X_2, \dots, X_n)$, the mean vector is defined as the column vector of expected values of the components $X_k$:

$$\mathbf{m}_{\mathbf{X}} = E[\mathbf{X}] = E\begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{bmatrix} \triangleq \begin{bmatrix} E[X_1] \\ E[X_2] \\ \vdots \\ E[X_n] \end{bmatrix}. \qquad (6.28a)$$

Note that we define the vector of expected values as a column vector. In previous sections we have sometimes written $\mathbf{X}$ as a row vector, but in this section and wherever we deal with matrix transformations, we will represent $\mathbf{X}$ and its expected value as column vectors.
The correlation matrix has the second moments of $\mathbf{X}$ as its entries:

$$\mathbf{R}_{\mathbf{X}} = \begin{bmatrix} E[X_1^2] & E[X_1 X_2] & \cdots & E[X_1 X_n] \\ E[X_2 X_1] & E[X_2^2] & \cdots & E[X_2 X_n] \\ \vdots & \vdots & & \vdots \\ E[X_n X_1] & E[X_n X_2] & \cdots & E[X_n^2] \end{bmatrix}. \qquad (6.28b)$$

The covariance matrix has the second-order central moments as its entries:

$$\mathbf{K}_{\mathbf{X}} = \begin{bmatrix} E[(X_1 - m_1)^2] & E[(X_1 - m_1)(X_2 - m_2)] & \cdots & E[(X_1 - m_1)(X_n - m_n)] \\ E[(X_2 - m_2)(X_1 - m_1)] & E[(X_2 - m_2)^2] & \cdots & E[(X_2 - m_2)(X_n - m_n)] \\ \vdots & \vdots & & \vdots \\ E[(X_n - m_n)(X_1 - m_1)] & E[(X_n - m_n)(X_2 - m_2)] & \cdots & E[(X_n - m_n)^2] \end{bmatrix}. \qquad (6.28c)$$

Both $\mathbf{R}_{\mathbf{X}}$ and $\mathbf{K}_{\mathbf{X}}$ are $n \times n$ symmetric matrices. The diagonal elements of $\mathbf{K}_{\mathbf{X}}$ are given by the variances $\mathrm{VAR}[X_k] = E[(X_k - m_k)^2]$ of the elements of $\mathbf{X}$. If these elements are uncorrelated, then $\mathrm{COV}(X_j, X_k) = 0$ for $j \ne k$, and $\mathbf{K}_{\mathbf{X}}$ is a diagonal matrix. If the random variables $X_1, \dots, X_n$ are independent, then they are uncorrelated and $\mathbf{K}_{\mathbf{X}}$ is diagonal. Finally, if the vector of expected values is $\mathbf{0}$, that is, $m_k = E[X_k] = 0$ for all $k$, then $\mathbf{R}_{\mathbf{X}} = \mathbf{K}_{\mathbf{X}}$.
Example 6.16
Let $\mathbf{X} = (X_1, X_2, X_3)$ be the jointly Gaussian random vector from Example 6.6. Find $E[\mathbf{X}]$ and $\mathbf{K}_{\mathbf{X}}$.
We rewrite the joint pdf as follows:

$$f_{X_1, X_2, X_3}(x_1, x_2, x_3) = \frac{\exp\!\left\{ -\dfrac{x_1^2 - 2\left(\frac{1}{\sqrt{2}}\right) x_1 x_2 + x_2^2}{2\left(1 - \left(\frac{1}{\sqrt{2}}\right)^2\right)} \right\}}{2\pi \sqrt{1 - \left(\frac{1}{\sqrt{2}}\right)^2}} \cdot \frac{e^{-x_3^2/2}}{\sqrt{2\pi}}.$$

We see that $X_3$ is a Gaussian random variable with zero mean and unit variance, and that it is independent of $X_1$ and $X_2$. We also see that $X_1$ and $X_2$ are jointly Gaussian with zero mean and unit variance, and with correlation coefficient

$$\rho_{X_1 X_2} = \frac{1}{\sqrt{2}} = \frac{\mathrm{COV}(X_1, X_2)}{\sigma_{X_1} \sigma_{X_2}} = \mathrm{COV}(X_1, X_2).$$

Therefore the vector of expected values is $\mathbf{m}_{\mathbf{X}} = \mathbf{0}$, and

$$\mathbf{K}_{\mathbf{X}} = \begin{bmatrix} 1 & \frac{1}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{2}} & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$
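The covariance read off from the pdf can be checked by direct numerical integration. The sketch below (an illustration, not from the text; it assumes the sign convention $\rho = 1/\sqrt{2}$ that makes the computations of Example 6.6 consistent) integrates $x_1 x_2$ against the bivariate factor of the pdf:

```python
# Numerical check of Example 6.16: integrating x1*x2 against the bivariate
# marginal of (X1, X2) from Example 6.6 should give COV(X1, X2) = 1/sqrt(2).
import math

def f12(x1, x2):
    # bivariate marginal of (X1, X2): unit variances, correlation 1/sqrt(2)
    return math.sqrt(2) * math.exp(-(x1 * x1 + x2 * x2
                                     - math.sqrt(2) * x1 * x2)) / (2 * math.pi)

lo, hi, n = -8.0, 8.0, 400
h = (hi - lo) / n
cov = 0.0
mass = 0.0
for i in range(n):
    x1 = lo + (i + 0.5) * h
    for j in range(n):
        x2 = lo + (j + 0.5) * h
        w = f12(x1, x2) * h * h
        mass += w               # should accumulate to 1
        cov += x1 * x2 * w      # should accumulate to 1/sqrt(2)
expected = 1 / math.sqrt(2)
```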
We now develop compact expressions for $\mathbf{R}_{\mathbf{X}}$ and $\mathbf{K}_{\mathbf{X}}$. If we multiply $\mathbf{X}$, an $n \times 1$ matrix, and $\mathbf{X}^T$, a $1 \times n$ matrix, we obtain the following $n \times n$ matrix:

$$\mathbf{X}\mathbf{X}^T = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{bmatrix} [X_1, X_2, \dots, X_n] = \begin{bmatrix} X_1^2 & X_1 X_2 & \cdots & X_1 X_n \\ X_2 X_1 & X_2^2 & \cdots & X_2 X_n \\ \vdots & \vdots & & \vdots \\ X_n X_1 & X_n X_2 & \cdots & X_n^2 \end{bmatrix}.$$

If we define the expected value of a matrix to be the matrix of expected values of the matrix elements, then we can write the correlation matrix as:

$$\mathbf{R}_{\mathbf{X}} = E[\mathbf{X}\mathbf{X}^T]. \qquad (6.29a)$$

The covariance matrix is then:

$$\mathbf{K}_{\mathbf{X}} = E[(\mathbf{X} - \mathbf{m}_{\mathbf{X}})(\mathbf{X} - \mathbf{m}_{\mathbf{X}})^T] = E[\mathbf{X}\mathbf{X}^T] - \mathbf{m}_{\mathbf{X}} E[\mathbf{X}^T] - E[\mathbf{X}]\, \mathbf{m}_{\mathbf{X}}^T + \mathbf{m}_{\mathbf{X}} \mathbf{m}_{\mathbf{X}}^T = \mathbf{R}_{\mathbf{X}} - \mathbf{m}_{\mathbf{X}} \mathbf{m}_{\mathbf{X}}^T. \qquad (6.29b)$$
6.3.2 Linear Transformations of Random Vectors
Many engineering systems are linear in the sense that will be elaborated on in Chapter 10. Frequently these systems can be reduced to a linear transformation of a vector of random variables, where the "input" is $\mathbf{X}$ and the "output" is $\mathbf{Y}$:

$$\mathbf{Y} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{bmatrix} = \mathbf{A}\mathbf{X}.$$

The expected value of the $k$th component of $\mathbf{Y}$ is the inner product (dot product) of the $k$th row of $\mathbf{A}$ and $\mathbf{X}$:

$$E[Y_k] = E\left[ \sum_{j=1}^n a_{kj} X_j \right] = \sum_{j=1}^n a_{kj} E[X_j].$$

Each component of $E[\mathbf{Y}]$ is obtained in this manner, so:

$$\mathbf{m}_{\mathbf{Y}} = E[\mathbf{Y}] = \begin{bmatrix} \sum_{j=1}^n a_{1j} E[X_j] \\ \sum_{j=1}^n a_{2j} E[X_j] \\ \vdots \\ \sum_{j=1}^n a_{nj} E[X_j] \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} E[X_1] \\ E[X_2] \\ \vdots \\ E[X_n] \end{bmatrix} = \mathbf{A} E[\mathbf{X}] = \mathbf{A}\mathbf{m}_{\mathbf{X}}. \qquad (6.30a)$$

The covariance matrix of $\mathbf{Y}$ is then:

$$\mathbf{K}_{\mathbf{Y}} = E[(\mathbf{Y} - \mathbf{m}_{\mathbf{Y}})(\mathbf{Y} - \mathbf{m}_{\mathbf{Y}})^T] = E[(\mathbf{A}\mathbf{X} - \mathbf{A}\mathbf{m}_{\mathbf{X}})(\mathbf{A}\mathbf{X} - \mathbf{A}\mathbf{m}_{\mathbf{X}})^T] = E[\mathbf{A}(\mathbf{X} - \mathbf{m}_{\mathbf{X}})(\mathbf{X} - \mathbf{m}_{\mathbf{X}})^T \mathbf{A}^T] = \mathbf{A}\, E[(\mathbf{X} - \mathbf{m}_{\mathbf{X}})(\mathbf{X} - \mathbf{m}_{\mathbf{X}})^T]\, \mathbf{A}^T = \mathbf{A}\mathbf{K}_{\mathbf{X}}\mathbf{A}^T, \qquad (6.30b)$$

where we used the fact that the transpose of a matrix product is the product of the transposed matrices in reverse order: $\{\mathbf{A}(\mathbf{X} - \mathbf{m}_{\mathbf{X}})\}^T = (\mathbf{X} - \mathbf{m}_{\mathbf{X}})^T \mathbf{A}^T$.
The cross-covariance matrix of two random vectors $\mathbf{X}$ and $\mathbf{Y}$ is defined as:

$$\mathbf{K}_{\mathbf{X}\mathbf{Y}} = E[(\mathbf{X} - \mathbf{m}_{\mathbf{X}})(\mathbf{Y} - \mathbf{m}_{\mathbf{Y}})^T] = E[\mathbf{X}\mathbf{Y}^T] - \mathbf{m}_{\mathbf{X}} \mathbf{m}_{\mathbf{Y}}^T = \mathbf{R}_{\mathbf{X}\mathbf{Y}} - \mathbf{m}_{\mathbf{X}} \mathbf{m}_{\mathbf{Y}}^T.$$

We are interested in the cross-covariance between $\mathbf{X}$ and $\mathbf{Y} = \mathbf{A}\mathbf{X}$:

$$\mathbf{K}_{\mathbf{X}\mathbf{Y}} = E[(\mathbf{X} - \mathbf{m}_{\mathbf{X}})(\mathbf{Y} - \mathbf{m}_{\mathbf{Y}})^T] = E[(\mathbf{X} - \mathbf{m}_{\mathbf{X}})(\mathbf{X} - \mathbf{m}_{\mathbf{X}})^T \mathbf{A}^T] = \mathbf{K}_{\mathbf{X}}\mathbf{A}^T. \qquad (6.30c)$$
Example 6.17 Transformation of an Uncorrelated Random Vector
Suppose that the components of $\mathbf{X}$ are uncorrelated and have unit variance; then $\mathbf{K}_{\mathbf{X}} = \mathbf{I}$, the identity matrix. The covariance matrix for $\mathbf{Y} = \mathbf{A}\mathbf{X}$ is

$$\mathbf{K}_{\mathbf{Y}} = \mathbf{A}\mathbf{K}_{\mathbf{X}}\mathbf{A}^T = \mathbf{A}\mathbf{I}\mathbf{A}^T = \mathbf{A}\mathbf{A}^T. \qquad (6.31)$$

In general $\mathbf{K}_{\mathbf{Y}} = \mathbf{A}\mathbf{A}^T$ is not a diagonal matrix, and so the components of $\mathbf{Y}$ are correlated. In Section 6.6 we discuss how to find a matrix $\mathbf{A}$ so that Eq. (6.31) holds for a given $\mathbf{K}_{\mathbf{Y}}$. We can then generate a random vector $\mathbf{Y}$ with any desired covariance matrix $\mathbf{K}_{\mathbf{Y}}$.

Suppose instead that the components of $\mathbf{X}$ are correlated, so $\mathbf{K}_{\mathbf{X}}$ is not a diagonal matrix. In many situations we are interested in finding a transformation matrix $\mathbf{A}$ so that $\mathbf{Y} = \mathbf{A}\mathbf{X}$ has uncorrelated components. This requires finding $\mathbf{A}$ so that $\mathbf{K}_{\mathbf{Y}} = \mathbf{A}\mathbf{K}_{\mathbf{X}}\mathbf{A}^T$ is a diagonal matrix. In the last part of this section we show how to find such a matrix $\mathbf{A}$.
Example 6.18 Transformation to an Uncorrelated Random Vector
Suppose the random vector $\mathbf{X} = (X_1, X_2, X_3)$ in Example 6.16 is transformed using the matrix

$$\mathbf{A} = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

Find $E[\mathbf{Y}]$ and $\mathbf{K}_{\mathbf{Y}}$.
Since $\mathbf{m}_{\mathbf{X}} = \mathbf{0}$, we have $E[\mathbf{Y}] = \mathbf{A}\mathbf{m}_{\mathbf{X}} = \mathbf{0}$. The covariance matrix of $\mathbf{Y}$ is:

$$\mathbf{K}_{\mathbf{Y}} = \mathbf{A}\mathbf{K}_{\mathbf{X}}\mathbf{A}^T = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 & 0 \\ 1 & -1 & 0 \\ 0 & 0 & \sqrt{2} \end{bmatrix} \begin{bmatrix} 1 & \frac{1}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{2}} & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 & 0 \\ 1 & -1 & 0 \\ 0 & 0 & \sqrt{2} \end{bmatrix} = \begin{bmatrix} 1 + \frac{1}{\sqrt{2}} & 0 & 0 \\ 0 & 1 - \frac{1}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

The linear transformation has produced a vector of random variables $\mathbf{Y} = (Y_1, Y_2, Y_3)$ with components that are uncorrelated.
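The matrix computation of Example 6.18 is easy to check in a few lines of pure Python. This sketch (an illustration, not from the text; it assumes the correlation $\rho = 1/\sqrt{2}$ consistent with the computations of Example 6.6) multiplies out $\mathbf{A}\mathbf{K}_{\mathbf{X}}\mathbf{A}^T$ and confirms the diagonal result:

```python
# Check of Example 6.18: A K_X A^T should equal
# diag(1 + 1/sqrt(2), 1 - 1/sqrt(2), 1).
import math

r = 1 / math.sqrt(2)
K_X = [[1, r, 0], [r, 1, 0], [0, 0, 1]]
A = [[r, r, 0], [r, -r, 0], [0, 0, 1]]

def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(M):
    return [list(row) for row in zip(*M)]

K_Y = matmul(matmul(A, K_X), transpose(A))
off_diag = max(abs(K_Y[i][j]) for i in range(3) for j in range(3) if i != j)
```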
*6.3.3 Joint Characteristic Function
The joint characteristic function of $n$ random variables is defined as

$$\Phi_{X_1, X_2, \dots, X_n}(\omega_1, \omega_2, \dots, \omega_n) = E[e^{j(\omega_1 X_1 + \omega_2 X_2 + \dots + \omega_n X_n)}]. \qquad (6.32a)$$

In this section we develop the properties of the joint characteristic function of two random variables. These properties generalize in a straightforward fashion to the case of $n$ random variables. Therefore consider

$$\Phi_{X,Y}(\omega_1, \omega_2) = E[e^{j(\omega_1 X + \omega_2 Y)}]. \qquad (6.32b)$$

If $X$ and $Y$ are jointly continuous random variables, then

$$\Phi_{X,Y}(\omega_1, \omega_2) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, e^{j(\omega_1 x + \omega_2 y)}\, dx\, dy. \qquad (6.32c)$$

Equation (6.32c) shows that the joint characteristic function is the two-dimensional Fourier transform of the joint pdf of $X$ and $Y$. The inversion formula for the Fourier transform implies that the joint pdf is given by

$$f_{X,Y}(x, y) = \frac{1}{4\pi^2} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \Phi_{X,Y}(\omega_1, \omega_2)\, e^{-j(\omega_1 x + \omega_2 y)}\, d\omega_1\, d\omega_2. \qquad (6.33)$$

Note from Eq. (6.32b) that the marginal characteristic functions can be obtained from the joint characteristic function:

$$\Phi_X(\omega) = \Phi_{X,Y}(\omega, 0) \qquad \Phi_Y(\omega) = \Phi_{X,Y}(0, \omega). \qquad (6.34)$$

If $X$ and $Y$ are independent random variables, then the joint characteristic function is the product of the marginal characteristic functions, since

$$\Phi_{X,Y}(\omega_1, \omega_2) = E[e^{j(\omega_1 X + \omega_2 Y)}] = E[e^{j\omega_1 X} e^{j\omega_2 Y}] = E[e^{j\omega_1 X}]\, E[e^{j\omega_2 Y}] = \Phi_X(\omega_1)\, \Phi_Y(\omega_2), \qquad (6.35)$$

where the third equality follows from Eq. (6.27).
The characteristic function of the sum $Z = aX + bY$ can be obtained from the joint characteristic function of $X$ and $Y$ as follows:

$$\Phi_Z(\omega) = E[e^{j\omega(aX + bY)}] = E[e^{j(\omega a X + \omega b Y)}] = \Phi_{X,Y}(a\omega, b\omega). \qquad (6.36a)$$

If $X$ and $Y$ are independent random variables, the characteristic function of $Z = aX + bY$ is then

$$\Phi_Z(\omega) = \Phi_{X,Y}(a\omega, b\omega) = \Phi_X(a\omega)\, \Phi_Y(b\omega). \qquad (6.36b)$$

In Section 8.1 we will use the above result in dealing with sums of random variables.
The joint moments of $X$ and $Y$ (if they exist) can be obtained by taking derivatives of the joint characteristic function. To show this, we rewrite Eq. (6.32b) as the expected value of a product of exponentials and expand the exponentials in power series:

$$\Phi_{X,Y}(\omega_1, \omega_2) = E[e^{j\omega_1 X} e^{j\omega_2 Y}] = E\left[ \sum_{i=0}^{\infty} \frac{(j\omega_1 X)^i}{i!} \sum_{k=0}^{\infty} \frac{(j\omega_2 Y)^k}{k!} \right] = \sum_{i=0}^{\infty} \sum_{k=0}^{\infty} E[X^i Y^k]\, \frac{(j\omega_1)^i}{i!} \frac{(j\omega_2)^k}{k!}.$$

It then follows that the moments can be obtained by taking an appropriate set of derivatives:

$$E[X^i Y^k] = \frac{1}{j^{i+k}} \frac{\partial^{i+k}}{\partial \omega_1^i\, \partial \omega_2^k}\, \Phi_{X,Y}(\omega_1, \omega_2) \bigg|_{\omega_1 = 0,\, \omega_2 = 0}. \qquad (6.37)$$
Example 6.19
Suppose U and V are independent zero-mean, unit-variance Gaussian random variables, and let

    X = U + V \qquad Y = 2U + V.

Find the joint characteristic function of X and Y, and find E[XY].
The joint characteristic function of X and Y is

    \Phi_{X,Y}(\omega_1, \omega_2) = E[e^{j(\omega_1 X + \omega_2 Y)}] = E[e^{j\omega_1(U + V)} e^{j\omega_2(2U + V)}]
        = E[e^{j\{(\omega_1 + 2\omega_2)U + (\omega_1 + \omega_2)V\}}].

Since U and V are independent random variables, the joint characteristic function is equal to the product of the marginal characteristic functions:

    \Phi_{X,Y}(\omega_1, \omega_2) = E[e^{j(\omega_1 + 2\omega_2)U}] \, E[e^{j(\omega_1 + \omega_2)V}]
        = \Phi_U(\omega_1 + 2\omega_2) \, \Phi_V(\omega_1 + \omega_2)
        = e^{-\frac{1}{2}(\omega_1 + 2\omega_2)^2} e^{-\frac{1}{2}(\omega_1 + \omega_2)^2}
        = e^{-\frac{1}{2}(2\omega_1^2 + 6\omega_1\omega_2 + 5\omega_2^2)},

where the marginal characteristic functions were obtained from Table 4.1.
The correlation E[XY] is found from Eq. (6.37) with i = 1 and k = 1:

    E[XY] = \frac{1}{j^2} \frac{\partial^2}{\partial \omega_1 \, \partial \omega_2} \Phi_{X,Y}(\omega_1, \omega_2) \Big|_{\omega_1 = 0, \, \omega_2 = 0}
        = \left[ -\tfrac{1}{4}(6\omega_1 + 10\omega_2)(4\omega_1 + 6\omega_2) + 3 \right] e^{-\frac{1}{2}(2\omega_1^2 + 6\omega_1\omega_2 + 5\omega_2^2)} \Big|_{\omega_1 = 0, \, \omega_2 = 0}
        = 3.

You should verify this answer by evaluating E[XY] = E[(U + V)(2U + V)] directly.
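The direct check suggested above can also be run numerically. The following sketch (not part of the text; numpy is assumed available) simulates U and V and estimates E[XY]:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
u = rng.standard_normal(n)   # zero-mean, unit-variance Gaussian samples of U
v = rng.standard_normal(n)   # independent samples of V
x = u + v                    # X = U + V
y = 2 * u + v                # Y = 2U + V

print(np.mean(x * y))        # sample estimate of E[XY], close to 3
```

The sample average converges to E[XY] = E[2U^2 + 3UV + V^2] = 2 + 0 + 1 = 3.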
*6.3.4 Diagonalization of Covariance Matrix
Let X be a random vector with covariance K_X. We are interested in finding an n × n matrix A such that Y = AX has a covariance matrix that is diagonal. The components of Y are then uncorrelated.
We saw that K_X is a real-valued symmetric matrix. In Appendix C we state results from linear algebra showing that K_X is then a diagonalizable matrix, that is, there is a matrix P such that:

    P^T K_X P = \Lambda \quad \text{and} \quad P^T P = I   (6.38a)

where \Lambda is a diagonal matrix and I is the identity matrix. Therefore if we let A = P^T, then from Eq. (6.30b) we obtain a diagonal K_Y.
We now show how P is obtained. First, we find the eigenvalues and eigenvectors of K_X from:

    K_X e_i = \lambda_i e_i   (6.38b)

where the e_i are n × 1 column vectors (see Appendix C). We can normalize each eigenvector e_i so that e_i^T e_i, the sum of the squares of its components, is 1. The normalized eigenvectors are then orthonormal, that is,

    e_i^T e_j = \delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j. \end{cases}   (6.38c)

Let P be the matrix whose columns are the eigenvectors of K_X and let \Lambda be the diagonal matrix of eigenvalues:

    P = [e_1, e_2, \dots, e_n] \qquad \Lambda = \mathrm{diag}[\lambda_i].

From Eq. (6.38b) we have:

    K_X P = K_X [e_1, e_2, \dots, e_n] = [K_X e_1, K_X e_2, \dots, K_X e_n]
        = [\lambda_1 e_1, \lambda_2 e_2, \dots, \lambda_n e_n] = P\Lambda   (6.39a)

where the second equality follows from the fact that each column of K_X P is obtained by multiplying a column of P by K_X. By premultiplying both sides of the above equations by P^T, we obtain:

    P^T K_X P = P^T P \Lambda = \Lambda.   (6.39b)
We conclude that if we let A = P^T, and

    Y = AX = P^T X,   (6.40a)

then the random variables in Y are uncorrelated since

    K_Y = P^T K_X P = \Lambda.   (6.40b)

In summary, any covariance matrix K_X can be diagonalized by a linear transformation. The matrix A in the transformation is obtained from the eigenvectors of K_X.
Equation (6.40b) provides insight into the invertibility of K_X and K_Y. From linear algebra we know that the determinant of a product of n × n matrices is the product of the determinants, so:

    \det K_Y = \det P^T \det K_X \det P = \det \Lambda = \lambda_1 \lambda_2 \cdots \lambda_n,

where we used the fact that \det P^T \det P = \det I = 1. Recall that a matrix is invertible if and only if its determinant is nonzero. Therefore K_Y is not invertible if and only if one or more of the eigenvalues of K_X is zero.
Now suppose that one of the eigenvalues is zero, say \lambda_k = 0. Since VAR[Y_k] = \lambda_k = 0, then Y_k = 0. But Y_k is defined as a linear combination, so

    0 = Y_k = a_{k1} X_1 + a_{k2} X_2 + \dots + a_{kn} X_n.

We conclude that the components of X are linearly dependent. Therefore, one or more of the components of X are redundant and can be expressed as a linear combination of the other components.
It is interesting to look at the vector X expressed in terms of Y. Multiply both sides of Eq. (6.40a) by P and use the fact that P P^T = I:

    X = P P^T X = P Y = [e_1, e_2, \dots, e_n] \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = \sum_{k=1}^{n} Y_k e_k.   (6.41)

This equation is called the Karhunen-Loève expansion. The equation shows that a random vector X can be expressed as a weighted sum of the eigenvectors of K_X, where the coefficients are uncorrelated random variables Y_k. Furthermore, the eigenvectors form an orthonormal set. Note that if any of the eigenvalues is zero, VAR[Y_k] = \lambda_k = 0, then Y_k = 0, and the corresponding term can be dropped from the expansion in Eq. (6.41). In Chapter 10, we will see that this expansion is very useful in the processing of random signals.
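The diagonalization procedure above can be sketched numerically (an illustration, not from the text; numpy is assumed and the covariance values are hypothetical). For a symmetric matrix, `numpy.linalg.eigh` returns the eigenvalues and a matrix P whose columns are orthonormal eigenvectors, so taking A = P^T gives a diagonal K_Y as in Eq. (6.40b):

```python
import numpy as np

# Hypothetical real symmetric covariance matrix K_X
K_X = np.array([[2.0, 0.8, 0.3],
                [0.8, 1.5, 0.2],
                [0.3, 0.2, 1.0]])

# K_X = P diag(lambda_i) P^T, with orthonormal eigenvectors as columns of P
lam, P = np.linalg.eigh(K_X)

A = P.T                  # transformation Y = A X decorrelates the components
K_Y = A @ K_X @ A.T      # covariance of Y, Eq. (6.40b)

print(np.round(K_Y, 10)) # diagonal matrix of the eigenvalues lambda_i
```

The off-diagonal entries of K_Y vanish (to floating-point precision), so the components of Y = AX are uncorrelated.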
6.4
JOINTLY GAUSSIAN RANDOM VECTORS
The random variables X_1, X_2, \dots, X_n are said to be jointly Gaussian if their joint pdf is given by

    f_{\mathbf{X}}(\mathbf{x}) \triangleq f_{X_1, X_2, \dots, X_n}(x_1, \dots, x_n) = \frac{\exp\{-\frac{1}{2}(\mathbf{x} - \mathbf{m})^T K^{-1} (\mathbf{x} - \mathbf{m})\}}{(2\pi)^{n/2} |K|^{1/2}},   (6.42a)
where x and m are column vectors defined by

    \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \qquad
    \mathbf{m} = \begin{bmatrix} m_1 \\ m_2 \\ \vdots \\ m_n \end{bmatrix} = \begin{bmatrix} E[X_1] \\ E[X_2] \\ \vdots \\ E[X_n] \end{bmatrix},

and K is the covariance matrix that is defined by

    K = \begin{bmatrix}
        \mathrm{VAR}(X_1) & \mathrm{COV}(X_1, X_2) & \cdots & \mathrm{COV}(X_1, X_n) \\
        \mathrm{COV}(X_2, X_1) & \mathrm{VAR}(X_2) & \cdots & \mathrm{COV}(X_2, X_n) \\
        \vdots & \vdots & & \vdots \\
        \mathrm{COV}(X_n, X_1) & \mathrm{COV}(X_n, X_2) & \cdots & \mathrm{VAR}(X_n)
    \end{bmatrix}.   (6.42b)

The (\cdot)^T in Eq. (6.42a) denotes the transpose of a matrix or vector. Note that the covariance matrix is a symmetric matrix since COV(X_i, X_j) = COV(X_j, X_i).
Equation (6.42a) shows that the pdf of jointly Gaussian random variables is completely specified by the individual means and variances and the pairwise covariances. It can be shown using the joint characteristic function that all the marginal pdf's associated with Eq. (6.42a) are also Gaussian and that these too are completely specified by the same set of means, variances, and covariances.
Example 6.20
Verify that the two-dimensional Gaussian pdf given in Eq. (5.61a) has the form of Eq. (6.42a).
The covariance matrix for the two-dimensional case is given by

    K = \begin{bmatrix} \sigma_1^2 & \rho_{X,Y}\sigma_1\sigma_2 \\ \rho_{X,Y}\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix},

where we have used the fact that COV(X_1, X_2) = \rho_{X,Y}\sigma_1\sigma_2. The determinant of K is \sigma_1^2\sigma_2^2(1 - \rho_{X,Y}^2), so the denominator of the pdf has the correct form. The inverse of the covariance matrix is also a real symmetric matrix:

    K^{-1} = \frac{1}{\sigma_1^2\sigma_2^2(1 - \rho_{X,Y}^2)} \begin{bmatrix} \sigma_2^2 & -\rho_{X,Y}\sigma_1\sigma_2 \\ -\rho_{X,Y}\sigma_1\sigma_2 & \sigma_1^2 \end{bmatrix}.

The term in the exponent is therefore

    \frac{1}{\sigma_1^2\sigma_2^2(1 - \rho_{X,Y}^2)} (x - m_1, \, y - m_2) \begin{bmatrix} \sigma_2^2 & -\rho_{X,Y}\sigma_1\sigma_2 \\ -\rho_{X,Y}\sigma_1\sigma_2 & \sigma_1^2 \end{bmatrix} \begin{bmatrix} x - m_1 \\ y - m_2 \end{bmatrix}
    = \frac{1}{\sigma_1^2\sigma_2^2(1 - \rho_{X,Y}^2)} (x - m_1, \, y - m_2) \begin{bmatrix} \sigma_2^2(x - m_1) - \rho_{X,Y}\sigma_1\sigma_2(y - m_2) \\ -\rho_{X,Y}\sigma_1\sigma_2(x - m_1) + \sigma_1^2(y - m_2) \end{bmatrix}
    = \frac{[(x - m_1)/\sigma_1]^2 - 2\rho_{X,Y}[(x - m_1)/\sigma_1][(y - m_2)/\sigma_2] + [(y - m_2)/\sigma_2]^2}{1 - \rho_{X,Y}^2}.

Thus the two-dimensional pdf has the form of Eq. (6.42a).
Example 6.21
The vector of random variables (X, Y, Z) is jointly Gaussian with zero means and covariance matrix:

    K = \begin{bmatrix}
        \mathrm{VAR}(X) & \mathrm{COV}(X, Y) & \mathrm{COV}(X, Z) \\
        \mathrm{COV}(Y, X) & \mathrm{VAR}(Y) & \mathrm{COV}(Y, Z) \\
        \mathrm{COV}(Z, X) & \mathrm{COV}(Z, Y) & \mathrm{VAR}(Z)
    \end{bmatrix}
    = \begin{bmatrix} 1.0 & 0.2 & 0.3 \\ 0.2 & 1.0 & 0.4 \\ 0.3 & 0.4 & 1.0 \end{bmatrix}.

Find the marginal pdf of X and Z.
We can solve this problem two ways. The first involves integrating the pdf directly to obtain the marginal pdf. The second involves using the fact that the marginal pdf of X and Z is also Gaussian and has the same set of means, variances, and covariances. We will use the second approach.
The pair (X, Z) has zero-mean vector and covariance matrix:

    K' = \begin{bmatrix} \mathrm{VAR}(X) & \mathrm{COV}(X, Z) \\ \mathrm{COV}(Z, X) & \mathrm{VAR}(Z) \end{bmatrix} = \begin{bmatrix} 1.0 & 0.3 \\ 0.3 & 1.0 \end{bmatrix}.

The joint pdf of X and Z is found by substituting a zero-mean vector and this covariance matrix into Eq. (6.42a).
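The second approach amounts to picking out the relevant rows and columns of K. A small numerical sketch (not from the text; numpy assumed, with a helper `gauss_pdf` implementing Eq. (6.42a)) makes this concrete:

```python
import numpy as np

def gauss_pdf(x, m, K):
    """Jointly Gaussian pdf of Eq. (6.42a)."""
    n = len(m)
    d = x - m
    norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(K))
    return np.exp(-0.5 * d @ np.linalg.inv(K) @ d) / norm

K = np.array([[1.0, 0.2, 0.3],
              [0.2, 1.0, 0.4],
              [0.3, 0.4, 1.0]])

# Marginal covariance of (X, Z): keep rows and columns 0 and 2 of K
idx = [0, 2]
K_marg = K[np.ix_(idx, idx)]

print(K_marg)                                       # [[1.0, 0.3], [0.3, 1.0]]
print(gauss_pdf(np.zeros(2), np.zeros(2), K_marg))  # marginal pdf at the origin
```

The submatrix K' obtained this way is exactly the covariance matrix derived in the example.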
Example 6.22 Independence of Uncorrelated Jointly Gaussian Random Variables
Suppose X_1, X_2, \dots, X_n are jointly Gaussian random variables with COV(X_i, X_j) = 0 for i \neq j. Show that X_1, X_2, \dots, X_n are independent random variables.
From Eq. (6.42b) we see that the covariance matrix is a diagonal matrix:

    K = \mathrm{diag}[\mathrm{VAR}(X_i)] = \mathrm{diag}[\sigma_i^2].

Therefore

    K^{-1} = \mathrm{diag}\left[\frac{1}{\sigma_i^2}\right]

and

    (\mathbf{x} - \mathbf{m})^T K^{-1} (\mathbf{x} - \mathbf{m}) = \sum_{i=1}^{n} \left( \frac{x_i - m_i}{\sigma_i} \right)^2.

Thus from Eq. (6.42a)

    f_{\mathbf{X}}(\mathbf{x}) = \frac{\exp\left\{ -\frac{1}{2} \sum_{i=1}^{n} [(x_i - m_i)/\sigma_i]^2 \right\}}{(2\pi)^{n/2} |K|^{1/2}}
        = \prod_{i=1}^{n} \frac{\exp\left\{ -\frac{1}{2} [(x_i - m_i)/\sigma_i]^2 \right\}}{\sqrt{2\pi\sigma_i^2}}
        = \prod_{i=1}^{n} f_{X_i}(x_i).

Thus X_1, X_2, \dots, X_n are independent Gaussian random variables.
Example 6.23 Conditional pdf of Gaussian Random Variable
Find the conditional pdf of X_n given X_1, X_2, \dots, X_{n-1}.
Let K_n be the covariance matrix for \mathbf{X}_n = (X_1, X_2, \dots, X_n) and K_{n-1} be the covariance matrix for \mathbf{X}_{n-1} = (X_1, X_2, \dots, X_{n-1}). Let Q_n = K_n^{-1} and Q_{n-1} = K_{n-1}^{-1}; then the latter matrices are submatrices of the former matrices as shown below:

    K_n = \begin{bmatrix} K_{n-1} & \begin{matrix} K_{1n} \\ K_{2n} \\ \vdots \end{matrix} \\ \begin{matrix} K_{1n} & K_{2n} & \cdots \end{matrix} & K_{nn} \end{bmatrix}
    \qquad
    Q_n = \begin{bmatrix} Q_{n-1} & \begin{matrix} Q_{1n} \\ Q_{2n} \\ \vdots \end{matrix} \\ \begin{matrix} Q_{1n} & Q_{2n} & \cdots \end{matrix} & Q_{nn} \end{bmatrix}

Below we will use the subscript n or n - 1 to distinguish between the two random vectors and their parameters. The conditional pdf of X_n given X_1, X_2, \dots, X_{n-1} is given by:

    f_{X_n}(x_n \,|\, x_1, \dots, x_{n-1}) = \frac{f_{\mathbf{X}_n}(\mathbf{x}_n)}{f_{\mathbf{X}_{n-1}}(\mathbf{x}_{n-1})}
        = \frac{ \exp\{-\frac{1}{2}(\mathbf{x}_n - \mathbf{m}_n)^T Q_n (\mathbf{x}_n - \mathbf{m}_n)\} \,/\, [(2\pi)^{n/2} |K_n|^{1/2}] }{ \exp\{-\frac{1}{2}(\mathbf{x}_{n-1} - \mathbf{m}_{n-1})^T Q_{n-1} (\mathbf{x}_{n-1} - \mathbf{m}_{n-1})\} \,/\, [(2\pi)^{(n-1)/2} |K_{n-1}|^{1/2}] }
        = \frac{ \exp\{-\frac{1}{2}(\mathbf{x}_n - \mathbf{m}_n)^T Q_n (\mathbf{x}_n - \mathbf{m}_n) + \frac{1}{2}(\mathbf{x}_{n-1} - \mathbf{m}_{n-1})^T Q_{n-1} (\mathbf{x}_{n-1} - \mathbf{m}_{n-1})\} }{ \sqrt{2\pi} \, |K_n|^{1/2} / |K_{n-1}|^{1/2} }.

In Problem 6.60 we show that the terms in the above expression are given by:

    \tfrac{1}{2}(\mathbf{x}_n - \mathbf{m}_n)^T Q_n (\mathbf{x}_n - \mathbf{m}_n) - \tfrac{1}{2}(\mathbf{x}_{n-1} - \mathbf{m}_{n-1})^T Q_{n-1} (\mathbf{x}_{n-1} - \mathbf{m}_{n-1})
        = \tfrac{1}{2} Q_{nn} \{(x_n - m_n) + B\}^2 - \tfrac{1}{2} Q_{nn} B^2,

where

    B = \frac{1}{Q_{nn}} \sum_{j=1}^{n-1} Q_{jn}(x_j - m_j),

and

    |K_n| \,/\, |K_{n-1}| = 1 / Q_{nn}.   (6.43)

This implies that X_n has conditional mean m_n - B and conditional variance 1/Q_{nn}. The term \frac{1}{2} Q_{nn} B^2 is part of the normalization constant. We therefore conclude that:

    f_{X_n}(x_n \,|\, x_1, \dots, x_{n-1}) = \frac{ \exp\left\{ -\dfrac{Q_{nn}}{2} \left( x_n - m_n + \dfrac{1}{Q_{nn}} \sum_{j=1}^{n-1} Q_{jn}(x_j - m_j) \right)^2 \right\} }{ \sqrt{2\pi / Q_{nn}} }.

We see that the conditional mean of X_n is a linear function of the "observations" x_1, x_2, \dots, x_{n-1}.
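The conditional mean and variance above come straight out of the precision matrix Q_n. A numerical sketch (not from the text; numpy assumed, covariance and observation values hypothetical, zero means assumed) cross-checks them against the familiar submatrix formulas:

```python
import numpy as np

# Hypothetical covariance matrix, zero means assumed; n = 3 components
K = np.array([[1.0, 0.2, 0.3],
              [0.2, 1.0, 0.4],
              [0.3, 0.4, 1.0]])
Q = np.linalg.inv(K)            # precision matrix Q_n = K_n^{-1}
n = K.shape[0] - 1              # 0-based index of X_n

x_obs = np.array([0.5, -1.0])   # observed values of X_1, ..., X_{n-1}

# From the conditional pdf: variance 1/Q_nn and mean m_n - B (m_n = 0 here)
var_cond = 1.0 / Q[n, n]
B = Q[:n, n] @ x_obs / Q[n, n]
mean_cond = 0.0 - B

print(mean_cond, var_cond)
```

Both quantities agree with the equivalent forms k^T K_{n-1}^{-1} x and K_{nn} - k^T K_{n-1}^{-1} k, where k is the last column of K_n without its last entry.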
*6.4.1 Linear Transformation of Gaussian Random Variables
A very important property of jointly Gaussian random variables is that the linear transformation of any n jointly Gaussian random variables results in n random variables that are also jointly Gaussian. This is easy to show using the matrix notation in Eq. (6.42a). Let X = (X_1, \dots, X_n) be jointly Gaussian with covariance matrix K_X and mean vector m_X, and define Y = (Y_1, \dots, Y_n) by

    \mathbf{Y} = A\mathbf{X},

where A is an invertible n × n matrix. From Eq. (5.60) we know that the pdf of Y is given by

    f_{\mathbf{Y}}(\mathbf{y}) = \frac{f_{\mathbf{X}}(A^{-1}\mathbf{y})}{|A|} = \frac{\exp\{-\frac{1}{2}(A^{-1}\mathbf{y} - \mathbf{m}_X)^T K_X^{-1} (A^{-1}\mathbf{y} - \mathbf{m}_X)\}}{(2\pi)^{n/2} |A| \, |K_X|^{1/2}}.   (6.44)

From elementary properties of matrices we have that

    (A^{-1}\mathbf{y} - \mathbf{m}_X) = A^{-1}(\mathbf{y} - A\mathbf{m}_X)

and

    (A^{-1}\mathbf{y} - \mathbf{m}_X)^T = (\mathbf{y} - A\mathbf{m}_X)^T A^{-1T}.

The argument in the exponential is therefore equal to

    (\mathbf{y} - A\mathbf{m}_X)^T A^{-1T} K_X^{-1} A^{-1} (\mathbf{y} - A\mathbf{m}_X) = (\mathbf{y} - A\mathbf{m}_X)^T (A K_X A^T)^{-1} (\mathbf{y} - A\mathbf{m}_X)

since A^{-1T} K_X^{-1} A^{-1} = (A K_X A^T)^{-1}. Letting K_Y = A K_X A^T and m_Y = A m_X, and noting that \det(K_Y) = \det(A K_X A^T) = \det(A)\det(K_X)\det(A^T) = \det(A)^2 \det(K_X), we finally have that the pdf of Y is

    f_{\mathbf{Y}}(\mathbf{y}) = \frac{e^{-(1/2)(\mathbf{y} - \mathbf{m}_Y)^T K_Y^{-1} (\mathbf{y} - \mathbf{m}_Y)}}{(2\pi)^{n/2} |K_Y|^{1/2}}.   (6.45)

Thus the pdf of Y has the form of Eq. (6.42a) and therefore Y_1, \dots, Y_n are jointly Gaussian random variables with mean vector and covariance matrix:

    \mathbf{m}_Y = A\mathbf{m}_X \quad \text{and} \quad K_Y = A K_X A^T.

This result is consistent with the mean vector and covariance matrix we obtained before in Eqs. (6.30a) and (6.30b).
In many problems we wish to transform X to a vector Y of independent Gaussian random variables. Since K_X is a symmetric matrix, it is always possible to find a matrix A such that A K_X A^T = \Lambda is a diagonal matrix. (See Section 6.3.) For such a matrix A, the pdf of Y will be

    f_{\mathbf{Y}}(\mathbf{y}) = \frac{e^{-(1/2)(\mathbf{y} - \boldsymbol{\nu})^T \Lambda^{-1} (\mathbf{y} - \boldsymbol{\nu})}}{(2\pi)^{n/2} |\Lambda|^{1/2}}
        = \frac{\exp\left\{ -\frac{1}{2} \sum_{i=1}^{n} (y_i - \nu_i)^2 / \lambda_i \right\}}{[(2\pi\lambda_1)(2\pi\lambda_2) \cdots (2\pi\lambda_n)]^{1/2}},   (6.46)

where \lambda_1, \dots, \lambda_n are the diagonal components of \Lambda. We assume that these values are all nonzero. The above pdf implies that Y_1, \dots, Y_n are independent random variables with means \nu_i and variances \lambda_i. In conclusion, it is possible to linearly transform a vector of jointly Gaussian random variables into a vector of independent Gaussian random variables.
It is always possible to select the matrix A that diagonalizes K so that \det(A) = 1. The transformation AX then corresponds to a rotation of the coordinate system so that the principal axes of the ellipsoid corresponding to the pdf are aligned to the axes of the system. Example 5.48 provides an n = 2 example of rotation.
In computer simulation models we frequently need to generate jointly Gaussian random vectors with a specified covariance matrix and mean vector. Suppose that X = (X_1, X_2, \dots, X_n) has components that are independent, zero-mean, unit-variance Gaussian random variables, so its mean vector is 0 and its covariance matrix is the identity matrix I. Let K denote the desired covariance matrix. Using the methods discussed in Section 6.3, it is possible to find a matrix A so that A^T A = K. Then Y = A^T X has zero mean vector and covariance matrix K_Y = A^T I A = K. From Eq. (6.45) we have that Y is also a jointly Gaussian random vector with zero mean vector and covariance K. If we require a nonzero mean vector m, we use Y + m.
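One common way to realize a factorization A^T A = K in practice is the Cholesky decomposition, K = L L^T with L lower triangular, so that A = L^T. The following is an illustrative sketch (not the book's prescribed method; numpy assumed, K and m hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

K = np.array([[2.0, 0.8],     # hypothetical desired covariance matrix
              [0.8, 1.0]])
m = np.array([1.0, -1.0])     # hypothetical desired mean vector

# Cholesky factorization K = L L^T; with A = L^T we have A^T A = K
L = np.linalg.cholesky(K)

nsamp = 200_000
U = rng.standard_normal((nsamp, 2))  # independent, zero-mean, unit-variance
Y = U @ L.T + m                      # each row: a sample with covariance K, mean m

print(np.round(np.cov(Y.T), 2))      # sample covariance, close to K
print(np.round(Y.mean(axis=0), 2))   # sample mean, close to m
```

Each row u is transformed to L u + m, whose covariance is L I L^T = K, matching the construction in the text.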
Example 6.24 Sum of Jointly Gaussian Random Variables
Let X_1, X_2, \dots, X_n be jointly Gaussian random variables with joint pdf given by Eq. (6.42a). Let

    Z = a_1 X_1 + a_2 X_2 + \dots + a_n X_n.

We will show that Z is always a Gaussian random variable.
We find the pdf of Z by introducing auxiliary random variables. Let

    Z_2 = X_2, \quad Z_3 = X_3, \quad \dots, \quad Z_n = X_n.

If we define \mathbf{Z} = (Z, Z_2, \dots, Z_n), then

    \mathbf{Z} = A\mathbf{X}

where

    A = \begin{bmatrix}
        a_1 & a_2 & \cdots & a_n \\
        0 & 1 & \cdots & 0 \\
        \vdots & & \ddots & \vdots \\
        0 & 0 & \cdots & 1
    \end{bmatrix}.

From Eq. (6.45) we have that \mathbf{Z} is jointly Gaussian with mean vector \boldsymbol{\nu} = A\mathbf{m} and covariance matrix C = A K A^T. Furthermore, it then follows that the marginal pdf of Z is a Gaussian pdf with mean given by the first component of \boldsymbol{\nu} and variance given by the 1-1 component of the covariance matrix C. By carrying out the above matrix multiplications, we find that

    E[Z] = \sum_{i=1}^{n} a_i E[X_i]   (6.47a)

    \mathrm{VAR}[Z] = \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j \, \mathrm{COV}(X_i, X_j).   (6.47b)
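In vector form, Eqs. (6.47a) and (6.47b) are just E[Z] = a^T m and VAR[Z] = a^T K a, which are easy to evaluate directly. A sketch with hypothetical numbers (numpy assumed):

```python
import numpy as np

# Hypothetical coefficients, means, and covariance matrix
a = np.array([1.0, -2.0, 0.5])
m = np.array([0.0, 1.0, 2.0])
K = np.array([[1.0, 0.2, 0.0],
              [0.2, 2.0, 0.3],
              [0.0, 0.3, 1.5]])

EZ = a @ m        # Eq. (6.47a): sum_i a_i E[X_i]
VZ = a @ K @ a    # Eq. (6.47b): double sum of a_i a_j COV(X_i, X_j)

print(EZ, VZ)     # -1.0 and 7.975
```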
*6.4.2 Joint Characteristic Function of a Gaussian Random Variable
The joint characteristic function is very useful in developing the properties of jointly Gaussian random variables. We now show that the joint characteristic function of n jointly Gaussian random variables X_1, X_2, \dots, X_n is given by

    \Phi_{X_1, X_2, \dots, X_n}(\omega_1, \omega_2, \dots, \omega_n) = e^{\, j \sum_{i=1}^{n} \omega_i m_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{k=1}^{n} \omega_i \omega_k \mathrm{COV}(X_i, X_k)},   (6.48a)

which can be written more compactly as follows:

    \Phi_{\mathbf{X}}(\boldsymbol{\omega}) \triangleq \Phi_{X_1, X_2, \dots, X_n}(\omega_1, \omega_2, \dots, \omega_n) = e^{\, j \boldsymbol{\omega}^T \mathbf{m} - \frac{1}{2} \boldsymbol{\omega}^T K \boldsymbol{\omega}},   (6.48b)

where m is the vector of means and K is the covariance matrix defined in Eq. (6.42b).
Equation (6.48) can be verified by direct integration (see Problem 6.65). We use the approach in [Papoulis] to develop Eq. (6.48) by using the result from Example 6.24 that a linear combination of jointly Gaussian random variables is always Gaussian. Consider the sum

    Z = a_1 X_1 + a_2 X_2 + \dots + a_n X_n.

The characteristic function of Z is given by

    \Phi_Z(\omega) = E[e^{j\omega Z}] = E[e^{j(\omega a_1 X_1 + \omega a_2 X_2 + \dots + \omega a_n X_n)}]
        = \Phi_{X_1, \dots, X_n}(a_1\omega, a_2\omega, \dots, a_n\omega).

On the other hand, since Z is a Gaussian random variable with mean and variance given by Eq. (6.47), we have

    \Phi_Z(\omega) = e^{\, j\omega E[Z] - \frac{1}{2}\mathrm{VAR}[Z]\omega^2}
        = e^{\, j\omega \sum_{i=1}^{n} a_i m_i - \frac{1}{2}\omega^2 \sum_{i=1}^{n} \sum_{k=1}^{n} a_i a_k \mathrm{COV}(X_i, X_k)}.

By equating both expressions for \Phi_Z(\omega) with \omega = 1, we finally obtain

    \Phi_{X_1, X_2, \dots, X_n}(a_1, a_2, \dots, a_n) = e^{\, j \sum_{i=1}^{n} a_i m_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{k=1}^{n} a_i a_k \mathrm{COV}(X_i, X_k)}   (6.49)
        = e^{\, j \mathbf{a}^T \mathbf{m} - \frac{1}{2} \mathbf{a}^T K \mathbf{a}}.   (6.50)

By replacing the a_i's with \omega_i's we obtain Eq. (6.48).
The marginal characteristic function of any subset of the random variables X_1, X_2, \dots, X_n can be obtained by setting the appropriate \omega_i's to zero. Thus, for example, the marginal characteristic function of X_1, X_2, \dots, X_m for m < n is obtained by setting \omega_{m+1} = \omega_{m+2} = \dots = \omega_n = 0. Note that the resulting characteristic function again corresponds to that of jointly Gaussian random variables with mean and covariance terms corresponding to the reduced set X_1, X_2, \dots, X_m.
The derivation leading to Eq. (6.50) suggests an alternative definition for jointly Gaussian random vectors:
Definition: X is a jointly Gaussian random vector if and only if every linear combination Z = a^T X is a Gaussian random variable.
In Example 6.24 we showed that if X is a jointly Gaussian random vector then the linear combination Z = aTX is a Gaussian random variable. Suppose that we do not
know the joint pdf of X but we are given that Z = aTX is a Gaussian random variable
for any choice of coefficients aT = 1a1 , a2 , Á , an2. This implies that Eqs. (6.48) and
(6.49) hold, which together imply Eq. (6.50) which states that X has the characteristic
function of a jointly Gaussian random vector.
The above definition is slightly broader than the definition using the pdf in Eq. (6.44).
The definition based on the pdf requires that the covariance in the exponent be invertible.
The above definition leads to the characteristic function of Eq. (6.50) which does not
require that the covariance be invertible. Thus the above definition allows for cases
where the covariance matrix is not invertible.
6.5
ESTIMATION OF RANDOM VARIABLES
In this book we will encounter two basic types of estimation problems. In the first type, we
are interested in estimating the parameters of one or more random variables, e.g., probabilities, means, variances, or covariances. In Chapter 1, we stated that relative frequencies can
be used to estimate the probabilities of events, and that sample averages can be used to estimate the mean and other moments of a random variable. In Chapters 7 and 8 we will
consider this type of estimation further. In this section, we are concerned with the second
type of estimation problem, where we are interested in estimating the value of an inaccessible random variable X in terms of the observation of an accessible random variable Y. For
example, X could be the input to a communication channel and Y could be the observed
output. In a prediction application, X could be a future value of some quantity and Y its
present value.
6.5.1
MAP and ML Estimators
We have considered estimation problems informally earlier in the book. For example, in estimating the output of a discrete communications channel we are interested in finding the most probable input given the observation Y = y, that is, the value of input x that maximizes P[X = x | Y = y]:

    \max_x P[X = x \,|\, Y = y].

In general we refer to the above estimator for X in terms of Y as the maximum a posteriori (MAP) estimator. The a posteriori probability is given by:

    P[X = x \,|\, Y = y] = \frac{P[Y = y \,|\, X = x] \, P[X = x]}{P[Y = y]}

and so the MAP estimator requires that we know the a priori probabilities P[X = x]. In some situations we know P[Y = y | X = x] but we do not know the a priori probabilities, so we select the estimator value x as the value that maximizes the likelihood of the observed value Y = y:

    \max_x P[Y = y \,|\, X = x].
We refer to this estimator of X in terms of Y as the maximum likelihood (ML) estimator.
We can define MAP and ML estimators when X and Y are continuous random variables by replacing events of the form {Y = y} by {y < Y < y + dy}. If X and Y are continuous, the MAP estimator for X given the observation Y = y is given by:

    \max_x f_X(x \,|\, Y = y),

and the ML estimator for X given the observation Y = y is given by:

    \max_x f_Y(y \,|\, X = x).
Example 6.25 Comparison of ML and MAP Estimators
Let X and Y be the random pair in Example 5.16. Find the MAP and ML estimators for X in terms of Y.
From Example 5.32, the conditional pdf of X given Y is given by:

    f_X(x \,|\, y) = e^{-(x - y)} \quad \text{for } y \le x,

which decreases as x increases beyond y. Therefore the MAP estimator is \hat{X}_{MAP} = y. On the other hand, the conditional pdf of Y given X is:

    f_Y(y \,|\, x) = \frac{e^{-y}}{1 - e^{-x}} \quad \text{for } 0 < y \le x.

As x increases beyond y, the denominator becomes larger so the conditional pdf decreases. Therefore the ML estimator is \hat{X}_{ML} = y. In this example the ML and MAP estimators agree.
Example 6.26 Jointly Gaussian Random Variables
Find the MAP and ML estimators of X in terms of Y when X and Y are jointly Gaussian random variables.
The conditional pdf of X given Y is given by:

    f_X(x \,|\, y) = \frac{ \exp\left\{ -\dfrac{1}{2(1 - \rho^2)\sigma_X^2} \left( x - \rho\dfrac{\sigma_X}{\sigma_Y}(y - m_Y) - m_X \right)^2 \right\} }{ \sqrt{2\pi\sigma_X^2(1 - \rho^2)} },

which is maximized by the value of x for which the exponent is zero. Therefore

    \hat{X}_{MAP} = \rho\frac{\sigma_X}{\sigma_Y}(y - m_Y) + m_X.

The conditional pdf of Y given X is:

    f_Y(y \,|\, x) = \frac{ \exp\left\{ -\dfrac{1}{2(1 - \rho^2)\sigma_Y^2} \left( y - \rho\dfrac{\sigma_Y}{\sigma_X}(x - m_X) - m_Y \right)^2 \right\} }{ \sqrt{2\pi\sigma_Y^2(1 - \rho^2)} },

which is also maximized for the value of x for which the exponent is zero:

    0 = y - \rho\frac{\sigma_Y}{\sigma_X}(x - m_X) - m_Y.

The ML estimator for X given Y = y is then:

    \hat{X}_{ML} = \frac{\sigma_X}{\rho\sigma_Y}(y - m_Y) + m_X.

Therefore we conclude that \hat{X}_{ML} \neq \hat{X}_{MAP}. In other words, knowledge of the a priori probabilities of X will affect the estimator.
6.5.2
Minimum MSE Linear Estimator
The estimate for X is given by a function of the observation, \hat{X} = g(Y). In general, the estimation error, X - \hat{X} = X - g(Y), is nonzero, and there is a cost associated with the error, c(X - g(Y)). We are usually interested in finding the function g(Y) that minimizes the expected value of the cost, E[c(X - g(Y))]. For example, if X and Y are the discrete input and output of a communication channel, and c is zero when X = g(Y) and one otherwise, then the expected value of the cost corresponds to the probability of error, that is, that X \neq g(Y). When X and Y are continuous random variables, we frequently use the mean square error (MSE) as the cost:

    e = E[(X - g(Y))^2].

In the remainder of this section we focus on this particular cost function. We first consider the case where g(Y) is constrained to be a linear function of Y, and then consider the case where g(Y) can be any function, whether linear or nonlinear.
First, consider the problem of estimating a random variable X by a constant a so that the mean square error is minimized:

    \min_a E[(X - a)^2] = E[X^2] - 2aE[X] + a^2.   (6.51)

The best a is found by taking the derivative with respect to a, setting the result to zero, and solving for a. The result is

    a^* = E[X],   (6.52)

which makes sense since the expected value of X is the center of mass of the pdf. The mean square error for this estimator is equal to E[(X - a^*)^2] = VAR(X).
Now consider estimating X by a linear function g(Y) = aY + b:

    \min_{a, b} E[(X - aY - b)^2].   (6.53a)

Equation (6.53a) can be viewed as the approximation of X - aY by the constant b. This is the minimization posed in Eq. (6.51) and the best b is

    b^* = E[X - aY] = E[X] - aE[Y].   (6.53b)

Substitution into Eq. (6.53a) implies that the best a is found by

    \min_a E[\{(X - E[X]) - a(Y - E[Y])\}^2].

We once again differentiate with respect to a, set the result to zero, and solve for a:

    0 = \frac{d}{da} E[\{(X - E[X]) - a(Y - E[Y])\}^2]
      = -2E[\{(X - E[X]) - a(Y - E[Y])\}(Y - E[Y])]
      = -2(\mathrm{COV}(X, Y) - a \, \mathrm{VAR}(Y)).   (6.54)

The best coefficient a is found to be

    a^* = \frac{\mathrm{COV}(X, Y)}{\mathrm{VAR}(Y)} = \rho_{X,Y} \frac{\sigma_X}{\sigma_Y},

where \sigma_Y = \sqrt{\mathrm{VAR}(Y)} and \sigma_X = \sqrt{\mathrm{VAR}(X)}. Therefore, the minimum mean square error (mmse) linear estimator for X in terms of Y is

    \hat{X} = a^* Y + b^* = \rho_{X,Y} \sigma_X \frac{Y - E[Y]}{\sigma_Y} + E[X].   (6.55)

The term (Y - E[Y])/\sigma_Y is simply a zero-mean, unit-variance version of Y. Thus \sigma_X(Y - E[Y])/\sigma_Y is a rescaled version of Y that has the variance of the random variable that is being estimated, namely \sigma_X^2. The term E[X] simply ensures that the estimator has the correct mean. The key term in the above estimator is the correlation coefficient: \rho_{X,Y} specifies the sign and extent of the estimate of Y relative to \sigma_X(Y - E[Y])/\sigma_Y. If X and Y are uncorrelated (i.e., \rho_{X,Y} = 0) then the best estimate for X is its mean, E[X]. On the other hand, if \rho_{X,Y} = \pm 1 then the best estimate is equal to \pm\sigma_X(Y - E[Y])/\sigma_Y + E[X].
We draw our attention to the second equality in Eq. (6.54):

    E[\{(X - E[X]) - a^*(Y - E[Y])\}(Y - E[Y])] = 0.   (6.56)

This equation is called the orthogonality condition because it states that the error of the best linear estimator, the quantity inside the braces, is orthogonal to the observation Y - E[Y]. The orthogonality condition is a fundamental result in mean square estimation.
The mean square error of the best linear estimator is

    e_L^* = E[((X - E[X]) - a^*(Y - E[Y]))^2]
          = E[((X - E[X]) - a^*(Y - E[Y]))(X - E[X])] - a^* E[((X - E[X]) - a^*(Y - E[Y]))(Y - E[Y])]
          = E[((X - E[X]) - a^*(Y - E[Y]))(X - E[X])]
          = \mathrm{VAR}(X) - a^* \mathrm{COV}(X, Y)
          = \mathrm{VAR}(X)(1 - \rho_{X,Y}^2)   (6.57)

where the second equality follows from the orthogonality condition. Note that when |\rho_{X,Y}| = 1, the mean square error is zero. This implies that P[|X - a^*Y - b^*| = 0] = P[X = a^*Y + b^*] = 1, so that X is essentially a linear function of Y.
6.5.3
Minimum MSE Estimator
In general the estimator for X that minimizes the mean square error is a nonlinear function of Y. The estimator g(Y) that best approximates X in the sense of minimizing mean square error must satisfy

    \underset{g(\cdot)}{\text{minimize}} \; E[(X - g(Y))^2].

The problem can be solved by using conditional expectation:

    E[(X - g(Y))^2] = E[E[(X - g(Y))^2 \,|\, Y]]
        = \int_{-\infty}^{\infty} E[(X - g(Y))^2 \,|\, Y = y] f_Y(y) \, dy.

The integrand above is positive for all y; therefore, the integral is minimized by minimizing E[(X - g(Y))^2 | Y = y] for each y. But g(y) is a constant as far as the conditional expectation is concerned, so the problem is equivalent to Eq. (6.51) and the "constant" that minimizes E[(X - g(y))^2 | Y = y] is

    g^*(y) = E[X \,|\, Y = y].   (6.58)

The function g^*(y) = E[X | Y = y] is called the regression curve, which simply traces the conditional expected value of X given the observation Y = y.
The mean square error of the best estimator is:

    e^* = E[(X - g^*(Y))^2] = \int_{-\infty}^{\infty} E[(X - E[X \,|\, y])^2 \,|\, Y = y] f_Y(y) \, dy
        = \int_{-\infty}^{\infty} \mathrm{VAR}[X \,|\, Y = y] f_Y(y) \, dy.

Linear estimators in general are suboptimal and have larger mean square errors.
Example 6.27 Comparison of Linear and Minimum MSE Estimators
Let X and Y be the random pair in Example 5.16. Find the best linear and nonlinear estimators for X in terms of Y, and of Y in terms of X.
Example 5.28 provides the parameters needed for the linear estimator: E[X] = 3/2, E[Y] = 1/2, VAR[X] = 5/4, VAR[Y] = 1/4, and \rho_{X,Y} = 1/\sqrt{5}. Example 5.32 provides the conditional pdf's needed to find the nonlinear estimator. The best linear and nonlinear estimators for X in terms of Y are:

    \hat{X} = \frac{1}{\sqrt{5}} \frac{\sqrt{5}}{2} \frac{Y - 1/2}{1/2} + \frac{3}{2} = Y + 1

    E[X \,|\, y] = \int_y^{\infty} x e^{-(x - y)} \, dx = y + 1, \quad \text{and so} \quad E[X \,|\, Y] = Y + 1.

Thus the optimum linear and nonlinear estimators are the same.

[FIGURE 6.2 Comparison of linear and nonlinear estimators: the estimator for Y given x plotted versus x.]

The best linear and nonlinear estimators for Y in terms of X are:

    \hat{Y} = \frac{1}{\sqrt{5}} \frac{1}{2} \frac{X - 3/2}{\sqrt{5}/2} + \frac{1}{2} = (X + 1)/5

    E[Y \,|\, x] = \int_0^x y \, \frac{e^{-y}}{1 - e^{-x}} \, dy = \frac{1 - e^{-x} - x e^{-x}}{1 - e^{-x}} = 1 - \frac{x e^{-x}}{1 - e^{-x}}.

The optimum linear and nonlinear estimators are not the same in this case. Figure 6.2 compares the two estimators. It can be seen that the linear estimator is close to E[Y | x] for lower values of x, where the joint pdf of X and Y is concentrated, and that it diverges from E[Y | x] for larger values of x.
Example 6.28
Let X be uniformly distributed in the interval (-1, 1) and let Y = X^2. Find the best linear estimator for Y in terms of X. Compare its performance to the best estimator.
The mean of X is zero, and its correlation with Y is

    E[XY] = E[X \cdot X^2] = \int_{-1}^{1} \frac{x^3}{2} \, dx = 0.

Therefore COV(X, Y) = 0 and the best linear estimator for Y is E[Y] by Eq. (6.55). The mean square error of this estimator is VAR(Y) by Eq. (6.57).
The best estimator is given by Eq. (6.58):

    E[Y \,|\, X = x] = E[X^2 \,|\, X = x] = x^2.

The mean square error of this estimator is

    E[(Y - g(X))^2] = E[(X^2 - X^2)^2] = 0.

Thus in this problem, the best linear estimator performs poorly while the nonlinear estimator gives the smallest possible mean square error, zero.
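A quick simulation illustrates the contrast (an illustrative sketch; numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, 1_000_000)  # X uniform on (-1, 1)
y = x ** 2                         # Y = X^2

cov_xy = np.cov(x, y, ddof=0)[0, 1]
print(cov_xy)                      # approximately 0: linear estimator is useless
print(np.mean((y - x ** 2) ** 2))  # 0.0: E[Y | X] = X^2 estimates Y perfectly
```

X and Y are uncorrelated yet completely dependent, which is exactly why the nonlinear regression curve succeeds where the linear estimator cannot.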
Example 6.29 Jointly Gaussian Random Variables
Find the minimum mean square error estimator of X in terms of Y when X and Y are jointly Gaussian random variables.
The minimum mean square error estimator is given by the conditional expectation of X given Y. From Eq. (5.63), we see that the conditional expectation of X given Y = y is given by

    E[X \,|\, Y = y] = E[X] + \rho_{X,Y} \frac{\sigma_X}{\sigma_Y} (y - E[Y]).

This is identical to the best linear estimator. Thus for jointly Gaussian random variables the minimum mean square error estimator is linear.
6.5.4
Estimation Using a Vector of Observations
The MAP, ML, and mean square estimators can be extended to the case where a vector of observations is available. Here we focus on mean square estimation. We wish to estimate X by a function g(Y) of a random vector of observations Y = (Y_1, Y_2, \dots, Y_n)^T so that the mean square error is minimized:

    \underset{g(\cdot)}{\text{minimize}} \; E[(X - g(\mathbf{Y}))^2].

To simplify the discussion we will assume that X and the Y_i have zero means. The same derivation that led to Eq. (6.58) leads to the optimum minimum mean square estimator:

    g^*(\mathbf{y}) = E[X \,|\, \mathbf{Y} = \mathbf{y}].   (6.59)

The minimum mean square error is then:

    E[(X - g^*(\mathbf{Y}))^2] = \int_{R^n} E[(X - E[X \,|\, \mathbf{Y}])^2 \,|\, \mathbf{Y} = \mathbf{y}] f_{\mathbf{Y}}(\mathbf{y}) \, d\mathbf{y}
        = \int_{R^n} \mathrm{VAR}[X \,|\, \mathbf{Y} = \mathbf{y}] f_{\mathbf{Y}}(\mathbf{y}) \, d\mathbf{y}.

Now suppose the estimate is a linear function of the observations:

    g(\mathbf{Y}) = \sum_{k=1}^{n} a_k Y_k = \mathbf{a}^T \mathbf{Y}.

The mean square error is now:

    E[(X - g(\mathbf{Y}))^2] = E\left[ \left( X - \sum_{k=1}^{n} a_k Y_k \right)^2 \right].

We take derivatives with respect to a_k and again obtain the orthogonality conditions:

    E\left[ \left( X - \sum_{k=1}^{n} a_k Y_k \right) Y_j \right] = 0 \quad \text{for } j = 1, \dots, n.

The orthogonality condition becomes:

    E[X Y_j] = E\left[ \left( \sum_{k=1}^{n} a_k Y_k \right) Y_j \right] = \sum_{k=1}^{n} a_k E[Y_k Y_j] \quad \text{for } j = 1, \dots, n.

We obtain a compact expression by introducing matrix notation:

    E[X\mathbf{Y}] = R_{\mathbf{Y}} \mathbf{a} \quad \text{where } \mathbf{a} = (a_1, a_2, \dots, a_n)^T,   (6.60)

where E[X\mathbf{Y}] = (E[XY_1], E[XY_2], \dots, E[XY_n])^T and R_{\mathbf{Y}} is the correlation matrix. Assuming R_{\mathbf{Y}} is invertible, the optimum coefficients are:

    \mathbf{a} = R_{\mathbf{Y}}^{-1} E[X\mathbf{Y}].   (6.61a)

We can use the methods from Section 6.3 to invert R_Y. The mean square error of the optimum linear estimator is:

    E[(X - \mathbf{a}^T\mathbf{Y})^2] = E[(X - \mathbf{a}^T\mathbf{Y})X] - E[(X - \mathbf{a}^T\mathbf{Y})\mathbf{a}^T\mathbf{Y}]
        = E[(X - \mathbf{a}^T\mathbf{Y})X] = \mathrm{VAR}(X) - \mathbf{a}^T E[\mathbf{Y}X].   (6.61b)

Now suppose that X has mean m_X and Y has mean vector m_Y, so our estimator now has the form:

    \hat{X} = g(\mathbf{Y}) = \sum_{k=1}^{n} a_k Y_k + b = \mathbf{a}^T \mathbf{Y} + b.   (6.62)

The same argument that led to Eq. (6.53b) implies that the optimum choice for b is:

    b = E[X] - \mathbf{a}^T \mathbf{m}_{\mathbf{Y}}.

Therefore the optimum linear estimator has the form:

    \hat{X} = g(\mathbf{Y}) = \mathbf{a}^T(\mathbf{Y} - \mathbf{m}_{\mathbf{Y}}) + m_X = \mathbf{a}^T \mathbf{Z} + m_X,

where Z = Y - m_Y is a random vector with zero mean vector. The mean square error for this estimator is:

    E[(X - g(\mathbf{Y}))^2] = E[(X - \mathbf{a}^T\mathbf{Z} - m_X)^2] = E[(W - \mathbf{a}^T\mathbf{Z})^2],

where W = X - m_X has zero mean. We have reduced the general estimation problem to one with zero-mean random variables, i.e., W and Z, which has solution given by Eq. (6.61a). Therefore the optimum set of linear predictors is given by:

    \mathbf{a} = R_{\mathbf{Z}}^{-1} E[W\mathbf{Z}] = K_{\mathbf{Y}}^{-1} E[(X - m_X)(\mathbf{Y} - \mathbf{m}_{\mathbf{Y}})].   (6.63a)

The mean square error is:

    E[(X - \mathbf{a}^T\mathbf{Y} - b)^2] = E[(W - \mathbf{a}^T\mathbf{Z})W] = \mathrm{VAR}(W) - \mathbf{a}^T E[W\mathbf{Z}]
        = \mathrm{VAR}(X) - \mathbf{a}^T E[(X - m_X)(\mathbf{Y} - \mathbf{m}_{\mathbf{Y}})].   (6.63b)

This result is of particular importance in the case where X and Y are jointly Gaussian random variables. In Example 6.23 we saw that the conditional expected value of X given Y is a linear function of Y of the form in Eq. (6.62). Therefore in this case the optimum minimum mean square estimator corresponds to the optimum linear estimator.
Example 6.30 Diversity Receiver
A radio receiver has two antennas to receive noisy versions of a signal X. The desired signal X is
a Gaussian random variable with zero mean and variance 2. The signals received in the first
and second antennas are Y1 = X + N1 and Y2 = X + N2 where N1 and N2 are zero-mean,
unit-variance Gaussian random variables. In addition, X, N1 , and N2 are independent random
variables. Find the optimum mean square error linear estimator for X based on a single antenna
signal and the corresponding mean square error. Compare the results to the optimum mean
square estimator for X based on both antenna signals Y = 1Y1 , Y22.
Since all random variables have zero mean, we only need the correlation matrix and the cross-correlation vector in Eq. (6.61):

R_Y = [ E[Y1^2]    E[Y1 Y2] ]  =  [ E[(X + N1)^2]        E[(X + N1)(X + N2)] ]
      [ E[Y1 Y2]   E[Y2^2]  ]     [ E[(X + N1)(X + N2)]  E[(X + N2)^2]       ]

    = [ E[X^2] + E[N1^2]   E[X^2]            ]  =  [ 3  2 ]
      [ E[X^2]             E[X^2] + E[N2^2]  ]     [ 2  3 ]

and

E[XY] = [ E[X Y1] ]  =  [ E[X^2] ]  =  [ 2 ]
        [ E[X Y2] ]     [ E[X^2] ]     [ 2 ].

The optimum estimator using a single antenna received signal involves solving the 1 x 1 version of the above system:

X^ = ( E[X^2] / (E[X^2] + E[N1^2]) ) Y1 = (2/3) Y1

and the associated mean square error is:

VAR(X) - a* COV(Y1, X) = 2 - (2/3)(2) = 2/3.

The coefficients of the optimum estimator using two antenna signals are:

a = R_Y^{-1} E[XY] = [ 3  2 ]^{-1} [ 2 ]  =  (1/5) [  3  -2 ] [ 2 ]  =  [ 0.4 ]
                     [ 2  3 ]      [ 2 ]           [ -2   3 ] [ 2 ]     [ 0.4 ]

and the optimum estimator is:

X^ = 0.4 Y1 + 0.4 Y2.

The mean square error for the two-antenna estimator is:

E[(X - a^T Y)^2] = VAR(X) - a^T E[YX] = 2 - [0.4, 0.4] [ 2 ]  =  0.4.
                                                       [ 2 ]
As expected, the two-antenna system has a smaller mean square error. Note that the receiver adds the two received signals and scales the result by 0.4. The sum of the signals is:

X^ = 0.4 Y1 + 0.4 Y2 = 0.4(2X + N1 + N2) = 0.8( X + (N1 + N2)/2 )

so combining the signals keeps the desired signal portion, X, constant while averaging the two noise signals N1 and N2. The problems at the end of the chapter explore this topic further.
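The arithmetic in this example is easy to verify numerically. A minimal sketch, in Python with numpy rather than the Octave used elsewhere in the text:

```python
import numpy as np

# Correlation matrix of (Y1, Y2) and cross-correlation E[XY] from Example 6.30.
RY = np.array([[3.0, 2.0],
               [2.0, 3.0]])
cXY = np.array([2.0, 2.0])
var_X = 2.0

# Single-antenna estimator: a* = E[X^2]/(E[X^2] + E[N1^2]) = 2/3.
a1 = 2.0 / 3.0
mse1 = var_X - a1 * 2.0          # = 2/3

# Two-antenna estimator: a = R_Y^{-1} E[XY].
a = np.linalg.solve(RY, cXY)     # -> (0.4, 0.4)
mse2 = var_X - a @ cXY           # = 0.4
```

The two-antenna mean square error (0.4) is indeed smaller than the single-antenna error (2/3), as the example concludes.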
Example 6.31 Second-Order Prediction of Speech
Let X1, X2, ... be a sequence of samples of a speech voltage waveform, and suppose that the samples are fed into the second-order predictor shown in Fig. 6.3. Find the set of predictor coefficients a and b that minimize the mean square value of the predictor error when Xn is estimated by a X_{n-2} + b X_{n-1}.
We find the best predictor for X1 , X2 , and X3 and assume that the situation is identical for
X2, X3, and X4, and so on. It is common practice to model speech samples as having zero mean and variance σ^2, and a covariance that does not depend on the specific index of the samples, but rather on the separation between them:

COV(Xj, Xk) = ρ_{|j-k|} σ^2.

The equation for the optimum linear predictor coefficients becomes

σ^2 [ 1    ρ1 ] [ a ]  =  σ^2 [ ρ2 ]
    [ ρ1   1  ] [ b ]         [ ρ1 ].

Equation (6.61a) gives

a = (ρ2 - ρ1^2) / (1 - ρ1^2)   and   b = ρ1(1 - ρ2) / (1 - ρ1^2).

FIGURE 6.3
A two-tap linear predictor for processing speech.
In Problem 6.78, you are asked to show that the mean square error using the above values of a and b is

σ^2 { 1 - ρ1^2 - (ρ1^2 - ρ2)^2 / (1 - ρ1^2) }.    (6.64)

Typical values for speech signals are ρ1 = .825 and ρ2 = .562. The mean square value of the predictor output is then .281σ^2. The lower variance of the output (.281σ^2) relative to the input variance (σ^2) shows that the linear predictor is effective in anticipating the next sample in terms of the two previous samples. The order of the predictor can be increased by using more terms in the linear predictor. Thus a third-order predictor has three terms and involves inverting a 3 x 3 correlation matrix, and an n-th order predictor will involve an n x n matrix. Linear predictive techniques are used extensively in speech, audio, image, and video compression systems. We discuss linear prediction methods in greater detail in Chapter 10.
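The two-tap coefficients and the predictor error can be computed directly from the correlations; the sketch below does so in Python with numpy (the text's own computations are analytic and its later examples use Octave). As a consistency check, it verifies that solving the linear system reproduces the closed forms for a and b, and that the orthogonality-based error σ^2(1 - aρ2 - bρ1) agrees with Eq. (6.64).

```python
import numpy as np

# Correlation coefficients at lags 1 and 2, as quoted for speech signals.
r1, r2 = 0.825, 0.562

# Solve [[1, r1], [r1, 1]] [a, b]^T = [r2, r1]^T  (the common sigma^2 cancels).
a, b = np.linalg.solve(np.array([[1.0, r1],
                                 [r1, 1.0]]),
                       np.array([r2, r1]))

# Closed forms from the text.
a_closed = (r2 - r1**2) / (1 - r1**2)
b_closed = r1 * (1 - r2) / (1 - r1**2)

# Normalized mean square error, Eq. (6.64), and the orthogonality form.
mse_eq664 = 1 - r1**2 - (r1**2 - r2)**2 / (1 - r1**2)
mse_orth = 1 - a * r2 - b * r1
```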
*6.6
GENERATING CORRELATED VECTOR RANDOM VARIABLES
Many applications involve vectors or sequences of correlated random variables. Computer simulation models of such applications therefore require methods for generating
such random variables. In this section we present methods for generating vectors of
random variables with specified covariance matrices. We also discuss the generation of
jointly Gaussian vector random variables.
6.6.1
Generating Random Vectors with Specified Covariance Matrix
Suppose we wish to generate a random vector Y with an arbitrary valid covariance matrix K_Y. Let Y = A^T X as in Example 6.17, where X is a vector random variable with components that are uncorrelated, zero mean, and unit variance. X has covariance matrix equal to the identity matrix, K_X = I, so m_Y = A^T m_X = 0 and

K_Y = A^T K_X A = A^T A.
Let P be the matrix whose columns are the eigenvectors of K_Y and let Λ be the diagonal matrix of eigenvalues; then from Eq. (6.39b) we have:

P^T K_Y P = P^T P Λ = Λ.

If we premultiply the above equation by P and then postmultiply by P^T, we obtain an expression for an arbitrary covariance matrix K_Y in terms of its eigenvalues and eigenvectors:

P Λ P^T = P P^T K_Y P P^T = K_Y.    (6.65)
Define the matrix Λ^{1/2} as the diagonal matrix of the square roots of the eigenvalues:

Λ^{1/2} = [ √λ1    0    ...   0   ]
          [  0    √λ2   ...   0   ]
          [  .     .    ...   .   ]
          [  0     0    ...  √λn  ].
In Problem 6.53 we show that any covariance matrix K_Y is positive semi-definite, which implies that it has nonnegative eigenvalues, and so taking the square roots is always possible. If we now let

A = (P Λ^{1/2})^T    (6.66)

then

A^T A = P Λ^{1/2} Λ^{1/2} P^T = P Λ P^T = K_Y.
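The factorization in Eq. (6.66) can be sketched in a few lines of Python with numpy (the text's later examples use Octave for the same computation; the function name here is mine):

```python
import numpy as np

def transform_from_covariance(K):
    """Return A such that A^T A = K, using K = P Lambda P^T and
    A = (P Lambda^{1/2})^T as in Eq. (6.66)."""
    lam, P = np.linalg.eigh(K)          # eigenvalues and orthonormal eigenvectors
    if np.any(lam < -1e-10):
        raise ValueError("K is not positive semi-definite")
    # Scale each eigenvector column by sqrt(lambda), then transpose.
    return (P * np.sqrt(np.clip(lam, 0.0, None))).T

# The covariance matrix used in Example 6.32.
K = np.array([[4.0, 2.0],
              [2.0, 4.0]])
A = transform_from_covariance(K)
# A^T A reproduces K.
```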
Therefore Y has the desired covariance matrix K Y .
Example 6.32
Let X = (X1, X2) consist of two zero-mean, unit-variance, uncorrelated random variables. Find the matrix A such that Y = AX has covariance matrix

K = [ 4  2 ]
    [ 2  4 ].

First we need to find the eigenvalues of K, which are determined from the following equation:

det(K - λI) = det [ 4-λ    2  ]  =  (4 - λ)^2 - 4 = λ^2 - 8λ + 12 = (λ - 6)(λ - 2) = 0.
                  [  2    4-λ ]

We find the eigenvalues to be λ1 = 2 and λ2 = 6. Next we need to find the eigenvectors corresponding to each eigenvalue:

[ 4  2 ] [ e1 ]  =  λ1 [ e1 ]  =  2 [ e1 ]
[ 2  4 ] [ e2 ]        [ e2 ]       [ e2 ]

which implies that 2e1 + 2e2 = 0. Thus any vector of the form [1, -1]^T is an eigenvector. We choose the normalized eigenvector corresponding to λ1 = 2 as e1 = [1/√2, -1/√2]^T. We similarly find the eigenvector corresponding to λ2 = 6 as e2 = [1/√2, 1/√2]^T.

The method developed in Section 6.3 requires that we form the matrix P whose columns consist of the eigenvectors of K:

P = (1/√2) [  1   1 ]
           [ -1   1 ].

Next it requires that we form the diagonal matrix with elements equal to the square roots of the eigenvalues:

Λ^{1/2} = [ √2   0  ]
          [ 0    √6 ].

The desired matrix is then

A = P Λ^{1/2} = [  1   √3 ]
                [ -1   √3 ].

You should verify that K = A A^T.
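A quick numeric check of this example (in Python with numpy rather than the Octave used in the next example):

```python
import numpy as np

K = np.array([[4.0, 2.0],
              [2.0, 4.0]])

# Eigenvalues of K, in ascending order: 2 and 6.
lam = np.linalg.eigvalsh(K)

# The matrix A = P Lambda^{1/2} found in the example.
A = np.array([[ 1.0, np.sqrt(3.0)],
              [-1.0, np.sqrt(3.0)]])
# A A^T reproduces K, as the example asks you to verify.
```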
Example 6.33
Use Octave to find the eigenvalues and eigenvectors calculated in the previous example.
After entering the matrix K, we use the eig(K) function to find the matrix of eigenvectors P and the diagonal matrix of eigenvalues Λ. We then find A and its transpose A^T. Finally we confirm that A^T A gives the
desired covariance matrix.
> K=[4, 2; 2, 4];
> [P,D]=eig(K)
P =
  -0.70711   0.70711
   0.70711   0.70711
D =
   2   0
   0   6
> A=(P*sqrt(D))'
A =
  -1.0000   1.0000
   1.7321   1.7321
> A'
ans =
  -1.0000   1.7321
   1.0000   1.7321
> A'*A
ans =
   4.0000   2.0000
   2.0000   4.0000
The above steps can be used to find the transformation A^T for any desired covariance matrix K. The only check required is to ascertain that K is a valid covariance matrix: (1) K is symmetric (trivial); (2) K has nonnegative eigenvalues (easy to check numerically).
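The two checks above can be packaged as a small helper; this is a numpy sketch (the function name is mine, and Python stands in for the text's Octave):

```python
import numpy as np

def is_valid_covariance(K, tol=1e-10):
    """Check (1) symmetry and (2) nonnegative eigenvalues."""
    K = np.asarray(K, dtype=float)
    symmetric = np.allclose(K, K.T, atol=tol)
    # Only compute eigenvalues when K is symmetric (eigvalsh assumes symmetry).
    return symmetric and bool(np.all(np.linalg.eigvalsh(K) >= -tol))

# e.g. [[4, 2], [2, 4]] is valid; [[1, 2], [2, 1]] is symmetric but indefinite.
```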
6.6.2
Generating Vectors of Jointly Gaussian Random Variables
In Section 6.4 we found that if X is a vector of jointly Gaussian random variables with
covariance KX , then Y = AX is also jointly Gaussian with covariance matrix
KY = AKXAT. If we assume that X consists of unit-variance, uncorrelated random
variables, then KX = I, the identity matrix, and therefore KY = AAT.
We can use the method from the first part of this section to find A for any desired
covariance matrix KY . We generate jointly Gaussian random vectors Y with arbitrary
covariance matrix KY and mean vector m Y as follows:
1. Find a matrix A such that KY = AAT.
2. Use the method from Section 5.10 to generate X consisting of n independent,
zero-mean, Gaussian random variables.
3. Let Y = AX + m Y.
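The three steps can be sketched as follows in Python with numpy (the book's Example 6.34 carries them out in Octave; the function name here is mine):

```python
import numpy as np

def gaussian_vector_samples(K, m, n, rng):
    """Generate n jointly Gaussian column vectors with covariance K, mean m."""
    lam, P = np.linalg.eigh(K)                 # step 1: factor K = A A^T
    A = P * np.sqrt(np.clip(lam, 0.0, None))   # A = P Lambda^{1/2}
    X = rng.standard_normal((len(m), n))       # step 2: iid N(0,1) components
    return A @ X + np.asarray(m)[:, None]      # step 3: Y = A X + m_Y

rng = np.random.default_rng(1)
K = np.array([[4.0, 2.0],
              [2.0, 4.0]])
Y = gaussian_vector_samples(K, [0.0, 0.0], 100_000, rng)
# The sample covariance np.cov(Y) should be close to K.
```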
Example 6.34
The Octave commands below show the steps needed to generate jointly Gaussian random variables with the covariance matrix from Example 6.32.
> U1=rand(1000,1);        % Create a 1000-element vector U1.
> U2=rand(1000,1);        % Create a 1000-element vector U2.
> R2=-2*log(U1);          % Find R^2.
> TH=2*pi*U2;             % Find Theta.
> X1=sqrt(R2).*sin(TH);   % Generate X1.
> X2=sqrt(R2).*cos(TH);   % Generate X2.
> Y1=X1+sqrt(3)*X2;       % Generate Y1.
> Y2=-X1+sqrt(3)*X2;      % Generate Y2.
> plot(Y1,Y2,'+')         % Plot scattergram.
We plotted the Y1 values vs. the Y2 values for 1000 pairs of generated random variables in
a scattergram as shown in Fig. 6.4. Good agreement with the elliptical symmetry of the desired
jointly Gaussian pdf is observed.
FIGURE 6.4
Scattergram of jointly Gaussian random variables.
SUMMARY
• The joint statistical behavior of a vector of random variables X is specified by
the joint cumulative distribution function, the joint probability mass function,
or the joint probability density function. The probability of any event involving the joint behavior of these random variables can be computed from these
functions.
• The statistical behavior of subsets of random variables from a vector X is specified by the marginal cdf, marginal pdf, or marginal pmf that can be obtained from
the joint cdf, joint pdf, or joint pmf of X.
• A set of random variables is independent if the probability of a product-form
event is equal to the product of the probabilities of the component events. Equivalent conditions for the independence of a set of random variables are that the
joint cdf, joint pdf, or joint pmf factors into the product of the corresponding marginal functions.
• The statistical behavior of a subset of random variables from a vector X, given
the exact values of the other random variables in the vector, is specified by the
conditional cdf, conditional pmf, or conditional pdf. Many problems naturally
lend themselves to a solution that involves conditioning on the values of some of
the random variables. In these problems, the expected value of random variables
can be obtained through the use of conditional expectation.
• The mean vector and the covariance matrix provide summary information about
a vector random variable. The joint characteristic function contains all of the information provided by the joint pdf.
• Transformations of vector random variables generate other vector random variables. Standard methods are available for finding the joint distributions of the
new random vectors.
• The orthogonality condition provides a set of linear equations for finding the
minimum mean square linear estimate. The best mean square estimator is given
by the conditional expected value.
• The joint pdf of a vector X of jointly Gaussian random variables is determined by
the vector of the means and by the covariance matrix. All marginal pdf’s and conditional pdf’s of subsets of X have Gaussian pdf’s. Any linear function or linear
transformation of jointly Gaussian random variables will result in a set of jointly
Gaussian random variables.
• A vector of random variables with an arbitrary covariance matrix can be generated by taking a linear transformation of a vector of unit-variance, uncorrelated
random variables. A vector of Gaussian random variables with an arbitrary covariance matrix can be generated by taking a linear transformation of a vector of
independent, unit-variance jointly Gaussian random variables.
CHECKLIST OF IMPORTANT TERMS
Conditional cdf
Conditional expectation
Conditional pdf
Conditional pmf
Correlation matrix
Covariance matrix
Independent random variables
Jacobian of a transformation
Joint cdf
Joint characteristic function
Joint pdf
Joint pmf
Jointly continuous random variables
Jointly Gaussian random variables
Karhunen-Loeve expansion
MAP estimator
Marginal cdf
Marginal pdf
Marginal pmf
Maximum likelihood estimator
Mean square error
Mean vector
MMSE linear estimator
Orthogonality condition
Product-form event
Regression curve
Vector random variables
ANNOTATED REFERENCES
Reference [3] provides excellent coverage on linear transformation and jointly
Gaussian random variables. Reference [5] provides excellent coverage of vector
random variables. The book by Anton [6] provides an accessible introduction to linear
algebra.
1. A. Papoulis and S. Pillai, Probability, Random Variables, and Stochastic Processes,
McGraw-Hill, New York, 2002.
2. N. Johnson et al., Continuous Multivariate Distributions, Wiley, New York, 2000.
3. H. Cramér, Mathematical Methods of Statistics, Princeton University Press, Princeton, N.J., 1999.
4. R. Gray and L.D. Davisson, An Introduction to Statistical Signal Processing,
Cambridge Univ. Press, Cambridge, UK, 2005.
5. H. Stark and J. W. Woods, Probability, Random Processes, and Estimation Theory
for Engineers, Prentice Hall, Englewood Cliffs, N.J., 1986.
6. H. Anton, Elementary Linear Algebra, 9th ed., Wiley, New York, 2005.
7. C. H. Edwards, Jr., and D. E. Penney, Calculus and Analytic Geometry, 4th ed.,
Prentice Hall, Englewood Cliffs, N.J., 1984.
PROBLEMS
Section 6.1: Vector Random Variables
6.1. The point X = (X, Y, Z) is uniformly distributed inside a sphere of radius 1 about the origin. Find the probability of the following events:
(a) X is inside a sphere of radius r, r > 0.
(b) X is inside a cube of length 2/√3 centered about the origin.
(c) All components of X are positive.
(d) Z is negative.
6.2. A random sinusoid signal is given by X(t) = A sin(t) where A is a uniform random variable in the interval [0, 1]. Let X = (X(t1), X(t2), X(t3)) be samples of the signal taken at times t1, t2, and t3.
(a) Find the joint cdf of X in terms of the cdf of A if t1 = 0, t2 = π/2, and t3 = π. Are X(t1), X(t2), X(t3) independent random variables?
(b) Find the joint cdf of X for t1, t2 = t1 + π/2, and t3 = t1 + π. Let t1 = π/6.
6.3. Let the random variables X, Y, and Z be independent. Find the following probabilities in terms of F_X(x), F_Y(y), and F_Z(z).
(a) P[|X| < 5, Y < 4, Z^3 > 8].
(b) P[X = 5, Y < 0, Z > 1].
(c) P[min(X, Y, Z) < 2].
(d) P[max(X, Y, Z) > 6].
6.4. A radio transmitter sends a signal s > 0 to a receiver using three paths. The signals that arrive at the receiver along each path are:

X1 = s + N1,  X2 = s + N2,  and  X3 = s + N3,

where N1, N2, and N3 are independent Gaussian random variables with zero mean and unit variance.
(a) Find the joint pdf of X = (X1, X2, X3). Are X1, X2, and X3 independent random variables?
(b) Find the probability that the minimum of all three signals is positive.
(c) Find the probability that a majority of the signals are positive.
6.5. An urn contains one black ball and two white balls. Three balls are drawn from the urn.
Let Ik = 1 if the outcome of the kth draw is the black ball and let Ik = 0 otherwise. Define
the following three random variables:
X = I1 + I2 + I3,
Y = min{I1, I2, I3},
Z = max{I1, I2, I3}.
(a) Specify the range of values of the triplet (X, Y, Z) if each ball is put back into the urn
after each draw; find the joint pmf for (X, Y, Z).
(b) In part a, are X, Y, and Z independent? Are X and Y independent?
(c) Repeat part a if each ball is not put back into the urn after each draw.
6.6. Consider the packet switch in Example 6.1. Suppose that each input has one packet with
probability p and no packets with probability 1 - p. Packets are equally likely to be
destined to each of the outputs. Let X1, X2 and X3 be the number of packet arrivals destined for output 1, 2, and 3, respectively.
(a) Find the joint pmf of X1, X2, and X3. Hint: Imagine that every input has a packet go to a fictional port 4 with probability 1 - p.
(b) Find the joint pmf of X1 and X2 .
(c) Find the pmf of X2 .
(d) Are X1 , X2 , and X3 independent random variables?
(e) Suppose that each output will accept at most one packet and discard all additional
packets destined to it. Find the average number of packets discarded by the module
in each T-second period.
6.7. Let X, Y, Z have joint pdf

f_{X,Y,Z}(x, y, z) = k(x + y + z)  for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, 0 ≤ z ≤ 1.

(a) Find k.
(b) Find f_X(x | y, z) and f_Z(z | x, y).
(c) Find f_X(x), f_Y(y), and f_Z(z).
6.8. A point X = (X, Y, Z) is selected at random inside the unit sphere.
(a) Find the marginal joint pdf of Y and Z.
(b) Find the marginal pdf of Y.
(c) Find the conditional joint pdf of X and Y given Z.
(d) Are X, Y, and Z independent random variables?
(e) Find the joint pdf of X given that the distance from X to the origin is greater than 1/2 and all the components of X are positive.
6.9. Show that p_{X1,X2,X3}(x1, x2, x3) = p_{X3}(x3 | x1, x2) p_{X2}(x2 | x1) p_{X1}(x1).
6.10. Let X1, X2, ..., Xn be binary random variables taking on values 0 or 1 to denote whether a speaker is silent (0) or active (1). A silent speaker remains idle at the next time slot with probability 3/4, and an active speaker remains active with probability 1/2. Find the joint pmf for X1, X2, X3, and the marginal pmf of X3. Assume that the speaker begins in the silent state.
6.11. Show that f_{X,Y,Z}(x, y, z) = f_Z(z | x, y) f_Y(y | x) f_X(x).
6.12. Let U1, U2, and U3 be independent random variables and let X = U1, Y = U1 + U2, and Z = U1 + U2 + U3.
(a) Use the result in Problem 6.11 to find the joint pdf of X, Y, and Z.
(b) Let the Ui be independent uniform random variables in the interval [0, 1]. Find the marginal joint pdf of Y and Z. Find the marginal pdf of Z.
(c) Let the Ui be independent zero-mean, unit-variance Gaussian random variables. Find the marginal joint pdf of Y and Z. Find the marginal pdf of Z.
6.13. Let X1, X2, and X3 be the multiplicative sequence in Example 6.7.
(a) Find, plot, and compare the marginal pdfs of X1, X2, and X3.
(b) Find the conditional pdf of X3 given X1 = x.
(c) Find the conditional pdf of X1 given X3 = z.
6.14. Requests at an online music site are categorized as follows: requests for the most popular title with p1 = 1/2; the second most popular title with p2 = 1/4; the third most popular title with p3 = 1/8; and other titles with p4 = 1 - p1 - p2 - p3 = 1/8. Suppose there are a total of
n requests in T seconds. Let Xk be the number of times category k occurs.
(a) Find the joint pmf of (X1, X2, X3).
(b) Find the marginal pmf of (X1, X2). Hint: Use the binomial theorem.
(c) Find the marginal pmf of X1.
(d) Find the conditional joint pmf of (X2, X3) given X1 = m, where 0 ≤ m ≤ n.
6.15. The number N of requests at the online music site in Problem 6.14 is a Poisson random variable with mean α customers per second. Let Xk be the number of type k requests in T seconds. Find the joint pmf of (X1, X2, X3, X4).
6.16. A random experiment has four possible outcomes. Suppose that the experiment is repeated n independent times and let Xk be the number of times outcome k occurs. The joint pmf of (X1, X2, X3) is given by

p(k1, k2, k3) = n! 3! / (n + 3)! = [ (n+3 choose 3) ]^{-1}  for 0 ≤ ki and k1 + k2 + k3 ≤ n.

(a) Find the marginal pmf of (X1, X2).
(b) Find the marginal pmf of X1.
(c) Find the conditional joint pmf of (X2, X3) given X1 = m, where 0 ≤ m ≤ n.
6.17. The numbers of requests of types 1, 2, and 3, respectively, arriving at a service station in t seconds are independent Poisson random variables with means λ1 t, λ2 t, and λ3 t. Let N1, N2, and N3 be the number of requests that arrive during an exponentially distributed time T with mean αt.
(a) Find the joint pmf of N1, N2, and N3.
(b) Find the marginal pmf of N1.
(c) Find the conditional pmf of N1 and N2, given N3.
Section 6.2: Functions of Several Random Variables
6.18. N devices are installed at the same time. Let Y be the time until the first device fails.
(a) Find the pdf of Y if the lifetimes of the devices are independent and have the same
Pareto distribution.
(b) Repeat part a if the device lifetimes have a Weibull distribution.
6.19. In Problem 6.18 let I_k(t) be the indicator function for the event "kth device is still working at time t." Let N(t) be the number of devices still working at time t: N(t) = I_1(t) + I_2(t) + ... + I_N(t). Find the pmf of N(t) as well as its mean and variance.
6.20. A diversity receiver receives N independent versions of a signal. Each signal version has an amplitude X_k that is Rayleigh distributed. The receiver selects the signal with the largest squared amplitude X_k^2. A signal is not useful if the squared amplitude falls below a threshold γ. Find the probability that all N signals are below the threshold.
6.21. (Haykin) A receiver in a multiuser communication system accepts K binary signals from K independent transmitters: Y = (Y1, Y2, ..., YK), where Yk is the received signal from the kth transmitter. In an ideal system the received vector is given by:

Y = Ab + N

where A = [a_k] is a diagonal matrix of positive channel gains, b = (b1, b2, ..., bK) is the vector of bits from each of the transmitters, where b_k = ±1, and N is a vector of K
independent zero-mean, unit-variance Gaussian random variables.
(a) Find the joint pdf of Y.
(b) Suppose b = (1, 1, ..., 1); find the probability that all components of Y are positive.
6.22. (a) Find the joint pdf of U = X1 , V = X1 + X2 , and W = X1 + X2 + X3 .
(b) Evaluate the joint pdf of (U, V, W) if the Xi are independent zero-mean, unit variance Gaussian random variables.
(c) Find the marginal pdf of V and of W.
6.23. (a) Find the joint pdf of the sample mean and variance of two random variables:

M = (X1 + X2)/2,   V = [ (X1 - M)^2 + (X2 - M)^2 ] / 2,

in terms of the joint pdf of X1 and X2.
(b) Evaluate the joint pdf if X1 and X2 are independent Gaussian random variables with the same mean 1 and variance 1.
(c) Evaluate the joint pdf if X1 and X2 are independent exponential random variables with the same parameter 1.
6.24. (a) Use the auxiliary variable method to find the pdf of

Z = X / (X + Y).

(b) Find the pdf of Z if X and Y are independent exponential random variables with the parameter 1.
(c) Repeat part b if X and Y are independent Pareto random variables with parameters k = 2 and x_m = 1.
6.25. Repeat Problem 6.24 parts a and b for Z = X/Y.
6.26. Let X and Y be zero-mean, unit-variance Gaussian random variables with correlation coefficient 1/2. Find the joint pdf of U = X^2 and V = Y^4.
6.27. Use auxiliary variables to find the pdf of Z = X1 X2 X3 where the Xi are independent random variables that are uniformly distributed in [0, 1].
6.28. Let X, Y, and Z be independent zero-mean, unit-variance Gaussian random variables.
(a) Find the pdf of R = (X^2 + Y^2 + Z^2)^{1/2}.
(b) Find the pdf of R^2 = X^2 + Y^2 + Z^2.
6.29. Let X1, X2, X3, X4 be processed as follows:

Y1 = X1,  Y2 = X1 + X2,  Y3 = X2 + X3,  Y4 = X3 + X4.

(a) Find an expression for the joint pdf of Y = (Y1, Y2, Y3, Y4) in terms of the joint pdf of X = (X1, X2, X3, X4).
(b) Find the joint pdf of Y if X1, X2, X3, X4 are independent zero-mean, unit-variance Gaussian random variables.
Section 6.3: Expected Values of Vector Random Variables
6.30. Find E[M], E[V], and E[MV] in Problem 6.23c.
6.31. Compute E[Z] in Problem 6.27 in two ways:
(a) by integrating over f_Z(z);
(b) by integrating over the joint pdf of (X1, X2, X3).
6.32. Find the mean vector and covariance matrix for the three multipath signals X = (X1, X2, X3) in Problem 6.4.
6.33. Find the mean vector and covariance matrix for the samples of the sinusoidal signal X = (X(t1), X(t2), X(t3)) in Problem 6.2.
6.34. (a) Find the mean vector and covariance matrix for (X, Y, Z) in Problem 6.5a.
(b) Repeat part a for Problem 6.5c.
6.35. Find the mean vector and covariance matrix for (X, Y, Z) in Problem 6.7.
6.36. Find the mean vector and covariance matrix for the point (X, Y, Z) inside the unit sphere
in Problem 6.8.
6.37. (a) Use the results of Problem 6.6c to find the mean vector for the packet arrivals
X1 , X2 , and X3 in Example 6.5.
(b) Use the results of Problem 6.6b to find the covariance matrix.
(c) Explain why X1 , X2 , and X3 are correlated.
6.38. Find the mean vector and covariance matrix for the joint number of packet arrivals in a
random time N1 , N2 , and N3 in Problem 6.17. Hint: Use conditional expectation.
6.39. (a) Find the mean vector and covariance matrix of (U, V, W) in terms of those of (X1, X2, X3) in Problem 6.22b.
(b) Find the cross-covariance matrix between (U, V, W) and (X1, X2, X3).
6.40. (a) Find the mean vector and covariance matrix of Y = (Y1, Y2, Y3, Y4) in terms of those of X = (X1, X2, X3, X4) in Problem 6.29.
(b) Find the cross-covariance matrix between Y and X.
(c) Evaluate the mean vector, covariance, and cross-covariance matrices if X1, X2, X3, X4 are independent random variables.
(d) Generalize the results in part c to Y = (Y1, Y2, ..., Yn-1, Yn).
6.41. Let X = (X1, X2, X3, X4) consist of equal-mean, independent, unit-variance random variables. Find the mean vector, covariance, and cross-covariance matrices of Y = AX:

(a) A = [ 1   1/2   1/4   1/8 ]
        [ 0    1    1/2   1/4 ]
        [ 0    0     1    1/2 ]
        [ 0    0     0     1  ]

(b) A = [ 1    1    1    1 ]
        [ 1   -1    1   -1 ]
        [ 1    1   -1   -1 ]
        [ 1   -1   -1    1 ].
6.42. Let W = aX + bY + c, where X and Y are random variables.
(a) Find the characteristic function of W in terms of the joint characteristic function of
X and Y.
(b) Find the characteristic function of W if X and Y are the random variables discussed
in Example 6.19. Find the pdf of W.
6.43. (a) Find the joint characteristic function of the jointly Gaussian random variables X and Y introduced in Example 5.45. Hint: Consider X and Y as a transformation of the independent Gaussian random variables V and W.
(b) Find E[X^2 Y].
(c) Find the joint characteristic function of X' = X + a and Y' = Y + b.
6.44. Let X = aU + bV and Y = cU + dV, where |ad - bc| ≠ 0.
(a) Find the joint characteristic function of X and Y in terms of the joint characteristic function of U and V.
(b) Find an expression for E[XY] in terms of joint moments of U and V.
6.45. Let X and Y be nonnegative, integer-valued random variables. The joint probability generating function is defined by

G_{X,Y}(z1, z2) = E[z1^X z2^Y] = Σ_{j=0}^∞ Σ_{k=0}^∞ z1^j z2^k P[X = j, Y = k].

(a) Find the joint pgf for two independent Poisson random variables with parameters α1 and α2.
(b) Find the joint pgf for two independent binomial random variables with parameters (n, p) and (m, p).
6.46. Suppose that X and Y have joint pgf

G_{X,Y}(z1, z2) = exp[ α1(z1 - 1) + α2(z2 - 1) + β(z1 z2 - 1) ].

(a) Use the marginal pgf's to show that X and Y are Poisson random variables.
(b) Find the pgf of Z = X + Y. Is Z a Poisson random variable?
6.47. Let X and Y be trinomial random variables with joint pmf

P[X = j, Y = k] = ( n! / ( j! k! (n - j - k)! ) ) p1^j p2^k (1 - p1 - p2)^{n-j-k}  for 0 ≤ j, k and j + k ≤ n.

(a) Find the joint pgf of X and Y.
(b) Find the correlation and covariance of X and Y.
6.48. Find the mean vector and covariance matrix for (X, Y) in Problem 6.46.
6.49. Find the mean vector and covariance matrix for (X, Y) in Problem 6.47.
6.50. Let X = (X1, X2) have covariance matrix:

K_X = [ 1     1/4 ]
      [ 1/4   1   ].

(a) Find the eigenvalues and eigenvectors of K_X.
(b) Find the orthogonal matrix P that diagonalizes K_X. Verify that P is orthogonal and that P^T K_X P = Λ.
(c) Express X in terms of the eigenvectors of K_X using the Karhunen-Loeve expansion.
6.51. Repeat Problem 6.50 for X = (X1, X2, X3) with covariance matrix:

K_X = [  1    -1/2  -1/2 ]
      [ -1/2   1    -1/2 ]
      [ -1/2  -1/2   1   ].
6.52. A square matrix A is said to be nonnegative definite if for any vector a = (a1, a2, ..., an)^T: a^T A a ≥ 0. Show that the covariance matrix is nonnegative definite. Hint: Use the fact that E[(a^T(X - m_X))^2] ≥ 0.
6.53. A is positive definite if for any nonzero vector a = (a1, a2, ..., an)^T: a^T A a > 0.
(a) Show that if all the eigenvalues are positive, then K_X is positive definite. Hint: Let b = P^T a.
(b) Show that if K_X is positive definite, then all the eigenvalues are positive. Hint: Let a be an eigenvector of K_X.
Section 6.4: Jointly Gaussian Random Vectors
6.54. Let X = (X1, X2) be jointly Gaussian random variables with mean vector and covariance matrix given by:

m_X = [ 1 ]      K_X = [ 3/2   -1/2 ]
      [ 0 ]            [ -1/2   3/2 ].

(a) Find the pdf of X in matrix notation.
(b) Find the pdf of X using the quadratic expression in the exponent.
(c) Find the marginal pdfs of X1 and X2.
(d) Find a transformation A such that the vector Y = AX consists of independent Gaussian random variables.
(e) Find the joint pdf of Y.
6.55. Let X = (X1, X2, X3) be jointly Gaussian random variables with mean vector and covariance matrix given by:

m_X = [ 1 ]      K_X = [ 3/2   0   1/2 ]
      [ 0 ]            [  0    1    0  ]
      [ 2 ]            [ 1/2   0   3/2 ].

(a) Find the pdf of X in matrix notation.
(b) Find the pdf of X using the quadratic expression in the exponent.
(c) Find the marginal pdfs of X1, X2, and X3.
(d) Find a transformation A such that the vector Y = AX consists of independent Gaussian random variables.
(e) Find the joint pdf of Y.
6.56. Let U1 , U2 , and U3 be independent zero-mean, unit-variance Gaussian random variables
and let X = U1 , Y = U1 + U2 , and Z = U1 + U2 + U3 .
(a) Find the covariance matrix of (X, Y, Z).
(b) Find the joint pdf of (X, Y, Z).
(c) Find the conditional pdf of Y and Z given X.
(d) Find the conditional pdf of Z given X and Y.
6.57. Let X1, X2, X3, X4 be independent zero-mean, unit-variance Gaussian random variables that are processed as follows:

Y1 = X1 + X2,  Y2 = X2 + X3,  Y3 = X3 + X4.

(a) Find the covariance matrix of Y = (Y1, Y2, Y3).
(b) Find the joint pdf of Y.
(c) Find the joint pdf of Y1 and Y2; Y1 and Y3.
(d) Find a transformation A such that the vector Z = AY consists of independent Gaussian random variables.
6.58. A more realistic model of the receiver in the multiuser communication system in Problem 6.21 has the K received signals Y = (Y1, Y2, ..., YK) given by:

Y = ARb + N

where A = [a_k] is a diagonal matrix of positive channel gains, R is a symmetric matrix that accounts for the interference between users, and b = (b1, b2, ..., bK) is the vector of bits from each of the transmitters. N is the vector of K independent zero-mean, unit-variance Gaussian noise random variables.
(a) Find the joint pdf of Y.
(b) Suppose that in order to recover b, the receiver computes Z = (AR)^{-1} Y. Find the joint pdf of Z.
6.59. (a) Let K_3 be the covariance matrix in Problem 6.55. Find the corresponding Q_2 and Q_3 in Example 6.23.
(b) Find the conditional pdf of X3 given X1 and X2.
6.60. In Example 6.23, show that:

(1/2)(x_n - m_n)^T Q_n (x_n - m_n) - (1/2)(x_{n-1} - m_{n-1})^T Q_{n-1} (x_{n-1} - m_{n-1})
    = (Q_nn / 2) { (x_n - m_n) + B }^2 - (Q_nn / 2) B^2

where

B = (1/Q_nn) Σ_{j=1}^{n-1} Q_{jn}(x_j - m_j)  and  |K_n| / |K_{n-1}| = Q_nn.
6.61. Find the pdf of the sum of Gaussian random variables in the following cases:
(a) Z = X1 + X2 + X3 in Problem 6.55.
(b) Z = X + Y + Z in Problem 6.56.
(c) Z = Y1 + Y2 + Y3 in Problem 6.57.
6.62. Find the joint characteristic function of the jointly Gaussian random vector X in Problem 6.54.
6.63. Suppose that a jointly Gaussian random vector X has zero mean vector and the covariance matrix given in Problem 6.51.
(a) Find the joint characteristic function.
(b) Can you obtain an expression for the joint pdf? Explain your answer.
6.64. Let X and Y be jointly Gaussian random variables. Derive the joint characteristic function for X and Y using conditional expectation.
6.65. Let X = (X1, X2, ..., Xn) be jointly Gaussian random variables. Derive the characteristic function for X by carrying out the integral in Eq. (6.32). Hint: You will need to complete the square as follows:

(x - jKv)^T K^{-1} (x - jKv) = x^T K^{-1} x - 2j x^T v + j^2 v^T K v.
6.66. Find E[X^2 Y^2] for jointly Gaussian random variables from the characteristic function.
6.67. Let X = (X1, X2, X3, X4) be zero-mean jointly Gaussian random variables. Show that

E[X1 X2 X3 X4] = E[X1 X2] E[X3 X4] + E[X1 X3] E[X2 X4] + E[X1 X4] E[X2 X3].
Section 6.5: Mean Square Estimation
6.68. Let X and Y be discrete random variables with three possible joint pmf's:

(i)    X\Y   -1    0    1
       -1   1/6  1/6   0
        0    0    0   1/3
        1   1/6  1/6   0

(ii)   X\Y   -1    0    1
       -1   1/9  1/9  1/9
        0   1/9  1/9  1/9
        1   1/9  1/9  1/9

(iii)  X\Y   -1    0    1
       -1   1/3   0    0
        0    0   1/3   0
        1    0    0   1/3
(a) Find the minimum mean square error linear estimator for Y given X.
(b) Find the minimum mean square error estimator for Y given X.
(c) Find the MAP and ML estimators for Y given X.
(d) Compare the mean square errors of the estimators in parts a, b, and c.
6.69. Repeat Problem 6.68 for the continuous random variables X and Y in Problem 5.26.
6.70. Find the ML estimator for the signal s in Problem 6.4.
6.71. Let N1 be the number of Web page requests arriving at a server in the period (0, 100) ms and let N2 be the total combined number of Web page requests arriving at the server in the period (0, 200) ms. Assume page requests occur in every 1-ms interval according to independent Bernoulli trials with probability of success p.
(a) Find the minimum linear mean square estimator for N2 given N1 and the associated mean square error.
(b) Find the minimum mean square error estimator for N2 given N1 and the associated mean square error.
(c) Find the maximum a posteriori estimator for N2 given N1.
(d) Repeat parts a, b, and c for the estimation of N1 given N2.
6.72. Let Y = X + N, where X and N are independent Gaussian random variables with different variances and N is zero mean.
(a) Plot the correlation coefficient between the "observed signal" Y and the "desired signal" X as a function of the signal-to-noise ratio σX/σN.
(b) Find the minimum mean square error estimator for X given Y.
(c) Find the MAP and ML estimators for X given Y.
(d) Compare the mean square errors of the estimators in parts b and c.
6.73. Let X, Y, Z be the random variables in Problem 6.7.
(a) Find the minimum mean square error linear estimator for Y given X and Z.
(b) Find the minimum mean square error estimator for Y given X and Z.
(c) Find the MAP and ML estimators for Y given X and Z.
(d) Compare the mean square errors of the estimators in parts b and c.
6.74. (a) Repeat Problem 6.73 for the estimator of X2, given X1 and X3 in Problem 6.13.
(b) Repeat Problem 6.73 for the estimator of X3 given X1 and X2.
6.75. Consider the ideal multiuser communication system in Problem 6.21. Assume the transmitted bits bk are independent and equally likely to be +1 or -1.
(a) Find the ML and MAP estimators for b given the observation Y.
(b) Find the minimum mean square linear estimator for b given the observation Y. How can this estimator be used in deciding what the transmitted bits were?
6.76. Repeat Problem 6.75 for the multiuser system in Problem 6.58.
6.77. A second-order predictor for samples of an image predicts the sample E as a linear function of the sample D to its left and the sample B in the previous line, as shown below:

    line j:      … A  B  C …
    line j + 1:  … D  E    …

    Estimate for E = aD + bB.

(a) Find a and b if all samples have variance σ² and if the correlation coefficient between D and E is r, between B and E is r, and between D and B is r².
(b) Find the mean square error of the predictor found in part a, and determine the reduction in the variance of the signal in going from the input to the output of the predictor.
6.78. Show that the mean square error of the two-tap linear predictor is given by Eq. (6.64).
6.79. In “hexagonal sampling” of an image, the samples in consecutive lines are offset relative
to each other as shown below:
    line j:      …  A     B  …
    line j + 1:  …     C     D  …

The covariance between two samples a and b is given by r^d(a,b), where d(a, b) is the Euclidean distance between the points. In the above samples, the distance between A and B, A and C, A and D, C and D, and B and D is 1. Suppose we wish to use a two-tap linear predictor to predict the sample D. Which two samples from the set {A, B, C} should we use in the predictor? What is the resulting mean square error?
*Section 6.6: Generating Correlated Vector Random Variables
6.80. Find a linear transformation that diagonalizes K.

(a) K = | 2  1 |
        | 1  4 |

(b) K = | 4  1 |
        | 1  4 |
6.81. Generate and plot the scattergram of 1000 pairs of random variables Y with the covariance matrices in Problem 6.80 if:
(a) X1 and X2 are independent random variables that are each uniform in the unit
interval;
(b) X1 and X2 are independent zero-mean, unit-variance Gaussian random variables.
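For the Gaussian case in part (b), one standard way to produce pairs with a prescribed covariance K is to apply a matrix A with AAᵀ = K to iid zero-mean, unit-variance components. The sketch below is our illustration in Python with NumPy (the problem itself does not specify tools); the Cholesky factor is one valid choice of A, and the matrix shown is the covariance from Problem 6.80(a) as we read it.

```python
import numpy as np

def correlated_pairs(K, n, rng):
    """Return a (2, n) array whose columns are samples of Y = A X, where
    A A^T = K and X has iid zero-mean, unit-variance Gaussian components."""
    A = np.linalg.cholesky(K)          # lower-triangular A with A @ A.T == K
    X = rng.standard_normal((2, n))    # iid N(0, 1) components
    return A @ X                       # each column of Y has covariance K

rng = np.random.default_rng(0)
K = np.array([[2.0, 1.0], [1.0, 4.0]])   # covariance of Problem 6.80(a)
Y = correlated_pairs(K, 100_000, rng)
K_hat = np.cov(Y)                        # sample covariance should approach K
```

Plotting the columns of Y as a scattergram (e.g., with matplotlib) shows the elliptical cloud expected of correlated jointly Gaussian pairs.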
6.82. Let X = (X1, X2, X3) be the jointly Gaussian random variables in Problem 6.55.
(a) Find a linear transformation that diagonalizes the covariance matrix.
(b) Generate 1000 triplets of Y = AX and plot the scattergrams for Y1 and Y2, Y1 and Y3, and Y2 and Y3. Confirm that the scattergrams are what is expected.
6.83. Let X be a jointly Gaussian random vector with mean mX and covariance matrix KX, and let A be a matrix that diagonalizes KX. What is the joint pdf of A⁻¹(X − mX)?
6.84. Let X1, X2, …, Xn be independent zero-mean, unit-variance Gaussian random variables. Let Yk = (Xk + Xk−1)/2; that is, Yk is the moving average of pairs of values of X. Assume X−1 = 0 = Xn+1.
(a) Find the covariance matrix of the Yk’s.
(b) Use Octave to generate a sequence of 1000 samples Y1, …, Yn. How would you check whether the Yk's have the correct covariances?
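The covariance check in part (b) might be organized as follows; this is a sketch in Python rather than Octave (our substitution), and the values asserted in the comments follow from Yk = (Xk + Xk−1)/2 with unit-variance inputs.

```python
import numpy as np

def moving_average_rows(n, trials, rng):
    """Each row is one realization Y_1, ..., Y_n of Y_k = (X_k + X_{k-1})/2
    for iid zero-mean, unit-variance Gaussian X's."""
    X = rng.standard_normal((trials, n + 1))
    return 0.5 * (X[:, 1:] + X[:, :-1])

rng = np.random.default_rng(1)
Y = moving_average_rows(50, 100_000, rng)
C = np.cov(Y, rowvar=False)   # sample covariance matrix of the Y_k's
# Interior entries: VAR[Y_k] = 1/2, COV[Y_k, Y_{k+1}] = 1/4, zero for lags >= 2.
```

Averaging over many independent realizations, rather than one, is what lets the sample covariance matrix be compared entry by entry against the answer to part (a).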
6.85. Repeat Problem 6.84 with Yk = Xk − Xk−1.
6.86. Let U be an orthogonal matrix. Show that if A diagonalizes the covariance matrix K, then
B = UA also diagonalizes K.
6.87. The transformation in Problem 6.56 is said to be “causal” because each output depends
only on “past” inputs.
(a) Find the covariance matrix of X, Y, Z in Problem 6.56.
(b) Find a noncausal transformation that diagonalizes the covariance matrix in part a.
6.88. (a) Find a causal transformation that diagonalizes the covariance matrix in Problem 6.54.
(b) Repeat for the covariance matrix in Problem 6.55.
Problems Requiring Cumulative Knowledge
6.89. Let U0, U1, … be a sequence of independent zero-mean, unit-variance Gaussian random variables. A "low-pass filter" takes the sequence Ui and produces the output sequence Xn = (Un + Un−1)/2, and a "high-pass filter" produces the output sequence Yn = (Un − Un−1)/2.
(a) Find the joint pdf of Xn+1, Xn, and Xn−1; of Xn, Xn+m, and Xn+2m, m > 1.
(b) Repeat part a for Yn.
(c) Find the joint pdf of Xn, Xm, Yn, and Ym.
(d) Find the corresponding joint characteristic functions in parts a, b, and c.
6.90. Let X1, X2, …, Xn be the samples of a speech waveform in Example 6.31. Suppose we want to interpolate for the value of a sample in terms of the previous and the next samples; that is, we wish to find the best linear estimate for X2 in terms of X1 and X3.
(a) Find the coefficients of the best linear estimator (interpolator).
(b) Find the mean square error of the best linear interpolator and compare it to the
mean square error of the two-tap predictor in Example 6.31.
(c) Suppose that the samples are jointly Gaussian. Find the pdf of the interpolation error.
6.91. Let X1, X2, …, Xn be samples from some signal. Suppose that the samples are jointly Gaussian random variables with covariance

COV(Xi, Xj) = σ²   for i = j,
              ρσ²  for |i − j| = 1,
              0    otherwise.
Suppose we take blocks of two consecutive samples to form a vector X, which is then linearly transformed to form Y = AX.
(a) Find the matrix A so that the components of Y are independent random variables.
(b) Let Xi and Xi+1 be two consecutive blocks and let Yi and Yi+1 be the corresponding transformed variables. Are the components of Yi and Yi+1 independent?
6.92. A multiplexer combines N digital television signals into a common communications line. TV signal n generates Xn bits every 33 milliseconds, where Xn is a Gaussian random variable with mean m and variance σ². Suppose that the multiplexer accepts a maximum total of T bits from the combined sources every 33 ms, and that any bits in excess of T are discarded. Assume that the N signals are independent.
(a) Find the probability that bits are discarded in a given 33-ms period, if we let T = mₐ + tσₐ, where mₐ is the mean total number of bits generated by the combined sources, and σₐ is the standard deviation of the total number of bits produced by the combined sources.
(b) Find the average number of bits discarded per period.
(c) Find the long-term fraction of bits lost by the multiplexer.
(d) Find the average number of bits per source allocated in part a, and find the average
number of bits lost per source. What happens as N becomes large?
(e) Suppose we require that t be adjusted with N so that the fraction of bits lost per
source is kept constant. Find an equation whose solution yields the desired value of t.
(f) Do the above results change if the signals have pairwise covariance r?
6.93. Consider the estimation of T given the number of arrivals N1 in Problem 6.17.
(a) Find the ML and MAP estimators for T.
(b) Find the linear mean square estimator for T.
(c) Repeat parts a and b if N1 and N2 are given.
CHAPTER 9
Random Processes
In certain random experiments, the outcome is a function of time or space. For example, in speech recognition systems, decisions are made on the basis of a voltage waveform corresponding to a speech utterance. In an image processing system, the intensity and color of the image vary over a rectangular region. In a peer-to-peer network, the
number of peers in the system varies with time. In some situations, two or more functions of time may be of interest. For example, the temperature in a certain city and the
demand placed on the local electric power utility vary together in time.
The random time functions in the above examples can be viewed as numerical
quantities that evolve randomly in time or space. Thus what we really have is a family
of random variables indexed by the time or space variable. In this chapter we begin the
study of random processes. We will proceed as follows:
• In Section 9.1 we introduce the notion of a random process (or stochastic
process), which is defined as an indexed family of random variables.
• We are interested in specifying the joint behavior of the random variables within
a family (i.e., the temperature at two time instants). In Section 9.2 we see that this
is done by specifying joint distribution functions, as well as mean and covariance
functions.
• In Sections 9.3 to 9.5 we present examples of stochastic processes and show how
models of complex processes can be developed from a few simple models.
• In Section 9.6 we introduce the class of stationary random processes that can be
viewed as random processes in “steady state.”
• In Section 9.7 we investigate the continuity properties of random processes and
define their derivatives and integrals.
• In Section 9.8 we examine the properties of time averages of random processes
and the problem of estimating the parameters of a random process.
• In Section 9.9 we describe methods for representing random processes by Fourier series and by the Karhunen-Loève expansion.
• Finally, in Section 9.10 we present methods for generating random processes.
9.1 DEFINITION OF A RANDOM PROCESS
Consider a random experiment specified by the outcomes z from some sample space S, by the events defined on S, and by the probabilities on these events. Suppose that to every outcome z ∈ S, we assign a function of time according to some rule:

X(t, z),   t ∈ I.
The graph of the function X(t, z) versus t, for z fixed, is called a realization, sample path, or sample function of the random process. Thus we can view the outcome of the random experiment as producing an entire function of time as shown in Fig. 9.1. On the other hand, if we fix a time tk from the index set I, then X(tk, z) is a random variable (see Fig. 9.1) since we are mapping z onto a real number. Thus we have created a family (or ensemble) of random variables indexed by the parameter t, {X(t, z), t ∈ I}.
This family is called a random process. We also refer to random processes as stochastic
processes. We usually suppress the z and use X(t) to denote a random process.
A stochastic process is said to be discrete-time if the index set I is a countable set (i.e., the set of integers or the set of nonnegative integers). When dealing with discrete-time processes, we usually use n to denote the time index and Xn to denote the random process. A continuous-time stochastic process is one in which I is continuous (i.e., the real line or the nonnegative real line).
The following example shows how we can imagine a stochastic process as resulting from nature selecting z at the beginning of time and gradually revealing it in time through X(t, z).
[FIGURE 9.1: Several realizations of a random process: sample functions X(t, z1), X(t, z2), X(t, z3) plotted versus t, each sampled at times t1, t2, t3, …, tk.]
Example 9.1 Random Binary Sequence
Let z be a number selected at random from the interval S = [0, 1], and let b1b2… be the binary expansion of z:

z = Σ_{i=1}^{∞} bi 2^{−i}   where bi ∈ {0, 1}.

Define the discrete-time random process X(n, z) by

X(n, z) = bn   for n = 1, 2, ….

The resulting process is a sequence of binary numbers, with X(n, z) equal to the nth number in the binary expansion of z.
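The bit-extraction rule above is easy to simulate. The following sketch (in Python, our choice of language; the book itself uses Octave elsewhere) computes X(1, z), …, X(n, z) by repeated doubling:

```python
import random

def binary_expansion(z, n):
    """Return the first n bits b_1, ..., b_n of the binary expansion
    of z in [0, 1), i.e., X(1, z), ..., X(n, z)."""
    bits = []
    for _ in range(n):
        z *= 2            # shift the expansion left by one bit
        bit = int(z)      # the integer part is the next bit
        bits.append(bit)
        z -= bit          # keep the fractional part
    return bits

random.seed(9)
z = random.random()               # nature selects z once, at the start...
x = binary_expansion(z, 20)       # ...and X(n, z) = b_n reveals it over time
```

For instance, binary_expansion(0.625, 3) returns [1, 0, 1], since 0.625 = 1·2⁻¹ + 0·2⁻² + 1·2⁻³.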
Example 9.2 Random Sinusoids
Let z be selected at random from the interval [−1, 1]. Define the continuous-time random process X(t, z) by

X(t, z) = z cos(2πt),   −∞ < t < ∞.

The realizations of this random process are sinusoids with amplitude z, as shown in Fig. 9.2(a).
Let z be selected at random from the interval (−π, π) and let Y(t, z) = cos(2πt + z). The realizations of Y(t, z) are phase-shifted versions of cos 2πt, as shown in Fig. 9.2(b).
[FIGURE 9.2: (a) Sinusoid with random amplitude (realizations for z = 0.6, 0.9, −0.2). (b) Sinusoid with random phase (realizations for z = π/4, 0).]
The randomness in z induces randomness in the observed function X(t, z). In principle, one can deduce the probability of events involving a stochastic process at various instants of time from probabilities involving z by using the equivalent-event method introduced in Chapter 4.
Example 9.3
Find the following probabilities for the random process introduced in Example 9.1: P[X(1, z) = 0] and P[X(1, z) = 0 and X(2, z) = 1].
The probabilities are obtained by finding the equivalent events in terms of z:

P[X(1, z) = 0] = P[0 ≤ z < 1/2] = 1/2

P[X(1, z) = 0 and X(2, z) = 1] = P[1/4 ≤ z < 1/2] = 1/4,

since all points in the interval [0, 1/2) begin with b1 = 0 and all points in [1/4, 1/2) begin with b1 = 0 and b2 = 1. Clearly, any sequence of k bits has a corresponding subinterval of length (and hence probability) 2^−k.
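The equivalent-event calculation can be checked by simulation; a minimal sketch (our illustration, in Python):

```python
import random

random.seed(0)
trials = 100_000
count_b1_zero = 0         # X(1, z) = 0
count_b1_zero_b2_one = 0  # X(1, z) = 0 and X(2, z) = 1
for _ in range(trials):
    z = random.random()
    b1 = int(2 * z) % 2   # first bit of the binary expansion of z
    b2 = int(4 * z) % 2   # second bit
    if b1 == 0:
        count_b1_zero += 1
        if b2 == 1:
            count_b1_zero_b2_one += 1

p1 = count_b1_zero / trials          # should be close to 1/2
p2 = count_b1_zero_b2_one / trials   # should be close to 1/4
```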
Example 9.4
Find the pdf of X0 = X(t0, z) and Y(t0, z) in Example 9.2.
If t0 is such that cos(2πt0) = 0, then X(t0, z) = 0 for all z and the pdf of X(t0) is a delta function of unit weight at x = 0. Otherwise, X(t0, z) is uniformly distributed in the interval (−cos 2πt0, cos 2πt0) since z is uniformly distributed in [−1, 1] (see Fig. 9.3a). Note that the pdf of X(t0, z) depends on t0.
The approach used in Example 4.36 can be used to show that Y(t0, z) has an arcsine distribution:

fY(y) = 1 / (π√(1 − y²)),   |y| < 1

(see Fig. 9.3b). Note that the pdf of Y(t0, z) does not depend on t0.
Figure 9.3(c) shows a histogram of 1000 samples of the amplitudes X(t0, z) at t0 = 0, which can be seen to be approximately uniformly distributed in [−1, 1]. Figure 9.3(d) shows the histogram for the samples of the sinusoid with random phase. Clearly there is agreement with the arcsine pdf.
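The arcsine claim for Y(t0, z) is also easy to probe numerically. The sketch below (our illustration, with an arbitrary choice of t0) compares the empirical fraction of samples below y = 1/2 with the arcsine cdf F(y) = 1/2 + arcsin(y)/π:

```python
import math
import random

random.seed(2)
t0 = 0.3                                   # arbitrary fixed observation time
samples = [math.cos(2 * math.pi * t0 + random.uniform(-math.pi, math.pi))
           for _ in range(200_000)]

# Arcsine cdf: F(y) = 1/2 + arcsin(y)/pi for |y| <= 1; at y = 1/2 this is 2/3.
F_half = 0.5 + math.asin(0.5) / math.pi
frac_below = sum(s <= 0.5 for s in samples) / len(samples)
```

Because the phase is uniform modulo 2π, the result does not change if t0 is varied, in agreement with the observation that the pdf of Y(t0, z) does not depend on t0.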
In general, the sample paths of a stochastic process can be quite complicated and cannot be described by simple formulas. In addition, it is usually not possible to identify an underlying probability space for the family of observed functions of time. Thus the equivalent-event approach for computing the probability of events involving X(t, z) in terms of the probabilities of events involving z does not prove useful in
[FIGURE 9.3: (a) pdf of sinusoid with random amplitude. (b) pdf of sinusoid with random phase. (c) Histogram of samples from uniform amplitude sinusoid at t = 0. (d) Histogram of samples from random phase sinusoid at t = 0.]
practice. In the next section we show an alternative method for specifying the probabilities of events involving a stochastic process.
9.2 SPECIFYING A RANDOM PROCESS
There are many questions regarding random processes that cannot be answered with just knowledge of the distribution at a single time instant. For example, we may be interested in the temperature at a given locale at two different times. This requires the following information:

P[x1 < X(t1) ≤ x1′, x2 < X(t2) ≤ x2′].

In another example, the speech compression system in a cellular phone predicts the value of the speech signal at the next sampling time based on the previous k samples. Thus we may be interested in the following probability:

P[a < X(tk+1) ≤ b | X(t1) = x1, X(t2) = x2, …, X(tk) = xk].
It is clear that a general description of a random process should provide probabilities
for vectors of samples of the process.
9.2.1
Joint Distributions of Time Samples
Let X1, X2, …, Xk be the k random variables obtained by sampling the random process X(t, z) at the times t1, t2, …, tk:

X1 = X(t1, z), X2 = X(t2, z), …, Xk = X(tk, z),
as shown in Fig. 9.1. The joint behavior of the random process at these k time instants is specified by the joint cumulative distribution of the vector random variable X1, X2, …, Xk. The probabilities of any event involving the random process at all or some of these time instants can be computed from this cdf using the methods developed for vector random variables in Chapter 6. Thus, a stochastic process is specified by the collection of kth-order joint cumulative distribution functions:

FX1,…,Xk(x1, x2, …, xk) = P[X(t1) ≤ x1, X(t2) ≤ x2, …, X(tk) ≤ xk],   (9.1)

for any k and any choice of sampling instants t1, …, tk. Note that the collection of cdf's
must be consistent in the sense that lower-order cdf’s are obtained as marginals of
higher-order cdf’s. If the stochastic process is continuous-valued, then a collection of
probability density functions can be used instead:
fX1,…,Xk(x1, x2, …, xk) dx1 ⋯ dxk
= P[x1 < X(t1) ≤ x1 + dx1, …, xk < X(tk) ≤ xk + dxk].   (9.2)
If the stochastic process is discrete-valued, then a collection of probability mass
functions can be used to specify the stochastic process:
pX1,…,Xk(x1, x2, …, xk) = P[X(t1) = x1, X(t2) = x2, …, X(tk) = xk]   (9.3)

for any k and any choice of sampling instants t1, …, tk.
At first glance it does not appear that we have made much progress in specifying
random processes because we are now confronted with the task of specifying a vast
collection of joint cdf’s! However, this approach works because most useful models of
stochastic processes are obtained by elaborating on a few simple models, so the methods developed in Chapters 5 and 6 of this book can be used to derive the required cdf’s.
The following examples give a preview of how we construct complex models from simple models. We develop these important examples more fully in Sections 9.3 to 9.5.
Example 9.5
iid Bernoulli Random Variables
Let Xn be a sequence of independent, identically distributed Bernoulli random variables with p = 1/2. The joint pmf for any k time samples is then

P[X1 = x1, X2 = x2, …, Xk = xk] = P[X1 = x1] ⋯ P[Xk = xk] = (1/2)^k,

where xi ∈ {0, 1} for all i. This binary random process is equivalent to the one discussed in Example 9.1.
Example 9.6
iid Gaussian Random Variables
Let Xn be a sequence of independent, identically distributed Gaussian random variables with zero mean and variance σX². The joint pdf for any k time samples is then

fX1,X2,…,Xk(x1, x2, …, xk) = (1/(2πσX²)^{k/2}) e^{−(x1² + x2² + ⋯ + xk²)/2σX²}.
The following two examples show how more complex and interesting processes
can be built from iid sequences.
Example 9.7
Binomial Counting Process
Let Xn be a sequence of independent, identically distributed Bernoulli random variables with
p = 1/2. Let Sn be the number of 1’s in the first n trials:
Sn = X1 + X2 + ⋯ + Xn   for n = 0, 1, ….
Sn is an integer-valued nondecreasing function of n that grows by unit steps after a random number of time instants. From previous chapters we know that Sn is a binomial random variable with
parameters n and p = 1/2. In the next section we show how to find the joint pmf’s of Sn using
conditional probabilities.
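A sample path of Sn is straightforward to generate; the following sketch (our illustration, in Python) builds one realization by accumulating Bernoulli outcomes:

```python
import random

def binomial_counting_path(n, p, rng):
    """One sample path S_0, S_1, ..., S_n of the binomial counting
    process, where S_k counts the 1's in the first k Bernoulli trials."""
    path = [0]                                  # S_0 = 0
    for _ in range(n):
        step = 1 if rng.random() < p else 0     # Bernoulli trial X_k
        path.append(path[-1] + step)            # S_k = S_{k-1} + X_k
    return path

rng = random.Random(7)
path = binomial_counting_path(20, 0.5, rng)
```

Every increment of the path is 0 or 1, so the realization is nondecreasing and grows by unit steps, as described above.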
Example 9.8 Filtered Noisy Signal
Let Xj be a sequence of independent, identically distributed observations of a signal voltage m corrupted by zero-mean Gaussian noise Nj with variance σ²:

Xj = m + Nj   for j = 1, 2, ….

Consider the signal that results from averaging the sequence of observations:

Sn = (X1 + X2 + ⋯ + Xn)/n   for n = 1, 2, ….

From previous chapters we know that Sn is the sample mean of an iid sequence of Gaussian random variables. We know that Sn itself is a Gaussian random variable with mean m and variance σ²/n, and so it tends towards the value m as n increases. In a later section, we show that Sn is an example from the class of Gaussian random processes.
9.2.2
The Mean, Autocorrelation, and Autocovariance Functions
The moments of time samples of a random process can be used to partially specify the
random process because they summarize the information contained in the joint cdf’s.
The mean function mX(t) and the variance function VAR[X(t)] of a continuous-time random process X(t) are defined by

mX(t) = E[X(t)] = ∫_{−∞}^{∞} x fX(t)(x) dx,   (9.4)

and

VAR[X(t)] = ∫_{−∞}^{∞} (x − mX(t))² fX(t)(x) dx,   (9.5)

where fX(t)(x) is the pdf of X(t). Note that mX(t) and VAR[X(t)] are deterministic functions of time. Trends in the behavior of X(t) are reflected in the variation of mX(t) with time. The variance gives an indication of the spread in the values taken on by X(t) at different time instants.
The autocorrelation RX(t1, t2) of a random process X(t) is defined as the joint moment of X(t1) and X(t2):

RX(t1, t2) = E[X(t1)X(t2)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} xy fX(t1),X(t2)(x, y) dx dy,   (9.6)

where fX(t1),X(t2)(x, y) is the second-order pdf of X(t). In general, the autocorrelation is a function of t1 and t2. Note that RX(t, t) = E[X²(t)].
The autocovariance CX(t1, t2) of a random process X(t) is defined as the covariance of X(t1) and X(t2):

CX(t1, t2) = E[{X(t1) − mX(t1)}{X(t2) − mX(t2)}].   (9.7)

From Eq. (5.30), the autocovariance can be expressed in terms of the autocorrelation and the means:

CX(t1, t2) = RX(t1, t2) − mX(t1)mX(t2).   (9.8)

Note that the variance of X(t) can be obtained from CX(t1, t2):

VAR[X(t)] = E[(X(t) − mX(t))²] = CX(t, t).   (9.9)

The correlation coefficient of X(t) is defined as the correlation coefficient of X(t1) and X(t2) (see Eq. 5.31):

ρX(t1, t2) = CX(t1, t2) / √(CX(t1, t1) CX(t2, t2)).   (9.10)

From Eq. (5.32) we have that |ρX(t1, t2)| ≤ 1. Recall that the correlation coefficient is a measure of the extent to which a random variable can be predicted as a linear function of another. In Chapter 10, we will see that the autocovariance function and the autocorrelation function play a critical role in the design of linear methods for analyzing and processing random signals.
The mean, variance, autocorrelation, and autocovariance functions for discrete-time random processes are defined in the same manner as above. We use a slightly different notation for the time index. The mean and variance of a discrete-time random process Xn are defined as:

mX(n) = E[Xn]  and  VAR[Xn] = E[(Xn − mX(n))²].   (9.11)

The autocorrelation and autocovariance functions of a discrete-time random process Xn are defined as follows:

RX(n1, n2) = E[X(n1)X(n2)]   (9.12)

and

CX(n1, n2) = E[{X(n1) − mX(n1)}{X(n2) − mX(n2)}]
= RX(n1, n2) − mX(n1)mX(n2).   (9.13)
Before proceeding to examples, we reiterate that the mean, autocorrelation,
and autocovariance functions are only partial descriptions of a random process. Thus
we will see later in the chapter that it is possible for two quite different random
processes to have the same mean, autocorrelation, and autocovariance functions.
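In practice these functions are often estimated by averaging across an ensemble of realizations. The sketch below (our illustration, using an iid process whose moments are known so the estimates can be checked) computes sample versions of mX(n) and CX(n1, n2):

```python
import numpy as np

def ensemble_moments(realizations):
    """Estimate m_X(n) and C_X(n1, n2) from an array whose rows are
    independent realizations of a discrete-time process."""
    m_hat = realizations.mean(axis=0)        # sample mean function
    centered = realizations - m_hat
    # Sample autocovariance matrix: average of outer products of
    # centered realizations.
    C_hat = centered.T @ centered / (len(realizations) - 1)
    return m_hat, C_hat

# For an iid N(0, 1) process, m_X(n) = 0 and C_X(n1, n2) = 1 if n1 = n2,
# and 0 otherwise (an identity covariance matrix).
rng = np.random.default_rng(3)
m_hat, C_hat = ensemble_moments(rng.standard_normal((50_000, 8)))
```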
Example 9.9 Sinusoid with Random Amplitude
Let X(t) = A cos 2πt, where A is some random variable (see Fig. 9.2a). The mean of X(t) is found using Eq. (4.30):

mX(t) = E[A cos 2πt] = E[A] cos 2πt.

Note that the mean varies with t. In particular, note that the process is always zero for values of t where cos 2πt = 0.
The autocorrelation is

RX(t1, t2) = E[A cos 2πt1 · A cos 2πt2] = E[A²] cos 2πt1 cos 2πt2,

and the autocovariance is then

CX(t1, t2) = RX(t1, t2) − mX(t1)mX(t2)
= {E[A²] − E[A]²} cos 2πt1 cos 2πt2
= VAR[A] cos 2πt1 cos 2πt2.
Example 9.10 Sinusoid with Random Phase
Let X(t) = cos(ωt + Θ), where Θ is uniformly distributed in the interval (−π, π) (see Fig. 9.2b). The mean of X(t) is found using Eq. (4.30):

mX(t) = E[cos(ωt + Θ)] = (1/2π) ∫_{−π}^{π} cos(ωt + θ) dθ = 0.

The autocorrelation and autocovariance are then

CX(t1, t2) = RX(t1, t2) = E[cos(ωt1 + Θ) cos(ωt2 + Θ)]
= (1/2π) ∫_{−π}^{π} (1/2){cos(ω(t1 − t2)) + cos(ω(t1 + t2) + 2θ)} dθ
= (1/2) cos(ω(t1 − t2)),

where we used the identity cos(a) cos(b) = (1/2) cos(a + b) + (1/2) cos(a − b). Note that mX(t) is a constant and that CX(t1, t2) depends only on |t1 − t2|. Note as well that the samples at times t1 and t2 are uncorrelated if ω(t1 − t2) = (2k + 1)π/2, where k is any integer, since the cosine is zero at odd multiples of π/2.
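The closed form CX(t1, t2) = (1/2) cos(ω(t1 − t2)) can be verified by averaging over many draws of Θ; a sketch (our illustration, with arbitrary ω, t1, t2):

```python
import numpy as np

rng = np.random.default_rng(4)
w = 2 * np.pi                      # arbitrary angular frequency
t1, t2 = 0.2, 0.4                  # arbitrary sampling times
theta = rng.uniform(-np.pi, np.pi, size=500_000)

x1 = np.cos(w * t1 + theta)
x2 = np.cos(w * t2 + theta)
# m_X(t) = 0, so the sample average of the product estimates C_X(t1, t2).
C_hat = np.mean(x1 * x2)
C_theory = 0.5 * np.cos(w * (t1 - t2))
```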
9.2.3
Multiple Random Processes
In most situations we deal with more than one random process at a time. For example,
we may be interested in the temperatures at city a, X(t), and city b, Y(t). Another very
common example involves a random process X(t) that is the “input” to a system and
another random process Y(t) that is the “output” of the system. Naturally, we are interested in the interplay between X(t) and Y(t).
The joint behavior of two or more random processes is specified by the collection of joint distributions for all possible choices of time samples of the processes. Thus for a pair of continuous-valued random processes X(t) and Y(t) we must specify all possible joint density functions of X(t1), …, X(tk) and Y(t′1), …, Y(t′j) for all k, j, and all choices of t1, …, tk and t′1, …, t′j. For example, the simplest joint pdf would be:

fX(t1),Y(t2)(x, y) dx dy = P[x < X(t1) ≤ x + dx, y < Y(t2) ≤ y + dy].

Note that the time indices of X(t) and Y(t) need not be the same. For example, we may be interested in the input at time t1 and the output at a later time t2.
The random processes X(t) and Y(t) are said to be independent random processes if the vector random variables X = (X(t1), …, X(tk)) and Y = (Y(t′1), …, Y(t′j)) are independent for all k, j, and all choices of t1, …, tk and t′1, …, t′j:

FX,Y(x1, …, xk, y1, …, yj) = FX(x1, …, xk) FY(y1, …, yj).
The cross-correlation RX,Y(t1, t2) of X(t) and Y(t) is defined by

RX,Y(t1, t2) = E[X(t1)Y(t2)].   (9.14)

The processes X(t) and Y(t) are said to be orthogonal random processes if

RX,Y(t1, t2) = 0   for all t1 and t2.   (9.15)
The cross-covariance CX,Y(t1, t2) of X(t) and Y(t) is defined by

CX,Y(t1, t2) = E[{X(t1) − mX(t1)}{Y(t2) − mY(t2)}]
= RX,Y(t1, t2) − mX(t1)mY(t2).   (9.16)

The processes X(t) and Y(t) are said to be uncorrelated random processes if

CX,Y(t1, t2) = 0   for all t1 and t2.   (9.17)
Example 9.11
Let X(t) = cos(ωt + Θ) and Y(t) = sin(ωt + Θ), where Θ is a random variable uniformly distributed in [−π, π]. Find the cross-covariance of X(t) and Y(t).
From Example 9.10 we know that X(t) and Y(t) are zero mean. From Eq. (9.16), the cross-covariance is then equal to the cross-correlation:

CX,Y(t1, t2) = RX,Y(t1, t2) = E[cos(ωt1 + Θ) sin(ωt2 + Θ)]
= E[−(1/2) sin(ω(t1 − t2)) + (1/2) sin(ω(t1 + t2) + 2Θ)]
= −(1/2) sin(ω(t1 − t2)),

since E[sin(ω(t1 + t2) + 2Θ)] = 0. X(t) and Y(t) are not uncorrelated random processes because the cross-covariance is not equal to zero for all choices of time samples. Note, however, that X(t1) and Y(t2) are uncorrelated random variables for t1 and t2 such that ω(t1 − t2) = kπ, where k is any integer.
Example 9.12 Signal Plus Noise
Suppose process Y(t) consists of a desired signal X(t) plus noise N(t):

Y(t) = X(t) + N(t).

Find the cross-correlation between the observed signal and the desired signal assuming that X(t) and N(t) are independent random processes.
From Eq. (9.14), we have

RX,Y(t1, t2) = E[X(t1)Y(t2)]
= E[X(t1){X(t2) + N(t2)}]
= RX(t1, t2) + E[X(t1)]E[N(t2)]
= RX(t1, t2) + mX(t1)mN(t2),

where the third equality followed from the fact that X(t) and N(t) are independent.
9.3 DISCRETE-TIME PROCESSES: SUM PROCESS, BINOMIAL COUNTING PROCESS, AND RANDOM WALK
In this section we introduce several important discrete-time random processes. We
begin with the simplest class of random processes—independent, identically distributed sequences—and then consider the sum process that results from adding an iid sequence. We show that the sum process satisfies the independent increments property as
well as the Markov property. Both of these properties greatly facilitate the calculation
of joint probabilities. We also introduce the binomial counting process and the random
walk process as special cases of sum processes.
9.3.1
iid Random Process
Let Xn be a discrete-time random process consisting of a sequence of independent, identically distributed (iid) random variables with common cdf FX(x), mean m, and variance σ². The sequence Xn is called the iid random process.
The joint cdf for any time instants n1, …, nk is given by

FX1,…,Xk(x1, x2, …, xk) = P[X1 ≤ x1, X2 ≤ x2, …, Xk ≤ xk]
= FX(x1)FX(x2) ⋯ FX(xk),   (9.18)

where, for simplicity, Xk denotes Xnk. Equation (9.18) implies that if Xn is discrete-valued, the joint pmf factors into the product of individual pmf's, and if Xn is continuous-valued, the joint pdf factors into the product of the individual pdf's.
The mean of an iid process is obtained from Eq. (9.11):

mX(n) = E[Xn] = m   for all n.   (9.19)

Thus, the mean is constant.
The autocovariance function is obtained from Eq. (9.13) as follows. If n1 ≠ n2, then

CX(n1, n2) = E[(Xn1 − m)(Xn2 − m)] = E[Xn1 − m]E[Xn2 − m] = 0,

since Xn1 and Xn2 are independent random variables. If n1 = n2 = n, then

CX(n1, n2) = E[(Xn − m)²] = σ².

We can express the autocovariance of the iid process in compact form as follows:

CX(n1, n2) = σ² δn1n2,   (9.20)

where δn1n2 = 1 if n1 = n2, and 0 otherwise. Therefore the autocovariance function is zero everywhere except for n1 = n2. The autocorrelation function of the iid process then follows from Eq. (9.13):

RX(n1, n2) = CX(n1, n2) + m².   (9.21)
[FIGURE 9.4: (a) Realization of a Bernoulli process. In = 1 indicates that a light bulb fails and is replaced on day n. (b) Realization of a binomial counting process. Sn denotes the number of light bulbs that have failed up to time n.]
Example 9.13 Bernoulli Random Process
Let In be a sequence of independent Bernoulli random variables. In is then an iid random process taking on values from the set {0, 1}. A realization of such a process is shown in Fig. 9.4(a). For example, In could be an indicator function for the event "a light bulb fails and is replaced on day n."
Since In is a Bernoulli random variable, it has mean and variance

mI(n) = p   and   VAR[In] = p(1 − p).

The independence of the In's makes probabilities easy to compute. For example, the probability that the first four bits in the sequence are 1001 is

P[I1 = 1, I2 = 0, I3 = 0, I4 = 1] = P[I1 = 1]P[I2 = 0]P[I3 = 0]P[I4 = 1] = p²(1 − p)².

Similarly, the probability that the second bit is 0 and the seventh is 1 is

P[I2 = 0, I7 = 1] = P[I2 = 0]P[I7 = 1] = p(1 − p).
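These product-form probabilities are easy to confirm by simulation; a minimal sketch (our illustration, with an arbitrary p):

```python
import random

random.seed(5)
p, trials = 0.4, 200_000
hits = 0
for _ in range(trials):
    bits = [1 if random.random() < p else 0 for _ in range(4)]
    if bits == [1, 0, 0, 1]:        # first four bits are 1001
        hits += 1

estimate = hits / trials
theory = p ** 2 * (1 - p) ** 2      # P[1001] = p^2 (1 - p)^2
```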
Example 9.14 Random Step Process
An up-down counter is driven by +1 or −1 pulses. Let the input to the counter be given by Dn = 2In − 1, where In is the Bernoulli random process; then

Dn = +1 if In = 1,  and  Dn = −1 if In = 0.

For example, Dn might represent the change in position of a particle that moves along a straight line in jumps of ±1 every time unit. A realization of Dn is shown in Fig. 9.5(a).
[FIGURE 9.5: (a) Realization of a random step process. Dn = 1 implies that the particle moves one step to the right at time n. (b) Realization of a random walk process. Sn denotes the position of the particle at time n.]
The mean of Dn is

$$m_D(n) = E[D_n] = E[2I_n - 1] = 2E[I_n] - 1 = 2p - 1.$$

The variance of Dn is found from Eqs. (4.37) and (4.38):

$$\mathrm{VAR}[D_n] = \mathrm{VAR}[2I_n - 1] = 2^2\,\mathrm{VAR}[I_n] = 4p(1-p).$$

The probabilities of events involving Dn are computed as in Example 9.13.
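Since Dn takes only the two values ±1, its mean and variance can be checked by direct expectation over the two outcomes (a small sketch; nothing here goes beyond the definitions in the text):

```python
def step_mean_var(p):
    """Mean and variance of D = 2I - 1, I ~ Bernoulli(p), by direct expectation."""
    outcomes = [(+1, p), (-1, 1.0 - p)]
    mean = sum(d * q for d, q in outcomes)
    var = sum((d - mean) ** 2 * q for d, q in outcomes)
    return mean, var
```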
9.3.2
Independent Increments and Markov Properties of Random Processes
Before proceeding to build random processes from iid processes, we present two very useful properties of random processes. Let X(t) be a random process and consider two time instants, t1 < t2. The increment of the random process in the interval t1 < t ≤ t2 is defined as X(t2) − X(t1). A random process X(t) is said to have independent increments if the increments in disjoint intervals are independent random variables, that is, for any k and any choice of sampling instants t1 < t2 < ... < tk, the associated increments

$$X(t_2) - X(t_1),\; X(t_3) - X(t_2),\; \ldots,\; X(t_k) - X(t_{k-1})$$

are independent random variables. In the next subsection, we show that the joint pdf (pmf) of X(t1), X(t2), ..., X(tk) is given by the product of the pdf (pmf) of X(t1) and the marginal pdf's (pmf's) of the individual increments.

Another useful property of random processes that allows us to readily obtain the joint probabilities is the Markov property. A random process X(t) is said to be Markov if the future of the process given the present is independent of the past; that is, for any k and any choice of sampling instants t1 < t2 < ... < tk and for any x1, x2, ..., xk,

$$f_{X(t_k)}(x_k \mid X(t_{k-1}) = x_{k-1}, \ldots, X(t_1) = x_1) = f_{X(t_k)}(x_k \mid X(t_{k-1}) = x_{k-1}) \tag{9.22}$$
FIGURE 9.6
The sum process Sn = X1 + ... + Xn, S0 = 0, can be generated recursively as Sn = Sn−1 + Xn by feeding Sn−1 back through a unit delay.
if X(t) is continuous-valued, and

$$P[X(t_k) = x_k \mid X(t_{k-1}) = x_{k-1}, \ldots, X(t_1) = x_1] = P[X(t_k) = x_k \mid X(t_{k-1}) = x_{k-1}] \tag{9.23}$$
if X(t) is discrete-valued. The expressions on the right-hand side of the above two
equations are called the transition pdf and transition pmf, respectively. In the next sections we encounter several processes that satisfy the Markov property. Chapter 11 is
entirely devoted to random processes that satisfy this property.
It is easy to show that a random process that has independent increments is also
a Markov process. The converse is not true; that is, the Markov property does not imply
independent increments.
9.3.3
Sum Processes: The Binomial Counting and Random Walk Processes
Many interesting random processes are obtained as the sum of a sequence of iid random variables, X1, X2, ...:

$$S_n = X_1 + X_2 + \cdots + X_n = S_{n-1} + X_n, \qquad n = 1, 2, \ldots \tag{9.24}$$

where S0 = 0. We call Sn the sum process. The pdf or pmf of Sn is found using the convolution or characteristic-function methods presented in Section 7.1. Note that Sn depends on the "past," S1, ..., Sn−1, only through Sn−1; that is, Sn is independent of the past when Sn−1 is known. This can be seen clearly from Fig. 9.6, which shows a recursive procedure for computing Sn in terms of Sn−1 and the increment Xn. Thus Sn is a Markov process.
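The recursion Sn = Sn−1 + Xn of Eq. (9.24) translates directly into code; a minimal sketch using the standard library:

```python
from itertools import accumulate

def sum_process(xs):
    """Return [S_1, ..., S_n] where S_n = S_{n-1} + X_n and S_0 = 0."""
    return list(accumulate(xs))
```

Feeding in Bernoulli values yields the binomial counting process; feeding in ±1 values yields the random walk of the examples that follow.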
Example 9.15 Binomial Counting Process
Let the Ii be the sequence of independent Bernoulli random variables in Example 9.13, and let
Sn be the corresponding sum process. Sn is then the counting process that gives the number of
successes in the first n Bernoulli trials. The sample function for Sn corresponding to a particular
sequence of Ii’s is shown in Fig. 9.4(b). Note that the counting process can only increase over
time. Note as well that the binomial process can increase by at most one unit at a time. If In indicates that a light bulb fails and is replaced on day n, then Sn denotes the number of light bulbs
that have failed up to day n.
Since Sn is the sum of n independent Bernoulli random variables, Sn is a binomial random variable with parameters n and p = P[I = 1]:

$$P[S_n = j] = \binom{n}{j} p^j (1-p)^{n-j} \qquad \text{for } 0 \le j \le n,$$

and zero otherwise. Thus Sn has mean np and variance np(1 − p). Note that the mean and variance of this process grow linearly with time. This reflects the fact that as time progresses, that is, as n grows, the range of values that can be assumed by the process increases. If p > 0 then we also know that Sn has a tendency to grow steadily without bound over time.
The Markov property of the binomial counting process is easy to deduce. Given that the
current value of the process at time n - 1 is Sn - 1 = k, the process at the next time instant will
be k with probability 1 - p or k + 1 with probability p. Once we know the value of the process
at time n - 1, the values of the random process prior to time n - 1 are irrelevant.
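The binomial pmf of Sn can be checked exactly with math.comb; the pmf should sum to 1 and give mean np (a sketch; function names are illustrative):

```python
from math import comb

def binom_pmf(n, p, j):
    """P[S_n = j] for the binomial counting process."""
    return comb(n, j) * p**j * (1 - p)**(n - j)

def mean_Sn(n, p):
    """E[S_n] computed from the pmf; equals n*p."""
    return sum(j * binom_pmf(n, p, j) for j in range(n + 1))
```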
Example 9.16 One-Dimensional Random Walk
Let Dn be the iid process of ±1 random variables in Example 9.14, and let Sn be the corresponding sum process. Sn can represent the position of a particle at time n. The random process Sn is an example of a one-dimensional random walk. A sample function of Sn is shown in Fig. 9.5(b). Unlike the binomial process, the random walk can increase or decrease over time. The random walk process changes by one unit at a time.

The pmf of Sn is found as follows. If there are k "+1"s in the first n trials, then there are n − k "−1"s, and Sn = k − (n − k) = 2k − n. Conversely, Sn = j if the number of +1's is k = (j + n)/2. If (j + n)/2 is not an integer, then Sn cannot equal j. Thus

$$P[S_n = 2k - n] = \binom{n}{k} p^k (1-p)^{n-k} \qquad \text{for } k \in \{0, 1, \ldots, n\}.$$

Since k is the number of successes in n Bernoulli trials, the mean of the random walk is:

$$E[S_n] = 2np - n = n(2p - 1).$$

As time progresses, the random walk can fluctuate over an increasingly broader range of positive and negative values. Sn has a tendency to grow if p > 1/2, or to decrease if p < 1/2. The case p = 1/2 provides a precarious balance, and we will see later, in Chapter 12, very interesting dynamics. Figure 9.7(a) shows the first 100 steps from a sample function of the random walk with p = 1/2. Figure 9.7(b) shows four sample functions of the random walk process with p = 1/2 for 1000 steps. Figure 9.7(c) shows four sample functions in the asymmetric case where p = 3/4. Note the strong linear growth trend in the process.
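The mean n(2p − 1) can be recovered by summing 2k − n against the pmf just derived (a sketch under the same notation):

```python
from math import comb

def walk_mean(n, p):
    """E[S_n] for the random walk: sum of (2k - n) P[S_n = 2k - n] over k."""
    return sum((2 * k - n) * comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1))
```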
The sum process Sn has independent increments in nonoverlapping time intervals. To see this, consider two time intervals: n0 < n ≤ n1 and n2 < n ≤ n3, where n1 ≤ n2. The increments of Sn in these disjoint time intervals are given by

$$S_{n_1} - S_{n_0} = X_{n_0+1} + \cdots + X_{n_1} \qquad S_{n_3} - S_{n_2} = X_{n_2+1} + \cdots + X_{n_3}. \tag{9.25}$$
FIGURE 9.7
(a) Random walk process with p = 1/2. (b) Four sample functions of the symmetric random walk process with p = 1/2. (c) Four sample functions of the asymmetric random walk with p = 3/4.
The above increments do not have any of the Xn's in common, so the independence of the Xn's implies that the increments (S_{n1} − S_{n0}) and (S_{n3} − S_{n2}) are independent random variables.

For n′ > n, the increment S_{n′} − S_n is the sum of n′ − n iid random variables, so it has the same distribution as S_{n′−n}, the sum of the first n′ − n X's, that is,

$$P[S_{n'} - S_n = y] = P[S_{n'-n} = y]. \tag{9.26}$$

Thus increments in intervals of the same length have the same distribution regardless of when the interval begins. For this reason, we also say that Sn has stationary increments.
Example 9.17 Independent and Stationary Increments of Binomial Process
and Random Walk
The independent and stationary increments property is particularly easy to see for the binomial
process since the increments in an interval are the number of successes in the corresponding
Bernoulli trials. The independent increment property follows from the fact that the numbers of
successes in disjoint time intervals are independent. The stationary increments property follows
from the fact that the pmf for the increment in a time interval is the binomial pmf with the corresponding number of trials.
The increment in a random walk process is determined by the same number of successes
as a binomial process. It then follows that the random walk also has independent and stationary
increments.
The independent and stationary increments property of the sum process Sn makes it easy to compute the joint pmf/pdf for any number of time instants. For simplicity, suppose that the Xn are integer-valued, so Sn is also integer-valued. We compute the joint pmf of Sn at times n1, n2, and n3:

$$P[S_{n_1} = y_1, S_{n_2} = y_2, S_{n_3} = y_3] = P[S_{n_1} = y_1, S_{n_2} - S_{n_1} = y_2 - y_1, S_{n_3} - S_{n_2} = y_3 - y_2], \tag{9.27}$$

since the process is equal to y1, y2, and y3 at times n1, n2, and n3 if and only if it is equal to y1 at time n1 and the subsequent increments are y2 − y1 and y3 − y2. The independent increments property then implies that

$$P[S_{n_1} = y_1, S_{n_2} = y_2, S_{n_3} = y_3] = P[S_{n_1} = y_1]\,P[S_{n_2} - S_{n_1} = y_2 - y_1]\,P[S_{n_3} - S_{n_2} = y_3 - y_2]. \tag{9.28}$$

Finally, the stationary increments property implies that the joint pmf of Sn is given by:

$$P[S_{n_1} = y_1, S_{n_2} = y_2, S_{n_3} = y_3] = P[S_{n_1} = y_1]\,P[S_{n_2 - n_1} = y_2 - y_1]\,P[S_{n_3 - n_2} = y_3 - y_2].$$

Clearly, we can use this procedure to write the joint pmf of Sn at any time instants n1 < n2 < ... < nk in terms of the pmf at the initial time instant and the pmf's of the subsequent increments:
$$P[S_{n_1} = y_1, S_{n_2} = y_2, \ldots, S_{n_k} = y_k] = P[S_{n_1} = y_1]\,P[S_{n_2 - n_1} = y_2 - y_1] \cdots P[S_{n_k - n_{k-1}} = y_k - y_{k-1}]. \tag{9.29}$$

If the Xn are continuous-valued random variables, then it can be shown that the joint density of Sn at times n1, n2, ..., nk is:

$$f_{S_{n_1}, S_{n_2}, \ldots, S_{n_k}}(y_1, y_2, \ldots, y_k) = f_{S_{n_1}}(y_1)\, f_{S_{n_2 - n_1}}(y_2 - y_1) \cdots f_{S_{n_k - n_{k-1}}}(y_k - y_{k-1}). \tag{9.30}$$
Example 9.18 Joint pmf of Binomial Counting Process
Find the joint pmf for the binomial counting process at times n1 and n2. Find the probability P[S_{n1} = 0, S_{n2} = n2 − n1], that is, that the first n1 trials are failures and the remaining trials are all successes.

Following the above approach we have

$$P[S_{n_1} = y_1, S_{n_2} = y_2] = P[S_{n_1} = y_1]\,P[S_{n_2} - S_{n_1} = y_2 - y_1]$$
$$= \binom{n_2 - n_1}{y_2 - y_1} p^{y_2 - y_1}(1-p)^{n_2 - n_1 - y_2 + y_1}\, \binom{n_1}{y_1} p^{y_1}(1-p)^{n_1 - y_1}$$
$$= \binom{n_2 - n_1}{y_2 - y_1}\binom{n_1}{y_1}\, p^{y_2}(1-p)^{n_2 - y_2}.$$

The requested probability is then:

$$P[S_{n_1} = 0, S_{n_2} = n_2 - n_1] = \binom{n_2 - n_1}{n_2 - n_1}\binom{n_1}{0}\, p^{n_2 - n_1}(1-p)^{n_1} = p^{n_2 - n_1}(1-p)^{n_1},$$

which is what we would obtain from a direct calculation for Bernoulli trials.
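The two-time joint pmf of Example 9.18 can be verified by brute-force enumeration of all Bernoulli sequences (a sketch; the enumeration is exponential in n2, so keep n2 small):

```python
from math import comb
from itertools import product

def joint_pmf(n1, n2, y1, y2, p):
    """P[S_n1 = y1, S_n2 = y2] from the increments formula of Example 9.18."""
    if not (0 <= y1 <= n1 and 0 <= y2 - y1 <= n2 - n1):
        return 0.0
    return comb(n2 - n1, y2 - y1) * comb(n1, y1) * p**y2 * (1 - p)**(n2 - y2)

def joint_pmf_brute(n1, n2, y1, y2, p):
    """Same probability by enumerating all 2**n2 Bernoulli outcomes."""
    total = 0.0
    for seq in product([0, 1], repeat=n2):
        if sum(seq[:n1]) == y1 and sum(seq) == y2:
            total += p**sum(seq) * (1 - p)**(n2 - sum(seq))
    return total
```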
Example 9.19 Joint pdf of Sum of iid Gaussian Sequence
Let Xn be a sequence of iid Gaussian random variables with zero mean and variance σ². Find the joint pdf of the corresponding sum process at times n1 and n2.

From Example 7.3, we know that Sn is a Gaussian random variable with mean zero and variance nσ². The joint pdf of Sn at times n1 and n2 is given by

$$f_{S_{n_1}, S_{n_2}}(y_1, y_2) = f_{S_{n_2 - n_1}}(y_2 - y_1)\, f_{S_{n_1}}(y_1) = \frac{1}{\sqrt{2\pi(n_2 - n_1)\sigma^2}}\, e^{-(y_2 - y_1)^2 / [2(n_2 - n_1)\sigma^2]}\; \frac{1}{\sqrt{2\pi n_1 \sigma^2}}\, e^{-y_1^2 / 2 n_1 \sigma^2}.$$
Since the sum process Sn is the sum of n iid random variables, it has mean and variance:

$$m_S(n) = E[S_n] = nE[X] = nm \tag{9.31}$$
$$\mathrm{VAR}[S_n] = n\,\mathrm{VAR}[X] = n\sigma^2. \tag{9.32}$$
The property of independent increments allows us to compute the autocovariance in an interesting way. Suppose n ≤ k so n = min(n, k); then

$$C_S(n, k) = E[(S_n - nm)(S_k - km)] = E[(S_n - nm)\{(S_n - nm) + (S_k - km) - (S_n - nm)\}]$$
$$= E[(S_n - nm)^2] + E[(S_n - nm)(S_k - S_n - (k - n)m)].$$

Since Sn and the increment Sk − Sn are independent,

$$C_S(n, k) = E[(S_n - nm)^2] + E[S_n - nm]\,E[S_k - S_n - (k - n)m] = E[(S_n - nm)^2] = \mathrm{VAR}[S_n] = n\sigma^2,$$

since E[Sn − nm] = 0. Similarly, if k = min(n, k), we would have obtained kσ². Therefore the autocovariance of the sum process is

$$C_S(n, k) = \min(n, k)\,\sigma^2. \tag{9.33}$$
Example 9.20 Autocovariance of Random Walk
Find the autocovariance of the one-dimensional random walk.

From Example 9.14 and Eqs. (9.32) and (9.33), Sn has mean n(2p − 1) and variance 4np(1 − p). Thus its autocovariance is given by

$$C_S(n, k) = \min(n, k)\,4p(1 - p).$$
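For small time indices, the result CS(n, k) = min(n, k)·4p(1 − p) can be confirmed by exact enumeration over all ±1 step sequences (a sketch; the cost grows as 2^max(n, k)):

```python
from itertools import product

def walk_autocov(n, k, p):
    """Exact Cov(S_n, S_k) for the random walk by enumerating step sequences."""
    m = max(n, k)
    e_sn = e_sk = e_snsk = 0.0
    for steps in product([+1, -1], repeat=m):
        prob = 1.0
        for d in steps:
            prob *= p if d == +1 else (1 - p)
        sn, sk = sum(steps[:n]), sum(steps[:k])
        e_sn += sn * prob
        e_sk += sk * prob
        e_snsk += sn * sk * prob
    return e_snsk - e_sn * e_sk
```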
FIGURE 9.8
(a) First-order autoregressive process: Yn = αYn−1 + Xn. (b) Moving average process: Zn = Xn + Xn−1 + ... + Xn−k.
The sum process can be generalized in a number of ways. For example, the recursive structure in Fig. 9.6 can be modified as shown in Fig. 9.8(a). We then obtain first-order autoregressive random processes, which are of interest in time series analysis and in digital signal processing. If instead we use the structure shown in Fig. 9.8(b), we obtain an example of a moving average process. We investigate these processes in Chapter 10.
9.4
POISSON AND ASSOCIATED RANDOM PROCESSES
In this section we develop the Poisson random process, which plays an important role in models that involve counting of events and that find application in areas such as queueing systems and reliability analysis. We show how the continuous-time Poisson random process can be obtained as the limit of a discrete-time process. We also introduce several random processes that are derived from the Poisson process.
9.4.1
Poisson Process
Consider a situation in which events occur at random instants of time at an average rate of λ events per second. For example, an event could represent the arrival of a customer to a service station or the breakdown of a component in some system. Let N(t) be the number of event occurrences in the time interval [0, t]. N(t) is then a nondecreasing, integer-valued, continuous-time random process as shown in Fig. 9.9.
FIGURE 9.9
A sample path of the Poisson counting process. The event occurrence times are denoted by S1, S2, .... The jth interevent time is denoted by Xj = Sj − Sj−1.
Suppose that the interval [0, t] is divided into n subintervals of very short duration δ = t/n. Assume that the following two conditions hold:

1. The probability of more than one event occurrence in a subinterval is negligible compared to the probability of observing one or zero events.
2. Whether or not an event occurs in a subinterval is independent of the outcomes in other subintervals.

The first assumption implies that the outcome in each subinterval can be viewed as a Bernoulli trial. The second assumption implies that these Bernoulli trials are independent. The two assumptions together imply that the counting process N(t) can be approximated by the binomial counting process discussed in the previous section.

If the probability of an event occurrence in each subinterval is p, then the expected number of event occurrences in the interval [0, t] is np. Since events occur at a rate of λ events per second, the average number of events in the interval [0, t] is λt. Thus we must have that

$$\lambda t = np.$$

If we now let n → ∞ (i.e., δ = t/n → 0) and p → 0 while np = λt remains fixed, then from Eq. (3.40) the binomial distribution approaches a Poisson distribution with parameter λt. We therefore conclude that the number of event occurrences N(t) in the interval [0, t] has a Poisson distribution with mean λt:

$$P[N(t) = k] = \frac{(\lambda t)^k}{k!}\, e^{-\lambda t} \qquad \text{for } k = 0, 1, \ldots \tag{9.34a}$$
For this reason N(t) is called the Poisson process. The mean function and the variance function of the Poisson process are given by:

$$m_N(t) = E[N(t)] = \lambda t \quad \text{and} \quad \mathrm{VAR}[N(t)] = \lambda t. \tag{9.34b}$$

In Section 11.3 we rederive the Poisson process using results from Markov chain theory.
The process N(t) inherits the property of independent and stationary increments from the underlying binomial process. First, the distribution for the number of event occurrences in any interval of length t is given by Eq. (9.34a). Next, the independent and stationary increments property allows us to write the joint pmf for N(t) at any number of points. For example, for t1 < t2,

$$P[N(t_1) = i, N(t_2) = j] = P[N(t_1) = i]\,P[N(t_2) - N(t_1) = j - i]$$
$$= P[N(t_1) = i]\,P[N(t_2 - t_1) = j - i]$$
$$= \frac{(\lambda t_1)^i e^{-\lambda t_1}}{i!}\, \frac{(\lambda(t_2 - t_1))^{j-i} e^{-\lambda(t_2 - t_1)}}{(j-i)!}. \tag{9.35a}$$
The independent increments property also allows us to calculate the autocovariance of N(t). For t1 ≤ t2:

$$C_N(t_1, t_2) = E[(N(t_1) - \lambda t_1)(N(t_2) - \lambda t_2)]$$
$$= E[(N(t_1) - \lambda t_1)\{N(t_2) - N(t_1) - \lambda t_2 + \lambda t_1 + (N(t_1) - \lambda t_1)\}]$$
$$= E[N(t_1) - \lambda t_1]\,E[N(t_2) - N(t_1) - \lambda(t_2 - t_1)] + \mathrm{VAR}[N(t_1)]$$
$$= \mathrm{VAR}[N(t_1)] = \lambda t_1. \tag{9.35b}$$
Example 9.21
Inquiries arrive at a recorded message device according to a Poisson process of rate 15 inquiries per minute. Find the probability that in a 1-minute period, 3 inquiries arrive during the first 10 seconds and 2 inquiries arrive during the last 15 seconds.

The arrival rate in seconds is λ = 15/60 = 1/4 inquiries per second. Writing time in seconds, the probability of interest is

$$P[N(10) = 3 \text{ and } N(60) - N(45) = 2].$$

By applying first the independent increments property, and then the stationary increments property, we obtain

$$P[N(10) = 3 \text{ and } N(60) - N(45) = 2] = P[N(10) = 3]\,P[N(60) - N(45) = 2]$$
$$= P[N(10) = 3]\,P[N(60 - 45) = 2] = \frac{(10/4)^3 e^{-10/4}}{3!}\, \frac{(15/4)^2 e^{-15/4}}{2!}.$$
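The final answer is a product of two Poisson pmfs and is easy to evaluate numerically (a sketch using the standard library only):

```python
from math import exp, factorial

def poisson_pmf(k, mean_count):
    """P[N = k] for a Poisson count with the given mean."""
    return mean_count**k * exp(-mean_count) / factorial(k)

lam = 15 / 60                                   # 1/4 inquiries per second
p_event = poisson_pmf(3, lam * 10) * poisson_pmf(2, lam * 15)
```

The result is about 0.035.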
Consider the time T between event occurrences in a Poisson process. Again suppose that the time interval [0, t] is divided into n subintervals of length δ = t/n. The probability that the interevent time T exceeds t seconds is equivalent to no event occurring in t seconds (or in n Bernoulli trials):

$$P[T > t] = P[\text{no events in } t \text{ seconds}] = (1 - p)^n = \left(1 - \frac{\lambda t}{n}\right)^n \to e^{-\lambda t} \quad \text{as } n \to \infty. \tag{9.36}$$

Equation (9.36) implies that T is an exponential random variable with parameter λ. Since the times between event occurrences in the underlying binomial process are independent geometric random variables, it follows that the sequence of interevent times in a Poisson process is composed of independent random variables. We therefore conclude that the interevent times in a Poisson process form an iid sequence of exponential random variables with mean 1/λ.
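This characterization gives the standard way to simulate a Poisson process: generate iid exponential gaps and count how many arrivals land in [0, t]. A Monte Carlo sketch (the seed and the acceptance tolerance are arbitrary choices):

```python
import random

def poisson_count(lam, t, rng):
    """Number of arrivals in [0, t] built from iid Exp(lam) interevent times."""
    count, time = 0, rng.expovariate(lam)
    while time <= t:
        count += 1
        time += rng.expovariate(lam)
    return count

rng = random.Random(1234)
counts = [poisson_count(2.0, 5.0, rng) for _ in range(20000)]
avg = sum(counts) / len(counts)                 # should be close to lam*t = 10
```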
Another quantity of interest is the time Sn at which the nth event occurs in a Poisson process. Let Tj denote the iid exponential interarrival times; then

$$S_n = T_1 + T_2 + \cdots + T_n.$$

In Example 7.5, we saw that the sum of n iid exponential random variables has an Erlang distribution. Thus the pdf of Sn is the Erlang pdf:

$$f_{S_n}(y) = \frac{(\lambda y)^{n-1}}{(n-1)!}\,\lambda e^{-\lambda y} \qquad \text{for } y \ge 0. \tag{9.37}$$
Example 9.22
Find the mean and variance of the time until the tenth inquiry in Example 9.21.

The arrival rate is λ = 1/4 inquiries per second, so the interarrival times are exponential random variables with parameter λ. From Table 4.1, the mean and variance of the exponential interarrival times are then 1/λ and 1/λ², respectively. The time of the tenth arrival is the sum of ten such iid random variables, thus

$$E[S_{10}] = 10E[T] = \frac{10}{\lambda} = 40 \text{ sec}$$
$$\mathrm{VAR}[S_{10}] = 10\,\mathrm{VAR}[T] = \frac{10}{\lambda^2} = 160 \text{ sec}^2.$$
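Equation (9.37) can be sanity-checked numerically: for n = 10 and λ = 1/4 the Erlang pdf should integrate to 1 and have mean 40 seconds. A sketch using a simple trapezoidal rule (the step count and the truncation point 200 are arbitrary):

```python
from math import exp, factorial

def erlang_pdf(y, n, lam):
    """Erlang pdf of the nth arrival time, Eq. (9.37)."""
    return (lam * y)**(n - 1) / factorial(n - 1) * lam * exp(-lam * y)

def trapezoid(f, a, b, steps=20000):
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

n, lam = 10, 0.25
area = trapezoid(lambda y: erlang_pdf(y, n, lam), 0.0, 200.0)
mean = trapezoid(lambda y: y * erlang_pdf(y, n, lam), 0.0, 200.0)
```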
In applications where the Poisson process models customer interarrival times, it is customary to say that arrivals occur "at random." We now explain what is meant by this statement. Suppose that we are given that only one arrival occurred in an interval [0, t], and we let X be the arrival time of the single customer. For 0 < x < t, N(x) is the number of events up to time x, and N(t) − N(x) is the increment in the interval (x, t]; then:

$$P[X \le x] = P[N(x) = 1 \mid N(t) = 1] = \frac{P[N(x) = 1 \text{ and } N(t) = 1]}{P[N(t) = 1]}$$
$$= \frac{P[N(x) = 1 \text{ and } N(t) - N(x) = 0]}{P[N(t) = 1]} = \frac{P[N(x) = 1]\,P[N(t) - N(x) = 0]}{P[N(t) = 1]}$$
$$= \frac{\lambda x e^{-\lambda x}\, e^{-\lambda(t - x)}}{\lambda t e^{-\lambda t}} = \frac{x}{t}. \tag{9.38}$$

Equation (9.38) implies that given that one arrival has occurred in the interval [0, t], the customer arrival time is uniformly distributed in the interval [0, t]. It is in this sense that customer arrival times occur "at random." It can be shown that if the number of arrivals in the interval [0, t] is k, then the individual arrival times are distributed independently and uniformly in the interval.
Example 9.23
Suppose two customers arrive at a shop during a two-minute period. Find the probability that both customers arrived during the first minute.

The arrival times of the customers are independent and uniformly distributed in the two-minute interval. Each customer arrives during the first minute with probability 1/2. Thus the probability that both arrive during the first minute is (1/2)² = 1/4. This answer can be verified by showing that P[N(1) = 2 | N(2) = 2] = 1/4.
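The verification suggested at the end of the example is a one-line computation with Poisson pmfs; note that the answer 1/4 does not depend on the arrival rate (a sketch, with λ chosen arbitrarily in the check):

```python
from math import exp, factorial

def poisson_pmf(k, mean_count):
    return mean_count**k * exp(-mean_count) / factorial(k)

def both_in_first_minute(lam):
    """P[N(1) = 2 | N(2) = 2]: both of two arrivals land in the first minute."""
    joint = poisson_pmf(2, lam) * poisson_pmf(0, lam)   # N(1)=2 and N(2)-N(1)=0
    return joint / poisson_pmf(2, 2 * lam)
```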
9.4.2
Random Telegraph Signal and Other Processes Derived from the Poisson Process
Many processes are derived from the Poisson process. In this section, we present two
examples of such random processes.
Example 9.24 Random Telegraph Signal
Consider a random process X(t) that assumes the values ±1. Suppose that X(0) = +1 or −1 with probability 1/2 each, and suppose that X(t) changes polarity with each occurrence of an event in a Poisson process of rate α. Figure 9.10 shows a sample function of X(t).
The pmf of X(t) is given by

$$P[X(t) = \pm 1] = P[X(t) = \pm 1 \mid X(0) = 1]\,P[X(0) = 1] + P[X(t) = \pm 1 \mid X(0) = -1]\,P[X(0) = -1]. \tag{9.39}$$

The conditional pmf's are found by noting that X(t) will have the same polarity as X(0) only when an even number of events occur in the interval (0, t]. Thus

$$P[X(t) = \pm 1 \mid X(0) = \pm 1] = P[N(t) = \text{even integer}] = \sum_{j=0}^{\infty} \frac{(\alpha t)^{2j}}{(2j)!}\, e^{-\alpha t} = \frac{1}{2} e^{-\alpha t}\{e^{\alpha t} + e^{-\alpha t}\} = \frac{1}{2}(1 + e^{-2\alpha t}). \tag{9.40}$$
FIGURE 9.10
Sample path of a random telegraph signal. The times between transitions Xj are iid exponential random variables.
X(t) and X(0) will differ in sign if the number of events in t is odd:

$$P[X(t) = \pm 1 \mid X(0) = \mp 1] = \sum_{j=0}^{\infty} \frac{(\alpha t)^{2j+1}}{(2j+1)!}\, e^{-\alpha t} = \frac{1}{2} e^{-\alpha t}\{e^{\alpha t} - e^{-\alpha t}\} = \frac{1}{2}(1 - e^{-2\alpha t}). \tag{9.41}$$
We obtain the pmf for X(t) by substituting Eqs. (9.40) and (9.41) into Eq. (9.39):

$$P[X(t) = 1] = \frac{1}{2}\cdot\frac{1}{2}\{1 + e^{-2\alpha t}\} + \frac{1}{2}\cdot\frac{1}{2}\{1 - e^{-2\alpha t}\} = \frac{1}{2}$$
$$P[X(t) = -1] = 1 - P[X(t) = 1] = \frac{1}{2}. \tag{9.42}$$

Thus the random telegraph signal is equally likely to be ±1 at any time t > 0.
The mean and variance of X(t) are

$$m_X(t) = 1\cdot P[X(t) = 1] + (-1)\,P[X(t) = -1] = 0$$
$$\mathrm{VAR}[X(t)] = E[X(t)^2] = (1)^2 P[X(t) = 1] + (-1)^2 P[X(t) = -1] = 1. \tag{9.43}$$
The autocovariance of X(t) is found as follows:

$$C_X(t_1, t_2) = E[X(t_1)X(t_2)] = 1\cdot P[X(t_1) = X(t_2)] + (-1)\,P[X(t_1) \ne X(t_2)]$$
$$= \frac{1}{2}\{1 + e^{-2\alpha|t_2 - t_1|}\} - \frac{1}{2}\{1 - e^{-2\alpha|t_2 - t_1|}\} = e^{-2\alpha|t_2 - t_1|}. \tag{9.44}$$

Thus time samples of X(t) become less and less correlated as the time between them increases.
The Poisson process and the random telegraph processes are examples of the
continuous-time Markov chain processes that are discussed in Chapter 11.
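The even/odd series used in Eqs. (9.40) and (9.41) can be checked numerically: a truncated sum of the Poisson pmf over even counts should match (1 + e^(−2αt))/2. A sketch (the truncation length is arbitrary):

```python
from math import exp, factorial

def prob_even_count(a_t, terms=60):
    """P[N(t) even] for a Poisson count with mean a_t, by truncated series."""
    return sum(a_t**(2 * j) / factorial(2 * j) * exp(-a_t) for j in range(terms))

a_t = 1.7
closed_form = 0.5 * (1 + exp(-2 * a_t))
```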
Example 9.25 Filtered Poisson Impulse Train
The Poisson process is zero at t = 0 and increases by one unit at the random arrival times Sj, j = 1, 2, .... Thus the Poisson process can be expressed as the sum of randomly shifted step functions:

$$N(t) = \sum_{i=1}^{\infty} u(t - S_i), \qquad N(0) = 0,$$

where the Si are the arrival times.

Since the integral of a delta function δ(t − S) is a step function u(t − S), we can view N(t) as the result of integrating a train of delta functions that occur at times Si, as shown in Fig. 9.11(a):
FIGURE 9.11
(a) Poisson process as the integral of a train of delta functions. (b) Filtered train of delta functions.

$$Z(t) = \sum_{i=1}^{\infty} \delta(t - S_i).$$
We can obtain other continuous-time processes by replacing the step function by another function h(t),¹ as shown in Fig. 9.11(b):

$$X(t) = \sum_{i=1}^{\infty} h(t - S_i). \tag{9.45}$$

For example, h(t) could represent the current pulse that results when a photoelectron hits a detector. X(t) is then the total current flowing at time t. X(t) is called a shot noise process.

¹This is equivalent to passing Z(t) through a linear system whose response to a delta function is h(t).
The following example shows how the properties of the Poisson process can be
used to evaluate averages involving the filtered process.
Example 9.26 Mean of Shot Noise Process
Find the expected value of the shot noise process X(t).

We condition on N(t), the number of impulses that have occurred up to time t:

$$E[X(t)] = E[E[X(t) \mid N(t)]].$$

Suppose N(t) = k; then

$$E[X(t) \mid N(t) = k] = E\!\left[\sum_{j=1}^{k} h(t - S_j)\right] = \sum_{j=1}^{k} E[h(t - S_j)].$$

Since the arrival times S1, ..., Sk at which the impulses occurred are independent and uniformly distributed in the interval [0, t],

$$E[h(t - S_j)] = \int_0^t h(t - s)\,\frac{1}{t}\,ds = \frac{1}{t}\int_0^t h(u)\,du.$$

Thus

$$E[X(t) \mid N(t) = k] = \frac{k}{t}\int_0^t h(u)\,du,$$

and

$$E[X(t) \mid N(t)] = \frac{N(t)}{t}\int_0^t h(u)\,du.$$

Finally, we obtain

$$E[X(t)] = E[E[X(t) \mid N(t)]] = \frac{E[N(t)]}{t}\int_0^t h(u)\,du = \lambda \int_0^t h(u)\,du, \tag{9.46}$$

where we used the fact that E[N(t)] = λt. Note that E[X(t)] approaches a constant value as t becomes large if the above integral is finite.
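Eq. (9.46) can be checked by Monte Carlo for a specific pulse shape. Below, h(u) = e^(−u) is a hypothetical pulse, so E[X(t)] = λ(1 − e^(−t)); the seed and tolerance are arbitrary choices:

```python
import random
from math import exp

def shot_noise_sample(lam, t, rng):
    """One sample of X(t) = sum_i h(t - S_i), h(u) = exp(-u), Poisson(lam) arrivals."""
    x, s = 0.0, rng.expovariate(lam)
    while s <= t:
        x += exp(-(t - s))
        s += rng.expovariate(lam)
    return x

rng = random.Random(7)
lam, t = 3.0, 4.0
est = sum(shot_noise_sample(lam, t, rng) for _ in range(20000)) / 20000
exact = lam * (1 - exp(-t))        # lam times the integral of h over [0, t]
```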
9.5
GAUSSIAN RANDOM PROCESSES, WIENER PROCESS, AND BROWNIAN MOTION
In this section we continue the introduction of important random processes. First, we introduce the class of Gaussian random processes, which find many important applications in electrical engineering. We then develop an example of a Gaussian random process: the Wiener random process, which is used to model Brownian motion.
9.5.1
Gaussian Random Processes
A random process X(t) is a Gaussian random process if the samples X1 = X(t1), X2 = X(t2), ..., Xk = X(tk) are jointly Gaussian random variables for all k, and all choices of t1, ..., tk. This definition applies to both discrete-time and continuous-time processes. Recall from Eq. (6.42) that the joint pdf of jointly Gaussian random variables is determined by the vector of means and by the covariance matrix:

$$f_{X_1, X_2, \ldots, X_k}(x_1, x_2, \ldots, x_k) = \frac{e^{-\frac{1}{2}(\mathbf{x} - \mathbf{m})^T \mathbf{K}^{-1} (\mathbf{x} - \mathbf{m})}}{(2\pi)^{k/2}\,|\mathbf{K}|^{1/2}}. \tag{9.47a}$$
In the case of Gaussian random processes, the mean vector and the covariance matrix are the values of the mean function and covariance function at the corresponding time instants:

$$\mathbf{m} = \begin{bmatrix} m_X(t_1) \\ \vdots \\ m_X(t_k) \end{bmatrix} \qquad \mathbf{K} = \begin{bmatrix} C_X(t_1, t_1) & C_X(t_1, t_2) & \cdots & C_X(t_1, t_k) \\ C_X(t_2, t_1) & C_X(t_2, t_2) & \cdots & C_X(t_2, t_k) \\ \vdots & \vdots & & \vdots \\ C_X(t_k, t_1) & \cdots & & C_X(t_k, t_k) \end{bmatrix}. \tag{9.47b}$$
Gaussian random processes therefore have the very special property that their joint pdf's are completely specified by the mean function mX(t) and the covariance function CX(t1, t2) of the process. In Chapter 6 we saw that linear transformations of jointly Gaussian random vectors result in jointly Gaussian random vectors as well. We will see in Chapter 10 that Gaussian random processes also have the property that linear operations on a Gaussian process (e.g., a sum, derivative, or integral) result in another Gaussian random process. These two properties, combined with the fact that many signal and noise processes are accurately modeled as Gaussian, make Gaussian random processes the most useful model in signal processing.
Example 9.27 iid Discrete-Time Gaussian Random Process
Let the discrete-time random process Xn be a sequence of independent Gaussian random variables with mean m and variance σ². The covariance matrix for the times n1, ..., nk is

$$\{C_X(n_i, n_j)\} = \{\sigma^2 \delta_{ij}\} = \sigma^2 \mathbf{I},$$

where δij = 1 when i = j and 0 otherwise, and I is the identity matrix. Thus the joint pdf for the vector Xn = (Xn1, ..., Xnk) is

$$f_{\mathbf{X}_n}(x_1, x_2, \ldots, x_k) = \frac{1}{(2\pi\sigma^2)^{k/2}} \exp\left\{-\sum_{i=1}^{k} \frac{(x_i - m)^2}{2\sigma^2}\right\}.$$

The Gaussian iid random process has the property that the value at every time instant is independent of the value at all other time instants.
Example 9.28 Continuous-Time Gaussian Random Process
Let X(t) be a continuous-time Gaussian random process with mean function and covariance function given by:

$$m_X(t) = 3t \qquad C_X(t_1, t_2) = 9e^{-2|t_1 - t_2|}.$$

Find P[X(3) < 6] and P[X(1) + X(2) > 15].

The sample X(3) has a Gaussian pdf with mean mX(3) = 3(3) = 9 and variance σ²X(3) = CX(3, 3) = 9e^(−2|3−3|) = 9. To calculate P[X(3) < 6] we put X(3) in standard form:

$$P[X(3) < 6] = P\left[\frac{X(3) - 9}{\sqrt{9}} < \frac{6 - 9}{\sqrt{9}}\right] = 1 - Q(-1) = Q(1) = 0.16.$$

From Example 6.24 we know that the sum of two Gaussian random variables is also a Gaussian random variable with mean and variance given by Eq. (6.47). Therefore the mean and variance of X(1) + X(2) are given by:

$$E[X(1) + X(2)] = m_X(1) + m_X(2) = 3 + 6 = 9$$
$$\mathrm{VAR}[X(1) + X(2)] = C_X(1, 1) + C_X(1, 2) + C_X(2, 1) + C_X(2, 2)$$
$$= 9\{e^{-2|1-1|} + e^{-2|2-1|} + e^{-2|1-2|} + e^{-2|2-2|}\} = 9\{2 + 2e^{-2}\} = 20.43.$$

To calculate P[X(1) + X(2) > 15] we put X(1) + X(2) in standard form:

$$P[X(1) + X(2) > 15] = P\left[\frac{X(1) + X(2) - 9}{\sqrt{20.43}} > \frac{15 - 9}{\sqrt{20.43}}\right] = Q(1.327) = 0.0922.$$

9.5.2
Wiener Process and Brownian Motion
We now construct a continuous-time Gaussian random process as a limit of a discrete-time process. Suppose that the symmetric random walk process (i.e., p = 1/2) of Example 9.16 takes steps of magnitude ±h every δ seconds. We obtain a continuous-time process by letting Xδ(t) be the accumulated sum of the random step process up to time t. Xδ(t) is a staircase function of time that takes jumps of ±h every δ seconds. At time t, the process will have taken n = [t/δ] jumps, so it is equal to

$$X_\delta(t) = h(D_1 + D_2 + \cdots + D_{[t/\delta]}) = hS_n. \tag{9.48}$$

The mean and variance of Xδ(t) are

$$E[X_\delta(t)] = hE[S_n] = 0$$
$$\mathrm{VAR}[X_\delta(t)] = h^2 n\,\mathrm{VAR}[D_n] = h^2 n,$$

where we used the fact that VAR[Dn] = 4p(1 − p) = 1 since p = 1/2.
FIGURE 9.12
Four sample functions of the Wiener process.
Suppose that we take a limit where we simultaneously shrink the size of the jumps and the time between jumps. In particular, let δ → 0 and h → 0 with h = √(αδ), and let X(t) denote the resulting process. X(t) then has mean and variance given by

$$E[X(t)] = 0 \tag{9.49a}$$
$$\mathrm{VAR}[X(t)] = (\sqrt{\alpha\delta})^2 (t/\delta) = \alpha t. \tag{9.49b}$$

Thus we obtain a continuous-time process X(t) that begins at the origin, has zero mean for all time, but has a variance that increases linearly with time. Figure 9.12 shows four sample functions of the process. Note the similarities in fluctuations to the realizations of a symmetric random walk in Fig. 9.7(b). X(t) is called the Wiener random process. It is used to model Brownian motion, the motion of particles suspended in a fluid that move under the rapid and random impact of neighboring particles.

As δ → 0, Eq. (9.48) implies that X(t) approaches the sum of an infinite number of random variables since n = [t/δ] → ∞:

$$X(t) = \lim_{\delta \to 0} hS_n = \lim_{n \to \infty} \sqrt{\alpha t}\,\frac{S_n}{\sqrt{n}}. \tag{9.50}$$

By the central limit theorem the pdf of X(t) therefore approaches that of a Gaussian random variable with mean zero and variance αt:

$$f_{X(t)}(x) = \frac{1}{\sqrt{2\pi\alpha t}}\,e^{-x^2/2\alpha t}. \tag{9.51}$$

X(t) inherits the property of independent and stationary increments from the random walk process from which it is derived. As a result, the joint pdf of X(t) at
several times t1, t2, ..., tk can be obtained by using Eq. (9.30):

$$f_{X(t_1), \ldots, X(t_k)}(x_1, \ldots, x_k) = f_{X(t_1)}(x_1)\, f_{X(t_2 - t_1)}(x_2 - x_1) \cdots f_{X(t_k - t_{k-1})}(x_k - x_{k-1})$$
$$= \frac{\exp\left\{-\dfrac{1}{2}\left[\dfrac{x_1^2}{\alpha t_1} + \dfrac{(x_2 - x_1)^2}{\alpha(t_2 - t_1)} + \cdots + \dfrac{(x_k - x_{k-1})^2}{\alpha(t_k - t_{k-1})}\right]\right\}}{\sqrt{(2\pi\alpha)^k\, t_1 (t_2 - t_1) \cdots (t_k - t_{k-1})}}. \tag{9.52}$$
The independent increments property and the same sequence of steps that led to Eq. (9.33) can be used to show that the autocovariance of X(t) is given by

$$C_X(t_1, t_2) = \alpha \min(t_1, t_2) = \alpha t_1 \qquad \text{for } t_1 < t_2. \tag{9.53}$$

By comparing Eq. (9.53) and Eq. (9.35b), we see that the Wiener process and the Poisson process have the same covariance function despite the fact that the two processes have very different sample functions. This underscores the fact that the mean and autocovariance functions are only partial descriptions of a random process.
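The limiting construction suggests a direct simulation: ±h steps of size h = √(αδ) every δ seconds. The sample variance at time t should approach αt, and the sample covariance should approach α·min(t1, t2). A Monte Carlo sketch (the seed, δ, and tolerances are arbitrary choices):

```python
import random
from math import sqrt

def wiener_path(alpha, t_max, delta, rng):
    """Approximate Wiener path: +/-h steps every delta with h = sqrt(alpha*delta)."""
    h = sqrt(alpha * delta)
    path, x = [], 0.0
    for _ in range(int(t_max / delta)):
        x += h if rng.random() < 0.5 else -h
        path.append(x)                     # path[k] approximates X((k+1)*delta)
    return path

rng = random.Random(99)
alpha, delta, t1, t2 = 2.0, 0.01, 0.5, 1.0
i1, i2 = int(t1 / delta) - 1, int(t2 / delta) - 1
paths = [wiener_path(alpha, t2, delta, rng) for _ in range(5000)]
var_t2 = sum(p[i2] ** 2 for p in paths) / len(paths)   # ~ alpha * t2 = 2.0
cov = sum(p[i1] * p[i2] for p in paths) / len(paths)   # ~ alpha * min(t1, t2) = 1.0
```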
Example 9.29
Show that the Wiener process is a Gaussian random process.
Equation (9.52) shows that the random variables X(t1), X(t2) − X(t1), X(t3) − X(t2), ..., X(tk) − X(tk−1) are independent Gaussian random variables. The random variables X(t1), X(t2), ..., X(tk) can be obtained from X(t1) and the increments by a linear transformation:

X(t1) = X(t1)
X(t2) = X(t1) + (X(t2) − X(t1))
X(t3) = X(t1) + (X(t2) − X(t1)) + (X(t3) − X(t2))
  ⋮
X(tk) = X(t1) + (X(t2) − X(t1)) + ··· + (X(tk) − X(tk−1)).    (9.54)

It then follows (from Eq. 6.45) that X(t1), X(t2), X(t3), ..., X(tk) are jointly Gaussian random variables, and that X(t) is a Gaussian random process.
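The limiting construction above is easy to check by simulation. The following Python sketch (parameters and names are illustrative, not from the text) approximates X(t) at t = 1 by the scaled symmetric random walk with steps ±√(αδ), and estimates its mean and variance, which should be close to 0 and αt:

```python
import math
import random

random.seed(1)

alpha, delta, t = 2.0, 1e-3, 1.0   # illustrative parameters; VAR[X(t)] should be alpha*t
n = int(t / delta)                 # number of +-h steps taken by time t
h = math.sqrt(alpha * delta)       # step magnitude in the limiting construction

def wiener_sample():
    """One realization of X(t): the end point of the scaled symmetric random walk."""
    return h * sum(random.choice((-1, 1)) for _ in range(n))

samples = [wiener_sample() for _ in range(2000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(mean, var)                   # mean near 0, variance near alpha*t = 2.0
```

A histogram of `samples` would likewise be close to the Gaussian pdf of Eq. (9.51), as the central limit theorem predicts.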
9.6
STATIONARY RANDOM PROCESSES
Many random processes have the property that the nature of the randomness in the process does not change with time. An observation of the process in the time interval (t0, t1) exhibits the same type of random behavior as an observation in some other time interval (t0 + τ, t1 + τ). This leads us to postulate that the probabilities of samples of the process do not depend on the instant when we begin taking observations, that is, probabilities involving samples taken at times t1, ..., tk will not differ from those taken at t1 + τ, ..., tk + τ.
Example 9.30 Stationarity and Transience

An urn has six balls labeled "0" and five balls labeled "1". The following sequence of experiments is performed: A ball is selected and the number noted; the first time a ball labeled "0" is selected it is not put back in the urn, but otherwise balls are always put back in the urn.

The random process that results from this sequence of experiments clearly has a transient phase and a stationary phase. The transient phase consists of a string of n consecutive 1's and it ends with the first occurrence of a "0". During the transient phase P[In = 0] = 6/11, and the duration of the transient phase is geometrically distributed with mean 11/6. After the first occurrence of a "0", the process enters a "stationary" phase where the process is a binary equiprobable iid sequence. The statistical behavior of the process does not change once the stationary phase is reached.
If we are dealing with random processes that began at t = −∞, then the above condition can be stated precisely as follows. A discrete-time or continuous-time random process X(t) is stationary if the joint distribution of any set of samples does not depend on the placement of the time origin. This means that the joint cdf of X(t1), X(t2), ..., X(tk) is the same as that of X(t1 + τ), X(t2 + τ), ..., X(tk + τ):

F_{X(t1),...,X(tk)}(x1, ..., xk) = F_{X(t1+τ),...,X(tk+τ)}(x1, ..., xk),    (9.55)

for all time shifts τ, all k, and all choices of sample times t1, ..., tk. If a process begins at some definite time (i.e., n = 0 or t = 0), then we say it is stationary if its joint distributions do not change under time shifts to the right.

Two processes X(t) and Y(t) are said to be jointly stationary if the joint cdf's of X(t1), ..., X(tk) and Y(t1′), ..., Y(tj′) do not depend on the placement of the time origin for all k and j and all choices of sampling times t1, ..., tk and t1′, ..., tj′.
The first-order cdf of a stationary random process must be independent of time, since by Eq. (9.55),

F_{X(t)}(x) = F_{X(t+τ)}(x) = F_X(x)   for all t, τ.    (9.56)

This implies that the mean and variance of X(t) are constant and independent of time:

m_X(t) = E[X(t)] = m   for all t    (9.57)

VAR[X(t)] = E[(X(t) − m)²] = σ²   for all t.    (9.58)
The second-order cdf of a stationary random process can depend only on the time difference between the samples and not on the particular time of the samples, since by Eq. (9.55),

F_{X(t1),X(t2)}(x1, x2) = F_{X(0),X(t2−t1)}(x1, x2)   for all t1, t2.    (9.59)

This implies that the autocorrelation and the autocovariance of X(t) can depend only on t2 − t1:

R_X(t1, t2) = R_X(t2 − t1)   for all t1, t2    (9.60)

C_X(t1, t2) = C_X(t2 − t1)   for all t1, t2.    (9.61)
Example 9.31 iid Random Process
Show that the iid random process is stationary.
The joint cdf for the samples at any k time instants, t1, ..., tk, is

F_{X(t1),...,X(tk)}(x1, x2, ..., xk) = F_X(x1) F_X(x2) ··· F_X(xk)
                                    = F_{X(t1+τ),...,X(tk+τ)}(x1, ..., xk),

for all k, t1, ..., tk. Thus Eq. (9.55) is satisfied, and so the iid random process is stationary.
Example 9.32
Is the sum process a discrete-time stationary process?
The sum process is defined by Sn = X1 + X2 + ··· + Xn, where the Xi are an iid sequence. The process has mean and variance

m_S(n) = nm,   VAR[Sn] = nσ²,

where m and σ² are the mean and variance of the Xn. It can be seen that the mean and variance are not constant but grow linearly with the time index n. Therefore the sum process cannot be a stationary process.
Example 9.33 Random Telegraph Signal
Show that the random telegraph signal discussed in Example 9.24 is a stationary random process when P[X(0) = ±1] = 1/2. Show that X(t) settles into stationary behavior as t → ∞ even if P[X(0) = ±1] ≠ 1/2.

We need to show that the following two joint pmf's are equal:

P[X(t1) = a1, ..., X(tk) = ak] = P[X(t1 + τ) = a1, ..., X(tk + τ) = ak],

for any k, any t1 < ··· < tk, and any aj = ±1. The independent increments property of the Poisson process implies that

P[X(t1) = a1, ..., X(tk) = ak] = P[X(t1) = a1] P[X(t2) = a2 | X(t1) = a1] ··· P[X(tk) = ak | X(tk−1) = ak−1],

since the values of the random telegraph at the times t1, ..., tk are determined by the number of occurrences of events of the Poisson process in the time intervals (tj, tj+1). Similarly,

P[X(t1 + τ) = a1, ..., X(tk + τ) = ak]
= P[X(t1 + τ) = a1] P[X(t2 + τ) = a2 | X(t1 + τ) = a1] ··· P[X(tk + τ) = ak | X(tk−1 + τ) = ak−1].

The corresponding transition probabilities in the previous two equations are equal since

P[X(tj+1) = aj+1 | X(tj) = aj] = { (1/2){1 + e^{−2α(tj+1 − tj)}}   if aj+1 = aj
                                   (1/2){1 − e^{−2α(tj+1 − tj)}}   if aj+1 ≠ aj

= P[X(tj+1 + τ) = aj+1 | X(tj + τ) = aj].
Thus the two joint probabilities differ only in the first term, namely, P[X(t1) = a1] and P[X(t1 + τ) = a1].

From Example 9.24 we know that if P[X(0) = ±1] = 1/2 then P[X(t) = ±1] = 1/2, for all t. Thus P[X(t1) = a1] = 1/2, P[X(t1 + τ) = a1] = 1/2, and

P[X(t1) = a1, ..., X(tk) = ak] = P[X(t1 + τ) = a1, ..., X(tk + τ) = ak].

Thus we conclude that the process is stationary when P[X(0) = ±1] = 1/2.

If P[X(0) = ±1] ≠ 1/2, then the two joint pmf's are not equal because P[X(t1) = a1] ≠ P[X(t1 + τ) = a1]. Let's see what happens if we know that the process started at a specific value, say X(0) = 1, that is, P[X(0) = 1] = 1. The pmf for X(t) is obtained from Eqs. (9.39) through (9.41):

P[X(t) = a] = P[X(t) = a | X(0) = 1] = { (1/2){1 + e^{−2αt}}   if a = 1
                                         (1/2){1 − e^{−2αt}}   if a = −1.
For very small t, the probability that X(t) = 1 is close to 1; but as t increases, the probability that X(t) = 1 approaches 1/2. Therefore as t1 becomes large, P[X(t1) = a1] → 1/2 and P[X(t1 + τ) = a1] → 1/2, and the two joint pmf's become equal. In other words, the process "forgets" the initial condition and settles down into "steady state," that is, stationary behavior.
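This "forgetting" can be illustrated by simulation. The sketch below (an illustrative setup, not from the text) generates the random telegraph signal as X(t) = X(0)·(−1)^{N(t)}, where N(t) is the number of Poisson events in [0, t], and compares the empirical P[X(t) = 1 | X(0) = 1] with (1/2){1 + e^{−2αt}}:

```python
import math
import random

random.seed(2)

alpha = 1.0                                   # Poisson event rate

def telegraph(t, x0=1):
    """X(t) = x0 * (-1)**N(t), with N(t) the number of Poisson(alpha) events in [0, t]."""
    flips, s = 0, random.expovariate(alpha)   # s = time of next event
    while s <= t:
        flips += 1
        s += random.expovariate(alpha)
    return x0 * (-1) ** flips

def p_one(t, trials=20000):
    """Empirical P[X(t) = 1 | X(0) = 1]."""
    return sum(telegraph(t) == 1 for _ in range(trials)) / trials

for t in (0.1, 1.0, 4.0):
    print(t, p_one(t), 0.5 * (1 + math.exp(-2 * alpha * t)))
```

For small t the empirical probability is close to 1; by t = 4 it is already indistinguishable from the steady-state value 1/2.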
9.6.1
Wide-Sense Stationary Random Processes
In many situations we cannot determine whether a random process is stationary, but we can determine whether the mean is a constant:

m_X(t) = m   for all t,    (9.62)

and whether the autocovariance (or equivalently the autocorrelation) is a function of t1 − t2 only:

C_X(t1, t2) = C_X(t1 − t2)   for all t1, t2.    (9.63)

A discrete-time or continuous-time random process X(t) is wide-sense stationary (WSS) if it satisfies Eqs. (9.62) and (9.63). Similarly, we say that the processes X(t) and Y(t) are jointly wide-sense stationary if they are both wide-sense stationary and if their cross-covariance depends only on t1 − t2. When X(t) is wide-sense stationary, we write

C_X(t1, t2) = C_X(τ)   and   R_X(t1, t2) = R_X(τ),

where τ = t1 − t2.
All stationary random processes are wide-sense stationary since they satisfy Eqs.
(9.62) and (9.63). The following example shows that some wide-sense stationary
processes are not stationary.
Example 9.34
Let Xn consist of two interleaved sequences of independent random variables. For n even, Xn assumes the values ±1 with probability 1/2; for n odd, Xn assumes the values 1/3 and −3 with probabilities 9/10 and 1/10, respectively. Xn is not stationary since its pmf varies with n. It is easy to show that Xn has mean

m_X(n) = 0   for all n

and covariance function

C_X(i, j) = { E[Xi]E[Xj] = 0   for i ≠ j
              E[Xi²] = 1       for i = j.

Xn is therefore wide-sense stationary.
We will see in Chapter 10 that the autocorrelation function of wide-sense stationary processes plays a crucial role in the design of linear signal processing algorithms.
We now develop several results that enable us to deduce properties of a WSS process
from properties of its autocorrelation function.
First, the autocorrelation function at τ = 0 gives the average power (second moment) of the process:

R_X(0) = E[X(t)²]   for all t.    (9.64)

Second, the autocorrelation function is an even function of τ, since

R_X(τ) = E[X(t + τ)X(t)] = E[X(t)X(t + τ)] = R_X(−τ).    (9.65)
Third, the autocorrelation function is a measure of the rate of change of a random process in the following sense. Consider the change in the process from time t to t + τ:

P[|X(t + τ) − X(t)| > ε] = P[(X(t + τ) − X(t))² > ε²]
                         ≤ E[(X(t + τ) − X(t))²]/ε²
                         = 2{R_X(0) − R_X(τ)}/ε²,    (9.66)

where we used the Markov inequality, Eq. (4.75), to obtain the upper bound. Equation (9.66) states that if R_X(0) − R_X(τ) is small, that is, R_X(τ) drops off slowly, then the probability of a large change in X(t) in τ seconds is small.
Fourth, the autocorrelation function is maximum at τ = 0. We use the Cauchy-Schwarz inequality:²

E[XY]² ≤ E[X²]E[Y²],    (9.67)

for any two random variables X and Y. If we apply this equation to X(t + τ) and X(t), we obtain

R_X(τ)² = E[X(t + τ)X(t)]² ≤ E[X²(t + τ)]E[X²(t)] = R_X(0)².

Thus

|R_X(τ)| ≤ R_X(0).    (9.68)

² See Problem 5.74 and Appendix C.
Fifth, if R_X(0) = R_X(d), then R_X(τ) is periodic with period d and X(t) is mean square periodic, that is, E[(X(t + d) − X(t))²] = 0. If we apply Eq. (9.67) to X(t + τ + d) − X(t + τ) and X(t), we obtain

E[(X(t + τ + d) − X(t + τ))X(t)]² ≤ E[(X(t + τ + d) − X(t + τ))²]E[X²(t)],

which implies that

{R_X(τ + d) − R_X(τ)}² ≤ 2{R_X(0) − R_X(d)}R_X(0).

Thus R_X(d) = R_X(0) implies that the right-hand side of the equation is zero, and thus that R_X(τ + d) = R_X(τ) for all τ. Repeated applications of this result imply that R_X(τ) is periodic with period d. The fact that X(t) is mean square periodic follows from

E[(X(t + d) − X(t))²] = 2{R_X(0) − R_X(d)} = 0.
Sixth, let X(t) = m + N(t), where N(t) is a zero-mean process for which R_N(τ) → 0 as τ → ∞; then

R_X(τ) = E[(m + N(t + τ))(m + N(t))] = m² + 2mE[N(t)] + R_N(τ)
       = m² + R_N(τ) → m²   as τ → ∞.

In other words, R_X(τ) approaches the square of the mean of X(t) as τ → ∞.

In summary, the autocorrelation function can have three types of components: (1) a component that approaches zero as τ → ∞; (2) a periodic component; and (3) a component due to a nonzero mean.
Example 9.35
Figure 9.13 shows several typical autocorrelation functions. Figure 9.13(a) shows the autocorrelation function for the random telegraph signal X(t) (see Eq. (9.44)):

R_X(τ) = e^{−2α|τ|}   for all τ.

X(t) is zero mean and R_X(τ) → 0 as |τ| → ∞.

Figure 9.13(b) shows the autocorrelation function for a sinusoid Y(t) with amplitude a and random phase (see Example 9.10):

R_Y(τ) = (a²/2) cos(2πf₀τ)   for all τ.

Y(t) is zero mean and R_Y(τ) is periodic with period 1/f₀.

Figure 9.13(c) shows the autocorrelation function for the process Z(t) = X(t) + Y(t) + m, where X(t) is the random telegraph process, Y(t) is a sinusoid with random phase, and m is a constant. If we assume that X(t) and Y(t) are independent processes, then

R_Z(τ) = E[{X(t + τ) + Y(t + τ) + m}{X(t) + Y(t) + m}] = R_X(τ) + R_Y(τ) + m².
FIGURE 9.13
(a) Autocorrelation function of a random telegraph signal. (b) Autocorrelation function of a sinusoid with random phase. (c) Autocorrelation function of a random process that has nonzero mean, a periodic component, and a "random" component.
9.6.2
Wide-Sense Stationary Gaussian Random Processes
If a Gaussian random process is wide-sense stationary, then it is also stationary. Recall from Section 9.5, Eq. (9.47), that the joint pdf of a Gaussian random process is completely determined by the mean m_X(t) and the autocovariance C_X(t1, t2). If X(t) is wide-sense stationary, then its mean is a constant m and its autocovariance depends only on the difference of the sampling times, ti − tj. It then follows that the joint pdf of X(t) depends only on this set of differences, and hence it is invariant with respect to time shifts. Thus the process is also stationary.

The above result makes WSS Gaussian random processes particularly easy to work with, since all the information required to specify the joint pdf is contained in m and C_X(τ).
Example 9.36 A Gaussian Moving Average Process
Let Xn be an iid sequence of Gaussian random variables with zero mean and variance σ², and let Yn be the average of two consecutive values of Xn:

Yn = (Xn + Xn−1)/2.

The mean of Yn is zero since E[Xi] = 0 for all i. The covariance is

C_Y(i, j) = E[Yi Yj] = (1/4) E[(Xi + Xi−1)(Xj + Xj−1)]
          = (1/4){E[Xi Xj] + E[Xi Xj−1] + E[Xi−1 Xj] + E[Xi−1 Xj−1]}
          = { σ²/2   if i = j
              σ²/4   if |i − j| = 1
              0      otherwise.

We see that Yn has a constant mean and a covariance function that depends only on |i − j|; thus Yn is a wide-sense stationary process. Yn is a Gaussian random variable since it is defined by a linear function of Gaussian random variables (see Section 6.4, Eq. 6.45). Thus the joint pdf of Yn is given by Eq. (9.47) with zero-mean vector and with entries of the covariance matrix specified by C_Y(i, j) above.
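The covariance values derived above are easy to check with a quick Monte Carlo estimate (the value of σ² below is illustrative):

```python
import random

random.seed(3)

sigma2 = 4.0                                        # variance of the iid Gaussian inputs Xn
N = 200000
X = [random.gauss(0.0, sigma2 ** 0.5) for _ in range(N)]
Y = [(X[n] + X[n - 1]) / 2 for n in range(1, N)]    # the moving average Yn

def cov(lag):
    """Sample covariance C_Y(lag); the process is zero mean."""
    m = len(Y) - lag
    return sum(Y[n] * Y[n + lag] for n in range(m)) / m

print(cov(0), cov(1), cov(2))   # near sigma2/2 = 2, sigma2/4 = 1, and 0
```

The estimates at lags 0, 1, and 2 approach σ²/2, σ²/4, and 0, matching the piecewise formula for C_Y(i, j).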
9.6.3
Cyclostationary Random Processes
Many random processes arise from the repetition of a given procedure every T seconds. For example, a data modulator ("modem") produces a waveform every T seconds according to some input data sequence. In another example, a "time multiplexer" interleaves n separate sequences of information symbols into a single sequence of symbols. It should not be surprising that the periodic nature of such processes is evident in their probabilistic descriptions. A discrete-time or continuous-time random process X(t) is said to be cyclostationary if the joint cumulative distribution function of any set of samples is invariant with respect to shifts of the origin by integer multiples of some period T. In other words, X(t1), X(t2), ..., X(tk) and X(t1 + mT), X(t2 + mT), ..., X(tk + mT) have the same joint cdf for all k, m, and all choices of sampling times t1, ..., tk:

F_{X(t1),X(t2),...,X(tk)}(x1, x2, ..., xk) = F_{X(t1+mT),X(t2+mT),...,X(tk+mT)}(x1, x2, ..., xk).    (9.69)

We say that X(t) is wide-sense cyclostationary if the mean and autocovariance functions are invariant with respect to shifts in the time origin by integer multiples of T, that is, for every integer m,

m_X(t + mT) = m_X(t)    (9.70a)

C_X(t1 + mT, t2 + mT) = C_X(t1, t2).    (9.70b)
Note that if X(t) is cyclostationary, then it follows that X(t) is also wide-sense cyclostationary.
Example 9.37
Consider a random amplitude sinusoid with period T:

X(t) = A cos(2πt/T).

Is X(t) cyclostationary? Wide-sense cyclostationary?

Consider the joint cdf for the time samples t1, ..., tk:

P[X(t1) ≤ x1, X(t2) ≤ x2, ..., X(tk) ≤ xk]
= P[A cos(2πt1/T) ≤ x1, ..., A cos(2πtk/T) ≤ xk]
= P[A cos(2π(t1 + mT)/T) ≤ x1, ..., A cos(2π(tk + mT)/T) ≤ xk]
= P[X(t1 + mT) ≤ x1, X(t2 + mT) ≤ x2, ..., X(tk + mT) ≤ xk].

Thus X(t) is a cyclostationary random process and hence also a wide-sense cyclostationary process.
In the above example, the sample functions of the random process are always periodic. The following example shows that, in general, the sample functions of a cyclostationary random process need not be periodic.
Example 9.38 Pulse Amplitude Modulation
A modem transmits a binary iid equiprobable data sequence as follows: To transmit a binary 1,
the modem transmits a rectangular pulse of duration T seconds and amplitude 1; to transmit a binary 0, it transmits a rectangular pulse of duration T seconds and amplitude -1. Let X(t) be the
random process that results. Is X(t) wide-sense cyclostationary?
Figure 9.14(a) shows a rectangular pulse of duration T seconds, and Fig. 9.14(b) shows the waveform that results for a particular data sequence.

FIGURE 9.14
Pulse amplitude modulation: (a) individual signal pulse; (b) waveform corresponding to data sequence 1001.

Let A_n be the sequence of amplitudes (±1)
corresponding to the binary sequence; then X(t) can be represented as the sum of amplitude-modulated time-shifted rectangular pulses:

X(t) = Σ_{n=−∞}^{∞} A_n p(t − nT).    (9.71)

The mean of X(t) is

m_X(t) = E[ Σ_{n=−∞}^{∞} A_n p(t − nT) ] = Σ_{n=−∞}^{∞} E[A_n] p(t − nT) = 0

since E[A_n] = 0. The autocovariance function is

C_X(t1, t2) = E[X(t1)X(t2)] − 0
            = { E[X(t1)²] = 1          if nT ≤ t1, t2 < (n + 1)T
                E[X(t1)]E[X(t2)] = 0   otherwise.

Figure 9.15 shows the autocovariance function in terms of t1 and t2. It is clear that C_X(t1 + mT, t2 + mT) = C_X(t1, t2) for all integers m. Therefore the process is wide-sense cyclostationary.
We will now show how a stationary random process can be obtained from a cyclostationary process. Let X(t) be a cyclostationary process with period T. We "stationarize" X(t) by observing a randomly phase-shifted version of X(t):

X_s(t) = X(t + Θ),   Θ uniform in [0, T],    (9.72)
FIGURE 9.15
Autocovariance function of pulse amplitude-modulated random process.
where Θ is independent of X(t). X_s(t) can arise when the phase of X(t) is either unknown or not of interest. If X(t) is a cyclostationary random process, then X_s(t) is a stationary random process. To show this, we first use conditional expectation to find the joint cdf of X_s(t):

P[X_s(t1) ≤ x1, X_s(t2) ≤ x2, ..., X_s(tk) ≤ xk]
= P[X(t1 + Θ) ≤ x1, X(t2 + Θ) ≤ x2, ..., X(tk + Θ) ≤ xk]
= ∫₀^T P[X(t1 + Θ) ≤ x1, ..., X(tk + Θ) ≤ xk | Θ = θ] f_Θ(θ) dθ
= (1/T) ∫₀^T P[X(t1 + θ) ≤ x1, ..., X(tk + θ) ≤ xk] dθ.    (9.73)

Equation (9.73) shows that the joint cdf of X_s(t) is obtained by integrating the joint cdf of X(t) over one time period. It is then easy to show that a time-shifted version of X_s(t), say X_s(t1 + τ), X_s(t2 + τ), ..., X_s(tk + τ), will have the same joint cdf as X_s(t1), X_s(t2), ..., X_s(tk) (see Problem 9.80). Therefore X_s(t) is a stationary random process.
By using conditional expectation (see Problem 9.81), it is easy to show that if X(t) is a wide-sense cyclostationary random process, then X_s(t) is a wide-sense stationary random process, with mean and autocorrelation given by

E[X_s(t)] = (1/T) ∫₀^T m_X(t) dt    (9.74a)

R_{Xs}(τ) = (1/T) ∫₀^T R_X(t + τ, t) dt.    (9.74b)
Example 9.39 Pulse Amplitude Modulation with Random Phase Shift
Let X_s(t) be the phase-shifted version of the pulse amplitude-modulated waveform X(t) introduced in Example 9.38. Find the mean and autocorrelation function of X_s(t).

X_s(t) has zero mean since X(t) is zero-mean. The autocorrelation of X_s(t) is obtained from Eq. (9.74b). From Fig. 9.15, we can see that R_X(t + τ, t) = 1 when t and t + τ fall in the same pulse interval, and R_X(t + τ, t) = 0 otherwise. Therefore:

for 0 < τ < T:    R_{Xs}(τ) = (1/T) ∫₀^{T−τ} dt = (T − τ)/T;

for −T < τ < 0:   R_{Xs}(τ) = (1/T) ∫_{−τ}^{T} dt = (T + τ)/T.

Thus X_s(t) has a triangular autocorrelation function:

R_{Xs}(τ) = { 1 − |τ|/T   for |τ| ≤ T
              0           for |τ| > T.
Section 9.7      Continuity, Derivatives, and Integrals of Random Processes      537
The variance is then

VAR[M(t)] = E[A²] (4/π²) sin²(2πt/T) − (E[A])² (4/π²) sin²(2πt/T)
          = VAR[A] (4/π²) sin²(2πt/T).
Example 9.45 Integral of White Gaussian Noise
Let Z(t) be the white Gaussian noise process introduced in Example 9.43. Find the autocorrelation function of X(t), the integral of Z(t) over the interval (0, t).

From Example 9.43, the white Gaussian noise process has autocorrelation function

R_Z(t1, t2) = α δ(t1 − t2).

The autocorrelation function of X(t) is then given by

R_X(t1, t2) = ∫₀^{t1} ∫₀^{t2} α δ(w − v) dw dv = α ∫₀^{t2} u(t1 − v) dv
            = α ∫₀^{min(t1,t2)} dv = α min(t1, t2).
We thus find that X(t) has the same autocorrelation as the Wiener process. In addition we have
that X(t) must be a Gaussian random process since Z(t) is Gaussian. It then follows that X(t)
must be the Wiener process because it has the joint pdf given by Eq. (9.52).
9.7.4
Response of a Linear System to Random Input
We now apply the results developed in this section to develop the solution of a linear
system described by a first-order differential equation. The method can be generalized
to higher-order equations. In the next chapter we develop transform methods to solve
the general problem.
Consider a linear system described by the first-order differential equation:
X′(t) + αX(t) = Z(t),   t ≥ 0, X(0) = 0.    (9.93)
For example, X(t) may represent the voltage across the capacitor of an RC circuit with current input Z(t). We now show how to obtain m_X(t) and R_X(t1, t2). If the input process Z(t) is Gaussian, then the output process will also be Gaussian. Therefore, in the case of Gaussian input processes, we can then characterize the joint pdf of the output process.
We obtain a differential equation for m_X(t) by taking the expected value of Eq. (9.93):

E[X′(t)] + αE[X(t)] = E[Z(t)],

that is,

m_X′(t) + α m_X(t) = m_Z(t),   t ≥ 0,    (9.94)

with initial condition m_X(0) = E[X(0)] = 0.
As an intermediate step we next find a differential equation for R_{Z,X}(t1, t2). If we multiply Eq. (9.93) by Z(t1) and take the expected value, we obtain

E[Z(t1)X′(t2)] + αE[Z(t1)X(t2)] = E[Z(t1)Z(t2)],   t2 ≥ 0,

with initial condition E[Z(t1)X(0)] = 0 since X(0) = 0. The same derivation that led to the cross-correlation between X(t) and X′(t) (see Eq. 9.83) can be used to show that

E[Z(t1)X′(t2)] = (∂/∂t2) R_{Z,X}(t1, t2).

Thus we obtain the following differential equation:

(∂/∂t2) R_{Z,X}(t1, t2) + α R_{Z,X}(t1, t2) = R_Z(t1, t2),   t2 ≥ 0,    (9.95)

with initial condition R_{Z,X}(t1, 0) = 0.
Finally we obtain a differential equation for R_X(t1, t2). Multiply Eq. (9.93) by X(t2) and take the expected value:

E[X′(t1)X(t2)] + αE[X(t1)X(t2)] = E[Z(t1)X(t2)],   t1 ≥ 0,

with initial condition E[X(0)X(t2)] = 0. This leads to the differential equation

(∂/∂t1) R_X(t1, t2) + α R_X(t1, t2) = R_{Z,X}(t1, t2),   t1 ≥ 0,    (9.96)

with initial condition R_X(0, t2) = 0. Note that the solution to Eq. (9.95) appears as the forcing function in Eq. (9.96). Thus we conclude that by solving the differential equations in Eqs. (9.94), (9.95), and (9.96) we obtain the mean and autocorrelation function for X(t).
Example 9.46 Ornstein-Uhlenbeck Process
Equation (9.93) with the input given by a zero-mean, white Gaussian noise process is called the
Langevin equation, after the scientist who formulated it in 1908 to describe the Brownian motion
of a free particle. In this formulation X(t) represents the velocity of the particle, so that Eq. (9.93)
results from equating the acceleration of the particle X′(t) to the force on the particle due to friction −αX(t) and the force due to random collisions Z(t). We present the solution developed
by Uhlenbeck and Ornstein in 1930.
First, we note that since the input process Z(t) is Gaussian, the output process X(t) will also be a Gaussian random process. Next we recall that the first-order differential equation

x′(t) + αx(t) = g(t),   t ≥ 0, x(0) = 0,
has solution

x(t) = ∫₀^t e^{−α(t−τ)} g(τ) dτ,   t ≥ 0.

Therefore the solution to Eq. (9.94) is

m_X(t) = ∫₀^t e^{−α(t−τ)} m_Z(τ) dτ = 0.

The autocorrelation of the white Gaussian noise process is

R_Z(t1, t2) = σ² δ(t1 − t2).
Equation (9.95) is also a first-order differential equation, and it has solution

R_{Z,X}(t1, t2) = ∫₀^{t2} e^{−α(t2−τ)} R_Z(t1, τ) dτ
                = ∫₀^{t2} e^{−α(t2−τ)} σ² δ(t1 − τ) dτ
                = { 0                    for 0 ≤ t2 < t1
                    σ² e^{−α(t2−t1)}     for t2 ≥ t1
                = σ² e^{−α(t2−t1)} u(t2 − t1),

where u(x) is the unit step function.
The autocorrelation function of the output process X(t) is the solution to the first-order differential equation Eq. (9.96). The solution is given by

R_X(t1, t2) = ∫₀^{t1} e^{−α(t1−τ)} R_{Z,X}(τ, t2) dτ
            = σ² ∫₀^{t1} e^{−α(t1−τ)} e^{−α(t2−τ)} u(t2 − τ) dτ
            = σ² ∫₀^{min(t1,t2)} e^{−α(t1−τ)} e^{−α(t2−τ)} dτ
            = (σ²/2α)(e^{−α|t1−t2|} − e^{−α(t1+t2)}),   t1 ≥ 0, t2 ≥ 0.    (9.97a)

A Gaussian random process with this autocorrelation function is called an Ornstein-Uhlenbeck process. Thus we conclude that the output process X(t) is an Ornstein-Uhlenbeck process.
If we let t1 = t and t2 = t + τ, then as t approaches infinity,

R_X(t + τ, t) → (σ²/2α) e^{−α|τ|}.    (9.97b)
(9.97b)
This shows that the effect of the zero initial condition dies out as time progresses, and the process
becomes wide-sense stationary. Since the process is Gaussian, this also implies that the process
becomes strict-sense stationary.
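The steady-state variance σ²/2α implied by Eq. (9.97a) can be reproduced with a crude Euler-Maruyama discretization of the Langevin equation. This is an illustrative sketch with assumed parameter values, not the authors' derivation:

```python
import math
import random

random.seed(5)

alpha, sigma2 = 1.0, 2.0          # friction coefficient and noise intensity
dt, steps = 0.01, 1000            # integrate to t = 10, well past the transient

def ou_endpoint():
    """Euler-Maruyama solution of X'(t) + alpha*X(t) = Z(t), X(0) = 0."""
    x = 0.0
    for _ in range(steps):
        x += -alpha * x * dt + math.sqrt(sigma2 * dt) * random.gauss(0.0, 1.0)
    return x

samples = [ou_endpoint() for _ in range(2000)]
var = sum(x * x for x in samples) / len(samples)
print(var)                        # should be near sigma2 / (2 * alpha) = 1.0
```

The zero initial condition contributes the decaying term −e^{−α(t1+t2)} in Eq. (9.97a); by t = 10 it is negligible, so the sample variance matches the stationary value.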
9.8
TIME AVERAGES OF RANDOM PROCESSES AND ERGODIC THEOREMS
At some point, the parameters of a random process must be obtained through measurement. The results from Chapter 7 and the statistical methods of Chapter 8 suggest that we repeat the random experiment that gives rise to the random process a large number of times and take the arithmetic average of the quantities of interest. For example, to estimate the mean m_X(t) of a random process X(t, ζ), we repeat the random experiment and take the following average:

m̂_X(t) = (1/N) Σ_{i=1}^{N} X(t, ζ_i),    (9.98)

where N is the number of repetitions of the experiment, and X(t, ζ_i) is the realization observed in the ith repetition.
In some situations, we are interested in estimating the mean or autocorrelation functions from the time average of a single realization, that is,

⟨X(t)⟩_T = (1/2T) ∫_{−T}^{T} X(t, ζ) dt.    (9.99)
An ergodic theorem states conditions under which a time average converges as the observation interval becomes large. In this section, we are interested in ergodic theorems
that state when time averages converge to the ensemble average (expected value).
The strong law of large numbers, presented in Chapter 7, is one of the most important ergodic theorems. It states that if Xn is an iid discrete-time random process with finite mean E[Xn] = m, then the time average of the samples converges to the ensemble average with probability one:

P[ lim_{n→∞} (1/n) Σ_{i=1}^{n} X_i = m ] = 1.    (9.100)

This result allows us to estimate m by taking the time average of a single realization of the process. We are interested in obtaining results of this type for a larger class of random processes, that is, for non-iid, discrete-time random processes, and for continuous-time random processes.
The following example shows that, in general, time averages do not converge to
ensemble averages.
Example 9.47
Let X(t) = A for all t, where A is a zero-mean, unit-variance random variable. Find the limiting value of the time average.

The mean of the process is m_X(t) = E[X(t)] = E[A] = 0. However, Eq. (9.99) gives

⟨X(t)⟩_T = (1/2T) ∫_{−T}^{T} A dt = A.

Thus the time-average mean does not always converge to m_X(t) = 0. Note that this process is stationary. Thus this example shows that stationary processes need not be ergodic.
Consider the estimate given by Eq. (9.99) for E[X(t)] = m_X(t). The estimate yields a single number, so obviously it only makes sense to consider processes for which m_X(t) = m, a constant. We now develop an ergodic theorem for the time average of wide-sense stationary processes.

Let X(t) be a WSS process. The expected value of ⟨X(t)⟩_T is

E[⟨X(t)⟩_T] = E[ (1/2T) ∫_{−T}^{T} X(t) dt ] = (1/2T) ∫_{−T}^{T} E[X(t)] dt = m.    (9.101)

Equation (9.101) states that ⟨X(t)⟩_T is an unbiased estimator for m.
Consider the variance of ⟨X(t)⟩_T:

VAR[⟨X(t)⟩_T] = E[(⟨X(t)⟩_T − m)²]
= E[ {(1/2T) ∫_{−T}^{T} (X(t) − m) dt} {(1/2T) ∫_{−T}^{T} (X(t′) − m) dt′} ]
= (1/4T²) ∫_{−T}^{T} ∫_{−T}^{T} E[(X(t) − m)(X(t′) − m)] dt dt′
= (1/4T²) ∫_{−T}^{T} ∫_{−T}^{T} C_X(t, t′) dt dt′.    (9.102)

Since the process X(t) is WSS, Eq. (9.102) becomes

VAR[⟨X(t)⟩_T] = (1/4T²) ∫_{−T}^{T} ∫_{−T}^{T} C_X(t − t′) dt dt′.    (9.103)
Figure 9.17 shows the region of integration for this integral. The integrand is constant along the line u = t − t′ for −2T < u < 2T, so we can evaluate the integral as the sum of infinitesimal strips as shown in the figure. It can be shown that each strip has area (2T − |u|) du, so the contribution of each strip to the integral is (2T − |u|) C_X(u) du.

FIGURE 9.17
Region of integration for the integral in Eq. (9.102).
Thus
VAR[⟨X(t)⟩_T] = (1/4T²) ∫_{−2T}^{2T} (2T − |u|) C_X(u) du
              = (1/2T) ∫_{−2T}^{2T} (1 − |u|/2T) C_X(u) du.    (9.104)
Therefore, ⟨X(t)⟩_T will approach m in the mean square sense, that is, E[(⟨X(t)⟩_T − m)²] → 0, if the expression in Eq. (9.104) approaches zero with increasing T. We have just proved the following ergodic theorem.

Theorem
Let X(t) be a WSS process with m_X(t) = m; then

lim_{T→∞} ⟨X(t)⟩_T = m

in the mean square sense, if and only if

lim_{T→∞} (1/2T) ∫_{−2T}^{2T} (1 − |u|/2T) C_X(u) du = 0.
In keeping with engineering usage, we say that a WSS process is mean ergodic if it satisfies the conditions of the above theorem.
The above theorem can be used to obtain ergodic theorems for the time average of other quantities. For example, if we replace X(t) with Y(t + τ)Y(t) in Eq. (9.99), we obtain a time-average estimate for the autocorrelation function of the process Y(t):

⟨Y(t + τ)Y(t)⟩_T = (1/2T) ∫_{−T}^{T} Y(t + τ)Y(t) dt.    (9.105)

It is easily shown that E[⟨Y(t + τ)Y(t)⟩_T] = R_Y(τ) if Y(t) is WSS. The above ergodic theorem then implies that the time-average autocorrelation converges to R_Y(τ) in the mean square sense if the term in Eq. (9.104), with X(t) replaced by Y(t)Y(t + τ), converges to zero.
Example 9.48
Is the random telegraph process mean ergodic?

The covariance function for the random telegraph process is C_X(τ) = e^{−2α|τ|}, so the variance of ⟨X(t)⟩_T is

VAR[⟨X(t)⟩_T] = (2/2T) ∫₀^{2T} (1 − u/2T) e^{−2αu} du
              ≤ (1/T) ∫₀^{2T} e^{−2αu} du = (1 − e^{−4αT})/(2αT).

The bound approaches zero as T → ∞, so VAR[⟨X(t)⟩_T] → 0. Therefore the process is mean ergodic.
If the random process under consideration is discrete-time, then the time-average estimates for the mean and the autocorrelation function of Xn are given by

⟨Xn⟩_T = (1/(2T + 1)) Σ_{n=−T}^{T} Xn    (9.106)

⟨X_{n+k} X_n⟩_T = (1/(2T + 1)) Σ_{n=−T}^{T} X_{n+k} X_n.    (9.107)

If Xn is a WSS random process, then E[⟨Xn⟩_T] = m, and so ⟨Xn⟩_T is an unbiased estimate for m. It is easy to show that the variance of ⟨Xn⟩_T is

VAR[⟨Xn⟩_T] = (1/(2T + 1)) Σ_{k=−2T}^{2T} (1 − |k|/(2T + 1)) C_X(k).    (9.108)

Therefore, ⟨Xn⟩_T approaches m in the mean square sense and is mean ergodic if the expression in Eq. (9.108) approaches zero with increasing T.
Example 9.49 Ergodicity and Exponential Correlation
Let Xn be a wide-sense stationary discrete-time process with mean m and covariance function C_X(k) = σ² r^{|k|}, for |r| < 1 and k = 0, ±1, ±2, .... Show that Xn is mean ergodic.

The variance of the sample mean (Eq. 9.106) is:

VAR[⟨Xn⟩_T] = (1/(2T + 1)) Σ_{k=−2T}^{2T} (1 − |k|/(2T + 1)) σ² r^{|k|}
            ≤ (2/(2T + 1)) Σ_{k=0}^{∞} σ² r^k = (2σ²/(2T + 1)) · 1/(1 − r).

The bound on the right-hand side approaches zero as T increases, and so Xn is mean ergodic.
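The bound derived above is easy to confirm numerically by evaluating Eq. (9.108) exactly (the values of σ² and r below are illustrative):

```python
sigma2, r = 1.0, 0.8

def var_sample_mean(T):
    """Variance of the sample mean, Eq. (9.108), with C_X(k) = sigma2 * r**|k|."""
    n = 2 * T + 1
    return sum((1 - abs(k) / n) * sigma2 * r ** abs(k)
               for k in range(-2 * T, 2 * T + 1)) / n

for T in (10, 100, 1000):
    bound = 2 * sigma2 / ((2 * T + 1) * (1 - r))
    print(T, var_sample_mean(T), bound)   # variance stays below the bound and shrinks
```

The exact variance stays below the bound 2σ²/((2T + 1)(1 − r)) and decays like 1/T, which is the ordinary iid convergence rate up to a constant.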
Example 9.50 Ergodicity of Self-Similar Process and Long-Range Dependence
Let Xn be a wide-sense stationary discrete-time process with mean m and covariance function

C_X(k) = (σ²/2){ |k + 1|^{2H} − 2|k|^{2H} + |k − 1|^{2H} }    (9.109)

for 1/2 < H < 1 and k = 0, ±1, ±2, .... Xn is said to be second-order self-similar. We will investigate the ergodicity of Xn.

We rewrite the variance of the sample mean in (Eq. 9.106) as follows:

VAR[⟨Xn⟩_T] = (1/(2T + 1)²) Σ_{k=−2T}^{2T} (2T + 1 − |k|) C_X(k)
            = (1/(2T + 1)²) { (2T + 1)C_X(0) + 2(2T)C_X(1) + ··· + 2C_X(2T) }.

It is easy to show (see Problem 9.132) that the sum inside the braces is σ²(2T + 1)^{2H}. Therefore the variance becomes:

VAR[⟨Xn⟩_T] = (1/(2T + 1)²) σ²(2T + 1)^{2H} = σ²(2T + 1)^{2H−2}.    (9.110)
The value of H, which is called the Hurst parameter, affects the convergence behavior of the sample mean. Note that if H = 1/2, the covariance function becomes CX1k2 = 1/2s2dk which corresponds to an iid sequence. In this case, the variance becomes s2/12T + 12 which is the convergence
rate of the sample mean for iid samples. However, for H 7 1/2, the variance becomes:
s2
(9.111)
12T + 122H - 1,
2T + 1
so the convergence of the sample mean is slower by a factor of 12T + 122H - 1 than for iid
samples.
The slower convergence of the sample mean when H 7 1/2 results from the long-range dependence of Xn . It can be shown that for large k, the covariance function is approximately given by:
VAR38Xn9T4 =
CX1k2 = s2H12H - 12k2H - 2.
(9.112)
a
For 1/2 6 H 6 1, C1k2 decays as 1/k where 0 6 a 6 1, which is a very slow decay rate. Thus
the dependence between values of Xn decreases slowly and the process is said to have a long
memory or long-range dependence.
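The closed form in Eq. (9.110) can be verified numerically. The Python sketch below (ours, not from the text) sums the covariances of Eq. (9.109) directly and compares the result with $\sigma^2(2T+1)^{2H-2}$:

```python
# Numerical check of Eq. (9.110): summing the self-similar covariances of
# Eq. (9.109) reproduces VAR[<X_n>_T] = sigma^2 (2T+1)^{2H-2} exactly.
def cov_ss(k, sigma2, H):
    return 0.5 * sigma2 * (abs(k + 1) ** (2 * H)
                           - 2 * abs(k) ** (2 * H)
                           + abs(k - 1) ** (2 * H))

def var_sample_mean_lrd(sigma2, H, T):
    n = 2 * T + 1
    return sum((n - abs(k)) * cov_ss(k, sigma2, H)
               for k in range(-2 * T, 2 * T + 1)) / n ** 2

sigma2, H, T = 1.0, 0.8, 200
print(var_sample_mean_lrd(sigma2, H, T),      # direct sum
      sigma2 * (2 * T + 1) ** (2 * H - 2))    # closed form of Eq. (9.110)
```

The two printed values agree to floating-point precision, and halving the exponent gap (larger $H$) visibly slows the decay with $T$.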
*9.9 FOURIER SERIES AND KARHUNEN-LOEVE EXPANSION
Let $X(t)$ be a wide-sense stationary, mean square periodic random process with period $T$, that is, $E[(X(t+T) - X(t))^2] = 0$. In order to simplify the development, we assume that $X(t)$ is zero mean. We show that $X(t)$ can be represented in a mean square sense by a Fourier series:

$$X(t) = \sum_{k=-\infty}^{\infty} X_k e^{j2\pi kt/T}, \tag{9.113}$$

where the coefficients are random variables defined by

$$X_k = \frac{1}{T}\int_0^T X(t')e^{-j2\pi kt'/T}\,dt'. \tag{9.114}$$

Equation (9.114) implies that, in general, the coefficients are complex-valued random variables. For complex-valued random variables, the correlation between two random variables $X$ and $Y$ is defined by $E[XY^*]$. We also show that the coefficients are orthogonal random variables, that is, $E[X_k X_m^*] = 0$ for $k \neq m$.
Recall that if $X(t)$ is mean square periodic, then $R_X(\tau)$ is a periodic function in $\tau$ with period $T$. Therefore, it can be expanded in a Fourier series:

$$R_X(\tau) = \sum_{k=-\infty}^{\infty} a_k e^{j2\pi k\tau/T}, \tag{9.115}$$

where the coefficients $a_k$ are given by

$$a_k = \frac{1}{T}\int_0^T R_X(\tau')e^{-j2\pi k\tau'/T}\,d\tau'. \tag{9.116}$$
The coefficients $a_k$ appear in the following derivation.
First, we show that the coefficients in Eq. (9.113) are orthogonal random variables, that is, $E[X_k X_m^*] = 0$ for $k \neq m$:

$$E[X_k X_m^*] = E\!\left[X_k\,\frac{1}{T}\int_0^T X^*(t')e^{j2\pi mt'/T}\,dt'\right] = \frac{1}{T}\int_0^T E[X_k X^*(t')]e^{j2\pi mt'/T}\,dt'.$$

The integrand of the above equation has

$$E[X_k X^*(t)] = E\!\left[\frac{1}{T}\int_0^T X(u)e^{-j2\pi ku/T}\,du\,X^*(t)\right] = \frac{1}{T}\int_0^T R_X(u-t)e^{-j2\pi ku/T}\,du = \left\{\frac{1}{T}\int_{-t}^{T-t} R_X(v)e^{-j2\pi kv/T}\,dv\right\}e^{-j2\pi kt/T} = a_k e^{-j2\pi kt/T},$$

where we have used the fact that the Fourier coefficients can be calculated over any full period. Therefore

$$E[X_k X_m^*] = \frac{1}{T}\int_0^T a_k e^{-j2\pi kt'/T}e^{j2\pi mt'/T}\,dt' = a_k\delta_{k,m}, \tag{9.117}$$

where $\delta_{k,m}$ is the Kronecker delta function. Thus $X_k$ and $X_m$ are orthogonal random variables. Note that the above equation implies that $a_k = E[|X_k|^2]$, that is, the $a_k$ are real-valued.
To show that the Fourier series equals $X(t)$ in the mean square sense, we take

$$E\!\left[\left|X(t) - \sum_{k=-\infty}^{\infty} X_k e^{j2\pi kt/T}\right|^2\right] = E[|X(t)|^2] - E\!\left[X(t)\sum_{k=-\infty}^{\infty} X_k^* e^{-j2\pi kt/T}\right] - E\!\left[X^*(t)\sum_{k=-\infty}^{\infty} X_k e^{j2\pi kt/T}\right] + E\!\left[\sum_{k=-\infty}^{\infty}\sum_{m=-\infty}^{\infty} X_k X_m^* e^{j2\pi(k-m)t/T}\right] = R_X(0) - \sum_{k=-\infty}^{\infty} a_k - \sum_{k=-\infty}^{\infty} a_k^* + \sum_{k=-\infty}^{\infty} a_k.$$

The above equation equals zero, since the $a_k$ are real and since $R_X(0) = \sum_k a_k$ from Eq. (9.115).
If X(t) is a wide-sense stationary random process that is not mean square periodic,
we can still expand X(t) in the Fourier series in an arbitrary interval [0, T]. Mean square
equality will hold only inside the interval. Outside the interval, the expansion repeats
itself with period T. The Fourier coefficients will no longer be orthogonal; instead they
are given by
$$E[X_k X_m^*] = \frac{1}{T^2}\int_0^T\!\!\int_0^T R_X(t-u)\,e^{-j2\pi kt/T}e^{j2\pi mu/T}\,dt\,du. \tag{9.118}$$
It is easy to show that if X(t) is mean square periodic, then this equation reduces to Eq.
(9.117).
9.9.1 Karhunen-Loeve Expansion
In this section we present the Karhunen-Loeve expansion, which allows us to expand a
(possibly nonstationary) random process X(t) in a series:
$$X(t) = \sum_{k=1}^{\infty} X_k\phi_k(t), \qquad 0 \le t \le T, \tag{9.119a}$$

where

$$X_k = \int_0^T X(t)\phi_k^*(t)\,dt, \tag{9.119b}$$

where the equality in Eq. (9.119a) is in the mean square sense, where the coefficients $\{X_k\}$ are orthogonal random variables, and where the functions $\{\phi_k(t)\}$ are orthonormal:

$$\int_0^T \phi_i(t)\phi_j^*(t)\,dt = \delta_{i,j} \qquad \text{for all } i, j.$$
In other words, the Karhunen-Loeve expansion provides us with many of the nice properties of the Fourier series for the case where X(t) is not mean square periodic. For simplicity, we again assume that X(t) is zero mean.
In order to motivate the Karhunen-Loeve expansion, we review the Karhunen-Loeve transform for vector random variables as introduced in Section 6.3. Let $\mathbf{X}$ be a zero-mean vector random variable with covariance matrix $K_X$. The eigenvalues and eigenvectors of $K_X$ are obtained from

$$K_X\mathbf{e}_i = \lambda_i\mathbf{e}_i, \tag{9.120}$$

where the $\mathbf{e}_i$ are column vectors. The set of normalized eigenvectors is orthonormal, that is, $\mathbf{e}_i^T\mathbf{e}_j = \delta_{i,j}$. Define the matrix $P$ of eigenvectors and the matrix $\Lambda$ of eigenvalues as

$$P = [\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n], \qquad \Lambda = \mathrm{diag}[\lambda_i],$$
then

$$K_X = P\Lambda P^T = [\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n]\begin{bmatrix}\lambda_1 & 0 & \cdots & 0\\ 0 & \lambda_2 & \cdots & 0\\ \vdots & \vdots & & \vdots\\ 0 & 0 & \cdots & \lambda_n\end{bmatrix}\begin{bmatrix}\mathbf{e}_1^T\\ \mathbf{e}_2^T\\ \vdots\\ \mathbf{e}_n^T\end{bmatrix} = [\lambda_1\mathbf{e}_1, \lambda_2\mathbf{e}_2, \dots, \lambda_n\mathbf{e}_n]\begin{bmatrix}\mathbf{e}_1^T\\ \mathbf{e}_2^T\\ \vdots\\ \mathbf{e}_n^T\end{bmatrix} = \sum_{i=1}^{n}\lambda_i\mathbf{e}_i\mathbf{e}_i^T. \tag{9.121a}$$

Therefore we find that the covariance matrix can be expanded as a weighted sum of the matrices $\mathbf{e}_i\mathbf{e}_i^T$. In addition, if we let $\mathbf{Y} = P^T\mathbf{X}$, then the random variables in $\mathbf{Y}$ are orthogonal. Furthermore, since $PP^T = I$, then

$$\mathbf{X} = P\mathbf{Y} = [\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n]\begin{bmatrix}Y_1\\ Y_2\\ \vdots\\ Y_n\end{bmatrix} = \sum_{k=1}^{n} Y_k\mathbf{e}_k. \tag{9.121b}$$
Thus we see that the arbitrary vector random variable $\mathbf{X}$ can be expanded as a weighted sum of the eigenvectors of $K_X$, where the coefficients are orthogonal random variables. Furthermore the eigenvectors form an orthonormal set. These are exactly the properties we seek in the Karhunen-Loeve expansion for $X(t)$. If the vector random variable $\mathbf{X}$ is jointly Gaussian, then the components of $\mathbf{Y}$ are independent random variables. This results in tremendous simplification in a wide variety of problems.
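The vector case is small enough to work end to end. The Python sketch below (ours, not from the text; the example numbers are arbitrary) eigendecomposes a symmetric $2\times 2$ covariance matrix in closed form and verifies the expansion $K_X = \sum_i \lambda_i\mathbf{e}_i\mathbf{e}_i^T$ of Eq. (9.121a):

```python
import math

# Closed-form KL transform for a 2-dimensional zero-mean vector RV with
# symmetric covariance K_X = [[a, b], [b, c]] (illustrative numbers).
a, b, c = 2.0, 0.8, 1.0
half_tr = (a + c) / 2
gap = math.sqrt(half_tr ** 2 - (a * c - b * b))
l1, l2 = half_tr + gap, half_tr - gap      # eigenvalues of K_X

def unit(v):
    n = math.hypot(v[0], v[1])
    return (v[0] / n, v[1] / n)

e1 = unit((b, l1 - a))                     # normalized eigenvectors
e2 = unit((b, l2 - a))

# Eq. (9.121a): K_X = l1 e1 e1^T + l2 e2 e2^T
K = [[l1 * e1[i] * e1[j] + l2 * e2[i] * e2[j] for j in range(2)]
     for i in range(2)]
print(K)       # recovers [[2.0, 0.8], [0.8, 1.0]] up to roundoff
```

The eigenvectors also come out orthonormal, which is the property carried over to the eigenfunctions in the continuous-time case.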
In analogy to Eq. (9.120), we begin by considering the following eigenvalue equation:
$$\int_0^T K_X(t_1, t_2)\phi_k(t_2)\,dt_2 = \lambda_k\phi_k(t_1), \qquad 0 \le t_1 \le T. \tag{9.122}$$

The values $\lambda_k$ and the corresponding functions $\phi_k(t)$ for which the above equation holds are called the eigenvalues and eigenfunctions of the covariance function $K_X(t_1, t_2)$. Note that it is possible for the eigenfunctions to be complex-valued, e.g., complex exponentials. It can be shown that if $K_X(t_1, t_2)$ is continuous, then the normalized eigenfunctions form an orthonormal set and satisfy Mercer's theorem:

$$K_X(t_1, t_2) = \sum_{k=1}^{\infty}\lambda_k\phi_k(t_1)\phi_k^*(t_2). \tag{9.123}$$

Note the correspondence between Eq. (9.121) and Eq. (9.123). Equation (9.123) in turn implies that

$$K_X(t, t) = \sum_{k=1}^{\infty}\lambda_k|\phi_k(t)|^2. \tag{9.124}$$
We are now ready to show that the equality in Eq. (9.119a) holds in the mean square sense and that the coefficients $X_k$ are orthogonal random variables. First consider $E[X_k X_m^*]$:

$$E[X_k X_m^*] = E\!\left[X_m^*\int_0^T X(t')\phi_k^*(t')\,dt'\right] = \int_0^T E[X(t')X_m^*]\phi_k^*(t')\,dt'.$$

The integrand of the above equation has

$$E[X(t)X_m^*] = E\!\left[X(t)\int_0^T X^*(u)\phi_m(u)\,du\right] = \int_0^T K_X(t, u)\phi_m(u)\,du = \lambda_m\phi_m(t).$$

Therefore

$$E[X_k X_m^*] = \int_0^T \lambda_m\phi_m(t')\phi_k^*(t')\,dt' = \lambda_k\delta_{k,m},$$

where $\delta_{k,m}$ is the Kronecker delta function. Thus $X_k$ and $X_m$ are orthogonal random variables. Note that the above equation implies that $\lambda_k = E[|X_k|^2]$, that is, the eigenvalues are real-valued.
To show that the Karhunen-Loeve expansion equals $X(t)$ in the mean square sense, we take

$$E\!\left[\left|X(t) - \sum_{k=1}^{\infty} X_k\phi_k(t)\right|^2\right] = E[|X(t)|^2] - E\!\left[X(t)\sum_{k=1}^{\infty} X_k^*\phi_k^*(t)\right] - E\!\left[X^*(t)\sum_{k=1}^{\infty} X_k\phi_k(t)\right] + E\!\left[\sum_{k=1}^{\infty}\sum_{m=1}^{\infty} X_k X_m^*\phi_k(t)\phi_m^*(t)\right] = R_X(t, t) - \sum_{k=1}^{\infty}\lambda_k|\phi_k(t)|^2 - \sum_{k=1}^{\infty}\lambda_k^*|\phi_k(t)|^2 + \sum_{k=1}^{\infty}\lambda_k|\phi_k(t)|^2.$$

The above equation equals zero from Eq. (9.124) and from the fact that the $\lambda_k$ are real. Thus we have shown that Eq. (9.119a) holds in the mean square sense.
Finally, we note that in the important case where $X(t)$ is a Gaussian random process, the coefficients $X_k$ will be independent Gaussian random variables. This result is extremely useful in solving certain signal detection and estimation problems [Van Trees].
Example 9.51 Wiener Process
Find the Karhunen-Loeve expansion for the Wiener process.
Equation (9.122) for the Wiener process gives, for $0 \le t_1 \le T$,

$$\lambda\phi(t_1) = \int_0^T \sigma^2\min(t_1, t_2)\phi(t_2)\,dt_2 = \sigma^2\int_0^{t_1} t_2\phi(t_2)\,dt_2 + \sigma^2 t_1\int_{t_1}^T \phi(t_2)\,dt_2.$$
We differentiate the above integral equation once with respect to $t_1$ to obtain an integral equation, and again to obtain a differential equation:

$$\sigma^2\int_{t_1}^T \phi(t_2)\,dt_2 = \lambda\frac{d}{dt_1}\phi(t_1)$$

$$-\phi(t_1) = \frac{\lambda}{\sigma^2}\frac{d^2}{dt_1^2}\phi(t_1).$$

This second-order differential equation has a sinusoidal solution:

$$\phi(t_1) = a\sin\frac{\sigma t_1}{\sqrt{\lambda}} + b\cos\frac{\sigma t_1}{\sqrt{\lambda}}.$$

In order to solve the above equation for $a$, $b$, and $\lambda$, we need boundary conditions for the differential equation. We obtain these by substituting the general solution for $\phi(t)$ into the integral equation:

$$\frac{\lambda}{\sigma^2}\left(a\sin\frac{\sigma t_1}{\sqrt{\lambda}} + b\cos\frac{\sigma t_1}{\sqrt{\lambda}}\right) = \int_0^{t_1} t_2\phi(t_2)\,dt_2 + t_1\int_{t_1}^T \phi(t_2)\,dt_2.$$

As $t_1$ approaches zero, the right-hand side approaches zero. This implies that $b = 0$ on the left-hand side of the equation. A second boundary condition is obtained by letting $t_1$ approach $T$ in the equation obtained after the first differentiation of the integral equation:

$$0 = \lambda\frac{d}{dt_1}\phi(T) = \sigma a\sqrt{\lambda}\cos\frac{\sigma T}{\sqrt{\lambda}}.$$

This implies that

$$\frac{\sigma T}{\sqrt{\lambda}} = \left(n - \frac{1}{2}\right)\pi \qquad n = 1, 2, \dots.$$

Therefore the eigenvalues are given by

$$\lambda_n = \frac{\sigma^2 T^2}{\left(n - \frac{1}{2}\right)^2\pi^2} \qquad n = 1, 2, \dots.$$

The normalization requirement implies that

$$1 = \int_0^T \left(a\sin\frac{\sigma t}{\sqrt{\lambda}}\right)^2 dt = a^2\,\frac{T}{2},$$

which implies that $a = (2/T)^{1/2}$. Thus the eigenfunctions are given by

$$\phi_n(t) = \sqrt{\frac{2}{T}}\sin\!\left(n - \frac{1}{2}\right)\frac{\pi}{T}t \qquad 0 \le t \le T,$$

and the Karhunen-Loeve expansion for the Wiener process is

$$X(t) = \sum_{n=1}^{\infty} X_n\sqrt{\frac{2}{T}}\sin\!\left(n - \frac{1}{2}\right)\frac{\pi}{T}t \qquad 0 \le t < T,$$

where the $X_n$ are zero-mean, independent Gaussian random variables with variance given by $\lambda_n$.
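A quick consistency check on these eigenvalues (a Python sketch of ours, not from the text): by Eq. (9.124), $\sum_n \lambda_n = \int_0^T K_X(t,t)\,dt = \int_0^T \sigma^2 t\,dt = \sigma^2 T^2/2$, and the truncated eigenvalue sum indeed approaches this value:

```python
import math

# Consistency check on the Wiener-process eigenvalues: summing
# lambda_n = sigma^2 T^2 / ((n - 1/2)^2 pi^2) should give sigma^2 T^2 / 2.
sigma2, T = 2.0, 1.5                       # arbitrary illustrative values
lam_sum = sum(sigma2 * T * T / ((n - 0.5) ** 2 * math.pi ** 2)
              for n in range(1, 200001))   # truncated eigenvalue sum
print(lam_sum, sigma2 * T * T / 2)         # the two agree closely
```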
Example 9.52 White Gaussian Noise Process
Find the Karhunen-Loeve expansion of the white Gaussian noise process.
The white Gaussian noise process is the derivative of the Wiener process. If we take the
derivative of the Karhunen-Loeve expansion of the Wiener process, we obtain
$$X'(t) = \sum_{n=1}^{\infty}\frac{\sigma}{\sqrt{\lambda_n}}\,X_n\sqrt{\frac{2}{T}}\cos\!\left(n - \frac{1}{2}\right)\frac{\pi}{T}t = \sum_{n=1}^{\infty} W_n\sqrt{\frac{2}{T}}\cos\!\left(n - \frac{1}{2}\right)\frac{\pi}{T}t \qquad 0 \le t < T,$$

where the $W_n$ are independent Gaussian random variables with the same variance $\sigma^2$. This implies that the process has infinite power, a fact we had already found about the white Gaussian
noise process. In the Problems we will see that any orthonormal set of eigenfunctions can be
used in the Karhunen-Loeve expansion for white Gaussian noise.
9.10 GENERATING RANDOM PROCESSES
Many engineering systems involve random processes that interact in complex ways. It
is not always possible to model these systems precisely using analytical methods. In
such situations computer simulation methods are used to investigate the system dynamics and to measure the performance parameters of interest. In this section we consider two basic methods for generating random processes. The first approach involves
generating the sum process of iid sequences of random variables. We saw that this approach can be used to generate the binomial and random walk processes, and, through
limiting procedures, the Wiener and Poisson processes. The second approach involves
taking the linear combination of deterministic functions of time where the coefficients
are given by random variables. The Fourier series and Karhunen-Loeve expansion use
this approach. Real systems, e.g., digital modulation systems, also generate random
processes in this manner.
9.10.1 Generating Sum Random Processes
The generation of sample functions of the sum random process involves two steps:
1. Generate a sequence of iid random variables that drive the sum process.
2. Generate the cumulative sum of the iid sequence.
Let D be an array of samples of the desired iid random variables. The function
cumsum(D) in Octave and MATLAB then provides the cumulative sum, that is, the sum
process, that results from the sequence in D.
The code below generates m realizations of an n-step random walk process.
> p=1/2;
> n=1000;
> m=4;
> V=-1:2:1;
> P=[1-p, p];
> D=discrete_rnd(V, P, n, m);   % n-by-m matrix: one realization per column
> X=cumsum(D);                  % cumulative sum down each column
> plot(X)
Figures 9.7(a) and 9.7(b) in Section 9.3 show four sample functions of the symmetric random walk process for p = 1/2. The sample functions vary over a wide range of positive
and negative values. Figure 9.7(c) shows four sample functions for p = 3/4. The sample
functions now have a strong linear trend consistent with the mean $n(2p - 1)$. The variability about this trend is somewhat less than in the symmetric case since the variance function is now $4np(1-p) = 3n/4$.
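For readers working outside Octave/MATLAB, a plain-Python analogue of the random walk generation above (function name and seeds are ours) is:

```python
import random
from itertools import accumulate

# Plain-Python analogue (not from the text) of the Octave fragment above:
# realizations of an n-step random walk with P[step = +1] = p.
def random_walk(n, p, seed):
    rng = random.Random(seed)
    steps = [1 if rng.random() < p else -1 for _ in range(n)]
    return list(accumulate(steps))      # the sum process S_1, ..., S_n

walks = [random_walk(1000, 0.75, seed) for seed in range(4)]
# For p = 3/4 each walk drifts along the mean function n(2p - 1) = n/2.
print(walks[0][-1])
```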
We can generate an approximation to a Poisson process by summing iid Bernoulli random variables. Figure 9.18(a) shows ten realizations of Poisson processes with $\lambda = 0.4$ arrivals per second. The sample functions for $T = 50$ seconds were generated using a 1000-step binomial process with $p = \lambda T/n = 0.02$. The linear increasing trend of the Poisson process is evident in the figure. Figure 9.18(b) shows the estimate of the mean and variance functions obtained by averaging across the 10 realizations. The linear trend in the sample mean function is very clear; the sample variance function is also linear but is much more variable. The mean and variance functions of the realizations are obtained using the commands mean(transpose(X)) and var(transpose(X)).
We can generate sample functions of the random telegraph signal by taking the Poisson process $N(t)$ and calculating $X(t) = 2(N(t) \bmod 2) - 1$. Figure 9.19(a) shows a realization of the random telegraph signal. Figure 9.19(b) shows an estimate of the covariance function of the random telegraph signal. The exponential decay in the covariance function can be seen in the figure. See Eq. (9.44).
[Figure 9.18 appears here.]
FIGURE 9.18
(a) Ten sample functions of a Poisson random process with $\lambda = 0.4$. (b) Sample mean and variance of ten sample functions of a Poisson random process with $\lambda = 0.4$.
[Figure 9.19 appears here.]
FIGURE 9.19
(a) Sample function of a random telegraph process with $\lambda = 0.4$. (b) Estimate of covariance function of a random telegraph process.
The covariance function is computed using the function CX_est below.
function [CXall]=CX_est(X, L)
  N=length(X);                   % N is number of samples
  CX=zeros(1, L+1);              % L is maximum lag
  M_est=mean(X);                 % sample mean
  for m=1:L+1,                   % add product terms for lag m-1
    for n=1:N-m+1,
      CX(m)=CX(m) + (X(n) - M_est)*(X(n+m-1) - M_est);
    end;
    CX(m)=CX(m)/(N-m+1);         % normalize by number of terms
  end;
  for i=1:L,
    CXall(i)=CX(L+2-i);          % lags 1 to L (negative lags, by symmetry)
  end
  CXall(L+1:2*L+1)=CX(1:L+1);    % lags L + 1 to 2L + 1
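A Python transcription of the same estimator (ours, not from the text) may be easier to test; on a deterministic $\pm 1$ alternating sequence the estimated covariance alternates in sign exactly as expected:

```python
# Python transcription (ours) of CX_est: sample covariance of X at lags
# -L..L, normalized by the number of terms at each lag.
def cx_est(X, L):
    N = len(X)
    m = sum(X) / N                           # sample mean
    CX = []
    for lag in range(L + 1):                 # lags 0..L
        terms = ((X[n] - m) * (X[n + lag] - m) for n in range(N - lag))
        CX.append(sum(terms) / (N - lag))
    return CX[:0:-1] + CX                    # mirror: lags -L..L

x = [1.0, -1.0] * 500                        # deterministic alternating wave
print(cx_est(x, 2))                          # [1.0, -1.0, 1.0, -1.0, 1.0]
```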
The Wiener random process can also be generated as a sum process. One approach is to generate a properly scaled random walk process, as in Eq. (9.50). A better approach is to note that the Wiener process has independent Gaussian increments, as in Eq. (9.52): generate the sequence D of increments for the time subintervals, and then find the corresponding sum process. The code below generates a sample of the Wiener process:
> a=2;
> delta=0.001;
> n=1000;
> D=normal_rnd(0, a*delta, 1, n);   % increments with variance a*delta
> X=cumsum(D);
> plot(X)
[Figure 9.20 appears here.]
FIGURE 9.20
Sample mean and variance functions from 50 realizations of Wiener process.
Figure 9.12 in Section 9.5 shows four sample functions of a Brownian motion process
with a = 2. Figure 9.20 shows the sample mean and sample variance of 50 sample
functions of the Wiener process with a = 2. It can be seen that the mean across the 50
realizations is close to zero, which is the actual mean function for the process. The sample variance across the 50 realizations increases steadily and is close to the actual variance function, which is $\alpha t = 2t$.
9.10.2 Generating Linear Combinations of Deterministic Functions
In some situations a random process can be represented as a linear combination
of deterministic functions where the coefficients are random variables. The Fourier series and the Karhunen-Loeve expansions are examples of this type of representation.
In Example 9.51 let the parameters in the Karhunen-Loeve expansion for a Wiener process in the interval $0 \le t \le T$ be $T = 1$, $\sigma^2 = 1$:

$$X(t) = \sum_{n=1}^{\infty} X_n\sqrt{\frac{2}{T}}\sin\!\left(n - \frac{1}{2}\right)\frac{\pi t}{T} = \sum_{n=1}^{\infty} X_n\sqrt{2}\sin\!\left(n - \frac{1}{2}\right)\pi t,$$

where the $X_n$ are zero-mean, independent Gaussian random variables with variance

$$\lambda_n = \frac{\sigma^2 T^2}{(n - 1/2)^2\pi^2} = \frac{1}{(n - 1/2)^2\pi^2}.$$
The following code generates the 100 Gaussian coefficients for the Karhunen-Loeve
expansion for the Wiener process.
[Figure 9.21 appears here.]
FIGURE 9.21
Sample functions for Wiener process using 100 terms in Karhunen-Loeve expansion.
> M=zeros(100,1);               % number of coefficients
> n=1:1:100;
> N=transpose(n);
> v=1./((N-0.5).^2*pi^2);       % variances of coefficients
> t=0.01:0.01:1;
> p=(N-0.5)*t;                  % argument of sinusoid
> x=normal_rnd(M,v,100,1);      % Gaussian coefficients
> y=sqrt(2)*sin(pi*p);          % sin terms
> z=transpose(x)*y;
> plot(z)
Figure 9.21 shows the Karhunen-Loeve expansion for the Wiener process using 100
terms. The sample functions generally exhibit the same type of behavior as in the previous figures. The sample functions, however, do not exhibit the jaggedness of the other examples, which are based on the generation of many more random variables.
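The same synthesis is easy to reproduce in plain Python (a sketch of ours, not from the text; the function name and seed are arbitrary). As a check, Mercer's theorem gives $\mathrm{Var}\,X(1) = \sum_n 2\lambda_n \to \sigma^2 T = 1$, and 100 terms already capture nearly all of this variance:

```python
import math
import random

# Python sketch (ours) of the Karhunen-Loeve synthesis with T = 1 and
# sigma^2 = 1: X(t) = sum_n X_n sqrt(2) sin((n - 1/2) pi t).
def wiener_kl_path(n_terms, n_points, seed):
    rng = random.Random(seed)
    lam = [1.0 / ((n - 0.5) ** 2 * math.pi ** 2)
           for n in range(1, n_terms + 1)]          # eigenvalues lambda_n
    coef = [rng.gauss(0.0, math.sqrt(l)) for l in lam]
    ts = [(k + 1) / n_points for k in range(n_points)]
    return [sum(c * math.sqrt(2.0) * math.sin((n + 0.5) * math.pi * t)
                for n, c in enumerate(coef)) for t in ts]

path = wiener_kl_path(100, 100, seed=3)
# Mercer's theorem: Var X(1) = sum_n 2 lambda_n -> sigma^2 T = 1.
trunc_var = sum(2.0 / ((n - 0.5) ** 2 * math.pi ** 2) for n in range(1, 101))
print(trunc_var)        # ~0.998 with 100 terms
```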
SUMMARY
• A random process or stochastic process is an indexed family of random variables
that is specified by the set of joint distributions of any number and choice of random variables in the family. The mean, autocovariance, and autocorrelation functions summarize some of the information contained in the joint distributions of
pairs of time samples.
• The sum process of an iid sequence has the property of stationary and independent increments, which facilitates the evaluation of the joint pdf/pmf of the process at any set of time instants. The binomial and random walk processes are sum processes. The Poisson and Wiener processes are obtained as limiting forms of these sum processes.
• The Poisson process has independent, stationary increments that are Poisson distributed. The interarrival times in a Poisson process are iid exponential random variables.
• The mean and covariance functions completely specify all joint distributions of a Gaussian random process.
• The Wiener process has independent, stationary increments that are Gaussian distributed. The Wiener process is a Gaussian random process.
• A random process is stationary if its joint distributions are independent of the choice of time origin. If a random process is stationary, then $m_X(t)$ is constant, and $R_X(t_1, t_2)$ depends only on $t_1 - t_2$.
• A random process is wide-sense stationary (WSS) if its mean is constant and if its autocorrelation and autocovariance depend only on $t_1 - t_2$. A WSS process need not be stationary.
• A wide-sense stationary Gaussian random process is also stationary.
• A random process is cyclostationary if its joint distributions are invariant with respect to shifts of the time origin by integer multiples of some period $T$.
• The white Gaussian noise process results from taking the derivative of the Wiener process.
• The derivative and integral of a random process are defined as limits of random variables. We investigated the existence of these limits in the mean square sense.
• The mean and autocorrelation functions of the output of systems described by a linear differential equation and subject to random process inputs can be obtained by solving a set of differential equations. If the input process is a Gaussian random process, then the output process is also Gaussian.
• Ergodic theorems state when time-average estimates of a parameter of a random process converge to the expected value of the parameter. The decay rate of the covariance function determines the convergence rate of the sample mean.
CHECKLIST OF IMPORTANT TERMS
Autocorrelation function
Autocovariance function
Average power
Bernoulli random process
Binomial counting process
Continuous-time process
Cross-correlation function
Cross-covariance function
Cyclostationary random process
Discrete-time process
Ergodic theorem
Fourier series
Gaussian random process
Hurst parameter
iid random process
Independent increments
Independent random processes
Karhunen-Loeve expansion
Markov random process
Mean ergodic random process
556
Chapter 9
Random Processes
Mean function
Mean square continuity
Mean square derivative
Mean square integral
Mean square periodic process
Ornstein-Uhlenbeck process
Orthogonal random processes
Poisson process
Random process
Random telegraph signal
Random walk process
Realization, sample path, or sample
function
Shot noise
Stationary increments
Stationary random process
Stochastic process
Sum random process
Time average
Uncorrelated random processes
Variance of X(t)
White Gaussian noise
Wide-sense cyclostationary process
Wiener process
WSS random process
ANNOTATED REFERENCES
References [1] through [6] can be consulted for further reading on random processes.
Larson and Shubert [ref 7] and Yaglom [ref 8] contain excellent discussions on white
Gaussian noise and Brownian motion. Van Trees [ref 9] gives detailed examples on the
application of the Karhunen-Loeve expansion. Beran [ref 10] discusses long memory
processes.
1. A. Papoulis and S. Pillai, Probability, Random Variables, and Stochastic Processes,
McGraw-Hill, New York, 2002.
2. W. B. Davenport, Probability and Random Processes: An Introduction for Applied
Scientists and Engineers, McGraw-Hill, New York, 1970.
3. H. Stark and J. W. Woods, Probability and Random Processes with Applications to
Signal Processing, 3d ed., Prentice Hall, Upper Saddle River, N.J., 2002.
4. R. M. Gray and L. D. Davisson, Random Processes: A Mathematical Approach for
Engineers, Prentice Hall, Englewood Cliffs, N.J., 1986.
5. J. A. Gubner, Probability and Random Processes for Electrical and Computer Engineering, Cambridge University Press, Cambridge, 2006.
6. G. Grimmett and D. Stirzaker, Probability and Random Processes, Oxford University Press, Oxford, 2006.
7. H. J. Larson and B. O. Shubert, Probabilistic Models in Engineering Sciences, vol. 1,
Wiley, New York, 1979.
8. A. M. Yaglom, Correlation Theory of Stationary and Related Random Functions,
vol. 1: Basic Results, Springer-Verlag, New York, 1987.
9. H. L. Van Trees, Detection, Estimation, and Modulation Theory, Wiley, New York,
1987.
10. J. Beran, Statistics for Long-Memory Processes, Chapman & Hall/CRC, New
York, 1994.
PROBLEMS
Sections 9.1 and 9.2: Definition and Specification of a Stochastic Process
9.1. In Example 9.1, find the joint pmf for X1 and X2 . Why are X1 and X2 independent?
9.2. A discrete-time random process Xn is defined as follows. A fair die is tossed and the outcome k is observed. The process is then given by Xn = k for all n.
(a) Sketch some sample paths of the process.
(b) Find the pmf for Xn .
(c) Find the joint pmf for Xn and Xn + k .
(d) Find the mean and autocovariance functions of Xn .
9.3. A discrete-time random process $X_n$ is defined as follows. A fair coin is tossed. If the outcome is heads, $X_n = (-1)^n$ for all $n$; if the outcome is tails, $X_n = (-1)^{n+1}$ for all $n$.
(a) Sketch some sample paths of the process.
(b) Find the pmf for Xn .
(c) Find the joint pmf for Xn and Xn + k .
(d) Find the mean and autocovariance functions of Xn .
9.4. A discrete-time random process is defined by $X_n = s^n$, for $n \ge 0$, where $s$ is selected at
random from the interval (0, 1).
(a) Sketch some sample paths of the process.
(b) Find the cdf of Xn .
(c) Find the joint cdf for Xn and Xn + 1 .
(d) Find the mean and autocovariance functions of Xn .
(e) Repeat parts a, b, c, and d if s is uniform in (1, 2).
9.5. Let g(t) be the rectangular pulse shown in Fig. P9.1. The random process X(t) is defined as
$$X(t) = A\,g(t),$$
where $A$ assumes the values $\pm 1$ with equal probability.
[Figure P9.1 appears here: the rectangular pulse g(t).]
FIGURE P9.1
(a) Find the pmf of X(t).
(b) Find $m_X(t)$.
(c) Find the joint pmf of $X(t)$ and $X(t + d)$.
(d) Find $C_X(t, t + d)$, $d > 0$.
9.6. A random process is defined by
$$Y(t) = g(t - T),$$
where g(t) is the rectangular pulse of Fig. P9.1, and T is a uniformly distributed random
variable in the interval (0, 1).
558
Chapter 9
Random Processes
(a) Find the pmf of Y(t).
(b) Find $m_Y(t)$ and $C_Y(t_1, t_2)$.
9.7. A random process is defined by
$$X(t) = g(t - T),$$
where T is a uniform random variable in the interval (0, 1) and g(t) is the periodic triangular waveform shown in Fig. P9.2.
[Figure P9.2 appears here: the periodic triangular waveform g(t).]
FIGURE P9.2
(a) Find the cdf of X(t) for $0 < t < 1$.
(b) Find $m_X(t)$ and $C_X(t_1, t_2)$.
9.8. Let $Y(t) = g(t - T)$ as in Problem 9.6, but let $T$ be an exponentially distributed random variable with parameter $\alpha$.
(a) Find the pmf of Y(t).
(b) Find the joint pmf of Y(t) and $Y(t + d)$. Consider two cases: $d > 1$, and $0 < d < 1$.
(c) Find $m_Y(t)$ and $C_Y(t, t + d)$ for $d > 1$ and $0 < d < 1$.
9.9. Let $Z(t) = At^3 + B$, where A and B are independent random variables.
(a) Find the pdf of Z(t).
(b) Find $m_Z(t)$ and $C_Z(t_1, t_2)$.
9.10. Find an expression for $E[|X(t_2) - X(t_1)|^2]$ in terms of the autocorrelation function.
9.11. The random process H(t) is defined as the "hard-limited" version of X(t):
$$H(t) = \begin{cases} +1 & \text{if } X(t) \ge 0 \\ -1 & \text{if } X(t) < 0. \end{cases}$$
(a) Find the pdf, mean, and autocovariance of H(t) if X(t) is the sinusoid with a random amplitude presented in Example 9.2.
(b) Find the pdf, mean, and autocovariance of H(t) if X(t) is the sinusoid with random phase presented in Example 9.9.
(c) Find a general expression for the mean of H(t) in terms of the cdf of X(t).
9.12. (a) Are independent random processes orthogonal? Explain.
(b) Are orthogonal random processes uncorrelated? Explain.
(c) Are uncorrelated processes independent?
(d) Are uncorrelated processes orthogonal?
9.13. The random process Z(t) is defined by
$$Z(t) = 2Xt - Y,$$
where $X$ and $Y$ are a pair of random variables with means $m_X$, $m_Y$, variances $\sigma_X^2$, $\sigma_Y^2$, and correlation coefficient $\rho_{X,Y}$. Find the mean and autocovariance of Z(t).
9.14. Let H(t) be the output of the hard limiter in Problem 9.11.
(a) Find the cross-correlation and cross-covariance between H(t) and X(t) when the input is a sinusoid with random amplitude as in Problem 9.11a.
(b) Repeat if the input is a sinusoid with random phase as in Problem 9.11b.
(c) Are the input and output processes uncorrelated? Orthogonal?
9.15. Let $Y_n = X_n + g(n)$ where $X_n$ is a zero-mean discrete-time random process and g(n) is a deterministic function of n.
(a) Find the mean and variance of $Y_n$.
(b) Find the joint cdf of $Y_n$ and $Y_{n+1}$.
(c) Find the autocovariance function of $Y_n$.
(d) Plot typical sample functions for $X_n$ and $Y_n$ if: $g(n) = n$; $g(n) = 1/n^2$; $g(n) = 1/n$.
9.16. Let $Y_n = c(n)X_n$ where $X_n$ is a zero-mean, unit-variance, discrete-time random process and c(n) is a deterministic function of n.
(a) Find the mean and variance of $Y_n$.
(b) Find the joint cdf of $Y_n$ and $Y_{n+1}$.
(c) Find the autocovariance function of $Y_n$.
(d) Plot typical sample functions for $X_n$ and $Y_n$ if: $c(n) = n$; $c(n) = 1/n^2$; $c(n) = 1/n$.
9.17. (a) Find the cross-correlation and cross-covariance for $X_n$ and $Y_n$ in Problem 9.15.
(b) Find the joint pdf of $X_n$ and $Y_{n+1}$.
(c) Determine whether $X_n$ and $Y_n$ are uncorrelated, independent, or orthogonal random processes.
9.18. (a) Find the cross-correlation and cross-covariance for $X_n$ and $Y_n$ in Problem 9.16.
(b) Find the joint pdf of $X_n$ and $Y_{n+1}$.
(c) Determine whether $X_n$ and $Y_n$ are uncorrelated, independent, or orthogonal random processes.
9.19. Suppose that X(t) and Y(t) are independent random processes and let
$$U(t) = X(t) - Y(t) \qquad V(t) = X(t) + Y(t).$$
(a) Find $C_{UX}(t_1, t_2)$, $C_{UY}(t_1, t_2)$, and $C_{UV}(t_1, t_2)$.
(b) Find $f_{U(t_1)X(t_2)}(u, x)$ and $f_{U(t_1)V(t_2)}(u, v)$. Hint: Use auxiliary variables.
9.20. Repeat Problem 9.19 if X(t) and Y(t) are independent discrete-time processes given by two different iid random processes.
Section 9.3: Sum Process, Binomial Counting Process, and Random Walk
9.21. (a) Let Yn be the process that results when individual 1’s in a Bernoulli process are
erased with probability $\alpha$. Find the pmf of $S_n'$, the counting process for $Y_n$. Does $Y_n$
have independent and stationary increments?
(b) Repeat part a if in addition to the erasures, individual 0’s in the Bernoulli process
are changed to 1's with probability $\beta$.
9.22. Let Sn denote a binomial counting process.
(a) Show that $P[S_n = j, S_{n'} = i] \neq P[S_n = j]\,P[S_{n'} = i]$.
(b) Find $P[S_{n_2} = j \mid S_{n_1} = i]$, where $n_2 > n_1$.
(c) Show that $P[S_{n_2} = j \mid S_{n_1} = i, S_{n_0} = k] = P[S_{n_2} = j \mid S_{n_1} = i]$, where $n_2 > n_1 > n_0$.
9.23. (a) Find $P[S_n = 0]$ for the random walk process.
(b) What is the answer in part a if p = 1/2?
9.24. Consider the following moving average processes:
$$Y_n = \tfrac{1}{2}(X_n + X_{n-1}), \qquad X_0 = 0$$
$$Z_n = \tfrac{2}{3}X_n + \tfrac{1}{3}X_{n-1}, \qquad X_0 = 0$$
(a) Find the mean, variance, and covariance of Yn and Zn if Xn is a Bernoulli random
process.
(b) Repeat part a if Xn is the random step process.
(c) Generate 100 outcomes of a Bernoulli random process Xn , and find the resulting Yn
and Zn . Are the sample means of Yn and Zn in part a close to their respective
means?
(d) Repeat part c with Xn given by the random step process.
9.25. Consider the following autoregressive processes:
$$W_n = 2W_{n-1} + X_n, \qquad W_0 = 0$$
$$Z_n = \tfrac{3}{4}Z_{n-1} + X_n, \qquad Z_0 = 0.$$
(a) Suppose that Xn is a Bernoulli process. What trends do the processes exhibit?
(b) Express $W_n$ and $Z_n$ in terms of $X_n, X_{n-1}, \dots, X_1$ and then find $E[W_n]$ and $E[Z_n]$.
Do these results agree with the trends you expect?
(c) Do Wn or Zn have independent increments? stationary increments?
(d) Generate 100 outcomes of a Bernoulli process. Find the resulting realizations of Wn
and Zn . Is the sample mean meaningful for either of these processes?
(e) Repeat part d if Xn is the random step process.
9.26. Let Mn be the discrete-time process defined as the sequence of sample means of an iid
sequence:
$$M_n = \frac{X_1 + X_2 + \dots + X_n}{n}.$$
(a) Find the mean, variance, and covariance of Mn .
(b) Does Mn have independent increments? stationary increments?
9.27. Find the pdf of the processes defined in Problem 9.24 if the Xn are an iid sequence of
zero-mean, unit-variance Gaussian random variables.
9.28. Let Xn consist of an iid sequence of Cauchy random variables.
(a) Find the pdf of the sum process Sn . Hint: Use the characteristic function method.
(b) Find the joint pdf of Sn and Sn + k .
9.29. Let Xn consist of an iid sequence of Poisson random variables with mean a.
(a) Find the pmf of the sum process Sn .
(b) Find the joint pmf of Sn and Sn + k .
Problems
561
9.30. Let Xn be an iid sequence of zero-mean, unit-variance Gaussian random variables.
(a) Find the pdf of Mn defined in Problem 9.26.
(b) Find the joint pdf of Mn and Mn + k . Hint: Use the independent increments property
of Sn .
9.31. Repeat Problem 9.26 with $X_n = \tfrac{1}{2}(Y_n + Y_{n-1})$, where $Y_n$ is an iid random process.
What happens to the variance of Mn as n increases?
9.32. Repeat Problem 9.26 with $X_n = \tfrac{3}{4}X_{n-1} + Y_n$, where $Y_n$ is an iid random process. What
happens to the variance of Mn as n increases?
9.33. Suppose that an experiment has three possible outcomes, say 0, 1, and 2, and suppose that these occur with probabilities $p_0$, $p_1$, and $p_2$, respectively. Consider a sequence of independent repetitions of the experiment, and let $X_j(n)$ be the indicator function for outcome $j$. The vector
$$\mathbf{X}(n) = (X_0(n), X_1(n), X_2(n))$$
then constitutes a vector-valued Bernoulli random process. Consider the counting process for $\mathbf{X}(n)$:
$$\mathbf{S}(n) = \mathbf{X}(n) + \mathbf{X}(n-1) + \dots + \mathbf{X}(1), \qquad \mathbf{S}(0) = \mathbf{0}.$$
(a) Show that S(n) has a multinomial distribution.
(b) Show that S(n) has independent increments, then find the joint pmf of S(n) and
S1n + k2.
(c) Show that components Sj1n2 of the vector process are binomial counting
processes.
Section 9.4: Poisson and Associated Random Processes
9.34. A server handles queries that arrive according to a Poisson process with a rate of 10
queries per minute. What is the probability that no queries go unanswered if the server is
unavailable for 20 seconds?
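As a quick numerical sanity check (a sketch, not part of the text): queries go unanswered only if one or more arrive during the 20-second outage, so the answer is P[N(1/3 min) = 0] = e^(−10/3). The helper name below is my own:

```python
import math

def poisson_pmf(k, rate, t):
    """P[N(t) = k] for a Poisson process with the given arrival rate."""
    mu = rate * t
    return math.exp(-mu) * mu ** k / math.factorial(k)

# 10 queries/min; the server is down for 20 s = 1/3 min, so no query is lost
# only if zero arrivals occur in that window.
p_no_loss = poisson_pmf(0, rate=10.0, t=1.0 / 3.0)
print(p_no_loss)  # e^(-10/3) ≈ 0.0357
```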
9.35. Customers deposit $1 in a vending machine according to a Poisson process with rate λ.
The machine issues an item with probability p. Find the pmf for the number of items
dispensed in time t.
9.36. Noise impulses occur in a radio transmission according to a Poisson process of rate λ.
(a) Find the probability that no impulses occur during the transmission of a message
that is t seconds long.
(b) Suppose that the message is encoded so that the errors caused by up to 2 impulses can
be corrected. What is the probability that a t-second message cannot be corrected?
9.37. Packets arrive at a multiplexer at two ports according to independent Poisson processes
of rates λ1 = 1 and λ2 = 2 packets/second, respectively.
(a) Find the probability that a message arrives first on line 2.
(b) Find the pdf for the time until a message arrives on either line.
(c) Find the pmf for N(t), the total number of messages that arrive in an interval of
length t.
(d) Generalize the result of part c for the “merging” of k independent Poisson processes
of rates λ1, …, λk, respectively:
N(t) = N1(t) + … + Nk(t).
562
Chapter 9
Random Processes
9.38. (a) Find P[N(t − d) = j | N(t) = k] with d > 0, where N(t) is a Poisson process with
rate λ.
(b) Compare your answer to P[N(t + d) = j | N(t) = k]. Explain the difference, if
any.
9.39. Let N1(t) be a Poisson process with arrival rate λ1 that is started at t = 0. Let N2(t) be
another Poisson process that is independent of N1(t), that has arrival rate λ2, and that is
started at t = 1.
(a) Show that the pmf of the process N(t) = N1(t) + N2(t) is given by:
P[N(t + τ) − N(t) = k] = ([m(t + τ) − m(t)]^k / k!) e^{−[m(t + τ) − m(t)]}  for k = 0, 1, …,
where m(t) = E[N(t)].
(b) Now consider a Poisson process in which the arrival rate λ(t) is a piecewise constant
function of time. Explain why the pmf of the process is given by the above pmf
where
m(t) = ∫₀ᵗ λ(t′) dt′.
(c) For what other arrival functions λ(t) does the pmf in part a hold?
9.40. (a) Suppose that the time required to service a customer in a queueing system is a
random variable T. If customers arrive at the system according to a Poisson process
with parameter λ, find the pmf for the number of customers that arrive during one
customer’s service time. Hint: Condition on the service time.
(b) Evaluate the pmf in part a if T is an exponential random variable with parameter β.
9.41. (a) Is the difference of two independent Poisson random processes also a Poisson
process?
(b) Let Np(t) be the number of complete pairs generated by a Poisson process up to
time t. Explain why Np(t) is or is not a Poisson process.
9.42. Let N(t) be a Poisson random process with parameter λ. Suppose that each time an event
occurs, a coin is flipped and the outcome (heads or tails) is recorded. Let N1(t) and N2(t)
denote the number of heads and tails recorded up to time t, respectively. Assume that p is
the probability of heads.
(a) Find P[N1(t) = j, N2(t) = k | N(t) = k + j].
(b) Use part a to show that N1(t) and N2(t) are independent Poisson random variables
of rates pλt and (1 − p)λt, respectively:
P[N1(t) = j, N2(t) = k] = ((pλt)^j / j!) e^{−pλt} · (((1 − p)λt)^k / k!) e^{−(1−p)λt}.
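The splitting result in part b is easy to probe by simulation. The sketch below (function name and parameter choices are my own, not from the text) generates Poisson arrivals on (0, t] via exponential interarrival times, thins them with a p-coin, and checks that the two resulting counts have means pλt and (1 − p)λt:

```python
import random

def split_poisson(rate, t, p, trials=20000, seed=1):
    """Thin a simulated Poisson process with a p-coin and return the
    empirical mean head-count and tail-count over many trials."""
    rng = random.Random(seed)
    heads_total = tails_total = 0
    for _ in range(trials):
        # generate arrivals in (0, t] via iid exponential interarrival times
        n, s = 0, rng.expovariate(rate)
        while s <= t:
            n += 1
            s += rng.expovariate(rate)
        h = sum(rng.random() < p for _ in range(n))
        heads_total += h
        tails_total += n - h
    return heads_total / trials, tails_total / trials

m1, m2 = split_poisson(rate=2.0, t=1.0, p=0.25)
# theoretical means: p*rate*t = 0.5 and (1-p)*rate*t = 1.5
```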
9.43. Customers play a $1 game machine according to a Poisson process with rate λ. Suppose
the machine dispenses a random reward X each time it is played. Let X(t) be the total
reward issued up to time t.
(a) Find expressions for P[X(t) = j] if Xn is Bernoulli.
(b) Repeat part a if X assumes the values {0, 5} with probabilities (5/6, 1/6).
Problems
563
(c) Repeat part a if X is Poisson with mean 1.
(d) Repeat part a if, with probability p, the machine returns all the coins.
9.44. Let X(t) denote the random telegraph signal, and let Y(t) be a process derived from X(t)
as follows: Each time X(t) changes polarity, Y(t) changes polarity with probability p.
(a) Find P[Y(t) = ±1].
(b) Find the autocovariance function of Y(t). Compare it to that of X(t).
9.45. Let Y(t) be the random signal obtained by switching between the values 0 and 1 according
to the events in a Poisson process of rate λ. Compare the pmf and autocovariance of
Y(t) with those of the random telegraph signal.
9.46. Let Z(t) be the random signal obtained by switching between the values 0 and 1 according
to the events in a counting process N(t). Let
P[N(t) = k] = (1/(1 + λt)) (λt/(1 + λt))^k,  k = 0, 1, 2, ….
(a) Find the pmf of Z(t).
(b) Find m_Z(t).
9.47. In the filtered Poisson process (Eq. (9.45)), let h(t) be a pulse of unit amplitude and
duration T seconds.
(a) Show that X(t) is then the increment in the Poisson process in the interval (t − T, t).
(b) Find the mean and autocorrelation functions of X(t).
9.48. (a) Find the second moment and variance of the shot noise process discussed in
Example 9.25.
(b) Find the variance of the shot noise process if h(t) = e^{−βt} for t ≥ 0.
9.49. Messages arrive at a message center according to a Poisson process of rate λ. Every
hour the messages that have arrived during the previous hour are forwarded to their
destination. Find the mean of the total time waited by all the messages that arrive
during the hour. Hint: Condition on the number of arrivals and consider the arrival
instants.
Section 9.5: Gaussian Random Process, Wiener Process and Brownian Motion
9.50. Let X(t) and Y(t) be jointly Gaussian random processes. Explain the relation between the conditions of independence, uncorrelatedness, and orthogonality of X(t)
and Y(t).
9.51. Let X(t) be a zero-mean Gaussian random process with autocovariance function given by
C_X(t1, t2) = 4e^{−2|t1 − t2|}.
Find the joint pdf of X(t) and X(t + s).
9.52. Find the pdf of Z(t) in Problem 9.13 if X and Y are jointly Gaussian random variables.
9.53. Let Y(t) = X(t + d) − X(t), where X(t) is a Gaussian random process.
(a) Find the mean and autocovariance of Y(t).
(b) Find the pdf of Y(t).
(c) Find the joint pdf of Y(t) and Y(t + s).
(d) Show that Y(t) is a Gaussian random process.
564
Chapter 9
Random Processes
9.54. Let X(t) = A cos ωt + B sin ωt, where A and B are iid Gaussian random variables with
zero mean and variance σ².
(a) Find the mean and autocovariance of X(t).
(b) Find the joint pdf of X(t) and X(t + s).
9.55. Let X(t) and Y(t) be independent Gaussian random processes with zero means and the
same covariance function C(t1, t2). Define the “amplitude-modulated signal” by
Z(t) = X(t) cos ωt + Y(t) sin ωt.
(a) Find the mean and autocovariance of Z(t).
(b) Find the pdf of Z(t).
9.56. Let X(t) be a zero-mean Gaussian random process with autocovariance function given by
C_X(t1, t2). If X(t) is the input to a “square law detector,” then the output is
Y(t) = X(t)².
Find the mean and autocovariance of the output Y(t).
9.57. Let Y(t) = X(t) + μt, where X(t) is the Wiener process.
(a) Find the pdf of Y(t).
(b) Find the joint pdf of Y(t) and Y(t + s).
9.58. Let Y(t) = X²(t), where X(t) is the Wiener process.
(a) Find the pdf of Y(t).
(b) Find the conditional pdf of Y(t2) given Y(t1).
9.59. Let Z(t) = X(t) − aX(t − s), where X(t) is the Wiener process.
(a) Find the pdf of Z(t).
(b) Find m_Z(t) and C_Z(t1, t2).
9.60. (a) For X(t) the Wiener process with α = 1 and 0 < t < 1, show that the joint pdf of
X(t) and X(1) is given by:
f_{X(t),X(1)}(x1, x2) = exp{ −(1/2)[ x1²/t + (x2 − x1)²/(1 − t) ] } / (2π√(t(1 − t))).
(b) Use part a to show that for 0 < t < 1, the conditional pdf of X(t) given
X(0) = X(1) = 0 is:
f_{X(t)}(x | X(0) = X(1) = 0) = exp{ −(1/2)[ x²/(t(1 − t)) ] } / √(2π t(1 − t)).
(c) Use part b to find the conditional pdf of X(t) given X(t1) = a and X(t2) = b for
t1 < t < t2. Hint: Find the equivalent process in the interval (0, t2 − t1).
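Part b can be checked by Monte Carlo using the standard bridge construction B(t) = W(t) − tW(1), which has exactly the law of the Wiener process conditioned on X(0) = X(1) = 0; the sample variance at time t should come out near t(1 − t). A sketch (helper name and sample sizes are my own):

```python
import math, random

def bridge_samples(t, n=40000, seed=2):
    """Sample X(t) for a standard Wiener process conditioned on
    X(0) = X(1) = 0, via the bridge construction B(t) = W(t) - t*W(1)."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        w_t = rng.gauss(0.0, math.sqrt(t))              # W(t) ~ N(0, t)
        w_1 = w_t + rng.gauss(0.0, math.sqrt(1.0 - t))  # add independent increment
        out.append(w_t - t * w_1)
    return out

xs = bridge_samples(0.3)
var = sum(x * x for x in xs) / len(xs)
# theory from part b: VAR = t(1 - t) = 0.21
```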
Problems
565
Section 9.6: Stationary Random Processes
9.61. (a) Is the random amplitude sinusoid in Example 9.9 a stationary random process? Is it
wide-sense stationary?
(b) Repeat part a for the random phase sinusoid in Example 9.10.
9.62. A discrete-time random process Xn is defined as follows. A fair coin is tossed; if the
outcome is heads, then Xn = 1 for all n; otherwise, Xn = −1 for all n.
(a) Is Xn a WSS random process?
(b) Is Xn a stationary random process?
(c) Do the answers in parts a and b change if the coin is biased?
9.63. Let Xn be the random process in Problem 9.3.
(a) Is Xn a WSS random process?
(b) Is Xn a stationary random process?
(c) Is Xn a cyclostationary random process?
9.64. Let X(t) = g(t − T), where g(t) is the periodic waveform introduced in Problem 9.7,
and T is a uniformly distributed random variable in the interval (0, 1). Is X(t) a stationary
random process? Is X(t) wide-sense stationary?
9.65. Let X(t) be defined by
X(t) = A cos ωt + B sin ωt,
where A and B are iid random variables.
(a) Under what conditions is X(t) wide-sense stationary?
(b) Show that X(t) is not stationary. Hint: Consider E[X³(t)].
9.66. Consider the following moving average process:
Yn = (1/2)(Xn + X_{n-1}),  X0 = 0.
(a) Is Yn a stationary random process if Xn is an iid integer-valued process?
(b) Is Yn a stationary random process if Xn is a stationary process?
(c) Are Yn and Xn jointly stationary random processes if Xn is an iid process? a stationary process?
9.67. Let Xn be a zero-mean iid process, and let Zn be the autoregressive random process
Zn = (3/4)Z_{n-1} + Xn,  Z0 = 0.
(a) Find the autocovariance of Zn and determine whether Zn is wide-sense stationary.
Hint: Express Zn in terms of Xn, X_{n-1}, …, X1.
(b) Does Zn eventually settle down into stationary behavior?
(c) Find the pdf of Zn if Xn is an iid sequence of zero-mean, unit-variance Gaussian
random variables. What is the pdf of Zn as n → ∞?
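For part b, a quick Monte Carlo estimate (a sketch; the function name and sample sizes are my own choices) shows VAR[Zn] approaching the stationary value 1/(1 − (3/4)²) = 16/7:

```python
import random

def ar_variance_estimate(a=0.75, n=2000, trials=1000, seed=3):
    """Monte Carlo estimate of VAR[Z_n] for Z_n = a*Z_{n-1} + X_n, Z_0 = 0,
    with zero-mean, unit-variance Gaussian X_n."""
    rng = random.Random(seed)
    finals = []
    for _ in range(trials):
        z = 0.0
        for _ in range(n):
            z = a * z + rng.gauss(0.0, 1.0)
        finals.append(z)
    m = sum(finals) / trials
    return sum((z - m) ** 2 for z in finals) / trials

var = ar_variance_estimate()
# stationary value: 1/(1 - (3/4)^2) = 16/7 ≈ 2.286
```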
9.68. Let Y(t) = X(t + s) − bX(t), where X(t) is a wide-sense stationary random process.
(a) Determine whether Y(t) is also a wide-sense stationary random process.
(b) Find the cross-covariance function of Y(t) and X(t). Are the processes jointly
wide-sense stationary?
566
Chapter 9
Random Processes
(c) Find the pdf of Y(t) if X(t) is a Gaussian random process.
(d) Find the joint pdf of Y(t1) and Y(t2) in part c.
(e) Find the joint pdf of Y(t1) and X(t2) in part c.
9.69. Let X(t) and Y(t) be independent, wide-sense stationary random processes with zero
means and the same covariance function C_X(τ). Let Z(t) be defined by
Z(t) = 3X(t) − 5Y(t).
(a) Determine whether Z(t) is also wide-sense stationary.
(b) Determine the pdf of Z(t) if X(t) and Y(t) are also jointly Gaussian zero-mean
random processes with C_X(τ) = 4e^{−|τ|}.
(c) Find the joint pdf of Z(t1) and Z(t2) in part b.
(d) Find the cross-covariance between Z(t) and X(t). Are Z(t) and X(t) jointly
stationary random processes?
(e) Find the joint pdf of Z(t1) and X(t2) in part b. Hint: Use auxiliary variables.
9.70. Let X(t) and Y(t) be independent, wide-sense stationary random processes with zero
means and the same covariance function C_X(τ). Let Z(t) be defined by
Z(t) = X(t) cos ωt + Y(t) sin ωt.
(a) Determine whether Z(t) is a wide-sense stationary random process.
(b) Determine the pdf of Z(t) if X(t) and Y(t) are also jointly Gaussian zero-mean
random processes with C_X(τ) = 4e^{−|τ|}.
(c) Find the joint pdf of Z(t1) and Z(t2) in part b.
(d) Find the cross-covariance between Z(t) and X(t). Are Z(t) and X(t) jointly
stationary random processes?
(e) Find the joint pdf of Z(t1) and X(t2) in part b.
9.71. Let X(t) be a zero-mean, wide-sense stationary Gaussian random process with
autocorrelation function R_X(τ). The output of a “square law detector” is
Y(t) = X(t)².
Show that R_Y(τ) = R_X(0)² + 2R_X²(τ). Hint: For zero-mean, jointly Gaussian random
variables, E[X²Z²] = E[X²]E[Z²] + 2E[XZ]².
9.72. A WSS process X(t) has mean 1 and autocorrelation function R_X(τ) given in Fig. P9.3.
[FIGURE P9.3: plot of R_X(τ) versus τ; not reproduced here.]
(a) Find the mean component of R_X(τ).
(b) Find the periodic component of R_X(τ).
(c) Find the remaining component of R_X(τ).
9.73. Let Xn and Yn be independent random processes. A multiplexer combines these two
sequences into a combined sequence Uk, that is,
U_{2n} = Xn,  U_{2n+1} = Yn.
(a) Suppose that Xn and Yn are independent Bernoulli random processes. Under
what conditions is Uk a stationary random process? a cyclostationary random
process?
(b) Repeat part a if Xn and Yn are independent stationary random processes.
(c) Suppose that Xn and Yn are wide-sense stationary random processes. Is Uk a
wide-sense stationary random process? a wide-sense cyclostationary random process?
Find the mean and autocovariance functions of Uk.
(d) If Uk is wide-sense cyclostationary, find the mean and correlation function of the
randomly phase-shifted version of Uk as defined by Eq. (9.72).
9.74. A ternary information source produces an iid, equiprobable sequence of symbols from
the alphabet {a, b, c}. Suppose that these three symbols are encoded into the respective
binary codewords 00, 01, 10. Let Bn be the sequence of binary symbols that result from
encoding the ternary symbols.
(a) Find the joint pmf of Bn and B_{n+1} for n even; n odd. Is Bn stationary? cyclostationary?
(b) Find the mean and covariance functions of Bn. Is Bn wide-sense stationary?
wide-sense cyclostationary?
(c) If Bn is cyclostationary, find the joint pmf, mean, and autocorrelation functions of the
randomly phase-shifted version of Bn as defined by Eq. (9.72).
9.75. Let s(t) be a periodic square wave with period T = 1 which is equal to 1 for the first half
of each period and −1 for the remainder of the period. Let X(t) = As(t), where A is a
random variable.
(a) Find the mean and autocovariance functions of X(t).
(b) Is X(t) a mean-square periodic process?
(c) Find the mean and autocovariance of Xs(t), the randomly phase-shifted version of
X(t) given by Eq. (9.72).
9.76. Let X(t) = As(t) and Y(t) = Bs(t), where A and B are independent random variables
that assume the values +1 and −1 with equal probabilities, and s(t) is the periodic square
wave in Problem 9.75.
(a) Find the joint pmf of X(t1) and Y(t2).
(b) Find the cross-covariance of X(t1) and Y(t2).
(c) Are X(t) and Y(t) jointly wide-sense cyclostationary? Jointly cyclostationary?
9.77. Let X(t) be a mean square periodic random process. Is X(t) a wide-sense cyclostationary
process?
9.78. Is the pulse amplitude modulation random process in Example 9.38 cyclostationary?
9.79. Let X(t) be the random amplitude sinusoid in Example 9.37. Find the mean and
autocorrelation functions of the randomly phase-shifted version of X(t) given by Eq. (9.72).
9.80. Complete the proof that if X(t) is a cyclostationary random process, then Xs(t), defined
by Eq. (9.72), is a stationary random process.
9.81. Show that if X(t) is a wide-sense cyclostationary random process, then Xs(t), defined by
Eq. (9.72), is a wide-sense stationary random process with mean and autocorrelation
functions given by Eqs. (9.74a) and (9.74b).
Section 9.7: Continuity, Derivatives, and Integrals of Random Processes
9.82. Let the random process X(t) = u(t − S) be a unit step function delayed by an exponential random variable S; that is, X(t) = 1 for t ≥ S, and X(t) = 0 for t < S.
(a) Find the autocorrelation function of X(t).
(b) Is X(t) mean square continuous?
(c) Does X(t) have a mean square derivative? If so, find its mean and autocorrelation
functions.
(d) Does X(t) have a mean square integral? If so, find its mean and autocovariance
functions.
9.83. Let X(t) be the random telegraph signal introduced in Example 9.24.
(a) Is X(t) mean square continuous?
(b) Show that X(t) does not have a mean square derivative, and show that the second
mixed partial derivative of its autocorrelation function has a delta function. What
gives rise to this delta function?
(c) Does X(t) have a mean square integral? If so, find its mean and autocovariance
functions.
9.84. Let X(t) have autocorrelation function
R_X(τ) = σ² e^{−ατ²}.
(a) Is X(t) mean square continuous?
(b) Does X(t) have a mean square derivative? If so, find its mean and autocorrelation
functions.
(c) Does X(t) have a mean square integral? If so, find its mean and autocorrelation
functions.
(d) Is X(t) a Gaussian random process?
9.85. Let N(t) be the Poisson process. Find E[(N(t) − N(t0))²] and use the result to show that
N(t) is mean square continuous.
9.86. Does the pulse amplitude modulation random process discussed in Example 9.38 have a
mean square integral? If so, find its mean and autocovariance functions.
9.87. Show that if X(t) is a mean square continuous random process, then X(t) has a mean
square integral. Hint: Show that
R_X(t1, t2) − R_X(t0, t0) = E[(X(t1) − X(t0))X(t2)] + E[X(t0)(X(t2) − X(t0))],
and then apply the Schwarz inequality to the two terms on the right-hand side.
9.88. Let Y(t) be the mean square integral of X(t) in the interval (0, t). Show that Y′(t) is equal
to X(t) in the mean square sense.
9.89. Let X(t) be a wide-sense stationary random process. Show that E[X(t)X′(t)] = 0.
9.90. A linear system with input Z(t) is described by
X′(t) + αX(t) = Z(t),  t ≥ 0, X(0) = 0.
Find the output X(t) if the input is a zero-mean Gaussian random process with
autocorrelation function given by
R_Z(τ) = σ² e^{−β|τ|}.
Section 9.8: Time Averages of Random Processes and Ergodic Theorems
9.91. Find the variance of the time average given in Example 9.47.
9.92. Are the following processes WSS and mean ergodic?
(a) Discrete-time dice process in Problem 9.2.
(b) Alternating sign process in Problem 9.3.
(c) Xn = s^n, for n ≥ 0, in Problem 9.4.
9.93. Is the following WSS random process X(t) mean ergodic?
R_X(τ) = 5(1 − |τ|) for |τ| ≤ 1, and R_X(τ) = 0 for |τ| > 1.
9.94. Let X(t) = A cos(2πft), where A is a random variable with mean m and variance σ².
(a) Evaluate ⟨X(t)⟩_T, find its limit as T → ∞, and compare to m_X(t).
(b) Evaluate ⟨X(t + τ)X(t)⟩_T, find its limit as T → ∞, and compare to R_X(t + τ, t).
9.95. Repeat Problem 9.94 with X(t) = A cos(2πft + Θ), where A is as in Problem 9.94, Θ is
a random variable uniformly distributed in (0, 2π), and A and Θ are independent random
variables.
9.96. Find an exact expression for VAR[⟨X(t)⟩_T] in Example 9.48. Find the limit as T → ∞.
9.97. The WSS random process Xn has mean m and autocovariance C_X(k) = (1/2)^{|k|}. Is Xn
mean ergodic?
9.98. (a) Are the moving average processes Yn in Problem 9.24 mean ergodic?
(b) Are the autoregressive processes Zn in Problem 9.25a mean ergodic?
9.99. (a) Show that a WSS random process is mean ergodic if
∫_{−∞}^{∞} |C(u)| du < ∞.
(b) Show that a discrete-time WSS random process is mean ergodic if
Σ_{k=−∞}^{∞} |C(k)| < ∞.
9.100. Let ⟨X²(t)⟩_T denote a time-average estimate for the mean power of a WSS random
process.
(a) Under what conditions is this time average a valid estimate for E[X²(t)]?
(b) Apply your result in part a to the random phase sinusoid in Example 9.2.
9.101. (a) Under what conditions is the time average ⟨X(t + τ)X(t)⟩_T a valid estimate for
the autocorrelation R_X(τ) of a WSS random process X(t)?
(b) Apply your result in part a to the random phase sinusoid in Example 9.2.
9.102. Let Y(t) be the indicator function for the event {a < X(t) ≤ b}, that is,
Y(t) = 1 if X(t) ∈ (a, b], and Y(t) = 0 otherwise.
(a) Show that ⟨Y(t)⟩_T is the proportion of time in the time interval (−T, T) that
X(t) ∈ (a, b].
(b) Find E[⟨Y(t)⟩_T].
(c) Under what conditions does ⟨Y(t)⟩_T → P[a < X(t) ≤ b]?
(d) How can ⟨Y(t)⟩_T be used to estimate P[X(t) ≤ x]?
(e) Apply the result in part d to the random telegraph signal.
9.103. (a) Repeat Problem 9.102 for the time average of the discrete-time process Yn, which is
defined as the indicator for the event {a < Xn ≤ b}.
(b) Apply your result in part a to an iid discrete-valued random process.
(c) Apply your result in part a to an iid continuous-valued random process.
9.104. For n ≥ 1, define Zn = u(a − Xn), where u(x) is the unit step function, that is, Zn = 1 if
and only if Xn ≤ a.
(a) Show that the time average ⟨Zn⟩_N is the proportion of Xn’s that do not exceed a in
the first N samples.
(b) Show that if the process is ergodic (in some sense), then this time average is equal to
F_X(a) = P[X ≤ a].
9.105. In Example 9.50 show that VAR[⟨Xn⟩_T] = σ²(2T + 1)^{2H−2}.
9.106. Plot the covariance function vs. k for the self-similar process in Example 9.50 with σ² = 1
for: H = 0.5, H = 0.6, H = 0.75, H = 0.99. Does the long-range dependence of the
process increase or decrease with H?
9.107. (a) Plot the variance of the sample mean given by Eq. (9.110) vs. T with σ² = 1 for:
H = 0.5, H = 0.6, H = 0.75, H = 0.99.
(b) For the parameters in part a, plot (2T + 1)^{2H−1} vs. T, which is the ratio of the
variance of the sample mean of a long-range dependent process relative to the variance
of the sample mean of an iid process. How does the long-range dependence manifest
itself, especially for H approaching 1?
(c) Comment on the width of confidence intervals for estimates of the mean of
long-range dependent processes relative to those of iid processes.
9.108. Plot the variance of the sample mean for a long-range dependent process (Eq. 9.110) vs.
the sample size T in a log-log plot.
(a) What role does H play in the plot?
(b) One of the remarkable indicators of long-range dependence in nature comes from a
set of observations of the minimal water levels in the Nile river for the years
622–1281 [Beran, p. 22], where the log-log plot for part a gives a slope of −0.27. What
value of H corresponds to this slope?
9.109. Problem 9.99b gives a sufficient condition for mean ergodicity of discrete-time random
processes. Use the expression in Eq. (9.112) for a long-range dependent process to
determine whether the sufficient condition is satisfied. Comment on your findings.
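For the long-range dependence problems above, the sample-mean variance scaling can be tabulated directly; on a log-log plot the slope is 2H − 2, so the Nile slope of −0.27 corresponds to H = 1 − 0.27/2 ≈ 0.87. A minimal sketch (the function name is my own, and the formula assumed is the σ²(2T + 1)^{2H−2} scaling of Eq. (9.110)):

```python
def sample_mean_var(T, H, sigma2=1.0):
    """Variance of the sample mean over 2T+1 samples under the
    long-range-dependent scaling: sigma^2 * (2T+1)^(2H-2)."""
    return sigma2 * (2 * T + 1) ** (2 * H - 2)

# slope on a log-log plot is 2H - 2, so a measured slope of -0.27 gives:
H_nile = 1 - 0.27 / 2
for H in (0.5, 0.6, 0.75, 0.99):
    print(H, [sample_mean_var(T, H) for T in (10, 100, 1000)])
```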
*Section 9.9: Fourier Series and Karhunen-Loeve Expansion
9.110. Let X(t) = Xe^{jωt}, where X is a random variable.
(a) Find the correlation function for X(t), which for complex-valued random processes
is defined by R_X(t1, t2) = E[X(t1)X*(t2)], where * denotes the complex conjugate.
(b) Under what conditions is X(t) a wide-sense stationary random process?
9.111. Consider the sum of two complex exponentials with random coefficients:
X(t) = X1 e^{jω1 t} + X2 e^{jω2 t},  where ω1 ≠ ω2.
(a) Find the covariance function of X(t).
(b) Find conditions on the complex-valued random variables X1 and X2 for X(t) to be
a wide-sense stationary random process.
(c) Show that if we let ω1 = −ω2, X1 = (U − jV)/2, and X2 = (U + jV)/2, where U
and V are real-valued random variables, then X(t) is a real-valued random process.
Find an expression for X(t) and for the autocorrelation function.
(d) Restate the conditions on X1 and X2 from part b in terms of U and V.
(e) Suppose that in part c, U and V are jointly Gaussian random variables. Show that
X(t) is a Gaussian random process.
9.112. (a) Derive Eq. (9.118) for the correlation of the Fourier coefficients of a process X(t)
that is not mean square periodic.
(b) Show that Eq. (9.118) reduces to Eq. (9.117) when X(t) is WSS and mean square
periodic.
9.113. Let X(t) be a WSS Gaussian random process with R_X(τ) = e^{−|τ|}.
(a) Find the Fourier series expansion for X(t) in the interval [0, T].
(b) What is the distribution of the coefficients in the Fourier series?
9.114. Show that the Karhunen-Loeve expansion of a WSS mean-square periodic process X(t)
yields its Fourier series. Specify the orthonormal set of eigenfunctions and the
corresponding eigenvalues.
9.115. Let X(t) be the white Gaussian noise process introduced in Example 9.43. Show that any
set of orthonormal functions can be used as the eigenfunctions for X(t) in its
Karhunen-Loeve expansion. What are the eigenvalues?
9.116. Let Y(t) = X(t) + W(t), where X(t) and W(t) are orthogonal random processes and
W(t) is a white Gaussian noise process. Let φn(t) be the eigenfunctions corresponding to
K_X(t1, t2). Show that the φn(t) are also the eigenfunctions for K_Y(t1, t2). What is the
relation between the eigenvalues of K_X(t1, t2) and those of K_Y(t1, t2)?
9.117. Let X(t) be a zero-mean random process with autocovariance
R_X(τ) = σ² e^{−α|τ|}.
(a) Write the eigenvalue integral equation for the Karhunen-Loeve expansion of X(t)
on the interval [−T, T].
(b) Differentiate the above integral equation to obtain the differential equation
d²φ(t)/dt² = α² ((λ − 2σ²/α)/λ) φ(t).
(c) Show that the solutions to the above differential equation are of the form
φ(t) = A cos bt and φ(t) = B sin bt. Find an expression for b.
(d) Substitute the φ(t) from part c into the integral equation of part a to show that if
φ(t) = A cos bt, then b is a root of tan bT = α/b, and if φ(t) = B sin bt, then b is
a root of tan bT = −b/α.
(e) Find the values of A and B that normalize the eigenfunctions.
*(f) To show that the frequencies of the eigenfunctions are not harmonically related,
plot the following three functions versus bT: tan bT, αT/(bT), and −bT/(αT). The
intersections of these functions yield the eigenvalues. Note that there are two roots
per interval of length π.
*Section 9.10: Generating Random Processes
9.118. (a) Generate 10 realizations of the binomial counting process with p = 1/4, p = 1/2,
and p = 3/4. For each value of p, plot the sample functions for n = 200 trials.
(b) Generate 50 realizations of the binomial counting process with p = 1/2. Find the
sample mean and sample variance of the realizations for the first 200 trials.
(c) In part b, find the histogram of increments in the process for the interval [1, 50],
[51, 100], [101, 150], and [151, 200]. Compare these histograms to the theoretical
pmf. How would you check to see if the increments in the four intervals are
stationary?
(d) Plot a scattergram of the pairs consisting of the increments in the interval [1, 50] and
[51, 100] in a given realization. Devise a test to check whether the increments in the
two intervals are independent random variables.
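A minimal simulation sketch for part a (function name and seed are my own choices):

```python
import random

def binomial_counting_paths(p, n_trials=200, n_paths=10, seed=4):
    """Realizations of the binomial counting process S_n = X_1 + ... + X_n
    with X_i iid Bernoulli(p)."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        s, path = 0, []
        for _ in range(n_trials):
            s += 1 if rng.random() < p else 0
            path.append(s)
        paths.append(path)
    return paths

paths = binomial_counting_paths(0.5)
# each path is nondecreasing, and S_200 has mean n*p = 100
```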
9.119. Repeat Problem 9.118 for the random walk process with the same parameters.
9.120. Repeat Problem 9.118 for the sum process in Eq. (9.24) where the Xn are iid unit-variance
Gaussian random variables with mean: m = 0; m = 0.5.
9.121. Repeat Problem 9.118 for the sum process in Eq. (9.24) where the Xn are iid Poisson random variables with a = 1.
9.122. Repeat Problem 9.118 for the sum process in Eq. (9.24) where the Xn are iid Cauchy random variables with a = 1.
9.123. Let Yn = aY_{n-1} + Xn, where Y0 = 0.
(a) Generate five realizations of the process for a = 1/4, 1/2, 9/10 and with Xn given by
the p = 1/2 and p = 1/4 random step process. Plot the sample functions for the first
200 steps. Find the sample mean and sample variance for the outcomes in each realization. Plot the histogram for outcomes in each realization.
(b) Generate 50 realizations of the process Yn with a = 1/2, p = 1/4, and p = 1/2. Find
the sample mean and sample variance of the realizations for the first 200 trials. Find
the histogram of Yn across the realizations at times n = 5, n = 50, and n = 200.
(c) In part b, find the histogram of increments in the process for the interval [1, 50], [51,
100], [101, 150], and [151, 200]. To what theoretical pmf should these histograms be
compared? Should the increments in the process be stationary? Should the increments be independent?
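A sketch for one realization in part a (the function name is mine; the driving noise is the ±1 random step process from the text):

```python
import random

def ar1_realization(a, p, n=200, seed=5):
    """One realization of Y_n = a*Y_{n-1} + X_n, Y_0 = 0, driven by the
    random step process X_n = +1 w.p. p and -1 w.p. 1-p."""
    rng = random.Random(seed)
    y, path = 0.0, []
    for _ in range(n):
        x = 1.0 if rng.random() < p else -1.0
        y = a * y + x
        path.append(y)
    return path

path = ar1_realization(a=0.5, p=0.5)
# for |a| < 1, |Y_n| is bounded by 1/(1-a) and the process forgets Y_0
```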
9.124. Repeat Problem 9.123 for the sum process in Eq. (9.24) where the Xn are iid unit-variance
Gaussian random variables with mean: m = 0; m = 0.5.
9.125. (a) Propose a method for estimating the covariance function of the sum process in
Problem 9.118. Do not assume that the process is wide-sense stationary.
(b) How would you check to see if the process is wide-sense stationary?
(c) Apply the methods in parts a and b to the experiment in Problem 9.118b.
(d) Repeat part c for Problem 9.123b.
9.126. Use the binomial process to approximate a Poisson random process with arrival rate
l = 1 customer per second in the time interval (0, 100]. Try different values of n and
come up with a recommendation on how n should be selected.
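One possible sketch (parameter choices are mine): slice (0, 100] into n slots and make each slot an arrival with probability λ·(100/n). The approximation is good when λ·Δt is much less than 1, so that two arrivals rarely share a slot:

```python
import random

def binomial_poisson_approx(lam, t_end, n, seed=6):
    """Approximate a Poisson process of rate lam on (0, t_end] by n Bernoulli
    slots of width t_end/n, each a success with probability lam*t_end/n."""
    rng = random.Random(seed)
    dt = t_end / n
    return [(k + 1) * dt for k in range(n) if rng.random() < lam * dt]

arrivals = binomial_poisson_approx(lam=1.0, t_end=100.0, n=10000)
# expected number of arrivals: lam * t_end = 100
```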
9.127. Generate 100 repetitions of the experiment in Example 9.21.
(a) Find the relative frequency of the event {N(10) = 3 and N(60) − N(45) = 2}
and compare it to the theoretical probability.
(b) Find the histogram of the time that elapses until the second arrival and compare it to
the theoretical pdf. Plot the empirical cdf and compare it to the theoretical cdf.
9.128. Generate 100 realizations of the Poisson random process N(t) with arrival rate λ = 1
customer per second in the time interval (0, 10]. Generate the pair (N1(t), N2(t)) by
assigning arrivals in N(t) to N1(t) with probability p = 0.25 and to N2(t) with probability
0.75.
(a) Find the histograms for N1(10) and N2(10) and compare them to the theoretical pmf
by performing a chi-square goodness-of-fit test at a 5% significance level.
(b) Perform a chi-square goodness-of-fit test to test whether N1(10) and N2(10) are
independent random variables. How would you check whether N1(t) and N2(t) are
independent random processes?
9.129. Subscribers log on to a system according to a Poisson process with arrival rate λ = 1
customer per second. The ith customer remains logged on for a random duration of Ti
seconds, where the Ti are iid random variables and are also independent of the arrival times.
(a) Generate the sequence Sn of customer arrival times and the corresponding
departure times given by Dn = Sn + Tn, where the connection times are all equal
to 1.
(b) Plot: A(t), the number of arrivals up to time t; D(t), the number of departures up to
time t; and N1t2 = A1t2 - D1t2, the number in the system at time t.
(c) Perform 100 simulations of the system operation for a duration of 200 seconds. Assume that customer connection times are exponential random variables with mean
5 seconds. Find the customer departure time instants and the associated departure
counting process D(t). How would you check whether D(t) is a Poisson process? Find
the histograms for D(t) and the number in the system N(t) at t = 50, 100, 150, 200. Try
to fit a pmf to each histogram.
(d) Repeat part c if customer connection times are exactly 5 seconds long.
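A sketch of the machinery for parts a and b (names are mine; holding times here are exponential with mean 5 s, as in part c):

```python
import random

def simulate_logons(lam, mean_hold, horizon, seed=7):
    """Arrival times S_n of a Poisson process of rate lam on (0, horizon],
    and departures D_n = S_n + T_n with iid exponential holding times T_n."""
    rng = random.Random(seed)
    t, arrivals, departures = 0.0, [], []
    while True:
        t += rng.expovariate(lam)
        if t > horizon:
            break
        arrivals.append(t)
        departures.append(t + rng.expovariate(1.0 / mean_hold))
    return arrivals, departures

arr, dep = simulate_logons(lam=1.0, mean_hold=5.0, horizon=200.0)
# N(t) = #{arrivals <= t} - #{departures <= t} is the number in the system at t
```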
9.130. Generate 100 realizations of the Wiener process with a = 1 for the interval (0, 3.5) using
the random walk limiting procedure.
(a) Find the histograms for increments in the intervals (0, 0.5], (0.5, 1.5], and (1.5, 3.5]
and compare these to the theoretical pdf.
(b) Perform a test at a 5% significance level to determine whether the increments in the
first two intervals are independent random variables.
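A sketch of the random-walk limiting procedure (names are mine): take steps of ±√(αΔt) every Δt seconds, so that the increment variance over an interval of length τ is approximately ατ:

```python
import math, random

def wiener_random_walk(alpha, t_end, n_steps, seed=8):
    """Random-walk approximation of a Wiener process with parameter alpha:
    steps of +/- sqrt(alpha*dt) every dt = t_end/n_steps seconds."""
    rng = random.Random(seed)
    dt = t_end / n_steps
    step = math.sqrt(alpha * dt)
    x, path = 0.0, [0.0]
    for _ in range(n_steps):
        x += step if rng.random() < 0.5 else -step
        path.append(x)
    return path

path = wiener_random_walk(alpha=1.0, t_end=3.5, n_steps=3500)
# increments over disjoint intervals are approximately N(0, alpha * length)
```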
9.131. Repeat Problem 9.130 using Gaussian-distributed increments to generate the Wiener
process. Discuss how the increment interval in the simulation should be selected.
Problems Requiring Cumulative Knowledge
9.132. Let X(t) be a random process with independent increments. Assume that the increments
X(t2) − X(t1) are gamma random variables with parameters λ > 0 and α = t2 − t1.
(a) Find the joint density function of X(t1) and X(t2).
(b) Find the autocorrelation function of X(t).
(c) Is X(t) mean square continuous?
(d) Does X(t) have a mean square derivative?
9.133. Let X(t) be the pulse amplitude modulation process introduced in Example 9.38 with
T = 1. A phase-modulated process is defined by
Y(t) = a cos(2πt + (π/2)X(t)).
(a) Plot the sample function of Y(t) corresponding to the binary sequence 0010110.
(b) Find the joint pdf of Y(t1) and Y(t2).
(c) Find the mean and autocorrelation functions of Y(t).
(d) Is Y(t) a stationary, wide-sense stationary, or cyclostationary random process?
(e) Is Y(t) mean square continuous?
(f) Does Y(t) have a mean square derivative? If so, find its mean and autocorrelation
functions.
9.134. Let N(t) be the Poisson process, and suppose we form the phase-modulated process
Y(t) = a cos(2πft + πN(t)).
(a) Plot a sample function of Y(t) corresponding to a typical sample function of N(t).
(b) Find the joint density function of $Y(t_1)$ and $Y(t_2)$. Hint: Use the independent increments property of N(t).
(c) Find the mean and autocorrelation functions of Y(t).
(d) Is Y(t) a stationary, wide-sense stationary, or cyclostationary random process?
(e) Is Y(t) mean square continuous?
(f) Does Y(t) have a mean square derivative? If so, find its mean and autocorrelation
functions.
9.135. Let X(t) be a train of amplitude-modulated pulses with occurrences according to a Poisson process:
$$X(t) = \sum_{k=1}^{\infty} A_k\, h(t - S_k),$$
where the $A_k$ are iid random variables, the $S_k$ are the event occurrence times in a Poisson
process, and h(t) is a function of time. Assume the amplitudes and occurrence times are
independent.
(a) Find the mean and autocorrelation functions of X(t).
(b) Evaluate part a when $h(t) = u(t)$, a unit step function.
(c) Evaluate part a when $h(t) = p(t)$, a rectangular pulse of duration T seconds.
9.136. Consider a linear combination of two sinusoids:
$$X(t) = A_1 \cos(\omega_0 t + \Theta_1) + A_2 \cos(\sqrt{2}\,\omega_0 t + \Theta_2),$$
where $\Theta_1$ and $\Theta_2$ are independent uniform random variables in the interval $(0, 2\pi)$, and
$A_1$ and $A_2$ are jointly Gaussian random variables. Assume that the amplitudes are independent of the phase random variables.
(a) Find the mean and autocorrelation functions of X(t).
(b) Is X(t) mean square periodic? If so, what is the period?
(c) Find the joint pdf of $X(t_1)$ and $X(t_2)$.
9.137. (a) A Gauss-Markov random process is a Gaussian random process that is also a Markov
process. Show that the autocovariance function of such a process must satisfy
$$C_X(t_3, t_1) = \frac{C_X(t_3, t_2)\, C_X(t_2, t_1)}{C_X(t_2, t_2)},$$
where $t_1 \le t_2 \le t_3$.
(b) It can be shown that if the autocovariance of a Gaussian random process satisfies
the above equation, then the process is Gauss-Markov. Is the Wiener process Gauss-Markov? Is the Ornstein-Uhlenbeck process Gauss-Markov?
9.138. Let $A_n$ and $B_n$ be two independent stationary random processes. Suppose that $A_n$ and $B_n$
are zero-mean, Gaussian random processes with autocorrelation functions
$$R_A(k) = \sigma_1^2\, \rho_1^{|k|} \qquad R_B(k) = \sigma_2^2\, \rho_2^{|k|}.$$
A block multiplexer takes blocks of two from the above processes and interleaves them
to form the random process $Y_m$:
$$A_1 A_2 B_1 B_2 A_3 A_4 B_3 B_4 A_5 A_6 B_5 B_6 \ldots.$$
(a) Find the autocorrelation function of $Y_m$.
(b) Is $Y_m$ cyclostationary? wide-sense stationary?
(c) Find the joint pdf of $Y_m$ and $Y_{m+1}$.
(d) Let $Z_m = Y_{m+T}$, where T is selected uniformly from the set $\{0, 1, 2, 3\}$. Repeat
parts a, b, and c for $Z_m$.
9.139. Let $A_n$ be the Gaussian random process in Problem 9.138. A decimator takes every other
sample to form the random process $V_m$:
$$A_1 A_3 A_5 A_7 A_9 A_{11} \ldots.$$
(a) Find the autocorrelation function of $V_m$.
(b) Find the joint pdf of $V_m$ and $V_{m+k}$.
(c) An interpolator takes the sequence $V_m$ and inserts zeros between samples to form
the sequence $W_k$:
$$A_1\, 0\, A_3\, 0\, A_5\, 0\, A_7\, 0\, A_9\, 0\, A_{11} \ldots.$$
Find the autocorrelation function of $W_k$. Is $W_k$ a Gaussian random process?
9.140. Let $A_n$ be a sequence of zero-mean, unit-variance independent Gaussian random variables.
A block coder takes pairs of A's and linearly transforms them to form the sequence $Y_n$:
$$\begin{bmatrix} Y_{2n} \\ Y_{2n+1} \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} A_{2n} \\ A_{2n+1} \end{bmatrix}.$$
(a) Find the autocorrelation function of $Y_n$.
(b) Is $Y_n$ stationary in any sense?
(c) Find the joint pdf of $Y_n$, $Y_{n+1}$, and $Y_{n+2}$.
9.141. Suppose customer orders arrive according to a Bernoulli random process with parameter p.
When an order arrives, its size is an exponential random variable with parameter $\lambda$. Let $S_n$
be the total size of all orders up to time n.
(a) Find the mean and autocorrelation functions of $S_n$.
(b) Is $S_n$ a stationary random process?
(c) Is $S_n$ a Markov process?
(d) Find the joint pdf of $S_n$ and $S_{n+k}$.
CHAPTER 10
Analysis and Processing of Random Signals
In this chapter we introduce methods for analyzing and processing random signals. We
cover the following topics:
• Section 10.1 introduces the notion of power spectral density, which allows us to
view random processes in the frequency domain.
• Section 10.2 discusses the response of linear systems to random process inputs
and introduces methods for filtering random processes.
• Section 10.3 considers two important applications of signal processing: sampling
and modulation.
• Sections 10.4 and 10.5 discuss the design of optimum linear systems and introduce the Wiener and Kalman filters.
• Section 10.6 addresses the problem of estimating the power spectral density of a
random process.
• Finally, Section 10.7 introduces methods for implementing and simulating the
processing of random signals.
10.1
POWER SPECTRAL DENSITY
The Fourier series and the Fourier transform allow us to view deterministic time functions as the weighted sum or integral of sinusoidal functions. A time function that
varies slowly has the weighting concentrated at the low-frequency sinusoidal components. A time function that varies rapidly has the weighting concentrated at higher-frequency components. Thus the rate at which a deterministic time function varies is
related to the weighting function of the Fourier series or transform. This weighting
function is called the “spectrum” of the time function.
The notion of a time function as being composed of sinusoidal components is also
very useful for random processes. However, since a sample function of a random
process can be viewed as being selected from an ensemble of allowable time functions,
the weighting function or “spectrum” for a random process must refer in some way to
the average rate of change of the ensemble of allowable time functions. Equation
(9.66) shows that, for wide-sense stationary processes, the autocorrelation function
577
578
Chapter 10
Analysis and Processing of Random Signals
$R_X(\tau)$ is an appropriate measure for the average rate of change of a random process.
Indeed if a random process changes slowly with time, then it remains correlated with itself for a long period of time, and $R_X(\tau)$ decreases slowly as a function of $\tau$. On the
other hand, a rapidly varying random process quickly becomes uncorrelated with itself,
and $R_X(\tau)$ decreases rapidly with $\tau$.
We now present the Einstein-Wiener-Khinchin theorem, which states that the
power spectral density of a wide-sense stationary random process is given by the Fourier transform of the autocorrelation function.$^1$
10.1.1 Continuous-Time Random Processes
Let X(t) be a continuous-time WSS random process with mean $m_X$ and autocorrelation function $R_X(\tau)$. Suppose we take the Fourier transform of a sample of X(t) in the
interval $0 < t < T$ as follows:
$$\tilde{x}(f) = \int_0^T X(t')\, e^{-j2\pi f t'}\, dt'. \tag{10.1}$$
We then approximate the power density as a function of frequency by the function:
$$\tilde{p}_T(f) = \frac{1}{T} |\tilde{x}(f)|^2 = \frac{1}{T}\,\tilde{x}(f)\,\tilde{x}^*(f) = \frac{1}{T}\left\{\int_0^T X(t')\,e^{-j2\pi f t'}\,dt'\right\}\left\{\int_0^T X(t')\,e^{j2\pi f t'}\,dt'\right\}, \tag{10.2}$$
where * denotes the complex conjugate. X(t) is a random process, so $\tilde{p}_T(f)$ is also a
random process but over a different index set. $\tilde{p}_T(f)$ is called the periodogram estimate, and we are interested in the power spectral density of X(t), which is defined by:
$$S_X(f) = \lim_{T\to\infty} E[\tilde{p}_T(f)] = \lim_{T\to\infty} \frac{1}{T}\, E[|\tilde{x}(f)|^2]. \tag{10.3}$$
We show at the end of this section that the power spectral density of X(t) is given by the
Fourier transform of the autocorrelation function:
$$S_X(f) = \mathcal{F}\{R_X(\tau)\} = \int_{-\infty}^{\infty} R_X(\tau)\, e^{-j2\pi f \tau}\, d\tau. \tag{10.4}$$
A table of Fourier transforms and their properties is given in Appendix B.
For real-valued random processes, the autocorrelation function is an even
function of $\tau$:
$$R_X(\tau) = R_X(-\tau). \tag{10.5}$$
$^1$This result is usually called the Wiener-Khinchin theorem, after Norbert Wiener and A. Ya. Khinchin, who
proved the result in the early 1930s. Later it was discovered that this result was stated by Albert Einstein in a
1914 paper (see Einstein).
Substitution into Eq. (10.4) implies that
$$S_X(f) = \int_{-\infty}^{\infty} R_X(\tau)\,\{\cos 2\pi f\tau - j \sin 2\pi f\tau\}\, d\tau = \int_{-\infty}^{\infty} R_X(\tau) \cos 2\pi f\tau\, d\tau, \tag{10.6}$$
since the integral of the product of an even function ($R_X(\tau)$) and an odd function
($\sin 2\pi f\tau$) is zero. Equation (10.6) implies that $S_X(f)$ is real-valued and an even function of f. From Eq. (10.2) we have that $S_X(f)$ is nonnegative:
$$S_X(f) \ge 0 \quad \text{for all } f. \tag{10.7}$$
The autocorrelation function can be recovered from the power spectral density
by applying the inverse Fourier transform formula to Eq. (10.4):
$$R_X(\tau) = \mathcal{F}^{-1}\{S_X(f)\} = \int_{-\infty}^{\infty} S_X(f)\, e^{j2\pi f\tau}\, df. \tag{10.8}$$
Equation (10.8) is identical to Eq. (4.80), which relates the pdf to its corresponding
characteristic function. The last section in this chapter discusses how the FFT can be
used to perform numerical calculations for $S_X(f)$ and $R_X(\tau)$.
In electrical engineering it is customary to refer to the second moment of X(t) as
the average power of X(t).$^2$ Equation (10.8) together with Eq. (9.64) gives
$$E[X^2(t)] = R_X(0) = \int_{-\infty}^{\infty} S_X(f)\, df. \tag{10.9}$$
Equation (10.9) states that the average power of X(t) is obtained by integrating $S_X(f)$
over all frequencies. This is consistent with the fact that $S_X(f)$ is the "density of power"
of X(t) at the frequency f.
Since the autocorrelation and autocovariance functions are related by $R_X(\tau) = C_X(\tau) + m_X^2$, the power spectral density is also given by
$$S_X(f) = \mathcal{F}\{C_X(\tau) + m_X^2\} = \mathcal{F}\{C_X(\tau)\} + m_X^2\,\delta(f), \tag{10.10}$$
where we have used the fact that the Fourier transform of a constant is a delta function. We say that $m_X$ is the "dc" component of X(t).
The notion of power spectral density can be generalized to two jointly wide-sense
stationary processes. The cross-power spectral density $S_{X,Y}(f)$ is defined by
$$S_{X,Y}(f) = \mathcal{F}\{R_{X,Y}(\tau)\}, \tag{10.11}$$
$^2$If X(t) is a voltage or current developed across a 1-ohm resistor, then $X^2(t)$ is the instantaneous power absorbed by the resistor.
FIGURE 10.1
Power spectral density of a random telegraph signal with $\alpha = 1$ and
$\alpha = 2$ transitions per second.
where $R_{X,Y}(\tau)$ is the cross-correlation between X(t) and Y(t):
$$R_{X,Y}(\tau) = E[X(t + \tau)\, Y(t)]. \tag{10.12}$$
In general, $S_{X,Y}(f)$ is a complex function of f even if X(t) and Y(t) are both real-valued.
Example 10.1 Random Telegraph Signal
Find the power spectral density of the random telegraph signal.
In Example 9.24, the autocorrelation function of the random telegraph process was
found to be
$$R_X(\tau) = e^{-2\alpha|\tau|},$$
where $\alpha$ is the average transition rate of the signal. Therefore, the power spectral density of the
process is
$$S_X(f) = \int_{-\infty}^{0} e^{2\alpha\tau}\, e^{-j2\pi f\tau}\, d\tau + \int_{0}^{\infty} e^{-2\alpha\tau}\, e^{-j2\pi f\tau}\, d\tau = \frac{1}{2\alpha - j2\pi f} + \frac{1}{2\alpha + j2\pi f} = \frac{4\alpha}{4\alpha^2 + 4\pi^2 f^2}. \tag{10.13}$$
Figure 10.1 shows the power spectral density for $\alpha = 1$ and $\alpha = 2$ transitions per second. The
process changes two times more quickly when $\alpha = 2$; it can be seen from the figure that the
power spectral density for $\alpha = 2$ has greater high-frequency content.
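Equation (10.13) can be checked numerically. The following Python sketch (assuming NumPy; not part of the text) compares a direct numerical Fourier transform of $R_X(\tau) = e^{-2\alpha|\tau|}$ against the closed form:

```python
import numpy as np

alpha = 1.0                            # transition rate of the telegraph signal
f = np.array([0.0, 0.25, 0.5, 1.0])    # a few frequencies to check

# Closed form from Eq. (10.13): S_X(f) = 4*alpha / (4*alpha^2 + (2*pi*f)^2)
S_closed = 4 * alpha / (4 * alpha**2 + (2 * np.pi * f) ** 2)

# Direct numerical Fourier transform of R_X(tau) = exp(-2*alpha*|tau|).
# R_X is even, so only the cosine part contributes (cf. Eq. (10.6)).
dtau = 1e-4
tau = np.arange(-20, 20, dtau)         # wide enough that the tail is negligible
R = np.exp(-2 * alpha * np.abs(tau))
S_numeric = np.array([np.sum(R * np.cos(2 * np.pi * fi * tau)) * dtau for fi in f])

print(np.max(np.abs(S_numeric - S_closed)))   # close to zero
```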
Example 10.2 Sinusoid with Random Phase
Let $X(t) = a \cos(2\pi f_0 t + \Theta)$, where $\Theta$ is uniformly distributed in the interval $(0, 2\pi)$. Find
$S_X(f)$.
From Example 9.10, the autocorrelation for X(t) is
$$R_X(\tau) = \frac{a^2}{2} \cos 2\pi f_0 \tau.$$
Thus, the power spectral density is
$$S_X(f) = \frac{a^2}{2}\, \mathcal{F}\{\cos 2\pi f_0 \tau\} = \frac{a^2}{4}\, \delta(f - f_0) + \frac{a^2}{4}\, \delta(f + f_0), \tag{10.14}$$
where we have used the table of Fourier transforms in Appendix B. The signal has average power
$R_X(0) = a^2/2$. All of this power is concentrated at the frequencies $\pm f_0$, so the power density at
these frequencies is infinite.
Example 10.3 White Noise
The power spectral density of a WSS white noise process whose frequency components are limited to the range $-W \le f \le W$ is shown in Fig. 10.2(a). The process is said to be "white" in analogy to white light, which contains all frequencies in equal amounts. The average power in this
FIGURE 10.2
Bandlimited white noise: (a) power spectral density, (b) autocorrelation
function.
process is obtained from Eq. (10.9):
$$E[X^2(t)] = \int_{-W}^{W} \frac{N_0}{2}\, df = N_0 W. \tag{10.15}$$
The autocorrelation for this process is obtained from Eq. (10.8):
$$R_X(\tau) = \frac{N_0}{2} \int_{-W}^{W} e^{j2\pi f\tau}\, df = \frac{N_0}{2}\, \frac{e^{-j2\pi W\tau} - e^{j2\pi W\tau}}{-j2\pi\tau} = \frac{N_0 \sin(2\pi W \tau)}{2\pi \tau}. \tag{10.16}$$
$R_X(\tau)$ is shown in Fig. 10.2(b). Note that X(t) and $X(t + \tau)$ are uncorrelated at $\tau = \pm k/2W$,
$k = 1, 2, \ldots$.
The term white noise usually refers to a random process W(t) whose power spectral density is $N_0/2$ for all frequencies:
$$S_W(f) = \frac{N_0}{2} \quad \text{for all } f. \tag{10.17}$$
Equation (10.15) with $W = \infty$ shows that such a process must have infinite average power. By taking the limit $W \to \infty$ in Eq. (10.16), we find that the autocorrelation of such a process approaches
$$R_W(\tau) = \frac{N_0}{2}\, \delta(\tau). \tag{10.18}$$
If W(t) is a Gaussian random process, we then see that W(t) is the white Gaussian noise process
introduced in Example 9.43 with $\alpha = N_0/2$.
Example 10.4 Sum of Two Processes
Find the power spectral density of $Z(t) = X(t) + Y(t)$, where X(t) and Y(t) are jointly WSS
processes.
The autocorrelation of Z(t) is
$$R_Z(\tau) = E[Z(t + \tau)Z(t)] = E[(X(t + \tau) + Y(t + \tau))(X(t) + Y(t))] = R_X(\tau) + R_{YX}(\tau) + R_{XY}(\tau) + R_Y(\tau).$$
The power spectral density is then
$$S_Z(f) = \mathcal{F}\{R_X(\tau) + R_{YX}(\tau) + R_{XY}(\tau) + R_Y(\tau)\} = S_X(f) + S_{YX}(f) + S_{XY}(f) + S_Y(f). \tag{10.19}$$
Example 10.5
Let $Y(t) = X(t - d)$, where d is a constant delay and where X(t) is WSS. Find $R_{YX}(\tau)$,
$S_{YX}(f)$, $R_Y(\tau)$, and $S_Y(f)$.
The definitions of $R_{YX}(\tau)$, $S_{YX}(f)$, and $R_Y(\tau)$ give
$$R_{YX}(\tau) = E[Y(t + \tau)X(t)] = E[X(t + \tau - d)X(t)] = R_X(\tau - d). \tag{10.20}$$
The time-shifting property of the Fourier transform gives
$$S_{YX}(f) = \mathcal{F}\{R_X(\tau - d)\} = S_X(f)\, e^{-j2\pi f d} = S_X(f)\cos(2\pi f d) - j S_X(f) \sin(2\pi f d). \tag{10.21}$$
Finally,
$$R_Y(\tau) = E[Y(t + \tau)Y(t)] = E[X(t + \tau - d)X(t - d)] = R_X(\tau). \tag{10.22}$$
Equation (10.22) implies that
$$S_Y(f) = \mathcal{F}\{R_Y(\tau)\} = \mathcal{F}\{R_X(\tau)\} = S_X(f). \tag{10.23}$$
Note from Eq. (10.21) that the cross-power spectral density is complex. Note from Eq. (10.23)
that $S_X(f) = S_Y(f)$ despite the fact that $X(t) \ne Y(t)$. Thus, $S_X(f) = S_Y(f)$ does not imply that
$X(t) = Y(t)$.
10.1.2 Discrete-Time Random Processes
Let $X_n$ be a discrete-time WSS random process with mean $m_X$ and autocorrelation
function $R_X(k)$. The power spectral density of $X_n$ is defined as the Fourier transform of
the autocorrelation sequence:
$$S_X(f) = \mathcal{F}\{R_X(k)\} = \sum_{k=-\infty}^{\infty} R_X(k)\, e^{-j2\pi f k}. \tag{10.24}$$
Note that we need only consider frequencies in the range $-1/2 < f \le 1/2$, since $S_X(f)$
is periodic in f with period 1. As in the case of continuous random processes, $S_X(f)$ can
be shown to be a real-valued, nonnegative, even function of f.
The inverse Fourier transform formula applied to Eq. (10.24) implies that$^3$
$$R_X(k) = \int_{-1/2}^{1/2} S_X(f)\, e^{j2\pi f k}\, df. \tag{10.25}$$
Equations (10.24) and (10.25) are similar to the discrete Fourier transform. In the last
section we show how to use the FFT to calculate $S_X(f)$ and $R_X(k)$.
The cross-power spectral density $S_{X,Y}(f)$ of two jointly WSS discrete-time
processes $X_n$ and $Y_n$ is defined by
$$S_{X,Y}(f) = \mathcal{F}\{R_{X,Y}(k)\}, \tag{10.26}$$
where $R_{X,Y}(k)$ is the cross-correlation between $X_n$ and $Y_n$:
$$R_{X,Y}(k) = E[X_{n+k}\, Y_n]. \tag{10.27}$$
$^3$You can view $R_X(k)$ as the coefficients of the Fourier series of the periodic function $S_X(f)$.
Example 10.6 White Noise
Let the process $X_n$ be a sequence of uncorrelated random variables with zero mean and variance
$\sigma_X^2$. Find $S_X(f)$.
The autocorrelation of this process is
$$R_X(k) = \begin{cases} \sigma_X^2 & k = 0 \\ 0 & k \ne 0. \end{cases}$$
The power spectral density of the process is found by substituting $R_X(k)$ into Eq. (10.24):
$$S_X(f) = \sigma_X^2, \qquad -\frac{1}{2} < f < \frac{1}{2}. \tag{10.28}$$
Thus the process $X_n$ contains all possible frequencies in equal measure.
Example 10.7 Moving Average Process
Let the process $Y_n$ be defined by
$$Y_n = X_n + a X_{n-1}, \tag{10.29}$$
where $X_n$ is the white noise process of Example 10.6. Find $S_Y(f)$.
It is easily shown that the mean and autocorrelation of $Y_n$ are given by $E[Y_n] = 0$, and
$$E[Y_n Y_{n+k}] = \begin{cases} (1 + a^2)\,\sigma_X^2 & k = 0 \\ a\,\sigma_X^2 & k = \pm 1 \\ 0 & \text{otherwise.} \end{cases} \tag{10.30}$$
The power spectral density is then
$$S_Y(f) = (1 + a^2)\,\sigma_X^2 + a\,\sigma_X^2\,\{e^{j2\pi f} + e^{-j2\pi f}\} = \sigma_X^2\,\{(1 + a^2) + 2a \cos 2\pi f\}. \tag{10.31}$$
$S_Y(f)$ is shown in Fig. 10.3 for $a = 1$.
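Equation (10.31) can also be read as white noise passed through the filter $H(f) = 1 + a e^{-j2\pi f}$, anticipating the filtering result of Section 10.2. The following Python sketch (assuming NumPy; not part of the text) verifies that $\sigma_X^2 |H(f)|^2$ reproduces Eq. (10.31) and checks the lag-1 term of Eq. (10.30) by simulation:

```python
import numpy as np

rng = np.random.default_rng(1)
a, sigma2 = 1.0, 1.0
f = np.linspace(-0.5, 0.5, 101)

# Eq. (10.31): S_Y(f) = sigma_X^2 * (1 + a^2 + 2 a cos 2 pi f)
S_theory = sigma2 * (1 + a**2 + 2 * a * np.cos(2 * np.pi * f))

# The same result viewed as filtered white noise: H(f) = 1 + a e^{-j 2 pi f}.
H = 1 + a * np.exp(-1j * 2 * np.pi * f)
S_filter = sigma2 * np.abs(H) ** 2

print(np.max(np.abs(S_theory - S_filter)))    # zero up to rounding error

# Simulation check of the lag-1 autocorrelation a * sigma_X^2 from Eq. (10.30).
X = rng.normal(0, np.sqrt(sigma2), 200000)
Y = X[1:] + a * X[:-1]
print(np.mean(Y[1:] * Y[:-1]))                # close to a * sigma2 = 1
```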
Example 10.8 Signal Plus Noise
Let the observation $Z_n$ be given by
$$Z_n = X_n + Y_n,$$
where $X_n$ is the signal we wish to observe, $Y_n$ is a white noise process with power $\sigma_Y^2$, and $X_n$ and
$Y_n$ are independent random processes. Suppose further that $X_n = A$ for all n, where A is a random variable with zero mean and variance $\sigma_A^2$. Thus $Z_n$ represents a sequence of noisy measurements of the random variable A. Find the power spectral density of $Z_n$.
The mean and autocorrelation of $Z_n$ are
$$E[Z_n] = E[A] + E[Y_n] = 0$$
FIGURE 10.3
Power spectral density of the moving average process discussed in Example 10.7.
and
$$E[Z_n Z_{n+k}] = E[(X_n + Y_n)(X_{n+k} + Y_{n+k})] = E[X_n X_{n+k}] + E[X_n]E[Y_{n+k}] + E[X_{n+k}]E[Y_n] + E[Y_n Y_{n+k}] = E[A^2] + R_Y(k).$$
Thus $Z_n$ is also a WSS process.
The power spectral density of $Z_n$ is then
$$S_Z(f) = E[A^2]\,\delta(f) + S_Y(f),$$
where we have used the fact that the Fourier transform of a constant is a delta function.
10.1.3 Power Spectral Density as a Time Average
In the above discussion, we simply stated that the power spectral density is given as the
Fourier transform of the autocorrelation without supplying a proof. We now show how
the power spectral density arises naturally when we take Fourier transforms of realizations of random processes.
Let $X_0, \ldots, X_{k-1}$ be k observations from the discrete-time, WSS process $X_n$. Let
$\tilde{x}_k(f)$ denote the discrete Fourier transform of this sequence:
$$\tilde{x}_k(f) = \sum_{m=0}^{k-1} X_m\, e^{-j2\pi f m}. \tag{10.32}$$
Note that $\tilde{x}_k(f)$ is a complex-valued random variable. The magnitude squared of $\tilde{x}_k(f)$ is
a measure of the "energy" at the frequency f. If we divide this energy by the total "time" k,
we obtain an estimate for the "power" at the frequency f:
$$\tilde{p}_k(f) = \frac{1}{k}\, |\tilde{x}_k(f)|^2. \tag{10.33}$$
$\tilde{p}_k(f)$ is called the periodogram estimate for the power spectral density.
Consider the expected value of the periodogram estimate:
$$E[\tilde{p}_k(f)] = \frac{1}{k}\, E[\tilde{x}_k(f)\,\tilde{x}_k^*(f)] = \frac{1}{k}\, E\!\left[\sum_{m=0}^{k-1} X_m e^{-j2\pi f m} \sum_{i=0}^{k-1} X_i e^{j2\pi f i}\right] = \frac{1}{k} \sum_{m=0}^{k-1} \sum_{i=0}^{k-1} E[X_m X_i]\, e^{-j2\pi f(m-i)} = \frac{1}{k} \sum_{m=0}^{k-1} \sum_{i=0}^{k-1} R_X(m - i)\, e^{-j2\pi f(m-i)}. \tag{10.34}$$
Figure 10.4 shows the range of the double summation in Eq. (10.34). Note that all the terms
along the diagonal $m' = m - i$ are equal, that $m'$ ranges from $-(k - 1)$ to $k - 1$,
and that there are $k - |m'|$ terms along the diagonal $m' = m - i$. Thus Eq. (10.34) becomes
$$E[\tilde{p}_k(f)] = \frac{1}{k} \sum_{m' = -(k-1)}^{k-1} \{k - |m'|\}\, R_X(m')\, e^{-j2\pi f m'} = \sum_{m' = -(k-1)}^{k-1} \left\{1 - \frac{|m'|}{k}\right\} R_X(m')\, e^{-j2\pi f m'}. \tag{10.35}$$
Comparison of Eq. (10.35) with Eq. (10.24) shows that the mean of the periodogram
estimate is not equal to $S_X(f)$ for two reasons. First, Eq. (10.24) does not have the term
in brackets that appears in Eq. (10.35). Second, the limits of the summation in Eq. (10.35) are not
$\pm\infty$. We say that $\tilde{p}_k(f)$ is a "biased" estimator for $S_X(f)$. However, as $k \to \infty$, we see
FIGURE 10.4
Range of summation in Eq. (10.34).
that the term in brackets approaches one, and that the limits of the summation approach
$\pm\infty$. Thus
$$E[\tilde{p}_k(f)] \to S_X(f) \quad \text{as } k \to \infty, \tag{10.36}$$
that is, the mean of the periodogram estimate does indeed approach $S_X(f)$. Note
that Eq. (10.36) shows that $S_X(f)$ is nonnegative for all f, since $\tilde{p}_k(f)$ is nonnegative
for all f.
In order to be useful, the variance of the periodogram estimate should also approach zero. Determining whether it does involves looking more closely at the problem
of power spectral density estimation. We defer this topic to Section 10.6.
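The two properties just described can be seen in a short simulation. This Python sketch (assuming NumPy; not part of the text) averages periodograms of white noise records, for which $S_X(f) = \sigma^2$: the ensemble mean of $\tilde{p}_k(f)$ sits near $\sigma^2$, while the variance of any single periodogram bin stays on the order of $\sigma^4$ rather than shrinking:

```python
import numpy as np

rng = np.random.default_rng(3)
sigma2 = 2.0
k = 256                      # record length
n_trials = 2000              # number of independent records

X = rng.normal(0.0, np.sqrt(sigma2), size=(n_trials, k))

# Periodogram of each record, Eq. (10.33): |DFT|^2 / k at the DFT frequencies.
P = np.abs(np.fft.fft(X, axis=1)) ** 2 / k

mean_P = P.mean(axis=0)              # ensemble average over records
print(mean_P.min(), mean_P.max())    # both near S_X(f) = sigma2 = 2.0

# The variance of a single periodogram bin does not shrink with k,
# which is the issue taken up in Section 10.6.
print(P[:, 5].var())                 # on the order of sigma2**2 = 4
```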
All of the above results hold for a continuous-time WSS random process X(t)
after appropriate changes are made from summations to integrals. The periodogram
estimate for $S_X(f)$, for an observation in the interval $0 < t < T$, was defined in Eq.
(10.2). The same derivation that led to Eq. (10.35) can be used to show that the mean of
the periodogram estimate is given by
$$E[\tilde{p}_T(f)] = \int_{-T}^{T} \left\{1 - \frac{|\tau|}{T}\right\} R_X(\tau)\, e^{-j2\pi f\tau}\, d\tau. \tag{10.37a}$$
It then follows that
$$E[\tilde{p}_T(f)] \to S_X(f) \quad \text{as } T \to \infty. \tag{10.37b}$$
10.2
RESPONSE OF LINEAR SYSTEMS TO RANDOM SIGNALS
Many applications involve the processing of random signals (i.e., random processes)
in order to achieve certain ends. For example, in prediction, we are interested in predicting future values of a signal in terms of past values. In filtering and smoothing, we
are interested in recovering signals that have been corrupted by noise. In modulation,
we are interested in converting low-frequency information signals into high-frequency transmission signals that propagate more readily through various transmission
media.
Signal processing involves converting a signal from one form into another. Thus a
signal processing method is simply a transformation or mapping from one time function into another function. If the input to the transformation is a random process, then
the output will also be a random process. In the next two sections, we are interested in
determining the statistical properties of the output process when the input is a widesense stationary random process.
10.2.1 Continuous-Time Systems
Consider a system in which an input signal x(t) is mapped into the output signal y(t) by
the transformation
$$y(t) = T[x(t)].$$
The system is linear if superposition holds, that is,
$$T[a x_1(t) + b x_2(t)] = a\, T[x_1(t)] + b\, T[x_2(t)],$$
where $x_1(t)$ and $x_2(t)$ are arbitrary input signals, and a and b are arbitrary constants.$^4$
Let y(t) be the response to input x(t); then the system is said to be time-invariant if the
response to $x(t - \tau)$ is $y(t - \tau)$. The impulse response h(t) of a linear, time-invariant
system is defined by
$$h(t) = T[\delta(t)],$$
where $\delta(t)$ is a unit delta function input applied at $t = 0$. The response of the system to
an arbitrary input x(t) is then
$$y(t) = h(t) * x(t) = \int_{-\infty}^{\infty} h(s)\, x(t - s)\, ds = \int_{-\infty}^{\infty} h(t - s)\, x(s)\, ds. \tag{10.38}$$
Therefore a linear, time-invariant system is completely specified by its impulse response. The impulse response h(t) can also be specified by giving its Fourier transform,
the transfer function of the system:
$$H(f) = \mathcal{F}\{h(t)\} = \int_{-\infty}^{\infty} h(t)\, e^{-j2\pi f t}\, dt. \tag{10.39}$$
A system is said to be causal if the response at time t depends only on past values of the
input, that is, if $h(t) = 0$ for $t < 0$.
If the input to a linear, time-invariant system is a random process X(t) as shown
in Fig. 10.5, then the output of the system is the random process given by
$$Y(t) = \int_{-\infty}^{\infty} h(s)\, X(t - s)\, ds = \int_{-\infty}^{\infty} h(t - s)\, X(s)\, ds. \tag{10.40}$$
We assume that the integrals exist in the mean square sense as discussed in Section 9.7.
We now show that if X(t) is a wide-sense stationary process, then Y(t) is also wide-sense stationary.$^5$
The mean of Y(t) is given by
$$E[Y(t)] = E\!\left[\int_{-\infty}^{\infty} h(s)\, X(t - s)\, ds\right] = \int_{-\infty}^{\infty} h(s)\, E[X(t - s)]\, ds.$$
FIGURE 10.5
A linear system with a random input
signal.
$^4$For examples of nonlinear systems see Problems 9.11 and 9.56.
$^5$Equation (10.40) supposes that the input was applied at an infinite time in the past. If the input is applied at
$t = 0$, then Y(t) is not wide-sense stationary. However, it becomes wide-sense stationary as the response
reaches "steady state" (see Example 9.46 and Problem 10.29).
Now $m_X = E[X(t - s)]$ since X(t) is wide-sense stationary, so
$$E[Y(t)] = m_X \int_{-\infty}^{\infty} h(t)\, dt = m_X H(0), \tag{10.41}$$
where H(f) is the transfer function of the system. Thus the mean of the output Y(t) is
the constant $m_Y = H(0)\, m_X$.
The autocorrelation of Y(t) is given by
$$E[Y(t)Y(t + \tau)] = E\!\left[\int_{-\infty}^{\infty} h(s)\, X(t - s)\, ds \int_{-\infty}^{\infty} h(r)\, X(t + \tau - r)\, dr\right] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(s)\, h(r)\, E[X(t - s)\, X(t + \tau - r)]\, ds\, dr = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(s)\, h(r)\, R_X(\tau + s - r)\, ds\, dr, \tag{10.42}$$
where we have used the fact that X(t) is wide-sense stationary. The expression on the
right-hand side of Eq. (10.42) depends only on $\tau$. Thus the autocorrelation of Y(t) depends only on $\tau$, and since E[Y(t)] is a constant, we conclude that Y(t) is a wide-sense stationary process.
We are now ready to compute the power spectral density of the output of a linear,
time-invariant system. Taking the transform of $R_Y(\tau)$ as given in Eq. (10.42), we obtain
$$S_Y(f) = \int_{-\infty}^{\infty} R_Y(\tau)\, e^{-j2\pi f\tau}\, d\tau = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(s)\, h(r)\, R_X(\tau + s - r)\, e^{-j2\pi f\tau}\, ds\, dr\, d\tau.$$
Change variables, letting $u = \tau + s - r$:
$$S_Y(f) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(s)\, h(r)\, R_X(u)\, e^{-j2\pi f(u - s + r)}\, ds\, dr\, du = \int_{-\infty}^{\infty} h(s)\, e^{j2\pi f s}\, ds \int_{-\infty}^{\infty} h(r)\, e^{-j2\pi f r}\, dr \int_{-\infty}^{\infty} R_X(u)\, e^{-j2\pi f u}\, du = H^*(f)\, H(f)\, S_X(f) = |H(f)|^2\, S_X(f), \tag{10.43}$$
where we have used the definition of the transfer function. Equation (10.43) relates the
input and output power spectral densities to the system transfer function. Note that
$R_Y(\tau)$ can also be found by computing Eq. (10.43) and then taking the inverse Fourier
transform.
Equations (10.41) through (10.43) only enable us to determine the mean and autocorrelation function of the output process Y(t). In general this is not enough to determine probabilities of events involving Y(t). However, if the input process is a
Gaussian WSS random process, then as discussed in Section 9.7 the output process will
also be a Gaussian WSS random process. Thus the mean and autocorrelation function
provided by Eqs. (10.41) through (10.43) are enough to determine all joint pdf's involving the Gaussian random process Y(t).
The cross-correlation between the input and output processes is also of interest:
$$R_{Y,X}(\tau) = E[Y(t + \tau)X(t)] = E\!\left[X(t)\int_{-\infty}^{\infty} X(t + \tau - r)\, h(r)\, dr\right] = \int_{-\infty}^{\infty} E[X(t)\, X(t + \tau - r)]\, h(r)\, dr = \int_{-\infty}^{\infty} R_X(\tau - r)\, h(r)\, dr = R_X(\tau) * h(\tau). \tag{10.44}$$
By taking the Fourier transform, we obtain the cross-power spectral density:
$$S_{Y,X}(f) = H(f)\, S_X(f). \tag{10.45a}$$
Since $R_{X,Y}(\tau) = R_{Y,X}(-\tau)$, we have that
$$S_{X,Y}(f) = S^*_{Y,X}(f) = H^*(f)\, S_X(f). \tag{10.45b}$$
Example 10.9 Filtered White Noise
Find the power spectral density of the output of a linear, time-invariant system whose input is a
white noise process.
Let X(t) be the input process with power spectral density
$$S_X(f) = \frac{N_0}{2} \quad \text{for all } f.$$
The power spectral density of the output Y(t) is then
$$S_Y(f) = |H(f)|^2\, \frac{N_0}{2}. \tag{10.46}$$
Thus the transfer function completely determines the shape of the power spectral density of the
output process.
Example 10.9 provides us with a method for generating WSS processes with arbitrary power spectral density $S_Y(f)$. We simply need to filter white noise through a filter
with transfer function $H(f) = \sqrt{S_Y(f)}$. In general this filter will be noncausal. We can
usually, but not always, obtain a causal filter with transfer function H(f) such that
$S_Y(f) = H(f)\, H^*(f)$. For example, if $S_Y(f)$ is a rational function, that is, if it consists of
the ratio of two polynomials, then it is easy to factor $S_Y(f)$ into the above form, as
shown in the next example. Furthermore any power spectral density can be approximated by a rational function. Thus filtered white noise can be used to synthesize WSS
random processes with arbitrary power spectral densities, and hence arbitrary autocorrelation functions.
Example 10.10 Ornstein-Uhlenbeck Process
Find the impulse response of a causal filter that can be used to generate a Gaussian random
process with output power spectral density and autocorrelation function
$$S_Y(f) = \frac{\sigma^2}{\alpha^2 + 4\pi^2 f^2} \quad \text{and} \quad R_Y(\tau) = \frac{\sigma^2}{2\alpha}\, e^{-\alpha|\tau|}.$$
This power spectral density factors as follows:
$$S_Y(f) = \frac{1}{(\alpha - j2\pi f)}\, \frac{1}{(\alpha + j2\pi f)}\, \sigma^2.$$
If we let the filter transfer function be $H(f) = 1/(\alpha + j2\pi f)$, then the impulse response is
$$h(t) = e^{-\alpha t} \quad \text{for } t \ge 0,$$
which is the response of a causal system. Thus if we filter white Gaussian noise with power spectral density $\sigma^2$ using the above filter, we obtain a process with the desired power spectral density.
In Example 9.46, we found the autocorrelation function of the transient response of this
filter for a white Gaussian noise input (see Eq. (9.97a)). As was already indicated, when dealing
with power spectral densities we assume that the processes are in steady state. Thus as $t \to \infty$
Eq. (9.97a) approaches Eq. (9.97b).
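In discrete time, sampling the causal filter $h(t) = e^{-\alpha t}$ with step $\Delta$ leads to the recursion $Y_{n+1} = e^{-\alpha\Delta} Y_n + W_n$, a standard way to simulate the Ornstein-Uhlenbeck process (this is an illustrative sketch, not a procedure from the text). The Python snippet below (assuming NumPy) chooses the driving-noise variance so that the stationary variance matches $R_Y(0) = \sigma^2/2\alpha$ and checks the result:

```python
import numpy as np

rng = np.random.default_rng(4)
alpha, sigma2 = 1.0, 2.0        # filter pole and white-noise PSD level
dt = 0.01
n = 400000

# Discretized filter: Y_{n+1} = e^{-alpha*dt} Y_n + W_n, with Var[W_n]
# chosen so the stationary variance equals R_Y(0) = sigma2 / (2*alpha).
a = np.exp(-alpha * dt)
var_w = (sigma2 / (2 * alpha)) * (1 - a**2)

Y = np.empty(n)
Y[0] = rng.normal(0, np.sqrt(sigma2 / (2 * alpha)))   # start in steady state
W = rng.normal(0, np.sqrt(var_w), n - 1)
for i in range(n - 1):
    Y[i + 1] = a * Y[i] + W[i]

print(Y.var())                        # close to sigma2 / (2*alpha) = 1.0

# Lag check against R_Y(tau) = R_Y(0) * exp(-alpha*|tau|) at tau = 0.5.
lag = int(0.5 / dt)
print(np.mean(Y[:-lag] * Y[lag:]))    # close to exp(-0.5) ~ 0.61
```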
Example 10.11 Ideal Filters
Let $Z(t) = X(t) + Y(t)$, where X(t) and Y(t) are independent random processes with power
spectral densities shown in Fig. 10.6(a). Find the output if Z(t) is input into an ideal lowpass filter
with transfer function shown in Fig. 10.6(b). Find the output if Z(t) is input into an ideal bandpass filter with transfer function shown in Fig. 10.6(c).
The power spectral density of the output W(t) of the lowpass filter is
$$S_W(f) = |H_{LP}(f)|^2\, S_X(f) + |H_{LP}(f)|^2\, S_Y(f) = S_X(f),$$
since $H_{LP}(f) = 1$ for the frequencies where $S_X(f)$ is nonzero, and $H_{LP}(f) = 0$ where $S_Y(f)$ is
nonzero. Thus W(t) has the same power spectral density as X(t). As indicated in Example 10.5,
this does not imply that $W(t) = X(t)$.
To show that $W(t) = X(t)$ in the mean square sense, consider $D(t) = W(t) - X(t)$. It is
easily shown that
$$R_D(\tau) = R_W(\tau) - R_{WX}(\tau) - R_{XW}(\tau) + R_X(\tau).$$
The corresponding power spectral density is
$$S_D(f) = S_W(f) - S_{WX}(f) - S_{XW}(f) + S_X(f) = |H_{LP}(f)|^2\, S_X(f) - H_{LP}(f)\, S_X(f) - H^*_{LP}(f)\, S_X(f) + S_X(f) = 0.$$
FIGURE 10.6
(a) Input signal to filters is $X(t) + Y(t)$, (b) lowpass filter, (c) bandpass filter.
Therefore $R_D(\tau) = 0$ for all $\tau$, and $W(t) = X(t)$ in the mean square sense since
$$E[(W(t) - X(t))^2] = E[D^2(t)] = R_D(0) = 0.$$
Thus we have shown that the lowpass filter removes Y(t) and passes X(t). Similarly, the bandpass
filter removes X(t) and passes Y(t).
Example 10.12
A random telegraph signal is passed through an RC lowpass filter which has transfer function
$$H(f) = \frac{\beta}{\beta + j2\pi f},$$
where $\beta = 1/RC$ is the time constant of the filter. Find the power spectral density and autocorrelation of the output.
In Example 10.1, the power spectral density of the random telegraph signal with transition
rate $\alpha$ was found to be
$$S_X(f) = \frac{4\alpha}{4\alpha^2 + 4\pi^2 f^2}.$$
From Eq. (10.43) we have
$$S_Y(f) = \left(\frac{\beta^2}{\beta^2 + 4\pi^2 f^2}\right)\left(\frac{4\alpha}{4\alpha^2 + 4\pi^2 f^2}\right) = \frac{4\alpha\beta^2}{\beta^2 - 4\alpha^2}\left\{\frac{1}{4\alpha^2 + 4\pi^2 f^2} - \frac{1}{\beta^2 + 4\pi^2 f^2}\right\}.$$
$R_Y(\tau)$ is found by inverting the above expression:
$$R_Y(\tau) = \frac{1}{\beta^2 - 4\alpha^2}\,\{\beta^2 e^{-2\alpha|\tau|} - 2\alpha\beta\, e^{-\beta|\tau|}\}.$$
10.2.2 Discrete-Time Systems
The results obtained above for continuous-time signals also hold for discrete-time signals after appropriate changes are made from integrals to summations.
Let the unit-sample response $h_n$ be the response of a discrete-time, linear, time-invariant system to a unit-sample input $\delta_n$:
$$\delta_n = \begin{cases} 1 & n = 0 \\ 0 & n \ne 0. \end{cases} \tag{10.47}$$
The response of the system to an arbitrary input random process $X_n$ is then given by
$$Y_n = h_n * X_n = \sum_{j=-\infty}^{\infty} h_j\, X_{n-j} = \sum_{j=-\infty}^{\infty} h_{n-j}\, X_j. \tag{10.48}$$
Thus discrete-time, linear, time-invariant systems are determined by the unit-sample
response $h_n$. The transfer function of such a system is defined by
$$H(f) = \sum_{i=-\infty}^{\infty} h_i\, e^{-j2\pi f i}. \tag{10.49}$$
The derivation from the previous section can be used to show that if $X_n$ is a wide-sense stationary process, then $Y_n$ is also wide-sense stationary. The mean of $Y_n$ is given by
$$m_Y = m_X \sum_{j=-\infty}^{\infty} h_j = m_X H(0). \tag{10.50}$$
The autocorrelation of $Y_n$ is given by
$$R_Y(k) = \sum_{j=-\infty}^{\infty} \sum_{i=-\infty}^{\infty} h_j\, h_i\, R_X(k + j - i). \tag{10.51}$$
By taking the Fourier transform of $R_Y(k)$ it is readily shown that the power spectral
density of $Y_n$ is
$$S_Y(f) = |H(f)|^2\, S_X(f). \tag{10.52}$$
This is the same equation that was found for continuous-time systems.
Finally, we note that if the input process $X_n$ is a Gaussian WSS random process,
then the output process $Y_n$ is also a Gaussian WSS random process whose statistics are completely determined by the mean and autocorrelation function provided by Eqs. (10.50)
through (10.52).
Example 10.13 Filtered White Noise

Let $X_n$ be a white noise sequence with zero mean and average power $\sigma_X^2$. If $X_n$ is the input to a linear, time-invariant system with transfer function $H(f)$, then the output process $Y_n$ has power spectral density:

$$S_Y(f) = |H(f)|^2 \sigma_X^2. \tag{10.53}$$

Equation (10.53) provides us with a method for generating discrete-time random processes with arbitrary power spectral densities or autocorrelation functions. If the power spectral density can be written as a rational function of $z = e^{j2\pi f}$ as in Eq. (10.24), then a causal filter can be found to generate a process with that power spectral density. Note that this is a generalization of the methods presented in Section 6.6 for generating vector random variables with arbitrary covariance matrix.
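Equations (10.51) and (10.53) are easy to check numerically. The sketch below (plain Python; the filter taps and test frequency are arbitrary choices for illustration, not values from the text) computes the output autocorrelation $R_Y(k) = \sigma_X^2 \sum_j h_j h_{j+k}$ for a short FIR filter driven by white noise and verifies that its discrete-time Fourier transform agrees with $|H(f)|^2\sigma_X^2$.

```python
import cmath
import math

def H(f, h):
    """Transfer function of an FIR filter, Eq. (10.49)."""
    return sum(hi * cmath.exp(-2j * math.pi * f * i) for i, hi in enumerate(h))

def R_Y(k, h, var_x):
    """Output autocorrelation for a white-noise input; Eq. (10.51) with
    R_X(m) = var_x * delta(m) reduces to var_x * sum_j h_j h_{j+|k|}."""
    k = abs(k)
    return var_x * sum(h[j] * h[j + k] for j in range(len(h) - k))

h = [1.0, 0.5, 0.25]   # illustrative FIR taps (not from the text)
var_x = 2.0            # input average power sigma_X^2

f = 0.15               # test frequency, |f| < 1/2
S_direct = abs(H(f, h)) ** 2 * var_x                     # Eq. (10.53)
S_from_R = R_Y(0, h, var_x) + 2 * sum(                   # DTFT of R_Y(k)
    R_Y(k, h, var_x) * math.cos(2 * math.pi * f * k) for k in range(1, len(h)))
print(abs(S_from_R - S_direct) < 1e-9)   # -> True
```

The agreement is exact (up to rounding) because for an FIR filter both sides are the same finite trigonometric polynomial.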
Example 10.14 First-Order Autoregressive Process

A first-order autoregressive (AR) process $Y_n$ with zero mean is defined by

$$Y_n = a Y_{n-1} + X_n, \tag{10.54}$$

where $X_n$ is a zero-mean white noise input random process with average power $\sigma_X^2$. Note that $Y_n$ can be viewed as the output of the system in Fig. 10.7(a) for an iid input $X_n$. Find the power spectral density and autocorrelation of $Y_n$.

The unit-sample response can be determined from Eq. (10.54):

$$h_n = \begin{cases} 0 & n < 0 \\ 1 & n = 0 \\ a^n & n > 0. \end{cases}$$

Note that we require $|a| < 1$ for the system to be stable.6 Therefore the transfer function is

$$H(f) = \sum_{n=0}^{\infty} a^n e^{-j2\pi f n} = \frac{1}{1 - a e^{-j2\pi f}}.$$
6 A system is said to be stable if $\sum_n |h_n| < \infty$. The response of a stable system to any bounded input is also bounded.
FIGURE 10.7
(a) Generation of AR process; (b) Generation of ARMA process.
Equation (10.52) then gives

$$S_Y(f) = \frac{\sigma_X^2}{(1 - ae^{-j2\pi f})(1 - ae^{j2\pi f})} = \frac{\sigma_X^2}{1 + a^2 - (ae^{-j2\pi f} + ae^{j2\pi f})} = \frac{\sigma_X^2}{1 + a^2 - 2a\cos 2\pi f}.$$

Equation (10.51) gives
$$R_Y(k) = \sum_{j=0}^{\infty}\sum_{i=0}^{\infty} h_j h_i\, \sigma_X^2\, \delta_{k+j-i} = \sigma_X^2 \sum_{j=0}^{\infty} a^j a^{j+k} = \frac{\sigma_X^2 a^k}{1 - a^2}.$$

Example 10.15 ARMA Random Process
An autoregressive moving average (ARMA) process is defined by

$$Y_n = -\sum_{i=1}^{q} a_i Y_{n-i} + \sum_{i'=0}^{p} b_{i'} W_{n-i'}, \tag{10.55}$$

where $W_n$ is a WSS, white noise input process. $Y_n$ can be viewed as the output of the recursive system in Fig. 10.7(b) to the input $W_n$. It can be shown that the transfer function of the linear system
defined by the above equation is

$$H(f) = \frac{\displaystyle\sum_{i'=0}^{p} b_{i'} e^{-j2\pi f i'}}{\displaystyle 1 + \sum_{i=1}^{q} a_i e^{-j2\pi f i}}.$$

The power spectral density of the ARMA process is

$$S_Y(f) = |H(f)|^2 \sigma_W^2.$$
ARMA models are used extensively in random time series analysis and in signal processing. The general autoregressive process is the special case of the ARMA process with $b_1 = b_2 = \cdots = b_p = 0$. The general moving average process is the special case of the ARMA process with $a_1 = a_2 = \cdots = a_q = 0$. Octave has a function filter(b, a, x) which takes a set of coefficients $b = (b_1, b_2, \ldots, b_{p+1})$ and $a = (a_1, a_2, \ldots, a_q)$ as coefficients for a filter as in Eq. (10.55) and produces the output corresponding to the input sequence x. The choice of a and b can lead to a broad range of discrete-time filters.

For example, if we let $b = (1/N, 1/N, \ldots, 1/N)$ we obtain a moving average filter:

$$Y_n = (W_n + W_{n-1} + \cdots + W_{n-N+1})/N.$$

Figure 10.8 shows a zero-mean, unit-variance Gaussian iid sequence $W_n$ and the outputs from an $N = 3$ and an $N = 10$ moving average filter. It can be seen that the $N = 3$ filter moderates the extreme variations but generally tracks the fluctuations in $W_n$. The $N = 10$ filter on the other hand severely limits the variations and only tracks slower, longer-lasting trends.
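The moving average can also be mimicked without Octave. The short Python sketch below is our own stand-in for the filter(b, a, x) call (samples before the start of the record are taken as zero, as Octave's filter does, and the input record is an arbitrary illustrative list); Eq. (10.51) then predicts that an iid input with power $\sigma^2$ yields output power $R_Y(0) = \sigma^2\sum_n h_n^2 = \sigma^2/N$.

```python
def moving_average(x, N):
    """y[n] = (x[n] + x[n-1] + ... + x[n-N+1]) / N, with samples before
    the start of the record taken as zero (as Octave's filter() does)."""
    return [sum(x[max(0, n - N + 1): n + 1]) / N for n in range(len(x))]

x = [1.0, -2.0, 3.0, 0.0, -1.0, 2.0]   # illustrative input record
y = moving_average(x, 3)
print(y[2])   # (1 - 2 + 3)/3

# Output power predicted by Eq. (10.51) for an iid input with sigma^2 = 1:
N = 3
R_Y0 = N * (1.0 / N) ** 2              # = sigma^2 / N
```

The $\sigma^2/N$ output power is why the $N = 10$ filter in Fig. 10.8 suppresses the fluctuations much more strongly than the $N = 3$ filter.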
Figures 10.9(a) and (b) show the result of passing an iid Gaussian sequence $X_n$ through first-order autoregressive filters as in Eq. (10.54). The AR sequence with $a = 0.1$ has low correlation between adjacent samples and so the sequence remains similar to the underlying iid random process. The AR sequence with $a = 0.75$ has higher correlation between adjacent samples, which tends to cause longer-lasting trends as evident in Fig. 10.9(b).
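This correlation behavior can be checked against the closed form from Example 10.14, $R_Y(k) = \sigma_X^2 a^{|k|}/(1 - a^2)$. A small Python sketch (the function names are ours, chosen for illustration) compares the closed form with a truncated version of the double sum in Eq. (10.51):

```python
def ar1_autocorr_closed(k, a, var_x):
    """Closed form from Example 10.14: R_Y(k) = var_x * a^|k| / (1 - a^2)."""
    return var_x * a ** abs(k) / (1 - a * a)

def ar1_autocorr_sum(k, a, var_x, terms=5000):
    """Truncated Eq. (10.51) with h_n = a^n and white-noise input:
    R_Y(k) = var_x * sum_{j>=0} a^j * a^(j+|k|)."""
    k = abs(k)
    return var_x * sum(a ** (2 * j + k) for j in range(terms))

a, var_x = 0.75, 1.0
closed = ar1_autocorr_closed(2, a, var_x)
summed = ar1_autocorr_sum(2, a, var_x)
print(abs(closed - summed) < 1e-9)   # -> True
```

Note also that the lag-one correlation coefficient is $R_Y(1)/R_Y(0) = a$, which is why the $a = 0.75$ sequence in Fig. 10.9(b) shows much stronger trends than the $a = 0.1$ sequence.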
FIGURE 10.8
Moving average process showing iid Gaussian sequence and corresponding N = 3, N = 10 moving average processes.
FIGURE 10.9
(a) First-order autoregressive process with a = 0.1; (b) with a = 0.75.
10.3
BANDLIMITED RANDOM PROCESSES
In this section we consider two important applications that involve random
processes with power spectral densities that are nonzero over a finite range of frequencies. The first application involves the sampling theorem, which states that
bandlimited random processes can be represented in terms of a sequence of their
time samples. This theorem forms the basis for modern digital signal processing
systems. The second application involves the modulation of sinusoidal signals by
random information signals. Modulation is a key element of all modern communication systems.
10.3.1 Sampling of Bandlimited Random Processes
One of the major technology advances in the twentieth century was the development
of digital signal processing technology. All modern multimedia systems depend in
some way on the processing of digital signals. Many information signals, e.g., voice,
music, imagery, occur naturally as analog signals that are continuous-valued and that
vary continuously in time or space or both. The two key steps in making these signals
amenable to digital signal processing are: (1) converting the continuous-time signals into discrete-time signals by sampling the amplitudes; and (2) representing the samples using a
fixed number of bits. In this section we introduce the sampling theorem for wide-sense
stationary bandlimited random processes, which addresses the conversion of signals
into discrete-time sequences.
Let x(t) be a deterministic, finite-energy time signal that has Fourier transform $\tilde{X}(f) = \mathcal{F}\{x(t)\}$ that is nonzero only in the frequency range $|f| \le W$. Suppose we sample x(t) every T seconds to obtain the sequence of sample values $\{\ldots, x(-2T), x(-T), x(0), x(T), \ldots\}$. The sampling theorem for deterministic signals states that x(t) can be recovered exactly from the sequence of samples if $T \le 1/2W$, or equivalently $1/T \ge 2W$, that is, the sampling rate is at least twice the bandwidth of the signal. The minimum sampling rate $1/T = 2W$ is called the Nyquist sampling rate. The sampling
FIGURE 10.10
(a) Sampling and interpolation; (b) Fourier transform of sampled deterministic signal; (c) Sampling, digital filtering, and interpolation.
theorem provides the following interpolation formula for recovering x(t) from the samples:

$$x(t) = \sum_{n=-\infty}^{\infty} x(nT)\,p(t - nT) \quad \text{where} \quad p(t) = \frac{\sin(\pi t/T)}{\pi t/T}. \tag{10.56}$$

Eq. (10.56) provides us with the interesting interpretation depicted in Fig. 10.10(a). The process of sampling x(t) can be viewed as the multiplication of x(t) by a train of delta functions spaced T seconds apart. The sampled function is then represented by:

$$x_s(t) = \sum_{n=-\infty}^{\infty} x(nT)\,\delta(t - nT). \tag{10.57}$$

Eq. (10.56) can be viewed as the response of a linear system with impulse response p(t) to the signal $x_s(t)$. It is easy to show that the p(t) in Eq. (10.56) corresponds to the ideal lowpass filter in Fig. 10.6:

$$P(f) = \mathcal{F}\{p(t)\} = \begin{cases} 1 & -W \le f \le W \\ 0 & |f| > W. \end{cases}$$
The proof of the sampling theorem involves the following steps. We show that

$$\mathcal{F}\left\{\sum_{n=-\infty}^{\infty} x(nT)\,p(t - nT)\right\} = P(f)\,\frac{1}{T}\sum_{k=-\infty}^{\infty} \tilde{X}\!\left(f - \frac{k}{T}\right), \tag{10.58}$$

which consists of the sum of translated versions of $\tilde{X}(f) = \mathcal{F}\{x(t)\}$, as shown in Fig. 10.10(b). We then observe that as long as $1/T \ge 2W$, then P(f) in the above expression selects the $k = 0$ term in the summation, which corresponds to $\tilde{X}(f)$. See Problem 10.45 for details.
Example 10.16 Sampling a WSS Random Process

Let X(t) be a WSS process with autocorrelation function $R_X(\tau)$. Find the mean and covariance functions of the discrete-time sampled process $X_n = X(nT)$ for $n = 0, \pm 1, \pm 2, \ldots$.

Since X(t) is WSS, the mean and covariance functions are:

$$m_X(n) = E[X(nT)] = m$$
$$E[X_{n_1} X_{n_2}] = E[X(n_1 T)X(n_2 T)] = R_X(n_1 T - n_2 T) = R_X((n_1 - n_2)T).$$

This shows $X_n$ is a WSS discrete-time process.
Let X(t) be a WSS process with autocorrelation function $R_X(\tau)$ and power spectral density $S_X(f)$. Suppose that $S_X(f)$ is bandlimited, that is,

$$S_X(f) = 0 \qquad |f| > W.$$

We now show that the sampling theorem can be extended to X(t). Let

$$\hat{X}(t) = \sum_{n=-\infty}^{\infty} X(nT)\,p(t - nT) \quad \text{where} \quad p(t) = \frac{\sin(\pi t/T)}{\pi t/T}, \tag{10.59}$$

then $\hat{X}(t) = X(t)$ in the mean square sense. Recall that equality in the mean square sense does not imply equality for all sample functions, so this version of the sampling theorem is weaker than the version in Eq. (10.56) for finite-energy signals.
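The interpolation formula itself can be exercised numerically. The Python sketch below (a truncated sum; the tone frequency and the truncation length are arbitrary illustrative choices) samples a bandlimited signal above its Nyquist rate and reconstructs it at an off-grid time with Eq. (10.56); the residual error is due only to truncating the infinite sum.

```python
import math

def sinc_interpolate(samples, T, t, n0):
    """Truncated interpolation formula, Eq. (10.56):
    x(t) ~ sum_n x(nT) p(t - nT), with p(t) = sin(pi t/T)/(pi t/T).
    samples[i] holds x((n0 + i) T)."""
    def p(u):
        if abs(u) < 1e-12:
            return 1.0
        return math.sin(math.pi * u / T) / (math.pi * u / T)
    return sum(s * p(t - (n0 + i) * T) for i, s in enumerate(samples))

# A tone at f0 = 0.1 sampled at rate 1/T = 1, well above 2*f0 = 0.2.
f0, T = 0.1, 1.0
n0 = -400
samples = [math.cos(2 * math.pi * f0 * (n0 + i) * T) for i in range(801)]

t = 0.3   # off-grid time
xhat = sinc_interpolate(samples, T, t, n0)
err = abs(xhat - math.cos(2 * math.pi * f0 * t))
```

At the sample instants the reconstruction is exact, since $p(t - nT)$ is 1 at $t = nT$ and 0 at every other sampling instant; between samples the truncation error shrinks as more terms are kept.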
To show Eq. (10.59) we first note that since $S_X(f) = \mathcal{F}\{R_X(\tau)\}$, we can apply the sampling theorem for deterministic signals to $R_X(\tau)$:

$$R_X(\tau) = \sum_{n=-\infty}^{\infty} R_X(nT)\,p(\tau - nT). \tag{10.60}$$

Next we consider the mean square error associated with Eq. (10.59):

$$E[\{X(t) - \hat{X}(t)\}^2] = E[\{X(t) - \hat{X}(t)\}X(t)] - E[\{X(t) - \hat{X}(t)\}\hat{X}(t)]$$
$$= \left\{E[X(t)X(t)] - E[\hat{X}(t)X(t)]\right\} - \left\{E[X(t)\hat{X}(t)] - E[\hat{X}(t)\hat{X}(t)]\right\}.$$

It is easy to show that Eq. (10.60) implies that each of the terms in braces is equal to zero. (See Problem 10.48.) We then conclude that $\hat{X}(t) = X(t)$ in the mean square sense.
Example 10.17 Digital Filtering of a Sampled WSS Random Process

Let X(t) be a WSS process with power spectral density $S_X(f)$ that is nonzero only for $|f| \le W$. Consider the sequence of operations shown in Fig. 10.10(c): (1) X(t) is sampled at the Nyquist rate; (2) the samples X(nT) are input into the digital filter in Fig. 10.7(b) with $a_1 = a_2 = \cdots = a_q = 0$; and (3) the resulting output sequence $Y_n$ is fed into the interpolation filter. Find the power spectral density of the output Y(t).

The output of the digital filter is given by:

$$Y(kT) = \sum_{n=0}^{p} b_n X((k - n)T)$$

and the corresponding autocorrelation from Eq. (10.51) is:

$$R_Y(kT) = \sum_{n=0}^{p}\sum_{i=0}^{p} b_n b_i R_X((k + n - i)T).$$
The autocorrelation of Y(t) is found from the interpolation formula (Eq. 10.60):

$$R_Y(\tau) = \sum_{k=-\infty}^{\infty} R_Y(kT)\,p(\tau - kT) = \sum_{k=-\infty}^{\infty}\sum_{n=0}^{p}\sum_{i=0}^{p} b_n b_i R_X((k + n - i)T)\,p(\tau - kT)$$
$$= \sum_{n=0}^{p}\sum_{i=0}^{p} b_n b_i \left\{\sum_{k=-\infty}^{\infty} R_X((k + n - i)T)\,p(\tau - kT)\right\} = \sum_{n=0}^{p}\sum_{i=0}^{p} b_n b_i R_X(\tau + (n - i)T).$$

The output power spectral density is then:

$$S_Y(f) = \mathcal{F}\{R_Y(\tau)\} = \sum_{n=0}^{p}\sum_{i=0}^{p} b_n b_i \mathcal{F}\{R_X(\tau + (n - i)T)\} = \sum_{n=0}^{p}\sum_{i=0}^{p} b_n b_i S_X(f)\,e^{-j2\pi f(n-i)T}$$
$$= \left\{\sum_{n=0}^{p} b_n e^{-j2\pi f nT}\right\}\left\{\sum_{i=0}^{p} b_i e^{j2\pi f iT}\right\} S_X(f) = |H(fT)|^2 S_X(f) \tag{10.61}$$

where H(f) is the transfer function of the digital filter as per Eq. (10.49). The key finding here is the appearance of H(f) evaluated at fT. We have obtained a very nice result that characterizes the overall system response in Fig. 10.10(c) to the continuous-time input X(t). This result is true for more general digital filters; see [Oppenheim and Schafer].
The sampling theorem provides an important bridge between continuous-time
and discrete-time signal processing. It gives us a means for implementing the real as well
as the simulated processing of random signals. First, we must sample the random
process above its Nyquist sampling rate. We can then perform whatever digital processing is necessary. We can finally recover the continuous-time signal by interpolation. The
only difference between real signal processing and simulated signal processing is that
the former usually has real-time requirements, whereas the latter allows us to perform
our processing at whatever rate is possible using the available computing power.
10.3.2 Amplitude Modulation by Random Signals
Many of the transmission media used in communication systems can be modeled as linear systems whose behavior can be specified by a transfer function H(f), which passes certain frequencies and rejects others. Quite often the information signal A(t) (e.g., a speech or music signal) is not at the frequencies that propagate well. The purpose of a modulator is to map the information signal A(t) into a transmission signal X(t) that is in a frequency range that propagates well over the desired medium. At the receiver, we need to perform an inverse mapping to recover A(t) from X(t). In this section, we discuss two amplitude modulation methods.
Let A(t) be a WSS random process that represents an information signal. In general A(t) will be "lowpass" in character, that is, its power spectral density will be concentrated at low frequencies, as shown in Fig. 10.11(a). An amplitude modulation (AM) system produces a transmission signal by multiplying A(t) by a "carrier" signal $\cos(2\pi f_c t + \Theta)$:

$$X(t) = A(t)\cos(2\pi f_c t + \Theta), \tag{10.62}$$

where we assume $\Theta$ is a random variable that is uniformly distributed in the interval $(0, 2\pi)$, and $\Theta$ and A(t) are independent.

The autocorrelation of X(t) is

$$E[X(t + \tau)X(t)] = E[A(t + \tau)\cos(2\pi f_c(t + \tau) + \Theta)\,A(t)\cos(2\pi f_c t + \Theta)]$$
$$= E[A(t + \tau)A(t)]\,E[\cos(2\pi f_c(t + \tau) + \Theta)\cos(2\pi f_c t + \Theta)]$$
FIGURE 10.11
(a) A lowpass information signal; (b) an amplitude-modulated signal.
$$= R_A(\tau)\,E\!\left[\frac{1}{2}\cos(2\pi f_c \tau) + \frac{1}{2}\cos(2\pi f_c(2t + \tau) + 2\Theta)\right] = \frac{1}{2} R_A(\tau)\cos(2\pi f_c \tau), \tag{10.63}$$

where we used the fact that $E[\cos(2\pi f_c(2t + \tau) + 2\Theta)] = 0$ (see Example 9.10). Thus X(t) is also a wide-sense stationary random process.

The power spectral density of X(t) is

$$S_X(f) = \mathcal{F}\left\{\frac{1}{2} R_A(\tau)\cos(2\pi f_c \tau)\right\} = \frac{1}{4} S_A(f + f_c) + \frac{1}{4} S_A(f - f_c), \tag{10.64}$$

where we used the table of Fourier transforms in Appendix B. Figure 10.11(b) shows $S_X(f)$. It can be seen that the power spectral density of the information signal has been shifted to the regions around $\pm f_c$. X(t) is an example of a bandpass signal. Bandpass signals are characterized as having their power spectral density concentrated about some frequency much greater than zero.
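The phase-averaging step that produced Eq. (10.63) can be verified numerically. The Python sketch below (parameter values are arbitrary illustrative choices) averages the product of the two carrier factors over $\Theta$ uniform on $(0, 2\pi)$ and compares the result with $\frac{1}{2}\cos(2\pi f_c \tau)$; the average is also independent of $t$, consistent with wide-sense stationarity.

```python
import math

def avg_over_theta(fc, t, tau, M=4096):
    """Midpoint-rule average over theta in (0, 2*pi) of
    cos(2 pi fc (t+tau) + theta) * cos(2 pi fc t + theta)."""
    total = 0.0
    for m in range(M):
        theta = 2.0 * math.pi * (m + 0.5) / M
        total += (math.cos(2 * math.pi * fc * (t + tau) + theta)
                  * math.cos(2 * math.pi * fc * t + theta))
    return total / M

fc, tau = 5.0, 0.12
closed = 0.5 * math.cos(2 * math.pi * fc * tau)
print(abs(avg_over_theta(fc, 0.3, tau) - closed) < 1e-9)   # -> True
print(abs(avg_over_theta(fc, 1.7, tau) - closed) < 1e-9)   # -> True, no t dependence
```

The midpoint rule is exact here (up to rounding) because the integrand is a trigonometric polynomial averaged over a full period.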
The transmission signal is demodulated by multiplying it by the carrier signal and lowpass filtering, as shown in Fig. 10.12. Let

$$Y(t) = X(t)\,2\cos(2\pi f_c t + \Theta). \tag{10.65}$$

Proceeding as above, and noting that $Y(t) = A(t) + A(t)\cos(4\pi f_c t + 2\Theta)$, we find that

$$S_Y(f) = S_A(f) + \frac{1}{4}\{S_A(f + 2f_c) + S_A(f - 2f_c)\}.$$

The ideal lowpass filter passes $S_A(f)$ and blocks the terms $S_A(f \mp 2f_c)$, which are centered about $\pm 2f_c$, so the output of the lowpass filter has power spectral density

$$S_Y(f) = S_A(f).$$

In fact, from Example 10.11 we know the output is the original information signal, A(t).
FIGURE 10.12
AM demodulator.
FIGURE 10.13
(a) A general bandpass signal; (b) a real-valued even function of f; (c) an imaginary odd function of f.
The modulation method in Eq. (10.62) can only produce bandpass signals for which $S_X(f)$ is locally symmetric about $f_c$, $S_X(f_c + \delta f) = S_X(f_c - \delta f)$ for $|\delta f| < W$, as in Fig. 10.11(b). The method cannot yield real-valued transmission signals whose power spectral density lacks this symmetry, such as shown in Fig. 10.13(a). The following quadrature amplitude modulation (QAM) method can be used to produce such signals:

$$X(t) = A(t)\cos(2\pi f_c t + \Theta) + B(t)\sin(2\pi f_c t + \Theta), \tag{10.66}$$

where A(t) and B(t) are real-valued, jointly wide-sense stationary random processes, and we require that

$$R_A(\tau) = R_B(\tau) \tag{10.67a}$$
$$R_{B,A}(\tau) = -R_{A,B}(\tau). \tag{10.67b}$$

Note that Eq. (10.67a) implies that $S_A(f) = S_B(f)$, a real-valued, even function of f, as shown in Fig. 10.13(b). Note also that Eq. (10.67b) implies that $S_{B,A}(f)$ is a purely imaginary, odd function of f, as shown in Fig. 10.13(c) (see Problem 10.57).
Proceeding as before, we can show that X(t) is a wide-sense stationary random process with autocorrelation function

$$R_X(\tau) = R_A(\tau)\cos(2\pi f_c \tau) + R_{B,A}(\tau)\sin(2\pi f_c \tau) \tag{10.68}$$

and power spectral density

$$S_X(f) = \frac{1}{2}\{S_A(f - f_c) + S_A(f + f_c)\} + \frac{1}{2j}\{S_{B,A}(f - f_c) - S_{B,A}(f + f_c)\}. \tag{10.69}$$

The resulting power spectral density is as shown in Fig. 10.13(a). Thus QAM can be used to generate real-valued bandpass signals with arbitrary power spectral density.

Bandpass random signals, such as those in Fig. 10.13(a), arise in communication systems when wide-sense stationary white noise is filtered by bandpass filters. Let N(t) be such a process with power spectral density $S_N(f)$. It can be shown that N(t) can be represented by

$$N(t) = N_c(t)\cos(2\pi f_c t + \Theta) - N_s(t)\sin(2\pi f_c t + \Theta), \tag{10.70}$$

where $N_c(t)$ and $N_s(t)$ are jointly wide-sense stationary processes with

$$S_{N_c}(f) = S_{N_s}(f) = \{S_N(f - f_c) + S_N(f + f_c)\}_L \tag{10.71}$$

and

$$S_{N_c,N_s}(f) = j\{S_N(f - f_c) - S_N(f + f_c)\}_L, \tag{10.72}$$

where the subscript L denotes the lowpass portion of the expression in brackets. In words, every real-valued bandpass process can be treated as if it had been generated by a QAM modulator.
Example 10.18 Demodulation of Noisy Signal

The received signal in an AM system is

$$Y(t) = A(t)\cos(2\pi f_c t + \Theta) + N(t),$$

where N(t) is a bandlimited white noise process with spectral density

$$S_N(f) = \begin{cases} N_0/2 & |f \mp f_c| < W \\ 0 & \text{elsewhere.} \end{cases}$$

Find the signal-to-noise ratio of the recovered signal.

Equation (10.70) allows us to represent the received signal by

$$Y(t) = \{A(t) + N_c(t)\}\cos(2\pi f_c t + \Theta) - N_s(t)\sin(2\pi f_c t + \Theta).$$

The demodulator in Fig. 10.12 is used to recover A(t). After multiplication by $2\cos(2\pi f_c t + \Theta)$, we have

$$2Y(t)\cos(2\pi f_c t + \Theta) = \{A(t) + N_c(t)\}\,2\cos^2(2\pi f_c t + \Theta) - N_s(t)\,2\cos(2\pi f_c t + \Theta)\sin(2\pi f_c t + \Theta)$$
$$= \{A(t) + N_c(t)\}(1 + \cos(4\pi f_c t + 2\Theta)) - N_s(t)\sin(4\pi f_c t + 2\Theta).$$
After lowpass filtering, the recovered signal is

$$A(t) + N_c(t).$$

The powers in the signal and noise components, respectively, are

$$\sigma_A^2 = \int_{-W}^{W} S_A(f)\,df$$
$$\sigma_{N_c}^2 = \int_{-W}^{W} S_{N_c}(f)\,df = \int_{-W}^{W}\left(\frac{N_0}{2} + \frac{N_0}{2}\right)df = 2WN_0.$$

The output signal-to-noise ratio is then

$$\mathrm{SNR} = \frac{\sigma_A^2}{2WN_0}.$$

10.4
OPTIMUM LINEAR SYSTEMS
Many problems can be posed in the following way. We observe a discrete-time, zero-mean process $X_\alpha$ over a certain time interval $I = \{t - a, \ldots, t + b\}$, and we are required to use the $a + b + 1$ resulting observations $\{X_{t-a}, \ldots, X_t, \ldots, X_{t+b}\}$ to obtain an estimate $Y_t$ for some other (presumably related) zero-mean process $Z_t$. The estimate $Y_t$ is required to be linear, as shown in Fig. 10.14:

$$Y_t = \sum_{\beta=t-a}^{t+b} h_{t-\beta} X_\beta = \sum_{\beta=-b}^{a} h_\beta X_{t-\beta}. \tag{10.73}$$

The figure of merit for the estimator is the mean square error

$$E[e_t^2] = E[(Z_t - Y_t)^2], \tag{10.74}$$
FIGURE 10.14
A linear system for producing an estimate $Y_t$.
and we seek to find the optimum filter, which is characterized by the impulse response $h_\beta$ that minimizes the mean square error.

Examples 10.19 and 10.20 show that different choices of $Z_t$ and $X_\alpha$ and of observation interval correspond to different estimation problems.
Example 10.19 Filtering and Smoothing Problems

Let the observations be the sum of a "desired signal" $Z_\alpha$ plus unwanted "noise" $N_\alpha$:

$$X_\alpha = Z_\alpha + N_\alpha \qquad \alpha \in I.$$

We are interested in estimating the desired signal at time t. The relation between t and the observation interval I gives rise to a variety of estimation problems.

If $I = (-\infty, t)$, that is, $a = \infty$ and $b = 0$, then we have a filtering problem where we estimate $Z_t$ in terms of noisy observations of the past and present. If $I = (t - a, t)$, then we have a filtering problem in which we estimate $Z_t$ in terms of the $a + 1$ most recent noisy observations. If $I = (-\infty, \infty)$, that is, $a = b = \infty$, then we have a smoothing problem where we are attempting to recover the signal from its entire noisy version. There are applications where this makes sense, for example, if the entire realization $X_\alpha$ has been recorded and the estimate $Z_t$ is obtained by "playing back" $X_\alpha$.
Example 10.20 Prediction

Suppose we want to predict $Z_t$ in terms of its recent past: $\{Z_{t-a}, \ldots, Z_{t-1}\}$. The general estimation problem becomes this prediction problem if we let the observation $X_\alpha$ be the past a values of the signal $Z_\alpha$, that is,

$$X_\alpha = Z_\alpha \qquad t - a \le \alpha \le t - 1.$$

The estimate $Y_t$ is then a linear prediction of $Z_t$ in terms of its most recent values.
10.4.1 The Orthogonality Condition

It is easy to show that the optimum filter must satisfy the orthogonality condition (see Eq. 6.56), which states that the error $e_t$ must be orthogonal to all the observations $X_\alpha$, that is,

$$0 = E[e_t X_\alpha] = E[(Z_t - Y_t)X_\alpha] \qquad \text{for all } \alpha \in I, \tag{10.75}$$

or equivalently,

$$E[Z_t X_\alpha] = E[Y_t X_\alpha] \qquad \text{for all } \alpha \in I. \tag{10.76}$$

If we substitute Eq. (10.73) into Eq. (10.76) we find

$$E[Z_t X_\alpha] = E\!\left[\sum_{\beta=-b}^{a} h_\beta X_{t-\beta} X_\alpha\right] = \sum_{\beta=-b}^{a} h_\beta E[X_{t-\beta} X_\alpha] = \sum_{\beta=-b}^{a} h_\beta R_X(t - \alpha - \beta) \qquad \text{for all } \alpha \in I. \tag{10.77}$$
Equation (10.77) shows that $E[Z_t X_\alpha]$ depends only on $t - \alpha$, and thus $X_\alpha$ and $Z_t$ are jointly wide-sense stationary processes. Therefore, we can rewrite Eq. (10.77) as follows:

$$R_{Z,X}(t - \alpha) = \sum_{\beta=-b}^{a} h_\beta R_X(t - \beta - \alpha) \qquad t - a \le \alpha \le t + b.$$

Finally, letting $m = t - \alpha$, we obtain the following key equation:

$$R_{Z,X}(m) = \sum_{\beta=-b}^{a} h_\beta R_X(m - \beta) \qquad -b \le m \le a. \tag{10.78}$$

The optimum linear filter must satisfy the set of $a + b + 1$ linear equations given by Eq. (10.78). Note that Eq. (10.78) is identical to Eq. (6.60) for estimating a random variable by a linear combination of several random variables. The wide-sense stationarity of the processes reduces this estimation problem to the one considered in Section 6.5.
In the above derivation we deliberately used the notation $Z_t$ instead of $Z_n$ to suggest that the same development holds for continuous-time estimation. In particular, suppose we seek a linear estimate Y(t) for the continuous-time random process Z(t) in terms of observations of the continuous-time random process $X(\alpha)$ in the time interval $t - a \le \alpha \le t + b$:

$$Y(t) = \int_{t-a}^{t+b} h(t - \beta)X(\beta)\,d\beta = \int_{-b}^{a} h(\beta)X(t - \beta)\,d\beta.$$

It can then be shown that the filter $h(\beta)$ that minimizes the mean square error is specified by

$$R_{Z,X}(\tau) = \int_{-b}^{a} h(\beta)R_X(\tau - \beta)\,d\beta \qquad -b \le \tau \le a. \tag{10.79}$$

Thus in the continuous-time case we obtain an integral equation instead of a set of linear equations. The analytic solution of this integral equation can be quite difficult, but the equation can be solved numerically by approximating the integral by a summation.7
We now determine the mean square error of the optimum filter. First we note that for the optimum filter, the error $e_t$ and the estimate $Y_t$ are orthogonal since

$$E[e_t Y_t] = E\!\left[e_t \sum_\beta h_{t-\beta} X_\beta\right] = \sum_\beta h_{t-\beta} E[e_t X_\beta] = 0,$$

where the terms inside the last summation are 0 because of Eq. (10.75). Since $e_t = Z_t - Y_t$, the mean square error is then

$$E[e_t^2] = E[e_t(Z_t - Y_t)] = E[e_t Z_t],$$

7 Equation (10.79) can also be solved by using the Karhunen-Loeve expansion.
since $e_t$ and $Y_t$ are orthogonal. Substituting for $e_t$ yields

$$E[e_t^2] = E[(Z_t - Y_t)Z_t] = E[Z_t Z_t] - E[Y_t Z_t] = R_Z(0) - E\!\left[Z_t \sum_{\beta=-b}^{a} h_\beta X_{t-\beta}\right] = R_Z(0) - \sum_{\beta=-b}^{a} h_\beta R_{Z,X}(\beta). \tag{10.80}$$

Similarly, it can be shown that the mean square error of the optimum filter in the continuous-time case is

$$E[e^2(t)] = R_Z(0) - \int_{-b}^{a} h(\beta)R_{Z,X}(\beta)\,d\beta. \tag{10.81}$$
The following theorems summarize the above results.

Theorem
Let $X_t$ and $Z_t$ be discrete-time, zero-mean, jointly wide-sense stationary processes, and let $Y_t$ be an estimate for $Z_t$ of the form

$$Y_t = \sum_{\beta=t-a}^{t+b} h_{t-\beta} X_\beta = \sum_{\beta=-b}^{a} h_\beta X_{t-\beta}.$$

The filter that minimizes $E[(Z_t - Y_t)^2]$ satisfies the equation

$$R_{Z,X}(m) = \sum_{\beta=-b}^{a} h_\beta R_X(m - \beta) \qquad -b \le m \le a$$

and has mean square error given by

$$E[(Z_t - Y_t)^2] = R_Z(0) - \sum_{\beta=-b}^{a} h_\beta R_{Z,X}(\beta).$$

Theorem
Let X(t) and Z(t) be continuous-time, zero-mean, jointly wide-sense stationary processes, and let Y(t) be an estimate for Z(t) of the form

$$Y(t) = \int_{t-a}^{t+b} h(t - \beta)X(\beta)\,d\beta = \int_{-b}^{a} h(\beta)X(t - \beta)\,d\beta.$$

The filter $h(\beta)$ that minimizes $E[(Z(t) - Y(t))^2]$ satisfies the equation

$$R_{Z,X}(\tau) = \int_{-b}^{a} h(\beta)R_X(\tau - \beta)\,d\beta \qquad -b \le \tau \le a$$

and has mean square error given by

$$E[(Z(t) - Y(t))^2] = R_Z(0) - \int_{-b}^{a} h(\beta)R_{Z,X}(\beta)\,d\beta.$$
Example 10.21 Filtering of Signal Plus Noise

Suppose we are interested in estimating the signal $Z_n$ from the $p + 1$ most recent noisy observations:

$$X_\alpha = Z_\alpha + N_\alpha \qquad \alpha \in I = \{n - p, \ldots, n - 1, n\}.$$

Find the set of linear equations for the optimum filter if $Z_\alpha$ and $N_\alpha$ are independent random processes.

For this choice of observation interval, Eq. (10.78) becomes

$$R_{Z,X}(m) = \sum_{\beta=0}^{p} h_\beta R_X(m - \beta) \qquad m \in \{0, 1, \ldots, p\}. \tag{10.82}$$

The cross-correlation terms in Eq. (10.82) are given by

$$R_{Z,X}(m) = E[Z_n X_{n-m}] = E[Z_n\{Z_{n-m} + N_{n-m}\}] = R_Z(m).$$

The autocorrelation terms are given by

$$R_X(m - \beta) = E[X_{n-\beta}X_{n-m}] = E[\{Z_{n-\beta} + N_{n-\beta}\}\{Z_{n-m} + N_{n-m}\}]$$
$$= R_Z(m - \beta) + R_{Z,N}(m - \beta) + R_{N,Z}(m - \beta) + R_N(m - \beta) = R_Z(m - \beta) + R_N(m - \beta),$$

since $Z_\alpha$ and $N_\alpha$ are independent random processes. Thus Eq. (10.82) for the optimum filter becomes

$$R_Z(m) = \sum_{\beta=0}^{p} h_\beta\{R_Z(m - \beta) + R_N(m - \beta)\} \qquad m \in \{0, 1, \ldots, p\}. \tag{10.83}$$

This set of $p + 1$ linear equations in the $p + 1$ unknowns $h_\beta$ is solved by matrix inversion.
Example 10.22 Filtering of AR Signal Plus Noise

Find the set of equations for the optimum filter in Example 10.21 if $Z_\alpha$ is a first-order autoregressive process with average power $\sigma_Z^2$ and parameter $r$, $|r| < 1$, and $N_\alpha$ is a white noise process with average power $\sigma_N^2$.

The autocorrelation for a first-order autoregressive process is given by

$$R_Z(m) = \sigma_Z^2 r^{|m|} \qquad m = 0, \pm 1, \pm 2, \ldots.$$

(See Problem 10.42.) The autocorrelation for the white noise process is

$$R_N(m) = \sigma_N^2\,\delta(m).$$

Substituting $R_Z(m)$ and $R_N(m)$ into Eq. (10.83) yields the following set of linear equations:

$$\sigma_Z^2 r^{|m|} = \sum_{\beta=0}^{p} h_\beta\left(\sigma_Z^2 r^{|m-\beta|} + \sigma_N^2\,\delta(m - \beta)\right) \qquad m \in \{0, \ldots, p\}. \tag{10.84}$$
If we divide both sides of Eq. (10.84) by $\sigma_Z^2$ and let $\Gamma = \sigma_N^2/\sigma_Z^2$, we obtain the following matrix equation:

$$\begin{bmatrix}
1+\Gamma & r & r^2 & \cdots & r^p \\
r & 1+\Gamma & r & \cdots & r^{p-1} \\
r^2 & r & 1+\Gamma & \cdots & r^{p-2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
r^p & r^{p-1} & r^{p-2} & \cdots & 1+\Gamma
\end{bmatrix}
\begin{bmatrix} h_0 \\ h_1 \\ h_2 \\ \vdots \\ h_p \end{bmatrix}
=
\begin{bmatrix} 1 \\ r \\ r^2 \\ \vdots \\ r^p \end{bmatrix}. \tag{10.85}$$

Note that when the noise power is zero, i.e., $\Gamma = 0$, the solution is $h_0 = 1$ and $h_j = 0$ for $j = 1, \ldots, p$; that is, no filtering is required to obtain $Z_n$.

Equation (10.85) can be readily solved using Octave. The following function will compute the optimum linear coefficients and the mean square error of the optimum predictor:
function [mse] = Lin_Est_AR(order, rho, varsig, varnoise)
  % order = p+1 taps; rho = r; varsig = sigma_Z^2; varnoise = sigma_N^2
  n = 0:order-1;
  r = varsig*rho.^n;                         % R_Z(m) = sigma_Z^2 r^m, m = 0,...,p
  R = varnoise*eye(order) + toeplitz(r);     % matrix of Eq. (10.85), scaled by sigma_Z^2
  H = inv(R)*transpose(r)                    % optimum coefficients (displayed)
  mse = varsig - transpose(H)*transpose(r);  % mean square error, Eq. (10.80)
endfunction
Table 10.1 gives the values of the optimal predictor coefficients and the mean square error as the order of the estimator is increased, for the first-order autoregressive process with $\sigma_Z^2 = 4$, $r = 0.9$, and noise variance $\sigma_N^2 = 4$. It can be seen that the predictor places heavier weight on more recent samples, which is consistent with the higher correlation of such samples with the current sample. For smaller values of r, the correlation for distant samples drops off more quickly and the coefficients place even lower weighting on them. The mean square error can also be seen to decrease with increasing order $p + 1$ of the estimator. Increasing the first few orders provides significant improvements, but a point of diminishing returns is reached around $p + 1 = 3$.
10.4.2 Prediction

The linear prediction problem arises in many signal processing applications. In Example 6.31 in Chapter 6, we already discussed the linear prediction of speech signals. In general, we wish to predict $Z_n$ in terms of $Z_{n-1}, Z_{n-2}, \ldots, Z_{n-p}$:

$$Y_n = \sum_{\beta=1}^{p} h_\beta Z_{n-\beta}.$$
TABLE 10.1 Effect of predictor order on MSE performance.

p+1   MSE      Coefficients
1     2.0000   0.5
2     1.4922   0.37304   0.28213
3     1.3193   0.32983   0.22500   0.17017
4     1.2549   0.31374   0.20372   0.13897   0.10510
5     1.2302   0.30754   0.19552   0.12696   0.08661   0.065501
For this problem, $X_\alpha = Z_\alpha$, so Eq. (10.78) becomes

$$R_Z(m) = \sum_{\beta=1}^{p} h_\beta R_Z(m - \beta) \qquad m \in \{1, \ldots, p\}. \tag{10.86a}$$

In matrix form this equation becomes

$$\begin{bmatrix} R_Z(1) \\ R_Z(2) \\ \vdots \\ R_Z(p) \end{bmatrix} =
\begin{bmatrix}
R_Z(0) & R_Z(1) & \cdots & R_Z(p-1) \\
R_Z(1) & R_Z(0) & \cdots & R_Z(p-2) \\
\vdots & \vdots & \ddots & \vdots \\
R_Z(p-1) & R_Z(p-2) & \cdots & R_Z(0)
\end{bmatrix}
\begin{bmatrix} h_1 \\ h_2 \\ \vdots \\ h_p \end{bmatrix}
= \mathbf{R}_Z \mathbf{h}. \tag{10.86b}$$

Equations (10.86a) and (10.86b) are called the Yule-Walker equations.

Equation (10.80) for the mean square error becomes

$$E[e_n^2] = R_Z(0) - \sum_{\beta=1}^{p} h_\beta R_Z(\beta). \tag{10.87}$$

By inverting the $p \times p$ matrix $\mathbf{R}_Z$, we can solve for the vector of filter coefficients $\mathbf{h}$.
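For a concrete instance of Eqs. (10.86)-(10.87) (a plain-Python sketch; the helper name is ours), take a first-order AR process with $R_Z(m) = \sigma_Z^2\rho^{|m|}$ and $p = 2$. Solving the $2\times 2$ Yule-Walker system by Cramer's rule gives $h = (\rho, 0)$: the optimum predictor uses only the most recent sample, with mean square error $\sigma_Z^2(1 - \rho^2)$.

```python
def yule_walker_p2(R0, R1, R2):
    """Solve the p = 2 Yule-Walker system of Eq. (10.86b) by Cramer's rule:
        [R0 R1] [h1]   [R1]
        [R1 R0] [h2] = [R2]"""
    det = R0 * R0 - R1 * R1
    h1 = (R1 * R0 - R2 * R1) / det
    h2 = (R0 * R2 - R1 * R1) / det
    return h1, h2

rho, var_z = 0.9, 1.0                     # AR parameter and sigma_Z^2
h1, h2 = yule_walker_p2(var_z, var_z * rho, var_z * rho ** 2)
mse = var_z - (h1 * var_z * rho + h2 * var_z * rho ** 2)   # Eq. (10.87)
print(round(h1, 6), round(h2, 6), round(mse, 6))           # -> 0.9 0.0 0.19
```

The second tap vanishes because, for an AR(1) process, $Z_{n-2}$ carries no information about $Z_n$ beyond what $Z_{n-1}$ already provides.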
Example 10.23 Prediction for Long-Range and Short-Range Dependent Processes

Let $X_1(t)$ be a discrete-time first-order autoregressive process with $\sigma_X^2 = 1$ and $r = 0.7411$, and let $X_2(t)$ be a discrete-time long-range dependent process with autocovariance given by Eq. (9.109), $\sigma_X^2 = 1$, and $H = 0.9$. Both processes have $C_X(1) = 0.7411$, but the autocovariance of $X_1(t)$ decreases exponentially while that of $X_2(t)$ has long-range dependence. Compare the performance of the optimal linear predictor for these processes for short-term as well as long-term predictions.

The optimum linear coefficients and the associated mean square error for the long-range dependent process can be calculated using the following code. The function can be modified for the autoregressive case.

function mse = Lin_Pred_LR(order, Hurst, varsig)
  n = 0:order-1;
  H2 = 2*Hurst;
  % autocovariance of the long-range dependent process (Eq. 9.109)
  r  = varsig*((1+n).^H2 - 2*(n.^H2) + abs(n-1).^H2)/2;
  % cross-correlation vector R_Z(m), m = 1,...,order
  rz = varsig*((2+n).^H2 - 2*((n+1).^H2) + n.^H2)/2;
  R = toeplitz(r);
  H = transpose(inv(R)*transpose(rz))   % optimum coefficients (displayed)
  mse = varsig - H*transpose(rz);       % mean square error, Eq. (10.87)
endfunction
Table 10.2 below compares the mean square errors and the coefficients of the two processes in the case of short-term prediction. The predictor for $X_1(t)$ attains all of the benefit of prediction with a $p = 1$ system. The optimum predictors for higher-order systems set the other coefficients to zero, and the mean square error remains at 0.45077. The predictor for $X_2(t)$
TABLE 10.2(a) Short-term prediction: autoregressive, $r = 0.7411$, $\sigma_X^2 = 1$, $C_X(1) = 0.7411$.

p   MSE       Coefficients
1   0.45077   0.74110
2   0.45077   0.74110   0

TABLE 10.2(b) Short-term prediction: long-range dependent process, Hurst = 0.9, $\sigma_X^2 = 1$, $C_X(1) = 0.7411$.

p   MSE       Coefficients
1   0.45077   0.74110
2   0.43625   0.60809    0.17948
3   0.42712   0.582127   0.091520   0.144649
4   0.42253   0.567138   0.082037   0.084329   0.103620
5   0.41964   0.558567   0.075061   0.077543   0.056707   0.082719
achieves most of the possible performance with a $p = 1$ system, but small reductions in mean square error do accrue by adding more coefficients. This is due to the persistent correlation among the values in $X_2(t)$.

Table 10.3 shows the dramatic impact of long-range dependence on prediction performance. We modified Eq. (10.86) to provide the optimum linear predictor for $X_t$ based on two observations $X_{t-10}$ and $X_{t-20}$ that are in the relatively remote past. $X_1(t)$ and its previous values are almost uncorrelated, so the best predictor has a mean square error of almost 1, which is the variance of $X_1(t)$. On the other hand, $X_2(t)$ retains significant correlation with its previous values and so the mean square error provides a significant reduction from the unit variance. Note that the second-order predictor places significant weight on the observation 20 samples in the past.
TABLE 10.3(a) Long-term prediction: autoregressive, $r = 0.7411$, $\sigma_X^2 = 1$, $C_X(1) = 0.7411$.

Lags   MSE       Coefficients
10     0.99750   0.04977
10;20  0.99750   0.04977   0

TABLE 10.3(b) Long-term prediction: long-range dependent process, Hurst = 0.9, $\sigma_X^2 = 1$, $C_X(1) = 0.7411$.

Lags   MSE       Coefficients
10     0.79354   0.45438
10;20  0.74850   0.34614   0.23822
10.4.3 Estimation Using the Entire Realization of the Observed Process
Suppose that $Z_t$ is to be estimated by a linear function $Y_t$ of the entire realization of $X_t$; that is, $a = b = \infty$ and Eq. (10.73) becomes

$Y_t = \sum_{\beta=-\infty}^{\infty} h_\beta X_{t-\beta}.$

In the case of continuous-time random processes, we have

$Y(t) = \int_{-\infty}^{\infty} h(\beta) X(t-\beta)\,d\beta.$
The optimum filters must satisfy Eqs. (10.78) and (10.79), which in this case become

$R_{Z,X}(m) = \sum_{\beta=-\infty}^{\infty} h_\beta R_X(m-\beta)$  for all m    (10.88a)

$R_{Z,X}(\tau) = \int_{-\infty}^{\infty} h(\beta) R_X(\tau-\beta)\,d\beta$  for all $\tau$.    (10.88b)
The Fourier transform of the first equation and the Fourier transform of the second equation both yield the same expression:

$S_{Z,X}(f) = H(f)S_X(f),$

which is readily solved for the transfer function of the optimum filter:

$H(f) = \dfrac{S_{Z,X}(f)}{S_X(f)}.$    (10.89)

The impulse response of the optimum filter is then obtained by taking the appropriate inverse transform. In general the filter obtained from Eq. (10.89) will be noncausal; that is, its impulse response is nonzero for $t < 0$. We already indicated that there are applications where this makes sense, namely, in situations where the entire realization $X_\alpha$ is recorded and the estimate $Z_t$ is obtained in "nonreal time" by "playing back" $X_\alpha$.
Example 10.24 Infinite Smoothing

Find the transfer function of the optimum filter for estimating Z(t) from $X(\alpha) = Z(\alpha) + N(\alpha)$, $\alpha \in (-\infty, \infty)$, where $Z(\alpha)$ and $N(\alpha)$ are independent, zero-mean random processes.

The cross-correlation between the observation and the desired signal is

$R_{Z,X}(\tau) = E[Z(t+\tau)X(t)] = E[Z(t+\tau)(Z(t) + N(t))]$
$= E[Z(t+\tau)Z(t)] + E[Z(t+\tau)N(t)] = R_Z(\tau),$

since Z(t) and N(t) are zero-mean, independent random processes. The cross-power spectral density is then

$S_{Z,X}(f) = S_Z(f).$    (10.90)
The autocorrelation of the observation process is

$R_X(\tau) = E[(Z(t+\tau) + N(t+\tau))(Z(t) + N(t))] = R_Z(\tau) + R_N(\tau).$

The corresponding power spectral density is

$S_X(f) = S_Z(f) + S_N(f).$    (10.91)

Substituting Eqs. (10.90) and (10.91) into Eq. (10.89) gives

$H(f) = \dfrac{S_Z(f)}{S_Z(f) + S_N(f)}.$    (10.92)

Note that the optimum filter H(f) is nonzero only at the frequencies where $S_Z(f)$ is nonzero, that is, where the signal has power content. By dividing the numerator and denominator of Eq. (10.92) by $S_Z(f)$, we see that H(f) emphasizes the frequencies where the ratio of signal to noise power density is large.
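Equation (10.92) is easy to explore numerically. The sketch below is our own illustration (not from the text), using the signal density that appears later in Example 10.25 and unit white-noise density; it evaluates H(f) on a grid and checks the signal-to-noise interpretation:

```python
import numpy as np

f = np.linspace(-2, 2, 401)                 # frequency grid including f = 0
SZ = 2.0 / (1.0 + 4 * np.pi**2 * f**2)      # signal power spectral density
SN = np.ones_like(f)                        # white noise density S_N(f) = 1
H = SZ / (SZ + SN)                          # infinite-smoothing filter, Eq. (10.92)

# Dividing numerator and denominator by S_N gives H = SNR/(1 + SNR):
snr = SZ / SN
assert np.allclose(H, snr / (1.0 + snr))
# At f = 0 the SNR is 2, so H(0) = 2/3; H falls toward 0 at high frequencies.
```

The filter passes frequencies where the signal dominates and suppresses those where the noise dominates, exactly as the discussion above indicates.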
*10.4.4 Estimation Using Causal Filters

Now, suppose that $Z_t$ is to be estimated using only the past and present of $X_\alpha$, that is, $I = (-\infty, t]$. Equations (10.78) and (10.79) become

$R_{Z,X}(m) = \sum_{\beta=0}^{\infty} h_\beta R_X(m-\beta)$  for all $m \ge 0$    (10.93a)

$R_{Z,X}(\tau) = \int_0^{\infty} h(\beta) R_X(\tau-\beta)\,d\beta$  for all $\tau \ge 0$.    (10.93b)
Equations (10.93a) and (10.93b) are called the Wiener-Hopf equations and, though similar in appearance to Eqs. (10.88a) and (10.88b), are considerably more difficult to solve.
First, let us consider the special case where the observation process is white, that is, for the discrete-time case $R_X(m) = \delta_m$. Equation (10.93a) is then

$R_{Z,X}(m) = \sum_{\beta=0}^{\infty} h_\beta \delta_{m-\beta} = h_m,\quad m \ge 0.$    (10.94)

Thus in this special case, the optimum causal filter has coefficients given by

$h_m = \begin{cases} 0 & m < 0 \\ R_{Z,X}(m) & m \ge 0. \end{cases}$

The corresponding transfer function is

$H(f) = \sum_{m=0}^{\infty} R_{Z,X}(m)e^{-j2\pi fm}.$    (10.95)

Note that Eq. (10.95) is not $S_{Z,X}(f)$, since the limits of the Fourier transform in Eq. (10.95) do not extend from $-\infty$ to $+\infty$. However, H(f) can be obtained from $S_{Z,X}(f)$ by finding $h_m = \mathcal{F}^{-1}[S_{Z,X}(f)]$, keeping the causal part (i.e., $h_m$ for $m \ge 0$) and setting the noncausal part to 0.
We now show how the solution of the above special case can be used to solve the general case. It can be shown that under very general conditions, the power spectral density of a random process can be factored into the form

$S_X(f) = |G(f)|^2 = G(f)G^*(f),$    (10.96)

where G(f) and 1/G(f) are causal filters.⁸ This suggests that we can find the optimum filter in two steps, as shown in Fig. 10.15. First, we pass the observation process through a "whitening" filter with transfer function $W(f) = 1/G(f)$ to produce a white noise process $X_n'$, since

$S_{X'}(f) = |W(f)|^2 S_X(f) = \dfrac{|G(f)|^2}{|G(f)|^2} = 1$  for all f.

Second, we find the best estimator for $Z_n$ using the whitened observation process $X_n'$ as given by Eq. (10.95). The filter that results from the tandem combination of the whitening filter and the estimation filter is the solution to the Wiener-Hopf equations.
The transfer function of the second filter in Fig. 10.15 is

$H_2(f) = \sum_{m=0}^{\infty} R_{Z,X'}(m)e^{-j2\pi fm}$    (10.97)

by Eq. (10.95). To evaluate Eq. (10.97) we need to find

$R_{Z,X'}(k) = E[Z_{n+k}X_n'] = \sum_{i=0}^{\infty} w_i E[Z_{n+k}X_{n-i}] = \sum_{i=0}^{\infty} w_i R_{Z,X}(k+i),$    (10.98)

where $w_i$ is the impulse response of the whitening filter. The Fourier transform of Eq. (10.98) gives an expression that is easier to work with:

$S_{Z,X'}(f) = W^*(f)S_{Z,X}(f) = \dfrac{S_{Z,X}(f)}{G^*(f)}.$    (10.99)

FIGURE 10.15 Whitening filter approach for solving Wiener-Hopf equations: $X_n \to W(f) \to X_n' \to H_2(f) \to Y_n$.

⁸ The method for factoring $S_X(f)$ as specified by Eq. (10.96) is called spectral factorization. See Example 10.10 and the references at the end of the chapter.
The inverse Fourier transform of Eq. (10.99) yields the desired $R_{Z,X'}(k)$, which can then be substituted into Eq. (10.97) to obtain $H_2(f)$.

In summary, the optimum filter is found using the following procedure:

1. Factor $S_X(f)$ as in Eq. (10.96) and obtain a causal whitening filter $W(f) = 1/G(f)$.
2. Find $R_{Z,X'}(k)$ from Eq. (10.98) or from Eq. (10.99).
3. $H_2(f)$ is then given by Eq. (10.97).
4. The optimum filter is then

$H(f) = W(f)H_2(f).$    (10.100)

This procedure is valid for the continuous-time version of the optimum causal filter problem, after appropriate changes are made from summations to integrals. The following example considers a continuous-time problem.
Example 10.25 Wiener Filter

Find the optimum causal filter for estimating a signal Z(t) from the observation $X(t) = Z(t) + N(t)$, where Z(t) and N(t) are independent random processes, N(t) is zero-mean white noise with density 1, and Z(t) has power spectral density

$S_Z(f) = \dfrac{2}{1 + 4\pi^2 f^2}.$

The optimum filter in this problem is called the Wiener filter.

The cross-power spectral density between Z(t) and X(t) is

$S_{Z,X}(f) = S_Z(f),$

since the signal and noise are independent random processes. The power spectral density for the observation process is

$S_X(f) = S_Z(f) + S_N(f) = \dfrac{3 + 4\pi^2 f^2}{1 + 4\pi^2 f^2} = \left(\dfrac{j2\pi f + \sqrt{3}}{j2\pi f + 1}\right)\left(\dfrac{-j2\pi f + \sqrt{3}}{-j2\pi f + 1}\right).$

If we let

$G(f) = \dfrac{j2\pi f + \sqrt{3}}{j2\pi f + 1},$

then it is easy to verify that $W(f) = 1/G(f)$ is the whitening causal filter.

Next we evaluate Eq. (10.99):

$S_{Z,X'}(f) = \dfrac{S_{Z,X}(f)}{G^*(f)} = \dfrac{2}{1 + 4\pi^2 f^2}\cdot\dfrac{1 - j2\pi f}{\sqrt{3} - j2\pi f} = \dfrac{2}{(1 + j2\pi f)(\sqrt{3} - j2\pi f)} = \dfrac{c}{1 + j2\pi f} + \dfrac{c}{\sqrt{3} - j2\pi f},$    (10.101)
where $c = 2/(1 + \sqrt{3})$. If we take the inverse Fourier transform of $S_{Z,X'}(f)$, we obtain

$R_{Z,X'}(\tau) = \begin{cases} c e^{-\tau} & \tau > 0 \\ c e^{\sqrt{3}\tau} & \tau < 0. \end{cases}$

Equation (10.97) states that $H_2(f)$ is given by the Fourier transform of the $\tau > 0$ portion of $R_{Z,X'}(\tau)$:

$H_2(f) = \mathcal{F}\{c e^{-\tau}u(\tau)\} = \dfrac{c}{1 + j2\pi f}.$

Note that we could have gotten this result directly from Eq. (10.101) by noting that only the first term gives rise to the positive-time (i.e., causal) component.

The optimum filter is then

$H(f) = \dfrac{1}{G(f)}H_2(f) = \dfrac{c}{\sqrt{3} + j2\pi f}.$

The impulse response of this filter is

$h(\tau) = c e^{-\sqrt{3}\tau},\quad \tau > 0.$

10.5 THE KALMAN FILTER
The optimum linear systems considered in the previous section have two limitations:
(1) They assume wide-sense stationary signals; and (2) The number of equations grows
with the size of the observation set. In this section, we consider an estimation approach
that assumes signals have a certain structure. This assumption keeps the dimensionality of the problem fixed even as the observation set grows. It also allows us to consider
certain nonstationary signals.
We will consider the class of signals that can be represented as shown in Fig. 10.16(a):

$Z_n = a_{n-1}Z_{n-1} + W_{n-1},\quad n = 1, 2, \ldots,$    (10.102)

where $Z_0$ is the random variable at time 0, $a_n$ is a known sequence of constants, and $W_n$ is a sequence of zero-mean uncorrelated random variables with possibly time-varying variances $\{E[W_n^2]\}$. The resulting process $Z_n$ is nonstationary in general. We assume that the process $Z_n$ is not available to us, and that instead, as shown in Fig. 10.16(a), we observe

$X_n = Z_n + N_n,\quad n = 0, 1, 2, \ldots,$    (10.103)

where the observation noise $N_n$ is a zero-mean, uncorrelated sequence of random variables with possibly time-varying variances $\{E[N_n^2]\}$. We assume that $W_n$ and $N_n$ are uncorrelated at all times $n_1$ and $n_2$. In the special case where $W_n$ and $N_n$ are Gaussian random processes, then $Z_n$ and $X_n$ will also be Gaussian random processes. We will develop the Kalman filter, which has the structure in Fig. 10.16(b).

Our objective is to find for each time n the minimum mean square estimate (actually prediction) of $Z_n$ based on the observations $X_0, X_1, \ldots, X_{n-1}$ using a linear estimator that possibly varies with time:

$Y_n = \sum_{j=1}^{n} h_j^{(n-1)} X_{n-j}.$    (10.104)
FIGURE 10.16 (a) Signal structure: $a_{n-1}Z_{n-1}$ (through a unit delay) and $W_{n-1}$ sum to $Z_n$, and $N_n$ is added to give the observation $X_n$. (b) Kalman filter: the innovation $X_n - Y_n$ is scaled by the gain $k_n$, added to $a_nY_n$, and passed through a unit delay to produce $Y_{n+1}$.
The orthogonality principle implies that the optimum filter $\{h_j^{(n-1)}\}$ satisfies

$E\left[\left(Z_n - \sum_{j=1}^{n} h_j^{(n-1)}X_{n-j}\right)X_l\right] = 0$  for $l = 0, 1, \ldots, n-1,$

which leads to a set of n equations in n unknowns:

$R_{Z,X}(n, l) = \sum_{j=1}^{n} h_j^{(n-1)}R_X(n-j, l)$  for $l = 0, 1, \ldots, n-1.$    (10.105)

At the next time instant, we need to find

$Y_{n+1} = \sum_{j=1}^{n+1} h_j^{(n)}X_{n+1-j}$    (10.106)

by solving a system of $(n+1) \times (n+1)$ equations:

$R_{Z,X}(n+1, l) = \sum_{j=1}^{n+1} h_j^{(n)}R_X(n+1-j, l)$  for $l = 0, 1, \ldots, n.$    (10.107)
Up to this point we have followed the procedure of the previous section and we
find that the dimensionality of the problem grows with the number of observations. We now use the signal structure to develop a recursive method for solving
Eq. (10.106).
We first need the following two results. For $l < n$, we have

$R_{Z,X}(n+1, l) = E[Z_{n+1}X_l] = E[(a_nZ_n + W_n)X_l] = a_nR_{Z,X}(n, l) + E[W_nX_l] = a_nR_{Z,X}(n, l),$    (10.108)

since $E[W_nX_l] = E[W_n]E[X_l] = 0$; that is, $W_n$ is uncorrelated with the past of the process and the observations prior to time n, as can be seen from Fig. 10.16(a). Also for $l < n$, we have

$R_{Z,X}(n, l) = E[Z_nX_l] = E[(X_n - N_n)X_l] = R_X(n, l) - E[N_nX_l] = R_X(n, l),$    (10.109)

since $E[N_nX_l] = E[N_n]E[X_l] = 0$; that is, the observation noise at time n is uncorrelated with prior observations.

We now show that the set of equations in Eq. (10.107) can be related to the set in Eq. (10.105). For $l < n$, we can equate the right-hand sides of Eqs. (10.108) and (10.107):

$a_nR_{Z,X}(n, l) = \sum_{j=1}^{n+1} h_j^{(n)}R_X(n+1-j, l) = h_1^{(n)}R_X(n, l) + \sum_{j=2}^{n+1} h_j^{(n)}R_X(n+1-j, l)$  for $l = 0, 1, \ldots, n-1.$    (10.110)

From Eq. (10.109) we have $R_X(n, l) = R_{Z,X}(n, l)$, so we can replace the first term on the right-hand side of Eq. (10.110) and then move the resulting term to the left-hand side:

$(a_n - h_1^{(n)})R_{Z,X}(n, l) = \sum_{j=2}^{n+1} h_j^{(n)}R_X(n+1-j, l) = \sum_{j'=1}^{n} h_{j'+1}^{(n)}R_X(n-j', l).$    (10.111)

By dividing both sides by $a_n - h_1^{(n)}$ we finally obtain

$R_{Z,X}(n, l) = \sum_{j'=1}^{n} \dfrac{h_{j'+1}^{(n)}}{a_n - h_1^{(n)}}R_X(n-j', l)$  for $l = 0, 1, \ldots, n-1.$    (10.112)
This set of equations is identical to Eq. (10.105) if we set

$h_j^{(n-1)} = \dfrac{h_{j+1}^{(n)}}{a_n - h_1^{(n)}}$  for $j = 1, \ldots, n.$    (10.113a)

Therefore, if at step n we have found $h_1^{(n-1)}, \ldots, h_n^{(n-1)}$, and if somehow we have found $h_1^{(n)}$, then we can find the remaining coefficients from

$h_{j+1}^{(n)} = (a_n - h_1^{(n)})h_j^{(n-1)},\quad j = 1, \ldots, n.$    (10.113b)

Thus the key question is how to find $h_1^{(n)}$.
Suppose we substitute the coefficients in Eq. (10.113b) into Eq. (10.106):

$Y_{n+1} = h_1^{(n)}X_n + \sum_{j'=1}^{n}(a_n - h_1^{(n)})h_{j'}^{(n-1)}X_{n-j'} = h_1^{(n)}X_n + (a_n - h_1^{(n)})Y_n = a_nY_n + h_1^{(n)}(X_n - Y_n),$    (10.114)

where the second equality follows from Eq. (10.104). The above equation has a very pleasing interpretation, as shown in Fig. 10.16(b). Since $Y_n$ is the prediction for time n, $a_nY_n$ is the prediction for the next time instant, $n+1$, based on the "old" information (see Eq. (10.102)). The term $(X_n - Y_n)$ is called the "innovations," and it gives the discrepancy between the old prediction and the observation. Finally, the term $h_1^{(n)}$ is called the gain, henceforth denoted by $k_n$, and it indicates the extent to which the innovations should be used to correct $a_nY_n$ to obtain the "new" prediction $Y_{n+1}$. If we denote the innovations by

$I_n = X_n - Y_n,$    (10.115)

then Eq. (10.114) becomes

$Y_{n+1} = a_nY_n + k_nI_n.$    (10.116)
We still need to determine a means for computing the gain $k_n$.

From Eq. (10.115), we have that the innovations satisfy

$I_n = X_n - Y_n = Z_n + N_n - Y_n = Z_n - Y_n + N_n = e_n + N_n,$

where $e_n = Z_n - Y_n$ is the prediction error. A recursive equation can be obtained for the prediction error:

$e_{n+1} = Z_{n+1} - Y_{n+1} = a_nZ_n + W_n - a_nY_n - k_nI_n = a_n(Z_n - Y_n) + W_n - k_n(e_n + N_n) = (a_n - k_n)e_n + W_n - k_nN_n,$    (10.117)

with initial condition $e_0 = Z_0$. Since $Z_0$, $W_n$, and $N_n$ are zero-mean, it then follows that $E[e_n] = 0$ for all n. A recursive equation for the mean square prediction error is obtained from Eq. (10.117):

$E[e_{n+1}^2] = (a_n - k_n)^2E[e_n^2] + E[W_n^2] + k_n^2E[N_n^2],$    (10.118)

with initial condition $E[e_0^2] = E[Z_0^2]$. We are finally ready to obtain an expression for the gain $k_n$.

The gain $k_n$ must minimize the mean square error $E[e_{n+1}^2]$. Therefore we can differentiate Eq. (10.118) with respect to $k_n$ and set it equal to zero:

$0 = -2(a_n - k_n)E[e_n^2] + 2k_nE[N_n^2].$
Then we can solve for $k_n$:

$k_n = \dfrac{a_nE[e_n^2]}{E[e_n^2] + E[N_n^2]}.$    (10.119)

The expression for the mean square prediction error in Eq. (10.118) can be simplified by using Eq. (10.119) (see Problem 10.72):

$E[e_{n+1}^2] = a_n(a_n - k_n)E[e_n^2] + E[W_n^2].$    (10.120)
Equations (10.119), (10.116), and (10.120) when combined yield the recursive procedure that constitutes the Kalman filtering algorithm:⁹

Kalman filter algorithm:

Initialization: $Y_0 = 0$, $E[e_0^2] = E[Z_0^2]$

For $n = 0, 1, 2, \ldots$:

  $k_n = \dfrac{a_nE[e_n^2]}{E[e_n^2] + E[N_n^2]}$

  $Y_{n+1} = a_nY_n + k_n(X_n - Y_n)$

  $E[e_{n+1}^2] = a_n(a_n - k_n)E[e_n^2] + E[W_n^2].$

Note that the algorithm requires knowledge of the signal structure, i.e., the $a_n$, and the variances $E[N_n^2]$ and $E[W_n^2]$. The algorithm can be implemented easily and has consequently found application in a broad range of detection, estimation, and signal processing problems. The algorithm can be extended in matrix form to accommodate a broader range of processes.
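The algorithm above translates directly into code. Here is a scalar Python sketch (the book's numerical examples use Octave; this rendering is our own, with constant variances for simplicity):

```python
import numpy as np

def kalman_predictor(x, a, var_w, var_n, var_z0):
    """One-step Kalman predictor for Z_n = a_{n-1} Z_{n-1} + W_{n-1},
    observed as X_n = Z_n + N_n.  Returns the predictions Y_0..Y_k and
    the mean square prediction errors E[e_n^2]."""
    y = np.zeros(len(x) + 1)            # initialization: Y_0 = 0
    mse = np.zeros(len(x) + 1)
    mse[0] = var_z0                     # E[e_0^2] = E[Z_0^2]
    for n in range(len(x)):
        k = a[n] * mse[n] / (mse[n] + var_n)              # gain, Eq. (10.119)
        y[n + 1] = a[n] * y[n] + k * (x[n] - y[n])        # Eq. (10.116)
        mse[n + 1] = a[n] * (a[n] - k) * mse[n] + var_w   # Eq. (10.120)
    return y, mse
```

Note that the gain and error recursions do not depend on the observed data, so the gain sequence can be precomputed before any observations arrive.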
Example 10.26 First-Order Autoregressive Process

Consider a signal defined by

$Z_n = aZ_{n-1} + W_n,\quad n = 1, 2, \ldots,\quad Z_0 = 0,$

where $E[W_n^2] = \sigma_W^2 = 0.36$ and $a = 0.8$, and suppose the observations are made in additive white noise

$X_n = Z_n + N_n,\quad n = 0, 1, 2, \ldots,$

where $E[N_n^2] = 1$. Find the form of the predictor and its mean square error as $n \to \infty$.

The gain at step n is given by

$k_n = \dfrac{aE[e_n^2]}{E[e_n^2] + 1}.$

The mean square error sequence is therefore given by

$E[e_0^2] = E[Z_0^2] = 0$

⁹ We caution the student that there are two common ways of defining the gain. The statement of the Kalman filter algorithm will differ accordingly in various textbooks.
$E[e_{n+1}^2] = a(a - k_n)E[e_n^2] + \sigma_W^2 = a\left(\dfrac{a}{1 + E[e_n^2]}\right)E[e_n^2] + \sigma_W^2$  for $n = 1, 2, \ldots.$

The steady state mean square error $e_\infty$ must satisfy

$e_\infty = \dfrac{a^2}{1 + e_\infty}e_\infty + \sigma_W^2.$

For $a = 0.8$ and $\sigma_W^2 = 0.36$, the resulting quadratic equation yields $k_\infty = 0.3$ and $e_\infty = 0.6$. Thus at steady state the predictor is

$Y_{n+1} = 0.8Y_n + 0.3(X_n - Y_n).$
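The steady state can also be reached by simply iterating the mean square error recursion; a quick numerical check of the values above (our own sketch):

```python
# Iterate the MSE recursion of Example 10.26 (a = 0.8, sigma_W^2 = 0.36,
# E[N_n^2] = 1) until it settles at the steady-state value.
a, var_w, var_n = 0.8, 0.36, 1.0
e = 0.0                                   # E[e_0^2] = E[Z_0^2] = 0
for _ in range(100):
    e = a * (a / (1.0 + e)) * e + var_w   # mean square error recursion
k = a * e / (e + var_n)                   # corresponding steady-state gain
# e approaches 0.6 and k approaches 0.3, matching the quadratic-equation solution.
```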
*10.6 ESTIMATING THE POWER SPECTRAL DENSITY

Let $X_0, \ldots, X_{k-1}$ be k observations of the discrete-time, zero-mean, wide-sense stationary process $X_n$. The periodogram estimate for $S_X(f)$ is defined as

$\tilde{p}_k(f) = \dfrac{1}{k}|\tilde{x}_k(f)|^2,$    (10.121)

where $\tilde{x}_k(f)$ is obtained as a Fourier transform of the observation sequence:

$\tilde{x}_k(f) = \sum_{m=0}^{k-1} X_m e^{-j2\pi fm}.$    (10.122)
In Section 10.1 we showed that the expected value of the periodogram estimate is

$E[\tilde{p}_k(f)] = \sum_{m'=-(k-1)}^{k-1}\left(1 - \dfrac{|m'|}{k}\right)R_X(m')e^{-j2\pi fm'},$    (10.123)

so $\tilde{p}_k(f)$ is a biased estimator for $S_X(f)$. However, as $k \to \infty$,

$E[\tilde{p}_k(f)] \to S_X(f),$    (10.124)

so the mean of the periodogram estimate approaches $S_X(f)$.

Before proceeding to find the variance of the periodogram estimate, we note that the periodogram estimate is equivalent to taking the Fourier transform of an estimate for the autocorrelation sequence; that is,

$\tilde{p}_k(f) = \sum_{m=-(k-1)}^{k-1} \hat{r}_k(m)e^{-j2\pi fm},$    (10.125)

where the estimate for the autocorrelation is

$\hat{r}_k(m) = \dfrac{1}{k}\sum_{n=0}^{k-|m|-1} X_nX_{n+m}.$    (10.126)

(See Problem 10.77.)
FIGURE 10.17 Periodogram for 64 samples of white noise sequence $X_n$ iid uniform in (0, 1), $S_X(f) = \sigma_X^2 = 1/12 = 0.083$.
We might expect that as we increase the number of samples k, the periodogram estimate converges to $S_X(f)$. This does not happen. Instead we find that $\tilde{p}_k(f)$ fluctuates
wildly about the true spectral density, and that this random variation does not decrease
with increased k (see Fig. 10.17). To see why this happens, in the next section we compute
the statistics of the periodogram estimate for a white noise Gaussian random process. We
find that the estimates given by the periodogram have a variance that does not approach
zero as the number of samples is increased. This explains the lack of improvement in the
estimate as k is increased. Furthermore, we show that the periodogram estimates are uncorrelated at uniformly spaced frequencies in the interval $-1/2 \le f < 1/2$. This explains
the erratic appearance of the periodogram estimate as a function of f. In the final section,
we obtain another estimate for SX1f2 whose variance does approach zero as k increases.
10.6.1 Variance of Periodogram Estimate

Following the approach of [Jenkins and Watts, pp. 230–233], we consider the periodogram of samples of a white noise process with $S_X(f) = \sigma_X^2$ at the frequencies $f = n/k$, $-k/2 \le n < k/2$, which will cover the frequency range $-1/2 \le f < 1/2$. (In practice these are the frequencies we would evaluate if we were using the FFT algorithm to compute $\tilde{x}_k(f)$.) First we rewrite Eq. (10.122) at $f = n/k$ as follows:

$\tilde{x}_k\!\left(\dfrac{n}{k}\right) = \sum_{m=0}^{k-1} X_m\left(\cos\left(\dfrac{2\pi mn}{k}\right) - j\sin\left(\dfrac{2\pi mn}{k}\right)\right) = A_k(n) - jB_k(n),\quad -k/2 \le n < k/2,$    (10.127)
where

$A_k(n) = \sum_{m=0}^{k-1} X_m\cos\left(\dfrac{2\pi mn}{k}\right)$    (10.128)

and

$B_k(n) = \sum_{m=0}^{k-1} X_m\sin\left(\dfrac{2\pi mn}{k}\right).$    (10.129)
Then it follows that the periodogram estimate is

$\tilde{p}_k\!\left(\dfrac{n}{k}\right) = \dfrac{1}{k}\left|\tilde{x}_k\!\left(\dfrac{n}{k}\right)\right|^2 = \dfrac{1}{k}\{A_k^2(n) + B_k^2(n)\}.$    (10.130)

We find the variance of $\tilde{p}_k(n/k)$ from the statistics of $A_k(n)$ and $B_k(n)$.

The random variables $A_k(n)$ and $B_k(n)$ are defined as linear functions of the jointly Gaussian random variables $X_0, \ldots, X_{k-1}$. Therefore $A_k(n)$ and $B_k(n)$ are also jointly Gaussian random variables. If we take the expected value of Eqs. (10.128) and (10.129) we find

$E[A_k(n)] = 0 = E[B_k(n)]$  for all n.    (10.131)

Note also that the $n = -k/2$ and $n = 0$ terms are different in that

$B_k(-k/2) = 0 = B_k(0),\qquad A_k(-k/2) = \sum_{i=0}^{k-1}(-1)^iX_i,$    (10.132a)

$A_k(0) = \sum_{i=0}^{k-1} X_i.$    (10.132b)
The correlation between $A_k(n)$ and $A_k(m)$ (for n, m not equal to $-k/2$ or 0) is

$E[A_k(n)A_k(m)] = \sum_{i=0}^{k-1}\sum_{l=0}^{k-1} E[X_iX_l]\cos\left(\dfrac{2\pi ni}{k}\right)\cos\left(\dfrac{2\pi ml}{k}\right)$
$= \sigma_X^2\sum_{i=0}^{k-1}\cos\left(\dfrac{2\pi ni}{k}\right)\cos\left(\dfrac{2\pi mi}{k}\right)$
$= \sigma_X^2\sum_{i=0}^{k-1}\dfrac{1}{2}\cos\left(\dfrac{2\pi(n-m)i}{k}\right) + \sigma_X^2\sum_{i=0}^{k-1}\dfrac{1}{2}\cos\left(\dfrac{2\pi(n+m)i}{k}\right),$

where we used the fact that $E[X_iX_l] = \sigma_X^2\delta_{il}$ since the noise is white. The second summation is equal to zero, and the first summation is zero except when $n = m$. Thus

$E[A_k(n)A_k(m)] = \dfrac{1}{2}k\sigma_X^2\delta_{nm}$  for all $n, m \ne -k/2, 0.$    (10.133a)

It can similarly be shown that

$E[B_k(n)B_k(m)] = \dfrac{1}{2}k\sigma_X^2\delta_{nm},\quad n, m \ne -k/2, 0$    (10.133b)

$E[A_k(n)B_k(m)] = 0$  for all n, m.    (10.133c)

When $n = -k/2$ or 0, we have

$E[A_k(n)A_k(m)] = k\sigma_X^2\delta_{nm}$  for all m.    (10.133d)
Equations (10.133a) through (10.133d) imply that $A_k(n)$ and $B_k(m)$ are uncorrelated random variables. Since $A_k(n)$ and $B_k(n)$ are jointly Gaussian random variables, this implies that they are zero-mean, independent Gaussian random variables.
We are now ready to find the statistics of the periodogram estimates at the frequencies $f = n/k$. Equation (10.130) gives

$\tilde{p}_k\!\left(\dfrac{n}{k}\right) = \dfrac{1}{k}\{A_k^2(n) + B_k^2(n)\} = \dfrac{1}{2}\sigma_X^2\left\{\dfrac{A_k^2(n)}{(1/2)k\sigma_X^2} + \dfrac{B_k^2(n)}{(1/2)k\sigma_X^2}\right\},\quad n \ne -k/2, 0.$    (10.134)

The quantity in brackets is the sum of the squares of two zero-mean, unit-variance, independent Gaussian random variables. This is a chi-square random variable with two degrees of freedom (see Problem 7.6). From Table 4.1, we see that a chi-square random variable with v degrees of freedom has variance 2v. Thus the expression in the brackets has variance 4, and the periodogram estimate $\tilde{p}_k(n/k)$ has variance

$\mathrm{VAR}\left[\tilde{p}_k\!\left(\dfrac{n}{k}\right)\right] = \left(\dfrac{1}{2}\sigma_X^2\right)^2 \cdot 4 = \sigma_X^4 = S_X(f)^2.$    (10.135a)

For $n = -k/2$ and $n = 0$,

$\tilde{p}_k\!\left(\dfrac{n}{k}\right) = \sigma_X^2\left\{\dfrac{A_k^2(n)}{k\sigma_X^2}\right\}.$

The quantity in brackets is a chi-square random variable with one degree of freedom and variance 2, so the variance of the periodogram estimate is

$\mathrm{VAR}\left[\tilde{p}_k\!\left(\dfrac{n}{k}\right)\right] = 2\sigma_X^4,\quad n = -k/2, 0.$    (10.135b)
Thus we conclude from Eqs. (10.135a) and (10.135b) that the variance of the periodogram estimate is proportional to the square of the power spectral density and does not approach zero as k increases. In addition, Eqs. (10.133a) through (10.133d) imply that the periodogram estimates at the frequencies $f = n/k$ are uncorrelated random variables. A more detailed analysis [Jenkins and Watts, p. 238] shows that for arbitrary f,

$\mathrm{VAR}[\tilde{p}_k(f)] = S_X(f)^2\left\{1 + \left(\dfrac{\sin(2\pi fk)}{k\sin(2\pi f)}\right)^2\right\}.$    (10.136)

Thus the variance of the periodogram estimate does not approach zero as the number of samples is increased.
The above discussion has only considered the spectrum estimation for a white noise, Gaussian random process, but the general conclusions are also valid for nonwhite, non-Gaussian processes. If the $X_i$ are not Gaussian, we note from Eqs. (10.128) and (10.129) that $A_k$ and $B_k$ are approximately Gaussian by the central limit theorem if k is large. Thus the periodogram estimate is then approximately a chi-square random variable.

If the process $X_i$ is not white, then it can be viewed as filtered white noise:

$X_n = h_n * W_n,$

where $S_W(f) = \sigma_W^2$ and $|H(f)|^2S_W(f) = S_X(f)$. The periodograms of $X_n$ and $W_n$ are related by

$\dfrac{1}{k}\left|\tilde{x}_k\!\left(\dfrac{n}{k}\right)\right|^2 = \left|H\!\left(\dfrac{n}{k}\right)\right|^2\dfrac{1}{k}\left|\tilde{w}_k\!\left(\dfrac{n}{k}\right)\right|^2.$    (10.137)

Thus

$\left|\tilde{w}_k\!\left(\dfrac{n}{k}\right)\right|^2 = \dfrac{|\tilde{x}_k(n/k)|^2}{|H(n/k)|^2}.$    (10.138)

From our previous results, we know that $|\tilde{w}_k(n/k)|^2/k$ is a chi-square random variable with variance $\sigma_W^4$. This implies that

$\mathrm{VAR}\left[\dfrac{|\tilde{x}_k(n/k)|^2}{k}\right] = \left|H\!\left(\dfrac{n}{k}\right)\right|^4\sigma_W^4 = S_X(f)^2.$    (10.139)

Thus we conclude that the variance of the periodogram estimate for nonwhite noise is also proportional to $S_X(f)^2$.
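A small Monte Carlo experiment (our own Python sketch, with a fixed seed) illustrates the conclusion: for unit-variance Gaussian white noise the variance of the periodogram at an interior frequency stays near $S_X(f)^2 = 1$ no matter how large k becomes:

```python
import numpy as np

rng = np.random.default_rng(1)

def periodogram_variance(k, trials=2000):
    x = rng.standard_normal((trials, k))            # sigma_X^2 = 1
    pk = np.abs(np.fft.fft(x, axis=1)) ** 2 / k     # periodograms, Eq. (10.121)
    return pk[:, k // 4].var()                      # variance of estimate at f = 1/4

v_small, v_large = periodogram_variance(64), periodogram_variance(1024)
# Both values are near 1; increasing k by a factor of 16 does not reduce the variance.
```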
10.6.2 Smoothing of Periodogram Estimate

A fundamental result in probability theory is that the sample mean of a sequence of independent realizations of a random variable approaches the true mean with probability one. We obtain an estimate for $S_X(f)$ whose variance goes to zero with the number of observations by taking the average of N independent periodograms on samples of size k:

$\langle\tilde{p}_k(f)\rangle_N = \dfrac{1}{N}\sum_{i=1}^{N}\tilde{p}_{k,i}(f),$    (10.140)

where $\{\tilde{p}_{k,i}(f)\}$ are N independent periodograms computed using separate sets of k samples each. Figures 10.18 and 10.19 show the N = 10 and N = 50 smoothed periodograms corresponding to the unsmoothed periodogram of Fig. 10.17. It is evident that the variance of the power spectrum estimates is decreasing with N.
The mean of the smoothed estimator is

$E[\langle\tilde{p}_k(f)\rangle_N] = \dfrac{1}{N}\sum_{i=1}^{N}E[\tilde{p}_{k,i}(f)] = E[\tilde{p}_k(f)] = \sum_{m'=-(k-1)}^{k-1}\left(1 - \dfrac{|m'|}{k}\right)R_X(m')e^{-j2\pi fm'},$    (10.141)

where we have used Eq. (10.35). Thus the smoothed estimator has the same mean as the periodogram estimate on a sample of size k.
FIGURE 10.18 Sixty-four-point smoothed periodogram with N = 10, $X_n$ iid uniform in (0, 1), $S_X(f) = 1/12 = 0.083$.

FIGURE 10.19 Sixty-four-point smoothed periodogram with N = 50, $X_n$ iid uniform in (0, 1), $S_X(f) = 1/12 = 0.083$.
The variance of the smoothed estimator is

$\mathrm{VAR}[\langle\tilde{p}_k(f)\rangle_N] = \dfrac{1}{N^2}\sum_{i=1}^{N}\mathrm{VAR}[\tilde{p}_{k,i}(f)] = \dfrac{1}{N}\mathrm{VAR}[\tilde{p}_k(f)] \approx \dfrac{1}{N}S_X(f)^2.$

Thus the variance of the smoothed estimator can be reduced by increasing N, the number of periodograms used in Eq. (10.140).
In practice, a sample set of size Nk, $X_0, \ldots, X_{Nk-1}$, is divided into N blocks and a separate periodogram is computed for each block. The smoothed estimate is then the average over the N periodograms. This method is called Bartlett's smoothing procedure.
Note that, in general, the resulting periodograms are not independent because the underlying blocks are not independent. Thus this smoothing procedure must be viewed as
an approximation to the computation and averaging of independent periodograms.
The choice of k and N is determined by the desired frequency resolution and
variance of the estimate. The blocksize k determines the number of frequencies for
which the spectral density is computed (i.e., the frequency resolution). The variance of
the estimate is controlled by the number of periodograms N. The actual choice of k and
N depends on the nature of the signal being investigated.
10.7 NUMERICAL TECHNIQUES FOR PROCESSING RANDOM SIGNALS
In this chapter our discussion has combined notions from random processes with basic
concepts from signal processing. The processing of signals is a very important area in
modern technology, and a rich set of techniques and methodologies has been developed to address the needs of specific application areas such as communication systems,
speech compression, speech recognition, video compression, face recognition, network
and service traffic engineering, etc. In this section we briefly present a number of general tools available for the processing of random signals. We focus on the tools provided in Octave since these are quite useful as well as readily available.
10.7.1 FFT Techniques

The Fourier transform relationship between $R_X(\tau)$ and $S_X(f)$ is fundamental in the study of wide-sense stationary processes and plays a key role in random signal analysis. The fast Fourier transform (FFT) methods we developed in Section 7.6 can be applied to the numerical transformation from autocorrelation functions to power spectral densities and back.

Consider the computation of $R_X(\tau)$ and $S_X(f)$ for continuous-time processes:

$R_X(\tau) = \int_{-\infty}^{\infty} S_X(f)e^{-j2\pi f\tau}\,df \approx \int_{-W}^{W} S_X(f)e^{-j2\pi f\tau}\,df.$
First we limit the integral to the region where $S_X(f)$ has significant power. Next we restrict our attention to a discrete set of $N = 2M$ frequency values $kf_0$ so that $-W = -Mf_0 < (-M+1)f_0 < \cdots < (M-1)f_0 < W$, and then approximate the integral by a sum:

$R_X(\tau) \approx \sum_{m=-M}^{M-1} S_X(mf_0)e^{-j2\pi mf_0\tau}f_0.$

Finally, we also focus on a set of discrete lag values $k\tau_0$ so that $-T = -M\tau_0 < (-M+1)\tau_0 < \cdots < (M-1)\tau_0 < T$. We obtain the DFT as follows:

$R_X(k\tau_0) \approx f_0\sum_{m=-M}^{M-1} S_X(mf_0)e^{-j2\pi mk\tau_0f_0} = f_0\sum_{m=-M}^{M-1} S_X(mf_0)e^{-j2\pi mk/N}.$    (10.142)

In order to have a discrete Fourier transform, we must have $\tau_0f_0 = 1/N$, which is equivalent to $\tau_0 = 1/(Nf_0)$, $T = M\tau_0 = 1/(2f_0)$, and $W = Mf_0 = 1/(2\tau_0)$. We can use the FFT function introduced in Section 7.6 to perform the transformation in Eq. (10.142) to obtain the set of values $\{R_X(k\tau_0), k \in [-M, M-1]\}$ from $\{S_X(mf_0), m \in [-M, M-1]\}$. The transformation in the reverse direction is done in the same way. Since $R_X(\tau)$ and $S_X(f)$ are even functions, various simplifications are possible. We discuss some of these in the problems.
Consider the computation of $S_X(f)$ and $R_X(k)$ for discrete-time processes. $S_X(f)$ spans the range of frequencies $|f| < 1/2$, so we restrict attention to N points 1/N apart:

$S_X\!\left(\dfrac{m}{N}\right) = \sum_{k=-\infty}^{\infty} R_X(k)e^{-j2\pi kf}\Big|_{f=m/N} \approx \sum_{k=-M}^{M-1} R_X(k)e^{-j2\pi km/N}.$    (10.143)

The approximation here involves neglecting autocorrelation terms outside $[-M, M-1]$. Since $df \approx 1/N$, the transformation in the reverse direction is scaled differently:

$R_X(k) = \int_{-1/2}^{1/2} S_X(f)e^{-j2\pi kf}\,df \approx \dfrac{1}{N}\sum_{m=-M}^{M-1} S_X\!\left(\dfrac{m}{N}\right)e^{-j2\pi km/N}.$    (10.144)

We assume that the student has already tried the FFT exercises in Section 7.6, so we leave examples in the use of the FFT to the Problems.
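As an illustration of Eqs. (10.143) and (10.144), the sketch below (our own, in NumPy rather than the Octave used in the examples) transforms $R_X(k) = a^{|k|}$ to its spectral density and back, using the FFT's standard ordering of positive lags first and negative lags wrapped to the end:

```python
import numpy as np

N, a = 256, 0.5
idx = np.arange(N)
lags = np.where(idx < N // 2, idx, idx - N)     # lags 0..M-1, then -M..-1
R = a ** np.abs(lags)                           # R_X(k) = a^{|k|}

S = np.fft.fft(R).real                          # Eq. (10.143)
f = idx / N
S_exact = (1 - a**2) / (1 - 2 * a * np.cos(2 * np.pi * f) + a**2)
assert np.allclose(S, S_exact)                  # truncation error at |k| >= M is negligible

R_back = np.fft.ifft(S).real                    # Eq. (10.144), with the 1/N factor
assert np.allclose(R_back, R)
```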
The various frequency domain results for linear systems that relate input, output,
and cross-spectral densities can be evaluated numerically using the FFT.
Example 10.27 Output Autocorrelation and Cross-Correlation

Consider Example 10.12, where a random telegraph signal X(t) with $\alpha = 1$ is passed through a lowpass filter with $\beta = 1$ and $\beta = 10$. Find $R_Y(\tau)$.

The random telegraph has $S_X(f) = \alpha/(\alpha^2 + \pi^2f^2)$ and the filter has transfer function $H(f) = \beta/(\beta + j2\pi f)$, so $R_Y(\tau)$ is given by:

$R_Y(\tau) = \mathcal{F}^{-1}\{|H(f)|^2S_X(f)\} = \int_{-\infty}^{\infty} \dfrac{\beta^2}{\beta^2 + 4\pi^2f^2}\,\dfrac{\alpha}{\alpha^2 + \pi^2f^2}\,e^{j2\pi f\tau}\,df.$
FIGURE 10.20 (a) Transfer function and input power spectral density; (b) autocorrelation of filtered random telegraph with filter β = 10.
We used an N = 256 FFT to evaluate the autocorrelation functions numerically for $\alpha = 1$ and $\beta = 1$ and $\beta = 10$. Figure 10.20(a) shows $|H(f)|^2$ and $S_X(f)$ for $\beta = 10$. It can be seen that the transfer function (the dashed line) is close to 1 in the region of f where $S_X(f)$ has most of its power. Consequently we expect the output for $\beta = 10$ to have an autocorrelation similar to that of the input. For $\beta = 1$, on the other hand, the filter will attenuate more of the significant frequencies of X(t) and we expect more change in the output autocorrelation. Figure 10.20(b) shows the output autocorrelation, and we see that indeed for $\beta = 10$ (the solid line), $R_Y(\tau)$ is close to the double-sided exponential of $R_X(\tau)$. For $\beta = 1$ the output autocorrelation differs significantly from $R_X(\tau)$.
10.7.2 Filtering Techniques

The autocorrelation and power spectral density functions provide us with information about the average behavior of the processes. We are also interested in obtaining sample functions of the inputs and outputs of systems. For linear systems the principal tools for signal processing are the convolution and the Fourier transform.

Convolution in discrete time (Eq. (10.48)) is quite simple, and so convolution is the workhorse in linear signal processing. Octave provides several functions for performing convolutions with discrete-time signals. In Example 10.15 we encountered the function filter(b,a,x), which implements filtering of the sequence x with an ARMA filter whose coefficients are specified by the vectors b and a in the following equation:

$Y_n = -\sum_{i=1}^{q} a_iY_{n-i} + \sum_{j=0}^{p} b_jX_{n-j}.$

Other functions use filter(b,a,x) to provide special cases of filtering. For example, conv(a,b) convolves the elements in the vectors a and b. We can obtain the output of a linear system by letting a be the impulse response and b the input random sequence. The moving average example in Fig. 10.7(b) is easily obtained using conv. Octave provides other functions implementing specific digital filters.
We can also obtain the output of a linear system in the frequency domain. We take the FFT of the input sequence $X_n$ and multiply it by the FFT of the transfer function. The inverse FFT then provides the output $Y_n$ of the linear system. The Octave function fftconv(a,b,n) implements this approach. The size of the FFT must be equal to the total number of samples in the input sequence, so this approach is not advisable for long input sequences.
10.7.3 Generation of Random Processes
Finally, we are interested in obtaining discrete-time and continuous-time sample functions of the inputs and outputs of systems. Previous chapters provide us with several tools
for the generation of random signals that can act as inputs to the systems of interest.
Section 5.10 provides the method for generating independent pairs of Gaussian random variables. This method forms the basis for the generation of iid Gaussian sequences and is implemented in normal_rnd(M,V,Sz). The generation of WSS but correlated sequences of Gaussian random variables requires more work. One
approach is to use the matrix approaches developed in Section 6.6 to generate individual vectors with a specified covariance matrix. To generate a vector Y of n outcomes
with covariance K_Y, we perform the following factorization:

K_Y = A^T A = P Λ P^T,

and we generate the vector

Y = A^T X,

where X is a vector of iid zero-mean, unit-variance Gaussian random variables. The Octave function svd(B) performs a singular value decomposition of the matrix B; see [Long]. When B = K_Y is a covariance matrix, svd returns the diagonal matrix D of eigenvalues of K_Y as well as the matrices U and V; for a symmetric, nonnegative definite K_Y these coincide, U = V = P, so that K_Y = U D V^T = P D P^T.
Example 10.28
Generation of Correlated Gaussian Random Variables
Generate 256 samples of the autoregressive process in Example 10.14 with α = −0.5, σ_X = 1.
The autocorrelation of the process is given by R_X(k) = (−1/2)^{|k|}. We generate a vector r of the first 256 lags of R_X(k) and use the function toeplitz(r) to generate the covariance matrix. We then call svd to obtain A. Finally we produce the output vector Y = A^T X.
> n=[0:255];
> r=(-0.5).^n;
> K=toeplitz(r);
> [U,D,V]=svd(K);
> X=normal_rnd(0,1,1,256);
> y=V*(D^0.5)*transpose(X);
> plot(y)
Figure 10.21(a) shows a plot of Y. To check that the sequence has the desired autocovariance we use the function autocov(X,H), which estimates the autocovariance function of the sequence X for the first H lag values. Figure 10.21(b) shows the sample correlation coefficient, which is obtained by dividing the autocovariance by the sample variance. The plot shows the alternating covariance values and the expected peak values of −0.5 and 0.25 at the first two lags.
Chapter 10
Analysis and Processing of Random Signals
FIGURE 10.21
(a) Correlated Gaussian noise (plotted versus n). (b) Sample autocovariance (plotted versus lag k).
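A Python/NumPy transcription of this Octave session (our sketch; the variable names and RNG seed are arbitrary) makes the factorization explicit:

```python
import numpy as np

n = np.arange(256)
r = (-0.5) ** n                              # R_X(k) = (-1/2)^|k|, k = 0..255
K = r[np.abs(n[:, None] - n[None, :])]       # Toeplitz covariance matrix K_Y
U, d, Vt = np.linalg.svd(K)                  # K = U diag(d) V^T; here U = V = P
A_T = U @ np.diag(np.sqrt(d))                # A^T = P D^(1/2), so A^T (A^T)^T = K
X = np.random.default_rng(1).standard_normal(256)   # iid N(0,1) vector
y = A_T @ X                                  # correlated output, cov(y) = K
```

Because K is symmetric and nonnegative definite, A^T (A^T)^T reproduces K to numerical precision, which is a convenient sanity check on the factorization.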
An alternative approach to generating a correlated sequence of random variables with
a specified covariance function is to input an uncorrelated sequence into a linear filter with a
specific H( f ). Equation (10.46) allows us to determine the power spectral density of the output sequence. This approach can be implemented using convolution and is applicable to extremely long signal sequences. A large choice of possible filter functions is available for both
continuous-time and discrete-time systems. For example, the ARMA model in Example 10.15
is capable of implementing a broad range of transfer functions. Indeed the entire discussion
in Section 10.4 was focused on obtaining the transfer function of optimal linear systems in
various scenarios.
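As a concrete instance of this filtering approach, the sketch below (Python/NumPy; the pole value 0.8 and the sample size are our choices, not the book's) drives a first-order recursive filter with white noise and checks the lag-one correlation coefficient against its theoretical value α:

```python
import numpy as np

alpha = 0.8                     # filter coefficient (assumed for illustration)
N = 200_000
rng = np.random.default_rng(2)
w = rng.standard_normal(N)      # uncorrelated (white) Gaussian input

y = np.zeros(N)
for n in range(1, N):
    y[n] = alpha * y[n - 1] + w[n]   # Y_n = alpha*Y_{n-1} + W_n

# Theory: R_Y(k) = alpha^|k| * sigma_W^2 / (1 - alpha^2), so the lag-1
# correlation coefficient R_Y(1)/R_Y(0) should be close to alpha.
rho1 = np.mean(y[1:] * y[:-1]) / np.mean(y * y)
```

Any rational power spectral density can be synthesized in this manner by choosing a suitable ARMA filter.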
Example 10.29
Generation of White Gaussian Noise
Find a method for generating white Gaussian noise for a simulation of a continuous-time communications system.
The generation of discrete-time white Gaussian noise is trivial and involves the generation
of a sequence of iid Gaussian random variables. The generation of continuous-time white Gaussian noise is not so simple. Recall from Example 10.3 that true white noise has infinite bandwidth
and hence infinite power, and so is impossible to realize. Real systems, however, are bandlimited, and hence we always end up dealing with bandlimited white noise. If the system of interest is
bandlimited to W Hertz, then we need to model white noise limited to W Hz. In Example 10.3 we
found this type of noise has autocorrelation:
R_X(τ) = N₀ sin(2πWτ) / (2πτ).
The sampling theorem discussed in Section 10.3 allows us to represent bandlimited white Gaussian noise as follows:

X̂(t) = Σ_{n=−∞}^{∞} X(nT) p(t − nT)   where   p(t) = sin(πt/T) / (πt/T),
where 1/T = 2W. The coefficients X(nT) have autocorrelation R_X(nT), which is given by:

R_X(nT) = N₀ sin(2πWnT) / (2πnT) = N₀ sin(2πWn/2W) / (2πn/2W)
        = N₀W sin(πn) / (πn)
        = N₀W for n = 0, and 0 for n ≠ 0.
We thus conclude that X(nT) is an iid sequence of Gaussian random variables with variance
N0W. Therefore we can simulate sampled bandlimited white Gaussian noise by generating a sequence X(nT). We can perform any processing required in the discrete-time domain, and we can
then apply the result to an interpolator to recover the continuous-time output.
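This conclusion translates directly into a simulation recipe. A short Python/NumPy sketch (N₀, W, and the sequence length are illustrative assumptions) generates the samples X(nT) and sinc-interpolates them back to continuous time:

```python
import numpy as np

N0, W = 2.0, 1000.0                 # noise density N0 and bandwidth W (assumed)
T = 1.0 / (2.0 * W)                 # Nyquist interval, 1/T = 2W
rng = np.random.default_rng(3)
x_nT = np.sqrt(N0 * W) * rng.standard_normal(4096)   # iid, variance N0*W

def x_hat(t):
    """Interpolator: X(t) = sum_n X(nT) * sinc((t - nT)/T)."""
    n = np.arange(len(x_nT))
    return np.sum(x_nT * np.sinc(t / T - n))   # np.sinc(u) = sin(pi*u)/(pi*u)
```

At the sample instants the interpolator reproduces the samples exactly, since sinc(k) = 0 for nonzero integers k.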
SUMMARY
• The power spectral density of a WSS process is the Fourier transform of its autocorrelation function. The power spectral density of a real-valued random process
is a real-valued, nonnegative, even function of frequency.
• The output of a linear, time-invariant system is a WSS random process if its input
is a WSS random process that was applied an infinite time in the past.
• The output of a linear, time-invariant system is a Gaussian WSS random process
if its input is a Gaussian WSS random process.
• Wide-sense stationary random processes with arbitrary rational power spectral
density can be generated by filtering white noise.
• The sampling theorem allows the representation of bandlimited continuous-time
processes by the sequence of periodic samples of the process.
• The orthogonality condition can be used to obtain equations for linear systems that
minimize mean square error. These systems arise in filtering, smoothing, and prediction problems. Matrix numerical methods are used to find the optimum linear systems.
• The Kalman filter can be used to estimate signals with a structure that keeps the dimensionality of the algorithm fixed even as the size of the observation set increases.
• The variance of the periodogram estimate for the power spectral density does not
approach zero as the number of samples is increased. An average of several independent periodograms is required to obtain an estimate whose variance does approach zero as the number of samples is increased.
• The FFT, convolution, and matrix techniques are basic tools for analyzing, simulating, and implementing processing of random signals.
CHECKLIST OF IMPORTANT TERMS
Amplitude modulation
ARMA process
Autoregressive process
Bandpass signal
Causal system
Cross-power spectral density
Einstein-Wiener-Khinchin theorem
Filtering
Impulse response
Innovations
Kalman filter
Linear system
Long-range dependence
Moving average process
Nyquist sampling rate
Optimum filter
Orthogonality condition
Periodogram
Power spectral density
Prediction
Quadrature amplitude modulation
Sampling theorem
Smoothed periodogram
Smoothing
System
Time-invariant system
Transfer function
Unit-sample response
White noise
Wiener filter
Wiener-Hopf equations
Yule-Walker equations
ANNOTATED REFERENCES
References [1] through [6] contain good discussions of the notion of power spectral
density and of the response of linear systems to random inputs. References [6] and [7]
give accessible introductions to the spectral factorization problem. References [7]
through [9] discuss linear filtering and power spectrum estimation in the context of
digital signal processing. Reference [10] discusses the basic theory underlying power
spectrum estimation.
1. A. Papoulis and S. Pillai, Probability, Random Variables, and Stochastic Processes,
McGraw-Hill, New York, 2002.
2. H. Stark and J. W. Woods, Probability, Random Processes, and Estimation Theory
for Engineers, 3rd ed., Prentice Hall, Upper Saddle River, N.J., 2002.
3. R. M. Gray and L. D. Davisson, Random Processes: A Mathematical Approach for
Engineers, Prentice Hall, Englewood Cliffs, N.J., 1986.
4. R. D. Yates and D. J. Goodman, Probability and Stochastic Processes, Wiley, New
York, 2005.
5. J. A. Gubner, Probability and Random Processes for Electrical and Computer Engineering, Cambridge University Press, Cambridge, 2006.
6. G. R. Cooper and C. D. MacGillem, Probabilistic Methods of Signal and System
Analysis, Holt, Rinehart & Winston, New York, 1986.
7. J. A. Cadzow, Foundations of Digital Signal Processing and Data Analysis,
Macmillan, New York, 1987.
8. A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing, Prentice
Hall, Englewood Cliffs, N.J., 1989.
9. M. Kunt, Digital Signal Processing, Artech House, Dedham, Mass., 1986.
10. G. M. Jenkins and D. G. Watts, Spectral Analysis and Its Applications, Holden-Day, San Francisco, 1968.
11. A. Einstein, “Method for the Determination of the Statistical Values of Observations Concerning Quantities Subject to Irregular Observations,” reprinted in
IEEE ASSP Magazine, October 1987, p. 6.
12. P. J. G. Long, “Introduction to Octave,” University of Cambridge, September,
2005, available online.
PROBLEMS
Section 10.1: Power Spectral Density
10.1. Let g(x) denote the triangular function shown in Fig. P10.1.
(a) Find the power spectral density corresponding to R_X(τ) = g(τ/T).
(b) Find the autocorrelation corresponding to the power spectral density S_X(f) = g(f/W).
FIGURE P10.1 [triangular function g(x) of height A on (−1, 1)]
10.2. Let p(x) be the rectangular function shown in Fig. P10.2. Is R_X(τ) = p(τ/T) a valid autocorrelation function?
FIGURE P10.2 [rectangular function p(x) of height A on (−1, 1)]
10.3. (a) Find the power spectral density S_Y(f) of a random process with autocorrelation function R_X(τ) cos(2πf₀τ), where R_X(τ) is itself an autocorrelation function.
(b) Plot S_Y(f) if R_X(τ) is as in Problem 10.1a.
10.4. (a) Find the autocorrelation function corresponding to the power spectral density shown in Fig. P10.3.
(b) Find the total average power.
(c) Plot the power in the range |f| > f₀ as a function of f₀ > 0.
FIGURE P10.3 [piecewise-constant power spectral density taking the values A and B, with band edges at ±f₁ and ±f₂]
10.5. A random process X(t) has autocorrelation given by R_X(τ) = σ_X² e^{−τ²/2α²}, α > 0.
(a) Find the corresponding power spectral density.
(b) Find the amount of power contained in the frequencies |f| > k/(2πα), where k = 1, 2, 3.
10.6. Let Z(t) = X(t) + Y(t). Under what conditions does S_Z(f) = S_X(f) + S_Y(f)?
10.7. Show that
(a) R_{X,Y}(τ) = R_{Y,X}(−τ).
(b) S_{X,Y}(f) = S*_{Y,X}(f).
10.8. Let Y(t) = X(t) − X(t − d).
(a) Find R_{X,Y}(τ) and S_{X,Y}(f).
(b) Find R_Y(τ) and S_Y(f).
10.9. Do Problem 10.8 if X(t) has the triangular autocorrelation function g(τ/T) in Problem 10.1 and Fig. P10.1.
10.10. Let X(t) and Y(t) be independent wide-sense stationary random processes, and define Z(t) = X(t)Y(t).
(a) Show that Z(t) is wide-sense stationary.
(b) Find R_Z(τ) and S_Z(f).
10.11. In Problem 10.10, let X(t) = a cos(2πf₀t + Θ), where Θ is a uniform random variable in (0, 2π). Find R_Z(τ) and S_Z(f).
10.12. Let R_X(k) = 4α^{|k|}, |α| < 1.
(a) Find S_X(f).
(b) Plot S_X(f) for α = 0.25 and α = 0.75, and comment on the effect of the value of α.
10.13. Let R_X(k) = 4(α)^{|k|} + 16(β)^{|k|}, α < 1, β < 1.
(a) Find S_X(f).
(b) Plot S_X(f) for α = β = 0.5 and α = 0.75 = 3β, and comment on the effect of the value of α/β.
10.14. Let R_X(k) = 9(1 − |k|/N) for |k| < N and 0 elsewhere. Find and plot S_X(f).
10.15. Let X_n = cos(2πf₀n + Θ), where Θ is a uniformly distributed random variable in the interval (0, 2π). Find and plot S_X(f) for f₀ = 0.5, 1, 1.75, π.
10.16. Let D_n = X_n − X_{n−d}, where d is an integer constant and X_n is a zero-mean, WSS random process.
(a) Find R_D(k) and S_D(f) in terms of R_X(k) and S_X(f). What is the impact of d?
(b) Find E[D_n²].
10.17. Find R_D(k) and S_D(f) in Problem 10.16 if X_n is the moving average process of Example 10.7 with α = 1.
10.18. Let X_n be a zero-mean, bandlimited white noise random process with S_X(f) = 1 for |f| < f_c and 0 elsewhere, where f_c < 1/2.
(a) Show that R_X(k) = sin(2πf_c k)/(πk).
(b) Find R_X(k) when f_c = 1/4.
10.19. Let W_n be a zero-mean white noise sequence, and let X_n be independent of W_n.
(a) Show that Y_n = W_nX_n is a white sequence, and find σ_Y².
(b) Suppose X_n is a Gaussian random process with autocorrelation R_X(k) = (1/2)^{|k|}. Specify the joint pmf's for Y_n.
10.20. Evaluate the periodogram estimate for the random process X(t) = a cos(2πf₀t + Θ), where Θ is a uniformly distributed random variable in the interval (0, 2π). What happens as T → ∞?
10.21. (a) Show how to use the FFT to calculate the periodogram estimate in Eq. (10.32).
(b) Generate four realizations of an iid zero-mean unit-variance Gaussian sequence of
length 128. Calculate the periodogram.
(c) Calculate 50 periodograms as in part b and show the average of the periodograms
after every 10 additional realizations.
Section 10.2: Response of Linear Systems to Random Signals
10.22. Let X(t) be a differentiable WSS random process, and define
Y(t) = (d/dt) X(t).
Find an expression for S_Y(f) and R_Y(τ). Hint: For this system, H(f) = j2πf.
10.23. Let Y(t) be the derivative of X(t), a bandlimited white noise process as in Example 10.3.
(a) Find S_Y(f) and R_Y(τ).
(b) What is the average power of the output?
10.24. Repeat Problem 10.23 if X(t) has S_X(f) = β² e^{−πf²}.
10.25. Let Y(t) be a short-term integration of X(t):
Y(t) = (1/T) ∫_{t−T}^{t} X(t′) dt′.
(a) Find the impulse response h(t) and the transfer function H(f).
(b) Find S_Y(f) in terms of S_X(f).
10.26. In Problem 10.25, let R_X(τ) = (1 − |τ|/T) for |τ| < T and zero elsewhere.
(a) Find S_Y(f).
(b) Find R_Y(τ).
(c) Find E[Y²(t)].
10.27. The input into a filter is zero-mean white noise with noise power density N₀/2. The filter has transfer function
H(f) = 1/(1 + j2πf).
(a) Find S_{Y,X}(f) and R_{Y,X}(τ).
(b) Find S_Y(f) and R_Y(τ).
(c) What is the average power of the output?
10.28. A bandlimited white noise process X(t) is input into a filter with transfer function
H(f) = 1 + j2πf.
(a) Find S_{Y,X}(f) and R_{Y,X}(τ) in terms of R_X(τ) and S_X(f).
(b) Find S_Y(f) and R_Y(τ) in terms of R_X(τ) and S_X(f).
(c) What is the average power of the output?
10.29. A WSS process X(t) is applied to a linear system at t = 0. Find the mean and autocorrelation function of the output process. Show that the output process becomes WSS as t → ∞.
10.30. Let Y(t) be the output of a linear system with impulse response h(t) and input X(t). Find R_{Y,X}(τ) when the input is white noise. Explain how this result can be used to estimate the impulse response of a linear system.
10.31. (a) A WSS Gaussian random process X(t) is applied to two linear systems as shown in Fig. P10.4. Find an expression for the joint pdf of Y(t₁) and W(t₂).
(b) Evaluate part a if X(t) is white Gaussian noise.
FIGURE P10.4 [X(t) applied in parallel to filters h₁(t) and h₂(t), with outputs Y(t) and W(t)]
10.32. Repeat Problem 10.31b if h₁(t) and h₂(t) are ideal bandpass filters as in Example 10.11. Show that Y(t) and W(t) are independent random processes if the filters have nonoverlapping bands.
10.33. Let Y(t) = h(t) * X(t) and Z(t) = X(t) − Y(t) as shown in Fig. P10.5.
(a) Find S_Z(f) in terms of S_X(f).
(b) Find E[Z²(t)].
FIGURE P10.5 [Y(t) = h(t) * X(t) is subtracted from X(t) to form Z(t)]
10.34. Let Y(t) be the output of a linear system with impulse response h(t) and input X(t) + N(t). Let Z(t) = X(t) − Y(t).
(a) Find R_{X,Y}(τ) and R_Z(τ).
(b) Find S_Z(f).
(c) Find S_Z(f) if X(t) and N(t) are independent random processes.
10.35. A random telegraph signal is passed through an ideal lowpass filter with cutoff frequency
W. Find the power spectral density of the difference between the input and output of the
filter. Find the average power of the difference signal.
10.36. Let Y(t) = a cos(2πf_c t + Θ) + N(t) be applied to an ideal bandpass filter that passes the frequencies |f − f_c| < W/2. Assume that Θ is uniformly distributed in (0, 2π). Find the ratio of signal power to noise power at the output of the filter.
10.37. Let Y_n = (X_{n+1} + X_n + X_{n−1})/3 be a "smoothed" version of X_n. Find R_Y(k), S_Y(f), and E[Y_n²].
10.38. Suppose X_n is a white Gaussian noise process in Problem 10.37. Find the joint pmf for (Y_n, Y_{n+1}, Y_{n+2}).
10.39. Let Y_n = X_n + βX_{n−1}, where X_n is a zero-mean, first-order autoregressive process with autocorrelation R_X(k) = σ²α^{|k|}, |α| < 1.
(a) Find R_{Y,X}(k) and S_{Y,X}(f).
(b) Find S_Y(f), R_Y(k), and E[Y_n²].
(c) For what value of β is Y_n a white noise process?
10.40. A zero-mean white noise sequence is input into a cascade of two systems (see Fig. P10.6). System 1 has impulse response h_n = (1/2)ⁿu(n) and system 2 has impulse response g_n = (1/4)ⁿu(n), where u(n) = 1 for n ≥ 0 and 0 elsewhere.
(a) Find S_Y(f) and S_Z(f).
(b) Find R_{W,Y}(k) and R_{W,Z}(k); find S_{W,Y}(f) and S_{W,Z}(f). Hint: Use a partial fraction expansion of S_{W,Z}(f) prior to finding R_{W,Z}(k).
(c) Find E[Z_n²].
FIGURE P10.6 [cascade: W_n → h_n → Y_n → g_n → Z_n]
10.41. A moving average process X_n is produced as follows:
X_n = W_n + a₁W_{n−1} + … + a_pW_{n−p},
where W_n is a zero-mean white noise process.
(a) Show that R_X(k) = 0 for |k| > p.
(b) Find R_X(k) by computing E[X_{n+k}X_n], then find S_X(f) = ℱ{R_X(k)}.
(c) Find the impulse response h_n of the linear system that defines the moving average process. Find the corresponding transfer function H(f), and then S_X(f). Compare your answer to part b.
10.42. Consider the second-order autoregressive process defined by
Y_n = (3/4)Y_{n−1} − (1/8)Y_{n−2} + W_n,
where the input W_n is a zero-mean white noise process.
(a) Verify that the unit-sample response is h_n = 2(1/2)ⁿ − (1/4)ⁿ for n ≥ 0, and 0 otherwise.
(b) Find the transfer function.
(c) Find S_Y(f) and R_Y(k) = ℱ⁻¹{S_Y(f)}.
10.43. Suppose the autoregressive process defined in Problem 10.42 is the input to the following moving average system:
Z_n = Y_n − (1/4)Y_{n−1}.
(a) Find S_Z(f) and R_Z(k).
(b) Explain why Z_n is a first-order autoregressive process.
(c) Find a moving average system that will produce a white noise sequence when Z_n is the input.
10.44. An autoregressive process Y_n is produced as follows:
Y_n = a₁Y_{n−1} + … + a_qY_{n−q} + W_n,
where W_n is a zero-mean white noise process.
(a) Show that the autocorrelation of Y_n satisfies the following set of equations:
R_Y(0) = Σ_{i=1}^{q} a_i R_Y(i) + R_W(0),
R_Y(k) = Σ_{i=1}^{q} a_i R_Y(k − i).
(b) Use these recursive equations to compute the autocorrelation of the process in Example 10.22.
Section 10.3: Bandlimited Random Processes
10.45. (a) Show that the signal x(t) is recovered in Figure 10.10(b) as long as the sampling rate is above the Nyquist rate.
(b) Suppose that a deterministic signal is sampled at a rate below the Nyquist rate. Use Fig. 10.10(b) to show that the recovered signal contains additional signal components from the adjacent bands. The error introduced by these components is called aliasing.
(c) Find an expression for the power spectral density of the sampled bandlimited random process X(t).
(d) Find an expression for the power in the aliasing error components.
(e) Evaluate the power in the error signal in part c if S_X(f) is as in Problem 10.1b.
10.46. An ideal discrete-time lowpass filter has transfer function:
H(f) = 1 for |f| < f_c < 1/2, and H(f) = 0 for f_c < |f| < 1/2.
(a) Show that H(f) has impulse response h_n = sin(2πf_c n)/(πn).
(b) Find the power spectral density of Y(kT) that results when the signal in Problem 10.1b is sampled at the Nyquist rate and processed by the filter in part a.
(c) Let Y(t) be the continuous-time signal that results when the output of the filter in part b is fed to an interpolator operating at the Nyquist rate. Find S_Y(f).
10.47. In order to design a differentiator for bandlimited processes, the filter in Fig. 10.10(c) is designed to have transfer function:
H(f) = j2πf/T for |f| < 1/2.
(a) Show that the corresponding impulse response is:
h₀ = 0,  h_n = (πn cos πn − sin πn)/(πn²T) = (−1)ⁿ/(nT)  for n ≠ 0.
(b) Suppose that X(t) = a cos(2πf₀t + Θ) is sampled at a rate 1/T = 4f₀ and then input into the above digital filter. Find the output Y(t) of the interpolator.
10.48. Complete the proof of the sampling theorem by showing that the mean square error is zero. Hint: First show that E[(X(t) − X̂(t))X(kT)] = 0 for all k.
10.49. Plot the power spectral density of the amplitude modulated signal Y(t) in Example 10.18, assuming f_c > W and f_c < W. Assume that A(t) is the signal in Problem 10.1b.
10.50. Suppose that a random telegraph signal with transition rate α is the input signal in an amplitude modulation system. Plot the power spectral density of the modulated signal assuming f_c = α/π and f_c = 10α/π.
10.51. Let the input to an amplitude modulation system be 2 cos(2πf₁t + Φ), where Φ is uniformly distributed in (−π, π). Find the power spectral density of the modulated signal assuming f_c > f₁.
10.52. Find the signal-to-noise ratio in the recovered signal in Example 10.18 if S_N(f) = af² for |f ± f_c| < W and zero elsewhere.
10.53. The input signals to a QAM system are independent random processes with power spectral densities shown in Fig. P10.7. Sketch the power spectral density of the QAM signal.
FIGURE P10.7 [power spectral densities S_A(f) and S_B(f), each bandlimited to |f| < W]
10.54. Under what conditions does the receiver shown in Fig. P10.8 recover the input signals of a QAM system?
FIGURE P10.8 [receiver: X(t) is multiplied by 2 cos(2πf_c t + Θ) and by 2 sin(2πf_c t + Θ), and each product is lowpass filtered]
10.55. Show that Eq. (10.67b) implies that S_{B,A}(f) is a purely imaginary, odd function of f.
Section 10.4: Optimum Linear Systems
10.56. Let X_α = Z_α + N_α as in Example 10.22, where Z_α is a first-order process with R_Z(k) = 4(3/4)^{|k|} and N_α is white noise with σ_N² = 1.
(a) Find the optimum p = 1 filter for estimating Z_α.
(b) Find the mean square error of the resulting filter.
10.57. Let X_α = Z_α + N_α as in Example 10.21, where Z_α has R_Z(k) = σ_Z²(r₁)^{|k|} and N_α has R_N(k) = σ_N² r₂^{|k|}, where r₁ and r₂ are less than one in magnitude.
(a) Find the equation for the optimum filter for estimating Z_α.
(b) Write the matrix equation for the filter coefficients.
(c) Solve the p = 2 case, if σ_Z² = 9, r₁ = 2/3, σ_N² = 1, and r₂ = 1/3.
(d) Find the mean square error for the optimum filter in part c.
(e) Use the matrix functions of Octave to solve parts c and d for p = 3, 4, 5.
10.58. Let X_α = Z_α + N_α as in Example 10.21, where Z_α is the first-order moving average process of Example 10.7, and N_α is white noise.
(a) Find the equation for the optimum filter for estimating Z_α.
(b) For the p = 1 and p = 2 cases, write and solve the matrix equation for the filter coefficients.
(c) Find the mean square error for the optimum filter in part b.
10.59. Let X_α = Z_α + N_α as in Example 10.19, and suppose that an estimator for Z_α uses observations from the following time instants: I = {n − p, …, n, …, n + p}.
(a) Solve the p = 1 case if Z_α and N_α are as in Problem 10.56.
(b) Find the mean square error in part a.
(c) Find the equation for the optimum filter.
(d) Write the matrix equation for the 2p + 1 filter coefficients.
(e) Use the matrix functions of Octave to solve parts a and b for p = 2, 3.
10.60. Consider the predictor in Eq. (10.86b).
(a) Find the optimum predictor coefficients in the p = 2 case when R_Z(k) = 9(1/3)^{|k|}.
(b) Find the mean square error in part a.
(c) Use the matrix functions of Octave to solve parts a and b for p = 3, 4, 5.
10.61. Let X(t) be a WSS, continuous-time process.
(a) Use the orthogonality principle to find the best estimator for X(t) of the form
X̂(t) = aX(t₁) + bX(t₂),
where t₁ and t₂ are given time instants.
(b) Find the mean square error of the optimum estimator.
(c) Check your work by evaluating the answer in part b for t = t₁ and t = t₂. Is the answer what you would expect?
10.62. Find the optimum filter and its mean square error in Problem 10.61 if t₁ = t − d and t₂ = t + d.
10.63. Find the optimum filter and its mean square error in Problem 10.61 if t₁ = t − d and t₂ = t − 2d, and R_X(τ) = e^{−α|τ|}. Compare the performance of this filter to the performance of the optimum filter of the form X̂(t) = aX(t − d).
10.64. Modify the system in Problem 10.33 to obtain a model for the estimation error in the optimum infinite-smoothing filter in Example 10.24. Use the model to find an expression for the power spectral density of the error e(t) = Z(t) − Y(t), and then show that the mean square error is given by:
E[e²(t)] = ∫_{−∞}^{∞} S_Z(f)S_N(f) / (S_Z(f) + S_N(f)) df.
Hint: E[e²(t)] = R_e(0).
10.65. Solve the infinite-smoothing problem in Example 10.24 if Z(t) is the random telegraph signal with α = 1/2 and N(t) is white noise. What is the resulting mean square error?
10.66. Solve the infinite-smoothing problem in Example 10.24 if Z(t) is bandlimited white noise of density N₁/2 and N(t) is (infinite-bandwidth) white noise of noise density N₀/2. What is the resulting mean square error?
10.67. Solve the infinite-smoothing problem in Example 10.24 if Z(t) and N(t) are as given in Example 10.25. Find the resulting mean square error.
10.68. Let X_n = Z_n + N_n, where Z_n and N_n are independent, zero-mean random processes.
(a) Find the smoothing filter given by Eq. (10.89) when Z_n is a first-order autoregressive process with σ_X² = 9 and α = 1/2 and N_n is white noise with σ_N² = 4.
(b) Use the approach in Problem 10.64 to find the power spectral density of the error, S_e(f).
(c) Find R_e(k) as follows: Let z = e^{j2πf}, factor the denominator of S_e(f), and take the inverse transform to show that:
R_e(k) = [σ_X² z₁ / (α(1 − z₁²))] z₁^{|k|},  where 0 < z₁ < 1.
(d) Find an expression for the resulting mean square error.
10.69. Find the Wiener filter in Example 10.25 if N(t) is white noise of noise density N₀/2 = 1/3 and Z(t) has power spectral density
S_Z(f) = 4 / (4 + 4π²f²).
10.70. Find the mean square error for the Wiener filter found in Example 10.25. Compare this
with the mean square error of the infinite-smoothing filter found in Problem 10.67.
10.71. Suppose we wish to estimate (predict) X(t + d) by
X̂(t + d) = ∫₀^∞ h(τ)X(t − τ) dτ.
(a) Show that the optimum filter must satisfy
R_X(τ + d) = ∫₀^∞ h(x)R_X(τ − x) dx,  τ ≥ 0.
(b) Use the Wiener-Hopf method to find the optimum filter when R_X(τ) = e^{−2|τ|}.
10.72. Let X_n = Z_n + N_n, where Z_n and N_n are independent random processes, N_n is a white noise process with σ_N² = 1, and Z_n is a first-order autoregressive process with R_Z(k) = 4(1/2)^{|k|}. We are interested in the optimum filter for estimating Z_n from X_n, X_{n−1}, ….
(a) Find S_X(f) and express it in the form:
S_X(f) = (1/(2z₁)) (1 − z₁e^{−j2πf})(1 − z₁e^{j2πf}) / [(1 − (1/2)e^{−j2πf})(1 − (1/2)e^{j2πf})].
(b) Find the whitening causal filter.
(c) Find the optimal causal filter.
Section 10.5: The Kalman Filter
10.73. If Wn and Nn are Gaussian random processes in Eq. (10.102), are Zn and Xn Markov
processes?
10.74. Derive Eq. (10.120) for the mean square prediction error.
10.75. Repeat Example 10.26 with a = 0.5 and a = 2.
10.76. Find the Kalman algorithm for the case where the observations are given by
Xn = bnZn + Nn
where bn is a sequence of known constants.
*Section 10.6: Estimating the Power Spectral Density
10.77. Verify Eqs. (10.125) and (10.126) for the periodogram and the autocorrelation function
estimate.
10.78. Generate a sequence Xn of iid random variables that are uniformly distributed in (0, 1).
(a) Compute several 128-point periodograms and verify the random behavior of the periodogram as a function of f. Does the periodogram vary about the true power spectral density?
(b) Compute the smoothed periodogram based on 10, 20, and 50 independent periodograms. Compare the smoothed periodograms to the true power spectral density.
10.79. Repeat Problem 10.78 with X_n a first-order autoregressive process with autocorrelation function: R_X(k) = (0.9)^{|k|}; R_X(k) = (1/2)^{|k|}; R_X(k) = (0.1)^{|k|}.
10.80. Consider the following estimator for the autocorrelation function:
r̂′_k(m) = (1/(k − |m|)) Σ_{n=0}^{k−|m|−1} X_n X_{n+m}.
Show that if we estimate the power spectrum of X_n by the Fourier transform of r̂′_k(m), the resulting estimator has mean
E[p̃_k(f)] = Σ_{m′=−(k−1)}^{k−1} R_X(m′) e^{−j2πfm′}.
Why is the estimator biased?
Section 10.7: Numerical Techniques for Processing Random Signals
10.81. Let X(t) have power spectral density given by S_X(f) = β² e^{−f²/2W₀²} / √(2π).
(a) Before performing an FFT of S_X(f), you are asked to calculate the power in the aliasing error if the signal is treated as if it were bandlimited with bandwidth kW₀. What value of W should be used for the FFT if the power in the aliasing error is to be less than 1% of the total power? Assume W₀ = 1000 and β = 1.
(b) Suppose you are to perform an N = 2^M point FFT of S_X(f). Explore how W, T, and t₀ vary as a function of f₀. Discuss what leeway is afforded by increasing N.
(c) For the value of W in part a, identify the values of the parameters f₀, T, and t₀ for N = 128, 256, 512, 1024.
(d) Find the autocorrelation {R_X(kt₀)} by applying the FFT to S_X(f). Try the options identified in part c and comment on the accuracy of the results by comparing them to the exact value of R_X(τ).
10.82. Use the FFT to calculate and plot S_X(f) for the following discrete-time processes:
(a) R_X(k) = 4α^{|k|}, for α = 0.25 and α = 0.75.
(b) R_X(k) = 4(1/2)^{|k|} + 16(1/4)^{|k|}.
(c) X_n = cos(2πf₀n + Θ), where Θ is uniformly distributed in (0, 2π] and f₀ = 1000.
10.83. Use the FFT to calculate and plot R_X(k) for the following discrete-time processes:
(a) S_X(f) = 1 for |f| < f_c and 0 elsewhere, where f_c = 1/8, 1/4, 3/8.
(b) S_X(f) = 1/2 + (1/2) cos 2πf for |f| < 1/2.
10.84. Use the FFT to find the output power spectral density in the following systems:
(a) Input X_n with R_X(k) = 4α^{|k|}, for α = 0.25, H(f) = 1 for |f| < 1/4.
(b) Input X_n = cos(2πf₀n + Θ), where Θ is a uniformly distributed random variable and H(f) = j2πf for |f| < 1/2.
(c) Input X_n with R_X(k) as in Problem 10.14 with N = 3 and H(f) = 1 for |f| < 1/2.
10.85. (a) Show that
R_X(τ) = 2 Re{ ∫₀^∞ S_X(f) e^{−j2πfτ} df }.
(b) Use approximations to express the above as a DFT relating N points in the time domain to N points in the frequency domain.
(c) Suppose we meet the t₀f₀ = 1/N requirement by letting t₀ = f₀ = 1/√N. Compare this to the approach leading to Eq. (10.142).
10.86. (a) Generate a sequence of 1024 zero-mean unit-variance Gaussian random variables and pass it through a system with impulse response h_n = e^{−2n} for n ≥ 0.
(b) Estimate the autocovariance of the output process of the digital filter and compare it to the theoretical autocovariance.
(c) What is the pdf of the continuous-time process that results if the output of the digital filter is fed into an interpolator?
10.87. (a) Use the covariance matrix factorization approach to generate a sequence of 1024 Gaussian samples with autocovariance e^{−2|τ|}.
(b) Estimate the autocovariance of the observed sequence and compare to the theoretical result.
Problems Requiring Cumulative Knowledge
10.88. Does the pulse amplitude modulation signal in Example 9.38 have a power spectral density? Explain why or why not. If the answer is yes, find the power spectral density.
10.89. Compare the operation and performance of the Wiener and Kalman filters for the signals
discussed in Example 10.26.
10.90. (a) Find the power spectral density of the ARMA process in Example 10.15 by finding the transfer function of the associated linear system.
(b) For the ARMA process find the cross-power spectral density from E[Y_nX_m], and then the power spectral density from E[Y_nY_m].
10.91. Let X₁(t) and X₂(t) be jointly WSS and jointly Gaussian random processes that are input into two linear time-invariant systems as shown below:
X₁(t) → h₁(t) → Y₁(t)
X₂(t) → h₂(t) → Y₂(t)
(a) Find the cross-correlation function of Y₁(t) and Y₂(t). Find the corresponding cross-power spectral density.
(b) Show that Y₁(t) and Y₂(t) are jointly WSS and jointly Gaussian random processes.
(c) Suppose that the transfer functions of the above systems are nonoverlapping, that is, |H₁(f)||H₂(f)| = 0. Show that Y₁(t) and Y₂(t) are independent random processes.
(d) Now suppose that X₁(t) and X₂(t) are nonstationary jointly Gaussian random processes. Which of the above results still hold?
10.92. Consider the communication system in Example 9.38 where the transmitted signal X(t)
consists of a sequence of pulses that convey binary information. Suppose that the pulses
p(t) are given by the impulse response of the ideal lowpass filter in Figure 10.6. The signal that arrives at the receiver is Y(t) = X(t) + N(t), which is to be sampled and processed
digitally.
(a) At what rate should Y(t) be sampled?
(b) How should the bit carried by each pulse be recovered based on the samples Y(nT)?
(c) What is the probability of error in this system?