Probability, Statistics, and Random Processes for Electrical Engineering
Third Edition
Alberto Leon-Garcia, University of Toronto
Upper Saddle River, NJ 07458

Contents

Preface

CHAPTER 1 Probability Models in Electrical and Computer Engineering
1.1 Mathematical Models as Tools in Analysis and Design
1.2 Deterministic Models
1.3 Probability Models
1.4 A Detailed Example: A Packet Voice Transmission System
1.5 Other Examples
1.6 Overview of Book
Summary
Problems

CHAPTER 2 Basic Concepts of Probability Theory
2.1 Specifying Random Experiments
2.2 The Axioms of Probability
*2.3 Computing Probabilities Using Counting Methods
2.4 Conditional Probability
2.5 Independence of Events
2.6 Sequential Experiments
*2.7 Synthesizing Randomness: Random Number Generators
*2.8 Fine Points: Event Classes
*2.9 Fine Points: Probabilities of Sequences of Events
Summary
Problems

CHAPTER 3 Discrete Random Variables
3.1 The Notion of a Random Variable
3.2 Discrete Random Variables and Probability Mass Function
3.3 Expected Value and Moments of Discrete Random Variable
3.4 Conditional Probability Mass Function
3.5 Important Discrete Random Variables
3.6 Generation of Discrete Random Variables
Summary
Problems

CHAPTER 4 One Random Variable
4.1 The Cumulative Distribution Function
4.2 The Probability Density Function
4.3 The Expected Value of X
4.4 Important Continuous Random Variables
4.5 Functions of a Random Variable
4.6 The Markov and Chebyshev Inequalities
4.7 Transform Methods
4.8 Basic Reliability Calculations
4.9 Computer Methods for Generating Random Variables
*4.10 Entropy
Summary
Problems

CHAPTER 5 Pairs of Random Variables
5.1 Two Random Variables
5.2 Pairs of Discrete Random Variables
5.3 The Joint cdf of X and Y
5.4 The Joint pdf of Two Continuous Random Variables
5.5 Independence of Two Random Variables
5.6 Joint Moments and Expected Values of a Function of Two Random Variables
5.7 Conditional Probability and Conditional Expectation
5.8 Functions of Two Random Variables
5.9 Pairs of Jointly Gaussian Random Variables
5.10 Generating Independent Gaussian Random Variables
Summary
Problems

CHAPTER 6 Vector Random Variables
6.1 Vector Random Variables
6.2 Functions of Several Random Variables
6.3 Expected Values of Vector Random Variables
6.4 Jointly Gaussian Random Vectors
6.5 Estimation of Random Variables
6.6 Generating Correlated Vector Random Variables
Summary
Problems

CHAPTER 7 Sums of Random Variables and Long-Term Averages
7.1 Sums of Random Variables
7.2 The Sample Mean and the Laws of Large Numbers
    Weak Law of Large Numbers
    Strong Law of Large Numbers
7.3 The Central Limit Theorem
    Central Limit Theorem
*7.4 Convergence of Sequences of Random Variables
*7.5 Long-Term Arrival Rates and Associated Averages
7.6 Calculating Distributions Using the Discrete Fourier Transform
Summary
Problems

CHAPTER 8 Statistics
8.1 Samples and Sampling Distributions
8.2 Parameter Estimation
8.3 Maximum Likelihood Estimation
8.4 Confidence Intervals
8.5 Hypothesis Testing
8.6 Bayesian Decision Methods
8.7 Testing the Fit of a Distribution to Data
Summary
Problems

CHAPTER 9 Random Processes
9.1 Definition of a Random Process
9.2 Specifying a Random Process
9.3 Discrete-Time Processes: Sum Process, Binomial Counting Process, and Random Walk
9.4 Poisson and Associated Random Processes
9.5 Gaussian Random Processes, Wiener Process and Brownian Motion
9.6 Stationary Random Processes
9.7 Continuity, Derivatives, and Integrals of Random Processes
9.8 Time Averages of Random Processes and Ergodic Theorems
*9.9 Fourier Series and Karhunen-Loeve Expansion
9.10 Generating Random Processes
Summary
Problems

CHAPTER 10 Analysis and Processing of Random Signals
10.1 Power Spectral Density
10.2 Response of Linear Systems to Random Signals
10.3 Bandlimited Random Processes
10.4 Optimum Linear Systems
*10.5 The Kalman Filter
*10.6 Estimating the Power Spectral Density
10.7 Numerical Techniques for Processing Random Signals
Summary
Problems

CHAPTER 11 Markov Chains
11.1 Markov Processes
11.2 Discrete-Time Markov Chains
11.3 Classes of States, Recurrence Properties, and Limiting Probabilities
11.4 Continuous-Time Markov Chains
*11.5 Time-Reversed Markov Chains
11.6 Numerical Techniques for Markov Chains
Summary
Problems

CHAPTER 12 Introduction to Queueing Theory
12.1 The Elements of a Queueing System
12.2 Little's Formula
12.3 The M/M/1 Queue
12.4 Multi-Server Systems: M/M/c, M/M/c/c, and M/M/∞
12.5 Finite-Source Queueing Systems
12.6 M/G/1 Queueing Systems
12.7 M/G/1 Analysis Using Embedded Markov Chains
12.8 Burke's Theorem: Departures from M/M/c Systems
12.9 Networks of Queues: Jackson's Theorem
12.10 Simulation and Data Analysis of Queueing Systems
Summary
Problems

Appendices
A. Mathematical Tables
B. Tables of Fourier Transforms
C. Matrices and Linear Algebra
Index

CHAPTER 2 Basic Concepts of Probability Theory

This chapter presents the basic concepts of probability theory. In the remainder of the book, we will usually be further developing or elaborating the basic concepts presented here. You will be well prepared to deal with the rest of the book if you have a good understanding of these basic concepts when you complete the chapter.
The following basic concepts will be presented. First, set theory is used to specify the sample space and the events of a random experiment. Second, the axioms of probability specify rules for computing the probabilities of events.
Third, the notion of conditional probability allows us to determine how partial information about the outcome of an experiment affects the probabilities of events. Conditional probability also allows us to formulate the notion of "independence" of events and of experiments. Finally, we consider "sequential" random experiments that consist of performing a sequence of simple random subexperiments. We show how the probabilities of events in these experiments can be derived from the probabilities of the simpler subexperiments. Throughout the book it is shown that complex random experiments can be analyzed by decomposing them into simple subexperiments.

2.1 SPECIFYING RANDOM EXPERIMENTS

A random experiment is an experiment in which the outcome varies in an unpredictable fashion when the experiment is repeated under the same conditions. A random experiment is specified by stating an experimental procedure and a set of one or more measurements or observations.

Example 2.1

Experiment E1: Select a ball from an urn containing balls numbered 1 to 50. Note the number of the ball.
Experiment E2: Select a ball from an urn containing balls numbered 1 to 4. Suppose that balls 1 and 2 are black and that balls 3 and 4 are white. Note the number and color of the ball you select.
Experiment E3: Toss a coin three times and note the sequence of heads and tails.
Experiment E4: Toss a coin three times and note the number of heads.
Experiment E5: Count the number of voice packets containing only silence produced from a group of N speakers in a 10-ms period.
Experiment E6: A block of information is transmitted repeatedly over a noisy channel until an error-free block arrives at the receiver. Count the number of transmissions required.
Experiment E7: Pick a number at random between zero and one.
Experiment E8: Measure the time between page requests in a Web server.
Experiment E9: Measure the lifetime of a given computer memory chip in a specified environment.
Experiment E10: Determine the value of an audio signal at time t1.
Experiment E11: Determine the values of an audio signal at times t1 and t2.
Experiment E12: Pick two numbers at random between zero and one.
Experiment E13: Pick a number X at random between zero and one, then pick a number Y at random between zero and X.
Experiment E14: A system component is installed at time t = 0. For t ≥ 0 let X(t) = 1 as long as the component is functioning, and let X(t) = 0 after the component fails.

The specification of a random experiment must include an unambiguous statement of exactly what is measured or observed. For example, random experiments may consist of the same procedure but differ in the observations made, as illustrated by E3 and E4. A random experiment may involve more than one measurement or observation, as illustrated by E2, E3, E11, E12, and E13. A random experiment may even involve a continuum of measurements, as shown by E14.
Experiments E3, E4, E5, E6, E12, and E13 are examples of sequential experiments that can be viewed as consisting of a sequence of simple subexperiments. Can you identify the subexperiments in each of these? Note that in E13 the second subexperiment depends on the outcome of the first subexperiment.

2.1.1 The Sample Space

Since random experiments do not consistently yield the same result, it is necessary to determine the set of possible results. We define an outcome or sample point of a random experiment as a result that cannot be decomposed into other results. When we perform a random experiment, one and only one outcome occurs. Thus outcomes are mutually exclusive in the sense that they cannot occur simultaneously. The sample space S of a random experiment is defined as the set of all possible outcomes. We will denote an outcome of an experiment by ζ, where ζ is an element or point in S.
Each performance of a random experiment can then be viewed as the selection at random of a single point (outcome) ζ from S.
The sample space S can be specified compactly by using set notation. It can be visualized by drawing tables, diagrams, intervals of the real line, or regions of the plane. There are two basic ways to specify a set:

1. List all the elements, separated by commas, inside a pair of braces: A = {0, 1, 2, 3},
2. Give a property that specifies the elements of the set: A = {x : x is an integer such that 0 ≤ x ≤ 3}.

Note that the order in which items are listed does not change the set, e.g., {0, 1, 2, 3} and {1, 2, 3, 0} are the same set.

Example 2.2

The sample spaces corresponding to the experiments in Example 2.1 are given below using set notation:

S1 = {1, 2, …, 50}
S2 = {(1, b), (2, b), (3, w), (4, w)}
S3 = {HHH, HHT, HTH, THH, TTH, THT, HTT, TTT}
S4 = {0, 1, 2, 3}
S5 = {0, 1, 2, …, N}
S6 = {1, 2, 3, …}
S7 = {x : 0 ≤ x ≤ 1} = [0, 1]   (See Fig. 2.1(a).)
S8 = {t : t ≥ 0} = [0, ∞)
S9 = {t : t ≥ 0} = [0, ∞)   (See Fig. 2.1(b).)
S10 = {v : −∞ < v < ∞} = (−∞, ∞)
S11 = {(v1, v2) : −∞ < v1 < ∞ and −∞ < v2 < ∞}
S12 = {(x, y) : 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1}   (See Fig. 2.1(c).)
S13 = {(x, y) : 0 ≤ y ≤ x ≤ 1}   (See Fig. 2.1(d).)
S14 = set of functions X(t) for which X(t) = 1 for 0 ≤ t < t0 and X(t) = 0 for t ≥ t0, where t0 > 0 is the time when the component fails.

Random experiments involving the same experimental procedure may have different sample spaces as shown by Experiments E3 and E4. Thus the purpose of an experiment affects the choice of sample space.

[Figure 2.1: Sample spaces for Experiments E7, E9, E12, and E13: (a) the interval [0, 1]; (b) the half line [0, ∞); (c) the unit square; (d) the triangular region 0 ≤ y ≤ x ≤ 1.]
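The contrast between E3 and E4 — the same procedure yielding different sample spaces, S3 versus S4 — can be illustrated with a short simulation. This sketch is our own, not from the text, and the function names are invented for illustration:

```python
import random

def toss_three(rng):
    """The common procedure: toss a fair coin three times."""
    return tuple(rng.choice("HT") for _ in range(3))

def observe_e3(tosses):
    """E3 records the full sequence of heads and tails, an outcome in S3."""
    return "".join(tosses)

def observe_e4(tosses):
    """E4 records only the number of heads, an outcome in S4 = {0, 1, 2, 3}."""
    return tosses.count("H")

rng = random.Random(2023)
tosses = toss_three(rng)      # one performance of the shared procedure
print(observe_e3(tosses), observe_e4(tosses))
```

Note that a single performance of the procedure produces both observations; it is the choice of what to record that determines the sample space.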
There are three possibilities for the number of outcomes in a sample space. A sample space can be finite, countably infinite, or uncountably infinite. We call S a discrete sample space if S is countable; that is, its outcomes can be put into one-to-one correspondence with the positive integers. We call S a continuous sample space if S is not countable. Experiments E1, E2, E3, E4, and E5 have finite discrete sample spaces. Experiment E6 has a countably infinite discrete sample space. Experiments E7 through E13 have continuous sample spaces.
Since an outcome of an experiment can consist of one or more observations or measurements, the sample space S can be multi-dimensional. For example, the outcomes in Experiments E2, E11, E12, and E13 are two-dimensional, and those in Experiment E3 are three-dimensional. In some instances, the sample space can be written as the Cartesian product of other sets.¹ For example, S11 = R × R, where R is the set of real numbers, and S3 = S × S × S, where S = {H, T}.
It is sometimes convenient to let the sample space include outcomes that are impossible. For example, in Experiment E9 it is convenient to define the sample space as the positive real line, even though a device cannot have an infinite lifetime.

2.1.2 Events

We are usually not interested in the occurrence of specific outcomes, but rather in the occurrence of some event (i.e., whether the outcome satisfies certain conditions). This requires that we consider subsets of S. We say that A is a subset of B if every element of A also belongs to B. For example, in Experiment E10, which involves the measurement of a voltage, we might be interested in the event "signal voltage is negative." The conditions of interest define a subset of the sample space, namely, the set of points ζ from S that satisfy the given conditions. For example, "voltage is negative" corresponds to the set {ζ : −∞ < ζ < 0}.
The event occurs if and only if the outcome ζ of the experiment is in this subset. For this reason events correspond to subsets of S.
Two events of special interest are the certain event, S, which consists of all outcomes and hence always occurs, and the impossible or null event, ∅, which contains no outcomes and hence never occurs.

Example 2.3

In the following examples, Ak refers to an event corresponding to Experiment Ek in Example 2.1.

E1: "An even-numbered ball is selected," A1 = {2, 4, …, 48, 50}.
E2: "The ball is white and even-numbered," A2 = {(4, w)}.
E3: "The three tosses give the same outcome," A3 = {HHH, TTT}.
E4: "The number of heads equals the number of tails," A4 = ∅.
E5: "No active packets are produced," A5 = {0}.
E6: "Fewer than 10 transmissions are required," A6 = {1, …, 9}.
E7: "The number selected is nonnegative," A7 = S7.
E8: "Less than t0 seconds elapse between page requests," A8 = {t : 0 ≤ t < t0} = [0, t0).
E9: "The chip lasts more than 1000 hours but fewer than 1500 hours," A9 = {t : 1000 < t < 1500} = (1000, 1500).
E10: "The absolute value of the voltage is less than 1 volt," A10 = {v : −1 < v < 1} = (−1, 1).
E11: "The two voltages have opposite polarities," A11 = {(v1, v2) : (v1 < 0 and v2 > 0) or (v1 > 0 and v2 < 0)}.
E12: "The two numbers differ by less than 1/10," A12 = {(x, y) : (x, y) in S12 and |x − y| < 1/10}.
E13: "The two numbers differ by less than 1/10," A13 = {(x, y) : (x, y) in S13 and |x − y| < 1/10}.
E14: "The system is functioning at time t1," A14 = subset of S14 for which X(t1) = 1.

An event may consist of a single outcome, as in A2 and A5. An event from a discrete sample space that consists of a single outcome is called an elementary event. Events A2 and A5 are elementary events.

¹ The Cartesian product of the sets A and B consists of the set of all ordered pairs (a, b), where the first element is taken from A and the second from B.
An event may also consist of the entire sample space, as in A7. The null event, ∅, arises when none of the outcomes satisfy the conditions that specify a given event, as in A4.

2.1.3 Review of Set Theory

In random experiments we are interested in the occurrence of events that are represented by sets. We can combine events using set operations to obtain other events. We can also express complicated events as combinations of simple events. Before proceeding with further discussion of events and random experiments, we present some essential concepts from set theory.
A set is a collection of objects and will be denoted by capital letters S, A, B, …. We define U as the universal set that consists of all possible objects of interest in a given setting or application. In the context of random experiments we refer to the universal set as the sample space. For example, the universal set in Experiment E6 is U = {1, 2, …}. A set A is a collection of objects from U, and these objects are called the elements or points of the set A and will be denoted by lowercase letters, ζ, a, b, x, y, …. We use the notation

x ∈ A and x ∉ A

to indicate that "x is an element of A" or "x is not an element of A," respectively.
We use Venn diagrams when discussing sets. A Venn diagram is an illustration of sets and their interrelationships. The universal set U is usually represented as the set of all points within a rectangle as shown in Fig. 2.2(a). The set A is then the set of points within an enclosed region inside the rectangle.
We say A is a subset of B if every element of A also belongs to B, that is, if x ∈ A implies x ∈ B. We say that "A is contained in B" and we write A ⊂ B. If A is a subset of B, then the Venn diagram shows the region for A to be inside the region for B as shown in Fig. 2.2(e).
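Python's built-in set type supports these relations directly, which makes it convenient for spot-checking small examples. The particular sets below are our own illustration, not from the text:

```python
U = set(range(1, 21))                 # a small universal set
A = {x for x in U if x % 2 == 0}      # the even numbers in U
B = {x for x in U if x % 4 == 0}      # the multiples of 4 in U

print(4 in A)       # membership test, x ∈ A: True
print(5 not in A)   # x ∉ A: True
print(B <= A)       # subset test, B ⊂ A: True, since every multiple of 4 is even
print(A <= B)       # A is not contained in B: False, e.g., 2 ∈ A but 2 ∉ B
```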
[Figure 2.2: Set operations and set relations: (a) A ∪ B; (b) A ∩ B; (c) Aᶜ; (d) A ∩ B = ∅; (e) A ⊂ B; (f) A − B; (g) (A ∪ B)ᶜ; (h) Aᶜ ∩ Bᶜ.]

Example 2.4

In Experiment E6 three sets of interest might be A = {x : x ≥ 10} = {10, 11, …}, that is, 10 or more transmissions are required; B = {2, 4, 6, …}, the number of transmissions is an even number; and C = {x : x ≥ 20} = {20, 21, …}. Which of these sets are subsets of the others?
Clearly, C is a subset of A (C ⊂ A). However, C is not a subset of B, and B is not a subset of C, because both sets contain elements the other set does not contain. Similarly, B is not a subset of A, and A is not a subset of B.

The empty set ∅ is defined as the set with no elements. The empty set is a subset of every set, that is, for any set A, ∅ ⊂ A.
We say sets A and B are equal if they contain the same elements. Since every element in A is also in B, then x ∈ A implies x ∈ B, so A ⊂ B. Similarly every element in B is also in A, so x ∈ B implies x ∈ A and so B ⊂ A. Therefore:

A = B if and only if A ⊂ B and B ⊂ A.

The standard method to show that two sets, A and B, are equal is to show that A ⊂ B and B ⊂ A. A second method is to list all the items in A and all the items in B, and to show that the items are the same. A variation of this second method is to use a Venn diagram to identify the region that corresponds to A and to then show that the Venn diagram for B occupies the same region. We provide examples of both methods shortly.
We will use three basic operations on sets. The union and the intersection operations are applied to two sets and produce a third set. The complement operation is applied to a single set to produce another set. The union of two sets A and B is denoted by A ∪ B and is defined as the set of outcomes that are either in A or in B, or both:

A ∪ B = {x : x ∈ A or x ∈ B}.
The operation A ∪ B corresponds to the logical "or" of the properties that define set A and set B, that is, x is in A ∪ B if x satisfies the property that defines A, or x satisfies the property that defines B, or both. The Venn diagram for A ∪ B consists of the shaded region in Fig. 2.2(a).
The intersection of two sets A and B is denoted by A ∩ B and is defined as the set of outcomes that are in both A and B:

A ∩ B = {x : x ∈ A and x ∈ B}.

The operation A ∩ B corresponds to the logical "and" of the properties that define set A and set B. The Venn diagram for A ∩ B consists of the double shaded region in Fig. 2.2(b). Two sets are said to be disjoint or mutually exclusive if their intersection is the null set, A ∩ B = ∅. Figure 2.2(d) shows two mutually exclusive sets A and B.
The complement of a set A is denoted by Aᶜ and is defined as the set of all elements not in A:

Aᶜ = {x : x ∉ A}.

The operation Aᶜ corresponds to the logical "not" of the property that defines set A. Figure 2.2(c) shows Aᶜ. Note that Sᶜ = ∅ and ∅ᶜ = S.
The relative complement or difference of sets A and B is the set of elements in A that are not in B:

A − B = {x : x ∈ A and x ∉ B}.

A − B is obtained by removing from A all the elements that are also in B, as illustrated in Fig. 2.2(f). Note that A − B = A ∩ Bᶜ. Note also that Bᶜ = S − B.

Example 2.5

Let A, B, and C be the events from Experiment E6 in Example 2.4. Find the following events: A ∪ B, A ∩ B, Aᶜ, Bᶜ, A − B, and B − A.

A ∪ B = {2, 4, 6, 8, 10, 11, 12, …};
A ∩ B = {10, 12, 14, …};
Aᶜ = {x : x < 10} = {1, 2, …, 9};
Bᶜ = {1, 3, 5, …};
A − B = {11, 13, 15, …};
and B − A = {2, 4, 6, 8}.

The three basic set operations can be combined to form other sets. The following properties of set operations are useful in deriving new expressions for combinations of sets:

Commutative properties:

A ∪ B = B ∪ A and A ∩ B = B ∩ A.  (2.1)

Associative properties:

A ∪ (B ∪ C) = (A ∪ B) ∪ C and A ∩ (B ∩ C) = (A ∩ B) ∩ C.  (2.2)

Distributive properties:

A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C) and A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C).  (2.3)

By applying the above properties we can derive new identities. DeMorgan's rules provide an important such example:

DeMorgan's rules:

(A ∪ B)ᶜ = Aᶜ ∩ Bᶜ and (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ.  (2.4)

Example 2.6

Prove DeMorgan's rules by using Venn diagrams and by demonstrating set equality.
First we will use a Venn diagram to show the first equality. The shaded region in Fig. 2.2(g) shows the complement of A ∪ B, the left-hand side of the equation. The cross-hatched region in Fig. 2.2(h) shows the intersection of Aᶜ and Bᶜ. The two regions are the same and so the sets are equal. Try sketching the Venn diagrams for the second equality in Eq. (2.4).
Next we prove DeMorgan's rules by proving set equality. The proof has two parts: First we show that (A ∪ B)ᶜ ⊂ Aᶜ ∩ Bᶜ; then we show that Aᶜ ∩ Bᶜ ⊂ (A ∪ B)ᶜ. Together these results imply (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ.
First, suppose that x ∈ (A ∪ B)ᶜ; then x ∉ A ∪ B. In particular, we have x ∉ A, which implies x ∈ Aᶜ. Similarly, we have x ∉ B, which implies x ∈ Bᶜ. Hence x is in both Aᶜ and Bᶜ, that is, x ∈ Aᶜ ∩ Bᶜ. We have shown that (A ∪ B)ᶜ ⊂ Aᶜ ∩ Bᶜ.
To prove inclusion in the other direction, suppose that x ∈ Aᶜ ∩ Bᶜ. This implies that x ∈ Aᶜ, so x ∉ A. Similarly, x ∈ Bᶜ and so x ∉ B. Therefore, x ∉ A ∪ B and so x ∈ (A ∪ B)ᶜ. We have shown that Aᶜ ∩ Bᶜ ⊂ (A ∪ B)ᶜ. This proves that (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ.
To prove the second DeMorgan rule, apply the first DeMorgan rule to Aᶜ and Bᶜ to obtain:

(Aᶜ ∪ Bᶜ)ᶜ = (Aᶜ)ᶜ ∩ (Bᶜ)ᶜ = A ∩ B,

where we used the identity A = (Aᶜ)ᶜ. Now take complements of both sides of the above equation:

Aᶜ ∪ Bᶜ = (A ∩ B)ᶜ.
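DeMorgan's rules are also easy to spot-check on a finite universal set, with the complement taken relative to U. A sketch of our own (the particular sets are invented for illustration):

```python
U = set(range(1, 21))
A = {x for x in U if x >= 10}
B = {x for x in U if x % 2 == 0}

def complement(s):
    """Complement relative to the universal set U."""
    return U - s

# First rule: (A ∪ B)^c = A^c ∩ B^c
print(complement(A | B) == complement(A) & complement(B))   # True

# Second rule: (A ∩ B)^c = A^c ∪ B^c
print(complement(A & B) == complement(A) | complement(B))   # True
```

A check like this is not a proof, of course; it only confirms the identities for one choice of sets, while the set-equality argument above establishes them in general.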
Example 2.7

For Experiment E10, let the sets A, B, and C be defined by

A = {v : |v| > 10},  "magnitude of v is greater than 10 volts,"
B = {v : v < −5},  "v is less than −5 volts,"
C = {v : v > 0},  "v is positive."

You should then verify that

A ∪ B = {v : v < −5 or v > 10},
A ∩ B = {v : v < −10},
Cᶜ = {v : v ≤ 0},
(A ∪ B) ∩ C = {v : v > 10},
A ∩ B ∩ C = ∅, and
(A ∪ B)ᶜ = {v : −5 ≤ v ≤ 10}.

The union and intersection operations can be repeated for an arbitrary number of sets. Thus the union of n sets

⋃_{k=1}^{n} Ak = A1 ∪ A2 ∪ … ∪ An  (2.5)

is the set that consists of all elements that are in Ak for at least one value of k. The same definition applies to the union of a countably infinite sequence of sets:

⋃_{k=1}^{∞} Ak.  (2.6)

The intersection of n sets

⋂_{k=1}^{n} Ak = A1 ∩ A2 ∩ … ∩ An  (2.7)

is the set that consists of elements that are in all of the sets A1, …, An. The same definition applies to the intersection of a countably infinite sequence of sets:

⋂_{k=1}^{∞} Ak.  (2.8)

We will see that countable unions and intersections of sets are essential in dealing with sample spaces that are not finite.

2.1.4 Event Classes

We have introduced the sample space S as the set of all possible outcomes of the random experiment. We have also introduced events as subsets of S. Probability theory also requires that we state the class F of events of interest. Only events in this class are assigned probabilities. We expect that any set operation on events in F will produce a set that is also an event in F. In particular, we insist that complements, as well as countable unions and intersections of events in F, i.e., Eqs. (2.1) and (2.5) through (2.8), result in events in F. When the sample space S is finite or countable, we simply let F consist of all subsets of S and we can proceed without further concerns about F.
However, when S is the real line R (or an interval of the real line), we cannot let F be all possible subsets of R and still satisfy the axioms of probability. Fortunately, we can obtain all the events of practical interest by letting F be the class of events obtained as complements and countable unions and intersections of intervals of the real line, e.g., (a, b] or (−∞, b]. We will refer to this class of events as the Borel field. In the remainder of the book, we will refer to the event class F from time to time. For the introductory-level course in probability you will not need to know more than what is stated in this paragraph.
When we speak of a class of events we are referring to a collection (set) of events (sets), that is, we are speaking of a "set of sets." We refer to the collection of sets as a class to remind us that the elements of the class are sets. We use script capital letters to refer to a class, e.g., C, F, G. If the class C consists of the collection of sets A1, …, Ak, then we write C = {A1, …, Ak}.

Example 2.8

Let S = {T, H} be the outcome of a coin toss. Let every subset of S be an event. Find all possible events of S.
An event is a subset of S, so we need to find all possible subsets of S. These are:

𝒮 = {∅, {H}, {T}, {H, T}}.

Note that 𝒮 includes both the empty set and S. Let iT and iH be binary numbers where i = 1 indicates that the corresponding element of S is in a given subset. We generate all possible subsets by taking all possible values of the pair iT and iH. Thus iT = 0, iH = 1 corresponds to the set {H}. Clearly there are 2² possible subsets as listed above.

For a finite sample space, S = {1, 2, …, k},² we usually allow all subsets of S to be events. This class of events is called the power set of S and we will denote it by 𝒮. We can index all possible subsets of S with binary numbers i1, i2, …, ik, and we find that the power set of S has 2ᵏ members. Because of this, the power set is also denoted by 𝒮 = 2ˢ.
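The binary-indexing idea in Example 2.8 carries over directly to code: each k-bit integer selects one subset, so counting from 0 to 2ᵏ − 1 enumerates the power set. A sketch of our own (the helper name is invented):

```python
def power_set(elements):
    """Enumerate all subsets of a finite set with k-bit binary indices,
    as in Example 2.8: bit j of the index says whether element j is included."""
    elems = list(elements)
    k = len(elems)
    subsets = []
    for index in range(2 ** k):
        subset = frozenset(e for j, e in enumerate(elems) if (index >> j) & 1)
        subsets.append(subset)
    return subsets

subsets = power_set(["T", "H"])
print(len(subsets))                      # 4, i.e., 2^2, as in Example 2.8
print(frozenset() in subsets)            # True: the empty set is a subset
print(frozenset({"T", "H"}) in subsets)  # True: S itself is a subset
```

For a k-element sample space the loop runs 2ᵏ times, matching the count of power-set members given in the text.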
Section 2.8 discusses some of the fine points on event classes.

² The discussion applies to any finite sample space with arbitrary objects S = {x1, …, xk}, but we consider {1, 2, …, k} for notational simplicity.

2.2 THE AXIOMS OF PROBABILITY

Probabilities are numbers assigned to events that indicate how "likely" it is that the events will occur when an experiment is performed. A probability law for a random experiment is a rule that assigns probabilities to the events of the experiment that belong to the event class F. Thus a probability law is a function that assigns a number to sets (events). In Section 1.3 we found a number of properties of relative frequency that any definition of probability should satisfy. The axioms of probability formally state that a probability law must satisfy these properties. In this section, we develop a number of results that follow from this set of axioms.
Let E be a random experiment with sample space S and event class F. A probability law for the experiment E is a rule that assigns to each event A ∈ F a number P[A], called the probability of A, that satisfies the following axioms:

Axiom I: 0 ≤ P[A].
Axiom II: P[S] = 1.
Axiom III: If A ∩ B = ∅, then P[A ∪ B] = P[A] + P[B].
Axiom III′: If A1, A2, … is a sequence of events such that Ai ∩ Aj = ∅ for all i ≠ j, then

P[⋃_{k=1}^{∞} Ak] = Σ_{k=1}^{∞} P[Ak].

Axioms I, II, and III are enough to deal with experiments with finite sample spaces. In order to handle experiments with infinite sample spaces, Axiom III needs to be replaced by Axiom III′. Note that Axiom III′ includes Axiom III as a special case, by letting Ak = ∅ for k ≥ 3. Thus we really only need Axioms I, II, and III′. Nevertheless we will gain greater insight by starting with Axioms I, II, and III.
The axioms allow us to view events as objects possessing a property (i.e., their probability) that has attributes similar to physical mass.
Axiom I states that the probability (mass) is nonnegative, and Axiom II states that there is a fixed total amount of probability (mass), namely 1 unit. Axiom III states that the total probability (mass) in two disjoint objects is the sum of the individual probabilities (masses).
The axioms provide us with a set of consistency rules that any valid probability assignment must satisfy. We now develop several properties stemming from the axioms that are useful in the computation of probabilities.
The first result states that if we partition the sample space into two mutually exclusive events, A and Aᶜ, then the probabilities of these two events add up to one.

Corollary 1. P[Aᶜ] = 1 − P[A]

Proof: Since an event A and its complement Aᶜ are mutually exclusive, A ∩ Aᶜ = ∅, we have from Axiom III that

P[A ∪ Aᶜ] = P[A] + P[Aᶜ].

Since S = A ∪ Aᶜ, by Axiom II,

1 = P[S] = P[A ∪ Aᶜ] = P[A] + P[Aᶜ].

The corollary follows after solving for P[Aᶜ].

The next corollary states that the probability of an event is always less than or equal to one. Corollary 2 combined with Axiom I provides good checks in problem solving: If your probabilities are negative or are greater than one, you have made a mistake somewhere!

Corollary 2. P[A] ≤ 1

Proof: From Corollary 1, P[A] = 1 − P[Aᶜ] ≤ 1, since P[Aᶜ] ≥ 0.

Corollary 3 states that the impossible event has probability zero.

Corollary 3. P[∅] = 0

Proof: Let A = S and Aᶜ = ∅ in Corollary 1: P[∅] = 1 − P[S] = 0.

Corollary 4 provides us with the standard method for computing the probability of a complicated event A. The method involves decomposing the event A into the union of disjoint events A1, A2, …, An. The probability of A is the sum of the probabilities of the Ak's.

Corollary 4. If A1, A2, …, An are pairwise mutually exclusive, then

P[⋃_{k=1}^{n} Ak] = Σ_{k=1}^{n} P[Ak]  for n ≥ 2.

Proof: We use mathematical induction. Axiom III implies that the result is true for n = 2.
Next we need to show that if the result is true for some n, then it is also true for n + 1. This, combined with the fact that the result is true for n = 2, implies that the result is true for n ≥ 2.
Suppose that the result is true for some n > 2; that is,

P[⋃_{k=1}^{n} Ak] = Σ_{k=1}^{n} P[Ak],  (2.9)

and consider the n + 1 case

P[⋃_{k=1}^{n+1} Ak] = P[{⋃_{k=1}^{n} Ak} ∪ An+1] = P[⋃_{k=1}^{n} Ak] + P[An+1],  (2.10)

where we have applied Axiom III to the second expression after noting that the union of events A1 to An is mutually exclusive with An+1. The distributive property then implies

{⋃_{k=1}^{n} Ak} ∩ An+1 = ⋃_{k=1}^{n} {Ak ∩ An+1} = ⋃_{k=1}^{n} ∅ = ∅.

Substitution of Eq. (2.9) into Eq. (2.10) gives the n + 1 case

P[⋃_{k=1}^{n+1} Ak] = Σ_{k=1}^{n+1} P[Ak].

Corollary 5 gives an expression for the union of two events that are not necessarily mutually exclusive.

Corollary 5. P[A ∪ B] = P[A] + P[B] − P[A ∩ B]

Proof: First we decompose A ∪ B, A, and B as unions of disjoint events. From the Venn diagram in Fig. 2.3,

P[A ∪ B] = P[A ∩ Bᶜ] + P[B ∩ Aᶜ] + P[A ∩ B],
P[A] = P[A ∩ Bᶜ] + P[A ∩ B],
P[B] = P[B ∩ Aᶜ] + P[A ∩ B].

By substituting P[A ∩ Bᶜ] and P[B ∩ Aᶜ] from the two lower equations into the top equation, we obtain the corollary.

By looking at the Venn diagram in Fig. 2.3, you will see that the sum P[A] + P[B] counts the probability (mass) of the set A ∩ B twice. The expression in Corollary 5 makes the appropriate correction. Corollary 5 is easily generalized to three events,

P[A ∪ B ∪ C] = P[A] + P[B] + P[C] − P[A ∩ B] − P[A ∩ C] − P[B ∩ C] + P[A ∩ B ∩ C],  (2.11)

and in general to n events, as shown in Corollary 6.

[Figure 2.3: Decomposition of A ∪ B into the three disjoint sets A ∩ Bᶜ, A ∩ B, and Aᶜ ∩ B.]

Corollary 6.

P[⋃_{k=1}^{n} Ak] = Σ_{j=1}^{n} P[Aj] − Σ_{j<k} P[Aj ∩ Ak] + … + (−1)ⁿ⁺¹ P[A1 ∩ … ∩ An].

Proof is by induction (see Problems 2.26 and 2.27).
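For equally likely outcomes, Corollary 5 and the three-event case in Eq. (2.11) can be checked numerically, since P[A] is just |A|/|S|. A sketch with sets of our own choosing:

```python
from fractions import Fraction

S = set(range(1, 13))                  # twelve equally likely outcomes
def prob(event):
    """P[A] = |A| / |S| for equally likely outcomes."""
    return Fraction(len(event), len(S))

A = {x for x in S if x % 2 == 0}       # multiples of 2
B = {x for x in S if x % 3 == 0}       # multiples of 3
C = {1, 2, 3, 4}

# Corollary 5: P[A ∪ B] = P[A] + P[B] − P[A ∩ B]
print(prob(A | B) == prob(A) + prob(B) - prob(A & B))   # True

# Eq. (2.11), the three-event case:
lhs = prob(A | B | C)
rhs = (prob(A) + prob(B) + prob(C)
       - prob(A & B) - prob(A & C) - prob(B & C)
       + prob(A & B & C))
print(lhs == rhs, lhs)                 # True 3/4
```

Counting the outcomes by hand confirms the correction term: |A| = 6, |B| = 4, and |A ∩ B| = 2, so P[A ∪ B] = (6 + 4 − 2)/12 = 2/3, in agreement with the 8 elements of A ∪ B.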
Since probabilities are nonnegative, Corollary 5 implies that the probability of the union of two events is no greater than the sum of the individual event probabilities:

P[A ∪ B] ≤ P[A] + P[B].  (2.12)

The above inequality is a special case of the fact that a subset of another set must have smaller probability. This result is frequently used to obtain upper bounds for probabilities of interest. In the typical situation, we are interested in an event A whose probability is difficult to find; so we find an event B for which the probability can be found and that includes A as a subset.

Corollary 7: If A ⊂ B, then P[A] ≤ P[B].

Proof: In Fig. 2.4, B is the union of A and A^c ∩ B, thus

P[B] = P[A] + P[A^c ∩ B] ≥ P[A],

since P[A^c ∩ B] ≥ 0.

FIGURE 2.4 If A ⊂ B, then P[A] ≤ P[B].

The axioms together with the corollaries provide us with a set of rules for computing the probability of certain events in terms of other events. However, we still need an initial probability assignment for some basic set of events from which the probability of all other events can be computed. This problem is dealt with in the next two subsections.

2.2.1 Discrete Sample Spaces

In this section we show that the probability law for an experiment with a countable sample space can be specified by giving the probabilities of the elementary events. First, suppose that the sample space is finite, S = {a_1, a_2, …, a_n}, and let F consist of all subsets of S. All distinct elementary events are mutually exclusive, so by Corollary 4 the probability of any event B = {a′_1, a′_2, …, a′_m} is given by

P[B] = P[{a′_1, a′_2, …, a′_m}] = P[{a′_1}] + P[{a′_2}] + ⋯ + P[{a′_m}];  (2.13)

that is, the probability of an event is equal to the sum of the probabilities of the outcomes in the event. Thus we conclude that the probability law for a random experiment with a finite sample space is specified by giving the probabilities of the elementary events.
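Equation (2.13) is easy to exercise numerically: once the elementary probabilities are given, every event probability follows by summation. In this sketch the outcome names and their probabilities are an arbitrary assignment invented for illustration:

```python
# Hypothetical probabilities of the elementary events (they sum to 1).
p = {"a1": 0.5, "a2": 0.25, "a3": 0.15, "a4": 0.10}
assert abs(sum(p.values()) - 1.0) < 1e-12

def prob(event):
    """Eq. (2.13): P[B] is the sum of the elementary probabilities
    of the outcomes in B."""
    return sum(p[outcome] for outcome in event)

P_B = prob({"a1", "a3"})                  # P[{a1, a3}] = 0.5 + 0.15
assert abs(P_B - 0.65) < 1e-12
assert abs(prob(set(p)) - 1.0) < 1e-12    # Axiom II: P[S] = 1
```

Note that the assignment need not be equiprobable; any nonnegative numbers summing to one define a valid probability law on a finite sample space.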
If the sample space has n elements, S = {a_1, …, a_n}, a probability assignment of particular interest is the case of equally likely outcomes. The probability of the elementary events is

P[{a_1}] = P[{a_2}] = ⋯ = P[{a_n}] = 1/n.  (2.14)

The probability of any event that consists of k outcomes, say B = {a′_1, …, a′_k}, is

P[B] = P[{a′_1}] + ⋯ + P[{a′_k}] = k/n.  (2.15)

Thus if outcomes are equally likely, then the probability of an event is equal to the number of outcomes in the event divided by the total number of outcomes in the sample space. Section 2.3 discusses counting methods that are useful in finding probabilities in experiments that have equally likely outcomes.

Consider the case where the sample space is countably infinite, S = {a_1, a_2, …}. Let the event class F be the class of all subsets of S. Note that F must now satisfy Eq. (2.8) because events can consist of countable unions of sets. Axiom III′ implies that the probability of an event such as D = {b_1, b_2, b_3, …} is given by

P[D] = P[{b_1, b_2, b_3, …}] = P[{b_1}] + P[{b_2}] + P[{b_3}] + ⋯

The probability of an event with a countably infinite sample space is determined from the probabilities of the elementary events.

Example 2.9
An urn contains 10 identical balls numbered 0, 1, …, 9. A random experiment involves selecting a ball from the urn and noting the number of the ball. Find the probability of the following events: A = "number of ball selected is odd," B = "number of ball selected is a multiple of 3," C = "number of ball selected is less than 5," and of A ∪ B and A ∪ B ∪ C.

The sample space is S = {0, 1, …, 9}, so the sets of outcomes corresponding to the above events are

A = {1, 3, 5, 7, 9},  B = {3, 6, 9},  and  C = {0, 1, 2, 3, 4}.

If we assume that the outcomes are equally likely, then

P[A] = P[{1}] + P[{3}] + P[{5}] + P[{7}] + P[{9}] = 5/10

P[B] = P[{3}] + P[{6}] + P[{9}] = 3/10.
P[C] = P[{0}] + P[{1}] + P[{2}] + P[{3}] + P[{4}] = 5/10.

From Corollary 5,

P[A ∪ B] = P[A] + P[B] − P[A ∩ B] = 5/10 + 3/10 − 2/10 = 6/10,

where we have used the fact that A ∩ B = {3, 9}, so P[A ∩ B] = 2/10. From Corollary 6,

P[A ∪ B ∪ C] = P[A] + P[B] + P[C] − P[A ∩ B] − P[A ∩ C] − P[B ∩ C] + P[A ∩ B ∩ C]
= 5/10 + 3/10 + 5/10 − 2/10 − 2/10 − 1/10 + 1/10
= 9/10.

You should verify the answers for P[A ∪ B] and P[A ∪ B ∪ C] by enumerating the outcomes in the events.

Many probability models can be devised for the same sample space and events by varying the probability assignment; in the case of finite sample spaces all we need to do is come up with n nonnegative numbers that add up to one for the probabilities of the elementary events. Of course, in any particular situation, the probability assignment should be selected to reflect experimental observations to the extent possible. The following example shows that situations can arise where there is more than one "reasonable" probability assignment and where experimental evidence is required to decide on the appropriate assignment.

Example 2.10
Suppose that a coin is tossed three times. If we observe the sequence of heads and tails, then there are eight possible outcomes:

S_3 = {HHH, HHT, HTH, THH, TTH, THT, HTT, TTT}.

If we assume that the outcomes of S_3 are equiprobable, then the probability of each of the eight elementary events is 1/8. This probability assignment implies that the probability of obtaining two heads in three tosses is, by Corollary 4,

P["2 heads in 3 tosses"] = P[{HHT, HTH, THH}] = P[{HHT}] + P[{HTH}] + P[{THH}] = 3/8.

Now suppose that we toss a coin three times but we count the number of heads in three tosses instead of observing the sequence of heads and tails. The sample space is now S_4 = {0, 1, 2, 3}. If we assume the outcomes of S_4 to be equiprobable, then each of the elementary events of S_4 has probability 1/4.
This second probability assignment predicts that the probability of obtaining two heads in three tosses is

P["2 heads in 3 tosses"] = P[{2}] = 1/4.

The first probability assignment implies that the probability of two heads in three tosses is 3/8, and the second probability assignment predicts that the probability is 1/4. Thus the two assignments are not consistent with each other. As far as the theory is concerned, either one of the assignments is acceptable. It is up to us to decide which assignment is more appropriate. Later in the chapter we will see that only the first assignment is consistent with the assumption that the coin is fair and that the tosses are "independent." This assignment correctly predicts the relative frequencies that would be observed in an actual coin tossing experiment.

Finally we consider an example with a countably infinite sample space.

Example 2.11
A fair coin is tossed repeatedly until the first heads shows up; the outcome of the experiment is the number of tosses required until the first heads occurs. Find a probability law for this experiment.

It is conceivable that an arbitrarily large number of tosses will be required until heads occurs, so the sample space is S = {1, 2, 3, …}. Suppose the experiment is repeated n times. Let N_j be the number of trials in which the jth toss results in the first heads. If n is very large, we expect N_1 to be approximately n/2 since the coin is fair. This implies that a second toss is necessary about n − N_1 ≈ n/2 times, and again we expect that about half of these (that is, n/4) will result in heads, and so on, as shown in Fig. 2.5. Thus for large n, the relative frequencies are

f_j ≈ N_j/n = (1/2)^j,  j = 1, 2, ….

We therefore conclude that a reasonable probability law for this experiment is

P[j tosses till first heads] = (1/2)^j,  j = 1, 2, ….  (2.16)

We can verify that these probabilities add up to one by using the geometric series with a = 1/2:

Σ_{j=1}^∞ a^j = a/(1 − a) = 1  for a = 1/2.
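The relative-frequency argument in Example 2.11 can be mimicked by simulation. The sketch below repeats the toss-until-first-heads experiment many times and compares the observed frequencies with Eq. (2.16); the number of repetitions and the seed are arbitrary choices for this illustration:

```python
import random

random.seed(7)

def tosses_until_first_heads():
    """Toss a fair coin until the first heads; return the toss count."""
    j = 1
    while random.random() >= 0.5:   # each toss is heads with probability 1/2
        j += 1
    return j

n = 100_000
counts = {}
for _ in range(n):
    j = tosses_until_first_heads()
    counts[j] = counts.get(j, 0) + 1

# Relative frequencies should be close to (1/2)^j, per Eq. (2.16).
for j in range(1, 6):
    f_j = counts.get(j, 0) / n
    assert abs(f_j - 0.5 ** j) < 0.01
```

With 100,000 repetitions the frequency of "first heads on toss j" typically lands within a fraction of a percent of (1/2)^j for small j, in line with the relative-frequency interpretation of (2.16).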
2.2.2 Continuous Sample Spaces

Continuous sample spaces arise in experiments in which the outcomes are numbers that can assume a continuum of values, so we let the sample space S be the entire real line R (or some interval of the real line). We could consider letting the event class consist of all subsets of R. But it turns out that this class is "too large" and it is impossible to assign probabilities to all the subsets of R.

FIGURE 2.5 In n trials heads comes up in the first toss approximately n/2 times, in the second toss approximately n/4 times, and so on.

Fortunately, it is possible to assign probabilities to all events in a smaller class that includes all events of practical interest. This class, denoted by B, is called the Borel field, and it contains all open and closed intervals of the real line as well as all events that can be obtained as countable unions, intersections, and complements.³ Axiom III′ is once again the key to calculating probabilities of events. Let A_1, A_2, … be a sequence of mutually exclusive events that are represented by intervals of the real line; then

P[⋃_{k=1}^∞ A_k] = Σ_{k=1}^∞ P[A_k],

where each P[A_k] is specified by the probability law. For this reason, probability laws in experiments with continuous sample spaces specify a rule for assigning numbers to intervals of the real line.

Example 2.12
Consider the random experiment "pick a number x at random between zero and one." The sample space S for this experiment is the unit interval [0, 1], which is uncountably infinite.
If we suppose that all the outcomes in S are equally likely to be selected, then we would guess that the probability that the outcome is in the interval [0, 1/2] is the same as the probability that the outcome is in the interval [1/2, 1]. We would also guess that the probability of the outcome being exactly equal to 1/2 would be zero, since there are an uncountably infinite number of equally likely outcomes.

Consider the following probability law: "The probability that the outcome falls in a subinterval of S is equal to the length of the subinterval," that is,

P[[a, b]] = (b − a)  for 0 ≤ a ≤ b ≤ 1,  (2.17)

where by P[[a, b]] we mean the probability of the event corresponding to the interval [a, b]. Clearly, Axiom I is satisfied since b ≥ a ≥ 0. Axiom II follows from S = [a, b] with a = 0 and b = 1. We now show that the probability law is consistent with the previous guesses about the probabilities of the events [0, 1/2], [1/2, 1], and {1/2}:

P[[0, 0.5]] = 0.5 − 0 = 0.5
P[[0.5, 1]] = 1 − 0.5 = 0.5.

In addition, if x_0 is any point in S, then P[[x_0, x_0]] = 0 since individual points have zero width.

Now suppose that we are interested in an event that is the union of several intervals; for example, "the outcome is at least 0.3 away from the center of the unit interval," that is, A = [0, 0.2] ∪ [0.8, 1]. Since the two intervals are disjoint, we have by Axiom III

P[A] = P[[0, 0.2]] + P[[0.8, 1]] = 0.4.

The next example shows that an initial probability assignment that specifies the probability of semi-infinite intervals also suffices to specify the probabilities of all events of interest.

Example 2.13
Suppose that the lifetime of a computer memory chip is measured, and we find that "the proportion of chips whose lifetime exceeds t decreases exponentially at a rate α." Find an appropriate probability law.

Let the sample space in this experiment be S = (0, ∞).

³ Section 2.9 discusses B in more detail.
If we interpret the above finding as "the probability that a chip's lifetime exceeds t decreases exponentially at a rate α," we then obtain the following assignment of probabilities to events of the form (t, ∞):

P[(t, ∞)] = e^(−αt)  for t > 0,  (2.18)

where α > 0. Note that the exponential is a number between 0 and 1 for t > 0, so Axiom I is satisfied. Axiom II is satisfied since P[S] = P[(0, ∞)] = 1. The probability that the lifetime is in the interval (r, s] is found by noting in Fig. 2.6 that (r, s] ∪ (s, ∞) = (r, ∞), so by Axiom III,

P[(r, ∞)] = P[(r, s]] + P[(s, ∞)].

FIGURE 2.6 (r, ∞) = (r, s] ∪ (s, ∞).

By rearranging the above equation we obtain

P[(r, s]] = P[(r, ∞)] − P[(s, ∞)] = e^(−αr) − e^(−αs).

We thus obtain the probability of arbitrary intervals in S.

In both Example 2.12 and Example 2.13, the probability that the outcome takes on a specific value is zero. You may ask: If an outcome (or event) has probability zero, doesn't that mean it cannot occur? And you may then ask: How can all the outcomes in a sample space have probability zero? We can explain this paradox by using the relative frequency interpretation of probability. An event that occurs only once in an infinite number of trials will have relative frequency zero. Hence the fact that an event or outcome has relative frequency zero does not imply that it cannot occur, but rather that it occurs very infrequently. In the case of continuous sample spaces, the set of possible outcomes is so rich that all outcomes occur infrequently enough that their relative frequencies are zero.

We end this section with an example where the events are regions in the plane.

Example 2.14
Consider Experiment E_12, where we picked two numbers x and y at random between zero and one. The sample space is then the unit square shown in Fig. 2.7(a).
If we suppose that all pairs of numbers in the unit square are equally likely to be selected, then it is reasonable to use a probability assignment in which the probability of any region R inside the unit square is equal to the area of R. Find the probability of the following events: A = {x > 0.5}, B = {y > 0.5}, and C = {x > y}.

FIGURE 2.7 A two-dimensional sample space and three events: (a) the sample space S, (b) the event {x > 1/2}, (c) the event {y > 1/2}, and (d) the event {x > y}.

Figures 2.7(b) through 2.7(d) show the regions corresponding to the events A, B, and C. Clearly each of these regions has area 1/2. Thus

P[A] = 1/2,  P[B] = 1/2,  P[C] = 1/2.

We reiterate how to proceed from a problem statement to its probability model. The problem statement implicitly or explicitly defines a random experiment, which specifies an experimental procedure and a set of measurements and observations. These measurements and observations determine the set of all possible outcomes and hence the sample space S. An initial probability assignment that specifies the probability of certain events must be determined next. This probability assignment must satisfy the axioms of probability. If S is discrete, then it suffices to specify the probabilities of elementary events. If S is continuous, it suffices to specify the probabilities of intervals of the real line or regions of the plane. The probability of other events of interest can then be determined from the initial probability assignment and the axioms of probability and their corollaries. Many probability assignments are possible, so the choice of probability assignment must reflect experimental observations and/or previous experience.

*2.3 COMPUTING PROBABILITIES USING COUNTING METHODS⁴

In many experiments with finite sample spaces, the outcomes can be assumed to be equiprobable.
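The area-based probabilities in Example 2.14 can be spot-checked by Monte Carlo sampling of the unit square. In this sketch the sample size and seed are arbitrary choices; the relative frequency of each region should approach its area of 1/2:

```python
import random

random.seed(1)
n = 200_000
hits_A = hits_B = hits_C = 0
for _ in range(n):
    # A point picked at random in the unit square.
    x, y = random.random(), random.random()
    hits_A += x > 0.5     # event A = {x > 0.5}
    hits_B += y > 0.5     # event B = {y > 0.5}
    hits_C += x > y       # event C = {x > y}

# Each region has area 1/2, so each relative frequency should be near 0.5.
for hits in (hits_A, hits_B, hits_C):
    assert abs(hits / n - 0.5) < 0.01
```

The same sampling scheme estimates the probability of any region R in the unit square as the fraction of points falling in R, which is exactly the "probability = area" assignment of Example 2.14.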
The probability of an event is then the ratio of the number of outcomes in the event of interest to the total number of outcomes in the sample space (Eq. (2.15)). The calculation of probabilities reduces to counting the number of outcomes in an event. In this section, we develop several useful counting (combinatorial) formulas.

Suppose that a multiple-choice test has k questions and that for question i the student must select one of n_i possible answers. What is the total number of ways of answering the entire test? The answer to question i can be viewed as specifying the ith component of a k-tuple, so the above question is equivalent to: How many distinct ordered k-tuples (x_1, …, x_k) are possible if x_i is an element from a set with n_i distinct elements?

Consider the k = 2 case. If we arrange all possible choices for x_1 and for x_2 along the sides of a table as shown in Fig. 2.8, we see that there are n_1·n_2 distinct ordered pairs. For triplets we could arrange the n_1·n_2 possible pairs (x_1, x_2) along the vertical side of the table and the n_3 choices for x_3 along the horizontal side. Clearly, the number of possible triplets is n_1·n_2·n_3. In general, the number of distinct ordered k-tuples (x_1, …, x_k) with components x_i from a set with n_i distinct elements is

number of distinct ordered k-tuples = n_1·n_2 ⋯ n_k.  (2.19)

Many counting problems can be posed as sampling problems where we select "balls" from "urns" or "objects" from "populations." We will now use Eq. (2.19) to develop combinatorial formulas for various types of sampling.

⁴ This section and all sections marked with an asterisk may be skipped without loss of continuity.

FIGURE 2.8 If there are n_1 distinct choices for x_1 and n_2 distinct choices for x_2, then there are n_1·n_2 distinct ordered pairs (x_1, x_2).
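Equation (2.19) can be confirmed by direct enumeration. The sketch below uses illustrative answer-set sizes (n_1 = 4, n_2 = 3, n_3 = 2, invented for this example) for a three-question test:

```python
from itertools import product

# A 3-question test with n1 = 4, n2 = 3, n3 = 2 possible answers
# per question (illustrative sizes, not from the text).
choices = [range(4), range(3), range(2)]

# Enumerate every distinct ordered 3-tuple (x1, x2, x3).
tuples = list(product(*choices))

# Eq. (2.19): the count equals n1 * n2 * n3.
assert len(tuples) == 4 * 3 * 2
assert len(set(tuples)) == len(tuples)   # all tuples are distinct
```

`itertools.product` generates exactly the table construction of Fig. 2.8 extended to k components, so its output size is the product n_1·n_2 ⋯ n_k.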
2.3.1 Sampling with Replacement and with Ordering

Suppose we choose k objects from a set A that has n distinct objects, with replacement; that is, after selecting an object and noting its identity in an ordered list, the object is placed back in the set before the next choice is made. We will refer to the set A as the "population." The experiment produces an ordered k-tuple (x_1, …, x_k), where x_i ∈ A for i = 1, …, k. Equation (2.19) with n_1 = n_2 = ⋯ = n_k = n implies that

number of distinct ordered k-tuples = n^k.  (2.20)

Example 2.15
An urn contains five balls numbered 1 to 5. Suppose we select two balls from the urn with replacement. How many distinct ordered pairs are possible? What is the probability that the two draws yield the same number?

Equation (2.20) states that the number of ordered pairs is 5^2 = 25. Table 2.1 shows the 25 possible pairs. Five of the 25 outcomes have the two draws yielding the same number; if we suppose that all pairs are equiprobable, then the probability that the two draws yield the same number is 5/25 = 0.2.

2.3.2 Sampling without Replacement and with Ordering

Suppose we choose k objects in succession without replacement from a population A of n distinct objects. Clearly, k ≤ n. The number of possible outcomes in the first draw is n_1 = n; the number of possible outcomes in the second draw is n_2 = n − 1, namely all n objects except the one selected in the first draw; and so on, up to n_k = n − (k − 1) in the final draw. Equation (2.19) then gives

number of distinct ordered k-tuples = n(n − 1) ⋯ (n − k + 1).  (2.21)

TABLE 2.1 Enumeration of possible outcomes in various types of sampling of two balls from an urn containing five distinct balls.

(a) Ordered pairs for sampling with replacement.
(1, 1) (1, 2) (1, 3) (1, 4) (1, 5)
(2, 1) (2, 2) (2, 3) (2, 4) (2, 5)
(3, 1) (3, 2) (3, 3) (3, 4) (3, 5)
(4, 1) (4, 2) (4, 3) (4, 4) (4, 5)
(5, 1) (5, 2) (5, 3) (5, 4) (5, 5)

(b) Ordered pairs for sampling without replacement.

(1, 2) (1, 3) (1, 4) (1, 5)
(2, 1) (2, 3) (2, 4) (2, 5)
(3, 1) (3, 2) (3, 4) (3, 5)
(4, 1) (4, 2) (4, 3) (4, 5)
(5, 1) (5, 2) (5, 3) (5, 4)

(c) Pairs for sampling without replacement or ordering.

(1, 2) (1, 3) (1, 4) (1, 5)
(2, 3) (2, 4) (2, 5)
(3, 4) (3, 5)
(4, 5)

Example 2.16
An urn contains five balls numbered 1 to 5. Suppose we select two balls in succession without replacement. How many distinct ordered pairs are possible? What is the probability that the first ball has a number larger than that of the second ball?

Equation (2.21) states that the number of ordered pairs is 5(4) = 20. The 20 possible ordered pairs are shown in Table 2.1(b). Ten ordered pairs in Table 2.1(b) have the first number larger than the second number; thus the probability of this event is 10/20 = 1/2.

Example 2.17
An urn contains five balls numbered 1, 2, …, 5. Suppose we draw three balls with replacement. What is the probability that all three balls are different?

From Eq. (2.20) there are 5^3 = 125 possible outcomes, which we will suppose are equiprobable. The number of these outcomes for which the three draws are different is given by Eq. (2.21): 5(4)(3) = 60. Thus the probability that all three balls are different is 60/125 = 0.48.

2.3.3 Permutations of n Distinct Objects

Consider sampling without replacement with k = n. This is simply drawing objects from an urn containing n distinct objects until the urn is empty. Thus, the number of possible orderings (arrangements, permutations) of n distinct objects is equal to the number of ordered n-tuples in sampling without replacement with k = n. From Eq. (2.21), we have

number of permutations of n objects = n(n − 1) ⋯ (2)(1) ≡ n!.  (2.22)

We refer to n! as n factorial.
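The urn calculations in Examples 2.15 through 2.17 can be reproduced by enumerating the sample spaces directly, which also regenerates Tables 2.1(a) and 2.1(b):

```python
from itertools import product, permutations

balls = [1, 2, 3, 4, 5]

# Example 2.15: two draws with replacement -> 5^2 = 25 ordered pairs.
pairs_with = list(product(balls, repeat=2))
assert len(pairs_with) == 25
same = [p for p in pairs_with if p[0] == p[1]]
assert len(same) / len(pairs_with) == 5 / 25        # P[same number] = 0.2

# Example 2.16: two draws without replacement -> 5*4 = 20 ordered pairs.
pairs_without = list(permutations(balls, 2))
assert len(pairs_without) == 20
larger_first = [p for p in pairs_without if p[0] > p[1]]
assert len(larger_first) / len(pairs_without) == 0.5

# Example 2.17: three draws with replacement, all three different.
triples = list(product(balls, repeat=3))
distinct = [t for t in triples if len(set(t)) == 3]
assert len(distinct) == 60                          # Eq. (2.21): 5*4*3
assert len(distinct) / len(triples) == 60 / 125     # = 0.48
```

`itertools.product` models sampling with replacement and with ordering (Eq. (2.20)); `itertools.permutations` models sampling without replacement and with ordering (Eq. (2.21)).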
We will see that n! appears in many of the combinatorial formulas. For large n, Stirling's formula is very useful:

n! ∼ √(2π) n^(n + 1/2) e^(−n),  (2.23)

where the sign ∼ indicates that the ratio of the two sides tends to unity as n → ∞ [Feller, p. 52].

Example 2.18
Find the number of permutations of three distinct objects {1, 2, 3}. Equation (2.22) gives 3! = 3(2)(1) = 6. The six permutations are

123  132  213  231  312  321.

Example 2.19
Suppose that 12 balls are placed at random into 12 cells, where more than 1 ball is allowed to occupy a cell. What is the probability that all cells are occupied?

The placement of each ball into a cell can be viewed as the selection of a cell number between 1 and 12. Equation (2.20) implies that there are 12^12 possible placements of the 12 balls in the 12 cells. In order for all cells to be occupied, the first ball can select any of the 12 cells, the second ball any of the remaining 11 cells, and so on. Thus the number of placements that occupy all cells is 12!. If we suppose that all 12^12 possible placements are equiprobable, we find that the probability that all cells are occupied is

12!/12^12 = (12/12)(11/12) ⋯ (1/12) = 5.37(10^−5).

This answer is surprising if we reinterpret the question as follows. Given that 12 airplane crashes occur at random in a year, what is the probability that there is exactly 1 crash each month? The above result shows that this probability is very small. Thus a model that assumes that crashes occur randomly in time does not predict that they tend to occur uniformly over time [Feller, p. 32].

2.3.4 Sampling without Replacement and without Ordering

Suppose we pick k objects from a set of n distinct objects without replacement and that we record the result without regard to order. (You can imagine putting each selected object into another jar, so that when the k selections are completed we have no record of the order in which the selection was done.)
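Both Stirling's approximation (2.23) and the occupancy probability of Example 2.19 are quick to check numerically:

```python
import math

# Example 2.19: probability that 12 balls dropped at random into 12 cells
# occupy every cell is 12!/12^12.
p_all_occupied = math.factorial(12) / 12**12
assert abs(p_all_occupied - 5.37e-5) < 1e-7

# Stirling's formula (2.23): n! ~ sqrt(2*pi) * n^(n + 1/2) * e^(-n).
def stirling(n):
    return math.sqrt(2 * math.pi) * n ** (n + 0.5) * math.exp(-n)

# The ratio of the approximation to n! tends to 1 as n grows.
assert abs(stirling(50) / math.factorial(50) - 1.0) < 0.01
```

The relative error of Stirling's formula shrinks roughly like 1/(12n), so even at n = 50 the approximation is already within a fraction of a percent.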
We call the resulting subset of k selected objects a "combination of size k."

From Eq. (2.22), there are k! possible orders in which the k objects in the second jar could have been selected. Thus if C_k^n denotes the number of combinations of size k from a set of size n, then C_k^n k! must be the total number of distinct ordered samples of k objects, which is given by Eq. (2.21). Thus

C_k^n k! = n(n − 1) ⋯ (n − k + 1),  (2.24)

and the number of different combinations of size k from a set of size n, k ≤ n, is

C_k^n = n(n − 1) ⋯ (n − k + 1) / k! = n! / (k! (n − k)!) ≡ (n choose k).  (2.25)

The expression (n choose k) is called a binomial coefficient and is read "n choose k." Note that choosing k objects out of a set of n is equivalent to choosing the n − k objects that are to be left out. It then follows that (also see Problem 2.60):

(n choose k) = (n choose n − k).

Example 2.20
Find the number of ways of selecting two objects from A = {1, 2, 3, 4, 5} without regard to order. Equation (2.25) gives

(5 choose 2) = 5!/(2! 3!) = 10.

Table 2.1(c) gives the 10 pairs.

Example 2.21
Find the number of distinct permutations of k white balls and n − k black balls.

This problem is equivalent to the following sampling problem: Put n tokens numbered 1 to n in an urn, where each token represents a position in the arrangement of balls; pick a combination of k tokens and put the k white balls in the corresponding positions. Each combination of size k leads to a distinct arrangement (permutation) of k white balls and n − k black balls. Thus the number of distinct permutations of k white balls and n − k black balls is C_k^n.

As a specific example let n = 4 and k = 2. The number of combinations of size 2 from a set of four distinct objects is

(4 choose 2) = 4!/(2! 2!) = 4(3)/2(1) = 6.

The 6 distinct permutations with 2 whites (zeros) and 2 blacks (ones) are

1100  0110  0011  1001  1010  0101.

Example 2.22 Quality Control
A batch of 50 items contains 10 defective items.
Suppose 10 items are selected at random and tested. What is the probability that exactly 5 of the items tested are defective?

The number of ways of selecting 10 items out of a batch of 50 is the number of combinations of size 10 from a set of 50 objects:

(50 choose 10) = 50!/(10! 40!).

The number of ways of selecting 5 defective and 5 nondefective items from the batch of 50 is the product N_1 N_2, where N_1 is the number of ways of selecting the 5 items from the set of 10 defective items, and N_2 is the number of ways of selecting 5 items from the 40 nondefective items. Thus the probability that exactly 5 tested items are defective is

(10 choose 5)(40 choose 5) / (50 choose 10) = [10!/(5! 5!)] [40!/(5! 35!)] [10! 40!/50!] ≈ 0.016.

Example 2.21 shows that sampling without replacement and without ordering is equivalent to partitioning the set of n distinct objects into two sets: B, containing the k items that are picked from the urn, and B^c, containing the n − k left behind. Suppose we partition a set of n distinct objects into J subsets B_1, B_2, …, B_J, where B_j is assigned k_j elements and k_1 + k_2 + ⋯ + k_J = n. In Problem 2.61, it is shown that the number of distinct partitions is

n! / (k_1! k_2! ⋯ k_J!).  (2.26)

Equation (2.26) is called the multinomial coefficient. The binomial coefficient is the J = 2 case of the multinomial coefficient.

Example 2.23
A six-sided die is tossed 12 times. How many distinct sequences of faces (numbers from the set {1, 2, 3, 4, 5, 6}) have each number appearing exactly twice? What is the probability of obtaining such a sequence?

The number of distinct sequences in which each face of the die appears exactly twice is the same as the number of partitions of the set {1, 2, …, 12} into 6 subsets of size 2, namely

12! / (2! 2! 2! 2! 2! 2!) = 12!/2^6 = 7,484,400.

From Eq. (2.20) we have that there are 6^12 possible outcomes in 12 tosses of a die.
If we suppose that all of these have equal probabilities, then the probability of obtaining a sequence in which each face appears exactly twice is

(12!/2^6) / 6^12 = 7,484,400 / 2,176,782,336 ≈ 3.4(10^−3).

2.3.5 Sampling with Replacement and without Ordering

Suppose we pick k objects from a set of n distinct objects with replacement and we record the result without regard to order. This can be done by filling out a form which has n columns, one for each distinct object. Each time an object is selected, an "x" is placed in the corresponding column. For example, if we are picking 5 objects from 4 distinct objects, one possible form would look like this:

Object 1 | Object 2 | Object 3 | Object 4
   xx    |          |    x     |    xx

Note that this form can be summarized by the sequence

xx//x/xx

where the slash symbol ("/") is used to separate the entries for different columns: the n − 1 /'s indicate the lines between columns, and nothing appears between consecutive /'s if the corresponding object was not selected. Each different arrangement of 5 x's and 3 /'s leads to a distinct form. If we identify x's with "white balls" and /'s with "black balls," then this problem was considered in Example 2.21, and the number of different arrangements is given by (8 choose 3). In the general case the form will involve k x's and n − 1 /'s. Thus the number of different ways of picking k objects from a set of n distinct objects with replacement and without ordering is given by

(n − 1 + k choose k) = (n − 1 + k choose n − 1).

2.4 CONDITIONAL PROBABILITY

Quite often we are interested in determining whether two events, A and B, are related in the sense that knowledge about the occurrence of one, say B, alters the likelihood of occurrence of the other, A. This requires that we find the conditional probability, P[A|B], of event A given that event B has occurred. The conditional probability is defined by

P[A|B] = P[A ∩ B] / P[B]  for P[B] > 0.
(2.27)

Knowledge that event B has occurred implies that the outcome of the experiment is in the set B. In computing P[A|B] we can therefore view the experiment as now having the reduced sample space B, as shown in Fig. 2.9. The event A occurs in the reduced sample space if and only if the outcome ζ is in A ∩ B. Equation (2.27) simply renormalizes the probability of events that occur jointly with B. Thus if we let A = B, Eq. (2.27) gives P[B|B] = 1, as required. It is easy to show that P[A|B], for fixed B, satisfies the axioms of probability. (See Problem 2.74.)

If we interpret probability as relative frequency, then P[A|B] should be the relative frequency of the event A ∩ B in experiments where B occurred. Suppose that the experiment is performed n times, and suppose that event B occurs n_B times, and that event A ∩ B occurs n_{A∩B} times. The relative frequency of interest is then

n_{A∩B}/n_B = (n_{A∩B}/n) / (n_B/n) → P[A ∩ B] / P[B],

where we have implicitly assumed that P[B] > 0. This is in agreement with Eq. (2.27).

FIGURE 2.9 If B is known to have occurred, then A can occur only if A ∩ B occurs.

Example 2.24
A ball is selected from an urn containing two black balls, numbered 1 and 2, and two white balls, numbered 3 and 4. The number and color of the ball is noted, so the sample space is {(1, b), (2, b), (3, w), (4, w)}. Assuming that the four outcomes are equally likely, find P[A|B] and P[A|C], where A, B, and C are the following events:

A = {(1, b), (2, b)}, "black ball selected,"
B = {(2, b), (4, w)}, "even-numbered ball selected," and
C = {(3, w), (4, w)}, "number of ball is greater than 2."

Since P[A ∩ B] = P[{(2, b)}] and P[A ∩ C] = P[∅] = 0, Eq. (2.27) gives

P[A|B] = P[A ∩ B]/P[B] = 0.25/0.5 = 0.5 = P[A]
P[A|C] = P[A ∩ C]/P[C] = 0/0.5 = 0 ≠ P[A].

In the first case, knowledge of B did not alter the probability of A. In the second case, knowledge of C implied that A had not occurred.
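The definition (2.27) applied to Example 2.24 can be checked by enumerating the four equiprobable outcomes:

```python
# Example 2.24: four equiprobable outcomes (number, color).
S = {(1, "b"), (2, "b"), (3, "w"), (4, "w")}
P = lambda E: len(E & S) / len(S)

A = {(1, "b"), (2, "b")}            # "black ball selected"
B = {(2, "b"), (4, "w")}            # "even-numbered ball selected"
C = {(3, "w"), (4, "w")}            # "number of ball is greater than 2"

def cond(A, B):
    """Eq. (2.27): P[A | B] = P[A ∩ B] / P[B], for P[B] > 0."""
    return P(A & B) / P(B)

assert cond(A, B) == 0.5 == P(A)    # knowing B does not alter P[A]
assert cond(A, C) == 0.0            # knowing C rules A out
```

The conditioning step is just a renormalization within the reduced sample space B, which is why `cond(A, B)` divides the joint count by the count of B.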
If we multiply both sides of the definition of P[A|B] by P[B] we obtain

P[A ∩ B] = P[A|B] P[B].  (2.28a)

Similarly we also have that

P[A ∩ B] = P[B|A] P[A].  (2.28b)

In the next example we show how this equation is useful in finding probabilities in sequential experiments. The example also introduces a tree diagram that facilitates the calculation of probabilities.

Example 2.25
An urn contains two black balls and three white balls. Two balls are selected at random from the urn without replacement and the sequence of colors is noted. Find the probability that both balls are black.

This experiment consists of a sequence of two subexperiments. We can imagine working our way down the tree shown in Fig. 2.10 from the topmost node to one of the bottom nodes: We reach node 1 in the tree if the outcome of the first draw is a black ball; then the next subexperiment consists of selecting a ball from an urn containing one black ball and three white balls. On the other hand, if the outcome of the first draw is white, then we reach node 2 in the tree and the second subexperiment consists of selecting a ball from an urn that contains two black balls and two white balls.

Thus if we know which node is reached after the first draw, then we can state the probabilities of the outcome in the next subexperiment. Let B_1 and B_2 be the events that the outcome is a black ball in the first and second draw, respectively. From Eq. (2.28b) we have

P[B_1 ∩ B_2] = P[B_2|B_1] P[B_1].

In terms of the tree diagram in Fig. 2.10, P[B_1] is the probability of reaching node 1 and P[B_2|B_1] is the probability of reaching the leftmost bottom node from node 1. Now P[B_1] = 2/5 since the first draw is from an urn containing two black balls and three white balls; P[B_2|B_1] = 1/4 since, given B_1, the second draw is from an urn containing one black ball and three white balls. Thus

P[B_1 ∩ B_2] = (1/4)(2/5) = 1/10.
45 10 In general, the probability of any sequence of colors is obtained by multiplying the probabilities corresponding to the node transitions in the tree in Fig. 2.10. 0 B1 2 5 3 5 W1 1 B2 1 10 1 4 Outcome of first draw 2 3 4 W2 3 10 B2 3 10 2 4 2 4 W2 Outcome of second draw 3 10 FIGURE 2.10 The paths from the top node to a bottom node correspond to the possible outcomes in the drawing of two balls from an urn without replacement. The probability of a path is the product of the probabilities in the associated transitions. 50 Chapter 2 Basic Concepts of Probability Theory Example 2.26 Binary Communication System Many communication systems can be modeled in the following way. First, the user inputs a 0 or a 1 into the system, and a corresponding signal is transmitted. Second, the receiver makes a decision about what was the input to the system, based on the signal it received. Suppose that the user sends 0s with probability 1 - p and 1s with probability p, and suppose that the receiver makes random decision errors with probability e. For i = 0, 1, let A i be the event “input was i,” and let Bi be the event “receiver decision was i.” Find the probabilities P3A i ¨ Bj4 for i = 0, 1 and j = 0, 1. The tree diagram for this experiment is shown in Fig. 2.11. We then readily obtain the desired probabilities P3A 0 ¨ B04 = 11 - p211 - e2, P3A 0 ¨ B14 = 11 - p2e, P3A 1 ¨ B04 = pe, and P3A 1 ¨ B14 = p11 - e2. Let B1 , B2 , Á , Bn be mutually exclusive events whose union equals the sample space S as shown in Fig. 2.12. We refer to these sets as a partition of S. Any event A can be represented as the union of mutually exclusive events in the following way: A = A ¨ S = A ¨ 1B1 ´ B2 ´ Á ´ Bn2 = 1A ¨ B12 ´ 1A ¨ B22 ´ Á ´ 1A ¨ Bn2. (See Fig. 2.12.) By Corollary 4, the probability of A is P3A4 = P3A ¨ B14 + P3A ¨ B24 + Á + P3A ¨ Bn4. By applying Eq. 
(2.28a) to each of the terms on the right-hand side, we obtain the theorem on total probability:

P[A] = P[A|B_1]P[B_1] + P[A|B_2]P[B_2] + ... + P[A|B_n]P[B_n]. (2.29)

This result is particularly useful when the experiments can be viewed as consisting of a sequence of two subexperiments as shown in the tree diagram in Fig. 2.10.

FIGURE 2.11 Probabilities of input-output pairs in a binary transmission system.

FIGURE 2.12 A partition of S into n disjoint sets.

Example 2.27
In the experiment discussed in Example 2.25, find the probability of the event W_2 that the second ball is white.

The events B_1 = {(b, b), (b, w)} and W_1 = {(w, b), (w, w)} form a partition of the sample space, so applying Eq. (2.29) we have

P[W_2] = P[W_2|B_1]P[B_1] + P[W_2|W_1]P[W_1] = (3/4)(2/5) + (2/4)(3/5) = 3/5.

It is interesting to note that this is the same as the probability of selecting a white ball in the first draw. The result makes sense because we are computing the probability of a white ball in the second draw under the assumption that we have no knowledge of the outcome of the first draw.

Example 2.28
A manufacturing process produces a mix of "good" memory chips and "bad" memory chips. The lifetime of good chips follows the exponential law introduced in Example 2.13, with a rate of failure α. The lifetime of bad chips also follows the exponential law, but the rate of failure is 1000α. Suppose that the fraction of good chips is 1 - p and of bad chips, p. Find the probability that a randomly selected chip is still functioning after t seconds.
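The theorem on total probability is easy to mechanize: it is a weighted sum of conditional probabilities over a partition. A minimal sketch (the helper name is ours), checked against Example 2.27:

```python
from fractions import Fraction

# Eq. (2.29), the theorem on total probability.
# partition: list of (P[A|B_k], P[B_k]) pairs, where B_1, ..., B_n partition S.
def total_probability(partition):
    # Sanity check: the probabilities of the partition must sum to 1.
    assert sum(p_b for _, p_b in partition) == 1
    return sum(p_a_given_b * p_b for p_a_given_b, p_b in partition)

# Example 2.27: P[W2] = P[W2|B1]P[B1] + P[W2|W1]P[W1] = (3/4)(2/5) + (2/4)(3/5).
pW2 = total_probability([(Fraction(3, 4), Fraction(2, 5)),
                         (Fraction(2, 4), Fraction(3, 5))])
print(pW2)   # 3/5
```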
Let C be the event "chip still functioning after t seconds," let G be the event "chip is good," and let B be the event "chip is bad." By the theorem on total probability we have

P[C] = P[C|G]P[G] + P[C|B]P[B] = P[C|G](1 - p) + P[C|B]p = (1 - p)e^{-αt} + pe^{-1000αt},

where we used the fact that P[C|G] = e^{-αt} and P[C|B] = e^{-1000αt}.

2.4.1 Bayes' Rule

Let B_1, B_2, ..., B_n be a partition of a sample space S. Suppose that event A occurs; what is the probability of event B_j? By the definition of conditional probability we have

P[B_j|A] = P[A ∩ B_j]/P[A] = P[A|B_j]P[B_j] / Σ_{k=1}^{n} P[A|B_k]P[B_k], (2.30)

where we used the theorem on total probability to replace P[A]. Equation (2.30) is called Bayes' rule.

Bayes' rule is often applied in the following situation. We have some random experiment in which the events of interest form a partition. The "a priori probabilities" of these events, P[B_j], are the probabilities of the events before the experiment is performed. Now suppose that the experiment is performed, and we are informed that event A occurred; the "a posteriori probabilities" are the probabilities of the events in the partition, P[B_j|A], given this additional information. The following two examples illustrate this situation.

Example 2.29 Binary Communication System
In the binary communication system in Example 2.26, find which input is more probable given that the receiver has output a 1. Assume that, a priori, the input is equally likely to be 0 or 1.

Let A_k be the event that the input was k, k = 0, 1; then A_0 and A_1 are a partition of the sample space of input-output pairs. Let B_1 be the event "receiver output was a 1." The probability of B_1 is

P[B_1] = P[B_1|A_0]P[A_0] + P[B_1|A_1]P[A_1] = ε(1/2) + (1 - ε)(1/2) = 1/2.

Applying Bayes' rule, we obtain the a posteriori probabilities

P[A_0|B_1] = P[B_1|A_0]P[A_0]/P[B_1] = (ε/2)/(1/2) = ε
P[A_1|B_1] = P[B_1|A_1]P[A_1]/P[B_1] = ((1 - ε)/2)/(1/2) = 1 - ε.
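The posterior computation in Example 2.29 can be reproduced numerically. A short sketch (the function name is ours; ε = 1/10 is just an illustrative value):

```python
from fractions import Fraction

# Example 2.29: equally likely inputs, crossover probability eps.
def posterior_input1_given_output1(eps):
    p0, p1 = Fraction(1, 2), Fraction(1, 2)   # a priori probabilities
    pB1 = eps * p0 + (1 - eps) * p1           # total probability, P[B1]
    return (1 - eps) * p1 / pB1               # Bayes' rule, P[A1|B1]

eps = Fraction(1, 10)
print(posterior_input1_given_output1(eps))    # 9/10, i.e., 1 - eps
```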
Thus, if ε is less than 1/2, then input 1 is more likely than input 0 when a 1 is observed at the output of the channel.

Example 2.30 Quality Control
Consider the memory chips discussed in Example 2.28. Recall that a fraction p of the chips are bad and tend to fail much more quickly than good chips. Suppose that in order to "weed out" the bad chips, every chip is tested for t seconds prior to leaving the factory. The chips that fail are discarded and the remaining chips are sent out to customers. Find the value of t for which 99% of the chips sent out to customers are good.

Let C be the event "chip still functioning after t seconds," let G be the event "chip is good," and let B be the event "chip is bad." The problem requires that we find the value of t for which P[G|C] = .99. We find P[G|C] by applying Bayes' rule:

P[G|C] = P[C|G]P[G] / (P[C|G]P[G] + P[C|B]P[B])
       = (1 - p)e^{-αt} / ((1 - p)e^{-αt} + pe^{-1000αt})
       = 1 / (1 + pe^{-1000αt}/((1 - p)e^{-αt})) = .99.

The above equation can then be solved for t:

t = (1/(999α)) ln(99p/(1 - p)).

For example, if 1/α = 20,000 hours and p = .10, then t = 48 hours.

2.5 INDEPENDENCE OF EVENTS

If knowledge of the occurrence of an event B does not alter the probability of some other event A, then it would be natural to say that event A is independent of B. In terms of probabilities this situation occurs when

P[A] = P[A|B] = P[A ∩ B]/P[B].

The above equation has the problem that the right-hand side is not defined when P[B] = 0. We will define two events A and B to be independent if

P[A ∩ B] = P[A]P[B]. (2.31)

Equation (2.31) then implies both

P[A|B] = P[A] (2.32a)

and

P[B|A] = P[B]. (2.32b)

Note also that Eq. (2.32a) implies Eq. (2.31) when P[B] ≠ 0 and Eq. (2.32b) implies Eq. (2.31) when P[A] ≠ 0.

Example 2.31
A ball is selected from an urn containing two black balls, numbered 1 and 2, and two white balls, numbered 3 and 4.
Let the events A, B, and C be defined as follows: A = 511, b2, 12, b26, “black ball selected”; B = 512, b2, 14, w26, “even-numbered ball selected”; and C = 513, w2, 14, w26, “number of ball is greater than 2.” Are events A and B independent? Are events A and C independent? First, consider events A and B. The probabilities required by Eq. (2.31) are P3A4 = P3B4 = 1 , 2 and P3A ¨ B4 = P3512, b264 = 1 . 4 Thus P3A ¨ B4 = 1 = P3A4P3B4, 4 and the events A and B are independent. Equation (2.32b) gives more insight into the meaning of independence: P3A ƒ B4 = P3A4 = P3A ¨ B4 P3B4 P3A4 P3S4 = = P3512, b264 P3512, b2, 14, w264 = P3511, b2, 12, b264 1/4 1 = 1/2 2 P3511, b2, 12, b2, 13, w2, 14, w264 = 1/2 . 1 These two equations imply that P3A4 = P3A ƒ B4 because the proportion of outcomes in S that lead to the occurrence of A is equal to the proportion of outcomes in B that lead to A. Thus knowledge of the occurrence of B does not alter the probability of the occurrence of A. Events A and C are not independent since P3A ¨ C4 = P34 = 0 so P3A ƒ C4 = 0 Z P3A4 = .5. In fact, A and C are mutually exclusive since A ¨ C = , so the occurrence of C implies that A has definitely not occurred. In general if two events have nonzero probability and are mutually exclusive, then they cannot be independent. For suppose they were independent and mutually exclusive; then 0 = P3A ¨ B4 = P3A4P3B4, which implies that at least one of the events must have zero probability. Section 2.5 Independence of Events 55 Example 2.32 Two numbers x and y are selected at random between zero and one. Let the events A, B, and C be defined as follows: A = 5x 7 0.56, B = 5y 7 0.56, and C = 5x 7 y6. Are the events A and B independent? Are A and C independent? Figure 2.13 shows the regions of the unit square that correspond to the above events. Using Eq. (2.32a), we have P3A ƒ B4 = P3A ¨ B4 P3B4 = 1/4 1 = = P3A4, 1/2 2 so events A and B are independent. 
Again we have that the “proportion” of outcomes in S leading to A is equal to the “proportion” in B that lead to A. Using Eq. (2.32b), we have P3A ƒ C4 = P3A ¨ C4 P3C4 = 3/8 3 1 = Z = P3A4, 1/2 4 2 so events A and C are not independent. Indeed from Fig. 2.13(b) we can see that knowledge of the fact that x is greater than y increases the probability that x is greater than 0.5. What conditions should three events A, B, and C satisfy in order for them to be independent? First, they should be pairwise independent, that is, P3A ¨ B4 = P3A4P3B4, P3A ¨ C4 = P3A4P3C4, and P3B ¨ C4 = P3B4P3C4. y 1 B 1 2 A x 1 1 2 (a) Events A and B are independent. 0 y 1 A C x 1 1 2 (b) Events A and C are not independent. 0 FIGURE 2.13 Examples of independent and nonindependent events. 56 Chapter 2 Basic Concepts of Probability Theory In addition, knowledge of the joint occurrence of any two, say A and B, should not affect the probability of the third, that is, P3C ƒ A ¨ B4 = P3C4. In order for this to hold, we must have P3C ƒ A ¨ B4 = P3A ¨ B ¨ C4 P3A ¨ B4 = P3C4. This in turn implies that we must have P3A ¨ B ¨ C4 = P3A ¨ B4P3C4 = P3A4P3B4P3C4, where we have used the fact that A and B are pairwise independent. Thus we conclude that three events A, B, and C are independent if the probability of the intersection of any pair or triplet of events is equal to the product of the probabilities of the individual events. The following example shows that if three events are pairwise independent, it does not necessarily follow that P3A ¨ B ¨ C4 = P3A4P3B4P3C4. Example 2.33 Consider the experiment discussed in Example 2.32 where two numbers are selected at random from the unit interval. Let the events B, D, and F be defined as follows: B = ey 7 1 f, 2 F = ex 6 1 1 1 1 and y 6 f ´ e x 7 and y 7 f. 2 2 2 2 D = ex 6 1 f 2 The three events are shown in Fig. 2.14. 
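Because each of these events depends only on which quadrant of the unit square (x, y) falls in, and the four quadrants are equally likely, the claims about B, D, and F can be checked by finite enumeration. A sketch (the encoding of outcomes is ours):

```python
from fractions import Fraction
from itertools import product

# Represent an outcome by (x_low, y_low): True if that coordinate is < 1/2.
# The four quadrants of the unit square are equally likely, P = 1/4 each.
quadrants = list(product([True, False], repeat=2))

def prob(event):
    return Fraction(sum(1 for q in quadrants if event(q)), len(quadrants))

B = lambda q: not q[1]        # y > 1/2
D = lambda q: q[0]            # x < 1/2
F = lambda q: q[0] == q[1]    # both coordinates < 1/2, or both > 1/2

for E1, E2 in [(B, D), (B, F), (D, F)]:
    pair = lambda q: E1(q) and E2(q)
    assert prob(pair) == prob(E1) * prob(E2)   # pairwise independent

triple = lambda q: B(q) and D(q) and F(q)
print(prob(triple))   # 0, not 1/8 = P[B]P[D]P[F]
```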
It can be easily verified that any pair of these events is independent:

P[B ∩ D] = 1/4 = P[B]P[D],
P[B ∩ F] = 1/4 = P[B]P[F], and
P[D ∩ F] = 1/4 = P[D]P[F].

However, the three events are not independent, since B ∩ D ∩ F = ∅, so

P[B ∩ D ∩ F] = P[∅] = 0 ≠ P[B]P[D]P[F] = 1/8.

In order for a set of n events to be independent, the probability of an event should be unchanged when we are given the joint occurrence of any subset of the other events. This requirement naturally leads to the following definition of independence. The events A_1, A_2, ..., A_n are said to be independent if for k = 2, ..., n,

P[A_{i1} ∩ A_{i2} ∩ ... ∩ A_{ik}] = P[A_{i1}]P[A_{i2}] ... P[A_{ik}], (2.33)

where 1 ≤ i1 < i2 < ... < ik ≤ n.

FIGURE 2.14 Events B, D, and F are pairwise independent, but the triplet B, D, F are not independent events. (a) B = {y > 1/2}; (b) D = {x < 1/2}; (c) F = {x < 1/2 and y < 1/2} ∪ {x > 1/2 and y > 1/2}.

For a set of n events we need to verify that the probabilities of all 2^n - n - 1 possible intersections factor in the right way. The above definition of independence appears quite cumbersome because it requires that so many conditions be verified. However, the most common application of the independence concept is in making the assumption that the events of separate experiments are independent. We refer to such experiments as independent experiments. For example, it is common to assume that the outcome of a coin toss is independent of the outcomes of all prior and all subsequent coin tosses.

Example 2.34
Suppose a fair coin is tossed three times and we observe the resulting sequence of heads and tails. Find the probability of the elementary events.

The sample space of this experiment is S = {HHH, HHT, HTH, THH, TTH, THT, HTT, TTT}. The assumption that the coin is fair means that the outcomes of a single toss are equiprobable, that is, P[H] = P[T] = 1/2.
If we assume that the outcomes of the coin tosses are independent, then 1 , 8 1 P35HHT64 = P35H64P35H64P35T64 = , 8 P35HHH64 = P35H64P35H64P35H64 = 58 Chapter 2 Basic Concepts of Probability Theory 1 , 8 1 P35THH64 = P35T64P35H64P35H64 = , 8 1 P35TTH64 = P35T64P35T64P35H64 = , 8 1 P35THT64 = P35T64P35H64P35T64 = , 8 1 P35HTT64 = P35H64P35T64P35T64 = , and 8 1 P35TTT64 = P35T64P35T64P35T64 = . 8 P35HTH64 = P35H64P35T64P35H64 = Example 2.35 System Reliability A system consists of a controller and three peripheral units. The system is said to be “up” if the controller and at least two of the peripherals are functioning. Find the probability that the system is up, assuming that all components fail independently. Define the following events: A is “controller is functioning” and Bi is “peripheral i is functioning” where i = 1, 2, 3. The event F, “two or more peripheral units are functioning,” occurs if all three units are functioning or if exactly two units are functioning. Thus F = 1B1 ¨ B2 ¨ Bc32 ´ 1B1 ¨ Bc2 ¨ B32 ´ 1Bc1 ¨ B2 ¨ B32 ´ 1B1 ¨ B2 ¨ B32. Note that the events in the above union are mutually exclusive. Thus P3F4 = P3B14P3B24P3Bc34 + P3B14P3Bc24P3B34 + P3Bc14P3B24P3B34 + P3B14P3B24P3B34 = 311 - a22a + 11 - a23, where we have assumed that each peripheral fails with probability a, so that P3Bi4 = 1 - a and P3Bci 4 = a. The event “system is up” is then A ¨ F. If we assume that the controller fails with probability p, then P3“system up”4 = P3A ¨ F4 = P3A4P3F4 = 11 - p2P3F4 = 11 - p25311 - a22a + 11 - a236. Let a = 10%, then all three peripherals are functioning 11 - a23 = 72.9% of the time and two are functioning and one is “down” 311 - a22a = 24.3% of the time. Thus two or more peripherals are functioning 97.2% of the time. Suppose that the controller is not very reliable, say p = 20%, then the system is up only 77.8% of the time, mostly because of controller failures. 
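The reliability computation in Example 2.35 can be scripted directly from the final expression. A sketch (the function name is ours; p = 20% and a = 10% are the values used in the example):

```python
# Example 2.35: P["system up"] = (1 - p) * (3(1 - a)^2 a + (1 - a)^3),
# where p is the controller failure probability and a the peripheral one.
def p_system_up(p, a):
    p_two_or_more = 3 * (1 - a) ** 2 * a + (1 - a) ** 3
    return (1 - p) * p_two_or_more

print(round(p_system_up(0.20, 0.10), 4))   # 0.7776, i.e., up 77.8% of the time
```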
Suppose a second identical controller with p = 20% is added to the system, and that the system is “up” if at least one of the controllers is functioning and if two or more of the peripherals are functioning. In Problem 2.94, you are asked to show that at least one of the controllers is Section 2.6 Sequential Experiments 59 functioning 96% of the time, and that the system is up 93.3% of the time. This is an increase of 16% over the system with a single controller. 2.6 SEQUENTIAL EXPERIMENTS Many random experiments can be viewed as sequential experiments that consist of a sequence of simpler subexperiments. These subexperiments may or may not be independent. In this section we discuss methods for obtaining the probabilities of events in sequential experiments. 2.6.1 Sequences of Independent Experiments Suppose that a random experiment consists of performing experiments E1 , E2 , Á , En . The outcome of this experiment will then be an n-tuple s = 1s1 , Á , sn2, where sk is the outcome of the kth subexperiment. The sample space of the sequential experiment is defined as the set that contains the above n-tuples and is denoted by the Cartesian product of the individual sample spaces S1 * S2 * Á * Sn . We can usually determine, because of physical considerations, when the subexperiments are independent, in the sense that the outcome of any given subexperiment cannot affect the outcomes of the other subexperiments. Let A 1 , A 2 , Á , A n be events such that A k concerns only the outcome of the kth subexperiment. If the subexperiments are independent, then it is reasonable to assume that the above events A 1 , A 2 , Á , A n are independent. Thus P3A 1 ¨ A 2 ¨ Á ¨ A n4 = P3A 14P3A 24 Á P3A n4. (2.34) This expression allows us to compute all probabilities of events of the sequential experiment. Example 2.36 Suppose that 10 numbers are selected at random from the interval [0, 1]. 
Find the probability that the first 5 numbers are less than 1/4 and the last 5 numbers are greater than 1/2. Let x1 , x2 , Á , x10 be the sequence of 10 numbers, then the events of interest are Ak = e xk 6 1 f 4 for k = 1, Á , 5 Ak = e xk 7 1 f 2 for k = 6, Á , 10. If we assume that each selection of a number is independent of the other selections, then P3A 1 ¨ A 2 ¨ Á ¨ A 104 = P3A 14P3A 24 Á P3A 104 1 5 1 5 = a b a b . 4 2 We will now derive several important models for experiments that consist of sequences of independent subexperiments. 60 2.6.2 Chapter 2 Basic Concepts of Probability Theory The Binomial Probability Law A Bernoulli trial involves performing an experiment once and noting whether a particular event A occurs. The outcome of the Bernoulli trial is said to be a “success” if A occurs and a “failure” otherwise. In this section we are interested in finding the probability of k successes in n independent repetitions of a Bernoulli trial. We can view the outcome of a single Bernoulli trial as the outcome of a toss of a coin for which the probability of heads (success) is p = P3A4. The probability of k successes in n Bernoulli trials is then equal to the probability of k heads in n tosses of the coin. Example 2.37 Suppose that a coin is tossed three times. If we assume that the tosses are independent and the probability of heads is p, then the probability for the sequences of heads and tails is P35HHH64 = P35H64P35H64P35H64 = p3, P35HHT64 = P35H64P35H64P35T64 = p211 - p2, P35HTH64 = P35H64P35T64P35H64 = p211 - p2, P35THH64 = P35T64P35H64P35H64 = p211 - p2, P35TTH64 = P35T64P35T64P35H64 = p11 - p22, P35THT64 = P35T64P35H64P35T64 = p11 - p22, P35HTT64 = P35H64P35T64P35T64 = p11 - p22, and P35TTT64 = P35T64P35T64P35T64 = 11 - p23 where we used the fact that the tosses are independent. 
Let k be the number of heads in three trials; then

P[k = 0] = P[{TTT}] = (1 - p)^3,
P[k = 1] = P[{TTH, THT, HTT}] = 3p(1 - p)^2,
P[k = 2] = P[{HHT, HTH, THH}] = 3p^2(1 - p), and
P[k = 3] = P[{HHH}] = p^3.

The result in Example 2.37 is the n = 3 case of the binomial probability law.

Theorem
Let k be the number of successes in n independent Bernoulli trials; then the probabilities of k are given by the binomial probability law:

p_n(k) = C(n, k) p^k (1 - p)^{n-k}  for k = 0, ..., n,  (2.35)

where p_n(k) is the probability of k successes in n trials, and

C(n, k) = n! / (k!(n - k)!)  (2.36)

is the binomial coefficient.

The term n! in Eq. (2.36) is called n factorial and is defined by n! = n(n - 1)...(2)(1). By definition 0! is equal to 1.

We now prove the above theorem. Following Example 2.34 we see that each of the sequences with k successes and n - k failures has the same probability, namely p^k(1 - p)^{n-k}. Let N_n(k) be the number of distinct sequences that have k successes and n - k failures; then

p_n(k) = N_n(k) p^k (1 - p)^{n-k}.  (2.37)

The expression N_n(k) is the number of ways of picking k positions out of n for the successes. It can be shown that^5

N_n(k) = C(n, k).  (2.38)

The theorem follows by substituting Eq. (2.38) into Eq. (2.37).

Example 2.38
Verify that Eq. (2.35) gives the probabilities found in Example 2.37.

In Example 2.37, let "toss results in heads" correspond to a "success"; then

p_3(0) = [3!/(0! 3!)] p^0 (1 - p)^3 = (1 - p)^3,
p_3(1) = [3!/(1! 2!)] p^1 (1 - p)^2 = 3p(1 - p)^2,
p_3(2) = [3!/(2! 1!)] p^2 (1 - p)^1 = 3p^2(1 - p), and
p_3(3) = [3!/(3! 0!)] p^3 (1 - p)^0 = p^3,

which are in agreement with our previous results.

You were introduced to the binomial coefficient in an introductory calculus course when the binomial theorem was discussed:

(a + b)^n = Σ_{k=0}^{n} C(n, k) a^k b^{n-k}.  (2.39a)

^5 See Example 2.21.

If we let a = b = 1, then

2^n = Σ_{k=0}^{n} C(n, k) = Σ_{k=0}^{n} N_n(k),

which is in agreement with the fact that there are 2^n distinct possible sequences of successes and failures in n trials. If we let a = p and b = 1 - p in Eq. (2.39a), we then obtain

1 = Σ_{k=0}^{n} C(n, k) p^k (1 - p)^{n-k} = Σ_{k=0}^{n} p_n(k),  (2.39b)

which confirms that the binomial probabilities sum to 1.

The term n! grows very quickly with n, so numerical problems are encountered for relatively small values of n if one attempts to compute p_n(k) directly using Eq. (2.35). The following recursive formula avoids the direct evaluation of n! and thus extends the range of n for which p_n(k) can be computed before encountering numerical difficulties:

p_n(k + 1) = [(n - k)p / ((k + 1)(1 - p))] p_n(k).  (2.40)

Later in the book, we present two approximations for the binomial probabilities for the case when n is large.

Example 2.39
Let k be the number of active (nonsilent) speakers in a group of eight noninteracting (i.e., independent) speakers. Suppose that a speaker is active with probability 1/3. Find the probability that the number of active speakers is greater than six.

For i = 1, ..., 8, let A_i denote the event "ith speaker is active." The number of active speakers is then the number of successes in eight Bernoulli trials with p = 1/3. Thus the probability that more than six speakers are active is

P[k = 7] + P[k = 8] = C(8, 7)(1/3)^7(2/3) + C(8, 8)(1/3)^8 = .00244 + .00015 = .00259.

Example 2.40 Error Correction Coding
A communication system transmits binary information over a channel that introduces random bit errors with probability ε = 10^{-3}. The transmitter transmits each information bit three times, and a decoder takes a majority vote of the received bits to decide on what the transmitted bit was. Find the probability that the receiver will make an incorrect decision.
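Both the direct formula (2.35) and the recursion (2.40) are straightforward to code; the sketch below (function names are ours) reproduces the speaker count of Example 2.39:

```python
from math import comb

# Eq. (2.35): direct evaluation of the binomial pmf.
def binomial_pmf(n, k, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Eq. (2.40): recursive evaluation, avoiding large factorials.
def binomial_pmf_all(n, p):
    pk = (1 - p) ** n                      # p_n(0)
    probs = [pk]
    for k in range(n):
        pk *= (n - k) * p / ((k + 1) * (1 - p))
        probs.append(pk)                   # p_n(k + 1)
    return probs

# Example 2.39: eight speakers, each active with probability 1/3.
probs = binomial_pmf_all(8, 1 / 3)
print(round(probs[7] + probs[8], 5))       # 0.00259
```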
The receiver can correct a single error, but it will make the wrong decision if the channel introduces two or more errors. If we view each transmission as a Bernoulli trial in which a “success” corresponds to the introduction of an error, then the probability of two or more errors in three Bernoulli trials is 3 3 P3k Ú 24 = ¢ ≤ 1.001221.9992 + ¢ ≤ 1.00123 M 3110-62. 2 3 Section 2.6 2.6.3 Sequential Experiments 63 The Multinomial Probability Law The binomial probability law can be generalized to the case where we note the occurrence of more than one event. Let B1 , B2 , Á , BM be a partition of the sample space S of some random experiment and let P3Bj4 = pj . The events are mutually exclusive, so p1 + p2 + Á + pM = 1. Suppose that n independent repetitions of the experiment are performed. Let kj be the number of times event Bj occurs, then the vector 1k1 , k2 , Á , kM2 specifies the number of times each of the events Bj occurs. The probability of the vector 1k1 , Á , kM2 satisfies the multinomial probability law: P31k1 , k2 , Á , kM24 = n! k pk1pk2 Á pMM , k1! k2! Á kM! 1 2 (2.41) where k1 + k2 + Á + kM = n. The binomial probability law is the M = 2 case of the multinomial probability law. The derivation of the multinomial probabilities is identical to that of the binomial probabilities. We only need to note that the number of different sequences with k1 , k2 , Á , kM instances of the events B1 , B2 , Á , BM is given by the multinomial coefficient in Eq. (2.26). Example 2.41 A dart is thrown nine times at a target consisting of three areas. Each throw has a probability of .2, .3, and .5 of landing in areas 1, 2, and 3, respectively. Find the probability that the dart lands exactly three times in each of the areas. This experiment consists of nine independent repetitions of a subexperiment that has three possible outcomes. 
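The multinomial law of Eq. (2.41) can be evaluated directly; a minimal sketch (the function name is ours), applied to this dart example:

```python
from math import factorial

# Eq. (2.41): probability of the count vector (k1, ..., kM) in n trials.
def multinomial_prob(counts, probs):
    n = sum(counts)
    coeff = factorial(n)
    for k in counts:
        coeff //= factorial(k)             # multinomial coefficient
    result = float(coeff)
    for k, p in zip(counts, probs):
        result *= p ** k
    return result

# Example 2.41: nine throws, landing probabilities .2, .3, .5, three each.
print(round(multinomial_prob([3, 3, 3], [0.2, 0.3, 0.5]), 5))   # 0.04536
```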
The probability for the number of occurrences of each outcome is given by the multinomial probabilities with parameters n = 9 and p_1 = .2, p_2 = .3, and p_3 = .5:

P[(3, 3, 3)] = [9!/(3! 3! 3!)] (.2)^3(.3)^3(.5)^3 = .04536.

Example 2.42
Suppose we pick 10 telephone numbers at random from a telephone book and note the last digit in each of the numbers. What is the probability that we obtain each of the integers from 0 to 9 only once?

The probabilities for the number of occurrences of the integers are given by the multinomial probabilities with parameters M = 10, n = 10, and p_j = 1/10 if we assume that the 10 integers in the range 0 to 9 are equiprobable. The probability of obtaining each integer once in 10 draws is then

[10!/(1! 1! ... 1!)] (.1)^{10} ≈ 3.6(10^{-4}).

2.6.4 The Geometric Probability Law

Consider a sequential experiment in which we repeat independent Bernoulli trials until the occurrence of the first success. Let the outcome of this experiment be m, the number of trials carried out until the occurrence of the first success. The sample space for this experiment is the set of positive integers. The probability, p(m), that m trials are required is found by noting that this can happen only if the first m - 1 trials result in failures and the mth trial in success.^6 The probability of this event is

p(m) = P[A_1^c A_2^c ... A_{m-1}^c A_m] = (1 - p)^{m-1} p,  m = 1, 2, ...,  (2.42a)

where A_i is the event "success in ith trial." The probability assignment specified by Eq. (2.42a) is called the geometric probability law. The probabilities in Eq. (2.42a) sum to 1:

Σ_{m=1}^{∞} p(m) = p Σ_{m=1}^{∞} q^{m-1} = p · 1/(1 - q) = 1,  (2.42b)

where q = 1 - p, and where we have used the formula for the summation of a geometric series. The probability that more than K trials are required before a success occurs has a simple form:

P[{m > K}] = p Σ_{m=K+1}^{∞} q^{m-1} = pq^K Σ_{j=0}^{∞} q^j = pq^K · 1/(1 - q) = q^K.
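The geometric law and its tail probability translate into one-liners; a sketch (function names are ours), using q = .1 as in the retransmission example that follows:

```python
# Eq. (2.42a): geometric pmf, and the tail probability P[m > K] = q^K.
def geometric_pmf(m, p):
    """P[first success occurs on trial m], m = 1, 2, ..."""
    return (1 - p) ** (m - 1) * p

def prob_more_than_K(K, p):
    """P[m > K] = (1 - p)^K."""
    return (1 - p) ** K

# Transmission success probability p = .9 (error probability q = .1):
print(round(prob_more_than_K(2, 0.9), 4))   # 0.01
```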
(2.43) Example 2.43 Error Control by Retransmission Computer A sends a message to computer B over an unreliable radio link. The message is encoded so that B can detect when errors have been introduced into the message during transmission. If B detects an error, it requests A to retransmit it. If the probability of a message transmission error is q = .1, what is the probability that a message needs to be transmitted more than two times? Each transmission of a message is a Bernoulli trial with probability of success p = 1 - q. The Bernoulli trials are repeated until the first success (error-free transmission). The probability that more than two transmissions are required is given by Eq. (2.43): P3m 7 24 = q2 = 10-2. 2.6.5 Sequences of Dependent Experiments In this section we consider a sequence or “chain” of subexperiments in which the outcome of a given subexperiment determines which subexperiment is performed next. We first give a simple example of such an experiment and show how diagrams can be used to specify the sample space. Example 2.44 A sequential experiment involves repeatedly drawing a ball from one of two urns, noting the number on the ball, and replacing the ball in its urn. Urn 0 contains a ball with the number 1 and two balls with the number 0, and urn 1 contains five balls with the number 1 and one ball 6 See Example 2.11 in Section 2.2 for a relative frequency interpretation of how the geometric probability law comes about. Section 2.6 Sequential Experiments 65 with the number 0. The urn from which the first draw is made is selected at random by flipping a fair coin. Urn 0 is used if the outcome is heads and urn 1 if the outcome is tails. Thereafter the urn used in a subexperiment corresponds to the number on the ball selected in the previous subexperiment. The sample space of this experiment consists of sequences of 0s and 1s. Each possible sequence corresponds to a path through the “trellis” diagram shown in Fig. 2.15(a). 
The nodes in the diagram denote the urn used in the nth subexperiment, and the labels in the branches denote the outcome of a subexperiment. Thus the path 0011 corresponds to the sequence: The coin toss was heads so the first draw was from urn 0; the outcome of the first draw was 0, so the second draw was from urn 0; the outcome of the second draw was 1, so the third draw was from urn 1; and the outcome from the third draw was 1, so the fourth draw is from urn 1. Now suppose that we want to compute the probability of a particular sequence of outcomes, say s0 , s1 , s2 . Denote this probability by P35s06 ¨ 5s16 ¨ 5s264. Let A = 5s26 and B = 5s06 ¨ 5s16, then since P3A ¨ B4 = P3A ƒ B4P3B4 we have P35s06 ¨ 5s16 ¨ 5s264 = P35s26 ƒ 5s06 ¨ 5s164P35s06 ¨ 5s164 = P35s26 ƒ 5s06 ¨ 5s164P35s16 ƒ 5s064P35s064. (2.44) Now note that in the above urn example the probability P35sn6 ƒ 5s06 ¨ Á ¨ 5sn - 164 depends only on 5sn - 16 since the most recent outcome determines which subexperiment is performed: P35sn6 ƒ 5s06 ¨ Á ¨ 5sn - 164 = P35sn6 ƒ 5sn - 164. 0 0 0 0 1 h t 1 0 1 0 1 2 1 2 3 0 1 3 1 2 1 6 1 0 1 1 1 2 3 (a) Each sequence of outcomes corresponds to a path through this trellis diagram. 2 3 2 3 0 1 3 5 6 1 6 1 0 1 0 1 1 0 0 1 4 0 1 3 5 6 1 6 1 5 6 1 (b) The probability of a sequence of outcomes is the product of the probabilities along the associated path. FIGURE 2.15 Trellis diagram for a Markov chain. (2.45) 66 Chapter 2 Basic Concepts of Probability Theory Therefore for the sequence of interest we have that P35s06 ¨ 5s16 ¨ 5s264 = P35s26 ƒ 5s164P35s16 ƒ 5s064P35s064. (2.46) Sequential experiments that satisfy Eq. (2.45) are called Markov chains. For these experiments, the probability of a sequence s0 , s1 , Á , sn is given by P3s0 , s1 , Á , sn4 = P3sn ƒ sn - 14P3sn - 1 ƒ sn - 24 Á P3s1 ƒ s04P3s04 (2.47) where we have simplified notation by omitting braces. 
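Eq. (2.47) can be evaluated mechanically for the two-urn experiment; a sketch (the data layout is ours; the transition and initial probabilities are those of the urns in Example 2.44):

```python
from fractions import Fraction

# Two-urn Markov chain of Example 2.44: P[next outcome | current urn].
P = {0: {0: Fraction(2, 3), 1: Fraction(1, 3)},
     1: {0: Fraction(1, 6), 1: Fraction(5, 6)}}
P0 = {0: Fraction(1, 2), 1: Fraction(1, 2)}   # fair coin picks the first urn

def sequence_prob(seq):
    """Eq. (2.47): initial probability times the transition probabilities."""
    prob = P0[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        prob *= P[prev][cur]
    return prob

print(sequence_prob([0, 0, 1, 1]))   # 5/54
```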
Thus the probability of the sequence s0 , Á , sn is given by the product of the probability of the first outcome s0 and the probabilities of all subsequent transitions, s0 to s1 , s1 to s2 , and so on. Chapter 11 deals with Markov chains. Example 2.45 Find the probability of the sequence 0011 for the urn experiment introduced in Example 2.44. Recall that urn 0 contains two balls with label 0 and one ball with label 1, and that urn 1 contains five balls with label 1 and one ball with label 0. We can readily compute the probabilities of sequences of outcomes by labeling the branches in the trellis diagram with the probability of the corresponding transition as shown in Fig. 2.15(b). Thus the probability of the sequence 0011 is given by P300114 = P31 ƒ 14P31 ƒ 04P30 ƒ 04P304, where the transition probabilities are given by P31 ƒ 04 = 1 3 and P30 ƒ 04 = 2 3 P31 ƒ 14 = 5 6 and P30 ƒ 14 = 1 , 6 and the initial probabilities are given by P102 = 1 = P314. 2 If we substitute these values into the expression for P[0011], we obtain 5 5 1 2 1 . P300114 = a b a b a b a b = 6 3 3 2 54 The two-urn experiment in Examples 2.44 and 2.45 is the simplest example of the Markov chain models that are discussed in Chapter 11. The two-urn experiment discussed here is used to model situations in which there are only two outcomes, and in which the outcomes tend to occur in bursts. For example, the two-urn model has been used to model the “bursty” behavior of the voice packets generated by a single speaker where bursts of active packets are separated by relatively long periods of silence. The model has also been used for the sequence of black and white dots that result from scanning a black and white image line by line. Section 2.7 *2.7 Synthesizing Randomness: Random Number Generators 67 A COMPUTER METHOD FOR SYNTHESIZING RANDOMNESS: RANDOM NUMBER GENERATORS This section introduces the basic method for generating sequences of “random” numbers using a computer. 
Any computer simulation of a system that involves randomness must include a method for generating sequences of random numbers. These random numbers must satisfy the long-term average properties of the processes they are simulating. In this section we focus on the problem of generating random numbers that are "uniformly distributed" in the interval [0, 1]. In the next chapter we will show how these random numbers can be used to generate numbers with arbitrary probability laws.
The first problem we must confront in generating a random number in the interval [0, 1] is the fact that there are an uncountably infinite number of points in the interval, but the computer is limited to representing numbers with finite precision only. We must therefore be content with generating equiprobable numbers from some finite set, say {0, 1, …, M − 1} or {1, 2, …, M}. By dividing these numbers by M, we obtain numbers in the unit interval. These numbers can be made increasingly dense in the unit interval by making M very large.
The next step involves finding a mechanism for generating random numbers. The direct approach involves performing random experiments. For example, we can generate integers in the range 0 to 2^m − 1 by flipping a fair coin m times and replacing the sequence of heads and tails by 0s and 1s to obtain the binary representation of an integer. Another example would involve drawing a ball from an urn containing balls numbered 1 to M.
Computer simulations involve the generation of long sequences of random numbers. If we were to use the above mechanisms to generate random numbers, we would have to perform the experiments a large number of times and store the outcomes in computer storage for access by the simulation program. It is clear that this approach is cumbersome and quickly becomes impractical.
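The coin-flipping mechanism described above is easy to mimic in software. A hypothetical Python sketch (the function name is ours, for illustration only): flip a fair coin m times, read the flips as the binary representation of an integer in {0, 1, …, 2^m − 1}, and divide by M = 2^m to obtain a number in the unit interval:

```python
import random

def coin_flip_unit_number(m):
    """Flip a fair coin m times; interpret the heads/tails sequence as the
    bits of an integer in {0, 1, ..., 2**m - 1}, then scale into [0, 1)."""
    bits = [random.randint(0, 1) for _ in range(m)]   # m fair coin flips
    n = int("".join(map(str, bits)), 2)               # binary -> integer
    return n / 2**m                                   # divide by M = 2**m

print(coin_flip_unit_number(16))   # one of the 2**16 equally spaced values in [0, 1)
```

As the text observes, generating long sequences by simulating a physical experiment this way is cumbersome; the recursive pseudo-random generators described next replace the experiment with a fast deterministic recurrence.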
2.7.1 Pseudo-Random Number Generation

The preferred approach for the computer generation of random numbers involves the use of recursive formulas that can be implemented easily and quickly. These pseudo-random number generators produce a sequence of numbers that appear to be random but that in fact repeat after a very long period.
The currently preferred pseudo-random number generator is the so-called Mersenne Twister, which is based on a matrix linear recurrence over a binary field. This algorithm can yield sequences with an extremely long period of 2^19937 − 1. The Mersenne Twister generates 32-bit integers, so M = 2^32 − 1 in terms of our previous discussion. We obtain a sequence of numbers in the unit interval by dividing the 32-bit integers by 2^32. The sequence of such numbers should be equally distributed over unit cubes of very high dimensionality; the Mersenne Twister has been shown to meet this condition up to dimension 623. In addition, the algorithm is fast and efficient in terms of storage.
Software implementations of the Mersenne Twister are widely available and incorporated into numerical packages such as MATLAB® and Octave.⁷ Both MATLAB and Octave provide a means to generate random numbers from the unit interval using the rand command. The rand(n, m) operator returns an n-row by m-column matrix with elements that are random numbers from the interval [0, 1). This operator is the starting point for generating all types of random numbers.

⁷ MATLAB® and Octave are interactive computer programs for numerical computations involving matrices. MATLAB® is a commercial product sold by The MathWorks, Inc. Octave is a free, open-source program that is mostly compatible with MATLAB in terms of computation. Long [9] provides an introduction to Octave.

Example 2.46 Generation of Numbers from the Unit Interval
First, generate 6 numbers from the unit interval.
Next, generate 10,000 numbers from the unit interval. Plot the histogram and empirical distribution function for the sequence of 10,000 numbers.
The following command results in the generation of six numbers from the unit interval.

>rand(1,6)
ans =
Columns 1 through 6:
0.642667 0.147811 0.317465 0.512824 0.710823 0.406724

The following set of commands will generate 10,000 numbers and produce the histogram shown in Fig. 2.16.

>X=rand(10000,1);     % Return result in a 10,000-element column vector X.
>K=0.005:0.01:0.995;  % Produce vector K consisting of the midpoints
                      % for 100 bins of width 0.01 in the unit interval.
>hist(X,K)            % Produce the desired histogram in Fig. 2.16.
>plot(K,empirical_cdf(K,X)) % Plot the proportion of elements in the array X less
                            % than or equal to k, where k is an element of K.

The empirical cdf is shown in Fig. 2.17. It is evident that the array of random numbers is uniformly distributed in the unit interval.

FIGURE 2.16 Histogram resulting from experiment to generate 10,000 numbers in the unit interval.

FIGURE 2.17 Empirical cdf of experiment that generates 10,000 numbers.

2.7.2 Simulation of Random Experiments

MATLAB® and Octave provide functions that are very useful in carrying out numerical evaluation of probabilities involving the most common distributions. Functions are also provided for the generation of random numbers with specific probability distributions. In this section we consider Bernoulli trials and binomial distributions. In Chapter 3 we consider experiments with discrete sample spaces.

Example 2.47 Bernoulli Trials and Binomial Probabilities
First, generate the outcomes of eight Bernoulli trials.
Next, generate the outcomes of 100 repetitions of a random experiment that counts the number of successes in 16 Bernoulli trials with probability of success 1/2. Plot the histogram of the outcomes in the 100 experiments and compare to the binomial probabilities with n = 16 and p = 1/2.
The following command will generate the outcomes of eight Bernoulli trials, as shown by the answer that follows.

>X=rand(1,8)<0.5;  % Generate 1 row of 8 Bernoulli trials with p = 0.5.
X =
0 1 1 0 0 0 1 1

If the number produced by rand for a given Bernoulli trial is less than p = 0.5, then the outcome of the Bernoulli trial is 1.
Next we show the set of commands to generate the outcomes of 100 repetitions of random experiments where each involves 16 Bernoulli trials.

>X=rand(100,16)<0.5;   % Generate 100 rows of 16 Bernoulli trials with
                       % p = 0.5.
>Y=sum(X,2);           % Add the results of each row to obtain the number of
                       % successes in each experiment. Y contains the 100
                       % outcomes.
>K=0:16;
>Z=empirical_pdf(K,Y); % Find the relative frequencies of the outcomes in Y.
>bar(K,Z)              % Produce a bar graph of the relative frequencies.
>hold on               % Retain the graph for the next command.
>stem(K,binomial_pdf(K,16,0.5)) % Plot the binomial probabilities along
                                % with the corresponding relative frequencies.

Figure 2.18 shows that there is good agreement between the relative frequencies and the binomial probabilities.

*2.8 FINE POINTS: EVENT CLASSES⁸

If the sample space S is discrete, then the event class can consist of all subsets of S. There are situations where we may wish, or are compelled, to let the event class F be a smaller class of subsets of S. In these situations, only the subsets that belong to this class are considered events. In this section we explain how these situations arise.
Let C be the class of events of interest in a random experiment. It is reasonable to expect that any set operation on events in C will produce a set that is also an event in C.
We can then ask any question regarding events of the random experiment, express it using set operations, and obtain an event that is in C. Mathematically, we require that C be a field. A collection of sets F is called a field if it satisfies the following conditions:

(i) ∅ ∈ F   (2.48a)
(ii) if A ∈ F and B ∈ F, then A ∪ B ∈ F   (2.48b)
(iii) if A ∈ F, then A^c ∈ F.   (2.48c)

Using DeMorgan's rule we can show that (ii) and (iii) imply that if A ∈ F and B ∈ F, then A ∩ B ∈ F. Conditions (ii) and (iii) then imply that any finite union or intersection of events in F will result in an event that is also in F.

Example 2.48
Let S = {T, H}. Find the field generated by set operations on the class consisting of the elementary events of S: C = {{H}, {T}}.

⁸ The "Fine Points" sections elaborate on concepts and distinctions that are not required in an introductory course. The material in these sections is not necessarily more mathematical, but rather is not usually covered in a first course in probability.

7. W. Feller, An Introduction to Probability Theory and Its Applications, 3rd ed., Wiley, New York, 1968.
8. A. N. Kolmogorov and S. V. Fomin, Introductory Real Analysis, Dover Publications, New York, 1970.
9. P. J. G. Long, "Introduction to Octave," University of Cambridge, September 2005, available online.
10. A. M. Law and W. D. Kelton, Simulation Modeling and Analysis, McGraw-Hill, New York, 2000.

PROBLEMS

Section 2.1: Specifying Random Experiments

2.1. The (loose) minute hand in a clock is spun hard and the hour at which the hand comes to rest is noted.
(a) What is the sample space?
(b) Find the sets corresponding to the events: A = "hand is in first 4 hours"; B = "hand is between 2nd and 8th hours inclusive"; and D = "hand is in an odd hour."
(c) Find the events: A ∩ B ∩ D, A^c ∩ B, A ∪ (B ∩ D^c), (A ∪ B) ∩ D^c.
2.2. A die is tossed twice and the number of dots facing up in each toss is counted and noted in the order of occurrence.
(a) Find the sample space.
(b) Find the set A corresponding to the event "number of dots in first toss is not less than number of dots in second toss."
(c) Find the set B corresponding to the event "number of dots in first toss is 6."
(d) Does A imply B or does B imply A?
(e) Find A ∩ B^c and describe this event in words.
(f) Let C correspond to the event "number of dots in dice differs by 2." Find A ∩ C.
2.3. Two dice are tossed and the magnitude of the difference in the number of dots facing up in the two dice is noted.
(a) Find the sample space.
(b) Find the set A corresponding to the event "magnitude of difference is 3."
(c) Express each of the elementary events in this experiment as the union of elementary events from Problem 2.2.
2.4. A binary communication system transmits a signal X that is either a +2 voltage signal or a −2 voltage signal. A malicious channel reduces the magnitude of the received signal by the number of heads it counts in two tosses of a coin. Let Y be the resulting signal.
(a) Find the sample space.
(b) Find the set of outcomes corresponding to the event "transmitted signal was definitely +2."
(c) Describe in words the event corresponding to the outcome Y = 0.
2.5. A desk drawer contains six pens, four of which are dry.
(a) The pens are selected at random one by one until a good pen is found. The sequence of test results is noted. What is the sample space?
(b) Suppose that only the number, and not the sequence, of pens tested in part a is noted. Specify the sample space.
(c) Suppose that the pens are selected one by one and tested until both good pens have been identified, and the sequence of test results is noted. What is the sample space?
(d) Specify the sample space in part c if only the number of pens tested is noted.
2.6. Three friends (Al, Bob, and Chris) put their names in a hat and each draws a name from the hat. (Assume Al picks first, then Bob, then Chris.)
(a) Find the sample space.
(b) Find the sets A, B, and C that correspond to the events "Al draws his name," "Bob draws his name," and "Chris draws his name."
(c) Find the set corresponding to the event "no one draws his own name."
(d) Find the set corresponding to the event "everyone draws his own name."
(e) Find the set corresponding to the event "one or more draws his own name."
2.7. Let M be the number of message transmissions in Experiment E6.
(a) What is the set A corresponding to the event "M is even"?
(b) What is the set B corresponding to the event "M is a multiple of 3"?
(c) What is the set C corresponding to the event "6 or fewer transmissions are required"?
(d) Find the sets A ∩ B, A − B, A ∩ B ∩ C and describe the corresponding events in words.
2.8. A number U is selected at random from the unit interval. Let the events A and B be: A = "U differs from 1/2 by more than 1/4" and B = "1 − U is less than 1/2." Find the events A ∩ B, A^c ∩ B, A ∪ B.
2.9. The sample space of an experiment is the real line. Let the events A and B correspond to the following subsets of the real line: A = (−∞, r] and B = (−∞, s], where r ≤ s. Find an expression for the event C = (r, s] in terms of A and B. Show that B = A ∪ C and A ∩ C = ∅.
2.10. Use Venn diagrams to verify the set identities given in Eqs. (2.2) and (2.3). You will need to use different colors or different shadings to denote the various regions clearly.
2.11. Show that:
(a) If event A implies B, and B implies C, then A implies C.
(b) If event A implies B, then B^c implies A^c.
2.12. Show that if A ∪ B = A and A ∩ B = A, then A = B.
2.13. Let A and B be events. Find an expression for the event "exactly one of the events A and B occurs." Draw a Venn diagram for this event.
2.14. Let A, B, and C be events. Find expressions for the following events:
(a) Exactly one of the three events occurs.
(b) Exactly two of the events occur.
(c) One or more of the events occur.
(d) Two or more of the events occur.
(e) None of the events occur.
2.15. Figure P2.1 shows three systems of three components, C1, C2, and C3. Figure P2.1(a) is a "series" system in which the system is functioning only if all three components are functioning. Figure P2.1(b) is a "parallel" system in which the system is functioning as long as at least one of the three components is functioning. Figure P2.1(c) is a "two-out-of-three" system in which the system is functioning as long as at least two components are functioning. Let A_k be the event "component k is functioning." For each of the three system configurations, express the event "system is functioning" in terms of the events A_k.

FIGURE P2.1 (a) Series system. (b) Parallel system. (c) Two-out-of-three system.

2.16. A system has two key subsystems. The system is "up" if both of its subsystems are functioning. Triple redundant systems are configured to provide high reliability. The overall system is operational as long as one of three systems is "up." Let A_jk correspond to the event "unit k in system j is functioning," for j = 1, 2, 3 and k = 1, 2.
(a) Write an expression for the event "overall system is up."
(b) Explain why the above problem is equivalent to the problem of having a connection in the network of switches shown in Fig. P2.2.

FIGURE P2.2 Three parallel branches of switches in series: A11–A12, A21–A22, A31–A32.

2.17. In a specified 6 AM-to-6 AM 24-hour period, a student wakes up at time t1 and goes to sleep at some later time t2.
(a) Find the sample space and sketch it on the x-y plane if the outcome of this experiment consists of the pair (t1, t2).
(b) Specify the set A and sketch the region on the plane corresponding to the event "student is asleep at noon."
(c) Specify the set B and sketch the region on the plane corresponding to the event "student sleeps through breakfast (7–9 AM)."
(d) Sketch the region corresponding to A ∩ B and describe the corresponding event in words.
2.18.
A road crosses a railroad track at the top of a steep hill. The train cannot stop for oncoming cars, and cars cannot see the train until it is too late. Suppose a train begins crossing the road at time t1 and a car begins crossing the track at time t2, where 0 < t1 < T and 0 < t2 < T.
(a) Find the sample space of this experiment.
(b) Suppose that it takes the train d1 seconds to cross the road and it takes the car d2 seconds to cross the track. Find the set that corresponds to a collision taking place.
(c) Find the set that corresponds to a collision being missed by 1 second or less.
2.19. A random experiment has sample space S = {−1, 0, +1}.
(a) Find all the subsets of S.
(b) The outcome of a random experiment consists of pairs of outcomes from S where the elements of the pair cannot be equal. Find the sample space S′ of this experiment. How many subsets does S′ have?
2.20. (a) A coin is tossed twice and the sequence of heads and tails is noted. Let S be the sample space of this experiment. Find all subsets of S.
(b) A coin is tossed twice and the number of heads is noted. Let S′ be the sample space of this experiment. Find all subsets of S′.
(c) Consider parts a and b if the coin is tossed 10 times. How many subsets do S and S′ have? How many bits are needed to assign a binary number to each possible subset?

Section 2.2: The Axioms of Probability

2.21. A die is tossed and the number of dots facing up is noted.
(a) Find the probability of the elementary events under the assumption that all faces of the die are equally likely to be facing up after a toss.
(b) Find the probability of the events: A = {more than 3 dots}; B = {odd number of dots}.
(c) Find the probability of A ∪ B, A ∩ B, A^c.
2.22. In Problem 2.2, a die is tossed twice and the number of dots facing up in each toss is counted and noted in the order of occurrence.
(a) Find the probabilities of the elementary events.
(b) Find the probabilities of events A, B, C, A ∩ B^c, and A ∩ C defined in Problem 2.2.
2.23. A random experiment has sample space S = {a, b, c, d}. Suppose that P[{c, d}] = 3/8, P[{b, c}] = 6/8, and P[{d}] = 1/8. Use the axioms of probability to find the probabilities of the elementary events.
2.24. Find the probabilities of the following events in terms of P[A], P[B], and P[A ∩ B]:
(a) A occurs and B does not occur; B occurs and A does not occur.
(b) Exactly one of A or B occurs.
(c) Neither A nor B occurs.
2.25. Let the events A and B have P[A] = x, P[B] = y, and P[A ∪ B] = z. Use Venn diagrams to find P[A ∩ B], P[A^c ∩ B^c], P[A^c ∪ B^c], P[A ∩ B^c], P[A^c ∪ B].
2.26. Show that
P[A ∪ B ∪ C] = P[A] + P[B] + P[C] − P[A ∩ B] − P[A ∩ C] − P[B ∩ C] + P[A ∩ B ∩ C].
2.27. Use the argument from Problem 2.26 to prove Corollary 6 by induction.
2.28. A hexadecimal character consists of a group of four bits. Let A_i be the event "ith bit in a character is a 1."
(a) Find the probabilities for the following events: A1, A1 ∩ A3, A1 ∩ A2 ∩ A3, and A1 ∪ A2 ∪ A3. Assume that the values of the bits are determined by tosses of a fair coin.
(b) Repeat part a if the coin is biased.
2.29. Let M be the number of message transmissions in Problem 2.7. Find the probabilities of the events A, B, C, C^c, A ∩ B, A − B, A ∩ B ∩ C. Assume the probability of successful transmission is 1/2.
2.30. Use Corollary 7 to prove the following:
(a) P[A ∪ B ∪ C] ≤ P[A] + P[B] + P[C].
(b) P[A1 ∪ A2 ∪ ⋯ ∪ An] ≤ P[A1] + P[A2] + ⋯ + P[An].
(c) P[A1 ∩ A2 ∩ ⋯ ∩ An] ≥ 1 − (P[A1^c] + P[A2^c] + ⋯ + P[An^c]).
The second expression is called the union bound.
2.31. Let p be the probability that a single character appears incorrectly in this book. Use the union bound to bound the probability of there being any errors in a page with n characters.
2.32. A die is tossed and the number of dots facing up is noted.
(a) Find the probability of the elementary events if faces with an even number of dots are twice as likely to come up as faces with an odd number.
(b) Repeat parts b and c of Problem 2.21.
2.33. Consider Problem 2.1, where the minute hand in a clock is spun. Suppose that we now note the minute at which the hand comes to rest.
(a) Suppose that the minute hand is very loose, so the hand is equally likely to come to rest anywhere in the clock. What are the probabilities of the elementary events?
(b) Now suppose that the minute hand is somewhat sticky, so the hand is 1/2 as likely to land in the second minute as in the first, 1/3 as likely to land in the third minute as in the first, and so on. What are the probabilities of the elementary events?
(c) Now suppose that the minute hand is very sticky, so the hand is 1/2 as likely to land in the second minute as in the first, 1/2 as likely to land in the third minute as in the second, and so on. What are the probabilities of the elementary events?
(d) Compare the probabilities that the hand lands in the last minute in parts a, b, and c.
2.34. A number x is selected at random in the interval [−1, 2]. Let the events A = {x < 0}, B = {|x − 0.5| < 0.5}, and C = {x > 0.75}.
(a) Find the probabilities of A, B, A ∩ B, and A ∩ C.
(b) Find the probabilities of A ∪ B, A ∪ C, and A ∪ B ∪ C, first, by directly evaluating the sets and then their probabilities, and second, by using the appropriate axioms or corollaries.
2.35. A number x is selected at random in the interval [−1, 2]. Numbers from the subinterval [0, 2] occur half as frequently as those from [−1, 0).
(a) Find the probability assignment for an interval completely within [−1, 0); completely within [0, 2]; and partly in each of the above intervals.
(b) Repeat Problem 2.34 with this probability assignment.
2.36. The lifetime of a device behaves according to the probability law P[(t, ∞)] = 1/t for t > 1.
Let A be the event "lifetime is greater than 4," and B the event "lifetime is greater than 8."
(a) Find the probabilities of A ∩ B and A ∪ B.
(b) Find the probability of the event "lifetime is greater than 6 but less than or equal to 12."
2.37. Consider an experiment for which the sample space is the real line. A probability law assigns probabilities to subsets of the form (−∞, r].
(a) Show that we must have P[(−∞, r]] ≤ P[(−∞, s]] when r < s.
(b) Find an expression for P[(r, s]] in terms of P[(−∞, r]] and P[(−∞, s]].
(c) Find an expression for P[(s, ∞)].
2.38. Two numbers (x, y) are selected at random from the interval [0, 1].
(a) Find the probability that the pair of numbers is inside the unit circle.
(b) Find the probability that y > 2x.

*Section 2.3: Computing Probabilities Using Counting Methods

2.39. The combination to a lock is given by three numbers from the set {0, 1, …, 59}. Find the number of combinations possible.
2.40. How many seven-digit telephone numbers are possible if the first number is not allowed to be 0 or 1?
2.41. A pair of dice is tossed, a coin is flipped twice, and a card is selected at random from a deck of 52 distinct cards. Find the number of possible outcomes.
2.42. A lock has two buttons: a "0" button and a "1" button. To open a door you need to push the buttons according to a preset 8-bit sequence. How many sequences are there? Suppose you press an arbitrary 8-bit sequence; what is the probability that the door opens? If the first try does not succeed in opening the door, you try another sequence; what is the probability of success?
2.43. A Web site requires that users create a password with the following specifications:
• Length of 8 to 10 characters
• Includes at least one special character {!, @, #, $, %, ^, &, *, (, ), +, =, {, }, |, <, >, ~, `, -, [, ], /, ?}
• No spaces
• May contain numbers (0–9), lower and upper case letters (a–z, A–Z)
• Is case-sensitive.
How many passwords are there?
How long would it take to try all passwords if a password can be tested in 1 microsecond? 2.44. A multiple choice test has 10 questions with 3 choices each. How many ways are there to answer the test? What is the probability that two papers have the same answers? 2.45. A student has five different t-shirts and three pairs of jeans (“brand new,” “broken in,” and “perfect”). (a) How many days can the student dress without repeating the combination of jeans and t-shirt? (b) How many days can the student dress without repeating the combination of jeans and t-shirt and without wearing the same t-shirt on two consecutive days? 2.46. Ordering a “deluxe” pizza means you have four choices from 15 available toppings. How many combinations are possible if toppings can be repeated? If they cannot be repeated? Assume that the order in which the toppings are selected does not matter. 2.47. A lecture room has 60 seats. In how many ways can 45 students occupy the seats in the room? Problems 87 2.48. List all possible permutations of two distinct objects; three distinct objects; four distinct objects. Verify that the number is n!. 2.49. A toddler pulls three volumes of an encyclopedia from a bookshelf and, after being scolded, places them back in random order. What is the probability that the books are in the correct order? 2.50. Five balls are placed at random in five buckets. What is the probability that each bucket has a ball? 2.51. List all possible combinations of two objects from two distinct objects; three distinct objects; four distinct objects. Verify that the number is given by the binomial coefficient. 2.52. A dinner party is attended by four men and four women. How many unique ways can the eight people sit around the table? How many unique ways can the people sit around the table with men and women alternating seats? 2.53. A hot dog vendor provides onions, relish, mustard, ketchup, Dijon ketchup, and hot peppers for your hot dog. 
How many variations of hot dogs are possible using one condiment? Two condiments? None, some, or all of the condiments?
2.54. A lot of 100 items contains k defective items. M items are chosen at random and tested.
(a) What is the probability that m are found defective? This is called the hypergeometric distribution.
(b) A lot is accepted if 1 or fewer of the M items are defective. What is the probability that the lot is accepted?
2.55. A park has N raccoons, of which eight were previously captured and tagged. Suppose that 20 raccoons are captured. Find the probability that four of these are found to be tagged. Denote this probability, which depends on N, by p(N). Find the value of N that maximizes this probability. Hint: Compare the ratio p(N)/p(N − 1) to unity.
2.56. A lot of 50 items has 40 good items and 10 bad items.
(a) Suppose we test five samples from the lot, with replacement. Let X be the number of defective items in the sample. Find P[X = k].
(b) Suppose we test five samples from the lot, without replacement. Let Y be the number of defective items in the sample. Find P[Y = k].
2.57. How many distinct permutations are there of four red balls, two white balls, and three black balls?
2.58. A hockey team has 6 forwards, 4 defensemen, and 2 goalies. At any time, 3 forwards, 2 defensemen, and 1 goalie can be on the ice. How many combinations of players can a coach put on the ice?
2.59. Find the probability that in a class of 28 students exactly four were born in each of the seven days of the week.
2.60. Show that C(n, k) = C(n, n − k), where C(n, k) denotes the binomial coefficient "n choose k."
2.61. In this problem we derive the multinomial coefficient. Suppose we partition a set of n distinct objects into J subsets B1, B2, …, BJ of size k1, …, kJ, respectively, where ki ≥ 0 and k1 + k2 + ⋯ + kJ = n.
(a) Let Ni denote the number of possible outcomes when the ith subset is selected. Show that
N1 = C(n, k1), N2 = C(n − k1, k2), …, NJ−1 = C(n − k1 − ⋯ − kJ−2, kJ−1).
(b) Show that the number of partitions is then
N1 N2 ⋯ NJ−1 = n! / (k1! k2! ⋯ kJ!).

Section 2.4: Conditional Probability

2.62. A die is tossed twice and the number of dots facing up is counted and noted in the order of occurrence. Let A be the event "number of dots in first toss is not less than number of dots in second toss," and let B be the event "number of dots in first toss is 6." Find P[A | B] and P[B | A].
2.63. Use conditional probabilities and tree diagrams to find the probabilities for the elementary events in the random experiments defined in parts a to d of Problem 2.5.
2.64. In Problem 2.6 (name in hat), find P[B ∩ C | A] and P[C | A ∩ B].
2.65. In Problem 2.29 (message transmissions), find P[B | A] and P[A | B].
2.66. In Problem 2.8 (unit interval), find P[B | A] and P[A | B].
2.67. In Problem 2.36 (device lifetime), find P[B | A] and P[A | B].
2.68. In Problem 2.33, let A = {hand rests in last 10 minutes} and B = {hand rests in last 5 minutes}. Find P[B | A] for parts a, b, and c.
2.69. A number x is selected at random in the interval [−1, 2]. Let the events A = {x < 0}, B = {|x − 0.5| < 0.5}, and C = {x > 0.75}. Find P[A | B], P[B | C], P[A | C^c], P[B | C^c].
2.70. In Problem 2.36, let A be the event "lifetime is greater than t," and B the event "lifetime is greater than 2t." Find P[B | A]. Does the answer depend on t? Comment.
2.71. Find the probability that two or more students in a class of 20 students have the same birthday. Hint: Use Corollary 1. How big should the class be so that the probability that two or more students have the same birthday is 1/2?
2.72. A cryptographic hash takes a message as input and produces a fixed-length string as output, called the digital fingerprint. A brute force attack involves computing the hash for a large number of messages until a pair of distinct messages with the same hash is found.
Find the number of attempts required so that the probability of obtaining a match is 1/2. How many attempts are required to find a matching pair if the digital fingerprint is 64 bits long? 128 bits long?
2.73. (a) Find P[A | B] if A ∩ B = ∅; if A ⊂ B; if A ⊃ B.
(b) Show that if P[A | B] > P[A], then P[B | A] > P[B].
2.74. Show that P[A | B] satisfies the axioms of probability:
(i) 0 ≤ P[A | B] ≤ 1
(ii) P[S | B] = 1
(iii) If A ∩ C = ∅, then P[A ∪ C | B] = P[A | B] + P[C | B].
2.75. Show that P[A ∩ B ∩ C] = P[A | B ∩ C] P[B | C] P[C].
2.76. In each lot of 100 items, two items are tested, and the lot is rejected if either of the tested items is found defective.
(a) Find the probability that a lot with k defective items is accepted.
(b) Suppose that when the production process malfunctions, 50 out of 100 items are defective. In order to identify when the process is malfunctioning, how many items should be tested so that the probability that one or more items are found defective is at least 99%?
2.77. A nonsymmetric binary communications channel is shown in Fig. P2.3. Assume the input is "0" with probability p and "1" with probability 1 − p.
(a) Find the probability that the output is 0.
(b) Find the probability that the input was 0 given that the output is 1. Find the probability that the input is 1 given that the output is 1. Which input is more probable?

FIGURE P2.3 Nonsymmetric binary channel: input 0 is received as 0 with probability 1 − ε1 and as 1 with probability ε1; input 1 is received as 1 with probability 1 − ε2 and as 0 with probability ε2.

2.78. The transmitter in Problem 2.4 is equally likely to send X = +2 as X = −2. The malicious channel counts the number of heads in two tosses of a fair coin to decide by how much to reduce the magnitude of the input to produce the output Y.
(a) Use a tree diagram to find the set of possible input-output pairs.
(b) Find the probabilities of the input-output pairs.
(c) Find the probabilities of the output values.
(d) Find the probability that the input was X = +2 given that Y = k.
2.79. One of two coins is selected at random and tossed three times.
The first coin comes up heads with probability p1 and the second coin with probability p2 = 2/3 > p1 = 1/3.
(a) What is the probability that the number of heads is k?
(b) Find the probability that coin 1 was tossed given that k heads were observed, for k = 0, 1, 2, 3.
(c) In part b, which coin is more probable when k heads have been observed?
(d) Generalize the solution in part b to the case where the selected coin is tossed m times. In particular, find a threshold value T such that when k > T heads are observed, coin 2 is more probable, and when k < T are observed, coin 1 is more probable.
(e) Suppose that p2 = 1 (that is, coin 2 is two-headed) and 0 < p1 < 1. What is the probability that we do not determine with certainty whether the coin is 1 or 2?
2.80. A computer manufacturer uses chips from three sources. Chips from sources A, B, and C are defective with probabilities .005, .001, and .010, respectively. If a randomly selected chip is found to be defective, find the probability that the source was A; that the source was C. Assume that the proportions of chips from A, B, and C are 0.5, 0.1, and 0.4, respectively.
2.81. A ternary communication system is shown in Fig. P2.4. Suppose that input symbols 0, 1, and 2 occur with probability 1/3 respectively.
(a) Find the probabilities of the output symbols.
(b) Suppose that a 1 is observed at the output. What is the probability that the input was 0? 1? 2?

FIGURE P2.4 Ternary channel: each input symbol is received correctly with probability 1 − ε and is shifted to the next symbol (0 to 1, 1 to 2, 2 to 0) with probability ε.

Section 2.5: Independence of Events

2.82. Let S = {1, 2, 3, 4} and A = {1, 2}, B = {1, 3}, C = {1, 4}. Assume the outcomes are equiprobable. Are A, B, and C independent events?
2.83. Let U be selected at random from the unit interval. Let A = {0 < U < 1/2}, B = {1/4 < U < 3/4}, and C = {1/2 < U < 1}. Are any of these events independent?
2.84. Alice and Mary practice free throws at the basketball court after school.
Alice makes free throws with probability pa and Mary makes them with probability pm. Find the probability of the following outcomes when Alice and Mary each take one shot: Alice scores a basket; either Alice or Mary scores a basket; both score; both miss.
2.85. Show that if A and B are independent events, then the pairs A and Bᶜ, Aᶜ and B, and Aᶜ and Bᶜ are also independent.
2.86. Show that events A and B are independent if P[A | B] = P[A | Bᶜ].
2.87. Let A, B, and C be events with probabilities P[A], P[B], and P[C]. (a) Find P[A ∪ B] if A and B are independent. (b) Find P[A ∪ B] if A and B are mutually exclusive. (c) Find P[A ∪ B ∪ C] if A, B, and C are independent. (d) Find P[A ∪ B ∪ C] if A, B, and C are pairwise mutually exclusive.
2.88. An experiment consists of picking one of two urns at random and then selecting a ball from the urn and noting its color (black or white). Let A be the event "urn 1 is selected" and B the event "a black ball is observed." Under what conditions are A and B independent?
2.89. Find the probabilities in Problem 2.14 assuming that events A, B, and C are independent.
2.90. Find the probabilities that the three types of systems are "up" in Problem 2.15. Assume that all units in the system fail independently and that a type k unit fails with probability pk.
2.91. Find the probabilities that the system is "up" in Problem 2.16. Assume that all units in the system fail independently and that a type k unit fails with probability pk.
2.92. A random experiment is repeated a large number of times and the occurrence of events A and B is noted. How would you test whether events A and B are independent?
2.93. Consider a very long sequence of hexadecimal characters. How would you test whether the relative frequencies of the four bits in the hex characters are consistent with independent tosses of a fair coin?
2.94. Compute the probability of the system in Example 2.35 being "up" when a second controller is added to the system.
2.95.
In the binary communication system in Example 2.26, find the value of ε for which the input of the channel is independent of the output of the channel. Can such a channel be used to transmit information?
2.96. In the ternary communication system in Problem 2.81, is there a choice of ε for which the input of the channel is independent of the output of the channel?
Section 2.6: Sequential Experiments
2.97. A block of 100 bits is transmitted over a binary communication channel with probability of bit error p = 10⁻². (a) If the block has 1 or fewer errors then the receiver accepts the block. Find the probability that the block is accepted. (b) If the block has more than 1 error, then the block is retransmitted. Find the probability that M retransmissions are required.
2.98. A fraction p of items from a certain production line is defective. (a) What is the probability that there is more than one defective item in a batch of n items? (b) During normal production p = 10⁻³ but when production malfunctions p = 10⁻¹. Find the size of a batch that should be tested so that if any items are found defective we are 99% sure that there is a production malfunction.
2.99. A student needs eight chips of a certain type to build a circuit. It is known that 5% of these chips are defective. How many chips should he buy for there to be a greater than 90% probability of having enough chips for the circuit?
2.100. Each of n terminals broadcasts a message in a given time slot with probability p. (a) Find the probability that exactly one terminal transmits so the message is received by all terminals without collision. (b) Find the value of p that maximizes the probability of successful transmission in part a. (c) Find the asymptotic value of the probability of successful transmission as n becomes large.
2.101. A system contains eight chips. The lifetime of each chip has a Weibull probability law with parameters λ and k = 2: P[(t, ∞)] = e^{−(λt)^k} for t ≥ 0.
Find the probability that at least two chips are functioning after 2/λ seconds.
2.102. A machine makes errors in a certain operation with probability p. There are two types of errors. The fraction of errors that are type 1 is α, and type 2 is 1 − α. (a) What is the probability of k errors in n operations? (b) What is the probability of k1 type 1 errors in n operations? (c) What is the probability of k2 type 2 errors in n operations? (d) What is the joint probability of k1 and k2 type 1 and 2 errors, respectively, in n operations?
2.103. Three types of packets arrive at a router port. Ten percent of the packets are "expedited forwarding (EF)," 30 percent are "assured forwarding (AF)," and 60 percent are "best effort (BE)." (a) Find the probability that k of N packets are not expedited forwarding. (b) Suppose that packets arrive one at a time. Find the probability that k packets are received before an expedited forwarding packet arrives. (c) Find the probability that out of 20 packets, 4 are EF packets, 6 are AF packets, and 10 are BE.
2.104. A run-length coder segments a binary information sequence into strings that consist of either a "run" of k "zeros" punctuated by a "one", for k = 0, …, m − 1, or a string of m "zeros." The m = 3 case is:
String    Run-length k
1         0
01        1
001       2
000       3
Suppose that the information is produced by a sequence of Bernoulli trials with P["one"] = P[success] = p. (a) Find the probability of run-length k in the m = 3 case. (b) Find the probability of run-length k for general m.
2.105. The amount of time cars are parked in a parking lot follows a geometric probability law with p = 1/2. The charge for parking in the lot is $1 for each half-hour or less. (a) Find the probability that a car pays k dollars. (b) Suppose that there is a maximum charge of $6. Find the probability that a car pays k dollars.
2.106. A biased coin is tossed repeatedly until heads has come up three times.
Find the probability that k tosses are required. Hint: Show that {"k tosses are required"} = A ∩ B, where A = {"kth toss is heads"} and B = {"2 heads occur in k − 1 tosses"}.
2.107. An urn initially contains two black balls and two white balls. The following experiment is repeated indefinitely: A ball is drawn from the urn; if the color of the ball is the same as the majority of balls remaining in the urn, then the ball is put back in the urn. Otherwise the ball is left out. (a) Draw the trellis diagram for this experiment and label the branches by the transition probabilities. (b) Find the probabilities for all sequences of outcomes of length 2 and length 3. (c) Find the probability that the urn contains no black balls after three draws; no white balls after three draws. (d) Find the probability that the urn contains two black balls after n trials; two white balls after n trials.
2.108. In Example 2.45, let p0(n) and p1(n) be the probabilities that urn 0 or urn 1 is used in the nth subexperiment. (a) Find p0(1) and p1(1). (b) Express p0(n + 1) and p1(n + 1) in terms of p0(n) and p1(n). (c) Evaluate p0(n) and p1(n) for n = 2, 3, 4. (d) Find the solution to the recursion in part b with the initial conditions given in part a. (e) What are the urn probabilities as n approaches infinity?
*Section 2.7: Synthesizing Randomness: Random Number Generators
2.109. An urn experiment is to be used to simulate a random experiment with sample space S = {1, 2, 3, 4, 5} and probabilities p1 = 1/3, p2 = 1/5, p3 = 1/4, p4 = 1/7, and p5 = 1 − (p1 + p2 + p3 + p4). How many balls should the urn contain? Generalize the result to show that an urn experiment can be used to simulate any random experiment with finite sample space and with probabilities given by rational numbers.
2.110. Suppose we are interested in using tosses of a fair coin to simulate a random experiment in which there are six equally likely outcomes, where S = {0, 1, 2, 3, 4, 5}.
The following version of the "rejection method" is proposed:
1. Toss a fair coin three times and obtain a binary number by identifying heads with zero and tails with one.
2. If the outcome of the coin tosses in step 1 is the binary representation for a number in S, output the number. Otherwise, return to step 1.
(a) Find the probability that a number is produced in step 2. (b) Show that the numbers that are produced in step 2 are equiprobable. (c) Generalize the above algorithm to show how coin tossing can be used to simulate any random urn experiment.
2.111. Use the rand function in Octave to generate 1000 pairs of numbers in the unit square. Plot an x-y scattergram to confirm that the resulting points are uniformly distributed in the unit square.
2.112. Apply the rejection method introduced above to generate points that are uniformly distributed in the x > y portion of the unit square. Use the rand function to generate a pair of numbers in the unit square. If x > y, accept the number. If not, select another pair. Plot an x-y scattergram for the pairs of accepted numbers and confirm that the resulting points are uniformly distributed in the x > y region of the unit square.
2.113. The sample mean-squared value of the numerical outcomes X(1), X(2), …, X(n) of a series of n repetitions of an experiment is defined by
⟨X²⟩_n = (1/n) Σ_{j=1}^{n} X²(j).
(a) What would you expect this expression to converge to as the number of repetitions n becomes very large? (b) Find a recursion formula for ⟨X²⟩_n similar to the one found in Problem 1.9.
2.114. The sample variance is defined as the mean-squared value of the variation of the samples about the sample mean:
⟨V²⟩_n = (1/n) Σ_{j=1}^{n} (X(j) − ⟨X⟩_n)².
Note that ⟨X⟩_n also depends on the sample values. (It is customary to replace the n in the denominator with n − 1 for technical reasons that will be discussed in Chapter 8. For now we will use the above definition.) (a) Show that the sample variance satisfies the following expression:
⟨V²⟩_n = ⟨X²⟩_n − ⟨X⟩_n².
(b) Show that the sample variance satisfies the following recursion formula:
⟨V²⟩_n = (1 − 1/n) ⟨V²⟩_{n−1} + (1/n)(1 − 1/n) (X(n) − ⟨X⟩_{n−1})²,
with ⟨V²⟩_0 = 0.
2.115. Suppose you have a program to generate a sequence of numbers Un that is uniformly distributed in [0, 1]. Let Yn = αUn + β. (a) Find α and β so that Yn is uniformly distributed in the interval [a, b]. (b) Let a = −5 and b = 15. Use Octave to generate Yn and to compute the sample mean and sample variance in 1000 repetitions. Compare the sample mean and sample variance to (a + b)/2 and (b − a)²/12, respectively.
2.116. Use Octave to simulate 100 repetitions of the random experiment where a coin is tossed 16 times and the number of heads is counted. (a) Confirm that your results are similar to those in Figure 2.18. (b) Rerun the experiment with p = 0.25 and p = 0.75. Are the results as expected?
*Section 2.8: Fine Points: Event Classes
2.117. In Example 2.49, Homer maps the outcomes from Lisa's sample space SL = {r, g, t} into a smaller sample space SH = {R, G}: f(r) = R, f(g) = G, and f(t) = G. Define the inverse image events as follows: f⁻¹({R}) = A1 = {r} and f⁻¹({G}) = A2 = {g, t}. Let A and B be events in Homer's sample space. (a) Show that f⁻¹(A ∪ B) = f⁻¹(A) ∪ f⁻¹(B). (b) Show that f⁻¹(A ∩ B) = f⁻¹(A) ∩ f⁻¹(B). (c) Show that f⁻¹(Aᶜ) = f⁻¹(A)ᶜ. (d) Show that the results in parts a, b, and c hold for a general mapping f from a sample space S to a set S′.
2.118. Let f be a mapping from a sample space S to a finite set S′ = {y1, y2, …, yn}. (a) Show that the set of inverse images Ak = f⁻¹({yk}) forms a partition of S. (b) Show that any event B of S′ can be related to a union of Ak's.
2.119. Let A be any subset of S. Show that the class of sets {∅, A, Aᶜ, S} is a field.
*Section 2.9: Fine Points: Probabilities of Sequences of Events
2.120. Find the countable union of the following sequences of events: (a) An = [a + 1/n, b − 1/n].
(b) Bn = (−n, b − 1/n]. (c) Cn = [a + 1/n, b).
2.121. Find the countable intersection of the following sequences of events: (a) An = (a − 1/n, b + 1/n). (b) Bn = [a, b + 1/n). (c) Cn = (a − 1/n, b].
2.122. (a) Show that the Borel field can be generated from the complements and countable intersections and unions of open sets (a, b). (b) Suggest other classes of sets that can generate the Borel field.
2.123. Find expressions for the probabilities of the events in Problem 2.120.
2.124. Find expressions for the probabilities of the events in Problem 2.121.
Problems Requiring Cumulative Knowledge
2.125. Compare the binomial probability law and the hypergeometric law introduced in Problem 2.54 as follows. (a) Suppose a lot has 20 items of which five are defective. A batch of ten items is tested without replacement. Find the probability that k are found defective for k = 0, …, 10. Compare this to the binomial probabilities with n = 10 and p = 5/20 = .25. (b) Repeat but with a lot of 1000 items of which 250 are defective. A batch of ten items is tested without replacement. Find the probability that k are found defective for k = 0, …, 10. Compare this to the binomial probabilities with n = 10 and p = 250/1000 = .25.
2.126. Suppose that in Example 2.43, computer A sends each message to computer B simultaneously over two unreliable radio links. Computer B can detect when errors have occurred in either link. Let the probability of message transmission error in link 1 and link 2 be q1 and q2 respectively. Computer B requests retransmissions until it receives an error-free message on either link. (a) Find the probability that more than k transmissions are required. (b) Find the probability that in the last transmission, the message on link 2 is received free of errors.
2.127. In order for a circuit board to work, seven identical chips must be in working order.
To improve reliability, an additional chip is included in the board, and the design allows it to replace any of the seven other chips when they fail. (a) Find the probability pb that the board is working in terms of the probability p that an individual chip is working. (b) Suppose that n circuit boards are operated in parallel, and that we require a 99.9% probability that at least one board is working. How many boards are needed?
2.128. Consider a well-shuffled deck of cards consisting of 52 distinct cards, of which four are aces and four are kings. (a) Find the probability of obtaining an ace in the first draw. (b) Draw a card from the deck and look at it. What is the probability of obtaining an ace in the second draw? Does the answer change if you had not observed the first draw? (c) Suppose we draw seven cards from the deck. What is the probability that the seven cards include three aces? What is the probability that the seven cards include two kings? What is the probability that the seven cards include three aces and/or two kings? (d) Suppose that the entire deck of cards is distributed equally among four players. What is the probability that each player gets an ace?
CHAPTER 3 Discrete Random Variables
In most random experiments we are interested in a numerical attribute of the outcome of the experiment. A random variable is defined as a function that assigns a numerical value to the outcome of the experiment. In this chapter we introduce the concept of a random variable and methods for calculating probabilities of events involving a random variable. We focus on the simplest case, that of discrete random variables, and introduce the probability mass function. We define the expected value of a random variable and relate it to our intuitive notion of an average. We also introduce the conditional probability mass function for the case where we are given partial information about the random variable.
These concepts and their extension in Chapter 4 provide us with the tools to evaluate the probabilities and averages of interest in the design of systems involving randomness. Throughout the chapter we introduce important random variables and discuss typical applications where they arise. We also present methods for generating random variables. These methods are used in computer simulation models that predict the behavior and performance of complex modern systems.
3.1 THE NOTION OF A RANDOM VARIABLE
The outcome of a random experiment need not be a number. However, we are usually interested not in the outcome itself, but rather in some measurement or numerical attribute of the outcome. For example, in n tosses of a coin, we may be interested in the total number of heads and not in the specific order in which heads and tails occur. In a randomly selected Web document, we may be interested only in the length of the document. In each of these examples, a measurement assigns a numerical value to the outcome of the random experiment. Since the outcomes are random, the results of the measurements will also be random. Hence it makes sense to talk about the probabilities of the resulting numerical values. The concept of a random variable formalizes this notion.
A random variable X is a function that assigns a real number, X(z), to each outcome z in the sample space of a random experiment. Recall that a function is simply a rule for assigning a numerical value to each element of a set, as shown pictorially in Fig. 3.1.
FIGURE 3.1 A random variable assigns a number X(z) = x on the real line to each outcome z in the sample space S of a random experiment.
The specification of a measurement on the outcome of a random experiment defines a function on the sample space, and hence a random variable.
The sample space S is the domain of the random variable, and the set SX of all values taken on by X is the range of the random variable. Thus SX is a subset of the set of all real numbers. We will use the following notation: capital letters denote random variables, e.g., X or Y, and lower case letters denote possible values of the random variables, e.g., x or y.
Example 3.1 Coin Tosses
A coin is tossed three times and the sequence of heads and tails is noted. The sample space for this experiment is S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}. Let X be the number of heads in the three tosses. X assigns each outcome z in S a number from the set SX = {0, 1, 2, 3}. The table below lists the eight outcomes of S and the corresponding values of X.
z:     HHH  HHT  HTH  THH  HTT  THT  TTH  TTT
X(z):   3    2    2    2    1    1    1    0
X is then a random variable taking on values in the set SX = {0, 1, 2, 3}.
Example 3.2 A Betting Game
A player pays $1.50 to play the following game: A coin is tossed three times and the number of heads X is counted. The player receives $1 if X = 2 and $8 if X = 3, but nothing otherwise. Let Y be the reward to the player. Y is a function of the random variable X and its outcomes can be related back to the sample space of the underlying random experiment as follows:
z:     HHH  HHT  HTH  THH  HTT  THT  TTH  TTT
X(z):   3    2    2    2    1    1    1    0
Y(z):   8    1    1    1    0    0    0    0
Y is then a random variable taking on values in the set SY = {0, 1, 8}.
The above example shows that a function of a random variable produces another random variable. For random variables, the function or rule that assigns values to each outcome is fixed and deterministic, as, for example, in the rule "count the total number of dots facing up in the toss of two dice." The randomness in the experiment is complete as soon as the toss is done. The process of counting the dots facing up is deterministic.
Therefore the distribution of the values of a random variable X is determined by the probabilities of the outcomes z in the random experiment. In other words, the randomness in the observed values of X is induced by the underlying random experiment, and we should therefore be able to compute the probabilities of the observed values of X in terms of the probabilities of the underlying outcomes.
Example 3.3 Coin Tosses and Betting
Let X be the number of heads in three independent tosses of a fair coin. Find the probability of the event {X = 2}. Find the probability that the player in Example 3.2 wins $8.
Note that X(z) = 2 if and only if z is in {HHT, HTH, THH}. Therefore
P[X = 2] = P[{HHT, HTH, THH}] = P[{HHT}] + P[{HTH}] + P[{THH}] = 3/8.
The event {Y = 8} occurs if and only if the outcome z is HHH, therefore
P[Y = 8] = P[{HHH}] = 1/8.
Example 3.3 illustrates a general technique for finding the probabilities of events involving the random variable X. Let the underlying random experiment have sample space S and event class F. To find the probability of a subset B of R, e.g., B = {xk}, we need to find the outcomes in S that are mapped to B, that is,
A = {z : X(z) ∈ B}     (3.1)
as shown in Fig. 3.2. If event A occurs then X(z) ∈ B, so event B occurs. Conversely, if event B occurs, then the value X(z) implies that z is in A, so event A occurs. Thus the probability that X is in B is given by:
P[X ∈ B] = P[A] = P[{z : X(z) ∈ B}].     (3.2)
FIGURE 3.2 P[X in B] = P[z in A].
We refer to A and B as equivalent events. In some random experiments the outcome z is already the numerical value we are interested in. In such cases we simply let X(z) = z, that is, the identity function, to obtain a random variable.
*3.1.1 Fine Point: Formal Definition of a Random Variable
In going from Eq. (3.1) to Eq. (3.2) we actually need to check that the event A is in F, because only events in F have probabilities assigned to them. The formal definition of a random variable in Chapter 4 will explicitly state this requirement. If the event class F consists of all subsets of S, then the set A will always be in F, and any function from S to R will be a random variable. However, if the event class F does not consist of all subsets of S, then some functions from S to R may not be random variables, as illustrated by the following example.
Example 3.4 A Function That Is Not a Random Variable
This example shows why the definition of a random variable requires that we check that the set A is in F. An urn contains three balls. One ball is electronically coded with a label 00. Another ball is coded with 01, and the third ball has a 10 label. The sample space for this experiment is S = {00, 01, 10}. Let the event class F consist of all unions, intersections, and complements of the events A1 = {00, 10} and A2 = {01}. In this event class, the outcome 00 cannot be distinguished from the outcome 10. For example, this could result from a faulty label reader that cannot distinguish between 00 and 10. The event class has four events:
F = {∅, {00, 10}, {01}, {00, 01, 10}}.
Let the probability assignment for the events in F be P[{00, 10}] = 2/3 and P[{01}] = 1/3. Consider the following function X from S to R: X(00) = 0, X(01) = 1, X(10) = 2. To find the probability of {X = 0}, we need the probability of {z : X(z) = 0} = {00}. However, {00} is not in the class F, and so X is not a random variable because we cannot determine the probability that X = 0.
3.2 DISCRETE RANDOM VARIABLES AND PROBABILITY MASS FUNCTION
A discrete random variable X is defined as a random variable that assumes values from a countable set, that is, SX = {x1, x2, x3, …}. A discrete random variable is said to be finite if its range is finite, that is, SX = {x1, x2, …, xn}.
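The equivalent-event technique of Eq. (3.2) can be carried out by brute-force enumeration when the sample space is small. The following Python sketch is purely illustrative (the function and variable names are ours, not the text's); it reproduces the probabilities computed in Example 3.3 for the coin-toss experiment of Examples 3.1 and 3.2:

```python
from itertools import product

# Sample space of three coin tosses (Example 3.1); for a fair coin
# each of the 8 outcomes has probability 1/8.
outcomes = ["".join(t) for t in product("HT", repeat=3)]

def X(z):
    # random variable X: number of heads in outcome z
    return z.count("H")

def Y(z):
    # reward in the betting game of Example 3.2: $1 if X = 2, $8 if X = 3
    return {2: 1, 3: 8}.get(X(z), 0)

def prob(rv, B):
    # P[rv in B] = P[{z : rv(z) in B}]: sum probabilities of the
    # equivalent event in the underlying sample space (Eq. 3.2)
    return sum(1 for z in outcomes if rv(z) in B) / len(outcomes)

print(prob(X, {2}))   # 0.375, i.e., P[X = 2] = 3/8
print(prob(Y, {8}))   # 0.125, i.e., P[Y = 8] = 1/8
```

The same `prob` helper evaluates any event involving X or Y, which is exactly the point of equivalent events: every probability about the random variable reduces to a probability in S.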
We are interested in finding the probabilities of events involving a discrete random variable X. Since the sample space SX is discrete, we only need to obtain the probabilities for the events Ak = {z : X(z) = xk} in the underlying random experiment. The probabilities of all events involving X can be found from the probabilities of the Ak's.
The probability mass function (pmf) of a discrete random variable X is defined as:
pX(x) = P[X = x] = P[{z : X(z) = x}]  for x a real number.     (3.3)
Note that pX(x) is a function of x over the real line, and that pX(x) can be nonzero only at the values x1, x2, x3, …. For xk in SX, we have pX(xk) = P[Ak].
FIGURE 3.3 Partition of the sample space S into the events A1, A2, …, Ak, …, which map to the values x1, x2, …, xk, … of a discrete random variable.
The events A1, A2, … form a partition of S as illustrated in Fig. 3.3. To see this, we first show that the events are disjoint. Let j ≠ k, then
Aj ∩ Ak = {z : X(z) = xj and X(z) = xk} = ∅
since each z is mapped into one and only one value in SX. Next we show that S is the union of the Ak's. Every z in S is mapped into some xk so that every z belongs to an event Ak in the partition. Therefore:
S = A1 ∪ A2 ∪ ….
All events involving the random variable X can be expressed as the union of events Ak's. For example, suppose we are interested in the event X in B = {x2, x5}, then
P[X in B] = P[{z : X(z) = x2} ∪ {z : X(z) = x5}] = P[A2 ∪ A5] = P[A2] + P[A5] = pX(x2) + pX(x5).
The pmf pX(x) satisfies three properties that provide all the information required to calculate probabilities for events involving the discrete random variable X:
(i) pX(x) ≥ 0 for all x     (3.4a)
(ii) Σ_{x∈SX} pX(x) = Σ_{all k} pX(xk) = Σ_{all k} P[Ak] = 1     (3.4b)
(iii) P[X in B] = Σ_{x∈B} pX(x), where B ⊂ SX.     (3.4c)
Property (i) is true because the pmf values are defined as a probability, pX(x) = P[X = x]. Property (ii) follows because the events Ak = {X = xk} form a partition of S.
Note that the summations in Eqs. (3.4b) and (3.4c) will have a finite or infinite number of terms depending on whether the random variable is finite or not. Next consider property (iii). Any event B involving X is the union of elementary events, so by Axiom III′ we have:
P[X in B] = P[∪_{x∈B} {z : X(z) = x}] = Σ_{x∈B} P[X = x] = Σ_{x∈B} pX(x).
The pmf of X gives us the probabilities for all the elementary events from SX. The probability of any subset of SX is obtained from the sum of the corresponding elementary events. In fact we have everything required to specify a probability law for the outcomes in SX. If we are only interested in events concerning X, then we can forget about the underlying random experiment and its associated probability law and just work with SX and the pmf of X.
Example 3.5 Coin Tosses and Binomial Random Variable
Let X be the number of heads in three independent tosses of a coin. Find the pmf of X.
Proceeding as in Example 3.3, we find:
p0 = P[X = 0] = P[{TTT}] = (1 − p)³,
p1 = P[X = 1] = P[{HTT}] + P[{THT}] + P[{TTH}] = 3(1 − p)²p,
p2 = P[X = 2] = P[{HHT}] + P[{HTH}] + P[{THH}] = 3(1 − p)p²,
p3 = P[X = 3] = P[{HHH}] = p³.
Note that pX(0) + pX(1) + pX(2) + pX(3) = 1.
Example 3.6 A Betting Game
A player receives $1 if the number of heads in three coin tosses is 2, $8 if the number is 3, but nothing otherwise. Find the pmf of the reward Y.
pY(0) = P[z ∈ {TTT, TTH, THT, HTT}] = 4/8 = 1/2,
pY(1) = P[z ∈ {THH, HTH, HHT}] = 3/8,
pY(8) = P[z ∈ {HHH}] = 1/8.
Note that pY(0) + pY(1) + pY(8) = 1.
Figures 3.4(a) and (b) show the graph of pX(x) versus x for the random variables in Examples 3.5 and 3.6, respectively. In general, the graph of the pmf of a discrete random variable has vertical arrows of height pX(xk) at the values xk in SX.
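The pmfs in Examples 3.5 and 3.6 can be checked numerically. The following Python sketch (illustrative only; the helper name is ours) computes the coin-toss pmf for a general p, regroups it to obtain the reward pmf for the fair-coin case, and uses the normalization property (3.4b) as a sanity check:

```python
from math import comb

def pmf_heads(n, p):
    # pmf of the number of heads in n independent tosses of a coin
    # with P[heads] = p, as derived in Example 3.5 for n = 3
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

pmf_X = pmf_heads(3, 0.5)          # fair coin, Example 3.5
print(pmf_X)                       # [0.125, 0.375, 0.375, 0.125]

# Reward pmf of Example 3.6: Y = 1 if X = 2, Y = 8 if X = 3, else 0,
# so p_Y(0) groups the X = 0 and X = 1 probabilities
pmf_Y = {0: pmf_X[0] + pmf_X[1], 1: pmf_X[2], 8: pmf_X[3]}
print(pmf_Y)                       # {0: 0.5, 1: 0.375, 8: 0.125}

# Property (3.4b): every pmf sums to 1
assert abs(sum(pmf_X) - 1.0) < 1e-12
assert abs(sum(pmf_Y.values()) - 1.0) < 1e-12
```

Grouping pmf values this way mirrors the text's point that a function of a random variable is again a random variable, whose pmf is obtained by summing the probabilities of the values that map to each new value.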
We may view the total probability as one unit of mass and pX(x) as the amount of probability mass that is placed at each of the discrete points x1, x2, …. The relative values of the pmf at different points give an indication of the relative likelihoods of occurrence.
FIGURE 3.4 (a) Graph of pmf in three coin tosses; (b) Graph of pmf in betting game.
Example 3.7 Random Number Generator
A random number generator produces an integer number X that is equally likely to be any element in the set SX = {0, 1, 2, …, M − 1}. Find the pmf of X.
For each k in SX, we have pX(k) = 1/M. Note that pX(0) + pX(1) + … + pX(M − 1) = 1. We call X the uniform random variable in the set {0, 1, …, M − 1}.
Example 3.8 Bernoulli Random Variable
Let A be an event of interest in some random experiment, e.g., a device is not defective. We say that a "success" occurs if A occurs when we perform the experiment. The Bernoulli random variable IA is equal to 1 if A occurs and zero otherwise, and is given by the indicator function for A:
IA(z) = 0 if z not in A, 1 if z in A.     (3.5a)
Find the pmf of IA.
IA is a finite discrete random variable with values from SI = {0, 1}, with pmf:
pI(0) = P[{z : z ∈ Aᶜ}] = 1 − p,
pI(1) = P[{z : z ∈ A}] = p.     (3.5b)
We call IA the Bernoulli random variable. Note that pI(0) + pI(1) = 1.
Example 3.9 Message Transmissions
Let X be the number of times a message needs to be transmitted until it arrives correctly at its destination. Find the pmf of X. Find the probability that X is an even number.
X is a discrete random variable taking on values from SX = {1, 2, 3, …}. The event {X = k} occurs if the underlying experiment finds k − 1 consecutive erroneous transmissions ("failures") followed by an error-free one ("success"):
pX(k) = P[X = k] = P[00…01] = (1 − p)^{k−1} p = q^{k−1} p,  k = 1, 2, ….     (3.6)
We call X the geometric random variable, and we say that X is geometrically distributed. In Eq. (2.42b), we saw that the sum of the geometric probabilities is 1.
P[X is even] = Σ_{k=1}^{∞} pX(2k) = p Σ_{k=1}^{∞} q^{2k−1} = pq/(1 − q²) = q/(1 + q).
Example 3.10 Transmission Errors
A binary communications channel introduces a bit error in a transmission with probability p. Let X be the number of errors in n independent transmissions. Find the pmf of X. Find the probability of one or fewer errors.
X takes on values in the set SX = {0, 1, …, n}. Each transmission results in a "0" if there is no error and a "1" if there is an error, P["1"] = p and P["0"] = 1 − p. The probability of k errors in n bit transmissions is given by the probability of an error pattern that has k 1's and n − k 0's:
pX(k) = P[X = k] = C(n, k) p^k (1 − p)^{n−k},  k = 0, 1, …, n,     (3.7)
where C(n, k) is the binomial coefficient "n choose k." We call X the binomial random variable, with parameters n and p. In Eq. (2.39b), we saw that the sum of the binomial probabilities is 1.
P[X ≤ 1] = C(n, 0) p⁰(1 − p)ⁿ + C(n, 1) p¹(1 − p)^{n−1} = (1 − p)ⁿ + np(1 − p)^{n−1}.
Finally, let's consider the relationship between relative frequencies and the pmf pX(xk). Suppose we perform n independent repetitions to obtain n observations of the discrete random variable X. Let Nk(n) be the number of times the event X = xk occurs and let fk(n) = Nk(n)/n be the corresponding relative frequency. As n becomes large we expect that fk(n) → pX(xk). Therefore the graph of relative frequencies should approach the graph of the pmf.
FIGURE 3.5 (a) Relative frequencies and corresponding uniform pmf; (b) Relative frequencies and corresponding geometric pmf.
Figure 3.5(a) shows the graph of relative frequencies for 1000 repetitions of an experiment that generates a uniform random variable from the set {0, 1, …, 7} and the corresponding pmf.
Figure 3.5(b) shows the graph of relative frequencies and pmf for a geometric random variable with p = 1/2 and n = 1000 repetitions. In both cases we see that the graph of relative frequencies approaches that of the pmf.
3.3 EXPECTED VALUE AND MOMENTS OF DISCRETE RANDOM VARIABLE
In order to completely describe the behavior of a discrete random variable, an entire function, namely pX(x), must be given. In some situations we are interested in a few parameters that summarize the information provided by the pmf. For example, Fig. 3.6 shows the results of many repetitions of an experiment that produces two random variables. The random variable Y varies about the value 0, whereas the random variable X varies around the value 5. It is also clear that X is more spread out than Y. In this section we introduce parameters that quantify these properties.
The expected value or mean of a discrete random variable X is defined by
mX = E[X] = Σ_{x∈SX} x pX(x) = Σ_k xk pX(xk).     (3.8)
The expected value E[X] is defined if the above sum converges absolutely, that is,
E[|X|] = Σ_k |xk| pX(xk) < ∞.     (3.9)
There are random variables for which Eq. (3.9) does not converge. In such cases, we say that the expected value does not exist.
FIGURE 3.6 The graphs show 150 repetitions of the experiments yielding X and Y. It is clear that X is centered about the value 5 while Y is centered about 0. It is also clear that X is more spread out than Y.
If we view pX(x) as the distribution of mass on the points x1, x2, … in the real line, then E[X] represents the center of mass of this distribution. For example, in Fig. 3.5(a), we can see that the pmf of a discrete random variable that is uniformly distributed in {0, …, 7} has a center of mass at 3.5.
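A short simulation in the spirit of Fig. 3.5(a) illustrates both observations at once: the relative frequencies fk(n) settle near the uniform pmf value pX(k) = 1/8, and the sample mean settles near the center of mass (M − 1)/2 = 3.5. This Python sketch is illustrative only (the text's own simulation exercises use Octave), with a fixed seed for reproducibility:

```python
import random

random.seed(1)                      # fixed seed so the run is reproducible
M, n = 8, 1000
samples = [random.randrange(M) for _ in range(n)]   # uniform on {0, ..., 7}

# relative frequencies f_k(n) = N_k(n)/n versus the pmf p_X(k) = 1/M
freqs = [samples.count(k) / n for k in range(M)]
worst_gap = max(abs(f - 1 / M) for f in freqs)

# the sample mean should be close to the center of mass (M - 1)/2 = 3.5
sample_mean = sum(samples) / n

print(worst_gap)     # small, on the order of 1/sqrt(n)
print(sample_mean)   # close to 3.5
```

Increasing n shrinks both the worst frequency gap and the deviation of the sample mean from 3.5, which is the convergence pictured in Fig. 3.5 and made precise in Eqs. (3.11) and (3.12).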
Example 3.11 Mean of Bernoulli Random Variable

Find the expected value of the Bernoulli random variable $I_A$. From Example 3.8, we have

$$E[I_A] = 0 \cdot p_I(0) + 1 \cdot p_I(1) = p,$$

where p is the probability of success in the Bernoulli trial.

Example 3.12 Three Coin Tosses and Binomial Random Variable

Let X be the number of heads in three tosses of a fair coin. Find E[X]. Equation (3.8) and the pmf of X that was found in Example 3.5 give:

$$E[X] = \sum_{k=0}^{3} k\, p_X(k) = 0\left(\frac{1}{8}\right) + 1\left(\frac{3}{8}\right) + 2\left(\frac{3}{8}\right) + 3\left(\frac{1}{8}\right) = 1.5.$$

Note that the above is the n = 3, p = 1/2 case of a binomial random variable, which we will see has $E[X] = np$.

Example 3.13 Mean of a Uniform Discrete Random Variable

Let X be the random number generator in Example 3.7. Find E[X]. From Example 3.5 we have $p_X(j) = 1/M$ for $j = 0, \ldots, M - 1$, so

$$E[X] = \sum_{k=0}^{M-1} k\,\frac{1}{M} = \frac{1}{M}\{0 + 1 + 2 + \cdots + (M - 1)\} = \frac{(M - 1)M}{2M} = \frac{M - 1}{2},$$

where we used the fact that $1 + 2 + \cdots + L = (L + 1)L/2$. Note that for M = 8, E[X] = 3.5, which is consistent with our observation of the center of mass in Fig. 3.5(a).

The use of the term "expected value" does not mean that we expect to observe E[X] when we perform the experiment that generates X. For example, the expected value of a Bernoulli trial is p, but its outcomes are always either 0 or 1. E[X] corresponds to the "average of X" in a large number of observations of X. Suppose we perform n independent repetitions of the experiment that generates X, and we record the observed values as $x(1), x(2), \ldots, x(n)$, where x(j) is the observation in the jth experiment. Let $N_k(n)$ be the number of times $x_k$ is observed, and let $f_k(n) = N_k(n)/n$ be the corresponding relative frequency. The arithmetic average, or sample mean, of the observations is:

$$\langle X \rangle_n = \frac{x(1) + x(2) + \cdots + x(n)}{n} = \frac{x_1 N_1(n) + x_2 N_2(n) + \cdots + x_k N_k(n) + \cdots}{n} = x_1 f_1(n) + x_2 f_2(n) + \cdots + x_k f_k(n) + \cdots = \sum_k x_k f_k(n). \qquad (3.10)$$
The first numerator adds the observations in the order in which they occur, and the second numerator counts how many times each $x_k$ occurs and then computes the total. As n becomes large, we expect relative frequencies to approach the probabilities $p_X(x_k)$:

$$\lim_{n \to \infty} f_k(n) = p_X(x_k) \quad \text{for all } k. \qquad (3.11)$$

Equation (3.10) then implies that:

$$\langle X \rangle_n = \sum_k x_k f_k(n) \to \sum_k x_k\, p_X(x_k) = E[X]. \qquad (3.12)$$

Thus we expect the sample mean to converge to E[X] as n becomes large.

Example 3.14 A Betting Game

A player at a fair pays $1.50 to toss a coin three times. The player receives $1 if the number of heads is 2, $8 if the number is 3, but nothing otherwise. Find the expected value of the reward Y. What is the expected value of the gain?

The expected reward is:

$$E[Y] = 0\,p_Y(0) + 1\,p_Y(1) + 8\,p_Y(8) = 0\left(\frac{4}{8}\right) + 1\left(\frac{3}{8}\right) + 8\left(\frac{1}{8}\right) = \frac{11}{8}.$$

The expected gain is:

$$E[Y - 1.5] = \frac{11}{8} - \frac{12}{8} = -\frac{1}{8}.$$

Players lose 12.5 cents on average per game, so the house makes a nice profit over the long run. In Example 3.19 we will see that some engineering designs also "bet" that users will behave a certain way.

Example 3.15 Mean of a Geometric Random Variable

Let X be the number of bytes in a message, and suppose that X has a geometric distribution with parameter p. Find the mean of X.

X can take on arbitrarily large values since $S_X = \{1, 2, \ldots\}$. The expected value is:

$$E[X] = \sum_{k=1}^{\infty} k p q^{k-1} = p \sum_{k=1}^{\infty} k q^{k-1}.$$

This expression is readily evaluated by differentiating the series

$$\frac{1}{1 - x} = \sum_{k=0}^{\infty} x^k \qquad (3.13)$$

to obtain

$$\frac{1}{(1 - x)^2} = \sum_{k=0}^{\infty} k x^{k-1}. \qquad (3.14)$$

Letting x = q, we obtain

$$E[X] = p\,\frac{1}{(1 - q)^2} = \frac{1}{p}. \qquad (3.15)$$

We see that X has a finite expected value as long as p > 0.

For certain random variables large values occur sufficiently frequently that the expected value does not exist, as illustrated by the following example.

Example 3.16 St.
Petersburg Paradox

A fair coin is tossed repeatedly until a tail comes up. If X tosses are needed, then the casino pays the gambler $Y = 2^X$ dollars. How much should the gambler be willing to pay to play this game?

If the gambler plays this game a large number of times, then the payoff should be the expected value of $Y = 2^X$. If the coin is fair, $P[X = k] = (1/2)^k$ and $P[Y = 2^k] = (1/2)^k$, so:

$$E[Y] = \sum_{k=1}^{\infty} 2^k\, p_Y(2^k) = \sum_{k=1}^{\infty} 2^k \left(\frac{1}{2}\right)^k = 1 + 1 + \cdots = \infty.$$

This game does indeed appear to offer the gambler a sweet deal, and so the gambler should be willing to pay any amount to play the game! The paradox is that a sane person would not pay a lot to play this game. Problem 3.34 discusses ways to resolve the paradox.

Random variables with unbounded expected value are not uncommon and appear in models where outcomes that have extremely large values are not that rare. Examples include the sizes of files in Web transfers, frequencies of words in large bodies of text, and various financial and economic problems.

3.3.1 Expected Value of Functions of a Random Variable

Let X be a discrete random variable, and let Z = g(X). Since X is discrete, Z = g(X) will assume a countable set of values of the form $g(x_k)$ where $x_k \in S_X$. Denote the set of values assumed by g(X) by $\{z_1, z_2, \ldots\}$. One way to find the expected value of Z is to use Eq. (3.8), which requires that we first find the pmf of Z. Another way is to use the following result:

$$E[Z] = E[g(X)] = \sum_k g(x_k)\, p_X(x_k). \qquad (3.16)$$

To show Eq. (3.16), group the terms $x_k$ that are mapped to each value $z_j$:

$$\sum_k g(x_k)\, p_X(x_k) = \sum_j z_j \Big\{ \sum_{x_k : g(x_k) = z_j} p_X(x_k) \Big\} = \sum_j z_j\, p_Z(z_j) = E[Z].$$

The sum inside the braces is the probability of all terms $x_k$ for which $g(x_k) = z_j$, which is the probability that $Z = z_j$, that is, $p_Z(z_j)$.

Example 3.17 Square-Law Device

Let X be a noise voltage that is uniformly distributed in $S_X = \{-3, -1, +1, +3\}$ with $p_X(k) = 1/4$ for k in $S_X$. Find E[Z] where $Z = X^2$.
Using the first approach we find the pmf of Z:

$$p_Z(9) = P[X \in \{-3, +3\}] = p_X(-3) + p_X(3) = 1/2$$
$$p_Z(1) = p_X(-1) + p_X(1) = 1/2$$

and so

$$E[Z] = 1\left(\frac{1}{2}\right) + 9\left(\frac{1}{2}\right) = 5.$$

The second approach gives:

$$E[Z] = E[X^2] = \sum_k k^2 p_X(k) = \frac{1}{4}\{(-3)^2 + (-1)^2 + 1^2 + 3^2\} = \frac{20}{4} = 5.$$

Equation (3.16) implies several very useful results. Let Z be the function $Z = a g(X) + b h(X) + c$ where a, b, and c are real numbers; then:

$$E[Z] = aE[g(X)] + bE[h(X)] + c. \qquad (3.17a)$$

From Eq. (3.16) we have:

$$E[Z] = E[ag(X) + bh(X) + c] = \sum_k \big(ag(x_k) + bh(x_k) + c\big)\,p_X(x_k)$$
$$= a\sum_k g(x_k)\,p_X(x_k) + b\sum_k h(x_k)\,p_X(x_k) + c\sum_k p_X(x_k) = aE[g(X)] + bE[h(X)] + c.$$

Equation (3.17a), by setting a, b, and/or c to 0 or 1, implies the following expressions:

$$E[g(X) + h(X)] = E[g(X)] + E[h(X)]. \qquad (3.17b)$$
$$E[aX] = aE[X]. \qquad (3.17c)$$
$$E[X + c] = E[X] + c. \qquad (3.17d)$$
$$E[c] = c. \qquad (3.17e)$$

Example 3.18 Square-Law Device

The noise voltage X in the previous example is amplified and shifted to obtain Y = 2X + 10, and then squared to produce $Z = Y^2 = (2X + 10)^2$. Find E[Z].

$$E[Z] = E[(2X + 10)^2] = E[4X^2 + 40X + 100] = 4E[X^2] + 40E[X] + 100 = 4(5) + 40(0) + 100 = 120.$$

Example 3.19 Voice Packet Multiplexer

Let X be the number of voice packets containing active speech produced by n = 48 independent speakers in a 10-millisecond period as discussed in Section 1.4. X is a binomial random variable with parameter n and probability p = 1/3. Suppose a packet multiplexer transmits up to M = 20 active packets every 10 ms, and any excess active packets are discarded. Let Z be the number of packets discarded. Find E[Z].
This example shows that engineered systems also play “betting” games where favorable statistics are exploited to use resources efficiently. In this example, the multiplexer transmits 20 packets per period instead of 48 for a reduction of 28/48 = 58%. 3.3.2 Variance of a Random Variable The expected value E[X], by itself, provides us with limited information about X. For example, if we know that E3X4 = 0, then it could be that X is zero all the time. However, it is also possible that X can take on extremely large positive and negative values. We are therefore interested not only in the mean of a random variable, but also in the extent of the random variable’s variation about its mean. Let the deviation of the random variable X about its mean be X - E3X4, which can take on positive and negative values. Since we are interested in the magnitude of the variations only, it is convenient to work with the square of the deviation, which is always positive, D1X2 = 1X - E3X422. The expected value is a constant, so we will denote it by mX = E3X4. The variance of the random variable X is defined as the expected value of D: s2X = VAR3X4 = E31X - mX224 = a 1x - mX22pX1x2 = a 1xk - mX22pX1xk2. q (3.18) k=1 xHSX The standard deviation of the random variable X is defined by: sX = STD3X4 = VAR3X41/2. (3.19) By taking the square root of the variance we obtain a quantity with the same units as X. An alternative expression for the variance can be obtained as follows: VAR3X4 = E31X - mX224 = E3X2 - 2mXX + m2X4 = E3X24 - 2mXE3X4 + m2X = E3X24 - m2X . (3.20) E3X24 is called the second moment of X. The nth moment of X is defined as E3Xn4. Equations (3.17c), (3.17d), and (3.17e) imply the following useful expressions for the variance. Let Y = X + c, then VAR3X + c4 = E31X + c - 1E3X4 + c24224 = E31X - E3X4224 = VAR3X4. (3.21) 110 Chapter 3 Discrete Random Variables Adding a constant to a random variable does not affect the variance. 
Let Z = cX; then:

$$\mathrm{VAR}[cX] = E[(cX - cE[X])^2] = E[c^2(X - E[X])^2] = c^2\,\mathrm{VAR}[X]. \qquad (3.22)$$

Scaling a random variable by c scales the variance by $c^2$ and the standard deviation by |c|.

Now let X = c, a random variable that is equal to a constant with probability 1; then

$$\mathrm{VAR}[X] = E[(X - c)^2] = E[0] = 0. \qquad (3.23)$$

A constant random variable has zero variance.

Example 3.20 Three Coin Tosses

Let X be the number of heads in three tosses of a fair coin. Find VAR[X].

$$E[X^2] = 0\left(\frac{1}{8}\right) + 1^2\left(\frac{3}{8}\right) + 2^2\left(\frac{3}{8}\right) + 3^2\left(\frac{1}{8}\right) = 3 \quad \text{and} \quad \mathrm{VAR}[X] = E[X^2] - m_X^2 = 3 - 1.5^2 = 0.75.$$

Recall that this is an n = 3, p = 1/2 binomial random variable. We see later that the variance for the binomial random variable is npq.

Example 3.21 Variance of Bernoulli Random Variable

Find the variance of the Bernoulli random variable $I_A$.

$$E[I_A^2] = 0\,p_I(0) + 1^2\,p_I(1) = p$$

and so

$$\mathrm{VAR}[I_A] = p - p^2 = p(1 - p) = pq.$$

Example 3.22 Variance of Geometric Random Variable

Find the variance of the geometric random variable. Differentiate the term $(1 - x)^{-2}$ in Eq. (3.14) to obtain

$$\frac{2}{(1 - x)^3} = \sum_{k=0}^{\infty} k(k - 1)x^{k-2}.$$

Let x = q and multiply both sides by pq to obtain:

$$\frac{2pq}{(1 - q)^3} = \sum_{k=0}^{\infty} k(k - 1)pq^{k-1} = E[X^2] - E[X].$$

So the second moment is

$$E[X^2] = \frac{2pq}{(1 - q)^3} + E[X] = \frac{2q}{p^2} + \frac{1}{p} = \frac{1 + q}{p^2} \qquad (3.24)$$

and the variance is

$$\mathrm{VAR}[X] = E[X^2] - E[X]^2 = \frac{1 + q}{p^2} - \frac{1}{p^2} = \frac{q}{p^2}.$$

3.4 CONDITIONAL PROBABILITY MASS FUNCTION

In many situations we have partial information about a random variable X or about the outcome of its underlying random experiment. We are interested in how this information changes the probability of events involving the random variable. The conditional probability mass function addresses this question for discrete random variables.

3.4.1 Conditional Probability Mass Function

Let X be a discrete random variable with pmf $p_X(x)$, and let C be an event that has nonzero probability, P[C] > 0. See Fig. 3.7.
The conditional probability mass function of X is defined by the conditional probability:

$$p_X(x \mid C) = P[X = x \mid C] \quad \text{for } x \text{ a real number.} \qquad (3.25)$$

Applying the definition of conditional probability we have:

$$p_X(x \mid C) = \frac{P[\{X = x\} \cap C]}{P[C]}. \qquad (3.26)$$

The above expression has a nice intuitive interpretation: The conditional probability of the event $\{X = x_k\}$ is given by the probabilities of outcomes $\zeta$ for which both $X(\zeta) = x_k$ and $\zeta$ is in C, normalized by P[C].

The conditional pmf satisfies Eqs. (3.4a)–(3.4c). Consider Eq. (3.4b). The set of events $A_k = \{X = x_k\}$ is a partition of S, so $C = \bigcup_k (A_k \cap C)$, and

$$\sum_{x_k \in S_X} p_X(x_k \mid C) = \sum_{\text{all } k} \frac{P[\{X = x_k\} \cap C]}{P[C]} = \frac{1}{P[C]} \sum_{\text{all } k} P[A_k \cap C] = \frac{P[C]}{P[C]} = 1.$$

FIGURE 3.7
Conditional pmf of X given event C.

Similarly we can show that:

$$P[X \text{ in } B \mid C] = \sum_{x \in B} p_X(x \mid C) \quad \text{where } B \subset S_X.$$

Example 3.23 A Random Clock

The minute hand in a clock is spun and the outcome $\zeta$ is the minute where the hand comes to rest. Let X be the hour where the hand comes to rest. Find the pmf of X. Find the conditional pmf of X given $B = \{\text{first 4 hours}\}$; given $D = \{1 < \zeta \le 11\}$.

We assume that the hand is equally likely to rest at any of the minutes in the range $S = \{1, 2, \ldots, 60\}$, so $P[\zeta = k] = 1/60$ for k in S. X takes on values from $S_X = \{1, 2, \ldots, 12\}$ and it is easy to show that $p_X(j) = 1/12$ for j in $S_X$. Since $B = \{1, 2, 3, 4\}$:

$$p_X(j \mid B) = \frac{P[\{X = j\} \cap B]}{P[B]} = \frac{P[X \in \{j\} \cap \{1, 2, 3, 4\}]}{P[X \in \{1, 2, 3, 4\}]} = \begin{cases} \dfrac{1/12}{1/3} = \dfrac{1}{4} & \text{if } j \in \{1, 2, 3, 4\} \\ 0 & \text{otherwise.} \end{cases}$$

The event B above involves X only.
The event D, however, is stated in terms of the outcomes in the underlying experiment (i.e., minutes not hours), so the probability of the intersection has to be expressed accordingly:

$$p_X(j \mid D) = \frac{P[\{X = j\} \cap D]}{P[D]} = \frac{P[\zeta : X(\zeta) = j \text{ and } \zeta \in \{2, \ldots, 11\}]}{P[\zeta \in \{2, \ldots, 11\}]} = \begin{cases} \dfrac{P[\zeta \in \{2, 3, 4, 5\}]}{10/60} = \dfrac{4}{10} & \text{for } j = 1 \\[4pt] \dfrac{P[\zeta \in \{6, 7, 8, 9, 10\}]}{10/60} = \dfrac{5}{10} & \text{for } j = 2 \\[4pt] \dfrac{P[\zeta \in \{11\}]}{10/60} = \dfrac{1}{10} & \text{for } j = 3. \end{cases}$$

Most of the time the event C is defined in terms of X, for example $C = \{X > 10\}$ or $C = \{a \le X \le b\}$. For $x_k$ in $S_X$, we have the following general result:

$$p_X(x_k \mid C) = \begin{cases} \dfrac{p_X(x_k)}{P[C]} & \text{if } x_k \in C \\ 0 & \text{if } x_k \notin C. \end{cases} \qquad (3.27)$$

The above expression is determined entirely by the pmf of X.

Example 3.24 Residual Waiting Times

Let X be the time required to transmit a message, where X is a uniform random variable with $S_X = \{1, 2, \ldots, L\}$. Suppose that a message has already been transmitting for m time units; find the probability that the remaining transmission time is j time units.

We are given $C = \{X > m\}$, so for $m + 1 \le m + j \le L$:

$$p_X(m + j \mid X > m) = \frac{P[X = m + j]}{P[X > m]} = \frac{1/L}{(L - m)/L} = \frac{1}{L - m} \quad \text{for } m + 1 \le m + j \le L. \qquad (3.28)$$

X is equally likely to be any of the remaining $L - m$ possible values. As m increases, $1/(L - m)$ increases, implying that the end of the message transmission becomes increasingly likely.

Many random experiments have natural ways of partitioning the sample space S into the union of disjoint events $B_1, B_2, \ldots, B_n$. Let $p_X(x \mid B_i)$ be the conditional pmf of X given event $B_i$. The theorem on total probability allows us to find the pmf of X in terms of the conditional pmf's:

$$p_X(x) = \sum_{i=1}^{n} p_X(x \mid B_i)P[B_i]. \qquad (3.29)$$

Example 3.25 Device Lifetimes

A production line yields two types of devices. Type 1 devices occur with probability $\alpha$ and work for a relatively short time that is geometrically distributed with parameter r.
Type 2 devices work much longer, occur with probability $1 - \alpha$, and have a lifetime that is geometrically distributed with parameter s. Let X be the lifetime of an arbitrary device. Find the pmf of X.

The random experiment that generates X involves selecting a device type and then observing its lifetime. We can partition the sets of outcomes in this experiment into event $B_1$, consisting of those outcomes in which the device is type 1, and $B_2$, consisting of those outcomes in which the device is type 2. The conditional pmf's of X given the device type are:

$$p_{X|B_1}(k) = (1 - r)^{k-1}r \quad \text{for } k = 1, 2, \ldots$$

and

$$p_{X|B_2}(k) = (1 - s)^{k-1}s \quad \text{for } k = 1, 2, \ldots.$$

We obtain the pmf of X from Eq. (3.29):

$$p_X(k) = p_X(k \mid B_1)P[B_1] + p_X(k \mid B_2)P[B_2] = (1 - r)^{k-1}r\alpha + (1 - s)^{k-1}s(1 - \alpha) \quad \text{for } k = 1, 2, \ldots.$$

3.4.2 Conditional Expected Value

Let X be a discrete random variable, and suppose that we know that event B has occurred. The conditional expected value of X given B is defined as:

$$m_{X|B} = E[X \mid B] = \sum_{x \in S_X} x\,p_X(x \mid B) = \sum_k x_k\,p_X(x_k \mid B) \qquad (3.30)$$

where we apply the absolute convergence requirement on the summation. The conditional variance of X given B is defined as:

$$\mathrm{VAR}[X \mid B] = E[(X - m_{X|B})^2 \mid B] = \sum_{k=1}^{\infty} (x_k - m_{X|B})^2 p_X(x_k \mid B) = E[X^2 \mid B] - m_{X|B}^2.$$

Note that the variation is measured with respect to $m_{X|B}$, not $m_X$.

Let $B_1, B_2, \ldots, B_n$ be the partition of S, and let $p_X(x \mid B_i)$ be the conditional pmf of X given event $B_i$. E[X] can be calculated from the conditional expected values $E[X \mid B_i]$:

$$E[X] = \sum_{i=1}^{n} E[X \mid B_i]P[B_i]. \qquad (3.31a)$$

By the theorem on total probability we have:

$$E[X] = \sum_k x_k\,p_X(x_k) = \sum_k x_k \Big\{ \sum_{i=1}^{n} p_X(x_k \mid B_i)P[B_i] \Big\} = \sum_{i=1}^{n} \Big\{ \sum_k x_k\,p_X(x_k \mid B_i) \Big\} P[B_i] = \sum_{i=1}^{n} E[X \mid B_i]P[B_i],$$

where we first express $p_X(x_k)$ in terms of the conditional pmf's, and we then change the order of summation. Using the same approach we can also show

$$E[g(X)] = \sum_{i=1}^{n} E[g(X) \mid B_i]P[B_i]. \qquad (3.31b)$$
Example 3.26 Device Lifetimes

Find the mean and variance for the devices in Example 3.25.

The conditional mean and second moment of each device type is that of a geometric random variable with the corresponding parameter. Using Eq. (3.24) with success probabilities r and s:

$$m_{X|B_1} = 1/r \qquad E[X^2 \mid B_1] = (2 - r)/r^2$$
$$m_{X|B_2} = 1/s \qquad E[X^2 \mid B_2] = (2 - s)/s^2.$$

The mean and the second moment of X are then:

$$m_X = m_{X|B_1}\alpha + m_{X|B_2}(1 - \alpha) = \alpha/r + (1 - \alpha)/s$$
$$E[X^2] = E[X^2 \mid B_1]\alpha + E[X^2 \mid B_2](1 - \alpha) = \alpha(2 - r)/r^2 + (1 - \alpha)(2 - s)/s^2.$$

Finally, the variance of X is:

$$\mathrm{VAR}[X] = E[X^2] - m_X^2 = \frac{\alpha(2 - r)}{r^2} + \frac{(1 - \alpha)(2 - s)}{s^2} - \left(\frac{\alpha}{r} + \frac{1 - \alpha}{s}\right)^2.$$

Note that we do not use the conditional variances to find VAR[X] because Eq. (3.31b) does not apply to conditional variances. (See Problem 3.40.) However, the equation does apply to the conditional second moments.

3.5 IMPORTANT DISCRETE RANDOM VARIABLES

Certain random variables arise in many diverse, unrelated applications. The pervasiveness of these random variables is due to the fact that they model fundamental mechanisms that underlie random behavior. In this section we present the most important of the discrete random variables and discuss how they arise and how they are interrelated. Table 3.1 summarizes the basic properties of the discrete random variables discussed in this section. By the end of this chapter, most of the properties presented in the table will have been introduced.

TABLE 3.1 Discrete random variables

Bernoulli Random Variable
$S_X = \{0, 1\}$
$p_0 = q = 1 - p$ and $p_1 = p$, where $0 \le p \le 1$
$E[X] = p$, $\mathrm{VAR}[X] = p(1 - p)$
$G_X(z) = q + pz$
Remarks: The Bernoulli random variable is the value of the indicator function $I_A$ for some event A; X = 1 if A occurs and 0 otherwise.
Binomial Random Variable
$S_X = \{0, 1, \ldots, n\}$
$p_k = \binom{n}{k}p^k(1 - p)^{n-k}$, $k = 0, 1, \ldots, n$
$E[X] = np$, $\mathrm{VAR}[X] = np(1 - p)$
$G_X(z) = (q + pz)^n$
Remarks: X is the number of successes in n Bernoulli trials and hence the sum of n independent, identically distributed Bernoulli random variables.

Geometric Random Variable
First Version: $S_X = \{0, 1, 2, \ldots\}$
$p_k = p(1 - p)^k$, $k = 0, 1, \ldots$
$E[X] = \dfrac{1 - p}{p}$, $\mathrm{VAR}[X] = \dfrac{1 - p}{p^2}$
$G_X(z) = \dfrac{p}{1 - qz}$
Remarks: X is the number of failures before the first success in a sequence of independent Bernoulli trials. The geometric random variable is the only discrete random variable with the memoryless property.

Second Version: $S_{X'} = \{1, 2, \ldots\}$
$p_k = p(1 - p)^{k-1}$, $k = 1, 2, \ldots$
$E[X'] = \dfrac{1}{p}$, $\mathrm{VAR}[X'] = \dfrac{1 - p}{p^2}$
$G_{X'}(z) = \dfrac{pz}{1 - qz}$
Remarks: $X' = X + 1$ is the number of trials until the first success in a sequence of independent Bernoulli trials.

Negative Binomial Random Variable
$S_X = \{r, r + 1, \ldots\}$ where r is a positive integer
$p_k = \binom{k - 1}{r - 1}p^r(1 - p)^{k-r}$, $k = r, r + 1, \ldots$
$E[X] = \dfrac{r}{p}$, $\mathrm{VAR}[X] = \dfrac{r(1 - p)}{p^2}$
$G_X(z) = \left(\dfrac{pz}{1 - qz}\right)^r$
Remarks: X is the number of trials until the rth success in a sequence of independent Bernoulli trials.

Poisson Random Variable
$S_X = \{0, 1, 2, \ldots\}$
$p_k = \dfrac{\alpha^k}{k!}e^{-\alpha}$, $k = 0, 1, \ldots$, where $\alpha > 0$
$E[X] = \alpha$, $\mathrm{VAR}[X] = \alpha$
$G_X(z) = e^{\alpha(z - 1)}$
Remarks: X is the number of events that occur in one time unit when the time between events is exponentially distributed with mean $1/\alpha$.

Uniform Random Variable
$S_X = \{1, 2, \ldots, L\}$
$p_k = \dfrac{1}{L}$, $k = 1, 2, \ldots, L$
$E[X] = \dfrac{L + 1}{2}$, $\mathrm{VAR}[X] = \dfrac{L^2 - 1}{12}$
$G_X(z) = \dfrac{z}{L}\,\dfrac{1 - z^L}{1 - z}$
Remarks: The uniform random variable occurs whenever outcomes are equally likely. It plays a key role in the generation of random numbers.

Zipf Random Variable
$S_X = \{1, 2, \ldots, L\}$ where L is a positive integer
$p_k = \dfrac{1}{c_L}\,\dfrac{1}{k}$, $k = 1, 2, \ldots, L$
$E[X] = \dfrac{L}{c_L}$, where $c_L$ is given by Eq.
(3.45).
$\mathrm{VAR}[X] = \dfrac{L(L + 1)}{2c_L} - \dfrac{L^2}{c_L^2}$
Remarks: The Zipf random variable has the property that a few outcomes occur frequently but most outcomes occur rarely.

Discrete random variables arise mostly in applications where counting is involved. We begin with the Bernoulli random variable as a model for a single coin toss. By counting the outcomes of multiple coin tosses we obtain the binomial, geometric, and Poisson random variables.

3.5.1 The Bernoulli Random Variable

Let A be an event related to the outcomes of some random experiment. The Bernoulli random variable $I_A$ (defined in Example 3.8) equals one if the event A occurs, and zero otherwise. $I_A$ is a discrete random variable since it assigns a number to each outcome of S. It is a discrete random variable with range $\{0, 1\}$, and its pmf is

$$p_I(0) = 1 - p \quad \text{and} \quad p_I(1) = p, \qquad (3.32)$$

where P[A] = p.

In Example 3.11 we found the mean of $I_A$: $m_I = E[I_A] = p$. The sample mean in n independent Bernoulli trials is simply the relative frequency of successes and converges to p as n increases:

$$\langle I_A \rangle_n = \frac{0 \cdot N_0(n) + 1 \cdot N_1(n)}{n} = f_1(n) \to p.$$

In Example 3.21 we found the variance of $I_A$:

$$\sigma_I^2 = \mathrm{VAR}[I_A] = p(1 - p) = pq.$$

The variance is quadratic in p, with value zero at p = 0 and p = 1 and maximum at p = 1/2. This agrees with intuition, since values of p close to 0 or to 1 imply a preponderance of successes or failures and hence less variability in the observed values. The maximum variability occurs when p = 1/2, which corresponds to the case that is most difficult to predict.

Every Bernoulli trial, regardless of the event A, is equivalent to the tossing of a biased coin with probability of heads p. In this sense, coin tossing can be viewed as representative of a fundamental mechanism for generating randomness, and the Bernoulli random variable is the model associated with it.

3.5.2 The Binomial Random Variable

Suppose that a random experiment is repeated n independent times.
Let X be the number of times a certain event A occurs in these n trials. X is then a random variable with range $S_X = \{0, 1, \ldots, n\}$. For example, X could be the number of heads in n tosses of a coin. If we let $I_j$ be the indicator function for the event A in the jth trial, then

$$X = I_1 + I_2 + \cdots + I_n,$$

that is, X is the sum of the Bernoulli random variables associated with each of the n independent trials. In Section 2.6, we found that X has probabilities that depend on n and p:

$$P[X = k] = p_X(k) = \binom{n}{k}p^k(1 - p)^{n-k} \quad \text{for } k = 0, \ldots, n. \qquad (3.33)$$

X is called the binomial random variable. Figure 3.8 shows the pmf of X for n = 24 and p = .2 and p = .5.

FIGURE 3.8
Probability mass functions of binomial random variable (a) p = 0.2; (b) p = 0.5.

Note that $P[X = k]$ is maximum at $k_{\max} = [(n + 1)p]$, where [x] denotes the largest integer that is smaller than or equal to x. When $(n + 1)p$ is an integer, the maximum is achieved at $k_{\max}$ and $k_{\max} - 1$. (See Problem 3.50.)

The factorial terms grow large very quickly and cause overflow problems in the calculation of $\binom{n}{k}$. Equation (2.40), the ratio of successive terms in the pmf, allows us to calculate $p_X(k + 1)$ in terms of $p_X(k)$ and delays the onset of overflows:

$$p_X(k + 1) = \frac{n - k}{k + 1}\,\frac{p}{1 - p}\,p_X(k), \quad \text{where } p_X(0) = (1 - p)^n. \qquad (3.34)$$

The binomial random variable arises in applications where there are two types of objects (i.e., heads/tails, correct/erroneous bits, good/defective items, active/silent speakers), and we are interested in the number of type 1 objects in a randomly selected batch of size n, where the type of each object is independent of the types of the other objects in the batch. Examples involving the binomial random variable were given in Section 2.6.
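The recursion in Eq. (3.34) is straightforward to implement. The sketch below is our own (not from the text); the parameters n = 24, p = 0.5 mirror Fig. 3.8(b):

```python
def binomial_pmf(n, p):
    """Binomial pmf via the recursion of Eq. (3.34), avoiding large factorials."""
    pmf = [(1 - p) ** n]                         # p_X(0) = (1 - p)^n
    for k in range(n):
        pmf.append(pmf[k] * (n - k) / (k + 1) * p / (1 - p))
    return pmf

pmf = binomial_pmf(24, 0.5)
print(round(sum(pmf), 9))                        # 1.0
print(max(range(25), key=lambda k: pmf[k]))      # mode k_max = [(n + 1)p] = 12
```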
Example 3.27 Mean of a Binomial Random Variable

The expected value of X is:

$$E[X] = \sum_{k=0}^{n} k\,p_X(k) = \sum_{k=0}^{n} k\binom{n}{k}p^k(1 - p)^{n-k} = \sum_{k=1}^{n} k\,\frac{n!}{k!(n - k)!}\,p^k(1 - p)^{n-k}$$
$$= np\sum_{k=1}^{n} \frac{(n - 1)!}{(k - 1)!(n - k)!}\,p^{k-1}(1 - p)^{n-k} = np\sum_{j=0}^{n-1} \frac{(n - 1)!}{j!(n - 1 - j)!}\,p^j(1 - p)^{n-1-j} = np, \qquad (3.35)$$

where the first line uses the fact that the k = 0 term in the sum is zero, the second line cancels out the k and factors np outside the summation, and the last line uses the fact that the summation is equal to one since it adds all the terms in a binomial pmf with parameters n - 1 and p. The expected value E[X] = np agrees with our intuition since we expect a fraction p of the outcomes to result in success.

Example 3.28 Variance of a Binomial Random Variable

To find $E[X^2]$ below, we remove the k = 0 term and then let $k' = k - 1$:

$$E[X^2] = \sum_{k=0}^{n} k^2\,\frac{n!}{k!(n - k)!}\,p^k(1 - p)^{n-k} = \sum_{k=1}^{n} k\,\frac{n!}{(k - 1)!(n - k)!}\,p^k(1 - p)^{n-k}$$
$$= np\sum_{k'=0}^{n-1} (k' + 1)\binom{n - 1}{k'}p^{k'}(1 - p)^{n-1-k'}$$
$$= np\Big\{ \sum_{k'=0}^{n-1} k'\binom{n - 1}{k'}p^{k'}(1 - p)^{n-1-k'} + \sum_{k'=0}^{n-1} \binom{n - 1}{k'}p^{k'}(1 - p)^{n-1-k'} \Big\}$$
$$= np\{(n - 1)p + 1\} = np(np + q).$$

In the third line we see that the first sum is the mean of a binomial random variable with parameters n - 1 and p, and hence equal to (n - 1)p. The second sum is the sum of the binomial probabilities and hence equal to 1. We obtain the variance as follows:

$$\sigma_X^2 = E[X^2] - E[X]^2 = np(np + q) - (np)^2 = npq = np(1 - p).$$

We see that the variance of the binomial is n times the variance of a Bernoulli random variable. We observe that values of p close to 0 or to 1 imply smaller variance, and that the maximum variability is when p = 1/2.

Example 3.29 Redundant Systems

A system uses triple redundancy for reliability: Three microprocessors are installed and the system is designed so that it operates as long as one microprocessor is still functional.
Suppose that the probability that a microprocessor is still active after t seconds is $p = e^{-\lambda t}$. Find the probability that the system is still operating after t seconds.

Let X be the number of microprocessors that are functional at time t. X is a binomial random variable with parameters n = 3 and p. Therefore:

$$P[X \ge 1] = 1 - P[X = 0] = 1 - (1 - e^{-\lambda t})^3.$$

3.5.3 The Geometric Random Variable

The geometric random variable arises when we count the number M of independent Bernoulli trials until the first occurrence of a success. M is called the geometric random variable and it takes on values from the set $\{1, 2, \ldots\}$. In Section 2.6, we found that the pmf of M is given by

$$P[M = k] = p_M(k) = (1 - p)^{k-1}p \quad k = 1, 2, \ldots, \qquad (3.36)$$

where p = P[A] is the probability of "success" in each Bernoulli trial. Figure 3.5(b) shows the geometric pmf for p = 1/2. Note that $P[M = k]$ decays geometrically with k, and that the ratio of consecutive terms is $p_M(k + 1)/p_M(k) = (1 - p) = q$. As p increases, the pmf decays more rapidly.

The probability that $M \le k$ can be written in closed form:

$$P[M \le k] = \sum_{j=1}^{k} pq^{j-1} = p\sum_{j'=0}^{k-1} q^{j'} = p\,\frac{1 - q^k}{1 - q} = 1 - q^k. \qquad (3.37)$$

Sometimes we are interested in $M' = M - 1$, the number of failures before a success occurs. We also refer to $M'$ as a geometric random variable. Its pmf is:

$$P[M' = k] = P[M = k + 1] = (1 - p)^k p \quad k = 0, 1, 2, \ldots. \qquad (3.38)$$

In Examples 3.15 and 3.22, we found the mean and variance of the geometric random variable:

$$m_M = E[M] = 1/p \qquad \mathrm{VAR}[M] = \frac{1 - p}{p^2}.$$

We see that the mean and variance increase as p, the success probability, decreases.

The geometric random variable is the only discrete random variable that satisfies the memoryless property:

$$P[M \ge k + j \mid M > j] = P[M \ge k] \quad \text{for all } j, k > 1.$$

(See Problems 3.54 and 3.55.)
The above expression states that if a success has not occurred in the first j trials, then the probability of having to perform at least k more trials is the same as the probability of initially having to perform at least k trials. Thus, each time a failure occurs, the system "forgets" and begins anew as if it were performing the first trial.

The geometric random variable arises in applications where one is interested in the time (i.e., number of trials) that elapses between the occurrence of events in a sequence of independent experiments, as in Examples 2.11 and 2.43. Examples where the modified geometric random variable $M'$ arises are: the number of customers awaiting service in a queueing system; the number of white dots between successive black dots in a scan of a black-and-white document.

3.5.4 The Poisson Random Variable

In many applications, we are interested in counting the number of occurrences of an event in a certain time period or in a certain region in space. The Poisson random variable arises in situations where the events occur "completely at random" in time or space. For example, the Poisson random variable arises in counts of emissions from radioactive substances, in counts of demands for telephone connections, and in counts of defects in a semiconductor chip.

The pmf for the Poisson random variable is given by

$$P[N = k] = p_N(k) = \frac{\alpha^k}{k!}e^{-\alpha} \quad \text{for } k = 0, 1, 2, \ldots, \qquad (3.39)$$

where $\alpha$ is the average number of event occurrences in a specified time interval or region in space. Figure 3.9 shows the Poisson pmf for several values of $\alpha$. For $\alpha < 1$, $P[N = k]$ is maximum at k = 0; for $\alpha > 1$, $P[N = k]$ is maximum at $[\alpha]$; if $\alpha$ is a positive integer, then $P[N = k]$ is maximum at $k = \alpha$ and at $k = \alpha - 1$.
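These properties of the Poisson pmf are easy to check numerically. The sketch below is our own (not from the text); the value α = 3.5 and the truncation point are arbitrary choices:

```python
from math import exp, factorial

def poisson_pmf(k, alpha):
    """P[N = k] for a Poisson random variable, Eq. (3.39)."""
    return alpha ** k / factorial(k) * exp(-alpha)

alpha = 3.5
probs = [poisson_pmf(k, alpha) for k in range(60)]        # truncated; tail is negligible
print(round(sum(probs), 9))                               # 1.0  (pmf sums to one)
print(round(sum(k * q for k, q in enumerate(probs)), 9))  # 3.5  (mean equals alpha)
print(max(range(60), key=lambda k: probs[k]))             # 3    (mode at [alpha])
```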
FIGURE 3.9
Probability mass functions of Poisson random variable (a) α = 0.75; (b) α = 3; (c) α = 9.

The pmf of the Poisson random variable sums to one, since

$$\sum_{k=0}^{\infty} \frac{\alpha^k}{k!}e^{-\alpha} = e^{-\alpha}\sum_{k=0}^{\infty} \frac{\alpha^k}{k!} = e^{-\alpha}e^{\alpha} = 1,$$

where we used the fact that the second summation is the infinite series expansion for $e^{\alpha}$. It is easy to show that the mean and variance of a Poisson random variable are given by:

$$E[N] = \alpha \quad \text{and} \quad \sigma_N^2 = \mathrm{VAR}[N] = \alpha.$$

Example 3.30 Queries at a Call Center

The number N of queries arriving in t seconds at a call center is a Poisson random variable with $\alpha = \lambda t$ where $\lambda$ is the average arrival rate in queries/second. Assume that the arrival rate is four queries per minute. Find the probability of the following events: (a) more than 4 queries in 10 seconds; (b) fewer than 5 queries in 2 minutes.

The arrival rate in queries/second is $\lambda = 4$ queries/60 sec = 1/15 queries/sec. In part a, the time interval is 10 seconds, so we have a Poisson random variable with $\alpha = (1/15 \text{ queries/sec}) \times 10 \text{ seconds} = 2/3$ queries. The probability of interest is evaluated numerically:

$$P[N > 4] = 1 - P[N \le 4] = 1 - \sum_{k=0}^{4} \frac{(2/3)^k}{k!}e^{-2/3} = 6.33 \times 10^{-4}.$$

In part b, the time interval of interest is t = 120 seconds, so $\alpha = 1/15 \times 120 = 8$. The probability of interest is:

$$P[N < 5] = \sum_{k=0}^{4} \frac{8^k}{k!}e^{-8} = 0.10.$$

Example 3.31 Arrivals at a Packet Multiplexer

The number N of packet arrivals in t seconds at a multiplexer is a Poisson random variable with $\alpha = \lambda t$ where $\lambda$ is the average arrival rate in packets/second. Find the probability that there are no packet arrivals in t seconds.

$$P[N = 0] = \frac{\alpha^0}{0!}e^{-\lambda t} = e^{-\lambda t}.$$
This equation has an interesting interpretation. Let Z be the time until the first packet arrival. Suppose we ask, "What is the probability that Z > t, that is, the next arrival occurs t or more seconds later?" Note that {N = 0} implies {Z > t} and vice versa, so

P[Z > t] = e^{−λt}.

The probability of no arrival decreases exponentially with t. Note that we can also show that

P[N(t) ≥ n] = 1 − P[N(t) < n] = 1 − Σ_{k=0}^{n−1} ((λt)^k / k!) e^{−λt}.

One of the applications of the Poisson probabilities in Eq. (3.39) is to approximate the binomial probabilities in the case where p is very small and n is very large, that is, where the event A of interest is very rare but the number of Bernoulli trials is very large. We show that if α = np is fixed, then as n becomes large:

p_k = (n choose k) p^k (1 − p)^{n−k} ≈ (α^k / k!) e^{−α}  for k = 0, 1, … .   (3.40)

Equation (3.40) is obtained by taking the limit n → ∞ in the expression for p_k, while keeping α = np fixed. First, consider the probability that no events occur in n trials:

p_0 = (1 − p)^n = (1 − α/n)^n → e^{−α}  as n → ∞,   (3.41)

where the limit in the last expression is a well-known result from calculus. Next, consider the ratio of successive binomial probabilities:

p_{k+1}/p_k = ((n − k)p) / ((k + 1)q) = ((1 − k/n)α) / ((k + 1)(1 − α/n)) → α/(k + 1)  as n → ∞.

Thus the limiting probabilities satisfy p_{k+1} = (α/(k + 1)) p_k, so that

p_k = (α/k)(α/(k − 1)) ⋯ (α/1) p_0 = (α^k / k!) e^{−α}.   (3.42)

Thus the Poisson pmf can be used to approximate the binomial pmf for large n and small p, using α = np.

Example 3.32 Errors in Optical Transmission
An optical communication system transmits information at a rate of 10^9 bits/second. The probability of a bit error in the optical communication system is 10^{−9}. Find the probability of five or more errors in 1 second.
Each bit transmission corresponds to a Bernoulli trial with a "success" corresponding to a bit error in transmission.
The probability of k errors in n = 10^9 transmissions (1 second) is then given by the binomial probability with n = 10^9 and p = 10^{−9}. The Poisson approximation uses α = np = 10^9 × 10^{−9} = 1. Thus

P[N ≥ 5] = 1 − P[N < 5] = 1 − Σ_{k=0}^{4} (α^k / k!) e^{−α} = 1 − e^{−1}{1 + 1/1! + 1/2! + 1/3! + 1/4!} = 0.00366.

The Poisson random variable appears in numerous physical situations because many models are very large in scale and involve very rare events. For example, the Poisson pmf gives an accurate prediction for the relative frequencies of the number of particles emitted by a radioactive mass during a fixed time period. This correspondence can be explained as follows. A radioactive mass is composed of a large number of atoms, say n. In a fixed time interval each atom has a very small probability p of disintegrating and emitting a radioactive particle. If atoms disintegrate independently of other atoms, then the number of emissions in a time interval can be viewed as the number of successes in n trials. For example, one microgram of radium contains about n = 10^16 atoms, and the probability that a single atom will disintegrate during a one-millisecond time interval is p = 10^{−15} [Rozanov, p. 58]. Thus it is an understatement to say that the conditions for the approximation in Eq. (3.40) hold: n is so large and p so small that one could argue that the limit n → ∞ has been carried out and that the number of emissions is exactly a Poisson random variable.

The Poisson random variable also comes up in situations where we can imagine a sequence of Bernoulli trials taking place in time or space. Suppose we count the number of event occurrences in a T-second interval. Divide the time interval into a very large number, n, of subintervals, as shown in Fig. 3.10. A pulse in a subinterval indicates the occurrence of an event.

[Figure 3.10: Event occurrences in n subintervals of [0, T].]
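The quality of the approximation in Eq. (3.40) is easy to check numerically. The sketch below compares the two pmfs for n = 10^4 and p = 10^{−4} (a smaller, arbitrary choice than the example's 10^9, picked just to keep the direct binomial computation cheap), and also reproduces the 0.00366 figure from Example 3.32:

```python
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    """Exact binomial probability of k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, alpha):
    """Poisson approximation with alpha = n * p (Eq. 3.40)."""
    return alpha**k * exp(-alpha) / factorial(k)

# Large n, small p, alpha = n p = 1: the two pmfs should agree closely.
n, p = 10**4, 10**-4
worst = max(abs(binom_pmf(k, n, p) - poisson_pmf(k, n * p)) for k in range(10))

# Example 3.32: with alpha = 1, P[N >= 5] = 1 - sum_{k=0}^{4} pmf = 0.00366.
p_ge5 = 1 - sum(poisson_pmf(k, 1.0) for k in range(5))
```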
Each subinterval can be viewed as one in a sequence of independent Bernoulli trials if the following conditions hold: (1) at most one event can occur in a subinterval, that is, the probability of more than one event occurrence is negligible; (2) the outcomes in different subintervals are independent; and (3) the probability of an event occurrence in a subinterval is p = α/n, where α is the average number of events observed in a 1-second interval. The number N of events in 1 second is then a binomial random variable with parameters n and p = α/n. Thus as n → ∞, N becomes a Poisson random variable with parameter α. In Chapter 9 we will revisit this result when we discuss the Poisson random process.

3.5.5 The Uniform Random Variable

The discrete uniform random variable Y takes on values in a set of consecutive integers S_Y = {j + 1, …, j + L} with equal probability:

p_Y(k) = 1/L   for k ∈ {j + 1, …, j + L}.   (3.43)

This humble random variable occurs whenever outcomes are equally likely, e.g., the toss of a fair coin or a fair die, the spinning of an arrow in a wheel divided into equal segments, the selection of numbers from an urn. It is easy to show that the mean and variance are:

E[Y] = j + (L + 1)/2  and  VAR[Y] = (L² − 1)/12.

Example 3.33 Discrete Uniform Random Variable in Unit Interval
Let X be a uniform random variable in S_X = {0, 1, …, L − 1}. We define the discrete uniform random variable in the unit interval by U = X/L, so

S_U = {0, 1/L, 2/L, 3/L, …, 1 − 1/L}.

U has pmf:

p_U(k/L) = 1/L  for k = 0, 1, …, L − 1.

The pmf of U puts equal probability mass 1/L on the equally spaced points x_k = k/L in the unit interval. The probability of a subinterval of the unit interval is equal to the number of points in the subinterval multiplied by 1/L. As L becomes very large, this probability is essentially the length of the subinterval.
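The formulas for E[Y] and VAR[Y] above can be verified against the definitions of mean and variance by direct summation; in this Python sketch the values of j and L are arbitrary illustrative choices:

```python
# Check E[Y] = j + (L + 1)/2 and VAR[Y] = (L**2 - 1)/12 for a discrete
# uniform random variable on {j + 1, ..., j + L}, by direct summation.
j, L = 4, 10                 # arbitrary illustrative values
values = range(j + 1, j + L + 1)
pmf = 1 / L                  # equal probability mass on each of the L points

mean = sum(k * pmf for k in values)
var = sum((k - mean) ** 2 * pmf for k in values)
```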
3.5.6 The Zipf Random Variable

The Zipf random variable is named for George Zipf, who observed that the frequency of words in a large body of text is inversely proportional to their rank. Suppose that words are ranked from most frequent, to next most frequent, and so on. Let X be the rank of a word; then S_X = {1, 2, …, L}, where L is the number of distinct words. The pmf of X is:

p_X(k) = (1/c_L)(1/k)  for k = 1, 2, …, L,   (3.44)

where c_L is a normalization constant. The second word has 1/2 the frequency of occurrence of the first, the third word has 1/3 the frequency of the first, and so on. The normalization constant c_L is given by the sum:

c_L = Σ_{j=1}^{L} 1/j = 1 + 1/2 + 1/3 + … + 1/L.   (3.45)

The constant c_L occurs frequently in calculus and is called the Lth harmonic number; it increases approximately as ln L. For example, for L = 100, c_L = 5.187378 and c_L − ln(L) = 0.582207. It can be shown that as L → ∞, c_L − ln L → 0.57721… .

The mean of X is given by:

E[X] = Σ_{j=1}^{L} j p_X(j) = Σ_{j=1}^{L} j/(c_L j) = L/c_L.   (3.46)

The second moment and variance of X are:

E[X²] = Σ_{j=1}^{L} j²/(c_L j) = Σ_{j=1}^{L} j/c_L = L(L + 1)/(2c_L)

and

VAR[X] = L(L + 1)/(2c_L) − L²/c_L².   (3.47)

The Zipf and related random variables have gained prominence with the growth of the Internet, where they have been found in a variety of measurement studies involving Web page sizes, Web access behavior, and Web page interconnectivity. These random variables had previously been found extensively in studies on the distribution of wealth and, not surprisingly, are now found in Internet video rentals and book sales.

[Figure 3.11: Zipf distribution and its long tail — P[X > k] for the Zipf distribution and for a geometric distribution with the same mean.]

[Figure 3.12: Lorenz curve (% wealth versus % population) for the Zipf random variable with L = 100.]
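The harmonic-number values quoted above (c_100 = 5.187378, c_100 − ln 100 = 0.582207) and the mean formula of Eq. (3.46) can be checked with a short standard-library Python sketch:

```python
from math import log

# Harmonic number c_L = 1 + 1/2 + ... + 1/L normalizes the Zipf pmf (Eq. 3.45),
# and c_L - ln L approaches Euler's constant 0.57721... as L grows.
def harmonic(L):
    return sum(1 / j for j in range(1, L + 1))

c100 = harmonic(100)        # text gives 5.187378
gap = c100 - log(100)       # text gives 0.582207

def zipf_pmf(k, cL):
    """p_X(k) = (1/c_L)(1/k) (Eq. 3.44)."""
    return 1 / (cL * k)

# E[X] = L / c_L (Eq. 3.46); for L = 100 this is about 19.28 (used in Fig. 3.11).
mean = sum(k * zipf_pmf(k, c100) for k in range(1, 101))
```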
Example 3.34 Rare Events and Long Tails
The Zipf random variable X has the property that a few outcomes (words) occur frequently but most outcomes occur rarely. Find the probability of words with rank higher than m.

P[X > m] = 1 − P[X ≤ m] = 1 − (1/c_L) Σ_{j=1}^{m} 1/j = 1 − c_m/c_L  for m ≤ L.   (3.48)

We call P[X > m] the probability of the tail of the distribution of X. Figure 3.11 shows P[X > m] with L = 100, which has E[X] = 100/c_100 = 19.28. Figure 3.11 also shows P[Y > m] for a geometric random variable with the same mean, that is, 1/p = 19.28. It can be seen that P[Y > m] for the geometric random variable drops off much more quickly than P[X > m]. The Zipf distribution is said to have a "long tail" because rare events are more likely to occur than in traditional probability models.

Example 3.35 80/20 Rule and the Lorenz Curve
Let X correspond to a level of wealth and p_X(k) be the proportion of a population that has wealth k. Suppose that X is a Zipf random variable. Thus p_X(1) is the proportion of the population with wealth 1, p_X(2) the proportion with wealth 2, and so on. The long tail of the Zipf distribution suggests that very rich individuals are not very rare. We frequently hear statements such as "20% of the population owns 80% of the wealth." The Lorenz curve plots the proportion of wealth owned by the poorest fraction x of the population, as x varies from 0 to 1. Find the Lorenz curve for L = 100.

For k in {1, 2, …, L}, the fraction of the population with wealth k or less is:

F_k = P[X ≤ k] = (1/c_L) Σ_{j=1}^{k} 1/j = c_k/c_L.   (3.49)

The proportion of wealth owned by the population that has wealth k or less is:

W_k = (Σ_{j=1}^{k} j p_X(j)) / (Σ_{i=1}^{L} i p_X(i)) = ((1/c_L) Σ_{j=1}^{k} 1) / ((1/c_L) Σ_{i=1}^{L} 1) = k/L.   (3.50)

The denominator in the above expression is the total wealth of the entire population. The Lorenz curve consists of the plot of the points (F_k, W_k), which is shown in Fig. 3.12 for L = 100.
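Equations (3.49) and (3.50) give the Lorenz curve points in closed form. The Python sketch below computes W_k both by direct summation and from the closed form k/L as a consistency check, and also verifies the "poorest 70% own about 20%" reading of Fig. 3.12:

```python
# Lorenz curve for a Zipf population (Example 3.35): F_k = c_k / c_L is the
# fraction of the population with wealth k or less, and W_k = k / L is the
# fraction of total wealth that this group owns.
L = 100
c = [0.0] * (L + 1)
for j in range(1, L + 1):
    c[j] = c[j - 1] + 1 / j            # partial harmonic sums c_1, ..., c_L

F = [c[k] / c[L] for k in range(1, L + 1)]      # Eq. (3.49)
W = [k / L for k in range(1, L + 1)]            # Eq. (3.50), closed form

# Direct summation of j * p_X(j) must agree with the closed form W_k = k / L.
W_direct = []
acc = 0.0
total_wealth = L / c[L]                 # denominator: sum_i i * p_X(i) = L / c_L
for j in range(1, L + 1):
    acc += j * (1 / (c[L] * j))
    W_direct.append(acc / total_wealth)
```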
In the graph, the poorest 70% of the population owns only 20% of the total wealth; conversely, the wealthiest 30% of the population owns 80% of the wealth. See Problem 3.75 for a discussion of what the Lorenz curve should look like in the cases of extreme fairness and extreme unfairness.

The explosive growth in the Internet has led to systems of huge scale. For probability models this growth has implied random variables that can attain very large values. Measurement studies have revealed many instances of random variables with long tail distributions. If we try to let L approach infinity in Eq. (3.45), c_L grows without bound since the series does not converge. However, if we make the pmf proportional to (1/k)^α, then the series converges as long as α > 1. We define the Zipf or zeta random variable with range {1, 2, 3, …} to have pmf:

p_Z(k) = (1/z_α)(1/k^α)  for k = 1, 2, …,   (3.51)

where z_α is a normalization constant given by the zeta function, which is defined by:

z_α = Σ_{j=1}^{∞} 1/j^α = 1 + 1/2^α + 1/3^α + …   (3.52)

for α > 1. The convergence of the above series is discussed in standard calculus books. The mean of Z is given by:

E[Z] = Σ_{j=1}^{∞} j p_Z(j) = (1/z_α) Σ_{j=1}^{∞} 1/j^{α−1} = z_{α−1}/z_α  for α > 2,

where the sum of the sequence 1/j^{α−1} converges only if α − 1 > 1, that is, α > 2. We can similarly show that the second moment (and hence the variance) exists only if α > 3.

3.6 GENERATION OF DISCRETE RANDOM VARIABLES

Suppose we wish to generate the outcomes of a random experiment that has sample space S = {a_1, a_2, …, a_n} with probabilities of the elementary events p_j = P[{a_j}]. We divide the unit interval into n subintervals. The jth subinterval has length p_j and corresponds to outcome a_j.

[Figure 3.13: Generating a binomial random variable with n = 5, p = 1/2; the unit interval on the U axis is partitioned into subintervals of length p_X(k).]
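The partition of the unit interval just described is the inverse-transform method: draw a uniform U and report the outcome whose subinterval contains U. A minimal Python sketch of this idea (a stand-in for Octave's discrete_rnd, with a fixed seed so the run is reproducible; the fair-die pmf is the example from the text):

```python
import random

def discrete_sample(values, probs, rng):
    """Return one outcome: draw U uniform in [0, 1) and pick the subinterval
    (of length p_j) that U falls in, as in Fig. 3.13."""
    u = rng.random()
    cum = 0.0
    for v, p in zip(values, probs):
        cum += p
        if u < cum:
            return v
    return values[-1]                  # guard against rounding when u is near 1

rng = random.Random(0)                 # fixed seed, for reproducibility
faces = [1, 2, 3, 4, 5, 6]
probs = [1/6] * 6                      # fair die, as in Example 3.36
rolls = [discrete_sample(faces, probs, rng) for _ in range(6000)]
```

With 6000 draws, each face should appear roughly 1000 times, in keeping with the relative-frequency interpretation of the pmf.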
Each trial of the experiment first uses rand to obtain a number U in the unit interval. The outcome of the experiment is a_j if U is in the jth subinterval. Figure 3.13 shows the partitioning of the unit interval according to the pmf of a binomial random variable with n = 5 and p = 0.5. The Octave function discrete_rnd implements the above method and can be used to generate random numbers with desired probabilities. Functions to generate random numbers with common distributions are also available. For example, poisson_rnd(lambda, r, c) can be used to generate an r × c array of Poisson-distributed random numbers with rate lambda.

Example 3.36 Generation of Tosses of a Die
Use discrete_rnd to generate 20 samples of a toss of a die.

> V = 1:6;                             % Define SX = {1, 2, 3, 4, 5, 6}.
> P = [1/6, 1/6, 1/6, 1/6, 1/6, 1/6];  % Set all the pmf values for X to 1/6.
> discrete_rnd (20, V, P)              % Generate 20 samples from SX with pmf P.
ans =
  6 2 2 6 5 2 6 1 3 6 3 1 6 3 4 2 5 3 4 1

Example 3.37 Generation of Poisson Random Variable
Use the built-in function to generate 20 samples of a Poisson random variable with α = 2.

> poisson_rnd (2, 1, 20)   % Generate a 1 x 20 array of samples of a Poisson
                           % random variable with alpha = 2.
ans =
  4 3 0 2 3 2 1 2 1 4 0 1 2 2 3 4 0 1 3

The problems at the end of the chapter elaborate on the rich set of experiments that can be simulated using these basic capabilities of MATLAB or Octave. In the remainder of this book, we will use Octave in examples because it is freely available.

SUMMARY
• A random variable is a function that assigns a real number to each outcome of a random experiment. A random variable is defined if the outcome of a random experiment is a number, or if a numerical attribute of an outcome is of interest.
• The notion of an equivalent event enables us to derive the probabilities of events involving a random variable in terms of the probabilities of events involving the underlying outcomes.
• A random variable is discrete if it assumes values from some countable set. The probability mass function is sufficient to calculate the probability of all events involving a discrete random variable.
• The probability of events involving a discrete random variable X can be expressed as the sum of the probability mass function p_X(x).
• If X is a random variable, then Y = g(X) is also a random variable.
• The mean, variance, and moments of a discrete random variable summarize some of the information about the random variable X. These parameters are useful in practice because they are easier to measure and estimate than the pmf.
• The conditional pmf allows us to calculate the probability of events given partial information about the random variable X.
• There are a number of methods for generating discrete random variables with prescribed pmf's in terms of a random variable that is uniformly distributed in the unit interval.

CHECKLIST OF IMPORTANT TERMS
Discrete random variable
Equivalent event
Expected value of X
Function of a random variable
nth moment of X
Probability mass function
Random variable
Standard deviation of X
Variance of X

ANNOTATED REFERENCES
Reference [1] is the standard reference for electrical engineers for the material on random variables. Reference [2] discusses some of the finer points regarding the concept of a random variable at a level accessible to students of this course. Reference [3] is a classic text, rich in detailed examples. Reference [4] presents detailed discussions of the various methods for generating random numbers with specified distributions. Reference [5] is entirely focused on discrete random variables.
1. A. Papoulis and S. U. Pillai, Probability, Random Variables, and Stochastic Processes, 4th ed., McGraw-Hill, New York, 2002.
2. K. L. Chung, Elementary Probability Theory, Springer-Verlag, New York, 1974.
3. W. Feller, An Introduction to Probability Theory and Its Applications, Wiley, New York, 1968.
4. A. M. Law and W. D. Kelton, Simulation Modeling and Analysis, McGraw-Hill, New York, 2000.
5. N. L. Johnson, A. W. Kemp, and S. Kotz, Univariate Discrete Distributions, Wiley, New York, 2005.
6. Y. A. Rozanov, Probability Theory: A Concise Course, Dover Publications, New York, 1969.

PROBLEMS
Section 3.1: The Notion of a Random Variable
3.1. Let X be the maximum of the number of heads obtained when Carlos and Michael each flip a fair coin twice.
(a) Describe the underlying space S of this random experiment and specify the probabilities of its elementary events.
(b) Show the mapping from S to S_X, the range of X.
(c) Find the probabilities for the various values of X.
3.2. A die is tossed and the random variable X is defined as the number of full pairs of dots in the face showing up.
(a) Describe the underlying space S of this random experiment and specify the probabilities of its elementary events.
(b) Show the mapping from S to S_X, the range of X.
(c) Find the probabilities for the various values of X.
(d) Repeat parts a, b, and c if Y is the number of full or partial pairs of dots in the face showing up.
(e) Explain why P[X = 0] and P[Y = 0] are not equal.
3.3. The loose minute hand of a clock is spun hard. The coordinates (x, y) of the point where the tip of the hand comes to rest are noted. Z is defined as the sgn function of the product of x and y, where sgn(t) is 1 if t > 0, 0 if t = 0, and −1 if t < 0.
(a) Describe the underlying space S of this random experiment and specify the probabilities of its events.
(b) Show the mapping from S to S_Z, the range of Z.
(c) Find the probabilities for the various values of Z.
3.4. A data source generates hexadecimal characters. Let X be the integer value corresponding to a hex character. Suppose that the four binary digits in the character are independent and each is equally likely to be 0 or 1.
(a) Describe the underlying space S of this random experiment and specify the probabilities of its elementary events.
(b) Show the mapping from S to S_X, the range of X.
(c) Find the probabilities for the various values of X.
(d) Let Y be the integer value of a hex character, but suppose that the most significant bit is three times as likely to be a "0" as a "1". Find the probabilities for the values of Y.
3.5. Two transmitters send messages through bursts of radio signals to an antenna. During each time slot each transmitter sends a message with probability 1/2. Simultaneous transmissions result in loss of the messages. Let X be the number of time slots until the first message gets through.
(a) Describe the underlying sample space S of this random experiment and specify the probabilities of its elementary events.
(b) Show the mapping from S to S_X, the range of X.
(c) Find the probabilities for the various values of X.
3.6. An information source produces binary triplets {000, 111, 010, 101, 001, 110, 100, 011} with corresponding probabilities {1/4, 1/4, 1/8, 1/8, 1/16, 1/16, 1/16, 1/16}. A binary code assigns a codeword of length −log₂ p_k to triplet k. Let X be the length of the string assigned to the output of the information source.
(a) Show the mapping from S to S_X, the range of X.
(b) Find the probabilities for the various values of X.
3.7. An urn contains nine $1 bills and one $50 bill. Let the random variable X be the total amount that results when two bills are drawn from the urn without replacement.
(a) Describe the underlying space S of this random experiment and specify the probabilities of its elementary events.
(b) Show the mapping from S to S_X, the range of X.
(c) Find the probabilities for the various values of X.
3.8. An urn contains nine $1 bills and one $50 bill. Let the random variable X be the total amount that results when two bills are drawn from the urn with replacement.
(a) Describe the underlying space S of this random experiment and specify the probabilities of its elementary events.
(b) Show the mapping from S to S_X, the range of X.
(c) Find the probabilities for the various values of X.
3.9. A coin is tossed n times. Let the random variable Y be the difference between the number of heads and the number of tails in the n tosses of the coin. Assume P[heads] = p.
(a) Describe the sample space S.
(b) Find the probability of the event {Y = 0}.
(c) Find the probabilities for the other values of Y.
3.10. An m-bit password is required to access a system. A hacker systematically works through all possible m-bit patterns. Let X be the number of patterns tested until the correct password is found.
(a) Describe the sample space S.
(b) Show the mapping from S to S_X, the range of X.
(c) Find the probabilities for the various values of X.

Section 3.2: Discrete Random Variables and Probability Mass Function
3.11. Let X be the maximum of the coin tosses in Problem 3.1.
(a) Compare the pmf of X with the pmf of Y, the number of heads in two tosses of a fair coin. Explain the difference.
(b) Suppose that Carlos uses a coin with probability of heads p = 3/4. Find the pmf of X.
3.12. Consider an information source that produces binary pairs that we designate as S_X = {1, 2, 3, 4}. Find and plot the pmf in the following cases:
(a) p_k = p_1/k for all k in S_X.
(b) p_{k+1} = p_k/2 for k = 1, 2, 3.
(c) p_{k+1} = p_k/2^k for k = 1, 2, 3.
(d) Can the random variables in parts a, b, and c be extended to take on values in the set {1, 2, …}? If yes, specify the pmf of the resulting random variables. If no, explain why not.
3.13. Let X be a random variable with pmf p_k = c/k² for k = 1, 2, ….
(a) Estimate the value of c numerically. Note that the series converges.
(b) Find P[X > 4].
(c) Find P[6 ≤ X ≤ 8].
3.14. Compare P[X ≥ 8] and P[Y ≥ 8] for the outputs of the data source in Problem 3.4.
3.15. In Problem 3.5 suppose that terminal 1 transmits with probability 1/2 in a given time slot, but terminal 2 transmits with probability p.
(a) Find the pmf for the number of transmissions X until a message gets through.
(b) Given a successful transmission, find the probability that terminal 2 transmitted.
3.16. (a) In Problem 3.7, what is the probability that the amount drawn from the urn is more than $2? More than $50?
(b) Repeat part a for Problem 3.8.
3.17. A modem transmits a +2 voltage signal into a channel. The channel adds to this signal a noise term that is drawn from the set {0, −1, −2, −3} with respective probabilities {4/10, 3/10, 2/10, 1/10}.
(a) Find the pmf of the output Y of the channel.
(b) What is the probability that the output of the channel is equal to the input of the channel?
(c) What is the probability that the output of the channel is positive?
3.18. A computer reserves a path in a network for 10 minutes. To extend the reservation the computer must successfully send a "refresh" message before the expiry time. However, messages are lost with probability 1/2. Suppose that it takes 10 seconds to send a refresh request and receive an acknowledgment. When should the computer start sending refresh messages in order to have a 99% chance of successfully extending the reservation time?
3.19. A modem transmits over an error-prone channel, so it repeats every "0" or "1" bit transmission five times. We call each such group of five bits a "codeword." The channel changes an input bit to its complement with probability p = 1/10, and it does so independently of its treatment of other input bits. The modem receiver takes a majority vote of the five received bits to estimate the input signal. Find the probability that the receiver makes the wrong decision.
3.20. Two dice are tossed and we let X be the difference in the number of dots facing up.
(a) Find and plot the pmf of X.
(b) Find the probability that |X| ≤ k for all k.

Section 3.3: Expected Value and Moments of Discrete Random Variable
3.21. (a) In Problem 3.11, compare E[Y] to E[X], where X is the maximum of the coin tosses.
(b) Compare VAR[X] and VAR[Y].
3.22. Find the expected value and variance of the output of the information sources in Problem 3.12, parts a, b, and c.
3.23. (a) Find E[X] for the hex integers in Problem 3.4.
(b) Find VAR[X].
3.24. Find the mean codeword length in Problem 3.6. How can this average be interpreted in a very large number of encodings of binary triplets?
3.25. (a) Find the mean and variance of the amount drawn from the urn in Problem 3.7.
(b) Find the mean and variance of the amount drawn from the urn in Problem 3.8.
3.26. Find E[Y] and VAR[Y] for the difference between the number of heads and tails in Problem 3.9. In a large number of repetitions of this random experiment, what is the meaning of E[Y]?
3.27. Find E[X] and VAR[X] in Problem 3.13.
3.28. Find the expected value and variance of the modem signal in Problem 3.17.
3.29. Find the mean and variance of the time that it takes to renew the reservation in Problem 3.18.
3.30. The modem in Problem 3.19 transmits 1000 5-bit codewords. What is the average number of codewords in error? If the modem transmits 1000 bits individually without repetition, what is the average number of bits in error? Explain how error rate is traded off against transmission speed.
3.31. (a) Suppose a fair coin is tossed n times. Each coin toss costs d dollars and the reward in obtaining X heads is aX² + bX. Find the expected value of the net reward.
(b) Suppose that the reward in obtaining X heads is a^X, where a > 0. Find the expected value of the reward.
3.32. Let g(X) = I_A, where A = {X > 10}.
(a) Find E[g(X)] for X as in Problem 3.12a with S_X = {1, 2, …, 15}.
(b) Repeat part a for X as in Problem 3.12b with S_X = {1, 2, …, 15}.
(c) Repeat part a for X as in Problem 3.12c with S_X = {1, 2, …, 15}.
3.33.
Let g(X) = (X − 10)⁺ (see Example 3.19).
(a) Find E[g(X)] for X as in Problem 3.12a with S_X = {1, 2, …, 15}.
(b) Repeat part a for X as in Problem 3.12b with S_X = {1, 2, …, 15}.
(c) Repeat part a for X as in Problem 3.12c with S_X = {1, 2, …, 15}.
3.34. Consider the St. Petersburg Paradox in Example 3.16. Suppose that the casino has a total of M = 2^m dollars, and so it can only afford a finite number of coin tosses.
(a) How many tosses can the casino afford?
(b) Find the expected payoff to the player.
(c) How much should a player be willing to pay to play this game?

Section 3.4: Conditional Probability Mass Function
3.35. (a) In Problem 3.11a, find the conditional pmf of X, the maximum of the coin tosses, given that X > 0.
(b) Find the conditional pmf of X given that Michael got one head in two tosses.
(c) Find the conditional pmf of X given that Michael got one head in the first toss.
(d) In Problem 3.11b, find the probability that Carlos got the maximum given that X = 2.
3.36. Find the conditional pmf for the quaternary information source in Problem 3.12, parts a, b, and c, given that X < 4.
3.37. (a) Find the conditional pmf of the hex integer X in Problem 3.4 given that X < 8.
(b) Find the conditional pmf of X given that the first bit is 0.
(c) Find the conditional pmf of X given that the 4th bit is 0.
3.38. (a) Find the conditional pmf of X in Problem 3.5 given that no message gets through in time slot 1.
(b) Find the conditional pmf of X given that the first transmitter transmitted in time slot 1.
3.39. (a) Find the conditional expected value of X in Problem 3.5 given that no message gets through in the first time slot. Show that E[X | X > 1] = E[X] + 1.
(b) Find the conditional expected value of X in Problem 3.5 given that a message gets through in the first time slot.
(c) Find E[X] by using the results of parts a and b.
(d) Find E[X²] and VAR[X] using the approach in parts b and c.
3.40. Explain why Eq.
(3.31b) can be used to find E[X²], but it cannot be used to directly find VAR[X].
3.41. (a) Find the conditional pmf for X in Problem 3.7 given that the first draw produced k dollars.
(b) Find the conditional expected value corresponding to part a.
(c) Find E[X] using the results from part b.
(d) Find E[X²] and VAR[X] using the approach in parts b and c.
3.42. Find E[Y] and VAR[Y] for the difference between the number of heads and tails in n tosses in Problem 3.9. Hint: Condition on the number of heads.
3.43. (a) In Problem 3.10 find the conditional pmf of X given that the password has not been found after k tries.
(b) Find the conditional expected value of X given X > k.
(c) Find E[X] from the results in part b.

Section 3.5: Important Discrete Random Variables
3.44. Indicate the value of the indicator function for the event A, I_A(ζ), for each ζ in the sample space S. Find the pmf and expected value of I_A.
(a) S = {1, 2, 3, 4, 5} and A = {ζ > 3}.
(b) S = [0, 1] and A = {0.3 < ζ ≤ 0.7}.
(c) S = {ζ = (x, y) : 0 < x < 1, 0 < y < 1} and A = {ζ = (x, y) : 0.25 < x + y < 1.25}.
(d) S = (−∞, ∞) and A = {ζ > a}.
3.45. Let A and B be events for a random experiment with sample space S. Show that the Bernoulli random variable satisfies the following properties:
(a) I_S = 1 and I_∅ = 0.
(b) I_{A∩B} = I_A I_B and I_{A∪B} = I_A + I_B − I_A I_B.
(c) Find the expected value of the indicator functions in parts a and b.
3.46. Heat must be removed from a system according to how fast it is generated. Suppose the system has eight components, each of which is active with probability 0.25, independently of the others. The design of the heat removal system requires finding the probabilities of the following events:
(a) None of the components is active.
(b) Exactly one is active.
(c) More than four are active.
(d) More than two and fewer than six are active.
3.47. Eight numbers are selected at random from the unit interval.
(a) Find the probability that the first four numbers are less than 0.25 and the last four are greater than 0.25.
(b) Find the probability that four numbers are less than 0.25 and four are greater than 0.25.
(c) Find the probability that the first three numbers are less than 0.25, the next two are between 0.25 and 0.75, and the last three are greater than 0.75.
(d) Find the probability that three numbers are less than 0.25, two are between 0.25 and 0.75, and three are greater than 0.75.
(e) Find the probability that the first four numbers are less than 0.25 and the last four are greater than 0.75.
(f) Find the probability that four numbers are less than 0.25 and four are greater than 0.75.
3.48. (a) Plot the pmf of the binomial random variable with n = 4 and n = 5, and p = 0.10, p = 0.5, and p = 0.90.
(b) Use Octave to plot the pmf of the binomial random variable with n = 100 and p = 0.10, p = 0.5, and p = 0.90.
3.49. Let X be a binomial random variable that results from the performance of n Bernoulli trials with probability of success p.
(a) Suppose that X = 1. Find the probability that the single event occurred in the kth Bernoulli trial.
(b) Suppose that X = 2. Find the probability that the two events occurred in the jth and kth Bernoulli trials, where j < k.
(c) In light of your answers to parts a and b, in what sense are the successes distributed "completely at random" over the n Bernoulli trials?
3.50. Let X be the binomial random variable.
(a) Show that

p_X(k + 1)/p_X(k) = ((n − k)/(k + 1)) (p/(1 − p)),  where p_X(0) = (1 − p)^n.

(b) Show that part a implies that: (1) P[X = k] is maximum at k_max = [(n + 1)p], where [x] denotes the largest integer that is smaller than or equal to x; and (2) when (n + 1)p is an integer, the maximum is achieved at k_max and k_max − 1.
3.51. Consider the expression (a + b + c)^n.
(a) Use the binomial expansion for (a + b) and c to obtain an expression for (a + b + c)^n.
(b) Now expand all terms of the form (a + b)^k and obtain an expression that involves the multinomial coefficient for M = 3 mutually exclusive events, A_1, A_2, A_3.
(c) Let p_1 = P[A_1], p_2 = P[A_2], p_3 = P[A_3]. Use the result from part b to show that the multinomial probabilities add to one.
3.52. A sequence of characters is transmitted over a channel that introduces errors with probability p = 0.01.
(a) What is the pmf of N, the number of error-free characters between erroneous characters?
(b) What is E[N]?
(c) Suppose we want to be 99% sure that at least 1000 characters are received correctly before a bad one occurs. What is the appropriate value of p?
3.53. Let N be a geometric random variable with S_N = {1, 2, …}.
(a) Find P[N = k | N ≤ m].
(b) Find the probability that N is odd.
3.54. Let M be a geometric random variable. Show that M satisfies the memoryless property:

P[M ≥ k + j | M ≥ j + 1] = P[M ≥ k]  for all j, k > 1.

3.55. Let X be a discrete random variable that assumes only nonnegative integer values and that satisfies the memoryless property. Show that X must be a geometric random variable. Hint: Find an equation that must be satisfied by g(m) = P[M ≥ m].
3.56. An audio player uses a low-quality hard drive. The initial cost of building the player is $50. The hard drive fails after each month of use with probability 1/12. The cost to repair the hard drive is $20. If a 1-year warranty is offered, how much should the manufacturer charge so that the probability of losing money on a player is 1% or less? What is the average cost per player?
3.57. A Christmas fruitcake has Poisson-distributed independent numbers of sultana raisins, iridescent red cherry bits, and radioactive green cherry bits with respective averages 48, 24, and 12 bits per cake. Suppose you politely accept 1/12 of a slice of the cake.
(a) What is the probability that you get lucky and get no green bits in your slice?
(b) What is the probability that you get really lucky and get no green bits and two or fewer red bits in your slice?
(c) What is the probability that you get extremely lucky and get no green or red bits and more than five raisins in your slice?

3.58. The number of orders waiting to be processed is given by a Poisson random variable with parameter α = λ/nμ, where λ is the average number of orders that arrive in a day, μ is the number of orders that can be processed by an employee per day, and n is the number of employees. Let λ = 5 and μ = 1. Find the number of employees required so the probability that more than four orders are waiting is less than 10%. What is the probability that there are no orders waiting?

3.59. The number of page requests that arrive at a Web server is a Poisson random variable with an average of 6000 requests per minute.
(a) Find the probability that there are no requests in a 100-ms period.
(b) Find the probability that there are between 5 and 10 requests in a 100-ms period.

3.60. Use Octave to plot the pmf of the Poisson random variable with α = 0.1, 0.75, 2, 20.

3.61. Find the mean and variance of a Poisson random variable.

3.62. For the Poisson random variable, show that for α < 1, P[N = k] is maximum at k = 0; for α > 1, P[N = k] is maximum at [α]; and if α is a positive integer, then P[N = k] is maximum at k = α and at k = α − 1. Hint: Use the approach of Problem 3.50.

3.63. Compare the Poisson approximation and the binomial probabilities for k = 0, 1, 2, 3 and n = 10, p = 0.1; n = 20 and p = 0.05; and n = 100 and p = 0.01.

3.64. At a given time, the number of households connected to the Internet is a Poisson random variable with mean 50. Suppose that the transmission bit rate available for the households is 20 Megabits per second.
(a) Find the distribution of the transmission bit rate per user.
(b) Find the transmission bit rate that is available to a user with probability 90% or higher.
(c) What is the probability that a user has a share of 1 Megabit per second or higher?

3.65. An LCD display has 1000 × 750 pixels. A display is accepted if it has 15 or fewer faulty pixels. The probability that a pixel is faulty coming out of the production line is 10^−5. Find the proportion of displays that are accepted.

3.66. A data center has 10,000 disk drives. Suppose that a disk drive fails in a given day with probability 10^−3.
(a) Find the probability that there are no failures in a given day.
(b) Find the probability that there are fewer than 10 failures in two days.
(c) Find the number of spare disk drives that should be available so that all failures in a day can be replaced with probability 99%.

3.67. A binary communication channel has a probability of bit error of 10^−6. Suppose that transmissions occur in blocks of 10,000 bits. Let N be the number of errors introduced by the channel in a transmission block.
(a) Find P[N = 0] and P[N ≤ 3].
(b) For what value of p will the probability of 1 or more errors in a block be 99%?

3.68. Find the mean and variance of the uniform discrete random variable that takes on values in the set {1, 2, …, L} with equal probability. You will need the following formulas:

Σ_{i=1}^{n} i = n(n + 1)/2  and  Σ_{i=1}^{n} i² = n(n + 1)(2n + 1)/6.

3.69. A voltage X is uniformly distributed in the set {−3, …, 3, 4}.
(a) Find the mean and variance of X.
(b) Find the mean and variance of Y = −2X² + 3.
(c) Find the mean and variance of W = cos(πX/8).
(d) Find the mean and variance of Z = cos²(πX/8).

3.70. Ten news Web sites are ranked in terms of popularity, and the frequency of requests to these sites is known to follow a Zipf distribution.
(a) What is the probability that a request is for the top-ranked site?
(b) What is the probability that a request is for one of the bottom five sites?

3.71. A collection of 1000 words is known to have a Zipf distribution.
(a) What is the probability of the 10 top-ranked words?
(b) What is the probability of the 10 lowest-ranked words?

3.72. What is the shape of the log of the Zipf probability vs. the log of the rank?

3.73. Plot the mean and variance of the Zipf random variable for L = 1 to L = 100.

3.74. An online video store has 10,000 titles. In order to provide fast response, the store caches the most popular titles. How many titles should be in the cache so that with probability 99% an arriving video request will be in the cache?

3.75. (a) Income distribution is perfectly equal if every individual has the same income. What is the Lorenz curve in this case?
(b) In a perfectly unequal income distribution, one individual has all the income and all others have none. What is the Lorenz curve in this case?

3.76. Let X be a geometric random variable in the set {1, 2, …}.
(a) Find the pmf of X.
(b) Find the Lorenz curve of X. Assume L is infinite.
(c) Plot the curve for p = 0.1, 0.5, 0.9.

3.77. Let X be a zeta random variable with parameter α.
(a) Find an expression for P[X ≤ k].
(b) Plot the pmf of X for α = 1.5, 2, and 3.
(c) Plot P[X ≤ k] for α = 1.5, 2, and 3.

Section 3.6: Generation of Discrete Random Variables

3.78. Octave provides function calls to evaluate the pmf of important discrete random variables. For example, the function Poisson_pdf(x, lambda) computes the pmf at x for the Poisson random variable.
(a) Plot the Poisson pmf for λ = 0.5, 5, 50, as well as P[X ≤ k] and P[X > k].
(b) Plot the binomial pmf for n = 48 and p = 0.10, 0.30, 0.50, 0.75, as well as P[X ≤ k] and P[X > k].
(c) Compare the binomial probabilities with the Poisson approximation for n = 100, p = 0.01.

3.79. The discrete_pdf function in Octave makes it possible to specify an arbitrary pmf for a specified S_X.
(a) Plot the pmf for Zipf random variables with L = 10, 100, 1000, as well as P[X ≤ k] and P[X > k].
(b) Plot the pmf for the reward in the St.
Petersburg Paradox for m = 20 in Problem 3.34, as well as P[X ≤ k] and P[X > k]. (You will need to use a log scale for the values of k.)

3.80. Use Octave to plot the Lorenz curve for the Zipf random variables in Problem 3.79(a).

3.81. Repeat Problem 3.80 for the binomial random variable with n = 100 and p = 0.1, 0.5, and 0.9.

3.82. (a) Use the discrete_rnd function in Octave to simulate the urn experiment discussed in Section 1.3. Compute the relative frequencies of the outcomes in 1000 draws from the urn.
(b) Use the discrete_pdf function in Octave to specify a pmf for a binomial random variable with n = 5 and p = 0.2. Use discrete_rnd to generate 100 samples and plot the relative frequencies.
(c) Use binomial_rnd to generate the 100 samples in part (b).

3.83. Use the discrete_rnd function to generate 200 samples of the Zipf random variable in Problem 3.79(a). Plot the sequence of outcomes as well as the overall relative frequencies.

3.84. Use the discrete_rnd function to generate 200 samples of the St. Petersburg Paradox random variable in Problem 3.79(b). Plot the sequence of outcomes as well as the overall relative frequencies.

3.85. Use Octave to generate 200 pairs of numbers, (X_i, Y_i), in which the components are independent, and each component is uniform in the set {1, 2, …, 9, 10}.
(a) Plot the relative frequencies of the X and Y outcomes.
(b) Plot the relative frequencies of the random variable Z = X + Y. Can you discern the pmf of Z?
(c) Plot the relative frequencies of W = XY. Can you discern the pmf of W?
(d) Plot the relative frequencies of V = X/Y. Is the pmf discernible?

3.86. Use the Octave function binomial_rnd to generate 200 pairs of numbers, (X_i, Y_i), in which the components are independent, and where the X_i are binomial with parameters n = 8, p = 0.5 and the Y_i are binomial with parameters n = 4, p = 0.5.
(a) Plot the relative frequencies of the X and Y outcomes.
(b) Plot the relative frequencies of the random variable Z = X + Y.
Does this correspond to the pmf you would expect? Explain.

3.87. Use the Octave function Poisson_rnd to generate 200 pairs of numbers, (X_i, Y_i), in which the components are independent, and where the X_i are the number of arrivals to a system in one second and the Y_i are the number of arrivals to the system in the next two seconds. Assume that the arrival rate is five customers per second.
(a) Plot the relative frequencies of the X and Y outcomes.
(b) Plot the relative frequencies of the random variable Z = X + Y. Does this correspond to the pmf you would expect? Explain.

Problems Requiring Cumulative Knowledge

3.88. The fraction of defective items in a production line is p. Each item is tested, and defective items are identified correctly with probability a.
(a) Assume nondefective items always pass the test. What is the probability that k items are tested until a defective item is identified?
(b) Suppose that the identified defective items are removed. What proportion of the remaining items is defective?
(c) Now suppose that nondefective items are identified as defective with probability b. Repeat part (b).

3.89. A data transmission system uses messages of duration T seconds. After each message transmission, the transmitter stops and waits T seconds for a reply from the receiver. The receiver immediately replies with a message indicating that a message was received correctly. The transmitter proceeds to send a new message if it receives a reply within T seconds; otherwise, it retransmits the previous message. Suppose that messages can be completely garbled while in transit and that this occurs with probability p. Find the maximum possible rate at which messages can be successfully transmitted from the transmitter to the receiver.

3.90. An inspector selects every nth item in a production line for a detailed inspection. Suppose that the time between item arrivals is an exponential random variable with mean 1 minute, and suppose that it takes 2 minutes to inspect an item.
Find the smallest value of n such that with a probability of 90% or more, the inspection is completed before the arrival of the next item that requires inspection.

3.91. The number X of photons counted by a receiver in an optical communication system is a Poisson random variable with rate λ1 when a signal is present and a Poisson random variable with rate λ0 < λ1 when a signal is absent. Suppose that a signal is present with probability p.
(a) Find P[signal present | X = k] and P[signal absent | X = k].
(b) The receiver uses the following decision rule: If P[signal present | X = k] > P[signal absent | X = k], decide signal present; otherwise, decide signal absent. Show that this decision rule leads to the following threshold rule: If X > T, decide signal present; otherwise, decide signal absent.
(c) What is the probability of error for the above decision rule?

3.92. A binary information source (e.g., a document scanner) generates very long strings of 0's followed by occasional 1's. Suppose that symbols are independent and that p = P[symbol = 0] is very close to one. Consider the following scheme for encoding the run X of 0's between consecutive 1's:
1. If X = n, express n as a multiple of an integer M = 2^m and a remainder r, that is, find k and r such that n = kM + r, where 0 ≤ r ≤ M − 1.
2. The binary codeword for n then consists of a prefix of k 0's followed by a 1, and a suffix consisting of the m-bit representation of the remainder r. The decoder can deduce the value of n from this binary string.
(a) Find the probability that the prefix has k zeros, assuming that p^M = 1/2.
(b) Find the average codeword length when p^M = 1/2.
(c) Find the compression ratio, which is defined as the ratio of the average run length to the average codeword length, when p^M = 1/2.
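The run-length code in Problem 3.92 can be sketched directly. The following is an illustrative Python implementation, not code from the text; the helper names encode_run and decode_run are my own:

```python
def encode_run(n, m):
    """Encode a run length n with parameter M = 2**m (the scheme of Problem 3.92).
    Prefix: k zeros then a 1, where n = k*M + r; suffix: m-bit binary form of r."""
    M = 2 ** m
    k, r = divmod(n, M)           # n = k*M + r with 0 <= r <= M - 1
    return "0" * k + "1" + format(r, f"0{m}b")

def decode_run(codeword, m):
    """Recover n: count the prefix zeros to get k, read the m suffix bits to get r."""
    k = codeword.index("1")       # position of the first 1 equals the zero count
    r = int(codeword[k + 1:], 2)
    return k * (2 ** m) + r

# Round trip for a few run lengths with M = 2**4 = 16
for n in [0, 5, 16, 37]:
    assert decode_run(encode_run(n, 4), 4) == n
```

For example, with m = 4 the run n = 37 = 2·16 + 5 encodes as the prefix "001" followed by the suffix "0101".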
CHAPTER 4 One Random Variable

In Chapter 3 we introduced the notion of a random variable and we developed methods for calculating probabilities and averages for the case where the random variable is discrete. In this chapter we consider the general case where the random variable may be discrete, continuous, or of mixed type. We introduce the cumulative distribution function, which is used in the formal definition of a random variable and which can handle all three types of random variables. We also introduce the probability density function for continuous random variables. The probabilities of events involving a random variable can be expressed as integrals of its probability density function. The expected value of continuous random variables is also introduced and related to our intuitive notion of average. We develop a number of methods for calculating probabilities and averages that are the basic tools in the analysis and design of systems that involve randomness.

4.1 THE CUMULATIVE DISTRIBUTION FUNCTION

The probability mass function of a discrete random variable was defined in terms of events of the form {X = b}. The cumulative distribution function is an alternative approach that uses events of the form {X ≤ b}. The cumulative distribution function has the advantage that it is not limited to discrete random variables and applies to all types of random variables. We begin with a formal definition of a random variable.

Definition: Consider a random experiment with sample space S and event class F. A random variable X is a function from the sample space S to R with the property that the set A_b = {ζ : X(ζ) ≤ b} is in F for every b in R.

The definition simply requires that every set A_b have a well-defined probability in the underlying random experiment, and this is not a problem in the cases we will consider. Why does the definition use sets of the form {ζ : X(ζ) ≤ b} and not {ζ : X(ζ) = b}?
We will see that all events of interest on the real line can be expressed in terms of sets of the form {ζ : X(ζ) ≤ b}. The cumulative distribution function (cdf) of a random variable X is defined as the probability of the event {X ≤ x}:

F_X(x) = P[X ≤ x]  for −∞ < x < +∞,  (4.1)

that is, it is the probability that the random variable X takes on a value in the set (−∞, x]. In terms of the underlying sample space, the cdf is the probability of the event {ζ : X(ζ) ≤ x}. The event {X ≤ x} and its probability vary as x is varied; in other words, F_X(x) is a function of the variable x.

The cdf is simply a convenient way of specifying the probability of all semi-infinite intervals of the real line of the form (−∞, b]. The events of interest when dealing with numbers are intervals of the real line, and their complements, unions, and intersections. We show below that the probabilities of all of these events can be expressed in terms of the cdf.

The cdf has the following interpretation in terms of relative frequency. Suppose that the experiment that yields the outcome ζ, and hence X(ζ), is performed a large number of times. F_X(b) is then the long-term proportion of times in which X(ζ) ≤ b.

Before developing the general properties of the cdf, we present examples of the cdfs for three basic types of random variables.

Example 4.1 Three Coin Tosses

Figure 4.1(a) shows the cdf of X, the number of heads in three tosses of a fair coin. From Example 3.1 we know that X takes on only the values 0, 1, 2, and 3 with probabilities 1/8, 3/8, 3/8, and 1/8, respectively, so F_X(x) is simply the sum of the probabilities of the outcomes from {0, 1, 2, 3} that are less than or equal to x. The resulting cdf is seen to be a nondecreasing staircase function that grows from 0 to 1. The cdf has jumps at the points 0, 1, 2, 3 of magnitudes 1/8, 3/8, 3/8, and 1/8, respectively.
Let us take a closer look at one of these discontinuities, say, in the vicinity of x = 1. For δ a small positive number, we have

F_X(1 − δ) = P[X ≤ 1 − δ] = P[0 heads] = 1/8,

so the limit of the cdf as x approaches 1 from the left is 1/8. However,

F_X(1) = P[X ≤ 1] = P[0 or 1 heads] = 1/8 + 3/8 = 1/2,

and furthermore the limit from the right is

F_X(1 + δ) = P[X ≤ 1 + δ] = P[0 or 1 heads] = 1/2.

FIGURE 4.1 cdf (a) and pdf (b) of a discrete random variable.

Thus the cdf is continuous from the right and equal to 1/2 at the point x = 1. Indeed, we note that the magnitude of the jump at the point x = 1 is equal to P[X = 1] = 1/2 − 1/8 = 3/8. Henceforth we will use dots in the graph to indicate the value of the cdf at the points of discontinuity. The cdf can be written compactly in terms of the unit step function:

u(x) = 0 for x < 0, and 1 for x ≥ 0;  (4.2)

then

F_X(x) = (1/8)u(x) + (3/8)u(x − 1) + (3/8)u(x − 2) + (1/8)u(x − 3).

Example 4.2 Uniform Random Variable in the Unit Interval

Spin an arrow attached to the center of a circular board. Let θ be the final angle of the arrow, where 0 < θ ≤ 2π. The probability that θ falls in a subinterval of (0, 2π] is proportional to the length of the subinterval. The random variable X is defined by X(θ) = θ/2π. Find the cdf of X.

As θ increases from 0 to 2π, X increases from 0 to 1. No outcomes θ lead to values x ≤ 0, so

F_X(x) = P[X ≤ x] = P[∅] = 0  for x ≤ 0.

For 0 < x ≤ 1, {X ≤ x} occurs when {θ ≤ 2πx}, so

F_X(x) = P[X ≤ x] = P[{θ ≤ 2πx}] = 2πx/2π = x  for 0 < x ≤ 1.  (4.3)

Finally, for x > 1, all outcomes θ lead to {X(θ) ≤ 1 < x}; therefore

F_X(x) = P[X ≤ x] = P[0 < θ ≤ 2π] = 1  for x > 1.

We say that X is a uniform random variable in the unit interval. Figure 4.2(a) shows the cdf of the general uniform random variable X.
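The spinner model in Example 4.2 lends itself to a quick simulation check. The sketch below is my own, not from the text: the relative frequency of {X ≤ x} should reproduce the cdf F_X(x) = x on the unit interval.

```python
import math
import random

random.seed(1)
# Spin the arrow: theta uniform on (0, 2*pi], then X = theta / (2*pi)
two_pi = 2 * math.pi
samples = [random.uniform(0, two_pi) / two_pi for _ in range(100_000)]

def empirical_cdf(xs, x):
    """Long-run proportion of observed values <= x (relative-frequency view of the cdf)."""
    return sum(1 for v in xs if v <= x) / len(xs)

# F_X(x) = x for 0 < x <= 1: check at a few points
for x in [0.1, 0.5, 0.9]:
    assert abs(empirical_cdf(samples, x) - x) < 0.01
```

With 100,000 spins the empirical proportions agree with F_X(x) = x to within about one percentage point, matching the relative-frequency interpretation of the cdf given above.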
We see that F_X(x) is a nondecreasing continuous function that grows from 0 to 1 as x ranges from its minimum value to its maximum value.

FIGURE 4.2 cdf (a) and pdf (b) of a continuous random variable.

Example 4.3

The waiting time X of a customer at a taxi stand is zero if the customer finds a taxi parked at the stand, and a uniformly distributed random length of time in the interval [0, 1] (in hours) if no taxi is found upon arrival. The probability that a taxi is at the stand when the customer arrives is p. Find the cdf of X.

The cdf is found by applying the theorem on total probability:

F_X(x) = P[X ≤ x] = P[X ≤ x | find taxi]p + P[X ≤ x | no taxi](1 − p).

Note that P[X ≤ x | find taxi] = 1 when x ≥ 0 and 0 otherwise. Furthermore, P[X ≤ x | no taxi] is given by Eq. (4.3); therefore

F_X(x) = 0 for x < 0; p + (1 − p)x for 0 ≤ x ≤ 1; and 1 for x > 1.

The cdf, shown in Fig. 4.3(a), combines some of the properties of the cdf in Example 4.1 (discontinuity at 0) and the cdf in Example 4.2 (continuity over intervals). Note that F_X(x) can be expressed as the sum of a step function with amplitude p and a continuous function of x.

We are now ready to state the basic properties of the cdf. The axioms of probability and their corollaries imply that the cdf has the following properties:

(i) 0 ≤ F_X(x) ≤ 1.
(ii) lim_{x→∞} F_X(x) = 1.
(iii) lim_{x→−∞} F_X(x) = 0.
(iv) F_X(x) is a nondecreasing function of x; that is, if a < b, then F_X(a) ≤ F_X(b).
(v) F_X(x) is continuous from the right; that is, for h > 0, F_X(b) = lim_{h→0} F_X(b + h) = F_X(b+).

These five properties confirm that, in general, the cdf is a nondecreasing function that grows from 0 to 1 as x increases from −∞ to ∞. We already observed these properties in Examples 4.1, 4.2, and 4.3.

FIGURE 4.3 cdf (a) and pdf (b) of a random variable of mixed type.

Property (v) implies that at points of discontinuity, the cdf
is equal to the limit from the right. We observed this property in Examples 4.1 and 4.3. In Example 4.2 the cdf is continuous for all values of x; that is, the cdf is continuous both from the right and from the left for all x.

The cdf has the following properties, which allow us to calculate the probability of events involving intervals and single values of X:

(vi) P[a < X ≤ b] = F_X(b) − F_X(a).
(vii) P[X = b] = F_X(b) − F_X(b−).
(viii) P[X > x] = 1 − F_X(x).

Property (vii) states that the probability that X = b is given by the magnitude of the jump of the cdf at the point b. This implies that if the cdf is continuous at a point b, then P[X = b] = 0. Properties (vi) and (vii) can be combined to compute the probabilities of other types of intervals. For example, since {a ≤ X ≤ b} = {X = a} ∪ {a < X ≤ b}, then

P[a ≤ X ≤ b] = P[X = a] + P[a < X ≤ b]
= F_X(a) − F_X(a−) + F_X(b) − F_X(a) = F_X(b) − F_X(a−).  (4.4)

If the cdf is continuous at the endpoints of an interval, then the endpoints have zero probability, and therefore they can be included in, or excluded from, the interval without affecting the probability.

Example 4.4

Let X be the number of heads in three tosses of a fair coin. Use the cdf to find the probability of the events A = {1 < X ≤ 2}, B = {0.5 ≤ X < 2.5}, and C = {1 ≤ X < 2}.

From property (vi) and Fig. 4.1 we have

P[1 < X ≤ 2] = F_X(2) − F_X(1) = 7/8 − 1/2 = 3/8.

The cdf is continuous at x = 0.5 and x = 2.5, so

P[0.5 ≤ X < 2.5] = F_X(2.5) − F_X(0.5) = 7/8 − 1/8 = 6/8.

Since {1 ≤ X < 2} ∪ {X = 2} = {1 ≤ X ≤ 2}, from Eq. (4.4) we have

P[1 ≤ X < 2] + P[X = 2] = F_X(2) − F_X(1−),

and using property (vii) for P[X = 2]:

P[1 ≤ X < 2] = F_X(2) − F_X(1−) − P[X = 2]
= F_X(2) − F_X(1−) − (F_X(2) − F_X(2−))
= F_X(2−) − F_X(1−) = 4/8 − 1/8 = 3/8.

Example 4.5

Let X be the uniform random variable from Example 4.2. Use the cdf to find the probability of the events {−0.5 < X < 0.25}, {0.3 < X < 0.65}, and {|X − 0.4| > 0.2}.
The cdf of X is continuous at every point, so we have:

P[−0.5 < X ≤ 0.25] = F_X(0.25) − F_X(−0.5) = 0.25 − 0 = 0.25,
P[0.3 < X < 0.65] = F_X(0.65) − F_X(0.3) = 0.65 − 0.3 = 0.35,
P[|X − 0.4| > 0.2] = P[{X < 0.2} ∪ {X > 0.6}] = P[X < 0.2] + P[X > 0.6]
= F_X(0.2) + (1 − F_X(0.6)) = 0.2 + 0.4 = 0.6.

We now consider the proof of the properties of the cdf.

• Property (i) follows from the fact that the cdf is a probability and hence must satisfy Axiom I and Corollary 2.
• To obtain property (iv), we note that the event {X ≤ a} is a subset of {X ≤ b}, and so it must have smaller or equal probability (Corollary 7).
• To show property (vi), we note that {X ≤ b} can be expressed as the union of mutually exclusive events: {X ≤ a} ∪ {a < X ≤ b} = {X ≤ b}, and so by Axiom III, F_X(a) + P[a < X ≤ b] = F_X(b).
• Property (viii) follows from {X > x} = {X ≤ x}^c and Corollary 1.

While intuitively clear, properties (ii), (iii), (v), and (vii) require more advanced limiting arguments that are discussed at the end of this section.

4.1.1 The Three Types of Random Variables

The random variables in Examples 4.1, 4.2, and 4.3 are typical of the three most basic types of random variable that we are interested in.

Discrete random variables have a cdf that is a right-continuous, staircase function of x, with jumps at a countable set of points x_0, x_1, x_2, …. The random variable in Example 4.1 is a typical example of a discrete random variable. The cdf F_X(x) of a discrete random variable is the sum of the probabilities of the outcomes less than or equal to x and can be written as the weighted sum of unit step functions as in Example 4.1:

F_X(x) = Σ_{x_k ≤ x} p_X(x_k) = Σ_k p_X(x_k) u(x − x_k),  (4.5)

where the pmf p_X(x_k) = P[X = x_k] gives the magnitude of the jumps in the cdf. We see that the pmf can be obtained from the cdf and vice versa.
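The correspondence between pmf and cdf in Eq. (4.5) can be sketched in a few lines of Python, using the pmf of Example 4.1. This is an illustrative sketch of my own, not code from the text:

```python
# pmf of the number of heads in three fair coin tosses (Example 4.1)
points = [0, 1, 2, 3]
pmf = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}

def cdf(x):
    """Eq. (4.5): F_X(x) is the sum of p_X(x_k) over all x_k <= x."""
    return sum(p for xk, p in pmf.items() if xk <= x)

# Staircase values and right-continuity at the jump points
assert cdf(0.5) == 1/8 and cdf(1) == 1/2 and cdf(2.9) == 7/8 and cdf(3) == 1.0

# The pmf is recovered from the magnitude of each jump (property (vii))
eps = 1e-9
recovered = {xk: cdf(xk) - cdf(xk - eps) for xk in points}
assert all(abs(recovered[xk] - pmf[xk]) < 1e-6 for xk in points)
```

Summing the pmf up to x gives the staircase cdf; differencing the cdf across each jump gives the pmf back, which is the "and vice versa" noted above.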
A continuous random variable is defined as a random variable whose cdf F_X(x) is continuous everywhere and, in addition, is sufficiently smooth that it can be written as an integral of some nonnegative function f(x):

F_X(x) = ∫_{−∞}^{x} f(t) dt.  (4.6)

The random variable discussed in Example 4.2 can be written as an integral of the function shown in Fig. 4.2(b). The continuity of the cdf and property (vii) imply that continuous random variables have P[X = x] = 0 for all x. Every possible outcome has probability zero! An immediate consequence is that the pmf cannot be used to characterize the probabilities of X. A comparison of Eqs. (4.5) and (4.6) suggests how we can proceed to characterize continuous random variables. For discrete random variables (Eq. 4.5), we calculate probabilities as summations of probability masses at discrete points. For continuous random variables (Eq. 4.6), we calculate probabilities as integrals of "probability densities" over intervals of the real line.

A random variable of mixed type is a random variable with a cdf that has jumps on a countable set of points x_0, x_1, x_2, …, but that also increases continuously over at least one interval of values of x. The cdf for these random variables has the form

F_X(x) = pF_1(x) + (1 − p)F_2(x),

where 0 < p < 1, F_1(x) is the cdf of a discrete random variable, and F_2(x) is the cdf of a continuous random variable. The random variable in Example 4.3 is of mixed type. Random variables of mixed type can be viewed as being produced by a two-step process: A coin is tossed; if the outcome of the toss is heads, a discrete random variable is generated according to F_1(x); otherwise, a continuous random variable is generated according to F_2(x).

*4.1.2 Fine Point: Limiting Properties of the cdf

Properties (ii), (iii), (v), and (vii) require the continuity property of the probability function discussed in Section 2.9.
For example, for property (ii), we consider the sequence of events {X ≤ n}, which increases to include all of the sample space S as n approaches ∞; that is, all outcomes lead to a value of X less than infinity. The continuity property of the probability function (Corollary 8) implies that:

lim_{n→∞} F_X(n) = lim_{n→∞} P[X ≤ n] = P[lim_{n→∞} {X ≤ n}] = P[S] = 1.

For property (iii), we take the sequence {X ≤ −n}, which decreases to the empty set ∅; that is, no outcome leads to a value of X less than −∞:

lim_{n→∞} F_X(−n) = lim_{n→∞} P[X ≤ −n] = P[lim_{n→∞} {X ≤ −n}] = P[∅] = 0.

For property (v), we take the sequence of events {X ≤ x + 1/n}, which decreases to {X ≤ x} from the right:

lim_{n→∞} F_X(x + 1/n) = lim_{n→∞} P[X ≤ x + 1/n] = P[lim_{n→∞} {X ≤ x + 1/n}] = P[{X ≤ x}] = F_X(x).

Finally, for property (vii), we take the sequence of events {b − 1/n < X ≤ b}, which decreases to {b} from the left:

lim_{n→∞} (F_X(b) − F_X(b − 1/n)) = lim_{n→∞} P[b − 1/n < X ≤ b] = P[lim_{n→∞} {b − 1/n < X ≤ b}] = P[X = b].

4.2 THE PROBABILITY DENSITY FUNCTION

The probability density function (pdf) of X, if it exists, is defined as the derivative of F_X(x):

f_X(x) = dF_X(x)/dx.  (4.7)

In this section we show that the pdf is an alternative, and more useful, way of specifying the information contained in the cumulative distribution function.

The pdf represents the "density" of probability at the point x in the following sense: the probability that X is in a small interval {x < X ≤ x + h} in the vicinity of x is

P[x < X ≤ x + h] = F_X(x + h) − F_X(x) = [(F_X(x + h) − F_X(x))/h] · h.  (4.8)

If the cdf has a derivative at x, then as h becomes very small,

P[x < X ≤ x + h] ≈ f_X(x)h.  (4.9)

Thus f_X(x) represents the "density" of probability at the point x in the sense that the probability that X is in a small interval in the vicinity of x is approximately f_X(x)h.
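The approximation in Eq. (4.9) can be checked numerically. The sketch below uses an illustrative cdf F(x) = x² on [0, 1], with pdf f(x) = 2x; this particular F is my own choice for the demonstration, not an example from the text:

```python
def F(x):
    """Illustrative cdf on [0, 1]: F(x) = x**2 (an assumed example, not from the text)."""
    return min(max(x, 0.0), 1.0) ** 2

def f(x):
    """Its pdf: f(x) = 2x for 0 <= x <= 1, and 0 elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0

# Eq. (4.9): P[x < X <= x + h] = F(x + h) - F(x) is close to f(x) * h for small h
x, h = 0.5, 1e-4
exact = F(x + h) - F(x)
approx = f(x) * h
assert abs(exact - approx) / exact < 1e-3
```

For this F the discrepancy between the interval probability and f(x)h shrinks like h², so the "density times width" picture becomes exact in the limit h → 0.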
The derivative of the cdf, when it exists, is positive since the cdf is a nondecreasing function of x; thus

(i) f_X(x) ≥ 0.  (4.10)

Equations (4.9) and (4.10) provide us with an alternative approach to specifying the probabilities involving the random variable X. We can begin by stating a nonnegative function f_X(x), called the probability density function, which specifies the probabilities of events of the form "X falls in a small interval of width dx about the point x," as shown in Fig. 4.4(a). The probabilities of events involving X are then expressed in terms of the pdf by adding the probabilities of intervals of width dx. As the widths of the intervals approach zero, we obtain an integral in terms of the pdf. For example, the probability of an interval [a, b] is

(ii) P[a ≤ X ≤ b] = ∫_a^b f_X(x) dx.  (4.11)

The probability of an interval is therefore the area under f_X(x) in that interval, as shown in Fig. 4.4(b). The probability of any event that consists of the union of disjoint intervals can thus be found by adding the integrals of the pdf over each of the intervals. The cdf of X can be obtained by integrating the pdf:

(iii) F_X(x) = ∫_{−∞}^{x} f_X(t) dt.  (4.12)

FIGURE 4.4 (a) The probability density function specifies the probability of intervals of infinitesimal width. (b) The probability of an interval [a, b] is the area under the pdf in that interval.

In Section 4.1, we defined a continuous random variable as a random variable X whose cdf was given by Eq. (4.12). Since the probabilities of all events involving X can be written in terms of the cdf, it then follows that these probabilities can be written in terms of the pdf. Thus the pdf completely specifies the behavior of continuous random variables. By letting x tend to infinity in Eq. (4.12), we obtain a normalization condition for pdf's:

(iv) 1 = ∫_{−∞}^{+∞} f_X(t) dt.
(4.13)

The pdf reinforces the intuitive notion of probability as having attributes similar to "physical mass." Thus Eq. (4.11) states that the probability "mass" in an interval is the integral of the "density of probability mass" over the interval. Equation (4.13) states that the total mass available is one unit.

A valid pdf can be formed from any nonnegative, piecewise continuous function g(x) that has a finite integral:

∫_{−∞}^{∞} g(x) dx = c < ∞.  (4.14)

By letting f_X(x) = g(x)/c, we obtain a function that satisfies the normalization condition. Note that the pdf must be defined for all real values of x; if X does not take on values from some region of the real line, we simply set f_X(x) = 0 in the region.

Example 4.6 Uniform Random Variable

The pdf of the uniform random variable is given by:

f_X(x) = 1/(b − a) for a ≤ x ≤ b, and 0 for x < a and x > b,  (4.15a)

and is shown in Fig. 4.2(b). The cdf is found from Eq. (4.12):

F_X(x) = 0 for x < a; (x − a)/(b − a) for a ≤ x ≤ b; and 1 for x > b.  (4.15b)

The cdf is shown in Fig. 4.2(a).

Example 4.7 Exponential Random Variable

The transmission time X of messages in a communication system has an exponential distribution:

P[X > x] = e^(−λx)  for x > 0.

Find the cdf and pdf of X. The cdf is given by F_X(x) = 1 − P[X > x]:

F_X(x) = 0 for x < 0, and 1 − e^(−λx) for x ≥ 0.  (4.16a)

The pdf is obtained by applying Eq. (4.7):

f_X(x) = F′_X(x) = 0 for x < 0, and λe^(−λx) for x ≥ 0.  (4.16b)

Example 4.8 Laplacian Random Variable

The pdf of the samples of the amplitude of speech waveforms is found to decay exponentially at a rate α, so the following pdf is proposed:

f_X(x) = ce^(−α|x|)  for −∞ < x < ∞.  (4.17)

Find the constant c, and then find the probability P[|X| < v].

We use the normalization condition in (iv) to find c:

1 = ∫_{−∞}^{∞} ce^(−α|x|) dx = 2∫_0^{∞} ce^(−αx) dx = 2c/α.

Therefore c = α/2. The probability P[|X| < v] is found by integrating the pdf:

P[|X| < v] = ∫_{−v}^{v} (α/2)e^(−α|x|) dx = 2∫_0^{v} (α/2)e^(−αx) dx = 1 − e^(−αv).

4.2.1 pdf of Discrete Random Variables

The derivative of the cdf does not exist at points where the cdf is not continuous. Thus the notion of pdf as defined by Eq. (4.7) does not apply to discrete random variables at the points where the cdf is discontinuous. We can generalize the definition of the probability density function by noting the relation between the unit step function and the delta function. The unit step function is defined as

u(x) = 0 for x < 0, and 1 for x ≥ 0.  (4.18a)

The delta function δ(t) is related to the unit step function by the following equation:

u(x) = ∫_{−∞}^{x} δ(t) dt.  (4.18b)

A translated unit step function is then:

u(x − x_0) = ∫_{−∞}^{x−x_0} δ(t) dt = ∫_{−∞}^{x} δ(t′ − x_0) dt′.  (4.18c)

Substituting Eq. (4.18c) into the cdf of a discrete random variable:

F_X(x) = Σ_k p_X(x_k) u(x − x_k) = Σ_k p_X(x_k) ∫_{−∞}^{x} δ(t − x_k) dt = ∫_{−∞}^{x} Σ_k p_X(x_k) δ(t − x_k) dt.  (4.19)

This suggests that we define the pdf for a discrete random variable by

f_X(x) = (d/dx) F_X(x) = Σ_k p_X(x_k) δ(x − x_k).  (4.20)

Thus the generalized definition of pdf places a delta function of weight P[X = x_k] at each point x_k where the cdf is discontinuous.

To provide some intuition on the delta function, consider a narrow rectangular pulse of unit area and width Δ centered at t = 0:

p_Δ(t) = 1/Δ for −Δ/2 ≤ t ≤ Δ/2, and 0 for |t| > Δ/2.

Consider the integral of p_Δ(t):

∫_{−∞}^{x} p_Δ(t) dt = 0 for x < −Δ/2,  and  ∫_{−∞}^{x} p_Δ(t) dt = ∫_{−Δ/2}^{Δ/2} (1/Δ) dt = 1 for x > Δ/2.  (4.21)

As Δ → 0, we see that the integral of the narrow pulse approaches the unit step function. For this reason, we visualize the delta function δ(t) as being zero everywhere except at x = 0, where it is unbounded. The above equation does not apply at the value x = 0. To maintain the right continuity in Eq. (4.18a), we use the convention:

u(0) = 1 = ∫_{−∞}^{0} δ(t) dt.
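A small numeric sketch (my own, not the text's) of Eq. (4.21): away from 0, the integral of the narrow pulse settles at the unit step value once Δ is small enough, while at x = 0 it sits at 1/2, which is why the right-continuity convention above must be imposed separately.

```python
def pulse_integral(x, delta):
    """Integral of the unit-area rectangular pulse p_delta from -infinity to x (Eq. 4.21)."""
    if x < -delta / 2:
        return 0.0
    if x > delta / 2:
        return 1.0
    return (x + delta / 2) / delta  # linear ramp across the pulse's support

# For fixed x != 0, the integral equals u(x) for every delta < 2|x|
for delta in [1.0, 0.1, 0.001]:
    assert pulse_integral(-0.6, delta) == 0.0   # u(-0.6) = 0
    assert pulse_integral(0.6, delta) == 1.0    # u(0.6) = 1

# At x = 0 the pulse construction gives 1/2 for every delta,
# so u(0) = 1 must be adopted as a convention rather than a limit.
assert pulse_integral(0.0, 0.5) == 0.5
```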
If we replace $p_\Delta(t)$ in the above derivation with $g(t)p_\Delta(t)$, we obtain the "sifting" property of the delta function:

$$g(0) = \int_{-\infty}^{\infty} g(t)\,\delta(t)\,dt \quad \text{and} \quad g(x_0) = \int_{-\infty}^{\infty} g(t)\,\delta(t - x_0)\,dt. \qquad (4.22)$$

The delta function is viewed as sifting through x and picking out the value of g at the point where the delta function is centered, that is, $g(x_0)$ for the expression on the right.

The pdf for the discrete random variable discussed in Example 4.1 is shown in Fig. 4.1(b). The pdf of a random variable of mixed type will also contain delta functions at the points where its cdf is not continuous. The pdf for the random variable discussed in Example 4.3 is shown in Fig. 4.3(b).

Example 4.9

Let X be the number of heads in three coin tosses as in Example 4.1. Find the pdf of X. Find $P[1 < X \le 2]$ and $P[2 \le X < 3]$ by integrating the pdf.

In Example 4.1 we found that the cdf of X is given by

$$F_X(x) = \tfrac{1}{8}u(x) + \tfrac{3}{8}u(x-1) + \tfrac{3}{8}u(x-2) + \tfrac{1}{8}u(x-3).$$

It then follows from Eqs. (4.18) and (4.19) that

$$f_X(x) = \tfrac{1}{8}\delta(x) + \tfrac{3}{8}\delta(x-1) + \tfrac{3}{8}\delta(x-2) + \tfrac{1}{8}\delta(x-3).$$

When delta functions appear in the limits of integration, we must indicate whether the delta functions are to be included in the integration. Thus in $P[1 < X \le 2] = P[X \in (1, 2]]$, the delta function located at 1 is excluded from the integral and the delta function at 2 is included:

$$P[1 < X \le 2] = \int_{1^+}^{2^+} f_X(x)\,dx = \frac{3}{8}.$$

Similarly, we have that

$$P[2 \le X < 3] = \int_{2^-}^{3^-} f_X(x)\,dx = \frac{3}{8}.$$

4.2.2 Conditional cdf's and pdf's

Conditional cdf's can be defined in a straightforward manner using the same approach we used for conditional pmf's. Suppose that event C is given and that $P[C] > 0$. The conditional cdf of X given C is defined by

$$F_X(x \mid C) = \frac{P[\{X \le x\} \cap C]}{P[C]} \quad \text{if } P[C] > 0. \qquad (4.23)$$

It is easy to show that $F_X(x \mid C)$ satisfies all the properties of a cdf. (See Problem 4.29.) The conditional pdf of X given C is then defined by

$$f_X(x \mid C) = \frac{d}{dx}F_X(x \mid C). \qquad (4.24)$$
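The delta-function bookkeeping in Example 4.9 amounts to deciding which pmf masses fall inside the interval of integration. A small Python sketch makes the accounting explicit:

```python
from math import comb

# Example 4.9: X = number of heads in three fair coin tosses.
# The pmf masses play the role of the delta-function weights in Eq. (4.20).
pmf = {k: comb(3, k) * 0.5 ** 3 for k in range(4)}

# P[1 < X <= 2]: the delta at 1 is excluded, the delta at 2 is included.
p_1_to_2 = sum(p for x, p in pmf.items() if 1 < x <= 2)

# P[2 <= X < 3]: the delta at 2 is included, the delta at 3 is excluded.
p_2_to_3 = sum(p for x, p in pmf.items() if 2 <= x < 3)
```

Both probabilities equal 3/8, as found by integrating the pdf.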
Example 4.10

The lifetime X of a machine has a continuous cdf $F_X(x)$. Find the conditional cdf and pdf given the event $C = \{X > t\}$ (i.e., "machine is still working at time t").

The conditional cdf is

$$F_X(x \mid X > t) = P[X \le x \mid X > t] = \frac{P[\{X \le x\} \cap \{X > t\}]}{P[X > t]}.$$

The intersection of the two events in the numerator is equal to the empty set when $x < t$ and to $\{t < X \le x\}$ when $x \ge t$. Thus

$$F_X(x \mid X > t) = \begin{cases} 0 & x \le t \\ \dfrac{F_X(x) - F_X(t)}{1 - F_X(t)} & x > t. \end{cases}$$

The conditional pdf is found by differentiating with respect to x:

$$f_X(x \mid X > t) = \frac{f_X(x)}{1 - F_X(t)} \quad x \ge t.$$

Now suppose that we have a partition of the sample space S into the union of disjoint events $B_1, B_2, \dots, B_n$. Let $F_X(x \mid B_i)$ be the conditional cdf of X given event $B_i$. The theorem on total probability allows us to find the cdf of X in terms of the conditional cdf's:

$$F_X(x) = P[X \le x] = \sum_{i=1}^{n} P[X \le x \mid B_i]\,P[B_i] = \sum_{i=1}^{n} F_X(x \mid B_i)\,P[B_i]. \qquad (4.25)$$

The pdf is obtained by differentiation:

$$f_X(x) = \frac{d}{dx}F_X(x) = \sum_{i=1}^{n} f_X(x \mid B_i)\,P[B_i]. \qquad (4.26)$$

Example 4.11

A binary transmission system sends a "0" bit by transmitting a $-v$ voltage signal, and a "1" bit by transmitting $+v$. The received signal is corrupted by Gaussian noise and given by:

$$Y = X + N,$$

where X is the transmitted signal, and N is a noise voltage with pdf $f_N(x)$. Assume that $P[\text{"1"}] = p = 1 - P[\text{"0"}]$. Find the pdf of Y.

Let $B_0$ be the event that "0" is transmitted and $B_1$ the event that "1" is transmitted; then $B_0, B_1$ form a partition, and

$$F_Y(x) = F_Y(x \mid B_0)P[B_0] + F_Y(x \mid B_1)P[B_1] = P[Y \le x \mid X = -v](1-p) + P[Y \le x \mid X = v]\,p.$$

Since $Y = X + N$, the event $\{Y \le x\}$ given $X = v$ is equivalent to $\{v + N \le x\}$ and $\{N \le x - v\}$, and the event $\{Y \le x\}$ given $X = -v$ is equivalent to $\{N \le x + v\}$. Therefore the conditional cdf's are:

$$F_Y(x \mid B_0) = P[N \le x + v] = F_N(x + v)$$

and

$$F_Y(x \mid B_1) = P[N \le x - v] = F_N(x - v).$$

The cdf is:

$$F_Y(x) = F_N(x + v)(1 - p) + F_N(x - v)\,p.$$
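The cdf decomposition just obtained can be spot-checked by simulation. The sketch below (the values of v, sigma, and p are arbitrary illustrative choices) estimates $P[Y \le 0]$ empirically and compares it with $F_Y(0) = F_N(v)(1-p) + F_N(-v)p$:

```python
import math, random

# Monte Carlo sketch of Example 4.11; v, sigma, p are illustrative values.
random.seed(1)
v, sigma, p = 1.0, 2.0, 0.3
n = 200_000

def gaussian_cdf(x):
    # F_N(x) for zero-mean Gaussian noise with standard deviation sigma
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

count = 0
for _ in range(n):
    x = v if random.random() < p else -v      # transmitted signal
    y = x + random.gauss(0.0, sigma)          # received signal Y = X + N
    if y <= 0:
        count += 1

empirical = count / n
predicted = gaussian_cdf(v) * (1 - p) + gaussian_cdf(-v) * p   # F_Y(0)
```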
The pdf of Y is then:

$$f_Y(x) = \frac{d}{dx}F_Y(x) = \frac{d}{dx}F_N(x+v)(1-p) + \frac{d}{dx}F_N(x-v)\,p = f_N(x+v)(1-p) + f_N(x-v)\,p.$$

The Gaussian random variable has pdf:

$$f_N(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-x^2/2\sigma^2} \quad -\infty < x < \infty.$$

The conditional pdfs are:

$$f_Y(x \mid B_0) = f_N(x + v) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-(x+v)^2/2\sigma^2}$$

and

$$f_Y(x \mid B_1) = f_N(x - v) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-(x-v)^2/2\sigma^2}.$$

The pdf of the received signal Y is then:

$$f_Y(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-(x+v)^2/2\sigma^2}(1-p) + \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-(x-v)^2/2\sigma^2}\,p.$$

FIGURE 4.5 The conditional pdfs given the input signal.

Figure 4.5 shows the two conditional pdfs. We can see that the transmitted signal X shifts the center of mass of the Gaussian pdf.

4.3 THE EXPECTED VALUE OF X

We discussed the expected value for discrete random variables in Section 3.3, and found that the sample mean of independent observations of a random variable approaches E[X]. Suppose we perform a series of such experiments for continuous random variables. Since continuous random variables have $P[X = x] = 0$ for any specific value of x, we divide the real line into small intervals and count the number of times $N_k(n)$ the observations fall in the interval $\{x_k < X < x_k + \Delta\}$. As n becomes large, the relative frequency $f_k(n) = N_k(n)/n$ will approach $f_X(x_k)\Delta$, the probability of the interval. We calculate the sample mean in terms of the relative frequencies and let $n \to \infty$:

$$\langle X \rangle_n = \sum_k x_k f_k(n) \;\to\; \sum_k x_k f_X(x_k)\Delta.$$

The expression on the right-hand side approaches an integral as we decrease $\Delta$. The expected value or mean of a random variable X is defined by

$$E[X] = \int_{-\infty}^{+\infty} t\,f_X(t)\,dt. \qquad (4.27)$$

The expected value E[X] is defined if the above integral converges absolutely, that is,

$$E[|X|] = \int_{-\infty}^{+\infty} |t|\,f_X(t)\,dt < \infty.$$

If we view $f_X(x)$ as the distribution of mass on the real line, then E[X] represents the center of mass of this distribution.

We already discussed E[X] for discrete random variables in detail, but it is worth noting that the definition in Eq. (4.27) is applicable if we express the pdf of a discrete random variable using delta functions:
$$E[X] = \int_{-\infty}^{+\infty} t \sum_k p_X(x_k)\,\delta(t - x_k)\,dt = \sum_k p_X(x_k)\int_{-\infty}^{+\infty} t\,\delta(t - x_k)\,dt = \sum_k p_X(x_k)\,x_k.$$

Example 4.12 Mean of a Uniform Random Variable

The mean for a uniform random variable is given by

$$E[X] = (b - a)^{-1}\int_a^b t\,dt = \frac{a + b}{2},$$

which is exactly the midpoint of the interval [a, b]. The results shown in Fig. 3.6 were obtained by repeating experiments in which outcomes were random variables Y and X that had uniform cdf's in the intervals [-1, 1] and [3, 7], respectively. The respective expected values, 0 and 5, correspond to the values about which Y and X tend to vary.

The result in Example 4.12 could have been found immediately by noting that E[X] = m when the pdf is symmetric about a point m. That is, if $f_X(m - x) = f_X(m + x)$ for all x, then, assuming that the mean exists,

$$0 = \int_{-\infty}^{+\infty} (m - t)f_X(t)\,dt = m - \int_{-\infty}^{+\infty} t\,f_X(t)\,dt.$$

The first equality above follows from the symmetry of $f_X(t)$ about t = m and the odd symmetry of (m - t) about the same point. We then have that E[X] = m.

Example 4.13 Mean of a Gaussian Random Variable

The pdf of a Gaussian random variable is symmetric about the point x = m. Therefore E[X] = m.

The following expressions are useful when X is a nonnegative random variable:

$$E[X] = \int_0^{\infty} (1 - F_X(t))\,dt \quad \text{if X is continuous and nonnegative} \qquad (4.28)$$

and

$$E[X] = \sum_{k=0}^{\infty} P[X > k] \quad \text{if X is nonnegative and integer-valued.} \qquad (4.29)$$

The derivation of these formulas is discussed in Problem 4.47.

Example 4.14 Mean of Exponential Random Variable

The time X between customer arrivals at a service station has an exponential distribution. Find the mean interarrival time.

Substituting Eq. (4.16b) into Eq. (4.27) we obtain

$$E[X] = \int_0^{\infty} t\,\lambda e^{-\lambda t}\,dt.$$
We evaluate the integral using integration by parts ($\int u\,dv = uv - \int v\,du$), with u = t and $dv = \lambda e^{-\lambda t}\,dt$:

$$E[X] = \left.-te^{-\lambda t}\right|_0^{\infty} + \int_0^{\infty} e^{-\lambda t}\,dt = \lim_{t\to\infty}\left(-te^{-\lambda t}\right) - 0 + \left.\frac{-e^{-\lambda t}}{\lambda}\right|_0^{\infty} = \lim_{t\to\infty}\frac{-e^{-\lambda t}}{\lambda} + \frac{1}{\lambda} = \frac{1}{\lambda},$$

where we have used the fact that $e^{-\lambda t}$ and $te^{-\lambda t}$ go to zero as t approaches infinity.

For this example, Eq. (4.28) is much easier to evaluate:

$$E[X] = \int_0^{\infty} e^{-\lambda t}\,dt = \frac{1}{\lambda}.$$

Recall that $\lambda$ is the customer arrival rate in customers per second. The result that the mean interarrival time $E[X] = 1/\lambda$ seconds per customer then makes sense intuitively.

4.3.1 The Expected Value of Y = g(X)

Suppose that we are interested in finding the expected value of Y = g(X). As in the case of discrete random variables (Eq. (3.16)), E[Y] can be found directly in terms of the pdf of X:

$$E[Y] = \int_{-\infty}^{\infty} g(x)\,f_X(x)\,dx. \qquad (4.30)$$

To see how Eq. (4.30) comes about, suppose that we divide the y-axis into intervals of length h, index the intervals with the index k, and let $y_k$ be the value in the center of the kth interval. The expected value of Y is approximated by the following sum:

$$E[Y] \approx \sum_k y_k f_Y(y_k)\,h.$$

Suppose that g(x) is strictly increasing; then the kth interval in the y-axis has a unique corresponding equivalent event of width $h_k$ in the x-axis as shown in Fig. 4.6. Let $x_k$ be the value in the kth interval such that $g(x_k) = y_k$; then, since $f_Y(y_k)h = f_X(x_k)h_k$,

$$E[Y] \approx \sum_k g(x_k)\,f_X(x_k)\,h_k.$$

By letting h approach zero, we obtain Eq. (4.30). This equation is valid even if g(x) is not strictly increasing.

FIGURE 4.6 Two infinitesimal equivalent events.

Example 4.15 Expected Values of a Sinusoid with Random Phase

Let $Y = a\cos(\omega t + \Theta)$ where a, $\omega$, and t are constants, and $\Theta$ is a uniform random variable in the interval $(0, 2\pi)$. The random variable Y results from sampling the amplitude of a sinusoid with random phase $\Theta$. Find the expected value of Y and the expected value of the power of Y, $Y^2$.
$$E[Y] = E[a\cos(\omega t + \Theta)] = \int_0^{2\pi} a\cos(\omega t + \theta)\,\frac{d\theta}{2\pi} = \left.\frac{-a\sin(\omega t + \theta)}{2\pi}\right|_0^{2\pi} = \frac{-a\sin(\omega t + 2\pi) + a\sin(\omega t)}{2\pi} = 0.$$

The average power is

$$E[Y^2] = E[a^2\cos^2(\omega t + \Theta)] = E\!\left[\frac{a^2}{2} + \frac{a^2}{2}\cos(2\omega t + 2\Theta)\right] = \frac{a^2}{2} + \int_0^{2\pi} \frac{a^2}{2}\cos(2\omega t + 2\theta)\,\frac{d\theta}{2\pi} = \frac{a^2}{2}.$$

Note that these answers are in agreement with the time averages of sinusoids: the time average ("dc" value) of the sinusoid is zero; the time-average power is $a^2/2$.

Example 4.16 Expected Value of the Indicator Function

Let $g(X) = I_C(X)$ be the indicator function for the event $\{X \in C\}$, where C is some interval or union of intervals in the real line:

$$g(X) = \begin{cases} 0 & X \notin C \\ 1 & X \in C, \end{cases}$$

then

$$E[Y] = \int_{-\infty}^{+\infty} g(x)\,f_X(x)\,dx = \int_C f_X(x)\,dx = P[X \in C].$$

Thus the expected value of the indicator of an event is equal to the probability of the event.

It is easy to show that Eqs. (3.17a)-(3.17e) hold for continuous random variables using Eq. (4.30). For example, let c be some constant; then

$$E[c] = \int_{-\infty}^{\infty} c\,f_X(x)\,dx = c\int_{-\infty}^{\infty} f_X(x)\,dx = c \qquad (4.31)$$

and

$$E[cX] = \int_{-\infty}^{\infty} cx\,f_X(x)\,dx = c\int_{-\infty}^{\infty} x\,f_X(x)\,dx = cE[X]. \qquad (4.32)$$

The expected value of a sum of functions of a random variable is equal to the sum of the expected values of the individual functions:

$$E[Y] = E\!\left[\sum_{k=1}^{n} g_k(X)\right] = \int_{-\infty}^{\infty} \sum_{k=1}^{n} g_k(x)\,f_X(x)\,dx = \sum_{k=1}^{n} \int_{-\infty}^{\infty} g_k(x)\,f_X(x)\,dx = \sum_{k=1}^{n} E[g_k(X)]. \qquad (4.33)$$

Example 4.17

Let $Y = g(X) = a_0 + a_1X + a_2X^2 + \cdots + a_nX^n$, where the $a_k$ are constants; then

$$E[Y] = E[a_0] + E[a_1X] + \cdots + E[a_nX^n] = a_0 + a_1E[X] + a_2E[X^2] + \cdots + a_nE[X^n],$$

where we have used Eq. (4.33), and Eqs. (4.31) and (4.32). A special case of this result is that E[X + c] = E[X] + c, that is, we can shift the mean of a random variable by adding a constant to it.

4.3.2 Variance of X

The variance of the random variable X is defined by

$$\text{VAR}[X] = E[(X - E[X])^2] = E[X^2] - E[X]^2. \qquad (4.34)$$

The standard deviation of the random variable X is defined by

$$\text{STD}[X] = \text{VAR}[X]^{1/2}. \qquad (4.35)$$
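Returning to Example 4.15, both expected values can be verified by simulation. In the sketch below, the amplitude a, frequency $\omega$, and sampling time t are arbitrary illustrative choices:

```python
import math, random

# Monte Carlo check of Example 4.15: Y = a*cos(w*t + Theta), Theta ~ Uniform(0, 2*pi).
random.seed(3)
a, w, t = 2.0, 5.0, 0.4
n = 500_000
sum_y = sum_y2 = 0.0
for _ in range(n):
    theta = random.uniform(0.0, 2.0 * math.pi)
    y = a * math.cos(w * t + theta)
    sum_y += y
    sum_y2 += y * y

mean_y = sum_y / n      # should be near E[Y] = 0
power_y = sum_y2 / n    # should be near E[Y^2] = a*a/2
```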
Example 4.18 Variance of Uniform Random Variable

Find the variance of the random variable X that is uniformly distributed in the interval [a, b].

Since the mean of X is (a + b)/2,

$$\text{VAR}[X] = \frac{1}{b-a}\int_a^b \left(x - \frac{a+b}{2}\right)^2 dx.$$

Let $y = x - (a+b)/2$:

$$\text{VAR}[X] = \frac{1}{b-a}\int_{-(b-a)/2}^{(b-a)/2} y^2\,dy = \frac{(b-a)^2}{12}.$$

The random variables in Fig. 3.6 were uniformly distributed in the intervals [-1, 1] and [3, 7], respectively. Their variances are then 1/3 and 4/3. The corresponding standard deviations are 0.577 and 1.155.

Example 4.19 Variance of Gaussian Random Variable

Find the variance of a Gaussian random variable.

First multiply the integral of the pdf of X by $\sqrt{2\pi}\,\sigma$ to obtain

$$\int_{-\infty}^{\infty} e^{-(x-m)^2/2\sigma^2}\,dx = \sqrt{2\pi}\,\sigma.$$

Differentiate both sides with respect to $\sigma$:

$$\int_{-\infty}^{\infty} \frac{(x-m)^2}{\sigma^3}\,e^{-(x-m)^2/2\sigma^2}\,dx = \sqrt{2\pi}.$$

By rearranging the above equation, we obtain

$$\text{VAR}[X] = \frac{1}{\sqrt{2\pi}\,\sigma}\int_{-\infty}^{\infty} (x-m)^2\,e^{-(x-m)^2/2\sigma^2}\,dx = \sigma^2.$$

This result can also be obtained by direct integration. (See Problem 4.46.) Figure 4.7 shows the Gaussian pdf for several values of $\sigma$; it is evident that the "width" of the pdf increases with $\sigma$.

FIGURE 4.7 Probability density function of Gaussian random variable, shown for $\sigma = 1$ and $\sigma = 1/2$.

The following properties were derived in Section 3.3:

$$\text{VAR}[c] = 0 \qquad (4.36)$$
$$\text{VAR}[X + c] = \text{VAR}[X] \qquad (4.37)$$
$$\text{VAR}[cX] = c^2\,\text{VAR}[X], \qquad (4.38)$$

where c is a constant.

The mean and variance are the two most important parameters used in summarizing the pdf of a random variable. Other parameters are occasionally used. For example, the skewness, defined by $E[(X - E[X])^3]/\text{STD}[X]^3$, measures the degree of asymmetry about the mean. It is easy to show that if a pdf is symmetric about its mean, then its skewness is zero. The point to note with these parameters of the pdf is that each involves the expected value of a higher power of X.
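The closed forms in Examples 4.12 and 4.18 are easy to confirm numerically. The midpoint-rule sketch below uses the interval [3, 7] from Fig. 3.6:

```python
# Numerical check of E[X] = (a+b)/2 and VAR[X] = (b-a)^2/12 for X uniform on [a, b].
a, b = 3.0, 7.0
n = 100_000
h = (b - a) / n

# Midpoint-rule approximations of the first and second moments.
mean = sum((a + (k + 0.5) * h) * h / (b - a) for k in range(n))
second = sum((a + (k + 0.5) * h) ** 2 * h / (b - a) for k in range(n))
var = second - mean ** 2   # should be 16/12 = 4/3
```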
Indeed we show in a later section that, under certain conditions, a pdf is completely specified if the expected values of all the powers of X are known. These expected values are called the moments of X. The nth moment of the random variable X is defined by

$$E[X^n] = \int_{-\infty}^{\infty} x^n f_X(x)\,dx. \qquad (4.39)$$

The mean and variance can be seen to be defined in terms of the first two moments, E[X] and E[X^2].

*Example 4.20 Analog-to-Digital Conversion: A Detailed Example

A quantizer is used to convert an analog signal (e.g., speech or audio) into digital form. A quantizer maps a random voltage X into the nearest point q(X) from a set of $2^R$ representation values as shown in Fig. 4.8(a). The value X is then approximated by q(X), which is identified by an R-bit binary number. In this manner, an "analog" voltage X that can assume a continuum of values is converted into an R-bit number.

The quantizer introduces an error $Z = X - q(X)$ as shown in Fig. 4.8(b). Note that Z is a function of X and that it ranges in value between -d/2 and d/2, where d is the quantizer step size. Suppose that X has a uniform distribution in the interval $[-x_{max}, x_{max}]$, that the quantizer has $2^R$ levels, and that $2x_{max} = 2^R d$. It is easy to show that Z is uniformly distributed in the interval [-d/2, d/2] (see Problem 4.93).

FIGURE 4.8 (a) A uniform quantizer maps the input x into the closest point from the set $\{\pm d/2, \pm 3d/2, \pm 5d/2, \pm 7d/2\}$. (b) The uniform quantizer error for the input x is $x - q(x)$.

Therefore from Example 4.12,

$$E[Z] = \frac{-d/2 + d/2}{2} = 0.$$

The error Z thus has mean zero. By Example 4.18,

$$\text{VAR}[Z] = \frac{(d/2 - (-d/2))^2}{12} = \frac{d^2}{12}.$$

This result is approximately correct for any pdf that is approximately flat over each quantizer interval. This is the case when $2^R$ is large.
The approximation q(X) can be viewed as a "noisy" version of X since

$$q(X) = X - Z,$$

where Z is the quantization error. The measure of goodness of a quantizer is specified by the SNR, which is defined as the ratio of the variance of the "signal" X to the variance of the distortion or "noise" Z:

$$\text{SNR} = \frac{\text{VAR}[X]}{\text{VAR}[Z]} = \frac{\text{VAR}[X]}{d^2/12} = \frac{\text{VAR}[X]}{x_{max}^2/3}\,2^{2R},$$

where we have used the fact that $d = 2x_{max}/2^R$. When X is nonuniform, the value $x_{max}$ is selected so that $P[|X| > x_{max}]$ is small. A typical choice is $x_{max} = 4\,\text{STD}[X]$. The SNR is then

$$\text{SNR} = \frac{3}{16}\,2^{2R}.$$

This important formula is often quoted in decibels:

$$\text{SNR dB} = 10\log_{10}\text{SNR} = 6R - 7.3\ \text{dB}.$$

The SNR increases by a factor of 4 (6 dB) with each additional bit used to represent X. This makes sense since each additional bit doubles the number of quantizer levels, which in turn reduces the step size by a factor of 2. The variance of the error should then be reduced by the square of this, namely $2^2 = 4$.

4.4 IMPORTANT CONTINUOUS RANDOM VARIABLES

We are always limited to measurements of finite precision, so in effect, every random variable found in practice is a discrete random variable. Nevertheless, there are several compelling reasons for using continuous random variable models. First, in general, continuous random variables are easier to handle analytically. Second, the limiting form of many discrete random variables yields continuous random variables. Finally, there are a number of "families" of continuous random variables that can be used to model a wide variety of situations by adjusting a few parameters. In this section we continue our introduction of important random variables. Table 4.1 lists some of the more important continuous random variables.
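For a uniform input, the SNR formula of Example 4.20 reduces to $\text{SNR} = 2^{2R}$, about 6.02R dB, which can be confirmed by simulating the quantizer directly (R = 8 and $x_{max} = 1$ below are illustrative choices):

```python
import math, random

# Simulation sketch of the uniform quantizer of Example 4.20.
random.seed(11)
R = 8
xmax = 1.0
d = 2 * xmax / 2 ** R              # quantizer step size

def quantize(x):
    # nearest representation point from {+-d/2, +-3d/2, ...}
    return (math.floor(x / d) + 0.5) * d

n = 200_000
signal_power = noise_power = 0.0
for _ in range(n):
    x = random.uniform(-xmax, xmax)
    z = x - quantize(x)            # quantization error, in [-d/2, d/2]
    signal_power += x * x
    noise_power += z * z

snr_db = 10.0 * math.log10(signal_power / noise_power)
# Theory for uniform input: 10*log10(2^(2R)) = 20*R*log10(2), about 48.2 dB for R = 8.
```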
4.4.1 The Uniform Random Variable

The uniform random variable arises in situations where all values in an interval of the real line are equally likely to occur. The uniform random variable U in the interval [a, b] has pdf:

$$f_U(x) = \begin{cases} \dfrac{1}{b-a} & a \le x \le b \\ 0 & x < a \text{ and } x > b \end{cases} \qquad (4.40)$$

and cdf

$$F_U(x) = \begin{cases} 0 & x < a \\ \dfrac{x-a}{b-a} & a \le x \le b \\ 1 & x > b. \end{cases} \qquad (4.41)$$

See Figure 4.2. The mean and variance of U are given by:

$$E[U] = \frac{a+b}{2} \quad \text{and} \quad \text{VAR}[U] = \frac{(b-a)^2}{12}. \qquad (4.42)$$

The uniform random variable appears in many situations that involve equally likely continuous random variables. Obviously U can only be defined over intervals that are finite in length. We will see in Section 4.9 that the uniform random variable plays a crucial role in generating random variables in computer simulation models.

4.4.2 The Exponential Random Variable

The exponential random variable arises in the modeling of the time between occurrence of events (e.g., the time between customer demands for call connections), and in the modeling of the lifetime of devices and systems. The exponential random variable X with parameter $\lambda$ has pdf

$$f_X(x) = \begin{cases} 0 & x < 0 \\ \lambda e^{-\lambda x} & x \ge 0 \end{cases} \qquad (4.43)$$

and cdf

$$F_X(x) = \begin{cases} 0 & x < 0 \\ 1 - e^{-\lambda x} & x \ge 0. \end{cases} \qquad (4.44)$$

The cdf and pdf of X are shown in Fig. 4.9. The parameter $\lambda$ is the rate at which events occur, so in Eq. (4.44) the probability of an event occurring by time x increases as the rate $\lambda$ increases.

TABLE 4.1 Continuous random variables.

Uniform Random Variable
  S_X = [a, b]
  f_X(x) = 1/(b-a) for a <= x <= b
  E[X] = (a+b)/2    VAR[X] = (b-a)^2/12    Phi_X(w) = (e^{jwb} - e^{jwa}) / (jw(b-a))

Exponential Random Variable
  S_X = [0, inf)
  f_X(x) = lambda*e^{-lambda*x} for x >= 0 and lambda > 0
  E[X] = 1/lambda    VAR[X] = 1/lambda^2    Phi_X(w) = lambda / (lambda - jw)
  Remarks: The exponential random variable is the only continuous random variable with the memoryless property.

Gaussian (Normal) Random Variable
  S_X = (-inf, +inf)
  f_X(x) = e^{-(x-m)^2/2sigma^2} / (sqrt(2*pi)*sigma) for -inf < x < +inf and sigma > 0
  E[X] = m    VAR[X] = sigma^2    Phi_X(w) = e^{jmw - sigma^2 w^2/2}
  Remarks: Under a wide range of conditions X can be used to approximate the sum of a large number of independent random variables.

Gamma Random Variable
  S_X = (0, +inf)
  f_X(x) = lambda*(lambda*x)^{alpha-1}*e^{-lambda*x} / Gamma(alpha) for x > 0 and alpha > 0, lambda > 0,
  where Gamma(z) is the gamma function (Eq. 4.56).
  E[X] = alpha/lambda    VAR[X] = alpha/lambda^2    Phi_X(w) = 1 / (1 - jw/lambda)^alpha

Special Cases of Gamma Random Variable

m-Erlang Random Variable: alpha = m, a positive integer
  f_X(x) = lambda*e^{-lambda*x}*(lambda*x)^{m-1} / (m-1)! for x > 0
  Phi_X(w) = (1 / (1 - jw/lambda))^m
  Remarks: An m-Erlang random variable is obtained by adding m independent exponentially distributed random variables with parameter lambda.

Chi-Square Random Variable with k degrees of freedom: alpha = k/2, k a positive integer, and lambda = 1/2
  f_X(x) = x^{(k-2)/2}*e^{-x/2} / (2^{k/2}*Gamma(k/2)) for x > 0
  Phi_X(w) = (1 / (1 - 2jw))^{k/2}
  Remarks: The sum of k mutually independent, squared zero-mean, unit-variance Gaussian random variables is a chi-square random variable with k degrees of freedom.

Laplacian Random Variable
  S_X = (-inf, +inf)
  f_X(x) = (alpha/2)*e^{-alpha|x|} for -inf < x < +inf and alpha > 0
  E[X] = 0    VAR[X] = 2/alpha^2    Phi_X(w) = alpha^2 / (w^2 + alpha^2)

Rayleigh Random Variable
  S_X = [0, inf)
  f_X(x) = (x/alpha^2)*e^{-x^2/2alpha^2} for x >= 0 and alpha > 0
  E[X] = alpha*sqrt(pi/2)    VAR[X] = (2 - pi/2)*alpha^2

Cauchy Random Variable
  S_X = (-inf, +inf)
  f_X(x) = (alpha/pi) / (x^2 + alpha^2) for -inf < x < +inf and alpha > 0
  Mean and variance do not exist.    Phi_X(w) = e^{-alpha|w|}

Pareto Random Variable
  S_X = [x_m, inf), x_m > 0
  f_X(x) = 0 for x < x_m;  f_X(x) = alpha*x_m^alpha / x^{alpha+1} for x >= x_m
  E[X] = alpha*x_m/(alpha - 1) for alpha > 1
  VAR[X] = alpha*x_m^2 / ((alpha-2)(alpha-1)^2) for alpha > 2
  Remarks: The Pareto random variable is the most prominent example of random variables with "long tails," and can be viewed as a continuous version of the Zipf discrete random variable.

Beta Random Variable
  f_X(x) = Gamma(alpha+beta)/(Gamma(alpha)*Gamma(beta)) * x^{alpha-1}*(1-x)^{beta-1} for 0 < x < 1 and alpha > 0, beta > 0; 0 otherwise
  E[X] = alpha/(alpha+beta)    VAR[X] = alpha*beta / ((alpha+beta)^2 * (alpha+beta+1))
  Remarks: The beta random variable is useful for modeling a variety of pdf shapes for random variables that range over finite intervals.
Recall from Example 3.31 that the interarrival times between events in a Poisson process (Fig. 3.10) are exponential random variables. The mean and variance of X are given by:

$$E[X] = \frac{1}{\lambda} \quad \text{and} \quad \text{VAR}[X] = \frac{1}{\lambda^2}. \qquad (4.45)$$

In event interarrival situations, $\lambda$ is in units of events per second and $1/\lambda$ is in units of seconds per event interarrival.

The exponential random variable satisfies the memoryless property:

$$P[X > t + h \mid X > t] = P[X > h]. \qquad (4.46)$$

The expression on the left side is the probability of having to wait at least h additional seconds given that one has already been waiting t seconds. The expression on the right side is the probability of waiting at least h seconds when one first begins to wait. Thus the probability of waiting at least an additional h seconds is the same regardless of how long one has already been waiting!

FIGURE 4.9 An example of a continuous random variable: the exponential random variable. Part (a) is the cdf and part (b) is the pdf.

We see later in the book that the memoryless property of the exponential random variable makes it the cornerstone for the theory of Markov chains, which is used extensively in evaluating the performance of computer systems and communications networks. We now prove the memoryless property:

$$P[X > t + h \mid X > t] = \frac{P[\{X > t + h\} \cap \{X > t\}]}{P[X > t]} = \frac{P[X > t + h]}{P[X > t]} = \frac{e^{-\lambda(t+h)}}{e^{-\lambda t}} = e^{-\lambda h} = P[X > h] \quad \text{for } h > 0.$$

It can be shown that the exponential random variable is the only continuous random variable that satisfies the memoryless property. Examples 2.13, 2.28, and 2.30 dealt with the exponential random variable.

4.4.3 The Gaussian (Normal) Random Variable

There are many situations in man-made and in natural phenomena where one deals with a random variable X that consists of the sum of a large number of "small" random variables.
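The memoryless property (4.46) is easy to observe empirically. In the sketch below ($\lambda$, t, and h are arbitrary illustrative values), exponential samples are drawn by the inverse-transform method $X = -\ln(1-U)/\lambda$:

```python
import math, random

# Empirical check of Eq. (4.46) for exponential interarrival times.
random.seed(5)
lam, t, h = 1.0, 0.7, 0.5
n = 400_000
past_t = past_t_plus_h = past_h = 0
for _ in range(n):
    x = -math.log(1.0 - random.random()) / lam   # Exp(lam) sample
    if x > t:
        past_t += 1
        if x > t + h:
            past_t_plus_h += 1
    if x > h:
        past_h += 1

conditional = past_t_plus_h / past_t   # P[X > t+h | X > t]
fresh = past_h / n                     # P[X > h]
# Both should be close to exp(-lam*h), about 0.6065 for these values.
```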
The exact description of the pdf of X in terms of the component random variables can become quite complex and unwieldy. However, one finds that under very general conditions, as the number of components becomes large, the cdf of X approaches that of the Gaussian (normal) random variable.¹ This random variable appears so often in problems involving randomness that it has come to be known as the "normal" random variable.

¹This result, called the central limit theorem, will be discussed in Chapter 7.

The pdf for the Gaussian random variable X is given by

$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-(x-m)^2/2\sigma^2} \quad -\infty < x < \infty, \qquad (4.47)$$

where m and $\sigma > 0$ are real numbers, which we showed in Examples 4.13 and 4.19 to be the mean and standard deviation of X. Figure 4.7 shows that the Gaussian pdf is a "bell-shaped" curve centered and symmetric about m and whose "width" increases with $\sigma$.

The cdf of the Gaussian random variable is given by

$$P[X \le x] = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-(x'-m)^2/2\sigma^2}\,dx'. \qquad (4.48)$$

The change of variable $t = (x' - m)/\sigma$ results in

$$F_X(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{(x-m)/\sigma} e^{-t^2/2}\,dt = \Phi\!\left(\frac{x-m}{\sigma}\right), \qquad (4.49)$$

where $\Phi(x)$ is the cdf of a Gaussian random variable with m = 0 and $\sigma = 1$:

$$\Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-t^2/2}\,dt. \qquad (4.50)$$

Therefore any probability involving an arbitrary Gaussian random variable can be expressed in terms of $\Phi(x)$.

Example 4.21

Show that the Gaussian pdf integrates to one. Consider the square of the integral of the pdf:

$$\left[\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-x^2/2}\,dx\right]^2 = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-y^2/2}\,dy\int_{-\infty}^{\infty} e^{-x^2/2}\,dx = \frac{1}{2\pi}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(x^2+y^2)/2}\,dx\,dy.$$

Let $x = r\cos\theta$ and $y = r\sin\theta$ and carry out the change from Cartesian to polar coordinates; then we obtain:

$$\frac{1}{2\pi}\int_0^{2\pi}\int_0^{\infty} e^{-r^2/2}\,r\,dr\,d\theta = \int_0^{\infty} re^{-r^2/2}\,dr = \left[-e^{-r^2/2}\right]_0^{\infty} = 1.$$

In electrical engineering it is customary to work with the Q-function, which is defined by

$$Q(x) = 1 - \Phi(x) \qquad (4.51)$$
$$= \frac{1}{\sqrt{2\pi}}\int_x^{\infty} e^{-t^2/2}\,dt. \qquad (4.52)$$

Q(x) is simply the probability of the "tail" of the pdf.
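Equation (4.49) says that any Gaussian probability reduces to an evaluation of $\Phi$. The sketch below computes $P[X \le x]$ for illustrative values m = 3, $\sigma$ = 2 both by standardization, using the error function identity $\Phi(z) = (1 + \text{erf}(z/\sqrt{2}))/2$, and by direct numerical integration of Eq. (4.48):

```python
import math

def phi(z):
    # standard normal cdf, Eq. (4.50), via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

m, sigma, x = 3.0, 2.0, 4.5          # illustrative values
p_direct = phi((x - m) / sigma)      # Eq. (4.49)

# Midpoint-rule integration of the Gaussian pdf, Eq. (4.48), as a check.
n = 200_000
lo = m - 10.0 * sigma                # the tail below lo is negligible
h = (x - lo) / n
p_numeric = sum(
    math.exp(-((lo + (k + 0.5) * h) - m) ** 2 / (2.0 * sigma ** 2)) * h
    for k in range(n)
) / (sigma * math.sqrt(2.0 * math.pi))
```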
The symmetry of the pdf implies that

$$Q(0) = 1/2 \quad \text{and} \quad Q(-x) = 1 - Q(x). \qquad (4.53)$$

The integral in Eq. (4.50) does not have a closed-form expression. Traditionally the integrals have been evaluated by looking up tables that list Q(x) or by using approximations that require numerical evaluation [Ross]. The following expression has been found to give good accuracy for Q(x) over the entire range $0 < x < \infty$:

$$Q(x) \approx \left[\frac{1}{(1-a)x + a\sqrt{x^2 + b}}\right]\frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}, \qquad (4.54)$$

where $a = 1/\pi$ and $b = 2\pi$ [Gallager]. Table 4.2 shows Q(x) and the value given by the above approximation. In some problems, we are interested in finding the value of x for which $Q(x) = 10^{-k}$. Table 4.3 gives these values for $k = 1, \dots, 10$.

The Gaussian random variable plays a very important role in communication systems, where transmission signals are corrupted by noise voltages resulting from the thermal motion of electrons. It can be shown from physical principles that these voltages will have a Gaussian pdf.

TABLE 4.2 Comparison of Q(x) and approximation given by Eq. (4.54).

 x    Q(x)      Approx.     x     Q(x)      Approx.
 0.0  5.00E-01  5.00E-01    2.7   3.47E-03  3.46E-03
 0.1  4.60E-01  4.58E-01    2.8   2.56E-03  2.55E-03
 0.2  4.21E-01  4.17E-01    2.9   1.87E-03  1.86E-03
 0.3  3.82E-01  3.78E-01    3.0   1.35E-03  1.35E-03
 0.4  3.45E-01  3.41E-01    3.1   9.68E-04  9.66E-04
 0.5  3.09E-01  3.05E-01    3.2   6.87E-04  6.86E-04
 0.6  2.74E-01  2.71E-01    3.3   4.83E-04  4.83E-04
 0.7  2.42E-01  2.39E-01    3.4   3.37E-04  3.36E-04
 0.8  2.12E-01  2.09E-01    3.5   2.33E-04  2.32E-04
 0.9  1.84E-01  1.82E-01    3.6   1.59E-04  1.59E-04
 1.0  1.59E-01  1.57E-01    3.7   1.08E-04  1.08E-04
 1.1  1.36E-01  1.34E-01    3.8   7.24E-05  7.23E-05
 1.2  1.15E-01  1.14E-01    3.9   4.81E-05  4.81E-05
 1.3  9.68E-02  9.60E-02    4.0   3.17E-05  3.16E-05
 1.4  8.08E-02  8.01E-02    4.5   3.40E-06  3.40E-06
 1.5  6.68E-02  6.63E-02    5.0   2.87E-07  2.87E-07
 1.6  5.48E-02  5.44E-02    5.5   1.90E-08  1.90E-08
 1.7  4.46E-02  4.43E-02    6.0   9.87E-10  9.86E-10
 1.8  3.59E-02  3.57E-02    6.5   4.02E-11  4.02E-11
 1.9  2.87E-02  2.86E-02    7.0   1.28E-12  1.28E-12
 2.0  2.28E-02  2.26E-02    7.5   3.19E-14  3.19E-14
 2.1  1.79E-02  1.78E-02    8.0   6.22E-16  6.22E-16
 2.2  1.39E-02  1.39E-02    8.5   9.48E-18  9.48E-18
 2.3  1.07E-02  1.07E-02    9.0   1.13E-19  1.13E-19
 2.4  8.20E-03  8.17E-03    9.5   1.05E-21  1.05E-21
 2.5  6.21E-03  6.19E-03   10.0   7.62E-24  7.62E-24
 2.6  4.66E-03  4.65E-03

Example 4.22

A communication system accepts a positive voltage V as input and outputs a voltage $Y = \alpha V + N$, where $\alpha = 10^{-2}$ and N is a Gaussian random variable with parameters m = 0 and $\sigma = 2$. Find the value of V that gives $P[Y < 0] = 10^{-6}$.

The probability P[Y < 0] is written in terms of N as follows:

$$P[Y < 0] = P[\alpha V + N < 0] = P[N < -\alpha V] = \Phi\!\left(\frac{-\alpha V}{\sigma}\right) = Q\!\left(\frac{\alpha V}{\sigma}\right) = 10^{-6}.$$

From Table 4.3 we see that the argument of the Q-function should be $\alpha V/\sigma = 4.753$. Thus $V = (4.753)\sigma/\alpha = 950.6$.
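The approximation (4.54) can be compared directly against an exact Q computed with the complementary error function, $Q(x) = \text{erfc}(x/\sqrt{2})/2$:

```python
import math

def q_exact(x):
    # Q(x) = 1 - Phi(x) = erfc(x / sqrt(2)) / 2
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def q_approx(x):
    # Eq. (4.54) with a = 1/pi, b = 2*pi
    a, b = 1.0 / math.pi, 2.0 * math.pi
    denom = (1.0 - a) * x + a * math.sqrt(x * x + b)
    return math.exp(-x * x / 2.0) / (denom * math.sqrt(2.0 * math.pi))

# Largest relative error over 0 <= x <= 10 in steps of 0.1.
worst_rel_err = max(
    abs(q_approx(k / 10.0) - q_exact(k / 10.0)) / q_exact(k / 10.0)
    for k in range(101)
)
```

The relative error stays near one percent over the whole range, consistent with Table 4.2.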
For example, it is used to model the time required to service customers in queueing systems, the lifetime of devices and systems in reliability studies, and the defect clustering behavior in VLSI chips. The pdf of the gamma random variable has two parameters, a 7 0 and l 7 0, and is given by l1lx2a - 1e -lx 0 6 x 6 q, (4.55) fX1x2 = ≠1a2 where ≠1z2 is the gamma function, which is defined by the integral ≠1z2 = L0 q xz - 1e -x dx z 7 0. (4.56) The gamma function has the following properties: 1 ≠a b = 2p, 2 ≠1z + 12 = z≠1z2 ≠1m + 12 = m! for z 7 0, and for m a nonnegative integer. The versatility of the gamma random variable is due to the richness of the gamma function ≠1z2. The pdf of the gamma random variable can assume a variety of shapes as shown in Fig. 4.10. By varying the parameters a and l it is possible to fit the gamma pdf to many types of experimental data. In addition, many random variables are special cases of the gamma random variable. The exponential random variable is obtained by letting a = 1. By letting l = 1/2 and a = k/2, where k is a positive integer, we obtain the chi-square random variable, which appears in certain statistical problems. The m-Erlang random variable is obtained when a = m, a positive integer. The m-Erlang random variable is used in the system reliability models and in queueing systems models. Both of these random variables are discussed in later examples. Section 4.4 fX (x) 1.5 1.4 1.3 1.2 1.1 1 .9 .8 .7 .6 .5 .4 .3 .2 .1 0 Important Continuous Random Variables 171 l⫽1 1 a⫽ 2 a⫽1 a⫽2 0 1 2 x 3 4 FIGURE 4.10 Probability density function of gamma random variable. Example 4.23 Show that the pdf of a gamma random variable integrates to one. The integral of the pdf is L0 q fX1x2 dx = L0 q l1lx2a - 1e -lx ≠1a2 dx q = la xa - 1e -lx dx. ≠1a2 L0 Let y = lx, then dx = dy/l and the integral becomes q la ya - 1e -y dy = 1, ≠1a2la L0 where we used the fact that the integral equals ≠1a2. 
In general, the cdf of the gamma random variable does not have a closed-form expression. We will show that the special case of the m-Erlang random variable does have a closed-form expression for the cdf by using its close interrelation with the exponential and Poisson random variables. The cdf can also be obtained by integration of the pdf (see Problem 4.74).

Consider once again the limiting procedure that was used to derive the Poisson random variable. Suppose that we observe the time $S_m$ that elapses until the occurrence of the mth event. The times $X_1, X_2, \dots, X_m$ between events are exponential random variables, so we must have

$$S_m = X_1 + X_2 + \cdots + X_m.$$

We will show that $S_m$ is an m-Erlang random variable. To find the cdf of $S_m$, let N(t) be the Poisson random variable for the number of events in t seconds. Note that the mth event occurs before time t, that is, $S_m \le t$, if and only if m or more events occur in t seconds, namely $N(t) \ge m$. The reasoning goes as follows. If the mth event has occurred before time t, then it follows that m or more events will occur in time t. On the other hand, if m or more events occur in time t, then it follows that the mth event occurred by time t. Thus

$$F_{S_m}(t) = P[S_m \le t] = P[N(t) \ge m] \qquad (4.57)$$
$$= 1 - \sum_{k=0}^{m-1}\frac{(\lambda t)^k}{k!}\,e^{-\lambda t}, \qquad (4.58)$$

where we have used the result of Example 3.31. If we take the derivative of the above cdf, we finally obtain the pdf of the m-Erlang random variable. Thus we have shown that $S_m$ is an m-Erlang random variable.

Example 4.24

A factory has two spares of a critical system component that has an average lifetime of $1/\lambda = 1$ month. Find the probability that the three components (the operating one and the two spares) will last more than 6 months. Assume the component lifetimes are exponential random variables.

The remaining lifetime of the component in service is an exponential random variable with rate $\lambda$ by the memoryless property.
Thus the total lifetime X of the three components is the sum of three exponential random variables with parameter $\lambda = 1$, so X has a 3-Erlang distribution with $\lambda = 1$. From Eq. (4.58), the probability that X is greater than 6 is

$$P[X > 6] = 1 - P[X \le 6] = \sum_{k=0}^{2} \frac{6^k}{k!}\,e^{-6} = .06197.$$

4.4.5 The Beta Random Variable

The beta random variable X assumes values over a closed interval and has pdf

$$f_X(x) = c\,x^{a-1}(1-x)^{b-1} \qquad \text{for } 0 < x < 1, \tag{4.59}$$

where the normalization constant is the reciprocal of the beta function,

$$\frac{1}{c} = B(a, b) = \int_0^1 x^{a-1}(1-x)^{b-1}\,dx,$$

and where the beta function is related to the gamma function by the following expression:

$$B(a, b) = \frac{\Gamma(a)\,\Gamma(b)}{\Gamma(a+b)}.$$

When a = b = 1, we have the uniform random variable. Other choices of a and b give pdfs over finite intervals that can differ markedly from the uniform. See Problem 4.75. If a = b > 1, then the pdf is symmetric about x = 1/2 and is concentrated about x = 1/2 as well. When a = b < 1, the pdf is symmetric but the density is concentrated at the edges of the interval. When a < b (or a > b) the pdf is skewed to the right (or left). The mean and variance are given by:

$$E[X] = \frac{a}{a+b} \quad \text{and} \quad \mathrm{VAR}[X] = \frac{ab}{(a+b)^2(a+b+1)}. \tag{4.60}$$

The versatility of the pdf of the beta random variable makes it useful for modeling a variety of behaviors of random variables that range over finite intervals. For example, in a Bernoulli trial experiment, the probability of success p could itself be a random variable. The beta pdf is frequently used to model p.

4.4.6 The Cauchy Random Variable

The Cauchy random variable X assumes values over the entire real line and has pdf

$$f_X(x) = \frac{1/\pi}{1 + x^2}. \tag{4.61}$$

It is easy to verify that this pdf integrates to 1. However, X does not have any moments, since the associated integrals do not converge. The Cauchy random variable arises as the tangent of a uniform random variable in the unit interval.
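The closing claim, that the tangent of a uniform random variable yields a Cauchy variable, is easy to check by simulation. A Python sketch (illustrative; the sample size and seed are arbitrary choices, not from the text):

```python
import math
import random

random.seed(2)

# If U is uniform on (0, 1), then X = tan(pi (U - 1/2)) has the Cauchy pdf of Eq. (4.61);
# the corresponding cdf is F_X(x) = 1/2 + arctan(x)/pi.
n = 100_000
samples = [math.tan(math.pi * (random.random() - 0.5)) for _ in range(n)]

def cauchy_cdf(x):
    return 0.5 + math.atan(x) / math.pi

# the empirical fraction of samples at or below x should track the analytic cdf;
# at x = 1 the analytic value is 1/2 + (pi/4)/pi = 0.75
empirical_at_1 = sum(s <= 1.0 for s in samples) / n
```

Because X has no moments, running averages of such samples never settle down, in line with the remark that the associated integrals diverge.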
4.4.7 The Pareto Random Variable

The Pareto random variable arises in the study of the distribution of wealth, where it has been found to model the tendency for a small portion of the population to own a large portion of the wealth. More recently, the Pareto distribution has been found to capture the behavior of many quantities of interest in the study of Internet behavior, e.g., sizes of files, packet delays, audio and video title preferences, session times in peer-to-peer networks, etc. The Pareto random variable can be viewed as a continuous version of the Zipf discrete random variable.

The Pareto random variable X takes on values in the range $x > x_m$, where $x_m$ is a positive real number. X has complementary cdf with shape parameter $\alpha > 0$ given by:

$$P[X > x] = \begin{cases} 1 & x < x_m \\[4pt] \dfrac{x_m^\alpha}{x^\alpha} & x \ge x_m. \end{cases} \tag{4.62}$$

The tail of X decays algebraically with x, which is much slower than the exponential decay of the exponential and Gaussian random variables. The Pareto random variable is the most prominent example of random variables with "long tails." The cdf and pdf of X are:

$$F_X(x) = \begin{cases} 0 & x < x_m \\[4pt] 1 - \dfrac{x_m^\alpha}{x^\alpha} & x \ge x_m. \end{cases} \tag{4.63}$$

Because of its long tail, the cdf of X approaches 1 rather slowly as x increases.

$$f_X(x) = \begin{cases} 0 & x < x_m \\[4pt] \dfrac{\alpha x_m^\alpha}{x^{\alpha+1}} & x \ge x_m. \end{cases} \tag{4.64}$$

Example 4.25 Mean and Variance of Pareto Random Variable
Find the mean and variance of the Pareto random variable.

$$E[X] = \int_{x_m}^\infty t\,\frac{\alpha x_m^\alpha}{t^{\alpha+1}}\,dt = \int_{x_m}^\infty \frac{\alpha x_m^\alpha}{t^{\alpha}}\,dt = \frac{\alpha x_m}{\alpha - 1} \qquad \text{for } \alpha > 1, \tag{4.65}$$

where the integral is defined for $\alpha > 1$, and

$$E[X^2] = \int_{x_m}^\infty t^2\,\frac{\alpha x_m^\alpha}{t^{\alpha+1}}\,dt = \int_{x_m}^\infty \frac{\alpha x_m^\alpha}{t^{\alpha-1}}\,dt = \frac{\alpha x_m^2}{\alpha - 2} \qquad \text{for } \alpha > 2,$$

where the second moment is defined for $\alpha > 2$. The variance of X is then:

$$\mathrm{VAR}[X] = \frac{\alpha x_m^2}{\alpha - 2} - \left(\frac{\alpha x_m}{\alpha - 1}\right)^2 = \frac{\alpha x_m^2}{(\alpha - 2)(\alpha - 1)^2} \qquad \text{for } \alpha > 2. \tag{4.66}$$

4.5 FUNCTIONS OF A RANDOM VARIABLE

Let X be a random variable and let g(x) be a real-valued function defined on the real line.
Define Y = g(X); that is, Y is determined by evaluating the function g(x) at the value assumed by the random variable X. Then Y is also a random variable. The probabilities with which Y takes on various values depend on the function g(x) as well as on the cumulative distribution function of X. In this section we consider the problem of finding the cdf and pdf of Y.

Example 4.26
Let the function $h(x) = (x)^+$ be defined as follows:

$$(x)^+ = \begin{cases} 0 & \text{if } x < 0 \\ x & \text{if } x \ge 0. \end{cases}$$

For example, let X be the number of active speakers in a group of N speakers, and let Y be the number of active speakers in excess of M; then $Y = (X - M)^+$. In another example, let X be a voltage input to a halfwave rectifier; then $Y = (X)^+$ is the output.

Example 4.27
Let the function q(x) be defined as shown in Fig. 4.8(a), where the points on the real line are mapped into the nearest representation point from the set $S_Y = \{-3.5d, -2.5d, -1.5d, -0.5d, 0.5d, 1.5d, 2.5d, 3.5d\}$. Thus, for example, all the points in the interval (0, d) are mapped into the point d/2. The function q(x) represents an eight-level uniform quantizer.

Example 4.28
Consider the linear function c(x) = ax + b, where a and b are constants. This function arises in many situations. For example, c(x) could be the cost associated with the quantity x, with the constant a being the cost per unit of x and b being a fixed cost component. In a signal processing context, c(x) = ax could be the amplified version (if a > 1) or attenuated version (if a < 1) of the voltage x.

The probability of an event C involving Y is equal to the probability of the equivalent event B of values of X such that g(X) is in C:

$$P[Y \in C] = P[g(X) \in C] = P[X \in B].$$
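The equivalent-event idea can be seen numerically with the half-wave rectifier of Example 4.26. In the sketch below (a hypothetical setup, with a standard Gaussian input chosen purely for illustration), the event {Y = 0} is equivalent to {X ≤ 0}, so the two relative frequencies agree exactly:

```python
import random

random.seed(3)

n = 100_000
xs = [random.gauss(0.0, 1.0) for _ in range(n)]   # input voltage samples
ys = [max(x, 0.0) for x in xs]                    # half-wave rectifier Y = (X)+

# {Y = 0} and {X <= 0} are equivalent events, so these two fractions are identical
p_y_zero = sum(y == 0.0 for y in ys) / n
p_x_nonpos = sum(x <= 0.0 for x in xs) / n
```

For a zero-mean Gaussian input both fractions also approach 1/2, which is the probability mass that the rectifier piles up into the single point Y = 0.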
Three types of equivalent events are useful in determining the cdf and pdf of Y = g(X): (1) the event {g(X) = $y_k$} is used to determine the magnitude of the jump at a point $y_k$ where the cdf of Y is known to have a discontinuity; (2) the event {g(X) ≤ y} is used to find the cdf of Y directly; and (3) the event {y < g(X) ≤ y + h} is useful in determining the pdf of Y. We will demonstrate the use of these three methods in a series of examples.

The next two examples demonstrate how the pmf is computed in cases where Y = g(X) is discrete. In the first example, X is discrete. In the second example, X is continuous.

Example 4.29
Let X be the number of active speakers in a group of N independent speakers, and let p be the probability that a speaker is active. In Example 2.39 it was shown that X has a binomial distribution with parameters N and p. Suppose that a voice transmission system can transmit up to M voice signals at a time, and that when X exceeds M, X − M randomly selected signals are discarded. Let Y be the number of signals discarded; then

$$Y = (X - M)^+.$$

Y takes on values from the set $S_Y = \{0, 1, \ldots, N - M\}$. Y will equal zero whenever X is less than or equal to M, and Y will equal k > 0 when X is equal to M + k. Therefore

$$P[Y = 0] = P[X \in \{0, 1, \ldots, M\}] = \sum_{j=0}^{M} p_j$$

and

$$P[Y = k] = P[X = M + k] = p_{M+k} \qquad 0 < k \le N - M,$$

where $p_j$ is the pmf of X.

Example 4.30
Let X be a sample voltage of a speech waveform, and suppose that X has a uniform distribution in the interval [−4d, 4d]. Let Y = q(X), where the quantizer input-output characteristic is that of the eight-level uniform quantizer of Example 4.27. Find the pmf of Y.
The event {Y = q} for q in $S_Y$ is equivalent to the event {X ∈ $I_q$}, where $I_q$ is the interval of points mapped into the representation point q. The pmf of Y is therefore found by evaluating

$$P[Y = q] = \int_{I_q} f_X(t)\,dt.$$

It is easy to see that each representation point has an interval of length d mapped into it.
Thus the eight possible outputs are equiprobable, that is, P[Y = q] = 1/8 for q in $S_Y$.

In Example 4.30, each constant section of the function q(X) produces a delta function in the pdf of Y. In general, if the function g(X) is constant on certain intervals and if the pdf of X is nonzero in these intervals, then the pdf of Y will contain delta functions. Y will then be either discrete or of mixed type.

The cdf of Y is defined as the probability of the event {Y ≤ y}. In principle, it can always be obtained by finding the probability of the equivalent event {g(X) ≤ y}, as shown in the next examples.

Example 4.31 A Linear Function
Let the random variable Y be defined by Y = aX + b, where a is a nonzero constant. Suppose that X has cdf $F_X(x)$; find $F_Y(y)$.
The event {Y ≤ y} occurs when A = {aX + b ≤ y} occurs. If a > 0, then A = {X ≤ (y − b)/a} (see Fig. 4.11), and thus

$$F_Y(y) = P\!\left[X \le \frac{y-b}{a}\right] = F_X\!\left(\frac{y-b}{a}\right) \qquad a > 0.$$

On the other hand, if a < 0, then A = {X ≥ (y − b)/a}, and

$$F_Y(y) = P\!\left[X \ge \frac{y-b}{a}\right] = 1 - F_X\!\left(\frac{y-b}{a}\right) \qquad a < 0.$$

[Figure 4.11: The equivalent event for {Y ≤ y} is the event {X ≤ (y − b)/a}, if a > 0.]

We can obtain the pdf of Y by differentiating with respect to y. To do this we need to use the chain rule for derivatives:

$$\frac{dF}{dy} = \frac{dF}{du}\,\frac{du}{dy},$$

where u is the argument of F. In this case, u = (y − b)/a, and we then obtain

$$f_Y(y) = \frac{1}{a}\,f_X\!\left(\frac{y-b}{a}\right) \qquad a > 0$$

and

$$f_Y(y) = \frac{1}{-a}\,f_X\!\left(\frac{y-b}{a}\right) \qquad a < 0.$$

The above two results can be written compactly as

$$f_Y(y) = \frac{1}{|a|}\,f_X\!\left(\frac{y-b}{a}\right). \tag{4.67}$$

Example 4.32 A Linear Function of a Gaussian Random Variable
Let X be a random variable with a Gaussian pdf with mean m and standard deviation $\sigma$:

$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-(x-m)^2/2\sigma^2} \qquad -\infty < x < \infty. \tag{4.68}$$

Let Y = aX + b; find the pdf of Y.
Substitution of Eq. (4.68) into Eq. (4.67) yields

$$f_Y(y) = \frac{1}{\sqrt{2\pi}\,|a|\sigma}\,e^{-(y-b-am)^2/2(a\sigma)^2}.$$
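This pdf can be sanity-checked by simulation: sample X, apply Y = aX + b, and compare the sample mean and standard deviation with the values b + am and $|a|\sigma$ read off the pdf. A Python sketch (the parameter values and sample size are arbitrary illustrations, not from the text):

```python
import math
import random

random.seed(4)

m, sigma = 1.0, 2.0   # mean and standard deviation of X
a, b = -3.0, 5.0      # linear transformation Y = aX + b
n = 200_000

ys = [a * random.gauss(m, sigma) + b for _ in range(n)]

mean_y = sum(ys) / n                                        # should approach b + a*m = 2
std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)   # should approach |a|*sigma = 6
```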
Note that Y also has a Gaussian distribution, with mean b + am and standard deviation $|a|\sigma$. Therefore a linear function of a Gaussian random variable is also a Gaussian random variable.

Example 4.33
Let the random variable Y be defined by Y = X², where X is a continuous random variable. Find the cdf and pdf of Y.

[Figure 4.12: The equivalent event for {Y ≤ y} is the event {−√y ≤ X ≤ √y}, if y ≥ 0.]

The event {Y ≤ y} occurs when {X² ≤ y}, or equivalently when {−√y ≤ X ≤ √y}, for y nonnegative; see Fig. 4.12. The event is null when y is negative. Thus

$$F_Y(y) = \begin{cases} 0 & y < 0 \\ F_X(\sqrt{y}) - F_X(-\sqrt{y}) & y > 0, \end{cases}$$

and differentiating with respect to y,

$$f_Y(y) = \frac{f_X(\sqrt{y})}{2\sqrt{y}} - \frac{f_X(-\sqrt{y})}{-2\sqrt{y}} = \frac{f_X(\sqrt{y})}{2\sqrt{y}} + \frac{f_X(-\sqrt{y})}{2\sqrt{y}} \qquad y > 0. \tag{4.69}$$

Example 4.34 A Chi-Square Random Variable
Let X be a Gaussian random variable with mean m = 0 and standard deviation $\sigma = 1$. X is then said to be a standard normal random variable. Let Y = X². Find the pdf of Y.
Substitution of Eq. (4.68) into Eq. (4.69) yields

$$f_Y(y) = \frac{e^{-y/2}}{\sqrt{2\pi y}} \qquad y \ge 0. \tag{4.70}$$

From Table 4.1 we see that $f_Y(y)$ is the pdf of a chi-square random variable with one degree of freedom.

The result in Example 4.33 suggests that if the equation $y_0 = g(x)$ has n solutions, $x_1, x_2, \ldots, x_n$, then $f_Y(y_0)$ will be equal to n terms of the type on the right-hand side of Eq. (4.69). We now show that this is generally true by using a method for directly obtaining the pdf of Y in terms of the pdf of X.

Consider a nonlinear function Y = g(X) such as the one shown in Fig. 4.13.

[Figure 4.13: The equivalent event of {y < Y < y + dy} is {x₁ < X < x₁ + dx₁} ∪ {x₂ + dx₂ < X < x₂} ∪ {x₃ < X < x₃ + dx₃}.]

Consider the event $C_y$ = {y < Y < y + dy} and let $B_y$ be its equivalent event.
For the y indicated in the figure, the equation g(x) = y has three solutions, $x_1$, $x_2$, and $x_3$, and the equivalent event $B_y$ has a segment corresponding to each solution:

$$B_y = \{x_1 < X < x_1 + dx_1\} \cup \{x_2 + dx_2 < X < x_2\} \cup \{x_3 < X < x_3 + dx_3\}.$$

The probability of the event $C_y$ is approximately

$$P[C_y] = f_Y(y)\,|dy|, \tag{4.71}$$

where |dy| is the length of the interval y < Y ≤ y + dy. Similarly, the probability of the event $B_y$ is approximately

$$P[B_y] = f_X(x_1)\,|dx_1| + f_X(x_2)\,|dx_2| + f_X(x_3)\,|dx_3|. \tag{4.72}$$

Since $C_y$ and $B_y$ are equivalent events, their probabilities must be equal. By equating Eqs. (4.71) and (4.72) we obtain

$$f_Y(y) = \sum_k \frac{f_X(x)}{|dy/dx|}\bigg|_{x = x_k} \tag{4.73}$$

$$= \sum_k f_X(x)\left|\frac{dx}{dy}\right|\,\bigg|_{x = x_k}. \tag{4.74}$$

It is clear that if the equation g(x) = y has n solutions, the expression for the pdf of Y at that point is given by Eqs. (4.73) and (4.74) and contains n terms.

Example 4.35
Let Y = X² as in Example 4.34. For y ≥ 0, the equation y = x² has two solutions, $x_0 = \sqrt{y}$ and $x_1 = -\sqrt{y}$, so Eq. (4.73) has two terms. Since dy/dx = 2x, Eq. (4.73) yields

$$f_Y(y) = \frac{f_X(\sqrt{y})}{2\sqrt{y}} + \frac{f_X(-\sqrt{y})}{2\sqrt{y}}.$$

This result is in agreement with Eq. (4.69). To use Eq. (4.74), we note that

$$\frac{dx}{dy} = \frac{d}{dy}\left(\pm\sqrt{y}\right) = \pm\frac{1}{2\sqrt{y}},$$

which, when substituted into Eq. (4.74), yields Eq. (4.69) again.

Example 4.36 Amplitude Samples of a Sinusoidal Waveform
Let Y = cos(X), where X is uniformly distributed in the interval (0, 2π]. Y can be viewed as the sample of a sinusoidal waveform at a random instant of time that is uniformly distributed over the period of the sinusoid. Find the pdf of Y.
It can be seen in Fig. 4.14 that for −1 < y < 1 the equation y = cos(x) has two solutions in the interval of interest, $x_0 = \cos^{-1}(y)$ and $x_1 = 2\pi - x_0$. Since (see an introductory calculus textbook)

$$\frac{dy}{dx}\bigg|_{x_0} = -\sin(x_0) = -\sin(\cos^{-1}(y)) = -\sqrt{1 - y^2},$$

and since $f_X(x) = 1/2\pi$ in the interval of interest, Eq. (4.73) yields

$$f_Y(y) = \frac{1}{2\pi\sqrt{1-y^2}} + \frac{1}{2\pi\sqrt{1-y^2}} = \frac{1}{\pi\sqrt{1-y^2}} \qquad \text{for } -1 < y < 1.$$

[Figure 4.14: y = cos x has two roots in the interval (0, 2π).]

The cdf of Y is found by integrating the above:

$$F_Y(y) = \begin{cases} 0 & y < -1 \\[4pt] \dfrac{1}{2} + \dfrac{\sin^{-1} y}{\pi} & -1 \le y \le 1 \\[4pt] 1 & y > 1. \end{cases}$$

Y is said to have the arcsine distribution.

4.6 THE MARKOV AND CHEBYSHEV INEQUALITIES

In general, the mean and variance of a random variable do not provide enough information to determine the cdf/pdf. However, the mean and variance of a random variable X do allow us to obtain bounds for probabilities of the form P[|X| ≥ t]. Suppose first that X is a nonnegative random variable with mean E[X]. The Markov inequality then states that

$$P[X \ge a] \le \frac{E[X]}{a} \qquad \text{for } X \text{ nonnegative.} \tag{4.75}$$

We obtain Eq. (4.75) as follows:

$$E[X] = \int_0^a t f_X(t)\,dt + \int_a^\infty t f_X(t)\,dt \ge \int_a^\infty t f_X(t)\,dt \ge \int_a^\infty a f_X(t)\,dt = a\,P[X \ge a].$$

The first inequality results from discarding the integral from zero to a; the second inequality results from replacing t with the smaller number a.

Example 4.37
The mean height of children in a kindergarten class is 3 feet, 6 inches. Find the bound on the probability that a kid in the class is taller than 9 feet. The Markov inequality gives P[H ≥ 9] ≤ 42/108 = .389.

The bound in the above example appears to be ridiculous. However, a bound, by its very nature, must take the worst case into consideration. One can easily construct a random variable for which the bound given by the Markov inequality is exact. The reason we know that the bound in the above example is ridiculous is that we have knowledge about the variability of the children's height about their mean.

Now suppose that the mean E[X] = m and the variance VAR[X] = $\sigma^2$ of a random variable are known, and that we are interested in bounding P[|X − m| ≥ a]. The Chebyshev inequality states that

$$P[|X - m| \ge a] \le \frac{\sigma^2}{a^2}. \tag{4.76}$$

The Chebyshev inequality is a consequence of the Markov inequality.
Let $D^2 = (X - m)^2$ be the squared deviation from the mean. Then the Markov inequality applied to $D^2$ gives

$$P[D^2 \ge a^2] \le \frac{E[(X - m)^2]}{a^2} = \frac{\sigma^2}{a^2}.$$

Equation (4.76) follows when we note that $\{D^2 \ge a^2\}$ and $\{|X - m| \ge a\}$ are equivalent events.

Suppose that a random variable X has zero variance; then the Chebyshev inequality implies that

$$P[X = m] = 1, \tag{4.77}$$

that is, the random variable is equal to its mean with probability one. In other words, X is equal to the constant m in almost all experiments.

Example 4.38
The mean response time and the standard deviation in a multi-user computer system are known to be 15 seconds and 3 seconds, respectively. Estimate the probability that the response time is more than 5 seconds from the mean.
The Chebyshev inequality with m = 15 seconds, $\sigma$ = 3 seconds, and a = 5 seconds gives

$$P[|X - 15| \ge 5] \le \frac{9}{25} = .36.$$

Example 4.39
If X has mean m and variance $\sigma^2$, then the Chebyshev inequality for a = k$\sigma$ gives

$$P[|X - m| \ge k\sigma] \le \frac{1}{k^2}.$$

Now suppose that we know that X is a Gaussian random variable. Then for k = 2, P[|X − m| ≥ 2$\sigma$] = .0456, whereas the Chebyshev inequality gives the upper bound .25.

Example 4.40 Chebyshev Bound Is Tight
Let the random variable X have P[X = −v] = P[X = v] = 0.5. The mean is zero and the variance is

$$\mathrm{VAR}[X] = E[X^2] = (-v)^2(0.5) + v^2(0.5) = v^2.$$

Note that P[|X| ≥ v] = 1. The Chebyshev inequality states:

$$P[|X| \ge v] \le \frac{\mathrm{VAR}[X]}{v^2} = 1.$$

We see that the bound and the exact value are in agreement, so the bound is tight.

We see from Example 4.38 that for certain random variables, the Chebyshev inequality can give rather loose bounds. Nevertheless, the inequality is useful in situations in which we have no knowledge about the distribution of a given random variable other than its mean and variance.
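To see how loose or tight these bounds can be, it helps to compare them with exact tail probabilities for a distribution known in closed form. A Python sketch using an exponential random variable with $\lambda = 1$ (an illustrative choice, not from the text):

```python
import math

lam = 1.0
mean, var = 1.0 / lam, 1.0 / lam ** 2   # E[X] = 1/lambda, VAR[X] = 1/lambda^2

# Markov bound, Eq. (4.75), versus the exact tail P[X >= a] = e^{-lambda a}
a = 3.0
markov_bound = mean / a
exact_tail = math.exp(-lam * a)

# Chebyshev bound, Eq. (4.76), versus the exact P[|X - mean| >= c];
# with c = 2 > mean, the event reduces to {X >= mean + c}
c = 2.0
chebyshev_bound = var / c ** 2
exact_deviation = math.exp(-lam * (mean + c))
```

Here the Markov bound (1/3) and the Chebyshev bound (1/4) both hold but overshoot the exact values (about .0498 for both events), echoing the kindergarten example: a bound must cover the worst case.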
In Section 7.2, we will use the Chebyshev inequality to prove that the arithmetic average of independent measurements of the same random variable is highly likely to be close to the expected value of the random variable when the number of measurements is large. Problems 4.100 and 4.101 give examples of this result.

If more information is available than just the mean and variance, then it is possible to obtain bounds that are tighter than the Markov and Chebyshev inequalities. Consider the Markov inequality again. The region of interest is A = {t ≥ a}, so let $I_A(t)$ be the indicator function, that is, $I_A(t) = 1$ if t ∈ A and $I_A(t) = 0$ otherwise. The key step in the derivation is to note that t/a ≥ 1 in the region of interest. In effect we bounded $I_A(t)$ by t/a, as shown in Fig. 4.15. We then have:

$$P[X \ge a] = \int_0^\infty I_A(t) f_X(t)\,dt \le \int_0^\infty \frac{t}{a} f_X(t)\,dt = \frac{E[X]}{a}.$$

By changing the upper bound on $I_A(t)$, we can obtain different bounds on P[X ≥ a]. Consider the bound $I_A(t) \le e^{s(t-a)}$, also shown in Fig. 4.15, where s > 0. The resulting bound is:

$$P[X \ge a] = \int_0^\infty I_A(t) f_X(t)\,dt \le \int_0^\infty e^{s(t-a)} f_X(t)\,dt = e^{-sa}\int_0^\infty e^{st} f_X(t)\,dt = e^{-sa}E[e^{sX}]. \tag{4.78}$$

This bound is called the Chernoff bound, which can be seen to depend on the expected value of an exponential function of X. This function is called the moment generating function and is related to the transforms that are introduced in the next section. We develop the Chernoff bound further in the next section.

[Figure 4.15: Bounds on the indicator function for A = {t ≥ a}.]

4.7 TRANSFORM METHODS

In the old days, before calculators and computers, it was very handy to have logarithm tables around if your work involved performing a large number of multiplications. To multiply the numbers x and y, you looked up log(x) and log(y), added them, and then looked up the inverse logarithm of the result.
You probably remember from grade school that longhand multiplication is more tedious and error-prone than addition. Thus logarithms were very useful as a computational aid.

Transform methods are extremely useful computational aids in the solution of equations that involve derivatives and integrals of functions. In many of these problems, the solution is given by the convolution of two functions: $f_1(x) * f_2(x)$. We will define the convolution operation later. For now, all you need to know is that finding the convolution of two functions can be more tedious and error-prone than longhand multiplication! In this section we introduce transforms that map the function $f_k(x)$ into another function $\mathcal{F}_k(\omega)$ and that satisfy the property

$$\mathcal{F}[f_1(x) * f_2(x)] = \mathcal{F}_1(\omega)\,\mathcal{F}_2(\omega).$$

In other words, the transform of the convolution is equal to the product of the individual transforms. Therefore transforms allow us to replace the convolution operation by the much simpler multiplication operation. The transform expressions introduced in this section will prove very useful when we consider sums of random variables in Chapter 7.

4.7.1 The Characteristic Function

The characteristic function of a random variable X is defined by

$$\Phi_X(\omega) = E[e^{j\omega X}] \tag{4.79a}$$
$$= \int_{-\infty}^{\infty} f_X(x)\,e^{j\omega x}\,dx, \tag{4.79b}$$

where $j = \sqrt{-1}$ is the imaginary unit number. The two expressions on the right-hand side motivate two interpretations of the characteristic function. In the first expression, $\Phi_X(\omega)$ can be viewed as the expected value of a function of X, $e^{j\omega X}$, in which the parameter $\omega$ is left unspecified. In the second expression, $\Phi_X(\omega)$ is simply the Fourier transform of the pdf $f_X(x)$ (with a reversal in the sign of the exponent). Both of these interpretations prove useful in different contexts.

If we view $\Phi_X(\omega)$ as a Fourier transform, then we have from the Fourier transform inversion formula that the pdf of X is given by

$$f_X(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \Phi_X(\omega)\,e^{-j\omega x}\,d\omega. \tag{4.80}$$

It then follows that every pdf and its characteristic function form a unique Fourier transform pair. Table 4.1 gives the characteristic functions of some continuous random variables.

Example 4.41 Exponential Random Variable
The characteristic function for an exponentially distributed random variable with parameter $\lambda$ is given by

$$\Phi_X(\omega) = \int_0^\infty \lambda e^{-\lambda x}\,e^{j\omega x}\,dx = \int_0^\infty \lambda e^{-(\lambda - j\omega)x}\,dx = \frac{\lambda}{\lambda - j\omega}.$$

If X is a discrete random variable, substitution of Eq. (4.20) into the definition of $\Phi_X(\omega)$ gives

$$\Phi_X(\omega) = \sum_k p_X(x_k)\,e^{j\omega x_k} \qquad \text{discrete random variables.}$$

Most of the time we deal with discrete random variables that are integer-valued. The characteristic function is then

$$\Phi_X(\omega) = \sum_{k=-\infty}^{\infty} p_X(k)\,e^{j\omega k} \qquad \text{integer-valued random variables.} \tag{4.81}$$

Equation (4.81) is the Fourier transform of the sequence $p_X(k)$. Note that the Fourier transform in Eq. (4.81) is a periodic function of $\omega$ with period $2\pi$, since $e^{j(\omega + 2\pi)k} = e^{j\omega k}e^{jk2\pi}$ and $e^{jk2\pi} = 1$. Therefore the characteristic function of an integer-valued random variable is a periodic function of $\omega$. The following inversion formula allows us to recover the probabilities $p_X(k)$ from $\Phi_X(\omega)$:

$$p_X(k) = \frac{1}{2\pi}\int_0^{2\pi} \Phi_X(\omega)\,e^{-j\omega k}\,d\omega \qquad k = 0, \pm 1, \pm 2, \ldots \tag{4.82}$$

Indeed, a comparison of Eqs. (4.81) and (4.82) shows that the $p_X(k)$ are simply the coefficients of the Fourier series of the periodic function $\Phi_X(\omega)$.

Example 4.42 Geometric Random Variable
The characteristic function for a geometric random variable is given by

$$\Phi_X(\omega) = \sum_{k=0}^{\infty} p q^k e^{j\omega k} = p\sum_{k=0}^{\infty} (q e^{j\omega})^k = \frac{p}{1 - q e^{j\omega}}.$$

Since $f_X(x)$ and $\Phi_X(\omega)$ form a transform pair, we would expect to be able to obtain the moments of X from $\Phi_X(\omega)$. The moment theorem states that the moments of X are given by

$$E[X^n] = \frac{1}{j^n}\,\frac{d^n}{d\omega^n}\Phi_X(\omega)\bigg|_{\omega = 0}. \tag{4.83}$$

To show this, first expand $e^{j\omega x}$ in a power series in the definition of $\Phi_X(\omega)$:

$$\Phi_X(\omega) = \int_{-\infty}^{\infty} f_X(x)\left\{1 + j\omega x + \frac{(j\omega x)^2}{2!} + \cdots\right\} dx.$$
Assuming that all the moments of X are finite and that the series can be integrated term by term, we obtain

$$\Phi_X(\omega) = 1 + j\omega E[X] + \frac{(j\omega)^2 E[X^2]}{2!} + \cdots + \frac{(j\omega)^n E[X^n]}{n!} + \cdots.$$

If we differentiate the above expression once and evaluate the result at $\omega = 0$, we obtain

$$\frac{d}{d\omega}\Phi_X(\omega)\bigg|_{\omega = 0} = jE[X].$$

If we differentiate n times and evaluate at $\omega = 0$, we finally obtain

$$\frac{d^n}{d\omega^n}\Phi_X(\omega)\bigg|_{\omega = 0} = j^n E[X^n],$$

which yields Eq. (4.83). Note that when the above power series converges, the characteristic function, and hence the pdf by Eq. (4.80), is completely determined by the moments of X.

Example 4.43
To find the mean of an exponentially distributed random variable, we differentiate $\Phi_X(\omega) = \lambda(\lambda - j\omega)^{-1}$ once and obtain

$$\Phi_X'(\omega) = \frac{\lambda j}{(\lambda - j\omega)^2}.$$

The moment theorem then implies that $E[X] = \Phi_X'(0)/j = 1/\lambda$. If we take two derivatives, we obtain

$$\Phi_X''(\omega) = \frac{-2\lambda}{(\lambda - j\omega)^3},$$

so the second moment is $E[X^2] = \Phi_X''(0)/j^2 = 2/\lambda^2$. The variance of X is then given by

$$\mathrm{VAR}[X] = E[X^2] - E[X]^2 = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}.$$

Example 4.44 Chernoff Bound for Gaussian Random Variable
Let X be a Gaussian random variable with mean m and variance $\sigma^2$. Find the Chernoff bound for X.
The Chernoff bound (Eq. 4.78) depends on the moment generating function:

$$E[e^{sX}] = \Phi_X(-js).$$

In terms of the characteristic function the bound is given by:

$$P[X \ge a] \le e^{-sa}\,\Phi_X(-js) \qquad \text{for } s \ge 0.$$

The parameter s can be selected to minimize the upper bound. The bound for the Gaussian random variable is:

$$P[X \ge a] \le e^{-sa}\,e^{ms + s^2\sigma^2/2} = e^{-s(a-m) + s^2\sigma^2/2} \qquad \text{for } s \ge 0.$$

We minimize the upper bound by minimizing the exponent:

$$0 = \frac{d}{ds}\left(-s(a-m) + \frac{s^2\sigma^2}{2}\right), \qquad \text{which implies } s = \frac{a-m}{\sigma^2}.$$

The resulting upper bound is:

$$P[X \ge a] = Q\!\left(\frac{a-m}{\sigma}\right) \le e^{-(a-m)^2/2\sigma^2}.$$

This bound is much better than the Chebyshev bound and is similar to the estimate given in Eq. (4.54).
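The comparison at the end of Example 4.44 is easy to reproduce numerically, since $Q(x) = \mathrm{erfc}(x/\sqrt{2})/2$ is available through the standard library's error function. A Python sketch for a standard Gaussian with a = 3 (illustrative values, not from the text):

```python
import math

m, sigma, a = 0.0, 1.0, 3.0

# exact tail: Q((a - m)/sigma), with Q(x) = erfc(x / sqrt(2)) / 2
exact = 0.5 * math.erfc((a - m) / (sigma * math.sqrt(2.0)))

# optimized Chernoff bound from Example 4.44
chernoff = math.exp(-((a - m) ** 2) / (2.0 * sigma ** 2))

# Chebyshev bound on P[|X - m| >= a - m], which also bounds the one-sided tail
chebyshev = sigma ** 2 / (a - m) ** 2
```

The three values are roughly .00135, .0111, and .111: each bound holds, and the Chernoff bound is an order of magnitude closer to the true tail than the Chebyshev bound.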
4.7.2 The Probability Generating Function

In problems where random variables are nonnegative, it is usually more convenient to use the z-transform or the Laplace transform. The probability generating function $G_N(z)$ of a nonnegative integer-valued random variable N is defined by

$$G_N(z) = E[z^N] \tag{4.84a}$$
$$= \sum_{k=0}^{\infty} p_N(k)\,z^k. \tag{4.84b}$$

The first expression is the expected value of the function of N, $z^N$. The second expression is the z-transform of the pmf (with a sign change in the exponent). Table 3.1 shows the probability generating functions for some discrete random variables. Note that the characteristic function of N is given by $\Phi_N(\omega) = G_N(e^{j\omega})$.

Using a derivation similar to that used in the moment theorem, it is easy to show that the pmf of N is given by

$$p_N(k) = \frac{1}{k!}\,\frac{d^k}{dz^k}G_N(z)\bigg|_{z=0}. \tag{4.85}$$

This is why $G_N(z)$ is called the probability generating function. By taking the first two derivatives of $G_N(z)$ and evaluating the result at z = 1, it is possible to find the first two moments of N:

$$\frac{d}{dz}G_N(z)\bigg|_{z=1} = \sum_{k=0}^{\infty} p_N(k)\,k z^{k-1}\bigg|_{z=1} = \sum_{k=0}^{\infty} k\,p_N(k) = E[N]$$

and

$$\frac{d^2}{dz^2}G_N(z)\bigg|_{z=1} = \sum_{k=0}^{\infty} p_N(k)\,k(k-1)z^{k-2}\bigg|_{z=1} = \sum_{k=0}^{\infty} k(k-1)\,p_N(k) = E[N(N-1)] = E[N^2] - E[N].$$

Thus the mean and variance of N are given by

$$E[N] = G_N'(1) \tag{4.86}$$

and

$$\mathrm{VAR}[N] = G_N''(1) + G_N'(1) - (G_N'(1))^2. \tag{4.87}$$

Example 4.45 Poisson Random Variable
The probability generating function for the Poisson random variable with parameter $\alpha$ is given by

$$G_N(z) = \sum_{k=0}^{\infty} \frac{\alpha^k}{k!}\,e^{-\alpha}\,z^k = e^{-\alpha}\sum_{k=0}^{\infty} \frac{(\alpha z)^k}{k!} = e^{-\alpha}e^{\alpha z} = e^{\alpha(z-1)}.$$

The first two derivatives of $G_N(z)$ are given by

$$G_N'(z) = \alpha e^{\alpha(z-1)}$$

and

$$G_N''(z) = \alpha^2 e^{\alpha(z-1)}.$$

Therefore the mean and variance of the Poisson random variable are

$$E[N] = \alpha \qquad \mathrm{VAR}[N] = \alpha^2 + \alpha - \alpha^2 = \alpha.$$

4.7.3 The Laplace Transform of the pdf

In queueing theory one deals with service times, waiting times, and delays. All of these are nonnegative continuous random variables.
It is therefore customary to work with the Laplace transform of the pdf,

$$X^*(s) = \int_0^\infty f_X(x)\,e^{-sx}\,dx = E[e^{-sX}]. \tag{4.88}$$

Note that $X^*(s)$ can be interpreted as a Laplace transform of the pdf or as an expected value of a function of X, $e^{-sX}$.

The moment theorem also holds for $X^*(s)$:

$$E[X^n] = (-1)^n\,\frac{d^n}{ds^n}X^*(s)\bigg|_{s=0}. \tag{4.89}$$

Example 4.46 Gamma Random Variable
The Laplace transform of the gamma pdf is given by

$$X^*(s) = \int_0^\infty \frac{\lambda(\lambda x)^{\alpha-1}}{\Gamma(\alpha)}\,e^{-\lambda x}e^{-sx}\,dx = \frac{\lambda^\alpha}{\Gamma(\alpha)}\int_0^\infty x^{\alpha-1} e^{-(\lambda+s)x}\,dx = \frac{\lambda^\alpha}{\Gamma(\alpha)(\lambda+s)^\alpha}\int_0^\infty y^{\alpha-1}e^{-y}\,dy = \frac{\lambda^\alpha}{(\lambda+s)^\alpha},$$

where we used the change of variable $y = (\lambda + s)x$. We can then obtain the first two moments of X as follows:

$$E[X] = -\frac{d}{ds}\,\frac{\lambda^\alpha}{(\lambda+s)^\alpha}\bigg|_{s=0} = \frac{\alpha\lambda^\alpha}{(\lambda+s)^{\alpha+1}}\bigg|_{s=0} = \frac{\alpha}{\lambda}$$

and

$$E[X^2] = \frac{d^2}{ds^2}\,\frac{\lambda^\alpha}{(\lambda+s)^\alpha}\bigg|_{s=0} = \frac{\alpha(\alpha+1)\lambda^\alpha}{(\lambda+s)^{\alpha+2}}\bigg|_{s=0} = \frac{\alpha(\alpha+1)}{\lambda^2}.$$

Thus the variance of X is

$$\mathrm{VAR}[X] = E[X^2] - E[X]^2 = \frac{\alpha}{\lambda^2}.$$

4.8 BASIC RELIABILITY CALCULATIONS

In this section we apply some of the tools developed so far to the calculation of measures that are of interest in assessing the reliability of systems. We also show how the reliability of a system can be determined in terms of the reliability of its components.

4.8.1 The Failure Rate Function

Let T be the lifetime of a component, a subsystem, or a system. The reliability at time t is defined as the probability that the component, subsystem, or system is still functioning at time t:

$$R(t) = P[T > t]. \tag{4.90}$$

The relative frequency interpretation implies that, in a large number of components or systems, R(t) is the fraction that fail after time t. The reliability can be expressed in terms of the cdf of T:

$$R(t) = 1 - P[T \le t] = 1 - F_T(t). \tag{4.91}$$

Note that the derivative of R(t) gives the negative of the pdf of T:

$$R'(t) = -f_T(t). \tag{4.92}$$

The mean time to failure (MTTF) is given by the expected value of T:

$$E[T] = \int_0^\infty t f_T(t)\,dt = \int_0^\infty R(t)\,dt,$$

where the second expression was obtained using Eqs. (4.28) and (4.91).
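The equality of the two MTTF expressions can be verified numerically for a specific lifetime law. The sketch below uses an exponential lifetime with $\lambda = 0.5$ (an illustrative choice, not from the text) and midpoint-rule integration over a range long enough that the truncated tail is negligible:

```python
import math

lam = 0.5   # failure rate of an exponential lifetime (illustrative value)

def pdf(t):
    return lam * math.exp(-lam * t)

def reliability(t):
    return math.exp(-lam * t)   # R(t) = 1 - F_T(t)

def midpoint_integral(f, a, b, n=100_000):
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# the two MTTF expressions should agree: E[T] = int t f_T(t) dt = int R(t) dt
mttf_from_pdf = midpoint_integral(lambda t: t * pdf(t), 0.0, 80.0)
mttf_from_R = midpoint_integral(reliability, 0.0, 80.0)
```

Both integrals come out at $1/\lambda = 2$ months up to quadrature error, as the exponential mean requires.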
Suppose that we know a system is still functioning at time t; what is its future behavior? In Example 4.10, we found that the conditional cdf of T given that T > t is given by

$$F_T(x \mid T > t) = P[T \le x \mid T > t] = \begin{cases} 0 & x < t \\[4pt] \dfrac{F_T(x) - F_T(t)}{1 - F_T(t)} & x \ge t. \end{cases} \tag{4.93}$$

The pdf associated with $F_T(x \mid T > t)$ is

$$f_T(x \mid T > t) = \frac{f_T(x)}{1 - F_T(t)} \qquad x \ge t. \tag{4.94}$$

Note that the denominator of Eq. (4.94) is equal to R(t). The failure rate function r(t) is defined as $f_T(x \mid T > t)$ evaluated at x = t:

$$r(t) = f_T(t \mid T > t) = \frac{-R'(t)}{R(t)}, \tag{4.95}$$

since by Eq. (4.92), $R'(t) = -f_T(t)$. The failure rate function has the following meaning:

$$P[t < T \le t + dt \mid T > t] = f_T(t \mid T > t)\,dt = r(t)\,dt. \tag{4.96}$$

In words, r(t) dt is the probability that a component that has functioned up to time t will fail in the next dt seconds.

Example 4.47 Exponential Failure Law
Suppose a component has a constant failure rate function, say r(t) = $\lambda$. Find the pdf and the MTTF for its lifetime T.
Equation (4.95) implies that

$$\frac{R'(t)}{R(t)} = -\lambda. \tag{4.97}$$

Equation (4.97) is a first-order differential equation with initial condition R(0) = 1. If we integrate both sides of Eq. (4.97) from 0 to t, we obtain

$$-\int_0^t \lambda\,dt' + k = \int_0^t \frac{R'(t')}{R(t')}\,dt' = \ln R(t),$$

which implies that

$$R(t) = K e^{-\lambda t}, \qquad \text{where } K = e^k.$$

The initial condition R(0) = 1 implies that K = 1. Thus

$$R(t) = e^{-\lambda t} \qquad t > 0 \tag{4.98}$$

and

$$f_T(t) = \lambda e^{-\lambda t} \qquad t > 0.$$

Thus if T has a constant failure rate function, then T is an exponential random variable. This is not surprising, since the exponential random variable satisfies the memoryless property. The MTTF is $E[T] = 1/\lambda$.

The derivation that was used in Example 4.47 can be used to show that, in general, the failure rate function and the reliability are related by

$$R(t) = \exp\left\{-\int_0^t r(t')\,dt'\right\} \tag{4.99}$$

and, from Eq. (4.92),

$$f_T(t) = r(t)\exp\left\{-\int_0^t r(t')\,dt'\right\}. \tag{4.100}$$

Figure 4.16 shows the failure rate function for a typical system.
Initially there may be a high failure rate due to defective parts or installation. After the "bugs" have been worked out, the system is stable and has a low failure rate. At some later point, aging and wear effects set in, resulting in an increased failure rate. Equations (4.99) and (4.100) allow us to postulate reliability functions and the associated pdfs in terms of the failure rate function, as shown in the following example.

[Figure 4.16: Failure rate function for a typical system.]

Example 4.48 Weibull Failure Law
The Weibull failure law has failure rate function given by

$$r(t) = \alpha\beta t^{\beta-1}, \tag{4.101}$$

where $\alpha$ and $\beta$ are positive constants. Equation (4.99) implies that the reliability is given by

$$R(t) = e^{-\alpha t^\beta}.$$

Equation (4.100) then implies that the pdf for T is

$$f_T(t) = \alpha\beta t^{\beta-1} e^{-\alpha t^\beta} \qquad t > 0. \tag{4.102}$$

Figure 4.17 shows $f_T(t)$ for $\alpha = 1$ and several values of $\beta$. Note that $\beta = 1$ yields the exponential failure law, which has a constant failure rate. For $\beta > 1$, Eq. (4.101) gives a failure rate function that increases with time. For $\beta < 1$, Eq. (4.101) gives a failure rate function that decreases with time. Further properties of the Weibull random variable are developed in the problems.

[Figure 4.17: Probability density function of the Weibull random variable, $\alpha = 1$ and $\beta = 1, 2, 4$.]

4.8.2 Reliability of Systems

Suppose that a system consists of several components or subsystems. We now show how the reliability of a system can be computed in terms of the reliability of its subsystems if the components are assumed to fail independently of each other.

[Figure 4.18: (a) System consisting of n components in series. (b) System consisting of n components in parallel.]

Consider first a system that consists of the series arrangement of n components, as shown in Fig. 4.18(a).
This system is considered to be functioning only if all the components are functioning. Let $A_s$ be the event "system functioning at time $t$," and let $A_j$ be the event "$j$th component is functioning at time $t$"; then the probability that the system is functioning at time $t$ is

$$R(t) = P[A_s] = P[A_1 \cap A_2 \cap \cdots \cap A_n] = P[A_1]P[A_2]\cdots P[A_n] = R_1(t)R_2(t)\cdots R_n(t), \quad (4.103)$$

since $P[A_j] = R_j(t)$, the reliability function of the $j$th component. Since probabilities are numbers that are less than or equal to one, we see that the system can be no more reliable than the least reliable of its components, that is, $R(t) \le \min_j R_j(t)$.

If we apply Eq. (4.99) to each of the $R_j(t)$ in Eq. (4.103), we then find that the failure rate function of a series system is given by the sum of the component failure rate functions:

$$R(t) = \exp\left\{-\int_0^t r_1(t')\,dt'\right\}\exp\left\{-\int_0^t r_2(t')\,dt'\right\}\cdots\exp\left\{-\int_0^t r_n(t')\,dt'\right\}$$
$$= \exp\left\{-\int_0^t [r_1(t') + r_2(t') + \cdots + r_n(t')]\,dt'\right\}.$$

Example 4.49

Suppose that a system consists of $n$ components in series and that the component lifetimes are exponential random variables with rates $\lambda_1, \lambda_2, \ldots, \lambda_n$. Find the system reliability.

From Eqs. (4.98) and (4.103), we have

$$R(t) = e^{-\lambda_1 t}e^{-\lambda_2 t}\cdots e^{-\lambda_n t} = e^{-(\lambda_1 + \cdots + \lambda_n)t}.$$

Thus the system reliability is exponentially distributed with rate $\lambda_1 + \lambda_2 + \cdots + \lambda_n$.

Now suppose that a system consists of $n$ components in parallel, as shown in Fig. 4.18(b). This system is considered to be functioning as long as at least one of the components is functioning. The system will not be functioning if and only if all the components have failed, that is, $A_s^c = A_1^c \cap A_2^c \cap \cdots \cap A_n^c$. Thus

$$P[A_s^c] = P[A_1^c]P[A_2^c]\cdots P[A_n^c],$$

so that

$$1 - R(t) = (1 - R_1(t))(1 - R_2(t))\cdots(1 - R_n(t)),$$

and finally,

$$R(t) = 1 - (1 - R_1(t))(1 - R_2(t))\cdots(1 - R_n(t)). \quad (4.104)$$

Example 4.50

Compare the reliability of a single-unit system against that of a system that operates two units in parallel. Assume all units have exponentially distributed lifetimes with rate 1.
The reliability of the single-unit system is $R_s(t) = e^{-t}$. The reliability of the two-unit system is

$$R_p(t) = 1 - (1 - e^{-t})(1 - e^{-t}) = e^{-t}(2 - e^{-t}).$$

The parallel system is more reliable by a factor of $(2 - e^{-t}) > 1$.

More complex configurations can be obtained by combining subsystems consisting of series and parallel components. The reliability of such systems can then be computed in terms of the subsystem reliabilities. See Example 2.35 for an example of such a calculation.

4.9 COMPUTER METHODS FOR GENERATING RANDOM VARIABLES

The computer simulation of any random phenomenon involves the generation of random variables with prescribed distributions. For example, the simulation of a queueing system involves generating the time between customer arrivals as well as the service times of each customer. Once the cdf's that model these random quantities have been selected, an algorithm for generating random variables with these cdf's must be found. MATLAB and Octave have built-in functions for generating random variables for all of the well-known distributions. In this section we present the methods that are used for generating random variables. All of these methods are based on the availability of random numbers that are uniformly distributed between zero and one. Methods for generating these numbers were discussed in Section 2.7.

All of the methods for generating random variables require the evaluation of either the pdf, the cdf, or the inverse of the cdf of the random variable of interest. We can write programs to perform these evaluations, or we can use the functions available in programs such as MATLAB and Octave. The following example shows some typical evaluations for the Gaussian random variable.

Example 4.51 Evaluation of pdf, cdf, and Inverse cdf

Let $X$ be a Gaussian random variable with mean 1 and variance 2. Find the pdf at $x = 7$. Find the cdf at $x = -2$.
Find the value of $x$ at which the cdf equals 0.25.

The following commands show how these results are obtained using Octave.

> normal_pdf (7, 1, 2)
ans = 3.4813e-05
> normal_cdf (-2, 1, 2)
ans = 0.016947
> normal_inv (0.25, 1, 2)
ans = 0.046127

4.9.1 The Transformation Method

Suppose that $U$ is uniformly distributed in the interval [0, 1]. Let $F_X(x)$ be the cdf of the random variable we are interested in generating. Define the random variable $Z = F_X^{-1}(U)$; that is, first $U$ is selected and then $Z$ is found as indicated in Fig. 4.19. The cdf of $Z$ is

$$P[Z \le x] = P[F_X^{-1}(U) \le x] = P[U \le F_X(x)].$$

But if $U$ is uniformly distributed in [0, 1] and $0 \le h \le 1$, then $P[U \le h] = h$ (see Example 4.6). Thus

$$P[Z \le x] = F_X(x),$$

and $Z = F_X^{-1}(U)$ has the desired cdf.

Transformation Method for Generating $X$:
1. Generate $U$ uniformly distributed in [0, 1].
2. Let $Z = F_X^{-1}(U)$.

Example 4.52 Exponential Random Variable

To generate an exponentially distributed random variable $X$ with parameter $\lambda$, we need to invert the expression $u = F_X(x) = 1 - e^{-\lambda x}$. We obtain

$$X = -\frac{1}{\lambda}\ln(1 - U).$$

FIGURE 4.19 Transformation method for generating a random variable with cdf $F_X(x)$.

Note that we can use the simpler expression $X = -\ln(U)/\lambda$, since $1 - U$ is also uniformly distributed in [0, 1]. The first two lines of the Octave commands below show how to implement the transformation method to generate 1000 exponential random variables with $\lambda = 1$. Figure 4.20 shows the histogram of values obtained. In addition, the figure shows the probability that samples of the random variables fall in the corresponding histogram bins. Good correspondence between the histogram and these probabilities is observed. In Chapter 8 we introduce methods for assessing the goodness of fit of data to a given distribution. Both MATLAB and Octave use the transformation method in their function exponential_rnd.
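The two steps of the transformation method are easy to express in code. The sketch below is a Python version of Example 4.52 (ours; the text itself uses Octave) with $\lambda = 1$; the sample mean of the generated values should be near $E[X] = 1/\lambda$.

```python
import math
import random

random.seed(1)  # fix the seed so the experiment is repeatable
lam = 1.0

def exponential_rv():
    u = random.random()               # Step 1: U uniform in [0, 1]
    return -math.log(1.0 - u) / lam   # Step 2: Z = F^{-1}(U) = -(1/lam) ln(1 - U)

samples = [exponential_rv() for _ in range(10000)]
mean = sum(samples) / len(samples)    # should be close to 1/lam = 1.0
```

As the text notes, `-math.log(random.random()) / lam` would work equally well, since $1 - U$ is also uniform on [0, 1].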
> U = rand (1, 1000);     % Generate 1000 uniform random variables.
> X = -log (U);           % Compute 1000 exponential RVs.
> K = 0.25:0.5:6;         % The remaining lines show how to generate
> P(1) = 1 - exp (-0.5);  % the histogram bins.
> for i = 2:12,
>   P(i) = P(i-1)*exp (-0.5);
> end;
> stem (K, P)
> hold on
> hist (X, K, 1)

4.9.2 The Rejection Method

We first consider the simple version of this algorithm and explain why it works; then we present it in its general form. Suppose that we are interested in generating a random variable $Z$ with pdf $f_X(x)$ as shown in Fig. 4.21. In particular, we assume that: (1) the pdf is nonzero only in the interval [0, a], and (2) the pdf takes on values in the range [0, b]. The rejection method in this case works as follows:

FIGURE 4.20 Histogram of 1000 exponential random variables using transformation method.

FIGURE 4.21 Rejection method for generating a random variable with pdf $f_X(x)$.

1. Generate $X_1$ uniform in the interval [0, a].
2. Generate $Y$ uniform in the interval [0, b].
3. If $Y \le f_X(X_1)$, then output $Z = X_1$; else, reject $X_1$ and return to step 1.

Note that this algorithm will perform a random number of steps before it produces the output $Z$. We now show that the output $Z$ has the desired pdf. Steps 1 and 2 select a point at random in a rectangle of width $a$ and height $b$. The probability of selecting a point in any region is simply the area of the region divided by the total area of the rectangle, $ab$. Thus the probability of accepting $X_1$ is the probability of the region below $f_X(x)$ divided by $ab$. But the area under any pdf is 1, so we conclude that the probability of success (i.e., acceptance) is $1/ab$.
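The three steps above can be sketched in Python (our example, not the text's). We take as target the beta pdf with $\alpha' = \beta' = 2$, $f(x) = 6x(1-x)$, which is nonzero only on [0, 1] (so $a = 1$) and bounded by $b = 1.5$; the acceptance probability should then be $1/ab = 2/3$.

```python
import random

random.seed(2)

def f(x):
    # target pdf: beta with a' = b' = 2, f(x) = 6x(1 - x) on [0, 1]
    return 6.0 * x * (1.0 - x)

a, b = 1.0, 1.5   # pdf support is [0, a]; pdf values lie in [0, b]

def rejection_sample():
    while True:
        x1 = random.uniform(0.0, a)   # Step 1
        y = random.uniform(0.0, b)    # Step 2
        if y <= f(x1):                # Step 3: accept, else retry
            return x1

samples = [rejection_sample() for _ in range(5000)]
mean = sum(samples) / len(samples)    # beta(2, 2) has mean 1/2
```

Each trial lands uniformly in the 1 × 1.5 rectangle and is kept only if it falls under the pdf, which is exactly the geometric argument used in the proof.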
Consider now the following probability:

$$P[x < X_1 \le x + dx \mid X_1 \text{ is accepted}] = \frac{P[\{x < X_1 \le x + dx\} \cap \{X_1 \text{ accepted}\}]}{P[X_1 \text{ accepted}]} = \frac{\text{shaded area}/ab}{1/ab} = \frac{f_X(x)\,dx/ab}{1/ab} = f_X(x)\,dx.$$

Therefore $X_1$, when accepted, has the desired pdf. Thus $Z$ has the desired pdf.

Example 4.53 Generating Beta Random Variables

Show that the beta random variables with $\alpha' = \beta' = 2$ can be generated using the rejection method.

The pdf of the beta random variable with $\alpha' = \beta' = 2$ is similar to that shown in Fig. 4.21. This beta pdf is maximum at $x = 1/2$, and the maximum value is

$$\frac{(1/2)^{2-1}(1/2)^{2-1}}{B(2, 2)} = \frac{1/4}{\Gamma(2)\Gamma(2)/\Gamma(4)} = \frac{1/4}{1!\,1!/3!} = \frac{3}{2}.$$

Therefore we can generate this beta random variable using the rejection method with $b = 1.5$.

The algorithm as stated above can have two problems. First, if the rectangle does not fit snugly around $f_X(x)$, the number of $X_1$'s that need to be generated before acceptance may be excessive. Second, the above method cannot be used if $f_X(x)$ is unbounded or if its range is not finite. The general version of this algorithm overcomes both problems. Suppose we want to generate $Z$ with pdf $f_X(x)$. Let $W$ be a random variable with pdf $f_W(x)$ that is easy to generate and such that for some constant $K > 1$,

$$K f_W(x) \ge f_X(x) \qquad \text{for all } x,$$

that is, the region under $K f_W(x)$ contains $f_X(x)$ as shown in Fig. 4.22.

Rejection Method for Generating $X$:
1. Generate $X_1$ with pdf $f_W(x)$. Define $B(X_1) = K f_W(X_1)$.
2. Generate $Y$ uniform in $[0, B(X_1)]$.
3. If $Y \le f_X(X_1)$, then output $Z = X_1$; else reject $X_1$ and return to step 1.

See Problem 4.143 for a proof that $Z$ has the desired pdf.

FIGURE 4.22 Rejection method for generating a random variable with gamma pdf and with $0 < \alpha < 1$.

Example 4.54 Gamma Random Variable

We now show how the rejection method can be used to generate $X$ with gamma pdf and parameters $0 < \alpha < 1$ and $\lambda = 1$.
A function $K f_W(x)$ that "covers" $f_X(x)$ is easily obtained (see Fig. 4.22):

$$f_X(x) = \frac{x^{\alpha - 1}e^{-x}}{\Gamma(\alpha)} \le K f_W(x) = \begin{cases} \dfrac{x^{\alpha - 1}}{\Gamma(\alpha)} & 0 \le x \le 1 \\ \dfrac{e^{-x}}{\Gamma(\alpha)} & x > 1. \end{cases}$$

The pdf $f_W(x)$ that corresponds to the function on the right-hand side is

$$f_W(x) = \begin{cases} \dfrac{\alpha e\,x^{\alpha - 1}}{\alpha + e} & 0 \le x \le 1 \\ \dfrac{\alpha e\,e^{-x}}{\alpha + e} & x \ge 1. \end{cases}$$

The cdf of $W$ is

$$F_W(x) = \begin{cases} \dfrac{e x^{\alpha}}{\alpha + e} & 0 \le x \le 1 \\ 1 - \dfrac{\alpha e\,e^{-x}}{\alpha + e} & x > 1. \end{cases}$$

$W$ is easy to generate using the transformation method, with

$$F_W^{-1}(u) = \begin{cases} \left[\dfrac{(\alpha + e)u}{e}\right]^{1/\alpha} & u \le \dfrac{e}{\alpha + e} \\ -\ln\left[\dfrac{(\alpha + e)(1 - u)}{\alpha e}\right] & u > \dfrac{e}{\alpha + e}. \end{cases}$$

We can therefore use the transformation method to generate this $f_W(x)$, and then the rejection method to generate any gamma random variable $X$ with parameters $0 < \alpha < 1$ and $\lambda = 1$. Finally we note that if we let $W = X/\lambda$, then $W$ will be gamma with parameters $\alpha$ and $\lambda$. The generation of gamma random variables with $\alpha > 1$ is discussed in Problem 4.142.

Example 4.55 Implementing Rejection Method for Gamma Random Variables

Given below is an Octave function definition to implement the rejection method using the above transformation.

% Generate random numbers from the gamma distribution for 0 < alpha < 1.
function X = gamma_rejection_method_altone (alpha)
  while (true),
    X = special_inverse (alpha);         % Step 1: Generate X with pdf fW(x).
    B = special_pdf (X, alpha);
    Y = rand .* B;                       % Step 2: Generate Y uniform in [0, K*fW(X)].
    if (Y <= fx_gamma_pdf (X, alpha)),   % Step 3: Accept or reject.
      break;
    end
  end

% Helper function to generate random variables with pdf fW(x).
function X = special_inverse (alpha)
  u = rand;
  if (u <= e./(alpha + e)),
    X = ((alpha + e).*u./e).^(1./alpha);
  elseif (u > e./(alpha + e)),
    X = -log ((alpha + e).*(1 - u)./(alpha.*e));
  end

% Return B = K*fW(X), the covering function of Example 4.54, in order to
% generate uniform variables in [0, K*fW(X)].
function B = special_pdf (X, alpha)
  if (X >= 0 && X <= 1),
    B = X.^(alpha - 1)./gamma (alpha);
  elseif (X > 1),
    B = (e.^(-X))./gamma (alpha);
  end

% pdf of the gamma distribution.
% Could also use the built-in gamma_pdf (X, A, B) function supplied with Octave, setting B = 1.
function Y = fx_gamma_pdf (x, alpha)
  Y = (x.^(alpha - 1)).*(e.^(-x))./gamma (alpha);

Figure 4.23 shows the histogram of 1000 samples obtained using this function. The figure also shows the probability that the samples fall in the bins of the histogram.

FIGURE 4.23 1000 samples of gamma random variable using rejection method, with expected and empirical bin frequencies.

We have presented the most common methods that are used to generate random variables. These methods are incorporated in the functions provided by programs such as MATLAB and Octave, so in practice you do not need to write programs to generate the most common random variables. You simply need to invoke the appropriate functions.

Example 4.56 Generating Gamma Random Variables

Use Octave to obtain eight gamma random variables with $\alpha = 0.25$ and $\lambda = 1$. The Octave command and the corresponding answer are given below:

> gamma_rnd (0.25, 1, 1, 8)
ans =
Columns 1 through 6:
0.00021529  0.09331491  0.00013400  0.23384718  0.24606757  0.08665787
Columns 7 and 8:
1.72940941  1.29599702

4.9.3 Generation of Functions of a Random Variable

Once we have a simple method of generating a random variable $X$, we can easily generate any random variable that is defined by $Y = g(X)$, or even $Z = h(X_1, X_2, \ldots, X_n)$, where $X_1, \ldots, X_n$ are $n$ outputs of the random variable generator.

Example 4.57 m-Erlang Random Variable

Let $X_1, X_2, \ldots$ be independent, exponentially distributed random variables with parameter $\lambda$. In Chapter 7 we show that the random variable $Y = X_1 + X_2 + \cdots + X_m$ has an $m$-Erlang pdf with parameter $\lambda$.
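In code, the sum is immediate. The sketch below (ours, in Python rather than the text's Octave) draws each exponential by the transformation method and adds $m$ of them; with $\lambda = 1$ and $m = 3$ the sample mean should be near $m/\lambda = 3$.

```python
import math
import random

random.seed(3)
lam, m = 1.0, 3   # illustrative rate and number of stages

def erlang_rv():
    # sum of m independent Exp(lam) variables, each via Z = -(1/lam) ln(1 - U)
    return sum(-math.log(1.0 - random.random()) / lam for _ in range(m))

samples = [erlang_rv() for _ in range(10000)]
mean = sum(samples) / len(samples)   # the m-Erlang mean is m/lam = 3.0
```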
We can therefore generate an $m$-Erlang random variable by first generating $m$ exponentially distributed random variables using the transformation method, and then taking the sum. Since the $m$-Erlang random variable is a special case of the gamma random variable, for large $m$ it may be preferable to use the rejection method described in Problem 4.142.

4.9.4 Generating Mixtures of Random Variables

We have seen in previous sections that sometimes a random variable consists of a mixture of several random variables. In other words, the generation of the random variable can be viewed as first selecting a random variable type according to some pmf, and then generating a random variable from the selected pdf type. This procedure can be simulated easily.

Example 4.58 Hyperexponential Random Variable

A two-stage hyperexponential random variable has pdf

$$f_X(x) = pae^{-ax} + (1 - p)be^{-bx}.$$

It is clear from the above expression that $X$ consists of a mixture of two exponential random variables with parameters $a$ and $b$, respectively. $X$ can be generated by first performing a Bernoulli trial with probability of success $p$. If the outcome is a success, we then use the transformation method to generate an exponential random variable with parameter $a$. If the outcome is a failure, we generate an exponential random variable with parameter $b$ instead.

*4.10 ENTROPY

Entropy is a measure of the uncertainty in a random experiment. In this section, we first introduce the notion of the entropy of a random variable and develop several of its fundamental properties. We then show that entropy quantifies uncertainty by the amount of information required to specify the outcome of a random experiment. Finally, we discuss the method of maximum entropy, which has found wide use in characterizing random variables when only some parameters, such as the mean or variance, are known.

4.10.1 The Entropy of a Random Variable

Let $X$ be a discrete random variable with $S_X = \{1, 2, \ldots, K\}$ and pmf $p_k = P[X = k]$.
We are interested in quantifying the uncertainty of the event $A_k = \{X = k\}$. Clearly, the uncertainty of $A_k$ is low if the probability of $A_k$ is close to one, and it is high if this probability is small.

SUMMARY

• The cumulative distribution function $F_X(x)$ is the probability that $X$ falls in the interval $(-\infty, x]$. The probability of any event consisting of the union of intervals can be expressed in terms of the cdf.
• A random variable is continuous if its cdf can be written as the integral of a nonnegative function. A random variable is mixed if it is a mixture of a discrete and a continuous random variable.
• The probability of events involving a continuous random variable $X$ can be expressed as integrals of the probability density function $f_X(x)$.
• If $X$ is a random variable, then $Y = g(X)$ is also a random variable. The notion of equivalent events allows us to derive expressions for the cdf and pdf of $Y$ in terms of the cdf and pdf of $X$.
• The cdf and pdf of the random variable $X$ are sufficient to compute all probabilities involving $X$ alone. The mean, variance, and moments of a random variable summarize some of the information about the random variable $X$. These parameters are useful in practice because they are easier to measure and estimate than the cdf and pdf.
• Conditional cdf's or pdf's incorporate partial knowledge about the outcome of an experiment in the calculation of probabilities of events.
• The Markov and Chebyshev inequalities allow us to bound probabilities involving $X$ in terms of its first two moments only.
• Transforms provide an alternative but equivalent representation of the pmf and pdf. In certain types of problems it is preferable to work with the transforms rather than the pmf or pdf. The moments of a random variable can be obtained from the corresponding transform.
• The reliability of a system is the probability that it is still functioning after $t$ hours of operation.
The reliability of a system can be determined from the reliability of its subsystems.
• There are a number of methods for generating random variables with prescribed pmf's or pdf's in terms of a random variable that is uniformly distributed in the unit interval. These methods include the transformation and the rejection methods, as well as methods that simulate random experiments (e.g., functions of random variables) and mixtures of random variables.
• The entropy of a random variable $X$ is a measure of the uncertainty of $X$ in terms of the average amount of information required to identify its value.
• The maximum entropy method is a procedure for estimating the pmf or pdf of a random variable when only partial information about $X$, in the form of expected values of functions of $X$, is available.

CHECKLIST OF IMPORTANT TERMS

Characteristic function
Chebyshev inequality
Chernoff bound
Conditional cdf, pdf
Continuous random variable
Cumulative distribution function
Differential entropy
Discrete random variable
Entropy
Equivalent event
Expected value of X
Failure rate function
Function of a random variable
Laplace transform of the pdf
Markov inequality
Maximum entropy method
Mean time to failure (MTTF)
Moment theorem
nth moment of X
Probability density function
Probability generating function
Probability mass function
Random variable
Random variable of mixed type
Rejection method
Reliability
Standard deviation of X
Transformation method
Variance of X

ANNOTATED REFERENCES

Reference [1] is the standard reference for electrical engineers for the material on random variables. Reference [2] is entirely devoted to continuous distributions. Reference [3] discusses some of the finer points regarding the concept of a random variable at a level accessible to students of this course. Reference [4] presents detailed discussions of the various methods for generating random numbers with specified distributions.
Reference [5] also discusses the generation of random variables. Reference [9] is focused on signal processing. Reference [11] discusses entropy in the context of information theory.

1. A. Papoulis and S. Pillai, Probability, Random Variables, and Stochastic Processes, McGraw-Hill, New York, 2002.
2. N. Johnson et al., Continuous Univariate Distributions, vol. 2, Wiley, New York, 1995.
3. K. L. Chung, Elementary Probability Theory, Springer-Verlag, New York, 1974.
4. A. M. Law and W. D. Kelton, Simulation Modeling and Analysis, McGraw-Hill, New York, 2000.
5. S. M. Ross, Introduction to Probability Models, Academic Press, New York, 2003.
6. H. Cramer, Mathematical Methods of Statistics, Princeton University Press, Princeton, N.J., 1946.
7. M. Abramowitz and I. Stegun, Handbook of Mathematical Functions, National Bureau of Standards, Washington, D.C., 1964. Downloadable: www.math.sfu.ca/~cbm/aands/.
8. R. C. Cheng, "The Generation of Gamma Variables with Nonintegral Shape Parameter," Appl. Statist., 26: 71–75, 1977.
9. R. Gray and L. D. Davisson, An Introduction to Statistical Signal Processing, Cambridge Univ. Press, Cambridge, UK, 2005.
10. P. O. Börjesson and C. E. W. Sundberg, "Simple Approximations of the Error Function Q(x) for Communications Applications," IEEE Trans. on Communications, March 1979, 639–643.
11. R. G. Gallager, Information Theory and Reliable Communication, Wiley, New York, 1968.

PROBLEMS

Section 4.1: The Cumulative Distribution Function

4.1. An information source produces binary pairs that we designate as $S_X = \{1, 2, 3, 4\}$ with the following pmf's:
(i) $p_k = p_1/k$ for all $k$ in $S_X$.
(ii) $p_{k+1} = p_k/2$ for $k = 1, 2, 3$.
(iii) $p_{k+1} = p_k/2^k$ for $k = 1, 2, 3$.
(a) Plot the cdf of these three random variables.
(b) Use the cdf to find the probability of the events: $\{X \le 1\}$, $\{X < 2.5\}$, $\{0.5 < X \le 2\}$, $\{1 < X < 4\}$.

4.2. A die is tossed.
Let $X$ be the number of full pairs of dots in the face showing up, and $Y$ be the number of full or partial pairs of dots in the face showing up. Find and plot the cdf of $X$ and $Y$.

4.3. The loose minute hand of a clock is spun hard. The coordinates $(x, y)$ of the point where the tip of the hand comes to rest are noted. $Z$ is defined as the sgn function of the product of $x$ and $y$, where sgn($t$) is 1 if $t > 0$, 0 if $t = 0$, and $-1$ if $t < 0$.
(a) Find and plot the cdf of the random variable $Z$.
(b) Does the cdf change if the clock hand has a propensity to stop at 3, 6, 9, and 12 o'clock?

4.4. An urn contains 8 \$1 bills and two \$5 bills. Let $X$ be the total amount that results when two bills are drawn from the urn without replacement, and let $Y$ be the total amount that results when two bills are drawn from the urn with replacement.
(a) Plot and compare the cdf's of the random variables.
(b) Use the cdf to compare the probabilities of the following events in the two problems: $\{X = \$2\}$, $\{X < \$7\}$, $\{X \ge 6\}$.

4.5. Let $Y$ be the difference between the number of heads and the number of tails in the 3 tosses of a fair coin.
(a) Plot the cdf of the random variable $Y$.
(b) Express $P[|Y| < y]$ in terms of the cdf of $Y$.

4.6. A dart is equally likely to land at any point inside a circular target of radius 2. Let $R$ be the distance of the landing point from the origin.
(a) Find the sample space $S$ and the sample space of $R$, $S_R$.
(b) Show the mapping from $S$ to $S_R$.
(c) The "bull's eye" is the central disk in the target of radius 0.25. Find the event $A$ in $S_R$ corresponding to "dart hits the bull's eye." Find the equivalent event in $S$ and $P[A]$.
(d) Find and plot the cdf of $R$.

4.7. A point is selected at random inside a square defined by $\{(x, y): 0 \le x \le b, 0 \le y \le b\}$. Assume the point is equally likely to fall anywhere in the square. Let the random variable $Z$ be given by the minimum of the two coordinates of the point where the dart lands.
(a) Find the sample space $S$ and the sample space of $Z$, $S_Z$.
(b) Show the mapping from $S$ to $S_Z$.
(c) Find the region in the square corresponding to the event $\{Z \le z\}$.
(d) Find and plot the cdf of $Z$.
(e) Use the cdf to find: $P[Z > 0]$, $P[Z > b]$, $P[Z \le b/2]$, $P[Z > b/4]$.

4.8. Let $z$ be a point selected at random from the unit interval. Consider the random variable $X = (1 - z)^{-1/2}$.
(a) Sketch $X$ as a function of $z$.
(b) Find and plot the cdf of $X$.
(c) Find the probability of the events $\{X > 1\}$, $\{5 < X < 7\}$, $\{X \le 20\}$.

4.9. The loose hand of a clock is spun hard and the outcome $z$ is the angle in the range $[0, 2\pi)$ where the hand comes to rest. Consider the random variable $X(z) = 2\sin(z/4)$.
(a) Sketch $X$ as a function of $z$.
(b) Find and plot the cdf of $X$.
(c) Find the probability of the events $\{X > 1\}$, $\{-1/2 < X < 1/2\}$, $\{X \le 1/12\}$.

4.10. Repeat Problem 4.9 if 80% of the time the hand comes to rest anywhere in the circle, but 20% of the time the hand comes to rest at 3, 6, 9, or 12 o'clock.

4.11. The random variable $X$ is uniformly distributed in the interval $[-1, 2]$.
(a) Find and plot the cdf of $X$.
(b) Use the cdf to find the probabilities of the following events: $\{X \le 0\}$, $\{|X - 0.5| < 1\}$, and $C = \{X > -0.5\}$.

4.12. The cdf of the random variable $X$ is given by:

$$F_X(x) = \begin{cases} 0 & x < -1 \\ 0.5 & -1 \le x < 0 \\ (1 + x)/2 & 0 \le x < 1 \\ 1 & x \ge 1. \end{cases}$$

(a) Plot the cdf and identify the type of random variable.
(b) Find $P[X \le -1]$, $P[X = -1]$, $P[X < 0.5]$, $P[-0.5 < X < 0.5]$, $P[X > -1]$, $P[X \le 2]$, $P[X > 3]$.

4.13. A random variable $X$ has cdf:

$$F_X(x) = \begin{cases} 0 & \text{for } x < 0 \\ 1 - \tfrac{1}{4}e^{-2x} & \text{for } x \ge 0. \end{cases}$$

(a) Plot the cdf and identify the type of random variable.
(b) Find $P[X \le 2]$, $P[X = 0]$, $P[X < 0]$, $P[2 < X < 6]$, $P[X > 10]$.

4.14. The random variable $X$ has the cdf shown in Fig. P4.1.
(a) What type of random variable is $X$?
(b) Find the following probabilities: $P[X < -1]$, $P[X \le -1]$, $P[-1 < X < -0.75]$, $P[-0.5 \le X < 0]$, $P[-0.5 \le X \le 0.5]$, $P[|X - 0.5| < 0.5]$.

4.15. For $\beta > 0$ and $\lambda > 0$, the Weibull random variable $Y$ has cdf:

$$F_X(x) = \begin{cases} 0 & \text{for } x < 0 \\ 1 - e^{-(x/\lambda)^{\beta}} & \text{for } x \ge 0. \end{cases}$$
FIGURE P4.1

(a) Plot the cdf of $Y$ for $\beta = 0.5$, 1, and 2.
(b) Find the probability $P[j\lambda < X < (j + 1)\lambda]$ and $P[X > j\lambda]$.
(c) Plot $\log P[X > x]$ vs. $\log x$.

4.16. The random variable $X$ has cdf:

$$F_X(x) = \begin{cases} 0 & x < 0 \\ 0.5 + c\sin^2(\pi x/2) & 0 \le x \le 1 \\ 1 & x > 1. \end{cases}$$

(a) What values can $c$ assume?
(b) Plot the cdf.
(c) Find $P[X > 0]$.

Section 4.2: The Probability Density Function

4.17. A random variable $X$ has pdf:

$$f_X(x) = \begin{cases} c(1 - x^2) & -1 \le x \le 1 \\ 0 & \text{elsewhere.} \end{cases}$$

(a) Find $c$ and plot the pdf.
(b) Plot the cdf of $X$.
(c) Find $P[X = 0]$, $P[0 < X < 0.5]$, and $P[|X - 0.5| < 0.25]$.

4.18. A random variable $X$ has pdf:

$$f_X(x) = \begin{cases} cx(1 - x^2) & 0 \le x \le 1 \\ 0 & \text{elsewhere.} \end{cases}$$

(a) Find $c$ and plot the pdf.
(b) Plot the cdf of $X$.
(c) Find $P[0 < X < 0.5]$, $P[X = 1]$, $P[0.25 < X < 0.5]$.

4.19. (a) In Problem 4.6, find and plot the pdf of the random variable $R$, the distance from the dart to the center of the target.
(b) Use the pdf to find the probability that the dart is outside the bull's eye.

4.20. (a) Find and plot the pdf of the random variable $Z$ in Problem 4.7.
(b) Use the pdf to find the probability that the minimum is greater than $b/3$.

4.21. (a) Find and plot the pdf in Problem 4.8.
(b) Use the pdf to find the probabilities of the events: $\{X > a\}$ and $\{X > 2a\}$.

4.22. (a) Find and plot the pdf in Problem 4.12.
(b) Use the pdf to find $P[-1 \le X < 0.25]$.

4.23. (a) Find and plot the pdf in Problem 4.13.
(b) Use the pdf to find $P[X = 0]$, $P[X > 8]$.

4.24. (a) Find and plot the pdf of the random variable in Problem 4.14.
(b) Use the pdf to calculate the probabilities in Problem 4.14b.

4.25. Find and plot the pdf of the Weibull random variable in Problem 4.15a.

4.26. Find the cdf of the Cauchy random variable, which has pdf:

$$f_X(x) = \frac{\alpha/\pi}{x^2 + \alpha^2} \qquad -\infty < x < \infty.$$

4.27. A voltage $X$ is uniformly distributed in the set $\{-3, -2, \ldots, 3, 4\}$.
(a) Find the pdf and cdf of the random variable $X$.
(b) Find the pdf and cdf of the random variable $Y = -2X^2 + 3$.
(c) Find the pdf and cdf of the random variable $W = \cos(\pi X/8)$.
(d) Find the pdf and cdf of the random variable $Z = \cos^2(\pi X/8)$.

4.28. Find the pdf and cdf of the Zipf random variable in Problem 3.70.

4.29. Let $C$ be an event for which $P[C] > 0$. Show that $F_X(x \mid C)$ satisfies the eight properties of a cdf.

4.30. (a) In Problem 4.13, find $F_X(x \mid C)$ where $C = \{X > 0\}$.
(b) Find $F_X(x \mid C)$ where $C = \{X = 0\}$.

4.31. (a) In Problem 4.10, find $F_X(x \mid B)$ where $B = \{$hand does not stop at 3, 6, 9, or 12 o'clock$\}$.
(b) Find $F_X(x \mid B^c)$.

4.32. In Problem 4.13, find $f_X(x \mid B)$ and $F_X(x \mid B)$ where $B = \{X > 0.25\}$.

4.33. Let $X$ be the exponential random variable.
(a) Find and plot $F_X(x \mid X > t)$. How does $F_X(x \mid X > t)$ differ from $F_X(x)$?
(b) Find and plot $f_X(x \mid X > t)$.
(c) Show that $P[X > t + x \mid X > t] = P[X > x]$. Explain why this is called the memoryless property.

4.34. The Pareto random variable $X$ has cdf:

$$F_X(x) = \begin{cases} 0 & x < x_m \\ 1 - \dfrac{x_m^{\alpha}}{x^{\alpha}} & x \ge x_m. \end{cases}$$

(a) Find and plot the pdf of $X$.
(b) Repeat Problem 4.33 parts a and b for the Pareto random variable.
(c) What happens to $P[X > t + x \mid X > t]$ as $t$ becomes large? Interpret this result.

4.35. (a) Find and plot $F_X(x \mid a \le X \le b)$. Compare $F_X(x \mid a \le X \le b)$ to $F_X(x)$.
(b) Find and plot $f_X(x \mid a \le X \le b)$.

4.36. In Problem 4.6, find $F_R(r \mid R > 1)$ and $f_R(r \mid R > 1)$.

4.37. (a) In Problem 4.7, find $F_Z(z \mid b/4 \le Z \le b/2)$ and $f_Z(z \mid b/4 \le Z \le b/2)$.
(b) Find $F_Z(z \mid B)$ and $f_Z(z \mid B)$, where $B = \{Z > b/2\}$.

4.38. A binary transmission system sends a "0" bit using a $-1$ voltage signal and a "1" bit by transmitting a $+1$. The received signal is corrupted by noise $N$ that has a Laplacian distribution with parameter $\alpha$. Assume that "0" bits and "1" bits are equiprobable.
(a) Find the pdf of the received signal $Y = X + N$, where $X$ is the transmitted signal, given that a "0" was transmitted; that a "1" was transmitted.
(b) Suppose that the receiver decides a "0" was sent if $Y < 0$, and a "1" was sent if $Y \ge 0$.
What is the probability that the receiver makes an error given that a $+1$ was transmitted? a $-1$ was transmitted?
(c) What is the overall probability of error?

Section 4.3: The Expected Value of X

4.39. Find the mean and variance of $X$ in Problem 4.17.
4.40. Find the mean and variance of $X$ in Problem 4.18.
4.41. Find the mean and variance of $Y$, the distance from the dart to the origin, in Problem 4.19.
4.42. Find the mean and variance of $Z$, the minimum of the coordinates in a square, in Problem 4.20.
4.43. Find the mean and variance of $X = (1 - z)^{-1/2}$ in Problem 4.21. Find $E[X]$ using Eq. (4.28).
4.44. Find the mean and variance of $X$ in Problems 4.12 and 4.22.
4.45. Find the mean and variance of $X$ in Problems 4.13 and 4.23. Find $E[X]$ using Eq. (4.28).
4.46. Find the mean and variance of the Gaussian random variable by direct integration of Eqs. (4.27) and (4.34).
4.47. Prove Eqs. (4.28) and (4.29).
4.48. Find the variance of the exponential random variable.
4.49. (a) Show that the mean of the Weibull random variable in Problem 4.15 is $\lambda\Gamma(1 + 1/\beta)$, where $\Gamma(x)$ is the gamma function defined in Eq. (4.56).
(b) Find the second moment and the variance of the Weibull random variable.
4.50. Explain why the mean of the Cauchy random variable does not exist.
4.51. Show that $E[X]$ does not exist for the Pareto random variable with $\alpha = 1$ and $x_m = 1$.
4.52. Verify Eqs. (4.36), (4.37), and (4.38).
4.53. Let $Y = A\cos(\omega t) + c$ where $A$ has mean $m$ and variance $\sigma^2$, and $\omega$ and $c$ are constants. Find the mean and variance of $Y$. Compare the results to those obtained in Example 4.15.
4.54. A limiter is shown in Fig. P4.2.

FIGURE P4.2
(a) Find an expression for the mean and variance of $Y = g(X)$ for an arbitrary continuous random variable $X$.
(b) Evaluate the mean and variance if $X$ is a Laplacian random variable with $\lambda = a = 1$.
(c) Repeat part (b) if $X$ is from Problem 4.17 with $a = 1/2$.
(d) Evaluate the mean and variance if $X = U^3$ where $U$ is a uniform random variable in the interval $[-1, 1]$ and $a = 1/2$.

4.55. A limiter with center-level clipping is shown in Fig. P4.3.

FIGURE P4.3

(a) Find an expression for the mean and variance of $Y = g(X)$ for an arbitrary continuous random variable $X$.
(b) Evaluate the mean and variance if $X$ is Laplacian with $\lambda = a = 1$ and $b = 2$.
(c) Repeat part (b) if $X$ is from Problem 4.22, $a = 1/2$, $b = 3/2$.
(d) Evaluate the mean and variance if $X = b\cos(2\pi U)$ where $U$ is a uniform random variable in the interval $[-1, 1]$ and $a = 3/4$, $b = 1/2$.

4.56. Let $Y = 3X + 2$.
(a) Find the mean and variance of $Y$ in terms of the mean and variance of $X$.
(b) Evaluate the mean and variance of $Y$ if $X$ is Laplacian.
(c) Evaluate the mean and variance of $Y$ if $X$ is an arbitrary Gaussian random variable.
(d) Evaluate the mean and variance of $Y$ if $X = b\cos(2\pi U)$ where $U$ is a uniform random variable in the unit interval.

4.57. Find the $n$th moment of $U$, the uniform random variable in the unit interval. Repeat for $X$ uniform in $[a, b]$.

4.58. Consider the quantizer in Example 4.20.
(a) Find the conditional pdf of $X$ given that $X$ is in the interval $(d, 2d)$.
(b) Find the conditional expected value and conditional variance of $X$ given that $X$ is in the interval $(d, 2d)$.
(c) Now suppose that when $X$ falls in $(d, 2d)$, it is mapped onto the point $c$ where $d < c < 2d$. Find an expression for the expected value of the mean square error: $E[(X - c)^2 \mid d < X < 2d]$.
(d) Find the value $c$ that minimizes the above mean square error. Is $c$ the midpoint of the interval? Explain why or why not by sketching possible conditional pdf shapes.
(e) Find an expression for the overall mean square error using the approach in parts c and d.

Section 4.4: Important Continuous Random Variables

4.59. Let $X$ be a uniform random variable in the interval $[-2, 2]$. Find and plot $P[|X| > x]$.

4.60.
In Example 4.20, let the input to the quantizer be a uniform random variable in the interval [−4d, 4d]. Show that Z = X − Q(X) is uniformly distributed in [−d/2, d/2].
4.61. Let X be an exponential random variable with parameter λ.
(a) For d > 0 and k a nonnegative integer, find P[kd < X < (k + 1)d].
(b) Segment the positive real line into four equiprobable disjoint intervals.
4.62. The rth percentile, π(r), of a random variable X is defined by P[X ≤ π(r)] = r/100.
(a) Find the 90%, 95%, and 99% percentiles of the exponential random variable with parameter λ.
(b) Repeat part (a) for the Gaussian random variable with parameters m = 0 and σ².
4.63. Let X be a Gaussian random variable with m = 5 and σ² = 16.
(a) Find P[X > 4], P[X ≥ 7], P[6.72 < X < 10.16], P[2 < X < 7], P[6 ≤ X ≤ 8].
(b) P[X < a] = 0.8869; find a.
(c) P[X > b] = 0.11131; find b.
(d) P[13 < X ≤ c] = 0.0123; find c.
4.64. Show that the Q-function for the Gaussian random variable satisfies Q(−x) = 1 − Q(x).
4.65. Use Octave to generate Tables 4.2 and 4.3.
4.66. Let X be a Gaussian random variable with mean m and variance σ².
(a) Find P[X ≤ m].
(b) Find P[|X − m| < kσ] for k = 1, 2, 3, 4, 5, 6.
(c) Find the value of k for which Q(k) = P[X > m + kσ] = 10^(−j) for j = 1, 2, 3, 4, 5, 6.
4.67. A binary transmission system transmits a signal X (−1 to send a “0” bit; +1 to send a “1” bit). The received signal is Y = X + N, where the noise N has a zero-mean Gaussian distribution with variance σ². Assume that “0” bits are three times as likely as “1” bits.
(a) Find the conditional pdf of Y given the input value: fY(y | X = +1) and fY(y | X = −1).
(b) The receiver decides a “0” was transmitted if the observed value of y satisfies

fY(y | X = −1)P[X = −1] > fY(y | X = +1)P[X = +1]

and it decides a “1” was transmitted otherwise. Use the results from part (a) to show that this decision rule is equivalent to: if y < T, decide “0”; if y ≥ T, decide “1”.
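The threshold form of the decision rule in part (b) of Problem 4.67 can be checked numerically. A minimal sketch (assuming the value T = σ² ln 3 / 2, which follows from taking logarithms of the likelihood inequality with P[X = −1] = 3/4 and P[X = +1] = 1/4):

```python
from math import exp, log, pi, sqrt

def gauss_pdf(y, mean, var):
    """Gaussian pdf with the given mean and variance."""
    return exp(-(y - mean) ** 2 / (2 * var)) / sqrt(2 * pi * var)

var = 1.0 / 16               # noise variance (part (c) uses sigma^2 = 1/16)
T = var * log(3) / 2         # candidate MAP threshold for P["0"] = 3/4

# The MAP comparison and the threshold rule should agree for any observed y.
for y in [k / 100 for k in range(-200, 201)]:
    map_says_zero = gauss_pdf(y, -1, var) * 0.75 > gauss_pdf(y, +1, var) * 0.25
    assert map_says_zero == (y < T)
```

Note that the prior weighting 3:1 toward “0” pushes the threshold slightly above the midpoint 0 of the two signal levels.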
(c) What is the probability that the receiver makes an error given that a +1 was transmitted? a −1 was transmitted? Assume σ² = 1/16.
(d) What is the overall probability of error?
4.68. Two chips are being considered for use in a certain system. The lifetime of chip 1 is modeled by a Gaussian random variable with mean 20,000 hours and standard deviation 5000 hours. (The probability of negative lifetime is negligible.) The lifetime of chip 2 is also a Gaussian random variable, but with mean 22,000 hours and standard deviation 1000 hours. Which chip is preferred if the target lifetime of the system is 20,000 hours? 24,000 hours?
4.69. Passengers arrive at a taxi stand at an airport at a rate of one passenger per minute. The taxi driver will not leave until seven passengers arrive to fill his van. Suppose that passenger interarrival times are exponential random variables, and let X be the time to fill a van. Find the probability that more than 10 minutes elapse until the van is full.
4.70. (a) Show that the gamma random variable has mean E[X] = α/λ.
(b) Show that the gamma random variable has second moment and variance given by E[X²] = α(α + 1)/λ² and VAR[X] = α/λ².
(c) Use parts (a) and (b) to obtain the mean and variance of an m-Erlang random variable.
(d) Use parts (a) and (b) to obtain the mean and variance of a chi-square random variable.
4.71. The time X to complete a transaction in a system is a gamma random variable with mean 4 and variance 8. Use Octave to plot P[X > x] as a function of x. Note: Octave uses b = 1/2.
4.72. (a) Plot the pdf of an m-Erlang random variable for m = 1, 2, 3 and λ = 1.
(b) Plot the chi-square pdf for k = 1, 2, 3.
4.73. A repair person keeps four widgets in stock.
What is the probability that the widgets in stock will last 15 days if the repair person needs to replace widgets at an average rate of one widget every three days, where the time between widget failures is an exponential random variable?
4.74. (a) Find the cdf of the m-Erlang random variable by integration of the pdf. Hint: Use integration by parts.
(b) Show that the derivative of the cdf given by Eq. (4.58) gives the pdf of an m-Erlang random variable.
4.75. Plot the pdf of a beta random variable with: a = b = 1/4, 1, 4, 8; a = 5, b = 1; a = 1, b = 3; a = 2, b = 5.

Section 4.5: Functions of a Random Variable

4.76. Let X be a Gaussian random variable with mean 2 and variance 4. The reward in a system is given by Y = (X)⁺. Find the pdf of Y.
4.77. The amplitude of a radio signal X is a Rayleigh random variable with pdf

fX(x) = (x/α²) e^(−x²/2α²),  x > 0, α > 0.

(a) Find the pdf of Z = (X − r)⁺.
(b) Find the pdf of Z = X².
4.78. A wire has length X, an exponential random variable with mean 5π cm. The wire is cut to make rings of diameter 1 cm. Find the probability for the number of complete rings produced by each length of wire.
4.79. A signal that has amplitudes with a Gaussian pdf with zero mean and unit variance is applied to the quantizer in Example 4.27.
(a) Pick d so that the probability that X falls outside the range of the quantizer is 1%.
(b) Find the probability of the output levels of the quantizer.
4.80. The signal X is amplified and shifted as follows: Y = 2X + 3, where X is the random variable in Problem 4.12. Find the cdf and pdf of Y.
4.81. The net profit in a transaction is given by Y = 2 − 4X, where X is the random variable in Problem 4.13. Find the cdf and pdf of Y.
4.82. Find the cdf and pdf of the output of the limiter in Problem 4.54, parts (b), (c), and (d).
4.83. Find the cdf and pdf of the output of the limiter with center-level clipping in Problem 4.55, parts (b), (c), and (d).
4.84. Find the cdf and pdf of Y = 3X + 2 in Problem 4.56, parts (b), (c), and (d).
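Problem 4.78 has a clean closed form worth checking numerically: if each 1 cm-diameter ring consumes π cm of wire (an assumption for this sketch), the number of complete rings N satisfies P[N = k] = P[kπ ≤ X < (k + 1)π], and because X is exponential the pmf is geometric:

```python
from math import exp, pi

MEAN_LEN = 5 * pi      # E[X] for the wire length (Problem 4.78)
RING_LEN = pi          # wire consumed per ring: circumference of a 1 cm-diameter ring

def p_rings(k):
    """P[N = k]: the wire length falls in [k*RING_LEN, (k+1)*RING_LEN)."""
    lam = 1.0 / MEAN_LEN
    return exp(-lam * k * RING_LEN) - exp(-lam * (k + 1) * RING_LEN)

# The pi's cancel in the exponent, leaving a geometric pmf:
# p_rings(k) = e^(-k/5) * (1 - e^(-1/5))
```

This illustrates the general recipe for these problems: express the event {g(X) = k} as an equivalent interval event for X and evaluate the cdf difference.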
4.85. The exam grades in a certain class have a Gaussian pdf with mean m and standard deviation σ. Find the constants a and b so that the random variable Y = aX + b has a Gaussian pdf with mean m′ and standard deviation σ′.
4.86. Let X = U^n, where n is a positive integer and U is a uniform random variable in the unit interval. Find the cdf and pdf of X.
4.87. Repeat Problem 4.86 if U is uniform in the interval [−1, 1].
4.88. Let Y = |X| be the output of a full-wave rectifier with input voltage X.
(a) Find the cdf of Y by finding the equivalent event of {Y ≤ y}. Find the pdf of Y by differentiation of the cdf.
(b) Find the pdf of Y by finding the equivalent event of {y < Y ≤ y + dy}. Does the answer agree with part (a)?
(c) What is the pdf of Y if fX(x) is an even function of x?
4.89. Find and plot the cdf of Y in Example 4.34.
4.90. A voltage X is a Gaussian random variable with mean 1 and variance 2. Find the pdf of the power dissipated by an R-ohm resistor, P = RX².
4.91. Let Y = e^X.
(a) Find the cdf and pdf of Y in terms of the cdf and pdf of X.
(b) Find the pdf of Y when X is a Gaussian random variable. In this case Y is said to be a lognormal random variable. Plot the pdf and cdf of Y when X is zero-mean with variance 1/8; repeat with variance 8.
4.92. Let a radius be given by the random variable X in Problem 4.18.
(a) Find the pdf of the area covered by a disc with radius X.
(b) Find the pdf of the volume of a sphere with radius X.
(c) Find the pdf of the volume of a sphere in R^n:

Y = (2π)^(n/2) X^n/(2 · 4 · … · n)  for n even;  Y = 2(2π)^((n−1)/2) X^n/(1 · 3 · … · n)  for n odd.

4.93. In the quantizer in Example 4.20, let Z = X − q(X). Find the pdf of Z if X is a Laplacian random variable with parameter α = d/2.
4.94. Let Y = a tan πX, where X is uniformly distributed in the interval (−1, 1).
(a) Show that Y is a Cauchy random variable.
(b) Find the pdf of Y = 1/X.
4.95. Let X be a Weibull random variable in Problem 4.15.
Let Y = (X/λ)^β. Find the cdf and pdf of Y.
4.96. Find the pdf of X = −ln(1 − U), where U is a uniform random variable in (0, 1).

Section 4.6: The Markov and Chebyshev Inequalities

4.97. Compare the Markov inequality and the exact probability for the event {X > c} as a function of c for:
(a) X is a uniform random variable in the interval [0, b].
(b) X is an exponential random variable with parameter λ.
(c) X is a Pareto random variable with α > 1.
(d) X is a Rayleigh random variable.
4.98. Compare the Markov inequality and the exact probability for the event {X > c} as a function of c for:
(a) X is a uniform random variable in {1, 2, ..., L}.
(b) X is a geometric random variable.
(c) X is a Zipf random variable with L = 10; L = 100.
(d) X is a binomial random variable with n = 10, p = 0.5; n = 50, p = 0.5.
4.99. Compare the Chebyshev inequality and the exact probability for the event {|X − m| > c} as a function of c for:
(a) X is a uniform random variable in the interval [−b, b].
(b) X is a Laplacian random variable with parameter α.
(c) X is a zero-mean Gaussian random variable.
(d) X is a binomial random variable with n = 10, p = 0.5; n = 50, p = 0.5.
4.100. Let X be the number of successes in n Bernoulli trials where the probability of success is p. Let Y = X/n be the average number of successes per trial. Apply the Chebyshev inequality to the event {|Y − p| > a}. What happens as n → ∞?
4.101. Suppose that light bulbs have exponentially distributed lifetimes with unknown mean E[X]. Suppose we measure the lifetime of n light bulbs, and we estimate the mean E[X] by the arithmetic average Y of the measurements. Apply the Chebyshev inequality to the event {|Y − E[X]| > a}. What happens as n → ∞? Hint: Use the m-Erlang random variable.

Section 4.7: Transform Methods

4.102. (a) Find the characteristic function of the uniform random variable in [−b, b].
(b) Find the mean and variance of X by applying the moment theorem.
4.103.
(a) Find the characteristic function of the Laplacian random variable.
(b) Find the mean and variance of X by applying the moment theorem.
4.104. Let Φ_X(ω) be the characteristic function of an exponential random variable. What random variable does Φ_X(ω)^n correspond to?
4.105. Find the mean and variance of the Gaussian random variable by applying the moment theorem to the characteristic function given in Table 4.1.
4.106. Find the characteristic function of Y = aX + b, where X is a Gaussian random variable. Hint: Use Eq. (4.79).
4.107. Show that the characteristic function for the Cauchy random variable is e^(−|ω|).
4.108. Find the Chernoff bound for the exponential random variable with λ = 1. Compare the bound to the exact value for P[X > 5].
4.109. (a) Find the probability generating function of the geometric random variable.
(b) Find the mean and variance of the geometric random variable from its pgf.
4.110. (a) Find the pgf for the binomial random variable X with parameters n and p.
(b) Find the mean and variance of X from the pgf.
4.111. Let G_X(z) be the pgf for a binomial random variable with parameters n and p, and let G_Y(z) be the pgf for a binomial random variable with parameters m and p. Consider the function G_X(z)G_Y(z). Is this a valid pgf? If so, to what random variable does it correspond?
4.112. Let G_N(z) be the pgf for a Poisson random variable with parameter α, and let G_M(z) be the pgf for a Poisson random variable with parameter β. Consider the function G_N(z)G_M(z). Is this a valid pgf? If so, to what random variable does it correspond?
4.113. Let N be a Poisson random variable with parameter α = 1. Compare the Chernoff bound and the exact value for P[N ≥ 5].
4.114. (a) Find the pgf G_U(z) for the discrete uniform random variable U.
(b) Find the mean and variance from the pgf.
(c) Consider G_U(z)². Does this function correspond to a pgf? If so, find the mean of the corresponding random variable.
4.115.
(a) Find P[X = r] for the negative binomial random variable from the pgf in Table 3.1.
(b) Find the mean of X.
4.116. Derive Eq. (4.89).
4.117. Obtain the nth moment of a gamma random variable from the Laplace transform of its pdf.
4.118. Let X be the mixture of two exponential random variables (see Example 4.58). Find the Laplace transform of the pdf of X.
4.119. The Laplace transform of the pdf of a random variable X is given by:

X*(s) = ab/(s² + as + b).

Find the pdf of X. Hint: Use a partial fraction expansion of X*(s).
4.120. Find a relationship between the Laplace transform of a gamma random variable pdf with parameters α and λ and the Laplace transform of a gamma random variable pdf with parameters α − 1 and λ. What does this imply if X is an m-Erlang random variable?
4.121. (a) Find the Chernoff bound for P[X > t] for the gamma random variable.
(b) Compare the bound to the exact value of P[X ≥ 9] for an m = 3, λ = 1 Erlang random variable.

Section 4.8: Basic Reliability Calculations

4.122. The lifetime T of a device has pdf

fT(t) = { 1/(10 T0) for 0 < t < T0;  0.9 λ e^(−λ(t − T0)) for t ≥ T0;  0 for t < 0 }.

(a) Find the reliability and MTTF of the device.
(b) Find the failure rate function.
(c) How many hours of operation can be considered to achieve 99% reliability?
4.123. The lifetime T of a device has pdf

fT(t) = { 1/T0 for a ≤ t ≤ a + T0;  0 elsewhere }.

(a) Find the reliability and MTTF of the device.
(b) Find the failure rate function.
(c) How many hours of operation can be considered to achieve 99% reliability?
4.124. The lifetime T of a device is a Rayleigh random variable.
(a) Find the reliability of the device.
(b) Find the failure rate function. Does r(t) increase with time?
(c) Find the reliability of two devices that are in series.
(d) Find the reliability of two devices that are in parallel.
4.125. The lifetime T of a device is a Weibull random variable.
(a) Plot the failure rates for a = 1 and b = 0.5; for a = 1 and b = 2.
(b) Plot the reliability functions in part (a).
(c) Plot the reliability of two devices that are in series.
(d) Plot the reliability of two devices that are in parallel.
4.126. A system starts with m devices, 1 active and m − 1 on standby. Each device has an exponential lifetime. When a device fails it is immediately replaced with another device (if one is still available).
(a) Find the reliability of the system.
(b) Find the failure rate function.
4.127. Find the failure rate function of the memory chips discussed in Example 2.28. Plot ln(r(t)) versus at.
4.128. A device comes from two sources. Devices from source 1 have mean m and exponentially distributed lifetimes. Devices from source 2 have mean m and Pareto-distributed lifetimes with α > 1. Assume a fraction p is from source 1 and a fraction 1 − p from source 2.
(a) Find the reliability of an arbitrarily selected device.
(b) Find the failure rate function.
4.129. A device has the failure rate function

r(t) = { 1 + 9(1 − t) for 0 ≤ t < 1;  1 for 1 ≤ t < 10;  1 + 10(t − 10) for t ≥ 10 }.

Find the reliability function and the pdf of the device.
4.130. A system has three identical components and the system is functioning if two or more components are functioning.
(a) Find the reliability and MTTF of the system if the component lifetimes are exponential random variables with mean 1.
(b) Find the reliability of the system if one of the components has mean 2.
4.131. Repeat Problem 4.130 if the component lifetimes are Weibull distributed with β = 3.
4.132. A system consists of two processors and three peripheral units. The system is functioning as long as one processor and two peripherals are functioning.
(a) Find the system reliability and MTTF if the processor lifetimes are exponential random variables with mean 5 and the peripheral lifetimes are Rayleigh random variables with mean 10.
(b) Find the system reliability and MTTF if the processor lifetimes are exponential random variables with mean 10 and the peripheral lifetimes are exponential random variables with mean 5.
4.133. An operation is carried out by a subsystem consisting of three units that operate in a series configuration.
(a) The units have exponentially distributed lifetimes with mean 1. How many subsystems should be operated in parallel to achieve a reliability of 99% in T hours of operation?
(b) Repeat part (a) with Rayleigh-distributed lifetimes.
(c) Repeat part (a) with Weibull-distributed lifetimes with β = 3.

Section 4.9: Computer Methods for Generating Random Variables

4.134. Octave provides function calls to evaluate the pdf and cdf of important continuous random variables. For example, the functions normal_cdf(x, m, var) and normal_pdf(x, m, var) compute the cdf and pdf, respectively, at x for a Gaussian random variable with mean m and variance var.
(a) Plot the conditional pdfs in Example 4.11 if v = ±2 and the noise is zero-mean and unit variance.
(b) Compare the cdf of the Gaussian random variable with the Chernoff bound obtained in Example 4.44.
4.135. Plot the pdf and cdf of the gamma random variable for the following cases.
(a) λ = 1 and α = 1, 2, 4.
(b) λ = 1/2 and α = 1/2, 1, 3/2, 5/2.
4.136. The random variable X has the triangular pdf shown in Fig. P4.4.
(a) Find the transformation needed to generate X.
(b) Use Octave to generate 100 samples of X. Compare the empirical pdf of the samples with the desired pdf.
FIGURE P4.4 (triangular pdf fX(x) on [−a, a] with peak value c at x = 0)
4.137. For each of the following random variables: Find the transformation needed to generate the random variable X; use Octave to generate 1000 samples of X; plot the sequence of outcomes; compare the empirical pdf of the samples with the desired pdf.
(a) Laplacian random variable with α = 1.
(b) Pareto random variable with α = 1.5, 2, 2.5.
(c) Weibull random variable with β = 0.5, 2, 3 and λ = 1.
4.138.
A random variable Y of mixed type has pdf

fY(x) = p δ(x) + (1 − p) fX(x),

where X is a Laplacian random variable and p is a number between zero and one. Find the transformation required to generate Y.
4.139. Specify the transformation method needed to generate the geometric random variable with parameter p = 1/2. Find the average number of comparisons needed in the search to determine each outcome.
4.140. Specify the transformation method needed to generate the Poisson random variable with small parameter α. Compute the average number of comparisons needed in the search.
4.141. The following rejection method can be used to generate Gaussian random variables:
1. Generate U1, a uniform random variable in the unit interval.
2. Let X1 = −ln(U1).
3. Generate U2, a uniform random variable in the unit interval. If U2 ≤ exp{−(X1 − 1)²/2}, accept X1. Otherwise, reject X1 and go to step 1.
4. Generate a random sign (+ or −) with equal probability. Output X equal to X1 with the resulting sign.
(a) Show that if X1 is accepted, then its pdf corresponds to the pdf of the absolute value of a Gaussian random variable with mean 0 and variance 1.
(b) Show that X is a Gaussian random variable with mean 0 and variance 1.
4.142. Cheng (1977) has shown that the function K fZ(x) bounds the pdf of a gamma random variable with α > 1, where

fZ(x) = λ α^λ x^(λ−1)/(α^λ + x^λ)²  and  K = (2α − 1)^(1/2).

Find the cdf of fZ(x) and the corresponding transformation needed to generate Z.
4.143. (a) Show that in the modified rejection method, the probability of accepting X1 is 1/K. Hint: Use conditional probability.
(b) Show that Z has the desired pdf.
4.144. Two methods for generating binomial random variables are: (1) generate n Bernoulli random variables and add the outcomes; (2) divide the unit interval according to binomial probabilities. Compare the methods under the following conditions:
(a) p = 1/2, n = 5, 25, 50;
(b) p = 0.1, n = 5, 25, 50.
(c) Use Octave to implement the two methods by generating 1000 binomially distributed samples.
4.145. Let the number of event occurrences in a time interval be a Poisson random variable. In Section 3.4, it was found that the time between events for a Poisson random variable is an exponentially distributed random variable.
(a) Explain how one can generate Poisson random variables from a sequence of exponentially distributed random variables.
(b) How does this method compare with the one presented in Problem 4.140?
(c) Use Octave to implement the two methods when α = 3, α = 25, and α = 100.
4.146. Write a program to generate the gamma pdf with α > 1 using the rejection method discussed in Problem 4.142. Use this method to generate m-Erlang random variables with m = 2, 10 and λ = 1, and compare the method to the straightforward generation of m exponential random variables as discussed in Example 4.57.

*Section 4.10: Entropy

4.147. Let X be the outcome of the toss of a fair die.
(a) Find the entropy of X.
(b) Suppose you are told that X is even. What is the reduction in entropy?
4.148. A biased coin is tossed three times.
(a) Find the entropy of the outcome if the sequence of heads and tails is noted.
(b) Find the entropy of the outcome if the number of heads is noted.
(c) Explain the difference between the entropies in parts (a) and (b).
4.149. Let X be the number of tails until the first heads in a sequence of tosses of a biased coin.
(a) Find the entropy of X given that X ≥ k.
(b) Find the entropy of X given that X ≤ k.
4.150. One of two coins is selected at random: Coin A has P[heads] = 1/10 and coin B has P[heads] = 9/10.
(a) Suppose the coin is tossed once. Find the entropy of the outcome.
(b) Suppose the coin is tossed twice and the sequence of heads and tails is observed. Find the entropy of the outcome.
4.151. Suppose that the randomly selected coin in Problem 4.150 is tossed until the first occurrence of heads. Suppose that heads occurs in the kth toss.
Find the entropy regarding the identity of the coin.
4.152. A communication channel accepts input I from the set {0, 1, 2, 3, 4, 5, 6}. The channel output is X = I + N mod 7, where N is equally likely to be +1 or −1.
(a) Find the entropy of I if all inputs are equiprobable.
(b) Find the entropy of I given that X = 4.
4.153. Let X be a discrete random variable with entropy H_X.
(a) Find the entropy of Y = 2X.
(b) Find the entropy of any invertible transformation of X.
4.154. Let (X, Y) be the pair of outcomes from two independent tosses of a die.
(a) Find the entropy of X.
(b) Find the entropy of the pair (X, Y).
(c) Find the entropy in n independent tosses of a die. Explain why entropy is additive in this case.
4.155. Let X be the outcome of the toss of a die, and let Y be a randomly selected integer less than or equal to X.
(a) Find the entropy of Y.
(b) Find the entropy of the pair (X, Y) and denote it by H(X, Y).
(c) Find the entropy of Y given X = k and denote it by g(k) = H(Y | X = k). Find E[g(X)] = E[H(Y | X)].
(d) Show that H(X, Y) = H_X + E[H(Y | X)]. Explain the meaning of this equation.
4.156. Let X take on values from {1, 2, ..., K}. Suppose that P[X = K] = p, and let H_Y be the entropy of X given that X is not equal to K. Show that H_X = −p ln p − (1 − p) ln(1 − p) + (1 − p)H_Y.
4.157. Let X be a uniform random variable in Example 4.62. Find and plot the entropy of Q as a function of the variance of the error X − Q(X). Hint: Express the variance of the error in terms of d and substitute into the expression for the entropy of Q.
4.158. A communication channel accepts as input either 000 or 111. The channel transmits each binary input correctly with probability 1 − p and erroneously with probability p. Find the entropy of the input given that the output is 000; given that the output is 010.
4.159. Let X be a uniform random variable in the interval [−a, a]. Suppose we are told that X is positive.
Use the approach in Example 4.62 to find the reduction in entropy. Show that this is equal to the difference of the differential entropy of X and the differential entropy of X given {X > 0}.
4.160. Let X be uniform in [a, b], and let Y = 2X. Compare the differential entropies of X and Y. How does this result differ from the result in Problem 4.153?
4.161. Find the pmf for the random variable X for which the sequence of questions in Fig. 4.26(a) is optimum.
4.162. Let the random variable X have S_X = {1, 2, 3, 4, 5, 6} and pmf (3/8, 3/8, 1/8, 1/16, 1/32, 1/32). Find the entropy of X. What is the best code you can find for X?
4.163. Seven cards are drawn from a deck of 52 distinct cards. How many bits are required to represent all possible outcomes?
4.164. Find the optimum encoding for the geometric random variable with p = 1/2.
4.165. An urn experiment has 10 equiprobable distinct outcomes. Find the performance of the best tree code for encoding (a) a single outcome of the experiment; (b) a sequence of n outcomes of the experiment.
4.166. A binary information source produces n outputs. Suppose we are told that there are k 1’s in these n outputs.
(a) What is the best code to indicate which pattern of k 1’s and n − k 0’s occurred?
(b) How many bits are required to specify the value of k using a code with a fixed number of bits?
4.167. The random variable X takes on values from the set {1, 2, 3, 4}. Find the maximum entropy pmf for X given that E[X] = 2.
4.168. The random variable X is nonnegative. Find the maximum entropy pdf for X given that E[X] = 10.
4.169. Find the maximum entropy pdf of X given that E[X²] = c.
4.170. Suppose we are given two parameters of the random variable X, E[g1(X)] = c1 and E[g2(X)] = c2.
(a) Show that the maximum entropy pdf for X has the form fX(x) = C e^(−λ1 g1(x) − λ2 g2(x)).
(b) Find the entropy of X.
4.171. Find the maximum entropy pdf of X given that E[X] = m and VAR[X] = σ².

Problems Requiring Cumulative Knowledge

4.172.
Three types of customers arrive at a service station. The time required to service type 1 customers is an exponential random variable with mean 2. Type 2 customers have a Pareto distribution with α = 3 and xm = 1. Type 3 customers require a constant service time of 2 seconds. Suppose that the proportion of type 1, 2, and 3 customers is 1/2, 1/8, and 3/8, respectively. Find the probability that an arbitrary customer requires more than 15 seconds of service time. Compare the above probability to the bound provided by the Markov inequality.
4.173. The lifetime X of a light bulb is a random variable with P[X > t] = 2/(2 + t) for t > 0. Suppose three new light bulbs are installed at time t = 0. At time t = 1 all three light bulbs are still working. Find the probability that at least one light bulb is still working at time t = 9.
4.174. The random variable X is uniformly distributed in the interval [0, a]. Suppose a is unknown, so we estimate a by the maximum value observed in n independent repetitions of the experiment; that is, we estimate a by Y = max{X1, X2, ..., Xn}.
(a) Find P[Y ≤ y].
(b) Find the mean and variance of Y, and explain why Y is a good estimate for a when n is large.
4.175. The sample X of a signal is a Gaussian random variable with m = 0 and σ² = 1. Suppose that X is quantized by a nonuniform quantizer consisting of four intervals: (−∞, −a], (−a, 0], (0, a], and (a, ∞).
(a) Find the value of a so that X is equally likely to fall in each of the four intervals.
(b) Find the representation point x_i = q(X) for X in (0, a] that minimizes the mean-squared error, that is, such that

∫ from 0 to a of (x − x_i)² fX(x) dx

is minimized. Hint: Differentiate the above expression with respect to x_i. Find the representation points for the other intervals.
(c) Evaluate the mean-squared error of the quantizer, E[(X − q(X))²].
4.176.
The output Y of a binary communication system is a unit-variance Gaussian random variable with mean zero when the input is “0” and mean one when the input is “1”. Assume the input is 1 with probability p.
(a) Find P[input is 1 | y < Y < y + h] and P[input is 0 | y < Y < y + h].
(b) The receiver uses the following decision rule: If P[input is 1 | y < Y < y + h] > P[input is 0 | y < Y < y + h], decide the input was 1; otherwise, decide the input was 0. Show that this decision rule leads to the following threshold rule: If Y > T, decide the input was 1; otherwise, decide the input was 0.
(c) What is the probability of error for the above decision rule?

CHAPTER 5 Pairs of Random Variables

Many random experiments involve several random variables. In some experiments a number of different quantities are measured. For example, the voltage signals at several points in a circuit at some specific time may be of interest. Other experiments involve the repeated measurement of a certain quantity, such as the repeated measurement (“sampling”) of the amplitude of an audio or video signal that varies with time. In Chapter 4 we developed techniques for calculating the probabilities of events involving a single random variable in isolation. In this chapter, we extend the concepts already introduced to two random variables:
• We use the joint pmf, cdf, and pdf to calculate the probabilities of events that involve the joint behavior of two random variables;
• We use expected value to define joint moments that summarize the behavior of two random variables;
• We determine when two random variables are independent, and we quantify their degree of “correlation” when they are not independent;
• We obtain conditional probabilities involving a pair of random variables.
In a sense we have already covered all the fundamental concepts of probability and random variables, and we are “simply” elaborating on the case of two or more random variables.
Nevertheless, there are significant analytical techniques that need to be learned, e.g., double summations of pmf’s and double integration of pdf’s, so we first discuss the case of two random variables in detail because we can draw on our geometric intuition. Chapter 6 considers the general case of vector random variables. Throughout these two chapters you should be mindful of the forest (fundamental concepts) and the trees (specific techniques)!

5.1 TWO RANDOM VARIABLES

The notion of a random variable as a mapping is easily generalized to the case where two quantities are of interest. Consider a random experiment with sample space S and event class F. We are interested in a function that assigns a pair of real numbers X(z) = (X(z), Y(z)) to each outcome z in S. Basically we are dealing with a vector function that maps S into R², the real plane, as shown in Fig. 5.1(a). We are ultimately interested in events involving the pair (X, Y).

FIGURE 5.1 (a) A function assigns a pair of real numbers to each outcome in S. (b) Equivalent events for two random variables.

Example 5.1
Let a random experiment consist of selecting a student’s name from an urn. Let z denote the outcome of this experiment, and define the following two functions:

H(z) = height of student z in centimeters
W(z) = weight of student z in kilograms

(H(z), W(z)) assigns a pair of numbers to each z in S. We are interested in events involving the pair (H, W). For example, the event B = {H ≤ 183, W ≤ 82} represents students with height less than 183 cm (6 feet) and weight less than 82 kg (180 lb).

Example 5.2
A Web page provides the user with a choice either to watch a brief ad or to move directly to the requested page. Let z be the pattern of user arrivals in T seconds, e.g., the number of arrivals, and a listing of arrival times and types.
Let N1(z) be the number of times the Web page is directly requested and let N2(z) be the number of times that the ad is chosen. (N1(z), N2(z)) assigns a pair of nonnegative integers to each z in S. Suppose that a type 1 request brings 0.001¢ in revenue and a type 2 request brings in 1¢. Find the event “revenue in T seconds is less than $100.” The total revenue in T seconds is 0.001 N1 + 1 N2, and so the event of interest is B = {0.001 N1 + 1 N2 < 10,000}.

Example 5.3
Let the outcome z in a random experiment be the length of a randomly selected message. Suppose that messages are broken into packets of maximum length M bytes. Let Q be the number of full packets in a message and let R be the number of bytes left over. (Q(z), R(z)) assigns a pair of numbers to each z in S. Q takes on values in the range 0, 1, 2, ..., and R takes on values in the range 0, 1, ..., M − 1. An event of interest may be B = {R < M/2}, “the last packet is less than half full.”

Example 5.4
Let the outcome of a random experiment be a pair z = (z1, z2) that results from two independent spins of a wheel. Each spin of the wheel results in a number in the interval (0, 2π]. Define the pair of numbers (X, Y) in the plane as follows:

X(z) = (2 ln(2π/z1))^(1/2) cos z2
Y(z) = (2 ln(2π/z1))^(1/2) sin z2.

The vector function (X(z), Y(z)) assigns a pair of numbers in the plane to each z in S. The square root term corresponds to a radius and z2 to an angle. We will see that (X, Y) models the noise voltages encountered in digital communication systems. An event of interest here may be B = {X² + Y² < r²}, “total noise power is less than r².”

The events involving a pair of random variables (X, Y) are specified by conditions that we are interested in and can be represented by regions in the plane. Figure 5.2 shows three examples of events:

A = {X + Y ≤ 10}
B = {min(X, Y) ≤ 5}
C = {X² + Y² ≤ 100}.

Event A divides the plane into two regions separated by a straight line.
Note that the event in Example 5.2 is of this type. Event C identifies a disk centered at the origin and y y y (0, 10) (5, 5) (0, 10) C B (10, 0) x A FIGURE 5.2 Examples of two-dimensional events. x (10, 0) x 236 Chapter 5 Pairs of Random Variables it corresponds to the event in Example 5.4. Event B is found by noting that 5min1X, Y2 … 56 = 5X … 56 ´ 5Y … 56, that is, the minimum of X and Y is less than or equal to 5 if either X and/or Y is less than or equal to 5. To determine the probability that the pair X = 1X, Y2 is in some region B in the plane, we proceed as in Chapter 3 to find the equivalent event for B in the underlying sample space S: (5.1a) A = X -11B2 = 5z: 1X1z2, Y1z22 in B6. The relationship between A = X -11B2 and B is shown in Fig. 5.1(b). If A is in F, then it has a probability assigned to it, and we obtain: P3X in B4 = P3A4 = P35z: 1X1z2, Y1z22 in B64. (5.1b) The approach is identical to what we followed in the case of a single random variable. The only difference is that we are considering the joint behavior of X and Y that is induced by the underlying random experiment. A scattergram can be used to deduce the joint behavior of two random variables. A scattergram plot simply places a dot at every observation pair (x, y) that results from performing the experiment that generates (X, Y). Figure 5.3 shows the scattergram for 200 observations of four different pairs of random variables. The pairs in Fig. 5.3(a) appear to be uniformly distributed in the unit square. The pairs in Fig. 5.3(b) are clearly confined to a disc of unit radius and appear to be more concentrated near the origin. The pairs in Fig. 5.3(c) are concentrated near the origin, and appear to have circular symmetry, but are not bounded to an enclosed region. The pairs in Fig. 5.3(d) again are concentrated near the origin and appear to have a clear linear relationship of some sort, that is, larger values of x tend to have linearly proportional increasing values of y. 
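The transformation in Example 5.4 is easy to explore numerically. The sketch below is an illustration that is not part of the text (the function name, sample size, and seed are arbitrary choices); it draws the two wheel spins uniformly from (0, 2π], applies the transformation, and estimates the probability of an event of the form B = {X² + Y² < r²}. The construction is essentially the classical Box–Muller method, so the resulting cloud of points resembles the circularly symmetric scattergram of Fig. 5.3(c).

```python
import math
import random

def spin_pair(rng):
    """Example 5.4: map two independent wheel spins to a point (X, Y)."""
    z1 = 2 * math.pi * (1 - rng.random())   # uniform on (0, 2*pi]
    z2 = 2 * math.pi * (1 - rng.random())
    r = math.sqrt(2 * math.log(2 * math.pi / z1))   # radius term
    return r * math.cos(z2), r * math.sin(z2)       # angle z2

rng = random.Random(1)
samples = [spin_pair(rng) for _ in range(50_000)]

# Estimate P[B] for the event B = {X^2 + Y^2 < r^2} with r = 1.
# The squared radius 2*ln(2*pi/z1) is exponential with mean 2, so the
# exact probability is 1 - e^{-1/2}, approximately 0.39.
p_hat = sum(1 for x, y in samples if x * x + y * y < 1.0) / len(samples)
print(p_hat)
```

Plotting `samples` with any scatter-plot tool reproduces a cloud like Fig. 5.3(c): circularly symmetric, concentrated near the origin, and unbounded.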
We later introduce various functions and moments to characterize the behavior of pairs of random variables illustrated in these examples.
The joint probability mass function, joint cumulative distribution function, and joint probability density function provide approaches to specifying the probability law that governs the behavior of the pair (X, Y). Our general approach is as follows. We first focus on events that correspond to rectangles in the plane:

B = {X in A1} ∩ {Y in A2}   (5.2)

where Ak is a one-dimensional event (i.e., subset of the real line). We say that these events are of product form. The event B occurs when both {X in A1} and {Y in A2} occur jointly. Figure 5.4 shows some two-dimensional product-form events. We use Eq. (5.1b) to find the probability of product-form events:

P[B] = P[{X in A1} ∩ {Y in A2}] ≜ P[X in A1, Y in A2].   (5.3)

By defining A appropriately we then obtain the joint pmf, joint cdf, and joint pdf of (X, Y).

FIGURE 5.3 A scattergram for 200 observations of four different pairs of random variables.
FIGURE 5.4 Some two-dimensional product-form events.

5.2 PAIRS OF DISCRETE RANDOM VARIABLES

Let the vector random variable X = (X, Y) assume values from some countable set SX,Y = {(xj, yk), j = 1, 2, ..., k = 1, 2, ...}. The joint probability mass function of X specifies the probabilities of the event {X = x} ∩ {Y = y}:

pX,Y(x, y) = P[{X = x} ∩ {Y = y}] ≜ P[X = x, Y = y]   for (x, y) ∈ R².   (5.4a)

The values of the pmf on the set SX,Y provide the essential information:

pX,Y(xj, yk) = P[{X = xj} ∩ {Y = yk}] ≜ P[X = xj, Y = yk],   (xj, yk) ∈ SX,Y.   (5.4b)

There are several ways of showing the pmf graphically: (1) For small sample spaces we can present the pmf in the form of a table as shown in Fig. 5.5(a). (2) We can present the pmf using arrows of height pX,Y(xj, yk) placed at the points {(xj, yk)} in the plane, as shown in Fig. 5.5(b), but this can be difficult to draw. (3) We can place dots at the points {(xj, yk)} and label these with the corresponding pmf value as shown in Fig. 5.5(c).
The probability of any event B is the sum of the pmf over the outcomes in B:

P[X in B] = Σ Σ pX,Y(xj, yk)   over (xj, yk) in B.   (5.5)

Frequently it is helpful to sketch the region that contains the points in B as shown, for example, in Fig. 5.6. When the event B is the entire sample space SX,Y, we have:

Σ_{j=1}^∞ Σ_{k=1}^∞ pX,Y(xj, yk) = 1.   (5.6)

Example 5.5
A packet switch has two input ports and two output ports. At a given time slot a packet arrives at each input port with probability 1/2, and is equally likely to be destined to output port 1 or 2. Let X and Y be the number of packets destined for output ports 1 and 2, respectively. Find the pmf of X and Y, and show the pmf graphically.
The outcome Ij for an input port j can take the following values: "n", no packet arrival (with probability 1/2); "a1", packet arrival destined for output port 1 (with probability 1/4); "a2", packet arrival destined for output port 2 (with probability 1/4). The underlying sample space S consists of the pair of input outcomes z = (I1, I2).
The mapping for (X, Y) is shown in the table below:

z:      (n, n)  (n, a1)  (n, a2)  (a1, n)  (a1, a1)  (a1, a2)  (a2, n)  (a2, a1)  (a2, a2)
(X, Y): (0, 0)  (1, 0)   (0, 1)   (1, 0)   (2, 0)    (1, 1)    (0, 1)   (1, 1)    (0, 2)

The pmf of (X, Y) is then:

pX,Y(0, 0) = P[z = (n, n)] = (1/2)(1/2) = 1/4,
pX,Y(0, 1) = P[z ∈ {(n, a2), (a2, n)}] = 2 × 1/8 = 1/4,
pX,Y(1, 0) = P[z ∈ {(n, a1), (a1, n)}] = 1/4,
pX,Y(1, 1) = P[z ∈ {(a1, a2), (a2, a1)}] = 1/8,
pX,Y(0, 2) = P[z = (a2, a2)] = 1/16,
pX,Y(2, 0) = P[z = (a1, a1)] = 1/16.

Figure 5.5(a) shows the pmf in tabular form where the number of rows and columns accommodate the range of X and Y respectively. Each entry in the table gives the pmf value for the corresponding x and y. Figure 5.5(b) shows the pmf using arrows in the plane. An arrow of height pX,Y(j, k) is placed at each of the points in SX,Y = {(0, 0), (0, 1), (1, 0), (1, 1), (0, 2), (2, 0)}. Figure 5.5(c) shows the pmf using labeled dots in the plane. A dot with label pX,Y(j, k) is placed at each of the points in SX,Y.

FIGURE 5.5 Graphical representations of pmf's: (a) in table format; (b) use of arrows to show height; (c) labeled dots corresponding to pmf value. The marginals are pX(0) = pY(0) = 9/16, pX(1) = pY(1) = 6/16, and pX(2) = pY(2) = 1/16.
FIGURE 5.6 Showing the pmf via a sketch containing the points in B.

Example 5.6
A random experiment consists of tossing two "loaded" dice and noting the pair of numbers (X, Y) facing up. The joint pmf pX,Y(j, k) for j = 1, ..., 6 and k = 1, ..., 6 is given by the two-dimensional table shown in Fig. 5.6: each diagonal entry (j, j) equals 2/42 and each off-diagonal entry equals 1/42. The (j, k) entry in the table contains the value pX,Y(j, k). Find P[min(X, Y) = 3].
Figure 5.6 shows the region that corresponds to the set {min(x, y) = 3}. The probability of this event is given by:

P[min(X, Y) = 3] = pX,Y(6, 3) + pX,Y(5, 3) + pX,Y(4, 3) + pX,Y(3, 3) + pX,Y(3, 4) + pX,Y(3, 5) + pX,Y(3, 6)
= 6(1/42) + 2/42 = 8/42.

5.2.1 Marginal Probability Mass Function

The joint pmf of X provides the information about the joint behavior of X and Y. We are also interested in the probabilities of events involving each of the random variables in isolation. These can be found in terms of the marginal probability mass functions:

pX(xj) = P[X = xj] = P[X = xj, Y = anything]
= P[{X = xj and Y = y1} ∪ {X = xj and Y = y2} ∪ ...]
= Σ_{k=1}^∞ pX,Y(xj, yk),   (5.7a)

and similarly,

pY(yk) = P[Y = yk] = Σ_{j=1}^∞ pX,Y(xj, yk).   (5.7b)

The marginal pmf's satisfy all the properties of one-dimensional pmf's, and they supply the information required to compute the probability of events involving the corresponding random variable.
The probability pX,Y(xj, yk) can be interpreted as the long-term relative frequency of the joint event {X = xj} ∩ {Y = yk} in a sequence of repetitions of the random experiment. Equation (5.7a) corresponds to the fact that the relative frequency of the event {X = xj} is found by adding the relative frequencies of all outcome pairs in which xj appears. In general, it is impossible to deduce the relative frequencies of pairs of values X and Y from the relative frequencies of X and Y in isolation. The same is true for pmf's: In general, knowledge of the marginal pmf's is insufficient to specify the joint pmf.

Example 5.7
Find the marginal pmf's for the output ports (X, Y) in Example 5.5.
Figure 5.5(a) shows that the marginal pmf is found by adding entries along a row or column in the table.
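This row-and-column summation translates directly into code. The following sketch is an illustration that is not part of the text (the dictionary layout and helper name are arbitrary choices); it stores the joint pmf of Example 5.5 and computes the marginals of Eqs. (5.7a) and (5.7b) by summing out the other variable. Exact fractions are used so the results match the table values of Fig. 5.5 exactly.

```python
from fractions import Fraction as F

# Joint pmf of (X, Y) from Example 5.5, keyed by the pair (x, y).
joint_pmf = {
    (0, 0): F(1, 4), (0, 1): F(1, 4), (1, 0): F(1, 4),
    (1, 1): F(1, 8), (0, 2): F(1, 16), (2, 0): F(1, 16),
}

def marginal(pmf, axis):
    """Marginal pmf per Eq. (5.7): sum the joint pmf over the other variable."""
    out = {}
    for pair, p in pmf.items():
        out[pair[axis]] = out.get(pair[axis], F(0)) + p
    return out

p_X = marginal(joint_pmf, 0)   # sum over y for each x (column sums)
p_Y = marginal(joint_pmf, 1)   # sum over x for each y (row sums)

print(p_X[1])                   # 1/4 + 1/8 = 3/8
print(p_Y[0])                   # 1/4 + 1/4 + 1/16 = 9/16
print(sum(joint_pmf.values()))  # total probability, Eq. (5.6): 1
```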
For example, by adding along the x = 1 column we have:

pX(1) = P[X = 1] = pX,Y(1, 0) + pX,Y(1, 1) = 1/4 + 1/8 = 3/8.

Similarly, by adding along the y = 0 row:

pY(0) = P[Y = 0] = pX,Y(0, 0) + pX,Y(1, 0) + pX,Y(2, 0) = 1/4 + 1/4 + 1/16 = 9/16.

Figure 5.5(b) shows the marginal pmf's using arrows on the real line.

Example 5.8
Find the marginal pmf's in the loaded dice experiment in Example 5.6.
The probability that X = 1 is found by summing over the first row:

P[X = 1] = 2/42 + 1/42 + ... + 1/42 = 7/42 = 1/6.

Similarly, we find that P[X = j] = 1/6 for j = 2, ..., 6. The probability that Y = k is found by summing over the kth column. We then find that P[Y = k] = 1/6 for k = 1, 2, ..., 6. Thus each die, in isolation, appears to be fair in the sense that each face is equiprobable. If we knew only these marginal pmf's we would have no idea that the dice are loaded.

Example 5.9
In Example 5.3, let the number of bytes N in a message have a geometric distribution with parameter 1 − p and range SN = {0, 1, 2, ...}. Find the joint pmf and the marginal pmf's of Q and R.
If a message has N bytes, then the number of full packets is the quotient Q in the division of N by M, and the number of remaining bytes is the remainder R. The probability of the pair (q, r) is given by

P[Q = q, R = r] = P[N = qM + r] = (1 − p)p^(qM + r).

The marginal pmf of Q is

P[Q = q] = P[N ∈ {qM, qM + 1, ..., qM + (M − 1)}]
= Σ_{k=0}^{M−1} (1 − p)p^(qM + k)
= (1 − p)p^(qM) (1 − p^M)/(1 − p)
= (1 − p^M)(p^M)^q,   q = 0, 1, 2, ....

The marginal pmf of Q is geometric with parameter p^M. The marginal pmf of R is:

P[R = r] = P[N ∈ {r, M + r, 2M + r, ...}] = Σ_{q=0}^∞ (1 − p)p^(qM + r) = (1 − p)p^r / (1 − p^M),   r = 0, 1, ..., M − 1.

R has a truncated geometric pmf. As an exercise, you should verify that all the above marginal pmf's add to 1.

5.3 THE JOINT CDF OF X AND Y

In Chapter 3 we saw that semi-infinite intervals of the form (−∞, x] are a basic building block from which other one-dimensional events can be built.
By defining the cdf FX(x) as the probability of (−∞, x], we were then able to express the probabilities of other events in terms of the cdf. In this section we repeat the above development for two-dimensional random variables.
A basic building block for events involving two-dimensional random variables is the semi-infinite rectangle defined by {(x, y) : x ≤ x1 and y ≤ y1}, as shown in Fig. 5.7. We also use the more compact notation {x ≤ x1, y ≤ y1} to refer to this region. The joint cumulative distribution function of X and Y is defined as the probability of the event {X ≤ x1} ∩ {Y ≤ y1}:

FX,Y(x1, y1) = P[X ≤ x1, Y ≤ y1].   (5.8)

FIGURE 5.7 The joint cumulative distribution function is defined as the probability of the semi-infinite rectangle defined by the point (x1, y1).

In terms of relative frequency, FX,Y(x1, y1) represents the long-term proportion of time in which the outcome of the random experiment yields a point X that falls in the rectangular region shown in Fig. 5.7. In terms of probability "mass," FX,Y(x1, y1) represents the amount of mass contained in the rectangular region.
The joint cdf satisfies the following properties.

(i) The joint cdf is a nondecreasing function of x and y:

FX,Y(x1, y1) ≤ FX,Y(x2, y2)   if x1 ≤ x2 and y1 ≤ y2.   (5.9a)

(ii) FX,Y(x1, −∞) = 0,  FX,Y(−∞, y1) = 0,  FX,Y(∞, ∞) = 1.   (5.9b)

(iii) We obtain the marginal cumulative distribution functions by removing the constraint on one of the variables. The marginal cdf's are the probabilities of the regions shown in Fig. 5.8:

FX(x1) = FX,Y(x1, ∞)  and  FY(y1) = FX,Y(∞, y1).   (5.9c)

(iv) The joint cdf is continuous from the "north" and from the "east," that is,

lim_{x→a+} FX,Y(x, y) = FX,Y(a, y)  and  lim_{y→b+} FX,Y(x, y) = FX,Y(x, b).   (5.9d)

(v) The probability of the rectangle {x1 < x ≤ x2, y1 < y ≤ y2} is given by:

P[x1 < X ≤ x2, y1 < Y ≤ y2] = FX,Y(x2, y2) − FX,Y(x2, y1) − FX,Y(x1, y2) + FX,Y(x1, y1).   (5.9e)

FIGURE 5.8 The marginal cdf's are the probabilities of these half-planes.

Property (i) follows by noting that the semi-infinite rectangle defined by (x1, y1) is contained in that defined by (x2, y2) and applying Corollary 7. Properties (ii) to (iv) are obtained by limiting arguments. For example, the sequence {x ≤ x1 and y ≤ −n} is decreasing and approaches the empty set ∅, so

FX,Y(x1, −∞) = lim_{n→∞} FX,Y(x1, −n) = P[∅] = 0.

For property (iii) we take the sequence {x ≤ x1 and y ≤ n}, which increases to {x ≤ x1}, so

lim_{n→∞} FX,Y(x1, n) = P[X ≤ x1] = FX(x1).

For property (v) note in Fig. 5.9(a) that

B = {x1 < X ≤ x2, Y ≤ y1} = {X ≤ x2, Y ≤ y1} − {X ≤ x1, Y ≤ y1},

so

P[B] = P[x1 < X ≤ x2, Y ≤ y1] = FX,Y(x2, y1) − FX,Y(x1, y1).

In Fig. 5.9(b), note that FX,Y(x2, y2) = P[A] + P[B] + FX,Y(x1, y2). Property (v) follows by solving for P[A] and substituting the expression for P[B].

FIGURE 5.9 The joint cdf can be used to determine the probability of various events.

Example 5.10
Plot the joint cdf of X and Y from Example 5.5. Find the marginal cdf of X.
To find the cdf of X, we identify the regions in the plane according to which points in SX,Y are included in the rectangular region defined by (x, y). For example:
• The regions outside the first quadrant do not include any of the points, so FX,Y(x, y) = 0.
• The region {0 ≤ x < 1, 0 ≤ y < 1} contains the point (0, 0), so FX,Y(x, y) = 1/4.
Figure 5.10 shows the cdf after all possible regions are examined.

FIGURE 5.10 Joint cdf for packet switch example.

We need to consider several cases to find FX(x). For x < 0, we have FX(x) = 0. For 0 ≤ x < 1, we have FX(x) = FX,Y(x, ∞) = 9/16. For 1 ≤ x < 2, we have FX(x) = FX,Y(x, ∞) = 15/16. Finally, for x ≥ 2, we have FX(x) = FX,Y(x, ∞) = 1. Therefore FX(x) is a staircase function and X is a discrete random variable with pX(0) = 9/16, pX(1) = 6/16, and pX(2) = 1/16.

Example 5.11
The joint cdf for the pair of random variables X = (X, Y) is given by

FX,Y(x, y) = { 0,    x < 0 or y < 0
             { xy,   0 ≤ x ≤ 1, 0 ≤ y ≤ 1
             { x,    0 ≤ x ≤ 1, y > 1          (5.10)
             { y,    0 ≤ y ≤ 1, x > 1
             { 1,    x ≥ 1, y ≥ 1.

Plot the joint cdf and find the marginal cdf of X.
Figure 5.11 shows a plot of the joint cdf of X and Y. FX,Y(x, y) is continuous for all points in the plane. FX,Y(x, y) = 1 for all x ≥ 1 and y ≥ 1, which implies that X and Y each assume values less than or equal to one.

FIGURE 5.11 Joint cdf for two uniform random variables.

The marginal cdf of X is:

FX(x) = FX,Y(x, ∞) = { 0,  x < 0
                     { x,  0 ≤ x ≤ 1
                     { 1,  x ≥ 1.

X is uniformly distributed in the unit interval.

Example 5.12
The joint cdf for the vector of random variables X = (X, Y) is given by

FX,Y(x, y) = { (1 − e^(−αx))(1 − e^(−βy)),  x ≥ 0, y ≥ 0
             { 0,  elsewhere.

Find the marginal cdf's.
The marginal cdf's are obtained by letting one of the variables approach infinity:

FX(x) = lim_{y→∞} FX,Y(x, y) = 1 − e^(−αx),  x ≥ 0
FY(y) = lim_{x→∞} FX,Y(x, y) = 1 − e^(−βy),  y ≥ 0.

X and Y individually have exponential distributions with parameters α and β, respectively.

Example 5.13
Find the probability of the events A = {X ≤ 1, Y ≤ 1}, B = {X > x, Y > y}, where x > 0 and y > 0, and D = {1 < X ≤ 2, 2 < Y ≤ 5} in Example 5.12.
The probability of A is given directly by the cdf:

P[A] = P[X ≤ 1, Y ≤ 1] = FX,Y(1, 1) = (1 − e^(−α))(1 − e^(−β)).

The probability of B requires more work. By DeMorgan's rule:

B^c = ({X > x} ∩ {Y > y})^c = {X ≤ x} ∪ {Y ≤ y}.

Corollary 5 in Section 2.2 gives the probability of the union of two events:

P[B^c] = P[X ≤ x] + P[Y ≤ y] − P[X ≤ x, Y ≤ y]
= (1 − e^(−αx)) + (1 − e^(−βy)) − (1 − e^(−αx))(1 − e^(−βy))
= 1 − e^(−αx)e^(−βy).

Finally we obtain the probability of B:

P[B] = 1 − P[B^c] = e^(−αx)e^(−βy).

You should sketch the region B on the plane and identify the events involved in the calculation of the probability of B^c.
The probability of event D is found by applying property (v) of the joint cdf:

P[1 < X ≤ 2, 2 < Y ≤ 5] = FX,Y(2, 5) − FX,Y(2, 2) − FX,Y(1, 5) + FX,Y(1, 2)
= (1 − e^(−2α))(1 − e^(−5β)) − (1 − e^(−2α))(1 − e^(−2β)) − (1 − e^(−α))(1 − e^(−5β)) + (1 − e^(−α))(1 − e^(−2β)).

5.3.1 Random Variables That Differ in Type

In some problems it is necessary to work with joint random variables that differ in type, that is, one is discrete and the other is continuous. Usually it is rather clumsy to work with the joint cdf, and so it is preferable to work with either P[X = k, Y ≤ y] or P[X = k, y1 < Y ≤ y2]. These probabilities are sufficient to compute the joint cdf should we have to.

Example 5.14 Communication Channel with Discrete Input and Continuous Output
The input X to a communication channel is +1 volt or −1 volt with equal probability. The output Y of the channel is the input plus a noise voltage N that is uniformly distributed in the interval from −2 volts to +2 volts. Find P[X = +1, Y ≤ 0].
This problem lends itself to the use of conditional probability:

P[X = +1, Y ≤ y] = P[Y ≤ y | X = +1]P[X = +1],

where P[X = +1] = 1/2. When the input X = 1, the output Y is uniformly distributed in the interval [−1, 3]; therefore

P[Y ≤ y | X = +1] = (y + 1)/4   for −1 ≤ y ≤ 3.

Thus

P[X = +1, Y ≤ 0] = P[Y ≤ 0 | X = +1]P[X = +1] = (1/4)(1/2) = 1/8.

5.4 THE JOINT PDF OF TWO CONTINUOUS RANDOM VARIABLES

The joint cdf allows us to compute the probability of events that correspond to "rectangular" shapes in the plane.
To compute the probability of events corresponding to regions other than rectangles, we note that any reasonable shape (i.e., disk, polygon, or half-plane) can be approximated by the union of disjoint infinitesimal rectangles, Bj,k. For example, Fig. 5.12 shows how the events A = {X + Y ≤ 1} and B = {X² + Y² ≤ 1} are approximated by rectangles of infinitesimal width. The probability of such events can therefore be approximated by the sum of the probabilities of infinitesimal rectangles, and if the cdf is sufficiently smooth, the probability of each rectangle can be expressed in terms of a density function:

P[B] ≈ Σ Σ P[Bj,k] = Σ Σ fX,Y(xj, yk) Δx Δy   over (xj, yk) ∈ B.

As Δx and Δy approach zero, the above equation becomes an integral of a probability density function over the region B. We say that the random variables X and Y are jointly continuous if the probabilities of events involving (X, Y) can be expressed as an integral of a probability density function. In other words, there is a nonnegative function fX,Y(x, y), called the joint probability density function, that is defined on the real plane such that for every event B, a subset of the plane,

P[X in B] = ∫∫_B fX,Y(x′, y′) dx′ dy′,   (5.11)

as shown in Fig. 5.13. Note the similarity to Eq. (5.5) for discrete random variables. When B is the entire plane, the integral must equal one:

1 = ∫_{−∞}^{∞} ∫_{−∞}^{∞} fX,Y(x′, y′) dx′ dy′.   (5.12)

FIGURE 5.12 Some two-dimensional non-product-form events.
FIGURE 5.13 The probability of A is the integral of fX,Y(x, y) over the region defined by A.

Equations (5.11) and (5.12) again suggest that the probability "mass" of an event is found by integrating the density of probability mass over the region corresponding to the event.
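The infinitesimal-rectangle picture above translates directly into a numerical approximation: partition the plane into small rectangles, keep the rectangles whose centers fall in B, and sum fX,Y(xj, yk)ΔxΔy. The sketch below is an illustration that is not part of the text (the function names and grid size are arbitrary choices), using the uniform pdf on the unit square and the triangular event A = {X + Y ≤ 1}, whose exact probability is its area, 1/2.

```python
def prob_by_rectangles(pdf, in_event, lo, hi, n):
    """Approximate P[B] as the sum of f(xj, yk) * dx * dy over an n-by-n grid."""
    dx = dy = (hi - lo) / n
    total = 0.0
    for j in range(n):
        for k in range(n):
            x = lo + (j + 0.5) * dx   # center of rectangle (j, k)
            y = lo + (k + 0.5) * dy
            if in_event(x, y):
                total += pdf(x, y) * dx * dy
    return total

# Uniform joint pdf on the unit square.
uniform_pdf = lambda x, y: 1.0 if (0 <= x <= 1 and 0 <= y <= 1) else 0.0

# Event A = {X + Y <= 1}: a triangle of area 1/2, so P[A] = 1/2.
p = prob_by_rectangles(uniform_pdf, lambda x, y: x + y <= 1.0, 0.0, 1.0, 200)
print(p)   # close to 0.5
```

Refining the grid (larger `n`) drives the sum toward the integral in Eq. (5.11), which is exactly the limiting argument in the text.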
The joint cdf can be obtained in terms of the joint pdf of jointly continuous random variables by integrating over the semi-infinite rectangle defined by (x, y):

FX,Y(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} fX,Y(x′, y′) dy′ dx′.   (5.13)

It then follows that if X and Y are jointly continuous random variables, then the pdf can be obtained from the cdf by differentiation:

fX,Y(x, y) = ∂²FX,Y(x, y) / ∂x∂y.   (5.14)

Note that if X and Y are not jointly continuous, then it is possible that the above partial derivative does not exist. In particular, if FX,Y(x, y) is discontinuous or if its partial derivatives are discontinuous, then the joint pdf as defined by Eq. (5.14) will not exist.
The probability of a rectangular region is obtained by letting B = {(x, y) : a1 < x ≤ b1 and a2 < y ≤ b2} in Eq. (5.11):

P[a1 < X ≤ b1, a2 < Y ≤ b2] = ∫_{a1}^{b1} ∫_{a2}^{b2} fX,Y(x′, y′) dy′ dx′.   (5.15)

It then follows that the probability of an infinitesimal rectangle is the product of the pdf and the area of the rectangle:

P[x < X ≤ x + dx, y < Y ≤ y + dy] = ∫_x^{x+dx} ∫_y^{y+dy} fX,Y(x′, y′) dy′ dx′ ≈ fX,Y(x, y) dx dy.   (5.16)

Equation (5.16) can be interpreted as stating that the joint pdf specifies the probability of the product-form events {x < X ≤ x + dx} ∩ {y < Y ≤ y + dy}.
The marginal pdf's fX(x) and fY(y) are obtained by taking the derivative of the corresponding marginal cdf's, FX(x) = FX,Y(x, ∞) and FY(y) = FX,Y(∞, y). Thus

fX(x) = d/dx ∫_{−∞}^{x} { ∫_{−∞}^{∞} fX,Y(x′, y′) dy′ } dx′ = ∫_{−∞}^{∞} fX,Y(x, y′) dy′.   (5.17a)

Similarly,

fY(y) = ∫_{−∞}^{∞} fX,Y(x′, y) dx′.   (5.17b)

Thus the marginal pdf's are obtained by integrating out the variables that are not of interest. Note that fX(x) dx ≈ P[x < X ≤ x + dx, Y < ∞] is the probability of the infinitesimal strip shown in Fig. 5.14(a). This reminds us of the interpretation of the marginal pmf's as the probabilities of columns and rows in the case of discrete random variables. It is not surprising then that Eqs. (5.17a) and (5.17b) for the marginal pdf's and Eqs. (5.7a) and (5.7b) for the marginal pmf's are identical except for the fact that one contains an integral and the other a summation. As in the case of pmf's, we note that, in general, the joint pdf cannot be obtained from the marginal pdf's.

FIGURE 5.14 Interpretation of marginal pdf's.

Example 5.15 Jointly Uniform Random Variables
A randomly selected point (X, Y) in the unit square has the uniform joint pdf given by

fX,Y(x, y) = { 1,  0 ≤ x ≤ 1 and 0 ≤ y ≤ 1
             { 0,  elsewhere.

The scattergram in Fig. 5.3(a) corresponds to this pair of random variables. Find the joint cdf of X and Y.
The cdf is found by evaluating Eq. (5.13). You must be careful with the limits of the integral: The limits should define the region consisting of the intersection of the semi-infinite rectangle defined by (x, y) and the region where the pdf is nonzero. There are five cases in this problem, corresponding to the five regions shown in Fig. 5.15.
1. If x < 0 or y < 0, the pdf is zero and Eq. (5.13) implies FX,Y(x, y) = 0.
2. If (x, y) is inside the unit square, FX,Y(x, y) = ∫_0^x ∫_0^y 1 dy′ dx′ = xy.
3. If 0 ≤ x ≤ 1 and y > 1, FX,Y(x, y) = ∫_0^x ∫_0^1 1 dy′ dx′ = x.
4. Similarly, if x > 1 and 0 ≤ y ≤ 1, FX,Y(x, y) = y.
5. Finally, if x > 1 and y > 1, FX,Y(x, y) = ∫_0^1 ∫_0^1 1 dy′ dx′ = 1.
We see that this is the joint cdf of Example 5.11.

FIGURE 5.15 Regions that need to be considered separately in computing cdf in Example 5.15.

Example 5.16
Find the normalization constant c and the marginal pdf's for the following joint pdf:

fX,Y(x, y) = { c e^(−x) e^(−y),  0 ≤ y ≤ x < ∞
             { 0,  elsewhere.

The pdf is nonzero in the shaded region shown in Fig. 5.16(a). The constant c is found from the normalization condition specified by Eq. (5.12):

1 = ∫_0^∞ ∫_0^x c e^(−x) e^(−y) dy dx = ∫_0^∞ c e^(−x)(1 − e^(−x)) dx = c/2.

Therefore c = 2. The marginal pdf's are found by evaluating Eqs. (5.17a) and (5.17b):

fX(x) = ∫_0^x 2 e^(−x) e^(−y) dy = 2 e^(−x)(1 − e^(−x)),  0 ≤ x < ∞

and

fY(y) = ∫_y^∞ 2 e^(−x) e^(−y) dx = 2 e^(−2y),  0 ≤ y < ∞.

You should fill in the steps in the evaluation of the integrals as well as verify that the marginal pdf's integrate to 1.

FIGURE 5.16 The random variables X and Y in Examples 5.16 and 5.17 have a pdf that is nonzero only in the shaded region shown in part (a).

Example 5.17
Find P[X + Y ≤ 1] in Example 5.16.
Figure 5.16(b) shows the intersection of the event {X + Y ≤ 1} and the region where the pdf is nonzero. We obtain the probability of the event by "adding" (actually integrating) infinitesimal rectangles of width dy as indicated in the figure:

P[X + Y ≤ 1] = ∫_0^0.5 ∫_y^(1−y) 2 e^(−x) e^(−y) dx dy = ∫_0^0.5 2 e^(−y)[e^(−y) − e^(−(1−y))] dy = 1 − 2e^(−1).

Example 5.18 Jointly Gaussian Random Variables
The joint pdf of X and Y, shown in Fig. 5.17, is

fX,Y(x, y) = (1 / (2π√(1 − ρ²))) e^(−(x² − 2ρxy + y²) / (2(1 − ρ²))),  −∞ < x, y < ∞.   (5.18)

We say that X and Y are jointly Gaussian.¹ Find the marginal pdf's.
The marginal pdf of X is found by integrating fX,Y(x, y) over y:

fX(x) = (e^(−x²/(2(1 − ρ²))) / (2π√(1 − ρ²))) ∫_{−∞}^{∞} e^(−(y² − 2ρxy)/(2(1 − ρ²))) dy.

¹ This is an important special case of jointly Gaussian random variables. The general case is discussed in Section 5.9.

FIGURE 5.17 Joint pdf of two jointly Gaussian random variables.

We complete the square of the argument of the exponent by adding and subtracting ρ²x², that is, y² − 2ρxy + ρ²x² − ρ²x² = (y − ρx)² − ρ²x².
Therefore fX1x2 = e -x /211 - r 2 2 2 2p21 - r2 L- q e -x /2 22p L- q 2 e -31y - rx2 q -1y - rx22/211 - r22 2 = q 22p11 - r22 e - r2x24/211 - r22 dy dy 2 = e -x /2 22p , where we have noted that the last integral equals one since its integrand is a Gaussian pdf with mean rx and variance 1 - r2. The marginal pdf of X is therefore a one-dimensional Gaussian pdf with mean 0 and variance 1. From the symmetry of fX,Y1x, y2 in x and y, we conclude that the marginal pdf of Y is also a one-dimensional Gaussian pdf with zero mean and unit variance. 5.5 INDEPENDENCE OF TWO RANDOM VARIABLES X and Y are independent random variables if any event A 1 defined in terms of X is independent of any event A 2 defined in terms of Y; that is, P3X in A 1 , Y in A 24 = P3X in A 14P3Y in A 24. (5.19) In this section we present a simple set of conditions for determining when X and Y are independent. Suppose that X and Y are a pair of discrete random variables, and suppose we are interested in the probability of the event A = A 1 ¨ A 2 , where A 1 involves only X and A 2 involves only Y. In particular, if X and Y are independent, then A 1 and A 2 are independent events. If we let A 1 = 5X = xj6 and A 2 = 5Y = yk6, then the Section 5.5 Independence of Two Random Variables 255 independence of X and Y implies that pX,Y1xj , yk2 = P3X = xj , Y = yk4 = P3X = xj4P3Y = yk4 = pX1xj2pY1yk2 for all xj and yk . (5.20) Therefore, if X and Y are independent discrete random variables, then the joint pmf is equal to the product of the marginal pmf’s. Now suppose that we don’t know if X and Y are independent, but we do know that the pmf satisfies Eq. (5.20). Let A = A 1 ¨ A 2 be a product-form event as above, then P3A4 = a a pX,Y1xj , yk2 xj in A1 yk in A2 = a a pX1xj2pY1yk2 xj in A1 yk in A2 = a pX1xj2 a pY1yk2 xj in A1 yk in A2 = P3A 14P3A 24, (5.21) which implies that A 1 and A 2 are independent events. 
Therefore, if the joint pmf of X and Y equals the product of the marginal pmf’s, then X and Y are independent. We have just proved that the statement “X and Y are independent” is equivalent to the statement “the joint pmf is equal to the product of the marginal pmf’s.” In mathematical language, we say, the “discrete random variables X and Y are independent if and only if the joint pmf is equal to the product of the marginal pmf’s for all xj , yk .” Example 5.19 Is the pmf in Example 5.6 consistent with an experiment that consists of the independent tosses of two fair dice? The probability of each face in a toss of a fair die is 1/6. If two fair dice are tossed and if the tosses are independent, then the probability of any pair of faces, say j and k, is: P3X = j, Y = k4 = P3X = j4P3Y = k4 = 1 . 36 Thus all possible pairs of outcomes should be equiprobable. This is not the case for the joint pmf given in Example 5.6. Therefore the tosses in Example 5.6 are not independent. Example 5.20 Are Q and R in Example 5.9 independent? From Example 5.9 we have P3Q = q4P3R = r4 = 11 - pM21pM2q = 11 - p2pMq + r 11 - p2 1 - pM pr 256 Chapter 5 Pairs of Random Variables = P3Q = q, R = r4 for all q = 0, 1, Á r = 0, Á , M - 1. Therefore Q and R are independent. In general, it can be shown that the random variables X and Y are independent if and only if their joint cdf is equal to the product of its marginal cdf’s: FX,Y1x, y2 = FX1x2FY1y2 for all x and y. (5.22) Similarly, if X and Y are jointly continuous, then X and Y are independent if and only if their joint pdf is equal to the product of the marginal pdf’s: fX,Y1x, y2 = fX1x2fY1y2 for all x and y. (5.23) Equation (5.23) is obtained from Eq. (5.22) by differentiation. Conversely, Eq. (5.22) is obtained from Eq. (5.23) by integration. Example 5.21 Are the random variables X and Y in Example 5.16 independent? Note that fX1x2 and fY1y2 are nonzero for all x 7 0 and all y 7 0. 
Hence fX1x2fY1y2 is nonzero in the entire positive quadrant. However fX,Y1x, y2 is nonzero only in the region y 6 x inside the positive quadrant. Hence Eq. (5.23) does not hold for all x, y and the random variables are not independent. You should note that in this example the joint pdf appears to factor, but nevertheless it is not the product of the marginal pdf’s. Example 5.22 Are the random variables X and Y in Example 5.18 independent? The product of the marginal pdf’s of X and Y in Example 5.18 is fX1x2fY1y2 = 1 -1x2 + y22/2 e 2p - q 6 x, y 6 q . By comparing to Eq. (5.18) we see that the product of the marginals is equal to the joint pdf if and only if r = 0. Therefore the jointly Gaussian random variables X and Y are independent if and only if r = 0. We see in a later section that r is the correlation coefficient between X and Y. Example 5.23 Are the random variables X and Y independent in Example 5.12? If we multiply the marginal cdf’s found in Example 5.12 we find FX1x2FY1y2 = 11 - e -ax211 - e -by2 = FX,Y1x, y2 for all x and y. Therefore Eq. (5.22) is satisfied so X and Y are independent. If X and Y are independent random variables, then the random variables defined by any pair of functions g(X) and h(Y) are also independent. To show this, consider the Section 5.6 Joint Moments and Expected Values of a Function of Two Random Variables 257 one-dimensional events A and B. Let A¿ be the set of all values of x such that if x is in A¿ then g(x) is in A, and let B¿ be the set of all values of y such that if y is in B¿ then h(y) is in B. (In Chapter 3 we called A¿ and B¿ the equivalent events of A and B.) Then P3g1X2 in A, h1Y2 in B4 = P3X in A¿, Y in B¿4 = P3X in A¿4P3Y in B¿4 = P3g1X2 in A4P3h1Y2 in B4. (5.24) The first and third equalities follow from the fact that A and A¿ and B and B¿ are equivalent events. The second equality follows from the independence of X and Y. Thus g(X) and h(Y) are independent random variables. 
5.6 JOINT MOMENTS AND EXPECTED VALUES OF A FUNCTION OF TWO RANDOM VARIABLES

The expected value of X identifies the center of mass of the distribution of X. The variance, which is defined as the expected value of $(X - m)^2$, provides a measure of the spread of the distribution. In the case of two random variables we are interested in how X and Y vary together. In particular, we are interested in whether the variations of X and Y are correlated. For example, if X increases, does Y tend to increase or to decrease? The joint moments of X and Y, which are defined as expected values of functions of X and Y, provide this information.

5.6.1 Expected Value of a Function of Two Random Variables

The problem of finding the expected value of a function of two or more random variables is similar to that of finding the expected value of a function of a single random variable. It can be shown that the expected value of $Z = g(X, Y)$ can be found using the following expressions:
$$E[Z] = \begin{cases} \displaystyle\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, y)\,f_{X,Y}(x, y)\,dx\,dy & X, Y \text{ jointly continuous} \\[1ex] \displaystyle\sum_i \sum_n g(x_i, y_n)\,p_{X,Y}(x_i, y_n) & X, Y \text{ discrete.} \end{cases} \tag{5.25}$$

Example 5.24 Sum of Random Variables
Let $Z = X + Y$. Find E[Z].
$$E[Z] = E[X + Y] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x' + y')\,f_{X,Y}(x', y')\,dx'\,dy'$$
$$= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x'\,f_{X,Y}(x', y')\,dy'\,dx' + \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y'\,f_{X,Y}(x', y')\,dx'\,dy'$$
$$= \int_{-\infty}^{\infty} x'\,f_X(x')\,dx' + \int_{-\infty}^{\infty} y'\,f_Y(y')\,dy' = E[X] + E[Y]. \tag{5.26}$$
Thus the expected value of the sum of two random variables is equal to the sum of the individual expected values. Note that X and Y need not be independent.

The result in Example 5.24 and a simple induction argument show that the expected value of a sum of n random variables is equal to the sum of the expected values:
$$E[X_1 + X_2 + \cdots + X_n] = E[X_1] + \cdots + E[X_n]. \tag{5.27}$$
Note that the random variables do not have to be independent.

Example 5.25 Product of Functions of Independent Random Variables
Suppose that X and Y are independent random variables, and let $g(X, Y) = g_1(X)\,g_2(Y)$.
Find $E[g(X, Y)] = E[g_1(X)\,g_2(Y)]$.
$$E[g_1(X)g_2(Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g_1(x')\,g_2(y')\,f_X(x')\,f_Y(y')\,dx'\,dy' = \left\{\int_{-\infty}^{\infty} g_1(x')\,f_X(x')\,dx'\right\}\left\{\int_{-\infty}^{\infty} g_2(y')\,f_Y(y')\,dy'\right\} = E[g_1(X)]\,E[g_2(Y)].$$

5.6.2 Joint Moments, Correlation, and Covariance

The joint moments of two random variables X and Y summarize information about their joint behavior. The jk-th joint moment of X and Y is defined by
$$E[X^j Y^k] = \begin{cases} \displaystyle\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x^j y^k\,f_{X,Y}(x, y)\,dx\,dy & X, Y \text{ jointly continuous} \\[1ex] \displaystyle\sum_i \sum_n x_i^j y_n^k\,p_{X,Y}(x_i, y_n) & X, Y \text{ discrete.} \end{cases} \tag{5.28}$$
If $j = 0$, we obtain the moments of Y, and if $k = 0$, we obtain the moments of X. In electrical engineering, it is customary to call the $j = 1$, $k = 1$ moment, E[XY], the correlation of X and Y. If $E[XY] = 0$, then we say that X and Y are orthogonal.

The jk-th central moment of X and Y is defined as the joint moment of the centered random variables $X - E[X]$ and $Y - E[Y]$:
$$E[(X - E[X])^j (Y - E[Y])^k].$$
Note that $j = 2$, $k = 0$ gives VAR(X) and $j = 0$, $k = 2$ gives VAR(Y).

The covariance of X and Y is defined as the $j = k = 1$ central moment:
$$\mathrm{COV}(X, Y) = E[(X - E[X])(Y - E[Y])].$$
The following form for COV(X, Y) is sometimes more convenient to work with:
$$\mathrm{COV}(X, Y) = E[XY - XE[Y] - YE[X] + E[X]E[Y]] \tag{5.29}$$
$$= E[XY] - 2E[X]E[Y] + E[X]E[Y] = E[XY] - E[X]E[Y]. \tag{5.30}$$
Note that $\mathrm{COV}(X, Y) = E[XY]$ if either of the random variables has mean zero.

Example 5.26 Covariance of Independent Random Variables
Let X and Y be independent random variables. Find their covariance.
$$\mathrm{COV}(X, Y) = E[(X - E[X])(Y - E[Y])] = E[X - E[X]]\,E[Y - E[Y]] = 0,$$
where the second equality follows from the fact that X and Y are independent, and the third equality follows from $E[X - E[X]] = E[X] - E[X] = 0$. Therefore pairs of independent random variables have covariance zero.

Let's see how the covariance measures the correlation between X and Y. The covariance measures the deviations from $m_X = E[X]$ and $m_Y = E[Y]$.
If a positive value of $(X - m_X)$ tends to be accompanied by a positive value of $(Y - m_Y)$, and a negative $(X - m_X)$ tends to be accompanied by a negative $(Y - m_Y)$, then $(X - m_X)(Y - m_Y)$ will tend to be positive, and its expected value, COV(X, Y), will be positive. This is the case for the scattergram in Fig. 5.3(d), where the observed points tend to cluster along a line of positive slope. On the other hand, if $(X - m_X)$ and $(Y - m_Y)$ tend to have opposite signs, then COV(X, Y) will be negative. A scattergram for this case would have observation points clustered along a line of negative slope. Finally, if $(X - m_X)$ and $(Y - m_Y)$ sometimes have the same sign and sometimes opposite signs, then COV(X, Y) will be close to zero. The three scattergrams in Figs. 5.3(a), (b), and (c) fall into this category.

Multiplying either X or Y by a large number will increase the covariance, so we need to normalize the covariance to measure the correlation on an absolute scale. The correlation coefficient of X and Y is defined by
$$\rho_{X,Y} = \frac{\mathrm{COV}(X, Y)}{\sigma_X \sigma_Y} = \frac{E[XY] - E[X]E[Y]}{\sigma_X \sigma_Y}, \tag{5.31}$$
where $\sigma_X = \sqrt{\mathrm{VAR}(X)}$ and $\sigma_Y = \sqrt{\mathrm{VAR}(Y)}$ are the standard deviations of X and Y, respectively.

The correlation coefficient is a number that is at most 1 in magnitude:
$$-1 \le \rho_{X,Y} \le 1. \tag{5.32}$$
To show Eq. (5.32), we begin with an inequality that results from the fact that the expected value of the square of a random variable is nonnegative:
$$0 \le E\left[\left(\frac{X - E[X]}{\sigma_X} \pm \frac{Y - E[Y]}{\sigma_Y}\right)^2\right] = 1 \pm 2\rho_{X,Y} + 1 = 2(1 \pm \rho_{X,Y}).$$
The last equation implies Eq. (5.32). The extreme values of $\rho_{X,Y}$ are achieved when X and Y are related linearly, $Y = aX + b$: $\rho_{X,Y} = 1$ if $a > 0$ and $\rho_{X,Y} = -1$ if $a < 0$. In Section 6.5 we show that $\rho_{X,Y}$ can be viewed as a statistical measure of the extent to which Y can be predicted by a linear function of X.

X and Y are said to be uncorrelated if $\rho_{X,Y} = 0$. If X and Y are independent, then $\mathrm{COV}(X, Y) = 0$, so $\rho_{X,Y} = 0$.
Thus if X and Y are independent, then X and Y are uncorrelated. In Example 5.22, we saw that if X and Y are jointly Gaussian and $\rho_{X,Y} = 0$, then X and Y are independent random variables. Example 5.27 shows that this is not always true for non-Gaussian random variables: it is possible for X and Y to be uncorrelated but not independent.

Example 5.27 Uncorrelated but Dependent Random Variables
Let $\Theta$ be uniformly distributed in the interval $(0, 2\pi)$. Let $X = \cos\Theta$ and $Y = \sin\Theta$. The point (X, Y) then corresponds to the point on the unit circle specified by the angle $\Theta$, as shown in Fig. 5.18. In Example 4.36, we saw that the marginal pdf's of X and Y are arcsine pdf's, which are nonzero in the interval $(-1, 1)$. The product of the marginals is nonzero in the square defined by $-1 \le x \le 1$ and $-1 \le y \le 1$, so if X and Y were independent, the point (X, Y) would assume all values in this square. This is not the case, so X and Y are dependent.

We now show that X and Y are uncorrelated:
$$E[XY] = E[\sin\Theta\cos\Theta] = \frac{1}{2\pi}\int_0^{2\pi} \sin\phi\cos\phi\,d\phi = \frac{1}{4\pi}\int_0^{2\pi} \sin 2\phi\,d\phi = 0.$$
Since $E[X] = E[Y] = 0$, Eq. (5.30) then implies that X and Y are uncorrelated.

Example 5.28
Let X and Y be the random variables discussed in Example 5.16. Find E[XY], COV(X, Y), and $\rho_{X,Y}$.
Equations (5.30) and (5.31) require that we find the means, variances, and correlation of X and Y. From the marginal pdf's of X and Y obtained in Example 5.16, we find that $E[X] = 3/2$ and $\mathrm{VAR}[X] = 5/4$, and that $E[Y] = 1/2$ and $\mathrm{VAR}[Y] = 1/4$. The correlation of X and Y is
$$E[XY] = \int_0^{\infty}\int_0^{x} xy\,2e^{-x}e^{-y}\,dy\,dx = \int_0^{\infty} 2xe^{-x}\left(1 - e^{-x} - xe^{-x}\right)dx = 1.$$

[FIGURE 5.18: (X, Y) is a point selected at random on the unit circle. X and Y are uncorrelated but not independent.]

Thus the correlation coefficient is given by
$$\rho_{X,Y} = \frac{1 - \left(\frac{3}{2}\right)\left(\frac{1}{2}\right)}{\sqrt{\frac{5}{4}}\sqrt{\frac{1}{4}}} = \frac{1}{\sqrt{5}}.$$
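Example 5.27 lends itself to a quick Monte Carlo check. The sketch below is illustrative code, not from the text; the sample size and seed are arbitrary choices. The estimated correlation coefficient of $X = \cos\Theta$ and $Y = \sin\Theta$ comes out near zero, even though every sample satisfies $X^2 + Y^2 = 1$, so the two variables are clearly dependent.

```python
import math
import random

def sample_corr(pairs):
    """Sample estimate of rho = (E[XY] - E[X]E[Y]) / (sigma_X sigma_Y)."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    cov = sum((x - mx) * (y - my) for x, y in pairs) / n
    vx = sum((x - mx) ** 2 for x, _ in pairs) / n
    vy = sum((y - my) ** 2 for _, y in pairs) / n
    return cov / math.sqrt(vx * vy)

random.seed(1)
pairs = []
for _ in range(200_000):
    theta = random.uniform(0.0, 2.0 * math.pi)   # Theta uniform on (0, 2*pi)
    pairs.append((math.cos(theta), math.sin(theta)))

# Near zero, yet x*x + y*y == 1 (to rounding) for every sample.
rho_hat = sample_corr(pairs)
```

With 200,000 samples the standard error of the estimate is on the order of 0.002, so a value within a few hundredths of zero is expected.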
5.7 CONDITIONAL PROBABILITY AND CONDITIONAL EXPECTATION

Many random variables of practical interest are not independent: the output Y of a communication channel must depend on the input X in order to convey information; consecutive samples of a waveform that varies slowly are likely to be close in value and hence are not independent. In this section we are interested in computing the probability of events concerning the random variable Y given that we know $X = x$. We are also interested in the expected value of Y given $X = x$. We show that the notions of conditional probability and conditional expectation are extremely useful tools in solving problems, even in situations where we are only concerned with one of the random variables.

5.7.1 Conditional Probability

The definition of conditional probability in Section 2.4 allows us to compute the probability that Y is in A given that we know that $X = x$:
$$P[Y \text{ in } A \mid X = x] = \frac{P[Y \text{ in } A,\ X = x]}{P[X = x]} \quad \text{for } P[X = x] > 0. \tag{5.33}$$

Case 1: X Is a Discrete Random Variable

For X and Y discrete random variables, the conditional pmf of Y given X = x is defined by:
$$p_Y(y \mid x) = P[Y = y \mid X = x] = \frac{P[X = x, Y = y]}{P[X = x]} = \frac{p_{X,Y}(x, y)}{p_X(x)} \tag{5.34}$$
for x such that $P[X = x] > 0$. We define $p_Y(y \mid x) = 0$ for x such that $P[X = x] = 0$. Note that $p_Y(y \mid x)$ is a function of y over the real line, and that $p_Y(y \mid x) > 0$ only for y in a discrete set $\{y_1, y_2, \dots\}$. The conditional pmf satisfies all the properties of a pmf, that is, it assigns nonnegative values to every y and these values add to 1. Note from Eq. (5.34) that $p_Y(y \mid x_k)$ is simply the cross section of $p_{X,Y}(x_k, y)$ along the $X = x_k$ column in Fig. 5.6, but normalized by the probability $p_X(x_k)$.

The probability of an event A given $X = x_k$ is found by adding the pmf values of the outcomes in A:
$$P[Y \text{ in } A \mid X = x_k] = \sum_{y_j \text{ in } A} p_Y(y_j \mid x_k). \tag{5.35}$$
If X and Y are independent, then using Eq. (5.20),
$$p_Y(y_j \mid x_k) = \frac{P[X = x_k, Y = y_j]}{P[X = x_k]} = P[Y = y_j] = p_Y(y_j).$$
(5.36)
In other words, knowledge that $X = x_k$ does not affect the probability of events A involving Y.

Equation (5.34) implies that the joint pmf $p_{X,Y}(x, y)$ can be expressed as the product of a conditional pmf and a marginal pmf:
$$p_{X,Y}(x_k, y_j) = p_Y(y_j \mid x_k)\,p_X(x_k) \quad \text{and} \quad p_{X,Y}(x_k, y_j) = p_X(x_k \mid y_j)\,p_Y(y_j). \tag{5.37}$$
This expression is very useful when we can view the pair (X, Y) as being generated sequentially, e.g., first X, and then Y given $X = x$. We find the probability that Y is in A as follows:
$$P[Y \text{ in } A] = \sum_{\text{all } x_k} \sum_{y_j \text{ in } A} p_{X,Y}(x_k, y_j) = \sum_{\text{all } x_k} \sum_{y_j \text{ in } A} p_Y(y_j \mid x_k)\,p_X(x_k) = \sum_{\text{all } x_k} p_X(x_k) \sum_{y_j \text{ in } A} p_Y(y_j \mid x_k) = \sum_{\text{all } x_k} P[Y \text{ in } A \mid X = x_k]\,p_X(x_k). \tag{5.38}$$
Equation (5.38) is simply a restatement of the theorem on total probability discussed in Chapter 2. In other words, to compute P[Y in A] we can first compute $P[Y \text{ in } A \mid X = x_k]$ and then "average" over $x_k$.

Example 5.29 Loaded Dice
Find $p_Y(y \mid 5)$ in the loaded dice experiment considered in Examples 5.6 and 5.8.
In Example 5.8 we found that $p_X(5) = 1/6$. Therefore:
$$p_Y(y \mid 5) = \frac{p_{X,Y}(5, y)}{p_X(5)},$$
and so $p_Y(5 \mid 5) = 2/7$ and $p_Y(1 \mid 5) = p_Y(2 \mid 5) = p_Y(3 \mid 5) = p_Y(4 \mid 5) = p_Y(6 \mid 5) = 1/7$. Clearly this die is loaded.

Example 5.30 Number of Defects in a Region; Random Splitting of Poisson Counts
The total number of defects X on a chip is a Poisson random variable with mean $\alpha$. Each defect has a probability p of falling in a specific region R, and the location of each defect is independent of the locations of the other defects. Find the pmf of the number of defects Y that fall in the region R.
We can imagine performing a Bernoulli trial each time a defect occurs, with a "success" occurring when the defect falls in the region R. If the total number of defects is $X = k$, then Y is a binomial random variable with parameters k and p:
$$p_Y(j \mid k) = \begin{cases} 0 & j > k \\ \dbinom{k}{j} p^j (1 - p)^{k - j} & 0 \le j \le k. \end{cases}$$
From Eq. (5.38) and noting that $k \ge j$, we have
$$p_Y(j) = \sum_{k=j}^{\infty} p_Y(j \mid k)\,p_X(k) = \sum_{k=j}^{\infty} \frac{k!}{j!(k-j)!}\,p^j (1-p)^{k-j}\,e^{-\alpha}\frac{\alpha^k}{k!} = \frac{(\alpha p)^j e^{-\alpha}}{j!} \sum_{k=j}^{\infty} \frac{\{(1-p)\alpha\}^{k-j}}{(k-j)!} = \frac{(\alpha p)^j e^{-\alpha}}{j!}\,e^{(1-p)\alpha} = \frac{(\alpha p)^j}{j!}\,e^{-\alpha p}.$$
Thus Y is a Poisson random variable with mean $\alpha p$.

Suppose Y is a continuous random variable. Eq. (5.33) can be used to define the conditional cdf of Y given X = x_k:
$$F_Y(y \mid x_k) = \frac{P[Y \le y,\ X = x_k]}{P[X = x_k]}, \quad \text{for } P[X = x_k] > 0. \tag{5.39}$$
It is easy to show that $F_Y(y \mid x_k)$ satisfies all the properties of a cdf. The conditional pdf of Y given X = x_k, if the derivative exists, is given by
$$f_Y(y \mid x_k) = \frac{d}{dy} F_Y(y \mid x_k). \tag{5.40}$$
If X and Y are independent, $P[Y \le y,\ X = x_k] = P[Y \le y]\,P[X = x_k]$, so $F_Y(y \mid x) = F_Y(y)$ and $f_Y(y \mid x) = f_Y(y)$. The probability of event A given $X = x_k$ is obtained by integrating the conditional pdf:
$$P[Y \text{ in } A \mid X = x_k] = \int_{y \text{ in } A} f_Y(y \mid x_k)\,dy. \tag{5.41}$$
We obtain P[Y in A] using Eq. (5.38).

Example 5.31 Binary Communications System
The input X to a communication channel assumes the values +1 or -1 with probabilities 1/3 and 2/3. The output Y of the channel is given by $Y = X + N$, where N is a zero-mean, unit-variance Gaussian random variable. Find the conditional pdf of Y given $X = +1$, and given $X = -1$. Find $P[X = +1 \mid Y > 0]$.
The conditional cdf of Y given $X = +1$ is:
$$F_Y(y \mid +1) = P[Y \le y \mid X = +1] = P[N + 1 \le y] = P[N \le y - 1] = \int_{-\infty}^{y-1} \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}\,dx,$$
where we noted that if $X = +1$, then $Y = N + 1$ and Y depends only on N. Thus, if $X = +1$, then Y is a Gaussian random variable with mean 1 and unit variance. Similarly, if $X = -1$, then Y is Gaussian with mean -1 and unit variance. The probabilities that $Y > 0$ given $X = +1$ and $X = -1$ are:
$$P[Y > 0 \mid X = +1] = \int_0^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{-(x-1)^2/2}\,dx = \int_{-1}^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{-t^2/2}\,dt = 1 - Q(1) = 0.841.$$
$$P[Y > 0 \mid X = -1] = \int_0^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{-(x+1)^2/2}\,dx = \int_{1}^{\infty} \frac{1}{\sqrt{2\pi}}\,e^{-t^2/2}\,dt = Q(1) = 0.159.$$
Applying Eq. (5.38), we obtain:
$$P[Y > 0] = P[Y > 0 \mid X = +1]\,\tfrac{1}{3} + P[Y > 0 \mid X = -1]\,\tfrac{2}{3} = 0.386.$$
From Bayes' theorem we find:
$$P[X = +1 \mid Y > 0] = \frac{P[Y > 0 \mid X = +1]\,P[X = +1]}{P[Y > 0]} = \frac{(1 - Q(1))/3}{(1 + Q(1))/3} = 0.726.$$
We conclude that if $Y > 0$, then $X = +1$ is more likely than $X = -1$. Therefore the receiver should decide that the input is $X = +1$ when it observes $Y > 0$.

In the previous example, we made an interesting step that is worth elaborating on because it comes up quite frequently: $P[Y \le y \mid X = +1] = P[N + 1 \le y]$, where $Y = X + N$. Let's take a closer look:
$$P[Y \le z \mid X = x] = \frac{P[\{X + N \le z\} \cap \{X = x\}]}{P[X = x]} = \frac{P[\{x + N \le z\} \cap \{X = x\}]}{P[X = x]} = P[x + N \le z \mid X = x] = P[N \le z - x \mid X = x].$$
In the first line, the events $\{X + N \le z\}$ and $\{x + N \le z\}$ are quite different. The first involves the two random variables X and N, whereas the second only involves N and consequently is much simpler. We can then apply an expression such as Eq. (5.38) to obtain $P[Y \le z]$. The step we made in the example, however, is even more interesting. Since X and N are independent random variables, we can take the expression one step further:
$$P[Y \le z \mid X = x] = P[N \le z - x \mid X = x] = P[N \le z - x].$$
The independence of X and N allows us to dispense with the conditioning on x altogether!

Case 2: X Is a Continuous Random Variable

If X is a continuous random variable, then $P[X = x] = 0$, so Eq. (5.33) is undefined for all x. If X and Y have a joint pdf that is continuous and nonzero over some region of the plane, we define the conditional cdf of Y given X = x by the following limiting procedure:
$$F_Y(y \mid x) = \lim_{h \to 0} F_Y(y \mid x < X \le x + h). \tag{5.42}$$
The conditional cdf on the right side of Eq. (5.42) is:
$$F_Y(y \mid x < X \le x + h) = \frac{P[Y \le y,\ x < X \le x + h]}{P[x < X \le x + h]} = \frac{\int_{-\infty}^{y} \int_x^{x+h} f_{X,Y}(x', y')\,dx'\,dy'}{\int_x^{x+h} f_X(x')\,dx'} \approx \frac{\int_{-\infty}^{y} f_{X,Y}(x, y')\,dy'\,h}{f_X(x)\,h}. \tag{5.43}$$
As we let h approach zero, Eqs. (5.42) and (5.43) imply that
$$F_Y(y \mid x) = \frac{\int_{-\infty}^{y} f_{X,Y}(x, y')\,dy'}{f_X(x)}.$$
(5.44)
The conditional pdf of Y given X = x is then:
$$f_Y(y \mid x) = \frac{d}{dy} F_Y(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)}. \tag{5.45}$$

[FIGURE 5.19: Interpretation of the conditional pdf: $f_Y(y \mid x)\,dy \approx f_{X,Y}(x, y)\,dx\,dy / f_X(x)\,dx$.]

It is easy to show that $f_Y(y \mid x)$ satisfies the properties of a pdf. We can interpret $f_Y(y \mid x)\,dy$ as the probability that Y is in the infinitesimal strip defined by $(y, y + dy)$ given that X is in the infinitesimal strip defined by $(x, x + dx)$, as shown in Fig. 5.19. The probability of event A given $X = x$ is obtained as follows:
$$P[Y \text{ in } A \mid X = x] = \int_{y \text{ in } A} f_Y(y \mid x)\,dy. \tag{5.46}$$
There is a strong resemblance between Eq. (5.34) for the discrete case and Eq. (5.45) for the continuous case. Indeed many of the same properties hold. For example, we obtain the multiplication rule from Eq. (5.45):
$$f_{X,Y}(x, y) = f_Y(y \mid x)\,f_X(x) \quad \text{and} \quad f_{X,Y}(x, y) = f_X(x \mid y)\,f_Y(y). \tag{5.47}$$
If X and Y are independent, then $f_{X,Y}(x, y) = f_X(x)\,f_Y(y)$ and $f_Y(y \mid x) = f_Y(y)$, $f_X(x \mid y) = f_X(x)$, $F_Y(y \mid x) = F_Y(y)$, and $F_X(x \mid y) = F_X(x)$. By combining Eqs. (5.46) and (5.47), we can show that:
$$P[Y \text{ in } A] = \int_{-\infty}^{\infty} P[Y \text{ in } A \mid X = x]\,f_X(x)\,dx. \tag{5.48}$$
You can think of Eq. (5.48) as the "continuous" version of the theorem on total probability. The following examples show the usefulness of the above results in calculating the probabilities of complicated events.

Example 5.32
Let X and Y be the random variables in Example 5.16. Find $f_X(x \mid y)$ and $f_Y(y \mid x)$.
Using the marginal pdf's obtained in Example 5.16, we have
$$f_X(x \mid y) = \frac{2e^{-x}e^{-y}}{2e^{-2y}} = e^{-(x - y)} \quad \text{for } x \ge y,$$
$$f_Y(y \mid x) = \frac{2e^{-x}e^{-y}}{2e^{-x}(1 - e^{-x})} = \frac{e^{-y}}{1 - e^{-x}} \quad \text{for } 0 < y < x.$$
The conditional pdf of X is an exponential pdf shifted by y to the right. The conditional pdf of Y is an exponential pdf that has been truncated to the interval [0, x].
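Like any pdf, the conditional pdf's in Example 5.32 must each integrate to 1 over their supports. The following sketch checks this numerically; it is illustrative code, not from the text, the conditioning points are arbitrary, and the midpoint-rule integrator is our own helper (the shifted exponential's infinite tail is truncated at a large but finite upper limit).

```python
import math

def riemann(f, a, b, n=100_000):
    """Midpoint Riemann sum of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

x0, y0 = 2.0, 0.7   # arbitrary conditioning points (illustrative values)

# f_X(x | y) = e^{-(x - y)} for x >= y  (exponential shifted right by y)
fx_given_y = lambda x: math.exp(-(x - y0))

# f_Y(y | x) = e^{-y} / (1 - e^{-x}) for 0 < y < x  (truncated exponential)
fy_given_x = lambda y: math.exp(-y) / (1.0 - math.exp(-x0))

area_x = riemann(fx_given_y, y0, y0 + 50.0)  # tail beyond y0 + 50 is negligible
area_y = riemann(fy_given_x, 0.0, x0)
```

Both areas come out equal to 1 to within the quadrature error, confirming that dividing the joint pdf by the marginal renormalizes the cross section into a proper pdf.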
Example 5.33 Number of Arrivals During a Customer's Service Time
The number N of customers that arrive at a service station during a time t is a Poisson random variable with parameter $\beta t$. The time T required to service each customer is an exponential random variable with parameter $\alpha$. Find the pmf for the number N that arrive during the service time T of a specific customer. Assume that the customer arrivals are independent of the customer service time.
Equation (5.48) holds even if Y is a discrete random variable, thus
$$P[N = k] = \int_0^{\infty} P[N = k \mid T = t]\,f_T(t)\,dt = \int_0^{\infty} \frac{(\beta t)^k}{k!}\,e^{-\beta t}\,\alpha e^{-\alpha t}\,dt = \frac{\alpha \beta^k}{k!} \int_0^{\infty} t^k e^{-(\alpha + \beta)t}\,dt.$$
Let $r = (\alpha + \beta)t$; then
$$P[N = k] = \frac{\alpha \beta^k}{k!\,(\alpha + \beta)^{k+1}} \int_0^{\infty} r^k e^{-r}\,dr = \frac{\alpha \beta^k}{(\alpha + \beta)^{k+1}} = \left(\frac{\alpha}{\alpha + \beta}\right)\left(\frac{\beta}{\alpha + \beta}\right)^k,$$
where we have used the fact that the last integral is a gamma function and is equal to k!. Thus N is a geometric random variable with probability of "success" $\alpha/(\alpha + \beta)$. Each time a customer arrives, we can imagine that a new Bernoulli trial begins, where "success" occurs if the customer's service time is completed before the next arrival.

Example 5.34
X is selected at random from the unit interval; Y is then selected at random from the interval (0, X). Find the cdf of Y.
When $X = x$, Y is uniformly distributed in (0, x), so the conditional cdf given $X = x$ is
$$P[Y \le y \mid X = x] = \begin{cases} y/x & 0 \le y \le x \\ 1 & x < y. \end{cases}$$
Equation (5.48) and the above conditional cdf yield:
$$F_Y(y) = P[Y \le y] = \int_0^1 P[Y \le y \mid X = x']\,f_X(x')\,dx' = \int_0^y 1\,dx' + \int_y^1 \frac{y}{x'}\,dx' = y - y\ln y.$$
The corresponding pdf is obtained by taking the derivative of the cdf:
$$f_Y(y) = -\ln y, \quad 0 < y \le 1.$$

Example 5.35 Maximum A Posteriori Receiver
For the communications system in Example 5.31, find the probability that the input was $X = +1$ given that the output of the channel is $Y = y$.
This is a tricky version of Bayes' rule.
Condition on the event $\{y < Y \le y + \Delta\}$ instead of $\{Y = y\}$:
$$P[X = +1 \mid y < Y \le y + \Delta] = \frac{P[y < Y \le y + \Delta \mid X = +1]\,P[X = +1]}{P[y < Y \le y + \Delta]} = \frac{f_Y(y \mid +1)\,\Delta\,(1/3)}{f_Y(y \mid +1)\,\Delta\,(1/3) + f_Y(y \mid -1)\,\Delta\,(2/3)}$$
$$= \frac{\frac{1}{\sqrt{2\pi}}\,e^{-(y-1)^2/2}\,(1/3)}{\frac{1}{\sqrt{2\pi}}\,e^{-(y-1)^2/2}\,(1/3) + \frac{1}{\sqrt{2\pi}}\,e^{-(y+1)^2/2}\,(2/3)} = \frac{e^{-(y-1)^2/2}}{e^{-(y-1)^2/2} + 2e^{-(y+1)^2/2}} = \frac{1}{1 + 2e^{-2y}}.$$
The above expression is equal to 1/2 when $2e^{-2y} = 1$, that is, at $y_T = (\ln 2)/2 = 0.3466$. For $y > y_T$, $X = +1$ is more likely, and for $y < y_T$, $X = -1$ is more likely. A receiver that selects the input X that is more likely given $Y = y$ is called a maximum a posteriori receiver.

5.7.2 Conditional Expectation

The conditional expectation of Y given X = x is defined by
$$E[Y \mid x] = \int_{-\infty}^{\infty} y\,f_Y(y \mid x)\,dy. \tag{5.49a}$$
In the special case where X and Y are both discrete random variables we have:
$$E[Y \mid x_k] = \sum_{y_j} y_j\,p_Y(y_j \mid x_k). \tag{5.49b}$$
Clearly, $E[Y \mid x]$ is simply the center of mass associated with the conditional pdf or pmf.

The conditional expectation $E[Y \mid x]$ can be viewed as defining a function of x: $g(x) = E[Y \mid x]$. It therefore makes sense to talk about the random variable $g(X) = E[Y \mid X]$. We can imagine that a random experiment is performed and a value for X is obtained, say $X = x_0$, and then the value $g(x_0) = E[Y \mid x_0]$ is produced. We are interested in $E[g(X)] = E[E[Y \mid X]]$. In particular, we now show that
$$E[Y] = E[E[Y \mid X]], \tag{5.50}$$
where the right-hand side is
$$E[E[Y \mid X]] = \int_{-\infty}^{\infty} E[Y \mid x]\,f_X(x)\,dx \quad X \text{ continuous} \tag{5.51a}$$
$$E[E[Y \mid X]] = \sum_{x_k} E[Y \mid x_k]\,p_X(x_k) \quad X \text{ discrete.} \tag{5.51b}$$
We prove Eq. (5.50) for the case where X and Y are jointly continuous random variables:
$$E[E[Y \mid X]] = \int_{-\infty}^{\infty} E[Y \mid x]\,f_X(x)\,dx = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y\,f_Y(y \mid x)\,dy\,f_X(x)\,dx = \int_{-\infty}^{\infty} y \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx\,dy = \int_{-\infty}^{\infty} y\,f_Y(y)\,dy = E[Y].$$
The above result also holds for the expected value of a function of Y: $E[h(Y)] = E[E[h(Y) \mid X]]$. In particular, the kth moment of Y is given by $E[Y^k] = E[E[Y^k \mid X]]$.
Example 5.36 Average Number of Defects in a Region
Find the mean of Y in Example 5.30 using conditional expectation.
$$E[Y] = \sum_{k=0}^{\infty} E[Y \mid X = k]\,P[X = k] = \sum_{k=0}^{\infty} kp\,P[X = k] = pE[X] = p\alpha.$$
The second equality uses the fact that $E[Y \mid X = k] = kp$, since Y is binomial with parameters k and p. Note that the second-to-last equality holds for any pmf of X. The fact that X is Poisson with mean $\alpha$ is not used until the last equality.

Example 5.37 Binary Communications Channel
Find the mean of the output Y in the communications channel in Example 5.31.
Since Y is a Gaussian random variable with mean +1 when $X = +1$, and mean -1 when $X = -1$, the conditional expected values of Y given X are $E[Y \mid +1] = 1$ and $E[Y \mid -1] = -1$. Equation (5.51b) implies
$$E[Y] = \sum_{k = \pm 1} E[Y \mid X = k]\,P[X = k] = +1\,(1/3) - 1\,(2/3) = -1/3.$$
The mean is negative because the $X = -1$ inputs occur twice as often as the $X = +1$ inputs.

Example 5.38 Average Number of Arrivals in a Service Time
Find the mean and variance of the number of customer arrivals N during the service time T of a specific customer in Example 5.33.
N is a Poisson random variable with parameter $\beta t$ when $T = t$ is given, so the first two conditional moments are:
$$E[N \mid T = t] = \beta t, \qquad E[N^2 \mid T = t] = \beta t + (\beta t)^2.$$
The first two moments of N are obtained from Eq. (5.50):
$$E[N] = \int_0^{\infty} E[N \mid T = t]\,f_T(t)\,dt = \int_0^{\infty} \beta t\,f_T(t)\,dt = \beta E[T]$$
$$E[N^2] = \int_0^{\infty} E[N^2 \mid T = t]\,f_T(t)\,dt = \int_0^{\infty} \{\beta t + \beta^2 t^2\}\,f_T(t)\,dt = \beta E[T] + \beta^2 E[T^2].$$
The variance of N is then
$$\mathrm{VAR}[N] = E[N^2] - (E[N])^2 = \beta^2 E[T^2] + \beta E[T] - \beta^2 (E[T])^2 = \beta^2\,\mathrm{VAR}[T] + \beta E[T].$$
Note that if T is not random (i.e., $E[T]$ = constant and $\mathrm{VAR}[T] = 0$), then the mean and variance of N are those of a Poisson random variable with parameter $\beta E[T]$. When T is random, the mean of N remains the same but the variance of N increases by the term $\beta^2\,\mathrm{VAR}[T]$; that is, the variability of T causes greater variability in N.
Up to this point, we have intentionally avoided using the fact that T has an exponential distribution, to emphasize that the above results hold for any service time distribution $f_T(t)$. If T is exponential with parameter $\alpha$, then $E[T] = 1/\alpha$ and $\mathrm{VAR}[T] = 1/\alpha^2$, so
$$E[N] = \frac{\beta}{\alpha} \quad \text{and} \quad \mathrm{VAR}[N] = \frac{\beta^2}{\alpha^2} + \frac{\beta}{\alpha}.$$

5.8 FUNCTIONS OF TWO RANDOM VARIABLES

Quite often we are interested in one or more functions of the random variables associated with some experiment. For example, if we make repeated measurements of the same random quantity, we might be interested in the maximum and minimum values in the set, as well as the sample mean and sample variance. In this section we present methods of determining the probabilities of events involving functions of two random variables.

5.8.1 One Function of Two Random Variables

Let the random variable Z be defined as a function of two random variables:
$$Z = g(X, Y). \tag{5.52}$$
The cdf of Z is found by first finding the equivalent event of $\{Z \le z\}$, that is, the set $R_z = \{\mathbf{x} = (x, y) : g(\mathbf{x}) \le z\}$; then
$$F_Z(z) = P[\mathbf{X} \text{ in } R_z] = \iint_{(x, y) \in R_z} f_{X,Y}(x', y')\,dx'\,dy'. \tag{5.53}$$
The pdf of Z is then found by taking the derivative of $F_Z(z)$.

Example 5.39 Sum of Two Random Variables
Let $Z = X + Y$. Find $F_Z(z)$ and $f_Z(z)$ in terms of the joint pdf of X and Y.
The cdf of Z is found by integrating the joint pdf of X and Y over the region of the plane corresponding to the event $\{Z \le z\}$, the half-plane below the line $y = -x + z$ shown in Fig. 5.20:

[FIGURE 5.20: $P[Z \le z] = P[X + Y \le z]$.]

$$F_Z(z) = \int_{-\infty}^{\infty} \int_{-\infty}^{z - x'} f_{X,Y}(x', y')\,dy'\,dx'.$$
The pdf of Z is
$$f_Z(z) = \frac{d}{dz} F_Z(z) = \int_{-\infty}^{\infty} f_{X,Y}(x', z - x')\,dx'. \tag{5.54}$$
Thus the pdf for the sum of two random variables is given by a superposition integral. If X and Y are independent random variables, then by Eq. (5.23) the pdf is given by the convolution integral of the marginal pdf's of X and Y:
$$f_Z(z) = \int_{-\infty}^{\infty} f_X(x')\,f_Y(z - x')\,dx'.$$
(5.55) L- q In Chapter 7 we show how transform methods are used to evaluate convolution integrals such as Eq. (5.55). Example 5.40 Sum of Nonindependent Gaussian Random Variables Find the pdf of the sum Z = X + Y of two zero-mean, unit-variance Gaussian random variables with correlation coefficient r = -1/2. The joint pdf for this pair of random variables was given in Example 5.18. The pdf of Z is obtained by substituting the pdf for the joint Gaussian random variables into the superposition integral found in Example 5.39: fZ1z2 = q L- q fX,Y1x¿, z - x¿2 dx¿ q = 1 2 2 2 e -3x¿ - 2rx¿1z - x¿2 + 1z - x¿2 4/211 - r 2 dx¿ 2p11 - r221/2 L- q = 1 2 2 e -1x¿ - x¿z + z 2/213/42 dx¿. 2p13/421/2 L- q q After completing the square of the argument in the exponent we obtain fZ1z2 = 2 e -z /2 22p . Thus the sum of these two nonindependent Gaussian random variables is also a zero-mean, unitvariance Gaussian random variable. Example 5.41 A System with Standby Redundancy A system with standby redundancy has a single key component in operation and a duplicate of that component in standby mode. When the first component fails, the second component is put into operation. Find the pdf of the lifetime of the standby system if the components have independent exponentially distributed lifetimes with the same mean. Let T1 and T2 be the lifetimes of the two components, then the system lifetime is T = T1 + T2 , and the pdf of T is given by Eq. (5.55). The terms in the integrand are Section 5.8 fT11x2 = b fT21z - x2 = b le-lx 0 Functions of Two Random Variables 273 x Ú 0 x 6 0 le-l1z - x2 0 z - x Ú 0 x 7 z. Note that the first equation sets the lower limit of integration to 0 and the second equation sets the upper limit to z. Equation (5.55) becomes fT1z2 = L0 z le-lxle-l1z - x2 dx = l2e-lz L0 z dx = l2ze-lz. Thus T is an Erlang random variable with parameter m = 2. The conditional pdf can be used to find the pdf of a function of several random variables. 
Let $Z = g(X, Y)$, and suppose we are given that $Y = y$; then $Z = g(X, y)$ is a function of one random variable. Therefore we can use the methods developed in Section 4.5 for single random variables to find the pdf of Z given $Y = y$: $f_Z(z \mid Y = y)$. The pdf of Z is then found from
$$f_Z(z) = \int_{-\infty}^{\infty} f_Z(z \mid y')\,f_Y(y')\,dy'.$$

Example 5.42
Let $Z = X/Y$. Find the pdf of Z if X and Y are independent and both exponentially distributed with mean one.
Assume $Y = y$; then $Z = X/y$ is simply a scaled version of X. Therefore, from Example 4.31,
$$f_Z(z \mid y) = |y|\,f_X(yz \mid y).$$
The pdf of Z is therefore
$$f_Z(z) = \int_{-\infty}^{\infty} |y'|\,f_X(y'z \mid y')\,f_Y(y')\,dy' = \int_{-\infty}^{\infty} |y'|\,f_{X,Y}(y'z, y')\,dy'.$$
We now use the fact that X and Y are independent and exponentially distributed with mean one:
$$f_Z(z) = \int_0^{\infty} y'\,f_X(y'z)\,f_Y(y')\,dy' = \int_0^{\infty} y'\,e^{-y'z}\,e^{-y'}\,dy' = \frac{1}{(1 + z)^2}, \quad z > 0.$$

5.8.2 Transformations of Two Random Variables

Let X and Y be random variables associated with some experiment, and let the random variables $Z_1$ and $Z_2$ be defined by two functions of $\mathbf{X} = (X, Y)$:
$$Z_1 = g_1(\mathbf{X}) \quad \text{and} \quad Z_2 = g_2(\mathbf{X}).$$
We now consider the problem of finding the joint cdf and pdf of $Z_1$ and $Z_2$. The joint cdf of $Z_1$ and $Z_2$ at the point $\mathbf{z} = (z_1, z_2)$ is equal to the probability of the region of $\mathbf{x}$ where $g_k(\mathbf{x}) \le z_k$ for $k = 1, 2$:
$$F_{Z_1, Z_2}(z_1, z_2) = P[g_1(\mathbf{X}) \le z_1,\ g_2(\mathbf{X}) \le z_2]. \tag{5.56a}$$
If X, Y have a joint pdf, then
$$F_{Z_1, Z_2}(z_1, z_2) = \iint_{\mathbf{x}':\, g_k(\mathbf{x}') \le z_k} f_{X,Y}(x', y')\,dx'\,dy'. \tag{5.56b}$$

Example 5.43
Let the random variables W and Z be defined by $W = \min(X, Y)$ and $Z = \max(X, Y)$. Find the joint cdf of W and Z in terms of the joint cdf of X and Y.
Equation (5.56a) implies that
$$F_{W,Z}(w, z) = P[\{\min(X, Y) \le w\} \cap \{\max(X, Y) \le z\}].$$
The region corresponding to this event is shown in Fig. 5.21.

[FIGURE 5.21: $\{\min(X, Y) \le w\} = \{X \le w\} \cup \{Y \le w\}$ and $\{\max(X, Y) \le z\} = \{X \le z\} \cap \{Y \le z\}$; the square region A has corners (w, w) and (z, z).]

From the figure it is clear that if $z > w$, the above probability is the probability of the semi-infinite rectangle defined by the
point (z, z) minus the square region denoted by A. Thus if $z > w$,
$$F_{W,Z}(w, z) = F_{X,Y}(z, z) - P[A] = F_{X,Y}(z, z) - \{F_{X,Y}(z, z) - F_{X,Y}(w, z) - F_{X,Y}(z, w) + F_{X,Y}(w, w)\} = F_{X,Y}(w, z) + F_{X,Y}(z, w) - F_{X,Y}(w, w).$$
If $z < w$, then $F_{W,Z}(w, z) = F_{X,Y}(z, z)$.

Example 5.44 Radius and Angle of Independent Gaussian Random Variables
Let X and Y be zero-mean, unit-variance independent Gaussian random variables. Find the joint cdf and pdf of R and $\Theta$, the radius and angle of the point (X, Y):
$$R = (X^2 + Y^2)^{1/2}, \qquad \Theta = \tan^{-1}(Y/X).$$
The joint cdf of R and $\Theta$ is:
$$F_{R,\Theta}(r_0, \theta_0) = P[R \le r_0,\ \Theta \le \theta_0] = \iint_{(x, y) \in R(r_0, \theta_0)} \frac{e^{-(x^2 + y^2)/2}}{2\pi}\,dx\,dy,$$
where
$$R(r_0, \theta_0) = \left\{(x, y) : \sqrt{x^2 + y^2} \le r_0,\ 0 < \tan^{-1}(y/x) \le \theta_0\right\}$$
is the pie-shaped region shown in Fig. 5.22.

[FIGURE 5.22: Region of integration $R(r_0, \theta_0)$ in Example 5.44.]

We change variables from Cartesian to polar coordinates to obtain:
$$F_{R,\Theta}(r_0, \theta_0) = \int_0^{\theta_0} \int_0^{r_0} \frac{e^{-r^2/2}}{2\pi}\,r\,dr\,d\theta = \frac{\theta_0}{2\pi}\left(1 - e^{-r_0^2/2}\right), \quad 0 < \theta_0 \le 2\pi,\ 0 < r_0 < \infty. \tag{5.57}$$
R and $\Theta$ are independent random variables, where R has a Rayleigh distribution and $\Theta$ is uniformly distributed in $(0, 2\pi)$. The joint pdf is obtained by taking partial derivatives with respect to r and $\theta$:
$$f_{R,\Theta}(r, \theta) = \frac{\partial^2}{\partial r\,\partial \theta}\,\frac{\theta}{2\pi}\left(1 - e^{-r^2/2}\right) = \frac{1}{2\pi}\left(r e^{-r^2/2}\right), \quad 0 < \theta \le 2\pi,\ 0 < r < \infty.$$
This transformation maps every point in the plane from Cartesian coordinates to polar coordinates. We can also go backwards from polar to Cartesian coordinates: first we generate independent Rayleigh R and uniform $\Theta$ random variables; we then transform R and $\Theta$ into Cartesian coordinates to obtain an independent pair of zero-mean, unit-variance Gaussians. Neat!

5.8.3 pdf of Linear Transformations

The joint pdf of $\mathbf{Z}$ can be found directly in terms of the joint pdf of $\mathbf{X}$ by finding the equivalent events of infinitesimal rectangles.
We consider the linear transformation of two random variables:
$$V = aX + bY, \qquad W = cX + eY, \qquad \text{or} \qquad \begin{bmatrix} V \\ W \end{bmatrix} = \begin{bmatrix} a & b \\ c & e \end{bmatrix}\begin{bmatrix} X \\ Y \end{bmatrix}.$$
Denote the above matrix by A. We will assume that A has an inverse, that is, it has determinant $|ae - bc| \ne 0$, so each point (v, w) has a unique corresponding point (x, y) obtained from
$$\begin{bmatrix} x \\ y \end{bmatrix} = A^{-1}\begin{bmatrix} v \\ w \end{bmatrix}. \tag{5.58}$$
Consider the infinitesimal rectangle shown in Fig. 5.23. The points in this rectangle are mapped into the parallelogram shown in the figure. The infinitesimal rectangle and the parallelogram are equivalent events, so their probabilities must be equal. Thus
$$f_{X,Y}(x, y)\,dx\,dy \approx f_{V,W}(v, w)\,dP,$$
where dP is the area of the parallelogram. The joint pdf of V and W is thus given by
$$f_{V,W}(v, w) = \frac{f_{X,Y}(x, y)}{\left|\dfrac{dP}{dx\,dy}\right|}, \tag{5.59}$$
where x and y are related to (v, w) by Eq. (5.58). Equation (5.59) states that the joint pdf of V and W at (v, w) is the pdf of X and Y at the corresponding point (x, y), but rescaled by the "stretch factor" $dP/dx\,dy$. It can be shown that $dP = |ae - bc|\,dx\,dy$, so the "stretch factor" is
$$\left|\frac{dP}{dx\,dy}\right| = \frac{|ae - bc|\,(dx\,dy)}{(dx\,dy)} = |ae - bc| = |A|,$$

[FIGURE 5.23: Image of an infinitesimal rectangle under a linear transformation: the rectangle with corners (x, y), (x + dx, y), (x, y + dy) maps to the parallelogram with corners (v, w), (v + a dx, w + c dx), (v + b dy, w + e dy), where $v = ax + by$ and $w = cx + ey$.]

where |A| is the determinant of A. The above result can be written compactly using matrix notation. Let the vector $\mathbf{Z}$ be $\mathbf{Z} = A\mathbf{X}$, where A is an $n \times n$ invertible matrix. The joint pdf of $\mathbf{Z}$ is then
$$f_{\mathbf{Z}}(\mathbf{z}) = \frac{f_{\mathbf{X}}(A^{-1}\mathbf{z})}{|A|}. \tag{5.60}$$

Example 5.45 Linear Transformation of Jointly Gaussian Random Variables
Let X and Y be the jointly Gaussian random variables introduced in Example 5.18. Let V and W be obtained from (X, Y) by
$$\begin{bmatrix} V \\ W \end{bmatrix} = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}\begin{bmatrix} X \\ Y \end{bmatrix} = A\begin{bmatrix} X \\ Y \end{bmatrix}.$$
Find the joint pdf of V and W.
The determinant of the matrix is |A| = 1, and the inverse mapping is given by

(X, Y)ᵀ = (1/√2) [1 −1; 1 1] (V, W)ᵀ,

so X = (V − W)/√2 and Y = (V + W)/√2. Therefore the pdf of V and W is

f_{V,W}(v, w) = f_{X,Y}((v − w)/√2, (v + w)/√2),

where

f_{X,Y}(x, y) = 1/(2π√(1 − ρ²)) exp{−(x² − 2ρxy + y²)/(2(1 − ρ²))}.

By substituting for x and y, the argument of the exponent becomes

[(v − w)²/2 − 2ρ(v − w)(v + w)/2 + (v + w)²/2] / (2(1 − ρ²)) = v²/(2(1 + ρ)) + w²/(2(1 − ρ)).

Thus

f_{V,W}(v, w) = 1/(2π(1 − ρ²)^{1/2}) exp{−[v²/(2(1 + ρ)) + w²/(2(1 − ρ))]}.

It can be seen that the transformed variables V and W are independent, zero-mean Gaussian random variables with variances 1 + ρ and 1 − ρ, respectively. Figure 5.24 shows contours of equal value of the joint pdf of (X, Y). The pdf has elliptical symmetry about the origin, with principal axes at 45° with respect to the axes of the plane. In Section 5.9 we show that the above linear transformation corresponds to a rotation of the coordinate system so that the axes of the plane are aligned with the axes of the ellipse.

5.9 PAIRS OF JOINTLY GAUSSIAN RANDOM VARIABLES

Jointly Gaussian random variables appear in numerous applications in electrical engineering. They are frequently used to model signals in signal processing applications, and they are the most important model used in communication systems that deal with signals in the presence of noise. They also play a central role in many statistical methods. The random variables X and Y are said to be jointly Gaussian if their joint pdf has the form

f_{X,Y}(x, y) = exp{ −[((x − m₁)/σ₁)² − 2ρ_{X,Y}((x − m₁)/σ₁)((y − m₂)/σ₂) + ((y − m₂)/σ₂)²] / (2(1 − ρ²_{X,Y})) } / (2πσ₁σ₂√(1 − ρ²_{X,Y}))   (5.61a)

for −∞ < x < ∞ and −∞ < y < ∞. The pdf is centered at the point (m₁, m₂), and it has a bell shape that depends on the values of σ₁, σ₂, and ρ_{X,Y}, as shown in Fig. 5.25.
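A quick simulation supports Example 5.45. This Python sketch (illustrative correlation value, not from the text) draws correlated standard Gaussian pairs, applies V = (X + Y)/√2 and W = (−X + Y)/√2, and estimates the variances and covariance of the result:

```python
import math
import random

random.seed(1)
rho = 0.6        # illustrative correlation coefficient
n = 200_000
vs, ws = [], []
for _ in range(n):
    x = random.gauss(0, 1)
    # Given X = x, Y is Gaussian with mean rho*x and variance 1 - rho^2.
    y = rho * x + math.sqrt(1 - rho * rho) * random.gauss(0, 1)
    vs.append((x + y) / math.sqrt(2))
    ws.append((-x + y) / math.sqrt(2))

var_v = sum(v * v for v in vs) / n
var_w = sum(w * w for w in ws) / n
cov_vw = sum(v * w for v, w in zip(vs, ws)) / n
print(var_v, var_w, cov_vw)   # near 1 + rho, 1 - rho, and 0
```

The sample variances come out near 1 + ρ and 1 − ρ and the sample covariance near zero, matching the derived joint pdf of (V, W).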
As shown in the figure, the pdf is constant for values x and y for which the argument of the exponent is constant:

((x − m₁)/σ₁)² − 2ρ_{X,Y}((x − m₁)/σ₁)((y − m₂)/σ₂) + ((y − m₂)/σ₂)² = constant.   (5.61b)

FIGURE 5.24 Contours of equal value of the joint Gaussian pdf discussed in Example 5.45.

FIGURE 5.25 Jointly Gaussian pdf: (a) ρ = 0; (b) ρ = −0.9.

Figure 5.26 shows the orientation of these elliptical contours for various values of σ₁, σ₂, and ρ_{X,Y}. When ρ_{X,Y} = 0, that is, when X and Y are independent, the equal-pdf contour is an ellipse with principal axes aligned with the x- and y-axes. When ρ_{X,Y} ≠ 0, the major axis of the ellipse is oriented along the angle [Edwards and Penney, pp. 570–571]

θ = (1/2) arctan( 2ρ_{X,Y}σ₁σ₂ / (σ₁² − σ₂²) ).   (5.62)

Note that the angle is 45° when the variances are equal.

FIGURE 5.26 Orientation of contours of equal value of the joint Gaussian pdf for ρ_{X,Y} > 0: (a) 0 < θ < π/4 when σ₁ > σ₂; (b) θ = π/4 when σ₁ = σ₂; (c) π/4 < θ < π/2 when σ₁ < σ₂.

The marginal pdf of X is found by integrating f_{X,Y}(x, y) over all y. The integration is carried out by completing the square in the exponent as was done in Example 5.18. The result is that the marginal pdf of X is

f_X(x) = e^{−(x − m₁)²/2σ₁²} / (√(2π) σ₁),   (5.63)

that is, X is a Gaussian random variable with mean m₁ and variance σ₁². Similarly, the marginal pdf of Y is found to be Gaussian with mean m₂ and variance σ₂².

The conditional pdf's f_X(x|y) and f_Y(y|x) give us information about the interrelation between X and Y. The conditional pdf of X given Y = y is

f_X(x|y) = f_{X,Y}(x, y) / f_Y(y)
         = exp{ −[x − ρ_{X,Y}(σ₁/σ₂)(y − m₂) − m₁]² / (2(1 − ρ²_{X,Y})σ₁²) } / √(2πσ₁²(1 − ρ²_{X,Y})).
(5.64)

Equation (5.64) shows that the conditional pdf of X given Y = y is also Gaussian, but with conditional mean m₁ + ρ_{X,Y}(σ₁/σ₂)(y − m₂) and conditional variance σ₁²(1 − ρ²_{X,Y}). Note that when ρ_{X,Y} = 0, the conditional pdf of X given Y = y equals the marginal pdf of X. This is consistent with the fact that X and Y are independent when ρ_{X,Y} = 0. On the other hand, as |ρ_{X,Y}| → 1 the variance of X about the conditional mean approaches zero, so the conditional pdf approaches a delta function at the conditional mean. Thus when |ρ_{X,Y}| = 1, the conditional variance is zero and X is equal to the conditional mean with probability one. We note that, similarly, f_Y(y|x) is Gaussian with conditional mean m₂ + ρ_{X,Y}(σ₂/σ₁)(x − m₁) and conditional variance σ₂²(1 − ρ²_{X,Y}).

We now show that the ρ_{X,Y} in Eq. (5.61a) is indeed the correlation coefficient between X and Y. The covariance between X and Y is defined by

COV(X, Y) = E[(X − m₁)(Y − m₂)] = E[E[(X − m₁)(Y − m₂) | Y]].

Now the conditional expectation of (X − m₁)(Y − m₂) given Y = y is

E[(X − m₁)(Y − m₂) | Y = y] = (y − m₂) E[X − m₁ | Y = y]
                            = (y − m₂)(E[X | Y = y] − m₁)
                            = (y − m₂)(ρ_{X,Y}(σ₁/σ₂)(y − m₂)),

where we have used the fact that the conditional mean of X given Y = y is m₁ + ρ_{X,Y}(σ₁/σ₂)(y − m₂). Therefore

E[(X − m₁)(Y − m₂) | Y] = ρ_{X,Y}(σ₁/σ₂)(Y − m₂)²

and

COV(X, Y) = E[E[(X − m₁)(Y − m₂) | Y]] = ρ_{X,Y}(σ₁/σ₂) E[(Y − m₂)²] = ρ_{X,Y} σ₁σ₂.

The above equation is consistent with the definition of the correlation coefficient, ρ_{X,Y} = COV(X, Y)/σ₁σ₂. Thus the ρ_{X,Y} in Eq. (5.61a) is indeed the correlation coefficient between X and Y.

Example 5.46

The amount of yearly rainfall in city 1 and in city 2 is modeled by a pair of jointly Gaussian random variables, X and Y, with pdf given by Eq. (5.61a). Find the most likely value of X given that we know Y = y.

The most likely value of X given Y = y is the value of x for which f_X(x|y) is maximum.
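The identity COV(X, Y) = ρσ₁σ₂ can be reproduced by sampling through the conditional law of Eq. (5.64). A Python sketch with illustrative parameters (not from the text):

```python
import math
import random

random.seed(2)
m1, m2, s1, s2, rho = 1.0, -2.0, 2.0, 0.5, -0.7   # illustrative parameters
n = 300_000
acc = 0.0
for _ in range(n):
    y = random.gauss(m2, s2)   # Y ~ Gaussian with mean m2, std dev s2
    # X given Y = y is Gaussian with the conditional mean and standard
    # deviation read off from Eq. (5.64).
    mean_x = m1 + rho * (s1 / s2) * (y - m2)
    sd_x = s1 * math.sqrt(1 - rho * rho)
    acc += (random.gauss(mean_x, sd_x) - m1) * (y - m2)

print(acc / n, rho * s1 * s2)   # both near -0.7
```

The Monte Carlo estimate of E[(X − m₁)(Y − m₂)] converges to ρσ₁σ₂, the value obtained analytically above by iterated expectation.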
The conditional pdf of X given Y = y is given by Eq. (5.64), which is maximum at the conditional mean

E[X | y] = m₁ + ρ_{X,Y}(σ₁/σ₂)(y − m₂).

Note that this "maximum likelihood" estimate is a linear function of the observation y.

Example 5.47 Estimation of Signal in Noise

Let Y = X + N, where X (the "signal") and N (the "noise") are independent zero-mean Gaussian random variables with different variances. Find the correlation coefficient between the observed signal Y and the desired signal X. Find the value of x that maximizes f_X(x|y).

The mean and variance of Y and the covariance of X and Y are:

E[Y] = E[X] + E[N] = 0

σ_Y² = E[Y²] = E[(X + N)²] = E[X² + 2XN + N²] = E[X²] + E[N²] = σ_X² + σ_N²

COV(X, Y) = E[(X − E[X])(Y − E[Y])] = E[XY] = E[X(X + N)] = σ_X².

Therefore the correlation coefficient is:

ρ_{X,Y} = COV(X, Y)/(σ_X σ_Y) = σ_X/σ_Y = σ_X/(σ_X² + σ_N²)^{1/2} = 1/(1 + σ_N²/σ_X²)^{1/2}.

Note that ρ²_{X,Y} = σ_X²/σ_Y² = 1 − σ_N²/σ_Y².

To find the joint pdf of X and Y, consider the following linear transformation:

X = X
Y = X + N,

which has inverse

X = X
N = −X + Y.

From Eq. (5.60) we have, since det A = 1,

f_{X,Y}(x, y) = f_{X,N}(x, n)|_{x = x, n = y − x} / |det A|
             = [e^{−x²/2σ_X²}/(√(2π) σ_X)] [e^{−(y − x)²/2σ_N²}/(√(2π) σ_N)].

The conditional pdf of the signal X given the observation Y is then:

f_X(x|y) = f_{X,Y}(x, y)/f_Y(y)
         = [e^{−x²/2σ_X²} e^{−(y − x)²/2σ_N²}/(2π σ_X σ_N)] / [e^{−y²/2σ_Y²}/(√(2π) σ_Y)]
         = exp{ −(x − (σ_X²/σ_Y²) y)² / (2 σ_X²σ_N²/σ_Y²) } / (√(2π) σ_N σ_X/σ_Y),

where the last step follows by completing the square in x. Thus X given Y = y is Gaussian with conditional mean (σ_X²/σ_Y²)y and conditional variance σ_X²σ_N²/σ_Y². This pdf has its maximum value when the argument of the exponent is zero, that is,

x = (σ_X²/(σ_X² + σ_N²)) y = y/(1 + σ_N²/σ_X²).
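The estimator of Example 5.47 can be exercised in a short simulation. The following Python sketch (illustrative variances, not from the text) compares the mean squared error of the estimate x̂ = (σ_X²/σ_Y²)y with that of using the raw observation y itself:

```python
import random

random.seed(3)
sx2, sn2 = 4.0, 1.0        # illustrative signal and noise variances
gain = sx2 / (sx2 + sn2)   # coefficient of the conditional-mean estimate
n = 100_000
mse_est = mse_raw = 0.0
for _ in range(n):
    x = random.gauss(0, sx2 ** 0.5)
    y = x + random.gauss(0, sn2 ** 0.5)
    mse_est += (x - gain * y) ** 2   # estimate x by (sx2/sy2) * y
    mse_raw += (x - y) ** 2          # estimate x by y itself

# Conditional variance sx2*sn2/(sx2+sn2) = 0.8 vs. noise variance 1.0.
print(mse_est / n, mse_raw / n)
```

The scaled estimate achieves a mean squared error of σ_X²σ_N²/σ_Y², which is always below the noise variance σ_N² incurred by the naive estimate x̂ = y.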
FIGURE 5.27 A rotation of the coordinate system transforms a pair of dependent Gaussian random variables into a pair of independent Gaussian random variables.

The signal-to-noise ratio (SNR) is defined as the ratio of the variance of X to the variance of N. At high SNR this estimator gives x ≈ y, and at very low SNR it gives x ≈ 0.

Example 5.48 Rotation of Jointly Gaussian Random Variables

The ellipse corresponding to an arbitrary two-dimensional Gaussian vector forms an angle

θ = (1/2) arctan( 2ρσ₁σ₂ / (σ₁² − σ₂²) )

relative to the x-axis. Suppose we define a new coordinate system whose axes are aligned with those of the ellipse, as shown in Fig. 5.27. This is accomplished by using the following rotation matrix:

(V, W)ᵀ = [cos θ  sin θ; −sin θ  cos θ] (X, Y)ᵀ.

To show that the new random variables are independent, it suffices to show that they have covariance zero:

COV(V, W) = E[(V − E[V])(W − E[W])]
          = E[{(X − m₁)cos θ + (Y − m₂)sin θ}{−(X − m₁)sin θ + (Y − m₂)cos θ}]
          = −σ₁² sin θ cos θ + COV(X, Y)cos²θ − COV(X, Y)sin²θ + σ₂² sin θ cos θ
          = [(σ₂² − σ₁²)sin 2θ + 2 COV(X, Y)cos 2θ]/2
          = (cos 2θ/2)[(σ₂² − σ₁²)tan 2θ + 2 COV(X, Y)].

If we let the angle of rotation θ be such that

tan 2θ = 2 COV(X, Y)/(σ₁² − σ₂²),

then the covariance of V and W is zero, as required.

*5.10 GENERATING INDEPENDENT GAUSSIAN RANDOM VARIABLES

We now present a method for generating unit-variance, uncorrelated (and hence independent) jointly Gaussian random variables. Suppose that X and Y are two independent zero-mean, unit-variance jointly Gaussian random variables with pdf:

f_{X,Y}(x, y) = (1/2π) e^{−(x² + y²)/2}.

In Example 5.44 we saw that the transformation R = (X² + Y²)^{1/2} and Θ = tan⁻¹(Y/X) leads to the pair of independent random variables with

f_{R,Θ}(r, θ) = (1/2π) r e^{−r²/2} = f_R(r) f_Θ(θ),

where R is a Rayleigh random variable and Θ is a uniform random variable.
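Since R² turns out to be exponential with λ = 1/2 and Θ uniform on (0, 2π), inverting this polar transformation synthesizes Gaussians from uniform random numbers. A minimal Python sketch of this recipe (the generation method developed in this section, here with an illustrative sample-moment check):

```python
import math
import random

random.seed(4)

def gauss_pair(u1, u2):
    # R^2 = -2 ln U1 is exponential with parameter 1/2; Theta = 2*pi*U2.
    r = math.sqrt(-2.0 * math.log(u1))
    theta = 2.0 * math.pi * u2
    return r * math.cos(theta), r * math.sin(theta)

samples = []
for _ in range(50_000):
    # 1 - random() lies in (0, 1], which keeps the logarithm finite.
    x, y = gauss_pair(1.0 - random.random(), random.random())
    samples.extend((x, y))

mean = sum(samples) / len(samples)
var = sum(v * v for v in samples) / len(samples) - mean ** 2
print(mean, var)   # near 0 and 1
```

The sample mean and variance of the generated values are close to 0 and 1, consistent with zero-mean, unit-variance Gaussian output.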
The above transformation is invertible. Therefore we can also start with independent Rayleigh and uniform random variables and produce zero-mean, unit-variance independent Gaussian random variables through the transformation:

X = R cos Θ  and  Y = R sin Θ.   (5.65)

Consider W = R², where R is a Rayleigh random variable. From Example 5.41 we then have that W has pdf

f_W(w) = f_R(√w)/(2√w) = √w e^{−(√w)²/2}/(2√w) = (1/2) e^{−w/2},

that is, W = R² has an exponential distribution with λ = 1/2. Therefore we can generate R² by generating an exponential random variable with parameter 1/2, and we can generate Θ by generating a random variable that is uniformly distributed in the interval (0, 2π). If we substitute these random variables into Eq. (5.65), we then obtain a pair of independent zero-mean, unit-variance Gaussian random variables. The above discussion thus leads to the following algorithm:

1. Generate U₁ and U₂, two independent random variables uniformly distributed in the unit interval.
2. Let R² = −2 ln U₁ and Θ = 2πU₂.
3. Let X = R cos Θ = (−2 ln U₁)^{1/2} cos 2πU₂ and Y = R sin Θ = (−2 ln U₁)^{1/2} sin 2πU₂.

Then X and Y are independent, zero-mean, unit-variance Gaussian random variables. By repeating the above procedure we can generate any number of such random variables.

Example 5.49

Use Octave or MATLAB to generate 1000 independent zero-mean, unit-variance Gaussian random variables. Compare a histogram of the observed values with the pdf of a zero-mean, unit-variance Gaussian random variable.

The Octave commands below show the steps for generating the Gaussian random variables. A set of histogram range values K from −4 to 4 is created and used to build a normalized histogram Z. The points in Z are then plotted and compared to the value predicted to fall in each interval by the Gaussian pdf. These plots are shown in Fig. 5.28, which shows excellent agreement.

> U1=rand(1000,1);   % Create a 1000-element vector U1 (step 1).
> U2=rand(1000,1);   % Create a 1000-element vector U2 (step 1).
> R2=-2*log(U1);     % Find R^2 (step 2).
> TH=2*pi*U2;        % Find Theta (step 2).
> X=sqrt(R2).*cos(TH);          % Generate X (step 3).
> Y=sqrt(R2).*sin(TH);          % Generate Y (step 3).
> K=-4:.2:4;         % Create histogram range values K.
> Z=hist(X,K)/1000;  % Create normalized histogram Z based on K.
> bar(K,Z)           % Plot Z.
> hold on
> stem(K,.2*normal_pdf(K,0,1))  % Compare to values predicted by pdf.

FIGURE 5.28 Histogram of 1000 observations of a Gaussian random variable.

We also plotted the X values vs. the Y values for 5000 pairs of generated random variables in a scattergram, as shown in Fig. 5.29. Good agreement with the circular symmetry of the jointly Gaussian pdf of zero-mean, unit-variance pairs is observed.

FIGURE 5.29 Scattergram of 5000 pairs of jointly Gaussian random variables.

In the next chapter we will show how to generate a vector of jointly Gaussian random variables with an arbitrary covariance matrix.

SUMMARY

• The joint statistical behavior of a pair of random variables X and Y is specified by the joint cumulative distribution function, the joint probability mass function, or the joint probability density function. The probability of any event involving the joint behavior of these random variables can be computed from these functions.
• The statistical behavior of the individual random variables in X is specified by the marginal cdf, marginal pdf, or marginal pmf, each of which can be obtained from the joint cdf, joint pdf, or joint pmf of X.
• Two random variables are independent if the probability of a product-form event is equal to the product of the probabilities of the component events.
Equivalent conditions for the independence of a set of random variables are that the joint cdf, joint pdf, or joint pmf factors into the product of the corresponding marginal functions.
• The covariance and the correlation coefficient of two random variables are measures of the linear dependence between the random variables.
• If X and Y are independent, then X and Y are uncorrelated, but not vice versa. If X and Y are jointly Gaussian and uncorrelated, then they are independent.
• The statistical behavior of X, given the exact value of Y, is specified by the conditional cdf, conditional pmf, or conditional pdf. Many problems lend themselves to a solution that involves conditioning on the value of one of the random variables. In these problems, the expected value of random variables can be obtained by conditional expectation.
• The joint pdf of a pair of jointly Gaussian random variables is determined by the means, variances, and covariance. All marginal pdf's and conditional pdf's are also Gaussian pdf's.
• Independent Gaussian random variables can be generated by a transformation of uniform random variables.

CHECKLIST OF IMPORTANT TERMS

Central moments of X and Y
Conditional cdf
Conditional expectation
Conditional pdf
Conditional pmf
Correlation of X and Y
Covariance of X and Y
Independent random variables
Joint cdf
Joint moments of X and Y
Joint pdf
Joint pmf
Jointly continuous random variables
Jointly Gaussian random variables
Linear transformation
Marginal cdf
Marginal pdf
Marginal pmf
Orthogonal random variables
Product-form event
Uncorrelated random variables

ANNOTATED REFERENCES

Papoulis [1] is the standard reference for electrical engineers for the material on random variables. References [2] and [3] present many interesting examples involving multiple random variables. The book by Jayant and Noll [4] gives numerous applications of probability concepts to the digital coding of waveforms.

1. A. Papoulis and S.
Pillai, Probability, Random Variables, and Stochastic Processes, McGraw-Hill, New York, 2002.
2. L. Breiman, Probability and Stochastic Processes, Houghton Mifflin, Boston, 1969.
3. H. J. Larson and B. O. Shubert, Probabilistic Models in Engineering Sciences, vol. 1, Wiley, New York, 1979.
4. N. S. Jayant and P. Noll, Digital Coding of Waveforms, Prentice Hall, Englewood Cliffs, N.J., 1984.
5. N. Johnson et al., Continuous Multivariate Distributions, Wiley, New York, 2000.
6. H. Stark and J. W. Woods, Probability, Random Processes, and Estimation Theory for Engineers, Prentice Hall, Englewood Cliffs, N.J., 1986.
7. H. Anton, Elementary Linear Algebra, 9th ed., Wiley, New York, 2005.
8. C. H. Edwards, Jr., and D. E. Penney, Calculus and Analytic Geometry, 4th ed., Prentice Hall, Englewood Cliffs, N.J., 1994.

PROBLEMS

Section 5.1: Two Random Variables

5.1. Let X be the maximum and let Y be the minimum of the number of heads obtained when Carlos and Michael each flip a fair coin twice.
(a) Describe the underlying space S of this random experiment and show the mapping from S to S_XY, the range of the pair (X, Y).
(b) Find the probabilities for all values of (X, Y).
(c) Find P[X = Y].
(d) Repeat parts b and c if Carlos uses a biased coin with P[heads] = 3/4.

5.2. Let X be the difference and let Y be the sum of the number of heads obtained when Carlos and Michael each flip a fair coin twice.
(a) Describe the underlying space S of this random experiment and show the mapping from S to S_XY, the range of the pair (X, Y).
(b) Find the probabilities for all values of (X, Y).
(c) Find P[X + Y = 1] and P[X + Y = 2].

5.3. The input X to a communication channel is "−1" or "1" with respective probabilities 1/4 and 3/4. The output Y of the channel is equal to: the corresponding input X with probability 1 − p − p_e; −X with probability p; 0 with probability p_e.
(a) Describe the underlying space S of this random experiment and show the mapping from S to S_XY, the range of the pair (X, Y).
(b) Find the probabilities for all values of (X, Y).
(c) Find P[X ≠ Y] and P[Y = 0].

5.4. (a) Specify the range of the pair (N₁, N₂) in Example 5.2.
(b) Specify and sketch the event "more revenue comes from type 1 requests than type 2 requests."

5.5. (a) Specify the range of the pair (Q, R) in Example 5.3.
(b) Specify and sketch the event "last packet is more than half full."

5.6. Let the pair of random variables H and W be the height and weight in Example 5.1. The body mass index, a measure of body fat, is defined by BMI = W/H², where W is in kilograms and H is in meters. Determine and sketch on the plane the following events: A = {"obese," BMI ≥ 30}; B = {"overweight," 25 ≤ BMI < 30}; C = {"normal," 18.5 ≤ BMI < 25}; and D = {"underweight," BMI < 18.5}.

5.7. Let (X, Y) be the two-dimensional noise signal in Example 5.4. Specify and sketch the events:
(a) "Maximum noise magnitude is greater than 5."
(b) "The noise power X² + Y² is greater than 4."
(c) "The noise power X² + Y² is greater than 4 and less than 9."

5.8. For the pair of random variables (X, Y), sketch the region of the plane corresponding to the following events. Identify which events are of product form.
(a) {X + Y > 3}.
(b) {e^X > Y e³}.
(c) {min(X, Y) > 0} ∪ {max(X, Y) < 0}.
(d) {|X − Y| ≥ 1}.
(e) {|X/Y| > 2}.
(f) {X/Y < 2}.
(g) {X³ > Y}.
(h) {XY < 0}.
(i) {max(|X|, Y) < 3}.

Section 5.2: Pairs of Discrete Random Variables

5.9. (a) Find and sketch p_{X,Y}(x, y) in Problem 5.1 when a fair coin is used.
(b) Find p_X(x) and p_Y(y).
(c) Repeat parts a and b if Carlos uses a biased coin with P[heads] = 3/4.

5.10. (a) Find and sketch p_{X,Y}(x, y) in Problem 5.2 when a fair coin is used.
(b) Find p_X(x) and p_Y(y).
(c) Repeat parts a and b if Carlos uses a biased coin with P[heads] = 3/4.

5.11. (a)
Find the marginal pmf's for the pairs of random variables with the indicated joint pmf's:

(i)
  p_{X,Y}    y = −1   y = 0   y = 1
  x = −1      1/6      0       1/6
  x = 0       1/6      0       1/6
  x = 1       0        1/3     0

(ii)
  p_{X,Y}    y = −1   y = 0   y = 1
  x = −1      1/9      1/9     1/9
  x = 0       1/9      1/9     1/9
  x = 1       1/9      1/9     1/9

(iii)
  p_{X,Y}    y = −1   y = 0   y = 1
  x = −1      1/3      0       0
  x = 0       0        1/3     0
  x = 1       0        0       1/3

(b) Find the probability of the events A = {X > 0}, B = {X ≥ Y}, and C = {X = −Y} for the above joint pmf's.

5.12. A modem transmits a two-dimensional signal (X, Y) given by

X = r cos(2πΘ/8)  and  Y = r sin(2πΘ/8),

where Θ is a discrete uniform random variable in the set {0, 1, 2, …, 7}.
(a) Show the mapping from S to S_XY, the range of the pair (X, Y).
(b) Find the joint pmf of X and Y.
(c) Find the marginal pmf of X and of Y.
(d) Find the probability of the following events: A = {X = 0}, B = {Y ≤ r/√2}, C = {X ≥ r/√2, Y ≥ r/√2}, D = {X < −r/√2}.

5.13. Let N₁ be the number of Web page requests arriving at a server in a 100-ms period and let N₂ be the number of Web page requests arriving at the server in the next 100-ms period. Assume that in a 1-ms interval either zero or one page request takes place with respective probabilities 1 − p = 0.95 and p = 0.05, and that the requests in different 1-ms intervals are independent of each other.
(a) Describe the underlying space S of this random experiment and show the mapping from S to the range of the pair (N₁, N₂).
(b) Find the joint pmf of N₁ and N₂.
(c) Find the marginal pmf for N₁ and for N₂.
(d) Find the probability of the events A = {N₁ ≥ N₂}, B = {N₁ = N₂ = 0}, C = {N₁ > 5, N₂ > 3}.
(e) Find the probability of the event D = {N₁ + N₂ = 10}.

5.14. Let N₁ be the number of Web page requests arriving at a server in the period (0, 100) ms and let N₂ be the total combined number of Web page requests arriving at the server in the period (0, 200) ms. Assume arrivals occur as in Problem 5.13.
(a) Describe the underlying space S of this random experiment and show the mapping from S to the range of the pair (N₁, N₂).
(b) Find the joint pmf of N₁ and N₂.
(c) Find the marginal pmf for N₁ and for N₂.
(d) Find the probability of the events A = {N₁ < N₂}, B = {N₂ = 0}, C = {N₁ > 5, N₂ > 3}, D = {|N₂ − 2N₁| < 2}.

5.15. At even time instants, a robot moves either +Δ cm or −Δ cm in the x-direction according to the outcome of a coin flip; at odd time instants, the robot moves similarly in the y-direction according to another coin flip. Assuming that the robot begins at the origin, let X and Y be the coordinates of the location of the robot after 2n time instants.
(a) Describe the underlying space S of this random experiment and show the mapping from S to S_XY, the range of the pair (X, Y).
(b) Find the marginal pmf of the coordinates X and Y.
(c) Find the probability that the robot is within distance √2 of the origin after 2n time instants.

Section 5.3: The Joint cdf of X and Y

5.16. (a) Sketch the joint cdf for the pair (X, Y) in Problem 5.1 and verify that the properties of the joint cdf are satisfied. You may find it helpful to first divide the plane into regions where the cdf is constant.
(b) Find the marginal cdf of X and of Y.

5.17. A point (X, Y) is selected at random inside the triangle defined by {(x, y): 0 ≤ y ≤ x ≤ 1}. Assume the point is equally likely to fall anywhere in the triangle.
(a) Find the joint cdf of X and Y.
(b) Find the marginal cdf of X and of Y.
(c) Find the probabilities of the following events in terms of the joint cdf: A = {X ≤ 1/2, Y ≤ 3/4}; B = {1/4 < X ≤ 3/4, 1/4 < Y ≤ 3/4}.

5.18. A dart is equally likely to land at any point (X₁, X₂) inside a circular target of unit radius. Let R and Θ be the radius and angle of the point (X₁, X₂).
(a) Find the joint cdf of R and Θ.
(b) Find the marginal cdf of R and of Θ.
(c) Use the joint cdf to find the probability that the point is in the first quadrant of the real plane and that the radius is greater than 0.5.

5.19.
Find an expression for the probability of the events in Problem 5.8 parts c, h, and i in terms of the joint cdf of X and Y.

5.20. The pair (X, Y) has joint cdf given by:

F_{X,Y}(x, y) = (1 − 1/x²)(1 − 1/y²) for x > 1, y > 1, and 0 elsewhere.

(a) Sketch the joint cdf.
(b) Find the marginal cdf of X and of Y.
(c) Find the probability of the following events: {X < 3, Y ≤ 5}, {X > 4, Y > 3}.

5.21. Is the following a valid cdf? Why?

F_{X,Y}(x, y) = 1 − 1/(x²y²) for x > 1, y > 1, and 0 elsewhere.

5.22. Let F_X(x) and F_Y(y) be valid one-dimensional cdf's. Show that F_{X,Y}(x, y) = F_X(x)F_Y(y) satisfies the properties of a two-dimensional cdf.

5.23. The number of users N logged onto a system and the time T until the next user logs off have joint probability given by:

P[N = n, T ≤ t] = (1 − ρ)ρ^{n−1}(1 − e^{−nλt}) for n = 1, 2, …, t > 0.

(a) Sketch the above joint probability.
(b) Find the marginal pmf of N.
(c) Find the marginal cdf of T.
(d) Find P[N ≤ 3, T > 3/λ].

5.24. A factory has n machines of a certain type. Let p be the probability that a machine is working on any given day, and let N be the total number of machines working on a certain day. The time T required to manufacture an item is an exponentially distributed random variable with rate kα if k machines are working. Find P[T ≤ t]. Find P[T ≤ t] as t → ∞ and explain the result.

Section 5.4: The Joint pdf of Two Continuous Random Variables

5.25. The amplitudes of two signals X and Y have joint pdf:

f_{X,Y}(x, y) = 2y e^{−x} e^{−y²} for x > 0, y > 0.

(a) Find the joint cdf.
(b) Find P[X^{1/2} > Y].
(c) Find the marginal pdf's.

5.26. Let X and Y have joint pdf:

f_{X,Y}(x, y) = k(x + y) for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1.

(a) Find k.
(b) Find the joint cdf of (X, Y).
(c) Find the marginal pdf of X and of Y.
(d) Find P[X < Y], P[Y < X²], and P[X + Y > 0.5].

5.27. Let X and Y have joint pdf:

f_{X,Y}(x, y) = kx(1 − x)y for 0 < x < 1, 0 < y < 1.

(a) Find k.
(b) Find the joint cdf of (X, Y).
(c) Find the marginal pdf of X and of Y.
(d) Find P[Y < X^{1/2}] and P[X < Y].

5.28. The random vector (X, Y) is uniformly distributed (i.e., f(x, y) = k) in the regions shown in Fig. P5.1 and zero elsewhere.

FIGURE P5.1 The three regions (i), (ii), and (iii) in the plane.

(a) Find the value of k in each case.
(b) Find the marginal pdf for X and for Y in each case.
(c) Find P[X > 0, Y > 0].

5.29. (a) Find the joint cdf for the vector random variable introduced in Example 5.16.
(b) Use the result of part a to find the marginal cdf of X and of Y.

5.30. Let X and Y have the joint pdf:

f_{X,Y}(x, y) = y e^{−y(1 + x)} for x > 0, y > 0.

Find the marginal pdf of X and of Y.

5.31. Let X and Y be the pair of random variables in Problem 5.17.
(a) Find the joint pdf of X and Y.
(b) Find the marginal pdf of X and of Y.
(c) Find P[Y < X²].

5.32. Let R and Θ be the pair of random variables in Problem 5.18.
(a) Find the joint pdf of R and Θ.
(b) Find the marginal pdf of R and of Θ.

5.33. Let (X, Y) be the jointly Gaussian random variables discussed in Example 5.18. Find P[X² + Y² > r²] when ρ = 0. Hint: Use polar coordinates to compute the integral.

5.34. The general form of the joint pdf for two jointly Gaussian random variables is given by Eq. (5.61a). Show that X and Y have marginal pdf's that correspond to Gaussian random variables with means m₁ and m₂ and variances σ₁² and σ₂², respectively.

5.35. The input X to a communication channel is +1 or −1 with probability p and 1 − p, respectively. The received signal Y is the sum of X and noise N, which has a Gaussian distribution with zero mean and variance σ² = 0.25.
(a) Find the joint probability P[X = j, Y ≤ y].
(b) Find the marginal pmf of X and the marginal pdf of Y.
(c) Suppose we are given that Y > 0. Which is more likely, X = 1 or X = −1?

5.36. A modem sends a two-dimensional signal X from the set {(1, 1), (1, −1), (−1, 1), (−1, −1)}. The channel adds a noise signal (N₁, N₂), so the received signal is Y = X + N = (X₁ + N₁, X₂ + N₂).
Assume that (N₁, N₂) have the jointly Gaussian pdf in Example 5.18 with ρ = 0. Let the distance between X and Y be d(X, Y) = {(X₁ − Y₁)² + (X₂ − Y₂)²}^{1/2}.
(a) Suppose that X = (1, 1). Find and sketch the region for the event {Y is closer to (1, 1) than to the other possible values of X}. Evaluate the probability of this event.
(b) Suppose that X = (1, 1). Find and sketch the region for the event {Y is closer to (1, −1) than to the other possible values of X}. Evaluate the probability of this event.
(c) Suppose that X = (1, 1). Find and sketch the region for the event {d(X, Y) > 1}. Evaluate the probability of this event. Explain why this probability is an upper bound on the probability that Y is closer to a signal other than X = (1, 1).

Section 5.5: Independence of Two Random Variables

5.37. Let X be the number of full pairs and let Y be the remainder of the number of dots observed in a toss of a fair die. Are X and Y independent random variables?

5.38. Let X and Y be the coordinates of the robot in Problem 5.15 after 2n time instants. Determine whether X and Y are independent random variables.

5.39. Let X and Y be the coordinates of the two-dimensional modem signal (X, Y) in Problem 5.12.
(a) Determine whether X and Y are independent random variables.
(b) Repeat part a if even values of Θ are twice as likely as odd values.

5.40. Determine which of the joint pmf's in Problem 5.11 correspond to independent pairs of random variables.

5.41. Michael takes the 7:30 bus every morning. The arrival time of the bus at the stop is uniformly distributed in the interval [7:27, 7:37]. Michael's arrival time at the stop is also uniformly distributed, in the interval [7:25, 7:40]. Assume that Michael's and the bus's arrival times are independent random variables.
(a) What is the probability that Michael arrives more than 5 minutes before the bus?
(b) What is the probability that Michael misses the bus?

5.42. Are R and Θ independent in Problem 5.18?

5.43. Are X and Y independent in Problem 5.20?
5.44. Are the signal amplitudes X and Y independent in Problem 5.25?

5.45. Are X and Y independent in Problem 5.26?

5.46. Are X and Y independent in Problem 5.27?

5.47. Let X and Y be independent random variables. Find an expression for the probability of the following events in terms of F_X(x) and F_Y(y).
(a) {a < X ≤ b} ∩ {Y > d}.
(b) {a < X ≤ b} ∩ {c ≤ Y < d}.
(c) {|X| < a} ∩ {c ≤ Y ≤ d}.

5.48. Let X and Y be independent random variables that are uniformly distributed in [−1, 1]. Find the probability of the following events:
(a) P[X² < 1/2, |Y| < 1/2].
(b) P[4X < 1, Y < 0].
(c) P[XY < 1/2].
(d) P[max(X, Y) < 1/3].

5.49. Let X and Y be random variables that take on values from the set {−1, 0, 1}.
(a) Find a joint pmf for which X and Y are independent.
(b) Are X² and Y² independent random variables for the pmf in part a?
(c) Find a joint pmf for which X and Y are not independent, but for which X² and Y² are independent.

5.50. Let X and Y be the jointly Gaussian random variables introduced in Problem 5.34.
(a) Show that X and Y are independent random variables if and only if ρ = 0.
(b) Suppose ρ = 0; find P[XY < 0].

5.51. Two fair dice are tossed repeatedly until a pair occurs. Let K be the number of tosses required and let X be the number showing up in the pair. Find the joint pmf of K and X and determine whether K and X are independent.

5.52. The number of devices L produced in a day is geometrically distributed with probability of success p. Let N be the number of working devices and let M be the number of defective devices produced in a day.
(a) Are N and M independent random variables?
(b) Find the joint pmf of N and M.
(c) Find the marginal pmf's of N and M. (See hint in Problem 5.87b.)
(d) Are L and M independent random variables?

5.53. Let N₁ be the number of Web page requests arriving at a server in a 100-ms period and let N₂ be the number of Web page requests arriving at the server in the next 100-ms period.
Use the result of Problem 5.13 parts a and b to develop a model where N₁ and N₂ are independent Poisson random variables.

5.54. (a) Show that Eq. (5.22) implies Eq. (5.21).
(b) Show that Eq. (5.21) implies Eq. (5.22).

5.55. Verify that Eqs. (5.22) and (5.23) can be obtained from each other.

Section 5.6: Joint Moments and Expected Values of a Function of Two Random Variables

5.56. (a) Find E[(X + Y)²].
(b) Find the variance of X + Y.
(c) Under what condition is the variance of the sum equal to the sum of the individual variances?

5.57. Find E[|X − Y|] if X and Y are independent exponential random variables with parameters λ₁ = 1 and λ₂ = 2, respectively.

5.58. Find E[X²e^Y], where X and Y are independent random variables, X is a zero-mean, unit-variance Gaussian random variable, and Y is a uniform random variable in the interval [0, 3].

5.59. For the discrete random variables X and Y in Problem 5.1, find the correlation and covariance, and indicate whether the random variables are independent, orthogonal, or uncorrelated.

5.60. For the discrete random variables X and Y in Problem 5.2, find the correlation and covariance, and indicate whether the random variables are independent, orthogonal, or uncorrelated.

5.61. For the three pairs of discrete random variables in Problem 5.11, find the correlation and covariance of X and Y, and indicate whether the random variables are independent, orthogonal, or uncorrelated.

5.62. Let N₁ and N₂ be the numbers of Web page requests in Problem 5.13. Find the correlation and covariance of N₁ and N₂, and indicate whether the random variables are independent, orthogonal, or uncorrelated.

5.63. Repeat Problem 5.62 for N₁ and N₂, the numbers of Web page requests in Problem 5.14.

5.64. Let N and T be the number of users logged on and the time till the next logoff in Problem 5.23. Find the correlation and covariance of N and T, and indicate whether the random variables are independent, orthogonal, or uncorrelated.

5.65.
Find the correlation and covariance of X and Y in Problem 5.26. Determine whether X and Y are independent, orthogonal, or uncorrelated.

5.66. Repeat Problem 5.65 for X and Y in Problem 5.27.

5.67. For the three pairs of continuous random variables X and Y in Problem 5.28, find the correlation and covariance, and indicate whether the random variables are independent, orthogonal, or uncorrelated.

5.68. Find the correlation coefficient between X and Y = aX + b. Does the answer depend on the sign of a?

5.69. Propose a method for estimating the covariance of two random variables.

5.70. (a) Complete the calculations for the correlation coefficient in Example 5.28.
(b) Repeat the calculations if X and Y have the pdf:
f_{X,Y}(x, y) = e^{-(x + |y|)}   for x > 0, -x < y < x.

5.71. The output of a channel is Y = X + N, where the input X and the noise N are independent, zero-mean random variables.
(a) Find the correlation coefficient between the input X and the output Y.
(b) Suppose we estimate the input X by a linear function g(Y) = aY. Find the value of a that minimizes the mean squared error E[(X - aY)²].
(c) Express the resulting mean squared error in terms of σ_X/σ_N.

5.72. In Example 5.27 let X = cos(Θ/4) and Y = sin(Θ/4). Are X and Y uncorrelated?

5.73. (a) Show that COV(X, E[Y | X]) = COV(X, Y).
(b) Show that E[Y | X = x] = E[Y], for all x, implies that X and Y are uncorrelated.

5.74. Use the fact that E[(tX + Y)²] ≥ 0 for all t to prove the Cauchy-Schwarz inequality:
(E[XY])² ≤ E[X²] E[Y²].
Hint: Consider the discriminant of the quadratic in t that results from the above inequality.

Section 5.7: Conditional Probability and Conditional Expectation

5.75. (a) Find p_Y(y | x) and p_X(x | y) in Problem 5.1, assuming fair coins are used.
(b) Find p_Y(y | x) and p_X(x | y) in Problem 5.1, assuming Carlos uses a coin with p = 3/4.
(c) What is the effect on p_X(x | y) of Carlos using a biased coin?
(d) Find E[Y | X = x] and E[X | Y = y] in part a; then find E[X] and E[Y].
(e) Find E[Y | X = x] and E[X | Y = y] in part b; then find E[X] and E[Y].

5.76. (a) Find p_X(x | y) for the communication channel in Problem 5.3.
(b) For each value of y, find the value of x that maximizes p_X(x | y). State any assumptions about p and p_e.
(c) Find the probability of error if a receiver uses the decision rule from part b.

5.77. (a) In Problem 5.11(i), which conditional pmf given X provides the most information about Y: p_Y(y | -1), p_Y(y | 0), or p_Y(y | +1)? Explain why.
(b) Compare the conditional pmfs in Problems 5.11(ii) and (iii) and explain which of these two cases is "more random."
(c) Find E[Y | X = x] and E[X | Y = y] in Problems 5.11(i), (ii), (iii); then find E[X] and E[Y].
(d) Find E[Y² | X = x] and E[X² | Y = y] in Problems 5.11(i), (ii), (iii); then find VAR[X] and VAR[Y].

5.78. (a) Find the conditional pmf of N1 given N2 in Problem 5.14.
(b) Find P[N1 = k | N2 = 2k] for k = 5, 10, 20. Hint: Use Stirling's formula.
(c) Find E[N1 | N2 = k]; then find E[N1].

5.79. In Example 5.30, let Y be the number of defects inside the region R and let Z be the number of defects outside the region.
(a) Find the pmf of Z given Y.
(b) Find the joint pmf of Y and Z.
(c) Are Y and Z independent random variables? Is the result intuitive?

5.80. (a) Find f_Y(y | x) in Problem 5.26.
(b) Find P[Y > X | x].
(c) Find P[Y > X] using part b.
(d) Find E[Y | X = x].

5.81. (a) Find f_Y(y | x) in Problem 5.28(i).
(b) Find E[Y | X = x] and E[Y].
(c) Repeat parts a and b for Problem 5.28(ii).
(d) Repeat parts a and b for Problem 5.28(iii).

5.82. (a) Find f_Y(y | x) in Example 5.27.
(b) Find E[Y | X = x].
(c) Find E[Y].
(d) Find E[XY | X = x].
(e) Find E[XY].

5.83. Find f_Y(y | x) and f_X(x | y) for the jointly Gaussian pdf in Problem 5.34.

5.84. (a) Find f_X(t | N = n) in Problem 5.23.
(b) Find E[X | N = n].
(c) Find the value of n that maximizes P[N = n | t < X < t + dt].

5.85.
(a) Find p_Y(y | x) and p_X(x | y) in Problem 5.12.
(b) Find E[Y | X = x].
(c) Find E[XY | X = x] and E[XY].

5.86. A customer enters a store and is equally likely to be served by one of three clerks. The time taken by clerk 1 is a constant random variable with mean two minutes; the time for clerk 2 is exponentially distributed with mean two minutes; and the time for clerk 3 is Pareto distributed with mean two minutes and α = 2.5.
(a) Find the pdf of T, the time taken to service a customer.
(b) Find E[T] and VAR[T].

5.87. A message requires N time units to be transmitted, where N is a geometric random variable with pmf p_i = (1 - α)α^(i-1), i = 1, 2, .... A single new message arrives during a time unit with probability p, and no messages arrive with probability 1 - p. Let K be the number of new messages that arrive during the transmission of a single message.
(a) Find E[K] and VAR[K] using conditional expectation.
(b) Find the pmf of K. Hint: (1 - β)^(-(k+1)) = Σ_{n=k}^∞ C(n, k) β^(n-k).
(c) Find the conditional pmf of N given K = k.
(d) Find the value of n that maximizes P[N = n | K = k].

5.88. The number of defects in a VLSI chip is a Poisson random variable with rate r. However, r is itself a gamma random variable with parameters α and λ.
(a) Use conditional expectation to find E[N] and VAR[N].
(b) Find the pmf for N, the number of defects.

5.89. (a) In Problem 5.35, find the conditional pmf of the input X of the communication channel given that the output is in the interval y < Y ≤ y + dy.
(b) Find the value of X that is more probable given y < Y ≤ y + dy.
(c) Find an expression for the probability of error if we use the result of part b to decide what the input to the channel was.

Section 5.8: Functions of Two Random Variables

5.90. Two toys are started at the same time, each with a different battery. The first battery has a lifetime that is exponentially distributed with mean 100 minutes; the second battery has a Rayleigh-distributed lifetime with mean 100 minutes.
(a) Find the cdf of the time T until the battery in a toy first runs out.
(b) Suppose that both toys are still operating after 100 minutes. Find the cdf of the time T2 that subsequently elapses until the battery in a toy first runs out.
(c) In part b, find the cdf of the total time that elapses until a battery first fails.

5.91. (a) Find the cdf of the time that elapses until both batteries run out in Problem 5.90a.
(b) Find the cdf of the remaining time until both batteries run out in Problem 5.90b.

5.92. Let K and N be independent random variables with nonnegative integer values.
(a) Find an expression for the pmf of M = K + N.
(b) Find the pmf of M if K and N are binomial random variables with parameters (k, p) and (n, p).
(c) Find the pmf of M if K and N are Poisson random variables with parameters α1 and α2, respectively.

5.93. The number X of goals the Bulldogs score against the Flames has a geometric distribution with mean 2; the number of goals Y that the Flames score against the Bulldogs is also geometrically distributed, but with mean 4.
(a) Find the pmf of Z = X - Y. Assume X and Y are independent.
(b) What is the probability that the Bulldogs beat the Flames? Tie the Flames?
(c) Find E[Z].

5.94. Passengers arrive at an airport taxi stand every minute according to a Bernoulli random variable. A taxi will not leave until it has two passengers.
(a) Find the pmf of the time T until the taxi has two passengers.
(b) Find the pmf of the time that the first customer waits.

5.95. Let X and Y be independent random variables that are uniformly distributed in the interval [0, 1]. Find the pdf of Z = XY.

5.96. Let X1, X2, and X3 be independent and uniformly distributed in [-1, 1].
(a) Find the cdf and pdf of Y = X1 + X2.
(b) Find the cdf of Z = Y + X3.

5.97. Let X and Y be independent random variables with gamma distributions and parameters (α1, λ) and (α2, λ), respectively.
Show that Z = X + Y is gamma-distributed with parameters (α1 + α2, λ). Hint: See Eq. (4.59).

5.98. Signals X and Y are independent, and each is exponentially distributed with mean 1.
(a) Find the cdf of Z = |X - Y|.
(b) Use the result of part a to find E[Z].

5.99. The random variables X and Y have the joint pdf f_{X,Y}(x, y) = e^{-(x + y)} for 0 < y < x < 1. Find the pdf of Z = X + Y.

5.100. Let X and Y be independent Rayleigh random variables with parameters α = β = 1. Find the pdf of Z = X/Y.

5.101. Let X and Y be independent Gaussian random variables that are zero mean and unit variance. Show that Z = X/Y is a Cauchy random variable.

5.102. Find the joint cdf of W = min(X, Y) and Z = max(X, Y) if X and Y are independent and each is uniformly distributed in [0, 1].

5.103. Find the joint cdf of W = min(X, Y) and Z = max(X, Y) if X and Y are independent exponential random variables with the same mean.

5.104. Find the joint cdf of W = min(X, Y) and Z = max(X, Y) if X and Y are independent Pareto random variables with the same distribution.

5.105. Let W = X + Y and Z = X - Y.
(a) Find an expression for the joint pdf of W and Z.
(b) Find f_{W,Z}(w, z) if X and Y are independent exponential random variables with parameter λ = 1.
(c) Find f_{W,Z}(w, z) if X and Y are independent Pareto random variables with the same distribution.

5.106. The pair (X, Y) is uniformly distributed in a ring centered at the origin with inner and outer radii r1 < r2. Let R and Θ be the radius and angle corresponding to (X, Y). Find the joint pdf of R and Θ.

5.107. Let X and Y be independent, zero-mean, unit-variance Gaussian random variables. Let V = aX + bY and W = cX + eY.
(a) Find the joint pdf of V and W, assuming the transformation matrix A is invertible.
(b) Suppose A is not invertible. What is the joint pdf of V and W?

5.108.
Let X and Y be independent Gaussian random variables that are zero mean and unit variance. Let W = X² + Y² and let Θ = tan⁻¹(Y/X). Find the joint pdf of W and Θ.

5.109. Let X and Y be the random variables introduced in Example 5.4. Let R = (X² + Y²)^(1/2) and let Θ = tan⁻¹(Y/X).
(a) Find the joint pdf of R and Θ.
(b) What is the joint pdf of X and Y?

Section 5.9: Pairs of Jointly Gaussian Variables

5.110. Let X and Y be jointly Gaussian random variables with pdf
f_{X,Y}(x, y) = exp{-2x² - y²/2} / (2πc)   for all x, y.
Find VAR[X], VAR[Y], and COV(X, Y).

5.111. Let X and Y be jointly Gaussian random variables with pdf
f_{X,Y}(x, y) = exp{-(x² + 4y² - 3xy + 3y - 2x + 1)/2} / (2√2 π)   for all x, y.
Find E[X], E[Y], VAR[X], VAR[Y], and COV(X, Y).

5.112. Let X and Y be jointly Gaussian random variables with E[Y] = 0, σ1 = 1, σ2 = 2, and E[X | Y] = Y/4 + 1. Find the joint pdf of X and Y.

5.113. Let X and Y be zero-mean, independent Gaussian random variables with σ² = 1.
(a) Find the value of r for which the probability that (X, Y) falls inside a circle of radius r is 1/2.
(b) Find the conditional pdf of (X, Y) given that (X, Y) is not inside a ring with inner radius r1 and outer radius r2.

5.114. Use a plotting program (as provided by Octave or MATLAB) to show the pdf for jointly Gaussian zero-mean random variables with the following parameters:
(a) σ1 = 1, σ2 = 1, ρ = 0.
(b) σ1 = 1, σ2 = 1, ρ = 0.8.
(c) σ1 = 1, σ2 = 1, ρ = -0.8.
(d) σ1 = 1, σ2 = 2, ρ = 0.
(e) σ1 = 1, σ2 = 2, ρ = 0.8.
(f) σ1 = 1, σ2 = 10, ρ = 0.8.

5.115. Let X and Y be zero-mean, jointly Gaussian random variables with σ1 = 1, σ2 = 2, and correlation coefficient ρ.
(a) Plot the principal axes of the constant-pdf ellipse of (X, Y).
(b) Plot the conditional expectation of Y given X = x.
(c) Are the plots in parts a and b the same or different? Why?

5.116. Let X and Y be zero-mean, unit-variance jointly Gaussian random variables for which ρ = 1. Sketch the joint cdf of X and Y. Does a joint pdf exist?
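Problems 5.110–5.116 all revolve around the jointly Gaussian pdf, and hand calculations there are easy to sanity-check numerically. The following sketch (in Python; the book's own simulation problems use Octave or MATLAB, and the helper name and parameter values here are illustrative, not from the text) evaluates the zero-mean jointly Gaussian pdf and confirms that it integrates to approximately 1:

```python
import math

def jointly_gaussian_pdf(x, y, s1=1.0, s2=2.0, rho=0.8):
    """Zero-mean jointly Gaussian pdf with standard deviations s1, s2
    and correlation coefficient rho (|rho| < 1)."""
    u, v = x / s1, y / s2
    q = (u * u - 2.0 * rho * u * v + v * v) / (2.0 * (1.0 - rho ** 2))
    norm = 2.0 * math.pi * s1 * s2 * math.sqrt(1.0 - rho ** 2)
    return math.exp(-q) / norm

# The peak (at the means) equals 1 / (2*pi*s1*s2*sqrt(1 - rho^2)).
peak = jointly_gaussian_pdf(0.0, 0.0)

# Crude midpoint-rule check that the pdf integrates to about 1.
h, L = 0.05, 12.0
n = int(2 * L / h)
total = sum(
    jointly_gaussian_pdf(-L + (i + 0.5) * h, -L + (j + 0.5) * h) * h * h
    for i in range(n) for j in range(n)
)
```

Evaluating the same helper over a grid is also a quick way to produce the surface and contour plots asked for in Problems 5.114 and 5.115.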
5.117. Let h(x, y) be a joint Gaussian pdf for zero-mean, unit-variance Gaussian random variables with correlation coefficient ρ1. Let g(x, y) be a joint Gaussian pdf for zero-mean, unit-variance Gaussian random variables with correlation coefficient ρ2 ≠ ρ1. Suppose the random variables X and Y have joint pdf
f_{X,Y}(x, y) = [h(x, y) + g(x, y)]/2.
(a) Find the marginal pdf for X and for Y.
(b) Explain why X and Y are not jointly Gaussian random variables.

5.118. Use conditional expectation to show that for X and Y zero-mean, jointly Gaussian random variables,
E[X²Y²] = E[X²]E[Y²] + 2(E[XY])².

5.119. Let X = (X, Y) be the zero-mean jointly Gaussian random variables in Problem 5.110. Find a transformation A such that Z = AX has components that are zero-mean, unit-variance Gaussian random variables.

5.120. In Example 5.47, suppose we estimate the value of the signal X from the noisy observation Y by:
X̂ = Y / (1 + σ_N²/σ_X²).
(a) Evaluate the mean square estimation error E[(X - X̂)²].
(b) How does the estimation error in part a vary with the signal-to-noise ratio σ_X/σ_N?

Section 5.10: Generating Independent Gaussian Random Variables

5.121. Find the inverse of the cdf of the Rayleigh random variable to derive the transformation method for generating Rayleigh random variables. Show that this method leads to the same algorithm that was presented in Section 5.10.

5.122. Reproduce the results presented in Example 5.49.

5.123. Consider the two-dimensional modem in Problem 5.36.
(a) Generate 10,000 discrete random variables uniformly distributed in the set {1, 2, 3, 4}. Assign each outcome in this set to one of the signals {(1, 1), (1, -1), (-1, 1), (-1, -1)}. The sequence of discrete random variables then produces a sequence of 10,000 signal points X.
(b) Generate 10,000 noise pairs N of independent zero-mean, unit-variance jointly Gaussian random variables.
(c) Form the sequence of 10,000 received signals Y = (Y1, Y2) = X + N.
(d) Plot the scattergram of received signal vectors. Is the plot what you expected?
(e) Estimate the transmitted signal by the quadrant that Y falls in: X̂ = (sgn(Y1), sgn(Y2)).
(f) Compare the estimates with the actually transmitted signals to estimate the probability of error.

5.124. Generate a sequence of 1000 pairs of independent zero-mean Gaussian random variables, where X has variance 2 and N has variance 1. Let Y = X + N be the noisy signal from Example 5.47.
(a) Estimate X using the estimator in Problem 5.120, and calculate the sequence of estimation errors.
(b) What is the pdf of the estimation error?
(c) Compare the mean, variance, and relative frequencies of the estimation error with the result from part b.

5.125. Let X1, X2, ..., X1000 be a sequence of zero-mean, unit-variance independent Gaussian random variables. Suppose that the sequence is "smoothed" as follows: Yn = (Xn + Xn-1)/2, where X0 = 0.
(a) Find the pdf of (Yn, Yn+1).
(b) Generate the sequence Xn and the corresponding sequence Yn. Plot the scattergram of (Yn, Yn+1). Does it agree with the result from part a?
(c) Repeat parts a and b for Zn = (Xn - Xn-1)/2.

5.126. Let X and Y be independent, zero-mean, unit-variance Gaussian random variables. Find the linear transformation to generate jointly Gaussian random variables with means m1, m2, variances σ1², σ2², and correlation coefficient ρ. Hint: Use the conditional pdf in Eq. (5.64).

5.127. (a) Use the method developed in Problem 5.126 to generate 1000 pairs of jointly Gaussian random variables with m1 = 1, m2 = -1, variances σ1² = 1, σ2² = 2, and correlation coefficient ρ = -1/2.
(b) Plot a two-dimensional scattergram of the 1000 pairs and compare to equal-pdf contour lines for the theoretical pdf.

5.128. Let H and W be the height and weight of adult males.
Studies have shown that H (in cm) and V = ln W (W in kg) are jointly Gaussian with parameters m_H = 174 cm, m_V = 4.4, σ_H² = 42.36, σ_V² = 0.021, and COV(H, V) = 0.458.
(a) Use the method in Problem 5.127(a) to generate 1000 pairs (H, V). Plot a scattergram to check the joint pdf.
(b) Convert the (H, V) pairs into (H, W) pairs.
(c) Calculate the body mass index for each outcome, and estimate the proportion of the population that is underweight, normal, overweight, or obese. (See Problem 5.6.)

Problems Requiring Cumulative Knowledge

5.129. The random variables X and Y have joint pdf:
f_{X,Y}(x, y) = c sin(x + y),   0 ≤ x ≤ π/2, 0 ≤ y ≤ π/2.
(a) Find the value of the constant c.
(b) Find the joint cdf of X and Y.
(c) Find the marginal pdf's of X and of Y.
(d) Find the mean, variance, and covariance of X and Y.

5.130. An inspector selects an item for inspection according to the outcome of a coin flip: the item is inspected if the outcome is heads. Suppose that the time between item arrivals is an exponential random variable with mean one. Assume the time to inspect an item is a constant value t.
(a) Find the pmf for the number of item arrivals between consecutive inspections.
(b) Find the pdf for the time X between item inspections. Hint: Use conditional expectation.
(c) Find the value of p so that with a probability of 90% an inspection is completed before the next item is selected for inspection.

5.131. The lifetime X of a device is an exponential random variable with mean 1/R. Suppose that, due to irregularities in the production process, the parameter R is random and has a gamma distribution.
(a) Find the joint pdf of X and R.
(b) Find the pdf of X.
(c) Find the mean and variance of X.

5.132. Let X and Y be samples of a random signal at two time instants. Suppose that X and Y are independent zero-mean Gaussian random variables with the same variance.
When signal "0" is present the variance is σ0²; when signal "1" is present the variance is σ1² > σ0². Suppose signals 0 and 1 occur with probabilities p and 1 - p, respectively. Let R² = X² + Y² be the total energy of the two observations.
(a) Find the pdf of R² when signal 0 is present; when signal 1 is present. Find the pdf of R².
(b) Suppose we use the following "signal detection" rule: If R² > T, then we decide signal 1 is present; otherwise, we decide signal 0 is present. Find an expression for the probability of error in terms of T.
(c) Find the value of T that minimizes the probability of error.

5.133. Let U0, U1, ... be a sequence of independent zero-mean, unit-variance Gaussian random variables. A "low-pass filter" takes the sequence Ui and produces the output sequence Xn = (Un + Un-1)/2, and a "high-pass filter" produces the output sequence Yn = (Un - Un-1)/2.
(a) Find the joint pdf of Xn and Xn-1; of Xn and Xn+m, m > 1.
(b) Repeat part a for Yn.
(c) Find the joint pdf of Xn and Ym.

CHAPTER 6 Vector Random Variables

In the previous chapter we presented methods for dealing with two random variables. In this chapter we extend these methods to the case of n random variables in the following ways:
• By representing n random variables as a vector, we obtain a compact notation for the joint pmf, cdf, and pdf, as well as for marginal and conditional distributions.
• We present a general method for finding the pdf of transformations of vector random variables.
• Summary information about the distribution of a vector random variable is provided by an expected value vector and a covariance matrix.
• We use linear transformations and characteristic functions to find alternative representations of random vectors and their probabilities.
• We develop optimum estimators for estimating the value of a random variable based on observations of other random variables.
• We show how jointly Gaussian random vectors have a compact and easy-to-work-with pdf and characteristic function.

6.1 VECTOR RANDOM VARIABLES

The notion of a random variable is easily generalized to the case where several quantities are of interest. A vector random variable X is a function that assigns a vector of real numbers to each outcome ζ in S, the sample space of the random experiment. We use uppercase boldface notation for vector random variables. By convention X is a column vector (n rows by 1 column), so the vector random variable with components X1, X2, ..., Xn corresponds to

X = [X1, X2, ..., Xn]^T,

where "T" denotes the transpose of a matrix or vector. We will sometimes write X = (X1, X2, ..., Xn) to save space and omit the transpose unless dealing with matrices. Possible values of the vector random variable are denoted by x = (x1, x2, ..., xn), where xi corresponds to the value of Xi.

Example 6.1 Arrivals at a Packet Switch

Packets arrive at each of three input ports of a packet switch according to independent Bernoulli trials with p = 1/2. Each arriving packet is equally likely to be destined to any of three output ports. Let X = (X1, X2, X3), where Xi is the total number of packets arriving for output port i. X is a vector random variable whose values are determined by the pattern of arrivals at the input ports.

Example 6.2 Joint Poisson Counts

A random experiment consists of finding the number of defects in a semiconductor chip and identifying their locations. The outcome of this experiment consists of the vector ζ = (n, y1, y2, ..., yn), where the first component specifies the total number of defects and the remaining components specify the coordinates of their locations. Suppose that the chip consists of M regions. Let N1(ζ), N2(ζ), ..., NM(ζ) be the number of defects in each of these regions, that is, Nk(ζ) is the number of y's that fall in region k.
The vector N(ζ) = (N1, N2, ..., NM) is then a vector random variable.

Example 6.3 Samples of an Audio Signal

Let the outcome ζ of a random experiment be an audio signal X(t). Let the random variable Xk = X(kT) be the sample of the signal taken at time kT. An MP3 codec processes the audio in blocks of n samples X = (X1, X2, ..., Xn). X is a vector random variable.

6.1.1 Events and Probabilities

Each event A involving X = (X1, X2, ..., Xn) has a corresponding region in an n-dimensional real space R^n. As before, we use "rectangular" product-form sets in R^n as building blocks. For the n-dimensional random variable X = (X1, X2, ..., Xn), we are interested in events that have the product form

A = {X1 in A1} ∩ {X2 in A2} ∩ ... ∩ {Xn in An},   (6.1)

where each Ak is a one-dimensional event (i.e., subset of the real line) that involves Xk only. The event A occurs when all of the events {Xk in Ak} occur jointly. We are interested in obtaining the probabilities of these product-form events:

P[A] = P[X ∈ A] = P[{X1 in A1} ∩ {X2 in A2} ∩ ... ∩ {Xn in An}]
≜ P[X1 in A1, X2 in A2, ..., Xn in An].   (6.2)

In principle, the probability in Eq. (6.2) is obtained by finding the probability of the equivalent event in the underlying sample space, that is,

P[A] = P[{ζ in S : X(ζ) in A}] = P[{ζ in S : X1(ζ) ∈ A1, X2(ζ) ∈ A2, ..., Xn(ζ) ∈ An}].   (6.3)

Equation (6.2) forms the basis for the definition of the n-dimensional joint probability mass function, cumulative distribution function, and probability density function. The probabilities of other events can be expressed in terms of these three functions.

6.1.2 Joint Distribution Functions

The joint cumulative distribution function of X1, X2, ..., Xn is defined as the probability of an n-dimensional semi-infinite rectangle associated with the point (x1, ..., xn):

F_X(x) ≜ F_{X1,X2,...,Xn}(x1, x2, ..., xn) = P[X1 ≤ x1, X2 ≤ x2, ..., Xn ≤ xn].
(6.4)

The joint cdf is defined for discrete, continuous, and random variables of mixed type. The probability of product-form events can be expressed in terms of the joint cdf. The joint cdf generates a family of marginal cdf's for subcollections of the random variables X1, ..., Xn. These marginal cdf's are obtained by setting the appropriate entries to +∞ in the joint cdf in Eq. (6.4). For example:
the joint cdf for X1, ..., Xn-1 is given by F_{X1,X2,...,Xn}(x1, x2, ..., xn-1, ∞), and
the joint cdf for X1 and X2 is given by F_{X1,X2,...,Xn}(x1, x2, ∞, ..., ∞).

Example 6.4

A radio transmitter sends a signal to a receiver using three paths. Let X1, X2, and X3 be the signals that arrive at the receiver along each path. Find P[max(X1, X2, X3) ≤ 5].
The maximum of three numbers is less than or equal to 5 if and only if each of the three numbers is less than or equal to 5; therefore

P[A] = P[{X1 ≤ 5} ∩ {X2 ≤ 5} ∩ {X3 ≤ 5}] = F_{X1,X2,X3}(5, 5, 5).

The joint probability mass function of n discrete random variables is defined by

p_X(x) ≜ p_{X1,X2,...,Xn}(x1, x2, ..., xn) = P[X1 = x1, X2 = x2, ..., Xn = xn].   (6.5)

The probability of any n-dimensional event A is found by summing the pmf over the points in the event:

P[X in A] = Σ_{x in A} p_{X1,X2,...,Xn}(x1, x2, ..., xn).   (6.6)

The joint pmf generates a family of marginal pmf's that specifies the joint probabilities for subcollections of the n random variables. For example, the one-dimensional pmf of Xj is found by adding the joint pmf over all variables other than xj:

p_{Xj}(xj) = P[Xj = xj] = Σ_{x1} ... Σ_{xj-1} Σ_{xj+1} ... Σ_{xn} p_{X1,X2,...,Xn}(x1, x2, ..., xn).   (6.7)

The two-dimensional joint pmf of any pair Xj and Xk is found by adding the joint pmf over all n - 2 other variables, and so on. Thus, the marginal pmf for X1, ..., Xn-1 is given by

p_{X1,...,Xn-1}(x1, x2, ..., xn-1) = Σ_{xn} p_{X1,...,Xn}(x1, ..., xn).
(6.8)

A family of conditional pmf's is obtained from the joint pmf by conditioning on different subcollections of the random variables. For example, if p_{X1,...,Xn-1}(x1, ..., xn-1) > 0:

p_{Xn}(xn | x1, ..., xn-1) = p_{X1,...,Xn}(x1, ..., xn) / p_{X1,...,Xn-1}(x1, ..., xn-1).   (6.9a)

Repeated applications of Eq. (6.9a) yield the following very useful expression:

p_{X1,...,Xn}(x1, ..., xn) = p_{Xn}(xn | x1, ..., xn-1) p_{Xn-1}(xn-1 | x1, ..., xn-2) ⋯ p_{X2}(x2 | x1) p_{X1}(x1).   (6.9b)

Example 6.5 Arrivals at a Packet Switch

Find the joint pmf of X = (X1, X2, X3) in Example 6.1. Find P[X1 > X3].
Let N be the total number of packets arriving in the three input ports. Each input port has an arrival with probability p = 1/2, so N is binomial with pmf:

p_N(n) = C(3, n)(1/2)³   for 0 ≤ n ≤ 3,

where C(3, n) is the binomial coefficient. Given N = n, the numbers of packets arriving for the output ports have a multinomial distribution:

p_{X1,X2,X3}(i, j, k | i + j + k = n) = [n!/(i! j! k!)](1/3ⁿ)   for i + j + k = n, i ≥ 0, j ≥ 0, k ≥ 0,

and 0 otherwise. The joint pmf of X is then:

p_X(i, j, k) = p_X(i, j, k | n) C(3, n)(1/2)³   for i ≥ 0, j ≥ 0, k ≥ 0, i + j + k = n ≤ 3.

The explicit values of the joint pmf are:

p_X(0, 0, 0) = [0!/(0! 0! 0!)](1/3⁰) C(3, 0)(1/2)³ = 1/8
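Because the sample space in Example 6.5 is small, the joint pmf, and the value of P[X1 > X3] computed below, can be verified by exhaustive enumeration. A minimal sketch in Python using exact rational arithmetic (the helper name is ours, not the book's):

```python
from fractions import Fraction
from itertools import product

def packet_switch_pmf():
    """Exact joint pmf of (X1, X2, X3) for Example 6.1: each of three
    input ports receives a packet with probability 1/2, and each packet
    is routed independently and uniformly to one of three output ports."""
    pmf = {}
    for arrivals in product([0, 1], repeat=3):        # Bernoulli(1/2) per input port
        n = sum(arrivals)
        p_pattern = Fraction(1, 2) ** 3               # every arrival pattern has probability 1/8
        for routing in product(range(3), repeat=n):   # uniform routing of the n packets
            counts = tuple(routing.count(port) for port in range(3))
            pmf[counts] = pmf.get(counts, Fraction(0)) + p_pattern * Fraction(1, 3) ** n
    return pmf

pmf = packet_switch_pmf()
p_x1_gt_x3 = sum(p for (i, j, k), p in pmf.items() if i > k)
```

The enumeration reproduces every entry derived analytically, e.g. pmf[(0, 0, 0)] = 1/8, pmf[(1, 1, 1)] = 6/216, and p_x1_gt_x3 = 8/27.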
p_X(1, 0, 0) = p_X(0, 1, 0) = p_X(0, 0, 1) = [1!/(0! 0! 1!)](1/3¹) C(3, 1)(1/2)³ = 3/24
p_X(1, 1, 0) = p_X(1, 0, 1) = p_X(0, 1, 1) = [2!/(0! 1! 1!)](1/3²) C(3, 2)(1/2)³ = 6/72
p_X(2, 0, 0) = p_X(0, 2, 0) = p_X(0, 0, 2) = 3/72
p_X(1, 1, 1) = 6/216
p_X(0, 1, 2) = p_X(0, 2, 1) = p_X(1, 0, 2) = p_X(1, 2, 0) = p_X(2, 0, 1) = p_X(2, 1, 0) = 3/216
p_X(3, 0, 0) = p_X(0, 3, 0) = p_X(0, 0, 3) = 1/216.

Finally:

P[X1 > X3] = p_X(1, 0, 0) + p_X(1, 1, 0) + p_X(2, 0, 0) + p_X(1, 2, 0) + p_X(2, 0, 1) + p_X(2, 1, 0) + p_X(3, 0, 0) = 8/27.

We say that the random variables X1, X2, ..., Xn are jointly continuous random variables if the probability of any n-dimensional event A is given by an n-dimensional integral of a probability density function:

P[X in A] = ∫...∫_{x in A} f_{X1,...,Xn}(x1', ..., xn') dx1' ... dxn',   (6.10)

where f_{X1,...,Xn}(x1, ..., xn) is the joint probability density function.
The joint cdf of X is obtained from the joint pdf by integration:

F_X(x) = F_{X1,X2,...,Xn}(x1, x2, ..., xn) = ∫_{-∞}^{x1} ... ∫_{-∞}^{xn} f_{X1,...,Xn}(x1', ..., xn') dx1' ... dxn'.   (6.11)

The joint pdf (if the derivative exists) is given by

f_X(x) ≜ f_{X1,X2,...,Xn}(x1, x2, ..., xn) = ∂ⁿF_{X1,...,Xn}(x1, ..., xn) / (∂x1 ... ∂xn).   (6.12)

A family of marginal pdf's is associated with the joint pdf in Eq. (6.12). The marginal pdf for a subset of the random variables is obtained by integrating the other variables out. For example, the marginal pdf of X1 is

f_{X1}(x1) = ∫_{-∞}^{∞} ... ∫_{-∞}^{∞} f_{X1,X2,...,Xn}(x1, x2', ..., xn') dx2' ... dxn'.   (6.13)

As another example, the marginal pdf for X1, ..., Xn-1 is given by

f_{X1,...,Xn-1}(x1, ..., xn-1) = ∫_{-∞}^{∞} f_{X1,...,Xn}(x1, ..., xn-1, xn') dxn'.   (6.14)

A family of conditional pdf's is also associated with the joint pdf. For example, the pdf of Xn given the values of X1, ..., Xn-1 is given by

f_{Xn}(xn | x1, ..., xn-1) = f_{X1,...,Xn}(x1, ..., xn) / f_{X1,...,Xn-1}(x1, ..., xn-1)   (6.15a)

if f_{X1,...,Xn-1}(x1, ..., xn-1) > 0. Repeated applications of Eq. (6.15a) yield an expression analogous to Eq. (6.9b):

f_{X1,...,Xn}(x1, ..., xn) = f_{Xn}(xn | x1, ..., xn-1) f_{Xn-1}(xn-1 | x1, ..., xn-2) ⋯ f_{X2}(x2 | x1) f_{X1}(x1).   (6.15b)

Example 6.6

The random variables X1, X2, and X3 have the joint Gaussian pdf

f_{X1,X2,X3}(x1, x2, x3) = exp{-(x1² + x2² - √2 x1x2 + x3²/2)} / (2π√π).

Find the marginal pdf of X1 and X3. Find the conditional pdf of X2 given X1 and X3.
The marginal pdf for the pair X1 and X3 is found by integrating the joint pdf over x2:

f_{X1,X3}(x1, x3) = [e^{-x3²/2}/√(2π)] ∫_{-∞}^{∞} [e^{-(x1² + x2² - √2 x1x2)} / (2π/√2)] dx2.

The above integral was carried out in Example 5.18 with ρ = -1/√2. By substituting the result of the integration above, we obtain

f_{X1,X3}(x1, x3) = [e^{-x1²/2}/√(2π)] [e^{-x3²/2}/√(2π)].

Therefore X1 and X3 are independent zero-mean, unit-variance Gaussian random variables.
The conditional pdf of X2 given X1 and X3 is:

f_{X2}(x2 | x1, x3) = [exp{-(x1² + x2² - √2 x1x2 + x3²/2)} / (2π√π)] / {[e^{-x3²/2}/√(2π)] [e^{-x1²/2}/√(2π)]}
= exp{-(x1²/2 + x2² - √2 x1x2)} / √π
= exp{-(x2 - x1/√2)²} / √π.

We conclude that X2, given X1 and X3, is a Gaussian random variable with mean x1/√2 and variance 1/2.

Example 6.7 Multiplicative Sequence

Let X1 be uniform in [0, 1], X2 be uniform in [0, X1], and X3 be uniform in [0, X2]. (Note that X3 is also the product of three uniform random variables.) Find the joint pdf of X and the marginal pdf of X3.
For 0 < z < y < x < 1, the joint pdf is nonzero and given by:

f_{X1,X2,X3}(x, y, z) = f_{X3}(z | x, y) f_{X2}(y | x) f_{X1}(x) = (1/y)(1/x)(1) = 1/(xy).
(6.16) is equivalent to pX1, Á , Xn1x1 , Á , xn2 = pX11x12 Á pXn1xn2 for all x1 , Á , xn . If the random variables are jointly continuous, Eq. (6.16) is equivalent to fX1, Á , Xn1x1 , Á , xn2 = fX11x12 Á fXn1xn2 for all x1 , Á , xn . Example 6.8 The n samples X1 , X2 , Á , Xn of a noise signal have joint pdf given by fX1, Á , Xn1x1 , Á , xn2 = e -1x1 + Á + xn2/2 12p2n/2 2 2 for all x1 , Á , xn . It is clear that the above is the product of n one-dimensional Gaussian pdf’s. Thus X1 , Á , Xn are independent Gaussian random variables. 6.2 FUNCTIONS OF SEVERAL RANDOM VARIABLES Functions of vector random variables arise naturally in random experiments. For example X = 1X1 , X2 , Á , Xn2 may correspond to observations from n repetitions of an experiment that generates a given random variable. We are almost always interested in the sample mean and the sample variance of the observations. In another example 310 Chapter 6 Vector Random Variables X = 1X1 , X2 , Á , Xn2 may correspond to samples of a speech waveform and we may be interested in extracting features that are defined as functions of X for use in a speech recognition system. 6.2.1 One Function of Several Random Variables Let the random variable Z be defined as a function of several random variables: Z = g1X1 , X2 , Á , Xn2. (6.17) The cdf of Z is found by finding the equivalent event of 5Z … z6, that is, the set Rz = 5x: g1x2 … z6, then FZ1z2 = P3X in Rz4 = Á fX1, Á , Xn1x1œ , Á , xnœ 2 dx1œ Á dxnœ . Lx in Rz L (6.18) The pdf of Z is then found by taking the derivative of FZ1z2. Example 6.9 Maximum and Minimum of n Random Variables Let W = max1X1 , X2 , Á , Xn2 and Z = min1X1 , X2 , Á , Xn2, where the Xi are independent random variables with the same distribution. Find FW1w2 and FZ1z2. The maximum of X1 , X2 , Á , Xn is less than x if and only if each Xi is less than x, so: FW1w2 = P3max1X1 , X2 , Á , Xn2 … w4 = P3X1 … w4P3X2 … w4 Á P3Xn … w4 = 1FX1w22n. 
The minimum of $X_1,\dots,X_n$ is greater than z if and only if each $X_i$ is greater than z, so:

$$1 - F_Z(z) = P[\min(X_1,\dots,X_n) > z] = P[X_1 > z]\,P[X_2 > z]\cdots P[X_n > z] = \big(1 - F_X(z)\big)^n$$

and

$$F_Z(z) = 1 - \big(1 - F_X(z)\big)^n.$$

Example 6.10 Merging of Independent Poisson Arrivals

Web page requests arrive at a server from n independent sources. Source j generates requests with exponentially distributed interarrival times with rate $\lambda_j$. Find the distribution of the interarrival times between consecutive requests at the server.

Let the interarrival times for the different sources be $X_1, X_2,\dots,X_n$. Each $X_j$ satisfies the memoryless property, so the time that has elapsed since the last arrival from each source is irrelevant. The time until the next arrival at the server is then:

$$Z = \min(X_1, X_2,\dots,X_n).$$

Therefore:

$$1 - F_Z(z) = P[\min(X_1,\dots,X_n) > z] = \big(1 - F_{X_1}(z)\big)\big(1 - F_{X_2}(z)\big)\cdots\big(1 - F_{X_n}(z)\big) = e^{-\lambda_1 z}e^{-\lambda_2 z}\cdots e^{-\lambda_n z} = e^{-(\lambda_1 + \lambda_2 + \cdots + \lambda_n)z}.$$

The interarrival time at the server is an exponential random variable with rate $\lambda_1 + \lambda_2 + \cdots + \lambda_n$.

Example 6.11 Reliability of Redundant Systems

A computing cluster has n independent redundant subsystems. Each subsystem has an exponentially distributed lifetime with parameter $\lambda$. The cluster operates as long as at least one subsystem is functioning. Find the cdf of the time until the system fails.

Let the subsystem lifetimes be $X_1, X_2,\dots,X_n$. The time until the last subsystem fails is:

$$W = \max(X_1, X_2,\dots,X_n).$$

Therefore the cdf of W is:

$$F_W(w) = \big(F_X(w)\big)^n = \big(1 - e^{-\lambda w}\big)^n = 1 - \binom{n}{1}e^{-\lambda w} + \binom{n}{2}e^{-2\lambda w} - \cdots.$$

6.2.2 Transformations of Random Vectors

Let $X_1,\dots,X_n$ be random variables in some experiment, and let the random variables $Z_1,\dots,Z_n$ be defined by a transformation consisting of n functions of $\mathbf{X} = (X_1,\dots,X_n)$:

$$Z_1 = g_1(\mathbf{X}),\quad Z_2 = g_2(\mathbf{X}),\quad\dots,\quad Z_n = g_n(\mathbf{X}).$$
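Example 6.10 also lends itself to a simulation check (a sketch; the per-source rates and sample size below are assumptions chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
rates = np.array([0.5, 1.0, 2.5])              # assumed per-source rates
n = 100_000

# One interarrival time per source; the server sees the minimum.
x = rng.exponential(1.0 / rates, size=(n, 3))
z = x.min(axis=1)

total = rates.sum()                            # merged rate: 4.0
print(z.mean(), 1.0 / total)                   # both ~ 0.25
print(np.mean(z > 0.5), np.exp(-total * 0.5))  # survival function check
```

The sample mean of the merged interarrival time matches $1/(\lambda_1+\lambda_2+\lambda_3)$, and the empirical survival probability matches $e^{-(\lambda_1+\lambda_2+\lambda_3)z}$.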
The joint cdf of $\mathbf{Z} = (Z_1,\dots,Z_n)$ at the point $\mathbf{z} = (z_1,\dots,z_n)$ is equal to the probability of the region of $\mathbf{x}$ where $g_k(\mathbf{x}) \le z_k$ for $k = 1,\dots,n$:

$$F_{Z_1,\dots,Z_n}(z_1,\dots,z_n) = P[g_1(\mathbf{X}) \le z_1,\dots,g_n(\mathbf{X}) \le z_n]. \tag{6.19a}$$

If $X_1,\dots,X_n$ have a joint pdf, then

$$F_{Z_1,\dots,Z_n}(z_1,\dots,z_n) = \int\cdots\int_{\mathbf{x}':\, g_k(\mathbf{x}') \le z_k} f_{X_1,\dots,X_n}(x_1',\dots,x_n')\, dx_1'\cdots dx_n'. \tag{6.19b}$$

Example 6.12

Given a random vector $\mathbf{X}$, find the joint pdf of the componentwise linear transformation:

$$Z_1 = g_1(X_1) = a_1X_1 + b_1,\quad Z_2 = g_2(X_2) = a_2X_2 + b_2,\quad\dots,\quad Z_n = g_n(X_n) = a_nX_n + b_n.$$

Note that $Z_k = a_kX_k + b_k \le z_k$ if and only if $X_k \le (z_k - b_k)/a_k$, if $a_k > 0$, so

$$F_{Z_1,\dots,Z_n}(z_1,\dots,z_n) = P\!\left[X_1 \le \frac{z_1 - b_1}{a_1},\; X_2 \le \frac{z_2 - b_2}{a_2},\dots,X_n \le \frac{z_n - b_n}{a_n}\right] = F_{X_1,\dots,X_n}\!\left(\frac{z_1 - b_1}{a_1},\frac{z_2 - b_2}{a_2},\dots,\frac{z_n - b_n}{a_n}\right)$$

and

$$f_{Z_1,\dots,Z_n}(z_1,\dots,z_n) = \frac{\partial^n}{\partial z_1\cdots\partial z_n}F_{Z_1,\dots,Z_n}(z_1,\dots,z_n) = \frac{1}{a_1\cdots a_n}\, f_{X_1,\dots,X_n}\!\left(\frac{z_1 - b_1}{a_1},\frac{z_2 - b_2}{a_2},\dots,\frac{z_n - b_n}{a_n}\right).$$

*6.2.3 pdf of General Transformations

We now introduce a general method for finding the pdf of a transformation of n jointly continuous random variables. We first develop the two-dimensional case. Let the random variables V and W be defined by two functions of X and Y:

$$V = g_1(X, Y) \quad\text{and}\quad W = g_2(X, Y). \tag{6.20}$$

Assume that the functions $v(x,y)$ and $w(x,y)$ are invertible in the sense that the equations $v = g_1(x,y)$ and $w = g_2(x,y)$ can be solved for x and y, that is, $x = h_1(v,w)$ and $y = h_2(v,w)$. The joint pdf of V and W is found by finding the equivalent event of infinitesimal rectangles. The image of the infinitesimal rectangle is shown in Fig. 6.1(a). The image can be approximated by the parallelogram shown in Fig. 6.1(b) by making the approximation

$$g_k(x + dx, y) \approx g_k(x, y) + \frac{\partial g_k(x,y)}{\partial x}\,dx, \qquad k = 1, 2,$$

and similarly for the y variable.
The probabilities of the infinitesimal rectangle and of the parallelogram are approximately equal, so

$$f_{X,Y}(x,y)\,dx\,dy = f_{V,W}(v,w)\,dP$$

and

$$f_{V,W}(v,w) = \frac{f_{X,Y}\big(h_1(v,w),\, h_2(v,w)\big)}{\left|\dfrac{dP}{dx\,dy}\right|}, \tag{6.21}$$

where dP is the area of the parallelogram. By analogy with the case of a linear transformation (see Eq. 5.59), we can match the derivatives in the above approximations with the coefficients of the linear transformation and conclude that the "stretch factor" at the point (v, w) is given by the determinant of a matrix of partial derivatives:

$$J(x,y) = \det\begin{bmatrix} \dfrac{\partial v}{\partial x} & \dfrac{\partial v}{\partial y} \\[6pt] \dfrac{\partial w}{\partial x} & \dfrac{\partial w}{\partial y} \end{bmatrix}.$$

[Figure 6.1: (a) Image of an infinitesimal rectangle under a general transformation. (b) Approximation of the image by a parallelogram.]

The determinant J(x, y) is called the Jacobian of the transformation. The Jacobian of the inverse transformation is given by

$$J(v,w) = \det\begin{bmatrix} \dfrac{\partial x}{\partial v} & \dfrac{\partial x}{\partial w} \\[6pt] \dfrac{\partial y}{\partial v} & \dfrac{\partial y}{\partial w} \end{bmatrix}.$$

It can be shown that

$$|J(v,w)| = \frac{1}{|J(x,y)|}.$$

We therefore conclude that the joint pdf of V and W can be found using either of the following expressions:

$$f_{V,W}(v,w) = \frac{f_{X,Y}\big(h_1(v,w),\, h_2(v,w)\big)}{|J(x,y)|} \tag{6.22a}$$

$$\phantom{f_{V,W}(v,w)} = f_{X,Y}\big(h_1(v,w),\, h_2(v,w)\big)\,|J(v,w)|. \tag{6.22b}$$

It should be noted that Eq. (6.21) is applicable even if Eq. (6.20) has more than one solution; the pdf is then equal to the sum of terms of the form given by Eqs. (6.22a) and (6.22b), with each solution providing one such term.

Example 6.13

Server 1 receives m Web page requests and server 2 receives k Web page requests. Web page transmission times are exponential random variables with mean $1/\mu$.
Let X be the total time to transmit the files from server 1 and let Y be the total time for server 2. Find the joint pdf of T, the total transmission time, and W, the proportion of the total transmission time contributed by server 1:

$$T = X + Y \quad\text{and}\quad W = \frac{X}{X+Y}.$$

From Chapter 4, the sum of j independent exponential random variables is an Erlang random variable with parameters j and $\mu$. Therefore X and Y are independent Erlang random variables with parameters $(m, \mu)$ and $(k, \mu)$, respectively:

$$f_X(x) = \frac{\mu e^{-\mu x}(\mu x)^{m-1}}{(m-1)!} \quad\text{and}\quad f_Y(y) = \frac{\mu e^{-\mu y}(\mu y)^{k-1}}{(k-1)!}.$$

We solve for X and Y in terms of T and W:

$$X = TW \quad\text{and}\quad Y = T(1 - W).$$

The Jacobian of the transformation is:

$$J(x,y) = \det\begin{bmatrix} 1 & 1 \\[4pt] \dfrac{y}{(x+y)^2} & \dfrac{-x}{(x+y)^2} \end{bmatrix} = \frac{-x}{(x+y)^2} - \frac{y}{(x+y)^2} = \frac{-1}{x+y} = \frac{-1}{t}.$$

The joint pdf of T and W is then:

$$f_{T,W}(t,w) = \frac{f_{X,Y}(x,y)}{|J(x,y)|}\bigg|_{x = tw,\; y = t(1-w)} = t\;\frac{\mu e^{-\mu t w}(\mu t w)^{m-1}}{(m-1)!}\;\frac{\mu e^{-\mu t(1-w)}\big(\mu t(1-w)\big)^{k-1}}{(k-1)!}$$

$$= \frac{\mu e^{-\mu t}(\mu t)^{m+k-1}}{(m+k-1)!}\;\frac{(m+k-1)!}{(m-1)!\,(k-1)!}\, w^{m-1}(1-w)^{k-1}.$$

We see that T and W are independent random variables. As expected, T is Erlang with parameters $m+k$ and $\mu$, since it is the sum of $m+k$ independent exponential random variables. W is the beta random variable introduced in Chapter 3.

The method developed above can be used even if we are interested in only one function of the random variables. By defining an "auxiliary" variable, we can use the transformation method to find the joint pdf of two random variables, and then find the marginal pdf of the random variable of interest. The following example demonstrates the method.

Example 6.14 Student's t-distribution

Let X be a zero-mean, unit-variance Gaussian random variable and let Y be a chi-square random variable with n degrees of freedom. Assume that X and Y are independent. Find the pdf of $V = X/\sqrt{Y/n}$. Define the auxiliary variable $W = Y$.
The variables X and Y are then related to V and W by

$$X = V\sqrt{W/n} \quad\text{and}\quad Y = W.$$

The Jacobian of the inverse transformation is

$$|J(v,w)| = \left|\det\begin{bmatrix} \sqrt{w/n} & \dfrac{v}{2\sqrt{wn}} \\[4pt] 0 & 1 \end{bmatrix}\right| = \sqrt{w/n}.$$

Since $f_{X,Y}(x,y) = f_X(x)f_Y(y)$, the joint pdf of V and W is thus

$$f_{V,W}(v,w) = |J(v,w)|\;\frac{e^{-x^2/2}}{\sqrt{2\pi}}\;\frac{(y/2)^{n/2-1}e^{-y/2}}{2\Gamma(n/2)}\bigg|_{x = v\sqrt{w/n},\; y = w} = \frac{(w/2)^{(n-1)/2}\, e^{-(w/2)(1 + v^2/n)}}{2\sqrt{n\pi}\,\Gamma(n/2)}.$$

The pdf of V is found by integrating the joint pdf over w:

$$f_V(v) = \frac{1}{2\sqrt{n\pi}\,\Gamma(n/2)}\int_0^{\infty}(w/2)^{(n-1)/2}\,e^{-(w/2)(1 + v^2/n)}\,dw.$$

If we let $w' = (w/2)(v^2/n + 1)$, the integral becomes

$$f_V(v) = \frac{(1 + v^2/n)^{-(n+1)/2}}{\sqrt{n\pi}\,\Gamma(n/2)}\int_0^{\infty}(w')^{(n-1)/2}e^{-w'}\,dw'.$$

By noting that the above integral is the gamma function evaluated at $(n+1)/2$, we finally obtain the Student's t-distribution:

$$f_V(v) = \frac{\Gamma\big((n+1)/2\big)\,(1 + v^2/n)^{-(n+1)/2}}{\sqrt{n\pi}\,\Gamma(n/2)}.$$

This pdf is used extensively in statistical calculations. (See Chapter 8.)

Next consider the problem of finding the joint pdf of n functions of n random variables $\mathbf{X} = (X_1,\dots,X_n)$:

$$Z_1 = g_1(\mathbf{X}),\quad Z_2 = g_2(\mathbf{X}),\quad\dots,\quad Z_n = g_n(\mathbf{X}).$$

We assume as before that the set of equations

$$z_1 = g_1(\mathbf{x}),\quad z_2 = g_2(\mathbf{x}),\quad\dots,\quad z_n = g_n(\mathbf{x}) \tag{6.23}$$

has a unique solution given by

$$x_1 = h_1(\mathbf{z}),\quad x_2 = h_2(\mathbf{z}),\quad\dots,\quad x_n = h_n(\mathbf{z}).$$

The joint pdf of $\mathbf{Z}$ is then given by

$$f_{Z_1,\dots,Z_n}(z_1,\dots,z_n) = \frac{f_{X_1,\dots,X_n}\big(h_1(\mathbf{z}),\dots,h_n(\mathbf{z})\big)}{|J(x_1,\dots,x_n)|} \tag{6.24a}$$

$$\phantom{f_{Z_1,\dots,Z_n}(z_1,\dots,z_n)} = f_{X_1,\dots,X_n}\big(h_1(\mathbf{z}),\dots,h_n(\mathbf{z})\big)\,|J(z_1,\dots,z_n)|, \tag{6.24b}$$

where $|J(x_1,\dots,x_n)|$ and $|J(z_1,\dots,z_n)|$ are the absolute values of the determinants of the transformation and of the inverse transformation, respectively:

$$J(x_1,\dots,x_n) = \det\begin{bmatrix} \dfrac{\partial g_1}{\partial x_1} & \cdots & \dfrac{\partial g_1}{\partial x_n} \\ \vdots & & \vdots \\ \dfrac{\partial g_n}{\partial x_1} & \cdots & \dfrac{\partial g_n}{\partial x_n} \end{bmatrix}
\quad\text{and}\quad
J(z_1,\dots,z_n) = \det\begin{bmatrix} \dfrac{\partial h_1}{\partial z_1} & \cdots & \dfrac{\partial h_1}{\partial z_n} \\ \vdots & & \vdots \\ \dfrac{\partial h_n}{\partial z_1} & \cdots & \dfrac{\partial h_n}{\partial z_n} \end{bmatrix}.$$

In the special case of a linear transformation we have:

$$\mathbf{Z} = A\mathbf{X} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}\begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{bmatrix}.$$

The components of Z are:

$$Z_j = a_{j1}X_1 + a_{j2}X_2 + \cdots + a_{jn}X_n.$$
Since $\partial z_j/\partial x_i = a_{ji}$, the Jacobian is then simply:

$$J(x_1, x_2,\dots,x_n) = \det\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} = \det A.$$

Assuming that A is invertible,¹ we then have that:

$$f_{\mathbf{Z}}(\mathbf{z}) = \frac{f_{\mathbf{X}}(\mathbf{x})}{|\det A|}\bigg|_{\mathbf{x} = A^{-1}\mathbf{z}} = \frac{f_{\mathbf{X}}(A^{-1}\mathbf{z})}{|\det A|}.$$

Example 6.15 Sum of Random Variables

Given a random vector $\mathbf{X} = (X_1, X_2, X_3)$, find the joint pdf of the sum $Z = X_1 + X_2 + X_3$.

We use the transformation method by introducing auxiliary variables as follows:

$$Z_1 = X_1,\quad Z_2 = X_1 + X_2,\quad Z_3 = X_1 + X_2 + X_3.$$

The inverse transformation is given by:

$$X_1 = Z_1,\quad X_2 = Z_2 - Z_1,\quad X_3 = Z_3 - Z_2.$$

The Jacobian is:

$$J(x_1, x_2, x_3) = \det\begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix} = 1.$$

Therefore the joint pdf of $\mathbf{Z}$ is

$$f_{\mathbf{Z}}(z_1, z_2, z_3) = f_{\mathbf{X}}(z_1,\; z_2 - z_1,\; z_3 - z_2).$$

The pdf of $Z_3$ is obtained by integrating with respect to $z_1$ and $z_2$:

$$f_{Z_3}(z) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{\mathbf{X}}(z_1,\; z_2 - z_1,\; z - z_2)\, dz_1\, dz_2.$$

This expression can be simplified further if $X_1$, $X_2$, and $X_3$ are independent random variables.

¹ Appendix C provides a summary of definitions and useful results from linear algebra.

6.3 EXPECTED VALUES OF VECTOR RANDOM VARIABLES

In this section we are interested in characterizing a vector random variable through the expected values of its components and of functions of its components. We focus on the characterization of a vector random variable through its mean vector and its covariance matrix. We then introduce the joint characteristic function for a vector random variable.

The expected value of a function $g(\mathbf{X}) = g(X_1,\dots,X_n)$ of a vector random variable $\mathbf{X} = (X_1, X_2,\dots,X_n)$ is given by:

$$E[g(\mathbf{X})] = \begin{cases} \displaystyle\int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} g(x_1,\dots,x_n)\, f_{\mathbf{X}}(x_1,\dots,x_n)\, dx_1\cdots dx_n & \mathbf{X}\text{ jointly continuous} \\[8pt] \displaystyle\sum_{x_1}\cdots\sum_{x_n} g(x_1,\dots,x_n)\, p_{\mathbf{X}}(x_1,\dots,x_n) & \mathbf{X}\text{ discrete.} \end{cases} \tag{6.25}$$

An important example is $g(\mathbf{X})$ equal to a sum of functions of $\mathbf{X}$. The procedure leading to Eq. (5.26) and a simple induction argument show that:

$$E[g_1(\mathbf{X}) + g_2(\mathbf{X}) + \cdots + g_n(\mathbf{X})] = E[g_1(\mathbf{X})] + \cdots + E[g_n(\mathbf{X})].$$
(6.26)

Another important example is $g(\mathbf{X})$ equal to the product of n individual functions of the components. If $X_1,\dots,X_n$ are independent random variables, then

$$E[g_1(X_1)g_2(X_2)\cdots g_n(X_n)] = E[g_1(X_1)]\,E[g_2(X_2)]\cdots E[g_n(X_n)]. \tag{6.27}$$

6.3.1 Mean Vector and Covariance Matrix

The mean, variance, and covariance provide useful information about the distribution of a random variable and are easy to estimate, so we are frequently interested in characterizing multiple random variables in terms of their first and second moments. We now introduce the mean vector and the covariance matrix, and then investigate the mean vector and covariance matrix of a linear transformation of a random vector.

For $\mathbf{X} = (X_1, X_2,\dots,X_n)$, the mean vector is defined as the column vector of expected values of the components $X_k$:

$$\mathbf{m}_X = E[\mathbf{X}] = E\begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{bmatrix} \triangleq \begin{bmatrix} E[X_1] \\ E[X_2] \\ \vdots \\ E[X_n] \end{bmatrix}. \tag{6.28a}$$

Note that we define the vector of expected values as a column vector. In previous sections we have sometimes written $\mathbf{X}$ as a row vector, but in this section, and wherever we deal with matrix transformations, we represent $\mathbf{X}$ and its expected value as column vectors.

The correlation matrix has the second moments of $\mathbf{X}$ as its entries:

$$R_X = \begin{bmatrix} E[X_1^2] & E[X_1X_2] & \cdots & E[X_1X_n] \\ E[X_2X_1] & E[X_2^2] & \cdots & E[X_2X_n] \\ \vdots & \vdots & & \vdots \\ E[X_nX_1] & E[X_nX_2] & \cdots & E[X_n^2] \end{bmatrix}. \tag{6.28b}$$

The covariance matrix has the second-order central moments as its entries:

$$K_X = \begin{bmatrix} E[(X_1 - m_1)^2] & E[(X_1 - m_1)(X_2 - m_2)] & \cdots & E[(X_1 - m_1)(X_n - m_n)] \\ E[(X_2 - m_2)(X_1 - m_1)] & E[(X_2 - m_2)^2] & \cdots & E[(X_2 - m_2)(X_n - m_n)] \\ \vdots & \vdots & & \vdots \\ E[(X_n - m_n)(X_1 - m_1)] & E[(X_n - m_n)(X_2 - m_2)] & \cdots & E[(X_n - m_n)^2] \end{bmatrix}. \tag{6.28c}$$

Both $R_X$ and $K_X$ are $n \times n$ symmetric matrices. The diagonal elements of $K_X$ are the variances $\mathrm{VAR}[X_k] = E[(X_k - m_k)^2]$ of the components of $\mathbf{X}$. If the components are uncorrelated, then $\mathrm{COV}(X_j, X_k) = 0$ for $j \ne k$, and $K_X$ is a diagonal matrix.
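In practice the mean vector and these matrices are estimated from sample data. A minimal numpy sketch (the data-generating transformation below is an assumption, used only to produce correlated samples) estimates $\mathbf{m}_X$, $R_X$, and $K_X$ and checks the identity $R_X = K_X + \mathbf{m}_X\mathbf{m}_X^T$:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50_000

# Generate correlated data with a known mean (illustrative assumption).
A = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 0.3],
              [0.0, 0.0, 1.0]])
x = rng.standard_normal((n, 3)) @ A.T + np.array([1.0, 2.0, 3.0])

m = x.mean(axis=0)                                 # mean vector, Eq. (6.28a)
R = (x[:, :, None] * x[:, None, :]).mean(axis=0)   # correlation matrix E[X X^T]
K = R - np.outer(m, m)                             # covariance matrix

print(np.allclose(K, np.cov(x, rowvar=False), atol=0.05))  # consistency check
print(np.allclose(K, K.T))                                 # symmetry of K
```

Both checks print `True`: the moment-based estimate agrees with numpy's built-in covariance estimator up to sampling error, and the estimated $K_X$ is symmetric, as Eq. (6.28c) requires.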
If the random variables $X_1,\dots,X_n$ are independent, then they are uncorrelated and $K_X$ is diagonal. Finally, if the vector of expected values is $\mathbf{0}$, that is, $m_k = E[X_k] = 0$ for all k, then $R_X = K_X$.

Example 6.16

Let $\mathbf{X} = (X_1, X_2, X_3)$ be the jointly Gaussian random vector from Example 6.6. Find $E[\mathbf{X}]$ and $K_X$.

We rewrite the joint pdf as follows:

$$f_{X_1,X_2,X_3}(x_1, x_2, x_3) = \frac{e^{-(x_1^2 + x_2^2 - \sqrt{2}\,x_1x_2)}}{2\pi\sqrt{1 - (1/\sqrt{2})^2}}\;\frac{e^{-x_3^2/2}}{\sqrt{2\pi}}.$$

We see that $X_3$ is a Gaussian random variable with zero mean and unit variance, and that it is independent of $X_1$ and $X_2$. We also see that $X_1$ and $X_2$ are jointly Gaussian with zero mean and unit variance, and with correlation coefficient

$$\rho_{X_1X_2} = \frac{1}{\sqrt{2}} = \frac{\mathrm{COV}(X_1, X_2)}{\sigma_{X_1}\sigma_{X_2}} = \mathrm{COV}(X_1, X_2).$$

Therefore the mean vector is $\mathbf{m}_X = \mathbf{0}$, and

$$K_X = \begin{bmatrix} 1 & 1/\sqrt{2} & 0 \\ 1/\sqrt{2} & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

We now develop compact expressions for $R_X$ and $K_X$. If we multiply $\mathbf{X}$, an $n \times 1$ matrix, by $\mathbf{X}^T$, a $1 \times n$ matrix, we obtain the $n \times n$ matrix

$$\mathbf{X}\mathbf{X}^T = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{bmatrix}\,[X_1, X_2,\dots,X_n] = \begin{bmatrix} X_1^2 & X_1X_2 & \cdots & X_1X_n \\ X_2X_1 & X_2^2 & \cdots & X_2X_n \\ \vdots & \vdots & & \vdots \\ X_nX_1 & X_nX_2 & \cdots & X_n^2 \end{bmatrix}.$$

If we define the expected value of a matrix to be the matrix of expected values of its elements, then we can write the correlation matrix as:

$$R_X = E[\mathbf{X}\mathbf{X}^T]. \tag{6.29a}$$

The covariance matrix is then:

$$K_X = E[(\mathbf{X} - \mathbf{m}_X)(\mathbf{X} - \mathbf{m}_X)^T] = E[\mathbf{X}\mathbf{X}^T] - \mathbf{m}_X E[\mathbf{X}^T] - E[\mathbf{X}]\mathbf{m}_X^T + \mathbf{m}_X\mathbf{m}_X^T = R_X - \mathbf{m}_X\mathbf{m}_X^T. \tag{6.29b}$$

6.3.2 Linear Transformations of Random Vectors

Many engineering systems are linear in the sense that will be elaborated on in Chapter 10. Frequently these systems can be reduced to a linear transformation of a vector of random variables, where the "input" is $\mathbf{X}$ and the "output" is $\mathbf{Y}$:

$$\mathbf{Y} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}\begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{bmatrix} = A\mathbf{X}.$$

The expected value of the kth component of $\mathbf{Y}$ is the inner product (dot product) of the kth row of A with $\mathbf{X}$:

$$E[Y_k] = E\Big[\sum_{j=1}^n a_{kj}X_j\Big] = \sum_{j=1}^n a_{kj}E[X_j].$$
Each component of $E[\mathbf{Y}]$ is obtained in this manner, so:

$$\mathbf{m}_Y = E[\mathbf{Y}] = \begin{bmatrix} \sum_{j=1}^n a_{1j}E[X_j] \\ \sum_{j=1}^n a_{2j}E[X_j] \\ \vdots \\ \sum_{j=1}^n a_{nj}E[X_j] \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}\begin{bmatrix} E[X_1] \\ E[X_2] \\ \vdots \\ E[X_n] \end{bmatrix} = A\,E[\mathbf{X}] = A\mathbf{m}_X. \tag{6.30a}$$

The covariance matrix of $\mathbf{Y}$ is then:

$$K_Y = E[(\mathbf{Y} - \mathbf{m}_Y)(\mathbf{Y} - \mathbf{m}_Y)^T] = E[(A\mathbf{X} - A\mathbf{m}_X)(A\mathbf{X} - A\mathbf{m}_X)^T] = E[A(\mathbf{X} - \mathbf{m}_X)(\mathbf{X} - \mathbf{m}_X)^T A^T] = A K_X A^T, \tag{6.30b}$$

where we used the fact that the transpose of a matrix product is the product of the transposed matrices in reverse order: $\{A(\mathbf{X} - \mathbf{m}_X)\}^T = (\mathbf{X} - \mathbf{m}_X)^T A^T$.

The cross-covariance matrix of two random vectors $\mathbf{X}$ and $\mathbf{Y}$ is defined as:

$$K_{XY} = E[(\mathbf{X} - \mathbf{m}_X)(\mathbf{Y} - \mathbf{m}_Y)^T] = E[\mathbf{X}\mathbf{Y}^T] - \mathbf{m}_X\mathbf{m}_Y^T = R_{XY} - \mathbf{m}_X\mathbf{m}_Y^T.$$

For the cross-covariance between $\mathbf{X}$ and $\mathbf{Y} = A\mathbf{X}$ we obtain:

$$K_{XY} = E[(\mathbf{X} - \mathbf{m}_X)(\mathbf{Y} - \mathbf{m}_Y)^T] = E[(\mathbf{X} - \mathbf{m}_X)(\mathbf{X} - \mathbf{m}_X)^T A^T] = K_X A^T. \tag{6.30c}$$

Example 6.17 Transformation of Uncorrelated Random Vector

Suppose that the components of $\mathbf{X}$ are uncorrelated and have unit variance; then $K_X = I$, the identity matrix. The covariance matrix of $\mathbf{Y} = A\mathbf{X}$ is

$$K_Y = A K_X A^T = A I A^T = A A^T. \tag{6.31}$$

In general $K_Y = AA^T$ is not a diagonal matrix, and so the components of $\mathbf{Y}$ are correlated. In Section 6.6 we discuss how to find a matrix A so that Eq. (6.31) holds for a given $K_Y$. We can then generate a random vector $\mathbf{Y}$ with any desired covariance matrix $K_Y$.

Suppose instead that the components of $\mathbf{X}$ are correlated, so that $K_X$ is not a diagonal matrix. In many situations we are interested in finding a transformation matrix A so that $\mathbf{Y} = A\mathbf{X}$ has uncorrelated components. This requires finding A so that $K_Y = A K_X A^T$ is a diagonal matrix. In the last part of this section we show how to find such a matrix A.

Example 6.18 Transformation to Uncorrelated Random Vector

Suppose the random vector $(X_1, X_2, X_3)$ of Example 6.16 is transformed using the matrix

$$A = \begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} & 0 \\ 1/\sqrt{2} & -1/\sqrt{2} & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

Find $E[\mathbf{Y}]$ and $K_Y$.
Since $\mathbf{m}_X = \mathbf{0}$, we have $E[\mathbf{Y}] = A\mathbf{m}_X = \mathbf{0}$. The covariance matrix of $\mathbf{Y}$ is:

$$K_Y = A K_X A^T = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 & 0 \\ 1 & -1 & 0 \\ 0 & 0 & \sqrt{2} \end{bmatrix}\begin{bmatrix} 1 & 1/\sqrt{2} & 0 \\ 1/\sqrt{2} & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 & 0 \\ 1 & -1 & 0 \\ 0 & 0 & \sqrt{2} \end{bmatrix} = \begin{bmatrix} 1 + \dfrac{1}{\sqrt{2}} & 0 & 0 \\ 0 & 1 - \dfrac{1}{\sqrt{2}} & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

The linear transformation has produced a vector of random variables $\mathbf{Y} = (Y_1, Y_2, Y_3)$ with components that are uncorrelated.

*6.3.3 Joint Characteristic Function

The joint characteristic function of n random variables is defined as

$$\Phi_{X_1,X_2,\dots,X_n}(\omega_1, \omega_2,\dots,\omega_n) = E\big[e^{j(\omega_1X_1 + \omega_2X_2 + \cdots + \omega_nX_n)}\big]. \tag{6.32a}$$

In this section we develop the properties of the joint characteristic function of two random variables. These properties generalize in a straightforward fashion to the case of n random variables. Therefore consider

$$\Phi_{X,Y}(\omega_1, \omega_2) = E\big[e^{j(\omega_1X + \omega_2Y)}\big]. \tag{6.32b}$$

If X and Y are jointly continuous random variables, then

$$\Phi_{X,Y}(\omega_1, \omega_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f_{X,Y}(x,y)\, e^{j(\omega_1x + \omega_2y)}\, dx\, dy. \tag{6.32c}$$

Equation (6.32c) shows that the joint characteristic function is the two-dimensional Fourier transform of the joint pdf of X and Y. The inversion formula for the Fourier transform implies that the joint pdf is given by

$$f_{X,Y}(x,y) = \frac{1}{4\pi^2}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \Phi_{X,Y}(\omega_1, \omega_2)\, e^{-j(\omega_1x + \omega_2y)}\, d\omega_1\, d\omega_2. \tag{6.33}$$

Note from Eq. (6.32b) that the marginal characteristic functions can be obtained from the joint characteristic function:

$$\Phi_X(\omega) = \Phi_{X,Y}(\omega, 0), \qquad \Phi_Y(\omega) = \Phi_{X,Y}(0, \omega). \tag{6.34}$$

If X and Y are independent random variables, then the joint characteristic function is the product of the marginal characteristic functions, since

$$\Phi_{X,Y}(\omega_1, \omega_2) = E\big[e^{j(\omega_1X + \omega_2Y)}\big] = E\big[e^{j\omega_1X}e^{j\omega_2Y}\big] = E\big[e^{j\omega_1X}\big]E\big[e^{j\omega_2Y}\big] = \Phi_X(\omega_1)\Phi_Y(\omega_2), \tag{6.35}$$

where the third equality follows from Eq.
(6.27).

The characteristic function of the sum $Z = aX + bY$ can be obtained from the joint characteristic function of X and Y as follows:

$$\Phi_Z(\omega) = E\big[e^{j\omega(aX + bY)}\big] = E\big[e^{j(\omega aX + \omega bY)}\big] = \Phi_{X,Y}(a\omega, b\omega). \tag{6.36a}$$

If X and Y are independent random variables, the characteristic function of $Z = aX + bY$ is then

$$\Phi_Z(\omega) = \Phi_{X,Y}(a\omega, b\omega) = \Phi_X(a\omega)\Phi_Y(b\omega). \tag{6.36b}$$

In Section 8.1 we will use this result in dealing with sums of random variables.

The joint moments of X and Y (if they exist) can be obtained by taking derivatives of the joint characteristic function. To show this, we rewrite Eq. (6.32b) as the expected value of a product of exponentials and expand the exponentials in power series:

$$\Phi_{X,Y}(\omega_1, \omega_2) = E\big[e^{j\omega_1X}e^{j\omega_2Y}\big] = E\Big[\sum_{i=0}^{\infty}\frac{(j\omega_1X)^i}{i!}\sum_{k=0}^{\infty}\frac{(j\omega_2Y)^k}{k!}\Big] = \sum_{i=0}^{\infty}\sum_{k=0}^{\infty} E[X^iY^k]\,\frac{(j\omega_1)^i}{i!}\,\frac{(j\omega_2)^k}{k!}.$$

It then follows that the moments can be obtained by taking an appropriate set of derivatives:

$$E[X^iY^k] = \frac{1}{j^{i+k}}\,\frac{\partial^i\,\partial^k}{\partial\omega_1^i\,\partial\omega_2^k}\,\Phi_{X,Y}(\omega_1, \omega_2)\Big|_{\omega_1 = 0,\, \omega_2 = 0}. \tag{6.37}$$

Example 6.19

Suppose U and V are independent zero-mean, unit-variance Gaussian random variables, and let $X = U + V$ and $Y = 2U + V$. Find the joint characteristic function of X and Y, and find $E[XY]$.

The joint characteristic function of X and Y is

$$\Phi_{X,Y}(\omega_1, \omega_2) = E\big[e^{j(\omega_1X + \omega_2Y)}\big] = E\big[e^{j\omega_1(U+V)}e^{j\omega_2(2U+V)}\big] = E\big[e^{j\{(\omega_1 + 2\omega_2)U + (\omega_1 + \omega_2)V\}}\big].$$

Since U and V are independent random variables, this expectation factors into the product of the marginal characteristic functions:

$$\Phi_{X,Y}(\omega_1, \omega_2) = E\big[e^{j(\omega_1 + 2\omega_2)U}\big]E\big[e^{j(\omega_1 + \omega_2)V}\big] = \Phi_U(\omega_1 + 2\omega_2)\Phi_V(\omega_1 + \omega_2) = e^{-\frac{1}{2}(\omega_1 + 2\omega_2)^2}e^{-\frac{1}{2}(\omega_1 + \omega_2)^2} = e^{-\frac{1}{2}(2\omega_1^2 + 6\omega_1\omega_2 + 5\omega_2^2)},$$

where the marginal characteristic functions were obtained from Table 4.1.

The correlation $E[XY]$ is found from Eq.
(6.37) with $i = 1$ and $k = 1$:

$$E[XY] = \frac{1}{j^2}\,\frac{\partial^2}{\partial\omega_1\,\partial\omega_2}\,\Phi_{X,Y}(\omega_1, \omega_2)\Big|_{\omega_1 = 0,\, \omega_2 = 0}$$

$$= -\Big\{\tfrac{1}{4}(6\omega_1 + 10\omega_2)(4\omega_1 + 6\omega_2)\,e^{-\frac{1}{2}(2\omega_1^2 + 6\omega_1\omega_2 + 5\omega_2^2)} - 3\,e^{-\frac{1}{2}(2\omega_1^2 + 6\omega_1\omega_2 + 5\omega_2^2)}\Big\}\Big|_{\omega_1 = 0,\, \omega_2 = 0} = 3.$$

You should verify this answer by evaluating $E[XY] = E[(U + V)(2U + V)]$ directly.

*6.3.4 Diagonalization of Covariance Matrix

Let $\mathbf{X}$ be a random vector with covariance matrix $K_X$. We are interested in finding an $n \times n$ matrix A such that $\mathbf{Y} = A\mathbf{X}$ has a covariance matrix that is diagonal; the components of $\mathbf{Y}$ are then uncorrelated. We saw that $K_X$ is a real-valued symmetric matrix. In Appendix C we state the result from linear algebra that $K_X$ is then a diagonalizable matrix; that is, there is a matrix P such that

$$P^T K_X P = \Lambda \quad\text{and}\quad P^T P = I, \tag{6.38a}$$

where $\Lambda$ is a diagonal matrix and I is the identity matrix. Therefore if we let $A = P^T$, then from Eq. (6.30b) we obtain a diagonal $K_Y$. We now show how P is obtained.

First, we find the eigenvalues and eigenvectors of $K_X$ from

$$K_X \mathbf{e}_i = \lambda_i \mathbf{e}_i, \tag{6.38b}$$

where the $\mathbf{e}_i$ are $n \times 1$ column vectors.² We can normalize each eigenvector $\mathbf{e}_i$ so that $\mathbf{e}_i^T\mathbf{e}_i$, the sum of the squares of its components, is 1. The normalized eigenvectors are then orthonormal, that is,

$$\mathbf{e}_i^T\mathbf{e}_j = \delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \ne j. \end{cases} \tag{6.38c}$$

Let P be the matrix whose columns are the eigenvectors of $K_X$, and let $\Lambda$ be the diagonal matrix of eigenvalues:

$$P = [\mathbf{e}_1, \mathbf{e}_2,\dots,\mathbf{e}_n] \quad\text{and}\quad \Lambda = \mathrm{diag}[\lambda_i].$$

From Eq. (6.38b) we have:

$$K_X P = K_X[\mathbf{e}_1, \mathbf{e}_2,\dots,\mathbf{e}_n] = [K_X\mathbf{e}_1, K_X\mathbf{e}_2,\dots,K_X\mathbf{e}_n] = [\lambda_1\mathbf{e}_1, \lambda_2\mathbf{e}_2,\dots,\lambda_n\mathbf{e}_n] = P\Lambda, \tag{6.39a}$$

where the second equality follows from the fact that each column of $K_XP$ is obtained by multiplying a column of P by $K_X$. Premultiplying both sides of the above equation by $P^T$, we obtain:

$$P^T K_X P = P^T P \Lambda = \Lambda. \tag{6.39b}$$

² See Appendix C.

We conclude that if we let $A = P^T$, so that

$$\mathbf{Y} = A\mathbf{X} = P^T\mathbf{X}, \tag{6.40a}$$

then the random variables in $\mathbf{Y}$ are uncorrelated, since

$$K_Y = P^T K_X P = \Lambda. \tag{6.40b}$$

In summary, any covariance matrix $K_X$
can be diagonalized by a linear transformation. The matrix A in the transformation is obtained from the eigenvectors of $K_X$.

Equation (6.40b) provides insight into the invertibility of $K_X$ and $K_Y$. From linear algebra we know that the determinant of a product of $n \times n$ matrices is the product of the determinants, so:

$$\det K_Y = \det P^T\,\det K_X\,\det P = \det \Lambda = \lambda_1\lambda_2\cdots\lambda_n,$$

where we used the fact that $\det P^T\,\det P = \det I = 1$. Recall that a matrix is invertible if and only if its determinant is nonzero. Therefore $K_Y$ is not invertible if and only if one or more of the eigenvalues of $K_X$ is zero.

Now suppose that one of the eigenvalues is zero, say $\lambda_k = 0$. Since $\mathrm{VAR}[Y_k] = \lambda_k = 0$, then $Y_k = 0$. But $Y_k$ is defined as a linear combination, so

$$0 = Y_k = a_{k1}X_1 + a_{k2}X_2 + \cdots + a_{kn}X_n.$$

We conclude that the components of $\mathbf{X}$ are linearly dependent: one or more of the components of $\mathbf{X}$ is redundant and can be expressed as a linear combination of the other components.

It is interesting to express $\mathbf{X}$ in terms of $\mathbf{Y}$. Multiply both sides of Eq. (6.40a) by P and use the fact that $PP^T = I$:

$$\mathbf{X} = PP^T\mathbf{X} = P\mathbf{Y} = [\mathbf{e}_1, \mathbf{e}_2,\dots,\mathbf{e}_n]\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = \sum_{k=1}^n Y_k\mathbf{e}_k. \tag{6.41}$$

This equation is called the Karhunen-Loève expansion. It shows that a random vector $\mathbf{X}$ can be expressed as a weighted sum of the eigenvectors of $K_X$, where the coefficients are uncorrelated random variables $Y_k$. Furthermore, the eigenvectors form an orthonormal set. Note that if any of the eigenvalues is zero, $\mathrm{VAR}[Y_k] = \lambda_k = 0$, then $Y_k = 0$ and the corresponding term can be dropped from the expansion in Eq. (6.41). In Chapter 10 we will see that this expansion is very useful in the processing of random signals.

6.4 JOINTLY GAUSSIAN RANDOM VECTORS

The random variables $X_1, X_2,\dots,X_n$ are said to be jointly Gaussian if their joint pdf is given by
$$f_{\mathbf{X}}(\mathbf{x}) \triangleq f_{X_1,X_2,\dots,X_n}(x_1,\dots,x_n) = \frac{\exp\{-\frac{1}{2}(\mathbf{x} - \mathbf{m})^T K^{-1}(\mathbf{x} - \mathbf{m})\}}{(2\pi)^{n/2}\,|K|^{1/2}}, \tag{6.42a}$$

where $\mathbf{x}$ and $\mathbf{m}$ are the column vectors

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \qquad \mathbf{m} = \begin{bmatrix} m_1 \\ m_2 \\ \vdots \\ m_n \end{bmatrix} = \begin{bmatrix} E[X_1] \\ E[X_2] \\ \vdots \\ E[X_n] \end{bmatrix},$$

and K is the covariance matrix

$$K = \begin{bmatrix} \mathrm{VAR}(X_1) & \mathrm{COV}(X_1, X_2) & \cdots & \mathrm{COV}(X_1, X_n) \\ \mathrm{COV}(X_2, X_1) & \mathrm{VAR}(X_2) & \cdots & \mathrm{COV}(X_2, X_n) \\ \vdots & \vdots & & \vdots \\ \mathrm{COV}(X_n, X_1) & \mathrm{COV}(X_n, X_2) & \cdots & \mathrm{VAR}(X_n) \end{bmatrix}. \tag{6.42b}$$

The $(\cdot)^T$ in Eq. (6.42a) denotes the transpose of a matrix or vector. Note that the covariance matrix is a symmetric matrix, since $\mathrm{COV}(X_i, X_j) = \mathrm{COV}(X_j, X_i)$.

Equation (6.42a) shows that the pdf of jointly Gaussian random variables is completely specified by the individual means and variances and the pairwise covariances. It can be shown using the joint characteristic function that all the marginal pdf's associated with Eq. (6.42a) are also Gaussian, and that these too are completely specified by the same set of means, variances, and covariances.

Example 6.20

Verify that the two-dimensional Gaussian pdf given in Eq. (5.61a) has the form of Eq. (6.42a).

The covariance matrix for the two-dimensional case is given by

$$K = \begin{bmatrix} \sigma_1^2 & \rho_{X,Y}\sigma_1\sigma_2 \\ \rho_{X,Y}\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix},$$

where we have used the fact that $\mathrm{COV}(X_1, X_2) = \rho_{X,Y}\sigma_1\sigma_2$. The determinant of K is $\sigma_1^2\sigma_2^2(1 - \rho_{X,Y}^2)$, so the denominator of the pdf has the correct form. The inverse of the covariance matrix is also a real symmetric matrix:

$$K^{-1} = \frac{1}{\sigma_1^2\sigma_2^2(1 - \rho_{X,Y}^2)}\begin{bmatrix} \sigma_2^2 & -\rho_{X,Y}\sigma_1\sigma_2 \\ -\rho_{X,Y}\sigma_1\sigma_2 & \sigma_1^2 \end{bmatrix}.$$

The term in the exponent is therefore

$$\frac{1}{2}(\mathbf{x} - \mathbf{m})^TK^{-1}(\mathbf{x} - \mathbf{m}) = \frac{1}{2\sigma_1^2\sigma_2^2(1 - \rho_{X,Y}^2)}\,(x - m_1,\; y - m_2)\begin{bmatrix} \sigma_2^2(x - m_1) - \rho_{X,Y}\sigma_1\sigma_2(y - m_2) \\ -\rho_{X,Y}\sigma_1\sigma_2(x - m_1) + \sigma_1^2(y - m_2) \end{bmatrix}$$

$$= \frac{\big((x - m_1)/\sigma_1\big)^2 - 2\rho_{X,Y}\big((x - m_1)/\sigma_1\big)\big((y - m_2)/\sigma_2\big) + \big((y - m_2)/\sigma_2\big)^2}{2(1 - \rho_{X,Y}^2)}.$$

Thus the two-dimensional pdf has the form of Eq. (6.42a).
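The algebra of Example 6.20 can also be confirmed numerically. This sketch (the parameter values and evaluation point are assumptions chosen for illustration) evaluates the matrix form of Eq. (6.42a) and the two-dimensional formula at the same point:

```python
import numpy as np

def gauss_pdf(x, m, K):
    """Jointly Gaussian pdf in the matrix form of Eq. (6.42a)."""
    d = x - m
    norm = (2 * np.pi) ** (len(m) / 2) * np.sqrt(np.linalg.det(K))
    return float(np.exp(-0.5 * d @ np.linalg.inv(K) @ d) / norm)

# Assumed two-dimensional parameters.
s1, s2, rho = 1.0, 2.0, 0.5
m = np.array([0.0, 1.0])
K = np.array([[s1 ** 2, rho * s1 * s2],
              [rho * s1 * s2, s2 ** 2]])

x, y = 0.7, -0.4
a, b = (x - m[0]) / s1, (y - m[1]) / s2
q = a * a - 2 * rho * a * b + b * b        # normalized quadratic of Eq. (5.61a)
f2 = np.exp(-q / (2 * (1 - rho ** 2))) / (2 * np.pi * s1 * s2 * np.sqrt(1 - rho ** 2))

print(np.isclose(gauss_pdf(np.array([x, y]), m, K), f2))   # True
```

The two evaluations agree to machine precision, as the example's derivation guarantees.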
Example 6.21

The vector of random variables (X, Y, Z) is jointly Gaussian with zero means and covariance matrix:

$$K = \begin{bmatrix} \mathrm{VAR}(X) & \mathrm{COV}(X, Y) & \mathrm{COV}(X, Z) \\ \mathrm{COV}(Y, X) & \mathrm{VAR}(Y) & \mathrm{COV}(Y, Z) \\ \mathrm{COV}(Z, X) & \mathrm{COV}(Z, Y) & \mathrm{VAR}(Z) \end{bmatrix} = \begin{bmatrix} 1.0 & 0.2 & 0.3 \\ 0.2 & 1.0 & 0.4 \\ 0.3 & 0.4 & 1.0 \end{bmatrix}.$$

Find the marginal pdf of X and Z.

We can solve this problem in two ways. The first involves integrating the joint pdf directly to obtain the marginal pdf; the second uses the fact that the marginal pdf of X and Z is also Gaussian, with the same set of means, variances, and covariances. We use the second approach. The pair (X, Z) has zero mean vector and covariance matrix:

$$K' = \begin{bmatrix} \mathrm{VAR}(X) & \mathrm{COV}(X, Z) \\ \mathrm{COV}(Z, X) & \mathrm{VAR}(Z) \end{bmatrix} = \begin{bmatrix} 1.0 & 0.3 \\ 0.3 & 1.0 \end{bmatrix}.$$

The joint pdf of X and Z is found by substituting this zero mean vector and covariance matrix into Eq. (6.42a).

Example 6.22 Independence of Uncorrelated Jointly Gaussian Random Variables

Suppose $X_1, X_2,\dots,X_n$ are jointly Gaussian random variables with $\mathrm{COV}(X_i, X_j) = 0$ for $i \ne j$. Show that $X_1, X_2,\dots,X_n$ are independent random variables.

From Eq. (6.42b) we see that the covariance matrix is a diagonal matrix:

$$K = \mathrm{diag}[\mathrm{VAR}(X_i)] = \mathrm{diag}[\sigma_i^2].$$

Therefore $K^{-1} = \mathrm{diag}[1/\sigma_i^2]$ and

$$(\mathbf{x} - \mathbf{m})^TK^{-1}(\mathbf{x} - \mathbf{m}) = \sum_{i=1}^n\Big(\frac{x_i - m_i}{\sigma_i}\Big)^2.$$

Thus from Eq. (6.42a),

$$f_{\mathbf{X}}(\mathbf{x}) = \frac{\exp\big\{-\frac{1}{2}\sum_{i=1}^n[(x_i - m_i)/\sigma_i]^2\big\}}{(2\pi)^{n/2}|K|^{1/2}} = \prod_{i=1}^n\frac{\exp\big\{-\frac{1}{2}[(x_i - m_i)/\sigma_i]^2\big\}}{\sqrt{2\pi\sigma_i^2}} = \prod_{i=1}^n f_{X_i}(x_i).$$

Thus $X_1, X_2,\dots,X_n$ are independent Gaussian random variables.

Example 6.23 Conditional pdf of Gaussian Random Variable

Find the conditional pdf of $X_n$ given $X_1, X_2,\dots,X_{n-1}$.

Let $K_n$ be the covariance matrix of $\mathbf{X}_n = (X_1, X_2,\dots,X_n)$ and $K_{n-1}$ the covariance matrix of $\mathbf{X}_{n-1} = (X_1, X_2,\dots,X_{n-1})$. Let $Q_n = K_n^{-1}$ and $Q_{n-1} = K_{n-1}^{-1}$, with entries denoted $Q_{jn}$, etc. The smaller matrices appear as blocks of bordered forms, as shown below:

$$K_n = \begin{bmatrix} K_{n-1} & \begin{matrix} K_{1n} \\ \vdots \\ K_{n-1,n} \end{matrix} \\ K_{1n}\;\cdots\;K_{n-1,n} & K_{nn} \end{bmatrix},
\qquad
Q_n = \begin{bmatrix} Q_{n-1} & \begin{matrix} Q_{1n} \\ \vdots \\ Q_{n-1,n} \end{matrix} \\ Q_{1n}\;\cdots\;Q_{n-1,n} & Q_{nn} \end{bmatrix}.$$
Qn - 1 Q2n Á Qnn Below we will use the subscript n or n - 1 to distinguish between the two random vectors and their parameters. The marginal pdf of Xn given X1 , X2 , Á , Xn - 1 is given by: fXn1xn ƒ x1 , Á , xn - 12 = fXn1Xn2 fXn - 11Xn - 12 = exp5- 121x n - m n2TQn1x n - m n26 = exp5- 121x n - m n2TQn1x n - m n2 + 211x n - 1 - m n - 12TQn - 11x n - 1 - m n - 126 12p2n/2 ƒ K n ƒ 1/2 12p21n - 121/2 ƒ K n - 1 ƒ 1/2 exp5- 211x n - 1 - m n - 12TQn - 11x n - 1 - m n - 126 22p ƒ K n ƒ 1/2/ ƒ K n - 1 ƒ 1/2 . In Problem 6.60 we show that the terms in the above expression are given by: 1 2 1x n where B = - m n2TQn1x n - m n2 - 211x n - 1 - m n - 12TQn - 11x n - 1 - m n - 12 = Qnn51xn - mn2 + B62 - QnnB2 1 n-1 Qjn1xj - mj2 Qnn ja =1 and (6.43) ƒ K n ƒ / ƒ K n - 1 ƒ = 1 /Qnn . This implies that Xn has mean mn - B, and variance 1/Qnn . The term QnnB2 is part of the normalization constant. We therefore conclude that: fXn1xn ƒ x1 , Á , xn - 12 = exp b - 2 Qnn 1 n-1 Qjn1xj - mj2 ≤ r ¢ x - mn + a 2 Qnn j = 1 22p / Qnn We see that the conditional mean of Xn is a linear function of the “observations” x1 , x2 , Á , xn - 1 . *6.4.1 Linear Transformation of Gaussian Random Variables A very important property of jointly Gaussian random variables is that the linear transformation of any n jointly Gaussian random variables results in n random variables that are also jointly Gaussian. This is easy to show using the matrix notation in Eq. (6.42a). Let X = 1X1 , Á , Xn2 be jointly Gaussian with covariance matrix KX and mean vector m X and define Y = 1Y1 , Á , Yn2 by Y = AX, Section 6.4 329 Jointly Gaussian Random Vectors where A is an invertible n * n matrix. From Eq. (5.60) we know that the pdf of Y is given by fX1A-1y2 fY1y2 = ƒAƒ = -1 exp5- 211A-1y - mX2TKX 1A-1y - mX26 12p2 ƒ A ƒ ƒ KX ƒ n/2 1/2 . (6.44) From elementary properties of matrices we have that and 1A-1y - m X2 = A-11y - Am X2 1A-1y - m X2T = 1y - Am X2TA-1T. 
The argument of the exponential is therefore equal to

$$(\mathbf{y} - A\mathbf{m}_X)^TA^{-1T}K_X^{-1}A^{-1}(\mathbf{y} - A\mathbf{m}_X) = (\mathbf{y} - A\mathbf{m}_X)^T(AK_XA^T)^{-1}(\mathbf{y} - A\mathbf{m}_X),$$

since $A^{-1T}K_X^{-1}A^{-1} = (AK_XA^T)^{-1}$. Letting $K_Y = AK_XA^T$ and $\mathbf{m}_Y = A\mathbf{m}_X$, and noting that

$$\det K_Y = \det(AK_XA^T) = \det A\,\det K_X\,\det A^T = (\det A)^2\,\det K_X,$$

we finally have that the pdf of $\mathbf{Y}$ is

$$f_{\mathbf{Y}}(\mathbf{y}) = \frac{e^{-\frac{1}{2}(\mathbf{y} - \mathbf{m}_Y)^TK_Y^{-1}(\mathbf{y} - \mathbf{m}_Y)}}{(2\pi)^{n/2}\,|K_Y|^{1/2}}. \tag{6.45}$$

Thus the pdf of $\mathbf{Y}$ has the form of Eq. (6.42a), and therefore $Y_1,\dots,Y_n$ are jointly Gaussian random variables with mean vector and covariance matrix

$$\mathbf{m}_Y = A\mathbf{m}_X \quad\text{and}\quad K_Y = AK_XA^T.$$

This result is consistent with the mean vector and covariance matrix we obtained before, in Eqs. (6.30a) and (6.30b).

In many problems we wish to transform $\mathbf{X}$ into a vector $\mathbf{Y}$ of independent Gaussian random variables. Since $K_X$ is a symmetric matrix, it is always possible to find a matrix A such that $AK_XA^T = \Lambda$ is a diagonal matrix. (See Section 6.6.) For such a matrix A, the pdf of $\mathbf{Y}$ will be

$$f_{\mathbf{Y}}(\mathbf{y}) = \frac{e^{-\frac{1}{2}(\mathbf{y} - \boldsymbol{\nu})^T\Lambda^{-1}(\mathbf{y} - \boldsymbol{\nu})}}{(2\pi)^{n/2}\,|\Lambda|^{1/2}} = \frac{\exp\big\{-\frac{1}{2}\sum_{i=1}^n(y_i - \nu_i)^2/\lambda_i\big\}}{\big[(2\pi\lambda_1)(2\pi\lambda_2)\cdots(2\pi\lambda_n)\big]^{1/2}}, \tag{6.46}$$

where $\lambda_1,\dots,\lambda_n$ are the diagonal components of $\Lambda$, all of which we assume to be nonzero. The above pdf implies that $Y_1,\dots,Y_n$ are independent random variables with means $\nu_i$ and variances $\lambda_i$. In conclusion, it is possible to linearly transform a vector of jointly Gaussian random variables into a vector of independent Gaussian random variables.

It is always possible to select the matrix A that diagonalizes K so that $\det A = 1$. The transformation $A\mathbf{X}$ then corresponds to a rotation of the coordinate system, so that the principal axes of the ellipsoid corresponding to the pdf are aligned with the axes of the system. Example 5.48 provides an $n = 2$ example of such a rotation.

In computer simulation models we frequently need to generate jointly Gaussian random vectors with specified covariance matrix and mean vector.
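A minimal sketch of such a generation procedure follows. It uses a Cholesky factor of K as the factor matrix, which is one convenient choice (Section 6.6 discusses the general construction); the target covariance is borrowed from the (X, Z) pair of Example 6.21:

```python
import numpy as np

rng = np.random.default_rng(7)
K = np.array([[1.0, 0.3],
              [0.3, 1.0]])                # desired covariance matrix
m = np.array([0.0, 0.0])                  # desired mean vector
n = 200_000

L = np.linalg.cholesky(K)                 # lower-triangular factor, L @ L.T == K
u = rng.standard_normal((n, 2))           # i.i.d. zero-mean, unit-variance components
y = u @ L.T + m                           # rows are jointly Gaussian with covariance K

print(np.cov(y, rowvar=False))            # ~ K
```

The sample covariance of the generated vectors matches the target K up to sampling error; for a nonzero mean vector, the shift by m produces the required $\mathbf{Y} + \mathbf{m}$.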
Suppose that X = (X_1, X_2, …, X_n) has components that are zero-mean, unit-variance, independent Gaussian random variables, so its mean vector is 0 and its covariance matrix is the identity matrix I. Let K denote the desired covariance matrix. Using the methods discussed in Section 6.6, it is possible to find a matrix A so that A^T A = K. Therefore Y = A^T X has zero mean vector and covariance matrix K. From Eq. (6.45) we have that Y is also a jointly Gaussian random vector with zero mean vector and covariance K. If we require a nonzero mean vector m, we use Y + m.

Example 6.24 Sum of Jointly Gaussian Random Variables

Let X_1, X_2, …, X_n be jointly Gaussian random variables with joint pdf given by Eq. (6.42a). Let

Z = a_1 X_1 + a_2 X_2 + ⋯ + a_n X_n.

We will show that Z is always a Gaussian random variable. We find the pdf of Z by introducing auxiliary random variables. Let

Z_1 = Z,  Z_2 = X_2,  Z_3 = X_3,  …,  Z_n = X_n.

If we define Z = (Z_1, Z_2, …, Z_n), then Z = AX where

A = [ a_1  a_2  ⋯  a_n ;  0  1  ⋯  0 ;  ⋮  ⋱  ⋮ ;  0  0  ⋯  1 ].

From Eq. (6.45) we have that Z is jointly Gaussian with mean vector ν = Am and covariance matrix C = AKA^T. Furthermore, it then follows that the marginal pdf of Z is a Gaussian pdf with mean given by the first component of ν and variance given by the 1-1 component of the covariance matrix C. By carrying out the above matrix multiplications, we find that

E[Z] = Σ_{i=1}^{n} a_i E[X_i]   (6.47a)

VAR[Z] = Σ_{i=1}^{n} Σ_{j=1}^{n} a_i a_j COV(X_i, X_j).   (6.47b)

*6.4.2 Joint Characteristic Function of a Gaussian Random Variable

The joint characteristic function is very useful in developing the properties of jointly Gaussian random variables. We now show that the joint characteristic function of n jointly Gaussian random variables X_1, X_2, …, X_n is given by

Φ_{X_1,…,X_n}(ω_1, ω_2, …, ω_n) = exp{ j Σ_{i=1}^{n} ω_i m_i − (1/2) Σ_{i=1}^{n} Σ_{k=1}^{n} ω_i ω_k COV(X_i, X_k) },   (6.48a)

which can be written more compactly as follows: Φ_X(ω) ≜
Φ_{X_1,…,X_n}(ω_1, ω_2, …, ω_n) = exp{ jω^T m − (1/2) ω^T K ω },   (6.48b)

where ω = (ω_1, …, ω_n)^T, m is the vector of means, and K is the covariance matrix defined in Eq. (6.42b). Equation (6.48) can be verified by direct integration (see Problem 6.65). We use the approach in [Papoulis] to develop Eq. (6.48) by using the result from Example 6.24 that a linear combination of jointly Gaussian random variables is always Gaussian. Consider the sum

Z = a_1 X_1 + a_2 X_2 + ⋯ + a_n X_n.

The characteristic function of Z is given by

Φ_Z(ω) = E[e^{jωZ}] = E[e^{j(ωa_1 X_1 + ωa_2 X_2 + ⋯ + ωa_n X_n)}] = Φ_{X_1,…,X_n}(a_1ω, a_2ω, …, a_nω).

On the other hand, since Z is a Gaussian random variable with mean and variance given by Eq. (6.47), we have

Φ_Z(ω) = e^{jωE[Z] − (1/2)VAR[Z]ω²} = exp{ jω Σ_{i=1}^{n} a_i m_i − (1/2)ω² Σ_{i=1}^{n} Σ_{k=1}^{n} a_i a_k COV(X_i, X_k) }.

By equating both expressions for Φ_Z(ω) with ω = 1, we finally obtain

Φ_{X_1,…,X_n}(a_1, a_2, …, a_n) = exp{ j Σ_{i=1}^{n} a_i m_i − (1/2) Σ_{i=1}^{n} Σ_{k=1}^{n} a_i a_k COV(X_i, X_k) }   (6.49)

= exp{ ja^T m − (1/2) a^T K a }.   (6.50)

By replacing the a_i’s with ω_i’s we obtain Eq. (6.48). The marginal characteristic function of any subset of the random variables X_1, X_2, …, X_n can be obtained by setting the appropriate ω_i’s to zero. Thus, for example, the marginal characteristic function of X_1, X_2, …, X_m for m < n is obtained by setting ω_{m+1} = ω_{m+2} = ⋯ = ω_n = 0. Note that the resulting characteristic function again corresponds to that of jointly Gaussian random variables with mean and covariance terms corresponding to the reduced set X_1, X_2, …, X_m.

The derivation leading to Eq. (6.50) suggests an alternative definition for jointly Gaussian random vectors:

Definition: X is a jointly Gaussian random vector if and only if every linear combination Z = a^T X is a Gaussian random variable.

In Example 6.24 we showed that if X is a jointly Gaussian random vector then the linear combination Z = a^T X is a Gaussian random variable.
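The mean and variance formulas in Example 6.24 are easy to check numerically. The sketch below is in Python/NumPy rather than the Octave used elsewhere in this chapter, and the particular m, K, and a are arbitrary illustrative choices, not values from the text:

```python
import numpy as np

# Check Eqs. (6.47a)-(6.47b): for Z = a^T X with X jointly Gaussian,
# E[Z] = sum_i a_i E[X_i] and VAR[Z] = sum_ij a_i a_j COV(X_i, X_j).
# The mean vector, covariance matrix, and coefficients are illustrative.
m = np.array([1.0, -2.0, 0.5])
K = np.array([[2.0, 0.8, 0.3],
              [0.8, 1.5, 0.2],
              [0.3, 0.2, 1.0]])   # symmetric, positive definite
a = np.array([0.5, 1.0, -1.5])

mean_Z = a @ m       # Eq. (6.47a)
var_Z = a @ K @ a    # Eq. (6.47b)

# Compare against a Monte Carlo estimate from samples of X.
rng = np.random.default_rng(0)
X = rng.multivariate_normal(m, K, size=200_000)
Z = X @ a
print(mean_Z, var_Z)
print(Z.mean(), Z.var())
```

The sample mean and variance of Z agree with the formulas to within Monte Carlo error, consistent with Z being Gaussian with those parameters.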
Suppose that we do not know the joint pdf of X but we are given that Z = aTX is a Gaussian random variable for any choice of coefficients aT = 1a1 , a2 , Á , an2. This implies that Eqs. (6.48) and (6.49) hold, which together imply Eq. (6.50) which states that X has the characteristic function of a jointly Gaussian random vector. The above definition is slightly broader than the definition using the pdf in Eq. (6.44). The definition based on the pdf requires that the covariance in the exponent be invertible. The above definition leads to the characteristic function of Eq. (6.50) which does not require that the covariance be invertible. Thus the above definition allows for cases where the covariance matrix is not invertible. 6.5 ESTIMATION OF RANDOM VARIABLES In this book we will encounter two basic types of estimation problems. In the first type, we are interested in estimating the parameters of one or more random variables, e.g., probabilities, means, variances, or covariances. In Chapter 1, we stated that relative frequencies can be used to estimate the probabilities of events, and that sample averages can be used to estimate the mean and other moments of a random variable. In Chapters 7 and 8 we will consider this type of estimation further. In this section, we are concerned with the second type of estimation problem, where we are interested in estimating the value of an inaccessible random variable X in terms of the observation of an accessible random variable Y. For example, X could be the input to a communication channel and Y could be the observed output. In a prediction application, X could be a future value of some quantity and Y its present value. 6.5.1 MAP and ML Estimators We have considered estimation problems informally earlier in the book. 
For example, in estimating the input of a discrete communications channel we are interested in finding the most probable input given the observation Y = y, that is, the value of the input x that maximizes P[X = x | Y = y]:

max_x P[X = x | Y = y].

In general we refer to the above estimator for X in terms of Y as the maximum a posteriori (MAP) estimator. The a posteriori probability is given by:

P[X = x | Y = y] = P[Y = y | X = x] P[X = x] / P[Y = y]

and so the MAP estimator requires that we know the a priori probabilities P[X = x]. In some situations we know P[Y = y | X = x] but we do not know the a priori probabilities, so we select the estimator value x as the value that maximizes the likelihood of the observed value Y = y:

max_x P[Y = y | X = x].

We refer to this estimator of X in terms of Y as the maximum likelihood (ML) estimator.

We can define MAP and ML estimators when X and Y are continuous random variables by replacing events of the form {Y = y} by {y < Y < y + dy}. If X and Y are continuous, the MAP estimator for X given the observation Y is given by:

max_x f_X(x | Y = y),

and the ML estimator for X given the observation Y is given by:

max_x f_Y(y | X = x).

Example 6.25 Comparison of ML and MAP Estimators

Let X and Y be the random pair in Example 5.16. Find the MAP and ML estimators for X in terms of Y.

From Example 5.32, the conditional pdf of X given Y is given by:

f_X(x | y) = e^{−(x−y)}  for y ≤ x,

which decreases as x increases beyond y. Therefore the MAP estimator is X̂_MAP = y. On the other hand, the conditional pdf of Y given X is:

f_Y(y | x) = e^{−y} / (1 − e^{−x})  for 0 < y ≤ x.

As x increases beyond y, the denominator becomes larger, so the conditional pdf decreases. Therefore the ML estimator is X̂_ML = y. In this example the ML and MAP estimators agree.

Example 6.26 Jointly Gaussian Random Variables

Find the MAP and ML estimators of X in terms of Y when X and Y are jointly Gaussian random variables.
The conditional pdf of X given Y is given by:

f_X(x | y) = exp{ −(1 / (2(1 − ρ²)σ_X²)) ( x − ρ(σ_X/σ_Y)(y − m_Y) − m_X )² } / √(2πσ_X²(1 − ρ²)),

which is maximized by the value of x for which the exponent is zero. Therefore

X̂_MAP = ρ(σ_X/σ_Y)(y − m_Y) + m_X.

The conditional pdf of Y given X is:

f_Y(y | x) = exp{ −(1 / (2(1 − ρ²)σ_Y²)) ( y − ρ(σ_Y/σ_X)(x − m_X) − m_Y )² } / √(2πσ_Y²(1 − ρ²)),

which is also maximized for the value of x for which the exponent is zero:

0 = y − ρ(σ_Y/σ_X)(x − m_X) − m_Y.

The ML estimator for X given Y = y is then:

X̂_ML = (σ_X / (ρσ_Y))(y − m_Y) + m_X.

Therefore we conclude that X̂_ML ≠ X̂_MAP. In other words, knowledge of the a priori probabilities of X will affect the estimator.

6.5.2 Minimum MSE Linear Estimator

The estimate for X is given by a function of the observation, X̂ = g(Y). In general, the estimation error, X − X̂ = X − g(Y), is nonzero, and there is a cost associated with the error, c(X − g(Y)). We are usually interested in finding the function g(Y) that minimizes the expected value of the cost, E[c(X − g(Y))]. For example, if X and Y are the discrete input and output of a communication channel, and c is zero when X = g(Y) and one otherwise, then the expected value of the cost corresponds to the probability of error, that is, that X ≠ g(Y). When X and Y are continuous random variables, we frequently use the mean square error (MSE) as the cost:

e = E[(X − g(Y))²].

In the remainder of this section we focus on this particular cost function. We first consider the case where g(Y) is constrained to be a linear function of Y, and then consider the case where g(Y) can be any function, whether linear or nonlinear.

First, consider the problem of estimating a random variable X by a constant a so that the mean square error is minimized:

min_a E[(X − a)²] = E[X²] − 2aE[X] + a².   (6.51)

The best a is found by taking the derivative with respect to a, setting the result to zero, and solving for a.
The result is

a* = E[X],   (6.52)

which makes sense since the expected value of X is the center of mass of the pdf. The mean square error for this estimator is equal to E[(X − a*)²] = VAR(X).

Now consider estimating X by a linear function g(Y) = aY + b:

min_{a,b} E[(X − aY − b)²].   (6.53a)

Equation (6.53a) can be viewed as the approximation of X − aY by the constant b. This is the minimization posed in Eq. (6.51), and the best b is

b* = E[X − aY] = E[X] − aE[Y].   (6.53b)

Substitution into Eq. (6.53a) implies that the best a is found by

min_a E[{(X − E[X]) − a(Y − E[Y])}²].

We once again differentiate with respect to a, set the result to zero, and solve for a:

0 = (d/da) E[{(X − E[X]) − a(Y − E[Y])}²]
  = −2E[{(X − E[X]) − a(Y − E[Y])}(Y − E[Y])]
  = −2(COV(X, Y) − a VAR(Y)).   (6.54)

The best coefficient a is found to be

a* = COV(X, Y) / VAR(Y) = ρ_X,Y σ_X / σ_Y,

where σ_Y = √VAR(Y) and σ_X = √VAR(X). Therefore, the minimum mean square error (mmse) linear estimator for X in terms of Y is

X̂ = a*Y + b* = ρ_X,Y σ_X (Y − E[Y])/σ_Y + E[X].   (6.55)

The term (Y − E[Y])/σ_Y is simply a zero-mean, unit-variance version of Y. Thus σ_X(Y − E[Y])/σ_Y is a rescaled version of Y that has the variance of the random variable that is being estimated, namely σ_X². The term E[X] simply ensures that the estimator has the correct mean. The key term in the above estimator is the correlation coefficient: ρ_X,Y specifies the sign and extent of the contribution of σ_X(Y − E[Y])/σ_Y to the estimate. If X and Y are uncorrelated (i.e., ρ_X,Y = 0) then the best estimate for X is its mean, E[X]. On the other hand, if ρ_X,Y = ±1 then the best estimate is equal to ±σ_X(Y − E[Y])/σ_Y + E[X].

We draw our attention to the second equality in Eq. (6.54):

E[{(X − E[X]) − a*(Y − E[Y])}(Y − E[Y])] = 0.   (6.56)

This equation is called the orthogonality condition because it states that the error of the best linear estimator, the quantity inside the braces, is orthogonal to the observation Y − E[Y].
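The linear estimator and the orthogonality condition can be illustrated with a short simulation. The following Python/NumPy sketch (the linear model relating X and Y is an arbitrary illustrative choice, not from the text) builds a* and b* from sample moments and checks that the resulting error is uncorrelated with the observation:

```python
import numpy as np

# Best linear estimator from sample moments: a* = COV(X,Y)/VAR(Y),
# b* = E[X] - a*E[Y]; then verify the orthogonality condition,
# E[{error}(Y - E[Y])] = 0. The model X = 3 + 0.5*Y + noise is illustrative.
rng = np.random.default_rng(1)
Y = rng.normal(1.0, 2.0, size=100_000)
X = 3.0 + 0.5 * Y + rng.normal(0.0, 1.0, size=Y.size)

a_star = np.cov(X, Y)[0, 1] / Y.var()   # a* = COV(X,Y)/VAR(Y)
b_star = X.mean() - a_star * Y.mean()   # b* = E[X] - a*E[Y]
err = X - (a_star * Y + b_star)         # estimation error

print(a_star, b_star)                    # close to the true 0.5 and 3.0
print(np.mean(err * (Y - Y.mean())))     # orthogonality: close to 0
```

The recovered coefficients match the slope and intercept used to generate the data, and the error is numerically uncorrelated with Y − E[Y], as the orthogonality condition requires.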
The orthogonality condition is a fundamental result in mean square estimation. The mean square error of the best linear estimator is

e*_L = E[{(X − E[X]) − a*(Y − E[Y])}²]
    = E[{(X − E[X]) − a*(Y − E[Y])}(X − E[X])] − a*E[{(X − E[X]) − a*(Y − E[Y])}(Y − E[Y])]
    = E[{(X − E[X]) − a*(Y − E[Y])}(X − E[X])]
    = VAR(X) − a* COV(X, Y)
    = VAR(X)(1 − ρ²_X,Y),   (6.57)

where the second equality follows from the orthogonality condition. Note that when |ρ_X,Y| = 1, the mean square error is zero. This implies that P[|X − a*Y − b*| = 0] = P[X = a*Y + b*] = 1, so that X is essentially a linear function of Y.

6.5.3 Minimum MSE Estimator

In general the estimator for X that minimizes the mean square error is a nonlinear function of Y. The estimator g(Y) that best approximates X in the sense of minimizing the mean square error must satisfy

minimize_{g(·)} E[(X − g(Y))²].

The problem can be solved by using conditional expectation:

E[(X − g(Y))²] = E[ E[(X − g(Y))² | Y] ] = ∫_{−∞}^{∞} E[(X − g(Y))² | Y = y] f_Y(y) dy.

The integrand above is positive for all y; therefore, the integral is minimized by minimizing E[(X − g(Y))² | Y = y] for each y. But g(y) is a constant as far as the conditional expectation is concerned, so the problem is equivalent to Eq. (6.51), and the “constant” that minimizes E[(X − g(y))² | Y = y] is

g*(y) = E[X | Y = y].   (6.58)

The function g*(y) = E[X | Y = y] is called the regression curve, which simply traces the conditional expected value of X given the observation Y = y. The mean square error of the best estimator is:

e* = E[(X − g*(Y))²] = ∫ E[(X − E[X | y])² | Y = y] f_Y(y) dy = ∫ VAR[X | Y = y] f_Y(y) dy.

Linear estimators in general are suboptimal and have larger mean square errors.

Example 6.27 Comparison of Linear and Minimum MSE Estimators

Let X and Y be the random pair in Example 5.16. Find the best linear and nonlinear estimators for X in terms of Y, and of Y in terms of X.
Example 5.28 provides the parameters needed for the linear estimator: E[X] = 3/2, E[Y] = 1/2, VAR[X] = 5/4, VAR[Y] = 1/4, and ρ_X,Y = 1/√5. Example 5.32 provides the conditional pdf’s needed to find the nonlinear estimator. The best linear and nonlinear estimators for X in terms of Y are:

X̂ = (1/√5)(√5/2)(Y − 1/2)/(1/2) + 3/2 = Y + 1

E[X | y] = ∫_y^∞ x e^{−(x−y)} dx = y + 1, and so E[X | Y] = Y + 1.

Thus the optimum linear and nonlinear estimators are the same.

The best linear and nonlinear estimators for Y in terms of X are:

Ŷ = (1/√5)(1/2)(X − 3/2)/(√5/2) + 1/2 = (X + 1)/5

E[Y | x] = ∫_0^x y e^{−y}/(1 − e^{−x}) dy = (1 − e^{−x} − x e^{−x})/(1 − e^{−x}) = 1 − x e^{−x}/(1 − e^{−x}).

The optimum linear and nonlinear estimators are not the same in this case. Figure 6.2 compares the two estimators. It can be seen that the linear estimator is close to E[Y | x] for lower values of x, where the joint pdf of X and Y is concentrated, and that it diverges from E[Y | x] for larger values of x.

FIGURE 6.2
Comparison of linear and nonlinear estimators (estimator for Y given x).

Example 6.28

Let X be uniformly distributed in the interval (−1, 1) and let Y = X². Find the best linear estimator for Y in terms of X. Compare its performance to the best estimator.

The mean of X is zero, and its correlation with Y is

E[XY] = E[X·X²] = ∫_{−1}^{1} (x³/2) dx = 0.

Therefore COV(X, Y) = 0 and the best linear estimator for Y is E[Y] by Eq. (6.55). The mean square error of this estimator is VAR(Y) by Eq. (6.57). The best estimator is given by Eq. (6.58):

E[Y | X = x] = E[X² | X = x] = x².

The mean square error of this estimator is E[(Y − g(X))²] = E[(X² − X²)²] = 0. Thus in this problem, the best linear estimator performs poorly while the nonlinear estimator gives the smallest possible mean square error, zero.
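Example 6.28 can be confirmed by simulation. In this Python/NumPy sketch (sample size and seed are arbitrary), X and Y = X² come out uncorrelated, the best linear estimator E[Y] incurs MSE VAR(Y) = E[X⁴] − E[X²]² = 1/5 − 1/9 = 4/45, and the nonlinear estimator E[Y | X] = X² has zero error:

```python
import numpy as np

# X ~ uniform(-1, 1), Y = X^2: X and Y are uncorrelated, so the best
# *linear* estimator of Y from X is the constant E[Y], while the best
# (nonlinear) estimator E[Y | X] = X^2 is exact.
rng = np.random.default_rng(2)
X = rng.uniform(-1.0, 1.0, size=100_000)
Y = X**2

print(np.cov(X, Y)[0, 1])              # ~ 0: uncorrelated
mse_linear = np.mean((Y - Y.mean())**2)  # MSE of constant estimator E[Y]
mse_best = np.mean((Y - X**2)**2)        # MSE of E[Y | X] = X^2
print(mse_linear, 4.0 / 45.0)            # close to VAR(Y) = 4/45
print(mse_best)
```

The sample covariance is near zero and the linear estimator’s MSE matches 4/45 ≈ 0.089, while the nonlinear estimator’s error is exactly zero.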
Example 6.29 Jointly Gaussian Random Variables

Find the minimum mean square error estimator of X in terms of Y when X and Y are jointly Gaussian random variables.

The minimum mean square error estimator is given by the conditional expectation of X given Y. From Eq. (5.63), we see that the conditional expectation of X given Y = y is given by

E[X | Y = y] = E[X] + ρ_X,Y (σ_X/σ_Y)(y − E[Y]).

This is identical to the best linear estimator. Thus for jointly Gaussian random variables the minimum mean square error estimator is linear.

6.5.4 Estimation Using a Vector of Observations

The MAP, ML, and mean square estimators can be extended to the case where a vector of observations is available. Here we focus on mean square estimation. We wish to estimate X by a function g(Y) of a random vector of observations Y = (Y_1, Y_2, …, Y_n)^T so that the mean square error is minimized:

minimize_{g(·)} E[(X − g(Y))²].

To simplify the discussion we will assume that X and the Y_i have zero means. The same derivation that led to Eq. (6.58) leads to the optimum minimum mean square estimator:

g*(y) = E[X | Y = y].   (6.59)

The minimum mean square error is then:

E[(X − g*(Y))²] = ∫ E[(X − E[X | Y])² | Y = y] f_Y(y) dy = ∫ VAR[X | Y = y] f_Y(y) dy.

Now suppose the estimate is a linear function of the observations:

g(Y) = Σ_{k=1}^{n} a_k Y_k = a^T Y.

The mean square error is now:

E[(X − g(Y))²] = E[(X − Σ_{k=1}^{n} a_k Y_k)²].

We take derivatives with respect to a_k and again obtain the orthogonality conditions:

E[(X − Σ_{k=1}^{n} a_k Y_k) Y_j] = 0  for j = 1, …, n.

The orthogonality condition becomes:

E[XY_j] = Σ_{k=1}^{n} a_k E[Y_k Y_j]  for j = 1, …, n.

We obtain a compact expression by introducing matrix notation:

E[XY] = R_Y a,  where a = (a_1, a_2, …, a_n)^T,   (6.60)

E[XY] = (E[XY_1], E[XY_2], …, E[XY_n])^T, and R_Y is the correlation matrix. Assuming R_Y is invertible, the optimum coefficients are:

a = R_Y⁻¹ E[XY].   (6.61a)

We can use the methods from Section 6.3 to invert R_Y. The mean square error of the optimum linear estimator is:

E[(X − a^T Y)²] = E[(X − a^T Y)X] − E[(X − a^T Y)a^T Y] = E[(X − a^T Y)X] = VAR(X) − a^T E[YX].   (6.61b)

Now suppose that X has mean m_X and Y has mean vector m_Y, so our estimator now has the form:

X̂ = g(Y) = Σ_{k=1}^{n} a_k Y_k + b = a^T Y + b.   (6.62)

The same argument that led to Eq. (6.53b) implies that the optimum choice for b is:

b = E[X] − a^T m_Y.

Therefore the optimum linear estimator has the form:

X̂ = g(Y) = a^T (Y − m_Y) + m_X = a^T Z + m_X,

where Z = Y − m_Y is a random vector with zero mean vector. The mean square error for this estimator is:

E[(X − g(Y))²] = E[(X − a^T Z − m_X)²] = E[(W − a^T Z)²],

where W = X − m_X has zero mean. We have reduced the general estimation problem to one with zero-mean random variables, i.e., W and Z, which has the solution given by Eq. (6.61a). Therefore the optimum set of linear predictors is given by:

a = R_Z⁻¹ E[WZ] = K_Y⁻¹ E[(X − m_X)(Y − m_Y)].   (6.63a)

The mean square error is:

E[(X − a^T Y − b)²] = E[(W − a^T Z)W] = VAR(W) − a^T E[WZ] = VAR(X) − a^T E[(X − m_X)(Y − m_Y)].   (6.63b)

This result is of particular importance in the case where X and Y are jointly Gaussian random variables. In Example 6.23 we saw that the conditional expected value of X given Y is a linear function of Y of the form in Eq. (6.62). Therefore in this case the optimum minimum mean square estimator corresponds to the optimum linear estimator.

Example 6.30 Diversity Receiver

A radio receiver has two antennas to receive noisy versions of a signal X. The desired signal X is a Gaussian random variable with zero mean and variance 2. The signals received at the first and second antennas are

Y_1 = X + N_1  and  Y_2 = X + N_2,

where N_1 and N_2 are zero-mean, unit-variance Gaussian random variables. In addition, X, N_1, and N_2 are independent random variables.
Find the optimum mean square error linear estimator for X based on a single antenna signal and the corresponding mean square error. Compare the results to the optimum mean square estimator for X based on both antenna signals Y = (Y_1, Y_2).

Since all random variables have zero mean, we only need the correlation matrix and the cross-correlation vector in Eq. (6.61):

R_Y = [ E[Y_1²]  E[Y_1 Y_2] ;  E[Y_1 Y_2]  E[Y_2²] ]
    = [ E[(X+N_1)²]  E[(X+N_1)(X+N_2)] ;  E[(X+N_1)(X+N_2)]  E[(X+N_2)²] ]
    = [ E[X²]+E[N_1²]  E[X²] ;  E[X²]  E[X²]+E[N_2²] ] = [ 3 2 ; 2 3 ]

and

E[XY] = ( E[XY_1], E[XY_2] )^T = ( E[X²], E[X²] )^T = ( 2, 2 )^T.

The optimum estimator using a single antenna received signal involves solving the 1 × 1 version of the above system:

X̂ = ( E[X²] / (E[X²] + E[N_1²]) ) Y_1 = (2/3) Y_1,

and the associated mean square error is:

VAR(X) − a* COV(Y_1, X) = 2 − (2/3)·2 = 2/3.

The coefficients of the optimum estimator using two antenna signals are:

a = R_Y⁻¹ E[XY] = [ 3 2 ; 2 3 ]⁻¹ (2, 2)^T = (1/5)[ 3 −2 ; −2 3 ](2, 2)^T = (0.4, 0.4)^T,

and the optimum estimator is:

X̂ = 0.4 Y_1 + 0.4 Y_2.

The mean square error for the two-antenna estimator is:

E[(X − a^T Y)²] = VAR(X) − a^T E[YX] = 2 − [0.4, 0.4](2, 2)^T = 0.4.

As expected, the two-antenna system has a smaller mean square error. Note that the receiver adds the two received signals and scales the result by 0.4. The sum of the signals is:

X̂ = 0.4 Y_1 + 0.4 Y_2 = 0.4(2X + N_1 + N_2) = 0.8( X + (N_1 + N_2)/2 ),

so combining the signals keeps the desired signal portion, X, constant while averaging the two noise signals N_1 and N_2. The problems at the end of the chapter explore this topic further.

Example 6.31 Second-Order Prediction of Speech

Let X_1, X_2, … be a sequence of samples of a speech voltage waveform, and suppose that the samples are fed into the second-order predictor shown in Fig. 6.3.
Find the set of predictor coefficients a and b that minimize the mean square value of the predictor error when X_n is estimated by aX_{n−2} + bX_{n−1}.

We find the best predictor for X_1, X_2, and X_3 and assume that the situation is identical for X_2, X_3, and X_4, and so on. It is common practice to model speech samples as having zero mean and variance σ², and a covariance that does not depend on the specific index of the samples, but rather on the separation between them:

COV(X_j, X_k) = r_{|j−k|} σ².

The equation for the optimum linear predictor coefficients becomes

σ² [ 1  r_1 ; r_1  1 ] (a, b)^T = σ² (r_2, r_1)^T.

Equation (6.61a) gives

a = (r_2 − r_1²)/(1 − r_1²)  and  b = r_1(1 − r_2)/(1 − r_1²).

FIGURE 6.3
A two-tap linear predictor for processing speech: X̂_n = aX_{n−2} + bX_{n−1}, with prediction error E_n = X_n − X̂_n.

In Problem 6.78, you are asked to show that the mean square error using the above values of a and b is

σ² { 1 − r_1² − (r_1² − r_2)²/(1 − r_1²) }.   (6.64)

Typical values for speech signals are r_1 = .825 and r_2 = .562. The mean square value of the predictor output is then .281σ². The lower variance of the output (.281σ²) relative to the input variance (σ²) shows that the linear predictor is effective in anticipating the next sample in terms of the two previous samples. The order of the predictor can be increased by using more terms in the linear predictor. Thus a third-order predictor has three terms and involves inverting a 3 × 3 correlation matrix, and an n-th order predictor will involve an n × n matrix. Linear predictive techniques are used extensively in speech, audio, image, and video compression systems. We discuss linear prediction methods in greater detail in Chapter 10.

*6.6 GENERATING CORRELATED VECTOR RANDOM VARIABLES

Many applications involve vectors or sequences of correlated random variables. Computer simulation models of such applications therefore require methods for generating such random variables.
In this section we present methods for generating vectors of random variables with specified covariance matrices. We also discuss the generation of jointly Gaussian vector random variables.

6.6.1 Generating Random Vectors with Specified Covariance Matrix

Suppose we wish to generate a random vector Y with an arbitrary valid covariance matrix K_Y. Let Y = A^T X as in Example 6.17, where X is a vector random variable with components that are uncorrelated, zero mean, and unit variance. X has covariance matrix equal to the identity matrix, K_X = I, so m_Y = A^T m_X = 0 and K_Y = A^T K_X A = A^T A.

Let P be the matrix whose columns are the eigenvectors of K_Y and let Λ be the diagonal matrix of eigenvalues; then from Eq. (6.39b) we have:

P^T K_Y P = P^T P Λ = Λ.

If we premultiply the above equation by P and then postmultiply by P^T, we obtain an expression for an arbitrary covariance matrix K_Y in terms of its eigenvalues and eigenvectors:

P Λ P^T = P P^T K_Y P P^T = K_Y.   (6.65)

Define the matrix Λ^{1/2} as the diagonal matrix of square roots of the eigenvalues:

Λ^{1/2} = diag( √λ_1, √λ_2, …, √λ_n ).

In Problem 6.53 we show that any covariance matrix K_Y is positive semi-definite, which implies that it has nonnegative eigenvalues, and so taking the square root is always possible. If we now let

A = (P Λ^{1/2})^T,   (6.66)

then

A^T A = P Λ^{1/2} Λ^{1/2} P^T = P Λ P^T = K_Y.

Therefore Y has the desired covariance matrix K_Y.

Example 6.32

Let X = (X_1, X_2) consist of two zero-mean, unit-variance, uncorrelated random variables. Find the matrix A such that Y = AX has covariance matrix

K = [ 4 2 ; 2 4 ].

First we need to find the eigenvalues of K, which are determined from the following equation:

det(K − λI) = det[ 4−λ  2 ; 2  4−λ ] = (4 − λ)² − 4 = λ² − 8λ + 12 = (λ − 6)(λ − 2) = 0.

We find the eigenvalues to be λ_1 = 2 and λ_2 = 6.
Next we need to find the eigenvectors corresponding to each eigenvalue:

[ 4 2 ; 2 4 ] (e_1, e_2)^T = λ_1 (e_1, e_2)^T = 2 (e_1, e_2)^T,

which implies that 2e_1 + 2e_2 = 0. Thus any vector of the form (1, −1)^T is an eigenvector. We choose the normalized eigenvector corresponding to λ_1 = 2 as e_1 = (1/√2, −1/√2)^T. We similarly find the eigenvector corresponding to λ_2 = 6 as e_2 = (1/√2, 1/√2)^T. The method developed in Section 6.3 requires that we form the matrix P whose columns consist of the eigenvectors of K:

P = (1/√2) [ 1 1 ; −1 1 ].

Next it requires that we form the diagonal matrix with elements equal to the square roots of the eigenvalues:

Λ^{1/2} = [ √2 0 ; 0 √6 ].

The desired matrix is then

A = P Λ^{1/2} = [ 1 √3 ; −1 √3 ].

You should verify that K = AA^T.

Example 6.33

Use Octave to find the eigenvalues and eigenvectors calculated in the previous example.

After entering the matrix K, we use the eig(K) function to find the matrix of eigenvectors P and the diagonal matrix of eigenvalues D. We then find A and its transpose A^T. Finally we confirm that A^T A gives the desired covariance matrix.

> K=[4, 2; 2, 4];
> [P,D]=eig(K)
P =
  -0.70711   0.70711
   0.70711   0.70711
D =
   2   0
   0   6
> A=(P*sqrt(D))'
A =
  -1.0000   1.0000
   1.7321   1.7321
> A'
ans =
  -1.0000   1.7321
   1.0000   1.7321
> A'*A
ans =
   4.0000   2.0000
   2.0000   4.0000

The above steps can be used to find the transformation A^T for any desired covariance matrix K. The only check required is to ascertain that K is a valid covariance matrix: (1) K is symmetric (trivial); (2) K has positive eigenvalues (easy to check numerically).

6.6.2 Generating Vectors of Jointly Gaussian Random Variables

In Section 6.4 we found that if X is a vector of jointly Gaussian random variables with covariance K_X, then Y = AX is also jointly Gaussian with covariance matrix K_Y = A K_X A^T. If we assume that X consists of unit-variance, uncorrelated random variables, then K_X = I, the identity matrix, and therefore K_Y = AA^T.
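The factorization K_Y = AA^T above can also be reproduced numerically. This Python/NumPy sketch (an alternative to the chapter's Octave sessions) builds A from the eigendecomposition of the K in Example 6.32 and checks both the factorization and the sample covariance of generated vectors:

```python
import numpy as np

# Factor the target covariance K as A A^T via K = P Lambda P^T
# (Example 6.32: K = [[4, 2], [2, 4]]), then map iid standard
# Gaussians X through Y = A X + m to get the desired covariance.
K = np.array([[4.0, 2.0],
              [2.0, 4.0]])
eigvals, P = np.linalg.eigh(K)        # columns of P are eigenvectors
A = P @ np.diag(np.sqrt(eigvals))     # A = P Lambda^{1/2}, so A A^T = K

print(np.round(A @ A.T, 6))           # recovers K

rng = np.random.default_rng(3)
X = rng.standard_normal((2, 100_000)) # iid zero-mean, unit-variance
m = np.array([[0.0], [0.0]])          # desired mean vector
Y = A @ X + m
print(np.round(np.cov(Y), 2))         # sample covariance close to K
```

Note that eigh may order eigenvalues and choose eigenvector signs differently from the hand calculation; any such A still satisfies AA^T = K.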
We can use the method from the first part of this section to find A for any desired covariance matrix K_Y. We generate jointly Gaussian random vectors Y with arbitrary covariance matrix K_Y and mean vector m_Y as follows:

1. Find a matrix A such that K_Y = AA^T.
2. Use the method from Section 5.10 to generate X consisting of n independent, zero-mean, unit-variance Gaussian random variables.
3. Let Y = AX + m_Y.

Example 6.34

The Octave commands below show the steps needed to generate Gaussian random variables with the covariance matrix from Example 6.32.

> U1=rand(1000, 1);      % Create a 1000-element vector U1.
> U2=rand(1000, 1);      % Create a 1000-element vector U2.
> R2=-2*log(U1);         % Find R^2.
> TH=2*pi*U2;            % Find Theta.
> X1=sqrt(R2).*sin(TH);  % Generate X1.
> X2=sqrt(R2).*cos(TH);  % Generate X2.
> Y1=X1+sqrt(3)*X2;      % Generate Y1.
> Y2=-X1+sqrt(3)*X2;     % Generate Y2.
> plot(Y1,Y2,'+')        % Plot scattergram.

We plotted the Y1 values vs. the Y2 values for 1000 pairs of generated random variables in a scattergram, as shown in Fig. 6.4. Good agreement with the elliptical symmetry of the desired jointly Gaussian pdf is observed.

FIGURE 6.4
Scattergram of jointly Gaussian random variables.

SUMMARY

• The joint statistical behavior of a vector of random variables X is specified by the joint cumulative distribution function, the joint probability mass function, or the joint probability density function. The probability of any event involving the joint behavior of these random variables can be computed from these functions.
• The statistical behavior of subsets of random variables from a vector X is specified by the marginal cdf, marginal pdf, or marginal pmf that can be obtained from the joint cdf, joint pdf, or joint pmf of X.
• A set of random variables is independent if the probability of a product-form event is equal to the product of the probabilities of the component events.
Equivalent conditions for the independence of a set of random variables are that the joint cdf, joint pdf, or joint pmf factors into the product of the corresponding marginal functions. • The statistical behavior of a subset of random variables from a vector X, given the exact values of the other random variables in the vector, is specified by the conditional cdf, conditional pmf, or conditional pdf. Many problems naturally lend themselves to a solution that involves conditioning on the values of some of the random variables. In these problems, the expected value of random variables can be obtained through the use of conditional expectation. • The mean vector and the covariance matrix provide summary information about a vector random variable. The joint characteristic function contains all of the information provided by the joint pdf. • Transformations of vector random variables generate other vector random variables. Standard methods are available for finding the joint distributions of the new random vectors. • The orthogonality condition provides a set of linear equations for finding the minimum mean square linear estimate. The best mean square estimator is given by the conditional expected value. • The joint pdf of a vector X of jointly Gaussian random variables is determined by the vector of the means and by the covariance matrix. All marginal pdf’s and conditional pdf’s of subsets of X have Gaussian pdf’s. Any linear function or linear transformation of jointly Gaussian random variables will result in a set of jointly Gaussian random variables. • A vector of random variables with an arbitrary covariance matrix can be generated by taking a linear transformation of a vector of unit-variance, uncorrelated random variables. A vector of Gaussian random variables with an arbitrary covariance matrix can be generated by taking a linear transformation of a vector of independent, unit-variance jointly Gaussian random variables. 
CHECKLIST OF IMPORTANT TERMS

Conditional cdf, Conditional expectation, Conditional pdf, Conditional pmf, Correlation matrix, Covariance matrix, Independent random variables, Jacobian of a transformation, Joint cdf, Joint characteristic function, Joint pdf, Joint pmf, Jointly continuous random variables, Jointly Gaussian random variables, Karhunen-Loeve expansion, MAP estimator, Marginal cdf, Marginal pdf, Marginal pmf, Maximum likelihood estimator, Mean square error, Mean vector, MMSE linear estimator, Orthogonality condition, Product-form event, Regression curve, Vector random variables

ANNOTATED REFERENCES

Reference [3] provides excellent coverage of linear transformations and jointly Gaussian random variables. Reference [5] provides excellent coverage of vector random variables. The book by Anton [6] provides an accessible introduction to linear algebra.

1. A. Papoulis and S. Pillai, Probability, Random Variables, and Stochastic Processes, McGraw-Hill, New York, 2002.
2. N. Johnson et al., Continuous Multivariate Distributions, Wiley, New York, 2000.
3. H. Cramer, Mathematical Methods of Statistics, Princeton Press, 1999.
4. R. Gray and L. D. Davisson, An Introduction to Statistical Signal Processing, Cambridge Univ. Press, Cambridge, UK, 2005.
5. H. Stark and J. W. Woods, Probability, Random Processes, and Estimation Theory for Engineers, Prentice Hall, Englewood Cliffs, N.J., 1986.
6. H. Anton, Elementary Linear Algebra, 9th ed., Wiley, New York, 2005.
7. C. H. Edwards, Jr., and D. E. Penney, Calculus and Analytic Geometry, 4th ed., Prentice Hall, Englewood Cliffs, N.J., 1984.

PROBLEMS

Section 6.1: Vector Random Variables

6.1. The point X = (X, Y, Z) is uniformly distributed inside a sphere of radius 1 about the origin. Find the probability of the following events:
(a) X is inside a sphere of radius r, r > 0.
(b) X is inside a cube of side 2/√3 centered about the origin.
(c) All components of X are positive.
(d) Z is negative.
6.2. A random sinusoid signal is given by X(t) = A sin(t), where A is a uniform random variable in the interval [0, 1]. Let X = (X(t1), X(t2), X(t3)) be samples of the signal taken at times t1, t2, and t3.
(a) Find the joint cdf of X in terms of the cdf of A if t1 = 0, t2 = π/2, and t3 = π. Are X(t1), X(t2), X(t3) independent random variables?
(b) Find the joint cdf of X for t1, t2 = t1 + π/2, and t3 = t1 + π. Let t1 = π/6.
6.3. Let the random variables X, Y, and Z be independent random variables. Find the following probabilities in terms of FX(x), FY(y), and FZ(z).
(a) P[|X| < 5, Y < 4, Z³ > 8].
(b) P[X = 5, Y < 0, Z > 1].
(c) P[min(X, Y, Z) < 2].
(d) P[max(X, Y, Z) > 6].
6.4. A radio transmitter sends a signal s > 0 to a receiver using three paths. The signals that arrive at the receiver along each path are X1 = s + N1, X2 = s + N2, and X3 = s + N3, where N1, N2, and N3 are independent Gaussian random variables with zero mean and unit variance.
(a) Find the joint pdf of X = (X1, X2, X3). Are X1, X2, and X3 independent random variables?
(b) Find the probability that the minimum of all three signals is positive.
(c) Find the probability that a majority of the signals are positive.
6.5. An urn contains one black ball and two white balls. Three balls are drawn from the urn. Let Ik = 1 if the outcome of the kth draw is the black ball and let Ik = 0 otherwise. Define the following three random variables: X = I1 + I2 + I3, Y = min{I1, I2, I3}, Z = max{I1, I2, I3}.
(a) Specify the range of values of the triplet (X, Y, Z) if each ball is put back into the urn after each draw; find the joint pmf for (X, Y, Z).
(b) In part a, are X, Y, and Z independent? Are X and Y independent?
(c) Repeat part a if each ball is not put back into the urn after each draw.
6.6. Consider the packet switch in Example 6.1. Suppose that each input has one packet with probability p and no packets with probability 1 − p.
Packets are equally likely to be destined to each of the outputs. Let X1, X2, and X3 be the number of packet arrivals destined for outputs 1, 2, and 3, respectively.
(a) Find the joint pmf of X1, X2, and X3. Hint: Imagine that every input has a packet go to a fictional port 4 with probability 1 − p.
(b) Find the joint pmf of X1 and X2.
(c) Find the pmf of X2.
(d) Are X1, X2, and X3 independent random variables?
(e) Suppose that each output will accept at most one packet and discard all additional packets destined to it. Find the average number of packets discarded by the module in each T-second period.
6.7. Let X, Y, Z have joint pdf fX,Y,Z(x, y, z) = k(x + y + z) for 0 ≤ x ≤ 1, 0 ≤ y ≤ 1, 0 ≤ z ≤ 1.
(a) Find k.
(b) Find fX(x | y, z) and fZ(z | x, y).
(c) Find fX(x), fY(y), and fZ(z).
6.8. A point X = (X, Y, Z) is selected at random inside the unit sphere.
(a) Find the marginal joint pdf of Y and Z.
(b) Find the marginal pdf of Y.
(c) Find the conditional joint pdf of X and Y given Z.
(d) Are X, Y, and Z independent random variables?
(e) Find the joint pdf of X given that the distance from X to the origin is greater than 1/2 and all the components of X are positive.
6.9. Show that pX1,X2,X3(x1, x2, x3) = pX3(x3 | x1, x2) pX2(x2 | x1) pX1(x1).
6.10. Let X1, X2, …, Xn be binary random variables taking on values 0 or 1 to denote whether a speaker is silent (0) or active (1). A silent speaker remains idle at the next time slot with probability 3/4, and an active speaker remains active with probability 1/2. Find the joint pmf for X1, X2, X3, and the marginal pmf of X3. Assume that the speaker begins in the silent state.
6.11. Show that fX,Y,Z(x, y, z) = fZ(z | x, y) fY(y | x) fX(x).
6.12. Let U1, U2, and U3 be independent random variables and let X = U1, Y = U1 + U2, and Z = U1 + U2 + U3.
(a) Use the result in Problem 6.11 to find the joint pdf of X, Y, and Z.
(b) Let the Ui be independent uniform random variables in the interval [0, 1]. Find the marginal joint pdf of Y and Z. Find the marginal pdf of Z.
(c) Let the Ui be independent zero-mean, unit-variance Gaussian random variables. Find the marginal joint pdf of Y and Z. Find the marginal pdf of Z.
6.13. Let X1, X2, and X3 be the multiplicative sequence in Example 6.7.
(a) Find, plot, and compare the marginal pdfs of X1, X2, and X3.
(b) Find the conditional pdf of X3 given X1 = x.
(c) Find the conditional pdf of X1 given X3 = z.
6.14. Requests at an online music site are categorized as follows: requests for the most popular title with p1 = 1/2; second most popular title with p2 = 1/4; third most popular title with p3 = 1/8; and other with p4 = 1 − p1 − p2 − p3 = 1/8. Suppose there are a total of n requests in T seconds. Let Xk be the number of times category k occurs.
(a) Find the joint pmf of (X1, X2, X3).
(b) Find the marginal pmf of (X1, X2). Hint: Use the binomial theorem.
(c) Find the marginal pmf of X1.
(d) Find the conditional joint pmf of (X2, X3) given X1 = m, where 0 ≤ m ≤ n.
6.15. The number N of requests at the online music site in Problem 6.14 is a Poisson random variable with mean a customers per second. Let Xk be the number of type k requests in T seconds. Find the joint pmf of (X1, X2, X3, X4).
6.16. A random experiment has four possible outcomes. Suppose that the experiment is repeated n independent times and let Xk be the number of times outcome k occurs. The joint pmf of (X1, X2, X3) is given by

p(k1, k2, k3) = (n + 3 choose 3)⁻¹ = n! 3!/(n + 3)!   for 0 ≤ ki and k1 + k2 + k3 ≤ n.

(a) Find the marginal pmf of (X1, X2).
(b) Find the marginal pmf of X1.
(c) Find the conditional joint pmf of (X2, X3) given X1 = m, where 0 ≤ m ≤ n.
6.17. The numbers of requests of types 1, 2, and 3, respectively, arriving at a service station in t seconds are independent Poisson random variables with means λ1t, λ2t, and λ3t.
Let N1, N2, and N3 be the number of requests that arrive during an exponentially distributed time T with mean at.
(a) Find the joint pmf of N1, N2, and N3.
(b) Find the marginal pmf of N1.
(c) Find the conditional pmf of N1 and N2, given N3.

Section 6.2: Functions of Several Random Variables

6.18. N devices are installed at the same time. Let Y be the time until the first device fails.
(a) Find the pdf of Y if the lifetimes of the devices are independent and have the same Pareto distribution.
(b) Repeat part a if the device lifetimes have a Weibull distribution.
6.19. In Problem 6.18 let Ik(t) be the indicator function for the event "kth device is still working at time t." Let N(t) be the number of devices still working at time t: N(t) = I1(t) + I2(t) + … + IN(t). Find the pmf of N(t) as well as its mean and variance.
6.20. A diversity receiver receives N independent versions of a signal. Each signal version has an amplitude Xk that is Rayleigh distributed. The receiver selects the signal with the largest squared amplitude Xk². A signal is not useful if the squared amplitude falls below a threshold γ. Find the probability that all N signals are below the threshold.
6.21. (Haykin) A receiver in a multiuser communication system accepts K binary signals from K independent transmitters: Y = (Y1, Y2, …, YK), where Yk is the received signal from the kth transmitter. In an ideal system the received vector is given by Y = Ab + N, where A = [ak] is a diagonal matrix of positive channel gains, b = (b1, b2, …, bK) is the vector of bits from each of the transmitters, where bk = ±1, and N is a vector of K independent zero-mean, unit-variance Gaussian random variables.
(a) Find the joint pdf of Y.
(b) Suppose b = (1, 1, …, 1); find the probability that all components of Y are positive.
6.22. (a) Find the joint pdf of U = X1, V = X1 + X2, and W = X1 + X2 + X3.
(b) Evaluate the joint pdf of (U, V, W) if the Xi are independent zero-mean, unit-variance Gaussian random variables.
(c) Find the marginal pdf of V and of W.
6.23. (a) Find the joint pdf of the sample mean and variance of two random variables:

M = (X1 + X2)/2    V = [(X1 − M)² + (X2 − M)²]/2

in terms of the joint pdf of X1 and X2.
(b) Evaluate the joint pdf if X1 and X2 are independent Gaussian random variables with the same mean 1 and variance 1.
(c) Evaluate the joint pdf if X1 and X2 are independent exponential random variables with the same parameter 1.
6.24. (a) Use the auxiliary variable method to find the pdf of Z = X/(X + Y).
(b) Find the pdf of Z if X and Y are independent exponential random variables with parameter 1.
(c) Repeat part b if X and Y are independent Pareto random variables with parameters k = 2 and xm = 1.
6.25. Repeat Problem 6.24 parts a and b for Z = X/Y.
6.26. Let X and Y be zero-mean, unit-variance Gaussian random variables with correlation coefficient 1/2. Find the joint pdf of U = X² and V = Y⁴.
6.27. Use auxiliary variables to find the pdf of Z = X1X2X3, where the Xi are independent random variables that are uniformly distributed in [0, 1].
6.28. Let X, Y, and Z be independent zero-mean, unit-variance Gaussian random variables.
(a) Find the pdf of R = (X² + Y² + Z²)^(1/2).
(b) Find the pdf of R² = X² + Y² + Z².
6.29. Let X1, X2, X3, X4 be processed as follows: Y1 = X1, Y2 = X1 + X2, Y3 = X2 + X3, Y4 = X3 + X4.
(a) Find an expression for the joint pdf of Y = (Y1, Y2, Y3, Y4) in terms of the joint pdf of X = (X1, X2, X3, X4).
(b) Find the joint pdf of Y if X1, X2, X3, X4 are independent zero-mean, unit-variance Gaussian random variables.

Section 6.3: Expected Values of Vector Random Variables

6.30. Find E[M], E[V], and E[MV] in Problem 6.23c.
6.31. Compute E[Z] in Problem 6.27 in two ways: (a) by integrating over fZ(z); (b) by integrating over the joint pdf of (X1, X2, X3).
6.32.
Find the mean vector and covariance matrix for the three multipath signals X = (X1, X2, X3) in Problem 6.4.
6.33. Find the mean vector and covariance matrix for the samples of the sinusoidal signal X = (X(t1), X(t2), X(t3)) in Problem 6.2.
6.34. (a) Find the mean vector and covariance matrix for (X, Y, Z) in Problem 6.5a.
(b) Repeat part a for Problem 6.5c.
6.35. Find the mean vector and covariance matrix for (X, Y, Z) in Problem 6.7.
6.36. Find the mean vector and covariance matrix for the point (X, Y, Z) inside the unit sphere in Problem 6.8.
6.37. (a) Use the results of Problem 6.6c to find the mean vector for the packet arrivals X1, X2, and X3 in Example 6.5.
(b) Use the results of Problem 6.6b to find the covariance matrix.
(c) Explain why X1, X2, and X3 are correlated.
6.38. Find the mean vector and covariance matrix for the joint number of packet arrivals in a random time, N1, N2, and N3, in Problem 6.17. Hint: Use conditional expectation.
6.39. (a) Find the mean vector and covariance matrix of (U, V, W) in terms of those of (X1, X2, X3) in Problem 6.22b.
(b) Find the cross-covariance matrix between (U, V, W) and (X1, X2, X3).
6.40. (a) Find the mean vector and covariance matrix of Y = (Y1, Y2, Y3, Y4) in terms of those of X = (X1, X2, X3, X4) in Problem 6.29.
(b) Find the cross-covariance matrix between Y and X.
(c) Evaluate the mean vector, covariance, and cross-covariance matrices if X1, X2, X3, X4 are independent random variables.
(d) Generalize the results in part c to Y = (Y1, Y2, …, Yn−1, Yn).
6.41. Let X = (X1, X2, X3, X4) consist of equal-mean, independent, unit-variance random variables. Find the mean vector, covariance, and cross-covariance matrices of Y = AX:

(a) A = [ 1  1/2  1/4  1/8 ]
        [ 0   1   1/2  1/4 ]
        [ 0   0    1   1/2 ]
        [ 0   0    0    1  ]

(b) A = [ 1   1   1   1 ]
        [ 1  −1   1  −1 ]
        [ 1   1  −1  −1 ]
        [ 1  −1  −1   1 ]

6.42. Let W = aX + bY + c, where X and Y are random variables.
(a) Find the characteristic function of W in terms of the joint characteristic function of X and Y.
(b) Find the characteristic function of W if X and Y are the random variables discussed in Example 6.19. Find the pdf of W.
6.43. (a) Find the joint characteristic function of the jointly Gaussian random variables X and Y introduced in Example 5.45. Hint: Consider X and Y as a transformation of the independent Gaussian random variables V and W.
(b) Find E[X²Y].
(c) Find the joint characteristic function of X′ = X + a and Y′ = Y + b.
6.44. Let X = aU + bV and Y = cU + dV, where |ad − bc| ≠ 0.
(a) Find the joint characteristic function of X and Y in terms of the joint characteristic function of U and V.
(b) Find an expression for E[XY] in terms of joint moments of U and V.
6.45. Let X and Y be nonnegative, integer-valued random variables. The joint probability generating function is defined by

GX,Y(z1, z2) = E[z1^X z2^Y] = Σ_{j=0}^∞ Σ_{k=0}^∞ z1^j z2^k P[X = j, Y = k].

(a) Find the joint pgf for two independent Poisson random variables with parameters a1 and a2.
(b) Find the joint pgf for two independent binomial random variables with parameters (n, p) and (m, p).
6.46. Suppose that X and Y have joint pgf

GX,Y(z1, z2) = exp{a1(z1 − 1) + a2(z2 − 1) + b(z1z2 − 1)}.

(a) Use the marginal pgf's to show that X and Y are Poisson random variables.
(b) Find the pgf of Z = X + Y. Is Z a Poisson random variable?
6.47. Let X and Y be trinomial random variables with joint pmf

P[X = j, Y = k] = [n!/(j! k! (n − j − k)!)] p1^j p2^k (1 − p1 − p2)^(n−j−k)   for 0 ≤ j, k and j + k ≤ n.

(a) Find the joint pgf of X and Y.
(b) Find the correlation and covariance of X and Y.
6.48. Find the mean vector and covariance matrix for (X, Y) in Problem 6.46.
6.49. Find the mean vector and covariance matrix for (X, Y) in Problem 6.47.
6.50. Let X = (X1, X2) have covariance matrix:

KX = [  1   1/4 ]
     [ 1/4   1  ]

(a) Find the eigenvalues and eigenvectors of KX.
(b) Find the orthogonal matrix P that diagonalizes KX. Verify that P is orthogonal and that PᵀKXP = Λ.
(c) Express X in terms of the eigenvectors of KX using the Karhunen-Loeve expansion.
6.51. Repeat Problem 6.50 for X = (X1, X2, X3) with covariance matrix:

KX = [   1   −1/2  −1/2 ]
     [ −1/2    1   −1/2 ]
     [ −1/2  −1/2    1  ]

6.52. A square matrix A is said to be nonnegative definite if for any vector a = (a1, a2, …, an)ᵀ: aᵀA a ≥ 0. Show that the covariance matrix is nonnegative definite. Hint: Use the fact that E[(aᵀ(X − mX))²] ≥ 0.
6.53. A is positive definite if for any nonzero vector a = (a1, a2, …, an)ᵀ: aᵀA a > 0.
(a) Show that if all the eigenvalues are positive, then KX is positive definite. Hint: Let b = Pᵀa.
(b) Show that if KX is positive definite, then all the eigenvalues are positive. Hint: Let a be an eigenvector of KX.

Section 6.4: Jointly Gaussian Random Vectors

6.54. Let X = (X1, X2) be jointly Gaussian random variables with mean vector and covariance matrix given by:

mX = (1, 0)ᵀ,   KX = [  3/2  −1/2 ]
                     [ −1/2   3/2 ]

(a) Find the pdf of X in matrix notation.
(b) Find the pdf of X using the quadratic expression in the exponent.
(c) Find the marginal pdfs of X1 and X2.
(d) Find a transformation A such that the vector Y = AX consists of independent Gaussian random variables.
(e) Find the joint pdf of Y.
6.55. Let X = (X1, X2, X3) be jointly Gaussian random variables with mean vector and covariance matrix given by:

mX = (1, 0, 2)ᵀ,   KX = [ 3/2   0   1/2 ]
                        [  0    1    0  ]
                        [ 1/2   0   3/2 ]

(a) Find the pdf of X in matrix notation.
(b) Find the pdf of X using the quadratic expression in the exponent.
(c) Find the marginal pdfs of X1, X2, and X3.
(d) Find a transformation A such that the vector Y = AX consists of independent Gaussian random variables.
(e) Find the joint pdf of Y.
6.56. Let U1, U2, and U3 be independent zero-mean, unit-variance Gaussian random variables and let X = U1, Y = U1 + U2, and Z = U1 + U2 + U3.
(a) Find the covariance matrix of (X, Y, Z).
(b) Find the joint pdf of (X, Y, Z).
(c) Find the conditional pdf of Y and Z given X.
(d) Find the conditional pdf of Z given X and Y.
6.57. Let X1, X2, X3, X4 be independent zero-mean, unit-variance Gaussian random variables that are processed as follows: Y1 = X1 + X2, Y2 = X2 + X3, Y3 = X3 + X4.
(a) Find the covariance matrix of Y = (Y1, Y2, Y3).
(b) Find the joint pdf of Y.
(c) Find the joint pdf of Y1 and Y2; Y1 and Y3.
(d) Find a transformation A such that the vector Z = AY consists of independent Gaussian random variables.
6.58. A more realistic model of the receiver in the multiuser communication system in Problem 6.21 has the K received signals Y = (Y1, Y2, …, YK) given by Y = ARb + N, where A = [ak] is a diagonal matrix of positive channel gains, R is a symmetric matrix that accounts for the interference between users, and b = (b1, b2, …, bK) is the vector of bits from each of the transmitters. N is the vector of K independent zero-mean, unit-variance Gaussian noise random variables.
(a) Find the joint pdf of Y.
(b) Suppose that in order to recover b, the receiver computes Z = (AR)⁻¹Y. Find the joint pdf of Z.
6.59. (a) Let K3 be the covariance matrix in Problem 6.55. Find the corresponding Q2 and Q3 in Example 6.23.
(b) Find the conditional pdf of X3 given X1 and X2.
6.60. In Example 6.23, show that:

(1/2)(xn − mn)ᵀQn(xn − mn) − (1/2)(xn−1 − mn−1)ᵀQn−1(xn−1 − mn−1) = Qnn{(xn − mn) + B}² − QnnB²,

where B = (1/Qnn) Σ_{j=1}^{n−1} Qjn(xj − mj) and |Kn|/|Kn−1| = Qnn.

6.61. Find the pdf of the sum of Gaussian random variables in the following cases:
(a) Z = X1 + X2 + X3 in Problem 6.55.
(b) W = X + Y + Z in Problem 6.56.
(c) Z = Y1 + Y2 + Y3 in Problem 6.57.
6.62. Find the joint characteristic function of the jointly Gaussian random vector X in Problem 6.54.
6.63. Suppose that a jointly Gaussian random vector X has zero mean vector and the covariance matrix given in Problem 6.51.
(a) Find the joint characteristic function.
(b) Can you obtain an expression for the joint pdf?
Explain your answer.
6.64. Let X and Y be jointly Gaussian random variables. Derive the joint characteristic function for X and Y using conditional expectation.
6.65. Let X = (X1, X2, …, Xn) be jointly Gaussian random variables. Derive the characteristic function for X by carrying out the integral in Eq. (6.32). Hint: You will need to complete the square as follows:

(x − jKv)ᵀK⁻¹(x − jKv) = xᵀK⁻¹x − 2jxᵀv + j²vᵀKv.

6.66. Find E[X²Y²] for jointly Gaussian random variables from the characteristic function.
6.67. Let X = (X1, X2, X3, X4) be zero-mean jointly Gaussian random variables. Show that

E[X1X2X3X4] = E[X1X2]E[X3X4] + E[X1X3]E[X2X4] + E[X1X4]E[X2X3].

Section 6.5: Mean Square Estimation

6.68. Let X and Y be discrete random variables with three possible joint pmf's:

(i)
X\Y    −1    0    1
−1    1/6  1/6   0
 0     0    0   1/3
 1    1/6  1/6   0

(ii)
X\Y    −1    0    1
−1    1/9  1/9  1/9
 0    1/9  1/9  1/9
 1    1/9  1/9  1/9

(iii)
X\Y    −1    0    1
−1    1/3   0    0
 0     0   1/3   0
 1     0    0   1/3

(a) Find the minimum mean square error linear estimator for Y given X.
(b) Find the minimum mean square error estimator for Y given X.
(c) Find the MAP and ML estimators for Y given X.
(d) Compare the mean square error of the estimators in parts a, b, and c.
6.69. Repeat Problem 6.68 for the continuous random variables X and Y in Problem 5.26.
6.70. Find the ML estimator for the signal s in Problem 6.4.
6.71. Let N1 be the number of Web page requests arriving at a server in the period (0, 100) ms and let N2 be the total combined number of Web page requests arriving at a server in the period (0, 200) ms. Assume page requests occur every 1-ms interval according to independent Bernoulli trials with probability of success p.
(a) Find the minimum linear mean square estimator for N2 given N1 and the associated mean square error.
(b) Find the minimum mean square error estimator for N2 given N1 and the associated mean square error.
(c) Find the maximum a posteriori estimator for N2 given N1.
(d) Repeat parts a, b, and c for the estimation of N1 given N2.
6.72. Let Y = X + N, where X and N are independent Gaussian random variables with different variances and N is zero mean.
(a) Plot the correlation coefficient between the "observed signal" Y and the "desired signal" X as a function of the signal-to-noise ratio σX/σN.
(b) Find the minimum mean square error estimator for X given Y.
(c) Find the MAP and ML estimators for X given Y.
(d) Compare the mean square error of the estimators in parts a, b, and c.
6.73. Let X, Y, Z be the random variables in Problem 6.7.
(a) Find the minimum mean square error linear estimator for Y given X and Z.
(b) Find the minimum mean square error estimator for Y given X and Z.
(c) Find the MAP and ML estimators for Y given X and Z.
(d) Compare the mean square error of the estimators in parts b and c.
6.74. (a) Repeat Problem 6.73 for the estimator of X2 given X1 and X3 in Problem 6.13.
(b) Repeat Problem 6.73 for the estimator of X3 given X1 and X2.
6.75. Consider the ideal multiuser communication system in Problem 6.21. Assume the transmitted bits bk are independent and equally likely to be +1 or −1.
(a) Find the ML and MAP estimators for b given the observation Y.
(b) Find the minimum mean square linear estimator for b given the observation Y. How can this estimator be used in deciding what were the transmitted bits?
6.76. Repeat Problem 6.75 for the multiuser system in Problem 6.58.
6.77. A second-order predictor for samples of an image predicts the sample E as a linear function of sample D to its left and sample B in the previous line, as shown below:

line j:        A  B  C  …
line j + 1:    D  E  …

Estimate for E = aD + bB.
(a) Find a and b if all samples have variance σ² and if the correlation coefficient between D and E is ρ, between B and E is ρ, and between D and B is ρ².
(b) Find the mean square error of the predictor found in part a, and determine the reduction in the variance of the signal in going from the input to the output of the predictor.
6.78. Show that the mean square error of the two-tap linear predictor is given by Eq. (6.64).
6.79. In "hexagonal sampling" of an image, the samples in consecutive lines are offset relative to each other as shown below:

line j:        A     B
line j + 1:       C     D

The covariance between two samples a and b is given by ρ^d(a,b), where d(a, b) is the Euclidean distance between the points. In the above samples, the distance between A and B, A and C, A and D, C and D, and B and D is 1. Suppose we wish to use a two-tap linear predictor to predict the sample D. Which two samples from the set {A, B, C} should we use in the predictor? What is the resulting mean square error?

*Section 6.6: Generating Correlated Vector Random Variables

6.80. Find a linear transformation that diagonalizes K.

(a) K = [ 2  1 ]        (b) K = [ 4  1 ]
        [ 1  4 ]                [ 1  4 ]

6.81. Generate and plot the scattergram of 1000 pairs of random variables Y with the covariance matrices in Problem 6.80 if: (a) X1 and X2 are independent random variables that are each uniform in the unit interval; (b) X1 and X2 are independent zero-mean, unit-variance Gaussian random variables.
6.82. Let X = (X1, X2, X3) be the jointly Gaussian random variables in Problem 6.55.
(a) Find a linear transformation that diagonalizes the covariance matrix.
(b) Generate 1000 triplets of Y = AX and plot the scattergrams for Y1 and Y2, Y1 and Y3, and Y2 and Y3. Confirm that the scattergrams are what is expected.
6.83. Let X be a jointly Gaussian random vector with mean mX and covariance matrix KX, and let A be a matrix that diagonalizes KX. What is the joint pdf of A⁻¹(X − mX)?
6.84. Let X1, X2, …, Xn be independent zero-mean, unit-variance Gaussian random variables. Let Yk = (Xk + Xk−1)/2; that is, Yk is the moving average of pairs of values of X.
Assume X−1 = 0 = Xn+1.
(a) Find the covariance matrix of the Yk's.
(b) Use Octave to generate a sequence of 1000 samples Y1, …, Yn. How would you check whether the Yk's have the correct covariances?
6.85. Repeat Problem 6.84 with Yk = Xk − Xk−1.
6.86. Let U be an orthogonal matrix. Show that if A diagonalizes the covariance matrix K, then B = UA also diagonalizes K.
6.87. The transformation in Problem 6.56 is said to be "causal" because each output depends only on "past" inputs.
(a) Find the covariance matrix of X, Y, Z in Problem 6.56.
(b) Find a noncausal transformation that diagonalizes the covariance matrix in part a.
6.88. (a) Find a causal transformation that diagonalizes the covariance matrix in Problem 6.54.
(b) Repeat for the covariance matrix in Problem 6.55.

Problems Requiring Cumulative Knowledge

6.89. Let U0, U1, … be a sequence of independent zero-mean, unit-variance Gaussian random variables. A "low-pass filter" takes the sequence Ui and produces the output sequence Xn = (Un + Un−1)/2, and a "high-pass filter" produces the output sequence Yn = (Un − Un−1)/2.
(a) Find the joint pdf of Xn+1, Xn, and Xn−1; of Xn, Xn+m, and Xn+2m, m > 1.
(b) Repeat part a for Yn.
(c) Find the joint pdf of Xn, Xm, Yn, and Ym.
(d) Find the corresponding joint characteristic functions in parts a, b, and c.
6.90. Let X1, X2, …, Xn be the samples of a speech waveform in Example 6.31. Suppose we want to interpolate for the value of a sample in terms of the previous and the next samples; that is, we wish to find the best linear estimate for X2 in terms of X1 and X3.
(a) Find the coefficients of the best linear estimator (interpolator).
(b) Find the mean square error of the best linear interpolator and compare it to the mean square error of the two-tap predictor in Example 6.31.
(c) Suppose that the samples are jointly Gaussian. Find the pdf of the interpolation error.
6.91.
Let X1, X2, …, Xn be samples from some signal. Suppose that the samples are jointly Gaussian random variables with covariance

COV(Xi, Xj) = σ²   for i = j,
              ρσ²  for |i − j| = 1,
              0    otherwise.

Suppose we take blocks of two consecutive samples to form a vector X, which is then linearly transformed to form Y = AX.
(a) Find the matrix A so that the components of Y are independent random variables.
(b) Let Xi and Xi+1 be two consecutive blocks and let Yi and Yi+1 be the corresponding transformed variables. Are the components of Yi and Yi+1 independent?
6.92. A multiplexer combines N digital television signals into a common communications line. TV signal n generates Xn bits every 33 milliseconds, where Xn is a Gaussian random variable with mean m and variance σ². Suppose that the multiplexer accepts a maximum total of T bits from the combined sources every 33 ms, and that any bits in excess of T are discarded. Assume that the N signals are independent.
(a) Find the probability that bits are discarded in a given 33-ms period, if we let T = ma + ts, where ma is the mean total number of bits generated by the combined sources, and s is the standard deviation of the total number of bits produced by the combined sources.
(b) Find the average number of bits discarded per period.
(c) Find the long-term fraction of bits lost by the multiplexer.
(d) Find the average number of bits per source allocated in part a, and find the average number of bits lost per source. What happens as N becomes large?
(e) Suppose we require that t be adjusted with N so that the fraction of bits lost per source is kept constant. Find an equation whose solution yields the desired value of t.
(f) Do the above results change if the signals have pairwise covariance ρ?
6.93. Consider the estimation of T given the arrivals N1 in Problem 6.17.
(a) Find the ML and MAP estimators for T.
(b) Find the linear mean square estimator for T.
(c) Repeat parts a and b if N1 and N2 are given.
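Problems 6.80–6.85 above ask for generating correlated samples and checking their covariances; Problem 6.84 suggests Octave. A minimal sketch of the same empirical check in Python (a substitution, with an arbitrary sample size) for the moving average Yk = (Xk + Xk−1)/2 of independent zero-mean, unit-variance Gaussian Xk, whose covariances for interior k work out to VAR[Yk] = 1/2, COV[Yk, Yk+1] = 1/4, and 0 at larger lags:

```python
import random

rng = random.Random(1)
n = 200_000  # arbitrary sample size for the empirical check
x = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
# Moving average of pairs, as in Problem 6.84.
y = [(x[k] + x[k - 1]) / 2.0 for k in range(1, n + 1)]

def sample_cov(seq, lag):
    """Average of lagged products; the means are zero here, so this
    estimates the covariance at the given lag."""
    m = len(seq) - lag
    return sum(seq[i] * seq[i + lag] for i in range(m)) / m

var_y  = sample_cov(y, 0)   # expect about 1/2
cov1_y = sample_cov(y, 1)   # expect about 1/4
cov2_y = sample_cov(y, 2)   # expect about 0
print(var_y, cov1_y, cov2_y)
```

Comparing these lagged-product averages against the covariance matrix computed in part (a) is one way to answer the "how would you check" question in 6.84(b).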
CHAPTER 9

Random Processes

In certain random experiments, the outcome is a function of time or space. For example, in speech recognition systems, decisions are made on the basis of a voltage waveform corresponding to a speech utterance. In an image processing system, the intensity and color of the image vary over a rectangular region. In a peer-to-peer network, the number of peers in the system varies with time. In some situations, two or more functions of time may be of interest. For example, the temperature in a certain city and the demand placed on the local electric power utility vary together in time.
The random time functions in the above examples can be viewed as numerical quantities that evolve randomly in time or space. Thus what we really have is a family of random variables indexed by the time or space variable. In this chapter we begin the study of random processes. We will proceed as follows:
• In Section 9.1 we introduce the notion of a random process (or stochastic process), which is defined as an indexed family of random variables.
• We are interested in specifying the joint behavior of the random variables within a family (i.e., the temperature at two time instants). In Section 9.2 we see that this is done by specifying joint distribution functions, as well as mean and covariance functions.
• In Sections 9.3 to 9.5 we present examples of stochastic processes and show how models of complex processes can be developed from a few simple models.
• In Section 9.6 we introduce the class of stationary random processes that can be viewed as random processes in "steady state."
• In Section 9.7 we investigate the continuity properties of random processes and define their derivatives and integrals.
• In Section 9.8 we examine the properties of time averages of random processes and the problem of estimating the parameters of a random process.
• In Section 9.9 we describe methods for representing random processes by Fourier series and by the Karhunen-Loeve expansion.
• Finally, in Section 9.10 we present methods for generating random processes.

9.1 DEFINITION OF A RANDOM PROCESS

Consider a random experiment specified by the outcomes z from some sample space S, by the events defined on S, and by the probabilities on these events. Suppose that to every outcome z ∈ S, we assign a function of time according to some rule:

X(t, z),   t ∈ I.

The graph of the function X(t, z) versus t, for z fixed, is called a realization, sample path, or sample function of the random process. Thus we can view the outcome of the random experiment as producing an entire function of time, as shown in Fig. 9.1. On the other hand, if we fix a time tk from the index set I, then X(tk, z) is a random variable (see Fig. 9.1) since we are mapping z onto a real number. Thus we have created a family (or ensemble) of random variables indexed by the parameter t, {X(t, z), t ∈ I}. This family is called a random process. We also refer to random processes as stochastic processes. We usually suppress the z and use X(t) to denote a random process.
A stochastic process is said to be discrete-time if the index set I is a countable set (i.e., the set of integers or the set of nonnegative integers). When dealing with discrete-time processes, we usually use n to denote the time index and Xn to denote the random process. A continuous-time stochastic process is one in which I is continuous (i.e., the real line or the nonnegative real line). The following example shows how we can imagine a stochastic process as resulting from nature selecting z at the beginning of time and gradually revealing it in time through X(t, z).

FIGURE 9.1 Several realizations of a random process.
Example 9.1 Random Binary Sequence
Let z be a number selected at random from the interval S = [0, 1], and let b1b2… be the binary expansion of z:

z = Σ_{i=1}^∞ b_i 2^(−i),   where b_i ∈ {0, 1}.

Define the discrete-time random process X(n, z) by

X(n, z) = b_n,   n = 1, 2, ….

The resulting process is a sequence of binary numbers, with X(n, z) equal to the nth number in the binary expansion of z.

Example 9.2 Random Sinusoids
Let z be selected at random from the interval [−1, 1]. Define the continuous-time random process X(t, z) by

X(t, z) = z cos(2πt),   −∞ < t < ∞.

The realizations of this random process are sinusoids with amplitude z, as shown in Fig. 9.2(a). Let z be selected at random from the interval (−π, π) and let Y(t, z) = cos(2πt + z). The realizations of Y(t, z) are phase-shifted versions of cos 2πt, as shown in Fig. 9.2(b).

FIGURE 9.2 (a) Sinusoid with random amplitude, (b) Sinusoid with random phase.

The randomness in z induces randomness in the observed function X(t, z). In principle, one can deduce the probability of events involving a stochastic process at various instants of time from probabilities involving z by using the equivalent-event method introduced in Chapter 4.

Example 9.3
Find the following probabilities for the random process introduced in Example 9.1: P[X(1, z) = 0] and P[X(1, z) = 0 and X(2, z) = 1].
The probabilities are obtained by finding the equivalent events in terms of z:

P[X(1, z) = 0] = P[0 ≤ z < 1/2] = 1/2
P[X(1, z) = 0 and X(2, z) = 1] = P[1/4 ≤ z < 1/2] = 1/4,

since all points in the interval [0, 1/2) begin with b1 = 0 and all points in [1/4, 1/2) begin with b1 = 0 and b2 = 1. Clearly, any sequence of k bits has a corresponding subinterval of length (and hence probability) 2^(−k).

Example 9.4
Find the pdf of X0 = X(t0, z) and Y(t0, z) in Example 9.2.
If t0 is such that cos(2πt0) = 0, then X(t0, ζ) = 0 for all ζ and the pdf of X(t0) is a delta function of unit weight at x = 0. Otherwise, X(t0, ζ) is uniformly distributed in the interval (-cos 2πt0, cos 2πt0), since ζ is uniformly distributed in [-1, 1] (see Fig. 9.3a). Note that the pdf of X(t0, ζ) depends on t0.

The approach used in Example 4.36 can be used to show that Y(t0, ζ) has an arcsine distribution:

f_Y(y) = 1 / (π √(1 - y²)),   |y| < 1

(see Fig. 9.3b). Note that the pdf of Y(t0, ζ) does not depend on t0.

Figure 9.3(c) shows a histogram of 1000 samples of the amplitudes X(t0, ζ) at t0 = 0, which can be seen to be approximately uniformly distributed in [-1, 1]. Figure 9.3(d) shows the histogram for the samples of the sinusoid with random phase. Clearly there is agreement with the arcsine pdf.

FIGURE 9.3 (a) pdf of sinusoid with random amplitude. (b) pdf of sinusoid with random phase. (c) Histogram of samples from uniform amplitude sinusoid at t = 0. (d) Histogram of samples from random phase sinusoid at t = 0.

In general, the sample paths of a stochastic process can be quite complicated and cannot be described by simple formulas. In addition, it is usually not possible to identify an underlying probability space for the family of observed functions of time. Thus the equivalent-event approach for computing the probability of events involving X(t, ζ) in terms of the probabilities of events involving ζ does not prove useful in practice. In the next section we show an alternative method for specifying the probabilities of events involving a stochastic process.

9.2 SPECIFYING A RANDOM PROCESS

There are many questions regarding random processes that cannot be answered with just knowledge of the distribution at a single time instant.
For example, we may be interested in the temperature at a given locale at two different times. This requires the following information:

P[x1 < X(t1) ≤ x1′, x2 < X(t2) ≤ x2′].

In another example, the speech compression system in a cellular phone predicts the value of the speech signal at the next sampling time based on the previous k samples. Thus we may be interested in the following probability:

P[a < X(t_{k+1}) ≤ b | X(t1) = x1, X(t2) = x2, …, X(tk) = xk].

It is clear that a general description of a random process should provide probabilities for vectors of samples of the process.

9.2.1 Joint Distributions of Time Samples

Let X1, X2, …, Xk be the k random variables obtained by sampling the random process X(t, ζ) at the times t1, t2, …, tk:

X1 = X(t1, ζ), X2 = X(t2, ζ), …, Xk = X(tk, ζ),

as shown in Fig. 9.1. The joint behavior of the random process at these k time instants is specified by the joint cumulative distribution function of the vector random variable X1, X2, …, Xk. The probabilities of any event involving the random process at all or some of these time instants can be computed from this cdf using the methods developed for vector random variables in Chapter 6.

Thus, a stochastic process is specified by the collection of kth-order joint cumulative distribution functions:

F_{X1,…,Xk}(x1, x2, …, xk) = P[X(t1) ≤ x1, X(t2) ≤ x2, …, X(tk) ≤ xk],   (9.1)

for any k and any choice of sampling instants t1, …, tk. Note that the collection of cdf's must be consistent in the sense that lower-order cdf's are obtained as marginals of higher-order cdf's.

If the stochastic process is continuous-valued, then a collection of probability density functions can be used instead:

f_{X1,…,Xk}(x1, x2, …, xk) dx1 … dxk = P[x1 < X(t1) ≤ x1 + dx1, …, xk < X(tk) ≤ xk + dxk].
(9.2)

If the stochastic process is discrete-valued, then a collection of probability mass functions can be used to specify the stochastic process:

p_{X1,…,Xk}(x1, x2, …, xk) = P[X(t1) = x1, X(t2) = x2, …, X(tk) = xk]   (9.3)

for any k and any choice of sampling instants t1, …, tk.

At first glance it does not appear that we have made much progress in specifying random processes, because we are now confronted with the task of specifying a vast collection of joint cdf's! However, this approach works because most useful models of stochastic processes are obtained by elaborating on a few simple models, so the methods developed in Chapters 5 and 6 of this book can be used to derive the required cdf's. The following examples give a preview of how we construct complex models from simple models. We develop these important examples more fully in Sections 9.3 to 9.5.

Example 9.5  iid Bernoulli Random Variables

Let Xn be a sequence of independent, identically distributed Bernoulli random variables with p = 1/2. The joint pmf for any k time samples is then

P[X1 = x1, X2 = x2, …, Xk = xk] = P[X1 = x1] … P[Xk = xk] = (1/2)^k

where xi ∈ {0, 1} for all i. This binary random process is equivalent to the one discussed in Example 9.1.

Example 9.6  iid Gaussian Random Variables

Let Xn be a sequence of independent, identically distributed Gaussian random variables with zero mean and variance σ_X². The joint pdf for any k time samples is then

f_{X1,X2,…,Xk}(x1, x2, …, xk) = (2πσ_X²)^{-k/2} e^{-(x1² + x2² + … + xk²)/2σ_X²}.

The following two examples show how more complex and interesting processes can be built from iid sequences.

Example 9.7  Binomial Counting Process

Let Xn be a sequence of independent, identically distributed Bernoulli random variables with p = 1/2. Let Sn be the number of 1's in the first n trials:

Sn = X1 + X2 + … + Xn   for n = 0, 1, ….
Sn is an integer-valued nondecreasing function of n that grows by unit steps after a random number of time instants. From previous chapters we know that Sn is a binomial random variable with parameters n and p = 1/2. In the next section we show how to find the joint pmf's of Sn using conditional probabilities.

Example 9.8  Filtered Noisy Signal

Let Xj be a sequence of independent, identically distributed observations of a signal voltage m corrupted by zero-mean Gaussian noise Nj with variance σ²:

Xj = m + Nj   for j = 1, 2, ….

Consider the signal that results from averaging the sequence of observations:

Sn = (X1 + X2 + … + Xn)/n   for n = 1, 2, ….

From previous chapters we know that Sn is the sample mean of an iid sequence of Gaussian random variables. We know that Sn itself is a Gaussian random variable with mean m and variance σ²/n, and so it tends towards the value m as n increases. In a later section, we show that Sn is an example from the class of Gaussian random processes.

9.2.2 The Mean, Autocorrelation, and Autocovariance Functions

The moments of time samples of a random process can be used to partially specify the random process because they summarize the information contained in the joint cdf's. The mean function m_X(t) and the variance function VAR[X(t)] of a continuous-time random process X(t) are defined by

m_X(t) = E[X(t)] = ∫_{-∞}^{∞} x f_{X(t)}(x) dx,   (9.4)

and

VAR[X(t)] = ∫_{-∞}^{∞} (x - m_X(t))² f_{X(t)}(x) dx,   (9.5)

where f_{X(t)}(x) is the pdf of X(t). Note that m_X(t) and VAR[X(t)] are deterministic functions of time. Trends in the behavior of X(t) are reflected in the variation of m_X(t) with time. The variance gives an indication of the spread in the values taken on by X(t) at different time instants.
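The mean and variance functions just defined can be estimated from an ensemble of sample paths. The sketch below (illustrative Python, not from the text; the values of m and σ are arbitrary choices) does this for the averaging process of Example 9.8, for which m_S(n) = m and VAR[S_n] = σ²/n:

```python
import random

random.seed(2)
m, sigma = 1.0, 0.5            # assumed signal level and noise std (illustrative)
n, paths = 100, 20_000

# Ensemble of S_n = (X_1 + ... + X_n)/n with X_j = m + N_j, N_j ~ N(0, sigma^2).
samples = []
for _ in range(paths):
    total = sum(m + random.gauss(0, sigma) for _ in range(n))
    samples.append(total / n)

mean_est = sum(samples) / paths
var_est = sum((s - mean_est) ** 2 for s in samples) / paths
print(mean_est, var_est, sigma ** 2 / n)
```

The estimated mean is close to m and the estimated variance close to σ²/n, illustrating both that m_S(n) is constant and that the spread shrinks as n grows.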
The autocorrelation R_X(t1, t2) of a random process X(t) is defined as the joint moment of X(t1) and X(t2):

R_X(t1, t2) = E[X(t1)X(t2)] = ∫_{-∞}^{∞} ∫_{-∞}^{∞} xy f_{X(t1),X(t2)}(x, y) dx dy,   (9.6)

where f_{X(t1),X(t2)}(x, y) is the second-order pdf of X(t). In general, the autocorrelation is a function of t1 and t2. Note that R_X(t, t) = E[X²(t)].

The autocovariance C_X(t1, t2) of a random process X(t) is defined as the covariance of X(t1) and X(t2):

C_X(t1, t2) = E[{X(t1) - m_X(t1)}{X(t2) - m_X(t2)}].   (9.7)

From Eq. (5.30), the autocovariance can be expressed in terms of the autocorrelation and the means:

C_X(t1, t2) = R_X(t1, t2) - m_X(t1)m_X(t2).   (9.8)

Note that the variance of X(t) can be obtained from C_X(t1, t2):

VAR[X(t)] = E[(X(t) - m_X(t))²] = C_X(t, t).   (9.9)

The correlation coefficient of X(t) is defined as the correlation coefficient of X(t1) and X(t2) (see Eq. 5.31):

ρ_X(t1, t2) = C_X(t1, t2) / √(C_X(t1, t1) C_X(t2, t2)).   (9.10)

From Eq. (5.32) we have that |ρ_X(t1, t2)| ≤ 1. Recall that the correlation coefficient is a measure of the extent to which a random variable can be predicted as a linear function of another. In Chapter 10, we will see that the autocovariance function and the autocorrelation function play a critical role in the design of linear methods for analyzing and processing random signals.

The mean, variance, autocorrelation, and autocovariance functions for discrete-time random processes are defined in the same manner as above. We use a slightly different notation for the time index. The mean and variance of a discrete-time random process Xn are defined as:

m_X(n) = E[Xn]   and   VAR[Xn] = E[(Xn - m_X(n))²].   (9.11)

The autocorrelation and autocovariance functions of a discrete-time random process Xn are defined as follows:

R_X(n1, n2) = E[X(n1)X(n2)]   (9.12)

and

C_X(n1, n2) = E[{X(n1) - m_X(n1)}{X(n2) - m_X(n2)}] = R_X(n1, n2) - m_X(n1)m_X(n2).
(9.13)

Before proceeding to examples, we reiterate that the mean, autocorrelation, and autocovariance functions are only partial descriptions of a random process. Thus we will see later in the chapter that it is possible for two quite different random processes to have the same mean, autocorrelation, and autocovariance functions.

Example 9.9  Sinusoid with Random Amplitude

Let X(t) = A cos 2πt, where A is some random variable (see Fig. 9.2a). The mean of X(t) is found using Eq. (4.30):

m_X(t) = E[A cos 2πt] = E[A] cos 2πt.

Note that the mean varies with t. In particular, note that the process is always zero for values of t where cos 2πt = 0. The autocorrelation is

R_X(t1, t2) = E[A cos 2πt1 · A cos 2πt2] = E[A²] cos 2πt1 cos 2πt2,

and the autocovariance is then

C_X(t1, t2) = R_X(t1, t2) - m_X(t1)m_X(t2) = {E[A²] - E[A]²} cos 2πt1 cos 2πt2 = VAR[A] cos 2πt1 cos 2πt2.

Example 9.10  Sinusoid with Random Phase

Let X(t) = cos(ωt + Θ), where Θ is uniformly distributed in the interval (-π, π) (see Fig. 9.2b). The mean of X(t) is found using Eq. (4.30):

m_X(t) = E[cos(ωt + Θ)] = (1/2π) ∫_{-π}^{π} cos(ωt + θ) dθ = 0.

The autocorrelation and autocovariance are then

C_X(t1, t2) = R_X(t1, t2) = E[cos(ωt1 + Θ) cos(ωt2 + Θ)]
            = (1/2π) ∫_{-π}^{π} (1/2){cos(ω(t1 - t2)) + cos(ω(t1 + t2) + 2θ)} dθ
            = (1/2) cos(ω(t1 - t2)),

where we used the identity cos(a) cos(b) = (1/2) cos(a + b) + (1/2) cos(a - b). Note that m_X(t) is a constant and that C_X(t1, t2) depends only on |t1 - t2|. Note as well that the samples at times t1 and t2 are uncorrelated if ω(t1 - t2) = (k + 1/2)π, where k is any integer, since the cosine is then zero.

9.2.3 Multiple Random Processes

In most situations we deal with more than one random process at a time. For example, we may be interested in the temperatures at city a, X(t), and city b, Y(t). Another very common example involves a random process X(t) that is the "input" to a system and another random process Y(t) that is the "output" of the system.
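Returning briefly to Example 9.10, the result R_X(t1, t2) = (1/2) cos(ω(t1 - t2)) is easy to verify numerically by averaging over many independent draws of the phase Θ. The sketch below is illustrative Python (the particular t1, t2, and ω are arbitrary choices):

```python
import math
import random

random.seed(3)
omega = 2 * math.pi
t1, t2 = 0.3, 0.8
trials = 200_000

acc = 0.0
for _ in range(trials):
    theta = random.uniform(-math.pi, math.pi)   # Theta uniform on (-pi, pi)
    acc += math.cos(omega * t1 + theta) * math.cos(omega * t2 + theta)

est = acc / trials                              # sample estimate of R_X(t1, t2)
exact = 0.5 * math.cos(omega * (t1 - t2))       # (1/2) cos(omega (t1 - t2))
print(est, exact)
```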
Naturally, we are interested in the interplay between X(t) and Y(t).

The joint behavior of two or more random processes is specified by the collection of joint distributions for all possible choices of time samples of the processes. Thus for a pair of continuous-valued random processes X(t) and Y(t) we must specify all possible joint density functions of X(t1), …, X(tk) and Y(t1′), …, Y(tj′) for all k, j, and all choices of t1, …, tk and t1′, …, tj′. For example, the simplest joint pdf would be:

f_{X(t1),Y(t2)}(x, y) dx dy = P[x < X(t1) ≤ x + dx, y < Y(t2) ≤ y + dy].

Note that the time indices of X(t) and Y(t) need not be the same. For example, we may be interested in the input at time t1 and the output at a later time t2.

The random processes X(t) and Y(t) are said to be independent random processes if the vector random variables X = (X(t1), …, X(tk)) and Y = (Y(t1′), …, Y(tj′)) are independent for all k, j, and all choices of t1, …, tk and t1′, …, tj′:

F_{X,Y}(x1, …, xk, y1, …, yj) = F_X(x1, …, xk) F_Y(y1, …, yj).

The cross-correlation R_{X,Y}(t1, t2) of X(t) and Y(t) is defined by

R_{X,Y}(t1, t2) = E[X(t1)Y(t2)].   (9.14)

The processes X(t) and Y(t) are said to be orthogonal random processes if

R_{X,Y}(t1, t2) = 0   for all t1 and t2.   (9.15)

The cross-covariance C_{X,Y}(t1, t2) of X(t) and Y(t) is defined by

C_{X,Y}(t1, t2) = E[{X(t1) - m_X(t1)}{Y(t2) - m_Y(t2)}] = R_{X,Y}(t1, t2) - m_X(t1)m_Y(t2).   (9.16)

The processes X(t) and Y(t) are said to be uncorrelated random processes if

C_{X,Y}(t1, t2) = 0   for all t1 and t2.   (9.17)

Example 9.11

Let X(t) = cos(ωt + Θ) and Y(t) = sin(ωt + Θ), where Θ is a random variable uniformly distributed in [-π, π]. Find the cross-covariance of X(t) and Y(t).

From Example 9.10 we know that X(t) and Y(t) are zero mean. From Eq.
(9.16), the cross-covariance is then equal to the cross-correlation:

C_{X,Y}(t1, t2) = R_{X,Y}(t1, t2) = E[cos(ωt1 + Θ) sin(ωt2 + Θ)]
               = E[-(1/2) sin(ω(t1 - t2)) + (1/2) sin(ω(t1 + t2) + 2Θ)]
               = -(1/2) sin(ω(t1 - t2)),

since E[sin(ω(t1 + t2) + 2Θ)] = 0. X(t) and Y(t) are not uncorrelated random processes because the cross-covariance is not equal to zero for all choices of time samples. Note, however, that X(t1) and Y(t2) are uncorrelated random variables for t1 and t2 such that ω(t1 - t2) = kπ, where k is any integer.

Example 9.12  Signal Plus Noise

Suppose process Y(t) consists of a desired signal X(t) plus noise N(t):

Y(t) = X(t) + N(t).

Find the cross-correlation between the observed signal and the desired signal, assuming that X(t) and N(t) are independent random processes.

From Eq. (9.14), we have

R_{XY}(t1, t2) = E[X(t1)Y(t2)] = E[X(t1){X(t2) + N(t2)}]
             = R_X(t1, t2) + E[X(t1)]E[N(t2)]
             = R_X(t1, t2) + m_X(t1)m_N(t2),

where the third equality followed from the fact that X(t) and N(t) are independent.

9.3 DISCRETE-TIME PROCESSES: SUM PROCESS, BINOMIAL COUNTING PROCESS, AND RANDOM WALK

In this section we introduce several important discrete-time random processes. We begin with the simplest class of random processes—independent, identically distributed sequences—and then consider the sum process that results from adding an iid sequence. We show that the sum process satisfies the independent increments property as well as the Markov property. Both of these properties greatly facilitate the calculation of joint probabilities. We also introduce the binomial counting process and the random walk process as special cases of sum processes.

9.3.1 iid Random Process

Let Xn be a discrete-time random process consisting of a sequence of independent, identically distributed (iid) random variables with common cdf F_X(x), mean m, and variance σ². The sequence Xn is called the iid random process.
The joint cdf for any time instants n1, …, nk is given by

F_{X1,…,Xk}(x1, x2, …, xk) = P[X1 ≤ x1, X2 ≤ x2, …, Xk ≤ xk] = F_X(x1)F_X(x2) … F_X(xk),   (9.18)

where, for simplicity, Xk denotes X_{nk}. Equation (9.18) implies that if Xn is discrete-valued, the joint pmf factors into the product of individual pmf's, and if Xn is continuous-valued, the joint pdf factors into the product of the individual pdf's.

The mean of an iid process is obtained from Eq. (9.4):

m_X(n) = E[Xn] = m   for all n.   (9.19)

Thus, the mean is constant.

The autocovariance function is obtained from Eq. (9.7) as follows. If n1 ≠ n2, then

C_X(n1, n2) = E[(X_{n1} - m)(X_{n2} - m)] = E[X_{n1} - m]E[X_{n2} - m] = 0,

since X_{n1} and X_{n2} are independent random variables. If n1 = n2 = n, then

C_X(n1, n2) = E[(Xn - m)²] = σ².

We can express the autocovariance of the iid process in compact form as follows:

C_X(n1, n2) = σ² δ_{n1,n2},   (9.20)

where δ_{n1,n2} = 1 if n1 = n2, and 0 otherwise. Therefore the autocovariance function is zero everywhere except for n1 = n2. The autocorrelation function of the iid process is found from Eq. (9.8):

R_X(n1, n2) = C_X(n1, n2) + m².   (9.21)

FIGURE 9.4 (a) Realization of a Bernoulli process. In = 1 indicates that a light bulb fails and is replaced on day n. (b) Realization of a binomial process. Sn denotes the number of light bulbs that have failed up to time n.

Example 9.13  Bernoulli Random Process

Let In be a sequence of independent Bernoulli random variables. In is then an iid random process taking on values from the set {0, 1}. A realization of such a process is shown in Fig. 9.4(a). For example, In could be an indicator function for the event "a light bulb fails and is replaced on day n." Since In is a Bernoulli random variable, it has mean and variance

m_I(n) = p   and   VAR[In] = p(1 - p).
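These ensemble statistics are quickly confirmed by simulation. The sketch below (illustrative Python; p = 0.3 is an arbitrary choice, not from the text) averages over many independent realizations of In:

```python
import random

random.seed(4)
p = 0.3                        # illustrative success probability
trials = 100_000

# Draw one Bernoulli(p) sample per realization of the iid process.
samples = [1 if random.random() < p else 0 for _ in range(trials)]
mean_est = sum(samples) / trials
var_est = sum((s - mean_est) ** 2 for s in samples) / trials
print(mean_est, var_est, p * (1 - p))
```

The estimates land near p and p(1 - p), as Eq. (9.19) and the variance formula above require; they do not depend on the time index n, reflecting the constant mean of an iid process.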
The independence of the In's makes probabilities easy to compute. For example, the probability that the first four bits in the sequence are 1001 is

P[I1 = 1, I2 = 0, I3 = 0, I4 = 1] = P[I1 = 1]P[I2 = 0]P[I3 = 0]P[I4 = 1] = p²(1 - p)².

Similarly, the probability that the second bit is 0 and the seventh is 1 is

P[I2 = 0, I7 = 1] = P[I2 = 0]P[I7 = 1] = p(1 - p).

Example 9.14  Random Step Process

An up-down counter is driven by +1 or -1 pulses. Let the input to the counter be given by Dn = 2In - 1, where In is the Bernoulli random process; then

Dn = +1 if In = 1,   Dn = -1 if In = 0.

For example, Dn might represent the change in position of a particle that moves along a straight line in jumps of ±1 every time unit. A realization of Dn is shown in Fig. 9.5(a).

FIGURE 9.5 (a) Realization of a random step process. Dn = 1 implies that the particle moves one step to the right at time n. (b) Realization of a random walk process. Sn denotes the position of a particle at time n.

The mean of Dn is

m_D(n) = E[Dn] = E[2In - 1] = 2E[In] - 1 = 2p - 1.

The variance of Dn is found from Eqs. (4.37) and (4.38):

VAR[Dn] = VAR[2In - 1] = 2² VAR[In] = 4p(1 - p).

The probabilities of events involving Dn are computed as in Example 9.13.

9.3.2 Independent Increments and Markov Properties of Random Processes

Before proceeding to build random processes from iid processes, we present two very useful properties of random processes. Let X(t) be a random process and consider two time instants, t1 < t2. The increment of the random process in the interval t1 < t ≤ t2 is defined as X(t2) - X(t1). A random process X(t) is said to have independent increments if the increments in disjoint intervals are independent random variables, that is, for any k and any choice of sampling instants t1 < t2 < … < tk, the associated increments

X(t2) - X(t1), X(t3) - X(t2), …, X(tk) - X(t_{k-1})

are independent random variables.
In the next subsection, we show that the joint pdf (pmf) of X(t1), X(t2), …, X(tk) is given by the product of the pdf (pmf) of X(t1) and the marginal pdf's (pmf's) of the individual increments.

Another useful property of random processes that allows us to readily obtain the joint probabilities is the Markov property. A random process X(t) is said to be Markov if the future of the process given the present is independent of the past; that is, for any k and any choice of sampling instants t1 < t2 < … < tk and for any x1, x2, …, xk,

f_{X(tk)}(xk | X(t_{k-1}) = x_{k-1}, …, X(t1) = x1) = f_{X(tk)}(xk | X(t_{k-1}) = x_{k-1})   (9.22)

if X(t) is continuous-valued, and

P[X(tk) = xk | X(t_{k-1}) = x_{k-1}, …, X(t1) = x1] = P[X(tk) = xk | X(t_{k-1}) = x_{k-1}]   (9.23)

if X(t) is discrete-valued. The expressions on the right-hand side of the above two equations are called the transition pdf and transition pmf, respectively. In the next sections we encounter several processes that satisfy the Markov property. Chapter 11 is entirely devoted to random processes that satisfy this property.

It is easy to show that a random process that has independent increments is also a Markov process. The converse is not true; that is, the Markov property does not imply independent increments.

FIGURE 9.6 The sum process Sn = X1 + … + Xn, S0 = 0, can be generated in this way: Xn is added to S_{n-1}, which is fed back through a unit delay.

9.3.3 Sum Processes: The Binomial Counting and Random Walk Processes

Many interesting random processes are obtained as the sum of a sequence of iid random variables, X1, X2, …:

Sn = X1 + X2 + … + Xn = S_{n-1} + Xn,   n = 1, 2, …   (9.24)

where S0 = 0. We call Sn the sum process. The pdf or pmf of Sn is found using the convolution or characteristic-function methods presented in Section 7.1.
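The recursion Sn = S_{n-1} + Xn of Eq. (9.24) translates directly into code. The sketch below (illustrative Python, not part of the text) generates one sample path of a sum process from an arbitrary iid driver, here Bernoulli(1/2) increments, which yields the binomial counting process:

```python
import random

random.seed(5)

def sum_process(n, draw):
    """Generate S_1, ..., S_n via S_n = S_{n-1} + X_n with S_0 = 0 (Eq. 9.24)."""
    s, path = 0, []
    for _ in range(n):
        s += draw()            # the increment X_n
        path.append(s)
    return path

# Driven by Bernoulli(1/2) increments this is the binomial counting process.
path = sum_process(20, lambda: 1 if random.random() < 0.5 else 0)
print(path)
```

Swapping the `draw` function for ±1 steps gives the random walk of the next examples; the recursion itself is unchanged, which is exactly the point of Fig. 9.6.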
Note that Sn depends on the "past," S1, …, S_{n-1}, only through S_{n-1}; that is, Sn is independent of the past when S_{n-1} is known. This can be seen clearly from Fig. 9.6, which shows a recursive procedure for computing Sn in terms of S_{n-1} and the increment Xn. Thus Sn is a Markov process.

Example 9.15  Binomial Counting Process

Let the Ii be the sequence of independent Bernoulli random variables in Example 9.13, and let Sn be the corresponding sum process. Sn is then the counting process that gives the number of successes in the first n Bernoulli trials. The sample function for Sn corresponding to a particular sequence of Ii's is shown in Fig. 9.4(b). Note that the counting process can only increase over time. Note as well that the binomial process can increase by at most one unit at a time. If In indicates that a light bulb fails and is replaced on day n, then Sn denotes the number of light bulbs that have failed up to day n.

Since Sn is the sum of n independent Bernoulli random variables, Sn is a binomial random variable with parameters n and p = P[I = 1]:

P[Sn = j] = C(n, j) p^j (1 - p)^{n-j}

for 0 ≤ j ≤ n, and zero otherwise, where C(n, j) denotes the binomial coefficient. Thus Sn has mean np and variance np(1 - p). Note that the mean and variance of this process grow linearly with time. This reflects the fact that as time progresses, that is, as n grows, the range of values that can be assumed by the process increases. If p > 0 then we also know that Sn has a tendency to grow steadily without bound over time.

The Markov property of the binomial counting process is easy to deduce. Given that the current value of the process at time n - 1 is S_{n-1} = k, the process at the next time instant will be k with probability 1 - p or k + 1 with probability p. Once we know the value of the process at time n - 1, the values of the random process prior to time n - 1 are irrelevant.
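The linear growth of the mean and variance of the binomial counting process is easy to see by ensemble averaging. A brief sketch (illustrative Python; n, p, and the number of runs are arbitrary choices):

```python
import random

random.seed(6)
p, n, runs = 0.5, 100, 20_000

# Final value S_n of many independent binomial counting processes.
finals = []
for _ in range(runs):
    finals.append(sum(1 for _ in range(n) if random.random() < p))

mean_est = sum(finals) / runs
var_est = sum((s - mean_est) ** 2 for s in finals) / runs
print(mean_est, n * p, var_est, n * p * (1 - p))
```

Repeating the experiment with larger n scales both estimates proportionally, matching the mean np and variance np(1 - p) derived above.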
Example 9.16  One-Dimensional Random Walk

Let Dn be the iid process of ±1 random variables in Example 9.14, and let Sn be the corresponding sum process. Sn can represent the position of a particle at time n. The random process Sn is an example of a one-dimensional random walk. A sample function of Sn is shown in Fig. 9.5(b). Unlike the binomial process, the random walk can increase or decrease over time. The random walk process changes by one unit at a time.

The pmf of Sn is found as follows. If there are k "+1"s in the first n trials, then there are n - k "-1"s, and Sn = k - (n - k) = 2k - n. Conversely, Sn = j if the number of +1's is k = (j + n)/2. If (j + n)/2 is not an integer, then Sn cannot equal j. Thus

P[Sn = 2k - n] = C(n, k) p^k (1 - p)^{n-k}   for k ∈ {0, 1, …, n}.

Since k is the number of successes in n Bernoulli trials, the mean of the random walk is:

E[Sn] = 2np - n = n(2p - 1).

As time progresses, the random walk can fluctuate over an increasingly broader range of positive and negative values. Sn has a tendency to grow if p > 1/2, or to decrease if p < 1/2. The case p = 1/2 provides a precarious balance, and we will see later, in Chapter 12, very interesting dynamics.

Figure 9.7(a) shows the first 100 steps from a sample function of the random walk with p = 1/2. Figure 9.7(b) shows four sample functions of the random walk process with p = 1/2 for 1000 steps. Figure 9.7(c) shows four sample functions in the asymmetric case where p = 3/4. Note the strong linear growth trend in the process.

The sum process Sn has independent increments in nonoverlapping time intervals. To see this, consider two time intervals: n0 < n ≤ n1 and n2 < n ≤ n3, where n1 ≤ n2. The increments of Sn in these disjoint time intervals are given by

S_{n1} - S_{n0} = X_{n0+1} + … + X_{n1}
S_{n3} - S_{n2} = X_{n2+1} + … + X_{n3}.
(9.25)

FIGURE 9.7 (a) Random walk process with p = 1/2. (b) Four sample functions of symmetric random walk process with p = 1/2. (c) Four sample functions of asymmetric random walk with p = 3/4.

The above increments do not have any of the Xn's in common, so the independence of the Xn's implies that the increments (S_{n1} - S_{n0}) and (S_{n3} - S_{n2}) are independent random variables.

For n′ > n, the increment S_{n′} - S_n is the sum of n′ - n iid random variables, so it has the same distribution as S_{n′-n}, the sum of the first n′ - n X's, that is,

P[S_{n′} - S_n = y] = P[S_{n′-n} = y].   (9.26)

Thus increments in intervals of the same length have the same distribution regardless of when the interval begins. For this reason, we also say that Sn has stationary increments.

Example 9.17  Independent and Stationary Increments of Binomial Process and Random Walk

The independent and stationary increments property is particularly easy to see for the binomial process, since the increments in an interval are the number of successes in the corresponding Bernoulli trials. The independent increments property follows from the fact that the numbers of successes in disjoint time intervals are independent. The stationary increments property follows from the fact that the pmf for the increment in a time interval is the binomial pmf with the corresponding number of trials.

The increment in a random walk process is determined by the same number of successes as in a binomial process. It then follows that the random walk also has independent and stationary increments.
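The stationary increments property of Eq. (9.26) can be checked directly by simulation: the increment S_30 - S_20 of a symmetric random walk should match S_10 in distribution. A sketch (illustrative Python; the particular time points and target value are arbitrary):

```python
import random
from math import comb

random.seed(7)
p, trials = 0.5, 100_000

hits_inc = hits_s10 = 0
for _ in range(trials):
    steps = [1 if random.random() < p else -1 for _ in range(30)]
    s10 = sum(steps[:10])                   # S_10
    s30_minus_s20 = sum(steps[20:30])       # increment over (20, 30]
    if s30_minus_s20 == 4:
        hits_inc += 1
    if s10 == 4:
        hits_s10 += 1

# S_n = 4 over 10 steps requires k = (4 + 10)/2 = 7 "+1" steps.
exact = comb(10, 7) * p**7 * (1 - p)**3
print(hits_inc / trials, hits_s10 / trials, exact)
```

Both relative frequencies agree with the binomial value C(10, 7) p^7 (1 - p)^3, as Example 9.17 predicts.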
The independent and stationary increments property of the sum process Sn makes it easy to compute the joint pmf/pdf for any number of time instants. For simplicity, suppose that the Xn are integer-valued, so Sn is also integer-valued. We compute the joint pmf of Sn at times n1, n2, and n3:

P[S_{n1} = y1, S_{n2} = y2, S_{n3} = y3] = P[S_{n1} = y1, S_{n2} - S_{n1} = y2 - y1, S_{n3} - S_{n2} = y3 - y2],   (9.27)

since the process is equal to y1, y2, and y3 at times n1, n2, and n3 if and only if it is equal to y1 at time n1 and the subsequent increments are y2 - y1 and y3 - y2. The independent increments property then implies that

P[S_{n1} = y1, S_{n2} = y2, S_{n3} = y3] = P[S_{n1} = y1]P[S_{n2} - S_{n1} = y2 - y1]P[S_{n3} - S_{n2} = y3 - y2].   (9.28)

Finally, the stationary increments property implies that the joint pmf of Sn is given by:

P[S_{n1} = y1, S_{n2} = y2, S_{n3} = y3] = P[S_{n1} = y1]P[S_{n2-n1} = y2 - y1]P[S_{n3-n2} = y3 - y2].

Clearly, we can use this procedure to write the joint pmf of Sn at any time instants n1 < n2 < … < nk in terms of the pmf at the initial time instant and the pmf's of the subsequent increments:

P[S_{n1} = y1, S_{n2} = y2, …, S_{nk} = yk] = P[S_{n1} = y1]P[S_{n2-n1} = y2 - y1] … P[S_{nk-n_{k-1}} = yk - y_{k-1}].   (9.29)

If the Xn are continuous-valued random variables, then it can be shown that the joint density of Sn at times n1, n2, …, nk is:

f_{S_{n1},S_{n2},…,S_{nk}}(y1, y2, …, yk) = f_{S_{n1}}(y1) f_{S_{n2-n1}}(y2 - y1) … f_{S_{nk-n_{k-1}}}(yk - y_{k-1}).   (9.30)

Example 9.18  Joint pmf of Binomial Counting Process

Find the joint pmf for the binomial counting process at times n1 and n2. Find the probability P[S_{n1} = 0, S_{n2} = n2 - n1], that is, the probability that the first n1 trials are failures and the remaining trials are all successes.
Following the above approach we have

P[S_{n1} = y1, S_{n2} = y2] = P[S_{n1} = y1]P[S_{n2} - S_{n1} = y2 - y1]
  = C(n1, y1) p^{y1} (1 - p)^{n1-y1} · C(n2 - n1, y2 - y1) p^{y2-y1} (1 - p)^{n2-n1-y2+y1}
  = C(n1, y1) C(n2 - n1, y2 - y1) p^{y2} (1 - p)^{n2-y2}.

The requested probability is then:

P[S_{n1} = 0, S_{n2} = n2 - n1] = C(n1, 0)(1 - p)^{n1} C(n2 - n1, n2 - n1) p^{n2-n1} = p^{n2-n1}(1 - p)^{n1},

which is what we would obtain from a direct calculation for Bernoulli trials.

Example 9.19  Joint pdf of Sum of iid Gaussian Sequence

Let Xn be a sequence of iid Gaussian random variables with zero mean and variance σ². Find the joint pdf of the corresponding sum process at times n1 and n2.

From Example 7.3, we know that Sn is a Gaussian random variable with mean zero and variance nσ². The joint pdf of Sn at times n1 and n2 is given by

f_{S_{n1},S_{n2}}(y1, y2) = f_{S_{n2-n1}}(y2 - y1) f_{S_{n1}}(y1)
  = (1/√(2π(n2 - n1)σ²)) e^{-(y2-y1)²/(2(n2-n1)σ²)} · (1/√(2πn1σ²)) e^{-y1²/(2n1σ²)}.

Since the sum process Sn is the sum of n iid random variables, it has mean and variance:

m_S(n) = E[Sn] = nE[X] = nm   (9.31)
VAR[Sn] = n VAR[X] = nσ².   (9.32)

The property of independent increments allows us to compute the autocovariance in an interesting way. Suppose n ≤ k, so n = min(n, k); then

C_S(n, k) = E[(Sn - nm)(Sk - km)]
  = E[(Sn - nm){(Sn - nm) + (Sk - km) - (Sn - nm)}]
  = E[(Sn - nm)²] + E[(Sn - nm)(Sk - Sn - (k - n)m)].

Since Sn and the increment Sk - Sn are independent,

C_S(n, k) = E[(Sn - nm)²] + E[Sn - nm]E[Sk - Sn - (k - n)m] = E[(Sn - nm)²] = VAR[Sn] = nσ²,

since E[Sn - nm] = 0. Similarly, if k = min(n, k), we would have obtained kσ². Therefore the autocovariance of the sum process is

C_S(n, k) = min(n, k) σ².   (9.33)

Example 9.20  Autocovariance of Random Walk

Find the autocovariance of the one-dimensional random walk.

From Example 9.14 and Eqs. (9.32) and (9.33), Sn has mean n(2p - 1) and variance 4np(1 - p). Thus its autocovariance is given by

C_S(n, k) = min(n, k) · 4p(1 - p).
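The formula C_S(n, k) = min(n, k) · 4p(1 - p) can be spot-checked by estimating the covariance of S_n and S_k over many independent walks. The sketch below is illustrative Python (the symmetric case p = 1/2 is chosen so the means are zero and the covariance reduces to E[S_n S_k]):

```python
import random

random.seed(8)
p, n, k, runs = 0.5, 40, 100, 50_000

acc = 0.0
for _ in range(runs):
    steps = [1 if random.random() < p else -1 for _ in range(k)]
    s_n, s_k = sum(steps[:n]), sum(steps)
    acc += s_n * s_k           # means are zero when p = 1/2

cov_est = acc / runs
cov_exact = min(n, k) * 4 * p * (1 - p)   # = 40 for these parameters
print(cov_est, cov_exact)
```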
FIGURE 9.8 (a) First-order autoregressive process: Yn = αY_{n-1} + Xn. (b) Moving average process.

The sum process can be generalized in a number of ways. For example, the recursive structure in Fig. 9.6 can be modified as shown in Fig. 9.8(a). We then obtain first-order autoregressive random processes, which are of interest in time series analysis and in digital signal processing. If instead we use the structure shown in Fig. 9.8(b), we obtain an example of a moving average process. We investigate these processes in Chapter 10.

9.4 POISSON AND ASSOCIATED RANDOM PROCESSES

In this section we develop the Poisson random process, which plays an important role in models that involve counting of events and that find application in areas such as queueing systems and reliability analysis. We show how the continuous-time Poisson random process can be obtained as the limit of a discrete-time process. We also introduce several random processes that are derived from the Poisson process.

9.4.1 Poisson Process

Consider a situation in which events occur at random instants of time at an average rate of λ events per second. For example, an event could represent the arrival of a customer to a service station or the breakdown of a component in some system. Let N(t) be the number of event occurrences in the time interval [0, t]. N(t) is then a nondecreasing, integer-valued, continuous-time random process, as shown in Fig. 9.9.

FIGURE 9.9 A sample path of the Poisson counting process. The event occurrence times are denoted by S1, S2, …. The jth interevent time is denoted by Xj = Sj - S_{j-1}.

Suppose that the interval [0, t] is divided into n subintervals of very short duration δ = t/n.
Assume that the following two conditions hold:

1. The probability of more than one event occurrence in a subinterval is negligible compared to the probability of observing one or zero events.
2. Whether or not an event occurs in a subinterval is independent of the outcomes in other subintervals.

The first assumption implies that the outcome in each subinterval can be viewed as a Bernoulli trial. The second assumption implies that these Bernoulli trials are independent. The two assumptions together imply that the counting process $N(t)$ can be approximated by the binomial counting process discussed in the previous section.

If the probability of an event occurrence in each subinterval is $p$, then the expected number of event occurrences in the interval $[0, t]$ is $np$. Since events occur at a rate of $\lambda$ events per second, the average number of events in the interval $[0, t]$ is $\lambda t$. Thus we must have that $\lambda t = np$. If we now let $n \to \infty$ (i.e., $\delta = t/n \to 0$) and $p \to 0$ while $np = \lambda t$ remains fixed, then from Eq. (3.40) the binomial distribution approaches a Poisson distribution with parameter $\lambda t$. We therefore conclude that the number of event occurrences $N(t)$ in the interval $[0, t]$ has a Poisson distribution with mean $\lambda t$:

$$P[N(t) = k] = \frac{(\lambda t)^k}{k!}\,e^{-\lambda t} \quad \text{for } k = 0, 1, \ldots \tag{9.34a}$$

For this reason $N(t)$ is called the Poisson process. The mean function and the variance function of the Poisson process are given by:

$$m_N(t) = E[N(t)] = \lambda t \quad \text{and} \quad \mathrm{VAR}[N(t)] = \lambda t. \tag{9.34b}$$

In Section 11.3 we rederive the Poisson process using results from Markov chain theory.

The process $N(t)$ inherits the property of independent and stationary increments from the underlying binomial process. First, the distribution for the number of event occurrences in any interval of length $t$ is given by Eq. (9.34a). Next, the independent and stationary increments property allows us to write the joint pmf for $N(t)$ at any number of points.
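The binomial-to-Poisson limit described above is easy to see numerically. The following sketch (our own illustration; the function names are not from the text) fixes $\lambda t = np$ and lets $n$ grow, so the binomial pmf converges to the Poisson pmf of Eq. (9.34a):

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P[S_n = k] for the binomial counting process with n subintervals."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam_t):
    """P[N(t) = k] from Eq. (9.34a), with mean lam_t = lambda * t."""
    return (lam_t**k / factorial(k)) * exp(-lam_t)

lam_t = 6.0   # e.g., lambda = 2 events/s observed over t = 3 s
k = 4
# Keep np = lam_t fixed while n grows; the approximation error shrinks.
errors = [abs(binomial_pmf(k, n, lam_t / n) - poisson_pmf(k, lam_t))
          for n in (10, 100, 10000)]
```

The error behaves roughly like $O(1/n)$, so each tenfold increase in $n$ gives a markedly better match.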
For example, for $t_1 < t_2$,

$$P[N(t_1) = i, N(t_2) = j] = P[N(t_1) = i]\,P[N(t_2) - N(t_1) = j - i]$$
$$= P[N(t_1) = i]\,P[N(t_2 - t_1) = j - i]$$
$$= \frac{(\lambda t_1)^i e^{-\lambda t_1}}{i!}\,\frac{(\lambda(t_2 - t_1))^{j-i} e^{-\lambda(t_2 - t_1)}}{(j - i)!}. \tag{9.35a}$$

The independent increments property also allows us to calculate the autocovariance of $N(t)$. For $t_1 \le t_2$:

$$C_N(t_1, t_2) = E[(N(t_1) - \lambda t_1)(N(t_2) - \lambda t_2)]$$
$$= E[(N(t_1) - \lambda t_1)\{N(t_2) - N(t_1) - \lambda t_2 + \lambda t_1 + (N(t_1) - \lambda t_1)\}]$$
$$= E[N(t_1) - \lambda t_1]\,E[N(t_2) - N(t_1) - \lambda(t_2 - t_1)] + \mathrm{VAR}[N(t_1)]$$
$$= \mathrm{VAR}[N(t_1)] = \lambda t_1. \tag{9.35b}$$

Example 9.21

Inquiries arrive at a recorded message device according to a Poisson process of rate 15 inquiries per minute. Find the probability that in a 1-minute period, 3 inquiries arrive during the first 10 seconds and 2 inquiries arrive during the last 15 seconds.

The arrival rate in seconds is $\lambda = 15/60 = 1/4$ inquiries per second. Writing time in seconds, the probability of interest is $P[N(10) = 3 \text{ and } N(60) - N(45) = 2]$. By applying first the independent increments property, and then the stationary increments property, we obtain

$$P[N(10) = 3 \text{ and } N(60) - N(45) = 2] = P[N(10) = 3]\,P[N(60) - N(45) = 2]$$
$$= P[N(10) = 3]\,P[N(60 - 45) = 2] = \frac{(10/4)^3 e^{-10/4}}{3!}\,\frac{(15/4)^2 e^{-15/4}}{2!}.$$

Consider the time $T$ between event occurrences in a Poisson process. Again suppose that the time interval $[0, t]$ is divided into $n$ subintervals of length $\delta = t/n$. The probability that the interevent time $T$ exceeds $t$ seconds is equivalent to no event occurring in $t$ seconds (or in $n$ Bernoulli trials):

$$P[T > t] = P[\text{no events in } t \text{ seconds}] = (1 - p)^n = \left(1 - \frac{\lambda t}{n}\right)^n \to e^{-\lambda t} \quad \text{as } n \to \infty. \tag{9.36}$$

Equation (9.36) implies that $T$ is an exponential random variable with parameter $\lambda$. Since the times between event occurrences in the underlying binomial process are independent geometric random variables, it follows that the sequence of interevent times in a Poisson process is composed of independent random variables.
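The numerical answer to Example 9.21 above can be evaluated directly from the product of the two Poisson pmf's (a short check of ours, not from the text):

```python
from math import exp, factorial

def poisson_pmf(k, mu):
    """P[N = k] for a Poisson random variable with mean mu."""
    return (mu**k / factorial(k)) * exp(-mu)

lam = 15 / 60   # 1/4 inquiries per second
# P[N(10) = 3] * P[N(15) = 2], by independent and stationary increments
prob = poisson_pmf(3, lam * 10) * poisson_pmf(2, lam * 15)
```

The product evaluates to approximately 0.0353.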
We therefore conclude that the interevent times in a Poisson process form an iid sequence of exponential random variables with mean $1/\lambda$.

Another quantity of interest is the time $S_n$ at which the $n$th event occurs in a Poisson process. Let $T_j$ denote the iid exponential interarrival times; then $S_n = T_1 + T_2 + \cdots + T_n$. In Example 7.5, we saw that the sum of $n$ iid exponential random variables has an Erlang distribution. Thus the pdf of $S_n$ is the Erlang pdf:

$$f_{S_n}(y) = \frac{(\lambda y)^{n-1}}{(n - 1)!}\,\lambda e^{-\lambda y} \quad \text{for } y \ge 0. \tag{9.37}$$

Example 9.22

Find the mean and variance of the time until the tenth inquiry in Example 9.21.

The arrival rate is $\lambda = 1/4$ inquiries per second, so the interarrival times are exponential random variables with parameter $\lambda$. From Table 4.1, the mean and variance of the exponential interarrival times are then $1/\lambda$ and $1/\lambda^2$, respectively. The time of the tenth arrival is the sum of ten such iid random variables; thus

$$E[S_{10}] = 10E[T] = \frac{10}{\lambda} = 40 \text{ sec}$$
$$\mathrm{VAR}[S_{10}] = 10\,\mathrm{VAR}[T] = \frac{10}{\lambda^2} = 160 \text{ sec}^2.$$

In applications where the Poisson process models customer interarrival times, it is customary to say that arrivals occur "at random." We now explain what is meant by this statement. Suppose that we are given that only one arrival occurred in an interval $[0, t]$, and we let $X$ be the arrival time of the single customer. For $0 < x < t$, $N(x)$ is the number of events up to time $x$ and $N(t) - N(x)$ is the increment in the interval $(x, t]$; then:

$$P[X \le x] = P[N(x) = 1 \mid N(t) = 1]$$
$$= \frac{P[N(x) = 1 \text{ and } N(t) = 1]}{P[N(t) = 1]}$$
$$= \frac{P[N(x) = 1 \text{ and } N(t) - N(x) = 0]}{P[N(t) = 1]}$$
$$= \frac{P[N(x) = 1]\,P[N(t) - N(x) = 0]}{P[N(t) = 1]}$$
$$= \frac{\lambda x e^{-\lambda x} e^{-\lambda(t - x)}}{\lambda t e^{-\lambda t}} = \frac{x}{t}. \tag{9.38}$$

Equation (9.38) implies that given that one arrival has occurred in the interval $[0, t]$, the customer arrival time is uniformly distributed in the interval $[0, t]$.
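Returning to Example 9.22 above, the Erlang mean and variance can be verified by simulating $S_{10}$ as a sum of ten exponential interarrival times (a Monte Carlo sketch of ours; the seed and sample count are arbitrary choices):

```python
import random

rng = random.Random(7)
lam = 0.25          # inquiries per second
N_TRIALS = 20000

# S_10 = T_1 + ... + T_10, with T_j iid exponential(lam)
samples = [sum(rng.expovariate(lam) for _ in range(10))
           for _ in range(N_TRIALS)]
mean_est = sum(samples) / N_TRIALS
var_est = sum((s - mean_est) ** 2 for s in samples) / N_TRIALS
```

The estimates should be close to the exact values $E[S_{10}] = 40$ sec and $\mathrm{VAR}[S_{10}] = 160$ sec².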
It is in this sense that customer arrival times occur "at random." It can be shown that if the number of arrivals in the interval $[0, t]$ is $k$, then the individual arrival times are distributed independently and uniformly in the interval.

Example 9.23

Suppose two customers arrive at a shop during a two-minute period. Find the probability that both customers arrived during the first minute.

The arrival times of the customers are independent and uniformly distributed in the two-minute interval. Each customer arrives during the first minute with probability 1/2. Thus the probability that both arrive during the first minute is $(1/2)^2 = 1/4$. This answer can be verified by showing that $P[N(1) = 2 \mid N(2) = 2] = 1/4$.

9.4.2 Random Telegraph Signal and Other Processes Derived from the Poisson Process

Many processes are derived from the Poisson process. In this section, we present two examples of such random processes.

Example 9.24 Random Telegraph Signal

Consider a random process $X(t)$ that assumes the values $\pm 1$. Suppose that $X(0) = +1$ or $-1$ with probability 1/2 each, and suppose that $X(t)$ changes polarity with each occurrence of an event in a Poisson process of rate $\alpha$. Figure 9.10 shows a sample function of $X(t)$.

The pmf of $X(t)$ is given by

$$P[X(t) = \pm 1] = P[X(t) = \pm 1 \mid X(0) = 1]\,P[X(0) = 1] + P[X(t) = \pm 1 \mid X(0) = -1]\,P[X(0) = -1]. \tag{9.39}$$

The conditional pmf's are found by noting that $X(t)$ will have the same polarity as $X(0)$ only when an even number of events occur in the interval $(0, t]$. Thus

$$P[X(t) = \pm 1 \mid X(0) = \pm 1] = P[N(t) = \text{even integer}] = \sum_{j=0}^{\infty} \frac{(\alpha t)^{2j}}{(2j)!}\,e^{-\alpha t} = e^{-\alpha t}\,\frac{1}{2}\{e^{\alpha t} + e^{-\alpha t}\} = \frac{1}{2}(1 + e^{-2\alpha t}). \tag{9.40}$$

[Figure 9.10: Sample path of a random telegraph signal. The times between transitions $X_j$ are iid exponential random variables.]
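The verification suggested in Example 9.23 above, $P[N(1) = 2 \mid N(2) = 2] = 1/4$, follows from independent increments and holds for any rate $\lambda$. A short check of ours:

```python
from math import exp, factorial

def poisson_pmf(k, mu):
    """P[N = k] for a Poisson random variable with mean mu."""
    return (mu**k / factorial(k)) * exp(-mu)

def both_in_first_minute(lam):
    """P[N(1) = 2 | N(2) = 2] via independent increments:
    P[N(1) = 2, N(2) - N(1) = 0] / P[N(2) = 2]."""
    joint = poisson_pmf(2, lam * 1) * poisson_pmf(0, lam * 1)
    return joint / poisson_pmf(2, lam * 2)

# The rate cancels out of the conditional probability.
ratios = [both_in_first_minute(lam) for lam in (0.5, 1.0, 7.3)]
```

Algebraically, the ratio is $\frac{(\lambda^2/2)e^{-2\lambda}}{2\lambda^2 e^{-2\lambda}} = \frac{1}{4}$, independent of $\lambda$.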
$X(t)$ and $X(0)$ will differ in sign if the number of events in $t$ is odd:

$$P[X(t) = \pm 1 \mid X(0) = \mp 1] = \sum_{j=0}^{\infty} \frac{(\alpha t)^{2j+1}}{(2j + 1)!}\,e^{-\alpha t} = e^{-\alpha t}\,\frac{1}{2}\{e^{\alpha t} - e^{-\alpha t}\} = \frac{1}{2}(1 - e^{-2\alpha t}). \tag{9.41}$$

We obtain the pmf for $X(t)$ by substituting Eqs. (9.40) and (9.41) into Eq. (9.39):

$$P[X(t) = 1] = \frac{1}{2}\cdot\frac{1}{2}\{1 + e^{-2\alpha t}\} + \frac{1}{2}\cdot\frac{1}{2}\{1 - e^{-2\alpha t}\} = \frac{1}{2}$$
$$P[X(t) = -1] = 1 - P[X(t) = 1] = \frac{1}{2}. \tag{9.42}$$

Thus the random telegraph signal is equally likely to be $\pm 1$ at any time $t > 0$. The mean and variance of $X(t)$ are

$$m_X(t) = 1\cdot P[X(t) = 1] + (-1)P[X(t) = -1] = 0$$
$$\mathrm{VAR}[X(t)] = E[X(t)^2] = (1)^2 P[X(t) = 1] + (-1)^2 P[X(t) = -1] = 1. \tag{9.43}$$

The autocovariance of $X(t)$ is found as follows:

$$C_X(t_1, t_2) = E[X(t_1)X(t_2)] = 1\cdot P[X(t_1) = X(t_2)] + (-1)P[X(t_1) \ne X(t_2)]$$
$$= \frac{1}{2}\{1 + e^{-2\alpha|t_2 - t_1|}\} - \frac{1}{2}\{1 - e^{-2\alpha|t_2 - t_1|}\} = e^{-2\alpha|t_2 - t_1|}. \tag{9.44}$$

Thus time samples of $X(t)$ become less and less correlated as the time between them increases. The Poisson process and the random telegraph process are examples of the continuous-time Markov chain processes that are discussed in Chapter 11.

Example 9.25 Filtered Poisson Impulse Train

The Poisson process is zero at $t = 0$ and increases by one unit at the random arrival times $S_j$, $j = 1, 2, \ldots$. Thus the Poisson process can be expressed as the sum of randomly shifted step functions:

$$N(t) = \sum_{i=1}^{\infty} u(t - S_i), \qquad N(0) = 0,$$

where the $S_i$ are the arrival times. Since the integral of a delta function $\delta(t - S)$ is a step function $u(t - S)$, we can view $N(t)$ as the result of integrating a train of delta functions that occur at times $S_i$, as shown in Fig. 9.11(a):

[Figure 9.11: (a) Poisson process as the integral of a train of delta functions: $Z(t) = \sum_{k=1}^{\infty} \delta(t - S_k)$ and $N(t) = \sum_{k=1}^{\infty} u(t - S_k)$. (b) Filtered train of delta functions: $X(t) = \sum_{k=1}^{\infty} h(t - S_k)$.]

$$Z(t) = \sum_{i=1}^{\infty} \delta(t - S_i).$$
We can obtain other continuous-time processes by replacing the step function by another function $h(t)$,¹ as shown in Fig. 9.11(b):

$$X(t) = \sum_{i=1}^{\infty} h(t - S_i). \tag{9.45}$$

For example, $h(t)$ could represent the current pulse that results when a photoelectron hits a detector. $X(t)$ is then the total current flowing at time $t$. $X(t)$ is called a shot noise process.

¹ This is equivalent to passing $Z(t)$ through a linear system whose response to a delta function is $h(t)$.

The following example shows how the properties of the Poisson process can be used to evaluate averages involving the filtered process.

Example 9.26 Mean of Shot Noise Process

Find the expected value of the shot noise process $X(t)$.

We condition on $N(t)$, the number of impulses that have occurred up to time $t$:

$$E[X(t)] = E[E[X(t) \mid N(t)]].$$

Suppose $N(t) = k$; then

$$E[X(t) \mid N(t) = k] = E\left[\sum_{j=1}^{k} h(t - S_j)\right] = \sum_{j=1}^{k} E[h(t - S_j)].$$

Since the arrival times $S_1, \ldots, S_k$ at which the impulses occurred are independent and uniformly distributed in the interval $[0, t]$,

$$E[h(t - S_j)] = \int_0^t h(t - s)\,\frac{1}{t}\,ds = \frac{1}{t}\int_0^t h(u)\,du.$$

Thus

$$E[X(t) \mid N(t) = k] = \frac{k}{t}\int_0^t h(u)\,du,$$

and

$$E[X(t) \mid N(t)] = \frac{N(t)}{t}\int_0^t h(u)\,du.$$

Finally, we obtain

$$E[X(t)] = E[E[X(t) \mid N(t)]] = \frac{E[N(t)]}{t}\int_0^t h(u)\,du = \lambda\int_0^t h(u)\,du, \tag{9.46}$$

where we used the fact that $E[N(t)] = \lambda t$. Note that $E[X(t)]$ approaches a constant value as $t$ becomes large if the above integral is finite.

9.5 GAUSSIAN RANDOM PROCESSES, WIENER PROCESS, AND BROWNIAN MOTION

In this section we continue the introduction of important random processes. First, we introduce the class of Gaussian random processes, which find many important applications in electrical engineering. We then develop an example of a Gaussian random process: the Wiener random process, which is used to model Brownian motion.
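The shot noise mean of Eq. (9.46) in Example 9.26 above can be checked by direct simulation. The sketch below (our own illustration; the pulse shape $h(u) = e^{-u}$ and all parameters are assumed choices) generates Poisson arrival times as a running sum of exponential interarrival times:

```python
import math
import random

rng = random.Random(3)
LAM, T = 2.0, 5.0    # arrival rate (per second) and observation time

def shot_noise_sample():
    """One realization of X(T) = sum_i h(T - S_i) with h(u) = exp(-u),
    generating the Poisson arrival times S_i within [0, T]."""
    x, s = 0.0, rng.expovariate(LAM)
    while s < T:
        x += math.exp(-(T - s))
        s += rng.expovariate(LAM)
    return x

N_TRIALS = 30000
mean_est = sum(shot_noise_sample() for _ in range(N_TRIALS)) / N_TRIALS
# Eq. (9.46): E[X(t)] = lambda * integral_0^t h(u) du = lambda * (1 - e^{-t})
mean_theory = LAM * (1 - math.exp(-T))
```

As noted in the text, the mean approaches the constant $\lambda\int_0^\infty h(u)\,du = \lambda$ for large $t$ because the integral of $h$ is finite.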
9.5.1 Gaussian Random Processes

A random process $X(t)$ is a Gaussian random process if the samples $X_1 = X(t_1), X_2 = X(t_2), \ldots, X_k = X(t_k)$ are jointly Gaussian random variables for all $k$ and all choices of $t_1, \ldots, t_k$. This definition applies to both discrete-time and continuous-time processes. Recall from Eq. (6.42) that the joint pdf of jointly Gaussian random variables is determined by the vector of means and by the covariance matrix:

$$f_{X_1, X_2, \ldots, X_k}(x_1, x_2, \ldots, x_k) = \frac{e^{-\frac{1}{2}(\mathbf{x} - \mathbf{m})^T\mathbf{K}^{-1}(\mathbf{x} - \mathbf{m})}}{(2\pi)^{k/2}|\mathbf{K}|^{1/2}}. \tag{9.47a}$$

In the case of Gaussian random processes, the mean vector and the covariance matrix are the values of the mean function and covariance function at the corresponding time instants:

$$\mathbf{m} = \begin{bmatrix} m_X(t_1) \\ \vdots \\ m_X(t_k) \end{bmatrix} \qquad \mathbf{K} = \begin{bmatrix} C_X(t_1, t_1) & C_X(t_1, t_2) & \cdots & C_X(t_1, t_k) \\ C_X(t_2, t_1) & C_X(t_2, t_2) & \cdots & C_X(t_2, t_k) \\ \vdots & \vdots & & \vdots \\ C_X(t_k, t_1) & \cdots & & C_X(t_k, t_k) \end{bmatrix}. \tag{9.47b}$$

Gaussian random processes therefore have the very special property that their joint pdf's are completely specified by the mean function of the process $m_X(t)$ and by the covariance function $C_X(t_1, t_2)$.

In Chapter 6 we saw that linear transformations of jointly Gaussian random vectors result in jointly Gaussian random vectors. We will see in Chapter 10 that Gaussian random processes also have the property that linear operations on a Gaussian process (e.g., a sum, derivative, or integral) result in another Gaussian random process. These two properties, combined with the fact that many signal and noise processes are accurately modeled as Gaussian, make Gaussian random processes the most useful model in signal processing.

Example 9.27 iid Discrete-Time Gaussian Random Process

Let the discrete-time random process $X_n$ be a sequence of independent Gaussian random variables with mean $m$ and variance $\sigma^2$.
The covariance matrix for the times $n_1, \ldots, n_k$ is

$$\{C_X(n_i, n_j)\} = \{\sigma^2\delta_{ij}\} = \sigma^2\mathbf{I},$$

where $\delta_{ij} = 1$ when $i = j$ and 0 otherwise, and $\mathbf{I}$ is the identity matrix. Thus the joint pdf for the vector $\mathbf{X}_n = (X_{n_1}, \ldots, X_{n_k})$ is

$$f_{\mathbf{X}_n}(x_1, x_2, \ldots, x_k) = \frac{1}{(2\pi\sigma^2)^{k/2}}\exp\left\{-\sum_{i=1}^{k}(x_i - m)^2/2\sigma^2\right\}.$$

The Gaussian iid random process has the property that the value at every time instant is independent of the value at all other time instants.

Example 9.28 Continuous-Time Gaussian Random Process

Let $X(t)$ be a continuous-time Gaussian random process with mean function and covariance function given by:

$$m_X(t) = 3t \qquad C_X(t_1, t_2) = 9e^{-2|t_1 - t_2|}.$$

Find $P[X(3) < 6]$ and $P[X(1) + X(2) > 15]$.

The sample $X(3)$ has a Gaussian pdf with mean $m_X(3) = 3(3) = 9$ and variance $\sigma^2_{X(3)} = C_X(3, 3) = 9e^{-2|3-3|} = 9$. To calculate $P[X(3) < 6]$ we put $X(3)$ in standard form:

$$P[X(3) < 6] = P\left[\frac{X(3) - 9}{\sqrt{9}} < \frac{6 - 9}{\sqrt{9}}\right] = 1 - Q(-1) = Q(1) = 0.16.$$

From Example 6.24 we know that the sum of two Gaussian random variables is also a Gaussian random variable with mean and variance given by Eq. (6.47). Therefore the mean and variance of $X(1) + X(2)$ are given by:

$$E[X(1) + X(2)] = m_X(1) + m_X(2) = 3 + 6 = 9$$
$$\mathrm{VAR}[X(1) + X(2)] = C_X(1, 1) + C_X(1, 2) + C_X(2, 1) + C_X(2, 2)$$
$$= 9\{e^{-2|1-1|} + e^{-2|2-1|} + e^{-2|1-2|} + e^{-2|2-2|}\} = 9\{2 + 2e^{-2}\} = 20.43.$$

To calculate $P[X(1) + X(2) > 15]$ we put $X(1) + X(2)$ in standard form:

$$P[X(1) + X(2) > 15] = P\left[\frac{X(1) + X(2) - 9}{\sqrt{20.43}} > \frac{15 - 9}{\sqrt{20.43}}\right] = Q(1.327) = 0.0922.$$

9.5.2 Wiener Process and Brownian Motion

We now construct a continuous-time Gaussian random process as a limit of a discrete-time process. Suppose that the symmetric random walk process (i.e., $p = 1/2$) of Example 9.16 takes steps of magnitude $\pm h$ every $\delta$ seconds. We obtain a continuous-time process by letting $X_\delta(t)$ be the accumulated sum of the random step process up to time $t$. $X_\delta(t)$ is a staircase function of time that takes jumps of $\pm h$ every $\delta$ seconds.
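The probabilities computed in Example 9.28 above can be reproduced with the Gaussian tail function $Q(x) = \frac{1}{2}\operatorname{erfc}(x/\sqrt{2})$ (a short check of ours, using only the standard library):

```python
import math

def Q(x):
    """Gaussian tail probability Q(x) = P[Z > x], Z standard normal."""
    return 0.5 * math.erfc(x / math.sqrt(2))

# P[X(3) < 6]: X(3) ~ N(9, 9), so P[Z < -1] = Q(1)
p1 = Q((9 - 6) / math.sqrt(9))

# P[X(1) + X(2) > 15]: mean 9, variance 9*(2 + 2e^{-2})
var = 9 * (2 + 2 * math.exp(-2))
p2 = Q((15 - 9) / math.sqrt(var))
```

This gives $p_1 \approx 0.159$ (the text rounds to 0.16) and $p_2 \approx 0.0922$, matching the example.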
At time $t$, the process will have taken $n = \lfloor t/\delta \rfloor$ jumps, so it is equal to

$$X_\delta(t) = h(D_1 + D_2 + \cdots + D_{\lfloor t/\delta \rfloor}) = hS_n.$$

The mean and variance of $X_\delta(t)$ are

$$E[X_\delta(t)] = hE[S_n] = 0$$
$$\mathrm{VAR}[X_\delta(t)] = h^2 n\,\mathrm{VAR}[D_n] = h^2 n, \tag{9.48}$$

where we used the fact that $\mathrm{VAR}[D_n] = 4p(1 - p) = 1$ since $p = 1/2$.

[Figure 9.12: Four sample functions of the Wiener process.]

Suppose that we take a limit where we simultaneously shrink the size of the jumps and the time between jumps. In particular, let $\delta \to 0$ and $h \to 0$ with $h = \sqrt{\alpha\delta}$, and let $X(t)$ denote the resulting process. $X(t)$ then has mean and variance given by

$$E[X(t)] = 0 \tag{9.49a}$$
$$\mathrm{VAR}[X(t)] = (\sqrt{\alpha\delta})^2(t/\delta) = \alpha t. \tag{9.49b}$$

Thus we obtain a continuous-time process $X(t)$ that begins at the origin, has zero mean for all time, but has a variance that increases linearly with time. Figure 9.12 shows four sample functions of the process. Note the similarities in fluctuations to the realizations of a symmetric random walk in Fig. 9.7(b).

$X(t)$ is called the Wiener random process. It is used to model Brownian motion, the motion of particles suspended in a fluid that move under the rapid and random impact of neighboring particles. As $\delta \to 0$, Eq. (9.48) implies that $X(t)$ approaches the sum of an infinite number of random variables since $n = \lfloor t/\delta \rfloor \to \infty$:

$$X(t) = \lim_{\delta \to 0} hS_n = \lim_{n \to \infty} \sqrt{\alpha t}\,\frac{S_n}{\sqrt{n}}. \tag{9.50}$$

By the central limit theorem the pdf of $X(t)$ therefore approaches that of a Gaussian random variable with mean zero and variance $\alpha t$:

$$f_{X(t)}(x) = \frac{1}{\sqrt{2\pi\alpha t}}\,e^{-x^2/2\alpha t}. \tag{9.51}$$

$X(t)$ inherits the property of independent and stationary increments from the random walk process from which it is derived. As a result, the joint pdf of $X(t)$ at several times $t_1, t_2, \ldots, t_k$ can be obtained by using Eq.
(9.30):

$$f_{X(t_1), \ldots, X(t_k)}(x_1, \ldots, x_k) = f_{X(t_1)}(x_1)\,f_{X(t_2 - t_1)}(x_2 - x_1)\cdots f_{X(t_k - t_{k-1})}(x_k - x_{k-1})$$
$$= \frac{\exp\left\{-\frac{1}{2}\left[\frac{x_1^2}{\alpha t_1} + \frac{(x_2 - x_1)^2}{\alpha(t_2 - t_1)} + \cdots + \frac{(x_k - x_{k-1})^2}{\alpha(t_k - t_{k-1})}\right]\right\}}{\sqrt{(2\pi\alpha)^k t_1(t_2 - t_1)\cdots(t_k - t_{k-1})}}. \tag{9.52}$$

The independent increments property and the same sequence of steps that led to Eq. (9.33) can be used to show that the autocovariance of $X(t)$ is given by

$$C_X(t_1, t_2) = \alpha\min(t_1, t_2) = \alpha t_1 \quad \text{for } t_1 < t_2. \tag{9.53}$$

By comparing Eq. (9.53) and Eq. (9.35b), we see that the Wiener process and the Poisson process have the same covariance function despite the fact that the two processes have very different sample functions. This underscores the fact that the mean and autocovariance functions are only partial descriptions of a random process.

Example 9.29

Show that the Wiener process is a Gaussian random process.

Equation (9.52) shows that the random variables $X(t_1), X(t_2) - X(t_1), X(t_3) - X(t_2), \ldots, X(t_k) - X(t_{k-1})$ are independent Gaussian random variables. The random variables $X(t_1), X(t_2), X(t_3), \ldots, X(t_k)$ can be obtained from $X(t_1)$ and the increments by a linear transformation:

$$X(t_1) = X(t_1)$$
$$X(t_2) = X(t_1) + (X(t_2) - X(t_1))$$
$$X(t_3) = X(t_1) + (X(t_2) - X(t_1)) + (X(t_3) - X(t_2))$$
$$\vdots$$
$$X(t_k) = X(t_1) + (X(t_2) - X(t_1)) + \cdots + (X(t_k) - X(t_{k-1})). \tag{9.54}$$

It then follows (from Eq. 6.45) that $X(t_1), X(t_2), X(t_3), \ldots, X(t_k)$ are jointly Gaussian random variables, and that $X(t)$ is a Gaussian random process.

9.6 STATIONARY RANDOM PROCESSES

Many random processes have the property that the nature of the randomness in the process does not change with time. An observation of the process in the time interval $(t_0, t_1)$ exhibits the same type of random behavior as an observation in some other time interval $(t_0 + \tau, t_1 + \tau)$.
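The Wiener-process construction above, $X_\delta(t) = hS_n$ with $h = \sqrt{\alpha\delta}$, can be simulated directly to confirm that the variance grows as $\alpha t$ (a Monte Carlo sketch of ours; the seed, step size, and trial count are arbitrary):

```python
import random

rng = random.Random(11)
ALPHA, T, DELTA = 2.0, 1.0, 0.001
H = (ALPHA * DELTA) ** 0.5          # step magnitude h = sqrt(alpha * delta)
N_STEPS = int(T / DELTA)

def wiener_sample():
    """X_delta(T): symmetric random walk with steps +/- h every delta seconds."""
    return H * sum(1 if rng.random() < 0.5 else -1 for _ in range(N_STEPS))

N_TRIALS = 2000
xs = [wiener_sample() for _ in range(N_TRIALS)]
mean_est = sum(xs) / N_TRIALS
var_est = sum(x * x for x in xs) / N_TRIALS   # mean is zero by symmetry
```

The estimates should be close to $E[X(T)] = 0$ and $\mathrm{VAR}[X(T)] = \alpha T = 2$, per Eqs. (9.49a)–(9.49b).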
This leads us to postulate that the probabilities of samples of the process do not depend on the instant when we begin taking observations; that is, probabilities involving samples taken at times $t_1, \ldots, t_k$ will not differ from those taken at $t_1 + \tau, \ldots, t_k + \tau$.

Example 9.30 Stationarity and Transience

An urn has 6 white balls, each with the label "0", and 5 black balls, each with the label "1". The following sequence of experiments is performed: a ball is selected and its number noted; the first time a white ball is selected it is not put back in the urn, but otherwise balls are always put back in the urn.

The random process that results from this sequence of experiments clearly has a transient phase and a stationary phase. The transient phase consists of a string of consecutive 1's and it ends with the first occurrence of a "0". During the transient phase $P[I_n = 0] = 6/11$, and the duration of the transient phase is geometrically distributed with mean 11/6. After the first occurrence of a "0", the process enters a "stationary" phase where the process is a binary equiprobable iid sequence. The statistical behavior of the process does not change once the stationary phase is reached.

If we are dealing with random processes that began at $t = -\infty$, then the above condition can be stated precisely as follows. A discrete-time or continuous-time random process $X(t)$ is stationary if the joint distribution of any set of samples does not depend on the placement of the time origin. This means that the joint cdf of $X(t_1), X(t_2), \ldots, X(t_k)$ is the same as that of $X(t_1 + \tau), X(t_2 + \tau), \ldots, X(t_k + \tau)$:

$$F_{X(t_1), \ldots, X(t_k)}(x_1, \ldots, x_k) = F_{X(t_1 + \tau), \ldots, X(t_k + \tau)}(x_1, \ldots, x_k), \tag{9.55}$$

for all time shifts $\tau$, all $k$, and all choices of sample times $t_1, \ldots, t_k$. If a process begins at some definite time (i.e., $n = 0$ or $t = 0$), then we say it is stationary if its joint distributions do not change under time shifts to the right.
Two processes $X(t)$ and $Y(t)$ are said to be jointly stationary if the joint cdf's of $X(t_1), \ldots, X(t_k)$ and $Y(t'_1), \ldots, Y(t'_j)$ do not depend on the placement of the time origin for all $k$ and $j$ and all choices of sampling times $t_1, \ldots, t_k$ and $t'_1, \ldots, t'_j$.

The first-order cdf of a stationary random process must be independent of time, since by Eq. (9.55),

$$F_{X(t)}(x) = F_{X(t + \tau)}(x) = F_X(x) \quad \text{for all } t, \tau. \tag{9.56}$$

This implies that the mean and variance of $X(t)$ are constant and independent of time:

$$m_X(t) = E[X(t)] = m \quad \text{for all } t \tag{9.57}$$
$$\mathrm{VAR}[X(t)] = E[(X(t) - m)^2] = \sigma^2 \quad \text{for all } t. \tag{9.58}$$

The second-order cdf of a stationary random process can depend only on the time difference between the samples and not on the particular time of the samples, since by Eq. (9.55),

$$F_{X(t_1), X(t_2)}(x_1, x_2) = F_{X(0), X(t_2 - t_1)}(x_1, x_2) \quad \text{for all } t_1, t_2. \tag{9.59}$$

This implies that the autocorrelation and the autocovariance of $X(t)$ can depend only on $t_2 - t_1$:

$$R_X(t_1, t_2) = R_X(t_2 - t_1) \quad \text{for all } t_1, t_2 \tag{9.60}$$
$$C_X(t_1, t_2) = C_X(t_2 - t_1) \quad \text{for all } t_1, t_2. \tag{9.61}$$

Example 9.31 iid Random Process

Show that the iid random process is stationary.

The joint cdf for the samples at any $k$ time instants $t_1, \ldots, t_k$ is

$$F_{X(t_1), \ldots, X(t_k)}(x_1, x_2, \ldots, x_k) = F_X(x_1)F_X(x_2)\cdots F_X(x_k) = F_{X(t_1 + \tau), \ldots, X(t_k + \tau)}(x_1, \ldots, x_k)$$

for all $k$, $t_1, \ldots, t_k$. Thus Eq. (9.55) is satisfied, and so the iid random process is stationary.

Example 9.32

Is the sum process a discrete-time stationary process?

The sum process is defined by $S_n = X_1 + X_2 + \cdots + X_n$, where the $X_i$ are an iid sequence. The process has mean and variance

$$m_S(n) = nm \qquad \mathrm{VAR}[S_n] = n\sigma^2,$$

where $m$ and $\sigma^2$ are the mean and variance of the $X_n$. The mean and variance are not constant but grow linearly with the time index $n$. Therefore the sum process cannot be a stationary process.
Example 9.33 Random Telegraph Signal

Show that the random telegraph signal discussed in Example 9.24 is a stationary random process when $P[X(0) = \pm 1] = 1/2$. Show that $X(t)$ settles into stationary behavior as $t \to \infty$ even if $P[X(0) = \pm 1] \ne 1/2$.

We need to show that the following two joint pmf's are equal:

$$P[X(t_1) = a_1, \ldots, X(t_k) = a_k] = P[X(t_1 + \tau) = a_1, \ldots, X(t_k + \tau) = a_k],$$

for any $k$, any $t_1 < \cdots < t_k$, and any $a_j = \pm 1$. The independent increments property of the Poisson process implies that

$$P[X(t_1) = a_1, \ldots, X(t_k) = a_k] = P[X(t_1) = a_1]\,P[X(t_2) = a_2 \mid X(t_1) = a_1]\cdots P[X(t_k) = a_k \mid X(t_{k-1}) = a_{k-1}],$$

since the values of the random telegraph signal at the times $t_1, \ldots, t_k$ are determined by the number of occurrences of events of the Poisson process in the time intervals $(t_j, t_{j+1})$. Similarly,

$$P[X(t_1 + \tau) = a_1, \ldots, X(t_k + \tau) = a_k] = P[X(t_1 + \tau) = a_1]\,P[X(t_2 + \tau) = a_2 \mid X(t_1 + \tau) = a_1]\cdots P[X(t_k + \tau) = a_k \mid X(t_{k-1} + \tau) = a_{k-1}].$$

The corresponding transition probabilities in the previous two equations are equal, since

$$P[X(t_{j+1}) = a_{j+1} \mid X(t_j) = a_j] = \begin{cases} \dfrac{1}{2}\{1 + e^{-2\alpha(t_{j+1} - t_j)}\} & \text{if } a_j = a_{j+1} \\[4pt] \dfrac{1}{2}\{1 - e^{-2\alpha(t_{j+1} - t_j)}\} & \text{if } a_j \ne a_{j+1} \end{cases} = P[X(t_{j+1} + \tau) = a_{j+1} \mid X(t_j + \tau) = a_j].$$

Thus the two joint probabilities differ only in the first term, namely, $P[X(t_1) = a_1]$ and $P[X(t_1 + \tau) = a_1]$. From Example 9.24 we know that if $P[X(0) = \pm 1] = 1/2$ then $P[X(t) = \pm 1] = 1/2$ for all $t$. Thus $P[X(t_1) = a_1] = 1/2 = P[X(t_1 + \tau) = a_1]$, and

$$P[X(t_1) = a_1, \ldots, X(t_k) = a_k] = P[X(t_1 + \tau) = a_1, \ldots, X(t_k + \tau) = a_k].$$

Thus we conclude that the process is stationary when $P[X(0) = \pm 1] = 1/2$.

If $P[X(0) = \pm 1] \ne 1/2$, then the two joint pmf's are not equal because $P[X(t_1) = a_1] \ne P[X(t_1 + \tau) = a_1]$. Let's see what happens if we know that the process started at a specific value, say $X(0) = 1$, that is, $P[X(0) = 1] = 1$. The pmf for $X(t)$ is obtained from Eqs.
(9.39) through (9.41):

$$P[X(t) = a] = P[X(t) = a \mid X(0) = 1] = \begin{cases} \dfrac{1}{2}\{1 + e^{-2\alpha t}\} & \text{if } a = 1 \\[4pt] \dfrac{1}{2}\{1 - e^{-2\alpha t}\} & \text{if } a = -1. \end{cases}$$

For very small $t$, the probability that $X(t) = 1$ is close to 1; but as $t$ increases, the probability that $X(t) = 1$ approaches 1/2. Therefore as $t_1$ becomes large, $P[X(t_1) = a_1] \to 1/2$ and $P[X(t_1 + \tau) = a_1] \to 1/2$, and the two joint pmf's become equal. In other words, the process "forgets" the initial condition and settles down into "steady state," that is, stationary behavior.

9.6.1 Wide-Sense Stationary Random Processes

In many situations we cannot determine whether a random process is stationary, but we can determine whether the mean is a constant:

$$m_X(t) = m \quad \text{for all } t, \tag{9.62}$$

and whether the autocovariance (or equivalently the autocorrelation) is a function of $t_1 - t_2$ only:

$$C_X(t_1, t_2) = C_X(t_1 - t_2) \quad \text{for all } t_1, t_2. \tag{9.63}$$

A discrete-time or continuous-time random process $X(t)$ is wide-sense stationary (WSS) if it satisfies Eqs. (9.62) and (9.63). Similarly, we say that the processes $X(t)$ and $Y(t)$ are jointly wide-sense stationary if they are both wide-sense stationary and if their cross-covariance depends only on $t_1 - t_2$. When $X(t)$ is wide-sense stationary, we write $C_X(t_1, t_2) = C_X(\tau)$ and $R_X(t_1, t_2) = R_X(\tau)$, where $\tau = t_1 - t_2$.

All stationary random processes are wide-sense stationary since they satisfy Eqs. (9.62) and (9.63). The following example shows that some wide-sense stationary processes are not stationary.

Example 9.34

Let $X_n$ consist of two interleaved sequences of independent random variables. For $n$ even, $X_n$ assumes the values $\pm 1$ with probability 1/2; for $n$ odd, $X_n$ assumes the values 1/3 and $-3$ with probabilities 9/10 and 1/10, respectively. $X_n$ is not stationary since its pmf varies with $n$. It is easy to show that $X_n$ has mean $m_X(n) = 0$ for all $n$ and covariance function

$$C_X(i, j) = \begin{cases} E[X_i]E[X_j] = 0 & \text{for } i \ne j \\ E[X_i^2] = 1 & \text{for } i = j. \end{cases}$$

$X_n$ is therefore wide-sense stationary.
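The moment claims in Example 9.34 can be verified by direct computation over the two pmf's (a short check of ours, not from the text):

```python
# pmfs of the interleaved sequences: value -> probability
even_pmf = {1.0: 0.5, -1.0: 0.5}     # n even
odd_pmf = {1 / 3: 0.9, -3.0: 0.1}    # n odd

def moment(pmf, k):
    """k-th moment of a discrete random variable specified by its pmf."""
    return sum((v ** k) * p for v, p in pmf.items())

means = (moment(even_pmf, 1), moment(odd_pmf, 1))     # both 0
powers = (moment(even_pmf, 2), moment(odd_pmf, 2))    # both 1
```

Both pmf's have mean 0 and second moment 1, so the mean and covariance functions agree at every $n$ even though the pmf's themselves differ: WSS, but not stationary.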
We will see in Chapter 10 that the autocorrelation function of wide-sense stationary processes plays a crucial role in the design of linear signal processing algorithms. We now develop several results that enable us to deduce properties of a WSS process from properties of its autocorrelation function.

First, the autocorrelation function at $\tau = 0$ gives the average power (second moment) of the process:

$$R_X(0) = E[X(t)^2] \quad \text{for all } t. \tag{9.64}$$

Second, the autocorrelation function is an even function of $\tau$, since

$$R_X(\tau) = E[X(t + \tau)X(t)] = E[X(t)X(t + \tau)] = R_X(-\tau). \tag{9.65}$$

Third, the autocorrelation function is a measure of the rate of change of a random process in the following sense. Consider the change in the process from time $t$ to $t + \tau$:

$$P[|X(t + \tau) - X(t)| > \varepsilon] = P[(X(t + \tau) - X(t))^2 > \varepsilon^2] \le \frac{E[(X(t + \tau) - X(t))^2]}{\varepsilon^2} = \frac{2\{R_X(0) - R_X(\tau)\}}{\varepsilon^2}, \tag{9.66}$$

where we used the Markov inequality, Eq. (4.75), to obtain the upper bound. Equation (9.66) states that if $R_X(0) - R_X(\tau)$ is small, that is, $R_X(\tau)$ drops off slowly, then the probability of a large change in $X(t)$ in $\tau$ seconds is small.

Fourth, the autocorrelation function is maximum at $\tau = 0$. We use the Cauchy-Schwarz inequality:²

$$E[XY]^2 \le E[X^2]E[Y^2] \tag{9.67}$$

for any two random variables $X$ and $Y$. If we apply this equation to $X(t + \tau)$ and $X(t)$, we obtain

$$R_X(\tau)^2 = E[X(t + \tau)X(t)]^2 \le E[X^2(t + \tau)]E[X^2(t)] = R_X(0)^2.$$

Thus

$$|R_X(\tau)| \le R_X(0). \tag{9.68}$$

² See Problem 5.74 and Appendix C.

Fifth, if $R_X(0) = R_X(d)$, then $R_X(\tau)$ is periodic with period $d$ and $X(t)$ is mean square periodic, that is, $E[(X(t + d) - X(t))^2] = 0$. If we apply Eq. (9.67) to $X(t + \tau + d) - X(t + \tau)$ and $X(t)$, we obtain

$$E[(X(t + \tau + d) - X(t + \tau))X(t)]^2 \le E[(X(t + \tau + d) - X(t + \tau))^2]E[X^2(t)],$$

which implies that

$$\{R_X(\tau + d) - R_X(\tau)\}^2 \le 2\{R_X(0) - R_X(d)\}R_X(0).$$

Thus $R_X(d) = R_X(0)$ implies that the right-hand side of the equation is zero, and thus that $R_X(\tau + d) = R_X(\tau)$ for all $\tau$.
Repeated applications of this result imply that $R_X(\tau)$ is periodic with period $d$. The fact that $X(t)$ is mean square periodic follows from

$$E[(X(t + d) - X(t))^2] = 2\{R_X(0) - R_X(d)\} = 0.$$

Sixth, let $X(t) = m + N(t)$, where $N(t)$ is a zero-mean process for which $R_N(\tau) \to 0$ as $\tau \to \infty$; then

$$R_X(\tau) = E[(m + N(t + \tau))(m + N(t))] = m^2 + 2mE[N(t)] + R_N(\tau) = m^2 + R_N(\tau) \to m^2 \quad \text{as } \tau \to \infty.$$

In other words, $R_X(\tau)$ approaches the square of the mean of $X(t)$ as $\tau \to \infty$.

In summary, the autocorrelation function can have three types of components: (1) a component that approaches zero as $\tau \to \infty$; (2) a periodic component; and (3) a component due to a nonzero mean.

Example 9.35

Figure 9.13 shows several typical autocorrelation functions. Figure 9.13(a) shows the autocorrelation function for the random telegraph signal $X(t)$ (see Eq. (9.44)):

$$R_X(\tau) = e^{-2\alpha|\tau|} \quad \text{for all } \tau.$$

$X(t)$ is zero mean and $R_X(\tau) \to 0$ as $|\tau| \to \infty$. Figure 9.13(b) shows the autocorrelation function for a sinusoid $Y(t)$ with amplitude $a$ and random phase (see Example 9.10):

$$R_Y(\tau) = \frac{a^2}{2}\cos(2\pi f_0\tau) \quad \text{for all } \tau.$$

$Y(t)$ is zero mean and $R_Y(\tau)$ is periodic with period $1/f_0$. Figure 9.13(c) shows the autocorrelation function for the process $Z(t) = X(t) + Y(t) + m$, where $X(t)$ is the random telegraph process, $Y(t)$ is a sinusoid with random phase, and $m$ is a constant. If we assume that $X(t)$ and $Y(t)$ are independent processes, then

$$R_Z(\tau) = E[\{X(t + \tau) + Y(t + \tau) + m\}\{X(t) + Y(t) + m\}] = R_X(\tau) + R_Y(\tau) + m^2.$$

[Figure 9.13: (a) Autocorrelation function of a random telegraph signal, $R_X(\tau) = e^{-2\alpha|\tau|}$. (b) Autocorrelation function of a sinusoid with random phase, $R_Y(\tau) = (a^2/2)\cos 2\pi f_0\tau$. (c) Autocorrelation function of a random process that has a nonzero mean, a periodic component, and a "random" component.]

9.6.2 Wide-Sense Stationary Gaussian Random Processes

If a Gaussian random process is wide-sense stationary, then it is also stationary. Recall from Section 9.5, Eq.
(9.47), that the joint pdf of a Gaussian random process is completely determined by the mean $m_X(t)$ and the autocovariance $C_X(t_1, t_2)$. If $X(t)$ is wide-sense stationary, then its mean is a constant $m$ and its autocovariance depends only on the difference of the sampling times, $t_i - t_j$. It then follows that the joint pdf of $X(t)$ depends only on this set of differences, and hence it is invariant with respect to time shifts. Thus the process is also stationary.

The above result makes WSS Gaussian random processes particularly easy to work with since all the information required to specify the joint pdf is contained in $m$ and $C_X(\tau)$.

Example 9.36 A Gaussian Moving Average Process

Let $X_n$ be an iid sequence of Gaussian random variables with zero mean and variance $\sigma^2$, and let $Y_n$ be the average of two consecutive values of $X_n$:

$$Y_n = \frac{X_n + X_{n-1}}{2}.$$

The mean of $Y_n$ is zero since $E[X_i] = 0$ for all $i$. The covariance is

$$C_Y(i, j) = E[Y_iY_j] = \frac{1}{4}E[(X_i + X_{i-1})(X_j + X_{j-1})]$$
$$= \frac{1}{4}\{E[X_iX_j] + E[X_iX_{j-1}] + E[X_{i-1}X_j] + E[X_{i-1}X_{j-1}]\}$$
$$= \begin{cases} \dfrac{1}{2}\sigma^2 & \text{if } i = j \\[4pt] \dfrac{1}{4}\sigma^2 & \text{if } |i - j| = 1 \\[4pt] 0 & \text{otherwise.} \end{cases}$$

We see that $Y_n$ has a constant mean and a covariance function that depends only on $|i - j|$; thus $Y_n$ is a wide-sense stationary process. $Y_n$ is a Gaussian random variable since it is defined by a linear function of Gaussian random variables (see Section 6.4, Eq. 6.45). Thus the joint pdf of $Y_n$ is given by Eq. (9.47) with a zero mean vector and with the entries of the covariance matrix specified by $C_Y(i, j)$ above.

9.6.3 Cyclostationary Random Processes

Many random processes arise from the repetition of a given procedure every $T$ seconds. For example, a data modulator ("modem") produces a waveform every $T$ seconds according to some input data sequence. In another example, a "time multiplexer" interleaves $n$ separate sequences of information symbols into a single sequence of symbols.
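The covariance structure derived in Example 9.36 above can be checked by simulating a long realization of $Y_n$ and estimating the sample covariance at a few lags (a Monte Carlo sketch of ours; the seed, $\sigma$, and sample size are arbitrary):

```python
import random

rng = random.Random(5)
SIGMA = 2.0
N = 100000

x = [rng.gauss(0.0, SIGMA) for _ in range(N)]
y = [(x[n] + x[n - 1]) / 2 for n in range(1, N)]   # Y_n = (X_n + X_{n-1})/2

def cov_at_lag(seq, lag):
    """Sample covariance C_Y(lag) of a sequence (mean removed)."""
    m = sum(seq) / len(seq)
    pairs = [(seq[i] - m) * (seq[i + lag] - m) for i in range(len(seq) - lag)]
    return sum(pairs) / len(pairs)

c0, c1, c2 = (cov_at_lag(y, k) for k in (0, 1, 2))
```

The estimates should be close to $\sigma^2/2 = 2$ at lag 0, $\sigma^2/4 = 1$ at lag 1, and 0 at lags of 2 or more, matching $C_Y(i, j)$.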
It should not be surprising that the periodic nature of such processes is evident in their probabilistic descriptions. A discrete-time or continuous-time random process X(t) is said to be cyclostationary if the joint cumulative distribution function of any set of samples is invariant with respect to shifts of the origin by integer multiples of some period T. In other words, X(t₁), X(t₂), …, X(t_k) and X(t₁ + mT), X(t₂ + mT), …, X(t_k + mT) have the same joint cdf for all k, all m, and all choices of sampling times t₁, …, t_k:

F_{X(t₁), …, X(t_k)}(x₁, …, x_k) = F_{X(t₁ + mT), …, X(t_k + mT)}(x₁, …, x_k).   (9.69)

We say that X(t) is wide-sense cyclostationary if the mean and autocovariance functions are invariant with respect to shifts in the time origin by integer multiples of T, that is, for every integer m,

m_X(t + mT) = m_X(t)   (9.70a)
C_X(t₁ + mT, t₂ + mT) = C_X(t₁, t₂).   (9.70b)

Note that if X(t) is cyclostationary, then it follows that X(t) is also wide-sense cyclostationary.

Example 9.37

Consider a random amplitude sinusoid with period T: X(t) = A cos(2πt/T). Is X(t) cyclostationary? Wide-sense cyclostationary?

Consider the joint cdf for the time samples t₁, …, t_k:

P[X(t₁) ≤ x₁, X(t₂) ≤ x₂, …, X(t_k) ≤ x_k]
= P[A cos(2πt₁/T) ≤ x₁, …, A cos(2πt_k/T) ≤ x_k]
= P[A cos(2π(t₁ + mT)/T) ≤ x₁, …, A cos(2π(t_k + mT)/T) ≤ x_k]
= P[X(t₁ + mT) ≤ x₁, X(t₂ + mT) ≤ x₂, …, X(t_k + mT) ≤ x_k].

Thus X(t) is a cyclostationary random process and hence also a wide-sense cyclostationary process.

In the above example, the sample functions of the random process are always periodic. The following example shows that, in general, the sample functions of a cyclostationary random process need not be periodic.
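The wide-sense conclusion of the random-amplitude sinusoid example can be checked directly at the level of first and second moments. The short pure-Python sketch below closes the moments in analytic form; the period T = 2 and the moments E[A] = 0.3, E[A²] = 1.5 describe a hypothetical amplitude distribution chosen only for illustration. Shifting both time arguments by a full period mT leaves m_X and C_X unchanged, as Eq. (9.70) requires, while a half-period shift changes the mean, showing the process is not wide-sense stationary.

```python
import math

T = 2.0             # period of the sinusoid (illustrative value)
EA, EA2 = 0.3, 1.5  # E[A] and E[A^2] for a hypothetical amplitude distribution

def mean_X(t):
    # m_X(t) = E[A cos(2*pi*t/T)] = E[A] cos(2*pi*t/T)
    return EA * math.cos(2 * math.pi * t / T)

def C_X(t1, t2):
    # C_X(t1, t2) = VAR[A] cos(2*pi*t1/T) cos(2*pi*t2/T)
    var_A = EA2 - EA ** 2
    return var_A * math.cos(2 * math.pi * t1 / T) * math.cos(2 * math.pi * t2 / T)

# Shifting both arguments by one full period leaves the moments unchanged:
print(mean_X(0.7), mean_X(0.7 + T))
print(C_X(0.4, 1.1), C_X(0.4 + 3 * T, 1.1 + 3 * T))
```

A half-period shift, by contrast, flips the sign of the mean function, so the invariance holds only for integer multiples of T.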
Example 9.38 Pulse Amplitude Modulation

A modem transmits a binary iid equiprobable data sequence as follows: To transmit a binary 1, the modem transmits a rectangular pulse of duration T seconds and amplitude 1; to transmit a binary 0, it transmits a rectangular pulse of duration T seconds and amplitude −1. Let X(t) be the random process that results. Is X(t) wide-sense cyclostationary?

Figure 9.14(a) shows a rectangular pulse of duration T seconds, and Fig. 9.14(b) shows the waveform that results for a particular data sequence. Let Aₙ be the sequence of amplitudes (±1) corresponding to the binary sequence; then X(t) can be represented as the sum of amplitude-modulated, time-shifted rectangular pulses:

X(t) = Σ_{n=−∞}^{∞} Aₙ p(t − nT).   (9.71)

FIGURE 9.14
Pulse amplitude modulation: (a) individual signal pulse p(t) of duration T; (b) waveform corresponding to the data sequence 1001.

The mean of X(t) is

m_X(t) = E[Σ_{n=−∞}^{∞} Aₙ p(t − nT)] = Σ_{n=−∞}^{∞} E[Aₙ] p(t − nT) = 0

since E[Aₙ] = 0. The autocovariance function is

C_X(t₁, t₂) = E[X(t₁)X(t₂)] − 0
= E[X(t₁)²] = 1, if nT ≤ t₁, t₂ < (n + 1)T for some n
= E[X(t₁)]E[X(t₂)] = 0, otherwise.

Figure 9.15 shows the autocovariance function in terms of t₁ and t₂. It is clear that C_X(t₁ + mT, t₂ + mT) = C_X(t₁, t₂) for all integers m. Therefore the process is wide-sense cyclostationary.

We will now show how a stationary random process can be obtained from a cyclostationary process. Let X(t) be a cyclostationary process with period T. We "stationarize" X(t) by observing a randomly phase-shifted version of X(t):

X_s(t) = X(t + Θ),   Θ uniform in [0, T],   (9.72)

FIGURE 9.15
Autocovariance function of the pulse amplitude-modulated random process: C_X(t₁, t₂) = 1 on the squares nT ≤ t₁, t₂ < (n + 1)T, and 0 elsewhere.

where Θ is independent of X(t). X_s(t) can arise when the phase of X(t) is either unknown or not of interest.
If X(t) is a cyclostationary random process, then Xs1t2 is a stationary random process. To show this, we first use conditional expectation to find the joint cdf of Xs1t2: P3Xs1t12 … x1 , Xs1t22 … x2 , Á , Xs1tk2 … xk4 = P3X1t1 + ®2 … x1 , X1t2 + ®2 … x2 , Á , X1tk + ®2 … xk4 T P3X1t1 + ®2 … x1 , Á , X1tk + ®2 … xk | ® = u4f®1u2 du = L0 = 1 P3X1t1 + u2 … x1 , Á , X1tk + u2 … xk4 du. T L0 T (9.73) Equation (9.73) shows that the joint cdf of Xs1t2 is obtained by integrating the joint cdf of X(t) over one time period. It is easy to then show that a time-shifted version of Xs1t2, say Xs1t1 + t2, Xs1t2 + t2, Á , Xs1tk + t2, will have the same joint cdf as Xs1t12, Xs1t22, Á , Xs1tk2 (see Problem 9.80). Therefore Xs1t2 is a stationary random process. By using conditional expectation (see Problem 9.81), it is easy to show that if X(t) is a wide-sense cyclostationary random process, then Xs1t2 is a wide-sense stationary random process, with mean and autocorrelation given by E3Xs1t24 = RXs1t2 = T 1 mx1t2 dt T L0 (9.74a) T 1 R 1t + t, t2 dt. T L0 X (9.74b) Example 9.39 Pulse Amplitude Modulation with Random Phase Shift Let Xs1t2 be the phase-shifted version of the pulse amplitude–modulated waveform X(t) introduced in Example 9.38. Find the mean and autocorrelation function of Xs1t2. Xs1t2 has zero mean since X(t) is zero-mean. The autocorrelation of Xs1t2 is obtained from Eq. (9.74b). From Fig. 9.15, we can see that for 0 6 t + t 6 T, RX1t + t, t2 = 1 and RX1t + t, t2 = 0 otherwise. Therefore: RXs1t2 = for 0 6 t 6 T: for - T 6 t 6 0: 1 T L0 RXs1t2 = Thus Xs1t2 has a triangular autocorrelation function: RXs1t2 = c 1 0 ƒtƒ T ƒtƒ … T ƒ t ƒ 7 T. T-t dt = T T - t ; T 1 T + t dt = . T L- t T Section 9.7 Continuity, Derivatives, and Integrals of Random Processes 537 The variance is then VAR3M1t24 = E3A24 4 4 2pt 2pt - E3A42 2 sin2 sin2 T T p2 p = VAR3A4 2pt 4 sin2 . 
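The triangular autocorrelation obtained for the phase-randomized PAM waveform can also be checked by Monte Carlo simulation. The pure-Python sketch below is illustrative only: the seed, the trial count, the evaluation point t = 0.3, and the helper names est_R and amp are arbitrary choices, not from the text. Each trial draws a uniform phase Θ in [0, T] and iid ±1 pulse amplitudes, then averages the product X_s(t + τ)X_s(t); the result should approach R_Xs(τ) = 1 − |τ|/T for |τ| ≤ T and 0 for |τ| > T.

```python
import math
import random

random.seed(2024)
T = 1.0  # pulse duration (illustrative)

def est_R(tau, trials=100_000, t=0.3):
    """Monte Carlo estimate of R_Xs(tau) = E[Xs(t+tau) Xs(t)] for phase-randomized PAM."""
    acc = 0.0
    for _ in range(trials):
        theta = random.uniform(0.0, T)  # random phase, uniform in [0, T]
        amps = {}                       # lazily drawn iid +/-1 pulse amplitudes

        def amp(u):
            # amplitude of the pulse covering time u
            n = math.floor(u / T)
            if n not in amps:
                amps[n] = random.choice((-1.0, 1.0))
            return amps[n]

        acc += amp(t + theta) * amp(t + tau + theta)
    return acc / trials

print(est_R(0.25))  # should be near 1 - 0.25/T = 0.75
```

The estimate equals the probability that both sample times fall in the same pulse interval, which is exactly the triangular function derived in Example 9.39.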
2 T p Example 9.45 Integral of White Gaussian Noise Let Z(t) be the white Gaussian noise process introduced in Example 9.43. Find the autocorrelation function of X(t), the integral of Z(t) over the interval (0, t). From Example 9.43, the white Gaussian noise process has autocorrelation function RZ1t1 , t22 = ad1t1 - t22. The autocorrelation function of X(t) is then given by RX1t1 , t22 = t1 L0 L0 = a L0 t2 ad1w - v2 dw dv = a min1t1,t22 L0 t2 u1t1 - v2 dv dv = a min1t1 , t22. We thus find that X(t) has the same autocorrelation as the Wiener process. In addition we have that X(t) must be a Gaussian random process since Z(t) is Gaussian. It then follows that X(t) must be the Wiener process because it has the joint pdf given by Eq. (9.52). 9.7.4 Response of a Linear System to Random Input We now apply the results developed in this section to develop the solution of a linear system described by a first-order differential equation. The method can be generalized to higher-order equations. In the next chapter we develop transform methods to solve the general problem. Consider a linear system described by the first-order differential equation: X¿1t2 + aX1t2 = Z1t2 t Ú 0, X102 = 0. (9.93) For example, X(t) may represent the voltage across the capacitor of an RC circuit with current input Z(t). We now show how to obtain mX1t2 and RX1t1 , t22. If the input process Z(t) is Gaussian, then the output process will also be Gaussian. Therefore, in the case of Gaussian input processes, we can then characterize the joint pdf of the output process. 538 Chapter 9 Random Processes We obtain a differential equation for mX1t2 by taking the expected value of Eq. (9.93): œ 1t2 + mX1t2 = mZ1t2 t Ú 0 E3X¿1t24 + E3X1t24 = mX (9.94) with initial condition mX102 = E3X1024 = 0. As an intermediate step we next find a differential equation for RZ,X1t1 , t22. If we multiply Eq. 
(9.93) by Z1t12 and take the expected value, we obtain E3Z1t12X¿1t224 + aE3Z1t12X1t224 = E3Z1t12Z1t224 t2 Ú 0 with initial condition E3Z1t12X1024 = 0 since X102 = 0. The same derivation that led to the cross-correlation between X(t) and X¿1t2 (see Eq. 9.83) can be used to show that 0 R 1t , t 2. 0t2 Z,X 1 2 Thus we obtain the following differential equation: E3Z1t12X¿1t224 = 0 R 1t , t 2 + aRZ,X1t1 , t22 = RZ1t1 , t22 0t2 Z,X 1 2 t2 Ú 0 (9.95) with initial condition RZ,X1t1 , 02 = 0. Finally we obtain a differential equation for RZ1t1 , t22. Multiply Eq. (9.93) by X1t22 and take the expected value: E3X¿1t12X1t224 + aE3X1t12X1t224 = E3Z1t12X1t224 t1 Ú 0 with initial condition E3X102X1t224 = 0. This leads to the differential equation 0 R 1t , t 2 + aRX1t1 , t22 = RZ,X1t1 , t22 0t1 X 1 2 t1 Ú 0 (9.96) with initial condition RZ,X10, t22 = 0. Note that the solution to Eq. (9.95) appears as the forcing function in Eq. (9.96). Thus we conclude that by solving the differential equations in Eqs. (9.94), (9.95), and (9.96) we obtain the mean and autocorrelation function for X(t). Example 9.46 Ornstein-Uhlenbeck Process Equation (9.93) with the input given by a zero-mean, white Gaussian noise process is called the Langevin equation, after the scientist who formulated it in 1908 to describe the Brownian motion of a free particle. In this formulation X(t) represents the velocity of the particle, so that Eq. (9.93) results from equating the acceleration of the particle X¿1t2 to the force on the particle due to friction -aX1t2 and the force due to random collisions Z(t). We present the solution developed by Uhlenbeck and Ornstein in 1930. First, we note that since the input process Z(t) is Gaussian, the output process X(t) will also be a Gaussian random process. Next we recall that the first-order differential equation x¿1t2 + ax1t2 = g1t2 t Ú 0, x102 = 0 Section 9.7 Continuity, Derivatives, and Integrals of Random Processes 539 has solution L0 Therefore the solution to Eq. 
(9.94) is t x1t2 = mX1t2 = e -a1t - t2g1t2 dt L0 t t Ú 0. e -a1t - t2mZ1t2 dt = 0. The autocorrelation of the white Gaussian noise process is RZ1t1 , t22 = s2d1t1 - t22. Equation (9.95) is also a first-order differential equation, and it has solution RZ,X1t1 , t22 = = L0 t2 L0 = b e -a1t2 - t2RZ1t1 , t2 dt t2 e -a1t2 - t2s2d1t1 - t2 dt 0 s2e -a1t2 - t12 0 … t2 6 t1 t2 Ú t1 = s2e -a1t2 - t12u1t2 - t12, where u(x) is the unit step function. The autocorrelation function of the output process X(t) is the solution to the first-order differential equation Eq. (9.96). The solution is given by RX1t1 , t22 = L0 = s2 = s2 = t1 e -a1t1 - t2RZ,X1t, t22 dt L0 L0 t1 e -a1t1 - t2e -a1t2 - t2u1t2 - t2 dt min1t1, t22 e -a1t1 - t2e -a1t2 - t2 dt s2 -a|t1 - t2| 1e - e -a1t1 + t222 2a t1 Ú 0, t2 Ú 0. (9.97a) A Gaussian random process with this autocorrelation function is called an Ornstein-Uhlenbeck process. Thus we conclude that the output process X(t) is an Ornstein-Uhlenbeck process. If we let t1 = t and t2 = t + t, then as t approaches infinity, RX1t + t, t2 : s2 -aƒtƒ e . 2a (9.97b) This shows that the effect of the zero initial condition dies out as time progresses, and the process becomes wide-sense stationary. Since the process is Gaussian, this also implies that the process becomes strict-sense stationary. 540 9.8 Chapter 9 Random Processes TIME AVERAGES OF RANDOM PROCESSES AND ERGODIC THEOREMS At some point, the parameters of a random process must be obtained through measurement. The results from Chapter 7 and the statistical methods of Chapter 8 suggest that we repeat the random experiment that gives rise to the random process a large number of times and take the arithmetic average of the quantities of interest. 
For example, to estimate the mean m_X(t) of a random process X(t, ζ), we repeat the random experiment and take the following average:

m̂_X(t) = (1/N) Σ_{i=1}^{N} X(t, ζᵢ),   (9.98)

where N is the number of repetitions of the experiment, and X(t, ζᵢ) is the realization observed in the ith repetition. In some situations, we are interested in estimating the mean or autocorrelation functions from the time average of a single realization, that is,

⟨X(t)⟩_T = (1/2T) ∫_{−T}^{T} X(t, ζ) dt.   (9.99)

An ergodic theorem states conditions under which a time average converges as the observation interval becomes large. In this section, we are interested in ergodic theorems that state when time averages converge to the ensemble average (expected value).

The strong law of large numbers, presented in Chapter 7, is one of the most important ergodic theorems. It states that if Xₙ is an iid discrete-time random process with finite mean E[Xₙ] = m, then the time average of the samples converges to the ensemble average with probability one:

P[ lim_{n→∞} (1/n) Σ_{i=1}^{n} Xᵢ = m ] = 1.   (9.100)

This result allows us to estimate m by taking the time average of a single realization of the process. We are interested in obtaining results of this type for a larger class of random processes, that is, for non-iid discrete-time random processes, and for continuous-time random processes. The following example shows that, in general, time averages do not converge to ensemble averages.

Example 9.47

Let X(t) = A for all t, where A is a zero-mean, unit-variance random variable. Find the limiting value of the time average.

The mean of the process is m_X(t) = E[X(t)] = E[A] = 0. However, Eq. (9.99) gives

⟨X(t)⟩_T = (1/2T) ∫_{−T}^{T} A dt = A.

Thus the time-average mean does not always converge to m_X(t) = 0. Note that this process is stationary. Thus this example shows that stationary processes need not be ergodic.

Consider the estimate given by Eq.
(9.99) for E3X1t24 = mX1t2. The estimate yields a single number, so obviously it only makes sense to consider processes for which mX1t2 = m, a constant. We now develop an ergodic theorem for the time average of wide-sense stationary processes. Let X(t) be a WSS process. The expected value of 8X1t29T is T T 1 1 E38X1t29T4 = E B X1t2 dt R = E3X1t24 dt = m. 2T L-T 2T L-T (9.101) Equation (9.101) states that 8X1t29T is an unbiased estimator for m. Consider the variance of 8X1t29T: VAR38X1t29T4 = E318X1t29T - m224 T T 1 1 = EB b 1X1t2 - m2 dt r b 1X1t¿2 - m2 dt¿ r R 2T L-T 2T L-T T T T T 1 E31X1t2 - m21X1t¿2 - m24 dt dt¿ = 4T2 L-T L-T = 1 CX1t, t¿2 dt dt¿. 4T2 L-T L-T (9.102) Since the process X(t) is WSS, Eq. (9.102) becomes VAR38X1t29T4 = T T 1 CX1t - t¿2 dt dt¿ . 4T2 L-T L-T (9.103) Figure 9.17 shows the region of integration for this integral. The integrand is constant along the line u = t - t¿ for -2T 6 u 6 2T, so we can evaluate the integral as the t ⫺2T ⫽ t ⫺ t 0 ⫽ t ⫺ t ⫺T T u ⫽ t ⫺ t ⫺T FIGURE 9.17 Region of integration for integral in Eq. (9.102). t u ⫹ du ⫽ t ⫺ t 2T ⫽ t ⫺ t 542 Chapter 9 Random Processes sums of infinitesimal strips as shown in the figure. It can be shown that each strip has area 12T - ƒ u ƒ 2 du, so the contribution of each strip to the integral is 12T - ƒ u ƒ 2CX1u2 du. Thus 2T VAR38X1t29T4 = 1 12T - ƒ u ƒ 2CX1u2 du 4T2 L-2T = 2T ƒuƒ 1 bC 1u2 du. a1 2T L-2T 2T X (9.104) Therefore, 8X1t29T will approach m in the mean square sense, that is, E318X1t29T m224 : 0, if the expression in Eq. (9.104) approaches zero with increasing T. We have just proved the following ergodic theorem. Theorem Let X(t) be a WSS process with mX1t2 = m, then lim 8X1t29 T = m T: q in the mean square sense, if and only if 2T ƒuƒ 1 a1 bCX1u2 du = 0. T : q 2T L 2T -2T lim In keeping with engineering usage, we say that a WSS process is mean ergodic if it satisfies the conditions of the above theorem. 
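The condition in the theorem can be explored numerically for the exponential covariance C_X(u) = e^(−2α|u|) treated in the next example. The pure-Python sketch below (the function name, α = 1, and the midpoint-rule step count are illustrative choices) evaluates the variance expression of Eq. (9.104) and confirms that it decays toward zero as the observation window T grows, so the time average converges to the mean in the mean square sense.

```python
import math

def var_time_average(T, alpha=1.0, steps=20_000):
    # Midpoint-rule evaluation of Eq. (9.104):
    #   VAR[<X(t)>_T] = (1/2T) * integral_{-2T}^{2T} (1 - |u|/2T) e^{-2*alpha*|u|} du.
    # The integrand is even, so integrate over [0, 2T] and double.
    h = 2 * T / steps
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * h
        total += (1.0 - u / (2 * T)) * math.exp(-2 * alpha * u)
    return 2.0 * total * h / (2 * T)

print(var_time_average(10.0))   # roughly 1/(2*alpha*T), small
print(var_time_average(100.0))  # an order of magnitude smaller
```

Doubling the window roughly halves the variance, which matches the 1/(2αT) bound derived for the random telegraph process below.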
The above theorem can be used to obtain ergodic theorems for the time average of other quantities. For example, if we replace X(t) with Y1t + t2Y1t2 in Eq. (9.99), we obtain a time-average estimate for the autocorrelation function of the process Y(t): 8Y1t + t2Y1t29T = T 1 Y1t + t2Y1t2 dt. 2T L-T (9.105) It is easily shown that E38Y1t + t2Y1t29T4 = RY1t2 if Y(t) is WSS.The above ergodic theorem then implies that the time-average autocorrelation converges to RY1t2 in the mean square sense if the term in Eq. (9.104) with X(t) replaced by Y1t2Y1t + t2 converges to zero. Example 9.48 Is the random telegraph process mean ergodic? The covariance function for the random telegraph process is CX1t2 = e -2aƒtƒ, so the variance of 8X1t29T is 2T 2 u VAR38X1t29T4 = a1 be -2au du 2T L0 2T 2T 6 1 1 - e -4aT e -2au du = . T L0 2aT The bound approaches zero as T : q , so VAR38X1t29T4 : 0. Therefore the process is mean ergodic. Section 9.8 Time Averages of Random Processes and Ergodic Theorems 543 If the random process under consideration is discrete-time, then the time-average estimate for the mean and the autocorrelation functions of Xn are given by 8Xn9T = 8Xn + kXn9T = T 1 Xn 2T + 1 n a = -T (9.106) T 1 Xn + kXn . 2T + 1 n a = -T (9.107) If Xn is a WSS random process, then E38Xn9T4 = m, and so 8Xn9T is an unbiased estimate for m. It is easy to show that the variance of 8Xn9T is 2T ƒkƒ 1 a1 bCX1k2. a 2T + 1 k = -2T 2T + 1 VAR38Xn9T4 = (9.108) Therefore, 8Xn9T approaches m in the mean square sense and is mean ergodic if the expression in Eq. (9.108) approaches zero with increasing T. Example 9.49 Ergodicity and Exponential Correlation Let Xn be a wide-sense stationary discrete-time process with mean m and covariance function CX1k2 = s2r-ƒkƒ, for ƒ r ƒ 6 1 and k = 0, ;1, +2, Á . Show that Xn is mean ergodic. The variance of the sample mean (Eq. 9.106) is: VAR[8Xn9T = 2T ƒkƒ 1 a1 bs2rƒkƒ a 2T + 1 k = -2T 2T + 1 1 2s2 2 s2rk = . 
a 2T + 1 k = 0 2T + 1 1 - r q 6 The bound on the right-hand side approaches zero as T increases and so Xn is mean ergodic. Example 9.50 Ergodicity of Self-Similar Process and Long-Range Dependence Let Xn be a wide-sense stationary discrete-time process with mean m and covariance function s2 5 ƒ k + 1 ƒ 2H - 2 ƒ k ƒ 2H + ƒ k - 1 ƒ 2H6 2 CX1k2 = (9.109) for 1/2 6 H 6 1 and k = 0, ;1, +2, Á Xn is said to be second-order self-similar. We will investigate the ergodicity of Xn . We rewrite the variance of the sample mean in (Eq. 9.106) as follows: VAR38Xn9T4 = = a 12T + 1 - ƒ k ƒ 2CX1k2 2T 1 12T + 12 2 k = -2T 1 512T + 12CX102 + 212TCX1122 + Á + 2CX12T26. 12T + 122 544 Chapter 9 Random Processes It is easy to show (See Problem 9.132) that the sum inside the braces is s212T + 122H. Therefore the variance becomes: VAR38Xn9T4 = 1 s2 12T + 122H = s2 12T + 122H - 2. 12T + 122 (9.110) The value of H, which is called the Hurst parameter, affects the convergence behavior of the sample mean. Note that if H = 1/2, the covariance function becomes CX1k2 = 1/2s2dk which corresponds to an iid sequence. In this case, the variance becomes s2/12T + 12 which is the convergence rate of the sample mean for iid samples. However, for H 7 1/2, the variance becomes: s2 (9.111) 12T + 122H - 1, 2T + 1 so the convergence of the sample mean is slower by a factor of 12T + 122H - 1 than for iid samples. The slower convergence of the sample mean when H 7 1/2 results from the long-range dependence of Xn . It can be shown that for large k, the covariance function is approximately given by: VAR38Xn9T4 = CX1k2 = s2H12H - 12k2H - 2. (9.112) a For 1/2 6 H 6 1, C1k2 decays as 1/k where 0 6 a 6 1, which is a very slow decay rate. Thus the dependence between values of Xn decreases slowly and the process is said to have a long memory or long-range dependence. 
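The closed-form variance of the sample mean for a second-order self-similar process can be verified numerically. The pure-Python sketch below (function names are illustrative; σ² = 1) evaluates Eq. (9.108) directly with the covariance of Eq. (9.109) and checks that the result equals σ²(2T + 1)^(2H−2); it also checks that H = 1/2 recovers the iid rate σ²/(2T + 1).

```python
def C(k, H, s2=1.0):
    # Covariance of a second-order self-similar process, Eq. (9.109)
    return (s2 / 2) * (abs(k + 1) ** (2 * H)
                       - 2 * abs(k) ** (2 * H)
                       + abs(k - 1) ** (2 * H))

def var_sample_mean(T, H, s2=1.0):
    # Variance of the (2T+1)-point sample mean, Eq. (9.108)
    n = 2 * T + 1
    return sum((1 - abs(k) / n) * C(k, H, s2) for k in range(-(n - 1), n)) / n

# Long-range dependent case H = 0.8: variance decays as n^(2H-2) = n^(-0.4)
print(var_sample_mean(50, 0.8), 101 ** (2 * 0.8 - 2))
# H = 0.5 reduces to the iid rate 1/n
print(var_sample_mean(50, 0.5), 1 / 101)
```

The agreement reflects the telescoping identity behind Eq. (9.110): the double sum of C(k) over a block of n samples equals σ²n^(2H) exactly.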
*9.9 FOURIER SERIES AND KARHUNEN-LOEVE EXPANSION Let X(t) be a wide-sense stationary, mean square periodic random process with period T, that is, E31X1t + T2 - X1t2224 = 0. In order to simplify the development, we assume that X(t) is zero mean. We show that X(t) can be represented in a mean square sense by a Fourier series: X1t2 = a Xkej2pkt/T, q q (9.113) k=- where the coefficients are random variables defined by T Xk = 1 X1t¿2e -j2pkt¿/T dt¿. T L0 (9.114) Equation (9.114) implies that, in general, the coefficients are complex-valued random variables. For complex-valued random variables, the correlation between two random variables X and Y is defined by E3XY*4. We also show that the coefficients are orthogonal random variables, that is, E3XkX…m4 = 0 for k Z m. Recall that if X(t) is mean square periodic, then RX1t2 is a periodic function in t with period T. Therefore, it can be expanded in a Fourier series: RX1t2 = a akej2pkt/T, q q (9.115) k=- where the coefficients ak are given by T ak = 1 R 1t¿2e -j2pkt¿/T dt¿. T L0 X (9.116) Section 9.9 Fourier Series and Karhunen-Loeve Expansion 545 The coefficients ak appear in the following derivation. First, we show that the coefficients in Eq. (9.113) are orthogonal random variables, that is, E3XkX…m4 = 0: E3XkX…m4 = E B Xk T 1 X*1t¿2ej2pmt¿/T dt¿ R T L0 T = 1 E3XkX*1t¿24ej2pmt¿/T dt¿. T L0 The integrand of the above equation has T E3XkX*1t24 = E B 1 X1u2e -j2pku/T du X*1t2 R T L0 T = 1 R 1u - t2e -j2pku/T du T L0 X T-t = b 1 T L-t RX1v2e -j2pkv/T dv r e -j2pkt/T = ake -j2pkt/T, where we have used the fact that the Fourier coefficients can be calculated over any full period. Therefore E3XkX…m4 = T 1 a e -j2pkt¿/Tej2pmt¿/T dt¿ = akdk,m , T L0 k (9.117) where dk,m is the Kronecker delta function. Thus Xk and Xm are orthogonal random variables. Note that the above equation implies that ak = E C ƒ Xk ƒ 2 D , that is, the ak are real-valued. 
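As a concrete check of the coefficients a_k, consider the random-phase sinusoid of Example 9.10, whose autocorrelation R_X(τ) = (a²/2) cos(2πτ/T) is mean square periodic. Its only nonzero Fourier coefficients should be a_{±1} = a²/4. The pure-Python sketch below (amplitude a = 2 and the step count are illustrative choices) evaluates Eq. (9.116) by the midpoint rule:

```python
import cmath
import math

T, amp = 1.0, 2.0  # period and sinusoid amplitude (illustrative)

def R_X(tau):
    # autocorrelation of a sinusoid with random phase (Example 9.10)
    return (amp ** 2 / 2) * math.cos(2 * math.pi * tau / T)

def a_k(k, steps=20_000):
    # a_k = (1/T) * integral_0^T R_X(tau) e^{-j 2*pi*k*tau/T} d tau, Eq. (9.116)
    h = T / steps
    s = 0j
    for i in range(steps):
        tau = (i + 0.5) * h
        s += R_X(tau) * cmath.exp(-2j * math.pi * k * tau / T)
    return s * h / T

print(abs(a_k(1)))  # should be near amp^2/4 = 1.0
print(abs(a_k(0)), abs(a_k(2)))  # should be near 0
```

Only the k = ±1 coefficients survive, consistent with a_k = E[|X_k|²] ≥ 0 and with R_X(0) = Σ a_k = a²/2.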
To show that the Fourier series equals X(t) in the mean square sense, we take E B ` X1t2 - a Xkej2pkt/T ` R q q 2 k=- = E3 ƒ X1t2 ƒ 24 - E B X1t2 a X…ke -j2pkt/T R q q k=- - E B X*1t2 a Xkej2pkt/T R + E B a a XkX…mej2p1k - m2t/T R q q q q k=- q q k=- m=- = RX102 - a ak - a a…k + a ak . q q q q q q k=- k=- k=- The above equation equals zero, since the ak are real and since RX102 = ©ak from Eq. (9.115). If X(t) is a wide-sense stationary random process that is not mean square periodic, we can still expand X(t) in the Fourier series in an arbitrary interval [0, T]. Mean square equality will hold only inside the interval. Outside the interval, the expansion repeats 546 Chapter 9 Random Processes itself with period T. The Fourier coefficients will no longer be orthogonal; instead they are given by E3XkX…m4 = T T 1 RX1t - u2e -j2pkt/Tej2pmu/T dt du. T2 L0 L0 (9.118) It is easy to show that if X(t) is mean square periodic, then this equation reduces to Eq. (9.117). 9.9.1 Karhunen-Loeve Expansion In this section we present the Karhunen-Loeve expansion, which allows us to expand a (possibly nonstationary) random process X(t) in a series: X1t2 = a Xkfk1t2 q 0 … t … T, (9.119a) k=1 where T X1t2f…k1t2 dt, (9.119b) L0 where the equality in Eq. (9.119a) is in the mean square sense, where the coefficients 5Xk6 are orthogonal random variables, and where the functions 5fk1t26 are orthonormal: Xk = L0 T fi1t2fj1t2 dt = di,j for all i, j. In other words, the Karhunen-Loeve expansion provides us with many of the nice properties of the Fourier series for the case where X(t) is not mean square periodic. For simplicity, we again assume that X(t) is zero mean. In order to motivate the Karhunen-Loeve expansion, we review the KarhunenLoeve transform for vector random variables as introduced in Section 6.3. Let X be a zero-mean, vector random variable with covariance matrix KX . The eigenvalues and eigenvectors of KX are obtained from KXei = liei , (9.120) where the ei are column vectors. 
The set of normalized eigenvectors are orthonormal, that is, eTi ej = di, j . Define the matrix P of eigenvectors and ¶ of eigenvalues as P = 3e1 , e2 , Á , en4 ¶ = diag3li4, then KX l1 0 = P¶PT = 3e1 , e2 , Á , en4D Á 0 0 l2 Á 0 Á Á Á Á 0 eT1 0 eT TD 2T Á o ln eTn Section 9.9 Fourier Series and Karhunen-Loeve Expansion 547 eT1 eT2 = 3l1e1 , l2e2 , Á , lnen4D T o eTn = a lieieTi . n (9.121a) k=1 Therefore we find that the covariance matrix can be expanded as a weighted sum of matrices, ei eTi . In addition, if we let Y = PTX, then the random variables in Y are orthogonal. Furthermore, since PPT = I, then Y1 n Y2 X = PY = 3e1 , e2 , Á , en4D T = a Ykek . o k=1 Yn (9.121b) Thus we see that the arbitrary vector random variable X can be expanded as a weighted sum of the eigenvectors of KX , where the coefficients are orthogonal random variables. Furthermore the eigenvectors form an orthonormal set. These are exactly the properties we seek in the Karhunen-Loeve expansion for X(t). If the vector random variable X is jointly Gaussian, then the components of Y are independent random variables. This results in tremendous simplification in a wide variety of problems. In analogy to Eq. (9.120), we begin by considering the following eigenvalue equation: T (9.122) KX1t1 , t22fk1t22 dt2 = lkfk1t12 0 … t1 … T. L0 The values lk and the corresponding functions fk1t2 for which the above equation holds are called the eigenvalues and eigenfunctions of the covariance function KX1t1 , t22. Note that it is possible for the eigenfunctions to be complex-valued, e.g., complex exponentials. It can be shown that if KX1t1 , t22 is continuous, then the normalized eigenfunctions form an orthonormal set and satisfy Mercer’s theorem: KX1t1 , t22 = a lkfk1t12f…k1t22. q (9.123) k=1 Note the correspondence between Eq. (9.121) and Eq. (9.123). Equation (9.123) in turn implies that KX1t, t2 = a lk ƒ fk1t2 ƒ 2. q (9.124) k=1 We are now ready to show that the equality in Eq. 
(9.119a) holds in the mean square sense and that the coefficients Xk are orthogonal random variables. First consider E3XkX…m4: E3XkX…m4 = E B X…m L0 T X1t¿2f…k1t2 dt¿ R = L0 T E3X1t¿2X…m4f…k1t¿2 dt¿. 548 Chapter 9 Random Processes The integrand of the above equation has T X*1u2fm1u2 du R = E3X1t2X…m4 = E B X1t2 L0 = lmfm1t2. Therefore L0 T KX1t, u2fm1u2 du T E3XkX…m4 = lmf…k1t¿2fm1t¿2 dt¿ = lkdk,m , L0 where dk,m is the Kronecker delta function. Thus Xk and Xm are orthogonal random variables. Note that the above equation implies that lk = E C ƒ Xk ƒ 2 D , that is, the eigenvalues are real-valued. To show that the Karhunen-Loeve expansion equals X(t) in the mean square sense, we take E B ` X1t2 - a Xkfk1t2 ` R q q 2 k=- = E3 ƒ X1t2 ƒ 24 - E B X1t2 a X…kf…k1t2 R q q k=- - E B X*1t2 a Xkfk1t2 R q q k=- + E B a a XkX…mfk1t2f…m1t2 R q q q q k=- m=q = RX1t, t2 - a lk ƒ fk1t2 ƒ 2 q k=- - a l…k ƒ fk1t2 ƒ 2 + a lk ƒ fk1t2 ƒ 2. q q q q k=- k=- The above equation equals zero from Eq. (9.124) and from the fact that the lk are real. Thus we have shown that Eq. (9.119a) holds in the mean square sense. Finally, we note that in the important case where X(t) is a Gaussian random process, then the components Xk will be independent Gaussian random variables.This result is extremely useful in solving certain signal detection and estimation problems. [Van Trees.] Example 9.51 Wiener Process Find the Karhunen-Loeve expansion for the Wiener process. Equation (9.122) for the Wiener process gives, for 0 … t1 … T, lf1t12 = L0 = s2 T s2 min1t1 , t22f1t22 dt2 L0 t1 t2f1t22 dt2 + s2 Lt1 T t1f1t22 dt2 . Section 9.9 Fourier Series and Karhunen-Loeve Expansion 549 We differentiate the above integral equation once with respect to t1 to obtain an integral equation and again to obtain a differential equation: s2 Lt1 T f1t22 dt2 = l d f1t12 dt1 l d2 f1t12. s2 dt21 -f1t12 = This second-order differential equation has a sinusoidal solution: f1t12 = a sin . 
2l 2l In order to solve the above equation for a, b, and l, we need boundary conditions for the differential equation. We obtain these by substituting the general solution for f1t2 into the integral equation: t1 T st1 st1 l a sin + b cos t2f1t22 dt2 + t1f1t22 dt2 . ≤ = 2¢ s L0 Lt1 2l 2l st1 + b cos st1 As t1 approaches zero, the right-hand side approaches zero. This implies that b = 0 in the lefthand side of the equation. A second boundary condition is obtained by letting t1 approach T in the equation obtained after the first differentiation of the integral equation: 0 = l sa d sT f1T2 = cos . dt1 2l 2l This implies that = an - 2l Therefore the eigenvalues are given by sT 1 bp 2 s2T2 1 2 an - b p2 2 ln = n = 1, 2, Á . n = 1, 2, . Á The normalization requirement implies that 1 = L0 T ¢ a sin 2l st 2 T 2 ≤ dt = a2 , which implies that a = 12/T21/2. Thus the eigenfunctions are given by fn1t2 = 1 p 2 sinan - b t AT 2 T 0 … t … T, and the Karhunen-Loeve expansion for the Wiener process is 2 1 p X1t2 = a Xn sinan - b t T 2 T A n-1 q 0 … t 6 T, where the Xn are zero-mean, independent Gaussian random variables with variance given by ln . 550 Chapter 9 Random Processes Example 9.52 White Gaussian Noise Process Find the Karhunen-Loeve expansion of the white Gaussian noise process. The white Gaussian noise process is the derivative of the Wiener process. If we take the derivative of the Karhunen-Loeve expansion of the Wiener process, we obtain X¿1t2 = a q n = 1 2l s Xn 2 1 p cosan - b t 2 T AT 1 p 2 cosan - b t = a Wn T 2 T A n=1 q 0 … t 6 T, where the Wn are independent Gaussian random variables with the same variance s2. This implies that the process has infinite power, a fact we had already found about the white Gaussian noise process. In the Problems we will see that any orthonormal set of eigenfunctions can be used in the Karhunen-Loeve expansion for white Gaussian noise. 
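The orthonormality of the eigenfunctions found in Example 9.51 can be confirmed numerically. The pure-Python sketch below (function names and step count are illustrative) evaluates the inner products ∫₀^T φₙ(t)φₘ(t) dt for the Wiener-process eigenfunctions φₙ(t) = √(2/T) sin((n − 1/2)πt/T) by the midpoint rule:

```python
import math

def phi(n, t, T=1.0):
    # Karhunen-Loeve eigenfunctions of the Wiener process on [0, T] (Example 9.51)
    return math.sqrt(2.0 / T) * math.sin((n - 0.5) * math.pi * t / T)

def inner(n, m, T=1.0, steps=50_000):
    # midpoint-rule approximation of integral_0^T phi_n(t) phi_m(t) dt
    h = T / steps
    return h * sum(phi(n, (k + 0.5) * h, T) * phi(m, (k + 0.5) * h, T)
                   for k in range(steps))

print(inner(1, 1), inner(3, 3))  # should be near 1 (normalization)
print(inner(1, 2), inner(2, 5))  # should be near 0 (orthogonality)
```

The diagonal inner products return 1 and the off-diagonal ones return 0, as the orthonormal-set property requires.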
9.10 GENERATING RANDOM PROCESSES

Many engineering systems involve random processes that interact in complex ways. It is not always possible to model these systems precisely using analytical methods. In such situations computer simulation methods are used to investigate the system dynamics and to measure the performance parameters of interest. In this section we consider two basic methods for generating random processes. The first approach generates the sum process of an iid sequence of random variables. We saw that this approach can be used to generate the binomial and random walk processes, and, through limiting procedures, the Wiener and Poisson processes. The second approach takes a linear combination of deterministic functions of time in which the coefficients are random variables. The Fourier series and Karhunen-Loeve expansions use this approach. Real systems, e.g., digital modulation systems, also generate random processes in this manner.

9.10.1 Generating Sum Random Processes

The generation of sample functions of the sum random process involves two steps:

1. Generate a sequence of iid random variables that drive the sum process.
2. Generate the cumulative sum of the iid sequence.

Let D be an array of samples of the desired iid random variables. The function cumsum(D) in Octave and MATLAB then provides the cumulative sum, that is, the sum process that results from the sequence in D. The code below generates m realizations of an n-step random walk process.

> p=1/2
> n=1000
> m=4
> V=-1:2:1;
> P=[1-p,p];
> D=discrete_rnd(V, P, m, n);
> X=cumsum(D);
> plot(X)

Figures 9.7(a) and 9.7(b) in Section 9.3 show four sample functions of the symmetric random walk process for p = 1/2. The sample functions vary over a wide range of positive and negative values. Figure 9.7(c) shows four sample functions for p = 3/4. The sample functions now have a strong linear trend consistent with the mean n(2p − 1).
The variability about this trend is somewhat less than in the symmetric case, since the variance function is now 4np(1 − p) = 3n/4.

We can generate an approximation to a Poisson process by summing iid Bernoulli random variables. Figure 9.18(a) shows ten realizations of Poisson processes with λ = 0.4 arrivals per second. The sample functions for T = 50 seconds were generated using a 1000-step binomial process with p = λT/n = 0.02. The linearly increasing trend of the Poisson process is evident in the figure. Figure 9.18(b) shows the estimate of the mean and variance functions obtained by averaging across the 10 realizations. The linear trend in the sample mean function is very clear; the sample variance function is also linear but is much more variable. The mean and variance functions of the realizations are obtained using the commands mean(transpose(X)) and var(transpose(X)).

FIGURE 9.18
(a) Ten sample functions of a Poisson random process with λ = 0.4. (b) Sample mean and variance of ten sample functions of a Poisson random process with λ = 0.4.

We can generate sample functions of the random telegraph signal by taking the Poisson process N(t) and calculating X(t) = 2(N(t) modulo 2) − 1. Figure 9.19(a) shows a realization of the random telegraph signal. Figure 9.19(b) shows an estimate of the covariance function of the random telegraph signal. The exponential decay in the covariance function can be seen in the figure. See Eq. (9.44).

FIGURE 9.19
(a) Sample function of a random telegraph process with λ = 0.4. (b) Estimate of covariance function of a random telegraph process.

The covariance function is computed using the function CX_est below.
function [CXall]=CX_est(X, L, M_est)
N=length(X);               % N is number of samples
CX=zeros(1,L+1);           % L is maximum lag
M_est=mean(X);             % Sample mean
for m=1:L+1,               % Add product terms
  for n=1:N-m+1,
    CX(m)=CX(m)+(X(n)-M_est)*(X(n+m-1)-M_est);
  end;
  CX(m)=CX(m)/(N-m+1);     % Normalize by number of terms
end;
for i=1:L,
  CXall(i)=CX(L+2-i);      % Lags 1 to L
end
CXall(L+1:2*L+1)=CX(1:L+1);  % Lags L + 1 to 2L + 1

The Wiener random process can also be generated as a sum process. One approach is to generate a properly scaled random walk process, as in Eq. (9.50). A better approach is to note that the Wiener process has independent Gaussian increments, as in Eq. (9.52), and therefore to generate the sequence D of increments for the time subintervals and then find the corresponding sum process. The code below generates a sample of the Wiener process:

> a=2
> delta=0.001
> n=1000
> D=normal_rnd(0, a*delta, 1, n);
> X=cumsum(D);
> plot(X)

FIGURE 9.20 Sample mean and variance functions from 50 realizations of a Wiener process.

Figure 9.12 in Section 9.5 shows four sample functions of a Brownian motion process with α = 2. Figure 9.20 shows the sample mean and sample variance of 50 sample functions of the Wiener process with α = 2. It can be seen that the mean across the 50 realizations is close to zero, which is the actual mean function for the process. The sample variance across the 50 realizations increases steadily and is close to the actual variance function, which is αt = 2t.

9.10.2 Generating Linear Combinations of Deterministic Functions

In some situations a random process can be represented as a linear combination of deterministic functions in which the coefficients are random variables. The Fourier series and the Karhunen-Loeve expansions are examples of this type of representation.
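As a minimal stdlib-Python sketch of this general idea (our own illustration, not an example from the text), the following builds one sample function as a finite linear combination of fixed sinusoids whose coefficients are independent zero-mean Gaussian random variables; the 1/n scaling of the standard deviations is an arbitrary choice that keeps the partial sum well behaved:

```python
import math
import random

def random_fourier_sample(n_terms, n_points, seed=1):
    """One sample path X(t) = sum_n A_n sin(n*pi*t) on t in (0, 1],
    with A_n independent zero-mean Gaussian coefficients (std 1/n)."""
    rng = random.Random(seed)
    coeffs = [rng.gauss(0.0, 1.0 / n) for n in range(1, n_terms + 1)]
    ts = [(k + 1) / n_points for k in range(n_points)]
    return [sum(a * math.sin(n * math.pi * t)
                for n, a in enumerate(coeffs, start=1))
            for t in ts]

x = random_fourier_sample(n_terms=50, n_points=200)
```

The deterministic functions sin(nπt) are fixed; each new seed draws a new set of coefficients and hence a new sample path, which is exactly the mechanism the Karhunen-Loeve example below exploits with a specific choice of functions and variances.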
As in Example 9.51, let the parameters in the Karhunen-Loeve expansion for the Wiener process on the interval 0 ≤ t ≤ T be T = 1 and σ² = 1:

X(t) = Σ_{n=1}^∞ X_n √(2/T) sin((n - 1/2)πt/T) = Σ_{n=1}^∞ X_n √2 sin((n - 1/2)πt),

where the X_n are zero-mean, independent Gaussian random variables with variance

λ_n = σ²T²/((n - 1/2)²π²) = 1/((n - 1/2)²π²).

The following code generates the 100 Gaussian coefficients for the Karhunen-Loeve expansion of the Wiener process and forms a sample function:

> M=zeros(100,1);           % Mean vector of coefficients
> n=1:1:100;                % Number of coefficients
> N=transpose(n);
> v=1./((N-0.5).^2*pi^2);   % Variances of coefficients
> t=0.01:0.01:1;
> p=(N-0.5)*t;              % Argument of sinusoid
> x=normal_rnd(M,v,100,1);  % Gaussian coefficients
> y=sqrt(2)*sin(pi*p);      % sin terms
> z=transpose(x)*y;
> plot(z)

FIGURE 9.21 Sample functions for the Wiener process using 100 terms in the Karhunen-Loeve expansion.

Figure 9.21 shows the Karhunen-Loeve expansion for the Wiener process using 100 terms. The sample functions generally exhibit the same type of behavior as in the previous figures. The sample functions, however, do not exhibit the jaggedness of the other examples, which are based on the generation of many more random variables.

SUMMARY

• A random process or stochastic process is an indexed family of random variables that is specified by the set of joint distributions of any number and choice of random variables in the family. The mean, autocovariance, and autocorrelation functions summarize some of the information contained in the joint distributions of pairs of time samples.
• The sum process of an iid sequence has the property of stationary and independent increments, which facilitates the evaluation of the joint pdf/pmf of the process at any set of time instants. The binomial and random walk processes are sum processes. The Poisson and Wiener processes are obtained as limiting forms of these sum processes.
• The Poisson process has independent, stationary increments that are Poisson distributed. The interarrival times in a Poisson process are iid exponential random variables.
• The mean and covariance functions completely specify all joint distributions of a Gaussian random process. The Wiener process has independent, stationary increments that are Gaussian distributed. The Wiener process is a Gaussian random process.
• A random process is stationary if its joint distributions are independent of the choice of time origin. If a random process is stationary, then m_X(t) is constant and R_X(t_1, t_2) depends only on t_1 - t_2.
• A random process is wide-sense stationary (WSS) if its mean is constant and if its autocorrelation and autocovariance depend only on t_1 - t_2. A WSS process need not be stationary. A wide-sense stationary Gaussian random process is also stationary.
• A random process is cyclostationary if its joint distributions are invariant with respect to shifts of the time origin by integer multiples of some period T.
• The white Gaussian noise process results from taking the derivative of the Wiener process.
• The derivative and integral of a random process are defined as limits of random variables. We investigated the existence of these limits in the mean square sense.
• The mean and autocorrelation functions of the output of systems described by a linear differential equation and subject to random process inputs can be obtained by solving a set of differential equations. If the input process is a Gaussian random process, then the output process is also Gaussian.
• Ergodic theorems state when time-average estimates of a parameter of a random process converge to the expected value of the parameter. The decay rate of the covariance function determines the convergence rate of the sample mean.
CHECKLIST OF IMPORTANT TERMS

Autocorrelation function
Autocovariance function
Average power
Bernoulli random process
Binomial counting process
Continuous-time process
Cross-correlation function
Cross-covariance function
Cyclostationary random process
Discrete-time process
Ergodic theorem
Fourier series
Gaussian random process
Hurst parameter
iid random process
Independent increments
Independent random processes
Karhunen-Loeve expansion
Markov random process
Mean ergodic random process
Mean function
Mean square continuity
Mean square derivative
Mean square integral
Mean square periodic process
Ornstein-Uhlenbeck process
Orthogonal random processes
Poisson process
Random process
Random telegraph signal
Random walk process
Realization, sample path, or sample function
Shot noise
Stationary increments
Stationary random process
Stochastic process
Sum random process
Time average
Uncorrelated random processes
Variance of X(t)
White Gaussian noise
Wide-sense cyclostationary process
Wiener process
WSS random process

ANNOTATED REFERENCES

References [1] through [6] can be consulted for further reading on random processes. Larson and Shubert [7] and Yaglom [8] contain excellent discussions on white Gaussian noise and Brownian motion. Van Trees [9] gives detailed examples of the application of the Karhunen-Loeve expansion. Beran [10] discusses long-memory processes.

1. A. Papoulis and S. Pillai, Probability, Random Variables, and Stochastic Processes, McGraw-Hill, New York, 2002.
2. W. B. Davenport, Probability and Random Processes: An Introduction for Applied Scientists and Engineers, McGraw-Hill, New York, 1970.
3. H. Stark and J. W. Woods, Probability and Random Processes with Applications to Signal Processing, 3d ed., Prentice Hall, Upper Saddle River, N.J., 2002.
4. R. M. Gray and L. D. Davisson, Random Processes: A Mathematical Approach for Engineers, Prentice Hall, Englewood Cliffs, N.J., 1986.
5. J. A. Gubner, Probability and Random Processes for Electrical and Computer Engineering, Cambridge University Press, Cambridge, 2006.
6. G. Grimmett and D. Stirzaker, Probability and Random Processes, Oxford University Press, Oxford, 2006.
7. H. J. Larson and B. O. Shubert, Probabilistic Models in Engineering Sciences, vol. 1, Wiley, New York, 1979.
8. A. M. Yaglom, Correlation Theory of Stationary and Related Random Functions, vol. 1: Basic Results, Springer-Verlag, New York, 1987.
9. H. L. Van Trees, Detection, Estimation, and Modulation Theory, Wiley, New York, 1987.
10. J. Beran, Statistics for Long-Memory Processes, Chapman & Hall/CRC, New York, 1994.

PROBLEMS

Sections 9.1 and 9.2: Definition and Specification of a Stochastic Process

9.1. In Example 9.1, find the joint pmf for X_1 and X_2. Why are X_1 and X_2 independent?

9.2. A discrete-time random process X_n is defined as follows. A fair die is tossed and the outcome k is observed. The process is then given by X_n = k for all n.
(a) Sketch some sample paths of the process.
(b) Find the pmf for X_n.
(c) Find the joint pmf for X_n and X_{n+k}.
(d) Find the mean and autocovariance functions of X_n.

9.3. A discrete-time random process X_n is defined as follows. A fair coin is tossed. If the outcome is heads, X_n = (-1)^n for all n; if the outcome is tails, X_n = (-1)^{n+1} for all n.
(a) Sketch some sample paths of the process.
(b) Find the pmf for X_n.
(c) Find the joint pmf for X_n and X_{n+k}.
(d) Find the mean and autocovariance functions of X_n.

9.4. A discrete-time random process is defined by X_n = s^n, for n ≥ 0, where s is selected at random from the interval (0, 1).
(a) Sketch some sample paths of the process.
(b) Find the cdf of X_n.
(c) Find the joint cdf for X_n and X_{n+1}.
(d) Find the mean and autocovariance functions of X_n.
(e) Repeat parts a, b, c, and d if s is uniform in (1, 2).

9.5. Let g(t) be the rectangular pulse shown in Fig. P9.1.
The random process X(t) is defined as X(t) = Ag(t), where A assumes the values ±1 with equal probability.

FIGURE P9.1

(a) Find the pmf of X(t).
(b) Find m_X(t).
(c) Find the joint pmf of X(t) and X(t + d).
(d) Find C_X(t, t + d), d > 0.

9.6. A random process is defined by Y(t) = g(t - T), where g(t) is the rectangular pulse of Fig. P9.1, and T is a uniformly distributed random variable in the interval (0, 1).
(a) Find the pmf of Y(t).
(b) Find m_Y(t) and C_Y(t_1, t_2).

9.7. A random process is defined by X(t) = g(t - T), where T is a uniform random variable in the interval (0, 1) and g(t) is the periodic triangular waveform shown in Fig. P9.2.

FIGURE P9.2

(a) Find the cdf of X(t) for 0 < t < 1.
(b) Find m_X(t) and C_X(t_1, t_2).

9.8. Let Y(t) = g(t - T) as in Problem 9.6, but let T be an exponentially distributed random variable with parameter α.
(a) Find the pmf of Y(t).
(b) Find the joint pmf of Y(t) and Y(t + d). Consider two cases: d > 1, and 0 < d < 1.
(c) Find m_Y(t) and C_Y(t, t + d) for d > 1 and 0 < d < 1.

9.9. Let Z(t) = At³ + B, where A and B are independent random variables.
(a) Find the pdf of Z(t).
(b) Find m_Z(t) and C_Z(t_1, t_2).

9.10. Find an expression for E[|X(t_2) - X(t_1)|²] in terms of the autocorrelation function.

9.11. The random process H(t) is defined as the "hard-limited" version of X(t): H(t) = +1 if X(t) ≥ 0, and H(t) = -1 if X(t) < 0.
(a) Find the pdf, mean, and autocovariance of H(t) if X(t) is the sinusoid with a random amplitude presented in Example 9.2.
(b) Find the pdf, mean, and autocovariance of H(t) if X(t) is the sinusoid with random phase presented in Example 9.9.
(c) Find a general expression for the mean of H(t) in terms of the cdf of X(t).

9.12. (a) Are independent random processes orthogonal? Explain.
(b) Are orthogonal random processes uncorrelated? Explain.
(c) Are uncorrelated processes independent?
(d) Are uncorrelated processes orthogonal?

9.13.
The random process Z(t) is defined by Z(t) = 2Xt - Y, where X and Y are a pair of random variables with means m_X, m_Y, variances σ_X², σ_Y², and correlation coefficient ρ_{X,Y}. Find the mean and autocovariance of Z(t).

9.14. Let H(t) be the output of the hard limiter in Problem 9.11.
(a) Find the cross-correlation and cross-covariance between H(t) and X(t) when the input is a sinusoid with random amplitude as in Problem 9.11a.
(b) Repeat if the input is a sinusoid with random phase as in Problem 9.11b.
(c) Are the input and output processes uncorrelated? Orthogonal?

9.15. Let Y_n = X_n + g(n), where X_n is a zero-mean discrete-time random process and g(n) is a deterministic function of n.
(a) Find the mean and variance of Y_n.
(b) Find the joint cdf of Y_n and Y_{n+1}.
(c) Find the autocovariance function of Y_n.
(d) Plot typical sample functions for X_n and Y_n if: g(n) = n; g(n) = 1/n²; g(n) = 1/n.

9.16. Let Y_n = c(n)X_n, where X_n is a zero-mean, unit-variance, discrete-time random process and c(n) is a deterministic function of n.
(a) Find the mean and variance of Y_n.
(b) Find the joint cdf of Y_n and Y_{n+1}.
(c) Find the autocovariance function of Y_n.
(d) Plot typical sample functions for X_n and Y_n if: c(n) = n; c(n) = 1/n²; c(n) = 1/n.

9.17. (a) Find the cross-correlation and cross-covariance for X_n and Y_n in Problem 9.15.
(b) Find the joint pdf of X_n and Y_{n+1}.
(c) Determine whether X_n and Y_n are uncorrelated, independent, or orthogonal random processes.

9.18. (a) Find the cross-correlation and cross-covariance for X_n and Y_n in Problem 9.16.
(b) Find the joint pdf of X_n and Y_{n+1}.
(c) Determine whether X_n and Y_n are uncorrelated, independent, or orthogonal random processes.

9.19. Suppose that X(t) and Y(t) are independent random processes and let

U(t) = X(t) - Y(t),  V(t) = X(t) + Y(t).

(a) Find C_{UX}(t_1, t_2), C_{UY}(t_1, t_2), and C_{UV}(t_1, t_2).
(b) Find f_{U(t_1)X(t_2)}(u, x) and f_{U(t_1)V(t_2)}(u, v). Hint: Use auxiliary variables.

9.20.
Repeat Problem 9.19 if X(t) and Y(t) are independent discrete-time processes given by different iid random processes.

Section 9.3: Sum Process, Binomial Counting Process, and Random Walk

9.21. (a) Let Y_n be the process that results when individual 1's in a Bernoulli process are erased with probability α. Find the pmf of S′_n, the counting process for Y_n. Does Y_n have independent and stationary increments?
(b) Repeat part a if, in addition to the erasures, individual 0's in the Bernoulli process are changed to 1's with probability β.

9.22. Let S_n denote a binomial counting process.
(a) Show that P[S_n = j, S_{n′} = i] ≠ P[S_n = j]P[S_{n′} = i].
(b) Find P[S_{n_2} = j | S_{n_1} = i], where n_2 > n_1.
(c) Show that P[S_{n_2} = j | S_{n_1} = i, S_{n_0} = k] = P[S_{n_2} = j | S_{n_1} = i], where n_2 > n_1 > n_0.

9.23. (a) Find P[S_n = 0] for the random walk process.
(b) What is the answer in part a if p = 1/2?

9.24. Consider the following moving average processes:

Y_n = (1/2)(X_n + X_{n-1}), X_0 = 0
Z_n = (2/3)X_n + (1/3)X_{n-1}, X_0 = 0

(a) Find the mean, variance, and covariance of Y_n and Z_n if X_n is a Bernoulli random process.
(b) Repeat part a if X_n is the random step process.
(c) Generate 100 outcomes of a Bernoulli random process X_n, and find the resulting Y_n and Z_n. Are the sample means of Y_n and Z_n in part a close to their respective means?
(d) Repeat part c with X_n given by the random step process.

9.25. Consider the following autoregressive processes:

W_n = 2W_{n-1} + X_n, W_0 = 0
Z_n = (3/4)Z_{n-1} + X_n, Z_0 = 0.

(a) Suppose that X_n is a Bernoulli process. What trends do the processes exhibit?
(b) Express W_n and Z_n in terms of X_n, X_{n-1}, …, X_1 and then find E[W_n] and E[Z_n]. Do these results agree with the trends you expect?
(c) Do W_n or Z_n have independent increments? stationary increments?
(d) Generate 100 outcomes of a Bernoulli process. Find the resulting realizations of W_n and Z_n. Is the sample mean meaningful for either of these processes?
(e) Repeat part d if X_n is the random step process.

9.26. Let M_n be the discrete-time process defined as the sequence of sample means of an iid sequence:

M_n = (X_1 + X_2 + … + X_n)/n.

(a) Find the mean, variance, and covariance of M_n.
(b) Does M_n have independent increments? stationary increments?

9.27. Find the pdf of the processes defined in Problem 9.24 if the X_n are an iid sequence of zero-mean, unit-variance Gaussian random variables.

9.28. Let X_n consist of an iid sequence of Cauchy random variables.
(a) Find the pdf of the sum process S_n. Hint: Use the characteristic function method.
(b) Find the joint pdf of S_n and S_{n+k}.

9.29. Let X_n consist of an iid sequence of Poisson random variables with mean α.
(a) Find the pmf of the sum process S_n.
(b) Find the joint pmf of S_n and S_{n+k}.

9.30. Let X_n be an iid sequence of zero-mean, unit-variance Gaussian random variables.
(a) Find the pdf of M_n defined in Problem 9.26.
(b) Find the joint pdf of M_n and M_{n+k}. Hint: Use the independent increments property of S_n.

9.31. Repeat Problem 9.26 with X_n = (1/2)(Y_n + Y_{n-1}), where Y_n is an iid random process. What happens to the variance of M_n as n increases?

9.32. Repeat Problem 9.26 with X_n = (3/4)X_{n-1} + Y_n, where Y_n is an iid random process. What happens to the variance of M_n as n increases?

9.33. Suppose that an experiment has three possible outcomes, say 0, 1, and 2, and suppose that these occur with probabilities p_0, p_1, and p_2, respectively. Consider a sequence of independent repetitions of the experiment, and let X_j(n) be the indicator function for outcome j. The vector X(n) = (X_0(n), X_1(n), X_2(n)) then constitutes a vector-valued Bernoulli random process. Consider the counting process for X(n):

S(n) = X(n) + X(n - 1) + … + X(1), S(0) = 0.

(a) Show that S(n) has a multinomial distribution.
(b) Show that S(n) has independent increments, then find the joint pmf of S(n) and S(n + k).
(c) Show that the components S_j(n) of the vector process are binomial counting processes.

Section 9.4: Poisson and Associated Random Processes

9.34. A server handles queries that arrive according to a Poisson process with a rate of 10 queries per minute. What is the probability that no queries go unanswered if the server is unavailable for 20 seconds?

9.35. Customers deposit $1 in a vending machine according to a Poisson process with rate λ. The machine issues an item with probability p. Find the pmf for the number of items dispensed in time t.

9.36. Noise impulses occur in a radio transmission according to a Poisson process of rate λ.
(a) Find the probability that no impulses occur during the transmission of a message that is t seconds long.
(b) Suppose that the message is encoded so that the errors caused by up to 2 impulses can be corrected. What is the probability that a t-second message cannot be corrected?

9.37. Packets arrive at a multiplexer at two ports according to independent Poisson processes of rates λ_1 = 1 and λ_2 = 2 packets/second, respectively.
(a) Find the probability that a message arrives first on line 2.
(b) Find the pdf for the time until a message arrives on either line.
(c) Find the pmf for N(t), the total number of messages that arrive in an interval of length t.
(d) Generalize the result of part c for the "merging" of k independent Poisson processes of rates λ_1, …, λ_k, respectively: N(t) = N_1(t) + … + N_k(t).

9.38. (a) Find P[N(t - d) = j | N(t) = k] with d > 0, where N(t) is a Poisson process with rate λ.
(b) Compare your answer to P[N(t + d) = j | N(t) = k]. Explain the difference, if any.

9.39. Let N_1(t) be a Poisson process with arrival rate λ_1 that is started at t = 0. Let N_2(t) be another Poisson process that is independent of N_1(t), that has arrival rate λ_2, and that is started at t = 1.
(a) Show that the pmf of the process N(t) = N_1(t) + N_2(t) is given by:

P[N(t + τ) - N(t) = k] = ([m(t + τ) - m(t)]^k / k!) e^{-[m(t + τ) - m(t)]} for k = 0, 1, …

where m(t) = E[N(t)].
(b) Now consider a Poisson process in which the arrival rate λ(t) is a piecewise constant function of time. Explain why the pmf of the process is given by the above pmf where

m(t) = ∫_0^t λ(t′) dt′.

(c) For what other arrival functions λ(t) does the pmf in part a hold?

9.40. (a) Suppose that the time required to service a customer in a queueing system is a random variable T. If customers arrive at the system according to a Poisson process with parameter λ, find the pmf for the number of customers that arrive during one customer's service time. Hint: Condition on the service time.
(b) Evaluate the pmf in part a if T is an exponential random variable with parameter β.

9.41. (a) Is the difference of two independent Poisson random processes also a Poisson process?
(b) Let N_p(t) be the number of complete pairs generated by a Poisson process up to time t. Explain why N_p(t) is or is not a Poisson process.

9.42. Let N(t) be a Poisson random process with parameter λ. Suppose that each time an event occurs, a coin is flipped and the outcome (heads or tails) is recorded. Let N_1(t) and N_2(t) denote the number of heads and tails recorded up to time t, respectively. Assume that p is the probability of heads.
(a) Find P[N_1(t) = j, N_2(t) = k | N(t) = k + j].
(b) Use part a to show that N_1(t) and N_2(t) are independent Poisson random variables of rates pλt and (1 - p)λt, respectively:

P[N_1(t) = j, N_2(t) = k] = ((pλt)^j / j!) e^{-pλt} · ([(1 - p)λt]^k / k!) e^{-(1-p)λt}.

9.43. Customers play a $1 game machine according to a Poisson process with rate λ. Suppose the machine dispenses a random reward X each time it is played. Let X(t) be the total reward issued up to time t.
(a) Find expressions for P[X(t) = j] if X_n is Bernoulli.
(b) Repeat part a if X assumes the values {0, 5} with probabilities (5/6, 1/6).
(c) Repeat part a if X is Poisson with mean 1.
(d) Repeat part a if with probability p the machine returns all the coins.
9.44. Let X(t) denote the random telegraph signal, and let Y(t) be a process derived from X(t) as follows: Each time X(t) changes polarity, Y(t) changes polarity with probability p.
(a) Find P[Y(t) = ±1].
(b) Find the autocovariance function of Y(t). Compare it to that of X(t).

9.45. Let Y(t) be the random signal obtained by switching between the values 0 and 1 according to the events in a Poisson process of rate λ. Compare the pmf and autocovariance of Y(t) with that of the random telegraph signal.

9.46. Let Z(t) be the random signal obtained by switching between the values 0 and 1 according to the events in a counting process N(t). Let

P[N(t) = k] = (1/(1 + λt)) [λt/(1 + λt)]^k, k = 0, 1, 2, …

(a) Find the pmf of Z(t).
(b) Find m_Z(t).

9.47. In the filtered Poisson process (Eq. (9.45)), let h(t) be a pulse of unit amplitude and duration T seconds.
(a) Show that X(t) is then the increment in the Poisson process in the interval (t - T, t).
(b) Find the mean and autocorrelation functions of X(t).

9.48. (a) Find the second moment and variance of the shot noise process discussed in Example 9.25.
(b) Find the variance of the shot noise process if h(t) = e^{-βt} for t ≥ 0.

9.49. Messages arrive at a message center according to a Poisson process of rate λ. Every hour the messages that have arrived during the previous hour are forwarded to their destination. Find the mean of the total time waited by all the messages that arrive during the hour. Hint: Condition on the number of arrivals and consider the arrival instants.

Section 9.5: Gaussian Random Process, Wiener Process and Brownian Motion

9.50. Let X(t) and Y(t) be jointly Gaussian random processes. Explain the relation between the conditions of independence, uncorrelatedness, and orthogonality of X(t) and Y(t).

9.51. Let X(t) be a zero-mean Gaussian random process with autocovariance function given by C_X(t_1, t_2) = 4e^{-2|t_1 - t_2|}. Find the joint pdf of X(t) and X(t + s).

9.52.
Find the pdf of Z(t) in Problem 9.13 if X and Y are jointly Gaussian random variables.

9.53. Let Y(t) = X(t + d) - X(t), where X(t) is a Gaussian random process.
(a) Find the mean and autocovariance of Y(t).
(b) Find the pdf of Y(t).
(c) Find the joint pdf of Y(t) and Y(t + s).
(d) Show that Y(t) is a Gaussian random process.

9.54. Let X(t) = A cos ωt + B sin ωt, where A and B are iid Gaussian random variables with zero mean and variance σ².
(a) Find the mean and autocovariance of X(t).
(b) Find the joint pdf of X(t) and X(t + s).

9.55. Let X(t) and Y(t) be independent Gaussian random processes with zero means and the same covariance function C(t_1, t_2). Define the "amplitude-modulated signal" by

Z(t) = X(t) cos ωt + Y(t) sin ωt.

(a) Find the mean and autocovariance of Z(t).
(b) Find the pdf of Z(t).

9.56. Let X(t) be a zero-mean Gaussian random process with autocovariance function given by C_X(t_1, t_2). If X(t) is the input to a "square law detector," then the output is Y(t) = X(t)². Find the mean and autocovariance of the output Y(t).

9.57. Let Y(t) = X(t) + mt, where X(t) is the Wiener process.
(a) Find the pdf of Y(t).
(b) Find the joint pdf of Y(t) and Y(t + s).

9.58. Let Y(t) = X²(t), where X(t) is the Wiener process.
(a) Find the pdf of Y(t).
(b) Find the conditional pdf of Y(t_2) given Y(t_1).

9.59. Let Z(t) = X(t) - aX(t - s), where X(t) is the Wiener process.
(a) Find the pdf of Z(t).
(b) Find m_Z(t) and C_Z(t_1, t_2).

9.60. (a) For X(t) the Wiener process with α = 1 and 0 < t < 1, show that the joint pdf of X(t) and X(1) is given by:

f_{X(t),X(1)}(x_1, x_2) = exp{-(1/2)[x_1²/t + (x_2 - x_1)²/(1 - t)]} / (2π√(t(1 - t))).

(b) Use part a to show that for 0 < t < 1, the conditional pdf of X(t) given X(0) = X(1) = 0 is:

f_{X(t)}(x | X(0) = X(1) = 0) = exp{-x²/(2t(1 - t))} / √(2πt(1 - t)).

(c) Use part b to find the conditional pdf of X(t) given X(t_1) = a and X(t_2) = b for t_1 < t < t_2.
Hint: Find the equivalent process in the interval (0, t_2 - t_1).

Section 9.6: Stationary Random Processes

9.61. (a) Is the random amplitude sinusoid in Example 9.9 a stationary random process? Is it wide-sense stationary?
(b) Repeat part a for the random phase sinusoid in Example 9.10.

9.62. A discrete-time random process X_n is defined as follows. A fair coin is tossed; if the outcome is heads then X_n = 1 for all n, and X_n = -1 for all n otherwise.
(a) Is X_n a WSS random process?
(b) Is X_n a stationary random process?
(c) Do the answers in parts a and b change if the coin is biased?

9.63. Let X_n be the random process in Problem 9.3.
(a) Is X_n a WSS random process?
(b) Is X_n a stationary random process?
(c) Is X_n a cyclostationary random process?

9.64. Let X(t) = g(t - T), where g(t) is the periodic waveform introduced in Problem 9.7, and T is a uniformly distributed random variable in the interval (0, 1). Is X(t) a stationary random process? Is X(t) wide-sense stationary?

9.65. Let X(t) be defined by X(t) = A cos ωt + B sin ωt, where A and B are iid random variables.
(a) Under what conditions is X(t) wide-sense stationary?
(b) Show that X(t) is not stationary. Hint: Consider E[X³(t)].

9.66. Consider the following moving average process:

Y_n = (1/2)(X_n + X_{n-1}), X_0 = 0.

(a) Is Y_n a stationary random process if X_n is an iid integer-valued process?
(b) Is Y_n a stationary random process if X_n is a stationary process?
(c) Are Y_n and X_n jointly stationary random processes if X_n is an iid process? a stationary process?

9.67. Let X_n be a zero-mean iid process, and let Z_n be an autoregressive random process

Z_n = (3/4)Z_{n-1} + X_n, Z_0 = 0.

(a) Find the autocovariance of Z_n and determine whether Z_n is wide-sense stationary. Hint: Express Z_n in terms of X_n, X_{n-1}, …, X_1.
(b) Does Z_n eventually settle down into stationary behavior?
(c) Find the pdf of Z_n if X_n is an iid sequence of zero-mean, unit-variance Gaussian random variables. What is the pdf of Z_n as n → ∞?
9.68. Let Y(t) = X(t + s) - bX(t), where X(t) is a wide-sense stationary random process.
(a) Determine whether Y(t) is also a wide-sense stationary random process.
(b) Find the cross-covariance function of Y(t) and X(t). Are the processes jointly wide-sense stationary?
(c) Find the pdf of Y(t) if X(t) is a Gaussian random process.
(d) Find the joint pdf of Y(t_1) and Y(t_2) in part c.
(e) Find the joint pdf of Y(t_1) and X(t_2) in part c.

9.69. Let X(t) and Y(t) be independent, wide-sense stationary random processes with zero means and the same covariance function C_X(τ). Let Z(t) be defined by Z(t) = 3X(t) - 5Y(t).
(a) Determine whether Z(t) is also wide-sense stationary.
(b) Determine the pdf of Z(t) if X(t) and Y(t) are also jointly Gaussian zero-mean random processes with C_X(τ) = 4e^{-|τ|}.
(c) Find the joint pdf of Z(t_1) and Z(t_2) in part b.
(d) Find the cross-covariance between Z(t) and X(t). Are Z(t) and X(t) jointly stationary random processes?
(e) Find the joint pdf of Z(t_1) and X(t_2) in part b. Hint: Use auxiliary variables.

9.70. Let X(t) and Y(t) be independent, wide-sense stationary random processes with zero means and the same covariance function C_X(τ). Let Z(t) be defined by

Z(t) = X(t) cos ωt + Y(t) sin ωt.

(a) Determine whether Z(t) is a wide-sense stationary random process.
(b) Determine the pdf of Z(t) if X(t) and Y(t) are also jointly Gaussian zero-mean random processes with C_X(τ) = 4e^{-|τ|}.
(c) Find the joint pdf of Z(t_1) and Z(t_2) in part b.
(d) Find the cross-covariance between Z(t) and X(t). Are Z(t) and X(t) jointly stationary random processes?
(e) Find the joint pdf of Z(t_1) and X(t_2) in part b.

9.71. Let X(t) be a zero-mean, wide-sense stationary Gaussian random process with autocorrelation function R_X(τ). The output of a "square law detector" is Y(t) = X(t)². Show that R_Y(τ) = R_X(0)² + 2R_X²(τ). Hint: For zero-mean, jointly Gaussian random variables, E[X²Z²] = E[X²]E[Z²] + 2E[XZ]².

9.72.
A WSS process X(t) has mean 1 and the autocorrelation function R_X(τ) shown in Fig. P9.3.

FIGURE P9.3

(a) Find the mean component of R_X(τ).
(b) Find the periodic component of R_X(τ).
(c) Find the remaining component of R_X(τ).

9.73. Let X_n and Y_n be independent random processes. A multiplexer combines these two sequences into a combined sequence U_k, that is,

U_{2n} = X_n,  U_{2n+1} = Y_n.

(a) Suppose that X_n and Y_n are independent Bernoulli random processes. Under what conditions is U_k a stationary random process? a cyclostationary random process?
(b) Repeat part a if X_n and Y_n are independent stationary random processes.
(c) Suppose that X_n and Y_n are wide-sense stationary random processes. Is U_k a wide-sense stationary random process? a wide-sense cyclostationary random process? Find the mean and autocovariance functions of U_k.
(d) If U_k is wide-sense cyclostationary, find the mean and correlation function of the randomly phase-shifted version of U_k as defined by Eq. (9.72).

9.74. A ternary information source produces an iid, equiprobable sequence of symbols from the alphabet {a, b, c}. Suppose that these three symbols are encoded into the respective binary codewords 00, 01, 10. Let B_n be the sequence of binary symbols that result from encoding the ternary symbols.
(a) Find the joint pmf of B_n and B_{n+1} for n even; n odd. Is B_n stationary? cyclostationary?
(b) Find the mean and covariance functions of B_n. Is B_n wide-sense stationary? wide-sense cyclostationary?
(c) If B_n is cyclostationary, find the joint pmf, mean, and autocorrelation functions of the randomly phase-shifted version of B_n as defined by Eq. (9.72).

9.75. Let s(t) be a periodic square wave with period T = 1 which is equal to 1 for the first half of a period and -1 for the remainder of the period. Let X(t) = As(t), where A is a random variable.
(a) Find the mean and autocovariance functions of X(t).
(b) Is X(t) a mean-square periodic process?
(c) Find the mean and autocovariance of X_s(t), the randomly phase-shifted version of X(t) given by Eq. (9.72).

9.76. Let X(t) = As(t) and Y(t) = Bs(t), where A and B are independent random variables that assume values +1 or -1 with equal probabilities, and where s(t) is the periodic square wave in Problem 9.75.
(a) Find the joint pmf of X(t_1) and Y(t_2).
(b) Find the cross-covariance of X(t_1) and Y(t_2).
(c) Are X(t) and Y(t) jointly wide-sense cyclostationary? Jointly cyclostationary?

9.77. Let X(t) be a mean square periodic random process. Is X(t) a wide-sense cyclostationary process?

9.78. Is the pulse amplitude modulation random process in Example 9.38 cyclostationary?

9.79. Let X(t) be the random amplitude sinusoid in Example 9.37. Find the mean and autocorrelation functions of the randomly phase-shifted version of X(t) given by Eq. (9.72).

9.80. Complete the proof that if X(t) is a cyclostationary random process, then X_s(t), defined by Eq. (9.72), is a stationary random process.

9.81. Show that if X(t) is a wide-sense cyclostationary random process, then X_s(t), defined by Eq. (9.72), is a wide-sense stationary random process with mean and autocorrelation functions given by Eqs. (9.74a) and (9.74b).

Section 9.7: Continuity, Derivatives, and Integrals of Random Processes

9.82. Let the random process X(t) = u(t - S) be a unit step function delayed by an exponential random variable S, that is, X(t) = 1 for t ≥ S, and X(t) = 0 for t < S.
(a) Find the autocorrelation function of X(t).
(b) Is X(t) mean square continuous?
(c) Does X(t) have a mean square derivative? If so, find its mean and autocorrelation functions.
(d) Does X(t) have a mean square integral? If so, find its mean and autocovariance functions.

9.83. Let X(t) be the random telegraph signal introduced in Example 9.24.
(a) Is X(t) mean square continuous?
(b) Show that X(t) does not have a mean square derivative, and show that the second mixed partial derivative of its autocorrelation function has a delta function. What gives rise to this delta function?
(c) Does X(t) have a mean square integral? If so, find its mean and autocovariance functions.

9.84. Let X(t) have autocorrelation function R_X(τ) = σ²e^{−ατ²}.
(a) Is X(t) mean square continuous?
(b) Does X(t) have a mean square derivative? If so, find its mean and autocorrelation functions.
(c) Does X(t) have a mean square integral? If so, find its mean and autocorrelation functions.
(d) Is X(t) a Gaussian random process?

9.85. Let N(t) be the Poisson process. Find E[(N(t) − N(t₀))²] and use the result to show that N(t) is mean square continuous.

9.86. Does the pulse amplitude modulation random process discussed in Example 9.38 have a mean square integral? If so, find its mean and autocovariance functions.

9.87. Show that if X(t) is a mean square continuous random process, then X(t) has a mean square integral. Hint: Show that

R_X(t₁, t₂) − R_X(t₀, t₀) = E[(X(t₁) − X(t₀))X(t₂)] + E[X(t₀)(X(t₂) − X(t₀))],

and then apply the Schwarz inequality to the two terms on the right-hand side.

9.88. Let Y(t) be the mean square integral of X(t) in the interval (0, t). Show that Y′(t) is equal to X(t) in the mean square sense.

9.89. Let X(t) be a wide-sense stationary random process. Show that E[X(t)X′(t)] = 0.

9.90. A linear system with input Z(t) is described by

X′(t) + αX(t) = Z(t),  t ≥ 0,  X(0) = 0.

Find the output X(t) if the input is a zero-mean Gaussian random process with autocorrelation function given by R_Z(τ) = σ²e^{−β|τ|}.

Section 9.8: Time Averages of Random Processes and Ergodic Theorems

9.91. Find the variance of the time average given in Example 9.47.

9.92. Are the following processes WSS and mean ergodic?
(a) Discrete-time dice process in Problem 9.2.
(b) Alternating sign process in Problem 9.3.
(c) Xn = sⁿ, for n ≥ 0, in Problem 9.4.

9.93.
Is the following WSS random process X(t) mean ergodic?

R_X(τ) = { 5(1 − |τ|)  for |τ| ≤ 1;  0  for |τ| > 1 }.

9.94. Let X(t) = A cos(2πft), where A is a random variable with mean m and variance σ².
(a) Evaluate ⟨X(t)⟩_T, find its limit as T → ∞, and compare to m_X(t).
(b) Evaluate ⟨X(t + τ)X(t)⟩_T, find its limit as T → ∞, and compare to R_X(t + τ, t).

9.95. Repeat Problem 9.94 with X(t) = A cos(2πft + Θ), where A is as in Problem 9.94, Θ is a random variable uniformly distributed in (0, 2π), and A and Θ are independent random variables.

9.96. Find an exact expression for VAR[⟨X(t)⟩_T] in Example 9.48. Find the limit as T → ∞.

9.97. The WSS random process Xn has mean m and autocovariance C_X(k) = (1/2)^{|k|}. Is Xn mean ergodic?

9.98. (a) Are the moving average processes Yn in Problem 9.24 mean ergodic?
(b) Are the autoregressive processes Zn in Problem 9.25a mean ergodic?

9.99. (a) Show that a WSS random process is mean ergodic if

∫_{−∞}^{∞} |C(u)| du < ∞.

(b) Show that a discrete-time WSS random process is mean ergodic if

Σ_{k=−∞}^{∞} |C(k)| < ∞.

9.100. Let ⟨X²(t)⟩_T denote a time-average estimate for the mean power of a WSS random process.
(a) Under what conditions is this time average a valid estimate for E[X²(t)]?
(b) Apply your result in part a to the random phase sinusoid in Example 9.2.

9.101. (a) Under what conditions is the time average ⟨X(t + τ)X(t)⟩_T a valid estimate for the autocorrelation R_X(τ) of a WSS random process X(t)?
(b) Apply your result in part a to the random phase sinusoid in Example 9.2.

9.102. Let Y(t) be the indicator function for the event {a < X(t) ≤ b}, that is,

Y(t) = { 1  if X(t) ∈ (a, b];  0  otherwise }.

(a) Show that ⟨Y(t)⟩_T is the proportion of time in the time interval (−T, T) that X(t) ∈ (a, b].
(b) Find E[⟨Y(t)⟩_T].
(c) Under what conditions does ⟨Y(t)⟩_T → P[a < X(t) ≤ b]?
(d) How can ⟨Y(t)⟩_T be used to estimate P[X(t) ≤ x]?
(e) Apply the result in part d to the random telegraph signal.

9.103. (a) Repeat Problem 9.102 for the time average of the discrete-time Yn, which is defined as the indicator for the event {a < Xn ≤ b}.
(b) Apply your result in part a to an iid discrete-valued random process.
(c) Apply your result in part a to an iid continuous-valued random process.

9.104. For n ≥ 1, define Zn = u(a − Xn), where u(x) is the unit step function, that is, Zn = 1 if and only if Xn ≤ a.
(a) Show that the time average ⟨Zn⟩_N is the proportion of Xn's that do not exceed a in the first N samples.
(b) Show that if the process is ergodic (in some sense), then this time average is equal to F_X(a) = P[X ≤ a].

9.105. In Example 9.50 show that VAR[⟨Xn⟩_T] = σ²(2T + 1)^{2H−2}.

9.106. Plot the covariance function vs. k for the self-similar process in Example 9.50 with σ² = 1 for: H = 0.5, H = 0.6, H = 0.75, H = 0.99. Does the long-range dependence of the process increase or decrease with H?

9.107. (a) Plot the variance of the sample mean given by Eq. (9.110) vs. T with σ² = 1 for: H = 0.5, H = 0.6, H = 0.75, H = 0.99.
(b) For the parameters in part a, plot (2T + 1)^{2H−1} vs. T, which is the ratio of the variance of the sample mean of a long-range dependent process relative to the variance of the sample mean of an iid process. How does the long-range dependence manifest itself, especially for H approaching 1?
(c) Comment on the width of confidence intervals for estimates of the mean of long-range dependent processes relative to those of iid processes.

9.108. Plot the variance of the sample mean for a long-range dependent process (Eq. 9.110) vs. the sample size T in a log-log plot.
(a) What role does H play in the plot?
(b) One of the remarkable indicators of long-range dependence in nature comes from a set of observations of the minimal water levels in the Nile river for the years 622–1281 [Beran, p. 22], where the log-log plot for part a gives a slope of −0.27. What value of H corresponds to this slope?

9.109. Problem 9.99b gives a sufficient condition for mean ergodicity for discrete-time random processes. Use the expression in Eq. (9.112) for a long-range dependent process to determine whether the sufficient condition is satisfied. Comment on your findings.

*Section 9.9: Fourier Series and Karhunen-Loeve Expansion

9.110. Let X(t) = Xe^{jωt}, where X is a random variable.
(a) Find the correlation function for X(t), which for complex-valued random processes is defined by R_X(t₁, t₂) = E[X(t₁)X*(t₂)], where * denotes the complex conjugate.
(b) Under what conditions is X(t) a wide-sense stationary random process?

9.111. Consider the sum of two complex exponentials with random coefficients:

X(t) = X₁e^{jω₁t} + X₂e^{jω₂t},  where ω₁ ≠ ω₂.

(a) Find the covariance function of X(t).
(b) Find conditions on the complex-valued random variables X₁ and X₂ for X(t) to be a wide-sense stationary random process.
(c) Show that if we let ω₁ = −ω₂, X₁ = (U − jV)/2, and X₂ = (U + jV)/2, where U and V are real-valued random variables, then X(t) is a real-valued random process. Find an expression for X(t) and for the autocorrelation function.
(d) Restate the conditions on X₁ and X₂ from part b in terms of U and V.
(e) Suppose that in part c, U and V are jointly Gaussian random variables. Show that X(t) is a Gaussian random process.

9.112. (a) Derive Eq. (9.118) for the correlation of the Fourier coefficients for a non-mean-square-periodic process X(t).
(b) Show that Eq. (9.118) reduces to Eq. (9.117) when X(t) is WSS and mean square periodic.

9.113. Let X(t) be a WSS Gaussian random process with R_X(τ) = e^{−|τ|}.
(a) Find the Fourier series expansion for X(t) in the interval [0, T].
(b) What is the distribution of the coefficients in the Fourier series?

9.114. Show that the Karhunen-Loeve expansion of a WSS mean-square periodic process X(t) yields its Fourier series. Specify the orthonormal set of eigenfunctions and the corresponding eigenvalues.
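Several of the expansion problems above work with exponential covariance kernels such as K(s, t) = e^{−|s−t|}. A quick numerical check of a Karhunen-Loeve expansion can be made by discretizing the eigenvalue integral equation λφ(t) = ∫ K(t, s)φ(s) ds into a matrix eigenproblem. The following is a minimal sketch (not from the text; grid size and interval are illustrative choices):

```python
import numpy as np

# Discretize K(s, t) = exp(-|s - t|) on [0, T] and approximate the
# Karhunen-Loeve integral equation by an eigendecomposition of K*dt.
T, N = 1.0, 400
dt = T / N
t = (np.arange(N) + 0.5) * dt               # midpoint grid on [0, T]
K = np.exp(-np.abs(t[:, None] - t[None, :]))

lam, phi = np.linalg.eigh(K * dt)           # symmetric kernel -> eigh
lam, phi = lam[::-1], phi[:, ::-1]          # order eigenvalues descending
phi = phi / np.sqrt(dt)                     # normalize: sum phi_n^2 dt = 1

# Sanity checks: eigenvalues are positive and their sum approximates
# the integral of K(t, t) over [0, T], which equals T here.
print(lam[:4], lam.sum())
```

The eigenvectors, scaled by 1/√dt, approximate the orthonormal eigenfunctions φ_n(t), and the dominant eigenvalues can be compared against the transcendental-equation roots derived analytically in the problems above.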
9.115. Let X(t) be the white Gaussian noise process introduced in Example 9.43. Show that any set of orthonormal functions can be used as the eigenfunctions for X(t) in its Karhunen-Loeve expansion. What are the eigenvalues?

9.116. Let Y(t) = X(t) + W(t), where X(t) and W(t) are orthogonal random processes and W(t) is a white Gaussian noise process. Let φ_n(t) be the eigenfunctions corresponding to K_X(t₁, t₂). Show that the φ_n(t) are also the eigenfunctions for K_Y(t₁, t₂). What is the relation between the eigenvalues of K_X(t₁, t₂) and those of K_Y(t₁, t₂)?

9.117. Let X(t) be a zero-mean random process with autocovariance R_X(τ) = σ²e^{−α|τ|}.
(a) Write the eigenvalue integral equation for the Karhunen-Loeve expansion of X(t) on the interval [−T, T].
(b) Differentiate the above integral equation to obtain the differential equation

d²φ(t)/dt² = α²[(λ − 2σ²/α)/λ] φ(t).

(c) Show that the solutions to the above differential equation are of the form φ(t) = A cos bt and φ(t) = B sin bt. Find an expression for b.
(d) Substitute the φ(t) from part c into the integral equation of part a to show that if φ(t) = A cos bt, then b is a root of tan bT = α/b, and if φ(t) = B sin bt, then b is a root of tan bT = −b/α.
(e) Find the values of A and B that normalize the eigenfunctions.
*(f) In order to show that the frequencies of the eigenfunctions are not harmonically related, plot the following three functions versus bT: tan bT, bT/αT, −αT/bT. The intersections of these functions yield the eigenvalues. Note that there are two roots per interval of length π.

*Section 9.10: Generating Random Processes

9.118. (a) Generate 10 realizations of the binomial counting process with p = 1/4, p = 1/2, and p = 3/4. For each value of p, plot the sample functions for n = 200 trials.
(b) Generate 50 realizations of the binomial counting process with p = 1/2. Find the sample mean and sample variance of the realizations for the first 200 trials.
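The binomial-counting simulations asked for above take only a few lines in practice. The following is a minimal sketch (numpy is assumed; the seed and array layout are illustrative choices, not from the text):

```python
import numpy as np

# Realizations of the binomial counting process S_n = I_1 + ... + I_n,
# where the I_j are iid Bernoulli(p) indicator random variables.
rng = np.random.default_rng(1)
p, n_trials, n_real = 0.5, 200, 50

steps = rng.random((n_real, n_trials)) < p   # Bernoulli(p) trials
S = np.cumsum(steps, axis=1)                 # one counting path per row

# Sample mean and variance across realizations at each time n;
# theory: E[S_n] = n*p and VAR[S_n] = n*p*(1 - p).
sample_mean = S.mean(axis=0)
sample_var = S.var(axis=0, ddof=1)
```

Plotting the rows of `S` gives the sample functions; comparing `sample_mean` and `sample_var` against np and np(1 − p) is the check the problem asks for.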
(c) In part b, find the histogram of increments in the process for the intervals [1, 50], [51, 100], [101, 150], and [151, 200]. Compare these histograms to the theoretical pmf. How would you check to see if the increments in the four intervals are stationary?
(d) Plot a scattergram of the pairs consisting of the increments in the intervals [1, 50] and [51, 100] in a given realization. Devise a test to check whether the increments in the two intervals are independent random variables.

9.119. Repeat Problem 9.118 for the random walk process with the same parameters.

9.120. Repeat Problem 9.118 for the sum process in Eq. (9.24) where the Xn are iid unit-variance Gaussian random variables with mean: m = 0; m = 0.5.

9.121. Repeat Problem 9.118 for the sum process in Eq. (9.24) where the Xn are iid Poisson random variables with α = 1.

9.122. Repeat Problem 9.118 for the sum process in Eq. (9.24) where the Xn are iid Cauchy random variables with α = 1.

9.123. Let Yn = aY_{n−1} + Xn, where Y₀ = 0.
(a) Generate five realizations of the process for a = 1/4, 1/2, 9/10 and with Xn given by the p = 1/2 and p = 1/4 random step process. Plot the sample functions for the first 200 steps. Find the sample mean and sample variance for the outcomes in each realization. Plot the histogram for the outcomes in each realization.
(b) Generate 50 realizations of the process Yn with a = 1/2, p = 1/4, and p = 1/2. Find the sample mean and sample variance of the realizations for the first 200 trials. Find the histogram of Yn across the realizations at times n = 5, n = 50, and n = 200.
(c) In part b, find the histogram of increments in the process for the intervals [1, 50], [51, 100], [101, 150], and [151, 200]. To what theoretical pmf should these histograms be compared? Should the increments in the process be stationary? Should the increments be independent?

9.124. Repeat Problem 9.123 for the sum process in Eq.
(9.24) where the Xn are iid unit-variance Gaussian random variables with mean: m = 0; m = 0.5.

9.125. (a) Propose a method for estimating the covariance function of the sum process in Problem 9.118. Do not assume that the process is wide-sense stationary.
(b) How would you check to see if the process is wide-sense stationary?
(c) Apply the methods in parts a and b to the experiment in Problem 9.118b.
(d) Repeat part c for Problem 9.123b.

9.126. Use the binomial process to approximate a Poisson random process with arrival rate λ = 1 customer per second in the time interval (0, 100]. Try different values of n and come up with a recommendation on how n should be selected.

9.127. Generate 100 repetitions of the experiment in Example 9.21.
(a) Find the relative frequency of the event {N(10) = 3 and N(60) − N(45) = 2} and compare it to the theoretical probability.
(b) Find the histogram of the time that elapses until the second arrival and compare it to the theoretical pdf. Plot the empirical cdf and compare it to the theoretical cdf.

9.128. Generate 100 realizations of the Poisson random process N(t) with arrival rate λ = 1 customer per second in the time interval (0, 10]. Generate the pair (N₁(t), N₂(t)) by assigning arrivals in N(t) to N₁(t) with probability p = 0.25 and to N₂(t) with probability 0.75.
(a) Find the histograms for N₁(10) and N₂(10) and compare them to the theoretical pmf by performing a chi-square goodness-of-fit test at a 5% significance level.
(b) Perform a chi-square goodness-of-fit test to test whether N₁(10) and N₂(10) are independent random variables. How would you check whether N₁(t) and N₂(t) are independent random processes?

9.129. Subscribers log on to a system according to a Poisson process with arrival rate λ = 1 customer per second. The ith customer remains logged on for a random duration of Ti seconds, where the Ti are iid random variables and are also independent of the arrival times.
(a) Generate the sequence Sn of customer arrival times and the corresponding departure times given by Dn = Sn + Tn, where the connection times are all equal to 1.
(b) Plot: A(t), the number of arrivals up to time t; D(t), the number of departures up to time t; and N(t) = A(t) − D(t), the number in the system at time t.
(c) Perform 100 simulations of the system operation for a duration of 200 seconds. Assume that customer connection times are exponential random variables with mean 5 seconds. Find the customer departure time instants and the associated departure counting process D(t). How would you check whether D(t) is a Poisson process? Find the histograms for D(t) and the number in the system N(t) at t = 50, 100, 150, 200. Try to fit a pmf to each histogram.
(d) Repeat part c if customer connection times are exactly 5 seconds long.

9.130. Generate 100 realizations of the Wiener process with α = 1 for the interval (0, 3.5) using the random walk limiting procedure.
(a) Find the histograms for increments in the intervals (0, 0.5], (0.5, 1.5], and (1.5, 3.5] and compare these to the theoretical pdf.
(b) Perform a test at a 5% significance level to determine whether the increments in the first two intervals are independent random variables.

9.131. Repeat Problem 9.130 using Gaussian-distributed increments to generate the Wiener process. Discuss how the increment interval in the simulation should be selected.

Problems Requiring Cumulative Knowledge

9.132. Let X(t) be a random process with independent increments. Assume that the increments X(t₂) − X(t₁) are gamma random variables with parameters λ > 0 and α = t₂ − t₁.
(a) Find the joint density function of X(t₁) and X(t₂).
(b) Find the autocorrelation function of X(t).
(c) Is X(t) mean square continuous?
(d) Does X(t) have a mean square derivative?

9.133. Let X(t) be the pulse amplitude modulation process introduced in Example 9.38 with T = 1.
A phase-modulated process is defined by

Y(t) = a cos(2πt + (π/2)X(t)).

(a) Plot the sample function of Y(t) corresponding to the binary sequence 0010110.
(b) Find the joint pdf of Y(t₁) and Y(t₂).
(c) Find the mean and autocorrelation functions of Y(t).
(d) Is Y(t) a stationary, wide-sense stationary, or cyclostationary random process?
(e) Is Y(t) mean square continuous?
(f) Does Y(t) have a mean square derivative? If so, find its mean and autocorrelation functions.

9.134. Let N(t) be the Poisson process, and suppose we form the phase-modulated process

Y(t) = a cos(2πft + πN(t)).

(a) Plot a sample function of Y(t) corresponding to a typical sample function of N(t).
(b) Find the joint density function of Y(t₁) and Y(t₂). Hint: Use the independent increments property of N(t).
(c) Find the mean and autocorrelation functions of Y(t).
(d) Is Y(t) a stationary, wide-sense stationary, or cyclostationary random process?
(e) Is Y(t) mean square continuous?
(f) Does Y(t) have a mean square derivative? If so, find its mean and autocorrelation functions.

9.135. Let X(t) be a train of amplitude-modulated pulses with occurrences according to a Poisson process:

X(t) = Σ_{k=1}^{∞} A_k h(t − S_k),

where the A_k are iid random variables, the S_k are the event occurrence times in a Poisson process, and h(t) is a function of time. Assume the amplitudes and occurrence times are independent.
(a) Find the mean and autocorrelation functions of X(t).
(b) Evaluate part a when h(t) = u(t), a unit step function.
(c) Evaluate part a when h(t) = p(t), a rectangular pulse of duration T seconds.

9.136. Consider a linear combination of two sinusoids:

X(t) = A₁ cos(ω₀t + Θ₁) + A₂ cos(√2 ω₀t + Θ₂),

where Θ₁ and Θ₂ are independent uniform random variables in the interval (0, 2π), and A₁ and A₂ are jointly Gaussian random variables. Assume that the amplitudes are independent of the phase random variables.
(a) Find the mean and autocorrelation functions of X(t).
(b) Is X(t) mean square periodic? If so, what is the period?
(c) Find the joint pdf of X(t₁) and X(t₂).

9.137. (a) A Gauss-Markov random process is a Gaussian random process that is also a Markov process. Show that the autocovariance function of such a process must satisfy

C_X(t₃, t₁) = C_X(t₃, t₂)C_X(t₂, t₁) / C_X(t₂, t₂),  where t₁ ≤ t₂ ≤ t₃.

(b) It can be shown that if the autocovariance of a Gaussian random process satisfies the above equation, then the process is Gauss-Markov. Is the Wiener process Gauss-Markov? Is the Ornstein-Uhlenbeck process Gauss-Markov?

9.138. Let An and Bn be two independent stationary random processes. Suppose that An and Bn are zero-mean, Gaussian random processes with autocorrelation functions

R_A(k) = σ₁²ρ₁^{|k|},  R_B(k) = σ₂²ρ₂^{|k|}.

A block multiplexer takes blocks of two from the above processes and interleaves them to form the random process Ym:

A₁A₂B₁B₂A₃A₄B₃B₄A₅A₆B₅B₆ …

(a) Find the autocorrelation function of Ym.
(b) Is Ym cyclostationary? Wide-sense stationary?
(c) Find the joint pdf of Ym and Y_{m+1}.
(d) Let Zm = Y_{m+T}, where T is selected uniformly from the set {0, 1, 2, 3}. Repeat parts a, b, and c for Zm.

9.139. Let An be the Gaussian random process in Problem 9.138. A decimator takes every other sample to form the random process Vm:

A₁A₃A₅A₇A₉A₁₁ …

(a) Find the autocorrelation function of Vm.
(b) Find the joint pdf of Vm and V_{m+k}.
(c) An interpolator takes the sequence Vm and inserts zeros between samples to form the sequence Wk:

A₁ 0 A₃ 0 A₅ 0 A₇ 0 A₉ 0 A₁₁ …

Find the autocorrelation function of Wk. Is Wk a Gaussian random process?

9.140. Let An be a sequence of zero-mean, unit-variance independent Gaussian random variables. A block coder takes pairs of A's and linearly transforms them to form the sequence Yn:

[Y_{2n}; Y_{2n+1}] = (1/√2) [1 1; 1 −1] [A_{2n}; A_{2n+1}].

(a) Find the autocorrelation function of Yn.
(b) Is Yn stationary in any sense?
(c) Find the joint pdf of Yn, Y_{n+1}, and Y_{n+2}.

9.141. Suppose customer orders arrive according to a Bernoulli random process with parameter p. When an order arrives, its size is an exponential random variable with parameter λ. Let Sn be the total size of all orders up to time n.
(a) Find the mean and autocorrelation functions of Sn.
(b) Is Sn a stationary random process?
(c) Is Sn a Markov process?
(d) Find the joint pdf of Sn and S_{n+k}.

CHAPTER 10 Analysis and Processing of Random Signals

In this chapter we introduce methods for analyzing and processing random signals. We cover the following topics:
• Section 10.1 introduces the notion of power spectral density, which allows us to view random processes in the frequency domain.
• Section 10.2 discusses the response of linear systems to random process inputs and introduces methods for filtering random processes.
• Section 10.3 considers two important applications of signal processing: sampling and modulation.
• Sections 10.4 and 10.5 discuss the design of optimum linear systems and introduce the Wiener and Kalman filters.
• Section 10.6 addresses the problem of estimating the power spectral density of a random process.
• Finally, Section 10.7 introduces methods for implementing and simulating the processing of random signals.

10.1 POWER SPECTRAL DENSITY

The Fourier series and the Fourier transform allow us to view deterministic time functions as the weighted sum or integral of sinusoidal functions. A time function that varies slowly has the weighting concentrated at the low-frequency sinusoidal components. A time function that varies rapidly has the weighting concentrated at higher-frequency components. Thus the rate at which a deterministic time function varies is related to the weighting function of the Fourier series or transform. This weighting function is called the "spectrum" of the time function.
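The low-frequency versus high-frequency weighting described above is easy to see numerically. The following is a brief illustrative sketch (the signals and sample rate are arbitrary choices, not from the text):

```python
import numpy as np

# A slowly varying time function concentrates its Fourier weighting at
# low frequencies; a rapidly varying one at high frequencies.
fs = 100                                   # samples/second, 1-second record
t = np.arange(fs) / fs
slow = np.cos(2 * np.pi * 3 * t)           # 3 Hz component
fast = np.cos(2 * np.pi * 30 * t)          # 30 Hz component

spectrum_slow = np.abs(np.fft.rfft(slow))
spectrum_fast = np.abs(np.fft.rfft(fast))
f_axis = np.fft.rfftfreq(fs, d=1 / fs)     # 0, 1, ..., 50 Hz

peak_slow = f_axis[np.argmax(spectrum_slow)]   # weighting peaks at 3 Hz
peak_fast = f_axis[np.argmax(spectrum_fast)]   # weighting peaks at 30 Hz
```

The magnitude of the transform is the "weighting function": for the slow signal it is concentrated at 3 Hz, and for the fast signal at 30 Hz.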
The notion of a time function as being composed of sinusoidal components is also very useful for random processes. However, since a sample function of a random process can be viewed as being selected from an ensemble of allowable time functions, the weighting function or "spectrum" for a random process must refer in some way to the average rate of change of the ensemble of allowable time functions. Equation (9.66) shows that, for wide-sense stationary processes, the autocorrelation function R_X(τ) is an appropriate measure for the average rate of change of a random process. Indeed, if a random process changes slowly with time, then it remains correlated with itself for a long period of time, and R_X(τ) decreases slowly as a function of τ. On the other hand, a rapidly varying random process quickly becomes uncorrelated with itself, and R_X(τ) decreases rapidly with τ.

We now present the Einstein-Wiener-Khinchin theorem, which states that the power spectral density of a wide-sense stationary random process is given by the Fourier transform of the autocorrelation function.¹

¹ This result is usually called the Wiener-Khinchin theorem, after Norbert Wiener and A. Ya. Khinchin, who proved the result in the early 1930s. Later it was discovered that this result was stated by Albert Einstein in a 1914 paper (see Einstein).

10.1.1 Continuous-Time Random Processes

Let X(t) be a continuous-time WSS random process with mean m_X and autocorrelation function R_X(τ). Suppose we take the Fourier transform of a sample of X(t) in the interval 0 < t < T as follows:

x̃(f) = ∫₀ᵀ X(t′) e^{−j2πft′} dt′.   (10.1)

We then approximate the power density as a function of frequency by the function:

p̃_T(f) = (1/T)|x̃(f)|² = (1/T) x̃(f) x̃*(f)
        = (1/T) {∫₀ᵀ X(t′) e^{−j2πft′} dt′} {∫₀ᵀ X(t′) e^{j2πft′} dt′},   (10.2)

where * denotes the complex conjugate. X(t) is a random process, so p̃_T(f) is also a random process, but over a different index set. p̃_T(f) is called the periodogram estimate, and we are interested in the power spectral density of X(t), which is defined by:

S_X(f) = lim_{T→∞} E[p̃_T(f)] = lim_{T→∞} (1/T) E[|x̃(f)|²].   (10.3)

We show at the end of this section that the power spectral density of X(t) is given by the Fourier transform of the autocorrelation function:

S_X(f) = F{R_X(τ)} = ∫_{−∞}^{∞} R_X(τ) e^{−j2πfτ} dτ.   (10.4)

A table of Fourier transforms and their properties is given in Appendix B.

For real-valued random processes, the autocorrelation function is an even function of τ:

R_X(τ) = R_X(−τ).   (10.5)

Substitution into Eq. (10.4) implies that

S_X(f) = ∫_{−∞}^{∞} R_X(τ){cos 2πfτ − j sin 2πfτ} dτ = ∫_{−∞}^{∞} R_X(τ) cos 2πfτ dτ,   (10.6)

since the integral of the product of an even function (R_X(τ)) and an odd function (sin 2πfτ) is zero. Equation (10.6) implies that S_X(f) is real-valued and an even function of f. From Eq. (10.2) we have that S_X(f) is nonnegative:

S_X(f) ≥ 0  for all f.   (10.7)

The autocorrelation function can be recovered from the power spectral density by applying the inverse Fourier transform formula to Eq. (10.4):

R_X(τ) = F⁻¹{S_X(f)} = ∫_{−∞}^{∞} S_X(f) e^{j2πfτ} df.   (10.8)

Equation (10.8) is identical to Eq. (4.80), which relates the pdf to its corresponding characteristic function. The last section in this chapter discusses how the FFT can be used to perform numerical calculations for S_X(f) and R_X(τ).
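The numerical route from R_X(τ) to S_X(f) mentioned above can be sketched directly from Eq. (10.4) with a Riemann sum. Here is a minimal check (not from the text) using the exponential autocorrelation R_X(τ) = e^{−2α|τ|} of the random telegraph signal from Example 9.24, whose Fourier transform is the standard pair 4α/(4α² + (2πf)²); the step size and truncation interval are illustrative choices:

```python
import numpy as np

# Numerically evaluate S_X(f) = integral of R_X(tau) e^{-j 2 pi f tau} dtau
# for R_X(tau) = exp(-2*alpha*|tau|), and compare with the closed form
# 4*alpha / (4*alpha**2 + (2*pi*f)**2).
alpha = 1.0
dt = 0.001
tau = np.arange(-10.0, 10.0, dt)            # truncate the infinite integral
R = np.exp(-2 * alpha * np.abs(tau))

f = np.array([0.0, 0.5, 1.0])               # a few test frequencies
S_num = (R @ np.exp(-2j * np.pi * np.outer(tau, f))).real * dt
S_theory = 4 * alpha / (4 * alpha**2 + (2 * np.pi * f)**2)
```

For a dense grid of frequencies one would use the FFT instead of the explicit sum, as the text's last section discusses; the Riemann sum above keeps the correspondence with Eq. (10.4) explicit.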
Since the autocorrelation and autocovariance functions are related by RX1t2 = CX1t2 + m2X , the power spectral density is also given by SX1f2 = f5CX1t2 + m2X6 = f5CX1t26 + m2X d1f2, (10.10) where we have used the fact that the Fourier transform of a constant is a delta function. We say the mX is the “dc” component of X(t). The notion of power spectral density can be generalized to two jointly wide-sense stationary processes. The cross-power spectral density SX,Y1 f 2 is defined by SX,Y1f2 = f5RX,Y1t26, (10.11) 2 If X(t) is a voltage or current developed across a 1-ohm resistor, then X21t2 is the instantaneous power absorbed by the resistor. 580 Chapter 10 Analysis and Processing of Random Signals SX( f ) 1 a⫽1 a⫽2 ⫺5 ⫺4 p p ⫺3 p ⫺2 p ⫺1 p 0 1 p 2 p 3 p 4 p 5 p f FIGURE 10.1 Power spectral density of a random telegraph signal with a = 1 and a = 2 transitions per second. where RX,Y1t2 is the cross-correlation between X(t) and Y(t): RX,Y1t2 = E3X1t + t2Y1t24. (10.12) In general, SX,Y1f2 is a complex function of f even if X(t) and Y(t) are both real-valued. Example 10.1 Random Telegraph Signal Find the power spectral density of the random telegraph signal. In Example 9.24, the autocorrelation function of the random telegraph process was found to be RX1t2 = e -2aƒtƒ, where a is the average transition rate of the signal. Therefore, the power spectral density of the process is SX1f2 = 0 L- q e2ate -j2pft dt + L0 q e -2ate -j2pft dt = 1 1 + 2a - j2pf 2a + j2pf = 4a . 4a + 4p2f2 2 (10.13) Figure 10.1 shows the power spectral density for a = 1 and a = 2 transitions per second. The process changes two times more quickly when a = 2; it can be seen from the figure that the power spectral density for a = 2 has greater high-frequency content. Example 10.2 Sinusoid with Random Phase Let X1t2 = a cos12pf0t + ®2, where ® is uniformly distributed in the interval 10, 2p2. Find SX1f2. 
From Example 9.10, the autocorrelation for X(t) is

R_X(τ) = (a²/2) cos 2πf₀τ.

Thus, the power spectral density is

S_X(f) = (a²/2) F{cos 2πf₀τ} = (a²/4) δ(f − f₀) + (a²/4) δ(f + f₀),   (10.14)

where we have used the table of Fourier transforms in Appendix B. The signal has average power R_X(0) = a²/2. All of this power is concentrated at the frequencies ±f₀, so the power density at these frequencies is infinite.

Example 10.3 White Noise

The power spectral density of a WSS white noise process whose frequency components are limited to the range −W ≤ f ≤ W is shown in Fig. 10.2(a). The process is said to be "white" in analogy to white light, which contains all frequencies in equal amounts. The average power in this process is obtained from Eq. (10.9):

E[X²(t)] = ∫_{−W}^{W} (N₀/2) df = N₀W.   (10.15)

The autocorrelation for this process is obtained from Eq. (10.8):

R_X(τ) = (N₀/2) ∫_{−W}^{W} e^{j2πfτ} df
       = (N₀/2) (e^{j2πWτ} − e^{−j2πWτ})/(j2πτ)
       = N₀ sin(2πWτ)/(2πτ).   (10.16)

R_X(τ) is shown in Fig. 10.2(b). Note that X(t) and X(t + τ) are uncorrelated at τ = ±k/2W, k = 1, 2, ….

FIGURE 10.2 Bandlimited white noise: (a) power spectral density, (b) autocorrelation function.

The term white noise usually refers to a random process W(t) whose power spectral density is N₀/2 for all frequencies:

S_W(f) = N₀/2  for all f.   (10.17)

Equation (10.15) with W = ∞ shows that such a process must have infinite average power. By taking the limit W → ∞ in Eq. (10.16), we find that the autocorrelation of such a process approaches

R_W(τ) = (N₀/2) δ(τ).   (10.18)

If W(t) is a Gaussian random process, we then see that W(t) is the white Gaussian noise process introduced in Example 9.43 with α = N₀/2.

Example 10.4 Sum of Two Processes

Find the power spectral density of Z(t) = X(t) + Y(t), where X(t) and Y(t) are jointly WSS processes.
The autocorrelation of Z(t) is RZ1t2 = E3Z1t + t2Z1t24 = E31X1t + t2 + Y1t + t221X1t2 + Y1t224 = RX1t2 + RYX1t2 + RXY1t2 + RY1t2. The power spectral density is then SZ1f2 = f5RX1t2 + RYX1t2 + RXY1t2 + RY1t26 = SX1f2 + SYX1f2 + SXY1f2 + SY1f2. (10.19) Example 10.5 Let Y1t2 = X1t - d2, where d is a constant delay and where X(t) is WSS. Find RYX1t2, SYX1f2, RY1t2, and SY1f2. Section 10.1 Power Spectral Density 583 The definitions of RYX1t2, SYX1f2, and RY1t2 give RYX1t2 = E3Y1t + t2X1t24 = E3X1t + t - d2X1t24 = RX1t - d2. (10.20) The time-shifting property of the Fourier transform gives SYX1f2 = f5RX1t - d26 = SX1f2e -j2pfd = SX1f2 cos12pfd2 - jSX1f2 sin12pfd2. (10.21) Finally, RY1t2 = E3Y1t + t2Y1t24 = E3X1t + t - d2X1t - d24 = RX1t2. (10.22) Equation (10.22) implies that SY1f2 = f5RY1T26 = f5RX1T26 = SX1f2. (10.23) Note from Eq. (10.21) that the cross-power spectral density is complex. Note from Eq. (10.23) that SX1f2 = SY1f2 despite the fact that X1t2 Z Y1t2. Thus, SX1f2 = SY1f2 does not imply that X1t2 = Y1t2. 10.1.2 Discrete-Time Random Processes Let Xn be a discrete-time WSS random process with mean mX and autocorrelation function RX1k2. The power spectral density of Xn is defined as the Fourier transform of the autocorrelation sequence SX1f2 = f5RX1k26 = a RX1k2e -j2pfk. q q (10.24) k=- Note that we need only consider frequencies in the range -1>2 6 f … 1>2, since SX1f2 is periodic in f with period 1. As in the case of continuous random processes, SX1f2 can be shown to be a real-valued, nonnegative, even function of f. The inverse Fourier transform formula applied to Eq. (10.23) implies that3 RX1k2 = 1>2 L-1>2 SX1f2ej2pfk df. (10.25) Equations (10.24) and (10.25) are similar to the discrete Fourier transform. In the last section we show how to use the FFT to calculate SX1f2 and RX1k2. The cross-power spectral density SX, Y1f 2 of two jointly WSS discrete-time processes Xn and Yn is defined by SX,Y1f2 = f5RX,Y1k26, (10.26) RX,Y1k2 = E3Xn + kYn4. 
(10.27) where RX,Y1k2 is the cross-correlation between Xn and Yn : You can view RX1k2 as the coefficients of the Fourier series of the periodic function SX1f2. 3 584 Chapter 10 Analysis and Processing of Random Signals Example 10.6 White Noise Let the process Xn be a sequence of uncorrelated random variables with zero mean and variance s2X . Find SX1f2. The autocorrelation of this process is RX1k2 = b s2X 0 k = 0 k Z 0. The power spectral density of the process is found by substituting RX1k2 into Eq. (10.24): SX1f2 = s2X - 1 1 6 f 6 . 2 2 (10.28) Thus the process Xn contains all possible frequencies in equal measure. Example 10.7 Moving Average Process Let the process Yn be defined by Yn = Xn + aXn - 1 , (10.29) where Xn is the white noise process of Example 10.6. Find SY1f2. It is easily shown that the mean and autocorrelation of Yn are given by E3Yn4 = 0, and 11 + a22s2X E3YnYn + k4 = c as2X 0 k = 0 k = ;1 otherwise. (10.30) The power spectral density is then SY1f2 = 11 + a22s2X + as2X5ej2pf + e -j2pf6 = s2X511 + a22 + 2a cos 2pf6. (10.31) SY1f2 is shown in Fig. 10.3 for a = 1. Example 10.8 Signal Plus Noise Let the observation Zn be given by Zn = Xn + Yn , where Xn is the signal we wish to observe, Yn is a white noise process with power s2Y , and Xn and Yn are independent random processes. Suppose further that Xn = A for all n, where A is a random variable with zero mean and variance s2A . Thus Zn represents a sequence of noisy measurements of the random variable A. Find the power spectral density of Zn . The mean and autocorrelation of Zn are E3Zn4 = E3A4 + E3Yn4 = 0 Section 10.1 Power Spectral Density 585 SY ( f ) 4σX 2 ⫺1 ⫺ 1 2 0 1 2 1 f FIGURE 10.3 Power spectral density of moving average process discussed in Example 10.7. and E3ZnZn + k4 = E31Xn + Yn21Xn + k + Yn + k24 = E3XnXn + k4 + E3Xn4E3Yn + k4 + E3Xn + k4E3Yn4 + E3YnYn + k4 = E3A24 + RY1k2. Thus Zn is also a WSS process. 
The power spectral density of $Z_n$ is then

$S_Z(f) = E[A^2]\delta(f) + S_Y(f)$,

where we have used the fact that the Fourier transform of a constant is a delta function.

10.1.3 Power Spectral Density as a Time Average

In the above discussion, we simply stated that the power spectral density is the Fourier transform of the autocorrelation without supplying a proof. We now show how the power spectral density arises naturally when we take Fourier transforms of realizations of random processes.
Let $X_0, \ldots, X_{k-1}$ be k observations from the discrete-time, WSS process $X_n$. Let $\tilde{x}_k(f)$ denote the discrete Fourier transform of this sequence:

$\tilde{x}_k(f) = \sum_{m=0}^{k-1} X_me^{-j2\pi fm}$.  (10.32)

Note that $\tilde{x}_k(f)$ is a complex-valued random variable. The magnitude squared of $\tilde{x}_k(f)$ is a measure of the "energy" at the frequency f. If we divide this energy by the total "time" k, we obtain an estimate for the "power" at the frequency f:

$\tilde{p}_k(f) = \frac{1}{k}|\tilde{x}_k(f)|^2$.  (10.33)

$\tilde{p}_k(f)$ is called the periodogram estimate for the power spectral density.

Consider the expected value of the periodogram estimate:

$E[\tilde{p}_k(f)] = \frac{1}{k}E[\tilde{x}_k(f)\tilde{x}_k^*(f)] = \frac{1}{k}E\left[\sum_{m=0}^{k-1} X_me^{-j2\pi fm}\sum_{i=0}^{k-1} X_ie^{j2\pi fi}\right] = \frac{1}{k}\sum_{m=0}^{k-1}\sum_{i=0}^{k-1} E[X_mX_i]e^{-j2\pi f(m-i)} = \frac{1}{k}\sum_{m=0}^{k-1}\sum_{i=0}^{k-1} R_X(m-i)e^{-j2\pi f(m-i)}$.  (10.34)

Figure 10.4 shows the range of the double summation in Eq. (10.34). Note that all the terms along the diagonal $m' = m - i$ are equal, that $m'$ ranges from $-(k-1)$ to $k-1$, and that there are $k - |m'|$ terms along the diagonal $m' = m - i$. Thus Eq. (10.34) becomes

$E[\tilde{p}_k(f)] = \frac{1}{k}\sum_{m'=-(k-1)}^{k-1}\{k - |m'|\}R_X(m')e^{-j2\pi fm'} = \sum_{m'=-(k-1)}^{k-1}\left\{1 - \frac{|m'|}{k}\right\}R_X(m')e^{-j2\pi fm'}$.  (10.35)

Comparison of Eq. (10.35) with Eq. (10.24) shows that the mean of the periodogram estimate is not equal to $S_X(f)$ for two reasons. First, Eq. (10.24) does not contain the factor in braces that appears in Eq. (10.35). Second, the limits of the summation in Eq.
(10.35) are not $\pm\infty$. We say that $\tilde{p}_k(f)$ is a "biased" estimator for $S_X(f)$. However, as $k \to \infty$, we see that the factor in braces approaches one and that the limits of the summation approach $\pm\infty$. Thus

$E[\tilde{p}_k(f)] \to S_X(f)$ as $k \to \infty$,  (10.36)

that is, the mean of the periodogram estimate does indeed approach $S_X(f)$. Note that Eq. (10.36) shows that $S_X(f)$ is nonnegative for all f, since $\tilde{p}_k(f)$ is nonnegative for all f.

[Figure 10.4: Range of summation in Eq. (10.34).]

In order to be useful, the variance of the periodogram estimate should also approach zero. Whether it does requires looking more closely at the problem of power spectral density estimation, so we defer this topic to Section 10.6.
All of the above results hold for a continuous-time WSS random process X(t) after appropriate changes are made from summations to integrals. The periodogram estimate for $S_X(f)$, for an observation in the interval $0 < t < T$, was defined in Eq. (10.2). The same derivation that led to Eq. (10.35) can be used to show that the mean of the periodogram estimate is given by

$E[\tilde{p}_T(f)] = \int_{-T}^{T}\left\{1 - \frac{|\tau|}{T}\right\}R_X(\tau)e^{-j2\pi f\tau}\,d\tau$.  (10.37a)

It then follows that

$E[\tilde{p}_T(f)] \to S_X(f)$ as $T \to \infty$.  (10.37b)

10.2 RESPONSE OF LINEAR SYSTEMS TO RANDOM SIGNALS

Many applications involve the processing of random signals (i.e., random processes) in order to achieve certain ends. For example, in prediction we are interested in predicting future values of a signal in terms of past values. In filtering and smoothing we are interested in recovering signals that have been corrupted by noise. In modulation we are interested in converting low-frequency information signals into high-frequency transmission signals that propagate more readily through various transmission media.
Signal processing involves converting a signal from one form into another.
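Equation (10.35) makes the bias explicit, and for a process with short memory it can be evaluated exactly. The sketch below (Python, using the moving average process of Example 10.7 as an assumed test case) compares $E[\tilde{p}_k(f)]$ with $S_X(f)$ and shows the bias vanishing as k grows:

```python
import math

a = 0.5                                   # moving-average coefficient (assumed)

def S(f):                                 # true PSD, Eq. (10.31) with unit noise power
    return (1 + a * a) + 2 * a * math.cos(2 * math.pi * f)

def mean_periodogram(f, k):
    # E[p_k(f)] = sum_{m=-(k-1)}^{k-1} (1 - |m|/k) R(m) e^{-j 2 pi f m}, Eq. (10.35);
    # R is even, so the +-m pairs combine into cosines
    R = {0: 1 + a * a, 1: a, -1: a}       # nonzero lags of the MA process
    return sum((1 - abs(m) / k) * r * math.cos(2 * math.pi * f * m)
               for m, r in R.items() if abs(m) < k)

f = 0.15
bias_short = abs(mean_periodogram(f, 4) - S(f))
bias_long = abs(mean_periodogram(f, 1000) - S(f))
assert bias_long < bias_short     # the bias shrinks as the record length k grows
assert bias_long < 1e-2
```

For this process the bias at lag 1 is exactly $2a\cos(2\pi f)/k$, so it decays as 1/k, consistent with Eq. (10.36).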
Thus a signal processing method is simply a transformation or mapping from one time function into another function. If the input to the transformation is a random process, then the output will also be a random process. In the next two sections, we are interested in determining the statistical properties of the output process when the input is a wide-sense stationary random process.

10.2.1 Continuous-Time Systems

Consider a system in which an input signal x(t) is mapped into the output signal y(t) by the transformation $y(t) = T[x(t)]$. The system is linear if superposition holds, that is,

$T[ax_1(t) + bx_2(t)] = aT[x_1(t)] + bT[x_2(t)]$,

where $x_1(t)$ and $x_2(t)$ are arbitrary input signals, and a and b are arbitrary constants.⁴ Let y(t) be the response to input x(t); then the system is said to be time-invariant if the response to $x(t-\tau)$ is $y(t-\tau)$. The impulse response h(t) of a linear, time-invariant system is defined by $h(t) = T[\delta(t)]$, where $\delta(t)$ is a unit delta function input applied at $t = 0$. The response of the system to an arbitrary input x(t) is then

$y(t) = h(t) * x(t) = \int_{-\infty}^{\infty} h(s)x(t-s)\,ds = \int_{-\infty}^{\infty} h(t-s)x(s)\,ds$.  (10.38)

Therefore a linear, time-invariant system is completely specified by its impulse response. The impulse response h(t) can also be specified by giving its Fourier transform, the transfer function of the system:

$H(f) = \mathcal{F}\{h(t)\} = \int_{-\infty}^{\infty} h(t)e^{-j2\pi ft}\,dt$.  (10.39)

A system is said to be causal if the response at time t depends only on past values of the input, that is, if $h(t) = 0$ for $t < 0$.
If the input to a linear, time-invariant system is a random process X(t) as shown in Fig. 10.5, then the output of the system is the random process given by

$Y(t) = \int_{-\infty}^{\infty} h(s)X(t-s)\,ds = \int_{-\infty}^{\infty} h(t-s)X(s)\,ds$.  (10.40)

We assume that the integrals exist in the mean square sense as discussed in Section 9.7.
We now show that if X(t) is a wide-sense stationary process, then Y(t) is also wide-sense stationary.⁵ The mean of Y(t) is given by

$E[Y(t)] = E\left[\int_{-\infty}^{\infty} h(s)X(t-s)\,ds\right] = \int_{-\infty}^{\infty} h(s)E[X(t-s)]\,ds$.

[Figure 10.5: A linear system with a random input signal: X(t) into h(t), producing Y(t).]

⁴For examples of nonlinear systems see Problems 9.11 and 9.56.
⁵Equation (10.40) supposes that the input was applied at an infinite time in the past. If the input is applied at $t = 0$, then Y(t) is not wide-sense stationary. However, it becomes wide-sense stationary as the response reaches "steady state" (see Example 9.46 and Problem 10.29).

Now $E[X(t-s)] = m_X$ since X(t) is wide-sense stationary, so

$E[Y(t)] = m_X\int_{-\infty}^{\infty} h(t)\,dt = m_XH(0)$,  (10.41)

where H(f) is the transfer function of the system. Thus the mean of the output Y(t) is the constant $m_Y = H(0)m_X$.
The autocorrelation of Y(t) is given by

$E[Y(t)Y(t+\tau)] = E\left[\int_{-\infty}^{\infty} h(s)X(t-s)\,ds\int_{-\infty}^{\infty} h(r)X(t+\tau-r)\,dr\right] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(s)h(r)E[X(t-s)X(t+\tau-r)]\,ds\,dr = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(s)h(r)R_X(\tau+s-r)\,ds\,dr$,  (10.42)

where we have used the fact that X(t) is wide-sense stationary. The expression on the right-hand side of Eq. (10.42) depends only on $\tau$. Thus the autocorrelation of Y(t) depends only on $\tau$, and since E[Y(t)] is a constant, we conclude that Y(t) is a wide-sense stationary process.
We are now ready to compute the power spectral density of the output of a linear, time-invariant system. Taking the transform of $R_Y(\tau)$ as given in Eq. (10.42), we obtain

$S_Y(f) = \int_{-\infty}^{\infty} R_Y(\tau)e^{-j2\pi f\tau}\,d\tau = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(s)h(r)R_X(\tau+s-r)e^{-j2\pi f\tau}\,ds\,dr\,d\tau$.

Change variables, letting $u = \tau + s - r$:

$S_Y(f) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} h(s)h(r)R_X(u)e^{-j2\pi f(u-s+r)}\,ds\,dr\,du = \int_{-\infty}^{\infty} h(s)e^{j2\pi fs}\,ds\int_{-\infty}^{\infty} h(r)e^{-j2\pi fr}\,dr\int_{-\infty}^{\infty} R_X(u)e^{-j2\pi fu}\,du = H^*(f)H(f)S_X(f) = |H(f)|^2S_X(f)$,  (10.43)

where we have used the definition of the transfer function.
Equation (10.43) relates the input and output power spectral densities to the system transfer function. Note that $R_Y(\tau)$ can also be found by computing the right-hand side of Eq. (10.43) and then taking the inverse Fourier transform.
Equations (10.41) through (10.43) only enable us to determine the mean and autocorrelation function of the output process Y(t). In general this is not enough to determine probabilities of events involving Y(t). However, if the input process is a Gaussian WSS random process, then as discussed in Section 9.7 the output process will also be a Gaussian WSS random process. Thus the mean and autocorrelation function provided by Eqs. (10.41) through (10.43) are enough to determine all joint pdf's involving the Gaussian random process Y(t).
The cross-correlation between the input and output processes is also of interest:

$R_{Y,X}(\tau) = E[Y(t+\tau)X(t)] = E\left[X(t)\int_{-\infty}^{\infty} X(t+\tau-r)h(r)\,dr\right] = \int_{-\infty}^{\infty} E[X(t)X(t+\tau-r)]h(r)\,dr = \int_{-\infty}^{\infty} R_X(\tau-r)h(r)\,dr = R_X(\tau) * h(\tau)$.  (10.44)

By taking the Fourier transform, we obtain the cross-power spectral density:

$S_{Y,X}(f) = H(f)S_X(f)$.  (10.45a)

Since $R_{X,Y}(\tau) = R_{Y,X}(-\tau)$, we have that

$S_{X,Y}(f) = S^*_{Y,X}(f) = H^*(f)S_X(f)$.  (10.45b)

Example 10.9 Filtered White Noise
Find the power spectral density of the output of a linear, time-invariant system whose input is a white noise process.
Let X(t) be the input process with power spectral density $S_X(f) = N_0/2$ for all f. The power spectral density of the output Y(t) is then

$S_Y(f) = |H(f)|^2\frac{N_0}{2}$.  (10.46)

Thus the transfer function completely determines the shape of the power spectral density of the output process.

Example 10.9 provides us with a method for generating WSS processes with arbitrary power spectral density $S_Y(f)$. We simply need to filter white noise through a filter with transfer function $H(f) = \sqrt{S_Y(f)}$. In general this filter will be noncausal.
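As a quick numeric sanity check of the factorization idea: the Lorentzian spectrum $S_Y(f) = \sigma^2/(\alpha^2 + 4\pi^2 f^2)$ factors as $H(f)H^*(f)\sigma^2$ with the causal choice $H(f) = 1/(\alpha + j2\pi f)$. A sketch in Python (the values of $\alpha$ and $\sigma^2$ are arbitrary):

```python
import cmath
import math

sigma2, alpha = 2.0, 1.5          # arbitrary noise power and filter parameter

def S_Y(f):
    # target Lorentzian PSD: sigma^2 / (alpha^2 + 4 pi^2 f^2)
    return sigma2 / (alpha**2 + (2 * math.pi * f)**2)

def H(f):
    # causal spectral factor: H(f) = 1/(alpha + j 2 pi f), i.e., h(t) = e^{-alpha t}, t >= 0
    return 1.0 / (alpha + 2j * math.pi * f)

for f in (-3.0, -0.5, 0.0, 0.7, 2.0):
    assert abs(S_Y(f) - abs(H(f))**2 * sigma2) < 1e-12   # S_Y = |H|^2 sigma^2
```

The factorization holds exactly because $|H(f)|^2 = 1/(\alpha^2 + 4\pi^2 f^2)$; feeding white noise of power $\sigma^2$ through this H(f) yields the target spectrum.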
We can usually, but not always, obtain a causal filter with transfer function H(f) such that $S_Y(f) = H(f)H^*(f)$. For example, if $S_Y(f)$ is a rational function, that is, if it consists of the ratio of two polynomials, then it is easy to factor $S_Y(f)$ into the above form, as shown in the next example. Furthermore, any power spectral density can be approximated by a rational function. Thus filtered white noise can be used to synthesize WSS random processes with arbitrary power spectral densities, and hence arbitrary autocorrelation functions.

Example 10.10 Ornstein-Uhlenbeck Process
Find the impulse response of a causal filter that can be used to generate a Gaussian random process with output power spectral density and autocorrelation function

$S_Y(f) = \frac{\sigma^2}{\alpha^2 + 4\pi^2f^2}$ and $R_Y(\tau) = \frac{\sigma^2}{2\alpha}e^{-\alpha|\tau|}$.

This power spectral density factors as follows:

$S_Y(f) = \frac{1}{\alpha - j2\pi f}\,\frac{1}{\alpha + j2\pi f}\,\sigma^2$.

If we let the filter transfer function be $H(f) = 1/(\alpha + j2\pi f)$, then the impulse response is $h(t) = e^{-\alpha t}$ for $t \ge 0$, which is the response of a causal system. Thus if we filter white Gaussian noise with power spectral density $\sigma^2$ using the above filter, we obtain a process with the desired power spectral density.
In Example 9.46, we found the autocorrelation function of the transient response of this filter for a white Gaussian noise input (see Eq. (9.97a)). As was already indicated, when dealing with power spectral densities we assume that the processes are in steady state. Thus as $t \to \infty$ Eq. (9.97a) approaches Eq. (9.97b).

Example 10.11 Ideal Filters
Let $Z(t) = X(t) + Y(t)$, where X(t) and Y(t) are independent random processes with power spectral densities shown in Fig. 10.6(a). Find the output if Z(t) is input into an ideal lowpass filter with transfer function shown in Fig. 10.6(b). Find the output if Z(t) is input into an ideal bandpass filter with transfer function shown in Fig. 10.6(c).
The power spectral density of the output W(t) of the lowpass filter is

$S_W(f) = |H_{LP}(f)|^2S_X(f) + |H_{LP}(f)|^2S_Y(f) = S_X(f)$,

since $H_{LP}(f) = 1$ for the frequencies where $S_X(f)$ is nonzero, and $H_{LP}(f) = 0$ where $S_Y(f)$ is nonzero. Thus W(t) has the same power spectral density as X(t). As indicated in Example 10.5, this does not imply that $W(t) = X(t)$. To show that $W(t) = X(t)$ in the mean square sense, consider $D(t) = W(t) - X(t)$. It is easily shown that

$R_D(\tau) = R_W(\tau) - R_{WX}(\tau) - R_{XW}(\tau) + R_X(\tau)$.

The corresponding power spectral density is

$S_D(f) = S_W(f) - S_{WX}(f) - S_{XW}(f) + S_X(f) = |H_{LP}(f)|^2S_X(f) - H_{LP}(f)S_X(f) - H^*_{LP}(f)S_X(f) + S_X(f) = 0$.

[Figure 10.6: (a) Input signal to filters is X(t) + Y(t), (b) lowpass filter, (c) bandpass filter.]

Therefore $R_D(\tau) = 0$ for all $\tau$, and $W(t) = X(t)$ in the mean square sense since

$E[(W(t) - X(t))^2] = E[D^2(t)] = R_D(0) = 0$.

Thus we have shown that the lowpass filter removes Y(t) and passes X(t). Similarly, the bandpass filter removes X(t) and passes Y(t).

Example 10.12
A random telegraph signal is passed through an RC lowpass filter which has transfer function

$H(f) = \frac{\beta}{\beta + j2\pi f}$,

where $\beta = 1/RC$ and RC is the time constant of the filter. Find the power spectral density and autocorrelation of the output.
In Example 10.1, the power spectral density of the random telegraph signal with transition rate $\alpha$ was found to be

$S_X(f) = \frac{4\alpha}{4\alpha^2 + 4\pi^2f^2}$.

From Eq. (10.43) we have

$S_Y(f) = \left(\frac{\beta^2}{\beta^2 + 4\pi^2f^2}\right)\left(\frac{4\alpha}{4\alpha^2 + 4\pi^2f^2}\right) = \frac{4\alpha\beta^2}{\beta^2 - 4\alpha^2}\left\{\frac{1}{4\alpha^2 + 4\pi^2f^2} - \frac{1}{\beta^2 + 4\pi^2f^2}\right\}$.

$R_Y(\tau)$ is found by inverting the above expression:

$R_Y(\tau) = \frac{1}{\beta^2 - 4\alpha^2}\{\beta^2e^{-2\alpha|\tau|} - 2\alpha\beta e^{-\beta|\tau|}\}$.
10.2.2 Discrete-Time Systems

The results obtained above for continuous-time signals also hold for discrete-time signals after appropriate changes are made from integrals to summations.
Let the unit-sample response $h_n$ be the response of a discrete-time, linear, time-invariant system to a unit-sample input $\delta_n$:

$\delta_n = \begin{cases}1 & n = 0\\ 0 & n \neq 0.\end{cases}$  (10.47)

The response of the system to an arbitrary input random process $X_n$ is then given by

$Y_n = h_n * X_n = \sum_{j=-\infty}^{\infty} h_jX_{n-j} = \sum_{j=-\infty}^{\infty} h_{n-j}X_j$.  (10.48)

Thus discrete-time, linear, time-invariant systems are determined by the unit-sample response $h_n$. The transfer function of such a system is defined by

$H(f) = \sum_{i=-\infty}^{\infty} h_ie^{-j2\pi fi}$.  (10.49)

The derivation from the previous section can be used to show that if $X_n$ is a wide-sense stationary process, then $Y_n$ is also wide-sense stationary. The mean of $Y_n$ is given by

$m_Y = m_X\sum_{j=-\infty}^{\infty} h_j = m_XH(0)$.  (10.50)

The autocorrelation of $Y_n$ is given by

$R_Y(k) = \sum_{j=-\infty}^{\infty}\sum_{i=-\infty}^{\infty} h_jh_iR_X(k+j-i)$.  (10.51)

By taking the Fourier transform of $R_Y(k)$ it is readily shown that the power spectral density of $Y_n$ is

$S_Y(f) = |H(f)|^2S_X(f)$.  (10.52)

This is the same equation that was found for continuous-time systems. Finally, we note that if the input process $X_n$ is a Gaussian WSS random process, then the output process $Y_n$ is also a Gaussian WSS random process whose statistics are completely determined by the mean and autocorrelation function provided by Eqs. (10.50) through (10.52).

Example 10.13 Filtered White Noise
Let $X_n$ be a white noise sequence with zero mean and average power $\sigma_X^2$. If $X_n$ is the input to a linear, time-invariant system with transfer function H(f), then the output process $Y_n$ has power spectral density

$S_Y(f) = |H(f)|^2\sigma_X^2$.  (10.53)

Equation (10.53) provides us with a method for generating discrete-time random processes with arbitrary power spectral densities or autocorrelation functions.
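For a finite impulse response, the chain from Eq. (10.51) to Eq. (10.53) can be verified exactly: with white input, $R_Y(k) = \sigma_X^2\sum_j h_jh_{j+|k|}$, and its transform equals $|H(f)|^2\sigma_X^2$. A small Python sketch (the tap values and input power are arbitrary):

```python
import cmath
import math

h = [1.0, -0.5, 0.25]     # hypothetical FIR unit-sample response
var = 2.0                 # white-noise input power sigma_X^2

def H(f):                 # transfer function, Eq. (10.49)
    return sum(hi * cmath.exp(-2j * math.pi * f * i) for i, hi in enumerate(h))

def R_Y(k):               # Eq. (10.51) with R_X(m) = var * delta_m
    k = abs(k)
    return var * sum(h[j] * h[j + k] for j in range(len(h) - k))

def S_Y(f):               # Fourier transform of R_Y, as in Eq. (10.24)
    return sum(R_Y(k) * cmath.exp(-2j * math.pi * f * k)
               for k in range(-(len(h) - 1), len(h))).real

for f in (0.0, 0.1, 0.33):
    assert abs(S_Y(f) - abs(H(f))**2 * var) < 1e-12    # Eq. (10.53)
```

Because the impulse response has only three taps, both sides are short finite sums and agree to rounding error at every frequency.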
If the power spectral density can be written as a rational function of $z = e^{j2\pi f}$ in Eq. (10.24), then a causal filter can be found to generate a process with that power spectral density. Note that this is a generalization of the methods presented in Section 6.6 for generating vector random variables with an arbitrary covariance matrix.

Example 10.14 First-Order Autoregressive Process
A first-order autoregressive (AR) process $Y_n$ with zero mean is defined by

$Y_n = aY_{n-1} + X_n$,  (10.54)

where $X_n$ is a zero-mean white noise input random process with average power $\sigma_X^2$. Note that $Y_n$ can be viewed as the output of the system in Fig. 10.7(a) for an iid input $X_n$. Find the power spectral density and autocorrelation of $Y_n$.
The unit-sample response can be determined from Eq. (10.54):

$h_n = \begin{cases}0 & n < 0\\ 1 & n = 0\\ a^n & n > 0.\end{cases}$

Note that we require $|a| < 1$ for the system to be stable.⁶ Therefore the transfer function is

$H(f) = \sum_{n=0}^{\infty} a^ne^{-j2\pi fn} = \frac{1}{1 - ae^{-j2\pi f}}$.

⁶A system is said to be stable if $\sum_n |h_n| < \infty$. The response of a stable system to any bounded input is also bounded.

[Figure 10.7: (a) Generation of AR process; (b) Generation of ARMA process.]

Equation (10.52) then gives

$S_Y(f) = \frac{\sigma_X^2}{(1 - ae^{-j2\pi f})(1 - ae^{j2\pi f})} = \frac{\sigma_X^2}{1 + a^2 - (ae^{-j2\pi f} + ae^{j2\pi f})} = \frac{\sigma_X^2}{1 + a^2 - 2a\cos 2\pi f}$.

Equation (10.51) gives, for $k \ge 0$,

$R_Y(k) = \sum_{j=0}^{\infty}\sum_{i=0}^{\infty} h_jh_i\sigma_X^2\delta_{k+j-i} = \sigma_X^2\sum_{j=0}^{\infty} a^ja^{j+k} = \frac{\sigma_X^2a^k}{1 - a^2}$,

and $R_Y$ is even in k, so $R_Y(k) = \sigma_X^2a^{|k|}/(1 - a^2)$.

Example 10.15 ARMA Random Process
An autoregressive moving average (ARMA) process is defined by

$Y_n = -\sum_{i=1}^{q} a_iY_{n-i} + \sum_{i'=0}^{p} b_{i'}W_{n-i'}$,  (10.55)

where $W_n$ is a WSS, white noise input process. $Y_n$ can be viewed as the output of the recursive system in Fig. 10.7(b) to the input $W_n$.
It can be shown that the transfer function of the linear system defined by the above equation is

$H(f) = \frac{\sum_{i'=0}^{p} b_{i'}e^{-j2\pi fi'}}{1 + \sum_{i=1}^{q} a_ie^{-j2\pi fi}}$.

The power spectral density of the ARMA process is

$S_Y(f) = |H(f)|^2\sigma_W^2$.

ARMA models are used extensively in random time series analysis and in signal processing. The general autoregressive process is the special case of the ARMA process with $b_1 = b_2 = \cdots = b_p = 0$. The general moving average process is the special case of the ARMA process with $a_1 = a_2 = \cdots = a_q = 0$.

Octave has a function filter(b, a, x), which takes a set of coefficients $b = (b_1, b_2, \ldots, b_{p+1})$ and $a = (a_1, a_2, \ldots, a_q)$ for a filter as in Eq. (10.55) and produces the output corresponding to the input sequence x. The choice of a and b can lead to a broad range of discrete-time filters. For example, if we let $b = (1/N, 1/N, \ldots, 1/N)$ with no autoregressive coefficients, we obtain a moving average filter:

$Y_n = (W_n + W_{n-1} + \cdots + W_{n-N+1})/N$.

Figure 10.8 shows a zero-mean, unit-variance Gaussian iid sequence $W_n$ and the outputs from an $N = 3$ and an $N = 10$ moving average filter. It can be seen that the $N = 3$ filter moderates the extreme variations but generally tracks the fluctuations in $W_n$. The $N = 10$ filter, on the other hand, severely limits the variations and only tracks slower, longer-lasting trends.
Figures 10.9(a) and (b) show the result of passing an iid Gaussian sequence $X_n$ through first-order autoregressive filters as in Eq. (10.54). The AR sequence with $a = 0.1$ has low correlation between adjacent samples, and so the sequence remains similar to the underlying iid random process. The AR sequence with $a = 0.75$ has higher correlation between adjacent samples, which tends to cause longer-lasting trends, as evident in Fig. 10.9(b).

[Figure 10.8: Moving average process showing an iid Gaussian sequence and the corresponding N = 3 and N = 10 moving average processes.]
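The AR(1) result $R_Y(0) = \sigma_X^2/(1-a^2)$ from Example 10.14 is easy to check by simulation, in the spirit of Figs. 10.8 and 10.9. A minimal Python sketch (the seed, the coefficient a, and the run length are arbitrary choices):

```python
import random

random.seed(7)                       # fixed seed for reproducibility
a, n = 0.6, 200_000                  # assumed coefficient and run length

y = 0.0
samples = []
for _ in range(n):
    y = a * y + random.gauss(0.0, 1.0)   # Y_n = a Y_{n-1} + X_n, Eq. (10.54)
    samples.append(y)

sample_var = sum(v * v for v in samples) / n
theory = 1.0 / (1 - a * a)               # R_Y(0) = sigma_X^2 / (1 - a^2), sigma_X^2 = 1
assert abs(sample_var - theory) / theory < 0.05
```

With 200,000 samples the empirical power settles within a few percent of the theoretical value; the long transient from the zero initial condition is negligible at this run length.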
[Figure 10.9: (a) First-order autoregressive process with a = 0.1; (b) with a = 0.75.]

10.3 BANDLIMITED RANDOM PROCESSES

In this section we consider two important applications that involve random processes with power spectral densities that are nonzero over a finite range of frequencies. The first application involves the sampling theorem, which states that bandlimited random processes can be represented in terms of a sequence of their time samples. This theorem forms the basis for modern digital signal processing systems. The second application involves the modulation of sinusoidal signals by random information signals. Modulation is a key element of all modern communication systems.

10.3.1 Sampling of Bandlimited Random Processes

One of the major technology advances of the twentieth century was the development of digital signal processing technology. All modern multimedia systems depend in some way on the processing of digital signals. Many information signals, e.g., voice, music, and imagery, occur naturally as analog signals that are continuous-valued and that vary continuously in time or space or both. The two key steps in making these signals amenable to digital signal processing are: (1) converting the continuous-time signals into discrete-time signals by sampling the amplitudes; and (2) representing the samples using a fixed number of bits. In this section we introduce the sampling theorem for wide-sense stationary bandlimited random processes, which addresses the conversion of signals into discrete-time sequences.
Let x(t) be a deterministic, finite-energy time signal that has Fourier transform $\tilde{X}(f) = \mathcal{F}\{x(t)\}$ that is nonzero only in the frequency range $|f| \le W$. Suppose we sample x(t) every T seconds to obtain the sequence of sample values: $\{\ldots, x(-2T), x(-T), x(0), x(T), \ldots\}$.
The sampling theorem for deterministic signals states that x(t) can be recovered exactly from the sequence of samples if $T \le 1/2W$, or equivalently $1/T \ge 2W$, that is, if the sampling rate is at least twice the bandwidth of the signal. The minimum sampling rate 2W is called the Nyquist sampling rate. The sampling theorem provides the following interpolation formula for recovering x(t) from the samples:

$x(t) = \sum_{n=-\infty}^{\infty} x(nT)p(t - nT)$, where $p(t) = \frac{\sin(\pi t/T)}{\pi t/T}$.  (10.56)

[Figure 10.10: (a) Sampling and interpolation; (b) Fourier transform of sampled deterministic signal; (c) Sampling, digital filtering, and interpolation.]

Eq. (10.56) provides us with the interesting interpretation depicted in Fig. 10.10(a). The process of sampling x(t) can be viewed as the multiplication of x(t) by a train of delta functions spaced T seconds apart. The sampled function is then represented by

$x_s(t) = \sum_{n=-\infty}^{\infty} x(nT)\delta(t - nT)$.  (10.57)

Eq. (10.56) can be viewed as the response of a linear system with impulse response p(t) to the signal $x_s(t)$. It is easy to show that the p(t) in Eq. (10.56) corresponds to the ideal lowpass filter in Fig. 10.6:

$P(f) = \mathcal{F}\{p(t)\} = \begin{cases}1 & -W \le f \le W\\ 0 & |f| > W.\end{cases}$

The proof of the sampling theorem involves the following steps. We show that

$\mathcal{F}\left\{\sum_{n=-\infty}^{\infty} x(nT)p(t - nT)\right\} = P(f)\,\frac{1}{T}\sum_{k=-\infty}^{\infty}\tilde{X}\left(f - \frac{k}{T}\right)$,  (10.58)

which consists of the sum of translated versions of $\tilde{X}(f) = \mathcal{F}\{x(t)\}$, as shown in Fig. 10.10(b). We then observe that as long as $1/T \ge 2W$, P(f) in the above expression selects the $k = 0$ term in the summation, which corresponds to $\tilde{X}(f)$. See Problem 10.45 for details.
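Equation (10.56) can be exercised numerically: sample a signal whose components lie below W at the rate 1/T = 2W, then rebuild intermediate values from the samples alone. A Python sketch (the test signal, its frequencies, and the truncation length N are arbitrary; truncating the infinite sum leaves a small residual error):

```python
import math

W = 1.0                       # bandwidth of the test signal
T = 1.0 / (2.0 * W)           # Nyquist sampling interval, T = 1/(2W)

def x(t):                     # bandlimited test signal (components at 0.3 and 0.7 < W)
    return math.cos(2 * math.pi * 0.3 * t) + 0.5 * math.sin(2 * math.pi * 0.7 * t)

def p(t):                     # interpolation kernel sin(pi t/T)/(pi t/T), Eq. (10.56)
    if abs(t) < 1e-12:
        return 1.0
    return math.sin(math.pi * t / T) / (math.pi * t / T)

def reconstruct(t, N=4000):   # truncated interpolation sum of Eq. (10.56)
    return sum(x(n * T) * p(t - n * T) for n in range(-N, N + 1))

for t in (0.123, 1.7, -2.48):
    assert abs(reconstruct(t) - x(t)) < 1e-2   # small residual from truncation
```

The reconstruction points fall between the samples, so the agreement is not a tautology; the residual shrinks as N grows, mirroring the k-summation argument of Eq. (10.58).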
Example 10.16 Sampling a WSS Random Process
Let X(t) be a WSS process with autocorrelation function $R_X(\tau)$. Find the mean and covariance functions of the discrete-time sampled process $X_n = X(nT)$ for $n = 0, \pm 1, \pm 2, \ldots$.
Since X(t) is WSS, the mean and covariance functions are:

$m_X(n) = E[X(nT)] = m$

$E[X_{n_1}X_{n_2}] = E[X(n_1T)X(n_2T)] = R_X(n_1T - n_2T) = R_X((n_1 - n_2)T)$.

This shows $X_n$ is a WSS discrete-time process.

Let X(t) be a WSS process with autocorrelation function $R_X(\tau)$ and power spectral density $S_X(f)$. Suppose that $S_X(f)$ is bandlimited, that is,

$S_X(f) = 0$ for $|f| > W$.

We now show that the sampling theorem can be extended to X(t). Let

$\hat{X}(t) = \sum_{n=-\infty}^{\infty} X(nT)p(t - nT)$, where $p(t) = \frac{\sin(\pi t/T)}{\pi t/T}$;  (10.59)

then $\hat{X}(t) = X(t)$ in the mean square sense. Recall that equality in the mean square sense does not imply equality for all sample functions, so this version of the sampling theorem is weaker than the version in Eq. (10.56) for finite-energy signals.
To show Eq. (10.59) we first note that since $S_X(f) = \mathcal{F}\{R_X(\tau)\}$, we can apply the sampling theorem for deterministic signals to $R_X(\tau)$:

$R_X(\tau) = \sum_{n=-\infty}^{\infty} R_X(nT)p(\tau - nT)$.  (10.60)

Next we consider the mean square error associated with Eq. (10.59):

$E[\{X(t) - \hat{X}(t)\}^2] = E[\{X(t) - \hat{X}(t)\}X(t)] - E[\{X(t) - \hat{X}(t)\}\hat{X}(t)] = \{E[X(t)X(t)] - E[\hat{X}(t)X(t)]\} - \{E[X(t)\hat{X}(t)] - E[\hat{X}(t)\hat{X}(t)]\}$.

It is easy to show that Eq. (10.60) implies that each of the terms in braces is equal to zero. (See Problem 10.48.) We then conclude that $\hat{X}(t) = X(t)$ in the mean square sense.

Example 10.17 Digital Filtering of a Sampled WSS Random Process
Let X(t) be a WSS process with power spectral density $S_X(f)$ that is nonzero only for $|f| \le W$. Consider the sequence of operations shown in Fig. 10.10(c): (1) X(t) is sampled at the Nyquist rate; (2) the samples X(nT) are input into the digital filter of Fig.
10.7(b) with $a_1 = a_2 = \cdots = a_q = 0$; and (3) the resulting output sequence $Y_n$ is fed into the interpolation filter. Find the power spectral density of the output Y(t).
The output of the digital filter is given by

$Y(kT) = \sum_{n=0}^{p} b_nX((k-n)T)$

and the corresponding autocorrelation, from Eq. (10.51), is

$R_Y(kT) = \sum_{n=0}^{p}\sum_{i=0}^{p} b_nb_iR_X((k+n-i)T)$.

The autocorrelation of Y(t) is found from the interpolation formula (Eq. (10.60)):

$R_Y(\tau) = \sum_{k=-\infty}^{\infty} R_Y(kT)p(\tau - kT) = \sum_{k=-\infty}^{\infty}\sum_{n=0}^{p}\sum_{i=0}^{p} b_nb_iR_X((k+n-i)T)p(\tau - kT) = \sum_{n=0}^{p}\sum_{i=0}^{p} b_nb_i\left\{\sum_{k=-\infty}^{\infty} R_X((k+n-i)T)p(\tau - kT)\right\} = \sum_{n=0}^{p}\sum_{i=0}^{p} b_nb_iR_X(\tau + (n-i)T)$.

The output power spectral density is then

$S_Y(f) = \mathcal{F}\{R_Y(\tau)\} = \sum_{n=0}^{p}\sum_{i=0}^{p} b_nb_i\mathcal{F}\{R_X(\tau + (n-i)T)\} = \sum_{n=0}^{p}\sum_{i=0}^{p} b_nb_iS_X(f)e^{-j2\pi f(n-i)T} = \left\{\sum_{n=0}^{p} b_ne^{-j2\pi fnT}\right\}\left\{\sum_{i=0}^{p} b_ie^{j2\pi fiT}\right\}S_X(f) = |H(fT)|^2S_X(f)$,  (10.61)

where H(f) is the transfer function of the digital filter as per Eq. (10.49). The key finding here is the appearance of H(f) evaluated at fT. We have obtained a very nice result that characterizes the overall system response in Fig. 10.10(c) to the continuous-time input X(t). This result is true for more general digital filters; see [Oppenheim and Schafer].

The sampling theorem provides an important bridge between continuous-time and discrete-time signal processing. It gives us a means for implementing the real as well as the simulated processing of random signals. First, we must sample the random process above its Nyquist sampling rate. We can then perform whatever digital processing is necessary. We can finally recover the continuous-time signal by interpolation. The only difference between real signal processing and simulated signal processing is that the former usually has real-time requirements, whereas the latter allows us to perform our processing at whatever rate is possible using the available computing power.
10.3.2 Amplitude Modulation by Random Signals

Many of the transmission media used in communication systems can be modeled as linear systems and their behavior can be specified by a transfer function H(f), which passes certain frequencies and rejects others. Quite often the information signal A(t) (i.e., a speech or music signal) is not at the frequencies that propagate well. The purpose of a modulator is to map the information signal A(t) into a transmission signal X(t) that is in a frequency range that propagates well over the desired medium. At the receiver, we need to perform an inverse mapping to recover A(t) from X(t). In this section, we discuss two of the amplitude modulation methods.
Let A(t) be a WSS random process that represents an information signal. In general A(t) will be "lowpass" in character, that is, its power spectral density will be concentrated at low frequencies, as shown in Fig. 10.11(a). An amplitude modulation (AM) system produces a transmission signal by multiplying A(t) by a "carrier" signal $\cos(2\pi f_ct + \Theta)$:

$X(t) = A(t)\cos(2\pi f_ct + \Theta)$,  (10.62)

where we assume $\Theta$ is a random variable that is uniformly distributed in the interval $(0, 2\pi)$, and $\Theta$ and A(t) are independent.
The autocorrelation of X(t) is

$E[X(t+\tau)X(t)] = E[A(t+\tau)\cos(2\pi f_c(t+\tau) + \Theta)A(t)\cos(2\pi f_ct + \Theta)] = E[A(t+\tau)A(t)]E[\cos(2\pi f_c(t+\tau) + \Theta)\cos(2\pi f_ct + \Theta)] = R_A(\tau)E\left[\frac{1}{2}\cos(2\pi f_c\tau) + \frac{1}{2}\cos(2\pi f_c(2t+\tau) + 2\Theta)\right] = \frac{1}{2}R_A(\tau)\cos(2\pi f_c\tau)$,  (10.63)

where we used the fact that $E[\cos(2\pi f_c(2t+\tau) + 2\Theta)] = 0$ (see Example 9.10). Thus X(t) is also a wide-sense stationary random process.

[Figure 10.11: (a) A lowpass information signal; (b) an amplitude-modulated signal.]
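The key step in Eq. (10.63), that the double-frequency term averages to zero over the uniform phase $\Theta$, can be checked by numerical integration over $\Theta$. A Python sketch (the carrier frequency and time instants are arbitrary choices):

```python
import math

fc, t, tau = 5.0, 0.37, 0.81      # arbitrary carrier frequency and time instants
M = 20000                          # number of quadrature points over (0, 2 pi)

# Average cos(2 pi fc (t+tau) + theta) * cos(2 pi fc t + theta) over uniform theta
avg = sum(math.cos(2 * math.pi * fc * (t + tau) + th) *
          math.cos(2 * math.pi * fc * t + th)
          for th in (2 * math.pi * (i + 0.5) / M for i in range(M))) / M

# The double-frequency term integrates to zero, leaving (1/2) cos(2 pi fc tau)
assert abs(avg - 0.5 * math.cos(2 * math.pi * fc * tau)) < 1e-9
```

The midpoint rule is effectively exact here because the integrand is a trigonometric polynomial over a full period, so the cross term $\cos(2\pi f_c(2t+\tau)+2\theta)$ cancels to rounding error.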
The power spectral density of X(t) is

$S_X(f) = \mathcal{F}\left\{\frac{1}{2}R_A(\tau)\cos(2\pi f_c\tau)\right\} = \frac{1}{4}S_A(f + f_c) + \frac{1}{4}S_A(f - f_c)$,  (10.64)

where we used the table of Fourier transforms in Appendix B. Figure 10.11(b) shows $S_X(f)$. It can be seen that the power spectral density of the information signal has been shifted to the regions around $\pm f_c$. X(t) is an example of a bandpass signal. Bandpass signals are characterized as having their power spectral density concentrated about some frequency much greater than zero.
The transmission signal is demodulated by multiplying it by the carrier signal and lowpass filtering, as shown in Fig. 10.12. Let

$Y(t) = X(t)\,2\cos(2\pi f_ct + \Theta)$.  (10.65)

Proceeding as above, we find that

$S_Y(f) = 2S_X(f + f_c) + 2S_X(f - f_c) = \frac{1}{2}\{S_A(f + 2f_c) + S_A(f)\} + \frac{1}{2}\{S_A(f) + S_A(f - 2f_c)\}$.

The ideal lowpass filter passes $S_A(f)$ and blocks the terms $S_A(f \pm 2f_c)$, which are centered about $\pm 2f_c$, so the output of the lowpass filter has power spectral density

$S_Y(f) = S_A(f)$.

In fact, from Example 10.11 we know the output is the original information signal, A(t).

[Figure 10.12: AM demodulator: X(t) is multiplied by $2\cos(2\pi f_ct + \Theta)$ and lowpass filtered to produce Y(t).]

[Figure 10.13: (a) A general bandpass signal; (b) a real-valued even function of f; (c) an imaginary odd function of f.]

The modulation method in Eq. (10.62) can only produce bandpass signals for which $S_X(f)$ is locally symmetric about $f_c$, $S_X(f_c + \delta f) = S_X(f_c - \delta f)$ for $|\delta f| < W$, as in Fig. 10.11(b). The method cannot yield real-valued transmission signals whose power spectral density lacks this symmetry, such as shown in Fig. 10.13(a). The following quadrature amplitude modulation (QAM) method can be used to produce such signals:

$X(t) = A(t)\cos(2\pi f_ct + \Theta) + B(t)\sin(2\pi f_ct + \Theta)$,  (10.66)

where A(t) and B(t) are real-valued, jointly wide-sense stationary random processes, and we require that

$R_A(\tau) = R_B(\tau)$  (10.67a)

$R_{B,A}(\tau) = -R_{A,B}(\tau)$.  (10.67b)

Note that Eq.
(10.67a) implies that $S_A(f) = S_B(f)$, a real-valued, even function of f, as shown in Fig. 10.13(b). Note also that Eq. (10.67b) implies that $S_{B,A}(f)$ is a purely imaginary, odd function of f, as also shown in Fig. 10.13(c) (see Problem 10.57).
Proceeding as before, we can show that X(t) is a wide-sense stationary random process with autocorrelation function

$R_X(\tau) = R_A(\tau)\cos(2\pi f_c\tau) + R_{B,A}(\tau)\sin(2\pi f_c\tau)$  (10.68)

and power spectral density

$S_X(f) = \frac{1}{2}\{S_A(f - f_c) + S_A(f + f_c)\} + \frac{1}{2j}\{S_{B,A}(f - f_c) - S_{B,A}(f + f_c)\}$.  (10.69)

The resulting power spectral density is as shown in Fig. 10.13(a). Thus QAM can be used to generate real-valued bandpass signals with arbitrary power spectral density.
Bandpass random signals, such as those in Fig. 10.13(a), arise in communication systems when wide-sense stationary white noise is filtered by bandpass filters. Let N(t) be such a process with power spectral density $S_N(f)$. It can be shown that N(t) can be represented by

$N(t) = N_c(t)\cos(2\pi f_ct + \Theta) - N_s(t)\sin(2\pi f_ct + \Theta)$,  (10.70)

where $N_c(t)$ and $N_s(t)$ are jointly wide-sense stationary processes with

$S_{N_c}(f) = S_{N_s}(f) = \{S_N(f - f_c) + S_N(f + f_c)\}_L$  (10.71)

and

$S_{N_c,N_s}(f) = j\{S_N(f - f_c) - S_N(f + f_c)\}_L$,  (10.72)

where the subscript L denotes the lowpass portion of the expression in brackets. In words, every real-valued bandpass process can be treated as if it had been generated by a QAM modulator.

Example 10.18 Demodulation of Noisy Signal
The received signal in an AM system is

$Y(t) = A(t)\cos(2\pi f_ct + \Theta) + N(t)$,

where N(t) is a bandlimited white noise process with spectral density

$S_N(f) = \begin{cases}N_0/2 & |f \pm f_c| < W\\ 0 & \text{elsewhere.}\end{cases}$

Find the signal-to-noise ratio of the recovered signal.
Equation (10.70) allows us to represent the received signal by

$Y(t) = \{A(t) + N_c(t)\}\cos(2\pi f_ct + \Theta) - N_s(t)\sin(2\pi f_ct + \Theta)$.

The demodulator in Fig. 10.12 is used to recover A(t).
After multiplication by $2\cos(2\pi f_c t + \Theta)$, we have

$$2Y(t)\cos(2\pi f_c t + \Theta) = \{A(t) + N_c(t)\}2\cos^2(2\pi f_c t + \Theta) - N_s(t)\,2\cos(2\pi f_c t + \Theta)\sin(2\pi f_c t + \Theta)$$
$$= \{A(t) + N_c(t)\}(1 + \cos(4\pi f_c t + 2\Theta)) - N_s(t)\sin(4\pi f_c t + 2\Theta).$$

After lowpass filtering, the recovered signal is $A(t) + N_c(t)$. The powers in the signal and noise components, respectively, are

$$\sigma_A^2 = \int_{-W}^{W} S_A(f)\,df$$

and

$$\sigma_{N_c}^2 = \int_{-W}^{W} S_{N_c}(f)\,df = \int_{-W}^{W}\left(\frac{N_0}{2} + \frac{N_0}{2}\right)df = 2WN_0.$$

The output signal-to-noise ratio is then

$$\mathrm{SNR} = \frac{\sigma_A^2}{2WN_0}.$$

10.4 OPTIMUM LINEAR SYSTEMS

Many problems can be posed in the following way. We observe a discrete-time, zero-mean process $X_\alpha$ over a certain time interval $I = \{t-a, \ldots, t+b\}$, and we are required to use the $a+b+1$ resulting observations $\{X_{t-a}, \ldots, X_t, \ldots, X_{t+b}\}$ to obtain an estimate $Y_t$ for some other (presumably related) zero-mean process $Z_t$. The estimate $Y_t$ is required to be linear, as shown in Fig. 10.14:

$$Y_t = \sum_{\beta=t-a}^{t+b} h_{t-\beta}X_\beta = \sum_{\beta=-b}^{a} h_\beta X_{t-\beta}. \qquad (10.73)$$

The figure of merit for the estimator is the mean square error

$$E[e_t^2] = E[(Z_t - Y_t)^2], \qquad (10.74)$$

FIGURE 10.14 A linear system for producing an estimate $Y_t$: the observations $X_{t-a}, \ldots, X_{t+b}$ are weighted by the coefficients $h_a, \ldots, h_{-b}$ and summed.

and we seek to find the optimum filter, which is characterized by the impulse response $h_\beta$ that minimizes the mean square error. Examples 10.19 and 10.20 show that different choices of $Z_t$ and $X_\alpha$ and of observation interval correspond to different estimation problems.

Example 10.19 Filtering and Smoothing Problems

Let the observations be the sum of a "desired signal" $Z_\alpha$ plus unwanted "noise" $N_\alpha$:

$$X_\alpha = Z_\alpha + N_\alpha \qquad \alpha \in I.$$

We are interested in estimating the desired signal at time t. The relation between t and the observation interval I gives rise to a variety of estimation problems. If $I = (-\infty, t]$, that is, $a = \infty$ and $b = 0$, then we have a filtering problem where we estimate $Z_t$ in terms of noisy observations of the past and present.
If $I = \{t-a, \ldots, t\}$, then we have a filtering problem in which we estimate $Z_t$ in terms of the $a+1$ most recent noisy observations. If $I = (-\infty, \infty)$, that is, $a = b = \infty$, then we have a smoothing problem where we are attempting to recover the signal from its entire noisy version. There are applications where this makes sense, for example, if the entire realization $X_\alpha$ has been recorded and the estimate $Z_t$ is obtained by "playing back" $X_\alpha$.

Example 10.20 Prediction

Suppose we want to predict $Z_t$ in terms of its recent past: $\{Z_{t-a}, \ldots, Z_{t-1}\}$. The general estimation problem becomes this prediction problem if we let the observation $X_\alpha$ be the past a values of the signal $Z_\alpha$, that is,

$$X_\alpha = Z_\alpha \qquad t-a \le \alpha \le t-1.$$

The estimate $Y_t$ is then a linear prediction of $Z_t$ in terms of its most recent values.

10.4.1 The Orthogonality Condition

It is easy to show that the optimum filter must satisfy the orthogonality condition (see Eq. 6.56), which states that the error $e_t$ must be orthogonal to all the observations $X_\alpha$, that is,

$$0 = E[e_t X_\alpha] = E[(Z_t - Y_t)X_\alpha] \qquad \text{for all } \alpha \in I, \qquad (10.75)$$

or equivalently,

$$E[Z_t X_\alpha] = E[Y_t X_\alpha] \qquad \text{for all } \alpha \in I. \qquad (10.76)$$

If we substitute Eq. (10.73) into Eq. (10.76) we find

$$E[Z_t X_\alpha] = E\left[\sum_{\beta=-b}^{a} h_\beta X_{t-\beta}X_\alpha\right] = \sum_{\beta=-b}^{a} h_\beta E[X_{t-\beta}X_\alpha] = \sum_{\beta=-b}^{a} h_\beta R_X(t-\alpha-\beta) \qquad \text{for all } \alpha \in I. \qquad (10.77)$$

Equation (10.77) shows that $E[Z_t X_\alpha]$ depends only on $t - \alpha$, and thus $X_\alpha$ and $Z_t$ are jointly wide-sense stationary processes. Therefore, we can rewrite Eq. (10.77) as follows:

$$R_{Z,X}(t-\alpha) = \sum_{\beta=-b}^{a} h_\beta R_X(t-\beta-\alpha) \qquad t-a \le \alpha \le t+b.$$

Finally, letting $m = t - \alpha$, we obtain the following key equation:

$$R_{Z,X}(m) = \sum_{\beta=-b}^{a} h_\beta R_X(m-\beta) \qquad -b \le m \le a. \qquad (10.78)$$

The optimum linear filter must satisfy the set of $a+b+1$ linear equations given by Eq. (10.78). Note that Eq. (10.78) is identical to Eq. (6.60) for estimating a random variable by a linear combination of several random variables.
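Equation (10.78) is just a finite system of linear equations. The following NumPy sketch solves it for illustrative correlation functions (the AR-like $R_X$ and the cross-correlation $R_{Z,X}$ below are assumptions chosen for the example, not values from the text) and then confirms the orthogonality condition of Eq. (10.75):

```python
import numpy as np

# Assumed illustrative correlations; neither comes from the text.
a, b = 2, 1                                # observation window {t-a, ..., t+b}
taps = np.arange(-b, a + 1)                # filter indices beta = -b, ..., a
R_X = lambda m: 0.8 ** abs(m)              # autocorrelation of X (assumed)
R_ZX = lambda m: 0.5 * 0.8 ** abs(m - 1)   # cross-correlation R_{Z,X} (assumed)

# Build the (a+b+1) x (a+b+1) system of Eq. (10.78):
#   R_ZX(m) = sum_beta h_beta R_X(m - beta),  -b <= m <= a
A = np.array([[R_X(m - beta) for beta in taps] for m in taps])
rhs = np.array([R_ZX(m) for m in taps])
h = np.linalg.solve(A, rhs)

# Orthogonality, Eq. (10.75): E[e_t X_alpha] = R_ZX(m) - sum_beta h_beta R_X(m-beta) = 0
residual = rhs - A @ h
print(np.max(np.abs(residual)))            # ~0
```

The residuals are exactly the correlations $E[e_t X_\alpha]$ over the observation window, so their vanishing is the orthogonality condition restated.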
The wide-sense stationarity of the processes reduces this estimation problem to the one considered in Section 6.5.

In the above derivation we deliberately used the notation $Z_t$ instead of $Z_n$ to suggest that the same development holds for continuous-time estimation. In particular, suppose we seek a linear estimate Y(t) for the continuous-time random process Z(t) in terms of observations of the continuous-time random process $X(\alpha)$ in the time interval $t-a \le \alpha \le t+b$:

$$Y(t) = \int_{t-a}^{t+b} h(t-\beta)X(\beta)\,d\beta = \int_{-b}^{a} h(\beta)X(t-\beta)\,d\beta.$$

It can then be shown that the filter $h(\beta)$ that minimizes the mean square error is specified by

$$R_{Z,X}(\tau) = \int_{-b}^{a} h(\beta)R_X(\tau-\beta)\,d\beta \qquad -b \le \tau \le a. \qquad (10.79)$$

Thus in the time-continuous case we obtain an integral equation instead of a set of linear equations. The analytic solution of this integral equation can be quite difficult, but the equation can be solved numerically by approximating the integral by a summation. (Equation (10.79) can also be solved by using the Karhunen-Loeve expansion.)

We now determine the mean square error of the optimum filter. First we note that for the optimum filter, the error $e_t$ and the estimate $Y_t$ are orthogonal since

$$E[e_t Y_t] = E\left[e_t\sum h_{t-\beta}X_\beta\right] = \sum h_{t-\beta}E[e_t X_\beta] = 0,$$

where the terms inside the last summation are 0 because of Eq. (10.75). Since $e_t = Z_t - Y_t$, the mean square error is then

$$E[e_t^2] = E[e_t(Z_t - Y_t)] = E[e_t Z_t],$$

since $e_t$ and $Y_t$ are orthogonal. Substituting for $e_t$ yields

$$E[e_t^2] = E[(Z_t - Y_t)Z_t] = R_Z(0) - E[Y_t Z_t] = R_Z(0) - E\left[Z_t\sum_{\beta=-b}^{a} h_\beta X_{t-\beta}\right] = R_Z(0) - \sum_{\beta=-b}^{a} h_\beta R_{Z,X}(\beta). \qquad (10.80)$$

Similarly, it can be shown that the mean square error of the optimum filter in the continuous-time case is

$$E[e^2(t)] = R_Z(0) - \int_{-b}^{a} h(\beta)R_{Z,X}(\beta)\,d\beta. \qquad (10.81)$$

The following theorems summarize the above results.
Theorem Let $X_t$ and $Z_t$ be discrete-time, zero-mean, jointly wide-sense stationary processes, and let $Y_t$ be an estimate for $Z_t$ of the form

$$Y_t = \sum_{\beta=t-a}^{t+b} h_{t-\beta}X_\beta = \sum_{\beta=-b}^{a} h_\beta X_{t-\beta}.$$

The filter that minimizes $E[(Z_t - Y_t)^2]$ satisfies the equation

$$R_{Z,X}(m) = \sum_{\beta=-b}^{a} h_\beta R_X(m-\beta) \qquad -b \le m \le a$$

and has mean square error given by

$$E[(Z_t - Y_t)^2] = R_Z(0) - \sum_{\beta=-b}^{a} h_\beta R_{Z,X}(\beta).$$

Theorem Let X(t) and Z(t) be continuous-time, zero-mean, jointly wide-sense stationary processes, and let Y(t) be an estimate for Z(t) of the form

$$Y(t) = \int_{t-a}^{t+b} h(t-\beta)X(\beta)\,d\beta = \int_{-b}^{a} h(\beta)X(t-\beta)\,d\beta.$$

The filter $h(\beta)$ that minimizes $E[(Z(t) - Y(t))^2]$ satisfies the equation

$$R_{Z,X}(\tau) = \int_{-b}^{a} h(\beta)R_X(\tau-\beta)\,d\beta \qquad -b \le \tau \le a$$

and has mean square error given by

$$E[(Z(t) - Y(t))^2] = R_Z(0) - \int_{-b}^{a} h(\beta)R_{Z,X}(\beta)\,d\beta.$$

Example 10.21 Filtering of Signal Plus Noise

Suppose we are interested in estimating the signal $Z_n$ from the $p+1$ most recent noisy observations:

$$X_\alpha = Z_\alpha + N_\alpha \qquad \alpha \in I = \{n-p, \ldots, n-1, n\}.$$

Find the set of linear equations for the optimum filter if $Z_\alpha$ and $N_\alpha$ are independent random processes.

For this choice of observation interval, Eq. (10.78) becomes

$$R_{Z,X}(m) = \sum_{\beta=0}^{p} h_\beta R_X(m-\beta) \qquad m \in \{0, 1, \ldots, p\}. \qquad (10.82)$$

The cross-correlation terms in Eq. (10.82) are given by

$$R_{Z,X}(m) = E[Z_n X_{n-m}] = E[Z_n(Z_{n-m} + N_{n-m})] = R_Z(m).$$

The autocorrelation terms are given by

$$R_X(m-\beta) = E[X_{n-\beta}X_{n-m}] = E[(Z_{n-\beta} + N_{n-\beta})(Z_{n-m} + N_{n-m})]$$
$$= R_Z(m-\beta) + R_{Z,N}(m-\beta) + R_{N,Z}(m-\beta) + R_N(m-\beta) = R_Z(m-\beta) + R_N(m-\beta),$$

since $Z_\alpha$ and $N_\alpha$ are independent random processes. Thus Eq. (10.82) for the optimum filter becomes

$$R_Z(m) = \sum_{\beta=0}^{p} h_\beta\{R_Z(m-\beta) + R_N(m-\beta)\} \qquad m \in \{0, 1, \ldots, p\}. \qquad (10.83)$$

This set of $p+1$ linear equations in the $p+1$ unknowns $h_\beta$ is solved by matrix inversion.
Example 10.22 Filtering of AR Signal Plus Noise

Find the set of equations for the optimum filter in Example 10.21 if $Z_\alpha$ is a first-order autoregressive process with average power $\sigma_Z^2$ and parameter r, $|r| < 1$, and $N_\alpha$ is a white noise process with average power $\sigma_N^2$.

The autocorrelation for a first-order autoregressive process is given by

$$R_Z(m) = \sigma_Z^2 r^{|m|} \qquad m = 0, \pm 1, \pm 2, \ldots$$

(See Problem 10.42.) The autocorrelation for the white noise process is $R_N(m) = \sigma_N^2\delta(m)$. Substituting $R_Z(m)$ and $R_N(m)$ into Eq. (10.83) yields the following set of linear equations:

$$\sigma_Z^2 r^{|m|} = \sum_{\beta=0}^{p} h_\beta\left(\sigma_Z^2 r^{|m-\beta|} + \sigma_N^2\delta(m-\beta)\right) \qquad m \in \{0, \ldots, p\}. \qquad (10.84)$$

If we divide both sides of Eq. (10.84) by $\sigma_Z^2$ and let $\Gamma = \sigma_N^2/\sigma_Z^2$, we obtain the following matrix equation:

$$\begin{bmatrix} 1+\Gamma & r & r^2 & \cdots & r^p \\ r & 1+\Gamma & r & \cdots & r^{p-1} \\ r^2 & r & 1+\Gamma & \cdots & r^{p-2} \\ \vdots & & & & \vdots \\ r^p & r^{p-1} & r^{p-2} & \cdots & 1+\Gamma \end{bmatrix}\begin{bmatrix} h_0 \\ h_1 \\ \vdots \\ h_p \end{bmatrix} = \begin{bmatrix} 1 \\ r \\ \vdots \\ r^p \end{bmatrix}. \qquad (10.85)$$

Note that when the noise power is zero, i.e., $\Gamma = 0$, the solution is $h_0 = 1$ and $h_j = 0$ for $j = 1, \ldots, p$; that is, no filtering is required to obtain $Z_n$.

Equation (10.85) can be readily solved using Octave. The following function computes the optimum linear coefficients and the mean square error of the optimum estimator:

    function mse = Lin_Est_AR(order, rho, varsig, varnoise)
      % Optimum filter for an AR(1) signal in white noise, Eq. (10.85).
      n = [0:1:order-1];
      r = varsig * rho.^n;                       % R_Z(0), ..., R_Z(order-1)
      R = varnoise*eye(order) + toeplitz(r);     % R_Z + R_N matrix
      H = inv(R) * transpose(r);                 % optimum coefficients h_0, ..., h_p
      mse = varsig - transpose(H)*transpose(r);  % mean square error, Eq. (10.80)
    endfunction

Table 10.1 gives the values of the optimal predictor coefficients and the mean square error as the order of the estimator is increased for the first-order autoregressive process with $\sigma_Z^2 = 4$, $r = 0.9$, and noise variance $\sigma_N^2 = 4$. It can be seen that the predictor places heavier weight on more recent samples, which is consistent with the higher correlation of such samples with the current sample. For smaller values of r, the correlation for distant samples drops off more quickly and the coefficients place even lower weighting on them.
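The Octave routine above translates directly to NumPy. A sketch (building the Toeplitz matrix by hand so only NumPy is needed) that reproduces the first rows of Table 10.1:

```python
import numpy as np

def lin_est_ar(order, rho, varsig, varnoise):
    """Solve Eq. (10.85): optimum filter for an AR(1) signal in white noise."""
    n = np.arange(order)
    # Toeplitz autocorrelation matrix of the observations, R_Z + R_N
    R = varsig * rho ** np.abs(n[:, None] - n[None, :]) + varnoise * np.eye(order)
    r = varsig * rho ** n                 # R_Z(0), ..., R_Z(order-1)
    h = np.linalg.solve(R, r)             # optimum coefficients
    return h, varsig - h @ r              # coefficients and mean square error

h, mse = lin_est_ar(1, 0.9, 4.0, 4.0)
print(h, mse)                             # [0.5] 2.0
h, mse = lin_est_ar(2, 0.9, 4.0, 4.0)
print(h, mse)                             # approximately [0.37304 0.28213] 1.4922
```

The printed values match the $p+1 = 1$ and $p+1 = 2$ rows of Table 10.1.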
TABLE 10.1 Effect of predictor order on MSE performance ($\sigma_Z^2 = 4$, $r = 0.9$, $\sigma_N^2 = 4$).

    p+1   MSE      Coefficients
    1     2.0000   0.5
    2     1.4922   0.37304  0.28213
    3     1.3193   0.32983  0.22500  0.17017
    4     1.2549   0.31374  0.20372  0.13897  0.10510
    5     1.2302   0.30754  0.19552  0.12696  0.08661  0.065501

The mean square error can also be seen to decrease with increasing order $p+1$ of the estimator. Increasing the first few orders provides significant improvements, but a point of diminishing returns is reached around $p+1 = 3$.

10.4.2 Prediction

The linear prediction problem arises in many signal processing applications. In Example 6.31 in Chapter 6, we already discussed the linear prediction of speech signals. In general, we wish to predict $Z_n$ in terms of $Z_{n-1}, Z_{n-2}, \ldots, Z_{n-p}$:

$$Y_n = \sum_{\beta=1}^{p} h_\beta Z_{n-\beta}.$$

For this problem, $X_\alpha = Z_\alpha$, so Eq. (10.78) becomes

$$R_Z(m) = \sum_{\beta=1}^{p} h_\beta R_Z(m-\beta) \qquad m \in \{1, \ldots, p\}. \qquad (10.86a)$$

In matrix form this equation becomes

$$\begin{bmatrix} R_Z(1) \\ R_Z(2) \\ \vdots \\ R_Z(p) \end{bmatrix} = \begin{bmatrix} R_Z(0) & R_Z(1) & \cdots & R_Z(p-1) \\ R_Z(1) & R_Z(0) & \cdots & R_Z(p-2) \\ \vdots & & & \vdots \\ R_Z(p-1) & R_Z(p-2) & \cdots & R_Z(0) \end{bmatrix}\begin{bmatrix} h_1 \\ h_2 \\ \vdots \\ h_p \end{bmatrix} = \mathbf{R}_Z\mathbf{h}. \qquad (10.86b)$$

Equations (10.86a) and (10.86b) are called the Yule-Walker equations. Equation (10.80) for the mean square error becomes

$$E[e_n^2] = R_Z(0) - \sum_{\beta=1}^{p} h_\beta R_Z(\beta). \qquad (10.87)$$

By inverting the $p \times p$ matrix $\mathbf{R}_Z$, we can solve for the vector of filter coefficients $\mathbf{h}$.

Example 10.23 Prediction for Long-Range and Short-Range Dependent Processes

Let $X_1(t)$ be a discrete-time first-order autoregressive process with $\sigma_X^2 = 1$ and $r = 0.7411$, and let $X_2(t)$ be a discrete-time long-range dependent process with autocovariance given by Eq. (9.109), $\sigma_X^2 = 1$, and $H = 0.9$. Both processes have $C_X(1) = 0.7411$, but the autocovariance of $X_1(t)$ decreases exponentially while that of $X_2(t)$ has long-range dependence. Compare the performance of the optimal linear predictor for these processes for short-term as well as long-term predictions.
The optimum linear coefficients and the associated mean square error for the long-range dependent process can be calculated using the following code. The function can be modified for the autoregressive case.

    function mse = Lin_Pred_LR(order, Hurst, varsig)
      % Optimum linear predictor for a long-range dependent process with
      % the fractional-Gaussian-noise autocovariance of Eq. (9.109).
      n = [0:1:order-1];
      H2 = 2*Hurst;
      r = varsig*((1+n).^H2 - 2*(n.^H2) + abs(n-1).^H2)/2;   % C_X(0..order-1)
      rz = varsig*((2+n).^H2 - 2*((n+1).^H2) + (n).^H2)/2;   % C_X(1..order)
      R = toeplitz(r);
      H = transpose(inv(R)*transpose(rz));                   % coefficients
      mse = varsig - H*transpose(rz);                        % Eq. (10.87)
    endfunction

Table 10.2 below compares the mean square errors and the coefficients of the two processes in the case of short-term prediction.

TABLE 10.2(a) Short-term prediction: autoregressive, r = 0.7411, $\sigma_X^2 = 1$, $C_X(1) = 0.7411$.

    p   MSE       Coefficients
    1   0.45077   0.74110
    2   0.45077   0.74110  0

TABLE 10.2(b) Short-term prediction: long-range dependent process, Hurst = 0.9, $\sigma_X^2 = 1$, $C_X(1) = 0.7411$.

    p   MSE       Coefficients
    1   0.45077   0.74110
    2   0.43625   0.60809   0.17948
    3   0.42712   0.582127  0.091520  0.144649
    4   0.42253   0.567138  0.082037  0.084329  0.103620
    5   0.41964   0.558567  0.075061  0.077543  0.056707  0.082719

The predictor for $X_1(t)$ attains all of the benefit of prediction with a $p = 1$ system. The optimum predictors for higher-order systems set the other coefficients to zero, and the mean square error remains at 0.45077. The predictor for $X_2(t)$
On the other hand, X21t2 retains significant correlation with its previous values and so the mean square error provides a significant reduction from the unit variance. Note that the second-order predictor places significant weight on the observation 20 samples in the past. TABLE 10.3(a) Long-term prediction: autoregressive, r = 0.7411, s2X = 1, CX(1) = 0.7411. p MSE 1 2 0.99750 0.99750 Coefficients 0.04977 0.04977 0 TABLE 10.3(b) Long-term prediction: long-range dependent process, Hurst = 0.9, s2X = 1, CX(1) = 0.7411. p MSE Coefficients 10 0.79354 0.45438 10;20 0.74850 0.34614 0.23822 Section 10.4 Optimum Linear Systems 613 10.4.3 Estimation Using the Entire Realization of the Observed Process Suppose that Zt is to be estimated by a linear function Yt of the entire realization of Xt , that is, a = b = q and Eq. (10.73) becomes Yt = a hbXt - b . q q b=- In the case of continuous-time random processes, we have q L- q Y1t2 = h1b2X1t - b2 db. The optimum filters must satisfy Eqs. (10.78) and (10.79), which in this case become RZ,X1m2 = a hbRX1m - b2 q q for all m (10.88a) b=- RZ,X1t2 = q L- q h1b2RX1t - b2 db for all t. (10.88b) The Fourier transform of the first equation and the Fourier transform of the second equation both yield the same expression: SZ,X1f2 = H1f2SX1f2, which is readily solved for the transfer function of the optimum filter: H1f2 = SZ,X1f2 SX1f2 . (10.89) The impulse response of the optimum filter is then obtained by taking the appropriate inverse transform. In general the filter obtained from Eq. (10.89) will be noncausal, that is, its impulse response is nonzero for t 6 0. We already indicated that there are applications where this makes sense, namely, in situations where the entire realization Xa is recorded and the estimate Zt is obtained in “nonreal time” by “playing back” Xa . 
Example 10.24 Infinite Smoothing

Find the transfer function for the optimum filter for estimating Z(t) from $X(\alpha) = Z(\alpha) + N(\alpha)$, $\alpha \in (-\infty, \infty)$, where $Z(\alpha)$ and $N(\alpha)$ are independent, zero-mean random processes.

The cross-correlation between the observation and the desired signal is

$$R_{Z,X}(\tau) = E[Z(t+\tau)X(t)] = E[Z(t+\tau)(Z(t) + N(t))] = E[Z(t+\tau)Z(t)] + E[Z(t+\tau)N(t)] = R_Z(\tau),$$

since Z(t) and N(t) are zero-mean, independent random processes. The cross-power spectral density is then

$$S_{Z,X}(f) = S_Z(f). \qquad (10.90)$$

The autocorrelation of the observation process is

$$R_X(\tau) = E[(Z(t+\tau) + N(t+\tau))(Z(t) + N(t))] = R_Z(\tau) + R_N(\tau).$$

The corresponding power spectral density is

$$S_X(f) = S_Z(f) + S_N(f). \qquad (10.91)$$

Substituting Eqs. (10.90) and (10.91) into Eq. (10.89) gives

$$H(f) = \frac{S_Z(f)}{S_Z(f) + S_N(f)}. \qquad (10.92)$$

Note that the optimum filter H(f) is nonzero only at the frequencies where $S_Z(f)$ is nonzero, that is, where the signal has power content. By dividing the numerator and denominator of Eq. (10.92) by $S_Z(f)$, we see that H(f) emphasizes the frequencies where the ratio of signal to noise power density is large.

*10.4.4 Estimation Using Causal Filters

Now, suppose that $Z_t$ is to be estimated using only the past and present of $X_\alpha$, that is, $I = (-\infty, t]$. Equations (10.78) and (10.79) become

$$R_{Z,X}(m) = \sum_{\beta=0}^{\infty} h_\beta R_X(m-\beta) \qquad m \ge 0 \qquad (10.93a)$$

$$R_{Z,X}(\tau) = \int_{0}^{\infty} h(\beta)R_X(\tau-\beta)\,d\beta \qquad \tau \ge 0. \qquad (10.93b)$$

Equations (10.93a) and (10.93b) are called the Wiener-Hopf equations and, though similar in appearance to Eqs. (10.88a) and (10.88b), are considerably more difficult to solve.

First, let us consider the special case where the observation process is white, that is, for the discrete-time case $R_X(m) = \delta_m$. Equation (10.93a) is then

$$R_{Z,X}(m) = \sum_{\beta=0}^{\infty} h_\beta\delta_{m-\beta} = h_m \qquad m \ge 0. \qquad (10.94)$$

Thus in this special case, the optimum causal filter has coefficients given by

$$h_m = \begin{cases} 0 & m < 0 \\ R_{Z,X}(m) & m \ge 0. \end{cases}$$

The corresponding transfer function is

$$H(f) = \sum_{m=0}^{\infty} R_{Z,X}(m)e^{-j2\pi fm}. \qquad (10.95)$$
Note that Eq. (10.95) is not $S_{Z,X}(f)$, since the limits of the Fourier transform in Eq. (10.95) do not extend from $-\infty$ to $+\infty$. However, H(f) can be obtained from $S_{Z,X}(f)$ by finding $h_m = \mathcal{F}^{-1}[S_{Z,X}(f)]$, keeping the causal part (i.e., $h_m$ for $m \ge 0$), and setting the noncausal part to 0.

We now show how the solution of the above special case can be used to solve the general case. It can be shown that under very general conditions, the power spectral density of a random process can be factored into the form

$$S_X(f) = |G(f)|^2 = G(f)G^*(f), \qquad (10.96)$$

where G(f) and 1/G(f) are causal filters. (The method for factoring $S_X(f)$ as specified by Eq. (10.96) is called spectral factorization. See Example 10.10 and the references at the end of the chapter.) This suggests that we can find the optimum filter in two steps, as shown in Fig. 10.15. First, we pass the observation process through a "whitening" filter with transfer function $W(f) = 1/G(f)$ to produce a white noise process $X_n'$, since

$$S_{X'}(f) = |W(f)|^2S_X(f) = \frac{|G(f)|^2}{|G(f)|^2} = 1 \qquad \text{for all } f.$$

Second, we find the best estimator for $Z_n$ using the whitened observation process $X_n'$ as given by Eq. (10.95). The filter that results from the tandem combination of the whitening filter and the estimation filter is the solution to the Wiener-Hopf equations.

The transfer function of the second filter in Fig. 10.15 is

$$H_2(f) = \sum_{m=0}^{\infty} R_{Z,X'}(m)e^{-j2\pi fm} \qquad (10.97)$$

by Eq. (10.95). To evaluate Eq. (10.97) we need to find

$$R_{Z,X'}(k) = E[Z_{n+k}X_n'] = \sum_{i=0}^{\infty} w_iE[Z_{n+k}X_{n-i}] = \sum_{i=0}^{\infty} w_iR_{Z,X}(k+i), \qquad (10.98)$$

where $w_i$ is the impulse response of the whitening filter. The Fourier transform of Eq. (10.98) gives an expression that is easier to work with:

$$S_{Z,X'}(f) = W^*(f)S_{Z,X}(f) = \frac{S_{Z,X}(f)}{G^*(f)}. \qquad (10.99)$$

FIGURE 10.15 Whitening filter approach for solving the Wiener-Hopf equations: $X_n$ is passed through W(f) to produce the white process $X_n'$, which is passed through $H_2(f)$ to produce $Y_n$.

The inverse Fourier transform of Eq.
(10.99) yields the desired $R_{Z,X'}(k)$, which can then be substituted into Eq. (10.97) to obtain $H_2(f)$.

In summary, the optimum filter is found using the following procedure:

1. Factor $S_X(f)$ as in Eq. (10.96) and obtain a causal whitening filter $W(f) = 1/G(f)$.
2. Find $R_{Z,X'}(k)$ from Eq. (10.98) or from Eq. (10.99).
3. $H_2(f)$ is then given by Eq. (10.97).
4. The optimum filter is then

$$H(f) = W(f)H_2(f). \qquad (10.100)$$

This procedure is valid for the continuous-time version of the optimum causal filter problem, after appropriate changes are made from summations to integrals. The following example considers a continuous-time problem.

Example 10.25 Wiener Filter

Find the optimum causal filter for estimating a signal Z(t) from the observation $X(t) = Z(t) + N(t)$, where Z(t) and N(t) are independent random processes, N(t) is zero-mean white noise with density 1, and Z(t) has power spectral density

$$S_Z(f) = \frac{2}{1 + 4\pi^2f^2}.$$

The optimum filter in this problem is called the Wiener filter. The cross-power spectral density between Z(t) and X(t) is $S_{Z,X}(f) = S_Z(f)$, since the signal and noise are independent random processes. The power spectral density for the observation process is

$$S_X(f) = S_Z(f) + S_N(f) = \frac{3 + 4\pi^2f^2}{1 + 4\pi^2f^2} = \left(\frac{j2\pi f + \sqrt{3}}{j2\pi f + 1}\right)\left(\frac{-j2\pi f + \sqrt{3}}{-j2\pi f + 1}\right).$$

If we let

$$G(f) = \frac{j2\pi f + \sqrt{3}}{j2\pi f + 1},$$

then it is easy to verify that $W(f) = 1/G(f)$ is the causal whitening filter. Next we evaluate Eq. (10.99):

$$S_{Z,X'}(f) = \frac{S_{Z,X}(f)}{G^*(f)} = \frac{2}{1 + 4\pi^2f^2}\cdot\frac{1 - j2\pi f}{\sqrt{3} - j2\pi f} = \frac{2}{(1 + j2\pi f)(\sqrt{3} - j2\pi f)} = \frac{c}{1 + j2\pi f} + \frac{c}{\sqrt{3} - j2\pi f}, \qquad (10.101)$$

where $c = 2/(1 + \sqrt{3})$. If we take the inverse Fourier transform of $S_{Z,X'}(f)$, we obtain

$$R_{Z,X'}(\tau) = \begin{cases} ce^{-\tau} & \tau > 0 \\ ce^{\sqrt{3}\tau} & \tau < 0. \end{cases}$$

Equation (10.97) states that $H_2(f)$ is given by the Fourier transform of the $\tau > 0$ portion of $R_{Z,X'}(\tau)$:

$$H_2(f) = \mathcal{F}\{ce^{-\tau}u(\tau)\} = \frac{c}{1 + j2\pi f}.$$

Note that we could have gotten this result directly from Eq. (10.101) by noting that only the first term gives rise to the positive-time (i.e., causal) component.
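The spectral factorization in Example 10.25 can be verified numerically: $|G(f)|^2$ should reproduce $S_X(f)$ at every frequency, and the partial-fraction constant works out to $c = 2/(1+\sqrt{3}) \approx 0.7321$. A sketch:

```python
import numpy as np

def S_X(fr):
    """Observation spectrum of Example 10.25."""
    return (3 + 4 * np.pi**2 * fr**2) / (1 + 4 * np.pi**2 * fr**2)

def G(fr):
    """Causal spectral factor of S_X, Eq. (10.96)."""
    s = 2j * np.pi * fr
    return (s + np.sqrt(3)) / (s + 1)

fr = np.linspace(-2.0, 2.0, 401)
# |G(f)|^2 = G(f) G*(f) must equal S_X(f) for all f
print(np.max(np.abs(np.abs(G(fr))**2 - S_X(fr))))   # ~0

c = 2 / (1 + np.sqrt(3))      # residue in the partial fractions of Eq. (10.101)
print(round(c, 4))            # 0.7321
```

Note that both $G(f)$ and $1/G(f)$ have their pole in the left half of the s-plane ($s = -1$ and $s = -\sqrt{3}$, respectively), which is what makes the whitening filter causal.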
The optimum filter is then

$$H(f) = \frac{1}{G(f)}H_2(f) = \frac{c}{\sqrt{3} + j2\pi f}.$$

The impulse response of this filter is

$$h(t) = ce^{-\sqrt{3}t} \qquad t > 0.$$

10.5 THE KALMAN FILTER

The optimum linear systems considered in the previous section have two limitations: (1) They assume wide-sense stationary signals; and (2) The number of equations grows with the size of the observation set. In this section, we consider an estimation approach that assumes signals have a certain structure. This assumption keeps the dimensionality of the problem fixed even as the observation set grows. It also allows us to consider certain nonstationary signals.

We will consider the class of signals that can be represented as shown in Fig. 10.16(a):

$$Z_n = a_{n-1}Z_{n-1} + W_{n-1} \qquad n = 1, 2, \ldots, \qquad (10.102)$$

where $Z_0$ is the random variable at time 0, $a_n$ is a known sequence of constants, and $W_n$ is a sequence of zero-mean uncorrelated random variables with possibly time-varying variances $\{E[W_n^2]\}$. The resulting process $Z_n$ is nonstationary in general. We assume that the process $Z_n$ is not available to us, and that instead, as shown in Fig. 10.16(a), we observe

$$X_n = Z_n + N_n \qquad n = 0, 1, 2, \ldots, \qquad (10.103)$$

where the observation noise $N_n$ is a zero-mean, uncorrelated sequence of random variables with possibly time-varying variances $\{E[N_n^2]\}$. We assume that $W_n$ and $N_n$ are uncorrelated at all times $n_1$ and $n_2$. In the special case where $W_n$ and $N_n$ are Gaussian random processes, $Z_n$ and $X_n$ will also be Gaussian random processes.

We will develop the Kalman filter, which has the structure in Fig. 10.16(b). Our objective is to find for each time n the minimum mean square estimate (actually prediction) of $Z_n$ based on the observations $X_0, X_1, \ldots, X_{n-1}$ using a linear estimator that possibly varies with time:

$$Y_n = \sum_{j=1}^{n} h_j^{(n-1)}X_{n-j}. \qquad (10.104)$$

FIGURE 10.16 (a) Signal structure.
(b) Kalman filter.

The orthogonality principle implies that the optimum filter $\{h_j^{(n-1)}\}$ satisfies

$$E\left[\left(Z_n - \sum_{j=1}^{n} h_j^{(n-1)}X_{n-j}\right)X_l\right] = 0 \qquad \text{for } l = 0, 1, \ldots, n-1,$$

which leads to a set of n equations in n unknowns:

$$R_{Z,X}(n,l) = \sum_{j=1}^{n} h_j^{(n-1)}R_X(n-j,l) \qquad \text{for } l = 0, 1, \ldots, n-1. \qquad (10.105)$$

At the next time instant, we need to find

$$Y_{n+1} = \sum_{j=1}^{n+1} h_j^{(n)}X_{n+1-j} \qquad (10.106)$$

by solving a system of $(n+1) \times (n+1)$ equations:

$$R_{Z,X}(n+1,l) = \sum_{j=1}^{n+1} h_j^{(n)}R_X(n+1-j,l) \qquad \text{for } l = 0, 1, \ldots, n. \qquad (10.107)$$

Up to this point we have followed the procedure of the previous section and we find that the dimensionality of the problem grows with the number of observations. We now use the signal structure to develop a recursive method for solving Eq. (10.106).

We first need the following two results: For $l < n$, we have

$$R_{Z,X}(n+1,l) = E[Z_{n+1}X_l] = E[(a_nZ_n + W_n)X_l] = a_nR_{Z,X}(n,l) + E[W_nX_l] = a_nR_{Z,X}(n,l), \qquad (10.108)$$

since $E[W_nX_l] = E[W_n]E[X_l] = 0$, that is, $W_n$ is uncorrelated with the past of the process and the observations prior to time n, as can be seen from Fig. 10.16(a). Also for $l < n$, we have

$$R_{Z,X}(n,l) = E[Z_nX_l] = E[(X_n - N_n)X_l] = R_X(n,l) - E[N_nX_l] = R_X(n,l), \qquad (10.109)$$

since $E[N_nX_l] = E[N_n]E[X_l] = 0$, that is, the observation noise at time n is uncorrelated with prior observations.

We now show that the set of equations in Eq. (10.107) can be related to the set in Eq. (10.105). For $l < n$, we can equate the right-hand sides of Eqs. (10.108) and (10.107):

$$a_nR_{Z,X}(n,l) = \sum_{j=1}^{n+1} h_j^{(n)}R_X(n+1-j,l) = h_1^{(n)}R_X(n,l) + \sum_{j=2}^{n+1} h_j^{(n)}R_X(n+1-j,l) \qquad \text{for } l = 0, 1, \ldots, n-1. \qquad (10.110)$$

From Eq. (10.109) we have $R_X(n,l) = R_{Z,X}(n,l)$, so we can replace the first term on the right-hand side of Eq. (10.110) and then move the resulting term to the left-hand side:

$$(a_n - h_1^{(n)})R_{Z,X}(n,l) = \sum_{j=2}^{n+1} h_j^{(n)}R_X(n+1-j,l) = \sum_{j'=1}^{n} h_{j'+1}^{(n)}R_X(n-j',l). \qquad (10.111)$$
n 1n2 (10.111) j¿ = 1 By dividing both sides by an - h1n2 we finally obtain 1 1n2 RZ,X1n, l2 = a hj¿ + 1 n 1n2 - h1 j¿ = 1 a n RX1n - j¿, l2 for l = 0, 1, Á , n - 1. (10.112) This set of equations is identical to Eq. (10.105) if we set 1n - 12 hj 1n2 = hj + 1 for j = 1, Á , n. 1n2 a n - h1 1n - 12 (10.113a) 1n - 12 Therefore, if at step n we have found h1 , Á , hn , and if somehow we have found 1n2 h1 , then we can find the remaining coefficients from 1n2 1n2 1n - 12 hj + 1 = 1a n - h1 2hj 1n2 Thus the key question is how to find h1 . j = 1, Á , n. (10.113b) 620 Chapter 10 Analysis and Processing of Random Signals Suppose we substitute the coefficients in Eq. (10.113b) into Eq. (10.106): Yn + 1 = h1 Xn + a 1an - h1 2hj¿ 1n2 n 1n2 j¿ = 1 1n2 1n - 12 Xn - j¿ 1n2 = h1 Xn + 1an - h1 2Yn 1n2 = anYn + h1 1Xn - Yn2, (10.114) where the second equality follows from Eq. (10.104). The above equation has a very pleasing interpretation, as shown in Fig. 10.16(b). Since Yn is the prediction for time n, anYn is the prediction for the next time instant, n + 1, based on the “old” information (see Eq. (10.102)). The term 1Xn - Yn2 is called the “innovations,” and it gives the discrepancy between the old prediction and the observation. Finally, the term h1n2 1 is called the gain, henceforth denoted by kn , and it indicates the extent to which the innovations should be used to correct anYn to obtain the “new” prediction Yn + 1 . If we denote the innovations by (10.115) In = Xn - Yn then Eq. (10.114) becomes Yn + 1 = anYn + knIn . (10.116) We still need to determine a means for computing the gain kn . From Eq. (10.115), we have that the innovations satisfy In = Xn - Yn = Zn + Nn - Yn = Zn - Yn + Nn = en + Nn , where en = Zn - Yn is the prediction error. A recursive equation can be obtained for the prediction error: en + 1 = Zn + 1 - Yn + 1 = anZn + Wn - anYn - knIn = an1Zn - Yn2 + Wn - kn1en + Nn2 = 1an - kn2en + Wn - knNn , (10.117) with initial condition e0 = Z0 . 
Since $Z_0$, $W_n$, and $N_n$ are zero-mean, it then follows that $E[e_n] = 0$ for all n. A recursive equation for the mean square prediction error is obtained from Eq. (10.117):

$$E[e_{n+1}^2] = (a_n - k_n)^2E[e_n^2] + E[W_n^2] + k_n^2E[N_n^2], \qquad (10.118)$$

with initial condition $E[e_0^2] = E[Z_0^2]$.

We are finally ready to obtain an expression for the gain $k_n$. The gain $k_n$ must minimize the mean square error $E[e_{n+1}^2]$. Therefore we can differentiate Eq. (10.118) with respect to $k_n$ and set it equal to zero:

$$0 = -2(a_n - k_n)E[e_n^2] + 2k_nE[N_n^2].$$

Then we can solve for $k_n$:

$$k_n = \frac{a_nE[e_n^2]}{E[e_n^2] + E[N_n^2]}. \qquad (10.119)$$

The expression for the mean square prediction error in Eq. (10.118) can be simplified by using Eq. (10.119) (see Problem 10.72):

$$E[e_{n+1}^2] = a_n(a_n - k_n)E[e_n^2] + E[W_n^2]. \qquad (10.120)$$

Equations (10.119), (10.116), and (10.120) when combined yield the recursive procedure that constitutes the Kalman filtering algorithm. (We caution the student that there are two common ways of defining the gain. The statement of the Kalman filter algorithm will differ accordingly in various textbooks.)

Kalman filter algorithm:

Initialization: $Y_0 = 0$, $E[e_0^2] = E[Z_0^2]$

For $n = 0, 1, 2, \ldots$:

$$k_n = \frac{a_nE[e_n^2]}{E[e_n^2] + E[N_n^2]}$$

$$Y_{n+1} = a_nY_n + k_n(X_n - Y_n)$$

$$E[e_{n+1}^2] = a_n(a_n - k_n)E[e_n^2] + E[W_n^2].$$

Note that the algorithm requires knowledge of the signal structure, i.e., the $a_n$, and the variances $E[N_n^2]$ and $E[W_n^2]$. The algorithm can be implemented easily and has consequently found application in a broad range of detection, estimation, and signal processing problems. The algorithm can be extended in matrix form to accommodate a broader range of processes.

Example 10.26 First-Order Autoregressive Process

Consider a signal defined by

$$Z_n = aZ_{n-1} + W_n \qquad n = 1, 2, \ldots, \qquad Z_0 = 0,$$

where $E[W_n^2] = \sigma_W^2 = 0.36$ and $a = 0.8$, and suppose the observations are made in additive white noise

$$X_n = Z_n + N_n \qquad n = 0, 1, 2, \ldots,$$

where $E[N_n^2] = 1$. Find the form of the predictor and its mean square error as $n \to \infty$.

The gain at step n is given by

$$k_n = \frac{aE[e_n^2]}{E[e_n^2] + 1}.$$
The mean square error sequence is therefore given by $E[e_0^2] = E[Z_0^2] = 0$ and

$$E[e_{n+1}^2] = a(a - k_n)E[e_n^2] + \sigma_W^2 = \frac{a^2E[e_n^2]}{1 + E[e_n^2]} + \sigma_W^2 \qquad \text{for } n = 1, 2, \ldots.$$

The steady state mean square error $e_\infty$ must satisfy

$$e_\infty = \frac{a^2e_\infty}{1 + e_\infty} + \sigma_W^2.$$

For $a = 0.8$ and $\sigma_W^2 = 0.36$, the resulting quadratic equation yields $k_\infty = 0.3$ and $e_\infty = 0.6$. Thus at steady state the predictor is

$$Y_{n+1} = 0.8Y_n + 0.3(X_n - Y_n).$$

*10.6 ESTIMATING THE POWER SPECTRAL DENSITY

Let $X_0, \ldots, X_{k-1}$ be k observations of the discrete-time, zero-mean, wide-sense stationary process $X_n$. The periodogram estimate for $S_X(f)$ is defined as

$$\tilde{p}_k(f) = \frac{1}{k}|\tilde{x}_k(f)|^2, \qquad (10.121)$$

where $\tilde{x}_k(f)$ is obtained as a Fourier transform of the observation sequence:

$$\tilde{x}_k(f) = \sum_{m=0}^{k-1} X_me^{-j2\pi fm}. \qquad (10.122)$$

In Section 10.1 we showed that the expected value of the periodogram estimate is

$$E[\tilde{p}_k(f)] = \sum_{m'=-(k-1)}^{k-1}\left(1 - \frac{|m'|}{k}\right)R_X(m')e^{-j2\pi fm'}, \qquad (10.123)$$

so $\tilde{p}_k(f)$ is a biased estimator for $S_X(f)$. However, as $k \to \infty$,

$$E[\tilde{p}_k(f)] \to S_X(f), \qquad (10.124)$$

so the mean of the periodogram estimate approaches $S_X(f)$.

Before proceeding to find the variance of the periodogram estimate, we note that the periodogram estimate is equivalent to taking the Fourier transform of an estimate for the autocorrelation sequence; that is,

$$\tilde{p}_k(f) = \sum_{m=-(k-1)}^{k-1} \hat{r}_k(m)e^{-j2\pi fm}, \qquad (10.125)$$

where the estimate for the autocorrelation is

$$\hat{r}_k(m) = \frac{1}{k}\sum_{n=0}^{k-|m|-1} X_nX_{n+m}. \qquad (10.126)$$

(See Problem 10.77.)
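The equivalence of Eqs. (10.121) and (10.125) is easy to confirm numerically; any zero-mean sample sequence will do (a sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
k = 64
X = rng.standard_normal(k)          # an arbitrary zero-mean observation record

def periodogram(X, f):
    """Eqs. (10.121)-(10.122): squared magnitude of the transform over k."""
    m = np.arange(len(X))
    xk = np.sum(X * np.exp(-2j * np.pi * f * m))
    return np.abs(xk) ** 2 / len(X)

def periodogram_from_acf(X, f):
    """Eqs. (10.125)-(10.126): transform of the biased autocorrelation estimate."""
    k = len(X)
    total = 0.0
    for m in range(-(k - 1), k):
        r = np.sum(X[:k - abs(m)] * X[abs(m):]) / k   # r_hat_k(m), Eq. (10.126)
        total += r * np.exp(-2j * np.pi * f * m)
    return total.real

f = 0.2
print(periodogram(X, f), periodogram_from_acf(X, f))  # the two values agree
```

Expanding the squared magnitude in Eq. (10.121) as a double sum over $m$ and $m'$ and collecting terms with equal lag $m - m'$ is exactly what the second function computes.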
This does not happen. Instead we find that pk1f2 fluctuates wildly about the true spectral density, and that this random variation does not decrease with increased k (see Fig. 10.17). To see why this happens, in the next section we compute the statistics of the periodogram estimate for a white noise Gaussian random process. We find that the estimates given by the periodogram have a variance that does not approach zero as the number of samples is increased. This explains the lack of improvement in the estimate as k is increased. Furthermore, we show that the periodogram estimates are uncorrelated at uniformly spaced frequencies in the interval -1>2 … f 6 1>2. This explains the erratic appearance of the periodogram estimate as a function of f. In the final section, we obtain another estimate for SX1f2 whose variance does approach zero as k increases. 10.6.1 Variance of Periodogram Estimate Following the approach of [Jenkins and Watts, pp. 230–233], we consider the periodogram of samples of a white noise process with SX1f2 = s2X at the frequencies f = n>k, -k>2 … n 6 k>2, which will cover the frequency range -1>2 … f 6 1>2. (In practice these are the frequencies we would evaluate if we were using the FFT al' gorithm to compute xk1f2.) First we rewrite Eq. (10.122) at f = n>k as follows: 2pmn 2pmn ' n xk a b = a Xm acosa b - j sina bb k k k m=0 k-1 = A k1n2 - jBk1n2 -k>2 … n 6 k>2, (10.127) 624 Chapter 10 Analysis and Processing of Random Signals where A k1n2 = a Xm cosa k-1 m=0 and Bk1n2 = a Xm sina k-1 m=0 2pmn b k (10.128) 2pmn b. k (10.129) Then it follows that the periodogram estimate is n 2 1 1 ' n pk a b = ` xN k a b ` = 5A 2k1n2 + B2k1n26. (10.130) k k k k ' We find the variance of pk1n>k2 from the statistics of A k1n2 and Bk1n2. The random variables A k1n2 and Bk1n2 are defined as linear functions of the jointly Gaussian random variables X0 , Á , Xk - 1 . Therefore A k1n2 and Bk1n2 are also jointly Gaussian random variables. 
If we take the expected value of Eqs. (10.128) and (10.129) we find

E[A_k(n)] = 0 = E[B_k(n)]   for all n.   (10.131)

Note also that the n = −k/2 and n = 0 terms are different in that

B_k(−k/2) = 0 = B_k(0)   (10.132a)

A_k(−k/2) = Σ_{i=0}^{k−1} (−1)^i X_i,   A_k(0) = Σ_{i=0}^{k−1} X_i.   (10.132b)

The correlation between A_k(n) and A_k(m) (for n, m not equal to −k/2 or 0) is

E[A_k(n)A_k(m)] = Σ_{i=0}^{k−1} Σ_{l=0}^{k−1} E[X_i X_l] cos(2πni/k) cos(2πml/k)
 = σ_X² Σ_{i=0}^{k−1} cos(2πni/k) cos(2πmi/k)
 = (σ_X²/2) Σ_{i=0}^{k−1} cos(2π(n − m)i/k) + (σ_X²/2) Σ_{i=0}^{k−1} cos(2π(n + m)i/k),

where we used the fact that E[X_i X_l] = σ_X² δ_il since the noise is white. The second summation is equal to zero, and the first summation is zero except when n = m. Thus

E[A_k(n)A_k(m)] = (1/2)kσ_X² δ_nm   for all n, m ≠ −k/2, 0.   (10.133a)

It can similarly be shown that

E[B_k(n)B_k(m)] = (1/2)kσ_X² δ_nm,   n, m ≠ −k/2, 0   (10.133b)

E[A_k(n)B_k(m)] = 0   for all n, m.   (10.133c)

When n = −k/2 or 0, we have

E[A_k(n)A_k(m)] = kσ_X² δ_nm   for all m.   (10.133d)

Equations (10.133a) through (10.133d) imply that A_k(n) and B_k(m) are uncorrelated random variables. Since A_k(n) and B_k(n) are jointly Gaussian random variables, this implies that they are zero-mean, independent Gaussian random variables.

We are now ready to find the statistics of the periodogram estimates at the frequencies f = n/k. Equation (10.130) gives, for n ≠ −k/2, 0,

p̃_k(n/k) = (1/k){A_k²(n) + B_k²(n)} = (σ_X²/2){A_k²(n)/((1/2)kσ_X²) + B_k²(n)/((1/2)kσ_X²)}.   (10.134)

The quantity in brackets is the sum of the squares of two zero-mean, unit-variance, independent Gaussian random variables. This is a chi-square random variable with two degrees of freedom (see Problem 7.6). From Table 4.1, we see that a chi-square random variable with v degrees of freedom has variance 2v. Thus the expression in the brackets has variance 4, and the periodogram estimate p̃_k(n/k) has variance

VAR[p̃_k(n/k)] = (σ_X²/2)²·4 = σ_X⁴ = S_X(f)².   (10.135a)

For n = −k/2 and n = 0,

p̃_k(n/k) = σ_X²{A_k²(n)/(kσ_X²)}.

The quantity in brackets is a chi-square random variable with one degree of freedom and variance 2, so the variance of the periodogram estimate is

VAR[p̃_k(n/k)] = 2σ_X⁴,   n = −k/2, 0.   (10.135b)

Thus we conclude from Eqs. (10.135a) and (10.135b) that the variance of the periodogram estimate is proportional to the square of the power spectral density and does not approach zero as k increases. In addition, Eqs. (10.133a) through (10.133d) imply that the periodogram estimates at the frequencies f = n/k are uncorrelated random variables. A more detailed analysis [Jenkins and Watts, p. 238] shows that for arbitrary f,

VAR[p̃_k(f)] = S_X(f)²{1 + (sin(2πfk)/(k sin(2πf)))²}.   (10.136)

Thus the variance of the periodogram estimate does not approach zero as the number of samples is increased.

The above discussion has only considered spectrum estimation for a white noise, Gaussian random process, but the general conclusions are also valid for nonwhite, non-Gaussian processes. If the X_i are not Gaussian, we note from Eqs. (10.128) and (10.129) that A_k and B_k are approximately Gaussian by the central limit theorem if k is large. Thus the periodogram estimate is then approximately a chi-square random variable. If the process X_i is not white, then it can be viewed as filtered white noise: X_n = h_n * W_n, where S_W(f) = σ_W² and |H(f)|² S_W(f) = S_X(f). The periodograms of X_n and W_n are related by

(1/k)|x̃_k(n/k)|² = |H(n/k)|² (1/k)|w̃_k(n/k)|².   (10.137)

Thus

(1/k)|w̃_k(n/k)|² = (1/k)|x̃_k(n/k)|²/|H(n/k)|².   (10.138)

From our previous results, we know that |w̃_k(n/k)|²/k is a chi-square random variable with variance σ_W⁴. This implies that

VAR[(1/k)|x̃_k(n/k)|²] = |H(n/k)|⁴ σ_W⁴ = S_X(f)².   (10.139)

Thus we conclude that the variance of the periodogram estimate for nonwhite noise is also proportional to S_X(f)².
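This behavior is easy to check by simulation. The following sketch is in Python/NumPy rather than the Octave used elsewhere in this chapter, and the helper names are ours; it estimates VAR[p̃_k(f)] at a fixed frequency for white Gaussian noise with σ_X² = 1 and shows that the variance stays near S_X(f)² = 1 whether k = 64 or k = 1024:

```python
import numpy as np

rng = np.random.default_rng(0)

def periodogram(x):
    """p~_k at the FFT frequencies: (1/k) |sum_m X_m e^{-j2pi f m}|^2."""
    return np.abs(np.fft.fft(x)) ** 2 / len(x)

def periodogram_variance(k, trials=2000, sigma2=1.0):
    """Sample variance of p~_k(f) at f = 1/4 (avoids the special bins n = 0, -k/2)."""
    vals = np.empty(trials)
    for i in range(trials):
        x = rng.normal(scale=np.sqrt(sigma2), size=k)
        vals[i] = periodogram(x)[k // 4]
    return vals.var()

# For white noise S_X(f) = sigma^2 = 1, so VAR[p~_k(f)] should stay near S_X(f)^2 = 1
v64 = periodogram_variance(64)
v1024 = periodogram_variance(1024)
```

Increasing k refines the frequency grid but does not reduce the variance, in agreement with Eq. (10.135a).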
10.6.2 Smoothing of Periodogram Estimate

A fundamental result in probability theory is that the sample mean of a sequence of independent realizations of a random variable approaches the true mean with probability one. We can obtain an estimate for S_X(f) whose variance goes to zero with the number of observations by taking the average of N independent periodograms on samples of size k:

⟨p̃_k(f)⟩_N = (1/N) Σ_{i=1}^{N} p̃_{k,i}(f),   (10.140)

where {p̃_{k,i}(f)} are N independent periodograms computed using separate sets of k samples each. Figures 10.18 and 10.19 show the N = 10 and N = 50 smoothed periodograms corresponding to the unsmoothed periodogram of Fig. 10.17. It is evident that the variance of the power spectrum estimates decreases with N. The mean of the smoothed estimator is

E[⟨p̃_k(f)⟩_N] = (1/N) Σ_{i=1}^{N} E[p̃_{k,i}(f)] = E[p̃_k(f)]
             = Σ_{m′=−(k−1)}^{k−1} {1 − |m′|/k} R_X(m′) e^{−j2πfm′},   (10.141)

where we have used Eq. (10.35). Thus the smoothed estimator has the same mean as the periodogram estimate on a sample of size k.

FIGURE 10.18 Sixty-four-point smoothed periodogram with N = 10; X_n iid uniform in (0, 1), S_X(f) = 1/12 = 0.083.

FIGURE 10.19 Sixty-four-point smoothed periodogram with N = 50; X_n iid uniform in (0, 1), S_X(f) = 1/12 = 0.083.

The variance of the smoothed estimator is

VAR[⟨p̃_k(f)⟩_N] = (1/N²) Σ_{i=1}^{N} VAR[p̃_{k,i}(f)] = (1/N) VAR[p̃_k(f)] ≈ (1/N) S_X(f)².

Thus the variance of the smoothed estimator can be reduced by increasing N, the number of periodograms used in Eq. (10.140). In practice, a sample set of size Nk, X₀, ..., X_{Nk−1}, is divided into N blocks and a separate periodogram is computed for each block. The smoothed estimate is then the average over the N periodograms. This method is called Bartlett's smoothing procedure.
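Bartlett's smoothing procedure amounts to a few lines of code. The sketch below is a Python/NumPy translation of the chapter's Octave-style computations (the function names are ours); it splits a record of Nk samples into N blocks, averages the per-block periodograms, and exhibits the roughly 1/N reduction in spread about S_X(f) = σ_X²:

```python
import numpy as np

rng = np.random.default_rng(1)

def periodogram(x):
    # p~_k(f) = (1/k)|FFT(x)|^2, evaluated at the k FFT frequencies
    return np.abs(np.fft.fft(x)) ** 2 / len(x)

def bartlett_psd(x, num_blocks):
    """Average of per-block periodograms (Bartlett's smoothing procedure)."""
    blocks = np.split(np.asarray(x), num_blocks)
    return np.mean([periodogram(b) for b in blocks], axis=0)

# White Gaussian noise with S_X(f) = sigma^2 = 1; k = 64 points per block.
x = rng.normal(size=64 * 50)
raw = periodogram(x[:64])      # N = 1: fluctuates wildly about 1
smooth = bartlett_psd(x, 50)   # N = 50: concentrated near 1
```

The single-block estimate `raw` has fluctuations on the order of S_X(f) itself, while the 50-block average clusters tightly around the true value 1.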
Note that, in general, the resulting periodograms are not independent because the underlying blocks are not independent. Thus this smoothing procedure must be viewed as an approximation to the computation and averaging of independent periodograms. The choice of k and N is determined by the desired frequency resolution and variance of the estimate. The block size k determines the number of frequencies for which the spectral density is computed (i.e., the frequency resolution). The variance of the estimate is controlled by the number of periodograms N. The actual choice of k and N depends on the nature of the signal being investigated.

10.7 NUMERICAL TECHNIQUES FOR PROCESSING RANDOM SIGNALS

In this chapter our discussion has combined notions from random processes with basic concepts from signal processing. The processing of signals is a very important area in modern technology, and a rich set of techniques and methodologies has been developed to address the needs of specific application areas such as communication systems, speech compression, speech recognition, video compression, face recognition, and network and service traffic engineering. In this section we briefly present a number of general tools available for the processing of random signals. We focus on the tools provided in Octave since these are quite useful as well as readily available.

10.7.1 FFT Techniques

The Fourier transform relationship between R_X(τ) and S_X(f) is fundamental in the study of wide-sense stationary processes and plays a key role in random signal analysis. The fast Fourier transform (FFT) methods we developed in Section 7.6 can be applied to the numerical transformation from autocorrelation functions to power spectral densities and back. Consider the computation of R_X(τ) and S_X(f) for continuous-time processes:

R_X(τ) = ∫_{−∞}^{∞} S_X(f) e^{j2πfτ} df ≈ ∫_{−W}^{W} S_X(f) e^{j2πfτ} df.
First we limit the integral to the region where S_X(f) has significant power. Next we restrict our attention to a discrete set of N = 2M frequency values mf₀ so that −W = −Mf₀ < (−M + 1)f₀ < ... < (M − 1)f₀ < W, and then approximate the integral by a sum:

R_X(τ) ≈ f₀ Σ_{m=−M}^{M−1} S_X(mf₀) e^{j2πmf₀τ}.

Finally, we also focus on a set of discrete lag values kt₀ so that −T = −Mt₀ < (−M + 1)t₀ < ... < (M − 1)t₀ < T. We obtain the DFT as follows:

R_X(kt₀) ≈ f₀ Σ_{m=−M}^{M−1} S_X(mf₀) e^{j2πmkt₀f₀} = f₀ Σ_{m=−M}^{M−1} S_X(mf₀) e^{j2πmk/N}.   (10.142)

In order to have a discrete Fourier transform, we must have t₀f₀ = 1/N, which is equivalent to: t₀ = 1/(Nf₀), T = Mt₀ = 1/(2f₀), and W = Mf₀ = 1/(2t₀). We can use the FFT function introduced in Section 7.6 to perform the transformation in Eq. (10.142) to obtain the set of values {R_X(kt₀), k ∈ [−M, M − 1]} from {S_X(mf₀), m ∈ [−M, M − 1]}. The transformation in the reverse direction is done in the same way. Since R_X(τ) and S_X(f) are even functions, various simplifications are possible. We discuss some of these in the problems.

Consider the computation of S_X(f) and R_X(k) for discrete-time processes. S_X(f) spans the range of frequencies |f| < 1/2, so we restrict attention to N points spaced 1/N apart:

S_X(m/N) = Σ_{k=−∞}^{∞} R_X(k) e^{−j2πkf} |_{f=m/N} ≈ Σ_{k=−M}^{M−1} R_X(k) e^{−j2πkm/N}.   (10.143)

The approximation here involves neglecting autocorrelation terms outside [−M, M − 1]. Since df ≈ 1/N, the transformation in the reverse direction is scaled differently:

R_X(k) = ∫_{−1/2}^{1/2} S_X(f) e^{j2πkf} df ≈ (1/N) Σ_{m=−M}^{M−1} S_X(m/N) e^{j2πkm/N}.   (10.144)

We assume that the student has already tried the FFT exercises in Section 7.6, so we leave examples in the use of the FFT to the Problems. The various frequency domain results for linear systems that relate input, output, and cross-spectral densities can be evaluated numerically using the FFT.
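Eqs. (10.143) and (10.144) are easy to exercise numerically. The following is a Python/NumPy sketch of the same computation (the Octave fft obeys the same conventions); it transforms R_X(k) = a^{|k|} to the samples S_X(m/N) with a forward FFT and recovers R_X(k) with the inverse, the 1/N scaling appearing only in the reverse direction:

```python
import numpy as np

N = 256                       # number of frequency points, N = 2M
M = N // 2
a = 0.5
k = np.arange(-M, M)          # lags -M, ..., M-1
R = a ** np.abs(k)            # R_X(k) = a^{|k|}

# Eq. (10.143): S_X(m/N) ~= sum_k R_X(k) e^{-j2pi k m/N}.
# np.fft.fft indexes samples 0..N-1, so rotate the lag axis with ifftshift.
S = np.fft.fft(np.fft.ifftshift(R)).real      # real because R_X is even

# Eq. (10.144): R_X(k) ~= (1/N) sum_m S_X(m/N) e^{+j2pi k m/N} (note the 1/N in ifft)
R_back = np.fft.fftshift(np.fft.ifft(S).real)

# Closed form for comparison: S_X(f) = (1 - a^2) / (1 - 2a cos(2 pi f) + a^2)
f = np.arange(N) / N
S_exact = (1 - a**2) / (1 - 2 * a * np.cos(2 * np.pi * f) + a**2)
```

Because a^M is negligible here, the truncation of the lag range to [−M, M−1] introduces essentially no error, and the forward/inverse pair is an exact round trip.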
Example 10.27 Output Autocorrelation and Cross-Correlation

Consider Example 10.12, where a random telegraph signal X(t) with α = 1 is passed through a lowpass filter with β = 1 and β = 10. Find R_Y(τ).

The random telegraph has S_X(f) = α/(α² + π²f²) and the filter has transfer function H(f) = β/(β + j2πf), so R_Y(τ) is given by:

R_Y(τ) = F⁻¹{|H(f)|² S_X(f)} = ∫_{−∞}^{∞} [β²/(β² + 4π²f²)] [α/(α² + π²f²)] e^{j2πfτ} df.

FIGURE 10.20 (a) Transfer function and input power spectral density; (b) autocorrelation of filtered random telegraph with filter β = 10.

We used an N = 256 FFT to evaluate the autocorrelation functions numerically for α = 1 and β = 1 and β = 10. Figure 10.20(a) shows |H(f)|² and S_X(f) for β = 10. It can be seen that the transfer function (the dashed line) is close to 1 in the region of f where S_X(f) has most of its power. Consequently we expect the output for β = 10 to have an autocorrelation similar to that of the input. For β = 1, on the other hand, the filter will attenuate more of the significant frequencies of X(t), and we expect more change in the output autocorrelation. Figure 10.20(b) shows the output autocorrelation, and we see that indeed for β = 10 (the solid line) R_Y(τ) is close to the double-sided exponential R_X(τ). For β = 1 the output autocorrelation differs significantly from R_X(τ).

10.7.2 Filtering Techniques

The autocorrelation and power spectral density functions provide us with information about the average behavior of the processes. We are also interested in obtaining sample functions of the inputs and outputs of systems. For linear systems the principal tools for signal processing are the convolution and the Fourier transform. Convolution in discrete time (Eq.
(10.48)) is quite simple, and so convolution is the workhorse in linear signal processing. Octave provides several functions for performing convolutions with discrete-time signals. In Example 10.15 we encountered the function filter(b,a,x), which implements filtering of the sequence x with an ARMA filter whose coefficients are specified by the vectors b and a in the following equation:

Y_n = −Σ_{i=1}^{q} a_i Y_{n−i} + Σ_{j=0}^{p} b_j X_{n−j}.

Other functions use filter(b,a,x) to provide special cases of filtering. For example, conv(a,b) convolves the elements in the vectors a and b. We can obtain the output of a linear system by letting a be the impulse response and b the input random sequence. The moving average example in Fig. 10.7(b) is easily obtained using conv. Octave provides other functions implementing specific digital filters.

We can also obtain the output of a linear system in the frequency domain. We take the FFT of the input sequence X_n and then multiply it by the FFT of the transfer function. The inverse FFT then provides the output Y_n of the linear system. The Octave function fftconv(a,b,n) implements this approach. The size of the FFT must be equal to the total number of samples in the input sequence, so this approach is not advisable for long input sequences.

10.7.3 Generation of Random Processes

Finally, we are interested in obtaining discrete-time and continuous-time sample functions of the inputs and outputs of systems. Previous chapters provide us with several tools for the generation of random signals that can act as inputs to the systems of interest. Section 5.10 provides the method for generating independent pairs of Gaussian random variables. This method forms the basis for the generation of iid Gaussian sequences and is implemented in normal_rnd(M,V,Sz). The generation of WSS but correlated sequences of Gaussian random variables requires more work.
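As an aside, the ARMA recursion behind filter(b,a,x) can be written out directly. The following Python sketch (arma_filter is our hypothetical stand-in, following the same (b, a, x) convention with a₀ = 1) makes the difference equation explicit:

```python
import numpy as np

def arma_filter(b, a, x):
    """y_n = -sum_{i>=1} a_i y_{n-i} + sum_{j>=0} b_j x_{n-j}, assuming a[0] = 1."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = sum(b[j] * x[n - j] for j in range(len(b)) if n - j >= 0)
        acc -= sum(a[i] * y[n - i] for i in range(1, len(a)) if n - i >= 0)
        y[n] = acc
    return y

# First-order autoregressive example: y_n = 0.5 y_{n-1} + x_n.
impulse = np.zeros(8)
impulse[0] = 1.0
h = arma_filter(b=[1.0], a=[1.0, -0.5], x=impulse)   # impulse response (1/2)^n
```

Feeding an impulse through the filter recovers the unit-sample response, here the geometric sequence (1/2)ⁿ.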
One approach is to use the matrix methods developed in Section 6.6 to generate individual vectors with a specified covariance matrix. To generate a vector Y of n outcomes with covariance matrix K_Y, we perform the factorization

K_Y = AᵀA = PΛPᵀ,

and we generate the vector Y = AᵀX, where X is a vector of iid zero-mean, unit-variance Gaussian random variables. The Octave function svd(B) performs a singular value decomposition of the matrix B; see [Long]. When B = K_Y is a covariance matrix, svd returns the diagonal matrix D of eigenvalues of K_Y as well as the matrices U = P and V = P.

Example 10.28 Generation of Correlated Gaussian Random Variables

Generate 256 samples of the autoregressive process in Example 10.14 with a = −0.5, σ_X = 1.

The autocorrelation of the process is given by R_X(k) = (−1/2)^{|k|}. We generate a vector r of the first 256 lags of R_X(k) and use the function toeplitz(r) to generate the covariance matrix. We then call svd to obtain A. Finally we produce the output vector Y = AᵀX.

> n=[0:255];
> r=(-0.5).^n;
> K=toeplitz(r);
> [U,D,V]=svd(K);
> X=normal_rnd(0,1,1,256);
> y=V*(D^0.5)*transpose(X);
> plot(y)

Figure 10.21(a) shows a plot of Y. To check that the sequence has the desired autocovariance we use the function autocov(X,H), which estimates the autocovariance function of the sequence X for the first H lag values. Figure 10.21(b) shows the sample correlation coefficient obtained by dividing the autocovariance by the sample variance. The plot shows the alternating covariance values and the expected peak values of −0.5 and 0.25 at the first two lags.

FIGURE 10.21 (a) Correlated Gaussian noise; (b) sample autocovariance.
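The same construction translates directly to other environments. Here is a Python/NumPy sketch (our code; numpy.linalg.eigh replaces Octave's svd, which for a symmetric positive-definite K_Y yields the same factorization K_Y = PΛPᵀ) that generates many realizations and checks the sample covariance:

```python
import numpy as np

rng = np.random.default_rng(2)

def correlated_gaussian(K, size):
    """Draw `size` vectors Y = A^T X with K = A^T A and X iid N(0,1)."""
    w, P = np.linalg.eigh(K)                         # K = P diag(w) P^T
    A_T = P @ np.diag(np.sqrt(np.clip(w, 0, None)))  # A^T = P diag(sqrt(w))
    X = rng.normal(size=(K.shape[0], size))
    return A_T @ X

# Covariance of the AR process of Example 10.28: R_X(k) = (-1/2)^{|k|}
n = 16
lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
K = (-0.5) ** lags
Y = correlated_gaussian(K, 200000)
K_hat = Y @ Y.T / Y.shape[1]                         # sample covariance
```

With 200,000 realizations the sample covariance reproduces the alternating entries of K, including the lag-1 value −0.5 and lag-2 value 0.25, to within sampling error.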
An alternative approach to generating a correlated sequence of random variables with a specified covariance function is to input an uncorrelated sequence into a linear filter with a specific H(f). Equation (10.46) allows us to determine the power spectral density of the output sequence. This approach can be implemented using convolution and is applicable to extremely long signal sequences. A large choice of possible filter functions is available for both continuous-time and discrete-time systems. For example, the ARMA model in Example 10.15 is capable of implementing a broad range of transfer functions. Indeed, the entire discussion in Section 10.4 was focused on obtaining the transfer function of optimal linear systems in various scenarios.

Example 10.29 Generation of White Gaussian Noise

Find a method for generating white Gaussian noise for a simulation of a continuous-time communications system.

The generation of discrete-time white Gaussian noise is trivial and involves the generation of a sequence of iid Gaussian random variables. The generation of continuous-time white Gaussian noise is not so simple. Recall from Example 10.3 that true white noise has infinite bandwidth and hence infinite power, and so is impossible to realize. Real systems, however, are bandlimited, and hence we always end up dealing with bandlimited white noise. If the system of interest is bandlimited to W Hz, then we need to model white noise limited to W Hz. In Example 10.3 we found that this type of noise has autocorrelation

R_X(τ) = N₀ sin(2πWτ)/(2πτ).

The sampling theorem discussed in Section 10.3 allows us to represent bandlimited white Gaussian noise as follows:

X̂(t) = Σ_{n=−∞}^{∞} X(nT) p(t − nT),   where p(t) = sin(πt/T)/(πt/T)

and 1/T = 2W. The coefficients X(nT) have autocorrelation R_X(nT), which is given by

R_X(nT) = N₀ sin(2πWnT)/(2πnT) = N₀ sin(2πWn/2W)/(2πn/2W) = N₀W sin(πn)/(πn)
        = N₀W for n = 0, and 0 for n ≠ 0.
We thus conclude that X(nT) is an iid sequence of Gaussian random variables with variance N₀W. Therefore we can simulate sampled bandlimited white Gaussian noise by generating a sequence X(nT). We can perform any processing required in the discrete-time domain, and we can then apply the result to an interpolator to recover the continuous-time output.

SUMMARY

• The power spectral density of a WSS process is the Fourier transform of its autocorrelation function. The power spectral density of a real-valued random process is a real-valued, nonnegative, even function of frequency.
• The output of a linear, time-invariant system is a WSS random process if its input is a WSS random process that is applied an infinite time in the past.
• The output of a linear, time-invariant system is a Gaussian WSS random process if its input is a Gaussian WSS random process.
• Wide-sense stationary random processes with arbitrary rational power spectral density can be generated by filtering white noise.
• The sampling theorem allows the representation of bandlimited continuous-time processes by the sequence of periodic samples of the process.
• The orthogonality condition can be used to obtain equations for linear systems that minimize mean square error. These systems arise in filtering, smoothing, and prediction problems. Matrix numerical methods are used to find the optimum linear systems.
• The Kalman filter can be used to estimate signals with a structure that keeps the dimensionality of the algorithm fixed even as the size of the observation set increases.
• The variance of the periodogram estimate for the power spectral density does not approach zero as the number of samples is increased. An average of several independent periodograms is required to obtain an estimate whose variance does approach zero as the number of samples is increased.
• The FFT, convolution, and matrix techniques are basic tools for analyzing, simulating, and implementing the processing of random signals.
CHECKLIST OF IMPORTANT TERMS

Amplitude modulation · ARMA process · Autoregressive process · Bandpass signal · Causal system · Cross-power spectral density · Einstein-Wiener-Khinchin theorem · Filtering · Impulse response · Innovations · Kalman filter · Linear system · Long-range dependence · Moving average process · Nyquist sampling rate · Optimum filter · Orthogonality condition · Periodogram · Power spectral density · Prediction · Quadrature amplitude modulation · Sampling theorem · Smoothed periodogram · Smoothing · System · Time-invariant system · Transfer function · Unit-sample response · White noise · Wiener filter · Wiener-Hopf equations · Yule-Walker equations

ANNOTATED REFERENCES

References [1] through [6] contain good discussions of the notion of power spectral density and of the response of linear systems to random inputs. References [6] and [7] give accessible introductions to the spectral factorization problem. References [7] through [9] discuss linear filtering and power spectrum estimation in the context of digital signal processing. Reference [10] discusses the basic theory underlying power spectrum estimation.

1. A. Papoulis and S. Pillai, Probability, Random Variables, and Stochastic Processes, McGraw-Hill, New York, 2002.
2. H. Stark and J. W. Woods, Probability, Random Processes, and Estimation Theory for Engineers, 3rd ed., Prentice Hall, Upper Saddle River, N.J., 2002.
3. R. M. Gray and L. D. Davisson, Random Processes: A Mathematical Approach for Engineers, Prentice Hall, Englewood Cliffs, N.J., 1986.
4. R. D. Yates and D. J. Goodman, Probability and Stochastic Processes, Wiley, New York, 2005.
5. J. A. Gubner, Probability and Random Processes for Electrical and Computer Engineering, Cambridge University Press, Cambridge, 2006.
6. G. R. Cooper and C. D. MacGillem, Probabilistic Methods of Signal and System Analysis, Holt, Rinehart & Winston, New York, 1986.
7. J. A.
Cadzow, Foundations of Digital Signal Processing and Data Analysis, Macmillan, New York, 1987.
8. A. V. Oppenheim and R. W. Schafer, Discrete-Time Signal Processing, Prentice Hall, Englewood Cliffs, N.J., 1989.
9. M. Kunt, Digital Signal Processing, Artech House, Dedham, Mass., 1986.
10. G. M. Jenkins and D. G. Watts, Spectral Analysis and Its Applications, Holden-Day, San Francisco, 1968.
11. A. Einstein, "Method for the Determination of the Statistical Values of Observations Concerning Quantities Subject to Irregular Observations," reprinted in IEEE ASSP Magazine, October 1987, p. 6.
12. P. J. G. Long, "Introduction to Octave," University of Cambridge, September 2005, available online.

PROBLEMS

Section 10.1: Power Spectral Density

10.1. Let g(x) denote the triangular function shown in Fig. P10.1. (a) Find the power spectral density corresponding to R_X(τ) = g(τ/T). (b) Find the autocorrelation corresponding to the power spectral density S_X(f) = g(f/W).
FIGURE P10.1 Triangular function of height A on (−1, 1).
10.2. Let p(x) be the rectangular function shown in Fig. P10.2. Is R_X(τ) = p(τ/T) a valid autocorrelation function?
FIGURE P10.2 Rectangular function of height A on (−1, 1).
10.3. (a) Find the power spectral density S_Y(f) of a random process with autocorrelation function R_X(τ) cos(2πf₀τ), where R_X(τ) is itself an autocorrelation function. (b) Plot S_Y(f) if R_X(τ) is as in Problem 10.1a.
10.4. (a) Find the autocorrelation function corresponding to the power spectral density shown in Fig. P10.3. (b) Find the total average power. (c) Plot the power in the range |f| > f₀ as a function of f₀ > 0.
FIGURE P10.3 Piecewise-constant power spectral density with levels A and B on the bands defined by ±f₁ and ±f₂.
10.5. A random process X(t) has autocorrelation given by R_X(τ) = σ_X² e^{−τ²/2a²}, a > 0. (a) Find the corresponding power spectral density. (b) Find the amount of power contained in the frequencies |f| > k/(2πa), where k = 1, 2, 3.
10.6. Let Z(t) = X(t) + Y(t). Under what conditions does S_Z(f) = S_X(f) + S_Y(f)?
10.7.
Show that
(a) R_{X,Y}(τ) = R_{Y,X}(−τ).
(b) S_{X,Y}(f) = S*_{Y,X}(f).
10.8. Let Y(t) = X(t) − X(t − d). (a) Find R_{X,Y}(τ) and S_{X,Y}(f). (b) Find R_Y(τ) and S_Y(f).
10.9. Do Problem 10.8 if X(t) has the triangular autocorrelation function g(τ/T) in Problem 10.1 and Fig. P10.1.
10.10. Let X(t) and Y(t) be independent wide-sense stationary random processes, and define Z(t) = X(t)Y(t). (a) Show that Z(t) is wide-sense stationary. (b) Find R_Z(τ) and S_Z(f).
10.11. In Problem 10.10, let X(t) = a cos(2πf₀t + Θ), where Θ is a uniform random variable in (0, 2π). Find R_Z(τ) and S_Z(f).
10.12. Let R_X(k) = 4a^{|k|}, |a| < 1. (a) Find S_X(f). (b) Plot S_X(f) for a = 0.25 and a = 0.75, and comment on the effect of the value of a.
10.13. Let R_X(k) = 4a^{|k|} + 16b^{|k|}, |a| < 1, |b| < 1. (a) Find S_X(f). (b) Plot S_X(f) for a = b = 0.5 and a = 0.75 = 3b, and comment on the effect of the value of a/b.
10.14. Let R_X(k) = 9(1 − |k|/N) for |k| < N and 0 elsewhere. Find and plot S_X(f).
10.15. Let X_n = cos(2πf₀n + Θ), where Θ is a uniformly distributed random variable in the interval (0, 2π). Find and plot S_X(f) for f₀ = 0.5, 1, 1.75, π.
10.16. Let D_n = X_n − X_{n−d}, where d is an integer constant and X_n is a zero-mean, WSS random process. (a) Find R_D(k) and S_D(f) in terms of R_X(k) and S_X(f). What is the impact of d? (b) Find E[D_n²].
10.17. Find R_D(k) and S_D(f) in Problem 10.16 if X_n is the moving average process of Example 10.7 with α = 1.
10.18. Let X_n be a zero-mean, bandlimited white noise random process with S_X(f) = 1 for |f| < f_c and 0 elsewhere, where f_c < 1/2. (a) Show that R_X(k) = sin(2πf_ck)/(πk). (b) Find R_X(k) when f_c = 1/4.
10.19. Let W_n be a zero-mean white noise sequence, and let X_n be independent of W_n. (a) Show that Y_n = W_nX_n is a white sequence, and find σ_Y². (b) Suppose X_n is a Gaussian random process with autocorrelation R_X(k) = (1/2)^{|k|}. Specify the joint pdf's for Y_n.
10.20.
Evaluate the periodogram estimate for the random process X(t) = a cos(2πf₀t + Θ), where Θ is a uniformly distributed random variable in the interval (0, 2π). What happens as T → ∞?
10.21. (a) Show how to use the FFT to calculate the periodogram estimate in Eq. (10.32). (b) Generate four realizations of an iid zero-mean, unit-variance Gaussian sequence of length 128. Calculate the periodogram. (c) Calculate 50 periodograms as in part b and show the average of the periodograms after every 10 additional realizations.

Section 10.2: Response of Linear Systems to Random Signals

10.22. Let X(t) be a differentiable WSS random process, and define Y(t) = (d/dt)X(t). Find an expression for S_Y(f) and R_Y(τ). Hint: For this system, H(f) = j2πf.
10.23. Let Y(t) be the derivative of X(t), a bandlimited white noise process as in Example 10.3. (a) Find S_Y(f) and R_Y(τ). (b) What is the average power of the output?
10.24. Repeat Problem 10.23 if X(t) has S_X(f) = β² e^{−πf²}.
10.25. Let Y(t) be a short-term integration of X(t): Y(t) = (1/T) ∫_{t−T}^{t} X(t′) dt′. (a) Find the impulse response h(t) and the transfer function H(f). (b) Find S_Y(f) in terms of S_X(f).
10.26. In Problem 10.25, let R_X(τ) = (1 − |τ|/T) for |τ| < T and zero elsewhere. (a) Find S_Y(f). (b) Find R_Y(τ). (c) Find E[Y²(t)].
10.27. The input into a filter is zero-mean white noise with noise power density N₀/2. The filter has transfer function H(f) = 1/(1 + j2πf). (a) Find S_{Y,X}(f) and R_{Y,X}(τ). (b) Find S_Y(f) and R_Y(τ). (c) What is the average power of the output?
10.28. A bandlimited white noise process X(t) is input into a filter with transfer function H(f) = 1 + j2πf. (a) Find S_{Y,X}(f) and R_{Y,X}(τ) in terms of R_X(τ) and S_X(f). (b) Find S_Y(f) and R_Y(τ) in terms of R_X(τ) and S_X(f). (c) What is the average power of the output?
10.29. (a) A WSS process X(t) is applied to a linear system at t = 0. Find the mean and autocorrelation function of the output process. Show that the output process becomes WSS as t → ∞.
10.30. Let Y(t) be the output of a linear system with impulse response h(t) and input X(t). Find R_{Y,X}(τ) when the input is white noise. Explain how this result can be used to estimate the impulse response of a linear system.
10.31. (a) A WSS Gaussian random process X(t) is applied to two linear systems as shown in Fig. P10.4. Find an expression for the joint pdf of Y(t₁) and W(t₂). (b) Evaluate part a if X(t) is white Gaussian noise.
FIGURE P10.4 X(t) applied to h₁(t), producing Y(t), and to h₂(t), producing W(t).
10.32. Repeat Problem 10.31b if h₁(t) and h₂(t) are ideal bandpass filters as in Example 10.11. Show that Y(t) and W(t) are independent random processes if the filters have nonoverlapping bands.
10.33. Let Y(t) = h(t) * X(t) and Z(t) = X(t) − Y(t) as shown in Fig. P10.5. (a) Find S_Z(f) in terms of S_X(f). (b) Find E[Z²(t)].
FIGURE P10.5 X(t) filtered by h(t) to give Y(t), which is subtracted from X(t) to give Z(t).
10.34. Let Y(t) be the output of a linear system with impulse response h(t) and input X(t) + N(t). Let Z(t) = X(t) − Y(t). (a) Find R_{X,Y}(τ) and R_Z(τ). (b) Find S_Z(f). (c) Find S_Z(f) if X(t) and N(t) are independent random processes.
10.35. A random telegraph signal is passed through an ideal lowpass filter with cutoff frequency W. Find the power spectral density of the difference between the input and output of the filter. Find the average power of the difference signal.
10.36. Let Y(t) = a cos(2πf_ct + Θ) + N(t) be applied to an ideal bandpass filter that passes the frequencies |f − f_c| < W/2. Assume that Θ is uniformly distributed in (0, 2π). Find the ratio of signal power to noise power at the output of the filter.
10.37. Let Y_n = (X_{n+1} + X_n + X_{n−1})/3 be a "smoothed" version of X_n. Find R_Y(k), S_Y(f), and E[Y_n²].
10.38. Suppose X_n is a white Gaussian noise process in Problem 10.37. Find the joint pdf for (Y_n, Y_{n+1}, Y_{n+2}).
10.39. Let Y_n = X_n + bX_{n−1}, where X_n is a zero-mean, first-order autoregressive process with autocorrelation R_X(k) = σ²a^{|k|}, |a| < 1.
(a) Find R_{Y,X}(k) and S_{Y,X}(f). (b) Find S_Y(f), R_Y(k), and E[Y_n²]. (c) For what value of b is Y_n a white noise process?
10.40. A zero-mean white noise sequence is input into a cascade of two systems (see Fig. P10.6). System 1 has impulse response h_n = (1/2)ⁿu(n) and system 2 has impulse response g_n = (1/4)ⁿu(n), where u(n) = 1 for n ≥ 0 and 0 elsewhere. (a) Find S_Y(f) and S_Z(f). (b) Find R_{W,Y}(k) and R_{W,Z}(k); find S_{W,Y}(f) and S_{W,Z}(f). Hint: Use a partial fraction expansion of S_{W,Z}(f) prior to finding R_{W,Z}(k). (c) Find E[Z_n²].
FIGURE P10.6 W_n into h_n, producing Y_n, then into g_n, producing Z_n.
10.41. A moving average process X_n is produced as follows: X_n = W_n + α₁W_{n−1} + ... + α_pW_{n−p}, where W_n is a zero-mean white noise process. (a) Show that R_X(k) = 0 for |k| > p. (b) Find R_X(k) by computing E[X_{n+k}X_n], then find S_X(f) = F{R_X(k)}. (c) Find the impulse response h_n of the linear system that defines the moving average process. Find the corresponding transfer function H(f), and then S_X(f). Compare your answer to part b.
10.42. Consider the second-order autoregressive process defined by Y_n = (3/4)Y_{n−1} − (1/8)Y_{n−2} + W_n, where the input W_n is a zero-mean white noise process. (a) Verify that the unit-sample response is h_n = 2(1/2)ⁿ − (1/4)ⁿ for n ≥ 0, and 0 otherwise. (b) Find the transfer function. (c) Find S_Y(f) and R_Y(k) = F⁻¹{S_Y(f)}.
10.43. Suppose the autoregressive process defined in Problem 10.42 is the input to the following moving average system: Z_n = Y_n − (1/4)Y_{n−1}. (a) Find S_Z(f) and R_Z(k). (b) Explain why Z_n is a first-order autoregressive process. (c) Find a moving average system that will produce a white noise sequence when Z_n is the input.
10.44. An autoregressive process Y_n is produced as follows: Y_n = a₁Y_{n−1} + ... + a_qY_{n−q} + W_n, where W_n is a zero-mean white noise process. (a) Show that the autocorrelation of Y_n satisfies the following set of equations:

R_Y(0) = Σ_{i=1}^{q} a_i R_Y(i) + R_W(0)

R_Y(k) = Σ_{i=1}^{q} a_i R_Y(k − i).
q i=1 (b) Use these recursive equations to compute the autocorrelation of the process in Example 10.22. Section 10.3: Bandlimited Random Processes 10.45. (a) Show that the signal x(t) is recovered in Figure 10.10(b) as long as the sampling rate is above the Nyquist rate. (b) Suppose that a deterministic signal is sampled at a rate below the Nyquist rate. Use Fig. 10.10(b) to show that the recovered signal contains additional signal components from the adjacent bands. The error introduced by these components is called aliasing. (c) Find an expression for the power spectral density of the sampled bandlimited random process X(t). (d) Find an expression for the power in the aliasing error components. (e) Evaluate the power in the error signal in part c if SX1f2 is as in Problem 10.1b. 10.46. An ideal discrete-time lowpass filter has transfer function: H1f2 = b 1 0 for for ƒ f ƒ 6 fc 6 1>2 fc 6 ƒ f ƒ 6 1>2. (a) Show that H( f ) has impulse response hn = sin12pfcn2>pn. (b) Find the power spectral density of Y(kT) that results when the signal in Problem 10.1b is sampled at the Nyquist rate and processed by the filter in part a. (c) Let Y(t) be the continuous-time signal that results when the output of the filter in part b is fed to an interpolator operating at the Nyquist rate. Find SY1f2. 10.47. In order to design a differentiator for bandlimited processes, the filter in Fig. 10.10(c) is designed to have transfer function: H1f2 = j2pf>T for ƒ f ƒ 6 1/2. Problems 641 (a) Show that the corresponding impulse response is: h0 = 0, hn = 10.48. 10.49. 10.50. 10.51. 10.52. 10.53. 1-12n pn cospn - sinpn = n Z 0 nT pn2T (b) Suppose that X1t2 = a cos12pf0t + ®2 is sampled at a rate 1>T = 4f0 and then input into the above digital filter. Find the output Y(t) of the interpolator. Complete the proof of the sampling theorem by showing that the mean square error is n 11t2 X1kT24 = 0, all k. zero. 
10.49. Plot the power spectral density of the amplitude modulated signal Y(t) in Example 10.18, assuming f_c > W and f_c < W. Assume that A(t) is the signal in Problem 10.1b.

10.50. Suppose that a random telegraph signal with transition rate α is the input signal in an amplitude modulation system. Plot the power spectral density of the modulated signal assuming f_c = α/π and f_c = 10α/π.

10.51. Let the input to an amplitude modulation system be 2 cos(2πf_1 t + Φ), where Φ is uniformly distributed in (-π, π). Find the power spectral density of the modulated signal assuming f_c > f_1.

10.52. Find the signal-to-noise ratio in the recovered signal in Example 10.18 if S_N(f) = af^2 for |f ± f_c| < W and zero elsewhere.

10.53. The input signals to a QAM system are independent random processes with power spectral densities shown in Fig. P10.7. Sketch the power spectral density of the QAM signal.

FIGURE P10.7: S_A(f) and S_B(f), each nonzero only on the band -W < f < W.

10.54. Under what conditions does the receiver shown in Fig. P10.8 recover the input signals to a QAM signal?

FIGURE P10.8: X(t) is split into two branches; one branch is multiplied by 2 cos(2πf_c t + Θ) and lowpass filtered, the other is multiplied by 2 sin(2πf_c t + Θ) and lowpass filtered.

10.55. Show that Eq. (10.67b) implies that S_{B,A}(f) is a purely imaginary, odd function of f.

Section 10.4: Optimum Linear Systems

10.56. Let X_α = Z_α + N_α as in Example 10.22, where Z_α is a first-order process with R_Z(k) = 4(3/4)^|k| and N_α is white noise with σ_N^2 = 1.
(a) Find the optimum p = 1 filter for estimating Z_α.
(b) Find the mean square error of the resulting filter.

10.57. Let X_α = Z_α + N_α as in Example 10.21, where Z_α has R_Z(k) = σ_Z^2 r_1^|k| and N_α has R_N(k) = σ_N^2 r_2^|k|, where r_1 and r_2 are less than one in magnitude.
(a) Find the equation for the optimum filter for estimating Z_α.
(b) Write the matrix equation for the filter coefficients.
(c) Solve the p = 2 case, if σ_Z^2 = 9, r_1 = 2/3, σ_N^2 = 1, and r_2 = 1/3.
(d) Find the mean square error for the optimum filter in part c.
(e) Use the matrix function of Octave to solve parts c and d for p = 3, 4, 5.

10.58. Let X_α = Z_α + N_α as in Example 10.21, where Z_α is the first-order moving average process of Example 10.7, and N_α is white noise.
(a) Find the equation for the optimum filter for estimating Z_α.
(b) For the p = 1 and p = 2 cases, write and solve the matrix equation for the filter coefficients.
(c) Find the mean square error for the optimum filter in part b.

10.59. Let X_α = Z_α + N_α as in Example 10.19, and suppose that an estimator for Z_α uses observations from the following time instants: I = {n - p, ..., n, ..., n + p}.
(a) Solve the p = 1 case if Z_α and N_α are as in Problem 10.56.
(b) Find the mean square error in part a.
(c) Find the equation for the optimum filter.
(d) Write the matrix equation for the 2p + 1 filter coefficients.
(e) Use the matrix function of Octave to solve parts a and b for p = 2, 3.

10.60. Consider the predictor in Eq. (10.86b).
(a) Find the optimum predictor coefficients in the p = 2 case when R_Z(k) = 9(1/3)^|k|.
(b) Find the mean square error in part a.
(c) Use the matrix function of Octave to solve parts a and b for p = 3, 4, 5.

10.61. Let X(t) be a WSS, continuous-time process.
(a) Use the orthogonality principle to find the best estimator for X(t) of the form X̂(t) = a X(t_1) + b X(t_2), where t_1 and t_2 are given time instants.
(b) Find the mean square error of the optimum estimator.
(c) Check your work by evaluating the answer in part b for t = t_1 and t = t_2. Is the answer what you would expect?

10.62. Find the optimum filter and its mean square error in Problem 10.61 if t_1 = t - d and t_2 = t + d.

10.63. Find the optimum filter and its mean square error in Problem 10.61 if t_1 = t - d and t_2 = t - 2d, and R_X(τ) = e^{-α|τ|}. Compare the performance of this filter to the performance of the optimum filter of the form X̂(t) = a X(t - d).
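Normal-equation problems like 10.60(a) can be checked numerically. The sketch below assumes the autocorrelation given there, R_Z(k) = 9(1/3)^|k|, and a two-tap predictor Ẑ_n = a_1 Z_{n-1} + a_2 Z_{n-2}; since Eq. (10.86b) is not reproduced in this excerpt, treat this as an illustration of the orthogonality-principle setup rather than the book's worked solution:

```python
# Autocorrelation assumed from Problem 10.60(a): R_Z(k) = 9*(1/3)**|k|.
R = lambda k: 9 * (1 / 3) ** abs(k)

# Orthogonality of the error to the observations gives the normal equations
# for the predictor Zhat_n = a1*Z_{n-1} + a2*Z_{n-2}:
#   a1*R(0) + a2*R(1) = R(1)
#   a1*R(1) + a2*R(0) = R(2)
det = R(0) ** 2 - R(1) ** 2                      # 2x2 system, Cramer's rule
a1 = (R(1) * R(0) - R(1) * R(2)) / det
a2 = (R(0) * R(2) - R(1) ** 2) / det
mse = R(0) - (a1 * R(1) + a2 * R(2))             # orthogonality-principle MSE
print(a1, a2, mse)   # a1 ~ 1/3, a2 ~ 0, mse ~ 8
```

The coefficient a_2 comes out zero, which is what one expects for a geometric autocorrelation: for an AR(1)-type process, the most recent sample already carries all the information a linear predictor can use.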
10.64. Modify the system in Problem 10.33 to obtain a model for the estimation error in the optimum infinite-smoothing filter in Example 10.24. Use the model to find an expression for the power spectral density of the error e(t) = Z(t) - Y(t), and then show that the mean square error is given by:
E[e^2(t)] = ∫_{-∞}^{∞} S_Z(f) S_N(f) / (S_Z(f) + S_N(f)) df.
Hint: E[e^2(t)] = R_e(0).

10.65. Solve the infinite-smoothing problem in Example 10.24 if Z(t) is the random telegraph signal with α = 1/2 and N(t) is white noise. What is the resulting mean square error?

10.66. Solve the infinite-smoothing problem in Example 10.24 if Z(t) is bandlimited white noise of density N_1/2 and N(t) is (infinite-bandwidth) white noise of noise density N_0/2. What is the resulting mean square error?

10.67. Solve the infinite-smoothing problem in Example 10.24 if Z(t) and N(t) are as given in Example 10.25. Find the resulting mean square error.

10.68. Let X_n = Z_n + N_n, where Z_n and N_n are independent, zero-mean random processes.
(a) Find the smoothing filter given by Eq. (10.89) when Z_n is a first-order autoregressive process with σ_X^2 = 9 and a = 1/2 and N_n is white noise with σ_N^2 = 4.
(b) Use the approach in Problem 10.64 to find the power spectral density of the error, S_e(f).
(c) Find R_e(k) as follows: Let z = e^{j2πf}, factor the denominator of S_e(f), and take the inverse transform to show that:
R_e(k) = [σ_X^2 z_1 / (a(1 - z_1^2))] z_1^|k|, where 0 < z_1 < 1.
(d) Find an expression for the resulting mean square error.

10.69. Find the Wiener filter in Example 10.25 if N(t) is white noise of noise density N_0/2 = 1/3 and Z(t) has power spectral density
S_Z(f) = 4 / (4 + 4π^2 f^2).

10.70. Find the mean square error for the Wiener filter found in Example 10.25. Compare this with the mean square error of the infinite-smoothing filter found in Problem 10.67.

10.71. Suppose we wish to estimate (predict) X(t + d) by
X̂(t + d) = ∫_0^∞ h(τ) X(t - τ) dτ.
(a) Show that the optimum filter must satisfy
R_X(τ + d) = ∫_0^∞ h(x) R_X(τ - x) dx, for τ ≥ 0.
(b) Use the Wiener-Hopf method to find the optimum filter when R_X(τ) = e^{-2|τ|}.

10.72. Let X_n = Z_n + N_n, where Z_n and N_n are independent random processes, N_n is a white noise process with σ_N^2 = 1, and Z_n is a first-order autoregressive process with R_Z(k) = 4(1/2)^|k|. We are interested in the optimum filter for estimating Z_n from X_n, X_{n-1}, ....
(a) Find S_X(f) and express it in the form:
S_X(f) = [1/(2z_1)] (1 - z_1 e^{-j2πf})(1 - z_1 e^{j2πf}) / [(1 - (1/2) e^{-j2πf})(1 - (1/2) e^{j2πf})].
(b) Find the whitening causal filter.
(c) Find the optimal causal filter.

Section 10.5: The Kalman Filter

10.73. If W_n and N_n are Gaussian random processes in Eq. (10.102), are Z_n and X_n Markov processes?

10.74. Derive Eq. (10.120) for the mean square prediction error.

10.75. Repeat Example 10.26 with a = 0.5 and a = 2.

10.76. Find the Kalman algorithm for the case where the observations are given by X_n = b_n Z_n + N_n, where b_n is a sequence of known constants.

*Section 10.6: Estimating the Power Spectral Density

10.77. Verify Eqs. (10.125) and (10.126) for the periodogram and the autocorrelation function estimate.

10.78. Generate a sequence X_n of iid random variables that are uniformly distributed in (0, 1).
(a) Compute several 128-point periodograms and verify the random behavior of the periodogram as a function of f. Does the periodogram vary about the true power spectral density?
(b) Compute the smoothed periodogram based on 10, 20, and 50 independent periodograms. Compare the smoothed periodograms to the true power spectral density.

10.79. Repeat Problem 10.78 with X_n a first-order autoregressive process with autocorrelation function: R_X(k) = (0.9)^|k|; R_X(k) = (1/2)^|k|; R_X(k) = (0.1)^|k|.

10.80. Consider the following estimator for the autocorrelation function:
r̂'_k(m) = [1/(k - |m|)] sum_{n=0}^{k-|m|-1} X_n X_{n+m}.
Show that if we estimate the power spectrum of X_n by the Fourier transform of r̂'_k(m), the resulting estimator has mean
E[p̃_k(f)] = sum_{m'=-(k-1)}^{k-1} R_X(m') e^{-j2πfm'}.
Why is the estimator biased?

Section 10.7: Numerical Techniques for Processing Random Signals

10.81. Let X(t) have power spectral density given by
S_X(f) = [b^2/√(2π)] e^{-f^2/(2W_0^2)}.
(a) Before performing an FFT of S_X(f), you are asked to calculate the power in the aliasing error if the signal is treated as if it were bandlimited with bandwidth kW_0. What value of W should be used for the FFT if the power in the aliasing error is to be less than 1% of the total power? Assume W_0 = 1000 and b = 1.
(b) Suppose you are to perform an N = 2^M point FFT of S_X(f). Explore how W, T, and t_0 vary as a function of f_0. Discuss what leeway is afforded by increasing N.
(c) For the value of W in part a, identify the values of the parameters f_0, T, and t_0 for N = 128, 256, 512, 1024.
(d) Find the autocorrelation {R_X(kt_0)} by applying the FFT to S_X(f). Try the options identified in part c and comment on the accuracy of the results by comparing them to the exact value of R_X(τ).

10.82. Use the FFT to calculate and plot S_X(f) for the following discrete-time processes:
(a) R_X(k) = 4a^|k|, for a = 0.25 and a = 0.75.
(b) R_X(k) = 4(1/2)^|k| + 16(1/4)^|k|.
(c) X_n = cos(2πf_0 n + Θ), where Θ is uniformly distributed in (0, 2π] and f_0 = 1000.

10.83. Use the FFT to calculate and plot R_X(k) for the following discrete-time processes:
(a) S_X(f) = 1 for |f| < f_c and 0 elsewhere, where f_c = 1/8, 1/4, 3/8.
(b) S_X(f) = 1/2 + (1/2) cos 2πf for |f| < 1/2.

10.84. Use the FFT to find the output power spectral density in the following systems:
(a) Input X_n with R_X(k) = 4a^|k|, for a = 0.25, and H(f) = 1 for |f| < 1/4.
(b) Input X_n = cos(2πf_0 n + Θ), where Θ is a uniformly distributed random variable, and H(f) = j2πf for |f| < 1/2.
(c) Input X_n with R_X(k) as in Problem 10.14 with N = 3, and H(f) = 1 for |f| < 1/2.

10.85. (a) Show that
R_X(τ) = 2 Re{ ∫_0^∞ S_X(f) e^{-j2πfτ} df }.
(b) Use approximations to express the above as a DFT relating N points in the time domain to N points in the frequency domain.
(c) Suppose we meet the t_0 f_0 = 1/N requirement by letting t_0 = f_0 = 1/√N. Compare this to the approach leading to Eq. (10.142).

10.86. (a) Generate a sequence of 1024 zero-mean unit-variance Gaussian random variables and pass it through a system with impulse response h_n = e^{-2n} for n ≥ 0.
(b) Estimate the autocovariance of the output process of the digital filter and compare it to the theoretical autocovariance.
(c) What is the pdf of the continuous-time process that results if the output of the digital filter is fed into an interpolator?

10.87. (a) Use the covariance matrix factorization approach to generate a sequence of 1024 Gaussian samples with autocovariance C(τ) = e^{-2|τ|}.
(b) Estimate the autocovariance of the observed sequence and compare to the theoretical result.

Problems Requiring Cumulative Knowledge

10.88. Does the pulse amplitude modulation signal in Example 9.38 have a power spectral density? Explain why or why not. If the answer is yes, find the power spectral density.

10.89. Compare the operation and performance of the Wiener and Kalman filters for the signals discussed in Example 10.26.

10.90. (a) Find the power spectral density of the ARMA process in Example 10.15 by finding the transfer function of the associated linear system.
(b) For the ARMA process, find the cross-power spectral density from E[Y_n X_m], and then the power spectral density from E[Y_n Y_m].

10.91. Let X_1(t) and X_2(t) be jointly WSS and jointly Gaussian random processes that are input into two linear time-invariant systems as shown below:
X_1(t) -> [h_1(t)] -> Y_1(t)
X_2(t) -> [h_2(t)] -> Y_2(t)
(a) Find the cross-correlation function of Y_1(t) and Y_2(t).
Find the corresponding cross-power spectral density.
(b) Show that Y_1(t) and Y_2(t) are jointly WSS and jointly Gaussian random processes.
(c) Suppose that the transfer functions of the above systems are nonoverlapping, that is, |H_1(f)| |H_2(f)| = 0. Show that Y_1(t) and Y_2(t) are independent random processes.
(d) Now suppose that X_1(t) and X_2(t) are nonstationary jointly Gaussian random processes. Which of the above results still hold?

10.92. Consider the communication system in Example 9.38 where the transmitted signal X(t) consists of a sequence of pulses that convey binary information. Suppose that the pulses p(t) are given by the impulse response of the ideal lowpass filter in Figure 10.6. The signal that arrives at the receiver is Y(t) = X(t) + N(t), which is to be sampled and processed digitally.
(a) At what rate should Y(t) be sampled?
(b) How should the bit carried by each pulse be recovered based on the samples Y(nT)?
(c) What is the probability of error in this system?
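Problem 10.92(c) asks for a probability of error. For the standard baseline model — antipodal pulse amplitudes ±A, Gaussian noise of variance σ² at the sampling instant, and a sign decision on the sample Y(nT) — the error probability takes the form Q(A/σ). That detection model is an assumption made here for illustration, not something stated in the problem; a minimal sketch:

```python
import math

def Q(x):
    """Gaussian tail probability: Q(x) = P(N(0,1) > x) = 0.5*erfc(x/sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def bit_error_prob(A, sigma):
    """Error probability for amplitudes +/-A (assumed model) decided by the
    sign of the noisy sample Y(nT), with Gaussian noise of std dev sigma."""
    return Q(A / sigma)

print(round(bit_error_prob(1.0, 1.0), 4))   # 0.1587
```

As the signal-to-noise ratio A/σ grows, Q(A/σ) falls off rapidly, which is the qualitative behavior the problem is probing.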