
Chapter 30

Solutions to Quizzes & Exercises


Do the quizzes and exercises by yourself. When you get stuck, take a peek here.

Chapter 0
Numbers and Sets.
1. 252 = 2 × 2 × 3 × 3 × 7.
2. minimum is 3.
3. union = {1, 3, 7, 8, 9, 10, 19}; intersection = {3, 10}.
4. Yes, there must be a minimum element and the minimum is at most 18.
5. 5 (integer, rational, real); 3/4 (rational, real); π (real).
6. 3k, 3k + 3.
Logarithms and Exponentials.
1. ln(12) = ln(2 × 2 × 3) = ln(2) + ln(2) + ln(3) ≈ 2.484.
2. 2^20 = (2^10)^2 ≈ 1000^2 = 10^6.
3. ln(1 × 2 × 3 × · · · × 10) = (ln 1 + ln 2 + ln 3 + · · · + ln 10).
4. 2^a/2^b = 2^(a−b); 2^0 = 1.
5. By definition of log_10, 100 = 10^(log_10 100). Taking log_2 of both sides,
log_2 100 = log_2(10^(log_10 100)) = log_10 100 × log_2 10.
More generally, x = β^(log_β x); taking log_α of both sides, log_α x = log_β x × log_α β.
Sums and Products.
1. (a) 1 + 2 + 3 + · · · + 1000 = (1/2) × 1000 × 1001 = 500,500.
(b) 1 + 2 + 3 + · · · + n = (1/2)n(n + 1).
(c) 1 + 1/7 + 1/7^2 + 1/7^3 + 1/7^4 + · · · = 1/(1 − 1/7) = 7/6.
2. 5! = 120; n! = n × (n − 1) × (n − 2) × · · · × 2 × 1; 0! = 1.
3. Σ_{i=1}^{1000} i = Σ_{k=1}^{1000} k = (1/2) × 1000 × 1001 = 500,500; Σ_{k=1}^{1000} i = 1000 × i (the summand i does not depend on the summation index k).
4. 1 + 2 + 3 + · · · + k = Σ_{i=1}^{k} i = (1/2)k(k + 1); Σ_{k=1}^{n} k = (1/2)n(n + 1).
5. Σ_{i=1}^{k} ln(i) = ln(k!); Π_{i=1}^{k} i = k!.
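These closed forms are easy to check numerically. A small Python sketch of ours (not from the book; variable names are our own):

```python
import math

# Arithmetic sum: 1 + 2 + ... + n = n(n + 1)/2
n = 1000
gauss = n * (n + 1) // 2
assert sum(range(1, n + 1)) == gauss == 500500

# Geometric series: 1 + 1/7 + 1/7^2 + ... = 1/(1 - 1/7) = 7/6
partial = sum((1 / 7) ** i for i in range(60))
assert abs(partial - 7 / 6) < 1e-12

# A sum of logs is the log of the product: sum_{i=1}^{k} ln(i) = ln(k!)
k = 10
assert math.isclose(sum(math.log(i) for i in range(1, k + 1)),
                    math.log(math.factorial(k)))
```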

Algebra.
1. (1 + 2)^2 = 3^2 = 9. Also, (1 + 2)^2 = 1^2 + 2 × 1 × 2 + 2^2 = 9.
2. (a + b)^2 = a^2 + 2ab + b^2; (a + b)^3 = a^3 + 3a^2 b + 3ab^2 + b^3; (a + b)^4 ≠ a^4 + 4a^3 b + 4a^2 b^2 + 4ab^3 + b^4 (the correct coefficient of a^2 b^2 is 6, not 4).
3. x^2 − 5x − 6 = (x − 6)(x + 1) = 0, therefore the roots are x = 6 and x = −1.


4. To get solutions to e^(2x) − 5e^x − 6 = 0, set y = e^x; then y^2 − 5y − 6 = 0 and (using the previous
problem) y = e^x = 6 or y = e^x = −1. Therefore x = ln 6 or x = iπ, where i = √(−1). Other
solutions are obtained by adding integer multiples of 2πi.
5. x + y = 2 and 2x + 3y = 7 implies x = −1 and y = 3.
6. (3x + 11)/(x^2 − x − 6) = 4/(x − 3) − 1/(x + 2) and (3x + 11)/(x^2 + 6x + 9) = 3/(x + 3) + 2/(x + 3)^2.

Calculus.
1. Derivatives: 3x^2; 2e^(2x); 2^x ln 2; −1/x^2; −2/x^3; 1/x; 1/(x ln 2); 1/x.
2. Integrals: (1/4)x^4; (1/2)e^(2x); 2^x/ln 2; ln |x|; −1/x.
3. Limits as x → 0:
(e^x − 1)/sin(2x) → 1/2; (e^x − 1)/(1 + x) → 0; (e^x − 1)/sin(x^2) → ∞; (e^x − 1)/(x + x^2) → 1; (e^x − 1)/(e^(2x) − 1) → 1/2.
4. Limits as x → ∞: (e^x − 1)/(e^(2x) − 1) → 0; (e^x − 1)/(x^3 + 2e^x) → 1/2; e^x/x^x → 0.
5. Which of these series converges:
1 + 2 + 2^2 + 2^3 + 2^4 + · · ·  diverges
1 + 1/2 + (1/2)^2 + (1/2)^3 + (1/2)^4 + · · ·  converges to 2
1 − 1 + 1 − 1 + 1 − 1 + 1 − 1 + · · ·  diverges
1 + 1/2 + 1/3 + 1/4 + · · ·  diverges
1 − 1/2 + 1/3 − 1/4 + 1/5 − 1/6 + · · ·  converges to ln 2 ≈ 0.6931
6. f(x) = 1/(2 + sin(x)) = 1/3 + (1/18)(x − π/2)^2 + (1/216)(x − π/2)^4 + (1/6480)(x − π/2)^6 + · · ·.
7. Using the substitution u = arctan(x), du = dx/(1 + x^2) and so ∫_0^T dx/(1 + x^2) = arctan(T).
8. For f(t) = ∫_0^t dx sin(1 + x^2 e^x), (d/dt)f(t) = sin(1 + t^2 e^t).
Pop Quiz 0.1. A powerful tactic when a problem looks hard is to make it easier. Suppose the
letters lined up vertically. That’s trivial. Now morph this simple problem into the one we want.
(Figure: four panels — “Easier problem”, “Modify: move A”, “Modify: move C”, “Solution” — morphing the vertical line-up of the letters B, A, C step by step into the target arrangement C B A.)

Do not underestimate the power of simplification. Be on the lookout for this technique of making
a problem easier. It helps to understand a problem. It builds confidence when you can solve
something. It can pinpoint where the difficulty is in the harder problem. We call it tinkering.

Chapter 1
Pop Quiz 1.1. The red square is safe. The final infection is on the right.
Exercise 1.2. Six infections won’t infect the whole grid; seven is the minimum.

Chapter 2
Pop Quiz 2.1. O = {n | n = 2k − 1; k ∈ N}
Exercise 2.2. True if the set of pigs that fly pf is a subset of the set of things which are green
with purple spots gp. Since pf is the empty set, pf ⊆ gp and so it is a true sentence.
Pop Quiz 2.3. M ∩ V = {a, i}; M ∪ V = {m, a, e, i, k, l, o, u};
taking the universal set as the lower case letters, the complement of M is {lower case letters other than m, a, l, i, k}.
Exercise 2.4.
(a) Visually different because the nodes and edges are drawn in different positions.
(b) The friendships are the same, so it could be the same network.
Exercise 2.5. (a) {−1, −2, . . .} = {n | n = −k; k ∈ N} (b) {1, 1/2, 1/3, . . .} = {r | r = 1/n; n ∈ N}


Chapter 3
Pop Quiz 3.1. It is not easy to verify. You might try asking A for a dream; then see if B has that
same dream; if so, ask C, and so on. If A, B, C, D, E, F all have that same dream, then you have
verified the claim. If not, ask A for another dream and repeat. Either A will run out of dreams,
or you will verify the claim. The process will terminate if A has a finite number of dreams.
Exercise 3.2. (a) f. (b) Don’t know yet. (c) Who is Kilam? (d) t.
Exercise 3.3. p ∧ r: It is raining and it is cloudy (f; it can be cloudy without rain).
p → q: If it is raining then Kilam has his umbrella (t; Kilam is a smart guy).
p → r: If it is raining then it is cloudy (t; you need clouds for rain).
q → r: If Kilam has his umbrella then it is cloudy (t; why does Kilam have an umbrella?).
q → p: If Kilam has his umbrella then it is raining (f as it could just be cloudy).
r → p: If it is cloudy then it is raining (f it can be cloudy without rain).
Exercise 3.4.
(a) t; (i) Yes, it is cloudy. (ii) No, it is clear.
(b) f; (i) We don’t know if it is raining.
(c) t; (i) Yes. (ii) Don’t know; you could just be smart. (iii) Don’t know. (iv) No.
(d) t; (i) Yes. (ii) Don’t know; you could just be wandering. (iii) Not hungry; not thirsty.
Pop Quiz 3.5. In C++, || is or, and && is and. To show that both codes execute the instructions
for the same x, y values, define the propositions p : x > 0, q : y > 1 and r : x < y.
The left code tests p ∨ (q ∧ r) before executing the instructions, and the right tests p ∨ q. Here
are their truth-tables:

       p  q  r   p ∨ (q ∧ r)   p ∨ q
    1. f  f  f        f          f
    2. f  f  t        f          f
    3. f  t  f        f          t
    4. f  t  t        t          t
    5. t  f  f        t          t
    6. t  f  t        t          t
    7. t  t  f        t          t
    8. t  t  t        t          t

Row 3 is a problem: the truth values are different. Let's examine it closer: p is f, so x ≤ 0; q is
t, so y > 1; and r is f, so x ≥ y. This row in the truth-table is impossible: x ≤ 0 and y > 1
implies x < y, so r is t. To compare compound propositions, you only need to consider all the
possible truth values of the basic propositions. If the basic propositions are independent, all 8
possibilities are relevant: p ∨ (q ∧ r) is not equivalent to p ∨ q in general. In our case, p, q, r being
f, t, f is not possible: our basic propositions are not independent because the truth value of r is
constrained by the truth values of p and q. For all possible truth values of p, q, r, the compound
propositions match, so the two snippets perform identical computations. The right snippet is
simpler, uses fewer operations and, if implemented in hardware, requires fewer gates. These can
be important considerations in some applications.
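The argument can be checked mechanically by enumerating truth assignments. A small Python sketch of ours (the infeasible row is excluded by hand, mirroring the reasoning above):

```python
from itertools import product

# p: x > 0, q: y > 1, r: x < y.  Enumerate all 8 truth assignments,
# discarding the one row (p=f, q=t, r=f) that no x, y can realize.
mismatch = []
for p, q, r in product([False, True], repeat=3):
    feasible = not ((not p) and q and (not r))  # x <= 0 and y > 1 force x < y
    left = p or (q and r)    # test in the left snippet
    right = p or q           # test in the right snippet
    if feasible and left != right:
        mismatch.append((p, q, r))
assert mismatch == []  # the two tests agree on every realizable row
```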
Exercise 3.6.
¬p → q ≡ ¬q → p ≡ p ∨ q
¬(p ∨ q) ≡ ¬p ∧ ¬q
(q ∧ ¬r) → ¬p ≡ (¬p ∨ ¬q) ∨ r ≡ (p ∧ q) → r
p ∨ (q ∨ r) ≡ ¬r → (p ∨ q)
Pop Quiz 3.7. (a) n ∈ N. (b) A predicate cannot be t or f. (c) “4 is a perfect square.” (d) P (4)
and P (9) are t.
Exercise 3.8.
(a) P (x) = “x has grey hair”. P (Kilam).
(b) P (x) = “Map x can be colored with 4 colors with adjacent countries having different colors”.
∀x : P (x).
(c) P (n) = “Integer n is a sum of two primes.”. ∀n ∈ E : P (n) (E is set of even natural numbers).
(d) P (x) = “x has blue eyes and blond hair”. ¬∃x : P (x).
There can be many ways to formulate a statement with predicates. Here is another way.
P (x) = “x has blue eyes;” Q(x) = “x has blonde hair.” ¬∃x : P (x) ∧ Q(x).
Exercise 3.9.
(∃a : G(a)) ∧ (∃a : H(a)): someone has blue eyes and someone has blonde hair.
(∃b : G(b)) ∧ (∃c : H(c)): someone has blue eyes and someone has blonde hair. A quantified
statement does not change when you change the name of a (variable) parameter.


(∃a : G(a)) ∧ H(c): This is not a statement; it is a predicate. “someone has blue eyes and c has
blond hair.” It only becomes a statement once you specify a value for c.
Exercise 3.10.
(a) (i) ∀a : (∀d : P (a, d)): every American has every dream.
∀d : (∀a : P (a, d)): Every dream is had by every American. (both are equivalent)
(ii) ∃a : (∃d : P (a, d)): Some American has a dream.
∃d : (∃a : P (a, d)): Some dream is had by at least one American. (both are equivalent)
(b) They are valid predicates. In English:
Q(a) = ∃d : P (a, d) = “Some dream is had by American a”
R(d) = ∀a : P (a, d) = “All Americans have dream d”
∃d : (∀a : P (a, d)) = ∃d : R(d)
∀a : (∃d : P (a, d)) = ∀a : Q(a)
Exercise 3.11. (a) is easier to disprove. To disprove (a), you just need to find a single instance
of n for which 2^(2^n) + 1 is not prime. To disprove (b), you need to show that for every choice of
(a, b, c), a^3 + b^3 ≠ c^3. Disproving a “there exists” is typically harder than disproving a “for all”.
This is related to the fact that proving a “for all” is harder than proving a “there exists”. This is
because ¬∃x : P(x) ≡ ∀x : ¬P(x), so showing that a “there exists” statement is false means you
are showing that a “for all” statement is true. Similarly, ¬∀x : P(x) ≡ ∃x : ¬P(x).

Chapter 4
Pop Quiz 4.1.
(a) p : n is greater than 2 and even; q : n is the sum of two primes.
(b) p : x and y are rational; q : x + y is rational.
(c) p : ax^2 + bx + c = 0 and a ≠ 0; q : x = (−b + √(b^2 − 4ac))/2a or x = (−b − √(b^2 − 4ac))/2a.
Exercise 4.2.
(a) Proof. We use a direct proof.
1: Assume that a is divisible by b and b is divisible by c.
2: This means there are integers k, ℓ for which a = kb and b = ℓc.
3: Then, a = kb = kℓc = mc, where m = kℓ.
4: Since m = kℓ is an integer, a is divisible by c, as was to be shown.
(b) A proof does not have to be written in algorithmic steps.
Proof. Let x and y be arbitrary real numbers. First observe that ±x ≤ |x| and ±y ≤ |y|.
There are two cases: (i) x + y ≥ 0, in which case |x + y| = x + y ≤ |x| + |y| (because x ≤ |x|
and y ≤ |y|). (ii) x + y < 0, in which case |x + y| = −(x + y) = −x − y ≤ |x| + |y| (because
−x ≤ |x| and −y ≤ |y|). In both cases |x + y| ≤ |x| + |y|.
(c) Proof. Consider any four consecutive integers, and let x be the minimum, so the four integers
are x, x + 1, x + 2, x + 3. One of these four numbers must be divisible by 4, and so equals 4k.
Among the remaining numbers, two are consecutive so one is divisible by 2 and so equals 2ℓ.
Therefore the product of all four numbers is 4k × 2ℓ × (integer), which is a multiple of 8.
The proof uses subtle reasoning. We leave it to the reader to give a more detailed proof based
on cases. Consider the remainder when x is divided by 4 (the remainder is either 0, 1, 2, or 3).
So there are 4 cases: x = 4k, x = 4k + 1, x = 4k + 2, x = 4k + 3. Show that in each of
the four cases, the product is divisible by 8. For example, if x = 4k then the product is
4k(4k + 1)(4k + 2)(4k + 3) = 8k(4k + 1)(2k + 1)(4k + 3).
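A quick brute-force check of the claim, in a Python sketch of ours (over a finite range of starting values, including negatives):

```python
# The product of any four consecutive integers is divisible by 8.
for x in range(-200, 201):
    p = x * (x + 1) * (x + 2) * (x + 3)
    assert p % 8 == 0
```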
Pop Quiz 4.3. You need to find one n∗ ∈ D for which Q(n∗ ) is f. It is equivalent to disproving
the implication: if n ∈ D, then Q(n).
Exercise 4.4.
(a) The two truth-tables are identical. The only case p → q is f is with p t and q f. The only
case ¬q → ¬p is f is with ¬q t and ¬p f, i.e. with p t and q f.
(b) (i) if the grass is not wet, then it did not rain last night.
(ii) if x ≤ 10 and y ≤ 10, then one of x, y is not positive or xy ≤ 100.


Proof. (By contraposition) Suppose x ≤ 10 and y ≤ 10. There are two cases.
Case 1: One of x, y is not positive in which case “one of x, y is not positive or xy ≤ 100”
is t and there is nothing further to prove.
Case 2: Both x, y are positive, so 0 < x, y ≤ 10. In this case xy ≤ 10 × 10 = 100 and so
“one of x, y is not positive or xy ≤ 100” is t.

(iii) if you do not stay at home, then the mall is not crowded.

(iv) if √r is rational then r is rational.
Proof. (By contraposition) Suppose √r is rational. Then √r = a/b for integer a and
natural number b. This means r = a^2/b^2, which is rational because a^2 is an integer and
b^2 is a natural number.
Pop Quiz 4.5. The truth-tables are the same: p ↔ q ≡ (p → q) ∧ (q → p) (logically equivalent).
Exercise 4.6.

(a) (Do not intersect, but not parallel according to the true definition.)

(b) Two line segments (in 3-dimensions) are parallel if and only if they both lie in the same plane
and when both are extended to infinity in both directions, there is no point of intersection.
(c) A triangle is isosceles if and only if at least two sides have the same length.

Exercise 4.7.
(a) To get a contradiction, suppose there are m, n ∈ Z with 21m + 9n = 3(7m + 3n) = 1. Then
3 divides the LHS, therefore 3 divides 1. FISHY! This contradiction proves the claim.

(b) Suppose x, y > 0 and x + y < 2√(xy). Both sides of the inequality are positive. Squaring,
(x + y)^2 < 4xy, or x^2 + 2xy + y^2 < 4xy, or x^2 − 2xy + y^2 < 0, or (x − y)^2 < 0. This is FISHY
because the square of a real number cannot be negative. This contradiction proves the claim.
(c) Suppose that m and n are both odd. Then m = 2k + 1 and n = 2ℓ + 1, so m^2 + n^2 =
4k^2 + 4k + 1 + 4ℓ^2 + 4ℓ + 1. Since m^2 + n^2 is divisible by 4, m^2 + n^2 = 4s, therefore
4s = 4(k^2 + k + ℓ^2 + ℓ) + 2, or 4(s − k^2 − k − ℓ^2 − ℓ) = 2, or 2(s − k^2 − k − ℓ^2 − ℓ) = 1. The LHS
is divisible by 2, therefore 1 is divisible by 2, FISHY. This contradiction proves the claim.
Exercise 4.8.
(a) Direct proof because the result clearly follows from the assumption that x is real.
(b) Contraposition, for if n is even (not odd), then it is easy to show by algebra that n^2 is even.
(c) Direct proof because by simple algebra if n is odd one can square and show n^2 is odd.
(d) Show an example of a number that is not a square and prove it.
(e) Direct proof because, by simple algebra, a product of two ratios is a ratio.
(f) Direct proof. By simple algebra one can show that a product of two odd numbers is odd.
(g) Contradiction. This gives you something to start with. You assume that a rational p/q equals
√6. The contradiction will show that this rational does not exist.
(h) This does not seem to fall into a standard situation. When you can’t see where to start, often
contradiction is your best bet, because a proof by contradiction gives you something to work
with. Let us illustrate how contradiction can be used to prove this non-trivial result.
Proof. Let x1 , . . . , xn be arbitrary numbers and let µ = (x1 + · · · + xn )/n be the average, so
that x1 + · · · + xn = nµ. Now assume that every xi < µ (to obtain a contradiction). Then
x1 + · · · + xn < µ + · · · + µ = nµ. This is a contradiction. Therefore, not every xi < µ, so at
least one number is as large (or larger) than the average µ.
This is a very important fact; commit it to memory. There is always a number that is at least
as large as the average. Does there have to be a number that is larger than the average?
Pop Quiz 4.9. We prove that x ∈ A ∩ B → x ∈ C using direct proof. Assume x ∈ A ∩ B. Then
x ∈ A and x ∈ B, so x is even and x = 9k. If k is odd, then x is the product of two odd numbers
which is odd. Therefore, k is even (to make x even). So, k = 2n, which means x = 18n = 6 · (3n),
which is a multiple of 6. Therefore, x ∈ C, which concludes the proof.
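The containment can be spot-checked on a finite window of integers. A Python sketch of ours, encoding our reading of the sets (A: evens, B: multiples of 9, C: multiples of 6):

```python
# A: even numbers, B: multiples of 9, C: multiples of 6, within a finite window.
N = 2000
A = {x for x in range(N) if x % 2 == 0}
B = {x for x in range(N) if x % 9 == 0}
C = {x for x in range(N) if x % 6 == 0}
assert A & B <= C                                # A ∩ B ⊆ C
assert A & B == {x for x in range(N) if x % 18 == 0}
```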
Exercise 4.10.


(a) (A ∩ B) ∪ (A ∩ C) = A ∩ (B ∪ C):
(Venn diagrams: shading A ∩ B and A ∩ C and taking the union produces the same region as
shading B ∪ C and intersecting it with A.)
Suppose x ∈ (A ∩ B) ∪ (A ∩ C). Then either x ∈ A and x ∈ B or x ∈ A and x ∈ C. In
both cases, x ∈ A and x ∈ B ∪ C so x ∈ A ∩ (B ∪ C). Now suppose x ∈ A ∩ (B ∪ C). Then
x ∈ A and either x ∈ B or x ∈ C. If x ∈ B, then x ∈ (A ∩ B) and so x ∈ (A ∩ B) ∪ (A ∩ C).
Similarly, if x ∈ C, then x ∈ (A ∩ C) and so x ∈ (A ∩ B) ∪ (A ∩ C).
(b) A ∪ B equals the complement of Ā ∩ B̄ (writing X̄ for the complement of X):
(Venn diagrams: shading A ∪ B produces the same region as intersecting the complements Ā
and B̄ and then complementing.)
Suppose x ∈ A ∪ B. This means x ∈ A or x ∈ B. x ∈ A → x ∉ Ā → x ∉ Ā ∩ B̄ → x is in the
complement of Ā ∩ B̄; x ∈ B → x ∉ B̄ → x ∉ Ā ∩ B̄ → x is in the complement of Ā ∩ B̄. In
both cases, x is in the complement of Ā ∩ B̄.
Now, suppose x is in the complement of Ā ∩ B̄, that is x ∉ Ā ∩ B̄. So either x ∉ Ā or x ∉ B̄.
x ∉ Ā → x ∈ A → x ∈ A ∪ B; x ∉ B̄ → x ∈ B → x ∈ A ∪ B. In both cases, x ∈ A ∪ B.
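Both identities can be spot-checked on random finite sets. A small Python sketch of ours, using a 20-element universe:

```python
import random

# Spot-check both identities on random subsets of a small universe U.
U = set(range(20))
random.seed(1)
for _ in range(200):
    A = {x for x in U if random.random() < 0.5}
    B = {x for x in U if random.random() < 0.5}
    C = {x for x in U if random.random() < 0.5}
    # (a) distributivity: (A ∩ B) ∪ (A ∩ C) = A ∩ (B ∪ C)
    assert (A & B) | (A & C) == A & (B | C)
    # (b) A ∪ B equals the complement of (complement of A) ∩ (complement of B)
    assert A | B == U - ((U - A) & (U - B))
```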

Chapter 5
Exercise 5.1.
(a) S(n) is a sum of integers so it is an integer, call it k.
(b) By the high-school geometric sum formula: 1 + 4 + 4^2 + · · · + 4^(n−1) = (4^n − 1)/(4 − 1) = (4^n − 1)/3.
(c) Therefore k = (4^n − 1)/3, or 4^n − 1 = 3k. That is, 4^n − 1 is divisible by 3.
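A quick check of the conclusion for small n, in a Python sketch of ours:

```python
# S(n) = 1 + 4 + ... + 4^(n-1) is an integer equal to (4^n - 1)/3,
# so 4^n - 1 is divisible by 3.
for n in range(1, 40):
    S = sum(4 ** i for i in range(n))
    assert 3 * S == 4 ** n - 1
```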
Exercise 5.2. (a) n ≥ 2 (b) n ≥ 0 (c) n = 0, 1 (d) n ≥ 1 (e) n ≥ 1
Exercise 5.3.
(a) Define the claim P(n) : Σ_{i=0}^{n−1} (a + id) = na + (1/2)n(n − 1)d.
1: [Base case] P(1) claims that a = a, which is clearly t.
2: [Induction step] We show P(n) → P(n + 1) for all n ≥ 1, using a direct proof.
Assume (induction hypothesis) P(n) is t: Σ_{i=0}^{n−1} (a + id) = na + (1/2)n(n − 1)d.
Show P(n + 1) is t: Σ_{i=0}^{n} (a + id) = (n + 1)a + (1/2)(n + 1)n d.
We compute the sum Σ_{i=0}^{n} (a + id) as follows:
Σ_{i=0}^{n} (a + id) = a + nd + Σ_{i=0}^{n−1} (a + id)
= a + nd + na + (1/2)n(n − 1)d    (IH)
= (n + 1)a + (1/2)(n(n − 1) + 2n)d = (n + 1)a + (1/2)(n + 1)n d
(IH stands for “by the induction hypothesis”). We have shown P(n + 1) is t, as needed.
3: By induction, P(n) is t ∀n ≥ 1.
(b) Define the claim P(n) : Σ_{i=0}^{n−1} ar^i = a(r^n − 1)/(r − 1).
1: [Base case] P(1) claims that a = a, which is clearly t.
2: [Induction step] We show P(n) → P(n + 1) for all n ≥ 1, using a direct proof.
Assume (induction hypothesis) P(n) is t: Σ_{i=0}^{n−1} ar^i = a(r^n − 1)/(r − 1).
Show P(n + 1) is t: Σ_{i=0}^{n} ar^i = a(r^(n+1) − 1)/(r − 1).


We compute the sum Σ_{i=0}^{n} ar^i as follows:
Σ_{i=0}^{n} ar^i = ar^n + Σ_{i=0}^{n−1} ar^i
= ar^n + a(r^n − 1)/(r − 1)    (IH)
= a(r^(n+1) − r^n + r^n − 1)/(r − 1) = a(r^(n+1) − 1)/(r − 1).
We have shown that P(n + 1) is t, as needed.
3: By induction, P(n) is t ∀n ≥ 1.
(c) Define the claim P(n) : n ≤ 2^n.
1: [Base case] P(1) claims that 1 ≤ 2^1, which is clearly t.
2: [Induction step] We show P(n) → P(n + 1) for all n ≥ 1, using a direct proof.
Assume (induction hypothesis) P(n) is t: n ≤ 2^n.
Show P(n + 1) is t: n + 1 ≤ 2^(n+1).
n + 1 ≤ 2^n + 1 ≤ 2^n + 2^n = 2^(n+1)    (IH).
We have shown that P(n + 1) is t, as needed.
3: By induction, P(n) is t ∀n ≥ 1.
(d) Define the claim P(n) : 5^n − 1 is divisible by 4.
1: [Base case] P(1) claims that 5 − 1 is divisible by 4, which is clearly t.
2: [Induction step] We show P(n) → P(n + 1) for all n ≥ 1, using a direct proof.
Assume (induction hypothesis) P(n) is t: 5^n − 1 is divisible by 4, so 5^n − 1 = 4k.
Show P(n + 1) is t: 5^(n+1) − 1 is divisible by 4.
5^(n+1) − 1 = 5 · 5^n − 1 = 5 · (4k + 1) − 1 = 20k + 4 = 4(5k + 1)    (IH).
Therefore 5^(n+1) − 1 is divisible by 4, and we have shown that P(n + 1) is t.
3: By induction, P(n) is t ∀n ≥ 1.
(e) Define the claim P(n) : Σ_{i=1}^{n} i · i! = (n + 1)! − 1.
1: [Base case] P(1) claims that 1 = 2! − 1, which is clearly t.
2: [Induction step] We show P(n) → P(n + 1) for all n ≥ 1, using a direct proof.
Assume (induction hypothesis) P(n) is t: Σ_{i=1}^{n} i · i! = (n + 1)! − 1.
Show P(n + 1) is t: Σ_{i=1}^{n+1} i · i! = (n + 2)! − 1. We compute Σ_{i=1}^{n+1} i · i! as follows:
Σ_{i=1}^{n+1} i · i! = (n + 1)(n + 1)! + Σ_{i=1}^{n} i · i!
= (n + 1)(n + 1)! + (n + 1)! − 1    (IH)
= (n + 1)!(n + 1 + 1) − 1 = (n + 2)! − 1.
We have shown that P(n + 1) is t, as needed.
3: By induction, P(n) is t ∀n ≥ 1.
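All five claims can be checked numerically for small n. A Python sketch of ours, with arbitrary test values for the series parameters a, d, r:

```python
import math

a, d, r = 3, 5, 2  # arbitrary test values for the series parameters
for n in range(1, 25):
    # (a) sum_{i=0}^{n-1} (a + i d) = n a + n(n-1)d/2
    assert sum(a + i * d for i in range(n)) == n * a + n * (n - 1) * d // 2
    # (b) sum_{i=0}^{n-1} a r^i = a (r^n - 1)/(r - 1)
    assert sum(a * r ** i for i in range(n)) == a * (r ** n - 1) // (r - 1)
    # (c) n <= 2^n
    assert n <= 2 ** n
    # (d) 5^n - 1 is divisible by 4
    assert (5 ** n - 1) % 4 == 0
    # (e) sum_{i=1}^{n} i * i! = (n+1)! - 1
    assert sum(i * math.factorial(i)
               for i in range(1, n + 1)) == math.factorial(n + 1) - 1
```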

Pop Quiz 5.4. The claim is readily verified by substituting a0 , a1 , a2 , a3 into the four equations.
Exercise 5.5.
(a) Tinker. Compute S(n) for small n:
    n     1  2  3   4   5   6   7   8   9   10  · · ·
    S(n)  1  4  9  16  25  36  49  64  81  100  · · ·
A reasonable guess is S(n) = n^2. The proof by induction follows the standard template. For
the base case, S(1) = 1 = 1^2. Suppose S(n) = n^2 and consider
S(n + 1) = S(n) + 2n + 1 = n^2 + 2n + 1 = (n + 1)^2    (ih)
(ih stands for “by the induction hypothesis”). By induction, S(n) = n^2 for all n ≥ 1.
(b) As usual, first tinker with small n:
    n              1  2   3    4    5    6  · · ·
    S(n)           1  9  36  100  225  441  · · ·
    Σ_{i=1}^{n} i  1  3   6   10   15   21  · · ·


A reasonable guess is S(n) = (Σ_{i=1}^{n} i)^2. The proof by induction follows the standard template.
For the base case, S(1) = 1 = 1^2. Suppose S(n) = (Σ_{i=1}^{n} i)^2 and consider
S(n + 1) = S(n) + (n + 1)^3 = (Σ_{i=1}^{n} i)^2 + (n + 1)^3 = (1/4)n^2(n + 1)^2 + (n + 1)^3    (ih),
where the last step follows from the formula Σ_{i=1}^{n} i = (1/2)n(n + 1). Therefore,
S(n + 1) = (1/4)(n + 1)^2 (n^2 + 4(n + 1)) = (1/4)(n + 1)^2 (n + 2)^2.
The last expression is (Σ_{i=1}^{n+1} i)^2. By induction, S(n) = (Σ_{i=1}^{n} i)^2 for all n ≥ 1.
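A quick check of the guessed formula, in a Python sketch of ours:

```python
# 1^3 + 2^3 + ... + n^3 = (1 + 2 + ... + n)^2
for n in range(1, 60):
    cubes = sum(i ** 3 for i in range(1, n + 1))
    assert cubes == (n * (n + 1) // 2) ** 2
```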
Pop Quiz 5.6. (a) n ≥ 3 (b) n = 3, 4 (c) n ≥ 1
Exercise 5.7.
(i) Define the set C = {x + z0 | x ∈ B}. Then C contains only natural numbers, and is non-
empty because B is non-empty. By the well-ordering principle C has a minimum element
c∗ = b∗ + z0 where b∗ ∈ B. Consider any b ∈ B. Then c = b + z0 ∈ C and therefore c∗ ≤ c,
i.e. b∗ + z0 ≤ b + z0 or b∗ ≤ b. This proves that b∗ is a minimum element of B.
(ii) (a) Let n* be the smallest counter-example; n* ≥ 2 (P(1) is t). Therefore
Σ_{i=0}^{n*−1} (a + id) ≠ n*a + (1/2)n*(n* − 1)d; and, because n* is the smallest counter-example,
Σ_{i=0}^{n*−2} (a + id) = (n* − 1)a + (1/2)(n* − 1)(n* − 2)d. But,
Σ_{i=0}^{n*−1} (a + id) = a + (n* − 1)d + Σ_{i=0}^{n*−2} (a + id)
= a + (n* − 1)d + (n* − 1)a + (1/2)(n* − 1)(n* − 2)d
= n*a + (1/2)n*(n* − 1)d,
which contradicts n* being a counter-example: there is no counter-example.
(b) Let n* be the smallest counter-example; n* ≥ 2 (P(1) is t). Therefore Σ_{i=0}^{n*−1} ar^i ≠
a(r^(n*) − 1)/(r − 1); and, because n* is the smallest counter-example, n* − 1 ≥ 1 is not a
counter-example, so Σ_{i=0}^{n*−2} ar^i = a(r^(n*−1) − 1)/(r − 1). But,
Σ_{i=0}^{n*−1} ar^i = ar^(n*−1) + Σ_{i=0}^{n*−2} ar^i = ar^(n*−1) + a(r^(n*−1) − 1)/(r − 1) = a(r^(n*) − 1)/(r − 1),
which contradicts n* being a counter-example: there is no counter-example.
(c) Let n* be the smallest counter-example: n* ≥ 2 (P(1) is t) and n* > 2^(n*). Also, n* − 1
is not a counter-example (n* is the smallest counter-example), so n* − 1 ≤ 2^(n*−1). But,
n* = n* − 1 + 1 ≤ 2^(n*−1) + 1 ≤ 2^(n*−1) + 2^(n*−1) = 2^(n*),
which contradicts n* being a counter-example: there is no counter-example.
(d) Let n* be the smallest counter-example: n* ≥ 2 (P(1) is t) and 5^(n*) − 1 is not divisible
by 4. Also, n* − 1 is not a counter-example (n* is the smallest), so 5^(n*−1) − 1 = 4k. But,
5^(n*) − 1 = 5 · 5^(n*−1) − 1 = 5(4k + 1) − 1 = 4(5k + 1),
so 4 divides 5^(n*) − 1, contradicting n* being a counter-example: n* does not exist.
(e) Let n* be the smallest counter-example; n* ≥ 2 (P(1) is t). Therefore Σ_{i=1}^{n*} i · i! ≠
(n* + 1)! − 1. Since n* is the smallest counter-example, Σ_{i=1}^{n*−1} i · i! = n*! − 1. But,
Σ_{i=1}^{n*} i · i! = n* · n*! + Σ_{i=1}^{n*−1} i · i! = n* · n*! + n*! − 1 = (n* + 1)! − 1,
which contradicts n* being a counter-example: there is no counter-example.
Exercise 5.8. Suppose P (1) is t; and, P (n) → P (n + 1) is t for n ≥ 1. We show P (n) is t for
all n ≥ 1. Assume P (n) is false for some n, and let n∗ be the smallest counter-example for P (n);
n∗ ≥ 2 because P (1) is t. Therefore n∗ − 1 is not a counter-example (because n∗ is the smallest),
so P(n* − 1) is t. But P(n* − 1) → P(n*) (since n* − 1 ≥ 1), and since P(n* − 1) is t, it implies
that P(n*) is t. This contradicts n* being a counter-example, so P(n) is t for all n ≥ 1.
Exercise 5.9. Suppose B is a non-empty subset of N. Suppose B has no minimum element. We
show that B is empty (a contradiction). Define the claim
P (n) : B does not contain 1, 2, . . . , n.
Then P (1) is t because if 1 ∈ B, then 1 is the minimum element. Suppose P (n) is t. We show
that P (n + 1) is t. We need to show that B does not contain 1, 2, . . . , n, n + 1. Because P (n) is


t, B does not contain 1, 2, . . . , n. Suppose B contains n + 1, then n + 1 is the minimum element.


Since B has no minimum element, B does not contain n + 1, and so P (n + 1) is t. Therefore P (n)
is t for all n ≥ 1, that is B contains no elements and is the empty set, as was to be shown.
If you defined “P(n): B does not contain n,” then the induction P (n) → P (n + 1) fails because to
show that n + 1 6∈ B you need to know that 1, 2, . . . , n 6∈ B.

Chapter 6
Pop Quiz 6.1. Assume (to get a contradiction) that 2√n + 1/√(n + 1) > 2√(n + 1). Multiplying by
√(n + 1) and rearranging, 2√(n(n + 1)) > 2n + 1. Since both sides are positive, we can square to
obtain 4n^2 + 4n > 4n^2 + 4n + 1, or 0 > 1, a clear contradiction. Therefore 2√n + 1/√(n + 1) ≤ 2√(n + 1).
Exercise 6.2. Define the claim P(n) : n^3 < 2^n. Let us consider the induction step, so assume
that P(n) is t and consider (n + 1)^3 = n^3 + 3n^2 + 3n + 1 < 2^n + 3n^2 + 3n + 1. P(n + 1) will
follow if 3n^2 + 3n + 1 < 2^n, so define Q(n) : 3n^2 + 3n + 1 < 2^n. Let us consider the induction step
for Q: assume Q(n), i.e. 3n^2 + 3n + 1 < 2^n, and consider Q(n + 1): 3(n + 1)^2 + 3(n + 1) + 1 =
3n^2 + 3n + 1 + 6n + 6 < 2^n + 6n + 6. Q(n + 1) will be t if 6n + 6 < 2^n. Let us define the claim
R(n) : 6n + 6 < 2^n and prove the stronger claim P(n) ∧ Q(n) ∧ R(n) for n ≥ 10.
For the base case, the reader can easily verify that P(10), Q(10), R(10) are all t. For the induction
step, assume P(n) ∧ Q(n) ∧ R(n) for n ≥ 10, so n^3 < 2^n ∧ 3n^2 + 3n + 1 < 2^n ∧ 6n + 6 < 2^n. We
prove P(n + 1) ∧ Q(n + 1) ∧ R(n + 1):
(n + 1)^3 = n^3 + 3n^2 + 3n + 1 < 2^n + 2^n = 2^(n+1)    (ih)
3(n + 1)^2 + 3(n + 1) + 1 = 3n^2 + 3n + 1 + 6n + 6 < 2^n + 2^n = 2^(n+1)    (ih)
6(n + 1) + 6 = 6n + 6 + 6 < 2^n + 6 < 2^n + 2^n = 2^(n+1)    (ih)
The 1st equation uses P(n) for n^3 and Q(n) for 3n^2 + 3n + 1. The 2nd equation uses Q(n) for
3n^2 + 3n + 1 and R(n) for 6n + 6. The 3rd equation uses R(n) for 6n + 6 and 6 < 2^n when n ≥ 10.
Therefore P(n + 1) ∧ Q(n + 1) ∧ R(n + 1) is t. By induction, P(n) ∧ Q(n) ∧ R(n) is t for n ≥ 10.
Pop Quiz 6.3. No, because each subgrid has 4^n squares, which is not a multiple of 3 (the number
of squares in an L-tile). This is so because we know that 4^n − 1 is divisible by 3, so 4^n can't be.
Exercise 6.4. We prove a stronger claim by induction.
P(n) : the 2^n × 2^n grid can be L-tiled for any choice of the blacked out square.
The base case is the 2 × 2 square. For the induction step, assume P(n), so the 2^n × 2^n grid can
be L-tiled for any choice of the blacked out square. We prove P(n + 1). Consider the 2^(n+1) × 2^(n+1)
grid with any square missing. Divide the grid into its 4 sub-squares as in the text. Place an L-tile
in the center overlapping with the 3 sub-grids that are empty. You now have four 2^n × 2^n sub-grids,
each with a square missing somewhere; three of them have a corner square missing and one has
a square missing in some arbitrary position. By the induction hypothesis, each of these sub-grids
with a square missing can be independently L-tiled. Therefore the whole 2^(n+1) × 2^(n+1) grid with
the square missing can be L-tiled, and so P(n + 1) is t. By induction, P(n) is true for n ≥ 1.
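The induction is constructive, so it translates directly into a recursive tiling procedure. Here is a Python sketch of ours (the function name and labeling scheme are our own): it marks the three central cells of the hole-free quadrants with one tromino, then recurses on the four half-size grids.

```python
def tile(n, miss_r, miss_c, top=0, left=0, grid=None, count=None):
    """L-tile the 2^n x 2^n grid with cell (miss_r, miss_c) removed, by the
    induction: one tromino covers the three central cells of the quadrants
    that do not contain the hole; then recurse on four half-size grids."""
    size = 2 ** n
    if grid is None:
        grid = [[0] * size for _ in range(size)]
        count = [0]
    if n == 0:
        return grid
    half = size // 2
    count[0] += 1
    t = count[0]  # label of the tromino placed at the center
    for r0 in (top, top + half):
        for c0 in (left, left + half):
            if r0 <= miss_r < r0 + half and c0 <= miss_c < c0 + half:
                mr, mc = miss_r, miss_c          # quadrant with the real hole
            else:
                mr = top + half - 1 if r0 == top else top + half
                mc = left + half - 1 if c0 == left else left + half
                grid[mr][mc] = t                 # covered by the central tromino
            tile(n - 1, mr, mc, r0, c0, grid, count)
    return grid

g = tile(3, 2, 5)  # 8 x 8 grid with cell (2, 5) removed
```

The three marked cells always lie in the central 2 × 2 block of the current grid, so they do form an L-shaped tile.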
Pop Quiz 6.5.
(a) (Diagram: P(1), P(2), . . . , P(12), . . . with the implication arrows drawn between them.)

(b) There are three sets of implication arrows: black (starting from 1); gray (starting from 2); and,
light gray (starting from 3). To touch every n with a chain of implication arrows that starts
from a base case, we need the three bases cases highlighted in boxes, i.e. P (1), P (2), P (3).
Exercise 6.6.
(a) (i) (Diagram: P(1), P(2), . . . , P(12), . . . with the implication arrows drawn between them.)

A chain starts at every odd n (we have shown the chains starting at 1, 3, 5).
(ii) There is no in-arrow into odd n, so you need infinitely many base cases 1, 3, 5, 7, 9, 11, . . ..


(b) (i) (Diagram: P(1), P(2), . . . , P(12), . . . with the implication arrows drawn between them.)
Now, there is an incoming arrow to every n from ⌊n/2⌋.
(ii) Since there is an incoming arrow to every n, you only need the base case P (1).
Exercise 6.7.
(a) 21 = 2^0 + 2^2 + 2^4.
(b) Define P(n) : n is a sum of distinct powers of 2. The base case is P(1) and 1 = 2^0. We use
strong induction. Assume P(1), . . . , P(n), and consider P(n + 1). There are two cases.
Case 1: n is even. By the induction hypothesis, n = Σ_{i≥1} a_i 2^i, where a_i = 0, 1. It follows
that n + 1 = 2^0 + Σ_{i≥1} a_i 2^i and so P(n + 1) is t.
Case 2: n is odd, so n + 1 is even, and 1 ≤ (1/2)(n + 1) ≤ n, and so P((1/2)(n + 1)) is t, that is
(1/2)(n + 1) = Σ_{i≥0} a_i 2^i, where a_i = 0, 1. Therefore n + 1 = Σ_{i≥0} a_i 2^(i+1) and so P(n + 1) is t.
In both cases, we proved P(n + 1), and so, by induction, P(n) is t for n ≥ 1.
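The two cases of the induction translate directly into a recursive procedure. A Python sketch of ours (the function name is our own):

```python
def powers_of_two(n):
    """Mirror the strong induction of part (b): if n is odd, peel off 2^0
    from n; if n is even, double every power in the representation of n/2."""
    if n == 1:
        return [1]
    if n % 2 == 1:
        return [1] + powers_of_two(n - 1)
    return [2 * p for p in powers_of_two(n // 2)]

assert powers_of_two(21) == [1, 4, 16]   # 21 = 2^0 + 2^2 + 2^4
```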

(c) Define P(n) : n = Σ_{i≥1} a_i i!, where a_i ∈ {0, 1, . . . , i}. The base case is P(1) and 1 = 1!. We use
strong induction. Assume P(1), . . . , P(n), and consider P(n + 1).
Let n = Σ_{i≥1} a_i i!. Let k be the first index for which a_k < k, so n = Σ_{i=1}^{k−1} i · i! + a_k k! +
Σ_{i≥k+1} a_i i!. We claim that n + 1 = (a_k + 1)k! + Σ_{i≥k+1} a_i i!, which proves P(n + 1). To see
this, there are two cases.
Case 1: k = 1, in which case the summation Σ_{i=1}^{k−1} i · i! is empty (i.e. zero) and a_1 = 0 (because
a_1 < 1), therefore we are just adding 1 to n, which clearly gives n + 1.
Case 2: k ≥ 2, in which case, by Exercise 5.3(e),
n = k! − 1 + a_k k! + Σ_{i≥k+1} a_i i! = (a_k + 1)k! − 1 + Σ_{i≥k+1} a_i i!,
and adding 1 to both sides proves that n + 1 = (a_k + 1)k! + Σ_{i≥k+1} a_i i!.
We proved P(n + 1), therefore by induction P(n) is t for n ≥ 1.
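The representation in part (c) is the factorial number system; the digits can be computed by successive division. A Python sketch of ours (the function name is our own):

```python
import math

def factorial_digits(n):
    """Digits a_1, a_2, ... with n = sum a_i * i! and 0 <= a_i <= i
    (the factorial number system): repeatedly divide by 2, 3, 4, ..."""
    digits, base = [], 2
    while n > 0:
        digits.append(n % base)   # this digit is at most base - 1
        n //= base
        base += 1
    return digits

d = factorial_digits(100)   # [0, 2, 0, 4]: 100 = 2*2! + 4*4!
assert sum(a * math.factorial(i + 1) for i, a in enumerate(d)) == 100
```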
Exercise 6.8.
(a) Define P (n) : the greedy algorithm uses the fewest coins for n. The base case is P (1) which
is clearly t, since the greedy algorithm uses one 1¢ coin, and you cannot do better than one
coin. We use strong induction, so assume P (1), . . . , P (n) and consider n + 1.
Suppose n + 1 ≥ 25, and suppose that the optimal way to obtain n + 1 does not contain a quarter.
It cannot contain 3 or more dimes, as you can replace 3 dimes with a quarter and a nickel
and do better. We leave it to the reader to show in a similar way that it cannot contain 2
dimes, 1 dime or zero dimes, which results in an impossible situation. Therefore, the optimal
way must contain a quarter; and then uses some number of coins for n + 1 − 25. The greedy
algorithm uses a quarter and then by the induction hypothesis, the optimal number of coins
for n + 1 − 25, which means greedy is optimal for n + 1.
Suppose 10 ≤ n + 1 < 25. A similar reasoning shows there must be at least one dime, hence
greedy is optimal. (If there is no dime, there aren’t 2 nickels, or 1 nickel and at least 5 pennies,
or no nickel and at least 10 pennies.)
Suppose 5 ≤ n + 1 < 10. Using similar reasoning, there must be a nickel and greedy is
therefore optimal. Lastly, Greedy is clearly optimal for n + 1 < 5.
Thus greedy is optimal for n + 1, and hence by induction, greedy is optimal for all n ≥ 1.
(b) Consider the coin system with denominations {1¢,4¢,5¢}. To make 8¢, the greedy algorithm
uses {5¢,1¢,1¢,1¢}, but you only need two coins, {4¢,4¢}.
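Both parts can be checked by code: a greedy coin counter against a dynamic-programming optimum. A Python sketch of ours (function names are our own):

```python
def greedy(n, coins):
    """Coins used by the greedy algorithm: always take the largest coin."""
    used = 0
    for c in sorted(coins, reverse=True):
        used += n // c
        n %= c
    return used

def fewest(n, coins):
    """Optimal number of coins for n cents, by dynamic programming."""
    best = [0] + [None] * n
    for amount in range(1, n + 1):
        best[amount] = 1 + min(best[amount - c] for c in coins if c <= amount)
    return best[n]

# US coins: greedy is optimal for every amount up to a dollar ...
assert all(greedy(n, [1, 5, 10, 25]) == fewest(n, [1, 5, 10, 25])
           for n in range(1, 101))
# ... but with denominations {1, 4, 5}, greedy(8) uses 4 coins while 4 + 4 uses 2
assert greedy(8, [1, 4, 5]) == 4 and fewest(8, [1, 4, 5]) == 2
```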
Pop Quiz 6.9. No. Only P (1) and P (5) are t.
You need the base cases P (1), P (2), P (3), P (4). Then,
P(1) → P(5);
P(1) ∧ P(2) → P(6);
P(1) ∧ P(2) ∧ P(3) → P(7);
P(1) ∧ P(2) ∧ P(3) ∧ P(4) → P(8);
P(1) ∧ P(2) ∧ P(3) ∧ P(4) ∧ P(5) → P(9); and so on.


Chapter 7
Pop Quiz 7.1. f (3) cannot be computed. f (3) = f (2) + 5 = f (1) + 3 + 5 = f (0) + 1 + 3 + 5 = . . . .
Since we don’t know any of f (2), f (1), f (0), f (−1), . . . , we cannot compute f (3).
Exercise 7.2.
(a) f(−1) = f(0) = 0; f(1) = f(0) + 1 = 1; f(2) = f(1) + 3 = 4; f(3) = f(2) + 5 = 9.
(b) f(n) = n^2.
Exercise 7.3. (a) and (b) are well defined. In (c), the recursive part uses a larger value further
from a base case. In (d) you cannot compute f (1).
Exercise 7.4. P (0) and P (1) are true since f (n) is explicitly given for n = 0, 1. We show that
P (n) → P (n + 1). Assume f (n) can be computed. Then f (n + 2) = f (n) + 2 which can be
computed since we know f (n). Therefore, by leaping induction, P (n) is t for n ≥ 0.
Exercise 7.5.
(a) f (n) = 2n. (Base case) f (0) = 0 = 2 × 0. (Induction step) Assume f (n) = 2n; then,
f (n + 1) = 2 + f (n) = 2 + 2n = 2(n + 1).
(b) f (n) = 0. (Base case) f (0) = 0. (Induction step) Assume f (n) = 0; then, f (n + 1) = 2f (n) =
2 × 0 = 0.
(c) f (n) = 2^n. (Base case) f (0) = 1 = 2^0. (Induction step) Assume f (n) = 2^n; then, f (n + 1) =
2f (n) = 2 × 2^n = 2^{n+1}.
Exercise 7.6. First we unfold the recursions in (a), (b), (c); (d) is complicated.
(a) f (n) = f (n − 1) + log_2 n
    f (n − 1) = f (n − 2) + log_2 (n − 1)
    f (n − 2) = f (n − 3) + log_2 (n − 2)
    ...
    f (2) = f (1) + log_2 2
Adding and canceling (using f (1) = 0): f (n) = log_2 2 + · · · + log_2 n = log_2 n!.
(b) f (n) = 2f (n − 1); f (n − 1) = 2f (n − 2); . . . ; f (2) = 2f (1).
Multiplying and canceling (using f (1) = 1): f (n) = 2 × · · · × 2 × f (1) = 2^{n−1}.
(c) f (n) = nf (n − 1); f (n − 1) = (n − 1)f (n − 2); . . . ; f (1) = 1 · f (0).
Multiplying and canceling (using f (0) = 1): f (n) = 1 × 2 × · · · × n = n!.
In (a), the cancelations occur when you equate the sum of the LHS terms to that of the RHS
terms. In (b) and (c), the cancelations occur when you equate the products. Here are the proofs.
(a) f (n) = log2 n!. (Base case) f (1) = 0 = log2 1!. (Induction step) Assume f (n) = log2 n!; then,
f (n + 1) = f (n) + log2 (n + 1) = log2 n! + log2 (n + 1) = log2 (n + 1)!.
(b) f (n) = 2^{n−1}. (Base case) f (1) = 1 = 2^0 = 2^{1−1}. (Induction step) Assume f (n) = 2^{n−1}; then,
f (n + 1) = 2f (n) = 2 × 2^{n−1} = 2^n = 2^{(n+1)−1}.
(c) f (n) = n!. (Base case) f (0) = 1 = 0!. (Induction step) Assume f (n) = n!; then, f (n + 1) =
(n + 1)f (n) = (n + 1) × n! = (n + 1)!.
(d) It is possible to unfold the recursion, but one must be careful.
    f (n) = f (n − 1)^2
    f (n − 1)^2 = (f (n − 2)^2)^2
    (f (n − 2)^2)^2 = ((f (n − 3)^2)^2)^2
    ...
The cancelations are after summing the LHS's and equating to the sum of the RHS's:
    f (n) = (((f (1)^2)^2)^{...})^2 = (((2^2)^2)^{...})^2.
So f (n) is 2 squared n − 1 times. When you square, you multiply the exponent by 2, so
    f (n) = 2^{1×2×2×···×2} = 2^{2^{n−1}}.
Even though getting the formula was a little complicated, the proof by induction after
you have the formula is standard.
(Base case) f (1) = 2 = 2^1 = 2^{2^0} = 2^{2^{1−1}}.
(Induction step) Assume f (n) = 2^{2^{n−1}}. Then,
    f (n + 1) = f (n)^2 = 2^{2^{n−1}} × 2^{2^{n−1}} = 2^{2×2^{n−1}} = 2^{2^n} = 2^{2^{(n+1)−1}}.
There's an easier analysis of this recursion.


Transforming a recursion. We take this opportunity to show you a powerful trick for analyzing
recursions. Often you can transform f (n) to a function g(n) that is easier to analyze. The recursion
for f (n) will be transformed to one for g(n). We use the transformation
g(n) = log2 f (n).
Note: g(1) = log2 f (1) = 1. Taking logs of both sides of the recursion for f (n) gives
log_2 f (n) = log_2 f (n − 1)^2 = 2 log_2 f (n − 1).
We get a recursion for g(n) by replacing log_2 f with g:
g(n) = 2g(n − 1).
We analyzed this recursion in part (b): g(n) = 2^{n−1} and f (n) = 2^{g(n)} = 2^{2^{n−1}}.
Exercise 7.7. T_1 = 1 = F_2 and T_2 = 2 = F_3, so the base cases are true. We use strong
induction. Suppose T_1 = F_2, . . . , T_n = F_{n+1} for n ≥ 2. By the recursion, T_{n+1} = T_n + T_{n−1} =
F_{n+1} + F_n (by the induction hypothesis). But by the Fibonacci recursion, F_{n+1} + F_n = F_{n+2},
hence T_{n+1} = F_{n+2}. By induction, T_n = F_{n+1} for n ≥ 1.
Exercise 7.8. F_1 = 1 ≤ 2^1 and F_2 = 2 ≤ 2^2, so the base cases hold. Suppose F_1 ≤ 2^1, . . . , F_n ≤
2^n for n ≥ 2. Consider F_{n+1} = F_n + F_{n−1} ≤ 2^n + 2^{n−1} (by the induction hypothesis). But
2^n + 2^{n−1} ≤ 2^n + 2^n = 2^{n+1}, so F_{n+1} ≤ 2^{n+1}. By induction, F_n ≤ 2^n for n ≥ 1.
Exercise 7.9. We use induction. The base case is Big(0) = 1, which is 2^0. Suppose Big(n) = 2^n.
Since n + 1 > 0, Big(n+1) = 2 · Big(n) = 2 · 2^n = 2^{n+1}. By induction, Big(n) = 2^n for n ≥ 0.
Exercise 7.10. We use induction. The base case: T_0 = 2 = 3 × 0 + 2. Suppose T_n = 3n + 2.
Then, T_{n+1} = T_n + 3 = 3n + 2 + 3 = 3(n + 1) + 2. By induction, T_n = 3n + 2 for n ≥ 0.
Exercise 7.11. (a) Yes (b) Yes (c) No ( 32 → 25 ) (d) Yes (e) No (1 is not in the set)
Exercise 7.12. Remember that in all cases, by default, nothing else is in the set.
(a) 1: 1 ∈ S. 2: x ∈ S → 3x ∈ S.
(b) 1: ε, 0, 1 ∈ S. 2: x ∈ S → 0x0 ∈ S; x ∈ S → 1x1 ∈ S.
(c) 1: ε ∈ S. 2: x, y ∈ S → [x]y ∈ S.
Exercise 7.13. (The answers are pairs of trees T_1, T_2, drawn as figures in the original; apart
from the cases where T_1 = ε or T_2 = ε, they cannot be reproduced in this text-only extraction.)
Exercise 7.14.
(a) Every RFBT is an RBT. This is because the basis case for the RFBT is an RBT, and the
constructor rules are the same. However, not every RBT is an RFBT: a tree in which some
vertex has exactly one child (drawn as a figure in the original) is an RBT but not an RFBT.
(b) There are no RFBTs with 6 vertices (only an odd number of vertices is possible). A 5-vertex
RFBT is drawn as a figure in the original.

Chapter 8
Pop Quiz 8.1. ε → [ ] (using x = ε, y = ε) → [ ][ ] (x = ε, y = [ ]) → [[ ][ ]][ ] (x = [ ][ ], y = [ ]).
Exercise 8.2. The proof is by structural induction.
1: Clearly ε is matched (base case).
2: For the induction step, there is only one constructor rule. Suppose x and y are matched.
Then xy is matched and so every prefix of [xy has at least one more "[" than "]". Inserting
"]" anywhere in [xy adds at most one "]" to some prefixes. Therefore, every prefix of
[x]y has at least as many "[" as "]" and so [x]y is matched.


3: By structural induction, every string in M is matched.


Exercise 8.3.
(a) Suppose s is balanced and matched. For prefix s[0] · · · s[i], define the excess function f (i) to
be the number of “ [ ” minus the number of “ ] ”. Since s is matched, f (i) ≥ 0; s must begin
with “ [ ” so f (0) = 1 and s is balanced so f (n) = 0 (length(s) = n + 1). Let i∗ be the first
prefix which is balanced, so f (i∗ ) = 0 and i∗ ≤ n and s[i∗ ] = “ ] ”. We have decomposed s as
s = [x]y,
where x = s[1] · · · s[i∗ − 1] and y = s[i∗ + 1] · · · s[n] (x or y could be empty). To finish, we
show that x and y are both balanced and matched. Since s[0] · · · s[i∗] is balanced (f (i∗) = 0),
x must be balanced. And since s and x are balanced, y must be balanced.
We now show that x and y are matched. The easy case is y. Suppose y is not matched. So
there is some prefix α of y with more “ ] ”. Then [x]α is a prefix of s with more “ ] ”, because
[x] is balanced. This contradicts s being matched, and so y is matched.
Suppose x is not matched. So, there is a prefix β of x with more “ ] ”; β 6= x because x is
balanced. Consider [β which is a prefix of s. f ([β) ≥ 0 because s is matched and β has more
“ ] ”, so β has exactly one more “ ] ” than “ [ ”, which means that f ([β) = 0. But this contradicts
s[0] · · · s[i∗ ] being the first prefix that is balanced.
(b) Suppose s is a balanced and matched string that is not in M.
(i) By the well ordering principle, we can choose s to be the balanced and matched string
of minimum length that is not in M.
(ii) By (a), s = [x]y where x, y are both balanced and matched.
(iii) x and y are at least 2 characters shorter than s.
(iv) s has minimum length among balanced matched strings not in M. x, y are both balanced
and matched, but shorter than s. Thus x, y ∈ M.
(v) By the constructor rule, s = [x]y ∈ M, which contradicts s 6∈ M.
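The excess function f(i) and the decomposition s = [x]y from (a) translate directly into code; this sketch (our own naming) locates the first balanced prefix and splits s there:

```python
def split(s):
    """Decompose a balanced, matched string s as [x]y, where x and y
    are themselves balanced and matched (Exercise 8.3(a))."""
    excess = 0                       # number of '[' minus number of ']' so far
    for i, c in enumerate(s):
        excess += 1 if c == '[' else -1
        assert excess >= 0           # holds because s is matched
        if excess == 0:              # first balanced prefix s[0..i]; s[i] = ']'
            return s[1:i], s[i + 1:]

x, y = split("[[][]][]")
assert (x, y) == ("[][]", "[]")      # both pieces are balanced and matched
```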
Exercise 8.4.
(a) Ns = N.
(b) Structural induction with Ns is exactly strong induction:
1. (Basis) Show property P holds for 1, i.e. P (1) is t.
2. (Structural Induction) Assume P holds for 1, 2, . . . , n and show P (n + 1) holds, i.e.
P (1) ∧ P (2) ∧ · · · ∧ P (n) → P (n + 1)
Exercise 8.5.
(a) x • y = 0101110; x • y • z = 010111010110.
(b) (x • y)^r = 0111010; (x • y • z)^r = 011010111010.
(c) Let n = |y|. We prove that (x • y)^r = y^r • x^r by strong induction on n = |y|. If n = 0
(y = ε), there is nothing to prove (base case). Suppose the claim holds up to n ≥ 0, that is
(x • y)^r = y^r • x^r whenever |y| ≤ n. Now consider any y with |y| = n + 1 and write y = y[n] • b,
where b is a single bit and y[n] is the prefix of length n. Then
(x • y)^r = ((x • y[n]) • b)^r = b • (x • y[n])^r = b • y[n]^r • x^r = (y[n] • b)^r • x^r = y^r • x^r.
First we apply the induction hypothesis to b, then to y[n], and then to b again (all of which
have length at most n).
(d) The base case, n = 2, is in (c). Assume the claim holds for n ≥ 2 and consider n + 1:
(x_1 • x_2 • · · · • x_n • x_{n+1})^r = ((x_1 • x_2 • · · · • x_n) • x_{n+1})^r
= x_{n+1}^r • (x_1 • x_2 • · · · • x_n)^r   (by (c))
= x_{n+1}^r • x_n^r • x_{n−1}^r • · · · • x_1^r.   (by the induction hypothesis)
Pop Quiz 8.6. ε → 11 → 0110 → 001100.
A length-6 palindrome is x • x^r for any x with |x| = 3. There are 8 strings of length 3, hence 8
palindromes of length 6. In general, there are 2^{⌈n/2⌉} palindromes of length n.
Exercise 8.7.
(a) We give the formal proof with numbered steps for easy reference.
1: The 3 base cases ε, 0, 1 are palindromes. (Strings of length at most 1 are palindromes.)


2: For the structural induction step, suppose we start with a palindrome x = x^r. We
must show that each constructor rule produces a new palindrome. Using Exercise 8.5,
(0 • x • 0)^r = 0^r • x^r • 0^r = 0 • x • 0 and similarly, (1 • x • 1)^r = 1^r • x^r • 1^r = 1 • x • 1 (because
x = x^r). Therefore both constructor rules produce palindromes.
3: By structural induction, every member of P is a palindrome.
(b) Consider s, the shortest palindrome not in P. If s starts with 0, then it must end in 0, so
s = 0 • x • 0. Further, x must be a palindrome for s to be one. Now, x is shorter than s, so
since s is the shortest palindrome not in P, it must be that x ∈ P. But then the constructor
rule gives that s = 0 • x • 0 ∈ P, a contradiction. A similar contradiction arises if s = 1 • x • 1.
Therefore, there is no shortest palindrome not in P, i.e. every palindrome is in P.
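Both claims — every member of P is a palindrome, and there are 2^⌈n/2⌉ palindromes of length n — can be checked by generating P up to a small length (a sketch; gen_P is our own name):

```python
from math import ceil

def gen_P(max_len):
    """Generate all strings in P (base cases: '', '0', '1';
    constructor rules: x -> 0x0 and x -> 1x1) up to length max_len."""
    P = {"", "0", "1"}
    frontier = set(P)
    while frontier:
        new = {b + x + b for x in frontier for b in "01"
               if len(x) + 2 <= max_len}
        frontier = new - P
        P |= new
    return P

P = gen_P(6)
assert all(s == s[::-1] for s in P)        # every member is a palindrome
for n in range(1, 7):                      # 2^ceil(n/2) palindromes of length n
    assert sum(len(s) == n for s in P) == 2 ** ceil(n / 2)
```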
Exercise 8.8.
(a) We give the formal proof with numbered steps for easy reference.
1: The base case is 1 which clearly evaluates to 1 which is odd.
2: Structural induction: We consider each constructor rule separately. For rule 1, suppose
x ∈ Aodd and x is odd. The constructor rule produces (x + 1 + 1) and value((x + 1 + 1)) =
value(x) + 2, which is odd because value(x) is odd. For rule 2, suppose x, y ∈ Aodd and
x, y are odd. The constructor rule produces (x × y) whose value is value(x) × value(y),
which is odd because the product of two odd numbers is odd.
3: By structural induction, the value of every member of Aodd is odd.
(b) (1 + 1 + 1 + 1 + 1)
Pop Quiz 8.9. The number of links is 15. The number of vertices is 15. For any RBT, the
number of links must be one less than the number of vertices. So this tree cannot be an RBT.
Exercise 8.10.
(a) First the size. By the recursive definition,
size = 1 + size(left-subtree) + size(right-subtree) = 1 + 2 · size(left-subtree),
since both child-subtrees are identical. Applying the same logic to the left child,
size = 1 + 2(1 + 2 · size(left-left-child)) = 1 + 2 + 4 · size(left-left-child)
= 1 + 2 + 4 + 8 · size(left-left-left-child)
= 1 + 2 + 4 + 8 + 16 · size(left-left-left-left-child).
The left-left-left-left-child is ε and size(ε) = 0, so size = 1 + 2 + 4 + 8 + 16 · 0 = 15.
Similarly, we can recursively obtain the height:
height = 1 + 1 + 1 + 1 + height(left-left-left-left-child) = 1 + 1 + 1 + 1 − 1 = 3,
since height(ε) = −1.

(b) Define, for an RBT T, the property P (T): size(T) ≤ 2^{height(T)+1} − 1. P (ε) is t because
0 = 2^{−1+1} − 1. Now suppose that, for RBTs T_1 and T_2, P (T_1) and P (T_2) are t. That is
(induction hypothesis),
size(T_1) ≤ 2^{height(T_1)+1} − 1 and size(T_2) ≤ 2^{height(T_2)+1} − 1.
By the recursive definitions of size and height,
height(T) = 1 + max(height(T_1), height(T_2))
size(T) = 1 + size(T_1) + size(T_2).
By the induction hypothesis,
size(T) ≤ 1 + 2^{height(T_1)+1} − 1 + 2^{height(T_2)+1} − 1
= 2^{height(T_1)+1} + 2^{height(T_2)+1} − 1
≤ 2^{max(height(T_1),height(T_2))+1} + 2^{max(height(T_1),height(T_2))+1} − 1
= 2^{max(height(T_1),height(T_2))+2} − 1
= 2^{height(T)+1} − 1.
So, P (T) is t and we have proved that the constructor preserves property P. By structural
induction, P (T) is t for every T ∈ RBT.
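The recursive definitions of size and height, and the bound size(T) ≤ 2^{height(T)+1} − 1, can be exercised on random trees (a sketch; the empty tree ε is modeled as None, and the class name is ours):

```python
import random

class RBT:
    """A rooted binary tree node; None plays the role of the empty tree."""
    def __init__(self, left=None, right=None):
        self.left, self.right = left, right

def size(t):
    return 0 if t is None else 1 + size(t.left) + size(t.right)

def height(t):
    return -1 if t is None else 1 + max(height(t.left), height(t.right))

def random_rbt(depth):
    # Randomly stop (empty tree) or recurse into two child subtrees.
    if depth == 0 or random.random() < 0.3:
        return None
    return RBT(random_rbt(depth - 1), random_rbt(depth - 1))

for _ in range(100):
    t = random_rbt(6)
    assert size(t) <= 2 ** (height(t) + 1) - 1   # Exercise 8.10(b)
```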
Chapter 9
Pop Quiz 9.1. (a) 1+1+1=3. (b) 1+2+3=6. (c) f (i) = 1 + 1 + 1 = 3, so the answer is
3 + 3 + 3 = 9. (d) 1 + (1 + 2) + (1 + 2 + 3) = 10.


Pop Quiz 9.2. T_4(n) = 5 + Σ_{i=1}^n 10 = 5 + 10 · Σ_{i=1}^n 1. The last sum is n, so T_4(n) = 5 + 10n.
Exercise 9.3. We use several common sums together with the constant and addition rules:
S(n) = Σ_{i=1}^n (1 + 2i + 2^{i+2})
= Σ_{i=1}^n 1 + Σ_{i=1}^n 2i + Σ_{i=1}^n 2^{i+2}   (addition rule)
= n + 2 Σ_{i=1}^n i + 4 Σ_{i=1}^n 2^i   (constant rule)
= n + 2 × (1/2) n(n + 1) + 4 × (2^{n+1} − 2)   (common sums)
= 2^{n+3} + n^2 + 2n − 8   (simplify)

Exercise 9.4.
(a) T_1(n) = 2 + Σ_{i=1}^n [2 + Σ_{j=i}^n (5 + Σ_{k=i}^j 2)]
= 2 + Σ_{i=1}^n 2 + Σ_{i=1}^n Σ_{j=i}^n (5 + Σ_{k=i}^j 2)   (addition rule)
= 2 + Σ_{i=1}^n 2 + Σ_{i=1}^n Σ_{j=i}^n 5 + Σ_{i=1}^n Σ_{j=i}^n Σ_{k=i}^j 2   (addition rule)
= 2 + 2 Σ_{i=1}^n 1 + 5 Σ_{i=1}^n Σ_{j=i}^n 1 + 2 Σ_{i=1}^n Σ_{j=i}^n Σ_{k=i}^j 1   (constant rule)
In the last expression we used the constant rule 3 times to pull the constants outside (to the
left) in all summations.
(b) Σ_{k=i}^j 1 = j + 1 − i is one of the common sums.
(c) Σ_{j=i}^n (j + 1 − i) = 1 + 2 + · · · + (n + 1 − i) = Σ_{ℓ=1}^{n+1−i} ℓ = (1/2)(n + 1 − i)(n + 2 − i).
(In the last expression, we used a common sum.)
(d) To compute Σ_{i=1}^n (n + 1 − i)(n + 2 − i), we observe that as i goes from 1 up to n, n + 1 − i goes
from n down to 1. Letting ℓ = n + 1 − i, our sum can equivalently be written
Σ_{ℓ=1}^n ℓ(ℓ + 1) = Σ_{ℓ=1}^n ℓ^2 + Σ_{ℓ=1}^n ℓ.
(We used the addition rule to get the last expression.)
(e) We use the nested sum rule to compute Σ_{i=1}^n Σ_{j=i}^n Σ_{k=i}^j 1:
Σ_{i=1}^n Σ_{j=i}^n Σ_{k=i}^j 1 = Σ_{i=1}^n Σ_{j=i}^n (j + 1 − i)   (nested sum rule and (b))
= Σ_{i=1}^n (1/2)(n + 1 − i)(n + 2 − i)   (nested sum rule and (c))
= (1/2) Σ_{ℓ=1}^n ℓ^2 + (1/2) Σ_{ℓ=1}^n ℓ   (using (d))
= (1/12) n(n + 1)(2n + 1) + (1/4) n(n + 1)   (common sums)
Whenever you compute a huge sum like this it is advisable to tinker and see if your formula is
right for small values of n. You can write a function in your favorite programming language to
compute the sum and test it against your formula. We did exactly that to verify our formula.
n                                        1    2    3    4    5    6    7
Σ_{i=1}^n Σ_{j=i}^n Σ_{k=i}^j 1          1    4   10   20   35   56   84
(1/12)n(n+1)(2n+1) + (1/4)n(n+1)         1    4   10   20   35   56   84
Pop Quiz 9.5. Using the formulas in the table on page 115,
T_1 ∈ Θ(n^3); T_2 ∈ Θ(n^2); T_3 ∈ Θ(n log n); T_4 ∈ Θ(n).
Therefore,
T_1 is in Ω(n log n), ω(n log n), Ω(n^2), ω(n^2), O(n^3), Θ(n^3), Ω(n^3).


T_2 is in Ω(n log n), ω(n log n), O(n^2), Θ(n^2), Ω(n^2), O(n^3), o(n^3).
T_3 is in O(n log n), Θ(n log n), Ω(n log n), O(n^2), o(n^2), O(n^3), o(n^3).
T_4 is in O(n log n), o(n log n), O(n^2), o(n^2), O(n^3), o(n^3).
Exercise 9.6.
(a) f + f = 2f ∈ Θ(f ) because 2 is a constant. Similarly f + f + f = 3f ∈ Θ(f ). With n terms,
f + f + · · · + f = nf ∈ Θ(nf ) (you cannot ignore the n because it is not a constant).
(b) lim(c · f )/f → c = constant so c · f ∈ Θ(f ).
(c) Follows from the calculus fact that for any ε, k > 0, lim_{n→∞} log^k n / n^ε = 0.
(d) Follows from (n^k log n)/n^ε = n^{k−ε} log n → 0 (for ε > k) and n^k/2^{εn} = 2^{k log_2 n − εn} → 0.
(e) Follows from log n^k = k log n, so log n^k / log n → k = constant.
(f) Follows from (1 + √n)/n = 1/n + 1/√n → 0.
(g) Follows from (1/n + 5/n^2)/(1/n) = 1 + 5/n → 1 = constant.
(h) We must prove upper and lower bounds. The upper bound follows from:
log n! = log n + log(n − 1) + · · · + log 1 ≤ log n + log n + · · · + log n = n log n.
For the lower bound, observe that log(2n)! = log (2n)(2n − 1)(2n − 2)(2n − 3) · · · 2 · 1, hence
log(2n)! ≤ log (2n)^2 (2(n − 1))^2 · · · 2^2 = 2 log n! + 2n log 2.
(We get this bound by grouping in pairs, for example 2n(2n − 1) ≤ (2n)^2.) Also,
log(2n)! = log(2n) + log(2n − 1) + · · · + log(n + 1) + log n!
≥ log(2n) + log(2n − 1) + · · · + log(n + 1) ≥ n log n.
Combining the two bounds, 2 log n! + 2n log 2 ≥ n log n, or,
log n! ≥ (1/2) n log n − n log 2 = (1/4) n log n + (1/4) n(log n − 4 log 2).
We conclude that log n! ≥ (1/4) n log n: this follows from the inequality above for n ≥ 2^4 = 16
(because then log n − 4 log 2 ≥ 0), and it can be verified for n = 1, . . . , 15 explicitly.
(i) f = a_k n^k + g where g has only lower order terms, at most k such terms. Let the largest
coefficient in g be A. Then |g| ≤ k|A|n^{k−1}. We have that f/n^k = a_k + g/n^k and |g/n^k| ≤
|A|k n^{k−1}/n^k = |A|k/n. Since |A| and k are constants, |g/n^k| → 0 and we have that f/n^k →
a_k = constant. This proves f ∈ Θ(n^k).
(j) Yes, f is polynomial. The highest power "appearing" is n, which has order 1. From the
previous problem you might think f ∈ Θ(n), but that is wrong. The notation is deceiving
because there are many terms and the term of order n does not appear just once as in a
traditional polynomial. For example n/2 appears somewhere in the middle, which is also of
order 1. In fact, there are n/2 terms that are at least n/2, so f ≥ n^2/4; indeed, we know that
f(n) = (1/2) n(n + 1) ∈ Θ(n^2).
To make part (i) more precise, we should emphasize that in a polynomial, each term of a
particular order appears at most once.
(k) Suppose n^2 ∈ O(n), i.e. n^2 ≤ Cn for a constant C (by taking ⌈C⌉, we may assume C is an
integer). Let n = 2C; then, 4C^2 ≤ 2C^2 or 4 ≤ 2, a contradiction. Therefore n^2 ∉ O(n).
(l) Since f ∈ Θ(r) and g ∈ Θ(s), there are positive constants c, C and d, D for which
c·r ≤f ≤C ·r and d · s ≤ g ≤ D · s.
(i) Adding the left hand sides and similarly the right hand sides gives
cr + ds ≤ f + g ≤ Cr + Ds.
Since cr + ds ≥ min(c, d)(r + s) and Cr + Ds ≤ max(C, D)(r + s), we have that
min(c, d)(r + s) ≤ f + g ≤ max(C, D)(r + s),
or that f + g ∈ Θ(r + s).
(ii) Instead of adding, if we multiply, we get cd · (rs) ≤ f g ≤ CD · (rs), or that f g ∈ Θ(rs).
(m) (i) No. Consider f = 2n and g = n, then f ∈ Θ(g). But 2^f/2^g = 2^{2n}/2^n = 2^n → ∞.
(ii) Yes. We have that c · g ≤ f ≤ C · g. Everything is positive and log is increasing so take
log of both sides to get log g + log c ≤ log f ≤ log g + log C. That is log f ∈ Θ(log g).
O-notation blurs small differences (constants). Exponentiation blows up those differences, so
one must be careful. Logarithms further reduce differences and so are safe with O-notation.


(n) (a) f ∈ Θ(g) → f ∈ O(g) is t because Θ(·) requires upper and lower bounds but O(·) requires
only the upper bound: Θ(g) ⊂ O(g). (b) The converse, f ∈ O(g) 6→ f ∈ Θ(g) is f. For a
counter-example, consider f = n and g = n^2. (c) Yes: c·g ≤ f ≤ C·g implies (1/C)·f ≤ g ≤ (1/c)·f.
(o) Suppose f ∈ O(n); then f ≤ Cn ≤ Cn^2. That is f ∈ O(n^2), which means O(n) ⊂ O(n^2). It
is a proper subset because n^2 ∉ O(n) but n^2 ∈ O(n^2).
Θ(n) ⊄ Θ(n^2) because n ∈ Θ(n) but n ∉ Θ(n^2).
(p) We can use the definitions based on the limits or the more formal definitions based on bounds.
(i) f /h = (f /g) · (g/h); since both terms on the RHS converge to a constant because
f ∈ Θ(g) and g ∈ Θ(h), f /h → constant, i.e. f ∈ Θ(h).
(ii) f /h = (f /g) · (g/h) → 0 (both terms on the RHS converge to 0), i.e. f ∈ o(h).
(iii) f ≤ C · g and g ≤ C ′ · h implies f ≤ C · (C ′ · h) = CC ′ · h, i.e. f ∈ O(h).
(iv) f /h = (f /g) · (g/h) → ∞ (both terms on the RHS converge to ∞), i.e. f ∈ ω(h).
(v) f ≥ C · g and g ≥ C ′ · h implies f ≥ C · (C ′ · h) = CC ′ · h, i.e. f ∈ Ω(h).
(q) We need a basic identity for any pair of positive numbers x, y: max(x, y) ≤ x+y ≤ 2 max(x, y).
Suppose r ∈ O(f + g). Then,
r ≤ C(f + g) ≤ 2C max(f, g),
i.e., r ∈ O(max(f, g)) and O(f + g) ⊆ O(max(f, g)). Suppose r ∈ O(max(f, g)). Then,
r ≤ C max(f, g) ≤ C(f + g),
i.e., r ∈ O(f + g) and O(max(f, g)) ⊆ O(f + g). O(f + g) ⊆ O(max(f, g)) and O(max(f, g)) ⊆
O(f + g) implies O(f + g) = O(max(f, g)).
Similarly, suppose r ∈ Θ(f + g). Then,
c max(f, g) ≤ c(f + g) ≤ r ≤ C(f + g) ≤ 2C max(f, g),
i.e., r ∈ Θ(max(f, g)) and Θ(f + g) ⊆ Θ(max(f, g)). Suppose r ∈ Θ(max(f, g)). Then,
(1/2) c(f + g) ≤ c max(f, g) ≤ r ≤ C max(f, g) ≤ C(f + g),
i.e., r ∈ Θ(f + g) and Θ(max(f, g)) ⊆ Θ(f + g). Θ(f + g) ⊆ Θ(max(f, g)) and Θ(max(f, g)) ⊆
Θ(f + g) implies Θ(f + g) = Θ(max(f, g)). P P P
(r) We have that c · g ≤ f ≤ C · g. Summing: c · Σ_i g(i) ≤ Σ_i f(i) ≤ C · Σ_i g(i), which means
Σ_i f(i) ∈ Θ(Σ_i g(i)).
(s) We prefer T1 because its running time is asymptotically faster.
(t) We don’t know because T2 could be n or n3 (both are in O(n3 )). For the former, we prefer T2
and for the latter we prefer T1 . Whenever possible, you should give runtimes using Θ-notation,
because O(·) is more ambiguous. (Just as if T1 = 10 and T2 ≤ 20, which is better?)
(u) Similar to (t). T_2 could be n or n^{2.5}. (Just as if T_1 = 10 and T_2 < 20, which is better?)
(v) T_1 is asymptotically better than T_2 (T_1 is "equal to" n^2 versus T_2 is "greater than" n^2) so we
definitely prefer T_1.
(w) T_1 is asymptotically no worse than T_2 (T_1 is "equal to" n^2 versus T_2 is "at least" n^2) so we
prefer T_1, though T_2 could be as good.
(x) We don’t know because T2 could be n or n3 , both are in Ω(n). Like O(·), Ω(·) is ambiguous.
Whenever possible, try to give a Θ-analysis.
(y) T_2 is asymptotically better. Theoretically, T_2 is a better run time, but see (z).
(z) Asymptotically T_2 is better. This means for n → ∞, T_2 < T_1. But, T_2 does not become better
until n > 10^{800}. That is a large input, unlikely to ever be seen in practice – most estimates
put the number of atoms in the Universe at fewer than 10^{100}.
Exercise 9.7. (Figure: the ratio of the sum Σ_{i=1}^n i^4 to the integral approximation (1/5) n^5,
plotted for n from 0 to 1000; the plot shows the ratio approaching 1.)
Exercise 9.8. In all cases, let S(n) denote the sum.


(a) Since (1 + i)^2 is increasing, ∫_0^n dx (1 + x)^2 ≤ S(n) ≤ ∫_1^{n+1} dx (1 + x)^2. Computing the
integrals,
(1/3)((n + 1)^3 − 1) ≤ S(n) ≤ (1/3)((n + 2)^3 − 8).
Since the lower and upper bounds are in Θ(n^3), S(n) ∈ Θ(n^3).
(b) Since 2^i is increasing, ∫_0^n dx 2^x ≤ S(n) ≤ ∫_1^{n+1} dx 2^x. Computing the integrals,
(1/ln 2)(2^n − 1) ≤ S(n) ≤ (1/ln 2)(2^{n+1} − 2).
Since the lower and upper bounds are in Θ(2^n), S(n) ∈ Θ(2^n).
(c) Since i2^i is increasing, ∫_0^n dx x2^x ≤ S(n) ≤ ∫_1^{n+1} dx x2^x. Computing the integrals,
(1/ln 2) n2^n − (1/ln^2 2) 2^n + 1/ln^2 2 ≤ S(n) ≤ (1/ln 2)(n + 1)2^{n+1} − (1/ln^2 2) 2^{n+1} − 2/ln 2 + 2/ln^2 2.
Since the lower and upper bounds are in Θ(n2^n), S(n) ∈ Θ(n2^n).
(d) Since (1 + i^2)^{−1} is decreasing, ∫_1^{n+1} dx (1 + x^2)^{−1} ≤ S(n) ≤ ∫_0^n dx (1 + x^2)^{−1}. Computing
the integrals,
arctan(n + 1) − π/4 ≤ S(n) ≤ arctan(n).
Since the lower and upper bounds are in Θ(1), S(n) ∈ Θ(1).
(e) Since i/(1 + i^2) is decreasing, ∫_1^{n+1} dx x/(1 + x^2) ≤ S(n) ≤ ∫_0^n dx x/(1 + x^2). Computing the
integrals,
(1/2) ln((1 + (1 + n)^2)/2) ≤ S(n) ≤ (1/2) ln(1 + n^2).
Since the lower and upper bounds are in Θ(log n), S(n) ∈ Θ(log n).
(f) Since i2^{i^2} is increasing, ∫_0^n dx x2^{x^2} ≤ S(n) ≤ ∫_1^{n+1} dx x2^{x^2}. Computing the integrals,
(1/(2 ln 2))(2^{n^2} − 1) ≤ S(n) ≤ (1/(2 ln 2))(2^{(n+1)^2} − 2).
This time, the lower bound is in Θ(2^{n^2}) and the upper bound is in Θ(2^{n^2+2n}). The lower
and upper bounds do not have the same asymptotic behavior, 2^{n^2} ∈ o(2^{n^2+2n}). Therefore we
cannot immediately extract the Θ-behavior for S(n).
The bounds provided by the integration method are too loose. We can get tighter bounds
with a simpler analysis. The largest term in the sum is n2^{n^2} and there are n terms, so
n2^{n^2} ≤ S(n) ≤ n^2 2^{n^2}.
The lower bound is asymptotically tight. By the result above, S(n − 1) ≤ (n − 1)^2 2^{(n−1)^2} and
S(n) = S(n − 1) + n2^{n^2}, therefore
S(n) ≤ n2^{n^2} + (n − 1)^2 2^{(n−1)^2} = n2^{n^2} (1 + ((n − 1)^2/n) 2^{1−2n}) ≤ n2^{n^2} (1 + 2n·2^{−2n}) ≤ (3/2) n2^{n^2}
(because x2^{−x} ≤ 1/2 for x ≥ 2). Therefore, S(n) ∈ Θ(n2^{n^2}), because n2^{n^2} ≤ S(n) ≤ (3/2) n2^{n^2}.
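The integral sandwich is easy to confirm numerically; here for the sum in part (b), Σ_{i=1}^n 2^i (a sketch):

```python
from math import log

def S(n):
    # The sum in part (b): sum_{i=1}^n 2^i.
    return sum(2 ** i for i in range(1, n + 1))

for n in range(1, 30):
    lower = (2 ** n - 1) / log(2)          # integral of 2^x from 0 to n
    upper = (2 ** (n + 1) - 2) / log(2)    # integral of 2^x from 1 to n+1
    assert lower <= S(n) <= upper
```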
Exercise 9.9.
(a) This is just the sum written out.
(b) Multiply the expression in (a) by 2 on both sides.
(c) Subtract (a) from (b): 2S(n) − S(n) = −2^1 − 2^2 − 2^3 − · · · − 2^n + n2^{n+1} = n2^{n+1} − Σ_{i=1}^n 2^i.
(d) Use (c) with Σ_{i=1}^n 2^i = 2(2^n − 1): S(n) = n2^{n+1} − 2(2^n − 1) = (n − 1)2^{n+1} + 2.
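Checking the closed form from (d) numerically:

```python
def S(n):
    # S(n) = sum_{i=1}^n i * 2^i; Exercise 9.9(d) claims S(n) = (n-1) 2^(n+1) + 2.
    return sum(i * 2 ** i for i in range(1, n + 1))

for n in range(1, 30):
    assert S(n) == (n - 1) * 2 ** (n + 1) + 2
```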
Exercise 9.10. The red areas in the figure are larger than the green ones because the slope of
ln x is decreasing, (ln x)′′ = −1/x2 < 0. Computing the integral
replaces each red region in the rectangle with the corresponding
green region, therefore we get a lower bound:
∫_{3/2}^{n+1/2} dx ln x ≤ Σ_{i=2}^n ln i = ln n!.
Evaluating the integral on the left, we get
(n + 1/2) ln(n + 1/2) − (n + 1/2) − (3/2) ln(3/2) + 3/2 ≤ ln n!.
Exponentiate both sides to get
n! ≥ (n + 1/2)^{n+1/2} e^{−n} e^{−1/2} (2e/3)^{3/2}.
Lastly, using the approximation (1 + 1/x)^x ≈ e,
(n + 1/2)^{n+1/2} = n^{n+1/2} (1 + 1/(2n))^{n+1/2} = n^{n+1/2} [(1 + 1/(2n))^{2n}]^{(n+1/2)/(2n)} ≈ n^{n+1/2} e^{1/2 + 1/(4n)} = n^{n+1/2} √e · e^{1/(4n)}.
Combining (the √e cancels the e^{−1/2}), and since e^{1/(4n)} = 1 + Θ(1/4n), we get the desired
approximation n! ≈ n^n e^{−n} √n (2e/3)^{3/2}.
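To see how good the lower-bound approximation n! ≈ n^n e^{−n} √n (2e/3)^{3/2} is, compare it with n! directly (the remark that the ratio settles near (2e/3)^{3/2}/√(2π) ≈ 0.97 is our own, consistent with Stirling's formula):

```python
from math import e, factorial, sqrt

def approx(n):
    # The approximation derived above: n^n e^-n sqrt(n) (2e/3)^(3/2).
    return n ** n * e ** (-n) * sqrt(n) * (2 * e / 3) ** 1.5

for n in range(5, 40):
    ratio = approx(n) / factorial(n)
    assert 0.9 < ratio < 1.0    # a slight under-estimate of n!
```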


Chapter 10
Pop Quiz 10.1. 27 = 3×7+6 (r = 6). By setting q = 3, 2, 1, 0, −1, . . . we get r = 6, 13, 20, 27, . . ..
The smallest positive remainder is 6.
Exercise 10.2. The proofs all use Definition 10.2.
(a) 0 = 0 · d (q = 0), so d|0.
(b) Suppose d|m and d′ |n, so m = qd and n = q ′ d′ . Then mn = (qq ′ )dd′ . That is dd′ |mn
(quotient = qq ′ ).
(c) Suppose d|m and m|n, so m = qd and n = q ′ m. Then, n = q ′ qd so d|n (quotient = q ′ q).
(d) Suppose d|n and d|m, so n = qd and m = q ′ d. Then n + m = (q + q ′ )d. That is d|n + m
(quotient = q + q ′ ).
(e) Suppose d|n, so n = qd. For x ∈ N, xn = qxd, so xd|xn (quotient = q).
(f) Suppose d|m + n and d|m, so m + n = qd and m = q′d. Then, n = (m + n) − m = qd − q′d = (q − q′)d.
That is d|n (quotient = q − q′).
Exercise 10.3.
(a) Let P (n) = n is divisible by a prime. P (2) is t because 2 is a prime. Suppose P (2), . . . , P (n)
are all t. We show that P (n + 1) is t. If n + 1 is prime, then n + 1 is divisible by the
prime n + 1. Otherwise, n + 1 is composite: n + 1 = kℓ, where 2 ≤ ℓ ≤ n. By the induction
hypothesis, ℓ is divisible by a prime, so ℓ = qp where p is prime. Therefore n + 1 = kqp which
shows that n + 1 is also divisible by the prime p. By induction, P (n) is t for all n ≥ 2.
(b) Suppose there are finitely many primes. Then, there is a largest prime p. Consider p! + 1,
which has a remainder of 1 when divided by 2, 3, . . . , p. By part (a), p!+1 must be divisible by
a prime. This prime must therefore be larger than p contradicting p being the largest prime.
Therefore there are infinitely many primes.
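The argument in (b) can be seen concretely: every prime factor of p! + 1 exceeds p, since p! + 1 has remainder 1 when divided by 2, 3, . . . , p. A small sketch (the helper function is ours):

```python
from math import factorial

def smallest_prime_factor(n):
    # Trial division; by Exercise 10.3(a) the value returned is a prime.
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n

for p in [2, 3, 5, 7]:
    assert smallest_prime_factor(factorial(p) + 1) > p
# For example, 7! + 1 = 5041 = 71 * 71, and 71 > 7.
```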
Pop Quiz 10.4. gcd(n, 0) = n because n|0 and n|n.
gcd(0, 0) is not defined (both integers cannot be zero).
gcd(n, n) = n because n|n.
gcd(n, 1) = 1 because the largest divisor of 1 is 1.
gcd(n, p) = 1 if p does not divide n, and gcd(n, p) = p if p divides n. (Because the only
divisors of p are 1 and p.)
Exercise 10.5.
gcd(34, 55)
= gcd(21, 34)   21 = 55 − 34
= gcd(13, 21)   13 = 34 − 21 = 34 − (55 − 34) = 2 · 34 − 55
= gcd(8, 13)    8 = 21 − 13 = (55 − 34) − (2 · 34 − 55) = 2 · 55 − 3 · 34
= gcd(5, 8)     5 = 13 − 8 = (2 · 34 − 55) − (2 · 55 − 3 · 34) = 5 · 34 − 3 · 55
= gcd(3, 5)     3 = 8 − 5 = (2 · 55 − 3 · 34) − (5 · 34 − 3 · 55) = 5 · 55 − 8 · 34
= gcd(2, 3)     2 = 5 − 3 = (5 · 34 − 3 · 55) − (5 · 55 − 8 · 34) = 13 · 34 − 8 · 55
= gcd(1, 2)     1 = 3 − 2 = (5 · 55 − 8 · 34) − (13 · 34 − 8 · 55) = 13 · 55 − 21 · 34
= gcd(0, 1)
= 1
On the right we show how to write each remainder in the GCD algorithm as a linear combination
of the original two numbers. The final result is
gcd(34, 55) = 1 = 13 × 55 − 21 × 34.
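The right-hand column is exactly the extended Euclidean algorithm, which returns the Bezout coefficients along with the gcd. A compact sketch:

```python
def ext_gcd(a, b):
    """Return (g, x, y) with g = gcd(a, b) and g = a*x + b*y."""
    if b == 0:
        return a, 1, 0
    g, x, y = ext_gcd(b, a % b)       # g = b*x + (a mod b)*y
    return g, y, x - (a // b) * y     # rewrite in terms of a and b

g, x, y = ext_gcd(34, 55)
assert (g, x, y) == (1, -21, 13)      # gcd(34, 55) = 1 = 13*55 - 21*34
```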

Exercise 10.6. x = 1, y = 1 gives mx + ny = 21; x = −2, y = 1 gives mx + ny = 3. You cannot
get a positive linear combination that is smaller than 3, since gcd(6, 15) = 3.
Exercise 10.7.
(a) Suppose d|mn. By Bezout’s identity, there are x, y for which gcd(m, d) = mx + dy. Multiply
both sides by n to get
gcd(m, d) · n = xmn + ynd.
d divides the second term on the right (it is a multiple of d). d also divides the first term on
the right because d|mn, so mn = αd. Therefore, the RHS is (xα + yn)d, that is d divides the
sum on the RHS. Therefore d must divide the LHS which is what was to be shown.


(b) We are given that gcd(d, d′ ) = 1 = dx + d′ y (Bezout’s identity). Multiply both sides of
Bezout’s identity by n to get
n = xdn + yd′ n.
′ ′ ′
Since d|n, n = αd; since d |n, n = α d . Rewriting the equation above,
n = xα′ dd′ + yαdd′ = (xα′ + yα)dd′ ,
which means dd′ |n as was to be shown.
(c) Let D = gcd(m, ℓ) and D ′ = gcd(n, ℓ). By Bezout’s identity, D = mx + ℓy and D ′ = nx′ + ℓy ′ .
Multiplying these two equations,
DD′ = (mx + ℓy)(nx′ + ℓy ′ ) = mn(xx′ ) + ℓ(ynx′ + mxy ′ + ℓyy ′ ). (∗)

Since DD′ > 0, the RHS is a positive linear combination of mn and ℓ. The smallest positive
linear combination of mn and ℓ is gcd(mn, ℓ), so gcd(mn, ℓ) ≤ DD ′ .
To show the reverse inequality, that DD ′ ≤ gcd(mn, ℓ), it suffices to show that DD ′ is a
common divisor of mn and ℓ because then it must be at most the greatest common divisor.
By Exercise 10.2(b), DD′|mn. We show that DD′|ℓ. To do so, we will use part (b) of this
exercise. For part (b) to apply, we must show that gcd(D, D′) = 1. Note that gcd(D, D′)|m
because gcd(D, D ′ )|D and D|m; similarly, gcd(D, D ′ )|n. Therefore, gcd(D, D ′ ) is a common
divisor of m and n, which implies
gcd(D, D′ ) ≤ gcd(m, n) = 1.
Thus, gcd(D, D′) = 1. By part (b), since D|ℓ and D′|ℓ (why?), it follows that DD′|ℓ. Therefore

DD′ is a common divisor of mn and ℓ and hence DD ′ ≤ gcd(mn, ℓ).


Since gcd(mn, ℓ) ≤ DD ′ and DD ′ ≤ gcd(mn, ℓ), it follows that gcd(mn, ℓ) = DD ′ .
[Note: It is essential that gcd(m, n) = 1 (consider m = n = ℓ = 5).]
(d) Using the same notation as in (c), we are to show that gcd(mn, ℓ) = 1 if and only if D = 1 and
D′ = 1.
First, suppose D = 1 and D′ = 1. In (c), (∗) showed that gcd(mn, ℓ) ≤ DD′ = 1, which
proves gcd(mn, ℓ) = 1.
Now, suppose gcd(mn, ℓ) = 1. Any divisor of m and ℓ is also a divisor of mn and ℓ so
gcd(m, ℓ) ≤ gcd(mn, ℓ) = 1. Similarly, gcd(n, ℓ) ≤ gcd(mn, ℓ) = 1. We conclude that
gcd(m, ℓ) = gcd(n, ℓ) = 1.
(e) Let D = gcd(gcd(ℓ, m), n) and D′ = gcd(ℓ, gcd(m, n)). Since D| gcd(ℓ, m), D|ℓ and D|m, and
also D|n. Since D|m and D|n, by GCD fact (ii) on page 129, D| gcd(m, n). Therefore D is a
common divisor of ℓ and gcd(m, n). This means D ≤ D′.
A similar argument proves reversed inequality D ′ ≤ D: D′ divides ℓ, m and n; this means
D′ | gcd(ℓ, m) and hence D ′ is a common divisor of gcd(ℓ, m) and n. It follows that D ′ ≤ D.
Therefore, D = D ′ .
Exercise 10.8. We prove the claim by induction on n. The base case is n = 2: if p|q_1 q_2, then,
by Euclid's Lemma, p = q_1 or p = q_2. For the induction step, suppose that for any n primes, if
p|q_1 · · · q_n then p equals one of the q_i. Consider any n + 1 primes and suppose that p|q_1 · · · q_n q_{n+1}. That
is p|(q1 · · · qn )qn+1 . By Euclid’s Lemma, either p|qn+1 or p|q1 · · · qn . In the former case, because
qn+1 is prime, p = qn+1 ; in the latter case, by the induction hypothesis p equals one of the q i . In
either case p equals one of the n + 1 primes q1 , . . . , qn+1 , proving the claim for n + 1. The claim
now follows by induction for all n ≥ 2.
Pop Quiz 10.9. This is just the Fundamental Theorem of Arithmetic in disguise. By the
fundamental theorem of arithmetic, every n ≥ 2 is a product of primes, n = p_1 p_2 · · · p_k: a_1 is the
number of times 2 appears in this product; a2 is the number of times 3 appears in this product;
and so on. The ai must be unique for every n because if it is not unique for some n, then that n
is a product of primes in two different ways, which cannot be.
Exercise 10.10. We must show gcd(kM_1, kM_2) = k. The only divisors of kM_1 are 1, k, M_1 and
kM_1, since k and M_1 are different primes. Similarly, the only divisors of kM_2 are 1, k, M_2 and
kM_2. The largest divisor common to these two sets is k.
Exercise 10.11. We have: a ≡ b (mod d) and r ≡ s (mod d). That is,
a − b = k1 d and r − s = k2 d.


(a) ar − bs = (b + k_1 d)(s + k_2 d) − bs = (k_1 s + k_2 b + k_1 k_2 d)d.
So, d|ar − bs, i.e. ar ≡ bs (mod d).
(b) (a + r) − (b + s) = b + k_1 d + s + k_2 d − b − s = (k_1 + k_2)d.
So, d|(a + r) − (b + s), i.e. a + r ≡ b + s (mod d).
(c) The proof is by induction, using (a). When n = 1, we are given that a ≡ b (mod d). Suppose
a^n ≡ b^n (mod d). Then applying (a) with r = a^n and s = b^n, we get a^{n+1} ≡ b^{n+1} (mod d).
Therefore, by induction, a^n ≡ b^n (mod d) for n ≥ 1.
Pop Quiz 10.12. We observe that 5^2 ≡ 1 (mod 3). Therefore by Exercise 10.11(c), 5^{2014} =
(5^2)^{1007} ≡ 1^{1007} = 1 (mod 3). Since 5 ≡ 2 (mod 3), using Exercise 10.11(a), 5^{2015} =
5^{2014} · 5 ≡ 1 · 2 ≡ 2 (mod 3). The remainder is 2.
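This reduction is what Python's three-argument pow computes directly:

```python
# 5^2 = 25 ≡ 1 (mod 3), so even powers of 5 are ≡ 1 and odd powers are ≡ 2.
assert pow(5, 2, 3) == 1
assert pow(5, 2014, 3) == 1
assert pow(5, 2015, 3) == 2
```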
Exercise 10.13. We use a proof by contradiction. Suppose 15 does have a multiplicative inverse
k. Then 15k ≡ 1 (mod 6), that is 15k − 1 = 6α, or 15k − 6α = 1: 3 divides the LHS but not
the RHS, a contradiction. So, 15 does not have a multiplicative inverse.
Exercise 10.14.
(a) (i) If k = 0 then 0^p = 0 and 0 − 0 = 0 is divisible by p. If k = p then p^p − p = p(p^(p−1) − 1)
is a multiple of p and so is divisible by p.
(ii) If i ∈ {1, . . . , p − 1}, then gcd(p, i) = 1. If p|ik then by Exercise 10.7(a), p|k, that is k is
a multiple of p, a contradiction. So, p does not divide ik and ik is not a multiple of p.
(iii) Immediate from Theorem 10.10 on page 133, because gcd(k, p) = 1.
(iv) Since ik is not a multiple of p, by part (ii), αi ≠ 0. Thus, αi ∈ {1, 2, . . . , p − 1}.
Suppose i, j ∈ {1, . . . , p − 1}. We show, by contradiction, that if i ≠ j then αi ≠ αj .
Suppose αi = αj . Then ik = qi p + αi and jk = qj p + αi . Subtracting, ik − jk = (qi − qj )p.
That is, ik ≡ jk (mod p). By (iii), i ≡ j (mod p). Since i, j ∈ {1, 2, . . . , p − 1}, this
means i = j, a contradiction. Thus, αi ≠ αj . Since no two αi are equal, α1 , α2 , . . . , αp−1
is a permutation of 1, 2, . . . , p − 1, which means α1 α2 · · · αp−1 = (p − 1)!.
(v) Since ik ≡ αi (mod p), by repeated use of Exercise 10.11,
(1 · k)(2 · k) · · · ((p − 1) · k) ≡ α1 α2 · · · αp−1 (mod p).
Now (1 · k)(2 · k) · · · ((p − 1) · k) = k^(p−1) (p − 1)! and, by (iv), α1 α2 · · · αp−1 = (p − 1)!, therefore
k^(p−1) (p − 1)! ≡ (p − 1)! (mod p). (∗)
If p divides (p − 1)!, by Lemma 10.8 on page 131 (Euclid’s Lemma), p divides one of the
terms in the product. Since every term in the product is less than p, that is not possible,
so p does not divide (p − 1)!. The only other divisor of p is 1, so gcd(p, (p − 1)!) = 1. By
Theorem 10.10 on page 133, we can cancel the (p − 1)! from both sides of (∗) to get
k^(p−1) ≡ 1 (mod p).
Multiplying both sides by k, since k ≡ k (mod p), gives Fermat’s Little Theorem.
(b) When k is a multiple of p, there is no multiplicative inverse since gcd(k, p) = p > 1. When k
is not a multiple of p, we just proved above that k^(p−1) ≡ 1 (mod p). Let k^(−1) ≡ k^(p−2) (mod p).
Then k · k^(−1) ≡ k^(p−1) ≡ 1 (mod p). That is, k^(p−2) is the multiplicative inverse of k.
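This formula for the inverse is easy to check numerically; a minimal sketch (the function name is ours, and Python's three-argument pow performs the fast modular exponentiation):

```python
def mod_inverse_fermat(k, p):
    """Multiplicative inverse of k modulo a prime p, computed as k^(p-2) mod p.

    Valid because k^(p-1) = 1 (mod p) when p is prime and p does not divide k."""
    if k % p == 0:
        raise ValueError("k is a multiple of p: no inverse exists")
    return pow(k, p - 2, p)  # built-in fast modular exponentiation

p = 19
# Every k in 1..p-1 has an inverse, and k * k^-1 = 1 (mod p).
assert all(k * mod_inverse_fermat(k, p) % p == 1 for k in range(1, p))
print(mod_inverse_fermat(8, p))  # 12, agreeing with part (c)
```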
(c) (i) We find x, y such that 8x + 19y = 1 using the remainders in Euclid’s GCD-algorithm:
gcd(8, 19) = gcd(3, 8)        rem(19, 8) = 3 = −8 · 2 + 19
           = gcd(2, 3)        rem(8, 3) = 2 = 8 − 3 · 2
                                           = 8 − (−8 · 2 + 19) · 2
                                           = 8 · 5 − 19 · 2
           = gcd(1, 2) = 1    rem(3, 2) = 1 = 3 − 2
                                           = (−8 · 2 + 19) − (8 · 5 − 19 · 2)
                                           = 8 · (−7) + 19 · 3.
Therefore x = −7 and 8−1 = rem(−7, 19) = 12 (for modulus 19). You can verify that
8 × 12 ≡ 1 (mod 19) because 8 × 12 − 1 = 19 × 5.
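The back-substitution above is the extended Euclidean algorithm; a recursive sketch (our own helper, not from the text):

```python
def extended_gcd(a, b):
    """Return (g, x, y) with a*x + b*y = g = gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = extended_gcd(b, a % b)
    # gcd(b, a % b) = b*x + (a - (a//b)*b)*y, so regroup the coefficients of a and b:
    return g, y, x - (a // b) * y

g, x, y = extended_gcd(8, 19)
print(g, x, y)   # 1 -7 3: indeed 8*(-7) + 19*3 = 1
print(x % 19)    # 12 = rem(-7, 19), the inverse of 8 modulo 19
```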
(ii) 8^(−1) ≡ 8^17 (mod 19). We observe that 8^3 ≡ −1 (mod 19). Therefore,
8^15 ≡ (8^3)^5 ≡ (−1)^5 ≡ −1 (mod 19).
Finally, 8^17 ≡ 8^15 · 8^2 ≡ −64 ≡ 12 (mod 19), so 8^(−1) = 12 (for modulus 19).
Exercise 10.15. M^(p−2) M∗ ≡ M^(p−1) k (mod p). Assuming M is not a multiple of p, by Fermat's
Little Theorem, M^(p−1) ≡ 1 (mod p). Multiplying both sides by k gives M^(p−1) k ≡ k (mod p), that
is,
M^(p−2) M∗ ≡ M^(p−1) k ≡ k (mod p).
So Charlie obtains k by computing rem(M^(p−2) M∗ , p).
Exercise 10.16. Alice encrypts to M∗ ≡ M^225 (mod 391) and Bob decrypts with M∗^97 (mod 391).
For example, with M = 2, we have
2^7 ≡ 128 → 2^14 ≡ 128^2 ≡ 353 → 2^28 ≡ 353^2 ≡ 271 → 2^56 ≡ 271^2 ≡ 324 → 2^112 ≡ 324^2 ≡ 188,
and finally we have 2^225 ≡ 2 · 188^2 ≡ 308 (mod 391). Bob decrypts as follows:
308^3 ≡ 246 → 308^6 ≡ 246^2 ≡ 302 → 308^12 ≡ 302^2 ≡ 101 → 308^24 ≡ 101^2 ≡ 35 → 308^48 ≡ 35^2 ≡ 52,
and finally we have 308^97 ≡ 308 · 52^2 ≡ 2 (mod 391). Here is the table of results for M = 2, . . . , 10,
M 2 3 4 5 6 7 8 9 10
M∗ 308 105 242 158 278 109 246 77 180
Bob’s decryption 2 3 4 5 6 7 8 9 10
(Bob always recovers M .)
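The table can be reproduced with Python's modular exponentiation. Here n = 391 = 17 · 23 and 225 · 97 ≡ 1 (mod 16 · 22 = 352), which is why decryption recovers M; the function names are ours:

```python
n, e, d = 391, 225, 97           # n = 17*23; e*d = 21825 = 1 (mod 352)

def encrypt(M):
    return pow(M, e, n)          # M* = M^225 mod 391

def decrypt(C):
    return pow(C, d, n)          # M*^97 mod 391

assert (e * d) % ((17 - 1) * (23 - 1)) == 1
print([encrypt(M) for M in range(2, 11)])
# [308, 105, 242, 158, 278, 109, 246, 77, 180] -- the table above
print(all(decrypt(encrypt(M)) == M for M in range(2, 11)))  # True
```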
Exercise 10.17. Let n = pq; M∗ ≡ M^e (mod n). We decode using M∗^d ≡ M^(ed) (mod n), and
must show M^(ed) ≡ M (mod n) (i.e. we recover the correct message for any M ).
(a) Since ed ≡ 1 (mod (p − 1)(q − 1)), ed − 1 is divisible by (p − 1)(q − 1), that is,
there is some k for which ed − 1 = k(p − 1)(q − 1).
(b) (i) By (a), ed − 1 = k(p − 1)(q − 1), so M^(ed−1) = M^(k(p−1)(q−1)) .
(ii) By Fermat's Little Theorem, M^(p−1) ≡ 1 (mod p) (because p does not divide M ). This
means M^(p−1) − 1 = αp for an integer α, or that M^(p−1) = 1 + αp. Therefore,
M^(ed−1) = M^(k(p−1)(q−1)) = (M^(p−1))^(k(q−1)) = (1 + αp)^(k(q−1)) .
(iii) By the Binomial Theorem, with C(m, i) denoting the binomial coefficient,
(1 + αp)^(k(q−1)) = 1 + Σ_{i=1}^{k(q−1)} C(k(q−1), i) α^i p^i = 1 + p Σ_{i=1}^{k(q−1)} C(k(q−1), i) α^i p^(i−1) = 1 + pβ.
We could also use a^r − 1 = (a − 1)(1 + a + a^2 + · · · + a^(r−1)) with a = 1 + αp:
(1 + αp)^(k(q−1)) − 1 = p · (α + α(1 + αp) + α(1 + αp)^2 + · · · + α(1 + αp)^(k(q−1)−1)) = pβ.
Either way, β is a sum of integers, hence β is an integer. We have proved:
M^(ed−1) = (1 + αp)^(k(q−1)) = 1 + βp.
(iv) From (iii), M^(ed−1) − 1 = βp, that is p|(M^(ed−1) − 1).
(c) By (b), either p divides M or, if not, it must divide M^(ed−1) − 1. Either way, p must divide
their product M^(ed) − M = M (M^(ed−1) − 1). Everything is symmetric with respect to p and q and
so, using exactly the same reasoning, q|(M^(ed) − M ).
(d) Since gcd(p, q) = 1 and both p|(M^(ed) − M ) and q|(M^(ed) − M ), we can apply Exercise 10.7(b) on
page 130 to obtain pq|(M^(ed) − M ).
(e) From (d), by definition, M^(ed) ≡ M (mod pq). Bob can decode by computing rem(M∗^d , pq).
Chapter 11
Pop Quiz 11.1.
(a) V = {1, 2, 3, 4, 5, 6};
E = {(1, 2), (2, 3), (3, 4), (1, 4)};
(b) V = {a, b, c, d, 1, 6};
E = {(a, b), (b, c), (c, 1), (a, 1)};
(c) V = {i, j, k, ℓ, m, n};
E = {(i, m), (j, ℓ), (j, m), (j, n), (k, m), (k, n)};
(d) V = {i, j, k, ℓ, m, n};
E = {(i, ℓ), (i, j), (j, m), (ℓ, m), (j, k), (m, n)};
Exercise 11.2. Isomorphic graphs: {I, II}
(a) Since you only relabeled vertices, the number of vertices did not change; every edge still exists
with relabeled end points, so the number of edges did not change.
(b) In the relabeling, suppose vertex v is relabeled to ℓ(v). Then every edge in the graph (v, w)
becomes (ℓ(v), ℓ(w)) and deg(v) becomes deg(ℓ(v)). Every vertex w which contributes to
deg(v) is relabeled to a vertex ℓ(w) which contributes to the degree of ℓ(v). Therefore, the
degree of each vertex does not change.
(c) Suppose v1 v2 · · · vk is a path between two vertices. Every edge (vi , vi+1 ) exists in the graph since v1 v2 · · · vk
is a path. Therefore, from the relabeling, edge (ℓ(vi ), ℓ(vi+1 )) is present in the relabeled graph.
Hence ℓ(v1 )ℓ(v2 ) · · · ℓ(vk ) is a path in the relabeled graph.
(d) Every path in the graph between any two nodes is preserved as a relabeled path between the
relabeled two vertices. This includes shortest paths.
Exercise 11.3. There were some trick questions in this exercise.
(a) An isomorphism preserves all paths (see Exercise 11.2). In the first
graph, there is a path between every pair of vertices, but not so in
the second, so the graphs cannot be isomorphic.
(b) This is a trick question. All graphs with the degree sequence [3,3,2,1,1] are isomorphic. To
see this label the vertices A, B, C, D, E (highest to lowest degree).
Vertex A, with degree 3, has 3 neighbors. Either B is one of these neighbors
or it is not. If it is not, then since B also has degree 3, A and B are both
neighbors of C, D, E. This is not possible since C, D, E have respective
degrees 2, 1, 1. Therefore B is a neighbor of A. The situation is illustrated
on the right. (Since there are two degree 1 vertices, at least one (E) is a
neighbor of A.)
Since E cannot have any more neighbors, and B must have two more
neighbors, it must be that B is connected to the other two vertices,
completing the picture as shown on the right. There is no other way to
construct a graph with this degree sequence. (Not all degree sequences
can be realized by different, non-isomorphic, graphs. Another classic
example is [n − 1, 1, 1, . . . , 1].)
(c) Another trick question. There is no graph with this degree sequence. Here’s a hint as to why:
the sum of the degrees is 13 (more on this later).
Pop Quiz 11.4. This graph cannot exist because there are an odd number of odd-degree
vertices.
Exercise 11.5.
(a) If every degree is positive, 2m = Σi δi ≥ n, so m ≥ n/2. Example:
(b) Equivalently, we compute the maximum number of edges a graph with a degree 0 vertex can
have. Let v be the degree 0 vertex and n the number of vertices. Every vertex other than v
can have an edge to every vertex other than v, so every vertex other than v has degree n − 1.
The number of edges is 12 (n − 1)(n − 2) (half the sum of the degrees). This is the maximum
number of edges for a graph with a degree 0 vertex. So, if a graph has 1 + 12 (n − 1)(n − 2)
edges, it cannot have a degree 0 vertex.
(c) The sum of the degrees is 5 × 3 = 15. There are no such graphs, since the sum of the degrees
must be even. You could also argue that such a graph would have 5 vertices of odd degree,
which violates Corollary 11.3.
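The parity argument — the degree sum equals 2m, hence must be even — can be written as a one-line test; a minimal sketch (the function name is ours):

```python
def degree_sum_is_even(degrees):
    """Handshaking: the sum of the degrees equals 2|E|, so any realizable
    degree sequence must have an even sum."""
    return sum(degrees) % 2 == 0

print(degree_sum_is_even([3, 3, 3, 3, 3]))  # False: sum 15 is odd, no such graph
print(degree_sum_is_even([3, 3, 2, 1, 1]))  # True: necessary, though not sufficient
```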
Exercise 11.6.
(a) The dotted edge creates a cycle, and a tree is connected with no cycles.
(b) Steps 1–5 are not connected. After step 6, any new edge would create a cycle.
(c) This is an important result so we give two different proofs.
Theorem 30.1. A graph with fewer than n − 1 edges is not connected.
Proof. We prove the claim by induction on n. The base case is n = 2 in which case the graph
with 0 edges is clearly not connected. Consider any graph with n + 1 vertices and fewer than
n edges. Not every vertex can have degree at least 2 (the sum of the degrees would be at least
2(n + 1), implying at least n + 1 edges), so some vertex v has degree less than 2. If deg(v) = 0
then the graph is disconnected, as was to be shown. We show the situation
with deg(v) = 1 on the right. The shaded region is the rest of the graph,
other than v, and there is an edge e from v to one node in the shaded
region. Remove v and e from the graph. The shaded graph that remains
has n vertices and fewer than n − 1 edges (we removed one vertex and one
edge). By the induction hypothesis, this residual (shaded) graph is not
connected: two vertices (illustrated by x and y) are not connected by any
path. Adding back v and e cannot create a path between x and y, so x
and y remain disconnected in the original graph. We have proved that
any graph with n + 1 vertices and fewer than n edges is not connected,
so the theorem follows by induction.
We now give a proof of a more general result from which Theorem 30.1 follows. A component
in a graph is a “maximal” set of vertices that is connected. The component of a vertex v is all
the vertices connected to v,
C(v) = {u | u is connected to v by a path}.
Any vertex in a component can be used to define that component, that is if u ∈ C(v) then
C(u) = C(v): any vertex in C(v) is connected to u by going first from u to v and then to the
vertex, hence C(v) ⊆ C(u); similarly any vertex in C(u) is connected to v by going first from
v to u and then to the vertex, hence C(u) ⊆ C(v).
1 component          2 components          6 components
A graph is connected if it has only one component. A graph with n isolated vertices (and no
edges) has n components. If you add an edge between two vertices in the same component,
you do not change any of the components. If you add an edge between two vertices in different
components, you merge those two components, decreasing the number of components by 1.
Lemma 30.2. Adding an edge can decrease the number of components by at most 1.
We use this lemma to prove our general result:
Theorem 30.3. A graph with n vertices and e edges has at least n − e components.
Proof. Start with the n isolated vertices and no edges (n components) and add the edges
one by one, each time decreasing the number of components by at most one. So the number
of components decreases by at most e, leaving at least n − e components.
The formal proof would be by induction on e, the number of edges in the graph. In the
induction step you start with e + 1 edges and remove an edge. By the induction hypothesis
there will be at least n − e components. Now add back the edge and apply Lemma 30.2 to
conclude there are at least n − e − 1 = n − (e + 1) components.
Theorem 30.3 implies Theorem 30.1 because if e < n − 1, then
number of components > n − (n − 1) = 1.
This means the number of components is at least 2 and the graph is disconnected.
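Theorem 30.3 can be checked numerically by counting components with a breadth-first search; a sketch (the names and the small example are ours):

```python
from collections import deque

def num_components(n, edges):
    """Count connected components of an undirected graph on vertices 0..n-1."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen = [False] * n
    count = 0
    for s in range(n):
        if not seen[s]:
            count += 1              # a new component: flood-fill it by BFS
            seen[s] = True
            q = deque([s])
            while q:
                u = q.popleft()
                for w in adj[u]:
                    if not seen[w]:
                        seen[w] = True
                        q.append(w)
    return count

# A path on 5 vertices plus one isolated vertex: n = 6, e = 4.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(num_components(6, edges))  # 2, and indeed 2 >= n - e = 6 - 4
```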
(d) Suppose the graph has n + k edges, where k ≥ 0. We first prove the case that the graph has
a single component. This means that for any set of vertices there is at least one edge from a
vertex in the set to a vertex not in the set (otherwise that set of vertices is disconnected from
the rest of the graph).
Let us build a connected component by adding one vertex at a time. Start at any vertex
v1 . There must be an edge from v1 to a second vertex v2 . So we have built the set v1 , v2 .
After we have built the set v1 , v2 , . . . , vi , there must be an edge from a vertex in our set to an
(i + 1)th vertex vi+1 . Continue this process until we have built the set containing all the nodes
v1 , . . . , vn using n − 1 edges. By construction, this set of vertices is connected using only the
n − 1 edges – this set of n − 1 edges is called a spanning tree.
There is at least one more edge in the graph, say (vi , vj ), since the graph has at least n edges.
Before adding the edge (vi , vj ), there was a path from vi to vj in the spanning tree. The edge
(vi , vj ) together with this path is a cycle.
The formal proof of this claim is by strong induction. In the induction step, take any graph
with n vertices and n + k edges, and remove an edge e. If the graph remains connected, you
can show that adding back e creates a cycle. If the graph becomes disconnected, then you can
show that one of the components has at least as many edges as vertices and hence contains a
cycle. Adding back e won't remove that cycle.
Now consider a graph with more than one component. Suppose the graph has ℓ components
with n1 , n2 , . . . , nℓ vertices in each component, and e1 , e2 , . . . , eℓ edges in each component:
n1 + n2 + · · · + nℓ = n and e1 + e2 + · · · + eℓ = e. We claim that for some i∗ , ei∗ ≥ ni∗ because
if not, then ei < ni for every i and
e = e1 + e2 + · · · + eℓ < n1 + n2 + · · · + nℓ = n,
which cannot be since e ≥ n. There must therefore be a cycle in the i∗ th component and
hence in the graph.
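The argument in (d) — once a graph on n vertices has n edges, some edge must close a cycle — is exactly how union-find cycle detection works; a sketch (our own names, not from the text):

```python
def has_cycle(n, edges):
    """Union-find: an edge joining two vertices already in the same
    component closes a cycle (together with the existing path)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return True       # (u, v) plus the existing u-v path is a cycle
        parent[ru] = rv       # merge the two components
    return False

# A tree (n - 1 edges) has no cycle; adding any extra edge creates one.
tree = [(0, 1), (0, 2), (1, 3), (1, 4)]
print(has_cycle(5, tree))              # False
print(has_cycle(5, tree + [(3, 4)]))   # True
```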
(e) Since this is an “if and only if”, there are two parts to the proof.
1. Suppose G is a connected graph with n vertices that is a tree. By (c), if G has fewer than
n − 1 edges, then it cannot be connected, so G has at least n − 1 edges. If G has n or more
edges, then by (d) it has a cycle and therefore is not a tree. Hence G must have n − 1 edges.
2. Suppose G is a connected graph with n nodes and n − 1 edges. We must show that G is
a tree, i.e. that there are no cycles. Suppose G contains a cycle. Remove an edge on this
cycle. Every vertex remains connected with every other vertex, hence the graph remains
connected but has n − 2 edges. This contradicts (c). Hence G has no cycles.
Exercise 11.7.
(a) Pyramid Cube Octahedron
V 4 8 6
E 6 12 12
F 4 6 8
F +V −E 2 2 2
(b) For the pyramid, we just project the apex onto the plane of the base. We can do a similar
thing for the cube, after moving out the base vertices.

For the octahedron, we can project the upper apex down to the base plane and place the
lower apex outside the base.

For the octahedron, we obtain a more symmetric looking graph (right) by projecting an upper
triangular face into the opposite lower triangular face.
(c) The faces become regions (polygons). One of the faces becomes the external (unbounded)
region.
(d) There is only one external face, so F = 1. For a tree, we know from Exercise 11.6 that
E = V − 1. Therefore,
F + V − E = 1 + V − (V − 1) = 2.
(e) Consider a connected graph that is not a tree (i.e. has cycles).
(i) Look at what happens when you remove one edge from a cycle.

There are two cases: the edge is between two internal faces (left) and the internal faces
merge into one internal face; the edge is between an internal face and the external face
(right) and the external face merges with the internal face to make a larger external
face. In either case, the number of faces F decreases by 1. We removed one edge but
the number of vertices remained the same. So,
ΔE = −1; ΔF = −1; and ΔV = 0.
(ii) Paths that do not use removed edge are not affected by removing an edge. Paths that
use the removed edge can go the other way around the cycle (instead of using the edge).
Thus, if there was a path between two vertices, there still will be one.
Removing an edge from a cycle does not affect connectivity.
(iii) Removing an edge from a cycle decreases E and F by the same amount, so the total
change will be the same, ΔE = ΔF . The vertices are unchanged so ΔV = 0.
(iv) When you are done removing edges from cycles, there are no more cycles, and the graph
is connected. Therefore it is a tree. In this process, F → F + ΔF , E → E + ΔE and
V → V + ΔV . For a tree, we proved in (d) that (faces) + (vertices) − (edges) = 2,
therefore
F + ΔF + V + ΔV − (E + ΔE) = 2.
Since ΔV = 0 and ΔF = ΔE, we get
F + V − E = 2,
where F , V and E are for the original graph with cycles.
(f) When you traverse around every face, Σf E(f ) edges are traversed. Every edge is traversed
twice: Edges on internal faces belong to two faces and so are traversed once for each face;
Edges on the external face that are not on an internal face are also traversed twice, back and
forth. Since every edge is traversed twice, Σf E(f ) = 2E.
(g) Every internal face is bounded by at least 3 edges so E(f ) ≥ 3 for an internal face. If V ≥ 3,
the external face contains at least 3 vertices and hence E(external face) ≥ 3. Therefore
2E = Σf E(f ) ≥ 3F . That is, F ≤ (2/3)E. Using Euler's Characteristic,
E = F + V − 2 ≤ (2/3)E + V − 2  →  (1/3)E ≤ V − 2  →  E ≤ 3V − 6.
In any planar graph with at least 3 vertices, E ≤ 3V − 6.
In a planar graph, the number of edges is linear in the number of vertices. In K5 , V = 5, so
3V − 6 = 9; yet, E = 10 which is greater, so K5 cannot be planar.
(h) If there are no 3-cycles, then E(f ) ≥ 4 for every internal face f , because traversing around
an internal face creates a cycle. The external face contains at least 3 vertices, in which case
going around the external face traverses at least 4 edges, so again E(f ) ≥ 4. Therefore,
2E = Σf E(f ) ≥ 4F and, using Euler's Characteristic,
E = F + V − 2 ≤ (1/2)E + V − 2  →  (1/2)E ≤ V − 2  →  E ≤ 2V − 4.
Any simple graph has no 2-cycles. In K3,3 there are no 3-cycles because any path of odd
length takes you from one side to the other. In K3,3 , V = 6, so 2V − 4 = 8, but E = 9 which
is larger, so K3,3 cannot be planar.
Exercise 11.8. Vertices in Euler’s multi-graph are regions of Königsberg and edges are the
bridges that connect two regions. Euler’s problem asks to start at some vertex in the graph, and
follow a path of edges, ending at some other vertex. The requirement is that every edge must be
traversed exactly once. Other than the start and end vertex, every other vertex, if entered using
some (untraversed) edge must be exited using a different (untraversed) edge. This means every
vertex, other than the start and end vertex, must have an even degree. Every vertex in Euler’s
graph has an odd degree, so Euler’s problem is not solvable.
A path that Euler was seeking, which uses every edge is called an Euler tour. If the path starts
and ends at the same vertex, it is called an Euler cycle.
Theorem 30.4. A connected graph has an Euler cycle if and only if every vertex has even degree.
A connected graph has an Euler tour starting at u and ending at v if and only if the degrees of u
and v are odd and every other vertex has even degree.
There are two parts to an if and only if proof. Try induction for the “hard part”.
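The degree condition of Theorem 30.4 is easy to test; a sketch (the function name is ours, and connectivity is assumed rather than checked):

```python
from collections import Counter

def euler_classification(edges):
    """Classify a connected multigraph by the number of odd-degree vertices:
    0 -> Euler cycle, 2 -> Euler tour, anything else -> neither."""
    deg = Counter()
    for u, v in edges:          # multigraph: repeated edges are allowed
        deg[u] += 1
        deg[v] += 1
    odd = sum(1 for d in deg.values() if d % 2 == 1)
    return {0: "Euler cycle", 2: "Euler tour"}.get(odd, "neither")

# Konigsberg: regions A, B, C, D and the seven bridges.
konigsberg = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
              ("A", "D"), ("B", "D"), ("C", "D")]
print(euler_classification(konigsberg))  # neither -- all four degrees are odd
```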
Exercise 11.9. We highlight the fastest path on the right, which takes 11 ms.
The fastest path is counterintuitive because it doesn't always move
“toward” the destination. When you take a course in algorithms you
will learn how to systematically compute shortest paths when the edge
weights are non-negative. The idea is to compute the shortest paths
to all vertices simultaneously, starting with the closest vertex, then the
next closest and so on. The technique is called dynamic programming
and the algorithm is Dijkstra's algorithm.
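A sketch of Dijkstra's algorithm with a binary heap; the small graph below is a hypothetical example (it is not the figure from the exercise), and the names are ours:

```python
import heapq

def dijkstra(adj, src):
    """Shortest distances from src; adj maps u -> [(v, weight), ...]
    with non-negative weights."""
    dist = {src: 0}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue                       # stale heap entry, skip it
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

# Edge weights in ms; the direct-looking route s -> b -> t is not the fastest.
adj = {"s": [("a", 2), ("b", 12)],
       "a": [("b", 1), ("t", 20)],
       "b": [("t", 4)],
       "t": []}
print(dijkstra(adj, "s")["t"])  # 7: s -> a -> b -> t
```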
Pop Quiz 11.10.
(a) V = {v1 , v2 , v3 , v4 , v5 , v6 , v7 } and
E = {(v1 →v2 ), (v3 →v1 ), (v3 →v2 ), (v2 →v4 ), (v2 →v5 ), (v3 →v4 ), (v5 →v4 ), (v6 →v7 ),
(v2 →v1 ), (v4 →v2 ), (v6 →v2 )}.
(b) The graph is not (strongly) connected (there is no path from v7 to any other node).
Exercise 11.11.
(a) m, z, d are top-dogs.
(b) Let t be a vertex with maximum out-degree (in case of ties, pick one arbitrarily). We prove
that t is a top-dog. We must show that t dominates every other node u (either t beats u
or t beats a vertex that beats u). Suppose, to the contrary, that there is some vertex v which
t does not dominate. That is, v beats t and also beats everyone who t beats. Then out-deg(v)
is at least 1 + out-deg(t), which contradicts t having maximum out-degree. Therefore such a
v does not exist.
(c) Let v1 →v2 but vi →v1 for i > 2. So, v1 wins one match. Let v2 →vi for
i > 2. So, v2 beats everyone but v1 . By construction, v1 is a top-dog,
having beaten just one vertex. Let v3 →v4 →· · ·→vn →v3 , and the
results of all other matches can be arbitrary. So, v2 , . . . , vn all have
out-degree at least 2. This means v1 won the fewest possible matches
and yet is a top-dog.
Chapter 12
Exercise 12.1.
(a) Let |E| be the total number of partners that men have, which is the total number of partners
women have since every partnership is between a man and a woman. Let M be the number
of men and F the number of women. Then,
average partners per man = |E|/M = (|E|/F ) × (F/M ) = (average partners per woman) × (F/M ).
Since F/M = 50.8/49.2 ≈ 1.0325,
average partners per man = average partners per woman × 1.0325.
Men have 3.25% more partners on average.
(b) Let |E| be the number of different-sex relationships between males and females. Let em be the
number of same-sex relationships among males, and ef the number of same-sex relationships
among females. We are given that em + ef is 1% of all relationships,
(em + ef )/(|E| + em + ef ) = 0.01 → em + ef = (0.01/0.99) × |E|.
The total number of partners the men had is |E| + 2em and similarly, the total number of
partners the women had is |E| + 2ef . So,
average partners per male = (|E| + 2em )/M ;
average partners per female = (|E| + 2ef )/F .
Taking the ratio,
(average partners per male)/(average partners per female) = ((|E| + 2em )/(|E| + 2ef )) × (F/M ).
The two extremes are when em = 0 and when ef = 0. When em = 0, ef = |E| × (0.01/0.99) and
(average partners per male)/(average partners per female) = (|E|/(|E| + 2|E| × (0.01/0.99))) × (F/M ) ≈ 1.012.
When ef = 0, em = |E| × (0.01/0.99) and
(average partners per male)/(average partners per female) = ((|E| + 2|E| × (0.01/0.99))/|E|) × (F/M ) ≈ 1.0534.
The average number of partners for men is 1.2%–5.3% larger, depending on how the same-sex
relationships are distributed between males and females.
Pop Quiz 12.2. (T1 , R1 ), (T2 , R3 ), (T3 , R4 ), (T4 , R5 ).
Exercise 12.3. We try proof by induction, as follows. In the induction step, take a left-vertex
with minimum left degree and match it with a right-vertex, removing the two matched vertices.
Let us now examine the residual graph. The degrees of some right-vertices decreases by one,
but the maximum right-vertex degree could stay the same. What about left-vertices? Since we
removed a right-vertex, if that right node was linked to other left-vertices, the degree of those
left-vertices will decrease by 1. This means that the minimum left-vertex degree could decrease,
and in so doing it could drop below the maximum right-vertex degree. Therefore, we may not be
able to apply the induction hypothesis to the residual graph, and so the proof by induction falters.
Hall’s theorem implies Corollary 12.2 on page 164, so Hall’s theorem is a stronger result. It is
often easier to prove a stronger claim by induction. This is because we get to assume more in
P (n) which offsets having to prove more in P (n + 1).
Exercise 12.4. A matching is stable if there is no pair of matches that is volatile.
X Y Z A B C
1. A A B 1. Z Y Z
2. B C A 2. Y X X
3. C B C 3. X Z Y
The match A–Z is “stable” because Z is A's top choice so A will not wish to break
her current match. In the first matching (X–B, Y–C, Z–A, shown on the right), the
only possible volatile pair is (X, Y ) and (B, C). Since X prefers B to C, this is not
a volatile pair. Since there are no volatile pairs of matches, the matching is stable.
In the second matching (X–C, Y–B, Z–A), again Z is A's top choice, so the only
possible volatile pair is (X, Y ) and (B, C). Since B prefers Y to X, this is not a
volatile pair and the matching is stable.
A is indifferent between the matchings because she gets her top choice in both matchings. B
prefers Y to X and C prefers X to Y so the girls prefer the second matching.
As for the boys, Z is indifferent between the two matchings because he gets the same partner in
both matchings. X prefers B to C and Y prefers C to B so the boys prefer the first matching.
Exercise 12.5.
(a) If a woman w has more than one suitor, she chooses her favorite and the other suitors (at least
one) cross w from their list. When there is at most one man under every woman’s balcony,
we have a stalemate. For every non-stalemate round (at least one woman has more than one
suitor), a man crosses a woman from a list. There are a total of n^2 women on all the lists (each
woman appears once on each list). Therefore there cannot be more than n^2 non-stalemate
rounds of dating because there will be no more women left to cross out.
Conclusion: After at most n^2 rounds of dating, each woman has at most one suitor.
(b) If a woman w ever gets wooed, then those suitors had w on the top of their current list. She
picks her favorite who must come back the next round as w will remain on the top of that
favorite’s list. By induction, she will always have a suitor.
(c) According to the dating ritual, m will continue to woo as long as there are uncrossed women
on his list. Since m is not married at the end, it must be that m has been rejected by every
woman, which means that he has wooed every woman, including w.
Suppose that there is an unmarried man m at the end of the ritual. Then there is an unmarried
woman w who was wooed at sometime by m. By part (b), from that point on, w will always
have a suitor and so must end up married, a contradiction. Therefore, every man is married
at the end of the ritual (and therefore so too is every woman).
(d) Suppose w is at the ith position on m's list.
(i) If m never wooed w then m could not have been rejected by all the top i − 1 candidates
on m's list. Therefore m was ultimately accepted by one of these top i − 1 candidates
and ended up married to that better candidate: m prefers his current partner to w.
(ii) If m did woo w, but is not married to w, then w rejected m for someone better, m′ .
From this point on, in the dating ritual, w will continue to accept only candidates who
are at least as good as m′ , because m′ will return to w unless someone better comes along
and w rejects m′ . Therefore, w will end up married to someone at least as good (in her
view) as m′ , whom she prefers to m: w prefers her current partner to m.
That the marriages are stable is now immediate from (i) and (ii). Consider any pair of married
couples (m, w) and (m′ , w′ ). If m′ had wooed w then w prefers m to m′ and would not wish
to switch to m′ . If m′ had not wooed w then m′ prefers his current partner w ′ to w. Hence,
the pair of married couples is not volatile.
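The dating ritual is the Gale–Shapley algorithm; here is a sketch (implementation details such as the proposal order are ours), run on the preference lists of Exercise 12.4:

```python
def gale_shapley(men_prefs, women_prefs):
    """Men propose down their preference lists; each woman keeps her best
    suitor so far.  Returns a stable matching as {man: woman}."""
    rank = {w: {m: i for i, m in enumerate(prefs)}
            for w, prefs in women_prefs.items()}
    next_choice = {m: 0 for m in men_prefs}  # index of the next woman to woo
    engaged = {}                             # woman -> her current suitor
    free = list(men_prefs)
    while free:
        m = free.pop()
        w = men_prefs[m][next_choice[m]]
        next_choice[m] += 1
        if w not in engaged:
            engaged[w] = m
        elif rank[w][m] < rank[w][engaged[w]]:
            free.append(engaged[w])          # w trades up; her old suitor is free
            engaged[w] = m
        else:
            free.append(m)                   # rejected; m tries his next choice
    return {m: w for w, m in engaged.items()}

# Preference lists from Exercise 12.4 (boys X, Y, Z and girls A, B, C):
men = {"X": ["A", "B", "C"], "Y": ["A", "C", "B"], "Z": ["B", "A", "C"]}
women = {"A": ["Z", "Y", "X"], "B": ["Y", "X", "Z"], "C": ["Z", "X", "Y"]}
print(sorted(gale_shapley(men, women).items()))
# [('X', 'B'), ('Y', 'C'), ('Z', 'A')] -- a stable matching
```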
Pop Quiz 12.6. The dating ends after two rounds.
Dating Round 1:    A B C       X   Y   Z
                1. Z Y Z           B   A, C
                2. Y X X
                3. X Z Y
Dating Round 2:    A B C       X   Y   Z
                1. Z Y Z       C   B   A
                2. Y X X
                3. X Z Y
Pop Quiz 12.7.
(a) R1 , R2 , R3 form a “clique”, that is every pair has an edge between them. Therefore every
vertex in this group must be colored a different color otherwise an edge will connect two
vertices of the same color. Thus, we need at least 3 colors.
If a graph contains a clique of size k then at least k colors are required.
(b) Certainly you need at least one color, and n colors suffice by coloring each vertex a different
color (no matter what the graph). So 1 ≤ χ(G) ≤ n. The graph with n isolated vertices
can be colored with one color and Kn , the complete graph on n vertices, requires n colors.
Exercise 12.8. The graph is an example of a leveled graph in which the nodes can be parti-
tioned into levels ℓ = 1, 2, 3, 4, . . . and edges only exist between vertices in adjacent levels. In this
case you can alternate colors between levels and get a valid 2-coloring. We show how to represent
the graph as a leveled graph which immediately gives a 2-coloring. To order the vertices so that
Greedy gives a 2-coloring, simply order the vertices by levels.
Original graph G          Leveled view of G          Vertex ordering for Greedy
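Greedy coloring in level order can be sketched as follows; the small leveled graph here is a hypothetical example, not the exact graph of the exercise:

```python
def greedy_color(adj, order):
    """Assign each vertex (taken in the given order) the smallest color
    not already used by one of its colored neighbors."""
    color = {}
    for v in order:
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

# A leveled graph: edges only join adjacent levels {1}, {2, 3}, {4}, {5},
# so visiting the vertices level by level lets Greedy alternate two colors.
adj = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3, 5], 5: [4]}
order = [1, 2, 3, 4, 5]
coloring = greedy_color(adj, order)
print(max(coloring.values()) + 1)  # 2 colors suffice
```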
Exercise 12.9. Certainly if V ≤ 6, then 6 colors are enough by coloring each vertex a different
color. Therefore, we only need to consider V > 6 in which case E ≤ 3V − 6.
(a) Suppose every node-degree is at least 6, then the sum of the node-degrees is at least 6V , so
2E = sum of node-degrees ≥ 6V.
We conclude that E ≥ 3V > 3V − 6, which contradicts E ≤ 3V − 6. Therefore at least 1
vertex has degree of 5 or less.
(b) Start with a planar drawing of a graph and remove a vertex and its edges. The remaining
edges do not cross in the drawing that remains (since initially they did not cross). Therefore
the remaining graph is planar.
(c) We use induction on V , the number of vertices in the graph. If V ≤ 6 then the claim is
trivially true. Suppose the claim is true for any planar graph of V vertices and consider any
planar graph with V + 1 vertices. By (a), there is a node with degree at most 5. Remove
this vertex and its edges. By (b) the remaining graph is planar and has V vertices, so by the
induction hypothesis this remaining graph is 6-colorable. Now add back the removed vertex,
keeping the colors of the vertices obtained from the 6-coloring of the smaller graph. Among
the 6 colors, there must be at least one free color for the node we added back because that
vertex has at most 5 neighbors. Therefore, our graph with V + 1 vertices is 6-colorable. By
induction, every planar graph is 6-colorable.
(In fact, every planar graph is 5-colorable. The same basic induction can be used for the
proof, but you have to be a little more careful in the induction step. The 4-color theorem says
that every planar graph is 4-colorable, and that is hard to prove.)
Chapter 13
Pop Quiz 13.1. 10 × 9 × 8 × 7 × · · · × 2 × 1 = 10! = 3628800.
Exercise 13.2. The outcome is one of two types: HS2 where S2 is the sum of two dice; or T S4
where S4 is the sum of 4 dice. S2 ∈ {2, 3, . . . , 12} (11 choices) and S4 ∈ {4, 5, . . . , 24} (21 choices).
Using the sum rule, the experiment has 11 + 21 = 32 possible outcomes.
Exercise 13.3. The committees are “named” (labeled): committee 1, committee 2, . . . ,
committee 16. That is, the committees are “distinguishable”.
(a) An assignment can be specified by s1 s2 s3 · · · s100 , where si ∈ {1, 2, . . . , 16} is the committee
senator i gets assigned. By the product rule, there are 16^100 such sequences, which equals the
number of ways each senator can be assigned to exactly one of 16 “named” committees. (You
should ponder what happens if the committees are not distinguishable.)
(b) A senator can be in 0 or 1 committees and the assignment can be specified by s1 s2 s3 · · · s100 ,
where si ∈ {0, 1, . . . , 16}: si = 0 if senator i is assigned to no committee; otherwise, si is senator
i's committee. By the product rule, there are 17^100 such sequences, which equals the number
of ways each senator can be assigned to at most one of 16 “named” committees.
STOP: do not bother with the solution of (c) if this is your first reading.
It is a hard problem, not essential for the rest of the chapter.
(c) The complication arises because by requiring that each committee is not empty we introduce
a dependency between the senators, where as previously the senators can be assigned inde-
pendently. For example, if the first 99 senators all get assigned to the first 15 committees,
now the only available choice for s100 is committee 16, otherwise that committee would be
empty. Let us consider the case of 5 senators and 2 committees.
Exactly 1 committee per senator; no empty committee. When committees can be empty, there
are 2^5 assignments. If all 5 senators are in the same committee, the other committee is empty, so
these two assignments are not allowed. All others are allowed, so there are 2^5 − 2 ways.
The general case is when n senators are to be distributed into k “named” committees (in the
literature you will often see this as n distinguishable objects being partitioned into k distin-
guishable non-empty sets). First consider the case where the committees are indistinguishable
(that is, not named). For example, with 5 senators and 3 committees, the following sequences
s1 s2 s3 s4 s5 are the same committee assignments if the committees are unlabeled:

    12333   21333   13222   31222   23111   32111.
What matters is who is in a committee with whom. Let {n k} be the number of ways to create
indistinguishable committees. Now we can label the committees: pick one of the k labels for
the first committee, one of the remaining k − 1 labels for the second committee, and so on,
resulting in k × (k − 1) × · · · × 1 = k! ways to label the committees (product rule). So,

    (# ways to create k non-empty “named” committees from n senators) = k! × {n k}.

The numbers {n k} are known as Stirling numbers of the second kind,

    {n k} = (# ways to partition n labeled objects into k non-empty unlabeled sets).
Stirling numbers are well studied. Here are some facts for you to verify (n ≥ 1, k ≥ 1):

    {0 0} = 1;   {n 0} = 0;   {0 k} = 0;   {n n−1} = ½ n(n − 1);   {n 2} = 2^(n−1) − 1.
The Stirling numbers {n k} satisfy a recurrence. To partition n objects into k sets: the first
object can be in its own set and the other n − 1 objects are partitioned into k − 1 non-empty
sets in {n−1 k−1} ways; or, the first object is in a set with some other objects, in which case
the other n − 1 objects are partitioned into k non-empty subsets in {n−1 k} ways, which we
multiply by k to account for the k possible sets for the first object. Therefore,

    {n k} = k × {n−1 k} + {n−1 k−1}.
We leave the reader to use this recurrence and prove by induction that

    {n k} = Σ_{i=0}^{k} (−1)^(k−i) i^n / (i! (k − i)!).
To conclude,

    (# ways to create k non-empty “named” committees from n senators,
     one committee per senator) = Σ_{i=0}^{k} (−1)^(k−i) k!/(i! (k − i)!) × i^n.
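The recurrence, the listed facts, and the explicit formula are easy to check numerically. Here is a short Python sketch (the helper names stirling2 and stirling2_explicit are ours, not the text's):

```python
from functools import lru_cache
from math import comb, factorial

@lru_cache(maxsize=None)
def stirling2(n, k):
    """{n k}: partitions of n labeled objects into k non-empty unlabeled sets."""
    if n == 0:
        return 1 if k == 0 else 0
    if k == 0:
        return 0
    # recurrence from the text: {n k} = k {n-1 k} + {n-1 k-1}
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

def stirling2_explicit(n, k):
    """Explicit formula: sum_i (-1)^(k-i) i^n / (i!(k-i)!); we compute k!*{n k} first."""
    s = sum((-1) ** (k - i) * comb(k, i) * i ** n for i in range(k + 1))
    return s // factorial(k)

# facts listed in the text
assert stirling2(7, 6) == 7 * 6 // 2      # {n n-1} = n(n-1)/2
assert stirling2(7, 2) == 2 ** 6 - 1      # {n 2} = 2^(n-1) - 1

# recurrence and explicit formula agree
assert all(stirling2(n, k) == stirling2_explicit(n, k)
           for n in range(1, 10) for k in range(1, n + 1))

# named non-empty committees: k! * {n k}; 5 senators, 2 committees gives 2^5 - 2
assert factorial(2) * stirling2(5, 2) == 2 ** 5 - 2
```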
At most 1 committee per senator; no empty committee. We consider the problem by “brute-
force”, first deciding the number of senators in no committees. There are 6 cases:
    # senators not on a committee:          0   1   2   3   4   5
    # ways to pick the excluded senators:   1   5  10  10   5   1
You can verify the number of ways to exclude k senators in forming the committees: there is
1 way to exclude 0 senators (all senators are involved in committees) and 1 way to exclude 5
senators (no senators are involved in committees); there are 5 ways to exclude 1 senator (5
possible senators to exclude) and similarly 5 ways to exclude 4 senators (5 possible senators
to include). The number of ways to form a pair of senators is 10, so there are 10 ways to
exclude 2 senators and similarly 10 ways to exclude 3 senators (select a pair to include). If
you exclude 4 or 5 senators, then you cannot have both committees nonempty.
Conclusion: there are 4 types of committees: those that exclude 0,1,2 or 3 senators.
Let’s count the number of ways to form 2 non-empty committees if you exclude 2 senators. So
you use 3 senators. There are 10 ways to pick which 2 senators to exclude, and then there are
2^3 − 2 ways to form 2 non-empty committees using the remaining 3 senators. By the product
rule, there are 10 × (2^3 − 2) ways to form the two non-empty committees. Using this logic, we
compute the entries in the following table for the number of committees that can be formed
by excluding k senators, k = 0, 1, 2, 3.
    # senators not on a committee:   0              1              2               3
    # committees:                    1 × (2^5 − 2)  5 × (2^4 − 2)  10 × (2^3 − 2)  10 × (2^2 − 2)
Using the sum rule, the total number of committees we can form with at most 1 committee
per senator and no empty committees is

    1 × (2^5 − 2) + 5 × (2^4 − 2) + 10 × (2^3 − 2) + 10 × (2^2 − 2) = 30 + 70 + 60 + 20 = 180.
For the general case with n senators and k committees, we can leave out i senators and assign
the remaining n − i senators to k non-empty committees in k! × {n−i k} ways, provided n − i ≥ k.
You will see later that there are n!/(i! (n − i)!) ways in which to exclude i senators, so using
the sum rule,

    (# ways to create k non-empty “named” committees from n senators,
     at most one committee per senator) = Σ_{i=0}^{n−k} n!/(i! (n − i)!) × k! {n−i k}.
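Both the table and the general sum can be cross-checked by brute force over all assignments (senator to a committee or to none). A sketch, with our own helper names:

```python
from itertools import product
from math import comb

def onto(n, k):
    """Assignments of n senators onto k named committees, none empty (= k! {n k})."""
    return sum((-1) ** (k - i) * comb(k, i) * i ** n for i in range(k + 1))

def at_most_one(n, k):
    """At most one committee per senator, no empty committee: sum over excluded senators."""
    return sum(comb(n, i) * onto(n - i, k) for i in range(n - k + 1))

# brute force for 5 senators, 2 committees; value 0 means "no committee"
brute = sum(1 for a in product(range(3), repeat=5)
            if all(c in a for c in (1, 2)))   # both committees non-empty
assert brute == at_most_one(5, 2) == 180
# matches the term-by-term table
assert at_most_one(5, 2) == 1*(2**5 - 2) + 5*(2**4 - 2) + 10*(2**3 - 2) + 10*(2**2 - 2)
```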
Pop Quiz 13.4.
(a) To list all the sequences of length 6, prepend 0 and 1 to the length-5 sequences:
000000 000001 000010 000011 000100 000101 000110 000111
001000 001001 001010 001011 001100 001101 001110 001111
010000 010001 010010 010011 010100 010101 010110 010111
011000 011001 011010 011011 011100 011101 011110 011111
100000 100001 100010 100011 100100 100101 100110 100111
101000 101001 101010 101011 101100 101101 101110 101111
110000 110001 110010 110011 110100 110101 110110 110111
111000 111001 111010 111011 111100 111101 111110 111111
The color coding is: purple for 0 ones; green for 1 one; red for 2 ones; blue for 3 ones. You
can count the sequences of each color to verify the first 6 entries in the row for n = 6. The
other entries follow by symmetry: if you take a sequence with 2 ones and flip every zero to
one and vice-versa, you get a sequence with 4 ones. Thus, the number of sequences with 4
ones equals the number of sequences with 2 ones.
(b) We want C(10, 3), so we fill out our Pascal’s-triangle table up to row n = 10.

    n\k    0    1    2    3    4    5    6    7    8    9   10
    0      1
    1      1    1
    2      1    2    1
    3      1    3    3    1
    4      1    4    6    4    1
    5      1    5   10   10    5    1
    6      1    6   15   20   15    6    1
    7      1    7   21   35   35   21    7    1
    8      1    8   28   56   70   56   28    8    1
    9      1    9   36   84  126  126   84   36    9    1
    10     1   10   45  120  210  252  210  120   45   10    1

We highlighted the number we seek: C(10, 3) = 120.
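Building the table is mechanical; here is a small Python sketch of the Pascal's-triangle recurrence (our own helper, for checking):

```python
def pascal_row(n):
    """Row n of Pascal's triangle, built with C(n,k) = C(n-1,k-1) + C(n-1,k)."""
    row = [1]
    for _ in range(n):
        # interior entries are sums of adjacent entries of the previous row
        row = [1] + [a + b for a, b in zip(row, row[1:])] + [1]
    return row

assert pascal_row(10) == [1, 10, 45, 120, 210, 252, 210, 120, 45, 10, 1]
assert pascal_row(10)[3] == 120   # C(10, 3), the entry we seek
```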
Exercise 13.5.
(a) (i) Using (13.2),

        Q(n, k) = Σ_{j=0}^{n} Q(j, k − 1) = Q(n, k − 1) + Σ_{j=0}^{n−1} Q(j, k − 1).

    The sum on the RHS is Q(n − 1, k) by (13.2), so Q(n, k) = Q(n, k − 1) + Q(n − 1, k).
(ii) If there are no candies of color-1 in the goody bag, the goody bag is made up of n candies
using k − 1 colors: there are Q(n, k − 1) such goody bags. Or, there is at least 1 candy
of color-1. Place one candy of color-1 in the bag. The remaining n − 1 candies make
up a “goody bag” using k colors, so there are Q(n − 1, k) such goody bags. By the sum
rule, the total number of goody bags is Q(n, k − 1) + Q(n − 1, k).
(b) The dashed diagonal produces the same numbers as row 5 in Pascal’s triangle. The next
diagonal produces the numbers in Pascal’s triangle for row 6. Along the diagonal, n + k is
constant. For the dashed diagonal, n + k = 6. The next diagonal has n + k = 7. So,
n + k = 6 ↔ row 5 in Pascal’s triangle;
n + k = 7 ↔ row 6 in Pascal’s triangle.
The diagonal gives you the row in Pascal’s triangle, row = n + k − 1. And, k corresponds
to the column in Pascal’s triangle (k = 1 is column 0), so k corresponds to column k − 1 in
Pascal’s triangle. That is, m = n + k − 1 and ℓ = k − 1 and our guess is

    Q(n, k) = C(n + k − 1, k − 1).
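The guess can be checked numerically against the recurrence Q(n, k) = Q(n, k − 1) + Q(n − 1, k). A sketch, with our own function name Q (base cases: one color, or an empty goody bag):

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def Q(n, k):
    """Goody bags of n candies using k colors, via the recurrence of Exercise 13.5."""
    if k == 1 or n == 0:
        return 1   # one color forces the bag; an empty bag is unique
    return Q(n, k - 1) + Q(n - 1, k)

# the guessed closed form Q(n, k) = C(n+k-1, k-1)
assert all(Q(n, k) == comb(n + k - 1, k - 1)
           for n in range(15) for k in range(1, 8))
```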
Exercise 13.6.
(a) The first thing you do when using build up counting is to identify the object you are counting
with a name and tinker. Let us denote by F (n) the number of subsets of {1, 2, . . . , n} that
do not contain consecutive numbers. Now tinker with some small values of n.
n subsets F (n)
1 ∅, {1} F (1) = 2
2 ∅, {1}, {2} F (2) = 3
3 ∅, {1}, {2}, {3}, {1, 3} F (3) = 5
Now, let us try to build such a subset S. If S contains n, then it cannot contain n − 1. So the
remaining elements in S are a subset of 1, 2, . . . , n−2 not containing two consecutive numbers,
and there are F (n − 2) different such subsets (by definition of F (n)). If S does not contain
n, then the elements in S are a subset of 1, 2, . . . , n − 1 and there are F (n − 1) different such
subsets. Those are the only options for S, so by the sum rule,
F (n) = F (n − 1) + F (n − 2).
We can now compute F (20),
n 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
F (n) 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765 10946 17711
There are 17,711 subsets of {1,2,. . . ,20} that do not contain consecutive numbers.
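The recurrence and the brute-force definition can be cross-checked in a few lines of Python (our helpers, not the book's; the bitmask trick encodes "element i+1 is in the subset" as bit i):

```python
def F(n):
    """Subsets of {1..n} with no two consecutive elements: F(n) = F(n-1) + F(n-2)."""
    a, b = 2, 3   # F(1), F(2)
    if n == 1:
        return a
    for _ in range(n - 2):
        a, b = b, a + b
    return b

def F_brute(n):
    """Brute force: m & (m << 1) is nonzero exactly when two adjacent bits are set."""
    return sum(1 for m in range(1 << n) if m & (m << 1) == 0)

assert all(F(n) == F_brute(n) for n in range(1, 16))
assert F(20) == 17711
```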
(b) Let us denote by G(n) the number of subsets of {1, 2, . . . , n} that contain at most 1 of any 3
consecutive numbers. Tinker with some small values of n.
n subsets G(n)
1 ∅, {1} G(1) = 2
2 ∅, {1}, {2} G(2) = 3
3 ∅, {1}, {2}, {3} G(3) = 4
4 ∅, {1}, {2}, {3}, {4}, {1, 4} G(4) = 6
Now, let us try to build such a subset S. If S contains n, then it cannot contain n − 1 or
n − 2. So the remaining elements in S are a subset of 1, 2, . . . , n − 3 containing at most one
of any three consecutive numbers, and there are G(n − 3) different such subsets (by definition
of G(n)). If S does not contain n, then the elements in S are a subset of 1, 2, . . . , n − 1 and
there are G(n − 1) different such subsets. Those are the only options for S. By the sum rule,
G(n) = G(n − 1) + G(n − 3).
We can now compute G(20),
n 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
G(n) 2 3 4 6 9 13 19 28 41 60 88 129 189 277 406 595 872 1278 1873 2745
There are 2,745 subsets of {1,. . . ,20} containing at most one of any three consecutive numbers.
(c) The method is the same, the problem is different. Let us denote by B(n) the number of
length-n sequences not containing 001. Tinker with some small values of n.

    n   sequences                              B(n)
    1   0, 1                                   B(1) = 2
    2   00, 01, 10, 11                         B(2) = 4
    3   000, 010, 011, 100, 101, 110, 111      B(3) = 7
Let us now try to build a sequence s of length n. It either starts with 1 or 0:

    1 · · ·          →  B(n − 1) sequences
    0 1 · · ·        →  B(n − 2) sequences
    0 0 · · · 0      →  1 sequence
If s starts with 1, what follows is any sequence of length n − 1 that does not contain 001, and
there are B(n − 1) of these. If s starts with 0, there are two cases: the second bit is 1, in which
case what follows is any sequence of length n − 2 that does not contain 001, and there are
B(n − 2) of these; or the second bit is 0, in which case all remaining bits are 0, because otherwise
the sequence contains 001. Therefore,

    B(n) = B(n − 1) + B(n − 2) + 1.
We can now compute B(20),
n 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
B(n) 2 4 7 12 20 33 54 88 143 232 376 609 986 1596 2853 4180 6764 10945 17710 28656
You should notice a similarity in our table for B(n) and the table for F (n) above. There are
28,656 binary sequences of length 20 that do not contain 001.
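Again the recurrence is easy to confirm against direct enumeration (our helpers; note the +1 term distinguishing B from F):

```python
def B(n):
    """Binary sequences of length n avoiding the substring 001."""
    a, b = 2, 4   # B(1), B(2)
    if n == 1:
        return a
    for _ in range(n - 2):
        a, b = b, a + b + 1   # B(n) = B(n-1) + B(n-2) + 1
    return b

def B_brute(n):
    """Brute force over all 2^n bit strings, zero-padded to length n."""
    return sum('001' not in format(m, f'0{n}b') for m in range(1 << n))

assert all(B(n) == B_brute(n) for n in range(1, 15))
assert B(20) == 28656
```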
(d) Start small. For 2 players, there is just one way to configure the first round. Let P (n) be
the number of ways to configure the first round with 2n players. P (1) = 1. Now consider 2n
players. The first player can pair with any of 2n − 1 players leaving 2(n − 1) players to be
paired in P (n − 1) ways. Therefore, P (n) = (2n − 1)P (n − 1), and we have
n 1 2 3 4 5 6 7 8
P (n) 1 3 15 105 945 10395 135135 2027025
There are 2,027,025 different configurations for the first round matches.
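The recurrence P(n) = (2n − 1)P(n − 1) can be sanity-checked by enumerating distinct pairings directly for small n, and against the known closed form (2n)!/(2^n n!) (helper names are ours):

```python
from itertools import permutations
from math import factorial

def P(n):
    """First-round pairings of 2n players: P(n) = (2n-1) * P(n-1), P(1) = 1."""
    result = 1
    for m in range(2, n + 1):
        result *= 2 * m - 1
    return result

def P_brute(n):
    """Count distinct perfect matchings of 2n players by brute force."""
    matchings = set()
    for perm in permutations(range(2 * n)):
        # chop the permutation into unordered pairs; the set of pairs is the matching
        pairs = frozenset(frozenset(perm[i:i + 2]) for i in range(0, 2 * n, 2))
        matchings.add(pairs)
    return len(matchings)

assert all(P(n) == P_brute(n) for n in range(1, 4))
assert P(8) == 2027025
assert P(8) == factorial(16) // (2 ** 8 * factorial(8))   # (2n)!/(2^n n!)
```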
Exercise 13.7.
(a) The claim is that Q(n, k) = C(n + k − 1, k − 1). We prove this by a “double induction”: we
prove by induction on k, and within the induction on k, we use induction on n. Our claim is:

    P(k): Q(n, k) = C(n + k − 1, k − 1) for all n ≥ 0.

We prove by induction that P(k) is true for all k ≥ 1.
The base case is k = 1, which claims Q(n, 1) = C(n, 0) = 1, which is true.
For the induction, assume P(k). We show that

    P(k + 1): Q(n, k + 1) = C(n + k, k) for all n ≥ 0.

When n = 0, Q(0, k + 1) = 1 = C(k, k). Let n∗ be the smallest n for which Q(n, k + 1) ≠ C(n + k, k)
(well-ordering principle). Thus, n∗ > 0. By the induction hypothesis, Q(n∗, k) = C(n∗ + k − 1, k − 1).
Since n∗ is the smallest n which fails, Q(n∗ − 1, k + 1) = C(n∗ + k − 1, k).
By Exercise 13.5, Q(n, k) = Q(n, k − 1) + Q(n − 1, k), therefore

    Q(n∗, k + 1) = Q(n∗, k) + Q(n∗ − 1, k + 1)
                 = C(n∗ + k − 1, k − 1) + C(n∗ + k − 1, k)
                 = C(n∗ + k, k),

where in the last step we used the recursion C(n, k) = C(n − 1, k) + C(n − 1, k − 1). The last
expression shows that n∗ is not a counterexample, which is a contradiction. Therefore there is
no smallest counterexample, and so P(k + 1) is true, concluding the proof by induction.
(b) The expression for Q(n, k) follows by using (a) with n + k − 1 instead of n and k − 1 instead
of k. We prove that C(n, k) = n!/(k! (n − k)!). Define our claim,

    P(n): C(n, k) = n!/(k! (n − k)!) for 0 ≤ k ≤ n.
We prove by induction that P(n) is true for all n ≥ 1. First, we verify the base case n = 1:
C(1, 0) = 1 = 1!/(0! 1!) and similarly C(1, 1) = 1 = 1!/(1! 0!). For the induction, assume P(n) is true. Then

    C(n + 1, k) = C(n, k) + C(n, k − 1)                                   (recursion in (13.1))
                = n!/(k! (n − k)!) + n!/((k − 1)! (n − k + 1)!)            (induction hypothesis)
                = n!/((k − 1)! (n − k)!) × (1/k + 1/(n − k + 1))
                = n!/((k − 1)! (n − k)!) × (n + 1)/(k(n − k + 1))
                = (n + 1)!/(k! (n + 1 − k)!).

Therefore P(n + 1) is true, and, by induction, P(n) is true for all n ≥ 1.
(c) We denoted these numbers by F(n) in the solution to Exercise 13.6(a), where we showed that
F(1) = 2, F(2) = 3 and F(n) = F(n − 1) + F(n − 2) (the Fibonacci recursion). We prove
that F(n) = F_{n+2} by strong induction. The base cases n = 1, 2 are true because F_3 = 2 and
F_4 = 3. For the induction step, we have that

    F(n + 1) = F(n) + F(n − 1) = F_{n+2} + F_{n+1} = F_{n+3}.

(The first step is the recursion for F(n); the second is by the strong induction hypothesis;
and, the third uses the Fibonacci recursion.) By induction, F(n) = F_{n+2} for n ≥ 1.
(d) We denoted these numbers by B(n) in the solution to Exercise 13.6(c), where we showed that
B(1) = 2, B(2) = 4. For these two base cases, B(n) = F_{n+3} − 1. For the strong induction step,

    B(n + 1) = B(n) + B(n − 1) + 1 = F_{n+3} − 1 + F_{n+2} − 1 + 1 = F_{n+3} + F_{n+2} − 1 = F_{n+4} − 1.

(The first step is the recursion for B(n); the second is by the strong induction hypothesis;
and, the last step uses the Fibonacci recursion.) By induction, B(n) = F_{n+3} − 1 for n ≥ 1.
Pop Quiz 13.8.
(a) Everything in A maps to one element of B, f (1) = f (2) = f (3) = f (4) = 2.
(b) Trick question. 1-to-1 but not onto can only be done if |A| < |B|.
(c) Trick question. Not 1-to-1 but onto can only be done if |A| > |B|.
(d) F (1) = 2; F (2) = 3; F (3) = 4; F (4) = 5;
Pop Quiz 13.9.
(a) {3 red, 4 blue, 2 green}
(b) 000101000000
(c) Let 0^i 1 0^j 1 0^k be a binary sequence of length n + 2 with 2 ones. So, i, j, k ≥ 0 and
i + j + k = n. The sequence corresponds to the bag {i red, j blue, k green} containing i red
candies, j blue candies and k green candies.
Clearly this correspondence is 1-to-1, because two different sequences have different triples
(i, j, k), which map to different candy bags. For any candy bag, we can construct the sequence,
so the mapping is onto, hence a bijection.
Exercise 13.10.
(a) xi is the number of candies of color i and since there are 10 candies, Σ_i xi = 10.
Any goody bag with the 10 candies gives non-negative xi’s which sum to 10. Any non-
negative integer solution to x1 + · · · + x4 = 10 gives a candy bag with xi of candy i. We have
a bijection between the candy bags and the non-negative solutions, that is

    Q(10, 4) = C(10 + 4 − 1, 4 − 1) = C(13, 3) = number of non-negative solutions to x1 + · · · + x4 = 10.
(b) Let yi = xi − 1. Then the yi are non-negative and y1 + · · · + y4 = x1 + · · · + x4 − 4 = 6. Every
non-negative solution to y1 + · · · + y4 = 6 gives a positive solution to x1 + · · · + x4 = 10 and
vice versa. Therefore, the answer is Q(6, 4) = C(6 + 4 − 1, 4 − 1) = C(9, 3).
(c) Introduce a dummy variable x5 = 10 − (x1 + · · · + x4 ); then x5 ≥ 0 and x1 + · · · + x5 = 10. Every
non-negative solution to x1 + · · · + x5 = 10 gives a non-negative solution to x1 + · · · + x4 ≤ 10,
so the answer is Q(10, 5) = C(10 + 5 − 1, 5 − 1) = C(14, 4).
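All three stars-and-bars counts in (a)-(c) are small enough to confirm by direct enumeration (a quick check, our code):

```python
from itertools import product
from math import comb

# (a) non-negative solutions to x1+x2+x3+x4 = 10
eq10 = sum(1 for x in product(range(11), repeat=4) if sum(x) == 10)
assert eq10 == comb(13, 3)    # Q(10, 4)

# (b) positive solutions to x1+x2+x3+x4 = 10
pos10 = sum(1 for x in product(range(1, 11), repeat=4) if sum(x) == 10)
assert pos10 == comb(9, 3)    # Q(6, 4)

# (c) non-negative solutions to x1+x2+x3+x4 <= 10
le10 = sum(1 for x in product(range(11), repeat=4) if sum(x) <= 10)
assert le10 == comb(14, 4)    # Q(10, 5)
```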
(d) For the binary sequence b1 · · · b10 , construct the subset A containing the elements where bi = 1,

    A = {xi | bi = 1}.

Every binary sequence with 3 ones corresponds to a unique subset A with 3 elements, and
for every 3-subset A we can construct the binary sequence with 3 ones, so we have a
bijection. Therefore the number of such subsets equals the number of binary sequences with
3 ones, which is C(10, 3). In general, the number of k-subsets of an n-element set is C(n, k).
(e) A 3-subset corresponds uniquely to its complement (a 7-subset) and vice versa. Since we have
a bijection from 3-subsets to 7-subsets, C(10, 3) = C(10, 7). In general, C(n, k) = C(n, n − k).
(f) The bijection is exactly the same as in (d), but to n-bit binary sequences with k ones. The
ones indicate which elements are in the subset.
(g) Use the bijection in (e) from a k-subset to its complement, an (n − k)-subset.
(h) We proved using the product rule on page 184 that there are 2^n binary sequences of length
n. Each sequence uniquely describes a subset (the ones correspond to the elements in the
subset), so there are 2^n subsets of a set (see also Example 13.1 about Senate committees on
page 186). Here is another way to count the subsets using the sum rule:

    |{subsets}| = |{subsets of size 0}| + |{subsets of size 1}| + · · · + |{subsets of size n}|.

From parts (d) and (f), the number of subsets of size k is C(n, k), therefore

    2^n = |{subsets}| = C(n, 0) + C(n, 1) + · · · + C(n, n).

This problem illustrates one of the fundamental techniques for establishing that two combi-
natorial expressions are equal: count a set in two different ways; the answers must be equal.
In our example, one way of counting gave 2^n, and the other gave Σ_{k=0}^{n} C(n, k).
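Counting the subsets both ways is easy to confirm for a small n (a quick check, our code):

```python
from itertools import combinations
from math import comb

n = 8
items = range(n)
# count subsets of each size by enumeration and compare with C(n, k)
for k in range(n + 1):
    assert sum(1 for _ in combinations(items, k)) == comb(n, k)
# the two counts of all subsets agree
assert sum(comb(n, k) for k in range(n + 1)) == 2 ** n
```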
Pop Quiz 13.11. This problem is actually more complicated than it appears at first sight. The
king can be placed in one of 64 positions, removing 15 possible row-column squares for the queen.
How many diagonal squares are removed? It depends on where the king is.
(Two chessboard diagrams: on the left, kings shown on b2 and f3; on the right, the same kings
with two rings of squares marked (a black box and a red box), whose positions cover 9 and 11
diagonal squares respectively.)
On the left we show two possible king positions. The king on b2 covers 9 diagonal squares
(excluding its own square), but the king on f 3 covers 11 diagonal squares. You should verify for
the figure on the right that the positions on the black box all cover 9 diagonal squares and those
on the red box all cover 11 diagonal squares. Our observation suggests that we should use the sum
rule with four types of positions for the king: the outer-most ring of size 28 (covering 7 diagonal
squares) to inner-most ring of size 4 (covering 13 diagonal squares)
    ring size:                  28  20  12   4
    diagonal squares covered:    7   9  11  13
The number of positions available to the queen, given the position of the king is
64 − 15 row-column squares − number of diagonal squares covered.
Denoting the rings by 0, 1, 2, 3 (ring-0 is outermost), here are our observations:

    type of king position:                ring-0  ring-1  ring-2  ring-3
    number of possible king positions:        28      20      12       4
    diagonal squares covered:                  7       9      11      13
    number of possible queen positions:       42      40      38      36
Using the product rule within the types of king positions and the sum rule to add up the positions
of each type to get the total number of possible positions,

    number of possible positions = 28 × 42 + 20 × 40 + 12 × 38 + 4 × 36 = 2576.
For practice, consider a general n × n board. First, if the queen cannot be on the same row and
column as the king, the number of ways to specify the sequence cK rK cQ rQ is

    n × n × (n − 1) × (n − 1) = n^2 (n − 1)^2 .

If the queen cannot be on the same diagonal, the number of row-column squares covered by the
king is 2n − 1. In the outermost ring with 4n − 4 squares, the number of diagonals covered is n − 1, so
the number of possible queen positions is n^2 − (2n − 1) − (n − 1). Each time you move in by one ring,
the number of possible king positions decreases by 8 and the number of diagonals covered increases
by 2, so the number of possible queen positions decreases by 2. Thus, in ring-i, the number of
possible king positions is 4n − 4 − 8i and queen positions is n^2 − 3n + 2 − 2i = (n − 1)(n − 2) − 2i.
The number of rings is n/2 when n is even. When n is odd, the number of rings is (n − 1)/2 plus
the single center square. Using the product and sum rule for the rings,
    number of possible positions = Σ_{i=0}^{k} (4n − 4 − 8i)((n − 1)(n − 2) − 2i),

where k = n/2 − 1 if n is even and k = (n − 1)/2 − 1 when n is odd (when n is odd, there is also
the center square to consider). We use techniques from Chapter 9 to compute the sum.
    number of possible positions
        = 4 Σ_{i=0}^{k} (n − 1 − 2i)((n − 1)(n − 2) − 2i)
        = 4 [ (n − 1)^2 (n − 2) Σ_{i=0}^{k} 1 − 2(n − 1)^2 Σ_{i=0}^{k} i + 4 Σ_{i=0}^{k} i^2 ]
        = 4 [ (k + 1)(n − 1)^2 (n − 2) − k(k + 1)(n − 1)^2 + (2/3) k(k + 1)(2k + 1) ].
When n is even, we plug in k = n/2 − 1 to get

    number of possible positions = (1/3) n(n − 1)(n − 2)(3n − 1).

When n is odd, we plug in k = (n − 1)/2 − 1 and add the n^2 − 4n + 3 positions with the king on
the center square. The resulting formula is the same.
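Both the ring-by-ring count for the 8 × 8 board and the general closed form can be checked by brute force over all king-queen placements (our helper name; the board is small enough to enumerate):

```python
def non_attacking_kq(n):
    """Brute-force count of ordered (king, queen) placements on an n x n board
    with the queen not attacking, and not sharing a square with, the king."""
    count = 0
    squares = [(r, c) for r in range(n) for c in range(n)]
    for kr, kc in squares:
        for qr, qc in squares:
            if (qr, qc) == (kr, kc):
                continue  # same square
            if qr == kr or qc == kc or abs(qr - kr) == abs(qc - kc):
                continue  # same row, column, or diagonal
            count += 1
    return count

assert non_attacking_kq(8) == 2576
# closed form from the text, valid for both even and odd n
assert all(non_attacking_kq(n) == n * (n - 1) * (n - 2) * (3 * n - 1) // 3
           for n in range(3, 9))
```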
Pop Quiz 13.12. To specify the positions of the 8 castles, we specify the sequence
(c1 r1 )(c2 r2 )(c3 r3 )(c4 r4 )(c5 r5 )(c6 r6 )(c7 r7 )(c8 r8 )
for the column and row of each castle. For the columns, there are 8 × 7 × · · · × 1 = 8! ways. For
the rows, there are 8 × 7 × · · · × 1 = 8! ways. For the rows and columns, by the product rule,
there are (8!)2 ways. Consider one such sequence, for example, (a1)(b2)(c3)(d4)(e5)(f 6)(g7)(h8).
If we reorder some positions, for example to (b2)(a1)(c3)(d4)(e5)(f 6)(g7)(h8), we get a different
sequence but the same position. There are 8 × 7 × · · · × 1 = 8! possible reorderings of this
position sequence, so this means that every position corresponds to 8! different sequences, a 1-to-
8! mapping. By the multiplicity rule, there are 8! times as many sequences as there are positions.
So the number of positions is (8!)2 /8! = 8! = 40320.
Another way to quickly get this result is to observe that we may specify the position with the
rows increasing. So every position is specified by a sequence of the form
(c1 1)(c2 2)(c3 3)(c4 4)(c5 5)(c6 6)(c7 7)(c8 8).
That is, we only get to choose the columns, and there are 8! ways to do that.
Exercise 13.13.
(a) A poker hand is a subset of 5 cards from 52. The number of poker hands equals the number
of possible subsets, which is C(52, 5) = 52!/(5! × 47!) = 2598960.
(b) The idea for such problems is to give a sequence of instructions that uniquely constructs the
object. Your instructions must unambiguously construct the object. Effectively, you construct
a bijection between sequences of instructions and the objects. Now count the sequences. Here
is a “recipe” to construct a 4-of-a-kind poker hand:
1: Choose a value v and pick all four cards of value v: ♠v ♥v ♦v ♣v.
2: Choose one of the other cards of a different value c.
The sequence vc completely specifies the 4-of-a-kind. Change any part of the sequence and
you get a different 4-of-a-kind hand. We have a bijection between sequences vc and 4-of-a-
kind hands. Counting the sequences is “easy”. There are 13 possible choices for v and for each
choice of v there are 48 choices for c. By the product rule,

    |{4-of-a-kind hands}| = |{sequences vc}| = 13 × 48 = 624.
(c) To construct a flush, here is a recipe:
    1: Pick the suit s, either ♠, ♥, ♦ or ♣.
    2: Choose a set of 5 values V = {v1 , v2 , v3 , v4 , v5 } from the 13 values in suit s.
The sequence sV completely specifies the flush. There are 4 choices for s. Given s, V is a
5-subset of the 13 values, so there are C(13, 5) choices for this subset. By the product rule,

    |{flushes}| = |{sequences sV}| = 4 × C(13, 5) = 5148.
(d) To construct a full-house, here is a recipe:
    1: Choose a value v1 .
    2: Choose T1 = {s1 , s2 , s3 }, a set of 3 suits having value v1 .
    3: Choose a second value v2 ≠ v1 .
    4: Choose T2 = {s1 , s2 }, a set of 2 suits having value v2 .
The sequence v1 T1 v2 T2 completely specifies the full-house. There are 13 choices for v1 ; given
v1 , T1 is a 3-subset of the 4 suits, which can be picked in C(4, 3) ways; given v1 T1 , v2 has 12
choices (since v2 ≠ v1 ); given v1 T1 v2 , T2 is a 2-subset of the 4 suits, which can be picked in
C(4, 2) ways. By the product rule,

    |{full-houses}| = |{sequences v1 T1 v2 T2 }| = 13 × C(4, 3) × 12 × C(4, 2) = 3744.
(e) To construct a 3-of-a-kind, here is a recipe:
    1: Choose a value v.
    2: Choose T = {s1 , s2 , s3 }, a set of 3 suits having value v.
    3: Choose c1 , a card of value v1 ≠ v.
    4: Choose c2 , a card of value v2 ≠ v or v1 .
Let’s count the sequences vT c1 c2 : there are 13 choices for v; given v, T is a 3-subset of the
4 suits, which can be picked in C(4, 3) ways; there are 48 choices for c1 (because v1 ≠ v); there
are 44 choices for c2 (because v2 ≠ v or v1 ). By the product rule,

    |{3-of-a-kinds}| = |{sequences vT c1 c2 }| = 13 × C(4, 3) × 48 × 44 = 109824.

WRONG! The mistake is similar to the issue with counting positions of two castles (indistin-
guishable pieces) on a chessboard versus a king and a queen (distinguishable pieces). The prob-
lem is with the last two cards in the 3-of-a-kind. (♠A,♥A,♣A,♥7,♣2) and (♠A,♥A,♣A,♣2,♥7)
are the same hand. However, the two sequences vT c1 c2 are different. So two different sequences
map to the same hand: we do not have a 1-to-1 mapping.
There are many ways to resolve this problem. Observe that every hand maps to two sequences.
Using the multiplicity rule, there are twice as many sequences as hands: the number of hands
is 109824/2 = 54912. Alternatively, we view the remaining 48 cards in some order and pick
c1 c2 with c1 < c2 (as we did with the castle positions). The number of ways to pick c1 < c2
is half the number of ways to pick c1 c2 (the other half have c1 > c2 ). The systematic route is
to give a recipe that uniquely constructs a 3-of-a-kind hand:
1: Choose a value v
2: Choose T = {s1 , s2 , s3 }, a set of 3 suits having value v.
3: Choose a pair of values V = {v1 , v2 } from the remaining 12 values.
4: Choose x1 , a suit from value v1 .
5: Choose x2 , a suit from value v2 .
Now, the sequence vT V x1 x2 uniquely constructs a 3-of-a-kind hand and we can count the
sequences. There are 13 choices for v; given v, T is a 3-subset of the 4 suits, which can be
picked in C(4, 3) ways; V is a 2-subset of the remaining 12 values, with C(12, 2) choices; given V,
x1 and x2 each have 4 choices. By the product rule,

    |{3-of-a-kinds}| = |{sequences vT V x1 x2 }| = 13 × C(4, 3) × C(12, 2) × 4 × 4 = 54912.
(f) To construct a two-pair, here is a recipe:
    1: Choose V = {v1 , v2 }, the values for each pair.
    2: Choose S1 = {s1 , s2 }, a set of 2 suits having value v1 .
    3: Choose S2 = {s1 , s2 }, a set of 2 suits having value v2 .
    4: Choose the 5th card c from the 44 not of value v1 , v2 .
To count the sequences V S1 S2 c: there are C(13, 2) choices for the 2-subset of values from 13; S1
is a 2-subset of the 4 suits, which can be picked in C(4, 2) ways; similarly, there are C(4, 2) ways to
pick S2 ; lastly, there are 44 choices for c. By the product rule,

    |{two-pairs}| = |{sequences V S1 S2 c}| = C(13, 2) × C(4, 2) × C(4, 2) × 44 = 123552.
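All of the counts in this exercise reduce to small binomial arithmetic, which math.comb confirms directly (our assertions, including the factor-of-2 overcount discussed in (e)):

```python
from math import comb

assert comb(52, 5) == 2598960                                  # all poker hands
assert 13 * 48 == 624                                          # 4-of-a-kind
assert 4 * comb(13, 5) == 5148                                 # flushes
assert 13 * comb(4, 3) * 12 * comb(4, 2) == 3744               # full houses
assert 13 * comb(4, 3) * comb(12, 2) * 4 * 4 == 54912          # 3-of-a-kind
assert 13 * comb(4, 3) * 48 * 44 == 2 * 54912                  # the naive count is exactly 2x
assert comb(13, 2) * comb(4, 2) * comb(4, 2) * 44 == 123552    # two-pair
```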
Chapter 14
Exercise 14.1.
(a) We have a sequence of length n = 8 with 3 a’s, 2 r’s, 1 d, 1 v and 1 k. The number of
sequences is C(8; 3, 2, 1, 1, 1) = 8!/(3! × 2! × 1! × 1! × 1!) = 3360.
(b) A bouquet is exactly analogous to a goody-bag. The bouquet has 36 objects of 4 colors. The
number of bouquets is Q(36, 4) = C(36 + 4 − 1, 4 − 1) = C(39, 3) = 9139.
(c) Let the students be s1 s2 · · · s25 .
    (i) There are 5 types of students: those assigned to cooking, cleaning, laundry, entertain-
    ment, groceries. There are 5 of each type of student, so we need a sequence of length 25
    with 5 of each type. This can be done in C(25; 5, 5, 5, 5, 5) ways:

        C(25; 5, 5, 5, 5, 5) = 25!/(5!)^5 ≈ (25^25 e^−25 √(50π)) / (5^25 e^−25 (10π)^(5/2))
                             = 5^25 × √(50π)/(10π)^(5/2) ≈ 6.75 × 10^14.

    (We used Stirling’s formula, n! ≈ n^n e^−n √(2πn). The exact answer is about 6.234 × 10^14.)
    (ii) For each task we pick a subset of 5 students to perform the task, which can be done in
    C(25, 5) ways. By the product rule, the number of ways to perform the tasks is C(25, 5)^5:

        C(25, 5)^5 = 53130^5 ≈ 4.2 × 10^23.
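A quick numerical check of (a)-(c) with Python's math module (our assertions):

```python
from math import comb, factorial

# (a) 8!/(3! 2! 1! 1! 1!)
assert factorial(8) // (factorial(3) * factorial(2)) == 3360
# (b) Q(36, 4) = C(39, 3)
assert comb(39, 3) == 9139
# (c)(i) exact multinomial 25!/(5!)^5 is about 6.234e14
exact = factorial(25) // factorial(5) ** 5
assert 6.2e14 < exact < 6.3e14
# (c)(ii) C(25,5)^5 is about 4.2e23
assert comb(25, 5) == 53130
assert 4.2e23 < comb(25, 5) ** 5 < 4.3e23
```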
Exercise 14.2.
(a) For this problem we use the binomial and multinomial theorems:
    (i) We want the coefficient of 1^4 x^5 , which is C(9, 4) = 126.
    (ii) The coefficient of (2x)^4 (3y)^3 is C(7, 4) = 35, which gives 35(2x)^4 (3y)^3 = 35 · 2^4 · 3^3 x^4 y^3 ,
    so the coefficient of x^4 y^3 is 35 × 2^4 × 3^3 = 15120.
    (iii) x^5 y^8 is the coefficient of x (x^2)^2 (y^2)^4 , which is C(7; 1, 2, 4) = 105.
(b) Monomials in (x + y)^n are x^i y^j . There are n + 1 possible i (0, 1, . . . , n), and given i, j = n − i.
So, the number of different monomials is n + 1.
For (x + y + z)^n , the monomials are specified by (i, j, k) where i is the power of x, j is the
power of y and k is the power of z: i ranges from 0 to n; given i, j ranges from 0 to n − i,
which is (n − i + 1) choices for j; given i and j, k is n − i − j. So the number of monomials is

    Σ_{i=0}^{n} (n − i + 1) = (n + 1) + n + · · · + 1 = ½ (n + 1)(n + 2).
(c) The terms are n-sequences with tokens a1 , . . . , ak , so there are k^n terms. The number of
sequences with i1 a1 ’s, i2 a2 ’s, . . . , ik ak ’s is C(n; i1 , i2 , . . . , ik ), and each such sequence equals
a1^{i1} a2^{i2} · · · ak^{ik} , where i1 ≥ 0, . . . , ik ≥ 0 and i1 + · · · + ik = n. We get the multinomial
theorem:

    (a1 + a2 + · · · + ak )^n = Σ_{i1 + ··· + ik = n, ij ≥ 0} C(n; i1 , i2 , . . . , ik ) a1^{i1} a2^{i2} · · · ak^{ik} .
(d) The result is immediate from setting a1 = a2 = · · · = ak = 1 in part (c).
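The multinomial theorem and part (d) can be verified numerically for a small case (the helper names multinomial and compositions are ours):

```python
from math import factorial

def multinomial(n, ks):
    """C(n; i1,...,ik) = n!/(i1! i2! ... ik!)."""
    c = factorial(n)
    for k in ks:
        c //= factorial(k)
    return c

def compositions(n, parts):
    """All tuples (i1,...,ik) of non-negative integers summing to n."""
    if parts == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, parts - 1):
            yield (first,) + rest

n, a = 5, (2, 3, 7)   # check (2 + 3 + 7)^5 term by term
rhs = sum(multinomial(n, i) * a[0]**i[0] * a[1]**i[1] * a[2]**i[2]
          for i in compositions(n, 3))
assert rhs == sum(a) ** n
# part (d): setting every aj = 1 gives sum of multinomial coefficients = k^n
assert sum(multinomial(n, i) for i in compositions(n, 3)) == 3 ** n
```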
Pop Quiz 14.3. For convenience we repeat the derivation here, with each step labeled:

    |A1 ∪ A2 ∪ A3 | =(a) |A1 ∪ A2 | + |A3 | − |(A1 ∪ A2 ) ∩ A3 |
                    =(b) |A1 | + |A2 | + |A3 | − |A1 ∩ A2 | − |(A1 ∩ A3 ) ∪ (A2 ∩ A3 )|
                    =(c) |A1 | + |A2 | + |A3 | − |A1 ∩ A2 | − (|A1 ∩ A3 | + |A2 ∩ A3 | − |A1 ∩ A3 ∩ A2 ∩ A3 |)
                    =(d) |A1 | + |A2 | + |A3 | − |A1 ∩ A2 | − |A1 ∩ A3 | − |A2 ∩ A3 | + |A1 ∩ A2 ∩ A3 |.
(a) Let B = A1 ∪ A2 . Using inclusion-exclusion for two sets, |B ∪ A3 | = |B| + |A3 | − |B ∩ A3 |.
(b) Apply two-set inclusion-exclusion to |A1 ∪ A2 | and use the distributive property of intersection,
(A1 ∪ A2 ) ∩ A3 = (A1 ∩ A3 ) ∪ (A2 ∩ A3 ) (see Figure 2.1 on page 17).
(c) Let A = A1 ∩ A3 and B = A2 ∩ A3 and apply two-set inclusion-exclusion to |A ∪ B|.
(d) |A1 ∩ A3 ∩ A2 ∩ A3 | = |A1 ∩ A2 ∩ A3 |.
Exercise 14.4.
(a) |A1 ∪ A2 ∪ A3 ∪ A4 | =
+ (|A1 | + |A2 | + |A3 | + |A4 |)
− (|A1 ∩ A2 | + |A1 ∩ A3 | + |A1 ∩ A4 | + |A2 ∩ A3 | + |A2 ∩ A4 | + |A3 ∩ A4 |)
+ (|A1 ∩ A2 ∩ A3 | + |A1 ∩ A2 ∩ A4 | + |A1 ∩ A3 ∩ A4 | + |A2 ∩ A3 ∩ A4 |)
− (|A1 ∩ A2 ∩ A3 ∩ A4 |)
(i) We give a proof that will generalize to the induction step for the general case.
|A1 ∪ A2 ∪ A3 ∪ A4 | = |A1 ∪ A2 ∪ A3 | + |A4 | − |(A1 ∪ A2 ∪ A3 ) ∩ A4 |
                     = |A1 ∪ A2 ∪ A3 | + |A4 | − |(A1 ∩ A4 ) ∪ (A2 ∩ A4 ) ∪ (A3 ∩ A4 )|
We can apply 3-set inclusion-exclusion to the first and third terms:
    |A1 ∪ A2 ∪ A3 | = Σ_{k=1}^{3} (−1)^(k+1) × (sum of k-way intersections involving A1 , A2 , A3 )

    |(A1 ∩ A4 ) ∪ (A2 ∩ A4 ) ∪ (A3 ∩ A4 )|
        = Σ_{k=1}^{3} (−1)^(k+1) × (sum of k-way intersections involving A1 ∩ A4 , A2 ∩ A4 , A3 ∩ A4 )

Crucial observation: a k-way intersection involving A1 ∩ A4 , A2 ∩ A4 , A3 ∩ A4 is the
intersection of A4 with the corresponding k-way intersection involving A1 , A2 , A3 . For
example, (A1 ∩ A4 ) ∩ (A2 ∩ A4 ) = A1 ∩ A2 ∩ A4 . We conclude that
    |A1 ∪ A2 ∪ A3 ∪ A4 |
        = Σ_{k=1}^{3} (−1)^(k+1) × (sum of k-way intersections involving A1 , A2 , A3 )
        + |A4 | + Σ_{k=1}^{3} (−1)^(k+2) × (sum of A4 ∩ k-way intersections involving A1 , A2 , A3 )

The summands in the last term are the (k + 1)-way intersections involving A4 . The
summands in the first term are the k-way intersections that do not include A4 .
    |A1 ∪ A2 ∪ A3 ∪ A4 |
        = Σ_{k=1}^{3} (−1)^(k+1) × (sum of k-way intersections not involving A4 )
        + |A4 | + Σ_{k=1}^{3} (−1)^(k+2) × (sum of (k + 1)-way intersections involving A4 )
The last two terms are all k-way intersections that involve A4 with k = 1, . . . , 4. So,
|A1 ∪ A2 ∪ A3 ∪ A4 | = Σ_{k=1}^{3} (−1)^{k+1} · (sum of k-way intersections not involving A4 )
                     + Σ_{k=1}^{4} (−1)^{k+1} · (sum of k-way intersections involving A4 )
Summing over k-way intersections involving A4 and those not involving A4 amounts to
summing over all k-way intersections:
|A1 ∪ A2 ∪ A3 ∪ A4 | = Σ_{k=1}^{4} (−1)^{k+1} · (sum of k-way intersections)
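As a quick sanity check, the four-set formula can be verified mechanically on concrete sets. The following sketch (not part of the book's solution; the sets A1–A4 are arbitrary examples) forms every k-way intersection with itertools:

```python
# Verify 4-set inclusion-exclusion: sum over k of (-1)^(k+1) times the
# sizes of all k-way intersections equals the size of the union.
from itertools import combinations

def union_size_by_inclusion_exclusion(sets):
    total = 0
    for k in range(1, len(sets) + 1):
        for combo in combinations(sets, k):
            total += (-1) ** (k + 1) * len(set.intersection(*combo))
    return total

A1 = {1, 2, 3, 4}; A2 = {3, 4, 5}; A3 = {1, 4, 6}; A4 = {2, 4, 7, 8}
assert union_size_by_inclusion_exclusion([A1, A2, A3, A4]) == len(A1 | A2 | A3 | A4)
```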


(ii) The proof for general n by induction mimics the proof for n = 4. The base case is n = 2,
which we have already established. For the induction step, consider any n + 1 sets.
|A1 ∪ A2 ∪ A3 ∪ · · · ∪ An ∪ An+1 |
= |A1 ∪ A2 ∪ · · · ∪ An | + |An+1 | − |(A1 ∪ A2 ∪ · · · ∪ An ) ∩ An+1 |
= |A1 ∪ A2 ∪ · · · ∪ An | + |An+1 | − |(A1 ∩ An+1 ) ∪ (A2 ∩ An+1 ) ∪ · · · ∪ (An ∩ An+1 )|
The first term, by the induction hypothesis, sums over the k-way intersections not involving An+1 .
The last two terms sum over the k-way intersections involving An+1 ,
|A1 ∪ A2 ∪ A3 ∪ · · · ∪ An ∪ An+1 |
= Σ_{k=1}^{n} (−1)^{k+1} · (sum of k-way intersections not involving An+1 )
+ Σ_{k=1}^{n+1} (−1)^{k+1} · (sum of k-way intersections involving An+1 )
= Σ_{k=1}^{n+1} (−1)^{k+1} · (sum of k-way intersections).

That is, the formula holds for a union of n + 1 sets, and by induction for all n ≥ 2.
(b) Let A2 , A3 , A5 , A7 be the sets of numbers from 1 to 2015 that are divisible by 2,3,5,7 respec-
tively. We want 2015 − |A2 ∪ A3 ∪ A5 ∪ A7 |. Here are two facts:
Lemma 30.5. There are ⌊ n/k ⌋ numbers from 1 to n that are divisible by k.
Lemma 30.6. x is divisible by d1 , d2 , . . . , dk if and only if x is divisible by the least common
multiple of d1 , d2 , . . . , dk .
To compute |A2 ∪ A3 ∪ A5 ∪ A7 |, we need the sizes of all the k-way intersections. For example,
|A2 | = ⌊ 2015/2 ⌋ and |A2 ∩ A3 | = ⌊ 2015/6 ⌋ because A2 ∩ A3 contains numbers divisible by 2
and 3, and hence by 6. Define Aij = Ai ∩ Aj and similarly Aijk and Aijkl . We have:
|A2 | = 1007; |A3 | = 671; |A5 | = 403; |A7 | = 287.
|A23 | = 335; |A25 | = 201; |A27 | = 143; |A35 | = 134; |A37 | = 95; |A57 | = 57.
|A235 | = 67; |A237 | = 47; |A257 | = 28; |A357 | = 19.
|A2357 | = 9.
Therefore, for |A2 ∪ A3 ∪ A5 ∪ A7 | we get:
1007 + 671 + 403 + 287 − 335 − 201 − 143 − 134 − 95 − 57 + 67 + 47 + 28 + 19 − 9 = 1555.
We conclude that 2015 − 1555 = 460 numbers are not divisible by any of {2, 3, 5, 7}.
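A brute-force count (an illustrative sketch, not part of the book's solution) confirms the inclusion-exclusion answer; note that ⌊2015/42⌋ = 47, since 42 × 48 = 2016 > 2015:

```python
# Count the numbers in 1..2015 divisible by none of 2, 3, 5, 7.
n, primes = 2015, [2, 3, 5, 7]
coprime_count = sum(1 for x in range(1, n + 1)
                    if all(x % p != 0 for p in primes))
print(coprime_count)
```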
(c) We count the ways to distribute the hats so that some man gets the right hat and subtract
from 4!, the number of ways to distribute the hats. Let A1 be the orderings in which man
1 gets his correct hat; similarly define A2 , A3 , A4 . The number of ways in which some man
gets the right hat is |A1 ∪ A2 ∪ A3 ∪ A4 |. For inclusion-exclusion, we need the intersections.
Let A12 = A1 ∩ A2 be the orderings in which men 1 and 2 get the correct hats; similarly define
A13 , A14 , A23 , A24 , A34 and so on. Now, |Ai | = 3! (3! ways to distribute the other 3 hats after
giving man i his hat). Similarly, |Aij | = 2! because there are 2! ways to distribute the other 2
hats after giving men i and j their hats; |Aijk | = 1!; and, |Aijkl | = 0!. We have
|A1 | = |A2 | = |A3 | = |A4 | = 6.
|A12 | = |A13 | = |A14 | = |A23 | = |A24 | = |A34 | = 2.
|A123 | = |A124 | = |A134 | = |A234 | = 1.
|A1234 | = 1.
Applying the inclusion-exclusion formula,
|A1 ∪ A2 ∪ A3 ∪ A4 | = 4 × 6 − 6 × 2 + 4 × 1 − 1 = 15.
The answer is 4! − 15 = 9. Distributing n objects so that no object goes into its correct spot is
a derangement. Example 14.4 on page 206 discusses how to count derangements of n objects.
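A short check of this count (a sketch, not from the book): test all 4! = 24 hat distributions directly.

```python
# Count the distributions where no man gets his own hat (derangements of 4).
from itertools import permutations

no_correct = sum(1 for p in permutations(range(4))
                 if all(p[i] != i for i in range(4)))
print(no_correct)  # 9, matching 4! - 15
```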

Exercise 14.5.


(a) Let A12 be the permutations containing 12 and A24 the permutations containing 24. There
are 10! passwords. The invalid passwords are in A12 ∪ A24 , so the number of valid passwords
is 10! − |A12 ∪ A24 |. By inclusion-exclusion,
|A12 ∪ A24 | = |A12 | + |A24 | − |A12 ∩ A24 |.
To count passwords containing 12, treat 12 as a single token. We want a permutation of
0 12 3456789. There are 9! permutations, i.e. 9! such passwords. Similarly, there are 9!
passwords containing 24. Therefore |A12 | = |A24 | = 9!. The passwords containing 12 and 24
must contain 124. Treating 124 as a single token, we need the permutations of 0 124 356789,
of which there are 8!. Thus, |A12 ∩ A24 | = 8!. The number of valid passwords is
10! − 2 × 9! + 8! = 2943360.
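The same argument can be checked by brute force on a smaller analog (an illustrative sketch, not from the book): permutations of the digits 0–5 containing neither "12" nor "24", where inclusion-exclusion gives 6! − 2 × 5! + 4! = 504.

```python
# Brute force versus inclusion-exclusion on the 6-digit analog of the problem.
from itertools import permutations
from math import factorial

brute = sum(1 for p in permutations("012345")
            if "12" not in "".join(p) and "24" not in "".join(p))
formula = factorial(6) - 2 * factorial(5) + factorial(4)
assert brute == formula == 504
```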
(b) Each element in A can map to m elements in B, so the total number of functions is m^n .
Let B = {b1 , b2 , . . . , bm }. Let F1 be the functions which do not have b1 in the range (so no
element of A maps to b1 ); similarly define F2 , . . . , Fm . A function is not onto if one of the bi ’s
is not in the range. That is, the functions which are not onto are in F1 ∪ F2 ∪ · · · ∪ Fm . Let
us compute the size of a k-way intersection |Fi1 ∩ Fi2 ∩ · · · ∩ Fik | – the functions which do not
use bi1 , bi2 , . . . , bik . The function can map each element of A to m − k of the bi ’s, so there are
(m − k)^n such functions. Since there are (m choose k) such k-way intersections, by inclusion-exclusion,
|F1 ∪ F2 ∪ · · · ∪ Fm | = Σ_{k=1}^{m} (−1)^{k+1} (m choose k) (m − k)^n .

The number of onto functions is m^n − |F1 ∪ F2 ∪ · · · ∪ Fm |,
number of onto functions from [n] to [m] = m^n − Σ_{k=1}^{m} (−1)^{k+1} (m choose k) (m − k)^n
                                         = Σ_{k=0}^{m} (−1)^k (m choose k) (m − k)^n .
Recall that in the solution to Exercise 13.3(c) we introduced {n m} (Stirling numbers of the
second kind): the number of ways to distribute n named objects (the elements of A) into m
unnamed (indistinguishable) bins (the elements of B) so that no bin is empty. Here, the bins
are distinguishable, so multiplying by m!, the number of ways to label the bins with the labels
b1 , . . . , bm , we have that m! {n m} is the number of ways to distribute the n objects (A) into
m bins (B), each such way being an onto function from A to B, so
number of onto functions from [n] to [m] = m! {n m} = Σ_{k=0}^{m} (−1)^k (m choose k) (m − k)^n .
This gives us another formula for the Stirling number, {n m} = (1/m!) Σ_{k=0}^{m} (−1)^k (m choose k) (m − k)^n .
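The onto-function formula is easy to test for small n, m (a sketch, not from the book; the choice n = 5, m = 3 is an arbitrary example):

```python
# Compare a brute-force count of onto functions [n] -> [m] with the
# inclusion-exclusion formula.
from itertools import product
from math import comb

def onto_brute(n, m):
    # A function is a tuple f with f[i] in {0, ..., m-1}; onto means every value appears.
    return sum(1 for f in product(range(m), repeat=n) if len(set(f)) == m)

def onto_formula(n, m):
    return sum((-1) ** k * comb(m, k) * (m - k) ** n for k in range(m + 1))

assert onto_brute(5, 3) == onto_formula(5, 3) == 150  # 3! x Stirling(5,3)
```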
(c) If there were no upper bound constraints, we know how to compute the number of solutions:
number of solutions to x1 + x2 + x3 = 30 with x1 , x2 , x3 ≥ 0 is Q(30, 3) = (32 choose 2) = 496.
(We used Q(n, k) = (n+k−1 choose k−1).) Let S1 be the solutions which violate the upper bound
constraint on x1 . So S1 contains solutions to x1 + x2 + x3 = 30 with x1 ≥ 11 and x2 , x3 ≥ 0.
Similarly, define S2 , the solutions to x1 + x2 + x3 = 30 with x2 ≥ 16 and x1 , x3 ≥ 0 and S3 ,
the solutions to x1 + x2 + x3 = 30 with x3 ≥ 21 and x1 , x2 ≥ 0. The number of solutions
which satisfy the upper bound constraints is Q(30, 3) − |S1 ∪ S2 ∪ S3 |. By inclusion-exclusion,
|S1 ∪ S2 ∪ S3 | = |S1 | + |S2 | + |S3 | − |S1 ∩ S2 | − |S1 ∩ S3 | − |S2 ∩ S3 | + |S1 ∩ S2 ∩ S3 |.
To get |S1 |, observe that solutions to x1 + x2 + x3 = 30 with x1 ≥ 11 and x2 , x3 ≥ 0 are
solutions to x1 + x2 + x3 = 19 with x1 , x2 , x3 ≥ 0 after subtracting 11 from x1 (and vice
versa). Therefore |S1 | = Q(19, 3). Similarly, |S2 | = Q(14, 3) and |S3 | = Q(9, 3). S1 ∩ S2
contains solutions to x1 + x2 + x3 = 30 with x1 ≥ 11, x2 ≥ 16, x3 ≥ 0. These solutions give
solutions to x1 + x2 + x3 = 3 with x1 , x2 , x3 ≥ 0 after subtracting 11 from x1 and 16 from
x2 , so |S1 ∩ S2 | = Q(3, 3). Similarly, |S1 ∩ S3 | = Q(−2, 3) = 0 and |S2 ∩ S3 | = Q(−7, 3) = 0.
Finally, S1 ∩ S2 ∩ S3 contains solutions to x1 + x2 + x3 = 30 with x1 ≥ 11, x2 ≥ 16, x3 ≥ 21


which are solutions to x1 + x2 + x3 = −18 with x1 , x2 , x3 ≥ 0 after subtracting 11 from x1 ,


16 from x2 and 21 from x3 . So, |S1 ∩ S2 ∩ S3 | = Q(−18, 3) = 0. Putting all this together,
|S1 ∪ S2 ∪ S3 | = Q(19, 3) + Q(14, 3) + Q(9, 3) − Q(3, 3) − 0 − 0 + 0
               = (21 choose 2) + (16 choose 2) + (11 choose 2) − (5 choose 2) = 375.
Our answer is 496 − 375 = 121 solutions.
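A direct enumeration confirms this (a sketch, not from the book): the violated bounds x1 ≥ 11, x2 ≥ 16, x3 ≥ 21 correspond to the constraints x1 ≤ 10, x2 ≤ 15, x3 ≤ 20.

```python
# Count solutions to x1 + x2 + x3 = 30 with 0 <= x1 <= 10, 0 <= x2 <= 15, 0 <= x3 <= 20.
count = sum(1
            for x1 in range(11)
            for x2 in range(16)
            if 0 <= 30 - x1 - x2 <= 20)
print(count)
```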
(d) The numbers in [n] which are not relatively prime to n are divisible by p1 or p2 or . . . or pm .
Let Ai be the numbers in [n] that are divisible by pi . Then the numbers in [n] which are not
relatively prime to n are in A1 ∪ · · · ∪ Am , and so
ϕ(n) = n − |A1 ∪ · · · ∪ Am |.
To compute the size of the union, we need the sizes of k-way intersections |Ai1 ∩ · · · ∩ Aik |.
Since Ai1 ∩ · · · ∩ Aik contains the numbers divisible by pi1 and pi2 and . . . pik ,
|Ai1 ∩ · · · ∩ Aik | = ⌊ n/(pi1 pi2 · · · pik ) ⌋ = n/(pi1 pi2 · · · pik ),
where the last equality follows because n is divisible by pi1 pi2 · · · pik . By inclusion-exclusion,
|A1 ∪ · · · ∪ Am | = Σ_{k=1}^{m} (−1)^{k+1} · (sum over all k-way products of n/(pi1 pi2 · · · pik )).
Therefore, Euler’s totient function is
ϕ(n) = n − Σ_{k=1}^{m} (−1)^{k+1} · (sum over all k-way products of n/(pi1 pi2 · · · pik ))
     = n Σ_{k=0}^{m} (−1)^k · (sum over all k-way products of 1/(pi1 pi2 · · · pik )).
Let us compare this expression with the formula
ϕ(n) = n ∏_{i=1}^{m} (1 − 1/pi ) = n (1 − 1/p1 )(1 − 1/p2 ) · · · (1 − 1/pm ).
If you multiply out the RHS, you get 2^m terms. Each term is the product of k reciprocal
primes, 1/(pi1 pi2 · · · pik ), with sign (−1)^k . Each term in this expansion matches one term in our
inclusion-exclusion sum, and so the two expressions are equal.
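The product formula can be checked against a direct gcd count (a sketch, not from the book; n = 210 = 2 × 3 × 5 × 7 is an arbitrary example):

```python
# Euler's totient: product formula versus direct count of x with gcd(x, n) = 1.
from math import gcd
from fractions import Fraction

n, primes = 210, [2, 3, 5, 7]
formula = Fraction(n)
for p in primes:
    formula *= Fraction(p - 1, p)   # multiply by (1 - 1/p)
direct = sum(1 for x in range(1, n + 1) if gcd(x, n) == 1)
assert direct == int(formula) == 48
```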
Exercise 14.6.
(a) This is the same result as social twins, except with social enemies.
(b) You need to research two facts to solve this problem:
Population of New York City: larger than 8 million.
Human head-hairs: the estimate is at most 200,000. So let’s be safe and say 1 million.
Each person in New York City is a pigeon. The pigeonholes are 0, 1, . . . , 10^6 . Each person is
placed in the pigeonhole corresponding to the number of hairs on his/her head. Since there
are more people than pigeonholes, at least two people occupy the same pigeonhole. Those
two people have the same number of head-hairs.
(c) Define pigeonholes 0, 1, . . . , k −1. There are k pigeonholes. The k +1 numbers are the pigeons.
Place a number x in pigeonhole i if the remainder when x is divided by k is i. That is,
pigeonhole(x) = rem(x, k).
There are more numbers than pigeonholes, so at least two numbers x1 and x2 are in the same
pigeonhole. That is, rem(x1 , k) = rem(x2 , k), or x1 ≡ x2 (mod k) and k|x1 − x2 .
(d) There are 7 days of the week (pigeonholes). Place 8 guests (pigeons) in pigeonholes by the day
of the week on which they were born, there will be at least two guests in the same pigeonhole.
Even with infinitely many guests, none may be born on Monday. This is a very important
aspect of the pigeonhole principle. You can guarantee two are born on the same day with 8
guests, but you don’t know which day that would be (it could be any).
(e) (i) Let zi be the number of pigeons in pigeonhole i, for i = 1, . . . , k. Then n = Σ_{i=1}^{k} zi .
Now suppose that no pigeonhole has at least ⌈ n/k ⌉ pigeons, i.e. zi < ⌈ n/k ⌉ for every i. Since ⌈ n/k ⌉


is an integer, it means that zi ≤ ⌈ n/k ⌉ − 1. It is easy to verify that ⌈ n/k ⌉ − 1 < n/k by
separately considering the cases n/k is an integer and n/k is not an integer. Therefore,
n = Σ_{i=1}^{k} zi ≤ Σ_{i=1}^{k} (⌈ n/k ⌉ − 1) < Σ_{i=1}^{k} n/k = n.
This contradiction proves that zi ≥ ⌈ n/k ⌉ for at least one i.
(ii) Certainly the longest non-decreasing subsequence ending at xi contains xi , so 1 ≤ ℓi . By
assumption, ℓi ≤ n. So, there are n possibilities for ℓi : 1, 2, . . . , n. Define the pigeonholes
1, 2, . . . , n. The numbers x1 , . . . , xn²+1 are the pigeons. Place the number xi into the
pigeonhole corresponding to ℓi . Since there are n² + 1 pigeons and n pigeonholes, there is
at least one pigeonhole with at least ⌈ (n² + 1)/n ⌉ pigeons, by part (i). ⌈ (n² + 1)/n ⌉ =
⌈ n + 1/n ⌉ = n + 1. That is, there are at least n + 1 of the ℓi that are equal.
(iii) Suppose ℓi = ℓj with i < j. Then we show that xj < xi . Suppose, to the contrary, that xj ≥ xi .
The longest non-decreasing sequence ending at xi has length ℓi . Take this sequence and
add xj to the end: since xj ≥ xi , this is a non-decreasing sequence that ends at xj ,
so the longest non-decreasing sequence ending at xj has length at least ℓi + 1, which
contradicts the fact that ℓj = ℓi . So xj < xi .
We have proved that if ℓi1 = ℓi2 = · · · = ℓik , then xik < xik−1 < · · · < xi2 < xi1 . That is,
xi1 , . . . , xik are non-increasing.
By (ii), there are n + 1 of the ℓi that are equal. That is, ℓi1 = · · · = ℓin+1 , which means
xi1 , . . . , xin+1 are a non-increasing subsequence of length n + 1, concluding the proof.
This result is tight in that there are n² numbers for which there are non-increasing and
non-decreasing sequences of length n but neither of length n + 1. For example
2, 4, 1, 3 (n = 2);    3, 6, 9, 2, 5, 8, 1, 4, 7 (n = 3).
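The tightness claim for n = 3 can be checked with a simple O(n²) dynamic program (a sketch, not from the book):

```python
# Length of the longest non-decreasing (or non-increasing) subsequence.
def longest_monotone(xs, nondecreasing=True):
    best = [1] * len(xs)   # best[i] = longest valid subsequence ending at xs[i]
    for i in range(len(xs)):
        for j in range(i):
            ok = xs[j] <= xs[i] if nondecreasing else xs[j] >= xs[i]
            if ok:
                best[i] = max(best[i], best[j] + 1)
    return max(best)

seq = [3, 6, 9, 2, 5, 8, 1, 4, 7]
assert longest_monotone(seq, True) == 3 and longest_monotone(seq, False) == 3
```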
(f) There are 4 suits (pigeonholes). The cards are pigeons. Place a card into the pigeonhole
corresponding to its suit. By part (e) sub-part (i), there are at least ⌈ 17/4 ⌉ = 5 cards in one
pigeonhole (suit). That is, there must be a 5-card flush.
(g) The pigeons are the baskets. The pigeonholes are 0, 1, . . . , 24. Place a basket into the pi-
geonhole corresponding to the number of apples in the basket. There are 51 baskets and 25
pigeonholes. By part (e) sub-part (i), there are at least ⌈ 51/25 ⌉ = 3 baskets in one of the
pigeonholes. That is at least 3 baskets contain the same number of apples.
(h) Suppose none of the unit squares overlap. The area covered by the squares is 13. The area of
a circle of radius 2 is A = 4π ≈ 12.57 < 13. If the squares do not overlap, they cover an area
larger than the circle, a contradiction. Therefore, the squares must overlap.
Chapter 15
Pop Quiz 15.1.
(a) The statement involves John Smith and liver disease. The crucial aspect about a probabilistic
statement is to determine exactly what is the “source of the randomness”. Is it that 30% of the
John Smiths will survive till seventy? Is it that 30% of the John Smiths who are diagnosed
with liver disease will survive till seventy? Is it that 30% of people diagnosed with liver disease
will survive till seventy (and this person’s name being John Smith is incidental)? Based on
our intuitive understanding of the context, most would agree that the name is incidental, and
the probabilistic statement is being made about people diagnosed with liver disease.
So, my interpretation of this statement is that if you took all people diagnosed with liver
disease, about 30% of them will survive till seventy.
(b) Among all packets sent out on the internet, approximately 0.01% are “dropped” and about
99.99% of them successfully reach their destination.
(c) Between now and your wedding in Washington, several “random things will occur”. Taken in
sequence, some of these combinations of random things will allow me to come to the wedding
(e.g. I finish my thesis ahead of schedule); and, in some cases, I won’t be able to come (e.g.
one of my experiments fails and I need to redo it). Half of those combinations of random
things will allow me to come and half won’t. This is similar to “The chance of rain tomorrow
is 40%”. If we “re-live” the time between now and your wedding 100 times, on approximately
50 of those reincarnations I will be at your wedding.


Pop Quiz 15.2. Here is the outcome-tree with edge probabilities and outcome-probabilities.
[Outcome-tree: the first roll branches into the 6 faces, each with edge probability 1/6; each branch then splits into the 6 faces of the second roll, giving 36 outcomes, each with outcome-probability 1/36.]
Every edge probability is 1/6 because at each vertex the die randomly rolls one of 6 possible values.
The outcomes where you win are when the second roll is larger than the first. These outcomes
are shaded and form the event of interest. Adding the outcome probabilities for this event,
P[“Win”] = (1/36) × (5 + 4 + 3 + 2 + 1) = 5/12.
The probability to win the dice game is less, by 1/12, than the probability to win the coin game. If
you played many games of each, you would win 8⅓% more coin games, so you prefer the coin game.
We mention a more convenient representation for the outcomes of a pair of dice. Instead of
using a cumbersome tree, we can use a two-dimensional grid: on the x-axis are rolls for die 1
and on the y-axis are rolls for die 2. The outcomes are pairs, one from the x-axis and one from
the y-axis, and every pair has the same outcome probability 1/36. This representation of the
outcomes is much more compact than the tree.
[Grid: a 6 × 6 array of outcomes (Die 1 Value versus Die 2 Value), each cell with probability 1/36; the 15 outcomes where die 2 beats die 1 are shaded.]
The event of interest is the same, and we have shaded the outcomes where you win. As above,
there are 15 outcomes in the event of interest, so again, P[“Win”] = 15/36 = 5/12.
The outcome-tree is the mechanism to obtain the outcomes and outcome-probabilities. Once you
have the outcomes and outcome probabilities, you have no further need for the outcome tree. What
matters are the outcomes, when defining the event of interest, and the outcome-probabilities when
computing the probability of the event. Thus, it is often more convenient to represent the outcomes
and their probabilities in a more compact way as we did here with a grid.
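The grid computation is easy to reproduce (a sketch, not from the book): enumerate the 36 equally likely pairs and count those where the second roll beats the first.

```python
# Exact probability that the second roll is larger than the first.
from fractions import Fraction

wins = sum(1 for first in range(1, 7) for second in range(1, 7) if second > first)
p_win = Fraction(wins, 36)
assert wins == 15 and p_win == Fraction(5, 12)
```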
Pop Quiz 15.3. From partial outcome (1,2), the host, having to open an empty door, must open
door 3. So, the host has only one option. From partial outcome (1,1), the host has a choice of
doors because both are empty. The two edges correspond to the two choices.
Exercise 15.4.

(a) If you implement the algorithm and repeatedly run with n = 120, you might get different
answers each time. So there is no “right” answer. We ran it once and got
outcome           (3,2)   (2,3)   (1,2)   (1,3)
number of times    42      34      22      22
Prize Door         1       2       3
number of times    44      34      42
Each prize door occurs roughly 1/3rd of the time. The prize has no “preference” for any door.
rd of the time. The prize has no “preference” for any door.

Half of the time when the prize door is 1, the host opens door 2, and the other half the time,
he opens door 3. When the prize door is 1, the host has no “preference” for a door.
(b) If you switch, you win for the outcomes (3,2) and (2,3), which is 76 times.
(c) Switching is better: you win 76 times versus 44 times if you stay.


(d)
n                           120     1,200    12,000
win/loss ratio for switch   1.73    2.03     1.99
We plot the win/loss ratio as you play more and more games. As the number of games played
becomes larger, by switching, your win/loss ratio appears to “converge” to 2. You win about
twice as often if you switch.
[Plot: win/loss ratio versus number of games played, from 10² to 10⁶ games; the ratio settles at 2.]
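A Monte Carlo sketch of the full experiment (hypothetical code, not the book's implementation; the contestant is fixed on door 1):

```python
# Estimate the probability of winning by switching in the Monty Hall game.
import random

random.seed(0)
games, switch_wins = 100_000, 0
for _ in range(games):
    prize = random.randint(1, 3)
    if prize == 1:
        host = random.choice([2, 3])   # both unchosen doors are empty
    else:
        host = 5 - prize               # the only empty unchosen door (2 -> 3, 3 -> 2)
    switched = 5 - host                # the door that is neither door 1 nor the host's
    if switched == prize:
        switch_wins += 1
print(switch_wins / games)             # close to 2/3
```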

Pop Quiz 15.5.


(a) We use the 6-step method with the outcome-tree.
(i) Instead of the contestant choosing door 1, the contestant chooses any door he wishes.
[Outcome-tree for (1: Contestant, 2: Prize, 3: Host). The contestant chooses door i with probability pi ; the prize is placed behind each door with probability 1/3; the host then opens an allowed empty door. The outcome-probabilities are:
P (1, 1, 2) = p1 /6    P (1, 1, 3) = p1 /6    P (1, 2, 3) = p1 /3    P (1, 3, 2) = p1 /3
P (2, 2, 1) = p2 /6    P (2, 2, 3) = p2 /6    P (2, 1, 3) = p2 /3    P (2, 3, 1) = p2 /3
P (3, 3, 1) = p3 /6    P (3, 3, 2) = p3 /6    P (3, 2, 1) = p3 /3    P (3, 1, 2) = p3 /3]
Since we are not told the probabilities with which the contestant chooses each door, we
denoted these (possibly different) probabilities by p1 , p2 , p3 , where p1 + p2 + p3 = 1. By
switching, the contestant wins for the outcomes in the event
E = {(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 2, 1), (3, 1, 2)}.
The probability of winning by switching is P[E], which is given by
P[E] = p1 /3 + p1 /3 + p2 /3 + p2 /3 + p3 /3 + p3 /3 = (2/3)(p1 + p2 + p3 ) = 2/3.
The probability of winning by switching has not changed. This is to be expected, because
all we are doing is relabeling the doors: the contestant’s door is relabeled as door 1. The
other two doors are arbitrarily labeled 2 and 3, and the game has not changed.
(ii) The outcome-tree is the same; the edge-probabilities change.


[Outcome-tree for (Prize, Host), with the contestant's door fixed at door 1. The outcome-probabilities are:
P (1, 2) = 1/9    P (1, 3) = 2/9    P (2, 3) = 1/3    P (3, 2) = 1/3]
The outcome-probabilities have not changed for the outcomes in which you win by
switching. The probability to win by switching is still 2/3.

(b) We add a level in the outcome-tree for what the contestant does after the host opens a door.
[Outcome-tree for (Prize, Host, Action), with the contestant's door fixed at door 1 and the contestant then staying or switching with probability 1/2 each. The outcome-probabilities are:
P (1, 2, switch) = 1/12    P (1, 2, stay) = 1/12    P (1, 3, switch) = 1/12    P (1, 3, stay) = 1/12
P (2, 3, switch) = 1/6     P (2, 3, stay) = 1/6     P (3, 2, switch) = 1/6     P (3, 2, stay) = 1/6]
The outcomes where the contestant wins are highlighted with gray shading. Adding these
outcome-probabilities gives P[“Contestant Wins”],
P[“Contestant Wins”] = 1/12 + 1/12 + 1/6 + 1/6 = 1/2.

Exercise 15.6.
(a) In the table, we show the result of 10,000 games using the boxed Monte Carlo algorithm.
Die Battle A versus B B versus C A versus C
Result A wins 5527 games B wins 5531 games C wins 5538 games

1: dieA=[2,2,6,6,7,7]; dieB=[1,1,5,5,9,9]; % Die faces
2: NumGames = 10000; % Number of games
3: Awins = 0;
4: for games = 1 to NumGames do
5: a ← random value from dieA;
6: b ← random value from dieB;
7: if a > b then
8: Awins ← Awins + 1; % Die A wins
9: return Awins

(b) Since we are on a roll with Monte-Carlo, let’s see what the simulation gives for the sum of two rolls:
A versus B B versus C A versus C
B wins 5171 games C wins 5172 games A wins 5371 games
A wins 4616 games B wins 4584 games C wins 4143 games
tie 213 games tie 244 games tie 486 games


1: dieA=[2,2,6,6,7,7]; dieB=[1,1,5,5,9,9]; % Die faces
2: NumGames = 10000; % Number of games
3: Awins = 0; Bwins = 0;
4: for games = 1 to NumGames do
5: a ← sum of two random values from dieA;
6: b ← sum of two random values from dieB;
7: if a > b then
8: Awins ← Awins + 1; % Die A wins
9: if b > a then
10: Bwins ← Bwins + 1; % Die B wins
11: return Awins, Bwins

The winners of each battle are different. Though A dominates B in one roll, B dominates
when you take the sum of two rolls. Very strange. Let’s now do the outcome-tree analysis.
An outcome specifies the value of 4 rolls. For example, in the battle of A versus B, the
outcome (2,2)(1,1) stands for die A rolling 2 then 2, and die B rolling 1 then 1. Here are the
die A outcomes (we ordered them by sum):
outcome (2,2) (2,6) (6,2) (2,7) (7,2) (6,6) (6,7) (7,6) (7,7)
sum 4 8 8 9 9 12 13 13 14
Here are the die B outcomes
outcome (1,1) (1,5) (5,1) (1,9) (9,1) (5,5) (5,9) (9,5) (9,9)
sum 2 6 6 10 10 10 14 14 18
Here are the die C outcomes
outcome (3,3) (3,4) (4,3) (4,4) (3,8) (8,3) (4,8) (8,4) (8,8)
sum 6 7 7 8 11 11 12 12 16
There are 9 outcomes for each die (2 rolls) and so in a battle, there are 81 possible outcomes
(product rule). All the outcomes have the same probability 1/3 × 1/3 × 1/3 × 1/3 = 1/81.
A versus B. Let us count the outcomes where B beats A. If A is (2,2), 8 of B’s outcomes
beat the sum 4. In this way, for each outcome of A we count the outcomes of B which beat
A, and then add them up (sum rule) to get the number of outcomes where B beats A:
Number of outcomes where B beats A = 8 + 6 + 6 + 6 + 6 + 3 + 3 + 3 + 1 = 42.
Similarly, the number of outcomes where A beats B and A ties B are:
Number of outcomes where A beats B = 9 + 8 + 8 + 4 + 4 + 4 + 0 + 0 + 0 = 37;
Number of outcomes where A ties B = 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 2 = 2.
Therefore, B wins over A because
P[“A beats B”] = 37/81;    P[“B beats A”] = 42/81;    P[“A ties B”] = 2/81.
B versus C. We perform the same analysis:
Number of outcomes where C beats B = 9 + 8 + 8 + 5 + 5 + 5 + 1 + 1 + 0 = 42;
Number of outcomes where B beats C = 6 + 6 + 6 + 6 + 3 + 3 + 3 + 3 + 1 = 37;
Number of outcomes where B ties C = 0 + 1 + 1 + 0 + 0 + 0 + 0 + 0 + 0 = 2.
Therefore, C wins over B because
P[“B beats C”] = 37/81;    P[“C beats B”] = 42/81;    P[“B ties C”] = 2/81.
A versus C. We perform the same analysis:
Number of outcomes where C beats A = 9 + 5 + 5 + 5 + 5 + 1 + 1 + 1 + 1 = 33;
Number of outcomes where A beats C = 8 + 8 + 8 + 6 + 4 + 4 + 3 + 3 + 0 = 44;
Number of outcomes where A ties C = 0 + 1 + 1 + 0 + 0 + 2 + 0 + 0 + 0 = 4.
Therefore, A wins over C because
P[“A beats C”] = 44/81;    P[“C beats A”] = 33/81;    P[“A ties C”] = 4/81.
  
(c) Now, for die A, P[2] = 1/2, P[6] = 1/6 and P[7] = 1/3. Only battles involving A change.


(i) The outcome-tree for A versus B is shown below.
[Outcome-tree: die A rolls 2, 6 or 7 with probabilities 1/2, 1/6, 1/3; die B then rolls 1, 5 or 9 with probability 1/3 each. Each outcome where A rolls 2 has probability 1/6; where A rolls 6, probability 1/18; where A rolls 7, probability 1/9. The outcomes where A beats B are shaded.]
The only thing which changes are the probabilities for die A, but this changes the outcome-probabilities.
The event of interest, the outcomes where A beats B, is unchanged (shaded). Adding up the
outcome-probabilities,
P[“Die A Beats B”] = 1/6 + 1/18 + 1/18 + 1/9 + 1/9 = 1/2.
It is now a tie between die A and B. We can repeat the analysis for dice A and C. The outcomes
where C beats A are unchanged: {(2,3), (2,4), (2,8), (6,8), (7,8)}. The outcomes in which die A
rolls 2 have probability 1/6; those in which it rolls 6 have probability 1/18; and those in which it
rolls 7 have probability 1/9. So,
P[“Die C Beats A”] = 1/6 + 1/6 + 1/6 + 1/18 + 1/9 = 2/3.
A vs. B              B vs. C              A vs. C
P[A wins] = 1/2      P[B wins] = 5/9      P[C wins] = 2/3
In a Monte-Carlo simulation with 10,000 rolls:
A vs. B           B vs. C           A vs. C
A wins 4967       B wins 5579       C wins 6658
(ii) Ayfos should choose die B (which beats C and is no worse than A). Now the best you
can do is choose die A and it’s a dead tie. Ayfos wins about 500 of the 1000 games.
Pop Quiz 15.7. See the outcome-tree and outcome-probabilities in Exercise 15.6(c).
Exercise 15.8. Remember always that an event is a set of outcomes. We assume that students
have only one color hair (hair cannot be both black and blonde).
(a) The outcomes are each student. Each student is equally likely.
(b) The event contains all students with black or blonde hair (80% of the students):
P[“either black hair or blonde hair”] = 0.8.
(c) The event contains the 20% of students in the complement of the previous event:
P[“neither black hair nor blonde hair”] = 0.2.
(d) Trick question. Given only the information in the question, all we can do is give bounds on
the probability. We want to know |{Black hair} ∪ {Brown eyes}|. This size is at least 60% of
the students (the brown eyed ones) and at most all the students, so
0.6 ≤ P[“either black hair or brown eyes”] ≤ 1.
If we assume proportionality, then 0.6 × 50% = 30% of black haired students also have brown
eyes. Then, by inclusion-exclusion,
|{Black hair} ∪ {Brown eyes}| = 50% + 60% − 30% = 80%,
and therefore, assuming proportionality,
P[“either black hair or brown eyes”] = 0.8.
(e) Trick question. Again, we can only give bounds. If every black-haired student also has brown
eyes, then the intersection can be as large as 50%. If all the other 50% of students have brown
eyes, then the intersection is as small as 10%. Therefore,
0.1 ≤ P[“black hair and brown eyes”] ≤ 0.5.
If we assume proportionality, then P[“black hair and brown eyes”] = 0.3.
Exercise 15.9.
(a) We are told E1 ∩ E2 = ∅.
P[E1 ∪ E2 ] = Σ_{ω∈E1 ∪E2 } P (ω) =(∗) Σ_{ω∈E1 } P (ω) + Σ_{ω∈E2 } P (ω) = P[E1 ] + P[E2 ].


(∗) is because the outcomes in E1 ∪ E2 can be partitioned into those in E1 and those in E2
because E1 ∩ E2 = ∅. The sum rule generalizes to an arbitrary number of disjoint events:
P[∪_{i=1}^{n} Ei ] = Σ_{i=1}^{n} P[Ei ].

(b) E and Ē are disjoint, so P[E ∪ Ē] = P[E] + P[Ē]. Since E ∪ Ē = Ω and P[Ω] = 1, we have that
1 = P[Ω] = P[E ∪ Ē] = P[E] + P[Ē].
(c) Consider
P[E1 ] + P[E2 ] = Σ_{ω∈E1 } P (ω) + Σ_{ω∈E2 } P (ω).
Every ω that is only in E1 contributes P (ω) to this sum once. Similarly, every ω that is only
in E2 contributes P (ω) to the sum once. However, every ω ∈ E1 ∩ E2 contributes P (ω) twice
to the sum, once in the sum over ω ∈ E1 and once in the sum over ω ∈ E2 . Therefore, by
subtracting P (ω) from this sum for every ω ∈ E1 ∩ E2 the result will have a contribution of
P (ω) exactly once from every ω ∈ E1 ∪ E2 . That is,
P[E1 ∪ E2 ] = Σ_{ω∈E1 ∪E2 } P (ω) = Σ_{ω∈E1 } P (ω) + Σ_{ω∈E2 } P (ω) − Σ_{ω∈E1 ∩E2 } P (ω) = P[E1 ] + P[E2 ] − P[E1 ∩ E2 ].

This formula mimics the inclusion-exclusion formula for counting the size of a union.
(d) By (c), P[E1 ∪ E2 ] = P[E1 ] + P[E2 ] − P[E1 ∩ E2 ] ≤ P[E1 ] + P[E2 ] because P[E1 ∩ E2 ] ≥ 0.
(e) Let Ep be the set of outcomes where p is t. Let Eq be the set of outcomes where q is t. p → q
means that whenever p is t, q is t. That is ω ∈ Ep → ω ∈ Eq , or Ep ⊆ Eq . Since P (·) is
non-negative, the sum over outcomes in Eq includes the sum over outcomes in Ep plus possibly
some other outcome-probabilities. That is, P[Ep ] ≤ P[Eq ], which means
P[“p being true”] ≤ P[“q being true”].
(f) This follows from (e) because p ∧ q → p. Alternatively, observe that E1 ∩ E2 ⊆ E1 .
Exercise 15.10.
(a) There are 36 outcomes in this uniform probability space (see Pop Quiz 15.2). The outcomes
where the sum is 9 are {(3,6), (4,5), (5,4), (6,3)}.
P[“Sum is 9”] = (# outcomes with sum 9)/|Ω| = 4/36 = 1/9.
(b) You are not going to be able to draw the outcome tree here. To get to an outcome, e.g.
TTTTTTTTTT (ten tails in a row), you multiply 10 edge probabilities which are all 1/2. This
is the same for every outcome, so we have a uniform probability space with |Ω| = 2^10 and
P (ω) = 2^−10 . The number of sequences with 4 heads is (10 choose 4). Therefore,
P[“4 heads”] = (# outcomes with 4 heads)/|Ω| = (10 choose 4)/2^10 ≈ 0.2051.
(c) There are 3 choices for each roll. So, each edge probability is 1/3, giving a uniform
probability space with 3^10 possible outcomes.
(i) Choose 4 sevens in (10 choose 4) ways; the remaining 6 rolls can be chosen in 2 ways each for 2^6
ways (product rule). So, there are (10 choose 4) × 2^6 outcomes (product rule again). Therefore,
P[“4 sevens”] = (# outcomes with 4 sevens)/|Ω| = (10 choose 4) × 2^6 /3^10 ≈ 0.2276.
 
(ii) Choose the 4 sevens in (10 choose 4) ways; of the remaining six rolls, choose 3 sixes in (6 choose 3) ways.
By the product rule, there are (10 choose 4) × (6 choose 3) = 10!/4!3!3! outcomes. Therefore,
P[“4 sevens and 3 sixes”] = (# outcomes with 4 sevens and 3 sixes)/|Ω| = (10 choose 4)(6 choose 3)/3^10 ≈ 0.0711.
 
(iii) There are (10 choose 4) × 2^6 outcomes with 4 sevens; there are (10 choose 3) × 2^7 outcomes with 3 sixes;
there are (10 choose 4) × (6 choose 3) outcomes with 4 sevens and 3 sixes. By inclusion-exclusion, there
are (10 choose 4) × 2^6 + (10 choose 3) × 2^7 − (10 choose 4) × (6 choose 3) = 24,600 outcomes with 4 sevens or 3 sixes. So,
P[“4 sevens or 3 sixes”] = (# outcomes with 4 sevens or 3 sixes)/|Ω| = 24600/3^10 ≈ 0.4166.
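All three counts can be verified by enumerating the 3^10 equally likely sequences over three symbols, "7", "6" and one other face (a sketch, not from the book):

```python
# Enumerate every length-10 sequence over three symbols and recount (i)-(iii).
from itertools import product
from math import comb

total = four_sevens = both = either = 0
for seq in product("76x", repeat=10):
    total += 1
    s7, s6 = seq.count("7"), seq.count("6")
    four_sevens += (s7 == 4)
    both += (s7 == 4 and s6 == 3)
    either += (s7 == 4 or s6 == 3)
assert four_sevens == comb(10, 4) * 2**6      # 13440 outcomes with 4 sevens
assert both == comb(10, 4) * comb(6, 3)       # 4200 outcomes with both
assert either == 24600 and total == 3**10
```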


(d) In Exercise 13.13 we computed the number of Two-pair hands (123,552) and the number of
Three-of-a-kind hands (54,912). So
P[“Two-pair”] = 123,552/(52 choose 5) ≈ 0.0475;    P[“Three-of-a-kind”] = 54,912/(52 choose 5) ≈ 0.0211.
Three-of-a-kind should win because it is rarer.
(e) We have a uniform probability space in which each of the (52 choose 13) possible 13-card hands is
equally likely. A hand has no Ace means the 13 cards are selected from 48 cards. This can be
done in (48 choose 13) ways. So the probability to not have any Aces is (48 choose 13)/(52 choose 13). The
complementary event is that the hand has an Ace, therefore
P[“Ace”] = 1 − (48 choose 13)/(52 choose 13) = 1 − (39 × 38 × 37 × 36)/(52 × 51 × 50 × 49) ≈ 0.6962.
Exercise 15.11.
(a) The possible outcomes are all sequences of heads and tails which end in either HHT or THH
and there is no earlier occurrence of HHT or THH. So, HHT (you win), HHHT (you win), THH
(friend wins), TTHTHH (friend wins) are possible outcomes, but TTHHT is not a possible
outcome because though the sequence ends in HHT, there is an earlier occurrence of THH.
The outcome space is infinite. It is quite a complicated outcome space. We cannot simply
list out the possible outcomes with their probabilities. We show how the game plays out
as an outcome-tree.
Outcome   Probability   Winner
HH        1/4           You
HT        1/4           Friend
T         1/2           Friend

This outcome-tree is not showing outcomes of the game, but outcomes of coin tosses. However,
based on these outcomes of the coin tosses, we are reasoning about the game. If a tail appears,
and the game is not over, then we know your friend wins. This is because for you to win,
there must occur HH; at the first occurrence of HH, a THH has appeared and your friend has
won. If HH appears, then you win, because for your friend to win, a T must appear and at
the first appearance of T, you have won.
You win only if the coin tosses start out HH, that is, P[“you win”] = 1/4.
(b) We use a similar reasoning here using outcomes of the coin tosses.
Outcome   Probability   Winner
HH        1/4           You
HTH       1/8           Friend
HTT       1/8           Restart
T         1/2           Restart
Let us explain the outcomes of the coin toss. As in (a), if the coin starts with HH, then you
win, because for your friend to win, a tail must be tossed, but at the first toss of a tail, you
win. If the coin tosses HTH, then your friend wins and the game ends. A winning sequence
for either you or your friend must start with H. So, if the coin starts with HTT or T, both of
you are waiting for an H to start a possible winning sequence. This is the same situation as at
the beginning of the game where you are both waiting for H. So the game effectively restarts.
You win (y) with probability 1/4; your friend wins (f) with probability 1/8; or, the game “restarts”
(r) with probability 5/8.
So the outcomes of the game are of the form r•i y (i restarts followed by you winning), or
r•i f (i restarts followed by your friend winning), where i ≥ 0.

outcome       r•i y             r•i f
probability   1/4 × (5/8)^i     1/8 × (5/8)^i
The outcomes where you win are r•i y, so the probability you win is

P[“you win”] = (1/4) × [1 + 5/8 + (5/8)^2 + ⋯] = (1/4) × 1/(1 − 5/8) = 2/3.

(c) We use a similar reasoning here using outcomes of the coin tosses.
Outcome   Probability   Winner
HH        1/4           You
HT        1/4           ?
THH       1/8           You
THT       1/8           Friend
TT        1/4           ?
As above, for HH, you win at the first toss of tails. A similar reasoning applies to THH, and
THT is the winning sequence for your friend. The interesting cases are the question marks
when a tail is tossed, T or HT. Now, any number of tails can follow. At the first H, you win
if the next toss is H and your friend wins if the next toss is T. Here are the outcomes:
outcome       HH     HT•i HH         HT•i HT         T•i HH          T•i HT
probability   1/4    1/8 × (1/2)^i   1/8 × (1/2)^i   1/4 × (1/2)^i   1/4 × (1/2)^i
winner        you    you             friend          you             friend
In the table above, i ≥ 1. The probability that you win is the sum of the probabilities for the
outcomes HH, HT•i HH and T•i HH, for i ≥ 1:

P[“you win”] = 1/4 + (1/8) Σ_{i=1}^∞ (1/2)^i + (1/4) Σ_{i=1}^∞ (1/2)^i = 1/4 + 1/8 + 1/4 = 5/8.

(d) Monte Carlo is always a useful tool for checking the result of a probability analysis. Here is a
simulation to compute the probability that you win given your string and your friend’s string.

 1: you = [1, 1, 0]; friend = [0, 1, 1];                   % H=1, T=0
 2: NumGames = 10000;                                      % Number of games
 3: Pwin = 0;                                              % Probability you win
 4: for games = 1 to NumGames do
 5:     x ← random binary vector of length 3;
 6:     while true do
 7:         if isequal(x, you) then
 8:             win = 1; break out of while;               % You win
 9:         else if isequal(x, friend) then
10:             win = 0; break out of while;               % Your friend wins
11:         else
12:             x[1] = x[2]; x[2] = x[3]; x[3] = random bit;   % Next toss
13:     Pwin ← Pwin + (win − Pwin)/games;                  % Update the probability
14: return Pwin

Let us justify the update in step 13. Let W be the number of wins up to that point; games
is the number of games played (including the current game) and win is the outcome of the
current game. The previous value of Pwin is W/(games − 1) (0 if games = 1), that is,
W = Pwin × (games − 1). The updated value of Pwin should be (W + win)/games:

Pwin ← (W + win)/games = (Pwin × (games − 1) + win)/games = Pwin + (win − Pwin)/games.
The results of the simulation for 10^4, 10^5 and 10^6 games are shown below.
Number of games             10,000   100,000   1,000,000
Probability you win (a)     0.2477   0.25      0.25
Probability you win (b)     0.6626   0.666     0.667
Probability you win (c)     0.6294   0.6235    0.6255

Note that the frequency (probability from the simulation) gets closer to the computed probability as you play more games.
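The pseudocode above translates directly into Python. This is one possible rendering (function and variable names are our own, not from the text); with the part (a) strings HHT versus THH it reproduces P[“you win”] = 1/4:

```python
import random

def p_first_wins(you, friend, num_games=200_000, seed=0):
    """Estimate the probability that `you`'s pattern appears before `friend`'s."""
    rng = random.Random(seed)
    p_win = 0.0
    for games in range(1, num_games + 1):
        x = [rng.randint(0, 1) for _ in range(3)]   # first three tosses (H=1, T=0)
        while True:
            if x == you:                            # you win
                win = 1
                break
            if x == friend:                         # your friend wins
                win = 0
                break
            x = [x[1], x[2], rng.randint(0, 1)]     # slide the window: next toss
        p_win += (win - p_win) / games              # running-average update (step 13)
    return p_win

print(p_first_wins([1, 1, 0], [0, 1, 1]))  # close to 0.25
```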
(e) Once again we illustrate the power of simulation. By symmetry, we may assume your friend
chooses a sequence that starts with H. For each of the 4 sequences your friend chooses, we
can evaluate your 8 possible sequences and pick the best. Here are the simulation results.
friend your best choice P[“your best wins”] P[“your 2nd-best wins”]
(i) HHH THH 0.8747 0.7005
(ii) HHT THH 0.7500 0.5000
(iii) HTH HHT 0.6668 0.6253
(iv) HTT HHT 0.6658 0.5004
Let us analyze each of the 4 cases for your friend's choice of string.
(i) If a T is tossed, you win at the first arrival of HH. So your friend can only win if the
game specifically starts HHH, which has probability 1/8. Hence

P[“THH beats HHH”] = 7/8.

You cannot do better with any other sequence (you must lose if the game starts HHH).
(ii) If the game starts HH (probability 1/4), your friend wins at the first arrival of T. If the
game starts any other way, you win at the first arrival of HH. So,

P[“THH beats HHT”] = 3/4.
(iii) We analyzed this in part (b). P[“HHT beats HTH”] = 2/3.
(iv) Here are the relevant outcomes of the coin tosses.

Outcome   Probability   Winner
HH        1/4           You
HTH       1/8           ?
HTT       1/8           Friend
T         1/2           Restart
The outcome with the question mark (HTH) is equivalent to restarting from the point where
an H has just been tossed. We conclude that the outcomes where you win start with any number
of T's followed by any number of HT's, followed by HH. That is, the outcomes where you win
are T•i HT•j HH, having probability (1/2)^i × (1/4)^j × 1/4. Therefore,

P[“HHT beats HTT”] = (1/4) Σ_{i=0}^∞ (1/2)^i Σ_{j=0}^∞ (1/4)^j = (1/4) × 2 × 1/(1 − 1/4) = 2/3.

Chapter 16
Pop Quiz 16.1.
(a) Barring the unforeseen, Humans will be here: P[There is a living Human tomorrow] ≈ 1.
(b) Similarly, P[Sun rises tomorrow] ≈ 1.
(c) The new information that is given (the Sun does not rise tomorrow) means that some catas-
trophe has indeed occurred. Humans are likely wiped out,
P[There is a living Human tomorrow | Sun does not rise tomorrow] ≈ 0.
The new information significantly changes the probability of Humans being around tomorrow.

Exercise 16.2.
(a) P[CS | MATH] = P[CS ∩ MATH]/P[MATH] = 0.016/0.02 = 0.8. (In general, P[A | B] ≠ P[B | A].)
P[MATH | CS] = P[CS ∩ MATH]/P[CS] = 0.016/0.2 = 0.08.
(b) (i) P[A | A] = P[A ∩ A]/P[A] = 1.
(ii) P[A | A ∩ B] = P[A ∩ (A ∩ B)]/P[A ∩ B] = P[A ∩ B]/P[A ∩ B] = 1.
(iii) P[A ∩ B | B] = P[(A ∩ B) ∩ B]/P[B] = P[A ∩ B]/P[B] = P[A | B].
(iv) P[A ∪ B | B] = P[(A ∪ B) ∩ B]/P[B] = P[B]/P[B] = 1.
(v) P[A | A ∪ B] = P[A ∩ (A ∪ B)]/P[A ∪ B] = P[A]/P[A ∪ B].
(c) (i) PB(ω) = P[{ω} ∩ B]/P[B].
If ω ∉ B, P[{ω} ∩ B] = P[∅] = 0; otherwise, P[{ω} ∩ B] = P[{ω}] = P(ω).
(ii) First, note that P[B] = Σ_{ω∈B} P(ω). Then

Σ_{ω∈Ω} PB(ω) = Σ_{ω∈B} PB(ω) + Σ_{ω∉B} PB(ω) = Σ_{ω∈B} P(ω)/P[B] + 0 = P[B]/P[B] = 1.

Pop Quiz 16.3.
(a) The outcomes are (i, j), where i is the first roll and j is the second roll. So, 1 ≤ i, j ≤ 6. This
is a uniform probability space so P(i, j) = 1/(number of outcomes) = 1/36.
(b) The number of outcomes in the event is 3 so P[Sum is 10] = 3/36 = 1/12.
(c) The number of outcomes in the event is 9 so P[Both are Odd] = 9/36 = 1/4.
(d) Only 1 outcome has both dice odd and a sum 10, so P[(Sum is 10) and (Both are Odd)] = 1/36.
(e) P[Sum is 10 | Both are Odd] = P[(Sum is 10) and (Both are Odd)]/P[Both are Odd] = (1/36)/(9/36) = 1/9.
(f) P[Both are Odd | Sum is 10] = P[(Sum is 10) and (Both are Odd)]/P[Sum is 10] = (1/36)/(3/36) = 1/3.
Exercise 16.4.
(a) The analysis mimics the example before Exercise 16.4 on page 234, with P(1,2) = 0 and
P(1,3) = 1/3. We give the intuition. Since Monty will always open door 3 if it is available, it
must be that door 3 is not available, so you win with probability 1 by switching.
(b) The outcome-tree gives the outcomes (prize door, opened door); each branch has probability
1/3 × 1/2 = 1/6, and when Monty opens the prize door the game restarts. We want
P[WinBySwitching | Door2Opened].

Prize   Host opens   Outcome   Probability
1       2            (1,2)     1/6
1       3            (1,3)     1/6
2       2            restart   1/6
2       3            (2,3)     1/6
3       2            (3,2)     1/6
3       3            restart   1/6

Door 2 is opened in outcomes (1,2) and (3,2). You win by switching in outcome (3,2), so the
conditional probability that we need is

P[{(3,2)}]/P[{(1,2), (3,2)}] = (1/6)/(1/6 + 1/6) = 1/2.

Now, it is even odds whether to switch or not. The intuition is that when the prize is behind
door 3, Monty is no longer forced to open door 2. He may restart by opening door 3 half the time.
Exercise 16.5. The outcome-tree gives the probabilities of the six outcomes (Y = you win a set,
O = your opponent wins a set):

Outcome       YY     YOY    YOO    OYY    OYO    OO
Probability   1/2    1/24   1/8    1/16   1/48   1/4

(a) Your wins are {YY, YOY, OYY}, with probability 1/2 + 1/24 + 1/16 = 29/48.
(b) You win set 1 in {YY, YOY, YOO}. You win set 1 and the match in {YY, YOY}.
P[Win set 1] = 1/2 + 1/24 + 1/8 = 2/3;
P[Win set 1 & match] = 1/2 + 1/24 = 13/24;
P[Win match | set 1] = (13/24)/(2/3) = 13/16.
(c) From (a), P[Win] = 29/48. From (b), P[Win set 1 & match] = 13/24. Therefore,
P[Win set 1 | match] = (13/24)/(29/48) = 26/29.
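A quick consistency check of these numbers with exact rational arithmetic (Python; the outcome probabilities are the ones read off the outcome-tree):

```python
from fractions import Fraction as F

# Outcome probabilities from the outcome-tree (Y = you win a set, O = opponent).
p = {'YY': F(1, 2), 'YOY': F(1, 24), 'YOO': F(1, 8),
     'OYY': F(1, 16), 'OYO': F(1, 48), 'OO': F(1, 4)}
assert sum(p.values()) == 1                     # sanity check: probabilities sum to 1

p_win_match = p['YY'] + p['YOY'] + p['OYY']     # (a) you win the match
p_win_set1 = p['YY'] + p['YOY'] + p['YOO']      # (b) you win set 1
p_set1_and_match = p['YY'] + p['YOY']

print(p_win_match)                              # 29/48
print(p_set1_and_match / p_win_set1)            # 13/16
print(p_set1_and_match / p_win_match)           # 26/29
```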

Exercise 16.6.
(a) Yes. You can only estimate P[student likes the course | student rated course]. Students who
rate the course are not a random sample of students. In general, any survey of this type
suffers from sampling bias. This type of sampling bias is sometimes called non-response bias.
Those who do not respond tend to be a particular type of person, not a random sample.
(b) Wald first surmised that the hits taken on planes should be somewhat random (shot accuracy
is not high enough in an aerial battle to target specific parts of a plane). So there should
be just as many hits on the tail and nose as on the main body. So the returning war-planes are not
indicative of where the planes are getting hit; they are indicative of which planes survive
given they are hit. He concluded that
P[survive | hit on mid-body] ≫ P[survive | hit on nose or tail].
Therefore, the nose and tail are the regions that needed fortification.
Pop Quiz 16.7. You are not old enough to have any profession but student.
Pop Quiz 16.8. We require P[A ∩ B]/ P [B] = P[A ∩ B]/ P [A] (assuming P[A], P[B] > 0), or
P[A ∩ B](P[A] − P[B]) = 0. Either A and B must be disjoint so that P[A ∩ B] = 0, or P[A] = P[B].
Exercise 16.9.
(a) By (16.1) on page 238, P[A1 ∩ (A2 ∩ A3 )] = P[A1 | A2 ∩ A3 ] × P[A2 ∩ A3 ]. Applying (16.1)
again,
P[A1 ∩ A2 ∩ A3 ] = P[A1 | A2 ∩ A3 ] × P[A2 | A3 ] × P[A3 ].
The general formula, which we encourage you to prove by induction, is
P[A1 ∩ A2 ∩ · · · ∩ An ] = P[A1 | A2 ∩ · · · ∩ An ] × P[A2 | A3 ∩ · · · ∩ An ] × · · · × P[An−1 | An ] × P[An ].
(b) Let us consider an example path in the outcome-tree to a leaf. The outcome is v1v2v3v4,
reached along the path

root —p1→ v1 —p2→ v2 —p3→ v3 —p4→ v4.
p1 is the probability that v1 occurs at the start, p1 = P[v1 ]. After v1 occurs, p2 is the probabil-
ity that v2 occurs, given v1 has occurred, p2 = P[v2 | v1 ]. Similarly, p3 is the probability that
v3 occurs, given v1 , v2 have occurred, p3 = P[v3 | v1 and v2 ]; p4 = P[v4 | v1 and v2 and v3 ].
By part (a),
P[v4 ∧ v3 ∧ v2 ∧ v1 ] = P[v4 | v1 ∧ v2 ∧ v3 ] × P[v3 | v1 ∧ v2 ] × P[v2 | v1 ] × P[v1 ]
= p4 p 3 p 2 p 1 .
That is, the probability of the leaf outcome occurring is exactly the product of the edge
probabilities leading to that leaf.
Exercise 16.10. By the definition of conditional probability, P[A | B] = P[A ∩ B]/P[B], and
also, P[A ∩ B] = P[B | A] P[A]. Therefore,

P[A | B] = P[B | A] P[A]/P[B] = P[B | A] P[A] / (P[B | A] P[A] + P[B | Ā] P[Ā]).

(The second equality is by the law of total probability, P[B] = P[B | A] P[A] + P[B | Ā] P[Ā].)
(a) Using R for Republican and D for democrat,
P[oppose taxes] = P[oppose taxes | R] P [R] + P[oppose taxes | D] P [D]
= 0.7 × 0.4 + 0.5 × 0.6 = 0.58.
(b) Using Bayes' Theorem,

P[R | oppose taxes] = P[oppose taxes | R] P[R] / P[oppose taxes] = (0.7 × 0.4)/0.58 ≈ 0.483.
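The arithmetic in (a) and (b) in a few lines of Python (a check, not part of the original solution):

```python
# Tax-survey example: total probability, then Bayes' theorem.
p_R, p_D = 0.4, 0.6                  # P[Republican], P[Democrat]
p_opp_R, p_opp_D = 0.7, 0.5          # P[oppose | R], P[oppose | D]

p_opp = p_opp_R * p_R + p_opp_D * p_D        # law of total probability
p_R_given_opp = p_opp_R * p_R / p_opp        # Bayes' theorem

print(round(p_opp, 2))           # 0.58
print(round(p_R_given_opp, 3))   # 0.483
```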

Exercise 16.11. You win if two consecutive heads arrive before two consecutive tails.
(a) The outcomes where you win begin with (HT)•i or T(HT)•i, and end with HH.

Winning outcomes   (HT)•i HH             T(HT)•i HH
Probability        (p(1−p))^i × p^2      (1−p) × (p(1−p))^i × p^2

To get the probability of winning, we add these probabilities,

P[win] = p^2 Σ_{i=0}^∞ (p(1−p))^i + p^2 (1−p) Σ_{i=0}^∞ (p(1−p))^i = p^2 (2−p)/(1 − p(1−p)).
(b) By the law of total probability,

P[win] = P[win | H] P[H] + P[win | T] P[T] = p P[win | H] + (1 − p) P[win | T].

Let us compute P[win | H] and P[win | T] using total probability.

P[win | H] = (1 − p) P[win | T] + p.

(If you get H you won, and if you get T, it is as if you started with T.) Similarly,

P[win | T] = p P[win | H].

(If you get T you lost, and if you get H, it is as if you started with H.) Using this expression
for P[win | T], we have: P[win | H] = p(1 − p) P[win | H] + p. Solving for P[win | H]:

P[win | H] = p/(1 − p(1−p)) and P[win | T] = p^2/(1 − p(1−p)).

Substituting back into P[win],

P[win] = p^2/(1 − p(1−p)) + p^2 (1−p)/(1 − p(1−p)) = p^2 (2−p)/(1 − p(1−p)).

Chapter 17
Pop Quiz 17.1. This is a tricky question. The second toss is H with probability p, and that
is independent of whether the first toss came up H or T. You can view this independence as
“full-knowledge” independence, or the “universe”-view.
Suppose you must predict the second toss. Does knowing the first toss help you? Yes. From your
point of view, the two tosses are not independent. The first toss tells you about p, which helps
to predict the second toss. If the first toss is H, then you would guess that p > 21 and predict
the second toss as H. Imagine if the first 10 tosses are H. Now you most certainly would suspect
that p ≈ 1 (biased coin) and predict the 11th toss as H. Compare: The sun has risen every day in
recorded history. Does that help you predict whether or not the sun will rise tomorrow?
If you fix p = 1/3, now you don't care what the first toss was; p is what governs the second toss, so
the two tosses are independent. Your view becomes the full-knowledge (or universe) view.
Exercise 17.2.
(a) False. Consider two disjoint events, each having positive probability.
(b) False. Consider the rain and clouds example, or A = B with P[A] < 1.
(c) True. A ∩ B ⊆ A, therefore P[A ∩ B] ≤ P[A]; similarly P[A ∩ B] ≤ P[B].
Exercise 17.3. We proved this formula in Exercise 16.10. Define the event
A = {Person is using a cellphone during a particular (fixed) weekday minute.}
Randomly pick a person in the USA. We want P[A]. To use the Fermi method, we first break
down A into smaller events that are easier to analyze.
A1 = “Has cellphone”;
A2 = “Uses cellphone in the particular minute”.
A occurs if A1 and A2 occur.
P[A] = P[A1 ∩ A2 ] = P[A1 ] × P [A2 | A1 ]
Typical surveys say that 9 in 10 adults (15-70 years old) have a cellphone and adults are about 50%
of the population. The typical cellphone plan is 1000 weekday minutes per month, which suggests
that people who have a cellphone use about 1000 weekday minutes. Assume that phone usage is
evenly spaced during the month and a weekday has 12 hours (8am-8pm) which is 720 minutes,
so 20 weekdays in a month equals 14400  weekday minutes. 1000 minutes are spread evenly over
these 14400 minutes. There are 14400
1000
ways to pick 1000 of the 14400 minutes. If you do not use

a particular minute, there are only 14399
1000
ways. The probability to not use a particular weekday
    14399  14400 
minute is 14399
1000
/ 14400
1000
. Since n−1
k
= n−k
n
n
k
, 1000 / 1000 = 14400−1000
14400
.
9
P[Has cellphone] ≈ 10 × 12 = 20
9
.
1000 1
P[Uses cellphone on the minute | Has cellphone] ≈ 14400
≈ 15
.
Multiplying the probabilities, P[A] ≈ 9/20 × 1/15 = 3/100. There are 320 million people in the USA, so
about 10 million are in the event A (for the USA).
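The binomial identity used above, and the final arithmetic, can be checked exactly (Python; an added check, not part of the original solution):

```python
from fractions import Fraction as F
from math import comb

# Identity used above: C(n-1, k) / C(n, k) = (n - k) / n.
n, k = 14400, 1000
assert F(comb(n - 1, k), comb(n, k)) == F(n - k, n)

# The Fermi estimate: P[A] = P[has cellphone] * P[uses this minute | has one],
# using the exact 1000/14400 rather than the rounded 1/15.
p_A = F(9, 20) * F(1000, 14400)
print(p_A, float(p_A))   # 1/32 0.03125, i.e. about 3 in 100
```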
Pop Quiz 17.4.
(a) Since the coins are independent, any pair of coins matches with probability 1/2. This is also
verified by “brute force” because, for example, A1 = {HHH, HHT, TTH, TTT} contains 4
outcomes and each has probability 1/8, so P[A1] = 4 × 1/8 = 1/2.
(b) A2 = {HHH, THH, HTT, TTT}, so A1 ∩ A2 = {HHH, TTT}, and so P[A1 ∩ A2] = 2 × 1/8 = 1/4.
The other 2 cases are analogous.
(c) A1 ∩ A2 ∩ A3 = {HHH, TTT} (events A1, A2, A3 simultaneously hold). So, P[A1 ∩ A2 ∩ A3] = 1/4.
If coins 1 and 2 match and 2 and 3 match, then 1 and 3 must match, so any two pairs matching
means the third matches as well.
Pop Quiz 17.5.
(a) A1 (blue) contains 18 outcomes, as does A2, so P[A1] = P[A2] = 18/36 = 1/2. A3 contains 4
outcomes so P[A3] = 4/36 = 1/9. The intersection of the three shaded regions contains just the
one outcome (3,6), and so P[A1 ∩ A2 ∩ A3] = 1/36, which is the product of the three probabilities.
That is, we have 3-way independence.
(b) A1 ∩ A3 contains the one outcome (3,6), so P[A1 ∩ A3] = 1/36 and P[A1] · P[A3] = 1/18.
(c) A2 ∩ A3 contains the three outcomes {(3,6), (4,5), (5,4)}, so P[A2 ∩ A3] = 3/36 = 1/12 and
P[A2] · P[A3] = 1/18.
(d) The events are not 2-way independent.
Pop Quiz 17.7.
(a) We use the definition of conditional probability.

P[A3 | A1 ∩ A2] = P[A1 ∩ A2 ∩ A3]/P[A1 ∩ A2] = P[A1] P[A2] P[A3] / (P[A1] P[A2]) = P[A3].

The penultimate step is by independence.
(b) If X = A1, Y = A2, Z = A3, the probability of the intersection equals the product by independence.
Suppose that exactly one of the sets is the complement, for example Z = Ā3. Then,
by the law of total probability,

P[A1 ∩ A2 ∩ A3] + P[A1 ∩ A2 ∩ Ā3] = P[A1 ∩ A2] → P[A1 ∩ A2 ∩ Ā3] = P[A1 ∩ A2] − P[A1 ∩ A2 ∩ A3].

Now we can use independence to obtain

P[A1 ∩ A2 ∩ Ā3] = P[A1] P[A2] − P[A1] P[A2] P[A3] = P[A1] P[A2] (1 − P[A3]) = P[A1] P[A2] P[Ā3].
Let us give the general case and prove it by induction. Suppose A 1 , A2 , . . . , An are indepen-
dent. Let Xi = Ai or Ai . Then we want to show that
P[X1 ∩ · · · ∩ Xn ] = P[X1 ] × · · · × P[Xn ].
We prove a stronger claim by induction: the probability of any k-way intersection of the X’s
equals the product of the k probabilities. The base case, k = 1, trivially holds. Suppose the
claim holds for any k-way intersection and consider a (k +1)-way intersection. We must prove:
P[Xi1 ∩ · · · ∩ Xik+1 ] = P[Xi1 ] × · · · × P[Xik+1 ].
We use a second induction on the number of complements among these k + 1 sets. When
there are no complements, the claim follows by independence of the X i . Suppose this claim
holds when there are ℓ complements and consider the case where there are ℓ + 1 complements.
Without loss of generality, we may assume that Xik+1 = Āik+1. Then,

P[Xi1 ∩ ⋯ ∩ Xik ∩ Āik+1] = P[Xi1 ∩ ⋯ ∩ Xik] − P[Xi1 ∩ ⋯ ∩ Xik ∩ Aik+1]
                        = P[Xi1] × ⋯ × P[Xik] − P[Xi1 ∩ ⋯ ∩ Xik ∩ Aik+1]
                        = P[Xi1] × ⋯ × P[Xik] − P[Xi1] × ⋯ × P[Xik] × P[Aik+1]
                        = P[Xi1] × ⋯ × P[Xik] × (1 − P[Aik+1])
                        = P[Xi1] × ⋯ × P[Xik] × P[Xik+1].
(The first step follows because the claim holds for all k-way intersections. The second because
the second term has ℓ complements and the claim holds for ℓ complements.)
(c) Suppose (∗) in Exercise 17.6 holds for all 2^3 choices of (X, Y, Z). We must show 1-, 2- and 3-way
independence. 3-way follows directly from (∗). Suppose we have proved k-way independence,
and consider (k − 1)-way independence.

P[Ai1 ∩ ⋯ ∩ Aik−1] = P[Ai1 ∩ ⋯ ∩ Aik−1 ∩ Aik] + P[Ai1 ∩ ⋯ ∩ Aik−1 ∩ Āik]
                  = P[Ai1] ⋯ P[Aik−1] P[Aik] + P[Ai1] ⋯ P[Aik−1] (1 − P[Aik])
                  = P[Ai1] ⋯ P[Aik−1].

If (∗) in Exercise 17.6 holds, then k-way independence implies (k − 1)-way independence.
Since we have 3-way independence, we have 2-way and then 1-way. Our argument holds for
general n > 3.
Exercise 17.6. Given E1, E2, …, Ek−1, none of sk, …, sN are born on day 1 to day k−1.
Suppose sk is born on day k; then sk+1, …, sN (N − k students) are all born on days k+1 to B
(B − k of the B − k + 1 remaining days, given E1, E2, …, Ek−1). By independence,

P[Ek | E1 ∩ E2 ∩ ⋯ ∩ Ek−1] = ((B − k)/(B − k + 1))^(N−k).

Exercise 17.8.
(a) We want

P = Π_{k=1}^{N} ((B−k)/(B−k+1))^(N−k), or equivalently, ln P = Σ_{k=1}^{N} (N−k) ln((B−k)/(B−k+1)).

The sum can be evaluated, but the product is numerically unstable. Observe that

((B−k+1)/(B−k))^(N−k) = (1 + 1/(B−k))^(N−k) = [(1 + 1/(B−k))^(B−k)]^((N−k)/(B−k)) ≈ e^((N−k)/(B−k)),

because (1 + 1/x)^x ≈ e for large x. Using this approximation in P,

P ≈ Π_{k=1}^{N} e^(−(N−k)/(B−k)) = e^(−Σ_{k=1}^{N} (N−k)/(B−k)).

We need to evaluate the sum in the exponent,

Σ_{k=1}^{N} (N−k)/(B−k) = Σ_{k=1}^{N} (N−B+B−k)/(B−k) = N + Σ_{k=1}^{N} (N−B)/(B−k).

Using the integration method, Σ_{k=1}^{N} (N−B)/(B−k) ≈ (N−B) ∫_1^N dx/(B−x) = (N−B) ln((B−1)/(B−N)). Therefore,

P ≈ e^((B−N) ln((B−1)/(B−N)) − N).

When B = 366 and N = 200, P ≈ e^(−69.2). Using the exact sum for ln P, P = e^(−68.4).
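The exact sum and the closed-form approximation are easy to compare numerically (Python; the values −68.4 and −69.2 are the ones quoted above):

```python
from math import log

B, N = 366, 200

# Exact: ln P = sum of (N-k) ln((B-k)/(B-k+1)) for k = 1..N
ln_P = sum((N - k) * log((B - k) / (B - k + 1)) for k in range(1, N + 1))

# Approximation derived above: ln P ~ (B-N) ln((B-1)/(B-N)) - N
ln_P_approx = (B - N) * log((B - 1) / (B - N)) - N

print(round(ln_P, 1), round(ln_P_approx, 1))   # -68.4 -69.2
```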
(b) Trick question. With 367 people and only 366 birthdays, by pigeonhole, at least two people
must share the same birthday so the probability of no social-twins is 0.
(c) We compute ln P = Σ_{k=1}^{N} (N−k) ln((B−k)/(B−k+1)), from which we obtain P.
The probability of a social-twin is 1 − P. We show a plot of 1 − P versus N in the figure.
The probability of a social-twin rapidly increases and first goes over 0.5 for N = 23. So in
a party of just 23, the odds favor there being a social-twin. By N = 60, it is essentially
guaranteed that you have a social-twin.

[Figure: probability of a social-twin versus N, for N = 1 to 60; the curve crosses 0.5 at N = 23 and is near 1 by N = 60.]
(d) (i) Repetition is allowed. Each student has B choices, so there are B N possible sequences.
(ii) Each sequence is equally likely so each has probability 1/B^N.
(iii) The first k birthdays are chosen in B(B−1) ⋯ (B−k+1) = B!/(B−k)! ways. The
remaining N − k students choose (with repetition) from the remaining B − k birthdays
in (B−k)^(N−k) ways. By the product rule, the number of sequences with no repetitions
of the first k birthdays is

(B!/(B−k)!) × (B−k)^(N−k).

(iv) Multiply the number of allowed sequences in (iii) by their probability in (ii) to get

P[no repetition of first k birthdays] = (1/B^N) × (B!/(B−k)!) × (B−k)^(N−k).
(v) We can cancel many terms in Equation (17.5) as follows:

((B−1)/B)^(N−1) ((B−2)/(B−1))^(N−2) ((B−3)/(B−2))^(N−3) ⋯ ((B−k+1)/(B−k+2))^(N−k+1) ((B−k)/(B−k+1))^(N−k)

= [(B−1)^(N−1)/B^(N−1)] · [(B−2)^(N−2)/(B−1)^(N−2)] · [(B−3)^(N−3)/(B−2)^(N−3)] ⋯ [(B−k)^(N−k)/(B−k+1)^(N−k)].

Adjacent pairs of terms simplify: (B−1)^(N−1)/(B−1)^(N−2) = B−1; (B−2)^(N−2)/(B−2)^(N−3) = B−2; and so on. We get:

= (1/B^(N−1)) × (B−1)(B−2) ⋯ (B−k+1) × (B−k)^(N−k)
= (1/B^N) × (B!/(B−k)!) × (B−k)^(N−k),

using B × (B−1)(B−2) ⋯ (B−k+1) = B!/(B−k)!.
(vi) When k = N, the result follows from

B(B−1)(B−2) ⋯ (B−(N−1))/B^N = (B/B) · ((B−1)/B) · ((B−2)/B) ⋯ ((B−(N−1))/B).

To derive this formula directly, observe that

P[s2 does not match s1] = 1 − 1/B;
P[s3 does not match any of s1, s2 | no match in s1, s2] = 1 − 2/B;
P[s4 does not match any of s1, s2, s3 | no match in s1, s2, s3] = 1 − 3/B;
⋮
P[sN does not match any of s1, …, sN−1 | no match in s1, …, sN−1] = 1 − (N−1)/B.

Multiplying these conditional probabilities,

P[no match in s1, s2, …, sN] = (1 − 1/B) × (1 − 2/B) × (1 − 3/B) × ⋯ × (1 − (N−1)/B).
Exercise 17.9.
(a) B = 300 and N = 100, so we can use e−N (N −1)/B ≤ P[no collisions] ≤ e−N (N −1)/2B :
e−33 ≤ P[no collisions] ≤ e−16.5 (essentially 0).
(i) There are no collisions if and only if every bin has at most one object, so:
e−33 ≤ P[every bin has at most one object] ≤ e−16.5 .
(ii) Some bin has more than one object if and only if there is a collision which has probability
1 − P[no collisions]. Therefore,
1 − e−16.5 ≤ P[some bin has more than one object] ≤ 1 − e−33 .
(b) We want P[no collisions] ≥ 0.9, so we set e−N (N −1)/B ≥ 0.9 which gives B ≥ ⌈ N (N − 1)/ ln(1/0.9) ⌉,
or B = 93, 964. That is a pretty big table size for just 100 words.
(c) P[no collisions] ≥ e^(−N(N−1)/B). When B = N^(2+ε), N(N−1)/B = N^(−ε) − 1/N^(1+ε) → 0.
Therefore e^(−N(N−1)/B) → 1. Probabilities are at most 1, so P[no collisions] → 1.
(d) P[no collisions] ≤ e^(−N(N−1)/2B). When B = N^(2−ε), N(N−1)/2B = (1/2)N^ε − (1/2)N^(ε−1) → ∞ for
1 > ε > 0. Therefore e^(−N(N−1)/2B) → 0. Probabilities are at least 0, so P[no collisions] → 0.
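A short Python check of (a) and (b), added here; it only uses the bounds quoted above:

```python
from math import ceil, exp, log

N = 100

# (a) with B = 300: e^-33 <= P[no collisions] <= e^-16.5
lo = exp(-N * (N - 1) / 300)
hi = exp(-N * (N - 1) / (2 * 300))
print(lo, hi)                        # both essentially 0

# (b) smallest table size B with e^{-N(N-1)/B} >= 0.9
B = ceil(N * (N - 1) / log(1 / 0.9))
print(B)                             # 93964
```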
Pop Quiz 17.10. By interchanging home and the lockup, P[home] = 1/3. Alternatively, the
successful step-sequences are L(RL)•i L, having probability 1/2 × (1/4)^i × 1/2. Summing these
probabilities,

P[home] = (1/4) Σ_{i=0}^∞ (1/4)^i = (1/4) × 1/(1 − 1/4) = 1/3.

Alternatively, we can solve the problem using the law of total probability,

P[home] = P[home | LR] P[LR] + P[home | LL] P[LL] + P[home | R] P[R]
        = P[home] × 1/4 + 1 × 1/4 + 0 × 1/2.

That is, P[home] = (1/4) P[home] + 1/4. Solving, P[home] = (1/4)/(1 − 1/4) = 1/3.
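A Monte Carlo check that P[home] = 1/3 (Python; we model the quiz's walk as: start at 0, home two steps to the left, lockup one step to the right, each step left or right with probability 1/2 — this matches the step-sequences L(RL)•i L above):

```python
import random

def walks_home(rng):
    """One random walk: start at 0; absorb at -2 (home) or +1 (lockup)."""
    pos = 0
    while -2 < pos < 1:
        pos += rng.choice((-1, 1))
    return pos == -2

rng = random.Random(1)
trials = 200_000
p_home = sum(walks_home(rng) for _ in range(trials)) / trials
print(p_home)   # close to 1/3
```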
Exercise 17.11.
(a) The analysis is similar to Example 17.5. We want P(100, 200, 18/38) = (β^200 − β^100)/(β^200 − 1),
where β = 0.9. The probability to double the money is about 2.66 × 10^(−5). Essentially 0.
If you bet all your money on red, you double up with probability 18/38 ≈ 0.474. Much better.
(b) This is an interesting problem and the answer is very counter-intuitive. It sounds like the
person furthest from you is likely the last to get the bread, but this is not true. Everyone
but yourself is equally likely to be last. This includes the person sitting next to you as
well as the person diametrically opposite: everyone has probability 1/(n − 1) of being last,
where n is the number of people.
Consider person x. One of x's two neighbors gets the bread before the other. In this situation,
the bread needs to travel n − 2 steps to the other neighbor before reaching x if x is the last to
get the bread. That is, we have a random walk with k = 1 (one step to reach x) and n − 2 steps
in the other direction to reach the other neighbor before x. So,
k = 1, L = n − 1 and β = 1. The probability to reach x first is (L − 1)/L = (n − 2)/(n − 1).
This is the probability that x is not last. The probability x is last is 1/(n − 1), as claimed.
Important: If you do not believe the analysis, do a Monte Carlo simulation to play out the
bread passing routine with 5 people and tabulate how often each person is last.

Chapter 18
Pop Quiz 18.1.
(a) X = 2 and Y = 1 are disjoint so P[X = 2 ∩ Y = 1] = 0, while P[X = 2] × P[Y = 1] = 3/8 × 1/4 = 3/32.
The two do not match. The events are not independent.
(b) {X ≥ 2} ∩ {Y = 1} = {HHH}, hence P[X ≥ 2 ∩ Y = 1] = 1/8, and P[X ≥ 2] × P[Y = 1] = 1/2 × 1/4 = 1/8.
The two match. The events are independent.
(c) P[X = 2 | Y = 0] = P[X = 2 ∩ Y = 0]/P[Y = 0] = (3/8)/(6/8) = 1/2.
(d) P[X ≥ 2 | Y = 0] = P[X ≥ 2 ∩ Y = 0]/P[Y = 0] = (3/8)/(6/8) = 1/2.
Exercise 18.2. Let us first construct the (non-uniform) probability space and the random vari-
ables.
outcome               HHHH HHHT HHTH HHTT HTHH HTHT HTTH HTTT THHH THHT THTH THTT TTHH TTHT TTTH TTTT
probability (×1/81)     16    8    8    4    8    4    4    2    8    4    4    2    4    2    2    1
X12                      2    2    2    2    1    1    1    1    1    1    1    1    0    0    0    0
X23                      2    2    1    1    1    1    0    0    2    2    1    1    1    1    0    0
X34                      2    1    1    0    2    1    1    0    2    1    1    0    2    1    1    0
X12 + X23                4    4    3    3    2    2    1    1    3    3    2    2    1    1    0    0
X12 + X34                4    3    3    2    3    2    2    1    3    2    2    1    2    1    1    0

Using this table, we can compute the probabilities of interest.


(a) (i) P[X12 ≥ 2] = 36/81 = 4/9. (ii) P[X12 + X23 ≥ 2] = 66/81 = 22/27. (iii) P[X12 + X34 ≥ 2] = 72/81 = 8/9.
(b) (i) P[X12 ≥ 2] = 4/9; P[X23 ≥ 2] = 4/9; P[X12 ≥ 2 ∩ X23 ≥ 2] = 24/81 ≠ 4/9 × 4/9. Not independent.
(ii) P[X12 ≥ 2] = 4/9; P[X34 ≥ 2] = 4/9; P[X12 ≥ 2 ∩ X34 ≥ 2] = 16/81 = 4/9 × 4/9. Independent.
= 94 × 94 . Independent.
Pop Quiz 18.3. The shaded upper-left to lower-right diagonals as shown in the probability
space for X = 9 contain all the outcomes with a particular value of X. The probability is the
number of outcomes in the diagonal divided by 36 (table below). Simplifying the fractions gives
the answer.
x 2 3 4 5 6 7 8 9 10 11 12
1 2 3 4 5 6 5 4 3 2 1
PX (x) 36 36 36 36 36 36 36 36 36 36 36

Pop Quiz 18.4. The outcome-probabilities in the underlying probability space sum to 1. Every
outcome is represented once in the joint probabilities, so the sum of the joint probabilities is 1.
Summing the column sums is just another way to sum all the joint probabilities. So the column
sums must add to 1. Similarly for the row sums.
Pop Quiz 18.5. Yes. The first 5 coin tosses are one experiment; the second 5 tosses are another;
and, the 2 dice rolls are a third. X depends only on the first experiment. Y depends only on the
2nd and 3rd experiments which are unrelated to the first experiment.
Exercise 18.6.
(a) Start with the probability space and construct the random variables.

Sample Space Ω
ω             HHH   HHT   HTH   HTT   THH   THT   TTH   TTT
P(ω)          1/27  2/27  2/27  4/27  2/27  4/27  4/27  8/27
X(ω)            3     2     2     1     2     1     1     0
Y(ω)            1     1     0     0     1     0     0     0
X(ω)+Y(ω)       4     3     2     1     3     1     1     0

We use the probability space to derive the joint PDF PXY(x, y) and the marginals:

           X=0     X=1     X=2    X=3   |  PY
Y=0        8/27    12/27   2/27   0     |  22/27
Y=1        0       0       4/27   1/27  |  5/27
PX         8/27    12/27   6/27   1/27

The event X + Y ≥ 2 consists of the entries (x, y) = (2,0), (2,1), (3,1), from which
P[X + Y ≥ 2] = 7/27. The event X ≤ 2 ∩ X + Y ≥ 2 consists of the entries (2,0) and (2,1),
with total probability 6/27. The conditional probability is the ratio (6/27)/(7/27) = 6/7.

(b) In each case, we give the joint PDF and the joint PDF obtained from the product of the
marginals. If the two match, the random variables are independent. Otherwise they are not.

(i)
P_{X12 X23}   X12=0   X12=1   X12=2  |  P_{X23}
X23=0          3/81    6/81    0     |  1/9
X23=1          6/81   18/81   12/81  |  4/9
X23=2          0      12/81   24/81  |  4/9
P_{X12}        1/9     4/9     4/9

P_{X12} P_{X23}   X12=0   X12=1   X12=2
X23=0              1/81    4/81    4/81
X23=1              4/81   16/81   16/81
X23=2              4/81   16/81   16/81

P_{X12 X23} ≠ P_{X12} P_{X23}, so X12 and X23 are not independent.

(ii)
P_{X12 X34}   X12=0   X12=1   X12=2  |  P_{X34}
X34=0          1/81    4/81    4/81  |  1/9
X34=1          4/81   16/81   16/81  |  4/9
X34=2          4/81   16/81   16/81  |  4/9
P_{X12}        1/9     4/9     4/9

In all entries, P_{X12 X34} = P_{X12} P_{X34}, so X12 and X34 are independent. No surprise because
the first 2 coin tosses and the last 2 coin tosses are unrelated experiments.
(iii) The joint PDF of X12² and X34² (with values 0, 1, 4) is the same as the table in (ii), as is
the product of the marginals:

P_{X12² X34²}   X12²=0   X12²=1   X12²=4  |  P_{X34²}
X34²=0           1/81     4/81     4/81   |  1/9
X34²=1           4/81    16/81    16/81   |  4/9
X34²=4           4/81    16/81    16/81   |  4/9
P_{X12²}         1/9      4/9      4/9

In all entries the joint PDF equals the product of the marginals, so X12² and X34² are independent.
No surprise. If two random variables have nothing to do with each other (are independent)
then functions of the random variables will also be unrelated.
(c) Σ_x Σ_y PX(x) PY(y) = (Σ_x PX(x)) × (Σ_y PY(y)). Each individual sum is 1, so the product is 1.
Exercise 18.7. We reproduce the outcome tree from Exercise 15.2. There are 36 outcomes (two
die rolls), each with probability 1/36; below each outcome is the value of the maximum (the
random variable). The maximum equals 1 on one outcome, 2 on three outcomes, and in general
equals x on 2x − 1 outcomes.

(a) x        1      2      3      4      5      6
    PX(x)    1/36   3/36   5/36   7/36   9/36   11/36
(b) We only give the CDF at the values with positive PDF. The PDF can be computed from the
CDF by looking at the jumps in the CDF.
    x        1      2      3      4      5      6
    FX(x)    1/36   4/36   9/36   16/36  25/36  36/36
(c) Consider n dice values X1 , . . . , Xn . In the problem, n is 10. Note:
max(X1 , . . . , Xn ) ≤ x ↔ X1 ≤ x and X2 ≤ x and · · · and Xn ≤ x
The outcomes in the first event are the same as the outcomes in the second. That means,
P[max(X1 , . . . , Xn ) ≤ x] = P[X1 ≤ x and X2 ≤ x and · · · and Xn ≤ x].
Since X1, . . . , Xn are independent, we can compute the RHS by multiplying the individual probabilities. For x ∈ {1, 2, 3, 4, 5, 6}, P[Xi ≤ x] = x/6. Therefore,

    P[max(X1, . . . , Xn) ≤ x] = FX(x) = (x/6)^n.

The jumps in FX give us the PDF PX,

    PX(x) = FX(x) − FX(x − 1) = (x^n − (x − 1)^n)/6^n.
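As a quick numerical sanity check of the closed form above, the following sketch (illustrative, not part of the text) evaluates PX and confirms it sums to 1 and matches the n = 2 table:

```python
def max_pdf(n, x):
    """PDF of the maximum of n fair six-sided dice at x in {1,...,6}."""
    return (x**n - (x - 1)**n) / 6**n

n = 10
pdf = [max_pdf(n, x) for x in range(1, 7)]
total = sum(pdf)  # should be 1, since the events partition the sample space
```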

Pop Quiz 18.8.
(a) Since I show you the smaller number half the time, you will be wrong half the time.
(b) Using the law of total probability for the two cases you say smaller or larger,

    P[you win] = P[you win | smaller] P[smaller] + P[you win | larger] P[larger]
               = 1/2 × 1/2 + 1/2 × 1/2 = 1/2.
(c) I choose (4,5). You always say smaller and win half the time.
(d) Yes! See the discussion following the pop quiz in the text and Exercise 18.9.


Exercise 18.9.
(a) P[you win] = P[E] · P[you win | E] + P[Ē] · P[you win | Ē]   (total probability)
              = P[E] · 1 + (1 − P[E]) · 1/2   (if E occurs you win; if not you win half the time)
              = 1/2 + (1/2) P[E]   (algebra)
              = 1/2 + (1/8)(H − L) > 1/2.
The last step uses P[E] = (H − L)/4, where H, L ∈ {1, 2, . . . , 5}. To see this, observe that the interval (L, H) contains H − L possible outcomes of X, each having probability 1/4.
(b) Let the PDF of X be p1, p2, p3, p4 with minimum pi ≤ 1/4. I now choose L = i and H = i + 1. Then P[E] = pi. And so, P[you win] = 1/2 + (1/2) P[E] = 1/2 + (1/2) pi ≤ 1/2 + (1/2) × (1/4) = 5/8.
Exercise 18.10.
(a) Xi = 1 if you get question i correct; P[Xi = 1] = 1/5. The number of correct answers is X = X1 + · · · + X20, a Binomial with n = 20 and p = 1/5. We want P[X ≥ 10],

    P[X ≥ 10] = Σ_{k=10}^{20} C(20, k) (1/5)^k (4/5)^{20−k} ≈ 0.0026.
(b) Let X be the number of games played between A, B. The outcomes AAAA or BBBB each have probability (1/2)^4 = 1/16. So P[X = 4] = 2/16 = 1/8.
If the series ends in 5 games, there are two cases: A wins or B wins. Each has the same probability. If A wins, the series looks like xxxxA, and A must win 3 games in the first 4. We have a Binomial with n = 4 and p = 1/2: P[k = 3] = C(4, 3) (1/2)^3 (1/2)^1 = 4/16. Therefore P[A wins in 5] = (4/16) × (1/2) (the last 1/2 is because A wins the last game). So,

    P[X = 5] = P[A wins in 5] + P[B wins in 5] = 4/16 = 1/4.

Similarly, A wins in 6 games with probability C(5, 3) (1/2)^3 (1/2)^2 × (1/2), and

    P[X = 6] = P[A wins in 6] + P[B wins in 6] = 10/32 = 5/16.

Lastly, A wins in 7 games with probability C(6, 3) (1/2)^3 (1/2)^3 × (1/2), and

    P[X = 7] = P[A wins in 7] + P[B wins in 7] = 20/64 = 5/16.
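The series-length PDF above can be checked exactly with a short sketch (illustrative, under the same fair-games assumption p = 1/2): the series ends at game t when one team wins its 4th game at game t, having won 3 of the first t − 1.

```python
from math import comb

def p_series_ends(t):
    # two symmetric cases (A wins or B wins), each game fair (p = 1/2)
    return 2 * comb(t - 1, 3) * (1/2)**(t - 1) * (1/2)

probs = {t: p_series_ends(t) for t in (4, 5, 6, 7)}
```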
(c) There are C(100, 20) × C(80, 30) × 4^50 sequences with 20 ones and 30 fours (C(100, 20) ways to choose the ones; of the remaining 80 positions there are C(80, 30) ways to choose the fours; each of the remaining 50 slots can be picked in 4 ways, for 4^50). The probability of each sequence is (1/6)^100, so

    P[20 ones and 30 fours] = C(100, 20) × C(80, 30) × 4^50 × (1/6)^100 ≈ 9.226 × 10^−6.

Here is another approach, similar to the Binomial distribution: we get the multinomial distribution. From counting, there are C(k1 + k2 + k3; k1, k2, k3) sequences with k1 objects of type 1, k2 of type 2 and k3 of type 3. That is, there are C(100; 20, 30, 50) sequences with 20 ones, 30 fours, and 50 other values (three types of objects). Each one has probability 1/6; each four has probability 1/6 and each other value has probability 4/6. Therefore the probability of each such sequence is (1/6)^20 · (1/6)^30 · (4/6)^50. So, we recover the same probability as:

    P[20 ones and 30 fours] = C(100; 20, 30, 50) (1/6)^20 · (1/6)^30 · (4/6)^50.
(d) The challenge is to evaluate small numbers, like (1 − p)^n. It is numerically more stable to compute the log of PX. So, log PX(0) = n log(1 − p). Having computed log PX(k − 1), you can use the following update to get log PX(k),

    log PX(k) ← log PX(k − 1) + log( p(n − k + 1) / ((1 − p) k) ).

[Plot: PX(k) versus k, peaked near the mean, shown for k between roughly 250 and 350 with PX up to about 0.03.]
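The log-PDF update above can be sketched as follows (checked against direct evaluation with math.comb for a modest n; the parameters are illustrative):

```python
from math import log, exp, comb

def binomial_log_pdf(n, p):
    # logs[k] = log P_X(k); start from log P_X(0) = n log(1-p)
    logs = [n * log(1 - p)]
    for k in range(1, n + 1):
        logs.append(logs[-1] + log(p * (n - k + 1) / ((1 - p) * k)))
    return logs

n, p = 50, 0.3
logs = binomial_log_pdf(n, p)
direct = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
```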
Exercise 18.11.
(a) A boy is “success” and the probability of success is 1/2. Let X be the number of trials till success. We want P[X ≥ 5], where PX(t) = β(1 − p)^t with p = 1/2 and β = p/(1 − p) = 1:

    P[X ≥ 5] = Σ_{t=5}^{∞} (1/2)^t = (1/2)^5 × 1/(1 − 1/2) = 1/16.


Alternatively, we may observe that X ≥ 5 if and only if the first 4 children are girls, which happens with probability (1/2)^4 = 1/16.
(b) You wait t trials for two successes precisely when the tth trial is a success and there is one success in trials 1, . . . , t − 1. There are t − 1 sequences with one success in the first t − 1. The probability of each such sequence is p^2 (1 − p)^{t−2} (two successes and t − 2 failures), therefore

    P[X = t] = PX(t) = (t − 1) p^2 (1 − p)^{t−2} = β^2 (t − 1)(1 − p)^t.

[Plot: PX(t) versus t for t = 1, . . . , 15, rising to a peak of about 1/4 near t = 2, 3 and then decaying.]
(c) (i) In the first 20 transmissions you have at most 11 successful transmissions. Let X be the number of successful transmissions in the first 20 trials; X ∼ B(20, 0.9). We want P[X ≤ 11],

    P[X ≤ 11] = Σ_{k=0}^{11} C(20, k) (0.9)^k (0.1)^{20−k}.

(ii) You fail to send any packet with probability (0.1)20 , so the probability you successfully
send at least 1 packet is 1 − (0.1)20 .

Chapter 19
Exercise 19.1. The discussion after the exercise suggests the values you should observe for the
Monte Carlo averages. (a) average dice roll is about 3.5. (b) 10 coin tosses yields on average 5
heads. (c) 7.5mm of rain per day on average. (d) The gamblers lose on average $52.63.
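Parts (a) and (b) can be sketched with a minimal Monte Carlo (illustrative only; the seed and sample sizes are arbitrary choices, not from the text):

```python
import random

random.seed(0)
n = 100_000

# (a) average dice roll, expected near 3.5
avg_roll = sum(random.randint(1, 6) for _ in range(n)) / n

# (b) average number of heads in 10 fair coin tosses, expected near 5
trials = n // 10
avg_heads = sum(sum(random.randint(0, 1) for _ in range(10))
                for _ in range(trials)) / trials
```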
Exercise 19.2.
(a) E[X] = Σ_{i=1}^{6} i × 1/6 = (1/6) × (1/2 · 6 · 7) = 7/2 = 3 1/2.
(b) [Plot: the difference between the Monte Carlo average and E[X] versus the number of rolls (10^2 to 10^6, log scale); the fluctuations shrink toward 0, staying within about ±0.1.]
(c) E[Y] = Σ_{ω∈Ω} P(ω) Y(ω) = Σ_{ω∈Ω} P(ω)(a X(ω) + b) = a Σ_{ω∈Ω} P(ω) X(ω) + b Σ_{ω∈Ω} P(ω) = a E[X] + b.
Pop Quiz 19.3. We show the probability space with the random variables below.
Each of the 36 outcomes (Die 1, Die 2) has probability 1/36. The tables give X = D1 − D2 and X = |D1 − D2| (columns are the Die 1 value from 1 to 6; rows are the Die 2 value from 6 down to 1):

    D1 − D2:  −5 −4 −3 −2 −1  0        |D1 − D2|:  5 4 3 2 1 0
              −4 −3 −2 −1  0  1                    4 3 2 1 0 1
              −3 −2 −1  0  1  2                    3 2 1 0 1 2
              −2 −1  0  1  2  3                    2 1 0 1 2 3
              −1  0  1  2  3  4                    1 0 1 2 3 4
               0  1  2  3  4  5                    0 1 2 3 4 5

To compute the expectations, we weight each random-variable value by the corresponding probability and add. For D1 − D2, every positive value has a corresponding negative value which cancels, so E[D1 − D2] = 0. The cancelation does not occur for E[|D1 − D2|],

    E[|D1 − D2|] = (1/36) × 2 × (5×1 + 4×2 + 3×3 + 2×4 + 1×5) = 70/36 = 1 17/18.
Exercise 19.4.


(a) n = 20 and p = 1/2, so E[X] = np = 10. The expected number of heads is 10.
(b) n = 20 and p = 1/5, so E[X] = np = 4. The expected number of correct answers is 4.
(c) E[X(X − 1)] = Σ_{k=0}^{n} k(k − 1) C(n, k) p^k (1 − p)^{n−k} = Σ_{k=2}^{n} k(k − 1) C(n, k) p^k (1 − p)^{n−k}. Observe that

    k(k − 1) C(n, k) = k(k − 1) n!/(k!(n − k)!) = n!/((k − 2)!(n − k)!) = n(n − 1) (n − 2)!/((k − 2)!(n − k)!) = n(n − 1) C(n − 2, k − 2).

Using this identity in the expression for E[X(X − 1)],

    E[X(X − 1)] = n(n − 1) Σ_{k=2}^{n} C(n − 2, k − 2) p^k (1 − p)^{n−k}
                = n(n − 1) p^2 Σ_{k=2}^{n} C(n − 2, k − 2) p^{k−2} (1 − p)^{n−k}   (n(n − 1)p^2 is a constant)
                = n(n − 1) p^2 Σ_{ℓ=0}^{n−2} C(n − 2, ℓ) p^ℓ (1 − p)^{n−2−ℓ}      (change index to ℓ = k − 2)
                = n(n − 1) p^2 (p + 1 − p)^{n−2}                                  (Binomial theorem)
                = n(n − 1) p^2.                                                   (p + 1 − p = 1)
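The identity E[X(X − 1)] = n(n − 1)p² can be verified directly from the Binomial PDF (a numerical sketch with illustrative parameters):

```python
from math import comb

def e_x_times_xm1(n, p):
    # E[X(X-1)] computed term by term from the Binomial PDF
    return sum(k * (k - 1) * comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1))

n, p = 20, 0.2
value = e_x_times_xm1(n, p)
```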

Exercise 19.5.
(a) Hitting the bulls-eye is success. The bulls-eye area is 1/100th the area of the board, so p = 1/100. Therefore, the expected number of darts you throw is 1/p = 100.
(b) A 5-pack contains no EX with probability 0.99^5 ≈ 0.95. So a 5-pack contains an EX (success) with probability p ≈ 0.05. You expect to buy 1/p ≈ 20 5-packs to get an EX.
(c) If you pay x, with probability 10^−7 you win 10^6 and with probability 1 − 10^−7 you lose x, so your expected profit is 10^−7 × 10^6 − x(1 − 10^−7). You would only play if you made a profit, so x < 10^−1/(1 − 10^−7). You would pay approximately 10¢ to play.
(d) (i) We may assume the first child is a boy (the argument is identical if the first is a girl). You are now waiting for a girl with success probability p = 1/2. Therefore you expect 2 trials to get a girl, for a total of 3.
(ii) You expect to wait 2 trials for the first boy and 2 more for the second, for a total of 4 kids. Let's compute this from the PDF computed in Exercise 18.11 for the waiting time to two successes with success probability p. Since p = 1/2, PX(t) = (t − 1)(1/2)^t, from which

    E[X] = Σ_{t=1}^{∞} t(t − 1)(1/2)^t.   (∗)

We need to compute this infinite sum. Note that the first term in the sum is zero and

    (1/2) E[X] = Σ_{t=1}^{∞} t(t − 1)(1/2)^{t+1} = Σ_{t=2}^{∞} (t − 1)(t − 2)(1/2)^t.   (∗∗)

Subtracting (∗∗) from (∗), the LHS is E[X] − (1/2) E[X] = (1/2) E[X] and we have:

    (1/2) E[X] = Σ_{t=2}^{∞} (t(t − 1) − (t − 1)(t − 2))(1/2)^t = Σ_{t=2}^{∞} 2(t − 1)(1/2)^t = Σ_{t=2}^{∞} (t − 1)(1/2)^{t−1}.

The last sum is Σ_{t=1}^{∞} t(1/2)^t, which equals 2 by Lemma 19.6 on page 287, so we get E[X] = 4.
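A quick numerical sketch confirms that the infinite sum (∗) indeed converges to 4 (truncating at a large t, where the remaining tail is negligible):

```python
# E[X] = sum over t >= 1 of t(t-1)(1/2)^t; terms decay geometrically,
# so truncating at t = 200 leaves an astronomically small tail.
total = sum(t * (t - 1) * 0.5**t for t in range(1, 200))
```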
Exercise 19.6.
(a) Conditioned on D1 + D2 ≥ 4, there are now 33 outcomes, each having probability 1/33. So,

    E[D1 + D2 | D1 + D2 ≥ 4] = (3×4 + 4×5 + 5×6 + 6×7 + 5×8 + 4×9 + 3×10 + 2×11 + 1×12)/33 = 244/33 ≈ 7.4.
(b) (i) The relevant outcomes are {0, 2, 4, . . . , 20}.

    P[X = 2i | even] = P[X = 2i ∩ even]/P[even] = (1/P[even]) C(20, 2i) (1/2)^{2i} (1/2)^{20−2i} = C(20, 2i)/(2^20 P[even]).

We can compute P[even] as a sum over Binomial coefficients:

    P[even] = (1/2)^20 Σ_{even i} C(20, i) = (1/2)^20 × 2^19 = 1/2,

where we used Σ_{even i} C(n, i) = 2^{n−1} (the same is also true for the sum over the odd Binomial

coefficients). To see this, the Binomial Theorem gives:

    2^n = (1 + 1)^n = Σ_{i=0}^{n} C(n, i) = Σ_{even i} C(n, i) + Σ_{odd i} C(n, i);
    0 = (−1 + 1)^n = Σ_{i=0}^{n} C(n, i)(−1)^i = Σ_{even i} C(n, i) − Σ_{odd i} C(n, i).

Solving these equations, Σ_{even i} C(n, i) = Σ_{odd i} C(n, i) = 2^{n−1}.
Now for the conditional expectation:

    E[X | even] = Σ_{even i} i · P[X = i | even] = Σ_{even i} i C(20, i)/(2^20 P[even])
                = (1/2^19) Σ_{even i>0} 20!/((i − 1)!(20 − i)!)
                = (20/2^19) Σ_{even i>0} 19!/((i − 1)!(19 − (i − 1))!)
                = (20/2^19) Σ_{even i>0} C(19, i − 1)
                = (20/2^19) Σ_{odd i} C(19, i).

The last sum is 2^18 and so E[X | even] = 10.


(ii) We need to do a similar analysis: E[X | at least 8] = Σ_{i≥8} i · P[X = i | at least 8], where

    P[X = i | at least 8] = C(20, i)/(2^20 P[at least 8]),

and P[at least 8] = 2^−20 Σ_{i≥8} C(20, i) ≈ 0.86841. Therefore,

    E[X | at least 8] = (1/(2^20 P[at least 8])) Σ_{i≥8} i · C(20, i) ≈ 10.553.
(c) The relevant outcomes are {5, 7, 11, 13}, each with conditional probability 1/4. So

    E[X² | prime] = (1/4)(5² + 7² + 11² + 13²) = 91.

(d) We want E[X | X ≥ k + 1]. We need the conditional probability

    P[X = t | X ≥ k + 1] = P[X = t ∩ X ≥ k + 1]/P[X ≥ k + 1] = β(1 − p)^t/(1 − p)^k,

because P[X ≥ k + 1] is the probability of failing on the first k trials, which is (1 − p)^k. Therefore,

    E[X | X ≥ k + 1] = (β/(1 − p)^k) Σ_{t=k+1}^{∞} t(1 − p)^t = β Σ_{t=1}^{∞} (t + k)(1 − p)^t = 1/p + k.
(e) E[X | X ≥ 25] = Σ_{x≥25} x PX(x) / P[X ≥ 25] and E[X | X ≥ 17] = Σ_{x≥17} x PX(x) / P[X ≥ 17]. So,

    E[X | X ≥ 25] − E[X | X ≥ 17] = ( P[X ≥ 17] Σ_{x≥25} x PX(x) − P[X ≥ 25] Σ_{x≥17} x PX(x) ) / ( P[X ≥ 25] P[X ≥ 17] ).

We show that the numerator is positive. Let P1 = P[17 ≤ X < 25]; P2 = P[X ≥ 25]; S1 = Σ_{x=17}^{24} x PX(x); and S2 = Σ_{x≥25} x PX(x). Note: S2 > 25 P2 and S1 < 24 P1. We have,

    P[X ≥ 17] Σ_{x≥25} x PX(x) − P[X ≥ 25] Σ_{x≥17} x PX(x)
        = (P1 + P2) S2 − P2 (S1 + S2)
        = P1 S2 − P2 S1
        > 25 P1 P2 − 24 P2 P1
        > 0.


Pop Quiz 19.7.


(a) Definition of expected value.
(b) By the law of total probability, P[X = x] = P[A] P[X = x | A] + P[Ā] P[X = x | Ā].
(c) P[A] and P[Ā] are independent of x and can be pulled out of the sum (constant rule).
(d) Definition of the conditional expectation.
Exercise 19.8.
(a) The logic is similar to the case of 3 dice. Let X1 be the first die, and X3 the sum of the remaining 3 dice. The dice are independent, so P[X3 = x3 | X1 = x1] = PX3(x3). By the law of total probability,

    E[X] = Σ_{i=1}^{6} E[X | X1 = i] P[X1 = i].

Since E[X | X1 = i] = Σ_{x3} (i + x3) PX3(x3) = i + E[X3] = i + 10.5, we have

    E[X] = (1/6) Σ_{i=1}^{6} (i + 10.5) = (1/6) Σ_{i=1}^{6} i + 10.5 = 3.5 + 10.5 = 14.

The expected sum of 4 dice is 4 times the expected value of one die.
(b) E[X] = E[X | B] P[B] + E[X | G] P[G]. Let p be the probability of a boy. If you start with a boy, you are now waiting for a girl, so E[X | B] = 1 + 1/(1 − p). If you start with a girl, you are now waiting for a boy, so E[X | G] = 1 + 1/p. So,

    E[X] = (1 + 1/(1 − p)) p + (1 + 1/p)(1 − p) = 1 + p/(1 − p) + (1 − p)/p.

When p = 1/2, you expect 3 kids till you get a boy and a girl.
(c) E[X] = E[X | H] P[H] + E[X | T] P[T] = 7p + 10.5(1 − p) = 10.5 − 3.5p, where p = P[H].
(d) E[X] = E[X | fair] P[fair] + E[X | 2-heads] P[2-heads] = 5 × 9/10 + 10 × 1/10 = 5 1/2.
(e) E[X²] = E[X² | success] P[success] + E[X² | fail] P[fail].
Exercise 19.9.
(a) For n > 1, let Ai be the event that the pivot is the ith smallest number, i = 1, . . . , n; P[Ai] = 1/n. By the law of total probability,

    Tn = E[runtime(n)] = Σ_{i=1}^{n} E[runtime | Ai] P[Ai] = (1/n) Σ_{i=1}^{n} E[runtime | Ai].
Given Ai, the runtime is n + 1 plus the runtime on the left list of size i − 1 plus the runtime on the right list of size n − i:

    runtime given Ai = n + 1 + runtime(i − 1) + runtime(n − i).

Taking the expectation, E[runtime | Ai] = (n + 1) + T_{i−1} + T_{n−i}. Therefore,

    Tn = (1/n) Σ_{i=1}^{n} [(n + 1) + T_{i−1} + T_{n−i}] = n + 1 + (1/n)(T0 + T_{n−1} + T1 + T_{n−2} + · · · + T_{n−2} + T1 + T_{n−1} + T0).

The sum contains two copies of each Ti. Since T0 = 0, we have

    Tn = n + 1 + (2/n) Σ_{i=1}^{n−1} Ti.
(b) Using (a), T2 = 3 + (2/2) T1 = 4. Rewriting the recursion in (a),

    n Tn = n(n + 1) + 2 Σ_{i=1}^{n−1} Ti.

Similarly, for n > 2, we have

    (n − 1) T_{n−1} = (n − 1)n + 2 Σ_{i=1}^{n−2} Ti.

Subtracting this equation from the previous,

    n Tn − (n − 1) T_{n−1} = n(n + 1) − (n − 1)n + 2 Σ_{i=1}^{n−1} Ti − 2 Σ_{i=1}^{n−2} Ti = 2n + 2 T_{n−1}.

Rearranging gives n Tn = (n + 1) T_{n−1} + 2n. Dividing both sides by n gives the desired result.
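The two forms of the recursion can be cross-checked numerically (a sketch, taking T1 = 1 and T2 = 4 as in the text; the simplified recursion of part (b) was derived for n > 2):

```python
N = 100
T = [0.0, 1.0, 4.0]  # T_0 = 0, T_1 = 1, T_2 = 4
for n in range(3, N + 1):
    # part (b): T_n = (n+1)/n * T_{n-1} + 2
    T.append((n + 1) / n * T[n - 1] + 2)

# spot-check against the direct recursion of part (a):
# T_n = n + 1 + (2/n) * (T_1 + ... + T_{n-1})
direct = [n + 1 + 2 / n * sum(T[1:n]) for n in range(2, N + 1)]
```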


(c) Deriving the upper bound is interesting. Given the bound, proving it by induction is good practice. We need two facts: 1 + x ≤ e^x, and e^{Hn} ≥ n + 1 (which we proved using the integration method on page 119). One can verify the bound for T1 and T2. Suppose the bound holds for Tn and apply this induction hypothesis to T_{n+1}:

    T_{n+1} = (1 + 1/(n+1)) Tn + 2 ≤ 2(1 + 1/(n+1)) Hn e^{Hn} + 2 ≤ 2 e^{1/(n+1)} Hn e^{Hn} + 2 = 2 Hn e^{H_{n+1}} + 2.

Writing Hn = H_{n+1} − 1/(n+1), we get that

    T_{n+1} ≤ 2(H_{n+1} − 1/(n+1)) e^{H_{n+1}} + 2 = 2 H_{n+1} e^{H_{n+1}} + 2(1 − e^{H_{n+1}}/(n+1)).

e^{H_{n+1}} ≥ n + 2 implies e^{H_{n+1}}/(n+1) ≥ (n+2)/(n+1) > 1, or 1 − e^{H_{n+1}}/(n+1) < 0. Therefore,

    T_{n+1} ≤ 2 H_{n+1} e^{H_{n+1}} + 2(1 − e^{H_{n+1}}/(n+1)) ≤ 2 H_{n+1} e^{H_{n+1}}.

How did we derive the upper bound? We unfolded the recursion:

    Tn = (1 + 1/n) T_{n−1} + 2
    (1 + 1/n) T_{n−1} = (1 + 1/n)(1 + 1/(n−1)) T_{n−2} + 2(1 + 1/n)
    (1 + 1/n)(1 + 1/(n−1)) T_{n−2} = (1 + 1/n)(1 + 1/(n−1))(1 + 1/(n−2)) T_{n−3} + 2(1 + 1/n)(1 + 1/(n−1))
    ...
    (1 + 1/n) · · · (1 + 1/4) T3 = (1 + 1/n)(1 + 1/(n−1)) · · · (1 + 1/3) T2 + 2(1 + 1/n) · · · (1 + 1/4).

Now equate the sum of the left hand sides to the sum of the right hand sides,

    Tn = (1 + 1/n) · · · (1 + 1/3) T2 + 2[1 + (1 + 1/n) + (1 + 1/n)(1 + 1/(n−1)) + · · · + (1 + 1/n) · · · (1 + 1/4)].

The first term is bounded by e^{1/n} e^{1/(n−1)} · · · e^{1/3} = e^{Hn − H2}. The second term is a sum of terms of the form a_k, where a_k = (1 + 1/n) · · · (1 + 1/k). Using ln(1 + x) ≤ x (because 1 + x ≤ e^x),

    ln a_k ≤ Σ_{i=k}^{n} 1/i = Hn − H_{k−1},

which implies

    Tn ≤ T2 e^{Hn − H2} + 2 Σ_{k=3}^{n} e^{Hn − H_k} = T2 e^{Hn − H2} + 2 e^{Hn} Σ_{k=3}^{n} e^{−H_k}.

Since H_k ≥ ln k, e^{−H_k} ≤ 1/k and so Σ_{k=3}^{n} e^{−H_k} ≤ Σ_{k=3}^{n} 1/k = Hn − H2. We conclude that

    Tn ≤ T2 e^{Hn − H2} + 2 e^{Hn}(Hn − H2) = 2 Hn e^{Hn} + e^{Hn}(T2 e^{−H2} − 2 H2).

Since T2 = 4, you may verify that T2 e^{−H2} − 2 H2 < 0, which gives the bound.
Chapter 20
Exercise 20.1.
(a) X is the sum of the dice, X = X1 + · · · + Xn. Xi is an ri-sided die, so E[Xi] = (1/ri)(1 + · · · + ri) = (1/ri) × (1/2) ri (ri + 1) = (1/2)(ri + 1). By linearity of expectation,

    E[X] = Σ_{i=1}^{n} E[Xi] = (1/2) Σ_{i=1}^{n} (1 + ri) = n/2 + (1/2) Σ_{i=1}^{n} ri.

(b) Let Xi indicate (0 or 1) whether trial i is a success. Xi is a Bernoulli indicator random variable with success probability pi and E[Xi] = pi. The number of successes is X = X1 + · · · + Xn. By linearity of expectation, E[X] = Σ_{i=1}^{n} E[Xi] = Σ_{i=1}^{n} pi.
(c) Xi is the number of trials from the (i − 1)th success to the ith and X = X1 + · · · + Xn. E[Xi] = 1/pi. By linearity of expectation, E[X] = Σ_{i=1}^{n} 1/pi.
(d) Computing the PDF is very challenging, let alone computing the expectation from the PDF.
Pop Quiz 20.2. We need to count the possible outcomes. There are 6 choices for the first die. If die 1 is i, there are i further rolls, so there are 6^i possible outcomes if the first die is i. Therefore, the number of possible outcomes is 6^1 + · · · + 6^6 = 55986.
The probability space is not uniform. The outcome (1, 1) has probability 1/6^2, but the outcome (2, 1, 1) has probability 1/6^3.
You could create a uniform outcome space by tossing 7 dice. You would then define the equivalent random variable X1 as the first roll and X2 as the sum of the next X1 rolls.
Exercise 20.3.


(a) For cases X = 1, 2, . . . , 100, total probability gives

    PY(y) = Σ_{x=1}^{100} PY(y | x) PX(x) = (1/100) Σ_{x=y}^{100} 1/x,

because: PX(x) = 1/100 (X ∼ U[100]); PY(y | x) = 0 for x < y; PY(y | x) = 1/x for x ≥ y (Y ∼ U[x]). The sum is H100 for y = 1 and H100 − H_{y−1} for y > 1, where Hn is the nth harmonic number. So,

    PY(y) = H100/100 for y = 1;   PY(y) = (H100 − H_{y−1})/100 for y ∈ {2, . . . , 100}.

[Plot: PY(y) versus y for y = 1, . . . , 100, decreasing from about 0.05 at y = 1 toward 0 at y = 100.]

If you wish to verify that the probabilities sum to 1, you may do so, but you will need the Harmonic sum Σ_{i=1}^{n} Hi = (n + 1) Hn − n (you can prove this by induction). Computing the expected value from this PDF is a torturous exercise in summation, but since we are young, let's go for it. In the process we show how to compute Harmonic sums, which occur frequently in computer science. We will do the computation for general n (in our case n = 100).

    E[Y] = Σ_{i=1}^{n} i PY(i) = (1/n)( Hn + Σ_{i=2}^{n} i(Hn − H_{i−1}) ) = (1/n)( Hn Σ_{i=1}^{n} i − Σ_{i=2}^{n} i H_{i−1} ).

The first sum is (1/2) n(n + 1). The second sum is an example of a Harmonic sum.
n(n + 1). The second sum is an example of a Harmonic sum.

n
P 2
iHi−1 = 2 × H1 = 1
i=2
3 3
+ 3 × H2 = 1
+ 2
4 4 4
+ 4 × H3 = 1
+ 2
+ 3
5 5 5 5
+ 5 × H4 = 1
+ +2 3
+ 4
.. .. .. ..
. . . .
+ n × Hn−1 = n1 + n2 + n
3
+ n
4
+ ··· + n
n−1

Instead of summing rows, let’s sum columns (shaded). The first column is (2 + · · · + n)/1 =
1
2
(n(n + 1) − 1(1 + 1))/1; the second column is (3 + · · · + n)/2 = 12 (n(n + 1) − 2(2 + 1))/2;
the ith column is ((i + 1) + · · · + n)/i = 12 (n(n + 1) − i(i + 1))/i. Therefore,
n n−1
X 1 X n(n + 1) − i(i + 1)
iHi−1 =
i=2
2 i=1 i
n−1
n(n + 1) 1X
= Hn−1 − (i + 1)
2 2 i=1
n−1
n(n + 1) 1X n(n + 1) 1
= Hn−1 − (i + 1) = Hn−1 − ( 12 n(n + 1) − 1).
2 2 i=1 2 2
Pn
(You may use this technique to compute the Harmonic sum i=1 Hn .) For E[Y] we get
1 1 
E[Y] = n(n + 1)Hn − 12 n(n + 1)Hn−1 + 41 n(n + 1) − 21
n 2
1 1  n+3
= 2
n(n + 1) (Hn − Hn−1 ) + 14 n(n + 1) − 12 = .
n | {z } 4
1/n

In our case, n = 100, so E[Y] = 25 43 .


(b) Y ∼ U[X]. By Theorem 19.5 on page 287, E[Y | X] = (1/2)(X + 1). By iterated expectation,

    E[Y] = E_X[ E[Y | X] ] = E[(1/2)(X + 1)] = 1/2 + (1/2) E[X] = 25 3/4,

where the last step follows because X ∼ U[100], so E[X] = (1/2)(100 + 1). How much easier!
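The formula E[Y] = (n + 3)/4 can also be confirmed exactly from the PDF using rational arithmetic (a sketch with n = 100, as in the exercise):

```python
from fractions import Fraction

n = 100
# harmonic numbers H_0, H_1, ..., H_n as exact fractions
H = [Fraction(0)]
for k in range(1, n + 1):
    H.append(H[-1] + Fraction(1, k))

# P_Y(y) = (H_n - H_{y-1}) / n, with H_0 = 0 (so P_Y(1) = H_n / n)
pdf = [Fraction(0)] + [(H[n] - H[y - 1]) / n for y in range(1, n + 1)]
expectation = sum(y * pdf[y] for y in range(n + 1))
```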
Exercise 20.4.


(a) Y depends on Z. If Z = 0, Y = 0, so E[Y | Z = 0] = 0. If Z = 1, Y has the same PDF as X,


so E[Y | Z = 1] = E[X] and E[Y 2 | Z = 1] = E[X2 ]. We can summarize both cases as
E[Y | Z] = E[X] · Z;
E[Y 2 | Z] = E[X2 ] · Z.
   
(b) E[Y] = EZ E[Y | Z] = EZ E[X] · Z = E[X] · EZ [Z] = (1 − p) E [X]
   
E[Y 2 ] = EZ E[Y 2 | Z] = EZ E[X2 ] · Z = E[X2 ] · EZ [Z] = (1 − p) E [X2 ]
(Remember that E[X] is just a number, so it can be pulled outside the expectation w.r.t. Z.)
(c) E[X] = 1 + E[Y ] = 1 + (1 − p) E [X] → E[X] = 1/p.
E[X2 ] = 1 + 2 E [Y ] + E[Y 2 ] = 2/p − 1 + (1 − p) E [X2 ] → E[X2 ] = (2 − p)/p2 .
Why bother to derive something we already derived using total expectation?
• This derivation using iterated expectation is easier to apply to more complicated situations.
• It always helps to see an old result derived using a new tool.
Pop Quiz 20.5.
(a) X is fixed with respect to the inner expectation, which is w.r.t. Y.
(b) Y is independent of X, so the PDF for Y given X is unchanged, so the conditional expectation
is the same as the unconditional expectation.
(c) E[Y] is a number independent of X that can be pulled outside the expectation.
Exercise 20.6.
(a) The information X1 + X2 = 9 introduces a dependency between X1 and X2. The possible outcomes are (3, 6), (4, 5), (5, 4), (6, 3), each with conditional probability 1/4. The conditional expectation of the product is (1/4)(18 + 20 + 20 + 18) = 19. The conditional expectation of each roll is (1/4)(3 + 4 + 5 + 6) = 4.5 and 4.5² = 20.25, so the conditional expectation of the product does not equal the product of conditional expectations, even though the random variables started out independent.
(b) (i) E[1/X1] = (1/6)(1/1 + 1/2 + 1/3 + 1/4 + 1/5 + 1/6) ≈ 0.41, whereas 1/E[X1] = 1/3.5 ≈ 0.29. Not equal.
(ii) By the law of total expectation, E[X1/X2] = Σ_{i=1}^{6} E[X1/X2 | X1 = i] P[X1 = i]. Since E[X1/X2 | X1 = i] = i E[1/X2], it follows that

    E[X1/X2] = E[1/X2] (1/6) Σ_{i=1}^{6} i = E[X1] E[1/X2] ≈ 1.43.

E[X1]/E[X2] = 1. Not equal.
(iii) In (ii) we showed that E[X1/X2] = E[X1] E[1/X2].
Exercise 20.7. The main insight to solving problems like this is to realize that you can
“reparameterize” a sum (just like changing variables in a double integral) to change the order in
which the terms are added. Here, we let n be the sum of k and i, so n = k + i. The possible
values of n are 0, 1, . . . , r. Given n, the possible values of k are 0, 1, . . . , n; given n, k, i = n − k.
Therefore,
    Σ_{k=0}^{r} Σ_{i=0}^{r−k} f(k, i) = Σ_{n=0}^{r} Σ_{k=0}^{n} f(k, n − k).

The identity above holds for any function f(k, i); every term on the left is accounted for on the right and vice-versa. Using this identity,

    Σ_{k=0}^{r} Σ_{i=0}^{r−k} (−1)^i/(k! i!) = Σ_{n=0}^{r} Σ_{k=0}^{n} (−1)^{n−k}/(k!(n − k)!)
                                            = Σ_{n=0}^{r} ((−1)^n/n!) Σ_{k=0}^{n} (−1)^k n!/(k!(n − k)!)
                                            = Σ_{n=0}^{r} ((−1)^n/n!) Σ_{k=0}^{n} (−1)^k C(n, k).

By the Binomial theorem, Σ_{k=0}^{n} C(n, k)(−1)^k = (−1 + 1)^n = 0 when n > 0. So, all the terms are 0 except for n = 0, which is (−1)^0/0! = 1.
Pop Quiz 20.8. y1 is the minimum possible temperature, so it cannot possibly break the record.
Pop Quiz 20.9. 1 + (1/n) Σ_{i=1}^{n−1} i/(n − i) = 1 + (1/n) Σ_{i=1}^{n−1} (n/(n − i) − 1) = 1 − (n − 1)/n + Σ_{i=1}^{n−1} 1/(n − i) = 1/n + 1/(n − 1) + · · · + 1/1 = Hn.
Exercise 20.10.
(a) Let Xi be an indicator that i is picked. P[Xi = 0] = (1 − 1/n)^m (by independence, because i is not picked with probability 1 − 1/n). Xi is Bernoulli, so E[Xi] = 1 − (1 − 1/n)^m. The number of distinct elements is X = X1 + · · · + Xn. By linearity of expectation,

    E[X] = Σ_{i=1}^{n} E[Xi] = n(1 − (1 − 1/n)^m) = n − n(1 − 1/n)^m.
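A Monte Carlo sketch (illustrative parameters and seed, not from the text) confirms the formula n − n(1 − 1/n)^m for the expected number of distinct elements:

```python
import random

random.seed(1)
n, m, trials = 20, 30, 20_000

# average number of distinct values among m uniform draws from {1,...,n}
avg = sum(len({random.randint(1, n) for _ in range(m)})
          for _ in range(trials)) / trials
exact = n - n * (1 - 1/n)**m
```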



(b) Let Xi indicate whether white ball i is picked, i = 1, . . . , a. There are C(a + b, k) ways to choose k balls (without replacement), each equally likely. If ball i is not picked, there are C(a + b − 1, k) ways. So, P[Xi = 0] = C(a + b − 1, k)/C(a + b, k) = 1 − k/(a + b) and E[Xi] = k/(a + b). The number of white balls picked is X = X1 + · · · + Xa. By linearity of expectation,

    E[X] = Σ_{i=1}^{a} E[Xi] = ak/(a + b).
a +b

(c) Obtaining the expectations from the PDF is indeed a challenge, for two reasons. First we must get the PDF. This is going to require all our expertise in counting. Then, we must compute the expectation from the PDF, which will require heavy-duty summation. Why go through the effort? Because the expectation is not the only reason to get the PDF. What if you wanted to know how large an m you should use if you want to see at least half the objects in part (a)?
(a) We need P(k) = probability that k distinct elements are sampled. There are n^m m-sequences of the n objects. How many of these contain exactly k distinct objects? First choose the k elements to sample, in C(n, k) ways. Now from these elements, construct your m-sequence, with the condition that each of the k elements is used at least once. The number of m-sequences of the k objects is k^m. Let Ai be the sequences that do not use element i. The number of sequences using all k elements is k^m − |A1 ∪ A2 ∪ · · · ∪ Ak|. Since |ℓ-way intersection of the Ai| = (k − ℓ)^m, by inclusion-exclusion,

    |A1 ∪ A2 ∪ · · · ∪ Ak| = Σ_{ℓ=1}^{k} (−1)^{ℓ+1} C(k, ℓ)(k − ℓ)^m.

There are k^m − Σ_{ℓ=1}^{k} (−1)^{ℓ+1} C(k, ℓ)(k − ℓ)^m = Σ_{ℓ=0}^{k} (−1)^ℓ C(k, ℓ)(k − ℓ)^m sequences that use all k objects. Dividing by n^m gives

    P(k) = C(n, k) Σ_{ℓ=0}^{k} (−1)^ℓ C(k, ℓ) ((k − ℓ)/n)^m = C(n, k) Σ_{ℓ=0}^{k} (−1)^{k−ℓ} C(k, ℓ) (ℓ/n)^m,

where k = 1, 2, . . . , n. The second expression changes the summation index to k − ℓ and uses C(k, k − ℓ) = C(k, ℓ). Note that if k > m, then P(k) should be 0 and this is indeed the case.
Lemma 30.7. If k > m, then Σ_{ℓ=0}^{k} (−1)^ℓ C(k, ℓ) ℓ^m = 0.

Proof. We use strong induction on m. When m = 0, Σ_{ℓ=0}^{k} (−1)^ℓ C(k, ℓ) = (1 − 1)^k = 0. Assume for m ≥ 0 that Σ_{ℓ=0}^{k} (−1)^ℓ C(k, ℓ) ℓ^x = 0 for all x ≤ m and k > x. Consider Σ_{ℓ=0}^{k} (−1)^ℓ C(k, ℓ) ℓ^{m+1} with k > m + 1. Since m ≥ 0, the first term is 0, so we have

    Σ_{ℓ=1}^{k} (−1)^ℓ C(k, ℓ) ℓ^{m+1} = Σ_{ℓ=1}^{k} (−1)^ℓ C(k, ℓ) ℓ · ℓ^m = Σ_{ℓ=1}^{k} (−1)^ℓ k C(k − 1, ℓ − 1) ℓ^m.

(We used ℓ C(k, ℓ) = k C(k − 1, ℓ − 1).) Summing from 0 to k − 1 instead of 1 to k, we get

    −k Σ_{ℓ=0}^{k−1} (−1)^ℓ C(k − 1, ℓ)(ℓ + 1)^m = −k Σ_{ℓ=0}^{k−1} (−1)^ℓ C(k − 1, ℓ) Σ_{i=0}^{m} C(m, i) ℓ^i
                                                 = −k Σ_{i=0}^{m} C(m, i) Σ_{ℓ=0}^{k−1} (−1)^ℓ C(k − 1, ℓ) ℓ^i.

(We used the Binomial theorem for (ℓ + 1)^m and in the last step we changed the order of summing.) Since k > m + 1, k − 1 > m and the right sum is zero for every i ∈ {0, . . . , m} by the induction hypothesis, concluding the proof.
The use of this lemma is that we may compute E[X] as Σ_{k=1}^{n} k P(k) without regard to m, because if k > m then P(k) = 0.
Exercise: We know that Σ_{k=1}^{n} P(k) = 1 since it is a PDF. Prove it.


To work the exercise, you will need to use techniques like the ones we are going to display. We want

    Σ_{k=1}^{n} k P(k)
      = Σ_{k=1}^{n} (k − n + n) P(k)
      = n − Σ_{k=1}^{n} (n − k) P(k)                                                          (use: Σ_{k=1}^{n} P(k) = 1)
      = n − Σ_{k=1}^{n} Σ_{ℓ=1}^{k} (−1)^{k−ℓ} (n − k) C(n, k) C(k, ℓ) (ℓ/n)^m
      = n − Σ_{k=1}^{n} Σ_{ℓ=1}^{k} (−1)^{k−ℓ} (n − k) C(n, ℓ) C(n − ℓ, n − k) (ℓ/n)^m        (use: C(n, k) C(k, ℓ) = C(n, ℓ) C(n − ℓ, n − k))
      = n − Σ_{ℓ=1}^{n} Σ_{k=ℓ}^{n} (−1)^{k−ℓ} (n − k) C(n, ℓ) C(n − ℓ, n − k) (ℓ/n)^m        (use: Σ_{k=1}^{n} Σ_{ℓ=1}^{k} = Σ_{ℓ=1}^{n} Σ_{k=ℓ}^{n})
      = n − Σ_{ℓ=1}^{n} Σ_{k=ℓ}^{n−1} (−1)^{k−ℓ} (n − ℓ) C(n, ℓ) C(n − ℓ − 1, n − k − 1) (ℓ/n)^m.

In the last step we used (n − k) C(n − ℓ, n − k) = (n − ℓ) C(n − ℓ − 1, n − k − 1) and dropped the term with k = n because it is zero. Now move all terms involving only ℓ outside the inner sum,

    Σ_{k=1}^{n} k P(k) = n − Σ_{ℓ=1}^{n} (−1)^{−ℓ} (n − ℓ) C(n, ℓ) (ℓ/n)^m Σ_{k=ℓ}^{n−1} (−1)^k C(n − ℓ − 1, n − k − 1)
                       = n − Σ_{ℓ=1}^{n} (−1)^{−ℓ} (n − ℓ) C(n, ℓ) (ℓ/n)^m Σ_{i=0}^{n−1−ℓ} (−1)^{n−1−i} C(n − ℓ − 1, i)     (change index to i = n − k − 1)
                       = n − Σ_{ℓ=1}^{n} (−1)^{n−1−ℓ} (n − ℓ) C(n, ℓ) (ℓ/n)^m Σ_{i=0}^{n−1−ℓ} (−1)^i C(n − ℓ − 1, i).

The inner sum is an alternating sum of Binomial coefficients, which we know is 0 unless n − 1 − ℓ = 0, in which case the inner sum is 1. Therefore, the entire sum for the expectation collapses to one term, the one with ℓ = n − 1:

    Σ_{k=1}^{n} k P(k) = n − (−1)^{n−1−(n−1)} (n − (n − 1)) C(n, n − 1) ((n − 1)/n)^m = n − n(1 − 1/n)^m.

That's the same answer we got by using indicators, but what a computation. Wow!
 
(b) There are C(a + b, k) ways to choose k balls. If i are white, there are C(a, i) ways to choose the i white balls and C(b, k − i) ways to choose the remaining balls as black. So,

    P(i) = C(a, i) C(b, k − i) / C(a + b, k).

Exercise: We know that Σ_i P(i) = 1 since it is a PDF. Prove it.
To prove that Σ_i P(i) = 1 you need a famous identity known as the Vandermonde convolution, which is an application of the Binomial theorem:

    Σ_{k=0}^{a+b} C(a + b, k) x^k = (1 + x)^{a+b} = (1 + x)^a (1 + x)^b
                                  = ( Σ_{i=0}^{a} C(a, i) x^i ) ( Σ_{j=0}^{b} C(b, j) x^j )
                                  = Σ_{i=0}^{a} Σ_{j=0}^{b} C(a, i) C(b, j) x^{i+j}
                                  = Σ_{k=0}^{a+b} x^k Σ_{i=0}^{a} C(a, i) C(b, k − i)     (with j = k − i)
                                  = Σ_{k=0}^{a+b} α_k x^k.

On the left is a polynomial in x. On the right is also a polynomial in x. Two polynomials are equal if and only if their coefficients match. That is, α_k = C(a + b, k).

Lemma 30.8 (Vandermonde convolution). Σ_{i=0}^{a} C(a, i) C(b, k − i) = C(a + b, k).


Now for the expectation. We need Σ_{i=0}^{a} i P(i) = Σ_i i C(a, i) C(b, k − i) / C(a + b, k):

    Σ_{i=0}^{a} i C(a, i) C(b, k − i) = Σ_{i=1}^{a} i C(a, i) C(b, k − i)              (i = 0 term is zero)
                                      = Σ_{i=1}^{a} a C(a − 1, i − 1) C(b, k − i)     (use: i C(a, i) = a C(a − 1, i − 1))
                                      = a Σ_{i=0}^{a−1} C(a − 1, i) C(b, k − 1 − i)   (sum over i = 0, . . . , a − 1)
                                      = a C(a + b − 1, k − 1).                        (Vandermonde convolution)

Finally, E[X] = a C(a + b − 1, k − 1) / C(a + b, k) = ak/(a + b), as we got before with indicators.
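Both claims for part (b), Σ_i P(i) = 1 and E[X] = ak/(a + b), can be checked exactly for small illustrative parameters (a sketch; Python's math.comb returns 0 when the lower index exceeds the upper, which handles the boundary terms):

```python
from fractions import Fraction
from math import comb

a, b, k = 7, 5, 4
pdf = [Fraction(comb(a, i) * comb(b, k - i), comb(a + b, k))
       for i in range(0, min(a, k) + 1)]
expectation = sum(i * p for i, p in enumerate(pdf))
```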

Chapter 21
Pop Quiz 21.1. E[Δ] = −5/36 − 4/18 − 3/12 − 2/9 − (1×5)/36 + 0/6 + (1×5)/36 + 2/9 + 3/12 + 4/18 + 5/36 = 0. In general,

    E[Δ] = E[X − µ] = E[X] − E[µ] = µ − µ = 0.

Exercise 21.2.
(a) To construct the table below, we use E[X] = 3 1/2.

    X                      1     2     3     4     5     6
    Δ² = (X − 3 1/2)²    25/4   9/4   1/4   1/4   9/4  25/4
    PX                    1/6   1/6   1/6   1/6   1/6   1/6

From the table, σ² = E[Δ²] = (1/24)(25 + 9 + 1 + 1 + 9 + 25) = 35/12, and the std. deviation is √(35/12).

(b) We show the possible outcomes for the average of two dice, along with the PDF. We know that E[average] = 3 1/2.

    X                   1    1 1/2   2    2 1/2   3    3 1/2   4    4 1/2   5    5 1/2    6
    Δ² = (X − 7/2)²   25/4   16/4   9/4   4/4    1/4    0     1/4   4/4    9/4   16/4   25/4
    PX                1/36   1/18  1/12   1/9   5/36   1/6   5/36   1/9   1/12   1/18   1/36

Note that the probabilities do not change. It is just the random variable that changed.

    σ² = E[Δ²] = (1/(4×36))(25 + 32 + 27 + 16 + 5 + 0 + 5 + 16 + 27 + 32 + 25) = 35/24,

and the std. deviation is √(35/24).
(c) The mean is p, so the deviations are Δ = −p with probability 1 − p and Δ = 1 − p with probability p, giving

    σ² = (1 − p) p² + p (1 − p)² = p(1 − p).

The standard deviation is σ = √(p(1 − p)).
(d) µ = 7 and σ = √(35/6) ≈ 2.42, so the event of interest is 5 ≤ X ≤ 9. So,

    P[µ − σ ≤ X ≤ µ + σ] = 1/9 + 5/36 + 1/6 + 5/36 + 1/9 = 2/3.

Exercise 21.3.
(a) E[X²] = (1/6)(1² + · · · + 6²) = 15 1/6; E[X²] − E[X]² = 15 1/6 − (3 1/2)² = 35/12 ✓
(b) E[X²] = (2²·1 + 3²·2 + · · · + 7²·6 + 8²·5 + · · · + 12²·1)/(4 × 36) = 329/24; E[X²] − E[X]² = 329/24 − 49/4 = 35/24 ✓
(c) E[X²] = (1 − p) × 0² + p × 1² = p; E[X²] − E[X]² = p − p² = p(1 − p) ✓
Exercise 21.4. By linearity, E[Y] = E[a + bX] = a + b E[X]. The deviations in Y are

    ΔY = Y − E[Y] = a + bX − (a + b E[X]) = b(X − E[X]) = b ΔX,
    σ²(Y) = E[Δ²_Y] = E[b² Δ²_X] = b² E[Δ²_X] = b² σ²(X).

Let X be a Bernoulli. Then Y = 2X − 1. By Theorem 21.3, σ²(Y) = 2² σ²(X) = 4p(1 − p).


Exercise 21.5. This exercise requires careful manipulation of sums that are squared.

    σ²(Σ_{i=1}^{n} ai Xi) = E[(Σ_{i=1}^{n} ai Xi)²] − (E[Σ_{i=1}^{n} ai Xi])²                                    (definition)
                          = E[Σ_{i=1}^{n} ai Xi Σ_{j=1}^{n} aj Xj] − (Σ_{i=1}^{n} ai E[Xi])²                     (linearity)
                          = Σ_{i=1}^{n} Σ_{j=1}^{n} ai aj E[Xi Xj] − Σ_{i=1}^{n} Σ_{j=1}^{n} ai aj E[Xi] E[Xj]   (linearity)
                          = Σ_{i=1}^{n} ai² (E[Xi²] − E[Xi]²) + Σ_{i=1}^{n} Σ_{j≠i} ai aj (E[Xi Xj] − E[Xi] E[Xj])
                          = Σ_{i=1}^{n} ai² σ²(Xi),

because E[Xi Xj] − E[Xi] E[Xj] = 0 for independent random variables. The key step in the derivation above is to break the double sum which arises from squaring into the terms where i = j (Xi Xj = Xi²) and i ≠ j (Xi and Xj are independent).
Exercise 21.6. X = X1 + X2 + X3 + X4 + X5, where the Xi are independent waiting times, each with variance (1 − p)/p² = 6. So σ²(X) = 5 × 6 = 30.
A Monte Carlo with 100,000 experiments gave an average wait time of 14.98 and a variance in the
wait times of 29.99. A pretty good match to the theory.
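A Monte Carlo along these lines can be sketched as follows (assuming p = 1/3, the unique p in (0, 1] with (1 − p)/p² = 6, consistent with the reported mean wait of about 15):

```python
import random

random.seed(2)

def wait_for_success(p=1/3):
    # number of Bernoulli(p) trials up to and including the first success
    t = 1
    while random.random() > p:
        t += 1
    return t

trials = 50_000
samples = [sum(wait_for_success() for _ in range(5)) for _ in range(trials)]
mean = sum(samples) / trials
var = sum((x - mean) ** 2 for x in samples) / trials
```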
Pop Quiz 21.7. X1 = 1 ∧ X2 = 0 → X3 = 0; X1 = 0 ∧ X2 = 0 → X3 = 0 or 1.
Exercise 21.8. X = Σ_{i=1}^{n} Xi and

    E[X²] = E[ Σ_{i=1}^{n} Σ_{j=1}^{n} Xi Xj ] = Σ_{i=1}^{n} Σ_{j=1}^{n} E[Xi Xj] = Σ_{i=1}^{n} E[Xi²] + Σ_{i=1}^{n} Σ_{j≠i} E[Xi Xj].

(In the second step we used linearity of expectation.) Xi is a Bernoulli with probability p = 1/n, so E[Xi²] = p. Xi Xj is a Bernoulli with probability

    p = P[Xi = 1 ∧ Xj = 1] = (n − 2)!/n! = 1/(n(n − 1)).

So, E[Xi Xj] = 1/(n(n − 1)) and we have

    E[X²] = Σ_{i=1}^{n} 1/n + Σ_{i=1}^{n} Σ_{j≠i} 1/(n(n − 1)) = (1/n) × n + (1/(n(n − 1))) × n(n − 1) = 2.

Since E[X] = 1, we have σ²(X) = E[X²] − E[X]² = 2 − 1 = 1.

Exercise 21.9.
(a) A fair dice roll is a U[6], so σ² = (1/12)(6² − 1) = 35/12. Since the n dice are independent, the variance of the sum is the sum of the variances and so σ²(sum of n dice) = 35n/12.
(b) The expected sum is µ(n) = 7n/2.
(c) Using (a) and (b), the 3-sigma range is
µ ± 3σ = 7n/2 ± 3√(35n/12).
[Figure: the 3σ-envelope is gray; the max (6n) and min (n) are dotted lines, and the mean µ is the solid line, plotted for n = 1, . . . , 7.]
(d) The bound is not trivial when 7n/2 + 3√(35n/12) ≤ 6n. Since n is an integer, this means n ≥ 5.
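The threshold in (d) can be checked numerically; a small sketch (function name is ours):

```python
from math import sqrt

def three_sigma_upper(n):
    """Upper edge mu + 3*sigma of the 3-sigma range for the sum of n fair dice."""
    return 7 * n / 2 + 3 * sqrt(35 * n / 12)

# n for which the 3-sigma upper edge is at most the maximum possible sum 6n
nontrivial = [n for n in range(1, 10) if three_sigma_upper(n) <= 6 * n]
```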
Pop Quiz 21.10. Since X is a positive random variable and E[X] = 1, Markov's inequality gives P[X ≥ 50] ≤ 1/50.
Pop Quiz 21.11. Since µ = 1 and σ = 1,
P[X ≥ 50] = P[X − 1 ≥ 49] ≤ P[|X − 1| ≥ 49] = P[|X − µ| ≥ 49σ] ≤ 1/49² ≈ 0.00042.
The first inequality is because X − 1 ≥ 49 → |X − 1| ≥ 49; the second uses Chebyshev’s Inequality.
The probability that at least 50 men get the correct hat sums P(k) for k = 50 to 100:
P[X ≥ 50] = Σ_{k=50}^{100} (1/k!) Σ_{i=0}^{100−k} (−1)^i/i! ≈ 0.
The probability is upper bounded by 51/(e × 50!) because, for k ∈ {50, . . . , 100}, Σ_{i=0}^{100−k} (−1)^i/i! ≈ 1/e and 1/k! ≤ 1/50!. The exact value is much smaller than the Chebyshev or Markov bounds.
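The sum can be evaluated exactly with rational arithmetic; a sketch confirming it is astronomically smaller than the Markov and Chebyshev bounds:

```python
from fractions import Fraction
from math import factorial

def p_at_least(m, n=100):
    """Exact P[X >= m] = sum_{k=m}^{n} (1/k!) * sum_{i=0}^{n-k} (-1)^i / i!."""
    total = Fraction(0)
    for k in range(m, n + 1):
        inner = sum(Fraction((-1) ** i, factorial(i)) for i in range(n - k + 1))
        total += Fraction(1, factorial(k)) * inner
    return total

p = p_at_least(50)
```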
Exercise 21.12. X = X_1 + · · · + X_n, where X_i is Bernoulli with probability p = 1/2.
(a) By linearity, E[X] = np = 100 × 1/2 = 50. For the variance, σ²(X_i) = p(1 − p). By independence and linearity, σ²(X) = np(1 − p) = n × 1/4 = 25 and σ = 5.
(b) Using the Binomial PDF, since X is a Binomial,
P[40 ≤ X ≤ 60] = Σ_{k=40}^{60} C(100, k) (1/2)^k (1/2)^{100−k} = (1/2^{100}) Σ_{k=40}^{60} C(100, k) ≈ 0.9648.
(c) P[40 ≤ X ≤ 60] = P[|X − µ| < 10] = P[|X − µ| < 2σ] ≥ 1 − 1/2² = 0.75.
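The binomial sum in (b) is a one-liner to verify; a sketch:

```python
from math import comb

# P[40 <= X <= 60] for X ~ Binomial(100, 1/2)
p = sum(comb(100, k) for k in range(40, 61)) / 2 ** 100
```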
Exercise 21.13.
(a) P[|X − µ| ≥ tσ] = P[X − µ ≥ tσ] + P[X − µ ≤ −tσ]. Using (21.2),
P[X − µ ≤ −tσ] = P[X ≤ µ − tσ] = F_X(µ − tσ) ≈ φ(−t√n) = 1 − φ(t√n);
P[X − µ ≥ tσ] = 1 − P[X − µ ≤ tσ] = 1 − F_X(µ + tσ) ≈ 1 − φ(t√n).
Adding these two equations gives the desired result: P[|X − µ| ≥ tσ] ≈ 2(1 − φ(t√n)).
(b) When t√n is large, φ(t√n) ≈ 1 − e^{−nt²/2}/√(2πnt²). Therefore,
P[|X − µ| ≥ tσ] ≈ 2e^{−nt²/2}/√(2πnt²).
Chapter 22
Exercise 22.1.
(a) 1 ↦ 2, 2 ↦ 3, 3 ↦ 4 is a bijection from A to B, so |A| = |B|.
(b) Suppose n ≤ k; then a_i ↦ b_i is an injection from A to B, so |A| ≤ |B|.
Suppose |A| ≤ |B|, so there is an injection f : A → B. We prove by induction on n that n ≤ k. If n = 1, then B contains f(a_1), so k ≥ 1. Suppose the claim holds for n. Consider any set A with n + 1 elements. Let f(a_{n+1}) = b_ℓ. Relabel the elements of B so that b_ℓ → b_k and b_k → b_ℓ (swap b_ℓ and b_k). Now f maps a_{n+1} ↦ b_k. If there was an element a_r which mapped to b_k under f, a_r now maps to b_ℓ. Now remove b_k from B and a_{n+1} from A. The new f is an injection from a_1, . . . , a_n to b_1, . . . , b_{k−1}. By the induction hypothesis, n ≤ k − 1, or n + 1 ≤ k, as was to be proved.
(c) Suppose |A| ≤ |B| and |B| ≤ |A|. By (b), |A| ≤ |B| → n ≤ k and |B| ≤ |A| → k ≤ n, so n = k. Therefore a_i ↦ b_i is a bijection from A to B, which means |A| = |B|.
(d) If A ⊆ B, then for a ∈ A, f (a) = a is an injection from A to B. Therefore, |A| ≤ |B|.
(e) No: in (a), A ⊄ B and B ⊄ A. If any two sets are comparable under a relationship, the relationship is a total order; the subset relationship does not give a total order. You can always compare two sets using the injection relationship: either A injects into B or B injects into A, so either |A| ≤ |B| or |B| ≤ |A|, which means size comparison using injections gives a total order.
Exercise 22.2. First we show f is 1-to-1. Suppose not: let n_1 ≠ n_2 with f(n_1) = f(n_2). Then
(1/4)(1 + (−1)^{n_1}(2n_1 − 1)) = (1/4)(1 + (−1)^{n_2}(2n_2 − 1)) → (−1)^{n_1}(2n_1 − 1) = (−1)^{n_2}(2n_2 − 1).
The sign of both sides must be the same, so (−1)^{n_1} = (−1)^{n_2} and we conclude 2n_1 − 1 = 2n_2 − 1. That is, n_1 = n_2, a contradiction. So, f is an injection.
Now we show that f is onto. Given z ∈ Z, we must find n for which f(n) = z:
z > 0 : n = 2z → f(n) = (1/4)(1 + (−1)^{2z}(4z − 1)) = z;
z ≤ 0 : n = 2|z| + 1 → f(n) = (1/4)(1 + (−1)^{2|z|+1}(4|z| + 1)) = −|z| = z.
Therefore, f is onto, and hence a bijection from N to Z.
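The formula is mechanical to check; a sketch listing the first few values and verifying injectivity on a range:

```python
def f(n):
    """f(n) = (1 + (-1)^n (2n - 1)) / 4, a bijection from N = {1, 2, ...} to Z."""
    return (1 + (-1) ** n * (2 * n - 1)) // 4

values = [f(n) for n in range(1, 12)]
```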
Pop Quiz 22.3. f is an injection, so two elements of A cannot map to the same element of N.
Pop Quiz 22.4. Yes (every integer has a unique list position). Mathematically, one can show z_1 ≠ z_2 → f(z_1) ≠ f(z_2). The positions of {0, +3, −3, +6, −6} are {1, 6, 7, 12, 13}.
Exercise 22.5.
(a) The zig-zag path moves out one square at a time. Given z/n ∈ Q, the column in which z
appears is c = 2z if z > 0 and c = 2|z| + 1 if z ≤ 0. We want to know when the path hits the
cth column and nth row. This entry will be hit when traversing the column and row for the
square of size max(n, c). If z > 0 (c even), you come down the RHS of the square; if z ≤ 0 (c
odd), you go up the RHS of the square. So, there are four cases:
n ≥ c, n even: i = (n − 1)² + 2n − c = n² − c + 1
n ≥ c, n odd: i = (n − 1)² + c
c > n, c even: i = (c − 1)² + n
c > n, c odd: i = (c − 1)² + 2c − n = c² − n + 1
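The four cases can be verified by checking that, over an m × m corner of the grid, the positions are exactly 1, . . . , m²; a sketch (function name is ours):

```python
def position(n, c):
    """List position of the cell in row n, column c (n, c >= 1) under the zig-zag path."""
    if n >= c:
        return (n - 1) ** 2 + (2 * n - c) if n % 2 == 0 else (n - 1) ** 2 + c
    return (c - 1) ** 2 + n if c % 2 == 0 else (c - 1) ** 2 + (2 * c - n)

m = 6
positions = {position(n, c) for n in range(1, m + 1) for c in range(1, m + 1)}
```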
(b) What is the list position of 0/2? All positions are used by rationals with denominator 1.
(c) The sets are countable, so they can be listed {A_1, A_2, A_3, . . .}. Each set’s elements are countable, so they too can be listed, for example A_1 = {A_{1,1}, A_{1,2}, A_{1,3}, . . .}. Therefore, all the elements in A_1 ∪ A_2 ∪ A_3 ∪ · · · can be presented in a grid as follows (similar to Q):
A_1: A_{1,1} A_{1,2} A_{1,3} A_{1,4} · · ·
A_2: A_{2,1} A_{2,2} A_{2,3} A_{2,4} · · ·
A_3: A_{3,1} A_{3,2} A_{3,3} A_{3,4} · · ·
 ⋮
The zig-zag path of arrows starting at A_{1,1} lists the elements in the union. So, the union is countable.
(d) We have not shown R is countable (it’s not), so we don’t know if we can list the columns.
Exercise 22.6.
(a) No: b̄, the complement of the diagonal, is infinite, and does not have to be in the list.
(b) Countable means you must produce the list (injection to N) and show that it is a valid injection. Cantor diagonalization shows that there is no valid injection from B^∞ to N. Adding b̄ to the list won’t help because the complement of the new diagonal will not be in the list.
(c) Every infinite binary string is a subset of N. The ones in the string identify the elements of N.
This is a bijection between the subsets and infinite binary strings, so |{subsets of N}| = |B^∞|.
(d) Given a finite subset, construct a finite binary string for the subset by taking its infinite binary string and truncating it at the last 1. For example, {1, 3, 7} ↦ 1010001. Two different subsets will have 1’s in different positions and so the truncated finite binary strings will be different. We have an injection from finite binary strings to B which proves that |{finite binary strings}| ≤ |B| ≤ |N|. So, the finite binary strings are countable.
Exercise 22.7.
(a) For any string that is eventually zero, construct a finite binary string by truncating it at the
last 1. Two different strings which are eventually zero have 1s in different positions, and so
will truncate to different finite strings. Hence, we have an injection from the strings which
are eventually zero and finite binary strings. So,
|{strings which are eventually zero}| ≤ |{finite binary strings}| ≤ |N|.
(b) An infinite string is either eventually zero or it is not, so (by definition) B^∞ = B^∞_0 ∪ B^∞_*.
(c) Suppose B^∞_* is countable. In (a) we showed that B^∞_0 is countable. This means B^∞ = B^∞_0 ∪ B^∞_* is the union of two countable sets and, by Theorem 22.2, is countable. That contradicts Theorem 22.5. We conclude that B^∞_* is uncountable.
(d) Let a_1a_2a_3 · · · and b_1b_2b_3 · · · be two different infinite strings in B^∞_*. We prove that they map to different values in [0, 1], which proves the mapping is an injection and therefore |B^∞_*| ≤ |[0, 1]|.
Let x_a = Σ_{i=1}^∞ a_i 2^{−i} and x_b = Σ_{i=1}^∞ b_i 2^{−i}. Since the strings differ in at least one position, by well ordering there is some minimum k for which a_k ≠ b_k. Suppose a_k = 1 and b_k = 0. We have:
x_a − x_b = 2^{−k} + Σ_{i=k+1}^∞ a_i 2^{−i} − Σ_{i=k+1}^∞ b_i 2^{−i}.
Since a is not eventually zero, the a_i for i ≥ k + 1 cannot all be zero, so Σ_{i=k+1}^∞ a_i 2^{−i} > 0. Setting b_i = 1 for all i ≥ k + 1 gives Σ_{i=k+1}^∞ b_i 2^{−i} ≤ Σ_{i=k+1}^∞ 2^{−i} = 2^{−k}. Therefore,
x_a − x_b > 2^{−k} + 0 − 2^{−k} = 0,
and x_a ≠ x_b, as was to be shown. The same argument works if a_k = 0 and b_k = 1.
Chapter 23
Pop Quiz 23.1.
(a) Lpush contains odd numbers, or strings that end in 1.
(b) Start the string with the state of the light (1 for on, 0 for off). The rest of the string is the binary string encoding the number of pushes. L_push contains strings whose first and last bit are different, L_push = {01, 10, 001, 110, . . .}.
Pop Quiz 23.2.
(a) (i) Open. (ii) Closed. (iii) Closed (not a valid sequence of walk-on/walk off). (iv) Open.
(v) Open. (vi) Closed. (vii) Closed (not a valid sequence).
(b) Strings with more 1’s than 0’s in which every prefix has at least as many 1’s as 0’s.
Pop Quiz 23.3. [Figure: the graph represented by the string, on vertices 1, . . . , 5.] The distance between vertices 1 and 5 is three, so the answer is no.
Exercise 23.4.
(a) Starting from 1, you get your answer after D decision problems, for lengths 1, 2, . . . , D.
(b) You answer no after length n, the number of vertices in the network.
(c) Start with the lengths 1, 2, 4, . . ., doubling each time. Suppose you get the first yes at 2^k, which takes k + 1 decision questions. You have now narrowed the distance to 2^{k−1} < D ≤ 2^k. Perform a binary search in the interval [2^{k−1}, 2^k]. The number of questions asked in the binary search is O(log₂(length of interval)). The length of the interval is 2^k − 2^{k−1} = 2^{k−1}. So the total number of questions is at most k + 1 + O(log₂ 2^{k−1}) ∈ O(k). Since 2^{k−1} < D, it follows that k < 1 + log₂ D ∈ O(log₂ D) and you need only O(log₂ D) questions.
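The doubling-then-binary-search strategy is easy to sketch with an oracle that answers “is the distance at most L?” (the function names are ours):

```python
def find_distance(D):
    """Locate D using yes/no questions 'is the distance <= L?'.
    Returns (answer, number of questions asked)."""
    questions = 0

    def ask(L):
        nonlocal questions
        questions += 1
        return D <= L

    # Doubling phase: find the first power of two >= D.
    L = 1
    while not ask(L):
        L *= 2
    if L == 1:
        return 1, questions
    # Binary search for D in (L/2, L].
    lo, hi = L // 2 + 1, L
    while lo < hi:
        mid = (lo + hi) // 2
        if ask(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo, questions
```

The question count grows like O(log₂ D), as the argument above predicts.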
Pop Quiz 23.5. The yes -set ⊆ {finite binary strings}, so it’s countable and hence can be
listed.
Exercise 23.6.
(a) Lbalanced = {ε, 01, 10, 0011, 0101, 0110, 1001, 1010, 1100, . . .}
(b) The finite binary strings can be listed, Σ* = {w_1, w_2, . . .}. A computing problem L is a subset of Σ*, which can be identified with an infinite binary sequence, where the 1’s in the sequence identify the strings in L. We have an injection from infinite binary sequences (which are uncountable) to computing problems, so
|{infinite binary sequences}| ≤ |{computing problem}|.
Therefore, the computing problems are uncountable.
(c) L̄_1, L_1 ∪ L_2, L_1 ∩ L_2 are all collections of finite binary strings, so they are computing problems.

Pop Quiz 23.7.


(a) (i) {01, 011, 01111}.
(ii) {ε, 00, 0000}.
(iii) {00, 000, 001, 100, 0000, 0001, 0010, 0011, 1000, 1001, 0100, 1000, 1100}.
(b) {0, 1}∗ 1 or ∗1.
(c) Here are two solutions: {0}∗ • {ε, 1} • {0}∗ • {ε, 1} • {0}∗
({0}∗ ) ∪ ({0}∗ • 1 • {0}∗ ) ∪ ({0}∗ • 1 • {0}∗ • 1 • {0}∗ )
Pop Quiz 23.8. ε → 00, 11 → 0000, 1001, 0110, 1111
0 → 000, 101 → 00000, 10001, 01010, 11011
1 → 010, 111 → 00100, 10101, 01110, 11111
Exercise 23.9.
(a) L0n 1k = 0∗ 1∗ . We could not find a regular expression for L0n 1n . The challenge is to enforce
equality in the number of zeros and ones. If you find one, please let us know.
(b) We do not show the minimality clause which is there by default.
L 0n 1k : 1 ε ∈ L0 n 1 k . [basis]
2 w ∈ L0 n 1k → 0 • w ∈ L0n 1k , [constructor rules]
w ∈ L0 n 1k → w • 1 ∈ L 0n 1k .
L 0n 1n : 1 ε ∈ L0 n 1 n . [basis]
2 w ∈ L0n 1n → 0 • w • 1 ∈ L 0n 1n . [constructor rules]
Though there is no regular expression for L0n 1n , the language is easy to describe recursively.
Pop Quiz 23.10.
(a) These strings are in Lpush . We show the path of states traversed, ending in the resting state.
(i) q0 q1 q1 q1 q1 (ii) q0 q0 q1 q0 q1 (iii) q0 q1 q0 q0 q1 (iv) q0 q1 q1 q1 q0 q0 q0 q1 (v) q0 q1 q0 q0 q0 q0 q0 q0 q1 q1
(b) None of these strings are in Lpush .
(i) q0 q1 q0 (ii) q0 q0 q0 q1 q0 (iii) q0 q1 q0 q1 q0 (iv) q0 q1 q1 q1 q1 q1 q0 (v) q0 q0 q0 q0 q0 q0 q1 q1 q0

Chapter 24
Pop Quiz 24.1.
(a) (i) q0|⊲0000 ↦ q1|0⊲000 ↦ q1|00⊲00 ↦ q1|000⊲0 ↦ q1|0000⊲
(ii) q0|⊲1000 ↦ q2|1⊲000 ↦ q2|10⊲00 ↦ q2|100⊲0 ↦ q2|1000⊲
(iii) q0|⊲0001 ↦ q1|0⊲001 ↦ q1|00⊲01 ↦ q1|000⊲1 ↦ q2|0001⊲
(iv) q0|⊲0100 ↦ q1|0⊲100 ↦ q2|01⊲00 ↦ q2|010⊲0 ↦ q2|0100⊲
(v) q0|⊲0•k1•ℓ ↦ q1|0⊲0•k−11•ℓ ↦ q1|0•2⊲0•k−21•ℓ ↦ · · · ↦ q1|0•k⊲1•ℓ ↦ q2|0•k1⊲1•ℓ−1 ↦ q2|0•k1•2⊲1•ℓ−2 ↦ · · · ↦ q2|0•k1•ℓ⊲
(b) Non-empty string containing only 0’s.
(c) (i) yes (the next transition is to q1 and the machine stays there).
(ii) yes (the machine stays in q1 ).
(iii) no (the machine cannot escape q2 ).
Exercise 24.2.
(a) When M processes a 1, it enters q2 from which it never leaves, so if w contains a 1 then
M (w) = no . If w = ε, then M stops in q0 and rejects. If w = 0•n for n > 0, then M enters
q1 and never leaves, accepting. Therefore, L(M ) = {0•n | n > 0}
(b) (i) Strings with no 1’s: L(M ) = {0}∗ .
(ii) Strings which are not only 0’s: L(M ) = {0•n | n > 0}.
(iii) Strings with an even number of zeros (including no 0’s and ε).
(iv) Strings with an odd number of zeros.
(v) L(M ) = {ε, 0} (a finite language with just 2 yes -strings).
(vi) Every string except ε and 0: L(M ) = Σ* − {ε, 0}.
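A DFA is just a lookup table, so simulating one takes a few lines. A sketch using the machine from part (a) (the dictionary encoding is ours):

```python
def run_dfa(delta, start, accept, w):
    """Run a DFA given as a transition dict {(state, symbol): state}."""
    q = start
    for ch in w:
        q = delta[(q, ch)]
    return q in accept

# DFA from Exercise 24.2(a): accept non-empty strings of 0's only.
delta = {('q0', '0'): 'q1', ('q0', '1'): 'q2',
         ('q1', '0'): 'q1', ('q1', '1'): 'q2',
         ('q2', '0'): 'q2', ('q2', '1'): 'q2'}
```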
Exercise 24.3.
(a) We give a construction that generalizes to any finite language. Let ℓ be the length of the
longest string in L. Construct the binary tree to depth ℓ corresponding to every binary string
of length at most ℓ, as shown in the DFA below. Every string of length at most ℓ leads the
DFA to its unique state, which is a yes -state or not depending on whether the string is in
L. In the automaton below, the yes -states are s00 , s000 , s101 , s111 .
[DFA diagram: a depth-3 binary tree of states s_w, one for each binary string w of length at most 3, with an error state e (looping on 0,1) absorbing all longer strings; the accept states are s00, s000, s101, s111.]
(b) One can generalize the construction in (a). Instead, we give a proof by induction on ℓ, the
length of the longest string in L. The base cases are the empty language L = ∅ and L = {ε}.
[DFA diagrams for the base cases: for L = ∅, a single non-accepting state q0 looping on 0,1; for L = {ε}, an accepting start state q0 whose transitions on 0,1 go to a non-accepting sink q1.]
Suppose that any finite language with maximum string length at most ℓ can be solved by a
DFA. Consider any language L with maximum string length ℓ + 1. L has two types of strings:
those starting with 0 and those starting with 1. So L = (0 • L0 )∪(1 • L1 ) where L0 contains the
suffixes of strings in L that start with 0 and L1 the suffixes of the strings in L that start with
1. L0 and L1 are finite languages with maximum string length at most ℓ. By the induction
hypothesis, there are DFAs M0 and M1 that solve L0 and L1 respectively. We now construct
a DFA M that solves L as follows.
[Diagram of M: from the start state q0, input 0 goes to r0 (the start state of M0) and input 1 goes to s0 (the start state of M1).]
If a string starts with 0, the DFA transitions to r0, the start state of M0. The DFA then runs M0 and accepts if and only if the suffix is in L0, i.e. if and only if the string is in L. The logic is similar for a string that starts with 1. So, M accepts a nonempty string if and only if the string is in L. One final detail is the empty string: if ε ∈ L, simply make q0 a yes-state.
Exercise 24.4.
(a) Strings in L1 ∩ L2 must contain a zero and they must end in 1 so they must be of the form
of a string containing 0 concatenated with a 1: ∗0 ∗ 1.
(b) From the start state, you wait for a 0 and transition to s1, then wait for a 1. If you get a 1, you must ensure it is the last bit. [DFA diagram: states s0, s1, s2 with s2 accepting; s0 loops on 1 and moves to s1 on 0; s1 loops on 0 and moves to s2 on 1; s2 loops on 1 and moves back to s1 on 0.]
(c) Using product states, the DFA structure is exactly the same as for the union L1 ∪ L2. However, the DFA should accept only the product states q_i s_j where both q_i is a yes-state of M1 and s_j is a yes-state of M2. [DFA diagram: the product automaton on states q0s0, q0s1, q1s0, q1s1.]
Pop Quiz 24.5.
(a) L contains a string with a zero concatenated with a string ending in 1, which gives any string
with a zero that ends with 1. That is, ∗0 ∗ 1. So we can use the same DFA in Exercise 24.4(b).
(b) From Exercise 24.4(a), L = L1 ∩ L2 . We can use the same DFA from Exercise 24.4(c).
(c) False. Consider any two L1 , L2 with L1 ∩ L2 = ∅. For example, L1 = {0} and L2 = {1}.
Pop Quiz 24.6. In M1′ , every zero toggles between q0 and q1 . An odd number of 0’s leaves you
in q1 which accepts.
Similarly, in M2′ the DFA toggles between q0 and q1 for every bit.
Exercise 24.7.
(a) After processing 100, the automaton state is q0 or s1 , and 101 remains to be processed.
(b) {q0}|⊲100101 ↦ {q0}|1⊲00101 ↦ {q1}|10⊲0101 ↦ {q0, s1}|100⊲101 ↦ {q0, s0}|1001⊲01 ↦ {q1, s1}|10010⊲1 ↦ {q0, s0, s1}|100101⊲.
At the end of the computation, the non-deterministic automaton could be in any of the states q0, s0 or s1.
For further clarity, we show the “computation-tree” on the input 100101.
[Figure: the computation tree on the input 100101.]
The automaton starts in q0 and processes the bits 10, transitioning to state q0 then q1 . At the
next 0, there are two possible actions so the computation branches. Each time the automaton
is in state q1 the computation branches. At the end, there are three possible computation
paths the automaton could have taken (3 final states).
(c) If s1 is one of the possible ending states, then there is a computation path that must have
gone through q1 (prefix in L1 ) and ended in s1 without re-entering q0 (suffix in L2 ), so M ′′
should accept. In this case the decision is yes .
(d) As the hint suggests, construct a state for every non-empty subset of the states in M ′′ . We
will use the subset as the label for the state. The states are
{q0 }, {q1 }, {s0 }, {s1 }
{q0 , q1 }, {q0 , s0 }, {q0 , s1 }, {q1 , s0 }, {q1 , s1 }, {s0 , s1 }
{q0 , q1 , s0 }, {q0 , q1 , s1 }, {q0 , s0 , s1 }, {q1 , s0 , s1 }
{q0 , q1 , s0 , s1 }
For each “subset”-state, the DFA transitions to a subset of states for each bit by considering all the possible ending states starting from any of the possible starting states of the transition. Consider, for example, state {q1, s1} and input bit 1: on 1, q1 goes to {q1, s1} and s1 goes to {s0}, so
{q1, s1} → {q1, s0, s1} on input 1.
The DFA accepts whenever s1 is in the set of possible states. [Figure: the full subset-state automaton.] Though there are 15 subset states, only 8 of them are reachable from the start state.
Our “principled” approach may not give the most efficient DFA. For example, states {q1, s1}, {q0, s0, s1}, {q1, s0, s1} are all accepting and only transition amongst themselves, so they can all be merged into a single accepting state.
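Subset simulation, the idea behind the construction above, can also be run directly by tracking the set of possible states. A sketch on a small NFA of our own for ∗0∗1 (a stand-in, not the book’s M′′):

```python
def run_nfa(delta, start, accept, w):
    """Subset simulation: track the set of states the NFA could be in."""
    states = {start}
    for ch in w:
        states = {t for s in states for t in delta.get((s, ch), ())}
    return bool(states & accept)

# NFA for strings that contain a 0 and end in 1: p is the start state,
# q means "a 0 has been read", r means "that 0 was followed by a final 1".
delta = {('p', '0'): {'p', 'q'}, ('p', '1'): {'p'},
         ('q', '0'): {'q'},      ('q', '1'): {'q', 'r'}}
```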
Exercise 24.8.
(a) [DFA diagram: states s0, s1, s2 with transitions on 0,1.]
(b) The idea here is to start with a DFA that solves L and convert it into a DFA that solves L ∗ .
We use the method in Exercise 24.3 to construct a DFA for L.
[DFA for L = {1, 10}: states sε, s1, s10 and an error state e looping on 0,1; sε moves to s1 on 1, s1 moves to s10 on 0, and all other transitions go to e. Accept states: s1, s10.]
To implement L*, when we reach an accept state of L, we would like the option to restart the DFA (as well as continue with the current path): any accept state, on receiving input 0, should also transition to the state sε would transition to on receiving input 0. Similarly, on receiving input 1, the DFA should also transition to the state sε would transition to on receiving input 1. We get a non-deterministic automaton. So, for example, s1 transitions to s1 and e on input bits 1 and 0 respectively, because those are the transitions from sε.
[Non-deterministic automaton for L⁺: the DFA above with the extra restart transitions added from the accept states s1 and s10.]
Let us emphasize that a non-deterministic automaton is an interesting machine in its own


right. However, for our purposes, it merely serves as an intermediate tool for getting at the
DFA we need. One more small detail: since ε ∈ L∗ , the non-deterministic automaton above
only captures the non-empty strings in L∗ (often denoted L+ ). At the end, we must augment
the automaton to accept ε. We now use the subset-state method to get the DFA for L + . We
only show the subset-states that are used in the DFA, not all 15 subset-states.
[DFA for L⁺ over subset-states: {sε}, {s1}, {s1, e}, {s10, e}, {e}; only the subset-states used by the DFA are shown, not all 15.]
Lastly, to get the DFA for L*, simply make the start state {sε} accepting.
(c) (i) {0, 1, 00, 10, 000, 100, 0000, 1000, 110, 111, 010, 011, 1010, 1100, 1011, 1110, 0010, 0100, 0011, 0110}
(ii) We use the same approach as in (b). From any accept state, we must allow the automaton to continue its current path or “restart” from state q0 for the next bit. The only accept state is q1, so q1 should also transition to wherever q0 transitions for bits 0,1. This gives the non-deterministic automaton for L(M)⁺.
[Non-deterministic automaton for L(M)⁺: the accept state q1 gains copies of q0’s transitions on 0 and 1.]
We now use subset-states to convert this non-deterministic automaton to a DFA for L(M ) ∗ .
To accept ε, we make the start state accepting. The result is:
[DFA for L(M)*: subset-states {q0}, {q1}, {q0, q1}, with the start state {q0} accepting.]
You will notice that this DFA accepts every string including ε: L(M)* = Σ*. This is no surprise because the Kleene star of any language that contains 0 and 1 is Σ*. So, a much simpler solution for L(M)* is a single accepting state looping on 0,1. We are not after the simplest solution; we are after a systematic solution.
Exercise 24.9. Let L1 contain strings whose number of 1’s is divisible by 2 and L2 contain strings whose number of 1’s is divisible by 3. We first argue that L1 and L2 are regular, which means there are DFAs M1 and M2 to solve L1 and L2. [DFA diagrams: M1 has states q0, q1 which toggle on input 1 and loop on 0; M2 has states q0, q1, q2 which cycle on input 1 and loop on 0; in both, q0 is the accepting start state.]
We can build L using set operations on L1 and L2: L is the complement of L1 ∪ L2. By Theorem 24.2, L1 ∪ L2 is regular because L1 and L2 are regular (closed under union). Again, by Theorem 24.2, the complement of L1 ∪ L2 is regular (closed under complement). Therefore L is regular.
Pop Quiz 24.10. [DFA diagram: states q0, q1, q3 with transitions on 0,1.]
Pop Quiz 24.11. The same proof used for Theorem 24.3 works here. If the DFA has k states, then for some 0 ≤ i < j ≤ k the DFA is in the same state after reading 0•i and 0•j, so it must give the same answer for 0•i1•i and 0•j1•i; but the first is balanced and the second is not, which is a contradiction.
Exercise 24.12.
(a) The start state is q0. It is an accept state if the input string is empty.
q0 → e: if the first bit is 1, transition to the error state, doing nothing to the stack.
q0 → q1: if the first bit is 0, push it onto the stack and transition to q1. (Step I.)
e → e: remain in the error state for any input (0, 1, ε) and stack symbol (0, 1, ∅), doing nothing to the stack. If the input is ε, the automaton stops and rejects.
q1 → q1: if the input is 0 and the stack-top is 0, push 0 onto the stack and remain in q1. (Step II.)
(b) [PDA diagram: states q0, q1, q2, q3 (accept) and error state e. The labeled transitions are:
q0 → q1: {0}{∅} → {push(0)}
q1 → q1: {0}{0} → {push(0)}
q1 → q2: {1}{0} → {pop()}
q2 → q2: {1}{0} → {pop()}
q2 → q3: {ε}{∅} → {}
All other input/stack combinations transition to e; the transitions q2 → e are three separate instructions conveniently stacked on top of each other in the diagram.]
(c) Left of the head-position is the substring processed and to its right the substring remaining.
We also must show the stack which we do to the left of the current state (the top of the stack
is rightmost in black). Let us denote our PDA by M . The traces for the four strings are:
On 010: ∅|q0|⊲010ε ↦ ∅0|q1|0⊲10ε ↦ ∅|q2|01⊲0ε ↦ ∅|e|010⊲ε ↦ ∅|e|010ε⊲; stop, reject.
On 00011: ∅|q0|⊲00011ε ↦ ∅0|q1|0⊲0011ε ↦ ∅00|q1|00⊲011ε ↦ ∅000|q1|000⊲11ε ↦ ∅00|q2|0001⊲1ε ↦ ∅0|q2|00011⊲ε ↦ ∅0|e|00011ε⊲; stop, reject.
On 0011: ∅|q0|⊲0011ε ↦ ∅0|q1|0⊲011ε ↦ ∅00|q1|00⊲11ε ↦ ∅0|q2|001⊲1ε ↦ ∅|q2|0011⊲ε ↦ ∅|q3|0011ε⊲; stop, accept.
On 00111: ∅|q0|⊲00111ε ↦ ∅0|q1|0⊲0111ε ↦ ∅00|q1|00⊲111ε ↦ ∅0|q2|001⊲11ε ↦ ∅|q2|0011⊲1ε ↦ ∅|e|00111⊲ε ↦ ∅|e|00111ε⊲; stop, reject.
(d) The PDA seems reasonably easy to implement mechanically. The only concern is that it requires an infinite stack to process inputs like 0•n1•n for arbitrarily large n. In practice, of course, we can only implement a PDA with a finite stack memory.
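The PDA is mechanical to simulate with an explicit list as the stack; a sketch (state names follow the exercise, the control flow is our reading of the transitions):

```python
def accepts(w):
    """Simulate the 0^n 1^n PDA with an explicit stack."""
    stack = []
    state = 'q0'
    for ch in w:
        if state == 'q0':          # first bit: push a 0 or fail
            if ch == '0':
                stack.append('0'); state = 'q1'
            else:
                state = 'e'
        elif state == 'q1':        # reading the block of 0's
            if ch == '0':
                stack.append('0')
            else:
                stack.pop(); state = 'q2'
        elif state == 'q2':        # reading the block of 1's, popping 0's
            if ch == '1' and stack:
                stack.pop()
            else:
                state = 'e'
        else:                      # error state absorbs everything
            break
    if state == 'q2' and not stack:
        return True                # matched 0^n 1^n with n >= 1
    return state == 'q0' and w == ''   # the empty string is also accepted
```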
Chapter 25
Pop Quiz 25.1.
(a) Using rules 1, 3, 2, 4, 4, 5, 5: S ⇒ T0A ⇒ T0XT1 ⇒ T0T0T1T1 ⇒ T0 0T1T1 ⇒ 00T1T1 ⇒ 001T1 ⇒ 0011.
(b) Cannot be derived. (c) Cannot be derived. (d) Cannot be derived.
The strings in this CFL are of the form 0•n 1•n for n ≥ 0.
Pop Quiz 25.2.
Using rules 1, 5, 2, 3, 4, 1, 5, 2, 3, 4:
S ⇒ <phrase><verb> ⇒ <phrase>runs.␣S ⇒ <article><noun>runs.␣S ⇒ A␣<noun>runs.␣S ⇒ A␣cat␣runs.␣S ⇒ A␣cat␣runs.␣<phrase><verb> ⇒ A␣cat␣runs.␣<phrase>walks. ⇒ A␣cat␣runs.␣<article><noun>walks. ⇒ A␣cat␣runs.␣The␣<noun>walks. ⇒ A␣cat␣runs.␣The␣dog␣walks.

Exercise 25.3.
(a) (i) S ⇒ <stmt>;S ⇒ <declare>;S ⇒ int␣<variable>;S ⇒ int␣x;S (rules 1, 2, 3, 7)
⇒ int␣x;<stmt>;S ⇒ int␣x;<declare>;S ⇒ int␣x;int␣<variable>;S (rules 1, 2, 3)
⇒ int␣x;int␣x<variable>;S ⇒ int␣x;int␣xx;S (rules 7, 7)
⇒ int␣x;int␣xx;<stmt>;S ⇒ int␣x;int␣xx;<assign>;S (rules 1, 2)
⇒ int␣x;int␣xx;<variable>=<integer>;S ⇒ int␣x;int␣xx;x=<integer>;S (rules 4, 7)
⇒ int␣x;int␣xx;x=<integer><digit>;S ⇒ int␣x;int␣xx;x=<integer>2;S (rules 5, 6)
⇒ int␣x;int␣xx;x=<digit>2;S ⇒ int␣x;int␣xx;x=22;S (rules 5, 6)
⇒ int␣x;int␣xx;x=22;<stmt>; ⇒ int␣x;int␣xx;x=22;<assign>; (rules 1, 2)
⇒ int␣x;int␣xx;x=22;<variable>=<integer>; (rule 4)
⇒ int␣x;int␣xx;x=22;x<variable>=<integer>; (rule 7)
⇒ int␣x;int␣xx;x=22;xx=<integer>; (rule 7)
⇒ int␣x;int␣xx;x=22;xx=<digit>; ⇒ int␣x;int␣xx;x=22;xx=8; (rules 5, 6)
Long and tedious, but that’s what it takes to derive non-trivial strings in non-trivial grammars.
(ii) S ⇒ <stmt>;S ⇒ <assign>;S ⇒ <variable>=<integer>;S ⇒ x=<integer>;S (rules 1, 2, 4, 7)
⇒ x=<digit>;S ⇒ x=8;S (rules 5, 6)
⇒ x=8;<stmt>; ⇒ x=8;<declare>; ⇒ x=8;int␣<variable>; ⇒ x=8;int␣x; (rules 1, 2, 3, 7)
(iii) S ⇒ <stmt>;S ⇒ <declare>;S ⇒ int␣<variable>;S ⇒ int␣x;S (rules 1, 2, 3, 7)
⇒ int␣x;<stmt>; ⇒ int␣x;<assign>; ⇒ int␣x;<variable>=<integer>; (rules 1, 2, 4)
⇒ int␣x;x<variable>=<integer>; ⇒ int␣x;xx=<integer>; (rules 7, 7)
⇒ int␣x;xx=<digit>; ⇒ int␣x;xx=8; (rules 5, 6)
(b) Only (i) is semantically correct. In (ii) the variable is used before it is declared. In (iii) one
variable is declared and another is used.
(c) Add the white space variable in the rule for S and a new rule to create the white space:
1: S → <stmt>;W S | <stmt>;
2: <stmt> → <assign> | <declare>
3: <declare> → int␣<variable>
4: <assign> → <variable>=<integer>
5: <integer> → <integer><digit> | <digit>
6: <digit> → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
7: <variable> → x | x<variable>
8: W → W W | ε | ␣ | \n
Pop Quiz 25.4. Let w be a non-empty string in L_equal. Either w starts with 0 or with 1. Consider the case w = 0v. Since w has an equal number of 0’s and 1’s, v has more 1’s than 0’s. Therefore some prefix of v (for example v itself) has more 1’s than 0’s. By the well-ordering principle, there is a shortest prefix of v that has more 1’s than 0’s. Call this prefix v1, so v = v1w2. Now v1 must end in 1, otherwise some shorter prefix would already have more 1’s than 0’s. Therefore v1 = w1 1, and w1 cannot have more 1’s than 0’s, since v1 is the shortest prefix with this property; nor can w1 have more 0’s than 1’s, because then v1 = w1 1 would not have more 1’s than 0’s. So w1 has an equal number of 1’s and 0’s. We have proved:
w = 0v = 0v1w2 = 0w1 1w2,
where w1 has an equal number of 1’s and 0’s. Since w has an equal number of 1’s and 0’s, this means that w2 must also have an equal number of 1’s and 0’s. The case where w starts with 1 uses identical reasoning.
Exercise 25.5.
(a) (i) 1: S → ε | 1S (Every non-empty string in L is of the form 1w where w ∈ L.)


(ii) By induction on the length of the derivation. The base cases are S ⇒ ε and S ⇒ 1. All other derivations start S ⇒ 1S followed by a shorter derivation for the S on the RHS. For this shorter derivation, by the induction hypothesis the string derived is 1•n for n ≥ 0, and so the full derivation gives 1•n+1.
(iii) By induction on n: suppose S ⇒* 1•n. Then S ⇒ 1S ⇒* 1 • 1•n is a derivation of 1•n+1.
(b) (i) 1: S → A1
2: A → ε | 0 | 1 | AA
(ii) All derivations start S ⇒ A1 followed by a derivation from A, therefore the final string is
the string resulting from A followed by 1.
(iii) By induction on string-length, any string v can be derived from A. The base cases are v = ε, 0, 1. If v = 0x, then x is shorter and A ⇒* x (induction hypothesis), so A ⇒ AA ⇒ 0A ⇒* 0x. Similarly if v = 1x. Therefore, any string of the form v1 can be derived from S: S ⇒ A1 ⇒* v1 (derivation of v from A).
(c) (i) 1:S → A00A
2:A → ε | 0 | 1 | AA
(ii) All derivations start S ⇒ A00A followed by a derivation from each A; therefore the final string contains 00, because it is v00w where A ⇒* v and A ⇒* w.
(iii) In (b) we showed that any string can be derived from A, so any string of the form v00w can be derived from S by S ⇒ A00A ⇒* v00w (derivations of v and w from the A’s).
(d) (i) 1: S → ε | 1S0
(ii) Induction on the length of the derivation. Every non-trivial derivation starts S ⇒ 1S0 followed by a shorter derivation from the RHS S, which by the induction hypothesis gives 1•k0•k. Therefore the full derivation gives 1•k+10•k+1.
(iii) Induction on the length of the string. Suppose S ⇒* 1•n0•n; then S ⇒ 1S0 ⇒* 1 • 1•n0•n • 0 is a derivation of 1•n+10•n+1.
(e) (i) 1: S → AB (S is composed of two types of strings, A and B)
2: A → ε | 1A0 (A generates 1•k0•k as in part (d))
3: B → ε | 1B (B generates 1•ℓ as in part (a))
(ii) S derives a string derived from A followed by one derived from B. In (d,ii) we showed that all derivations from A yield 1•k0•k. In (a,ii) we showed that all derivations from B yield 1•ℓ. Therefore all derivations from S yield 1•k0•k1•ℓ.
(iii) In (d,iii) we showed that every string of the form 1•k0•k can be derived from A. In (a,iii) we showed that every string of the form 1•ℓ can be derived from B. Therefore every string of the form 1•k0•k1•ℓ can be derived from S.
(f) (i) The basic palindromes are ε, 0, 1. All other palindromes either start and end in 0 (with
a palindrome in between) or they start and end in 1 (with a palindrome in between). This
observation suggests the grammar
1: S → ε | 0 | 1 | 0S0 | 1S1
(ii) Induction on the length of the derivation. For the derivation, which starts 0S0 (the same
logic applies if it starts 1S1), suppose ultimately that the middle S yields x (via a shorter
derivation). By the induction hypothesis, x is palindrome, and so the final string 0x0 is a
palindrome because (0x0)r = 0xr 0 = 0x0.
(iii) Induction on the length of the palindrome. Suppose w is a (non-basic) palindrome. Then w = 0x0 or w = 1x1 where x is a shorter palindrome, so S ⇒* x by the induction hypothesis. Therefore S ⇒ 0S0 ⇒* 0x0 (or S ⇒ 1S1 ⇒* 1x1) is a derivation of w.
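Grammars like the one in (f) can be sanity-checked by generating all derivable strings up to a length cap and comparing against a brute-force definition; a sketch:

```python
from itertools import product

def palindromes_up_to(max_len):
    """All strings derivable from S -> e | 0 | 1 | 0S0 | 1S1 of length <= max_len."""
    level = {'', '0', '1'}          # base cases of the grammar
    out = set(level)
    while level:
        # apply the constructor rules 0S0 and 1S1 to the previous level
        level = {a + w + a for w in level for a in '01' if len(w) + 2 <= max_len}
        out |= level
    return out

# Brute force: every binary string equal to its own reversal.
brute = set()
for n in range(6):
    for bits in product('01', repeat=n):
        w = ''.join(bits)
        if w == w[::-1]:
            brute.add(w)
```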
Exercise 25.6.
(a) ∗0∗: 1: S0 → A0A ∗1∗: 1: S1 → B1B
2: A → ε | 0 | 1 | AA 2: B → ε | 0 | 1 | BB

For the union (strings containing 0 or 1): 1: S → S0 | S1


2: S0 → A0A
3: A → ε | 0 | 1 | AA
4: S1 → B1B
5: B → ε | 0 | 1 | BB
(b) 0•k 1•k+ℓ 0•ℓ = 0•k 1•k • 1•ℓ 0•ℓ , so L is the concatenation of 0•k 1•k with 1•ℓ 0•ℓ .
0•k 1•k : 1: A → ε | 0A1 1•ℓ 0•ℓ : 1: B → ε | 1B0
•k •k+ℓ •ℓ
For the concatenation 0 1 0 : 1: S → AB
2: A → ε | 0A1
3: B → ε | 1B0
(c) Suppose you have a CFG for L with start variable A (rules 1: A → . . ., 2: . . ., and so on). Now suppose we wish to construct a grammar for L*. The idea is to first construct A•i for i ≥ 0 and then use the grammar for L to independently derive each A into a string from L. We know how to construct a CFG for A•i: 1: S → ε | SA. So prepend this rule to the grammar for L: the first rule generates a concatenation of A’s and the remaining rules independently derive each A into a string from L. In our case the grammar for L is 1: A → ε | 0A1, therefore the grammar for L* is
1: S → ε | SA
2: A → ε | 0A1
You should derive some strings within this grammar and verify they are all in L*.
(d) We observe that strings not containing 00 are any combination of the strings in {1, 01}, termi-
nated by 0 or ε. That is, L = {1, 01}∗ • {ε, 0}. The set {1, 01} is generated by 1: A → 1 | 01
and the set {ε, 0} is generated by 1: B → ε | 0 . Therefore,
{1, 01}∗ : 1: X → ε | XA
2: A → 1 | 01
{1, 01}∗ • {ε, 0} : 1: S → XB
2: X → ε | XA
3: A → 1 | 01
4: B → ε | 0
We can also use the recursive approach to derive the CFG. Let S represent a string in L.
Either S is empty or begins with 1 or 0. If S begins with 1, then it can be followed by any
string in L, generically represented by S. If S starts with 0, then it either ends or is followed
by a 1 and then any string in L. Therefore a grammar for L is
1: S → ε | 0 | 1S | 01S
For yet another approach to constructing a CFG for L, see Problem 25.10
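The decomposition L = {1, 01}∗ • {ε, 0} can be spot-checked exhaustively; the regular expression (1|01)*0? below is our own transcription of that decomposition:

```python
import itertools, re

# Strings with no "00" should be exactly {1, 01}* . {e, 0}:
# any mix of the blocks 1 and 01, optionally terminated by a single 0.
pattern = re.compile(r"(1|01)*0?")

for n in range(8):
    for bits in itertools.product("01", repeat=n):
        w = "".join(bits)
        assert ("00" not in w) == bool(pattern.fullmatch(w))
```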
Exercise 25.7.
(a) Let us walk through a systematic procedure for converting a CFG into Chomsky Normal
Form. Our grammar is 1: S → ε | 0S1S | 1S0S . First, we make sure that the start variable
does not appear on the RHS of any rule. To do this, we change the start variable and add a
new rule from the new start variable to the old one:
1: S → X
2: X → ε | 0X1X | 1X0X
This is a trivial change which does not alter the strings which can be derived. All derivations
simply start S → X and proceed as they would have before. Now remove all rules which take
a variable to ε: if a variable A may transition to ε, remove that transition and replace every
occurrence of the variable on the RHS of a rule with a new instance of the rule that replaces
A with ε. Note that every instance of A must be replaced “independently”. For example
X → 0X1X gets replaced by
X → 0X1X | 01X | 0X1 | 01
which captures all possible ways of replacing the X’s with ε. Our grammar becomes:
1: S → ε|X
2: X → 0X1X | 01X | 0X1 | 01 | 1X0X | 10X | 1X0 | 10
Now replace each terminal with a corresponding terminal variable (for example T0 for 0 and

T1 for 1) and add a rule from each terminal variable to its corresponding terminal:
1: S → ε|X
2: X → T0 XT1 X | T0 T1 X | T0 XT1 | T0 T1 | T1 XT0 X | T1 T0 X | T1 XT0 | T1 T0
3: T0 → 0
4: T1 → 1
Every rule (except for the rules to terminals) now has the form
<variable> → string of <variables>.
Now perform variable reduction on the RHS of any rule that has more than two variables.
One way to reduce the rule A1 → A2 A3 · · · Ak is
A1 → A2 B1 ; B1 → A3 B2 ; B2 → A4 B3 ; ··· Bk−2 → Ak−1 Ak
When you combine all the rules above, it becomes the single rule A1 → A2 A3 · · · Ak . In
general, you can pick any consecutive variables on the RHS and reduce it to a new variable,
while providing a new rule from the new variable to the pair. For our grammar let us use
A0 → T0 X and A1 → T1 X, then the rule for X becomes
X → T0 T1 | T1 T0 | A0 A1 | T0 A1 | A0 T1 | A1 A0 | T1 A0 | A1 T0
Our grammar becomes
1: S → ε|X
2: X → T0 T1 | T1 T0 | A0 A1 | T0 A1 | A0 T1 | A1 A0 | T1 A0 | A1 T0
3: A0 → T0 X
4: A1 → T1 X
5: T0 → 0
6: T1 → 1
Finally, for rules that take a variable to a single variable (e.g. S → X), replace the variable
on the RHS by the entire rule for that variable. Our grammar in Chomsky Normal Form is
1: S → ε | T0 T1 | T1 T0 | A0 A1 | T0 A1 | A0 T1 | A1 A0 | T1 A0 | A1 T0
2: X → T0 T1 | T1 T0 | A0 A1 | T0 A1 | A0 T1 | A1 A0 | T1 A0 | A1 T0
3: A0 → T0 X
4: A1 → T1 X
5: T0 → 0
6: T1 → 1

(b) Every application of a production rule increases the length of the hybrid string by 1 (starting
from S of length 1). Every application of a terminal rule keeps the length of the hybrid string
fixed. Since the final string has length n, this means there must be n − 1 steps which increase
the length by 1 and n steps which convert variables to terminals, for a total of 2n − 1 steps.
This argument works because no rule can decrease the length of the hybrid string.
(c) For a grammar in Chomsky Normal Form and an input string w of length n, try all derivations
of length 2n − 1. If w resulted from one of these (possibly exponentially many) derivations,
then w ∈ L, otherwise w ∉ L. Note that this is a relatively inefficient procedure to test for
membership in a CFL, but a finite one.
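This membership test can be sketched in code. The dictionary below is our own transcription of the Chomsky Normal Form grammar derived in (a); cnf_member enumerates leftmost derivations of length at most 2n − 1 (the bound from (b)), pruning sentential forms that can no longer match w:

```python
products = [("T0","T1"), ("T1","T0"), ("A0","A1"), ("T0","A1"),
            ("A0","T1"), ("A1","A0"), ("T1","A0"), ("A1","T0")]
cnf = {"S": [()] + products, "X": products,
       "A0": [("T0","X")], "A1": [("T1","X")],
       "T0": [("0",)], "T1": [("1",)]}

def cnf_member(rules, start, w):
    # Try all leftmost derivations of length at most 2|w| - 1.
    n = len(w)
    if n == 0:
        return () in rules[start]        # only the start variable may go to epsilon
    frontier = [((start,), 0)]
    while frontier:
        form, steps = frontier.pop()
        idx = next((i for i, s in enumerate(form) if s in rules), None)
        if idx is None:                  # fully terminal string
            if "".join(form) == w:
                return True
            continue
        # prune: too many steps, too long, or terminal prefix disagrees with w
        if steps >= 2 * n - 1 or len(form) > n or not w.startswith("".join(form[:idx])):
            continue
        for rhs in rules[form[idx]]:
            if rhs:                      # epsilon only matters for w = ""
                frontier.append((form[:idx] + rhs + form[idx + 1:], steps + 1))
    return False

assert cnf_member(cnf, "S", "") and cnf_member(cnf, "S", "0110")
assert not cnf_member(cnf, "S", "000")
```

The pruning is an optimization on top of the exhaustive procedure described above; without it the search is still finite, just (possibly exponentially) slower.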
Pop Quiz 25.8. The final parse tree is boxed on the right of each derivation.
[Two different derivations of the same string, shown step by step together with the parse trees they build; the final parse tree is boxed at the right of each derivation.]
Though the two derivations are different, the final parse trees are identical.
Exercise 25.9.
(a) Change the order of some of the transitions to get a different derivation.


S ⇒ S + P ⇒ P + P ⇒ T + P ⇒ T + P × T ⇒ T + T × T ⇒∗ 2 + 2 × 2
[The accompanying parse tree has root S with children S + P ; the P expands to P × T , and every leaf variable derives 2.]
We get a different derivation but the same parse tree.

(b) Grammar in (25.4):
S ⇒ P ⇒ P × T ⇒ T × T ⇒ (S) × T ⇒ (S) × (S) ⇒ (S + P ) × (S) ⇒ (P + P ) × (S)
⇒ (T + P ) × (S) ⇒ (T + T ) × (S) ⇒ (T + T ) × (S + P ) ⇒ (T + T ) × (S + P × T )
⇒ (T + T ) × (P + P × T ) ⇒ (T + T ) × (T + P × T ) ⇒ (T + T ) × (T + T × T )
⇒∗ (2 + 2) × (2 + 2 × 2)
Grammar in (25.3):
S ⇒ S × S ⇒ (S) × S ⇒ (S) × (S) ⇒ (S + S) × (S) ⇒ (S + S) × (S + S)
⇒ (S + S) × (S + S × S) ⇒∗ (2 + 2) × (2 + 2 × 2)

At the end of both derivations, every variable transitions to 2 in some order. The deriva-
tion using the unambiguous grammar in (25.4) is much longer, the price one pays to obtain
unambiguous parse trees (because additional information is embedded in the rules).

(c) (i) For emphasis we give 3 different derivations with different parse trees.
[Three different derivations of 111, each with a different parse tree: S ⇒ SS ⇒ SSS ⇒∗ 111 with the extra S split off on the left, the same derivation with it split off on the right, and S ⇒ SS ⇒ SSS ⇒ SSSS ⇒∗ 1ε11 = 111, which uses S ⇒ ε. At the end of each derivation, every variable transitions to a terminal in some order.]

(ii) To remove the ambiguity, we give just one way to add a 1 to the string, S → ε | 1S.

Pop Quiz 25.10. S ⇒ 0S0 ⇒ 01S10 ⇒ 011S110 ⇒ 011#110


Exercise 25.11.


(a) [PDA state diagram, states q0 –q4 with a dead state e: before the # the machine pushes each input bit onto the stack ({0}{∅, 0, 1} → {push(0)}, {1}{∅, 0, 1} → {push(1)}); on reading # it enters the matching phase, where each bit read must match and pop the top of the stack ({0}{0} → {pop()}, {1}{1} → {pop()}); at the end of the input an empty stack ({ε}{∅} → {}) leads to the accepting state q4 , while any mismatch or a second # leads to e.]

(b) Left of the head-position is the substring processed and to its right the substring remaining.
We also must show the stack which we do to the left of the current state (the top of the stack
is rightmost in black). Let us denote our PDA by M . The traces for the strings are:
0110: ∅ | q0 | ⊲0110ε ↦ ∅0 | q1 | 0⊲110ε ↦ ∅01 | q1 | 01⊲10ε ↦ ∅011 | q1 | 011⊲0ε ↦ ∅0110 | q1 | 0110⊲ε ↦ ∅0110 | q1 | 0110ε⊲
stop, reject
01#01: ∅ | q0 | ⊲01#01ε ↦ ∅0 | q1 | 0⊲1#01ε ↦ ∅01 | q1 | 01⊲#01ε ↦ ∅01 | q3 | 01#⊲01ε ↦ ∅01 | e | 01#0⊲1ε ↦ ∅01 | e | 01#01⊲ε ↦ ∅01 | e | 01#01ε⊲
stop, reject
01#10: ∅ | q0 | ⊲01#10ε ↦ ∅0 | q1 | 0⊲1#10ε ↦ ∅01 | q1 | 01⊲#10ε ↦ ∅01 | q3 | 01#⊲10ε ↦ ∅0 | q3 | 01#1⊲0ε ↦ ∅ | q3 | 01#10⊲ε ↦ ∅ | q4 | 01#10ε⊲
stop, accept
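The PDA's behaviour on these traces can be mirrored with an explicit stack (an illustrative simulation, not the state machine itself):

```python
def accepts(s):
    # Simulate the PDA: push bits until '#', then match-and-pop the rest.
    stack = []
    seen_hash = False
    for c in s:
        if c == "#":
            if seen_hash:
                return False       # a second '#' sends the PDA to the dead state
            seen_hash = True
        elif not seen_hash:
            stack.append(c)        # pushing phase (states q0, q1)
        else:
            if not stack or stack[-1] != c:
                return False       # mismatch: dead state e
            stack.pop()            # matching phase (state q3)
    return seen_hash and not stack # accept (q4) only with an empty stack

assert accepts("01#10")            # third trace: accept
assert not accepts("0110")         # no '#': reject
assert not accepts("01#01")        # second trace: reject
```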

Exercise 25.12.
(a) Yes. L1 is the concatenation of 0•n 1•n (context free, S → ε | 0S1) with 0•m (context free,
S → ε | 0S). Since CFLs are closed under concatenation, L1 is a CFL.
(b) Yes, for a similar reason. L2 is the concatenation of 0•m with 1•n 0•n .
(c) L1 ∩ L2 = {0•n 1•n 0•n | n ≥ 0} = L.
(d) Suppose CFLs are closed under intersection. Then L = L1 ∩ L2 must be a CFL, which
contradicts the fact that L is not a CFL. Therefore CFLs are not closed under intersection.
(e) Suppose CFLs are closed under complement. Then L̄1 and L̄2 are CFLs. CFLs are closed under
union, so L̄1 ∪ L̄2 is a CFL. This means the complement of L̄1 ∪ L̄2 is a CFL (closure under
complement). But, by De Morgan's law (the complement of A ∪ B is Ā ∩ B̄), the complement of
L̄1 ∪ L̄2 is L1 ∩ L2 (= L). We have proved that closure under union and complement
implies closure under intersection. But CFLs are not closed under intersection; in particular,
L is not a CFL (contradiction). Therefore, CFLs are not closed under complement.
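These claims can be spot-checked by brute force. The membership tests below are our own transcriptions of L1 = {0•n 1•n 0•m }, L2 = {0•m 1•n 0•n } and L = {0•n 1•n 0•n }:

```python
def in_L1(w):   # 0^n 1^n 0^m for some n, m >= 0
    return any(w == "0" * a + "1" * a + "0" * (len(w) - 2 * a)
               for a in range(len(w) // 2 + 1))

def in_L2(w):   # 0^m 1^n 0^n for some n, m >= 0
    return any(w == "0" * (len(w) - 2 * a) + "1" * a + "0" * a
               for a in range(len(w) // 2 + 1))

def in_L(w):    # 0^n 1^n 0^n
    n, r = divmod(len(w), 3)
    return r == 0 and w == "0" * n + "1" * n + "0" * n

# L sits inside L1 and inside L2 ...
for n in range(5):
    w = "0" * n + "1" * n + "0" * n
    assert in_L(w) and in_L1(w) and in_L2(w)

# ... and neither containment is an equality:
assert in_L1("0011000") and not in_L2("0011000") and not in_L("0011000")
assert in_L2("0001100") and not in_L1("0001100")
```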

Chapter 26
Pop Quiz 26.1. On 0110, the TM halts with no at step 1 because there is no # in the input.


01#10 101#10 10#101

* 0 1 # 1 0 ␣ * 1 0 1 # 1 0 ␣ * 1 0 # 1 0 1 ␣

✓ ✓ ✓
* 0 1 # 1 0 ␣ * 1 0 1 # 1 0 ␣ * 1 0 # 1 0 1 ␣

✓ ✓ ✓ ✓ ✓
* 0 1 # 1 0 ␣ * 1 0 1 # 1 0 ␣ * 1 0 # 1 0 1 ␣
halt, no
✓ ✓ ✓ ✓
* 1 0 1 # 1 0 ␣ * 1 0 # 1 0 1 ␣

✓ ✓ ✓ ✓ ✓ ✓
* 1 0 1 # 1 0 ␣ * 1 0 # 1 0 1 ␣

✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
* 1 0 1 # 1 0 ␣ * 1 0 # 1 0 1 ␣

✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
* 1 0 1 # 1 0 ␣ * 1 0 # 1 0 1 ␣

✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
* 1 0 1 # 1 0 ␣ * 1 0 # 1 0 1 ␣

✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
* 1 0 1 # 1 0 ␣ * 1 0 # 1 0 1 ␣
halt, no halt, no

Pop Quiz 26.2.

{0}{0}{R}
(a) q r From q if you read 0, transition to r, write 0 and move right.
{0}{}{R}
(b) q r From q if you read 0, transition to r, and move right.
{✓0 }{}{}
(c) q r From q if you read marked 0, transition to r.
{0}{✓}{}
(d) q r From q if you read 0, mark it and transition to r.
{0}{✓}{R}
(e) q r From q if you read 0, mark it, transition to r and move right.
{0,1}{ ✓1 }{L}
(f) q r From q if you read 0 or 1, write marked 1, transition to r and move left.

{✓0 ,✓1 }{0,1}{L}
(g) q r From q if you read a marked bit, unmark it, transition to r and move left.
{✓}{0}{L}
(h) q r From q if marked, write 0 (unmarked), transition to r and move left.
{✓}{}{L}
(i) q r From q if marked, transition to r and move left.
{0}{␣}{L}
(j) q r From q if 0, erase, transition to r and move left.
{␣}{#}{L}
(k) q r From q if blank, write #, transition to r and move left.
{✓}{ ✓0 }{}
(l) q r From q if marked, write marked 0 and transition to r.

Pop Quiz 26.3. We simply identify the step 2 states in the automata for Step 1 and Step 2,
and “snap” the automata together by merging the step 2 states. The result is:


[TM state diagram: q0 scans right over {∗, 0, 1}; reading # ({#}{}{R}) moves it to q1 , which scans right over {0, 1}; a second # or a misplaced ␣ leads to the error state E, while the terminating ␣ in q1 leads into step 2, which moves left over {␣, 0, 1, #} back to ∗ and then enters step 3.]

Pop Quiz 26.4. At the end of each step the machine transitions to the success state for that
step; this success step becomes the starting point for the next step. Merging the automaton from
the previous pop quiz with the automaton for step 3 at the state step 3 gives
[The diagram above extended with step 3 and the branch states z4 and o4 : from step 3 the head skips marked symbols ({∗, ✓}{}{R}); reading an unmarked 0 marks it and branches to z4 ({0}{✓}{R}), an unmarked 1 branches to o4 ({1}{✓}{R}), and reading # ({#}{}{R}) leads to step 5.]

Merging the states z4 and o4 with the step-4 machine gives


[The step-4 machine snapped in: z4 and o4 move right to the # ({#}{}{R}) and become z5 and o5 , which skip ✓'s looking for the matching unmarked bit in the second block; finding it marks the bit and control returns to step 2, while the wrong bit or ␣ leads to the error state E.]

Filling in step 5 gives the final Turing machine.


[The final Turing machine, with step 5 filled in: after all bits are matched, step 5 scans right over the ✓'s and accepts (state A) on reading ␣; any unmarked 0 or 1 remaining after the # leads to the error state E.]

(a) → q0 → q0 → q0 → q0 → q1 → q1 → q1
→ step 2 → step 2 → step 2 → step 2 → step 2 → step 2 → step 2
→ step 3 → step 3 → z4 → z4 → z5 → step 2 → step 2 → step 2 → step 2 → step 2
→ step 3 → step 3 → step 3 → o4 → o5 → o5
→ step 2 → step 2 → step 2 → step 2 → step 2 → step 2
→ step 3 → step 3 → step 3 → step 3 → step 5 → step 5 → step 5 → A

(b) → q0 → q0 → q0 → q0 → q1 → q1 → q1
→ step 2 → step 2 → step 2 → step 2 → step 2 → step 2 → step 2
→ step 3 → step 3 → z4 → z4 → z5 → e

(c) → q0 → q0 → q0 → q1 → q1 → q1
→ step 2 → step 2 → step 2 → step 2 → step 2 → step 2 → step 3 → step 3 → z4 → z5
→ step 2 → step 2 → step 2 → step 2 → step 3 → step 3 → step 5 → step 5 → e

(d) → q0 → q0 → q0 → q0 → q1 → q1
→ step 2 → step 2 → step 2 → step 2 → step 2 → step 2 → step 3 → step 3
→ z4 → z4 → z5 → step 2 → step 2 → step 2 → step 2 → step 2
→ step 3 → step 3 → step 3 → o4 → o5 → o5 → e

Exercise 26.5. Step 1 and 2 are very similar to our Turing machine which solved w#w. Instead,
we are now looking for two #’s and a specific format of 0’s and 1’s. A DFA solves this problem.
[TM state diagram for Steps 1 and 2: q0 , q1 and q2 verify the input format (0's, then #, then 1's, then #, then 0's), any out-of-place symbol leading to the error state E; the terminating ␣ sends control to step 2, which moves left over {␣, 0, 1, #} back to ∗ and into step 3.]

In Step 3 you are looking for the first unmarked 0; if you find it, mark it and move to #:


[The diagram extended with step 3 and q3 : step 3 moves right to the first unmarked 0, marks it ({0}{✓}{R}) and enters q3 , which moves right over the remaining 0's to the # ({#}{}{R}) and into step 4; if step 3 reaches the # instead (no unmarked 0 left), control passes to step 6.]

Note, in state step 3 we did not show what to do if the input is 1; and in state q3 we did not show
what to do for 1 and ␣. This is because the TM already verified the input format so a 1 or ␣ is
not possible. Now for step 4.
[The diagram further extended with step 4, q4 and step 5; the transition labels follow the same conventions as before, with q4 unmarking 1's ({✓1 }{1}{L}) on its way back to the # ({#}{}{L}).]

(State q4 is responsible for moving right unmarking the 1’s.) In step 6 you are just checking for no
unmarked right 0. In step 5 you are moving right to mark a 0. We implement both steps together.
[The complete machine, with steps 5 and 6 added: step 6 scans right over {1, #, ✓0} and accepts (state A) on reading ␣ ({␣}{}{}), while an unmarked 0 leads to the error state E; step 5 moves right to mark another 0, with q5 handling the return.]

Exercise 26.6.
(a) Using a Turing machine to solve a regular language is like using a sledgehammer to hammer
in a thumb-tack. Our TM is just a glorified DFA:


[The TM's state diagram: q0 and q1 move right over the input ({∗,1}{}{R}, {0}{}{R}, {1}{}{R}) exactly as a DFA would, reaching the accept state A or the error state E on the terminating ␣ depending on the current state.]

(b) There are two ways to solve the problem. The first is to insert a punctuation character be-
tween the two w’s and then use the Turing machine we already developed to solve the problem:
1: If the first symbol is ␣, accept (empty input).
2: Return to ∗.
3: Move right to the first unmarked bit and mark it.
If you come to ␣ (equal number of unmarked bits right of ␣):
mark it with #
return to ∗, unmarking any marked bits, and goto Step 5.
4: Move right to the first unmarked bit before ␣.
If none exists (odd-length string) reject.
Erase and copy the bit to the blank on its right; goto Step 2.
5: Use the Turing machine that solves w#w.
accept or reject using the output of this Turing machine.
[Tape snapshots show ∗0101␣ progressively transformed into ∗01#01␣, at which point the w#w machine is invoked.]
The entire purpose of this Turing machine is to insert the punctuation character at the midpoint of the input, and it can be viewed as a preprocessing machine which reconfigures the input into a format that can be used by some other Turing machine. This concept should be familiar to computer scientists: using someone else's program to solve a task, but reconfiguring your input so that their program will accept the input.

Our second approach builds a Turing machine to directly solve ww. The previous approach
involves less effort since it leverages previously solved problems; but, this leads to program
(Turing machine) bloat. Directly solving the problem gives a more compact Turing machine.


1: If the first symbol is ␣, accept (empty input).
2: Return to ∗.
// Mark the first string with ✓ and the second with ✘
3: Move right to the first unmarked bit and mark it ✓.
If none exists (you come to ✘), goto Step 5.
4: Move right to the last unmarked bit and mark it ✘.
If none exists (the first symbol to the right is ␣ or ✘) reject
(the input has an odd number of bits).
Otherwise, after marking, goto Step 2.
After the loop involving steps 3 and 4, the input string is partitioned into
two halves: the first is marked with ✓ and the second with ✘.
5: Return to ∗.
// Match each ✓-bit with a corresponding ✘-bit
6: Move right to the first bit marked ✓.
If none exists (you come to ␣) accept.
Otherwise remember the bit and unmark it.
7: Move right to the first bit marked ✘.
If the bit does not match the bit remembered, reject.
If it is a match, unmark the bit and goto Step 5.
[Tape snapshots show ∗0101␣ being marked into ✓- and ✘-halves and then matched pair by pair.]
It is now just a matter of constructing the machine-level instructions for
each step and snapping them together to get the final Turing machine.
Let's do it for our second approach which directly solves ww.
[State diagram for steps 1–3: q0 accepts the empty input ({␣}{}{}); otherwise step 2 moves left over {0, 1} back to ∗, and step 3 marks the first unmarked bit ({0,1}{✓}{R}), handing control to step 5 when none remains.]

In steps 1–3 above, the notation ∗̄ means any symbol that is not ∗. Now for steps 4 and 5.

[The diagram extended with steps 4 and 5 and states q1 , q2 : step 4 moves right to the last unmarked bit and marks it ✘ ({0,1}{✘}{L}), rejecting (state E) when the first symbol it meets is ✘ or ␣ (odd-length input); step 5 returns to ∗ and hands off to step 6.]

Lastly, we must implement step 6 and 7 in which the bits marked ✓ are matched with bits
marked ✘.


[The final Turing machine: states z7 and o7 remember whether the ✓-bit just unmarked was a 0 or a 1, move right to the first ✘-bit and compare ({✘0}{0}{R} in z7 , {✘1}{1}{R} in o7 ), returning to step 5 on a match and rejecting (state E) on a mismatch; step 6 accepts (state A) on ␣ once every ✓ has been matched.]

We encourage the reader to construct the Turing machine which first preprocesses the string
into w#w and then cascade it with our Turing machine that solves w#w.
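The two-marker algorithm of the second approach can be imitated with a list standing in for the tape. This is an illustrative (hypothetical) simulation of the steps, not the Turing machine itself:

```python
def accepts_ww(w):
    # Steps 3-4: alternately "mark" a bit from the front (the ✓-half)
    # and from the back (the ✘-half).
    tape = list(w)
    lo, hi = 0, len(tape)          # tape[:lo] is the ✓-half, tape[hi:] the ✘-half
    while lo < hi:
        lo += 1                    # step 3: mark first unmarked bit with ✓
        if lo > hi - 1:
            return False           # step 4 fails: odd number of bits
        hi -= 1                    # step 4: mark last unmarked bit with ✘
    # Steps 5-7: match each ✓-bit with the corresponding ✘-bit.
    return tape[:lo] == tape[hi:]

assert accepts_ww("") and accepts_ww("11") and accepts_ww("0101")
assert not accepts_ww("010") and not accepts_ww("0110")
```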
Pop Quiz 26.7.

* 0 0 # 1 1 1 ␣ ␣ ␣ ␣ ␣ ␣ ␣ ␣

* 0 0 # 1 1 1 # ␣ ␣ ␣ ␣ ␣ ␣ ␣

* 0 0 # 1 1 1 # ␣ ␣ ␣ ␣ ␣ ␣ ␣


* 0 0 # 1 1 1 # ␣ ␣ ␣ ␣ ␣ ␣ ␣


* 0 0 # 1 1 1 # ␣ ␣ ␣ ␣ ␣ ␣ ␣

✓ ✓
* 0 0 # 1 1 1 # ␣ ␣ ␣ ␣ ␣ ␣ ␣

✓ ✓
* 0 0 # 1 1 1 # 0 ␣ ␣ ␣ ␣ ␣ ␣

✓ ✓
* 0 0 # 1 1 1 # 0 ␣ ␣ ␣ ␣ ␣ ␣
Exercise 26.8.
(a) (i) The Turing machine copies the bits over one by one.
1: Move right to the first ␣ and write #.
2: Return to ∗.
3: Move right to first non-marked before #.
Remember and mark the bit.
If you reach #, return to ∗ unmarking all the ✓ and halt.
4: Move right to first ␣, write the remembered bit and goto step 2.
(ii) L = {w#w | w ∈ Σ∗ }.
(b) (i) We use a ✘ to simulate the punctuation #.
1: Move right to the first ␣ and mark with ✘.
2: Return to ∗.
3: Move right to first non-marked before ✘.
Remember and mark the bit with ✓.
If you reach ✘, unmark the bit, return to ∗ unmarking all the ✓ and halt.
4: Move right to first ␣, write the remembered bit and goto step 2.
(ii) L = {ww | w ∈ Σ∗ }.
(c) (i) Write a 1 for every zero and repeat for every zero.
1: Move right to the first ␣ and mark with #.


2: Return to ∗.
3: Move right to first non ✘-marked 0 and mark with ✘.
If you reach #, return to ∗ unmarking all 0’s and halt.
4: Return to ∗.
5: Move right to first non ✓-marked 0 and mark with ✓
If you reach #, return to ∗ unmarking ✓s (leaving the ✘s) and goto step 3.
6: Move right to first ␣ and write 0.
7: Move left to first ✓ and goto step 5.
(ii) L = {0•n #1•n² | n ≥ 0}.
(d) (i) Mark and replace the first with the last bit and vice versa and continue.
1: Move right to the first non-marked bit. Mark it and remember it.
If you reach ␣, return to ∗, erasing all marks and halt.
2: Move right to the last non-marked bit.
If there is none, return to ∗, erasing all marks and halt.
Otherwise, remember it, replace it with the bit from step 1 and mark it.
3: Move left to the first marked bit.
Replace the bit with the bit remembered in step 2 and goto step 1.
(ii) L = {w#w r | w ∈ Σ∗ }.
Pop Quiz 26.9. (a) → q0 → E. (b) → q0 → q1 → A.
(c) → q0 → q0 → q1 → q0 → q1 → q0 → q1 → q0 → q1 → · · · (infinite loop).
Pop Quiz 26.10. Yes. Every decider also trivially recognizes its language so every decidable
language is also recognizable.
Exercise 26.11. Let M be a Turing machine that decides L. We assume that the states, symbols
and transition instructions are suitably punctuated.
1: Process each state to check that there is a valid instruction telling the machine what to do if
in that state for every symbol.
2: for each state, marking it when you process it do
3: for each symbol, marking it when you process it do
4: Find the transition instruction that begins with the state and symbol and verify that it
is a valid transition instruction.
Our Turing machine is verifying that the input string represents a valid Turing machine, much
like one of the tasks of a compiler is to verify that the input program is a valid program.

Chapter 27
Pop Quiz 27.1.
(a) To grade the submission correctly, the TA must determine if the program prints “Hello World!”.
If the program does print “Hello World!”, Goldbach’s conjecture is false; or, if the program
does not print “Hello World!”, Goldbach’s conjecture is true. In either case, the TA resolves
an open conjecture and becomes famous.
(b) Run the ultimate debugger on the student’s submission. If the debugger says the program has
no infinite loop, then Goldbach’s conjecture is false. If the program runs forever, Goldbach’s
conjecture is true.
(c) When you run confuse on confuse, first confuse runs auto-grade on confuse.
If auto-grade says no (confuse does not print “Hello World!”) then confuse replaces
the output file with “Hello World!” and halts. That is confuse does print “Hello World!”: a
contradiction.
If auto-grade says yes (confuse does print “Hello World”) then confuse erases the output
file which means it does not print “Hello World”. Again, a contradiction.
confuse is not a valid decider since it cannot output an answer on some input. But confuse
exists if auto-grade exists, therefore auto-grade cannot exist. Something is FISHY here.
If auto-grade does not exist, then what are the TAs using to grade the CS1 assignments?
Exercise 27.2. When we mark a vertex, we also mark the edges containing this vertex. This
requires a lot of zig-zagging (we suppress the details). Note: the encoding ⟨G⟩ contains all the
punctuation.


[Tape snapshots of ⟨G⟩ = ∗ 1;2;3;4 # 1,2;2,3;1,3;3,4 ␣ with vertex "1" progressively marked in the node-list and in the edges.]
Marking node "1" in the edges requires zig-zagging. The TM cannot just "remember" the label "1" and mark it in the edges, because the TM cannot "remember" labels. The TM has a finite number of states with which to "remember" things. The vertex labels are arbitrary, and there can be arbitrarily many of them since the graph can have any finite size. Vertex label "1" is really "00000001" (ASCII code). What happens is that the TM marks the position
of the vertex label and remembers the first bit in the label (marking it). Then the TM moves
right to the edges looking for a vertex with that first bit. If it finds one, it marks the bit and then
zig-zags to match and mark all bits. If all bits match, the vertex is marked as matched. If at some
point the bit does not match, the vertex is marked as non-matched so that it does not have to
be checked again. After processing the vertex, all edge vertices that were marked as non-matched
are unmarked. In our figure above, we skipped over the steps when no match is found. At the
end of this phase, the TM is ready to scan the edges for an unmarked vertex.

[Further tape snapshots: the first unmarked vertex among the edges is marked ✘, found and marked in the node-list, all its occurrences in the edges are marked, and every ✘ is then changed to ✓; the process repeats for vertices "3" and "4".]
The head moves right to mark with ✘ the first unmarked vertex in the edges, also finding and marking this vertex in the node-list. Now, the TM marks all locations in the edges where the newly marked vertex occurs. Lastly, the head returns to ∗ changing every ✘ to ✓. Again, the TM is ready to scan the edges for another unmarked vertex. For the remaining steps we suppress the zig-zagging and only show the high-level of what happens.
Next, the TM marks an unmarked vertex among the edges (in this case vertex "3"), and repeats the dance of marking the vertex, and then all edges in which the vertex appears. Finally, the TM gets to vertex "4", and when all is said and done, the TM returns to ∗ to perform one last scan of the vertices. In the last scan, the TM looks for an unmarked vertex, in which case the graph is disconnected. Here, all vertices are marked, and the graph is connected.
We went through the details in some of their gore to show you that a TM can indeed solve problems which we are accustomed to writing programs to solve. The TM can indeed solve all the nice problems we are used to seeing solved by a computer. The TM is a good model of a computer.
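The TM's mark-and-scan procedure is ordinary connectivity testing by marking; a compact rendering (our own helper, using the worked example's graph ⟨G⟩ = 1;2;3;4#1,2;2,3;1,3;3,4):

```python
def connected(vertices, edges):
    # Mark the first vertex, then repeatedly mark any unmarked vertex that
    # shares an edge with a marked one -- the TM's scan-and-mark loop.
    marked = {vertices[0]}
    changed = True
    while changed:
        changed = False
        for u, v in edges:
            if u in marked and v not in marked:
                marked.add(v); changed = True
            elif v in marked and u not in marked:
                marked.add(u); changed = True
    # The final scan: an unmarked vertex means the graph is disconnected.
    return marked == set(vertices)

assert connected(["1", "2", "3", "4"],
                 [("1", "2"), ("2", "3"), ("1", "3"), ("3", "4")])
assert not connected(["1", "2", "3"], [("1", "2")])
```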
Exercise 27.3. No, because if w ∉ L1 but w ∈ L2 , we need a TM M that will halt with
accept. However, because M1 is a recognizer for L1 , in step 1 it may not halt which means the
construction for M will not halt.


It is indeed the case that recognizable languages are closed under union, but we need a more
sophisticated construction for M . Essentially M must interleave the running of M1 and M2 , so
M runs one step of M1 and then one step of M2 and so on. We do not give the details.
Pop Quiz 27.4. (See also the solution to Exercise 27.2.) The reason is that the TM can only
remember finitely many things using its states. Since M , on the other hand, can have an arbitrary
number of states, the TM cannot "remember" the state. It marks the state, and must match it
bit-by-bit against the transition instruction (depending on the symbol under the tape head).
In order to match the state bit-by-bit, the TM zig-zags.
Exercise 27.5. This Demonic fixed TM is Utm : L(Utm ) = Ltm , which we know is undecidable.
Exercise 27.6. Our sketch implements the program our diabolical student gave in Pop Quiz 27.1.
Dtm = Diabolical student’s Turing Machine
input: empty tape.
1: Write 4 (in binary) on the tape.
2: Test if the number on the tape is a sum of two primes.
If not, accept;
otherwise add 2 to the number on tape and repeat step 2.

Running Atm on Dtm with empty input solves Goldbach's conjecture. Goldbach's conjecture is
false if and only if Atm accepts.
Atm is a general purpose tool that does not exist. That does not mean we cannot "mathematically"
analyze the special TM Dtm to determine if it terminates successfully.
Exercise 27.7.
(a) Lempty = {⟨M⟩ | M is a TM and L(M ) = ∅}.
Suppose Etm decides Lempty . We sketch a decider Htm for Lhalt that uses Etm . Since Lhalt is
undecidable, this is a contradiction, which proves that Etm does not exist.
Htm = Decider for Lhalt that uses Etm
input: ⟨M⟩#w, where M is a TM and w its input.
1: Modify M to M′
M′ = Modified version of M
input: w′
1: Run M on w.
2: accept

2: Obtain ⟨M′⟩, the encoding of M′ .
3: Run Etm (⟨M′⟩) and accept if and only if Etm rejects.

There are several interesting points in our construction of H tm .


(i) The inputs to Htm are ⟨M⟩ and w, and these are "hard-coded" into M′ .
(ii) You can define another TM, in this case M′ , inside a TM.
(iii) M′ accepts every input provided M halts on w.
(iv) Htm never actually runs M′ or M on w; it only encodes M′ into its description ⟨M′⟩.
First Htm always halts, so it is a decider, because Etm is a decider and always halts. Second,
L(M ′ ) = ∅ if and only if M does not halt on w, therefore Htm is a decider for Lhalt , which
is the contradiction we desire.
(b) Leq = {⟨M1⟩#⟨M2⟩ | M1 and M2 are TMs and L(M1 ) = L(M2 )}.
Suppose you have a decider for Leq , call it EQtm . We construct a decider Etm for Lempty as
follows. Let M∗ be a TM that immediately halts and rejects on every input. So L(M∗ ) = ∅.
Now let Etm (⟨M⟩) = EQtm (⟨M⟩#⟨M∗⟩). Etm accepts if and only if L(M ) = L(M∗ ) = ∅.
Therefore Etm decides Lempty , a contradiction, and so EQtm does not exist.
Exercise 27.8. The dominoes must start with a domino with the same first character in the
top and bottom. The only possibility is d1 ,
d1 = 10/101 (10 on top, 101 on the bottom).


For the next domino, the top first character must be 1 and the remaining characters on top must
match characters on the bottom. The only possibility is d3 ,
d1 d3 = (10/101)(101/011) = 10101/101011
We still have a hanging 1 at the bottom. Repeating the argument, the only possibility is d 3 again,
and again, and so on. You can never get rid of the hanging 1 at the bottom. The only possible
solution is d1 d3 d3 d3 · · · , which is infinite. That is, this instance of PCP has no solution.
Exercise 27.9. So now, each domino is described by n, m where n is the number of 1’s on top
and m is the number of 1’s on the bottom. So the input is n1 , n2 , . . . , nℓ #m1 , m2 , . . . , mℓ .
We will just give the algorithm to decide this instance relying on the Church-Turing thesis which
says this algorithm can be converted to a TM-decider. We encourage the reader with some spare
time to actually construct the TM.
1: If ni > mi for all i or ni < mi for all i then reject.
2: accept
The algorithm is simple enough, testing to see if all the top values are either less than or greater
than the bottom values. This algorithm always halts, and we must prove that the decision is
always correct.
First suppose ni > mi (resp. ni < mi ) for all i. Then the top string will always be longer (resp.
shorter) than the bottom string, and so it is not possible to have a match and reject is correct.
Now suppose it is not the case that ni > mi for all i or ni < mi for all i. That is ni ≤ mi for at
least one i and ni ≥ mi for at least one i. There are two cases:
(i) If ni = mi then di is a trivial solution.
(ii) ni < mi for at least one i and ni > mi for at least one i. Without loss of generality, we may
assume n1 < m1 and n2 > m2 . Consider the sequence of dominoes
d1•(n2 −m2 ) d2•(m1 −n1 ) .
The number of 1’s in the top is n1 (n2 − m2 ) + n2 (m1 − n1 ); the number of 1’s in the bottom is
m1 (n2 − m2 ) + m2 (m1 − n1 ). The difference is
m1 (n2 −m2 )+m2 (m1 −n1 )−n1 (n2 −m2 )−n2 (m1 −n1 ) = (m1 −n1 )(n2 −m2 )+(m2 −n2 )(m1 −n1 ) = 0.
That is, the top and bottom strings match and accept is correct.
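The decision procedure, together with the matching construction from case (ii), can be written out directly (an illustrative sketch; domino i is described by the pair n[i], m[i]):

```python
def unary_pcp_has_solution(n, m):
    # n[i] / m[i] = number of 1's on the top / bottom of domino i.
    all_top_longer = all(ni > mi for ni, mi in zip(n, m))
    all_bot_longer = all(ni < mi for ni, mi in zip(n, m))
    return not (all_top_longer or all_bot_longer)

def witness(n, m):
    # Reproduce the accepting construction: a single balanced domino,
    # or d_i used (n_j - m_j) times and d_j used (m_i - n_i) times.
    for i, (ni, mi) in enumerate(zip(n, m)):
        if ni == mi:
            return [i]
    i = next(k for k in range(len(n)) if n[k] < m[k])
    j = next(k for k in range(len(n)) if n[k] > m[k])
    return [i] * (n[j] - m[j]) + [j] * (m[i] - n[i])

n, m = [1, 3], [2, 1]
assert unary_pcp_has_solution(n, m)
seq = witness(n, m)
assert seq and sum(n[k] for k in seq) == sum(m[k] for k in seq)
assert not unary_pcp_has_solution([2, 3], [1, 1])   # every top strictly longer
```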
Exercise 27.10.
(a) Use a second tape as the counter. Every time the TM M executes a transition instruction,
it first moves right on the counter tape and writes 1, then it performs its instruction. So, for
every instruction, the counter tape will have a 1. If a single tape TM is desired, merge the
second tape with the first using the trick in Example 27.1.
(b) Each step touches at most one new tape-slot, so at most n tape-slots are touched.
(c) First, come back to ∗. For every 1 on the counter move L one step. Now, for every 1 on the
counter tape, move R twice, each time erasing the tape (if you come to ∗, move R). All slots
within n of ∗ are now erased.
Exercise 27.11. The homework should require that the program solve the problem within a given number of steps (CPU-cycles or time). It is possible to decide if a TM prints "Hello
World!” and halts within a given number of steps. Simply simulate the machine for that number
of steps. If it halts, examine the tape for “Hello World!”. If it does not halt, reject.
Exercise 27.12.
(a) Let M and M̄ be recognizers for L and L̄ respectively. We sketch a decider for L. First, here is a sketch that does not work.
D = Flawed Decider for L that uses M and M̄
input: w (the input to M or M̄)
1: Run M on w; if M accepts, then accept.
2: Run M̄ on w; if M̄ accepts, then reject.
We know that one of M, M̄ must accept, because they are recognizers and either w ∈ L or w ∈ L̄. However, in step 1, M may infinite-loop (M is only a recognizer). In this case D also
infinite-loops and so it is not a decider, which must always halt. The solution is to interleave the running of M with M̄, and if either accepts at any point in the interleaved execution, then perform the appropriate action.
D = Decider for L that uses M and M̄
input: w (the input to M or M̄)
1: Run M for one step on w; if M accepts, then accept.
2: Run M̄ for one step on w; if M̄ accepts, then reject.
3: goto step 1.
The easiest way to implement this interleaving is using a second tape. Each time you switch machines, you switch tapes. Now, since one of M, M̄ must halt and accept, D also halts. If w ∈ L, then M halts and accepts, so D also accepts; if w ∉ L, then M̄ halts and accepts, so D rejects. That is, D is a decider for L and L is decidable.
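The interleaving idea can be sketched in Python using generators as stand-ins for the two recognizers (the generator names and step counts are hypothetical; a real construction runs actual TMs one transition at a time):

```python
# Sketch of interleaving two recognizers. Each generator yields once per
# "step" and finally returns True on accept; one of them may loop forever,
# but the interleaved execution still halts.
def interleave_decide(recognizer, co_recognizer):
    a, b = recognizer, co_recognizer
    while True:
        try:
            next(a)                # one step of M
        except StopIteration as s:
            if s.value:            # M accepted: w is in L
                return True
        try:
            next(b)                # one step of M-bar
        except StopIteration as s:
            if s.value:            # M-bar accepted: w is in L-bar
                return False

def accepts_after(k):              # hypothetical recognizer: accepts after k steps
    for _ in range(k):
        yield
    return True

def loops_forever():               # hypothetical recognizer that never halts
    while True:
        yield

print(interleave_decide(accepts_after(3), loops_forever()))   # True
print(interleave_decide(loops_forever(), accepts_after(5)))   # False
```

Running either machine alone could loop forever; alternating one step at a time is what makes the combination a decider.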
(b) In (a) we proved:
if L and L̄ are recognizable, then L is decidable.
The contrapositive (which is equivalent) is:
if L is undecidable, then it is not the case that both L and L̄ are recognizable.
This is exactly what was to be proved.
(c) Ltm is recognized by Utm. Ltm is undecidable, therefore by (b) L̄tm is unrecognizable.
Lhalt is recognizable (just simulate the machine and accept if it halts). Lhalt is undecidable, therefore by (b) L̄halt is unrecognizable.
Fact: with every undecidable problem L comes at least one unrecognizable problem (L or L̄).
Chapter 28
Exercise 28.1.
(a) At each scan, an equal number of 0’s and 1’s are marked, so the total number of bits is always
even, so the algorithm will not prematurely exit until all the 0’s, and correspondingly all the
1’s, are marked.
(b) We prove the claim by strong induction on n, the number of unmarked bits. It is easy to
verify the base case n = 1. For the induction step, suppose the claim holds when the
number of unmarked bits is at most n and consider the claim for n + 1 unmarked bits. If n + 1
is odd, then there is nothing to prove.
Suppose n + 1 is even. The number of unmarked 0’s and 1’s are either both even or both odd.
We consider each case separately.
(i) Both even: Let the number of unmarked 0’s be 2k and the number of unmarked 1’s be 2ℓ,
with ℓ 6= k. After one scan, marking every other 0 and 1, there will be k unmarked 0’s and ℓ
unmarked 1’s, with k 6= ℓ and k + ℓ ≤ n. By the induction hypothesis, at some point there
will be an odd number of unmarked bits and the algorithm will reject.
(ii) Both odd: Let the number of unmarked 0’s be 2k + 1 and the number of unmarked 1’s be
2ℓ + 1, with ℓ 6= k. After one scan, marking every other 0 and 1, there will be k unmarked 0’s
and ℓ unmarked 1’s, with k 6= ℓ and k + ℓ ≤ n. By the induction hypothesis, at some point
there will be an odd number of unmarked bits and the algorithm will reject.
In both cases, an odd number of unmarked bits will occur and the claim holds for n + 1, so
the claim holds for all n ≥ 1.
(c) We give the sketch and analyze the runtime.
M = Efficient Turing Machine that solves {0•n #1•n }
input: Binary string w.
1: Check that the input has the correct format and return to ∗.
2: Check that the number of unmarked bits is even.
If not, reject.
If there are no unmarked bits, accept.
3: Mark every other unmarked 0 and every other unmarked 1. goto step 2.
Fixing the size of the input, the worst case runtime is for 0•n #1•n , because otherwise the al-
gorithm exits early with a reject. For 0•n #1•n , steps 1,2 and 3 can all be accomplished in a
single scan of the input, that is Θ(n) steps. Abusing notation a little, the runtime is given by
number of times steps 2 and 3 are executed × Θ(n).
If the number of unmarked zeros is 2k or 2k + 1 before step 3 is executed, then the number of unmarked zeros after step 3 is k (see part (b)). That is, each time step 3 is executed, the number of unmarked 0's drops by a factor of at least 2. If step 3 is executed m times, then the number of unmarked 0's is at most n/2^m, which will be less than one if 2^m > n. If the number of unmarked 0's is less than one, the machine exits. This means the machine exits after at most ⌊1 + log₂ n⌋ executions of steps 2 and 3, and the worst case runtime is
⌊1 + log₂ n⌋ × Θ(n) ∈ Θ(n log n).
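A quick Python sketch of the scan-counting argument, tracking only the counts of unmarked 0's and 1's rather than the tape itself (a model of the TM's behavior, not the TM):

```python
# Simulate the mark-every-other-bit algorithm on an input with the given
# numbers of 0's and 1's, counting scans. Each scan costs Theta(n) TM steps;
# the number of scans is logarithmic.
def decide_0n1n(zeros, ones):
    scans = 0
    while True:
        scans += 1
        if (zeros + ones) % 2 == 1:     # odd number of unmarked bits: reject
            return False, scans
        if zeros == 0 and ones == 0:    # everything marked: accept
            return True, scans
        zeros //= 2                     # mark every other unmarked 0
        ones //= 2                      # mark every other unmarked 1

print(decide_0n1n(8, 8))  # (True, 5)
print(decide_0n1n(8, 4))  # (False, 3)
```

Note how unequal counts eventually produce an odd total, exactly as argued in part (b).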
Pop Quiz 28.2. log n, √n, n, n² log n are all fast because they are upper bounded by n³, and n³ is fast with λ = 8, because (2n)³ ≤ 8n³.
(log₂ n)^(log₂ n), n^(log₂ n), 2^√n, 2^n are all not-fast because for n ≥ 2^20 they are all at least (log₂ n)^(log₂ n), and we show, by contradiction, that (log₂ n)^(log₂ n) is not fast. Suppose (log₂ n)^(log₂ n) is fast. So there is a function f(n) for which
(log₂ n)^(log₂ n) ≤ f(n)  and  f(2n) ≤ λf(n).
Because f is fast, f(2^k) ≤ λf(2^(k−1)) ≤ λ²f(2^(k−2)) ≤ ··· ≤ λ^(k−1)f(2). Since f is an upper bound, f(2^k) ≥ k^k. Choose any k > max(λ, f(2)). Then
f(2^k) ≥ k^k > k^(k−1) · k > λ^(k−1) f(2),
which contradicts f(2^k) ≤ λ^(k−1) f(2). Therefore, (log₂ n)^(log₂ n) is not fast.

fast: log n, √n, n, n² log n, n³
not fast: (log₂ n)^(log₂ n), n^(log₂ n), 2^√n, 2^n
Exercise 28.3. Let n = 2^k, so k = log₂ n.
f(2^k) = 2^k f(2^(k−1))
       = 2^k 2^(k−1) f(2^(k−2))
       = ···
       = 2^k 2^(k−1) ··· 2^1 f(2^0)
       = 2^(k+(k−1)+···+1) f(1)
       = 2^(k(k+1)/2) f(1)
       = (2^k)^((k+1)/2) f(1)
       = √(n^(log₂ n + 1)) f(1).
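The closed form can be checked numerically. A small Python sketch, taking f(1) = 1 for concreteness (any constant would do):

```python
import math

# The recurrence f(n) = n * f(n/2) with f(1) = 1, checked against the
# closed form sqrt(n^(log2(n) + 1)) for n a power of 2.
def f(n):
    return 1 if n == 1 else n * f(n // 2)

for k in range(1, 8):
    n = 2 ** k
    closed = math.isqrt(n ** (k + 1))   # exact: n^(k+1) = 2^(k(k+1)) is a square
    assert f(n) == closed
print("closed form verified")
```

The exponent k(k + 1) is always even, so the integer square root is exact.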
Pop Quiz 28.4. Our algorithm assumes the input does not contain ␣ symbols.
1: Move to the end of the input and mark the last symbol.
2: Shift all marked symbols 1 step right; repeat, moving right, until you come to two blank symbols.
3: Move left to the first unmarked symbol. Mark the symbol and go to step 2. If you come to ∗, unmark all symbols.
We illustrate with input 0#0: the tape evolves from * 0 # 0 to * ␣ 0 ␣ # ␣ 0, each pass shifting the marked symbols one slot right (figure). For an input of size n, step 1 takes Θ(n) steps. In steps 2 and 3, for k marked symbols, the work done is approximately 2k steps to shift the marks and 2k steps to come back to an unmarked symbol, for a total of about 4k. The number of marked symbols goes from 1 to n, and so the runtime of this loop is
4(1 + 2 + ··· + n) = 2n(n + 1) ∈ Θ(n²).
The total run time is in Θ(n + n²), which is in Θ(n²).
Pop Quiz 28.5. If Mtwo has implemented t steps, the heads are at most 2t slots right of their
beacon (∗ or ** ). So, move L until you find the beacon (at most 2t moves) and then move R until
you find the head (another at most 2t moves). The total number of steps is at most 4t ∈ O(t).
We made the assumption that the head is always right of the beacon (start point). Alternatively, in Mone, we can implement left and right boundary markers, LB and RB, which mark the leftmost and rightmost points touched. You only need to update LB if you move L from LB, and similarly update RB if you move R from RB. Now, to find the head, move to one of the boundary markers and then in the other direction until you reach the head. Again, this would be in O(t) steps.
Exercise 28.6.
(a) By Pop Quiz 28.4, step 1 takes O(n²) steps. All operations in Mone are constant time except for switch, which runs in O(n) steps because Mtwo runs in Θ(n) steps (see Pop Quiz 28.5). So, each execution of step 4 requires O(n) steps, and step 4 is executed n times, which is O(n²) steps. Similarly for step 5. So the total number of steps is in O(n²).
(b) The proof is exactly analogous to the analysis in (a). Reconfiguring the tape into odd and even slots requires O(n²) steps. Now, Mone simulates each step of Mtwo, with the added complication of switch. In the worst case Mone must switch at every step of Mtwo, which is at most t(n) switches. Each switch requires O(t(n)) steps (Pop Quiz 28.5), so the number of steps spent switching is t(n) × O(t(n)) ∈ O(t(n)²) (abuse of O-notation). Once the switch is done, Mone simply implements Mtwo's instruction in O(1) steps, so it takes O(t(n)) steps to implement all of Mtwo's instructions.
The total runtime is in O(n² + t(n) + t(n)²). Notice that if Mtwo is some trivial machine that immediately rejects, then the overhead to reconfigure the tape, n², dominates. Assuming that Mtwo is a non-trivial TM that at least examines its input, t(n) ≥ n, in which case t(n)² dominates and n² + t(n) + t(n)² ∈ O(t(n)²), completing the proof.
If you had a k-tape machine, the only difference in the analysis is that
(i) The overhead to reconfigure the tape into k interleaved tapes is kn².
(ii) The switch takes O(kt(n)) steps.
(iii) An instruction of Mk on Mone takes O(k) steps (e.g. move R becomes move k steps R).
The runtime is O(kn² + kt(n) + kt(n)²) ∈ O(kt(n)²) ∈ O(t(n)²) (since k is a constant).
Exercise 28.7. To prove a problem is decidable, we must sketch a Turing Machine that decides it.
(a) The idea is to try all permutations of the vertices.
M = Decider for hamiltonian-path
input: Encoding of a graph, hGi.
1: Check that the input has the correct graph-format and return to ∗.
2: Using the list of vertices, list out every permutation of the vertices to the
right of the graph input (with a punctuation character # to separate each
permutation).
3: Process each permutation of the nodes: vi1 vi2 · · · vin
For consecutive vertices vik vik+1 , check that (vik , vik+1 ) is an edge.
(Identify vin+1 with vi1 .)
If every edge exists accept; otherwise process the next permutation.
4: If you didn’t accept for any permutation, reject
The algorithm has non-trivial steps, e.g. how do you list out every permutation of the vertices? If there is one vertex it is easy. If there are n, you can solve the problem "recursively": for each vertex v, list out the permutations of the other n − 1 vertices and prepend v to the front.
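The recursive listing can be sketched in Python (the function name is hypothetical; a TM would do the same thing on the tape):

```python
# List every permutation of a list of vertices recursively: for each vertex v,
# list the permutations of the remaining vertices and prepend v.
def perms(vertices):
    if len(vertices) <= 1:
        return [list(vertices)]
    out = []
    for i, v in enumerate(vertices):
        rest = vertices[:i] + vertices[i + 1:]   # the other n - 1 vertices
        out.extend([v] + p for p in perms(rest))
    return out

print(perms([1, 2, 3]))
# [[1, 2, 3], [1, 3, 2], [2, 1, 3], [2, 3, 1], [3, 1, 2], [3, 2, 1]]
```

The output has n! entries, which is exactly why this decider is so slow.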
(b) The idea is to try all subsets of the vertices of size ⌈ n/2 ⌉.
M = Decider for clique
input: Encoding of a graph, ⟨G⟩.
1: Check that the input has the correct graph-format and return to ∗.
2: Using the list of vertices, list out every ⌈n/2⌉-subset of the vertices to the right of the graph input (with a punctuation character # to separate each subset).
3: Process each subset of the nodes: vi1 vi2 ··· vi⌈n/2⌉
For each pair of vertices u, v in the subset, check that (u, v) is an edge.
If every edge exists accept; otherwise process the next subset.
4: If you didn't accept for any subset, reject
Again, you may be wondering how to list out every ⌈n/2⌉-subset. This would be equivalent to listing out all binary strings of n bits with exactly ⌈n/2⌉ 1's. Again it can be done recursively, listing the binary strings with n − 1 bits and ⌈n/2⌉ 1's and then prepending 0, plus the binary strings with n − 1 bits and ⌈n/2⌉ − 1 1's and then prepending 1.
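The recursive listing of binary strings with a fixed number of 1's, sketched in Python (function name hypothetical):

```python
# Binary strings of n bits with exactly k 1's: the (n-1)-bit strings with
# k 1's prefixed by 0, plus the (n-1)-bit strings with k-1 1's prefixed by 1.
def strings_with_k_ones(n, k):
    if k < 0 or k > n:
        return []
    if n == 0:
        return [""]
    return (["0" + s for s in strings_with_k_ones(n - 1, k)] +
            ["1" + s for s in strings_with_k_ones(n - 1, k - 1)])

print(strings_with_k_ones(4, 2))
# ['0011', '0101', '0110', '1001', '1010', '1100']
```

The list has (n choose k) entries, matching the count used in the analysis below.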
We did not get into some of the algorithmic details of our deciders, and it is clear that we have taken a very lazy approach to solving the problem. Essentially, try all possibilities, and if one works accept. If none work reject. These deciders take a very long time, but they always halt. For hamiltonian-path, the number of permutations is n!, which is super-exponential. For clique, the number of subsets is (n choose n/2) (assume n is even) and
(n choose n/2) = [n(n − 1)(n − 2) ··· (n + 1 − n/2)] / [(n/2)(n/2 − 1)(n/2 − 2) ··· (n/2 − (n/2 − 1))]
               = 2 × 2(n − 1)/(n − 2) × 2(n − 2)/(n − 4) × ··· × 2(n − (n/2 − 1))/(n − (n − 2))
               ≥ 2^(n/2),
since each factor is at least 2. So, the number of subsets is super-polynomial. Neither of our TMs has polynomial runtime.
Exercise 28.8.
(a) Try every possible factor from 2 to n − 1. If none work, then reject. Otherwise write the
factor that works and accept.
Munary =Transducer to compute a factor of n.
input: 1•n . Assume n ≥ 2.
1: Check the input format and mark a # at the end of the input.
2: Copy the input 1•n to the right of the #.
3: Mark the first 1 after the # (p = 1) with an ✘.
4: Move the ✘ one step R (p ← p + 1).
If ␣ is to the right of ✘, then you came to p = n so reject.
5: Mark an unmarked left 1 for each right 1 from # up to the ✘.
if you run out of left 1’s (p does not divide n), erase all left marks,
goto step 4.
if there are no left 1’s which remain unmarked (p divides n),
erase all 1’s after the ✘ and accept.
if there are left 1’s which remain unmarked (continue “dividing” by p),
repeat step 5.
The ✘ is used to denote p. Each time we implement step 4, p is incremented by 1 because the
previous p is not a valid divisor of n. In step 5 we test if p divides n.
Let us analyze the runtime of Munary.
1: One scan, in Θ(n).
2: n zig-zags of about 2n steps each, for Θ(n²).
3: Since we are already at #, this is O(1) steps.
4: O(n) to find the ✘; this step is repeated at most n times, for O(n²) in total.
5: p zig-zags repeated about n/p times, for p × 2(n + p) × n/p ∈ Θ(n²) steps. This step is implemented at most n times, once for each p, which is O(n³) steps in total.
The n³ term dominates, so the worst case runtime is in O(n³), which is polynomial in n.
There are many ways to make this algorithm more efficient; for example, you only need to test the possible factors up to √n. Making algorithms more efficient is the dominant content of your next algorithms course. Our concern was to get a polynomial runtime, any polynomial.
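What M_unary computes is ordinary trial division by repeated subtraction. A Python sketch mirroring steps 4 and 5 (names are hypothetical; the TM works with unary marks instead of arithmetic):

```python
# Trial division as in M_unary: try each candidate factor p from 2 to n-1,
# testing divisibility by repeated subtraction (the zig-zag marking of step 5).
def factor(n):
    assert n >= 2
    for p in range(2, n):             # step 4: increment p
        remainder = n
        while remainder >= p:         # step 5: "divide" by marking p 1's at a time
            remainder -= p
        if remainder == 0:
            return p                  # p divides n: accept with output p
    return None                       # no factor: n is prime, reject

print(factor(15))  # 3
print(factor(13))  # None
```

Like the TM, this takes a number of subtractions polynomial in n, which is polynomial in the length of the unary input.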
(b) We are going to take a particularly dumb approach to this problem. First compute the unary
representation of wn and then run the algorithm from part (a) on this unary representation.
Mbinary =Transducer to compute a factor of n.
input: wn , n in binary. Assume n ≥ 2.
1: Compute wn in unary and write 1•n to the left of ∗.
2: Run the TM Munary on the left of ∗ (replacing L ↔ R).
3: If Munary rejects, reject.
4: If Munary accepts,
Compute the binary wp for the unary p (which is on the left).
Copy wp to the right of wn and erase the tape left of ∗.
A non-trivial task in step 1 is to convert binary to unary. It is not a very hard task. Let wn = bk−1 bk−2 ··· b1 b0 be the k bits of wn, where bk−1 = 1. Going from i = 0 to k − 1, if bi = 1, you add 2^i 1's to the unary representation. We encourage the reader to construct a TM for binary-to-unary conversion. The details are not essential to our argument.
Since wn = bk−1 bk−2 ··· b1 b0 and bk−1 = 1, n ≥ 2^(k−1) = 2^(|wn|−1). We know that Munary requires at least n steps to run, so the runtime of Mbinary is at least the runtime of Munary on 1•n, which is at least 2^(|wn|−1) (the additional steps, like the binary-to-unary conversion, can only add more to the runtime). The runtime is at least exponential, which is non-polynomial.
We emphasize the difference between the natural parameter in the problem, n, and the length of the input to the TM. Runtimes refer to the length of the input.
In (a), the natural parameter and the length of the input are comparable, both equal to n, and so a polynomial TM will have runtime which is polynomial in n (the length of the input).
In (b), the input is wn, which is the binary representation of n. This input has length about log₂ n, and so a polynomial TM for the problem with the input formatted in binary must have a runtime that is polynomial in log₂ n (for example (log₂ n)^13 would work). Our runtime in (b) is at least 2^(log₂ n − 1), which is not polynomial in log₂ n.
You may also be astounded by the stupidity of our algorithm in (b). Yes, it is stupid, but not far off from reality. Nobody has been able to find a factoring algorithm that is polynomial in log₂ n. So our algorithm, which looks stupid, is not that far off from the best we know how to do. Be the first to find an algorithm which is polynomial in log₂ n and fame will come.
Chapter 29
Pop Quiz 29.1.
(a) yes: For A = {5, 11}, sum(A) = 16 ≥ 15 = ½ sum(S).
(b) no: It is not obvious how to prove the answer is no. The simplest "proof" is to list out all the possible subset-sums. We used the simple program below to create this list of possible subset-sums. The idea is to start with 0, and add each value in S to the current possible subset-sums (removing duplicates).
X = SubsetSum(S)    // S = {s1, ..., sn}
1: X = {0};
2: for i = 1 to |S| do
3:   XS ← {};
4:   for j = 1 to |X| do
5:     add si + xj to XS;
6:   X ← X ∪ XS;
7: return X;
The possible subset-sums are:
0, 2, 3, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 27, 28, 30.
Observe that 15 is not one of the possible subset-sums.
(c) yes: sum(S) = 30 and 3 + 6 + 6 = 15 (also 2 + 11 + 2 = 15).
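The same subset-sum listing in Python; the multiset used here is a small hypothetical example, not the quiz's S:

```python
# Enumerate all achievable subset-sums: start from {0} and fold in each
# element, keeping the set of sums (duplicates removed automatically).
def subset_sums(S):
    sums = {0}
    for s in S:
        sums |= {s + x for x in sums}
    return sorted(sums)

print(subset_sums([2, 3, 7]))  # [0, 2, 3, 5, 7, 9, 10, 12]
```

Using a set keeps the state small even though there are exponentially many subsets; only the distinct sums are stored.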
Exercise 29.2.
(a) 001101.
(b) The evidence would be an encoding of all subsets. The certifier would check each subset’s
sum to verify that it does not work. Since there are exponentially many subsets, the evidence
is exponential and the runtime is exponential.
(c) Let n be the number of elements in the input S. Assume that the runtime of the certifier
(given the input) is a polynomial p(n). To solve a problem with input S, run the certifier
with every possible setting for the certificate (n-bit string). If any certificate verifies to yes, then accept. If no certificate verifies to yes, then reject.
There are 2^n possible certificates, so the runtime is 2^n p(n), which is not polynomial. (Technically, n is not the length of the input, but the input length is related to n.)
Exercise 29.3.
(a) Easier to prove yes. The evidence is the subset; the proof checks that the subset-sum is k.
(b) Easier to prove yes. The evidence is the clique; the proof checks that there are at least k clique-vertices and there is an edge between every pair of vertices in the clique.
(c) Easier to prove yes. The evidence is an assignment of colors to every vertex; the proof checks that there are at most k different colors and that every edge has two vertices of different colors.
(d) Easier to prove no. The evidence is integers p, q; the proof checks p, q > 1 and pq = n.
The yes-answer can also be proved quickly because the AKS-test (proved in 2002) shows that IsPrime ∈ P. If you can solve a problem in polynomial time, you can prove either answer in polynomial time. Still, it is much easier to prove the no-answer.
(e) Easier to prove yes. The evidence is an isomorphism f from G1 to G2; the proof checks that the number of vertices and edges match and for each edge (u, v) ∈ G1, (f(u), f(v)) ∈ G2.
Pop Quiz 29.4. The valid inputs have lengths ½n(n − 1) = 0, 1, 3, 6, 10, 15, 21, 28, 36, ....
(a) 12 is not a valid input length.
(b) The input has length 21, so it encodes a graph on 7 vertices, labeled 1, ..., 7 (the decoded graph is drawn in the figure).
Pop Quiz 29.5. We may use multi-tape TMs because a polynomial multi-tape TM can be
simulated by a polynomial single-tape TM. The first phase checks that e has at least k 1’s.
1: Initialize a second tape with one 1.
2: Mark the rightmost unmarked bit of k as the current bit.
If there are no bits of k to mark this first phase is a success.
3: If the marked bit is 0, goto step 2.
If the marked bit is 1,
Mark as many unmarked 1’s in e as there are 1’s on the second tape.
If you run out of 1’s in e, reject.
4: Double the number of 1’s on the second tape and goto step 2.
In the algorithm above, the number of 1's on the second tape is 2^(i−1) when the ith bit of k is being processed. The head on the main tape just moves right along the evidence e, marking 1's, making at most n steps in total. The expensive step is doubling the number of 1's on the second tape. If there are 2^(i−1) 1's, then adding another 2^(i−1) 1's takes Θ((2^(i−1))²) steps. Let ℓ be the number of bits in k; then the work in doubling is in Θ(Σ_{i=1}^ℓ 2^(2i−2)) = Θ(4^ℓ/3). Since k ≤ n, 2^ℓ ≤ 2n, and so the total time spent in step 4 is in O(n²).
Assuming phase 1 is a success, in phase 2 the algorithm must ensure that an edge is present between every pair of vertices in the clique. Equivalently, for every edge not present, we ensure that the pair of vertices it connects are not both present in e. Our algorithm needs to mark each edge while simultaneously keeping track of which pair of vertices in e that edge connects.
We illustrate the algorithm on the input 1010110101 # 11 # 11010: in the figure, ✓'s mark the edges processed so far, and a ✓ and an ✘ mark the pair of vertices in e that the current edge connects.
1: Mark the first edge e1 and the first two vertices in e.
2: If the last edge marked is 0, reject if both marked vertices are 1.
3: Mark the next edge. If there are no more, accept.
4: Update the vertex marks and goto step 2.
To update the vertex marks, move the ✘ one right. If the ✘ cannot move right, move the ✓ one right and the ✘ to the right of the ✓.
Walking through the edges one step at a time is ½n(n − 1) steps. For each edge, you move right to the ✘ and update it. Finding the ✘ is O(|⟨G⟩|) since |⟨G⟩| dominates the size of the input, and updating is O(n). So, there are ½n(n − 1) steps each taking O(n²) time, for a total of O(n⁴) time, which dominates the runtime of the entire algorithm. The total runtime is in O(n⁴). The input size is in Θ(n²), so we have a quadratic algorithm.
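The two phases can be sketched in Python, with the graph given as a set of edge pairs rather than a bit string (the encoding details are abstracted away; the function name and example graph are hypothetical):

```python
# Certify that the bit vector e marks a clique of size at least k in the
# n-vertex graph whose edges are given as pairs (i, j) with i < j.
def verify_clique(n, edges, e, k):
    if sum(e) < k:                      # phase 1: at least k vertices marked
        return False
    for i in range(n):                  # phase 2: every marked pair is an edge
        for j in range(i + 1, n):
            if e[i] and e[j] and (i, j) not in edges:
                return False
    return True

# hypothetical 5-vertex graph; vertices 1, 2, 3 form a triangle
edges = {(1, 2), (1, 3), (2, 3), (0, 4)}
print(verify_clique(5, edges, [0, 1, 1, 1, 0], 3))  # True
print(verify_clique(5, edges, [1, 1, 1, 0, 0], 3))  # False
```

The double loop touches each of the ½n(n − 1) vertex pairs once, mirroring the polynomial edge-walk of the TM.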
Exercise 29.6.
(a) Coloring: The evidence e is a set of colors, 1, 2, ..., ℓ, and a color assignment to each vertex. Since ℓ ≤ n, the encoding of each color is at most log₂ n bits and the encoding of all the colors has length at most ℓ log₂ n ≤ n log₂ n. The color assignments to each vertex also require at most n log₂ n bits. So |e| is polynomial. The certifier checks that the number of colors ℓ is at most k. Then the certifier processes each edge (u, v) and verifies that the colors assigned to u and v are different. The certifier is polynomial because it has polynomially many things to do, each taking polynomial time.
(b) Any problem in P has a polynomial decider. The certifier is just this polynomial decider with
no evidence e = ε. Any polynomial time decider is a polynomial time verifier of yes .
Pop Quiz 29.7. The second automaton is the first one with yes and no states switched. Yet,
strings accepted by the first automaton are not necessarily rejected by the second and vice versa.
input:             ε    0    1    00   01   10   11
first automaton:   no   yes  yes  yes  yes  no   yes
second automaton:  yes  yes  no   yes  yes  yes  yes
To solve the complement language, you can’t just flip accept to reject states and vice versa. For
example, a string is accepted if (say) 2 of 4 computation paths accept. Now, if you switch the
yes and no states, you still have 2 of 4 computation paths accepting, so you will accept.
There is asymmetry between accept and reject. To accept, one of the possible computation paths must accept. To reject, every possible computation path must reject. (Compare with the yes-no asymmetry in defining NP.) To solve the complement language, you first construct the equivalent DFA using subset states (Exercise 24.7), and then flip the yes and no states of the DFA.
Pop Quiz 29.8. The deterministic instruction is: if in state q1 and the bit read in is '0', then transition to state q3 while writing '1' to the tape and then move R.
The non-deterministic instruction is: if in state q1 and the bit read in is '0', then transition to one of the states q0, q1, q3 (which?), write either '0' or '1' (which?), and move either R or L (which?). The non-deterministic machine gets to try all of these choices, so there are 2 × 2 × 3 = 12 possible choices. The computation splits into 12 possible branches (the figure illustrates this for a machine in state q0 reading input 0):
write 1; state q3; move L        write 0; state q3; move L
write 1; state q3; move R        write 0; state q3; move R
write 1; state q1; move L        write 0; state q1; move L
write 1; state q1; move R        write 0; state q1; move R
write 1; state q0; move L        write 0; state q0; move L
write 1; state q0; move R        write 0; state q0; move R
In keeping with the tradition for non-deterministic automata, the non-deterministic TM will accept if any one of these branches is an accepting computation.
Pop Quiz 29.9. (a) y = (1 ∧ 0) ∧ (1 ∧ 1) = 0 ∧ 1 = 0. (b) Yes. x1 x2 = 01.
Exercise 29.10.
(a) Given input w and evidence e of length p(n), the certifier M has runtime t(n), a polynomial
in n. By Theorem 29.3, in time poly(t(n)) we construct a circuit of size poly(t(n)), which is fed into the black-box. The runtime of the black-box is polynomial in the size of its input; therefore, the runtime of the black-box is poly(poly(t(n))). So,
total runtime = time to create circuit + time to run the black-box
              = poly(t(n)) + poly(poly(t(n))),
which is polynomial because a polynomial evaluated on a polynomial is a polynomial.
(b) We must construct different certifier circuits for each evidence length from 0, 1, ..., p(n). So, we need to run the black-box p(n) times, once for each of the p(n) circuits, which essentially multiplies the polynomial running time by p(n). Since the product of polynomials is a polynomial, the runtime remains polynomial.
Note that if you wish to only use the black-box just once, you can take each of the p(n)
circuits and send them through a massive p(n)-way or. Feed this gigantic circuit (which is
only about p(n) times bigger than the one in (a)) into the black-box. If the black-box says
the circuit is satisfiable, then one of the smaller circuits is satisfiable. If the black-box says
the circuit is not satisfiable, then none of the smaller circuits is satisfiable.
Exercise 29.11.
(a) We give two different types of and circuits, illustrating the constructions for ℓ = 8. [The figure shows two circuits of ∧-gates on inputs x1, ..., x8: on the left, a chain in which each ∧ combines the next input with the running result; on the right, a balanced binary tree of ∧-gates.]
In both circuits, the number of and-gates used is ℓ − 1. The number of gates used is called
the size of the circuit, so the sizes of our circuits are ℓ − 1. A circuit is a directed graph.
The depth of a circuit is the length of the longest path. The main difference between our two
circuits is the depth. On the left, the depth is ℓ − 1, linear in ℓ. You should convince yourself
that the circuit on the right has depth ⌈ log 2 ℓ ⌉, logarithmic in ℓ.
Processor-chips are implemented using gates, starting with building blocks like multi-input-
and, adders, multipliers, sorters, etc. Designing circuits to accomplish tasks while minimizing size and depth is an important consideration for efficient chip-design. Circuit complexity
theorists study the minimum size and depth requirements for certain basic operations which
are primitives of more complex operations. We know quite impressive designs for many basic
operations, but surprisingly little about the best one can do, even for basic operations.
(b) We begin with the primitive task of moving a 1 all the way to the left. And, to simplify even further, consider just n = 2 (two input bits). The simple circuit for 2 input bits computes y1 = x1 ∨ x2 and y2 = x1 ∧ x2. You can verify from the input-output table below that this circuit shifts the 1 to the left. This circuit also sorts two input-bits. Let us denote this circuit as the bubble-left (bl) "gate". The 1 in the input bubbles "up" to the left.
x1 x2 → y1 y2
0  0  → 0  0
0  1  → 1  0
1  0  → 1  0
1  1  → 1  1
We now combine bl-gates to build a circuit that shifts a 1 (if there is a 1) all the way to the left of an n-bit string. We call this the shift-l circuit. Suppose we have a shift-l circuit for n − 1 bits. For the n bits x1, ..., xn, apply the shift-l circuit to x2, ..., xn. Now, if there is a 1 in x2, ..., xn, it will be on the left, at position x2. Applying the bl-gate to x1 and the output at the x2 position will move the 1 at the x2-position to the left. Here is an illustration for a 4-bit input.
[The figure unrolls the construction for n = 4: shift-l(4) is shift-l(3) applied to x2, x3, x4 followed by a bl-gate on x1 and the leftmost output, and expanding shift-l(3) down to shift-l(2) shows that shift-l(4) is a cascade, from right to left, of 3 bl-gates.]
In the second step, we recursively applied the construction to shift-l(3). Observe that
shift-l(2) is just a bl-gate, and so shift-l(4) is just a cascade from right to left of 3 bl-
gates. In general, shift-l(n) is a cascade from right to left of n − 1 bl-gates. Since a bl-gate
consists of two ∧/∨ gates, shift-l(n) uses 2n − 2 gates. The proof is by induction. The
base case is shift-l(2) which is just a bl-gate that uses 2 ∧/∨ gates. For the induction,
assume shift-l(n) uses 2n − 2 gates; by our construction, shift-l(n + 1) adds one bl-gate to
shift-l(n), adding 2 ∧/∨ gates, resulting in a total of 2n gates. The two important properties
of shift-l(n) that we need are:
Lemma 30.9. Let (y1, ..., yn) = shift-l(n)(x1, ..., xn). Then,
(i) (y1, ..., yn) and (x1, ..., xn) have the same number of 1's;
(ii) if (x1, ..., xn) contains a 1, then y1 = 1.

In words, the output of shift-l(n) is a permutation of its input with at least one 1 having
“bubbled-up” all the way to the left. We encourage you to tinker and confirm by hand that
shift-l(4) applied to 0111 gives 1011. The proof of the lemma is by induction.
Proof. The base case is shift-l(2), for which the claim is true by inspecting the input-output table for the bl-gate. Suppose the claim holds for shift-l(n) and consider shift-l(n + 1) applied to (x1, x2, ..., xn+1) in two steps. First apply shift-l(n) to (x2, ..., xn+1) to get (y2′, y3, ..., yn+1). Now apply the bl-gate to (x1, y2′) to get (y1, y2). By the induction hypothesis, both operations preserve the bits, so the result is a permutation. We now prove the contrapositive of property (ii). If y1 = 0, then, from the input-output table of the bl-gate, the only possibility is x1 = y2′ = 0. If y2′ = 0, then by the induction hypothesis there were no 1's in (x2, ..., xn+1), for if there were 1's, the application of shift-l(n) would make y2′ = 1. Thus, if y1 = 0, then x1 = x2 = ··· = xn+1 = 0, that is, there are no 1's in the input (the contrapositive of property (ii)). Therefore, properties (i) and (ii) hold for shift-l(n + 1), and the lemma follows by induction.
We now use shift-l(n) to construct the sorter. The idea is simple: repeatedly shift a 1 to the left. Eventually, all the 1's will be on the left. [The figure shows the 4-input sorter: shift-l(4) on x1, ..., x4, then shift-l(3) on the last three outputs, then shift-l(2) on the last two.] For an n-bit input, the number of gates in the sorter is
2 + 4 + ··· + 2(n − 1) = 2 Σ_{i=1}^{n−1} i = n(n − 1) ∈ Θ(n²).
It is helpful to see how the sorter works on input 0111:
0111 → (shift-l(4)) → 1011 → (shift-l(3)) → 1101 → (shift-l(2)) → 1110.
This example also illustrates why we need all the layers of shift-l, because none of the intermediate outputs are sorted. Observe that in each application of shift-l, a 1 "bubbles-up" to the left. You may have seen a sorting algorithm called bubble-sort in your earlier programming courses. We have created a circuit version of bubble-sort. We now prove that our sorter works.
Theorem 30.10. Our construction produces a sorter using Θ(n²) ∧/∨ gates.
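The bl-gate, shift-l, and the sorter translate directly into a Python sketch, which confirms the 0111 example above (function names are hypothetical; the circuit itself has no loops, of course):

```python
# bl computes (OR, AND); shift-l cascades bl-gates from right to left;
# the sorter applies shift-l to successively shorter suffixes.
def bl(x1, x2):
    return x1 | x2, x1 & x2

def shift_l(bits):
    bits = list(bits)
    for i in range(len(bits) - 2, -1, -1):      # cascade right to left
        bits[i], bits[i + 1] = bl(bits[i], bits[i + 1])
    return bits

def sort_bits(bits):
    bits = list(bits)
    for start in range(len(bits) - 1):          # shift-l(n), shift-l(n-1), ...
        bits[start:] = shift_l(bits[start:])
    return bits

print(shift_l([0, 1, 1, 1]))    # [1, 0, 1, 1]
print(sort_bits([0, 1, 1, 1]))  # [1, 1, 1, 0]
```

Each pass bubbles one 1 to the left, exactly like bubble-sort.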
Proof. The proof is by induction, based on the observation that our sort(n) circuit is composed of a shift-l(n) circuit followed by a sort(n − 1) circuit applied to the last n − 1 outputs of the shift-l(n) circuit (the figure illustrates this for n = 4). The shift-l(n) circuit moves a 1 to the left, which becomes the output y1. By the induction hypothesis, we assume that sort(n − 1) works, which means that its output is 1's followed by 0's, which become the outputs y2, ..., yn. Therefore the full output y1, ..., yn is 1's followed by 0's, which is sorted as required.
Pop Quiz 29.12.
(a) The first three bits after sorting are 1, so the and after the sorter outputs 1. It now suffices
to show that every input to the and in the clique-verifier is 1: C1,2 = C1,3 = C1,4 = C1,5 = 1
because x1 = 0; C2,3 = 1 because e2,3 = 1; C2,4 = 1 because e2,4 = 1; C2,5 = 1 because
x5 = 0; C3,4 = 1 because e3,4 = 1; C3,5 = C4,5 = 1 because x5 = 0.
The output being 1 means that 01110 is a clique of size at least 3 in the graph 1010110101.
(b) Each circuit Ci,j uses 2 gates, and there are (1/2)n(n − 1) of those for a total of n(n − 1) gates,
where n is the number of vertices. (We usually exclude not-gates from the size of a circuit.
If you counted not-gates, it is not a big deal.) The and in the clique-verifier therefore needs
(1/2)n(n − 1) − 1 gates, so the size of our clique-verifier is (3/2)n(n − 1) − 1. The sorter uses n(n − 1)
gates and the and after the sorter uses k − 1 gates. So the total number of gates used is

    (3/2)n(n − 1) − 1 + n(n − 1) + k − 1 = (5/2)n(n − 1) + k − 2 ∈ Θ(n²).

The input size is in Ω(n²), so the verifier size is linear in the input size. For n = 5 and k = 3,
our formula evaluates to 51 gates. (If you counted not-gates, you should get 71 gates.)
Exercise 29.13. The construction is identical to our circuit that verified Clique. The only
difference is in how you construct the primitive circuit Ci,j which ensured the “clique-condition”.
Here, we need to ensure the “independent-set condition”. For the clique-condition, the circuit
Ci,j verified that if vertices i, j are in the clique, then edge ei,j is in the graph. For the
independent-set condition, it is the opposite: if vertices i, j are in the independent set, then
edge ei,j is not in the graph. That is, for the IndSet problem, Ci,j computes ¬ei,j ∨ ¬xi ∨ ¬xj (as
opposed to ei,j ∨ ¬xi ∨ ¬xj for the Clique problem). Equivalently, we want
Ci,j to compute ¬(ei,j ∧ xi ∧ xj), and we give the circuit which implements this
independent-set condition on the right. [Figure: ei,j, xi, xj feed two ∧-gates whose
output passes through a ¬-gate to give Ci,j.] The rest of the circuit, including the
size-verifier, is identical to that for the Clique problem.
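A quick truth-table check confirms that the two forms of the independent-set condition agree. A minimal sketch in Python, using 0/1 for f/t (the function names are ours):

```python
def c_indset(e, xi, xj):
    # CNF form: (not e) or (not xi) or (not xj)
    return (1 - e) | (1 - xi) | (1 - xj)

def c_indset_circuit(e, xi, xj):
    # circuit form: not(e and xi and xj), built from two ANDs and a NOT
    return 1 - ((e & xi) & xj)

# the two forms agree on all 8 inputs, and output 0 exactly when
# both vertices are chosen and the edge is present
for e in (0, 1):
    for xi in (0, 1):
        for xj in (0, 1):
            assert c_indset(e, xi, xj) == c_indset_circuit(e, xi, xj)
            assert c_indset(e, xi, xj) == (0 if (e and xi and xj) else 1)
```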
Pop Quiz 29.14. We need to prove, for every L ∈ NP,
if L∗ is polynomially solvable, then L is polynomially solvable.
We use a direct proof. Suppose L∗ is polynomially solvable and consider any L ∈ NP. Since (29.2)
is t, it means CircuitSat is polynomially solvable. Since CircuitSat is NP-complete (the claim
in (29.1) is t), it means L is polynomially solvable, as was to be shown.
Pop Quiz 29.15.
(a) There is nothing to prove because 1 means t and 0 means f.
(b) Suppose u = v, so there are two cases: u = v = 0 and u = v = 1. In both cases, (u ∨ ¬v) and
(¬u ∨ v) are t, therefore u = v → (u ∨ ¬v) ∧ (¬u ∨ v).
Now suppose (u ∨ ¬v) ∧ (¬u ∨ v) is t. There are two cases: u = 1 and u = 0. If u = 1, then for
(¬u ∨ v) to be t, v = 1, so u = v. If u = 0, then for (u ∨ ¬v) to be t, v = 0, so u = v. In both
cases v = u, therefore (u ∨ ¬v) ∧ (¬u ∨ v) → u = v.
You may wonder how we got this equivalent logical expression to u = v. So, we
take this opportunity to introduce two fundamental representations of Boolean
functions in computer science: the disjunctive normal form (DNF), which is an
or of ands, and the conjunctive normal form (CNF), which is an and of ors.
Both representations are based on the truth-table for u = v shown below:

    u  v  u = v?
    0  0    t
    0  1    f
    1  0    f
    1  1    t

For u = v to be t, either u = 0 and v = 0, or u = 1 and v = 1. That is,
    u = v is equivalent to (¬u ∧ ¬v) ∨ (u ∧ v)    (DNF)
For u = v to be f, either u = 0 and v = 1, or u = 1 and v = 0. That is,
    ¬(u = v) is equivalent to (¬u ∧ v) ∨ (u ∧ ¬v).
This means u = v is equivalent to ¬((¬u ∧ v) ∨ (u ∧ ¬v)). We now use De Morgan's laws for
negation, namely ¬(A ∨ B) ≡ ¬A ∧ ¬B and ¬(A ∧ B) ≡ ¬A ∨ ¬B, to get
    u = v is equivalent to (u ∨ ¬v) ∧ (¬u ∨ v)    (CNF)
(c) Suppose u = v∨w. There are four cases corresponding to (v, w) being (0, 0), (0, 1), (1, 0), (1, 1).
In each case (u ∨ ¬v) ∧ (u ∨ ¬w) ∧ (¬u ∨ v ∨ w) is t.
Now suppose (u ∨ ¬v) ∧ (u ∨ ¬w) ∧ (¬u ∨ v ∨ w) is t. There are two cases, u = 1 or u = 0. If
u = 1, then for (¬u ∨ v ∨ w) to be t, (v ∨ w) must be t, that is u = v ∨ w. If u = 0, then for
(u ∨ ¬v) to be t, v = 0, and for (u ∨ ¬w), w = 0. So v ∨ w = 0, i.e. u = v ∨ w.
It is a useful exercise to recover this logical expression for u = v ∨ w from its truth-table using
the CNF construction.
(d) This follows from (c) because u = v ∧ w if and only if ¬u = ¬(v ∧ w) ≡ ¬v ∨ ¬w. So, we get the
expression in (c) with u, v, w replaced by ¬u, ¬v, ¬w.
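These equivalences can be checked mechanically by enumerating all truth assignments. A small sketch in Python, using 0/1 for f/t:

```python
# part (b): DNF and CNF both represent u = v
for u in (0, 1):
    for v in (0, 1):
        dnf = ((1 - u) & (1 - v)) | (u & v)   # (not u and not v) or (u and v)
        cnf = (u | (1 - v)) & ((1 - u) | v)   # (u or not v) and (not u or v)
        assert bool(dnf) == bool(cnf) == (u == v)

# part (c): the three-clause CNF represents u = (v or w)
for u in (0, 1):
    for v in (0, 1):
        for w in (0, 1):
            expr = (u | (1 - v)) & (u | (1 - w)) & ((1 - u) | v | w)
            assert bool(expr) == (u == (v | w))
```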
Pop Quiz 29.16.
    clause 1  clause 2  clause 3  clause 4
      y         ¬x        ¬x        ¬z      →  (x, y, z) = (f, t, f);
      y         ¬x        ¬z        ¬z      →  (x, y, z) = (f, t, f);
      z         ¬x        ¬x        ¬y      →  (x, y, z) = (f, f, t);
      z          z        ¬x        ¬y      →  (x, y, z) = (f, f, t);
Exercise 29.17. Only problems in NP can be NP-complete. So when trying to show that a
problem is NP-complete, you should always verify first that the problem is in NP.
(a) We know that Clique is in NP. For the NP-complete problem L∗ , we choose IndSet. We
show that IndSet is polynomially reducible to Clique.
Recall the complement graph G to a graph G. The complement graph is obtained from the
original graph by removing all existing edges and adding all other edges:
(u, v) is an edge in G ↔ (u, v) is not an edge in G.
An independent set in the graph is a clique in the complement graph and vice versa. Therefore,
G has an independent set of size k if and only if the complement graph has a clique of size k.
If we have a black-box which polynomially solves Clique, we can solve IndSet by using the
black-box on the complement graph. Therefore,
Theorem 30.11 (Clique is NP-complete). If Clique is polynomially solvable, then IndSet
is polynomially solvable.
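The reduction can be sketched in a few lines of Python. Here has_clique is a brute-force stand-in for the hypothetical Clique black-box, and all names are ours:

```python
from itertools import combinations

def complement(n, edges):
    """Edges of the complement graph on vertices 1..n."""
    edges = {frozenset(e) for e in edges}
    return [(u, v) for u, v in combinations(range(1, n + 1), 2)
            if frozenset((u, v)) not in edges]

def has_clique(n, edges, k):
    """Brute-force stand-in for the Clique black-box."""
    edges = {frozenset(e) for e in edges}
    return any(all(frozenset(p) in edges for p in combinations(S, 2))
               for S in combinations(range(1, n + 1), k))

def has_indset(n, edges, k):
    """Solve IndSet by running the Clique 'black-box' on the complement."""
    return has_clique(n, complement(n, edges), k)

# a triangle has no independent set of size 2; the path 1-2-3 has {1, 3}
assert not has_indset(3, [(1, 2), (2, 3), (1, 3)], 2)
assert has_indset(3, [(1, 2), (2, 3)], 2)
```

Only complement and the single call to has_clique are the reduction; the brute-force search is just there to make the sketch runnable.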
(b) First, let’s check that VertexCover is in NP. By checking each edge, one can verify in
polynomial time whether a set is a vertex cover (every edge should have at least one end in
the vertex cover), and all that remains is to check the size of the vertex cover. Therefore a
yes-instance of VertexCover can be polynomially verified given the evidence, which is the
vertex cover itself.
For the NP-complete problem L∗ , we again choose IndSet. We show that IndSet is polynomially
reducible to VertexCover. Assume a graph has n vertices V = {v1 , . . . , vn }, and let
S ⊆ V be a subset of the vertices. The complement of S is S̄ = V − S.
Lemma 30.12. S is an independent set of size k if and only if S̄ is a vertex cover of size
n − k.
Proof. Suppose S is an independent set of size k. Consider any edge e. If both endpoints of
e are in S then S is not an independent set (there cannot be an edge between any two vertices
in S). Therefore at least one endpoint of e is in S̄. Since e is arbitrary, every edge has at least
one endpoint in S̄, and so S̄ is a vertex cover of size n − k.
Suppose S̄ is a vertex cover of size n − k. Now consider any pair of vertices in S. If there is an
edge between these two vertices, then that edge does not have an endpoint in S̄, contradicting
S̄ being a vertex cover. Therefore no pair of vertices in S is adjacent and so S is an independent
set of size k.
Suppose we have a black-box to solve VertexCover. We solve IndSet with size k by running
the black-box on the same graph with size n − k (seeking a vertex cover of size at most n − k).
Lemma 30.12 assures us that this algorithm works.
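Lemma 30.12 is easy to confirm exhaustively on a small example. A sketch in Python (the graph and the function names are ours):

```python
from itertools import combinations

def is_independent(S, edges):
    """No edge has both endpoints inside S."""
    return all(not (u in S and v in S) for u, v in edges)

def is_vertex_cover(C, edges):
    """Every edge has at least one endpoint inside C."""
    return all(u in C or v in C for u, v in edges)

V = {1, 2, 3, 4, 5}
E = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
# S is an independent set iff V - S is a vertex cover, for every S
for r in range(len(V) + 1):
    for S in map(set, combinations(V, r)):
        assert is_independent(S, E) == is_vertex_cover(V - S, E)
```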
Theorem 30.13 (VertexCover is NP-complete). If VertexCover is polynomially solvable,
then IndSet is polynomially solvable.
Exercise 29.18.
(a) This is an NP-completeness reduction from a general problem (Clique) to a restriction of the
problem to special cases (in BigClique, k must be large). We need to show that if we have
a polynomial black-box that solves BigClique, we can polynomially solve Clique.
Consider any instance G, k of Clique and suppose G has n vertices v1 , . . . , vn . We cannot use
our black-box to solve this problem because k may be too small. Our solution is to convert
this clique problem to another equivalent one with a large k. Construct a new graph G ′ by
adding n new vertices w1 , . . . , wn to G. Also add edges from every w-vertex to every other
vertex in G′ . Clearly, it takes polynomial time to construct G′ from G.
If you have any k-clique in G, then those vertices together with all the w-vertices form a
(k + n)-clique in G′ . Further, every (k + n)-clique in G′ contains at least k v-vertices which
are all connected to each other and so contain a k-clique in G. Therefore,
Lemma 30.14. G contains a k-clique if and only if G′ contains a (k + n)-clique.
Here is the algorithm to solve Clique. First construct G′ . Now run the black-box on G′ with
k′ = k + n (which is larger than half the vertices in G′ ). Output the answer yes / no from
the black-box. The Lemma ensures that this answer is correct.
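The G → G′ construction and Lemma 30.14 can be sanity-checked by brute force. In this sketch, has_clique again stands in for the black-box, and all names are ours:

```python
from itertools import combinations

def has_clique(vertices, edges, k):
    """Brute-force stand-in for a Clique solver."""
    edges = {frozenset(e) for e in edges}
    return any(all(frozenset(p) in edges for p in combinations(S, 2))
               for S in combinations(list(vertices), k))

def big_instance(n, edges, k):
    """Build G': add w-vertices n+1..2n, each joined to every other vertex."""
    V2 = list(range(1, 2 * n + 1))
    E2 = list(edges) + [(u, w) for w in range(n + 1, 2 * n + 1)
                        for u in V2 if u != w]
    return V2, E2, k + n

# G: a triangle {1, 2, 3} plus the isolated vertex 4
n, E = 4, [(1, 2), (2, 3), (1, 3)]
for k in (3, 4):  # G has a 3-clique but no 4-clique
    V2, E2, k2 = big_instance(n, E, k)
    assert has_clique(range(1, n + 1), E, k) == has_clique(V2, E2, k2)
```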
(b) Suppose you have a black-box which polynomially solves FreqItems.¹ We show how to use
the black-box to polynomially solve Clique. It will follow that FreqItems is NP-complete.
Given an input G = (V, E), k to Clique, we construct an input to FreqItems, a binary
matrix A, popularity n and basket-size ℓ. We run our black-box on this input of FreqItems,
and the answer will tell us whether G has a k-clique.
The customers will be the vertices in G. The items will be the edges in G. Each of the vertices
(customers) buys an edge (item) whenever the customer is not an end point of the edge. Here
is an example. The input to Clique is the graph G with k = 4; the matrix A, with n = 2 and
ℓ = 6, is shown below.

G; k = 4: the graph on vertices {1, 2, 3, 4, 5, 6} with edges
(1,2), (1,3), (1,4), (1,5), (2,3), (2,4), (3,4), (4,6).

A; n = 2, ℓ = 6 (rows are customers/nodes, columns are items/edges):

         (1,2) (1,3) (1,4) (1,5) (2,3) (2,4) (3,4) (4,6)
    1      0     0     0     0     1     1     1     1
    2      0     1     1     1     0     0     1     1
    3      1     0     1     1     0     1     0     1
    4      1     1     0     1     1     0     0     0
    5      1     1     1     0     1     1     1     1
    6      1     1     1     1     1     1     1     0

One solution to FreqItems with a basket of size ℓ = 6 bought by n = 2 customers is the basket
{(1,2), (1,3), (1,4), (2,3), (2,4), (3,4)}, bought by customers 5 and 6. The other 4
customers (nodes) must form a 4-clique in G because the 6 edges in
our basket (bought by two customers) have endpoints among the 4 customers who did not
buy the edges. The maximum number of edges in a group of 4 vertices is 6, which can only
be so if every edge is present. That is, those 4 vertices are a 4-clique.
For a general instance of Clique, we construct an instance A of FreqItems as we described.
Now run our black-box on A with n = |V | − k and ℓ = (1/2)k(k − 1) to get the answer to
the instance of Clique. The next lemma proves the answer is correct. Further, since the
black-box is polynomial, the entire solution to Clique is polynomial.
Lemma 30.15. There is a k-clique in G = (V, E) if and only if the instance A of FreqItems
that we constructed has a basket of size (1/2)k(k − 1) which is bought by |V | − k customers.
Proof. Suppose there is a k-clique in G = (V, E). In A, the (1/2)k(k − 1) edges in the k-clique
form a basket of edges which is bought by each of the |V | − k vertices (customers) not in the clique.
Suppose that in A there is a basket of size (1/2)k(k − 1) which is bought by |V | − k customers.
Those (1/2)k(k − 1) edges in the basket only connect to the k customers who did not buy the
¹FreqItems is more general than BalancedBipartiteClique, which is the NP-complete problem GT24
in the Garey & Johnson book Computers and Intractability. A more general problem cannot be easier.
basket. Therefore, those k customers form a subgraph with (1/2)k(k − 1) edges, which can only be
the case if every edge is present, and they form a k-clique in G.
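The construction of A from G is mechanical. Here is a sketch that rebuilds the worked example above (the function name is ours):

```python
def freqitems_matrix(n_vertices, edges):
    """Row per customer (vertex), column per item (edge):
       customer i buys edge e exactly when i is not an endpoint of e."""
    return [[0 if i in e else 1 for e in edges]
            for i in range(1, n_vertices + 1)]

# the worked example: the 6-vertex graph with k = 4
E = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (3, 4), (4, 6)]
A = freqitems_matrix(6, E)
assert A[0] == [0, 0, 0, 0, 1, 1, 1, 1]  # row of customer 1

# customers 5 and 6 share a basket of (1/2)k(k-1) = 6 items,
# so the remaining 4 vertices {1, 2, 3, 4} form a 4-clique
basket = [E[j] for j in range(len(E)) if A[4][j] == 1 and A[5][j] == 1]
assert len(basket) == 6
```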
Let us summarize the general methodology for proving a problem is NP-complete.
To show L is NP-complete using the NP-complete problem L∗
1: Start with a general instance I ∗ of L∗ .
2: Construct an instance I of L from I ∗ , in polynomial time.
3: Show that I ∗ ∈ L∗ if and only if I ∈ L.