You are on page 1of 303

Instructors’ Solutions Manual

for

MULTIVARIABLE
MATHEMATICS:
Linear Algebra, Multivariable Calculus,
and Manifolds

Theodore Shifrin
University of Georgia

John Wiley & Sons


Hoboken, N.J.
CONTENTS

1. VECTORS AND MATRICES . . . . . . . . . . . . . . . . . 1

1. Vectors in Rn 1
2. Dot Product 5
3. Subspaces of Rn 11
4. Linear Transformations and Matrix Algebra 14
5. Introduction to Determinants and the Cross Product 26

2. FUNCTIONS, LIMITS, AND CONTINUITY . . . . . . . . . . . . 32

1. Scalar- and Vector-Valued Functions 32


2. A Bit of Topology in Rn 37
3. Limits and Continuity 41

3. THE DERIVATIVE . . . . . . . . . . . . . . . . . . . . . 46

1. Graphs, Partial Derivatives, and Directional Derivatives 46


2. Differentiability 48
3. Differentiation Rules 53
4. The Gradient 57
5. Curves 63
6. Higher-Order Partial Derivatives 69

4. IMPLICIT AND EXPLICIT SOLUTIONS OF


LINEAR SYSTEMS . . . . . . . . . . . . . . . . 74

1. Gaussian Elimination and the Theory of Linear Systems 74


2. Elementary Matrices and Calculating Inverse Matrices 87
3. Linear Independence, Basis, and Dimension 92
4. The Fundamental Subspaces 103
5. The Nonlinear Case: Introduction to Manifolds 110

5. EXTREMUM PROBLEMS . . . . . . . . . . . . . . . . . . 117

1. Compactness and the Maximum Value Theorem 117


2. Maximum/Minimum Problems 119
3. Quadratic Forms and the Second Derivative Test 129
4. Lagrange Multipliers 135
5. Projections, Least Squares, and Inner Product Spaces 147
6. SOLVING NONLINEAR PROBLEMS . . . . . . . . . . . . . 163
1. The Contraction Mapping Principle 163
2. The Inverse and Implicit Function Theorems 168
3. Manifolds Revisited 173

7. INTEGRATION . . . . . . . . . . . . . . . . . . . . . 178
1. Multiple Integrals 178
2. Iterated Integrals and Fubini’s Theorem 184
3. Polar, Cylindrical, and Spherical Coordinates 194
4. Physical Applications 199
5. Determinants and n-dimensional Volume 204
6. Change of Variables Theorem 214

8. DIFFERENTIAL FORMS AND INTEGRATION


ON MANIFOLDS . . . . . . . . . . . . . . . . 220
1. Motivation 220
2. Differential Forms 220
3. Line Integrals and Green’s Theorem 226
4. Surface Integrals and Flux 234
5. Stokes’s Theorem 242
6. Applications to Physics 249
7. Applications to Topology 253

9. EIGENVALUES, EIGENVECTORS, AND APPLICATIONS . . . . . 258


1. Linear Transformations and Change of Basis 258
2. Eigenvalues, Eigenvectors, and Diagonalizability 268
3. Difference Equations and Ordinary Differential Equations 277
4. The Spectral Theorem 289
CHAPTER 1
Vectors and Matrices
1.1. Vectors
     
2 −1 1
1.1.1 a. x+y = + = .
3 1 4
       
2 −1 2 − (−1) 3
b. x − y = − = = .
3 1 2 3−1
         
2 −1 2 −2 0
c. x + 2y = +2 = + = .
3 1 3 2 5
   1 1
1 1 1 −2
d. 2 x + 2 y = 3 + 1 = 2 .
2 2 2
       
−1 2 (−1) − 2 −3
e. y−x= − = = .
1 3 1−3 −2
       
2 −1 4 − (−1) 5
f. 2x − y = 2 − = = .
3 1 6−1 5
√ √
g. kxk = 22 + 32 = 13.
 √ 
√ 2/ 13
h. x/kxk = x/ 13 = √ .
3/ 13
     
1 2 3
1.1.2 Let A = 2 , B = 4 , and C = 1 . Let D be the other vertex of the parallelogram
    
1 3 5
P.
Case 1: If AB   of P, then either AC or BC is alsoa side
is a side P. 
 of  If AC is a side of P,
2 2 2 4
−−→ −→ −−→ −−→ −−→
then BD = AC =  −1 , so D = OD = OB + BD =  4  +  −1  =  3 . If BC is a side of
4 3 4 7
       
1 1 1 2
−−→ −−→  −
−→ −→ −
−→
P, then AD = BC = −3 , so D = OD = OA + AD =  2  +  −3  =  −1 .
2 1 2 3
 
−1
−−→ −−→ 
Case 2: If AB is not   ofP, then
a side  AC  and CB are sides of P, and AD = CB = 3 , so
1 −1 0 −2
−−→ −→ −−→
D = OD = OA + AD =  2  +  3  =  5 .      
1 −2 −1 4 2 0
Therefore, the three possible locations of D are  3 ,  −1 , and  5 .
7 3 −1

1
2 1. VECTORS AND MATRICES

1.1.3 a. This problem requires some geometric insight. The most elegant argument begins by
noticing that the collection of vectors from the origin to the vertices of an n-gon is invariant under
a rotation by angle 2π/n. Thus, the sum of these vectors must also be invariant by this rotation.
But the only vector that is invariant under a rotation by an angle that is not an integral multiple
of 2π is the zero vector. Weconclude that  these vectors sum to 0.
cos(2πk/n)
Alternatively, let vk = , k = 0, 1, . . . , n − 1. Visualize summing the vectors by
sin(2πk/n)
moving v1 to the head of v0 , v2 to the head of v1 , and so on. As we see in the diagram below,
these vectors fit together to make a similar regular n-gon, and so the figure closes up when we get
to vn−1 . That is, the vectors add up to 0.

θ
v2 v1

θ θ
v0

π − θ

b. Assume our polygon is centered at the origin O. Fix a vertex A. Write each vector

−→ −→ −−→
AB from A to a vertex B as the sum of AO and OB. Then
X −− → X− −→ X −→ X −−→
AB = AB = AO + OB
B6=A B B B
−→ −→ P −− → −→
The second sum is 0 by part a and −OA = AO, so we have AB = −n OA , where n is the
B6=A
number of vertices of the polygon.
−−→ −−
→ −−→ −→
1.1.4 We have AM = 12 AB and AN = 21 AC. Thus,
−−→ −−→ −−→ 1 −→ 1 − −→ −→ − −→ −−→
M N = AN − AM = 2 AC − 2 AB = 12 (AC − AB) = 12 BC.
−−→ −→ −→ −→
1.1.5 Using △ABC, P Q = 21 AC by Exercise 4. Similarly, SR = 12 AC using △ADC. Hence,
−−→ −→ −
−→ −→
P Q = SR. Similarly, using △BCD and △BAD, QR = P S, so P QRS is a parallelogram.
−→ −→ −−→ −−→ −−→ −−→ −→
1.1.6 We have AQ = AC + CQ and CQ = 12 CD = 12 (AD − AC), so
−→ 1 −→ 1 −−→ 1 −→ 1 −− →
AQ = 2 AC + 2 AD = 2 AC + 3 AB.

On the other hand,


−→ −→ −−→ −→ 2 −−→ −→ 2 −− → −→ −→ −−

AE = AC + CE = AC + 5 CB = AC + 5 (AB − AC) = 53 AC + 25 AB.
−→ −→ −
−→ −→
Comparing, we see 56 AE = 12 AC + 13 AB = AQ; hence, c = 5/6.
1.1. VECTORS 3

−→ −−→ 3 −−→ −−→ −−→


1.1.7 AP = AD + 4 DE because DP = 34 DE. Then

−→ −−→ 3 −→ −−→ −−→ −→


AP = AD + 4 (AE − AD) = 14 AD + 34 AE and
−→ −−→ − −→ −−→ −→
AC = AD + AB = AD + 3AE

−→ −−
→ −→ −→ −→ −→
since AE = 13 AB. Then, AC = 4AP ; since AP is a scalar multiple of AC, P is on AC.

1.1.8 We have

v − x = 31 (x + y + z) − x = 13 (y + z − 2x) = 2
3
1
2 (y + z) − x .

This says that v − x is a scalar multiple of the vector joining A and the midpoint of BC. That is,
the head of the vector v lies on the median from A to BC. A similar argument shows it lies on
each of the medians.

1.1.9 a. Since x = su + tv and s + t = 1, we have x = (1 − t)u + tv = u + t(v − u). Thus,


the vectors su + tv, with s + t = 1, give the line in R2 passing through u and v. When s ≥ 0,
x is on the u-side of v. When t ≥ 0, x is on the v-side of u. Therefore, when s and t are both
nonnegative, x is on the segment joining u and v.
b. Similar to part a, vectors x = ru + sv + tw with r + s + t = 1 are in the plane
containing u, v, and w (as points; do not confuse this with the space spanned by u, v, and w).
Substituting r = 1 − s − t, we have x = u + s(v − u) + t(w − u), so we can take v − u and w − u
to be our direction vectors. When r ≥ 0, x is on the u-side of the line passing through v and w;
similarly, when s ≥ 0, x is on the v-side of the line through u and w; and when t ≥ 0, x is on the
w-side of the line through u and v. (Therefore, when all three coefficients are nonnegative, x lies
inside the triangle with vertices at u, v, and w.)

1.1.10 Suppose x, y ∈ Rn are nonparallel.


a. Suppose sx + ty = 0. If s 6= 0, then x = (−t/s)y, and if t 6= 0, then y = (−s/t)x.
Either way, x and y are scalar multiples of one another, contradicting the assumption that they
are nonparallel. Thus, we conclude that both s and t are 0.
b. If ax+by = cx+dy, then (a−c)x+(b−d)y = 0, so by part a, we have a−c = b−d = 0.
Thus, a = c and b = d.
←−→
1.1.11 With the notation from the proof of Proposition 1.1, the line OM is parametrized by
t ←→ 1
2 (x + y), t ∈ R, and the line AN is parametrized by x + s( 2 y − x), s ∈ R. These lines intersect
when
t 1

2 (x + y) = x + s 2 y − x
 
or 2t − 1 + s x + 2t − 2s y = 0. By Exercise 10, we conclude that t
2 − 1 + s = 0 and t
2 − s
2 = 0.
Solving, we obtain s = t = 2/3.
4 1. VECTORS AND MATRICES

1.1.12 a. By the commutative law of addition for real numbers,


           
x1 y1 x1 + y1 y1 + x1 y1 x1
 ..   ..   ..   ..   ..   .. 
x+y =  . + .  =  . = .  =  .  +  .  = y + x.
xn yn xn + yn yn + xn yn xn

Geometrically, x + y and y + x represent the same vector, as we’ve seen in Figure 1.5 of the text.
b. By the associative law of addition for real numbers,
       
x1 y1 z1 (x1 + y1 ) + z1
       
(x + y) + z =  ...  +  ...  +  ...  =  ..
. 
xn yn zn (xn + yn ) + zn
       
x1 + (y1 + z1 ) x1 y1 z1
 ..   ..   ..   .. 
= .  =  .  +  .  +  .  = x + (y + z).
xn + (yn + zn ) xn yn zn

Geometrically, x + y + z is the diagonal of the parallelepiped formed by x, y, and z.


       
0 x1 0 + x1 x1
       .. 
c. 0 + x =  ...  +  ...  =  ..
.  =  .  = x. Geometrically, since 0 has no
0 xn 0 + xn xn
length, the parallelogram created by 0 and x is just the vector x; then this parallelogram’s diagonal
is also the vector x.    
x1 −x1
   
d. If x =  ... , take −x =  ... . Then
xn −xn
       
x1 −x1 x1 + (−x1 ) 0
 ..   ..   ..   .. 
x + (−x) =  .  +  .  =  .  =  .  = 0.
xn −xn xn + (−xn ) 0

Geometrically, just extend any vector x in the opposite direction with the same length, the vector
sum of this vector and x is zero.
e. By the associative property of multiplication for real numbers,
     
dx1 c(dx1 ) (cd)x1
     .. 
c(dx) = c  ...  =  ..
.  =  .  = (cd)x.
dxn c(dxn ) (cd)xn

Geometrically: First scaling by d and then by c is the same as scaling the original by cd.
f. By the distributive property of multiplication over addition for real numbers,
         
x1 + y1 c(x1 + y1 ) cx1 + cy1 cx1 cy1
 ..   ..   ..   ..   .. 
c(x + y) = c  . = .  = .  =  .  +  .  = cx + cy.
xn + yn c(xn + yn ) cxn + cyn cxn cyn

Geometrically: the scalar multiple of the sum of two vectors is the same as the sum of their
respective scalar multiples.
1.2. DOT PRODUCT 5

g. Once again, by the distributive property,

     
x1 (c + d)x1 cx1 + dx1
 ..   ..   .. 
(c + d)x = (c + d)  .  =  . = . 
xn (c + d)xn cxn + dxn
       
cx1 dx1 x1 x1
 ..   ..   ..   .. 
=  .  +  .  = c  .  + d  .  = cx + dx.
cxn dxn xn xn

The sum of two vectors, geometrically, is the diagonal from the origin to the opposite corner of the
parallelogram created by the two vectors; when the two vectors are scalar multiples of one another,
this parallelogram is flattened, so that the diagonal is also a multiple of the sides; that multiple is
the sum of the individual multiples.
 
x1
 . 
  h. By the definition of the multiplicative identity 1 ∈ R, we have 1x = 1  ..  =
1x1 x1 x
n
 ..   .. 
 .  =  .  = x. Geometrically, multiplication by 1 changes neither the length nor the
1xn xn
direction of the vector.

1.1.13 a. Starting with the equation 0 + 0 = 0 and using property g, we have 0x = (0 + 0)x =
0x + 0x. Adding the additive inverse of 0x to both sides, and using properties b and c, we obtain
0 = 0x + (−0x) = (0x + 0x) + (−0x) = 0x + (0x + (−0x)) = 0x + 0 = 0x.
b. First notice, by properties h and g, that (−1)x + x = (−1)x + (1)x = (−1 + 1)x =
0x = 0, by part a. But this says that (−1)x is the additive inverse of x. (Note the additive inverse
is unique.)

1.2. Dot Product


   
2 −5
1.2.1 a. · = −10 + 10 = 0, so the vectors are orthogonal and θ = π/2.
5 2
      
2 −1 2 √ −1 √ √
b. · = −2+1 = −1. Since = 5 and = 2, cos θ = −1/ 10,
1 1 1 1

so θ = arccos(−1/ 10).
       
1 7 1 √ 7 √
c. · = −25. Since = 65 and = 65, cos θ = −25/65 =
8 −4 8 −4
−5/13, so θ = arccos(−5/13).
   
1 5
d.  4  ·  1  = 5 + 4 − 9 = 0. Again, the vectors are orthogonal, so θ = π/2.
−3 3
6 1. VECTORS AND MATRICES
      
1 5 1 5
√ √
e.  −1  ·  3  = 14. Since  −1  = 38 and  3  = 38, cos θ = 14/38 = 7/19,

6 2 6 2
so θ = arccos(7/19).
       
3 −1 3 −1
√ √
f.  −4  ·  0  = 2. Since  −4  = 50, and  0  = 2, cos θ = 1/5, so

5 1 5 1
θ = arccos(1/5).
       
1 1 1 1

 1   −3   1   −3 
g.  ·  = 2. Since   = 2 and   = 6, cos θ = 1/6, so θ =
 1   −1   1   −1 

1 5 1 5
arccos(1/6).
       
2 −5 −5 2
· ·
5 2 2 5
1.2.2 a. projy x = y = 0; projx y = x = 0.
25 + 4 4 + 25
       
2 −1 −1 2
· ·
1 1 1 1 1 1
b. projy x = y = − y; projx y = x = − x.
1+1 2 4+1 5
       
1 7 7 1
· ·
8 −4 5 −4 8 5
c. projy x = y = − y; projx y = x = − x.
49 + 16 13 1 + 64 13
       
1 5 5 1
 4·1  1  · 5 4 
−3 3 3 −3
d. projy x = y = 0; projx y = x = 0.
25 + 1 + 9 1 + 16 + 9
       
1 5 5 1
 −1  ·  3   3  ·  −1 
6 2 7 2 6 7
e. projy x = y= y; projx y = x= x.
25 + 9 + 4 19 1 + 1 + 36 19
       
3 −1 −1 3
 −4  ·  0   0  ·  −4 
5 1 1 5 1
f. projy x = y = y; projx y = x= x.
1+1 9 + 16 + 25 25
       
1 1 1 1
 1   −3   −3   1 
 ·     
 1   −1   −1  ·  1 
1 5 1 5 1 1
g. projy x = y= y; projx y = x = x.
1 + 9 + 1 + 25 18 1+1+1+1 2
1.2.3 Since the problem is independent
   of  scaling,
  we 
may
 work
 with
 the
 unit cube
 in
 the first
0 1 0 1 0 1 0 1
octant, i.e., the cube with vertices  0 ,  0 ,  1 ,  1 ,  0 ,  0 ,  1 , and  1 . A long
0 0 0 0 1 1 1 1
1.2. DOT PRODUCT 7
   
1 1
diagonal is then given by  1  and a face diagonal is given by  1 . The angle between these two
1 0

vectors is arccos(2/ 6) ≈ 35.3◦ .

1.2.4 We may assume one corner of the box is at the origin. Let x, y, and z denote three
edges of the box so that z is the longest edge and w = x + y + z is the long diagonal. Then we have
kzk = 5, kwk2 = (x+y+z)·(x+y+z) = kxk2 +kyk2 +kzk2 = 50, and w ·z = (x+y+z)·z = kzk2 ,
since x, y, and z are mutually orthogonal. Then the angle θ between w and z satisfies cos θ =
w·z kzk 1
= = √ , from which we deduce that θ = π/4.
kwkkzk kwk 2
1.2.5 If θ = arccos(1/4), then x · y = kxkkyk cos θ = 1/2. Then,

(x − 3y) · (x + y) = kxk2 + x · y − 3(y · x) − 3kyk2 = 4 + 1


2 − 3
2 − 3 = 0,

so (x − 3y) and (x + y) are orthogonal, by definition.

1.2.6 Let α, β, and γ denote the angles between x and y, y and z, and x and z, respectively.

(∗) cos α = x · y = −x · (x + z) = −(kxk2 + x · z) = −(1 + cos γ);

on the other hand,

cos α = y · x = −y · (y + z) = −(kyk2 + y · z) = −(1 + cos β).

Hence, cos β = cos γ. Repeating this argument, it is easy to conclude that cos α = cos β = cos γ,
and so α = β = γ. Furthermore, substituting α = γ in the equation (∗), we get cos α = −1/2.
Thus, α = β = γ = 2π/3.
 
x1
1.2.7 Let x =  x2  ∈ R3 , x 6= 0. Then, since kei k = 1, we have xi = x · ei = kxk cos θi ,
x3
i = 1, 2, 3, and so kxk2 (cos2 θ 1 + cos2 θ2 + cos2 θ3 ) = x21 + x22 + x23 = kxk2 . Thus, cos2 θ1 + cos2 θ2 +
cos2 θ3 = 1.

1.2.8 We have kxk2 = n, kyk2 = n(n + 1)(2n + 1)/6, and x · y = n(n + 1)/2. Therefore,
s √
n(n + 1) 3(n + 1) 3
cos θn = q = → as n → ∞.
2(2n + 1) 2
2n (n+1)(2n+1)
6

Therefore, θn → π/6 as n → ∞.

1.2.9 The point x + t0 y is the point closest to the origin on the line through x with direction
vector y. But this also means that −t0 y is the point on the line spanned by y closest to x. That
is, we expect that −t0 y should be the projection of x onto y; since −t0 = x · y/kyk2 , this is indeed
the case.
8 1. VECTORS AND MATRICES

x−t0y
x

t0y
y

1.2.10 We position the parallelogram with one vertex at the origin, and we label the vectors
emanating from that vertex x and y. The parallelogram is a rectangle if and only if x · y = 0. The
diagonals of the parallelogram are x + y and x − y. Since kx + yk2 = kxk2 + 2x · y + kyk2 and
kx − yk2 = kxk2 − 2x · y + kyk2 , the diagonals have equal lengths if and only if x · y = 0.

1.2.11

kx + yk2 + kx − yk2 = (x + y) · (x + y) + (x − y) · (x − y)
 
= kxk2 + 2x · y + kyk2 + kxk2 − 2x · y + kyk2
= 2(kxk2 + kyk2 ).

Geometrically, the sum of the squares of the lengths of the diagonals of a parallelogram is equal to
the sum of the squares of the lengths of its four sides.
−→ −−→ −
−→ −−

1.2.12 Let x = CA and y = CB. Then AB = y − x, and c2 = kABk2 = ky − xk2 =
kyk2 − 2y · x + kxk2 = a2 − 2ab cos θ + b2 .

1.2.13 The diagonals of a parallelogram with sides x and y are x + y and x − y. Now, x + y
is orthogonal to x − y if and only if (x + y) · (x − y) = 0 (by definition). But

(x + y) · (x − y) = x · x − x · y + y · x − y · y = kxk2 − kyk2 .

So (x + y) · (x − y) = 0 if and only if kxk = kyk, i.e., if and only if the parallelogram is a rhombus.

1.2.14 As shown in Figure 2.5 of the text, the relevant sides of the triangle are the vectors x − y
and −(x+y). Since kxk = kyk (the radius of the circle), we have (x+y)·(x−y) = kxk2 −kyk2 = 0,
so the triangle is a right triangle.
Answer to the geometric challenge: The locus consists of two circular arcs.

1.2.15 a. If x · y = 0 for all x ∈ Rn , then, in particular, y · y = 0. By Proposition 2.1, y = 0.


b. If x · y = x · z for all x ∈ Rn , then x · (y − z) = 0 for all x ∈ Rn . We infer from part
a that y − z = 0, so y = z.
     
x1 −x2 x
1.2.16 a. Let x = . Then ρ(x) · x = · 1 = 0.
x2 x1 x2
1.2. DOT PRODUCT 9
 
y1
b. Let y = . Then
y2
   
x1 −y2
x · ρ(y) = · = −x1 y2 + y1 x2 = −(−x2 y1 + x1 y2 ) = −ρ(x) · y.
x2 y1

Geometrically, ρ rotates a vector by π/2 counterclockwise, while −ρ rotates a vector by π/2


clockwise. The equation above says that the angle between x and ρ(y) is the same as the angle
between y and −ρ(x), namely, the complement of the angle between x and y.

1.2.17 From Corollary 2.4 we have kxk = k(x − y) + yk ≤ kx − yk + kyk, and so kxk − kyk ≤

kx−yk. Switching x and y, we infer that kyk−kxk ≤ kx−yk as well. Thus, kxk−kyk ≤ kx−yk,
as required.

1.2.18 Let ℓ, h, and w denote the length, height, and width of the box. The length of the long

diagonal is then c = ℓ2 +h2 + w2 . We wish to maximize ℓ + h + w while holding c constant. If
ℓ 1
we let x =  h  and y =  1 , then the Cauchy-Schwarz inequality gives
w 1

ℓ + h + w = x · y ≤ kxkkyk = c 3;

equality holds when x and y are parallel, i.e., when ℓ = h = w = c/ 3. Thus, the optimal box is
a cube.

1.2.19 We have

0 ≤ kbx − ayk2 = (bx − ay) · (bx − ay)


= b2 kxk2 − 2abx · y + a2 kyk2
= 2a2 b2 − 2abx · y.

Since a and b are nonnegative, we have 0 ≤ ab − x · y, so x · y ≤ ab = kxkkyk. Substituting −x


for x, we get (−x) · y ≤ kxkkyk, as well. Thus, |x · y| ≤ kxkkyk, as required. Equality holds only
when kbx − ayk = 0, i.e., when one of the vectors is a scalar multiple of the other.

1.2.20 a. Define α (resp., β) to be the angle between x + y and x (resp., y). Then
x · (x + y) x · y + kxk2
cos α = =
kxkkx + yk kxkkx + yk
y · (x + y) x · y + kyk2
cos β = = .
kykkx + yk kykkx + yk
Since kxk = kyk, we see that cos α = cos β. Thus, α = β, and x + y bisects the angle between x
and y.
b. Replacing x by bx and replacing y by ay, we obtain two vectors of equal length, and
so the result of part a shows that bx + ay bisects the angle between bx and ay, i.e., the angle
between x and y.
10 1. VECTORS AND MATRICES

1.2.21 Assume the parallelogram P has a vertex at the origin and that vectors x and y form
two sides of P. Let a = kxk and b = kyk. Suppose that the diagonal x + y bisects the angle
between x and y. Then by Exercise 20, x + y must be a scalar multiple of bx + ay, which means
that a = b (we may cite Exercise 1.1.10 if we wish here). Thus, P is a rhombus.
Conversely, suppose P is a rhombus, so kxk = kyk. Then, by Exercise 20, the diagonal x + y
bisects the angle between x and y. To see that it also bisects the opposite angle, we observe that
the opposite angle is the same as the angle between −x and −y. Finally, notice that the other
two angles are given by the angles between −x and y (or x and −y), so they are bisected by the
diagonal x − y.
−−
→ −→ −−→
1.2.22 Let x = AB, y = AC, a = kxk, and b = kyk. By Exercise 20 we know that since AD
−−→
bisects ∠CAB it must be a multiple of bx + ay, i.e., AD = s(bx + ay) for some s ∈ R. Also, since
−−→
D lies on BC we know that AD = x + t(y − x) for some t ∈ R. Equating these expressions yields
sbx + say = (1 − t)x + ty. Since x and y are nonparallel we conclude from Exercise 1.1.10 that
sb = 1 − t and sa = t. Thus, t/(1 − t) = a/b. Finally, we have
−−→ −−

kBDk t a kABk
−−→ = 1 − t = b = −→ ,
kCDk kACk
as required.

1.2.23 Using the notation in the hint and Exercise 20, we know that the bisector of ∠AOB is
given by
t(bx + ay), t ∈ R.

Furthermore, the bisector of ∠OAB passes through the point A and has direction given by the
direction of the bisector of −x and y − x, i.e., this line is given parametrically by

x + s a(y − x) − cx , s ∈ R.

The point P of intersection of these two lines is given by setting


 
t(bx + ay) = x + s a(y − x) − cx = 1 − (a + c)s x + say,

and so, by Exercise 1.1.10, we have tb = 1 − (a + c)s and ta = sa. So s = t = 1/(a + b + c), and
thus
−−→ 1
OP = (bx + ay).
a+b+c
Finally, the line bisecting ∠OBA passes through the point B and has direction given by the direction
of the bisector of −y and x − y, and so this line is given parametrically by

y + u b(x − y) − cy , u ∈ R.
−−→
It is straightforward to check that setting u = 1/(a + b + c) gives the vector OP as well. Thus, P
lies on all three angle bisectors.
1.3. SUBSPACES OF Rn 11

−→ −−→ −−→
1.2.24 Let C be as in the hint and let x = OA, y = OB, and z = OC. Since C lies on the
altitude through B, we notice (z − y) · x = 0, i.e., x · z = x · y. Similarly, since C lies on the altitude
through A, we have y · z = x · y. In particular, z · x = z · y, so z · (x − y) = 0. This means that
−−→ −
−→
OC is orthogonal to AB, so the altitude through O passes through C, as we needed to show.

1.2.25 a. The perpendicular bisector of OA is given by 12 x + sρ(x), s ∈ R. The perpendicular


bisector of OB is 12 y + tρ(y), t ∈ R. These lines intersect when
1
2x + sρ(x) = 12 y + tρ(y).

If we take the dot product of this equation with y and recall that ρ(y) · y = 0, we get
1
2x · y + sρ(x) · y = 21 kyk2 .

kyk2 − x · y
Solving for s gives s = .
2ρ(x) · y
b. Since 12 (x + y) is the midpoint of AB, the perpendicular bisector of AB is given by
1 1
2 (x + y) + sρ(y − x), s ∈ R. To show that z lies on this line, it suffices to show that z − 2 (x + y)
is orthogonal to y − x. But
  
1
 kyk2 − x · y 1
z − 2 (x + y) · (y − x) = ρ(x) − 2 y · (y − x)
2ρ(x) · y
 
kyk2 − x · y
= ρ(x) · y − 21 y · (y − x)
2ρ(x) · y

= 12 kyk2 − x · y + y · x − kyk2 = 0.
−→ −−→ −−→
1.2.26 As usual, let OA = x and OB = y. Then OP = 13 (x + y) and, by Exercise 25,
−−→ kyk2 − x · y ←→
OR = 12 x + cρ(x), where c = . Now, Q must lie on the altitude from B to OA,
2ρ(x) · y
−−→ ←→
so OQ = y + tρ(x) for some scalar t; similarly, Q must lie on the altitude from A to OB, so
−−→ −−→
OQ = x + sρ(y) for some scalar s. Solving, we find that OQ = y − 2cρ(x). Then we have
−−→ −−→ −−

QR = 12 x − y + 3cρ(x) and QP = 13 (x − 2y) + 2cρ(x) = 23 QR. Therefore, P lies two-thirds the
way from Q to R along QR.
When the triangle is isosceles, the intersection of the angle bisectors does lie on that line.
However, for example, let △OAB be a right triangle with right angle at O. Then Q = O and R is
the midpoint of AB, and OR bisects ∠O only when we have an isosceles right triangle.

1.3. Subspaces of Rn

1.3.1 a. No: 0 ∈
/ V.
   
1 0
b. Yes: V = Span  0  ,  1 .
1 1
12 1. VECTORS AND MATRICES

c. No: 0 ∈
/ V.
d. No: 0 ∈
/ V.
e. Yes: x21 + x22 + x23 = 0 ⇐⇒ x = 0, so V = {0}.
f. No: 0 ∈
/ V (in fact, V = ∅).
   
2 1
g. Yes: V = Span  1  ,  2 .
1 1
         
3 2 1 2 1
h. Yes:  0  = 2  1  −  2 , so V = Span  1  ,  2 .
1 1 1 1 1
     
2 2 1
i. No: 0 6∈ V , since solving 0 =  4  + s  1  + t  2  is equivalent to solving
−1 1 −1

2s + t = −2
s + 2t = −4
s − t = 1 ,

which is easily seen to have no solution.

1.3.2 In order for us to conclude from this argument that 0 ∈ V , we must first have some
v ∈ V . The first criterion is equivalent to stipulating that V be nonempty.

1.3.3 x · (c1 v1 + c2 v2 + · · · + ck vk ) = (x · c1 v1 ) + (x · c2 v2 ) + · · · + (x · ck vk ) = c1 (x · v1 ) +
c2 (x · v2 ) + · · · + ck (x · vk ) = 0 + 0 + · · · + 0 = 0, as required.

1.3.4 We check the requisite three properties. (i) 0 ∈ V ⊥ since 0 · v = 0 for every v ∈ V .
(ii) Suppose x ∈ V ⊥ and c ∈ R. We must check that cx ∈ V ⊥ . We calculate: (cx) · v = c(x · v) = 0
for all v ∈ V , as required.
(iii) Suppose x, y ∈ V ⊥ ; we must check that x+y ∈ V ⊥ . Well, (x+y)·v = (x·v)+(y·v) = 0+0 = 0
for all v ∈ V , as needed.

1.3.5 Since W is a subspace, it is closed under scalar multiplication and addition, so every
vector of the form c1 v1 + c2 v2 + · · · + ck vk must lie in W . That is, the general element of V is an
element of W , so V ⊂ W .

1.3.6 a. (i) 0 ∈ U ∩ V , since 0 ∈ U and 0 ∈ V .


(ii) Suppose x ∈ U ∩ V and c ∈ R. We must show that cx ∈ U ∩ V . We know x ∈ U and x ∈ V .
Since U and V are closed under scalar multiplication, we infer that cx ∈ U and cx ∈ V . Therefore,
cx ∈ U ∩ V .
(iii) Suppose x, y ∈ U ∩ V . We must show that x + y ∈ U ∩ V . We know that x, y ∈ U and
x, y ∈ V . Since U and V are closed under addition, we infer that x + y ∈ U and x + y ∈ V .
1.3. SUBSPACES OF Rn 13

Therefore, x + y ∈ U ∩ V .
In conclusion, U ∩ V is a subspace.
Example 1. Let U = Span(e1 ) and V = Span(e2 ) ⊂ R2 . The lines U and V are subspaces of
R2 and the point U ∩ V = {0} is a subspace as well.
Example 2. Let U = Span(e1 , e2 ) and V = Span(e1 , e3 ) ⊂ R3 . The planes U and V are
subspaces and their intersection, the line spanned by e1 , is a subspace of R3 as well.
b. No. Let U and V be as in the first example above. Then the vector e1 + e2 is a sum
of two vectors in U ∪ V , but does not lie in U ∪ V .
c. (i) Since 0 ∈ U and 0 ∈ V , we have 0 = 0 + 0 ∈ U + V . (ii) Suppose x ∈ U + V and
c ∈ R. We are to show that cx ∈ U + V . By definition, x can be written in the form x = u + v for
some u ∈ U and v ∈ V . Then cx = c(u + v) = (cu) + (cv) ∈ U + V , inasmuch as each of U and V is
closed under scalar multiplication. (iii) Suppose x, y ∈ U + V . Then x = u + v and y = u′ + v′ for
some u, u′ ∈ U and v, v′ ∈ V . Therefore, x + y = (u + v) + (u′ + v′ ) = (u + u′ ) + (v + v′ ) ∈ U + V ,
since U and V are both closed under addition.
Example 1. Let U = Span(e1 ) and V = Span(e2 ) ⊂ R2 . Then U + V = R2 .
Example 2. Let U = Span(e1 , e2 ) and V = Span(e1 + e3 ). Then U + V = R3 .

1.3.7 If Span(v1 , . . . , vk , v) ⊂ Span(v1 , . . . , vk ), then, in particular, v ∈ Span(v1 , . . . , vk ).


Conversely, if v ∈ Span(v1 , . . . , vk ), then v = c1 v1 +c2 v2 +· · ·+ck vk for some scalars c1 , . . . , ck ; now
consider an arbitrary linear combination of the vectors v1 , . . . , vk and v, say, x = α1 v1 +· · ·+αk vk +
βv. Then x = α1 v1 + · · · + αk vk + β(c1 v1 + c2 v2 + · · · + ck vk ) = (α1 + βc1 )v1 + · · · + (αk + βck )vk ∈
Span(v1 , . . . , vk ), so Span(v1 , . . . , vk , v) ⊂ Span(v1 , . . . , vk ). Since the reverse inclusion is obvious,
we are done.

1.3.8 Since V and V ⊥ are subspaces of Rn , 0 ∈ V and 0 ∈ V ⊥ , so 0 ∈ V ∩ V ⊥ . Now suppose


x ∈ V ∩ V ⊥ . Then kxk2 = x · x = 0, so x = 0. Hence V ∩ V ⊥ = {0}.

1.3.9 Suppose x ∈ V ⊥ . Then x · v = 0 for all v ∈ V . In particular, since U ⊂ V , we infer that


x · u = 0 for all u ∈ U , and so x ∈ U ⊥ .

1.3.10 Let x ∈ V . To show x ∈ (V ⊥ )⊥ , we must show x · u = 0 for every u ∈ V ⊥ . Now, for


any u ∈ V ⊥ , we know that u · v = 0 for every v ∈ V . In particular, x · u = u · x = 0, as desired.
It is true for any subspace V ⊂ Rn that V = (V ⊥ )⊥ , but we need a dimension argument
to prove this (see Proposition 4.8 of Chapter 4). What’s more, equality need not hold in the
infinite-dimensional setting.

1.3.11 As in the hint, set w1 = v1 . Then set w2 = v2 − projw1 v2 . Then it is straightforward


to check that w2 · w1 = 0. What’s more, we have Span(w1 , w2 ) = Span(v1 , v2 ), since each of w1
and w2 is a linear combination of v1 and v2 and vice versa. Continue inductively. (See Theorem
5.3 of Chapter 5 for full details.)
14 1. VECTORS AND MATRICES

1.3.12 Let x ∈ (U + V )⊥ . Then x · (u + v) = 0 for all u ∈ U and all v ∈ V . Since U and


V are subspaces, 0 ∈ U and 0 ∈ V , so, in particular, x · u = x · (u + 0) = 0 for any u ∈ U and
x · v = x · (0 + v) = 0 for any v ∈ V . Hence, x ∈ U ⊥ ∩ V ⊥ , so (U + V )⊥ ⊂ U ⊥ ∩ V ⊥ .
Now, let x ∈ U ⊥ ∩ V ⊥ . Then x·u = 0 for all u ∈ U and x·v = 0 for all v ∈ V . We wish to show
that x ∈ (U + V )⊥ . So we must show that x is orthogonal to an arbitrary element of U + V , which
is a vector of the form u + v for some u ∈ U and some v ∈ V . Now, x · (u + v) = x · u + x · v = 0,
as desired. Thus, U ⊥ ∩ V ⊥ ⊂ (U + V )⊥ .

1.4. Linear Transformations and Matrix Algebra


" # " #
1+2 2+1 3 3
1.4.1 a. A + B = =
3+4 4+3 7 7
" # " #
2−2 4−1 0 3
b. 2A − B = =
6−4 8−3 2 5
c. Since A and C are not the same shape, A − C is not defined.
d. Since C and D are not the same shape, C + D is not defined.
" #" # " # " #
1 2 2 1 (1)(2) + (2)(4) (1)(1) + (2)(3) 10 7
e. AB = = =
3 4 4 3 (3)(2) + (4)(4) (1)(3) + (4)(3) 22 15
" #" # " #
2 1 1 2 5 8
f. BA = =
4 3 3 4 13 20
" #" # " #
1 2 1 2 1 1 4 5
g. AC = =
3 4 0 1 2 3 10 11
h. CA is not defined since the number of columns in C is not the same as the number
of rows in A.
i. BD is not defined, since the number of columns in B is not the same as the number
of rows in D.
   
0 1 " # 4 3
  2 1  
j. DB =  1 0  = 2 1 
4 3
2 3 16 11
 
" # 0 1 " #
1 2 1   4 4
k. CD = 1 0 =
0 1 2 5 6
2 3
   
0 1 " # 0 1 2
  1 2 1  
l. DC =  1 0 = 1 2 1
0 1 2
2 3 2 7 8
1.4. LINEAR TRANSFORMATIONS AND MATRIX ALGEBRA 15

1.4.2 a. Since the j th column of A is Aej and Ax = 0 for all x ∈ Rn , Aej = 0. Since this is
true for all j = 1, . . . , n, A = O. Working with the rows rather than with the columns of A, we
can argue as follows: For any i = 1, . . . , m, we have Ai · x = 0 for all x ∈ Rn . By Exercise 1.2.15,
Ai = 0.
b. Apply the result of part a to A − B.

1.4.3 (i) 0 ∈ V since A0 = 0.


(ii) If v ∈ V and c ∈ R, then we must show that cv ∈ V . Well, A(cv) = c(Av) = c0 = 0.
(iii) If v, w ∈ V , then we must show that v + w ∈ V . Since Av = Aw = 0, we have A(v + w) =
Av + Aw = 0 + 0 = 0, as required.
 
0
1.4.4 a. (i) Since A0 = 0, the zero vector ∈ Rm+n is in V .
0
 
x
(ii) If v ∈ V and c ∈ R, then we must show that cv ∈ V . Well, v = for some x ∈ Rn . Then
Ax
   
cx cx
cv = = ∈ V , as required.
cAx A(cx)
 
x
(iii) If v, w ∈ V , then we must show that v + w ∈ V as well. We know that v = and
Ax
     
y x+y x+y
w= for some x, y ∈ Rn . Then v + w = = ∈ V , as required.
Ay Ax + Ay A(x + y)
This completes the verification that V is a subspace.
 
x1
   .. 
a  
b. Given the 1 × n matrix A, set a = AT ∈ Rn . Set b = . Then b ·  .  =
−1  xn 
y
a · x − y = 0 if and only if y = Ax, as required.
 
x1 + x2
1.4.5 a. We want a matrix A such that Ax = . It is easy to check that A =
x1 − x2
" #
1 1
works.
1 −1
" # " #
x1 + x2 1 1 x1 + x2
b. We want a matrix A such that Ax = proj  x = = . It
1 2 1 2 x1 + x2
" #  
1
1 1
is easy to check that A = 2 2 works.
1 1
2 2
" # " #
1 0 −1 0
c. The matrices B = and C = give the reflection of x across the
0 −1 0 1
" #
−1 0
lines x2 = 0 and x1 = 0, respectively. Hence, the desired matrix is given by A = BC = .
0 −1
16 1. VECTORS AND MATRICES
" # " #
x1 + 2x2 1 1 x1 + 2x2
d. We want a matrix A such that Ax = proj  x = = .
1 5 2 5 2x1 + 4x2
" #  
2
1 2
We see that A = 5 5 does the job.
2 4
5 5
" #
0 −1
e. As explained in the text, B = rotates a vector π/2 counterclockwise. Thus
1 0
" # " #
1 2
5 5 − 25 − 45
(according to part d), if C = 2 4
, then A = BC = 1 2
projects x onto the line
5 5 5 5
2x1 − x2 = 0 and then rotates the resulting vector π/2 radians counterclockwise.
" #
2 1

f. Here we reverse the order from part e, so A = CB = 54 5 .
2
5 − 5
" #" #
cos θ − sin θ cos φ − sin φ
1.4.6 a. Aθ Aφ = =
sin θ cos θ sin φ cos φ
" #
cos θ cos φ − sin θ sin φ −(cos θ sin φ + sin θ cos φ)
.
sin θ cos φ + cos θ sin φ cos θ cos φ − sin θ sin φ
" #" # " #
cos φ − sin φ cos θ − sin θ cos θ cos φ − sin θ sin φ −(cos θ sin φ + sin θ cos φ)
Aφ Aθ = = .
sin φ cos φ sin θ cos θ sin θ cos φ + cos θ sin φ cos θ cos φ − sin θ sin φ

b. Since Aθ Aφ = A(θ+φ) , we see that


" # " #
cos θ cos φ − sin θ sin φ − cos θ sin φ + sin θ cos φ cos(θ + φ) − sin(θ + φ)
= ,
sin θ cos φ + cos θ sin φ cos θ cos φ − sin θ sin φ sin(θ + φ) cos(θ + φ)

so cos(θ + φ) = cos θ cos φ − sin θ sin φ and sin(θ + φ) = sin θ cos φ + cos θ sin φ.
" #" # " #
cos θ − sin θ x1 x1 cos θ − x2 sin θ
1.4.7 a. Aθ x = = , so
sin θ cos θ x2 x1 sin θ + x2 cos θ

kAθ xk2 = (x1 cos θ − x2 sin θ)2 + (x1 sin θ + x2 cos θ)2 = x21 + x22 = kxk2 .

b. x · Aθ x = x21 cos θ − x1 x2 sin θ + x1 x2 sin θ + x22 cos θ = (x21 + x22 ) cos θ and
kAθ xkkxk = kxk2 = x21 + x22 , so

x · Aθ x (x2 + x2 ) cos θ
= 1 2 2 2 = cos θ.
kAθ xkkxk x1 + x2
(Strictly speaking, we really want the signed angle between the vectors to be θ. That is, we should
rotate counterclockwise through an angle θ to get from x to Aθ x. See the discussion in Section 5.)

1.4.8 All of these statements are incorrect.


" # " # " # " #
1 0 1 0 1 1 1 0
a. Let A = ,B= , and C = . Then AB = =
1 0 0 0 1 0 1 0
CB and B 6= O, but A 6= C.
1.4. LINEAR TRANSFORMATIONS AND MATRIX ALGEBRA 17
" #
1 0
b. Let A = . Then A2 = A, but A 6= O and A 6= I.
0 0
" # " #
1 0 1 0
c. Let A = and B = . Then
1 0 0 0
" #" # " #
2 0 0 0 2 2 0 0
(A + B)(A − B) = = O, but A − B = .
1 0 1 0 1 0
" # " # " #
0 2 2 0 0 1
d. Let A = , B = , and C = . Then AB = BC, B is
1 0 0 1 2 0
nonsingular, and yet A 6= C.

1.4.9 a. If A2 = I2 , then

a2 + bc = 1, bc + d2 = 1, b(a + d) = 0, and c(a + d) = 0.

From the third equation we infer that either b = 0 or a + d = 0. If b = 0, then the first two
equations give a2 = d2 = 1, so a = ±1 and d = ±1. The last equation gives a + d = 0 or c = 0.
Thus, the solutions with b = 0 are given by
" # " # " # " #
1 0 −1 0 1 0 −1 0
, , , or , c ∈ R.
0 1 0 −1 c −1 c 1

If b 6= 0, then we have a = −d and c = (1 − a2 )/b, so we have the additional solutions


" #
a b
, a, b ∈ R, b 6= 0.
(1 − a2 )/b −a

b. For A2 = O we get a2 + bc = d2 + bc = c(a + d) = b(a + d) = 0. The solutions are


" #
0 0
for any β ∈ R
β 0

and
" # " #
a b 1 β
2
=a
−a /b −a −a/β −1

for any a, b ∈ R, b 6= 0 (or any β 6= 0, letting β = b/a when a 6= 0).


c. A2 = −I2 gives the equations

a2 + bc = d2 + bc = −1, and b(a + d) = c(a + d) = 0.

Now we cannot have b = 0 or c = 0, so the general solution is


" #
a b
A= , a, b ∈ R, b 6= 0.
−(1 + a2 )/b −a
18 1. VECTORS AND MATRICES
" #
1
1.4.10 a. The first column of R is the vector obtained by reflecting across the given line,
0
" # " #
cos 2θ 0
namely . Its second column is the vector obtained by reflecting across the given line;
sin 2θ 1
since the latter vector makes an angle of π/2 − θ with the line, its reflection makes an angle of
−(π/2 − θ)"with the# line, and therefore an angle of −(π/2 − 2θ) with the positive x1 -axis. This is
sin 2θ
the vector .
− cos 2θ

Re1 e2 π
−θ
2
θ
e1
Re2

" # " #" # " #


1 0 cos 2θ − sin 2θ 1 0 cos 2θ sin 2θ
b. A2θ = = = R and
0 −1 sin 2θ cos 2θ 0 −1 sin 2θ − cos 2θ
" # " #" #" #
1 0 cos θ − sin θ 1 0 cos θ sin θ
Aθ A−θ =
0 −1 sin θ cos θ 0 −1 − sin θ cos θ
" #
cos2 θ − sin2 θ 2 sin θ cos θ
= = R, by the double angle formulas.
2 sin θ cos θ sin2 θ − cos2 θ
" #" # " #
1 1 1 1 1 2
1.4.11 a. We have A2 = = and
0 1 0 1 0 1
" #" # " #" # " #
1 k 1 1 1 1 1 k 1 k+1
= = for k ≥ 1. Therefore, by induction
0 1 0 1 0 1 0 1 0 1
" #
1 n
on n we have An = .
0 1
    
d1 d1 d21
 ..  ..   .. 
b. Since A = 
2
 . 
 . =
  .  and

dm dm d2m
    
dk1 d1 dk+1
1
 ..  ..   .. 
 .  . = . , by induction on n we have
    
dkm dm dk+1
m
 
dn1
 .. 
An = 
 . .

dnm
1.4. LINEAR TRANSFORMATIONS AND MATRIX ALGEBRA 19

1.4.12 Let the rows of A, B, C, and D be given by Ai , Bi , Cj , Dj , i = 1, . . . , m, j = 1, . . . , n,


and let the columns of A′ , B ′ , C ′ , and D ′ be given by a′i , b′j , c′i , d′j , i = 1, . . . , m, j = 1, . . . , n. Then
" #
A B
the first m rows of are given by the vectors in Rm+n made by concatenating Ai and Bi ,
C D
 T  T
Ai Cj
i = 1 . . . , n, i.e., . Likewise, the last n rows are given by , j = 1, . . . , n. Similarly,
Bi Dj
" #  ′
A′ B ′ ai
the first m columns of ′ ′
are given by , i = 1, . . . , m, and the last n columns are
C D c′i
 ′  " #" #
bj A B A′ B ′
given by , j = 1, . . . , n. Thus, the kℓ-entry of the matrix product is
d′j C D C ′ D′
given by

   ′
 Ak a

 · ′ℓ , 1 ≤ k, ℓ ≤ m

 Bk c

 A   bℓ′ 




k
· ℓ−m
, 1 ≤ k ≤ m, m + 1 ≤ ℓ ≤ m + n

 Bk  dℓ−m′ 

 Ck−m a

 · ′ℓ , m + 1 ≤ k ≤ m + n, 1 ≤ ℓ ≤ m

 D cℓ
 Ck−m   b′ 
k−m



 · ℓ−m
′ , m + 1 ≤ k, ℓ ≤ m + n.
Dk−m dℓ−m

   ′    ′ 
Ak aℓ ′ ′ ′ ′ Ak bℓ−m
Now, · ′ = Ak · aℓ + Bk · cℓ is the kℓ-entry of AA + BC , and · =
Bk cℓ B d′ℓ−m
   ′ k
Ck−m a
Ak ·b′ℓ−m + Bk ·d′ℓ−m is the (k, ℓ − m)-entry of AB ′ + BD ′ . Similarly, · ′ℓ = Ck−m ·a′ℓ +
Dk−m cℓ
   ′ 
Ck−m b
Dk−m ·c′ℓ is the (k −m, ℓ)-entry of CA′ +DC ′ , and · ℓ−m = Ck−m ·b′ℓ−m +Dk−m ·d′ℓ−m
Dk−m d′ℓ−m
" #" #
′ ′ A B A′ B ′
is the (k − m, ℓ − m)-entry of CB + DD . Thus, the kℓ-entries of and
C D C ′ D′
" #
AA′ + BC ′ AB ′ + BD ′
agree, and so the matrices are equal.
CA′ + DC ′ CB ′ + DD ′

1.4.13 a.# We have S(e1 ) = −e2 and S(e2 ) = −e1 , so the standard matrix for" S is A #=
"
0 −1 0 −1
. And T (e1 ) = e2 and T (e2 ) = −e1 , so the standard matrix for T is B = .
−1 0 1 0
b. We have # 1 ) = T (−e2 ) = e1 and (T ◦ S)(e2 ) = T (−e1 ) = −e2 , so the standard
" (T ◦ S)(e
1 0
matrix for T ◦ S is . Note that this is, in fact, the matrix product BA.
0 −1
c. We have " # 1 ) = S(e2 ) = −e1 and (S ◦ T )(e2 ) = S(−e1 ) = e2 , so the standard
(S ◦ T )(e
−1 0
matrix for S ◦ T is , which is the matrix product AB.
0 1
20 1. VECTORS AND MATRICES
" #
1 1
1.4.14 a. Rotating the vector e1 by −π/4 gives the vector √ ; reflecting that vector
2 −1
" # " #
1 −1 1 1
across the line x1 = x2 gives √ . Similarly, rotating e2 by −π/4 gives the vector √ ,
2 1 2 1
" #
1 −1 1
which is left unchanged by the reflection. Thus, the standard matrix for T is √ .
2 1 1
b. The rotation takes the standard basis vectors e1 , e2 , and e3 to e1 , e3 , and −e2 , respec-
tively. Reflecting the latter vectors across the plane x2 = 0 results in e1 , e3 , and e2 , respectively.
1 0 0
 
Thus, the standard matrix for T is  0 0 1 .
0 1 0
c. The first rotation takes the standard basis vectors e1 , e2 , and e3 to e1 , −e3 , and e2 ,
respectively. The second rotation
 takes the latter
 vectors to e2 , −e3 , and −e1 , respectively. Thus,
0 0 −1
 
the standard matrix for T is  1 0 0 .
0 −1 0

1.4.15 a.
  carries e1 to e2 , e2 to −e1 , and leaves e3 fixed. Thus, the standard
This symmetry
0 −1 0
 
matrix is  1 0 0 .
0 0 1  
1
 
  b. Since the front face, whose center is 0 , is moved to the bottom face,
 whose
 center
0 0 0
is  0 , we see that e1 is mapped

 
 to −e3 . Likewise, the right face, with center 1 , is moved
−1 0 0
to theleft face, with center  −1 , so e2 is mappedto −e
2 . Finally, the top face, whose center
0 0 −1
is  0 , is moved to
 the back face, whose center is
 0 , so e3 is mapped to −e1 . Thus, the
1 0 0 −1 0
 
standard matrix is  0 −1 0 .
−1 0 0
c. Once again, following faces, we see that the top face moves to the front face, the front
face moves to the right face, and the right
 face moves tothe top. Thus, e3 maps to e1 , e1 maps to
0 0 1
 
e2 , and e2 maps to e3 . So the matrix is  1 0 0 , which is again an orthogonal matrix.
0 1 0

1.4.16 Notice that


 the standard basis vectors are the midpoints of the three edges that con-
1
tain the vertex  1 . Thus, we need only understand how those edges move under the various
1
symmetries.
1.4. LINEAR TRANSFORMATIONS AND MATRIX ALGEBRA 21
 
1
a. A 120◦ rotation (counterclockwise, as viewed from far out the line past  1 ) sends
 
0 0 1 1
 
e1 to e2 , e2 to e3 , and e3 to e1 . Thus, the standard matrix is  1 0 0 . For a −120◦
0 1 0
 
0 1 0
 
rotation, we have the inverse (or transpose!) of the previous matrix,  0 0 1 .
1 0 0
b. Here the vectors e1 and e2 are sent
 to their respective additive inverses, and e3 is left
−1 0 0
 
fixed, so we find the matrix  0 −1 0 .
0 0 1    
0 1
c.    
  There are six such reflections. For example, if the plane passes through 0 , −1 ,
−1 1 −1
and  1 , then the reflection T fixes e3 and reflects the x1 x2 -plane across
 the axis x2 = −x1 ,
−1 0 −1 0
 
so T (e1 ) = −e2 and T (e2 ) = −e1 . Thus, the standard matrix for T is  −1 0 0 . The
0 0 1
   
1 1 0 0
   
other reflection matrices are as follows: for the plane through  0 ,  0 0 −1 ; for the plane
0 0 −1 0
       
0 0 0 −1 −1 1 0 0
       
through  1 ,  0 1 0 ; for the plane through  0 ,  0 0 1 ; for the plane
0 −1 0 0 0 0 1 0
       
0 0 0 1 0 0 1 0
       
through  −1 ,  0 1 0 ; and for the plane through  0 ,  1 0 0 .
0 1 0 0 −1 0 0 1

1.4.17 a. (BAB −1 )2 = (BAB −1 )(BAB −1 ) = BA(B −1 B)AB −1 = BA2 B −1 .


b. For k ≥ 1, (BAk B −1 )(BAB −1 ) = BAk (B −1 B)AB −1 = BAk+1 B −1 , so we can see by
induction on n that (BAB −1 )n = BAn B −1 for all positive integers n.
c. We must make the additional assumption that A is invertible. Then by Proposition
4.3, (BAB −1 )−1 = (B −1 )−1 A−1 B −1 = BA−1 B −1 . Thus, the result of part b holds as well when
the exponent is a negative integer.

1.4.18 Examples
" are: #
0 1
a. A =
0 0
22 1. VECTORS AND MATRICES
 
0 1 0
 
b. A =  0 0 1
0 0 0
The n × n matrix A with ai,i+1 = 1, i = 1, . . . , n − 1, and all other entries 0 has this property.
Another conjecture one might offer is that if An−1 6= O and An = O, then A must be at least n × n.

1.4.19 A−1 x = A−1 ( 71 Ax) = 17 (A−1 A)x = 71 x.


 
1.4.20 Since A3 − 3A + 2I = O, we see I = 32 A − 12 A3 = A 32 I − 12 A2 = 32 I − 12 A2 A. So,
setting B = 23 I − 21 A2 , we have AB = BA = I. Thus, B = A−1 (and so A is invertible).

1.4.21 Notice that (In + A + A2 + · · · + A9 )(In − A) = In − A10 = In since A10 = O. We infer


from Corollary 2.2 that In − A is invertible.

1.4.22 a. Since (AT )ii = aii we have tr(AT ) = tr(A).


b. Since (A + B)ii = aii + bii , we have tr(A + B) = tr(A) + tr(B). Since (cA)ii = caii ,
we have tr(cA) = c tr(A).
P
n P
n
c. (AB)ii = ai1 b1i + ai2 b2i + · · · + ain bni = aij bji , whereas (BA)ii = aji bij . Thus,
j=1 j=1
P
n P
n P
n P
n P
n P
n
tr(AB) = aij bji = aij bji = aji bij = tr(BA).
i=1 j=1 j=1 i=1 i=1 j=1
" #
1 3
1.4.23 a. AT =
2 4
" # " # " #
2 4 2 4 0 0
b. 2A − B T = − =
6 8 1 3 5 5
 
1 0
T  
c. C =  2 1
1 2
 
1 1
T  
d. C + D =  3 1
3 5
" #
1 5 7
e. AT C =
2 8 10
f. AC T is not defined because A is 2 × 2 and C T is 3 × 2.
   
1 0 " # 1 3
  1 3  
g. C T AT =  2 1 = 4 10 
2 4
1 2 5 11
" #" # " #
T 2 1 0 1 2 1 2 7
h. BD = =
4 3 1 0 3 3 4 17
1.4. LINEAR TRANSFORMATIONS AND MATRIX ALGEBRA 23

i. D T B is not defined because D T is 2 × 3 and B is 2 × 2


 
" # 1 0 " #
1 2 1   6 4
j. CC T = 2 1 =
0 1 2 4 5
1 2
   
1 0 " # 1 2 1
  1 2 1  
k. C T C =  2 1 = 2 5 4
0 1 2
1 2 1 4 5
   
1 0 " # 0 1 2
  0 1 2  
l. C T D T =  2 1 = 1 2 7
1 0 3
1 2 2 1 8

1.4.24 (AB)T = B T AT = BA. Thus, (AB)T = AB if and only if AB = BA, as desired.

1.4.25 (AT A)T = AT (AT )T = AT A.

1.4.26 (A−1 )T AT = (AA−1 )T = I T = I and (AT )(A−1 )T = (A−1 A)T = I T = I. Therefore, AT


is invertible if A is invertible, and (AT )−1 = (A−1 )T .

1.4.27 We claim that Aθ x·y = x·A−θ y for all x, y ∈ R2 . Since, by Exercise 7, multiplication by
Aθ preserves length, this is equivalent to the observation that the angle between Aθ x and y is equal
to the angle between x and A−θ y. It then follows from Proposition 4.5 that A−1 θ = A−θ = Aθ .
T

" # " #
1 0 0 1
1.4.28 a. There are two: and .
0 1 1 0
b. There are six:

     
1 0 0 1 0 0 0 1 0
     
0 1 0, 0 0 1, 1 0 0,
0 0 1 0 1 0 0 0 1
     
0 1 0 0 0 1 0 0 1
     
0 0 1, 1 0 0, and 0 1 0.
1 0 0 0 1 0 1 0 0

c. Let P and Q be permutation matrices with columns p1 , . . . , pn and q1 , . . . , qn , re-


spectively. Each column vector has exactly one nonzero entry (which is a 1). Also, pi 6= pj and
qi 6= qj when i 6= j. Now if qi has a 1 in the j th entry, then P qi = pj . Since the columns of P Q
are P q1 , . . . , P qn and qk 6= qℓ when k 6= ℓ, we see that these are a rearrangement of the original
columns of P . In particular, each row and each column of P Q have exactly one nonzero entry,
which is a 1. Thus, P Q is a permutation matrix.
24 1. VECTORS AND MATRICES

Permutation matrices need not commute. For example, if


   
1 0 0 0 1 0
   
P = 0 0 1 and Q = 1 0 0,
0 1 0 0 0 1

then P Q 6= QP .
d. Let p1 , . . . , pn denote the columns of P . Since P is a permutation matrix, we have
pT
i pj = 0 if i 6= j and pT i pi = 1. The rows of P
T are given by pT , . . . , pT . Thus, P T P = I.
1 n
Similarly, letting P1 , . . . , Pn denote the rows of P , we have Pi · Pj = 0 if i 6= j and Pi · Pi = 1, so
P P T = I, as well. Thus, P T = P −1 , as required.
e. The rows of P A are a rearrangement (permutation) of the rows of A. If the ith row
of P has a 1 in the k th entry, then the ith row of P A is Ak . Similarly, the columns of the matrix
AP are a rearrangement of the columns of A. For instance, if the j th column of P has a 1 in the
k th entry, then the j th column of AP is ak .

1.4.29 x · y = x · AT b = Ax · b = 0 · b = 0.

1.4.30 Let y ∈ V ⊥ . For any v ∈ V , we have Ay · v = y · AT v = y · Av = 0, since Av ∈ V and


y ∈ V ⊥ . Therefore, Ay ∈ V ⊥ .
 
1 1 0
 
1.4.31 a. B =  2 3 1  = AT , so (by Exercise 26) B −1 = (AT )−1 = (A−1 )T =
1 1 −1
 
4 −1 −1
 
 −3 1 1 .
 
1 0 −1 1 2 1
 
b. If we switch the last two rows of the matrix A, we obtain B =  0 1 −1 . To
1 3 1
find −1 −1 =
 the inverse of  B, we should therefore switch the last two columns of A , yielding B
4 1 −3
 
 1 0 1 .
−1 −1 1

 c. If we multiply the last row of A by a factor of two, then we obtain


 the matrix 
B=
1
1 2 1 4 1 2
   
1 3 1 . So multiplying the last column of A−1 by 21 yields B −1 =  −1 1 0 .
0 2 −2 −1 1 − 12

1.4.32 If (AT A)x = 0, then, using Proposition 4.5, we have (AT A)x · x = Ax · (AT )T x =
Ax · Ax = 0, so kAxk2 = 0, from which we conclude that Ax = 0.

1.4.33 For any x ∈ Rn we have AT Ax = A2 x = 0. Thus, by Exercise 32, we know Ax = 0 for


all x ∈ Rn . This implies A = O.
1.4. LINEAR TRANSFORMATIONS AND MATRIX ALGEBRA 25

An alternative proof is to notice that the jj-entry of A2 = AT A is aj · aj = kaj k2 . If these are


all 0, then the columns
" are all#zero.
0 0
The matrix A = satisfies A2 = O, yet A 6= O.
1 0

1.4.34 a. Notice that for any square matrix A, (AT A)ij = ai · aj . So, if A is orthogonal,

1, i = j
ai · aj = (AT A)ij = (In )ij = .
0, i 6= j

   
" √ # " √ # 1 0 0 1 0 0
3 1 3
2 − 12    
b. √2 or 2 √ , 0 −1 0 or 0 −1 0 ,
3 3
− 12 2 − 12 − 2 0 0 1 0 0 −1
1 2 2
 1 
3 3 3 3 − 23 2
3
 2 1   
 3 3 − 23  or  2
3 − 13 − 23 .
2
3 − 23 1
3
2
3
2
3
1
3
c. By part  a, the column vectors a1 and a2 must be mutually orthogonal unit vectors.
cos θ
In particular, a1 = for some θ. Since a2 must be a unit vector orthogonal to a1 , we must
sin θ
 
− sin θ
have a2 = ± .
cos θ
d. "The first#matrix in part c is Aθ , the matrix giving rotation through angle θ; the second
1 0
matrix is Aθ , the composition of a rotation and a reflection.
0 −1
e. Notice that for any square matrix A, (AAT )ij = Ai · Aj . As remarked in the exercise,
if AT A = I, then it follows that AAT = I as well, so Ai · Aj = (AAT )ij = Iij .

1.4.35 a. If A and B are orthogonal, then AT A = B T B = In , so (AB)T AB = (B T AT )AB =


B T (AT A)B = B T In B = B T B = In , and AB is orthogonal.
b. If A is orthogonal, then A−1 = AT , so (A−1 )T (A−1 ) = (AT )T (AT ) = AAT = In by
part e of Exercise 34.

1.4.36 If A is both symmetric and skew-symmetric, then A = AT = −A, whence A = O.


a.
T T
b. S T = 21 (A + AT ) = 12 (A + AT )T = 12 (AT + A) = S and K T = 12 (A − AT ) =
1 T T 1 T 1 T
2 (A − A ) = 2 (A − A) = − 2 (A − A ) = −K.

c. Suppose A is square. By part b, S = 21 (A + AT ) is symmetric and K = 12 (A − AT ) is


skew-symmetric. Moreover, S + K = 21 (A + AT ) + 12 (A − AT ) = A.
d. Notice that if A = S + K = S ′ + K ′ , then S − S ′ = K ′ − K. Also (S − S ′ )T =
S T − S ′T = S − S ′ and (K ′ − K)T = K ′T − K T = K − K ′ , so S − S ′ is symmetric and K ′ − K is
skew-symmetric. By part a, S − S ′ = K ′ − K = O. Hence, S = S ′ and K = K ′ .

1.4.37 It must be the case that A = cI for some scalar c. Denote by Eij , the matrix with a 1 in
the ij-entry and 0’s elsewhere. Then from AEij = Eij A we infer that all the nondiagonal entries
of the j th row and ith column of A are 0 and (comparing ij-entries of the product) that aii = ajj .
26 1. VECTORS AND MATRICES

1.5. Introduction to Determinants and the Cross Product

1.5.1 The parallelogram OADB has the same area as parallelogram OAEC because △OBC

E
D
C
B
y+cx
y

x A
O

and △ADE are congruent. Alternatively, the result follows from Cavalieri’s principle, as sliding
cross-sections parallel to the base OA results in a figure with the same area.
   
x1 y
1.5.2 Let x = and y = 1 . Then
x2 y2

D(x, y) = D(x1 e1 + x2 e2 , y1 e1 + y2 e2 ) by definition


= D(x1 e1 , y1 e1 ) + D(x1 e1 , y2 e2 ) + D(x2 e2 , y1 e1 )+D(x2 e2 , y2 e2 )
by Property 3
= x1 y1 D(e1 , e1 ) + x1 y2 D(e1 , e2 ) + x2 y1 D(e2 , e1 )+x2 y2 D(e2 , e2 )
by Property 2
= x1 y 2 − x2 y 1 by Properties 1 and 4.

Note that at the last step we have used the observation that, by Property 1, for any z ∈ R2 , we
have D(z, z) = −D(z, z), and so D(z, z) = 0.

1.5.3 Let v1 , v2 , . . . , vn be the vertices of the polygon P, arranged consecutively, proceeding


counterclockwise around P. Dividing the polygon into n − 2 triangles with common vertex v1 , we
have
n−1
X
signed area(P) = signed area of the triangle with vertices v1 , vi , vi+1
i=2
n−1
1X
= D(vi − v1 , vi+1 − v1 )
2
i=2
n−1
X
1 
= D(vi , vi+1 ) − D(v1 , vi+1 ) + D(v1 , vi )
2
i=2
n−1
X
1 1 1
= D(vi , vi+1 ) + D(v1 , v2 ) + D(vn , v1 )
2 2 2
i=2
1.5. INTRODUCTION TO DETERMINANTS AND THE CROSS PRODUCT 27

n−1
1X 1
= D(vi , vi+1 ) + D(vn , v1 ).
2 2
i=1

(The latter sum is the sum of the signed areas of triangles with vertices at the origin, vi , and vi+1 .
This makes good sense, since the answer should be independent of the location of the origin.)

O
v1

   
b11 b12
1.5.4 a. Let b1 = and b2 = be the columns of B. Then
b21 b22
" # " #!
a11 b11 + a12 b21 a11 b12 + a12 b22
det AB = D(Ab1 , Ab2 ) = D ,
a21 b11 + a22 b21 a21 b12 + a22 b22
= (a11 b11 + a12 b21 )(a21 b12 + a22 b22 ) − (a21 b11 + a22 b21 )(a11 b12 + a12 b22 )
= (a11 a22 − a12 a21 )((b11 b22 − b12 b21 ) = det A det B.

b. Since det Aθ = cos2 θ + sin2 θ = 1, this is immediate.


c. Given two vectors x, y ∈ R2 , we choose a rotation Aθ so that Aθ x = ae1 for some
a b

a > 0. Writing Aθ y = be1 + ce2 , by part b, we have D(x, y) = = ac, which we recognize as
0 c
(up to sign) “base × height.”
 
e1 1 1 2

 
1.5.5 a. x × y = e2 0 2 =  −2 .

e3 −1 1 2
 
e1 1 7 9

 
b. x × y = e2 −2 1 =  12 .

e3 1 −5 15

−→ −→
1.5.6 Let x = AB and y = AC. Then area(△ABC) = 21 kx × yk.
 
2 √
a. x × y =  −2  and area(△ABC) = 3.
2
28 1. VECTORS AND MATRICES
 
2 √
b. x × y =  −2  and area(△ABC) = 3.
2
 
9 √
c. x × y = 12  and area(△ABC) = 12 (15 2).

15
 
9 √
d. x × y =  12  and area(△ABC) = 12 (15 2).
15

1.5.7 Using the cross products already calculated in Exercise 6 and the approach of Example
2, we find
a. x1 − x2 + x3 = 0
b. x1 − x2 + x3 = 3
c. 3x1 + 4x2 + 5x3 = 0
d. 3x1 + 4x2 + 5x3 = 12
   
1 1
1.5.8 Since the plane is parallel to the vectors u =  −2  and v =  0 , its normal vector is
 
−2 −1 1
given by A = u × v =  −2 . Thus, an equation of the plane is x1 + x2 − x3 = −2.
2    
1 2
1.5.9 The normals of the respective planes are u =  1  and v =  1 , so the intersection
 
3 −2 1
of the planes is spanned by w = u × v =  −5 .
−1

1.5.10 Note first that P = {x : a · x = b} is an affine plane with normal a at distance b/kak
from the origin. In addition, ℓ = {x : a × x = c} is a line parallel to a in the plane through 0
orthogonal to c and at distance kck/kak from the origin. Since the intersection of P and ℓ is a
single point,
 there is a unique
 vector
 x satisfying the two equations. Algebraically, one can check
a a
that x = b +c× is the unique solution.
kak kak2
   
2 1
1.5.11 The points x0 =  1  and y0 =  1  lie on ℓ and m respectively. The distance between
1 0
ℓ and m is found by finding the projection of x0 − y0 on a vector v orthogonal to both lines. The

e1 0 1


latter
 is computed by the cross product of the respective direction vectors: v = e2 1 1 =
 
2 1 e3 −1 1
  1
 −1 . Thus, the distance between the lines is  
projv 0 = √6 .
1
−1
1.5. INTRODUCTION TO DETERMINANTS AND THE CROSS PRODUCT 29

1.5.12 The volume of the parallelepiped is given by the absolute value of D(x, y, z). Now,

1 2 −1


D(x, y, z) = 2 3 0 = −2,

1 1 3

so the volume is 2.

1.5.13 Suppose the parallelogram P is spanned by x and y. Then we know from Proposition
5.1 that area(P) = kx × yk. But, by definition,

x y 2 x y 2 x y 2 2 2 2
2 2 3 3 1 1
kx × yk2 = + + = area(P1 ) + area(P2 ) + area(P3 ) .
x3 y 3 x1 y 1 x2 y 2

1.5.14 a. The first equation is immediate from Property 1 of determinants, and the second from
Property 3.
b. We have (e1 × e2 ) × e2 = e3 × e2 = −e1 , and yet e1 × (e2 × e2 ) = e1 × 0 = 0.

1.5.15 T is linear by virtue of Properties 2 and 3 of determinants. Its standard matrix is


determined, as usual, by computing
     
0 −c b
     
T (e1 ) =  c  , T (e2 ) =  0  , T (e3 ) =  −a  ,
−b a 0

and so
 
0 −c b
 
[T ] =  c 0 −a  .
−b a 0
Now, (a × x) · y = (x × y) · a, so (a × x) · y = (x × y) · a = −(y × x) · a = −(a × y) · x. Therefore,
T (x) · y = −x · T (y), and so [T ] must be skew-symmetric.

1.5.16 Although one can bludgeon this in coordinates, it is best to note that both sides are
linear in each of the vectors x, y, z, and w. Thus, it suffices to check the equality when each is
replaced by one of the standard basis vectors for R3 : x = ei , y = ej , z = ek , and w = eℓ .
Notice that if i = j or k = ℓ, both sides vanish: the left because the cross product of a vector
with itself is 0, the right because either the rows or the columns are equal. If i = k and j = ℓ,
i 6= j, then both sides are 1. If i = k and i 6= j, j 6= ℓ, then ei × ej = ±eℓ and ei × eℓ = ∓ej , and
so both the left and the right vanish. Up to obvious symmetries, this covers all the bases.

1.5.17 a. As the hint suggests, we write x − u = s(v − u) + t(w − u) for some (unique) scalars
s and t. Then we have x = (1 − s − t)u + sv + tw, and, letting r = 1 − s − t, we are done.
b. The signed area of the triangle with vertices x, v, and w is given by

D(x − v, w − v) = D r(u − v) + t(w − v), w − v = r D(u − v, w − v),
30 1. VECTORS AND MATRICES

as required. Similarly, s and t are, respectively, the ratios of the signed areas of △uxw and △uvx
to that of △uvw.
c. From Exercise 1.1.8 we know that x = 31 (u + v + w), so this tells us that each of the
three triangles has one-third the area of △uvw. This is a non-obvious result.
d. This is an immediate consequence of parts a and b: We can express 0 = ru + sv + tw
uniquely, with r = D(v, w), s = D(w, u), and t = D(u, v). For a physical interpretation, we have
this: If we translate our coordinates so that the origin is in the interior of △uvw, then putting
masses r, s, and t at vertices u, v, and w, respectively, the system balances at the origin.

1.5.18 a. We have

kx × yk2 = (x2 y3 − x3 y2 )2 + (x3 y1 − x1 y3 )2 + (x1 y2 − x2 y1 )2


 
= x22 y32 + x23 y22 + x23 y12 + x21 y32 + x21 y22 + x22 y12 − 2 x2 x3 y2 y3 + x1 x3 y1 y3 + x1 x2 y1 y2
= (x21 + x22 + x23 )(y12 + y22 + y32 ) − (x21 y12 + x22 y22 + x23 y32 )

+ 2(x2 x3 y2 y3 + x1 x3 y1 y3 + x1 x2 y1 y2 )
= kxk2 kyk2 − (x1 y1 + x2 y2 + x3 y3 )2 = kxk2 kyk2 − (x · y)2 .

Thus, letting θ be the angle between x and y, we have kx × yk2 = kxk2 kyk2 (1 − cos2 θ) =
kxk2 kyk2 sin2 θ, so kx × yk = kxkkyk sin θ, which is, indeed, the area of the parallelogram spanned
by x and y.
b. Think of the parallelepiped as having as its base the parallelogram spanned by x and
y. Then its height is kzk cos ω, where ω is the angle between z and x × y. Thus, the signed volume
of the parallelepiped is kx × ykkzk cos ω = (x × y) · z.

x×y

z
ω

c. This is just the formula D(x, y, z) = D(z, x, y) = z · (x × y) observed in the proof of


Proposition 5.1.

1

1.5.19 Since x · y = 2 kxk2 + kyk2 − kx − yk2 , we have
1 1 
A2 = kx × yk2 = kxk2 kyk2 − (x · y)2
4 4
    
1 2 2 1 2 2 2 2 1 1 2 2 2 1 2 2 2
= a b − (a + b − c ) = ab + (a + b − c ) ab − (a + b − c )
4 4 4 2 2
1.5. INTRODUCTION TO DETERMINANTS AND THE CROSS PRODUCT 31

1   1
= (a + b)2 − c2 c2 − (a − b)2 = (a + b + c)(a + b − c)(c + a − b)(c − a + b)
16 16
a+b+c a+b−c c+a−b c−a+b
= · · · = s(s − a)(s − b)(s − c).
2 2 2 2
1.5.20 As we see in the figure, we can decompose the triangle into three triangles, each having
height r. Then, using the result of Exercise 19, we have

c b
r
a

1 p
A= (a + b + c)r = rs = s(s − a)(s − b)(s − c),
2
p
and so r = (s − a)(s − b)(s − c)/s.
CHAPTER 2
Functions, Limits, and Continuity
2.1. Scalar- and Vector-Valued Functions
   
2 4
2.1.1 a. x= +t .
0 −3
   
−1 3
b. x = +t .
2 1
   
1 1
c. x =  2  + t  −1 .
1 −1
   
−2 −5
d. x = +t .
1 3
   
1 1
 1  −2 
e. x=   
 0  + t  3 .
−1 −1

2.1.2 a. Since t = y/(x + 1), we have x2 + t2 (x + 1)2 = 1. Therefore, t2 (x + 1)2 = 1 − x2 =


(1 + x)(1 − x); since we are not interested in the root x = −1, we get t2 (1 + x) = 1 − x, and so
1 − t2
x= . Less cleverly, we proceed by factoring or applying the quadratic formula to solve for x:
1 + t2

(1 + t2 )x2 + 2t2 x + (t2 − 1) = (x + 1) (1 + t2 )x + (t2 − 1) = 0,

1 − t2
so either x = −1 (which we discard) or x = . Thus, we have the parametrization
1 + t2
" 2
#
1−t
x = g(t) = 1+t2 .
2t
1+t2
   
x cos θ
Alternatively, if we write = , then t = tan θ2 , and
y sin θ

θ 2 1 − t2 θ θ 2t
x = cos θ = 2 cos2 −1= − 1 = and y = sin θ = 2 sin cos = .
2 1 + t2 1 + t2 2 2 1 + t2
b. For every rational number t, we obtain rational numbers x and y with x2 + y 2 = 1.
Clearing denominators, we obtain integers X, Y , and Z with X 2 + Y 2 = Z 2 . In particular, for all
the rational numbers t with 0 < t < 1 we obtain distinct points in the first quadrant of the unit
circle, hence distinct Pythagorean triples having no common ratios. To get a more explicit formula,
32
2.1. SCALAR- AND VECTOR-VALUED FUNCTIONS 33

note that if t = m/n ∈ Q in lowest terms, m, n ∈ Z, then


n 2 − m2 2mn
x= 2 2
, y= 2 , and so X = n2 − m2 , Y = 2mn, Z = m2 + n2
m +n n + m2
for integers m and n with no common factor other than 1.
" # " # " #
cos θ sin θ cos θ + θ sin θ
2.1.3 g(θ) = a + aθ =a .
sin θ − cos θ sin θ − θ cos θ
" # " # " # " #
at 0 −b sin t at − b sin t
2.1.4 Modifying Example 4 slightly, g(t) = + + = .
0 a −b cos t a − b cos t

2.1.5 a. Note that aθ = bφ and that ψ = π − θ − φ = π − ( a+b


b )θ. We have

b
φ ψ a
a
φ
θ P θ b ψ
P

" # " # " # " #


−−→ cos θ cos ψ cos θ cos( a+b
b )θ
OP = (a + b) +b = (a + b) −b .
sin θ − sin ψ sin θ sin( a+b
b )θ

b. In this case, we have ψ = φ − θ = ( a−b


b )θ, so
" # " # " # " #
−−→ cos θ cos ψ cos θ cos( a−b
b )θ
OP = (a − b) +b = (a − b) +b .
sin θ − sin ψ sin θ − sin( a−b
b )θ

2.1.6 The answer is 3, as can be deduced from the parametric equations for the epicycloid in
Exercise 5. When a = 2 and b = 1, we see that the motion of P relative to the center of the smaller
cos 3θ
circle is given by − , so the coin makes 3 full revolutions.
sin 3θ
A more intuitive argument is this: Imagine unrolling the circumference of the large coin. Then
the small coin makes two revolutions as it traverses that circumference. But when we roll the
circumference back into its circular shape, that adds one more revolution. (Think, by way of
analogy, of the observed speed of a woman walking in a bus aisle as the bus moves down the road.)
   
t x(t)
2.1.7 When the master is at , let the dog’s position be . Then we have x(t) =
0 y(t)
t + cos θ(t), y(t) = sin θ(t), so
y ′ (t) cos θ(t)θ ′ (t)
tan θ(t) = = .
x′ (t) 1 − sin θ(t)θ ′ (t)
34 2. FUNCTIONS, LIMITS, AND CONTINUITY

Solving for θ ′ (t), we find that θ ′ (t) = sin θ(t). Separating variables and integrating, we have
R R
dθ/ sin θ = dt, and so t = − log(csc θ + cot θ) + c for some constant c. Since θ = π/2 when t = 0,
we see that c = 0. " #
cos θ − log(csc θ + cot θ)
a. We have x = .
sin θ
1 + cos θ
b. Since e−t = , we find that (1 + cos θ)2 = e−2t (1 − cos2 θ), so 1 + cos θ =
sin θ
e−2t − 1 1 − e2t 2et
e−2t (1 − cos θ). Solving, we obtain cos θ = −2t = , and then sin θ = . Thus,
e +1 1 + e2t 1 + e2t

" # " #
1−e2t
t+ 1+e2t
t − tanh t
x= 2et
= .
1+e2t
secht

2.1.8 For any distinct nonzero real numbers s, t, and u, we have (using the properties of
determinant given in Section 5 of Chapter 1):


s t u 1 1 1 1 0 0 1 0 0

2 2 2
s t u = stu s t u = stu s t − s u − s = stu(t − s)(u − s) s 1 1

s 3 t 3 u3 s 2 t 2 u2 s 2 t 2 − s 2 u2 − s 2 s2 s + t s + u

1 0 0


= stu(t − s)(u − s) s 1 0 = stu(t − s)(u − s)(u − t) 6= 0.

s2 s + t u − t

Since the volume of the parallelepiped spanned by f (s), f (t), and f (u) is nonzero, the three points
cannot be collinear. On the other hand, when s = 0, say, and t and u are nonzero numbers, we see
that f (0) = 0, f (t), and f (u) are collinear and if and only if f (u) is a scalar multiple of f (t), and it
is easy to check this happens only when t = u.

2.1.9 The level curves and graphs are shown below.

a.
2.1. SCALAR- AND VECTOR-VALUED FUNCTIONS 35

b.

c.

d.

2.1.10 a. X is a hyperboloid of one sheet and Y is a hyperboloid of two sheets.


b. The horizontal
 cross-sections of X are
p circles, and
 one can obviously follow the circle
x0 2
1 + z0
from any point P0 =  y0  to the point P0′ =  0 . Similarly, any points of the form
z0 z0
36 2. FUNCTIONS, LIMITS, AND CONTINUITY

p  p 
1 + z02 1 + z12

P0 =  0  and P1′ =  0  can be joined by following the hyperbola x2 −z 2 = 1, y = 0.
z0 z1
 
cosh t
(This path can be parametrized explicitly, for example, by  0 , arcsinhz0 ≤ t ≤ arcsinhz1 .)
sinh t
   
x0 x1
Thus, given two points  y0  and  y1  ∈ X, we proceed from P0 to P0′ to P1′ to P1 .
z0 z1
   
0 0
There is, however, no path in Y from P =  0  to Q =  0 . For if there were, by the
−1 1
Intermediate Value Theorem, that path would have to cross every plane z = t, −1 ≤ t ≤ 1, and
yet there is no point on Y with z-coordinate 0.

2.1.11 a. This is a torus, the locus of points obtained by rotating a circle of radius 1 about a

circle of radius 2.
p
b. Note that x2 + y 2 = (2 + cos t)2 , so x2 + y 2 − 2 = cos t. Thus, every point of X
p 2
satisfies the equation x2 + y 2 − 2 + z 2 = 1. Removing the square root, this can be rewritten
as (x2 + y 2 + z 2 )2 − 10(x2 + y 2 ) + 6z 2 + 9 = 0.

2.1.12 a. This is straightforward algebra:


     
2 2 2 st + 1 2 s−t 2 s+t 2
x +y −z = + −
st − 1 st − 1 st − 1
(s2 t2 + 2st + 1) + (s2 − 2st + t2 ) − (s2 + 2st + t2 ) (st − 1)2
= = = 1.
(st − 1)2 (st − 1)2
2.2. A BIT OF TOPOLOGY IN Rn 37
   
  −1 0
0
b. For s0 = 0, we have g =  0  + t  1 . For s0 6= 0, we have
t
0 −1
   
  1 2
s0   1  
g =  −1/s0  +  s0 − 1/s0  ,
t s0 t − 1
1/s0 s0 + 1/s0
which is a (somewhat unusual) parametric equation of a line. Similar results hold holding t constant.
c. The image of g consists of all the points of the hyperboloid with x 6= 1. To see this,
note first that s = 0 maps to the line x = −1, y = −z, and t = 0 maps to the line x = −1, y = z.
x+1 z+y
Now, whenever x 6= ±1, we set α = and β = ; note that α and β have the same sign
x−1 z−y
because y 2 − z 2 = 1 − x2 6= 0. Next, observe that we have st = α and s/t = β, so s2 = αβ and
t2 = α/β. Now, it is just a matter of careful sign-checking to see that we cover all the bases.
s t x y, z
√ p
αβ α/β |x| > 1 y<z
√ p
− αβ α/β |x| < 1 y > |z|
√ p
αβ − α/β |x| < 1 y < −|z|
√ p
− αβ − α/β |x| > 1 y>z

2.2. A Bit of Topology in Rn

2.2.1 We refer to the respective subsets as S, for convenience.


a. Neither: there is no neighborhood of a = 2 contained in S, so S is not open; the
sequence {xk = 1/k} converges to 0, but 0 ∈/ S, so S is not closed.
S

b. Closed: For example, R − S = (−∞, 0) ∪ (2−(k+1) , 2−k ) ∪ (1/2, ∞) is open.
k=1
 
a
c. Open: Given c = ∈ S, we see that B(c, b) ⊂ S.
b
d. Closed: R2 − S = {y < 0}, which is open by the same reasoning as in part c.
 
a √
e. Open: Given c = ∈ S, B(c, (b − a)/ 2) ⊂ S.
b
 
a
f. Open: Given c = ∈ S, either a 6= 0, in which case B(c, |a|) ⊂ S, or b 6= 0, in
b
which case B(c, |b|) ⊂ S.
g. Closed: The complement is open (by the same reasoning
  as in part e). Alternatively,
xk
given a sequence {xk } in S converging to a ∈ R2 , write xk = , so xk → a and yk → b. Since
yk
xk = yk , it follows that a = b and so a ∈ S.
h. Open: Given a ∈ S, let r = min(kak, 1 − kak). Then r > 0 and, by the triangle
inequality, B(a, r) ⊂ S.
38 2. FUNCTIONS, LIMITS, AND CONTINUITY

i. Open: Given a ∈ S, note that B(a, kak − 1) ⊂ S.


j. Closed: Rn − S is the subset in part i.
k. Neither: Every open interval in R contains irrational numbers, so S cannot be open.
S cannot be closed since every irrational number is the limit of a sequence of rational numbers.
l. Open: If a ∈ S, let r = min (1 − kak, 1 − ka − e1 k). Then B(a, r) ⊂ S.
m. Both: Since there is no element of the empty set, both definitions hold vacuously.

2.2.2 Suppose xk → a. We want to show that xk,i → ai for all i = 1, . . . , n. Given ε > 0,
there is K ∈ N so that kxk − ak < ε whenever k > K. Since |xk,i − ai | ≤ kxk − ak, it follows that
|xk,i − ai | < ε whenever k > K, so we are done.
Conversely, suppose that xk,i → ai for all i = 1, . . . , n. Given ε > 0, for each i, there is Ki ∈ N so

that |xk,i −ai | < ε/ n whenever k > Ki . Then it follows that whenever k > K = max(K1 , . . . , Kn ),
we have
n
X 1/2 √ 1/2
kxk − ak = |xk,i − ai |2 < n(ε/ n)2 = ε,
i=1

as required.

2.2.3 a. By Exercise 1.2.17, we have kxk k − kak ≤ kxk − ak → 0 as k → ∞. (More
pedantically, given ε > 0, the same K that works for the original sequence works here.)
b. By linearity of the dot product and the Cauchy-Schwarz inequality, we have |b · xk −
b · a| = |b · (xk − a)| ≤ kbkkxk − ak → 0 as k → ∞. (More rigorously, if b = 0, there’s nothing to
prove. If b 6= 0, given ε > 0, there is K ∈ N so that kxk − ak < ε/kbk whenever k > K. Then we
have |b · xk − b · a| < ε whenever k > K.)

2.2.4 We apply the results of Example 6 and Exercise 2. Suppose xk ∈ R and xk → c. Then
xk,i → ci for i = 1, . . . , n. Since each interval [ai , bi ] is closed, we know that ci ∈ [ai , bi ], which
means that c ∈ R.

2.2.5 / B(a, r), then ky − ak = s > r. Let δ = s − r. By the triangle inequality, for every
If y ∈
point z ∈ B(y, δ) we have ky − ak ≤ ky − zk + kz − ak, so kz − ak ≥ ky − ak − ky − zk > s − δ = r.
Therefore, B(y, δ) ⊂ Rn − B(a, r), and so, by Proposition 2.1, B(a, r) is closed.

2.2.6 a. Since we are told that xk → a, given any ε > 0 there is K so that kxk − ak < ε
whenever k > K. Choose J so that kJ ≥ K. Then, whenever j > J, we have kj > kJ ≥ K, and so
kxkj − ak < ε, as required.
b. Yes, trivially. If every subsequence converges to a, take the case of the original
sequence.

2.2.7 a. Suppose a ∈ U ∪ V . Then either a ∈ U or a ∈ V . Let’s suppose the former. Since U


is open, it follows that there is r > 0 so that B(a, r) ⊂ U , so B(a, r) ⊂ U ∪ V and U ∪ V is open.
2.2. A BIT OF TOPOLOGY IN Rn 39

Now suppose a ∈ U ∩ V . Since U and V are open, there are r1 , r2 > 0 so that B(a, r1 ) ⊂ U and
B(a, r2 ) ⊂ V . If we set r = min(r1 , r2 ), it follows that B(a, r) ⊂ U ∩ V , so U ∩ V is open.
b. We can deduce these results from part a using DeMorgan’s laws: Rn − (C ∪ D) =
(Rn − C) ∩ (Rn − D) and Rn − (C ∩ D) = (Rn − C) ∪ (Rn − D). But here is a direct argument.
Suppose {xk } is a sequence in C ∩ D that converges to a for some a ∈ Rn . Since xk ∈ C and C
is closed, it follows that a ∈ C; since xk ∈ D and D is closed, it follows that a ∈ D. Therefore,
a ∈ C ∩ D, so C ∩ D is closed.
The argument for the union is slightly more subtle, and requires that we use the result of
Exercise 6. Suppose {xk } is a sequence in C ∪ D that converges to a for some a ∈ Rn . First, it
must be that xk ∈ C for infinitely many k or xk ∈ D for infinitely many k. For definiteness, let’s
say the former. Then we have a subsequence xkj ∈ C, which necessarily converges to a. Since C is
closed, we know that a ∈ C ⊂ C ∪ D, and so C ∪ D is closed.

2.2.8 a. If a ∈ S is not an interior point, then every neighborhood of a contains a point of S


(namely, a) and some point not in S, so a is a frontier point of S. If we take S = [0, 1) ⊂ R, then
1 is a frontier point of S that does not belong to S and 0 is one that does.
  
x
b. Take S = Z ⊂ R, S = {1/k : k ∈ N} ⊂ R, or S = : 0 < x < 1 ⊂ R2 .
0
c. Let Fr(S) denote the set of frontier points of S. Note that Fr(S) = Fr(Rn − S). Thus,
the complement of Fr(S) consists of the union of the set of interior points of S and the set of interior
points of Rn − S, each of which is open. (By definition, if a is an interior point of S, we have a
neighborhood B(a, r) ⊂ S. Then every point of this ball is again an interior point of S.) Thus, by
Exercise 7, the complement of Fr(S) is open, and so Fr(S) is closed.
Alternatively, let {xk } be a sequence of points of Fr(S) that converges to a. Suppose a were
not a frontier point of S. Then there would be a neighborhood B(a, r) of a lying either entirely in
S or entirely in Rn − S. Choosing k sufficiently large that xk ∈ B(a, r), we see that xk cannot be
a frontier point of S, for B(a, r) is a neighborhood of xk that fails to contain points of both S and
Rn − S.
d. Rn − S ′ is the set of interior points of Rn − S, and is therefore open. Therefore, S ′ is
closed.
e. Suppose C is a closed set containing S. Choose x ∈ Rn − C. Then, since C is closed,
there is a neighborhood B(x, r) ⊂ Rn − C ⊂ Rn − S, so x is neither an element of S nor a frontier
point of S. Thus, x ∈ Rn − S ′ . Rn − C ⊂ Rn − S ′ is equivalent to S ′ ⊂ C, as required.

2.2.9 a. Let S = (−1, 0) ∪ (0, 1) ⊂ R. Then S = [−1, 1]. 0 is an interior point of S that is not
an element of S.
b. Let S = Q ⊂ R. Then F = Fr(Q) = R, and the set of frontier points of F is empty.

2.2.10 a. If Ik = [ak , bk ], let x = sup{ak }. The set of left-hand endpoints is bounded above
(e.g., by b1 ), and so the least upper bound exists. We have ak ≤ x for all k automatically. Now, if
40 2. FUNCTIONS, LIMITS, AND CONTINUITY

x > bj for some j, then since Ik ⊂ Ij for all k > j, this means that bj is an upper bound of the set
as well, contradicting the fact that x is the least upper bound.
b. Take Ik = (0, 1/k). Then there is no point in Ik for all k ∈ N.

2.2.11 As suggested by the hint, suppose S 6= ∅ and S 6= R. Choose a ∈ S and b ∈ / S and,


without loss of generality, suppose a < b. Let α = sup{x ∈ R : [a, x] ⊂ S}. (Since b is an upper
bound of S, the set has a least upper bound.) Now, since S is closed, α ∈ S. On the other hand,
since S is open, once we know α ∈ S, there must be δ > 0 so that (α − δ, α + δ) ⊂ S, contradicting
the fact that α is an upper bound of S.

2.2.12 a. Suppose xk → a. Given ε > 0, there is K ∈ N so that whenever k > N we have


kxk − ak < ε/2. It follows from the triangle inequality that whenever k, ℓ > K, we have

ε ε
kxk − xℓ k = k(xk − a) + (a − xℓ )k ≤ kxk − ak + kxℓ − ak < + = ε,
2 2

as required.
b. Suppose {xk } is a Cauchy sequence and xkj → a. Suppose ε > 0 is given. Then, on
one hand, we have J ∈ N so that, whenever j > J, we have kxkj − ak < ε/2. On the other hand, we
have K ∈ N so that, whenever k, ℓ > K, we have kxk − xℓ k < ε/2. Choose j0 > J so that kj0 > K
(this is possible because kj → ∞). Then, whenever k > K, we have

ε ε
kxk − ak = k(xk − xkj0 ) + (xkj0 − a)k ≤ kxk − xkj0 k + kxkj0 − ak < + = ε.
2 2

Thus, xk → a.

2.2.13 Choose ε = 1. Then there is K ∈ N so that for all k, ℓ > K we have kxk − xℓ k < 1. In
particular, for all k > K we have kxk − xK+1 k < 1, so kxk k < 1 + kxK+1 k. Therefore, for all j ∈ N,

we have kxj k ≤ max kx1 k, kx2 k, . . . , kxK k, kxK+1 k + 1 .

2.2.14 a. As the hint suggests, we proceed by “successive bisection.” We construct a nested


sequence of intervals · · · ⊂ Ij ⊂ Ij−1 ⊂ · · · ⊂ I2 ⊂ I1 ⊂ [a, b], with |bj − aj | = |b − a|/2j , so that
each Ij contains infinitely many distinct xk ’s. By Exercise 10, there is a point x0 ∈ Ij for all j.
Choosing xkj ∈ Ij for each j we obtain a subsequence. On the other hand, it must converge to x0 ,
since |xkj − x0 | ≤ |b − a|/2j → 0 as j → ∞.
b. By Exercise 13, any Cauchy sequence in R is contained in some closed interval [−R, R].
By part a, it has a convergent subsequence. But then it follows from Exercise 12 that the sequence
itself must converge.
c. Suppose {xk } is a Cauchy sequence in Rn . Since |xk,i − xℓ,i | ≤ kxk − xℓ k, it follows
that for each i = 1, . . . , n, the sequence {xk,i } of ith coordinates is Cauchy as well. This means
that, by part b, each of the coordinate sequences is convergent. But then by Exercise 2, the original
sequence must converge itself.
2.3. LIMITS AND CONTINUITY 41

2.2.15 Let {xk } be a sequence of points in S. We wish to construct a convergent subsequence


(which, necessarily, will converge to a point of S, since S is closed). We proceed by induction on
n. When n = 1, we are done by part a of Exercise 14. Suppose nowthatn ≥ 2 and we know
x1
the result n−1 . We introduce some notation: given x =  ..  ∈ Rn , we write x =
  to be true in R  . 
x1 xn
 ..  n−1 . Given our sequence {x }, consider the sequence {x } of points in the rectangle
 .  ∈ R k k
xn−1
[a1 , b1 ] × · · · × [an−1 , bn−1 ] ⊂ Rn−1 . By our induction hypothesis, there is a convergent subsequence
{xkj }. Now the sequence of nth coordinates of the corresponding vectors xkj , lying in the closed
interval [an , bn ], has in turn a convergent subsequence, indexed by kj1 < kj2 < · · · < kjℓ < . . .. But
then, by Exercises 6 and 2, it now follows that the subsequence {xkjℓ } converges, as required.

2.3. Limits and Continuity

2.3.1 Suppose we were to have two limits ℓ and m for f (x) as x approaches a. Then for any
ε > 0 there would be δ1 , δ2 > 0 so that kf (x)−ℓℓ k < ε whenever 0 < kx−ak < δ1 and kf (x)−m
mk < ε
whenever 0 < kx − ak < δ2 . Then, by the triangle inequality, when 0 < kx − ak < δ = min(δ1 , δ2 ),
we would have

kℓℓ − m k = k(ℓℓ − f (x)) + (f (x) − m )k ≤ kf (x) − ℓ k + kf (x) − m k < 2ε.

Choosing ε ≤ kℓℓ − m k/2 yields a contradiction.

2.3.2 Let a ∈ Rn be arbitrary. Given any ε > 0, note that if kx − ak < ε, then (by Exercise
1.2.17)

|f (x) − f (a)| = kxk − kak ≤ kx − ak < ε.

Thus, lim f (x) = f (a), so f is continuous.


x→a

2.3.3 Because lim f (x) = ℓ = lim h(x), given any ε > 0, there are δ1 , δ2 > 0 so that |f (x) −
x→a x→a
ℓ| < ε whenever 0 < kx − ak < δ1 and |h(x) − ℓ| < ε whenever 0 < kx − a| < δ2 . Set δ = min(δ1 , δ2 ).
Then, whenever 0 < kx − ak < δ, we have

−ε < f (x) − ℓ ≤ g(x) − ℓ ≤ h(x) − ℓ < ε,

so, in particular, |g(x) − ℓ| < ε. Thus, lim g(x) = ℓ.


x→a

2.3.4 We copy the proof of second part of Theorem 3.2 with only minor alterations. Given
ε > 0, there are δ1 , δ2 > 0 so that
ε 
kf (x) − ℓ k < min , 1 whenever 0 < kx − ak < δ1
2(|c| + 1)
42 2. FUNCTIONS, LIMITS, AND CONTINUITY

and
ε
|k(x) − c| < whenever 0 < kx − ak < δ2 .
2(kℓℓ k + 1)
Note that when 0 < kx − ak < δ1 , we have (by the triangle inequality) kf (x)k < kℓℓk + 1. Now, let
δ = min(δ1 , δ2 ). Whenever 0 < kx − ak < δ, we have

kk(x)f (x) − cℓℓk = k(k(x) − c)f (x) + c(f (x) − ℓ )k ≤ |k(x) − c|kf (x)k + |c|kf (x) − ℓ k
< (kℓℓ k + 1)|k(x) − c| + |c|kf (x) − ℓk
ε ε ε ε
< (kℓℓ k + 1) + |c| < + = ε,
2(kℓℓ k + 1) 2(|c| + 1) 2 2
as required.

2.3.5 Suppose f is continuous at a and f (a) > 0. Then, given any ε > 0, there is δ > 0 so that
whenever x ∈ B(a, δ), we have |f (x) − f (a)| < ε. Choose ε = f (a). Then for every x ∈ B(a, δ), we
have 0 = f (a) − ε < f (x) < f (a) + ε, and we are done. Somewhat more generally, if f is continuous
at a and f (a) 6= 0, then there is a neighborhood of a on which f is nowhere zero.

2.3.6 Let f : R − {0} → R be given by f (x) = 1/x. It is a standard result of single-variable


calculus that f is continuous. Then, by Proposition 3.5, f ◦ g = 1/g is continuous on a neighborhood
of a on which g is nonzero (see Exercise 5).

2.3.7 a. By Proposition 3.1, T is continuous if each of its component functions Ti , i = 1, . . . , m,


is continuous. Since every linear map T : Rn → R is given by dot product with some vector in Rn ,
we infer from Example 1 that each Ti is in fact continuous.
 
A1 · x
 A2 · x 
 
b. Letting Ai denote the ith row vector of A, as usual, we have Ax =  . , so
.
 . 
Xm Xm X 
Am · x
kAxk2 = (Ai · x)2 ≤ kAi k2 kxk2 = a2ij kxk2 ,
i=1 i=1 i,j
P 1/2
and thus kAxk ≤ a2ij kxk. Taking A to be the standard matrix of T , this inequality is
obviously enough to prove that T is continuous at 0. But then linearity establishes continuity at

an arbitrary point a: since lim T (x − a) = 0, we have lim T (x) = lim T (a) + T (x − a) = T (a).
x→a x→a x→a

2.3.8 a. lim f (x) = 0, since the limit of the quotient is the quotient of the limits whenever
x→0
the limit of the denominator is nonzero.
sin u
b. lim f (x) = 1, since lim = 1.
x→0 u→0 u 
    x + y, x 6= y
x x
c. We have f = x + y whenever x 6= y. So we have f = .
y y 0, x=y
Thus, lim f (x) = 0, since 0 ≤ |f (x)| ≤ |x + y| and x + y → 0 as x → 0.
x→0
d. lim f (x) = 1, since exp is continuous at 0.
x→0
2.3. LIMITS AND CONTINUITY 43

e. lim f (x) = 0, since lim e−u = 0.


x→0 u→∞
 2  
y 0
f. lim f (x) does not exist, as lim f = 1 and lim f = 0. More interestingly,
x→0 y→0 y y→0 y
 4
y
note that f → ∞ as y → 0.
y
kxk3
g. lim f (x) = 0 since |f (x)| ≤ = kxk → 0 as x → 0.
x→0 kxk2
 
x 2
h. lim f (x) = 0. Since | sin y| ≤ |y|, we have f ≤ |x|y ≤ |x| → 0 as x → 0.
x→0 y x2 + y 2
 
2y 2
i. lim f (x) does not exist, since f = → ∞ as y → 0.
x→0 y 7|y|
   
x x
j. lim f (x) does not exist, as lim f 2 = 2 and lim f = 0. More inter-
x→0 x→0 −x + x x→0 −x
 
x
estingly, note that f 4 → ∞ as x → 0.
−x + x

2.3.9 Suppose xk → a. Since f is continuous, by Proposition 3.6, we have f (xk ) → f (a). By


definition of the sequence {xk }, this means that xk+1 → f (a), but xk+1 → a, so (by uniqueness of
the limit) we must have f (a) = a.

2.3.10 Throughout this problem we operate under the assumption that the sequences are known
to be convergent.

a. Here we have f (x) = 2x, whose only fixed points are 0 and 2. Now we claim that
xk → 2, since the sequence is increasing (by induction, 1 ≤ xk < 2, and so xk < xk+1 ).
b. Here we have f (x) = x/2 + 2/x, whose fixed points are ±2. Since xk > 0 for all k, it
follows that xk → 2.

c. Set f (x) = 1 + 1/x. Then the fixed points of f are (1 ± 5)/2. Since xk > 0 for all

k, it follows that xk → (1 + 5)/2 (the golden ratio).

d. Let f (x) = 1 + 1/(1 + x). Then f (x) = x ⇐⇒ (x − 1)(x + 1) = 1 ⇐⇒ x = ± 2.

Since xk > 0 for all k, we deduce that xk → 2.

x, x<0
2.3.11 Take f (x) = . Then for every c ∈ R, f −1 ({c}) is either one point or the
x + 1, x ≥ 0
empty set.

2.3.12 This is false. Consider f : R → R, f (x) = x2 . Then the image of the open interval
(−1, 1) is the half-open interval [0, 1), which is not an open subset of R. Even easier, take f to be
any constant function.

2.3.13 We give two different proofs, the first using Proposition 3.4, the second using Proposition
3.6. Suppose C ⊂ Rm is closed and f : Rn → Rm is continuous.
44 2. FUNCTIONS, LIMITS, AND CONTINUITY

First proof : By Proposition 2.1, Rm −C is open, and so f −1 (Rm −C) is open. But now we claim
that f −1 (Rm − C) = Rn − f −1 (C), so it follows that f −1 (C) must be closed. To establish equality
of the sets, note that x ∈ f −1 (Rm − C) ⇐⇒ f (x) ∈ Rm − C ⇐⇒ f (x) ∈ / C ⇐⇒ x ∈ / f −1 (C).
Second proof : We wish to show that f −1 (C) is closed. Suppose {xk } is a convergent sequence
of points in f −1 (C). Now, xk ∈ f −1 (C) means that f (xk ) ∈ C. By continuity, if xk → a, then
f (xk ) → f (a). Since C is closed, the limit of the convergent sequence {f (xk )} must belong to C.
Therefore, f (a) ∈ C, which means, by definition, that a ∈ f −1 (C), as we needed to establish.

2.3.14 a. The determinant is a polynomial function of the entries of the square matrix, hence
is continuous. Therefore, {A : det A 6= 0} = det−1 (R − {0}) is the preimage of an open set and is
therefore open.
b. This is an immediate consequence of Corollary 3.7, since f : Mn×n → Mn×n , f (A) =
AAT , is continuous.

2.3.15 a. f is obviously discontinuous at 0 since on any ball centered at 0 f takes on both


the values 0 and 1. On the other hand, f is identically 0 on both axes and, restricting to the line
y = mx, f is identically 0 on the interval |x| < |m|. Therefore, f is continuous at 0 on every line
through 0.

  
x 0, |y| > |x|3 or y = 0
b. By analogy with part a, set f = .
y 1, otherwise

  y 2 /x,    
x x 6= 0 x 0
2.3.16 a. Take f = . Then f = m2 x → 0 as x → 0 and f =0
y 0,x=0 mx y

for all y. But f is unbounded on the curve x = y 3 as x →


 0.
  0,
x |y| ≤ x2
b. Here are two examples. Take f = . Then f is
y (y − x2 )(x/y)4 , |y| > x2
continuous everywhere except at 0, and along the curve y = x3/2 we see that the values of f are
unbounded as x → 0.  
x
For another example, motivated by Example 5, let f = (x2 |y|)2/3 /(x4 + y 2 ), x 6= 0,
y
f (0) = 0. The function is unbounded on the parabola y = x2 .

2.3.17 The answer is that we must have α/γ + β/δ > 1. Here is the most elegant way we know
to show this. Using a “weighted” version of polar coordinates (see Example 6 in Section 1), and
working only with x ≥ 0, y ≥ 0, we take xγ/2 = r cos θ and y δ/2 = r sin θ. Then

xα y β
 
1 2 α
+ βδ −1
γ δ
= 2 r 2α/γ r 2β/δ (cos θ)2α/γ (sin θ)2β/δ ≤ r γ
.
x +y r
Thus, we see that f (x) → 0 as x → 0 whenever   α/γ + β/δ > 1. On the other hand, when
x 1
α/γ +β/δ = 1, we choose θ = π/4 and get that f = when xγ = y δ . And when α/γ +β/δ < 1,
y 2
approaching along the same curve results in arbitrarily large values of f .
2.3. LIMITS AND CONTINUITY 45

2.3.18 a. The function f : Rn → Rn , f (b) = A−1 b, is continuous (e.g., by Exercise 7).


b. When n = 1 and n = 2, we know that taking the inverse matrix is a continuous
function on the (open) subset of invertible matrices in Mn×n (see the explicit formula in Example
9 of Chapter 1, Section 4 for the case n = 2). We will learn in Chapters 6 and 7 that this is true for
arbitrary n. The standard notation for the subset of invertible n × n matrices is GL(n). Then the
A
function F : GL(n) × Rn → Rn , F = A−1 b, is the composition of continuous functions and is
b
therefore continuous: F = f ◦ g, where
g f
GL(n) × Rn −→ Mn×n × Rn −→ Rn
  " −1 #  
A A A
g = , f = Ab.
b b b
CHAPTER 3
The Derivative
3.1. Partial Derivatives and Directional Derivatives
∂f ∂f
3.1.1 a. = 3x2 + 3y 2 , = 6xy − 2.
∂x ∂y
∂f x ∂f y
b. =p , =p .
∂x x + y ∂y
2 2 x + y2
2

∂f y ∂f x
c. =− 2 2
, = 2 .
∂x x + y ∂y x + y2
∂f 2 2 ∂f 2 2
d. = −2xe−(x +y ) , = −2ye−(x +y ) .
∂x ∂y
∂f ∂f
e. = log x + 1 + y 2 /x, = 2y log x.
∂x ∂y
∂f ∂f
f. = yexy z 2 − y sin(πyz), = xexy z 2 − x sin(πyz) − πxyz cos(πyz),
∂x ∂y
∂f
= 2exy z − πxy 2 cos(πyz).
∂z
f (a + tv) − f (a) (2 + t)2 + (2 + t)(1 − t) − 6 3t
3.1.2 a. Dv f (a) = lim = lim = lim = 3.
t→0 t t→0 t t→0 t

f (a + tv) − f (a) (2 + √12 t)2 + (2 + √12 t)(1 − √12 t) − 6


b. Dv f (a) = lim = lim
t→0 t t→0 t
√3 t
2 3
= lim = √ .
t→0 t 2
f (a + tv) − f (a) (1 + 4t)e−3t − 1
c. Dv f (a) = lim = lim = ϕ′ (0) where ϕ(t) =
t→0 t t→0 t
(1 + 4t)e−3t . By the product rule, ϕ′ (t) = e−3t (4 − 3(1 + 4t)) = e−3t (1 − 12t), so ϕ′ (0) = 1.
3
f (a + tv) − f (a) (1 + 45 t)e− 5 t − 1
d. Dv f (a) = lim = lim = ϕ′ (0) where ϕ(t) =
3
t→0 t 3
t→0 t 3 3
(1+ 54 t)e− 5 t . By the product rule, ϕ′ (t) = e− 5 t ( 54 − 35 (1+ 45 t)) = e− 5 t ( 45 − 53 (1+ 45 t)) = e− 5 t ( 15 − 25
12
t),

so ϕ (0) = 1/5.

a.
Let ϕ(t) = 2 ′
3.1.3  f (a+tv)
  = (2+tv1 ) +(2+tv1 )(1+tv2 ).Then Dv f (a) = ϕ (0) = 5v1 +2v2 .
5 v 1 5
Note that 5v1 + 2v2 = · 1 is largest when v = √ .
2 v2 29 2    
b. Let ϕ(t) = (1 + tv )e−tv1 . Then D f (a) = ϕ′ (0) = −v + v = −1 · v1 is largest
  2 v 1 2
1 v2
1 −1
when v = √ .
2 1

46
3.1. PARTIAL DERIVATIVES AND DIRECTIONAL DERIVATIVES 47

1 1 1
c. Let ϕ(t) = + + . Then Dv f (a) = ϕ′ (0) = −v1 − v2 − v3 =
    1 + tv 1 −1 + tv
 2 1 + tv 3
−1 v1 1
 −1  ·  v2  is largest when v = − √1  1 .
−1 v 3 1
3

f (a − tv) − f (a) f (a + sv) − f (a) f (a + sv) − f (a)


3.1.4 D−v f (a) = lim = lim = − lim =
t→0 t s→0 −s s→0 s
−Dv f (a).

3.1.5 a. Since D−v f (a) = −Dv f (a), we cannot have Dv f (a) > 0 for all nonzero v.
b. Let b 6= 0 and f (x) = b · x. Then Db f (a) = kbk2 for all a.
     
V nRT p nRT p pV ∂f
3.1.6 We have p = f = , V =g = , and T = h = , so =
T V T p V nR ∂V
nRT ∂g nR ∂h V
− 2 , = , and = . Thus,
V ∂T p ∂p nR
∂f ∂g ∂h nRT nR V
· · =− 2 · · = −1.
∂V ∂T ∂p V p nR
(See Exercise 6.2.8 for the general result.)

3.1.7 By the single-variable chain rule, we have


     
∂g x 1 ∂g x x
= f′ · and = f′ · − 2 ,
∂x y y ∂y y y
  
∂g ∂g x x x
so x +y = f′ − = 0.
∂x ∂y y y y

3.1.8 By the single-variable chain rule, we have


∂g p x ∂g p y
= f ′ ( x2 + y 2 ) · p and = f ′ ( x2 + y 2 ) · p ,
∂x x2 + y 2 ∂y x2 + y 2
∂g p xy ∂g
so y = f ′ ( x2 + y 2 ) · p =x .
∂x 2
x +y 2 ∂y
   
x 0
3.1.9 Since f =0=f for all x, y, both partial derivatives of f at 0 are 0. When
0 y
 
v1 v1 v2
v= , f (tv) = 2 for all t 6= 0, so f is discontinuous at 0 along the line spanned by v
v2 v1 + v22
whenever v1 v2 6= 0. Consequently, for such v, the directional derivative Dv f (0) does not exist.

3.1.10 a. Note first that De1 f (0) = 0, since f is identically 0 on the x-axis. If v2 6= 0,
t3 v12 v2
f (tv) − f (0) t4 v4 +t2 v22 v 2 v2 v2
lim = lim 1 = lim 2 1 2 4 = 1 .
t→0 t t→0 t t→0 v2 + t v1 v2
 
x x3 y
b. f = 6 , x 6= 0, f (0) = 0. A similar calculation to that in part a shows that
y x + y2
Dv f (0) = 0 for all v. And, approaching along y = x3 , we see that f is discontinuous.
48 3. THE DERIVATIVE

T (a + tv) − T (a) T (a) + tT (v) − T (a)


3.1.11 Dv T (a) = lim = lim = T (v).
t→0 t t→0 t

f (A + tB) − f (A) (A + tB)T − AT AT + tB T − AT


3.1.12 a. DB f (A) = lim = lim = lim =
t→0 t t→0 t t→0 t
B T.
f (A + tB) − f (A) tr(A + tB) − trA trA + t trB − trA
b. DB f (A) = lim = lim = lim =
t→0 t t→0 t t→0 t
trB.
f (A + tB) − f (A) (A + tB)2 − A2 t(AB + BA) + t2 B 2
3.1.13 a. DB f (A) = lim = lim = lim =
t→0 t t→0 t t→0 t
AB + BA.
f (A + tB) − f (A) (A + tB)T (A + tB) − AT A
b. DB f (A) = lim = lim
t→0 t t→0 t
t(B T A + AT B) + t2 B T B
= lim = B T A + AT B.
t→0 t

3.2. Differentiability

3.2.1 Note that every function here is C1 , so we know it is differentiable and can apply Propo-
sition 2.1. h i
a. [Df (a)] = 2e−2 −e−2 , so the equation of the tangent plane is

z = e−2 + e−2 2(x + 1) − (y − 2) = e−2 (2x − y + 5).
h i
b. [Df (a)] = −2 4 , so the equation of the tangent plane is
z = 5 + (−2)(x + 1) + 4(y − 2) = −2x + 4y − 5.
h i
c. [Df (a)] = 3/5 4/5 , so the equation of the tangent plane is
z = 5 + (3/5)(x − 3) + (4/5)(y − 4) = 53 x + 45 y.
h √ √ i
d. [Df (a)] = −1/ 2 −1/ 2 , so the equation of the tangent plane is
√ √ √
z = 2 + (−1/ 2)(x − 1) + (−1/ 2)(y − 1) = √12 (−x − y + 4).
h i
e. [Df (a)] = 6 3 2 , so the equation of the tangent plane is
w = 6 + 6(x − 1) + 3(y − 2) + 2(z − 3) = 6x + 3y + 2z − 12.
h i
f. [Df (a)] = −1 1 1 , so the equation of the tangent plane is
w = 1 + (−1)(x − 1) + 1(y − 0) + 1(z + 1) = −x + y + z + 3.
1
3.2.2 h √ here is√C i, it is differentiable, and we can apply Proposition 2.3.
Since every function
a. [Df (a)] = 1/ 2 −1/ 2 , so Dv f (a) = Df (a)v = 0.
h √ √ i √
b. [Df (a)] = 1/ 2 −1/ 2 , so Dv f (a) = Df (a)v = 2.
h i
c. [Df (a)] = 1 6 , so Dv f (a) = Df (a)v = 13.
h i √
d. [Df (a)] = 4 2 , so Dv f (a) = Df (a)v = 2 5.
3.2. DIFFERENTIABILITY 49
h √ √ i
e. [Df (a)] = 2/ 5 1/ 5 , so Dv f (a) = Df (a)v = 1.
h i
f. [Df (a)] = e −e −e , so Dv f (a) = Df (a)v = −e.
" #
y x
3.2.3 a.
2x 2y
 
− sin t
 
b.  cos t 
et
" #
cos t −s sin t
c.
sin t s cos t
" #
yz xz xy
d.
1 1 2z
 
cos y −x sin y
 
e.  sin y x cos y 
0 1

3.2.4 We useh the function


i
f and the approach of Example 4, now setting a = 216 and b = 6.
a
Then Df = 1/6 −6 and
b
    " #
224 216
h i 8 4
1
f ≈f + 6 −6 = 36 + − 3 ≈ 34.33.
6.5 6 1/2 3

(To two decimal places, the correct value is 34.46.)


 
a
3.2.5 The area of a triangle with sides a and b and included angle θ is given by f  b  =
θ
 
3 h√ √ i
1
ab sin θ. Then Df  4  = 3 3 3/4 3 . Since a small change of h in the variable xi results
2
π/3
∂f
in a change of approximately (a)h, we see that the area is, at the given dimensions, most
∂xi
sensitive to a change in θ.

3.2.6 If f is differentiable at a, then we have a linear map Df (a) so that


f (a + h) − f (a) − Df (a)h
lim = 0. Then each component of this vector function must also ap-
h→0 khk
proach 0, and that tells us that for each i = 1, . . . , m there is a linear map Ti (the ith component
fi (a + h) − fi (a) − Ti (a)h
of Df (a)) so that lim = 0. Thus, fi is differentiable at a. Conversely,
h→0 khk
if each
 fi  is differentiable at a, with derivative Ti = Dfi (a), then it follows that the linear map
T1
 
T =  ...  satisfies the definition of the derivative of f at a.
Tm
50 3. THE DERIVATIVE

3.2.7 Suppose T : Rn → Rm is linear. We claim that for any a ∈ Rn DT (a) = T . It suffices


T (a + h) − T (a) − T (h)
to observe that T (a + h) − T (a) − T (h) = 0, and so = 0, so certainly the
khk
requisite limit exists and is 0.
 
x p
3.2.8 2 2 2
The nappes of the cone z =x + y are the graphs of z = ±f = ± x2 + y 2 . The
y
a
a b
tangent plane of this graph at a =  b  is given by z − c = (x − a) + (y − b), or, simplifying,
c c
c
cz = ax + by. To find the intersection of this plane and the cone, we solve the system of equations

z 2 = x2 + y 2 and cz = ax + by.

By algebra, we obtain (ax + by)2 = c2 (x2 + y 2 ), and so (bx − ay)2 = 0. This means that we have
y =bx/a and z = cx/a, and so the intersection is the line through the origin with direction vector

a
 b . (This is, of course, the generator of the cone passing through a.)
c  
a
3.2.9 The tangent plane of the surface z = xy at a =  b  is given by z−c = b(x−a)+a(y−b),
c
or, simplifying, z = bx + ay − ab. To find the intersection of this plane and the original surface, we
solve the system of equations

z = xy and z = bx + ay − ab.

Solving, we obtain xy− − ay + ab= (x − a)(y − b) = 0,so theintersection


 bx   consists of two lines:
 a   x 
the first is given by  y  : y ∈ R and the second by  b  : x ∈ R .
   
ay bx
 x y 
" # p −p
2x −2y p  x2 + y 2 x2 + y 2 
3.2.10 Df (a) = = 2 x2 + y 2  y x . (Note that the indi-
2y 2x p p
x2 + y 2 x2 + y 2
| {z }

cated matrix is a rotation matrix because it is orthogonal with positive determinant. There is a
p p
unique θ ∈ [0, 2π) with cos θ = x/ x2 + y 2 and sin θ = y/ x2 + y 2 .)
h i  
a
3.2.11 a. We establish that the linear map T = 2a 2b is the derivative of f at a = :
b

f (a + h) − f (a) − T (h) (a + h)2 + (b + k)2 − (a2 + b2 ) − 2(ah + bk)


lim = lim
h→0 khk h→0 khk
h2 + k 2
= lim = lim khk = 0.
h→0 khk h→0
3.2. DIFFERENTIABILITY 51

h i  
a
b. We establish that the linear map T = b2 2ab is the derivative of f at a = :
b

f (a + h) − f (a) − T (h) (a + h)(b + k)2 − (ab2 ) − (b2 h + 2abk)


lim = lim
h→0 khk h→0 khk
2bhk + (a + h)k 2
= lim √ = 0,
h→0 h2 + k 2
|k| √ √
inasmuch as √ ≤ 1, and so hk/ h2 + k 2 and hk 2 / h2 + k 2 both approach 0.
h + k2
2

c. We establish that the linear map T = 2aT is the derivative of f at a:


f (a + h) − f (a) − T (h) ka + hk2 − kak2 − 2a · h khk2
lim = lim = lim = 0,
h→0 khk h→0 khk h→0 khk

as required.
∂f ∂f
3.2.12 Since f = 0 on the coordinate axes, it is clear that (0) = (0) = 0. On the other
∂x ∂y
hand,
∂f 2xy(y 2 − x4 ) ∂f x2 (x4 − y 2 )
= and = ,
∂x (x4 + y 2 )2 ∂y (x4 + y 2 )2
   
∂f x ∂f x
from which we see that → 2 as x → 0 and → ∞ as x → 0. Either of these is
∂x x ∂y 0
sufficient to establish that f cannot be C1 at 0.
2
3.2.13 Throughout this exercise, we identify n × n matrices with vectors in Rn . We have
(A + B)2 − A2 − (AB + BA) B2
lim = lim = O, since kB 2 k ≤ kBk2 . The latter statement
B→O kBk B→O kBk
follows from the Cauchy-Schwarz inequality, quite like Exercise 2.3.7: By the definition of matrix
P
n Pn
product, kB 2 k2 = kBbj k2 ≤ kBk2 kbj k2 = kBk4 .
j=1 j=1
Similarly, in the case of the second function, we have
(A + B)T (A + B) − AT A − (BAT + AB T ) B TB
lim = lim = O.
B→O kBk B→O kBk
P P
In this case, we have kB T Bk2 = (Bi · bj )2 ≤ kBi k2 kbj k2 = kBk4 .
i,j i,j

3.2.14 a. Of course, it is clear that f is C1 , hence differentiable. But, arguing directly, we have
f (a + h) − f (a) − (Aa · h + Ah · a) h · Ah
lim = lim = 0,
h→0 khk h→0 khk

since |h · Ah| ≤ khkkAhk ≤ kAkkhk2 .


b. When A = AT , we have Df (a)h = Aa · h + Ah · a = Aa · h + a · AT h = 2Aa · h.

3.2.15 Since f is differentiable at a, by Proposition 2.3 it suffices to check that Dv f (a) = 0 for
every v. This is a standard result from single-variable calculus: Letting ϕ(t) = f (a + tv), we are
told that 0 is a local maximum point of ϕ, and so ϕ′ (0) = Dv f (a) = 0. (See the proof of Lemma
2.1 of Chapter 5.)
52 3. THE DERIVATIVE
     
x x a
3.2.16 Choose x = ∈ B(a, δ). Write x = and a = . Then
y y b
         
x a x x
f (x) − f (a) = f −f + f −f
b b y b
   
∂f ξ ∂f x
= (x − a) + (y − b)
∂x b ∂y η

for some ξ between a and x and some η between b and y. Since Df = O on B(a, δ), it follows that
∂f ∂f
= = 0 on B(a, δ), and so the right-hand side vanishes. Therefore f (x) − f (a) = 0 for all
∂x ∂y
x ∈ B(a, δ), as required.

3.2.17 a. Since |f (x)| ≤ |y| ≤ kxk, it follows that f is continuous at 0.


b. Since f vanishes on the coordinate axes, both partial derivatives of f at 0 must be 0.
If f were to be differentiable at 0, its derivative would therefore have to be the zero map, and we
f (h) − f (0)
see if this fits the definition: Is lim = 0? The answer is no. If we approach along the
  h→0 khk
t t 1
diagonal, with h = , then we have lim √ = √ 6= 0.
t t→0 t 2 2

3.2.18 a. We claim that every directional derivative of f at 0 is 0: For any v ∈ R2 , v 6= 0, we


have
1 t7 v1 v26 t2 v1 v26
Dv f (0) = lim = lim = 0.
t→0 t t4 v14 + t8 v28 t→0 v14 + t4 v28

b. f is not continuous at 0, as along the curve x = y 2 the value of f is identically 1/2


(except at the origin).
c. f cannot be differentiable at 0 since it is not continuous at 0. (See Proposition 2.2.)

3.2.19 a. See Exercise 3.1.10, part a. f is discontinuous at 0 and therefore cannot be differen-
tiable at 0. (See Proposition 2.2.)
b. See Exercise 3.1.10, part b.

   y px2 + y 2 , x 6= 0
x
c. Take f = |x| . Then evidently De2 f (0) = 0. For any v with
y 
0, x=0
v1 6= 0, we have
tv2
|t|kvk
|t||v1 | v2
Dv f (0) = lim = kvk.
t→0 t |v1 |
 
y4
On the other hand, we see that f → ∞ as y → 0.
y
d. Here are two such functions:
  
   y 4   
 x6 y 2
x (x − y 4 )2 , x > y 4 x , x 6= 0
f = x and g = (x12 + y 4 )(x2 + y 2 ) .
y 0, x ≤ y4 y 
0, x=0
3.3. DIFFERENTIATION RULES 53

(The tricky issue


 3 with f is to make sure it has directionalderivatives
 at every point along x = y 4 .)
y x 1
Note that f → ∞ as y → 0+ . Similarly, we have g 3 = → ∞ as x → 0.
y x 2(x + x6 )
2

3.3. Differentiation Rules


     
cos t − sin t 1 1
3.3.1 We have g′ (t) =  1 , so g′ (0) =  1 . Also note that g(0) =  1 . By the
2t + 4 4 −1
 
h i 1
chain rule, (f ◦ g)′ (0) = Df (g(0))g′ (0) = 2 1 −1  1  = −1.
4
   
  − cos x 2 x
" #
x  x+3y  3 1 −1
3.3.2 We have Df =  e 3ex+3y  and Dg y  = . Thus, by the
y 2 1 z y
y x + 3y z
chain rule, we have
   
  −1 2 " # −1 −1 1
0   3 1 −1  
D(f ◦ g)(0) = Df Dg(0) =  e3 3e3  =  6e3 e3 −e3 
1 1 0 0
1 3 6 1 −1

and
   
0
" # −1 2 " #
3 1 −1   −2 9
D(g◦ f )(0) = Dg 1 Df (0) =  1 3 = .
1 0 1 −1 2
0 0 0

3.3.3 By the chain rule, we have


 
h i − sin t
 
(f ◦ g)′ (t) = Df (g(t))g′ (t) = 2 cos t + 1 sin t 2 sin(t/2)  cos t 
cos(t/2)

= 2 (cos t + 1)(− sin t) + sin t(cos t) + 2 sin(t/2) cos(t/2) = 2(− sin t + sin t) = 0.

We conclude that f ◦ g is a constant function. That is, the parametrized curve g lies on the sphere
x2 + y 2 + z 2 + 2x = 3.
∂f
3.3.4 a. Let f : R3 → R be given by f (x) = kxk. Then = xi /kxk, so Df (x) =
∂xi  
 
3 0
1 h i 1 h i
 
x y z . We have (f ◦ g)′ (2π) = Df  0  g′ (2π) = √ 3 0 10π  3 
kxk 9 + 100π 2
10π 5
50π
=√ .
9 + 100π 2
54 3. THE DERIVATIVE
h i
b. Now we have Df (x) = y x 2z , so
√ 
  
−3/ 2 h i − √32
√  √3  75π
(f ◦ g)′ (3π/4) = Df  3/ 2  g′ (3π/4) = √3 − √32 15π
− 2  = .
2 2
2
15π/4 5

3.3.5 Using a coordinate system (in miles) centered at the radar tower, with x > 0 to the east,
y > 0 to the north, and z > 0 upwards, let f (x) = kxk denote the distance
 from x to the tower.
−3
Denote by g(t) the location of the plane at time t; suppose that g(t0 ) =  0 .
4
 
450
  1 T
a. We are told that g′ (t
0 ) =  0 . Then, since Df (x) = x , we have
kxk
5
 
h i 450
1  
(f ◦ g)′ (t0 ) = −3 0 4  0  = −266.
5
5
That is, the plane is approaching the tower at the rate of 266 mph.
 √ 
225 2

b. Now we are told that g′ (t 0 ) =  225 2 , so
5
√ 

225 2
1h 
i √  √
(f ◦ g)′ (t0 ) = −3 0 4  225 2  = −135 2 + 4 ≈ −186.9.
5
5
That is, the plane is approaching the tower at the rate of approximately 186.9 mph.
  " # " #
V T V (t) 10
3.3.6 We have p = f = . Let g(t) = . We are given g(t0 ) = and
T V T (t) 300
" #
1
g′ (t0 ) = . (V is measured in liters (l), T is measured in ◦ K, p in atmospheres (atm), and
5
  h i
V
time t is measured in minutes.) Then we have Df = −T /V 2 1/V . So, by the chain rule,
T
" #
h i 1
(f ◦ g)′ (t0 ) = Df (g(t0 ))g′ (t0 ) = −3 1/10 = −2.5 atm/min.
5
    h " #
V V V
i V (t)
3.3.7 We have I = f = . Then Df = 1/R −V /R2 . Assuming g(t) =
R R R R(t)
" # " #
10 −0.1
is differentiable, g(t0 ) = and g′ (t0 ) = , then by the chain rule, I ′ (t0 ) = (f ◦ g)′ (t0 ) =
100 0.5
" #
h i −0.1

Df (g(t0 ))g (t0 ) = 0.01 −0.001 = −0.0015 amp/sec.
0.5
3.3. DIFFERENTIATION RULES 55

3.3.8 We know from single-variable calculus that for the function r(x) = 1/x, r ′ (x) = −1/x2 .
1
Then, by the chain rule, D(r ◦ g)(a) = r ′ (g(a))Dg(a) = − Dg(a), as required.
g(a)2

3.3.9 We have

(f · g)(a + h)−(f · g)(a) − Df (a)h · g(a) + f (a) · Dg(a)h

= f (a + h) · g(a + h) − f (a) · g(a) − Df (a)h · g(a) + f (a) · Dg(a)h
= (f (a + h) · g(a + h) − f (a + h) · g(a)) + (f (a + h) · g(a) − f (a) · g(a)) −

Df (a)h · g(a) + f (a) · Dg(a)h
 
= f (a + h) · g(a + h) − g(a) − Dg(a)h + f (a + h) − f (a) − Df (a)h · g(a)
+ (f (a + h) − f (a)) · Dg(a)h.

Therefore, by Cauchy-Schwarz and the triangle inequality,



(f · g)(a + h) − (f · g)(a) − Df (a)h · g(a) + f (a) · Dg(a)h

khk
kg(a + h) − g(a) − Dg(a)hk kf (a + h) − f (a) − Df (a)hk
kf (a + h)k + kg(a)k
khk khk
+ kf (a + h) − f (a)kkDg(a)k,

and all three terms go to 0 as h → 0: the first, by differentiability of g; the second, by differentia-
bility of f ; and the last, by continuity of f .

3.3.10 Perhaps the easiest proof comes from writing everything out in coordinates and using the
product rule for scalar-valued functions. But we can copy the proof given in Exercise 9, substituting
cross product for dot product (and being careful about order). Note that it follows from Proposition
5.1 of Chapter 1 that kx × yk ≤ kxkkyk. We have

(f × g)(a + h)−(f × g)(a) − Df (a)h × g(a) + f (a) × Dg(a)h

= f (a + h) × g(a + h) − f (a) × g(a) − Df (a)h × g(a) + f (a) × Dg(a)h
= (f (a + h) × g(a + h) − f (a + h) × g(a)) + (f (a + h) × g(a) − f (a) × g(a)) −

Df (a)h × g(a) + f (a) × Dg(a)h
 
= f (a + h) × g(a + h) − g(a) − Dg(a)h + f (a + h) − f (a) − Df (a)h × g(a)
+ (f (a + h) − f (a)) × Dg(a)h.

Therefore, by our earlier comment and the triangle inequality,



(f × g)(a + h) − (f × g)(a) − Df (a)h × g(a) + f (a) × Dg(a)h

khk
kg(a + h) − g(a) − Dg(a)hk kf (a + h) − f (a) − Df (a)hk
kf (a + h)k + kg(a)k
khk khk
+ kf (a + h) − f (a)kkDg(a)k,
56 3. THE DERIVATIVE

and all three terms go to 0 as h → 0, just as before.

3.3.11 As the hint suggests, we fix x 6= 0 and consider the function h(t) = t−k f (tx). Assume
first that f is homogeneous of degree k. Then h is a constant function, and so, by the product rule
and chain rule, we have 0 = h′ (t) = −kt−k−1 f (tx) + t−k Df (tx)x. In particular, setting t = 1, we
obtain Df (x)x = kf (x), as required.
Conversely, suppose Df (x)x = kf (x) for all nonzero x. Then it follows that for any t > 0, we
have tDf (tx)x = Df (tx)(tx) = kf (tx). Thus,

h′ (t) = −kt−k−1 f (tx) + t−k Df (tx)x = −kt−k−1 f (tx) + t−k−1 kf (tx) = 0,

and so h(t) = h(1) = f (x) for all t. Therefore, f (tx) = tk f (x) for all t > 0.

3.3.12 Recall that it is a consequence of the Mean Value Theorem that a continuous function
on a closed interval with zero derivative on that interval is a constant function. Fix a ∈ U and
let b be arbitrary. Let g : [0, 1] → Rn be given by g(t) = a + t(b − a), so g parametrizes the line
segment from a to b. Then (f ◦ g)′ (t) = Df (g(t))g′ (t) = 0 for all t, inasmuch as Df (x) = O for all
x ∈ U . Thus, fi ◦ g is a constant function for each i = 1, . . . , m, and so f ◦ g is a constant function.
That is, f (b) = f (a); since b is arbitrary, f (b) = f (a) for all b ∈ U , so f is a constant function.
The same proof shows that the result holds whenever we can join b to a by any differentiable
path g, and, therefore, by extension, by any piecewise-differentiable path.
 
2 x
3.3.13 Let g : R − {0} → R be given by g(x) = kxk. Then h = f ◦ g and Dh =
y
h i ∂h ∂h
f ′ (r) x/r y/r . Therefore, x +y = (x2 + y 2 )f ′ (r)/r = rf ′ (r).
∂x ∂y
  Z v  
2 u 2 u(t)
3.3.14 Define f : R → R by f = h(s)ds and g : (a, b) → R by g(t) = . The
v u v(t)
Fundamental Theorem of Calculus tells us that, since h is continuous, f is C1 and hence differen-
∂f ∂f
tiable, with = −h(u) and = h(v). Then F = f ◦ g is differentiable, and we have
∂u ∂v
" #
h i u′ (t)
′ ′
F (t) = Df (g(t))g (t) = −h(u(t)) h(v(t)) = −h(u(t))u′ (t) + h(v(t))v ′ (t).
v ′ (t)
  " #
u u+v
3.3.15 Letting g = , we have F = f ◦ g, and
v u−v

          " #
u ∂f u ∂f u 1 1
DF = g g
v ∂x v ∂y v 1 −1
             
∂f u ∂f u ∂f u ∂f u
= g + g g − g .
∂x v ∂y v ∂x v ∂y v
3.4. THE GRADIENT 57

Thus,
      !       !
∂F ∂F u ∂f ∂f u ∂f u ∂f u
= g + g g − g
∂u ∂v v∂x ∂y v ∂x v ∂y v
 2     2   
∂f u ∂f u
= g − g .
∂x v ∂y v
  " #
r r cos θ
3.3.16 Letting g = , we have F = f ◦ g and
θ r sin θ

         " #
r ∂fu ∂f u cos θ −r sin θ
DF = g g
θ ∂x v ∂y v sin θ r cos θ
         
∂f r cos θ ∂f r cos θ ∂f r cos θ ∂f r cos θ
= cos θ + sin θ −r sin θ + r cos θ ,
∂x r sin θ ∂y r sin θ ∂x r sin θ ∂y r sin θ
so
 2  2     !2
∂F 1 ∂F ∂f r cos θ ∂f r cos θ
+ 2 = cos θ + sin θ
∂r r ∂θ ∂x r sin θ ∂y r sin θ
    !2
∂f r cos θ1 ∂f r cos θ
−r + 2 sin θ + r cos θ
∂x r sin θ
r ∂y r sin θ
 2   2   
∂f r cos θ ∂f r cos θ
= + .
∂x r sin θ ∂y r sin θ
" #   " #
x u u
3.3.17 As the hint suggests, let =g = . Letting F = f ◦ g, we find that
t v (v − u)/c
∂F
= 0, which means that F is independent of u and therefore a function just of v. That is,
∂u   
u x
F = h(v) for some function h. But v = x + ct, so f = h(x + ct) for some (differentiable)
v t
function h.

3.4. The Gradient

3.4.1 We use the fact that ∇f (a) is the normal to the tangent line at a of the level curve of
f passing through a." # " #
3x2 1
a. ∇f = 2
, so ∇f (a) = 3 . Therefore, the tangent line is given by
3y 4
" # " #
1 x−1
0= · = (x − 1) + 4(y − 2), i.e., x + 4y = 9.
4 y−2
58 3. THE DERIVATIVE
" # " #
3y 2 + yexy 4
b. ∇f = xy
, so ∇f (a) = . Therefore, the tangent line is
6xy + xe − π cos(πy) π
given by " # " #
4 x
0= · = 4x + π(y − 1), i.e., 4x + πy = π.
π y−1
" # " #
3x2 + y 2 4
c. ∇f = 3
, so ∇f (a) = . Therefore, the tangent line is given by
2xy − 4y 2
" # " #
4 x−1
0= · = 4(x − 1) + 2(y + 1), i.e., 2x + y = 1.
2 y+1

3.4.2 We use the fact that ∇f (a) is the normal to the tangent plane at a of the level surface
of f passing througha.   
2x 1
   
a. ∇f =  2y , so ∇f (a) = 2  0 . Therefore, the tangent plane is given by
2z 2
   
1 x−1
   
0 =  0  ·  y  = (x − 1) + 2(z − 2), i.e., x + 2z = 5.
2 z−2
   
2yexy z 5 4
 2 xy 5   
b. ∇f =  z + 2xe z , so ∇f (a) =  1 . Therefore, the tangent plane is given
2yz + 10exy z 4 14
by
   
4 x
   
0 =  1  ·  y − 2  = 4x + (y − 2) + 14(z − 1), i.e., 4x + y + 14z = 16.
14 z−1
   
3x2 + z 2 3
 2   
c. ∇f =  2yz + 3y , so ∇f (a) =  3 . Therefore, the tangent plane is given by
2xz + y 2 1
   
3 x+1
   
0 =  3  ·  y − 1  = 3(x + 1) + 3(y − 1) + z, i.e., 3x + 3y + z = 0.
1 z
   
2e2x+z cos(3y) − y 2
 2x+z   
d. ∇f =  −3e sin(3y) − x , so ∇f (a) =  1 . Therefore, the tangent plane is
e2x+z cos(3y) + 1 2
given by
   
2 x+1
   
0 =  1  ·  y  = 2(x + 1) + y + 2(z − 2), i.e., 2x + y + 2z = 2.
2 z−2
3.4. THE GRADIENT 59

3.4.3 See the map below.

3.4.4 a. We know that moving from the point a ∈ R2 in the direction of v = ∇f (a) will result
in the greatest rate of increase of f , hence in the steepest ascent up the hillside. That rate will be
Dv f (a) = ∇f (a) · v = k∇f (a)k2 = 25. This gives the “rise”
 corresponding to the “run” ∇f (a),
3
and so a vector of steepest ascent up the hillside is  −4 .
25

25e3
rise

∇f (a)
run

b. If the stream flows in the e2 direction, the rate at which its elevation changes is
De2 f (a) = −4, so the stream bed makes an angle of − arctan 4 with the horizontal (“rise/run” =
−4/1).

3.4.5 Proceeding as in Example 3, let f1 (x) = kx − ak, f2 (x) = kx − bk, f = f1 + f2 , and let
the ladybug’s position as a function of time be given by g(t). At time t = t0 , we have g(t0 ) = x0
and g′ (t0 ) = v. Then, by the chain rule, we have
 
′ ′ x0 − a x0 − b π √
(f ◦ g) (0) = Df (x0 )g (t0 ) = ∇f (x0 ) · v = + · v = −2kvk cos = −5 2.
kx0 − ak kx0 − bk 4

Thus, the sum of the ladybug’s distances from a and b decreases at a rate of 5 2 units/sec.

3.4.6 Since (f ◦ g)(t) = c for all t ∈ (−ε, ε), we have (f ◦ g)′ (t) = 0 for all t ∈ (−ε, ε). In
particular, 0 = (f ◦ g)′ (0) = ∇f (a) · g′ (0). Since g′ (0) is the direction vector for the tangent line to
C at a, the conclusion follows. (That C can be so parametrized, with g′ (0) 6= 0, is a consequence
of the implicit function theorem. See Section 5 of Chapter 4 and Sections 2 and 3 of Chapter 6.)
60 3. THE DERIVATIVE
     
x −c c
3.4.7 Let P = , F1 = , and F2 = . We have
y 0 0

−−→ −−→ p p
kF1 P k + kF2 P k = (x + c)2 + y 2 + (x − c)2 + y 2 = 2a
m
p p
(x + c)2 + y 2 = 2a − (x − c)2 + y 2
m
p
(x + c)2 + y 2 = 4a2 − 4a (x − c)2 + y 2 + (x − c)2 + y 2
m
p
4a (x − c)2 + y 2 = 4a2 − 4cx
m (using cx ≤ ca < a2 for ⇑)

a2 (x − c)2 + y 2 = (a2 − cx)2
m
a2 (x2 − 2cx + c2 + y 2 ) = a4 − 2a2 cx + c2 x2
m
(a2 − c2 )x2 + a2 y 2 = a2 (a2 − c2 )
m
x2 y2
+ = 1, where a2 − c2 = b2 .
a2 b2
−−

3.4.8  of points P so that kF P k is equal
A parabola with focus F and directrix ℓ is the locus
0
to the distance from P to ℓ. For concreteness, let F = and ℓ = {x ∈ R2 : x2 = −c}. Then
c
−→
the parabola is a level set of the function f (x) = kF xk − x2 . If T is the unit tangent vector to the
parabola at x, then we have

−→ ! −→
Fx Fx
0= −→ − e2 · T = −→ · T − e2 · T = cos α − cos β,
kF xk kF xk

−→
where α is the angle between T and F x and β is the angle between T and the vertical. This means
that the light ray emanating from the focus and the vertical make equal angles with the parabola,
as required.

3.4.9 The crucial fact that’s needed here is the following: If P is external to a circle, the two
line segments from P tangent to the circle have equal length. The plane containing F1 , P , and
Q1 intersects the smaller sphere in a circle, and we observe that the line segments F1 P and Q1 P
are both tangent to that circle, the former because the sphere is tangent to the shaded plane, the
−−→ −−→
second because it is tangent to the cone at Q1 . Thus, kF1 P k = kQ1 P k. Similarly, using the larger
3.4. THE GRADIENT 61

−−→ −−→
inscribed sphere, we obtain kF2 P k = kQ2 P k. Therefore,
−−→ −−→ −−→ −−→
kF1 P k + kF2 P k = kQ1 P k + kQ2 P k = const,

inasmuch as the distance along generators from one horizontal slice of the cone to another is the
same for all the generators.
 
2
3.4.10 a. We are told that ∇f is everywhere a scalar multiple of the vector . Thus, the
1
level curves must be lines orthogonal to that vector, i.e., lines of the form 2x + y = c, c ∈ R.
To verify that our statement is correct, we check directly that fis constant along any such line.
t
Choose a parametrization g of any such line, e.g., g(t) = . Then
c − 2t
∂f ∂f
(f ◦ g)′ (t) = ∇f (g(t)) · g′ (t) = (g(t)) − 2 (g(t)) = 0,
∂x ∂y
as required.
     
0 x 0
b. Set F (s) = f . Then F is differentiable and f =f = F (2x + y).
s y 2x + y
  " #
x −y
3.4.11 a. We are told that ∇f · = 0 everywhere. Since ∇f is orthogonal to the
y x
   
x −y
level curves of f , it follows that the level curve through must be tangent to . It is not
y x
difficult to see that that level curve must be a circle centered at the origin. Formally, if the level
curve is (locally) the graph y = g(x), then g ′ (x) = −x/g(x), so x + g(x)g ′ (x) = 0, from which we
obtain x2 + g(x)2 = const.  
a cos t
To verify that this is correct, for any constant a, we differentiate ϕ(t) = f using the
a sin t
chain rule:
  " #    
′ −a
a cos t sin t ∂f a cos t ∂f a cos t
ϕ (t) = ∇f · = −a sin t + a cos t = 0.
a sin t
a cos t ∂x a sin t ∂y a sin t
    p 
s x x2 + y 2 p
b. For s > 0, set F (s) = f . Then f = f = F ( x2 + y 2 ), as
0 y 0
required.

3.4.12 Let S1 = {x : x2 + y 2 + z 2 = 1} and S2 = {x : x2 + y 2 − z + c = 0}. The normal vectors


to their respective tangent planes at x are
   
x 2x
   
A1 = 2  y  and A2 =  2y  .
z −1

a. The tangent planes coincide at x precisely when A1 and A2 are parallel. If x or y is


nonzero, we see that A1 = A2 , and so we must have z = −1/2. Then x2 + y 2 = 3/4, and c = −5/4.
If both x = y = 0, then either z = 1 and c = 1, or z = −1 and c = −1. However, in the latter case,
62 3. THE DERIVATIVE
 
0
the surfaces intersect not only at  0 , but also along the circle x2 + y 2 = 1, z = 0, and there
they are not tangent. −1

b. The tangent planes are orthogonal at x precisely when A1 ·A2 = 0, so 2x2 +2y 2 −z = 0.
Solving this simultaneously with the other two equations, we find that x2 + y 2 = c, so z = 2c and
√ √
4c2 + c − 1 = 0. Thus, c = (−1 ± 17)/8. However, in the case c = −(1 + 17)/8 < −5/8, we

have z < −5/4, and there is no such point on S1 . Thus, only in the case c = ( 17 − 1)/8 do the
surfaces intersect orthogonally.

3.4.13 Let g be a parametrization of the ellipse. Note that since n is the unit normal, (n◦ g)′
−−→ ′
is orthogonal to n, hence tangential. Note that by the product rule, we have F1 g · (n◦ g) =
−−→ −−→ −−→ ′ −−→
g′ · (n◦ g) + F1 g · (n◦ g)′ = F1 g · (n◦ g)′ . Similarly, F2 g · (n◦ g) = F2 g · (n◦ g)′ . Differentiating, and
−−→
letting α denote the angle between Fi P and n, we have
 −−→  −−→ ′ −−→ −−→  −−→  −−→
F1 g · (n◦ g) F2 g · (n◦ g) = (F1 g) · (n◦ g)′ F2 g · (n◦ g) + F1 g · (n◦ g) (F2 g) · (n◦ g)′
−−→ −−→ 
= k(n◦ g)′ kkF1 gkkF2 gk (− sin α)(cos α) + (cos α)(sin α) = 0.
−−→  −−→ 
Therefore, F1 g · (n◦ g) F2 g · (n◦ g) is constant, as required.
An alternative solution is to write it out in 
coordinates, using the equation in Exercise 7. Since
x x2 y 2
the ellipse is a level curve of the function f = 2 + 2 = 1, its (non-unit) normal is given by
y a b
 2

1 x/a
2 ∇f = y/b 2 . Then it suffices to prove that

" # " #! " # " #!  


x+c x/a2 x−c x/a2 2 x2 y 2
· · =b + 4 .
y y/b2 y y/b2 a4 b
 
a
(We arrive at the constant b2 = a2 − c2 by plugging in the point .) But this is a straightforward
0
calculation: Expanding the left-hand side, we find
  
x(x + c) y 2 x(x − c) y 2 x2 (x2 − c2 ) 2x2 y 2 y 4
+ + = + 2 2 + 4
a2 b2 a2 b2 a4 a b b
 2 2
2 2
x y c
= 2
+ 2 − 4 x2
a b a
2 2  2 
b −a 2 2 x 1 x2
=1+ x =b + 2− 2 2
a4 a4 b a b
2
!  
2 x2 b2 (1 − xa2 ) 2 x
2 y2
=b + =b + 4 ,
a4 b4 a4 b

as required.

3.4.14 At each point the stream flows in the direction of steepest descent,
 so on the map
80 x
(projecting onto the xy-plane), it is following −∇h = . If the route of the
(4 + x2 + 3y 2 )2 3y
3.5. CURVES 63

3f (x) dy 3y
stream is given by y = f (x), then we have f ′ (x) = (or = ). Separating variables and
x dx x
integrating, we obtain
f ′ (x) 3
= =⇒ log f (x) = 3 log x + c =⇒ f (x) = Cx3 for some constant C .
f (x) x
 
1
Since f (1) = 1, we must have C = 1, and the stream follows the path y = x3 from “outwards.”
1

3.4.15 At each point, the water follows the path of steepest descent, so its path on the map
(projecting onto the xy-plane) must be orthogonal to level curves of the height
 function.
 From the
x 2 2
 function h y = 9 − (4x + y ),
equation of the football, we infer that these are level curvesof the
4x
and so the (projection of the) water drop follows − 12 ∇h = . If the path is given by y = f (x),
y
f (x)
then we have f ′ (x) = . Separating variables and integrating, we obtain
4x
f ′ (x) 1 1
= =⇒ log f (x) = log x + c =⇒ f (x) = Cx1/4 for some constant C .
f (x) 4x 4
 
1
Since f (1) = 1, we must have C = 1, and the water drop follows the path x = y 4 from
1
1.398
“outwards” to approximately . Along the surface of the football, the actual water drop
1.087
 4
    
t 1 1.398
takes the path  √ t  from  1  to  1.087 .
1
2 9 − 4t8 − t2 1 0

3.5. Curves

3.5.1 If g · g′ = 0, then (g · g)′ = 0, and so kgk2 is constant. Thus, g lies on a sphere centered
at the origin.

3.5.2 $\|\mathbf g'\| = \text{const} \iff \|\mathbf g'\|^2 = \text{const} \iff (\|\mathbf g'\|^2)' = 2\mathbf g'\cdot\mathbf g'' = 0 \iff$ the velocity and acceleration vectors are always orthogonal.

3.5.3 The result is immediate from the product rule. The geometric interpretation when $\|\mathbf f\| = \|\mathbf g\| = 1$ is very simple: In order to maintain a constant angle ($\mathbf f\cdot\mathbf g = \cos\theta$, where $\theta$ is the angle between $\mathbf f$ and $\mathbf g$), as $\mathbf f$ turns towards $\mathbf g$, $\mathbf g$ must simultaneously turn away from $\mathbf f$ at the same rate.

3.5.4 There are two cases. If the force field is everywhere zero, then the particle either is
stationary (if its speed is 0) or moves in a line. If the force is nonzero, then we know from
Proposition 5.1 that the particle moves in a plane. From Exercise 2 it follows that its velocity and
acceleration vectors are always orthogonal. Since the force field is central, the particle’s velocity
and position vectors are always orthogonal, and so it follows from Exercise 1 that the particle moves

on a sphere centered at the origin. We’ve already established that its trajectory is planar; thus, it
moves in a circle centered at the origin.

3.5.5 Intuitively, if a particle’s velocity is only in the direction of its position vector, then
its direction cannot change, and its position vector always points in the same direction. More
precisely, following the hint, we have $\mathbf g = \|\mathbf g\|\mathbf h$, and so $\lambda\mathbf g = \mathbf g' = \|\mathbf g\|'\mathbf h + \|\mathbf g\|\mathbf h'$. Therefore, $\mathbf h' = (\lambda - \|\mathbf g\|'/\|\mathbf g\|)\mathbf h$. But, inasmuch as $\mathbf h$ has constant length 1, by Proposition 5.2, we know that $\mathbf h'$ must be orthogonal to $\mathbf h$. Therefore $\mathbf h' = \mathbf 0$, as required.

3.5.6 $t_0$ is a global minimum of the function $f\colon(a,b)\to\mathbb R$, $f(t) = \|\mathbf g(t)-\mathbf p\|^2$. Thus, $0 = f'(t_0) = 2(\mathbf g(t_0)-\mathbf p)\cdot\mathbf g'(t_0)$. (For an intuitive explanation, this is the Pythagorean Theorem in action. If $\mathbf g(t_0)$ is the point on the curve closest to $\mathbf p$, then as we move away we approximate $\mathbf g(t)-\mathbf p$ as the hypotenuse of a right triangle with legs $\mathbf g(t_0)-\mathbf p$ and $(t-t_0)\mathbf g'(t_0)$.)
3.5.7 a. We have $\mathbf g'(t) = \begin{pmatrix} e^t(\cos t - \sin t) \\ e^t(\sin t + \cos t) \\ e^t \end{pmatrix} = e^t\begin{pmatrix} \cos t - \sin t \\ \sin t + \cos t \\ 1 \end{pmatrix}$. Thus,
$$\|\mathbf g'(t)\| = e^t\sqrt{(\cos t - \sin t)^2 + (\sin t + \cos t)^2 + 1} = e^t\sqrt3,$$
and the arclength of the curve is $\displaystyle\int_a^b \|\mathbf g'(t)\|\,dt = \int_a^b \sqrt3\,e^t\,dt = \sqrt3\,(e^b - e^a)$.

b. We have $\mathbf g'(t) = \begin{pmatrix} \frac12(e^t - e^{-t}) \\ \frac12(e^t + e^{-t}) \\ 1 \end{pmatrix}$, so
$$\|\mathbf g'(t)\| = \sqrt{\tfrac14(e^t-e^{-t})^2 + \tfrac14(e^t+e^{-t})^2 + 1} = \sqrt{\tfrac12(e^t+e^{-t})^2} = \frac{e^t + e^{-t}}{\sqrt2}.$$
Thus, the arclength of the curve is $\displaystyle\int_{-1}^1 \|\mathbf g'(t)\|\,dt = \frac1{\sqrt2}\int_{-1}^1 \bigl(e^t + e^{-t}\bigr)\,dt = \sqrt2\Bigl(e - \frac1e\Bigr)$.

c. We have $\mathbf g'(t) = \begin{pmatrix} 1 \\ 6t \\ 18t^2 \end{pmatrix}$, so $\|\mathbf g'(t)\| = \sqrt{1 + 36t^2 + 324t^4} = 18t^2 + 1$. Thus, the arclength of the curve is $\displaystyle\int_0^1 \|\mathbf g'(t)\|\,dt = \int_0^1 (18t^2 + 1)\,dt = 7$.

d. We have $\mathbf g'(t) = a\begin{pmatrix} 1 - \cos t \\ \sin t \end{pmatrix}$, and so $\|\mathbf g'(t)\| = a\sqrt{(1-\cos t)^2 + \sin^2 t} = a\sqrt{2(1-\cos t)}$. Using the double angle formula $\cos t = 1 - 2\sin^2(t/2)$, we find that the arclength of the curve is $\displaystyle\int_0^{2\pi} \|\mathbf g'(t)\|\,dt = \int_0^{2\pi} a\sqrt{2(1-\cos t)}\,dt = 2a\int_0^{2\pi} |\sin(t/2)|\,dt = 8a$.
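These arclengths are easily confirmed with a computer algebra system; the following SymPy sketch checks parts a and c:

```python
import sympy as sp

t = sp.symbols('t', real=True)
a, b = sp.symbols('a b', real=True)

# (a) g(t) = (e^t cos t, e^t sin t, e^t): speed^2 simplifies to 3e^{2t}
g = sp.Matrix([sp.exp(t)*sp.cos(t), sp.exp(t)*sp.sin(t), sp.exp(t)])
print(sp.simplify(g.diff(t).dot(g.diff(t))))           # 3*exp(2*t)
print(sp.integrate(sp.sqrt(3)*sp.exp(t), (t, a, b)))   # sqrt(3)*(exp(b) - exp(a))

# (c) speed^2 = 1 + 36t^2 + 324t^4 is a perfect square
print(sp.factor(1 + 36*t**2 + 324*t**4))               # (18*t**2 + 1)**2
print(sp.integrate(18*t**2 + 1, (t, 0, 1)))            # 7
```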
 
3.5.8 a. We have $\mathbf g'(t) = \begin{pmatrix} -\frac1{\sqrt3}\sin t + \frac1{\sqrt2}\cos t \\ -\frac1{\sqrt3}\sin t \\ -\frac1{\sqrt3}\sin t - \frac1{\sqrt2}\cos t \end{pmatrix}$, so $\upsilon(t) = \|\mathbf g'(t)\| = 1$, and $\mathbf T = \mathbf g'$. Then the curve is arclength-parametrized and
$$\kappa(t) = \|\mathbf T'(t)\| = \left\|\begin{pmatrix} -\frac1{\sqrt3}\cos t - \frac1{\sqrt2}\sin t \\ -\frac1{\sqrt3}\cos t \\ -\frac1{\sqrt3}\cos t + \frac1{\sqrt2}\sin t \end{pmatrix}\right\| = 1.$$
(Note that $\mathbf g$ is just the parametrization of the unit circle in the plane $x_1 - 2x_2 + x_3 = 0$.)

b. We have $\mathbf g'(t) = \begin{pmatrix} -e^{-t} \\ e^t \\ \sqrt2 \end{pmatrix}$, so $\upsilon(t) = \|\mathbf g'(t)\| = e^t + e^{-t}$, and the unit tangent vector is $\mathbf T = \dfrac1{e^t + e^{-t}}\begin{pmatrix} -e^{-t} \\ e^t \\ \sqrt2 \end{pmatrix}$. By the chain rule, we have
$$\kappa(s(t))\upsilon(t)\mathbf N(s(t)) = -\frac{e^t - e^{-t}}{(e^t+e^{-t})^2}\begin{pmatrix} -e^{-t} \\ e^t \\ \sqrt2 \end{pmatrix} + \frac1{e^t+e^{-t}}\begin{pmatrix} e^{-t} \\ e^t \\ 0 \end{pmatrix} = \frac1{(e^t+e^{-t})^2}\begin{pmatrix} 2 \\ 2 \\ \sqrt2\,(e^{-t} - e^t) \end{pmatrix}.$$
Therefore, $\kappa = \sqrt2/(e^t + e^{-t})^2$.

c. Since $\mathbf g'(t) = \begin{pmatrix} 1 \\ 2t \\ 3t^2 \end{pmatrix}$, we have $\upsilon(t) = \|\mathbf g'(t)\| = \sqrt{1 + 4t^2 + 9t^4}$ and the unit tangent vector is given by $\mathbf T = \dfrac1{\sqrt{1+4t^2+9t^4}}\begin{pmatrix} 1 \\ 2t \\ 3t^2 \end{pmatrix}$. By the chain rule,
$$\kappa(s(t))\upsilon(t)\mathbf N(s(t)) = (\mathbf T\circ s)'(t) = -(4t + 18t^3)(1+4t^2+9t^4)^{-3/2}\begin{pmatrix} 1 \\ 2t \\ 3t^2 \end{pmatrix} + (1+4t^2+9t^4)^{-1/2}\begin{pmatrix} 0 \\ 2 \\ 6t \end{pmatrix}$$
$$= 2(1+4t^2+9t^4)^{-3/2}\begin{pmatrix} -t(2+9t^2) \\ 1 - 9t^4 \\ 3t(1+2t^2) \end{pmatrix},$$
and so, noting that $1 + 13t^2 + 54t^4 + 117t^6 + 81t^8 = (1+4t^2+9t^4)(1+9t^2+9t^4)$, we find that
$$\kappa = 2(1+4t^2+9t^4)^{-3/2}\sqrt{1+9t^2+9t^4}.$$

3.5.9 Since $\mathbf g' = \upsilon\mathbf T$ and $\mathbf g'' = \upsilon'\mathbf T + \kappa\upsilon^2\mathbf N$, we have $\mathbf g'\times\mathbf g'' = \kappa\upsilon^3\,\mathbf T\times\mathbf N$, so $\kappa = \|\mathbf g'\times\mathbf g''\|/\upsilon^3$, as required.
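This formula gives a convenient independent check of Exercise 3.5.8; for instance, the following SymPy sketch confirms the curvature found in part b (comparing $\kappa^2$ avoids unsimplified radicals):

```python
import sympy as sp

t = sp.symbols('t', real=True)
g = sp.Matrix([sp.exp(-t), sp.exp(t), sp.sqrt(2)*t])   # the curve of Exercise 3.5.8(b)
g1, g2 = g.diff(t), g.diff(t, 2)
c = g1.cross(g2)
kappa_sq = sp.simplify(c.dot(c) / g1.dot(g1)**3)       # kappa^2 = |g' x g''|^2 / |g'|^6
print(sp.simplify(kappa_sq - 2/(sp.exp(t) + sp.exp(-t))**4))   # 0, so kappa = sqrt(2)/(e^t + e^{-t})^2
```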

3.5.10 We wish to bank the road so that the resistive normal force $\mathbf n$ exerted by the road contributes the centripetal acceleration with magnitude $\kappa\upsilon^2$, so we should have $\|\mathbf n\|\sin\theta = m\kappa\upsilon^2$. Since the vertical component of the normal force must balance the weight $W = mg$ of the car, we have $\|\mathbf n\|\cos\theta = mg$. Thus, we should have
$$\tan\theta = \kappa\upsilon^2/g = \frac{6400\ \text{mi/hr}^2}{32\ \text{ft/sec}^2}\cdot\frac{5280\ \text{ft/mi}}{(3600\ \text{sec/hr})^2} \approx 0.0815,$$
so $\theta \approx 0.0813$ ($\approx 4.66^\circ$).
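The arithmetic is easy to confirm (taking $\kappa\upsilon^2 = 6400$ mi/hr², which corresponds, for instance, to $\upsilon = 80$ mi/hr on a curve of radius 1 mi; that interpretation is an assumption inferred from the numbers above):

```python
import math

kv2 = 6400.0                 # mi/hr^2 (assumed: v = 80 mi/hr, radius 1 mi)
g = 32.0                     # ft/sec^2
ft_per_mi, sec_per_hr = 5280.0, 3600.0

tan_theta = kv2 * ft_per_mi / sec_per_hr**2 / g
theta = math.atan(tan_theta)
print(round(tan_theta, 4), round(theta, 4), round(math.degrees(theta), 2))
# 0.0815 0.0813 4.66
```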

3.5.11 a. Since $\mathbf T$ and $\mathbf N$ are orthogonal unit vectors, their cross product $\mathbf B$ is a unit vector orthogonal to both of them. Then the matrix
$$A = \begin{bmatrix} | & | & | \\ \mathbf T & \mathbf N & \mathbf B \\ | & | & | \end{bmatrix}$$
is orthogonal, and hence $A^{-1} = A^{\mathsf T}$. Given any vector $\mathbf x\in\mathbb R^3$, it follows that we can solve the equation $A\mathbf c = \mathbf x$ for $\mathbf c$, and so every vector $\mathbf x\in\mathbb R^3$ can be expressed as $\mathbf x = c_1\mathbf T + c_2\mathbf N + c_3\mathbf B$ for some scalars $c_i$. (Indeed, $c_1 = \mathbf T\cdot\mathbf x$, $c_2 = \mathbf N\cdot\mathbf x$, and $c_3 = \mathbf B\cdot\mathbf x$.)

b. Since $\|\mathbf B\| = 1$, it follows from Proposition 5.2 that $\mathbf B'\cdot\mathbf B = 0$. Since $\mathbf T'\cdot\mathbf B = 0$, it follows from Exercise 3 that $\mathbf B'\cdot\mathbf T = -\mathbf T'\cdot\mathbf B = 0$. Since we know from part a that $\mathbf B'$ is a linear combination of $\mathbf T$, $\mathbf N$, and $\mathbf B$, it follows that $\mathbf B'$ must be a scalar multiple of $\mathbf N$.
c. If τ = 0, then B(s) = B0 for all s. Then (g · B0 )′ = T · B0 = 0, so g lies in a
plane with normal vector B0 . Conversely, if g lies in the plane A · x = b, then A · g = b and so
A · g′ = A · T = 0, and thus A · N = 0. Then B = T × N is a scalar multiple of A, and is therefore
constant.
d. We know that N′ = c1 T + c2 N + c3 B for some scalar functions c1 , c2 , and c3 . Using
Proposition 5.2 and Exercise 3 once again, we have c2 = N′ · N = 0, c1 = N′ · T = −T′ · N = −κ,
and c3 = N′ · B = −B′ · N = τ . Thus, N′ = −κT + τ B, as desired.
 
3.5.12 We have $\mathbf g'(t) = \begin{pmatrix} -a\sin t \\ a\cos t \\ b \end{pmatrix}$, so $\upsilon(t) = \|\mathbf g'(t)\| = \sqrt{a^2+b^2}$. Writing $c = \sqrt{a^2+b^2}$, we have
$$\mathbf T(s(t)) = \frac1c\begin{pmatrix} -a\sin t \\ a\cos t \\ b \end{pmatrix}, \qquad \mathbf N = \frac{\mathbf T'}{\|\mathbf T'\|} = -\begin{pmatrix} \cos t \\ \sin t \\ 0 \end{pmatrix},$$
$$\kappa = \frac1\upsilon\|(\mathbf T\circ s)'(t)\| = \frac a{c^2} = \frac a{a^2+b^2}, \qquad \mathbf B = \mathbf T\times\mathbf N = \frac1c\begin{pmatrix} b\sin t \\ -b\cos t \\ a \end{pmatrix},$$
$$(\mathbf B\circ s)'(t) = \frac1c\begin{pmatrix} b\cos t \\ b\sin t \\ 0 \end{pmatrix} = -\upsilon\tau\,\mathbf N(s(t)), \quad\text{so}\quad \tau = \frac b{c^2} = \frac b{a^2+b^2}.$$
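These values can be double-checked with the formula of Exercise 9 together with the classical torsion formula $\tau = \bigl((\mathbf g'\times\mathbf g'')\cdot\mathbf g'''\bigr)/\|\mathbf g'\times\mathbf g''\|^2$ (quoted here as a standard fact, not derived in this solution):

```python
import sympy as sp

t = sp.symbols('t', real=True)
a, b = sp.symbols('a b', positive=True)
g = sp.Matrix([a*sp.cos(t), a*sp.sin(t), b*t])   # the helix
g1, g2, g3 = g.diff(t), g.diff(t, 2), g.diff(t, 3)
c = g1.cross(g2)
kappa = sp.simplify(sp.sqrt(sp.simplify(c.dot(c))) / sp.sqrt(g1.dot(g1))**3)
tau = sp.simplify(c.dot(g3) / c.dot(c))
print(kappa, tau)   # a/(a**2 + b**2), b/(a**2 + b**2)
```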
 
3.5.13 We have $\mathbf g'(t) = e^t\begin{pmatrix} \cos t - \sin t \\ \cos t + \sin t \\ 1 \end{pmatrix}$, so $\upsilon(t) = \|\mathbf g'(t)\| = e^t\sqrt3$. Then we have
$$\mathbf T(s(t)) = \frac1{\sqrt3}\begin{pmatrix} \cos t - \sin t \\ \cos t + \sin t \\ 1 \end{pmatrix}, \qquad \mathbf N = \frac{\mathbf T'}{\|\mathbf T'\|} = \frac1{\sqrt2}\begin{pmatrix} -\cos t - \sin t \\ \cos t - \sin t \\ 0 \end{pmatrix},$$
$$\kappa = \frac1\upsilon\|(\mathbf T\circ s)'(t)\| = \frac{\sqrt2}{3e^t}, \qquad \mathbf B = \mathbf T\times\mathbf N = \frac1{\sqrt6}\begin{pmatrix} \sin t - \cos t \\ -\sin t - \cos t \\ 2 \end{pmatrix},$$
$$(\mathbf B\circ s)'(t) = \frac1{\sqrt6}\begin{pmatrix} \cos t + \sin t \\ -\cos t + \sin t \\ 0 \end{pmatrix} = -\upsilon\tau\,\mathbf N(s(t)), \quad\text{so}\quad \tau = \frac1{\upsilon\sqrt3} = \frac1{3e^t}.$$

3.5.14 Letting $Q$ be the point of tangency of the string to the cycloid, we have $\overrightarrow{OQ} = \mathbf f(t) = \begin{pmatrix} t + \sin t \\ 1 - \cos t \end{pmatrix}$. Note that when $0\le t\le\pi$, $\overrightarrow{QP}$ is a negative scalar multiple of the vector $\mathbf f'(t)/\|\mathbf f'(t)\|$, the scalar being the arclength of that portion of the cycloid from $O$ to $Q$. When $0\le t\le\pi$, that length is given by
$$s(t) = \int_0^t \|\mathbf f'(u)\|\,du = \int_0^t \sqrt{2 + 2\cos u}\,du = \int_0^t 2|\cos(u/2)|\,du = 4\sin(t/2).$$
(Note, moreover, that since the figure is symmetric about $t = \pi$, this will also be the value of $\|\overrightarrow{QP}\|$ when $\pi < t\le 2\pi$.) Now, $\mathbf f'(t) = \begin{pmatrix} 1 + \cos t \\ \sin t \end{pmatrix}$, so
$$\overrightarrow{OP} = \overrightarrow{OQ} + \overrightarrow{QP} = \begin{pmatrix} t + \sin t \\ 1 - \cos t \end{pmatrix} - \frac{4\sin(t/2)}{2\cos(t/2)}\begin{pmatrix} 1 + \cos t \\ \sin t \end{pmatrix}$$
$$= \begin{pmatrix} t + \sin t \\ 1 - \cos t \end{pmatrix} - \frac{2\sin(t/2)}{\cos(t/2)}\begin{pmatrix} 2\cos^2(t/2) \\ 2\sin(t/2)\cos(t/2) \end{pmatrix} = \begin{pmatrix} t + \sin t - 4\sin(t/2)\cos(t/2) \\ 1 - \cos t - 4\sin^2(t/2) \end{pmatrix} = \begin{pmatrix} t - \sin t \\ -1 + \cos t \end{pmatrix}.$$
(Although we derived this formula assuming $0\le t\le\pi$, it holds as well when $\pi < t\le 2\pi$, because now $\overrightarrow{QP}$ is in the same direction as $\mathbf f'(t)$, so the sign of $\cos(t/2)$ takes care of this sign change.) What is interesting, as Huygens discovered, is that the pendulum bob follows the arc of a congruent cycloid.
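A one-line symbolic check of the computation above (valid for $0 < t < \pi$, where $\|\mathbf f'(t)\| = 2\cos(t/2)$):

```python
import sympy as sp

t = sp.symbols('t', positive=True)
f = sp.Matrix([t + sp.sin(t), 1 - sp.cos(t)])   # the cycloid
P = f - (4*sp.sin(t/2))/(2*sp.cos(t/2)) * sp.Matrix([1 + sp.cos(t), sp.sin(t)])
print(sp.simplify(P))   # Matrix([[t - sin(t)], [cos(t) - 1]])
```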

3.5.15 a. Note that $a^2\sin^2\theta + b^2\cos^2\theta = a^2 - c^2\cos^2\theta$. Then
$$b^2(r\cos\theta - c)^2 + a^2(r\sin\theta)^2 = a^2b^2$$
$$\iff (a^2\sin^2\theta + b^2\cos^2\theta)r^2 - 2b^2c\cos\theta\,r - b^4 = 0$$
$$\iff a^2r^2 - (cr\cos\theta + b^2)^2 = 0$$
$$\iff \bigl((a - c\cos\theta)r - b^2\bigr)\bigl((a + c\cos\theta)r + b^2\bigr) = 0$$
$$\implies (a - c\cos\theta)r = b^2 \quad (\text{using } (a + c\cos\theta)r > 0)$$
$$\iff r\Bigl(1 - \frac ca\cos\theta\Bigr) = \frac{b^2}a.$$

b. Note first that $\mathbf e_r'(t) = \theta'(t)\mathbf e_\theta(t)$ and $\mathbf e_\theta'(t) = -\theta'(t)\mathbf e_r(t)$. Now, differentiating $\mathbf g(t) = r(t)\mathbf e_r(t)$, we obtain
$$\mathbf g'(t) = r'(t)\mathbf e_r(t) + r(t)\theta'(t)\mathbf e_\theta(t)$$
$$\mathbf g''(t) = r''(t)\mathbf e_r(t) + 2r'(t)\theta'(t)\mathbf e_\theta(t) + r(t)\theta''(t)\mathbf e_\theta(t) + r(t)\theta'(t)\bigl(-\theta'(t)\mathbf e_r(t)\bigr)$$
$$= \bigl(r''(t) - r(t)\theta'(t)^2\bigr)\mathbf e_r(t) + \bigl(2r'(t)\theta'(t) + r(t)\theta''(t)\bigr)\mathbf e_\theta(t),$$
as required.

c. Recall that $\mathbf A_0 = \mathbf g(t)\times\mathbf g'(t) = r^2(t)\theta'(t)\,\mathbf e_r(t)\times\mathbf e_\theta(t)$. Since the force field is inverse square, we have $\mathbf g''(t) = -\dfrac{GM}{r(t)^2}\mathbf e_r(t)$, so
$$\mathbf g''(t)\times\mathbf A_0 = -\frac{GM}{r(t)^2}\,\mathbf e_r(t)\times\bigl(r^2(t)\theta'(t)\,\mathbf e_r(t)\times\mathbf e_\theta(t)\bigr) = GM\theta'(t)\mathbf e_\theta(t) = GM\,\mathbf e_r'(t).$$
Since $\mathbf A_0$, $G$, and $M$ are constants, this means that $\mathbf g'(t)\times\mathbf A_0 = GM(\mathbf e_r(t) + \mathbf c)$ for some constant vector $\mathbf c$.

d. We have $\mathbf g(t)\cdot\bigl(\mathbf g'(t)\times\mathbf A_0\bigr) = \mathbf A_0\cdot\bigl(\mathbf g(t)\times\mathbf g'(t)\bigr) = \|\mathbf A_0\|^2$, so
$$\|\mathbf A_0\|^2 = GM\,r(t)\,\mathbf e_r(t)\cdot\bigl(\mathbf e_r(t) + \mathbf c\bigr) = GM\,r(t)\bigl(1 - \|\mathbf c\|\cos\theta(t)\bigr)$$
(assuming, as per the problem, that $\mathbf c$ is a negative scalar multiple of $\mathbf e_1$). If $\|\mathbf c\|\ge 1$, we see that as $\cos\theta(t)\to(1/\|\mathbf c\|)^-$, $r(t)\to\infty$. Since the orbit of a planet is bounded, we infer that in this case we must have $\|\mathbf c\| < 1$, and it now follows from part a that the orbit is an ellipse with one focus at the origin.

e. We know from Proposition 5.1 that the position vector of the planet sweeps out area at the constant rate $\frac12\|\mathbf A_0\|$. It sweeps out the area of the ellipse in one period, so $\pi ab/T = \frac12\|\mathbf A_0\|$. Now we infer from parts a and d that $\dfrac{b^2}a = \dfrac{\|\mathbf A_0\|^2}{GM}$, and so we have $ab = \sqrt{\dfrac{\|\mathbf A_0\|^2}{GM}}\,a^{3/2}$, whence $T = \dfrac{2\pi ab}{\|\mathbf A_0\|} = \dfrac{2\pi a^{3/2}}{\sqrt{GM}}$, as required.
3.5.16 The bicycle is going from right to left. Since the front wheel can turn but the rear wheel
cannot, following the path of the rear wheel (the solid curve) a constant distance along each tangent
line must give us the position of the front wheel (the dotted curve).

3.6. Higher-Order Partial Derivatives

3.6.1 a. Note first that since $f$ vanishes on the axes, $\dfrac{\partial f}{\partial x}(\mathbf 0) = \dfrac{\partial f}{\partial y}(\mathbf 0) = 0$. By the quotient rule, we have
$$\frac{\partial f}{\partial x} = \frac{y(x^4 + 4x^2y^2 - y^4)}{(x^2+y^2)^2}, \quad\text{so}\quad \frac{\partial f}{\partial x}\begin{pmatrix}0\\y\end{pmatrix} = -y, \quad\text{and}$$
$$\frac{\partial f}{\partial y} = \frac{x(x^4 - 4x^2y^2 - y^4)}{(x^2+y^2)^2}, \quad\text{so}\quad \frac{\partial f}{\partial y}\begin{pmatrix}x\\0\end{pmatrix} = x.$$

b. We have $\dfrac{\partial^2 f}{\partial x\partial y}(\mathbf 0) = \dfrac{\partial}{\partial x}\Bigl(\dfrac{\partial f}{\partial y}\Bigr)(\mathbf 0) = 1$ and $\dfrac{\partial^2 f}{\partial y\partial x}(\mathbf 0) = \dfrac{\partial}{\partial y}\Bigl(\dfrac{\partial f}{\partial x}\Bigr)(\mathbf 0) = -1$.

c. It follows immediately from Theorem 6.1 that $f$ cannot be $C^2$ at $\mathbf 0$.
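A SymPy confirmation; here $f$ is taken to be $f\begin{pmatrix}x\\y\end{pmatrix} = xy(x^2-y^2)/(x^2+y^2)$ (with $f(\mathbf 0) = 0$), an assumption consistent with the partial derivatives displayed above:

```python
import sympy as sp

x, y = sp.symbols('x y', real=True)
f = x*y*(x**2 - y**2)/(x**2 + y**2)   # assumed formula for f away from the origin

fx = sp.simplify(sp.diff(f, x))
fy = sp.simplify(sp.diff(f, y))
print(sp.simplify(fx.subs(x, 0)))     # -y, so d/dy of f_x is -1 along the y-axis
print(sp.simplify(fy.subs(y, 0)))     #  x, so d/dx of f_y is +1 along the x-axis
```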

3.6.2 a. We have $\dfrac{\partial^2 f}{\partial x^2} + \dfrac{\partial^2 f}{\partial y^2} = 6 - 6 = 0$.

b. We have $\dfrac{\partial f}{\partial x} = \dfrac{2x}{x^2+y^2}$ and $\dfrac{\partial f}{\partial y} = \dfrac{2y}{x^2+y^2}$. Then $\dfrac{\partial^2 f}{\partial x^2} = \dfrac{2(y^2-x^2)}{(x^2+y^2)^2}$ and $\dfrac{\partial^2 f}{\partial y^2} = \dfrac{2(x^2-y^2)}{(x^2+y^2)^2}$, so $\dfrac{\partial^2 f}{\partial x^2} + \dfrac{\partial^2 f}{\partial y^2} = 0$.

c. We have $\dfrac{\partial^2 f}{\partial x^2} + \dfrac{\partial^2 f}{\partial y^2} + \dfrac{\partial^2 f}{\partial z^2} = 2 + 4 - 6 = 0$.

d. We have $\dfrac{\partial f}{\partial x} = -x(x^2+y^2+z^2)^{-3/2}$, so $\dfrac{\partial^2 f}{\partial x^2} = (2x^2 - y^2 - z^2)(x^2+y^2+z^2)^{-5/2}$. Permuting the variables, we have
$$\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2} = (x^2+y^2+z^2)^{-5/2}\bigl((2x^2 - y^2 - z^2) + (2y^2 - x^2 - z^2) + (2z^2 - x^2 - y^2)\bigr) = 0.$$
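Parts b and d are quickly confirmed with SymPy, taking $f = \log(x^2+y^2)$ and $f = (x^2+y^2+z^2)^{-1/2}$ respectively (assumptions consistent with the first derivatives shown above):

```python
import sympy as sp

x, y, z = sp.symbols('x y z', positive=True)

f_b = sp.log(x**2 + y**2)
print(sp.simplify(sp.diff(f_b, x, 2) + sp.diff(f_b, y, 2)))   # 0

f_d = (x**2 + y**2 + z**2)**sp.Rational(-1, 2)
print(sp.simplify(sp.diff(f_d, x, 2) + sp.diff(f_d, y, 2) + sp.diff(f_d, z, 2)))   # 0
```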

3.6.3 a. $\dfrac{\partial^2 f}{\partial x^2} = -\cos(x+ct)$ and $\dfrac{\partial^2 f}{\partial t^2} = -c^2\cos(x+ct)$, so $\dfrac{\partial^2 f}{\partial t^2} = c^2\dfrac{\partial^2 f}{\partial x^2}$, as required.

b. $\dfrac{\partial^2 f}{\partial x^2} = -25\sin 5x\cos 5ct$ and $\dfrac{\partial^2 f}{\partial t^2} = -25c^2\sin 5x\cos 5ct$, so $\dfrac{\partial^2 f}{\partial t^2} = c^2\dfrac{\partial^2 f}{\partial x^2}$.
3.6.4 We have $\dfrac{\partial f}{\partial x} = -\dfrac{x}{2kt}\,t^{-1/2}e^{-x^2/4kt}$, so $\dfrac{\partial^2 f}{\partial x^2} = \dfrac1{4k^2}\,t^{-5/2}e^{-x^2/4kt}(x^2 - 2kt)$. On the other hand, $\dfrac{\partial f}{\partial t} = \dfrac1{4k}\,t^{-5/2}e^{-x^2/4kt}(x^2 - 2kt)$, so $\dfrac{\partial f}{\partial t} = k\dfrac{\partial^2 f}{\partial x^2}$, as required.
∂t 4k ∂t ∂x
 
3.6.5 We wish to express $f$ in the form $f\begin{pmatrix}x\\t\end{pmatrix} = \phi(x+ct) + \psi(x-ct)$, subject to the initial conditions $f\begin{pmatrix}x\\0\end{pmatrix} = h(x)$ and $\dfrac{\partial f}{\partial t}\begin{pmatrix}x\\0\end{pmatrix} = k(x)$. Then we have
$$h(x) = \phi(x) + \psi(x) \quad\text{and}\quad k(x) = c\bigl(\phi'(x) - \psi'(x)\bigr).$$
The latter equation implies $\dfrac1c\displaystyle\int_0^x k(t)\,dt = \phi(x) - \psi(x)$ (choosing the constant of integration appropriately), so, solving simultaneously with the first, we obtain
$$\phi(x) = \frac12\Bigl(h(x) + \frac1c\int_0^x k(t)\,dt\Bigr) \quad\text{and}\quad \psi(x) = \frac12\Bigl(h(x) - \frac1c\int_0^x k(t)\,dt\Bigr),$$
which one can easily check works.
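One can check the formulas symbolically; the initial data below are hypothetical sample choices, not taken from the exercise:

```python
import sympy as sp

x, c = sp.symbols('x c', positive=True)
t = sp.symbols('t', real=True)
h = sp.sin(x)    # sample h (hypothetical)
k = sp.cos(x)    # sample k (hypothetical)

K = sp.integrate(k, (x, 0, x))   # antiderivative of k vanishing at 0
phi = (h + K/c)/2
psi = (h - K/c)/2
f = phi.subs(x, x + c*t) + psi.subs(x, x - c*t)

print(sp.simplify(sp.diff(f, t, 2) - c**2*sp.diff(f, x, 2)))   # 0 (wave equation)
print(sp.simplify(f.subs(t, 0) - h))                           # 0 (f = h at t = 0)
print(sp.simplify(sp.diff(f, t).subs(t, 0) - k))               # 0 (f_t = k at t = 0)
```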

3.6.6 Proceeding as in Example 3, we have $\mathbf g\begin{pmatrix}u\\v\end{pmatrix} = \begin{pmatrix} x\begin{pmatrix}u\\v\end{pmatrix} \\[1ex] y\begin{pmatrix}u\\v\end{pmatrix} \end{pmatrix}$ and set $F\begin{pmatrix}u\\v\end{pmatrix} = f\left(\mathbf g\begin{pmatrix}u\\v\end{pmatrix}\right)$. Then by the chain rule, we have
$$DF\begin{pmatrix}u\\v\end{pmatrix} = Df\left(\mathbf g\begin{pmatrix}u\\v\end{pmatrix}\right)D\mathbf g\begin{pmatrix}u\\v\end{pmatrix} = \begin{bmatrix} \dfrac{\partial f}{\partial x}\left(\mathbf g\begin{pmatrix}u\\v\end{pmatrix}\right) & \dfrac{\partial f}{\partial y}\left(\mathbf g\begin{pmatrix}u\\v\end{pmatrix}\right) \end{bmatrix}\begin{bmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial x}{\partial v} \\[1.5ex] \dfrac{\partial y}{\partial u} & \dfrac{\partial y}{\partial v} \end{bmatrix},$$
so
$$\frac{\partial F}{\partial v} = \frac{\partial f}{\partial x}\left(\mathbf g\begin{pmatrix}u\\v\end{pmatrix}\right)\frac{\partial x}{\partial v} + \frac{\partial f}{\partial y}\left(\mathbf g\begin{pmatrix}u\\v\end{pmatrix}\right)\frac{\partial y}{\partial v}.$$
Now, differentiating with respect to $u$, we have to apply the product rule as well as the chain rule (to each of the functions $\dfrac{\partial f}{\partial x}\circ\mathbf g$ and $\dfrac{\partial f}{\partial y}\circ\mathbf g$). For typographical reasons, we will suppress the argument $\mathbf g\begin{pmatrix}u\\v\end{pmatrix}$ in the partial derivatives of $f$:
$$\frac{\partial^2 F}{\partial u\partial v} = \frac{\partial f}{\partial x}\frac{\partial^2 x}{\partial u\partial v} + \left(\frac{\partial^2 f}{\partial x^2}\frac{\partial x}{\partial u} + \frac{\partial^2 f}{\partial y\partial x}\frac{\partial y}{\partial u}\right)\frac{\partial x}{\partial v} + \frac{\partial f}{\partial y}\frac{\partial^2 y}{\partial u\partial v} + \left(\frac{\partial^2 f}{\partial x\partial y}\frac{\partial x}{\partial u} + \frac{\partial^2 f}{\partial y^2}\frac{\partial y}{\partial u}\right)\frac{\partial y}{\partial v}$$
$$= \frac{\partial f}{\partial x}\frac{\partial^2 x}{\partial u\partial v} + \frac{\partial f}{\partial y}\frac{\partial^2 y}{\partial u\partial v} + \frac{\partial^2 f}{\partial x^2}\frac{\partial x}{\partial u}\frac{\partial x}{\partial v} + \frac{\partial^2 f}{\partial x\partial y}\left(\frac{\partial x}{\partial u}\frac{\partial y}{\partial v} + \frac{\partial x}{\partial v}\frac{\partial y}{\partial u}\right) + \frac{\partial^2 f}{\partial y^2}\frac{\partial y}{\partial u}\frac{\partial y}{\partial v},$$
where at the last step we have applied Theorem 6.1.
  " #
r r cos θ
3.6.7 Let g = and let F = f ◦ g. Then we apply the chain rule to obtain the
θ r sin θ
 
r
following (we suppress the argument g in all the derivatives of f throughout):
θ
   " #
∂F ∂F ∂f ∂f cos θ −r sin θ
= ,
∂r ∂θ ∂x ∂y sin θ r cos θ
so
∂F ∂f ∂f ∂F ∂f ∂f
= cos θ + sin θ and = (−r sin θ) + (r cos θ).
∂r ∂x ∂y ∂θ ∂x ∂y
Differentiating again, and being careful to apply the chain rule once more, we have
∂2F ∂2f 2 ∂2f ∂2f ∂2f
= cos θ + cos θ sin θ + sin θ cos θ + sin2 θ
∂r 2 ∂x2 ∂y∂x ∂x∂y ∂y 2
∂2F ∂f ∂f ∂2f 2 ∂2f
= (−r cos θ) + (−r sin θ) + (−r sin θ) + (−r sin θ)(r cos θ)+
∂θ 2 ∂x ∂y ∂x2 ∂y∂x
∂2f ∂2f
(r cos θ)(−r sin θ) + 2 (r cos θ)2 .
∂x∂y ∂y
72 3. THE DERIVATIVE

Thus, using Theorem 6.1, summing and doing the algebra carefully,
 2 
∂2F 1 ∂F 1 ∂2F ∂ f 2 ∂2f ∂2f 2
+ + 2 2 = cos θ + 2 sin θ cos θ + 2 sin θ
∂r 2 r ∂r r ∂θ ∂x2 ∂x∂y ∂y
   
1 ∂f ∂f 1 ∂f ∂f
+ cos θ + sin θ + 2 (−r cos θ) + (−r sin θ)
r ∂x ∂y r ∂x ∂y
 2 2 2

1 2∂ f 2 2 ∂ f 2∂ f 2
+ 2 r (sin θ) − 2r (cos θ sin θ) + r (cos θ)
r ∂x2 ∂x∂y ∂y 2
∂2f ∂2f
= + ,
∂x2 ∂y 2
as desired. Whew!
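The identity holds for every $C^2$ function $f$; a SymPy spot-check with one (hypothetical) sample $f$:

```python
import sympy as sp

r, th, x, y = sp.symbols('r theta x y', positive=True)
f = x**3*y + sp.exp(x)*sp.sin(y)   # sample f (any C^2 function works)

F = f.subs({x: r*sp.cos(th), y: r*sp.sin(th)})
polar = sp.diff(F, r, 2) + sp.diff(F, r)/r + sp.diff(F, th, 2)/r**2
cart = (sp.diff(f, x, 2) + sp.diff(f, y, 2)).subs({x: r*sp.cos(th), y: r*sp.sin(th)})
print(sp.simplify(polar - cart))   # 0
```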
 
3.6.8 If $F\begin{pmatrix}r\\\theta\end{pmatrix} = r^n\cos n\theta$, then
$$\frac{\partial^2 F}{\partial r^2} + \frac1r\frac{\partial F}{\partial r} + \frac1{r^2}\frac{\partial^2 F}{\partial\theta^2} = \bigl(n(n-1) + n - n^2\bigr)r^{n-2}\cos n\theta = 0.$$
The analogous computation holds with $\sin$.
 
3.6.9 Suppose $F\begin{pmatrix}r\\\theta\end{pmatrix} = h(r)$ is harmonic. Then we have $h''(r) + \dfrac1r h'(r) = 0$. Thus, $\dfrac{h''(r)}{h'(r)} = -\dfrac1r$, so $\log h'(r) = -\log r + \text{const}$. We now infer that $h'(r) = c/r$ for some constant $c$, so $h(r) = c\log r + c'$ for some constants $c$ and $c'$.

3.6.10 We wish to check that the given functions are solutions of the equation
$$\left(1 + \Bigl(\frac{\partial f}{\partial y}\Bigr)^2\right)\frac{\partial^2 f}{\partial x^2} - 2\frac{\partial f}{\partial x}\frac{\partial f}{\partial y}\frac{\partial^2 f}{\partial x\partial y} + \left(1 + \Bigl(\frac{\partial f}{\partial x}\Bigr)^2\right)\frac{\partial^2 f}{\partial y^2} = 0.$$

a. This is truly immediate.

b. We have
$$\frac{\partial f}{\partial x} = -\frac y{x^2+y^2}, \quad \frac{\partial f}{\partial y} = \frac x{x^2+y^2}, \quad \frac{\partial^2 f}{\partial x^2} = \frac{2xy}{(x^2+y^2)^2} = -\frac{\partial^2 f}{\partial y^2}, \quad \frac{\partial^2 f}{\partial x\partial y} = \frac{y^2 - x^2}{(x^2+y^2)^2}.$$
Therefore,
$$\left(1 + \Bigl(\frac{\partial f}{\partial y}\Bigr)^2\right)\frac{\partial^2 f}{\partial x^2} - 2\frac{\partial f}{\partial x}\frac{\partial f}{\partial y}\frac{\partial^2 f}{\partial x\partial y} + \left(1 + \Bigl(\frac{\partial f}{\partial x}\Bigr)^2\right)\frac{\partial^2 f}{\partial y^2}$$
$$= \frac1{(x^2+y^2)^4}\Bigl(2xy\bigl((x^2+y^2)^2 + x^2\bigr) + 2xy(y^2 - x^2) - 2xy\bigl((x^2+y^2)^2 + y^2\bigr)\Bigr)$$
$$= \frac{2xy}{(x^2+y^2)^4}\bigl(x^2 + y^2 - x^2 - y^2\bigr) = 0.$$

c. We have
$$\frac{\partial f}{\partial x} = \frac12\bigl((e^x+e^{-x})^2 - 4y^2\bigr)^{-1/2}(e^{2x} - e^{-2x}), \qquad \frac{\partial f}{\partial y} = -2y\bigl((e^x+e^{-x})^2 - 4y^2\bigr)^{-1/2},$$
$$\frac{\partial^2 f}{\partial x^2} = \frac12\bigl((e^x+e^{-x})^2 - 4y^2\bigr)^{-3/2}\bigl(e^{4x} + 4e^{2x} + 6 + 4e^{-2x} + e^{-4x} - 8y^2(e^{2x}+e^{-2x})\bigr),$$
$$\frac{\partial^2 f}{\partial x\partial y} = 2\bigl((e^x+e^{-x})^2 - 4y^2\bigr)^{-3/2}\,y(e^{2x} - e^{-2x}), \qquad \frac{\partial^2 f}{\partial y^2} = -2\bigl((e^x+e^{-x})^2 - 4y^2\bigr)^{-3/2}(e^x+e^{-x})^2.$$
Then, as is best checked with a computer algebra system,
$$\left(1 + \Bigl(\frac{\partial f}{\partial y}\Bigr)^2\right)\frac{\partial^2 f}{\partial x^2} - 2\frac{\partial f}{\partial x}\frac{\partial f}{\partial y}\frac{\partial^2 f}{\partial x\partial y} + \left(1 + \Bigl(\frac{\partial f}{\partial x}\Bigr)^2\right)\frac{\partial^2 f}{\partial y^2}$$
$$= \bigl((e^x+e^{-x})^2 - 4y^2\bigr)^{-5/2}\Bigl(\frac12\bigl(e^{4x} + 4e^{2x} + 6 + 4e^{-2x} + e^{-4x} - 8y^2(e^{2x}+e^{-2x})\bigr)(e^x+e^{-x})^2$$
$$+ 4y^2(e^{2x} - e^{-2x})^2 - 2(e^x+e^{-x})^2\bigl((e^x+e^{-x})^2 - 4y^2 + \tfrac14(e^{2x}-e^{-2x})^2\bigr)\Bigr) = 0.$$
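As suggested, here is the computer-algebra check (SymPy), using $f = \arctan(y/x)$ for part b and $f = \frac12\sqrt{(e^x+e^{-x})^2 - 4y^2}$ for part c; the latter formula is inferred from the partial derivatives above:

```python
import sympy as sp

x, y = sp.symbols('x y', positive=True)

def minimal_surface_lhs(f):
    fx, fy = sp.diff(f, x), sp.diff(f, y)
    return ((1 + fy**2)*sp.diff(f, x, 2)
            - 2*fx*fy*sp.diff(f, x, y)
            + (1 + fx**2)*sp.diff(f, y, 2))

f_b = sp.atan(y/x)
f_c = sp.sqrt((sp.exp(x) + sp.exp(-x))**2 - 4*y**2)/2
for f in (f_b, f_c):
    print(sp.simplify(minimal_surface_lhs(f)))   # 0 in both cases
```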
4

3.6.11 $F$ is clearly $C^2$ everywhere except perhaps along the set where $u = 0$ and $v\ge 0$. Only $\dfrac{\partial F}{\partial u}$ and $\dfrac{\partial^2 F}{\partial u^2}$ are not identically zero, and these are easily checked to be continuous along the set in question. It is obvious that $\dfrac{\partial^2 F}{\partial u\partial v} = 0$. Suppose we had $F\begin{pmatrix}u\\v\end{pmatrix} = \phi(u) + \psi(v)$ for some functions $\phi$ and $\psi$. Then for all $u > 0$ we would have
$$F\begin{pmatrix}u\\1\end{pmatrix} = \phi(u) + \psi(1) = u^3 \quad\text{and}\quad F\begin{pmatrix}u\\-1\end{pmatrix} = \phi(u) + \psi(-1) = 0,$$
which, of course, is impossible. What's wrong? We know that $\dfrac{\partial F}{\partial u}$ is independent of $v$ on $\{v < 0\}$ and independent of $v$ on $\{v > 0\}$, but no one says they must be the same function on both regions. (The statement $f' = 0 \implies f = \text{const}$ is true on an interval and false on a disconnected set.)
CHAPTER 4
Implicit and Explicit Solutions of Linear Systems
4.1. Gaussian Elimination and the Theory of Linear Systems

4.1.1 We need to show that every solution of Ax = b is also a solution of Cx = d and vice
versa. Start with a solution u of Ax = b. Denoting the rows of A by A1 , . . . , Am , then we have

A 1 · u = b1
A 2 · u = b2
..
.
A m · u = bm

If we apply an elementary operation of type (i), u still satisfies precisely the same list of equations. If we apply an elementary operation of type (ii), say multiplying the $k$th equation by $r \ne 0$, then u satisfies $A_k\cdot\mathbf u = b_k$ if and only if it satisfies $(rA_k)\cdot\mathbf u = rb_k$, as is required. As for an elementary operation of type (iii), suppose we add $r$ times the $k$th equation to the $\ell$th; since $A_k\cdot\mathbf u = b_k$ and $A_\ell\cdot\mathbf u = b_\ell$, it follows that

(rAk + Aℓ ) · u = (rAk · u) + (Aℓ · u) = rbk + bℓ ,

and so u satisfies the “new” ℓth equation.


To prove conversely that if u satisfies Cx = d, then it satisfies Ax = b, we merely note that
each argument we’ve given can be reversed; in particular, the inverse of an elementary operation
is again an elementary operation. That is, to undo any elementary operation we again perform an
elementary operation.

4.1.2 a. Neither; see condition (1).


b. Echelon, but not reduced echelon; see condition (4) or (5).
c. Reduced echelon.
d. Echelon, but not reduced echelon; see condition (4).
e. Neither; see condition (3).
f. Echelon, but not reduced echelon; see condition (4) or (5).
g. Reduced echelon.

4.1.3 a. To find the reduced echelon form, we use Gaussian elimination:


       
1 0 −1 1 0 −1 1 0 −1 1 0 −1
       
A =  −2 3 1 0 3 −3  0 3 −3  0 1 −1  .
3 −3 0 0 −3 3 0 0 0 0 0 0

Ax = 0 if and only if

x1 − x3 = 0
x2 − x3 = 0 ,
 
1
so x = x3  1 .
1
     
2 −2 4 1 −1 2 1 −1 2
     
b. A =  −1 1 −2  1 −1 2 0 0 0 .
3 −3 6 1 −1 2 0 0 0
Thus, the general solution of Ax = 0 is
       
x1 x2 −2x3 1 −2
       
x =  x2  =  x2  = x2 1 + x3  0 .
x3 x3 0 1

c.
       
1 2 −1 1 2 −1 1 2 −1 1 2 −1
       
 1 3 1 0 1 2 0 1 2 0 1 2
A=
 2
      
 4 3
0
 0 5
0
 3 5
0
 0 −1 
−1 1 6 0 3 5 0 0 1 0 0 1
     
1 2 −1 1 2 0 1 0 0
     
0 1 2 0 1 0 0 1 0
     .
0 0 1 0 0 1 0 0 1
     
0 0 0 0 0 0 0 0 0

Ax = 0 implies x1 = x2 = x3 = 0, so x = 0.
" # " # " #
1 −2 1 0 1 −2 1 0 1 −2 0 1
d. A = .
2 −4 3 −1 0 0 1 −1 0 0 1 −1
Ax = 0 gives
       
x1 2x2 −x4 2 −1
       
 x2   x2  1  0
x=  
x =
 = x2   + x4   .
 3  x4 

0
 
 1
 
x4 x4 0 1

e.
     
1 1 1 1 1 1 1 1 1 1 1 1
     
1 2 1 2 0 1 0 1 0 1 0 1
A=
1
    
 3 2 4
0
 2 1 3
0
 0 1 1
1 2 2 3 0 1 1 2 0 0 1 1
     
1 1 1 1 1 1 0 0 1 0 0 −1
     
0 1 0 1 0 1 0 1 0 1 0 1
     .
0 0 1 1 0 0 1 1 0 0 1 1
     
0 0 0 0 0 0 0 0 0 0 0 0
 
1
 −1 
Thus, x = x4  
 −1 .
1
f.
   
1 2 0 −1 −1 1 2 0 −1 −1
   
 −1 −3 1 2 3 0 −1 1 1 2
A=
 1
  
 −1 3 1 1
0
 −3 3 2 2
2 −3 7 3 4 0 −7 7 5 6
   
1 2 0 −1 −1 1 2 0 −1 −1
   
0 1 −1 −1 −2  0 1 −1 −1 −2 
   
0 0 0 −1 −4  0 0 0 1 4
   
0 0 0 −2 −8 0 0 0 0 0
   
1 2 0 0 3 1 0 2 0 −1
   
0 1 −1 0 2 0 1 −1 0 2
   ,
0 0 0 1 4  4
 0 0 0 1 
0 0 0 0 0 0 0 0 0 0
so Ax = 0 gives rise to the general solution
       
x1 −2x3 +x5 −2 1
       
 x2   x3 −2x5   1 −2
       
  
x =  x3  =  x3     
 = x3  1 + x5  0 .
       
 x4   −4x5   0 −4
x5 x5 0 1
g.
   
1 −1 1 1 0 1 −1 1 1 0
   
 1 0 2 1 1 0 1 1 0 1
A=
 0
  
 2 2 2 0
0
 1 1 1 0
−1 1 −1 0 −1 0 0 0 1 −1
   
1 −1 1 1 0 1 −1 1 1 0
   
0 1 1 0 1 0 1 1 0 1
→
0
  
 0 0 1 −1 

0
 0 0 1 −1 

0 0 0 1 −1 0 0 0 0 0
   
1 −1 1 0 1 1 0 2 0 2
   
0 1 1 0 1  1
  → 0 1 1 0 .
0 0 0 1 
−1  0 0 0 1 −1 
  
0 0 0 0 0 0 0 0 0 0
       
x1 −2x3 −2x5 −2 −2
       
 x2   −x3 −x5  −1 −1
       
Ax = 0 gives x =   
 x3  =  x3
 = x3  1 + x5  0.
    
       
x
 4  x 5 0
   1
x5 x5 0 1
h.
   
1 1 0 5 0 −1 1 1 0 5 0 −1
   
 0 1 1 3 −2 0 0 1 1 3 −2 0
A=
 −1
  
 2 3 4 1 −6 

0
 3 3 9 1 −7 

0 4 4 12 −1 −7 0 4 4 12 −1 −7
   
$$\rightsquigarrow \begin{bmatrix} 1&1&0&5&0&-1 \\ 0&1&1&3&-2&0 \\ 0&0&0&0&7&-7 \\ 0&0&0&0&7&-7 \end{bmatrix} \rightsquigarrow \begin{bmatrix} 1&1&0&5&0&-1 \\ 0&1&1&3&-2&0 \\ 0&0&0&0&1&-1 \\ 0&0&0&0&0&0 \end{bmatrix} \rightsquigarrow \begin{bmatrix} 1&1&0&5&0&-1 \\ 0&1&1&3&0&-2 \\ 0&0&0&0&1&-1 \\ 0&0&0&0&0&0 \end{bmatrix} \rightsquigarrow \begin{bmatrix} 1&0&-1&2&0&1 \\ 0&1&1&3&0&-2 \\ 0&0&0&0&1&-1 \\ 0&0&0&0&0&0 \end{bmatrix}.$$
         
x1 x3 −2x4 −x6 1 −2 −1
         
 x2  −x3 −3x4 +2x6  −1  −3   2
         
x   x   1  0  0
 3  3       
Ax = 0 gives x =   =   = x3   + x4   + x6  .
 x4   x4       
     0  1  0
x   x6   0  0  1
 5        
x6 x6 0 0 1
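All of these reductions can be verified with SymPy's rref and nullspace methods; for example, for parts a and f:

```python
import sympy as sp

A = sp.Matrix([[1, 0, -1], [-2, 3, 1], [3, -3, 0]])   # part (a)
print(A.rref()[0])      # rows (1, 0, -1), (0, 1, -1), (0, 0, 0)
print(A.nullspace())    # [ (1, 1, 1)^T ]

F = sp.Matrix([[1, 2, 0, -1, -1], [-1, -3, 1, 2, 3],
               [1, -1, 3, 1, 1], [2, -3, 7, 3, 4]])   # part (f)
print(F.nullspace())    # spans (-2, 1, 1, 0, 0)^T and (1, -2, 0, -4, 1)^T
```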

4.1.4 a.
     
2 1 −1 3 1 2 1 0 1 2 1 0
     
[A|b] =  1 2 1 0  2 1 −1 3 0 −3 −3 3
−1 1 2 −3 −1 1 2 −3 0 3 3 −3
     
1 2 1 0 1 2 1 0 1 0 −1 2
     
0 1 1 −1  →  0 1 1 −1  0 1 1 −1  .
0 3 3 −3 0 0 0 0 0 0 0 0

Thus, the system of equations from the matrix [A|b] is given in reduced echelon form by

− x3 = 2 x1
x2 + x3 = −1 ,
       
x1 2+x3 2 1
       
from which we read off x =  x2  = −1−x3  = −1 + x3 −1.
x3 x3 0 1
b.
" # " #
1 1 1 1 6 1 1 1 1 6
[A|b] =
3 3 2 0 17 0 0 −1 −3 −1
" # " #
1 1 1 1 6 1 1 0 −2 5
.
0 0 1 3 1 0 0 1 3 1

Thus, the system of equations from the matrix [A|b] is given in reduced echelon form by

x1 + x2 − 2x4 = 5
x3 + 3x4 = 1 ,

from which we read off the general solution


         
x1 5−x2 +2x4 5 −1 2
         
 x2   x2  0    
x=    =   + x2  1 + x4  0 .
 x  = 1 
−3x4  1    −3
 3   0  
x4 x4 0 0 1
c.
   
1 1 1 −1 0 −2 1 1 1 −1 0 −2
   
2 0 4 1 −1 10  0 −2 2 3 −1 14 
[A|b] = 
1
  
 2 0 −2 2 −3 

0
 1 −1 −1 2 −1 

0 1 −1 2 4 7 0 1 −1 2 4 7
   
1 1 1 −1 0 −2 1 1 1 −1 0 −2
   
0 1 −1 −1 2 −1  0 1 −1 −1 2 −1 
   
0 −2 2 3 −1 14  0 0 0 1 3 12 
   
0 1 −1 2 4 7 0 0 0 3 2 8
   
1 1 1 −1 0 −2 1 1 1 −1 0 −2
   
0 1 −1 −1 2 −1  0 1 −1 −1 2 −1 
   
0 0 0 1 3 12  0 0 0 1 3 12 
   
0 0 0 0 −7 −28 0 0 0 0 1 4
   
1 1 1 −1 0 −2 1 1 1 0 0 −2
   
0 1 −1 −1 0 −9  0 1 −1 0 0 −9 
   
0 0 0 1 0 0 0 0 0 1 0 0
   
0 0 0 0 1 4 0 0 0 0 1 4
 
1 0 2 0 0 7
 
0 1 −1 0 0 −9 
 .
0 0 0 1 0 0
 
0 0 0 0 1 4
       
x1 7−2x3 7 −2
       
 x2  −9+ x3  −9  1
       
  
Thus, the general solution is x =  x3  =  x3  =  0 + x3 
   
 1.
       
 x4   0   0  0
x5 4 4 0
   
4.1.5 To find all unit vectors in $\mathbb R^3$ that make angle $\pi/3$ with $\mathbf u = \begin{pmatrix}1\\0\\-1\end{pmatrix}$ and $\mathbf v = \begin{pmatrix}0\\1\\1\end{pmatrix}$, we need to find $\mathbf x$ so that the cosine of the angle between $\mathbf u$ and $\mathbf x$ and the cosine of the angle between $\mathbf v$ and $\mathbf x$ are both equal to $1/2$. That is, we must find vectors $\mathbf x$ with
$$\|\mathbf x\| = 1 \quad\text{and}\quad \frac{\mathbf u\cdot\mathbf x}{\|\mathbf u\|\|\mathbf x\|} = \frac{\mathbf v\cdot\mathbf x}{\|\mathbf v\|\|\mathbf x\|} = \frac12.$$
This gives the equations
$$\|\mathbf x\| = 1, \quad x_1 - x_3 = \frac1{\sqrt2}, \quad\text{and}\quad x_2 + x_3 = \frac1{\sqrt2}.$$
The general solution of the system given by the last two equations is
$$\mathbf x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} \frac1{\sqrt2} + x_3 \\ \frac1{\sqrt2} - x_3 \\ x_3 \end{pmatrix} = \begin{pmatrix} 1/\sqrt2 \\ 1/\sqrt2 \\ 0 \end{pmatrix} + x_3\begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix}.$$
Since we want $\|\mathbf x\| = 1$, we find that we must have $x_3 = 0$, i.e., $\mathbf x = \begin{pmatrix} 1/\sqrt2 \\ 1/\sqrt2 \\ 0 \end{pmatrix}$. (This follows from straightforward computation, but more elegantly by noting that the direction vector of the line is orthogonal to $\begin{pmatrix} 1/\sqrt2 \\ 1/\sqrt2 \\ 0 \end{pmatrix}$, which is already a unit vector.)

4.1.6 a. We wish to find a vector x ∈ R4 such that Ax = 0, where


   
1 1 1 1 1 0 0 −1
   
A = 1 2 1 2, whose reduced echelon form is 0 1 0 1.
1 3 2 4 0 0 1 1
   
The general solution of $A\mathbf x = \mathbf 0$ is $\mathbf x = x_4\begin{pmatrix}1\\-1\\-1\\1\end{pmatrix}$, and so $\begin{pmatrix}1\\-1\\-1\\1\end{pmatrix}$ gives a normal vector to the hyperplane.
b. Here
   
1 1 1 1 1 0 0 0
   
A = 2 2 1 2, and the reduced echelon form is 0 1 0 1.
1 3 2 3 0 0 1 0
 
So the general solution of $A\mathbf x = \mathbf 0$ is $\mathbf x = x_4\begin{pmatrix}0\\-1\\0\\1\end{pmatrix}$, and a normal vector to the hyperplane is $\begin{pmatrix}0\\-1\\0\\1\end{pmatrix}$.

4.1.7 We first find a, b, and c so that the given points satisfy x2 + y 2 + ax + by + c = 0. This
means we must solve the system

2a + 6b + c = −40
−a + 7b + c = −50
−4a − 2b + c = −20 .

We reduce the augmented matrix to reduced echelon form:


   
2 6 1 −40 1 0 0 2
   
 −1 7 1 −50  0 1 0 −4  .
−4 −2 1 −20 0 0 1 −20

Thus, x2 + y 2 + 2x − 4y − 20 = 0 gives the circle that contains the three points. To find the center
and radius of this circle, we complete the square:

0 = x2 + y 2 + 2x − 4y − 20 = (x + 1)2 + (y − 2)2 − 25.


 
Thus, the center of the circle is $\begin{pmatrix}-1\\2\end{pmatrix}$ and its radius is 5.
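A short SymPy confirmation of the system and the completed square (the three points below are inferred from the system above):

```python
import sympy as sp

a, b, c, x, y = sp.symbols('a b c x y')
pts = [(2, 6), (-1, 7), (-4, -2)]   # points inferred from the system above
sol = sp.solve([px**2 + py**2 + a*px + b*py + c for px, py in pts], [a, b, c])
print(sol)   # {a: 2, b: -4, c: -20}

circle = x**2 + y**2 + sol[a]*x + sol[b]*y + sol[c]
print(sp.expand(circle - ((x + 1)**2 + (y - 2)**2 - 25)))   # 0
```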

4.1.8 Finding numbers c1 , c2 , and c3 so that c1 v1 + c2 v2 + c3 v3 = b is equivalent to solving


the system
    
1 0 2 c1 3
    
 0 1 1   c2  =  0  .
−1 2 1 c3 −2
 
1
The solution of this system is c =  −1 , so b = v1 − v2 + v3 .
1

4.1.9 Let A have columns v1 , v2 , v3 . Then b can be written as a linear combination of v1 ,


v2 , and v3 if and only if Ax = b is consistent. We compute the constraint equations for Ax = b
to be consistent:
     
1 0 1 b1 1 0 1 b1 1 0 1 b1
     
 0 −1 −2 b2   0 −1 −2 b2  0 1 2 −b2 
     .
 1 0 1 b   0 0 0 b − b   0 0 0 b − b 
 3   3 1   3 1 
−2 1 0 b4 0 1 2 b4 + 2b1 0 0 0 b2 + b4 + 2b1

Thus, the constraint equations are b3 − b1 = 2b1 + b2 + b4 = 0. The vector b in part a does not
satisfy the second constraint, the vector in part b satisfies both constraints, and the vector in part
c satisfies neither constraint.
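Consistency is equivalent to $\operatorname{rank}[A\,|\,\mathbf b] = \operatorname{rank} A$, which gives a mechanical check; the sample vectors below are illustrative assumptions (the exercise's actual vectors for parts a to c are not restated here):

```python
import sympy as sp

A = sp.Matrix([[1, 0, 1], [0, -1, -2], [1, 0, 1], [-2, 1, 0]])

def consistent(b):
    return A.rank() == A.row_join(sp.Matrix(b)).rank()

print(consistent([1, 0, 1, -2]))   # True:  b3 - b1 = 0 and 2b1 + b2 + b4 = 0
print(consistent([1, 0, 0, 0]))    # False: violates b3 - b1 = 0
```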
 
4.1.10 a. Since the matrix $A = \begin{bmatrix} 1&1 \\ 1&2 \\ 1&2 \end{bmatrix}$ has rank at most 2, there will be constraint(s) for the equation $A\mathbf x = \mathbf b$ to be consistent. Hence the vectors cannot span $\mathbb R^3$.
 
1 1 1
 
b. A =  1 2 3  has rank 2, so there will be a constraint equation for Ax = b
1 2 3
to be consistent. Hence the vectors cannot span R3 .
 
1 1 3 2
 
c. A =  0 −1 5 3  has rank 2, so the vectors do not span R3 .
1 1 3 2
 
1 2 0
 
d. A =  0 1 1  has rank 3; thus, Ax = b is always consistent, and so the
−1 1 5
3
vectors span R .

4.1.11 a.
   
3 −1 b1 3 −1 b1
   
 6 −2 b2  0 0 b2 − 2b1 
−9 3 b3 0 0 b3 + 3b1

Constraint equations: 2b1 − b2 = 0, 3b1 + b3 = 0.


b.
     
1 1 1 b1 1 1 1 b1 1 1 1 b1
     
 −1 1 2 b2  0 2 3 b1 + b2  0 2 3 b1 + b2 
1 3 4 b3 0 2 3 b3 − b1 0 0 0 b3 − 2b1 − b2

Constraint equation: 2b1 + b2 − b3 = 0.



c.

     
1 2 1 b1 1 2 1 b1 1 2 1 b1
     
 0 1 1 b2  0 1 1 b2  0 1 1 b2 
     
 −1 3 4 b3  0 5 5 b3 + b1  0 0 0 b3 + b1 − 5b2 
     
−2 −1 1 b4 0 3 3 b4 + 2b1 0 0 0 b4 + 2b1 − 3b2

Constraint equations: b1 − 5b2 + b3 = 0, 2b1 − 3b2 + b4 = 0.

4.1.12 a.
To write
 b as a linearcombination of these three vectors we must solve the system
1 0 1
 
0 1 1

Ax = b, where A =  . The constraint equations are given by
1 1 1
1 2 0

   
1 0 1 b1 1 0 1 b1
   
0 1 1 b2  0 1 1 b2 
   
1 1 1 b3  0 1 0 b3 − b1 
   
1 2 0 b4 0 2 −1 b4 − b1
   
1 0 1 b1 1 0 1 b1
   
0 1 1 b2  0 1 1 b2 
   .
0 0 −1 b3 − b2 − b1  0 0 1 b1 + b2 − b3 
   
0 0 −3 b4 − 2b2 − b1 0 0 0 2b1 + b2 − 3b3 + b4

The constraint equation is 2b1 + b2 − 3b3 + b4 = 0.


b.

     
1 0 2 b1 1 0 2 b1 1 0 2 b1
     
0 1 −1 b2  0 1 −1 b2  0 1 −1 b2 
     
1 1 1 b3  0 1 −1 b3 − b1  0 0 0 b3 − b2 − b1 
     
1 2 0 b4 0 2 −2 b4 − b1 0 0 0 b4 − 2b2 − b1

The constraint equations are $b_1 + b_2 - b_3 = 0$ and $b_1 + 2b_2 - b_4 = 0$.


         
4.1.13 a. If $A\begin{pmatrix}1\\0\\1\end{pmatrix} = A\begin{pmatrix}2\\1\\1\end{pmatrix} = \mathbf b$, then $A\left(\begin{pmatrix}2\\1\\1\end{pmatrix} - \begin{pmatrix}1\\0\\1\end{pmatrix}\right) = \mathbf b - \mathbf b = \mathbf 0$. Thus, $\begin{pmatrix}1\\1\\0\end{pmatrix}$ must be orthogonal to the rows of $A$. But $\begin{pmatrix}1\\0\\1\end{pmatrix}$ is not orthogonal to $\begin{pmatrix}1\\1\\0\end{pmatrix}$, so no such $A$ can exist.

b. $A = \begin{bmatrix} 0&1&0&1 \\ 0&0&1&1 \end{bmatrix}$ works. Notice that $\begin{pmatrix}4\\1\\0\\3\end{pmatrix} - \begin{pmatrix}1\\2\\1\\2\end{pmatrix} = \begin{pmatrix}3\\-1\\-1\\1\end{pmatrix}$ is orthogonal to $\begin{pmatrix}0\\1\\0\\1\end{pmatrix}$ and $\begin{pmatrix}0\\0\\1\\1\end{pmatrix}$.

c. Since the rows of $A$ are orthogonal to $\begin{pmatrix}1\\0\\1\\0\end{pmatrix}$, we know that $A\begin{pmatrix}1\\0\\1\\0\end{pmatrix} = \mathbf 0$, so $\begin{pmatrix}1\\0\\1\\0\end{pmatrix}$ cannot be a solution of $A\mathbf x = \mathbf b$ when $\mathbf b \ne \mathbf 0$.

d. If $A\begin{pmatrix}1\\0\\1\end{pmatrix} = A\begin{pmatrix}2\\1\\1\end{pmatrix} = \mathbf b_1$ and $A\begin{pmatrix}1\\0\\0\end{pmatrix} = A\begin{pmatrix}1\\1\\1\end{pmatrix} = \mathbf b_2$, then $A\left(\begin{pmatrix}2\\1\\1\end{pmatrix} - \begin{pmatrix}1\\0\\1\end{pmatrix}\right) = \mathbf b_1 - \mathbf b_1 = \mathbf 0$ and $A\left(\begin{pmatrix}1\\1\\1\end{pmatrix} - \begin{pmatrix}1\\0\\0\end{pmatrix}\right) = \mathbf b_2 - \mathbf b_2 = \mathbf 0$, so the rows of $A$ must be orthogonal to $\begin{pmatrix}1\\1\\0\end{pmatrix}$ and $\begin{pmatrix}0\\1\\1\end{pmatrix}$, suggesting the matrix $A = \begin{bmatrix} 1&-1&1 \\ 1&-1&1 \end{bmatrix}$, which satisfies the requirements.
" # " #
1 α 1 α
4.1.14 a. A= . For A to be singular, we must have 3α − α2 = 0, so
α 3α 0 3α − α2
α = 0 or α = 3. " #
1 0
b. α = 0: For x = b to be consistent, the vector b must satisfy b2 = 0.
0 0
α = 3: Since
" # " #
1 3 b1 1 3 b1
,
3 9 b2 0 0 b2 − 3b1

in order for Ax = b to be consistent, the vector b must satisfy 3b1 − b2 = 0.


   
4.1.15 a. $A = \begin{bmatrix} 1&1&\alpha \\ \alpha&2&\alpha \\ \alpha&\alpha&1 \end{bmatrix} \rightsquigarrow \begin{bmatrix} 1&1&\alpha \\ 0&2-\alpha&\alpha-\alpha^2 \\ 0&0&1-\alpha^2 \end{bmatrix}$. We see that $A$ will be singular if and only if $1 - \alpha^2 = 0$ or $2 - \alpha = 0$. Thus, the values $\alpha = 1$, $\alpha = -1$, and $\alpha = 2$ make $A$ singular.
b. If α = 1, we have
   
1 1 1 b1 1 0 1 2b1 − b2
   
1 2 1 b2  0 1 0 b2 − b1  ,
1 1 1 b3 0 0 0 b3 − b1

so Ax = b is consistent precisely when b3 − b1 = 0.



If α = −1, we have
   
1 1 −1 b1 1 1 −1 b1
   
 −1 2 −1 b2  0 3 −2 b1 + b2  ,
−1 −1 1 b3 0 0 0 b1 + b3

so Ax = b is consistent precisely when b1 + b3 = 0.


If α = 2, we have
   
1 1 2 b1 1 1 2 b1
   
2 2 2 b2  0 0 2 2b1 − b2 ,
2 2 1 b3 0 0 0 2b1 − 3b2 + 2b3

so Ax = b is consistent precisely when 2b1 − 3b2 + 2b3 = 0.


 
4.1.16 a. Counterexample: Take $A = \begin{bmatrix} 1&0 \\ 0&1 \\ 0&0 \end{bmatrix}$. Then $A\mathbf x = \mathbf 0$ has only the trivial solution, but $A\mathbf x = \begin{pmatrix}0\\0\\1\end{pmatrix}$ has no solution.

b. Let $A = \begin{bmatrix} 1&1 \\ 0&0 \end{bmatrix}$ and $B = \begin{bmatrix} 0&0 \\ 1&1 \end{bmatrix}$. Then the solutions of both $A\mathbf x = \mathbf 0$ and $B\mathbf x = \mathbf 0$ consist of all scalar multiples of $\begin{pmatrix}1\\-1\end{pmatrix}$. However, $A\mathbf x = \mathbf b$ is consistent only for vectors $\mathbf b$ satisfying $b_2 = 0$, whereas $B\mathbf x = \mathbf b$ is consistent only when $\mathbf b$ satisfies $b_1 = 0$.

4.1.17 a. Suppose (AB)x = 0. Then A(Bx) = 0, so, since A is nonsingular, Bx = 0. But now
since B is nonsingular, we must have x = 0.
b. Suppose first that B is singular; then there is a nonzero vector x so that Bx = 0. Then
that same nonzero vector x satisfies (AB)x = A(Bx) = 0, so AB is singular. Now suppose B is
nonsingular and A is singular. There is a nonzero vector y so that Ay = 0. Since B is nonsingular,
there is a (a fortiori nonzero) vector x so that Bx = y. Then we have (AB)x = A(Bx) = Ay = 0.
Thus, AB is singular.

4.1.18 a. $A$ can't exist: $A\mathbf x = \mathbf 0$ always has the trivial solution.

b. Take $r = m = n$; an example is the matrix $A = \begin{bmatrix} 1&0 \\ 0&1 \end{bmatrix}$.

c. Take $m > n$, $r = n$. An example is: $A = \begin{bmatrix} 1&0 \\ 0&1 \\ 0&0 \end{bmatrix}$.

d. Take $r = m < n$. An example is: $A = \begin{bmatrix} 1&0&0 \\ 0&1&0 \end{bmatrix}$.

e. Take $r < n$. An example is: $A = \begin{bmatrix} 1&0&0 \\ 0&1&0 \\ 0&0&0 \end{bmatrix}$.

f. $A$ can't exist: If $A\mathbf x = \mathbf b_3$ has infinitely many solutions, we must have $r < n$. If $A\mathbf x = \mathbf b_2$ has exactly one solution, we must have $r = n$.

4.1.19 a. Suppose that for some x ∈ Rn we have Ax = b. Then x = In x = (BA)x = B(Ax) =


Bb, so Bb is the unique solution. (Alternatively, if Ax = Ay, then x = B(Ax) = B(Ay) = y.)
b. Since A(Cb) = (AC)b = b, we see that x = Cb satisfies Ax = b.
c. By associativity, B = BIm = B(AC) = (BA)C = In C = C.

4.1.20 a. By successively adding row 1, row 2, . . . , and row m − 1 to row m, we obtain a row
of zeroes in the last row. Proceeding to echelon form, we see that there must be a row of zeroes,
and so r < m.
An alternative argument is as follows. Because the sum of the rows is 0, any vector b for which
Ax = b is consistent must satisfy b1 + · · · + bm = 0. (To see this, note that if Ax = b, then this
means that Ai · x = bi , so 0 = (A1 + · · · + Am ) · x = A1 · x + · · · + Am · x = b1 + · · · + bm .) Since
constraint equations arise from rows of zeroes in the echelon form of the matrix, we must have
r < m.
b. Since ci 6= 0, as in part a, we first multiply the ith row by ci , and then add to it c1
times row 1, c2 times row 2, . . . , and cm times row m. We thereby obtain a row of zeroes, and the
echelon form must therefore contain a row of zeroes.
 
4.1.21 a. If $\mathbf a_1 + \mathbf a_2 + \cdots + \mathbf a_n = \mathbf 0$, then $A\mathbf x = \mathbf 0$ has a nontrivial solution, viz., $\mathbf x = \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}$. Thus, $A$ is singular, and so $r < n$.

b. As in part a, if there is a nonzero vector $\mathbf c$ such that $\sum_{i=1}^n c_i\mathbf a_i = \mathbf 0$, then $A\mathbf x = \mathbf 0$ has the nontrivial solution $\mathbf x = \mathbf c$. Thus, $A$ is singular, so $r < n$.

4.1.22 a. We reduce the matrix to echelon form using x1 − x2 6= 0, x1 − x3 6= 0, and x2 − x3 6= 0:

     
1 x1 x21 1 x1 x21 1 x1 x21
     
 1 x2 x22   0 x2 − x1 x22 − x21   0 1 x1 + x2 
1 x3 x23 0 x3 − x1 x23 − x21 0 1 x1 + x3
   
1 x1 x21 1 x1 x21
   
 0 1 x1 + x2   0 1 x1 + x2  .
0 0 x3 − x2 0 0 1

Therefore, A is nonsingular.

b. Solving the system

    
x21 x1 1 a y1
 2    
 x2 x2 1   b  =  y2 
x23 x3 1 c y3

is equivalent to solving
    
1 x1 x21 c y1
 2    
1 x2 x2   b  =  y 2  .
1 x3 x23 a y3

By part a, the latter system always has a unique solution.

4.1.23 a. As in the hint, the three points P1 , P2 , and P3 are collinear if and only if there are
numbers a, b, and c with a and b not both zero such that the points satisfy ax + by + c = 0. That
is, they are collinear if and only if the system

ax1 + by1 + c = 0
ax2 + by2 + c = 0
ax3 + by3 + c = 0

 
a
 
has a solution  b  with a and b not both zero. But notice that the only solution with a = b = 0
c
is the trivial solution, a = b = c = 0; thus, P1 , P2 , and P3 are collinear if and only if the system
has a nontrivial solution. Of course the coefficient matrix for this system is A.
b. By part a, if the points P1 , P2 , and P3 are not collinear, then Ax = 0 has only the
trivial solution, so A is nonsingular. Now to find a circle passing through P1 , P2 , and P3 we need
to solve the system

ax1 + by1 + c = −(x21 + y12 )


ax2 + by2 + c = −(x22 + y22 )
ax3 + by3 + c = −(x23 + y32 ) ,

   
a x21 + y12
   
i.e., Ax = b, where x =  b  and b = −  x22 + y22 . Since A is nonsingular, this system has a
c x23 + y32
unique solution. Thus, there is a unique circle passing through P1 , P2 , and P3 .

4.2. Elementary Matrices and Calculating Inverse Matrices


    
1 1 0 0 1 0 0 1 0 0
 1    
4.2.1 a. E =  3 0 1 0 0 1 02 1 0 =
1 0 1 1 −3 0 1 0 0 1
   
1 0 0 1 0 −1
 2 1   
 3 3 0 ; since EA =  0 1 −1 , we see that the constraint equation for Ax = b
−1 1 1 0 0 0
to be consistent is −b1 + b2 + b3 = 0.
   1   1 
1 0 0 1 0 0 2 0 0 2 0 0
     
b. E =  0 1 01 1 0 0 1 0  =  12 1 0 ;
−3 0 1 0 0 1 0 0 1 − 32 0 1
 
1 −1 2
 
since EA =  0 0 0 , the constraint equations for Ax = b to be consistent are 12 b1 + b2 =
0 0 0
− 32 b1 + b3 = 0.
   
1 0 0 0 1 0 0 0 1 0 0 0
   
0 1 0 0  0  0
c. E =  0 1 0 0 1 0 ...
0 1
0 0 0 0 1 0 0 0 1 0 
 5   
1
0 0 0 1 0 0 5 1 0 −3 0 1
   
1 0 0 0 1 0 0 0 1 0 0 0
   
0 0  0  0
 1 0  0 1 0   −1 1 0 
0 0  0  0
 0 1   −2 0 1  0 0 1 
1 0 1 0 0 0 0 1 0 0 0 1
   
1 0 0 0 1 2 −1
   
 −1 1 0 0 0 1 2 
=   −2
; since EA = 
 
, the constraint equation
 5 0 1
5 0 0 0 1
18 1
5 −3 5 1 0 0 0
for Ax = b to be consistent is 18 b 1 − 3b 2 + 1
b
5 3 + b 4 = 0.
" #5
1 0
d. E = ; there is no row of zeroes in the echelon form of A, so there is no
−2 1
constraint equation for Ax = b to be consistent.
   
1 0 0 0 1 0 0 0 1 0 0 0
   
0 1 0 0 0 1 0 0 0 1 0 0 
e. E =  0

 

 
...
 0 1 00 0 1 0   0 −2 1 0
0 0 −1 1 0 −1 0 1 0 0 0 1
   
1 0 0 0 1 0 0 0 1 0 0 0
   
 0 0  0  0
 1 0  0 1 0   −1 1 0 
 0 0 1 0   −1 0 1 0  0 0 1 0 
   
−1 0 0 1 0 0 0 1 0 0 0 1
   
1 0 0 0 1 1 1 1
   
 −1 1 0 0 0 1 0 1
=   ; since EA =  , the constraint equa-
 1 −2 1 0 0 0 1 1
−1 1 −1 1 0 0 0 0
tion for Ax = b to be consistent is −b1 + b2 − b3 + b4 = 0.
   
1 0 0 0 1 0 0 0 1 0 0 0
   
0 1 0 0  0 −1 0 0 0 1 0 0
f. E =       ...
0 0 −1 00
 0 1 00
 0 1 0
0 0 0 1 0 0 0 1 0 0 −2 1
   
1 0 0 0 1 0 0 0 1 0 0 0
   
0 1 0 0 0 1 0 0  0 1 0 0 
   ...
0 0 1  
0   0 −3 1  
0 0 0 1 0
 
0 −7 0 1 0 0 0 1 −2 0 0 1
    
1 0 0 0 1 0 0 0 1 0 0 0
    
 0 0  0  0
 1 0 1 1 0  =  −1 −1 0 ; since
 −1 0 1 0 0 0 1 0   4 3 −1 0 
    
0 0 0 1 0 0 0 1 −1 −1 −2 1
 
1 2 0 −1 −1
 
0 1 −1 −1 −2 
EA =   , so the constraint equation for Ax = b to be consistent is
0 0 0 1 4
0 0 0 0 0
−b1 − b2 − 2b3 + b4 = 0.
   
1 0 0 0 1 0 0 0 1 0 0 0
   
0 1 0 0 0 1 0 0 0 1 0 0
g. E =       ...
1 0   0 −2 
0 0 2 0   0 1 0   1 0 
1
0 0 0 1 0 0 −2 1 0 0 0 1
    
1 0 0 0 1 0 0 0 1 0 0 0
    
0 1 0 0   −1 1 0 0   −1 1 0 0 
   =  ; since
0     1
0
 0 1 0 0 0 1 0  1 −1 2 
1 0 0 1 0 0 0 1 0 1 − 12 1
 
1 −1 1 1 0
 
0 1 1 0 1
EA =   , the constraint equation for Ax = b to be consistent is
0 0 0 1 −1 
 
0 0 0 0 0
b2 − 12 b3 + b4 = 0.
   
1 0 0 0 1 0 0 0 1 0 0 0
   
0 1 0 0  0  0
h. E =  0 1 0 0 1 0 ...
0 0 1
0 0 0 1 0 0 0 1 0 
 7     
0 0 0 1 0 0 −1 1 0 −4 0 1
    
1 0 0 0 1 0 0 0 1 0 0 0
    
0 1 0 0 0 1 0 0   0 1 0 0 
   =  ; since
 0 −3     1 3 1
0
 1 01 0 1 0  7 −7 7 
0 0 0 1 0 0 0 1 −1 −1 −1 1
 
1 1 0 5 0 −1
 
0 1 1 3 −2 0
EA =  0
, the constraint equation for Ax = b to be consistent
 0 0 0 1 −1  
0 0 0 0 0 0
is −b1 − b2 − b3 + b4 = 0.
" # " # " #
1 2 1 0 1 2 1 0 1 2 1 0
4.2.2 a. 1 1
−1 3 0 1 0 5 1 1 0 1 5 5
" # " #
3
1 0 − 25 1 3 −2
5
1 1
, so A−1 = .
0 1 5 5
5 1 1
   
1 2 3 1 10 2 3 0 1 0 0
   
b.  1 1 2 0  0 −1 −1 −1
1 0 1 0
0 1 2 0 00 1 2 1 0 0 1
   
1 2 3 1 0 0 1 2 0 4 −3 −3
   
0 1 1 1 −1 0 0 1 0 2 −2 −1 
0 0 1 −1 1 1 0 0 1 −1 1 1
   
1 0 0 0 1 −1 0 1 −1
  −1  
0 1 0 2 −2 −1 , so A =  2 −2 −1 .
0 0 1 −1 1 1 −1 1 1
   
1 0 1 1 0 0 1 0 1 1 0 0
   
c.  0 2 1 0 1 0 0 2 1 0 1 0
−1 3 1 0 0 1 0 3 2 1 0 1
   
1 0 1 1 0 0 1 0 1 1 0 0
   
0 2 1 0 1 0 0 1 1 1 −1 1
0 1 1 1 −1 1 0 2 1 0 1 0
   
1 0 1 1 0 0 1 0 0 −1 3 −2
   
0 1 1 1 −1 1 0 1 0 −1 2 −1 ,
0 0 1 2 −3 2 0 0 1 2 −3 2
 
−1 3 −2
 
so A−1 =  −1 2 −1 .
2 −3 2
   
1 2 3 1 0 0 1 2 3 1 0 0
   
d.  4 5 6 0 1 0  0 −3 −6 −4 1 0
7 8 9 0 0 1 0 −6 −12 −7 0 1
 
1 2 3 1 0 0
 
0 −3 −6 −4 1 0 , and so we see that A−1 does not exist.
0 0 0 1 −2 1
   
2 3 4 1 0 0 1 −1 −2 0 0 −1
   
e.  2 1 1 0 1 0 2 3 4 1 0 0
−1 1 2 0 0 1 2 1 1 0 1 0
   
1 −1 −2 0 0 −1 1 −1 −2 0 0 −1
   
0 5 8 1 0 2  0 −1 −2 1 −2 −2 
0 3 5 0 1 2 0 3 5 0 1 2
   
1 −1 −2 0 0 −1 1 −1 −2 0 0 −1
   
0 1 2 −1 2 2 0 1 2 −1 2 2
0 3 5 0 1 2 0 0 −1 3 −5 −4
   
1 −1 0 −6 10 7 1 0 0 −1 2 1
   
0 1 0 5 −8 −6  0 1 0 5 −8 −6 ,
0 0 1 −3 5 4 0 0 1 −3 5 4
 
−1 2 1
 
so A−1 = 5 −8 −6 .
−3 5 4
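Each of these inverses can be verified by row-reducing $[A\,|\,I]$ in SymPy; for example, for part c:

```python
import sympy as sp

A = sp.Matrix([[1, 0, 1], [0, 2, 1], [-1, 3, 1]])   # part (c)
R = A.row_join(sp.eye(3)).rref()[0]
print(R[:, 3:])              # [[-1, 3, -2], [-1, 2, -1], [2, -3, 2]]
print(R[:, 3:] == A.inv())   # True
```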
" # " # " # " #
5 −3 3 2 3
4.2.3 a. (i) A−1 = ; (ii) x = ; (iii) b = 3 − .
−3 2 −1 3 5
       
−2 0 1 0 1 1
       
b. (i) A−1 = 9 −1 −3 ; (ii) x =  2 ; (iii) b = 2  2  −  3 .
−6 1 2 −1 2 2
         
1 −1 0 3 1 1 1
         
c. (i) A−1 =  −1 0 1 ; (ii) x =  −2  ; (iii) b = 3 0 − 2 1 + 2  1 .
1 1 −1 2 1 2 1
         
1 −1 0 0 2 1 1 1
         
0 1 −3 2        
d. (i) A−1 = ; (ii) x =  −1 ; (iii) b = 2  0  −  1  +  1 .
0 0 4 −3     0 0 1
  1      
0 0 −1 1 0 0 0 1
   
2 0 0 1
   
4.2.4 a. 4 −1  and  0 1 .
3 −1 1 0
" #
1 0
b. Let A = . Then there can be no 2 × 2 matrix B with AB = I2 , since the
0 0
bottom row of AB will always be 0.
" # " #
0 1 1 0 1 1
c. and .
1 0 −1 0 −1 0
d. Using the matrix A from part b, there can be no 2 × 2 matrix B with BA = I2 , since
the second column of BA will always be 0.

4.2.5 For an elementary matrix E of type (i), E −1 = E, for interchanging rows i and j of
a matrix and then interchanging rows i and j of the result gives us our original matrix. For an
elementary matrix of type (ii), with c 6= 0 in the ii-entry, the inverse is given by putting 1/c in the
ii-entry. For an elementary matrix of type (iii), we replace c by −c. That is, after adding c times
row i to row j, if we then add −c times row i to row j, we have returned to the original matrix. In
each of these cases, the inverse is again an elementary matrix.

4.2.6 By Theorem 2.1, AB and B are invertible. Since B −1 is also invertible, we infer from
Proposition 4.3 of Chapter 1 that A = (AB)(B −1 ) is invertible as well. Indeed, we have A−1 =
B(AB)−1 .

4.2.7 a. By Exercise 1.4.12,


" #" # " # " #
A−1 O A O A−1 A O Im O
= = = Im+n ,
O B −1 O B O B −1 B O In
so, by Corollary 2.2, we have
" # " #−1
A−1 O A O
= .
O B −1 O B
b. Notice that
" #" # " #
A−1 −A−1 CB −1 A C Im O
= ,
O B −1 O B O In
and so we infer from Corollary 2.2 that
" # " #−1
A−1 −A−1 CB −1 A C
= .
O B −1 O B

4.2.8 a. If A is nonsingular, we know its reduced echelon form is I. There are therefore
finitely many elementary row operations that transform A into I, each of these operations being
implemented by multiplying on the left by an elementary matrix. Thus, there are finitely many
elementary matrices E1 , E2 , . . ., Ek so that Ek Ek−1 · · · E2 E1 A = I.
b. Let B = Ek Ek−1 · · · E2 E1 . Since every elementary matrix is invertible, we have
A = E1−1 E2−1 · · · Ek−1
−1
Ek−1
= B −1 , and so AB = B −1 B = I. (Or, more directly, from A = B −1 we
infer that A−1 = B.)

4.2.9 a. Suppose A has a right inverse B (so AB = Im ). If b ∈ Rm , then b = Im b = (AB)b =


A(Bb). So x = Bb is a solution of Ax = b.
Now assume Ax = c has a solution for every c ∈ Rm . For j = 1, . . . , m, let bj be a solution of
Ax = ej ; define B to be the matrix whose j th column is bj . Then AB = Im . Proposition 1.3 tells
us that Ax = b is always consistent if and only if rank(A) = m.
b. Suppose A has a left inverse B and that Ax = 0. Then since BA = In we have
x = In x = (BA)x = B(Ax) = B0 = 0. Now Proposition 1.5 tells us that the system Ax = 0 has
only the trivial solution if and only if rank(A) = n. " #
In
Finally, if rank(A) = n, we know that the reduced echelon form of A is , where the zero
O
on the bottom indicates the (m − n) × n zero" matrix.
# Thus, there are m × m elementary matrices
In
E1 , E2 , . . . , Ek so that Ek Ek−1 · · · E2 E1 A = . Let B be the n × m matrix given by the top n
O
rows of the product Ek Ek−1 · · · E2 E1 . Then BA = In .
c. From parts a and b we know that A has both a left and right inverse if and only if
r = n = m. But if m = n and A has a left inverse, then Corollary 2.2 tells us that A is invertible.
Of course, if A is invertible, then A has both a left and right inverse.

4.3. Linear Independence, Basis, and Dimension

4.3.1 a. Correct: 2v1 − v3 = 0.


b. Incorrect: v2 cannot be written as a linear combination of v1 and v3 .
" # " # " # " #
1 2 1 2 1 0
4.3.2 a. Suppose c1 + c2 = 0. Since , we see that c1 = c2 =
4 9 4 9 0 1
   
1 2
0. Thus, , is linearly independent.
4 9
       
1 2 1 2 1 0
       
b. Suppose c1  4  + c2  9  = 0. From  4 9 0 1 , we see that c1 =
0 0 0 0 0 0
    
 1 2 
c2 = 0. Thus,  4  ,  9  is linearly independent.
 
0 0
     
1 2 3
     
c. Suppose c1  4  + c2  9  + c3  −2  = 0. Since the reduced echelon form of
0 0 0
   
1 2 3 1 0 31
   
4 9 −2  is  0 1 −14 , we see that this system has infinitely many solutions (e.g.,
0 0 0 0 0 0
     
 1 2 3 
c1 = −31, c2 = 14, c3 = 1), and so the set  4  ,  9  ,  −2  is linearly dependent.
 
0 0 0
     
1 2 0
     
d. Suppose c1  1  + c2  3  + c3  1  = 0. Since the echelon form of the matrix
1 3 2
   
1 2 0 1 2 0
   
1 3 1 0 1 1  has three pivots, we see that c1 = c2 = c3 = 0 is the only
1 3 2 0 0 1
solution, and so the vectors form a linearly independent set.
       
1 1 1 3
       
1 1 3 1
e. Suppose c1        
 1  + c2  3  + c3  1  + c4  1  = 0. We find that
       
3 1 1 1

   
1 1 1 3 1 1 1 3
   
1 1 3 1 0 1 0 −1 
   ,
1 3 1 1 0 0 1 −1 
   
3 1 1 1 0 0 0 −10

so the only solution is the trivial solution. Thus, the vectors form a linearly independent set.
f. These vectors form a linearly dependent set. For example, their sum is 0.
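Each part reduces to a rank computation, which SymPy makes routine; for example, parts c and d:

```python
import sympy as sp

C = sp.Matrix([[1, 2, 3], [4, 9, -2], [0, 0, 0]])   # columns are the vectors of part (c)
print(C.rank())        # 2 < 3: linearly dependent
print(C.nullspace())   # [(-31, 14, 1)^T], matching c1 = -31, c2 = 14, c3 = 1

D = sp.Matrix([[1, 2, 0], [1, 3, 1], [1, 3, 2]])    # part (d)
print(D.rank())        # 3: linearly independent
```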

4.3.3 Suppose c1 (v − w) + c2 (2v + w) = 0. Then (c1 + 2c2 )v +"(c2 − c1 )w#= 0." Since {v,
# w}
1 2 1 0
is linearly independent, we infer that c1 + 2c2 = −c1 + c2 = 0. But , so
−1 1 0 1
the only solution of this system is c1 = c2 = 0, as desired.

4.3.4 a. Since {u, v, w} is linearly independent, u ∈ / Span(v, w). Since v × w is orthogonal


to the plane spanned by v and w, if we had u · (v × w) = 0, then u would have to lie in that plane.
b. Suppose c1 u × v + c2 v × w + c3 w × u = 0. Dotting with u, and using u · (u × v) =
u · (w × u) = 0, we find that c2 u · (v × w) = 0. From part a we infer that c2 = 0. Now we are
left with u × (c1 v − c3 w) = 0, so c1 v − c3 w must be a scalar multiple of u. We infer from linear
independence of {u, v, w} that c1 = c3 = 0.
(Alternatively, we can deduce in analogy with part a that v · (w × u) 6= 0 and w · (u × v) 6= 0,
and argue in a symmetric fashion.)

4.3.5 Suppose c1 v1 +c2 v2 +· · ·+ck vk = 0. Then, for i = 1, . . . , k, (c1 v1 +c2 v2 +· · ·+ck vk )·vi =
0, so c1 (v1 · vi ) + · · · + ci (vi · vi ) + · · · + ck (vk · vi ) = ci kvi k2 = 0. Since vi 6= 0, we must have ci = 0
for i = 1, . . . , k. Hence {v1 , . . . , vk } is linearly independent.  
| | |
 
4.3.6 Suppose k > n, let v1 , . . . , vk ∈ Rn , and write A = v1 v2 · · · vk . Since
| | |
rank(A) ≤ n < k, the system Ax = 0 will have a nontrivial solution. Thus, {v1 , . . . , vk } is linearly
dependent. If we have k linearly independent vectors in Rn , we conclude that k ≤ n.

4.3.7 Suppose {v1 , . . . , vk } is linearly dependent. Then there exist scalars c1 , . . . , ck , not all
P
k P
zero, such that ci vi = 0. If cj 6= 0, we write vj = −(ci /cj )vi .
i=1 i6=j

4.3.8 We prove the contrapositive of the statement: If v1 6= 0 and vi+1 ∈ / Span(v1 , . . . , vi ) for
all i = 1, 2, . . . , k − 1, then {v1 , . . . , vk } is linearly independent. We proceed by induction. Suppose
v1 6= 0. Then {v1 } is linearly independent. Now suppose {v1 , . . . , vi } is linearly independent for
some 1 ≤ i ≤ k − 1. Then if vi+1 6∈ Span(v1 , . . . , vi ), we see from Proposition 3.2 that the set
{v1 , . . . , vi , vi+1 } is linearly independent.

4.3.9 Suppose c1 v1 + c2 v2 + · · · + ck vk = 0. Then, A(c1 v1 + c2 v2 + · · · + ck vk ) = 0, so


A(c1 v1 ) + · · · + A(ck vk ) = c1 (Av1 ) + · · · + ck (Avk ) = c1 b1 + · · · + ck bk = 0. Since {b1 , . . . , bk } is
linearly independent, c1 = · · · = ck = 0, so {v1 , . . . , vk } is linearly independent.

4.3.10 Suppose c1 T (v1 ) + · · · + ck T (vk ) = 0. Then T (c1 v1 + c2 v2 + · · · + ck vk ) = 0. Since [T ] is


nonsingular, we know that T (x) = 0 has only the trivial solution. Thus, c1 v1 +c2 v2 +· · ·+ck vk = 0.
Since {v1 , . . . , vk } is linearly independent, it follows that c1 = c2 = · · · = ck = 0, as desired.

4.3.11 Suppose c1 T (v1 ) + · · · + ck T (vk ) = 0. Then T (c1 v1 + c2 v2 + · · · + ck vk ) = 0. Since


rank([T ]) = n, the equation T (x) = 0 has only the trivial solution, so c1 v1 + c2 v2 + · · · + ck vk = 0.
Since {v1 , . . . , vk } is linearly independent, c1 = · · · = ck = 0. Hence {T (v1 ), . . . , T (vk )} is linearly
independent. " # " #
1 −1 1
If rank([T ]) < n, the statement is false. For example, if A = and v = , then
1 −1 1
the set {v} is linearly independent, whereas the set {Av} is not.
     
1 2 1
4.3.12 a. No: Since  2  − 2  4  + 3  2  = 0, the vectors form a linearly dependent set.
1 5 3
b. No: Any four vectors in R3 form a linearly dependent set.
c. No: Three vectors span a subspace whose dimension is at most 3.
 
1 0 1 2
 
0 1 1 −2 
d. Yes: The matrix 
2
 is nonsingular.
 1 4 1
3 1 4 2
     
4.3.13 a. We have $A = \begin{bmatrix} 1&3&5 \\ 2&4&-2 \\ 3&7&3 \end{bmatrix} \rightsquigarrow \begin{bmatrix} 1&0&-13 \\ 0&1&6 \\ 0&0&0 \end{bmatrix}$, and so $A\begin{pmatrix}13\\-6\\1\end{pmatrix} = \mathbf 0$. This means that $\begin{pmatrix}5\\-2\\3\end{pmatrix} = -13\begin{pmatrix}1\\2\\3\end{pmatrix} + 6\begin{pmatrix}3\\4\\7\end{pmatrix}$, so $\begin{pmatrix}1\\2\\3\end{pmatrix}$ and $\begin{pmatrix}3\\4\\7\end{pmatrix}$ span $V$. These vectors are easily checked to be linearly independent; thus, they form a basis for $V$ and $\dim V = 2$.

b. The general element of $V$ is of the form $\mathbf x = x_3\begin{pmatrix}-1\\0\\1\\0\end{pmatrix} + x_4\begin{pmatrix}0\\-1\\0\\1\end{pmatrix}$, and so $V$ is spanned by $\begin{pmatrix}-1\\0\\1\\0\end{pmatrix}$ and $\begin{pmatrix}0\\-1\\0\\1\end{pmatrix}$, which are easily checked to form a linearly independent set, so $\dim V = 2$.

c. $\mathbf x\in V \iff x_1 + 2x_2 + 3x_3 = 0$. The general solution of this equation is $\mathbf x = x_2\begin{pmatrix}-2\\1\\0\end{pmatrix} + x_3\begin{pmatrix}-3\\0\\1\end{pmatrix}$, and so $\left\{\begin{pmatrix}-2\\1\\0\end{pmatrix}, \begin{pmatrix}-3\\0\\1\end{pmatrix}\right\}$ gives a basis for $V$ and $\dim V = 2$.

d. The general solution of this system of equations is $\mathbf x = x_2\begin{pmatrix}1\\1\\0\\0\\0\end{pmatrix} + x_4\begin{pmatrix}0\\0\\1\\1\\0\end{pmatrix} + x_5\begin{pmatrix}0\\0\\0\\0\\1\end{pmatrix}$, so a basis for $V$ is $\left\{\begin{pmatrix}1\\1\\0\\0\\0\end{pmatrix}, \begin{pmatrix}0\\0\\1\\1\\0\end{pmatrix}, \begin{pmatrix}0\\0\\0\\0\\1\end{pmatrix}\right\}$, and $\dim V = 3$.
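SymPy's columnspace and nullspace methods confirm these bases; for example, for parts a and c:

```python
import sympy as sp

A = sp.Matrix([[1, 3, 5], [2, 4, -2], [3, 7, 3]])   # part (a): V = span of the columns
print(A.columnspace())   # [(1, 2, 3)^T, (3, 4, 7)^T], so dim V = 2

B = sp.Matrix([[1, 2, 3]])                          # part (c): V = {x : x1 + 2x2 + 3x3 = 0}
print(B.nullspace())     # [(-2, 1, 0)^T, (-3, 0, 1)^T], so dim V = 2
```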

4.3.14 To show that {v1 , . . . , vn } is a basis for Rn , it suffices by Proposition 3.9 to show that
this is a linearly independent
" set of # vectors.
" #
2 3 3 1 0 3
a. From , we see that {v1 , v2 } is linearly indepen-
3 5 4 0 1 −1
   
c1 3
dent and that b = 3v1 − v2 , so the coordinates of b with respect to this basis are = .
c2 −1
   
1 1 1 1 1 0 0 0
   
b. From  0 2 3 1 0 1 0 2 , we see that {v1 , v2 , v3 } is
3 2 2 2 0 0 1 −1
linearly
   independent
 and that 2v2 − v3 = b, so the coordinates of b with respect to this basis are
c1 0
 c2  =  2  .
c3 −1
   
1 1 1 3 1 0 0 3
   
c. From  0 1 1 0 0 1 0 −2 , we see that {v1 , v2 , v3 } is
1 2 1 1 0 0 1 2
   
c1 3
linearly independent. Moreover, the coordinates of b with respect to this basis are  c2  =  −2 .
c3 2
   
1 1 1 1 2 1 0 0 0 2
   
0 1 1 1 0 0 1 0 0 −1 
d. From 
0
  , we see that {v1 , v2 , v3 , v4 }
 0 1 3 1
0
 0 1 0 1
0 0 1 4 1 0 0 0 1 0
   
c1 2
 c2   −1 
is linearly independent. Moreover, the coordinates of b with respect to this basis are    
 c3  =  1  .
       
c4 0
1 2 0 2
       
0 1 1 0
4.3.15 x ∈ V ∩ W if and only if x = a   + b   and x = c   + d 
      
 1  for some scalars
1 1 1  
1 2 0 2
a, b, c, and d.    
1 2 0 2 1 0 0 1
   1 
0 1 1 0  0 1 0 
A= 1




2 ,
1 
 1 1 1 0 0 1 −2 
1 2 0 2 0 0 0 0
     
−a −1 −2
   1  
 −b  −  1 −1 
so the vector     2   
c  =  1  = 2  1  spans the solution set of Ax = 0. This means that
   2  
d 1 2
         
1 2 0 2 4
         
0 1 1 0 1
x = 2         
 1  + 1  1  = 1  1  + 2  1  =  3  spans, and therefore gives a basis for, V ∩ W .
         
1 2 0 2 4

4.3.16 a. First, {v1 , . . . , vn } is linearly independent: Suppose c1 v1 + c2 v2 + · · · + cn vn = 0.


For any i = 1, . . . , n, dotting this equation with vi gives ci kvi k2 = 0, and so ci = 0. Now, by
Proposition 3.9, {v1 , . . . , vn } gives a basis for Rn .
b. To find the coordinates c1 , . . . , cn of x with respect to the basis {v1 , . . . , vn }, we write
x = c1 v1 + c2 v2 + · · · + cn vn . Since vi · vj = 0 for all i 6= j, taking the dot product of the equation
with vi gives x · vi = ci kvi k2 , i = 1, . . . , n. Therefore, we have ci = x · vi /kvi k2 , i = 1, . . . , n.
c. We conclude from part b that
Xn Xn
x · vi
x= vi = projvi x,
i=1
kvi k2 i=1

by definition.

4.3.17 To establish the first part of the Proposition, suppose v1 , . . . , vk span V and were to form
a linearly dependent set. Then one of the vectors, say vk , is a linear combination of the remaining
vectors, and so Span(v1 , . . . , vk−1 ) = Span(v1 , . . . , vk ). Continuing in this fashion, we end up with
a linearly independent subset of {v1 , . . . , vk } still spanning V . If there are ℓ < k vectors in that
subset, then it follows that those ℓ vectors form a basis for V , and so dim V = ℓ. Since dim V = k,
this is a contradiction; thus, the k vectors must have been linearly independent.
To establish the latter part of the Proposition, let {v1 , . . . , vk } be a linearly independent set in
V . Let W = Span(v1 , . . . , vk ). Then W ⊂ V and dim W = k. By Lemma 3.8, we have W = V .

4.3.18 This is the same as the proof of Theorem 3.5: If v1 , . . . , vk span V , we’re done; if
not, choose vk+1 ∈ V with vk+1 ∈ / Span(v1 , . . . , vk ). Then {v1 , . . . , vk , vk+1 } is still linearly
independent. Repeat the process; since n linearly independent vectors span Rn , this process must
terminate after a finite number of steps.

4.3.19 Pick a basis {v1 , . . . , vk } for W. By Exercise 18, we can find vectors vk+1 , . . . , vℓ so that
{v1 , . . . , vℓ } forms a basis for V . Hence, dim W = k ≤ ℓ = dim V .

4.3.20 By Exercise 9, {v1 , . . . , vn } is linearly independent, so {v1 , . . . , vn } forms a basis for Rn .


P
Suppose Ax = 0. Then there are scalars c1 , . . . , cn so that x = ni=1 ci vi . Then, A(c1 v1 + · · · +
cn vn ) = c1 (Av1 ) + · · · + cn (Avn ) = 0. However, since {Av1 , . . . , Avn } is linearly independent, we
infer that c1 = c2 = · · · = cn = 0, so x = 0. Hence A is nonsingular.
An alternative proof is as follows. Let B be the matrix whose columns are v1 , . . . , vn . Then,
by hypothesis, the matrix AB has linearly independent columns and is therefore nonsingular. By
Exercise 4.1.17, both A and B must be nonsingular.

4.3.21 a. Note that Span(u1 , . . . , uk , v1 , . . . , vk ) ⊂ U + V . Also, if x ∈ U + V , then x = u + v


for some u ∈ U and some v ∈ V , so x = u + v = c1 u1 + · · · + ck uk + d1 v1 + · · · + dℓ vℓ for some
ci , dj ∈ R. Therefore, the vectors u1 , . . . , uk , v1 , . . . , vℓ span U + V .
Now we want to see that {u1 , . . . , uk , v1 , . . . , vℓ } is linearly independent. Suppose c1 u1 + · · · +
ck uk + d1 v1 + · · · + dℓ vℓ = 0. Then, letting u = c1 u1 + · · · + ck uk and v = −(d1 v1 + · · · + dℓ vℓ ),
we see that u ∈ U , v ∈ V , and u = v. Thus, we conclude that u ∈ U ∩ V , and hence u = v = 0.
From the fact that {u1 , . . . , uk } and {v1 , . . . , vℓ } are linearly independent sets we now infer that
c1 = · · · = ck = d1 = · · · = dℓ = 0. Therefore, {u1 , . . . , uk , v1 , . . . , vℓ } is a basis for U + V .
b. This follows immediately from part a and the definition of dimension.
c. Let dim(U ∩ V ) = k and choose a basis {w1 , . . . , wk } for U ∩ V . Using Exercise
18, we can find vectors {uk+1 , . . . , uℓ } and {vk+1 , . . . , vm } so that {w1 , . . . , wk , uk+1 , . . . , uℓ } is
a basis for U and {w1 , . . . , wk , vk+1 , . . . , vm } is a basis for V . We claim that {w1 , . . . , wk , uk+1 ,
. . . , uℓ , vk+1 , . . . , vm } is a basis for U + V .
98 4. IMPLICIT AND EXPLICIT SOLUTIONS OF LINEAR SYSTEMS

To show these vectors span U + V , let x ∈ U + V ; then x = u + v for some u ∈ U and v ∈ V .


Then there are scalars a1 , . . . , aℓ and b1 , . . . , bm so that

u = a1 w1 + · · · + ak wk + ak+1 uk+1 + · · · + aℓ uℓ and


v = b1 w1 + · · · + bk wk + bk+1 vk+1 + · · · + bm vm .

Thus,

u + v = (a1 + b1 )w1 + · · · + (ak + bk )wk + ak+1 uk+1 + · · · + aℓ uℓ + bk+1 vk+1 + · · · + bm vm .

Therefore, Span(w1 , . . . , wk , uk+1 , . . . , uℓ , vk+1 , . . . , vm ) = U + V .


To show the set is linearly independent, assume there are scalars a1 , . . . , ak , bk+1 , . . . , bℓ , and
ck+1 , . . . , cm so that

a1 w1 + · · · + ak wk + bk+1 uk+1 + · · · + bℓ uℓ + ck+1 vk+1 + · · · + cm vm = 0.

Now, let u = a1 w1 + · · · + ak wk + bk+1 uk+1 + · · · + bℓ uℓ and v = −(ck+1 vk+1 + · · · + cm vm ).


Then u = v, u ∈ U , and v ∈ V . Thus, v ∈ U ∩ V , so we can find scalars d1 , . . . , dk so that
v = d1 w1 + · · · + dk wk . Setting the two expressions for v equal, we obtain

d1 w1 + · · · + dk wk + ck+1 vk+1 + · · · + cm vm = 0.

Since {w1 , . . . , wk , vk+1 , . . . , vm } is linearly independent, we immediately deduce that d1 = · · · =


dk = ck+1 = · · · = cm = 0. This leads us to u = a1 w1 + · · · + ak wk + bk+1 uk+1 + · · · + bℓ uℓ = 0, and
so, by linear independence of {w1 , . . . , wk , vk+1 , . . . , vm }, we infer that a1 = · · · = ak = bk+1 =
· · · = bℓ = 0. In conclusion, we have a1 = · · · = ak = bk+1 = · · · = bℓ = ck+1 = · · · = cm = 0, so the
given set of vectors is in fact linearly independent.
Thus, we know that {w1 , . . . , wk , uk+1 , . . . , uℓ , vk+1 , . . . , vm } is a basis for U + V . Counting,
we have dim(U + V ) = k + (ℓ − k) + (m − k) = ℓ + m − k = dim(U ) + dim(V ) − dim(U ∩ V ), as
required.

4.3.22 a. Since T is linear, for any v ∈ V , we have T (0) = T (v − v) = T (v) − T (v) = 0, so


0 ∈ ker(T ) and 0 ∈ image(T ).
Now, if u, v ∈ ker(T ), then T (u + v) = T (u) + T (v) = 0 + 0 = 0, so u + v ∈ ker(T ). Likewise,
for any scalar c, T (cv) = cT (v) = c0 = 0, so cv ∈ ker(T ).
If w, z ∈ image(T ), then there are u, v ∈ V so that T (u) = w and T (v) = z. Thus, T (u + v) =
T (u) + T (v) = w + z, so w + z ∈ image(T ). Similarly, for any scalar c, T (cu) = cT (u) = cw, so
cw ∈ image(T ).
b. First, T (vk+1 ), . . . , T (vn ) span image(T ): An arbitrary element of image(T ) is of the
form w = T (v) for some v ∈ Rn . Writing v = c1 v1 + · · · + ck vk + ck+1 vk+1 + · · · + cn vn , we have
w = T (v) = ck+1 T (vk+1 ) + · · · + cn T (vn ), as required.
Next, we claim {T (vk+1 ), . . . , T (vn )} is linearly independent. Suppose ck+1 T (vk+1 ) + · · · +
cn T (vn ) = 0. Then T (ck+1 vk+1 + · · · + cn vn ) = 0, and so ck+1 vk+1 + · · · + cn vn ∈ ker(T ) =
4.3. LINEAR INDEPENDENCE, BASIS, AND DIMENSION 99

Span(v1 , . . . , vk ). Since {v1 , . . . , vk , vk+1 , . . . , vn } is, by construction, a basis for Rn , we infer that
ck+1 = · · · = cn = 0, as required.
c. This is immediate from part b: dim ker(T ) = k and dim image(T ) = n − k, so
dim ker(T ) + dim image(T ) = n.
" # " # " #
1 0 0 1 1 1
4.3.23 a. Suppose a +b +c = O. Then we have the following
0 1 1 0 1 −1
system:
a
+ c = 0
b + c = 0
a − c = 0 ,
(" # " # " #)
1 0 0 1 1 1
whose only solution is a = b = c = 0. Therefore, , , is linearly
0 1 1 0 1 −1
independent.
b. Suppose c1 f1 + c2 f2 + c3 f3 = 0. This means that c1 t + c2 (t + 1) + c3 (t + 2) = 0.
Evaluating at t = 0, −1, and −2, we obtain the system

c2 + 2c3 = 0
−c1 + c3 = 0
2c1 − c2 = 0 .
   
0 1 2 1 0 −1
   
Reducing the coefficient matrix, we find  −1 0 1 0 1 2 , so we have a
−2 −1 0 0 0 0
nontrivial solution c1 = c3 = 1, c2 = −2. Thus, {f1 , f2 , f3 } is in fact linearly dependent.
c. Suppose c1 f1 + c2 f2 + c3 f3 = 0. Then c1 (1) + c2 cos t + c3 sin t = 0. Evaluating at
t = 0, π/2, and π, we obtain the system

c1 + c2 = 0
c1 + c3 = 0
c1 − c2 = 0 ,

whose only solution is c1 = c2 = c3 = 0. Therefore, the set {f1 , f2 , f3 } is linearly independent.


d. As is well-known, f2 + f3 = f1 , so the set {f1 , f2 , f3 } is linearly dependent.
e. Suppose c1 f1 +c2 f2 +c3 f3 = 0. Then we have c1 (1)+c2 cos t+c3 cos 2t = 0. Evaluating
at t = 0, π/2, and π, we obtain the following system

c1 + c2 + c3 = 0
c1 − c3 = 0
c1 − c2 + c3 = 0 .

Since the reduced echelon form of the coefficient matrix is the identity matrix, the only solution is
c1 = c2 = c3 = 0, and so the set {f1 , f2 , f3 } is linearly independent.
100 4. IMPLICIT AND EXPLICIT SOLUTIONS OF LINEAR SYSTEMS

f. As is perhaps slightly less well-known, f2 = 2f3 − f1 , so {f1 , f2 , f3 } is linearly depen-


dent.

4.3.24 a. Let Eij be the m × n matrix whose ij-entry is 1 and all of whose other entries are
0. Then it is clear that {Eij , i = 1, . . . , m, j = 1, . . . , n} give a basis for Mm×n . Thus, Mm×n is
mn-dimensional.
b. Denote by D, U, and L the sets of diagonal, upper-triangular, and lower-triangular
matrices, respectively. First, we check that these are all subspaces. (i) It is clear that O ∈ D, U,
and L. (ii) Since c0 = 0 for all scalars c, any scalar multiple of an upper (resp., lower) triangular
matrix is upper (resp., lower) triangular and any scalar multiple of a diagonal matrix is diagonal.
(iii) Since 0 + 0 = 0, the sum of two upper triangular matrices is upper triangular, the sum of two
lower triangular matrices is lower triangular, and the sum of two diagonal matrices is diagonal.
Hence U, L, and D are subspaces of Mn×n .
Using the notation given in the solution of part a, we see that:
(1) the matrices Eii , i = 1, . . . , n, form a basis for D; hence, dim D = n.
(2) the matrices Eij , i ≤ j, form a basis for U. Thus, dim U = n+(n−1)+· · · +2+1 = n(n+1)/2.
(3) the matrices Eij , i ≥ j, form a basis for L. As in the preceding case, dim L = n(n + 1)/2.
c. First we check that S is a subspace of Mn×n . (i) Since OT = O, O ∈ S. (ii) If A ∈ S
and c ∈ R then (cA)T = cAT = cA, so cA ∈ S. (iii) If A, B ∈ S then (A + B)T = AT + B T = A + B,
so A + B ∈ S. Similarly, K is a subspace of Mn×n . (i) Since OT = O = −O, O ∈ K. (ii) If A ∈ K
and c ∈ R then (cA)T = cAT = −cA, so cA ∈ K. (iii) If A, B ∈ K, then (A + B)T = AT + B T =
−A − B = −(A + B), so A + B ∈ K.
We see that the matrices Eii , i = 1, . . . , n, and Eij + Eji , i < j, form a basis for S. Thus,
dim S = dim(U) = n(n + 1)/2. It is easy to see that {Eij − Eji : i < j} is a basis for K. Thus,
dim K = (n − 1) + (n − 2) + · · · + 2 + 1 = n(n − 1)/2. As in Exercise 1.4.36, if A ∈ Mn×n is
arbitrary, then we write A = 12 (A + AT ) + 12 (A − AT ), so A ∈ S + K.

4.3.25 a. Let S, T ∈ V ∗ and a ∈ R. Then we define S + T by (S + T )(x) = S(x) + T (x) and


aT by (aT )(x) = a(T (x)). Then S + T and aT are linear maps: if u, v ∈ V , then

 
(S + T )(u + v) = S(u + v) + T (u + v) = S(u) + S(v) + T (u) + T (v)
 
= S(u) + T (u) + S(v) + T (v) = (S + T )(u) + (S + T )(v),

(S + T )(cv) = S(cv) + T (cv) = cS(v) + cT (v) = c S(v) + T (v) = c(S + T )(v),
 
(aT )(u + v) = a T (u + v) = a T (u) + T (v) = aT (u) + aT (v) = (aT )(u) + (aT )(v),
 
(aT )(cv) = a T (cv) = a cT (v) = acT (v) = c(aT )(v).

The eight properties in the definition of a vector space are all immediate consequences of the
algebraic properties of the real numbers.
(1) (S + T )(v) = S(v) + T (v) = T (v) + S(v) = (T + S)(v).
4.3. LINEAR INDEPENDENCE, BASIS, AND DIMENSION 101
  
(2) (R + S) + T (v) = (R + S)(v) + T (v) = R(v) + S(v) + T (v) = R(v) + S(v) + T (v) =

R(v) + (S + T )(v) = R + (S + T ) (v).
(3) Define 0 ∈ V ∗ by 0(v) = 0 for all v. Then (0 + T )(v) = 0(v) + T (v) = T (v) for all v.

(4) Given T ∈ V ∗ , define −T by (−T )(v) = −T (v). Then T + (−T ) (v) = T (v) + (−T )(v) =
T (v) − T (v) = 0 = 0(v).
 
(5) For all a, b ∈ R, a(bT )(v) = a bT (v) = (ab)T (v) = (ab)T (v).
 
(6) For all a ∈ R, a(S + T )(v) = a S(v) + T (v) = aS(v) + aT (v) = (aS) + (aT ) (v).
 
(7) For all a, b ∈ R, (a + b)T (v) = (a + b)T (v) = aT (v) + bT (v) = (aT ) + (bT ) (v).
(8) (1T )(v) = 1T (v) = T (v).
Hence, V ∗ is a vector space.
b. From the definition of the functions fi given in the problem, we have fi (vi ) = 1 and
fi (vj ) = 0 when i 6= j. Suppose c1 f1 + · · · + cn fn = 0. Evaluating this element of V ∗ on vi , we
obtain ci = 0. This holds for i = 1, . . . , n, and so {f1 , . . . , fn } is linearly independent.
Now, suppose T ∈ V ∗ is arbitrary. Let ci = T (vi ). Then for any x = a1 v1 + · · · + an vn , we
have

T (x) = T (a1 v1 + · · · + an vn ) = a1 T (v1 ) + · · · + an T (vn )


= c1 a1 + · · · + cn an = c1 f1 (x) + · · · + cn fn (x),

so T = c1 f1 + · · · + cn fn , and we see that f1 , . . . , fn span V ∗ .


c. When V is finite-dimensional, we have seen that a basis for V ∗ consists of precisely
the same number of vectors as one for V . Thus, dim V ∗ = dim V .

4.3.26 The polynomials pj (x) = xj , j = 0, 1, . . . , k, evidently span the space in question. We


must only argue that they form a linearly independent set. Now suppose c0 p0 +c1 p1 +· · ·+ck pk = 0,
i.e., c0 + c1 x + c2 x2 + · · · + ck xk = 0 for all x ∈ R. Evaluating at x = 0, we see immediately that
c0 = 0. Since all the derivatives of the 0-function are again identically 0, differentiating repeatedly
and evaluating at x = 0 gives c1 = 0, c2 = 0, . . . , ck = 0, as required.
Alternatively, evaluating at x = 0 gives c0 = 0. Now we have c1 x + c2 x2 + · · · + ck xk =
x(c1 + c2 x + · · · + ck xk−1 ) = 0 for all x, so c1 + c2 x + · · · + ck xk−1 = 0 for all x, from which we
conclude that c1 = 0. Continuing in this fashion, we have c0 = c1 = · · · = ck = 0, as we needed.

4.3.27 a. Clearly the 0 function is homogeneous of degree k. If f ∈ Pk,n and c is a scalar, then
(cf )(tx) = cf (tx) = c(tk f (x)) = tk (cf )(x). Similarly, if f, g ∈ Pk,n , then f + g ∈ Pk,n , inasmuch
as (f + g)(tx) = f (tx) + g(tx) = tk f (x) + tk g(x) = tk (f + g)(x).
b. It is evident that if i1 + i2 + · · · + in = k, the monomial xi11 xi22 · · · xinn is homogeneous
of degree k and that all such monomials span Pk,n . Now we must establish linear independence.
Let’s argue by induction on n. When n = 1, there is nothing to prove. Now suppose we assume
102 4. IMPLICIT AND EXPLICIT SOLUTIONS OF LINEAR SYSTEMS

the corresponding result for Pk,n−1 (for all k) and we are given
X
ci1 i2 ...in xi11 xi22 · · · xinn = 0.
i1 +i2 +···+in =k

We group the terms according to the exponent of xn :


 
X k
X X i
ci1 i2 ...in xi11 xi22 · · · xinn =  ci1 i2 ...in−1 j xi11 xi22 · · · xn−1
n−1  j
xn = 0.
i1 +i2 +···+in =k j=0 i1 +i2 +···+in−1 =k−j
P i
Setting xn = 0, we obtain ci1 i2 ...in−1 0 xi11 xi22 · · · xn−1
n−1
= 0. By induction hypothesis
i1 +i2 +···+in−1 =k
(for Pk,n−1 ), all the coefficients ci1 i2 ...in−1 0 vanish. Now we are left with
 
Xk X in−1  j
 ci1 i2 ...in−1 j xi11 xi22 · · · xn−1 xn
j=1 i1 +i2 +···+in−1 =k−j
 
k
X X i
= xn  ci1 i2 ...in−1 j xi11 xi22 · · · xn−1
n−1  j−1
xn = 0.
j=1 i1 +i2 +···+in−1 =k−j
!
P
k P i
It follows that ci1 i2 ...in−1 j xi11 xi22 · · · xn−1
n−1
xj−1
n = 0 for all x. Once again,
j=1 i1 +i2 +···+in−1 =k−j
setting xn = 0 and applying the induction hypothesis (here for Pk−1,n−1 ), we infer that all the
coefficients ci1 i2 ...in−1 1 vanish. Continuing in this fashion (officially, doing induction on k), we find
that all the coefficients ci1 i2 ...in = 0, and our proof is complete.
c. Using the result of part b, this is merely a matter of counting the number of ways of
writing k as the sum of n nonnegative integers. There are many ways of doing this, but here is my
favorite. Start with k dots in a horizontal array, and draw n − 1 vertical lines to partition the array
x1 x2 x3 x4 x5 x6 x7

into n sections. The lines are allowed to be adjacent (with no dots between them) or to fall at the
beginning or end of the array. The number of dots between the (i − 1)th and ith vertical lines is
going to give us the exponent on xi . To count the number of ways of creating such configurations,
it is easier, as the figure suggests, to consider an array of k + (n − 1) dots and choose n − 1 of them
 
(these will be the dividing lines). Thus, there are n−1+kn−1 = n−1+k
k different monomials.
P n−1+i P n+k
k k
d. We have i = dim Pi,n = dim Pk,n+1 = k , because there is a one-
i=0 i=0
to-one correspondence between homogeneous polynomials of degree k or less in n variables and
homogeneous polynomials of degree k in n + 1 variables: If i1 + i2 + · · · + in ≤ k, then
k−(i +i2 +···+in )
xi11 xi22 · · · xinn ←→ xi11 xi22 · · · xinn xn+1 1 .

Substituting n + 1 for n gives the desired result.


4.4. THE FUNDAMENTAL SUBSPACES 103

4.4. The Fundamental Subspaces

4.4.1 Let’s show that R(B) ⊂ R(A) if B is obtained by performing any row operation on A.
Obviously, a row interchange doesn’t affect the span. If Bi = cAi and all the other rows are the
same, c1 B1 +· · ·+ci Bi +· · ·+cm Bm = c1 A1 +· · ·+(ci c)Ai +· · ·+cm Am , so any vector in R(B) is in
R(A). If Bi = Ai + cAj and all the other rows are the same, then c1 B1 + · · · + ci Bi + · · · + cm Bm =
c1 A1 + · · · + ci (Ai + cAj ) + · · · + cm Am = c1 A1 + · · · + ci Ai + · · · + (cj + cci )Aj + · · · + cm Am , so
once again any vector in R(B) is in R(A).
To see that R(A) ⊂ R(B), we observe that the matrix A is obtained from B by performing the
(inverse) row operation (this is why we need c 6= 0 for the second type of row operation). Since
R(B) ⊂ R(A) and R(A) ⊂ R(B), we have R(A) = R(B).

4.4.2 a. Taking the augmented matrix to echelon form, we have

   
1 2 1 1 b1 1 2 1 1 b1
   
 −1 0 3 4 b2  0 2 4 5 b2 + b1 ,
2 2 −2 −3 b3 0 0 0 0 b3 − b1 + b2

so C(A) = {b ∈ R3 : b1 − b2 − b3 = 0}.  
1
b. Since N(AT ) = C(A)⊥ , it is enough to find C(A)⊥ . From part a, we know  −1  is
−1
in C(A)⊥ , but does it span? We know from part a that dim C(A) = 2, so dim(C(A)⊥ ) = 1, and
the answer is yes.
" # " # " #
1 2 3 1 2 3 1 0
4.4.3 a. A = = U = R, with E = , so R(A)
2 4 6 0 0 0 −2 1
     
 1  (" #)  −2 −3 
  1  
     
has basis  2  , C(A) has basis , N(A) has basis  1  ,  0  , and N(AT ) has

 
 2 
 

3 0 1
(" #)
−2
basis .
1
     
2 1 3 1 0 2 1 0 0
     
b. Since U =  0 1 −1 , R =  0 1 −1 , and E =  −2 1 0 ,
0 0 0 0 0 0 3 −3 2
         

 2 0  
 2 1  
 −2  
         
R(A) has basis  1  ,  1  , C(A) has basis  4  ,  3  , N(A) has basis  1  ,

 
 
 
 
 

3 −1 3 3 1
 

 3  
T  
and N(A ) has basis  −3  .

 

2
104 4. IMPLICIT AND EXPLICIT SOLUTIONS OF LINEAR SYSTEMS
" # " #
1 −2 1 0 1 −2 0 1
c. Since U = and R = , R(A) has ba-
0 0 1 −1 0 0 1 −1
       

 1 0 
 
 2 −1 


  (" # " #)  
  
 −2   0 

 1 1

   
 1   0 

sis         
 ,  , C(A) has basis , , N(A) has basis   ,   , and

  1   1  
 2 3 

  0   1  


   
0 −1   0 1 
N(AT ) = {0}.
   
1 −1 1 1 0 1 0 2 0 2
   
0 1 1 0 1 0 1 1 0 1
d. Since U =     , and
0 0 0 2 −2 , R =  0 0 0 1 −1 
   
0 0 0 0 0 0 0 0 0 0
     
   1 0 0 

 
1 0 0 0 
     

  
  −1   1   0  

 −1 1 0 0       
E =   2 −2
, R(A) has basis  1  ,  1  ,  0  , C(A) has basis
 1 0 




   
   



  1   0   1  

0 2 −1 2 
 

0 1 −1
   
       −2 −2   
 1 −1 1  
 
  0 

         

      



  −1   −1  



  
 1  0 1     
 ,  ,   , N(A) has basis  1   0  , and N(AT ) has basis  2  .


    
 0   2   2  
 



 

 −1 
 

 
 
  0   1  
 
 

 −1 1 0  
 
  2 
 
0 1
   
1 1 0 1 −1 1 1 0 1 −1
   
0 0 2 −2 2  0 0 1 −1 1 
e. Since U =  0
, R = 
 
, and E =
 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0
   
   1 0      
1 0 0 0 
 
  1 0 

       
  
  1   0  



     
 −1 1 0 0        
 , R(A) has basis  0  ,  1  , C(A) has basis  1  ,  1  ,
 −1 −1 0          
 1  
     2   1  
 


  1   −1  
 
 −1 

2 −1 0 1 
 
 1
−1 1
     
 −1 −1 1     

 
  −1 2 
   
 1   0   0     
 
 


  
   
      −1   −1 
N(A) has basis   ,  
 0   1   −1  ,   , and N(A T ) has basis   , 
 1   0 .


        
    

  0   1   0   
 


 
  0 1 
 
0 0 1
   
1 1 0 5 0 −1 1 0 −1 2 0 1
   
0 1 1 3 −2 0 0 1 1 3 0 −2 
f. Here U =     ,
0 0 0 0 7 −7 , R =  0 0 0 0 1 −1 
   
0 0 0 0 0 0 0 0 0 0 0 0
4.4. THE FUNDAMENTAL SUBSPACES 105
     

 1 0 0 

  
      
1 0 0 0 
  0   1   0 
     

  

  
       

 0 1 0 0   −1   1   0 
and E =   , so we see that R(A) has basis   ,   ,   ,
 1 −3 1 0 


 2   3   0 
      


  0   0   1  
−1 −1 −1 1 
      

 

 1 −2 −1 
     

 1 −2 −1 

      
     

 1 1 0  
 −1   −3   2 


 
 
     


      
     

 0   1   −2   1   0   0 
C(A) has basis         , and N(AT )
 −1
,
 ,  2   1  , N(A) has basis

 ,
  ,
 
 0   1   0 

       
 
     


 
 
     

0 4 −1 
 0   0   1 


 

 0 0 1 
 

 −1  

 
 −1 

has basis  
 −1 .

  

 

 1 
 
3 −1 b1
 
4.4.4 a. Reducing the augmented matrix [A|b] yields  0 0 2b1 − b2 . This gives the
0 0 3b1 + b3

constraint
" equations# 2b 1 − b 2 =
" #0 and 3b 1 + b 3 = 0 for C(A) and N(A) = Span (1, 3) . Set
2 −1 0 1
X= and Y = . Then C(A) = N(X) and N(A) = C(Y ).
3 0 1 3
 
h i −1
 
b. X = 3 −2 1 , Y =  1 .
1

" # −2
1 0 −1 0  
c. X = , Y =  1 .
2 −1 0 −1
1
   
0 1
4.4.5 a. First, A must be 3 × 3; let its columns be a1 , a2 , a3 . Since  1  and  0  are in
0 1
 
1
N(A), we know that a2 = 0 and a1 = −a3 . In other words a1 must span C(A). But  1  and
1
 
0
 1  are nonparallel, so no such matrix can exist.
1
A slightly more sophisticated argument would be to show that A must have rank at least 2
since its column space is (at least) a plane, but then its nullspace cannot be (at least) a plane.
106 4. IMPLICIT AND EXPLICIT SOLUTIONS OF LINEAR SYSTEMS
 
1 0 −1 −1
 
b.  0 1 0 0
0 1 0 0
 
2 −1 0
 
c. One example is  0 0 0 .
2 −1 0
h i
d. The matrix 1 1 −1 works.
 
2 0 1
 
e.  0 2 1  works.
2 2 2
" #
0 1
f.
0 0
g. This is impossible by Corollary 4.6, since 1 + 1 6= 3.
   
0 0 1 1
 
4.4.6 a. A =  0 0 0 . C(A) = Span  0 , which is a subset of N(A).
0 0 0 0
   
0 1 0 1
   
b. A =  0 0 1 . N(A) = Span 0 , which is clearly a subset of C(A).
0 0 0 0

c. No: dim N(A) + dim C(A) = 3 is impossible with N(A) = C(A).


 
0 0 1 0
 
0 0 0 1

d. A =  .
0 0 0 0
0 0 0 0
   
1 0 1 0
   
0 1 0 1
   
4.4.7 Set A = 1 −1  0 0 = R; then EA = R, where E =
   
   
1 0 0 0
1 2 0 0
       
1 0 0 0 0 
 −1 −1 −1  
  
      
 0  
 

1 0 0 0

   
 1   0   −2   
 −1  . Therefore, T ) has basis      
 1 1 0 0 N(A

, ,
 1   0   0 . Since
  
      
 −1 0 0 1 0 
  0   1   0  


 

−1 −2 0 0 1 0 0 1
V = C(A), by Proposition 4.8, we have V = (C(A)⊥ )⊥ ; since C(A)⊥ = N(AT ) by Corollary 4.3,
we have V = N(AT )⊥ . Thus, V = {x ∈ R5 : −x1 + x2 + x3 = −x1 + x4 = −x1 − 2x2 + x5 = 0}.
4.4. THE FUNDAMENTAL SUBSPACES 107
   
" # −3 −4
   
1 0 3 4  −2   5
4.4.8 a. Let A = . Then the vectors v1 =     
0 1 2 −5  and v2 =  0 
 1  
0 1

form a basis for N(A). By Proposition 4.2, {v1 , v2 } gives a basis for R(A) = V . ⊥
" #
1 0 3 4
b. W = N(A) where A = . Thus, W ⊥ = N(A)⊥ = R(A) by
0 1 2 −5
   

 1 0 


 
 0   1  
    
Theorem 4.9. Therefore, a basis for W ⊥ is  ,  .


 
 3   2 


 

 4 −5 
 
1 1 0 −2
 
4.4.9 a. Setting A =  1 −1 −1 6 , we have V = N(A) and so V ⊥ = R(A). The
0 1 1 −4
 
1 1 0 −2
 
echelon form of A is  0 2 1 −8 , so the rows of A are linearly independent. Thus,
0 0 1 0
     

 1 1 0  

 
    
 1   −1   1 

 , ,  gives a basis for V ⊥ .
      
 0   −1   1 
 


 −2 
6 −4 
b. Using the same matrix A as in part
a, W = R(A), so W ⊥ = N(A). The reduced
  
 −2  

 
1 0 0 2  4 




    ⊥
echelon form of A is  0 1 0 −4 , so   gives a basis for W .

  0  
0 0 1 0 
 

 1 
h i
c. Let B = −2 4 0 1 . Then we have W ⊥ = R(B), so W = (W ⊥ )⊥ =
R(B)⊥ = N(B).

4.4.10 Since U is a matrix in echelon form, its last m − r rows are 0. When we consider the
matrix product A = BU , we see that every column of A is a linear combinations of the first r
columns of B; hence, these r column vectors span C(A). Since dim C(A) = r, by Proposition 3.9,
these column vectors must give a basis.
 
" #! 1
1  
4.4.11 a. C(A) = Span and R(A) = Span  2 , so, for any b ∈ R we are looking
1
3
      
1 " # 1 " # 1
   1   14 b  
for s ∈ R so that A s  2  = b . But A  2  = , so s = b/14 and x =  2 .
1 14 14
3 3 3
108 4. IMPLICIT AND EXPLICIT SOLUTIONS OF LINEAR SYSTEMS
   
1 0 " #
     b1
b. Here C(A) = R2 and R(A) = Span  1  ,  1 , so given b = , we are
b2
    1 −1
1 0 " #" # " #
    3 0 s b1
looking for x = s  1  + t  1  so that Ax = b. This yields the system = ,
0 2 t b2
1 −1
   
1 0
b1   b2  
so s = b1 /3 and t = b2 /2. Thus, x =  1  +  1 .
3 2
1 −1
 
b1 − b2
 
An alternative solution is as follows. It is easy to see that x0 =  b2  is a solution and
0
   
−2 b1 − b2
   
that a =  1  gives a basis for N(A) = R(A)⊥ . Then x = x0 − proja x0 =  b2  −
1 0
   1

  −2 3 b1
b1 b2   1 
− +  1   3 b1 + 21 b2  ∈ R(A).
=
3 2 1 1
1 3 b1 − 2 b2

4.4.12 a. Let x ∈ N(B). Then Bx = 0, so (AB)x = A(Bx) = A0 = 0, and therefore


x ∈ N(AB). Hence, N(B) ⊂ N(AB).
b. Let b ∈ C(AB). Then b = (AB)x for some x ∈ Rp by Proposition 4.1. By associa-
tivity, b = A(Bx), so b ∈ C(A).
c. N(B) ⊂ N(AB) by part a. Now suppose A is n × n and nonsingular, and let x ∈
N(AB). Then (AB)x = A(Bx) = 0; since A is nonsingular, Bx = 0, so x ∈ N(B). Thus, we have
N(B) = N(AB).
d. By part b, C(AB) ⊂ C(A). Now, suppose b ∈ C(A). Then b = Ay for some y ∈ Rn .
Since B is nonsingular, y = Bx for some x ∈ Rn . Then b = A(Bx) = (AB)x, so b ∈ C(AB).
Hence C(AB) = C(A).

4.4.13 a. By Exercise 12b, C(AB) ⊂ C(A), so, by Exercise 4.3.19 and Theorem 4.5,
rank(AB) = dim C(AB) ≤ dim C(A) = rank(A).
b. This is immediate from Exercise 12d.
c. Exercise 12a says that N(B) ⊂ N(AB), so dim(N(B)) ≤ dim(N(AB)). But
rank(AB) = p − dim(N(AB)) ≤ p − dim(N(B)) = rank(B).
d. This is immediate from Exercise 12c.
e. By part a, rank(A) ≥ rank(AB) = n. But if A is m × n, we know that rank(A) ≤ n,
hence rank(A) = n. By part c, we know rank(B) ≥ rank(AB) = n, but we also know that because
B is n × p, we must have rank(B) ≤ n. Hence rank(B) = n.

4.4.14 Exercise 1.4.32 tells us that N(AT A) ⊂ N(A), and Exercise 12 tells us that N(A) ⊂
N(AT A). Therefore, N(AT A) = N(A).
4.4. THE FUNDAMENTAL SUBSPACES 109

4.4.15 a. By Exercise 12, N(A) ⊂ N(AT A). So it suffices to prove that N(AT A) ⊂ N(A).
Suppose x ∈ N(AT A). Then Ax ∈ N(AT ); on the other hand, Ax ∈ C(A); but we know N(AT ) =
C(A)⊥ , so N(AT ) ∩ C(A) = {0}. Therefore, Ax = 0 and x ∈ N(A), as required.
b. rank(A) = n − dim N(A) = n − dim N(AT A) = rank(AT A).
c. By part b, dim C(AT A) = dim C(AT ). Moreover, by Exercise 12, C(AT A) ⊂ C(AT ).
Therefore, by Lemma 3.8, C(AT A) = C(AT ).

4.4.16 a. Suppose x ∈ C(A). Then, by Proposition 4.1, x = Av for some v ∈ Rn . But then
Ax = A(Av) = A2 v = Av = x. Thus, C(A) ⊂ {x ∈ Rn : x = Ax}. The other inclusion follows
immediately, as Ax ∈ C(A), once again by Proposition 4.1.
b. If x = u − Au for some u ∈ Rn , then Ax = Au − A2 u = Au − Au = 0. Thus,
{x : x = u − Au for some u ∈ Rn } ⊂ N(A). On the other hand, if x ∈ N(A), then Ax = 0, so
x = x − Ax.
c. Let x ∈ C(A) ∩ N(A). Since x ∈ C(A) we know by part a that x = Ax. Since
x ∈ N(A), then Ax = 0, and so x = Ax = 0.
d. Given an arbitrary x ∈ Rn , write x = Ax + (x − Ax). Since Ax ∈ C(A) and
x − Ax ∈ N(A), we have shown that every vector in Rn can be written as the sum of vectors in
C(A) and N(A). That is, Rn = C(A) + N(A).

4.4.17 Since U and V are subspaces, U ⊥ and V ⊥ are subspaces as well, so (U ⊥ + V ⊥ )⊥ =


(U ⊥ )⊥ ∩ (V ⊥ )⊥ by Exercise 1.3.12. Then, by Proposition 4.8, (U ⊥ + V ⊥ )⊥ = U ∩ V , so, applying
Proposition 4.8 again, we have U ⊥ + V ⊥ = (U ∩ V )⊥ .
 
v1
 . 
4.4.18 a. Notice that if u ∈ R and v ∈ R with v =  .. 
m n  T
, then the columns of uv are
vn
given by v1 u, v2 u, . . . , vn u. Now if A is an m × n matrix of rank 1, then A has at least one nonzero
column and all of the other columns of A are multiples of that nonzero column. Let u be some
particular nonzero column of A and define vj so that aj = vj u, j = 1, . . . , n.
To describe the four fundamental subspaces of A, first notice that C(A) = Span(u) and similarly
R(A) = Span(v) (since the rows of A will be multiples of vT ). Furthermore, if x ∈ Rn then
⊥
Ax = (uvT )x = (v · x)u. So x ∈ N(A) if and only if v · x = 0, i.e., N(A) = Span(v) . Similarly,
if y ∈ Rm , we have yT A = yT uvT = (y · u)vT , so y ∈ N(AT ) if and only if y · u = 0, i.e.,
⊥
N(AT ) = Span(u) .
b. If the columns of A are aj , j = 1, . . . , n, then the rows of AT are aT
i , i = 1, . . . , n, so
T T T
the ij-entry of A A is given by ai aj = ai · aj . Thus, A A = In exactly when ai · aj = 1 when
i = j and 0 when i 6= j, i.e., exactly when the aj are mutually orthogonal unit vectors in Rm .
c. Let A = uvT . Then AT A = (vuT )(uvT ) = v(uT u)vT = (u · u)vvT = kuk2 vvT .
Now, from part a we know R(A) = Span(v), so x ∈ R(A) if and only if x = sv for some s ∈ R.
110 4. IMPLICIT AND EXPLICIT SOLUTIONS OF LINEAR SYSTEMS

Then AT Ax = kuk2 vvT (sv) = skuk2 (v · v)v = skuk2 kvk2 v, so AT Ax = x for all x ∈ R(A) if and
only if kuk2 kvk2 = 1, i.e., kukkvk = 1.
 
T uvT u v T u
Now, supposing kukkvk = 1 we have A = uv = = . Letting û =
kukkvk kuk kvk kuk
v T n
and v̂ = , we have kûk = kv̂k = 1 and A = ûv̂ . Since kv̂k = 1, notice that for x ∈ R , the
kvk
projection of x onto v̂ is just (x · v̂)v̂ = (v̂v̂T )x. Thus, T (x) is given by projecting x to v̂ and then
substituting û ∈ Rm for v̂ ∈ Rn .

4.5. The Nonlinear Case: Introduction to Manifolds

4.5.1 a. Although xy = 0 is a graph locally away from the origin, there is no neighborhood of
the origin on which it is a graph.
b. This set is the union of the curves xy = π6 + 2πn, n ∈ Z, and xy = 5π
6 + 2πn, n ∈ Z.
Thus, in a neighborhood of each point satisfying this equation, it is a graph.

4.5.2 In each case, the solution set M = f −1 (0) will be a smooth curve (1-dimensional man-
ifold) provided that for each a ∈ M we have ∇f (a) 6= 0. Equivalently, we must check that if
∇f (x) = 0, then x ∈/ M .   
1 − 3x2 x
a. ∇f = = 0 if and only if 1 − 3x2 = 0 and y = 0. But for such a point ,
2y y
 
x
f = y 2 − x3 + x = x(1 − x2 ) 6= 0, so M is a smooth curve.
y
 
−2x − 3x2
b. ∇f = = 0 if and only if x(2 + 3x) = 0 and y = 0. We note that
2y
   
0 −2/3
f = 0 and f 6= 0, so the origin is a “trouble point.” (See Figure 1.4 of Chapter 2.)
0 0
  " # " #
x
z − xy −y −x 1
c. Consider f : R → R given by f y  =
3 2
2 . Df = . Note that
y−x −2x 1 0
z
Df has rank 2 everywhere (neither row vector can be a scalar multiple of the other). Thus, M is a
smooth curve. (See Figure 1.5 of Chapter 2.)
  " #
x 2 + y2 + z2 − 1
x
d. Consider f : R → R given by f y  =
3 2 . Then
x2 − x + y 2
z
" # " # " #
2x 2y 2z 2x 2y 2z 1 0 2z
Df = ,
2x − 1 2y 0 1 0 2z 0 y z(1 − 2x)

 rank < 2 if and only if y = z(1 − 2x) = 0. Substituting in the original equations, we find
so Dfhas
1
that  0  is the only “trouble point.”
0
4.5. THE NONLINEAR CASE: INTRODUCTION TO MANIFOLDS 111
  " #
x
x2 + y 2 + z 2 − 1
e. Consider f : R3 → R2 given by f y  = Then Df = .
" # z 2 − xy
z
2x 2y 2z
. When z = 0, this matrix has rank < 2 if and only if x2 = y 2 . Substituting
−y −x 2z
in the second equation, we find x = y = z = 0, which does not satisfy the first equation. When
z 6= 0, in order for rank(Df ) < 2, we must have 2x + y = 2y + x = 0, which again leads only to the
origin. Thus, M is a smooth curve.
∂f ∂f
4.5.3 a. Note that f is C1 and = x cos(xz) + ez , so (a) = 2 6= 0. It follows from
∂z ∂z  
1
Theorem 5.1 that there is a C1 function φ defined in a neighborhood of so that the surface
−1
 
x
is given by z = φ near a.
y
∂f ∂f
b. We have = y 2 + z cos(xz) and = 2xy. By Lemma 5.2,
∂x ∂y
  ∂f   ∂f
∂φ 1 ∂x 1 ∂φ 1 ∂y
=− ∂f
(a) = − and = − ∂f (a) = 1.
∂x −1 2 ∂y −1  
∂z ∂z
1
c. On one hand, the normal vector of the surface at a is ∇f (a) =  −2 , so the tangent
2
plane is given by 0 = ∇f (a) · (x − a) = (x − 1) − 2(y + 1) + 2z, i.e., x − 2y + 2z = 3. Alternatively,
writing the surface locally as the graph of φ yields the equation
  
1 x−1 1 1 3
z = Dφ = − (x − 1) + (y + 1) = − x + y + .
−1 y+1 2 2 2

∂h
4.5.4 Since 6= 0, the equation h(x) = 0 locally determines x2 as a C1 function of x1 , viz.,
∂x2  
x
x2 = ψ(x1 ). Substituting, we obtain z = φ = xψ(y/x). Then we have
y
 y  y    
∂φ ∂φ ′ y
 1 ′ y y x
x +y =x ψ +x· − 2 ψ +y x· ψ = xψ =φ .
∂x ∂y x x x x x x y
 
x  
y/x
Alternatively, we can apply the Implicit Function Theorem to the function f y  = h .
z/x
z
 
∂f 1 ∂h y/x
By the chain rule, we have = 6= 0, so there is a C1 function φ as asserted. Now
∂z x ∂x2 z/x
we apply Lemma 5.2 to see that

∂φ
∂f
∂x
− xy2 ∂x
∂h
− z ∂h
x2 ∂x2
∂h ∂h
1 y ∂x1 + z ∂x2
= − ∂f = − 1
1 ∂h
= ∂h
∂x x ∂x
x ∂x
∂z 2 2
∂f 1 ∂h ∂h
∂φ ∂y x ∂x1 ∂x1
=− ∂f
=− 1 ∂h
=− ∂h
.
∂y x ∂x2 ∂x2
∂z
112 4. IMPLICIT AND EXPLICIT SOLUTIONS OF LINEAR SYSTEMS

Therefore,
∂h ∂h ∂h  
∂φ ∂φ y ∂x + z ∂x ∂x1 x
x +y = 1
∂h
2
+ −y ∂h = z = φ ,
∂x ∂y y
∂x
2 ∂x 2

as required. (Compare Exercise 3.3.11.)

4.5.5 Let f : Rn → R be given by f (x) = kxk2 . Then Df (x) = 2xT 6= 0 at every point of
S n−1 . Therefore, S n−1 is a smooth hypersurface, i.e., (n − 1)-dimensional manifold.
h i
4.5.6 Df = 12x2 z − 6yz − 6xy 2 −6xz + 12y 2 − 6x2 y 2z + 4x3 − 6xy . Df (x) = 0 if and
only if

2x2 z − yz − xy 2 = −xz + 2y 2 − x2 y = z + 2x3 − 3xy = 0.

Eliminating z using the last equation, the first two equations give x(x2 − y)2 = 0 and (x2 − y)2 = 0,
respectively. (x = 0leads
 to y = z = 0 and nothing else.) Thus, Df fails to have rank 1 at precisely
x
points of the form  x2 , x ∈ R, and these points do indeed lie on M . Away from these points, M
x3
is a smooth surface. As pictured on p. 480 of the text, M is the tangent developable of the twisted
cubic curve, i.e., the locus of tangent lines of this curve. It has a cuspidal edge along the curve and
is smooth everywhere else.
  " #
x 2 + 2y 2 + 3z 2 − 9
x
4.5.7 Consider F : R3 → R2 given by F y  = . Then
x2 + y 2 − z 2
z
" # " #
x 2y 3z x 0 −5z
DF = 2 ,
x y −z 0 y 4z

and this matrix has rank 2 unless (at least) two of x, y, and z are 0. But no such point
 lies
√ on

5 2

F = 0, and so the intersection is a smooth curve. Since N(DF(a)) is spanned by v =  −4 2 ,
1
so the tangent line of the curve is the line through a with direction vector v.
 
1
4.5.8 We see that rank(DF(x)) < 1 at the points x = ±  0 . Indeed, when a = b, the two
0
separate curves pictured in Figure 5.5 meet at those two points, forming an “X” locally at each of
them.
" # " #!
x y x y
4.5.9 The matrix ∈ M2×2 is singular if and only if xw−yz = 0. Letting f =
z w z w
h i
xw − yz, we see that Df = w −z −y x has rank < 1 only at 0. Thus, the set of nonzero
singular 2 × 2 matrices is a smooth hypersurface in M2×2 .
4.5. THE NONLINEAR CASE: INTRODUCTION TO MANIFOLDS 113
 
∂f −1
4.5.10 a. The curve looks much like that in Example 1. Here = 0 at the points and
∂y 1/2
 
1
.
−1/2

1/2

-1 1
-1/2

b. For example, using (a + b)3 = a3 + 3a2 b + 3ab2 + b3 , we obtain


1 p 1 
4φ31 (x) − 3φ1 (x) = x + x2 − 1 + √ = x,
2 x + x2 − 1
as required. Similarly, using cos 3θ = 4 cos3 θ − 3 cos θ, we check that

4φ32 (x) − 3φ2 (x) = 4 cos3 ( 31 arccos x) − 3 cos( 13 arccos x) = cos(arccos x) = x.

The remaining functions are as follows:



φ3 (x) = − cos 13 (arccos x + π) , x ∈ (−1, 1)
φ4 (x) = − cos( 13 arccos(−x)), x ∈ (−1, 1)
√ √ 
φ5 (x) = 21 (x − x2 − 1)1/3 + (x − x2 − 1)−1/3 , x ∈ (−∞, −1)
(We figured out the formulas for φ1 and φ5 using Cardano’s formula for the solution of a general
cubic (see, e.g., Shifrin’s Abstract Algebra: A Geometric Approach, pp. 68–71). Those for φ2 , φ3 , φ4
come from the triple angle formula, as we saw.)
 
∂f 1
∂x 1 1
c. According to Lemma 5.2, we should have φ′ (1) = −   = . Now, neither φ1
∂f 1 9
∂y 1
nor φ2 is ostensibly differentiable at 1. However, it is a consequence of L’Hôpital’s rule that if
g is continuous at a and lim g ′ (x) exists, then g ′ (a) = lim g ′ (x). So we only need to check that
x→a x→a
′ 1 ′
lim φ1 (x) = = lim φ2 (x). Well, we have (using L’Hôpital’s rule where appropriate)
x→1+ 9 x→1−
  p p 
′ 1 x
lim φ1 (x) = lim 1 + √ (x + x2 − 1)−2/3 − (x + x2 − 1)−4/3
x→1+ 6 x→1+ x2 − 1

1 1 (x + x2 − 1)2/3 − 1
= lim √ √
6 x→1+ (x + x2 − 1)1/3 x2 − 1
√  
2 2 − 1)−1/3 1 + √ x
1 3 (x + x 2
x −1
= lim
6 x→1+ √ x
x2 −1

1 x + x2 − 1 1
= lim = ;
9 x→1+ x 9
114 4. IMPLICIT AND EXPLICIT SOLUTIONS OF LINEAR SYSTEMS

1 1
lim φ′2 (x) = lim sin( 31 arccos x) √
x→1− x→1− 3 1 − x2
1
1 3 cos( 13 arccos x) · − √1−x
1
2 1 cos( 13 arccos x) 1
= lim x = lim = ,
3 x→1− − √1−x 2
9 x→1− x 9

as required. Thus, φ′+ (1) = φ′− (1) = 1/9 and φ is C1 at 1.


" #
kxk 2−1
4.5.11 a. Let F : R4 → R2 be given by F(x) = . Then
x1 x2 − x3 x4
" #
2x1 2x2 2x3 2x4
DF = .
x2 x1 −x4 −x3
rank(DF(x)) = 0 if and only if x = 0. Now suppose rank(DF(x)) = 1. Then either x1 = x2 and
x3 = −x4 or x1 = −x2 and x3 = x4 . In either event, we cannot have x1 x2 = x3 x4 with x 6= 0.
Thus, rank(DF) = 2 everywhere on M and M is a 2-dimensional manifold.
b. In the first case, we have
" # " #
2 0 0 0 1 0 0 0
DF(a) = ,
0 1 0 0 0 1 0 0
so N(DF(a)) = {x : x1 = x2 = 0} and the tangent plane of M at a is the plane with equations
x1 = 1, x2 = 0.
In the second case, we have
" # " #
1 −1 −1 1 1 −1 0 0
DF(a) = ,
−1/2 1/2 −1/2 1/2 0 0 1 −1
so N(DF(a)) = {x : x1 − x2 = x3 − x4 = 0} and the tangent plane of M at a is the plane with
equations x1 − x2 = 1, x3 − x4 = −1.
∂f
∂φ
4.5.12 We start with = − ∂x
∂f
, and differentiate again, using the chain rule.
∂x
∂z
! ! !
∂f ∂f ∂f
∂2φ ∂ ∂x ∂ ∂x ∂x
= − ∂f + − ∂f · − ∂f
∂x2 ∂x ∂z
∂z ∂z ∂z
 2f

2 2 ∂f ∂f ∂ ∂f ∂ 2 f
∂f ∂ f
2 −
∂f ∂ f
∂x ∂z ∂z∂x − ∂x ∂z 2
= − ∂z ∂x ∂x2 ∂x∂z +  3
∂f ∂f
∂z ∂z
 2  2
∂f ∂2f 2 ∂2f
∂z ∂x2 − 2 ∂f ∂f ∂ f
∂x ∂z ∂x∂z +
∂f
∂x ∂z 2
=−  3 ,
∂f
∂z

as required.

4.5.13 The lines ℓ1 and ℓ2 are skew (neither parallel nor intersecting), so, in fact, through
“most” points P not on either of them there is a unique line that intersects both ℓ1 and ℓ2 . Suppose
4.5. THE NONLINEAR CASE: INTRODUCTION TO MANIFOLDS 115
       
a 1 a 0
P =  b . Then the plane P containing ℓ1 and P has normal vector  0  ×  b  =  −c . Thus,
c 0 c b
 
c/b
P has the equation −cy + bz = 0, and the point of intersection of P and ℓ2 is Q =  1 . (Note
c/b
that if b = 0, then P is parallel to ℓ2 .)    
a c − ab
←→
The line P Q therefore has the parametric representation x =  b  +v  b(1 − b) . Now, taking
c c(1 − b)
←→
P to be an arbitrary point on ℓ3 , we find that a point on P Q is of the form
         
x 1+u 2(1 + u) − 2(1 + u) 1+u 0
         
x=y= 2 +v −2 = 2  + 2v  1 
z 2(1 + u) −2(1 + u) 2(1 + u) 1+u
 
1+u
 
= 2(1 + v) .
2(1 + u)(1 + v)

Therefore, the surface formed by all the lines intersecting all the lines ℓ1 , ℓ2 , and ℓ3 has the equation
z = xy. This is a saddle surface, a doubly ruled surface. (See the solution of Exercise 2.1.9.)
 
x0
4.5.14 Suppose a = ∈ X × Y . Since X is a k-dimensional manifold in Rn , there is a
y0
neighborhood U of x0 in Rn so that X ∩ U is the graph of a C1 function f on some open set U
in a coordinate k-plane in Rn . Similarly, since Y is an ℓ-dimensional manifold in Rp , there is a
neighborhood V of y0 in Rp so that Y ∩ V is the graph of a C1 function g on some open subset V
in a coordinate ℓ-plane in Rp . It follows that (X × Y ) ∩ (U × V ) is given as the graph of f × g on
U × V in a neighborhood of a.

4.5.15 a. Since A is an n × (n + 1) matrix of rank n, without loss of generality we may assume


that the reduced echelon form of A is (for some appropriate product E of elementary matrices)
 
1 ··· u1
 
 1 u2 
EA =   .. .. 
.
 . . 
1 un

If Eb = c, then it follows that the solution set of Ax = b consists of the line


 
−u1
 
 −u2 
 
 . 
x = c + t  ..  .
 
 −u 
 n
1
116 4. IMPLICIT AND EXPLICIT SOLUTIONS OF LINEAR SYSTEMS

Since the direction vector of the line is constant, we see that as b varies, we get parallel lines
ℓb ⊂ Rn+1 . What’s more, given b, b′ ∈ Rn , the distance between the lines ℓb and ℓb′ is at most
kE(b − b′ )k ≤ kEkkb − b′ k, so it is quite reasonable to say the lines vary continuously with b.
In fact, the set of lines in Rn+1 is a manifold. To have a unique way of specifying a line, we
choose a unit direction vector v ∈ S n and the point p ∈ (Span(v))⊥ through which it passes. Thus,
locally, at least, the set of lines looks like S n × Rn .
b. One obvious generalization is to consider matrices of rank r. Then we get parallel
affine (n + 1 − r)-dimensional subspaces and the same argument applies. One might also allow the
matrix A to vary. Then one must decide what it means for two k-dimensional subspaces of Rn+1
to be “close”; this leads to one of the most important constructions in modern mathematics, the
Grassmannian.
CHAPTER 5
Extremum Problems
5.1. Compactness and the Maximum Value Theorem

5.1.1 Let X denote the subset in question.


a. X is compact. It is obviously bounded (since x ∈ X ⇐⇒ kxk = 1) and is closed by
Corollary 3.7 of Chapter 2.
b. X is compact. It is bounded (since x ∈ X ⇐⇒ kxk ≤ 1). R2 −X = {x : x2 +y 2 > 1}
is open by Proposition 3.4 of Chapter 2. Therefore, X is closed, by Proposition 2.1 of Chapter 2.
√ 
n2 + 1
c. X is unbounded, since the point ∈ X for arbitrarily large n ∈ N. There-
n
fore, X is not compact.
d. X is unbounded for the same reason as in part c.  
2/(4k + 1)π
e. X fails to be closed.
 For example, the points xk = lie in X and form
1
0
a sequence converging to , which is not contained in X. Therefore, X is not compact.
1
 
et cos t

f. Since t = et , this subset is neither bounded (let t → ∞) nor closed (let
e sin t
t → −∞). Thus, X fails to be compact.  
e−2πk
g. As in part f, this set is not closed. Taking xk = ∈ X, we have a sequence
0
that converges to 0 ∈
/ X. Therefore, X is not compact.
h. X is compact by the reasoning given for part b.
 
n
i. X is unbounded, since the point  −n  ∈ X for arbitrary large n ∈ N. Therefore, X
0  
is not compact. 1 n 0
 
j. X is unbounded, since we can take A =  1  for arbitrarily large n. Therefore,
X is not compact. 1

k. X is closed, taking f : M2×2 → M2×2 given by f (A) = AT A and applying Corollary


3.7 of Chapter 2. Since the columns of A are unit vectors, the Euclidean length of A ∈ X satisfies

kAk = 2, so X is bounded. Therefore, X is compact.
l. X is closed, taking f : M3×3 → M3×3 given by f (A) = AT A and applying Corollary
3.7 of Chapter 2. Since the columns of A are unit vectors, the Euclidean length of A ∈ X satisfies

kAk = 3, so X is bounded. Therefore, X is compact.

117
118 5. EXTREMUM PROBLEMS

5.1.2 If X is not compact, then either X is unbounded or X fails to be closed. Suppose X


is unbounded; then the continuous function given by f (x) = kxk is unbounded on X. Suppose X
fails to be closed. Then there is a point a ∈ Rn so that a ∈
/ X and yet a is the limit of a sequence of
points xk ∈ X. Then the function given by f (x) = 1/kx−ak is continuous on X and is unbounded,
since f (xk ) → ∞.

5.1.3 The standard matrix of T is aT for some a ∈ Rn and therefore T (x) = a · x. By the
Cauchy-Schwarz inequality, we have |T (x)| ≤ kak whenever kxk = 1, with equality holding when
x is a positive scalar multiple of a. Therefore, kT k = kak.
    
2 1 1
5.1.4 a. For any x ∈ R with kxk = 1, we have Ax = ·x , and so kAxk =
1 1
   
√ 1 1 1
2 · x ≤ 2kxk. Equality holds when x = √ , and so kAk = 2, as required.
1 2 1
    
2 3 1
b. For any x ∈ R with kxk = 1, we have Ax = ·x , and so kAxk =
4 1
   
√ 3 √ 1 3 √
2 · x ≤ 5 2kxk. Equality holds when x = ± , and so kAk = 5 2, as required.
4 5 4

5.1.5 The first inequality was proved in Exercise 2.3.7. For the second inequality, we note that
X X X
a2ij = kaj k2 = kAej k2 ≤ nkAk2 .
i,j j j

5.1.6 Suppose x ∈ Rn and kxk = 1. Then by Proposition 1.3,

k(S ◦ T )(x)k = kS(T (x))k ≤ kSkkT (x)k ≤ kSkkT k.

Thus, kS ◦ T k ≤ kSkkT k.

5.1.7 Suppose x0 ∈ Rn is a unit vector so that kAx0 k = kAk. Choose a unit vector y0 ∈ Rm
that is a positive scalar multiple of Ax0 . Then we have

kAk = Ax0 · y0 = x0 · AT y0 ≤ kx0 kkAT y0 k ≤ kx0 kkAT kky0 k = kAT k.

But replacing A by AT , we get kAT k ≤ kAk. Therefore, kAk = kAT k.

5.1.8 The function f : S → R given by f (x) = kx − ak is continuous. Since S is compact, it is


a consequence of Theorem 1.2 that f achieves its minimum value at some point y ∈ S. Then y is
the desired point.

5.1.9 First, S is closed: Any convergent sequence of points in S has a subsequence converging
to a point of S and hence, by Exercise 2.2.6, must converge itself to that point of S. Next, S
is bounded: If not, we could take xk ∈ S with kxk k > k; then {xk } would have no convergent
subsequence.
5.2. MAXIMUM/MINIMUM PROBLEMS 119

5.1.10 Suppose {bk } is a sequence of points in f (X). There are points ak ∈ X so that bk =
f (ak ). Since X is compact, there is a convergent subsequence {akj }. Say akj → a ∈ X. By
Proposition 3.6 of Chapter 2, the corresponding subsequence {bkj } must therefore converge to
b = f (a), so b ∈ f (X). It follows from Exercise 9 that f (X) is compact.

5.1.11 Form a sequence {ak } by choosing ak ∈ Sk . Since this is a sequence in the compact set
S1 , by Theorem 1.1, there is a subsequence akj converging to some point x ∈ S1 . Since akj ∈ Sk for
all k ≤ kj , we can chop off the first j0 terms of this subsequence and have a convergent sequence
that lives in Sk for all k ≤ kj0 . Since Sk is closed, it follows that x ∈ Sk for all k ≤ kj0 . Letting
j0 → ∞, we infer that x ∈ Sk for all k ∈ N.

5.1.12 Following the hint, suppose that for every k ∈ N the statement X ⊂ U1 ∪ · · · ∪ Uk were
false. Then for every k ∈ N we could choose xk ∈ X so that xk ∈ / U1 ∪ · · · ∪ Uk . Since X is compact,
the sequence {xk } has a convergent subsequence xkj → x0 . Now x0 ∈ Uℓ for some ℓ. Since Uℓ is
open, it follows that, for sufficiently large j, we have xkj ∈ Uℓ . But as soon as kj ≥ ℓ (which must
happen eventually), this contradicts the hypothesis that xkj ∈ / U1 ∪ · · · ∪ Uℓ ∪ · · · ∪ Uk j .

5.1.13 Following the hint, suppose that there were no such number δ > 0. Then for every δ > 0,
in particular, for δ = 1/k for k ∈ N, there would be some xk ∈ X so that B(xk , 1/k) is contained in
none of the open sets Uj . Since X is compact, there is a convergent subsequence xkj → x0 . Now,
x0 ∈ Uℓ for some ℓ, and since Uℓ is open, there is some r > 0 so that B(x0 , r) ⊂ Uℓ . Choose j large
enough so that kxkj − x0 k < r/2 and 1/kj < r/2. Then it follows from the triangle inequality that
for such a j, we have B(xkj , 1/kj ) ⊂ Uℓ , contradicting our hypothesis.

5.2. Maximum/Minimum Problems


h i
5.2.1 a. We have Df = 2x + 3 −4y + 4 , so x is a critical point if and only if 2x + 3 =
 
−3/2
−4y + 4 = 0. Thus, x = is the only critical point of f .
1
h i
b. We have Df = y + 1 x − 1 , so x is a critical point if and only if y + 1 = x − 1 = 0.
 
1
Thus, x = is the only critical point of f .
−1
h i
c. We have Df = cos x cos y , so x is a critical point if and only if cos x = cos y = 0.
 
(2k + 1)π/2
Thus, the critical points of f are , k, ℓ ∈ Z.
(2ℓ + 1)π/2
h i
d. We have Df = 2x − 6xy −3x2 + 3y 2 , so x is a critical point if and only if
   
2 2 0 ±1/3
2x(1 − 3y) = 3(−x + y ) = 0. Thus, the critical points of f are , .
0 1/3
120 5. EXTREMUM PROBLEMS
h i
e. We have Df =2xy + 3x2 − 2x x2 + 2y , so x is a critical point if and only if
     
2 2 0 2 1
2xy + 3x − 2x = x + 2y = 0. Solving this system, we find that , , and are the
0 −2 −1/2
critical points of f .
h i
f. We have Df = e−y 2x −x2 − y 2 + 2y . so x is a critical point if and only if e−y = 0
or 2x = −x2 −y 2 +2y = 0. Since −y 2 2
 0, this means that we must have 2x = −x −y +2y = 0,
 e  is never
0 0
so the critical points of f are and .
0 2
2 2
h i
g. We have Df = e−(x +y )/4 1 + (x − y)(−x/2) −1 + (x − y)(−y/2) . Since eu is
never 0, we see that x is a critical point if and only if 1 + (x − y)(−x/2) = −1 + (x − y)(−y/2) = 0.
Then wehave x(x − y) = 2 and y(x − y) = −2, so y = −x and x2 = 1. Thus, the critical points of
1
f are ± .
−1
h i
h. We have Df = 2xy − 4y x2 − 4x − 2y , so x is a critical point if and only if
2xy − 4y = x2 − 4x − 2y = 0. This means that we have 2y(x − 2) = 0 and 2y = x2− 
4x. If y = 0,
0 4
then x = 0 or x = 4. If x = 2, then y = −2. Thus, the critical points of f are , , and
0 0
 
2
.
−2
h i
i. We have Df = yz − 2x xz − 2y xy + 2z , so x is a critical point if and only if
yz − 2x = xz − 2y = xy + 2z = 0. This means that x = y = z = 0 or xyz = −8. But then we have
x2 = xyz/2 = −4, which is impossible. Therefore, x = 0 is the only critical point of f .
h i
j. We have Df = 3x2 + z 2 − 6x 2y 2xz + 4z , so x is a critical point if and only if
3x2 + z 2 − 6x = 2y = 2xz + 4z = 0. Thus, we have y = 0 and 2z(x + 2) = 0, so if z = 0, then the
2
first equation gives x = 0 or
 x= 2, and  x = −2, then the first equation gives z = −24. Thus,
 if
0 2
the critical points of f are  0  and  0 .
0 0
k. We have
2 +y 2 +z 2 )/6
h i
Df = e−(x 1 − (x/3)(x − y + z) −1 − (y/3)(x − y + z) 1 − (z/3)(x − y + z) ,

so x is a critical point of f if and only if x(x − y + z) = −y(x


 −y + z) = z(x − y + z) = 3. Therefore,
1
x = −y = z = ±1. So the two critical points of f are ±  −1 .
1
h i
l. We have Df = yz − 2x xz − 2y xy − 2z , so x is a critical point if and only if
yz − 2x = xz − 2y = xy − 2z = 0. This means that x = y = z = 0 orxyz
 = 8, from
 which
 it
0 2 2 −2
follows 2 2 2        2 ,
 that
 x = y = z = 4. Therefore, the critical points of f are 0 , 2 , −2 ,
−2 0 2 −2 −2
and  −2 .
2
5.2. MAXIMUM/MINIMUM PROBLEMS 121
 
x
5.2.2 Consider the function f = 31 xy(6 − x − 2y) defined on the compact set X =
y
  
x
: x ≥ 0, y ≥ 0, x + 2y ≤ 6 . Since f is continuous, we are guaranteed that f achieves its
y
global maximum. That maximum must occur at an interior point, since f = 0 on the boundary of
X. Since f is differentiable,
h that maximum pointi must be a critical point.
1
Since Df = 3 2y(3 − x − y) x(6 − x − 4y) , we find the interior critical point by solving the
system of linear equations 3 − x − y = 6 − x − 4y = 0:
" # " #
1 1 3 1 0 2
.
1 4 6 0 1 1
 
2
Then the unique critical point is a = , which must be the global maximum point; f (a) = 4/3
1
is the maximum possible volume of the box.

5.2.3 2 + y 2 + z 2 = r 2 , z ≥ 0. Consider the function


We take the hemisphere to be given by x
   
x p x
f 2 2 2
= (2x)(2y) r − x − y defined on X = 2 2 2
: x ≥ 0, y ≥ 0, x + y ≤ r . Then f
y y
is continuous on the compact set X and is 0 on the boundary. Therefore, f takes on its global
maximum at an interior point, which must necessarily be a critical point of f , since f is differentiable
on the interior of X. h i
4
Since Df = p y(r 2 − 2x2 − y 2 ) x(r 2 − x2 − 2y 2 ) , we see that the only critical
r 2 − x2 − y 2  
r 1
point of f in the interior of X is a = √ . Since this must be the global maximum point, we
3 1
2r 2r r
see that the dimensions of the box with maximum volume are √ × √ × √ .
3 3 3
5.2.4 Since D is compact and the function f is continuous, we are guaranteed that f takes
on its maximum and minimum  values on D. We look first for interior critical points: We have
h i 1
Df = 2x − 2 4y , so a0 = is the lone critical point.
0
Now" consider #the restriction of f to the boundary of D. We parametrize the boundary of D by

2 cos t √
g(t) = √ , t ∈ [0, 2π], and consider (f ◦ g)(t) = 2 + 2 sin2 t − 2 2 cos t. (Alternatively, we
2 sin t
could differentiate f ◦ g by the chain rule, rather than calculating it explicitly.) Then (f ◦ g)′ (t) =
√ √
2(2 sin t cos t + 2 sin t) = 0 if and only if sin t = 0 or cos t = −1/ 2. (Note that we needn’t worry
separately about the endpoints of the closed interval.) This yields four possible critical points on
the boundary:
√   √     
2 − 2 −1 −1
a1 = , a2 = , a3 = , and a4 = .
0 0 1 −1

√ √
We have f (a0 ) = −1, f (a1 ) = 2 − 2 2 ≈ −0.83, f (a2 ) = 2 + 2 2 ≈ 4.83, f (a3 ) = f (a4 ) = 5. Thus,
the minimum temperature on D is −1 and the maximum temperature is 5.
122 5. EXTREMUM PROBLEMS

5.2.5 Consider the two rectangles as shown in the figure. The area of the lower rectangle is

x(1 − x) and that of the upper rectangle is u (1 − u) − (1 − x) . The sum of the areas gives us

u
1−u
x
1−x

    
x x
our function f = x(1 − x) + u(x − u), defined on the domain X = :0≤u≤x≤1 .
u u
Since X is compact and f continuous, we are guaranteed a global maximum value.
  Since Df =
h i 2/3 2/3 1
1 − 2x + u x − 2u , we see that the only critical point is a = , and f = . On
1/3 1/3 3
the other hand, on the boundary of X, the situation degenerates to a single rectangle and the
maximum area of an inscribed rectangle is 1/4. Thus, we obtain the maximum area by taking the
width of the upper rectangle to be 1/3 and that of the lower rectangle to be 2/3.

5.2.6 Let x, y, and z denote the length, width, and height, respectively, of the box, mea-
sured in ft. Given that xy + 2z(x + y) = 12,  we wish to maximize
 the
 volume V = xyz of
x 1 12 − xy
the box. We therefore consider the function f = xy . The domain of f is
y 2 x+y
    
x 0
X = 6= : 0 ≤ xy ≤ 12 . Although f is continuous on X, the set X is not compact,
y 0
so we are not a priori guaranteed a global maximum point for f .
Nevertheless, we first find the critical point(s) of f . We have
1 h i
Df = y 2 (12 − x2 − 2xy) x2 (12 − y 2 − 2xy) ,
2(x + y)2
 
2
and so the only critical point of f in X is at a = . To argue that a is in fact the global
2
maximum point of f on X, we proceed as in Example 5. Note, first of all, that lim f (x) = 0, so
x→0
we may think of f as being a continuous function defined on the compact set
   
x 12
S= : 0 ≤ x ≤ 24, 0 ≤ y ≤ min , 24 .
y x
The Maximum Value Theorem guarantees us a global maximum of the continuous function f on
S. Now, a is the sole critical point of f in the interior 
of S
 and f (a) = 4. Moreover, it is easy to
x
check that on the boundary of or outside S we have f ≤ 3. Thus, a is the global maximum
y
point of f . The largest possible box is 2 ft × 2 ft × 1 ft.

5.2.7 Let x, y, and z denote the length, width, and height, respectively, of the box. Given that
2(xy + xz + yz)=A, we wish
 to maximize
 the volume V = xyz of the box.
 We  therefore
 consider

x A/2 − xy x 0 A
the function f = xy . The domain of f is X = 6= : 0 ≤ xy ≤ .
y x+y y 0 2
5.2. MAXIMUM/MINIMUM PROBLEMS 123

24

xy=12

a
S
x
24

Proceeding
r as in
 the solution of Exercise 6, we determine that the unique critical point of f in X
A 1
is a = ; thus, the putative maximum value of f is (A/6)3/2 .
6 1
Consider the restriction of f to the compact set
   
x √ A √
S= : 0 ≤ x ≤ 4 A, 0 ≤ y ≤ min ,4 A .
y 2x
 
x AA 1 A3/2 A3/2
Then we see that on the boundary of and outside S we have f ≤ √ = < √ .
y 2 24 A 16 6 6
It follows that a is the global maximum point of f . This gives a box with dimensions x = y = z =

A/ 6, i.e., a cube.

5.2.8 Let x, y, and z denote the length, width, and height, respectively, of the box. The cost
of the box is proportional to C = xy + 2z(x + y). We wish to maximize the volume V = xyz
of the box. (We see that this problem is virtually
  identical
 to Exercise
 6, merely replacing 12
x 1 C − xy
with C.) We therefore consider the function f = xy . The domain of f is X =
y 2 x+y
    
x 0
6= : 0 ≤ xy ≤ C . Although f is continuous on X, the set X is not compact, so we are
y 0
not a priori guaranteed a global maximum point for f .
Nevertheless, we first find the critical point(s) of f . We have
1 h i
Df = y 2 (C − x2 − 2xy) x2 (C − y 2 − 2xy) ,
2(x + y)2
r  
C 1
and so the only critical point of f in X is at a = . To argue that a is in fact the global
3 1
maximum point of f on X, we proceed as in Example 5. Note, first of all, that lim f (x) = 0, so
x→0
we may think of f as being a continuous function defined on the compact set
   
x √ C √
S= : 0 ≤ x ≤ 6 C, 0 ≤ y ≤ min ,6 C .
y x
124 5. EXTREMUM PROBLEMS

The Maximum Value Theorem guarantees us a global maximum of the continuous function f on
3/2

S. Now, a is the sole critical point of f in the interior of S and 
f (a)
 = C /6 3. Moreover, it
x
is easy to check that on the boundary of or outside S we have f ≤ C 3/2 /12. Thus, a is the
y
r r r
C C 1 C
global maximum point of f . The largest possible box is × × .
3 3 2 3
x1 x2 x3
5.2.9 The plane in R3 with equation + + = 1 has respective intercepts x, y, and z
x y z
on the coordinate axes and forms a pyramid in the first octant whose volume is V = 16 xyz. This
 
1
1 2 2 2
plane passes through 2  if and only if + + = 1, so that z =
 . (Note that it
2
x y z 1 − x1 − y2
 
x 1 xy
is immediate that x ≥ 1 and y, z ≥ 2.) So we consider the function f = , whose
y 3 1 − x1 − y2
  
x 1 2
domain is X = : x > 0, y > 0, + < 1 . Then
y x y
1 h y xi
Df = y − 2 − 2 x − 1 − 4
3(1 − x1 − y2 )2 x y
1 h i
= x(y − 2) − 2y y(x − 1) − 4x .
3xy(1 − x1 − y2 )2
 
3
Solving for critical points, we find that 2x = y and so x = 3 and y = 6. We claim now that a =
6
is the global minimum point of f . Note that f (a) = 18. Now if we let

60

a S

x y−2x−y = 0

30

  
x 1 2 29
S= : x ≤ 30, y ≤ 60, + ≤ ,
y x y 30
then S is compact and f achieves its global  minimum on S. On the other hand, on the boundary
x
of or outside S, it is easy to check that f ≥ 20. Therefore, a is the global minimum point for
y
x y z
f on all of X. That is, the desired plane is + + = 1.
3 6 6
5.2. MAXIMUM/MINIMUM PROBLEMS 125

5.2.10
  Let x and θ be as indicated in the figure.  Then we wish to maximize the  function
x x 2π
f = (x sin θ)(12 − 2x + x cos θ) on the domain X = : 0 ≤ x ≤ 6, 0 ≤ θ ≤ . Since f
θ θ 3

x
θ
12−2x

is continuous on the compact set X, we are guaranteed a global maximum. Since f is differentiable,
if that maximum occurs at an interior point, it must be a critical point. So we begin by finding the
critical points of f . We have
h i
Df = 2 sin θ(6 − 2x + x cos θ) x (12 − 2x) cos θ + x(cos2 θ − sin2 θ) .

Solving for critical points, we find x = θ = 0 or cos θ = 2 − 6/x, which leads to the equation
  6   6 2 
2 6−x 2− +x 2 2− − 1 = 3(x − 4) = 0.
x x
 
4 √
Therefore we have the single interior critical point a = , and f (a) = 12 3 ≈ 20.8. On the
π/3
       
x 0 6 x √
boundary of X, we have f =f = 0, f = 36 sin θ cos θ ≤ 18, and f ≤8 3≈
0 θ θ 2π/3
13.9, so the global maximum does in fact occur in the interior. We achieve the trough of maximum
cross-sectional area by bending up 4′′ on either side at an angle of π/3.

5.2.11 Let x be the base of the rectangle and y its height, and let θ be the base angle of the

isosceles triangle. Then the perimeter of the pentagon is P = x(1 + sec θ) + 2y and its area is
 given

1 2 1
 x
by A = xy + 4 x tan θ. So we have y = 2 P − x(1 + sec θ) and consider the function f =
θ
   
1 2 1
 x P P
2 P x + x ( 2 tan θ − sec θ − 1) on the domain X = θ
: 0 ≤ x ≤ , 0 ≤ θ ≤ arcsec
2 x
−1 .
Since X is compact and f is continuous, we are guaranteed a global maximum.
We have
1h 1 2 1 2
i
Df = P + 2x( 2 tan θ − sec θ − 1) x ( 2 sec θ − sec θ tan θ) ,
2
so at an interior critical point we must have 21 sec θ = tan θ, whence θ = π/6. It then follows that
√ √ √
x = P/(2 + 3) = (2 − 3)P . These dimensions give an area of A0 = 41 (2 − 3)P 2 . To verify that
126 5. EXTREMUM PROBLEMS

this is the global maximum of f , we note that f = 0 on the boundary of X except in the instance
that we let y = 0 and the pentagon degenerates to a triangle; in this case, the maximum area is

obtained when we have an equilateral triangle with sides P/3 and area P 2 3/36, which is indeed
less than A0 . Thus, the pentagon of maximum area is obtained by taking a rectangle with base
√ √ √
P (2 − 3), height P (3 − 3)/6 and isosceles triangle with height P (2 3 − 3)/6.
 
x
5.2.12 We wish to maximize/minimize the function f = −(x + 2y) with x2 + y 2 = 1. This
y
is really just a one-variable problem if we parametrize the circle. That is, we must find the extrema
of the function g(t) = −(cos t + 2 sin t), t ∈ [0, 2π]. We have g ′ (t) = sin t − 2 cos t = 0 if and only if
√ √
tan t = 2, so the maximum of g occurs when cos t = −1/ 5 and sin t = −2/ 5, and theminimum √ 
√ √ −1/ 5

occurs when cos t = 1/ 5 and 
sin t = 2 5. That is, the highest point on the ellipse is −2 5 
√  √
1/ 5 5

and the lowest point is  2 √5 .
− 5

Given x, y, z > 0 
with 2 3
5.2.13  xy z = 108, we wish to find the 
minimum
 valueof x + y + z. So
y 108 y
we consider the function f = y + z + 2 3 defined on X = : y, z > 0 . We first find
z y z z
 
2 · 108 3 · 108
the critical points of f . Well, Df = 1 − 3 3 1 − 2 4 , so at a critical point we must have
y z y z  
2
3y = 2z and so y = 2 and z = 3. Now, we must argue that a = is the global minimum point
3
of f . Note first that f (a) = 6.   
1y 1
Now consider the compact set S = ≤ y, ≤ z, y + z ≤ 6 . Then the continuous
:
4z 2  
y
function f takes on its minimum on S. Now if y + z ≥ 6, we have f > 6; if y ≤ 1/4 and z ≤ 6,
z
   
y 108 y 108
then we have f ≥ 1 2 3 = 8; and if z ≤ 1/2 and y ≤ 6, then we have f ≥ 2 1 3 = 24.
z (4) 6 z 6 (2)
Therefore, on the boundary of or outside S the values of f are strictly greater than f (a) = 6. It
follows that a is the global minimum point of f on X.

5.2.14 Actually, no calculus whatsoever is required here: Just note that, by completing the
square, we obtain
 
1
k
X 2 Xk
1 k
X 2

f (x) = k x − aj + kaj k2 − aj  ,
k k
j=1 j=1 j=1

k
1X
and so the global minimum point is aj , the average (or center of mass) of the given vectors.
k
j=1
P
k 1 Pk
If we use calculus, note that ∇f (x) = 2 (x − aj ), so ∇f (x) = 0 if and only if x = aj ,
j=1 k j=1
1 Pk
as before. We must now argue that a = aj is the global minimum point for f . Let R =
k j=1
5.2. MAXIMUM/MINIMUM PROBLEMS 127

R 2R
×
a

max{kaj −ak, j = 1, . . . , k}. Then f (a) ≤ kR2 . Now if kx−ak ≥ 2R, then by the triangle inequality,
we have f (x) ≥ kR2 . Indeed, as long as not all the aj are the same, we’ll have f (x) > kR2 . From
this we infer that the global minimum of f on the compact set B(a, 2R) must be at the critical
point, a, and that, moreover, this is the global minimum of f on Rn , since f (a) ≤ kR2 < f (x) for
all x ∈
/ B(a, 2R).

5.2.15 Using the result of Example 1 of Chapter 3, Section 4, we see that if fj (x) = kx − aj k,
x − aj
j = 1, 2, 3, then ∇fj (x) = . (Note that fj is differentiable except at aj , so f is differen-
kx − aj k
tiable except at a1 , a2 , and a3 .) Then x is a critical point of f if and only if1
3
X x − a1 x − a2 x − a3
∇fj (x) = + + = 0.
kx − a1 k kx − a2 k kx − a3 k
j=1

Now we need the result of Exercise 1.2.6. In order for three unit vectors in R2 to add up to 0,

a2
a3
a

a1

they must make an angle of 2π/3 with one another. Thus, we need to find a (presumably inside
the triangle with vertices aj ) with the property that the line segments from a to the vertices of the
triangle form angles of 2π/3. Such a point is sometimes called the Fermat point of the triangle. As
1
Remark. We could also arrive at this point on physical grounds: Hanging equal weights by massless strings
passing through a table at positions aj and attaching the other ends of the strings to a massless ring, the system
reaches equilibrium when the weights are as low as possible, hence when the sum of the distances from the ring to
the vertices is as small as possible. Each mass exerts a force on the ring pointing from the ring to the point aj ; since
the masses are equal, the forces have equal magnitudes. But at equilibrium, the force vectors must sum to 0.
128 5. EXTREMUM PROBLEMS

we shall establish shortly, such a point exists provided all of the angles of the triangle are less than
2π/3. If one of the angles is 2π/3 or greater, then the minimum point falls at that vertex.
Why is the Fermat point the global minimum of f ? We could attempt a compactness argument
like that in Exercise 14, but since f fails to be differentiable at the vertices, we’ll need to be more
careful. Note first that the union of the triangle and its interior forms a compact set, on which the
continuous function f must have a minimum, a′ . Next, if x is outside the triangle, then clearly
∇f (x) points away from the triangle, so we can decrease f by moving the point back towards the
triangle. Therefore, the point a′ is the global minimum point for f on all of R2 . Indeed, examining
∇f along the edges of our triangle, we see that moving the point toward the opposite vertex will
decrease f , so the minimum point cannot be on any edge of the triangle. When the angles of
the triangle are all less than 2π/3, we get (as we see below) the Fermat point in the interior of
the triangle, and examining ∇f as we move toward any vertex, we see that f increases; thus, the
Fermat point a is the global minimum. If some angle is 2π/3 or greater, then there can be no
critical point in the interior (since the angle joining the vertices remote from that angle can be no
smaller than it is at the vertex itself), and so the minimum must be at one of the vertices. It is
clear that the vertex with the largest angle gives the smallest value of f .
The geometric construction of the Fermat point is pictured in the figure below. Recall that an

angle inscribed in a circle subtends double its measure in arc. So given points A and B, the locus
of points P with ∠AP B measuring 2π/3 consists of an arc of circle and its mirror image, the arc
forming a central angle of 2π/3. It is easy to see that the two circles will be be the circumcircles
for the equilateral triangles with AB as one side. So we construct these circles, as pictured in the
figure, for any two sides of our triangle; their point of intersection must be the Fermat point. (Note
that the third circle is redundant, since 2π − (2π/3 + 2π/3) = 2π/3.)
An alternative geometric construction is shown in the figure below. We construct equilateral
triangles on each of the sides of the given triangle. Let Q be inside △ABC, and let Q′ be the point
obtained by rotating π/3 about vertex A. Then
−→ −−→ −−→ −−→ −−→ −−− → −−→
kAQk + kBQk + kCQk = kQ′ Qk + kQBk + kB ′ Q′ k ≥ kB ′ Bk,
5.3. QUADRATIC FORMS AND THE SECOND DERIVATIVE TEST 129

A′

B′ Q′ Q P
B
A

Q′′

C′

with equality holding if and only if B ′ , Q′ , Q, and B are collinear. Now, let P be the intersection
of BB ′ and CC ′ ; substituting Q = P , we see that P ′ lies on BB ′ if and only if P lies on CC ′ .
Therefore, B ′ , P ′ , P , and B are indeed collinear, so we have equality and so P is the Fermat point.
To see that, in fact, AA′ passes through P requires the following observation. First, by side-
−−→ −−→
angle-side, △BAB ′ ∼ = △C ′ AC, and so kBB ′ k = kCC ′ k. Rotating π/3 about B, we deduce that
−→ −−→ −−→ −−→ −−→ −−→ −−→ −−→
kAQk + kBQk + kCQk = kCQ′′ k + kQ′′ Qk + kQCk ≥ kC ′ Ck = kB ′ Bk,

and we’ve already established that equality holds with Q = P . Therefore, C ′ , P ′′ , P , and C must
be collinear, and P ′′ lies on CC ′ if and only if P lies on AA′ . Thus, AA′ passes through P , as we
wished to establish. (See also “The Fermat-Steiner Problem,” by Shay Gueron and Ran Tessler, in
The American Mathematical Monthly, vol. 109, no. 5, May, 2002.)

5.3. Quadratic Forms and the Second Derivative Test


" #
2
5.3.1 a. We have Hess(f )(x) = , so the critical point is a saddle point.
−4
" #
0
1
b. We have Hess(f )(x) = , so the critical point is a saddle point.
1
0
" #
− sin x
c. We have Hess(f )(x) = , so
− sin y
  " #
(2k + 1)π/2 (−1)k
Hess(f ) = .
(2ℓ + 1)π/2 (−1)ℓ
130 5. EXTREMUM PROBLEMS
 
(2k + 1)π/2
It follows that the critical point is a local minimum point when k and ℓ are even,
(2ℓ + 1)π/2
a local maximum point when k and ℓ are odd, and a saddle point when k and ℓ are of different
parity.
" #
2 − 6y −6x
d. Hess(f )(x) = , so
−6x 6y
           
0 2 0 1/3 0 −2 −1/3 0 2
Hess(f ) = , Hess(f ) = , and Hess(f ) = .
0 0 0 1/3 −2 2 1/3 2 2
 
±1/3
Thus, are saddle points, and the second derivative test is inconclusive at 0. However, it
1/3
   
x 0
is easy to tell directly that the origin is a saddle point, as f > 0 for x 6= 0 and f < 0 for
0 y
y < 0.
# "
2y + 6x − 2 2x
e. Hess(f )(x) = , so
2x 2
           
0 −2 0 2 6 4 1 3 2
Hess(f ) = , Hess(f ) = , and Hess(f ) = .
0 0 2 −2 4 2 −1/2 2 2
     
0 2 1
Therefore, and
are saddle points and is a local minimum.
0 −2 −1/2
" #    
−y 2 −2x 0 2 0
f. Hess(f )(x) = e , so Hess(f ) = and
−2x 2 + x2 − 4y + y 2 0 0 2
       
0 2 0 0 0
Hess(f ) = e−2 . Thus, is a local minimum point and is a saddle point.
2 0 −2 0 2
" #
1 −(x2 +y2 )/4 x3 − 6x − x2 y + 2y x2 y − xy 2 + 2x − 2y
g. Hess(f )(x) = e , so
4 x2 y − xy 2 + 2x − 2y xy 2 − y 3 − 2x + 6y
         
1 1 −3 1 −1 1 3 −1 1
Hess(f ) = √ and Hess(f ) = √ . Thus is a local
−1 2 e 1 −3 1 2 e −1 3 −1
−1
maximum point and is a local minimum point.
1
" #
2y 2(x − 2)
h. Hess(f )(x) = , so
2(x − 2) −2
           
0 0 −4 4 0 4 2 −4 0
Hess(f ) = , Hess(f ) = , and Hess(f ) = .
0 −4 −2 0 4 −2 −2 0 −2
     
0 4 2
Thus, and are saddle points and is a local maximum point.
0 0 −2
   
−2 z y −2
   
i. Hess(f )(x) =  z −2 x , so Hess(f )(0) =  −2  and we see that 0
y x 2 2
is a saddle point.
5.3. QUADRATIC FORMS AND THE SECOND DERIVATIVE TEST 131
     
6x − 6 0 2z 0 −6
 
j. Hess(f )(x) =  0 2 0 , so Hess(f ) 0
 =  2  and
2z 0 4 0 4
       
2 6 0 2
Hess(f ) 0 =  2 . Thus,  0  is a saddle point and  0  is a local minimum point.
0 4 0 0
1 2 2 2
k. Hess(f )(x) = e−(x +y +z )/6 ×
9
 
x3 − x2 y + x2 z − 9x + 3y − 3z x2 y − xy 2 + xyz + 3x − 3y x2 z + xz 2 − xyz − 3x − 3z
 
 x2 y − xy 2 + xyz + 3x − 3y xy 2 + y 2 z − y 3 − 3x + 9y − 3z yz 2 − y 2 z + xyz − 3y + 3z  ,
x2 z + xz 2 − xyz − 3x − 3z yz 2 − y 2 z + xyz − 3y + 3z xz 2 − yz 2 + z 3 − 3x + 3y − 9z

so
       
1 −4 1 −1 −1 4 −1 1
1 1
Hess(f ) −1 = √  1 −4 1  and Hess(f )  1 = √  −1 4 −1  .
3 e 3 e
1 −1 1 −4 −1 1 −1 4

Now, we have
     
−4 1 −1 1 −4 1 − 41 1
4
   1   
 1 −4 1  =  −4 1  − 15
4  1 − 51 ,
1
−1 1 −4 4 − 15 1 − 18
5 1
| {z }| {z }| {z }
L D LT
   
1 −1
so we see that  −1  is a local maximum point and  1  is a local minimum point.
1 −1
       
−2 z y 0 −2 2
 
l. Hess(f )(x) =  z −2 x , so Hess(f ) 0 =  −2 , Hess(f ) 2 =
y x −2 0 −2 2
         
−2 2
2 2 −2 −2 −2 −2 −2 −2 2
 2 −2  
2 , Hess(f ) −2  =  −2 −2 
2 , Hess(f )  2 =  −2 −2 −2 , and
2 2 −2 −2 −2 2 −2 −2 2 −2 −2
   
−2 −2 2 −2
Hess(f ) −2 =  2 −2 −2 . Clearly, the origin is a local maximum point. All the remaining
2 −2 −2 −2
critical points are saddle points. For example, we have
     
−2 2 2 1 −2 1 −1 −1
   1   
 2 −2 2  =  −1 2 − 21   4  1
2 1
2 2 −2 −1 1 1 −4 − 12 1
| {z }| {z }| {z }
P D PT

(see Exercise 8).


132 5. EXTREMUM PROBLEMS
ih
5.3.2 a. 8x3 − 6xy −3x2 + 2y , it is clear that the origin is a critical point
Since Df =
   
0 2 x
of f . Restricting to the axes, we have f = y and f = 2x4 , so the origin is a global
y 0
 in either case. On the other hand, restricting to the line y = mx, m 6= 0, we have
minimumpoint
x
g(x) = f = 2x4 − 3mx3 + m2 x2 and g ′′ (0) = 2m2 > 0, so the origin is a local minimum point
mx
for g.
b. Note that 2x4 − 3x2 y + y 2 = (2x2 − y)(x2 − y), so f < 0 at any point other than the
origin on the curve y = 32 x2 . It follows (combining this observation with the results of part a) that
the origin is a saddle point of f .
2 2
h i
5.3.3 We have Df = 2e−(x +y ) x(2 − 2x2 − y 2 ) y(1 − 2x2 − y 2 ) , so there are critical points
     
0 ±1 0
at , , and . We have
0 0 ±1
" #
−(x2 +y 2 ) 4x4 + 2x2 y 2 − 10x2 − y 2 + 2 2xy(2x2 + y 2 − 3)
Hess(f )(x) = 2e ,
2xy(2x2 + y 2 − 3) 4x2 y 2 + 2y 4 − 2x2 − 5y 2 + 1
so
           
0 4 1 −8 ±1 0 1 2
Hess(f ) = , , and Hess(f )
Hess(f ) = = .
0
2 e −2 0 ±1 e −4
     
0 ±1 0
Therefore, is a (global) minimum point, are (global) maximum points, and are
0 0 ±1
saddle points. Thus, we see two mountain peaks (global maxima) joined by two ridges (two saddle

points) with a deep valley (global minimum) between them.


h i
5.3.4 a. We have Df = 3(x2 − ey ) 3(−xey + e3y ) , so at a critical point we must have
x2 − ey = ey (−x + e2y ) = 0. Therefore, we have 0 = −x + x4 = x(−1 + x3 ), so x = 1 (we
1
cannot have x = 0 since ey 6= 0). Thus, the unique critical point of f is a = . Now,
0
" # " #
2x −ey 2 −1
Hess(f )(x) = 3 y y 3y
, so Hess(f )(a) = 3 . It follows that a is a local
−e −xe + 3e −1 2
minimum point.
5.3. QUADRATIC FORMS AND THE SECOND DERIVATIVE TEST 133
 
−3
b. We have f (a) = −1 and f = −17, so a is obviously not a global minimum
0
 
x
point. Indeed, lim f = −∞ for any b.
x→−∞ b

∂2f ∂2f ∂2f


5.3.5 If f is harmonic, then we have + = 0. Therefore, if (a) 6= 0, we infer that
∂x2 ∂y 2 ∂x2
∂2f ∂2f
(a) and (a) have different signs, so Hf (a) is indefinite. Therefore, a can be neither a local
∂x2 ∂y 2
minimum nor a local maximum point.

5.3.6 a. We have
" # " # " # " #" #" #
1 3 1 h ih i
0 0 1 1 1 3
= 1 1 3 + = .
3 13 3 0 4 3 1 4 1
| {z }| {z }| {z }
L D LT

Since both entries of D are positive, the quadratic form Q is positive definite.
b. We have
" # " # " # " #" #" #
2 3 1 h ih 3
i 0 0 1 2 1 3
2
= 3 2 1 2
+ = .
3 4 2 0 − 12 3
2 1 − 12 1
| {z }| {z } | {z }
L D LT

Since the entries of D have opposite signs, the quadratic form Q is indefinite.
c. We have
    
2 2 −2 1 h ih i 0 0 0
     
 2 −1 4 =  1 2 1 1 −1 +  0 −3 6
−2 4 1 −1 0 6 −1
   
1 h ih i 0 h ih i
   
=  1 2 1 1 −1 +  1  −3 0 1 −2
−1 −2
 
0 h ih i
 
+  0  11 0 0 1
1
   
1 2 1 1 −1
   
= 1 1  −3  1 −2  .
−1 −2 1 11 1
| {z }| {z }| {z }
L D LT

Since the entries of D have different signs, the quadratic form Q is indefinite.
134 5. EXTREMUM PROBLEMS

d. We have
     
1 −2 2 1 h ih i0 0 0
     
 −2 6 −6  =  −2  1 1 −2 2 +0 2 −2 
2 −6 9 2 0 −2 5
   
1 h ih i 0 h ih i
   
=  −2  1 1 −2 2 + 1 2 0 1 −1
2 −1
 
0 h ih i
 
+0 3 0 0 1
1
   
1 1 1 −2 2
   
=  −2 1  2  1 −1  .
2 −1 1 3 1
| {z }| {z }| {z }
L D LT

Since the entries of D are all positive, the quadratic form Q is positive definite.
e. We have
     
1 1 −3 1 1 0 0 0 0
   h ih i  
 1 0 −3 0    0 −1 −1 
  =  1 1 1 1 −3 1 +
0 
 −3 −3 −1    0 2
 11   −3   0 2 
1 0 −1 2 1 0 −1 2 1
   
1 0
  i  
 1h ih 1h ih i
= 
 −3  1 1 1 −3 1 +  
 0  −1 0 1 0 1
   
1 1
 
0 0 0 0
 
0 0 0 0
+
0

 0 2 2
0 0 2 2
   
1 0
  i  
 1h ih 1h ih i

=  1 1 1 −3 1 + 
 −1 0 1 0 1
 
 −3  0
1 1
 
0
 h ih i
0
+ 
1 2 0 0 1 1
 
1
5.4. LAGRANGE MULTIPLIERS 135
   
1 1 1 1 −3 1
   
 1 1  −1  1 0 1
=
 −3




.
 0 1  2  1 1
1 1 1 1 0 1
| {z }| {z }| {z }
L D LT

Since there is a zero entry in D and the remaining entries have different signs, the quadratic form
Q is indefinite.

5.3.7 Suppose LDU = L′ D ′ U ′ . Then (L′−1 L)D = D ′ (U ′ U −1 ). Note first that L′−1 L is lower
triangular and U ′ U −1 is upper triangular. Thus, (L′−1 L)D is lower triangular and D ′ (U ′ U −1 ) is
upper triangular. Since the diagonal entries of L′−1 L and U ′ U −1 are all 1’s, we must have D = D ′
and then L′−1 L = I = U ′ U −1 , so L = L′ and U = U ′ .

5.3.8 a. We have
" # " #" #" #
1 2 1 1 1 2
B= = ,
2 0 2 1 −4 1
| {z }| {z } | {z }
L D LT
" #" # # "
0 1 1 0 2 1
so A = EDE T , where E = E1−1 L = = . This means that Q(x) =
1 0 2 1 1 0
xT Ax = xT (EDE T )x = (E T x)T D(E T x). Letting y = E T x, we have Q(x) = y12 − 4y22 . (Indeed,
we have x = (E T )−1 y, so Q(x) = 4(y2 )(y1 − 2y2 ) + (y1 − 2y2 )2 , which checks.)
" #
1
1 2 , we have E AE T = B, as promised. Then B = LDLT , with
b. With E1 = 1 1
0 1
" # " # " #" #
1
1 1 1 − 1 0
L= and D = . Thus, A = EDE T , where E = E1−1 L = 2 =
1 1 −1 0 1 1 1
" #
1 1
2 − 2 . Thus, we have Q(x) = 2x x = y 2 − y 2 , where y = E T x.
1 2 1 2
1 1

5.4. Lagrange Multipliers


 
x
5.4.1 a. Let g = x + y − 2. Then, solving for points x at which Df (x) = λDg(x) for some
y
h i h i
scalar λ, we find 2 x y =λ 1 1 ⇐⇒ x = y. Substituting in the constraint equation,
 
1
we find the single constrained critical point a = . Thus, the minimum value of f on the given
1
curve is 2. The function f obviously has no maximum because the curve is unbounded.
h i
b. Here the rôles of f and g reverse, and we seek points x at which 1 1 =
h i
λ x y for some scalar λ. Once again, we obtain x = y, and this leads to two critical points,
136 5. EXTREMUM PROBLEMS
   
1 −1
a1 = and a2 = . Here, a1 is the maximum point and a2 is the minimum point. (Since
1 −1
the constraint curve is compact here, we must have both.) The maximum value of g is 2.
c. Because both functions increase as we move outwards from the origin, if a is a min-
imum of f subject to the constraint g = 0, then a is a maximum of g subject to the constraint
f = f (a). (This is a special case of the duality principle in linear programming.) This is best
understood by visualizing the level curves of both functions.

5.4.2 Since f is continuous and the circle is compact, we are guaranteed


h maximum
i h and mini-i
mum values of f . We must solve for the points x on the circle for which 4x 3 = λ x y − 1
1 y−1
for some scalar λ. This occurs if either x = 0 (and y is arbitrary) or = . Substituting in
     √  4 3
0 0 ± 7/4
the constraint equation, we find the points , , . Substituting these in the tem-
0 2 7/4
     √ 
0 0 ± 7/4 49
perature function, we have T = 0, T = 6, T = . Therefore, the constrained
0 2 7/4 8
minimum value of T is 0 and the constrained maximum value is 49/8.

5.4.3 Since f is continuous and the sphere is compact, we are guaranteed a maximum value.
2
We wish to find theh maximum point i = kxk − 4 = 0. Thus, we seek x on
i of hf on the set g(x)
the sphere so that 2 2 −1 = λ x y z for some scalar λ. So we must find the
x y
points on the constraint surface satisfying = = −z. We find x = y = −2z with z = ±2/3.
 2 2
−2
2 
Thus,
 the two critical points are ± 3 −2 . The maximum value of f on the sphere is therefore
4/3 1
f  4/3  = 6.
−2/3

5.4.4 Since f is continuous and D is compact, we are guaranteed maximum and minimum
values of f . The only critical point in the interior of D is the origin. We then check for constrained
hcritical points oni the boundary
h i by using Lagrange multipliers. We seek x on the circle so that
2x + y x + 2y = λ x y for some scalar λ. Provided we are not dividing by 0, this leads to
 
2x + y x + 2y y x 1 ±1
= , so 2 + = + 2, and y = ±x. This yields the four critical points √ .
x y x y 2 ±1
(Note that if x = 0, then y = 0, and vice versa, so there are no additional critical points.) Then
we have     √    √ 
0 1/ 2 3 −1/ 2 1
f = 0, f ± √ = , and f ± √ = ,
0 1/ 2 2 1/ 2 2
so the minimum value of f on D is 0 and the maximum value is 3/2.
   
x 2 2 x
5.4.5 We wish to minimize f = (x − 1) + y subject to the constraint g = x2 +
y y
h i h i
4y 2 − 4 = 0. Thus, we seek points x on the constraint curve satisfying x − 1 y = λ x 4y for
x
some scalar λ. Thus, either = 4, x = 1 and y is arbitrary, or y = 0 and x is arbitrary. This
x−1
5.4. LAGRANGE MULTIPLIERS 137
         
±2 0 4/3 ±2 0
leads to the potential critical points , , √ . Since f = 1, f = 2,
0 ±1 ± 5/3 0 ±1
     
4/3 4/3 1
and f √ = 2/3, we see that √ are the points on the ellipse closest to .
± 5/3 ± 5/3 0
 
x
5.4.6 We want to find the extrema of f subject to the constraint g y  = x2 + y 2 + z 2 − 3 =
z
0. Since f is continuous and the sphere is compact, we are hguaranteed
i a global
h maximum
i and
minimum of f . We seek points x on the sphere satisfying x 1 1 = λ x y z for some
p
scalar
 λ. Therefore,
 we
 eitherhave x = 0 and y =
 z = ± 3/2 or y = z =  x = ±1. Since
 1 and
p0 √ p0 √ ±1 ±1
f p3/2 = 2 6, f −p3/2 = −2 6, and f  1  = 5, it follows that  1  are the warmest
3/2 − 3/2 1 1
 
p0
points and  −p3/2  is the coldest point.
− 3/2
 
±x
5.4.7 Let the vertices of the box be at the points  ±y , x, y, z ≥ 0. Then we wish to maximize
±z
   
x x
y2 z2
the function f y = 8xyz subject to the constraint g y  = x2 +
   + − 1 = 0. Since f
2 3
z z
is continuous and the portion of the ellipsoid in the first octant
h is compact,
i wehare guaranteed i a
maximum. We seek points x satisfying the constraint so that yz xz xy = λ x y/2 z/3 for
some scalar λ. Since we obviously do not obtain maximum volume when any coordinate is 0, we
x y z
find that = = , and so y 2 = 2x2 and z 2 = 3x2 . Solving, we obtain the critical point
 √  yz 2xz 3xy
1/ 3 √
√ √
 2/ 3 . Thus, the greatest volume of a box inscribed in the ellipsoid is 8 2/3.
1
 
x
5.4.8 We want to find the extrema of the function f subject to the constraint g y  =
z
4x2 + + y2 4z 2
− 16 = 0. Since f is continuous and the ellipsoid is compact, we hare guaranteedi a
global maximum and minimum of f . We seek points x on the ellipsoid satisfying 4x z y − 4 =
h i y 4z
λ 4x y 4z for some scalar λ. Therefore, we must have x = 0 and = or x 6= 0
   z  −4 
y 
0 0 ±4/3
y 4z
and 1 = = . We thus obtain the potential critical points  4 ,  −2√
, and  −4/3 .
z y−4
0 ± 3 −4/3
√ √
 f − 600 at these points yields
Evaluating
  the values 0, −6 3, 6 3, and 64/9, so the hottest point
0 0
is  √
−2  and the coldest point is  −2 .

− 3 3
138 5. EXTREMUM PROBLEMS
 
x
5.4.9 We want to find the extrema of f subject to the constraint g y  = x2 +y 2 +z 2 −2z = 0.
z
Since f is continuous and the sphere is compact, we
h are guaranteed
i a global
h maximum
i and minimum
2
of f . We seek points x on the sphere satisfying y x z − 1 = λ x y z − 1 for some scalar
λ. Now, we get solutions if x = y = 0 and z is arbitrary or if z = 1 and y = ±x. Otherwise, we
must have

y x z2 − 1
= = = z + 1,
x y z−1

so either y = x and z = 0 or y = −x andz = −2


 (which
 cannot
 occur
√  on the sphere). The complete
0 0 ±1/ 2

list of potential critical points, then, is  0 ,  0 , and  ±1/ 2 . Evaluating f at these points,
0 2 1
  
√   √ 
0 1/ 2 −1/ 2
√ √
we find that  0  is the hottest point and  −1/ 2  and  1/ 2  are the coldest points.
2 1 1

5.4.10 Since f is continuous and S is compact, it follows that f must achieve its hmaximum and
i
minimum values. On the upper hemisphere, a constrained critical point must satisfy y x 3z 2 =
h i
λ x y z for some scalar λ, so either x = y = 0 and z = 1 or y = x and z = 1/3. The latter
   
2/3 −2/3
leads to the critical points  2/3  and  −2/3 . On the boundary circle, we are merely looking
1/3 1/3
   
x 1
1
for critical points of f y  = xy subject to the constraint x2 + y 2 = 1. There are four: ± √  1 
0 2 0
   
−1 0
1 
and ± √ 1 . Evaluating f at all seven points, we find that the maximum point is 0  and
 
2 0 1
 
−1
1
the minimum points are ± √  1 .
2 0

5.4.11 As suggested in the hint, let x, y, and z denote the central


  angles subtended by the
x
vertices of the inscribed triangle. Then the area of the triangle is f y  = 12 (sin x + sin y + sin z).
z
 
x
We wish to maximize f subject to the constraint g y  = x + y + z − 2π = 0. Note that the
z
constraint set should really be x + y + z = 2π, 0 ≤ x, y, z ≤ π, so this is a compact subset of
R3 . Therefore, the
h continuous function
i f ish guaranteed toiachieve its maximum. That maximum
will occur when cos x cos y cos z = λ 1 1 1 for some scalar λ. This means that
x = y = z = 2π/3, and the triangle must be equilateral.
5.4. LAGRANGE MULTIPLIERS 139

5.4.12 Let x, y, and z denote the central angles


 subtended by the vertices of the inscribed
x 
triangle. Then the perimeter of the triangle is f y  = 2 sin(x/2) + sin(y/2) + sin(z/2) . We

z
 
x
wish to maximize f subject to the constraint g y  = x + y + z − 2π = 0. Note that the
z
constraint set should really be x + y + z = 2π, 0 ≤ x, y, z ≤ π, so this is a compact subset of R3 .
Therefore, theh continuous function f is guaranteed
i h to achieve itsi maximum. That maximum will
occur when cos(x/2) cos(y/2) cos(z/2) = λ 1 1 1 for some scalar λ. This means
that x = y = z = 2π/3, and the triangle must be equilateral.
   
a a
5.4.13 We wish to minimize the function f = πab subject to the constraint g =
b b
  h i
9 1 9 1 9 1 1
2
+ 2
− 1 = 0. So we seek a solution of 3 3
= λ b a . This means that 2 = 2 = ,
a b a b a b 2
x 2 y 2
so a2 = 18 and b2 = 2, so the ellipse we seek has equation + = 1.
18 2
 
α
α β γ
5.4.14 We want to maximize the function f β  = sin sin sin subject to the constraint
  2 2 2
α γ
g β  = α + β + γ − π = 0. As before, we should also stipulate that 0 ≤ α, β, γ ≤ π, and so the
γ
constraint set is really a compact subset of R3 . The continuous function f is therefore guaranteed
to attain its maximum, at a point where

h i h i
1
2 cos α2 sin β2 sin γ2 1
2 sin α2 cos β2 sin γ2 1
2 sin α2 sin β2 cos γ2 =λ 1 1 1

for some scalar λ. At such a constrained critical point with none of α, β, or γ equal to 0, we must
α β γ
have tan = tan = tan , so α = β = γ = π/3. When we have an equilateral triangle, the
2 2 2
function f attains its maximum value 1/8.
 
x
5.4.15
  We want to find the extrema of the function f = x2 + y 2 subject to the constraint
y
x
g = 2x2 + 4xy + 5y 2 − 1 = 0. Since the ellipse is compact, the continuous function f is
y
h i
guaranteed to achieve its maximum and minimum values. We seek points where x y =
h i
λ 2x + 2y 2x + 5y for some scalar λ. (Note that x = 0 if and only if y = 0 and the origin is
2x + 2y 2x + 5y  y 2 y  y y 
not on our ellipse.) Thus, = , so 2 −3 −2 = 2 +1 − 2 = 0.
x y x x x  x  
1 1 1 2
Substituting in the constraint equation, we obtain the critical points ± √ and ± √ .
30 2 5 −1
The former are the points on the ellipse closest to the origin, and the latter are those farthest from
it.
140 5. EXTREMUM PROBLEMS
   
x x
5.4.16 We want to maximize f y  = xyz subject to the constraint g y  = xy + 2(xz +
z z
h i h i
yz) − C = 0. We seek points where yz xz xy = λ y + 2z x + 2z 2x + 2y for some scalar
λ. Since we obviously cannot achieve a maximum if any of the variables is 0, this leads to the
equations

y + 2z x + 2z 2x + 2y 1 2 1 2 2 2
= = , so + = + = + .
yz xz xy z y z x y x

Thus, we have x = y = 2z, and so the base of the box should be a square and its height should be
one-half the length of the base.
   
x x
1 2 2
5.4.17 We want to maximize f y  = xyz/6 subject to the constraint g y  = + + −1 =
x y z
z z
h i  
1 2 2
0. We seek points where yz xz xy = λ for some scalar λ. This leads to the
x2 y 2 z 2
1 2 2 1 x y z
equations = = = . Thus, the equation of the plane we seek is + + = 1.
x y z 3 3 6 6

5.4.18 We want to find the extrema of the function f (x) = x1 +· · ·+xn subject to the constraint
g(x) = kxk2 −1 = 0. Since the the unit sphere is compact and f continuous, we know that f achieves
its maximum and minimum. (Indeed, no calculus is necessary: The Cauchy-Schwarz  inequality is
1
 .. 
sufficient.) The method of Lagrange multipliers leads immediately to x = λ  .  for some scalar
 
1 1
1   √
λ. Therefore, we must have x = ± √  ... . The extreme values of f are ± n.
n
1

5.4.19 Let one vertex of the rectangular parallelepiped be at the origin and the opposite vertex
at x, with all the edges parallel to the coordinate axes. Then we wish to maximize f (x) = x1 x2 . . . xn
subject to the constraint g(x) = kxk2 − δ2 = 0 (obviously, we may take all xi ≥ 0.) Since f is
continuous and the constraint set h compact, we are guaranteed a globali maximum.
h This maximum
i
must occur at a point where x2 . . . xn x1 x3 . . . xn · · · x1 . . . xn−1 = λ x1 x2 · · · xn for
2 2 2
 λ. This leads immediately to x1 = x2 = · · · = xn , so the maximum occurs at x =
some scalar
1  
δ . δ n
√  .. , and the maximum volume is √ .
n n
1

5.4.20 Suppose
h we fix x1 + · · · + xn = c > 0 and i tryhto maximizeif (x) = x1 x2 . . . xn . Then we
must have x2 . . . xn x1 x3 . . . xn · · · x1 . . . xn−1 = λ 1 1 · · · 1 for some scalar λ. It follows
that x2 . . . xn = x1 x3 . . . xn = · · · = x1 . . . xn−1 , so (dividing through by x1 x2 . . . xn ) we infer that
the critical point of interest satisfies x1 = x2 = · · · = xn = c/n. It follows that the maximum
5.4. LAGRANGE MULTIPLIERS 141

 c n  n
x1 + · · · + xn
value of f is given by = . Then we conclude, upon taking nth roots, that
n n

n x x ...x ≤
x1 + · · · + xn
1 2 n , as desired.
n
   
x x
5.4.21 We wish to minimize the function f = xp /p+y q /q
subject to the constraint g =
y y
h i h i
xy = c > 0. Then at a constrained critical point we must have xp−1 y q−1 = λ y x for
p q
some scalar λ. Then we conclude that x = y , and so, substituting in the constraint equation, we
find x = cq/(p+q) = c1/p and y = cp/(p+q) = c1/q . Substituting these values into f , we find that the
minimum value is c/p + c/q = c, so the desired inequality holds.
 
x
5.4.22
  We need to maximize the function f y  = xy + 14 x2 tan θ subject to the constraint
x θ
g y  = x(1 + sec θ) + 2y = P . This leads to
θ h i h i
1 1 2 2 = λ 1 + sec θ 2 x sec θ tan θ
y+ 2 x tan θ x 4 x sec θ

for some scalar λ. From this we deduce that

y + 21 x tan θ (1) x (2) 41 x2 sec2 θ x


== = = = ,
1 + sec θ 2 x sec θ tan θ 4 sin θ

and so it is immediate from √the equation (2) that sin θ = 1/2 and θ = π/6. Substituting in the
 
y + x/(2 3) x x 1
equation (1) yields √ = , and so y = 1 + √ . Substituting in the constraint
1 + 2/ √3 2 2 3√

equation gives us x = P/(2 + 3) = P (2 − 3) and y = P (3 − 3)/6.

5.4.23 Let the radius of the cylinder and cone be r, let the
 height
 of the cylinder be H, and let
r
the height of the cone be h. Then we wish to maximize f H  = πr 2 (H + h/3) subject to the
 
r h
√ 
constraint g H  = πr 2H + r 2 + h2 = A. Applying the method of Lagrange multipliers leads
h
us to the equation
h i  
√ r2 rh
2r(H + h/3) r2 r 2 /3 = λ 2H + r2 + h2 +√ 2r √
r 2 + h2 r 2 + h2

for some scalar λ, which in turn gives us



2r(H + h/3) r (2) r r 2 + h2
(1)
2r 2 +h2
== = = .
2H + √ 2 3h
r 2 +h2
√ √
From equation (2) we infer that h = 2r/ 5, and then,rfrom equationr
(1), that H = r/ r 5. Solving,
1 A 1 A 2 A
now, using the constraint equation, we find r = 1/4 , H = 3/4 , and h = 3/4 .
5 π 5 π 5 π
142 5. EXTREMUM PROBLEMS
 
x
5.4.24 We want to find the extrema of the function f y  = z subject to the constraints
  z
x
" # " #
x2 y2
+ −1 0
g y =
 = . Then we must find points so that
z x + 2y + z 0
h i h i h i
0 0 1 =λ x y 0 +µ 1 2 1
   
x 1 1
for some scalars λ and µ. It follows that we must have µ = 1 and =− . This means
y λ 2
 
    1
x 1 1 1
that = ±√ and the critical points are ± √  2 .
y 5 2 5 −5
 
x
5.4.25
" We want# to
" minimize
# the function f (x) = kxk2 subject to the constraint g y  =
x + 2y + z − 5 0 z
= . Then we want to find a point where
2x + y − z − 1 0
h i h i h i
x y z =λ 1 2 1 +µ 2 1 −1

for some scalars λ and µ. Substituting x = λ + 2µ, y = 2λ + µ, z = λ − µ into the constraint


equations yields the system of equations
 
" # 1 2 " # " #
1 2 1   λ 5
2 1 =
2 1 −1 µ 1
1 −1
" #" # " # " # " #
6 3 λ 5 λ 1
= =⇒ =
3 6 µ 1 µ −1/3
     
1 2 1/3
1
Thus, the point closest to the origin on the given line is  2  −  1  =  5/3 .
3
1 −1 4/3

2
5.4.26 a.
" We# want to minimize the function f (x) = kx − bk subject to the constraint g(x) =
1 −1 3
x = 0. The method of Lagrange multipliers leads us to the equation
2 1 0    
| {z } 1 2
A
   
x − b = λ  −1  + µ  1  for some scalars λ and µ.
3 0
   
1 2 " # 3
  λ  
Thus, x =  −1 1 +  7 , and so, using the constraint equation Ax = 0, we obtain
µ
3 0 1
" # " #" # " #
T λ 11 1 λ −1
0 = AA + Ab = + ,
µ 1 5 µ 13
5.4. LAGRANGE MULTIPLIERS 143
       
    1 2 3 −2
λ 1 1 1 8
and so = and the closest point is x =  −1  −  1  +  7  =  4 .
µ 3 −8 3 3
3 0 1 2
2
" # to minimize the function f (x) = kx − bk subject to the constraint g(x) =
b. We want
1 1 1 1
x = 0. The method of Lagrange multipliers leads us to the equation
1 0 2 1    
| {z }
A 1 1
   
1 0
x − b = λ   
 1  + µ  2  for some scalars λ and µ.
   
1 1
   
1 1 " # 3
   
1 0 λ  1
Thus, x = 
1
 + 
 1 , and so, using the constraint equation Ax = 0, we obtain
 2 µ  
1 1 −1
" # " #" # " #
T λ 4 4 λ 4
0 = AA + Ab = + ,
µ 4 6 µ 4
     
    1 3 2
λ 1  1  1  0
   
and so =− , and x = − 
 + = .
µ 0 1  1  0
1 −1 −2
(See also (‡) and the discussion on pp. 229-230.)  
x
2  
5.4.27
" # " # the function f (x) = kxk subject to the constraint g y =
We want to minimize
2 2 2
x − xy + y − z − 1 0 z
2 2
= . The method of Lagrange multipliers leads to
x +y −1 0
h i h i h i
x y z = λ 2x − y −x + 2y −2z + µ x y 0

for some scalars λ and µ. There are two cases to consider. If z = 0, then
 the
 constraint
  equations
1 0
tell us that xy = 0, so, letting λ = 0, we get the four critical points ±  0  and ±  1 . If z 6= 0,
0
#" # " # 0
"
2 −1 x x
then we must have λ = −1/2; then we must have = 2(µ−1) , and so µ = 3/2
−1 2 y y
   
x 1 1
or µ = 5/2. With µ = 3/2, we must have = ±√ and there are no solutions (since this
y 1
 2  
2 x 1 1
leads to z = −1/2). With µ = 5/2, we must have = ±√ , and this gives the critical
 
y 2 −1    
1 1 0
1 
points ± √ −1 . Evaluating f at the eight different points, we see that ±  0  and ±  1  are
2 ±1 0 0
closest to the origin.
144 5. EXTREMUM PROBLEMS

5.4.28 As shown in the figure below, let the sidelengths of the quadrilateral be a, b, c, and d,
and let one pair of the opposite angles be x and y. (Without loss of generality,
 we may assume
x 1
that 0 < x < π/2 and 0 < y < π.) Then the area of the quadrilateral is f = 2 (ab sin x +
y

S
d
a
R
x y
P
c
b
Q
 
x
cd sin y). We wish to maximize this function subject to the constraint g = ab cos x − cd cos y =
y
h i
1 2 2 −c2 −d2 ) (coming from the law of cosines). This leads to the equation
2 (a +b ab cos x cd cos y =
h i
λ −ab sin x cd sin y for some scalar λ. Thus, at a constrained critical point, we must have
tan x = − tan y, so y = π − x. It now follows that the quadrilateral can be inscribed in a circle:
Consider the circle circumscribed about △P QS; then R must lie on this circle, since ∠QRS and
∠SP Q subtend a total angle 2π.

5.4.29 Proceeding as in Example 4 and the preceding discussion, we have the following.
a. We must solve Ax = λx:

x + 2y = λx
2x − 2y = λy,
x + 2y 2x − 2y  y 2 y   y  y 
so, eliminating λ, we obtain = . Thus, 2 +3 −2 = 2 − 1 + 2 = 0.
x y x x   x x 
y 1 1 2 1 1
Therefore, we have = or −2, leading to the critical points ± √ and ± √ , with
x 2 5 1 5 −2
respective Lagrange multipliers 2 and −3.
b. We must solve Ax = λx:

3y = λx
3x − 8y = λy,
3y 3x − 8y  y 2 y   y  y 
leading to the equation = , so 3 +8 −3 = 3 − 1 + 3 = 0. Therefore,
x y x x   x x 
y 1 1 1 1 3
we have = or −3, leading to the critical points ± √ and ± √ , with respective
x 3 10 −3 10 1
Lagrange multipliers −9 and 1.

5.4.30 Recall that kAk = max kAxk, so we need to maximize the function f (x) = kAxk2
kxk=1
subject to the constraint g(x) = kxk2 = 1.
5.4. LAGRANGE MULTIPLIERS 145
 
x
a. Here we have f = (x + y)2 + y 2 = x2 + 2xy + 2y 2 , so the method of Lagrange
y
h i h i x+y
multipliers leads to the equation x + y x + 2y = λ x y for some scalar λ. Thus, =
√  x 
x + 2y  y 2  y  y 1± 5 0.5257
, and so − − 1 = 0, so = . This leads to the critical points ±
y  0.8507
x x  x 2  
0.8507 0.5257 0.8507
and ± , and we have f ≈ 2.618 and f ≈ 0.382. Therefore, we deduce
−0.5257 0.8507 −0.5257

that kAk ≈ 2.618 ≈ 1.618 (which happens to be the golden ratio). (We can with hindsight derive
the 2
 exact
  result  algebraically: Letting r = y/x, we observed   that r = r + 1; the critical points
2 + 2r + 1
x0 x0 x0 2r
= are found by solving x2 (1+r 2 ) = 1, so f = (1+r)2 +r 2 )x20 = =
y0 rx0 y0 r+2
4r + 3
= r + 1 = r 2 , as required.)
r+2
 
x
b. Now we have f = (2x + y)2 + (3y)2 = 4x2 + 4xy + 10y 2 , so the method of
y
h i h i
Lagrange multipliers leads to the equation 2x + y x + 5y = λ x y for some scalar λ.
 y 2 y √
2x + y x + 5y y 3 ± 13
Thus, = , and so −3 − 1 = 0, so = . This leads to the critical
x  y  x x  x 2  
0.2898 0.9571 0.2898 0.9571
points ± and ± , and we have f ≈ 10.606 and f ≈ 3.394.
0.9571 −0.2898 0.9571 −0.2898

Therefore, kAk ≈ 10.606 ≈ 3.257.
 
x
c. Here we have f = (2x + y)2 + (x + 3y)2 = 5x2 + 10xy + 10y 2 , so the method
y
h i h i
of Lagrange multipliers leads to the equation x + y x + 2y = λ x y for some scalar λ.
 y 2  y  √
x+y x + 2y y 1± 5
Thus, = , and so − − 1 = 0, so = , just as in part a. This
x y  x  x   x 2  
0.5257 0.8507 0.5257
leads to the critical points ± and ± , and we have f ≈ 13.090 and
0.8507 −0.5257 0.8507
 
0.8507 √
f ≈ 1.910. Therefore, kAk ≈ 13.090 ≈ 3.618.
−0.5257

5.4.31 Consider the figure below: ℓ and h are constants, as is the length of the rope. Physics
dictates that at equilibrium the weight hang as low as possible. Thus, we wish to maximize


h
y
x
β θ α

z
146 5. EXTREMUM PROBLEMS
   
x x    
y  y  x + y + z − L 0
       
   
f  z  = x sin α + z given the constraints g  z  =  y sin β − x sin α − h  =  0 . This leads
α α x cos α + y cos β − ℓ 0
β β
to the following equation (which we write in the equivalent vector form for obvious typographical
reasons):
       
sin α 1 − sin α cos α
       
 0  1  sin β   cos β 
       
 1  = λ 1 + µ 0 +ν 0 
       
       
 x cos α  0  −x cos α   −x sin α 
0 0 y cos β −y sin β
for some scalars λ, µ, and ν. The third component of this equation tells us that λ = 1. The
fifth component tells us that µ/ν = tan β, and from this and the second component we infer that
ν = − cos β, so µ = − sin β. Now the fourth component gives us

cos α = sin β cos α + cos β sin α = sin(α + β)

and the first component gives us

sin α = sin β sin α − cos β cos α + 1 = − cos(α + β) + 1.

Squaring and adding these, we obtain cos(α + β) = 1/2, so α + β = π/3 and the angle θ we seek
must be 2π/3. (This is not all that surprising if we think about the tension in the rope and the
nature of the force vectors at equilibrium. Cf. the solution of Exercise 5.2.15.)

5.4.32 a. We have (g ◦ ψ)(t) = t locally, so Dg(ψ(c))ψ ′ (c) = 1. We also have Df (a) = λDg(a),
so applying both sides of this equation to the vector ψ ′ (c) gives (f ◦ ψ)′ (c) = Df (a)ψ ′ (c) =
λDg(a)ψ ′ (c) = λ.  
c
2
b. Since f and g are C , the function F : R × ×R → Rn Rn × R defined by F x =
" # λ
∇f (x) − λ∇g(x) 1
is C . Then the Implicit Function Theorem (see Section 2 of Chapter 6 for
c − g(x)
 
x
−1
the general statement) tells us that on the level set F (0) we can solve locally for as a C1
λ
function of c provided the (n + 1) × (n + 1) matrix giving the derivative of F with respect to the
variables x, λ is invertible. But that derivative is (aside from a couple of minus signs) the matrix
given in the exercise.

5.4.33 Suppose we are given a budget g(x) = p · x = c and we wish to maximize the production
function f . Then the method of Lagrange multipliers tells us that at the constrained critical point a
∂f
we must have Df (a) = λDg(a) for some scalar λ. This means that (a) = λpi for all i = 1, . . . , n,
∂xi
1 ∂f
which in turn means that λ = (a) for all i = 1, . . . , n. It is intuitively plausible that all the
pi ∂xi
5.5. PROJECTIONS AND LEAST SQUARES 147

marginal productivities should be equal; if that for item j were greater, we would produce more
widgets by using more than aj units of item j without changing the cost.
The result of Exercise 32a tells us that the marginal productivity is the derivative of the
(optimal) number of widgets produced as a function of our budget. That is, if we increase the
budget by one dollar, then λ will be the extra number of widgets produced optimally. What the
equality in this exercise establishes is that, at the optimal point, spending that extra dollar on any
one of the items results in the same increase in productivity. This is a non-obvious result.

5.4.34 Proceeding as in the hint, we parametrize the level set S locally by Φ : U → Rn , where
U is a neighborhood of a in Rn−1 . Set h = f − λg. If Hh◦Φ,a is positive (negative) definite, then
the function h◦ Φ has a local minimum (maximum) at a. But h◦ Φ = f ◦ Φ − λc, so h◦ Φ has a local
minimum (maximum) at a if and only if f ◦ Φ has a local minimum (maximum) at a, which in turns
happens if and only if f has a constrained local minimum (maximum) at a.
Now, we differentiate h◦ Φ carefully using the chain rule. Recall that for the particular value λ
that appears in Lagrange multipliers, we have Dh(a) = Df (a) − λDg(a) = 0. For j = 1, . . . , n − 1,
we have
Xn
∂(h◦ Φ) ∂h ∂Φℓ
(a) = (Φ(a)) (a), and so
∂uj ∂xℓ ∂uj
ℓ=1
Xn n
X ∂h
∂ 2 (h◦ Φ) ∂2h ∂Φk ∂Φℓ ∂ 2 Φℓ
(a) = (a) (a) (a) + (a) (a)
∂ui ∂uj ∂xk ∂xℓ ∂ui ∂uj ∂xℓ ∂ui ∂uj
k,ℓ=1 ℓ=1
n
X ∂2h ∂Φk ∂Φℓ
= (a) (a) (a) = viT Hess(h)(a)vj ,
∂xk ∂xℓ ∂ui ∂uj
k,ℓ=1

∂Φ
where vi = (a), i = 1, . . . , n − 1, give a basis for Ta S. That is, computing the Hessian of h◦ Φ
∂ui
at a is exactly computing the restriction of the Hessian of h at a to the tangent space of S at a.

5.5. Projections and Least Squares

In these exercises, as in Example 5, it is easier to project onto ⊥


5.5.1  V . 
2 1
   
  1·1    
1 1 4/3
b·a 1 1    
a. V has normal vector a =  1 , so projV ⊥ b = 2a =   2  1  =  4/3 .
kak
1 1 1 4/3
 
 1 

1
     
2 4/3 2/3
     
Then projV b = b − projV ⊥ b =  1  −  4/3  =  −1/3 .
1 4/3 −1/3
148 5. EXTREMUM PROBLEMS
   
0 1
   
1 1
 ·     
  2 1
1     1 1
   
1 b·a 3 0 1  
b. a =     2   =  1 . Then projV b =
 1 , so projV ⊥ b = kak2 a = 1 1   1
 
0   0 0
 1 
 
 1 
 

0
     
0 1 −1
     
1 1  0
   
b − projV ⊥ b =   −   =   .
2 1 1 
     
3 0 3
   
1 1
   
 1   −1 
 · 
  1  1    
1     1 3/7
   
 −1  b·a 1 2  −1   −3/7 
c. Here a =    , so projV ⊥ b = a =     
1 kak2
 2 
1  =  3/7 . Thus,
1    
2   2 6/7
 −1 
 
 1 
 

2
     
1 3/7 4
     
 1   −3/7  1  10 
projV b = b − projV ⊥ b =      
 1  −  3/7  = 7  4 .
     
1 6/7 1
  
5.5.2 P T = A(AT A)−1 AT T = A (AT A)−1 T AT = A (AT A)T −1 AT = A(AT A)−1 AT = P
2
and P 2 = A(AT A)−1 AT = A(AT A)−1 (AT A)(AT A)−1 AT = A(AT A)−1 AT = P . (I − P )T =
I T − P T = I − P and (I − P )2 = I − 2P + P 2 = I − P . This stands to reason: If P = projV , then
I − P = projV ⊥ .
 
1
5.5.3 a. Notice that V ⊥ is spanned by a =  −2 . Thus,
−1
   
1 h i 1 1 −2 −1
1 T 1   
projV ⊥ = 2
aa =  −2  1 −2 −1 =  −2 4 2,
kak 6 6
−1 −1 2 1

 
5 2 1
1 
so projV = I − projV ⊥ = 2 2 −2 .
6
1 −2 5
5.5. PROJECTIONS AND LEAST SQUARES 149
 
1 0 " #
  2 −2
b. Let A =  0 1 . Then AT A = and so
−2 5
1 −2
   
1 0 " #" # 5 2 1
T −1 T 1  5 2 1 0 1 1 
P = A(A A) A =  0 1 = 2 2 −2  .
6 2 2 0 1 −2 6
1 −2 1 −2 5
   
1 0
c. We have V = Span(v1 , v2 ), where v1 =  0  and v2 =  1 . Applying the Gram-
1 −2
   
0 1
   
   1·0  
0 1
  −2 1  
Schmidt process, we take w1 = v1 and w2 = v2 − projw1 v2 =  1  −   2  0  =
1
−2 1
 
       0 

0 1 1 1
     
 1  +  0  =  1 . Then V = Span(w1 , w2 ) and
−2 1 −1
     
2
X 1 h i 1 1 h i 1 5 2 1
1 T 1     
projV = 2
wi wi =  0  1 0 1 +  1  1 1 −1 =  2 2 −2  .
kwi k 2 3 6
i=1 1 −1 1 −2 5
   
1 1 4 " # " #
    6 2 1
5.5.4 a. Let A =  2 1  and b =  −2 . Then AT A = and AT b = .
2 3 1
1−1 1
" #" # " #
T −1 T 1 3 −2 1 1 1
Then x = (A A) A b = = is the least squares solution.
14 −2 6 1 14 4
   
5 4
1  
b. Ax =  6  is the point in C(A) closest to  −2 .
14
−3 1

   
1 1 1 " # " #
    6 0 11
5.5.5 a. Let A =  1 −3  and b =  4 . Then AT A = and AT b = .
0 11 −8
2 1 3
" #" # " #
1/6 0 11 11/6
Then x = (AT A)−1 AT b = = is the least squares solution.
0 1/11 −8 −8/11
   
73 1
1   
b. Ax =  265  is the point in C(A) closest to 4 .
66
194 3
150 5. EXTREMUM PROBLEMS
" #
1 −1 3
5.5.6 a. Let B = . Then
2 1 0

     
3 1 2 " #−1 " # 3
T T −1     11 1 1 −1 3  
x0 = b − B (BB ) Bb =  7  −  −1 1 7
1 5 2 1 0
1 3 0 1
       
3 1 2 " #" # 3 1 2 " #
  1   5 −1 −1   1  −1
=7−  −1 1 =  7  −  −1 1
54 −1 11 13 3 8
1 3 0 1 3 0
     
3 5 −2
     
= 7− 3 =  4.
1 −1 2

" #
1 1 1 1
b. Let B = . Then
1 0 2 1

     
3 1 1 " #−1 " # 3
    
  
1 1 0 1  
x0 = b − B T (BB T )−1 Bb =   4 4 1 1 1  1
 1−1 2 1  
    4 6 1 0 2  1
−1 1 1 −1
     
3 1 2
     
 1 1  0
=    
 1−1 =  0.

     
−1 1 −2

5.5.7 a. Fitting y = a to the data yields the inconsistent system

a=0
a=1
a=3
a = 5,

   
1 0
   
1 1
which takes the matrix form Aa = b, with A =     T
 1  and b =  3 . Then A A = [4] and
   
1 5
T
A b = [9], so a = 9/4. (Notice this is just the average of the given y-values.) The sum of the errors
is (0 − 94 ) + (1 − 94 ) + (3 − 94 ) + (5 − 49 ) = 0, of course.
5.5. PROJECTIONS AND LEAST SQUARES 151

b. The system
−a + b = 0
b = 1
a + b = 3
2a + b = 5

can be put in the matrix form


   
" #  −1 1 " #

0
 
a  0 
1 a  
A = = 1.
b  1 1 3
  b  
2 1 5  
" # " # 0
 
6 2 a  1
We have AT A = . The least squares solution is given by = (AT A)−1 AT  
3 =
2 4 b  
" #" # " # 5
1 2 −1 13 1 17 17 7
= , so the least squares line is y = x + . So far as the errors are
10 −1 3 9 10 14 10 5
     
3
0 − (− 10 ) 3 1
     
 1 − 14  
1  −4   
concerned, we have ε =  10  1
 3 − 31  = 10  −1 , so ε ·  1  = 0.
 10     
48
5 − 10 2 1
c. The linear system
a − b + c = 0
c = 1
a + b + c = 3
4a + 2b + c = 5

takes the form    


  1 −1 1   0
a   a  
   0 0 1  1
 
A b  = 
1  b =   .
 1 1 3
c c
4 2 1 5
Solving the normal equations by using an augmented matrix, we have
   1

18 8 6 23 1 0 0 4
   29  ,
 8 6 2 13  0 1 0 20 
23
6 2 4 9 0 0 1 20
   
1
0 − (− 20 ) 1
 23   
1 2 29 23  1 − 20  1  −3 
so the least squares parabola is y = x + x + . We have ε =     , so,
4 20 20 57 =  
 3 − 20  20  3 
once again, the sum of the errors is 0. 5 − 101 −1
20
152 5. EXTREMUM PROBLEMS

5.5.8 a. Fitting y = a to the data yields the inconsistent system

a=1
a=2
a=1
a = 3,
   
1 1
   
1 2
which takes the matrix form Aa = b, with A =     T
 1  and b =  1 . Then A A = [4] and
   
1 3
T
A b = [7], so a = 7/4. (Notice this is just the average of the given y-values.) The sum of the errors
is (1 − 74 ) + (2 − 74 ) + (1 − 74 ) + (3 − 47 ) = 0, of course.
b. The system
a + b = 1
2a + b = 2
3a + b = 1
4a + b = 3

can be put in the matrix form


   
" # 1 1 " #

1
 
a 2 1  
A =  a = 2.
b 3 
1 b 1
  
4 1 3  
" # " # 1
 
30 10 a  2
We have AT A = . The least squares solution is given by = (AT A)−1 AT  
1 =
10 4 b  
3
" #" # " #
1
1 2 −5 20 1 1
= 21 , so the least squares line is y = x + . As far as the errors are
10 −5 15 7 2 2
 2     
1−1 0 1
     
 2 − 32   21  1
concerned, we have ε =      
 1 − 2  =  −1 , so ε ·  1  = 0.
     
5 1
3− 2 2 1
c. The linear system
a + b + c = 1
4a + 2b + c = 2
9a + 3b + c = 1
16a + 4b + c = 3
5.5. PROJECTIONS AND LEAST SQUARES 153

takes the form    


  1 1 1   1
a   a  
   4 2 1  2
A b  = 
 9   b = 
1.

 3 1  
c c
16 4 1 3
Solving the normal equations by using an augmented matrix, we have
   1

354 100 30 66 1 0 0 4
   
 100 30 10 20  0 1 0 − 34  ,
7
30 10 4 7 0 0 1 4
   
1 − 54 −1
   
1 2 3 7  2 − 54  1 3
so the least squares parabola is y = x − x + . We have ε =    =  , so, once
4 4 4 7  4  −3 
 1− 4  
11
3− 4 1
again, the sum of the errors is 0.

5.5.9 We want to solve Ax = p where b − p ∈ C(A)⊥ . By Theorem 4.9 of Chapter 4,


C(A)⊥ = N(AT ), so we must have AT (b − p) = 0. Thus, AT (b − Ax) = 0, so AT Ax = AT b.

5.5.10 a. Suppose projV x = p and projV y = q. Then x − p and y − q are vectors in V ⊥ . Then
 
x + y = (p + q) + (x + y) − (p + q) = (p + q) + (x − p) + (y − q) ,
| {z } | {z }
∈V ∈V ⊥

so projV (x + y) = p + q, as required.
b. Similarly, since

cx = c p + (x − p) = (cp) + (cx − cp) = (cp) + c(x − p) ,
|{z} | {z }
∈V ∈V ⊥

we infer that projV (cx) = cp, as well.


c. Let p = projV b. Then b−p ∈ V ⊥ and b−(b−p) = p ∈ V = (V ⊥ )⊥ (by Proposition
4.8 of Chapter 4). Therefore, b − p = projV ⊥ b, so b = p + (b − p) = projV b + projV ⊥ b.

5.5.11 a. Let b ∈ Rm . Then p = Ab = projV b is the unique vector in V with the property
that b − p ∈ V ⊥ . Moreover, it follows that whenever p ∈ V to start with, we must have Ap = p
(inasmuch as p ∈ V and p − p ∈ V ⊥ ). Therefore, for any b ∈ Rm , A2 b = A(Ab) = Ap = p = Ab,
so A2 = A.
To prove that A = AT , we show that Ax · y = x · Ay for all x, y ∈ Rm . We write y =
Ay + (y − Ay), where Ay ∈ V and y − Ay ∈ V ⊥ . Since Ax ∈ V , it follows that Ax · (y − Ay) = 0,
and so Ax·y = Ax·Ay. But a similar argument shows that x·Ay = Ax·Ay, and so Ax·y = x·Ay
for any x, y ∈ Rm . But now it follows that Ax · y = AT x · y, so (A − AT )x · y = 0 for all x, y ∈ Rm .
So, by Exercise 1.2.15, (A − AT )x = 0 for all x ∈ Rm and so A − AT = O, as required.
154 5. EXTREMUM PROBLEMS

b. Let V = C(A). Then we claim that µA = projV . Certainly Ax ∈ V for all x ∈ Rm .


Next, since A2 = A, we have A(x − Ax) = Ax − A2 x = Ax − Ax = 0, so x − Ax ∈ N(A) =
N(AT ) = C(A)⊥ = V ⊥ . Therefore, Ax = projV x, as required.
     
1 2 3
     
5.5.12 a. Let v1 =  0 , v2 =  1 , and v3 =  2 . Applying the Gram-Schmidt process,
0 0 1
we have

 
1
 
w1 = v1 =  0     
0 2 1
   
  1·0    
2 1 0
  0 0    
w2 = v2 − projw1 v2 =  1  −   2  0  =  1 
1
0 0 0
 
 0 

0

w3 = v3 − projw1 v3 − projw2 v3
       
3 1 3 0
       
  2·0   2·1    
3 1 0 0
  1 0   1 0    
=  2  −   2  0  −   2  1  =  0  .
1 0
1 0 0 1
   
 0   1 

0 0

These three vectors already have length 1, so they give an orthonormal basis.
     
1 0 0
     
b. Let v1 =  1 , v2 =  1 , and v3 =  0 . Applying the Gram-Schmidt process,
1 1 1
we have

 
1
 
w1 = v1 =  1     
1 0 1
   
  1·1    
0 1 −2/3
  1 1    
w2 = v2 − projw1 v2 =  1  −   2  1  =  1/3 
1
1 1 1/3
 
 1 

1
5.5. PROJECTIONS AND LEAST SQUARES 155

and we can rescale to spare ourselves the fractions:

 
−2
 
w2′ =  1 
1
w3 = v3 − projw1 v3 − projw2′ v3
       
0 1 0 −2
       
  0·1   0· 1    
0 1 −2 0
  1 1   1 1    
=  0  −   2  1  −   2  1  =  −1/2  ,
1 −2
1 1 1 1/2
   
 1   1 

1 1

 
0
 
which we can likewise rescale to w3′ =  −1 .
1
   
1 −2
1   1  
We now take q1 = w1 /kw1 k = √  1 , q2 = w2 /kw2 k = w2′ /kw2′ k = √  1 , and
3 6
1 1
 
0
′ ′ 1  
q3 = w3 /kw3 k = w3 /kw3 k = √  −1  to form our orthonormal basis.
2
1
     
1 2 0
     
0 1  1
   
c. Let v1 =  , v2 =  , and v3 =   . Applying the Gram-Schmidt process,

1 0  2
0 1 −3
we have
 
1
 
0
w1 = v1 =  1
    
  2 1
0    
1 0
     
0·1    
2     1 1
     
1 1 0 0  1
w2 = v2 − projw1 v2 =  
0−      
 =  −1 
  1 2   1   

1   0 1
 0 
 
 1 
 

0
w3 = v3 − projw1 v3 − projw2 v3
156 5. EXTREMUM PROBLEMS
       
0 1 0 1
       
 1 0    
   ·     1· 1  
 2 1  2   −1 
0     1     1
     
 1 −3 0 0 −3 1  1
= 
 2−     −   2  
  1 2   1 
 1  −1 
 

−3   0   1
 0   1 
   
 1   −1 
   

0 1
 
0
 
 2
=  .
0 
 
−2
   
1 1
   
1  0 1 1, and q3 = w3 /kw3 k =
We now take q1 = w1 /kw1 k = √  , q2 = w2 /kw2 k = 
21 2  −1 
0 1
 
0
 
1  1 
√   to form our orthonormal basis.
2 0


−1
     
−1 2 −1
     
 2  −4   3
d. Let v1 =       
0 , v2 =  1 , and v3 =  1 . Applying the Gram-Schmidt
     
2 −4 1
process, we have
 
−1
 
 2
w1 = v1 =  
0 
 
2
   
2 −1
   
 −4   2 
   ·
 1  0 
   
2     −1 0
     
 −4  −4 2  2 0
w2 = v2 − projw1 v2 =   1−
   2    
 0=1
     
−1
−4   2 0
 2 
 
 0 
 

2
w3 = v3 − projw1 v3 − projw2 v3
5.5. PROJECTIONS AND LEAST SQUARES 157
       
−1 −1 −1 0
       
 3  2  3 0
    
 1· 0 
     
 1·1  
−1     −1     0
     
 3 1 2  2 1 0 0
= 
 1−     −    
  −1 2   0 
 0 2  1


1   2   0
 2   0 
   
 0   1 
   

2 0
 
0
 
 1
= .
0 
 
     
−1 −1 0 0
     
1 2
0
  1  1
We now take q1 = w1 /kw1 k =  , q2 = w2 /kw2 k =  , and q3 = w3 /kw3 k = √ 
3 0  1
  2  0 

2 0 −1
to form our orthonormal basis.
   
1 1
   
 0   −1 
      
1· 0 
   
1 1     1 1/2
       
 −1  0 1 2  −1   1/2 
       
5.5.13 a. Take w1 = 
0  and w2 =  1  −
  2 
0  =  1 . To make life
    1    
2 1   2 0
 −1 
 
 0 
 

2
 
1
 
 1
easier, we take w2′ =   ′
 2 . Then {w1 , w2 } gives an orthogonal basis for V .
 
0
b.

projV b = projw1 b + projw2′ b


       
1 1 1 1
       
 −3   −1   −3   1 
 ·     
 1  0    1·2  
    1     1
   
1 2  −1  1 0 1
=   2   0+
   2  
2
1   1  

  2   0
 −1   1 
   
 0   2 
   

2 0
158 5. EXTREMUM PROBLEMS
     
1 1 1
     
 −1   1   −1 

= (1)      
 + (0)  2  =  0  .
 0    
2 0 2

c. The least squares solution of Ax = b is the unique vector" # x with the property that
1
Ax = projV b. In this case projV b is the first column of A, so x = .
0
   
0 1
   
1 0
       
0·1      
1 0     1 −1 −1
         
0 1 1 1 0 1 3  3
   
5.5.14 a. Take v1 =   and v2 =   −   2   =      ′  
1 0 1 3 −1 ; we set v2 =  −1 
    1      
1 1   1 2 2
 0 
 
 1 
 

1
for convenience. {v1 , v2′ } gives an orthogonal basis for V .
   
" #! −1 −1
1 0 1 1  0   −1 
b. V ⊥ = N = Span     
 1  ,  0 . Applying the Gram-
−1 3 −1 2
0 1
   
−1 1
   
 0  2
Schmidt process once more, we find that w1 = 

 and w2 =  
 1  give an orthogonal basis
 1  
0 −2
for V ⊥ .
c. Given x ∈ R4 , we set v = projV x and w = projV ⊥ x. Then one can easily check that
x = v + w. Here we have
   
1 −1
   
0  3
 
x·    x·    
1 −1 
  1   −1
   
1 0 2  3
 
v = projv1 x + projv2′ x =   2   +    
1  1  −1 2   −1 


  1   2
 0   3 
   
 1   −1 
   

1 2
   
1 −1
   
x1 + x3 + x4  0
 +
 −x1 + 3x2 − x3 + 2x4  3 
 
= 1  −1 
3   15  
1 2
5.5. PROJECTIONS AND LEAST SQUARES 159
 
2x1 − x2 + 2x3 + x4
 
1 −x1 + 3x2 − x3 + 2x4 
= 
5  2x1 − x2 + 2x3 + x4  
x1 + 2x2 + x3 + 3x4
   
−1 1
   
 0  2
x· 1 
  x·  1
  
  −1   1
   
0  0 −2  2
w = projw1 x + projw2 x =     +   2  
−1 2  1 
1
 1
   
  0   −2
 0   2 
   
 1   1 
   

0 −2
   
−1 1
   
−x1 + x3  0 x1 + 2x2 + x3 − 2x4  2
=  1+  1
2   10  
0 −2
 
3x1 + x2 − 2x3 − x4
 
1 x1 + 2x2 + x3 − 2x4 .
= 
5  −2x1 + x2 + 3x3 − x4 
−x1 − 2x2 − x3 + 2x4
 
1
5.5.15 a. Since C(A) = Span , we know that b ∈ C(A) if and only if it has the form
1
 
  1
1
b=b for some b ∈ R. A solution v of Av = b is given by v = b  0 . To find x ∈ R(A) with
1
0
   
1 1
   
b 0 ·  2     
1 1
0 3   b  
Ax = b, we take x = projR(A) v =   2  2  =  2 .
1 14
3 3
 
 2 

3
b. Since rank(A) = 2, we know that C(A) = R2 . Row reducing the augmented matrix
 
" # b1 − b2
1 1 1 b1  
yields the solution v =  b2 . The rows of A are orthogonal, so to find
0 1 −1 b2
0
160 5. EXTREMUM PROBLEMS

the unique solution lying in the row space, we take

   
1 0
   
v·1   v· 1      
1 0 1 0
1   −1   b1   b2  
x = projR(A) v =   2  1  +   2  1  = 3  1  + 2  1  .
1 0
1 −1 1 −1
   
 1   1 

1 −1

c. Again it is clear that C(A) = R2 


. By row reducing
 the augmented matrix, we find
3b1 − b2
 
1 0 
that a solution of the system Ax = b is v =  . The rows of A are orthogonal, so the
2  b2 − b1 
0
unique solution lying in the row space is
   
1 1
   
1  1

v·     v·    

1 1  3 1
   
1 1 −5  1
x = projR(A) v =   2  
 + 
  2 
 3

1 1 1  

  1   −5
 1   1 
   
 1   3 
   

1 −5
   
1 1
   
b1 
 1 b2 
 1.
= +
4  
 1  36  3 

1 −5

d. The first two rows of A are orthogonal and the  third row is the sum of the first
b1
 
two. Thus, if b ∈ C(A), it must have the form b =  b2  for some b1 , b2 ∈ R. If we take
b1 + b2
   
1 1
   
b1 
 1 b2 
 1
x=
4  1  + 36  3  (as in part c), we see that Ax = b and x ∈ R(A).
   
1 −5

5.5.16 a. If a1 , . . . , an form an orthonormal set, then AT A = I, and so A−1 = AT by Corollary


2.2 of Chapter 4.
5.5. PROJECTIONS AND LEAST SQUARES 161

b. If a1 , . . . , an form an orthogonal set, then the ij-entry of AT A is 0 when i 6= j and


kai k2 when i = j. Since
 
ka1 k2 0 ··· 0
 2 
 0 ka2 k 0 
AT A = 
 . . .. ..  = B,

 . . . 
0 0 · · · kan k2

it follows that A−1 = B −1 AT .

5.5.17 a. Suppose g ∈ U + and h ∈ U − . Then


Z a Z 0 Z a
hg, hi = g(t)h(t)dt =
g(t)h(t)dt + g(t)h(t)dt
−a −a 0
Z a Z a Z a Z a
= g(−s)h(−s)ds + g(t)h(t)dt = − g(s)h(s)ds + g(t)h(t)dt = 0,
0 0 0 0

as required.
b. First, V = U + + U − by the remark. Let’s now check that U − = (U + )⊥ . We’ve
already established that U − ⊂ (U + )⊥ , so it remains only to show that if f ∈ (U + )⊥ , then f ∈ U − .
Write f = f1 + f2 , where f1 ∈ U + and f2 ∈ U − . Then we have

0 = hf, f1 i = hf1 + f2 , f1 i = hf1 , f1 i + hf2 , f1 i = hf1 , f1 i,

since we’ve already shown that even and odd functions are orthogonal. Thus, f1 = 0 and f ∈ U − ,
as we needed to show. (This means that (U + )⊥⊥ = U + in this instance. Although Proposition 4.8
of Chapter 4 need not hold in infinite dimensions, it does hold here.)

5.5.18 a. We check the various properties of an inner product.


(i) By part a of Exercise 1.4.22, hA, Bi = tr(AT B) = tr((AT B)T ) = tr(B T A) =
hB, Ai.
(ii) By part b of Exercise 1.4.22, hcA, Bi = tr((cA)T B) = c tr(AT B) = chA, Bi.
(iii) By part b of Exercise 1.4.22, hA + B, Ci = tr((A + B)T C) = tr(AT C + B T C) =
tr(AT C) + tr(B T C) = hA, Ci + hB, Ci.
P P
(iv) (AT A)ii = nj=1 a2ji , so hA, Ai = tr(AT A) = ni,j=1 a2ji is the sum of the squares
of all of the entries of A. Thus, hA, Ai ≥ 0 and hA, Ai = 0 only if A = O.
b. Using part a and part c of Exercise 1.4.22, we have hA, Bi = tr(AT B) = tr(AB) =
tr(BA) = −tr(B T A) = −hB, Ai = −hA, Bi. Thus, hA, Bi = 0.
c. By part b, K ⊂ S⊥ . Since, by Exercise 4.3.24, dim S + dim K = n2 = dim Mn×n , it
follows that dim K = dim S⊥ , so, by Lemma 3.8 of Chapter 4, we have K = S⊥ .

5.5.19 Let V = Span(g1 , g2 ).


162 5. EXTREMUM PROBLEMS

a. Let f ∈ P2 ⊂ C0 ([−1, 1]) satisfy hf, g1 i = hf, g2 i = 0. Writing f (t) = a0 + a1 t + a2 t2 ,


we have
Z 1
2
0= (a0 + a1 t + a2 t2 )dt = a2 + 2a0
−1 3
Z 1
2
0= (a0 + a1 t + a2 t2 )tdt = a1 .
−1 3
1
Thus, f (t) = 3 a2 (3t
2 − 1), so {3t2 − 1} is a basis for V ⊥ .
b. Let f ∈ P2 ⊂ C0 ([0, 1]) satisfy hf, g1 i = hf, g2 i = 0. Writing f (t) = a0 + a1 t + a2 t2 , we
R1 R1
have 0 = 0 (a0 + a1 t + a2 t2 )dt = a0 + 21 a1 + 31 a2 and 0 = 0 (a0 + a1 t + a2 t2 )tdt = 12 a0 + 13 a1 + 14 a2 .
" #!  
1 1 1
1
Since N 1
2
1
3
1
is spanned by  −6 , we conclude that {1 − 6t + 6t2 } is a basis for
2 3 4 6
V ⊥.
c. Let f ∈ P3 ⊂ C0 ([−1, 1]) satisfy hf, g1 i = hf, g2 i = 0. Writing f (t) = a0 + a1 t + a2 t2 +
a3 t3 , then we have
Z 1
0= (a0 + a1 t + a2 t2 + a3 t3 )dt = 23 a2 + 2a0
−1
Z 1
0= (a0 + a1 t + a2 t2 + a3 t3 )tdt = 25 a3 + 23 a1 .
−1
1 1
Thus, f (t) = 5 a3 (5t
3 − 3t) + 3 a2 (3t
2 − 1). and a basis for V ⊥ is {5t3 − 3t, 3t2 − 1}.

5.5.20 For any nonzero integer k, we have


Z π iπ Z π iπ
1 1
cos(kt)dt = sin(kt) =0 and sin(kt)dt = − cos(kt) = 0.
−π k −π −π k −π

As suggested in the hint at the back of the book, when k 6= ℓ, we have


Z π Z
1 π 
sin(kt) sin(ℓt)dt = cos(k − ℓ)t − cos(k + ℓ)t dt = 0
−π 2 −π
Z π Z
1 π 
sin(kt) cos(ℓt)dt = sin(k + ℓ)t − sin(k − ℓ)t dt = 0
−π 2 −π
Z π Z
1 π 
cos(kt) cos(ℓt)dt = cos(k + ℓ)t + cos(k − ℓ)t dt = 0.
−π 2 −π
Alternatively, one can derive the last integrals by integration by parts. For example, for k, ℓ ∈ N,
we have:
Z π iπ Z Z
1 k π k π
sin(kt) cos(ℓt)dt = sin(kt) sin(ℓt) − sin(ℓt) cos(kt)dt = − sin(ℓt) cos(kt)dt.
−π ℓ −π ℓ −π ℓ −π
Now, another integration by parts yields
Z π iπ Z Z
k k2 π k2 π
sin(kt) cos(ℓt)dt = 2 cos(kt) cos(ℓt) + 2 sin(kt) cos(ℓt)dt = 2 sin(kt) cos(ℓt)dt.
−π ℓ −π ℓ −π ℓ −π

If k 6= ℓ, this implies −π sin(kt) cos(ℓt)dt = 0. If k = ℓ, we can stop after the first integration by
Rπ Rπ
parts to conclude that −π sin(kt) cos(kt)dt = − −π sin(kt) cos(kt)dt = 0.
CHAPTER 6
Solving Nonlinear Problems
6.1. The Contraction Mapping Principle

6.1.1 Suppose f is a contraction mapping. Then there is a constant c with 0 < c < 1 so that
kf (x) − f (y)k ≤ ckx − yk. Given ε > 0, take δ = ε/c; then whenever kx − yk < δ, we have
kf (x) − f (y)k < cδ = ε, as required. If x and y are fixed points, then kf (x) − f (y)k = kx − yk ≤
ckx − yk can hold only when kx − yk = 0. That is, f cannot have more than one fixed point.

6.1.2 We have f (x) > |x| for all x ∈ R, so f cannot have a fixed point. On the other hand,
|x|
|f ′ (x)| = √ < 1 for all x. Since lim |f ′ (x)| = 1, we suspect that there is no c < 1 so that
x2 + 1 |x|→∞
|f (x) − f (y)| ≤ c|x − y|. For example,
√ √
f (2x) − f (x) 4x2 + 1 − x2 + 1
lim = lim = 1,
x→∞ x x→∞ x
so there can indeed be no such c.

6.1.3 We have
k
X ∞
X ∞
X
 
xk − x = x0 + (xj − xj−1 ) − x0 + (xj − xj−1 ) = − (xj − xj−1 ),
j=1 j=1 j=k+1

so

X  X
∞  ck
kxk − xk ≤ kxj − xj−1 k ≤ cj−1 kx1 − x0 k = kx1 − x0 k.
1−c
j=k+1 j=k+1

6.1.4 Since kxk+1 − xk k ≤ ckxk − xk−1 k for all k ∈ N, we have kxk+1 − xk k ≤ ck kx1 − x0 k.
Thus, when ℓ > k > K, the triangle inequality gives us

kxℓ − xk k ≤ kxk+1 − xk k + kxk+2 − xk+1 k + · · · + kxℓ − xℓ−1 k


 
≤ ck + ck+1 + · · · + cℓ−1 kx1 − x0 k
ck cK
< kx1 − x0 k < kx1 − x0 k.
1−c 1−c
Since cK → 0 as K → ∞, given any ε > 0, we can choose K so that cK < (1 − c)ε/kx1 − x0 k, and
cK
so kx1 − x0 k < ε. Thus, the sequence {xk } is Cauchy.
1−c
163
164 6. SOLVING NONLINEAR PROBLEMS
P
6.1.5 Suppose kak k converges. This means that given any ε > 0, there is K so that
P

kaj k < ε. Let sk = a1 + · · · + ak . Then, for any ℓ > k > K we have, by the triangle
j=K
inequality,

X
ℓ ℓ
X ∞
X

ksℓ − sk k = aj ≤ kaj k ≤ kaj k < ε.
j=k+1 j=k+1 j=k+1

Therefore, the sequence {sk } is a Cauchy sequence and therefore converges, by Exercise 2.2.14.
P
6.1.6 a. If kHk < 1, then the geometric series H k converges (by virtue of Proposition 1.1
and the remark following). But


X
(I − H) H k = lim (I − H)(I + H + H 2 + · · · + H K ) = lim I − H K+1 = I ,
K→∞ K→∞
k=0

inasmuch as kH K+1 k ≤ kHkK+1 → 0 as K → ∞.


An alternative, simpler proof, due to one of my students, is this: Proceeding by contradiction,
suppose I − H were singular. Then there would be a unit vector x with Hx = x, establishing the
inequality kHk ≥ 1.
b. Since A + H = A(I + A−1 H), by Proposition 4.3 of Chapter 1, it suffices to show
that I + A−1 H is invertible. This is immediate from part a, since kA−1 Hk ≤ kA−1 kkHk < 1 by
hypothesis.
c. Suppose A is invertible. We must show that there is an open ball centered at A
2
in Mn×n = Rn consisting of invertible matrices. The notation gets a bit confusing here, as the
Euclidean length and the matrix norm are different entities. However, as Exercise 5.1.5 establishes,
P 2 1/2
if hij < 1/kA−1 k, then kHk < 1/kA−1 k, and so, by part b, the matrix A + H will be
invertible.
P

6.1.7 a. When kHk < 1, we have (I + H)−1 = (−1)k H k , so
k=0

X∞ X ∞
kHk ε

k(I + H)−1 − Ik = (−1)k H k ≤ kHkk = < ,
1 − kHk 1−ε
k=1 k=1

x 1
since the function f (x) = = −1 + is increasing on (0, 1).
1−x 1−x
−1 P
∞ 
b. We have (A + H)−1 − A−1 = A(I + A−1 H) − A−1 = (−1)k (A−1 H)k A−1 ,
k=1
and so, proceeding as in part a, we have


X ∞
X k ε
k(A + H)−1 − A−1 k ≤ kA−1 k kA−1 Hkk ≤ kA−1 k kA−1 kkHk < kA−1 k .
1−ε
k=1 k=1
6.1. THE CONTRACTION MAPPING PRINCIPLE 165

ε P 1/2
c. Given ε > 0, set δ = . If h2ij < δ, then kHk < δ, and it
kA−1 k kA−1 k + ε)
follows from part b that
ε
kA−1 k+ε
kf (A + H) − f (A)k < kA−1 k = ε,
1 − kA−1ε k+ε

as required.

6.1.8 a. We have
 
′ 1 ′′ ′ g(x0 ) 1 1
2
g(x1 ) = g(x0 ) + g (x0 )h0 + g (ξ)h0 = g(x0 ) + g (x0 ) − ′ + g ′′ (ξ)h20 = g ′′ (ξ)h20 .
2 g (x0 ) 2 2
Therefore,
1 g(x0 )2 1 M |g(x0 )| 1
|g(x1 )| ≤ M ′ 2
= |g(x0 )| ′ 2
≤ |g(x0 )|,
2 g (x0 ) 2 g (x0 ) 4
as required.
b. Applying the Mean Value Theorem to g ′ gives g ′ (x1 ) = g ′ (x0 ) + g ′′ (c)h0 for some
|g(x0 )|
c between x0 and x1 . Note that |g ′′ (c)h0 | ≤ M ′ ≤ 12 |g ′ (x0 )|. Therefore, by the triangle
|g (x0 )|
1 2
inequality, we have |g ′ (x1 )| ≥ |g ′ (x0 )| − |g ′′ (c)h0 | ≥ 12 |g ′ (x0 )|, and so ′ ≤ ′ . Thus, we
|g (x1 )| |g (x0 )|
have  
|g(x1 )| 1 4 |g(x0 )|
≤ |g(x 0 )| = ′ .
g ′ (x1 )2 4 g ′ (x0 )2 g (x0 )2
It follows that
M |g(x1 )| M |g(x0 )| 1
′ 2
≤ ′ 2
≤ .
g (x1 ) g (x0 ) 2
c. We now have
 
|g(x1 )| 1 2 1
|h1 | = ′ ≤ |g(x0 )| ′
= |h0 |.
|g (x1 )| 4 |g (x0 )| 2
d. Because of the final result in part b, the arguments can all be iterated to show that
1 1 2 |g(xk )| |hk−1 |
|g(xk )| ≤ |g(xk−1 )|, ≤ ′ , and |hk | = ≤
4 |g ′ (xk )| |g (xk−1 )| |g ′ (xk )| 2
P
for all k ∈ N. Therefore, we have |hk | ≤ |hk−1 |/2 ≤ |hk−2 |/4 ≤ · · · ≤ |h0 |/2k . Since the series hk
converges absolutely, it follows from Proposition 1.1 that it converges. Moreover,
X∞ X ∞ ∞
X 1

hk ≤ |hk | ≤ |h0 | = |h0 |,
2k
k=1 k=1 k=1

so Newton’s method converges to a point in the interval of radius |h0 | centered at x1 .

6.1.9 a. We have x0 = 1 and x1 = x0 −g(x0 )/g ′ (x0 ) = 1.5, so h0 = −0.5. On the interval [1, 2],
we have |g ′′ | = 2 = M , and |g(x0 )|M = 2 ≤ 12 (4) = 12 (g ′ (x0 ))2 . Therefore, we are guaranteed that
Newton’s method will converge to a root in the interval [1, 2]. We have x2 = x1 − g(x1 )/g ′ (x1 ) =
1.41667, x3 = 1.41422, x4 = 1.41421, etc.
166 6. SOLVING NONLINEAR PROBLEMS

b. We have x0 = 1.25 and x1 = x0 − g(x0 )/g ′ (x0 ) = 1.26, so h0 = −0.01. On the interval
[1.25, 1.27], we have |g ′′ | ≤ M = 7.62 < 8, so |g(x0 )|M ≈ 0.38 < 2.34 ≈ 12 (g ′ (x0 ))2 . Therefore, we
are guaranteed that Newton’s method will converge to a root in the interval [1.25, 1.27]. Indeed,
we have x2 = x3 = ... = 1.25992105.
c. We have x0 = π/4 ≈ 0.785398 and x1 = x0 − g(x0 )/g ′ (x0 ) = 0.523599, so h0 ≈
−0.2618. Now, |g ′′ (x)| = |4 cos 2x| ≤ 4 = M , and we have |g(x0 )M | = (π/4)(4) ≤ 12 (3)2 =
1 ′ 2
2 (g (x0 )) , so we are guaranteed that Newton’s method will converge to a root in the interval
[0.26, 0.79]. Indeed, we have x2 = 0.514961, x3 = 0.514933, etc.

6.1.10 This follows the proof of the one-dimensional version in Exercise ??.
a. By Proposition 3.2 of Chapter 5, for each i = 1, . . . , n, we have gi (x1 ) = gi (x0 ) +
Dgi (x0 )h0 + 12 Hgi ,x0 +ξi h0 (h0 ) for some 0 < ξi < 1. Moreover, we have |Hgi ,x0 +ξi h0 (h0 )| ≤ Mi kh0 k2 .
By the definition of h0 , we have g(x0 ) + Dg(x0 )h0 = 0, and so

1 X 1/2 1
n
1 1
kg(x1 )k ≤ (Hgi ,x0 +ξi h0 (h0 ))2 ≤ M kh0 k2 ≤ M kDg(x0 )−1 k2 kg(x0 )k2 ≤ kg(x0 )k.
2 2 2 4
i=1

b. The Mean Value Inequality tells us that, for each i = 1, . . . , n, kDgi (x1 ) − Dgi (x0 )k ≤
Mi kh0 k, and so, by Exercise 5.1.5, we have kDg(x1 ) − Dg(x0 )k ≤ M kh0 k. Then, following the

hint, if we let H = Dg(x0 )−1 Dg(x1 ) − Dg(x0 ) , then we see that kHk ≤ kDg(x0 )−1 kM kh0 k =
kDg(x0 )−1 k2 kg(x0 )kM ≤ 1/2, and so, by part a of Exercise 7, we have k(I + H)−1 − Ik ≤
1. Therefore, kDg(x1 )−1 − Dg(x0 )−1 k ≤ kDg(x0 )−1 kk(I + H)−1 − Ik ≤ kDg(x0 )−1 k, and so
kDg(x1 )−1 k ≤ 2kDg(x0 )−1 k.
c. Combining the results of parts a and b, we obtain kDg(x1 )−1 k2 kg(x1 )k ≤
4kDg(x0 )−1 k2 14 kg(x0 )k = kDg(x0 )−1 k2 kg(x0 )k. Therefore, kDg(x1 )−1 k2 kg(x1 )kM ≤
kDg(x0 )−1 k2 kg(x0 )kM ≤ 1/2.
d. Using the results of parts a and b, and substituting once for kh0 k, we have

1
kh1 k = kDg(x1 )−1 g(x1 )k ≤ kDg(x1 )−1 kkg(x1 )k ≤ 2kDg(x0 )−1 k M kh0 k2
2
−1 2
≤ kDg(x0 ) k kg(x0 )kM kh0 k ≤ kh0 k/2.

e. Because of the result of part c, the arguments can all be iterated to show that

kg(xk )k ≤ 14 kg(xk−1 )k, kDg(xk )−1 k ≤ 2kDg(xk−1 )−1 k, and khk | ≤ 12 khk−1 k

for all k ∈ N. Therefore, we have khk k ≤ khk−1 k/2 ≤ khk−2 k/4 ≤ · · · ≤ kh0 k/2k . Since the series
P
hk converges absolutely, it follows from Proposition 1.1 that it converges. Moreover,
X∞ X ∞ ∞
X 1

hk ≤ khk k ≤ kh0 k = kh0 k,
2k
k=1 k=1 k=1

so Newton’s method converges to a point in the closed ball of radius kh0 k centered at x1 .
6.1. THE CONTRACTION MAPPING PRINCIPLE 167
   
1 1
6.1.11 a. We have x0 = and x1 = x0 − Dg(x0 )−1 g(x0 ) = . We have g(x0 ) =
0 1/4
  " # " #
1 4 0 0 0
and Dg(x0 ) = , so kDg(x0 )−1 k = 1/4. We have Hess(g1 ) = and Hess(g2 ) =
−1 0 4 0 2
" #
0 4 √
, so M1 = 2 and M2 = 4. Therefore, we have M = 2 5. Then kDg(x0 )−1 k2 kg(x0 )kM =
4 0
√ √ √
( 41 )2 ( 2)(2 5) = 10/8 < 1/2, so we are guaranteed that Newton’s method will converge to a
   
0.983871 0.983858
root in the ball B(x1 , 1/4). In fact, x2 = and x3 = x4 = · · · = .
0.254032 0.254102
     
2 −1 9/4 −1
b. We have x0 = and x1 = x0 −Dg(x0 ) g(x0 ) = . We have g(x0 ) = ,
0 −1
" # " 1/3 # " #
4 0 2 0 0 3/2
Dg(x0 ) = , so kDg(x0 )−1 k = 1/3. We have Hess(g1 ) = and Hess(g2 ) = ,
0 3 0 2 3/2 0
so M1 = 2 and M2 = 3/2. Therefore, we have M = 5/2. Then kDg(x0 )−1 k2 kg(x0 )kM =

( 31 )2 ( 2)( 52 ) ≈ 0.39 < 1/2, so we are guaranteed that Newton’s method will converge to a root in
   
2.216164 2.215733
the ball B(x1 , 5/12). In fact, x2 = , x3 = x4 = · · · = .
0.301309 0.300879
     
π π 3.141593
c. We have x0 = and x1 = ≈ , so kh0 k = 1/2π. We have
0 1/2π 0.159155
  " # " #
0 −4 0 −4 sin x1 0
g(x0 ) = and Dg(x0 ) = , so kDg(x0 )−1 k = 1/4. We have Hess(g1 ) =
−1 0 2π 0 2
" #
0 2 √
and Hess(g2 ) = , so M1 ≤ 4 (in fact 2 is more accurate) and M2 = 2, so M = 2 5. Then
2 0
√ √
kDg(x0 ) k kg(x0 )kM = ( 14 )2 (1)(2 5) = 5/8 < 1/2. Thus, we are guaranteed that Newton’s
−1 2
 
3.1478998
method will converge to a root in the ball B(x1 , 1/2π). In fact, we have x2 = ,
0.1588354
 
3.1478999
x3 = .
0.1588361
     
0 1/4 −1/4
d. We have x0 = and x1 = . We have g(x0 ) = and Dg(x0 ) =
1 1 0
" # " # " #
1 −1/2 0 0 cos x1 0
, so kDg(x0 )−1 k ≈ 1.28. We have Hess(g1 ) = and Hess(g2 ) = ,
0 1 0 −1/2 0 0

so M1 = 1/2 and M2 = 1. Thus, we have M = 5/2, and so kDg(x0 )−1 k2 kg(x0 )kM ≈
(1.28)2 (0.25)(1.12) ≈ 0.46 < 1/2. Therefore, we are guaranteed
 that
 Newton’s
 method
 will converge
0.236167 0.236299
to a root in the ball B(x1 , 1/4). Indeed, we have x2 = , x3 = .
0.972335 0.972211

6.1.12 Following the hints, we set g(t) = f (a + t(b − a)) and v = g(1) − g(0), and consider
φ(t) = g(t) · v. Since g is differentiable, so is φ, and so, by the Mean Value Theorem, we have
kvk2 = φ(1) − φ(0) = φ′ (c) = g′ (c) · v ≤ kg′ (c)kkvk for some 0 < c < 1. Therefore, we have kf (b) −
f (a)k = kvk ≤ kg′ (c)k for that value of c. Now, by the chain rule, g′ (c) = (Df (a + c(b − a))(b − a),
so kg′ (c)k ≤ kDf (a + c(b − a))kkb − ak. Setting ξ = a + c(b − a), we obtain the desired result.
168 6. SOLVING NONLINEAR PROBLEMS

6.2. The Inverse and Implicit Function Theorems

1
6.2.1 Note once and for all that
" all the # functions f are C .
x −y
a. We have Df (x) = 2 , so Df (x) is invertible provided x2 + y 2 6= 0. So, for ev-
y x
" #
1 −1 1 x0 y 0
ery x0 6= 0, f has a local C inverse g near x0 , and Dg(f (x0 )) = Df (x0 ) = .
2(x20 + y02 ) −y0 x0
 2 
y − x2 −2xy " #
 (x2 + y 2 )2 (x2 + y 2 )2  1 y 2 − x2 −2xy
b. We have Df (x) =   −2xy
= . Thus,
x2 − y 2  (x2 + y 2 )2 −2xy x2 − y 2
(x2 + y 2 )2 (x2 + y 2 )2
Df (x) is invertible if and only if (x2 − y 2 )2 + 4x2 y 2 = (x2 + "y 2 )2 6= 0. So, for every
# x0 6= 0, f has a
2 2
x0 − y0 2x0 y0
local C1 inverse g near x0 , and Dg(f (x0 )) = Df (x0 )−1 = − .
2x0 y0 y02 − x20
" #
1 h′ (y)
c. We have Df (x) = , which is invertible for all x. Then for any x0 , f has
0 1
" #
1 −1 1 −h′ (y0 )
a local C inverse g near x0 , and Dg(f (x0 )) = Df (x0 ) = . (Indeed, we can write
0 1
   
x x − h(y)
down an explicit global inverse: g = .)
y y
" #
1 ey
d. We have Df (x) = , so Df (x) is invertible provided ex+y 6= 1, i.e., provided
ex 1
1 −1
# x0 , f has a local C inverse g near x0 and Dg(f (x0 )) = Df (x0 ) =
x + y 6= 0." For any such
1 1 −ey0
.
1 − ex0 +y0 −ex0 1
 
1 1 1
 
e. We have Df (x) =  y + z x + z x + y . Now
yz xz xy
   
1 1 1 1 1 1
   
y +z x+z x+y 0 x−y x−z ,
yz xz xy 0 0 (x − z)(y − z)

so Df (x0 ) is nonsingular if and only if x0 , y0 , and z0 are all distinct. With (more than) a bit of
algebra, we find that
 
x20 −x0 1
 (x0 − y0 )(x0 − z0 ) (x0 − y0 )(x0 − z0 ) (x0 − y0 )(x0 − z0 ) 
 
 −y 2 y 0 −1 
Dg(f (x0 )) = Df (x0 )−1 = 0
 (x − y )(y − z ) (x − y )(y − z ) (x − y )(y − z )  .

 0 0 0 0 0 0 0 0 0 0 0 0 
 z02 −z0 1 
(x0 − z0 )(y0 − z0 ) (x0 − z0 )(y0 − z0 ) (x0 − z0 )(y0 − z0 )
6.2. THE INVERSE AND IMPLICIT FUNCTION THEOREMS 169

2 2
6.2.2 a.  in R . In fact, since (u + v) − 4uv =
Note that f maps U to the (open) first quadrant
x
(u − v)2 > 0, we see that f maps U to the set of points with x2 > 4y. Indeed, consider
y
  " p #   
1 2 − 4y
x x + p x x 2
g = 1 2  . Then, letting W = : x > 4y > 0, x > 0 , we see that
y 2 x− x2 − 4y y
g : W → U is the global inverse function of f .
b. Calculating directly, we have

 
x −2 " #
  1+ p p p 
x 1 x2 − 4y x2 − 4y 
= p 1
1
x + px2 − 4y −1
Dg = 
 x 2 
2  .
y 2 1− p p x2 − 4y − 12 x − x2 − 4y 1
x2 − 4y x2 − 4y

   
u x
On the other hand, by the Inverse Function Theorem, if f = , then we have
v y

    −1 " # " p  #


1
x u 1 u −1 1 x + px2 − 4y −1
Dg = Df = =p 2  ,
y v u−v −v 1 x2 − 4y − 12 x − x2 − 4y 1

and the two answers agree.


c. Say the roots of the polynomial p(z) = z 2 − xz + y are u and v. Then p(z) =
(z − u)(z − v) = z 2 − (u + v)z + uv, so we have x = u +v and
 y = uv. Assuming the roots
u x
are positive and distinct, then our function f associates to the coefficient vector , and
v y
the inverse function g gives the roots. In Example 2 in Chapter 4, Section 5, the upper and
lower “sheets” of the surface correspond to u and v respectively. Where they come together the
polynomial has a double root and we can no longer find a (local) C1 inverse for the function f .
1
6.2.3 Note once and for all h that all the functions F are C . i  
1
2
a. We have DF = −3x − 2π cos(π(x − y)) 2y + 2π cos(π(x − y)) , so DF =
−1
h i ∂F
−3 − 2π −2 + 2π . Since (a) 6= 0, we can solve for y locally as a C1 function y = φ(x), and
∂y
∂F
∂x (a) 2π + 3
φ′ (x0 ) = − ∂F = .
∂y (a) 2π − 2
h i
b. We have DF = yex1 y − x2 y 2 sin x1 x2 −x1 y 2 sin x1 x2 x1 ex1 y + 2y cos x1 x2 , so
 
1 h i ∂F
DF 2 = 0 0 1 . Since (a) 6= 0, we can solve for y locally as a C1 function y = φ(x),
∂y
0
1 ∂F h i h i
and Dφ(x0 ) = − ∂F (a) = − 0 0 = 0 0 .
∂y (a)
∂x
170 6. SOLVING NONLINEAR PROBLEMS
 
h i 0
c. We have DF = yex1 y y 2 /(1 + x22 ) x1 ex1 y + 2y arctan x2 , so DF 1 =
1
h i ∂F
1 1/2 π/2 . Since (a) =6 0, we can solve for y locally as a C1 function y = φ(x), and
∂y
1 ∂F 2h i
Dφ(x0 ) = − ∂F (a) = − 1 1/2 .
∂y (a)
∂x π
" #   " #
2
2x −2y1 −2y2 4 −2 −2
d. We have DF = , and DF 1 = . The matrix
1 −1 1 1 −1 1
1
" #
∂F −2 −2
(a) = is invertible, so we can solve for y locally as a C1 function φ(x) and φ′ (x0 ) =
∂y −1 1
 −1 " #" # " #
∂F ∂F 1 1 2 4 3/2
− (a) (a) = = .
∂y ∂x 4 1 −2 1 1/2
 
" # 2 " #
2x1 −2x2 −3y12 2y2 −1 4 2 −12 2
e. DF = , so DF  
 2 = −2 .
2x2 2x1 + 2x2 −4y1 12y23 2 −8 12
1
" #
∂F −12 2
The matrix (a) = is invertible, so we can solve for y locally as a C1 function
∂y −8 12
 −1 " #" # " #
∂F ∂F 1 6 −1 4 2 1 13 5
φ(x), and Dφ(x0 ) = − (a) (a) = = .
∂y ∂x 32 4 −6 −2 2 16 14 −2
  " #
x
x2 y + xy 2 + t2 − 1
6.2.4 Consider the C1 function F : R3 → R2 given by F y  = . Then
x2 + y 2 − 2yt
t
" #   " #
−1
2xy + y 2 x2 + 2xy 2t −1 −1 2
DF = , so DF  1 = . Since the submatrix
2x 2y − 2t −2y −2 0 −2
1
  " #  
∂F ∂F −1 −1 x
A= = is invertible, it follows that locally we can solve for as a C1
∂x ∂y −2 0 y
" # " #" # " #
2 1 0 −1 2 −1
function of φ(t). We have φ′ (1) = −A−1 = − = . Thus, the
−2 2 −2 1 −2 3
   
−1 −1
tangent vector of the curve F−1 (0) at  1  is  3 , and the tangent line of the curve is given
1 1
   
−1 −1
parametrically by g(t) =  1  + s  3 .
1 1
 
h 1 h i i
1
6.2.5 Of course, F is C ; we have DF = 2 x − z 2y −x − z , so DF 1 = 2 0
 2 −2 .
1
 
∂F 1 x
Since (a) 6= 0, on the surface F = 0 we can express z locally near a as a C function φ . In
∂z y
6.2. THE INVERSE AND IMPLICIT FUNCTION THEOREMS 171
p
fact, we can solve explicitly: z = −x ± 2(x2 + y 2 ). The level surface F = 0 is a cone, and we can
solve for z explicitly as the graph of a C1 function away from the origin.

6.2.6 Let the sides of the


  triangle be x, y, and z, and let θ be the included angle between
x
y  1
the first two. Consider F   2 2 2
z  = x + y − 2xy cos θ − z = 0. Clearly, F is C . We have
θ
 
h i x
∂F
DF = 2 x − y cos θ y − x cos θ −z xy sin θ , so 6 0 and we can solve locally for θ = φ y .
=
∂θ
z
We have
1 h i
Dφ = − x − y cos θ y − x cos θ −z .
xy sin θ

∂φ
Now we claim that, no matter what the shape of the triangle is, is the largest of the three partial
∂z
derivatives in absolute value; that is, the angle is most sensitive to a small change in the opposite
side. (The clever student is invited to find a heuristic geometric argument for this.) But this is not
difficult: note that z > |x−y cos θ|, inasmuch as z 2 = x2 +y 2 −2xy cos θ > x2 +y 2 cos2 θ−2xy cos θ =
(x − y cos θ)2 (and similarly when we switch x and y).

6.2.7 a. We know that f is C1 (since, for example, the entries are polynomial functions). We
2 2
have Df (A)B = AB + BA, so Df (I)B = 2B, and the linear map Df (I) : Rn → Rn is certainly
invertible. Therefore, in a neighborhood of I the function f has a C1 inverse, thereby giving a C1
square root for all matrices sufficiently close to f (I) = I. Similarly, Df (−I) is invertible, and so we
get a local C1 inverse on a neighborhood of −I as well.
b. Note that there are infinitely many matrices A so that f (A) = I. Aside from ±I, the
standard matrix for every reflection (across every k-dimensional subspace, 0 < k < n) is such. But
f is not locally invertible in a neighborhood
" # of any such matrix." Indeed, as the
# hint suggests, let’s
1 2b11 0
examine what happens at A0 = . Then Df (A0 )B = , so Df (A0 ) has a 2-
−1 0 −2b22
dimensional nullspace and is certainly not invertible; this corresponds to changing the 1-dimensional
subspaces on which the reflection is respectively the identity and negative the identity.
Of course, the Inverse Function Theorem gives a sufficient, but not necessary, condition to have
a local inverse. So, to be "sure, we
# should find a matrix B near I that has no square root near A0 .
1 ε
But this is easy: let B = for ε 6= 0.
0 1

6.2.8 We have

∂F ∂F ∂F
     
∂p ∂V ∂T ∂p
= − ∂V , = − ∂T , and =− ,
∂V T
∂F ∂T p
∂F ∂p V
∂F
∂p ∂V ∂T
172 6. SOLVING NONLINEAR PROBLEMS

and so

∂F ∂F ∂F
     
∂p ∂V ∂T ∂V ∂T ∂p
=− = −1.
∂V T ∂T p ∂p V
∂F ∂F ∂F
∂p ∂V ∂T

(Cf. Exercise 3.1.6 for the case of an ideal gas.)


 
1 ∂V 1 R 1
6.2.9 a. For one mole of an ideal gas, we have pV = RT , so α = = · =
V ∂T p V p T
   
1 ∂V 1 RT 1
and β = − =− · − 2 = .
V ∂p T V p p
       
∂p ∂T α ∂V ∂p
b. Note that = 1 and =− , so, using the result of
∂T V ∂p V β ∂T p ∂V T
     
∂p ∂p ∂V α
Exercise 8, we have =− = .
∂T V ∂V T ∂T p β

6.2.10 Given kDf (x) − Ik ≤ 12 for kxk ≤ r, we have (by Proposition 1.3) kf (x) − xk ≤ 12 kxk
whenever kxk ≤ r. By Exercise 1.2.17, we have kxk − kf (x)k ≤ kf (x) − xk ≤ 21 kxk, and so
kf (x)k ≥ 21 kxk. In particular, when kxk = r, we have kf (x)k ≥ r/2.

6.2.11 Following the proof of Theorem 2.1, define φ(x) = x−f (x)+y. Then Dφ(x) = I −Df (x),
so kDφ(x)k ≤ s < 1 and φ is a contraction mapping. If x ∈ B, then kφ(x)k ≤ kx − f (x)k + kyk <
sr + r(1 − s) = r, so φ is a contraction mapping from B to itself. Therefore, φ has a unique fixed
point x, which in turn is a point so that f (x) = y. (In fact, x ∈ B, since the image of φ lies in the
open ball.)

6.2.12 Start by translating in Rn so that a = 0. By renumbering h the variables


i in Rn and
making a linear change of basis in Rm , we may assume that Df (0) = I ∗ . Write vectors in
  
    x
x x f
Rn as , where x ∈ Rm and y ∈ Rn−m . Define F : Rn → Rm × Rn−m by F = y .
y y
y
" #
Im ∗
Then we have DF(0) = , so DF(0) is invertible. Therefore, F has a local inverse G
O In−m
defined in a neighborhood V × W ⊂ Rm × Rn−m of 0. For c ∈ V , we have

" #     
   c
c c f G
=F G =  0 ,
0 0

 
c
and so b = G is a point (near 0) with f (b) = c.
0
6.3. MANIFOLDS REVISITED 173
   
  x
x  f t 
6.2.13 a. Consider the C1 function F : R3 → R2 given by F =   
 ∂f x . Then the
t
  ∂t t
x0
hypothesis of the problem tells us precisely that near the equation F = 0 defines x = g(t)
t0
 
g(t)
for some C1 function g. The equation f = 0 tells us that g(t) lies on the curve Ct . Now,
t
 
g(t)
differentiating h(t) = f = 0 gives us
t
   
′ g(t) ′ ∂f g(t)
0 = h (t) = ∇f · g (t) + ,
t ∂t t
 
∂f g(t)
so having = 0 tells us as well that g′ (t) is tangent to Ct . Thus, g gives (locally) a
∂t t
parametrization of the envelope, as desired.
b. We solve the"equations
# given in part a for x and y as functions of t.
cos t
(i) g(t) = , so the envelope is the circle x2 + y 2 = 1
sin t
" #
1/2t
(ii) g(t) = , so the envelope is the hyperbola 4xy = 1
t/2
" #
t3/2
(iii) g(t) = , so the envelope is the hypocycloid x2/3 + y 2/3 = 1
(1 − t)3/2

6.3. Manifolds Revisited

6.3.1 If X were a 1-dimensional manifold, in a neighborhood of 0 it would have to be a graph


of the form y = f (x) or x = f (y) for some smooth function f . It is clearly not a graph over the
y-axis, and f (x) = |x| is far from differentiable. The so-called parametrization has 0 derivative at
t = 0.

6.3.2 This subset of R2 seems to fulfill the requirements of definition (3), and yet in no neigh-
borhood of the origin is it a graph over either of the coordinate axes. What goes wrong is this:
lim g(t) = 0 = g(−π/4), so g−1 fails to be continuous at the origin.
t→π/4

6.3.3 No, any neighborhood of 0, for example, contains portions of infinitely many of these
parallel lines and is therefore not a graph over either coordinate axis.

6.3.4 Yes. Despite the fact that the hyperbola gets closer and closer to the asymptote as
set,we can find a ball W ⊂ R2 centered at p in which we
|x| → ∞, given any point p in this   have

a a
a graph. To wit, say a > 0; if p = , then take W = B(p, min(1/2a, 1/2)), and if p = ,
0 1/a
174 6. SOLVING NONLINEAR PROBLEMS

then take W = B(p, 1/2a). Alternatively, we can observe that this locus is the zero set of the
x
h i
function F = y(xy − 1), and DF = y 2 2xy − 1 is everywhere nonzero.
y
 
   2
 x  
x y   x − y2
6.3.5 a. explicit: graph of = f (y) = ; implicit: zero set of F y = ; note
z y4 z − x2
z
that DF has rank 2 everywhere.
√ p
  b. explicit: graph (locally) of y = ±3 1 − x2 or x = ± 1 − (y/3)2 ; implicit: zero set of
x
F = x2 + y 2 /9 − 1, whose derivative has rank 1 everywhere on the curve.
y
   
cos t 1p− 12 cos2 t
c. parametric: g(t) =  sin t , |t| < π/3, or g(t) =  ± cos t 1 − (cos2 t)/4 ,

± 2 cos t − 1 sin t
   √     1 2

y ± 1−x 2 x p 2 (1 + z )
|t| < π/2; explicit: graph (locally) of = √ or = .
z ± 2x − 1 y ± 1 − (1 + z 2 )2 /4

 d. parametric: The curve has two connected portions, one parametrized by g+ (t) =
cos t cos t    √ 
 sin t   sin t  y 1 − x2

       2  or
 − sin t , the other by g− (t) =  sin t ; explicit: graph (locally) of z = ± − 1 − x
w x
cos t − cos t
√ 
1 − x2

±  1 − x2 .
−x

6.3.6 a. g is one-to-one, for if g(u) = g(v), then u + u2 = v + v 2 and u2 = v 2 , from which we


deduce immediately that u = v.
b. As in Example 1, we write
       
u u + u2 0 u + u2
     
G v1  =  u2  +  v1  =  u2 + v1  .
v2 u3 v2 u3 + v2
   
u x
Since DG(0) = I, we know that G has a C1 local inverse. Indeed, setting G v1  =  y , we
v2 z
√ 
determine that u = 21 −1 ± 4x + 1 , so we get two explicit C1 inverse functions on the domain
x > −1/4. Namely, we have
 √   √ 
  −1+ 4x+1   −1− 4x+1
x  2 √  x  2 √ 
H+  y  =  y − x + −1+ 24x+1  and H− y  =  y − x + −1− 24x+1  .
  √ 3    √ 3 
z z − −1+ 24x+1 z z − −1− 24x+1

As before, we define F by using the second and


 third components of our functions H± . In fact, a
x
" #
y − (x − y)2
global function F is immediately obvious: F y  = .
z − (x − y)3
z
6.3. MANIFOLDS REVISITED 175
 
  cos u
√ p u
6.3.7 a. explicit: graph of y = ± 1 − x2 or x = ± 1 − y 2 ; parametric: g =  sin u .
v
v
 
  u cos v
p u
2 2
b. explicit: graph of z = ± x + y ; parametric: g = u sin v .

v
±u
 
x
c. implicit: zero-set of F y  = y cos z − x sin z (x2 + y 2 6= 0), whose derivative has rank
z
1 everywhere; explicit: locally the graph of y = x tan z or x = y cot z.
 
x
d. implicit: zero-set of F y  = x2 + y 2 + z 2 − 1 (y = 0, x ≥ 0 omitted); explicit: locally
z
p p √
the graph of z = ± 1− x2 − y2, x = ± 1 − y 2 − z 2 , y = ± 1 − x2 − z 2 .
 
x
e. implicit: zero-set of F y  = x2 + y 2 + (z/2)2 − 1 (y = 0, x ≥ 0 omitted); explicit:
z
p p p
locally the graph of z = ±2 1 − x2 − y 2 , x = ± 1 − y 2 − (z/2)2 , y = ± 1 − x2 − (z/2)2 .
 
x p 2
f. implicit: zero-set of F y  =
 x2 + y 2 − 3 + z 2 − 4; explicit: locally the graph of
z
q p 2
z=± 4− x2 + y 2 − 3 , etc.
 
x
6.3.8 a. Let F y  = (x2 + y 2 + z 2 )2 − 10(x2 + y 2 ) + 6z 2 + 9. Then
z
h i
DF = 4 x(x2 + y 2 + z 2 − 5) y(x2 + y 2 + z 2 − 5) z(x2 + y 2 + z 2 + 3) .

Suppose DF (x) = 0 and x ∈ X. Then z = 0, so we must have x2 + y 2 = 5 (since 0 ∈ / X). But


25 − 50 + 9 6= 0, so there is no such x ∈ X. Therefore, DF (x) has rank 1 for every x ∈ X and so
X is a 2-manifold.
p p
b. ( x2 + y 2 − 2)2 + z 2 = 1 ⇐⇒ x2 + y 2 + z 2 − 4 x2 + y 2 = −3 ⇐⇒ x2 + y 2 + z 2 + 3 =
p
4 x2 + y 2 ⇐⇒ (x2 + y 2 + z 2 )2 + 6(x2 + y 2 + z 2 ) + 9 = 16(x2 + y 2 ) ⇐⇒ (x2 + y 2 + z 2 )2 −
10(x2 + y 2 ) + 6z 2 + 9 = 0. The alternative equation shows us
 that
 X is the set of points obtained
0
by rotating a circle of radius 1 in the yz-plane centered at  2  about the z-axis. The surface so
0
obtained is a torus.
 
x
6.3.9 Let F y  = (x2 + y 2 + z 2 )2 − 4(x2 + y 2 ). Then
z
h i
DF = 4 x(x2 + y 2 + z 2 − 2) y(x2 + y 2 + z 2 − 2) z(x2 + y 2 + z 2 ) .
176 6. SOLVING NONLINEAR PROBLEMS
 
x
Suppose DF (x) = 0. Then either x2 + y 2 + z 2 = 2 and z = 0, so x2 + y 2 = 2 and F y  = −4, or
z
else x = y = z = 0. So 0 is the only point of X at which it fails to be a smooth surface. (See
 the

a
picture of the “doughnut with no hole,” Figure 3.15 on p. 298.) The tangent space of X at p =  b 
c
  
 x 
is N(DF (p)) =  y  : a(a2 + b2 + c2 − 2)x + b(a2 + b2 + c2 − 2)y + c(a2 + b2 + c2 )z = 0 .
 
z
" # " #
x21 + x22 + x23 + x24 − 4 2x1 2x2 2x3 2x4
6.3.10 Let F(x) = . Then DF = . Suppose
x1 x2 + x3 x4 x2 x1 x4 x3
rank(DF(x)) ≤ 1. Then either x1 = x2 and x3 = x4 or x1 = −x2 and x3 = −x4 . Then
0 = x1 x2 + x3 x4 = ±(x21 + x23 ), so x = 0, which is not a point on F = 0. Thus, rank(DF) = 2
everywhere onF =0 and this level set is a smooth surface in R4 .
1 " # " #
 1 2 2 −2 2 1 1 0 0
Let p =  
 −1 . Then DF(p) = 1 , so the
1 1 −1 0 0 1 −1
1
4
tangent
 space  the surface at p is {x ∈ R : x1 + x2 = 0, x3 − x4 = 0} and a basis for it is
 of

 −1 0 
   0 
 1 ,  .
  0   1 

 

0 1

Suppose M ∩ W is the n−k , where V ⊂ Rk is open. That is, M ∩ W =


6.3.11
   graph of f : V → R  
x n−k n−k x
∈ V ×R : y = f (x) . Then we define F : W → R by F = y − f (x). Then
y y
h i
M ∩ W = F−1 (0) and DF = ∗ I , so rank(DF(p)) = n − k for all p ∈ W .
 
  kxk2 − 1
x  
6.3.12 Consider F : R3 × R3 → R3 given by F =  kyk2 − 1 . Then we have
y
x·y

 
2x1 2x2 2x3 0 0 0
 
DF =  0 0 0 2y1 2y2 2y3  .
y 1 y 2 y 3 x1 x2 x3

Suppose F(x) = 0. If rank(DF(x)) < 3, then y = λx and x = µy for some scalars λ and µ, and
since both x and y are unit vectors, we must have λ, µ = ±1. But then x · y = 0 is impossible.
Therefore, rank(DF(x)) = 3 for all x ∈ F−1 (0), and so F−1 (0) is a 3-dimensional manifold. In
fact, this manifold can be visualized as the collection of all unit tangent vectors of the unit sphere
S 2 ⊂ R3 .
6.3. MANIFOLDS REVISITED 177

6.3.13 a. As suggested in the hint, consider F : Mn×n → {symmetric n × n matrices} = Rn(n+1)/2


given by F(A) = AT A − I. Then O(n) = F−1 (O), and we need only check that DF (A) has rank
n(n + 1)/2 for every A ∈ O(n). It suffices to see that every symmetric matrix C can be written
as DF(A)B for some B ∈ Mn×n . By Exercise 3.1.13, we have DF(A)B = AT B + B T A. Let

B = 21 AC. Then AT B + B T A = 12 AT (AC) + (C T AT )A = 12 (C + C T ) = C, as required.
b. TI O(n) = N(DF(I)) = {B ∈ Mn×n : B + B T = O}.

Let h: V →U be n−k by f = g ◦ h. Then for


6.3.14  of g1 and define f : V → R
 the local inverse 2
x g1 (u)
x ∈ V , we have = = g(u) for u = h(x).
f (x) (g2 ◦ h)(g1 (u))
In general, since rank(Dg) = k, we can reorder the coordinates in Rn to arrange that the
derivative matrix has the form given.
CHAPTER 7
Integration
7.1. Multiple Integrals

7.1.1 Let P1 = {0 = x0 < x1 = 1} be the trivial partition of [0, 1] and let P2 = {0 = y0 <
y1 < y2 < y3 = 1} be a partition of [0, 1] with the properties that y1 ≤ 21 < y2 and y2 − y1 < ε; set
P = P1 × P2 . Then for j = 1 and 3, we have m1j = M1j , whereas m12 = 0 and M12 = 1. Then
U (f, P) − L(f, P) = (M12 − m12 )(y2 − y1 ) = y2 − y1 < ε,

and so, by the Convenient Criterion, Proposition 1.3, we infer that f is integrable. Now, for our
particular partition P, we have L(f, P) = 1 − y2 < 21 ≤ 1 − y1 = U (f,
Z P); thus, 1/2 is the only
number that can lie between all lower and upper sums, and therefore f dA = 1/2.
R

7.1.2 As suggested in the hint, let PN be the partition of R into 1/N × 1/N squares Rij ,
1 ≤ i, j ≤ N . Then whenever |i − j| > 1, we have mij = Mij = 0; but whenever |i − j| ≤ 1, we
have mij = 0 and Mij = 1. (Note that each of the squares with |i − j| = 1 has one corner on the
diagonal.) Therefore, we have U (f, PN )−L(f, PN ) = (3N −2)(1/N 2 ). Since lim (3N −2)/N 2 = 0,
N →∞
it follows that for any ε > 0 we can find N sufficiently large so that U (f, PN ) − L(f, PN ) < ε.
Therefore, f is integrable on R. On the other hand, since L(f, PN ) = 0 for every N , it must be
the case Zthat I = 0 is the unique number satisfying L(f, P) ≤ I ≤ U (f, P) for every partition P.
That is, f dA = 0.
R

7.1.3 Just as in the preceding problem, let PN be the partition of R into 1/N ×1/N squares Rij ,
1 ≤ i, j ≤ N . Then whenever i ≥ j, we have Mij = 1; if i < j, we have Mij = 0. On the other hand,
if i ≤ j + 1, then mij = 0 and if i > j + 1, then mij = 1. Thus, Mij − mij = 0 unless 0 ≤ i − j ≤ 1,
in which case Mij − mij = 1. Summing up, we have U (f, PN ) − L(f, PN ) = (2N − 1)(1/N 2 ).
Since lim (2N − 1)/N 2 = 0, it follows that for any ε > 0 we can find N sufficiently large so that
N →∞
U (f, PN ) − L(f, PN ) < ε. Therefore, f is integrable on R. Now, L(f, PN ) = (N − 1)(N − 2)/2N 2
and U (f, PN ) = N (N + 1)/2N 2 , both of which approach 1/2 as N → ∞. Therefore,
Z I = 1/2 is the
unique number satisfying L(f, P) ≤ I ≤ U (f, P) for every partition P. That is, f dA = 1/2.
R


X X∞
1 1 1 1
7.1.4 First, note that = − = . Given any 0 < ε < 1, choose
n(n + 1) n n+1 2
n=2 n=2
ε 1 ε
N ≤ 2/ε. Let P1 be the partition of [0, 1] with x0 = 0, x1 = , x2 = − ,
2 N −1 2(N − 1)N
178
7.1. MULTIPLE INTEGRALS 179

1 ε 1 ε 1 ε
x3 = + , x4 = − , x5 = + ,. . . ,
N − 1 2(N − 1)N N − 2 2(N − 2)(N − 1) N − 2 2(N − 2)(N − 1)
1 ε 1 ε
x2N −4 = − , x2N −3 = + , x2N −2 = 1. Let P2 be the trivial partition of [0, 1], and let
2 12 2 12
P = P1 × P2 . Then we claim that U (f, P) − L(f, P) < ε, so that f is integrable on R. Note that
mi1 = 0 for all i. When i is even, Mi1 = 0, and when i is odd, Mi1 = 1. Therefore,
X X
U (f, P) − L(f, P) = (Mi1 − mi1 )(xi − xi−1 ) = xi − xi−1
i i odd
ε ε ε ε 1 ε
+2
= + · · · + 2 < + 2 · · = ε,
2 2(N − 1)N 12 2 2 2
Z
as required. Moreover, since L(f, P) = 0 for every partition P, we must have f dA = 0.
R

7.1.5 a. Say x0 ∈ R and f (x0 ) > 0. By Exercise 2.3.5, there is a neighborhood of x0 on which
f ≥ f (x0 )/2, so there is a rectangle R′ containing x0 so that m′ = inf x∈R′ f (x) ≥ f (x0 )/2. For any
partition P′ for which R′ is Za rectangle belonging to P′ , we therefore have L(f, P′ ) ≥ m′ vol(R′ ) > 0.
Since f is integrable, I = f dV is the unique number satisfying L(f, P) ≤ I ≤ U (f, P) for all
R
partitions P; therefore 0 < L(f, P′ ) ≤ I, so I > 0.

1, Z
x=0
b. Let R = [0, 1] × [0, 1], and let f (x) = . Then f dA = 0, despite the
0, otherwise R
fact that f is positive at some point of R.

Z 7.1.6 a. Z This is immediate:


Z Since m ≤ f ≤ M , we infer from Proposition 1.11 that mvol(Ω) =
mdV ≤ f dV ≤ M dV = M vol(Ω).
Ω Ω Ω
b. Since f is continuous and Ω is compact, by Theorem 1.2 of Chapter 5 there are points
y and z ∈ Ω with f (y) = m and f (z) = M . From part a we deduce that
R
f dV
m≤ Ω ≤ M.
vol(Ω)
Now, let g : [0, 1] → Ω be a continuous path from g(0) = y to g(1) = z. Since the continuous
function f ◦ g takes the value m at 0 and the value MZ at 1, .
it follows from the Intermediate Value
Theorem that there is ξ ∈ [0, 1] with (f ◦ g)(ξ) = f dV vol(Ω). Letting g(ξ) = c, we have
Z . Z Ω
f (c) = f dV vol(Ω), and so f dV = f (c)vol(Ω), as desired.
Ω Ω

7.1.7 Let Mε = sup f (x) and mε = inf f (x). Then we have (as in Exercise 6a)
x∈B(a,ε) x∈B(a,ε)
R
B(a,ε) f dV
mε ≤ ≤ Mε .
volB(a, ε)
Since f is continuous at a, lim mε = f (a) = lim Mε , and so, by Exercise 2.3.3, we have
Z ε→0+ ε→0+
1
lim f dV = f (a), as required.
ε→0+ volB(a, ε) B(a,ε)
180 7. INTEGRATION

7.1.8 Let R′′ = R ∩ R′ . Note that Ω ⊂ R′′ . Let f˜′′ denote the extension of f to R′′ . Then since
f˜ is 0 outside R′′ except
Z perhapsZ on a set ofZ volume 0 (namely, the Z intersection
Z of the frontier
Z of R′′
with Ω), we have f˜dV = f˜dV = f˜′′ dV . Similarly, f˜′ dV = f˜′ dV = f˜′′ dV .
Z R Z R ′′
Z R′′ R′ R′′ R′′

Therefore, f˜dV = f˜′ dV , and f dV is well-defined.


R′′ R′ Ω

7.1.9 a. The crucial inequality given in the hint follows from, for example,

Mif +g = sup (f + g)(x) ≤ sup f (x) + sup g(x) = Mif + Mig .


x∈Ri x∈Ri x∈Ri

Since f and g are integrable on R, given ε > 0 there is a partition P′ so that U (f, P′ ) − L(f, P′ ) <
ε/2 and another partition P′′ so that U (g, P′′ ) − L(g, P′′ ) < ε/2. Letting P be the common
refinement of P′ and P′′ , we have U (f, P) − L(f, P) < ε/2 and U (g, P) − L(g, P) < ε/2. Therefore,
 
U (f, P) + U (g, P) − L(f, P) + L(g, P) < ε. From the inequality given in the hint, we have
 
U (f + g, P) − L(f + g, P) ≤ U (f, P) + U (g, P) − L(f, P) + L(g, P) < ε,

and so, by the Convenient Criterion, f + g is integrable on R. Then we have


Z
L(f, P) + L(g, P) ≤ L(f + g, P) ≤ (f + g)dV ≤ U (f + g, P) ≤ U (f, P) + U (g, P).
R
Z Z
But since f and g are integrable, gdV is the unique number between L(f, P) + L(g, P)
f dV +
R Z R
and U (f, P) + U (g, P) for all partitions P. Since (f + g)dV is also between these two numbers,
R
by uniqueness, we must have equality.
b. We may assume α > 0 (there is nothing to prove if α = 0). Given ε > 0, there is a
partition P so that U (f, P) − L(f, P) < ε/α. Then U (αf, P) − L(αf, P) < ε, so αf is integrable.
On the other hand, Miαf = αMif and mαf f
i = αmi , so
Z
L(αf, P) = αL(f, P) ≤ α f dV ≤ αU (f, P) = U (αf, P),
R
Z Z
and α f dV and (αf )dV are both the unique number between L(αf, P) and U (αf, P). There-
R R
fore, they are equal.
c. Given ε > 0, let P′ be a partition of R′ so that U (f, P′ ) − L(f, P′ ) < ε/2 and let P′′
be a partition of R′′ so that U (f, P′′ ) − L(f, P′′ ) < ε/2. Letting P be the partition of R obtained
by taking the “refined union” of the partitions P′ and P′′ ,1 we have
 
U (f, P) − L(f, P) ≤ U (f, P′ ) + U (f, P′′ ) − L(f, P′ ) + L(f, P′′ ) < ε.

It follows that f is integrable on R. Indeed, since


Z
′ ′′
L(f, P ) + L(f, P ) ≤ L(f, P) ≤ f dV ≤ U (f, P) ≤ U (f, P′ ) + U (f, P′′ ),
R

1
We take the union of the partitions in each coordinate so as to obtain an actual partition of the large rectangle.
′ ′′
So we are really refining P and P and then taking a union.
7.1. MULTIPLE INTEGRALS 181
Z Z Z
we see that f dV + f dV and f dV both lie between L(f, P′ ) + L(f, P′′ ) and U (f, P′ ) +
R′ R′′ R
U (f, P′′ ), and so, by uniqueness, they must be equal.
Conversely, suppose f is integrable on R. Given ε > 0, let P be a partition of R so that
U (f, P) − L(f, P) < ε. Let P ˜ be the refinement we obtain by appending the missing face of R′ ,
and let P ˜ ′ be the corresponding partition of R′ . Then U (f, P
˜ ′ ) − L(f, P
˜ ′ ) ≤ U (f, P
˜ ) − L(f, P
˜) ≤
U (f, P) − L(f, P) < ε, so f is integrable on R′ (and, similarly, on R′′ ). Now

Z Z
˜ ) = L(f, P
L(f, P) ≤ L(f, P ˜ ′ ) + L(f, P
˜ ′′ ) ≤ f dV + f dV
R′ R′′
˜ ′ ) + U (f, P
≤ U (f, P ˜ ′′ ) = U (f, P
˜ ) ≤ U (f, P),

Z Z Z
so f dV + f dV and f dV both lie between L(f, P) and U (f, P). By uniqueness, they
R′ R′′ R
must be equal.

7.1.10 Following the hint, we start with a partition P′ so that U (f, P′ )−L(f, P′ ) < ε/2. Say the
total (n − 1)-dimensional volume of the partitioning hyperplanes is A. (If the partition is given as
n
X Y
on p. 268 of the text, then we can give an explicit formula for A, viz., A = (ki − 1) (bj − aj ).)
i=1 j6=i
Suppose now we consider a partition P of R by rectangles of diameter < δ. Then the total volume
of all of those that intersect the partitioning hyperplanes is at most 2Aδ. (To cover an (n − 1)-
dimensional rectangle of (n − 1)-dimensional volume A with n-dimensional rectangles of diameter
(and therefore height) < δ requires less than volume Aδ. With thanks to Jacob Rooney for pointing
this out, we need a factor of 2 in case the partitioning hyperplanes belong to rectangles on either
side, i.e., when the partioning hyperplanes are faces of the rectangles. ) Now the contribution
of those rectangles to U (f, P) − L(f, P) is at most 2M · 2Aδ. The contribution of the remaining
rectangles is at most U (f, P′ ) − L(f, P′ ) < ε/2, inasmuch as every other rectangle is contained in
one of the rectangles of P′ . Thus, if we choose δ < ε/8M A, then we will have 2M · 2Aδ < ε/2 and
so U (f, P) − L(f, P) < ε, as required.

7.1.11 a. Suppose R = [a1 , b1 ] × [a2 , b2 ] × · · · × [an , bn ] is a rectangle in Rn with vol(R) =


(b1 − a1 )(b2 − a2 ) · · · (bn − an ) < δ. Since Q is dense in R (that is, there are rational numbers as
close to any given real number as we wish), we can choose b′1 > b1 , b′2 > b2 , . . . , b′n > bn so that
b′i − ai ∈ Q for all i = 1, . . . , n and so that (b′1 − a1 )(b′2 − a2 ) · · · (b′n − an ) < δ. (This is immediate,
for example, from the continuity of the function f (x) = (x1 − a1 )(x2 − a2 ) · · · (xn − an ) at x = b.)
Now suppose X ⊂ Rn is a set of volume 0. Then, given ε > 0, we can cover X by finitely many
P
rectangles R1 , . . . , Rs with vol(Rj ) < ε. Now it follows from the argument we’ve just given that
we can choose rectangles R1 , . . . , Rs′ with Rj ⊂ Rj′ , whose sidelengths are all rational, and with

P
vol(Rj′ ) < ε. Each such rectangle can be covered by finitely many cubes (with sidelength, say,
the reciprocal of the least common multiple of all the denominators), and so we therefore can cover
P
X with finitely many cubes C1 , . . . , Cr with vol(Ci ) < ε.
182 7. INTEGRATION

b. Given a linear map T : Rn → Rn , recall that it maps any ball of radius R into a ball
of radius kT kR. Since a cube of diameter δ is contained in a ball of radius δ/2, it follows that T
maps a cube of diameter δ into a ball of radius kT kδ/2, which is, in turn, contained in a cube of
√ √ n
diameter kT k nδ. Letting k = (kT k n) , given any cube C, it follows that T (C) is contained in
a cube whose volume is at most k times that of C.
P
Given X with volume 0, we find cubes C1 , . . . , Cr so that X ⊂ C1 ∪· · ·∪Cr and vol(Ci ) < ε/k.
P P
Then T (X) ⊂ T (C1 ) ∪ · · · ∪ T (Cr ) ⊂ C1′ ∪ · · · ∪ Cr′ , and vol(Ci′ ) ≤ k vol(Ci ) < ε. Therefore,
T (X) also has volume 0.
When m < n, of course this needn’t be true. Take, for example, the projection of a region in an
m-dimensional subspace. But what goes wrong with the proof? Imagine covering a line segment in
R2 parallel to the x-axis by s squares of area ε/s each. Then its projection to the x-axis is covered
p p √
by s line segments of length ε/s each. But as s → ∞, note that s ε/s = sε → ∞.

7.1.12 By Exercise 5.1.13, there is δ > 0 so that for every x ∈ X, we have B(x, δ) ⊂ U . Suppose

the sidelength of the cube C is c. Now choose N > c m/δ. Then when we divide C into N m

subcubes, each of sidelength c/N < δ/ m, it follows that the subcube containing x ∈ X must lie
inside U . Let Y be the union of those subcubes covering X. Then Y is also compact and so, by
the Maximum Value Theorem, the continuous function kDφk has a maximum value, say M , on Y .
It follows from Proposition 1.3 of Chapter 6 that for any cube C ′ of sidelength r, the image φ(C ′ )

is contained in a cube of sidelength M r n. Since X is covered by the subcubes constituting Y ,
√ n
φ(X) is covered by at most N m cubes of volume at most (c/N )M n . That is, φ(X) is covered
 c √ n √
by cubes whose total volume is at most N m M n = (cM n)n N m−n . Since m < n, we see
N
that by making N bigger we can arrange for this to be less than any given positive ε. Therefore
φ(X) has volume 0.

7.1.13 The function f given here is discontinuous at every rational point. Nevertheless, f is
integrable. Given ε > 0, choose N > 2/ε. Then list all the rational numbers with denominator
≤ N : 0, 1, 21 , 31 , 32 , . . . , N1 ,. . . , NN−1 . Create a partition P in which each of these points is an
interior point of a subinterval of the partition, and in which all the lengths of those subintervals
add up to ε/2. For any one of these subintervals, we have Mi ≤ 1 and mi = 0. For any of the
remaining subintervals, we have Mi ≤ 1/(N + 1) < ε/2. Therefore, we have
X X
U (f, P) − L(f, P) = (Mi − mi )(xi − xi−1 ) + (M − m )(x − xi−1 )
| {z } | i {z i} i
special subintervals ≤1 remaining subintervals
<ε/2
ε ε
< 1 · + · 1 = ε.
2 2
Therefore, f is integrable on [0, 1]. Moreover, since L(f, P) = 0 for every partition P, we must have
Z 1
f (x)dx = 0.
0

7.1.14 a. Obviously, if there are finitely many rectangles R1 , . . . , Rs so that X ⊂ R1 ∪ · · · ∪ Rs


P
and vol(Ri ) < ε, then X has measure 0 as well. (We can take Rs+1 = · · · = ∅.)
7.1. MULTIPLE INTEGRALS 183

b. Any countable set X, e.g., Q ⊂ R, has measure 0. For each xi ∈ X, i ∈ N, choose a


P∞ P∞
rectangle Ri of volume < ε/2i containing xi . Then vol(Ri ) < ε/2i = ε.
i=1 i=1
S
c. By Exercise 5.1.12, since X is compact and X ⊂ Ri , it follows that X ⊂ R1 ∪· · ·∪Rs
P
s P

for some s. Since vol(Ri ) ≤ vol(Ri ) < ε, it follows that X has volume 0. (There is one
i=1 i=1
subtlety here: Exercise 5.1.12 refers to open sets, so we must first “thicken” the closed rectangles
to make them open. The easiest way to do this is to start with a sequence of (closed) rectangles

covering X whose total volume is less than ε/2. Then expanding each of its sides by a factor of n 2
and making an open rectangle results in a covering by open rectangles of total volume less than ε.)
d. The crucial fact here is that a countable union of countable sets is again countable.
P S

If we cover Xi by rectangles Rij , j = 1, 2, . . ., with vol(Rij ) < ε/2i , then Xi is covered by
j
P P P i=1 P
the collection {Rij } of rectangles, whose total volume is vol(Rij ) = vol(Rij ) < ε/2i = ε.
i j i
(Verifying that {Rij } can be arranged in a sequence is exactly like the proof that Q is countable.

R11 R12 R13 R14 R15 R16


R21 R22 R23 R24 R25 R26
R31 R32 R33 R34 R35 R36
R41 R42 R43 R44 R45 R46
R51 R52 R53 R54 R55 R56
R61 R62 R63 R64 R65 R66

Just count along the “anti-diagonals,” as pictured above. In formulas, we assign the counting
number i + (i + j − 1)(i + j − 2)/2 to the set Rij .)

7.1.15 a. Since M (f, a, δ) decreases and m(f, a, δ) increases as δ → 0+ , it follows that their
difference, M (f, a, δ) − m(f, a, δ), decreases. On the other hand, it is bounded below by 0, hence
converges as δ → 0+ . Whenever kx − ak < δ, we have |f (x) − f (a)| ≤ M (f, a, δ) − m(f, a, δ).
Therefore, if o(f, a) = 0, then f is continuous at a. Conversely, if f is continuous at a, then given
any ε > 0, there is δ > 0 so that whenever kx − ak < δ, we have |f (x) − f (a)| < ε/2, so for any
x, y ∈ B(a, δ), we have |f (x) − f (y)| < ε and therefore M (f, a, δ) − m(f, a, δ) ≤ ε. Since ε is
arbitrary, we must have o(f, a) = 0.
b. By part a, if f is discontinuous at a, then o(f, a) > 0, so there is some k ∈ N so that
o(f, a) ≥ 1/k. Therefore, x ∈ D =⇒ x ∈ D1/k for some k ∈ N, so D ⊂ D1 ∪ D1/2 ∪ D1/3 ∪ · · ·.
On the other hand, if x ∈ D1/k for some k ∈ N, then f is discontinuous at x, so x ∈ D. Therefore
D ⊃ D1 ∪ D1/2 ∪ D1/3 ∪ · · ·, and we’ve established equality.
To prove Dε is a closed set, suppose xk → a and xk ∈ Dε . This means that there are points
yk , zk ∈ B(xk , 1/k) with f (yk ) − f (zk ) ≥ ε. Since yk → a and zk → a, this shows that o(f, a) ≥ ε
as well.
184 7. INTEGRATION

c. Choose ε > 0. Then there is a partition P so that U (f, P) − L(f, P) < ε/k. For each
x ∈ D1/k , we have x ∈ Ri for some rectangle in P, and so (1/k)vol(Ri ) ≤ (Mi − mi )vol(Ri ). There-
1 X X ε
fore, we have vol(Ri ) ≤ (Mi − mi )vol(Ri ) ≤ U (f, P) − L(f, P) < . There-
k k
Ri ∩D1/k 6=∅ Ri ∩D1/k 6=∅
X
fore, vol(Ri ) < ε and therefore D1/k is a set of volume 0. It follows from Exercise 14 that
Ri ∩D1/k 6=∅
D1/k is a set of measure 0 for each k ∈ N and therefore D has measure 0.
d. The proof is quite like that of Proposition 1.8. Suppose |f | ≤ M . Suppose D has
measure 0 and we are given ε > 0; set ε′ = ε/2vol(R). Then Dε′ ⊂ D has measure 0; since Dε′ is a
closed subset of a rectangle, it is compact, and therefore by Exercise 14c it has volume 0. We can
cover Dε′ by finitely many rectangles Rj′ , j = 1, . . . , s, whose volumes sum to less than ε/4M ; we
also make sure that no point of Dε′ is a frontier point of the union of these rectangles.
S
s
Consider the closure Y of R − Rj′ . For every x ∈ Y , we have o(f, x) < ε′ and so there is an
j=1
open rectangle Sx on which supy∈Sx f (y) − inf y∈Sx f (y) < ε′ . By Exercise 5.1.12, we can cover Y
by finitely many such rectangles (hence by their closures). We finally create a partition P = {Ri }
of R so that every one of the S x ’s we used and every Rj′ we used is a union of subrectangles of P.
Then we have
X X
U (f, P) − L(f, P) = (Mj − mj ) vol(Rj ) + (Mj − mj ) vol(Rj )
| {z } s
| {z }
Rj ⊂Y
≤ε′ R′j ≤2M
S
Rj ⊂
j=1

s
X ε ε ε ε
< ε′ vol(R) + 2M vol(Rj′ ) < vol(R) + 2M = + = ε.
2vol(R) 4M 2 2
j=1

Therefore, it follows from the Convenient Criterion, Proposition 1.3, that f is integrable on R.

7.2. Iterated Integrals and Fubini’s Theorem


Z Z 1 Z π/2 Z 1 iπ/2 Z 1 i1
x x
7.2.1 a. f dA = e cos ydydx = e sin y dx = ex dx = ex = e − 1.
R 0 0 0 0 0 0
Z Z 3Z 4 Z Z 3
y1 y 2 i4 3
6 i3
b. f dA = dydx = dx = dx = 6 log x = 6 log 3.
R 1 2 x
1 x 2 2 1 x 1
Z Z 3Z 1 Z 3 i Z 3
x 1 2
1 1 
c. f dA = 2
dxdy = log(x + y) dy = log(y + 1) − log y dy
R 1 0 x +y 1 2 0 2 1
 i3 1 
= 21 (y + 1)(log(y + 1) − 1) − y(log y − 1) = 2 4(log 4 − 1) − 3(log 3 − 1) − 2(log 2 − 1) + (−1)
 1
= 3 log 2 − 32 log 3 = log 3√8
3
.
Z Z 1 Z 2Z 3 Z 1Z 2 Z 1
5 5

d. f dV = (x + y)zdzdydx = 2 (x + y)dydx = 2 x + 23 dx = 15 2 .
R −1 1 2 −1 1 −1
7.2. ITERATED INTEGRALS AND FUBINI’S THEOREM 185
      Z
x x
7.2.2a. Ω = : 0 ≤ x ≤ 1, x ≤ y ≤ 1 = : 0 ≤ y ≤ 1, 0 ≤ x ≤ y , and f dA =
y y Ω
Z  
1Z y
x
f dxdy.
0 0 y
      Z
x x y
b. Ω = : 0 ≤ x ≤ 1, 0 ≤ y ≤ 2x = : 0 ≤ y ≤ 2, ≤ x ≤ 1 , and f dA =
y y 2 Ω
Z 2Z 1  
x
f dxdy.
0 y/2 y
     
x x √
c. Ω = : 1 ≤ y ≤ 2, y2 ≤x≤4 = : 1 ≤ x ≤ 4, 1 ≤ y ≤ x , and
y y
Z Z 4Z x  

x
f dA = f dydx.
Ω 1 1 y
    
x p x √
d. Ω = : −1 ≤ y ≤ 1, 0 ≤ x ≤ 1 − y 2 = : 0 ≤ x ≤ 1, − 1 − x2 ≤ y
y y
o Z Z 1 Z √1−x2  
√ x
≤ 1 − x2 , and f dA = √ f dydx.
Ω 0 − 1−x2 y
     
x 2 x √
e. Ω = : 0 ≤ x ≤ 1, x ≤ y ≤ x = : 0 ≤ y ≤ 1, y ≤ x ≤ y , and
y y
Z Z 1Z y√  
x
f dA = f dxdy.
Ω 0 y y
     
x x √ √
f. Ω = : −1 ≤ x ≤ 2, x2 ≤ y ≤ x + 2 = : 0 ≤ y ≤ 1, − y ≤ x ≤ y ∪
y y
   Z Z 1 Z √y   Z 4 Z √y  
x √ x x
: 1 ≤ y ≤ 4, y − 2 ≤ x ≤ y , and f dA = √
f dxdy + f dxdy.
y Ω 0 − y y 1 y−2 y
Z 1Z x Z Z
1
1 ix 1
3 2 1 i1 1
7.2.3 a. (x + y)dydx = xy + y 2 dx = x dx = x3 = . Changing the
0 0 0 2 0 0 2 2 0 2
order of integration, we rewrite this as
Z 1Z 1 Z 1 Z 1
1 2  i1 1 3  1 1 1  i1 1
(x + y)dxdy = x + xy dy = + y − y 2 dy = y + y 2 − y 3 = .
0 y 0 2 y 0 2 2 2 2 2 0 2
Z 1 Z √1−y2 Z 1 p i1 2
2
2 dy = − (1 − y 2 )3/2
b. √ 2 ydxdy = 2y 1 − y = . Changing the order
0 − 1−y 0 3 0 3
of integration, we rewrite this as
Z 1 Z √1−x2 Z 1 √ Z Z 1
1 2 i 1−x2 1 1 2 2
ydydx = y dx = (1 − x )dx = (1 − x2 )dx = .
−1 0 −1 2 0 2 −1 0 3
Z 1Z x Z 1 ix Z 1
x 
c. 2
dydx = x arctan y dx = x arctan x − x arctan(x2 ) dx. Inte-
x2 1 + y x 2
0 0 0
grating by parts a few times, we find that
Z
1
x arctan xdx = (x2 arctan x − x + arctan x) and
2
Z
1 1
x arctan(x2 )dx = x2 arctan x2 − log(x4 + 1),
2 4
186 7. INTEGRATION

a. b. c.

d. e. f.

1π  1
so our integral is equal to − 1 + log 2. On the other hand, changing the order of integration
2 4 4
gives us
Z 1 Z √y Z i√ y Z  
x 1 1 1 2 1 1 y y2
dxdy = x dy = − dy
0 y 1 + y2 2 0 1 + y2 y 2 0 1 + y2 1 + y2
1 1 i1 1  1 π
= log(1 + y 2 ) − y + arctan y = log 2 − 1 + .
2 2 0 2 2 4

a. b. c.

Z 1  
x
7.2.4 No. f dy exists only for x = 1/2, since the integrand is otherwise everywhere
0 y
discontinuous. (Alternatively, when 0 ≤ x < 1/2, every upper sum is 1 and every lower sum is 2x.)
  
x m n
7.2.5 We claim that :x= and y = for some m, n, q ∈ N with q prime is dense
y q q
2
in R . In particular, any rectangle whatsoever contains a point of that form. To establish this,
note that if q is prime and 1/q < δ (remember that, by Euclid, there are infinitely many primes),
7.2. ITERATED INTEGRALS AND FUBINI’S THEOREM 187

then any interval in R of length δ must contain a point of the form m/q (or else we’d have an
interval of length δ separating two consecutive multiples of 1/q).

7.2.6 This is a two-dimensional variant of the function we studied in Exercise 7.1.13. Taking
the trivial partition of the interval [0, 1] on the y-axis and using the same partition on the x-axis,
we see that f is integrable on R. Z   1
x
When y ∈ Q, we know from that exercise that f dx exists and is equal to 0. Therefore,
0 y
Z 1   Z 1Z 1  
x x
f dx = 0 for all y and f dxdy = 0. On the other hand, for any x ∈ Q, x = p/q
0 y 0 0 y
Z 1  
x
in lowest terms, the integral f dy does not exist, since every lower sum is 0 and every upper
0 y
Z 1Z 1  
x
sum is 1/q. Therefore, the iterated integral f dydx does not exist.
0 0 y

  1, x = 0, y ∈ Q or y = 0, x ∈ Q
x
7.2.7 Yes. Here is a cheap solution: Let f = . Then
y 0, otherwise
Z 1   Z 1  
x 0
neither f dx nor f dy exists, so neither iterated integral exists.
0 0 0 y

7.2.8 In all these cases, the key is to change the order of integration. The student should in
every such problem begin by sketching the region.
Z 2 Z x2 Z 2 i2 1
1 x2 1 3 2
a. 3
dydx = 3
dx = log(1 + x ) = log 9 = log 3.
0 0 1 + x 0 1 + x 3 0 3 3
Z 1 Z y3 Z 1
4 4 1 4 i1 1
b. ey dxdy = y 3 ey dy = ey = (e − 1).
0 0 0 4 0 4
Z 1 Z x2 Z 1 ix 2 Z 1
1 i1 1
c. ey/x dydx = xey/x dx = x(ex − 1)dx = xex − ex − x2 = .
0 0 0 0 0 2 0 2
Note that, despite the seeming discontinuity at the origin, the function is really bounded on the
region Ω, since 0 ≤ y/x ≤ 1 on Ω. Moreover, as x → 0 in Ω, 0 ≤ y/x ≤ x → 0, so the function in
fact approaches 1 as x → 0.
    
x p x
7.2.9 Let f 2
= 16 − y and Ω = : 0 ≤ y ≤ 4, 0 ≤ x ≤ y/2 . Then the volume in
y y
question is

Z   Z 4 Z y/2 p Z i4 32
x 1 4 p 1
f dA = 2
16 − y dxdy = y 16 − y 2 dy = − (16 − y 2 )3/2 = .
Ω y 0 0 2 0 6 0 3

7.2.10 The volume of the region Ω is given as a triple integral by

Z Z 2 Z 4−x2 Z y Z 2 Z 4−x2 Z 2 Z 2
2 2 256
1dV = dzdydx = ydydx = (4−x ) dx = (16−8x2 +x4 )dx = .
Ω −2 0 0 −2 0 0 0 15
188 7. INTEGRATION

7.2.11 The volume of the region Ω is most easily computed as an iterated integral with the
x-integral outermost. By symmetry, we have
Z Z 1 Z √1−x2 Z √1−x2 Z 1
16
1dV = 8 dzdydx = 8 (1 − x2 )dx = .
Ω 0 0 0 0 3
 
Z 1 Z 1−x Z 1−x−z x
7.2.12 a. f y  dydzdx
0 0 0 z
 
Z 1 Z 1−x2 Z 1−x2 x
b. f y  dydzdx
0 0 z z
√  
Z 1 Z 1 Z z 2 −x2 x
c. √ f y  dydzdx
−1 |x| − z 2 −x2 z
   
Z 1 Z x Z 1−x2 x Z 1 Z 1+x−x2 Z 1−x2 x
d. f y  dydzdx + f y  dydzdx
0 0 0 z 0 x z−x z
   
Z 1 Z x Z 1−x x Z 1 Z 1 Z 1−x x
e. f y  dydzdx + f y  dydzdx
0 0 0 z 0 x z−x z

a. b. c.

d. e.

7.2.13 Let Ω be the region in the first octant bounded by the plane x/a + y/b + z/c = 1. Then,
doing careful bookkeeping, its volume is given by
Z Z a Z b(1−x/a) Z c(1−x/a−y/b)
1dV = dzdydx
Ω 0 0 0
7.2. ITERATED INTEGRALS AND FUBINI’S THEOREM 189

Z a Z b(1−x/a) Z a
x y x 1 y 2 ib(1−x/a)
=c 1 − − dydx = c 1− y−
0 a b a 2 b 0
Z a0 i
0
bc 
x 2 abc 
x 3 a abc
= 1− dx = ·− 1− = .
2 0 a 6 a 0 6

7.2.14 The volume of this region is


Z 1 Z 1−x2 Z 1−z Z 1 Z 1−x2 Z 1
8
dydzdx = (1 − z)dzdx = 2(1 − x2 )dx = .
−1 x2 −1 0 −1 x2 −1 −1 3

7.2.15 The integral is given by the following iterated integral:

Z Z 1Z 1 Z 2−y−z
xdV = xdxdydz
Ω 0 1−z 0
Z 1Z 1 Z 1 i1
1 1
= (2 − y − z)2 dydz = −(2 − y − z)3 dz
2 0 1−z 6 0 1−z
Z
1 1  1 1 i1 1 3 1
= 1 − (1 − z)3 dz = z + (1 − z)4 = · = .
6 0 6 4 0 6 4 8

7.2.16 We have
Z 1Z 1 Z 1Z 1 Z 1 i1
x−y 1 2y 1 y
3
dxdy = 2
− dxdy = − + dy
0 0 (x + y) 0 0 (x + y) (x + y)3 0 x + y (x + y)2 0
Z 1  Z 1
1 y 1 1 i1 1
= − + 2
dy = − 2
dy = =− ;
0 y + 1 (y + 1) 0 (y + 1) y+1 0 2

whereas
Z 1Z 1 Z 1Z 1 Z 1 i1
x−y 1 2x 1 x
dydx = − + dydx = − dx
0 0 (x + y)3 0 0 (x + y)2 (x + y)3 0 x + y (x + y)2 0
Z 1  Z 1
1 x 1 1 i1 1
= − dx = dx = − = .
0 x + 1 (x + 1)2 0 (x + 1)
2 x+1 0 2

There is no contradiction: Fubini’s Theorem does not apply, as f is unbounded on [0, 1] × [0, 1] and
hence not integrable.
190 7. INTEGRATION
 
x 1 1
7.2.17Note that f is unbounded on R, hence not integrable. Let Rkℓ = : ≤x≤ ,
y k+1 k

1 1
≤y≤ . Then we have
ℓ+1 ℓ


 1/2ℓ−k , k < ℓ
Z 

f dA = −1, k=ℓ.
Rkℓ 


0, k>ℓ

Then we expect that


Z 1Z 1   XXZ X X  X
x
f dydx = f dA = −1 + 2−(ℓ−k) = 0 = 0,
0 0 y Rkℓ
k ℓ k k<ℓ k

But for the other iterated integral, we have


Z 1Z 1   XXZ X X  X
x 
f dxdy = f dA = −1 + 2−(ℓ−k) = −1 + −2−ℓ = −2.
0 0 y Rkℓ
ℓ k ℓ k<ℓ ℓ

Interesting! (This is a manifestation of the fact from infinite series that only in the case of absolute
convergence can we rearrange with impunity.)
Z 1  
1 1 x
To be a bit more pedantic, note that when < x ≤ , we have f dy = −k(k + 1) +
k+1 k 0 y
X k(k + 1) Z 1Z 1  
x 1 1
= 0. Therefore, f dydx really is equal to 0. And, when <y≤ ,
2ℓ−k 0 0 y ℓ + 1 ℓ
ℓ>k
Z 1   X ℓ(ℓ + 1)
x 1
then we have f dx = −ℓ(ℓ + 1) + ℓ−k
= −ℓ(ℓ + 1) ℓ−1 . Therefore, integrating with
0 y 2 2
k<ℓ
Z 1Z 1   X
x 1  1
respect to y, we obtain f dxdy = −ℓ(ℓ + 1) ℓ−1 = −2, as we surmised.
0 0 y 2 ℓ(ℓ + 1)

7.2.18 a. Write R = R+ ∪R− , where R+ = {x ∈ R : x1 ≥ 0} and R− = {x ∈ R : x1 ≤ 0}. Then


the fact that R = [a1 , b1 ]×· · ·×[an , bn ] is symmetric about x1 = 0 tells us that a1 = −b1 . Moreover,
 
u
Z 0 Z b Z b  x2 
 
g(x1 )dx1 = g(−u)du = − g(u)du if g(−u) = −g(u). Applying this with g(u) = f  . ,
−b 0 0  .. 
xn
we have
 
x1
Z Z bn Z b2 Z 0 Z bn Z b2 Z 0  x 
 2
f dV = ··· f (x)dx1 dx2 · · · dxn = ··· f  ..  dx1 dx2 · · · dxn
R− an a2 a1 an a2 −b1  . 
xn
 
x1
Z bn Z b2 Z b1  x  Z
 2
=− ··· f  .  dx1 dx2 · · · dxn = − f dV .
an a2 0  ..  R+
xn
7.2. ITERATED INTEGRALS AND FUBINI’S THEOREM 191
Z Z Z
Therefore, f dV = f dV + f dV = 0.
R R− R+
b. Suppose R = [−b1 , b1 ] × · · · × [−bn , bn ] and f (−x) = −f (x) for all x ∈ R. Then

   
Z Z bn
x1 Z b1 Z bn Z b1 −x1
 ..   .. 
f (x)dV = ··· f  .  dx1 · · · dxn = ··· f  .  dx1 · · · dxn = . . .
R −bn −b1 −bn −b1
xn xn
 
Zbn b1
−x1Z Z Z
 .. 
= ··· f  .  dx1 · · · dxn = f (−x)dV = − f (x)dV .
−bn −b1 R R
−xn

Z Z
Therefore, f dV = − f dV = 0.
R R
c. The same result applies to an arbitrary region Ω having the same symmetry. We
enclose Ω in a symmetric rectangle, consider the extended function f˜, and observe that it has the
same symmetry. Then the result follows from the results of a and b.

∂2f ∂2f
7.2.19 Suppose that for some x0 we have (x0 ) > (x0 ). Since f is C2 , there is a ball
∂x∂y ∂y∂x
∂2f ∂2f
centered at x0 (hence a rectangle R = [a, b]×[c, d] centered at x0 ) on which > . It follows
Z  2  Z ∂x∂y ∂y∂x Z
∂ f ∂2f ∂2f ∂2f
from Exercise 7.1.5 that − dA > 0, and hence that dA > dA.
R ∂x∂y ∂y∂x R ∂x∂y R ∂y∂x
Since f is C2 , the integrands are continuous and we can evaluate these by iterated integrals, using
the Fundamental Theorem of Calculus. We have

Z Z bZ Z bZ d
d  
∂2f ∂2f ∂ ∂f
dA = dydx = dydx
R ∂y∂x a c ∂y∂x a c ∂y ∂x
Z b           
∂f x ∂f x b a b a
= − dx = f −f −f +f , and
a ∂x d ∂x c d d c c
Z Z dZ b 2 Z dZ b  
∂2f ∂ f ∂ ∂f
dA = dxdy = dxdy
R ∂x∂y c a ∂x∂y c a ∂x ∂y
Z d           
∂f b ∂f a b b a a
= − dy = f −f −f +f .
c ∂y y ∂y y d c d c

Z Z
∂2f ∂2f
Comparing our answers, we have arrived at a contradiction, for we have dA = dA.
R ∂x∂y R ∂y∂x
It must therefore follow that the mixed partials are everywhere equal.

7.2.20 a. Since f is continuous and the rectangle [a, b] × [c, d] is compact, by Theorem 1.4 of
Chapter
  5, we
 know f is uniformly continuous.
   ′ Given
 ε > 0, there is δ > 0 so that whenever
x x′ x x ε

y − y ′ < δ, then we have f y − f y ′ < d − c . Now we claim that if |x − x | < δ,
192 7. INTEGRATION
   
x x′
then |F (x) − F (x′ )|
< ε. For if |x − < δ, then we have
y − < δ and so
x′ |
y
Z d     ′  Z d    ′  Z d
x x x x ε

|F (x) − F (x )| = −f
dy ≤
f f y − f y dy < dy = ε,
c y y c c d−c

as required.
Z d  
∂f t
b. Set φ(t) = dy. Reasoning precisely as in part a, we conclude that φ is
y
c Z∂x x
continuous. Setting Φ(x) = φ(t)dt, we conclude from the Fundamental Theorem of Calculus
a
that Φ is differentiable and Φ′ (x) = φ(x) for all x ∈ [a, b]. On the other hand, by Fubini’s Theorem,
we have
Z xZ d   Z dZ x   Z d    
∂f t ∂f t x a
Φ(x) = dydt = dtdy = f −f dy = F (x) − F (a).
a c ∂x y c a ∂x y c y y

It follows, then, that F = Φ + F (a) is differentiable and F ′ (x) = Φ′ (x) = φ(x), which is what we
wanted to establish.
Z 1 x
y −1
7.2.21 (Note first of all that the improper integral dy converges at 1, as, by L’Hôpital’s
a log y
yx − 1
rule, lim = lim xy x = x.) By Exercise 20, we have
y→1 log y y→1
Z 1 x Z 1 i1
′ y log y 1 1
F (x) = dy = y x dy = y x+1 = for x > −1.
0 log y 0 x+1 0 x+1
Then F (x) = log(x + 1) + C for some constant C. But note that F (0) = 0, so C = 0. Therefore
Z 1
y−1
F (1) = dy = log 2.
0 log y

7.2.22 a. Using the Fundamental Theorem of Calculus and Exercise 20 as necessary, we have
Z x
2 2
f ′ (x) = 2e−x e−t dt
0
Z 1 −x 2 2 Z 1 Z 1
′ e (t +1)  −x2 (t2 +1) −x2 2
g (x) = · −2x(t + 1) dt = −2
2xe dt = −2xe e−(xt) dt
0 t2 +1 0 0
Z x
2 2
Now, making the substitution u = xt in the latter integral, we have g ′ (x) = −2e−x e−u du,
0
and so f ′ (x) + g ′ (x) = 0.
Z 1
dt π
b. We have f (0) = 0 and g(0) = = , so f (x) + g(x) = π/4 for all x ∈ R.
0 +1 4 t2
Now, we claim that lim g(x) = 0. To see this, note that
x→∞
Z 1 2 2 Z 1
−x2 e−x t 2 dt 2π
g(x) = e 2
dt ≤ e−x = e−x → 0 as x → ∞.
0 t +1 0 t2 +1 4
Therefore, it follows that lim f (x) = π/4, and so
x→∞
Z ∞ Z x p √
−t2 −t2 π
e dt = lim e dt = lim f (x) = .
0 x→∞ 0 x→∞ 2
7.2. ITERATED INTEGRALS AND FUBINI’S THEOREM 193
  Z z  
x x
7.2.23 As the hint suggests, consider F = f dy. Then by Exercise 20 and the
z c y
Fundamental Theorem of Calculus, respectively, we have
Z z    
∂F ∂f x ∂F x
= dy and =f .
∂x c ∂x y ∂z z
 
2 x
Then, setting φ : [a, b] → R , φ(x) = , note that h = F ◦ φ. Therefore, we have
g(x)
    Z g(x)    
′ ′ ∂F x ∂F x ∂f x x
h (x) = DF (φ(x))φ (x) = + g ′ (x) = dy + f g ′ (x).
∂x g(x) ∂z g(x) c ∂x y g(x)

7.2.24 We have (substituting y = x tan θ)


Z x Z π/4 Z
dy x sec2 θdθ 1 π/4 1 π
F (x) = 2 + y2
= 2 sec2 θ
= dθ = · .
0 x 0 x x 0 x 4

a. By Exercise 23, we have


Z x
1 π ′ dy 1
− 2 · = F (x) = −2x 2 2 2
+ 2 ,
x 4 0 (x + y ) x + x2
Z x  
dy 1 π 1 π+2
and so 2 2 2
= 2
+ 2 = .
0 (x + y ) 2x 4x 2x 8x3
Z x
dy π+2
b. Set G(x) = 2 2 2
= . Then
0 (x + y ) 8x3
Z x
−3(π + 2) ′ dy 1
4
= G (x) = −4x 2 2 3
+ ,
8x 0 (x + y ) (2x2 )2
Z x  
dy 1 3(π + 2) 1 3π + 8
and so 2 2 3
= 4
+ 4 = .
0 (x + y ) 4x 8x 4x 32x5
(Of course, these integrals could be evaluated by the same trigonometric substitution we used at
the outset.)
Z x
7.2.25 ′
Of course, h(0) = 0 and, by Exercise 23, h (x) = cos(x − y)f (y)dy + sin(x − x)f (x) =
Z x 0 Z x
cos(x − y)f (y)dy, so h′ (0) = 0. Once again, we have h′′ (x) = − sin(x − y)f (y)dy +
0 0
cos(x − x)f (x) = −h(x) + f (x), as required.

7.2.26 Obviously, the formula holds when n = 1. Suppose we know the result for a k-fold
integral and want to calculate the (k + 1)-fold integral. Then
Z x Z x1 Z xk Z xk+1
··· f (xk+1 )dxk+1 dxk · · · dx2 dx1
0 0 0 0
Z x Z x 1 Z xk Z xk+1 
= ··· f (xk+1 )dxk+1 dxk · · · dx2 dx1
0 0 0 0
Z x Z x1 
1 k−1
= (x1 − t) f (t)dt dx1
0 (k − 1)! 0
194 7. INTEGRATION

(changing the order of integration)


Z xZ x
1
= (x1 − t)k−1 f (t)dx1 dt
(k − 1)! 0 t
Z ix Z
1 x k 1 x
= (x1 − t) f (t) dt = (x − t)k f (t)dt,
k! 0 x1 =t k! 0
as desired.

7.3. Polar, Cylindrical, and Spherical Coordinates

7.3.1 The figures are as below:

4 3

a. b.

−2

c. d.

7.3.2 The curves r cos θ = 1 and r = 2 intersect at θ = ±π/3. Therefore, the area of the region
Ω described is
Z Z π/3 Z 2 Z Z
1 π/3 2 i2 1 π/3
1dA = rdrdθ = r dθ = (4 − sec2 θ)dθ
Ω −π/3 sec θ 2 −π/3 sec θ 2 −π/3
Z π/3 iπ/3 4π √
= (4 − sec2 θ)dθ = 4θ − tan θ = − 3.
0 0 3
By elementary geometry, we note that we have a sector of a circle (with central angle 2π/3) less
1 1 √
the triangular region inside; thus, the area is (4π) − (2 3).
3 2
7.3. POLAR, CYLINDRICAL, AND SPHERICAL COORDINATES 195

7.3.3 The area of the cardioid is


Z 2π Z 1+cos θ Z
1 2π 1 1 1 i2π 3π
rdrdθ = (1 + cos θ)2 dθ = θ + 2 sin θ + sin 2θ + θ = .
0 0 2 0 2 4 2 0 2
7.3.4 We have
Z Z 2π Z 1 Z 2π Z 1
1 1
p dA = rdrdθ = drdθ = 2π(1 − ε) → 2π as ε → 0.
Sε x2 + y 2 0 ε r 0 ε

Z Z Z √
2π 2
2 3
7.3.5 a. y dA = r 3 sin2 θdrdθ = π.
S 0 1 4
Z Z Z √
2π 2
3
b. On the other hand, (x2 + y 2 )dA = r 3 drdθ = (2π). But we observe
S 0 1 Z 4
Z Z
2 2
that since the region S is symmetric about the line y = x, y dA = x dA, so y 2 dA =
Z S S S
1
(x2 + y 2 )dA.
2 S
7.3.6 We have
Z Z √ Z π/4 Z √2
π/4 Z 2
2 2 −5/2 r sin θ sin θ
y(x + y ) dA = 5
rdrdθ = 3
drdθ
S 0 sec θ r 0 sec θ r
Z
1 π/4

1

1 1 1 iπ/4 √2 − 1
2 3
= cos θ − sin θdθ = − cos θ + cos θ = .
2 0 2 2 3 2 0 12

7.3.7 We have
Z Z 5π/6 Z 2 Z 5π/6 Z 2
2 2 −3/2 1 1
(x + y ) dA = 3
rdrdθ = 2
drdθ
S π/6 csc θ r π/6 csc θ r
Z 5π/6 
1  1 i5π/6 √ π
= sin θ − dθ = − cos θ − θ = 3− .
π/6 2 2 π/6 3

7.3.8 We have
Z Z π/4 Z 2 cos θ 2 2 Z π/4 Z 2 cos θ
r sin θ
f dA = rdrdθ = r 2 sin2 θdrdθ
S 0 sec θ r 0 sec θ
Z Z
1 π/4 3 3 2 1 π/4 
= (8 cos θ − sec θ) sin θdθ = 8 sin2 θ(1 − sin2 θ) cos θ − tan2 θ sec θ dθ
3 0 3 0
 iπ/4 1 √
1 sin θ 3 5
sin θ 1 1 √ 2
= 8 −8 − sec θ tan θ + log(sec θ + tan θ) = log( 2 + 1) − .
3 3 5 2 2 0 6 90
7.3.9 Although this problem is quite easily done by changing the order of integration, the
x2 + y 2
in the denominator suggests to us that changing to polar coordinates may simplify matters.
Indeed, it does:
Z 1Z 1 Z π/4 Z sec θ Z π/4 Z sec θ
xex r cos θer cos θ
2 2
dxdy = rdrdθ = cos θer cos θ drdθ
0 y x +y 0 0 r2 0 0
Z π/4 isec θ Z π/4 π
= er cos θ = (e − 1)dθ = (e − 1).
0 0 0 4
196 7. INTEGRATION

7.3.10 The volume of S is given by the triple integral


Z Z π Z 2 sin θ Z 2r sin θ Z π Z 2 sin θ
1dV = rdzdrdθ = (2r 2 sin θ − r 3 )drdθ
S 2
Z0 π 0 r
3
0 0
iπ
4 4 1 1 4 3π π
= sin4 θdθ = θ − sin 2θ + sin 4θ = · = .
0 3 3 8 4 32 0 3 8 2

7.3.11 Calculating in spherical coordinates, we have


Z 2π Z π Z sin φ Z
2 2π π 4 2π 3π π2
ρ sin φdρdφdθ = sin φdφ = = .
0 0 0 3 0 3 8 4

7.3.12 Calculating in spherical coordinates, we have (note that since ρ ≥ 0, we must have
0 ≤ φ ≤ π)
Z π Z π Z sin θ Z Z Z π  Z π 
2 1 π π 3 1 3 8
ρ sin φdρdφdθ = sin θ sin φdφdθ = sin θdθ sin φdφ = .
0 0 0 3 0 0 3 0 0 9

7.3.13 The volume of the region inside both surfaces is given by


Z 2π Z 1 Z √2−r2 Z 1 p
2π i1 2π √
2 rdzdrdθ = 4π r 2 − r 2 dr = −(2 − r 2 )3/2 = (2 2 − 1).
0 0 0 0 3 0 3
7.3.14 The volume of the region inside both surfaces is given by
Z π Z 2a sin θ Z √4a2 −r2 Z π
1 i2a sin θ
2 rdzdrdθ = 2 − (4a2 − r 2 )3/2 dθ
0 0 0 0 3 0
Z
16a3 π 16a3
= (1 − | cos3 θ|)dθ = (3π − 4).
3 0 9

7.3.15 The region lies over the disk r ≤ 1 in the xy-plane; thus, the volume of the region is
Z 2π Z 1 Z √2−r2 Z 1 p  1
 1 i1 π √
rdzdrdθ = 2π r 2 − r 2 − r 3 dr = 2π − (2 − r 2 )3/2 − r 4 = (8 2 − 7).
0 0 r2 0 3 4 0 6
Z Z √
2π aZ  1
a2 −r 2 ia 4πa3
7.3.16 a. 2 rdzdrdθ = 4π − (a2 − r 2 )3/2 = .
0 0 0 3 0 3
Z 2π Z π Z a Z a  Z π   a3 
2 2 4πa3
b. ρ sin φdρdφdθ = 2π ρ dρ sin φdφ = 2π (2) = .
0 0 0 0 0 3 3
7.3. POLAR, CYLINDRICAL, AND SPHERICAL COORDINATES 197

7.3.17 We set up the cone with its vertex at the origin and its axis of symmetry along the
z-axis. Z Z
2π aZ h Z a
r2  π
a. dr = a2 h.
rdzdrdθ = 2πh r−
0 0 hr/a 0 a 3
Z 2π Z arctan(a/h) Z h sec φ Z
2πh3 arctan(a/h)
b. ρ2 sin φdρdφdθ = sec3 φ sin φdφ
0 0 0 3 0
2πh3  1 2
iarctan(a/h) 2πh3 a 2 πa2 h
= sec φ = = .
3 2 0 6 h 3
Z 2π Z 1 Z √2−r2 Z 1 p
 2π i1
7.3.18 a. rdzdrdθ = 2π r 2 − r 2 − r 2 dr = − (2 − r 2 )3/2 + r 3
0 0 r 0 3 0
4π √ 
= 2−1 .
3
Z 2π Z π/4 Z √2 √ !  √
2 2 2 1 4π( 2 − 1)
b. ρ sin φdρdφdθ = 2π 1− √ = .
0 0 0 3 2 3

Z Z √ √
2π a 3 Z 4a2 −r 2 Z a p 
7.3.19 a. rdzdrdθ = 2π r 4a2 − r 2 − ar dr
0 √0 a 0
1 2 2 3/2 a 2 ia 3 7 3  5πa3
= −2π (4a − r ) + r = 2πa3 − = .
3 2 0 3 2 3
Z 2π Z π/3 Z 2a Z
2πa3 π/3 2πa3 5 5πa3
b. ρ2 sin φdρdφdθ = 8 − sec3 φ) sin φdφ = · = .
0 0 a sec φ 3 0 3 2 3
Z
7.3.20 Because S is symmetric under the interchanges x y z, we have x2 dV =
Z Z S
y 2 dV = z 2 dV , so
S S
Z Z Z 2π Z π Z 1
21 2 2 2 1 1 4π  4π
x dV = (x + y + z )dV = ρ4 sin φdρdφdθ = = .
S 3 S 3 0 0 0 3 5 15
7.3.21 a.
Rewriting the integral in spherical coordinates, we have
Z Z 2π Z π Z ∞ Z ∞
−(x2 +y 2 +z 2 ) −ρ2 2 2
e dV = e ρ sin φdρdφdθ = 4π ρ2 e−ρ dρ
R3 0 0 0 0
 i∞ Z ∞  √
2 2 π
= 2π −ρe−ρ + e−ρ dρ = 2π = π 3/2 ,
0 0 2
Z
2 2 2
by Example 3. Alternatively, we could evaluate the integral directly, noting that e−(x +y +z ) dV
Z ∞ 3 R3
2
= e−x dx .
−∞
b. This is a bit sneaky, since we’ve not yet discussed the higher-dimensional change of
variables theorem. But, separating the iterated integral
Z ∞as a productZof∞single integrals, we then
−ax2

−( ax)2
p
just make simple substitutions. Indeed, if a > 0, then e dx = e dx = π/a, so
−∞ −∞
Z Z  Z  Z 
−(x2 +2y 2 +3z 2 )

−x2

−2y 2

−3z 2

e dV = e dx e dy e dz = π 3/2 / 6.
R3 −∞ −∞ −∞
198 7. INTEGRATION

7.3.22 The region described lies over the circle (x − 3/2)2 + (y − 2)2 = 25/4 in the xy-plane.
Since 
translating
  a region does not affect its volume, we see that the region has
 the same volume as
 x
25 3 3 
S =  y  : x2 + y 2 ≤ , (x + )2 + (y + 2)2 ≤ z ≤ 3(x + ) + 4(y + 2) . Now, we compute
 4 2 2 
z
the volume of S:
Z 2π Z 5/2
25 3 
vol(S) = 3r cos θ + 4 sin θ + − (r cos θ + )2 − (r sin θ + 2)2 rdrdθ
0 0 2 2
Z 2π Z 5/2
25 
= − r 2 rdrdθ
0 0 4

π 25 2 625π
= = .
2 4 32
7.3.23 This integral can be calculated as a sum of two iterated integrals in spherical coordinates
or, more simply, in cylindrical coordinates, as follows.
Z Z 2π Z √3/2 Z √1−r2
z rz
2 2 2 3/2
dV = √ 2 2 3/2
dzdrdθ
S (x + y + z ) 0 0 1− 1−r 2 (r + z )
 
Z √3/2  i √
2
Z √3/2
r 1−r
q r
= 2π −√ √ dr = 2π √ − r  dr
2
r +z 2 1− 1−r 2
0 0 2(1 − 1 − r ) 2

 p p  # 3/2
√ 1  1 11π
= 2π 2 (1 − 1 − r 2 )1/2 − (1 − 1 − r 2 )3/2 − r 2 = .
3 2 12
0
Z
r √ r
(To find p √ dr, we substitute u2 = 1 − 1 − r 2 , so 2udu = √ dr and the
1 −Z 1 − r 2 1 − r2
Z
2u(1 − u2 )du
integral becomes = 2 (1 − u2 )du.)
u

7.3.24 This is one of my all-time favorite challenge problems for vector calculus students. (It
is an even better challenge for a single-variable calculus student. And the ultimate challenge is
to solve the problem, à la grecque, with no calculus whatsoever.2 ) Exploiting symmetry to the
utmost, note that the region is composed of (6)(8) = 48 regions congruent to
  
 x 
 y  : x2 + y 2 ≤ 1, 0 ≤ y ≤ x, 0 ≤ z ≤ y ,
 
z

whose volume is
Z π/4 Z 1 Z r sin θ Z π/4 Z 1
1 1 
rdzdrdθ = r 2 sin θdrdθ = 1− √ .
0 0 0 0 0 3 2
2
As a hint, recall that Archimedes computed the volume of a sphere by applying what we now call Cavalieri’s
principle, noting that the cross-sections of a sphere of radius a are the same as those of the region obtained by
removing a (double) cone of height a and radius a from a cylinder of height 2a and radius a. The region in question
can be written as the union of a cube and 24 “caps,” each of which is a truncated portion of the intersection of two
cylinders.
7.4. PHYSICAL APPLICATIONS 199
 √
Therefore, the full region has volume 16 1 − √1 = 8(2 − 2), approximately 19% more than the
2
volume of the unit ball.

7.4. Physical Applications


Z 2π Z a
1 1 2πa3 2
7.4.1 The average distance is r 2 drdθ = = a.
πa2 0 0 πa2 3 3
Z 2π Z π Z a
1 1 3
7.4.2 The average distance is 4 3 ρ3 sin φdρdφdθ = 4 3
πa 4
= a.
3 πa 0 0 0 3 πa
4

7.4.3 Let the point in question be 0, and take the boundary of the ball to be given by r =
2a cos θ, |θ| ≤ π/2. Then the average distance is
Z π/2 Z 2a cos θ Z
1 2 1 8a3 π/2 8a 4 32
2
r drdθ = 2
cos3 θdθ = · = a ≈ 1.13a.
πa −π/2 0 πa 3 −π/2 3π 3 9π

7.4.4 Let the point in question be 0, and take the boundary of the ball to be given by ρ =
2a cos φ, 0 ≤ φ ≤ π/2. Then the average distance is
Z 2π Z π/2 Z 2a cos φ Z π/2
1 3 1  4 4
 6
4 3
ρ sin φdρdφdθ = 4 3
8πa cos φ sin φdφ = a.
3 πa 0 0 0 3 πa 0 5

7.4.5 We take the square [0, a] × [0, a] and the corner tobe 0. By symmetry, it suffices
 to
x
find the average distance from 0 to points in the triangle S = : 0 ≤ x ≤ a, 0 ≤ y ≤ x . The
y
average distance is
Z π/4 Z a sec θ Z
1 2 1 a3 π/4
2a  1 √ √ 
1 2 r drdθ = 1 2 sec3 θdθ = ( 2 + log( 2 + 1))
2a 0 0 2a
3 0 3 2
√ √
2 + log( 2 + 1)
= a ≈ 0.77a.
3
7.4.6 First of all, the mass of Ω is given by
Z Z π/4 Z 2 cos θ Z π/4 Z 2 cos θ
r sin θ 1 − log 2
m= δdA = 2
rdrdθ = sin θdrdθ = .
Ω 0 sec θ r 0 sec θ 2
Now we have
Z Z π/4 Z 2 cos θ
1 1 1 1 1 3 − log 4
x= xδdA = r cos θ sin θdrdθ = · (3 − log 4) = · ≈ 1.31
m Ω m 0 sec θ m 8 4 1 − log 2
Z Z π/4 Z 2 cos θ
1 1 1 3π − 8 1 3π − 8
y= yδdA = r sin2 θdrdθ = · = · ≈ 0.58
m Ω m 0 sec θ m 16 8 1 − log 2

7.4.7 First of all, the mass of Ω is given by


Z Z π/3 Z 2 cos θ Z π/3 Z 2 cos θ √
1 π
m= δdA = rdrdθ = drdθ = 2 3− .
Ω −π/3 1 r −π/3 1 3
200 7. INTEGRATION

Now we have
Z Z √
1 π/3 2 cos θ
1 √ 3 3
x= r cos θdrdθ = · 3= √ ≈ 1.26
m −π/3 1 m 2(3 3 − π)
Z π/3 Z 2 cos θ
1
y= r sin θdrdθ = 0.
m −π/3 1
  
x
7.4.8 Let Ω = : x2 + y2 ≤ a2 , y ≥ 0 . Without loss of generality, we may take the
y
1
density to be δ = 1. Then mass(Ω) = πa2 . By symmetry, x = 0. And
2
Z πZ a
1 2 2a3 4
y= r 2 sin θdrdθ = 2
· = a ≈ 0.42a.
mass(Ω) 0 0 πa 3 3π
  
 x 
7.4.9 Ω =  y  : x2 + y 2 + z 2 ≤ a2 , z ≥ 0 . Without loss of generality, we may take the
 
z
2
density to be δ = 1. Then mass(Ω) = πa3 . By symmetry, x = y = 0. And
3
Z 2π Z π/2 Z a
3 3 πa4 3
z= 3
(ρ cos φ)ρ2 sin φdρdφdθ = 3
· = a.
2πa 0 0 0 2πa 4 8

7.4.10 By symmetry, x = y = 0. Now, taking δ = 1,


Z 2π Z π/3 Z 2a
3 2 3 9πa4 27
z= (ρ cos φ)ρ sin φdρdφdθ = = a.
5πa3 0 0 a sec φ 5πa 3 4 20

7.4.11 Without loss of generality, we may take the density to be δ = 1. We know from Exercise
7.2.13 that the volume of the tetrahedron is V = abc/6. Then
Z Z Z
1 a b(1−x/a) c(1−x/a−y/b)
x= xdzdydx
V 0 0 0
Z
1 bc a  x 2
= · x 1− dx
V 2 0 a

(substituting u = 1 − x/a)
Z 1
1 bc 6 a2 bc 1 a
= · a2 (1 − u)u2 du = · · = .
V 2 0 abc 2 12 4
Similarly (merely permuting the variables), we have y = b/4 and z = c/4. That is, x is the average
of the four vertices of the tetrahedron.
  
 x 
7.4.12 Let Ω =  y  : x2 + y 2 ≤ a2 , 0 ≤ z ≤ h . Using cylindrical coordinates, we have
 
z
Z 2π Z a Z h
2π 3
δ = r, and the mass of the solid cylinder is m = r 2 dzdrdθ = a h. The moment of
0 0 0 3
Z 2π Z a Z h
2π 5 3
inertia about the z-axis is I = r 4 dzdrdθ = a h = ma2 .
0 0 0 5 5
7.4. PHYSICAL APPLICATIONS 201
Z 2π Z π Z a
4π 6
7.4.13 I= (ρ sin φ)2 (ρ)ρ2 sin φdρdφdθ = a .
0 0 0 9
7.4.14 In cylindrical coordinates, we have
Z Z √
2Z
√ Z √ √ !
2π 4−r 2 2 p  64 8 2
2 3 4
I= (r )rdzdrdθ = 2π r 4 − r 2 − r dr = 2π − .
0 0 r 0 15 3

And in spherical coordinates, we determine


Z 2π Z π/4 Z 2  
2 2 64π 2 5
I= (ρ sin φ) ρ sin φdρdφdθ = − √ .
0 0 0 5 3 6 2
Z 2π Z π/3 Z 2

7.4.15 We have I = (ρ sin φ)2 ρ2 sin φdρdφdθ = .
0 0 0 3

7.4.16 a. No integration is required here. Here every particle is at distance a from the axis of
revolution, and I = ma2 = πδa4 h.
Z 2π Z a Z h
a4 1
b. I = δ r 3 dzdrdθ = 2πδ · · h = ma2 .
0 0 0 4 2
Z 2π Z a Z h 4
a 3
c. I = δ r 3 dzdrdθ = 2πδ · ·h= ma2 .
0 0 hr/a 20 10

7.4.17 Without loss of generality,


Z 2π Z π we
Z a take the density to be δ = 1.
8π 5
a. We have Ia,b = (ρ sin φ)2 ρ2 sin φdρdφdθ = (a − b5 ).
0 0 b 15
Ia,b 8π (a − b)(a4 + a3 b + a2 b2 + ab3 + b4 ) 8π 5 2 8 2
b. lim 3 = lim = · a = πa .
b→a− a − b3 15 b→a− (a − b)(a2 + ab + b2 ) 15 3 9
c. If we divide our answer to part b by 4π/3, then we will find the limiting ratio of
moment of inertia to mass (since the volume of a ball is 4π/3 times the cube of its radius). Thus,
in the limit, we find that for a uniform hollow spherical shell, I = 23 ma2 .

7.4.18 Z One approach is to apply Exercise Z 7.2.20, seeking critical points of this function
Z of a: If
2 T
F (a) = kx − ak dV , then DF (a) = 2 (x − a) dV = 0 if and only if vol(Ω)a = xdV . That
Ω Z Ω Ω
1
is, a = xdV = x, which is the center of mass of Ω.
vol(Ω) Ω
Alternatively, we write this integral out explicitly as quadratic function of a and complete the
square:
Z Z Z Z
2 2 2
 2
kx − ak dV = kxk − 2a · x + kak dV = kak vol(Ω) − 2a · xdV + kxk2 dV
Ω Ω Ω Ω
 R  Z
2 Ω xdV
= vol(Ω) kak − 2a · + kxk2 dV
vol(Ω) Ω
Z
= vol(Ω)ka − xk2 + kxk2 dV − kxk2 vol(Ω).

It follows that the integral takes on its minimum value when a = x.


202 7. INTEGRATION

7.4.19 Without loss of generality, we take the density to be given by δ = 1. The mass of the
Z 2π Z a1/n Z z n Z a1/n

solid is m = 2 rdrdzdθ = 2π z 2n dz = a(2n+1)/n . Then the moment of
0 0 0 0 2n + 1
Z 2π Z a1/n Z z n Z a1/n
inertia about the axis of revolution is given by I = 2 r 3 drdzdθ = π z 4n dz =
0 0 0 0
1
π 4+ n
π I 4n+1 a 2n + 1
a(4n+1)/n . Thus, we have = = . In particular, note that as n → ∞,
4n + 1 ma2 2π 1
4+ n 2(4n + 1)
2n+1 a
the ratio approaches a limiting value of 1/4.

7.4.20 As usual, without loss of generality, we assume the densityZis δ Z= 1.Z


2π ε a
a. The mass of the narrow ice-cream cone is m = ρ2 sin φdρdφdθ =
0 0 0
2πa3
(1 − cos ε). By symmetry, its center of mass lies on the z-axis at height
3
Z Z Z
1 2π ε a 1 πa4 3a
z= (ρ cos φ)ρ2 sin φdρdφdθ = · (1 − cos2 ε) = (1 + cos ε).
m 0 0 0 m 4 8
b. As ε → 0+ , z → 3a/4. At first, this is somewhat surprising, inasmuch as the limiting
shape is a line segment of length a. But the limiting density function of that line segment is far
from uniform (since the ice cream cone has more mass at its top than at its bottom).

7.4.21 The y-coordinate of the center of mass of R is given by


Z Z b Z f (x)
1 1
y= ydA = ydydx.
area(R) R area(R) a g(x)
On the other hand, using cylindrical coordinates along the x-axis, we have
Z 2π Z b Z f (x) Z b Z f (x)
vol(Ω) = rdrdxdθ = 2π ydydx = 2πy · area(R).
0 a g(x) a g(x)

7.4.22 Denote by x the center of mass of Ω. Without any loss of generality, we’ll take ℓ0 to
be the
 z-axis,
 x = 0, and ℓ to be the line parallel through the z-axis passing through the point
a
a =  b . Then
0
Z Z Z Z
2 2
 2 2
I= δ (x − a) + (y − b) dV = δ(x + y )dV − 2 δ(a · x)dV + δ(a2 + b2 )dV
Ω Ω Ω Ω
Z
= I0 − 2a · δxdV + mh2 = I0 + mh2 ,

Z
since δxdV = mass(Ω)x = 0.

7.4.23 Adapting the calculation 


of Example
 6, we take the ball to be centered at the origin and
0
the point mass to be at the point  0 . By symmetry, the resultant force will act only in the
R
7.4. PHYSICAL APPLICATIONS 203

vertical direction. So we calculate


Z Z
cos α R − ρ cos φ
F3 = −G δ 2 dV = −G ρ · 2 dV
Ω d Ω (R + ρ2 − 2Rρ cos φ)3/2
Z R
2ρ2
= −2πG ρ · 2 dρ = −πGR2 .
0 R
Note that the total mass of the ball is πR4 , so the force is the same as if we concentrated all the
mass at the center of the ball. (See Exercise 26.)

7.4.24 We have
Z
z
F3 = G 2 2 2 3/2
dV
Ω (x + y + z )
Z 2π Z π/4 Z √2a Z Z !
2π π/2 Z a cot φ csc φ
=G cos φ sin φdρdφdθ + cos φ sin φdρdφdθ
0 0 0 0 π/4 0
!
√ Z π/4 Z π/2
= 2πGa 2 cos φ sin φdφ + (csc φ − sin φ)dφ
0 π/4
1 √ 1  √ 1 
= 2πGa √ + log( 2 + 1) − √ = 2πGa log( 2 + 1) − √ .
2 2 2 2 2
7.4.25 Imagine two identical objects of mass M/2 located at ±a on the x-axis. The gravitational
force exerted by that system on a test mass at a − ε is obviously large and to the right. But the
force exerted by a mass M at the origin (the center of mass) on our test mass would be towards
the origin.

7.4.26 Following the calculation in Example 6, the φ-integral is unchanged and when b ≥ R we
have Z
4πG R GM
F3 = −
2
δ(ρ)ρ2 dρ = − 2 .
b 0 b
When b < R, it is still the case that the integrand vanishes whenever ρ > b, and so
Z
4πG b G
F3 = − 2 δ(ρ)ρ2 dρ = − 2 · (mass of the earth within distance b from the center).
b 0 b
7.4.27 a. The volume of this region is
Z π/2 Z √k cos φ Z
2 2π 3/2 π/2 4πk 3/2
V = 2π ρ sin φdρdφ = k (cos φ)3/2 sin φdφ = .
0 0 3 0 15
We have V = 4π/3 when k = 52/3 ≈ 2.92.
b. By symmetry, the gravitational force is all vertical, and it is given by
Z √
2π Z π/2 Z k cos φ √ Z π/2 4πG √ 4πG
F3 = G cos φ sin φdρdφdθ = 2πG k (cos φ)3/2 sin φdφ = k = 2/3 .
0 0 0 0 5 5
This is about 2.6% greater than the gravitational force of the uniform unit ball.
Remark: Finding the (rotationally symmetric) shape with given volume that maximizes the
gravitational force is a maximum problem in the space of (continuous) functions, a problem in the
204 7. INTEGRATION

calculus of variations. We define an inner product (see Section 5.3 of Chapter 5) on the vector
Z π/2
space of continuous functions on [0, π/2] by hf, gi = f (φ)g(φ)dφ. Then we want to maximize
Z π/2 0 Z π/2
F(f ) = f (φ) cos φ sin φdφ subject to the constraint G(f ) = f (φ)3 sin φdφ = 2. If f is a
0 0
constrained critical point, then we should have

Z Z
d π/2 π/2
Dg F(f ) = (f (φ) + tg(φ)) cos φ sin φdφ = g(φ) cos φ sin φdφ = 0
dt 0 0 0

for every function g with

Z Z
d π/2
3
π/2
Dg G(f ) = (f (φ) + tg(φ)) sin φdφ = 3 f (φ)2 g(φ) sin φdφ = 0
dt 0 0 0

(i.e., for every g in the tangent space to the constraint set). That is, we seek a function f so
that hg, cos sini = 0 for all g with hg, f 2 sini = 0. Standard orthogonal complementarity arguments
suggest then that we should have f (φ)2 = k cos φ for some constant k.

7.4.28 There are all sorts of details to iron out, such as how the helicopters will fight the fire,
but the key issue is to choose the location a ∈ R2 of the helipad so that we minimize, on average, the
amount of forest that will burn as the helicopters fly from a to the center, x, of the fire. Intuitively,
if the fire starts at x, then the amount that burns will be proportional to kx − ak2 (since the time
is proportional to the distance from a to x and the area that burns is proportional to the square of
the time). So we should choose a to make the average value of kx − ak2 as small as possible. This
means (see Exercise 18) that we should put a at the center of mass of Ω. (Of course, this is not
absolutely right, because if x is sufficiently close to the boundary of Ω, then less forest will actually
burn.)

7.5. Determinants and n-dimensional Volume

7.5.1 We use the properties of the determinant and arrange to apply Proposition 5.12. Of
course, one can
use either row or column
operations
to get
the the matrix
to triangular form.
−1
6 −2 −1 0
0 0 −1 0

a. 3 4 5 = 3 22 −1 = − 3 −1 22

5 2 1 5 32 −9 5 −9 32

−1 0 0


= − 3 −1 0 = −(−1)(−1)(−166) = 166.

5 −9 −166
7.5. DETERMINANTS AND n-DIMENSIONAL VOLUME 205

1 0 2 0 1 0 0 0 1 0 0 0


−1 2 −2
0 −1 2 0
0 −1 2 0 0
b. = =
0 1 2 6 0 1 2 6 0 1 2 0

1 1 3 2 1 1 1 2 1 1 1 −1
= (1)(2)(2)(−1) = −4.

1 4 1 −3 1 0 0 0


2 10 0 1 4 2 0 0
c. = = (1)(2)(2)(3) = 12.
0 0 2 2 1 −2 2 0


0 0 −2 1 −3 7 2 3

2 −1 0 0 0 1 2 0 0 0


−1 2 −1 0 0 −2 −1 −1 0 0

d. 0 −1 2 −1 0 = 1 0 2 −1 0

0 0 −1 2 −1 0 0 −1 2 −1

0 0 0 −1 2 0 0 0 −1 2

1 0 0 0 0 1 0 0 0 0


−2 3 −1 0 0 −2 1 3 0 0

= 1 −2 2 −1 0 = 1 −2 −2 −1 0

0 0 −1 2 −1 0 1 0 2 −1

0 0 0 −1 2 0 0 0 −1 2

1 0 0 0 0 1 0 0 0 0


−2 1 0 0 0 −2 1 0 0 0

= 1 −2 4 −1 0 = 1 −2 1 4 0

0 1 −3 2 −1 0 1 −2 −3 −1

0 0 0 −1 2 0 0 1 0 2

1 0 0 0 0 1 0 0 0 0


−2 1 0 0 0 −2 1 0 0 0

= 1 −2 1 0 0 = 1 −2 1 0 0

0 1 −2 5 −1 0 1 −2 1 5

0 0 1 −4 2 0 0 1 −2 −4

1 0 0 0 0


−2 1 0 0 0

= 1 −2 1 0 0 = (1)(1)(1)(1)(6) = 6.

0 1 −2 1 0

0 0 1 −2 6

7.5.2 Suppose ai = 0. Then ai = 2ai , so

det A = D(a1 , . . . , ai , . . . , an ) = D(a1 , . . . , 2ai , . . . , an ) = 2D(a1 , . . . , ai , . . . , an ) = 2 det A,

and thus det A = 0.


206 7. INTEGRATION

7.5.3 We need only prove this statement for each of the three types of elementary matrices
listed in Section 2 of Chapter 4.
(i) Multiplying A by an elementary matrix E of type (i) interchanges the ith and j th
columns of A, so det(AE) = − det A. Since E itself is obtained by interchanging the
same two columns of the identity matrix, we have det E = − det I = −1. Hence,
det(AE) = det E det A.
(ii) Multiplying A by an elementary matrix E of type (ii) multiplies the ith column of A by
the scalar c, and so det(AE) = c det A. Since we obtain E by multiplying the ith column
of the identity matrix by c, we have det E = c det I = c, and so det(EA) = det E det A.
(iii) Multiplying A by an elementary matrix E of type (iii) adds a scalar multiple of column i
to row j, and hence doesn’t change the determinant of A. On the other hand, we obtain
E by adding that same scalar multiple of the ith column of the identity matrix to its j th
column, and so det E = det I = 1. Thus, in this case, too, we have det(AE) = det A,
as required.

7.5.4 We need only prove this statement for each of the three types of elementary matrices
listed in Section 2 of Chapter 4. An elementary matrix of type (i) or (ii) is symmetric, and so the
result is immediate. Next, any elementary matrix of type (iii) has determinant equal to 1 and the
transpose of any elementary matrix of type (iii) is again of type (iii), so the result is immediate.

7.5.5 This follows by applying the second property of D repeatedly. Multiplying a single
column of A by the scalar c results in multiplying the determinant by c; doing so to each of the
columns in succession multiplies the determinant by c a total of n times, hence multiplies the
original determinant by a factor of cn .

7.5.6 When A is a 1 × 1 matrix with a single integer entry, then det A is obviously an integer.
Now, assume that the determinant of any k×k matrix whose entries are integers must be an integer.
Let A be a (k + 1) × (k + 1) matrix whose entries are all integers. By Proposition 5.14, expanding
P
in cofactors along the first row, we have det A = k+1 j=1 a1j C1j , where C1j = (−1)
1+j det A
1j and
th
A1j is the k × k matrix obtained by crossing out the first row and j column of A. Since this is a
k × k matrix with integers entries, we infer that C1j is an integer for j = 1, . . . , k + 1. Therefore,
det A, being a sum of products of integers, is itself an integer. (Alternatively, by Proposition 5.18
directly, det A is the sum of products of integers and is therefore an integer.)

7.5.7 The crucial ingredient in solving this problem is to remember the meaning of place values.
Using property (3) of Proposition 5.4 three times, we add 103 times the first column, 102 times the
second column, and 10 times the third column to the fourth. Then

1 8 9 8 1 8 9 1898 1 8 9 146


3 4 7 1 3 4 7 3471 3 4 7 267
= = 13 ;
7 2 1
5 7 2 1 7215 1 555
7 2

8 1 6 4 8 1 6 8164 8 1 6 628
7.5. DETERMINANTS AND n-DIMENSIONAL VOLUME 207

since the final determinant must be an integer, we see that our original determinant is divisible by
13. (Of course, we needn’t know the actual values of the integers in the last column; all we need
to know is that all those entries are integers.)

7.5.8 Moving the bottom row to the first for ease of calculation, we have

a1 b1 c1 1 1 1 1 0 0

b1 − a1 c1 − a1
a2 b2 c2 = a1 b1 c1 = a1 b1 − a1 c1 − a1 = ,
b2 − a2 c2 − a2
1 1 1 a2 b2 c2 a2 b2 − a2 c2 − a2

which (see Section 5 of Chapter 1) is twice the signed area of △ABC. A nicely geometric, alternative
solution is this: the original 3× 3 determinant
 above givesthesigned volume of the parallelepiped
a1 b1 c1
spanned by the vectors x =  a2 , y =  b2 , and z =  c2 . The pyramid with vertices 0, x,
1 1 1

y
z x
1

y, and z has 1/6 that volume; but, on the other hand, the volume of the pyramid is 1/3 the area
of its base (the shaded triangle) times its height (1). Therefore, the area of the triangle is 1/2 the
determinant. On the other hand, that triangle is congruent to △ABC.

7.5.9 If A is in upper- or lower-triangular form, det A is the product of its diagonal entries,
and then the determinant of the given (n + 1) × (n + 1) matrix will be that same product. In
general, reducing the the given matrix to upper- or lower-triangular form requires exactly the
same row (or column) operations as reducing A itself to that form, so the determinants are the
same. Geometrically, the (n + 1)-dimensional signed volume of a parallelepiped with height 1 is the
n-dimensional signed volume of its base.

7.5.10 a. Let E ′ be the product of the k × k elementary matrices corresponding to the row
operations required to put A in upper-triangular form U ′ . Let E ′′ be the product of the ℓ × ℓ
208 7. INTEGRATION

elementary matrices corresponding to the row operations required to put D in upper-triangular


form U ′′ . Then we have
" #" # " #
E′ O A B U′ ⋆
= = U,
O E ′′ O D O U ′′

which is an upper-triangular matrix. Now we have det E ′ det A = det U ′ , which equals the product
of the diagonal entries of U ′ , and det " E ′′ det D =# det U ′′ ,"which equals
# " the product
# of the diago-

E O ′
E O I O
nal entries of U ′′ . Also note that det ′′
= det = det E ′ det E ′′ .
O E O I O E ′′
Putting this all together, we have
" #
A B det U det U ′ det U ′′ det U ′ det U ′′
det = = = = det A det D,
O D det E ′ det E ′′ det E ′ det E ′′ det E ′ det E ′′

as desired.
b. We do row operations on the given matrix to reduce it to the block form in part a:
Since A is invertible, we can do row operations to convert A to the identity matrix, then do row
operations to remove the entries of C below. In equations, we have
" #" #" # " #
I O A−1 O A B I A−1 B
= .
−C I O I C D O D − CA−1 B
Rewriting this, we have
" # " #" #" #
A B A O I O I A−1 B
= ,
C D O I C I O D − CA−1 B

so, using the product rule for determinants and the result of part a, we have
" #
A B
det = det A det(D − CA−1 B).
C D

c. If k = ℓ, then we have det A det(D − CA −1 B) = det(AD − ACA−1 B). And if


" #
A B
AC = CA, then ACA−1 B = CB, so then det = det(AD − CB).
C D
" # " # " # " #
1 0 0 0 0 1 0 0
d. Let A = , B = , C = , and D = . Then
0 0 1 0 0 0 0 1
" #
−1 0
AD − CB = , so det(AD − CB) = 0. And yet
0 0
 
" # 1 0 0 0
 
A B  0 0 1 0 
det = det 
 0 1 0 0  = −1.

C D  
0 0 0 1
7.5. DETERMINANTS AND n-DIMENSIONAL VOLUME 209

" Now #we give "an example


# when " A is nonsingular
# but
" A and # C don’t commute. "Take A #=
1 0 1 0 0 1 1 0 1 −1
,B= ,C= , and D = . Then AD − CB = ,
1 1 0 1 0 0 0 0 1 0
so det(AD − CB) = 1. On the other hand, we have
 
" # 1 0 1 0
 
A B  1 1 0 1 
det = det   = 0.
C D 0 1 1 0 
 
0 0 0 0

7.5.11 If A is orthogonal, then AT A = I, so by Propositions 5.11 and 5.7, we have 1 =


det(AT A) = det(AT ) det A = (det A)2 , and so det A = ±1.

7.5.12 By Exercise 5, we have det AT = det(−A) = (−1)n det A = − det A, since" n is odd.
# But
0 −1
we know that det AT = det A, so det A = 0. Notice that when n = 2, we have det = 1.
1 0

7.5.13 a. Expanding by cofactors along the second row, we have



2 1 1 1 1 2

det A = −(2) + (3) − (0) = (−2)(0) + (3)(1) − (0)(2) = 3.
4 2 1 2 1 4
 
1 1 1
 
We have B2 =  2 2 0 , and so det B2 = −(2)(3) + (2)(1) = −4. Therefore, Cramer’s Rule
1 −1 2
tells us that x2 = det B2 / det A = −4/3.
 
6 −4 5
 
b. The cofactor matrix is C =  0 1 −2 , so
−3 2 −1
 
6 0 −3
1 1 
A−1 = C T =  −4 1 2 .
det A 3
5 −2 −1

7.5.14 Expanding in cofactors along the first column, we have



1 0 2 3 2 3

det A = (−1) − (2) + (0) = −3.
2 3 2 3 1 0
 
3 −6 4
 
Moreover, the cofactor matrix is C =  0 −3 2 , so
−3 6 −5
   
3 0 −3 −1 0 1
1 1   
A−1 = T
C = −  −6 −3 6 =  2 1 −2 .
det A 3
4 2 −5 − 43 − 23 5
3
210 7. INTEGRATION

7.5.15 a. If A has integer entries, then (by Exercise 6) the cofactors of A are all integers. Since
det A = ±1, we infer from Proposition 5.17 that A−1 = ±C T will have all integer entries as well.
b. Since AA−1 = I, we know that (det A)(det A−1 ) = 1. Since A and A−1 are both
matrices with integer entries, we know that det A and det A−1 are both integers. The only way to
obtain 1 as the product of two integers is for both integers to be either 1 or −1. Thus, det A = ±1.

7.5.16 Suppose i < j and we wish to interchange rows i and j. We first exchange Ai and Ai+1 ,
then Ai and Ai+2 , and so on, until we exchange Ai and Aj , so that we have reached the ordering

A1 , A2 , . . . , Ai−1 , Ai+1 , Ai+2 , . . . , Aj−1 , Aj , Ai , Aj+1 , . . . , An .

So far we have made j − i exchanges of adjacent rows. Next, we move Aj back into the original
position of Ai by exchanging Aj with Aj−1 , then with Aj−2 , . . . , finally with Ai+1 . This is a total
of j − i − 1 interchanges of adjacent rows. In summary, we have interchanged Ai and Aj with a
total of (j − i) + (j − i − 1) = 2(j − i) − 1 interchanges of adjacent rows. Since 2(j − i) − 1 is odd,
we are done.

7.5.17 If A is orthogonal, then, by Exercise 11, det A = ±1. Then AT = A−1 = ±C T , so


A = ±C.

7.5.18 Suppose A is singular; then det A = 0 and so we must show that AC T = O. The ij-entry
of AC T is equal to

n
X n
X
(∗) Ai · C j = aik Cjk = (−1)j+k aik det Ajk .
k=1 k=1

When i = j, this is just the formula for det A, expanding in cofactors along the ith row, and so we
obtain 0. When i 6= j, consider the matrix à obtained by replacing the j th row of A by Ai . Then
the final sum in (∗) is the formula we get when we expand det à in cofactors along the j th row,
which must be 0 by virtue of Lemma 5.2.
When A is singular, the columns of C T are in N(A).

      1 1 1

x x1 x2
7.5.19 a. First notice that if = or , then x x1 x2 = 0, since the matrix has
y y1 y2
y y1 y2
two identical columns. Expanding the determinant in cofactors along the first column we obtain a
linear function ax + by + c = 0, where


1 1 1 1 x x
1 2
a = − , b= , and c= .
y1 y2 x1 x2 y1 y2
7.5. DETERMINANTS AND n-DIMENSIONAL VOLUME 211
   
x1 x2
Since the points and are distinct, either a or b must be nonzero, so this gives the
y1 y2
 
x1
equation of a line, and that line, as we’ve already seen, passes through the given points and
y1
 
x2
.
y2
     
x1 x2 x3
b. As in part a, it is clear that the three points  y1 ,  y2 ,  y3  must satisfy the
z1 z2 z3
given equation. Similarly, expanding the determinant in cofactors along the first column, we obtain
an equation of the form ax + by + cz + d = 0. This will be the equation of a plane provided we check
that at least one
 ofa, b, and c must be nonzero. Taking a slightly different approach, we just show
x∗
that for some  y ∗  ∈ R3 , this equation does not hold. We observe that since the three points are
z∗
   
x2 − x1 x3 − x1
noncollinear, the vectors  y2 − y1  and  y3 − y1  form a linearly independent set of vectors in R3 .
z2 − z1 z − z1
     3 
1 0 0
 x1   x2 − x1   
It follows that  ,  , and  x3 − x1  form a linearly independent set of vectors in R4 ,
 y1   y2 − y1   y3 − y1 
z1 z2 − z1 z3 − z1

 1 1
 0 0 1 1 1 1
1
∗ ∗
x ∗ x x1 x2 − x1 x3 − x1 x x1 x2 x3
  4
and so there is some vector  ∗  ∈ R so that ∗ 6 0.
y y y y − y y − y = y∗ y y y =
1 2 1 3 1 1 2 3
z∗ ∗
z z1 z2 − z1 z3 − z1 z ∗ z1 z2 z3
       
x x1 x2 x3
7.5.20 Setting equal to , , or yields a matrix with two identical columns,
y y1 y2 y3
and so we know that these three points satisfy the respective equations. Expanding in cofactors
along the first column, we see that the equations take the appropriate form.
For example,
the first
1 1 1


equation is of the form dy = a′ x2 + b′ x + c′ , and if we check that d = x1 x2 x3 6= 0, then we’ll

x2 x2 x2
1 2 3
be done. Since x1 , x2 , and x3 are distinct, it follows from Exercise 4.1.22 that this determinant is
nonzero.
Similarly,
assuming that three points are noncollinear ensures (see Exercise 4.1.23) that
1 1 1


x1 x2 x3 6= 0, so a′ 6= 0, and we have a bona fide equation of a parabola. Similarly, in the case

y1 y2 y3
of the circle (where we no longer need any assumption on the xi ’s), the same inequality guarantees
that we get a nonzero coefficient of x2 + y 2 when we expand the second determinant by cofactors
along the first column.
212 7. INTEGRATION

7.5.21 Suppose det and det f are two functions satisfying the properties in Theorem 5.1. Since
the proofs of Theorem 5.5, Corollary 5.6, and Proposition 5.12 rely only on the properties listed
in Theorem 5.1, we conclude that both det and det f satisfy the conclusions of these proposi-
tions. In particular, we infer that if A is singular, then det A = det f A = 0; if A is nonsin-
gular, then since A = Em Em−1 · · · E2 E1 is a product of elementary matrices, then det A =
f = detE
det Em det Em−1 · · · det E2 det E1 and detA f m detE
f m−1 · · · detE
f 2 detE
f 1 . Since it follows di-
rectly from the properties that det E = detEf for any elementary matrix E, we conclude that
f for all square matrices A.
det A = detA
Remark: Since there are many ways of writing A as a product of elementary matrices, we might
worry that we could get different answers for det A. This is a question of well-definedness of the
function det, not of its uniqueness.

7.5.22 Let V = Span(v1 , . . . , vk ) and choose an orthonormal basis {w1 , . . . , wn−k } for V ⊥ . Let
A be the n × n matrix whose columns are v1 , . . . , vk , w1 , . . . , wn−k . Since the parallelepiped in Rn
spanned by v1 , . . . , vk , w1 , . . . , wn−k is an “extension” of the k-dimensional parallelepiped spanned
by v1 , . . . , vk by unit lengths in the orthogonal directions, we conclude that det A is equal to the
signed volume of the k-dimensional parallelepiped spanned by v1 , . . . , vk . Thus, the square of this
volume is (det A)2 = det(AT A). But
 
v1 · v1 · · · v1 · vk
 .. .. .. 
 . . . O 
 
T  
A A =  vk · v1 · · · vk · vk ,
 
 
 
O In−k


v · v · · · v · v
1 1 1 k
. .. ..
and so (det A)2 = .. . . , as required.

vk · v1 · · · vk · vk

7.5.23 a. Let ϕ(t) = det(I + tB). Then D(det)(I)B = ϕ′ (0). Now, by Proposition 5.18,
X
ϕ(t) = sign(σ)(I + tB)1σ(1) · · · (I + tB)nσ(n)
σ
X
(†) = (1 + tb11 )(1 + tb22 ) · · · (1 + tbnn ) + sign(σ)(I + tB)1σ(1) · · · (I + tB)nσ(n)
σ(i)6=i for some i
2
= 1 + t(b11 + b22 + · · · + bnn ) + t (· · ·);

note that in each term of the second sum in (†), there must be at least two off-diagonal terms,
hence at least a factor of t2 . Therefore, ϕ′ (0) = b11 + b22 + · · · + bnn = trB, as desired.
b. Since ψ(t) = det(A + tB) = det A det(I + tA−1 B) for any invertible A, by the result
of part a, we have D(det)(A)B = ψ ′ (0) = det A tr(A−1 B).
7.5. DETERMINANTS AND n-DIMENSIONAL VOLUME 213
 
x1
7.5.24 Let Ω ⊂ Rn−1 be the projection of R into the x2 · · · xn -plane. For x = ∈ R × Rn−1 ,
x
write R = {x ∈ Rn : φ(x) ≤ x1 ≤ ψ(x)}. Then we have
Z Z ψ(x) Z

vol(R) = dx1 dVn−1 = ψ(x) − φ(x) dVn−1 .
Ω φ(x) Ω

For T as given of the first type,


Z Z cψ(x) Z

vol(T (R)) = dx1 dVn−1 = cψ(x) − cφ(x) dVn−1 = cvol(R).
Ω cφ(x) Ω

For T of the second type,


Z Z ψ(x)+cx2 Z

vol(T (R)) = dx1 dVn−1 = (ψ(x) + cx2 ) − (φ(x) − cx2 ) dVn−1
Ω φ(x)+cx2 Ω
Z

= ψ(x) − φ(x) dVn−1 = vol(R).

   
2x x
7.5.25 The linear transformation T : R2 → R2 , T =
, maps the unit circle to the
y y
  
x2 x
ellipse + y 2 = 1. The line y = x clearly bisects the quarter-disk : x2 + y 2 ≤ 1, x, y ≥ 0 .
4 y
We then
  expect that the image
 of this line under T , namely y = x/2, should bisect the region
x 2
x
: + y 2 ≤ 1, x, y ≥ 0 . Indeed, by Proposition 5.13, if Ω ⊂ R2 is a region, then T (Ω) is a
y 4
region with area | det T | times that of Ω.

7.5.26 First consider the case of a circle. Notice that all inscribed equilateral triangles have
maximum area, since if one alters the triangle by moving one vertex, the height—and hence the
area—of the triangle decreases. For the general case, place the ellipse E in the plane so that its
major and minor axes are aligned on the x- and y-axes; then its equation is x2 /a2 + y 2 /b2 =
 1for
x
appropriate positive numbers a and b. The linear transformation T : R2 → R2 given by T =
y
 
ax
maps the unit circle to our ellipse E. By Proposition 5.13, T scales area by the (constant)
by
factor of ab, so triangles of maximal area inscribed in the unit circle must map to triangles of
maximal area inscribed in the ellipse. Since there are infinitely many of the former, there must be
infinitely many of the latter.

7.5.27 Consider f (t) = det(A + tB). This is a quadratic polynomial in t. By part b of Exercise
15, we have f (t) = ±1 for t = 0, 1, 2, 3, 4. Since a nonconstant quadratic polynomial can take on
a given value at most two times, it follows that f is a constant polynomial. So either f (t) = 1 for
all t or f (t) = −1 for all t. In either event, part a of Exercise 15 tells us that A + tB is invertible
for all real values t and that its inverse has integer entries for all integers t. (In fact, one can show
that (A−1 B)2 = O, since tr(A−1 B) = det(A−1 B) = 0.)
214 7. INTEGRATION

7.6. Change of Variables Theorem

7.6.1 a. kx + yk = max(|xi + yi |) ≤ max(|xi | + |yi |) ≤ max(|xi |) + max(|yi |) = kxk + kyk ;


kcxk = max(|cxi |) = |c| max(|xi |) = |c|kxk .

b. kS + T k = max k(S + T )(x)k ≤ max kS(x)k + kT (x)k ≤ max kS(x)k +
kxk =1 kxk =1 kxk =1
max kT (x)k = kSk + kT k ; kcT k = max k(cT )(x)k = max |c|kT (x)k = |c|kT k .
kxk =1 kxk =1 kxk =1

c. Note first that if kxk = 1, then kT (x)k ≤ kT k . Now, in general, for x 6= 0, we


have
! ! !
x x x

T (x) = T kxk = kxk T , so kT (x)k = kxk T ≤ kT k kxk .
kxk kxk kxk
 
a1
 ..  Pn
d. Note that if A =  . , then max |A · x| = |aj |. Thus,
kxk =1 j=1
an

kT k = max kT (x)k = max kAxk = max max |Ai · x|


kxk =1 kxk =1 kxk =1 1≤i≤m
n
X
= max max |Ai · x| = max |aij |.
1≤i≤m kxk =1 1≤i≤m
j=1

P
n
e. We have kxk2 = max (x2i ) ≤ x2i ≤ n max(x2i ) = nkxk2 . Now, choose x0 with
1≤i≤n i=1  
±1
  √
kx0 k = 1 so that kT k = kT (x0 )k ≤ kT (x0 )k. Since x0 =  ... , we have kx0 k = n and
±1
√ √
so kT (x0 )k ≤ nkT k. Therefore, we have kT k ≤ nkT k. As for the remaining inequality,
let A be the standard matrix for T . Then kT k = kT (x∗ )k for some x∗ with kx∗ k = 1, and so
√ P √
kT k ≤ n max |aij | = nkAk .
1≤i≤m j

f. We have
Z b  Z b  Z b  Z b


g(t)dt = max
gi (t)dt ≤ max |gi (t)|dt ≤ kg(t)k dt.
1≤i≤n 1≤i≤n
a a a a

7.6.2 If we wanted to assume that f is continuous (or, alternatively, that the hypotheses
of Fubini’s Theorem hold), we could derive this immediately by making substitutions in iterated
integrals. However, it is easy to give a straightforward argument using the definition of the integral.
There is a one-to-one correspondence between partitions P of R and partitions P′ of T (R). If the
(diagonal) entries of [T ] are d1 , . . . , dn , then, given a partition of R as on p. 268 of the text,
we obtain the partition P′ by taking (di )xij , 1 ≤ i ≤ n, 1 ≤ j ≤ ki . Since vol(Rj′ 1 j2 ...jn ) =
|d1 d2 · · · dn |vol(Rj1 j2 ...jn ), we see that

L(f, P′ ) = | det T |L(f ◦ T, P) ≤ | det T |U (f ◦ T, P) = U (f, P′ );


7.6. CHANGE OF VARIABLES THEOREM 215

moreover, since f is integrable on T (R), given ε > 0, we can find a partition P′ so that U (f, P′ ) −
L(f, P′ ) < | det T |ε, and so the corresponding partition P will have the property thatZ U (f ◦ T, P) −
L(f ◦ T, P) < ε. Therefore, f ◦ T is integrable on R. Now, since L(f, P′ ) ≤ | det T | (f ◦ T )dV ≤
Z R Z

U (f, P ) and f is integrable on T (R), we infer that, by uniqueness, | det T | (f ◦ T )dV = f dV .
R T (R)
   
x ax
7.6.3 The ellipse is the image of the unit disk in R2 under the linear map T = ; the
y by
   
x ax
ellipsoid is the image of the unit disk in R3 under the linear map T y  =  by . Therefore, by
z cz
the Change of Variables Theorem, the area of the ellipse is | det T |π = πab, and the volume of the
4π 4π
ellipsoid is | det T | = abc.
3 3

7.6.4 a. We have
Z Z π/2 Z 1/(cos θ+sin θ) cos θ−sin θ
f dA = e cos θ+sin θ rdrdθ
S 0 0
Z π/2 Z 1
1 1 cos θ−sin θ 1 1 1
= 2
e cos θ+sin θ dθ = eu du = e − ).
2 0 (cos θ + sin θ) 4 −1 4 e
cos θ − sin θ
(Here, magically, the substitution u = works out perfectly, as du =
cos θ + sin θ
−2/(cos θ + sin θ)2 dθ.)
" #   " #" #
x u 1 1 1 u
b. Let = g = . Then g maps the region Ω =
y v 2 −1 1 v
  
u
: 0 ≤ v ≤ 1, −v ≤ u ≤ v one-to-one and onto S. Thus, we have
v
Z Z
f dA = (f ◦ g)| det Dg|dAuv
S Ω
Z 1Z v Z 1 iv Z 1
u/v 1 1 1 1 1 1
= e v(e − )dv = e − ).
dudv = veu/v dv =
0 −v 2 2 0 −v 0 e 4 e 2
" #   " #" #
x u 1 3 −1 u
7.6.5 Let u = 2x + y, v = −x + 3y. Then =g = , and g maps
y v 7 1 2 v
  
u
the region Ω = : 1 ≤ u ≤ 5, − u2 ≤ v ≤ 1 one-to-one and onto S. Then we have
v
Z Z 5Z 1 Z 5
x − 3y 1 v 1 u 1 1
dA = − dvdu = − du = (3 − log 5).
S 2x + y 7 1 −u/2 u 14 1 4 u 14
" #   " #
x u u/v
7.6.6 Substituting u = xy, v = y means that we consider the map =g = .
y v v
  
u √
This function g maps the region Ω = : 1 ≤ u ≤ 4, u ≤ v ≤ 3 one-to-one and onto S. Now
v
216 7. INTEGRATION
" #
1/v −u/v 2
we have Dg = , and det(Dg) = 1/v. Therefore, we have
0 1
Z Z 4Z 3 Z 4Z 3 Z 4
1 √ 13
ydA = √
v · dvdu = √
dvdu = ( u − 3)du = ,
S 1 u v 1 u 1 3

just as before.

7.6.7 Using the same mapping as in part b of Exercise 4, we obtain


Z   Z Z u Z 1
x−y 1 1 v 1
cos dA = cos dudv = v(sin 1)dv = sin 1.
S x+y 2 0 −v v 0 2
 
  
x cos x−y
x+y , x + y 6= 0
We can, to be official, define a function f : S → R by f = . Then f is
y 0, x+y = 0
bounded and discontinuous only at the origin, hence integrable.

7.6.8 We can either introduce an “elliptical cylindrical coordinates” change of variable by


   
" # " #
r 2r cos θ
  x 2u
defining g θ  =  r sin θ  or else make the linear change of variables = and then
z y v
z
use usual cylindrical coordinates in uvz-space. In either event, we end up with the integral
Z 2π Z 2 Z 4(4−r2 ) Z 2
2rdzdrdθ = 16π r(4 − r 2 )dr = 64π.
0 0 0 0

7.6.9 Using the same substitution as in Example 3, we have


Z Z 3Z 2r Z 3Z 2 Z 3 √
u 1 1 √ −3/2 1 √ 1 
xdA = · dvdu = uv dvdu = 1− √ udu = 2 3 1 − √ .
S 0 1 v 2v 0 1 2 0 2 2
7.6.10 Using the same substitution as in Example 3, we have
Z Z 3Z 2
x 1 1 1 1 1
dA = · dvdu = · · 2 = .
S y 1 1 v 2v 2 2 2
#  "
xy x
7.6.11 Making the substitution suggested in the hint, we have g−1 = . The
x2 − y 2 y
mapping g maps the region Ω = [0, 1] × [0, 1] in the uv-plane one-to-one and"onto S. Although#
−1 y x
it is messy (!) to solve for g, and hence for det Dg, we observe that Dg = and so
2x −2y
det Dg−1 = −2(x2 + y 2 ). By Corollary 5.10, we therefore have
Z Z Z
1 1 1
(x2 + y 2 )dA = (x2 + y 2 ) · 2 + y2)
dAuv = dAuv = .
S Ω 2(x 2 Ω 2

7.6.12 The integrand


" #   #" # u = x + y, v = 2x − y + 1. Then this gives
leads us to"try the substitution
x u 1 1 1 u
us the mapping =g = . Then it is easy to check that g maps the
y v 3 2 −1 v−1
7.6. CHANGE OF VARIABLES THEOREM 217
  
u v
region Ω = : 1 ≤ v ≤ 2, 0 ≤ u ≤ one-to-one and onto S. Then we have
v 2
Z Z 2 Z v/2 Z 2
x+y 1 1 u 1 1
4
dA = 2
dv = 4
dudv =
.
S (2x − y + 1)
1 0 3 1 v v 48 24
  " #
u u+v
7.6.13 Using the substitution suggested in the hint, we have g = . Although g
v v − u2
  
u
is far from one-to-one, we can check that g maps the region Ω = : 0 ≤ u ≤ 1, 0 ≤ v ≤ u
v
" #
1 1
one-to-one and onto S. We have Dg = , so det Dg = 1 + 2u. Then we have
−2u 1
Z Z 1Z u Z
1 1 1
u(2u + 1) 1 √
dA = · (2u + 1)dvdu = du = (π 3 − 3).
S (x − y + 1)2 0 0 (u + u + 1)2
2
0
2
(u + u + 1) 2 9

7.6.14 The image of g is a solid torus, obtained by rotating a disk of radius b about a circle of
radius a. We have
 
cos φ cos θ −(a + r cos φ) sin θ −r sin φ cos θ
 
Dg =  cos φ sin θ (a + r cos φ) cos θ −r sin φ sin θ  .
sin φ 0 r cos φ

Expanding the determinant in cofactors on the bottom row, we find that


 
det Dg = sin φ r(a + r cos φ) sin φ + r cos φ (a + r cos φ) cos φ = r(a + r cos φ).

Therefore, the volume of the image is


Z bZ 2π Z 2π Z b
2
r(a + r cos φ)dφdθdr = (2π) a rdr = 2π 2 ab2 = (2πa)(πb2 ).
0 0 0 0

(Cf. Pappus’s Theorem, Exercise 7.4.21.)

7.6.15 Without worrying about convergence of improper integrals, we have


Z Z Z
−1 −1

f (A x)dV = f A(A x) | det A|dV = | det A| f dV = | det A|.
Rn Rn Rn

Now, starting with the last row and subtracting the previous row from each, we have

1 1 1 · · · 1 1 1 1 ··· 1


1 2 1 · · · 1 0 1 0 ··· 0

1 2 3 · · · 1 = 0 0 2 ··· 0 = (n − 1)!.

. . . . . .. .. .. .. . . ..
.. .. .. . . . . . . .


1 2 3 · · · n 0 0 0 · · · n − 1

Therefore, the given integral is equal to (n − 1)!.


218 7. INTEGRATION

   
x1 x1
 x2   2x2 
   
7.6.16 Consider the linear map T : Rn → Rn defined by T  .  =  . . Then det T = n!
 ..   .. 
xn nxn
and T (S) is the “pyramid” R = {x ∈ Rn : xi ≥ 0 for all i, x1 + x2 + · · · + xn ≤ n}. Then vol(R) =
det T vol(S). By induction, the volume of the pyramid {xi ≥ 0 for all i, x1 + x2 + · · · + xn ≤ 1} is
1/n!, so vol(R) = nn /n!, and therefore vol(S) = nn /(n!)2 .
   
ρ ρ sin ψ sin φ cos θ
ψ   ρ sin ψ sin φ sin θ 
Define g : (0, ∞) × (0, π) × (0, π) × (0, 2π) → R4 by g  = 
. Then
7.6.17  φ   ρ sin ψ cos φ 


θ ρ cos ψ
3 2
a deliberate calculation yields | det Dg| = ρ sin ψ sin φ. Therefore,
Z Z 2π Z π Z π Z a
π  a5  2π 2 5
kxkdV = ρ4 sin2 ψ sin φdρdψdφdθ = (4π) = a .
B(0,a) 0 0 0 0 2 5 5

1 P∞ 1 P∞
7.6.18 a. Since = uk when |u| < 1, whenever |xy| < 1, we have = (xy)k
1−u k=0 1 − xy k=0
and
Z Z 1Z 1X∞
1
I= dA = xk y k dxdy
R 1 − xy 0 0 k=0

X Z 1 X∞ X∞
1 1 1
= y k dy = 2
= .
0 k+1 (k + 1) k2
k=0 k=0 k=1

To justify the interchange of the summation and the integration, we need uniform convergence.
(See Chapter 24 of Spivak.) We know that the geometric series converges uniformly on [−b, b] for
Z Z 1−δ Z 1 X
1
any 0 < b < 1. Thus, if we consider Iδ = dA = xk y k dxdy, then we
[0,1]×[0,1−δ] 1 − xy 0 0
can move the summation outside the integral, since the series converges uniformly on |xy| ≤ 1 − δ.
∞ (1 − δ)k
P
We then get Iδ = , and, by Abel’s Theorem, lim Iδ = I.
k=1 k2 δ→0+
  " #
u 1 u−v
b. Consider the mapping g =√ . Then g maps the square S with vertices
v 2 u+v
   √  √   √ 
0 1/ 2 2 1/ 2
, √ , , and √ to R. Then we have
0 −1/ 2 0 1/ 2
Z Z Z
1 1 1
dAxy = 1 2 dAuv = 2 dAuv .
R 1 − xy S 1 − 2 (u − v 2 ) S (2 − u2 ) + v 2

We have to break this up into two separate iterated integrals:


Z Z √ Z √2 Z √2−u
1/ 2 Z u
1 1 1
dAuv = dvdu + √ √ dvdu .
S (2 − u2 ) + v 2 −u (2 − u2 ) + v 2 (2 − u 2) + v2
|0 {z } | 1/ 2 u− 2 {z }
I1 I2
7.6. CHANGE OF VARIABLES THEOREM 219
Z
dx 1 x
Now, using the formula = arctan , we have
a2 + x 2 a a
Z u  iu  
dv 1 v 2 u
2 2
= √ arctan √ = √ arctan √ ,
−u (2 − u ) + v 2 − u2 2 − u2 −u 2 − u2 2 − u2

and then, making the substitution u = 2 sin θ, we have
Z 1/√2
1  u  Z π/6
1  √2 sin θ √
I1 = 2 √ arctan √ du = 2 √ arctan √ 2 cos θdθ
0 2 − u2 2 − u2 0 2 cos θ 2 cos θ
Z π/6
π 2
=2 θdθ = .
0 6
Next, we have s√
Z √
2−u
dv 2 2−u
√ 2) + v2
=√ 2 arctan √ ,
u− 2 (2 − u 2−u 2 2+u

and, then, making the substitution u = 2 cos θ, we obtain
s√ r
Z √2 Z π/3
1 2−u 1 − cos θ
I2 = 2 √ √ arctan √ du = 2 arctan dθ
1/ 2 2−u 2 2+u 0 1 + cos θ
Z π/3 Z π/3
θ 1 π 2
=2 arctan tan dθ = θdθ = .
0 2 0 2 3
1 1  π2
At last, we have I = 2(I1 + I2 ) = 2π 2 + = .
36 18 6
7.6.19 We have a1 = 2 and a2 = π. Let Bn (r) denote the ball of radius r centered at 0 in
R . Then, by scaling, vol(Bn (r)) = r n vol(Bn (1)) = an r n . For each x ∈ B2 (1), the cross-section
n
p
perpendicular to R2 is an (n − 2)-dimensional ball of radius 1 − kxk2 , so
Z Z 2π Z 1
1
an+2 = 1dVn+2 = an (1 − r 2 )n/2 rdrdθ = 2πan · .
Bn+2 (1) 0 0 n+2
If n = 2m, then (by induction hypothesis) we have
 m 
π 2π π m+1
a2m+2 = = ,
m! 2m + 2 (m + 1)!
as desired. And if n = 2m + 1, then (by induction hypothesis) we have
 m 2m+1   
π 2 m! 2π π m+1 22m+2 m!
a2m+3 = =
(2m + 1)! 2m + 3 (2m + 3)(2m + 1)!
π m+1 22m+2 m!(2m + 2) π m+1 22m+3 (m + 1)!
= = ,
(2m + 3)! (2m + 3)!
as we needed to show. Thus, the given formulas for an hold for all n ∈ N.
CHAPTER 8
Differential Forms and Integration on Manifolds
8.1. Motivation

8.1.1 It does neither. A reflection does, however, interchange “front” and “back,” thereby
reversing orientation.

8.1.2 They made the fan belt in the form of a Möbius strip (see Figure 4.6).

8.2. Differential Forms

8.2.1 By definition, Λk (Rn )∗ is the vector space spanned by the dxI with I increasing. Note
that if I = (i1 , . . . , ik ) and (j1 , . . . , jk ) are increasing k-tuples, then

1, j = i , j = i , . . . , j = i
1 1 2 2 k k
(†) dxI (ej1 , . . . , ejk ) = .
0, otherwise

(If i1 < i2 < · · · < ik and j1 < j2 < · · · < jk , suppose ℓ is the smallest index for which we have
jℓ 6= iℓ . If jℓ < iℓ , then jℓ 6= is for all s and so the ℓth column of our determinant is all 0’s; if
jℓ > iℓ , then iℓ 6= js for all s and so the ℓth row of our determinant is all 0’s.) So, suppose that
P
T = I increasing cI dxI = 0. Fixing an increasing k-tuple J = (j1 , . . . , jk ), it follows from (†) that
0 = T (ej1 , . . . , ejk ) = cJ . Since this holds for every increasing k-tuple J, we infer that all the cI ’s
are 0, and so the dxI form a linearly independent set. But this calculation also establishes the
P
second fact: given any T ∈ Λk (Rn )∗ , we write T = I increasing aI dxI for some scalars aI , and then,
by (†), we have aJ = T (ej1 , . . . , ejk ) for any increasing k-tuple J.

8.2.2 a. This is immediate from part (2) of Proposition 2.2.


b. We take k = 2 and n = 4: Let ω = dx1 ∧ dx2 + dx3 ∧ dx4 . Then ω ∧ ω = 2dx1 ∧ dx2 ∧
dx3 ∧ dx4 .

8.2.3 We have

e 1 v1 w 1


v × w = (v2 w3 − v3 w2 )e1 + (v3 w1 − v1 w3 )e2 + (v1 w2 − v2 w1 )e3 = e2 v2 w2 ,

e 3 v3 w 3

so dx(v × w) = v2 w3 − v3 w2 = (dy ∧ dz)(v, w). The other formulas are checked similarly.
220
8.2. DIFFERENTIAL FORMS 221

Remark: This exercise shows that the wedge product is the appropriate generalization of the
cross product to higher dimensions.

8.2.4 a. −5dx ∧ dy + 10dy ∧ dz


b. 8dx ∧ dy ∧ dz
c. 11dx ∧ dy ∧ dz
d. 2dx1 ∧ dx2 ∧ dx3 ∧ dx4
e. 6dx1 ∧ dx2 ∧ dx3 ∧ dx4 ∧ dx5 ∧ dx6

8.2.5 Note that



n 1 v1 w 1
v w v w v w
2 2 3 3 1 1
φ(v, w) = n1 + n2 + n3 = n 2 v2 w 2 ,
v3 w 3 v1 w 1 v2 w 2
n 3 v3 w 3

which is the signed volume of the parallelepiped spanned by n, v, and w. Since n is a unit vector
orthogonal to v and w, the volume is precisely the signed area of the parallelogram spanned by v
and w. (Cf. Proposition 5.1 of Chapter 1 and its proof.)

8.2.6 a. dω = xexy dy ∧ dx = −xexy dx ∧ dy


b. dω = 2zdz ∧ dx + 2xdx ∧ dy + 2ydy ∧ dz = 2(xdx ∧ dy + ydy ∧ dz + zdz ∧ dx)
c. dω = 2xdx ∧ dy ∧ dz + 2ydy ∧ dz ∧ dx + 2zdz ∧ dx ∧ dy = 2(x + y + z)dx ∧ dy ∧ dz
d. dω = (x2 dx1 + x1 dx2 ) ∧ dx3 ∧ dx4 = x2 dx1 ∧ dx3 ∧ dx4 + x1 dx2 ∧ dx3 ∧ dx4

8.2.7 Part (4) of Proposition 2.3 gives a necessary condition.


a. dω = 2dx ∧ dy, so there can be no such function f .
 
x
b. f = x2 y
y
c. dω = dy ∧ dx + dz ∧ dy + dx ∧ dz 6= 0, so there can be no such function f .
 
x
1 1
d. f y  = x3 + xyz + sin y + z 2
3 2
z
 
x 1
e. f = log(x2 + y 2 )
y 2
 y
f. (See Example 5(d).) Here we have ω = d arctan on x 6= 0, but in fact there is no
x
function f defined on R2 − {0} so that ω = df . For example, if we try piecing together different
“arctan”-type functions, we end up with a discontinuity when we wrap around the origin. (See
Theorem 3.2 and Example 9 of Section 3.)

8.2.8 Part (4) of Proposition 2.3 gives a necessary condition. Note that if such a (k − 1)-form
η exists, then η ′ = η + dρ will also work for any (k − 2)-form ρ.
a. η = xdy
222 8. DIFFERENTIAL FORMS AND INTEGRATION ON MANIFOLDS

b. η = 12 x2 dy
c. dω = dz ∧ dx ∧ dy, so there can be no such 1-form η.
d. η = yz(dz − dx)
e. dω = dx ∧ dy ∧ dz, so there can be no such 1-form η.
f. dω = (x2 + y 2 + z 2 )−1 dx ∧ dy ∧ dz, so there can be no such 1-form η.
g. η = x1 x5 dx2 ∧ dx3 ∧ dx4

8.2.9 When we say “extending by linearity,” technically this is only correct if we think of the
set of forms as a module over the ring of smooth functions. We mean, of course, that ⋆(f dx+gdy) =
f ⋆dx + g⋆dy = f dy − gdx, etc.
∂f ∂f ∂f ∂f
a. We have df = dx + dy, so ⋆(df ) = dy − dx, and, thus, d⋆(df ) =
∂x ∂y ∂x ∂y
∂2f ∂2f  ∂2f ∂2f 
dx ∧ dy − dy ∧ dx = + dx ∧ dy.
∂x2 ∂y 2 ∂x2 ∂y 2
∂f ∂f ∂f ∂f ∂f ∂f
b. df = dx+ dy + dz, so ⋆(df ) = dy ∧dz + dz ∧dx+ dx∧dy, and, thus,
∂x ∂y ∂z ∂x ∂y ∂z
∂2f ∂2f ∂2f  ∂2f ∂2f ∂2f 
d⋆(df ) = dx ∧ dy ∧ dz + dy ∧ dz ∧ dx + dz ∧ dx ∧ dy = + + dx ∧ dy ∧ dz.
∂x2 ∂y 2 ∂z 2 ∂x2 ∂y 2 ∂z 2

8.2.10 Suppose df = λω. Then 0 = d(df ) = dλ ∧ ω + λdω. Wedging this equation with ω gives

0 = (dλ ∧ ω) ∧ ω + λdω ∧ ω = dλ ∧ (ω ∧ ω) + λdω ∧ ω = λdω ∧ ω,

inasmuch as ω ∧ ω = 0 because ω is a 1-form. Since λ is nowhere-zero, we conclude that dω ∧ ω = 0.



8.2.11 a. g ∗ dx = cos udu and g ∗ ( 1 − x2 ) = cos u (since cos > 0 on (−π/2, π/2)), so g∗ ω = du.
b. g∗ dx = −6 sin 2vdv and g∗ dy = 6 cos 2vdv, so g∗ ω = (−3 sin 2v)(−6 sin 2v)dv +
(3 cos 2v)(6 cos 2v)dv = 18dv.
c. g∗ dx = 3 cos 2vdu − 6u sin 2v, g∗ dy = 3 sin 2vdu + 6u cos 2v, so g∗ ω = (−3u sin 2v)
(3 cos 2vdu − 6u sin 2v) + (3u cos 2v)(3 sin 2vdu + 6u cos 2v) = 18u2 dv.
d. g∗ dx = − sin udu, g∗ dy = cos udu, and g∗ dz = dv, so g∗ ω = v(− sin udu) +
(cos u)(cos udu) + (sin u)dv = (cos2 u − v sin u)du + (sin u)dv.
e. g∗ dx = − sin udu, g∗ dy = cos udu, and g∗ dz = dv, so g∗ ω = v(− sin udu)∧(cos udu)+
sin u(dv) ∧ (− sin udu) = sin2 udu ∧ dv.
f. g∗ dx1 = − sin udu and g∗ dx4 = − sin vdv, so g∗ ω = (sin v)(− sin udu) + (sin u)
(− sin vdv) = − sin u sin v(du + dv).
g. g∗ dx3 = cos udu and g∗ dx4 = − sin vdv, so g∗ ω = (cos u)(cos udu)−(sin v)(− sin vdv)
= cos2 udu + sin2 vdv.
h. g∗ dx1 = − sin udu, g∗ dx2 = cos vdv, g∗ dx3 = cos udu, and g∗ dx4 = − sin vdv, so
 
g∗ ω = (− sin u)(− sin udu) + (cos u)(cos udu) ∧ (− sin v)(− sin vdv) + (cos v)(cos vdv) = du ∧ dv.
8.2. DIFFERENTIAL FORMS 223

8.2.12 Of course, by Proposition 2.4, one’s answers should agree.


a. dω = 0, so g ∗ (dω) = 0.
b. g∗ ω = 18dv, so d(g∗ ω) = 0.
c. dω = 2dx ∧ dy, so g∗ (dω) = 36udu ∧ dv = d(18u2 dv).
d. d(g∗ ω) = (sin u + cos u)du ∧ dv.
e. Since g∗ ω is a 2-form on R2 , d(g∗ ω) = 0.
f. d(g∗ ω) = (sin u cos v − cos u sin v)du ∧ dv.
g. dω = dx1 ∧ dx3 − dx2 ∧ dx4 ; since both g∗ dx1 and g∗ dx3 are multiples of du and both
g∗ dx2 and g∗ dx4 are multiples of dv, g∗ (dω) = 0.
h. Since g∗ ω is a 2-form on R2 , d(g∗ ω) = 0.

8.2.13 Using the result immediately preceding Proposition 2.4 (or Exercise 17), we have (ex-
panding in cofactors along the third row)


sin φ cos θ ρ cos φ cos θ −ρ sin φ sin θ


g∗ (dx ∧ dy ∧ dz) = det(Dg)dρ ∧ dφ ∧ dθ = sin φ sin θ ρ cos φ sin θ ρ sin φ cos θ dρ ∧ dφ ∧ dθ

cos φ −ρ sin φ 0

= cos φ(ρ2 sin φ cos φ) + ρ sin φ(ρ sin2 φ) dρ ∧ dφ ∧ dθ = ρ2 sin φdρ ∧ dφ ∧ dθ.

8.2.14 a. By part (4) of Proposition 2.3, if ω = dη, then dω = d(dη) = 0. The closed 1-form
ω = (−ydx + xdy)/(x2 + y 2 ) is, however, not exact on R2 − {0}. (See Theorem 3.2 and Example
9 of Section 3.)
b. Since dω = dφ = 0, by part (3) of Proposition 2.3, d(ω ∧ φ) = dω ∧ φ ± ω ∧ dφ = 0.
c. Suppose ω = dη and dφ = 0. Then d(η ∧ φ) = dη ∧ φ ± η ∧ dφ = ω ∧ φ, so ω ∧ φ is
indeed exact.

8.2.15 Since dx1 , . . . , dxn give a basis for (Rn )∗ , there are scalars aij , j = 1, . . . , n, so that
Pn
ωi = j=1 aij dxj . Using the hypothesis, we have

k
X k X
X n X X
0= dxi ∧ ωi = aij dxi ∧ dxj = aij dxi ∧ dxj + aij dxi ∧ dxj
i=1 i=1 j=1 1≤i,j≤k 1≤i≤k
k+1≤j≤n
X  X
= aij − aji dxi ∧ dxj + aij dxi ∧ dxj .
1≤i<j≤k 1≤i≤k
k+1≤j≤n

Since {dxi ∧ dxj : i < j} gives a basis for Λ2 (Rn )∗ , we infer that aij − aji = 0 for 1 ≤ i < j ≤ k and
aij = 0 for k + 1 ≤ j ≤ n. This is what we needed to establish.
224 8. DIFFERENTIAL FORMS AND INTEGRATION ON MANIFOLDS

8.2.16 By Proposition 2.4, we have (g◦ h)∗ dxi = d(g◦ h)i = d(gi ◦ h) = d(h∗ gi ) = h∗ (dgi ) =
h∗ (g∗ dxi ). It now follows from the definition of pullback that
X X X
(g◦ h)∗ fI dxI = (g◦ h)∗ fI (g◦ h)∗ dxI = fI ◦ (g◦ h)(g◦ h)∗ dxi1 ∧ · · · ∧ (g◦ h)∗ dxik
I I I
X
= (fI ◦ g)◦ h(g◦ h)∗ dxi1 ∧ · · · ∧ (g◦ h)∗ dxik
I
X
= h∗ (g∗ fI )h∗ (g∗ dxi1 ) ∧ · · · ∧ h∗ (g∗ dxik )
I
X  X 
= h∗ (g∗ fI )g∗ dxI = h∗ g∗ fI dxI ,
I I
as required.

8.2.17 a. By definition of the sign of the permutation σ (see p. 306), dxσ(1) ∧dxσ(2) ∧· · ·∧dxσ(n) =
sign(σ)dx1 ∧ dx2 ∧ · · · ∧ dxn .
b. We have
X
n  X
n  X
n 
ω1 ∧ ω2 ∧ · · · ∧ ωn = a1j1 dxj1 ∧ a1j2 dxj2 ∧ · · · ∧ a1jn dxjn
j1 =1 j2 =1 jn =1
X
= a1j1 a2j2 . . . anjn dxj1 ∧ dxj2 ∧ · · · ∧ dxjn
X
= a1σ(1) a2σ(2) . . . anσ(n) dxσ(1) ∧ dxσ(2) ∧ · · · ∧ dxσ(n)
permutations σ
X
= a1σ(1) a2σ(2) . . . anσ(n) sign(σ)dx1 ∧ · · · ∧ dxn = det Adx1 ∧ · · · ∧ dxn ,
permutations σ

as required.
∂gi
c. Setting aij = , the result is immediate from part b.
∂xj

8.2.18 First note that it suffices to check the equality when the vj are basis vectors for the
following reason: By the properties of determinant, the function T : |Rn × ·{z
· · × Rn} → R defined
k times
by T (v1 , . . . , vk ) = det[φi (vj )] is alternating and multilinear and therefore defines an element of
Λk (Rn )∗ . As we know, e.g., from Exercise 1, the values of T on k-tuples of standard basis vectors
determine T uniquely.
P
n
Set φi = aij dxj . Then, as we saw in Exercise 17,
j=1
X
n  X
n 
φ1 ∧ · · · ∧ φk = a1j1 dxj1 ∧ · · · ∧ a1jk dxjk
j1 =1 jk =1
X
= a1j1 . . . akjk dxj1 ∧ · · · ∧ dxjk ,

so, taking v1 = eJ1 , . . . , vk = eJk ,


X
φ1 ∧ · · · ∧ φk (eJ1 , . . . , eJk ) = a1j1 . . . akjk (dxj1 ∧ · · · ∧ dxjk )(eJ1 , . . . , eJk ).
8.2. DIFFERENTIAL FORMS 225

Now,

sign(σ), j1 = σ(J1 ), . . . , jk = σ(Jk ) for some permutation σ
(dxj1 ∧ · · · ∧ dxjk )(eJ1 , . . . , eJk ) = ,
0, otherwise

and so, by Proposition 5.18 of Chapter 7, we have



a
1J1 · · · akJk
. .. ..  
φ1 ∧ · · · ∧ φk (eJ1 , . . . , eJk ) = .. . . = det φi (eJj ) .

akJ1 · · · akJk

8.2.19 Note, first of all, that if v ∈ Rm , then g∗ dxi (a)(v) = Dgi (a)v (e.g., by Proposition 2.4).
Then for any ordered k-tuple I = (i1 , . . . , ik ) and vectors v1 , . . . , vk ∈ Rm , we have (using Exercise
18 at the last step)
 
g∗ dxI (v1 , . . . , vk ) = g∗ dxi1 ∧ · · · ∧ g∗ dxik (v1 , . . . , vk ) = dgi1 ∧ · · · ∧ dgik (v1 , . . . , vk )
= det [dgiℓ (vj )]1≤ℓ,j≤k = det [Dgiℓ (vj )]1≤ℓ,j≤k = det [(Dg(a)vj )iℓ ]

= dxI Dg(a)v1 , . . . , Dg(a)vk .

The case of the general k-form follows by linearity (over smooth functions).
n
X ∂f
8.2.20 Suppose d′ satisfies all the properties listed in Proposition 2.3 as well as d′ f = dxj .
∂xj
j=1
P
Let ω ∈ Ak (U ) be arbitrary. Then ω = I fI dxI . We have

X 
d′ ω = d′ fI dxI by Property (1)
I
X
= d′ fI ∧ dxI + fI d′ (dxI ) by Property (2)
I
X
= dfI ∧ dxI + fI d′ (dxI ) by the additional hypothesis
I
X
= dfI ∧ dxI + fI d′ (dxi1 ∧ · · · ∧ dxik )
I
X 
= dfI ∧ dxI + fI d′ dxi1 ∧ · · · ∧ dxik − · · · ± dxi1 ∧ · · · ∧ d′ dxik by Property (3)
I
X
= dfI ∧ dxI by Property (4), using the hypothesis that dxi = d′ xi
I

= dω,

as required.
226 8. DIFFERENTIAL FORMS AND INTEGRATION ON MANIFOLDS

8.3. Line Integrals and Green’s Theorem


Z Z Z 1

8.3.1 a. g∗ ω = 2tdt, so ω= g ω= 2tdt = 1.
C [0,1] 0
Z Z Z 1
b. g∗ ω = 3t2 dt, so ω= g∗ ω = 3t2 dt = 1.
C [0,1] 0
Z Z Z 1

c. g∗ ω = −2(1 − t)dt, so ω= g ω=− 2(1 − t)dt = −1.
C [0,1] 0
Z Z Z π/2
d. g∗ ω = −4 cos3 t sin tdt, so ω= g∗ ω = −4 cos3 t sin tdt = −1.
C [0,π/2] 0
Z

e. g∗ ω = (1 − cos 2t)(2 cos 2t) + (sin 2t)(2 sin 2t) dt = 2(cos 2t − cos 4t)dt, so ω =
Z Z π/4 C

g ω= 2(cos 2t − cos 4t)dt = 1.
[0,π/4] 0
Z Z

f. g∗ ω = (1−sin t)(− sin t)+cos t(− cos t) dt = −(sin t+cos 2t)dt, so ω= g∗ ω =
C [0,π/2]
Z π/2
− (sin t + cos 2t)dt = −1.
0
Remark: Note that ω = d(xy), and the integral depends only on the endpoints.
Z Z Z 1
5
8.3.2 a. g∗ ω = (t2 + t)dt, so ω= g∗ ω =
(t2 + t)dt = .
C [0,1] 0 6
Z Z Z 1
13
b. g∗ ω = (t4 + 2t2 )dt, so ω= g∗ ω = (t4 + 2t2 )dt = .
C [0,1] 0 15
Z Z Z 1
∗ 5
∗ 2
c. g ω = −((1 − t) + (1 − t))dt, so ω= g ω=− ((1 − t)2 + (1 − t))dt = − .
C [0,1] 0 6
d. g∗ ω = −2 cos t sin t(cos4 t + cos2 t)dt, so
Z Z Z π/2
∗ 5
ω= g ω= −2 cos t sin t(cos4 t + cos2 t)dt = − .
C [0,π/2] 0 6
∗ 2

e. g ω = (1 − cos 2t) (2 cos 2t) + (sin 2t)(2 sin 2t) dt, so
Z Z Z π/4

 5 π
ω= g ω= 2 cos 2t − 2 cos2 2t + cos3 2t + sin2 2t dt = − .
C [0,π/4] 0 3 4
∗ 2

f. g ω = (1 − sin t) (− sin t) + cos t(− cos t) dt, so
Z Z Z π/2
∗ π 5
ω= g ω=− (− sin t + 2 sin2 t − sin3 t + cos2 t)dt = − .
C [0,π/2] 0 4 3
 
cos t
8.3.3 a. We parametrize C by g(t) = , 0 ≤ t ≤ 2π. Then g∗ (xy 3 dx) = − sin4 t cos tdt,
sin t
Z Z 2π
3
so xy dx = − sin4 t cos tdt = 0.
C 0
 
t
b. We parametrize C by g(t) =  1 − 2t , 0 ≤ t ≤ 1. Then g∗ (zdx + xdy + ydz) =
2+t
8.3. LINE INTEGRALS AND GREEN’S THEOREM 227
Z Z 1
3
−3t + 3, so zdx + xdy + ydz = 3(1 − t)dt = .
C 0 2
 
1+t
c. We parametrize C by g(t) =  3t , 0 ≤ t ≤ 1. Then g∗ (y 2 dx + zdy − 3xydz) =
1 − 2t
Z Z 1
2
(27t2 + 12t + 3)dt, so y dx + zdy − 3xydz = (27t2 + 12t + 3)dt = 18.
C 0
d. The tricky part here
 is parametrizing
 C. An orthonormal
 basis for the plane x +
1 1
1 1
y + z = 0 is given by v1 = √  −1  and v2 = √  1 . We thus are able to parametrize
2 0 6 −2
C by g(t) = (cos t)v1 + (sin t)v2 , 0 ≤ t ≤ 2π. (Note that v1 × v2 points upwards, so mov-
ing from v1to v2 gives a counterclockwise
 motionas prescribed in theproblem.) Therefore,
Z
∗ cos t sin t sin t cos t 1 2
g (ydx) = − √ + √ −√ + √ dt = − √ + sin t cos t dt, and so ydx =
Z 2π  2 6 2 6 2 3 3 C
1 2 π
− √ + sin t cos t dt = − √ .
0 2 3 3 3

8.3.4 Recall that the circle x2 + y 2= 2x has the


 equation r = 2 cos θ in polar coordinates.
2
2 cos t
We therefore parametrize C by g(t) =  2 cos t sin t , −π/2 ≤ t ≤ π/2. So, on [0, π/2], we have
2| sin t|

g∗ (ydx + zdy + xdz) = 2 − sin 2t + 2 sin t(cos2 t − sin2 t) + 2 cos3 t dt, and Zon [−π/2, 0] we have
2

g∗ (ydx + zdy + xdz) = 2 − sin2 2t − 2 sin t(cos2 t − sin2 t) − 2 cos3 t dt. Thus, ydx + zdy + xdz =
Z π/2 Z π/2 C
8
−2 sin2 2tdt + 4 sin t(cos2 t − sin2 t)dt = −π − .
−π/2 0 3
 
1 + cos t
An alternative (perhaps easier) parametrization is given by g(t) =  sin t , 0 ≤ t ≤
2 sin(t/2)

2π. Then we get g∗ (ydx + zdy + xdz) = − sin2 t + 2 sin(t/2) cos t + (1 + cos t) cos(t/2) dt, so
Z Z 2π Z 2π

ydx + zdy + xdz = − sin2 t + 2 sin(t/2) cos t + (1 + cos t) cos(t/2) dt = − sin2 tdt +
Z π
C 0 0
2 2
 4  8
4 sin u(2 cos u − 1) + (1 − sin u) cos u du = −π + 4 − 2 = −π − .
0 3 3
 
x 1
8.3.5 Note that ω = df , where f = log(x2 + y 2 ). Then, by Proposition 3.1, we have
y 2
Z    
2 1
ω =f −f = log 2. (Note that f is C1 along C, as we are told that C cannot pass
C 2 1
through the origin.)

8.3.6 We use
 the
 second
Z x0 method Z yillustrated in the text to construct potential functions.
 
x0 0
1 2 2 x 1
a. f = xdx + (x0 + y)dy = (x0 + y0 ) + x0 y0 . Taking f = (x2 +
y0 0 0 2 y 2
y 2 ) + xy, we have indeed that df = (x + y)(dx + dy) = ω.
228 8. DIFFERENTIAL FORMS AND INTEGRATION ON MANIFOLDS
 
0
b. dω 6= 0, so ω cannot be exact. Letting C be the rectangle with vertices at ,
−1
      Z
1 1 0
, , and , we check easily that ω = 2.
−1 1 1 C
  Z x0 Z y0  
x0 x 2 2 x0 2 1 3 x
c. f = e dx + (x0 + y )dy = e − 1 + x0 y0 + y0 . Taking f =
y0 0 0 3 y
1
ex + x2 y + y 3 − 1, we have df = (ex + 2xy)dx + (x2 + y 2 )dy = ω, as required.
3
 
x0 Z x0 Z y0 Z z0
  2 2 1
d. f y0 = x dx + (x0 + y )dy + (x0 + y0 + z 2 )dz = (x30 + y03 + z03 ) +
0 0 0 3
z0
 
x
1
x0 y0 + x0 z0 + y0 z0 . Taking f y  = (x3 + y 3 + z 3 ) + xy + xz + yz, we find that df = (x2 + y +
3
z
z)dx + (y 2 + x + z)dy + (z 2 + x + y)dz = ω, as required.
   
x0 Z z0 x

e. f y0 =  (x0 y0 +y0 cos z)dz = x0 y0 z0 +y0 sin z0 . Taking f y  = xy 2 z+y sin z,
2 2 
z0 0 z
we find that df = y 2 zdx + (2xyz + sin z)dy + (xy 2 + y cos z)dz = ω, as required.

8.3.7 a. Differentiating (and using the result of Example 1 of Chapter 3, Section 5) we obtain
1 X  X 
n n

dω = f (kxk) xi dxi ∧ xi dxi = 0,
kxk
i=1 i=1

since the wedge product of any 1-form with itself is 0.


Z u
b. Define G(u) = sf (s)ds, and set g(x) = G(kxk). Then dg = G′ (kxk)dkxk =
P 0
xi dxi P 
G′ (kxk) = f (kxk) xi dxi , as required.
kxk

8.3.8 Of course, the ludicrous nature of the path makes it clear that the 1-form in question
must be exact. Indeed, applying
  one of the methods in Example 4, we find that the 1-form is the
x
1 2
derivative of the function f y  = (3x2 + y 2 + ez ) + xy 2 + x2 z + eyz . Applying Proposition 3.1,
2
z
we have    
Z e 1
5 33 7
df = f (g(1)) − f (g(0)) = f 4 − f −1 = e4 + e2 + e + .
C 2 2 2
1 0
I I I
8.3.9 Since d(xy) = ydx + xdy, we have 0 = d(xy) = ydx + xdy. In the event that C
C C C
bounds a region Ω, then, intuitively speaking, the first integral computes the area of Ω by slicing the
region into thin vertical strips, whereas the second integral computes the area by slicing the region
into thin horizontal strips. Why the sign discrepancy? As we go around C counterclockwise, the
first integral corresponds to an x-integral going from right to left, and actually gives the negative
of the area.
8.3. LINE INTEGRALS AND GREEN’S THEOREM 229

Remark: It is tempting to try to apply Green’s Theorem, but we certainly do not “know” that
every closed curve bounds a region (or a union thereof).
Z
8.3.10 a. Traversing the edges counterclockwise, starting on the x-axis, we have ω =
Z 1 Z 1 Z 1 C

x2 dx + 2ydy − (x2 − 1)dx = 2. Letting R be the square with boundary C, Green’s


0 0 Z Z0 Z Z
1
Theorem yields ω= dω = 4ydx ∧ dy = 4ydA = 4 · y · area(R) = 4 · · 1 = 2.
C R R R 2
Z
b. Traversing the edges counterclockwise, starting on the x-axis, we have ω =
Z 1 Z 1 C

1dy − (−1)dx = 2. Letting R be the square with boundary C, Green’s Theorem yields
Z0 Z 0 Z Z Z 1Z 1
2 2 2 2
ω= dω = 3(x + y )dx ∧ dy = 3(x + y )dA = 3(x2 + y 2 )dydx = 2.
C R R R
 0 0 Z
cos t
c. Of course, we parametrize C by g(t) = a , 0 ≤ t ≤ 2π. We have ω =
sin t C
Z 2π Z
a4 2π 2
a4 2(cos2 t)(sin2 t)dt = sin 2tdt = πa4 /2. Letting D denote the disk of radius a cen-
0 2 0 Z Z Z
tered at the origin and applying Green’s Theorem, we have ω = dω = (x2 + y 2 )dA =
Z 2π Z a C D D
3 4
r drdθ = πa /2.
0 0
  Z
cos2 t
d. We parametrize C by g(t) = 2 , −π/2 ≤ t ≤ π/2. Then ω =
cos t sin t C
Z π/2
32
(2 cos t)(4 cos2 t)dt = . Letting D be the disk bounded by C, Green’s Theorem tells us that
−π/2 3
Z Z Z ! Z p
xdx + ydy p
ω = dω = p ∧ (−ydx + xdy) + x2 + y 2 (2dx ∧ dy) = 3 x2 + y 2 dA =
C D D x2 + y 2 D
Z π/2 Z 2 cos θ Z π/2
32
3r 2 drdθ = 8 cos3 θdθ = .
−π/2 0 −π/2 3
     
0 a a
e. Let C1 be the line segment from to , C2 the circular arc from to
0 0 0
 √   √   
a/ 2 a/ 2 0
√ , and C3 the line segment from √ to . Then
a/ 2 a/ 2 0

Z Z Z Z Z π/4
3 2 3
ω= ω+ ω+ ω =0+a (sin3 θ + cos3 θ)dθ + 0 = a .
C C1 C2 C3 0 3

Z
On the other hand, by Green’s Theorem, letting Ω denote the sector in question, we have ω=
Z Z Z π/4 Z a C
2
dω = 2 (x + y)dx ∧ dy = 2 r 2 (cos θ + sin θ)drdθ = a3 .
Ω Ω 0 0 3
230 8. DIFFERENTIAL FORMS AND INTEGRATION ON MANIFOLDS
Z Z Z
8.3.11 Let D be the disk bounded by C. Then ω = dω = (1 + 2y)dx ∧ dy =
Z C D D
(1 + 2y)dA = (1 + 2y)area(D) = π. (Since D is symmetric about the x-axis, y = 0.)
D
 
a cos t
8.3.12 Proceeding as in Example 8, we parametrize the boundary curve C by g(t) = ,
b sin t
Z Z
1 1 2π
0 ≤ t ≤ 2π, and calculate the area by the line integral −ydx + xdy = abdt = πab.
2 C 2 0

8.3.13 Proceeding as in Example 8, we use the parametrization givenZ in the hint (with
1
0 ≤ t ≤ 2π) to find that the area is given by the line integral −ydx + xdy =
Z 2π Z 2π Z 2π 2 C
1 3 3 3π
3(sin4 t cos2 t + cos4 t sin2 t)dt = sin2 t cos2 tdt = sin2 2tdt = .
2 0 2 0 8 0 8
8.3.14 It is essential to realize that to apply the method of Example 8 we must have a closed
curve. The boundary C of the region in question consists of the trochoid (traversed right to left),
the two vertical line segments, and the horizontal line segment (traversed left to right). Thus, the
area is given by

Z Z 2π Z a−b
1 1  1
−ydx + xdy = − −(a − b cos t)(a − b cos t) + (at − b sin t)(b sin t) dt + (2πa)dt
2 C 2 0 2 0
= π(2a + b2 ).
2

8.3.15 It is essential to realize that to apply the method of Example 8 we must have a closed
curve. The boundary C of the region in question consists of the evolute (traversed right to left)
and the vertical line segment from A to B. Thus, the area is given by
Z Z 2π Z
1 a2  1 0
−ydx + xdy = −(sin t − t cos t)(t cos t) + (cos t + t sin t)(t sin t) dt + (a)dt
2 C 2 0 2 −2πa
Z  2 
a2 2π 2 2 2 4π
= t dt + πa = πa +1 .
2 0 3
I Z Z
x2 2
8.3.16 a. Let C = ∂S. Then (e − 2xy)dx + (2xy − x )dy = 2ydx ∧ dy = 2ydA =
C S S
2yarea(S) = 2 · 2 · 9 = 36. (Since S is symmetric about the line y = 2, we have y = 2.)
I Z
2 2 sin y
b. Let C = ∂S. Then (2xy − y )dx + (x + e )dy = 6ydx ∧ dy = 6yarea(S) =
C S
7 9 9
6· (15 + π) = 21(15 + π).
2 4 4
 
n t
t1 1 1
8.3.17 a. If T = , then we have σ(T) = −n2 t1 + n1 t2 = = 1, since n and T span
t2 n 2 t2
a parallelogram (square) with signed area 1. Alternatively, note that since we obtain T when we
rotate n by angle π/2, for any v ∈ R2 , we have σ(v) = v · T (see the beginning of Section 5 of
Chapter 1).
8.3. LINE INTEGRALS AND GREEN’S THEOREM 231

b. Suppose we parametrize C by g(t), a ≤ t ≤ b. Then


Z Z b Z b


σ= g σ= −n2 (g(t))g1′ (t) + n1 (g(t))g2′ (t) dt
C a a
Z b Z b Z
′ ′
= g (t) · T(g(t))dt = kg (t)kdt = ds.
a a C
(More directly, see the discussion at the beginning of this section.)
c. Although we didn’t say so explicitly, the definition
Z ofZthe integral of a 1-form ω is
precisely this: if T is the unit tangent vector along C, then ω= f ds, where f = ω(T).
C C
   
n1t1
8.3.18 If n = and T =
, then, since T is obtained by rotating n an angle π/2
n2t2
 
−F2
counterclockwise, we have t1 = −n2 and t2 = n1 , so F · n = F1 t2 − F2 t1 = · T, so
F1
Z Z   Z Z
−F2
F · nds = · Tds = −F2 dx + F1 dy = ⋆ω.
C C F1 C C
By Green’s Theorem, when C = ∂S, we have
Z Z Z Z 
∂F1 ∂F2 
F · nds = ⋆ω = d(⋆ω) = + dx ∧ dy.
C C S S ∂x ∂y
   
r r cos θ
8.3.19 Consider g : [a, b] × [0, 2π] → R2 , g = . Note that as we trace counter-
| {z } θ r sin θ
R    
a b
clockwise around the boundary of R, the image under g traces the line segment from to ,
0 0
   
b a
the outer circle counterclockwise, then the line segment from to , and, finally, the inner
0 0
Z Z
circle clockwise. Then we have ω= g∗ ω, inasmuch as the two line integrals over the line
∂Ω ∂R
segment cancel. Thus, applying Theorem 3.4 on R, we have
Z Z Z Z Z
ω= g∗ ω = d(g∗ ω) = g∗ (dω) = dω,
∂Ω ∂R R R Ω
the last equality holding because g is orientation-preserving and one-to-one except on a set of area
0.

8.3.20 a. Using Fubini’s Theorem as in the proof of Theorem 3.4, we have


Z Z  
∂Q ∂P
dω = − dA
S S ∂x ∂y
Z b Z a(1−y/b) Z a Z b(1−x/a)
∂Q ∂P
= dxdy − dydx
0 0 ∂x 0 0 ∂y
Z b     Z a    
a(1 − y/b) 0 x x
= Q −Q dy − P −P dx
0 y y 0 b(1 − x/a) 0
Z a   Z 1     Z b  
t (1 − t)a (1 − t)a 0
= P dt + (−a)P + bQ dt − Q dt
0 0 0 tb tb 0 t
232 8. DIFFERENTIAL FORMS AND INTEGRATION ON MANIFOLDS
Z
= ω,
∂S

as required.
Z h(x)  
x
b. Using Exercise 7.2.23, if we set F (x) = Q dy, then we have
g(x) y

Z h(x)      
′ ∂Q x x x
F (x) = dy + Q h′ (x) − Q g ′ (x),
g(x) ∂x y h(x) g(x)

and so
Z h(b)   Z h(a)   Z b
b a
Q dy − Q dy = F ′ (x)dx =
g(b) y g(a) y a
Z bZ h(x)   Z b   Z b  
∂Q x x x
dydx + Q h′ (x)dx − Q g ′ (x)dx.
a g(x) ∂x y a h(x) a g(x)

Thus, we have

Z bZ h(x)   Z b   Z h(b)  
∂Q x x b
dydx = Q g ′ (x)dx + Q dy
a g(x) ∂x y a g(x) g(b) y
Z b   Z h(a)   Z
x a
− Q h′ (x)dx − Q dy = Qdy.
a h(x) g(a) y ∂Ω

On the other hand, it is immediate from the Fundamental Theorem of Calculus that
Z bZ h(x)   Z b     Z
∂P x x x
dydx = P −P dx = P dx.
a g(x) ∂y y a h(x) g(x) ∂Ω

8.3.21 To be completely rigorous here, we need the Jordan curve theorem in some form, but the
idea is quite simple. If C has no self-intersection, then we get ±2π, as in Example 10, depending
on whether the origin lies in the region bounded by C. If C has some self-intersection points, start
at one of them, a, and proceed counterclockwise around C until the curve first returns to a. If the
planar region bounded by that portion of the curve contains the origin, then the integral picks up
2π. We delete that portion of the curve and continue. Continue in this manner to delete all loops
in C due to self-intersections. We are guaranteed a finite number. In the end, we are left with
one curve that either encircles the origin or doesn’t. The total line integral is the sum of all the
individual line integrals, each of which is either ±2π or 0; thus, the sum is an integral multiple of
2π.
Remark: As one learns in differential topology (cf. Guillemin and Pollack, Differential Topology,
Prentice Hall, 1974), we can compute the winding number by choosing a “generic” ray from the
origin, which will cross C non-tangentially a finite number of times, and then count those points
of intersection with a sign, + if the curve crosses counterclockwise at the point, − if it crosses
clockwise.
8.3. LINE INTEGRALS AND GREEN’S THEOREM 233

8.3.22 This problem proceeds exactly like Exercise 21. The only difference is that the 1-form
we’re integrating is the sum of

−ydx + (x − 1)dy −ydx + (x + 1)dy


ω1 = A and ω2 = B .
(x − 1)2 + y 2 (x + 1)2 + y 2
Z  
1
As the discussion of Example 10 shows, if C = ∂Ω, then ω1 = (±2π)A if ∈ Ω and 0
C 0
Z  
−1
otherwise, and, likewise, ω2 = (±2π)B if ∈ Ω and 0 otherwise.
C 0

8.3.23 If C = ∂S, then, by Green’s Theorem, the work done by F on the ant is given by
Z Z Z
F · Tds = (y 3 + x2 y)dx + (2x2 − 6xy)dy = (4x − 6y − 3y 2 − x2 )dx ∧ dy
C
ZC S

= 7 − (x − 2)2 − 3(y + 1)2 dA.
S

Now,
Z it follows from Exercise 7.1.5 that if f is continuous and f ≥ 0 on a subset of finite area, then
f dA is largest when S = {x : f (x) ≥ 0}. If we remove any of S, then the integral goes down,
S
and if we continue outside S, thenwe add a negative quantity tothe integral and it goes down.
x
Thus, in our case, we want S = : (x − 2)2 + 3(y + 1)2 ≤ 7 , and C should be the ellipse
y
(x − 2)2 + 3(y + 1)2 = 7.

8.3.24 Z The proof of Theorem 3.2—in particular, of (2) =⇒ (3)—seems to require that the
integral ω be path-independent. In fact, all that is needed is path-independence along paths
that are composed of line segments parallel to the coordinate axes. Then taking any two such
paths from A to B whose union is a simple closed curve, the hypothesis of the problem and Green’s
Theorem tell us that the line integrals agree.

8.3.25 a. Suppose we row a distance d downstream and then back upstream. The total time
d d v 2d
is therefore + = 2d 2 2
> , which is the time it would take with no current
v+c v−c v −c v
whatsoever.
b. Let’s say the current vector is c (constant). We row with ground velocity v, so
the resultant velocity of the boat is v + c. We write kvk = υ and kck = c. Note that since
υ sin β = c sin α (using the notation in Figure 3.15), we have υ 2 cos2 β − c2 cos2 α = υ 2 − c2 . Now,
the time of the trip is
Z Z Z
ds ds υ cos β − c cos α
= = 2 cos2 β − c2 cos2 α
ds
C kv + ck C υ cos β + c cos α C υ
Z Z Z
ds c ds
> − 2 2
cos αds > ,
C υ cos β υ −c C C υ
Z Z Z
1 c
which is the time of the trip with no current. Note that cos αds = c·Tds = · Tds = 0.
C c C c C
234 8. DIFFERENTIAL FORMS AND INTEGRATION ON MANIFOLDS

Remark: We can avoid the trigonometry by calculating as follows: Note, first of all, that
v+c
T= . Now,
kv + ck
1  2 2 
(v · T)2 − (c · T)2 = v · (v + c) − c · (v + c) = (v − c) · (v + c) = υ 2 − c2 .
kv + ck2
Therefore,
1 1 1 (v − c) · T v·T c·T
= = = −
kv + ck (v + c) · T (v + c) · T (v − c) · T (v · T)2 − (c · T)2 υ 2 − c2
1 c·T 1 c·T
> − 2 2
> − 2 ,
v·T υ −c υ υ − c2
which gives the same result as before.
       
x θ cos θ cos τ
8.3.26 We have x = = f = b +a , so a straightforward calculation
y τ sin θ sin τ
yields f ∗ (dx ∧ dy) = ab sin(τ − θ)dθ ∧ dτ . Inparticular,
 by the Inverse Function
  Theorem, so long
θ x
as sin(τ − θ) 6= 0, f is locally invertible and is locally a C1 function of . Indeed, it seems
τ y
reasonable
  to assume that, for our purposes, f has a global inverse and so there is a closed curve Γ
θ
in -space with f (Γ) = C. Suppose, moreover, that C = ∂Ω.
τ
   
θ cos θ
Now, the planar coordinates of the center of the wheel are given by y = g =b −
τ sin θ
 
cos τ
, so it follows that y is locally a C1 function of x. Let’s assume the wheel is 1 unit down the
sin τ
arm (as pictured) and has radius 1. Then as the point x moves along the curve C (say, parametrized
by F(t), 0 ≤ t ≤ T ), the center of the wheel moves along a path G(t) = (g◦ f −1 ◦ F)(t). Moreover,
− sin τ
the angle through which the wheel turns is given by α(t), where α′ (t) = G′ (t) · (inasmuch
cos τ
as the wheel rotates in a plane perpendicular to the arm from y to x). Now
Z T Z   Z
− sin τ
total angle = α′ (t)dt = g∗ dy · = b cos(τ − θ)dθ − dτ
0 Γ cos τ Γ
Z Z

= (f ) b cos(τ − θ)dθ − dτ = (f −1 )∗ b sin(τ − θ)dθ ∧ dτ
−1 ∗
C Ω
Z
1 1
= dx ∧ dy = area(Ω).
a Ω a

8.4. Surface Integrals and Flux

8.4.1 Although we can parametrize


  S directly, it is easiest to use the result of Example 5. The
1
1 3
outward unit normal is n =  2 , so g∗ σ = dx ∧ dy.
3 2
2
8.4. SURFACE INTEGRALS AND FLUX 235
   
0 4
a. The projection of S into the xy-plane is a triangle T with vertices at , , and
0 0
 
0 3
, whose area is therefore 4. Thus, the area of S is · 4 = 6.
2 2
Z Z Z
3 3  3 1 
b. (x − y + 3z)σ = x − y + (4 − x − 2y) dx ∧ dy = 6 − x − 4y dA =
S T 2 2 2 T 2
3 1 1 4 2 
· (6 − x − 4y) · area(T ) = 6 · 6 − ( ) − 4( ) = 16. (Recall that the center of mass of a triangle
2 2 2 3 3
is two-thirds the way down all the medians.)
 
  x
x
c. Let g : T → S be given by g = y . Then g∗ (zdx ∧ dy + ydz ∧ dx +
y
2 − x/2 − y
Z

xdy∧dz) = (2− 21 x−y)+y+ 12 x dx∧dy = 2dx∧dy, so the integral is just 2dx∧dy = 2area(T ) = 8.
T
An
 alternative
 solution is to recognize this as a flux integral. This integral gives the flux of
z Z Z
  1
F = y outwards across S, and therefore is the integral (F · n)σ = (x + 2y + 2z)σ =
x S S 3
4 4
area(S) = · 6 = 8.
3 3
 
  a cos θ  
θ   θ
8.4.2 Let S be the surface in question. We parametrize S by g = a sin θ , ∈Ω=
z z
z
{0 ≤ θ ≤ π, 0 ≤ z ≤ a sin θ}. (For future reference, we note that ggives 
a parametrization with
cos θ
outward-pointing normal pointing radially outwards.) We have n =  sin θ , so σ = cos θdy ∧ dz +
0
Z Z π Z a sin θ
sin θdz ∧ dx and g∗ σ = adθ ∧ dz. Thus, we have area(S) = adθ ∧ dz = adzdθ = 2a2 .
Ω 0 0

8.4.3 The projection of S onto the xy-plane is the region Ω bounded by (1 − y)2 = 2(x2 + y 2 ),
i.e., the ellipse x2 + (y + 1)2 /2 = 1. If γ is the angle between the tangent plane of the cone at any
√ √
point and the xy-plane, then it is easy to check that | cos γ| = 1/ 3, so area(S) = 3·area(ellipse) =
√ √ √
3π(1)( 2) = π 6.

8.4.4 Recall that the circle x2 + y 2 = 2y in R2 is given in polar coordinates


 by r = 2 sin θ,
  2 sin θ cos θ
θ
0 ≤ θ ≤ π. We parametrize the surface S in question by g : Ω → R3 , g =  2 sin2 θ , Ω =
z
z
 
   x
θ
: 0 ≤ θ ≤ π, |z| ≤ 2| cos θ| . Now, the unit outward-pointing normal to S is n =  y − 1 ,
z
0
so σ = xdy ∧ dz + (y − 1)dz ∧ dx and g∗ σ = 2(sin2 2θ + cos2 2θ)dθ ∧ dz = 2dθ ∧ dz. Therefore, we
Z Z π/2 Z 2 cos θ
have (using symmetry) area(S) = 2dθ ∧ dz = 8 dzdθ = 16.
Ω 0 0
236 8. DIFFERENTIAL FORMS AND INTEGRATION ON MANIFOLDS

8.4.5 Parametrizing by spherical coordinates parametrization (see Example 6), we pull back
the given 2-form ω = xdy ∧ dz + ydz ∧ dx + zdx ∧ dy to find

g∗ ω = a3 (sin φ cos θ)(sin2 φ cos θ) + (sin φ sin θ)(sin2 φ sin θ) + (cos φ)(sin φ cos φ) dφ ∧ dθ
= a3 sin φdφ ∧ dθ.
Z Z 2π Z π
Thus, ω= a3 sin φdφdθ = (2a3 )(2π) = 4πa3 . Now, we observe that the unit outward-
S 0 0  
x
1  1
pointing normal to S is y and the area 2-form of S is σ = (xdy ∧ dz + ydz ∧ dx + zdx ∧ dy).
a a
z
We thereby recover the usual formula for the surface area of a sphere: area(S) = 4πa2 .

8.4.6Using
Z the calculation
Z 2π Z π in Exercise 5, we have:
4
a. x2 σ = (sin φ cos θ)2 sin φdφdθ = π.
S 0 0 3
Z Z Z
b. By symmetry of the sphere S, x2 σ = y2σ = z 2 σ. (Officially, the function
    S S S
x y
T y  =  z , for example, maps S to itself preserving orientation, since T ∗ σ = σ. Therefore,
z
Z Z x Z Z Z Z
1 1 4π
z2 σ = T ∗ (z 2 σ) = x2 σ.) It follows that x2 σ = (x2 + y 2 + z 2 )σ = σ= .
S S S S 3 S 3 S 3

8.4.7 We obtain the outward-pointing unit normal by finding the cross product
 
e1 −(a + b cos v)(sin u) −b sin v cos u cos u cos v
∂g ∂g
 
× = e2 (a + b cos v)(cos u) −b sin v sin u = b(a + b cos v)  sin u cos v 
∂u ∂v
e3 0 b cos v sin v
 
cos u cos v
so n =  sin u cos v . Thus, the pullback of the area 2-form of the torus is
sin v

g∗ σ = (cos u cos v)g∗ (dy ∧ dz) + (sin u cos v)g∗ (dz ∧ dx) + (sin v)g∗ (dx ∧ dy)
= b(a + b cos v)(cos2 u cos2 v + sin2 u cos2 v + sin2 v)du ∧ dv = b(a + b cos v)du ∧ dv.
Z 2π Z 2π
(Also see Exercise 20.) Therefore, the area of the torus is b(a + b cos v)dudv = 4π 2 ab.
0 0

8.4.8 Without loss of generality, we take the planes to be z = c and z = c + h, −a ≤ c <


c + h ≤ a. Using the spherical coordinates parametrization, we have
Z 2π Z arccos(c/a)  
2 2 c+h c
area = a sin φdφdθ = 2πa − = 2πah.
0 arccos((c+h)/a) a a

Interestingly, the answer depends only on h and not on the location of the planes.
8.4. SURFACE INTEGRALS AND FLUX 237
 
sin v
∂g ∂g 
8.4.9 a. We have × = − cos v , which points upwards for u > 0.
∂u ∂v
u
Z
b. We have g∗ (xdz ∧ dx) = (u cos v)(− cos vdu ∧ dv), so xdz ∧ dx =
Z S
1Z 2π
π
(−u cos2 v)dvdu = − .
0 0 2
     
u 0 x
8.4.10 We have v = (1 − t) 0 + t y  for some t ∈ R. Therefore, 0 = (1 − t) + tz
    
0 1 z
 
    x
u 1 x
and t = 1/(1 − z). Therefore, we have = . Using the fact that  y  lies on the
v 1−z y
z
2 2
unit sphere, we find that (1 − z)u + z 2 = 1, so 1 − z 2 = (1 − z)2 (u2 + v 2 ), from
+ (1 − z)v
 
2 2 x  
which we infer that 1 + z = (1 − z)(u 2 + v 2 ) and z = u + v − 1 . That is,  y  = g u =
  u 2 + v2 + 1 v
2u z
1  . We can see that g is orientation-reversing in several ways: We can
2v
u2 + v 2 + 1 2 2
u +v −1
∂g ∂g
(without a great deal of mirth) calculate the cross product × and see that it is inward-
∂u ∂v
pointing. More geometrically: Note that going counterclockwise around a latitude circle on the
sphere results in a counterclockwise motion in the plane and that heading “uphill” towards the
north pole results in an outwards motion in the plane; thus, a positively-oriented basis for the
tangent plane of the sphere corresponds to a negatively-oriented basis for R2 .

8.4.11 a. If we parametrize by spherical coordinates, then g∗ (xdy ∧ dz) = sin3 φ cos2 θdφ ∧ dθ
Z Z 2π Z π

and xdy ∧ dz = sin3 φ cos2 θdφdθ = .
S 0 0 3
b. Let S ± denote the upper and lower hemispheres, parametrized respectively by
  x
± x   (perhaps it’s better to use polar coordinates). Then, letting D
g = p y
y
± 1 − x2 − y 2
Z Z Z 2π Z 1
x2 r 3 cos2 θ 2
denote the unit disk, xdy ∧ dz = p dx ∧ dy = √ drdθ = π · .
S + D 2
1−x −y 2 0 0 1−r 2 3

Because on S the graph
Z Z parametrization
Z is orientation-reversing,
Z we get the identical integral for
2π 4π
xdy ∧ dz, and so xdy ∧ dz = xdy ∧ dz + xdy ∧ dz = 2 · = .
S− S S+ S− 3 3
2u  −8u 
c. Using the result of Exercise 10, g∗ (xdy ∧ dz) = 2 du ∧ dv
u + v 2 + 1 (u2 + v 2 + 1)3
−16u 2
= du ∧ dv. Recalling that the parametrization is orientation-reversing, we have
(u + v 2 + 1)4
2
Z Z Z 2π Z ∞ Z ∞
16u2 16r 3 cos2 θ 8(u − 1) 4
xdy ∧ dz = 2 2 4
du ∧ dv = 2 4
drdθ = π 4
du = π.
S R2 (u + v + 1) 0 0 (r + 1) 1 u 3
238 8. DIFFERENTIAL FORMS AND INTEGRATION ON MANIFOLDS

8.4.12 We have σ = xdy ∧ dz + ydz ∧ dx + zdx ∧ dy; moreover, on S, we have d(x2 + y 2 + z 2 ) =


2(xdx + ydy + zdz) = 0. Thus,

zσ = xzdy ∧ dz + yzdz ∧ dx + z 2 dx ∧ dy
= zdz ∧ (−xdy + ydx) + (1 − x2 − y 2 )dx ∧ dy
= −(xdx + ydy) ∧ (−xdy + ydx) + (1 − x2 − y 2 )dx ∧ dy
= (x2 + y 2 )dx ∧ dy + (1 − x2 − y 2 )dx ∧ dy = dx ∧ dy.
Z Z
Therefore, zσ = dx ∧ dy = π, inasmuch as the projection of S onto the xy-plane is the unit
S S
disk.  
0
Alternatively, zσ = (F · n)σ for F =  0 . But then we know that the corresponding 2-form
1
η = F1 dy ∧ dz + F2 dz ∧ dx + F3 dx ∧ dy = dx ∧ dy.
 
  a cos θ
θ
8.4.13 We parametrize by cylindrical coordinates: g = a sin θ . This is an orientation-

z
z
preserving parametrization. Z
a. Note that g∗ ω = z(−a sin θdθ) ∧ (a cos θdθ) = 0. Therefore, ω = 0. Geometrically,
S
the projection of (any tangent plane) of S onto the xy-plane has area 0.
Z
b. Now g∗ ω = (a sin θ)(−a sin θdθ) ∧ dz = −a2 sin2 θdθ ∧ dz. Therefore, ω =
Z S
2π Z h
(−a2 sin2 θ)dzdθ = −πa2 h.
0 0

8.4.14 The moment of inertia is given by the integral


Z Z 2π Z π
2 2 8 2 2
I = (x + y )δσ = δ (a2 sin2 φ)(a2 sin φ)dφdθ = πδa4 = (4πδa2 )a2 = ma2 .
S 0 0 3 3 3

8.4.15 a. Since F · n = a everywhere, the flux is a(4πa2 ) = 4πa3 .


 
x
1 
b. We have n = y , so F · n = a, and the flux is a(4πah) = 4πa2 h.
a
0
c. To the result of part b we add the flux across the two disks. On these, we have
F · n = h and the additional flux is 2πa2 h. Thus, the total flux is 6πa2 h.
d. By symmetry, the flux is 6 times that across a single face. On any face, we have
F · n = 1, and, since it has area 4, the total flux is 6 · 4 = 24.

8.4.16 a. Since F · n = (x3 + y 3 + z 3 )/a, it is clear from symmetry considerations that the total
flux should be 0. Writing out the integral explicitly, we have
Z 2π Z π

a4 sin φ sin3 φ(cos3 θ + sin3 θ) + cos3 φ dφdθ = 0,
0 0
8.4. SURFACE INTEGRALS AND FLUX 239
Z 2π Z 2π Z π
3 3
since cos θdθ = sin θdθ = 0 and cos3 φ sin φdφ = 0.
0 0 0
b. By the same considerations as in part a, the flux is given by
Z 2π Z π/2 Z π/2
 πa4
a4 sin φ sin3 φ(cos3 θ + sin3 θ) + cos3 φ dφdθ = 2π cos3 φ sin φdφ = .
0 0 0 2
 
  r cos θ
θ
c. Parametrizing the cone by g =  r sin θ , 0 ≤ θ ≤ 2π, 0 ≤ r ≤ 1, we see that
r
r
this gives the correct orientation. Then the flux of F outwards across S is given by integrating the
2-form ω = x2 dy ∧ dz + y 2 dz ∧ dx + z 2 dx ∧ dy, whose pullback is g∗ ω = r 3 (cos3 θ + sin3 θ − 1)dθ ∧ dr.
Z Z 1 Z 2π
π
Then ω= r 3 (cos3 θ + sin3 θ − 1)dθdr = − .
S 0 0 2
Z Z h Z 2π
1 3 3
d. The flux of F outwards across S is given by (x + y )dS = a3 (cos3 θ +
S a 0 0
sin3 θ)dθdz = 0.
e. To the answer of d, we add the flux of F outwards across the bottom disk (0, since
F · n = F · (−e3 ) = −z 2 = 0) and across the top disk (πa2 h2 , since F · n = F · e3 = z 2 = h2 here).
 
  r cos θ
r
8.4.17 We parametrize the paraboloid by g =  r sin θ , 0 ≤ r ≤ 2, 0 ≤ θ ≤ 2π. The flux
θ
4 − r2
of F is given by integrating the 2-form ω = xzdy ∧ dz + yzdz ∧ dx + (x2 + y 2 )dx ∧ dy over S. So
Z Z Z

ω= g∗ ω = 2r 3 (4 − r 2 ) + r 3 dr ∧ dθ
S [0,2]×[0,2π] [0,2]×[0,2π]
Z 2π Z 2 Z 4
3 2 88
= r (9 − 2r )drdθ = π u(9 − 2u)du = π.
0 0 0 3
1
8.4.18 a. We have F · n = 1/a2 , so the flux is (4πa2 ) = 4π.
a2
a
b. We have F · n = , so the flux is
+ z 2 )3/2 (a2
Z 2π Z h Z h Z arctan(h/a)
2 1 2 1 h
a 2 2 3/2
dzdθ = 2πa 2 2 3/2
dz = 4π cos udu = 4π √ .
0 −h (a + z ) −h (a + z ) 0 a + h2
2

Z 2π Z a c. To the answer of part


1
b we must add the flux across the two disks, which is

h 1
2 2 2 3/2
rdrdθ = 4πh − √ . Thus, the total flux across the closed cylinder
0 0 (r + h ) h a2 + h2
is 4π.
d. By symmetry, the flux is 6 times that across a single face. On the face z = 1, say, we
have F · n = (x2 + y 2 + 1)−3/2 , so the flux across that face is
Z 1Z 1 Z π/4 Z sec θ Z π/4  
1 r 1
2 2 3/2
dA = 8 drdθ = 8 1 − √ dθ
−1 −1 (x + y + 1) 0 0 (r 2 + 1)3/2 0 sec2 θ + 1
Z π/4 ! Z π/4 !
π cos θ π cos θ
=8 − √ dθ = 8 − p dθ
4 0 1 + cos2 θ 4 0 2 − sin2 θ
240 8. DIFFERENTIAL FORMS AND INTEGRATION ON MANIFOLDS
  sin θ iπ/4  π
π π 2
=8 − arcsin √ =8 − = π.
4 2 0 4 6 3

Thus, the total flux across the surface of the cube is 4π.
 
a/2
8.4.19 Note that S lies over the disk D of radius a/2 centered at in the xy-plane.
0
Z Z
a. This is easy, if we keep track of orientation: dx ∧ dy = − dx ∧ dy = −area(D) =
S D
πa2
− .
4
 
Z x/z
b. We can interpret ω as the flux across S of the vector field F =  y/z . Since
S −1
 
x Z √
1 √
the unit normal to S is n = √  y , we have F · n = 2, and F · ndS = 2area(S) =
2z −z S

πa2
2area(D) = , inasmuch as the tangent plane along S makes an angle of π/4 with the xy-plane.
2
(Of course, we could parametrize and pull back, getting 2rdθ ∧ dr.)
 
N1
8.4.20 If N =  N2 , then we have g∗ (dy ∧ dz) = N1 du ∧ dv, g∗ (dz ∧ dx) = N2 du ∧ dv, and
N3
g∗ (dx ∧ dy) = N3 du ∧ dv. Therefore,

g∗ (n1 dy ∧ dz + n2 dz ∧ dx + n3 dx ∧ dy) = (n1 N1 + n2 N2 + n3 N3 )du ∧ dv = kNkdu ∧ dv.

∂g ∂g
Since, by the discussion of Section 4.2, the area of the parallelogram spanned by and is
√ √ ∂u ∂v
EG − F 2 , we infer that kNk = EG − F 2 , as required.

8.4.21 The surface parametrized by g is a Möbius strip. A straightforward


 calculation
  yields
 u  u 1  0 2π
g∗ (dy ∧ dz) = 2 + v sin cos u cos + v sin u du ∧ dv. Even though g =g , note
2 2 2 0 0
   
0 2π
that g∗ (dy ∧ dz) = 2du ∧ dv and g∗ (dy ∧ dz) = −2du ∧ dv. This reflects the fact that as
0 0
we make one trip around the Möbius strip, the orientation reverses.
 
  cos θ
θ  sin θ 
8.4.22 We parametrize X by g : (0, 2π) × (0, 2π) → R4 , g = 
 cos φ . Note that this is an
φ
sin φ
orientation-preserving parametrization.
Z
a. g∗ ω = 0, so ω = 0.
X
b. g∗ ω = (sin θ sin φ + cos θ cos φ)dθ ∧ dφ, so
Z Z 2π Z 2π
ω= (sin θ sin φ + cos θ cos φ)dθdφ = 0.
X 0 0
8.4. SURFACE INTEGRALS AND FLUX 241
Z Z 2π Z 2π
c. g∗ ω 2 2
= sin θ sin φdθ ∧ dφ, so ω= sin2 θ sin2 φdθdφ = π 2 .
X 0 0
 
  cos θ
θ
8.4.23 a. Parametrizing, as usual, by g =  sin θ , 0 ≤ θ ≤ 2π, −1 ≤ z ≤ 1, we have
z
z
Z Z
xdy ∧ dz − zdx ∧ dy = cos2 θdθ ∧ dz = 2π.
S [0,2π]×[−1,1]

Z Z Z
2π Z 2π
b. We have − cos2 θdθ = −π and
xzdy = xzdy = − cos2 θdθ = −π.
Z C1 0 Z Z C2 0

We observe that − xdy ∧ dz − zdx ∧ dy = xzdy + xzdy. This is explained by applying


S C1 C2
Green’s Theorem. As the figure shows, ∂S = C1 ∪C2 , and we see that d(xzdy) = (xdz +zdx)∧dy =
−(xdy ∧ dz − zdx ∧ dy).

8.4.24 a. Parametrizing by spherical coordinates, we have


Z Z
dx ∧ dy + 2zdz ∧ dx = (a2 sin φ cos φ + 2a3 cos φ sin2 φ sin θ)dφdθ = πa2 .
S [0,π/2]×[0,2π]

Z Z 2π
b. xdy + z 2 dx = a2 cos2 θdθ = πa2 .
C 0
The answers are equal. Parametrizing the hemisphere by a disk, we realize that Green’s Theo-
rem predicts that the two integrals should be equal. Note that d(xdy + z 2 dx) = dx ∧ dy + 2zdz ∧ dx.

8.4.25 a. If we cut the Möbius strip down the middle, we get a single band with two half-twists
(which is orientable). When we cut it again, we get two linked bands, each with two half-twists.
b. We end up with two linked bands, one a Möbius strip, the other orientable (with two
half-twists).

8.4.26 This is false. A union of k disjoint orientable surfaces has 2k possible orientations.
242 8. DIFFERENTIAL FORMS AND INTEGRATION ON MANIFOLDS

8.5. Stokes’s Theorem

8.5.1 Since the outward-pointing normal to ∂Rk+ is −ek , we must decide whether
{−ek , e ,...,e } is a positively-oriented basis for Rk . We need k − 1 exchanges
| 1 {z k−1}
standard positive basis for Rk−1
and one change of sign to obtain {e1 , . . . , ek }. This is k sign changes in all, and hence the standard
positive basis for Rk−1 gives the correct orientation precisely when (−1)k = +1.
 
cos t
8.5.2 For the direct calculation, we parametrize C by g(t) =  sin t , 0 ≤ t ≤ 2π.
2 cos t + 3 sin t − 1
Then
Z Z 2π 
ydx − 2zdy + xdz = − sin2 t − 2(2 cos t + 3 sin t − 1)(cos t) + cos t(−2 sin t + 3 cos t) dt
C 0
Z 2π
= (− sin2 t − cos2 t − 8 sin t cos t + 2 cos t)dt = −2π.
0
Z
To apply Stokes’s Theorem, let S be the ellipse bounded by C. We have ydx − 2zdy + xdz =
C  
Z 2
(2dy ∧ dz − dz ∧ dx − dx ∧ dy), which we can interpret as the flux of the vector field F =  −1 
S −1
 
−2
1
outwards across S. Since n = √  −3  (watch out for orientation issues here!), we have F · n =
14 1
√ √
−2/ 14. On the other hand, the area of S is 14π, so the flux of F is −2π. (Alternatively, pulling
back the 2-form 2dy ∧ dz − dz ∧ dx − dx ∧ dy to the unit disk, we get (−4 + 3 − 1)dx ∧ dy, whose
integral is −2π.)

8.5.3 Let S be the ellipse bounded by C, and let D be the disk of radius a in the xy-plane
centered at the origin. Note that because C is oriented clockwise as viewed from high above the
xy-plane, we must endow S and D with the opposite of their usual orientations. By Stokes’s
Theorem,
Z Z
(y − z)dx + (z − x)dy + (x − y)dz = −2(dy ∧ dz + dz ∧ dx + dx ∧ dy)
C S
Z  
b
=2 + 1 dx ∧ dy = 2πa(a + b).
D a

8.5.4 Let S be the disk in the plane z = 1 bounded by its intersection with the sphere, oriented
with its outward-pointing normal upwards. Then
Z Z
(−y 3 + z)dx + (x3 + 2y)dy + (y − x)dz = 3(x2 + y 2 )dx ∧ dy + 2dz ∧ dx + dy ∧ dz
C S
Z 2π Z 1
3
= 3r 3 drdθ = π.
0 0 2
8.5. STOKES’S THEOREM 243

8.5.5 Proceeding as in Example 1, let S be the disk bounded by C in the plane x + y + z= 0. 
Z Z 0
Then 2zdx + 3xdy − dz = 2dz ∧ dx + 3dx ∧ dy, which we can interpret as the flux of F =  2 
C S 3
√ √
outwards across S. Since F · n = 5/ 3 and the area of S is πa2 , the flux is 5πa2 / 3.

8.5.6 Let S1 be the upper hemisphere (oriented upwards) and Z


let S2 be the disk of radius a
in the xy-plane (oriented downwards); then ∂Ω = S1 ∪ S2 . Now, xzdy ∧ dz + yzdz ∧ dx +
 ∂Ω 
xz
2
z )dx ∧ dy is the flux across ∂Ω of the vector field F = 
(x + y + 2 2
yz . On S1 , we have
x 2 2 2
x +y +z
1 z
n =  y , so F · n = x2 + y 2 + a2 ), and
a a
z
Z Z 2π Z π
4 3  3πa4
F · ndS = a (sin2 φ + 1) cos φ sin φdφdθ = 2πa4 = .
S1 0 0 4 2
On S2 , we have n = −e3 and F · n = −(x2 + y 2 ), so
Z Z 2π Z a
a4 πa4
F · ndS = − r 3 drdθ = −2π = − .
S2 0 0 4 2
Z
3πa4 πa4
Thus, F · ndS = − = πa4 .
∂Ω 2 2
On the other hand, by Stokes’s Theorem, we have
Z Z
2 2 2
xzdy ∧ dz + yzdz ∧ dx + (x + y + z )dx ∧ dy = 4zdx ∧ dy ∧ dz
∂Ω Ω
Z 2π Z π/2 Z a
= 4ρ3 cos φ sin φdρdφdθ
0 0 0
 a4  1 
= 4 · 2π = πa4 .
4 2
8.5.7 Let S1 be the parabolic surface z = 1 − x2 − y 2 , z ≥ 0 (oriented upwards), and let S2
be the disk of radius 1 in the xy-plane (oriented downwards). Then ∂M = S1 ∪ S2 and, letting D
denote the unit disk oriented upwards, we have
Z Z Z
ω= ω+ ω
∂M S1 S2
Z Z
2 2 2 2 2

= (1 − x − y ) + 2xy + 2x y dx ∧ dy + 0
D S2
Z 2π Z 1
π
= (1 − r 2 )2 rdrdθ = .
0 0 3
On the other hand, by Stokes’s Theorem, we have
Z Z Z Z 2π Z 1 Z 1−r 2 Z 1
π
ω= dω = 2zdx ∧ dy ∧ dz = 2rzdzdrdθ = 2π r(1 − r 2 )2 dr = ,
∂M M M 0 0 0 0 3
once again.
244 8. DIFFERENTIAL FORMS AND INTEGRATION ON MANIFOLDS

8.5.8 Let D denote the unit disk in the xy-plane, oriented upwards. Calculating directly, we
have
Z Z
F · ndS = x2 zdy ∧ dz + y 2 zdz ∧ dx + (x2 + y 2 )dx ∧ dy
M M
Z

= (1 − x2 − y 2 )(x2 )(2x) + (1 − x2 − y 2 )(y 2 )(2y) + (x2 + y 2 ) dx ∧ dy
ZD
= (x2 + y 2 )dx ∧ dy (by symmetry)
D
Z 2π Z 1
π
= r 3 drdθ = .
0 0 2

In order to apply Stokes’s Theorem, we observe that M ∪ D − is the boundary of the 3-manifold
Ω = {0Z≤ z ≤ 1−x 2 2 2 2 2 2
with boundary
Z Z −y Z}. Then, Zletting ωZ= x zdy∧dz+y zdz∧dx+(x +y )dx∧dy,
we have F·ndS =
ω and ω+ ω= dω = 2(x+y)zdx∧dy∧dz = 0 (by symmetry).
MZ ZM ZD− ZM Ω Ω
π
Therefore, ω=− ω= ω= (x2 + y 2 )dx ∧ dy = , just as before.
M D− D D 2
8.5.9 Since we have no idea what the shape of M is, we must resort to Stokes’s Theorem. There
are two possible approaches. First, as in Example 3, if we attach the disk D = {x2 + y 2 ≤ 4, z = 0},
then M ∪ D − = ∂Ω. Let ω = yzdy ∧ dz + x3 dz ∧ dx + y 2 dx ∧ dy; then dω = 0, and
Z Z Z Z Z Z
0= dω = ω= ω+ ω= ω− ω.
Ω ∂Ω M D− M D
Z Z 2π Z 2 Z
Now, ω= r 3 sin2 θdrdθ = 4π, so ω = 4π, as required.
D 0 0 M
1
The second approach is to observe that ω = dη, where, for example, η = x3 zdx+xy 2 dy+ y 2 zdz.
2
Then
Z Z Z Z Z 2π
2
ω= dη = η= xy dy = 16 sin2 t cos2 tdt = 4π.
M M ∂M ∂M 0
Z Z Z Z
8.5.10 By Stokes’s Theorem, we have dω = ω= ω= dω.
M ∂M ∂M ′ M′

8.5.11 a. We let M ′ be the disk of radius a in the xy-plane


Z centered
Z at theZorigin, oriented with
outward-pointing normal upwards. Then ∂M ′ = ∂M and dω = dω = 2dx ∧ dy = 2πa2 .
M M′ M′
b. We let M ′ be the disk of radius 2 lying in the plane z = 4 that forms
Z theZ“cap” of
the parabolic region, oriented with outward-pointing normal downwards. Then dω = dω =
Z M M′
dy ∧ dx = 4π.
M′
c. Let M ′ be the disk of radius 1 lying in the plane z = 2 that forms
Z the “cap”
Z of the
cylindrical region, oriented with outward-pointing normal downwards. Then dω = dω =
Z Z M M′
3z(x2 + y 2 )dx ∧ dy = −6 (x2 + y 2 )dy ∧ dx = −3π.
M′ M′
8.5. STOKES’S THEOREM 245

8.5.12 In both instances, we apply Stokes’s Theorem to obtain an integral over M .


a. Here dω = dx4 ∧ dx1 ∧ dx2 ∧ dx3 = −dx1 ∧ dx2 ∧ dx3 ∧ dx4 , so
Z Z Z
ω= dω = − dx1 ∧ dx2 ∧ dx3 ∧ dx4
∂M M M
Z Z 1
=− dx4 dVR3
B(0,1) x21 +x22 +x23
Z 2π Z π Z 1Z 1
=− ρ2 sin φdx4 dρdφdθ
0 0 0 ρ2
Z 1 2 8π
= −4π ρ2 (1 − ρ2 )dρ = −4π =− .
0 15 15
b. Here dω = −2x4 dx1 ∧ dx2 ∧ dx3 ∧ dx4 , so
Z Z Z
ω= dω = − 2x4 dx1 ∧ dx2 ∧ dx3 ∧ dx4
∂M M M
Z Z 1
=− 2x4 dx4 dVR3
B(0,1) x21 +x22 +x23
Z 2π Z π Z 1Z 1
=− 2x4 ρ2 sin φdx4 dρdφdθ
0 0 0 ρ2
Z 1 4 16π
= −4π ρ2 (1 − ρ4 )dρ = −4π =− .
0 21 21

8.5.13 Note that X = ∂M , where M = {x21 + x22 ≤ 1, x23 + x24 = 1}. One checks that the
orientation on ∂M is that prescribed for X. Now, dω = (x4 dx2 + x2 dx4 ) ∧ dx1 ∧ dx3 , so (noting
that dx3 ∧ dx4 = 0 on M )
Z Z Z ! Z 
ω= dω = −x4 dx1 ∧ dx2 ∧ dx3 = dx1 ∧ dx2 −x4 dx3 = π 2 .
X M B(0,1) S1
Z Z Z
∂f ∂f ∂f
8.5.14 We have Dn f dS = ∇f · ndS = dy ∧ dz + dz ∧ dx + dx ∧ dy =
Z Z ∂M Z ∂M ∂M ∂x ∂y ∂z
⋆(df ) = d⋆(df ) = ∇2 f dV .
∂M M M

8.5.15 a. We parametrize the cylinder by cylindrical coordinates, as usual. Note that the upper
p
intersection of the cylinder and the sphere is given by z = a 2(1 + sin θ). Thus, we have
Z Z Z √ 2π a Z2(1+sin θ) 2π
zdS = azdzdθ = a3 (1 + sin θ)dθ = 2πa3 .
S 0 0 0

(Note that we have oriented S with the outward-pointing normal away from the axis of the cylinder.)
b. Let C ′ be the circle of radius a in the xy-plane centered at the origin, oriented
′ 2 2 2
counterclockwise.
Z Z Z ∂S = CZ∪ C . Now, let ω = y(z − 1)dx + x(1 − z )dy +Zz dz. Then we
Then
have ω+ ω= dω = 2 xzdy ∧ dz + yzdz ∧ dx + (1 − z 2 )dx ∧ dy = 2a zdS = 4πa4 .
Z C CZ ′ S S Z S
2 2 2
Since ω= −ydx + xdy = 2πa , we infer that ω = 2πa (2a − 1).
C′ C′ C
246 8. DIFFERENTIAL FORMS AND INTEGRATION ON MANIFOLDS

8.5.16 a. We parametrize the cylinder by cylindrical coordinates, as usual. Note that the upper
p
intersection of the cylinder and the sphere is given by z = a 2(1 + cos θ) = 2a| cos(θ/2)|. Thus,
we have

Z Z 2π Z 2a| cos(θ/2)| Z 2π
8 3 64 4
z 2 dS = az 2 dzdθ = a | cos(θ/2)|3 dθ = a .
S 0 0 3 0 9

(Note that we have oriented S with the outward-pointing normal away from the axis of the cylinder.)
b. Let C ′ be the circle of radius a in the xy-plane centered at the origin, oriented
′ 3 3
counterclockwise.
Z Z Z Then ∂SZ = C ∪ C . Now, let ω = y(z + 1)dx − x(z + 1)dy + zdz.
Z Then we have
64 5
ω+ ω = dω = 3xz dy ∧ dz + 3yz dz ∧ dx − 2(z + 1)dx ∧ dy = 3a z 2 dS =
2 2 3
a .
C Z C′ Z S S Z S 3
64
Since ω= ydx − xdy = −2πa2 , we infer that ω = 2πa2 + a5 .
C′ C′ C 3
     
2 2 2 2 u 1 u−v x
8.5.17 Let T : R × R → R × R be given by T = = . Then T maps
v 2 u+v y
  
u
X = S1 × S1 = : kuk = kvk = 1 one-to-one onto M . Moreover, if we set Y = {kuk ≤
v
1
1, kvk = 1}, then X = ∂Y . Letting ω = (y12 − x21 )dx2 ∧ dy2 , we have T ∗ ω = u1 v1 du2 ∧ dv2 .
Z Z 2 Z
du2 dv2 ∗
Note that ∧ > 0 with the usual orientation on X, so ω = T ω = d(T ∗ ω) =
Z u 1 v 1 M X Y
1 1
v1 du1 ∧ du2 ∧ dv2 = π 2 .
2 Y 2

8.5.18 (See the solution of Exercise 8.3.3d.) Note that C = ∂M , where M is the disk bounded
 
1
1
by C in the plane x+y+z = 0. Now, an orthonormal basis for that plane is given by v1 = √  −1 
2 0
 
1  
1 r
and v2 = √  1 . We thus are able to parametrize M by g = (r cos θ)v1 + (r sin θ)v2 ,
6 −2 θ
0 ≤ r ≤ 1, 0 ≤ t ≤ 2π. (Note that v1 × v2 points upwards, so g is orientation-preserving.) Then we
Z Z 2π Z 1
2  1  2
∗ 3 ∗ 2
have g d(z dx) = g (3z dz∧dx) = 3r 2 2
sin θ √ rdr∧dθ , so ω= √ r 3 sin2 θdrdθ
3 3 M 3 0 0
π
= √ .
2 3
1
Remark: The area 2-form σ of the plane x + y + z = 0 is given by σ = √ dy ∧ dz + dz ∧ dx +
 3
dx ∧ dy . By symmetry (e.g., the projections of a region into all three coordinate planes have the
1 r
same area), on this plane we have dy ∧ dz = dz ∧ dx = dx ∧ dy, so dz ∧ dx = √ σ = √ dr ∧ dθ.
3 3
Z Z
8.5.19 2 2 2
(See the solution of Exercise 18.) Let ω = xy dx + yz dy + zx dz. Then ω= dω.
Z ZC M
Now, dω = −2(yzdy ∧dz+xzdz∧dx+xydx∧dy) and, by symmetry, yzdy ∧dz = xzdz∧dx =
M M
8.5. STOKES’S THEOREM 247
Z
xydx ∧ dy. Therefore,
M
Z Z Z
ω= dω = −6 xzdz ∧ dx
C M M
Z 2π Z 1
1 1  2  r
= −6 r √ cos θ + √ sin θ − √ r sin θ √ drdθ
0 0 2 6 6 3
Z 2π Z 1
2 π
=√ r 3 sin2 θdrdθ = √ .
3 0 0 2 3

8.5.20 Let ω ∈ Ak−2 (Rk ), and write d(dω) = f (x)dx1 ∧ · · · ∧ dxk . Suppose f (a) > 0. By
continuity, there is a ball B centered at a on which Zf > 0. Then,
Z Z on one hand,
Z by Exercise 7.1.5,
f dV > 0. On the other hand, by Corollary 5.3, f dV = d(dω) = dω = 0. From this
B B B ∂B
contradiction we infer that f = 0 everywhere.

8.5.21 The crucial ingredient


Z here is Corollary 5.3. If we had a 1-form ω on the orientable
surface M with area(S) = ω for every region S ⊂ M , then by Stokes’s Theorem we would have
Z ∂S Z
area(S) = dω. In the case of a compact surface M , this would lead to area(M ) = dω = 0,
S M
which is impossible. Thus, we have
a. No.
b. No.
c. Maybe? By Exercise 8.4.10, we can identify the punctured sphere with R2 , so the
answer should be “yes.” In particular,

4  2(vdu − udv) 
g∗ σ = − du ∧ dv = d .
(1 + u2 + v 2 )2 2 2
| 1 + u{z + v }
η

It follows that the 1-form ω = (g−1 )∗ η will do the job.


 
−∂f /∂x
1
 −∂f /∂y , so
8.5.22 a. The upward-pointing unit normal to the graph is n = p
1 + k∇f k2 1
the area 2-form is σ, as given in the problem. Now, a straightforward but not particularly pleasant
calculation yields
−3/2  ∂f ∂f ∂ 2 f ∂2f  ∂f 2 2 ∂ 2 f  ∂f 2 2 
dσ = 1 + k∇f k2 2 − 1 + − 1 + ,
∂x ∂y ∂x∂y ∂x2 ∂y ∂y 2 ∂x
so dσ = 0 if and only if f satisfies the minimal surface equation.
b.
Z Let N
Z be an oriented
Z surface with outward-pointing unit normal n. Then, by Cauchy-
Schwarz, σ= n · ndS ≤ dS, with equality holding if and only if n = n everywhere (by
N N N
continuity). Now it is easy to see that the latter holds if and only if N is the graph of f + c for
some constant c.
248 8. DIFFERENTIAL FORMS AND INTEGRATION ON MANIFOLDS


c. If M and
Z N are Z
two oriented
Z surfaces with the same boundary
Z Zcurve, then M ∪ N =
∂W . Then we have σ− σ = dσ = 0, so area(M ) = σ = σ < area(N ), since
M N W M N
equality cannot hold with these hypotheses.

8.5.23 a. If ω is a nowhere-zero k-form on M , then we define {v1 , . . . , vk } to be a positive


basis for Tp M if ω(p)(v1 , . . . , vk ) > 0. Note that if g : U → Rn is a coordinate chart (and let’s
assume U is connected), then g∗ ω = f (x)dx1 ∧ · · · ∧ dxk for some smooth function f . Since
ω is nowhere-zero, the function f cannot change sign; say f > 0. Then, by Exercise 8.2.19,
∂g ∂g  ∂g ∂g
dx1 ∧ · · · ∧ dxk ,..., > 0, so ,..., give a positive basis for the tangent space, as
∂u1 ∂uk ∂u1 ∂uk
required.
Conversely, suppose M is orientable. We cover M with open sets Uj on each of which M is
a graph over some coordinate k-plane. Say that on the set Uj we know M is a graph over the
xij,1 · · · xij,k -plane, and we order the coordinates so that dxij,1 ∧ · · · ∧ dxij,k > 0 on Uj . Let {ρj }
be a partition of unity (built out of the coordinate balls in these different sets Uj ). We claim that
P
ω= ρj dxij,1 ∧ · · · ∧ dxij,k is a nowhere-zero k-form on M . For any point p ∈ M , if p ∈ Uj and
{v1 , . . . , vk } is a positively oriented basis for Tp M , then dxij,1 ∧ · · · ∧ dxij,k (v1 , . . . , vk ) > 0; since
  P
ρj (p) ≥ 0, we have ρj dxij,1 ∧ · · · ∧ dxij,k (p) ≥ 0. If p ∈ / Uj , then ρj (p) = 0. Since ρj = 1, it
follows that ω(p)(v1 , . . . , vk ) > 0, as we needed to show.
b. If σ is a volume form on M , then it is of course a nowhere-zero k-form, and so M
is orientable by part a. On the other hand, if M is orientable, there is a nowhere-zero k-form
ω. Locally, let {v1 , . . . , vk } be any smoothly-varying basis for the tangent spaces of M (e.g.,
∂g ∂g
,..., for some coordinate chart g). By applying the Gram-Schmidt process, we obtain
∂u1 ∂uk
a smoothly-varying orthonormal basis {q1 , . . . , qk }. Then |ω(q1 , . . . , qk )| does not depend on the
choice of basis (because any two orthonormal basis differ by a linear transformation of determinant

±1). Setting σ = ω |ω(q1 , . . . , qk )| gives the volume form of M .

8.5.24 Assume M is connected (if not, work with a single connected piece of M ). Let σ be
the volume form of M (see Exercise 23), and write dω = f σ. If f were never 0, then, by the
Intermediate
Z Value Theorem, it would Zalways have
Z to have the same sign. If f > 0 everywhere,
then f σ > 0; yet, by Corollary 5.3, fσ = dω = 0.
M M M

1 −1/x
8.5.25 a. We have f (x) = e−1/x , so we can take p0 (x) = 1. Since f ′ (x) = e , we
x2
2 (k)
can take p1 (x) = x . Now, proceeding by induction, suppose f (x) = e −1/x pk (1/x) for some
1 
polynomial pk of degree 2k. Then we have f (k+1) (x) = e−1/x 2 pk (1/x) − p′k (1/x) . So, if we
 x
set pk+1 (x) = x2 pk (x) − p′k (x) , then we have f (k+1) (x) = e−1/x pk+1 (1/x), as required. Note,
moreover, that if pk has degree 2k, then pk+1 clearly has degree 2(k + 1).
8.6. APPLICATIONS TO PHYSICS 249

b. Obviously h(0) = 0. Suppose that h(k) (0) = 0 for some k ≥ 0. Now, clearly the
(k+1)
left-hand derivative h− (0) = 0, since h(x) = 0 for all x ≤ 0. But we also have

(k+1) h(k) (t) − h(k) (0) f (k) (t) e−1/t pk (1/t)


h+ (0) = lim = lim = lim = lim upk (u)e−u = 0,
t→0+ t t→0+ t t→0+ t u→∞

since lim P (u)e−u = 0 for any polynomial P . We conclude that h(k+1) (0) = 0, as desired. There-
u→∞
fore, h(k) (0) = 0 for all k ≥ 0.

8.5.26 As suggested in the hint, the collection of balls B(q, 1/k) with rational center q ∈ Qn
and radius 1/k, k ∈ N, is a countable collection. (Q is countable, and so Qn is as well. A countable
union of countable sets is countable.) Now, we claim that given any x ∈ Rn contained in some
open set V , there is a ball B(q, 1/k) with x ∈ B(q, 1/k) ⊂ V . (Proof: Since V is open, there is
1 r 1
r > 0 so that B(x, r) ⊂ V . Choose k ∈ N so that < . Then for any q ∈ Qn with kx − qk < ,
k 2 k
we note that, by the triangle inequality, x ∈ B(q, 1/k) ⊂ B(x, r) ⊂ V .)
For each x ∈ X, we know that x ∈ Vα for some α; choose one of our countable collection of balls
B(q, 1/k) containing x and contained in Vα . We end up with a countable collection Bi = B(qi , 1/ki )
covering X so that each Bi ⊂ Vαi for some αi . Since all these Bi ’s cover X, the corresponding sets
Vαi must as well.

8.6. Applications to Physics


   
8.6.1 When a = e₃, we have F₀ = (−νy, νx, 0). Then curl F₀ = (0, 0, 2ν) = 2νa. Note that F₀ = νe₃ × x. More generally, the vector field F = νa × x = ν(a₂x₃ − a₃x₂, a₃x₁ − a₁x₃, a₁x₂ − a₂x₁) gives rotation about axis a with angular speed ν, and curl F = 2ν(a₁, a₂, a₃) = 2νa, as required.
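Remark: A short symbolic check of the identity curl(νa × x) = 2νa, sketched with sympy:

    import sympy as sp

    x1, x2, x3, a1, a2, a3, nu = sp.symbols('x1 x2 x3 a1 a2 a3 nu')
    a = sp.Matrix([a1, a2, a3])
    F = nu * a.cross(sp.Matrix([x1, x2, x3]))
    # curl F, component by component
    curlF = sp.Matrix([
        sp.diff(F[2], x2) - sp.diff(F[1], x3),
        sp.diff(F[0], x3) - sp.diff(F[2], x1),
        sp.diff(F[1], x1) - sp.diff(F[0], x2),
    ])
    assert sp.simplify(curlF - 2*nu*a) == sp.zeros(3, 1)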

8.6.2 By symmetry, the force F is radial and of uniform strength on spheres centered at the center of the ball. Say the radius of the ball is R. Thus, the flux of F outwards across a sphere of radius b > R is −‖F‖(4πb²); on the other hand, by Gauss's law, that flux is −4πGM. Therefore ‖F‖ = GM/b². So F is a radial inverse square force outside the ball, as we wished to show.
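Remark: One can also check Gauss's law for an inverse-square field directly, integrating the flux over a sphere of radius b in spherical coordinates; a sympy sketch:

    import sympy as sp

    G, M, b, theta, phi = sp.symbols('G M b theta phi', positive=True)
    # F is radial, |F| = GM/b^2, pointing inward, so F.n = -GM/b^2 on the sphere;
    # the area element is b^2 sin(phi) dphi dtheta
    flux = sp.integrate((-G*M/b**2) * b**2 * sp.sin(phi),
                        (phi, 0, sp.pi), (theta, 0, 2*sp.pi))
    assert sp.simplify(flux + 4*sp.pi*G*M) == 0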
8.6.3 a. We have ∫_{∂Ω} D_n g dS = ∫_{∂Ω} ∇g · n dS = ∫_{∂Ω} ⋆dg = ∫_Ω d⋆dg = ∫_Ω ∇²g dV. Then ∫_{∂Ω} f D_n g dS = ∫_{∂Ω} f ⋆dg = ∫_Ω d(f ⋆dg) = ∫_Ω (df ∧ ⋆dg + f d⋆dg) = ∫_Ω (∇f · ∇g + f∇²g) dV. Using the second result twice, we have ∫_{∂Ω} (f D_n g − g D_n f) dS = ∫_Ω ((f∇²g + ∇f · ∇g) − (g∇²f + ∇g · ∇f)) dV = ∫_Ω (f∇²g − g∇²f) dV.

b. We merely apply the results of part a. For the first two equalities, substitute g = f
to get the results. The last result is immediate from the last equation in part a.

8.6.4 Since ∇² is linear (e.g., because d and ⋆ are), if f and g are harmonic, then so is h = f − g. So we start with a function h that is harmonic on Ω and has value 0 on ∂Ω. By the second equation in Exercise 3b, we have 0 = ∫_{∂Ω} h D_n h dS = ∫_Ω ‖∇h‖² dV. Since ‖∇h‖² is continuous and nonnegative, we conclude from Exercise 7.1.5 that ∇h = 0 everywhere on Ω. Therefore, h is constant on every connected piece of Ω. Since h = 0 on ∂Ω, it follows that h = 0 on Ω, and so f = g on Ω.

8.6.5 a. See Exercise 3.6.2d.


b. Since ∇g = −x/‖x‖³, on a sphere of radius r with outward-pointing unit normal we have D_n g = −1/r². Applying the last Green's formula from Exercise 3 to the region Ω_ε, we have

∫_{‖x‖=r} f D_n g dS − ∫_{‖x‖=ε} f D_n g dS = ∫_{‖x‖=r} g D_n f dS − ∫_{‖x‖=ε} g D_n f dS,

so

(1/r²)∫_{‖x‖=r} f dS − (1/ε²)∫_{‖x‖=ε} f dS = (1/r)∫_{‖x‖=r} D_n f dS − (1/ε)∫_{‖x‖=ε} D_n f dS = 0

by the first Green's identity. Therefore,

(1/4πr²)∫_{‖x‖=r} f dS = (1/4πε²)∫_{‖x‖=ε} f dS,

so, by continuity of f at 0,

(1/4πr²)∫_{‖x‖=r} f dS = lim_{ε→0⁺} (1/4πε²)∫_{‖x‖=ε} f dS = f(0).
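Remark: The mean value property is easy to test; the sketch below uses the (arbitrarily chosen) harmonic function f(x) = x₁² − x₃² and the center a = (1, 2, 3), and finds that the spherical average equals f(a) for every radius r:

    import sympy as sp

    theta, phi, r = sp.symbols('theta phi r', positive=True)
    a1, a2, a3 = 1, 2, 3
    # a point on the sphere of radius r about a, in spherical coordinates
    x1 = a1 + r*sp.sin(phi)*sp.cos(theta)
    x3 = a3 + r*sp.cos(phi)
    f = x1**2 - x3**2      # a harmonic function of (x1, x2, x3)
    avg = sp.integrate(f*sp.sin(phi), (phi, 0, sp.pi), (theta, 0, 2*sp.pi))/(4*sp.pi)
    assert sp.simplify(avg - (a1**2 - a3**2)) == 0   # the average equals f(a)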

c. Suppose a is an interior point of Ω. By part b, f (a) is the average of the values of


f on any (small) sphere centered at a, so we cannot have f (a) > f (x) for all x in a neighborhood
of a. Therefore, unless f is constant, its maximum point must be in ∂Ω. (Alternatively, suppose
a is the maximum point and it were an interior point. Then by part b, f must be constant on
a neighborhood of a. Suppose there were a point b ∈ Ω with f (b) < f (a). Take a path g(t),
0 ≤ t ≤ 1, in Ω joining a and b. Consider the set S = {t ∈ [0, 1] : f (g(s)) = f (a) for all s ∈ [0, t]}.
Using the result of part b again, we see that sup S = 1.)

8.6.6 a. Since no points of D lie on or inside S, the integrand is continuous on all of S × D and, by Fubini's Theorem,

∫_S F · n dS = ∫_S (∫_D G(y − x)/‖y − x‖³ · n δ(y) dV_y) dS_x = ∫_D (∫_S G(y − x)/‖y − x‖³ · n δ(y) dS_x) dV_y = ∫_D (∫_S F_y · n dS_x) δ(y) dV_y = 0,

(by the discussion of Section 6.2) since no point y ∈ D lies inside S. (Here F_y denotes the force field due to a unit point mass at y.)
b. The argument is rather similar when all of D lies inside S. Fubini's Theorem still applies, and we have

∫_S F · n dS = ∫_D (∫_S F_y · n dS_x) δ(y) dV_y = ∫_D (−4πG)δ(y) dV_y = −4πG ∫_D δ dV,

as required.

8.6.7 c., d., e., g., h. div = 0; a., f., g., h. curl = 0
 
8.6.8 a. The vector field F = (−y, x) will do in R². Its flow lines are concentric circles about the origin.
b. Suppose C were a closed flow line of F, parametrized by g: [a, b] → Rⁿ. Then, since F is conservative, by Theorem 3.2, we have ∮_C F · T ds = 0. Yet ∫_C F · T ds = ∫_a^b F(g(t)) · g′(t) dt = ∫_a^b ‖g′(t)‖² dt, so we must have g′(t) = 0 for all t. That is, C is merely a point.
c. By Exercise 8.3.18, if C = ∂S, then ∫_C F · n ds = ∫_S div F dA, but since F is everywhere tangent to C, the flux of F across C is 0. Thus, ∫_S div F dA = 0, and so it follows from Exercise 7.1.5 that div F(x) = 0 for some x ∈ S.
8.6.9 a. For i = 1, 2, 3, we have ∫_{∂Ω} f nᵢ dS = ∫_{∂Ω} (f eᵢ) · n dS = ∫_Ω div(f eᵢ) dV = ∫_Ω ∂f/∂xᵢ dV. Therefore, ∫_{∂Ω} f n dS = ∫_Ω ∇f dV.
b. Applying the result of part a with f = 1 gives the result. Intuitively, the average
value of n must be 0 on a closed surface because, in order for the surface to close up, the normal
must spend equal amounts of area pointing in opposite directions.

8.6.10 a. This is immediate if we think of approximating the integral by a sum over (almost
planar) pieces of surface area.
b. Using the result of part a, we have B = −∫_{∂Ω} p n dS = −∫_Ω ∇p dV = −∫_Ω δg dV = −g∫_Ω δ dV = −Mg.

c. Because the object is in equilibrium, the buoyancy force (upwards) exactly balances
the weight of the object (downwards), and so the floating body must displace precisely that amount
of liquid that will have its own weight.
8.6.11 Using Exercise 7.2.20, we have (d/dt)∫_Ω δ dV = ∫_Ω ∂δ/∂t dV. On the other hand, by Theorem 6.2, ∫_{∂Ω} F · n dS = ∫_Ω div F dV. The law of conservation of mass can therefore be rewritten as

(∗) ∫_Ω (∂δ/∂t + div F) dV = 0.

Now comes a standard and important argument: Suppose the continuous function ∂δ/∂t + div F were nonzero (say, positive) at some point a; then, by Exercise 7.1.5, its integral over a small ball centered at a would be positive, contradicting equation (∗).
Remark: It is in this way that we go back from integral laws to their “differential” versions, as
in Section 6.3.
8.6.12 a. The flux of q inwards across ∂Ω is −∫_{∂Ω} q · n dS = ∫_{∂Ω} K∇u · n dS = ∫_Ω K div(∇u) dV = ∫_Ω K∇²u dV. (We use Theorem 6.2 at the penultimate step.)

b. This is immediate from the definition of the integral.
c. Applying the same reasoning as in Exercise 11, we have ∫_Ω K∇²u dV = ∫_Ω c ∂u/∂t dV for all regions Ω, and therefore we must have K∇²u = c ∂u/∂t (since the functions involved are continuous).
 
8.6.13 a. E(0) = 0 since we are given u(x, 0) = 0 for all x. By Exercise 7.2.20 and the second Green's formula in Exercise 3, we have

E′(t) = ∫_Ω u ∂u/∂t dV = ∫_Ω u∇²u dV = ∫_{∂Ω} u D_n u dS − ∫_Ω ‖∇u‖² dV.

By the hypothesis that the boundary is insulated, we therefore have E′(t) = −∫_Ω ‖∇u‖² dV ≤ 0. Since E(0) = 0, we have E(t) = ∫₀ᵗ E′(s) ds ≤ 0. But, clearly, E(t) ≥ 0, since E is the integral of a nonnegative function. We infer that E(t) = 0 for all t ≥ 0.
b. By Exercise 7.1.5, since u² ≥ 0 and is continuous, the only way its integral can be 0 is to have u(x, t) = 0 for all x ∈ Ω and t ≥ 0.
c. By linearity of the derivative, if u₁ and u₂ are solutions of the heat equation, then
u = u1 − u2 is as well. If u1 = u2 at t = 0 and along ∂Ω, then we know that u satisfies the
hypotheses given originally. It follows from part b that u = 0, and hence that u1 = u2 , for all x ∈ Ω
and t ≥ 0.

8.6.14 Applying Exercise 7.2.20 as usual, we have

E′(t) = ∫_Ω (∂u/∂t ∂²u/∂t² + Σᵢ₌₁³ ∂u/∂xᵢ ∂²u/∂t∂xᵢ) dV = ∫_Ω (∂u/∂t ∇²u + Σᵢ₌₁³ ∂u/∂xᵢ ∂²u/∂t∂xᵢ) dV = ∫_Ω div(∂u/∂t ∇u) dV = ∫_{∂Ω} ∂u/∂t (∇u · n) dS = 0,

inasmuch as ∂u/∂t(x, t) = 0 for all x ∈ ∂Ω. Therefore, E is constant, as desired.

8.7. Applications to Topology



8.7.1 By construction, r(x) is the point f(x) + t(x − f(x)) with length 1 and t > 0. Expanding this out, we obtain the equation

At² + Bt + C = 0, where A = ‖x − f(x)‖², B = 2(x − f(x)) · f(x), and C = ‖f(x)‖² − 1.

Note that, by assumption, A ≠ 0. We know from the quadratic formula that the solutions t will be smooth functions provided B² − 4AC = 4(((x − f(x)) · f(x))² + ‖x − f(x)‖²(1 − ‖f(x)‖²)) > 0. Being the sum of a nonnegative number and a positive number, this expression is in fact clearly positive. Therefore, the positive root is a smooth function of x, as is r(x).
 
8.7.2 Yes, any two maps f, g: [0, 2π] → R² are homotopic: We merely take H(x, t) = tf(x) + (1 − t)g(x), the so-called straight-line homotopy.

8.7.3 The proof consists basically in copying Example 2. Let H: X × [0, 1] → Y be the homotopy between f and g. We have ∫_{∂(X×[0,1])} H*ω = ∫_{X×[0,1]} d(H*ω) = ∫_{X×[0,1]} H*(dω) = 0. But ∂(X × [0, 1]) = (−1)^{dim X−1}((X × {1}) ∪ (X × {0})⁻), so we infer that ∫_X g*ω = ∫_{X×{1}} H*ω = ∫_{X×{0}} H*ω = ∫_X f*ω, as required.

8.7.4 Note that on |z| = 2, we have |z⁴| = 16 and |−3z + 9| ≤ |3z| + 9 = 15. Let g(z) = z⁴. It follows as in the proof of Theorem 7.6 that on X = {|z| = 2}, the maps f/|f| and g/|g| are homotopic maps X → S¹. In particular, we define

H(z, t) = (z⁴ + t(−3z + 9))/|z⁴ + t(−3z + 9)|,

observe that it is smooth and that H(z, 0) = g(z)/|g(z)| and H(z, 1) = f(z)/|f(z)|. Therefore ∫_{∂Ω} f*ω = ∫_{∂Ω} g*ω = 8π. It follows from Proposition 7.8 that f has 4 roots in Ω.
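Remark: The winding number can be computed numerically from the unwrapped argument of f along |z| = 2; a numpy sketch:

    import numpy as np

    t = np.linspace(0, 2*np.pi, 4001)
    z = 2*np.exp(1j*t)
    f = z**4 - 3*z + 9
    arg = np.unwrap(np.angle(f))
    print((arg[-1] - arg[0])/(2*np.pi))   # approximately 4.0, the number of roots in |z| < 2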

8.7.5 a. The antipodal map f (x) = −x has no fixed point.


b. Rotate the annulus about the origin.
c. Rotate the solid torus about its center point.

d. Consider the function f (x) = (x + e1 )/2.

8.7.6 Say M is an n-dimensional manifold. Let ω be an (n − 1)-form on ∂M with the property that ∫_{∂M} ω ≠ 0. (One can either use the volume form on ∂M or else use a form defined in a single coordinate chart, bumped off using a partition of unity to give a globally-defined form on ∂M.) Suppose there were a retraction f. Then we would have

0 ≠ ∫_{∂M} ω = ∫_{∂M} f*ω = ∫_M d(f*ω) = ∫_M f*(dω) = 0,

since the only n-form on an (n − 1)-dimensional manifold is 0. This contradiction completes the proof.
8.7.7 We have ∫_{C₃} ω = 10, ∫_{C₄} ω = 0, ∫_{C₅} ω = −3. We can gradually deform C₃ so that it is the union of a curve homotopic to C₁ and one homotopic to C₂⁻, the homotopy occurring in R³ − Z. Since ω is closed, it follows from Proposition 7.4 that the integral does not change under such a homotopy. Therefore, ∫_{C₃} ω = ∫_{C₁} ω − ∫_{C₂} ω = 10. C₄ is actually the boundary of a punctured torus M that runs along the “lollipop stick” and around the “lollipop”, completely missing the vertical axis. Therefore, ∫_{C₄} ω = ∫_M dω = 0. (Alternatively, one can slide C₄ down the lollipop stick and pull it across the lollipop, and then pinch to obtain two curves, one homotopic to C₂, the other homotopic to C₂⁻.) Last, C₅ can be deformed into the union of a curve homotopic to C₄⁻ and one homotopic to C₁⁻, so ∫_{C₅} ω = −∫_{C₄} ω − ∫_{C₁} ω = −3.

8.7.8 a. We have

dz/z = (dx + i dy)/(x + iy) · (x − iy)/(x − iy) = (x − iy)(dx + i dy)/(x² + y²) = ((x dx + y dy) + i(−y dx + x dy))/(x² + y²).

b. Since (fg)*dz = f dg + g df, we have (fg)*(dz/z) = (f dg + g df)/(fg) = df/f + dg/g = f*(dz/z) + g*(dz/z). Taking the imaginary part of this equation, we obtain (fg)*ω = f*ω + g*ω, as desired.
8.7.9 a. Since we are given |g − f| < |f| on C, we can give a homotopy between f and g as maps from C to C − {0}: Let H(z, t) = (1 − t)f(z) + tg(z). Since (1 − t)f + tg = f + t(g − f), it follows that |(1 − t)f + tg| ≥ |f| − |g − f| > 0 for all t, so H is well-defined and smooth. Now, since ω is closed on C − {0}, by Proposition 7.4 we have ∫_C f*ω = ∫_C g*ω, as required.
b. By the Maximum Value Theorem, Theorem 1.2 of Chapter 5, there is a number m > 0 so that |p| ≥ m on ∂D. Let R = max_{z∈∂D} |z|. Choose 0 < δ < m/(1 + R + R² + · · · + R^{n−1}). Then, whenever |aⱼ − bⱼ| < δ, j = 0, 1, . . . , n − 1, we have |P(z) − p(z)| ≤ |b₀ − a₀| + |b₁ − a₁||z| + · · · + |b_{n−1} − a_{n−1}||z|^{n−1} < δ(1 + R + · · · + R^{n−1}) < m whenever z ∈ ∂D. It follows that on ∂D we have |P − p| < |p|, and so, by part a and Proposition 7.8, p and P have the same number of roots in D.

c. Here we use the Fundamental Theorem of Algebra, Theorem 7.6, to get started. Given the polynomial p with roots r₁, . . . , r_n ∈ C (allowing repetitions, of course), choose any ε > 0 and let D = ⋃ⱼ₌₁ⁿ B(rⱼ, ε). Since we've already accounted for all the roots of p, we know that p ≠ 0 on ∂D. It follows from part b that there is δ > 0 so that whenever |bⱼ − aⱼ| < δ, j = 0, 1, . . . , n − 1, the polynomial P will have n (hence, all) roots inside D. That is, “wiggling” the coefficients of the polynomial less than δ results in all the roots' remaining within ε of the original roots.
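Remark: The root-continuity statement of part c is easy to observe experimentally; in the numpy sketch below, the cubic and the size 10⁻⁶ of the wiggle are arbitrary choices for illustration:

    import numpy as np

    rng = np.random.default_rng(0)
    p = np.array([1.0, 0.0, -3.0, 9.0])        # the monic cubic z^3 - 3z + 9
    roots = np.roots(p)
    q = p.copy()
    q[1:] += rng.uniform(-1e-6, 1e-6, size=3)  # wiggle the lower-order coefficients
    perturbed = np.roots(q)
    # each perturbed root stays within a tiny distance of some original root
    print(max(min(abs(r - roots)) for r in perturbed))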

8.7.10 Suppose not. Then for every x ∈ S^{2m}, f(x) is equal to neither x nor −x, so there is a unique great circle starting at x and passing through f(x). Taking the unit tangent vector to that circle at x (pointing towards f(x)) gives us a smooth nowhere-zero vector field v on S^{2m}. But by Theorem 7.9 there can be no such vector field.

8.7.11 As the hint suggests, if there is no x ∈ Dⁿ with f(x) = 0, then we can define the map f/‖f‖: Dⁿ → S^{n−1}. Consider tf(x) + (1 − t)x = x + t(f(x) − x), 0 ≤ t ≤ 1. Then, since ‖f(x) − x‖ < 1 for all x ∈ S^{n−1}, it follows from the triangle inequality that ‖x + t(f(x) − x)‖ ≥ ‖x‖ − ‖f(x) − x‖ > 0 for all x ∈ S^{n−1} and all t ∈ [0, 1]. Therefore, we can define H: S^{n−1} × [0, 1] → S^{n−1} by

H(x, t) = (tf(x) + (1 − t)x)/‖tf(x) + (1 − t)x‖.

H is smooth and gives a homotopy between the identity map on S^{n−1} and f/‖f‖. It follows from Proposition 7.4 that, taking ω to be the volume form of S^{n−1}, we have ∫_{S^{n−1}} (f/‖f‖)*ω ≠ 0. But if f ≠ 0 on Dⁿ, then f/‖f‖ is a smooth function from Dⁿ to S^{n−1}, and, just as in the proof of Theorem 7.2, we have

0 ≠ ∫_{S^{n−1}} (f/‖f‖)*ω = ∫_{Dⁿ} d((f/‖f‖)*ω) = ∫_{Dⁿ} (f/‖f‖)*(dω) = 0.

This contradiction allows us to conclude that f must have a zero in Dⁿ.


8.7.12 a. Note first that, by Exercise 7.2.20, (∂/∂x_ℓ)∫₀¹ t^{k−1} f_I(tx) dt = ∫₀¹ tᵏ (∂f_I/∂x_ℓ)(tx) dt. Then we have

d(I(φ)) = Σ_{ℓ=1}ⁿ (∫₀¹ tᵏ (∂f_I/∂x_ℓ)(tx) dt) dx_ℓ ∧ (Σ_{j=1}ᵏ (−1)^{j−1} x_{i_j} dx_{i_1} ∧ · · · ∧ \widehat{dx_{i_j}} ∧ · · · ∧ dx_{i_k}) + k(∫₀¹ t^{k−1} f_I(tx) dt) dx_I.

On the other hand, dφ = Σ_{ℓ=1}ⁿ (∂f_I/∂x_ℓ) dx_ℓ ∧ dx_I, so

I(dφ) = Σ_{ℓ=1}ⁿ (∫₀¹ tᵏ (∂f_I/∂x_ℓ)(tx) dt)(x_ℓ dx_I + Σ_{j=1}ᵏ (−1)ʲ x_{i_j} dx_ℓ ∧ dx_{i_1} ∧ · · · ∧ \widehat{dx_{i_j}} ∧ · · · ∧ dx_{i_k}).

Adding these together, we obtain

d(I(φ)) + I(dφ) = (k∫₀¹ t^{k−1} f_I(tx) dt + Σ_{ℓ=1}ⁿ x_ℓ ∫₀¹ tᵏ (∂f_I/∂x_ℓ)(tx) dt) dx_I = (∫₀¹ (d/dt)(tᵏ f_I(tx)) dt) dx_I = f_I(x) dx_I = φ,

as required.
b. We have ω = I(dω) + d(I(ω)) = d(I(ω)), since ω is closed. Therefore, ω is exact.
Moreover, we have given a precise recipe for the (k − 1)-form whose exterior derivative is ω.

8.7.13 Using Exercise 12, we have:

a. Set η = ∫₀¹ ((e^{tx} cos(ty) + tz)x + (2t³yz² − e^{tx} sin(ty))y + (tx + 2t³y²z + e^{tz})z) dt = eˣ cos y + xz + y²z² + e^z − 2.

b. Set η = (∫₀¹ t(2tx + t²y²) dt)(y dz − z dy) + (∫₀¹ t(3ty + tz) dt)(x dz − z dx) + (∫₀¹ t(tz − t²xy) dt)(x dy − y dx) = ((1/4)xy² − (4/3)yz − (1/3)z²) dx + (−(1/4)x²y − (1/3)xz − (1/4)y²z) dy + ((5/3)xy + (1/3)xz + (1/4)y³) dz.

c. Set η = (∫₀¹ t⁵xyz dt)(x dy ∧ dz − y dx ∧ dz + z dx ∧ dy) = (1/6)xyz(x dy ∧ dz + y dz ∧ dx + z dx ∧ dy).
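Remark: For part a one can confirm that dη = ω with sympy. The sketch below assumes the 1-form in the exercise was ω = (eˣ cos y + z) dx + (2yz² − eˣ sin y) dy + (x + 2y²z + e^z) dz, as read off from the integrands above:

    import sympy as sp

    x, y, z = sp.symbols('x y z')
    f1 = sp.exp(x)*sp.cos(y) + z               # assumed coefficients of omega
    f2 = 2*y*z**2 - sp.exp(x)*sp.sin(y)
    f3 = x + 2*y**2*z + sp.exp(z)
    eta = sp.exp(x)*sp.cos(y) + x*z + y**2*z**2 + sp.exp(z) - 2
    # d(eta) = omega means grad(eta) = (f1, f2, f3)
    assert all(sp.simplify(sp.diff(eta, v) - f) == 0
               for v, f in [(x, f1), (y, f2), (z, f3)])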

8.7.14 Such a surface is pictured above.

Remark: Every simple closed curve in R3 bounds an orientable surface, called its Seifert surface.
See Adams, Colin, The Knot Book , W. H. Freeman & Co., 1994, pp. 95 ff.

8.7.15 If v ∈ R³, define a vector field X_v on S¹ × S² as follows. (Recall that ρ: R² → R² denotes rotation by π/2.) If (p, q) ∈ S¹ × S², set proj_q v = tq, and let X_v(p, q) = (tρ(p), v − tq) ∈ T_{(p,q)}(S¹ × S²). Then the vector fields X_{e_i}, i = 1, 2, 3, are the desired linearly independent vector fields.

8.7.16 a. V is obviously smooth away from the origin. It isn't difficult to check that DV(0) = O and that |∂Vᵢ/∂xⱼ| ≤ C‖x‖ for some constant C, so the partial derivatives are continuous at 0.
b. Since the function f0 is one-to-one and onto, we strongly suspect that for sufficiently
small |t|, the same will be true of ft . Note that Dft (x) = I + tDV(x); since D n+1 is compact, there
is a constant K so that kDV (x)k ≤ K. Therefore, by the Inverse Function Theorem, Theorem 2.1
of Chapter 6, whenever |t| < 1/K, the function ft will have a local inverse at every x. What we want
to establish is that for sufficiently small |t|, the function ft is globally one-to-one. Suppose not. Then
we would have a sequence tk → 0 along with points xk 6= yk ∈ D n+1 with ftk (xk ) = ftk (yk ). Since
D n+1 is compact, by Theorem 1.1 of Chapter 5, we can find convergent subsequences xkj → x0 and
ykj → y0 . Then ftkj (xkj ) → f0 (x0 ) = x0 and ftkj (ykj ) → f0 (y0 ) = y0 , so x0 = y0 . Now, we would
like to assert that having xkj → x0 and ykj → x0 should contradict the fact that the functions ftkj
are locally one-to-one at x0 ; the flaw is that the neighborhood of x0 on which ftkj is one-to-one may
well not include both x_{k_j} and y_{k_j}. Thus, following the hint, consider F: D^{n+1} × R → R^{n+1} × R, F(x, t) = (f_t(x), t). Then

DF(x, t) = [I + tDV(x), V(x); 0 · · · 0, 1].

Since the matrix DF(x₀, 0) is invertible, by the Inverse Function Theorem, F is locally invertible on a neighborhood of (x₀, 0), hence one-to-one on that neighborhood. Therefore, F(x_{k_j}, t_{k_j}) = F(y_{k_j}, t_{k_j}) ⟹ x_{k_j} = y_{k_j} for j sufficiently large, contradicting our hypothesis.
Note that so far we haven't used the fact that we started with a vector field v on Sⁿ. This means that v(x) · x = 0, and so, for all x ∈ D^{n+1}, V(x) · x = 0 as well. Therefore, we have ‖f_t(x)‖ = √(1 + t²) for ‖x‖ = 1 and ‖f_t(x)‖ < √(1 + t²)‖x‖ for ‖x‖ < 1. So we want to claim that for small |t|, the function f_t maps D^{n+1} onto the closed ball of radius √(1 + t²) centered at the origin. It follows from the proof of the Inverse Function Theorem that the image of f_t is the intersection of B(0, √(1 + t²)) with an open subset of R^{n+1}. It follows from Exercise 5.1.10 that the image of f_t is compact, hence a closed subset of B(0, √(1 + t²)). Because the disk is connected, it is a fact that the only nonempty subset that is both open and closed is the whole set. (See also Exercise 2.2.11.)
c. By Theorem 6.4 of Chapter 7, we have vol(B(0, √(1 + t²))) = ∫_{D^{n+1}} |det Df_t| dV, which is going to come out a polynomial function of t.

d. Since we have n = 2m, we know that vol(B(0, √(1 + t²))) = vol(D^{n+1})(1 + t²)^{(2m+1)/2}, which is not a polynomial. From this contradiction, we infer that when n is even there can be no nowhere-vanishing vector field on Sⁿ.
CHAPTER 9
Eigenvalues, Eigenvectors, and Applications
9.1. Linear Transformations and Change of Basis
" # " #
2 1 −1 2 −1
9.1.1 a. The change-of-basis matrix is P = , whose inverse is P = .
3 2 −3 2
Thus,

" #" # #" # "


−1 2 −1 136 245 2 1
[T ]B′ = P [T ]P = = .
−3 2 −55 −37
2 −2 3 2
" #
2 −1
b. We see that the matrix for S with respect to the basis B ′ is [S]B′ = . To
1 3
get the matrix with respect to the standard basis E, we use

" #" #" # " #


−1 2 1 2 −1 2 −1 7 −3
[S] = [S]E = P [S]B′ P = = .
3 2 1 3 −3 2 7 −2
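Remark: A quick numerical check of part b with numpy (a sketch, not part of the printed solution):

    import numpy as np

    P = np.array([[2, 1], [3, 2]])
    S_Bp = np.array([[2, -1], [1, 3]])     # [S] with respect to B'
    S = P @ S_Bp @ np.linalg.inv(P)        # change back to the standard basis
    print(np.round(S))                      # [[ 7. -3.] [ 7. -2.]]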
   
9.1.2 Let v₁ = (cos θ, sin θ) and v₂ = (−sin θ, cos θ). Then the linear transformation R giving reflection across the line spanned by v₁ satisfies R(v₁) = v₁ and R(v₂) = −v₂. Thus, the matrix for R with respect to the basis B = {v₁, v₂} is A = [1 0; 0 −1]. To get the matrix with respect to the standard basis E, we use [R]_E = PAP⁻¹, with P = [cos θ −sin θ; sin θ cos θ] and P⁻¹ = Pᵀ = [cos θ sin θ; −sin θ cos θ]. Thus,

[R] = [R]_E = [cos θ −sin θ; sin θ cos θ][1 0; 0 −1][cos θ sin θ; −sin θ cos θ] = [cos²θ − sin²θ, 2 sin θ cos θ; 2 sin θ cos θ, sin²θ − cos²θ] = [cos 2θ, sin 2θ; sin 2θ, −cos 2θ].

(Note that we can clearly visualize the succession of moves here: To reflect across the line spanned
by v1 , we first rotate angle −θ, so that the desired axis is now horizontal, we reflect across the
horizontal axis, and then we rotate back through angle θ.)
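Remark: The same conjugation can be verified symbolically; a sympy sketch:

    import sympy as sp

    th = sp.symbols('theta')
    P = sp.Matrix([[sp.cos(th), -sp.sin(th)], [sp.sin(th), sp.cos(th)]])
    A = sp.diag(1, -1)
    R = sp.simplify(P * A * P.T)      # P is orthogonal, so P^(-1) = P^T
    target = sp.Matrix([[sp.cos(2*th), sp.sin(2*th)],
                        [sp.sin(2*th), -sp.cos(2*th)]])
    assert sp.simplify(R - target) == sp.zeros(2, 2)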
   
9.1.3 a. A basis for the plane is given by u₁ = (1, 1, 0), u₂ = (1, 0, 1). To get an orthogonal basis (which is actually unnecessary in this problem), we use the Gram-Schmidt process:

v₁ = u₁ = (1, 1, 0)
v₂′ = u₂ − (u₂ · v₁/‖v₁‖²)v₁ = (1, 0, 1) − (1/2)(1, 1, 0) = (1/2, −1/2, 1).

It is probably easier to clear out the fractions and work with v₂ = (1, −1, 2). Finally, we take v₃ to be the normal vector of the plane: v₃ = (−1, 1, 1).

b. Since T(v₁) = v₁, T(v₂) = v₂, and T(v₃) = −v₃, the matrix for T with respect to the basis B′ = {v₁, v₂, v₃} is [T]_{B′} = diag(1, 1, −1).
 
c. Let P = [1 1 −1; 1 −1 1; 0 2 1] be the change-of-basis matrix from the standard basis E to B′. Since the columns of P are orthogonal, PᵀP = diag(‖v₁‖², ‖v₂‖², ‖v₃‖²) = diag(2, 6, 3), and so P⁻¹ = diag(1/2, 1/6, 1/3)Pᵀ = [1/2 1/2 0; 1/6 −1/6 1/3; −1/3 1/3 1/3]. (See also Exercise 5.5.16.) Thus,

[T] = [T]_E = P[T]_{B′}P⁻¹ = [1 1 −1; 1 −1 1; 0 2 1] diag(1, 1, −1) [1/2 1/2 0; 1/6 −1/6 1/3; −1/3 1/3 1/3] = (1/3)[1 2 2; 2 1 −2; 2 −2 1].
   
9.1.4 Let u₁ = (1, 0, 1) and u₂ = (0, 1, −2). Then {u₁, u₂} is a basis for V. Using the Gram-Schmidt process, we get an orthogonal basis {v₁, v₂} for V:

v₁ = u₁ = (1, 0, 1), v₂ = u₂ − proj_{v₁}u₂ = (0, 1, −2) − (−2/2)(1, 0, 1) = (1, 1, −1).

We find easily that v₃ = (−1, 2, 1) gives a basis for V^⊥. With respect to the basis B′ = {v₁, v₂, v₃} the matrix for T = proj_V is [T]_{B′} = diag(1, 1, 0), so the standard matrix is given by [T] = P[T]_{B′}P⁻¹, where P = [1 1 −1; 0 1 2; 1 −1 1] and P⁻¹ = [1/2 0 1/2; 1/3 1/3 −1/3; −1/6 1/3 1/6]. Thus,

[T] = [1 1 −1; 0 1 2; 1 −1 1] diag(1, 1, 0) [1/2 0 1/2; 1/3 1/3 −1/3; −1/6 1/3 1/6] = (1/6)[5 2 1; 2 2 −2; 1 −2 5].
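Remark: The answer can be double-checked with the projection formula A(AᵀA)⁻¹Aᵀ, where the columns of A span V; a numpy sketch:

    import numpy as np

    A = np.array([[1, 0], [0, 1], [1, -2]], dtype=float)  # columns u1, u2 span V
    P_V = A @ np.linalg.inv(A.T @ A) @ A.T                # orthogonal projection onto V
    print(np.round(6*P_V))                                 # 6[T] = [[5 2 1],[2 2 -2],[1 -2 5]]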
     
9.1.5 The plane is spanned by v₁ = (2, 1, 0) and v₂ = (−2, 0, 1), and the vector v₃ = (1, −2, 2) is normal to the plane. The matrix for T with respect to the basis B′ = {v₁, v₂, v₃} is [T]_{B′} = diag(1, 1, −1), and so the standard matrix for T is [T] = P[T]_{B′}P⁻¹, where P = [2 −2 1; 1 0 −2; 0 1 2] and P⁻¹ = (1/9)[2 5 4; −2 4 5; 1 −2 2]. Thus,

[T] = (1/9)[2 −2 1; 1 0 −2; 0 1 2] diag(1, 1, −1) [2 5 4; −2 4 5; 1 −2 2] = (1/9)[7 4 −4; 4 1 8; −4 8 1].
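Remark: Since T is reflection across the plane with normal v₃, the answer also follows from the Householder formula I − 2nnᵀ/‖n‖²; a numpy sketch:

    import numpy as np

    n = np.array([1.0, -2.0, 2.0])              # normal vector to the plane
    T = np.eye(3) - 2*np.outer(n, n)/(n @ n)    # reflection across the plane
    print(np.round(9*T))                         # 9[T] = [[7 4 -4],[4 1 8],[-4 8 1]]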

9.1.6 Rotation through an angle of π/2 about the x₃-axis is given by the matrix B = [0 −1 0; 1 0 0; 0 0 1]; rotation through an angle of π/2 about the x₁-axis is given by the matrix A = [1 0 0; 0 0 −1; 0 1 0]. Thus, the standard matrix for the composition of these two rotations is

AB = [1 0 0; 0 0 −1; 0 1 0][0 −1 0; 1 0 0; 0 0 1] = [0 −1 0; 0 0 −1; 1 0 0].
     
9.1.7 The vectors v₁ = (1, 1, 0) and v₂ = (−1, 0, 1) give a basis for V; v₃ = (1, −1, 1) gives a basis for V^⊥. For part c we will want an orthonormal basis, so the Gram-Schmidt process yields q₁ = (1/√2)(1, 1, 0) and q₂ = (1/√6)(−1, 1, 2), and we take q₃ = (1/√3)(1, −1, 1).

a. Working with the “new” basis B′ = {v₁, v₂, v₃}, we have the change-of-basis matrix P = [1 −1 1; 1 0 −1; 0 1 1], with P⁻¹ = (1/3)[1 2 1; −1 1 2; 1 −1 1], and the matrix for projection onto V is [T]_{B′} = diag(1, 1, 0). Thus, the standard matrix is

[T] = P[T]_{B′}P⁻¹ = (1/3)[2 1 −1; 1 2 1; −1 1 2].

b. Now we are working with reflection, and the matrix with respect to our convenient basis is [S]_{B′} = diag(1, 1, −1). Therefore, the standard matrix is

[S] = P[S]_{B′}P⁻¹ = (1/3)[1 2 −2; 2 1 2; −2 2 1].

Notice that S = 2T − I.

c. Since we're dealing with rotation now, we want to use the orthonormal basis (indeed, we need the basis for V to be orthonormal; the length of the normal vector is immaterial). Now we take the change-of-basis matrix

Q = [1/√2 −1/√6 1/√3; 1/√2 1/√6 −1/√3; 0 2/√6 1/√3],

and, since Q is orthogonal, we have Q⁻¹ = Qᵀ. The basis {q₁, q₂, q₃} is right-handed, since, for example, q₁ × q₂ = q₃. This means that q₁ rotates toward q₂ and q₂ toward −q₁. Thus, we have the matrix [R]_{B′} = [√3/2 −1/2 0; 1/2 √3/2 0; 0 0 1], and

[R] = Q[R]_{B′}Q⁻¹ = (1/3)[√3+1, −1, −√3+1; √3−1, √3+1, −1; 1, √3−1, √3+1].
   
9.1.8 Let v₁ = (1, 0, 2, 1) and v₂ = (0, 1, −1, 1). We obtain a basis for V^⊥ by finding a basis for the nullspace of the matrix [1 0 2 1; 0 1 −1 1]: v₃ = (−2, 1, 1, 0), v₄ = (−1, −1, 0, 1). Then we have proj_V v₁ = v₁, proj_V v₂ = v₂, proj_V v₃ = 0, and proj_V v₄ = 0, and so the matrix for T = proj_V with respect to the new basis B′ = {v₁, v₂, v₃, v₄} is [T]_{B′} = diag(1, 1, 0, 0). Letting P = [1 0 −2 −1; 0 1 1 −1; 2 −1 1 0; 1 1 0 1], we have P⁻¹ = (1/17)[3 1 5 4; 1 6 −4 7; −5 4 3 −1; −4 −7 −1 6], and so

[T] = P[T]_{B′}P⁻¹ = (1/17)[3 1 5 4; 1 6 −4 7; 5 −4 14 1; 4 7 1 11].

9.1.9 a. For any invertible matrix P we have P −1 (cI)P = cP −1 P = cI.


b. If B = P −1 AP , then A = P BP −1 = (P −1 )−1 B(P −1 ), so A is similar to B.
" # " # " #
0 1 a 0 b 0
c. Let P = . Then P −1 = P , and P −1 P = .
1 0 0 b 0 a
" # " # " # " #
1 a−b 1 a 1 2a − b 1 b
d. Let P = . Then P = =P .
0 1 0 2 0 2 0 2
" # " # " #
2 1 2 0 x y
e. Let A = and B = . Suppose that P = satisfies
0 2 0 2 z w
" # " #
2x 2y 2x + z 2y + w
P B = AP . Then we have = , and so z = w = 0. But then P fails to
2z 2w 2z 2w
be invertible. " #
a 0
f. We repeat the calculation of part c with B = for some a, b ∈ R. Now
0 b
" #
ax by
PB = and the equations resulting from P B = AP are: 2x + z = ax, 2y + w = by,
az bw
2z = az, and 2w = bw. Now, if z 6= 0, we infer that a = 2, but then the first equation implies that
z = 0. Thus, we must have z = 0. Similarly, we must have w = 0. Once again, we are left with the
conclusion that P cannot be invertible.

9.1.10 a. If B = P −1 AP , then B T = (P −1 AP )T = P T AT (P −1 )T = Q−1 AT Q, where Q =


(P −1 )T . (Note that Q = (P −1 )T = (P T )−1 , so Q−1 = P T , as required.)
b. False: Let A = I and B = −I. Then A2 = B 2 , but B cannot be similar to A.
c. If B = P −1 AP and A is nonsingular, then B is a product of nonsingular matrices
and is therefore nonsingular (see Exercise 4.1.17 or Proposition 4.3 of Chapter 1.)
" # " # " #
1 0 3 1 −1 −1
d. False: Let A = and P = . Then B = P −1 AP = is
0 2 2 1 6 4
certainly not symmetric.
" # " # " #
1 0 3 1 3 1
e. False: Let A = ,P = , and B = P −1 AP = . Then
0 0 2 1 −6 −2
 
0
x= ∈ N(A), but Bx 6= 0.
1
f. From Exercise 4.4.13 we know that for any n×n matrices X and Y we have rank(XY ) ≤
rank(X) and rank(XY ) ≤ rank(Y ). Hence, if B is similar to A, we have rank(B) = rank(P −1 AP ) ≤
rank(P −1 A) ≤ rank(A). On the other hand, rank(A) = rank(P BP −1 ) ≤ rank(P B) ≤ rank(B).
Therefore, rank(A) = rank(B).

9.1.11 a. Suppose B is nonsingular. Then AB = B −1 (BA)B, so AB is similar to BA. Now,


suppose A is nonsingular. Then AB = A(BA)A−1 = (A−1 )−1 (BA)(A−1 ), whence AB is similar to
BA. " # " # " #
0 1 1 0 0 0
b. No: For example, let A = and B = . Then AB = ,
0 0 0 0 0 0
" #
0 1
whereas BA = ; these matrices cannot be similar, as O is similar only to itself.
0 0
   
− sin θ − cos φ cos θ
9.1.12 a. As in the hint, we consider the basis formed by v1 =  cos θ , v2 =  − cos φ sin θ ,
0 sin φ
 
sin φ cos θ
v3 = a =  sin φ sin θ , and the corresponding change-of-basis matrix
cos φ
 
− sin θ − cos φ cos θ sin φ cos θ
 
P =  cos θ − cos φ sin θ sin φ sin θ  .
0 sin φ cos φ

Since B ′ = {v1 , v2 , v3 } is an orthonormal basis for R3 , the matrix P is orthogonal, and P −1 = P T .


Also note that since v3 = a, {v1 , v2 } is an orthonormal basis for the plane a · x = 0.
If a vector x ∈ R3 has coordinate vector y with respect to the new basis B ′ , then x = P y. A
vector x lies in the intersection of a · x = 0 and x21 + x22 = 1 if and only if its coordinates with
respect to the basis B ′ satisfy y3 = 0 and (−y1 sin θ − y2 cos φ cos θ)2 + (y1 cos θ − y2 cos φ sin θ)2 = 1.
Simplifying, we find that y12 + (cos2 φ)y22 = 1, y3 = 0, which is evidently the equation of an ellipse.
Remark: How did we come up with this basis? v1 is orthogonal to both e3 and a and hence
is in the direction of the intersection of the xy-plane and the plane a · x = 0. v2 is orthogonal to
both a and v1 ; this is the direction of the projection of e3 onto the plane a · x = 0.

[Figure: the basis vectors v₁, v₂, and v₃ = a.]

b. Using the same change of coordinates as in part a, we have

x₁ = −y₁ sin θ − y₂ cos φ cos θ + y₃ sin φ cos θ
x₂ = y₁ cos θ − y₂ cos φ sin θ + y₃ sin φ sin θ
x₃ = y₂ sin φ + y₃ cos φ.

The equation x21 + x22 = 1 can be rewritten as x21 + x22 + x23 = 1 + x23 , and so the circle on the cylinder
given by intersecting with the plane x3 = b is given in the new coordinates by

y12 + y22 + y32 = 1 + b2 , y2 sin φ + y3 cos φ = b.

The equation of the projection of this set onto the plane y3 = 0 is given by eliminating y3 from
these equations:
y₁² + (y₂² − 2by₂ sin φ + b² sin²φ)/cos²φ = y₁² + (y₂ − b sin φ)²/cos²φ = 1,
which we recognize as the equation of an ellipse. Thus, the projection of the portion of the cylinder
with −h ≤ x3 ≤ h gives the family of ellipses
y₁² + (y₂ − b sin φ)²/cos²φ = 1, −h ≤ b ≤ h.
This can be pictured by drawing the single ellipse with b = 0 and then displacing it continuously
vertically so that its center moves from −h sin φ to h sin φ. Notice that when h sin φ ≥ cos φ, this
appears as a rectangle with two half-elliptical ends (as we’d expect); when h sin φ < cos φ, there is
a “hole” in the projection.

[Figure: the projected family of ellipses in the (y₁, y₂)-plane, in the two cases h ≥ cot φ and h < cot φ.]


 
1
When a =  0 , we have sin φ = cos θ = 1, so cos φ = 0 and our computations are a bit flawed;
0
but it is easy to see that in this case we obtain the obvious rectangle, −1 ≤ y1 ≤ 1,−h≤ y2 ≤ h.
0
As we vary φ, the rectangle acquires elliptical ends. Finally, when we take a =  0 , we have
1
cos φ = 1, and the projection is simply the circle y12 + y22 = 1, as expected.

9.1.13 As we see in the figure below, the shape that is generated consists of two cones (with vertex angle 2 arccos(1/√3) ≈ 109°) joined by a band in the shape of a hyperboloid of one sheet.

The hyperboloid is the surface of revolution obtained by rotating, say, the line segment joining

   
−1 1
 1  and  1  about the given axis.
−1 −1
Let’s find the equationof that
 surface in
 a new
 coordinate
 system
 corresponding to the or-
1 1 1
1 1 1
thonormal basis v1 = √  −1 , v2 = √  1 , v3 = √  1 . Letting A be the orthogonal
2 0 6 −2 3 1
matrix with these column vectors, we have
 
√1 − √12 0
 2 
A−1 = AT = 

√1
6
√1
6
− √26 
.
√1 √1 √1
3 3 3

Then the line segment is parametrized, in these new coordinates, by


 
√1 (t − 1)
 12 
f (t) =  √ (t + 3)  ,
 6  −1 ≤ t ≤ 1,
1
√ t
3

and the surface, in turn, by


   1 
  √ (t − 1) cos s − √1 (t + 3) sin s
cos s − sin s 0 2 6
s    
g =  sin s cos s 0  f (t) =  √12 (t − 1) sin s + √16 (t + 3) cos s  , 0 ≤ s ≤ 2π, −1 ≤ t ≤ 1.
t
0 0 1 √1 t
3

We now solve for the curve of intersection√of this surface with the plane x1 = 0: We have tan s =
√ t−1 t+3 3(t − 1)
3 , so cos s = √ and sin s = √ , and
t+3 2 t2 + 3 2 t2 + 3
√ !
1 3 2 1 2 2 p2 1
x2 = √ √ (t − 1) + √ (t + 3) =√ t +3 and x3 = √ t,
2
2 t +3 2 6 6 3

so the “profile curve” of the surface of revolution is the segment of the hyperbola

x₂²/2 − x₃² = 1, −1/√3 ≤ x₃ ≤ 1/√3.

9.1.14 a. Let v ∈ V and let w = T (v). As in the proof of Theorem 1.1, we let x and x′ denote
the coordinate vectors of v with respect to the bases V and V′ , respectively. Likewise, we let y and
y′ denote the coordinate vectors of w with respect to W and W′ , respectively. We have


x = P x′ , y = Qy′ , y′ = [T ]W ′
V′ x , and y = [T ]W
V x.

′ ′
Thus, y = Qy′ = Q[T ]W V′
x′ = Q[T ]WV′
P −1 x. Since this works for any v ∈ V , we have [T ]W
V =
′ ′
W −1 W −1 W
Q[T ]V′ P , or, equivalently, [T ]V′ = Q [T ]V P .
b. If we use the basis V in both domain and range, then the matrix for T is [T ]VV = I.
Now, changing basis in the domain to V′ but keeping the same basis for the range, we have Q = I,
and so [T ]VV′ = Q−1 [T ]VV P = P .

9.1.15 First assume that A = QP , where Q is orthogonal and P is a projection matrix. Let
V = C(P ) and note that R(A) = R(QP ) = R(P ) = C(P T ) = C(P ) = V , since P T = P and
Q is nonsingular. Also note that C(A) = {Qv : v ∈ V }. Now, if x ∈ R(A) = V , we have
AT Ax = (P Q−1 )(QP )x = P x = x. If y ∈ C(A), then we can write y = Qv for some v ∈ V ,
so AAT y = (QP )(P Q−1 )y = (QP Q−1 )y = QP v = Qv = y. Thus, T : R(A) → C(A) and
S : C(A) → R(A) are inverse functions.
To prove the converse, let’s assume T : R(A) → C(A) and S : C(A) → R(A) are inverse
functions. Choose an orthonormal basis {v1 , . . . , vk } for R(A). We claim that {Av1 , . . . , Avk }
is an orthonormal basis for C(A). First, notice that Avi · Avj = vi · AT Avj = vi · vj , since
AT A is the identity on R(A). Thus, {Av1 , . . . , Avk } is an orthonormal set of vectors in C(A).
Since dim C(A) = dim R(A), it must give a basis for C(A). Now, extend the set {v1 , . . . , vk }
to give an orthonormal basis V = {v1 , . . . , vn } for Rn and extend the set {Av1 , . . . , Avk } to give
an orthonormal basis W = {Av1 , . . . , Avk , wk+1 , . . . , wn } for Rn . Notice that for j > k we have
vj ∈ R(A)⊥ = N(A) and so Av " j = 0 for# j > k. Thus, the matrix for T with respect to the bases
I k O
V and W is given by [T ]W
V = .
O O
Now let Q1 be the matrix whose columns are the respective basis vectors of V, and let Q2 be
the matrix whose columns are the respective basis vectors of W. Since these are orthonormal bases,
Q1 and Q2 are orthogonal matrices. Then, by Exercise 14, the standard matrix for T is given by

−1 −1 W −1
[T ] = Q1 [T ]W
V Q2 = Q1 Q2 Q2 [T ]V Q2 .

Letting Q = Q1 Q−1 W −1
2 and P = Q2 [T ]V Q2 , we see that A = QP , where Q is orthogonal (see Exercise
1.4.35) and P is a projection matrix.

9.2. Eigenvalues, Eigenvectors, and Diagonalizability

9.2.1 a. p(t) = t² − (tr A)t + det A = t² − 5t − 6 = (t + 1)(t − 6), so the eigenvalues are λ₁ = −1 and λ₂ = 6. We have

A + I = [2 5; 2 5] and A − 6I = [−5 5; 2 −2],

so (−5, 2) gives a basis for E(−1) and (1, 1) gives a basis for E(6).
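Remark: These computations are easy to confirm with sympy. The sketch below assumes A = [1 5; 2 4], reconstructed from A + I and A − 6I above (the exercise statement itself is not reproduced here):

    import sympy as sp

    A = sp.Matrix([[1, 5], [2, 4]])
    print(A.charpoly().as_expr())   # lambda**2 - 5*lambda - 6
    print(A.eigenvects())            # -1 with a multiple of (-5, 2); 6 with a multiple of (1, 1)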
b. p(t) = t2 − 1 = (t + 1)(t − 1), so the eigenvalues are λ1 = −1 and λ2 = 1. We have
" # " #
1 1 −1 1
A+I = and A−I = ,
1 1 1 −1
   
−1 1
so gives a basis for E(−1) and gives a basis for E(1).
1 1
c. p(t) = t2 + t − 2 = (t + 2)(t − 1), so the eigenvalues are λ1 = −2 and λ2 = 1. We have
" # " #
12 −6 9 −6
A + 2I = and A−I = ,
18 −9 18 −12
   
1 2
so gives a basis for E(−2) and gives a basis for E(1).
2 3
d. p(t) = t2 − 2t − 8 = (t − 4)(t + 2), so the eigenvalues are λ1 = −2 and λ2 = 4. We
have
" # " #
3 3 −3 3
A + 2I = and A − 4I = ,
3 3 3 −3
   
−1 1
so gives a basis for E(−2) and gives a basis for E(4).
1 1

" e.# p(t) = t2 − 4t + 4 = (t − 2)2 , so the eigenvalues are λ1 = λ2 = 2. Since A − 2I =


 
−1 1 1
has rank 1, E(2) is one-dimensional, with basis .
−1 1 1

f. p(t) = 9t − t3 = −t(t − 3)(t + 3), so the eigenvalues are λ1 = −3, λ2 = 0, and λ3 = 3.


We have
   
2 1 2 −4 1 2
   
A + 3I =  1 5 1  , A − 0I = A, and A − 3I =  1 −1 1.
2 1 2 2 1 −4
     
1 1 1
Passing to reduced echelon form, we find that  0  spans E(−3),  −1  spans E(0), and  2 
−1 1 1
spans E(3).

g. p(t) = −(t − 1)2 (t − 3), so the eigenvalues are λ1 = λ2 = 1 and λ3 = 3. We have


   
0 0 0 −2 0 0
   
A − I =  −2 0 2 and A − 3I =  −2 −2 2,
−2 0 2 2 0 0
      
 0 1   0 
so   
1 , 0  is a basis for E(1) and  1  is a basis for E(3).
   
0 1 1
h. p(t) = −(t − 1)2 (t − 3), so the eigenvalues are λ1 = λ2 = 1 and λ3 = 3. We have
   
0 −1 2 −2 1 2
   
A−I = 0 0 0 and A − 3I =  0 −2 0,
0 −2 2 0 −2 0
   
1 1
so the vector  0  gives a basis for E(1) and  0  gives a basis for E(3).
0 1
i. p(t) = −(t − 1)2 (t − 2), so the eigenvalues are λ1 = λ2 = 1 and λ3 = 2. We have
   
1 0 1 0 0 1
   
A−I = 0 0 2 and A − 2I =  0 −1 2,
0 0 0 0 0 −1
   
0 1
so the vector 1 gives a basis for E(1) and 0  gives a basis for E(2).
  
0 0
j. p(t) = −t(t − 1)(t + 1), so the eigenvalues are λ1 = −1, λ2 = 0, and λ3 = 1. We have
   
2 −2 2 0 −2 2
   
A + I =  −1 1 −1  and A − I =  −1 −1 −1  ,
0 2 0 0 2 −2
     
−1 2 −2
so    
0 gives a basis for E(−1), −1 gives a basis for E(0) = N(A), and  1  gives a basis
1 −2 1
for E(1).
k. p(t) = (3 − t)((1 − t)(2 − t) − 2) = −t(t − 3)2 , so the eigenvalues are λ1 = 0 and λ2 =
   
0 1 0 2
 
λ3 = 3. We have A − 3I =  0 −2 2 . We find that  −6  gives a basis for E(0) = N(A)
0 1 −1 3
 
1
and  0  gives a basis for E(3).
0

l. p(t) = −(t + 1)(t − 2)(t − 3), so the eigenvalues are λ1 = −1, λ2 = 2, and λ3 = 3. We
have
     
2 −6 4 −1 −6 4 −2 −6 4
     
A + I =  −2 −3 5, A − 2I =  −2 −6 5, and A − 3I =  −2 −7 5,
−2 −6 8 −2 −6 5 −2 −6 4
     
1 2 1
so  1 ,  1 , and  −1  give bases for the respective eigenspaces.
1 2 −1
m. p(t) = −(t − 1)2 (t − 3), so the eigenvalues are λ1 = λ2 = 1 and λ3 = 3. We have
   
2 2 −2 0 2 −2
   
A−I = 2 1 −1  and A − 3I =  2 −1 −1  ,
2 1 −1 2 1 −3
   
0 1
and we find that  1  gives a basis for E(1) and  1  gives a basis for E(3).
1 1
n. p(t) = (t − 1)2 (t − 2)2 (e.g., applying Exercise 7.5.10), and so the eigenvalues are
λ1 = λ2 = 1 and λ3 = λ4 = 2. We have
   
0 0 0 1 −1 0 0 1
   
0 0 1 1  0 −1 1 1
A−I = 0  and A − 2I = 

,
 0 1 0  0 0 0 0
0 0 0 1 0 0 0 0
        

 1 0 
 
 1 0 
         
0  ,   is a basis for E(1) and   ,  1  is a basis for E(2).
1 1
so   0   0   0   1 

  
 
   
0 0 1 0

9.2.2 Proposition 2.3 tells us that 0 is an eigenvalue of A if and only if det A = 0, which is
true precisely when A is singular. (See Theorem 5.5 of Chapter 7.) Alternatively, directly from the
definition, 0 is an eigenvalue if and only if Av = 0 for some nonzero vector v, which means A is
singular.

9.2.3 If A is upper (lower) triangular, then so is A − tI, so, by Proposition 5.12 of Chapter 7,
we have p(t) = det(A − tI) = (a11 − t)(a22 − t) · · · (ann − t), whose roots are the diagonal entries of
A.

9.2.4 Let V ⊂ Rn be a subspace, and let T : Rn → Rn and S : Rn → Rn be the linear


transformations defined respectively by projecting onto and reflecting across V . (See Example 1 in
Section 1.) Then if v ∈ V and w ∈ V ⊥ , we have T (v) = S(v) = v, T (w) = 0, and S(w) = −w.
Thus, the eigenvalues of T are 0 and 1, and we see that E(0) = V ⊥ and E(1) = V . Similarly, the
eigenvalues of S are −1 and 1, with E(−1) = V ⊥ and E(1) = V .

Remark: To be complete, how do we know these are the only possibilities? Suppose T (x) = λx.
Write x = v + w, where v ∈ V and w ∈ V ⊥ . Then, applying T , we have λ(v + w) = v, so
(λ − 1)v + λw = 0. Since v · w = 0, we see easily that λ = 0 and v = 0 or λ = 1 and w = 0. A
similar argument works in the case of S.

9.2.5 Suppose λ is an eigenvalue of A, so Av = λv for some nonzero v. Since A is nonsin-


gular, λ 6= 0 (see Exercise 2). Then, multiplying this equation by A−1 , we have v = A−1 (Av) =
A−1 (λv) = λ(A−1 v), from which we infer that A−1 v = (1/λ)v, so that 1/λ is an eigenvalue of
A−1 . Similarly, since A = (A−1 )−1 , it follows that every eigenvalue of A is the reciprocal of an
eigenvalue of A−1 .

9.2.6 a. We proceed by induction. We know that x is an eigenvector of A = A1 with eigenvalue


λ. Suppose that x is an eigenvector of Ak with eigenvalue λk . Then we have Ak+1 x = A(Ak x) =
A(λk x) = λk (Ax) = λk (λx) = λk+1 x. Therefore, An x = λn x for all positive integers n, as required.
Since x 6= 0, this means that x is an eigenvector of An for every positive integer n.
b. Since Ax = λx and x 6= 0, we have (A + I)x = λx + x = (λ + 1)x, so x is an
eigenvector of A + I with corresponding eigenvalue λ + 1.
c. Since Ax = λx and Bx = µx, we have (A + B)x = Ax + Bx = λx + µx = (λ + µ)x.
Inasmuch as x 6= 0, we conclude that x is an eigenvector of A + B with eigenvalue λ + µ.
" # " #
1 0 0 0
d. 1 is an eigenvalue of both A = and B = , and yet 2 is certainly
0 0 0 1
" #
1 0
not an eigenvalue of A + B = .
0 1
" # " #
0 0 0 1
9.2.7 Perhaps the easiest counterexample is this: Let A = and B = .
0 0 0 0
Then both have the characteristic polynomial p(t) = t2 ; A and B cannot be similar since O is
similar only to itself.

9.2.8 By Proposition 4.5 of Chapter 1, we have Ax·y = x·AT y. Since Ax = λx and AT y = µy,
we have λx · y = x · µy, so (λ − µ)(x · y) = 0. If λ 6= µ, then we must have x · y = 0.

9.2.9 a. By Proposition 5.11 of Chapter 7, det A = det AT for any matrix A, so

pA (t) = det(A − tI) = det(A − tI)T = det(AT − tI) = pAT (t).

Since A and AT have the same characteristic polynomial, they have the same eigenvalues.
" #  
3 1 1
b. Let A = . Then 4 is an eigenvalue of both A and AT . gives a basis
−3 7 1
 
−3
for EA (4), whereas gives a basis for EAT (4).
1

9.2.10 Suppose λ1 , . . . , λn are the roots of p(t) = det(A−tI). Then, by the root-factor theorem,
since p(t) has degree n, there is a constant c so that p(t) = c(t − λ1 )(t − λ2 ) · · · (t − λn ). Since the
coefficient of tn in p(t) is (−1)n , we infer that p(t) = (−1)n (t − λ1 )(t − λ2 ) · · · (t − λn ). Therefore,
det A = p(0) = (−1)n (−λ1 )(−λ2 ) · · · (−λn ) = λ1 λ2 · · · λn .

9.2.11 a. Suppose A is nonsingular. Then A is invertible and so BA = A−1 (AB)A, from which
we see that AB and BA are similar. By Lemma 2.4, pAB (t) = pBA (t).
b. Applying Exercise 7.5.10a, consider

det([tI −A; O I][tI A; B tI]) = det([t²I − AB, O; B, tI]) = det(t²I − AB) det(tI).

On the other hand, we have

det([tI −A; O I][tI A; B tI]) = det([tI −A; O I]) det([tI A; B tI]) = det(tI) det([tI A; B tI]),

and so we conclude that det(t²I − AB) = det([tI A; B tI]). Now, by Exercise 7.5.10c, for any t ≠ 0, the latter is equal to det(t²I − BA), and so p_{AB}(t²) = p_{BA}(t²) for all t ≠ 0. Therefore, p_{AB}(u) = p_{BA}(u) for all u > 0, and so the polynomials are identical.
We can also avoid the reference to the latter exercise, by proceeding directly:

det(tI) det([tI A; B tI]) = det([I O; −B tI][tI A; B tI]) = det([tI A; O t²I − BA]) = det(tI) det(t²I − BA),

as desired.

9.2.12 We use the information determined in the solutions of Exercise 1.


a. Yes: The eigenvalues are distinct, and so there is a basis consisting of eigenvectors.
b. Yes: The eigenvalues are distinct, and so there is a basis consisting of eigenvectors.
c. Yes: The eigenvalues are distinct, and so there is a basis consisting of eigenvectors.
d. Yes: The eigenvalues are distinct, and so there is a basis consisting of eigenvectors.
e. No: The eigenvalue λ = 2 has algebraic multiplicity 2, but geometric multiplicity 1.
f. Yes: The eigenvalues are distinct, and so there is a basis consisting of eigenvectors.
g. Yes: There is a basis consisting of eigenvectors.
h. No: The eigenvalue λ = 1 has algebraic multiplicity 2, but geometric multiplicity 1.
i. No: The eigenvalue λ = 1 has algebraic multiplicity 2, but geometric multiplicity 1.
j. Yes: The eigenvalues are distinct, and so there is a basis consisting of eigenvectors.
k. No: The eigenvalue λ = 3 has algebraic multiplicity 2, but geometric multiplicity 1.
l. Yes: The eigenvalues are distinct, and so there is a basis consisting of eigenvectors.

m. No: The eigenvalue λ = 1 has algebraic multiplicity 2, but geometric multiplicity 1.


n. Yes: There is a basis consisting of eigenvectors.

9.2.13 a. True: This is the matrix version of Corollary 2.7.


" # " #
1 0 0 1
b. False: Let A = and B = .
0 1 0 0
c. True: By Lemma 2.4, similar matrices have identical characteristic polynomials. Al-
ternatively, suppose v is an eigenvector of A with corresponding eigenvalue λ. Then P −1 BP v = λv,
so B(P v) = λ(P v), which shows that P v is an eigenvector of B with corresponding eigenvalue λ.
" # " #
1 0 1 3
d. False: Let A = and B = .
0 1 0 1
" #
0 −1
e. False: Let A = .
1 0
f. True: Since A is diagonalizable, there are a nonsingular matrix P1 and a diagonal
matrix Λ so that P1−1 AP1 = Λ. Since B is also diagonalizable and has the identical eigenvalues,
there is a nonsingular matrix P2 so that P2−1 BP2 = Λ. Therefore, we have P1−1 AP1 = P2−1 BP2 ,
and so A = (P1 P2−1 )B(P2 P1−1 ). Letting P = P2 P1−1 , we have A = P −1 BP .

9.2.14 Let the eigenvalues of A be λ1 and λ2 . Since λ1 and λ2 are integers and their product
is det A = 120 (see Exercise 10), they must be distinct. Therefore, by Corollary 2.7, A must be
diagonalizable.

9.2.15 Suppose X T = λX. Then we have X = (X T )T = λX T = λ2 X. If X 6= O, then we must


have λ = ±1. Thus, the eigenvalues of T can be only ±1. On the other hand, both are eigenvalues,
since any symmetric matrix X satisfies T (X) = X and any skew-symmetric matrix X satisfies
T (X) = −X. That is, E(1) = S, the subspace of symmetric n × n matrices, and E(−1) = K, the
subspace of skew-symmetric n × n matrices.
Now, every matrix X can be written in the form X = 21 (X +X T )+ 21 (X −X T ), where 12 (X +X T )
is symmetric and 21 (X − X T ) is skew-symmetric. Thus, E(1) + E(−1) = Mn×n , so, concatenating
a basis for E(1) and a basis for E(−1), we must obtain a basis for Mn×n . (Here we are using the
result of Exercise 4.3.21a.) Since there is a basis for Mn×n consisting of eigenvectors of T , we infer
that T is, in fact, diagonalizable.

9.2.16 a. (A − 2I)2 = O.
" # " # " #
−1 1 0 1
b. A − 2I = , so we see by inspection that (A − 2I) = . How
−1 1 1 1
would we know a priori that this system is consistent? Part a tells us that if b ∈ C(A − 2I), then
b ∈ N(A − 2I), so C(A − 2I) ⊂ N(A − 2I). On the other hand, by Lemma 3.8 of Chapter 4, since
both of these subspaces are one-dimensional, they must be equal. Therefore, v1 ∈ N(A − 2I) =⇒
v1 ∈ C(A − 2I).

c. Since Av₁ = 2v₁ and Av₂ = v₁ + 2v₂, the matrix for A with respect to the basis {v₁, v₂} is [2 1; 0 2].

9.2.17 Suppose λ is an eigenvalue of A with geometric multiplicity d. Then dim EA (λ) =


dim N(A − λI) = d. Say A is an n × n matrix. Then Theorem 4.5 of Chapter 4 tells us that
 
dim N (A − λI)T = n − rank (A − λI)T = n − rank(A − λI) = dim N(A − λI) = d. Thus,
dim EAT (λ) = d, as well.

9.2.18 a. Suppose λ is an eigenvalue of A, so there is a nonzero vector v satisfying Av = λv.


Then A2 v = A(Av) = A(λv) = λ(Av) = λ2 v. Since A2 = A, we infer that A2 v = Av, and so
λ2 v = λv. Inasmuch as v 6= 0, we infer that λ2 = λ, so λ = 0 or λ = 1.
b. From Exercise 4.4.16 we see that C(A) = E(1) and N(A) = E(0) and, moreover, that
C(A) + N(A) = Rn , so, putting together a basis for C(A) and a basis for N(A), we obtain a basis
for Rn . (Here we are using the result of Exercise 4.3.21a.) Thus, there is a basis for Rn consisting
of eigenvectors of A, and so A is diagonalizable.

9.2.19 a. Let λ be an eigenvalue of A. Then there is a nonzero vector v ∈ Rn so that Av = λv.


Therefore, A2 v = λ2 v = v, so λ2 = 1 and λ = ±1.
b. We first establish that E(1) = {x ∈ Rn : x = 12 (u + Au) for some u ∈ Rn }. If
x ∈ E(1), then Ax = x and so x = 12 (x + Ax). On the other hand, if x = 12 (u + Au) for
some u, we have Ax = 21 (Au + A2 u) = 21 (Au + u) = x, so x ∈ E(1). Similarly, we establish
that E(−1) = {x ∈ Rn : x = 12 (u − Au) for some u ∈ Rn }. If x ∈ E(−1), then Ax = −x and so
x = 12 (x−Ax). Conversely, if x = 12 (u−Au) for some u, then Ax = 12 (Au−A2 u) = 12 (Au−u) = −x,
as desired.
c. Given any x ∈ Rn , we write x = 12 (x + Ax) + 12 (x − Ax). By part b, we’ve expressed
x as the sum of vectors in E(1) and E(−1). Thus, E(1) + E(−1) = Rn . Putting together a basis
for E(1) and a basis for E(−1), we obtain a basis for Rn , so there is a basis for Rn consisting
of eigenvectors of A, and A is therefore diagonalizable. (Here we are using the result of Exercise
4.3.21a.)

9.2.20 a. p = pA is a polynomial of degree 3 and hence has a real root. (This is a standard
application of the intermediate value theorem, since polynomials are continuous and p(t) → −∞
as t → ∞ and p(t) → ∞ as t → −∞.)
b. Since A is orthogonal, we have AT A = I. Therefore, kAxk2 = Ax · Ax = x · AT Ax =
x · x = kxk2 . If x is an eigenvector with corresponding eigenvalue λ, then Ax = λx and so
kAxk = |λ|kxk = kxk, from which we deduce that |λ| = 1.
c. Since det A = 1, the characteristic polynomial takes the form p(t) = −t3 + · · · + 1.
Since p(0) = 1 and p(t) → −∞ as t → ∞, we see that the graph of p(t) must cross the positive real
axis. But we showed in part b that this can happen only at t = 1.

d. Assuming A 6= I, we first show that dim E(1) = 1. Suppose, to the contrary, that
dim E(1) = 2. Let {v1 , v2 } be a basis for E(1) and let v3 be a basis for E(1)⊥ . Then for i = 1, 2
we have Av3 · vi = v3 · AT vi = v3 · A−1 vi = v3 · vi = 0, so Av3 = λv3 for some scalar λ. By part
b, λ can equal only 1 or −1. We rule out the former because A 6= I; we rule out the latter because
the product of the eigenvalues must equal det A = 1.
Now choose an orthonormal basis {v1 , v2 , v3 } for R3 with v1 ∈ E(1). Let’s consider the matrix
B for A with respect to this basis. For j = 2, 3 we have Avj ·v1 = vj ·AT v1 = vj ·A−1 v1 = vj ·v1 = 0,
so Avj ∈ Span(v2 , v3 ) for j = 2, 3. This means that B takes the form
 
1 0 0
 
B= 0 .
C
0

Now what do we know about the matrix C? We know that C T C = I and det C = det B = 1.
Therefore, we conclude from Exercise 1.4.34 that µC gives a rotation of the plane spanned by v2
and v3 , and thus µA gives rotation through some angle θ about the line E(1).
e. Reversing the argument of part d, suppose T : R3 → R3 is a linear transformation
giving the rotation about some axis through some angle θ. Then with respect to the obvious
orthonormal basis for R3 it has the matrix
 
1 0 0
 
B =  0 cos θ − sin θ  .
0 sin θ cos θ

Then the change-of-basis formula tells us that its standard matrix A = P BP −1 is orthogonal,
since AT A = (P BP −1 )T (P BP −1 ) = P B T P T P BP −1 = I. Likewise, det A = 1. Now, the matrix
representing the composition of two rotations is therefore the product of two orthogonal matrices
with determinant 1 and therefore is again an orthogonal matrix of determinant 1 (see Exercise
1.4.35). Thus, by part d, the composition of two rotations in R3 is again a rotation.

9.2.21 Although we could calculate the characteristic polynomial of C, that doesn't appear to be too much fun. Let's just see if 1 is an eigenvalue by finding N(C − I). With a bit of care, we find that

C − I = [−5/6, 1/3 + √6/6, 1/6 − √6/3; 1/3 − √6/6, −1/3, 1/3 + √6/6; 1/6 + √6/3, 1/3 − √6/6, −5/6], which reduces to [1 0 −1; 0 1 −2; 0 0 0],

and so v₁ = (1, 2, 1) gives a basis for E(1). Now, v₂ = (1/√3)(1, −1, 1) and v₃ = (1/√2)(1, 0, −1) give an orthonormal basis for E(1)^⊥, and we calculate that Cv₂ = −v₃ and Cv₃ = v₂. It follows that T is rotation through angle −π/2 about the axis spanned by v₁ (as viewed from positively far out the v₁-axis). (Note that v₂ × v₃ is in the direction of v₁, so {v₁, v₂, v₃} is a “right-handed” basis for R³.)
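Remark: A numerical check, with C assembled from the entries of C − I displayed above, that C is orthogonal with determinant 1 and fixes the axis v₁; a numpy sketch:

    import numpy as np

    s6 = np.sqrt(6)
    C = np.array([
        [1/6,        1/3 + s6/6, 1/6 - s6/3],
        [1/3 - s6/6, 2/3,        1/3 + s6/6],
        [1/6 + s6/3, 1/3 - s6/6, 1/6       ],
    ])
    v1 = np.array([1.0, 2.0, 1.0])
    # orthogonal, determinant 1, and Cv1 = v1
    print(np.allclose(C @ v1, v1), np.allclose(C.T @ C, np.eye(3)), round(np.linalg.det(C), 6))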

9.2.22 When n = 1 there is nothing to prove. Since all of the eigenvalues of A are real, A
must have at least one eigenvector. Let v1 be an eigenvector with corresponding eigenvalue λ1 , and
choose v2′ , . . . , vn′ so that {v1 , v2′ , . . . , vn′ } gives a basis for Rn . The matrix for A with respect to
this basis takes the form  
λ1 ∗ · · · ∗
 
 0 
A′ =   ..
,

 . B 
0
where B is an (n − 1) × (n − 1) matrix. Since det(A − tI) = det(A′ − tI) = (λ1 − t) det(B − tI),
we see that all the eigenvalues of B must be real. By induction, there is a basis {v2′′ , . . . , vn′′ }
for Span(v2′ , . . . , vn′ ) with respect to which the matrix for B becomes upper triangular. Then the
matrix A′′ for A with respect to the basis {v1 , v2′′ , . . . , vn′′ } is upper triangular, as desired.

9.2.23 Proceeding as suggested by the hint, let B = {v1 , . . . , vk , vk+1 , . . . , vn } be a basis for V
with Span(v1 , . . . , vk ) = W . Since T (W ) ⊂ W , the matrix for T with respect to B will take the
block form " #
B D
A= .
O C
Using Exercise 7.5.10 as usual, we have det(A − tI) = det(B − tI) det(C − tI). Fix an eigenvalue
λ of B. We must show that its geometric multiplicity equals its algebraic multiplicity. Denote
by dA , dB , and dC the geometric multiplicities of the eigenvalue λ for the respective matrices and
denote by mA , mB , and mC the corresponding algebraic multiplicities. Since pA (t) = pB (t)pC (t),
it follows that mA = mB + mC . We also know that dA ≤ mA , dB ≤ mB , and dC ≤ mC , and we
wish to prove that dB = mB .
The crucial observation is that rank(A − λI) ≥ rank(B − λI) + rank(C − λI) (as there may be
extra pivots in the upper right corner of A−λI). Since dA = n−rank(A−λI), dB = k−rank(B−λI),
and dC = (n − k) − rank(C − λI), we then infer that dA ≤ dB + dC . Putting together all the
information, we have
dA ≤ dB + dC ≤ mB + mC = mA .
But since T is diagonalizable, we know that dA = mA , and so equality must hold at every stage,
as in the proof of Theorem 2.9. In particular, we have dB = mB , and B will be diagonalizable.

9.2.24 a. Since A and B have the same eigenvectors, there is a single nonsingular matrix P so
that both P −1 AP = Λ1 and P −1 BP = Λ2 are diagonal. Since diagonal matrices commute, we have

(P −1 AP )(P −1 BP ) = Λ1 Λ2 = Λ2 Λ1 = (P −1 BP )(P −1 AP ),

and so P −1 (AB)P = P −1 (BA)P , from which it follows immediately that AB = BA.


b. Let v be an eigenvector of A with corresponding eigenvalue λ. Since A(Bv) =
B(Av) = λ(Bv), we see that Bv ∈ EA (λ). Since the eigenspaces of A are one-dimensional,
there must be a scalar µ so that Bv = µv, so v is an eigenvector of B as well. Since A has n

distinct eigenvalues, there is a basis for Rⁿ consisting of eigenvectors of A. Since these are also
eigenvectors of B, it follows that B is diagonalizable.
The answer to the query is no: For example, if B = I, then every vector is an eigenvector of
B, but certainly need not be one of A.
c. As in part b, if v ∈ EA (λ), then A(Bv) = B(Av) = λ(Bv), so Bv ∈ EA (λ).
Therefore, B(EA (λ)) ⊂ EA (λ). Applying Exercise 23, since we are told that B is diagonalizable,
it follows that there is a basis for EA (λ) consisting of eigenvectors of B. Finally, since A is
diagonalizable, we know that the eigenspaces of A span all of Rn , so we conclude that there is a
basis {v1 , . . . , vn } for Rn consisting of eigenvectors of both A and B. Letting P be the matrix
whose column vectors are v1 , . . . , vn , we conclude that both P −1 AP and P −1 BP are diagonal.

9.2.25 This exercise is easy once we observe that Theorem 2.6 can be rephrased as follows:
Suppose λ1 , . . . , λk are distinct eigenvalues of a linear transformation. If Vi ∈ E(λi ) for i = 1, . . . , k,
and V1 + · · · + Vk = 0, then Vi = 0 for all i. (If not, discarding those that are 0, suppose
Vi1 + · · · + Vis = 0 and each Vij 6= 0. Then each Vij is an eigenvector with corresponding
eigenvalue λij , and this gives a nontrivial linear relation among eigenvectors corresponding to
distinct eigenvalues, contradicting the theorem.)
a. Suppose c1 v1 + c2 v2 + · · · + ck vk + d1 w1 + · · · + dℓ wℓ = 0. Set V = c1 v1 + c2 v2 + · · · +
ck vk ∈ E(λ) and W = d1 w1 + · · · + dℓ wℓ ∈ E(µ). Then, by our remark, we must have V = W = 0,
and now linear independence of the respective sets of vectors tells us that c1 = · · · = ck = 0 and
d1 = · · · = dℓ = 0.
b. Generalizing this argument, suppose $\sum_{i=1}^{k}\sum_{j=1}^{d_i} c_{ij}v_j^{(i)} = 0$. Set $V_i = \sum_{j=1}^{d_i} c_{ij}v_j^{(i)}$ and note
that Vi ∈ E(λi ). Then we have V1 + · · · + Vk = 0. We conclude from our remark that Vi = 0
for all i = 1, . . . , k. Then linear independence of the individual sets of vectors implies that all the
cij = 0.

9.3. Difference Equations and Ordinary Differential Equations

9.3.1 p(t) = t^2 − 9 = (t + 3)(t − 3), so the eigenvalues of A are λ1 = −3 and λ2 = 3. The
vector v1 = (−1, 1)^T spans E(−3) and v2 = (5, 1)^T spans E(3). Now we can write A = PΛP^{−1}, where
$$P = \begin{pmatrix} -1 & 5 \\ 1 & 1 \end{pmatrix}, \qquad \Lambda = \begin{pmatrix} -3 & 0 \\ 0 & 3 \end{pmatrix}, \qquad P^{-1} = \frac16\begin{pmatrix} -1 & 5 \\ 1 & 1 \end{pmatrix}.$$
Then
$$A^k = P\Lambda^kP^{-1} = \frac16\begin{pmatrix} -1 & 5 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} (-3)^k & 0 \\ 0 & 3^k \end{pmatrix}\begin{pmatrix} -1 & 5 \\ 1 & 1 \end{pmatrix} = \frac{3^k}{6}\begin{pmatrix} 5 + (-1)^k & 5(1 + (-1)^{k+1}) \\ 1 + (-1)^{k+1} & 5(-1)^k + 1 \end{pmatrix}.$$
Remark: In fact, A^{2k} = 3^{2k} I and A^{2k+1} = 3^{2k} A.
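A quick numerical sanity check of the closed form for A^k (a minimal sketch; the matrix A itself is stated in the textbook exercise, so here it is rebuilt from the eigendata above):

```python
import numpy as np

# Rebuild A from the eigendata above: A = P Lambda P^{-1} = [[2, 5], [1, -2]].
P = np.array([[-1.0, 5.0], [1.0, 1.0]])
Lam = np.diag([-3.0, 3.0])
A = P @ Lam @ np.linalg.inv(P)

# Check the closed form against direct matrix powers.
for k in range(1, 8):
    closed = (3.0**k / 6) * np.array(
        [[5 + (-1)**k, 5 * (1 + (-1)**(k + 1))],
         [1 + (-1)**(k + 1), 5 * (-1)**k + 1]])
    assert np.allclose(np.linalg.matrix_power(A, k), closed)
```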



9.3.2 There are three possible states at any given time: (1) two Buds in the first tub and two
Becks in the second; (2) one of each type of beer in each tub; (3) two Becks in the first tub and
two Buds in the second. Let x_k be the vector whose ith coordinate is the probability that the beers
are in state i at time k. We find that the transition matrix is
$$A = \begin{pmatrix} 0 & \tfrac14 & 0 \\ 1 & \tfrac12 & 1 \\ 0 & \tfrac14 & 0 \end{pmatrix}.$$
The unique vector in E(1) whose entries sum to 1 is $\tfrac16(1, 4, 1)^T$, so as k → ∞, 2/3 of the time there will be exactly
one Becks in the first tub and 5/6 of the time there will be at least one Becks in the first tub.
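As a sketch of how one might confirm the steady state numerically (assuming the transition matrix above):

```python
import numpy as np

# Column-stochastic transition matrix from the solution above.
A = np.array([[0.0, 0.25, 0.0],
              [1.0, 0.50, 1.0],
              [0.0, 0.25, 0.0]])

x = np.array([1.0, 0.0, 0.0])   # any starting probability vector works here
for _ in range(100):
    x = A @ x

print(x)                         # -> approximately [1/6, 2/3, 1/6]
assert np.allclose(x, [1/6, 2/3, 1/6])
```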

9.3.3 Let a_k, b_k, c_k, d_k, and e_k denote respectively the probabilities that Gus has $0, $100,
$200, $300, and $400 after playing the kth game. Let x_k = (a_k, b_k, c_k, d_k, e_k)^T. Then we have x_0 = (0, 0, 1, 0, 0)^T and
x_{k+1} = Ax_k, where A is the transition matrix
$$A = \begin{pmatrix} 1 & \tfrac35 & 0 & 0 & 0 \\ 0 & 0 & \tfrac35 & 0 & 0 \\ 0 & \tfrac25 & 0 & \tfrac35 & 0 \\ 0 & 0 & \tfrac25 & 0 & 0 \\ 0 & 0 & 0 & \tfrac25 & 1 \end{pmatrix}.$$

(We notice that the first and last columns of A^k never change, so this is not a regular stochastic
matrix. Indeed, we see that E(1) is at least two-dimensional.) The characteristic polynomial of A
is p(t) = −t(t − 1)^2(t^2 − 12/25), and so the eigenvalues are λ1 = 0, λ2 = λ3 = 1, λ4 = −2√3/5, and
λ5 = 2√3/5. (Note that |λ4| = |λ5| < 1.) We solve for the respective eigenvectors and find that
A = PΛP^{−1}, where
$$\Lambda = \begin{pmatrix} 0 & & & & \\ & 1 & & & \\ & & 1 & & \\ & & & -\tfrac{2\sqrt3}{5} & \\ & & & & \tfrac{2\sqrt3}{5} \end{pmatrix} \quad\text{and}\quad P = \begin{pmatrix} -\tfrac94 & 0 & 1 & \tfrac94 & \tfrac94 \\ \tfrac{15}{4} & 0 & 0 & -\tfrac{3(5+2\sqrt3)}{4} & \tfrac{3(-5+2\sqrt3)}{4} \\ 0 & 0 & 0 & \tfrac{\sqrt3(5+2\sqrt3)}{2} & \tfrac{\sqrt3(-5+2\sqrt3)}{2} \\ -\tfrac52 & 0 & 0 & \tfrac{-5-2\sqrt3}{2} & \tfrac{-5+2\sqrt3}{2} \\ 1 & 1 & 0 & 1 & 1 \end{pmatrix}.$$
Now, note that Λ^k → Λ_∞ = diag(0, 1, 1, 0, 0) as k → ∞, so
$$A^k x_0 \to P\Lambda_\infty(P^{-1}x_0) = P\begin{pmatrix} 0 \\ 4/13 \\ 9/13 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 9/13 \\ 0 \\ 0 \\ 0 \\ 4/13 \end{pmatrix}.$$

Thus, the probability that Gus eventually loses all his money is 9/13. Interestingly, we see from
this calculation that the probability that the game continues forever is 0.
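A minimal numerical check of this limit (a sketch, assuming the transition matrix reconstructed above):

```python
import numpy as np

A = np.array([[1.0, 0.6, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.6, 0.0, 0.0],
              [0.0, 0.4, 0.0, 0.6, 0.0],
              [0.0, 0.0, 0.4, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.4, 1.0]])

x = np.array([0.0, 0.0, 1.0, 0.0, 0.0])    # Gus starts with $200
x = np.linalg.matrix_power(A, 200) @ x

print(x)                                    # -> [9/13, 0, 0, 0, 4/13]
assert np.allclose(x, [9/13, 0, 0, 0, 4/13], atol=1e-8)
```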
   
9.3.4 As in the text, set x_k = (a_k, a_{k+1})^T, k ≥ 0. Then x_0 = (2, 3)^T and we have x_{k+1} = Ax_k,
where
$$A = \begin{pmatrix} 0 & 1 \\ -2 & 3 \end{pmatrix}.$$
Since p(t) = t^2 − 3t + 2 = (t − 1)(t − 2), the eigenvalues of A are λ1 = 1
and λ2 = 2. The vector v1 = (1, 1)^T spans E(1) and v2 = (1, 2)^T spans E(2). Letting
$$P = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} \quad\text{and}\quad \Lambda = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix},$$
we have A = PΛP^{−1} and A^k = PΛ^kP^{−1}, so
$$x_k = A^kx_0 = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 2^k \end{pmatrix}\left(\begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix}\begin{pmatrix} 2 \\ 3 \end{pmatrix}\right) = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 0 & 2^k \end{pmatrix}\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}\begin{pmatrix} 1 \\ 2^k \end{pmatrix} = \begin{pmatrix} 1 + 2^k \\ 1 + 2^{k+1} \end{pmatrix},$$
so a_k = 1 + 2^k.
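To sanity-check the closed form, one can iterate the underlying recurrence directly; reading it off the companion matrix above gives a_{k+2} = 3a_{k+1} − 2a_k (a sketch; the recurrence itself comes from the textbook exercise):

```python
# Closed form a_k = 1 + 2^k versus the recurrence a_{k+2} = 3a_{k+1} - 2a_k,
# with a_0 = 2, a_1 = 3 read off from x_0.
a, b = 2, 3
for k in range(20):
    assert a == 1 + 2**k
    a, b = b, 3 * b - 2 * a
```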
   
9.3.5 As in the text, set x_k = (a_k, a_{k+1})^T, k ≥ 0. Then x_0 = (1, 1)^T and we have x_{k+1} = Ax_k,
where
$$A = \begin{pmatrix} 0 & 1 \\ 6 & 1 \end{pmatrix}.$$
Since p(t) = t^2 − t − 6 = (t − 3)(t + 2), the eigenvalues of A are λ1 = −2 and
λ2 = 3. The vector v1 = (1, −2)^T spans E(−2) and v2 = (1, 3)^T spans E(3). Letting
$$P = \begin{pmatrix} 1 & 1 \\ -2 & 3 \end{pmatrix} \quad\text{and}\quad \Lambda = \begin{pmatrix} -2 & 0 \\ 0 & 3 \end{pmatrix},$$
we have A = PΛP^{−1} and A^k = PΛ^kP^{−1}, so
$$x_k = A^kx_0 = \frac15\begin{pmatrix} 1 & 1 \\ -2 & 3 \end{pmatrix}\begin{pmatrix} (-2)^k & 0 \\ 0 & 3^k \end{pmatrix}\left(\begin{pmatrix} 3 & -1 \\ 2 & 1 \end{pmatrix}\begin{pmatrix} 1 \\ 1 \end{pmatrix}\right) = \frac15\begin{pmatrix} 1 & 1 \\ -2 & 3 \end{pmatrix}\begin{pmatrix} (-2)^k & 0 \\ 0 & 3^k \end{pmatrix}\begin{pmatrix} 2 \\ 3 \end{pmatrix} = \frac15\begin{pmatrix} (-1)^k2^{k+1} + 3^{k+1} \\ (-1)^{k+1}2^{k+2} + 3^{k+2} \end{pmatrix},$$
so a_k = (1/5)((−1)^k 2^{k+1} + 3^{k+1}).
   
9.3.6 As in the text, set x_k = (a_k, a_{k+1})^T, k ≥ 0. Then x_0 = (0, 1)^T and we have x_{k+1} = Ax_k,
where
$$A = \begin{pmatrix} 0 & 1 \\ 4 & 3 \end{pmatrix}.$$
Since p(t) = t^2 − 3t − 4 = (t − 4)(t + 1), the eigenvalues of A are λ1 = −1 and
λ2 = 4. The vector v1 = (1, −1)^T spans E(−1) and v2 = (1, 4)^T spans E(4). Letting
$$P = \begin{pmatrix} 1 & 1 \\ -1 & 4 \end{pmatrix} \quad\text{and}\quad \Lambda = \begin{pmatrix} -1 & 0 \\ 0 & 4 \end{pmatrix},$$
we have A = PΛP^{−1} and A^k = PΛ^kP^{−1}, so
$$x_k = A^kx_0 = \frac15\begin{pmatrix} 1 & 1 \\ -1 & 4 \end{pmatrix}\begin{pmatrix} (-1)^k & 0 \\ 0 & 4^k \end{pmatrix}\left(\begin{pmatrix} 4 & -1 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} 0 \\ 1 \end{pmatrix}\right) = \frac15\begin{pmatrix} 1 & 1 \\ -1 & 4 \end{pmatrix}\begin{pmatrix} (-1)^k & 0 \\ 0 & 4^k \end{pmatrix}\begin{pmatrix} -1 \\ 1 \end{pmatrix} = \frac15\begin{pmatrix} (-1)^{k+1} + 4^k \\ (-1)^k + 4^{k+1} \end{pmatrix},$$
so a_k = (1/5)((−1)^{k+1} + 4^k).
   
9.3.7 As in the text, set x_k = (a_k, a_{k+1})^T, k ≥ 0. Then x_0 = (0, 1)^T and we have x_{k+1} = Ax_k,
where
$$A = \begin{pmatrix} 0 & 1 \\ -4 & 4 \end{pmatrix}.$$
Since p(t) = t^2 − 4t + 4 = (t − 2)^2, the eigenvalues of A are λ1 = λ2 = 2.
On the other hand, (1, 2)^T spans E(2), so A is not diagonalizable. Following Exercise 9.2.16, we
find the change-of-basis matrix
$$P = \begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix} \quad\text{so that}\quad P^{-1}AP = \Lambda = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}.$$
Then, setting $N = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$, we have Λ = 2I + N and N^2 = O. Thus,
$$\Lambda^2 = (2I+N)^2 = 4I+4N, \quad \Lambda^3 = \Lambda^2\Lambda = 4(I+N)(2I+N) = 8I+12N, \quad \ldots, \quad \Lambda^k = 2^kI + k2^{k-1}N$$
(where the last formula can easily be proved by induction). Thus,
$$x_k = A^kx_0 = P\Lambda^kP^{-1}x_0 = \begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix}\begin{pmatrix} 2^k & k2^{k-1} \\ 0 & 2^k \end{pmatrix}\begin{pmatrix} 1 & 0 \\ -2 & 1 \end{pmatrix}\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 2 & 1 \end{pmatrix}\begin{pmatrix} k2^{k-1} \\ 2^k \end{pmatrix} = \begin{pmatrix} k2^{k-1} \\ (k+1)2^k \end{pmatrix},$$
and so a_k = k2^{k−1}.
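The formula Λ^k = 2^k I + k2^{k−1} N is easy to confirm numerically (a sketch using the matrices above):

```python
import numpy as np

Lam = np.array([[2.0, 1.0], [0.0, 2.0]])   # Jordan block 2I + N
I2 = np.eye(2)
N = np.array([[0.0, 1.0], [0.0, 0.0]])     # nilpotent part, N^2 = O

for k in range(1, 10):
    assert np.allclose(np.linalg.matrix_power(Lam, k),
                       2.0**k * I2 + k * 2.0**(k - 1) * N)
```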

     
9.3.8 Let x_k = (a_k, a_{k+1}, a_{k+2})^T, k ≥ 0, x_0 = (0, 1, 1)^T, and set
$$A = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -2 & 1 & 2 \end{pmatrix};$$
then we have x_{k+1} = Ax_k. Now p(t) = −t^3 + 2t^2 + t − 2 = −(t − 1)(t + 1)(t − 2), so the eigenvalues of A
are λ1 = −1, λ2 = 1, and λ3 = 2. Respective eigenvectors are (1, −1, 1)^T, (1, 1, 1)^T, and (1, 2, 4)^T. Then
A = PΛP^{−1}, where
$$P = \begin{pmatrix} 1 & 1 & 1 \\ -1 & 1 & 2 \\ 1 & 1 & 4 \end{pmatrix}, \qquad \Lambda = \begin{pmatrix} -1 & & \\ & 1 & \\ & & 2 \end{pmatrix}, \qquad P^{-1} = \frac16\begin{pmatrix} 2 & -3 & 1 \\ 6 & 3 & -3 \\ -2 & 0 & 2 \end{pmatrix}.$$
Therefore, we have
$$x_k = A^kx_0 = P\Lambda^kP^{-1}x_0 = \frac16\begin{pmatrix} 1 & 1 & 1 \\ -1 & 1 & 2 \\ 1 & 1 & 4 \end{pmatrix}\begin{pmatrix} (-1)^k & & \\ & 1 & \\ & & 2^k \end{pmatrix}\begin{pmatrix} 2 & -3 & 1 \\ 6 & 3 & -3 \\ -2 & 0 & 2 \end{pmatrix}\begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} = \frac13\begin{pmatrix} 1 & 1 & 1 \\ -1 & 1 & 2 \\ 1 & 1 & 4 \end{pmatrix}\begin{pmatrix} (-1)^{k+1} \\ 0 \\ 2^k \end{pmatrix} = \frac13\begin{pmatrix} (-1)^{k+1} + 2^k \\ (-1)^{k+2} + 2^{k+1} \\ (-1)^{k+3} + 2^{k+2} \end{pmatrix},$$
so a_k = (1/3)(2^k + (−1)^{k+1}).
 
9.3.9 Set x_k = (c_k, m_k)^T, and let A be the transition matrix so that x_{k+1} = Ax_k.
a. We have
$$A = \begin{pmatrix} 0.7 & 0.1 \\ -0.2 & 1.0 \end{pmatrix},$$
and p(t) = t^2 − 1.7t + 0.72 = (t − 0.9)(t − 0.8), so the
eigenvalues are λ1 = 0.8 and λ2 = 0.9. The vector v1 = (1, 1)^T spans E(0.8) and v2 = (1, 2)^T spans
E(0.9), so, letting
$$P = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix} \quad\text{and}\quad \Lambda = \begin{pmatrix} 0.8 & 0 \\ 0 & 0.9 \end{pmatrix},$$
we have
$$x_k = A^kx_0 = P\Lambda^kP^{-1}x_0 = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}\begin{pmatrix} (0.8)^k & 0 \\ 0 & (0.9)^k \end{pmatrix}\begin{pmatrix} 2 & -1 \\ -1 & 1 \end{pmatrix}\begin{pmatrix} c_0 \\ m_0 \end{pmatrix} = (2c_0 - m_0)(0.8)^k\begin{pmatrix} 1 \\ 1 \end{pmatrix} + (-c_0 + m_0)(0.9)^k\begin{pmatrix} 1 \\ 2 \end{pmatrix}.$$
As k → ∞, x_k → 0, no matter what (c_0, m_0) happens to be.
b. We have
$$A = \begin{pmatrix} 1.3 & 0.2 \\ -0.1 & 1.0 \end{pmatrix},$$
and p(t) = t^2 − 2.3t + 1.32 = (t − 1.1)(t − 1.2), so the
eigenvalues are λ1 = 1.1 and λ2 = 1.2. The vector v1 = (1, −1)^T spans E(1.1) and v2 = (2, −1)^T spans
E(1.2), so, letting
$$P = \begin{pmatrix} 1 & 2 \\ -1 & -1 \end{pmatrix} \quad\text{and}\quad \Lambda = \begin{pmatrix} 1.1 & 0 \\ 0 & 1.2 \end{pmatrix},$$
we have
$$x_k = A^kx_0 = P\Lambda^kP^{-1}x_0 = \begin{pmatrix} 1 & 2 \\ -1 & -1 \end{pmatrix}\begin{pmatrix} (1.1)^k & 0 \\ 0 & (1.2)^k \end{pmatrix}\begin{pmatrix} -1 & -2 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} c_0 \\ m_0 \end{pmatrix} = (c_0 + 2m_0)(1.1)^k\begin{pmatrix} -1 \\ 1 \end{pmatrix} + (c_0 + m_0)(1.2)^k\begin{pmatrix} 2 \\ -1 \end{pmatrix}.$$
Since (1.2)^k dominates (1.1)^k as k grows larger, we see that for any nonzero initial cat/mouse
population, the cat population grows without bound and the mice die out.
" #
1.1 0.3
c. Here we have A = , and p(t) = t2 − 2t + 0.96 = (t − 0.8)(t − 1.2), so the
0.1 0.9
   
1 3
eigenvalues are λ1 = 0.8 and λ2 = 1.2. The vector v1 = spans E(0.8) and v2 = spans
−1 1
" # " #
1 3 0.8
E(1.2), so, letting P = and Λ = , we have
−1 1 1.2
" #" #" #" #
1 3 (0.8) k 1 −3 c
1 0
xk = Ak x0 = P Λk P −1 x0 =
4 −1 1 (1.2)k 1 1 m0
" #" #" #
1 1 3 (0.8)k c0 − 3m0
= k
4 −1 1 (1.2) c0 + m0
" # " #
k 1 k 3
= 14 (c0 − 3m0 )(0.8) + 41 (c0 + m0 )(1.2) .
−1 1

Since (0.8)k → 0 as k → ∞, we see that for any nonzero initial cat/mouse population, the cat and
mouse populations grow without bound and approach a limiting ratio of 3 : 1.
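A short simulation of part c illustrates the limiting 3 : 1 ratio (a sketch; the initial populations are arbitrary):

```python
import numpy as np

A = np.array([[1.1, 0.3],
              [0.1, 0.9]])

x = np.array([100.0, 50.0])      # arbitrary starting cats, mice
for _ in range(200):
    x = A @ x

print(x[0] / x[1])               # -> approximately 3.0
assert abs(x[0] / x[1] - 3.0) < 1e-6
```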
9.3.10 Since we know that (e^{tA})^• = Ae^{tA} and e^{O} = I, it follows that E(t) = e^{tA} is a solution.
To check uniqueness, we proceed as in the proof of Proposition 3.1. Suppose E(t) is any solution
of this differential equation; then, since Ė(t) = AE(t) and A commutes with e^{−tA},
$$\bigl(e^{-tA}E(t)\bigr)^{\!\boldsymbol\cdot} = -Ae^{-tA}E(t) + e^{-tA}AE(t) = \bigl(-Ae^{-tA} + e^{-tA}A\bigr)E(t) = O,$$
and so e^{−tA}E(t) is a constant matrix. Since at t = 0 this expression is equal to I, we must have
e^{−tA}E(t) = I for all t, and therefore E(t) = e^{tA} for all t. (Here we are tacitly assuming, as we did
in the text, that e^{−A} = (e^{A})^{−1}. See Exercise 17.)
9.3. DIFFERENCE EQUATIONS AND ORDINARY DIFFERENTIAL EQUATIONS 283

9.3.11 a. We have p(t) = t^2 − 5t − 6 = (t − 6)(t + 1), and so A is diagonalizable with eigenbasis
v1 = (1, 1)^T, corresponding to λ = 6, and v2 = (−5, 2)^T, corresponding to λ = −1. Then we have
$$P = \begin{pmatrix} 1 & -5 \\ 1 & 2 \end{pmatrix} \quad\text{and}\quad \Lambda = \begin{pmatrix} 6 & 0 \\ 0 & -1 \end{pmatrix}; \quad\text{so}\quad P^{-1} = \frac17\begin{pmatrix} 2 & 5 \\ -1 & 1 \end{pmatrix}$$
and
$$x(t) = e^{tA}x_0 = Pe^{t\Lambda}P^{-1}x_0 = \begin{pmatrix} 1 & -5 \\ 1 & 2 \end{pmatrix}\begin{pmatrix} e^{6t} & 0 \\ 0 & e^{-t} \end{pmatrix}\begin{pmatrix} 1 \\ -1 \end{pmatrix} = e^{6t}\begin{pmatrix} 1 \\ 1 \end{pmatrix} - e^{-t}\begin{pmatrix} -5 \\ 2 \end{pmatrix}.$$

b. We have p(t) = t^2 − 1 = (t + 1)(t − 1), and so A is diagonalizable with eigenbasis
v1 = (1, 1)^T, corresponding to λ = 1, and v2 = (−1, 1)^T, corresponding to λ = −1. Here
$$P = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}, \qquad P^{-1} = \frac12\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix},$$
and
$$x(t) = e^{tA}x_0 = Pe^{t\Lambda}P^{-1}x_0 = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} e^{t} & 0 \\ 0 & e^{-t} \end{pmatrix}\begin{pmatrix} 2 \\ 1 \end{pmatrix} = 2e^{t}\begin{pmatrix} 1 \\ 1 \end{pmatrix} + e^{-t}\begin{pmatrix} -1 \\ 1 \end{pmatrix}.$$

c. We have p(t) = t^2 − 2t − 8 = (t − 4)(t + 2), and so A is diagonalizable with eigenbasis
v1 = (1, 1)^T, corresponding to λ = 4, and v2 = (−1, 1)^T, corresponding to λ = −2. In this case
$$x(t) = e^{tA}x_0 = Pe^{t\Lambda}P^{-1}x_0 = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} e^{4t} & 0 \\ 0 & e^{-2t} \end{pmatrix}\begin{pmatrix} 3 \\ -2 \end{pmatrix} = 3e^{4t}\begin{pmatrix} 1 \\ 1 \end{pmatrix} - 2e^{-2t}\begin{pmatrix} -1 \\ 1 \end{pmatrix}.$$

d. Here p(t) = t^2 − 4t + 4 = (t − 2)^2, the eigenvalue λ = 2 has geometric multiplicity 1,
and E(2) has basis v1 = (1, 1)^T. Letting v2 = (0, 1)^T, we observe that (A − 2I)v2 = v1, and so we take
$$P = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}, \qquad \Lambda = \begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix},$$
and then A = PΛP^{−1} and P^{−1}x_0 = (2, −3)^T. Thus, we obtain
$$x(t) = e^{tA}x_0 = Pe^{t\Lambda}P^{-1}x_0 = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} e^{2t} & te^{2t} \\ 0 & e^{2t} \end{pmatrix}\begin{pmatrix} 2 \\ -3 \end{pmatrix} = e^{2t}\left((2 - 3t)\begin{pmatrix} 1 \\ 1 \end{pmatrix} - 3\begin{pmatrix} 0 \\ 1 \end{pmatrix}\right).$$

e. We have p(t) = −t^3 + 9t = −t(t − 3)(t + 3), so once again A is diagonalizable. We
have the eigenvectors v1 = (−1, 0, 1)^T, corresponding to λ = −3, v2 = (1, −1, 1)^T, corresponding to λ = 0,
and v3 = (1, 2, 1)^T, corresponding to λ = 3. Here we have
$$P = \begin{pmatrix} -1 & 1 & 1 \\ 0 & -1 & 2 \\ 1 & 1 & 1 \end{pmatrix}, \qquad \Lambda = \begin{pmatrix} -3 & & \\ & 0 & \\ & & 3 \end{pmatrix}, \qquad P^{-1} = \frac16\begin{pmatrix} -3 & 0 & 3 \\ 2 & -2 & 2 \\ 1 & 2 & 1 \end{pmatrix},$$
so P^{−1}x_0 = (1, 2, 1)^T and
$$x(t) = e^{tA}x_0 = Pe^{t\Lambda}P^{-1}x_0 = e^{-3t}\begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} + 2\begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix} + e^{3t}\begin{pmatrix} 1 \\ 2 \\ 1 \end{pmatrix}.$$

f. Now we have p(t) = −t^3 + t = −t(t − 1)(t + 1), and here the eigenvectors are
v1 = (−1, 0, 1)^T, corresponding to λ = −1, v2 = (−2, 1, 2)^T, corresponding to λ = 0, and v3 = (−2, 1, 1)^T,
corresponding to λ = 1. Thus,
$$P = \begin{pmatrix} -1 & -2 & -2 \\ 0 & 1 & 1 \\ 1 & 2 & 1 \end{pmatrix}, \qquad \Lambda = \begin{pmatrix} -1 & & \\ & 0 & \\ & & 1 \end{pmatrix}, \qquad P^{-1} = \begin{pmatrix} -1 & -2 & 0 \\ 1 & 1 & 1 \\ -1 & 0 & -1 \end{pmatrix},$$
so P^{−1}x_0 = (−1, −2, 1)^T and
$$x(t) = e^{tA}x_0 = Pe^{t\Lambda}P^{-1}x_0 = -e^{-t}\begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} - 2\begin{pmatrix} -2 \\ 1 \\ 2 \end{pmatrix} + e^{t}\begin{pmatrix} -2 \\ 1 \\ 1 \end{pmatrix}.$$

9.3.12 As in Example 9, if P^{−1}AP = Λ, we let y = P^{−1}x, and then ÿ = P^{−1}ẍ = P^{−1}Ax = Λ(P^{−1}x) = Λy,
so we uncouple the system when Λ is diagonal.
a. Referring to Exercise 11a, we have
$$\ddot y = \begin{pmatrix} 6 & 0 \\ 0 & -1 \end{pmatrix}y, \qquad y_0 = P^{-1}\begin{pmatrix} 7 \\ 0 \end{pmatrix} = \begin{pmatrix} 2 \\ -1 \end{pmatrix}, \qquad \dot y_0 = P^{-1}\begin{pmatrix} -5 \\ 2 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$
Thus, we have
$$y_1 = a_1e^{\sqrt6\,t} + b_1e^{-\sqrt6\,t}, \qquad y_2 = a_2\cos t + b_2\sin t$$
for appropriate values of the constants. Using the initial conditions, we easily determine that
a_1 = b_1 = b_2 = 1 and a_2 = −1. Thus,
$$x = Py = (e^{\sqrt6\,t} + e^{-\sqrt6\,t})\begin{pmatrix} 1 \\ 1 \end{pmatrix} + (-\cos t + \sin t)\begin{pmatrix} -5 \\ 2 \end{pmatrix}.$$
b. Referring to Exercise 11b, we have
$$\ddot y = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}y, \qquad y_0 = P^{-1}\begin{pmatrix} -2 \\ 2 \end{pmatrix} = \begin{pmatrix} 0 \\ 2 \end{pmatrix}, \qquad \dot y_0 = P^{-1}\begin{pmatrix} 1 \\ 3 \end{pmatrix} = \begin{pmatrix} 2 \\ 1 \end{pmatrix}.$$
Then we find
$$y_1 = a_1e^{t} + b_1e^{-t}, \qquad y_2 = a_2\cos t + b_2\sin t,$$
and, using the initial conditions, we determine that a_1 = b_2 = 1, a_2 = 2, and b_1 = −1. Thus,
$$x = Py = (e^{t} - e^{-t})\begin{pmatrix} 1 \\ 1 \end{pmatrix} + (2\cos t + \sin t)\begin{pmatrix} -1 \\ 1 \end{pmatrix}.$$
c. Referring to Exercise 11c, we have
$$\ddot y = \begin{pmatrix} 4 & 0 \\ 0 & -2 \end{pmatrix}y, \qquad y_0 = P^{-1}\begin{pmatrix} 1-\sqrt3 \\ 1+\sqrt3 \end{pmatrix} = \begin{pmatrix} 1 \\ \sqrt3 \end{pmatrix}, \qquad \dot y_0 = P^{-1}\begin{pmatrix} 2-\sqrt6 \\ 2+\sqrt6 \end{pmatrix} = \begin{pmatrix} 2 \\ \sqrt6 \end{pmatrix}.$$
Thus, we solve for y:
$$y_1 = a_1e^{2t} + b_1e^{-2t}, \qquad y_2 = a_2\cos\sqrt2\,t + b_2\sin\sqrt2\,t,$$
and from the initial conditions we infer that
$$y = \begin{pmatrix} e^{2t} \\ \sqrt3(\cos\sqrt2\,t + \sin\sqrt2\,t) \end{pmatrix}, \quad\text{and so}\quad x = e^{2t}\begin{pmatrix} 1 \\ 1 \end{pmatrix} + \sqrt3(\cos\sqrt2\,t + \sin\sqrt2\,t)\begin{pmatrix} -1 \\ 1 \end{pmatrix}.$$
d. Here we proceed “by hand.” The system of differential equations can be rewritten
explicitly as ẍ1 = x2, ẍ2 = 0, so, using the initial conditions, we determine that x2 = t + 2 and
hence x1 = (1/6)t^3 + t^2 + 2t + 1.

9.3.13 We consider the second-order ODE ẍ = Ax with
$$A = \begin{pmatrix} -(k_1+k_2)/m_1 & k_2/m_1 \\ k_2/m_2 & -(k_2+k_3)/m_2 \end{pmatrix}.$$
a. Here we have $A = \begin{pmatrix} -4 & 3 \\ 3 & -4 \end{pmatrix}$, whose eigenvalues are −1 and −7, with corresponding
eigenvectors (1, 1)^T and (−1, 1)^T. The general solution is
$$x(t) = (a_1\cos t + b_1\sin t)\begin{pmatrix} 1 \\ 1 \end{pmatrix} + (a_2\cos\sqrt7\,t + b_2\sin\sqrt7\,t)\begin{pmatrix} -1 \\ 1 \end{pmatrix}.$$
b. Here we have $A = \begin{pmatrix} -3 & 2 \\ 2 & -6 \end{pmatrix}$, whose eigenvalues are −2 and −7, with corresponding
eigenvectors (2, 1)^T and (−1, 2)^T. The general solution is
$$x(t) = (a_1\cos\sqrt2\,t + b_1\sin\sqrt2\,t)\begin{pmatrix} 2 \\ 1 \end{pmatrix} + (a_2\cos\sqrt7\,t + b_2\sin\sqrt7\,t)\begin{pmatrix} -1 \\ 2 \end{pmatrix}.$$
c. Now, because of the different masses, we must be slightly more careful. In this case,
$A = \begin{pmatrix} -3 & 2 \\ 1 & -2 \end{pmatrix}$, whose eigenvalues are −1 and −4, with corresponding eigenvectors (1, 1)^T and
(−2, 1)^T. The general solution is
$$x(t) = (a_1\cos t + b_1\sin t)\begin{pmatrix} 1 \\ 1 \end{pmatrix} + (a_2\cos 2t + b_2\sin 2t)\begin{pmatrix} -2 \\ 1 \end{pmatrix}.$$
 
9.3.14 As in Example 8, we write J = 2I + B, where
$$B = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}.$$
Then
$$B^2 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} \quad\text{and}\quad B^3 = O.$$
Thus,
$$e^{tJ} = \sum_{k=0}^{\infty}\frac{t^k}{k!}J^k = \sum_{k=0}^{\infty}\frac{t^k}{k!}\Bigl(2^kI + k2^{k-1}B + \frac{k(k-1)}{2}2^{k-2}B^2\Bigr) = \sum_{k=0}^{\infty}\frac{(2t)^k}{k!}I + t\sum_{k=1}^{\infty}\frac{(2t)^{k-1}}{(k-1)!}B + \frac{t^2}{2}\sum_{k=2}^{\infty}\frac{(2t)^{k-2}}{(k-2)!}B^2$$
$$= e^{2t}I + te^{2t}B + \frac{t^2}{2}e^{2t}B^2 = e^{2t}\begin{pmatrix} 1 & t & \tfrac12t^2 \\ 0 & 1 & t \\ 0 & 0 & 1 \end{pmatrix}.$$
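One can check this closed form against scipy's matrix exponential (a sketch with a sample value of t):

```python
import numpy as np
from scipy.linalg import expm

t = 0.7
J = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 1.0],
              [0.0, 0.0, 2.0]])

closed = np.exp(2 * t) * np.array([[1, t, t**2 / 2],
                                   [0, 1, t],
                                   [0, 0, 1]])
assert np.allclose(expm(t * J), closed)
```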
# "
y(t)
9.3.15 We consider x(t) = . Then we obtain the system ẍ = Ax, where
ẏ(t)
" # " #
0 1 −1
a. A = and x0 = . The eigenvalues of A are −1 and 2, with corre-
2 1 4
   
1 1
sponding eigenvectors and . Thus, taking
−1 2
" # " # " # " #
1 1 −1 −1 1 2 −1 −1 −2
P = , Λ= , so P = and P x0 = ,
−1 2 2 3 1 1 1

we have the solution


" #" #" # " # " #
1 1 e−t −2 1 1
x(t) = etA x0 = P etΛ (P −1 x0 ) = = −2e−t + e2t .
−1 2 e2t 1 −1 2

Reading off the first coordinate, we obtain y(t) = e2t − 2e−t .


" # " #
0 1 1
b. A = and x0 = . Here p(t) = (t − 1)2 , λ = 1 is the only eigenvalue,
−1 2 2
and A fails to be diagonalizable. However, we obtain P −1 AP = Λ, where
" # " #
1 0 1 1
P = and Λ = .
1 1 1
" #
1
Then P −1 x0 = and
1
" #" #" # " # " #
tΛ −1 1 0 et tet 1 t 1 t 0
x(t) = P e (P x0 ) = = e (1 + t) +e .
1 1 et 1 1 1

This means that y(t) = et (1 + t) is the solution of the original differential equation.
9.3. DIFFERENCE EQUATIONS AND ORDINARY DIFFERENTIAL EQUATIONS 287
" #
−k + 1 k
Alternatively, we can guess (and prove by induction) that Ak = , k ∈ N, and
−k k+1
from this derive
∞ k ∞ k
"# " #
X t −k +X1t k −t + 1 t
etA = Ak = = et .
k! k! −k k + 1 −t t+1
k=0 k=0
" #" # " #
−t + 1 t 1 t + 1
Then, as before, we have x(t) = etA x0 = et = et .
−t t+1 2 t+2
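The guessed formula for e^{tA} is easy to corroborate numerically (a sketch comparing against scipy):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [-1.0, 2.0]])

for t in (0.3, 1.0, 2.5):
    closed = np.exp(t) * np.array([[1 - t, t],
                                   [-t, 1 + t]])
    assert np.allclose(expm(t * A), closed)
```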
" # # "
y(t) 0 1
9.3.16 We set x(t) = and A = , and we wish to solve the system ẋ = Ax.
ẏ(t) −b −a
 √ 
The eigenvalues of A are λ = 21 −a ± a2 − 4b . When a2 − 4b 6= 0, A is diagonalizable (perhaps
over C), and we obtain
" #" #" # " # " #
1 1 eλ1 t c1 λ1 t 1 λ2 t 1
x(t) = λ t
= c1 e + c2 e .
λ1 λ2 e 2 c2 λ1 λ2

Thus, when a2 − 4b > 0, the general solution of the original ODE is y(t) = c1 eλ1 t + c2 eλ2 t . When
a2 − 4b < 0, A has a pair of conjugate complex eigenvalues λ = α ± βi, and the general solution
is y(t) = eαt (c1 cos βt + c2 sin βt). Last, when a2 − 4b = 0, A has a repeated eigenvalue λ with
algebraic multiplicity 1, and, as in Exercise 15b,
" #" #" # " # " #
1 0 eλt teλt c1 λt 1 λt 0
x(t) = = (c1 + c2 t)e + c2 e ,
λ 1 eλt c2 λ 1

from which we infer that the general solution of the original ODE is y(t) = (c1 + c2 t)eλt .

9.3.17 a. Since A commutes with its own powers, we have Ae^{tA} = e^{tA}A. Differentiating f(t) =
e^{tA}e^{−tA} by the product rule, we obtain
$$\bigl(e^{tA}e^{-tA}\bigr)^{\!\boldsymbol\cdot} = Ae^{tA}e^{-tA} + e^{tA}(-A)e^{-tA} = (Ae^{tA} - e^{tA}A)e^{-tA} = O.$$
This means the matrix function f is constant, and so, in particular, f(t) = f(0) = I for all t. This
means that e^{−tA} = (e^{tA})^{−1}. Setting t = 1, we infer that e^{−A} = (e^{A})^{−1}.
b. Using the properties of the transpose (and the hypothesis that A is skew-symmetric, so A^T = −A), we have
$$\bigl(e^{A}\bigr)^T = \Bigl(\sum_{k=0}^{\infty}\frac{A^k}{k!}\Bigr)^{\!T} = \sum_{k=0}^{\infty}\frac{(A^k)^T}{k!} = \sum_{k=0}^{\infty}\frac{(A^T)^k}{k!} = e^{(A^T)} = e^{-A} = \bigl(e^{A}\bigr)^{-1},$$
and so the matrix e^{A} is orthogonal.
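A quick numerical illustration (a sketch, assuming as in the exercise that A is skew-symmetric):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M - M.T                                # skew-symmetric: A^T = -A

Q = expm(A)
assert np.allclose(Q.T @ Q, np.eye(4))     # e^A is orthogonal
```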


c. When A is diagonalizable, we have A = PΛP^{−1} for some diagonal matrix Λ with
entries λ1, . . . , λn. Then det(e^{A}) = det(Pe^{Λ}P^{−1}) = det(e^{Λ}) = e^{λ_1}e^{λ_2} · · · e^{λ_n} = e^{λ_1+λ_2+···+λ_n} =
e^{trΛ} = e^{trA} (e.g., by Lemma 2.4). If A has repeated eigenvalues and fails to be diagonalizable,
then we apply Exercise 9.2.22. There is an invertible matrix P so that A = PUP^{−1}, where U is
upper triangular. Proceeding as before, we have det(e^{A}) = det(Pe^{U}P^{−1}) = det(e^{U}) = e^{trU} = e^{trA},
since, by the definition of the matrix exponential, e^{U} will be upper triangular with diagonal entries the
exponentials of the diagonal entries of U.

9.3.18 a. We have $D(\exp)(O)A = \frac{d}{dt}\Big|_{t=0}\exp(tA) = A$, so D(exp)(O) = I : M_{n×n} → M_{n×n}.
Therefore, by the Inverse Function Theorem, exp has a C¹ local inverse mapping a neighborhood
of exp(O) = I to a neighborhood of O.
b. We have seen in Example 7 that
$$e^{t\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}} = \begin{pmatrix} \cos t & -\sin t \\ \sin t & \cos t \end{pmatrix}, \quad\text{so}\quad e^{\begin{pmatrix} 0 & -\pi \\ \pi & 0 \end{pmatrix}} = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}.$$
On the other hand, we infer from Exercise 17c that det(e^{A}) > 0 for every A, so
$$\begin{pmatrix} -2 & 0 \\ 0 & 1 \end{pmatrix}$$
cannot be written as e^{A} for any A.

9.3.19 Let F(x) = x. Then the associated flow is given by φ_t(x) = e^t x. Let Ω = B(r) be the
ball of radius r centered at 0 in R^n and let V(r) = vol(B(r)). Then φ_t(Ω) = B(e^t r), and so we have
$$\dot V(0) = rV'(r) = \int_{\partial\Omega} r\,dS = r\,\operatorname{area}(\partial\Omega),$$
so V′(r) = area(∂Ω), as required. (The first equality is the chain rule applied to vol(φ_t(Ω)) = V(e^t r),
and the integral is the flux of F across ∂Ω, where F · n = r.)

9.3.20 a. Fix t = t0 as suggested in the hint and x0 ∈ B(a, δ). Now consider the two functions
of s given by f (s) = φs+t0 (x0 ) and g(s) = φs (φt0 (x0 )). Then both f and g are solutions of the
differential equation ẋ(s) = F(x(s)), x(0) = φt0 (x0 ). Therefore, by the uniqueness result stated in
the problem, for sufficiently small s, we have f (s) = g(s), which means that φs+t0 = φs ◦ φt0 . Since
this holds for arbitrary (small) t0 , this proves the desired result.
b. We have φt ◦ φ−t = φ−t ◦ φt = φt+(−t) = φ0 , and φ0 (x) = x for all x. Therefore,
φ−t = (φt )−1 .
c. Each of the following functions x is a solution of the given differential equation: for
0, t≤a
any a ≥ 0, take x(t) = .
(t − a)2 , t > a

9.3.21 Fix t. Let W(s) = V(t + s). By Exercise 20, V(t + s) = vol(φ_{s+t}(Ω)) = vol(φ_s(φ_t(Ω))).
Writing Ω_0 = φ_t(Ω), Proposition 3.5 gives
$$\dot V(t) = \dot W(0) = \int_{\Omega_0} \operatorname{div}F\,dV = \int_{\varphi_t(\Omega)} \operatorname{div}F\,dV,$$
as required.

9.3.22 a. We differentiate the equation φ̇_t(x) = F(φ_t(x)) with respect to x, using the chain
rule, and use smoothness to interchange the space and time derivatives:
$$(D\varphi_t(x))^{\boldsymbol\cdot} = D(\dot\varphi_t)(x) = DF(\varphi_t(x))\,D\varphi_t(x).$$
b. Let J(t) = det(Dφ_t(x)). Then, by the chain rule, J̇(t) = D(det)(Dφ_t(x))(Dφ_t(x))^•.
Using Exercise 7.5.23 and the result of part a, we have
$$\dot J(t) = \det(D\varphi_t(x))\operatorname{tr}\bigl(D\varphi_t(x)^{-1}(D\varphi_t(x))^{\boldsymbol\cdot}\bigr) = J(t)\operatorname{tr}\bigl(D\varphi_t(x)^{-1}DF(\varphi_t(x))D\varphi_t(x)\bigr) = J(t)\operatorname{tr}\bigl(DF(\varphi_t(x))\bigr) = \operatorname{div}F(\varphi_t(x))\,J(t).$$

Here we have used the result of Exercise 1.4.22c, as well as the observation that div F = tr(DF).
But this differential equation is easy to integrate:
$$\frac{\dot J(t)}{J(t)} = \operatorname{div}F(\varphi_t(x)) \implies \log J(t) - \log J(0) = \int_0^t \operatorname{div}F(\varphi_s(x))\,ds.$$
Now, φ_0(x) = x for all x, so J(0) = det D(φ_0)(x) = 1. Therefore, $J(t) = J(0)\,e^{\int_0^t \operatorname{div}F(\varphi_s(x))\,ds} = e^{\int_0^t \operatorname{div}F(\varphi_s(x))\,ds}$, as required.

9.4. The Spectral Theorem

9.4.1 In each case, denote the given matrix by A.


a. The eigenvalues of A are λ1 = 5 and λ2 = 10, with corresponding eigenvectors
v1 = (−2, 1)^T and v2 = (1, 2)^T. An orthogonal matrix that diagonalizes A is
$$Q = \frac{1}{\sqrt5}\begin{pmatrix} -2 & 1 \\ 1 & 2 \end{pmatrix}.$$
b. The eigenvalues of A are λ1 = 0 and λ2 = λ3 = 2, with corresponding eigenvectors
v1 = (0, 1, 1)^T, v2 = (1, 0, 0)^T, and v3 = (0, −1, 1)^T. An orthogonal matrix that diagonalizes A is
$$Q = \begin{pmatrix} 0 & 1 & 0 \\ \frac{1}{\sqrt2} & 0 & -\frac{1}{\sqrt2} \\ \frac{1}{\sqrt2} & 0 & \frac{1}{\sqrt2} \end{pmatrix}.$$

 c. The eigenvalues


  of A are = λ2 = −2 and λ3 = 4, with corresponding eigenvectors
 λ1 
1 1 −2
v1 =  −2 , v2 =  0 , and v3 =  −1 . Now we must first use the Gram-Schmidt process to
0 2 1
   
1 2
obtain an orthogonal basis for E(−2): w1 =  −2 , w2 =  1 . Thus, an orthogonal matrix that
0 5
 
√1 √2 − √26
 25 30 
diagonalizes A is Q =  −√ √1 − √16 
 5 30 . We get a somewhat nicer result by noticing that a
0 √1 √5
  6 30

 0 1 
different basis for E(−2) is  1  ,  −1  .
 
1 1

 d. The eigenvalues


  of A are λ
1 =
0, λ2 = 3, and λ3 = 6, with corresponding eigenvectors
−2 1 2
v1 =  2 , v2 =  2 , and v3 =  1 . An orthogonal matrix that diagonalizes A is Q =
1 −2 2
 
−2 1 2
1 
 2 2 1 .
3
1 −2 2

e. The eigenvalues of A are λ1 = −3, λ2 = λ3 = 3, with corresponding eigenvectors v1 =
(−1, −1, 1)^T, v2 = (−1, 1, 0)^T, and v3 = (1, 0, 1)^T. We use the Gram-Schmidt process to obtain an orthogonal
basis for E(3): w2 = (−1, 1, 0)^T, w3 = (1, 1, 2)^T. Thus, an orthogonal matrix that diagonalizes A is
$$Q = \begin{pmatrix} -\frac{1}{\sqrt3} & -\frac{1}{\sqrt2} & \frac{1}{\sqrt6} \\ -\frac{1}{\sqrt3} & \frac{1}{\sqrt2} & \frac{1}{\sqrt6} \\ \frac{1}{\sqrt3} & 0 & \frac{2}{\sqrt6} \end{pmatrix}.$$
f. The eigenvalues of A are λ1 = λ2 = 0 and λ3 = λ4 = 2, with corresponding eigenvectors
v1 = (−1, 0, 1, 0)^T, v2 = (0, −1, 0, 1)^T, v3 = (1, 0, 1, 0)^T, and v4 = (0, 1, 0, 1)^T. An orthogonal matrix that
diagonalizes A is
$$Q = \frac{1}{\sqrt2}\begin{pmatrix} -1 & 0 & 1 & 0 \\ 0 & -1 & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{pmatrix}.$$
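Since the matrices A themselves live in the textbook, one way to check an answer like part d is to rebuild A = QΛQ^T from the eigendata and confirm it is symmetric with the stated eigenvalues (a minimal sketch):

```python
import numpy as np

Q = (1 / 3) * np.array([[-2.0, 1.0, 2.0],
                        [2.0, 2.0, 1.0],
                        [1.0, -2.0, 2.0]])
Lam = np.diag([0.0, 3.0, 6.0])

assert np.allclose(Q.T @ Q, np.eye(3))      # Q is orthogonal
A = Q @ Lam @ Q.T                            # the matrix being diagonalized
assert np.allclose(A, A.T)                   # symmetric, as the Spectral Theorem requires
assert np.allclose(np.linalg.eigvalsh(A), [0.0, 3.0, 6.0])
```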

9.4.2 By the Spectral Theorem, we know that E(2) must be the orthogonal complement of
E(5). Since A is a 3 × 3 matrix, E(2) is one-dimensional, spanned by (1, 1, 2)^T. Therefore,
$$A\begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix} = 2\begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix} = \begin{pmatrix} 2 \\ 2 \\ 4 \end{pmatrix}.$$
 
1
9.4.3 Since  1  gives a basis for E(2), we deduce from the Spectral Theorem that E(1) =
1
   
 −1 −1 
E(2)⊥ and therefore a basis for E(1) is  1  ,  0  . We then use the change-of-basis formula,
 
0 1
Theorem 1.1, to construct A. Letting

   
2 1 −1 −1
   
Λ= 1  and P = 1 1 0,
1 1 0 1

 
4 1 1
1 
we have A = P ΛP −1 = 1 4 1 .
3
1 1 4
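A two-line check of this construction (a sketch):

```python
import numpy as np

P = np.array([[1.0, -1.0, -1.0],
              [1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])
Lam = np.diag([2.0, 1.0, 1.0])

A = P @ Lam @ np.linalg.inv(P)
assert np.allclose(A, np.array([[4, 1, 1], [1, 4, 1], [1, 1, 4]]) / 3)
```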

9.4.4 From the given, we deduce that λ1 = 2 and λ2 = 3 are the eigenvalues of A. Since A is
symmetric, E(2) and E(3) must be orthogonal complements. Since (1, 1)^T gives a basis for E(2), we
infer that (−1, 1)^T must give a basis for E(3). Letting
$$\Lambda = \begin{pmatrix} 2 & 0 \\ 0 & 3 \end{pmatrix} \quad\text{and}\quad P = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix},$$
we use the change-of-basis formula to derive
$$A = P\Lambda P^{-1} = \frac12\begin{pmatrix} 5 & -1 \\ -1 & 5 \end{pmatrix}.$$

9.4.5 Since A is symmetric, A is diagonalizable. But since λ is its only eigenvalue, we must
have P^{−1}AP = λI for some invertible matrix P, and hence A = P(λI)P^{−1} = λI.

9.4.6 First of all, we see that B is symmetric and therefore diagonalizable. Since A, C, and
D are upper triangular, we read off their eigenvalues from the diagonal entries. A has eigenvalue
5 with algebraic multiplicity 3, but geometric multiplicity 2 (because rank(A − 5I) = 1), and is
therefore not diagonalizable. C has distinct eigenvalues and, according to Corollary 2.7, is therefore
diagonalizable. As far as D is concerned, the eigenvalues are 1—with algebraic multiplicity 2—and
2. We see that the matrix D − I has rank 1, and so the geometric multiplicity of the eigenvalue 1
is also 2. Therefore, the matrix D is diagonalizable as well.

9.4.7 Suppose A is diagonalizable and its eigenspaces are orthogonal. Then there are a diagonal
matrix Λ and an orthogonal matrix Q so that Q^{−1}AQ = Λ. Then A = QΛQ^{−1} = QΛQ^T and
A^T = (QΛQ^T)^T = QΛ^TQ^T = QΛQ^T = A, as desired.

9.4.8 Since A is symmetric, there is a basis for R^n consisting of eigenvectors. Let v be an
eigenvector with eigenvalue λ. Then, from the hypothesis, we have Av · v = λv · v = λ‖v‖² = 0.
Since v ≠ 0, we conclude that λ = 0. Since every eigenvalue of A must be 0, A is similar to
the zero matrix, and hence A = O.

9.4.9 Suppose A satisfies A² = A. As in Exercise 9.2.18, we see that if λ is an eigenvalue of
A, then λ² = λ, so λ = 0 or λ = 1. Since A is symmetric, by the Spectral Theorem the matrix for
A with respect to some orthonormal basis {v1, . . . , vn} is
$$\begin{pmatrix} I_k & O \\ O & O \end{pmatrix}.$$
This means that µ_A gives projection onto E(1) = Span(v1, . . . , vk).

9.4.10 Suppose [T]⁴ = I. If λ is an eigenvalue of T, then λ⁴ = 1, so λ² = ±1 and λ = ±1 or
±i. Since T is symmetric, its eigenvalues must be real, and so they can only be 1 or −1.
Moreover, since T is symmetric, there is an orthonormal basis for R^n consisting of eigenvectors:
Let {v1, . . . , vk} be a basis for E(1) and {vk+1, . . . , vn} be a basis for E(−1). Then the matrix for
T with respect to the basis {v1, . . . , vn} becomes
$$\begin{pmatrix} I_k & O \\ O & -I_{n-k} \end{pmatrix}.$$
Thus, we conclude that T is the
linear transformation defined by reflection across the subspace E(1).

9.4.11 Choose x_0 ∈ R^n with ‖x_0‖ = 1 so that ‖Ax_0‖ = ‖A‖. By the proof of the Spectral
Theorem, ‖Ax_0‖² = x_0^T(A^TA)x_0 ≤ λ‖x_0‖², so ‖A‖ ≤ √λ. On the other hand, by the properties of the
norm (see Exercises 5.1.6 and 7), we have λ ≤ ‖A^TA‖ ≤ ‖A‖‖A^T‖ = ‖A‖². Therefore, we have
‖A‖ = √λ.
9.4.12 a. For any x ≠ 0, we have (A + B)x · x = Ax · x + Bx · x > 0 if A and B are both


positive definite (and similarly < 0 if both are negative definite).
b. Suppose A is positive definite and v is an eigenvector with eigenvalue λ. Then from
Av · v = λ‖v‖² > 0 we conclude that λ > 0. Likewise, if A is negative definite, then λ < 0.
Now suppose A is symmetric and all its eigenvalues are positive. Then there is an orthonormal
basis {q1, . . . , qn} consisting of eigenvectors. Given any nonzero x ∈ R^n, write $x = \sum_{i=1}^n c_iq_i$. Then,
using Aq_i = λ_iq_i, i = 1, . . . , n, we have
$$Ax\cdot x = \Bigl(A\sum_i c_iq_i\Bigr)\cdot\Bigl(\sum_j c_jq_j\Bigr) = \Bigl(\sum_i c_i\lambda_iq_i\Bigr)\cdot\Bigl(\sum_j c_jq_j\Bigr) = \sum_i \lambda_ic_i^2.$$
Since, by hypothesis, all the λ_i's are positive and some c_i ≠ 0, we conclude that Ax · x > 0, so A
is positive definite. (Likewise, if all the eigenvalues are negative, then A is negative definite.)
c. Replace the >’s with ≥’s in the arguments given in part b.
d. For any x ≠ 0, we have Ax · x = (C^TC)x · x = Cx · Cx = ‖Cx‖². Since rank(C) = n,
we know that N(C) = {0}, and so ‖Cx‖² > 0, from which we conclude that A is positive definite.
It follows from part b that the eigenvalues of A are all positive.
e. Since A and B are symmetric, the matrix AB + BA is symmetric, but it need not be
positive definite. Take
$$A = \begin{pmatrix} 1 & -2 \\ -2 & 5 \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix}.$$
Then A and B are positive definite, yet
$$AB + BA = \begin{pmatrix} 6 & -8 \\ -8 & 10 \end{pmatrix}$$
has negative determinant and therefore cannot be positive definite.

9.4.13 This follows immediately from parts b and d of Exercise 12. Nevertheless, here is a
self-contained proof.
Suppose v is an eigenvector of A^TA with eigenvalue λ. Then λ‖v‖² = (A^TA)v · v = Av · Av =
‖Av‖². Since A is nonsingular, Av ≠ 0, and we conclude that λ > 0.
Conversely, suppose the eigenvalues λ1, . . . , λn of A^TA are all positive and let {q1, . . . , qn} be
an orthonormal basis for R^n consisting of eigenvectors of A^TA. Now let x ≠ 0 be arbitrary. Writing
$x = \sum_{i=1}^n c_iq_i$, we have
$$\|Ax\|^2 = Ax\cdot Ax = (A^TA)x\cdot x = \Bigl(\sum_i c_i\lambda_iq_i\Bigr)\cdot\Bigl(\sum_j c_jq_j\Bigr) = \sum_i \lambda_ic_i^2,$$
and this expression is positive since some c_i ≠ 0.



9.4.14 By the Spectral Theorem, we have a diagonal matrix Λ, whose diagonal entries are
the eigenvalues of A, and an orthogonal matrix Q so that Q^{−1}AQ = Λ. By assumption, the
eigenvalues λ1, . . . , λn ≥ 0. Define √Λ to be the diagonal matrix whose entries are √λ1, . . . , √λn.
Set B = Q√ΛQ^{−1}. Then B² = (Q√ΛQ^{−1})² = Q(√Λ)²Q^{−1} = QΛQ^{−1} = A, as desired.
Now we must argue that there is a unique positive semidefinite square root of A. Let C be any
such. The eigenvalues of A are the squares of the eigenvalues of C, and the eigenvalues of C are
nonnegative. Let λ1, . . . , λs be the distinct eigenvalues of A. Then the only possible eigenvalues
of C are √λ1, . . . , √λs. We claim that E_C(√λi) = E_A(λi) for all i = 1, . . . , s. Since A = C², any
eigenvector of C must be an eigenvector of A; thus, E_C(√λi) ⊂ E_A(λi). But since there is a basis
for R^n consisting of eigenvectors of C, it follows that $\sum_{i=1}^s \dim E_C(\sqrt{\lambda_i}) = n = \sum_{i=1}^s \dim E_A(\lambda_i)$, and
so, much as in the proof of Theorem 2.9, we conclude that dim E_C(√λi) = dim E_A(λi) and hence
that E_C(√λi) = E_A(λi) for each i = 1, . . . , s. This means that the linear transformation µ_C is
uniquely determined by decomposing R^n into the eigenspaces of A, and on each such eigenspace,
µ_C must act by multiplication by the square root of the appropriate eigenvalue of A.
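The construction B = Q√ΛQ^{−1} translates directly into code (a sketch, using the matrix A from Exercise 9.4.3 as sample input):

```python
import numpy as np

def psd_sqrt(A):
    """Square root of a positive semidefinite symmetric matrix via Q sqrt(Lam) Q^T."""
    lam, Q = np.linalg.eigh(A)              # orthogonal Q, real eigenvalues
    return Q @ np.diag(np.sqrt(np.clip(lam, 0, None))) @ Q.T

A = np.array([[4.0, 1.0, 1.0],
              [1.0, 4.0, 1.0],
              [1.0, 1.0, 4.0]]) / 3

B = psd_sqrt(A)
assert np.allclose(B @ B, A)
assert np.all(np.linalg.eigvalsh(B) >= 0)   # B is positive semidefinite
```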

9.4.15 Since A is symmetric, there is an orthonormal basis for R^n consisting of eigenvectors of


A. If A has n distinct eigenvalues, then from the fact that AB = BA we conclude that these
eigenvectors must also be eigenvectors of B. For if v ∈ EA (λ), then A(Bv) = B(Av) = λ(Bv), so
Bv ∈ EA (λ), and so in the event that dim EA (λ) = 1 and {v} gives a basis for the eigenspace, we
conclude that Bv = µv for some scalar µ.
But, more generally, consider EA (λ). The calculation we just did shows that for v ∈ EA (λ), it
is the case that Bv ∈ EA (λ). That is, the restriction of the linear transformation µB maps EA (λ)
to EA (λ). Since, moreover, Bx · y = x · By for all x, y ∈ EA (λ), the matrix for B with respect to
any orthonormal basis for EA (λ) will be symmetric. Thus, the Spectral Theorem tells us that there
is an orthonormal basis for EA (λ) consisting of eigenvectors of B. Doing this for all the eigenvalues
of A, we obtain an orthonormal basis {q1 , . . . , qn } for Rn consisting of eigenvectors for both A and
B. That is, with respect to this basis, the matrices for both A and B become diagonal.

9.4.16 Since A − λI is singular, it follows that B is singular, and so there is a nonzero vector
v ∈ R^n so that Bv = 0. Therefore, we have Bv · v = 0. So, using the symmetry of A − aI,
$$0 = Bv\cdot v = (A - aI)v\cdot(A - aI)v + b^2\,v\cdot v = \|(A - aI)v\|^2 + b^2\|v\|^2.$$
Now, the only way the
sum of two nonnegative numbers can be zero is for them both to be zero. That is, since v ≠ 0,
‖v‖² ≠ 0, and we infer that b = 0 and (A − aI)v = 0. So λ = a is a real number, and v is the
corresponding (real) eigenvector.

9.4.17 According to the Spectral Theorem, there is an orthonormal basis in whose coordinates
y_i the quadratic form Q(x) = Ax · x becomes $\tilde Q(y) = \sum_{i=1}^n \lambda_iy_i^2$. Since A is positive definite, we
know that all the λ_i are positive. The ellipsoid E has the equation
$$\sum_{i=1}^{n}\Bigl(\frac{y_i}{\sqrt{1/\lambda_i}}\Bigr)^{2} \le 1,$$
so an easy
application of the Change of Variables Theorem tells us that vol(E) = vol(unit ball)/√(λ1λ2 · · · λn) =
vol(unit ball)/√(det A).

9.4.18 We write each of the quadratic forms in the form Ax · x for the appropriate symmetric
matrix A and then deal with the linear terms as is necessary.

[Figure a: the conic in the x1x2- and y1y2-coordinate systems.]

" # " #
0 3 1 1 3
a. Here A = , and, with Q = √ , we have Q−1 AQ = Λ =
3 −8 10 −3 1
" #
−9
. Letting x = Qy, the equation of the conic becomes −9y12 + y22 = 9, or, equivalently,
1
−y12 + (y2 /3)2 = 1, which is a hyperbola with asymptotes y2 = ±3y1 . The asymptotes in the
x1 x2 -coordinates are given by x2 = 0 and x2 = (3/4)x1 .
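For any of these conics, the rotation can be produced numerically with an eigendecomposition (a sketch for part a):

```python
import numpy as np

A = np.array([[0.0, 3.0],
              [3.0, -8.0]])

lam, Q = np.linalg.eigh(A)         # columns of Q are orthonormal eigenvectors
print(lam)                          # -> [-9., 1.]

# In the rotated coordinates y = Q^T x, the quadratic form Ax.x becomes
# lam[0]*y1^2 + lam[1]*y2^2, i.e. -9 y1^2 + y2^2 here.
assert np.allclose(Q.T @ A @ Q, np.diag(lam))
```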
" # " #
3 −1 1 1 −1
b. Here A = , and, with Q = √ , we have Q−1 AQ = Λ =
−1 3 2 1 1
" #
2
. Letting x = Qy, the equation of the conic becomes 2y12 + 4y22 = 4, or 12 y12 + y22 = 1,
4

which is an ellipse with semimajor axis 1 and semiminor axis 1/ 2.

[Figures b and c: the rotated conics in the x1x2- and y1y2-coordinate systems.]
" # " #
16 12 1 4 −3
c. Here A = , and, with Q = , we have Q−1 AQ = Λ =
12 9 5 3 4
" #
25
. Letting x = Qy, the equation of the conic becomes 25y12 + 5y2 = 5, or y2 = 1 − 5y12 ,
0
so this is a downwards-pointing parabola symmetric about the y2 -axis.

" # " #
10 3 1 3 −1
d. Here A = , and, with Q = √ , we have Q−1 AQ = Λ =
3 2 10 1 3
" #
11 1 2
. Letting x = Qy, the equation of the conic becomes 11y12 + y22 = 11, or y12 + 11 y2 = 1,
1

which is an ellipse with semimajor axis 11 and semiminor axis 1.

[Figures d and e: the rotated conics in the x1x2- and y1y2-coordinate systems.]

" # " #
7 6 1 2 −1
e. Here A = , and, with Q = √ , we have Q−1 AQ = Λ =
6 −2 5 1 2
" #
10 √
. Letting x = Qy, the equation of the conic becomes 10y12 − 5y22 + 2 5y2 = 6, or
−5
10y1 − 5(y2 − √15 )2 = 5, which we can rewrite as 2y12 − (y2 − √15 )2 = 1. This is a hyperbola with cen-
2
√ √ √
ter at (0, 1/ 5) and asymptotes y2 = 1/ 5 ± 2y1 . Thus, in the x1 x2 -coordinates, the asymptotes
√ √
are given by x2 = 2−1√2 (1 + (1 + 2 2)x1 ) ≈ 2.7 + 4.8x and x2 = 2+1√2 (1 + (1 − 2 2)x1 ≈ 1.3 − 0.8x.

9.4.19 We write each of the quadratic forms in the form Ax · x for the appropriate symmetric
matrix A and then deal with the linear terms as is necessary.

   
a. Here $A = \begin{pmatrix} 3 & 1 & 1 \\ 1 & 0 & 2 \\ 1 & 2 & 0 \end{pmatrix}$, and, with
$$Q = \begin{pmatrix} \frac{2}{\sqrt6} & -\frac{1}{\sqrt3} & 0 \\ \frac{1}{\sqrt6} & \frac{1}{\sqrt3} & -\frac{1}{\sqrt2} \\ \frac{1}{\sqrt6} & \frac{1}{\sqrt3} & \frac{1}{\sqrt2} \end{pmatrix}, \quad\text{we have}\quad Q^{-1}AQ = \Lambda = \begin{pmatrix} 4 & & \\ & 1 & \\ & & -2 \end{pmatrix}.$$
Letting x = Qy, the equation of the quadric surface becomes 4y1² + y2² − 2y3² =
4, which is a hyperboloid of one sheet.

[Figure a: the hyperboloid of one sheet in y- and x-coordinates.]

   
b. Here $A = \begin{pmatrix} 4 & -1 & -1 \\ -1 & 3 & 2 \\ -1 & 2 & 3 \end{pmatrix}$, and, with
$$Q = \begin{pmatrix} \frac{2}{\sqrt6} & -\frac{1}{\sqrt3} & 0 \\ \frac{1}{\sqrt6} & \frac{1}{\sqrt3} & -\frac{1}{\sqrt2} \\ \frac{1}{\sqrt6} & \frac{1}{\sqrt3} & \frac{1}{\sqrt2} \end{pmatrix}, \quad\text{we have}\quad Q^{-1}AQ = \Lambda = \begin{pmatrix} 3 & & \\ & 6 & \\ & & 1 \end{pmatrix}.$$
Letting x = Qy, the equation of the quadric surface becomes 3y1² + 6y2² + y3² =
6, which is an ellipsoid with semi-axes 1, √2, and √6.

[Figure b: the ellipsoid in y- and x-coordinates.]
   
c. Here we have $A = \begin{pmatrix} -1 & -2 & -5 \\ -2 & 2 & 2 \\ -5 & 2 & -1 \end{pmatrix}$, and, with
$$Q = \begin{pmatrix} \frac{1}{\sqrt2} & \frac{1}{\sqrt3} & -\frac{1}{\sqrt6} \\ 0 & -\frac{1}{\sqrt3} & -\frac{2}{\sqrt6} \\ \frac{1}{\sqrt2} & -\frac{1}{\sqrt3} & \frac{1}{\sqrt6} \end{pmatrix}, \quad\text{we have}\quad Q^{-1}AQ = \Lambda = \begin{pmatrix} -6 & & \\ & 6 & \\ & & 0 \end{pmatrix}.$$
Letting x = Qy, the equation of the quadric surface becomes
−6y1² + 6y2² = 6, or −y1² + y2² = 1, which we recognize as a hyperbolic cylinder.
[Figure c: the hyperbolic cylinder in y- and x-coordinates.]
   
d. Here we have $A = \begin{pmatrix} 2 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}$, and, with
$$Q = \begin{pmatrix} \frac{2}{\sqrt6} & 0 & -\frac{1}{\sqrt3} \\ \frac{1}{\sqrt6} & -\frac{1}{\sqrt2} & \frac{1}{\sqrt3} \\ \frac{1}{\sqrt6} & \frac{1}{\sqrt2} & \frac{1}{\sqrt3} \end{pmatrix}, \quad\text{we have}\quad Q^{-1}AQ = \Lambda = \begin{pmatrix} 3 & & \\ & -1 & \\ & & 0 \end{pmatrix}.$$
Letting x = Qy, the equation of the quadric surface becomes
3y1² − y2² + 3y3 = 1, which we recognize as a hyperbolic paraboloid (or saddle).

[Figure d: the hyperbolic paraboloid in y- and x-coordinates.]

   
e. Here we have $A = \begin{pmatrix} 3 & 2 & 4 \\ 2 & 0 & 2 \\ 4 & 2 & 3 \end{pmatrix}$, and, with
$$Q = \begin{pmatrix} -\frac{1}{\sqrt2} & \frac{1}{3\sqrt2} & \frac23 \\ 0 & -\frac{4}{3\sqrt2} & \frac13 \\ \frac{1}{\sqrt2} & \frac{1}{3\sqrt2} & \frac23 \end{pmatrix}, \quad\text{we have}\quad Q^{-1}AQ = \Lambda = \begin{pmatrix} -1 & & \\ & -1 & \\ & & 8 \end{pmatrix}.$$
Letting x = Qy, the equation of the quadric surface becomes −y1² − y2² + 8y3²
= 8, which is the equation of a hyperboloid of two sheets.
   
f. Here we have $A = \begin{pmatrix} 3 & 0 & 1 \\ 0 & -1 & 0 \\ 1 & 0 & 3 \end{pmatrix}$, and, with
$$Q = \begin{pmatrix} -\frac{1}{\sqrt2} & \frac{1}{\sqrt2} & 0 \\ 0 & 0 & 1 \\ \frac{1}{\sqrt2} & \frac{1}{\sqrt2} & 0 \end{pmatrix}, \quad\text{we have}\quad Q^{-1}AQ = \Lambda = \begin{pmatrix} 2 & & \\ & 4 & \\ & & -1 \end{pmatrix}.$$
Letting x = Qy, the equation of the quadric surface becomes
2y1² + 4y2² − y3² + 2y3 = 0, or 2y1² + 4y2² − (y3 − 1)² = −1, which we recognize as the equation of a
hyperboloid of two sheets.
[Figures e and f: the two hyperboloids of two sheets in y- and x-coordinates.]

9.4.20 Given the quadratic form Q(x) = ax1² + 2bx1x2 + cx2², define the symmetric matrix
$A = \begin{pmatrix} a & b \\ b & c \end{pmatrix}$ as usual.
a. Let {q1, q2} be an orthonormal basis for R² consisting of eigenvectors of A, and
suppose q1 = (cos α, sin α)^T, q2 = (−sin α, cos α)^T. If λ is an eigenvalue of A, then the corresponding
eigenvector spans the nullspace of the matrix
$$A - \lambda I = \begin{pmatrix} a-\lambda & b \\ b & c-\lambda \end{pmatrix},$$
and therefore, as we saw
at the beginning of the section, is given by v = (b, λ − a)^T. The angle α this vector makes with the
x1-axis satisfies tan α = (λ − a)/b, and so
$$\tan 2\alpha = \frac{2\tan\alpha}{1-\tan^2\alpha} = \frac{2\,\frac{\lambda-a}{b}}{1-\bigl(\frac{\lambda-a}{b}\bigr)^2} = \frac{2b(\lambda-a)}{b^2-(\lambda-a)^2} = \frac{2b(\lambda-a)}{b^2 - a^2 + 2a\lambda - \lambda^2} = \frac{2b(\lambda-a)}{(a-c)(\lambda-a)} = \frac{2b}{a-c},$$
where at the penultimate step we've used the equation λ² − (a + c)λ + (ac − b²) = 0 to eliminate
λ².

Since the eigenvalues of A satisfy λ1 λ2 = det A = ac − b2 and λ1 + λ2 = trA = a + c, we see that


the conic Q(x) = 1 will be an ellipse when det A > 0 and trA > 0 and a hyperbola when det A < 0.
The conic will be empty when det A > 0 and trA < 0, and it will be degenerate when det A = 0.
b. Since y = Qx for an orthogonal matrix Q, we have y1² + y2² = x1² + x2² = 1 on the
unit circle. Now, suppose the eigenvalues λ and µ satisfy λ ≥ µ. Then
$$\tilde Q(y_1, y_2) = \lambda y_1^2 + \mu y_2^2 = \lambda y_1^2 + \mu(1 - y_1^2) = (\lambda - \mu)y_1^2 + \mu,$$
so the maximum value is λ, attained when y1 = ±1, and the minimum value is µ, attained when
y1 = 0. (Cf. our proof of the Spectral Theorem.)

9.4.21 a. Let A′ denote the (n − 1) × (n − 1) matrix obtained by deleting the nth row and
column from A. Expanding the given determinant in cofactors along the last column and then
along the last row, we see that the given determinant is equal to − det(A′ − tI); the roots of this
polynomial are the eigenvalues of A′ . The restriction of the quadratic form Q to the hyperplane
xn = 0 is positive definite if and only if all the eigenvalues of A′ are positive.
b. Without loss of generality, we may assume b is a unit vector. Choose an orthonormal
basis {v1 , . . . , vn } for Rn with vn = b. Let Q be the orthogonal matrix whose columns are
v1 , . . . , vn . Set à = Q−1 AQ. Note that Q(x) = Q̃(QT x) and b · x = 0 if and only if en · QT x = 0.
Now, by part a, we know that the quadratic form Q̃(y) = yT Ãy is positive definite on the subspace
yn = 0 precisely when the roots of

$$\det\begin{pmatrix} \tilde A - tI & e_n \\ e_n^T & 0 \end{pmatrix} = 0 \qquad\text{(where } e_n = (0, \ldots, 0, 1)^T\text{)}$$

are all positive. But now we observe that



$$\det\begin{pmatrix} \tilde A - tI & e_n \\ e_n^T & 0 \end{pmatrix} = \det\left(\begin{pmatrix} Q & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} \tilde A - tI & e_n \\ e_n^T & 0 \end{pmatrix}\begin{pmatrix} Q^T & 0 \\ 0 & 1 \end{pmatrix}\right) = \det\begin{pmatrix} Q(\tilde A - tI)Q^T & v_n \\ v_n^T & 0 \end{pmatrix} = \det\begin{pmatrix} A - tI & b \\ b^T & 0 \end{pmatrix},$$

so this completes the proof.


c. We use the result of Exercise 5.4.34. Since ∇g(a) gives a normal vector to the
hyperplane Ta S, we take b = ∇g(a); for our symmetric matrix A we take Hess(f − λg)(a).
d. Let V ⊂ R^n be a subspace, and let v_{n−k}, . . . , v_n be an orthonormal basis for V^⊥.
Then the restriction of Q to V is positive definite if and only if the roots of
$$\det\begin{pmatrix} A - tI & v_{n-k} & \cdots & v_n \\ v_{n-k}^T & & & \\ \vdots & & O & \\ v_n^T & & & \end{pmatrix} = 0$$
are all positive.

9.4.22 a. If A is nonsingular, when we write A = LDLT , the entries of D are all nonzero. As
suggested in the problem, we consider the “straight-line homotopy” Ls between I and L, defined
by multiplying each non-diagonal entry of L by s, 0 ≤ s ≤ 1. Then we obtain a continuous path,
g(s) = A_s = L_s D L_s^T, 0 ≤ s ≤ 1, in M_{n×n}. Since A_s is the product of nonsingular matrices, A_s
is nonsingular for every s. Now, A0 = D and A1 = A, and, by Exercise 8.7.9, the eigenvalues of
As change continuously as s varies. Since 0 is never an eigenvalue, as we watch each eigenvalue
(starting with the ith entry of D and seeing it change continuously until we get the ith eigenvalue of
A), the sign cannot change. Therefore, the number of positive eigenvalues of A equals the number
of positive entries in D, and the number of negative eigenvalues of A equals the number of negative
entries in D.
b. We know that rank(A) = rank(D) = r, so dim E(0) = dim N(A) = dim N(D), which
is the number of zero entries on the diagonal of D. If d1 , . . . , dk are the negative entries of D, choose
ε0 < min(|d1 |, . . . , |dk |). Then, for all 0 < ε ≤ ε0 , all the diagonal entries of D + εI are nonzero,
k negative and n − k positive. It follows from part a that A + εI has k negative eigenvalues and
n − k positive eigenvalues. Since ε can be chosen as small as we want, it follows that A must have
k negative eigenvalues (and r − k positive).
