for
MULTIVARIABLE
MATHEMATICS:
Linear Algebra, Multivariable Calculus,
and Manifolds
Theodore Shifrin
University of Georgia
1. Vectors in Rn 1
2. Dot Product 5
3. Subspaces of Rn 11
4. Linear Transformations and Matrix Algebra 14
5. Introduction to Determinants and the Cross Product 26
3. THE DERIVATIVE 46
7. INTEGRATION 178
1. Multiple Integrals 178
2. Iterated Integrals and Fubini’s Theorem 184
3. Polar, Cylindrical, and Spherical Coordinates 194
4. Physical Applications 199
5. Determinants and n-dimensional Volume 204
6. Change of Variables Theorem 214
2 1. VECTORS AND MATRICES
1.1.3 a. This problem requires some geometric insight. The most elegant argument begins by
noticing that the collection of vectors from the origin to the vertices of an n-gon is invariant under
a rotation by angle 2π/n. Thus, the sum of these vectors must also be invariant by this rotation.
But the only vector that is invariant under a rotation by an angle that is not an integral multiple
of 2π is the zero vector. We conclude that these vectors sum to 0.
Alternatively, let $v_k = \begin{bmatrix}\cos(2\pi k/n)\\ \sin(2\pi k/n)\end{bmatrix}$, $k = 0, 1, \ldots, n-1$. Visualize summing the vectors by
moving v1 to the head of v0 , v2 to the head of v1 , and so on. As we see in the diagram below,
these vectors fit together to make a similar regular n-gon, and so the figure closes up when we get
to vn−1 . That is, the vectors add up to 0.
[Figure: the vectors v0, v1, v2, . . . placed head to tail, each turning through exterior angle θ, closing up into a regular n-gon.]
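As a sanity check (ours, not part of the original solution), a short Python snippet confirms numerically that the vertex vectors of a regular n-gon sum to 0; the helper name `vertex_sum` is our own.

```python
import math

# Numerical check (ours, not part of the text): the n vectors from the
# center of a regular n-gon to its vertices sum to the zero vector.
def vertex_sum(n):
    sx = sum(math.cos(2 * math.pi * k / n) for k in range(n))
    sy = sum(math.sin(2 * math.pi * k / n) for k in range(n))
    return sx, sy

for n in (3, 4, 5, 12):
    sx, sy = vertex_sum(n)
    assert abs(sx) < 1e-9 and abs(sy) < 1e-9
```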
b. Assume our polygon is centered at the origin $O$. Fix a vertex $A$. Write each vector $\overrightarrow{AB}$ from $A$ to a vertex $B$ as the sum of $\overrightarrow{AO}$ and $\overrightarrow{OB}$. Then
$$\sum_{B \ne A} \overrightarrow{AB} = \sum_B \overrightarrow{AB} = \sum_B \overrightarrow{AO} + \sum_B \overrightarrow{OB}.$$
The second sum is $\mathbf{0}$ by part a and $-\overrightarrow{OA} = \overrightarrow{AO}$, so we have $\sum_{B \ne A} \overrightarrow{AB} = -n\,\overrightarrow{OA}$, where $n$ is the number of vertices of the polygon.
1.1.4 We have $\overrightarrow{AM} = \frac12\overrightarrow{AB}$ and $\overrightarrow{AN} = \frac12\overrightarrow{AC}$. Thus,
$$\overrightarrow{MN} = \overrightarrow{AN} - \overrightarrow{AM} = \tfrac12\overrightarrow{AC} - \tfrac12\overrightarrow{AB} = \tfrac12\big(\overrightarrow{AC} - \overrightarrow{AB}\big) = \tfrac12\overrightarrow{BC}.$$
1.1.5 Using $\triangle ABC$, $\overrightarrow{PQ} = \frac12\overrightarrow{AC}$ by Exercise 4. Similarly, $\overrightarrow{SR} = \frac12\overrightarrow{AC}$ using $\triangle ADC$. Hence, $\overrightarrow{PQ} = \overrightarrow{SR}$. Similarly, using $\triangle BCD$ and $\triangle BAD$, $\overrightarrow{QR} = \overrightarrow{PS}$, so $PQRS$ is a parallelogram.
1.1.6 We have $\overrightarrow{AQ} = \overrightarrow{AC} + \overrightarrow{CQ}$ and $\overrightarrow{CQ} = \frac12\overrightarrow{CD} = \frac12\big(\overrightarrow{AD} - \overrightarrow{AC}\big)$, so
$$\overrightarrow{AQ} = \tfrac12\overrightarrow{AC} + \tfrac12\overrightarrow{AD} = \tfrac12\overrightarrow{AC} + \tfrac13\overrightarrow{AB}.$$
since $\overrightarrow{AE} = \frac13\overrightarrow{AB}$. Then $\overrightarrow{AC} = 4\overrightarrow{AP}$; since $\overrightarrow{AP}$ is a scalar multiple of $\overrightarrow{AC}$, $P$ is on $\overleftrightarrow{AC}$.
1.1.8 We have
$$v - x = \tfrac13(x + y + z) - x = \tfrac13(y + z - 2x) = \tfrac23\Big(\tfrac12(y + z) - x\Big).$$
This says that v − x is a scalar multiple of the vector joining A and the midpoint of BC. That is,
the head of the vector v lies on the median from A to BC. A similar argument shows it lies on
each of the medians.
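The 1.1.8 computation can be checked numerically (the triangle coordinates below are sample values of ours, not from the exercise):

```python
# Numerical check of Exercise 1.1.8 (sample coordinates, ours): the
# centroid v = (x + y + z)/3 lies on the median from A, i.e. v - x is
# parallel to the vector from A to the midpoint of BC.
def sub(p, q):
    return (p[0] - q[0], p[1] - q[1])

def cross2(p, q):
    # 2-D cross product; zero exactly when p and q are parallel
    return p[0] * q[1] - p[1] * q[0]

x, y, z = (0.0, 0.0), (4.0, 1.0), (1.0, 3.0)   # vertices A, B, C
v = ((x[0] + y[0] + z[0]) / 3, (x[1] + y[1] + z[1]) / 3)
mid_bc = ((y[0] + z[0]) / 2, (y[1] + z[1]) / 2)
assert abs(cross2(sub(v, x), sub(mid_bc, x))) < 1e-9
```

The same check succeeds for the other two medians, mirroring the "similar argument" in the text.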
Geometrically, x + y and y + x represent the same vector, as we’ve seen in Figure 1.5 of the text.
b. By the associative law of addition for real numbers,
$$(x+y)+z = \left(\begin{bmatrix} x_1\\ \vdots\\ x_n\end{bmatrix} + \begin{bmatrix} y_1\\ \vdots\\ y_n\end{bmatrix}\right) + \begin{bmatrix} z_1\\ \vdots\\ z_n\end{bmatrix} = \begin{bmatrix} (x_1+y_1)+z_1\\ \vdots\\ (x_n+y_n)+z_n\end{bmatrix} = \begin{bmatrix} x_1+(y_1+z_1)\\ \vdots\\ x_n+(y_n+z_n)\end{bmatrix} = \begin{bmatrix} x_1\\ \vdots\\ x_n\end{bmatrix} + \left(\begin{bmatrix} y_1\\ \vdots\\ y_n\end{bmatrix} + \begin{bmatrix} z_1\\ \vdots\\ z_n\end{bmatrix}\right) = x+(y+z).$$
Geometrically, extend any vector x in the opposite direction with the same length; the sum of this vector and x is the zero vector.
e. By the associative property of multiplication for real numbers,
$$c(dx) = c\begin{bmatrix} dx_1\\ \vdots\\ dx_n\end{bmatrix} = \begin{bmatrix} c(dx_1)\\ \vdots\\ c(dx_n)\end{bmatrix} = \begin{bmatrix} (cd)x_1\\ \vdots\\ (cd)x_n\end{bmatrix} = (cd)x.$$
Geometrically: First scaling by d and then by c is the same as scaling the original by cd.
f. By the distributive property of multiplication over addition for real numbers,
$$c(x+y) = c\begin{bmatrix} x_1+y_1\\ \vdots\\ x_n+y_n\end{bmatrix} = \begin{bmatrix} c(x_1+y_1)\\ \vdots\\ c(x_n+y_n)\end{bmatrix} = \begin{bmatrix} cx_1+cy_1\\ \vdots\\ cx_n+cy_n\end{bmatrix} = \begin{bmatrix} cx_1\\ \vdots\\ cx_n\end{bmatrix} + \begin{bmatrix} cy_1\\ \vdots\\ cy_n\end{bmatrix} = cx + cy.$$
Geometrically: the scalar multiple of the sum of two vectors is the same as the sum of their
respective scalar multiples.
g. By the distributive property,
$$(c+d)x = (c+d)\begin{bmatrix} x_1\\ \vdots\\ x_n\end{bmatrix} = \begin{bmatrix} (c+d)x_1\\ \vdots\\ (c+d)x_n\end{bmatrix} = \begin{bmatrix} cx_1+dx_1\\ \vdots\\ cx_n+dx_n\end{bmatrix} = \begin{bmatrix} cx_1\\ \vdots\\ cx_n\end{bmatrix} + \begin{bmatrix} dx_1\\ \vdots\\ dx_n\end{bmatrix} = c\begin{bmatrix} x_1\\ \vdots\\ x_n\end{bmatrix} + d\begin{bmatrix} x_1\\ \vdots\\ x_n\end{bmatrix} = cx + dx.$$
The sum of two vectors, geometrically, is the diagonal from the origin to the opposite corner of the
parallelogram created by the two vectors; when the two vectors are scalar multiples of one another,
this parallelogram is flattened, so that the diagonal is also a multiple of the sides; that multiple is
the sum of the individual multiples.
h. By the definition of the multiplicative identity $1 \in \mathbb{R}$, we have
$$1x = 1\begin{bmatrix} x_1\\ \vdots\\ x_n\end{bmatrix} = \begin{bmatrix} 1x_1\\ \vdots\\ 1x_n\end{bmatrix} = \begin{bmatrix} x_1\\ \vdots\\ x_n\end{bmatrix} = x.$$
Geometrically, multiplication by 1 changes neither the length nor the direction of the vector.
1.1.13 a. Starting with the equation 0 + 0 = 0 and using property g, we have 0x = (0 + 0)x =
0x + 0x. Adding the additive inverse of 0x to both sides, and using properties b and c, we obtain
0 = 0x + (−0x) = (0x + 0x) + (−0x) = 0x + (0x + (−0x)) = 0x + 0 = 0x.
b. First notice, by properties h and g, that (−1)x + x = (−1)x + (1)x = (−1 + 1)x =
0x = 0, by part a. But this says that (−1)x is the additive inverse of x. (Note the additive inverse
is unique.)
1.2.4 We may assume one corner of the box is at the origin. Let x, y, and z denote three edges of the box so that z is the longest edge and w = x + y + z is the long diagonal. Then we have $\|z\| = 5$, $\|w\|^2 = (x+y+z)\cdot(x+y+z) = \|x\|^2 + \|y\|^2 + \|z\|^2 = 50$, and $w\cdot z = (x+y+z)\cdot z = \|z\|^2$, since x, y, and z are mutually orthogonal. Then the angle θ between w and z satisfies
$$\cos\theta = \frac{w\cdot z}{\|w\|\|z\|} = \frac{\|z\|}{\|w\|} = \frac{1}{\sqrt2},$$
from which we deduce that θ = π/4.
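A concrete box realizing this data (edges 3, 4, 5, chosen by us so that ‖z‖ = 5 and ‖w‖² = 50) confirms the angle numerically:

```python
import math

# Concrete instance of Exercise 1.2.4 (our choice of box, edges 3, 4, 5):
# the angle between the long diagonal w and the longest edge z is pi/4.
w = (3.0, 4.0, 5.0)                          # long diagonal x + y + z
z = (0.0, 0.0, 5.0)                          # longest edge
norm = lambda v: math.sqrt(sum(a * a for a in v))
dot = sum(a * b for a, b in zip(w, z))
theta = math.acos(dot / (norm(w) * norm(z)))
assert abs(theta - math.pi / 4) < 1e-9
```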
1.2.5 If θ = arccos(1/4), then x · y = kxkkyk cos θ = 1/2. Then,
1.2.6 Let α, β, and γ denote the angles between x and y, y and z, and x and z, respectively.
Hence, cos β = cos γ. Repeating this argument, it is easy to conclude that cos α = cos β = cos γ,
and so α = β = γ. Furthermore, substituting α = γ in the equation (∗), we get cos α = −1/2.
Thus, α = β = γ = 2π/3.
1.2.7 Let $x = \begin{bmatrix} x_1\\ x_2\\ x_3\end{bmatrix} \in \mathbb{R}^3$, $x \ne 0$. Then, since $\|e_i\| = 1$, we have $x_i = x\cdot e_i = \|x\|\cos\theta_i$, $i = 1, 2, 3$, and so $\|x\|^2\big(\cos^2\theta_1 + \cos^2\theta_2 + \cos^2\theta_3\big) = x_1^2 + x_2^2 + x_3^2 = \|x\|^2$. Thus, $\cos^2\theta_1 + \cos^2\theta_2 + \cos^2\theta_3 = 1$.
1.2.8 We have $\|x\|^2 = n$, $\|y\|^2 = n(n+1)(2n+1)/6$, and $x\cdot y = n(n+1)/2$. Therefore,
$$\cos\theta_n = \frac{n(n+1)/2}{\sqrt{n}\sqrt{\dfrac{n(n+1)(2n+1)}{6}}} = \sqrt{\frac{3(n+1)}{2(2n+1)}} \to \frac{\sqrt3}{2} \quad\text{as } n \to \infty.$$
Therefore, $\theta_n \to \pi/6$ as $n \to \infty$.
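The limit can be spot-checked numerically (the helper below is ours):

```python
import math

# Numerical check of Exercise 1.2.8: with x = (1,...,1) and
# y = (1, 2, ..., n), cos(theta_n) tends to sqrt(3)/2, so theta_n -> pi/6.
def cos_theta(n):
    dot = n * (n + 1) / 2                            # x . y
    nx = math.sqrt(n)                                # ||x||
    ny = math.sqrt(n * (n + 1) * (2 * n + 1) / 6)    # ||y||
    return dot / (nx * ny)

assert abs(cos_theta(10**6) - math.sqrt(3) / 2) < 1e-5
```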
1.2.9 The point x + t0 y is the point closest to the origin on the line through x with direction
vector y. But this also means that −t0 y is the point on the line spanned by y closest to x. That
is, we expect that −t0 y should be the projection of x onto y; since −t0 = x · y/kyk2 , this is indeed
the case.
[Figure for Exercise 1.2.9: the line through x with direction vector y, showing x, t0 y, and x − t0 y.]
1.2.10 We position the parallelogram with one vertex at the origin, and we label the vectors
emanating from that vertex x and y. The parallelogram is a rectangle if and only if x · y = 0. The
diagonals of the parallelogram are x + y and x − y. Since $\|x+y\|^2 = \|x\|^2 + 2x\cdot y + \|y\|^2$ and $\|x-y\|^2 = \|x\|^2 - 2x\cdot y + \|y\|^2$, the diagonals have equal lengths if and only if $x\cdot y = 0$.
1.2.11
$$\|x+y\|^2 + \|x-y\|^2 = (x+y)\cdot(x+y) + (x-y)\cdot(x-y) = \|x\|^2 + 2x\cdot y + \|y\|^2 + \|x\|^2 - 2x\cdot y + \|y\|^2 = 2\big(\|x\|^2 + \|y\|^2\big).$$
Geometrically, the sum of the squares of the lengths of the diagonals of a parallelogram is equal to
the sum of the squares of the lengths of its four sides.
1.2.12 Let $x = \overrightarrow{CA}$ and $y = \overrightarrow{CB}$. Then $\overrightarrow{AB} = y - x$, and $c^2 = \|\overrightarrow{AB}\|^2 = \|y-x\|^2 = \|y\|^2 - 2y\cdot x + \|x\|^2 = a^2 - 2ab\cos\theta + b^2$.
1.2.13 The diagonals of a parallelogram with sides x and y are x + y and x − y. Now, x + y
is orthogonal to x − y if and only if (x + y) · (x − y) = 0 (by definition). But
$$(x+y)\cdot(x-y) = x\cdot x - x\cdot y + y\cdot x - y\cdot y = \|x\|^2 - \|y\|^2.$$
So $(x+y)\cdot(x-y) = 0$ if and only if $\|x\| = \|y\|$, i.e., if and only if the parallelogram is a rhombus.
1.2.14 As shown in Figure 2.5 of the text, the relevant sides of the triangle are the vectors x − y
and −(x+y). Since $\|x\| = \|y\|$ (the radius of the circle), we have $(x+y)\cdot(x-y) = \|x\|^2 - \|y\|^2 = 0$,
so the triangle is a right triangle.
Answer to the geometric challenge: The locus consists of two circular arcs.
1.2.17 From Corollary 2.4 we have $\|x\| = \|(x-y)+y\| \le \|x-y\| + \|y\|$, and so $\|x\| - \|y\| \le \|x-y\|$. Switching x and y, we infer that $\|y\| - \|x\| \le \|x-y\|$ as well. Thus, $\big|\|x\| - \|y\|\big| \le \|x-y\|$, as required.
1.2.18 Let ℓ, h, and w denote the length, height, and width of the box. The length of the long diagonal is then $c = \sqrt{\ell^2 + h^2 + w^2}$. We wish to maximize ℓ + h + w while holding c constant. If we let $x = \begin{bmatrix} \ell\\ h\\ w\end{bmatrix}$ and $y = \begin{bmatrix} 1\\ 1\\ 1\end{bmatrix}$, then the Cauchy-Schwarz inequality gives
$$\ell + h + w = x\cdot y \le \|x\|\|y\| = c\sqrt3;$$
equality holds when x and y are parallel, i.e., when $\ell = h = w = c/\sqrt3$. Thus, the optimal box is a cube.
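A quick random experiment (ours, not part of the solution) confirms both the Cauchy-Schwarz bound and that the cube attains it:

```python
import math
import random

# Spot check of Exercise 1.2.18: among boxes with fixed diagonal c, the
# sum l + h + w never exceeds c*sqrt(3); the cube attains the bound.
random.seed(0)
c = 2.0
best = 0.0
for _ in range(10000):
    l, h, w = (random.random() for _ in range(3))
    scale = c / math.sqrt(l * l + h * h + w * w)   # rescale to diagonal c
    s = (l + h + w) * scale
    assert s <= c * math.sqrt(3) + 1e-12
    best = max(best, s)
cube = 3 * (c / math.sqrt(3))                      # the cube's l + h + w
assert abs(cube - c * math.sqrt(3)) < 1e-12
```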
1.2.19 We have
1.2.20 a. Define α (resp., β) to be the angle between x + y and x (resp., y). Then
$$\cos\alpha = \frac{x\cdot(x+y)}{\|x\|\|x+y\|} = \frac{x\cdot y + \|x\|^2}{\|x\|\|x+y\|} \qquad\text{and}\qquad \cos\beta = \frac{y\cdot(x+y)}{\|y\|\|x+y\|} = \frac{x\cdot y + \|y\|^2}{\|y\|\|x+y\|}.$$
Since $\|x\| = \|y\|$, we see that $\cos\alpha = \cos\beta$. Thus, α = β, and x + y bisects the angle between x and y.
b. Replacing x by bx and replacing y by ay, we obtain two vectors of equal length, and
so the result of part a shows that bx + ay bisects the angle between bx and ay, i.e., the angle
between x and y.
1.2.21 Assume the parallelogram P has a vertex at the origin and that vectors x and y form two sides of P. Let $a = \|x\|$ and $b = \|y\|$. Suppose that the diagonal x + y bisects the angle between x and y. Then by Exercise 20, x + y must be a scalar multiple of bx + ay, which means that a = b (we may cite Exercise 1.1.10 if we wish here). Thus, P is a rhombus.
Conversely, suppose P is a rhombus, so $\|x\| = \|y\|$. Then, by Exercise 20, the diagonal x + y
bisects the angle between x and y. To see that it also bisects the opposite angle, we observe that
the opposite angle is the same as the angle between −x and −y. Finally, notice that the other
two angles are given by the angles between −x and y (or x and −y), so they are bisected by the
diagonal x − y.
1.2.22 Let $x = \overrightarrow{AB}$, $y = \overrightarrow{AC}$, $a = \|x\|$, and $b = \|y\|$. By Exercise 20 we know that since $\overrightarrow{AD}$ bisects ∠CAB it must be a multiple of bx + ay, i.e., $\overrightarrow{AD} = s(bx + ay)$ for some $s \in \mathbb{R}$. Also, since D lies on $\overline{BC}$ we know that $\overrightarrow{AD} = x + t(y - x)$ for some $t \in \mathbb{R}$. Equating these expressions yields $sbx + say = (1-t)x + ty$. Since x and y are nonparallel we conclude from Exercise 1.1.10 that $sb = 1-t$ and $sa = t$. Thus, $t/(1-t) = a/b$. Finally, we have
$$\frac{\|\overrightarrow{BD}\|}{\|\overrightarrow{CD}\|} = \frac{t}{1-t} = \frac{a}{b} = \frac{\|\overrightarrow{AB}\|}{\|\overrightarrow{AC}\|},$$
as required.
1.2.23 Using the notation in the hint and Exercise 20, we know that the bisector of ∠AOB is given by
$$t(bx + ay), \quad t \in \mathbb{R}.$$
Furthermore, the bisector of ∠OAB passes through the point A and has direction given by the direction of the bisector of −x and y − x, i.e., this line is given parametrically by
$$x + s\big(a(y-x) - cx\big), \quad s \in \mathbb{R}.$$
Equating, a point on both lines satisfies $t(bx + ay) = x + s\big(a(y-x) - cx\big) = \big(1 - (a+c)s\big)x + say$, and so, by Exercise 1.1.10, we have $tb = 1 - (a+c)s$ and $ta = sa$. So $s = t = 1/(a+b+c)$, and thus
$$\overrightarrow{OP} = \frac{1}{a+b+c}(bx + ay).$$
Finally, the line bisecting ∠OBA passes through the point B and has direction given by the direction of the bisector of −y and x − y, and so this line is given parametrically by
$$y + u\big(b(x-y) - cy\big), \quad u \in \mathbb{R}.$$
It is straightforward to check that setting $u = 1/(a+b+c)$ gives the vector $\overrightarrow{OP}$ as well. Thus, P lies on all three angle bisectors.
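The incenter formula can be verified numerically on a sample triangle (coordinates and helper names are ours): the point P = (bx + ay)/(a + b + c) is equidistant from all three sides.

```python
import math

# Check of Exercise 1.2.23 (sample triangle, ours): with O at the origin,
# x = OA, y = OB, a = |OA|, b = |OB|, c = |AB|, the point
# P = (b*x + a*y)/(a+b+c) is equidistant from all three sides.
def dist_point_line(p, q1, q2):
    # distance from p to the line through q1 and q2
    dx, dy = q2[0] - q1[0], q2[1] - q1[1]
    num = abs(dx * (q1[1] - p[1]) - dy * (q1[0] - p[0]))
    return num / math.hypot(dx, dy)

O, A, B = (0.0, 0.0), (5.0, 0.0), (1.0, 4.0)
a = math.hypot(A[0], A[1])
b = math.hypot(B[0], B[1])
c = math.hypot(B[0] - A[0], B[1] - A[1])
P = ((b * A[0] + a * B[0]) / (a + b + c),
     (b * A[1] + a * B[1]) / (a + b + c))
d1 = dist_point_line(P, O, A)
d2 = dist_point_line(P, O, B)
d3 = dist_point_line(P, A, B)
assert abs(d1 - d2) < 1e-9 and abs(d1 - d3) < 1e-9
```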
1.2.24 Let C be as in the hint and let $x = \overrightarrow{OA}$, $y = \overrightarrow{OB}$, and $z = \overrightarrow{OC}$. Since C lies on the altitude through B, we notice $(z-y)\cdot x = 0$, i.e., $x\cdot z = x\cdot y$. Similarly, since C lies on the altitude through A, we have $y\cdot z = x\cdot y$. In particular, $z\cdot x = z\cdot y$, so $z\cdot(x-y) = 0$. This means that $\overrightarrow{OC}$ is orthogonal to $\overrightarrow{AB}$, so the altitude through O passes through C, as we needed to show.
If we take the dot product of this equation with y and recall that $\rho(y)\cdot y = 0$, we get
$$\tfrac12\,x\cdot y + s\,\rho(x)\cdot y = \tfrac12\|y\|^2.$$
Solving for s gives $s = \dfrac{\|y\|^2 - x\cdot y}{2\rho(x)\cdot y}$.
b. Since $\frac12(x+y)$ is the midpoint of $\overline{AB}$, the perpendicular bisector of $\overline{AB}$ is given by $\frac12(x+y) + s\rho(y-x)$, $s \in \mathbb{R}$. To show that z lies on this line, it suffices to show that $z - \frac12(x+y)$ is orthogonal to $y - x$. But
$$\Big(z - \tfrac12(x+y)\Big)\cdot(y-x) = \Big(\frac{\|y\|^2 - x\cdot y}{2\rho(x)\cdot y}\,\rho(x) - \tfrac12 y\Big)\cdot(y-x) = \frac{\|y\|^2 - x\cdot y}{2\rho(x)\cdot y}\,\rho(x)\cdot y - \tfrac12\,y\cdot(y-x) = \tfrac12\big(\|y\|^2 - x\cdot y + y\cdot x - \|y\|^2\big) = 0.$$
1.2.26 As usual, let $\overrightarrow{OA} = x$ and $\overrightarrow{OB} = y$. Then $\overrightarrow{OP} = \frac13(x+y)$ and, by Exercise 25, $\overrightarrow{OR} = \frac12 x + c\rho(x)$, where $c = \dfrac{\|y\|^2 - x\cdot y}{2\rho(x)\cdot y}$. Now, Q must lie on the altitude from B to $\overleftrightarrow{OA}$, so $\overrightarrow{OQ} = y + t\rho(x)$ for some scalar t; similarly, Q must lie on the altitude from A to $\overleftrightarrow{OB}$, so $\overrightarrow{OQ} = x + s\rho(y)$ for some scalar s. Solving, we find that $\overrightarrow{OQ} = y - 2c\rho(x)$. Then we have $\overrightarrow{QR} = \frac12 x - y + 3c\rho(x)$ and $\overrightarrow{QP} = \frac13(x - 2y) + 2c\rho(x) = \frac23\overrightarrow{QR}$. Therefore, P lies two-thirds of the way from Q to R along $\overline{QR}$.
When the triangle is isosceles, the intersection of the angle bisectors does lie on that line.
However, for example, let △OAB be a right triangle with right angle at O. Then Q = O and R is
the midpoint of AB, and OR bisects ∠O only when we have an isosceles right triangle.
1.3. Subspaces of Rn
1.3.1 a. No: $0 \notin V$.
b. Yes: $V = \operatorname{Span}\left(\begin{bmatrix}1\\0\\1\end{bmatrix}, \begin{bmatrix}0\\1\\1\end{bmatrix}\right)$.
c. No: $0 \notin V$.
d. No: $0 \notin V$.
e. Yes: $x_1^2 + x_2^2 + x_3^2 = 0 \iff x = 0$, so $V = \{0\}$.
f. No: $0 \notin V$ (in fact, $V = \varnothing$).
g. Yes: $V = \operatorname{Span}\left(\begin{bmatrix}2\\1\\1\end{bmatrix}, \begin{bmatrix}1\\2\\1\end{bmatrix}\right)$.
h. Yes: $\begin{bmatrix}3\\0\\1\end{bmatrix} = 2\begin{bmatrix}2\\1\\1\end{bmatrix} - \begin{bmatrix}1\\2\\1\end{bmatrix}$, so $V = \operatorname{Span}\left(\begin{bmatrix}2\\1\\1\end{bmatrix}, \begin{bmatrix}1\\2\\1\end{bmatrix}\right)$.
i. No: $0 \notin V$, since solving $0 = \begin{bmatrix}2\\4\\-1\end{bmatrix} + s\begin{bmatrix}2\\1\\1\end{bmatrix} + t\begin{bmatrix}1\\2\\-1\end{bmatrix}$ is equivalent to solving
$$2s + t = -2, \qquad s + 2t = -4, \qquad s - t = 1,$$
which has no solution.
1.3.2 In order for us to conclude from this argument that 0 ∈ V , we must first have some
v ∈ V . The first criterion is equivalent to stipulating that V be nonempty.
1.3.3 x · (c1 v1 + c2 v2 + · · · + ck vk ) = (x · c1 v1 ) + (x · c2 v2 ) + · · · + (x · ck vk ) = c1 (x · v1 ) +
c2 (x · v2 ) + · · · + ck (x · vk ) = 0 + 0 + · · · + 0 = 0, as required.
1.3.4 We check the requisite three properties. (i) 0 ∈ V ⊥ since 0 · v = 0 for every v ∈ V .
(ii) Suppose x ∈ V ⊥ and c ∈ R. We must check that cx ∈ V ⊥ . We calculate: (cx) · v = c(x · v) = 0
for all v ∈ V , as required.
(iii) Suppose x, y ∈ V ⊥ ; we must check that x+y ∈ V ⊥ . Well, (x+y)·v = (x·v)+(y·v) = 0+0 = 0
for all v ∈ V , as needed.
1.3.5 Since W is a subspace, it is closed under scalar multiplication and addition, so every
vector of the form c1 v1 + c2 v2 + · · · + ck vk must lie in W . That is, the general element of V is an
element of W , so V ⊂ W .
Therefore, x + y ∈ U ∩ V .
In conclusion, U ∩ V is a subspace.
Example 1. Let U = Span(e1 ) and V = Span(e2 ) ⊂ R2 . The lines U and V are subspaces of
R2 and the point U ∩ V = {0} is a subspace as well.
Example 2. Let U = Span(e1 , e2 ) and V = Span(e1 , e3 ) ⊂ R3 . The planes U and V are
subspaces and their intersection, the line spanned by e1 , is a subspace of R3 as well.
b. No. Let U and V be as in the first example above. Then the vector e1 + e2 is a sum
of two vectors in U ∪ V , but does not lie in U ∪ V .
c. (i) Since 0 ∈ U and 0 ∈ V , we have 0 = 0 + 0 ∈ U + V . (ii) Suppose x ∈ U + V and
c ∈ R. We are to show that cx ∈ U + V . By definition, x can be written in the form x = u + v for
some u ∈ U and v ∈ V . Then cx = c(u + v) = (cu) + (cv) ∈ U + V , inasmuch as each of U and V is
closed under scalar multiplication. (iii) Suppose x, y ∈ U + V . Then x = u + v and y = u′ + v′ for
some u, u′ ∈ U and v, v′ ∈ V . Therefore, x + y = (u + v) + (u′ + v′ ) = (u + u′ ) + (v + v′ ) ∈ U + V ,
since U and V are both closed under addition.
Example 1. Let U = Span(e1 ) and V = Span(e2 ) ⊂ R2 . Then U + V = R2 .
Example 2. Let U = Span(e1 , e2 ) and V = Span(e1 + e3 ). Then U + V = R3 .
1.4.2 a. Since the j th column of A is Aej and Ax = 0 for all x ∈ Rn , Aej = 0. Since this is
true for all j = 1, . . . , n, A = O. Working with the rows rather than with the columns of A, we
can argue as follows: For any i = 1, . . . , m, we have Ai · x = 0 for all x ∈ Rn . By Exercise 1.2.15,
Ai = 0.
b. Apply the result of part a to A − B.
so cos(θ + φ) = cos θ cos φ − sin θ sin φ and sin(θ + φ) = sin θ cos φ + cos θ sin φ.
" #" # " #
cos θ − sin θ x1 x1 cos θ − x2 sin θ
1.4.7 a. Aθ x = = , so
sin θ cos θ x2 x1 sin θ + x2 cos θ
kAθ xk2 = (x1 cos θ − x2 sin θ)2 + (x1 sin θ + x2 cos θ)2 = x21 + x22 = kxk2 .
b. $x\cdot A_\theta x = x_1^2\cos\theta - x_1x_2\sin\theta + x_1x_2\sin\theta + x_2^2\cos\theta = (x_1^2 + x_2^2)\cos\theta$ and $\|A_\theta x\|\|x\| = \|x\|^2 = x_1^2 + x_2^2$, so
$$\frac{x\cdot A_\theta x}{\|A_\theta x\|\|x\|} = \frac{(x_1^2 + x_2^2)\cos\theta}{x_1^2 + x_2^2} = \cos\theta.$$
(Strictly speaking, we really want the signed angle between the vectors to be θ. That is, we should
rotate counterclockwise through an angle θ to get from x to Aθ x. See the discussion in Section 5.)
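These computations can be spot-checked numerically (the sample θ and x below are ours):

```python
import math

# Spot check of Exercise 1.4.7: multiplication by A_theta preserves
# length, and the angle between x and A_theta x equals theta.
theta = 0.7
x = (2.0, -1.0)
Ax = (x[0] * math.cos(theta) - x[1] * math.sin(theta),
      x[0] * math.sin(theta) + x[1] * math.cos(theta))
norm = lambda v: math.hypot(v[0], v[1])
assert abs(norm(Ax) - norm(x)) < 1e-9
angle = math.acos((x[0] * Ax[0] + x[1] * Ax[1]) / (norm(x) * norm(Ax)))
assert abs(angle - theta) < 1e-9
```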
1.4.9 a. If $A^2 = I_2$, then
From the third equation we infer that either b = 0 or a + d = 0. If b = 0, then the first two equations give $a^2 = d^2 = 1$, so a = ±1 and d = ±1. The last equation gives a + d = 0 or c = 0. Thus, the solutions with b = 0 are given by
$$\begin{bmatrix}1&0\\0&1\end{bmatrix}, \quad \begin{bmatrix}-1&0\\0&-1\end{bmatrix}, \quad \begin{bmatrix}1&0\\c&-1\end{bmatrix}, \quad\text{or}\quad \begin{bmatrix}-1&0\\c&1\end{bmatrix}, \quad c \in \mathbb{R}.$$
and
$$\begin{bmatrix} a & b\\ -a^2/b & -a\end{bmatrix} = a\begin{bmatrix} 1 & \beta\\ -a/\beta & -1\end{bmatrix}$$
[Figure: the reflection R sends e1 and e2 to Re1 and Re2, with the angles θ and π/2 − θ marked.]
The kℓ-entry of the block product is
$$\begin{bmatrix}A_k\\ B_k\end{bmatrix}\cdot\begin{bmatrix}a'_\ell\\ c'_\ell\end{bmatrix}, \quad 1\le k,\ell\le m; \qquad \begin{bmatrix}A_k\\ B_k\end{bmatrix}\cdot\begin{bmatrix}b'_{\ell-m}\\ d'_{\ell-m}\end{bmatrix}, \quad 1\le k\le m,\ m+1\le\ell\le m+n;$$
$$\begin{bmatrix}C_{k-m}\\ D_{k-m}\end{bmatrix}\cdot\begin{bmatrix}a'_\ell\\ c'_\ell\end{bmatrix}, \quad m+1\le k\le m+n,\ 1\le\ell\le m; \qquad \begin{bmatrix}C_{k-m}\\ D_{k-m}\end{bmatrix}\cdot\begin{bmatrix}b'_{\ell-m}\\ d'_{\ell-m}\end{bmatrix}, \quad m+1\le k,\ell\le m+n.$$
Now, $\begin{bmatrix}A_k\\ B_k\end{bmatrix}\cdot\begin{bmatrix}a'_\ell\\ c'_\ell\end{bmatrix} = A_k\cdot a'_\ell + B_k\cdot c'_\ell$ is the kℓ-entry of $AA' + BC'$, and $\begin{bmatrix}A_k\\ B_k\end{bmatrix}\cdot\begin{bmatrix}b'_{\ell-m}\\ d'_{\ell-m}\end{bmatrix} = A_k\cdot b'_{\ell-m} + B_k\cdot d'_{\ell-m}$ is the $(k, \ell-m)$-entry of $AB' + BD'$. Similarly, $\begin{bmatrix}C_{k-m}\\ D_{k-m}\end{bmatrix}\cdot\begin{bmatrix}a'_\ell\\ c'_\ell\end{bmatrix} = C_{k-m}\cdot a'_\ell + D_{k-m}\cdot c'_\ell$ is the $(k-m, \ell)$-entry of $CA' + DC'$, and $\begin{bmatrix}C_{k-m}\\ D_{k-m}\end{bmatrix}\cdot\begin{bmatrix}b'_{\ell-m}\\ d'_{\ell-m}\end{bmatrix} = C_{k-m}\cdot b'_{\ell-m} + D_{k-m}\cdot d'_{\ell-m}$ is the $(k-m, \ell-m)$-entry of $CB' + DD'$. Thus, the kℓ-entries of $\begin{bmatrix}A&B\\C&D\end{bmatrix}\begin{bmatrix}A'&B'\\C'&D'\end{bmatrix}$ and $\begin{bmatrix}AA'+BC' & AB'+BD'\\ CA'+DC' & CB'+DD'\end{bmatrix}$ agree, and so the matrices are equal.
1.4.13 a. We have $S(e_1) = -e_2$ and $S(e_2) = -e_1$, so the standard matrix for S is $A = \begin{bmatrix}0&-1\\-1&0\end{bmatrix}$. And $T(e_1) = e_2$ and $T(e_2) = -e_1$, so the standard matrix for T is $B = \begin{bmatrix}0&-1\\1&0\end{bmatrix}$.
b. We have $(T\circ S)(e_1) = T(-e_2) = e_1$ and $(T\circ S)(e_2) = T(-e_1) = -e_2$, so the standard matrix for $T\circ S$ is $\begin{bmatrix}1&0\\0&-1\end{bmatrix}$. Note that this is, in fact, the matrix product BA.
c. We have $(S\circ T)(e_1) = S(e_2) = -e_1$ and $(S\circ T)(e_2) = S(-e_1) = e_2$, so the standard matrix for $S\circ T$ is $\begin{bmatrix}-1&0\\0&1\end{bmatrix}$, which is the matrix product AB.
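The two products can be confirmed directly with a tiny helper (`matmul` is our own name):

```python
# The standard matrices from Exercise 1.4.13, multiplied to confirm
# [T o S] = BA and [S o T] = AB.
def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[0, -1], [-1, 0]]            # standard matrix of S
B = [[0, -1], [1, 0]]             # standard matrix of T
assert matmul(B, A) == [[1, 0], [0, -1]]      # T o S
assert matmul(A, B) == [[-1, 0], [0, 1]]      # S o T
```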
" #
1 1
1.4.14 a. Rotating the vector e1 by −π/4 gives the vector √ ; reflecting that vector
2 −1
" # " #
1 −1 1 1
across the line x1 = x2 gives √ . Similarly, rotating e2 by −π/4 gives the vector √ ,
2 1 2 1
" #
1 −1 1
which is left unchanged by the reflection. Thus, the standard matrix for T is √ .
2 1 1
b. The rotation takes the standard basis vectors e1 , e2 , and e3 to e1 , e3 , and −e2 , respec-
tively. Reflecting the latter vectors across the plane x2 = 0 results in e1 , e3 , and e2 , respectively.
1 0 0
Thus, the standard matrix for T is 0 0 1 .
0 1 0
c. The first rotation takes the standard basis vectors e1 , e2 , and e3 to e1 , −e3 , and e2 ,
respectively. The second rotation
takes the latter
vectors to e2 , −e3 , and −e1 , respectively. Thus,
0 0 −1
the standard matrix for T is 1 0 0 .
0 −1 0
1.4.15 a. This symmetry carries $e_1$ to $e_2$, $e_2$ to $-e_1$, and leaves $e_3$ fixed. Thus, the standard matrix is $\begin{bmatrix}0&-1&0\\1&0&0\\0&0&1\end{bmatrix}$.
b. Since the front face, whose center is $\begin{bmatrix}1\\0\\0\end{bmatrix}$, is moved to the bottom face, whose center is $\begin{bmatrix}0\\0\\-1\end{bmatrix}$, we see that $e_1$ is mapped to $-e_3$. Likewise, the right face, with center $\begin{bmatrix}0\\1\\0\end{bmatrix}$, is moved to the left face, with center $\begin{bmatrix}0\\-1\\0\end{bmatrix}$, so $e_2$ is mapped to $-e_2$. Finally, the top face, whose center is $\begin{bmatrix}0\\0\\1\end{bmatrix}$, is moved to the back face, whose center is $\begin{bmatrix}-1\\0\\0\end{bmatrix}$, so $e_3$ is mapped to $-e_1$. Thus, the standard matrix is $\begin{bmatrix}0&0&-1\\0&-1&0\\-1&0&0\end{bmatrix}$.
c. Once again, following faces, we see that the top face moves to the front face, the front face moves to the right face, and the right face moves to the top. Thus, $e_3$ maps to $e_1$, $e_1$ maps to $e_2$, and $e_2$ maps to $e_3$. So the matrix is $\begin{bmatrix}0&0&1\\1&0&0\\0&1&0\end{bmatrix}$, which is again an orthogonal matrix.
1.4.18 Examples are:
a. $A = \begin{bmatrix}0&1\\0&0\end{bmatrix}$
b. $A = \begin{bmatrix}0&1&0\\0&0&1\\0&0&0\end{bmatrix}$
The n × n matrix A with $a_{i,i+1} = 1$, $i = 1, \ldots, n-1$, and all other entries 0 has this property. Another conjecture one might offer is that if $A^{n-1} \ne O$ and $A^n = O$, then A must be at least n × n.
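The shift-matrix example can be verified for small n (all helper names below are ours):

```python
# Check of the Exercise 1.4.18 example: the n x n "shift" matrix with
# a_{i,i+1} = 1 satisfies A^(n-1) != O but A^n = O.
def shift(n):
    return [[1 if j == i + 1 else 0 for j in range(n)] for i in range(n)]

def matmul(P, Q):
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def power(A, p):
    n = len(A)
    R = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for _ in range(p):
        R = matmul(R, A)
    return R

for n in (2, 3, 5):
    zero = [[0] * n for _ in range(n)]
    assert power(shift(n), n - 1) != zero
    assert power(shift(n), n) == zero
```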
1.4.27 We claim that $A_\theta x\cdot y = x\cdot A_{-\theta}y$ for all $x, y \in \mathbb{R}^2$. Since, by Exercise 7, multiplication by $A_\theta$ preserves length, this is equivalent to the observation that the angle between $A_\theta x$ and y is equal to the angle between x and $A_{-\theta}y$. It then follows from Proposition 4.5 that $A_\theta^{-1} = A_{-\theta} = A_\theta^{\mathsf T}$.
" # " #
1 0 0 1
1.4.28 a. There are two: and .
0 1 1 0
b. There are six:
1 0 0 1 0 0 0 1 0
0 1 0, 0 0 1, 1 0 0,
0 0 1 0 1 0 0 0 1
0 1 0 0 0 1 0 0 1
0 0 1, 1 0 0, and 0 1 0.
1 0 0 0 1 0 1 0 0
then $PQ \ne QP$.
d. Let $p_1, \ldots, p_n$ denote the columns of P. Since P is a permutation matrix, we have $p_i^{\mathsf T}p_j = 0$ if $i \ne j$ and $p_i^{\mathsf T}p_i = 1$. The rows of $P^{\mathsf T}$ are given by $p_1^{\mathsf T}, \ldots, p_n^{\mathsf T}$. Thus, $P^{\mathsf T}P = I$. Similarly, letting $P_1, \ldots, P_n$ denote the rows of P, we have $P_i\cdot P_j = 0$ if $i \ne j$ and $P_i\cdot P_i = 1$, so $PP^{\mathsf T} = I$ as well. Thus, $P^{\mathsf T} = P^{-1}$, as required.
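A direct computation on a sample permutation matrix (helpers ours) confirms the claim:

```python
# Check of Exercise 1.4.28d: for a permutation matrix P, P^T P = P P^T = I,
# so P^T = P^(-1).
def perm_matrix(sigma):
    n = len(sigma)
    # row i has its single 1 in column sigma[i]
    return [[1 if j == sigma[i] else 0 for j in range(n)] for i in range(n)]

def transpose(P):
    n = len(P)
    return [[P[j][i] for j in range(n)] for i in range(n)]

def matmul(P, Q):
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P = perm_matrix([2, 0, 3, 1])
I = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
assert matmul(transpose(P), P) == I
assert matmul(P, transpose(P)) == I
```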
e. The rows of P A are a rearrangement (permutation) of the rows of A. If the ith row
of P has a 1 in the k th entry, then the ith row of P A is Ak . Similarly, the columns of the matrix
AP are a rearrangement of the columns of A. For instance, if the j th column of P has a 1 in the
k th entry, then the j th column of AP is ak .
1.4.29 x · y = x · AT b = Ax · b = 0 · b = 0.
1.4.32 If $(A^{\mathsf T}A)x = 0$, then, using Proposition 4.5, we have $(A^{\mathsf T}A)x\cdot x = Ax\cdot(A^{\mathsf T})^{\mathsf T}x = Ax\cdot Ax = 0$, so $\|Ax\|^2 = 0$, from which we conclude that $Ax = 0$.
1.4.34 a. Notice that for any square matrix A, $(A^{\mathsf T}A)_{ij} = a_i\cdot a_j$. So, if A is orthogonal,
$$a_i\cdot a_j = (A^{\mathsf T}A)_{ij} = (I_n)_{ij} = \begin{cases}1, & i = j,\\ 0, & i \ne j.\end{cases}$$
b. $\begin{bmatrix}\frac{\sqrt3}{2} & -\frac12\\[2pt] \frac12 & \frac{\sqrt3}{2}\end{bmatrix}$ or $\begin{bmatrix}\frac{\sqrt3}{2} & \frac12\\[2pt] \frac12 & -\frac{\sqrt3}{2}\end{bmatrix}$, $\quad\begin{bmatrix}1&0&0\\0&-1&0\\0&0&1\end{bmatrix}$ or $\begin{bmatrix}1&0&0\\0&-1&0\\0&0&-1\end{bmatrix}$, $\quad\frac13\begin{bmatrix}1&2&2\\2&1&-2\\2&-2&1\end{bmatrix}$ or $\frac13\begin{bmatrix}1&-2&2\\2&-1&-2\\2&2&1\end{bmatrix}$.
c. By part a, the column vectors $a_1$ and $a_2$ must be mutually orthogonal unit vectors. In particular, $a_1 = \begin{bmatrix}\cos\theta\\ \sin\theta\end{bmatrix}$ for some θ. Since $a_2$ must be a unit vector orthogonal to $a_1$, we must have $a_2 = \pm\begin{bmatrix}-\sin\theta\\ \cos\theta\end{bmatrix}$.
d. The first matrix in part c is $A_\theta$, the matrix giving rotation through angle θ; the second matrix is $A_\theta\begin{bmatrix}1&0\\0&-1\end{bmatrix}$, the composition of a rotation and a reflection.
e. Notice that for any square matrix A, $(AA^{\mathsf T})_{ij} = A_i\cdot A_j$. As remarked in the exercise, if $A^{\mathsf T}A = I$, then it follows that $AA^{\mathsf T} = I$ as well, so $A_i\cdot A_j = (AA^{\mathsf T})_{ij} = I_{ij}$.
1.4.37 It must be the case that A = cI for some scalar c. Denote by $E_{ij}$ the matrix with a 1 in the ij-entry and 0's elsewhere. Then from $AE_{ij} = E_{ij}A$ we infer that all the nondiagonal entries of the j-th row and i-th column of A are 0 and (comparing ij-entries of the product) that $a_{ii} = a_{jj}$.
1.5.1 The parallelogram OADB has the same area as parallelogram OAEC because △OBC and △ADE are congruent. Alternatively, the result follows from Cavalieri's principle, as sliding cross-sections parallel to the base OA results in a figure with the same area.
[Figure: parallelograms OADB and OAEC on the common base OA, with the vector y + cx drawn.]
1.5.2 Let $x = \begin{bmatrix}x_1\\ x_2\end{bmatrix}$ and $y = \begin{bmatrix}y_1\\ y_2\end{bmatrix}$. Then
Note that at the last step we have used the observation that, by Property 1, for any z ∈ R2 , we
have D(z, z) = −D(z, z), and so D(z, z) = 0.
$$= \frac12\sum_{i=1}^{n-1} D(v_i, v_{i+1}) + \frac12 D(v_n, v_1).$$
(The latter sum is the sum of the signed areas of triangles with vertices at the origin, $v_i$, and $v_{i+1}$. This makes good sense, since the answer should be independent of the location of the origin.)
1.5.4 a. Let $b_1 = \begin{bmatrix}b_{11}\\ b_{21}\end{bmatrix}$ and $b_2 = \begin{bmatrix}b_{12}\\ b_{22}\end{bmatrix}$ be the columns of B. Then
$$\det AB = D(Ab_1, Ab_2) = D\left(\begin{bmatrix}a_{11}b_{11}+a_{12}b_{21}\\ a_{21}b_{11}+a_{22}b_{21}\end{bmatrix}, \begin{bmatrix}a_{11}b_{12}+a_{12}b_{22}\\ a_{21}b_{12}+a_{22}b_{22}\end{bmatrix}\right)$$
$$= (a_{11}b_{11}+a_{12}b_{21})(a_{21}b_{12}+a_{22}b_{22}) - (a_{21}b_{11}+a_{22}b_{21})(a_{11}b_{12}+a_{12}b_{22}) = (a_{11}a_{22}-a_{12}a_{21})(b_{11}b_{22}-b_{12}b_{21}) = \det A\,\det B.$$
1.5.7 Using the cross products already calculated in Exercise 6 and the approach of Example
2, we find
a. x1 − x2 + x3 = 0
b. x1 − x2 + x3 = 3
c. 3x1 + 4x2 + 5x3 = 0
d. 3x1 + 4x2 + 5x3 = 12
1.5.8 Since the plane is parallel to the vectors $u = \begin{bmatrix}1\\-2\\-1\end{bmatrix}$ and $v = \begin{bmatrix}1\\0\\1\end{bmatrix}$, its normal vector is given by $A = u\times v = \begin{bmatrix}-2\\-2\\2\end{bmatrix}$. Thus, an equation of the plane is $x_1 + x_2 - x_3 = -2$.
1.5.9 The normals of the respective planes are $u = \begin{bmatrix}1\\1\\-2\end{bmatrix}$ and $v = \begin{bmatrix}2\\1\\1\end{bmatrix}$, so the intersection of the planes is spanned by $w = u\times v = \begin{bmatrix}3\\-5\\-1\end{bmatrix}$.
1.5.10 Note first that $P = \{x : a\cdot x = b\}$ is an affine plane with normal a at distance $b/\|a\|$ from the origin. In addition, $\ell = \{x : a\times x = c\}$ is a line parallel to a in the plane through 0 orthogonal to c and at distance $\|c\|/\|a\|$ from the origin. Since the intersection of P and ℓ is a single point, there is a unique vector x satisfying the two equations. Algebraically, one can check that
$$x = b\,\frac{a}{\|a\|^2} + c\times\frac{a}{\|a\|^2}$$
is the unique solution.
1.5.11 The points $x_0 = \begin{bmatrix}2\\1\\1\end{bmatrix}$ and $y_0 = \begin{bmatrix}1\\1\\0\end{bmatrix}$ lie on ℓ and m, respectively. The distance between ℓ and m is found by finding the projection of $x_0 - y_0$ on a vector v orthogonal to both lines. The latter is computed by the cross product of the respective direction vectors:
$$v = \begin{vmatrix}e_1 & 0 & 1\\ e_2 & 1 & 1\\ e_3 & -1 & 1\end{vmatrix} = \begin{bmatrix}2\\-1\\-1\end{bmatrix}.$$
Thus, the distance between the lines is $\left\|\operatorname{proj}_v\begin{bmatrix}1\\0\\1\end{bmatrix}\right\| = \dfrac{1}{\sqrt6}$.
1.5.12 The volume of the parallelepiped is given by the absolute value of D(x, y, z). Now,
$$D(x, y, z) = \begin{vmatrix}1 & 2 & -1\\ 2 & 3 & 0\\ 1 & 1 & 3\end{vmatrix} = -2,$$
so the volume is 2.
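The determinant here equals the scalar triple product (x × y) · z, which a quick computation confirms (the helper `cross` is ours):

```python
# Check of Exercise 1.5.12: D(x, y, z) equals the scalar triple product
# (x cross y) . z, which is -2, so the volume is 2.
def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

x, y, z = (1, 2, -1), (2, 3, 0), (1, 1, 3)
D = sum(ci * zi for ci, zi in zip(cross(x, y), z))
assert D == -2
```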
1.5.13 Suppose the parallelogram P is spanned by x and y. Then we know from Proposition 5.1 that area(P) = $\|x\times y\|$. But, by definition,
$$\|x\times y\|^2 = \begin{vmatrix}x_2 & y_2\\ x_3 & y_3\end{vmatrix}^2 + \begin{vmatrix}x_3 & y_3\\ x_1 & y_1\end{vmatrix}^2 + \begin{vmatrix}x_1 & y_1\\ x_2 & y_2\end{vmatrix}^2 = \mathrm{area}(P_1)^2 + \mathrm{area}(P_2)^2 + \mathrm{area}(P_3)^2.$$
1.5.14 a. The first equation is immediate from Property 1 of determinants, and the second from
Property 3.
b. We have (e1 × e2 ) × e2 = e3 × e2 = −e1 , and yet e1 × (e2 × e2 ) = e1 × 0 = 0.
and so
$$[T] = \begin{bmatrix}0 & -c & b\\ c & 0 & -a\\ -b & a & 0\end{bmatrix}.$$
Now, $(a\times x)\cdot y = (x\times y)\cdot a = -(y\times x)\cdot a = -(a\times y)\cdot x$. Therefore, $T(x)\cdot y = -x\cdot T(y)$, and so $[T]$ must be skew-symmetric.
1.5.16 Although one can bludgeon this in coordinates, it is best to note that both sides are
linear in each of the vectors x, y, z, and w. Thus, it suffices to check the equality when each is
replaced by one of the standard basis vectors for R3 : x = ei , y = ej , z = ek , and w = eℓ .
Notice that if i = j or k = ℓ, both sides vanish: the left because the cross product of a vector with itself is 0, the right because either the rows or the columns are equal. If i = k and j = ℓ, i ≠ j, then both sides are 1. If i = k and i ≠ j, j ≠ ℓ, then $e_i\times e_j = \pm e_\ell$ and $e_i\times e_\ell = \mp e_j$, and so both the left and the right vanish. Up to obvious symmetries, this covers all the bases.
1.5.17 a. As the hint suggests, we write x − u = s(v − u) + t(w − u) for some (unique) scalars
s and t. Then we have x = (1 − s − t)u + sv + tw, and, letting r = 1 − s − t, we are done.
b. The signed area of the triangle with vertices x, v, and w is given by
$$D(x - v, w - v) = D\big(r(u-v) + t(w-v),\ w - v\big) = r\,D(u-v, w-v),$$
as required. Similarly, s and t are, respectively, the ratios of the signed areas of △uxw and △uvx
to that of △uvw.
c. From Exercise 1.1.8 we know that x = 31 (u + v + w), so this tells us that each of the
three triangles has one-third the area of △uvw. This is a non-obvious result.
d. This is an immediate consequence of parts a and b: We can express 0 = ru + sv + tw
uniquely, with r = D(v, w), s = D(w, u), and t = D(u, v). For a physical interpretation, we have
this: If we translate our coordinates so that the origin is in the interior of △uvw, then putting
masses r, s, and t at vertices u, v, and w, respectively, the system balances at the origin.
1.5.18 a. We have
Thus, letting θ be the angle between x and y, we have $\|x\times y\|^2 = \|x\|^2\|y\|^2(1 - \cos^2\theta) = \|x\|^2\|y\|^2\sin^2\theta$, so $\|x\times y\| = \|x\|\|y\|\sin\theta$, which is, indeed, the area of the parallelogram spanned by x and y.
b. Think of the parallelepiped as having as its base the parallelogram spanned by x and y. Then its height is $\|z\|\cos\omega$, where ω is the angle between z and x × y. Thus, the signed volume of the parallelepiped is $\|x\times y\|\|z\|\cos\omega = (x\times y)\cdot z$.
[Figure: the parallelepiped with base spanned by x and y, with x × y perpendicular to the base and ω the angle between z and x × y.]
1.5.19 Since $x\cdot y = \frac12\big(\|x\|^2 + \|y\|^2 - \|x-y\|^2\big)$, we have
$$A^2 = \tfrac14\|x\times y\|^2 = \tfrac14\big(\|x\|^2\|y\|^2 - (x\cdot y)^2\big) = \tfrac14 a^2b^2 - \tfrac1{16}\big(a^2+b^2-c^2\big)^2 = \Big(\tfrac12 ab + \tfrac14(a^2+b^2-c^2)\Big)\Big(\tfrac12 ab - \tfrac14(a^2+b^2-c^2)\Big)$$
$$= \tfrac1{16}\big((a+b)^2 - c^2\big)\big(c^2 - (a-b)^2\big) = \tfrac1{16}(a+b+c)(a+b-c)(c+a-b)(c-a+b) = \frac{a+b+c}{2}\cdot\frac{a+b-c}{2}\cdot\frac{c+a-b}{2}\cdot\frac{c-a+b}{2} = s(s-a)(s-b)(s-c).$$
1.5.20 As we see in the figure, we can decompose the triangle into three triangles, each having height r. Then, using the result of Exercise 19, we have
[Figure: the triangle with sides a, b, c cut into three triangles, each of height r, by the center of the inscribed circle.]
$$A = \tfrac12(a+b+c)r = rs = \sqrt{s(s-a)(s-b)(s-c)},$$
and so $r = \sqrt{(s-a)(s-b)(s-c)/s}$.
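On the 3-4-5 right triangle (a sample of ours), s = 6 and the formula gives inradius r = 1, consistent with A = rs:

```python
import math

# Check of Exercises 1.5.19-1.5.20 on the 3-4-5 right triangle:
# s = 6 and r = sqrt((s-a)(s-b)(s-c)/s) = 1.
a, b, c = 3.0, 4.0, 5.0
s = (a + b + c) / 2
r = math.sqrt((s - a) * (s - b) * (s - c) / s)
area = math.sqrt(s * (s - a) * (s - b) * (s - c))   # Heron's formula
assert abs(r - 1.0) < 1e-12
assert abs(area - r * s) < 1e-12                     # A = rs
```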
CHAPTER 2
Functions, Limits, and Continuity
2.1. Scalar- and Vector-Valued Functions
2.1.1 a. $x = \begin{bmatrix}2\\0\end{bmatrix} + t\begin{bmatrix}4\\-3\end{bmatrix}$.
b. $x = \begin{bmatrix}-1\\2\end{bmatrix} + t\begin{bmatrix}3\\1\end{bmatrix}$.
c. $x = \begin{bmatrix}1\\2\\1\end{bmatrix} + t\begin{bmatrix}1\\-1\\-1\end{bmatrix}$.
d. $x = \begin{bmatrix}-2\\1\end{bmatrix} + t\begin{bmatrix}-5\\3\end{bmatrix}$.
e. $x = \begin{bmatrix}1\\1\\0\\-1\end{bmatrix} + t\begin{bmatrix}1\\-2\\3\\-1\end{bmatrix}$.
so either x = −1 (which we discard) or $x = \dfrac{1-t^2}{1+t^2}$. Thus, we have the parametrization
$$x = g(t) = \begin{bmatrix}\dfrac{1-t^2}{1+t^2}\\[6pt] \dfrac{2t}{1+t^2}\end{bmatrix}.$$
Alternatively, if we write $\begin{bmatrix}x\\y\end{bmatrix} = \begin{bmatrix}\cos\theta\\ \sin\theta\end{bmatrix}$, then $t = \tan\frac\theta2$, and
$$x = \cos\theta = 2\cos^2\frac\theta2 - 1 = \frac{2}{1+t^2} - 1 = \frac{1-t^2}{1+t^2} \qquad\text{and}\qquad y = \sin\theta = 2\sin\frac\theta2\cos\frac\theta2 = \frac{2t}{1+t^2}.$$
b. For every rational number t, we obtain rational numbers x and y with x2 + y 2 = 1.
Clearing denominators, we obtain integers X, Y , and Z with X 2 + Y 2 = Z 2 . In particular, for all
the rational numbers t with 0 < t < 1 we obtain distinct points in the first quadrant of the unit
circle, hence distinct Pythagorean triples having no common ratios. To get a more explicit formula,
[Figure for Exercise 2.1.5: the rolling circles of radii a and b, with the angles θ, φ, ψ and the point P marked.]
2.1.6 The answer is 3, as can be deduced from the parametric equations for the epicycloid in Exercise 5. When a = 2 and b = 1, we see that the motion of P relative to the center of the smaller circle is given by $-\begin{bmatrix}\cos3\theta\\ \sin3\theta\end{bmatrix}$, so the coin makes 3 full revolutions.
A more intuitive argument is this: Imagine unrolling the circumference of the large coin. Then
the small coin makes two revolutions as it traverses that circumference. But when we roll the
circumference back into its circular shape, that adds one more revolution. (Think, by way of
analogy, of the observed speed of a woman walking in a bus aisle as the bus moves down the road.)
2.1.7 When the master is at $\begin{bmatrix}t\\0\end{bmatrix}$, let the dog's position be $\begin{bmatrix}x(t)\\y(t)\end{bmatrix}$. Then we have $x(t) = t + \cos\theta(t)$, $y(t) = \sin\theta(t)$, so
$$\tan\theta(t) = \frac{y'(t)}{x'(t)} = \frac{\cos\theta(t)\,\theta'(t)}{1 - \sin\theta(t)\,\theta'(t)}.$$
Solving for θ′(t), we find that θ′(t) = sin θ(t). Separating variables and integrating, we have $\int d\theta/\sin\theta = \int dt$, and so $t = -\log(\csc\theta + \cot\theta) + c$ for some constant c. Since θ = π/2 when t = 0, we see that c = 0.
a. We have $x = \begin{bmatrix}\cos\theta - \log(\csc\theta + \cot\theta)\\ \sin\theta\end{bmatrix}$.
b. Since $e^{-t} = \dfrac{1+\cos\theta}{\sin\theta}$, we find that $(1+\cos\theta)^2 = e^{-2t}(1-\cos^2\theta)$, so $1+\cos\theta = e^{-2t}(1-\cos\theta)$. Solving, we obtain $\cos\theta = \dfrac{e^{-2t}-1}{e^{-2t}+1} = \dfrac{1-e^{2t}}{1+e^{2t}}$, and then $\sin\theta = \dfrac{2e^t}{1+e^{2t}}$. Thus,
$$x = \begin{bmatrix}t + \dfrac{1-e^{2t}}{1+e^{2t}}\\[6pt] \dfrac{2e^t}{1+e^{2t}}\end{bmatrix} = \begin{bmatrix}t - \tanh t\\ \operatorname{sech} t\end{bmatrix}.$$
2.1.8 For any distinct nonzero real numbers s, t, and u, we have (using the properties of determinants given in Section 5 of Chapter 1):
$$\begin{vmatrix}s & t & u\\ s^2 & t^2 & u^2\\ s^3 & t^3 & u^3\end{vmatrix} = stu\begin{vmatrix}1 & 1 & 1\\ s & t & u\\ s^2 & t^2 & u^2\end{vmatrix} = stu\begin{vmatrix}1 & 0 & 0\\ s & t-s & u-s\\ s^2 & t^2-s^2 & u^2-s^2\end{vmatrix} = stu(t-s)(u-s)\begin{vmatrix}1 & 0 & 0\\ s & 1 & 1\\ s^2 & s+t & s+u\end{vmatrix}$$
$$= stu(t-s)(u-s)\begin{vmatrix}1 & 0 & 0\\ s & 1 & 0\\ s^2 & s+t & u-t\end{vmatrix} = stu(t-s)(u-s)(u-t) \ne 0.$$
Since the volume of the parallelepiped spanned by f(s), f(t), and f(u) is nonzero, the three points cannot be collinear. On the other hand, when s = 0, say, and t and u are nonzero numbers, we see that f(0) = 0, f(t), and f(u) are collinear if and only if f(u) is a scalar multiple of f(t), and it is easy to check this happens only when t = u.
a.–d. [Sketches omitted.]
The points P₀′ = (√(1 + z₀²), 0, z₀) and P₁′ = (√(1 + z₁²), 0, z₁) can be joined by following the hyperbola x² − z² = 1, y = 0. (This path can be parametrized explicitly, for example, by (cosh t, 0, sinh t), arcsinh z₀ ≤ t ≤ arcsinh z₁.) Thus, given two points (x₀, y₀, z₀) and (x₁, y₁, z₁) ∈ X, we proceed from P₀ to P₀′ to P₁′ to P₁.
There is, however, no path in Y from P = (0, 0, −1) to Q = (0, 0, 1). For if there were, by the Intermediate Value Theorem, that path would have to cross every plane z = t, −1 ≤ t ≤ 1, and yet there is no point on Y with z-coordinate 0.
2.1.11 a. This is a torus, the locus of points obtained by rotating a circle of radius 1 about a
circle of radius 2.
b. Note that x² + y² = (2 + cos t)², so √(x² + y²) − 2 = cos t. Thus, every point of X satisfies the equation (√(x² + y²) − 2)² + z² = 1. Removing the square root, this becomes (x² + y² + z² + 3)² = 16(x² + y²), which can be rewritten as (x² + y² + z²)² − 10(x² + y²) + 6z² + 9 = 0.
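The implicit equation can be checked against the standard parametrization of this torus; a small sympy verification (sympy assumed available):

```python
import sympy as sp

s, t = sp.symbols('s t', real=True)

# Circle of radius 1 carried around a circle of radius 2.
x = (2 + sp.cos(t))*sp.cos(s)
y = (2 + sp.cos(t))*sp.sin(s)
z = sp.sin(t)

lhs = sp.expand((x**2 + y**2 + z**2)**2 - 10*(x**2 + y**2) + 6*z**2 + 9)
# Eliminate sin^2 via the Pythagorean identity; the result should vanish.
lhs = lhs.subs({sp.sin(s)**2: 1 - sp.cos(s)**2, sp.sin(t)**2: 1 - sp.cos(t)**2})
print(sp.expand(lhs))  # 0
```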
2.2.2 Suppose x_k → a. We want to show that x_{k,i} → a_i for all i = 1, …, n. Given ε > 0, there is K ∈ N so that ‖x_k − a‖ < ε whenever k > K. Since |x_{k,i} − a_i| ≤ ‖x_k − a‖, it follows that |x_{k,i} − a_i| < ε whenever k > K, so we are done.
Conversely, suppose that x_{k,i} → a_i for all i = 1, …, n. Given ε > 0, for each i, there is K_i ∈ N so that |x_{k,i} − a_i| < ε/√n whenever k > K_i. Then it follows that whenever k > K = max(K₁, …, K_n), we have
    ‖x_k − a‖ = (Σ_{i=1}^n |x_{k,i} − a_i|²)^(1/2) < (n(ε/√n)²)^(1/2) = ε,
as required.
2.2.3 a. By Exercise 1.2.17, we have |‖x_k‖ − ‖a‖| ≤ ‖x_k − a‖ → 0 as k → ∞. (More pedantically, given ε > 0, the same K that works for the original sequence works here.)
b. By linearity of the dot product and the Cauchy–Schwarz inequality, we have |b·x_k − b·a| = |b·(x_k − a)| ≤ ‖b‖‖x_k − a‖ → 0 as k → ∞. (More rigorously, if b = 0, there's nothing to prove. If b ≠ 0, given ε > 0, there is K ∈ N so that ‖x_k − a‖ < ε/‖b‖ whenever k > K. Then we have |b·x_k − b·a| < ε whenever k > K.)
2.2.4 We apply the results of Example 6 and Exercise 2. Suppose xk ∈ R and xk → c. Then
xk,i → ci for i = 1, . . . , n. Since each interval [ai , bi ] is closed, we know that ci ∈ [ai , bi ], which
means that c ∈ R.
2.2.5 If y ∉ B(a, r), then ‖y − a‖ = s > r. Let δ = s − r. By the triangle inequality, for every point z ∈ B(y, δ) we have ‖y − a‖ ≤ ‖y − z‖ + ‖z − a‖, so ‖z − a‖ ≥ ‖y − a‖ − ‖y − z‖ > s − δ = r. Therefore, B(y, δ) ⊂ Rⁿ − B(a, r), and so, by Proposition 2.1, B(a, r) is closed.
2.2.6 a. Since we are told that x_k → a, given any ε > 0 there is K so that ‖x_k − a‖ < ε whenever k > K. Choose J so that k_J ≥ K. Then, whenever j > J, we have k_j > k_J ≥ K, and so ‖x_{k_j} − a‖ < ε, as required.
b. Yes, trivially. If every subsequence converges to a, take the case of the original
sequence.
Now suppose a ∈ U ∩ V . Since U and V are open, there are r1 , r2 > 0 so that B(a, r1 ) ⊂ U and
B(a, r2 ) ⊂ V . If we set r = min(r1 , r2 ), it follows that B(a, r) ⊂ U ∩ V , so U ∩ V is open.
b. We can deduce these results from part a using DeMorgan’s laws: Rn − (C ∪ D) =
(Rn − C) ∩ (Rn − D) and Rn − (C ∩ D) = (Rn − C) ∪ (Rn − D). But here is a direct argument.
Suppose {xk } is a sequence in C ∩ D that converges to a for some a ∈ Rn . Since xk ∈ C and C
is closed, it follows that a ∈ C; since xk ∈ D and D is closed, it follows that a ∈ D. Therefore,
a ∈ C ∩ D, so C ∩ D is closed.
The argument for the union is slightly more subtle, and requires that we use the result of
Exercise 6. Suppose {xk } is a sequence in C ∪ D that converges to a for some a ∈ Rn . First, it
must be that xk ∈ C for infinitely many k or xk ∈ D for infinitely many k. For definiteness, let’s
say the former. Then we have a subsequence xkj ∈ C, which necessarily converges to a. Since C is
closed, we know that a ∈ C ⊂ C ∪ D, and so C ∪ D is closed.
2.2.9 a. Let S = (−1, 0) ∪ (0, 1) ⊂ R. Then the closure is S̄ = [−1, 1], and 0 is an interior point of S̄ that is not an element of S.
b. Let S = Q ⊂ R. Then F = Fr(Q) = R, and the set of frontier points of F is empty.
2.2.10 a. If Ik = [ak , bk ], let x = sup{ak }. The set of left-hand endpoints is bounded above
(e.g., by b1 ), and so the least upper bound exists. We have ak ≤ x for all k automatically. Now, if
x > bj for some j, then since Ik ⊂ Ij for all k > j, this means that bj is an upper bound of the set
as well, contradicting the fact that x is the least upper bound.
b. Take I_k = (0, 1/k). Then there is no point that lies in I_k for every k ∈ N.
    ‖x_k − x_ℓ‖ = ‖(x_k − a) + (a − x_ℓ)‖ ≤ ‖x_k − a‖ + ‖x_ℓ − a‖ < ε/2 + ε/2 = ε,
as required.
b. Suppose {x_k} is a Cauchy sequence and x_{k_j} → a. Suppose ε > 0 is given. Then, on one hand, we have J ∈ N so that, whenever j > J, we have ‖x_{k_j} − a‖ < ε/2. On the other hand, we have K ∈ N so that, whenever k, ℓ > K, we have ‖x_k − x_ℓ‖ < ε/2. Choose j₀ > J so that k_{j₀} > K (this is possible because k_j → ∞). Then, whenever k > K, we have
    ‖x_k − a‖ = ‖(x_k − x_{k_{j₀}}) + (x_{k_{j₀}} − a)‖ ≤ ‖x_k − x_{k_{j₀}}‖ + ‖x_{k_{j₀}} − a‖ < ε/2 + ε/2 = ε.
Thus, x_k → a.
2.2.13 Choose ε = 1. Then there is K ∈ N so that for all k, ℓ > K we have ‖x_k − x_ℓ‖ < 1. In particular, for all k > K we have ‖x_k − x_{K+1}‖ < 1, so ‖x_k‖ < 1 + ‖x_{K+1}‖. Therefore, for all j ∈ N, we have ‖x_j‖ ≤ max(‖x₁‖, ‖x₂‖, …, ‖x_K‖, ‖x_{K+1}‖ + 1).
2.3.1 Suppose we were to have two limits ℓ and m for f(x) as x approaches a. Then for any ε > 0 there would be δ₁, δ₂ > 0 so that ‖f(x) − ℓ‖ < ε whenever 0 < ‖x − a‖ < δ₁ and ‖f(x) − m‖ < ε whenever 0 < ‖x − a‖ < δ₂. Then, by the triangle inequality, when 0 < ‖x − a‖ < δ = min(δ₁, δ₂), we would have
    ‖ℓ − m‖ ≤ ‖ℓ − f(x)‖ + ‖f(x) − m‖ < 2ε.
Since ε > 0 is arbitrary, ℓ = m.
2.3.2 Let a ∈ Rⁿ be arbitrary. Given any ε > 0, note that if ‖x − a‖ < ε, then (by Exercise 1.2.17)
    |f(x) − f(a)| = |‖x‖ − ‖a‖| ≤ ‖x − a‖ < ε,
so f is continuous at a.
2.3.3 Because lim_{x→a} f(x) = ℓ = lim_{x→a} h(x), given any ε > 0, there are δ₁, δ₂ > 0 so that |f(x) − ℓ| < ε whenever 0 < ‖x − a‖ < δ₁ and |h(x) − ℓ| < ε whenever 0 < ‖x − a‖ < δ₂. Set δ = min(δ₁, δ₂). Then, whenever 0 < ‖x − a‖ < δ, we have
    ℓ − ε < f(x) ≤ g(x) ≤ h(x) < ℓ + ε,
and so |g(x) − ℓ| < ε, as required.
2.3.4 We copy the proof of the second part of Theorem 3.2 with only minor alterations. Given ε > 0, there are δ₁, δ₂ > 0 so that
    ‖f(x) − ℓ‖ < min( ε/(2(|c| + 1)), 1 ) whenever 0 < ‖x − a‖ < δ₁
and
    |k(x) − c| < ε/(2(‖ℓ‖ + 1)) whenever 0 < ‖x − a‖ < δ₂.
Note that when 0 < ‖x − a‖ < δ₁, we have (by the triangle inequality) ‖f(x)‖ < ‖ℓ‖ + 1. Now, let δ = min(δ₁, δ₂). Whenever 0 < ‖x − a‖ < δ, we have
    ‖k(x)f(x) − cℓ‖ = ‖(k(x) − c)f(x) + c(f(x) − ℓ)‖ ≤ |k(x) − c|‖f(x)‖ + |c|‖f(x) − ℓ‖
        < (‖ℓ‖ + 1)|k(x) − c| + |c|‖f(x) − ℓ‖ < (‖ℓ‖ + 1)·ε/(2(‖ℓ‖ + 1)) + |c|·ε/(2(|c| + 1)) < ε/2 + ε/2 = ε,
as required.
2.3.5 Suppose f is continuous at a and f (a) > 0. Then, given any ε > 0, there is δ > 0 so that
whenever x ∈ B(a, δ), we have |f (x) − f (a)| < ε. Choose ε = f (a). Then for every x ∈ B(a, δ), we
have 0 = f (a) − ε < f (x) < f (a) + ε, and we are done. Somewhat more generally, if f is continuous
at a and f(a) ≠ 0, then there is a neighborhood of a on which f is nowhere zero.
2.3.8 a. lim_{x→0} f(x) = 0, since the limit of the quotient is the quotient of the limits whenever the limit of the denominator is nonzero.
b. lim_{x→0} f(x) = 1, since lim_{u→0} (sin u)/u = 1.
c. We have f(x, y) = x + y whenever x ≠ y, so
    f(x, y) = { x + y, x ≠ y ; 0, x = y }.
Thus, lim_{x→0} f(x) = 0, since 0 ≤ |f(x)| ≤ |x + y| and x + y → 0 as x → 0.
d. lim_{x→0} f(x) = 1, since exp is continuous at 0.
2.3. LIMITS AND CONTINUITY 43
2.3.10 Throughout this problem we operate under the assumption that the sequences are known
to be convergent.
a. Here we have f(x) = √(2x), whose only fixed points are 0 and 2. Now we claim that x_k → 2, since the sequence is increasing (by induction, 1 ≤ x_k < 2, and so x_k < x_{k+1}).
b. Here we have f(x) = x/2 + 2/x, whose fixed points are ±2. Since x_k > 0 for all k, it follows that x_k → 2.
c. Set f(x) = 1 + 1/x. Then the fixed points of f are (1 ± √5)/2. Since x_k > 0 for all k, it follows that x_k → (1 + √5)/2 (the golden ratio).
d. Let f(x) = 1 + 1/(1 + x). Then f(x) = x ⟺ (x − 1)(x + 1) = 1 ⟺ x = ±√2. Since x_k > 0 for all k, we deduce that x_k → √2.
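The four limits can be confirmed numerically; assuming each sequence starts from x₀ = 1 (the starting values are not restated in this chunk), the iterations run as follows:

```python
import math

def iterate(f, x0=1.0, n=80):
    """Apply x_{k+1} = f(x_k) repeatedly, starting from x0."""
    x = x0
    for _ in range(n):
        x = f(x)
    return x

print(iterate(lambda x: math.sqrt(2*x)))     # ≈ 2 (part a)
print(iterate(lambda x: x/2 + 2/x))          # ≈ 2 (part b)
print(iterate(lambda x: 1 + 1/x))            # ≈ (1 + √5)/2 (part c)
print(iterate(lambda x: 1 + 1/(1 + x)))      # ≈ √2 (part d)
```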
2.3.11 Take f(x) = { x, x < 0 ; x + 1, x ≥ 0 }. Then for every c ∈ R, f⁻¹({c}) is either one point or the empty set.
2.3.12 This is false. Consider f : R → R, f (x) = x2 . Then the image of the open interval
(−1, 1) is the half-open interval [0, 1), which is not an open subset of R. Even easier, take f to be
any constant function.
2.3.13 We give two different proofs, the first using Proposition 3.4, the second using Proposition
3.6. Suppose C ⊂ Rm is closed and f : Rn → Rm is continuous.
First proof: By Proposition 2.1, Rᵐ − C is open, and so f⁻¹(Rᵐ − C) is open. But now we claim that f⁻¹(Rᵐ − C) = Rⁿ − f⁻¹(C), so it follows that f⁻¹(C) must be closed. To establish equality of the sets, note that x ∈ f⁻¹(Rᵐ − C) ⟺ f(x) ∈ Rᵐ − C ⟺ f(x) ∉ C ⟺ x ∉ f⁻¹(C).
Second proof : We wish to show that f −1 (C) is closed. Suppose {xk } is a convergent sequence
of points in f −1 (C). Now, xk ∈ f −1 (C) means that f (xk ) ∈ C. By continuity, if xk → a, then
f (xk ) → f (a). Since C is closed, the limit of the convergent sequence {f (xk )} must belong to C.
Therefore, f (a) ∈ C, which means, by definition, that a ∈ f −1 (C), as we needed to establish.
2.3.14 a. The determinant is a polynomial function of the entries of the square matrix, hence is continuous. Therefore, {A : det A ≠ 0} = det⁻¹(R − {0}) is the preimage of an open set and is therefore open.
b. This is an immediate consequence of Corollary 3.7, since f : Mn×n → Mn×n , f (A) =
AAT , is continuous.
2.3.17 The answer is that we must have α/γ + β/δ > 1. Here is the most elegant way we know to show this. Using a "weighted" version of polar coordinates (see Example 6 in Section 1), and working only with x ≥ 0, y ≥ 0, we take x^(γ/2) = r cos θ and y^(δ/2) = r sin θ. Then
    x^α y^β/(x^γ + y^δ) = (1/r²) r^(2α/γ) r^(2β/δ) (cos θ)^(2α/γ) (sin θ)^(2β/δ) ≤ r^(2(α/γ + β/δ − 1)).
Thus, we see that f(x) → 0 as x → 0 whenever α/γ + β/δ > 1. On the other hand, when α/γ + β/δ = 1, we choose θ = π/4 and get that f(x, y) = 1/2 when x^γ = y^δ. And when α/γ + β/δ < 1, approaching along the same curve results in arbitrarily large values of f.
c. ∂f/∂x = −y/(x² + y²), ∂f/∂y = x/(x² + y²).
d. ∂f/∂x = −2x e^(−(x²+y²)), ∂f/∂y = −2y e^(−(x²+y²)).
e. ∂f/∂x = log x + 1 + y²/x, ∂f/∂y = 2y log x.
f. ∂f/∂x = y e^(xy) z² − y sin(πyz), ∂f/∂y = x e^(xy) z² − x sin(πyz) − πxyz cos(πyz), ∂f/∂z = 2 e^(xy) z − πxy² cos(πyz).
3.1.2 a. D_v f(a) = lim_{t→0} [f(a + tv) − f(a)]/t = lim_{t→0} [(2 + t)² + (2 + t)(1 − t) − 6]/t = lim_{t→0} 3t/t = 3.
3.1.3 a. Let ϕ(t) = f(a + tv) = (2 + tv₁)² + (2 + tv₁)(1 + tv₂). Then D_v f(a) = ϕ′(0) = 5v₁ + 2v₂. Note that 5v₁ + 2v₂ = (5, 2)·(v₁, v₂) is largest when v = (1/√29)(5, 2).
b. Let ϕ(t) = (1 + tv₂)e^(−tv₁). Then D_v f(a) = ϕ′(0) = −v₁ + v₂ = (−1, 1)·(v₁, v₂), which is largest when v = (1/√2)(−1, 1).
3.1. PARTIAL DERIVATIVES AND DIRECTIONAL DERIVATIVES 47
c. Let ϕ(t) = 1/(1 + tv₁) + 1/(−1 + tv₂) + 1/(1 + tv₃). Then D_v f(a) = ϕ′(0) = −v₁ − v₂ − v₃ = (−1, −1, −1)·(v₁, v₂, v₃), which is largest when v = −(1/√3)(1, 1, 1).
3.1.5 a. Since D−v f (a) = −Dv f (a), we cannot have Dv f (a) > 0 for all nonzero v.
b. Let b ≠ 0 and f(x) = b·x. Then D_b f(a) = ‖b‖² > 0 for all a.
3.1.6 We have p = f(V, T) = nRT/V, V = g(T, p) = nRT/p, and T = h(p, V) = pV/(nR), so ∂f/∂V = −nRT/V², ∂g/∂T = nR/p, and ∂h/∂p = V/(nR). Thus,
    (∂f/∂V)·(∂g/∂T)·(∂h/∂p) = −(nRT/V²)·(nR/p)·(V/(nR)) = −nRT/(pV) = −1.
(See Exercise 6.2.8 for the general result.)
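The triple-product identity can be verified symbolically (sympy assumed available); the state equation pV = nRT is imposed at the end by substitution:

```python
import sympy as sp

p, V, T, n, R = sp.symbols('p V T n R', positive=True)

f = n*R*T/V          # p as a function of (V, T)
g = n*R*T/p          # V as a function of (T, p)
h = p*V/(n*R)        # T as a function of (p, V)

product = sp.diff(f, V) * sp.diff(g, T) * sp.diff(h, p)
# Impose the state equation by substituting p = nRT/V:
print(sp.simplify(product.subs(p, n*R*T/V)))  # -1
```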
3.1.10 a. Note first that D_{e₁}f(0) = 0, since f is identically 0 on the x-axis. If v₂ ≠ 0,
    lim_{t→0} [f(tv) − f(0)]/t = lim_{t→0} (1/t) · t³v₁²v₂/(t⁴v₁⁴ + t²v₂²) = lim_{t→0} v₁²v₂/(v₂² + t²v₁⁴) = v₁²/v₂.
b. f(x, y) = x³y/(x⁶ + y²) for x ≠ 0, f(0) = 0. A similar calculation to that in part a shows that D_v f(0) = 0 for all v. And, approaching along y = x³, we see that f is discontinuous.
48 3. THE DERIVATIVE
3.2. Differentiability
3.2.1 Note that every function here is C¹, so we know it is differentiable and can apply Proposition 2.1.
a. [Df(a)] = [2e⁻² −e⁻²], so the equation of the tangent plane is
    z = e⁻² + e⁻²(2(x + 1) − (y − 2)) = e⁻²(2x − y + 5).
b. [Df(a)] = [−2 4], so the equation of the tangent plane is
    z = 5 + (−2)(x + 1) + 4(y − 2) = −2x + 4y − 5.
c. [Df(a)] = [3/5 4/5], so the equation of the tangent plane is
    z = 5 + (3/5)(x − 3) + (4/5)(y − 4) = (3/5)x + (4/5)y.
d. [Df(a)] = [−1/√2 −1/√2], so the equation of the tangent plane is
    z = √2 + (−1/√2)(x − 1) + (−1/√2)(y − 1) = (1/√2)(−x − y + 4).
e. [Df(a)] = [6 3 2], so the equation of the tangent plane is
    w = 6 + 6(x − 1) + 3(y − 2) + 2(z − 3) = 6x + 3y + 2z − 12.
f. [Df(a)] = [−1 1 1], so the equation of the tangent plane is
    w = 1 + (−1)(x − 1) + 1(y − 0) + 1(z + 1) = −x + y + z + 3.
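The individual f's for parts a–f appear in the exercise statement rather than in this chunk, so here is the same tangent-plane recipe run on a hypothetical example, f(x, y) = x²y at a = (1, 2):

```python
import sympy as sp

x, y = sp.symbols('x y')

f = x**2 * y                 # hypothetical example, not one of parts a-f
a, b = 1, 2

fa = f.subs({x: a, y: b})                 # f(a) = 2
fx = sp.diff(f, x).subs({x: a, y: b})     # 4
fy = sp.diff(f, y).subs({x: a, y: b})     # 1
tangent = fa + fx*(x - a) + fy*(y - b)
print(sp.expand(tangent))                 # 4*x + y - 4
```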
3.2.2 Since every function here is C¹, it is differentiable, and we can apply Proposition 2.3.
a. [Df(a)] = [1/√2 −1/√2], so D_v f(a) = Df(a)v = 0.
b. [Df(a)] = [1/√2 −1/√2], so D_v f(a) = Df(a)v = √2.
c. [Df(a)] = [1 6], so D_v f(a) = Df(a)v = 13.
d. [Df(a)] = [4 2], so D_v f(a) = Df(a)v = 2√5.
3.2. DIFFERENTIABILITY 49
e. [Df(a)] = [2/√5 1/√5], so D_v f(a) = Df(a)v = 1.
f. [Df(a)] = [e −e −e], so D_v f(a) = Df(a)v = −e.
" #
y x
3.2.3 a.
2x 2y
− sin t
b. cos t
et
" #
cos t −s sin t
c.
sin t s cos t
" #
yz xz xy
d.
1 1 2z
cos y −x sin y
e. sin y x cos y
0 1
z² = x² + y² and cz = ax + by.
By algebra, we obtain (ax + by)² = c²(x² + y²), and so (bx − ay)² = 0. This means that we have y = bx/a and z = cx/a, and so the intersection is the line through the origin with direction vector (a, b, c). (This is, of course, the generator of the cone passing through a.)
3.2.9 The tangent plane of the surface z = xy at a = (a, b, c) is given by z − c = b(x − a) + a(y − b), or, simplifying, z = bx + ay − ab. To find the intersection of this plane and the original surface, we solve the system of equations
    z = xy and z = bx + ay − ab.
b. We establish that the linear map T = [b² 2ab] is the derivative of f at a = (a, b):
as required.
3.2.12 Since f = 0 on the coordinate axes, it is clear that ∂f/∂x(0) = ∂f/∂y(0) = 0. On the other hand,
    ∂f/∂x = 2xy(y² − x⁴)/(x⁴ + y²)² and ∂f/∂y = x²(x⁴ − y²)/(x⁴ + y²)²,
from which we see that (∂f/∂x)(x, x) → 2 as x → 0 and (∂f/∂y)(x, 0) → ∞ as x → 0. Either of these is sufficient to establish that f cannot be C¹ at 0.
3.2.13 Throughout this exercise, we identify n × n matrices with vectors in R^(n²). We have
    lim_{B→O} [(A + B)² − A² − (AB + BA)]/‖B‖ = lim_{B→O} B²/‖B‖ = O,
since ‖B²‖ ≤ ‖B‖². The latter statement follows from the Cauchy–Schwarz inequality, quite like Exercise 2.3.7: By the definition of matrix product, ‖B²‖² = Σ_{j=1}^n ‖Bb_j‖² ≤ Σ_{j=1}^n ‖B‖²‖b_j‖² = ‖B‖⁴.
Similarly, in the case of the second function, we have
    lim_{B→O} [(A + B)ᵀ(A + B) − AᵀA − (AᵀB + BᵀA)]/‖B‖ = lim_{B→O} BᵀB/‖B‖ = O.
In this case, we have ‖BᵀB‖² = Σ_{i,j} (b_i·b_j)² ≤ Σ_{i,j} ‖b_i‖²‖b_j‖² = ‖B‖⁴.
3.2.14 a. Of course, it is clear that f is C¹, hence differentiable. But, arguing directly, we have
    lim_{h→0} [f(a + h) − f(a) − (Aa·h + Ah·a)]/‖h‖ = lim_{h→0} (h·Ah)/‖h‖ = 0,
since |h·Ah| ≤ ‖h‖‖Ah‖.
3.2.15 Since f is differentiable at a, by Proposition 2.3 it suffices to check that Dv f (a) = 0 for
every v. This is a standard result from single-variable calculus: Letting ϕ(t) = f (a + tv), we are
told that 0 is a local maximum point of ϕ, and so ϕ′ (0) = Dv f (a) = 0. (See the proof of Lemma
2.1 of Chapter 5.)
3.2.16 Choose x = (x, y) ∈ B(a, δ), and write a = (a, b). Then
    f(x) − f(a) = [f(x, b) − f(a, b)] + [f(x, y) − f(x, b)] = (∂f/∂x)(ξ, b)(x − a) + (∂f/∂y)(x, η)(y − b)
for some ξ between a and x and some η between b and y, by the Mean Value Theorem. Since Df = O on B(a, δ), it follows that ∂f/∂x = ∂f/∂y = 0 on B(a, δ), and so the right-hand side vanishes. Therefore f(x) − f(a) = 0 for all x ∈ B(a, δ), as required.
3.2.19 a. See Exercise 3.1.10, part a. f is discontinuous at 0 and therefore cannot be differen-
tiable at 0. (See Proposition 2.2.)
b. See Exercise 3.1.10, part b.
c. Take f(x, y) = { (y/|x|)√(x² + y²), x ≠ 0 ; 0, x = 0 }. Then evidently D_{e₂}f(0) = 0. For any v with v₁ ≠ 0, we have
    D_v f(0) = lim_{t→0} (1/t) · (tv₂/(|t||v₁|)) · |t|‖v‖ = (v₂/|v₁|)‖v‖.
On the other hand, we see that f(y⁴, y) → ∞ as y → 0.
d. Here are two such functions:
    f(x, y) = { y⁴/(x − y⁴)², x > y⁴ ; 0, x ≤ y⁴ }   and   g(x, y) = { x⁶y²/((x¹² + y⁴)(x² + y²)), x ≠ 0 ; 0, x = 0 }.
3.3. DIFFERENTIATION RULES 53
and
    D(g∘f)(0) = Dg(f(0)) Df(0) = [ 3 1 −1 ; 1 0 1 ] [ −1 2 ; 1 3 ; 0 0 ] = [ −2 9 ; −1 2 ].
We conclude that f ◦ g is a constant function. That is, the parametrized curve g lies on the sphere
x2 + y 2 + z 2 + 2x = 3.
3.3.4 a. Let f: R³ → R be given by f(x) = ‖x‖. Then ∂f/∂x_i = x_i/‖x‖, so Df(x) = (1/‖x‖)[x y z]. We have
    (f∘g)′(2π) = Df(g(2π)) g′(2π) = (1/√(9 + 100π²)) [3 0 10π]·(0, 3, 5) = 50π/√(9 + 100π²).
b. Now we have Df(x) = [y x 2z], so
    (f∘g)′(3π/4) = Df(g(3π/4)) g′(3π/4) = [3/√2 −3/√2 15π/2]·(−3/√2, −3/√2, 5) = 75π/2.
3.3.5 Using a coordinate system (in miles) centered at the radar tower, with x > 0 to the east, y > 0 to the north, and z > 0 upwards, let f(x) = ‖x‖ denote the distance from x to the tower. Denote by g(t) the location of the plane at time t; suppose that g(t₀) = (−3, 0, 4).
a. We are told that g′(t₀) = (450, 0, 5). Then, since Df(x) = (1/‖x‖) xᵀ, we have
    (f∘g)′(t₀) = (1/5)[−3 0 4]·(450, 0, 5) = −266.
That is, the plane is approaching the tower at the rate of 266 mph.
b. Now we are told that g′(t₀) = (225√2, 225√2, 5), so
    (f∘g)′(t₀) = (1/5)[−3 0 4]·(225√2, 225√2, 5) = −135√2 + 4 ≈ −186.9.
That is, the plane is approaching the tower at the rate of approximately 186.9 mph.
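Both rates follow from (f∘g)′(t₀) = (g(t₀)/‖g(t₀)‖)·g′(t₀); a quick numeric check:

```python
import math

def radial_rate(pos, vel):
    """Rate of change of the distance to the origin: (pos/|pos|) . vel."""
    r = math.sqrt(sum(c*c for c in pos))
    return sum(p*v for p, v in zip(pos, vel)) / r

pos = (-3.0, 0.0, 4.0)
print(radial_rate(pos, (450.0, 0.0, 5.0)))                           # -266.0 (part a)
print(radial_rate(pos, (225*math.sqrt(2), 225*math.sqrt(2), 5.0)))   # ≈ -186.9 (part b)
```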
" # " #
V T V (t) 10
3.3.6 We have p = f = . Let g(t) = . We are given g(t0 ) = and
T V T (t) 300
" #
1
g′ (t0 ) = . (V is measured in liters (l), T is measured in ◦ K, p in atmospheres (atm), and
5
h i
V
time t is measured in minutes.) Then we have Df = −T /V 2 1/V . So, by the chain rule,
T
" #
h i 1
(f ◦ g)′ (t0 ) = Df (g(t0 ))g′ (t0 ) = −3 1/10 = −2.5 atm/min.
5
h " #
V V V
i V (t)
3.3.7 We have I = f = . Then Df = 1/R −V /R2 . Assuming g(t) =
R R R R(t)
" # " #
10 −0.1
is differentiable, g(t0 ) = and g′ (t0 ) = , then by the chain rule, I ′ (t0 ) = (f ◦ g)′ (t0 ) =
100 0.5
" #
h i −0.1
′
Df (g(t0 ))g (t0 ) = 0.01 −0.001 = −0.0015 amp/sec.
0.5
3.3.8 We know from single-variable calculus that for the function r(x) = 1/x, r′(x) = −1/x². Then, by the chain rule, D(r∘g)(a) = r′(g(a))Dg(a) = −(1/g(a)²) Dg(a), as required.
3.3.9 We have
    (f·g)(a + h) − (f·g)(a) − [Df(a)h·g(a) + f(a)·Dg(a)h]
    = f(a + h)·g(a + h) − f(a)·g(a) − Df(a)h·g(a) − f(a)·Dg(a)h
    = [f(a + h)·g(a + h) − f(a + h)·g(a)] + [f(a + h)·g(a) − f(a)·g(a)] − Df(a)h·g(a) − f(a)·Dg(a)h
    = f(a + h)·[g(a + h) − g(a) − Dg(a)h] + [f(a + h) − f(a) − Df(a)h]·g(a) + [f(a + h) − f(a)]·Dg(a)h,
and, upon dividing by ‖h‖, all three terms go to 0 as h → 0: the first, by differentiability of g; the second, by differentiability of f; and the last, by continuity of f.
3.3.10 Perhaps the easiest proof comes from writing everything out in coordinates and using the product rule for scalar-valued functions. But we can copy the proof given in Exercise 9, substituting cross product for dot product (and being careful about order). Note that it follows from Proposition 5.1 of Chapter 1 that ‖x × y‖ ≤ ‖x‖‖y‖. We have
    (f × g)(a + h) − (f × g)(a) − [Df(a)h × g(a) + f(a) × Dg(a)h]
    = f(a + h) × g(a + h) − f(a) × g(a) − Df(a)h × g(a) − f(a) × Dg(a)h
    = [f(a + h) × g(a + h) − f(a + h) × g(a)] + [f(a + h) × g(a) − f(a) × g(a)] − Df(a)h × g(a) − f(a) × Dg(a)h
    = f(a + h) × [g(a + h) − g(a) − Dg(a)h] + [f(a + h) − f(a) − Df(a)h] × g(a) + [f(a + h) − f(a)] × Dg(a)h,
and, as in Exercise 9, upon dividing by ‖h‖ each term goes to 0 as h → 0.
3.3.11 As the hint suggests, we fix x 6= 0 and consider the function h(t) = t−k f (tx). Assume
first that f is homogeneous of degree k. Then h is a constant function, and so, by the product rule
and chain rule, we have 0 = h′ (t) = −kt−k−1 f (tx) + t−k Df (tx)x. In particular, setting t = 1, we
obtain Df (x)x = kf (x), as required.
Conversely, suppose Df(x)x = kf(x) for all nonzero x. Then it follows that for any t > 0, we have t Df(tx)x = Df(tx)(tx) = kf(tx). Thus,
    h′(t) = −kt^(−k−1) f(tx) + t^(−k) Df(tx)x = t^(−k−1)(−kf(tx) + kf(tx)) = 0,
and so h(t) = h(1) = f(x) for all t. Therefore, f(tx) = tᵏf(x) for all t > 0.
3.3.12 Recall that it is a consequence of the Mean Value Theorem that a continuous function
on a closed interval with zero derivative on that interval is a constant function. Fix a ∈ U and
let b be arbitrary. Let g : [0, 1] → Rn be given by g(t) = a + t(b − a), so g parametrizes the line
segment from a to b. Then (f ◦ g)′ (t) = Df (g(t))g′ (t) = 0 for all t, inasmuch as Df (x) = O for all
x ∈ U . Thus, fi ◦ g is a constant function for each i = 1, . . . , m, and so f ◦ g is a constant function.
That is, f (b) = f (a); since b is arbitrary, f (b) = f (a) for all b ∈ U , so f is a constant function.
The same proof shows that the result holds whenever we can join b to a by any differentiable
path g, and, therefore, by extension, by any piecewise-differentiable path.
3.3.13 Let g: R² − {0} → R be given by g(x) = ‖x‖. Then h = f∘g and
    Dh(x, y) = f′(r)[x/r y/r].
Therefore, x ∂h/∂x + y ∂h/∂y = (x² + y²)f′(r)/r = r f′(r).
3.3.14 Define f: R² → R by f(u, v) = ∫ᵤᵛ h(s) ds and g: (a, b) → R² by g(t) = (u(t), v(t)). The Fundamental Theorem of Calculus tells us that, since h is continuous, f is C¹ and hence differentiable, with ∂f/∂u = −h(u) and ∂f/∂v = h(v). Then F = f∘g is differentiable, and we have
    F′(t) = Df(g(t)) g′(t) = [−h(u(t)) h(v(t))]·(u′(t), v′(t)) = −h(u(t))u′(t) + h(v(t))v′(t).
" #
u u+v
3.3.15 Letting g = , we have F = f ◦ g, and
v u−v
" #
u ∂f u ∂f u 1 1
DF = g g
v ∂x v ∂y v 1 −1
∂f u ∂f u ∂f u ∂f u
= g + g g − g .
∂x v ∂y v ∂x v ∂y v
3.4. THE GRADIENT 57
Thus,
! !
∂F ∂F u ∂f ∂f u ∂f u ∂f u
= g + g g − g
∂u ∂v v∂x ∂y v ∂x v ∂y v
2 2
∂f u ∂f u
= g − g .
∂x v ∂y v
" #
r r cos θ
3.3.16 Letting g = , we have F = f ◦ g and
θ r sin θ
" #
r ∂fu ∂f u cos θ −r sin θ
DF = g g
θ ∂x v ∂y v sin θ r cos θ
∂f r cos θ ∂f r cos θ ∂f r cos θ ∂f r cos θ
= cos θ + sin θ −r sin θ + r cos θ ,
∂x r sin θ ∂y r sin θ ∂x r sin θ ∂y r sin θ
so
2 2 !2
∂F 1 ∂F ∂f r cos θ ∂f r cos θ
+ 2 = cos θ + sin θ
∂r r ∂θ ∂x r sin θ ∂y r sin θ
!2
∂f r cos θ1 ∂f r cos θ
−r + 2 sin θ + r cos θ
∂x r sin θ
r ∂y r sin θ
2 2
∂f r cos θ ∂f r cos θ
= + .
∂x r sin θ ∂y r sin θ
" # " #
x u u
3.3.17 As the hint suggests, let =g = . Letting F = f ◦ g, we find that
t v (v − u)/c
∂F
= 0, which means that F is independent of u and therefore a function just of v. That is,
∂u
u x
F = h(v) for some function h. But v = x + ct, so f = h(x + ct) for some (differentiable)
v t
function h.
3.4.1 We use the fact that ∇f(a) is the normal to the tangent line at a of the level curve of f passing through a.
a. ∇f(x, y) = (3x², 3y²), so ∇f(a) = 3(1, 4). Therefore, the tangent line is given by
    0 = (1, 4)·(x − 1, y − 2) = (x − 1) + 4(y − 2), i.e., x + 4y = 9.
b. ∇f(x, y) = (3y² + y e^(xy), 6xy + x e^(xy) − π cos(πy)), so ∇f(a) = (4, π). Therefore, the tangent line is given by
    0 = (4, π)·(x, y − 1) = 4x + π(y − 1), i.e., 4x + πy = π.
c. ∇f(x, y) = (3x² + y², 2xy − 4y³), so ∇f(a) = (4, 2). Therefore, the tangent line is given by
    0 = (4, 2)·(x − 1, y + 1) = 4(x − 1) + 2(y + 1), i.e., 2x + y = 1.
3.4.2 We use the fact that ∇f(a) is the normal to the tangent plane at a of the level surface of f passing through a.
a. ∇f = (2x, 2y, 2z), so ∇f(a) = 2(1, 0, 2). Therefore, the tangent plane is given by
    0 = (1, 0, 2)·(x − 1, y, z − 2) = (x − 1) + 2(z − 2), i.e., x + 2z = 5.
b. ∇f = (2y e^(xy) z⁵, z² + 2x e^(xy) z⁵, 2yz + 10 e^(xy) z⁴), so ∇f(a) = (4, 1, 14). Therefore, the tangent plane is given by
    0 = (4, 1, 14)·(x, y − 2, z − 1) = 4x + (y − 2) + 14(z − 1), i.e., 4x + y + 14z = 16.
c. ∇f = (3x² + z², 2yz + 3y², 2xz + y²), so ∇f(a) = (3, 3, 1). Therefore, the tangent plane is given by
    0 = (3, 3, 1)·(x + 1, y − 1, z) = 3(x + 1) + 3(y − 1) + z, i.e., 3x + 3y + z = 0.
d. ∇f = (2e^(2x+z) cos(3y) − y, −3e^(2x+z) sin(3y) − x, e^(2x+z) cos(3y) + 1), so ∇f(a) = (2, 1, 2). Therefore, the tangent plane is given by
    0 = (2, 1, 2)·(x + 1, y, z − 2) = 2(x + 1) + y + 2(z − 2), i.e., 2x + y + 2z = 2.
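For part a, where the level surface is f(x, y, z) = x² + y² + z² (through a = (1, 0, 2)), the computation can be replayed in sympy:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')

f = x**2 + y**2 + z**2
a = {x: 1, y: 0, z: 2}

grad = [sp.diff(f, v).subs(a) for v in (x, y, z)]
print(grad)                    # [2, 0, 4]

plane = sum(g*(v - a[v]) for g, v in zip(grad, (x, y, z)))
print(sp.expand(plane))        # 2*x + 4*z - 10, i.e. x + 2z = 5
```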
3.4.4 a. We know that moving from the point a ∈ R² in the direction of v = ∇f(a) will result in the greatest rate of increase of f, hence in the steepest ascent up the hillside. That rate will be D_v f(a) = ∇f(a)·v = ‖∇f(a)‖² = 25. This gives the "rise" corresponding to the "run" ∇f(a), and so a vector of steepest ascent up the hillside is (3, −4, 25).
b. If the stream flows in the e2 direction, the rate at which its elevation changes is
De2 f (a) = −4, so the stream bed makes an angle of − arctan 4 with the horizontal (“rise/run” =
−4/1).
3.4.5 Proceeding as in Example 3, let f₁(x) = ‖x − a‖, f₂(x) = ‖x − b‖, f = f₁ + f₂, and let the ladybug's position as a function of time be given by g(t). At time t = t₀, we have g(t₀) = x₀ and g′(t₀) = v. Then, by the chain rule, we have
    (f∘g)′(t₀) = Df(x₀)g′(t₀) = ∇f(x₀)·v = ((x₀ − a)/‖x₀ − a‖ + (x₀ − b)/‖x₀ − b‖)·v = −2‖v‖ cos(π/4) = −5√2.
Thus, the sum of the ladybug's distances from a and b decreases at a rate of 5√2 units/sec.
3.4.6 Since (f ◦ g)(t) = c for all t ∈ (−ε, ε), we have (f ◦ g)′ (t) = 0 for all t ∈ (−ε, ε). In
particular, 0 = (f ◦ g)′ (0) = ∇f (a) · g′ (0). Since g′ (0) is the direction vector for the tangent line to
C at a, the conclusion follows. (That C can be so parametrized, with g′ (0) 6= 0, is a consequence
of the implicit function theorem. See Section 5 of Chapter 4 and Sections 2 and 3 of Chapter 6.)
3.4.7 Let P = (x, y), F₁ = (−c, 0), and F₂ = (c, 0). We have
    ‖F₁P‖ + ‖F₂P‖ = √((x + c)² + y²) + √((x − c)² + y²) = 2a
⟺ √((x + c)² + y²) = 2a − √((x − c)² + y²)
⟺ (x + c)² + y² = 4a² − 4a√((x − c)² + y²) + (x − c)² + y²
⟺ 4a√((x − c)² + y²) = 4a² − 4cx
⟺ a²((x − c)² + y²) = (a² − cx)²    (using cx ≤ ca < a² for the upward implication)
⟺ a²(x² − 2cx + c² + y²) = a⁴ − 2a²cx + c²x²
⟺ (a² − c²)x² + a²y² = a²(a² − c²)
⟺ x²/a² + y²/b² = 1, where b² = a² − c².
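The only step above that is not a pure expansion is the squaring; the rearrangement after it can be checked symbolically:

```python
import sympy as sp

x, y, a, c = sp.symbols('x y a c')

# a^2((x-c)^2 + y^2) = (a^2 - c*x)^2 rearranges to
# (a^2 - c^2) x^2 + a^2 y^2 = a^2 (a^2 - c^2):
lhs = a**2*((x - c)**2 + y**2) - (a**2 - c*x)**2
rhs = (a**2 - c**2)*x**2 + a**2*y**2 - a**2*(a**2 - c**2)
print(sp.expand(lhs - rhs))  # 0
```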
3.4.8 A parabola with focus F and directrix ℓ is the locus of points P so that ‖FP‖ is equal to the distance from P to ℓ. For concreteness, let F = (0, c) and ℓ = {x ∈ R² : x₂ = −c}. Then the parabola is a level set of the function f(x) = ‖Fx‖ − x₂ (where Fx denotes the vector from F to x). If T is the unit tangent vector to the parabola at x, then we have
    0 = (Fx/‖Fx‖ − e₂)·T = (Fx/‖Fx‖)·T − e₂·T = cos α − cos β,
where α is the angle between T and Fx and β is the angle between T and the vertical. This means that the light ray emanating from the focus and the vertical make equal angles with the parabola, as required.
3.4.9 The crucial fact that's needed here is the following: If P is external to a circle, the two line segments from P tangent to the circle have equal length. The plane containing F₁, P, and Q₁ intersects the smaller sphere in a circle, and we observe that the line segments F₁P and Q₁P are both tangent to that circle, the former because the sphere is tangent to the shaded plane, the second because it is tangent to the cone at Q₁. Thus, ‖F₁P‖ = ‖Q₁P‖. Similarly, using the larger inscribed sphere, we obtain ‖F₂P‖ = ‖Q₂P‖. Therefore,
    ‖F₁P‖ + ‖F₂P‖ = ‖Q₁P‖ + ‖Q₂P‖ = const,
inasmuch as the distance along generators from one horizontal slice of the cone to another is the same for all the generators.
3.4.10 a. We are told that ∇f is everywhere a scalar multiple of the vector (2, 1). Thus, the level curves must be lines orthogonal to that vector, i.e., lines of the form 2x + y = c, c ∈ R. To verify that our statement is correct, we check directly that f is constant along any such line. Choose a parametrization g of any such line, e.g., g(t) = (t, c − 2t). Then
    (f∘g)′(t) = ∇f(g(t))·g′(t) = (∂f/∂x)(g(t)) − 2(∂f/∂y)(g(t)) = 0,
as required.
b. Set F(s) = f(0, s). Then F is differentiable and f(x, y) = f(0, 2x + y) = F(2x + y).
" #
x −y
3.4.11 a. We are told that ∇f · = 0 everywhere. Since ∇f is orthogonal to the
y x
x −y
level curves of f , it follows that the level curve through must be tangent to . It is not
y x
difficult to see that that level curve must be a circle centered at the origin. Formally, if the level
curve is (locally) the graph y = g(x), then g ′ (x) = −x/g(x), so x + g(x)g ′ (x) = 0, from which we
obtain x2 + g(x)2 = const.
a cos t
To verify that this is correct, for any constant a, we differentiate ϕ(t) = f using the
a sin t
chain rule:
" #
′ −a
a cos t sin t ∂f a cos t ∂f a cos t
ϕ (t) = ∇f · = −a sin t + a cos t = 0.
a sin t
a cos t ∂x a sin t ∂y a sin t
p
s x x2 + y 2 p
b. For s > 0, set F (s) = f . Then f = f = F ( x2 + y 2 ), as
0 y 0
required.
b. The tangent planes are orthogonal at x precisely when A₁·A₂ = 0, so 2x² + 2y² − z = 0. Solving this simultaneously with the other two equations, we find that x² + y² = c, so z = 2c and 4c² + c − 1 = 0. Thus, c = (−1 ± √17)/8. However, in the case c = −(1 + √17)/8 < −5/8, we have z < −5/4, and there is no such point on S₁. Thus, only in the case c = (√17 − 1)/8 do the surfaces intersect orthogonally.
3.4.13 Let g be a parametrization of the ellipse, and let F_ig denote the vector from the focus F_i to g(t). Note that since n is the unit normal, (n∘g)′ is orthogonal to n, hence tangential. Note that by the product rule, we have (F₁g·(n∘g))′ = g′·(n∘g) + F₁g·(n∘g)′ = F₁g·(n∘g)′. Similarly, (F₂g·(n∘g))′ = F₂g·(n∘g)′. Differentiating, and letting α denote the angle between F_ig and n, we have
    [(F₁g·(n∘g))(F₂g·(n∘g))]′ = (F₁g·(n∘g)′)(F₂g·(n∘g)) + (F₁g·(n∘g))(F₂g·(n∘g)′)
    = ‖(n∘g)′‖‖F₁g‖‖F₂g‖((−sin α)(cos α) + (cos α)(sin α)) = 0.
Therefore, (F₁g·(n∘g))(F₂g·(n∘g)) is constant, as required.
An alternative solution is to write it out in coordinates, using the equation in Exercise 7. Since the ellipse is the level curve f(x, y) = x²/a² + y²/b² = 1, its (non-unit) normal is given by (1/2)∇f = (x/a², y/b²). Then it suffices to prove that the corresponding product of dot products is constant, as required.
3.4.14 At each point the stream flows in the direction of steepest descent, so on the map (projecting onto the xy-plane), it is following
    −∇h = (80/(4 + x² + 3y²)²)(x, 3y).
If the route of the stream is given by y = f(x), then we have f′(x) = 3f(x)/x (or dy/dx = 3y/x). Separating variables and integrating, we obtain
    f′(x)/f(x) = 3/x ⟹ log f(x) = 3 log x + c ⟹ f(x) = Cx³ for some constant C.
Since f(1) = 1, we must have C = 1, and the stream follows the path y = x³ from (1, 1) "outwards."
3.5. CURVES 63
3.4.15 At each point, the water follows the path of steepest descent, so its path on the map (projecting onto the xy-plane) must be orthogonal to the level curves of the height function. From the equation of the football, we infer that these are the level curves of the function h(x, y) = 9 − (4x² + y²), and so the (projection of the) water drop follows −(1/2)∇h = (4x, y). If the path is given by y = f(x), then we have f′(x) = f(x)/(4x). Separating variables and integrating, we obtain
    f′(x)/f(x) = 1/(4x) ⟹ log f(x) = (1/4) log x + c ⟹ f(x) = Cx^(1/4) for some constant C.
Since f(1) = 1, we must have C = 1, and the water drop follows the path x = y⁴ from (1, 1) "outwards" to approximately (1.398, 1.087). Along the surface of the football, the actual water drop takes the path
    g(t) = (t⁴, t, (1/2)√(9 − 4t⁸ − t²))
from (1, 1, 1) to approximately (1.398, 1.087, 0).
3.5. Curves
3.5.1 If g·g′ = 0, then (g·g)′ = 0, and so ‖g‖² is constant. Thus, g lies on a sphere centered at the origin.
3.5.2 ‖g′‖ = const ⟺ ‖g′‖² = (const)² ⟺ (‖g′‖²)′ = 2g′·g″ = 0 ⟺ the velocity and acceleration vectors are always orthogonal.
3.5.3 The result is immediate from the product rule. The geometric interpretation when
kf k = kgk = 1 is very simple: In order to maintain a constant angle (f · g = cos θ, where θ is the
angle between f and g), as f turns towards g, g must simultaneously turn away from f at the same
rate.
3.5.4 There are two cases. If the force field is everywhere zero, then the particle either is stationary (if its speed is 0) or moves in a line. If the force is nonzero, then we know from Proposition 5.1 that the particle moves in a plane. From Exercise 2 it follows that its velocity and acceleration vectors are always orthogonal. Since the force field is central, the acceleration vector is parallel to the position vector, so the particle's velocity and position vectors are always orthogonal, and so it follows from Exercise 1 that the particle moves on a sphere centered at the origin. We've already established that its trajectory is planar; thus, it moves in a circle centered at the origin.
3.5.5 Intuitively, if a particle’s velocity is only in the direction of its position vector, then
its direction cannot change, and its position vector always points in the same direction. More
precisely, following the hint, we have g = kgkh, and so λg = g′ = kgk′ h + kgkh′ . Therefore,
h′ = (λ − kgk′ /kgk)h. But, inasmuch as h has constant length 1, by Proposition 5.2, we know that
h′ must be orthogonal to h. Therefore h′ = 0, as required.
3.5.6 t0 is a global minimum of the function f : (a, b) → R, f (t) = kg(t) − pk2 . Thus,
0 = f ′ (t0 ) = 2(g(t0 ) − p) · g′ (t0 ). (For an intuitive explanation, this is the Pythagorean Theorem
in action. If g(t0 ) is the point on the curve closest to p, then as we move away we approximate
g(t) − p as the hypotenuse of a triangle with legs g(t0 ) − p and (t − t0 )g′ (t0 ).)
3.5.7 a. We have $\mathbf g'(t) = \begin{pmatrix} e^t(\cos t - \sin t) \\ e^t(\sin t + \cos t) \\ e^t \end{pmatrix} = e^t\begin{pmatrix} \cos t - \sin t \\ \sin t + \cos t \\ 1 \end{pmatrix}$. Thus,
\[ \|\mathbf g'(t)\| = e^t\sqrt{(\cos t - \sin t)^2 + (\sin t + \cos t)^2 + 1} = e^t\sqrt3, \]
and the arclength of the curve is $\displaystyle\int_a^b \|\mathbf g'(t)\|\,dt = \int_a^b \sqrt3\,e^t\,dt = \sqrt3\,(e^b - e^a)$.
b. We have $\mathbf g'(t) = \begin{pmatrix} \frac12(e^t - e^{-t}) \\ \frac12(e^t + e^{-t}) \\ 1 \end{pmatrix}$, so
\[ \|\mathbf g'(t)\| = \sqrt{\tfrac14(e^t - e^{-t})^2 + \tfrac14(e^t + e^{-t})^2 + 1} = \sqrt{\tfrac12(e^t + e^{-t})^2} = \frac{e^t + e^{-t}}{\sqrt2}. \]
Thus, the arclength of the curve is $\displaystyle\int_{-1}^1 \|\mathbf g'(t)\|\,dt = \frac1{\sqrt2}\int_{-1}^1 (e^t + e^{-t})\,dt = \sqrt2\left(e - \frac1e\right)$.
c. We have $\mathbf g'(t) = \begin{pmatrix} 1 \\ 6t \\ 18t^2 \end{pmatrix}$, so $\|\mathbf g'(t)\| = \sqrt{1 + 36t^2 + 324t^4} = 18t^2 + 1$. Thus, the arclength of the curve is $\displaystyle\int_0^1 \|\mathbf g'(t)\|\,dt = \int_0^1 (18t^2 + 1)\,dt = 7$.
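As a quick sanity check on the factorization $1 + 36t^2 + 324t^4 = (18t^2 + 1)^2$, one can integrate the speed numerically (Simpson's rule is our own choice here):

```python
import math

# Numerically integrate the speed ||g'(t)|| = sqrt(1 + 36t^2 + 324t^4)
# over [0, 1] with composite Simpson's rule and compare with the exact
# arclength 7 obtained from (18t^2 + 1)^2 = 1 + 36t^2 + 324t^4.
def speed(t):
    return math.sqrt(1 + 36 * t**2 + 324 * t**4)

def simpson(f, a, b, n=1000):   # n must be even
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + i * h) for i in range(1, n, 2))
    s += 2 * sum(f(a + i * h) for i in range(2, n, 2))
    return s * h / 3

arc = simpson(speed, 0.0, 1.0)
print(arc)   # close to 7
```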
d. We have $\mathbf g'(t) = a\begin{pmatrix} 1 - \cos t \\ \sin t \end{pmatrix}$, and so $\|\mathbf g'(t)\| = a\sqrt{(1 - \cos t)^2 + \sin^2 t} = a\sqrt{2(1 - \cos t)}$. Using the double angle formula $\cos t = 1 - 2\sin^2(t/2)$, we find that the arclength of the curve is
\[ \int_0^{2\pi} \|\mathbf g'(t)\|\,dt = \int_0^{2\pi} a\sqrt{2(1 - \cos t)}\,dt = 2a\int_0^{2\pi} |\sin(t/2)|\,dt = 8a. \]
3.5.8 a. We have $\mathbf g'(t) = \begin{pmatrix} -\frac1{\sqrt3}\sin t + \frac1{\sqrt2}\cos t \\ -\frac1{\sqrt3}\sin t \\ -\frac1{\sqrt3}\sin t - \frac1{\sqrt2}\cos t \end{pmatrix}$, so $\upsilon(t) = \|\mathbf g'(t)\| = 1$, and $\mathbf T = \mathbf g'$. Then the curve is arclength-parametrized and
\[ \kappa(t) = \|\mathbf T'(t)\| = \left\|\begin{pmatrix} -\frac1{\sqrt3}\cos t - \frac1{\sqrt2}\sin t \\ -\frac1{\sqrt3}\cos t \\ -\frac1{\sqrt3}\cos t + \frac1{\sqrt2}\sin t \end{pmatrix}\right\| = 1. \]
(Note that $\mathbf g$ is just the parametrization of the unit circle in the plane $x_1 - 2x_2 + x_3 = 0$.)
b. We have $\mathbf g'(t) = \begin{pmatrix} -e^{-t} \\ e^t \\ \sqrt2 \end{pmatrix}$, so $\upsilon(t) = \|\mathbf g'(t)\| = e^t + e^{-t}$, and the unit tangent vector is $\mathbf T = \dfrac1{e^t + e^{-t}}\begin{pmatrix} -e^{-t} \\ e^t \\ \sqrt2 \end{pmatrix}$. By the chain rule, we have
\[ \kappa(s(t))\upsilon(t)\mathbf N(s(t)) = -\frac{e^t - e^{-t}}{(e^t + e^{-t})^2}\begin{pmatrix} -e^{-t} \\ e^t \\ \sqrt2 \end{pmatrix} + \frac1{e^t + e^{-t}}\begin{pmatrix} e^{-t} \\ e^t \\ 0 \end{pmatrix} = \frac1{(e^t + e^{-t})^2}\begin{pmatrix} 2 \\ 2 \\ \sqrt2\,(e^{-t} - e^t) \end{pmatrix}. \]
Therefore, $\kappa = \sqrt2/(e^t + e^{-t})^2$.
c. Since $\mathbf g'(t) = \begin{pmatrix} 1 \\ 2t \\ 3t^2 \end{pmatrix}$, we have $\upsilon(t) = \|\mathbf g'(t)\| = \sqrt{1 + 4t^2 + 9t^4}$ and the unit tangent vector is given by $\mathbf T = \dfrac1{\sqrt{1 + 4t^2 + 9t^4}}\begin{pmatrix} 1 \\ 2t \\ 3t^2 \end{pmatrix}$. By the chain rule,
\begin{align*}
\kappa(s(t))\upsilon(t)\mathbf N(s(t)) = (\mathbf T\circ s)'(t) &= -(4t + 18t^3)(1 + 4t^2 + 9t^4)^{-3/2}\begin{pmatrix} 1 \\ 2t \\ 3t^2 \end{pmatrix} + (1 + 4t^2 + 9t^4)^{-1/2}\begin{pmatrix} 0 \\ 2 \\ 6t \end{pmatrix} \\
&= 2(1 + 4t^2 + 9t^4)^{-3/2}\begin{pmatrix} -t(2 + 9t^2) \\ 1 - 9t^4 \\ 3t(1 + 2t^2) \end{pmatrix},
\end{align*}
and so, noting that $1 + 13t^2 + 54t^4 + 117t^6 + 81t^8 = (1 + 4t^2 + 9t^4)(1 + 9t^2 + 9t^4)$, we find that
\[ \kappa = 2(1 + 4t^2 + 9t^4)^{-3/2}\sqrt{1 + 9t^2 + 9t^4}. \]
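The curve in part c is the twisted cubic $\mathbf g(t) = (t, t^2, t^3)$ (read off from $\mathbf g'$), so the answer can be cross-checked against the standard cross-product formula for curvature:

```python
import math

# Check the closed-form curvature of g(t) = (t, t^2, t^3) against
# kappa = |g' x g''| / |g'|^3 at a few sample values of t.
def kappa_closed(t):
    return 2 * (1 + 4*t**2 + 9*t**4) ** -1.5 * math.sqrt(1 + 9*t**2 + 9*t**4)

def kappa_cross(t):
    gp  = (1.0, 2*t, 3*t**2)    # g'(t)
    gpp = (0.0, 2.0, 6*t)       # g''(t)
    cx = gp[1]*gpp[2] - gp[2]*gpp[1]
    cy = gp[2]*gpp[0] - gp[0]*gpp[2]
    cz = gp[0]*gpp[1] - gp[1]*gpp[0]
    num = math.sqrt(cx*cx + cy*cy + cz*cz)
    den = math.sqrt(sum(c*c for c in gp)) ** 3
    return num / den

for t in (0.0, 0.5, 1.0, 2.0):
    print(t, kappa_closed(t), kappa_cross(t))   # the two values agree
```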
3.5.10 We wish to bank the road so that the resistive normal force n exerted by the road contributes the centripetal acceleration with magnitude κυ², so we should have ‖n‖ sin θ = mκυ². Since the vertical component of the normal force must balance the weight of the car, we have ‖n‖ cos θ = W = mg. Dividing the first equation by the second gives tan θ = κυ²/g, so θ ≈ 0.0813 (≈ 4.66°).
3.5.11 a. Since T and N are orthogonal unit vectors, their cross product B is a unit vector orthogonal to both of them. Then the matrix
\[ A = \begin{pmatrix} | & | & | \\ \mathbf T & \mathbf N & \mathbf B \\ | & | & | \end{pmatrix} \]
is orthogonal, and hence $A^{-1} = A^{\mathsf T}$. Given any vector $\mathbf x \in \mathbb R^3$, it follows that we can solve the equation $A\mathbf c = \mathbf x$ for $\mathbf c$, and so every vector $\mathbf x \in \mathbb R^3$ can be expressed as $\mathbf x = c_1\mathbf T + c_2\mathbf N + c_3\mathbf B$ for some scalars $c_i$. (Indeed, $c_1 = \mathbf T\cdot\mathbf x$, $c_2 = \mathbf N\cdot\mathbf x$, and $c_3 = \mathbf B\cdot\mathbf x$.)
b. Since kBk = 1, it follows from Proposition 5.2 that B′ · B = 0. Since T′ · B = 0, it
follows from Exercise 3 that B′ · T = −T′ · B = 0. Since we know from part a that B′ is a linear
combination of T, N, and B, it follows that B′ must be a scalar multiple of N.
c. If τ = 0, then B(s) = B0 for all s. Then (g · B0 )′ = T · B0 = 0, so g lies in a
plane with normal vector B0 . Conversely, if g lies in the plane A · x = b, then A · g = b and so
A · g′ = A · T = 0, and thus A · N = 0. Then B = T × N is a scalar multiple of A, and is therefore
constant.
d. We know that N′ = c1 T + c2 N + c3 B for some scalar functions c1 , c2 , and c3 . Using
Proposition 5.2 and Exercise 3 once again, we have c2 = N′ · N = 0, c1 = N′ · T = −T′ · N = −κ,
and c3 = N′ · B = −B′ · N = τ . Thus, N′ = −κT + τ B, as desired.
3.5.12 We have $\mathbf g'(t) = \begin{pmatrix} -a\sin t \\ a\cos t \\ b \end{pmatrix}$, so $\upsilon(t) = \|\mathbf g'(t)\| = \sqrt{a^2 + b^2}$. Writing $c = \sqrt{a^2 + b^2}$, we have
\[ \mathbf T(s(t)) = \frac1c\begin{pmatrix} -a\sin t \\ a\cos t \\ b \end{pmatrix}, \qquad \mathbf N = \frac{\mathbf T'}{\|\mathbf T'\|} = \begin{pmatrix} -\cos t \\ -\sin t \\ 0 \end{pmatrix}, \]
\[ \kappa = \frac1\upsilon\|(\mathbf T\circ s)'(t)\| = \frac{a}{c^2} = \frac{a}{a^2 + b^2}, \qquad \mathbf B = \mathbf T\times\mathbf N = \frac1c\begin{pmatrix} b\sin t \\ -b\cos t \\ a \end{pmatrix}, \]
\[ (\mathbf B\circ s)'(t) = \frac1c\begin{pmatrix} b\cos t \\ b\sin t \\ 0 \end{pmatrix} = -\upsilon\tau\mathbf N(s(t)), \quad\text{so}\quad \tau = \frac{b}{c^2} = \frac{b}{a^2 + b^2}. \]
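The helix answers can also be recovered from the cross-product formulas for curvature and torsion; a sketch with sample values $a = 2$, $b = 3$ (an arbitrary choice for this check):

```python
import math

# For the helix g(t) = (a cos t, a sin t, b t), verify kappa = a/(a^2+b^2)
# and tau = b/(a^2+b^2) via
#   kappa = |g' x g''| / |g'|^3,   tau = (g' x g'') . g''' / |g' x g''|^2.
a, b, t = 2.0, 3.0, 0.7

gp   = (-a*math.sin(t),  a*math.cos(t), b)    # g'
gpp  = (-a*math.cos(t), -a*math.sin(t), 0.0)  # g''
gppp = ( a*math.sin(t), -a*math.cos(t), 0.0)  # g'''

def cross(u, v):
    return (u[1]*v[2] - u[2]*v[1],
            u[2]*v[0] - u[0]*v[2],
            u[0]*v[1] - u[1]*v[0])

def norm(u):
    return math.sqrt(sum(x*x for x in u))

w = cross(gp, gpp)
kappa = norm(w) / norm(gp) ** 3
tau = sum(x*y for x, y in zip(w, gppp)) / norm(w) ** 2
print(kappa, a / (a*a + b*b))   # both 2/13
print(tau,   b / (a*a + b*b))   # both 3/13
```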
3.5.13 We have $\mathbf g'(t) = e^t\begin{pmatrix} \cos t - \sin t \\ \cos t + \sin t \\ 1 \end{pmatrix}$, so $\upsilon(t) = \|\mathbf g'(t)\| = e^t\sqrt3$. Then we have
\[ \mathbf T(s(t)) = \frac1{\sqrt3}\begin{pmatrix} \cos t - \sin t \\ \cos t + \sin t \\ 1 \end{pmatrix}, \qquad \mathbf N = \frac{\mathbf T'}{\|\mathbf T'\|} = \frac1{\sqrt2}\begin{pmatrix} -\cos t - \sin t \\ \cos t - \sin t \\ 0 \end{pmatrix}, \]
\[ \kappa = \frac1\upsilon\|(\mathbf T\circ s)'(t)\| = \frac{\sqrt2}{3e^t}, \qquad \mathbf B = \mathbf T\times\mathbf N = \frac1{\sqrt6}\begin{pmatrix} \sin t - \cos t \\ -\sin t - \cos t \\ 2 \end{pmatrix}, \]
\[ (\mathbf B\circ s)'(t) = \frac1{\sqrt6}\begin{pmatrix} \cos t + \sin t \\ -\cos t + \sin t \\ 0 \end{pmatrix} = -\upsilon\tau\mathbf N(s(t)), \quad\text{so}\quad \tau = \frac1{\upsilon\sqrt3} = \frac1{3e^t}. \]
3.5.14 Letting $Q$ be the point of tangency of the string to the cycloid, we have $\overrightarrow{OQ} = \mathbf f(t) = \begin{pmatrix} t + \sin t \\ 1 - \cos t \end{pmatrix}$. Note that when $0 \le t \le \pi$, $-\overrightarrow{QP}$ is a scalar multiple of the vector $\mathbf f'(t)/\|\mathbf f'(t)\|$, the scalar being the arclength of that portion of the cycloid from $O$ to $Q$. When $0 \le t \le \pi$, that length is given by
\[ s(t) = \int_0^t \|\mathbf f'(u)\|\,du = \int_0^t \sqrt{2 + 2\cos u}\,du = \int_0^t 2|\cos(u/2)|\,du = 4\sin(t/2). \]
(Note, moreover, that since the figure is symmetric about $t = \pi$, this will also be the value of $\|\overrightarrow{QP}\|$ when $\pi < t \le 2\pi$.) Now, $\mathbf f'(t) = \begin{pmatrix} 1 + \cos t \\ \sin t \end{pmatrix}$, so
\begin{align*}
\overrightarrow{OP} = \overrightarrow{OQ} + \overrightarrow{QP} &= \begin{pmatrix} t + \sin t \\ 1 - \cos t \end{pmatrix} - \frac{4\sin(t/2)}{2\cos(t/2)}\begin{pmatrix} 1 + \cos t \\ \sin t \end{pmatrix} \\
&= \begin{pmatrix} t + \sin t \\ 1 - \cos t \end{pmatrix} - \frac{2\sin(t/2)}{\cos(t/2)}\begin{pmatrix} 2\cos^2(t/2) \\ 2\sin(t/2)\cos(t/2) \end{pmatrix} \\
&= \begin{pmatrix} t + \sin t - 4\sin(t/2)\cos(t/2) \\ 1 - \cos t - 4\sin^2(t/2) \end{pmatrix} = \begin{pmatrix} t - \sin t \\ -1 + \cos t \end{pmatrix}.
\end{align*}
(Although we derived this formula assuming $0 \le t \le \pi$, it holds as well when $\pi < t \le 2\pi$, because now $\overrightarrow{QP}$ is in the same direction as $\mathbf f'(t)$, so the sign of $\cos(t/2)$ takes care of this sign change.) What is interesting, as Huygens discovered, is that the pendulum bob follows the arc of a congruent cycloid.
b. Note first that $\mathbf e_r'(t) = \theta'(t)\mathbf e_\theta(t)$ and $\mathbf e_\theta'(t) = -\theta'(t)\mathbf e_r(t)$. Now, differentiating $\mathbf g(t) = r(t)\mathbf e_r(t)$, we obtain
\[ \mathbf g'(t) = r'(t)\mathbf e_r(t) + r(t)\theta'(t)\mathbf e_\theta(t), \]
as required.
c. Recall that $\mathbf A_0 = \mathbf g(t)\times\mathbf g'(t) = r^2(t)\theta'(t)\,\mathbf e_r(t)\times\mathbf e_\theta(t)$. Since the force field is inverse square, we have $\mathbf g''(t) = -\dfrac{GM}{r(t)^2}\mathbf e_r(t)$, so
\[ \mathbf g''(t)\times\mathbf A_0 = -\frac{GM}{r(t)^2}\,\mathbf e_r(t)\times\bigl(r^2(t)\theta'(t)\,\mathbf e_r(t)\times\mathbf e_\theta(t)\bigr) = GM\theta'(t)\mathbf e_\theta(t) = GM\,\mathbf e_r'(t). \]
Since $\mathbf A_0$, $G$, and $M$ are constants, this means that $\mathbf g'(t)\times\mathbf A_0 = GM(\mathbf e_r(t) + \mathbf c)$ for some constant vector $\mathbf c$.
d. We have $\mathbf g(t)\cdot\mathbf g'(t)\times\mathbf A_0 = \mathbf A_0\cdot\mathbf g(t)\times\mathbf g'(t) = \|\mathbf A_0\|^2$, so
\[ \|\mathbf A_0\|^2 = GM\,\mathbf g(t)\cdot(\mathbf e_r(t) + \mathbf c) = GM\bigl(r(t) + r(t)\,\mathbf e_r(t)\cdot\mathbf c\bigr) = GM\,r(t)\bigl(1 - \|\mathbf c\|\cos\theta(t)\bigr) \]
(assuming, as per the problem, that $\mathbf c$ is a negative scalar multiple of $\mathbf e_1$), and hence $r(t) = \dfrac{\|\mathbf A_0\|^2}{GM\bigl(1 - \|\mathbf c\|\cos\theta(t)\bigr)}$. If $\|\mathbf c\| \ge 1$, we see that as $\cos\theta(t) \to (1/\|\mathbf c\|)^-$, $r(t) \to \infty$. Since the orbit of a planet is bounded, we infer that in this case we must have $\|\mathbf c\| < 1$ and it now follows from part a that the orbit is an ellipse with one focus at the origin.
e. We know from Proposition 5.1 that the position vector of the planet sweeps out area at the constant rate $\frac12\|\mathbf A_0\|$. It sweeps out the area of the ellipse in one period, so $\pi ab/T = \frac12\|\mathbf A_0\|$. Now we infer from parts a and d that $\dfrac{b^2}{a} = \dfrac{\|\mathbf A_0\|^2}{GM}$, and so we have $ab = a^{3/2}\sqrt{\dfrac{\|\mathbf A_0\|^2}{GM}}$, whence
\[ T = \frac{2\pi ab}{\|\mathbf A_0\|} = \frac{2\pi a^{3/2}}{\sqrt{GM}}, \]
as required.
3.5.16 The bicycle is going from right to left. Since the front wheel can turn but the rear wheel
cannot, following the path of the rear wheel (the solid curve) a constant distance along each tangent
line must give us the position of the front wheel (the dotted curve).
3.6.1 a. Note first that since $f$ vanishes on the axes, $\dfrac{\partial f}{\partial x}(\mathbf 0) = \dfrac{\partial f}{\partial y}(\mathbf 0) = 0$. By the quotient rule, we have
\[ \frac{\partial f}{\partial x} = \frac{y(x^4 + 4x^2y^2 - y^4)}{(x^2 + y^2)^2}, \ \text{ so } \ \frac{\partial f}{\partial x}\begin{pmatrix}0\\y\end{pmatrix} = -y, \quad\text{and}\quad \frac{\partial f}{\partial y} = \frac{x(x^4 - 4x^2y^2 - y^4)}{(x^2 + y^2)^2}, \ \text{ so } \ \frac{\partial f}{\partial y}\begin{pmatrix}x\\0\end{pmatrix} = x. \]
b. We have $\dfrac{\partial^2 f}{\partial x\partial y}(\mathbf 0) = \dfrac{\partial}{\partial x}\dfrac{\partial f}{\partial y}(\mathbf 0) = 1$ and $\dfrac{\partial^2 f}{\partial y\partial x}(\mathbf 0) = \dfrac{\partial}{\partial y}\dfrac{\partial f}{\partial x}(\mathbf 0) = -1$.
c. It follows immediately from Theorem 6.1 that f cannot be C2 at 0.
3.6.2 a. We have $\dfrac{\partial^2 f}{\partial x^2} + \dfrac{\partial^2 f}{\partial y^2} = 6 - 6 = 0$.
b. We have $\dfrac{\partial f}{\partial x} = \dfrac{2x}{x^2 + y^2}$ and $\dfrac{\partial f}{\partial y} = \dfrac{2y}{x^2 + y^2}$. Then $\dfrac{\partial^2 f}{\partial x^2} = \dfrac{2(y^2 - x^2)}{(x^2 + y^2)^2}$ and $\dfrac{\partial^2 f}{\partial y^2} = \dfrac{2(x^2 - y^2)}{(x^2 + y^2)^2}$, so $\dfrac{\partial^2 f}{\partial x^2} + \dfrac{\partial^2 f}{\partial y^2} = 0$.
c. We have $\dfrac{\partial^2 f}{\partial x^2} + \dfrac{\partial^2 f}{\partial y^2} + \dfrac{\partial^2 f}{\partial z^2} = 2 + 4 - 6 = 0$.
d. We have $\dfrac{\partial f}{\partial x} = -x(x^2 + y^2 + z^2)^{-3/2}$, so $\dfrac{\partial^2 f}{\partial x^2} = (2x^2 - y^2 - z^2)(x^2 + y^2 + z^2)^{-5/2}$. Permuting the variables, we have
\[ \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2} = (x^2 + y^2 + z^2)^{-5/2}\bigl((2x^2 - y^2 - z^2) + (2y^2 - x^2 - z^2) + (2z^2 - x^2 - y^2)\bigr) = 0. \]
Thus, using Theorem 6.1, summing and doing the algebra carefully,
\begin{align*}
\frac{\partial^2 F}{\partial r^2} + \frac1r\frac{\partial F}{\partial r} + \frac1{r^2}\frac{\partial^2 F}{\partial\theta^2} &= \left(\cos^2\theta\,\frac{\partial^2 f}{\partial x^2} + 2\sin\theta\cos\theta\,\frac{\partial^2 f}{\partial x\partial y} + \sin^2\theta\,\frac{\partial^2 f}{\partial y^2}\right) \\
&\quad + \frac1r\left(\cos\theta\,\frac{\partial f}{\partial x} + \sin\theta\,\frac{\partial f}{\partial y}\right) + \frac1{r^2}\left((-r\cos\theta)\frac{\partial f}{\partial x} + (-r\sin\theta)\frac{\partial f}{\partial y}\right) \\
&\quad + \frac1{r^2}\left(r^2\sin^2\theta\,\frac{\partial^2 f}{\partial x^2} - 2r^2\sin\theta\cos\theta\,\frac{\partial^2 f}{\partial x\partial y} + r^2\cos^2\theta\,\frac{\partial^2 f}{\partial y^2}\right) \\
&= \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2},
\end{align*}
as desired. Whew!
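The polar Laplacian identity just verified can also be spot-checked numerically; a minimal sketch with a sample function $f(x,y) = x^2y$ (our own choice for this check) and central differences:

```python
import math

# Numerically verify F_rr + (1/r) F_r + (1/r^2) F_tt = f_xx + f_yy
# for the sample function f(x, y) = x^2 * y, using central differences,
# where F(r, theta) = f(r cos theta, r sin theta).
def f(x, y):
    return x * x * y

def F(r, th):
    return f(r * math.cos(th), r * math.sin(th))

h = 1e-4
r, th = 1.3, 0.8
x, y = r * math.cos(th), r * math.sin(th)

F_rr = (F(r + h, th) - 2 * F(r, th) + F(r - h, th)) / h**2
F_r  = (F(r + h, th) - F(r - h, th)) / (2 * h)
F_tt = (F(r, th + h) - 2 * F(r, th) + F(r, th - h)) / h**2
polar = F_rr + F_r / r + F_tt / r**2

f_xx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
f_yy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
cart = f_xx + f_yy      # exactly 2y for this f
print(polar, cart)
```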
3.6.8 If $F\begin{pmatrix}r\\\theta\end{pmatrix} = r^n\cos n\theta$, then
\[ \frac{\partial^2 F}{\partial r^2} + \frac1r\frac{\partial F}{\partial r} + \frac1{r^2}\frac{\partial^2 F}{\partial\theta^2} = \bigl(n(n-1) + n - n^2\bigr)r^{n-2}\cos n\theta = 0. \]
The analogous computation holds with $\sin$.
3.6.9 Suppose $F\begin{pmatrix}r\\\theta\end{pmatrix} = h(r)$ is harmonic. Then we have $h''(r) + \dfrac1r h'(r) = 0$. Thus, $\dfrac{h''(r)}{h'(r)} = -\dfrac1r$, so $\log h'(r) = -\log r + \text{const}$. We now infer that $h'(r) = c/r$ for some constant $c$, so $h(r) = c\log r + c'$ for some constants $c$ and $c'$.
3.6.10 We wish to check that the given functions are solutions of the equation
\[ \left(1 + \Bigl(\frac{\partial f}{\partial y}\Bigr)^2\right)\frac{\partial^2 f}{\partial x^2} - 2\,\frac{\partial f}{\partial x}\frac{\partial f}{\partial y}\frac{\partial^2 f}{\partial x\partial y} + \left(1 + \Bigl(\frac{\partial f}{\partial x}\Bigr)^2\right)\frac{\partial^2 f}{\partial y^2} = 0. \]
We have
\[ \frac{\partial^2 f}{\partial x\partial y} = 2\bigl((e^x + e^{-x})^2 - 4y^2\bigr)^{-3/2}\,y(e^{2x} - e^{-2x}), \qquad \frac{\partial^2 f}{\partial y^2} = -2\bigl((e^x + e^{-x})^2 - 4y^2\bigr)^{-3/2}(e^x + e^{-x})^2. \]
Then, as is best checked with a computer algebra system,
\[ \left(1 + \Bigl(\frac{\partial f}{\partial y}\Bigr)^2\right)\frac{\partial^2 f}{\partial x^2} - 2\,\frac{\partial f}{\partial x}\frac{\partial f}{\partial y}\frac{\partial^2 f}{\partial x\partial y} + \left(1 + \Bigl(\frac{\partial f}{\partial x}\Bigr)^2\right)\frac{\partial^2 f}{\partial y^2} = 0. \]
3.6.11 $F$ is clearly $C^2$ everywhere except perhaps along the set where $u = 0$ and $v \ge 0$. Only $\dfrac{\partial F}{\partial u}$ and $\dfrac{\partial^2 F}{\partial u^2}$ are not identically zero, and these are easily checked to be continuous along the set in question. It is obvious that $\dfrac{\partial^2 F}{\partial u\partial v} = 0$. Suppose we had $F\begin{pmatrix}u\\v\end{pmatrix} = \varphi(u) + \psi(v)$ for some functions $\varphi$ and $\psi$. Then for all $u > 0$ we would have
\[ F\begin{pmatrix}u\\1\end{pmatrix} = \varphi(u) + \psi(1) = u^3 \quad\text{and}\quad F\begin{pmatrix}u\\-1\end{pmatrix} = \varphi(u) + \psi(-1) = 0, \]
which, of course, is impossible. What's wrong? We know that $\dfrac{\partial F}{\partial u}$ is independent of $v$ on $\{v < 0\}$ and independent of $v$ on $\{v > 0\}$, but no one says they must be the same function on both regions. (The statement $f' = 0 \implies f = \text{const}$ is true on an interval and false on a disconnected set.)
CHAPTER 4
Implicit and Explicit Solutions of Linear Systems
4.1. Gaussian Elimination and the Theory of Linear Systems
4.1.1 We need to show that every solution of $A\mathbf x = \mathbf b$ is also a solution of $C\mathbf x = \mathbf d$ and vice versa. Start with a solution $\mathbf u$ of $A\mathbf x = \mathbf b$. Denoting the rows of $A$ by $A_1, \dots, A_m$, we then have
\[ A_1\cdot\mathbf u = b_1,\quad A_2\cdot\mathbf u = b_2,\quad \dots,\quad A_m\cdot\mathbf u = b_m. \]
If we apply an elementary operation of type (i), $\mathbf u$ still satisfies precisely the same list of equations. If we apply an elementary operation of type (ii), say multiplying the $k$th equation by $r \ne 0$, then $\mathbf u$ satisfies $A_k\cdot\mathbf u = b_k$ if and only if it satisfies $(rA_k)\cdot\mathbf u = rb_k$, as is required. As for an elementary operation of type (iii), suppose we add $r$ times the $k$th equation to the $\ell$th; since $A_k\cdot\mathbf u = b_k$ and $A_\ell\cdot\mathbf u = b_\ell$, it follows that $(A_\ell + rA_k)\cdot\mathbf u = b_\ell + rb_k$, so $\mathbf u$ satisfies the resulting system as well.
$A\mathbf x = \mathbf 0$ if and only if
\[ x_1 - x_3 = 0, \qquad x_2 - x_3 = 0, \]
so $\mathbf x = x_3\begin{pmatrix}1\\1\\1\end{pmatrix}$.
b. $A = \begin{pmatrix} 2 & -2 & 4 \\ -1 & 1 & -2 \\ 3 & -3 & 6 \end{pmatrix} \rightsquigarrow \begin{pmatrix} 1 & -1 & 2 \\ 1 & -1 & 2 \\ 1 & -1 & 2 \end{pmatrix} \rightsquigarrow \begin{pmatrix} 1 & -1 & 2 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$. Thus, the general solution of $A\mathbf x = \mathbf 0$ is
\[ \mathbf x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} x_2 - 2x_3 \\ x_2 \\ x_3 \end{pmatrix} = x_2\begin{pmatrix}1\\1\\0\end{pmatrix} + x_3\begin{pmatrix}-2\\0\\1\end{pmatrix}. \]
c.
\[ A = \begin{pmatrix} 1 & 2 & -1 \\ 1 & 3 & 1 \\ 2 & 4 & 3 \\ -1 & 1 & 6 \end{pmatrix} \rightsquigarrow \begin{pmatrix} 1 & 2 & -1 \\ 0 & 1 & 2 \\ 0 & 0 & 5 \\ 0 & 3 & 5 \end{pmatrix} \rightsquigarrow \begin{pmatrix} 1 & 2 & -1 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix} \rightsquigarrow \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}. \]
$A\mathbf x = \mathbf 0$ implies $x_1 = x_2 = x_3 = 0$, so $\mathbf x = \mathbf 0$.
" # " # " #
1 −2 1 0 1 −2 1 0 1 −2 0 1
d. A = .
2 −4 3 −1 0 0 1 −1 0 0 1 −1
Ax = 0 gives
x1 2x2 −x4 2 −1
x2 x2 1 0
x=
x =
= x2 + x4 .
3 x4
0
1
x4 x4 0 1
e.
\[ A = \begin{pmatrix} 1&1&1&1 \\ 1&2&1&2 \\ 1&3&2&4 \\ 1&2&2&3 \end{pmatrix} \rightsquigarrow \begin{pmatrix} 1&1&1&1 \\ 0&1&0&1 \\ 0&2&1&3 \\ 0&1&1&2 \end{pmatrix} \rightsquigarrow \begin{pmatrix} 1&1&1&1 \\ 0&1&0&1 \\ 0&0&1&1 \\ 0&0&1&1 \end{pmatrix} \rightsquigarrow \begin{pmatrix} 1&1&0&0 \\ 0&1&0&1 \\ 0&0&1&1 \\ 0&0&0&0 \end{pmatrix} \rightsquigarrow \begin{pmatrix} 1&0&0&-1 \\ 0&1&0&1 \\ 0&0&1&1 \\ 0&0&0&0 \end{pmatrix}. \]
Thus, $\mathbf x = x_4\begin{pmatrix}1\\-1\\-1\\1\end{pmatrix}$.
f.
\[ A = \begin{pmatrix} 1&2&0&-1&-1 \\ -1&-3&1&2&3 \\ 1&-1&3&1&1 \\ 2&-3&7&3&4 \end{pmatrix} \rightsquigarrow \begin{pmatrix} 1&2&0&-1&-1 \\ 0&-1&1&1&2 \\ 0&-3&3&2&2 \\ 0&-7&7&5&6 \end{pmatrix} \rightsquigarrow \begin{pmatrix} 1&2&0&-1&-1 \\ 0&1&-1&-1&-2 \\ 0&0&0&-1&-4 \\ 0&0&0&-2&-8 \end{pmatrix} \]
\[ \rightsquigarrow \begin{pmatrix} 1&2&0&-1&-1 \\ 0&1&-1&-1&-2 \\ 0&0&0&1&4 \\ 0&0&0&0&0 \end{pmatrix} \rightsquigarrow \begin{pmatrix} 1&2&0&0&3 \\ 0&1&-1&0&2 \\ 0&0&0&1&4 \\ 0&0&0&0&0 \end{pmatrix} \rightsquigarrow \begin{pmatrix} 1&0&2&0&-1 \\ 0&1&-1&0&2 \\ 0&0&0&1&4 \\ 0&0&0&0&0 \end{pmatrix}, \]
so $A\mathbf x = \mathbf 0$ gives rise to the general solution
\[ \mathbf x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix} = \begin{pmatrix} -2x_3 + x_5 \\ x_3 - 2x_5 \\ x_3 \\ -4x_5 \\ x_5 \end{pmatrix} = x_3\begin{pmatrix}-2\\1\\1\\0\\0\end{pmatrix} + x_5\begin{pmatrix}1\\-2\\0\\-4\\1\end{pmatrix}. \]
g.
\[ A = \begin{pmatrix} 1&-1&1&1&0 \\ 1&0&2&1&1 \\ 0&2&2&2&0 \\ -1&1&-1&0&-1 \end{pmatrix} \rightsquigarrow \begin{pmatrix} 1&-1&1&1&0 \\ 0&1&1&0&1 \\ 0&1&1&1&0 \\ 0&0&0&1&-1 \end{pmatrix} \rightsquigarrow \begin{pmatrix} 1&-1&1&1&0 \\ 0&1&1&0&1 \\ 0&0&0&1&-1 \\ 0&0&0&1&-1 \end{pmatrix} \]
\[ \rightsquigarrow \begin{pmatrix} 1&-1&1&1&0 \\ 0&1&1&0&1 \\ 0&0&0&1&-1 \\ 0&0&0&0&0 \end{pmatrix} \rightsquigarrow \begin{pmatrix} 1&-1&1&0&1 \\ 0&1&1&0&1 \\ 0&0&0&1&-1 \\ 0&0&0&0&0 \end{pmatrix} \rightsquigarrow \begin{pmatrix} 1&0&2&0&2 \\ 0&1&1&0&1 \\ 0&0&0&1&-1 \\ 0&0&0&0&0 \end{pmatrix}. \]
$A\mathbf x = \mathbf 0$ gives
\[ \mathbf x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix} = \begin{pmatrix} -2x_3 - 2x_5 \\ -x_3 - x_5 \\ x_3 \\ x_5 \\ x_5 \end{pmatrix} = x_3\begin{pmatrix}-2\\-1\\1\\0\\0\end{pmatrix} + x_5\begin{pmatrix}-2\\-1\\0\\1\\1\end{pmatrix}. \]
h.
\[ A = \begin{pmatrix} 1&1&0&5&0&-1 \\ 0&1&1&3&-2&0 \\ -1&2&3&4&1&-6 \\ 0&4&4&12&-1&-7 \end{pmatrix} \rightsquigarrow \begin{pmatrix} 1&1&0&5&0&-1 \\ 0&1&1&3&-2&0 \\ 0&3&3&9&1&-7 \\ 0&4&4&12&-1&-7 \end{pmatrix} \rightsquigarrow \begin{pmatrix} 1&1&0&5&0&-1 \\ 0&1&1&3&-2&0 \\ 0&0&0&0&7&-7 \\ 0&0&0&0&7&-7 \end{pmatrix} \]
\[ \rightsquigarrow \begin{pmatrix} 1&1&0&5&0&-1 \\ 0&1&1&3&-2&0 \\ 0&0&0&0&1&-1 \\ 0&0&0&0&0&0 \end{pmatrix} \rightsquigarrow \begin{pmatrix} 1&1&0&5&0&-1 \\ 0&1&1&3&0&-2 \\ 0&0&0&0&1&-1 \\ 0&0&0&0&0&0 \end{pmatrix} \rightsquigarrow \begin{pmatrix} 1&0&-1&2&0&1 \\ 0&1&1&3&0&-2 \\ 0&0&0&0&1&-1 \\ 0&0&0&0&0&0 \end{pmatrix}. \]
$A\mathbf x = \mathbf 0$ gives
\[ \mathbf x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{pmatrix} = \begin{pmatrix} x_3 - 2x_4 - x_6 \\ -x_3 - 3x_4 + 2x_6 \\ x_3 \\ x_4 \\ x_6 \\ x_6 \end{pmatrix} = x_3\begin{pmatrix}1\\-1\\1\\0\\0\\0\end{pmatrix} + x_4\begin{pmatrix}-2\\-3\\0\\1\\0\\0\end{pmatrix} + x_6\begin{pmatrix}-1\\2\\0\\0\\1\\1\end{pmatrix}. \]
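The nullspace basis found in part h can be verified mechanically by checking that $A\mathbf v = \mathbf 0$ for each basis vector:

```python
# Verify that the three vectors found for part h lie in the nullspace
# of A by computing A @ v with plain Python.
A = [
    [ 1, 1, 0,  5,  0, -1],
    [ 0, 1, 1,  3, -2,  0],
    [-1, 2, 3,  4,  1, -6],
    [ 0, 4, 4, 12, -1, -7],
]

basis = [
    [ 1, -1, 1, 0, 0, 0],
    [-2, -3, 0, 1, 0, 0],
    [-1,  2, 0, 0, 1, 1],
]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

for v in basis:
    print(matvec(A, v))   # each is [0, 0, 0, 0]
```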
4.1.4 a.
\[ [A\,|\,\mathbf b] = \left(\begin{array}{ccc|c} 2&1&-1&3 \\ 1&2&1&0 \\ -1&1&2&-3 \end{array}\right) \rightsquigarrow \left(\begin{array}{ccc|c} 1&2&1&0 \\ 2&1&-1&3 \\ -1&1&2&-3 \end{array}\right) \rightsquigarrow \left(\begin{array}{ccc|c} 1&2&1&0 \\ 0&-3&-3&3 \\ 0&3&3&-3 \end{array}\right) \rightsquigarrow \left(\begin{array}{ccc|c} 1&2&1&0 \\ 0&1&1&-1 \\ 0&0&0&0 \end{array}\right) \rightsquigarrow \left(\begin{array}{ccc|c} 1&0&-1&2 \\ 0&1&1&-1 \\ 0&0&0&0 \end{array}\right). \]
Thus, the system of equations from the matrix $[A\,|\,\mathbf b]$ is given in reduced echelon form by
\[ x_1 - x_3 = 2, \qquad x_2 + x_3 = -1, \]
from which we read off $\mathbf x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 2 + x_3 \\ -1 - x_3 \\ x_3 \end{pmatrix} = \begin{pmatrix} 2 \\ -1 \\ 0 \end{pmatrix} + x_3\begin{pmatrix} 1 \\ -1 \\ 1 \end{pmatrix}$.
b.
\[ [A\,|\,\mathbf b] = \left(\begin{array}{cccc|c} 1&1&1&1&6 \\ 3&3&2&0&17 \end{array}\right) \rightsquigarrow \left(\begin{array}{cccc|c} 1&1&1&1&6 \\ 0&0&-1&-3&-1 \end{array}\right) \rightsquigarrow \left(\begin{array}{cccc|c} 1&1&1&1&6 \\ 0&0&1&3&1 \end{array}\right) \rightsquigarrow \left(\begin{array}{cccc|c} 1&1&0&-2&5 \\ 0&0&1&3&1 \end{array}\right). \]
Thus, the system of equations from the matrix $[A\,|\,\mathbf b]$ is given in reduced echelon form by
\[ x_1 + x_2 - 2x_4 = 5, \qquad x_3 + 3x_4 = 1, \]
from which we read off $\mathbf x = \begin{pmatrix} 5 \\ 0 \\ 1 \\ 0 \end{pmatrix} + x_2\begin{pmatrix} -1 \\ 1 \\ 0 \\ 0 \end{pmatrix} + x_4\begin{pmatrix} 2 \\ 0 \\ -3 \\ 1 \end{pmatrix}$.
4.1.7 We first find $a$, $b$, and $c$ so that the given points satisfy $x^2 + y^2 + ax + by + c = 0$. This means we must solve the system
\[ 2a + 6b + c = -40, \qquad -a + 7b + c = -50, \qquad -4a - 2b + c = -20. \]
Solving, we obtain $a = 2$, $b = -4$, and $c = -20$. Thus, $x^2 + y^2 + 2x - 4y - 20 = 0$ gives the circle that contains the three points. To find the center and radius of this circle, we complete the square:
\[ (x + 1)^2 + (y - 2)^2 = 25, \]
so the circle has center $(-1, 2)$ and radius $5$.
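Each row of the system comes from substituting a point $(x, y)$ into $x^2 + y^2 + ax + by + c = 0$, so the rows correspond to the points $(2,6)$, $(-1,7)$, $(-4,-2)$ (an inference from the coefficients). A quick check that solves the system and confirms the circle:

```python
# Solve the 3x3 system for (a, b, c) in x^2 + y^2 + a x + b y + c = 0
# through the points (2, 6), (-1, 7), (-4, -2), then confirm each point
# lies on the circle (x + 1)^2 + (y - 2)^2 = 25.
points = [(2, 6), (-1, 7), (-4, -2)]

# Gaussian elimination on the augmented system [x, y, 1 | -(x^2 + y^2)].
rows = [[x, y, 1.0, -(x * x + y * y)] for x, y in points]
for i in range(3):
    p = max(range(i, 3), key=lambda r: abs(rows[r][i]))
    rows[i], rows[p] = rows[p], rows[i]
    rows[i] = [v / rows[i][i] for v in rows[i]]
    for r in range(3):
        if r != i:
            rows[r] = [v - rows[r][i] * w for v, w in zip(rows[r], rows[i])]

a, b, c = (rows[i][3] for i in range(3))
print(a, b, c)                        # 2, -4, -20
for x, y in points:
    print((x + 1) ** 2 + (y - 2) ** 2)   # 25 each
```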
Thus, the constraint equations are b3 − b1 = 2b1 + b2 + b4 = 0. The vector b in part a does not
satisfy the second constraint, the vector in part b satisfies both constraints, and the vector in part
c satisfies neither constraint.
4.1.10 a. Since the matrix $A = \begin{pmatrix} 1&1 \\ 1&2 \\ 1&2 \end{pmatrix}$ has rank at most 2, there will be constraint(s) for the equation $A\mathbf x = \mathbf b$ to be consistent. Hence the vectors cannot span $\mathbb R^3$.
b. $A = \begin{pmatrix} 1&1&1 \\ 1&2&3 \\ 1&2&3 \end{pmatrix}$ has rank 2, so there will be a constraint equation for $A\mathbf x = \mathbf b$ to be consistent. Hence the vectors cannot span $\mathbb R^3$.
c. $A = \begin{pmatrix} 1&1&3&2 \\ 0&-1&5&3 \\ 1&1&3&2 \end{pmatrix}$ has rank 2, so the vectors do not span $\mathbb R^3$.
d. $A = \begin{pmatrix} 1&2&0 \\ 0&1&1 \\ -1&1&5 \end{pmatrix}$ has rank 3; thus, $A\mathbf x = \mathbf b$ is always consistent, and so the vectors span $\mathbb R^3$.
4.1.11 a.
\[ \left(\begin{array}{cc|c} 3&-1&b_1 \\ 6&-2&b_2 \\ -9&3&b_3 \end{array}\right) \rightsquigarrow \left(\begin{array}{cc|c} 3&-1&b_1 \\ 0&0&b_2 - 2b_1 \\ 0&0&b_3 + 3b_1 \end{array}\right), \]
so the constraint equations are $b_2 - 2b_1 = 0$ and $b_3 + 3b_1 = 0$.
c.
\[ \left(\begin{array}{ccc|c} 1&2&1&b_1 \\ 0&1&1&b_2 \\ -1&3&4&b_3 \\ -2&-1&1&b_4 \end{array}\right) \rightsquigarrow \left(\begin{array}{ccc|c} 1&2&1&b_1 \\ 0&1&1&b_2 \\ 0&5&5&b_3 + b_1 \\ 0&3&3&b_4 + 2b_1 \end{array}\right) \rightsquigarrow \left(\begin{array}{ccc|c} 1&2&1&b_1 \\ 0&1&1&b_2 \\ 0&0&0&b_3 + b_1 - 5b_2 \\ 0&0&0&b_4 + 2b_1 - 3b_2 \end{array}\right), \]
so the constraint equations are $b_1 - 5b_2 + b_3 = 0$ and $2b_1 - 3b_2 + b_4 = 0$.
4.1.12 a. To write $\mathbf b$ as a linear combination of these three vectors we must solve the system $A\mathbf x = \mathbf b$, where $A = \begin{pmatrix} 1&0&1 \\ 0&1&1 \\ 1&1&1 \\ 1&2&0 \end{pmatrix}$. The constraint equations are given by
\[ \left(\begin{array}{ccc|c} 1&0&1&b_1 \\ 0&1&1&b_2 \\ 1&1&1&b_3 \\ 1&2&0&b_4 \end{array}\right) \rightsquigarrow \left(\begin{array}{ccc|c} 1&0&1&b_1 \\ 0&1&1&b_2 \\ 0&1&0&b_3 - b_1 \\ 0&2&-1&b_4 - b_1 \end{array}\right) \rightsquigarrow \left(\begin{array}{ccc|c} 1&0&1&b_1 \\ 0&1&1&b_2 \\ 0&0&-1&b_3 - b_2 - b_1 \\ 0&0&-3&b_4 - 2b_2 - b_1 \end{array}\right) \rightsquigarrow \left(\begin{array}{ccc|c} 1&0&1&b_1 \\ 0&1&1&b_2 \\ 0&0&1&b_1 + b_2 - b_3 \\ 0&0&0&2b_1 + b_2 - 3b_3 + b_4 \end{array}\right), \]
so the constraint equation is $2b_1 + b_2 - 3b_3 + b_4 = 0$.
b. Here
\[ \left(\begin{array}{ccc|c} 1&0&2&b_1 \\ 0&1&-1&b_2 \\ 1&1&1&b_3 \\ 1&2&0&b_4 \end{array}\right) \rightsquigarrow \left(\begin{array}{ccc|c} 1&0&2&b_1 \\ 0&1&-1&b_2 \\ 0&1&-1&b_3 - b_1 \\ 0&2&-2&b_4 - b_1 \end{array}\right) \rightsquigarrow \left(\begin{array}{ccc|c} 1&0&2&b_1 \\ 0&1&-1&b_2 \\ 0&0&0&b_3 - b_2 - b_1 \\ 0&0&0&b_4 - 2b_2 - b_1 \end{array}\right), \]
so the constraint equations are $b_3 - b_2 - b_1 = 0$ and $b_4 - 2b_2 - b_1 = 0$.
If $\alpha = -1$, we have
\[ \left(\begin{array}{ccc|c} 1&1&-1&b_1 \\ -1&2&-1&b_2 \\ -1&-1&1&b_3 \end{array}\right) \rightsquigarrow \left(\begin{array}{ccc|c} 1&1&-1&b_1 \\ 0&3&-2&b_1 + b_2 \\ 0&0&0&b_1 + b_3 \end{array}\right), \]
so in this case there is the constraint equation $b_1 + b_3 = 0$.
4.1.17 a. Suppose (AB)x = 0. Then A(Bx) = 0, so, since A is nonsingular, Bx = 0. But now
since B is nonsingular, we must have x = 0.
b. Suppose first that B is singular; then there is a nonzero vector x so that Bx = 0. Then
that same nonzero vector x satisfies (AB)x = A(Bx) = 0, so AB is singular. Now suppose B is
nonsingular and A is singular. There is a nonzero vector y so that Ay = 0. Since B is nonsingular,
there is a (a fortiori nonzero) vector x so that Bx = y. Then we have (AB)x = A(Bx) = Ay = 0.
Thus, AB is singular.
4.1.20 a. By successively adding row 1, row 2, . . . , and row m − 1 to row m, we obtain a row
of zeroes in the last row. Proceeding to echelon form, we see that there must be a row of zeroes,
and so r < m.
An alternative argument is as follows. Because the sum of the rows is 0, any vector b for which
Ax = b is consistent must satisfy b1 + · · · + bm = 0. (To see this, note that if Ax = b, then this
means that Ai · x = bi , so 0 = (A1 + · · · + Am ) · x = A1 · x + · · · + Am · x = b1 + · · · + bm .) Since
constraint equations arise from rows of zeroes in the echelon form of the matrix, we must have
r < m.
b. Since ci ≠ 0, as in part a, we first multiply the ith row by ci and then add to it cj times row j for each j ≠ i. We thereby obtain a row of zeroes, and the echelon form must therefore contain a row of zeroes.
4.1.21 a. If $\mathbf a_1 + \mathbf a_2 + \cdots + \mathbf a_n = \mathbf 0$, then $A\mathbf x = \mathbf 0$ has a nontrivial solution, viz., $\mathbf x = \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}$. Thus, $A$ is singular, and so $r < n$.
b. As in part a, if there is a nonzero vector $\mathbf c$ such that $\sum_{i=1}^n c_i\mathbf a_i = \mathbf 0$, then $A\mathbf x = \mathbf 0$ has the nontrivial solution $\mathbf x = \mathbf c$. Thus, $A$ is singular, so $r < n$.
\[ \begin{pmatrix} 1&x_1&x_1^2 \\ 1&x_2&x_2^2 \\ 1&x_3&x_3^2 \end{pmatrix} \rightsquigarrow \begin{pmatrix} 1&x_1&x_1^2 \\ 0&x_2 - x_1&x_2^2 - x_1^2 \\ 0&x_3 - x_1&x_3^2 - x_1^2 \end{pmatrix} \rightsquigarrow \begin{pmatrix} 1&x_1&x_1^2 \\ 0&1&x_1 + x_2 \\ 0&1&x_1 + x_3 \end{pmatrix} \rightsquigarrow \begin{pmatrix} 1&x_1&x_1^2 \\ 0&1&x_1 + x_2 \\ 0&0&x_3 - x_2 \end{pmatrix} \rightsquigarrow \begin{pmatrix} 1&x_1&x_1^2 \\ 0&1&x_1 + x_2 \\ 0&0&1 \end{pmatrix}. \]
Therefore, $A$ is nonsingular. Solving
\[ \begin{pmatrix} x_1^2&x_1&1 \\ x_2^2&x_2&1 \\ x_3^2&x_3&1 \end{pmatrix}\begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} \]
is equivalent to solving
\[ \begin{pmatrix} 1&x_1&x_1^2 \\ 1&x_2&x_2^2 \\ 1&x_3&x_3^2 \end{pmatrix}\begin{pmatrix} c \\ b \\ a \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix}. \]
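The nonsingularity argument amounts to the classical Vandermonde determinant identity $\det = (x_2 - x_1)(x_3 - x_1)(x_3 - x_2)$, which is nonzero for distinct nodes; a quick numeric check (the sample nodes are an arbitrary choice):

```python
# Check that the 3x3 Vandermonde matrix with distinct nodes is
# nonsingular: its determinant equals (x2-x1)(x3-x1)(x3-x2) != 0.
x1, x2, x3 = 0.5, 2.0, -1.5   # sample distinct nodes

def det3(m):
    return (m[0][0] * (m[1][1]*m[2][2] - m[1][2]*m[2][1])
          - m[0][1] * (m[1][0]*m[2][2] - m[1][2]*m[2][0])
          + m[0][2] * (m[1][0]*m[2][1] - m[1][1]*m[2][0]))

V = [[1, x, x * x] for x in (x1, x2, x3)]
d = det3(V)
print(d, (x2 - x1) * (x3 - x1) * (x3 - x2))   # both 10.5
```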
4.1.23 a. As in the hint, the three points P1 , P2 , and P3 are collinear if and only if there are
numbers a, b, and c with a and b not both zero such that the points satisfy ax + by + c = 0. That
is, they are collinear if and only if the system
\[ ax_1 + by_1 + c = 0, \qquad ax_2 + by_2 + c = 0, \qquad ax_3 + by_3 + c = 0 \]
has a solution $\begin{pmatrix} a \\ b \\ c \end{pmatrix}$ with $a$ and $b$ not both zero. But notice that the only solution with $a = b = 0$ is the trivial solution, $a = b = c = 0$; thus, $P_1$, $P_2$, and $P_3$ are collinear if and only if the system has a nontrivial solution. Of course the coefficient matrix for this system is $A$.
b. By part a, if the points P1 , P2 , and P3 are not collinear, then Ax = 0 has only the
trivial solution, so A is nonsingular. Now to find a circle passing through P1 , P2 , and P3 we need
to solve the system
i.e., $A\mathbf x = \mathbf b$, where $\mathbf x = \begin{pmatrix} a \\ b \\ c \end{pmatrix}$ and $\mathbf b = -\begin{pmatrix} x_1^2 + y_1^2 \\ x_2^2 + y_2^2 \\ x_3^2 + y_3^2 \end{pmatrix}$. Since $A$ is nonsingular, this system has a
unique solution. Thus, there is a unique circle passing through P1 , P2 , and P3 .
4.2. Elementary Matrices and Calculating Inverse Matrices
4.2.5 For an elementary matrix E of type (i), E −1 = E, for interchanging rows i and j of
a matrix and then interchanging rows i and j of the result gives us our original matrix. For an
elementary matrix of type (ii), with c 6= 0 in the ii-entry, the inverse is given by putting 1/c in the
ii-entry. For an elementary matrix of type (iii), we replace c by −c. That is, after adding c times
row i to row j, if we then add −c times row i to row j, we have returned to the original matrix. In
each of these cases, the inverse is again an elementary matrix.
4.2.6 By Theorem 2.1, AB and B are invertible. Since B −1 is also invertible, we infer from
Proposition 4.3 of Chapter 1 that A = (AB)(B −1 ) is invertible as well. Indeed, we have A−1 =
B(AB)−1 .
4.2.8 a. If A is nonsingular, we know its reduced echelon form is I. There are therefore
finitely many elementary row operations that transform A into I, each of these operations being
implemented by multiplying on the left by an elementary matrix. Thus, there are finitely many
elementary matrices E1 , E2 , . . ., Ek so that Ek Ek−1 · · · E2 E1 A = I.
b. Let $B = E_kE_{k-1}\cdots E_2E_1$. Since every elementary matrix is invertible, we have $A = E_1^{-1}E_2^{-1}\cdots E_{k-1}^{-1}E_k^{-1} = B^{-1}$, and so $AB = B^{-1}B = I$. (Or, more directly, from $A = B^{-1}$ we infer that $A^{-1} = B$.)
\[ \begin{pmatrix} 1&1&1&3 \\ 1&1&3&1 \\ 1&3&1&1 \\ 3&1&1&1 \end{pmatrix} \rightsquigarrow \begin{pmatrix} 1&1&1&3 \\ 0&1&0&-1 \\ 0&0&1&-1 \\ 0&0&0&-12 \end{pmatrix}, \]
so the only solution is the trivial solution. Thus, the vectors form a linearly independent set.
f. These vectors form a linearly dependent set. For example, their sum is 0.
4.3.3 Suppose $c_1(\mathbf v - \mathbf w) + c_2(2\mathbf v + \mathbf w) = \mathbf 0$. Then $(c_1 + 2c_2)\mathbf v + (c_2 - c_1)\mathbf w = \mathbf 0$. Since $\{\mathbf v, \mathbf w\}$ is linearly independent, we infer that $c_1 + 2c_2 = -c_1 + c_2 = 0$. But $\begin{pmatrix} 1&2 \\ -1&1 \end{pmatrix} \rightsquigarrow \begin{pmatrix} 1&0 \\ 0&1 \end{pmatrix}$, so the only solution of this system is $c_1 = c_2 = 0$, as desired.
4.3.5 Suppose c1 v1 +c2 v2 +· · ·+ck vk = 0. Then, for i = 1, . . . , k, (c1 v1 +c2 v2 +· · ·+ck vk )·vi =
0, so c1 (v1 · vi ) + · · · + ci (vi · vi ) + · · · + ck (vk · vi ) = ci kvi k2 = 0. Since vi 6= 0, we must have ci = 0
for i = 1, . . . , k. Hence {v1 , . . . , vk } is linearly independent.
4.3.6 Suppose $k > n$, let $\mathbf v_1, \dots, \mathbf v_k \in \mathbb R^n$, and write $A = \begin{pmatrix} | & | & & | \\ \mathbf v_1 & \mathbf v_2 & \cdots & \mathbf v_k \\ | & | & & | \end{pmatrix}$. Since $\operatorname{rank}(A) \le n < k$, the system $A\mathbf x = \mathbf 0$ will have a nontrivial solution. Thus, $\{\mathbf v_1, \dots, \mathbf v_k\}$ is linearly dependent. If we have $k$ linearly independent vectors in $\mathbb R^n$, we conclude that $k \le n$.
4.3.7 Suppose $\{\mathbf v_1, \dots, \mathbf v_k\}$ is linearly dependent. Then there exist scalars $c_1, \dots, c_k$, not all zero, such that $\sum_{i=1}^k c_i\mathbf v_i = \mathbf 0$. If $c_j \ne 0$, we write $\mathbf v_j = -\sum_{i\ne j}(c_i/c_j)\mathbf v_i$.
4.3.8 We prove the contrapositive of the statement: If v1 6= 0 and vi+1 ∈ / Span(v1 , . . . , vi ) for
all i = 1, 2, . . . , k − 1, then {v1 , . . . , vk } is linearly independent. We proceed by induction. Suppose
v1 6= 0. Then {v1 } is linearly independent. Now suppose {v1 , . . . , vi } is linearly independent for
some 1 ≤ i ≤ k − 1. Then if vi+1 6∈ Span(v1 , . . . , vi ), we see from Proposition 3.2 that the set
{v1 , . . . , vi , vi+1 } is linearly independent.
c. $\mathbf x \in V \iff x_1 + 2x_2 + 3x_3 = 0$. The general solution of this equation is $\mathbf x = x_2\begin{pmatrix}-2\\1\\0\end{pmatrix} + x_3\begin{pmatrix}-3\\0\\1\end{pmatrix}$, and so $\left\{\begin{pmatrix}-2\\1\\0\end{pmatrix}, \begin{pmatrix}-3\\0\\1\end{pmatrix}\right\}$ gives a basis for $V$ and $\dim V = 2$.
d. The general solution of this system of equations is $\mathbf x = x_2\begin{pmatrix}1\\1\\0\\0\\0\end{pmatrix} + x_4\begin{pmatrix}0\\0\\1\\1\\0\end{pmatrix} + x_5\begin{pmatrix}0\\0\\0\\0\\1\end{pmatrix}$, so a basis for $V$ is $\left\{\begin{pmatrix}1\\1\\0\\0\\0\end{pmatrix}, \begin{pmatrix}0\\0\\1\\1\\0\end{pmatrix}, \begin{pmatrix}0\\0\\0\\0\\1\end{pmatrix}\right\}$, and $\dim V = 3$.
4.3.14 To show that $\{\mathbf v_1, \dots, \mathbf v_n\}$ is a basis for $\mathbb R^n$, it suffices by Proposition 3.9 to show that this is a linearly independent set of vectors.
a. From $\left(\begin{array}{cc|c} 2&3&3 \\ 3&5&4 \end{array}\right) \rightsquigarrow \left(\begin{array}{cc|c} 1&0&3 \\ 0&1&-1 \end{array}\right)$, we see that $\{\mathbf v_1, \mathbf v_2\}$ is linearly independent and that $\mathbf b = 3\mathbf v_1 - \mathbf v_2$, so the coordinates of $\mathbf b$ with respect to this basis are $\begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} 3 \\ -1 \end{pmatrix}$.
b. From $\left(\begin{array}{ccc|c} 1&1&1&1 \\ 0&2&3&1 \\ 3&2&2&2 \end{array}\right) \rightsquigarrow \left(\begin{array}{ccc|c} 1&0&0&0 \\ 0&1&0&2 \\ 0&0&1&-1 \end{array}\right)$, we see that $\{\mathbf v_1, \mathbf v_2, \mathbf v_3\}$ is linearly independent and that $2\mathbf v_2 - \mathbf v_3 = \mathbf b$, so the coordinates of $\mathbf b$ with respect to this basis are $\begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 2 \\ -1 \end{pmatrix}$.
c. From $\left(\begin{array}{ccc|c} 1&1&1&3 \\ 0&1&1&0 \\ 1&2&1&1 \end{array}\right) \rightsquigarrow \left(\begin{array}{ccc|c} 1&0&0&3 \\ 0&1&0&-2 \\ 0&0&1&2 \end{array}\right)$, we see that $\{\mathbf v_1, \mathbf v_2, \mathbf v_3\}$ is linearly independent. Moreover, the coordinates of $\mathbf b$ with respect to this basis are $\begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} = \begin{pmatrix} 3 \\ -2 \\ 2 \end{pmatrix}$.
d. From $\left(\begin{array}{cccc|c} 1&1&1&1&2 \\ 0&1&1&1&0 \\ 0&0&1&3&1 \\ 0&0&1&4&1 \end{array}\right) \rightsquigarrow \left(\begin{array}{cccc|c} 1&0&0&0&2 \\ 0&1&0&0&-1 \\ 0&0&1&0&1 \\ 0&0&0&1&0 \end{array}\right)$, we see that $\{\mathbf v_1, \mathbf v_2, \mathbf v_3, \mathbf v_4\}$ is linearly independent. Moreover, the coordinates of $\mathbf b$ with respect to this basis are $\begin{pmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{pmatrix} = \begin{pmatrix} 2 \\ -1 \\ 1 \\ 0 \end{pmatrix}$.
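The coordinates in part d can be confirmed directly, with the $\mathbf v_i$ read off as the columns of the matrix there:

```python
# Confirm the coordinates found in part d: with v1, ..., v4 the columns
# of the matrix, check that 2*v1 - 1*v2 + 1*v3 + 0*v4 = b.
v1 = (1, 0, 0, 0)
v2 = (1, 1, 0, 0)
v3 = (1, 1, 1, 1)
v4 = (1, 1, 3, 4)
b  = (2, 0, 1, 1)
coords = (2, -1, 1, 0)

combo = tuple(sum(c * v[i] for c, v in zip(coords, (v1, v2, v3, v4)))
              for i in range(4))
print(combo)   # (2, 0, 1, 1)
```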
4.3.15 $\mathbf x \in V\cap W$ if and only if $\mathbf x = a\begin{pmatrix}1\\0\\1\\1\end{pmatrix} + b\begin{pmatrix}2\\1\\1\\2\end{pmatrix}$ and $\mathbf x = c\begin{pmatrix}0\\1\\1\\0\end{pmatrix} + d\begin{pmatrix}2\\0\\1\\2\end{pmatrix}$ for some scalars $a$, $b$, $c$, and $d$. Now
\[ A = \begin{pmatrix} 1&2&0&2 \\ 0&1&1&0 \\ 1&1&1&1 \\ 1&2&0&2 \end{pmatrix} \rightsquigarrow \begin{pmatrix} 1&0&0&1 \\ 0&1&0&\frac12 \\ 0&0&1&-\frac12 \\ 0&0&0&0 \end{pmatrix}, \]
so the vector $\begin{pmatrix} -a \\ -b \\ c \\ d \end{pmatrix} = \begin{pmatrix} -1 \\ -\frac12 \\ \frac12 \\ 1 \end{pmatrix} = \frac12\begin{pmatrix} -2 \\ -1 \\ 1 \\ 2 \end{pmatrix}$ spans the solution set of $A\mathbf x = \mathbf 0$. This means that
\[ \mathbf x = 2\begin{pmatrix}1\\0\\1\\1\end{pmatrix} + 1\begin{pmatrix}2\\1\\1\\2\end{pmatrix} = 1\begin{pmatrix}0\\1\\1\\0\end{pmatrix} + 2\begin{pmatrix}2\\0\\1\\2\end{pmatrix} = \begin{pmatrix}4\\1\\3\\4\end{pmatrix} \]
spans, and therefore gives a basis for, $V\cap W$.
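The basis vector for the intersection can be double-checked by exhibiting it as a combination from each spanning set:

```python
# Check that (4, 1, 3, 4) is in both V = span{(1,0,1,1), (2,1,1,2)}
# and W = span{(0,1,1,0), (2,0,1,2)} by computing both combinations.
def combo(coeffs, vecs):
    return tuple(sum(c * v[i] for c, v in zip(coeffs, vecs)) for i in range(4))

in_V = combo((2, 1), [(1, 0, 1, 1), (2, 1, 1, 2)])
in_W = combo((1, 2), [(0, 1, 1, 0), (2, 0, 1, 2)])
print(in_V, in_W)   # both (4, 1, 3, 4)
```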
by definition.
4.3.17 To establish the first part of the Proposition, suppose $\mathbf v_1, \dots, \mathbf v_k$ span $V$ but form a linearly dependent set. Then one of the vectors, say $\mathbf v_k$, is a linear combination of the remaining vectors, and so $\operatorname{Span}(\mathbf v_1, \dots, \mathbf v_{k-1}) = \operatorname{Span}(\mathbf v_1, \dots, \mathbf v_k)$. Continuing in this fashion, we end up with a linearly independent subset of $\{\mathbf v_1, \dots, \mathbf v_k\}$ still spanning $V$. If there are $\ell < k$ vectors in that subset, then it follows that those $\ell$ vectors form a basis for $V$, and so $\dim V = \ell$. Since $\dim V = k$, this is a contradiction; thus, the $k$ vectors must have been linearly independent.
To establish the latter part of the Proposition, let {v1 , . . . , vk } be a linearly independent set in
V . Let W = Span(v1 , . . . , vk ). Then W ⊂ V and dim W = k. By Lemma 3.8, we have W = V .
4.3.18 This is the same as the proof of Theorem 3.5: If v1 , . . . , vk span V , we’re done; if
not, choose vk+1 ∈ V with vk+1 ∈ / Span(v1 , . . . , vk ). Then {v1 , . . . , vk , vk+1 } is still linearly
independent. Repeat the process; since n linearly independent vectors span Rn , this process must
terminate after a finite number of steps.
4.3.19 Pick a basis {v1 , . . . , vk } for W. By Exercise 18, we can find vectors vk+1 , . . . , vℓ so that
{v1 , . . . , vℓ } forms a basis for V . Hence, dim W = k ≤ ℓ = dim V .
Thus,
d1 w1 + · · · + dk wk + ck+1 vk+1 + · · · + cm vm = 0.
Span(v1 , . . . , vk ). Since {v1 , . . . , vk , vk+1 , . . . , vn } is, by construction, a basis for Rn , we infer that
ck+1 = · · · = cn = 0, as required.
c. This is immediate from part b: dim ker(T ) = k and dim image(T ) = n − k, so
dim ker(T ) + dim image(T ) = n.
" # " # " #
1 0 0 1 1 1
4.3.23 a. Suppose a +b +c = O. Then we have the following
0 1 1 0 1 −1
system:
a
+ c = 0
b + c = 0
a − c = 0 ,
(" # " # " #)
1 0 0 1 1 1
whose only solution is a = b = c = 0. Therefore, , , is linearly
0 1 1 0 1 −1
independent.
b. Suppose $c_1f_1 + c_2f_2 + c_3f_3 = 0$. This means that $c_1t + c_2(t + 1) + c_3(t + 2) = 0$. Evaluating at $t = 0$, $-1$, and $-2$, we obtain the system
\[ c_2 + 2c_3 = 0, \qquad -c_1 + c_3 = 0, \qquad -2c_1 - c_2 = 0. \]
Reducing the coefficient matrix, we find $\begin{pmatrix} 0&1&2 \\ -1&0&1 \\ -2&-1&0 \end{pmatrix} \rightsquigarrow \begin{pmatrix} 1&0&-1 \\ 0&1&2 \\ 0&0&0 \end{pmatrix}$, so we have a nontrivial solution $c_1 = c_3 = 1$, $c_2 = -2$. Thus, $\{f_1, f_2, f_3\}$ is in fact linearly dependent.
c. Suppose $c_1f_1 + c_2f_2 + c_3f_3 = 0$. Then $c_1(1) + c_2\cos t + c_3\sin t = 0$. Evaluating at $t = 0$, $\pi/2$, and $\pi$, we obtain the system
\[ c_1 + c_2 = 0, \qquad c_1 + c_3 = 0, \qquad c_1 - c_2 = 0, \]
\[ c_1 + c_2 + c_3 = 0, \qquad c_1 - c_3 = 0, \qquad c_1 - c_2 + c_3 = 0. \]
Since the reduced echelon form of the coefficient matrix is the identity matrix, the only solution is
c1 = c2 = c3 = 0, and so the set {f1 , f2 , f3 } is linearly independent.
4.3.24 a. Let Eij be the m × n matrix whose ij-entry is 1 and all of whose other entries are
0. Then it is clear that {Eij , i = 1, . . . , m, j = 1, . . . , n} give a basis for Mm×n . Thus, Mm×n is
mn-dimensional.
b. Denote by D, U, and L the sets of diagonal, upper-triangular, and lower-triangular
matrices, respectively. First, we check that these are all subspaces. (i) It is clear that O ∈ D, U, and L. (ii) Since c · 0 = 0 for all scalars c, any scalar multiple of an upper (resp., lower) triangular matrix is upper (resp., lower) triangular and any scalar multiple of a diagonal matrix is diagonal.
(iii) Since 0 + 0 = 0, the sum of two upper triangular matrices is upper triangular, the sum of two
lower triangular matrices is lower triangular, and the sum of two diagonal matrices is diagonal.
Hence U, L, and D are subspaces of Mn×n .
Using the notation given in the solution of part a, we see that:
(1) the matrices Eii , i = 1, . . . , n, form a basis for D; hence, dim D = n.
(2) the matrices Eij , i ≤ j, form a basis for U. Thus, dim U = n+(n−1)+· · · +2+1 = n(n+1)/2.
(3) the matrices Eij , i ≥ j, form a basis for L. As in the preceding case, dim L = n(n + 1)/2.
c. First we check that S is a subspace of Mn×n . (i) Since OT = O, O ∈ S. (ii) If A ∈ S
and c ∈ R then (cA)T = cAT = cA, so cA ∈ S. (iii) If A, B ∈ S then (A + B)T = AT + B T = A + B,
so A + B ∈ S. Similarly, K is a subspace of Mn×n . (i) Since OT = O = −O, O ∈ K. (ii) If A ∈ K
and c ∈ R then (cA)T = cAT = −cA, so cA ∈ K. (iii) If A, B ∈ K, then (A + B)T = AT + B T =
−A − B = −(A + B), so A + B ∈ K.
We see that the matrices Eii , i = 1, . . . , n, and Eij + Eji , i < j, form a basis for S. Thus,
dim S = dim(U) = n(n + 1)/2. It is easy to see that {Eij − Eji : i < j} is a basis for K. Thus,
dim K = (n − 1) + (n − 2) + · · · + 2 + 1 = n(n − 1)/2. As in Exercise 1.4.36, if A ∈ Mn×n is
arbitrary, then we write A = 12 (A + AT ) + 12 (A − AT ), so A ∈ S + K.
(S + T )(u + v) = S(u + v) + T (u + v) = S(u) + S(v) + T (u) + T (v)
= S(u) + T (u) + S(v) + T (v) = (S + T )(u) + (S + T )(v),
(S + T )(cv) = S(cv) + T (cv) = cS(v) + cT (v) = c S(v) + T (v) = c(S + T )(v),
(aT )(u + v) = a T (u + v) = a T (u) + T (v) = aT (u) + aT (v) = (aT )(u) + (aT )(v),
(aT )(cv) = a T (cv) = a cT (v) = acT (v) = c(aT )(v).
The eight properties in the definition of a vector space are all immediate consequences of the
algebraic properties of the real numbers.
(1) $(S + T)(\mathbf v) = S(\mathbf v) + T(\mathbf v) = T(\mathbf v) + S(\mathbf v) = (T + S)(\mathbf v)$.
(2) $\bigl((R + S) + T\bigr)(\mathbf v) = (R + S)(\mathbf v) + T(\mathbf v) = \bigl(R(\mathbf v) + S(\mathbf v)\bigr) + T(\mathbf v) = R(\mathbf v) + \bigl(S(\mathbf v) + T(\mathbf v)\bigr) = R(\mathbf v) + (S + T)(\mathbf v) = \bigl(R + (S + T)\bigr)(\mathbf v)$.
(3) Define $0 \in V^*$ by $0(\mathbf v) = 0$ for all $\mathbf v$. Then $(0 + T)(\mathbf v) = 0(\mathbf v) + T(\mathbf v) = T(\mathbf v)$ for all $\mathbf v$.
(4) Given $T \in V^*$, define $-T$ by $(-T)(\mathbf v) = -T(\mathbf v)$. Then $\bigl(T + (-T)\bigr)(\mathbf v) = T(\mathbf v) + (-T)(\mathbf v) = T(\mathbf v) - T(\mathbf v) = 0 = 0(\mathbf v)$.
(5) For all $a, b \in \mathbb R$, $\bigl(a(bT)\bigr)(\mathbf v) = a\bigl(bT(\mathbf v)\bigr) = (ab)T(\mathbf v) = \bigl((ab)T\bigr)(\mathbf v)$.
(6) For all $a \in \mathbb R$, $\bigl(a(S + T)\bigr)(\mathbf v) = a\bigl(S(\mathbf v) + T(\mathbf v)\bigr) = aS(\mathbf v) + aT(\mathbf v) = \bigl((aS) + (aT)\bigr)(\mathbf v)$.
(7) For all $a, b \in \mathbb R$, $\bigl((a + b)T\bigr)(\mathbf v) = (a + b)T(\mathbf v) = aT(\mathbf v) + bT(\mathbf v) = \bigl((aT) + (bT)\bigr)(\mathbf v)$.
(8) $(1T)(\mathbf v) = 1\cdot T(\mathbf v) = T(\mathbf v)$.
Hence, V ∗ is a vector space.
b. From the definition of the functions fi given in the problem, we have fi (vi ) = 1 and
fi (vj ) = 0 when i 6= j. Suppose c1 f1 + · · · + cn fn = 0. Evaluating this element of V ∗ on vi , we
obtain ci = 0. This holds for i = 1, . . . , n, and so {f1 , . . . , fn } is linearly independent.
Now, suppose $T \in V^*$ is arbitrary. Let $c_i = T(\mathbf v_i)$. Then for any $\mathbf x = a_1\mathbf v_1 + \cdots + a_n\mathbf v_n$, we have $T(\mathbf x) = a_1c_1 + \cdots + a_nc_n = (c_1f_1 + \cdots + c_nf_n)(\mathbf x)$, so $T = c_1f_1 + \cdots + c_nf_n$ and $\{f_1, \dots, f_n\}$ spans $V^*$.
4.3.27 a. Clearly the 0 function is homogeneous of degree k. If f ∈ Pk,n and c is a scalar, then
(cf )(tx) = cf (tx) = c(tk f (x)) = tk (cf )(x). Similarly, if f, g ∈ Pk,n , then f + g ∈ Pk,n , inasmuch
as (f + g)(tx) = f (tx) + g(tx) = tk f (x) + tk g(x) = tk (f + g)(x).
b. It is evident that if i1 + i2 + · · · + in = k, the monomial xi11 xi22 · · · xinn is homogeneous
of degree k and that all such monomials span Pk,n . Now we must establish linear independence.
Let’s argue by induction on n. When n = 1, there is nothing to prove. Now suppose we assume
the corresponding result for $\mathcal P_{k,n-1}$ (for all $k$) and we are given
\[ \sum_{i_1 + i_2 + \cdots + i_n = k} c_{i_1i_2\dots i_n}\,x_1^{i_1}x_2^{i_2}\cdots x_n^{i_n} = 0. \]
into n sections. The lines are allowed to be adjacent (with no dots between them) or to fall at the
beginning or end of the array. The number of dots between the (i − 1)th and ith vertical lines is
going to give us the exponent on xi . To count the number of ways of creating such configurations,
it is easier, as the figure suggests, to consider an array of k + (n − 1) dots and choose n − 1 of them
(these will be the dividing lines). Thus, there are n−1+kn−1 = n−1+k
k different monomials.
d. We have Σ_{i=0}^{k} (n−1+i choose i) = Σ_{i=0}^{k} dim Pi,n = dim Pk,n+1 = (n+k choose k), because there is a one-to-one correspondence between monomials of degree k or less in n variables and monomials of degree k in n + 1 variables: if i1 + i2 + · · · + in ≤ k, then
x1^{i1} x2^{i2} · · · xn^{in} ←→ x1^{i1} x2^{i2} · · · xn^{in} x_{n+1}^{k−(i1+i2+···+in)}.
4.4.1 Let’s show that R(B) ⊂ R(A) if B is obtained by performing any row operation on A.
Obviously, a row interchange doesn’t affect the span. If Bi = cAi and all the other rows are the
same, c1 B1 +· · ·+ci Bi +· · ·+cm Bm = c1 A1 +· · ·+(ci c)Ai +· · ·+cm Am , so any vector in R(B) is in
R(A). If Bi = Ai + cAj and all the other rows are the same, then c1 B1 + · · · + ci Bi + · · · + cm Bm =
c1 A1 + · · · + ci (Ai + cAj ) + · · · + cm Am = c1 A1 + · · · + ci Ai + · · · + (cj + cci )Aj + · · · + cm Am , so
once again any vector in R(B) is in R(A).
To see that R(A) ⊂ R(B), we observe that the matrix A is obtained from B by performing the (inverse) row operation (this is why we need c ≠ 0 for the second type of row operation). Since R(B) ⊂ R(A) and R(A) ⊂ R(B), we have R(A) = R(B).
4.4.2 a. Reducing the augmented matrix [A | b] yields
[ 1  2  1  1 | b1 ]      [ 1 2 4 5 | b1           ]
[ −1 0  3  4 | b2 ]  ⇝  [ 0 2 4 5 | b2 + b1      ]
[ 2  2 −2 −3 | b3 ]      [ 0 0 0 0 | b3 − b1 + b2 ]
so C(A) = {b ∈ R³ : b1 − b2 − b3 = 0}.
b. Since N(Aᵀ) = C(A)⊥, it is enough to find C(A)⊥. From part a, we know (1, −1, −1) is in C(A)⊥, but does it span? We know from part a that dim C(A) = 2, so dim(C(A)⊥) = 1, and the answer is yes.
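The reduction in Exercise 4.4.2(a) is easy to sanity-check numerically. The snippet below is not part of the original solution; it simply verifies with NumPy that (1, −1, −1) annihilates the column space of the matrix A from part (a), and that dim C(A) = 2, so this single constraint describes C(A):

```python
import numpy as np

# The matrix from Exercise 4.4.2(a)
A = np.array([[ 1, 2,  1,  1],
              [-1, 0,  3,  4],
              [ 2, 2, -2, -3]], dtype=float)

# (1, -1, -1) should annihilate every column of A, i.e. every b in C(A)
# satisfies b1 - b2 - b3 = 0; since rank A = 2, one constraint suffices.
residual = np.array([1.0, -1.0, -1.0]) @ A
rank = np.linalg.matrix_rank(A)
```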
4.4.3 a. A = [ 1 2 3 ; 2 4 6 ] reduces to U = R = [ 1 2 3 ; 0 0 0 ], with E = [ 1 0 ; −2 1 ], so R(A) has basis {(1, 2, 3)}, C(A) has basis {(1, 2)}, N(A) has basis {(−2, 1, 0), (−3, 0, 1)}, and N(Aᵀ) has basis {(−2, 1)}.
b. Since U = [ 2 1 3 ; 0 1 −1 ; 0 0 0 ], R = [ 1 0 2 ; 0 1 −1 ; 0 0 0 ], and E = [ 1 0 0 ; −2 1 0 ; 3 −3 2 ], R(A) has basis {(2, 1, 3), (0, 1, −1)}, C(A) has basis {(2, 4, 3), (1, 3, 3)}, N(A) has basis {(−2, 1, 1)}, and N(Aᵀ) has basis {(3, −3, 2)}.
c. Since U = [ 1 −2 1 0 ; 0 0 1 −1 ] and R = [ 1 −2 0 1 ; 0 0 1 −1 ], R(A) has basis {(1, −2, 1, 0), (0, 0, 1, −1)}, C(A) has basis {(1, 2), (1, 3)}, N(A) has basis {(2, 1, 0, 0), (−1, 0, 1, 1)}, and N(Aᵀ) = {0}.
d. Since U = [ 1 −1 1 1 0 ; 0 1 1 0 1 ; 0 0 0 2 −2 ; 0 0 0 0 0 ], R = [ 1 0 2 0 2 ; 0 1 1 0 1 ; 0 0 0 1 −1 ; 0 0 0 0 0 ], and E = [ 1 0 0 0 ; −1 1 0 0 ; 2 −2 1 0 ; 0 2 −1 2 ], R(A) has basis {(1, −1, 1, 1, 0), (0, 1, 1, 0, 1), (0, 0, 0, 2, −2)}, C(A) has basis {(1, 1, 0, −1), (−1, 0, 2, 1), (1, 1, 2, 0)}, N(A) has basis {(−2, −1, 1, 0, 0), (−2, −1, 0, 1, 1)}, and N(Aᵀ) has basis {(0, 2, −1, 2)}.
e. Since U = [ 1 1 0 1 −1 ; 0 0 2 −2 2 ; 0 0 0 0 0 ; 0 0 0 0 0 ], R = [ 1 1 0 1 −1 ; 0 0 1 −1 1 ; 0 0 0 0 0 ; 0 0 0 0 0 ], and E = [ 1 0 0 0 ; −1 1 0 0 ; −1 −1 1 0 ; 2 −1 0 1 ], R(A) has basis {(1, 1, 0, 1, −1), (0, 0, 1, −1, 1)}, C(A) has basis {(1, 1, 2, −1), (0, 1, 1, 1)}, N(A) has basis {(−1, 1, 0, 0, 0), (−1, 0, 1, 1, 0), (1, 0, −1, 0, 1)}, and N(Aᵀ) has basis {(−1, −1, 1, 0), (2, −1, 0, 1)}.
f. Here U = [ 1 1 0 5 0 −1 ; 0 1 1 3 −2 0 ; 0 0 0 0 7 −7 ; 0 0 0 0 0 0 ], R = [ 1 0 −1 2 0 1 ; 0 1 1 3 0 −2 ; 0 0 0 0 1 −1 ; 0 0 0 0 0 0 ], and E = [ 1 0 0 0 ; 0 1 0 0 ; 1 −3 1 0 ; −1 −1 −1 1 ], so we see that R(A) has basis {(1, 0, −1, 2, 0, 1), (0, 1, 1, 3, 0, −2), (0, 0, 0, 0, 1, −1)}, C(A) has basis {(1, 0, −1, 0), (1, 1, 2, 4), (0, −2, 1, −1)}, N(A) has basis {(1, −1, 1, 0, 0, 0), (−2, −3, 0, 1, 0, 0), (−1, 2, 0, 0, 1, 1)}, and N(Aᵀ) has basis {(−1, −1, −1, 1)}.
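For solutions like 4.4.3(d), where several subspace bases are read off from U and E, a numerical cross-check is reassuring. The sketch below is not part of the original solution; it recovers A from EA = U, using the matrices printed in part (d), and confirms that the stated basis vectors for N(A) and N(Aᵀ) are annihilated:

```python
import numpy as np

# U and E as printed in the solution to Exercise 4.4.3(d)
U = np.array([[1, -1, 1, 1,  0],
              [0,  1, 1, 0,  1],
              [0,  0, 0, 2, -2],
              [0,  0, 0, 0,  0]], dtype=float)
E = np.array([[ 1,  0,  0, 0],
              [-1,  1,  0, 0],
              [ 2, -2,  1, 0],
              [ 0,  2, -1, 2]], dtype=float)

A = np.linalg.inv(E) @ U   # recover A from EA = U

# basis vectors for N(A) and N(A^T) as stated in the solution
nullspace = np.array([[-2, -1, 1, 0, 0],
                      [-2, -1, 0, 1, 1]], dtype=float).T
left_null = np.array([0, 2, -1, 2], dtype=float)

null_residual = A @ nullspace   # both columns should be 0
left_residual = left_null @ A   # should be the zero row
```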
4.4.4 a. Reducing the augmented matrix [A | b] yields
[ 3 −1 | b1 ; 0 0 | 2b1 − b2 ; 0 0 | 3b1 + b3 ].
This gives the constraint equations 2b1 − b2 = 0 and 3b1 + b3 = 0 for C(A), and N(A) = Span{(1, 3)}. Set
X = [ 2 −1 0 ; 3 0 1 ] and Y = [ 1 ; 3 ].
Then C(A) = N(X) and N(A) = C(Y).
b. X = [ 3 −2 1 ], Y = (−1, 1, 1).
c. X = [ 1 0 −1 0 ; 2 −1 0 −1 ], Y = (−2, 1, 1, 0).
4.4.5 a. First, A must be 3 × 3; let its columns be a1, a2, a3. Since (0, 1, 0) and (1, 0, 1) are in N(A), we know that a2 = 0 and a1 = −a3. In other words, a1 must span C(A). But (1, 1, 1) and (0, 1, 1) are nonparallel, so no such matrix can exist.
A slightly more sophisticated argument would be to show that A must have rank at least 2
since its column space is (at least) a plane, but then its nullspace cannot be (at least) a plane.
b. [ 1 0 −1 −1 ; 0 1 0 0 ; 0 1 0 0 ]
c. One example is [ 2 −1 0 ; 0 0 0 ; 2 −1 0 ].
d. The matrix [ 1 1 −1 ] works.
e. [ 2 0 1 ; 0 2 1 ; 2 2 2 ] works.
f. [ 0 1 ; 0 0 ]
g. This is impossible by Corollary 4.6, since 1 + 1 ≠ 3.
4.4.6 a. A = [ 0 0 1 ; 0 0 0 ; 0 0 0 ]. C(A) = Span{(1, 0, 0)}, which is a subset of N(A).
b. A = [ 0 1 0 ; 0 0 1 ; 0 0 0 ]. N(A) = Span{(1, 0, 0)}, which is clearly a subset of C(A).
4.4.10 Since U is a matrix in echelon form, its last m − r rows are 0. When we consider the matrix product A = BU, we see that every column of A is a linear combination of the first r columns of B; hence, these r column vectors span C(A). Since dim C(A) = r, by Proposition 3.9, these column vectors must give a basis.
4.4.11 a. C(A) = Span{(1, 1)} and R(A) = Span{(1, 2, 3)}, so, for any b = c(1, 1) ∈ C(A), we are looking for s ∈ R so that A(s(1, 2, 3)) = b. But A(1, 2, 3) = (14, 14), so s = c/14 and x = (c/14)(1, 2, 3).
b. Here C(A) = R² and R(A) = Span{(1, 1, 1), (0, 1, −1)}, so, given b = (b1, b2), we are looking for x = s(1, 1, 1) + t(0, 1, −1) so that Ax = b. This yields the system
[ 3 0 ; 0 2 ](s, t) = (b1, b2),
so s = b1/3 and t = b2/2. Thus, x = (b1/3)(1, 1, 1) + (b2/2)(0, 1, −1).
An alternative solution is as follows. It is easy to see that x0 = (b1 − b2, b2, 0) is a solution and that a = (−2, 1, 1) gives a basis for N(A) = R(A)⊥. Then
x = x0 − proj_a x0 = (b1 − b2, b2, 0) − ((−2b1 + 3b2)/6)(−2, 1, 1) = (b1/3, b1/3 + b2/2, b1/3 − b2/2) ∈ R(A).
4.4.13 a. By Exercise 12b, C(AB) ⊂ C(A), so, by Exercise 4.3.19 and Theorem 4.5,
rank(AB) = dim C(AB) ≤ dim C(A) = rank(A).
b. This is immediate from Exercise 12d.
c. Exercise 12a says that N(B) ⊂ N(AB), so dim(N(B)) ≤ dim(N(AB)). But
rank(AB) = p − dim(N(AB)) ≤ p − dim(N(B)) = rank(B).
d. This is immediate from Exercise 12c.
e. By part a, rank(A) ≥ rank(AB) = n. But if A is m × n, we know that rank(A) ≤ n,
hence rank(A) = n. By part c, we know rank(B) ≥ rank(AB) = n, but we also know that because
B is n × p, we must have rank(B) ≤ n. Hence rank(B) = n.
4.4.14 Exercise 1.4.32 tells us that N(AT A) ⊂ N(A), and Exercise 12 tells us that N(A) ⊂
N(AT A). Therefore, N(AT A) = N(A).
4.4.15 a. By Exercise 12, N(A) ⊂ N(AT A). So it suffices to prove that N(AT A) ⊂ N(A).
Suppose x ∈ N(AT A). Then Ax ∈ N(AT ); on the other hand, Ax ∈ C(A); but we know N(AT ) =
C(A)⊥ , so N(AT ) ∩ C(A) = {0}. Therefore, Ax = 0 and x ∈ N(A), as required.
b. rank(A) = n − dim N(A) = n − dim N(AT A) = rank(AT A).
c. By part b, dim C(AT A) = dim C(AT ). Moreover, by Exercise 12, C(AT A) ⊂ C(AT ).
Therefore, by Lemma 3.8, C(AT A) = C(AT ).
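The conclusion of Exercise 4.4.15(b), rank(AᵀA) = rank(A), can be illustrated numerically. The sketch below is only a spot check, not a proof, and the random rank-2 matrix is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(0)
# a 5x4 matrix of rank 2, built as a product of rank-2 factors
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))

rank_A = np.linalg.matrix_rank(A)
rank_ATA = np.linalg.matrix_rank(A.T @ A)
```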
4.4.16 a. Suppose x ∈ C(A). Then, by Proposition 4.1, x = Av for some v ∈ Rn . But then
Ax = A(Av) = A2 v = Av = x. Thus, C(A) ⊂ {x ∈ Rn : x = Ax}. The other inclusion follows
immediately, as Ax ∈ C(A), once again by Proposition 4.1.
b. If x = u − Au for some u ∈ Rn , then Ax = Au − A2 u = Au − Au = 0. Thus,
{x : x = u − Au for some u ∈ Rn } ⊂ N(A). On the other hand, if x ∈ N(A), then Ax = 0, so
x = x − Ax.
c. Let x ∈ C(A) ∩ N(A). Since x ∈ C(A) we know by part a that x = Ax. Since
x ∈ N(A), then Ax = 0, and so x = Ax = 0.
d. Given an arbitrary x ∈ Rn , write x = Ax + (x − Ax). Since Ax ∈ C(A) and
x − Ax ∈ N(A), we have shown that every vector in Rn can be written as the sum of vectors in
C(A) and N(A). That is, Rn = C(A) + N(A).
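The decomposition in Exercise 4.4.16(d) can be illustrated with a concrete idempotent matrix. In the sketch below (not part of the original solution), the matrix B and the vector x are arbitrary choices; A is the standard orthogonal-projection matrix onto C(B), which satisfies A² = A:

```python
import numpy as np

# an idempotent matrix: orthogonal projection onto C(B) for an
# arbitrarily chosen 3x2 matrix B of rank 2
B = np.array([[1., 0.],
              [1., 1.],
              [0., 2.]])
A = B @ np.linalg.inv(B.T @ B) @ B.T

x = np.array([3., -1., 2.])      # an arbitrary vector
c = A @ x                        # the C(A)-component: satisfies Ac = c
n = x - A @ x                    # the N(A)-component: satisfies An = 0
```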
Here A = uvᵀ, so R(A) = Span(v) and AᵀA = (vuᵀ)(uvᵀ) = ∥u∥²vvᵀ. For x = sv we then have AᵀAx = ∥u∥²vvᵀ(sv) = s∥u∥²(v · v)v = s∥u∥²∥v∥²v, so AᵀAx = x for all x ∈ R(A) if and only if ∥u∥²∥v∥² = 1, i.e., ∥u∥∥v∥ = 1.
Now, supposing ∥u∥∥v∥ = 1, we have A = uvᵀ = uvᵀ/(∥u∥∥v∥) = (u/∥u∥)(v/∥v∥)ᵀ. Letting û = u/∥u∥ and v̂ = v/∥v∥, we have ∥û∥ = ∥v̂∥ = 1 and A = ûv̂ᵀ. Since ∥v̂∥ = 1, notice that for x ∈ Rⁿ the projection of x onto v̂ is just (x · v̂)v̂ = (v̂v̂ᵀ)x. Thus, T(x) is given by projecting x onto v̂ and then substituting û ∈ Rᵐ for v̂ ∈ Rⁿ.
4.5.1 a. Although xy = 0 is a graph locally away from the origin, there is no neighborhood of
the origin on which it is a graph.
b. This set is the union of the curves xy = π/6 + 2πn, n ∈ Z, and xy = 5π/6 + 2πn, n ∈ Z.
Thus, in a neighborhood of each point satisfying this equation, it is a graph.
4.5.2 In each case, the solution set M = f⁻¹(0) will be a smooth curve (1-dimensional manifold) provided that for each a ∈ M we have ∇f(a) ≠ 0. Equivalently, we must check that if ∇f(x) = 0, then x ∉ M.
a. ∇f = (1 − 3x², 2y) = 0 if and only if 1 − 3x² = 0 and y = 0. But for such a point (x, y), f(x, y) = y² − x³ + x = x(1 − x²) ≠ 0, so M is a smooth curve.
b. ∇f = (−2x − 3x², 2y) = 0 if and only if x(2 + 3x) = 0 and y = 0. We note that f(0, 0) = 0 and f(−2/3, 0) ≠ 0, so the origin is a "trouble point." (See Figure 1.4 of Chapter 2.)
c. Consider f : R³ → R² given by f(x, y, z) = (z − xy, y − x²). Then Df = [ −y −x 1 ; −2x 1 0 ]. Note that Df has rank 2 everywhere (neither row vector can be a scalar multiple of the other). Thus, M is a smooth curve. (See Figure 1.5 of Chapter 2.)
d. Consider f : R³ → R² given by f(x, y, z) = (x² + y² + z² − 1, x² − x + y²). Then
Df = [ 2x 2y 2z ; 2x − 1 2y 0 ] ⇝ [ 1 0 2z ; 2x 2y 2z ] ⇝ [ 1 0 2z ; 0 y z(1 − 2x) ],
so Df has rank < 2 if and only if y = z(1 − 2x) = 0. Substituting in the original equations, we find that (1, 0, 0) is the only "trouble point."
e. Consider f : R³ → R² given by f(x, y, z) = (x² + y² + z² − 1, z² − xy). Then Df = [ 2x 2y 2z ; −y −x 2z ]. When z = 0, this matrix has rank < 2 if and only if x² = y². Substituting in the second equation, we find x = y = z = 0, which does not satisfy the first equation. When z ≠ 0, in order for rank(Df) < 2, we must have 2x + y = 2y + x = 0, which again leads only to the origin. Thus, M is a smooth curve.
4.5.3 a. Note that f is C¹ and ∂f/∂z = x cos(xz) + eᶻ, so (∂f/∂z)(a) = 2 ≠ 0. It follows from Theorem 5.1 that there is a C¹ function φ defined in a neighborhood of (1, −1) so that the surface is given by z = φ(x, y) near a.
b. We have ∂f/∂x = y² + z cos(xz) and ∂f/∂y = 2xy. By Lemma 5.2,
(∂φ/∂x)(1, −1) = −((∂f/∂x)/(∂f/∂z))(a) = −1/2 and (∂φ/∂y)(1, −1) = −((∂f/∂y)/(∂f/∂z))(a) = 1.
c. On one hand, the normal vector of the surface at a is ∇f(a) = (1, −2, 2), so the tangent plane is given by 0 = ∇f(a) · (x − a) = (x − 1) − 2(y + 1) + 2z, i.e., x − 2y + 2z = 3. Alternatively, writing the surface locally as the graph of φ yields the equation
z = Dφ(1, −1)(x − 1, y + 1) = −(1/2)(x − 1) + (y + 1) = −(1/2)x + y + 3/2.
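As a quick symbolic check of part (c), not part of the original solution, one can verify that the graph of the linearization from part (b) lies in the tangent plane x − 2y + 2z = 3:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
a = (1, -1, 0)          # the point a, with phi(1, -1) = 0
grad = (1, -2, 2)       # the gradient computed in part (c)

# tangent plane: grad . (x - a) = 0
plane = sum(g * (v - av) for g, v, av in zip(grad, (x, y, z), a))
# linearization from part (b): z = -(x - 1)/2 + (y + 1)
z_graph = -(x - 1) / 2 + (y + 1)

check = sp.simplify(plane.subs(z, z_graph))  # identically 0
```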
4.5.4 Since ∂h/∂x₂ ≠ 0, the equation h(x) = 0 locally determines x₂ as a C¹ function of x₁, viz., x₂ = ψ(x₁). Substituting, we obtain z = φ(x, y) = xψ(y/x). Then we have
x ∂φ/∂x + y ∂φ/∂y = x(ψ(y/x) + x · (−y/x²)ψ′(y/x)) + y(x · (1/x)ψ′(y/x)) = xψ(y/x) = φ(x, y).
Alternatively, we can apply the Implicit Function Theorem to the function f(x, y, z) = h(y/x, z/x). By the chain rule, we have ∂f/∂z = (1/x)(∂h/∂x₂)(y/x, z/x) ≠ 0, so there is a C¹ function φ as asserted. Now we apply Lemma 5.2 to see that
∂φ/∂x = −(∂f/∂x)/(∂f/∂z) = −((−y/x²)(∂h/∂x₁) + (−z/x²)(∂h/∂x₂))/((1/x)(∂h/∂x₂)) = (y(∂h/∂x₁) + z(∂h/∂x₂))/(x(∂h/∂x₂)),
∂φ/∂y = −(∂f/∂y)/(∂f/∂z) = −((1/x)(∂h/∂x₁))/((1/x)(∂h/∂x₂)) = −(∂h/∂x₁)/(∂h/∂x₂).
Therefore,
x ∂φ/∂x + y ∂φ/∂y = (y(∂h/∂x₁) + z(∂h/∂x₂))/(∂h/∂x₂) − y(∂h/∂x₁)/(∂h/∂x₂) = z = φ(x, y),
as desired.
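The Euler-type identity x ∂φ/∂x + y ∂φ/∂y = φ for φ(x, y) = xψ(y/x) can also be confirmed symbolically. In the sketch below, which is not part of the original solution, ψ is left as an undetermined function:

```python
import sympy as sp

x, y = sp.symbols('x y', positive=True)
psi = sp.Function('psi')  # an arbitrary C^1 function of one variable

phi = x * psi(y / x)      # phi(x, y) = x psi(y/x)
euler = x * sp.diff(phi, x) + y * sp.diff(phi, y) - phi
check = sp.simplify(sp.expand(euler))  # should vanish identically
```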
4.5.5 Let f : Rⁿ → R be given by f(x) = ∥x∥². Then Df(x) = 2xᵀ ≠ 0 at every point of Sⁿ⁻¹. Therefore, Sⁿ⁻¹ is a smooth hypersurface, i.e., an (n − 1)-dimensional manifold.
4.5.6 Df = [ 12x²z − 6yz − 6xy²   −6xz + 12y² − 6x²y   2z + 4x³ − 6xy ]. Df(x) = 0 if and only if all three entries vanish. Eliminating z using the last equation (z = 3xy − 2x³), the first two equations give x(x² − y)² = 0 and (x² − y)² = 0, respectively. (x = 0 leads to y = z = 0 and nothing else.) Thus, Df fails to have rank 1 at precisely the points of the form (x, x², x³), x ∈ R, and these points do indeed lie on M. Away from these points, M is a smooth surface. As pictured on p. 480 of the text, M is the tangent developable of the twisted cubic curve, i.e., the locus of tangent lines of this curve. It has a cuspidal edge along the curve and is smooth everywhere else.
4.5.7 Consider F : R³ → R² given by F(x, y, z) = (x² + 2y² + 3z² − 9, x² + y² − z²). Then
DF = 2 [ x 2y 3z ; x y −z ], which row-reduces to [ x 0 −5z ; 0 y 4z ],
and this matrix has rank 2 unless (at least) two of x, y, and z are 0. But no such point lies on F = 0, and so the intersection is a smooth curve. Since N(DF(a)) is spanned by v = (5√2, −4√2, 1), the tangent line of the curve is the line through a with direction vector v.
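A symbolic spot check of 4.5.6, not in the original solution: the partial derivatives quoted above determine f up to an additive constant, and fixing the constant so that the twisted cubic lies on the surface gives the polynomial used below (an assumption consistent with the displayed Df). The gradient does vanish along (t, t², t³):

```python
import sympy as sp

x, y, z, t = sp.symbols('x y z t')
# Polynomial whose partials are 12x^2 z - 6yz - 6xy^2, etc.;
# the constant term is chosen so the twisted cubic lies on f = 0.
f = 4*x**3*z - 3*x**2*y**2 - 6*x*y*z + 4*y**3 + z**2

grad = [sp.diff(f, v) for v in (x, y, z)]
cubic = {x: t, y: t**2, z: t**3}

f_on_curve = sp.expand(f.subs(cubic))                     # 0
grad_on_curve = [sp.expand(g.subs(cubic)) for g in grad]  # all 0
```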
4.5.8 We see that rank(DF(x)) < 1 at the points x = ±(1, 0, 0). Indeed, when a = b, the two separate curves pictured in Figure 5.5 meet at those two points, forming an "X" locally at each of them.
4.5.9 The matrix [ x y ; z w ] ∈ M2×2 is singular if and only if xw − yz = 0. Letting f([ x y ; z w ]) = xw − yz, we see that Df = [ w −z −y x ] has rank < 1 only at 0. Thus, the set of nonzero singular 2 × 2 matrices is a smooth hypersurface in M2×2.
4.5.10 a. The curve looks much like that in Example 1. Here ∂f/∂y = 0 at the points (−1, 1/2) and (1, −1/2).
[Figure: the curve, with the points (−1, 1/2) and (1, −1/2) marked.]
For the limit computation, note that
lim_{x→1⁻} φ₂′(x) = lim_{x→1⁻} sin((1/3) arccos x) · 1/(3√(1 − x²)),
which is a 0/0 form once written as sin((1/3) arccos x)/(3√(1 − x²)). Applying l'Hôpital's Rule,
lim_{x→1⁻} φ₂′(x) = lim_{x→1⁻} ((1/3) cos((1/3) arccos x) · (−1/√(1 − x²)))/(3(−x/√(1 − x²))) = (1/9) lim_{x→1⁻} cos((1/3) arccos x)/x = 1/9,
as required.
4.5.13 The lines ℓ₁ and ℓ₂ are skew (neither parallel nor intersecting), so, in fact, through "most" points P not on either of them there is a unique line that intersects both ℓ₁ and ℓ₂. Suppose P = (a, b, c). Then the plane Π containing ℓ₁ and P has normal vector (1, 0, 0) × (a, b, c) = (0, −c, b). Thus, Π has the equation −cy + bz = 0, and the point of intersection of Π and ℓ₂ is Q = (c/b, 1, c/b). (Note that if b = 0, then Π is parallel to ℓ₂.)
The line PQ therefore has the parametric representation x = (a, b, c) + v(c − ab, b(1 − b), c(1 − b)). Now, taking P to be an arbitrary point on ℓ₃, we find that a point on PQ is of the form
x = (1 + u, 2, 2(1 + u)) + v(0, 2, 2(1 + u)) = (1 + u, 2(1 + v), 2(1 + u)(1 + v)).
Therefore, the surface formed by all the lines intersecting ℓ₁, ℓ₂, and ℓ₃ has the equation z = xy. This is a saddle surface, a doubly ruled surface. (See the solution of Exercise 2.1.9.)
4.5.14 Suppose a = (x₀, y₀) ∈ X × Y. Since X is a k-dimensional manifold in Rⁿ, there is a neighborhood U of x₀ in Rⁿ so that X ∩ U is the graph of a C¹ function f on some open set U₀ in a coordinate k-plane in Rⁿ. Similarly, since Y is an ℓ-dimensional manifold in Rᵖ, there is a neighborhood V of y₀ in Rᵖ so that Y ∩ V is the graph of a C¹ function g on some open subset V₀ in a coordinate ℓ-plane in Rᵖ. It follows that (X × Y) ∩ (U × V) is given as the graph of f × g on U₀ × V₀ in a neighborhood of a.
Since the direction vector of the line is constant, we see that as b varies, we get parallel lines ℓ_b ⊂ Rⁿ⁺¹. What's more, given b, b′ ∈ Rⁿ, the distance between the lines ℓ_b and ℓ_{b′} is at most ∥E(b − b′)∥ ≤ ∥E∥∥b − b′∥, so it is quite reasonable to say the lines vary continuously with b.
In fact, the set of lines in Rn+1 is a manifold. To have a unique way of specifying a line, we
choose a unit direction vector v ∈ S n and the point p ∈ (Span(v))⊥ through which it passes. Thus,
locally, at least, the set of lines looks like S n × Rn .
b. One obvious generalization is to consider matrices of rank r. Then we get parallel
affine (n + 1 − r)-dimensional subspaces and the same argument applies. One might also allow the
matrix A to vary. Then one must decide what it means for two k-dimensional subspaces of Rn+1
to be “close”; this leads to one of the most important constructions in modern mathematics, the
Grassmannian.
CHAPTER 5
Extremum Problems
5.1. Compactness and the Maximum Value Theorem
5.1.3 The standard matrix of T is aᵀ for some a ∈ Rⁿ, and therefore T(x) = a · x. By the Cauchy-Schwarz inequality, we have |T(x)| ≤ ∥a∥ whenever ∥x∥ = 1, with equality holding when x is a positive scalar multiple of a. Therefore, ∥T∥ = ∥a∥.
5.1.4 a. For any x ∈ R² with ∥x∥ = 1, we have Ax = ((1, 1) · x)(1, 1), and so ∥Ax∥ = √2 |(1, 1) · x| ≤ 2∥x∥. Equality holds when x = (1/√2)(1, 1), and so ∥A∥ = 2, as required.
b. For any x ∈ R² with ∥x∥ = 1, we have Ax = ((3, 4) · x)(1, 1), and so ∥Ax∥ = √2 |(3, 4) · x| ≤ 5√2 ∥x∥. Equality holds when x = ±(1/5)(3, 4), and so ∥A∥ = 5√2, as required.
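Since the operator norm of a matrix is its largest singular value, the two norms in Exercise 5.1.4 can be checked numerically. Reading the two matrices as [1 1; 1 1] and [3 4; 3 4] (an assumption recovered from the displayed computations), the following sketch is a spot check, not part of the original solution:

```python
import numpy as np

A1 = np.array([[1., 1.], [1., 1.]])
A2 = np.array([[3., 4.], [3., 4.]])

# operator norm = largest singular value
norm1 = np.linalg.norm(A1, 2)   # expect 2
norm2 = np.linalg.norm(A2, 2)   # expect 5*sqrt(2)
```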
5.1.5 The first inequality was proved in Exercise 2.3.7. For the second inequality, we note that
Σ_{i,j} a_{ij}² = Σ_j ∥a_j∥² = Σ_j ∥Ae_j∥² ≤ n∥A∥².
5.1.6 For any x with ∥x∥ = 1, we have ∥(S ∘ T)(x)∥ = ∥S(T(x))∥ ≤ ∥S∥∥T(x)∥ ≤ ∥S∥∥T∥. Thus, ∥S ∘ T∥ ≤ ∥S∥∥T∥.
5.1.7 Suppose x₀ ∈ Rⁿ is a unit vector so that ∥Ax₀∥ = ∥A∥. Choose a unit vector y₀ ∈ Rᵐ that is a positive scalar multiple of Ax₀. Then we have y₀ · Ax₀ = ∥Ax₀∥ = ∥A∥.
5.1.9 First, S is closed: Any convergent sequence of points in S has a subsequence converging
to a point of S and hence, by Exercise 2.2.6, must converge itself to that point of S. Next, S
is bounded: If not, we could take xk ∈ S with kxk k > k; then {xk } would have no convergent
subsequence.
5.1.10 Suppose {b_k} is a sequence of points in f(X). There are points a_k ∈ X so that b_k = f(a_k). Since X is compact, there is a convergent subsequence {a_{k_j}}; say a_{k_j} → a ∈ X. By Proposition 3.6 of Chapter 2, the corresponding subsequence {b_{k_j}} must therefore converge to b = f(a), so b ∈ f(X). It follows from Exercise 9 that f(X) is compact.
5.1.11 Form a sequence {a_k} by choosing a_k ∈ S_k. Since this is a sequence in the compact set S₁, by Theorem 1.1, there is a subsequence {a_{k_j}} converging to some point x ∈ S₁. Since a_{k_j} ∈ S_k for all k ≤ k_j, we can chop off the first j₀ terms of this subsequence and have a convergent sequence that lives in S_k for all k ≤ k_{j₀}. Since S_k is closed, it follows that x ∈ S_k for all k ≤ k_{j₀}. Letting j₀ → ∞, we infer that x ∈ S_k for all k ∈ N.
5.1.12 Following the hint, suppose that for every k ∈ N the statement X ⊂ U₁ ∪ · · · ∪ U_k were false. Then for every k ∈ N we could choose x_k ∈ X so that x_k ∉ U₁ ∪ · · · ∪ U_k. Since X is compact, the sequence {x_k} has a convergent subsequence x_{k_j} → x₀. Now x₀ ∈ U_ℓ for some ℓ. Since U_ℓ is open, it follows that, for sufficiently large j, we have x_{k_j} ∈ U_ℓ. But as soon as k_j ≥ ℓ (which must happen eventually), this contradicts the hypothesis that x_{k_j} ∉ U₁ ∪ · · · ∪ U_ℓ ∪ · · · ∪ U_{k_j}.
5.1.13 Following the hint, suppose that there were no such number δ > 0. Then for every δ > 0, in particular for δ = 1/k, k ∈ N, there would be some x_k ∈ X so that B(x_k, 1/k) is contained in none of the open sets U_j. Since X is compact, there is a convergent subsequence x_{k_j} → x₀. Now, x₀ ∈ U_ℓ for some ℓ, and since U_ℓ is open, there is some r > 0 so that B(x₀, r) ⊂ U_ℓ. Choose j large enough so that ∥x_{k_j} − x₀∥ < r/2 and 1/k_j < r/2. Then it follows from the triangle inequality that for such a j, we have B(x_{k_j}, 1/k_j) ⊂ U_ℓ, contradicting our hypothesis.
We have f(a₀) = −1, f(a₁) = 2 − 2√2 ≈ −0.83, f(a₂) = 2 + 2√2 ≈ 4.83, f(a₃) = f(a₄) = 5. Thus, the minimum temperature on D is −1 and the maximum temperature is 5.
5.2.5 Consider the two rectangles as shown in the figure. [Figure: the two inscribed rectangles, with x, 1 − x, u, and 1 − u marked.] The area of the lower rectangle is x(1 − x) and that of the upper rectangle is u((1 − u) − (1 − x)) = u(x − u). The sum of the areas gives us our function f(x, u) = x(1 − x) + u(x − u), defined on the domain X = {(x, u) : 0 ≤ u ≤ x ≤ 1}. Since X is compact and f continuous, we are guaranteed a global maximum value. Since Df = [ 1 − 2x + u   x − 2u ], we see that the only critical point is a = (2/3, 1/3), and f(a) = 1/3. On the other hand, on the boundary of X, the situation degenerates to a single rectangle and the maximum area of an inscribed rectangle is 1/4. Thus, we obtain the maximum area by taking the width of the upper rectangle to be 1/3 and that of the lower rectangle to be 2/3.
5.2.6 Let x, y, and z denote the length, width, and height, respectively, of the box, measured in ft. Given that xy + 2z(x + y) = 12, we wish to maximize the volume V = xyz of the box. We therefore consider the function
f(x, y) = (1/2)xy(12 − xy)/(x + y).
The domain of f is X = {(x, y) ≠ (0, 0) : 0 ≤ xy ≤ 12}. Although f is continuous on X, the set X is not compact, so we are not a priori guaranteed a global maximum point for f.
Nevertheless, we first find the critical point(s) of f. We have
Df = 1/(2(x + y)²) [ y²(12 − x² − 2xy)   x²(12 − y² − 2xy) ],
and so the only critical point of f in X is at a = (2, 2). To argue that a is in fact the global maximum point of f on X, we proceed as in Example 5. Note, first of all, that lim_{x→0} f(x) = 0, so we may think of f as being a continuous function defined on the compact set
S = {(x, y) : 0 ≤ x ≤ 24, 0 ≤ y ≤ min(12/x, 24)}.
The Maximum Value Theorem guarantees us a global maximum of the continuous function f on S. Now, a is the sole critical point of f in the interior of S and f(a) = 4. Moreover, it is easy to check that on the boundary of or outside S we have f(x, y) ≤ 3. Thus, a is the global maximum point of f. The largest possible box is 2 ft × 2 ft × 1 ft.
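A symbolic spot check of 5.2.6, not part of the original solution: the gradient of f vanishes at (2, 2), and the volume there is 4:

```python
import sympy as sp

x, y = sp.symbols('x y', positive=True)
# volume of the box as a function of the base, from xy + 2z(x + y) = 12
f = x * y * (12 - x * y) / (2 * (x + y))

grad_at_a = [sp.simplify(sp.diff(f, v).subs({x: 2, y: 2})) for v in (x, y)]
volume = f.subs({x: 2, y: 2})   # the maximal volume
```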
5.2.7 Let x, y, and z denote the length, width, and height, respectively, of the box. Given that 2(xy + xz + yz) = A, we wish to maximize the volume V = xyz of the box. We therefore consider the function f(x, y) = xy(A/2 − xy)/(x + y), whose domain is X = {(x, y) ≠ (0, 0) : 0 ≤ xy ≤ A/2}.
Proceeding as in the solution of Exercise 6, we determine that the unique critical point of f in X is a = √(A/6)(1, 1); thus, the putative maximum value of f is (A/6)^{3/2}.
Consider the restriction of f to the compact set
S = {(x, y) : 0 ≤ x ≤ 4√A, 0 ≤ y ≤ min(A/(2x), 4√A)}.
Then we see that on the boundary of and outside S we have f(x, y) ≤ (A/2) · A/(2 · 4√A) = A^{3/2}/16 < A^{3/2}/(6√6) = (A/6)^{3/2}. It follows that a is the global maximum point of f. This gives a box with dimensions x = y = z = √(A/6), i.e., a cube.
5.2.8 Let x, y, and z denote the length, width, and height, respectively, of the box. The cost of the box is proportional to C = xy + 2z(x + y). We wish to maximize the volume V = xyz of the box. (We see that this problem is virtually identical to Exercise 6, merely replacing 12 with C.) We therefore consider the function f(x, y) = (1/2)xy(C − xy)/(x + y). The domain of f is X = {(x, y) ≠ (0, 0) : 0 ≤ xy ≤ C}. Although f is continuous on X, the set X is not compact, so we are not a priori guaranteed a global maximum point for f.
Nevertheless, we first find the critical point(s) of f. We have
Df = 1/(2(x + y)²) [ y²(C − x² − 2xy)   x²(C − y² − 2xy) ],
and so the only critical point of f in X is at a = √(C/3)(1, 1). To argue that a is in fact the global maximum point of f on X, we proceed as in Example 5. Note, first of all, that lim_{x→0} f(x) = 0, so we may think of f as being a continuous function defined on the compact set
S = {(x, y) : 0 ≤ x ≤ 6√C, 0 ≤ y ≤ min(C/x, 6√C)}.
The Maximum Value Theorem guarantees us a global maximum of the continuous function f on S. Now, a is the sole critical point of f in the interior of S and f(a) = C^{3/2}/(6√3). Moreover, it is easy to check that on the boundary of or outside S we have f(x, y) ≤ C^{3/2}/12. Thus, a is the global maximum point of f. The largest possible box is √(C/3) × √(C/3) × (1/2)√(C/3).
5.2.9 The plane in R³ with equation x₁/x + x₂/y + x₃/z = 1 has respective intercepts x, y, and z on the coordinate axes and forms a pyramid in the first octant whose volume is V = (1/6)xyz. This plane passes through (1, 2, 2) if and only if 1/x + 2/y + 2/z = 1, so that z = 2/(1 − 1/x − 2/y). (Note that it is immediate that x ≥ 1 and y, z ≥ 2.) So we consider the function
f(x, y) = (1/3)xy/(1 − 1/x − 2/y),
whose domain is X = {(x, y) : x > 0, y > 0, 1/x + 2/y < 1}. Then
Df = 1/(3(1 − 1/x − 2/y)²) [ y − 2 − 2y/x   x − 1 − 4x/y ],
so the critical points are determined by x(y − 2) − 2y = 0 and y(x − 1) − 4x = 0. Solving, we find that 2x = y, and so x = 3 and y = 6. We claim now that a = (3, 6) is the global minimum point of f. Note that f(a) = 18. Now if we let
S = {(x, y) : x ≤ 30, y ≤ 60, 1/x + 2/y ≤ 29/30},
[Figure: the compact region S with the critical point a in its interior.]
then S is compact and f achieves its global minimum on S. On the other hand, on the boundary of or outside S, it is easy to check that f(x, y) ≥ 20. Therefore, a is the global minimum point for f on all of X. That is, the desired plane is x₁/3 + x₂/6 + x₃/6 = 1.
5.2.10 Let x and θ be as indicated in the figure: from a strip of width 12, sides of width x are bent up at angle θ, leaving a base of width 12 − 2x. Then we wish to maximize the function f(x, θ) = (x sin θ)(12 − 2x + x cos θ) on the domain X = {(x, θ) : 0 ≤ x ≤ 6, 0 ≤ θ ≤ 2π/3}. Since f is continuous on the compact set X, we are guaranteed a global maximum. Since f is differentiable, if that maximum occurs at an interior point, it must be a critical point. So we begin by finding the critical points of f. We have
Df = [ 2 sin θ(6 − 2x + x cos θ)   x((12 − 2x) cos θ + x(cos²θ − sin²θ)) ].
Solving for critical points, we find x = θ = 0 or cos θ = 2 − 6/x, which leads to the equation
2(6 − x)(2 − 6/x) + x(2(2 − 6/x)² − 1) = 3(x − 4) = 0.
Therefore we have the single interior critical point a = (4, π/3), and f(a) = 12√3 ≈ 20.8. On the boundary of X, we have f(x, 0) = f(0, θ) = 0, f(6, θ) = 36 sin θ cos θ ≤ 18, and f(x, 2π/3) ≤ 8√3 ≈ 13.9, so the global maximum does in fact occur in the interior. We achieve the trough of maximum cross-sectional area by bending up 4″ on either side at an angle of π/3.
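A crude numerical check of 5.2.10, not in the original solution: a grid search over the domain never beats the interior critical point (4, π/3):

```python
import numpy as np

def area(x, theta):
    # cross-section: base 12 - 2x, sides of width x bent up at angle theta
    return x * np.sin(theta) * (12 - 2 * x + x * np.cos(theta))

best = area(4.0, np.pi / 3)     # the interior critical point, 12*sqrt(3)

xs = np.linspace(0.0, 6.0, 301)
thetas = np.linspace(0.0, 2 * np.pi / 3, 301)
grid_max = max(area(xv, tv) for xv in xs for tv in thetas)
```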
5.2.11 Let x be the base of the rectangle and y its height, and let θ be the base angle of the isosceles triangle. Then the perimeter of the pentagon is P = x(1 + sec θ) + 2y and its area is given by A = xy + (1/4)x² tan θ. So we have y = (1/2)(P − x(1 + sec θ)) and consider the function
f(x, θ) = (1/2)Px + (1/2)x²((1/2) tan θ − sec θ − 1)
on the domain X = {(x, θ) : 0 ≤ x ≤ P/2, 0 ≤ θ ≤ arcsec(P/x − 1)}. Since X is compact and f is continuous, we are guaranteed a global maximum.
We have
Df = (1/2) [ P + 2x((1/2) tan θ − sec θ − 1)   x²((1/2) sec²θ − sec θ tan θ) ],
so at an interior critical point we must have (1/2) sec θ = tan θ, whence θ = π/6. It then follows that x = P/(2 + √3) = (2 − √3)P. These dimensions give an area of A₀ = (1/4)(2 − √3)P². To verify that this is the global maximum of f, we note that f = 0 on the boundary of X except in the instance that we let y = 0 and the pentagon degenerates to a triangle; in this case, the maximum area is obtained when we have an equilateral triangle with sides P/3 and area P²√3/36, which is indeed less than A₀. Thus, the pentagon of maximum area is obtained by taking a rectangle with base P(2 − √3), height P(3 − √3)/6, and isosceles triangle with height P(2√3 − 3)/6.
5.2.12 We wish to maximize/minimize the function f(x, y) = −(x + 2y) with x² + y² = 1. This is really just a one-variable problem if we parametrize the circle. That is, we must find the extrema of the function g(t) = −(cos t + 2 sin t), t ∈ [0, 2π]. We have g′(t) = sin t − 2 cos t = 0 if and only if tan t = 2, so the maximum of g occurs when cos t = −1/√5 and sin t = −2/√5, and the minimum occurs when cos t = 1/√5 and sin t = 2/√5. That is, the highest point on the ellipse is (−1/√5, −2/√5, √5) and the lowest point is (1/√5, 2/√5, −√5).
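A quick numerical confirmation of 5.2.12, not part of the original solution: sampling g(t) = −(cos t + 2 sin t) over [0, 2π] reproduces the extreme values ±√5:

```python
import numpy as np

t = np.linspace(0.0, 2 * np.pi, 200001)
g = -(np.cos(t) + 2 * np.sin(t))   # f restricted to the unit circle

g_max, g_min = g.max(), g.min()    # expect +sqrt(5) and -sqrt(5)
```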
5.2.13 Given x, y, z > 0 with xy²z³ = 108, we wish to find the minimum value of x + y + z. So we consider the function f(y, z) = y + z + 108/(y²z³) defined on X = {(y, z) : y, z > 0}. We first find the critical points of f. Well,
Df = [ 1 − 2 · 108/(y³z³)   1 − 3 · 108/(y²z⁴) ],
so at a critical point we must have 3y = 2z, and so y = 2 and z = 3. Now, we must argue that a = (2, 3) is the global minimum point of f. Note first that f(a) = 6.
Now consider the compact set S = {(y, z) : 1/4 ≤ y, 1/2 ≤ z, y + z ≤ 6}. Then the continuous function f takes on its minimum on S. Now if y + z ≥ 6, we have f(y, z) > 6; if y ≤ 1/4 and z ≤ 6, then we have f(y, z) ≥ 108/((1/4)²6³) = 8; and if z ≤ 1/2 and y ≤ 6, then we have f(y, z) ≥ 108/(6²(1/2)³) = 24. Therefore, on the boundary of or outside S the values of f are strictly greater than f(a) = 6. It follows that a is the global minimum point of f on X.
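A symbolic spot check of 5.2.13, not in the original solution: the gradient vanishes at (y, z) = (2, 3) and f(2, 3) = 6:

```python
import sympy as sp

y, z = sp.symbols('y z', positive=True)
# x eliminated via x*y**2*z**3 = 108, so x = 108/(y**2*z**3)
f = y + z + 108 / (y**2 * z**3)

grad_at_a = [sp.diff(f, v).subs({y: 2, z: 3}) for v in (y, z)]
minimum = f.subs({y: 2, z: 3})   # the minimal value of x + y + z
```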
5.2.14 Actually, no calculus whatsoever is required here: just note that, by completing the square, we obtain
f(x) = k∥x − (1/k)Σ_{j=1}^{k} a_j∥² + Σ_{j=1}^{k} ∥a_j∥² − (1/k)∥Σ_{j=1}^{k} a_j∥²,
and so the global minimum point is (1/k)Σ_{j=1}^{k} a_j, the average (or center of mass) of the given vectors.
If we use calculus, note that ∇f(x) = 2Σ_{j=1}^{k}(x − a_j), so ∇f(x) = 0 if and only if x = (1/k)Σ_{j=1}^{k} a_j, as before. We must now argue that a = (1/k)Σ_{j=1}^{k} a_j is the global minimum point for f. Let R = max{∥a_j − a∥ : j = 1, . . . , k}. Then f(a) ≤ kR². Now if ∥x − a∥ ≥ 2R, then by the triangle inequality, we have f(x) ≥ kR². Indeed, as long as not all the a_j are the same, we'll have f(x) > kR². From this we infer that the global minimum of f on the compact set B(a, 2R) must be at the critical point, a, and that, moreover, this is the global minimum of f on Rⁿ, since f(a) ≤ kR² < f(x) for all x ∉ B(a, 2R).
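The centroid claim in 5.2.14 is easy to test numerically. In the sketch below, which is not part of the original solution, the points a_j are arbitrary random choices, and perturbing the centroid always increases f:

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.standard_normal((5, 3))        # five arbitrary points a_j in R^3

def f(x):
    # sum of squared distances to the points a_j
    return float(np.sum(np.linalg.norm(a - x, axis=1) ** 2))

centroid = a.mean(axis=0)
perturbed = min(f(centroid + 0.1 * rng.standard_normal(3)) for _ in range(20))
```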
5.2.15 Using the result of Example 1 of Chapter 3, Section 4, we see that if f_j(x) = ∥x − a_j∥, j = 1, 2, 3, then ∇f_j(x) = (x − a_j)/∥x − a_j∥. (Note that f_j is differentiable except at a_j, so f is differentiable except at a₁, a₂, and a₃.) Then x is a critical point of f if and only if¹
Σ_{j=1}^{3} ∇f_j(x) = (x − a₁)/∥x − a₁∥ + (x − a₂)/∥x − a₂∥ + (x − a₃)/∥x − a₃∥ = 0.
Now we need the result of Exercise 1.2.6. In order for three unit vectors in R2 to add up to 0,
they must make an angle of 2π/3 with one another. Thus, we need to find a (presumably inside
the triangle with vertices aj ) with the property that the line segments from a to the vertices of the
triangle form angles of 2π/3. Such a point is sometimes called the Fermat point of the triangle. As
¹Remark. We could also arrive at this point on physical grounds: Hanging equal weights by massless strings
passing through a table at positions aj and attaching the other ends of the strings to a massless ring, the system
reaches equilibrium when the weights are as low as possible, hence when the sum of the distances from the ring to
the vertices is as small as possible. Each mass exerts a force on the ring pointing from the ring to the point aj ; since
the masses are equal, the forces have equal magnitudes. But at equilibrium, the force vectors must sum to 0.
we shall establish shortly, such a point exists provided all of the angles of the triangle are less than
2π/3. If one of the angles is 2π/3 or greater, then the minimum point falls at that vertex.
Why is the Fermat point the global minimum of f ? We could attempt a compactness argument
like that in Exercise 14, but since f fails to be differentiable at the vertices, we’ll need to be more
careful. Note first that the union of the triangle and its interior forms a compact set, on which the
continuous function f must have a minimum, a′ . Next, if x is outside the triangle, then clearly
∇f (x) points away from the triangle, so we can decrease f by moving the point back towards the
triangle. Therefore, the point a′ is the global minimum point for f on all of R2 . Indeed, examining
∇f along the edges of our triangle, we see that moving the point toward the opposite vertex will
decrease f , so the minimum point cannot be on any edge of the triangle. When the angles of
the triangle are all less than 2π/3, we get (as we see below) the Fermat point in the interior of
the triangle, and examining ∇f as we move toward any vertex, we see that f increases; thus, the
Fermat point a is the global minimum. If some angle is 2π/3 or greater, then there can be no
critical point in the interior (since the angle joining the vertices remote from that angle can be no
smaller than it is at the vertex itself), and so the minimum must be at one of the vertices. It is
clear that the vertex with the largest angle gives the smallest value of f .
The geometric construction of the Fermat point is pictured in the figure below. Recall that an
angle inscribed in a circle subtends double its measure in arc. So given points A and B, the locus
of points P with ∠AP B measuring 2π/3 consists of an arc of a circle and its mirror image, the arc
forming a central angle of 2π/3. It is easy to see that the two circles will be the circumcircles
for the equilateral triangles with AB as one side. So we construct these circles, as pictured in the
figure, for any two sides of our triangle; their point of intersection must be the Fermat point. (Note
that the third circle is redundant, since 2π − (2π/3 + 2π/3) = 2π/3.)
An alternative geometric construction is shown in the figure below. We construct equilateral
triangles on each of the sides of the given triangle. Let Q be inside △ABC, and let Q′ be the point
obtained by rotating Q by π/3 about vertex A. Then
‖AQ‖ + ‖BQ‖ + ‖CQ‖ = ‖Q′Q‖ + ‖QB‖ + ‖B′Q′‖ ≥ ‖B′B‖,
5.3. QUADRATIC FORMS AND THE SECOND DERIVATIVE TEST 129
(Figure: △ABC with equilateral triangles erected externally on its sides, with apexes A′, B′, C′; the points Q, Q′, Q′′, and P of the construction are marked.)
with equality holding if and only if B ′ , Q′ , Q, and B are collinear. Now, let P be the intersection
of BB ′ and CC ′ ; substituting Q = P , we see that P ′ lies on BB ′ if and only if P lies on CC ′ .
Therefore, B ′ , P ′ , P , and B are indeed collinear, so we have equality and so P is the Fermat point.
To see that, in fact, AA′ passes through P requires the following observation. First, by side-angle-side, △BAB′ ≅ △C′AC, and so ‖BB′‖ = ‖CC′‖. Rotating by π/3 about B, we deduce that
‖AQ‖ + ‖BQ‖ + ‖CQ‖ = ‖CQ′′‖ + ‖Q′′Q‖ + ‖QC‖ ≥ ‖C′C‖ = ‖B′B‖,
and we’ve already established that equality holds with Q = P . Therefore, C ′ , P ′′ , P , and C must
be collinear, and P ′′ lies on CC ′ if and only if P lies on AA′ . Thus, AA′ passes through P , as we
wished to establish. (See also “The Fermat-Steiner Problem,” by Shay Gueron and Ran Tessler, in
The American Mathematical Monthly, vol. 109, no. 5, May, 2002.)
so
$$\operatorname{Hess}(f)\begin{pmatrix}1\\-1\\1\end{pmatrix}=\frac{1}{\sqrt{3}\,e}\begin{bmatrix}-4&1&-1\\1&-4&1\\-1&1&-4\end{bmatrix}\quad\text{and}\quad\operatorname{Hess}(f)\begin{pmatrix}-1\\1\\-1\end{pmatrix}=\frac{1}{\sqrt{3}\,e}\begin{bmatrix}4&-1&1\\-1&4&-1\\1&-1&4\end{bmatrix}.$$
Now, we have
$$\begin{bmatrix}-4&1&-1\\1&-4&1\\-1&1&-4\end{bmatrix}=\underbrace{\begin{bmatrix}1&0&0\\-\frac14&1&0\\\frac14&-\frac15&1\end{bmatrix}}_{L}\underbrace{\begin{bmatrix}-4&0&0\\0&-\frac{15}{4}&0\\0&0&-\frac{18}{5}\end{bmatrix}}_{D}\underbrace{\begin{bmatrix}1&-\frac14&\frac14\\0&1&-\frac15\\0&0&1\end{bmatrix}}_{L^{T}},$$
so we see that (1, −1, 1) is a local maximum point and (−1, 1, −1) is a local minimum point.
l. $\operatorname{Hess}(f)(\mathbf{x})=\begin{bmatrix}-2&z&y\\z&-2&x\\y&x&-2\end{bmatrix}$, so
$$\operatorname{Hess}(f)\begin{pmatrix}0\\0\\0\end{pmatrix}=\begin{bmatrix}-2&0&0\\0&-2&0\\0&0&-2\end{bmatrix},\quad\operatorname{Hess}(f)\begin{pmatrix}2\\2\\2\end{pmatrix}=\begin{bmatrix}-2&2&2\\2&-2&2\\2&2&-2\end{bmatrix},\quad\operatorname{Hess}(f)\begin{pmatrix}2\\-2\\-2\end{pmatrix}=\begin{bmatrix}-2&-2&-2\\-2&-2&2\\-2&2&-2\end{bmatrix},$$
$$\operatorname{Hess}(f)\begin{pmatrix}-2\\2\\-2\end{pmatrix}=\begin{bmatrix}-2&-2&2\\-2&-2&-2\\2&-2&-2\end{bmatrix},\quad\text{and}\quad\operatorname{Hess}(f)\begin{pmatrix}-2\\-2\\2\end{pmatrix}=\begin{bmatrix}-2&2&-2\\2&-2&-2\\-2&-2&-2\end{bmatrix}.$$
Clearly, the origin is a local maximum point. All the remaining critical points are saddle points. For example, we have
$$\begin{bmatrix}-2&2&2\\2&-2&2\\2&2&-2\end{bmatrix}=\underbrace{\begin{bmatrix}1&0&0\\-1&1&1\\-1&\frac12&-\frac12\end{bmatrix}}_{P}\underbrace{\begin{bmatrix}-2&0&0\\0&4&0\\0&0&-4\end{bmatrix}}_{D}\underbrace{\begin{bmatrix}1&-1&-1\\0&1&\frac12\\0&1&-\frac12\end{bmatrix}}_{P^{T}}.$$
5.3.6 a. We have
$$\begin{bmatrix}1&3\\3&13\end{bmatrix}=\begin{bmatrix}1\\3\end{bmatrix}\,1\,\begin{bmatrix}1&3\end{bmatrix}+\begin{bmatrix}0&0\\0&4\end{bmatrix}=\underbrace{\begin{bmatrix}1&0\\3&1\end{bmatrix}}_{L}\underbrace{\begin{bmatrix}1&0\\0&4\end{bmatrix}}_{D}\underbrace{\begin{bmatrix}1&3\\0&1\end{bmatrix}}_{L^{T}}.$$
Since both entries of D are positive, the quadratic form Q is positive definite.
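The sign pattern of D can also be checked mechanically. A minimal sketch (the `ldlt` helper below is mine, not from the text; it assumes nonzero leading pivots, which holds in these exercises):

```python
import numpy as np

def ldlt(A):
    """Factor a symmetric matrix as A = L D L^T (no pivoting; assumes the
    leading principal pivots are nonzero, as in these exercises)."""
    n = A.shape[0]
    L = np.eye(n)
    d = np.zeros(n)
    for j in range(n):
        d[j] = A[j, j] - sum(L[j, k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            L[i, j] = (A[i, j] - sum(L[i, k] * L[j, k] * d[k] for k in range(j))) / d[j]
    return L, d

L, d = ldlt(np.array([[1., 3.], [3., 13.]]))
print(d)   # [1. 4.] -- both positive, so the form of part a is positive definite
```

Running it on the matrix of part b gives d = (2, −1/2), mixed signs, hence indefinite, matching the hand computation.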
b. We have
$$\begin{bmatrix}2&3\\3&4\end{bmatrix}=\begin{bmatrix}1\\\frac32\end{bmatrix}\,2\,\begin{bmatrix}1&\frac32\end{bmatrix}+\begin{bmatrix}0&0\\0&-\frac12\end{bmatrix}=\underbrace{\begin{bmatrix}1&0\\\frac32&1\end{bmatrix}}_{L}\underbrace{\begin{bmatrix}2&0\\0&-\frac12\end{bmatrix}}_{D}\underbrace{\begin{bmatrix}1&\frac32\\0&1\end{bmatrix}}_{L^{T}}.$$
Since the entries of D have opposite signs, the quadratic form Q is indefinite.
c. We have
$$\begin{bmatrix}2&2&-2\\2&-1&4\\-2&4&1\end{bmatrix}=\begin{bmatrix}1\\1\\-1\end{bmatrix}\,2\,\begin{bmatrix}1&1&-1\end{bmatrix}+\begin{bmatrix}0&0&0\\0&-3&6\\0&6&-1\end{bmatrix}=\begin{bmatrix}1\\1\\-1\end{bmatrix}\,2\,\begin{bmatrix}1&1&-1\end{bmatrix}+\begin{bmatrix}0\\1\\-2\end{bmatrix}(-3)\begin{bmatrix}0&1&-2\end{bmatrix}+\begin{bmatrix}0\\0\\1\end{bmatrix}\,11\,\begin{bmatrix}0&0&1\end{bmatrix}$$
$$=\underbrace{\begin{bmatrix}1&0&0\\1&1&0\\-1&-2&1\end{bmatrix}}_{L}\underbrace{\begin{bmatrix}2&0&0\\0&-3&0\\0&0&11\end{bmatrix}}_{D}\underbrace{\begin{bmatrix}1&1&-1\\0&1&-2\\0&0&1\end{bmatrix}}_{L^{T}}.$$
Since the entries of D have different signs, the quadratic form Q is indefinite.
d. We have
$$\begin{bmatrix}1&-2&2\\-2&6&-6\\2&-6&9\end{bmatrix}=\begin{bmatrix}1\\-2\\2\end{bmatrix}\,1\,\begin{bmatrix}1&-2&2\end{bmatrix}+\begin{bmatrix}0&0&0\\0&2&-2\\0&-2&5\end{bmatrix}=\begin{bmatrix}1\\-2\\2\end{bmatrix}\,1\,\begin{bmatrix}1&-2&2\end{bmatrix}+\begin{bmatrix}0\\1\\-1\end{bmatrix}\,2\,\begin{bmatrix}0&1&-1\end{bmatrix}+\begin{bmatrix}0\\0\\1\end{bmatrix}\,3\,\begin{bmatrix}0&0&1\end{bmatrix}$$
$$=\underbrace{\begin{bmatrix}1&0&0\\-2&1&0\\2&-1&1\end{bmatrix}}_{L}\underbrace{\begin{bmatrix}1&0&0\\0&2&0\\0&0&3\end{bmatrix}}_{D}\underbrace{\begin{bmatrix}1&-2&2\\0&1&-1\\0&0&1\end{bmatrix}}_{L^{T}}.$$
Since the entries of D are all positive, the quadratic form Q is positive definite.
e. We have
$$\begin{bmatrix}1&1&-3&1\\1&0&-3&0\\-3&-3&11&-1\\1&0&-1&2\end{bmatrix}=\begin{bmatrix}1\\1\\-3\\1\end{bmatrix}\,1\,\begin{bmatrix}1&1&-3&1\end{bmatrix}+\begin{bmatrix}0&0&0&0\\0&-1&0&-1\\0&0&2&2\\0&-1&2&1\end{bmatrix}$$
$$=\begin{bmatrix}1\\1\\-3\\1\end{bmatrix}\,1\,\begin{bmatrix}1&1&-3&1\end{bmatrix}+\begin{bmatrix}0\\1\\0\\1\end{bmatrix}(-1)\begin{bmatrix}0&1&0&1\end{bmatrix}+\begin{bmatrix}0&0&0&0\\0&0&0&0\\0&0&2&2\\0&0&2&2\end{bmatrix}$$
$$=\begin{bmatrix}1\\1\\-3\\1\end{bmatrix}\,1\,\begin{bmatrix}1&1&-3&1\end{bmatrix}+\begin{bmatrix}0\\1\\0\\1\end{bmatrix}(-1)\begin{bmatrix}0&1&0&1\end{bmatrix}+\begin{bmatrix}0\\0\\1\\1\end{bmatrix}\,2\,\begin{bmatrix}0&0&1&1\end{bmatrix}$$
$$=\underbrace{\begin{bmatrix}1&0&0&0\\1&1&0&0\\-3&0&1&0\\1&1&1&1\end{bmatrix}}_{L}\underbrace{\begin{bmatrix}1&0&0&0\\0&-1&0&0\\0&0&2&0\\0&0&0&0\end{bmatrix}}_{D}\underbrace{\begin{bmatrix}1&1&-3&1\\0&1&0&1\\0&0&1&1\\0&0&0&1\end{bmatrix}}_{L^{T}}.$$
Since there is a zero entry in D and the remaining entries have different signs, the quadratic form
Q is indefinite.
5.3.7 Suppose LDU = L′ D ′ U ′ . Then (L′−1 L)D = D ′ (U ′ U −1 ). Note first that L′−1 L is lower
triangular and U ′ U −1 is upper triangular. Thus, (L′−1 L)D is lower triangular and D ′ (U ′ U −1 ) is
upper triangular. Since the diagonal entries of L′−1 L and U ′ U −1 are all 1’s, we must have D = D ′
and then L′−1 L = I = U ′ U −1 , so L = L′ and U = U ′ .
5.3.8 a. We have
$$B=\begin{bmatrix}1&2\\2&0\end{bmatrix}=\underbrace{\begin{bmatrix}1&0\\2&1\end{bmatrix}}_{L}\underbrace{\begin{bmatrix}1&0\\0&-4\end{bmatrix}}_{D}\underbrace{\begin{bmatrix}1&2\\0&1\end{bmatrix}}_{L^{T}},$$
so A = EDEᵀ, where E = E₁⁻¹L = $\begin{bmatrix}0&1\\1&0\end{bmatrix}\begin{bmatrix}1&0\\2&1\end{bmatrix}=\begin{bmatrix}2&1\\1&0\end{bmatrix}$. This means that Q(x) = xᵀAx = xᵀ(EDEᵀ)x = (Eᵀx)ᵀD(Eᵀx). Letting y = Eᵀx, we have Q(x) = y₁² − 4y₂². (Indeed, we have x = (Eᵀ)⁻¹y, so Q(x) = 4y₂(y₁ − 2y₂) + (y₁ − 2y₂)², which checks.)
" #
1
1 2 , we have E AE T = B, as promised. Then B = LDLT , with
b. With E1 = 1 1
0 1
" # " # " #" #
1
1 1 1 − 1 0
L= and D = . Thus, A = EDE T , where E = E1−1 L = 2 =
1 1 −1 0 1 1 1
" #
1 1
2 − 2 . Thus, we have Q(x) = 2x x = y 2 − y 2 , where y = E T x.
1 2 1 2
1 1
5.4.3 Since f is continuous and the sphere is compact, we are guaranteed a maximum value. We wish to find the maximum point of f on the set g(x) = ‖x‖² − 4 = 0. Thus, we seek x on the sphere so that (2, 2, −1) = λ(x, y, z) for some scalar λ. So we must find the points on the constraint surface satisfying x/2 = y/2 = −z. We find x = y = −2z with z = ±2/3. Thus, the two critical points are ±(2/3)(2, 2, −1)ᵀ. The maximum value of f on the sphere is therefore f(4/3, 4/3, −2/3) = 6.
5.4.4 Since f is continuous and D is compact, we are guaranteed maximum and minimum values of f. The only critical point in the interior of D is the origin. We then check for constrained critical points on the boundary by using Lagrange multipliers. We seek x on the circle so that (2x + y, x + 2y) = λ(x, y) for some scalar λ. Provided we are not dividing by 0, this leads to (2x + y)/x = (x + 2y)/y, so 2 + y/x = x/y + 2, and y = ±x. This yields the four critical points (1/√2)(±1, ±1)ᵀ. (Note that if x = 0, then y = 0, and vice versa, so there are no additional critical points.) Then we have
f(0, 0) = 0, f(±(1/√2)(1, 1)ᵀ) = 3/2, and f(±(1/√2)(−1, 1)ᵀ) = 1/2,
so the minimum value of f on D is 0 and the maximum value is 3/2.
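A quick numerical cross-check: the quadratic form f(x, y) = x² + xy + y² used below is inferred from the gradient (2x + y, x + 2y) in the solution; sampling the boundary circle reproduces the boundary extremes:

```python
import numpy as np

# Assumed from the gradient (2x+y, x+2y) above: f(x, y) = x^2 + xy + y^2.
f = lambda x, y: x**2 + x*y + y**2

t = np.linspace(0, 2 * np.pi, 100001)     # sample the boundary circle
boundary = f(np.cos(t), np.sin(t))
print(boundary.max(), boundary.min())      # 1.5 and 0.5 on the boundary
# The interior critical point is the origin, where f = 0 (the minimum on D).
```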
5.4.5 We wish to minimize f(x, y) = (x − 1)² + y² subject to the constraint g(x, y) = x² + 4y² − 4 = 0. Thus, we seek points (x, y) on the constraint curve satisfying (x − 1, y) = λ(x, 4y) for some scalar λ. Thus, either x/(x − 1) = 4 (so x = 4/3), x = 1 and y is arbitrary, or y = 0 and x is arbitrary. This
5.4. LAGRANGE MULTIPLIERS 137
leads to the potential critical points (±2, 0)ᵀ, (0, ±1)ᵀ, and (4/3, ±√5/3)ᵀ. Since f(2, 0) = 1, f(−2, 0) = 9, f(0, ±1) = 2, and f(4/3, ±√5/3) = 2/3, we see that (4/3, ±√5/3)ᵀ are the points on the ellipse closest to (1, 0)ᵀ.
5.4.6 We want to find the extrema of f subject to the constraint g(x, y, z) = x² + y² + z² − 3 = 0. Since f is continuous and the sphere is compact, we are guaranteed a global maximum and minimum of f. We seek points x on the sphere satisfying (x, 1, 1) = λ(x, y, z) for some scalar λ. Therefore, we either have x = 0 and y = z = ±√(3/2), or λ = 1 and y = z = 1 and x = ±1. Since f(0, √(3/2), √(3/2)) = 2√6, f(0, −√(3/2), −√(3/2)) = −2√6, and f(±1, 1, 1) = 5, it follows that (±1, 1, 1)ᵀ are the warmest points and (0, −√(3/2), −√(3/2))ᵀ is the coldest point.
5.4.7 Let the vertices of the box be at the points (±x, ±y, ±z)ᵀ, x, y, z ≥ 0. Then we wish to maximize the function f(x, y, z) = 8xyz subject to the constraint g(x, y, z) = x² + y²/2 + z²/3 − 1 = 0. Since f is continuous and the portion of the ellipsoid in the first octant is compact, we are guaranteed a maximum. We seek points satisfying the constraint so that (yz, xz, xy) = λ(x, y/2, z/3) for some scalar λ. Since we obviously do not obtain maximum volume when any coordinate is 0, we find that x/(yz) = y/(2xz) = z/(3xy), and so y² = 2x² and z² = 3x². Solving, we obtain the critical point (1/√3, √2/√3, 1)ᵀ. Thus, the greatest volume of a box inscribed in the ellipsoid is 8√2/3.
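As a sanity check, one can sample the first-octant patch of the ellipsoid numerically; the trigonometric parametrization below is an assumption of this sketch, chosen so that the constraint holds identically:

```python
import numpy as np

rng = np.random.default_rng(0)
# Points in the first octant of x^2 + y^2/2 + z^2/3 = 1; the box volume is 8xyz.
u, v = rng.uniform(0, np.pi / 2, (2, 200000))
x = np.cos(u) * np.cos(v)
y = np.sqrt(2) * np.cos(u) * np.sin(v)
z = np.sqrt(3) * np.sin(u)
best = (8 * x * y * z).max()
print(best, 8 * np.sqrt(2) / 3)   # sampled maximum approaches 8√2/3 ≈ 3.7712
```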
5.4.8 We want to find the extrema of the function f subject to the constraint g(x, y, z) = 4x² + y² + 4z² − 16 = 0. Since f is continuous and the ellipsoid is compact, we are guaranteed a global maximum and minimum of f. We seek points x on the ellipsoid satisfying (4x, z, y − 4) = λ(4x, y, 4z) for some scalar λ. Therefore, we must have x = 0 and y/z = 4z/(y − 4), or x ≠ 0 and 1 = z/y = (y − 4)/(4z). The first case gives y² − 4y = 4z²; combining this with the constraint yields y = 4, z = 0 or y = −2, z = ±√3. The second case gives y = z = −4/3 and then x = ±4/3. We thus obtain the potential critical points (0, 4, 0)ᵀ, (0, −2, ±√3)ᵀ, and (±4/3, −4/3, −4/3)ᵀ. Evaluating f at these points yields the values 0, −6√3, 6√3, and 32/3. Since 32/3 > 6√3, the hottest points are (±4/3, −4/3, −4/3)ᵀ and the coldest point is (0, −2, √3)ᵀ.
5.4.9 We want to find the extrema of f subject to the constraint g(x, y, z) = x² + y² + z² − 2z = 0. Since f is continuous and the sphere is compact, we are guaranteed a global maximum and minimum of f. We seek points x on the sphere satisfying (y, x, z² − 1) = λ(x, y, z − 1) for some scalar λ. Now, we get solutions if x = y = 0 and z is arbitrary, or if z = 1 and y = ±x. Otherwise, we must have
y/x = x/y = (z² − 1)/(z − 1) = z + 1,
5.4.10 Since f is continuous and S is compact, it follows that f must achieve its maximum and minimum values. On the upper hemisphere, a constrained critical point must satisfy (y, x, 3z²) = λ(x, y, z) for some scalar λ, so either x = y = 0 and z = 1, or y = x and z = 1/3. The latter leads to the critical points (2/3, 2/3, 1/3)ᵀ and (−2/3, −2/3, 1/3)ᵀ. On the boundary circle, we are merely looking for critical points of f(x, y, 0) = xy subject to the constraint x² + y² = 1. There are four: ±(1/√2)(1, 1, 0)ᵀ and ±(1/√2)(1, −1, 0)ᵀ. Evaluating f at all seven points, we find that the maximum point is (0, 0, 1)ᵀ and the minimum points are ±(1/√2)(1, −1, 0)ᵀ.
$$\left(\tfrac12\cos\tfrac{\alpha}{2}\sin\tfrac{\beta}{2}\sin\tfrac{\gamma}{2},\;\tfrac12\sin\tfrac{\alpha}{2}\cos\tfrac{\beta}{2}\sin\tfrac{\gamma}{2},\;\tfrac12\sin\tfrac{\alpha}{2}\sin\tfrac{\beta}{2}\cos\tfrac{\gamma}{2}\right)=\lambda\,(1,\,1,\,1)$$
for some scalar λ. At such a constrained critical point with none of α, β, or γ equal to 0, we must have tan(α/2) = tan(β/2) = tan(γ/2), so α = β = γ = π/3. When we have an equilateral triangle, the function f attains its maximum value 1/8.
5.4.15 We want to find the extrema of the function f(x, y) = x² + y² subject to the constraint g(x, y) = 2x² + 4xy + 5y² − 1 = 0. Since the ellipse is compact, the continuous function f is guaranteed to achieve its maximum and minimum values. We seek points where (x, y) = λ(2x + 2y, 2x + 5y) for some scalar λ. (Note that x = 0 if and only if y = 0, and the origin is not on our ellipse.) Thus, (2x + 2y)/x = (2x + 5y)/y, so 2(y/x)² − 3(y/x) − 2 = (2(y/x) + 1)((y/x) − 2) = 0. Substituting in the constraint equation, we obtain the critical points ±(1/√30)(1, 2)ᵀ and ±(1/√5)(2, −1)ᵀ. The former are the points on the ellipse closest to the origin, and the latter are those farthest from it.
5.4.16 We want to maximize f(x, y, z) = xyz subject to the constraint g(x, y, z) = xy + 2(xz + yz) − C = 0. We seek points where (yz, xz, xy) = λ(y + 2z, x + 2z, 2x + 2y) for some scalar λ. Since we obviously cannot achieve a maximum if any of the variables is 0, this leads to the equations
(y + 2z)/(yz) = (x + 2z)/(xz) = (2x + 2y)/(xy), so 1/z + 2/y = 1/z + 2/x = 2/y + 2/x.
Thus, we have x = y = 2z, and so the base of the box should be a square and its height should be one-half the length of the base.
5.4.17 We want to maximize f(x, y, z) = xyz/6 subject to the constraint g(x, y, z) = 1/x + 2/y + 2/z − 1 = 0. We seek points where (yz, xz, xy) = λ(1/x², 2/y², 2/z²) for some scalar λ. This leads to the equations 1/x = 2/y = 2/z = 1/3. Thus, the equation of the plane we seek is x/3 + y/6 + z/6 = 1.
5.4.18 We want to find the extrema of the function f(x) = x₁ + ··· + xₙ subject to the constraint g(x) = ‖x‖² − 1 = 0. Since the unit sphere is compact and f continuous, we know that f achieves its maximum and minimum. (Indeed, no calculus is necessary: the Cauchy-Schwarz inequality is sufficient.) The method of Lagrange multipliers leads immediately to x = λ(1, ..., 1)ᵀ for some scalar λ. Therefore, we must have x = ±(1/√n)(1, ..., 1)ᵀ. The extreme values of f are ±√n.
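A Monte Carlo probe of the bound (the sample size, seed, and n = 5 are arbitrary choices of this sketch): random unit vectors never give a coordinate sum exceeding √n, in line with Cauchy-Schwarz:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
x = rng.normal(size=(100000, n))
x /= np.linalg.norm(x, axis=1, keepdims=True)   # random points on the unit sphere
s = x.sum(axis=1)
print(s.max(), np.sqrt(n))   # the sampled maximum stays below √5 ≈ 2.236
```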
5.4.19 Let one vertex of the rectangular parallelepiped be at the origin and the opposite vertex at x, with all the edges parallel to the coordinate axes. Then we wish to maximize f(x) = x₁x₂···xₙ subject to the constraint g(x) = ‖x‖² − δ² = 0 (obviously, we may take all xᵢ ≥ 0). Since f is continuous and the constraint set compact, we are guaranteed a global maximum. This maximum must occur at a point where (x₂···xₙ, x₁x₃···xₙ, ..., x₁···xₙ₋₁) = λ(x₁, x₂, ..., xₙ) for some scalar λ. This leads immediately to x₁² = x₂² = ··· = xₙ², so the maximum occurs at x = (δ/√n)(1, ..., 1)ᵀ, and the maximum volume is (δ/√n)ⁿ.
5.4.20 Suppose we fix x₁ + ··· + xₙ = c > 0 and try to maximize f(x) = x₁x₂···xₙ. Then we must have (x₂···xₙ, x₁x₃···xₙ, ..., x₁···xₙ₋₁) = λ(1, 1, ..., 1) for some scalar λ. It follows that x₂···xₙ = x₁x₃···xₙ = ··· = x₁···xₙ₋₁, so (dividing through by x₁x₂···xₙ) we infer that the critical point of interest satisfies x₁ = x₂ = ··· = xₙ = c/n. It follows that the maximum value of f is given by (c/n)ⁿ = ((x₁ + ··· + xₙ)/n)ⁿ. Then we conclude, upon taking nth roots, that
ⁿ√(x₁x₂···xₙ) ≤ (x₁ + ··· + xₙ)/n,
as desired.
n
x x
5.4.21 We wish to minimize the function f = xp /p+y q /q
subject to the constraint g =
y y
h i h i
xy = c > 0. Then at a constrained critical point we must have xp−1 y q−1 = λ y x for
p q
some scalar λ. Then we conclude that x = y , and so, substituting in the constraint equation, we
find x = cq/(p+q) = c1/p and y = cp/(p+q) = c1/q . Substituting these values into f , we find that the
minimum value is c/p + c/q = c, so the desired inequality holds.
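The inequality just established is Young's inequality, xy ≤ xᵖ/p + y^q/q for conjugate exponents; it is easy to probe numerically (the choice p = 3 and the sampling range are arbitrary assumptions of this sketch):

```python
import numpy as np

# Young's inequality: xy <= x^p/p + y^q/q whenever 1/p + 1/q = 1 (p, q > 1).
rng = np.random.default_rng(2)
p = 3.0
q = p / (p - 1)                    # conjugate exponent, so 1/p + 1/q = 1
x, y = rng.uniform(0.01, 10, (2, 100000))
gap = x**p / p + y**q / q - x * y
print(gap.min())                   # never negative; 0 exactly when x^p = y^q
```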
5.4.22 We need to maximize the function f(x, y, θ) = xy + ¼x² tan θ subject to the constraint g(x, y, θ) = x(1 + sec θ) + 2y = P. This leads to
(y + ½x tan θ, x, ¼x² sec²θ) = λ(1 + sec θ, 2, x sec θ tan θ),
and so it is immediate from the last two equations that sin θ = 1/2 and θ = π/6. Substituting in the first equation yields (y + x/(2√3))/(1 + 2/√3) = x/2, and so y = (x/2)(1 + 1/√3). Substituting in the constraint equation gives us x = P/(2 + √3) = P(2 − √3) and y = P(3 − √3)/6.
5.4.23 Let the radius of the cylinder and cone be r, let the height of the cylinder be H, and let the height of the cone be h. Then we wish to maximize f(r, H, h) = πr²(H + h/3) subject to the constraint g(r, H, h) = πr(2H + √(r² + h²)) = A. Applying the method of Lagrange multipliers leads us to the equation
(2r(H + h/3), r², r²/3) = λ(2H + √(r² + h²) + r²/√(r² + h²), 2r, rh/√(r² + h²)).
5.4.26 a. We want to minimize the function f(x) = ‖x − b‖² subject to the constraint g(x) = Ax = 0, where A = $\begin{bmatrix}1&-1&3\\2&1&0\end{bmatrix}$. The method of Lagrange multipliers leads us to the equation
x − b = λ(1, −1, 3)ᵀ + μ(2, 1, 0)ᵀ for some scalars λ and μ.
Thus, x = λ(1, −1, 3)ᵀ + μ(2, 1, 0)ᵀ + (3, 7, 1)ᵀ, and so, using the constraint equation Ax = 0, we obtain
0 = AAᵀ(λ, μ)ᵀ + Ab = $\begin{bmatrix}11&1\\1&5\end{bmatrix}$(λ, μ)ᵀ + (−1, 13)ᵀ,
and so (λ, μ)ᵀ = (1/3)(1, −8)ᵀ, and the closest point is x = (1/3)(1, −1, 3)ᵀ − (8/3)(2, 1, 0)ᵀ + (3, 7, 1)ᵀ = (−2, 4, 2)ᵀ.
b. We want to minimize the function f(x) = ‖x − b‖² subject to the constraint g(x) = Ax = 0, where A = $\begin{bmatrix}1&1&1&1\\1&0&2&1\end{bmatrix}$. The method of Lagrange multipliers leads us to the equation
x − b = λ(1, 1, 1, 1)ᵀ + μ(1, 0, 2, 1)ᵀ for some scalars λ and μ.
Thus, x = λ(1, 1, 1, 1)ᵀ + μ(1, 0, 2, 1)ᵀ + (3, 1, 1, −1)ᵀ, and so, using the constraint equation Ax = 0, we obtain
0 = AAᵀ(λ, μ)ᵀ + Ab = $\begin{bmatrix}4&4\\4&6\end{bmatrix}$(λ, μ)ᵀ + (4, 4)ᵀ,
and so (λ, μ)ᵀ = −(1, 0)ᵀ, and x = (3, 1, 1, −1)ᵀ − (1, 1, 1, 1)ᵀ = (2, 0, 0, −2)ᵀ.
(See also (‡) and the discussion on pp. 229-230.)
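The same normal-equations computation can be scripted; this sketch uses the part a data and the relation (AAᵀ)m = −Ab derived above, where m collects the multipliers (λ, μ):

```python
import numpy as np

# Data from part a: find the point of the plane Ax = 0 nearest to b.
A = np.array([[1., -1., 3.], [2., 1., 0.]])
b = np.array([3., 7., 1.])

# From x = A^T m + b and Ax = 0:  (A A^T) m = -A b.
m = np.linalg.solve(A @ A.T, -A @ b)
x = A.T @ m + b
print(x)       # [-2.  4.  2.], matching the hand computation
print(A @ x)   # essentially [0. 0.]
```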
5.4.27 We want to minimize the function f(x) = ‖x‖² subject to the constraint
g(x, y, z) = (x² − xy + y² − z² − 1, x² + y² − 1) = (0, 0).
The method of Lagrange multipliers leads to
(x, y, z) = λ(2x − y, −x + 2y, −2z) + μ(x, y, 0)
for some scalars λ and μ. There are two cases to consider. If z = 0, then the constraint equations tell us that xy = 0, so, letting λ = 0, we get the four critical points ±(1, 0, 0)ᵀ and ±(0, 1, 0)ᵀ. If z ≠ 0, then we must have λ = −1/2; then we must have
$\begin{bmatrix}2&-1\\-1&2\end{bmatrix}$(x, y)ᵀ = 2(μ − 1)(x, y)ᵀ,
and so μ = 3/2 or μ = 5/2. With μ = 3/2, we must have (x, y)ᵀ = ±(1/√2)(1, 1)ᵀ and there are no solutions (since this leads to z² = −1/2). With μ = 5/2, we must have (x, y)ᵀ = ±(1/√2)(1, −1)ᵀ, and this gives the critical points ±(1/√2)(1, −1, ±1)ᵀ. Evaluating f at the eight different points, we see that ±(1, 0, 0)ᵀ and ±(0, 1, 0)ᵀ are closest to the origin.
5.4.28 As shown in the figure below, let the sidelengths of the quadrilateral be a, b, c, and d, and let one pair of the opposite angles be x and y. (Without loss of generality, we may assume that 0 < x < π/2 and 0 < y < π.) Then the area of the quadrilateral is f(x, y) = ½(ab sin x + cd sin y).
(Figure: quadrilateral PQRS with side lengths a, b, c, d; x and y are the opposite angles at P and R.)
We wish to maximize this function subject to the constraint g(x, y) = ab cos x − cd cos y = ½(a² + b² − c² − d²) (coming from the law of cosines). This leads to the equation
(ab cos x, cd cos y) = λ(−ab sin x, cd sin y)
for some scalar λ. Thus, at a constrained critical point, we must have tan x = −tan y, so y = π − x. It now follows that the quadrilateral can be inscribed in a circle: Consider the circle circumscribed about △PQS; then R must lie on this circle, since ∠QRS and ∠SPQ subtend a total angle 2π.
5.4.29 Proceeding as in Example 4 and the preceding discussion, we have the following.
a. We must solve Ax = λx:
x + 2y = λx
2x − 2y = λy,
so, eliminating λ, we obtain (x + 2y)/x = (2x − 2y)/y. Thus, 2(y/x)² + 3(y/x) − 2 = (2(y/x) − 1)((y/x) + 2) = 0. Therefore, we have y/x = 1/2 or −2, leading to the critical points ±(1/√5)(2, 1)ᵀ and ±(1/√5)(1, −2)ᵀ, with respective Lagrange multipliers 2 and −3.
b. We must solve Ax = λx:
3y = λx
3x − 8y = λy,
leading to the equation 3y/x = (3x − 8y)/y, so 3(y/x)² + 8(y/x) − 3 = (3(y/x) − 1)((y/x) + 3) = 0. Therefore, we have y/x = 1/3 or −3, leading to the critical points ±(1/√10)(3, 1)ᵀ and ±(1/√10)(1, −3)ᵀ, with respective Lagrange multipliers 1 and −9.
5.4.30 Recall that ‖A‖ = max_{‖x‖=1} ‖Ax‖, so we need to maximize the function f(x) = ‖Ax‖² subject to the constraint g(x) = ‖x‖² = 1.
a. Here we have f(x, y) = (x + y)² + y² = x² + 2xy + 2y², so the method of Lagrange multipliers leads to the equation (x + y, x + 2y) = λ(x, y) for some scalar λ. Thus, (x + y)/x = (x + 2y)/y, and so (y/x)² − (y/x) − 1 = 0, so y/x = (1 ± √5)/2. This leads to the critical points ±(0.5257, 0.8507)ᵀ and ±(0.8507, −0.5257)ᵀ, and we have f(0.5257, 0.8507) ≈ 2.618 and f(0.8507, −0.5257) ≈ 0.382. Therefore, we deduce that ‖A‖ ≈ √2.618 ≈ 1.618 (which happens to be the golden ratio). (We can with hindsight derive the exact result algebraically: Letting r = y/x, we observed that r² = r + 1; the critical points (x₀, y₀) = (x₀, rx₀) are found by solving x₀²(1 + r²) = 1, so f(x₀, y₀) = ((1 + r)² + r²)x₀² = (2r² + 2r + 1)/(r² + 1) = (4r + 3)/(r + 2) = r + 1 = r², as required.)
b. Now we have f(x, y) = (2x + y)² + (3y)² = 4x² + 4xy + 10y², so the method of Lagrange multipliers leads to the equation (2x + y, x + 5y) = λ(x, y) for some scalar λ. Thus, (2x + y)/x = (x + 5y)/y, and so (y/x)² − 3(y/x) − 1 = 0, so y/x = (3 ± √13)/2. This leads to the critical points ±(0.2898, 0.9571)ᵀ and ±(0.9571, −0.2898)ᵀ, and we have f(0.2898, 0.9571) ≈ 10.606 and f(0.9571, −0.2898) ≈ 3.394. Therefore, ‖A‖ ≈ √10.606 ≈ 3.257.
c. Here we have f(x, y) = (2x + y)² + (x + 3y)² = 5x² + 10xy + 10y², so the method of Lagrange multipliers leads to the equation (x + y, x + 2y) = λ(x, y) for some scalar λ. Thus, (x + y)/x = (x + 2y)/y, and so (y/x)² − (y/x) − 1 = 0, so y/x = (1 ± √5)/2, just as in part a. This leads to the critical points ±(0.5257, 0.8507)ᵀ and ±(0.8507, −0.5257)ᵀ, and we have f(0.5257, 0.8507) ≈ 13.090 and f(0.8507, −0.5257) ≈ 1.910. Therefore, ‖A‖ ≈ √13.090 ≈ 3.618.
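The three operator norms can be confirmed with an eigenvalue computation; the matrices below are inferred from the quadratic forms in parts a-c (an assumption of this sketch, since only ‖Ax‖² is displayed in the solutions):

```python
import numpy as np

# Matrices inferred from the forms (x+y)^2 + y^2, (2x+y)^2 + (3y)^2, (2x+y)^2 + (x+3y)^2.
for A, expected in [(np.array([[1., 1.], [0., 1.]]), 1.618),
                    (np.array([[2., 1.], [0., 3.]]), 3.257),
                    (np.array([[2., 1.], [1., 3.]]), 3.618)]:
    # ||A|| is the square root of the largest eigenvalue of A^T A.
    norm = np.sqrt(np.linalg.eigvalsh(A.T @ A).max())
    print(round(norm, 3), expected)
```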
5.4.31 Consider the figure below: ℓ and h are constants, as is the length of the rope. Physics dictates that at equilibrium the weight hangs as low as possible. Thus, we wish to maximize
(Figure: the rope consists of segments of lengths x, y, and z; x and y make angles α and β with the horizontal, the horizontal span is ℓ, the height difference is h, and θ is the angle at which the two upper segments meet.)
f(x, y, z, α, β) = x sin α + z given the constraints
g(x, y, z, α, β) = (x + y + z − L, y sin β − x sin α − h, x cos α + y cos β − ℓ) = (0, 0, 0).
This leads to the following equation (which we write in the equivalent vector form for obvious typographical reasons):
(sin α, 0, 1, x cos α, 0) = λ(1, 1, 1, 0, 0) + μ(−sin α, sin β, 0, −x cos α, y cos β) + ν(cos α, cos β, 0, −x sin α, −y sin β)
for some scalars λ, μ, and ν. The third component of this equation tells us that λ = 1. The fifth component tells us that μ/ν = tan β, and from this and the second component we infer that ν = −cos β, so μ = −sin β. Now the fourth component gives us sin(α + β) = cos α, and the first component gives cos(α + β) = 1 − sin α. Squaring and adding these, we obtain cos(α + β) = 1/2, so α + β = π/3 and the angle θ we seek must be 2π/3. (This is not all that surprising if we think about the tension in the rope and the nature of the force vectors at equilibrium. Cf. the solution of Exercise 5.2.15.)
5.4.32 a. We have (g ◦ ψ)(t) = t locally, so Dg(ψ(c))ψ ′ (c) = 1. We also have Df (a) = λDg(a),
so applying both sides of this equation to the vector ψ ′ (c) gives (f ◦ ψ)′ (c) = Df (a)ψ ′ (c) =
λDg(a)ψ ′ (c) = λ.
b. Since f and g are C², the function F : Rⁿ × R × R → Rⁿ × R defined by
F(x, λ, c) = (∇f(x) − λ∇g(x), c − g(x))
is C¹. Then the Implicit Function Theorem (see Section 2 of Chapter 6 for the general statement) tells us that on the level set F⁻¹(0) we can solve locally for (x, λ) as a C¹ function of c provided the (n + 1) × (n + 1) matrix giving the derivative of F with respect to the variables x, λ is invertible. But that derivative is (aside from a couple of minus signs) the matrix given in the exercise.
5.4.33 Suppose we are given a budget g(x) = p · x = c and we wish to maximize the production function f. Then the method of Lagrange multipliers tells us that at the constrained critical point a we must have Df(a) = λDg(a) for some scalar λ. This means that (∂f/∂xᵢ)(a) = λpᵢ for all i = 1, ..., n, which in turn means that λ = (1/pᵢ)(∂f/∂xᵢ)(a) for all i = 1, ..., n. It is intuitively plausible that all the marginal productivities should be equal; if that for item j were greater, we would produce more
The result of Exercise 32a tells us that the marginal productivity is the derivative of the
(optimal) number of widgets produced as a function of our budget. That is, if we increase the
budget by one dollar, then λ will be the extra number of widgets produced optimally. What the
equality in this exercise establishes is that, at the optimal point, spending that extra dollar on any
one of the items results in the same increase in productivity. This is a non-obvious result.
5.4.34 Proceeding as in the hint, we parametrize the level set S locally by Φ : U → Rn , where
U is a neighborhood of a in Rn−1 . Set h = f − λg. If Hh◦Φ,a is positive (negative) definite, then
the function h◦ Φ has a local minimum (maximum) at a. But h◦ Φ = f ◦ Φ − λc, so h◦ Φ has a local
minimum (maximum) at a if and only if f ◦ Φ has a local minimum (maximum) at a, which in turns
happens if and only if f has a constrained local minimum (maximum) at a.
Now, we differentiate h◦ Φ carefully using the chain rule. Recall that for the particular value λ
that appears in Lagrange multipliers, we have Dh(a) = Df (a) − λDg(a) = 0. For j = 1, . . . , n − 1,
we have
$$\frac{\partial(h\circ\Phi)}{\partial u_j}(a)=\sum_{\ell=1}^n\frac{\partial h}{\partial x_\ell}(\Phi(a))\frac{\partial\Phi_\ell}{\partial u_j}(a),\quad\text{and so}$$
$$\frac{\partial^2(h\circ\Phi)}{\partial u_i\partial u_j}(a)=\sum_{k,\ell=1}^n\frac{\partial^2h}{\partial x_k\partial x_\ell}(a)\frac{\partial\Phi_k}{\partial u_i}(a)\frac{\partial\Phi_\ell}{\partial u_j}(a)+\sum_{\ell=1}^n\frac{\partial h}{\partial x_\ell}(a)\frac{\partial^2\Phi_\ell}{\partial u_i\partial u_j}(a)=\sum_{k,\ell=1}^n\frac{\partial^2h}{\partial x_k\partial x_\ell}(a)\frac{\partial\Phi_k}{\partial u_i}(a)\frac{\partial\Phi_\ell}{\partial u_j}(a)=v_i^{T}\operatorname{Hess}(h)(a)\,v_j,$$
where vᵢ = (∂Φ/∂uᵢ)(a), i = 1, ..., n − 1, give a basis for T_aS. That is, computing the Hessian of h∘Φ at a is exactly computing the restriction of the Hessian of h at a to the tangent space of S at a.
so
$$\operatorname{proj}_V=I-\operatorname{proj}_{V^\perp}=\frac16\begin{bmatrix}5&2&1\\2&2&-2\\1&-2&5\end{bmatrix}.$$
5.5. PROJECTIONS AND LEAST SQUARES 149
b. Let A = $\begin{bmatrix}1&0\\0&1\\1&-2\end{bmatrix}$. Then AᵀA = $\begin{bmatrix}2&-2\\-2&5\end{bmatrix}$ and so
$$P=A(A^{T}A)^{-1}A^{T}=\frac16\begin{bmatrix}1&0\\0&1\\1&-2\end{bmatrix}\begin{bmatrix}5&2\\2&2\end{bmatrix}\begin{bmatrix}1&0&1\\0&1&-2\end{bmatrix}=\frac16\begin{bmatrix}5&2&1\\2&2&-2\\1&-2&5\end{bmatrix}.$$
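Numerically (a small check of part b, using numpy's inverse rather than any hand reduction):

```python
import numpy as np

# Projection onto V = C(A) for the matrix A of part b.
A = np.array([[1., 0.], [0., 1.], [1., -2.]])
P = A @ np.linalg.inv(A.T @ A) @ A.T
print(P * 6)   # [[5, 2, 1], [2, 2, -2], [1, -2, 5]]
print(np.allclose(P @ P, P), np.allclose(P, P.T))   # idempotent and symmetric
```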
c. We have V = Span(v₁, v₂), where v₁ = (1, 0, 1)ᵀ and v₂ = (0, 1, −2)ᵀ. Applying the Gram-Schmidt process, we take w₁ = v₁ and
w₂ = v₂ − proj_{w₁} v₂ = (0, 1, −2)ᵀ − (−2/2)(1, 0, 1)ᵀ = (1, 1, −1)ᵀ.
Then V = Span(w₁, w₂) and
$$\operatorname{proj}_V=\sum_{i=1}^{2}\frac{1}{\|\mathbf{w}_i\|^2}\mathbf{w}_i\mathbf{w}_i^{T}=\frac12\begin{bmatrix}1\\0\\1\end{bmatrix}\begin{bmatrix}1&0&1\end{bmatrix}+\frac13\begin{bmatrix}1\\1\\-1\end{bmatrix}\begin{bmatrix}1&1&-1\end{bmatrix}=\frac16\begin{bmatrix}5&2&1\\2&2&-2\\1&-2&5\end{bmatrix}.$$
5.5.4 a. Let A = $\begin{bmatrix}1&1\\2&1\\1&-1\end{bmatrix}$ and b = (4, −2, 1)ᵀ. Then AᵀA = $\begin{bmatrix}6&2\\2&3\end{bmatrix}$ and Aᵀb = (1, 1)ᵀ. Then
x = (AᵀA)⁻¹Aᵀb = (1/14)$\begin{bmatrix}3&-2\\-2&6\end{bmatrix}$(1, 1)ᵀ = (1/14)(1, 4)ᵀ
is the least squares solution.
b. Ax = (1/14)(5, 6, −3)ᵀ is the point in C(A) closest to (4, −2, 1)ᵀ.
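The least squares solution can be cross-checked against numpy's built-in solver (a sketch using the data of 5.5.4):

```python
import numpy as np

# Least squares solution of Ax = b via the normal equations, versus numpy's lstsq.
A = np.array([[1., 1.], [2., 1.], [1., -1.]])
b = np.array([4., -2., 1.])

x_normal = np.linalg.solve(A.T @ A, A.T @ b)
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]
print(x_normal * 14)                     # [1. 4.], i.e. x = (1/14)(1, 4)
print(np.allclose(x_normal, x_lstsq))    # True
```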
5.5.5 a. Let A = $\begin{bmatrix}1&1\\1&-3\\2&1\end{bmatrix}$ and b = (1, 4, 3)ᵀ. Then AᵀA = $\begin{bmatrix}6&0\\0&11\end{bmatrix}$ and Aᵀb = (11, −8)ᵀ. Then
x = (AᵀA)⁻¹Aᵀb = $\begin{bmatrix}1/6&0\\0&1/11\end{bmatrix}$(11, −8)ᵀ = (11/6, −8/11)ᵀ
is the least squares solution.
b. Ax = (1/66)(73, 265, 194)ᵀ is the point in C(A) closest to (1, 4, 3)ᵀ.
" #
1 −1 3
5.5.6 a. Let B = . Then
2 1 0
3 1 2 " #−1 " # 3
T T −1 11 1 1 −1 3
x0 = b − B (BB ) Bb = 7 − −1 1 7
1 5 2 1 0
1 3 0 1
3 1 2 " #" # 3 1 2 " #
1 5 −1 −1 1 −1
=7− −1 1 = 7 − −1 1
54 −1 11 13 3 8
1 3 0 1 3 0
3 5 −2
= 7− 3 = 4.
1 −1 2
" #
1 1 1 1
b. Let B = . Then
1 0 2 1
3 1 1 " #−1 " # 3
1 1 0 1
x0 = b − B T (BB T )−1 Bb = 4 4 1 1 1 1
1−1 2 1
4 6 1 0 2 1
−1 1 1 −1
3 1 2
1 1 0
=
1−1 = 0.
−1 1 −2
a=0
a=1
a=3
a = 5,
which takes the matrix form Aa = b, with A = (1, 1, 1, 1)ᵀ and b = (0, 1, 3, 5)ᵀ. Then AᵀA = [4] and Aᵀb = [9], so a = 9/4. (Notice this is just the average of the given y-values.) The sum of the errors is (0 − 9/4) + (1 − 9/4) + (3 − 9/4) + (5 − 9/4) = 0, of course.
b. The system
−a + b = 0
b = 1
a + b = 3
2a + b = 5
a=1
a=2
a=1
a = 3,
which takes the matrix form Aa = b, with A = (1, 1, 1, 1)ᵀ and b = (1, 2, 1, 3)ᵀ. Then AᵀA = [4] and Aᵀb = [7], so a = 7/4. (Notice this is just the average of the given y-values.) The sum of the errors is (1 − 7/4) + (2 − 7/4) + (1 − 7/4) + (3 − 7/4) = 0, of course.
b. The system
a + b = 1
2a + b = 2
3a + b = 1
4a + b = 3
5.5.10 a. Suppose proj_V x = p and proj_V y = q. Then x − p and y − q are vectors in V⊥. Then
x + y = (p + q) + ((x − p) + (y − q)),
where p + q ∈ V and (x − p) + (y − q) ∈ V⊥, so proj_V(x + y) = p + q, as required.
b. Similarly, since
cx = c(p + (x − p)) = (cp) + c(x − p),
where cp ∈ V and c(x − p) ∈ V⊥, we conclude that proj_V(cx) = cp = c proj_V x.
5.5.11 a. Let b ∈ Rm . Then p = Ab = projV b is the unique vector in V with the property
that b − p ∈ V ⊥ . Moreover, it follows that whenever p ∈ V to start with, we must have Ap = p
(inasmuch as p ∈ V and p − p ∈ V ⊥ ). Therefore, for any b ∈ Rm , A2 b = A(Ab) = Ap = p = Ab,
so A2 = A.
To prove that A = AT , we show that Ax · y = x · Ay for all x, y ∈ Rm . We write y =
Ay + (y − Ay), where Ay ∈ V and y − Ay ∈ V ⊥ . Since Ax ∈ V , it follows that Ax · (y − Ay) = 0,
and so Ax·y = Ax·Ay. But a similar argument shows that x·Ay = Ax·Ay, and so Ax·y = x·Ay
for any x, y ∈ Rm . But now it follows that Ax · y = AT x · y, so (A − AT )x · y = 0 for all x, y ∈ Rm .
So, by Exercise 1.2.15, (A − AT )x = 0 for all x ∈ Rm and so A − AT = O, as required.
w₁ = v₁ = (1, 0, 0)ᵀ
w₂ = v₂ − proj_{w₁} v₂ = (2, 1, 0)ᵀ − (2/1)(1, 0, 0)ᵀ = (0, 1, 0)ᵀ
w₃ = v₃ − proj_{w₁} v₃ − proj_{w₂} v₃ = (3, 2, 1)ᵀ − (3/1)(1, 0, 0)ᵀ − (2/1)(0, 1, 0)ᵀ = (0, 0, 1)ᵀ.
These three vectors already have length 1, so they give an orthonormal basis.
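The same process is easy to automate. A minimal classical Gram-Schmidt sketch (the function is mine, not from the text), applied to the data of part b:

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors (classical Gram-Schmidt)."""
    basis = []
    for v in vectors:
        w = np.array(v, dtype=float)
        for q in basis:
            w -= (w @ q) * q              # subtract the projection onto q
        basis.append(w / np.linalg.norm(w))
    return basis

q1, q2, q3 = gram_schmidt([(1, 1, 1), (0, 1, 1), (0, 0, 1)])
print(q1 * np.sqrt(3))   # [1. 1. 1.]: matches q1 = (1/√3)(1, 1, 1) of part b
```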
b. Let v₁ = (1, 1, 1)ᵀ, v₂ = (0, 1, 1)ᵀ, and v₃ = (0, 0, 1)ᵀ. Applying the Gram-Schmidt process, we have
w₁ = v₁ = (1, 1, 1)ᵀ
w₂ = v₂ − proj_{w₁} v₂ = (0, 1, 1)ᵀ − (2/3)(1, 1, 1)ᵀ = (−2/3, 1/3, 1/3)ᵀ,
which we rescale to w₂′ = (−2, 1, 1)ᵀ;
w₃ = v₃ − proj_{w₁} v₃ − proj_{w₂′} v₃ = (0, 0, 1)ᵀ − (1/3)(1, 1, 1)ᵀ − (1/6)(−2, 1, 1)ᵀ = (0, −1/2, 1/2)ᵀ,
which we can likewise rescale to w₃′ = (0, −1, 1)ᵀ.
We now take q₁ = w₁/‖w₁‖ = (1/√3)(1, 1, 1)ᵀ, q₂ = w₂/‖w₂‖ = w₂′/‖w₂′‖ = (1/√6)(−2, 1, 1)ᵀ, and q₃ = w₃/‖w₃‖ = w₃′/‖w₃′‖ = (1/√2)(0, −1, 1)ᵀ to form our orthonormal basis.
c. Let v₁ = (1, 0, 1, 0)ᵀ, v₂ = (2, 1, 0, 1)ᵀ, and v₃ = (0, 1, 2, −3)ᵀ. Applying the Gram-Schmidt process, we have
w₁ = v₁ = (1, 0, 1, 0)ᵀ
w₂ = v₂ − proj_{w₁} v₂ = (2, 1, 0, 1)ᵀ − (2/2)(1, 0, 1, 0)ᵀ = (1, 1, −1, 1)ᵀ
w₃ = v₃ − proj_{w₁} v₃ − proj_{w₂} v₃ = (0, 1, 2, −3)ᵀ − (2/2)(1, 0, 1, 0)ᵀ − (−4/4)(1, 1, −1, 1)ᵀ = (0, 2, 0, −2)ᵀ.
We now take q₁ = w₁/‖w₁‖ = (1/√2)(1, 0, 1, 0)ᵀ, q₂ = w₂/‖w₂‖ = (1/2)(1, 1, −1, 1)ᵀ, and q₃ = w₃/‖w₃‖ = (1/√2)(0, 1, 0, −1)ᵀ to form our orthonormal basis.
d. Let v₁ = (−1, 2, 0, 2)ᵀ, v₂ = (2, −4, 1, −4)ᵀ, and v₃ = (−1, 3, 1, 1)ᵀ. Applying the Gram-Schmidt process, we have
w₁ = v₁ = (−1, 2, 0, 2)ᵀ
w₂ = v₂ − proj_{w₁} v₂ = (2, −4, 1, −4)ᵀ − (−18/9)(−1, 2, 0, 2)ᵀ = (0, 0, 1, 0)ᵀ
w₃ = v₃ − proj_{w₁} v₃ − proj_{w₂} v₃ = (−1, 3, 1, 1)ᵀ − (9/9)(−1, 2, 0, 2)ᵀ − (1/1)(0, 0, 1, 0)ᵀ = (0, 1, 0, −1)ᵀ.
We now take q₁ = w₁/‖w₁‖ = (1/3)(−1, 2, 0, 2)ᵀ, q₂ = w₂/‖w₂‖ = (0, 0, 1, 0)ᵀ, and q₃ = w₃/‖w₃‖ = (1/√2)(0, 1, 0, −1)ᵀ to form our orthonormal basis.
5.5.13 a. Take w₁ = (1, −1, 0, 0)ᵀ and
w₂ = (1, 0, 1, 0)ᵀ − (1/2)(1, −1, 0, 0)ᵀ = (1/2, 1/2, 1, 0)ᵀ.
To make life easier, we take w₂′ = (1, 1, 2, 0)ᵀ. Then {w₁, w₂′} gives an orthogonal basis for V.
b.
c. The least squares solution of Ax = b is the unique vector x with the property that Ax = proj_V b. In this case proj_V b is the first column of A, so x = (1, 0)ᵀ.
5.5.14 a. Take v₁ = (1, 0, 1, 1)ᵀ and
v₂ − proj_{v₁} v₂ = (0, 1, 0, 1)ᵀ − (1/3)(1, 0, 1, 1)ᵀ = (−1/3, 1, −1/3, 2/3)ᵀ;
we set v₂′ = (−1, 3, −1, 2)ᵀ for convenience. {v₁, v₂′} gives an orthogonal basis for V.
b. V⊥ = N$\left(\begin{bmatrix}1&0&1&1\\-1&3&-1&2\end{bmatrix}\right)$ = Span((−1, 0, 1, 0)ᵀ, (−1, −1, 0, 1)ᵀ). Applying the Gram-Schmidt process once more, we find that w₁ = (−1, 0, 1, 0)ᵀ and w₂ = (1, 2, 1, −2)ᵀ give an orthogonal basis for V⊥.
c. Given x ∈ R⁴, we set v = proj_V x and w = proj_{V⊥} x. Then one can easily check that x = v + w. Here we have
v = proj_{v₁} x + proj_{v₂′} x = ((x · v₁)/3) v₁ + ((x · v₂′)/15) v₂′
= ((x₁ + x₃ + x₄)/3)(1, 0, 1, 1)ᵀ + ((−x₁ + 3x₂ − x₃ + 2x₄)/15)(−1, 3, −1, 2)ᵀ
= (1/5)(2x₁ − x₂ + 2x₃ + x₄, −x₁ + 3x₂ − x₃ + 2x₄, 2x₁ − x₂ + 2x₃ + x₄, x₁ + 2x₂ + x₃ + 3x₄)ᵀ
w = proj_{w₁} x + proj_{w₂} x = ((x · w₁)/2) w₁ + ((x · w₂)/10) w₂
= ((−x₁ + x₃)/2)(−1, 0, 1, 0)ᵀ + ((x₁ + 2x₂ + x₃ − 2x₄)/10)(1, 2, 1, −2)ᵀ
= (1/5)(3x₁ + x₂ − 2x₃ − x₄, x₁ + 2x₂ + x₃ − 2x₄, −2x₁ + x₂ + 3x₃ − x₄, −x₁ − 2x₂ − x₃ + 2x₄)ᵀ.
5.5.15 a. Since C(A) = Span((1, 1)ᵀ), we know that b ∈ C(A) if and only if it has the form b = b(1, 1)ᵀ for some b ∈ R. A solution v of Av = b is given by v = b(1, 0, 0)ᵀ. To find x ∈ R(A) with Ax = b, we take
x = proj_{R(A)} v = ((v · (1, 2, 3)ᵀ)/14)(1, 2, 3)ᵀ = (b/14)(1, 2, 3)ᵀ.
b. Since rank(A) = 2, we know that C(A) = R². Row reducing the augmented matrix
$\left[\begin{array}{ccc|c}1&1&1&b_1\\0&1&-1&b_2\end{array}\right]$
yields the solution v = (b₁ − b₂, b₂, 0)ᵀ. The rows of A are orthogonal, so to find x ∈ R(A) we take
x = proj_{R(A)} v = ((v · (1, 1, 1)ᵀ)/3)(1, 1, 1)ᵀ + ((v · (0, 1, −1)ᵀ)/2)(0, 1, −1)ᵀ = (b₁/3)(1, 1, 1)ᵀ + (b₂/2)(0, 1, −1)ᵀ.
d. The first two rows of A are orthogonal and the third row is the sum of the first two. Thus, if b ∈ C(A), it must have the form b = (b₁, b₂, b₁ + b₂)ᵀ for some b₁, b₂ ∈ R. If we take
x = (b₁/4)(1, 1, 1, 1)ᵀ + (b₂/36)(1, 1, 3, −5)ᵀ
(as in part c), we see that Ax = b and x ∈ R(A).
as required.
b. First, V = U⁺ + U⁻ by the remark. Let's now check that U⁻ = (U⁺)⊥. We've already established that U⁻ ⊂ (U⁺)⊥, so it remains only to show that if f ∈ (U⁺)⊥, then f ∈ U⁻. Write f = f1 + f2, where f1 ∈ U⁺ and f2 ∈ U⁻. Then we have
0 = ⟨f, f1⟩ = ⟨f1, f1⟩ + ⟨f2, f1⟩ = ‖f1‖²,
since we've already shown that even and odd functions are orthogonal. Thus, f1 = 0 and f ∈ U⁻, as we needed to show. (This means that (U⁺)⊥⊥ = U⁺ in this instance. Although Proposition 4.8 of Chapter 4 need not hold in infinite dimensions, it does hold here.)
6.1.1 Suppose f is a contraction mapping. Then there is a constant c with 0 < c < 1 so that ‖f(x) − f(y)‖ ≤ c‖x − y‖. Given ε > 0, take δ = ε/c; then whenever ‖x − y‖ < δ, we have ‖f(x) − f(y)‖ < cδ = ε, as required. If x and y are fixed points, then ‖f(x) − f(y)‖ = ‖x − y‖ ≤ c‖x − y‖ can hold only when ‖x − y‖ = 0. That is, f cannot have more than one fixed point.
6.1.2 We have f(x) > |x| for all x ∈ R, so f cannot have a fixed point. On the other hand, |f′(x)| = |x|/√(x² + 1) < 1 for all x. Since lim_{|x|→∞} |f′(x)| = 1, we suspect that there is no c < 1 so that |f(x) − f(y)| ≤ c|x − y|. For example,
lim_{x→∞} (f(2x) − f(x))/x = lim_{x→∞} (√(4x² + 1) − √(x² + 1))/x = 1,
so there can indeed be no such c.
6.1.3 We have
x_k − x = (x0 + Σ_{j=1}^{k} (x_j − x_{j−1})) − (x0 + Σ_{j=1}^{∞} (x_j − x_{j−1})) = −Σ_{j=k+1}^{∞} (x_j − x_{j−1}),
so
‖x_k − x‖ ≤ Σ_{j=k+1}^{∞} ‖x_j − x_{j−1}‖ ≤ Σ_{j=k+1}^{∞} c^{j−1}‖x1 − x0‖ = (c^k/(1 − c))‖x1 − x0‖.
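The geometric-series tail bound just derived can be watched numerically (a sketch, not from the text). Here f(x) = cos x on [0, 1] is an assumed example of a contraction, with Lipschitz constant c = sin 1 < 1:

```python
import math

# Fixed-point iteration for the contraction f(x) = cos x on [0, 1]
# (|f'(x)| = |sin x| <= sin 1 =: c < 1 there), checking the tail bound
# ||x_k - x|| <= c^k/(1-c) * ||x_1 - x_0|| derived in 6.1.3.
c = math.sin(1.0)
x = 0.5
iterates = [x]
for _ in range(200):
    x = math.cos(x)
    iterates.append(x)

fixed = iterates[-1]                      # essentially the fixed point
assert abs(math.cos(fixed) - fixed) < 1e-9
for k in range(1, 20):
    bound = c**k / (1 - c) * abs(iterates[1] - iterates[0])
    assert abs(iterates[k] - fixed) <= bound + 1e-12
```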
6.1.4 Since ‖x_{k+1} − x_k‖ ≤ c‖x_k − x_{k−1}‖ for all k ∈ N, we have ‖x_{k+1} − x_k‖ ≤ c^k‖x1 − x0‖. Thus, when ℓ > k > K, the triangle inequality gives us
‖s_ℓ − s_k‖ = ‖Σ_{j=k+1}^{ℓ} a_j‖ ≤ Σ_{j=k+1}^{ℓ} ‖a_j‖ ≤ Σ_{j=k+1}^{∞} ‖a_j‖ < ε.
Therefore, the sequence {s_k} is a Cauchy sequence and hence converges, by Exercise 2.2.14.
6.1.6 a. If ‖H‖ < 1, then the geometric series Σ H^k converges (by virtue of Proposition 1.1 and the remark following). But
(I − H) Σ_{k=0}^{∞} H^k = lim_{K→∞} (I − H)(I + H + H² + · · · + H^K) = lim_{K→∞} (I − H^{K+1}) = I,
so Σ_{k=0}^{∞} H^k = (I − H)^{−1}. Replacing H by −H, we see that (I + H)^{−1} = Σ_{k=0}^{∞} (−1)^k H^k, and so, when ‖H‖ < ε < 1,
‖(I + H)^{−1} − I‖ = ‖Σ_{k=1}^{∞} (−1)^k H^k‖ ≤ Σ_{k=1}^{∞} ‖H‖^k = ‖H‖/(1 − ‖H‖) < ε/(1 − ε),
since the function f(x) = x/(1 − x) = −1 + 1/(1 − x) is increasing on (0, 1).
b. We have (A + H)^{−1} − A^{−1} = (A(I + A^{−1}H))^{−1} − A^{−1} = Σ_{k=1}^{∞} (−1)^k (A^{−1}H)^k A^{−1}, and so, proceeding as in part a, we have
‖(A + H)^{−1} − A^{−1}‖ ≤ ‖A^{−1}‖ Σ_{k=1}^{∞} ‖A^{−1}H‖^k ≤ ‖A^{−1}‖ Σ_{k=1}^{∞} (‖A^{−1}‖‖H‖)^k < ‖A^{−1}‖ ε/(1 − ε).
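A small numerical experiment (not from the text; the matrix A and the perturbation scale are arbitrary choices) illustrates the bound from part b with t = ‖A⁻¹H‖ in the spectral norm:

```python
import numpy as np

# Check the perturbation bound ||(A+H)^{-1} - A^{-1}|| <= ||A^{-1}|| * t/(1-t),
# where t = ||A^{-1} H|| < 1, using the operator 2-norm throughout.
rng = np.random.default_rng(0)
A = np.array([[2.0, 1.0], [0.0, 3.0]])
Ainv = np.linalg.inv(A)
for _ in range(100):
    H = 0.05 * rng.standard_normal((2, 2))   # small perturbation
    t = np.linalg.norm(Ainv @ H, 2)
    assert t < 1
    lhs = np.linalg.norm(np.linalg.inv(A + H) - Ainv, 2)
    rhs = np.linalg.norm(Ainv, 2) * t / (1 - t)
    assert lhs <= rhs + 1e-12
```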
c. Given ε > 0, set δ = ε/(‖A^{−1}‖(‖A^{−1}‖ + ε)). If (Σ h_ij²)^{1/2} < δ, then ‖H‖ < δ, and it follows from part b that
‖f(A + H) − f(A)‖ < ‖A^{−1}‖ · (ε/(‖A^{−1}‖ + ε)) / (1 − ε/(‖A^{−1}‖ + ε)) = ε,
as required.
6.1.8 a. We have
g(x1) = g(x0) + g′(x0)h0 + ½g″(ξ)h0² = g(x0) + g′(x0)(−g(x0)/g′(x0)) + ½g″(ξ)h0² = ½g″(ξ)h0².
Therefore,
|g(x1)| ≤ ½M · g(x0)²/g′(x0)² = ½|g(x0)| · M|g(x0)|/g′(x0)² ≤ ¼|g(x0)|,
as required.
b. Applying the Mean Value Theorem to g′ gives g′(x1) = g′(x0) + g″(c)h0 for some c between x0 and x1. Note that |g″(c)h0| ≤ M|g(x0)|/|g′(x0)| ≤ ½|g′(x0)|. Therefore, by the triangle inequality, we have |g′(x1)| ≥ |g′(x0)| − |g″(c)h0| ≥ ½|g′(x0)|, and so 1/|g′(x1)| ≤ 2/|g′(x0)|. Thus, we have
|g(x1)|/g′(x1)² ≤ ¼|g(x0)| · 4/g′(x0)² = |g(x0)|/g′(x0)².
It follows that
M|g(x1)|/g′(x1)² ≤ M|g(x0)|/g′(x0)² ≤ ½.
c. We now have
|h1| = |g(x1)|/|g′(x1)| ≤ ¼|g(x0)| · 2/|g′(x0)| = ½|h0|.
d. Because of the final result in part b, the arguments can all be iterated to show that
|g(x_k)| ≤ ¼|g(x_{k−1})|,  1/|g′(x_k)| ≤ 2/|g′(x_{k−1})|,  and  |h_k| = |g(x_k)|/|g′(x_k)| ≤ |h_{k−1}|/2
for all k ∈ N. Therefore, we have |h_k| ≤ |h_{k−1}|/2 ≤ |h_{k−2}|/4 ≤ · · · ≤ |h0|/2^k. Since the series Σ h_k converges absolutely, it follows from Proposition 1.1 that it converges. Moreover,
|Σ_{k=1}^{∞} h_k| ≤ Σ_{k=1}^{∞} |h_k| ≤ Σ_{k=1}^{∞} |h0|/2^k = |h0|,
so Newton's method converges to a point in the closed interval of radius |h0| centered at x1.
6.1.9 a. We have x0 = 1 and x1 = x0 − g(x0)/g′(x0) = 1.5, so h0 = −0.5. On the interval [1, 2], we have |g″| = 2 = M, and |g(x0)|M = 2 ≤ ½(4) = ½(g′(x0))². Therefore, we are guaranteed that Newton's method will converge to a root in the interval [1, 2]. We have x2 = x1 − g(x1)/g′(x1) = 1.41667, x3 = 1.41422, x4 = 1.41421, etc.
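A few lines of Python reproduce the iterates of part a (assuming g(x) = x² − 2, which matches the data shown; this is a sketch, not part of the text):

```python
# Newton's method for g(x) = x^2 - 2 starting at x0 = 1, as in part a.
def newton(g, gp, x, steps):
    xs = [x]
    for _ in range(steps):
        x = x - g(x) / gp(x)
        xs.append(x)
    return xs

xs = newton(lambda x: x * x - 2, lambda x: 2 * x, 1.0, 5)
assert abs(xs[1] - 1.5) < 1e-12          # x1 = 1.5
assert abs(xs[2] - 1.41667) < 1e-5       # x2 = 1.41667
assert abs(xs[-1] - 2 ** 0.5) < 1e-12    # converges to sqrt(2)
```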
b. We have x0 = 1.25 and x1 = x0 − g(x0)/g′(x0) = 1.26, so h0 = −0.01. On the interval [1.25, 1.27], we have |g″| ≤ M = 7.62 < 8, so |g(x0)|M ≈ 0.38 < 2.34 ≈ ½(g′(x0))². Therefore, we are guaranteed that Newton's method will converge to a root in the interval [1.25, 1.27]. Indeed, we have x2 = x3 = · · · = 1.25992105.
c. We have x0 = π/4 ≈ 0.785398 and x1 = x0 − g(x0)/g′(x0) = 0.523599, so h0 ≈ −0.2618. Now, |g″(x)| = |4 cos 2x| ≤ 4 = M, and we have |g(x0)|M = (π/4)(4) ≤ ½(3)² = ½(g′(x0))², so we are guaranteed that Newton's method will converge to a root in the interval [0.26, 0.79]. Indeed, we have x2 = 0.514961, x3 = 0.514933, etc.
6.1.10 This follows the proof of the one-dimensional version in Exercise 8.
a. By Proposition 3.2 of Chapter 5, for each i = 1, . . . , n, we have g_i(x1) = g_i(x0) + Dg_i(x0)h0 + ½H_{g_i, x0+ξ_i h0}(h0) for some 0 < ξ_i < 1. Moreover, we have |H_{g_i, x0+ξ_i h0}(h0)| ≤ M_i‖h0‖². By the definition of h0, we have g(x0) + Dg(x0)h0 = 0, and so
‖g(x1)‖ ≤ ½ (Σ_{i=1}^{n} (H_{g_i, x0+ξ_i h0}(h0))²)^{1/2} ≤ ½M‖h0‖² ≤ ½M‖Dg(x0)^{−1}‖²‖g(x0)‖² ≤ ¼‖g(x0)‖.
b. The Mean Value Inequality tells us that, for each i = 1, . . . , n, ‖Dg_i(x1) − Dg_i(x0)‖ ≤ M_i‖h0‖, and so, by Exercise 5.1.5, we have ‖Dg(x1) − Dg(x0)‖ ≤ M‖h0‖. Then, following the hint, if we let H = Dg(x0)^{−1}(Dg(x1) − Dg(x0)), then we see that ‖H‖ ≤ ‖Dg(x0)^{−1}‖M‖h0‖ ≤ ‖Dg(x0)^{−1}‖²‖g(x0)‖M ≤ 1/2, and so, by part a of Exercise 7, we have ‖(I + H)^{−1} − I‖ ≤ 1. Therefore, ‖Dg(x1)^{−1} − Dg(x0)^{−1}‖ ≤ ‖Dg(x0)^{−1}‖‖(I + H)^{−1} − I‖ ≤ ‖Dg(x0)^{−1}‖, and so ‖Dg(x1)^{−1}‖ ≤ 2‖Dg(x0)^{−1}‖.
c. Combining the results of parts a and b, we obtain ‖Dg(x1)^{−1}‖²‖g(x1)‖ ≤ 4‖Dg(x0)^{−1}‖² · ¼‖g(x0)‖ = ‖Dg(x0)^{−1}‖²‖g(x0)‖. Therefore, ‖Dg(x1)^{−1}‖²‖g(x1)‖M ≤ ‖Dg(x0)^{−1}‖²‖g(x0)‖M ≤ 1/2.
d. Using the results of parts a and b, and substituting once for ‖h0‖, we have
‖h1‖ = ‖Dg(x1)^{−1}g(x1)‖ ≤ ‖Dg(x1)^{−1}‖‖g(x1)‖ ≤ 2‖Dg(x0)^{−1}‖ · ½M‖h0‖²
     ≤ ‖Dg(x0)^{−1}‖²‖g(x0)‖M‖h0‖ ≤ ‖h0‖/2.
e. Because of the result of part c, the arguments can all be iterated to show that
‖g(x_k)‖ ≤ ¼‖g(x_{k−1})‖,  ‖Dg(x_k)^{−1}‖ ≤ 2‖Dg(x_{k−1})^{−1}‖,  and  ‖h_k‖ ≤ ½‖h_{k−1}‖
for all k ∈ N. Therefore, we have ‖h_k‖ ≤ ‖h_{k−1}‖/2 ≤ ‖h_{k−2}‖/4 ≤ · · · ≤ ‖h0‖/2^k. Since the series Σ h_k converges absolutely, it follows from Proposition 1.1 that it converges. Moreover,
‖Σ_{k=1}^{∞} h_k‖ ≤ Σ_{k=1}^{∞} ‖h_k‖ ≤ Σ_{k=1}^{∞} ‖h0‖/2^k = ‖h0‖,
so Newton's method converges to a point in the closed ball of radius ‖h0‖ centered at x1.
6.1.11 a. We have x0 = (1, 0) and x1 = x0 − Dg(x0)^{−1}g(x0) = (1, 1/4). We have g(x0) = (0, −1) and Dg(x0) = [4 0; 0 4], so ‖Dg(x0)^{−1}‖ = 1/4. We have Hess(g1) = [0 0; 0 2] and Hess(g2) = [0 4; 4 0], so M1 = 2 and M2 = 4. Therefore, we have M = 2√5. Then ‖Dg(x0)^{−1}‖²‖g(x0)‖M = (1/4)²(1)(2√5) = √5/8 < 1/2, so we are guaranteed that Newton's method will converge to a root in the ball B(x1, 1/4). In fact, x2 = (0.983871, 0.254032) and x3 = x4 = · · · = (0.983858, 0.254102).
b. We have x0 = (2, 0) and x1 = x0 − Dg(x0)^{−1}g(x0) = (9/4, 1/3). We have g(x0) = (−1, −1) and Dg(x0) = [4 0; 0 3], so ‖Dg(x0)^{−1}‖ = 1/3. We have Hess(g1) = [2 0; 0 2] and Hess(g2) = [0 3/2; 3/2 0], so M1 = 2 and M2 = 3/2. Therefore, we have M = 5/2. Then ‖Dg(x0)^{−1}‖²‖g(x0)‖M = (1/3)²(√2)(5/2) ≈ 0.39 < 1/2, so we are guaranteed that Newton's method will converge to a root in the ball B(x1, 5/12). In fact, x2 = (2.216164, 0.301309), x3 = x4 = · · · = (2.215733, 0.300879).
c. We have x0 = (π, 0) and x1 = (π, 1/(2π)) ≈ (3.141593, 0.159155), so ‖h0‖ = 1/(2π). We have g(x0) = (0, −1) and Dg(x0) = [−4 0; 0 2π], so ‖Dg(x0)^{−1}‖ = 1/4. We have Hess(g1) = [−4 sin x1 0; 0 2] and Hess(g2) = [0 2; 2 0], so M1 ≤ 4 (in fact 2 is more accurate) and M2 = 2, so M = 2√5. Then ‖Dg(x0)^{−1}‖²‖g(x0)‖M = (1/4)²(1)(2√5) = √5/8 < 1/2. Thus, we are guaranteed that Newton's method will converge to a root in the ball B(x1, 1/(2π)). In fact, we have x2 = (3.1478998, 0.1588354), x3 = (3.1478999, 0.1588361).
d. We have x0 = (0, 1) and x1 = (1/4, 1). We have g(x0) = (−1/4, 0) and Dg(x0) = [1 −1/2; 0 1], so ‖Dg(x0)^{−1}‖ ≈ 1.28. We have Hess(g1) = [0 0; 0 −1/2] and Hess(g2) = [cos x1 0; 0 0], so M1 = 1/2 and M2 = 1. Thus, we have M = √5/2, and so ‖Dg(x0)^{−1}‖²‖g(x0)‖M ≈ (1.28)²(0.25)(1.12) ≈ 0.46 < 1/2. Therefore, we are guaranteed that Newton's method will converge to a root in the ball B(x1, 1/4). Indeed, we have x2 = (0.236167, 0.972335), x3 = (0.236299, 0.972211).
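The two-variable computations above can be replayed in a few lines. The specific system below is an assumption — one choice of g consistent with the data displayed in part b (g(2, 0) = (−1, −1), Dg(2, 0) = diag(4, 3), root near (2.2157, 0.3009)) — not necessarily the text's:

```python
import numpy as np

# Two-variable Newton iteration for the hypothetical system
# g(x, y) = (x^2 + y^2 - 5, (3/2)xy - 1), starting at (2, 0) as in part b.
def g(p):
    x, y = p
    return np.array([x * x + y * y - 5.0, 1.5 * x * y - 1.0])

def Dg(p):
    x, y = p
    return np.array([[2 * x, 2 * y], [1.5 * y, 1.5 * x]])

p = np.array([2.0, 0.0])
for _ in range(20):
    p = p - np.linalg.solve(Dg(p), g(p))   # Newton step

assert np.linalg.norm(g(p)) < 1e-12
assert abs(p[0] - 2.215733) < 1e-3 and abs(p[1] - 0.300879) < 1e-3
```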
6.1.12 Following the hints, we set g(t) = f(a + t(b − a)) and v = g(1) − g(0), and consider φ(t) = g(t)·v. Since g is differentiable, so is φ, and so, by the Mean Value Theorem, we have ‖v‖² = φ(1) − φ(0) = φ′(c) = g′(c)·v ≤ ‖g′(c)‖‖v‖ for some 0 < c < 1. Therefore, we have ‖f(b) − f(a)‖ = ‖v‖ ≤ ‖g′(c)‖ for that value of c. Now, by the chain rule, g′(c) = Df(a + c(b − a))(b − a), so ‖g′(c)‖ ≤ ‖Df(a + c(b − a))‖‖b − a‖. Setting ξ = a + c(b − a), we obtain the desired result.
6.2.1 Note once and for all that all the functions f are C¹.
a. We have Df(x) = 2[x −y; y x], so Df(x) is invertible provided x² + y² ≠ 0. So, for every x0 ≠ 0, f has a local C¹ inverse g near x0, and
Dg(f(x0)) = Df(x0)^{−1} = (1/(2(x0² + y0²))) [x0 y0; −y0 x0].
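The formula for Dg(f(x0)) can be checked against a finite-difference Jacobian (a sketch, not from the text). Here f(x, y) = (x² − y², 2xy) is an assumption consistent with the displayed Df:

```python
import numpy as np

# Finite-difference check of the inverse-Jacobian formula in part a,
# assuming f(x, y) = (x^2 - y^2, 2xy) so that Df(x) = 2*[[x, -y], [y, x]].
def f(p):
    x, y = p
    return np.array([x * x - y * y, 2 * x * y])

def num_jacobian(f, p, h=1e-6):
    J = np.zeros((2, 2))
    for j in range(2):
        e = np.zeros(2); e[j] = h
        J[:, j] = (f(p + e) - f(p - e)) / (2 * h)   # central differences
    return J

x0, y0 = 1.0, 2.0
J = num_jacobian(f, np.array([x0, y0]))
r2 = x0 * x0 + y0 * y0
Dg = np.array([[x0, y0], [-y0, x0]]) / (2 * r2)      # formula from the solution
assert np.allclose(J @ Dg, np.eye(2), atol=1e-6)     # Dg really inverts Df
```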
b. We have
Df(x) = (1/(x² + y²)²) [y² − x²  −2xy; −2xy  x² − y²].
Thus, Df(x) is invertible if and only if (x² − y²)² + 4x²y² = (x² + y²)² ≠ 0. So, for every x0 ≠ 0, f has a local C¹ inverse g near x0, and
Dg(f(x0)) = Df(x0)^{−1} = −[x0² − y0²  2x0y0; 2x0y0  y0² − x0²].
" #
1 h′ (y)
c. We have Df (x) = , which is invertible for all x. Then for any x0 , f has
0 1
" #
1 −1 1 −h′ (y0 )
a local C inverse g near x0 , and Dg(f (x0 )) = Df (x0 ) = . (Indeed, we can write
0 1
x x − h(y)
down an explicit global inverse: g = .)
y y
" #
1 ey
d. We have Df (x) = , so Df (x) is invertible provided ex+y 6= 1, i.e., provided
ex 1
1 −1
# x0 , f has a local C inverse g near x0 and Dg(f (x0 )) = Df (x0 ) =
x + y 6= 0." For any such
1 1 −ey0
.
1 − ex0 +y0 −ex0 1
e. We have Df(x) = [1 1 1; y+z x+z x+y; yz xz xy]. Now
[1 1 1; y+z x+z x+y; yz xz xy] ~ [1 1 1; 0 x−y x−z; 0 0 (x−z)(y−z)],
so Df(x0) is nonsingular if and only if x0, y0, and z0 are all distinct. With (more than) a bit of algebra, we find that Dg(f(x0)) = Df(x0)^{−1} has rows
( x0²/((x0−y0)(x0−z0)),  −x0/((x0−y0)(x0−z0)),  1/((x0−y0)(x0−z0)) ),
( −y0²/((x0−y0)(y0−z0)),  y0/((x0−y0)(y0−z0)),  −1/((x0−y0)(y0−z0)) ),
( z0²/((x0−z0)(y0−z0)),  −z0/((x0−z0)(y0−z0)),  1/((x0−z0)(y0−z0)) ).
6.2.2 a. Note that f maps U to the (open) first quadrant in R². In fact, since (u + v)² − 4uv = (u − v)² > 0, we see that f maps U to the set of points (x, y) with x² > 4y. Indeed, consider
g(x, y) = ( ½(x + √(x² − 4y)), ½(x − √(x² − 4y)) ).
Then, letting W = {(x, y) : x² > 4y > 0, x > 0}, we see that g : W → U is the global inverse function of f.
b. Calculating directly, we have
Dg(x, y) = ½ [1 + x/√(x² − 4y)   −2/√(x² − 4y); 1 − x/√(x² − 4y)   2/√(x² − 4y)]
         = (1/√(x² − 4y)) [½(x + √(x² − 4y))  −1; −½(x − √(x² − 4y))  1].
On the other hand, by the Inverse Function Theorem, if f(u, v) = (x, y), then we have
Now we claim that, no matter what the shape of the triangle is, ∂θ/∂z is the largest of the three partial derivatives in absolute value; that is, the angle is most sensitive to a small change in the opposite side. (The clever student is invited to find a heuristic geometric argument for this.) But this is not difficult: note that z > |x − y cos θ|, inasmuch as z² = x² + y² − 2xy cos θ > x² + y² cos²θ − 2xy cos θ = (x − y cos θ)² (and similarly when we switch x and y).
6.2.7 a. We know that f is C¹ (since, for example, the entries are polynomial functions). We have Df(A)B = AB + BA, so Df(I)B = 2B, and the linear map Df(I) : R^{n²} → R^{n²} is certainly invertible. Therefore, in a neighborhood of I the function f has a C¹ inverse, thereby giving a C¹ square root for all matrices sufficiently close to f(I) = I. Similarly, Df(−I) is invertible, and so we get a local C¹ inverse on a neighborhood of −I as well.
b. Note that there are infinitely many matrices A so that f(A) = I. Aside from ±I, the standard matrix for every reflection (across every k-dimensional subspace, 0 < k < n) is such. But f is not locally invertible in a neighborhood of any such matrix. Indeed, as the hint suggests, let's examine what happens at A0 = [1 0; 0 −1]. Then Df(A0)B = [2b11 0; 0 −2b22], so Df(A0) has a 2-dimensional nullspace and is certainly not invertible; this corresponds to changing the 1-dimensional subspaces on which the reflection is respectively the identity and negative the identity.
Of course, the Inverse Function Theorem gives a sufficient, but not necessary, condition to have a local inverse. So, to be sure, we should find a matrix B near I that has no square root near A0. But this is easy: let B = [1 ε; 0 1] for ε ≠ 0.
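The formula Df(A)B = AB + BA used in part a can be verified with a difference quotient (a sketch, not from the text; the matrices are arbitrary):

```python
import numpy as np

# The derivative of f(A) = A^2 is the linear map B -> AB + BA:
# compare a difference quotient with that formula.
rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
t = 1e-6
quotient = ((A + t * B) @ (A + t * B) - A @ A) / t   # (f(A+tB) - f(A))/t
assert np.allclose(quotient, A @ B + B @ A, atol=1e-4)
```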
6.2.8 We have
(∂p/∂V)|_T = −(∂F/∂V)/(∂F/∂p),  (∂V/∂T)|_p = −(∂F/∂T)/(∂F/∂V),  and  (∂T/∂p)|_V = −(∂F/∂p)/(∂F/∂T),
and so
(∂p/∂V)|_T (∂V/∂T)|_p (∂T/∂p)|_V = −(∂F/∂V)/(∂F/∂p) · (∂F/∂T)/(∂F/∂V) · (∂F/∂p)/(∂F/∂T) = −1.
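The triple-product identity can be illustrated numerically (not from the text). The relation F(p, V, T) = pV − T = 0 below is a hypothetical example, chosen only because all three implicit solves are explicit:

```python
# Numerical illustration of (dp/dV)_T (dV/dT)_p (dT/dp)_V = -1
# on the surface F(p, V, T) = pV - T = 0.
def d(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)   # central difference

p0, V0 = 2.0, 3.0
T0 = p0 * V0                                  # point on the surface F = 0

dp_dV = d(lambda V: T0 / V, V0)               # p(V, T) = T/V at fixed T
dV_dT = d(lambda T: T / p0, T0)               # V(T, p) = T/p at fixed p
dT_dp = d(lambda p: p * V0, p0)               # T(p, V) = pV at fixed V
assert abs(dp_dV * dV_dT * dT_dp + 1.0) < 1e-6
```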
6.2.10 Given ‖Df(x) − I‖ ≤ ½ for ‖x‖ ≤ r, we have (by Proposition 1.3) ‖f(x) − x‖ ≤ ½‖x‖ whenever ‖x‖ ≤ r. By Exercise 1.2.17, we have ‖x‖ − ‖f(x)‖ ≤ ‖f(x) − x‖ ≤ ½‖x‖, and so ‖f(x)‖ ≥ ½‖x‖. In particular, when ‖x‖ = r, we have ‖f(x)‖ ≥ r/2.
6.2.11 Following the proof of Theorem 2.1, define φ(x) = x − f(x) + y. Then Dφ(x) = I − Df(x), so ‖Dφ(x)‖ ≤ s < 1 and φ is a contraction mapping. If x ∈ B, then ‖φ(x)‖ ≤ ‖x − f(x)‖ + ‖y‖ < sr + r(1 − s) = r, so φ is a contraction mapping from B to itself. Therefore, φ has a unique fixed point x, which in turn is a point so that f(x) = y. (In fact, x ∈ B, since the image of φ lies in the open ball.)
" #
c
c c f G
=F G = 0 ,
0 0
∗
c
and so b = G is a point (near 0) with f (b) = c.
0
6.2.13 a. Consider the C¹ function F : R³ → R² given by F(x, t) = ( f(x, t), (∂f/∂t)(x, t) ). Then the hypothesis of the problem tells us precisely that near (x0, t0) the equation F(x, t) = 0 defines x = g(t) for some C¹ function g. The equation f(g(t), t) = 0 tells us that g(t) lies on the curve C_t. Now, differentiating h(t) = f(g(t), t) = 0 gives us
0 = h′(t) = ∇f(g(t), t) · g′(t) + (∂f/∂t)(g(t), t),
so having (∂f/∂t)(g(t), t) = 0 tells us as well that g′(t) is tangent to C_t. Thus, g gives (locally) a parametrization of the envelope, as desired.
b. We solve the equations given in part a for x and y as functions of t.
(i) g(t) = (cos t, sin t), so the envelope is the circle x² + y² = 1.
(ii) g(t) = (1/(2t), t/2), so the envelope is the hyperbola 4xy = 1.
(iii) g(t) = (t^{3/2}, (1 − t)^{3/2}), so the envelope is the hypocycloid x^{2/3} + y^{2/3} = 1.
6.3.2 This subset of R² seems to fulfill the requirements of definition (3), and yet in no neighborhood of the origin is it a graph over either of the coordinate axes. What goes wrong is this: lim_{t→π/4} g(t) = 0 = g(−π/4), so g^{−1} fails to be continuous at the origin.
6.3.3 No, any neighborhood of 0, for example, contains portions of infinitely many of these
parallel lines and is therefore not a graph over either coordinate axis.
6.3.4 Yes. Despite the fact that the hyperbola gets closer and closer to the asymptote as |x| → ∞, given any point p in this set, we can find a ball W ⊂ R² centered at p in which we have a graph. To wit, say a > 0; if p = (a, 0), then take W = B(p, min(1/(2a), 1/2)), and if p = (a, 1/a), then take W = B(p, 1/(2a)). Alternatively, we can observe that this locus is the zero set of the function F(x, y) = y(xy − 1), and DF = [y²  2xy − 1] is everywhere nonzero.
6.3.5 a. explicit: graph of (x, z) = f(y) = (y², y⁴); implicit: zero set of F(x, y, z) = (x − y², z − x²); note that DF has rank 2 everywhere.
b. explicit: graph (locally) of y = ±3√(1 − x²) or x = ±√(1 − (y/3)²); implicit: zero set of F(x, y) = x² + y²/9 − 1, whose derivative has rank 1 everywhere on the curve.
y
cos t 1p− 12 cos2 t
c. parametric: g(t) = sin t , |t| < π/3, or g(t) = ± cos t 1 − (cos2 t)/4 ,
√
± 2 cos t − 1 sin t
√ 1 2
y ± 1−x 2 x p 2 (1 + z )
|t| < π/2; explicit: graph (locally) of = √ or = .
z ± 2x − 1 y ± 1 − (1 + z 2 )2 /4
d. parametric: The curve has two connected portions, one parametrized by g₊(t) = (cos t, sin t, −sin t, cos t), the other by g₋(t) = (cos t, sin t, sin t, −cos t); explicit: graph (locally) of (y, z, w) = (±√(1 − x²), ∓√(1 − x²), x) or (y, z, w) = (±√(1 − x²), ±√(1 − x²), −x).
DF = [2x1 2x2 2x3 0 0 0; 0 0 0 2y1 2y2 2y3; y1 y2 y3 x1 x2 x3].
Suppose F(x) = 0. If rank(DF(x)) < 3, then y = λx and x = µy for some scalars λ and µ, and
since both x and y are unit vectors, we must have λ, µ = ±1. But then x · y = 0 is impossible.
Therefore, rank(DF(x)) = 3 for all x ∈ F−1 (0), and so F−1 (0) is a 3-dimensional manifold. In
fact, this manifold can be visualized as the collection of all unit tangent vectors of the unit sphere
S 2 ⊂ R3 .
7.1.1 Let P1 = {0 = x0 < x1 = 1} be the trivial partition of [0, 1] and let P2 = {0 = y0 < y1 < y2 < y3 = 1} be a partition of [0, 1] with the properties that y1 ≤ ½ < y2 and y2 − y1 < ε; set P = P1 × P2. Then for j = 1 and 3, we have m1j = M1j, whereas m12 = 0 and M12 = 1. Then
U(f, P) − L(f, P) = (M12 − m12)(y2 − y1) = y2 − y1 < ε,
and so, by the Convenient Criterion, Proposition 1.3, we infer that f is integrable. Now, for our particular partition P, we have L(f, P) = 1 − y2 < ½ ≤ 1 − y1 = U(f, P); thus, 1/2 is the only number that can lie between all lower and upper sums, and therefore ∫_R f dA = 1/2.
7.1.2 As suggested in the hint, let P_N be the partition of R into 1/N × 1/N squares R_ij, 1 ≤ i, j ≤ N. Then whenever |i − j| > 1, we have m_ij = M_ij = 0; but whenever |i − j| ≤ 1, we have m_ij = 0 and M_ij = 1. (Note that each of the squares with |i − j| = 1 has one corner on the diagonal.) Therefore, we have U(f, P_N) − L(f, P_N) = (3N − 2)(1/N²). Since lim_{N→∞} (3N − 2)/N² = 0, it follows that for any ε > 0 we can find N sufficiently large so that U(f, P_N) − L(f, P_N) < ε. Therefore, f is integrable on R. On the other hand, since L(f, P_N) = 0 for every N, it must be the case that I = 0 is the unique number satisfying L(f, P) ≤ I ≤ U(f, P) for every partition P. That is, ∫_R f dA = 0.
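The count in 7.1.2 is easy to reproduce (a sketch, not from the text): the cells of P_N meeting the diagonal are exactly those with |i − j| ≤ 1, so U − L = (3N − 2)/N² → 0:

```python
# For the partition of [0,1]^2 into N x N squares, count the cells meeting
# the diagonal y = x; these contribute all of U(f, P_N) - L(f, P_N).
def upper_minus_lower(N):
    touching = sum(1 for i in range(N) for j in range(N) if abs(i - j) <= 1)
    return touching / N**2

for N in (2, 10, 100):
    assert abs(upper_minus_lower(N) - (3 * N - 2) / N**2) < 1e-15
assert upper_minus_lower(1000) < 0.004     # the difference tends to 0
```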
7.1.3 Just as in the preceding problem, let P_N be the partition of R into 1/N × 1/N squares R_ij, 1 ≤ i, j ≤ N. Then whenever i ≥ j, we have M_ij = 1; if i < j, we have M_ij = 0. On the other hand, if i ≤ j + 1, then m_ij = 0, and if i > j + 1, then m_ij = 1. Thus, M_ij − m_ij = 0 unless 0 ≤ i − j ≤ 1, in which case M_ij − m_ij = 1. Summing up, we have U(f, P_N) − L(f, P_N) = (2N − 1)(1/N²). Since lim_{N→∞} (2N − 1)/N² = 0, it follows that for any ε > 0 we can find N sufficiently large so that U(f, P_N) − L(f, P_N) < ε. Therefore, f is integrable on R. Now, L(f, P_N) = (N − 1)(N − 2)/2N² and U(f, P_N) = N(N + 1)/2N², both of which approach 1/2 as N → ∞. Therefore, I = 1/2 is the unique number satisfying L(f, P) ≤ I ≤ U(f, P) for every partition P. That is, ∫_R f dA = 1/2.
7.1.4 First, note that Σ_{n=2}^{∞} 1/(n(n + 1)) = Σ_{n=2}^{∞} (1/n − 1/(n + 1)) = ½. Given any 0 < ε < 1, choose N ≤ 2/ε. Let P1 be the partition of [0, 1] with x0 = 0, x1 = ε/2, x2 = 1/(N−1) − ε/(2(N−1)N), x3 = 1/(N−1) + ε/(2(N−1)N), x4 = 1/(N−2) − ε/(2(N−2)(N−1)), x5 = 1/(N−2) + ε/(2(N−2)(N−1)), . . . , x_{2N−4} = ½ − ε/12, x_{2N−3} = ½ + ε/12, x_{2N−2} = 1. Let P2 be the trivial partition of [0, 1], and let P = P1 × P2. Then we claim that U(f, P) − L(f, P) < ε, so that f is integrable on R. Note that m_i1 = 0 for all i. When i is even, M_i1 = 0, and when i is odd, M_i1 = 1. Therefore,
U(f, P) − L(f, P) = Σ_i (M_i1 − m_i1)(x_i − x_{i−1}) = Σ_{i odd} (x_i − x_{i−1})
= ε/2 + 2·(ε/(2(N−1)N)) + · · · + 2·(ε/12) < ε/2 + 2 · ½ · ε/2 = ε,
as required. Moreover, since L(f, P) = 0 for every partition P, we must have ∫_R f dA = 0.
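The telescoping sum used at the start of 7.1.4 can be checked directly (a sketch, not from the text):

```python
# Partial sums of sum_{n=2}^{N} 1/(n(n+1)) telescope to 1/2 - 1/(N+1) -> 1/2.
for N in (10, 1000):
    s = sum(1.0 / (n * (n + 1)) for n in range(2, N + 1))
    assert abs(s - (0.5 - 1.0 / (N + 1))) < 1e-12
```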
7.1.5 a. Say x0 ∈ R and f(x0) > 0. By Exercise 2.3.5, there is a neighborhood of x0 on which f ≥ f(x0)/2, so there is a rectangle R′ containing x0 so that m′ = inf_{x∈R′} f(x) ≥ f(x0)/2. For any partition P′ for which R′ is a rectangle belonging to P′, we therefore have L(f, P′) ≥ m′ vol(R′) > 0. Since f is integrable, I = ∫_R f dV is the unique number satisfying L(f, P) ≤ I ≤ U(f, P) for all partitions P; therefore 0 < L(f, P′) ≤ I, so I > 0.
b. Let R = [0, 1] × [0, 1], and let f(x) = 1 when x = 0 and f(x) = 0 otherwise. Then ∫_R f dA = 0, despite the fact that f is positive at some point of R.
7.1.7 Let Mε = sup_{x∈B(a,ε)} f(x) and mε = inf_{x∈B(a,ε)} f(x). Then we have (as in Exercise 6a)
mε ≤ (∫_{B(a,ε)} f dV) / vol B(a, ε) ≤ Mε.
Since f is continuous at a, lim_{ε→0⁺} mε = f(a) = lim_{ε→0⁺} Mε, and so, by Exercise 2.3.3, we have
lim_{ε→0⁺} (1/vol B(a, ε)) ∫_{B(a,ε)} f dV = f(a), as required.
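A grid computation (not from the text; the function f and the point a are arbitrary choices) illustrates 7.1.7 — averages over shrinking balls converge to f(a):

```python
# Average a continuous f over shrinking disks around a; the averages
# should approach f(a), with the error shrinking as eps does.
def ball_average(f, a, eps, M=400):
    total, count = 0.0, 0
    for i in range(M):
        for j in range(M):
            x = a[0] - eps + (2 * eps) * (i + 0.5) / M
            y = a[1] - eps + (2 * eps) * (j + 0.5) / M
            if (x - a[0]) ** 2 + (y - a[1]) ** 2 <= eps * eps:
                total += f(x, y)
                count += 1
    return total / count

f = lambda x, y: x * x + y
a = (0.3, 0.4)
err_big = abs(ball_average(f, a, 0.2) - f(*a))
err_small = abs(ball_average(f, a, 0.02) - f(*a))
assert err_small < err_big and err_small < 1e-3
```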
7.1.8 Let R″ = R ∩ R′. Note that Ω ⊂ R″. Let f̃″ denote the extension of f to R″. Then since f̃ is 0 outside R″ except perhaps on a set of volume 0 (namely, the intersection of the frontier of R″ with Ω), we have ∫_R f̃ dV = ∫_{R″} f̃ dV = ∫_{R″} f̃″ dV. Similarly, ∫_{R′} f̃′ dV = ∫_{R″} f̃′ dV = ∫_{R″} f̃″ dV.
7.1.9 a. The crucial inequality given in the hint follows from, for example, the fact that on each subrectangle R_i we have sup_{x∈R_i}(f + g)(x) ≤ sup_{x∈R_i} f(x) + sup_{x∈R_i} g(x) and inf_{x∈R_i}(f + g)(x) ≥ inf_{x∈R_i} f(x) + inf_{x∈R_i} g(x).
Since f and g are integrable on R, given ε > 0 there is a partition P′ so that U(f, P′) − L(f, P′) < ε/2 and another partition P″ so that U(g, P″) − L(g, P″) < ε/2. Letting P be the common refinement¹ of P′ and P″, we have U(f, P) − L(f, P) < ε/2 and U(g, P) − L(g, P) < ε/2. Therefore, (U(f, P) + U(g, P)) − (L(f, P) + L(g, P)) < ε. From the inequality given in the hint, we have
U(f + g, P) − L(f + g, P) ≤ (U(f, P) + U(g, P)) − (L(f, P) + L(g, P)) < ε,
¹We take the union of the partitions in each coordinate so as to obtain an actual partition of the large rectangle. So we are really refining P′ and P″ and then taking a union.
we see that ∫_{R′} f dV + ∫_{R″} f dV and ∫_R f dV both lie between L(f, P′) + L(f, P″) and U(f, P′) + U(f, P″), and so, by uniqueness, they must be equal.
Conversely, suppose f is integrable on R. Given ε > 0, let P be a partition of R so that U(f, P) − L(f, P) < ε. Let P̃ be the refinement we obtain by appending the missing face of R′, and let P̃′ be the corresponding partition of R′. Then U(f, P̃′) − L(f, P̃′) ≤ U(f, P̃) − L(f, P̃) ≤ U(f, P) − L(f, P) < ε, so f is integrable on R′ (and, similarly, on R″). Now
L(f, P) ≤ L(f, P̃) = L(f, P̃′) + L(f, P̃″) ≤ ∫_{R′} f dV + ∫_{R″} f dV ≤ U(f, P̃′) + U(f, P̃″) = U(f, P̃) ≤ U(f, P),
so ∫_{R′} f dV + ∫_{R″} f dV and ∫_R f dV both lie between L(f, P) and U(f, P). By uniqueness, they must be equal.
7.1.10 Following the hint, we start with a partition P′ so that U(f, P′) − L(f, P′) < ε/2. Say the total (n − 1)-dimensional volume of the partitioning hyperplanes is A. (If the partition is given as on p. 268 of the text, then we can give an explicit formula for A, viz., A = Σ_{i=1}^{n} (k_i − 1) Π_{j≠i} (b_j − a_j).) Suppose now we consider a partition P of R by rectangles of diameter < δ. Then the total volume of all of those that intersect the partitioning hyperplanes is at most 2Aδ. (To cover an (n − 1)-dimensional rectangle of (n − 1)-dimensional volume A with n-dimensional rectangles of diameter (and therefore height) < δ requires less than volume Aδ. With thanks to Jacob Rooney for pointing this out, we need a factor of 2 in case the partitioning hyperplanes belong to rectangles on either side, i.e., when the partitioning hyperplanes are faces of the rectangles.) Now the contribution of those rectangles to U(f, P) − L(f, P) is at most 2M · 2Aδ. The contribution of the remaining rectangles is at most U(f, P′) − L(f, P′) < ε/2, inasmuch as every other rectangle is contained in one of the rectangles of P′. Thus, if we choose δ < ε/8MA, then we will have 2M · 2Aδ < ε/2 and so U(f, P) − L(f, P) < ε, as required.
b. Given a linear map T : Rⁿ → Rⁿ, recall that it maps any ball of radius R into a ball of radius ‖T‖R. Since a cube of diameter δ is contained in a ball of radius δ/2, it follows that T maps a cube of diameter δ into a ball of radius ‖T‖δ/2, which is, in turn, contained in a cube of diameter ‖T‖√n δ. Letting k = (‖T‖√n)ⁿ, given any cube C, it follows that T(C) is contained in a cube whose volume is at most k times that of C.
Given X with volume 0, we find cubes C1, . . . , Cr so that X ⊂ C1 ∪ · · · ∪ Cr and Σ vol(C_i) < ε/k. Then T(X) ⊂ T(C1) ∪ · · · ∪ T(Cr) ⊂ C1′ ∪ · · · ∪ Cr′, and Σ vol(C_i′) ≤ k Σ vol(C_i) < ε. Therefore, T(X) also has volume 0.
When m < n, of course this needn't be true. Take, for example, the projection of a region in an m-dimensional subspace. But what goes wrong with the proof? Imagine covering a line segment in R² parallel to the x-axis by s squares of area ε/s each. Then its projection to the x-axis is covered by s line segments of length √(ε/s) each. But as s → ∞, note that s√(ε/s) = √(sε) → ∞.
7.1.12 By Exercise 5.1.13, there is δ > 0 so that for every x ∈ X, we have B(x, δ) ⊂ U. Suppose the sidelength of the cube C is c. Now choose N > c√m/δ. Then when we divide C into N^m subcubes, each of sidelength c/N < δ/√m, it follows that the subcube containing x ∈ X must lie inside U. Let Y be the union of those subcubes covering X. Then Y is also compact and so, by the Maximum Value Theorem, the continuous function ‖Dφ‖ has a maximum value, say M, on Y. It follows from Proposition 1.3 of Chapter 6 that for any cube C′ of sidelength r, the image φ(C′) is contained in a cube of sidelength Mr√n. Since X is covered by the subcubes constituting Y, φ(X) is covered by at most N^m cubes of volume at most ((c/N)M√n)ⁿ. That is, φ(X) is covered by cubes whose total volume is at most N^m ((c/N)M√n)ⁿ = (cM√n)ⁿ N^{m−n}. Since m < n, we see that by making N bigger we can arrange for this to be less than any given positive ε. Therefore φ(X) has volume 0.
7.1.13 The function f given here is discontinuous at every rational point. Nevertheless, f is integrable. Given ε > 0, choose N > 2/ε. Then list all the rational numbers with denominator ≤ N: 0, 1, 1/2, 1/3, 2/3, . . . , 1/N, . . . , (N−1)/N. Create a partition P in which each of these points is an interior point of a subinterval of the partition, and in which all the lengths of those subintervals add up to ε/2. For any one of these subintervals, we have M_i ≤ 1 and m_i = 0. For any of the remaining subintervals, we have M_i ≤ 1/(N + 1) < ε/2. Therefore, we have
U(f, P) − L(f, P) = Σ_{special subintervals} (M_i − m_i)(x_i − x_{i−1}) + Σ_{remaining subintervals} (M_i − m_i)(x_i − x_{i−1}) < 1 · ε/2 + ε/2 · 1 = ε.
Therefore, f is integrable on [0, 1]. Moreover, since L(f, P) = 0 for every partition P, we must have ∫_0^1 f(x) dx = 0.
Just count along the "anti-diagonals," as pictured above. In formulas, we assign the counting number i + (i + j − 1)(i + j − 2)/2 to the set R_ij.
7.1.15 a. Since M(f, a, δ) decreases and m(f, a, δ) increases as δ → 0⁺, it follows that their difference, M(f, a, δ) − m(f, a, δ), decreases. On the other hand, it is bounded below by 0, hence converges as δ → 0⁺. Whenever ‖x − a‖ < δ, we have |f(x) − f(a)| ≤ M(f, a, δ) − m(f, a, δ). Therefore, if o(f, a) = 0, then f is continuous at a. Conversely, if f is continuous at a, then given any ε > 0, there is δ > 0 so that whenever ‖x − a‖ < δ, we have |f(x) − f(a)| < ε/2, so for any x, y ∈ B(a, δ), we have |f(x) − f(y)| < ε and therefore M(f, a, δ) − m(f, a, δ) ≤ ε. Since ε is arbitrary, we must have o(f, a) = 0.
b. By part a, if f is discontinuous at a, then o(f, a) > 0, so there is some k ∈ N so that o(f, a) ≥ 1/k. Therefore, x ∈ D implies x ∈ D_{1/k} for some k ∈ N, so D ⊂ D1 ∪ D_{1/2} ∪ D_{1/3} ∪ · · ·. On the other hand, if x ∈ D_{1/k} for some k ∈ N, then f is discontinuous at x, so x ∈ D. Therefore D ⊃ D1 ∪ D_{1/2} ∪ D_{1/3} ∪ · · ·, and we've established equality.
To prove D_ε is a closed set, suppose x_k → a and x_k ∈ D_ε. This means that there are points y_k, z_k ∈ B(x_k, 1/k) with f(y_k) − f(z_k) ≥ ε. Since y_k → a and z_k → a, this shows that o(f, a) ≥ ε as well.
c. Choose ε > 0. Then there is a partition P so that U(f, P) − L(f, P) < ε/k. For each x ∈ D_{1/k}, we have x ∈ R_i for some rectangle in P, and so (1/k)vol(R_i) ≤ (M_i − m_i)vol(R_i). Therefore, we have
(1/k) Σ_{R_i ∩ D_{1/k} ≠ ∅} vol(R_i) ≤ Σ_{R_i ∩ D_{1/k} ≠ ∅} (M_i − m_i)vol(R_i) ≤ U(f, P) − L(f, P) < ε/k.
Therefore, Σ_{R_i ∩ D_{1/k} ≠ ∅} vol(R_i) < ε and therefore D_{1/k} is a set of volume 0. It follows from Exercise 14 that D_{1/k} is a set of measure 0 for each k ∈ N and therefore D has measure 0.
d. The proof is quite like that of Proposition 1.8. Suppose |f| ≤ M. Suppose D has measure 0 and we are given ε > 0; set ε′ = ε/2vol(R). Then D_{ε′} ⊂ D has measure 0; since D_{ε′} is a closed subset of a rectangle, it is compact, and therefore by Exercise 14c it has volume 0. We can cover D_{ε′} by finitely many rectangles R_j′, j = 1, . . . , s, whose volumes sum to less than ε/4M; we also make sure that no point of D_{ε′} is a frontier point of the union of these rectangles.
Consider the closure Y of R − ∪_{j=1}^{s} R_j′. For every x ∈ Y, we have o(f, x) < ε′ and so there is an open rectangle S_x on which sup_{y∈S_x} f(y) − inf_{y∈S_x} f(y) < ε′. By Exercise 5.1.12, we can cover Y by finitely many such rectangles (hence by their closures). We finally create a partition P = {R_i} of R so that every one of the S_x's we used and every R_j′ we used is a union of subrectangles of P. Then we have
U(f, P) − L(f, P) = Σ_{R_j ⊂ Y} (M_j − m_j) vol(R_j) + Σ_{R_j ⊂ ∪ R_j′} (M_j − m_j) vol(R_j)
(where M_j − m_j ≤ ε′ in the first sum and M_j − m_j ≤ 2M in the second)
< ε′ vol(R) + 2M Σ_{j=1}^{s} vol(R_j′) < (ε/2vol(R)) vol(R) + 2M · ε/4M = ε/2 + ε/2 = ε.
Therefore, it follows from the Convenient Criterion, Proposition 1.3, that f is integrable on R.
so our integral is equal to ½(π/4 − 1) + ¼ log 2. On the other hand, changing the order of integration gives us
∫_0^1 ∫_y^{√y} x/(1 + y²) dx dy = ½ ∫_0^1 (1/(1 + y²)) [x²]_{x=y}^{x=√y} dy = ½ ∫_0^1 ( y/(1 + y²) − y²/(1 + y²) ) dy
= ½ [ ½ log(1 + y²) − y + arctan y ]_0^1 = ½ ( ½ log 2 − 1 + π/4 ).
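As a cross-check (not from the text), a two-dimensional midpoint sum over the region y ≤ x ≤ √y agrees with the common value of the two iterated integrals:

```python
import math

# Midpoint Riemann sum for the double integral of x/(1+y^2) over the region
# y <= x <= sqrt(y), 0 <= y <= 1, compared with 1/2*(pi/4 - 1) + 1/4*log(2).
exact = 0.5 * (math.pi / 4 - 1) + 0.25 * math.log(2)

N = 600
total = 0.0
for i in range(N):
    for j in range(N):
        x = (i + 0.5) / N
        y = (j + 0.5) / N
        if y <= x <= math.sqrt(y):          # the region of integration
            total += x / (1 + y * y)
approx = total / (N * N)
assert abs(approx - exact) < 0.01
```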
7.2.4 No. ∫_0^1 f(x, y) dy exists only for x = 1/2, since the integrand is otherwise everywhere discontinuous. (Alternatively, when 0 ≤ x < 1/2, every upper sum is 1 and every lower sum is 2x.)
7.2.5 We claim that {(x, y) : x = m/q and y = n/q for some m, n, q ∈ N with q prime} is dense in R². In particular, any rectangle whatsoever contains a point of that form. To establish this, note that if q is prime and 1/q < δ (remember that, by Euclid, there are infinitely many primes), then any interval in R of length δ must contain a point of the form m/q (or else we'd have an interval of length δ separating two consecutive multiples of 1/q).
7.2.6 This is a two-dimensional variant of the function we studied in Exercise 7.1.13. Taking the trivial partition of the interval [0, 1] on the y-axis and using the same partition on the x-axis, we see that f is integrable on R.
When y ∈ Q, we know from that exercise that ∫_0^1 f(x, y) dx exists and is equal to 0. Therefore, ∫_0^1 f(x, y) dx = 0 for all y and ∫_0^1 ∫_0^1 f(x, y) dx dy = 0. On the other hand, for any x ∈ Q, x = p/q in lowest terms, the integral ∫_0^1 f(x, y) dy does not exist, since every lower sum is 0 and every upper sum is 1/q. Therefore, the iterated integral ∫_0^1 ∫_0^1 f(x, y) dy dx does not exist.
7.2.7 Yes. Here is a cheap solution: Let f(x, y) = 1 when x = 0 and y ∈ Q or y = 0 and x ∈ Q, and f(x, y) = 0 otherwise. Then neither ∫_0^1 f(x, 0) dx nor ∫_0^1 f(0, y) dy exists, so neither iterated integral exists.
7.2.8 In all these cases, the key is to change the order of integration. The student should in every such problem begin by sketching the region.
a. $\displaystyle\int_0^2\!\int_0^{x^2} \frac{1}{1+x^3}\,dy\,dx = \int_0^2 \frac{x^2}{1+x^3}\,dx = \frac13\log(1+x^3)\Big]_0^2 = \frac13\log 9 = \frac23\log 3.$
b. $\displaystyle\int_0^1\!\int_0^{y^3} e^{y^4}\,dx\,dy = \int_0^1 y^3 e^{y^4}\,dy = \frac14 e^{y^4}\Big]_0^1 = \frac14(e-1).$
c. $\displaystyle\int_0^1\!\int_0^{x^2} e^{y/x}\,dy\,dx = \int_0^1 \Big[x e^{y/x}\Big]_0^{x^2}\,dx = \int_0^1 x(e^x - 1)\,dx = \Big[xe^x - e^x - \frac12 x^2\Big]_0^1 = \frac12.$
Note that, despite the seeming discontinuity at the origin, the function is really bounded on the region $\Omega$, since $0 \le y/x \le x \le 1$ on $\Omega$. Moreover, as $x \to 0$ in $\Omega$, $0 \le y/x \le x \to 0$, so the function in fact approaches 1 as $x \to 0$.
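The three answers can be cross-checked numerically. The following sketch is not part of the original solution; the helper `iterated` and the grid size are my own choices, and it simply applies a midpoint rule to each iterated integral:

```python
import math

def iterated(f, a, b, lo, hi, n=400):
    """Midpoint rule for ∫_a^b ∫_{lo(u)}^{hi(u)} f(u, v) dv du."""
    hu = (b - a) / n
    total = 0.0
    for i in range(n):
        u = a + (i + 0.5) * hu
        c, d = lo(u), hi(u)
        hv = (d - c) / n
        total += sum(f(u, c + (j + 0.5) * hv) for j in range(n)) * hv * hu
    return total

# a. ∫₀² ∫₀^{x²} dy dx/(1+x³) = (2/3) log 3
va = iterated(lambda x, y: 1 / (1 + x**3), 0, 2, lambda x: 0, lambda x: x * x)
# b. ∫₀¹ ∫₀^{y³} e^{y⁴} dx dy = (e − 1)/4   (the outer variable is y here)
vb = iterated(lambda y, x: math.exp(y**4), 0, 1, lambda y: 0, lambda y: y**3)
# c. ∫₀¹ ∫₀^{x²} e^{y/x} dy dx = 1/2
vc = iterated(lambda x, y: math.exp(y / x), 0, 1, lambda x: 0, lambda x: x * x)
```

All three agree with the closed-form values to well within the midpoint rule's error.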
7.2.9 Let $f(x,y) = \sqrt{16-y^2}$ and $\Omega = \{(x,y) : 0 \le y \le 4,\ 0 \le x \le y/2\}$. Then the volume in question is
$$\int_\Omega f\,dA = \int_0^4\!\int_0^{y/2} \sqrt{16-y^2}\,dx\,dy = \frac12\int_0^4 y\sqrt{16-y^2}\,dy = -\frac16(16-y^2)^{3/2}\Big]_0^4 = \frac{32}{3}.$$
7.2.10 $\displaystyle\int_\Omega 1\,dV = \int_{-2}^2\!\int_0^{4-x^2}\!\int_0^y dz\,dy\,dx = \int_{-2}^2\!\int_0^{4-x^2} y\,dy\,dx = \frac12\int_{-2}^2 (4-x^2)^2\,dx = \int_0^2 (16 - 8x^2 + x^4)\,dx = \frac{256}{15}.$
188 7. INTEGRATION
7.2.11 The volume of the region $\Omega$ is most easily computed as an iterated integral with the $x$-integral outermost. By symmetry, we have
$$\int_\Omega 1\,dV = 8\int_0^1\!\int_0^{\sqrt{1-x^2}}\!\int_0^{\sqrt{1-x^2}} dz\,dy\,dx = 8\int_0^1 (1-x^2)\,dx = \frac{16}{3}.$$
7.2.12 a. $\displaystyle\int_0^1\!\int_0^{1-x}\!\int_0^{1-x-z} f(x,y,z)\,dy\,dz\,dx$
b. $\displaystyle\int_0^1\!\int_0^{1-x^2}\!\int_z^{1-x^2} f(x,y,z)\,dy\,dz\,dx$
c. $\displaystyle\int_{-1}^1\!\int_{|x|}^1\!\int_{-\sqrt{z^2-x^2}}^{\sqrt{z^2-x^2}} f(x,y,z)\,dy\,dz\,dx$
d. $\displaystyle\int_0^1\!\int_0^x\!\int_0^{1-x^2} f(x,y,z)\,dy\,dz\,dx + \int_0^1\!\int_x^{1+x-x^2}\!\int_{z-x}^{1-x^2} f(x,y,z)\,dy\,dz\,dx$
e. $\displaystyle\int_0^1\!\int_0^x\!\int_0^{1-x} f(x,y,z)\,dy\,dz\,dx + \int_0^1\!\int_x^1\!\int_{z-x}^{1-x} f(x,y,z)\,dy\,dz\,dx$
7.2.13 Let $\Omega$ be the region in the first octant bounded by the plane $x/a + y/b + z/c = 1$. Then, doing careful bookkeeping, its volume is given by
$$\int_\Omega 1\,dV = \int_0^a\!\int_0^{b(1-x/a)}\!\int_0^{c(1-x/a-y/b)} dz\,dy\,dx = c\int_0^a\!\int_0^{b(1-x/a)} \Big(1 - \frac{x}{a} - \frac{y}{b}\Big)\,dy\,dx$$
$$= c\int_0^a \Big[\Big(1-\frac{x}{a}\Big)y - \frac{y^2}{2b}\Big]_0^{b(1-x/a)}\,dx = \frac{bc}{2}\int_0^a \Big(1-\frac{x}{a}\Big)^2 dx = \frac{abc}{6}\cdot\Big[-\Big(1-\frac{x}{a}\Big)^3\Big]_0^a = \frac{abc}{6}.$$
$$\int_\Omega x\,dV = \int_0^1\!\int_{1-z}^1\!\int_0^{2-y-z} x\,dx\,dy\,dz = \frac12\int_0^1\!\int_{1-z}^1 (2-y-z)^2\,dy\,dz = \frac16\int_0^1 \Big[-(2-y-z)^3\Big]_{1-z}^1\,dz$$
$$= \frac16\int_0^1 \big(1 - (1-z)^3\big)\,dz = \frac16\Big[z + \frac14(1-z)^4\Big]_0^1 = \frac16\cdot\frac34 = \frac18.$$
7.2.16 We have
$$\int_0^1\!\int_0^1 \frac{x-y}{(x+y)^3}\,dx\,dy = \int_0^1\!\int_0^1\Big(\frac{1}{(x+y)^2} - \frac{2y}{(x+y)^3}\Big)\,dx\,dy = \int_0^1\Big[-\frac{1}{x+y} + \frac{y}{(x+y)^2}\Big]_{x=0}^{x=1}\,dy$$
$$= \int_0^1\Big(-\frac{1}{y+1} + \frac{y}{(y+1)^2}\Big)\,dy = \int_0^1 -\frac{dy}{(y+1)^2} = \frac{1}{y+1}\Big]_0^1 = -\frac12;$$
whereas
$$\int_0^1\!\int_0^1 \frac{x-y}{(x+y)^3}\,dy\,dx = \int_0^1\!\int_0^1\Big(-\frac{1}{(x+y)^2} + \frac{2x}{(x+y)^3}\Big)\,dy\,dx = \int_0^1\Big[\frac{1}{x+y} - \frac{x}{(x+y)^2}\Big]_{y=0}^{y=1}\,dx$$
$$= \int_0^1\Big(\frac{1}{x+1} - \frac{x}{(x+1)^2}\Big)\,dx = \int_0^1 \frac{dx}{(x+1)^2} = -\frac{1}{x+1}\Big]_0^1 = \frac12.$$
There is no contradiction: Fubini's Theorem does not apply, as $f$ is unbounded on $[0,1]\times[0,1]$ and hence not integrable.
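A small numerical check of the work above (my own addition, not part of the text): the partial-fraction split of the integrand can be verified pointwise, and the outer integrals of the two inner results come out to $\mp\frac12$:

```python
def midpoint(g, a, b, n=20000):
    """Midpoint-rule approximation of ∫_a^b g."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

f = lambda x, y: (x - y) / (x + y)**3

# The algebraic split used above, checked at a few sample points:
for x, y in [(0.3, 0.7), (1.0, 0.2), (0.05, 0.9)]:
    assert abs(f(x, y) - (1 / (x + y)**2 - 2 * y / (x + y)**3)) < 1e-12

# Outer integrals of the two inner results computed above:
dx_inner_first = midpoint(lambda y: -1 / (1 + y)**2, 0, 1)  # ≈ −1/2
dy_inner_first = midpoint(lambda x: 1 / (1 + x)**2, 0, 1)   # ≈ +1/2
```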
7.2.17 Note that $f$ is unbounded on $R$, hence not integrable. Let $R_{k\ell} = \big\{(x,y) : \frac{1}{k+1} \le x \le \frac{1}{k},\ \frac{1}{\ell+1} \le y \le \frac{1}{\ell}\big\}$. Then we have
$$\int_{R_{k\ell}} f\,dA = \begin{cases} 1/2^{\ell-k}, & k < \ell \\ -1, & k = \ell \\ 0, & k > \ell. \end{cases}$$
Interesting! (This is a manifestation of the fact from infinite series that only in the case of absolute convergence can we rearrange with impunity.)
To be a bit more pedantic, note that when $\frac{1}{k+1} < x \le \frac{1}{k}$, we have $\int_0^1 f(x,y)\,dy = -k(k+1) + \sum_{\ell > k} \dfrac{k(k+1)}{2^{\ell-k}} = 0$. Therefore, $\int_0^1\!\int_0^1 f\,dy\,dx$ really is equal to 0. And, when $\frac{1}{\ell+1} < y \le \frac{1}{\ell}$, then we have $\int_0^1 f(x,y)\,dx = -\ell(\ell+1) + \sum_{k<\ell} \dfrac{\ell(\ell+1)}{2^{\ell-k}} = -\ell(\ell+1)\dfrac{1}{2^{\ell-1}}$. Therefore, integrating with respect to $y$, we obtain
$$\int_0^1\!\int_0^1 f\,dx\,dy = \sum_\ell -\ell(\ell+1)\frac{1}{2^{\ell-1}}\cdot\frac{1}{\ell(\ell+1)} = -2,$$
as we surmised.
7.2.18 b. Substituting $x_i \mapsto -x_i$ in each variable and using the symmetry $f(-\mathbf{x}) = -f(\mathbf{x})$, we have
$$\int_R f(\mathbf{x})\,dV = \int_{-b_n}^{b_n}\!\!\cdots\!\int_{-b_1}^{b_1} f(x_1,\ldots,x_n)\,dx_1\cdots dx_n = \int_{-b_n}^{b_n}\!\!\cdots\!\int_{-b_1}^{b_1} f(-x_1,\ldots,-x_n)\,dx_1\cdots dx_n$$
$$= \int_R f(-\mathbf{x})\,dV = -\int_R f(\mathbf{x})\,dV.$$
Therefore, $\int_R f\,dV = -\int_R f\,dV = 0$.
c. The same result applies to an arbitrary region $\Omega$ having the same symmetry. We enclose $\Omega$ in a symmetric rectangle, consider the extended function $\tilde f$, and observe that it has the same symmetry. Then the result follows from the results of a and b.
7.2.19 Suppose that for some $\mathbf{x}_0$ we have $\dfrac{\partial^2 f}{\partial x\,\partial y}(\mathbf{x}_0) > \dfrac{\partial^2 f}{\partial y\,\partial x}(\mathbf{x}_0)$. Since $f$ is $C^2$, there is a ball centered at $\mathbf{x}_0$ (hence a rectangle $R = [a,b]\times[c,d]$ centered at $\mathbf{x}_0$) on which $\dfrac{\partial^2 f}{\partial x\,\partial y} > \dfrac{\partial^2 f}{\partial y\,\partial x}$. It follows from Exercise 7.1.5 that $\int_R \Big(\dfrac{\partial^2 f}{\partial x\,\partial y} - \dfrac{\partial^2 f}{\partial y\,\partial x}\Big)\,dA > 0$, and hence that $\int_R \dfrac{\partial^2 f}{\partial x\,\partial y}\,dA > \int_R \dfrac{\partial^2 f}{\partial y\,\partial x}\,dA$. Since $f$ is $C^2$, the integrands are continuous and we can evaluate these by iterated integrals, using the Fundamental Theorem of Calculus. We have
$$\int_R \frac{\partial^2 f}{\partial y\,\partial x}\,dA = \int_a^b\!\int_c^d \frac{\partial}{\partial y}\Big(\frac{\partial f}{\partial x}\Big)\,dy\,dx = \int_a^b\Big(\frac{\partial f}{\partial x}(x,d) - \frac{\partial f}{\partial x}(x,c)\Big)\,dx = f(b,d) - f(a,d) - f(b,c) + f(a,c),$$
and
$$\int_R \frac{\partial^2 f}{\partial x\,\partial y}\,dA = \int_c^d\!\int_a^b \frac{\partial}{\partial x}\Big(\frac{\partial f}{\partial y}\Big)\,dx\,dy = \int_c^d\Big(\frac{\partial f}{\partial y}(b,y) - \frac{\partial f}{\partial y}(a,y)\Big)\,dy = f(b,d) - f(b,c) - f(a,d) + f(a,c).$$
Comparing our answers, we have arrived at a contradiction, for we have $\int_R \dfrac{\partial^2 f}{\partial x\,\partial y}\,dA = \int_R \dfrac{\partial^2 f}{\partial y\,\partial x}\,dA$. It must therefore follow that the mixed partials are everywhere equal.
7.2.20 a. Since $f$ is continuous and the rectangle $[a,b]\times[c,d]$ is compact, by Theorem 1.4 of Chapter 5, we know $f$ is uniformly continuous. Given $\varepsilon > 0$, there is $\delta > 0$ so that whenever $\|(x,y) - (x',y')\| < \delta$, we have $|f(x,y) - f(x',y')| < \dfrac{\varepsilon}{d-c}$. Now we claim that if $|x - x'| < \delta$, then $|F(x) - F(x')| < \varepsilon$. For if $|x - x'| < \delta$, then we have $\|(x,y) - (x',y)\| < \delta$ and so
$$|F(x) - F(x')| = \Big|\int_c^d \big(f(x,y) - f(x',y)\big)\,dy\Big| \le \int_c^d \big|f(x,y) - f(x',y)\big|\,dy < \int_c^d \frac{\varepsilon}{d-c}\,dy = \varepsilon,$$
as required.
b. Set $\varphi(t) = \int_c^d \dfrac{\partial f}{\partial x}(t,y)\,dy$. Reasoning precisely as in part a, we conclude that $\varphi$ is continuous. Setting $\Phi(x) = \int_a^x \varphi(t)\,dt$, we conclude from the Fundamental Theorem of Calculus that $\Phi$ is differentiable and $\Phi'(x) = \varphi(x)$ for all $x \in [a,b]$. On the other hand, by Fubini's Theorem, we have
$$\Phi(x) = \int_a^x\!\int_c^d \frac{\partial f}{\partial x}(t,y)\,dy\,dt = \int_c^d\!\int_a^x \frac{\partial f}{\partial x}(t,y)\,dt\,dy = \int_c^d \big(f(x,y) - f(a,y)\big)\,dy = F(x) - F(a).$$
It follows, then, that $F = \Phi + F(a)$ is differentiable and $F'(x) = \Phi'(x) = \varphi(x)$, which is what we wanted to establish.
7.2.21 (Note first of all that the improper integral $\int_0^1 \dfrac{y^x - 1}{\log y}\,dy$ converges at 1, as, by L'Hôpital's rule, $\lim_{y\to 1} \dfrac{y^x - 1}{\log y} = \lim_{y\to 1} x y^x = x$.) By Exercise 20, we have
$$F'(x) = \int_0^1 \frac{y^x \log y}{\log y}\,dy = \int_0^1 y^x\,dy = \frac{1}{x+1}\,y^{x+1}\Big]_0^1 = \frac{1}{x+1} \quad\text{for } x > -1.$$
Then $F(x) = \log(x+1) + C$ for some constant $C$. But note that $F(0) = 0$, so $C = 0$. Therefore
$$F(1) = \int_0^1 \frac{y-1}{\log y}\,dy = \log 2.$$
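Since the integrand $(y-1)/\log y$ extends continuously to $[0,1]$, the value $F(1) = \log 2$ is easy to confirm numerically. This sketch is my own addition; the grid size is arbitrary:

```python
import math

def midpoint(g, a, b, n=20000):
    """Midpoint-rule approximation of ∫_a^b g; midpoints avoid the endpoints."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

# F(1) = ∫₀¹ (y − 1)/log y dy
F1 = midpoint(lambda y: (y - 1) / math.log(y), 0.0, 1.0)
```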
7.2.22 a. Using the Fundamental Theorem of Calculus and Exercise 20 as necessary, we have
$$f'(x) = 2e^{-x^2}\int_0^x e^{-t^2}\,dt$$
$$g'(x) = \int_0^1 \frac{e^{-x^2(t^2+1)}}{t^2+1}\cdot\big(-2x(t^2+1)\big)\,dt = -2\int_0^1 x e^{-x^2(t^2+1)}\,dt = -2x e^{-x^2}\int_0^1 e^{-(xt)^2}\,dt.$$
Now, making the substitution $u = xt$ in the latter integral, we have $g'(x) = -2e^{-x^2}\int_0^x e^{-u^2}\,du$, and so $f'(x) + g'(x) = 0$.
b. We have $f(0) = 0$ and $g(0) = \int_0^1 \dfrac{dt}{t^2+1} = \dfrac{\pi}{4}$, so $f(x) + g(x) = \pi/4$ for all $x \in \mathbb{R}$. Now, we claim that $\lim_{x\to\infty} g(x) = 0$. To see this, note that
$$g(x) = e^{-x^2}\int_0^1 \frac{e^{-x^2 t^2}}{t^2+1}\,dt \le e^{-x^2}\int_0^1 \frac{dt}{t^2+1} = e^{-x^2}\,\frac{\pi}{4} \to 0 \ \text{ as } x \to \infty.$$
Therefore, it follows that $\lim_{x\to\infty} f(x) = \pi/4$, and so
$$\int_0^\infty e^{-t^2}\,dt = \lim_{x\to\infty}\int_0^x e^{-t^2}\,dt = \lim_{x\to\infty}\sqrt{f(x)} = \frac{\sqrt\pi}{2}.$$
7.2.23 As the hint suggests, consider $F(x,z) = \int_c^z f(x,y)\,dy$. Then by Exercise 20 and the Fundamental Theorem of Calculus, respectively, we have
$$\frac{\partial F}{\partial x} = \int_c^z \frac{\partial f}{\partial x}(x,y)\,dy \qquad\text{and}\qquad \frac{\partial F}{\partial z} = f(x,z).$$
Then, setting $\varphi\colon [a,b] \to \mathbb{R}^2$, $\varphi(x) = (x, g(x))$, note that $h = F\circ\varphi$. Therefore, we have
$$h'(x) = DF(\varphi(x))\varphi'(x) = \frac{\partial F}{\partial x}(x,g(x)) + \frac{\partial F}{\partial z}(x,g(x))\,g'(x) = \int_c^{g(x)} \frac{\partial f}{\partial x}(x,y)\,dy + f(x,g(x))\,g'(x).$$
7.2.26 Obviously, the formula holds when $n = 1$. Suppose we know the result for a $k$-fold integral and want to calculate the $(k+1)$-fold integral. Then
$$\int_0^x\!\int_0^{x_1}\!\!\cdots\!\int_0^{x_k} f(x_{k+1})\,dx_{k+1}\,dx_k\cdots dx_2\,dx_1 = \int_0^x \Big(\frac{1}{(k-1)!}\int_0^{x_1} (x_1 - t)^{k-1} f(t)\,dt\Big)\,dx_1$$
$$= \frac{1}{(k-1)!}\int_0^x\Big(\int_t^x (x_1 - t)^{k-1}\,dx_1\Big) f(t)\,dt = \frac{1}{k!}\int_0^x (x-t)^k f(t)\,dt,$$
where we interchanged the order of integration in the next-to-last step.
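The induction step above is the Cauchy formula for repeated integration. As an illustration (my own addition, with the sample choices $f(t) = e^t$ and $k = 2$), both sides equal $e^x - 1 - x$:

```python
import math

def midpoint(g, a, b, n=4000):
    """Midpoint-rule approximation of ∫_a^b g."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

x = 1.5
# k = 2, f(t) = eᵗ: the single integral (1/1!) ∫₀ˣ (x − t) eᵗ dt ...
single = midpoint(lambda t: (x - t) * math.exp(t), 0, x)
# ... equals the twice-iterated integral ∫₀ˣ ∫₀^{x₁} eᵗ dt dx₁ = ∫₀ˣ (e^{x₁} − 1) dx₁,
double = midpoint(lambda x1: math.exp(x1) - 1, 0, x)
# and both equal eˣ − 1 − x.
```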
7.3.2 The curves $r\cos\theta = 1$ and $r = 2$ intersect at $\theta = \pm\pi/3$. Therefore, the area of the region $\Omega$ described is
$$\int_\Omega 1\,dA = \int_{-\pi/3}^{\pi/3}\!\int_{\sec\theta}^2 r\,dr\,d\theta = \frac12\int_{-\pi/3}^{\pi/3} r^2\Big]_{\sec\theta}^2\,d\theta = \frac12\int_{-\pi/3}^{\pi/3} (4 - \sec^2\theta)\,d\theta$$
$$= \int_0^{\pi/3} (4 - \sec^2\theta)\,d\theta = 4\theta - \tan\theta\Big]_0^{\pi/3} = \frac{4\pi}{3} - \sqrt3.$$
By elementary geometry, we note that we have a sector of a circle (with central angle $2\pi/3$) less the triangular region inside; thus, the area is $\frac13(4\pi) - \frac12(2\sqrt3)$.
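A quick numerical confirmation of the polar computation (my own sketch, not part of the solution):

```python
import math

def midpoint(g, a, b, n=20000):
    """Midpoint-rule approximation of ∫_a^b g."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

# Area = (1/2) ∫_{−π/3}^{π/3} (4 − sec²θ) dθ
area = midpoint(lambda t: 0.5 * (4 - 1 / math.cos(t)**2),
                -math.pi / 3, math.pi / 3)
```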
7.3. POLAR, CYLINDRICAL, AND SPHERICAL COORDINATES 195
7.3.5 a. $\displaystyle\int_S y^2\,dA = \int_0^{2\pi}\!\int_1^{\sqrt2} r^3\sin^2\theta\,dr\,d\theta = \frac34\,\pi.$
b. On the other hand, $\displaystyle\int_S (x^2+y^2)\,dA = \int_0^{2\pi}\!\int_1^{\sqrt2} r^3\,dr\,d\theta = \frac34(2\pi).$ But we observe that since the region $S$ is symmetric about the line $y = x$, $\int_S y^2\,dA = \int_S x^2\,dA$, so $\int_S y^2\,dA = \frac12\int_S (x^2+y^2)\,dA$.
7.3.6 We have
$$\int_S y(x^2+y^2)^{-5/2}\,dA = \int_0^{\pi/4}\!\int_{\sec\theta}^{\sqrt2} \frac{r\sin\theta}{r^5}\,r\,dr\,d\theta = \int_0^{\pi/4}\!\int_{\sec\theta}^{\sqrt2} \frac{\sin\theta}{r^3}\,dr\,d\theta$$
$$= \frac12\int_0^{\pi/4}\Big(\cos^2\theta - \frac12\Big)\sin\theta\,d\theta = \frac12\Big[-\frac13\cos^3\theta + \frac12\cos\theta\Big]_0^{\pi/4} = \frac{\sqrt2 - 1}{12}.$$
7.3.7 We have
$$\int_S (x^2+y^2)^{-3/2}\,dA = \int_{\pi/6}^{5\pi/6}\!\int_{\csc\theta}^2 \frac{1}{r^3}\,r\,dr\,d\theta = \int_{\pi/6}^{5\pi/6}\!\int_{\csc\theta}^2 \frac{1}{r^2}\,dr\,d\theta$$
$$= \int_{\pi/6}^{5\pi/6}\Big(\sin\theta - \frac12\Big)\,d\theta = -\cos\theta - \frac{\theta}{2}\Big]_{\pi/6}^{5\pi/6} = \sqrt3 - \frac{\pi}{3}.$$
7.3.8 We have
$$\int_S f\,dA = \int_0^{\pi/4}\!\int_{\sec\theta}^{2\cos\theta} \frac{r^2\sin^2\theta}{r}\,r\,dr\,d\theta = \int_0^{\pi/4}\!\int_{\sec\theta}^{2\cos\theta} r^2\sin^2\theta\,dr\,d\theta$$
$$= \frac13\int_0^{\pi/4} (8\cos^3\theta - \sec^3\theta)\sin^2\theta\,d\theta = \frac13\int_0^{\pi/4} \big(8\sin^2\theta(1-\sin^2\theta)\cos\theta - \tan^2\theta\sec\theta\big)\,d\theta$$
$$= \frac13\Big[8\,\frac{\sin^3\theta}{3} - 8\,\frac{\sin^5\theta}{5} - \frac12\sec\theta\tan\theta + \frac12\log(\sec\theta + \tan\theta)\Big]_0^{\pi/4} = \frac16\log(\sqrt2+1) - \frac{\sqrt2}{90}.$$
7.3.9 Although this problem is quite easily done by changing the order of integration, the $x^2+y^2$ in the denominator suggests to us that changing to polar coordinates may simplify matters. Indeed, it does:
$$\int_0^1\!\int_y^1 \frac{x e^x}{x^2+y^2}\,dx\,dy = \int_0^{\pi/4}\!\int_0^{\sec\theta} \frac{r\cos\theta\,e^{r\cos\theta}}{r^2}\,r\,dr\,d\theta = \int_0^{\pi/4}\!\int_0^{\sec\theta} \cos\theta\,e^{r\cos\theta}\,dr\,d\theta$$
$$= \int_0^{\pi/4} e^{r\cos\theta}\Big]_0^{\sec\theta}\,d\theta = \int_0^{\pi/4} (e-1)\,d\theta = \frac{\pi}{4}(e-1).$$
7.3.12 Calculating in spherical coordinates, we have (note that since $\rho \ge 0$, we must have $0 \le \phi \le \pi$)
$$\int_0^\pi\!\int_0^\pi\!\int_0^{\sin\theta} \rho^2\sin\phi\,d\rho\,d\phi\,d\theta = \frac13\int_0^\pi\!\int_0^\pi \sin^3\theta\sin\phi\,d\phi\,d\theta = \frac13\int_0^\pi \sin^3\theta\,d\theta\int_0^\pi \sin\phi\,d\phi = \frac89.$$
7.3.15 The region lies over the disk $r \le 1$ in the $xy$-plane; thus, the volume of the region is
$$\int_0^{2\pi}\!\int_0^1\!\int_{r^2}^{\sqrt{2-r^2}} r\,dz\,dr\,d\theta = 2\pi\int_0^1 \big(r\sqrt{2-r^2} - r^3\big)\,dr = 2\pi\Big[-\frac13(2-r^2)^{3/2} - \frac14 r^4\Big]_0^1 = \frac{\pi}{6}(8\sqrt2 - 7).$$
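Numerically (a sketch of my own), the radial integral reproduces $\pi(8\sqrt2 - 7)/6$:

```python
import math

def midpoint(g, a, b, n=20000):
    """Midpoint-rule approximation of ∫_a^b g."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

# Volume = 2π ∫₀¹ (r√(2 − r²) − r³) dr
vol = 2 * math.pi * midpoint(lambda r: r * math.sqrt(2 - r * r) - r**3, 0, 1)
```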
7.3.16 a. $\displaystyle 2\int_0^{2\pi}\!\int_0^a\!\int_0^{\sqrt{a^2-r^2}} r\,dz\,dr\,d\theta = 4\pi\Big[-\frac13(a^2-r^2)^{3/2}\Big]_0^a = \frac{4\pi a^3}{3}.$
b. $\displaystyle \int_0^{2\pi}\!\int_0^\pi\!\int_0^a \rho^2\sin\phi\,d\rho\,d\phi\,d\theta = 2\pi\int_0^a \rho^2\,d\rho\int_0^\pi \sin\phi\,d\phi = 2\pi\,\frac{a^3}{3}\,(2) = \frac{4\pi a^3}{3}.$
7.3.17 We set up the cone with its vertex at the origin and its axis of symmetry along the $z$-axis.
a. $\displaystyle\int_0^{2\pi}\!\int_0^a\!\int_{hr/a}^h r\,dz\,dr\,d\theta = 2\pi h\int_0^a \Big(r - \frac{r^2}{a}\Big)\,dr = \frac{\pi}{3}a^2 h.$
b. $\displaystyle\int_0^{2\pi}\!\int_0^{\arctan(a/h)}\!\int_0^{h\sec\phi} \rho^2\sin\phi\,d\rho\,d\phi\,d\theta = \frac{2\pi h^3}{3}\int_0^{\arctan(a/h)} \sec^3\phi\sin\phi\,d\phi$
$$= \frac{2\pi h^3}{3}\cdot\frac12\sec^2\phi\Big]_0^{\arctan(a/h)} = \frac{2\pi h^3}{6}\Big(\frac{a}{h}\Big)^2 = \frac{\pi a^2 h}{3}.$$
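The spherical-coordinate version in b can be checked numerically against the answer $\pi a^2 h/3$; here is a sketch of my own with the sample values $a = 1$, $h = 2$:

```python
import math

def midpoint(g, a, b, n=20000):
    """Midpoint-rule approximation of ∫_a^b g."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

a, h = 1.0, 2.0   # sample cone: base radius a, height h
vol = (2 * math.pi * h**3 / 3) * midpoint(
    lambda p: math.sin(p) / math.cos(p)**3, 0, math.atan(a / h))
```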
7.3.18 a. $\displaystyle\int_0^{2\pi}\!\int_0^1\!\int_r^{\sqrt{2-r^2}} r\,dz\,dr\,d\theta = 2\pi\int_0^1\big(r\sqrt{2-r^2} - r^2\big)\,dr = -\frac{2\pi}{3}\Big[(2-r^2)^{3/2} + r^3\Big]_0^1 = \frac{4\pi}{3}\big(\sqrt2 - 1\big).$
b. $\displaystyle\int_0^{2\pi}\!\int_0^{\pi/4}\!\int_0^{\sqrt2} \rho^2\sin\phi\,d\rho\,d\phi\,d\theta = 2\pi\,\frac{2\sqrt2}{3}\Big(1 - \frac{1}{\sqrt2}\Big) = \frac{4\pi(\sqrt2-1)}{3}.$
7.3.19 a. $\displaystyle\int_0^{2\pi}\!\int_0^{a\sqrt3}\!\int_a^{\sqrt{4a^2-r^2}} r\,dz\,dr\,d\theta = 2\pi\int_0^{a\sqrt3}\big(r\sqrt{4a^2-r^2} - ar\big)\,dr$
$$= -2\pi\Big[\frac13(4a^2-r^2)^{3/2} + \frac{a}{2}r^2\Big]_0^{a\sqrt3} = 2\pi a^3\Big(\frac73 - \frac32\Big) = \frac{5\pi a^3}{3}.$$
b. $\displaystyle\int_0^{2\pi}\!\int_0^{\pi/3}\!\int_{a\sec\phi}^{2a} \rho^2\sin\phi\,d\rho\,d\phi\,d\theta = \frac{2\pi a^3}{3}\int_0^{\pi/3}\big(8 - \sec^3\phi\big)\sin\phi\,d\phi = \frac{2\pi a^3}{3}\cdot\frac52 = \frac{5\pi a^3}{3}.$
7.3.20 Because $S$ is symmetric under the interchanges $x \leftrightarrow y \leftrightarrow z$, we have $\int_S x^2\,dV = \int_S y^2\,dV = \int_S z^2\,dV$, so
$$\int_S x^2\,dV = \frac13\int_S (x^2+y^2+z^2)\,dV = \frac13\int_0^{2\pi}\!\int_0^\pi\!\int_0^1 \rho^4\sin\phi\,d\rho\,d\phi\,d\theta = \frac13\cdot\frac{4\pi}{5} = \frac{4\pi}{15}.$$
7.3.21 a. Rewriting the integral in spherical coordinates, we have
$$\int_{\mathbb{R}^3} e^{-(x^2+y^2+z^2)}\,dV = \int_0^{2\pi}\!\int_0^\pi\!\int_0^\infty e^{-\rho^2}\rho^2\sin\phi\,d\rho\,d\phi\,d\theta = 4\pi\int_0^\infty \rho^2 e^{-\rho^2}\,d\rho$$
$$= 2\pi\Big(-\rho e^{-\rho^2}\Big]_0^\infty + \int_0^\infty e^{-\rho^2}\,d\rho\Big) = 2\pi\,\frac{\sqrt\pi}{2} = \pi^{3/2},$$
by Example 3. Alternatively, we could evaluate the integral directly, noting that $\int_{\mathbb{R}^3} e^{-(x^2+y^2+z^2)}\,dV = \Big(\int_{-\infty}^\infty e^{-x^2}\,dx\Big)^3$.
b. This is a bit sneaky, since we've not yet discussed the higher-dimensional change of variables theorem. But, separating the iterated integral as a product of single integrals, we then just make simple substitutions. Indeed, if $a > 0$, then $\int_{-\infty}^\infty e^{-ax^2}\,dx = \int_{-\infty}^\infty e^{-(\sqrt a\,x)^2}\,dx = \sqrt{\pi/a}$, so
$$\int_{\mathbb{R}^3} e^{-(x^2+2y^2+3z^2)}\,dV = \int_{-\infty}^\infty e^{-x^2}\,dx\int_{-\infty}^\infty e^{-2y^2}\,dy\int_{-\infty}^\infty e^{-3z^2}\,dz = \pi^{3/2}/\sqrt6.$$
7.3.22 The region described lies over the circle $(x - 3/2)^2 + (y-2)^2 = 25/4$ in the $xy$-plane. Since translating a region does not affect its volume, we see that the region has the same volume as
$$S = \Big\{(x,y,z) : x^2 + y^2 \le \frac{25}{4},\ \Big(x + \frac32\Big)^2 + (y+2)^2 \le z \le 3\Big(x+\frac32\Big) + 4(y+2)\Big\}.$$
Now, we compute the volume of $S$:
$$\mathrm{vol}(S) = \int_0^{2\pi}\!\int_0^{5/2}\Big(3\Big(r\cos\theta + \frac32\Big) + 4(r\sin\theta + 2) - \Big(r\cos\theta + \frac32\Big)^2 - (r\sin\theta+2)^2\Big)\,r\,dr\,d\theta$$
$$= \int_0^{2\pi}\!\int_0^{5/2}\Big(\frac{25}{4} - r^2\Big)\,r\,dr\,d\theta = \frac{\pi}{2}\Big(\frac{25}{4}\Big)^2 = \frac{625\pi}{32}.$$
7.3.23 This integral can be calculated as a sum of two iterated integrals in spherical coordinates or, more simply, in cylindrical coordinates, as follows.
$$\int_S \frac{z}{(x^2+y^2+z^2)^{3/2}}\,dV = \int_0^{2\pi}\!\int_0^{\sqrt3/2}\!\int_{1-\sqrt{1-r^2}}^{\sqrt{1-r^2}} \frac{rz}{(r^2+z^2)^{3/2}}\,dz\,dr\,d\theta$$
$$= 2\pi\int_0^{\sqrt3/2}\Big[-\frac{r}{\sqrt{r^2+z^2}}\Big]_{z=1-\sqrt{1-r^2}}^{z=\sqrt{1-r^2}}\,dr = 2\pi\int_0^{\sqrt3/2}\Big(\frac{r}{\sqrt{2(1-\sqrt{1-r^2})}} - r\Big)\,dr$$
$$= 2\pi\Big[\sqrt2\Big(\big(1-\sqrt{1-r^2}\big)^{1/2} - \frac13\big(1-\sqrt{1-r^2}\big)^{3/2}\Big) - \frac12 r^2\Big]_0^{\sqrt3/2} = \frac{11\pi}{12}.$$
(To find $\int \dfrac{r}{\sqrt{1-\sqrt{1-r^2}}}\,dr$, we substitute $u^2 = 1 - \sqrt{1-r^2}$, so $2u\,du = \dfrac{r}{\sqrt{1-r^2}}\,dr$ and the integral becomes $\int \dfrac{2u(1-u^2)\,du}{u} = 2\int (1-u^2)\,du$.)
7.3.24 This is one of my all-time favorite challenge problems for vector calculus students. (It is an even better challenge for a single-variable calculus student. And the ultimate challenge is to solve the problem, à la grecque, with no calculus whatsoever.²) Exploiting symmetry to the utmost, note that the region is composed of $(6)(8) = 48$ regions congruent to
$$\{(x,y,z) : x^2 + y^2 \le 1,\ 0 \le y \le x,\ 0 \le z \le y\},$$
whose volume is
$$\int_0^{\pi/4}\!\int_0^1\!\int_0^{r\sin\theta} r\,dz\,dr\,d\theta = \int_0^{\pi/4}\!\int_0^1 r^2\sin\theta\,dr\,d\theta = \frac13\Big(1 - \frac{1}{\sqrt2}\Big).$$
²As a hint, recall that Archimedes computed the volume of a sphere by applying what we now call Cavalieri's principle, noting that the cross-sections of a sphere of radius $a$ are the same as those of the region obtained by removing a (double) cone of height $a$ and radius $a$ from a cylinder of height $2a$ and radius $a$. The region in question can be written as the union of a cube and 24 "caps," each of which is a truncated portion of the intersection of two cylinders.
Therefore, the full region has volume $16\big(1 - \frac{1}{\sqrt2}\big) = 8(2 - \sqrt2)$, approximately 12% more than the volume of the unit ball.
7.4.3 Let the point in question be $\mathbf{0}$, and take the boundary of the ball to be given by $r = 2a\cos\theta$, $|\theta| \le \pi/2$. Then the average distance is
$$\frac{1}{\pi a^2}\int_{-\pi/2}^{\pi/2}\!\int_0^{2a\cos\theta} r^2\,dr\,d\theta = \frac{1}{\pi a^2}\,\frac{8a^3}{3}\int_{-\pi/2}^{\pi/2} \cos^3\theta\,d\theta = \frac{8a}{3\pi}\cdot\frac43 = \frac{32}{9\pi}\,a \approx 1.13a.$$
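A numerical check of the average-distance computation (my own sketch, with the sample value $a = 1$):

```python
import math

def midpoint(g, a, b, n=20000):
    """Midpoint-rule approximation of ∫_a^b g."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

a = 1.0
# Average distance = (1/πa²) ∫_{−π/2}^{π/2} (2a cos θ)³/3 dθ
avg = midpoint(lambda t: (2 * a * math.cos(t))**3 / 3,
               -math.pi / 2, math.pi / 2) / (math.pi * a * a)
```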
7.4.4 Let the point in question be $\mathbf{0}$, and take the boundary of the ball to be given by $\rho = 2a\cos\phi$, $0 \le \phi \le \pi/2$. Then the average distance is
$$\frac{1}{\frac43\pi a^3}\int_0^{2\pi}\!\int_0^{\pi/2}\!\int_0^{2a\cos\phi} \rho\cdot\rho^2\sin\phi\,d\rho\,d\phi\,d\theta = \frac{1}{\frac43\pi a^3}\,8\pi a^4\int_0^{\pi/2} \cos^4\phi\sin\phi\,d\phi = \frac65\,a.$$
7.4.5 We take the square $[0,a]\times[0,a]$ and the corner to be $\mathbf{0}$. By symmetry, it suffices to find the average distance from $\mathbf{0}$ to points in the triangle $S = \{(x,y) : 0 \le x \le a,\ 0 \le y \le x\}$. The average distance is
$$\frac{1}{\frac12 a^2}\int_0^{\pi/4}\!\int_0^{a\sec\theta} r^2\,dr\,d\theta = \frac{1}{\frac12 a^2}\,\frac{a^3}{3}\int_0^{\pi/4} \sec^3\theta\,d\theta = \frac{2a}{3}\cdot\frac12\big(\sqrt2 + \log(\sqrt2+1)\big)$$
$$= \frac{\sqrt2 + \log(\sqrt2+1)}{3}\,a \approx 0.77a.$$
7.4.6 First of all, the mass of $\Omega$ is given by
$$m = \int_\Omega \delta\,dA = \int_0^{\pi/4}\!\int_{\sec\theta}^{2\cos\theta} \frac{r\sin\theta}{r^2}\,r\,dr\,d\theta = \int_0^{\pi/4}\!\int_{\sec\theta}^{2\cos\theta} \sin\theta\,dr\,d\theta = \frac{1 - \log 2}{2}.$$
Now we have
$$\bar x = \frac{1}{m}\int_\Omega x\delta\,dA = \frac{1}{m}\int_0^{\pi/4}\!\int_{\sec\theta}^{2\cos\theta} r\cos\theta\sin\theta\,dr\,d\theta = \frac{1}{m}\cdot\frac18(3 - \log 4) = \frac14\cdot\frac{3-\log4}{1-\log2} \approx 1.31$$
$$\bar y = \frac{1}{m}\int_\Omega y\delta\,dA = \frac{1}{m}\int_0^{\pi/4}\!\int_{\sec\theta}^{2\cos\theta} r\sin^2\theta\,dr\,d\theta = \frac{1}{m}\cdot\frac{3\pi-8}{16} = \frac18\cdot\frac{3\pi-8}{1-\log2} \approx 0.58$$
7.4.7 Similarly, here we have
$$\bar x = \frac{1}{m}\int_{-\pi/3}^{\pi/3}\!\int_1^{2\cos\theta} r\cos\theta\,dr\,d\theta = \frac{1}{m}\cdot\sqrt3 = \frac{3\sqrt3}{2(3\sqrt3-\pi)} \approx 1.26$$
$$\bar y = \frac{1}{m}\int_{-\pi/3}^{\pi/3}\!\int_1^{2\cos\theta} r\sin\theta\,dr\,d\theta = 0.$$
7.4.8 Let $\Omega = \{(x,y) : x^2 + y^2 \le a^2,\ y \ge 0\}$. Without loss of generality, we may take the density to be $\delta = 1$. Then $\mathrm{mass}(\Omega) = \frac12\pi a^2$. By symmetry, $\bar x = 0$. And
$$\bar y = \frac{1}{\mathrm{mass}(\Omega)}\int_0^\pi\!\int_0^a r^2\sin\theta\,dr\,d\theta = \frac{2}{\pi a^2}\cdot\frac{2a^3}{3} = \frac{4}{3\pi}\,a \approx 0.42a.$$
7.4.9 Let $\Omega = \{(x,y,z) : x^2+y^2+z^2 \le a^2,\ z \ge 0\}$. Without loss of generality, we may take the density to be $\delta = 1$. Then $\mathrm{mass}(\Omega) = \frac23\pi a^3$. By symmetry, $\bar x = \bar y = 0$. And
$$\bar z = \frac{3}{2\pi a^3}\int_0^{2\pi}\!\int_0^{\pi/2}\!\int_0^a (\rho\cos\phi)\rho^2\sin\phi\,d\rho\,d\phi\,d\theta = \frac{3}{2\pi a^3}\cdot\frac{\pi a^4}{4} = \frac38\,a.$$
7.4.11 Without loss of generality, we may take the density to be $\delta = 1$. We know from Exercise 7.2.13 that the volume of the tetrahedron is $V = abc/6$. Then
$$\bar x = \frac{1}{V}\int_0^a\!\int_0^{b(1-x/a)}\!\int_0^{c(1-x/a-y/b)} x\,dz\,dy\,dx = \frac{1}{V}\cdot\frac{bc}{2}\int_0^a x\Big(1 - \frac{x}{a}\Big)^2\,dx$$
(substituting $u = 1 - x/a$)
$$= \frac{1}{V}\cdot\frac{bc}{2}\,a^2\int_0^1 (1-u)u^2\,du = \frac{6}{abc}\cdot\frac{bc}{2}\cdot\frac{a^2}{12} = \frac{a}{4}.$$
Similarly (merely permuting the variables), we have $\bar y = b/4$ and $\bar z = c/4$. That is, $\bar{\mathbf{x}}$ is the average of the four vertices of the tetrahedron.
7.4.12 Let $\Omega = \{(x,y,z) : x^2+y^2 \le a^2,\ 0 \le z \le h\}$. Using cylindrical coordinates, we have $\delta = r$, and the mass of the solid cylinder is $m = \int_0^{2\pi}\!\int_0^a\!\int_0^h r^2\,dz\,dr\,d\theta = \frac{2\pi}{3}a^3 h$. The moment of inertia about the $z$-axis is $I = \int_0^{2\pi}\!\int_0^a\!\int_0^h r^4\,dz\,dr\,d\theta = \frac{2\pi}{5}a^5 h = \frac35 m a^2$.
7.4.13 $\displaystyle I = \int_0^{2\pi}\!\int_0^\pi\!\int_0^a (\rho\sin\phi)^2(\rho)\,\rho^2\sin\phi\,d\rho\,d\phi\,d\theta = \frac{4\pi}{9}a^6.$
7.4.14 In cylindrical coordinates, we have
$$I = \int_0^{2\pi}\!\int_0^{\sqrt2}\!\int_r^{\sqrt{4-r^2}} (r^2)\,r\,dz\,dr\,d\theta = 2\pi\int_0^{\sqrt2}\big(r^3\sqrt{4-r^2} - r^4\big)\,dr = 2\pi\Big(\frac{64}{15} - \frac{8\sqrt2}{3}\Big).$$
7.4.16 a. No integration is required here. Here every particle is at distance $a$ from the axis of revolution, and $I = ma^2 = \pi\delta a^4 h$.
b. $\displaystyle I = \delta\int_0^{2\pi}\!\int_0^a\!\int_0^h r^3\,dz\,dr\,d\theta = 2\pi\delta\cdot\frac{a^4}{4}\cdot h = \frac12 ma^2.$
c. $\displaystyle I = \delta\int_0^{2\pi}\!\int_0^a\!\int_{hr/a}^h r^3\,dz\,dr\,d\theta = 2\pi\delta\cdot\frac{a^4}{20}\cdot h = \frac{3}{10}\,ma^2.$
7.4.18 One approach is to apply Exercise 7.2.20, seeking critical points of this function of $\mathbf{a}$: If $F(\mathbf{a}) = \int_\Omega \|\mathbf{x} - \mathbf{a}\|^2\,dV$, then $DF(\mathbf{a}) = -2\int_\Omega (\mathbf{x} - \mathbf{a})^{\mathsf T}\,dV = \mathbf{0}$ if and only if $\mathrm{vol}(\Omega)\,\mathbf{a} = \int_\Omega \mathbf{x}\,dV$. That is, $\mathbf{a} = \dfrac{1}{\mathrm{vol}(\Omega)}\int_\Omega \mathbf{x}\,dV = \bar{\mathbf{x}}$, which is the center of mass of $\Omega$.
Alternatively, we write this integral out explicitly as a quadratic function of $\mathbf{a}$ and complete the square:
$$\int_\Omega \|\mathbf{x}-\mathbf{a}\|^2\,dV = \int_\Omega \big(\|\mathbf{x}\|^2 - 2\mathbf{a}\cdot\mathbf{x} + \|\mathbf{a}\|^2\big)\,dV = \|\mathbf{a}\|^2\,\mathrm{vol}(\Omega) - 2\mathbf{a}\cdot\int_\Omega \mathbf{x}\,dV + \int_\Omega \|\mathbf{x}\|^2\,dV$$
$$= \mathrm{vol}(\Omega)\Big(\|\mathbf{a}\|^2 - 2\mathbf{a}\cdot\frac{\int_\Omega \mathbf{x}\,dV}{\mathrm{vol}(\Omega)}\Big) + \int_\Omega \|\mathbf{x}\|^2\,dV = \mathrm{vol}(\Omega)\,\|\mathbf{a} - \bar{\mathbf{x}}\|^2 + \int_\Omega \|\mathbf{x}\|^2\,dV - \|\bar{\mathbf{x}}\|^2\,\mathrm{vol}(\Omega).$$
7.4.19 Without loss of generality, we take the density to be given by $\delta = 1$. The mass of the solid is
$$m = 2\int_0^{2\pi}\!\int_0^{a^{1/n}}\!\int_0^{z^n} r\,dr\,dz\,d\theta = 2\pi\int_0^{a^{1/n}} z^{2n}\,dz = \frac{2\pi}{2n+1}\,a^{(2n+1)/n}.$$
Then the moment of inertia about the axis of revolution is given by
$$I = 2\int_0^{2\pi}\!\int_0^{a^{1/n}}\!\int_0^{z^n} r^3\,dr\,dz\,d\theta = \pi\int_0^{a^{1/n}} z^{4n}\,dz = \frac{\pi}{4n+1}\,a^{(4n+1)/n}.$$
Thus, we have
$$\frac{I}{ma^2} = \frac{\dfrac{\pi}{4n+1}\,a^{(4n+1)/n}}{\dfrac{2\pi}{2n+1}\,a^{(2n+1)/n}\,a^2} = \frac{2n+1}{2(4n+1)}.$$
In particular, note that as $n \to \infty$, the ratio approaches a limiting value of $1/4$.
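The ratio $(2n+1)/\big(2(4n+1)\big)$ and its limit $1/4$ are easy to tabulate exactly (a sketch of my own):

```python
from fractions import Fraction

# I/(m a²) = (2n+1)/(2(4n+1)); exact values for n = 1, 2, 3, …
ratios = [Fraction(2 * n + 1, 2 * (4 * n + 1)) for n in range(1, 1001)]
# n = 1 gives 3/10, and the values decrease toward 1/4.
```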
7.4.22 Denote by $\bar{\mathbf{x}}$ the center of mass of $\Omega$. Without any loss of generality, we'll take $\ell_0$ to be the $z$-axis, $\bar{\mathbf{x}} = \mathbf{0}$, and $\ell$ to be the line parallel to the $z$-axis passing through the point $\mathbf{a} = (a, b, 0)$. Then
$$I = \int_\Omega \delta\big((x-a)^2 + (y-b)^2\big)\,dV = \int_\Omega \delta(x^2+y^2)\,dV - 2\int_\Omega \delta(\mathbf{a}\cdot\mathbf{x})\,dV + \int_\Omega \delta(a^2+b^2)\,dV$$
$$= I_0 - 2\mathbf{a}\cdot\int_\Omega \delta\mathbf{x}\,dV + mh^2 = I_0 + mh^2,$$
since $\int_\Omega \delta\mathbf{x}\,dV = \mathrm{mass}(\Omega)\,\bar{\mathbf{x}} = \mathbf{0}$.
7.4.24 We have
$$F_3 = G\int_\Omega \frac{z}{(x^2+y^2+z^2)^{3/2}}\,dV$$
$$= G\Big(\int_0^{2\pi}\!\int_0^{\pi/4}\!\int_0^{\sqrt2 a} \cos\phi\sin\phi\,d\rho\,d\phi\,d\theta + \int_0^{2\pi}\!\int_{\pi/4}^{\pi/2}\!\int_0^{a\cot\phi\csc\phi} \cos\phi\sin\phi\,d\rho\,d\phi\,d\theta\Big)$$
$$= 2\pi G a\Big(\sqrt2\int_0^{\pi/4} \cos\phi\sin\phi\,d\phi + \int_{\pi/4}^{\pi/2} (\csc\phi - \sin\phi)\,d\phi\Big)$$
$$= 2\pi G a\Big(\frac{1}{2\sqrt2} + \log(\sqrt2+1) - \frac{1}{\sqrt2}\Big) = 2\pi G a\Big(\log(\sqrt2+1) - \frac{1}{2\sqrt2}\Big).$$
7.4.25 Imagine two identical objects of mass M/2 located at ±a on the x-axis. The gravitational
force exerted by that system on a test mass at a − ε is obviously large and to the right. But the
force exerted by a mass M at the origin (the center of mass) on our test mass would be towards
the origin.
7.4.26 Following the calculation in Example 6, the $\phi$-integral is unchanged and when $b \ge R$ we have
$$F_3 = -\frac{4\pi G}{b^2}\int_0^R \delta(\rho)\rho^2\,d\rho = -\frac{GM}{b^2}.$$
When $b < R$, it is still the case that the integrand vanishes whenever $\rho > b$, and so
$$F_3 = -\frac{4\pi G}{b^2}\int_0^b \delta(\rho)\rho^2\,d\rho = -\frac{G}{b^2}\cdot(\text{mass of the earth within distance } b \text{ from the center}).$$
7.4.27 a. The volume of this region is
$$V = 2\pi\int_0^{\pi/2}\!\int_0^{\sqrt{k\cos\phi}} \rho^2\sin\phi\,d\rho\,d\phi = \frac{2\pi}{3}k^{3/2}\int_0^{\pi/2} (\cos\phi)^{3/2}\sin\phi\,d\phi = \frac{4\pi k^{3/2}}{15}.$$
We have $V = 4\pi/3$ when $k = 5^{2/3} \approx 2.92$.
b. By symmetry, the gravitational force is all vertical, and it is given by
$$F_3 = G\int_0^{2\pi}\!\int_0^{\pi/2}\!\int_0^{\sqrt{k\cos\phi}} \cos\phi\sin\phi\,d\rho\,d\phi\,d\theta = 2\pi G\sqrt k\int_0^{\pi/2} (\cos\phi)^{3/2}\sin\phi\,d\phi = \frac{4\pi G}{5}\sqrt k = \frac{4\pi G}{5^{2/3}}.$$
This is about 2.6% greater than the gravitational force of the uniform unit ball.
Remark: Finding the (rotationally symmetric) shape with given volume that maximizes the gravitational force is a maximum problem in the space of (continuous) functions, a problem in the calculus of variations. We define an inner product (see Section 5.3 of Chapter 5) on the vector space of continuous functions on $[0,\pi/2]$ by $\langle f, g\rangle = \int_0^{\pi/2} f(\phi)g(\phi)\,d\phi$. Then we want to maximize $\mathcal{F}(f) = \int_0^{\pi/2} f(\phi)\cos\phi\sin\phi\,d\phi$ subject to the constraint $\mathcal{G}(f) = \int_0^{\pi/2} f(\phi)^3\sin\phi\,d\phi = 2$. If $f$ is a constrained critical point, then we should have
$$D_g\mathcal{F}(f) = \frac{d}{dt}\Big|_0\int_0^{\pi/2} \big(f(\phi) + tg(\phi)\big)\cos\phi\sin\phi\,d\phi = \int_0^{\pi/2} g(\phi)\cos\phi\sin\phi\,d\phi = 0$$
whenever
$$D_g\mathcal{G}(f) = \frac{d}{dt}\Big|_0\int_0^{\pi/2} \big(f(\phi) + tg(\phi)\big)^3\sin\phi\,d\phi = 3\int_0^{\pi/2} f(\phi)^2 g(\phi)\sin\phi\,d\phi = 0$$
(i.e., for every $g$ in the tangent space to the constraint set). That is, we seek a function $f$ so that $\langle g, \cos\sin\rangle = 0$ for all $g$ with $\langle g, f^2\sin\rangle = 0$. Standard orthogonal complementarity arguments suggest then that we should have $f(\phi)^2 = k\cos\phi$ for some constant $k$.
7.4.28 There are all sorts of details to iron out, such as how the helicopters will fight the fire,
but the key issue is to choose the location a ∈ R2 of the helipad so that we minimize, on average, the
amount of forest that will burn as the helicopters fly from a to the center, x, of the fire. Intuitively,
if the fire starts at x, then the amount that burns will be proportional to kx − ak2 (since the time
is proportional to the distance from a to x and the area that burns is proportional to the square of
the time). So we should choose a to make the average value of kx − ak2 as small as possible. This
means (see Exercise 18) that we should put a at the center of mass of Ω. (Of course, this is not
absolutely right, because if x is sufficiently close to the boundary of Ω, then less forest will actually
burn.)
7.5.1 We use the properties of the determinant and arrange to apply Proposition 5.12. Of course, one can use either row or column operations to get the matrix to triangular form.
a. $\begin{vmatrix} -1 & 6 & -2 \\ 3 & 4 & 5 \\ 5 & 2 & 1 \end{vmatrix} = \begin{vmatrix} -1 & 0 & 0 \\ 3 & 22 & -1 \\ 5 & 32 & -9 \end{vmatrix} = -\begin{vmatrix} -1 & 0 & 0 \\ 3 & -1 & 22 \\ 5 & -9 & 32 \end{vmatrix} = -\begin{vmatrix} -1 & 0 & 0 \\ 3 & -1 & 0 \\ 5 & -9 & -166 \end{vmatrix} = -(-1)(-1)(-166) = 166.$
7.5. DETERMINANTS AND n-DIMENSIONAL VOLUME 205
b. $\begin{vmatrix} 1 & 0 & 2 & 0 \\ -1 & 2 & -2 & 0 \\ 0 & 1 & 2 & 6 \\ 1 & 1 & 3 & 2 \end{vmatrix} = \begin{vmatrix} 1 & 0 & 0 & 0 \\ -1 & 2 & 0 & 0 \\ 0 & 1 & 2 & 6 \\ 1 & 1 & 1 & 2 \end{vmatrix} = \begin{vmatrix} 1 & 0 & 0 & 0 \\ -1 & 2 & 0 & 0 \\ 0 & 1 & 2 & 0 \\ 1 & 1 & 1 & -1 \end{vmatrix} = (1)(2)(2)(-1) = -4.$
c. $\begin{vmatrix} 1 & 4 & 1 & -3 \\ 2 & 10 & 0 & 1 \\ 0 & 0 & 2 & 2 \\ 0 & 0 & -2 & 1 \end{vmatrix} = \begin{vmatrix} 1 & 0 & 0 & 0 \\ 4 & 2 & 0 & 0 \\ 1 & -2 & 2 & 0 \\ -3 & 7 & 2 & 3 \end{vmatrix} = (1)(2)(2)(3) = 12.$
d. Here the matrix is the $5\times5$ tridiagonal matrix with 2's on the diagonal and $-1$'s just off it. Successive row operations ($R_2 \mathrel{+}= \frac12 R_1$, then $R_3 \mathrel{+}= \frac23 R_2$, $R_4 \mathrel{+}= \frac34 R_3$, $R_5 \mathrel{+}= \frac45 R_4$) reduce it to triangular form:
$$\begin{vmatrix} 2 & -1 & 0 & 0 & 0 \\ -1 & 2 & -1 & 0 & 0 \\ 0 & -1 & 2 & -1 & 0 \\ 0 & 0 & -1 & 2 & -1 \\ 0 & 0 & 0 & -1 & 2 \end{vmatrix} = \begin{vmatrix} 2 & -1 & 0 & 0 & 0 \\ 0 & \frac32 & -1 & 0 & 0 \\ 0 & 0 & \frac43 & -1 & 0 \\ 0 & 0 & 0 & \frac54 & -1 \\ 0 & 0 & 0 & 0 & \frac65 \end{vmatrix} = 2\cdot\frac32\cdot\frac43\cdot\frac54\cdot\frac65 = 6.$$
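The four determinants can be verified independently with exact rational arithmetic. This is a sketch of my own; the matrices for b and c are as reconstructed above:

```python
from fractions import Fraction

def det(m):
    """Exact determinant via Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in m]
    n, sign, d = len(m), 1, Fraction(1)
    for i in range(n):
        p = next((r for r in range(i, n) if m[r][i] != 0), None)
        if p is None:
            return Fraction(0)          # zero column ⇒ singular
        if p != i:
            m[i], m[p] = m[p], m[i]     # row swap flips the sign
            sign = -sign
        d *= m[i][i]
        for r in range(i + 1, n):
            f = m[r][i] / m[i][i]
            for c in range(i, n):
                m[r][c] -= f * m[i][c]
    return sign * d

A = [[-1, 6, -2], [3, 4, 5], [5, 2, 1]]
B = [[1, 0, 2, 0], [-1, 2, -2, 0], [0, 1, 2, 6], [1, 1, 3, 2]]
C = [[1, 4, 1, -3], [2, 10, 0, 1], [0, 0, 2, 2], [0, 0, -2, 1]]
D = [[2 if i == j else (-1 if abs(i - j) == 1 else 0) for j in range(5)]
     for i in range(5)]
```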
7.5.3 We need only prove this statement for each of the three types of elementary matrices listed in Section 2 of Chapter 4.
(i) Multiplying $A$ by an elementary matrix $E$ of type (i) interchanges the $i$th and $j$th columns of $A$, so $\det(AE) = -\det A$. Since $E$ itself is obtained by interchanging the same two columns of the identity matrix, we have $\det E = -\det I = -1$. Hence, $\det(AE) = \det E\det A$.
(ii) Multiplying $A$ by an elementary matrix $E$ of type (ii) multiplies the $i$th column of $A$ by the scalar $c$, and so $\det(AE) = c\det A$. Since we obtain $E$ by multiplying the $i$th column of the identity matrix by $c$, we have $\det E = c\det I = c$, and so $\det(AE) = \det E\det A$.
(iii) Multiplying $A$ by an elementary matrix $E$ of type (iii) adds a scalar multiple of column $i$ to column $j$, and hence doesn't change the determinant of $A$. On the other hand, we obtain $E$ by adding that same scalar multiple of the $i$th column of the identity matrix to its $j$th column, and so $\det E = \det I = 1$. Thus, in this case, too, we have $\det(AE) = \det A = \det E\det A$, as required.
7.5.4 We need only prove this statement for each of the three types of elementary matrices
listed in Section 2 of Chapter 4. An elementary matrix of type (i) or (ii) is symmetric, and so the
result is immediate. Next, any elementary matrix of type (iii) has determinant equal to 1 and the
transpose of any elementary matrix of type (iii) is again of type (iii), so the result is immediate.
7.5.5 This follows by applying the second property of D repeatedly. Multiplying a single
column of A by the scalar c results in multiplying the determinant by c; doing so to each of the
columns in succession multiplies the determinant by c a total of n times, hence multiplies the
original determinant by a factor of cn .
7.5.6 When $A$ is a $1\times1$ matrix with a single integer entry, then $\det A$ is obviously an integer. Now, assume that the determinant of any $k\times k$ matrix whose entries are integers must be an integer. Let $A$ be a $(k+1)\times(k+1)$ matrix whose entries are all integers. By Proposition 5.14, expanding in cofactors along the first row, we have $\det A = \sum_{j=1}^{k+1} a_{1j}C_{1j}$, where $C_{1j} = (-1)^{1+j}\det A_{1j}$ and $A_{1j}$ is the $k\times k$ matrix obtained by crossing out the first row and $j$th column of $A$. Since this is a $k\times k$ matrix with integer entries, we infer that $C_{1j}$ is an integer for $j = 1, \ldots, k+1$. Therefore, $\det A$, being a sum of products of integers, is itself an integer. (Alternatively, by Proposition 5.18 directly, $\det A$ is the sum of products of integers and is therefore an integer.)
7.5.7 The crucial ingredient in solving this problem is to remember the meaning of place values. Using property (3) of Proposition 5.4 three times, we add $10^3$ times the first column, $10^2$ times the second column, and $10$ times the third column to the fourth. Then
$$\begin{vmatrix} 1 & 8 & 9 & 8 \\ 3 & 4 & 7 & 1 \\ 7 & 2 & 1 & 5 \\ 8 & 1 & 6 & 4 \end{vmatrix} = \begin{vmatrix} 1 & 8 & 9 & 1898 \\ 3 & 4 & 7 & 3471 \\ 7 & 2 & 1 & 7215 \\ 8 & 1 & 6 & 8164 \end{vmatrix} = 13\begin{vmatrix} 1 & 8 & 9 & 146 \\ 3 & 4 & 7 & 267 \\ 7 & 2 & 1 & 555 \\ 8 & 1 & 6 & 628 \end{vmatrix};$$
since the final determinant must be an integer, we see that our original determinant is divisible by 13. (Of course, we needn't know the actual values of the integers in the last column; all we need to know is that all those entries are integers.)
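The divisibility claim can be confirmed directly (my own sketch; in fact the determinant is $1326 = 13\cdot102$):

```python
from fractions import Fraction

def det(m):
    """Exact determinant via Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in m]
    n, sign, d = len(m), 1, Fraction(1)
    for i in range(n):
        p = next((r for r in range(i, n) if m[r][i] != 0), None)
        if p is None:
            return Fraction(0)
        if p != i:
            m[i], m[p] = m[p], m[i]
            sign = -sign
        d *= m[i][i]
        for r in range(i + 1, n):
            f = m[r][i] / m[i][i]
            for c in range(i, n):
                m[r][c] -= f * m[i][c]
    return sign * d

M = [[1, 8, 9, 8], [3, 4, 7, 1], [7, 2, 1, 5], [8, 1, 6, 4]]
d = det(M)
```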
7.5.8 Moving the bottom row to the top for ease of calculation (a cyclic permutation of the three rows, which doesn't change the determinant), we have
$$\begin{vmatrix} a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \\ 1 & 1 & 1 \end{vmatrix} = \begin{vmatrix} 1 & 1 & 1 \\ a_1 & b_1 & c_1 \\ a_2 & b_2 & c_2 \end{vmatrix} = \begin{vmatrix} 1 & 0 & 0 \\ a_1 & b_1 - a_1 & c_1 - a_1 \\ a_2 & b_2 - a_2 & c_2 - a_2 \end{vmatrix} = \begin{vmatrix} b_1 - a_1 & c_1 - a_1 \\ b_2 - a_2 & c_2 - a_2 \end{vmatrix},$$
which (see Section 5 of Chapter 1) is twice the signed area of $\triangle ABC$. A nicely geometric, alternative solution is this: the original $3\times3$ determinant above gives the signed volume of the parallelepiped spanned by the vectors $\mathbf{x} = (a_1, a_2, 1)$, $\mathbf{y} = (b_1, b_2, 1)$, and $\mathbf{z} = (c_1, c_2, 1)$. The pyramid with vertices $\mathbf{0}$, $\mathbf{x}$, $\mathbf{y}$, and $\mathbf{z}$ has $1/6$ that volume; but, on the other hand, the volume of the pyramid is $1/3$ the area of its base (the shaded triangle) times its height (1). Therefore, the area of the triangle is $1/2$ the determinant. On the other hand, that triangle is congruent to $\triangle ABC$.
7.5.9 If $A$ is in upper- or lower-triangular form, $\det A$ is the product of its diagonal entries, and then the determinant of the given $(n+1)\times(n+1)$ matrix will be that same product. In general, reducing the given matrix to upper- or lower-triangular form requires exactly the same row (or column) operations as reducing $A$ itself to that form, so the determinants are the same. Geometrically, the $(n+1)$-dimensional signed volume of a parallelepiped with height 1 is the $n$-dimensional signed volume of its base.
7.5.10 a. Let $E'$ be the product of the $k\times k$ elementary matrices corresponding to the row operations required to put $A$ in upper-triangular form $U'$, and let $E''$ be the product of the $\ell\times\ell$ elementary matrices corresponding to the row operations required to put $D$ in upper-triangular form $U''$. Then
$$\begin{bmatrix} E' & O \\ O & E'' \end{bmatrix}\begin{bmatrix} A & B \\ O & D \end{bmatrix} = \begin{bmatrix} U' & E'B \\ O & U'' \end{bmatrix} = U,$$
which is an upper-triangular matrix. Now we have $\det E'\det A = \det U'$, which equals the product of the diagonal entries of $U'$, and $\det E''\det D = \det U''$, which equals the product of the diagonal entries of $U''$. Also note that
$$\det\begin{bmatrix} E' & O \\ O & E'' \end{bmatrix} = \det\begin{bmatrix} E' & O \\ O & I \end{bmatrix}\det\begin{bmatrix} I & O \\ O & E'' \end{bmatrix} = \det E'\det E''.$$
Putting this all together, we have
$$\det\begin{bmatrix} A & B \\ O & D \end{bmatrix} = \frac{\det U}{\det E'\det E''} = \frac{\det U'\det U''}{\det E'\det E''} = \frac{\det U'}{\det E'}\cdot\frac{\det U''}{\det E''} = \det A\det D,$$
as desired.
b. We do row operations on the given matrix to reduce it to the block form in part a: Since $A$ is invertible, we can do row operations to convert $A$ to the identity matrix, then do row operations to remove the entries of $C$ below. In equations, we have
$$\begin{bmatrix} I & O \\ -C & I \end{bmatrix}\begin{bmatrix} A^{-1} & O \\ O & I \end{bmatrix}\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} I & A^{-1}B \\ O & D - CA^{-1}B \end{bmatrix}.$$
Rewriting this, we have
$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} A & O \\ O & I \end{bmatrix}\begin{bmatrix} I & O \\ C & I \end{bmatrix}\begin{bmatrix} I & A^{-1}B \\ O & D - CA^{-1}B \end{bmatrix},$$
so, using the product rule for determinants and the result of part a, we have
$$\det\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \det A\det(D - CA^{-1}B).$$
7.5.12 By Exercise 5, we have $\det A^{\mathsf T} = \det(-A) = (-1)^n\det A = -\det A$, since $n$ is odd. But we know that $\det A^{\mathsf T} = \det A$, so $\det A = 0$. Notice that when $n = 2$, we have $\det\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} = 1$.
7.5.15 a. If $A$ has integer entries, then (by Exercise 6) the cofactors of $A$ are all integers. Since $\det A = \pm1$, we infer from Proposition 5.17 that $A^{-1} = \pm C^{\mathsf T}$ will have all integer entries as well.
b. Since $AA^{-1} = I$, we know that $(\det A)(\det A^{-1}) = 1$. Since $A$ and $A^{-1}$ are both matrices with integer entries, we know that $\det A$ and $\det A^{-1}$ are both integers. The only way to obtain 1 as the product of two integers is for both integers to be either 1 or $-1$. Thus, $\det A = \pm1$.
7.5.16 Suppose $i < j$ and we wish to interchange rows $i$ and $j$. We first exchange $A_i$ and $A_{i+1}$, then $A_i$ and $A_{i+2}$, and so on, until we exchange $A_i$ and $A_j$, so that we have reached the ordering $A_1, \ldots, A_{i-1}, A_{i+1}, \ldots, A_j, A_i, A_{j+1}, \ldots$ So far we have made $j - i$ exchanges of adjacent rows. Next, we move $A_j$ back into the original position of $A_i$ by exchanging $A_j$ with $A_{j-1}$, then with $A_{j-2}$, and so on, finally with $A_{i+1}$. This is a total of $j - i - 1$ interchanges of adjacent rows. In summary, we have interchanged $A_i$ and $A_j$ with a total of $(j-i) + (j-i-1) = 2(j-i) - 1$ interchanges of adjacent rows. Since $2(j-i) - 1$ is odd, we are done.
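The counting argument can be simulated directly (a sketch of my own, 0-indexed):

```python
def swap_count(n, i, j):
    """Interchange entries i and j (0-indexed, i < j) of [0, …, n−1] using only
    adjacent transpositions, following the scheme above; return (result, count)."""
    rows, count = list(range(n)), 0
    for k in range(i, j):            # bubble entry i down to position j
        rows[k], rows[k + 1] = rows[k + 1], rows[k]
        count += 1
    for k in range(j - 1, i, -1):    # bubble entry j (now at j−1) back up to i
        rows[k], rows[k - 1] = rows[k - 1], rows[k]
        count += 1
    return rows, count
```

Running it for any $i < j$ yields exactly $2(j-i)-1$ adjacent interchanges, with rows $i$ and $j$ swapped and all other rows back in place.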
7.5.18 Suppose $A$ is singular; then $\det A = 0$ and so we must show that $AC^{\mathsf T} = O$. The $ij$-entry of $AC^{\mathsf T}$ is equal to
$$(*)\qquad \mathbf{A}_i\cdot\mathbf{C}_j = \sum_{k=1}^n a_{ik}C_{jk} = \sum_{k=1}^n (-1)^{j+k} a_{ik}\det A_{jk}.$$
When $i = j$, this is just the formula for $\det A$, expanding in cofactors along the $i$th row, and so we obtain 0. When $i \ne j$, consider the matrix $\tilde A$ obtained by replacing the $j$th row of $A$ by $\mathbf{A}_i$. Then the final sum in $(*)$ is the formula we get when we expand $\det\tilde A$ in cofactors along the $j$th row, which must be 0 by virtue of Lemma 5.2. In other words, when $A$ is singular, the columns of $C^{\mathsf T}$ lie in $\mathbf{N}(A)$.
7.5.19 a. First notice that if $(x,y) = (x_1,y_1)$ or $(x_2,y_2)$, then $\begin{vmatrix} 1 & 1 & 1 \\ x & x_1 & x_2 \\ y & y_1 & y_2 \end{vmatrix} = 0$, since the matrix has two identical columns. Expanding the determinant in cofactors along the first column we obtain a linear equation $ax + by + c = 0$, where
$$a = -\begin{vmatrix} 1 & 1 \\ y_1 & y_2 \end{vmatrix}, \qquad b = \begin{vmatrix} 1 & 1 \\ x_1 & x_2 \end{vmatrix}, \qquad\text{and}\qquad c = \begin{vmatrix} x_1 & x_2 \\ y_1 & y_2 \end{vmatrix}.$$
Since the points $(x_1,y_1)$ and $(x_2,y_2)$ are distinct, either $a$ or $b$ must be nonzero, so this gives the equation of a line, and that line, as we've already seen, passes through the given points $(x_1,y_1)$ and $(x_2,y_2)$.
b. As in part a, it is clear that the three points $(x_1,y_1,z_1)$, $(x_2,y_2,z_2)$, and $(x_3,y_3,z_3)$ must satisfy the given equation. Similarly, expanding the determinant in cofactors along the first column, we obtain an equation of the form $ax + by + cz + d = 0$. This will be the equation of a plane provided we check that at least one of $a$, $b$, and $c$ must be nonzero. Taking a slightly different approach, we just show that for some $(x^*, y^*, z^*) \in \mathbb{R}^3$, this equation does not hold. We observe that since the three points are noncollinear, the vectors $(x_2 - x_1, y_2 - y_1, z_2 - z_1)$ and $(x_3 - x_1, y_3 - y_1, z_3 - z_1)$ form a linearly independent set of vectors in $\mathbb{R}^3$. It follows that $(1, x_1, y_1, z_1)$, $(0, x_2 - x_1, y_2 - y_1, z_2 - z_1)$, and $(0, x_3 - x_1, y_3 - y_1, z_3 - z_1)$ form a linearly independent set of vectors in $\mathbb{R}^4$, and so there is some vector $(x^*, y^*, z^*)$ so that
$$\begin{vmatrix} 1 & 1 & 0 & 0 \\ x^* & x_1 & x_2 - x_1 & x_3 - x_1 \\ y^* & y_1 & y_2 - y_1 & y_3 - y_1 \\ z^* & z_1 & z_2 - z_1 & z_3 - z_1 \end{vmatrix} = \begin{vmatrix} 1 & 1 & 1 & 1 \\ x^* & x_1 & x_2 & x_3 \\ y^* & y_1 & y_2 & y_3 \\ z^* & z_1 & z_2 & z_3 \end{vmatrix} \ne 0.$$
7.5.20 Setting $(x,y)$ equal to $(x_1,y_1)$, $(x_2,y_2)$, or $(x_3,y_3)$ yields a matrix with two identical columns, and so we know that these three points satisfy the respective equations. Expanding in cofactors along the first column, we see that the equations take the appropriate form. For example, the first equation is of the form $dy = a'x^2 + b'x + c'$, and if we check that $d = \begin{vmatrix} 1 & 1 & 1 \\ x_1 & x_2 & x_3 \\ x_1^2 & x_2^2 & x_3^2 \end{vmatrix} \ne 0$, then we'll be done. Since $x_1$, $x_2$, and $x_3$ are distinct, it follows from Exercise 4.1.22 that this determinant is nonzero. Similarly, assuming that the three points are noncollinear ensures (see Exercise 4.1.23) that $\begin{vmatrix} 1 & 1 & 1 \\ x_1 & x_2 & x_3 \\ y_1 & y_2 & y_3 \end{vmatrix} \ne 0$, so $a' \ne 0$, and we have a bona fide equation of a parabola. Similarly, in the case of the circle (where we no longer need any assumption on the $x_i$'s), the same inequality guarantees that we get a nonzero coefficient of $x^2 + y^2$ when we expand the second determinant by cofactors along the first column.
212 7. INTEGRATION
7.5.21 Suppose det and det′ are two functions satisfying the properties in Theorem 5.1. Since the proofs of Theorem 5.5, Corollary 5.6, and Proposition 5.12 rely only on the properties listed in Theorem 5.1, we conclude that both det and det′ satisfy the conclusions of these propositions. In particular, we infer that if A is singular, then det A = det′ A = 0; if A is nonsingular, then A = Em Em−1 · · · E2 E1 is a product of elementary matrices, so det A = det Em det Em−1 · · · det E2 det E1 and det′ A = det′ Em det′ Em−1 · · · det′ E2 det′ E1. Since it follows directly from the properties that det E = det′ E for any elementary matrix E, we conclude that det A = det′ A for all square matrices A.
Remark: Since there are many ways of writing A as a product of elementary matrices, we might
worry that we could get different answers for det A. This is a question of well-definedness of the
function det, not of its uniqueness.
7.5.22 Let V = Span(v1 , . . . , vk ) and choose an orthonormal basis {w1 , . . . , wn−k } for V ⊥ . Let
A be the n × n matrix whose columns are v1 , . . . , vk , w1 , . . . , wn−k . Since the parallelepiped in Rn
spanned by v1 , . . . , vk , w1 , . . . , wn−k is an “extension” of the k-dimensional parallelepiped spanned
by v1 , . . . , vk by unit lengths in the orthogonal directions, we conclude that det A is equal to the
signed volume of the k-dimensional parallelepiped spanned by v1 , . . . , vk . Thus, the square of this
volume is (det A)² = det(AᵀA). But

AᵀA = [G O; O In−k], where G is the k × k matrix with entries Gij = vi · vj, and so

(det A)² = det G = det [v1 · v1 ⋯ v1 · vk; ⋮ ⋱ ⋮; vk · v1 ⋯ vk · vk],

as required.
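The Gram-determinant formula for squared volume can be spot-checked with numpy: for k = 2 vectors in R³, the squared area of the parallelogram they span is also ‖v1 × v2‖². The vectors below are arbitrary samples.

```python
# Check (numpy) that the squared k-volume equals the Gram determinant
# det[vᵢ·vⱼ]: for two vectors in R³ compare against ‖v₁ × v₂‖².
import numpy as np

v1 = np.array([1.0, 2.0, 2.0])
v2 = np.array([3.0, 0.0, 4.0])

gram = np.array([[v1 @ v1, v1 @ v2],
                 [v2 @ v1, v2 @ v2]])
area_sq_gram = np.linalg.det(gram)
area_sq_cross = np.dot(np.cross(v1, v2), np.cross(v1, v2))

assert abs(area_sq_gram - area_sq_cross) < 1e-9
```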
7.5.23 a. Let ϕ(t) = det(I + tB). Then D(det)(I)B = ϕ′(0). Now, by Proposition 5.18,

ϕ(t) = Σ_σ sign(σ)(I + tB)1σ(1) · · · (I + tB)nσ(n)

(†)  = (1 + tb11)(1 + tb22) · · · (1 + tbnn) + Σ_{σ(i)≠i for some i} sign(σ)(I + tB)1σ(1) · · · (I + tB)nσ(n)

  = 1 + t(b11 + b22 + · · · + bnn) + t²(· · ·);

note that in each term of the second sum in (†), there must be at least two off-diagonal factors, hence at least a factor of t². Therefore, ϕ′(0) = b11 + b22 + · · · + bnn = tr B, as desired.
b. Since ψ(t) = det(A + tB) = det A det(I + tA⁻¹B) for any invertible A, by the result of part a, we have D(det)(A)B = ψ′(0) = det A tr(A⁻¹B).
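Both parts can be checked numerically with a central difference; the matrices below are arbitrary samples (A is shifted to be safely invertible).

```python
# Numeric sketch: the directional derivative of det at A in the direction B
# should equal det(A)·tr(A⁻¹B).
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4)) + 4 * np.eye(4)   # comfortably invertible
B = rng.standard_normal((4, 4))

h = 1e-6
numeric = (np.linalg.det(A + h * B) - np.linalg.det(A - h * B)) / (2 * h)
formula = np.linalg.det(A) * np.trace(np.linalg.inv(A) @ B)

assert abs(numeric - formula) < 1e-4 * max(1.0, abs(formula))
```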
7.5.24 Let Ω ⊂ Rⁿ⁻¹ be the projection of R into the x2 · · · xn-plane. For x = (x1, x̃) ∈ R × Rⁿ⁻¹, write R = {x ∈ Rⁿ : φ(x̃) ≤ x1 ≤ ψ(x̃)}. Then we have

vol(R) = ∫_Ω ∫_{φ(x̃)}^{ψ(x̃)} dx1 dVn−1 = ∫_Ω (ψ(x̃) − φ(x̃)) dVn−1.
7.5.26 First consider the case of a circle. Notice that all inscribed equilateral triangles have maximum area, since if one alters the triangle by moving one vertex, the height, and hence the area, of the triangle decreases. For the general case, place the ellipse E in the plane so that its major and minor axes are aligned on the x- and y-axes; then its equation is x²/a² + y²/b² = 1 for appropriate positive numbers a and b. The linear transformation T : R² → R² given by T(x, y) = (ax, by) maps the unit circle to our ellipse E. By Proposition 5.13, T scales area by the (constant) factor ab, so triangles of maximal area inscribed in the unit circle must map to triangles of maximal area inscribed in the ellipse. Since there are infinitely many of the former, there must be infinitely many of the latter.
7.5.27 Consider f(t) = det(A + tB). This is a quadratic polynomial in t. By part b of Exercise 15, we have f(t) = ±1 for t = 0, 1, 2, 3, 4. Since a nonconstant quadratic polynomial can take on a given value at most two times, it follows that f is a constant polynomial. So either f(t) = 1 for all t or f(t) = −1 for all t. In either event, part a of Exercise 15 tells us that A + tB is invertible for all real values t and that its inverse has integer entries for all integers t. (In fact, one can show that (A⁻¹B)² = O, since tr(A⁻¹B) = det(A⁻¹B) = 0.)
e. We have ‖x‖² = max_{1≤i≤n}(xi²) ≤ Σ_{i=1}^n xi² ≤ n max_{1≤i≤n}(xi²) = n‖x‖². Now, choose x0 with ‖x0‖ = 1 so that ‖T‖ = ‖T(x0)‖. Since x0 = (±1, . . . , ±1), we have ‖x0‖ = √n, and so ‖T(x0)‖ ≤ √n‖T‖. Therefore, we have ‖T‖ ≤ √n‖T‖. As for the remaining inequality, let A be the standard matrix for T. Then ‖T‖ = ‖T(x∗)‖ for some x∗ with ‖x∗‖ = 1, and so ‖T‖ ≤ √n max_{1≤i≤m} Σ_j |aij| = √n‖A‖.
f. We have

‖∫_a^b g(t) dt‖ = max_{1≤i≤n} |∫_a^b gi(t) dt| ≤ max_{1≤i≤n} ∫_a^b |gi(t)| dt ≤ ∫_a^b ‖g(t)‖ dt.
7.6.2 If we wanted to assume that f is continuous (or, alternatively, that the hypotheses of Fubini's Theorem hold), we could derive this immediately by making substitutions in iterated integrals. However, it is easy to give a straightforward argument using the definition of the integral. There is a one-to-one correspondence between partitions P of R and partitions P′ of T(R). If the (diagonal) entries of [T] are d1, . . . , dn, then, given a partition of R as on p. 268 of the text, we obtain the partition P′ by taking (di)xij, 1 ≤ i ≤ n, 1 ≤ j ≤ ki. Since vol(R′_{j1j2...jn}) = |d1d2 · · · dn| vol(R_{j1j2...jn}) = |det T| vol(R_{j1j2...jn}), we see that

U(f, P′) = |det T| U(f ◦ T, P)  and  L(f, P′) = |det T| L(f ◦ T, P);

moreover, since f is integrable on T(R), given ε > 0, we can find a partition P′ so that U(f, P′) − L(f, P′) < |det T|ε, and so the corresponding partition P will have the property that U(f ◦ T, P) − L(f ◦ T, P) < ε. Therefore, f ◦ T is integrable on R. Now, since L(f, P′) ≤ |det T| ∫_R (f ◦ T) dV ≤ U(f, P′) and f is integrable on T(R), we infer that, by uniqueness, |det T| ∫_R (f ◦ T) dV = ∫_{T(R)} f dV.
7.6.3 The ellipse is the image of the unit disk in R² under the linear map T(x, y) = (ax, by); the ellipsoid is the image of the unit ball in R³ under the linear map T(x, y, z) = (ax, by, cz). Therefore, by the Change of Variables Theorem, the area of the ellipse is |det T| π = πab, and the volume of the ellipsoid is |det T| (4π/3) = (4π/3)abc.
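A quick numeric illustration of the scaling argument: counting grid cells inside the ellipse (a crude Riemann sum, with arbitrary sample semi-axes) approximates |det T| times the unit disk's area π, i.e. πab.

```python
# Grid-count estimate of the area of the ellipse x²/a² + y²/b² ≤ 1,
# compared against the change-of-variables prediction πab.
import numpy as np

a, b = 2.0, 1.5
n = 1000
x = np.linspace(-a, a, n)
y = np.linspace(-b, b, n)
X, Y = np.meshgrid(x, y, indexing="ij")
inside = (X/a)**2 + (Y/b)**2 <= 1.0
dx, dy = x[1] - x[0], y[1] - y[0]
area = inside.sum() * dx * dy

assert abs(area - np.pi * a * b) < 0.01 * np.pi * a * b
```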
7.6.4 a. We have

∫_S f dA = ∫_0^{π/2} ∫_0^{1/(cos θ + sin θ)} e^{(cos θ − sin θ)/(cos θ + sin θ)} r dr dθ = (1/2) ∫_0^{π/2} (1/(cos θ + sin θ)²) e^{(cos θ − sin θ)/(cos θ + sin θ)} dθ = (1/4) ∫_{−1}^1 eᵘ du = (1/4)(e − 1/e).

(Here, magically, the substitution u = (cos θ − sin θ)/(cos θ + sin θ) works out perfectly, as du = −2/(cos θ + sin θ)² dθ.)
b. Let (x, y) = g(u, v) = (1/2)(u + v, −u + v). Then g maps the region Ω = {(u, v) : 0 ≤ v ≤ 1, −v ≤ u ≤ v} one-to-one and onto S, and |det Dg| = 1/2. Thus, we have

∫_S f dA = ∫_Ω (f ◦ g)|det Dg| dA_uv = ∫_0^1 ∫_{−v}^v e^{u/v} (1/2) du dv = (1/2) ∫_0^1 [v e^{u/v}]_{u=−v}^{u=v} dv = (1/2) ∫_0^1 v(e − 1/e) dv = (1/4)(e − 1/e).
7.6.5 Let u = 2x + y, v = −x + 3y. Then (x, y) = g(u, v) = (1/7)(3u − v, u + 2v), and g maps the region Ω = {(u, v) : 1 ≤ u ≤ 5, −u/2 ≤ v ≤ 1} one-to-one and onto S. Then we have

∫_S (x − 3y)/(2x + y) dA = (1/7) ∫_1^5 ∫_{−u/2}^1 (−v/u) dv du = (1/14) ∫_1^5 (u/4 − 1/u) du = (1/14)(3 − log 5).
7.6.6 Substituting u = xy, v = y means that we consider the map (x, y) = g(u, v) = (u/v, v). This function g maps the region Ω = {(u, v) : 1 ≤ u ≤ 4, √u ≤ v ≤ 3} one-to-one and onto S. Now we have

Dg = [1/v  −u/v²; 0  1],  and  det(Dg) = 1/v.

Therefore, we have

∫_S y dA = ∫_1^4 ∫_{√u}^3 v · (1/v) dv du = ∫_1^4 ∫_{√u}^3 dv du = ∫_1^4 (3 − √u) du = 13/3,

just as before.
7.6.14 The image of g is a solid torus, obtained by rotating a disk of radius b about a circle of radius a. We have

Dg = [cos φ cos θ  −(a + r cos φ) sin θ  −r sin φ cos θ; cos φ sin θ  (a + r cos φ) cos θ  −r sin φ sin θ; sin φ  0  r cos φ].
Now, starting with the last row and subtracting the previous row from each, we have

det [1 1 1 ⋯ 1; 1 2 1 ⋯ 1; 1 2 3 ⋯ 1; ⋮; 1 2 3 ⋯ n] = det [1 1 1 ⋯ 1; 0 1 0 ⋯ 0; 0 0 2 ⋯ 0; ⋮; 0 0 0 ⋯ n − 1] = (n − 1)!.
7.6.16 Consider the linear map T : Rⁿ → Rⁿ defined by T(x1, x2, . . . , xn) = (x1, 2x2, . . . , nxn). Then det T = n! and T(S) is the “pyramid” R = {x ∈ Rⁿ : xi ≥ 0 for all i, x1 + x2 + · · · + xn ≤ n}. Then vol(R) = det T vol(S). By induction, the volume of the pyramid {xi ≥ 0 for all i, x1 + x2 + · · · + xn ≤ 1} is 1/n!, so vol(R) = nⁿ/n!, and therefore vol(S) = nⁿ/(n!)².
7.6.17 Define g : (0, ∞) × (0, π) × (0, π) × (0, 2π) → R⁴ by

g(ρ, ψ, φ, θ) = (ρ sin ψ sin φ cos θ, ρ sin ψ sin φ sin θ, ρ sin ψ cos φ, ρ cos ψ).

Then a deliberate calculation yields |det Dg| = ρ³ sin² ψ sin φ. Therefore,

∫_{B(0,a)} ‖x‖ dV = ∫_0^{2π} ∫_0^π ∫_0^π ∫_0^a ρ⁴ sin² ψ sin φ dρ dψ dφ dθ = (π/2)(a⁵/5)(4π) = (2π²/5) a⁵.
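The Jacobian claimed for the 4-dimensional spherical map can be spot-checked by differentiating g numerically at an arbitrary sample point:

```python
# Central-difference Jacobian of the 4D spherical-coordinate map
# g(ρ, ψ, φ, θ) = ρ(sinψ sinφ cosθ, sinψ sinφ sinθ, sinψ cosφ, cosψ),
# checked against |det Dg| = ρ³ sin²ψ sinφ at one sample point.
import numpy as np

def g(p):
    rho, psi, phi, theta = p
    return np.array([rho*np.sin(psi)*np.sin(phi)*np.cos(theta),
                     rho*np.sin(psi)*np.sin(phi)*np.sin(theta),
                     rho*np.sin(psi)*np.cos(phi),
                     rho*np.cos(psi)])

p = np.array([1.3, 0.7, 1.1, 2.0])
h = 1e-6
J = np.empty((4, 4))
for j in range(4):            # build the Jacobian column by column
    dp = np.zeros(4); dp[j] = h
    J[:, j] = (g(p + dp) - g(p - dp)) / (2 * h)

rho, psi, phi, _ = p
assert abs(abs(np.linalg.det(J)) - rho**3 * np.sin(psi)**2 * np.sin(phi)) < 1e-6
```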
7.6.18 a. Since 1/(1 − u) = Σ_{k=0}^∞ uᵏ when |u| < 1, whenever |xy| < 1 we have 1/(1 − xy) = Σ_{k=0}^∞ (xy)ᵏ and

I = ∫_R 1/(1 − xy) dA = ∫_0^1 ∫_0^1 Σ_{k=0}^∞ xᵏyᵏ dx dy = Σ_{k=0}^∞ ∫_0^1 (1/(k + 1)) yᵏ dy = Σ_{k=0}^∞ 1/(k + 1)² = Σ_{k=1}^∞ 1/k².
To justify the interchange of the summation and the integration, we need uniform convergence. (See Chapter 24 of Spivak.) We know that the geometric series converges uniformly on [−b, b] for any 0 < b < 1. Thus, if we consider Iδ = ∫_{[0,1]×[0,1−δ]} 1/(1 − xy) dA = ∫_0^{1−δ} ∫_0^1 Σ xᵏyᵏ dx dy, then we can move the summation outside the integral, since the series converges uniformly on |xy| ≤ 1 − δ. We then get Iδ = Σ_{k=1}^∞ (1 − δ)ᵏ/k², and, by Abel's Theorem, lim_{δ→0⁺} Iδ = I.
b. Consider the mapping g(u, v) = (1/√2)(u − v, u + v). Then g maps the square S with vertices (0, 0), (1/√2, −1/√2), (√2, 0), and (1/√2, 1/√2) to R. Then we have

∫_R 1/(1 − xy) dA_xy = ∫_S 1/(1 − ½(u² − v²)) dA_uv = ∫_S 2/((2 − u²) + v²) dA_uv.
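Both identities in part a are easy to probe numerically; the sketch below compares a partial Basel sum and a midpoint-rule estimate of the double integral against π²/6 (the tolerance on the integral is loose because of the corner singularity at (1, 1)).

```python
# Numeric probes of I = ∫∫ 1/(1−xy) dA = Σ 1/k² = π²/6 over [0,1]².
import math

N = 100000
basel_partial = sum(1.0 / k**2 for k in range(1, N + 1))
# the tail of Σ 1/k² lies between 1/(N+1) and 1/N
assert abs(basel_partial - math.pi**2 / 6) < 1.0 / N

# midpoint rule: sample points stay away from the corner singularity
n = 400
s = 0.0
for i in range(n):
    x = (i + 0.5) / n
    for j in range(n):
        y = (j + 0.5) / n
        s += 1.0 / (1.0 - x * y)
integral = s / (n * n)
assert abs(integral - math.pi**2 / 6) < 0.02
```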
8.1.1 It does neither. A reflection does, however, interchange “front” and “back,” thereby
reversing orientation.
8.1.2 They made the fan belt in the form of a Möbius strip (see Figure 4.6).
8.2.1 By definition, Λᵏ(Rⁿ)∗ is the vector space spanned by the dxI with I increasing. Note that if I = (i1, . . . , ik) and J = (j1, . . . , jk) are increasing k-tuples, then

(†)  dxI(e_{j1}, . . . , e_{jk}) = 1 if j1 = i1, j2 = i2, . . . , jk = ik, and 0 otherwise.

(If i1 < i2 < · · · < ik and j1 < j2 < · · · < jk, suppose ℓ is the smallest index for which jℓ ≠ iℓ. If jℓ < iℓ, then jℓ ≠ is for all s and so the ℓth column of our determinant is all 0's; if jℓ > iℓ, then iℓ ≠ js for all s and so the ℓth row of our determinant is all 0's.) So, suppose that T = Σ_{I increasing} cI dxI = 0. Fixing an increasing k-tuple J = (j1, . . . , jk), it follows from (†) that 0 = T(e_{j1}, . . . , e_{jk}) = cJ. Since this holds for every increasing k-tuple J, we infer that all the cI's are 0, and so the dxI form a linearly independent set. But this calculation also establishes the second fact: given any T ∈ Λᵏ(Rⁿ)∗, we write T = Σ_{I increasing} aI dxI for some scalars aI, and then, by (†), we have aJ = T(e_{j1}, . . . , e_{jk}) for any increasing k-tuple J.
8.2.3 We have

v × w = (v2w3 − v3w2)e1 + (v3w1 − v1w3)e2 + (v1w2 − v2w1)e3 = det [e1 v1 w1; e2 v2 w2; e3 v3 w3],

so dx(v × w) = v2w3 − v3w2 = (dy ∧ dz)(v, w). The other formulas are checked similarly.
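The correspondence between the elementary wedge products and the components of the cross product can be verified directly:

```python
# (dx_i ∧ dx_j)(v, w) = v_i w_j − v_j w_i (0-based indices), compared with
# the components of v × w for sample integer vectors.
def wedge(i, j, v, w):
    return v[i]*w[j] - v[j]*w[i]

def cross(v, w):
    return (v[1]*w[2] - v[2]*w[1],
            v[2]*w[0] - v[0]*w[2],
            v[0]*w[1] - v[1]*w[0])

v = (1, -2, 3)
w = (4, 0, -5)
c = cross(v, w)
assert wedge(1, 2, v, w) == c[0]   # dy ∧ dz ↔ e1-component
assert wedge(2, 0, v, w) == c[1]   # dz ∧ dx ↔ e2-component
assert wedge(0, 1, v, w) == c[2]   # dx ∧ dy ↔ e3-component
```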
8.2. DIFFERENTIAL FORMS 221
Remark: This exercise shows that the wedge product is the appropriate generalization of the
cross product to higher dimensions.
which is the signed volume of the parallelepiped spanned by n, v, and w. Since n is a unit vector
orthogonal to v and w, the volume is precisely the signed area of the parallelogram spanned by v
and w. (Cf. Proposition 5.1 of Chapter 1 and its proof.)
8.2.8 Part (4) of Proposition 2.3 gives a necessary condition. Note that if such a (k − 1)-form
η exists, then η ′ = η + dρ will also work for any (k − 2)-form ρ.
a. η = xdy
222 8. DIFFERENTIAL FORMS AND INTEGRATION ON MANIFOLDS
b. η = (1/2)x² dy
c. dω = dz ∧ dx ∧ dy, so there can be no such 1-form η.
d. η = yz(dz − dx)
e. dω = dx ∧ dy ∧ dz, so there can be no such 1-form η.
f. dω = (x² + y² + z²)⁻¹ dx ∧ dy ∧ dz, so there can be no such 1-form η.
g. η = x1 x5 dx2 ∧ dx3 ∧ dx4
8.2.9 When we say “extending by linearity,” technically this is only correct if we think of the
set of forms as a module over the ring of smooth functions. We mean, of course, that ⋆(f dx+gdy) =
f ⋆dx + g⋆dy = f dy − gdx, etc.
a. We have df = (∂f/∂x) dx + (∂f/∂y) dy, so ⋆(df) = (∂f/∂x) dy − (∂f/∂y) dx, and, thus,

d⋆(df) = (∂²f/∂x²) dx ∧ dy − (∂²f/∂y²) dy ∧ dx = (∂²f/∂x² + ∂²f/∂y²) dx ∧ dy.

b. df = (∂f/∂x) dx + (∂f/∂y) dy + (∂f/∂z) dz, so ⋆(df) = (∂f/∂x) dy ∧ dz + (∂f/∂y) dz ∧ dx + (∂f/∂z) dx ∧ dy, and, thus,

d⋆(df) = (∂²f/∂x²) dx ∧ dy ∧ dz + (∂²f/∂y²) dy ∧ dz ∧ dx + (∂²f/∂z²) dz ∧ dx ∧ dy = (∂²f/∂x² + ∂²f/∂y² + ∂²f/∂z²) dx ∧ dy ∧ dz.
8.2.10 Suppose df = λω. Then 0 = d(df) = dλ ∧ ω + λ dω. Wedging this equation with ω gives 0 = dλ ∧ ω ∧ ω + λ dω ∧ ω = λ dω ∧ ω, since ω ∧ ω = 0 for any 1-form ω; hence dω ∧ ω = 0 wherever λ ≠ 0.
8.2.13 Using the result immediately preceding Proposition 2.4 (or Exercise 17), we have (expanding in cofactors along the third row)

g∗(dx ∧ dy ∧ dz) = det(Dg) dρ ∧ dφ ∧ dθ = det [sin φ cos θ  ρ cos φ cos θ  −ρ sin φ sin θ; sin φ sin θ  ρ cos φ sin θ  ρ sin φ cos θ; cos φ  −ρ sin φ  0] dρ ∧ dφ ∧ dθ = (cos φ(ρ² sin φ cos φ) + ρ sin φ(ρ sin² φ)) dρ ∧ dφ ∧ dθ = ρ² sin φ dρ ∧ dφ ∧ dθ.
8.2.14 a. By part (4) of Proposition 2.3, if ω = dη, then dω = d(dη) = 0. The closed 1-form ω = (−y dx + x dy)/(x² + y²) is, however, not exact on R² − {0}. (See Theorem 3.2 and Example 9 of Section 3.)

b. Since dω = dφ = 0, by part (3) of Proposition 2.3, d(ω ∧ φ) = dω ∧ φ ± ω ∧ dφ = 0.

c. Suppose ω = dη and dφ = 0. Then d(η ∧ φ) = dη ∧ φ ± η ∧ dφ = ω ∧ φ, so ω ∧ φ is indeed exact.
8.2.15 Since dx1, . . . , dxn give a basis for (Rⁿ)∗, there are scalars aij, j = 1, . . . , n, so that ωi = Σ_{j=1}^n aij dxj. Using the hypothesis, we have

0 = Σ_{i=1}^k dxi ∧ ωi = Σ_{i=1}^k Σ_{j=1}^n aij dxi ∧ dxj = Σ_{1≤i,j≤k} aij dxi ∧ dxj + Σ_{1≤i≤k, k+1≤j≤n} aij dxi ∧ dxj
 = Σ_{1≤i<j≤k} (aij − aji) dxi ∧ dxj + Σ_{1≤i≤k, k+1≤j≤n} aij dxi ∧ dxj.

Since {dxi ∧ dxj : i < j} gives a basis for Λ²(Rⁿ)∗, we infer that aij − aji = 0 for 1 ≤ i < j ≤ k and aij = 0 for 1 ≤ i ≤ k, k + 1 ≤ j ≤ n. This is what we needed to establish.
8.2.16 By Proposition 2.4, we have (g ◦ h)∗dxi = d(g ◦ h)i = d(gi ◦ h) = d(h∗gi) = h∗(dgi) = h∗(g∗dxi). It now follows from the definition of pullback that

(g ◦ h)∗(Σ_I fI dxI) = Σ_I ((g ◦ h)∗fI)((g ◦ h)∗dxI) = Σ_I (fI ◦ (g ◦ h)) (g ◦ h)∗dx_{i1} ∧ · · · ∧ (g ◦ h)∗dx_{ik}
 = Σ_I ((fI ◦ g) ◦ h) (g ◦ h)∗dx_{i1} ∧ · · · ∧ (g ◦ h)∗dx_{ik}
 = Σ_I h∗(g∗fI) h∗(g∗dx_{i1}) ∧ · · · ∧ h∗(g∗dx_{ik})
 = Σ_I h∗((g∗fI) g∗dxI) = h∗(g∗(Σ_I fI dxI)),

as required.
8.2.17 a. By definition of the sign of the permutation σ (see p. 306), dx_{σ(1)} ∧ dx_{σ(2)} ∧ · · · ∧ dx_{σ(n)} = sign(σ) dx1 ∧ dx2 ∧ · · · ∧ dxn.

b. We have

ω1 ∧ ω2 ∧ · · · ∧ ωn = (Σ_{j1=1}^n a_{1j1} dx_{j1}) ∧ (Σ_{j2=1}^n a_{2j2} dx_{j2}) ∧ · · · ∧ (Σ_{jn=1}^n a_{njn} dx_{jn})
 = Σ a_{1j1} a_{2j2} · · · a_{njn} dx_{j1} ∧ dx_{j2} ∧ · · · ∧ dx_{jn}
 = Σ_{permutations σ} a_{1σ(1)} a_{2σ(2)} · · · a_{nσ(n)} dx_{σ(1)} ∧ dx_{σ(2)} ∧ · · · ∧ dx_{σ(n)}
 = Σ_{permutations σ} sign(σ) a_{1σ(1)} a_{2σ(2)} · · · a_{nσ(n)} dx1 ∧ · · · ∧ dxn = det A dx1 ∧ · · · ∧ dxn,

as required.
c. Setting aij = ∂gi/∂xj, the result is immediate from part b.
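Part b can be spot-checked by summing the Leibniz expansion over all permutations and comparing with numpy's determinant for a sample matrix:

```python
# Leibniz sum Σ_σ sign(σ) a_{1σ(1)} · · · a_{nσ(n)} versus numpy's det.
from itertools import permutations
import numpy as np

def sign(p):
    # sign of a permutation (given as a tuple) via counting inversions
    inv = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inv % 2 else 1

A = np.array([[2., 1., 0., 3.],
              [1., -1., 4., 0.],
              [0., 2., 1., 1.],
              [3., 0., 2., -2.]])
n = A.shape[0]
leibniz = sum(sign(p) * np.prod([A[i, p[i]] for i in range(n)])
              for p in permutations(range(n)))

assert abs(leibniz - np.linalg.det(A)) < 1e-9
```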
8.2.18 First note that it suffices to check the equality when the vj are basis vectors, for the following reason: By the properties of the determinant, the function T : Rⁿ × · · · × Rⁿ (k times) → R defined by T(v1, . . . , vk) = det[φi(vj)] is alternating and multilinear and therefore defines an element of Λᵏ(Rⁿ)∗. As we know, e.g., from Exercise 1, the values of T on k-tuples of standard basis vectors determine T uniquely.

Set φi = Σ_{j=1}^n aij dxj. Then, as we saw in Exercise 17,

φ1 ∧ · · · ∧ φk = (Σ_{j1=1}^n a_{1j1} dx_{j1}) ∧ · · · ∧ (Σ_{jk=1}^n a_{kjk} dx_{jk}) = Σ a_{1j1} · · · a_{kjk} dx_{j1} ∧ · · · ∧ dx_{jk}.

Now,

(dx_{j1} ∧ · · · ∧ dx_{jk})(e_{J1}, . . . , e_{Jk}) = sign(σ) if j1 = σ(J1), . . . , jk = σ(Jk) for some permutation σ, and 0 otherwise,
8.2.19 Note, first of all, that if v ∈ Rᵐ, then g∗dxi(a)(v) = Dgi(a)v (e.g., by Proposition 2.4). Then for any ordered k-tuple I = (i1, . . . , ik) and vectors v1, . . . , vk ∈ Rᵐ, we have (using Exercise 18 at the last step)

g∗dxI(v1, . . . , vk) = (g∗dx_{i1} ∧ · · · ∧ g∗dx_{ik})(v1, . . . , vk) = (dg_{i1} ∧ · · · ∧ dg_{ik})(v1, . . . , vk)
 = det[dg_{iℓ}(vj)]_{1≤ℓ,j≤k} = det[Dg_{iℓ}(a)vj]_{1≤ℓ,j≤k} = det[(Dg(a)vj)_{iℓ}] = dxI(Dg(a)v1, . . . , Dg(a)vk).

The case of the general k-form follows by linearity (over smooth functions).
8.2.20 Suppose d′ satisfies all the properties listed in Proposition 2.3 as well as d′f = Σ_{j=1}^n (∂f/∂xj) dxj. Let ω ∈ Aᵏ(U) be arbitrary. Then ω = Σ_I fI dxI. We have

d′ω = Σ_I d′(fI dxI)   by Property (1)
 = Σ_I (d′fI ∧ dxI + fI d′(dxI))   by Property (2)
 = Σ_I (dfI ∧ dxI + fI d′(dxI))   by the additional hypothesis
 = Σ_I (dfI ∧ dxI + fI d′(dx_{i1} ∧ · · · ∧ dx_{ik}))
 = Σ_I (dfI ∧ dxI + fI (d′dx_{i1} ∧ dx_{i2} ∧ · · · ∧ dx_{ik} − · · · ± dx_{i1} ∧ · · · ∧ d′dx_{ik}))   by Property (3)
 = Σ_I dfI ∧ dxI   by Property (4), using the hypothesis that d′xi = dxi
 = dω,

as required.
8.3.6 We use the second method illustrated in the text to construct potential functions.

a. f(x0, y0) = ∫_0^{x0} x dx + ∫_0^{y0} (x0 + y) dy = (1/2)(x0² + y0²) + x0y0. Taking f(x, y) = (1/2)(x² + y²) + xy, we have indeed that df = (x + y)(dx + dy) = ω.
b. dω ≠ 0, so ω cannot be exact. Letting C be the rectangle with vertices at (0, −1), (1, −1), (1, 1), and (0, 1), we check easily that ∫_C ω = 2.
c. f(x0, y0) = ∫_0^{x0} eˣ dx + ∫_0^{y0} (x0² + y²) dy = e^{x0} − 1 + x0²y0 + (1/3)y0³. Taking f(x, y) = eˣ + x²y + (1/3)y³ − 1, we have df = (eˣ + 2xy) dx + (x² + y²) dy = ω, as required.
d. f(x0, y0, z0) = ∫_0^{x0} x² dx + ∫_0^{y0} (x0 + y²) dy + ∫_0^{z0} (x0 + y0 + z²) dz = (1/3)(x0³ + y0³ + z0³) + x0y0 + x0z0 + y0z0. Taking f(x, y, z) = (1/3)(x³ + y³ + z³) + xy + xz + yz, we find that df = (x² + y + z) dx + (y² + x + z) dy + (z² + x + y) dz = ω, as required.
e. f(x0, y0, z0) = ∫_0^{z0} (x0y0² + y0 cos z) dz = x0y0²z0 + y0 sin z0. Taking f(x, y, z) = xy²z + y sin z, we find that df = y²z dx + (2xyz + sin z) dy + (xy² + y cos z) dz = ω, as required.
8.3.7 a. Differentiating (and using the result of Example 1 of Chapter 3, Section 5) we obtain

dω = f′(‖x‖)(1/‖x‖) (Σ_{i=1}^n xi dxi) ∧ (Σ_{i=1}^n xi dxi) = 0,

since the wedge product of any 1-form with itself is 0.
8.3.8 Of course, the ludicrous nature of the path makes it clear that the 1-form in question must be exact. Indeed, applying one of the methods in Example 4, we find that the 1-form is the derivative of the function f(x, y, z) = (1/2)(3x² + y² + e^{z²}) + xy² + x²z + e^{yz}. Applying Proposition 3.1, we have

∫_C df = f(g(1)) − f(g(0)) = f(e, 4, 1) − f(1, −1, 0) = e⁴ + (5/2)e² + (33/2)e + 7/2.
8.3.9 Since d(xy) = y dx + x dy, we have 0 = ∮_C d(xy) = ∮_C y dx + ∮_C x dy. In the event that C bounds a region Ω, then, intuitively speaking, the first integral computes the area of Ω by slicing the region into thin vertical strips, whereas the second integral computes the area by slicing the region into thin horizontal strips. Why the sign discrepancy? As we go around C counterclockwise, the first integral corresponds to an x-integral going from right to left, and actually gives the negative of the area.
8.3. LINE INTEGRALS AND GREEN’S THEOREM 229
Remark: It is tempting to try to apply Green’s Theorem, but we certainly do not “know” that
every closed curve bounds a region (or a union thereof).
8.3.10 a. Traversing the edges counterclockwise, starting on the x-axis, we have ∫_C ω = ∫_0^1 1 dy − ∫_0^1 (−1) dx = 2. Letting R be the square with boundary C, Green's Theorem yields

∫_C ω = ∫_R dω = ∫_R 3(x² + y²) dx ∧ dy = ∫_R 3(x² + y²) dA = ∫_0^1 ∫_0^1 3(x² + y²) dy dx = 2.
c. Of course, we parametrize C by g(t) = (a cos t, a sin t), 0 ≤ t ≤ 2π. We have

∫_C ω = ∫_0^{2π} a⁴ · 2(cos² t)(sin² t) dt = (a⁴/2) ∫_0^{2π} sin² 2t dt = πa⁴/2.

Letting D denote the disk of radius a centered at the origin and applying Green's Theorem, we have

∫_C ω = ∫_D dω = ∫_D (x² + y²) dA = ∫_0^{2π} ∫_0^a r³ dr dθ = πa⁴/2.
d. We parametrize C by g(t) = (cos² t, cos t sin t), −π/2 ≤ t ≤ π/2. Then ∫_C ω = ∫_{−π/2}^{π/2} (2 cos t)(4 cos² t) dt = 32/3. Letting D be the disk bounded by C, Green's Theorem tells us that

∫_C ω = ∫_D dω = ∫_D (((x dx + y dy)/√(x² + y²)) ∧ (−y dx + x dy) + √(x² + y²) (2 dx ∧ dy)) = ∫_D 3√(x² + y²) dA = ∫_{−π/2}^{π/2} ∫_0^{2 cos θ} 3r² dr dθ = ∫_{−π/2}^{π/2} 8 cos³ θ dθ = 32/3.
e. Let C1 be the line segment from (0, 0) to (a, 0), C2 the circular arc from (a, 0) to (a/√2, a/√2), and C3 the line segment from (a/√2, a/√2) to (0, 0). Then

∫_C ω = ∫_{C1} ω + ∫_{C2} ω + ∫_{C3} ω = 0 + a³ ∫_0^{π/4} (sin³ θ + cos³ θ) dθ + 0 = (2/3)a³.

On the other hand, by Green's Theorem, letting Ω denote the sector in question, we have

∫_C ω = ∫_Ω dω = ∫_Ω 2(x + y) dx ∧ dy = 2 ∫_0^{π/4} ∫_0^a r²(cos θ + sin θ) dr dθ = (2/3)a³.
8.3.11 Let D be the disk bounded by C. Then ∫_C ω = ∫_D dω = ∫_D (1 + 2y) dx ∧ dy = ∫_D (1 + 2y) dA = (1 + 2ȳ) area(D) = π. (Since D is symmetric about the x-axis, ȳ = 0.)
8.3.12 Proceeding as in Example 8, we parametrize the boundary curve C by g(t) = (a cos t, b sin t), 0 ≤ t ≤ 2π, and calculate the area by the line integral (1/2) ∫_C −y dx + x dy = (1/2) ∫_0^{2π} ab dt = πab.
8.3.13 Proceeding as in Example 8, we use the parametrization given in the hint (with 0 ≤ t ≤ 2π) to find that the area is given by the line integral

(1/2) ∫_C −y dx + x dy = (1/2) ∫_0^{2π} 3(sin⁴ t cos² t + cos⁴ t sin² t) dt = (3/2) ∫_0^{2π} sin² t cos² t dt = (3/8) ∫_0^{2π} sin² 2t dt = 3π/8.
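Assuming the hint's parametrization is the astroid (cos³ t, sin³ t) (an assumption here, since the hint itself is not reproduced), the line-integral area formula can be checked numerically:

```python
# Midpoint-rule evaluation of ½∮(−y dx + x dy) around the astroid
# x = cos³t, y = sin³t, compared with the claimed area 3π/8.
import math

n = 1000
total = 0.0
for i in range(n):
    t = (i + 0.5) * 2 * math.pi / n
    x = math.cos(t)**3
    y = math.sin(t)**3
    dx = -3 * math.cos(t)**2 * math.sin(t)   # x′(t)
    dy = 3 * math.sin(t)**2 * math.cos(t)    # y′(t)
    total += 0.5 * (-y * dx + x * dy)
area = total * 2 * math.pi / n

assert abs(area - 3 * math.pi / 8) < 1e-9
```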
8.3.14 It is essential to realize that to apply the method of Example 8 we must have a closed curve. The boundary C of the region in question consists of the trochoid (traversed right to left), the two vertical line segments, and the horizontal line segment (traversed left to right). Thus, the area is given by

(1/2) ∫_C −y dx + x dy = −(1/2) ∫_0^{2π} (−(a − b cos t)(a − b cos t) + (at − b sin t)(b sin t)) dt + (1/2) ∫_0^{a−b} 2πa dt = π(2a² + b²).
8.3.15 It is essential to realize that to apply the method of Example 8 we must have a closed curve. The boundary C of the region in question consists of the evolute (traversed right to left) and the vertical line segment from A to B. Thus, the area is given by

(1/2) ∫_C −y dx + x dy = (a²/2) ∫_0^{2π} (−(sin t − t cos t)(t cos t) + (cos t + t sin t)(t sin t)) dt + (1/2) ∫_{−2πa}^0 a dt = (a²/2) ∫_0^{2π} t² dt + πa² = πa² ((4π²/3) + 1).
8.3.16 a. Let C = ∂S. Then ∮_C (e^{x²} − 2xy) dx + (2xy − x²) dy = ∫_S 2y dx ∧ dy = ∫_S 2y dA = 2ȳ area(S) = 2 · 2 · 9 = 36. (Since S is symmetric about the line y = 2, we have ȳ = 2.)

b. Let C = ∂S. Then ∮_C (2xy − 3y²) dx + (x² + e^{sin y}) dy = ∫_S 6y dx ∧ dy = 6ȳ area(S) = 6 · (7/2)(15 + (9/4)π) = 21(15 + (9/4)π).
8.3.17 a. If T = (t1, t2), then we have σ(T) = −n2t1 + n1t2 = det [n1 t1; n2 t2] = 1, since n and T span a parallelogram (square) with signed area 1. Alternatively, note that since we obtain T when we rotate n by angle π/2, for any v ∈ R², we have σ(v) = v · T (see the beginning of Section 5 of Chapter 1).
as required.
b. Using Exercise 7.2.23, if we set F(x) = ∫_{g(x)}^{h(x)} Q(x, y) dy, then we have

F′(x) = ∫_{g(x)}^{h(x)} (∂Q/∂x)(x, y) dy + Q(x, h(x)) h′(x) − Q(x, g(x)) g′(x),

and so

∫_{g(b)}^{h(b)} Q(b, y) dy − ∫_{g(a)}^{h(a)} Q(a, y) dy = ∫_a^b F′(x) dx = ∫_a^b ∫_{g(x)}^{h(x)} (∂Q/∂x)(x, y) dy dx + ∫_a^b Q(x, h(x)) h′(x) dx − ∫_a^b Q(x, g(x)) g′(x) dx.

Thus, we have

∫_a^b ∫_{g(x)}^{h(x)} (∂Q/∂x)(x, y) dy dx = ∫_a^b Q(x, g(x)) g′(x) dx + ∫_{g(b)}^{h(b)} Q(b, y) dy − ∫_a^b Q(x, h(x)) h′(x) dx − ∫_{g(a)}^{h(a)} Q(a, y) dy = ∫_{∂Ω} Q dy.

On the other hand, it is immediate from the Fundamental Theorem of Calculus that

∫_a^b ∫_{g(x)}^{h(x)} (∂P/∂y)(x, y) dy dx = ∫_a^b (P(x, h(x)) − P(x, g(x))) dx = −∫_{∂Ω} P dx.
8.3.21 To be completely rigorous here, we need the Jordan curve theorem in some form, but the
idea is quite simple. If C has no self-intersection, then we get ±2π, as in Example 10, depending
on whether the origin lies in the region bounded by C. If C has some self-intersection points, start
at one of them, a, and proceed counterclockwise around C until the curve first returns to a. If the
planar region bounded by that portion of the curve contains the origin, then the integral picks up
2π. We delete that portion of the curve and continue. Continue in this manner to delete all loops
in C due to self-intersections. We are guaranteed a finite number. In the end, we are left with
one curve that either encircles the origin or doesn’t. The total line integral is the sum of all the
individual line integrals, each of which is either ±2π or 0; thus, the sum is an integral multiple of
2π.
Remark: As one learns in differential topology (cf. Guillemin and Pollack, Differential Topology,
Prentice Hall, 1974), we can compute the winding number by choosing a “generic” ray from the
origin, which will cross C non-tangentially a finite number of times, and then count those points
of intersection with a sign, + if the curve crosses counterclockwise at the point, − if it crosses
clockwise.
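The “sum of ±2π or 0” conclusion can be illustrated numerically; the winding number below is computed by accumulating exact angle increments (equivalent to integrating dθ = (−y dx + x dy)/(x² + y²)) around a sample curve that encircles the origin twice.

```python
# Winding number of a closed polygonal approximation of a curve about 0,
# via summed angle increments between consecutive sample points.
import math

def winding(xs, ys):
    total = 0.0
    n = len(xs)
    for i in range(n):
        x0, y0 = xs[i], ys[i]
        x1, y1 = xs[(i + 1) % n], ys[(i + 1) % n]
        # signed angle from (x0,y0) to (x1,y1) as seen from the origin
        total += math.atan2(x0 * y1 - y0 * x1, x0 * x1 + y0 * y1)
    return total / (2 * math.pi)

n = 1000
ts = [2 * math.pi * i / n for i in range(n)]
# sample curve r(t) = 2 + cos 3t at angle 2t: loops the origin twice
xs = [(2 + math.cos(3 * t)) * math.cos(2 * t) for t in ts]
ys = [(2 + math.cos(3 * t)) * math.sin(2 * t) for t in ts]

assert round(winding(xs, ys)) == 2
```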
8.3.22 This problem proceeds exactly like Exercise 21. The only difference is that the 1-form
we’re integrating is the sum of
8.3.23 If C = ∂S, then, by Green's Theorem, the work done by F on the ant is given by

∫_C F · T ds = ∫_C (y³ + x²y) dx + (2x² − 6xy) dy = ∫_S (4x − 6y − 3y² − x²) dx ∧ dy = ∫_S (7 − (x − 2)² − 3(y + 1)²) dA.

Now, it follows from Exercise 7.1.5 that if f is continuous, then ∫_S f dA is largest when S = {x : f(x) ≥ 0}: if we remove any of S, then the integral goes down, and if we continue outside S, then we add a negative quantity to the integral and it goes down. Thus, in our case, we want S = {(x, y) : (x − 2)² + 3(y + 1)² ≤ 7}, and C should be the ellipse (x − 2)² + 3(y + 1)² = 7.
8.3.24 The proof of Theorem 3.2 (in particular, of (2) =⇒ (3)) seems to require that the integral ∫ ω be path-independent. In fact, all that is needed is path-independence along paths that are composed of line segments parallel to the coordinate axes. Then taking any two such paths from A to B whose union is a simple closed curve, the hypothesis of the problem and Green's Theorem tell us that the line integrals agree.
8.3.25 a. Suppose we row a distance d downstream and then back upstream. The total time is therefore

d/(v + c) + d/(v − c) = 2dv/(v² − c²) > 2d/v,

which is the time it would take with no current whatsoever.

b. Let's say the current vector is c (constant). We row with ground velocity v, so the resultant velocity of the boat is v + c. We write ‖v‖ = υ and ‖c‖ = c. Note that since υ sin β = c sin α (using the notation in Figure 3.15), we have υ² cos² β − c² cos² α = υ² − c². Now, the time of the trip is

∫_C ds/‖v + c‖ = ∫_C ds/(υ cos β + c cos α) = ∫_C (υ cos β − c cos α)/(υ² cos² β − c² cos² α) ds
 > ∫_C ds/(υ cos β) − (c/(υ² − c²)) ∫_C cos α ds > ∫_C ds/υ,

which is the time of the trip with no current. Note that ∫_C cos α ds = ∫_C (1/c) c · T ds = (1/c) ∫_C c · T ds = 0.
Remark: We can avoid the trigonometry by calculating as follows: Note, first of all, that T = (v + c)/‖v + c‖. Now,

(v · T)² − (c · T)² = (1/‖v + c‖²)((v · (v + c))² − (c · (v + c))²) = (v − c) · (v + c) = υ² − c².

Therefore,

1/‖v + c‖ = 1/((v + c) · T) = ((v − c) · T)/(((v + c) · T)((v − c) · T)) = ((v − c) · T)/((v · T)² − (c · T)²) = (v · T)/(υ² − c²) − (c · T)/(υ² − c²)
 > 1/(v · T) − (c · T)/(υ² − c²) > 1/υ − (c · T)/(υ² − c²),

which gives the same result as before.
8.3.26 We have x = (x, y) = f(θ, τ) = b(cos θ, sin θ) + a(cos τ, sin τ), so a straightforward calculation yields f∗(dx ∧ dy) = ab sin(τ − θ) dθ ∧ dτ. In particular, by the Inverse Function Theorem, so long as sin(τ − θ) ≠ 0, f is locally invertible and (θ, τ) is locally a C¹ function of (x, y). Indeed, it seems reasonable to assume that, for our purposes, f has a global inverse and so there is a closed curve Γ in (θ, τ)-space with f(Γ) = C. Suppose, moreover, that C = ∂Ω.

Now, the planar coordinates of the center of the wheel are given by y = g(θ, τ) = b(cos θ, sin θ) − (cos τ, sin τ), so it follows that y is locally a C¹ function of x. Let's assume the wheel is 1 unit down the arm (as pictured) and has radius 1. Then as the point x moves along the curve C (say, parametrized by F(t), 0 ≤ t ≤ T), the center of the wheel moves along a path G(t) = (g ◦ f⁻¹ ◦ F)(t). Moreover, the angle through which the wheel turns is given by α(t), where α′(t) = G′(t) · (−sin τ, cos τ) (inasmuch as the wheel rotates in a plane perpendicular to the arm from y to x). Now

total angle = ∫_0^T α′(t) dt = ∫_Γ b cos(τ − θ) dθ − dτ = ∫_C (f⁻¹)∗(b cos(τ − θ) dθ − dτ) = ∫_Ω (f⁻¹)∗(b sin(τ − θ) dθ ∧ dτ) = (1/a) ∫_Ω dx ∧ dy = (1/a) area(Ω).
8.4.3 The projection of S onto the xy-plane is the region Ω bounded by (1 − y)² = 2(x² + y²), i.e., the ellipse x² + (y + 1)²/2 = 1. If γ is the angle between the tangent plane of the cone at any point and the xy-plane, then it is easy to check that |cos γ| = 1/√3, so area(S) = √3 · area(ellipse) = √3 · π(1)(√2) = π√6.
8.4.5 Parametrizing by spherical coordinates (see Example 6), we pull back the given 2-form ω = x dy ∧ dz + y dz ∧ dx + z dx ∧ dy to find

g∗ω = a³((sin φ cos θ)(sin² φ cos θ) + (sin φ sin θ)(sin² φ sin θ) + (cos φ)(sin φ cos φ)) dφ ∧ dθ = a³ sin φ dφ ∧ dθ.

Thus, ∫_S ω = ∫_0^{2π} ∫_0^π a³ sin φ dφ dθ = (2a³)(2π) = 4πa³. Now, we observe that the unit outward-pointing normal to S is n = (1/a)(x, y, z) and the area 2-form of S is σ = (1/a)(x dy ∧ dz + y dz ∧ dx + z dx ∧ dy). We thereby recover the usual formula for the surface area of a sphere: area(S) = (1/a) · 4πa³ = 4πa².
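The pullback computation can be spot-checked numerically: evaluating ω on the numerical partial derivatives of the spherical parametrization should reproduce the coefficient a³ sin φ. The radius and evaluation point below are arbitrary samples.

```python
# Evaluate x dy∧dz + y dz∧dx + z dx∧dy on (∂g/∂φ, ∂g/∂θ) for the sphere
# parametrization g(φ, θ) = a(sinφ cosθ, sinφ sinθ, cosφ); should give a³ sinφ.
import math

a = 1.7
phi, theta = 0.9, 2.3   # sample point

def g(phi, theta):
    return (a * math.sin(phi) * math.cos(theta),
            a * math.sin(phi) * math.sin(theta),
            a * math.cos(phi))

h = 1e-6
p = g(phi, theta)
dphi = [(u - v) / (2 * h) for u, v in zip(g(phi + h, theta), g(phi - h, theta))]
dtheta = [(u - v) / (2 * h) for u, v in zip(g(phi, theta + h), g(phi, theta - h))]

def two_form(x, v, w):
    # (x dy∧dz + y dz∧dx + z dx∧dy)(v, w) = x · (v × w)
    return (x[0] * (v[1]*w[2] - v[2]*w[1])
          + x[1] * (v[2]*w[0] - v[0]*w[2])
          + x[2] * (v[0]*w[1] - v[1]*w[0]))

assert abs(two_form(p, dphi, dtheta) - a**3 * math.sin(phi)) < 1e-6
```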
8.4.6 Using the calculation in Exercise 5, we have:

a. ∫_S x² σ = ∫_0^{2π} ∫_0^π (sin φ cos θ)² sin φ dφ dθ = 4π/3.

b. By symmetry of the sphere S, ∫_S x² σ = ∫_S y² σ = ∫_S z² σ. (Officially, the function T(x, y, z) = (y, z, x), for example, maps S to itself preserving orientation, since T∗σ = σ. Therefore, ∫_S z² σ = ∫_S T∗(z² σ) = ∫_S x² σ.) It follows that

∫_S x² σ = (1/3) ∫_S (x² + y² + z²) σ = (1/3) ∫_S σ = 4π/3.
8.4.7 We obtain the outward-pointing unit normal by finding the cross product

∂g/∂u × ∂g/∂v = det [e1  −(a + b cos v) sin u  −b sin v cos u; e2  (a + b cos v) cos u  −b sin v sin u; e3  0  b cos v] = b(a + b cos v)(cos u cos v, sin u cos v, sin v),

so n = (cos u cos v, sin u cos v, sin v). Thus, the pullback of the area 2-form of the torus is

g∗σ = (cos u cos v) g∗(dy ∧ dz) + (sin u cos v) g∗(dz ∧ dx) + (sin v) g∗(dx ∧ dy) = b(a + b cos v)(cos² u cos² v + sin² u cos² v + sin² v) du ∧ dv = b(a + b cos v) du ∧ dv.

(Also see Exercise 20.) Therefore, the area of the torus is ∫_0^{2π} ∫_0^{2π} b(a + b cos v) du dv = 4π²ab.
Interestingly, the answer depends only on h and not on the location of the planes.
8.4. SURFACE INTEGRALS AND FLUX 237
8.4.9 a. We have ∂g/∂u × ∂g/∂v = (sin v, −cos v, u), which points upwards for u > 0.

b. We have g∗(x dz ∧ dx) = (u cos v)(−cos v du ∧ dv), so ∫_S x dz ∧ dx = ∫_0^1 ∫_0^{2π} (−u cos² v) dv du = −π/2.
8.4.10 We have (u, v, 0) = (1 − t)(0, 0, 1) + t(x, y, z) for some t ∈ R. Therefore, 0 = (1 − t) + tz and t = 1/(1 − z). Therefore, we have (u, v) = (1/(1 − z))(x, y). Using the fact that (x, y, z) lies on the unit sphere, we find that (1 − z)²u² + (1 − z)²v² + z² = 1, so 1 − z² = (1 − z)²(u² + v²), from which we infer that 1 + z = (1 − z)(u² + v²) and z = (u² + v² − 1)/(u² + v² + 1). That is,

(x, y, z) = g(u, v) = (1/(u² + v² + 1))(2u, 2v, u² + v² − 1).

We can see that g is orientation-reversing in several ways: We can (without a great deal of mirth) calculate the cross product ∂g/∂u × ∂g/∂v and see that it is inward-pointing. More geometrically: Note that going counterclockwise around a latitude circle on the sphere results in a counterclockwise motion in the plane and that heading “uphill” towards the north pole results in an outwards motion in the plane; thus, a positively-oriented basis for the tangent plane of the sphere corresponds to a negatively-oriented basis for R².
8.4.11 a. If we parametrize by spherical coordinates, then g*(x dy ∧ dz) = sin³φ cos²θ dφ ∧ dθ and ∫_S x dy ∧ dz = ∫_0^{2π}∫_0^π sin³φ cos²θ dφ dθ = 4π/3.
b. Let S± denote the upper and lower hemispheres, parametrized respectively by g±(x, y) = (x, y, ±√(1 − x² − y²)) (perhaps it's better to use polar coordinates). Then, letting D denote the unit disk, ∫_{S+} x dy ∧ dz = ∫_D x²/√(1 − x² − y²) dx ∧ dy = ∫_0^{2π}∫_0^1 r³cos²θ/√(1 − r²) dr dθ = π · (2/3). Because on S⁻ the graph parametrization is orientation-reversing, we get the identical integral for ∫_{S⁻} x dy ∧ dz, and so ∫_S x dy ∧ dz = ∫_{S+} x dy ∧ dz + ∫_{S⁻} x dy ∧ dz = 2 · (2π/3) = 4π/3.
c. Using the result of Exercise 10, g*(x dy ∧ dz) = (2u/(u² + v² + 1)) · (−8u/(u² + v² + 1)³) du ∧ dv = −16u²/(u² + v² + 1)⁴ du ∧ dv. Recalling that the parametrization is orientation-reversing, we have ∫_S x dy ∧ dz = ∫_{ℝ²} 16u²/(u² + v² + 1)⁴ du ∧ dv = ∫_0^{2π}∫_0^∞ 16r³cos²θ/(r² + 1)⁴ dr dθ = π ∫_1^∞ 8(u − 1)/u⁴ du = 4π/3.
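The radial factor of the last integral can be checked by quadrature (truncating at a large cutoff R; the integrand decays like r⁻⁵, so the tail is negligible).

```python
import math

# Check that the radial integral ∫_0^∞ 16 r^3/(r^2+1)^4 dr = 4/3; together
# with ∫_0^{2π} cos^2 θ dθ = π this gives the flux 4π/3.
R, n = 100.0, 200_000
h = R / n
integral = sum(16 * ((i + 0.5) * h)**3 / (((i + 0.5) * h)**2 + 1)**4 * h
               for i in range(n))
print(abs(integral - 4 / 3) < 1e-5)  # True
```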
238 8. DIFFERENTIAL FORMS AND INTEGRATION ON MANIFOLDS
zσ = xzdy ∧ dz + yzdz ∧ dx + z 2 dx ∧ dy
= zdz ∧ (−xdy + ydx) + (1 − x2 − y 2 )dx ∧ dy
= −(xdx + ydy) ∧ (−xdy + ydx) + (1 − x2 − y 2 )dx ∧ dy
= (x2 + y 2 )dx ∧ dy + (1 − x2 − y 2 )dx ∧ dy = dx ∧ dy.
Therefore, ∫_S zσ = ∫_S dx ∧ dy = π, inasmuch as the projection of S onto the xy-plane is the unit disk.
Alternatively, zσ = (F · n)σ for F = (0, 0, 1). But then we know that the corresponding 2-form
η = F1 dy ∧ dz + F2 dz ∧ dx + F3 dx ∧ dy = dx ∧ dy.
8.4.13 We parametrize by cylindrical coordinates: g(θ, z) = (a cos θ, a sin θ, z). This is an orientation-preserving parametrization.
a. Note that g*ω = z(−a sin θ dθ) ∧ (a cos θ dθ) = 0. Therefore, ∫_S ω = 0. Geometrically, the projection of any tangent plane of S onto the xy-plane has area 0.
b. Now g*ω = (a sin θ)(−a sin θ dθ) ∧ dz = −a² sin²θ dθ ∧ dz. Therefore, ∫_S ω = ∫_0^{2π}∫_0^h (−a² sin²θ) dz dθ = −πa²h.
8.4.16 a. Since F · n = (x3 + y 3 + z 3 )/a, it is clear from symmetry considerations that the total
flux should be 0. Writing out the integral explicitly, we have
∫_0^{2π}∫_0^π a⁴ sin φ [sin³φ(cos³θ + sin³θ) + cos³φ] dφ dθ = 0,
since ∫_0^{2π} cos³θ dθ = ∫_0^{2π} sin³θ dθ = 0 and ∫_0^π cos³φ sin φ dφ = 0.
b. By the same considerations as in part a, the flux is given by
∫_0^{2π}∫_0^{π/2} a⁴ sin φ [sin³φ(cos³θ + sin³θ) + cos³φ] dφ dθ = 2πa⁴ ∫_0^{π/2} cos³φ sin φ dφ = πa⁴/2.
c. Parametrizing the cone by g(θ, r) = (r cos θ, r sin θ, r), 0 ≤ θ ≤ 2π, 0 ≤ r ≤ 1, we see that
this gives the correct orientation. Then the flux of F outwards across S is given by integrating the
2-form ω = x2 dy ∧ dz + y 2 dz ∧ dx + z 2 dx ∧ dy, whose pullback is g∗ ω = r 3 (cos3 θ + sin3 θ − 1)dθ ∧ dr.
Then ∫_S ω = ∫_0^1∫_0^{2π} r³(cos³θ + sin³θ − 1) dθ dr = −π/2.
d. The flux of F outwards across S is given by (1/a)∫_S (x³ + y³) dS = a³ ∫_0^h∫_0^{2π} (cos³θ + sin³θ) dθ dz = 0.
e. To the answer of d, we add the flux of F outwards across the bottom disk (0, since
F · n = F · (−e3 ) = −z 2 = 0) and across the top disk (πa2 h2 , since F · n = F · e3 = z 2 = h2 here).
8.4.17 We parametrize the paraboloid by g(r, θ) = (r cos θ, r sin θ, 4 − r²), 0 ≤ r ≤ 2, 0 ≤ θ ≤ 2π. The flux
of F is given by integrating the 2-form ω = xzdy ∧ dz + yzdz ∧ dx + (x2 + y 2 )dx ∧ dy over S. So
∫_S ω = ∫_{[0,2]×[0,2π]} g*ω = ∫_{[0,2]×[0,2π]} [2r³(4 − r²) + r³] dr ∧ dθ = ∫_0^{2π}∫_0^2 r³(9 − 2r²) dr dθ = π ∫_0^4 u(9 − 2u) du = 88π/3.
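A quick quadrature check of the radial integral above (purely illustrative, not part of the original solution):

```python
import math

# Check ∫_0^2 r^3 (9 - 2 r^2) dr = 44/3, so the flux is 2π · 44/3 = 88π/3.
n = 20_000
h = 2.0 / n
radial = sum(((i + 0.5) * h)**3 * (9 - 2 * ((i + 0.5) * h)**2) * h
             for i in range(n))
print(abs(2 * math.pi * radial - 88 * math.pi / 3) < 1e-6)  # True
```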
8.4.18 a. We have F · n = 1/a², so the flux is (1/a²)(4πa²) = 4π.
b. We have F · n = a/(a² + z²)^{3/2}, so the flux is
∫_0^{2π}∫_{−h}^h a²/(a² + z²)^{3/2} dz dθ = 2πa² ∫_{−h}^h 1/(a² + z²)^{3/2} dz = 4π ∫_0^{arctan(h/a)} cos u du = 4π h/√(a² + h²).
Thus, the total flux across the surface of the cube is 4π.
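The closed-form value in part b can be verified numerically; a = 2 and the half-height (called hh below to avoid a name clash with the step size) are arbitrary sample values.

```python
import math

# Check that 2π a^2 ∫_{-h}^{h} dz/(a^2+z^2)^{3/2} = 4π h / sqrt(a^2+h^2).
a, hh = 2.0, 5.0                  # arbitrary sample radius and half-height
n = 50_000
step = 2 * hh / n
flux = 2 * math.pi * a**2 * sum(
    step / (a**2 + (-hh + (i + 0.5) * step)**2) ** 1.5 for i in range(n))
print(abs(flux - 4 * math.pi * hh / math.sqrt(a**2 + hh**2)) < 1e-7)  # True
```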
8.4.19 Note that S lies over the disk D of radius a/2 centered at (a/2, 0) in the xy-plane.
a. This is easy, if we keep track of orientation: ∫_S dx ∧ dy = −∫_D dx ∧ dy = −area(D) = −πa²/4.
b. We can interpret ∫_S ω as the flux across S of the vector field F = (x/z, y/z, −1). Since the unit normal to S is n = (1/(√2 z))(x, y, −z), we have F · n = √2, and ∫_S F · n dS = √2 area(S) = 2 area(D) = πa²/2, inasmuch as the tangent plane along S makes an angle of π/4 with the xy-plane. (Of course, we could parametrize and pull back, getting 2r dθ ∧ dr.)
8.4.20 If N = (N₁, N₂, N₃), then we have g*(dy ∧ dz) = N₁ du ∧ dv, g*(dz ∧ dx) = N₂ du ∧ dv, and g*(dx ∧ dy) = N₃ du ∧ dv. Therefore, g*σ = ((N₁² + N₂² + N₃²)/‖N‖) du ∧ dv = ‖N‖ du ∧ dv. Since, by the discussion of Section 4.2, the area of the parallelogram spanned by ∂g/∂u and ∂g/∂v is √(EG − F²), we infer that ‖N‖ = √(EG − F²), as required.
b. We have ∫_{C₁} xz dy = ∫_{C₂} xz dy = ∫_0^{2π} (−cos²θ) dθ = −π.
b. ∫_C x dy + z² dx = ∫_0^{2π} a² cos²θ dθ = πa².
The answers are equal. Parametrizing the hemisphere by a disk, we realize that Green’s Theo-
rem predicts that the two integrals should be equal. Note that d(xdy + z 2 dx) = dx ∧ dy + 2zdz ∧ dx.
8.4.25 a. If we cut the Möbius strip down the middle, we get a single band with two half-twists
(which is orientable). When we cut it again, we get two linked bands, each with two half-twists.
b. We end up with two linked bands, one a Möbius strip, the other orientable (with two
half-twists).
8.4.26 This is false. A union of k disjoint orientable surfaces has 2k possible orientations.
8.5.1 Since the outward-pointing normal to ∂ℝᵏ₊ is −eₖ, we must decide whether {−eₖ, e₁, ..., e_{k−1}} (where {e₁, ..., e_{k−1}} is the standard positive basis for ℝ^{k−1}) is a positively-oriented basis for ℝᵏ. We need k − 1 exchanges and one change of sign to obtain {e₁, ..., eₖ}. This is k sign changes in all, and hence the standard positive basis for ℝ^{k−1} gives the correct orientation precisely when (−1)ᵏ = +1.
8.5.2 For the direct calculation, we parametrize C by g(t) = (cos t, sin t, 2 cos t + 3 sin t − 1), 0 ≤ t ≤ 2π.
Then
∫_C y dx − 2z dy + x dz = ∫_0^{2π} [−sin²t − 2(2 cos t + 3 sin t − 1)(cos t) + cos t(−2 sin t + 3 cos t)] dt = ∫_0^{2π} (−sin²t − cos²t − 8 sin t cos t + 2 cos t) dt = −2π.
To apply Stokes's Theorem, let S be the ellipse bounded by C. We have ∫_C y dx − 2z dy + x dz = ∫_S (2 dy ∧ dz − dz ∧ dx − dx ∧ dy), which we can interpret as the flux of the vector field F = (2, −1, −1) outwards across S. Since n = (1/√14)(−2, −3, 1) (watch out for orientation issues here!), we have F · n = −2/√14. On the other hand, the area of S is √14 π, so the flux of F is −2π. (Alternatively, pulling back the 2-form 2 dy ∧ dz − dz ∧ dx − dx ∧ dy to the unit disk, we get (−4 + 3 − 1) dx ∧ dy, whose integral is −2π.)
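The direct calculation is easy to reproduce numerically from the parametrization of C:

```python
import math

# Numerical check that ∫_C y dx - 2z dy + x dz = -2π for
# g(t) = (cos t, sin t, 2 cos t + 3 sin t - 1), 0 ≤ t ≤ 2π.
n = 10_000
h = 2 * math.pi / n
total = 0.0
for i in range(n):
    t = (i + 0.5) * h
    x, y, z = math.cos(t), math.sin(t), 2 * math.cos(t) + 3 * math.sin(t) - 1
    dx, dy, dz = -math.sin(t), math.cos(t), -2 * math.sin(t) + 3 * math.cos(t)
    total += (y * dx - 2 * z * dy + x * dz) * h
print(abs(total + 2 * math.pi) < 1e-9)  # True
```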
8.5.3 Let S be the ellipse bounded by C, and let D be the disk of radius a in the xy-plane
centered at the origin. Note that because C is oriented clockwise as viewed from high above the
xy-plane, we must endow S and D with the opposite of their usual orientations. By Stokes’s
Theorem,
∫_C (y − z) dx + (z − x) dy + (x − y) dz = ∫_S −2(dy ∧ dz + dz ∧ dx + dx ∧ dy) = 2 ∫_D (b/a + 1) dx ∧ dy = 2πa(a + b).
8.5.4 Let S be the disk in the plane z = 1 bounded by its intersection with the sphere, oriented
with its outward-pointing normal upwards. Then
∫_C (−y³ + z) dx + (x³ + 2y) dy + (y − x) dz = ∫_S [3(x² + y²) dx ∧ dy + 2 dz ∧ dx + dy ∧ dz] = ∫_0^{2π}∫_0^1 3r³ dr dθ = 3π/2.
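The same value falls out of a direct numerical line integral over C (the circle x² + y² = 1 at height z = 1, traversed counterclockwise):

```python
import math

# Check ∫_C (-y^3+z)dx + (x^3+2y)dy + (y-x)dz = 3π/2 on the circle
# x = cos t, y = sin t, z = 1.
n = 10_000
h = 2 * math.pi / n
total = 0.0
for i in range(n):
    t = (i + 0.5) * h
    x, y = math.cos(t), math.sin(t)
    dx, dy = -math.sin(t), math.cos(t)          # dz = 0 on the flat circle
    total += ((-y**3 + 1.0) * dx + (x**3 + 2 * y) * dy) * h
print(abs(total - 1.5 * math.pi) < 1e-9)  # True
```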
8.5. STOKES’S THEOREM 243
8.5.5 Proceeding as in Example 1, let S be the disk bounded by C in the plane x + y + z = 0. Then ∫_C 2z dx + 3x dy − dz = ∫_S (2 dz ∧ dx + 3 dx ∧ dy), which we can interpret as the flux of F = (0, 2, 3) outwards across S. Since F · n = 5/√3 and the area of S is πa², the flux is 5πa²/√3.
8.5.8 Let D denote the unit disk in the xy-plane, oriented upwards. Calculating directly, we
have
∫_M F · n dS = ∫_M x²z dy ∧ dz + y²z dz ∧ dx + (x² + y²) dx ∧ dy
= ∫_D [(1 − x² − y²)(x²)(2x) + (1 − x² − y²)(y²)(2y) + (x² + y²)] dx ∧ dy
= ∫_D (x² + y²) dx ∧ dy (by symmetry)
= ∫_0^{2π}∫_0^1 r³ dr dθ = π/2.
In order to apply Stokes's Theorem, we observe that M ∪ D⁻ is the boundary of the 3-manifold with boundary Ω = {0 ≤ z ≤ 1 − x² − y²}. Then, letting ω = x²z dy ∧ dz + y²z dz ∧ dx + (x² + y²) dx ∧ dy, we have ∫_M F · n dS = ∫_M ω and ∫_M ω + ∫_{D⁻} ω = ∫_Ω dω = ∫_Ω 2(x + y)z dx ∧ dy ∧ dz = 0 (by symmetry). Therefore, ∫_M ω = −∫_{D⁻} ω = ∫_D ω = ∫_D (x² + y²) dx ∧ dy = π/2, just as before.
8.5.9 Since we have no idea what the shape of M is, we must resort to Stokes’s Theorem. There
are two possible approaches. First, as in Example 3, if we attach the disk D = {x2 + y 2 ≤ 4, z = 0},
then M ∪ D − = ∂Ω. Let ω = yzdy ∧ dz + x3 dz ∧ dx + y 2 dx ∧ dy; then dω = 0, and
0 = ∫_Ω dω = ∫_{∂Ω} ω = ∫_M ω + ∫_{D⁻} ω = ∫_M ω − ∫_D ω.
Now, ∫_D ω = ∫_0^{2π}∫_0^2 r³ sin²θ dr dθ = 4π, so ∫_M ω = 4π, as required.
The second approach is to observe that ω = dη, where, for example, η = x³z dx + xy² dy + (1/2)y²z dz. Then
∫_M ω = ∫_M dη = ∫_{∂M} η = ∫_{∂M} xy² dy = ∫_0^{2π} 16 sin²t cos²t dt = 4π.
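The boundary integral in the second approach is a one-liner to check numerically on the circle of radius 2:

```python
import math

# Check ∫_{∂M} x y^2 dy = 4π with x = 2 cos t, y = 2 sin t, dy = 2 cos t dt.
n = 10_000
h = 2 * math.pi / n
total = sum(2 * math.cos(t) * (2 * math.sin(t))**2 * 2 * math.cos(t) * h
            for t in ((i + 0.5) * h for i in range(n)))
print(abs(total - 4 * math.pi) < 1e-9)  # True
```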
8.5.10 By Stokes's Theorem, we have ∫_M dω = ∫_{∂M} ω = ∫_{∂M′} ω = ∫_{M′} dω.
8.5.13 Note that X = ∂M, where M = {x₁² + x₂² ≤ 1, x₃² + x₄² = 1}. One checks that the orientation on ∂M is that prescribed for X. Now, dω = (x₄ dx₂ + x₂ dx₄) ∧ dx₁ ∧ dx₃, so (noting that dx₃ ∧ dx₄ = 0 on M)
∫_X ω = ∫_M dω = ∫_M −x₄ dx₁ ∧ dx₂ ∧ dx₃ = (∫_{B(0,1)} dx₁ ∧ dx₂)(∫_{S¹} −x₄ dx₃) = π².
8.5.14 We have ∫_{∂M} D_n f dS = ∫_{∂M} ∇f · n dS = ∫_{∂M} [(∂f/∂x) dy ∧ dz + (∂f/∂y) dz ∧ dx + (∂f/∂z) dx ∧ dy] = ∫_{∂M} ⋆(df) = ∫_M d⋆(df) = ∫_M ∇²f dV.
8.5.15 a. We parametrize the cylinder by cylindrical coordinates, as usual. Note that the upper intersection of the cylinder and the sphere is given by z = a√(2(1 + sin θ)). Thus, we have
∫_S z dS = ∫_0^{2π}∫_0^{a√(2(1+sin θ))} az dz dθ = ∫_0^{2π} a³(1 + sin θ) dθ = 2πa³.
(Note that we have oriented S with the outward-pointing normal away from the axis of the cylinder.)
b. Let C′ be the circle of radius a in the xy-plane centered at the origin, oriented counterclockwise. Then ∂S = C ∪ C′. Now, let ω = y(z² − 1) dx + x(1 − z²) dy + z² dz. Then we have ∫_C ω + ∫_{C′} ω = ∫_S dω = ∫_S 2[xz dy ∧ dz + yz dz ∧ dx + (1 − z²) dx ∧ dy] = 2a ∫_S z dS = 4πa⁴. Since ∫_{C′} ω = ∫_{C′} −y dx + x dy = 2πa², we infer that ∫_C ω = 2πa²(2a² − 1).
8.5.16 a. We parametrize the cylinder by cylindrical coordinates, as usual. Note that the upper intersection of the cylinder and the sphere is given by z = a√(2(1 + cos θ)) = 2a|cos(θ/2)|. Thus, we have
∫_S z² dS = ∫_0^{2π}∫_0^{2a|cos(θ/2)|} az² dz dθ = (8/3)a⁴ ∫_0^{2π} |cos(θ/2)|³ dθ = (64/9)a⁴.
(Note that we have oriented S with the outward-pointing normal away from the axis of the cylinder.)
b. Let C′ be the circle of radius a in the xy-plane centered at the origin, oriented counterclockwise. Then ∂S = C ∪ C′. Now, let ω = y(z³ + 1) dx − x(z³ + 1) dy + z dz. Then we have ∫_C ω + ∫_{C′} ω = ∫_S dω = ∫_S [3xz² dy ∧ dz + 3yz² dz ∧ dx − 2(z³ + 1) dx ∧ dy] = 3a ∫_S z² dS = (64/3)a⁵. Since ∫_{C′} ω = ∫_{C′} y dx − x dy = −2πa², we infer that ∫_C ω = 2πa² + (64/3)a⁵.
8.5.17 Let T: ℝ² × ℝ² → ℝ² × ℝ² be given by T(u, v) = ((u − v)/2, (u + v)/2) = (x, y). Then T maps X = S¹ × S¹ = {(u, v): ‖u‖ = ‖v‖ = 1} one-to-one onto M. Moreover, if we set Y = {‖u‖ ≤ 1, ‖v‖ = 1}, then X = ∂Y. Letting ω = (y₁² − x₁²) dx₂ ∧ dy₂, we have T*ω = (1/2)u₁v₁ du₂ ∧ dv₂. Note that (du₂/u₁) ∧ (dv₂/v₁) > 0 with the usual orientation on X, so ∫_M ω = ∫_X T*ω = ∫_Y d(T*ω) = (1/2)∫_Y v₁ du₁ ∧ du₂ ∧ dv₂ = π²/2.
8.5.18 (See the solution of Exercise 8.3.3d.) Note that C = ∂M, where M is the disk bounded by C in the plane x + y + z = 0. Now, an orthonormal basis for that plane is given by v₁ = (1/√2)(1, −1, 0) and v₂ = (1/√6)(1, 1, −2). We thus are able to parametrize M by g(r, θ) = (r cos θ)v₁ + (r sin θ)v₂, 0 ≤ r ≤ 1, 0 ≤ θ ≤ 2π. (Note that v₁ × v₂ points upwards, so g is orientation-preserving.) Then we have g*d(z³ dx) = g*(3z² dz ∧ dx) = (2r² sin²θ)(r/√3) dr ∧ dθ, so ∫_M ω = (2/√3)∫_0^{2π}∫_0^1 r³ sin²θ dr dθ = π/(2√3).
Remark: The area 2-form σ of the plane x + y + z = 0 is given by σ = (1/√3)(dy ∧ dz + dz ∧ dx + dx ∧ dy). By symmetry (e.g., the projections of a region into all three coordinate planes have the same area), on this plane we have dy ∧ dz = dz ∧ dx = dx ∧ dy, so dz ∧ dx = (1/√3)σ = (r/√3) dr ∧ dθ.
8.5.19 (See the solution of Exercise 18.) Let ω = xy² dx + yz² dy + zx² dz. Then ∫_C ω = ∫_M dω. Now, dω = −2(yz dy ∧ dz + xz dz ∧ dx + xy dx ∧ dy) and, by symmetry, ∫_M yz dy ∧ dz = ∫_M xz dz ∧ dx = ∫_M xy dx ∧ dy. Therefore,
∫_C ω = ∫_M dω = −6 ∫_M xz dz ∧ dx
= −6 ∫_0^{2π}∫_0^1 r[(1/√2) cos θ + (1/√6) sin θ][−(2/√6) r sin θ](r/√3) dr dθ
= (2/√3) ∫_0^{2π}∫_0^1 r³ sin²θ dr dθ = π/(2√3).
8.5.20 Let ω ∈ A^{k−2}(ℝᵏ), and write d(dω) = f(x) dx₁ ∧ ··· ∧ dxₖ. Suppose f(a) > 0. By continuity, there is a ball B centered at a on which f > 0. Then, on one hand, by Exercise 7.1.5, ∫_B f dV > 0. On the other hand, by Corollary 5.3, ∫_B f dV = ∫_B d(dω) = ∫_{∂B} dω = 0. From this contradiction we infer that f = 0 everywhere.
g*σ = −4/(1 + u² + v²)² du ∧ dv = dη, where η = 2(v du − u dv)/(1 + u² + v²).
c. If M and N are two oriented surfaces with the same boundary curve, then M ∪ N⁻ = ∂W. Then we have ∫_M σ − ∫_N σ = ∫_W dσ = 0, so area(M) = ∫_M σ = ∫_N σ < area(N), since equality cannot hold with these hypotheses.
8.5.24 Assume M is connected (if not, work with a single connected piece of M ). Let σ be
the volume form of M (see Exercise 23), and write dω = f σ. If f were never 0, then, by the
Intermediate Value Theorem, it would always have to have the same sign. If f > 0 everywhere, then ∫_M fσ > 0; yet, by Corollary 5.3, ∫_M fσ = ∫_M dω = 0.
8.5.25 a. We have f(x) = e^{−1/x}, so we can take p₀(x) = 1. Since f′(x) = (1/x²)e^{−1/x}, we can take p₁(x) = x². Now, proceeding by induction, suppose f^{(k)}(x) = e^{−1/x}pₖ(1/x) for some polynomial pₖ of degree 2k. Then we have f^{(k+1)}(x) = e^{−1/x}(1/x²)[pₖ(1/x) − pₖ′(1/x)]. So, if we set p_{k+1}(x) = x²[pₖ(x) − pₖ′(x)], then we have f^{(k+1)}(x) = e^{−1/x}p_{k+1}(1/x), as required. Note, moreover, that if pₖ has degree 2k, then p_{k+1} clearly has degree 2(k + 1).
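The recursion gives p₂(x) = x²(p₁(x) − p₁′(x)) = x⁴ − 2x³, and a finite-difference check confirms it at the (arbitrarily chosen) point x = 1:

```python
import math

# Check f''(x) = e^{-1/x} p_2(1/x) with p_2(y) = y^4 - 2 y^3, where
# p_2 comes from the recursion p_{k+1}(y) = y^2 (p_k(y) - p_k'(y)).
f = lambda x: math.exp(-1.0 / x)
x, h = 1.0, 1e-4
second = (f(x + h) - 2 * f(x) + f(x - h)) / h**2   # central difference
exact = math.exp(-1.0 / x) * ((1 / x)**4 - 2 * (1 / x)**3)
print(abs(second - exact) < 1e-6)  # True
```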
8.6. APPLICATIONS TO PHYSICS 249
b. Obviously h(0) = 0. Suppose that h^{(k)}(0) = 0 for some k ≥ 0. Now, clearly the left-hand derivative h₋^{(k+1)}(0) = 0, since h(x) = 0 for all x ≤ 0. But we also have the right-hand derivative h₊^{(k+1)}(0) = lim_{x→0⁺} h^{(k)}(x)/x = lim_{x→0⁺} e^{−1/x}pₖ(1/x)/x = lim_{u→∞} u pₖ(u)e^{−u} = 0, since lim_{u→∞} P(u)e^{−u} = 0 for any polynomial P. We conclude that h^{(k+1)}(0) = 0, as desired. Therefore, h^{(k)}(0) = 0 for all k ≥ 0.
8.5.26 As suggested in the hint, the collection of balls B(q, 1/k) with rational center q ∈ Qn
and radius 1/k, k ∈ N, is a countable collection. (Q is countable, and so Qn is as well. A countable
union of countable sets is countable.) Now, we claim that given any x ∈ Rn contained in some
open set V, there is a ball B(q, 1/k) with x ∈ B(q, 1/k) ⊂ V. (Proof: Since V is open, there is r > 0 so that B(x, r) ⊂ V. Choose k ∈ ℕ so that 1/k < r/2. Then for any q ∈ ℚⁿ with ‖x − q‖ < 1/k, we note that, by the triangle inequality, x ∈ B(q, 1/k) ⊂ B(x, r) ⊂ V.)
For each x ∈ X, we know that x ∈ Vα for some α; choose one of our countable collection of balls
B(q, 1/k) containing x and contained in Vα . We end up with a countable collection Bi = B(qi , 1/ki )
covering X so that each Bi ⊂ Vαi for some αi . Since all these Bi ’s cover X, the corresponding sets
Vαi must as well.
8.6.2 By symmetry, the force F is radial and of uniform strength on spheres centered at the
center of the ball. Say the radius of the ball is R. Thus, the flux of F outwards across a sphere of
radius b > R is −kFk(4πb2 ); on the other hand, by Gauss’s law, that flux is −4πGM . Therefore
kFk = GM/b2 . So F is a radial inverse square force outside the ball, as we wished to show.
8.6.3 a. We have ∫_{∂Ω} D_n g dS = ∫_{∂Ω} ∇g · n dS = ∫_{∂Ω} ⋆dg = ∫_Ω d⋆dg = ∫_Ω ∇²g dV. Then ∫_{∂Ω} f D_n g dS = ∫_{∂Ω} f ⋆dg = ∫_Ω d(f ⋆dg) = ∫_Ω (df ∧ ⋆dg + f d⋆dg) = ∫_Ω (∇f · ∇g + f ∇²g) dV. Using the second result twice, we have ∫_{∂Ω} (f D_n g − g D_n f) dS = ∫_Ω [(f ∇²g + ∇f · ∇g) − (g ∇²f + ∇g · ∇f)] dV = ∫_Ω (f ∇²g − g ∇²f) dV.
b. We merely apply the results of part a. For the first two equalities, substitute g = f
to get the results. The last result is immediate from the last equation in part a.
8.6.4 Since ∇2 is linear (e.g., because d and ⋆ are), if f and g are harmonic, so then is
h = f − g. So we start with a function h that is harmonic on Ω and has value 0 on ∂Ω. By the second equation in Exercise 3b, we have 0 = ∫_{∂Ω} h D_n h dS = ∫_Ω ‖∇h‖² dV. Since ‖∇h‖²
is continuous and nonnegative, we conclude from Exercise 7.1.5 that ∇h = 0 everywhere on Ω.
Therefore, h is constant on every connected piece of Ω. Since h = 0 on ∂Ω, it follows that h = 0
on Ω, and so f = g on Ω.
so
(1/r²)∫_{‖x‖=r} f dS − (1/ε²)∫_{‖x‖=ε} f dS = (1/r)∫_{‖x‖=r} D_n f dS − (1/ε)∫_{‖x‖=ε} D_n f dS = 0,
so, by continuity of f at 0,
(1/(4πr²))∫_{‖x‖=r} f dS = lim_{ε→0⁺} (1/(4πε²))∫_{‖x‖=ε} f dS = f(0).
(by the discussion of Section 6.2) since no point y ∈ D lies inside S. (Here Fy denotes the force
field due to a unit point mass at y.)
b. The argument is rather similar when all of D lies inside S. Fubini’s Theorem still
applies, and we have
∫_S F · n dS = ∫_D (∫_S F_y · n dS_x) δ(y) dV_y = ∫_D (−4πG)δ(y) dV_y = −4πG ∫_D δ dV,
as required.
8.6.7 c., d., e., g., h. div = 0; a., f., g., h. curl = 0
8.6.8 a. The vector field F = (−y, x) will do in ℝ². Its flow lines are concentric circles about
the origin.
b. Suppose C were a closed flow line of F, parametrized by g: [a, b] → ℝⁿ. Then, since F is conservative, by Theorem 3.2, we have ∮_C F · T ds = 0. Yet ∫_C F · T ds = ∫_a^b F(g(t)) · g′(t) dt = ∫_a^b ‖g′(t)‖² dt, so we must have g′(t) = 0 for all t. That is, C is merely a point.
c. By Exercise 8.3.18, if C = ∂S, then ∫_C F · n ds = ∫_S div F dA, but since F is everywhere tangent to C, the flux of F across C is 0. Thus, ∫_S div F dA = 0, and so it follows from Exercise 7.1.5 that div F(x) = 0 for some x ∈ S.
8.6.9 a. For i = 1, 2, 3, we have ∫_{∂Ω} f nᵢ dS = ∫_{∂Ω} (f eᵢ) · n dS = ∫_Ω div(f eᵢ) dV = ∫_Ω ∂f/∂xᵢ dV. Therefore, ∫_{∂Ω} f n dS = ∫_Ω ∇f dV.
b. Applying the result of part a with f = 1 gives the result. Intuitively, the average
value of n must be 0 on a closed surface because, in order for the surface to close up, the normal
must spend equal amounts of area pointing in opposite directions.
8.6.10 a. This is immediate if we think of approximating the integral by a sum over (almost
planar) pieces of surface area.
b. Using the result of part a, we have B = −∫_{∂Ω} p n dS = −∫_Ω ∇p dV = −∫_Ω δg dV = −g ∫_Ω δ dV = −Mg.
c. Because the object is in equilibrium, the buoyancy force (upwards) exactly balances
the weight of the object (downwards), and so the floating body must displace precisely that amount
of liquid that will have its own weight.
8.6.11 Using Exercise 7.2.20, we have (d/dt)∫_Ω δ dV = ∫_Ω (∂δ/∂t) dV. On the other hand, by Theorem 6.2, ∫_{∂Ω} F · n dS = ∫_Ω div F dV. The law of conservation of mass can therefore be rewritten as
(∗) ∫_Ω (∂δ/∂t + div F) dV = 0.
Now comes a standard and important argument: Suppose the continuous function ∂δ/∂t + div F were nonzero (say, positive) at some point a; then, by Exercise 7.1.5, its integral over a small ball
were nonzero (say, positive) at some point a; then, by Exercise 7.1.5, its integral over a small ball
centered at a would be positive, contradicting equation (∗).
Remark: It is in this way that we go back from integral laws to their “differential” versions, as
in Section 6.3.
8.6.12 a. The flux of q inwards across ∂Ω is −∫_{∂Ω} q · n dS = ∫_{∂Ω} K∇u · n dS = ∫_Ω K div(∇u) dV = ∫_Ω K∇²u dV. (We use Theorem 6.2 at the penultimate step.)
b. This is immediate from the definition of the integral.
c. Applying the same reasoning as in Exercise 11c, we have ∫_Ω K∇²u dV = ∫_Ω c(∂u/∂t) dV for all regions Ω, and therefore we must have K∇²u = c(∂u/∂t) (since the functions involved are continuous).
8.6.13 a. E(0) = 0 since we are given u(x, 0) = 0 for all x. By Exercise 7.2.20 and the second Green's formula in Exercise 3, we have
E′(t) = ∫_Ω u(∂u/∂t) dV = ∫_Ω u∇²u dV = ∫_{∂Ω} u D_n u dS − ∫_Ω ‖∇u‖² dV.
By the hypothesis that the boundary is insulated, we therefore have E′(t) = −∫_Ω ‖∇u‖² dV ≤ 0. Since E(0) = 0, we have E(t) = ∫_0^t E′(s) ds ≤ 0. But, clearly, E(t) ≥ 0, since E is the integral of a nonnegative function. We infer that E(t) = 0 for all t ≥ 0.
b. By Exercise 7.1.5, since u² ≥ 0 and is continuous, the only way its integral can be 0 is to have u(x, t) = 0 for all x ∈ Ω and t ≥ 0.
c. By linearity of the derivative, if u₁ and u₂ are solutions of the heat equation, then
u = u1 − u2 is as well. If u1 = u2 at t = 0 and along ∂Ω, then we know that u satisfies the
hypotheses given originally. It follows from part b that u = 0, and hence that u1 = u2 , for all x ∈ Ω
and t ≥ 0.
= ∫_Ω [(∂u/∂t)∇²u + Σᵢ₌₁³ (∂u/∂xᵢ)(∂²u/∂xᵢ∂t)] dV = ∫_Ω div((∂u/∂t)∇u) dV = ∫_{∂Ω} (∂u/∂t)∇u · n dS = 0,
inasmuch as ∂u/∂t(x, t) = 0 for all x ∈ ∂Ω. Therefore, E is constant, as desired.
Note that, by assumption, A ≠ 0. We know from the quadratic formula that the solutions t will be smooth functions provided B² − 4AC = 4[((x − f(x)) · f(x))² + ‖x − f(x)‖²(1 − ‖f(x)‖²)] > 0. Being the sum of a nonnegative number and a positive number, this expression is in fact clearly positive. Therefore, the positive root is a smooth function of x, as is r(x).
8.7.2 Yes, any two maps f, g: [0, 2π] → ℝ² are homotopic: We merely take H(x, t) = tf(x) + (1 − t)g(x), the so-called straight-line homotopy.
8.7.3 The proof consists basically in copying Example 2. Let H: X × [0, 1] → Y be the homotopy between f and g. We have ∫_{∂(X×[0,1])} H*ω = ∫_{X×[0,1]} d(H*ω) = ∫_{X×[0,1]} H*(dω) = 0. But ∂(X × [0, 1]) = (−1)^{dim X−1}[(X × {1}) ∪ (X × {0})⁻], so we infer that ∫_X g*ω = ∫_{X×{1}} H*ω = ∫_{X×{0}} H*ω = ∫_X f*ω, as required.
8.7.4 Note that on |z| = 2, we have |z⁴| = 16 and |−3z + 9| ≤ |3z| + 9 = 15. Let g(z) = z⁴. It follows as in the proof of Theorem 7.6 that on X = {|z| = 2}, the maps f/|f| and g/|g| are homotopic maps X → S¹. In particular, we define
H(z, t) = (z⁴ + t(−3z + 9))/|z⁴ + t(−3z + 9)|,
observe that it is smooth and that H(z, 0) = g(z)/|g(z)| and H(z, 1) = f(z)/|f(z)|. Therefore ∫ f*ω =
8.7.6 Say M is an n-dimensional manifold. Let ω be an (n − 1)-form on ∂M with the property that ∫_{∂M} ω ≠ 0. (One can either use the volume form on ∂M or else use a form defined in a single coordinate chart, bumped off using a partition of unity to give a globally-defined form on ∂M.) Suppose there were a retraction f. Then we would have
0 ≠ ∫_{∂M} ω = ∫_{∂M} f*ω = ∫_M d(f*ω) = ∫_M f*(dω) = 0,
since the only n-form on an (n − 1)-dimensional manifold is 0. This contradiction completes the proof.
8.7.7 We have ∫_{C₃} ω = 10, ∫_{C₄} ω = 0, ∫_{C₅} ω = −3. We can gradually deform C₃ so that it is the union of a curve homotopic to C₁ and one homotopic to C₂⁻, the homotopy occurring in ℝ³ − Z. Since ω is closed, it follows from Proposition 7.4 that the integral does not change under such a homotopy. Therefore, ∫_{C₃} ω = ∫_{C₁} ω − ∫_{C₂} ω = 10. C₄ is actually the boundary of a punctured torus M that runs along the "lollipop stick" and around the "lollipop", completely missing the vertical axis. Therefore, ∫_{C₄} ω = ∫_M dω = 0. (Alternatively, one can slide C₄ down the lollipop stick and pull it across the lollipop, and then pinch to obtain two curves, one homotopic to C₂, the other homotopic to C₂⁻.) Last, C₅ can be deformed into the union of a curve homotopic to C₄⁻ and one homotopic to C₁⁻, so ∫_{C₅} ω = −∫_{C₄} ω − ∫_{C₁} ω = −3.
8.7.8 a. We have
b. Since (fg)* dz = f dg + g df, we have (fg)*(dz/z) = (f dg + g df)/(fg) = df/f + dg/g = f*(dz/z) + g*(dz/z). Taking the imaginary part of this equation, we obtain (fg)*ω = f*ω + g*ω, as desired.
8.7.9 a. Since we are given |g − f| < |f| on C, we can give a homotopy between f and g as maps from C to ℂ − {0}: Let H(z, t) = (1 − t)f(z) + tg(z). Since (1 − t)f + tg = f + t(g − f), it follows that |(1 − t)f + tg| ≥ |f| − |g − f| > 0 for all t, so H is well-defined and smooth. Now, since ω is closed on ℂ − {0}, by Proposition 7.4 we have ∫_C f*ω = ∫_C g*ω, as required.
b. By the Maximum Value Theorem, Theorem 1.2 of Chapter 5, there is a number m > 0 so that |p| ≥ m on ∂D. Let R = max_{z∈∂D} |z|. Choose 0 < δ < m/(1 + R + R² + ··· + R^{n−1}). Then, whenever |aⱼ − bⱼ| < δ, j = 0, 1, ..., n − 1, we have |P(z) − p(z)| ≤ |b₀ − a₀| + |b₁ − a₁||z| + ··· + |b_{n−1} − a_{n−1}||z|^{n−1} < δ(1 + R + ··· + R^{n−1}) < m whenever z ∈ ∂D. It follows that on ∂D we have
|P − p| < |p|, and so, by part a and Proposition 7.8, p and P have the same number of roots in D.
8.7. APPLICATIONS TO TOPOLOGY 255
c. Here we use the Fundamental Theorem of Algebra, Theorem 7.6, to get started. Given the polynomial p with roots r₁, ..., rₙ ∈ ℂ (allowing repetitions, of course), choose any ε > 0 and let D = ⋃ⱼ₌₁ⁿ B(rⱼ, ε). Since we've already accounted for all the roots of p, we know that p ≠ 0 on ∂D. It follows from part b that there is δ > 0 so that whenever |bⱼ − aⱼ| < δ, j = 0, 1, ..., n − 1,
the polynomial P will have n (hence, all) roots inside D. That is, “wiggling” the coefficients of the
polynomial less than δ results in all the roots’ remaining within ε of the original roots.
8.7.10 Suppose not. Then for every x ∈ S 2m , f (x) is equal to neither x nor −x, so there is a
unique great circle starting at x and passing through f (x). Taking the unit tangent vector to that
circle at x (pointing towards f (x)) gives us a smooth nowhere-zero vector field v on S 2m . But by
Theorem 7.9 there can be no such.
8.7.11 As the hint suggests, if there is no x ∈ Dⁿ with f(x) = 0, then we can define the map f/‖f‖: Dⁿ → S^{n−1}. Consider tf(x) + (1 − t)x = x + t(f(x) − x), 0 ≤ t ≤ 1. Then, since ‖f(x) − x‖ < 1 for all x ∈ S^{n−1}, it follows from the triangle inequality that ‖x + t(f(x) − x)‖ ≥ ‖x‖ − ‖f(x) − x‖ > 0 for all x ∈ S^{n−1} and all t ∈ [0, 1]. Therefore, we can define H: S^{n−1} × [0, 1] → S^{n−1} by H(x, t) = (tf(x) + (1 − t)x)/‖tf(x) + (1 − t)x‖. H is smooth and gives a homotopy between the identity map on S^{n−1} and f/‖f‖. It follows from Proposition 7.4 that, taking ω to be the volume form of S^{n−1}, we have ∫_{S^{n−1}} (f/‖f‖)*ω ≠ 0. But if f ≠ 0 on Dⁿ, then f/‖f‖ is a smooth function from Dⁿ to S^{n−1}, and, just as in the proof of Theorem 7.2, we have
0 ≠ ∫_{S^{n−1}} (f/‖f‖)*ω = ∫_{Dⁿ} d((f/‖f‖)*ω) = ∫_{Dⁿ} (f/‖f‖)*dω = 0.
d(I(φ)) = Σ_{ℓ=1}^n (∫_0^1 tᵏ (∂f_I/∂x_ℓ)(tx) dt) dx_ℓ ∧ (Σ_{j=1}^k (−1)^{j−1} x_{i_j} dx_{i_1} ∧ ··· ∧ d̂x_{i_j} ∧ ··· ∧ dx_{i_k}) + k(∫_0^1 t^{k−1} f_I(tx) dt) dx_I.
On the other hand, dφ = Σ_{ℓ=1}^n (∂f_I/∂x_ℓ) dx_ℓ ∧ dx_I, so
I(dφ) = Σ_{ℓ=1}^n (∫_0^1 tᵏ (∂f_I/∂x_ℓ)(tx) dt) [x_ℓ dx_I + Σ_{j=1}^k (−1)^j x_{i_j} dx_ℓ ∧ dx_{i_1} ∧ ··· ∧ d̂x_{i_j} ∧ ··· ∧ dx_{i_k}].
Remark: Every simple closed curve in R3 bounds an orientable surface, called its Seifert surface.
See Adams, Colin, The Knot Book , W. H. Freeman & Co., 1994, pp. 95 ff.
8.7.15 If v ∈ ℝ³, define a vector field X_v on S¹ × S² as follows. (Recall that ρ: ℝ² → ℝ² denotes rotation by π/2.) If (p, q) ∈ S¹ × S², set proj_q v = tq, and let X_v(p, q) = (tρ(p), v − tq) ∈ T_{(p,q)}(S¹ × S²). Then the vector fields X_{e_i}, i = 1, 2, 3, are the desired linearly independent vector fields.
8.7.16 a. V is obviously smooth away from the origin. It isn't difficult to check that DV(0) = O and that |∂Vᵢ/∂xⱼ| ≤ C‖x‖ for some constant C, so the partial derivatives are continuous at 0.
b. Since the function f0 is one-to-one and onto, we strongly suspect that for sufficiently
small |t|, the same will be true of ft . Note that Dft (x) = I + tDV(x); since D n+1 is compact, there
is a constant K so that kDV (x)k ≤ K. Therefore, by the Inverse Function Theorem, Theorem 2.1
of Chapter 6, whenever |t| < 1/K, the function ft will have a local inverse at every x. What we want
to establish is that for sufficiently small |t|, the function ft is globally one-to-one. Suppose not. Then
we would have a sequence tk → 0 along with points xk 6= yk ∈ D n+1 with ftk (xk ) = ftk (yk ). Since
D n+1 is compact, by Theorem 1.1 of Chapter 5, we can find convergent subsequences xkj → x0 and
ykj → y0 . Then ftkj (xkj ) → f0 (x0 ) = x0 and ftkj (ykj ) → f0 (y0 ) = y0 , so x0 = y0 . Now, we would
like to assert that having xkj → x0 and ykj → x0 should contradict the fact that the functions ftkj
are locally one-to-one at x0 ; the flaw is that the neighborhood of x0 on which ftkj is one-to-one may
well not include both x_{k_j} and y_{k_j}. Thus, following the hint, consider F: D^{n+1} × ℝ → ℝ^{n+1} × ℝ, F(x, t) = (f_t(x), t). Then
DF(x, t) = [I + tDV(x)  V(x); 0 ··· 0  1].
Since the matrix DF(x₀, 0) is invertible, by the Inverse Function Theorem, F is locally invertible on a neighborhood of (x₀, 0), hence one-to-one on that neighborhood. Therefore, F(x_{k_j}, t_{k_j}) = F(y_{k_j}, t_{k_j}) ⟹ x_{k_j} = y_{k_j} for j sufficiently large, contradicting our hypothesis.
Note that so far we haven’t used the fact that we started with a vector field v on S n . This
means that v(x) · x = 0, and so, for all x ∈ D n+1 , V(x) · x = 0 as well. Therefore, we have
‖f_t(x)‖ = √(1 + t²) for ‖x‖ = 1 and ‖f_t(x)‖ < √(1 + t²)‖x‖ for ‖x‖ < 1. So we want to claim that for small |t|, the function f_t maps D^{n+1} onto the closed ball of radius √(1 + t²) centered at the origin. It follows from the proof of the Inverse Function Theorem that the image of f_t is the intersection of B(0, √(1 + t²)) with an open subset of ℝ^{n+1}. It follows from Exercise 5.1.10 that the image of f_t is compact, hence a closed subset of B(0, √(1 + t²)). Because the disk is connected, it is a fact that
the only nonempty subset that is both open and closed is the whole set. (See also Exercise 2.2.11.)
c. By Theorem 6.4 of Chapter 7, we have vol(B(0, √(1 + t²))) = ∫_{D^{n+1}} |det Df_t| dV, which is going to come out a polynomial function of t.
d. Since we have n = 2m, we know that vol(B(0, √(1 + t²))) = vol(D^{n+1})(1 + t²)^{(2m+1)/2}, which is not a polynomial. From this contradiction, we infer that when n is even there can be no nowhere-vanishing vector field on Sⁿ.
CHAPTER 9
Eigenvalues, Eigenvectors, and Applications
9.1. Linear Transformations and Change of Basis
9.1.1 a. The change-of-basis matrix is P = [2 1; 3 2], whose inverse is P⁻¹ = [2 −1; −3 2]. Thus,
(Note that we can clearly visualize the succession of moves here: To reflect across the line spanned
by v1 , we first rotate angle −θ, so that the desired axis is now horizontal, we reflect across the
horizontal axis, and then we rotate back through angle θ.)
9.1. LINEAR TRANSFORMATIONS AND CHANGE OF BASIS 259
9.1.3 a. A basis for the plane is given by u₁ = (1, 1, 0), u₂ = (1, 0, 1). To get an orthogonal basis (which is actually unnecessary in this problem), we use the Gram-Schmidt process:
v₁ = u₁ = (1, 1, 0),
v₂′ = u₂ − (u₂ · v₁/‖v₁‖²)v₁ = (1, 0, 1) − (1/2)(1, 1, 0) = (1/2, −1/2, 1).
It is probably easier to clear out the fractions and work with v₂ = (1, −1, 2). Finally, we take v₃ to be the normal vector of the plane: v₃ = (−1, 1, 1).
b. Since T(v₁) = v₁, T(v₂) = v₂, and T(v₃) = −v₃, the matrix for T with respect to the basis B′ = {v₁, v₂, v₃} is [T]_{B′} = [1 0 0; 0 1 0; 0 0 −1].
c. Let P = [1 1 −1; 1 −1 1; 0 2 1] be the change-of-basis matrix from the standard basis E to B′. Since the columns of P are orthogonal, PᵀP = [‖v₁‖² 0 0; 0 ‖v₂‖² 0; 0 0 ‖v₃‖²] = [2 0 0; 0 6 0; 0 0 3], and so P⁻¹ = [1/2 0 0; 0 1/6 0; 0 0 1/3]Pᵀ = [1/2 1/2 0; 1/6 −1/6 1/3; −1/3 1/3 1/3]. (See also Exercise 5.5.16.) Thus,
[T] = [T]_E = P[T]_{B′}P⁻¹ = [1 1 −1; 1 −1 1; 0 2 1][1 0 0; 0 1 0; 0 0 −1][1/2 1/2 0; 1/6 −1/6 1/3; −1/3 1/3 1/3] = (1/3)[1 2 2; 2 1 −2; 2 −2 1].
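The matrix product above can be reproduced exactly with rational arithmetic (an illustrative check, not part of the original solution):

```python
from fractions import Fraction as F

# Recompute [T] = P [T]_{B'} P^{-1} exactly with rationals.
P = [[F(1), F(1), F(-1)], [F(1), F(-1), F(1)], [F(0), F(2), F(1)]]
Pinv = [[F(1, 2), F(1, 2), F(0)],
        [F(1, 6), F(-1, 6), F(1, 3)],
        [F(-1, 3), F(1, 3), F(1, 3)]]
D = [[F(1), 0, 0], [0, F(1), 0], [0, 0, F(-1)]]   # [T]_{B'}

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

T = matmul(matmul(P, D), Pinv)
print(T == [[F(1, 3), F(2, 3), F(2, 3)],
            [F(2, 3), F(1, 3), F(-2, 3)],
            [F(2, 3), F(-2, 3), F(1, 3)]])  # True
```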
9.1.4 Let u₁ = (1, 0, 1) and u₂ = (0, 1, −2). Then {u₁, u₂} is a basis for V. Using the Gram-Schmidt process, we get an orthogonal basis {v₁, v₂} for V:
v₁ = u₁ = (1, 0, 1),  v₂ = u₂ − proj_{v₁}u₂ = (0, 1, −2) − (−2/2)(1, 0, 1) = (1, 1, −1).
260 9. EIGENVALUES, EIGENVECTORS, AND APPLICATIONS
We find easily that v₃ = (−1, 2, 1) gives a basis for V⊥. With respect to the basis B′ = {v₁, v₂, v₃} the matrix for T = proj_V is [T]_{B′} = [1 0 0; 0 1 0; 0 0 0], so the standard matrix is given by [T] = P[T]_{B′}P⁻¹, where P = [1 1 −1; 0 1 2; 1 −1 1] and P⁻¹ = [1/2 0 1/2; 1/3 1/3 −1/3; −1/6 1/3 1/6]. Thus,
[T] = [1 1 −1; 0 1 2; 1 −1 1][1 0 0; 0 1 0; 0 0 0][1/2 0 1/2; 1/3 1/3 −1/3; −1/6 1/3 1/6] = (1/6)[5 2 1; 2 2 −2; 1 −2 5].
9.1.5 The plane is spanned by v₁ = (2, 1, 0) and v₂ = (−2, 0, 1), and the vector v₃ = (1, −2, 2) is normal to the plane. The matrix for T with respect to the basis B′ = {v₁, v₂, v₃} is [T]_{B′} = [1 0 0; 0 1 0; 0 0 −1], and so the standard matrix for T is [T] = P[T]_{B′}P⁻¹, where P = [2 −2 1; 1 0 −2; 0 1 2] and P⁻¹ = (1/9)[2 5 4; −2 4 5; 1 −2 2]. Thus,
[T] = [2 −2 1; 1 0 −2; 0 1 2][1 0 0; 0 1 0; 0 0 −1] · (1/9)[2 5 4; −2 4 5; 1 −2 2] = (1/9)[7 4 −4; 4 1 8; −4 8 1].
9.1.6 Rotation through an angle of $\pi/2$ about the $x_3$-axis is given by the matrix $B = \begin{pmatrix}0&-1&0\\1&0&0\\0&0&1\end{pmatrix}$; rotation through an angle of $\pi/2$ about the $x_1$-axis is given by the matrix $A = \begin{pmatrix}1&0&0\\0&0&-1\\0&1&0\end{pmatrix}$. Thus, the standard matrix for the composition of these two rotations is
$$AB = \begin{pmatrix}1&0&0\\0&0&-1\\0&1&0\end{pmatrix}\begin{pmatrix}0&-1&0\\1&0&0\\0&0&1\end{pmatrix} = \begin{pmatrix}0&-1&0\\0&0&-1\\1&0&0\end{pmatrix}.$$
9.1.7 The vectors $v_1 = \begin{pmatrix}1\\1\\0\end{pmatrix}$ and $v_2 = \begin{pmatrix}-1\\0\\1\end{pmatrix}$ give a basis for $V$; $v_3 = \begin{pmatrix}1\\-1\\1\end{pmatrix}$ gives a basis for $V^\perp$. For part c we will want an orthonormal basis, so the Gram-Schmidt process yields
$$q_1 = \frac1{\sqrt2}\begin{pmatrix}1\\1\\0\end{pmatrix} \quad\text{and}\quad q_2 = \frac1{\sqrt6}\begin{pmatrix}-1\\1\\2\end{pmatrix}, \qquad\text{and we take}\quad q_3 = \frac1{\sqrt3}\begin{pmatrix}1\\-1\\1\end{pmatrix}.$$
a. Working with the "new" basis $\mathcal B' = \{v_1, v_2, v_3\}$, we have the change-of-basis matrix $P = \begin{pmatrix}1&-1&1\\1&0&-1\\0&1&1\end{pmatrix}$, $P^{-1} = \frac13\begin{pmatrix}1&2&1\\-1&1&2\\1&-1&1\end{pmatrix}$, and the matrix for projection onto $V$ is $[T]_{\mathcal B'} = \begin{pmatrix}1&0&0\\0&1&0\\0&0&0\end{pmatrix}$. Thus, the standard matrix is
$$[T] = P[T]_{\mathcal B'}P^{-1} = \begin{pmatrix}1&-1&1\\1&0&-1\\0&1&1\end{pmatrix}\begin{pmatrix}1&0&0\\0&1&0\\0&0&0\end{pmatrix}\cdot\frac13\begin{pmatrix}1&2&1\\-1&1&2\\1&-1&1\end{pmatrix} = \frac13\begin{pmatrix}2&1&-1\\1&2&1\\-1&1&2\end{pmatrix}.$$
b. Similarly, with $[S]_{\mathcal B'} = \begin{pmatrix}1&0&0\\0&1&0\\0&0&-1\end{pmatrix}$,
$$[S] = P[S]_{\mathcal B'}P^{-1} = \begin{pmatrix}1&-1&1\\1&0&-1\\0&1&1\end{pmatrix}\begin{pmatrix}1&0&0\\0&1&0\\0&0&-1\end{pmatrix}\cdot\frac13\begin{pmatrix}1&2&1\\-1&1&2\\1&-1&1\end{pmatrix} = \frac13\begin{pmatrix}1&2&-2\\2&1&2\\-2&2&1\end{pmatrix}.$$
Notice that $S = 2T - I$.
c. Since we're dealing with rotation now, we want to use the orthonormal basis (indeed, we need the basis for $V$ to be orthonormal; the length of the normal vector is immaterial). Now we take the change-of-basis matrix
$$Q = \begin{pmatrix}\frac1{\sqrt2}&-\frac1{\sqrt6}&\frac1{\sqrt3}\\ \frac1{\sqrt2}&\frac1{\sqrt6}&-\frac1{\sqrt3}\\ 0&\frac2{\sqrt6}&\frac1{\sqrt3}\end{pmatrix},$$
and, since $Q$ is orthogonal, we have $Q^{-1} = Q^T$. The basis $\{q_1, q_2, q_3\}$ is right-handed, since, for example, $q_1\times q_2 = q_3$. This means that $q_1$ rotates toward $q_2$ and $q_2$ toward $-q_1$. Thus, we have the matrix $[R]_{\mathcal B'} = \begin{pmatrix}\frac{\sqrt3}2&-\frac12&0\\ \frac12&\frac{\sqrt3}2&0\\ 0&0&1\end{pmatrix}$, and
$$[R] = Q[R]_{\mathcal B'}Q^{-1} = \frac13\begin{pmatrix}\sqrt3+1&-1&-\sqrt3+1\\ \sqrt3-1&\sqrt3+1&-1\\ 1&\sqrt3-1&\sqrt3+1\end{pmatrix}.$$
9.1.8 Let $v_1 = \begin{pmatrix}1\\0\\2\\1\end{pmatrix}$ and $v_2 = \begin{pmatrix}0\\1\\-1\\1\end{pmatrix}$. We obtain a basis for $V^\perp$ by finding a basis for the nullspace of the matrix $\begin{pmatrix}1&0&2&1\\0&1&-1&1\end{pmatrix}$: $v_3 = \begin{pmatrix}-2\\1\\1\\0\end{pmatrix}$, $v_4 = \begin{pmatrix}-1\\-1\\0\\1\end{pmatrix}$. Then we have $\mathrm{proj}_Vv_1 = v_1$, $\mathrm{proj}_Vv_2 = v_2$, $\mathrm{proj}_Vv_3 = 0$, and $\mathrm{proj}_Vv_4 = 0$, and so the matrix for $T = \mathrm{proj}_V$ with respect to the new basis $\mathcal B' = \{v_1, v_2, v_3, v_4\}$ is $[T]_{\mathcal B'} = \begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&0&0\\0&0&0&0\end{pmatrix}$. Letting $P = \begin{pmatrix}1&0&-2&-1\\0&1&1&-1\\2&-1&1&0\\1&1&0&1\end{pmatrix}$, we have $P^{-1} = \frac1{17}\begin{pmatrix}3&1&5&4\\1&6&-4&7\\-5&4&3&-1\\-4&-7&-1&6\end{pmatrix}$, and so
$$[T] = P[T]_{\mathcal B'}P^{-1} = \frac1{17}\begin{pmatrix}3&1&5&4\\1&6&-4&7\\5&-4&14&1\\4&7&1&11\end{pmatrix}.$$
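This can be double-checked with the least-squares projection formula $A(A^TA)^{-1}A^T$, where the columns of $A$ span $V$ (a NumPy sketch, not part of the original solution):

```python
import numpy as np

# Columns are v1 and v2; the projection onto C(A) is A (A^T A)^{-1} A^T
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [2.0, -1.0],
              [1.0, 1.0]])
T = A @ np.linalg.inv(A.T @ A) @ A.T

print(np.round(17 * T).astype(int))
```

The result agrees with $\frac1{17}\begin{pmatrix}3&1&5&4\\1&6&-4&7\\5&-4&14&1\\4&7&1&11\end{pmatrix}$, and one can also confirm the projection property $T^2 = T$.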
[Figure: the cylinder with the rotated coordinate axes labeled $v_1$, $v_2$, and $v_3 = a$.]
The equation $x_1^2+x_2^2 = 1$ can be rewritten as $x_1^2+x_2^2+x_3^2 = 1+x_3^2$, and so the circle on the cylinder given by intersecting with the plane $x_3 = b$ is given in the new coordinates by
$$y_1^2+y_2^2+y_3^2 = 1+b^2, \qquad y_2\sin\varphi + y_3\cos\varphi = b.$$
The equation of the projection of this set onto the plane $y_3 = 0$ is given by eliminating $y_3$ from these equations:
$$y_1^2 + \frac{y_2^2 - 2by_2\sin\varphi + b^2\sin^2\varphi}{\cos^2\varphi} = y_1^2 + \frac{(y_2-b\sin\varphi)^2}{\cos^2\varphi} = 1,$$
which we recognize as the equation of an ellipse. Thus, the projection of the portion of the cylinder with $-h\le x_3\le h$ gives the family of ellipses
$$y_1^2 + \frac{(y_2-b\sin\varphi)^2}{\cos^2\varphi} = 1, \qquad -h\le b\le h.$$
This can be pictured by drawing the single ellipse with b = 0 and then displacing it continuously
vertically so that its center moves from −h sin φ to h sin φ. Notice that when h sin φ ≥ cos φ, this
appears as a rectangle with two half-elliptical ends (as we’d expect); when h sin φ < cos φ, there is
a “hole” in the projection.
[Figure: the two projections in the $(y_1, y_2)$-plane, with $\cos\varphi$, $h\sin\varphi$, and $1$ marked on the axes.]
9.1.13 As we see in the figure below, the shape that is generated consists of two cones (with vertex angle $2\arccos(1/\sqrt3)\approx 109^\circ$) joined by a band in the shape of a hyperboloid of one sheet. The hyperboloid is the surface of revolution obtained by rotating, say, the line segment joining $\begin{pmatrix}-1\\1\\-1\end{pmatrix}$ and $\begin{pmatrix}1\\1\\-1\end{pmatrix}$ about the given axis.
Let's find the equation of that surface in a new coordinate system corresponding to the orthonormal basis
$$v_1 = \frac1{\sqrt2}\begin{pmatrix}1\\-1\\0\end{pmatrix}, \qquad v_2 = \frac1{\sqrt6}\begin{pmatrix}1\\1\\-2\end{pmatrix}, \qquad v_3 = \frac1{\sqrt3}\begin{pmatrix}1\\1\\1\end{pmatrix}.$$
Letting $A$ be the orthogonal matrix with these column vectors, we have
$$A^{-1} = A^T = \begin{pmatrix}\frac1{\sqrt2}&-\frac1{\sqrt2}&0\\ \frac1{\sqrt6}&\frac1{\sqrt6}&-\frac2{\sqrt6}\\ \frac1{\sqrt3}&\frac1{\sqrt3}&\frac1{\sqrt3}\end{pmatrix}.$$
We now solve for the curve of intersection of this surface with the plane $x_1 = 0$: We have $\tan s = \dfrac{\sqrt3(t-1)}{t+3}$, so $\cos s = \dfrac{t+3}{2\sqrt{t^2+3}}$ and $\sin s = \dfrac{\sqrt3(t-1)}{2\sqrt{t^2+3}}$, and
$$x_2 = \frac1{2\sqrt{t^2+3}}\left(\frac{\sqrt3}{\sqrt2}(t-1)^2 + \frac1{\sqrt6}(t+3)^2\right) = \sqrt{\frac23}\,\sqrt{t^2+3} \qquad\text{and}\qquad x_3 = \frac1{\sqrt3}\,t,$$
so the "profile curve" of the surface of revolution is the segment of the hyperbola
$$\frac{x_2^2}2 - x_3^2 = 1, \qquad -\frac1{\sqrt3}\le x_3\le\frac1{\sqrt3}.$$
9.1.14 a. Let $v\in V$ and let $w = T(v)$. As in the proof of Theorem 1.1, we let $x$ and $x'$ denote the coordinate vectors of $v$ with respect to the bases $\mathcal V$ and $\mathcal V'$, respectively. Likewise, we let $y$ and $y'$ denote the coordinate vectors of $w$ with respect to $\mathcal W$ and $\mathcal W'$, respectively. We have
$$x = Px', \qquad y = Qy', \qquad y' = [T]^{\mathcal W'}_{\mathcal V'}x', \qquad\text{and}\qquad y = [T]^{\mathcal W}_{\mathcal V}x.$$
Thus, $y = Qy' = Q[T]^{\mathcal W'}_{\mathcal V'}x' = Q[T]^{\mathcal W'}_{\mathcal V'}P^{-1}x$. Since this works for any $v\in V$, we have $[T]^{\mathcal W}_{\mathcal V} = Q[T]^{\mathcal W'}_{\mathcal V'}P^{-1}$, or, equivalently, $[T]^{\mathcal W'}_{\mathcal V'} = Q^{-1}[T]^{\mathcal W}_{\mathcal V}P$.
b. If we use the basis $\mathcal V$ in both domain and range, then the matrix for $T$ is $[T]^{\mathcal V}_{\mathcal V} = I$. Now, changing basis in the domain to $\mathcal V'$ but keeping the same basis for the range, we have $Q = I$, and so $[T]^{\mathcal V}_{\mathcal V'} = Q^{-1}[T]^{\mathcal V}_{\mathcal V}P = P$.
9.1.15 First assume that A = QP , where Q is orthogonal and P is a projection matrix. Let
V = C(P ) and note that R(A) = R(QP ) = R(P ) = C(P T ) = C(P ) = V , since P T = P and
Q is nonsingular. Also note that C(A) = {Qv : v ∈ V }. Now, if x ∈ R(A) = V , we have
AT Ax = (P Q−1 )(QP )x = P x = x. If y ∈ C(A), then we can write y = Qv for some v ∈ V ,
so AAT y = (QP )(P Q−1 )y = (QP Q−1 )y = QP v = Qv = y. Thus, T : R(A) → C(A) and
S : C(A) → R(A) are inverse functions.
To prove the converse, let’s assume T : R(A) → C(A) and S : C(A) → R(A) are inverse
functions. Choose an orthonormal basis {v1 , . . . , vk } for R(A). We claim that {Av1 , . . . , Avk }
is an orthonormal basis for C(A). First, notice that Avi · Avj = vi · AT Avj = vi · vj , since
AT A is the identity on R(A). Thus, {Av1 , . . . , Avk } is an orthonormal set of vectors in C(A).
Since dim C(A) = dim R(A), it must give a basis for C(A). Now, extend the set {v1 , . . . , vk }
to give an orthonormal basis V = {v1 , . . . , vn } for Rn and extend the set {Av1 , . . . , Avk } to give
an orthonormal basis W = {Av1 , . . . , Avk , wk+1 , . . . , wn } for Rn . Notice that for j > k we have
$v_j\in R(A)^\perp = N(A)$, and so $Av_j = 0$ for $j > k$. Thus, the matrix for $T$ with respect to the bases $\mathcal V$ and $\mathcal W$ is given by
$$[T]^{\mathcal W}_{\mathcal V} = \begin{pmatrix}I_k & O\\ O & O\end{pmatrix}.$$
Now let $Q_1$ be the matrix whose columns are the respective basis vectors of $\mathcal V$, and let $Q_2$ be the matrix whose columns are the respective basis vectors of $\mathcal W$. Since these are orthonormal bases, $Q_1$ and $Q_2$ are orthogonal matrices. Then, by Exercise 14, the standard matrix for $T$ is given by
$$[T] = Q_2[T]^{\mathcal W}_{\mathcal V}Q_1^{-1} = (Q_2Q_1^{-1})\bigl(Q_1[T]^{\mathcal W}_{\mathcal V}Q_1^{-1}\bigr).$$
Letting $Q = Q_2Q_1^{-1}$ and $P = Q_1[T]^{\mathcal W}_{\mathcal V}Q_1^{-1}$, we see that $A = QP$, where $Q$ is orthogonal (see Exercise 1.4.35) and $P$ is a projection matrix.
l. $p(t) = -(t+1)(t-2)(t-3)$, so the eigenvalues are $\lambda_1 = -1$, $\lambda_2 = 2$, and $\lambda_3 = 3$. We have
$$A+I = \begin{pmatrix}2&-6&4\\-2&-3&5\\-2&-6&8\end{pmatrix}, \quad A-2I = \begin{pmatrix}-1&-6&4\\-2&-6&5\\-2&-6&5\end{pmatrix}, \quad\text{and}\quad A-3I = \begin{pmatrix}-2&-6&4\\-2&-7&5\\-2&-6&4\end{pmatrix},$$
so $\begin{pmatrix}1\\1\\1\end{pmatrix}$, $\begin{pmatrix}2\\1\\2\end{pmatrix}$, and $\begin{pmatrix}1\\-1\\-1\end{pmatrix}$ give bases for the respective eigenspaces.
m. $p(t) = -(t-1)^2(t-3)$, so the eigenvalues are $\lambda_1 = \lambda_2 = 1$ and $\lambda_3 = 3$. We have
$$A-I = \begin{pmatrix}2&2&-2\\2&1&-1\\2&1&-1\end{pmatrix} \quad\text{and}\quad A-3I = \begin{pmatrix}0&2&-2\\2&-1&-1\\2&1&-3\end{pmatrix},$$
and we find that $\begin{pmatrix}0\\1\\1\end{pmatrix}$ gives a basis for $E(1)$ and $\begin{pmatrix}1\\1\\1\end{pmatrix}$ gives a basis for $E(3)$.
n. $p(t) = (t-1)^2(t-2)^2$ (e.g., applying Exercise 7.5.10), and so the eigenvalues are $\lambda_1 = \lambda_2 = 1$ and $\lambda_3 = \lambda_4 = 2$. We have
$$A-I = \begin{pmatrix}0&0&0&1\\0&0&1&1\\0&0&1&0\\0&0&0&1\end{pmatrix} \quad\text{and}\quad A-2I = \begin{pmatrix}-1&0&0&1\\0&-1&1&1\\0&0&0&0\\0&0&0&0\end{pmatrix},$$
so $\left\{\begin{pmatrix}1\\0\\0\\0\end{pmatrix}, \begin{pmatrix}0\\1\\0\\0\end{pmatrix}\right\}$ is a basis for $E(1)$ and $\left\{\begin{pmatrix}1\\1\\0\\1\end{pmatrix}, \begin{pmatrix}0\\1\\1\\0\end{pmatrix}\right\}$ is a basis for $E(2)$.
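A short NumPy check of part n (a sketch, not part of the original solution; the matrix $A$ is reassembled here from the display of $A - I$ above):

```python
import numpy as np

# Reassembling A from the matrix A - I displayed above
A = np.eye(4) + np.array([[0, 0, 0, 1],
                          [0, 0, 1, 1],
                          [0, 0, 1, 0],
                          [0, 0, 0, 1]], dtype=float)

eigenvalues = np.sort(np.linalg.eigvals(A).real)

# The stated basis vectors satisfy A v = lambda v
pairs = [(1, [1, 0, 0, 0]), (1, [0, 1, 0, 0]),
         (2, [1, 1, 0, 1]), (2, [0, 1, 1, 0])]
checks = [np.allclose(A @ np.array(v, float), lam * np.array(v, float))
          for lam, v in pairs]
print(eigenvalues, checks)
```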
9.2.2 Proposition 2.3 tells us that 0 is an eigenvalue of A if and only if det A = 0, which is
true precisely when A is singular. (See Theorem 5.5 of Chapter 7.) Alternatively, directly from the
definition, 0 is an eigenvalue if and only if Av = 0 for some nonzero vector v, which means A is
singular.
9.2.3 If A is upper (lower) triangular, then so is A − tI, so, by Proposition 5.12 of Chapter 7,
we have p(t) = det(A − tI) = (a11 − t)(a22 − t) · · · (ann − t), whose roots are the diagonal entries of
A.
Remark: To be complete, how do we know these are the only possibilities? Suppose T (x) = λx.
Write x = v + w, where v ∈ V and w ∈ V ⊥ . Then, applying T , we have λ(v + w) = v, so
(λ − 1)v + λw = 0. Since v · w = 0, we see easily that λ = 0 and v = 0 or λ = 1 and w = 0. A
similar argument works in the case of S.
9.2.8 By Proposition 4.5 of Chapter 1, we have Ax·y = x·AT y. Since Ax = λx and AT y = µy,
we have λx · y = x · µy, so (λ − µ)(x · y) = 0. If λ 6= µ, then we must have x · y = 0.
Since A and AT have the same characteristic polynomial, they have the same eigenvalues.
b. Let $A = \begin{pmatrix}3&1\\-3&7\end{pmatrix}$. Then 4 is an eigenvalue of both $A$ and $A^T$. $\begin{pmatrix}1\\1\end{pmatrix}$ gives a basis for $E_A(4)$, whereas $\begin{pmatrix}-3\\1\end{pmatrix}$ gives a basis for $E_{A^T}(4)$.
9.2.10 Suppose λ1 , . . . , λn are the roots of p(t) = det(A−tI). Then, by the root-factor theorem,
since p(t) has degree n, there is a constant c so that p(t) = c(t − λ1 )(t − λ2 ) · · · (t − λn ). Since the
coefficient of tn in p(t) is (−1)n , we infer that p(t) = (−1)n (t − λ1 )(t − λ2 ) · · · (t − λn ). Therefore,
det A = p(0) = (−1)n (−λ1 )(−λ2 ) · · · (−λn ) = λ1 λ2 · · · λn .
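The identity $\det A = \lambda_1\lambda_2\cdots\lambda_n$ is easy to confirm numerically (a NumPy sketch, not part of the original solution; the product runs over the possibly complex roots of the characteristic polynomial):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))

# Product of all (possibly complex) eigenvalues vs. the determinant
prod_eigs = np.prod(np.linalg.eigvals(A))
det_A = np.linalg.det(A)
print(det_A, prod_eigs.real)
```

For a real matrix the complex eigenvalues occur in conjugate pairs, so the product comes out real.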
9.2.11 a. Suppose A is nonsingular. Then A is invertible and so BA = A−1 (AB)A, from which
we see that AB and BA are similar. By Lemma 2.4, pAB (t) = pBA (t).
b. Applying Exercise 7.5.10a, consider
$$\det\left(\begin{pmatrix}tI&-A\\O&I\end{pmatrix}\begin{pmatrix}tI&A\\B&tI\end{pmatrix}\right) = \det\begin{pmatrix}t^2I-AB&O\\B&tI\end{pmatrix} = \det(t^2I-AB)\det(tI).$$
On the other hand, we have
$$\det\left(\begin{pmatrix}tI&-A\\O&I\end{pmatrix}\begin{pmatrix}tI&A\\B&tI\end{pmatrix}\right) = \det\begin{pmatrix}tI&-A\\O&I\end{pmatrix}\det\begin{pmatrix}tI&A\\B&tI\end{pmatrix} = \det(tI)\det\begin{pmatrix}tI&A\\B&tI\end{pmatrix},$$
and so we conclude that $\det(t^2I-AB) = \det\begin{pmatrix}tI&A\\B&tI\end{pmatrix}$. Now, by Exercise 7.5.10c, for any $t\ne0$, the latter is equal to $\det(t^2I-BA)$, and so $p_{AB}(t^2) = p_{BA}(t^2)$ for all $t\ne0$. Therefore, $p_{AB}(u) = p_{BA}(u)$ for all $u > 0$, and so the polynomials are identical.
We can also avoid the reference to the latter exercise, by proceeding directly:
$$\det(tI)\det\begin{pmatrix}tI&A\\B&tI\end{pmatrix} = \det\left(\begin{pmatrix}I&O\\-B&tI\end{pmatrix}\begin{pmatrix}tI&A\\B&tI\end{pmatrix}\right) = \det\begin{pmatrix}tI&A\\O&t^2I-BA\end{pmatrix} = \det(tI)\det(t^2I-BA),$$
as desired.
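The conclusion — $AB$ and $BA$ have the same characteristic polynomial even when neither factor is invertible — is easy to spot-check (a NumPy sketch, not part of the original solution):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))
B[:, 0] = 0.0                     # make B singular on purpose

p_AB = np.poly(A @ B)             # coefficients of det(tI - AB)
p_BA = np.poly(B @ A)
print(np.round(p_AB - p_BA, 10))
```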
9.2.14 Let the eigenvalues of A be λ1 and λ2 . Since λ1 and λ2 are integers and their product
is det A = 120 (see Exercise 10), they must be distinct. Therefore, by Corollary 2.7, A must be
diagonalizable.
9.2.16 a. $(A-2I)^2 = O$.
b. $A-2I = \begin{pmatrix}-1&1\\-1&1\end{pmatrix}$, so we see by inspection that $(A-2I)\begin{pmatrix}0\\1\end{pmatrix} = \begin{pmatrix}1\\1\end{pmatrix}$. How would we know a priori that this system is consistent? Part a tells us that if $b\in C(A-2I)$, then $b\in N(A-2I)$, so $C(A-2I)\subset N(A-2I)$. On the other hand, by Lemma 3.8 of Chapter 4, since both of these subspaces are one-dimensional, they must be equal. Therefore, $v_1\in N(A-2I) \implies v_1\in C(A-2I)$.
c. Since $Av_1 = 2v_1$ and $Av_2 = v_1+2v_2$, the matrix for $A$ with respect to the basis $\{v_1, v_2\}$ is $\begin{pmatrix}2&1\\0&2\end{pmatrix}$.
9.2.20 a. p = pA is a polynomial of degree 3 and hence has a real root. (This is a standard
application of the intermediate value theorem, since polynomials are continuous and p(t) → −∞
as t → ∞ and p(t) → ∞ as t → −∞.)
b. Since A is orthogonal, we have AT A = I. Therefore, kAxk2 = Ax · Ax = x · AT Ax =
x · x = kxk2 . If x is an eigenvector with corresponding eigenvalue λ, then Ax = λx and so
kAxk = |λ|kxk = kxk, from which we deduce that |λ| = 1.
c. Since det A = 1, the characteristic polynomial takes the form p(t) = −t3 + · · · + 1.
Since p(0) = 1 and p(t) → −∞ as t → ∞, we see that the graph of p(t) must cross the positive real
axis. But we showed in part b that this can happen only at t = 1.
9.2. EIGENVALUES, EIGENVECTORS, AND DIAGONALIZABILITY 275
d. Assuming A 6= I, we first show that dim E(1) = 1. Suppose, to the contrary, that
dim E(1) = 2. Let {v1 , v2 } be a basis for E(1) and let v3 be a basis for E(1)⊥ . Then for i = 1, 2
we have Av3 · vi = v3 · AT vi = v3 · A−1 vi = v3 · vi = 0, so Av3 = λv3 for some scalar λ. By part
b, λ can equal only 1 or −1. We rule out the former because A 6= I; we rule out the latter because
the product of the eigenvalues must equal det A = 1.
Now choose an orthonormal basis {v1 , v2 , v3 } for R3 with v1 ∈ E(1). Let’s consider the matrix
B for A with respect to this basis. For j = 2, 3 we have Avj ·v1 = vj ·AT v1 = vj ·A−1 v1 = vj ·v1 = 0,
so Avj ∈ Span(v2 , v3 ) for j = 2, 3. This means that B takes the form
$$B = \begin{pmatrix}1&O\\O&C\end{pmatrix},$$
where the off-diagonal $O$'s denote a row and a column of zeros.
Now what do we know about the matrix C? We know that C T C = I and det C = det B = 1.
Therefore, we conclude from Exercise 1.4.34 that µC gives a rotation of the plane spanned by v2
and v3 , and thus µA gives rotation through some angle θ about the line E(1).
e. Reversing the argument of part d, suppose T : R3 → R3 is a linear transformation
giving the rotation about some axis through some angle θ. Then with respect to the obvious
orthonormal basis for R3 it has the matrix
1 0 0
B = 0 cos θ − sin θ .
0 sin θ cos θ
Then the change-of-basis formula tells us that its standard matrix A = P BP −1 is orthogonal,
since AT A = (P BP −1 )T (P BP −1 ) = P B T P T P BP −1 = I. Likewise, det A = 1. Now, the matrix
representing the composition of two rotations is therefore the product of two orthogonal matrices
with determinant 1 and therefore is again an orthogonal matrix of determinant 1 (see Exercise
1.4.35). Thus, by part d, the composition of two rotations in R3 is again a rotation.
9.2.21 Although we could calculate the characteristic polynomial of $C$, that doesn't appear to be too much fun. Let's just see if 1 is an eigenvalue by finding $N(C-I)$. With a bit of care, we find that
$$C-I = \begin{pmatrix}-\frac56&\frac13+\frac{\sqrt6}6&\frac16-\frac{\sqrt6}3\\ \frac13-\frac{\sqrt6}6&-\frac13&\frac13+\frac{\sqrt6}6\\ \frac16+\frac{\sqrt6}3&\frac13-\frac{\sqrt6}6&-\frac56\end{pmatrix} \rightsquigarrow \begin{pmatrix}1&0&-1\\0&1&-2\\0&0&0\end{pmatrix},$$
and so $v_1 = \begin{pmatrix}1\\2\\1\end{pmatrix}$ gives a basis for $E(1)$. Now, $v_2 = \frac1{\sqrt3}\begin{pmatrix}1\\-1\\1\end{pmatrix}$ and $v_3 = \frac1{\sqrt2}\begin{pmatrix}1\\0\\-1\end{pmatrix}$ give an orthonormal basis for $E(1)^\perp$, and we calculate that $Cv_2 = -v_3$ and $Cv_3 = v_2$. It follows that $T$ is rotation through angle $-\pi/2$ about the axis spanned by $v_1$ (as viewed from far out the positive $v_1$-axis). (Note that $v_2\times v_3$ is in the direction of $v_1$, so $\{v_1, v_2, v_3\}$ is a "right-handed" basis for $\mathbb R^3$.)
9.2.22 When n = 1 there is nothing to prove. Since all of the eigenvalues of A are real, A
must have at least one eigenvector. Let v1 be an eigenvector with corresponding eigenvalue λ1 , and
choose v2′ , . . . , vn′ so that {v1 , v2′ , . . . , vn′ } gives a basis for Rn . The matrix for A with respect to
this basis takes the form
$$A' = \begin{pmatrix}\lambda_1&*\;\cdots\;*\\ \mathbf 0&B\end{pmatrix},$$
where B is an (n − 1) × (n − 1) matrix. Since det(A − tI) = det(A′ − tI) = (λ1 − t) det(B − tI),
we see that all the eigenvalues of B must be real. By induction, there is a basis {v2′′ , . . . , vn′′ }
for Span(v2′ , . . . , vn′ ) with respect to which the matrix for B becomes upper triangular. Then the
matrix A′′ for A with respect to the basis {v1 , v2′′ , . . . , vn′′ } is upper triangular, as desired.
9.2.23 Proceeding as suggested by the hint, let B = {v1 , . . . , vk , vk+1 , . . . , vn } be a basis for V
with Span(v1 , . . . , vk ) = W . Since T (W ) ⊂ W , the matrix for T with respect to B will take the
block form
$$A = \begin{pmatrix}B&D\\O&C\end{pmatrix}.$$
Using Exercise 7.5.10 as usual, we have det(A − tI) = det(B − tI) det(C − tI). Fix an eigenvalue
λ of B. We must show that its geometric multiplicity equals its algebraic multiplicity. Denote
by dA , dB , and dC the geometric multiplicities of the eigenvalue λ for the respective matrices and
denote by mA , mB , and mC the corresponding algebraic multiplicities. Since pA (t) = pB (t)pC (t),
it follows that mA = mB + mC . We also know that dA ≤ mA , dB ≤ mB , and dC ≤ mC , and we
wish to prove that dB = mB .
The crucial observation is that rank(A − λI) ≥ rank(B − λI) + rank(C − λI) (as there may be
extra pivots in the upper right corner of A−λI). Since dA = n−rank(A−λI), dB = k−rank(B−λI),
and dC = (n − k) − rank(C − λI), we then infer that dA ≤ dB + dC . Putting together all the
information, we have
dA ≤ dB + dC ≤ mB + mC = mA .
But since T is diagonalizable, we know that dA = mA , and so equality must hold at every stage,
as in the proof of Theorem 2.9. In particular, we have dB = mB , and B will be diagonalizable.
9.2.24 a. Since A and B have the same eigenvectors, there is a single nonsingular matrix P so
that both P −1 AP = Λ1 and P −1 BP = Λ2 are diagonal. Since diagonal matrices commute, we have
$$(P^{-1}AP)(P^{-1}BP) = \Lambda_1\Lambda_2 = \Lambda_2\Lambda_1 = (P^{-1}BP)(P^{-1}AP),$$
and so $AB = BA$.
b. Since $A$ has $n$ distinct eigenvalues, there is a basis for $\mathbb R^n$ consisting of eigenvectors of $A$. Since these are also eigenvectors of $B$, it follows that $B$ is diagonalizable.
The answer to the query is no: For example, if B = I, then every vector is an eigenvector of
B, but certainly need not be one of A.
c. As in part b, if v ∈ EA (λ), then A(Bv) = B(Av) = λ(Bv), so Bv ∈ EA (λ).
Therefore, B(EA (λ)) ⊂ EA (λ). Applying Exercise 23, since we are told that B is diagonalizable,
it follows that there is a basis for EA (λ) consisting of eigenvectors of B. Finally, since A is
diagonalizable, we know that the eigenspaces of A span all of Rn , so we conclude that there is a
basis {v1 , . . . , vn } for Rn consisting of eigenvectors of both A and B. Letting P be the matrix
whose column vectors are v1 , . . . , vn , we conclude that both P −1 AP and P −1 BP are diagonal.
9.2.25 This exercise is easy once we observe that Theorem 2.6 can be rephrased as follows:
Suppose λ1 , . . . , λk are distinct eigenvalues of a linear transformation. If Vi ∈ E(λi ) for i = 1, . . . , k,
and V1 + · · · + Vk = 0, then Vi = 0 for all i. (If not, discarding those that are 0, suppose
Vi1 + · · · + Vis = 0 and each Vij 6= 0. Then each Vij is an eigenvector with corresponding
eigenvalue λij , and this gives a nontrivial linear relation among eigenvectors corresponding to
distinct eigenvalues, contradicting the theorem.)
a. Suppose c1 v1 + c2 v2 + · · · + ck vk + d1 w1 + · · · + dℓ wℓ = 0. Set V = c1 v1 + c2 v2 + · · · +
ck vk ∈ E(λ) and W = d1 w1 + · · · + dℓ wℓ ∈ E(µ). Then, by our remark, we must have V = W = 0,
and now linear independence of the respective sets of vectors tells us that c1 = · · · = ck = 0 and
d1 = · · · = dℓ = 0.
b. Generalizing this argument, suppose $\sum\limits_{i=1}^{k}\sum\limits_{j=1}^{d_i}c_{ij}v^{(i)}_j = 0$. Set $V_i = \sum\limits_{j=1}^{d_i}c_{ij}v^{(i)}_j$ and note
that Vi ∈ E(λi ). Then we have V1 + · · · + Vk = 0. We conclude from our remark that Vi = 0
for all i = 1, . . . , k. Then linear independence of the individual sets of vectors implies that all the
cij = 0.
9.3.1 $p(t) = t^2-9 = (t+3)(t-3)$, so the eigenvalues of $A$ are $\lambda_1 = -3$ and $\lambda_2 = 3$. The vector $v_1 = \begin{pmatrix}-1\\1\end{pmatrix}$ spans $E(-3)$ and $v_2 = \begin{pmatrix}5\\1\end{pmatrix}$ spans $E(3)$. Now we can write $A = P\Lambda P^{-1}$, where
$$P = \begin{pmatrix}-1&5\\1&1\end{pmatrix}, \qquad \Lambda = \begin{pmatrix}-3&0\\0&3\end{pmatrix}, \qquad\text{and}\qquad P^{-1} = \frac16\begin{pmatrix}-1&5\\1&1\end{pmatrix}.$$
Then
$$A^k = P\Lambda^kP^{-1} = \frac16\begin{pmatrix}-1&5\\1&1\end{pmatrix}\begin{pmatrix}(-3)^k&0\\0&3^k\end{pmatrix}\begin{pmatrix}-1&5\\1&1\end{pmatrix} = \frac{3^k}6\begin{pmatrix}5+(-1)^k&5\bigl(1+(-1)^{k+1}\bigr)\\1+(-1)^{k+1}&5(-1)^k+1\end{pmatrix}.$$
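The closed form can be checked against direct matrix powers (a NumPy sketch, not part of the original solution; the matrix $A$ is reassembled here as $P\Lambda P^{-1}$, which works out to $\begin{pmatrix}2&5\\1&-2\end{pmatrix}$):

```python
import numpy as np

# Reassemble A = P Lambda P^{-1} from the solution above
P = np.array([[-1.0, 5.0], [1.0, 1.0]])
Lam = np.diag([-3.0, 3.0])
A = P @ Lam @ np.linalg.inv(P)

def A_power_formula(k):
    # Closed form for A^k derived above
    return (3**k / 6.0) * np.array([[5 + (-1)**k, 5 * (1 + (-1)**(k + 1))],
                                    [1 + (-1)**(k + 1), 5 * (-1)**k + 1]])

ok = all(np.allclose(np.linalg.matrix_power(A, k), A_power_formula(k))
         for k in range(8))
print(np.round(A).astype(int), ok)
```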
9.3.2 There are three possible states at any given time: (1) two Buds in the first tub and two Becks in the second; (2) one of each type of beer in each tub; (3) two Becks in the first tub and two Buds in the second. Let $x_k$ be the vector whose $i$th coordinate is the probability that the beers are in state $i$ at time $k$. We find that the transition matrix is
$$A = \begin{pmatrix}0&\frac14&0\\1&\frac12&1\\0&\frac14&0\end{pmatrix}.$$
The unique vector in $E(1)$ whose entries sum to 1 is $\frac16\begin{pmatrix}1\\4\\1\end{pmatrix}$, so as $k\to\infty$, $2/3$ of the time there will be exactly one Becks in the first tub and $5/6$ of the time there will be at least one Becks in the first tub.
9.3.3 The transition matrix is
$$A = \begin{pmatrix}1&\frac35&0&0&0\\0&0&\frac35&0&0\\0&\frac25&0&\frac35&0\\0&0&\frac25&0&0\\0&0&0&\frac25&1\end{pmatrix}.$$
(We notice that the first and last columns of $A^k$ never change, so this is not a regular stochastic matrix. Indeed, we see that $E(1)$ is at least two-dimensional.) The characteristic polynomial of $A$ is $p(t) = -t(t-1)^2\bigl(t^2-\frac{12}{25}\bigr)$, and so the eigenvalues are $\lambda_1 = 0$, $\lambda_2 = \lambda_3 = 1$, $\lambda_4 = -2\sqrt3/5$, and $\lambda_5 = 2\sqrt3/5$. (Note that $|\lambda_4| = |\lambda_5| < 1$.) We solve for the respective eigenvectors and find that $A = P\Lambda P^{-1}$, where
$$\Lambda = \begin{pmatrix}0&&&&\\&1&&&\\&&1&&\\&&&-\frac{2\sqrt3}5&\\&&&&\frac{2\sqrt3}5\end{pmatrix} \quad\text{and}\quad P = \begin{pmatrix}-\frac94&0&1&\frac94&\frac94\\ \frac{15}4&0&0&-\frac{3(5+2\sqrt3)}4&\frac{3(-5+2\sqrt3)}4\\ 0&0&0&\frac{\sqrt3(5+2\sqrt3)}2&\frac{\sqrt3(-5+2\sqrt3)}2\\ -\frac52&0&0&\frac{-5-2\sqrt3}2&\frac{-5+2\sqrt3}2\\ 1&1&0&1&1\end{pmatrix}.$$
9.3. DIFFERENCE EQUATIONS AND ORDINARY DIFFERENTIAL EQUATIONS 279
Now, note that $\Lambda^k\to\Lambda^\infty = \begin{pmatrix}0&&&&\\&1&&&\\&&1&&\\&&&0&\\&&&&0\end{pmatrix}$ as $k\to\infty$, so
$$A^kx_0 \to P\Lambda^\infty(P^{-1}x_0) = P\begin{pmatrix}0\\ \frac4{13}\\ \frac9{13}\\ 0\\ 0\end{pmatrix} = \begin{pmatrix}\frac9{13}\\0\\0\\0\\ \frac4{13}\end{pmatrix}.$$
Thus, the probability that Gus eventually loses all his money is $9/13$. Interestingly, we see from this calculation that the probability that the game continues forever is 0.
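The limiting probabilities can be confirmed by simply iterating the chain (a NumPy sketch, not part of the original solution; since $|\lambda_4| = |\lambda_5| \approx 0.69 < 1$, $A^k$ converges quickly):

```python
import numpy as np

A = np.array([[1, 3/5, 0, 0, 0],
              [0, 0, 3/5, 0, 0],
              [0, 2/5, 0, 3/5, 0],
              [0, 0, 2/5, 0, 0],
              [0, 0, 0, 2/5, 1]])

x0 = np.array([0.0, 0.0, 1.0, 0.0, 0.0])   # start in the middle state
limit = np.linalg.matrix_power(A, 400) @ x0
print(np.round(limit, 6))
```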
9.3.4 As in the text, set $x_k = \begin{pmatrix}a_k\\a_{k+1}\end{pmatrix}$, $k\ge0$. Then $x_0 = \begin{pmatrix}2\\3\end{pmatrix}$ and we have $x_{k+1} = Ax_k$, where $A = \begin{pmatrix}0&1\\-2&3\end{pmatrix}$. Since $p(t) = t^2-3t+2 = (t-1)(t-2)$, the eigenvalues of $A$ are $\lambda_1 = 1$ and $\lambda_2 = 2$. The vector $v_1 = \begin{pmatrix}1\\1\end{pmatrix}$ spans $E(1)$ and $v_2 = \begin{pmatrix}1\\2\end{pmatrix}$ spans $E(2)$. Letting $P = \begin{pmatrix}1&1\\1&2\end{pmatrix}$ and $\Lambda = \begin{pmatrix}1&0\\0&2\end{pmatrix}$, we have $A = P\Lambda P^{-1}$ and $A^k = P\Lambda^kP^{-1}$, so
$$x_k = A^kx_0 = \begin{pmatrix}1&1\\1&2\end{pmatrix}\begin{pmatrix}1&0\\0&2^k\end{pmatrix}\begin{pmatrix}2&-1\\-1&1\end{pmatrix}\begin{pmatrix}2\\3\end{pmatrix} = \begin{pmatrix}1&1\\1&2\end{pmatrix}\begin{pmatrix}1&0\\0&2^k\end{pmatrix}\begin{pmatrix}1\\1\end{pmatrix} = \begin{pmatrix}1&1\\1&2\end{pmatrix}\begin{pmatrix}1\\2^k\end{pmatrix} = \begin{pmatrix}1+2^k\\1+2^{k+1}\end{pmatrix},$$
so $a_k = 1+2^k$.
9.3.5 As in the text, set $x_k = \begin{pmatrix}a_k\\a_{k+1}\end{pmatrix}$, $k\ge0$. Then $x_0 = \begin{pmatrix}1\\1\end{pmatrix}$ and we have $x_{k+1} = Ax_k$, where $A = \begin{pmatrix}0&1\\6&1\end{pmatrix}$. Since $p(t) = t^2-t-6 = (t-3)(t+2)$, the eigenvalues of $A$ are $\lambda_1 = -2$ and $\lambda_2 = 3$. The vector $v_1 = \begin{pmatrix}1\\-2\end{pmatrix}$ spans $E(-2)$ and $v_2 = \begin{pmatrix}1\\3\end{pmatrix}$ spans $E(3)$. Letting $P = \begin{pmatrix}1&1\\-2&3\end{pmatrix}$ and $\Lambda = \begin{pmatrix}-2&0\\0&3\end{pmatrix}$, we have $A = P\Lambda P^{-1}$ and $A^k = P\Lambda^kP^{-1}$, so
$$x_k = A^kx_0 = \frac15\begin{pmatrix}1&1\\-2&3\end{pmatrix}\begin{pmatrix}(-2)^k&0\\0&3^k\end{pmatrix}\begin{pmatrix}3&-1\\2&1\end{pmatrix}\begin{pmatrix}1\\1\end{pmatrix} = \frac15\begin{pmatrix}1&1\\-2&3\end{pmatrix}\begin{pmatrix}(-1)^k2^{k+1}\\3^{k+1}\end{pmatrix} = \frac15\begin{pmatrix}(-1)^k2^{k+1}+3^{k+1}\\(-1)^{k+1}2^{k+2}+3^{k+2}\end{pmatrix},$$
so $a_k = \frac15\bigl((-1)^k2^{k+1}+3^{k+1}\bigr)$.
9.3.6 As in the text, set $x_k = \begin{pmatrix}a_k\\a_{k+1}\end{pmatrix}$, $k\ge0$. Then $x_0 = \begin{pmatrix}0\\1\end{pmatrix}$ and we have $x_{k+1} = Ax_k$, where $A = \begin{pmatrix}0&1\\4&3\end{pmatrix}$. Since $p(t) = t^2-3t-4 = (t-4)(t+1)$, the eigenvalues of $A$ are $\lambda_1 = -1$ and $\lambda_2 = 4$. The vector $v_1 = \begin{pmatrix}1\\-1\end{pmatrix}$ spans $E(-1)$ and $v_2 = \begin{pmatrix}1\\4\end{pmatrix}$ spans $E(4)$. Letting $P = \begin{pmatrix}1&1\\-1&4\end{pmatrix}$ and $\Lambda = \begin{pmatrix}-1&0\\0&4\end{pmatrix}$, we have $A = P\Lambda P^{-1}$ and $A^k = P\Lambda^kP^{-1}$, so
$$x_k = A^kx_0 = \frac15\begin{pmatrix}1&1\\-1&4\end{pmatrix}\begin{pmatrix}(-1)^k&0\\0&4^k\end{pmatrix}\begin{pmatrix}4&-1\\1&1\end{pmatrix}\begin{pmatrix}0\\1\end{pmatrix} = \frac15\begin{pmatrix}1&1\\-1&4\end{pmatrix}\begin{pmatrix}(-1)^{k+1}\\4^k\end{pmatrix} = \frac15\begin{pmatrix}(-1)^{k+1}+4^k\\(-1)^k+4^{k+1}\end{pmatrix},$$
so $a_k = \frac15\bigl((-1)^{k+1}+4^k\bigr)$.
9.3.7 As in the text, set $x_k = \begin{pmatrix}a_k\\a_{k+1}\end{pmatrix}$, $k\ge0$. Then $x_0 = \begin{pmatrix}0\\1\end{pmatrix}$ and we have $x_{k+1} = Ax_k$, where $A = \begin{pmatrix}0&1\\-4&4\end{pmatrix}$. Since $p(t) = t^2-4t+4 = (t-2)^2$, the eigenvalues of $A$ are $\lambda_1 = \lambda_2 = 2$. On the other hand, $\begin{pmatrix}1\\2\end{pmatrix}$ spans $E(2)$, so $A$ is not diagonalizable. Following Exercise 9.2.16, we find the change-of-basis matrix $P = \begin{pmatrix}1&0\\2&1\end{pmatrix}$ so that $P^{-1}AP = \Lambda = \begin{pmatrix}2&1\\0&2\end{pmatrix}$. Then, setting $N = \begin{pmatrix}0&1\\0&0\end{pmatrix}$, we have $\Lambda = 2I+N$ and $N^2 = O$. Thus,
$$\Lambda^k = (2I+N)^k = 2^kI + k2^{k-1}N = \begin{pmatrix}2^k&k2^{k-1}\\0&2^k\end{pmatrix},$$
and
$$x_k = P\Lambda^kP^{-1}x_0 = \begin{pmatrix}1&0\\2&1\end{pmatrix}\begin{pmatrix}2^k&k2^{k-1}\\0&2^k\end{pmatrix}\begin{pmatrix}1&0\\-2&1\end{pmatrix}\begin{pmatrix}0\\1\end{pmatrix} = \begin{pmatrix}k2^{k-1}\\(k+1)2^k\end{pmatrix},$$
and so $a_k = k2^{k-1}$.
9.3.8 Let $x_k = \begin{pmatrix}a_k\\a_{k+1}\\a_{k+2}\end{pmatrix}$, $k\ge0$, $x_0 = \begin{pmatrix}0\\1\\1\end{pmatrix}$, and set $A = \begin{pmatrix}0&1&0\\0&0&1\\-2&1&2\end{pmatrix}$; then we have $x_{k+1} = Ax_k$. Now $p(t) = -t^3+2t^2+t-2 = -(t-1)(t+1)(t-2)$, so the eigenvalues of $A$ are $\lambda_1 = -1$, $\lambda_2 = 1$, and $\lambda_3 = 2$. Respective eigenvectors are $\begin{pmatrix}1\\-1\\1\end{pmatrix}$, $\begin{pmatrix}1\\1\\1\end{pmatrix}$, and $\begin{pmatrix}1\\2\\4\end{pmatrix}$. Then $A = P\Lambda P^{-1}$, where
$$P = \begin{pmatrix}1&1&1\\-1&1&2\\1&1&4\end{pmatrix}, \qquad \Lambda = \begin{pmatrix}-1&0&0\\0&1&0\\0&0&2\end{pmatrix}, \qquad\text{and}\qquad P^{-1} = \frac16\begin{pmatrix}2&-3&1\\6&3&-3\\-2&0&2\end{pmatrix}.$$
Therefore, we have
$$x_k = A^kx_0 = P\Lambda^kP^{-1}x_0 = \frac16\begin{pmatrix}1&1&1\\-1&1&2\\1&1&4\end{pmatrix}\begin{pmatrix}(-1)^k&0&0\\0&1&0\\0&0&2^k\end{pmatrix}\begin{pmatrix}2&-3&1\\6&3&-3\\-2&0&2\end{pmatrix}\begin{pmatrix}0\\1\\1\end{pmatrix} = \frac13\begin{pmatrix}1&1&1\\-1&1&2\\1&1&4\end{pmatrix}\begin{pmatrix}(-1)^{k+1}\\0\\2^k\end{pmatrix} = \frac13\begin{pmatrix}(-1)^{k+1}+2^k\\(-1)^{k+2}+2^{k+1}\\(-1)^{k+3}+2^{k+2}\end{pmatrix},$$
so $a_k = \frac13\bigl(2^k+(-1)^{k+1}\bigr)$.
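A quick check of the closed form (a sketch, not part of the original solution; the companion matrix above encodes the recurrence $a_{k+3} = 2a_{k+2} + a_{k+1} - 2a_k$ with $a_0 = 0$, $a_1 = a_2 = 1$):

```python
# Iterate the recurrence a_{k+3} = 2 a_{k+2} + a_{k+1} - 2 a_k directly
a = [0, 1, 1]                     # a_0, a_1, a_2 read off from x_0
for k in range(30):
    a.append(2 * a[k + 2] + a[k + 1] - 2 * a[k])

# Closed form derived above; 2^k + (-1)^(k+1) is always divisible by 3
closed = [(2**k + (-1)**(k + 1)) // 3 for k in range(len(a))]
print(a[:10])
```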
9.3.9 Set $x_k = \begin{pmatrix}c_k\\m_k\end{pmatrix}$, and let $A$ be the transition matrix so that $x_{k+1} = Ax_k$.
a. We have $A = \begin{pmatrix}0.7&0.1\\-0.2&1.0\end{pmatrix}$, and $p(t) = t^2-1.7t+0.72 = (t-0.9)(t-0.8)$, so the eigenvalues are $\lambda_1 = 0.8$ and $\lambda_2 = 0.9$. The vector $v_1 = \begin{pmatrix}1\\1\end{pmatrix}$ spans $E(0.8)$ and $v_2 = \begin{pmatrix}1\\2\end{pmatrix}$ spans $E(0.9)$, so, letting $P = \begin{pmatrix}1&1\\1&2\end{pmatrix}$ and $\Lambda = \begin{pmatrix}0.8&0\\0&0.9\end{pmatrix}$, we have
$$x_k = A^kx_0 = P\Lambda^kP^{-1}x_0 = \begin{pmatrix}1&1\\1&2\end{pmatrix}\begin{pmatrix}(0.8)^k&0\\0&(0.9)^k\end{pmatrix}\begin{pmatrix}2&-1\\-1&1\end{pmatrix}\begin{pmatrix}c_0\\m_0\end{pmatrix} = (2c_0-m_0)(0.8)^k\begin{pmatrix}1\\1\end{pmatrix} + (-c_0+m_0)(0.9)^k\begin{pmatrix}1\\2\end{pmatrix}.$$
As $k\to\infty$, $x_k\to0$, no matter what $\begin{pmatrix}c_0\\m_0\end{pmatrix}$ happens to be.
b. We have $A = \begin{pmatrix}1.3&0.2\\-0.1&1.0\end{pmatrix}$, and $p(t) = t^2-2.3t+1.32 = (t-1.1)(t-1.2)$, so the eigenvalues are $\lambda_1 = 1.1$ and $\lambda_2 = 1.2$. The vector $v_1 = \begin{pmatrix}1\\-1\end{pmatrix}$ spans $E(1.1)$ and $v_2 = \begin{pmatrix}2\\-1\end{pmatrix}$ spans $E(1.2)$, so, letting $P = \begin{pmatrix}1&2\\-1&-1\end{pmatrix}$ and $\Lambda = \begin{pmatrix}1.1&0\\0&1.2\end{pmatrix}$, we have
$$x_k = A^kx_0 = P\Lambda^kP^{-1}x_0 = \begin{pmatrix}1&2\\-1&-1\end{pmatrix}\begin{pmatrix}(1.1)^k&0\\0&(1.2)^k\end{pmatrix}\begin{pmatrix}-1&-2\\1&1\end{pmatrix}\begin{pmatrix}c_0\\m_0\end{pmatrix} = (c_0+2m_0)(1.1)^k\begin{pmatrix}-1\\1\end{pmatrix} + (c_0+m_0)(1.2)^k\begin{pmatrix}2\\-1\end{pmatrix}.$$
Since $(1.2)^k$ dominates $(1.1)^k$ as $k$ grows larger, we see that for any nonzero initial cat/mouse population, the cat population grows without bound and the mice die out.
c. Here we have $A = \begin{pmatrix}1.1&0.3\\0.1&0.9\end{pmatrix}$, and $p(t) = t^2-2t+0.96 = (t-0.8)(t-1.2)$, so the eigenvalues are $\lambda_1 = 0.8$ and $\lambda_2 = 1.2$. The vector $v_1 = \begin{pmatrix}1\\-1\end{pmatrix}$ spans $E(0.8)$ and $v_2 = \begin{pmatrix}3\\1\end{pmatrix}$ spans $E(1.2)$, so, letting $P = \begin{pmatrix}1&3\\-1&1\end{pmatrix}$ and $\Lambda = \begin{pmatrix}0.8&0\\0&1.2\end{pmatrix}$, we have
$$x_k = A^kx_0 = P\Lambda^kP^{-1}x_0 = \frac14\begin{pmatrix}1&3\\-1&1\end{pmatrix}\begin{pmatrix}(0.8)^k&0\\0&(1.2)^k\end{pmatrix}\begin{pmatrix}1&-3\\1&1\end{pmatrix}\begin{pmatrix}c_0\\m_0\end{pmatrix} = \tfrac14(c_0-3m_0)(0.8)^k\begin{pmatrix}1\\-1\end{pmatrix} + \tfrac14(c_0+m_0)(1.2)^k\begin{pmatrix}3\\1\end{pmatrix}.$$
Since $(0.8)^k\to0$ as $k\to\infty$, we see that for any nonzero initial cat/mouse population, the cat and mouse populations grow without bound and approach a limiting ratio of 3 : 1.
9.3.10 Since we know that $\bigl(e^{tA}\bigr)^{\bullet} = Ae^{tA}$ and $e^O = I$, it follows that $E(t) = e^{tA}$ is a solution. To check uniqueness, we proceed as in the proof of Proposition 3.1. Suppose $E(t)$ is any solution of this differential equation; then
$$\bigl(e^{-tA}E(t)\bigr)^{\bullet} = -Ae^{-tA}E(t) + e^{-tA}AE(t) = \bigl(-Ae^{-tA}+e^{-tA}A\bigr)E(t) = O,$$
and so $e^{-tA}E(t)$ is a constant matrix. Since at $t = 0$ this expression is equal to $I$, we must have $e^{-tA}E(t) = I$ for all $t$, and therefore $E(t) = e^{tA}$ for all $t$. (Here we are tacitly assuming, as we did in the text, that $e^{-A} = \bigl(e^A\bigr)^{-1}$. See Exercise 17.)
c. We have $p(t) = t^2-2t-8 = (t-4)(t+2)$, and so $A$ is diagonalizable with eigenbasis $v_1 = \begin{pmatrix}1\\1\end{pmatrix}$, corresponding to $\lambda = 4$, and $v_2 = \begin{pmatrix}-1\\1\end{pmatrix}$, corresponding to $\lambda = -2$. In this case
$$x(t) = e^{tA}x_0 = Pe^{t\Lambda}P^{-1}x_0 = \begin{pmatrix}1&-1\\1&1\end{pmatrix}\begin{pmatrix}e^{4t}&0\\0&e^{-2t}\end{pmatrix}\begin{pmatrix}3\\-2\end{pmatrix} = 3e^{4t}\begin{pmatrix}1\\1\end{pmatrix} - 2e^{-2t}\begin{pmatrix}-1\\1\end{pmatrix}.$$
f. Now we have $p(t) = -t^3+t = -t(t-1)(t+1)$, and here the eigenvectors are $v_1 = \begin{pmatrix}-1\\0\\1\end{pmatrix}$, corresponding to $\lambda = -1$, $v_2 = \begin{pmatrix}-2\\1\\2\end{pmatrix}$, corresponding to $\lambda = 0$, and $v_3 = \begin{pmatrix}-2\\1\\1\end{pmatrix}$, corresponding to $\lambda = 1$. Thus,
$$P = \begin{pmatrix}-1&-2&-2\\0&1&1\\1&2&1\end{pmatrix}, \qquad \Lambda = \begin{pmatrix}-1&0&0\\0&0&0\\0&0&1\end{pmatrix}, \qquad\text{and}\qquad P^{-1} = \begin{pmatrix}-1&-2&0\\1&1&1\\-1&0&-1\end{pmatrix},$$
so $P^{-1}x_0 = \begin{pmatrix}-1\\-2\\1\end{pmatrix}$ and $x(t) = e^{tA}x_0 = Pe^{t\Lambda}P^{-1}x_0 = -e^{-t}\begin{pmatrix}-1\\0\\1\end{pmatrix} - 2\begin{pmatrix}-2\\1\\2\end{pmatrix} + e^t\begin{pmatrix}-2\\1\\1\end{pmatrix}$.
$$y_1 = a_1e^{\sqrt6\,t} + b_1e^{-\sqrt6\,t}, \qquad y_2 = a_2\cos t + b_2\sin t$$
for appropriate values of the constants. Using the initial conditions, we easily determine that $a_1 = b_1 = b_2 = 1$ and $a_2 = -1$. Thus,
$$x = Py = \bigl(e^{\sqrt6\,t}+e^{-\sqrt6\,t}\bigr)\begin{pmatrix}1\\1\end{pmatrix} + (-\cos t+\sin t)\begin{pmatrix}-5\\2\end{pmatrix}.$$
b. Referring to Exercise 11b, we have
$$\ddot y = \begin{pmatrix}1&0\\0&-1\end{pmatrix}y, \qquad y_0 = P^{-1}\begin{pmatrix}-2\\2\end{pmatrix} = \begin{pmatrix}0\\2\end{pmatrix}, \qquad \dot y_0 = P^{-1}\begin{pmatrix}1\\3\end{pmatrix} = \begin{pmatrix}2\\1\end{pmatrix}.$$
Then we find
$$y_1 = a_1e^t + b_1e^{-t}, \qquad y_2 = a_2\cos t + b_2\sin t,$$
and, using the initial conditions, we determine that $a_1 = b_2 = 1$, $a_2 = 2$, and $b_1 = -1$. Thus,
$$x = Py = (e^t-e^{-t})\begin{pmatrix}1\\1\end{pmatrix} + (2\cos t+\sin t)\begin{pmatrix}-1\\1\end{pmatrix}.$$
$$y_1 = a_1e^{2t} + b_1e^{-2t}, \qquad y_2 = a_2\cos\sqrt2\,t + b_2\sin\sqrt2\,t,$$
d. Here we proceed "by hand." The system of differential equations can be rewritten explicitly as $\ddot x_1 = x_2$, $\ddot x_2 = 0$, so, using the initial conditions, we determine that $x_2 = t+2$ and hence $x_1 = \frac16t^3+t^2+2t+1$.
c. Now because of the different masses, we must be slightly more careful. In this case, $A = \begin{pmatrix}-3&2\\1&-2\end{pmatrix}$, whose eigenvalues are $-1$ and $-4$, with corresponding eigenvectors $\begin{pmatrix}1\\1\end{pmatrix}$ and $\begin{pmatrix}-2\\1\end{pmatrix}$. The general solution is
$$x(t) = (a_1\cos t + b_1\sin t)\begin{pmatrix}1\\1\end{pmatrix} + (a_2\cos2t + b_2\sin2t)\begin{pmatrix}-2\\1\end{pmatrix}.$$
9.3.14 As in Example 8, we write $J = 2I+B$, where $B = \begin{pmatrix}0&1&0\\0&0&1\\0&0&0\end{pmatrix}$. Then $B^2 = \begin{pmatrix}0&0&1\\0&0&0\\0&0&0\end{pmatrix}$ and $B^3 = O$. Thus,
$$e^{tJ} = \sum_{k=0}^\infty\frac{t^k}{k!}J^k = \sum_{k=0}^\infty\frac{t^k}{k!}\left(2^kI + k2^{k-1}B + \frac{k(k-1)}2\,2^{k-2}B^2\right)$$
$$= \sum_{k=0}^\infty\frac{(2t)^k}{k!}\,I + t\sum_{k=1}^\infty\frac{t^{k-1}}{(k-1)!}2^{k-1}B + \frac{t^2}2\sum_{k=2}^\infty\frac{t^{k-2}}{(k-2)!}2^{k-2}B^2$$
$$= e^{2t}I + te^{2t}B + \frac{t^2}2e^{2t}B^2 = e^{2t}\begin{pmatrix}1&t&\frac{t^2}2\\0&1&t\\0&0&1\end{pmatrix}.$$
9.3.15 We consider $x(t) = \begin{pmatrix}y(t)\\\dot y(t)\end{pmatrix}$. Then we obtain the system $\dot x = Ax$, where
a. $A = \begin{pmatrix}0&1\\2&1\end{pmatrix}$ and $x_0 = \begin{pmatrix}-1\\4\end{pmatrix}$. The eigenvalues of $A$ are $-1$ and $2$, with corresponding eigenvectors $\begin{pmatrix}1\\-1\end{pmatrix}$ and $\begin{pmatrix}1\\2\end{pmatrix}$. Thus, taking
$$P = \begin{pmatrix}1&1\\-1&2\end{pmatrix}, \qquad \Lambda = \begin{pmatrix}-1&0\\0&2\end{pmatrix}, \qquad\text{so}\qquad P^{-1} = \frac13\begin{pmatrix}2&-1\\1&1\end{pmatrix} \quad\text{and}\quad P^{-1}x_0 = \begin{pmatrix}-2\\1\end{pmatrix},$$
we have $x(t) = Pe^{t\Lambda}P^{-1}x_0 = -2e^{-t}\begin{pmatrix}1\\-1\end{pmatrix} + e^{2t}\begin{pmatrix}1\\2\end{pmatrix}$, so $y(t) = -2e^{-t}+e^{2t}$.
b. $A = \begin{pmatrix}0&1\\-1&2\end{pmatrix}$ and $x_0 = \begin{pmatrix}1\\2\end{pmatrix}$. Here $p(t) = t^2-2t+1 = (t-1)^2$ and $E(1)$ is spanned by $\begin{pmatrix}1\\1\end{pmatrix}$ alone, so $A$ is not diagonalizable. Taking $P = \begin{pmatrix}1&0\\1&1\end{pmatrix}$, we have $P^{-1}AP = I+N$ with $N = \begin{pmatrix}0&1\\0&0\end{pmatrix}$, so
$$x(t) = Pe^{t(I+N)}P^{-1}x_0 = e^tP(I+tN)P^{-1}\begin{pmatrix}1\\2\end{pmatrix} = e^t\begin{pmatrix}t+1\\t+2\end{pmatrix}.$$
This means that $y(t) = e^t(1+t)$ is the solution of the original differential equation.
Alternatively, we can guess (and prove by induction) that $A^k = \begin{pmatrix}1-k&k\\-k&k+1\end{pmatrix}$, $k\in\mathbb N$, and from this derive
$$e^{tA} = \sum_{k=0}^\infty\frac{t^k}{k!}A^k = \sum_{k=0}^\infty\frac{t^k}{k!}\begin{pmatrix}1-k&k\\-k&k+1\end{pmatrix} = e^t\begin{pmatrix}1-t&t\\-t&1+t\end{pmatrix}.$$
Then, as before, we have $x(t) = e^{tA}x_0 = e^t\begin{pmatrix}1-t&t\\-t&1+t\end{pmatrix}\begin{pmatrix}1\\2\end{pmatrix} = e^t\begin{pmatrix}t+1\\t+2\end{pmatrix}$.
9.3.16 We set $x(t) = \begin{pmatrix}y(t)\\\dot y(t)\end{pmatrix}$ and $A = \begin{pmatrix}0&1\\-b&-a\end{pmatrix}$, and we wish to solve the system $\dot x = Ax$. The eigenvalues of $A$ are $\lambda = \frac12\bigl(-a\pm\sqrt{a^2-4b}\bigr)$. When $a^2-4b\ne0$, $A$ is diagonalizable (perhaps over $\mathbb C$), and we obtain
$$x(t) = \begin{pmatrix}1&1\\\lambda_1&\lambda_2\end{pmatrix}\begin{pmatrix}e^{\lambda_1t}&0\\0&e^{\lambda_2t}\end{pmatrix}\begin{pmatrix}c_1\\c_2\end{pmatrix} = c_1e^{\lambda_1t}\begin{pmatrix}1\\\lambda_1\end{pmatrix} + c_2e^{\lambda_2t}\begin{pmatrix}1\\\lambda_2\end{pmatrix}.$$
Thus, when $a^2-4b > 0$, the general solution of the original ODE is $y(t) = c_1e^{\lambda_1t}+c_2e^{\lambda_2t}$. When $a^2-4b < 0$, $A$ has a pair of conjugate complex eigenvalues $\lambda = \alpha\pm\beta i$, and the general solution is $y(t) = e^{\alpha t}(c_1\cos\beta t + c_2\sin\beta t)$. Last, when $a^2-4b = 0$, $A$ has a repeated eigenvalue $\lambda$ with algebraic multiplicity 2 and geometric multiplicity 1, and, as in Exercise 15b,
$$x(t) = \begin{pmatrix}1&0\\\lambda&1\end{pmatrix}\begin{pmatrix}e^{\lambda t}&te^{\lambda t}\\0&e^{\lambda t}\end{pmatrix}\begin{pmatrix}c_1\\c_2\end{pmatrix} = (c_1+c_2t)e^{\lambda t}\begin{pmatrix}1\\\lambda\end{pmatrix} + c_2e^{\lambda t}\begin{pmatrix}0\\1\end{pmatrix},$$
from which we infer that the general solution of the original ODE is $y(t) = (c_1+c_2t)e^{\lambda t}$.
9.3.17 a. Since $A$ commutes with its own powers, we have $Ae^{tA} = e^{tA}A$. Differentiating $f(t) = e^{tA}e^{-tA}$ by the product rule, we obtain
$$\bigl(e^{tA}e^{-tA}\bigr)^{\bullet} = Ae^{tA}e^{-tA} + e^{tA}(-A)e^{-tA} = \bigl(Ae^{tA}-e^{tA}A\bigr)e^{-tA} = O.$$
This means the matrix function $f$ is constant, and so, in particular, $f(t) = f(0) = I$ for all $t$. This means that $e^{-tA} = \bigl(e^{tA}\bigr)^{-1}$. Setting $t = 1$, we infer that $e^{-A} = \bigl(e^A\bigr)^{-1}$.
b. Using the properties of the transpose, we have
$$\bigl(e^A\bigr)^T = \left(\sum_{k=0}^\infty\frac{A^k}{k!}\right)^T = \sum_{k=0}^\infty\frac{(A^k)^T}{k!} = \sum_{k=0}^\infty\frac{(A^T)^k}{k!} = e^{A^T},$$
so when $A$ is skew-symmetric, $\bigl(e^A\bigr)^T = e^{A^T} = e^{-A} = \bigl(e^A\bigr)^{-1}$, and $e^A$ is orthogonal.
since, by the definition of matrix exponential, eU will be upper triangular with diagonal entries the
exponential of the diagonal entries of U .
9.3.18 a. We have $D(\exp)(O)A = \dfrac{d}{dt}\Big|_{t=0}\exp(tA) = A$, so $D(\exp)(O) = I\colon\mathcal M_{n\times n}\to\mathcal M_{n\times n}$. Therefore, by the Inverse Function Theorem, $\exp$ has a $C^1$ local inverse mapping a neighborhood of $\exp(O) = I$ to a neighborhood of $O$.
b. We have seen in Example 7 that $e^{t\begin{pmatrix}0&-1\\1&0\end{pmatrix}} = \begin{pmatrix}\cos t&-\sin t\\\sin t&\cos t\end{pmatrix}$, so $e^{\begin{pmatrix}0&-\pi\\\pi&0\end{pmatrix}} = \begin{pmatrix}-1&0\\0&-1\end{pmatrix}$. On the other hand, we infer from Exercise 17c that $\det(e^A) > 0$ for every $A$, so $\begin{pmatrix}-2&0\\0&1\end{pmatrix}$ cannot be written as $e^A$ for any $A$.
9.3.19 Let $F(x) = x$. Then the associated flow is given by $\phi_t(x) = e^tx$. Let $\Omega = B(r)$ be the ball of radius $r$ centered at $0$ in $\mathbb R^n$ and let $V(r) = \mathrm{vol}(B(r))$. Then $\phi_t(\Omega) = B(e^tr)$, and so we have
$$\dot V(0) = rV'(r) = \int_{\partial\Omega}r\,dS = r\,\mathrm{area}(\partial\Omega),$$
so $V'(r) = \mathrm{area}(\partial\Omega)$, as required.
9.3.20 a. Fix t = t0 as suggested in the hint and x0 ∈ B(a, δ). Now consider the two functions
of s given by f (s) = φs+t0 (x0 ) and g(s) = φs (φt0 (x0 )). Then both f and g are solutions of the
differential equation ẋ(s) = F(x(s)), x(0) = φt0 (x0 ). Therefore, by the uniqueness result stated in
the problem, for sufficiently small s, we have f (s) = g(s), which means that φs+t0 = φs ◦ φt0 . Since
this holds for arbitrary (small) t0 , this proves the desired result.
b. We have φt ◦ φ−t = φ−t ◦ φt = φt+(−t) = φ0 , and φ0 (x) = x for all x. Therefore,
φ−t = (φt )−1 .
c. Each of the following functions $x$ is a solution of the given differential equation: for any $a \ge 0$, take
$$x(t) = \begin{cases} 0, & t \le a \\ (t-a)^2, & t > a. \end{cases}$$
9.3.21 Fix $t$. Let $W(s) = V(t+s)$. By Exercise 20, $V(t+s) = \operatorname{vol}(\varphi_{s+t}(\Omega)) = \operatorname{vol}(\varphi_s(\Omega'))$, where $\Omega' = \varphi_t(\Omega)$. By Proposition 3.5, we have
$$\dot V(t) = \dot W(0) = \int_{\Omega'} \operatorname{div}\mathbf F\,dV = \int_{\varphi_t(\Omega)} \operatorname{div}\mathbf F\,dV,$$
as required.
9.3.22 a. We differentiate the equation $\dot\varphi_t(\mathbf x) = \mathbf F(\varphi_t(\mathbf x))$ with respect to $\mathbf x$, using the chain rule, and use smoothness to interchange the space and time derivatives:
$$\frac{d}{dt}D(\varphi_t)(\mathbf x) = D\mathbf F(\varphi_t(\mathbf x))\,D(\varphi_t)(\mathbf x), \quad\text{so}\quad \dot J(t) = J(t)\operatorname{tr}\big(D\mathbf F(\varphi_t(\mathbf x))\big) = J(t)\operatorname{div}\mathbf F(\varphi_t(\mathbf x)).$$
Here we have used the result of Exercise 1.4.22c, as well as the observation that $\operatorname{div}\mathbf F = \operatorname{tr}(D\mathbf F)$. But this differential equation is easy to integrate:
$$\frac{\dot J(t)}{J(t)} = \operatorname{div}\mathbf F(\varphi_t(\mathbf x)) \implies \log J(t) - \log J(0) = \int_0^t \operatorname{div}\mathbf F(\varphi_s(\mathbf x))\,ds.$$
Now, $\varphi_0(\mathbf x) = \mathbf x$ for all $\mathbf x$, so $J(0) = \det D(\varphi_0)(\mathbf x) = 1$. Therefore, $J(t) = J(0)\,e^{\int_0^t \operatorname{div}\mathbf F(\varphi_s(\mathbf x))\,ds} = e^{\int_0^t \operatorname{div}\mathbf F(\varphi_s(\mathbf x))\,ds}$, as required.
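For a linear vector field the formula can be confirmed directly: if $\mathbf F(\mathbf x) = A\mathbf x$ (our own hypothetical example), the flow is $\varphi_t(\mathbf x) = e^{tA}\mathbf x$, so $J(t) = \det(e^{tA})$ and $\operatorname{div}\mathbf F = \operatorname{tr} A$ is constant, and the claimed formula reduces to $J(t) = e^{t\operatorname{tr}A}$:

```python
import numpy as np

def expm(M, terms=80):
    # Matrix exponential via its defining power series.
    result, term = np.eye(len(M)), np.eye(len(M))
    for k in range(1, terms):
        term = term @ M / k
        result = result + term
    return result

# Linear field F(x) = Ax: J(t) = det D(phi_t) = det(e^{tA}), div F = tr(A).
A = np.array([[0.5, -1.0], [2.0, 0.25]])   # arbitrary test matrix (our choice)
for t in (0.5, 1.0, 3.0):
    J = np.linalg.det(expm(t * A))
    # The integral of div F along the flow is just t * tr(A) here.
    assert np.isclose(J, np.exp(t * np.trace(A)))
```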
e. The eigenvalues of $A$ are $\lambda_1 = -3$, $\lambda_2 = \lambda_3 = 3$, with corresponding eigenvectors
$$\mathbf v_1 = \begin{bmatrix} -1 \\ -1 \\ 1 \end{bmatrix}, \quad \mathbf v_2 = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}, \quad \mathbf v_3 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}.$$
We use the Gram-Schmidt process to obtain an orthogonal basis for $E(3)$: $\mathbf w_2 = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}$, $\mathbf w_3 = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}$. Thus, an orthogonal matrix that diagonalizes $A$ is
$$Q = \begin{bmatrix} -\frac{1}{\sqrt 3} & -\frac{1}{\sqrt 2} & \frac{1}{\sqrt 6} \\[2pt] -\frac{1}{\sqrt 3} & \frac{1}{\sqrt 2} & \frac{1}{\sqrt 6} \\[2pt] \frac{1}{\sqrt 3} & 0 & \frac{2}{\sqrt 6} \end{bmatrix}.$$
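The matrix $A$ of this part is not reproduced in this excerpt, but it can be rebuilt from the stated eigendata (eigenvalue $-3$ on the span of $(-1,-1,1)$, eigenvalue $3$ on its orthogonal complement); the reconstruction below is our own, and the claimed $Q$ then checks out numerically:

```python
import numpy as np

s2, s3, s6 = np.sqrt(2), np.sqrt(3), np.sqrt(6)
Q = np.array([[-1/s3, -1/s2, 1/s6],
              [-1/s3,  1/s2, 1/s6],
              [ 1/s3,  0.0,  2/s6]])
# Candidate A rebuilt from the eigendata: A = 3I - 6 P, where P projects
# onto span{(-1,-1,1)}.  (A itself is not shown in this excerpt.)
A = 3 * np.eye(3) - 2 * np.outer([-1, -1, 1], [-1, -1, 1])
assert np.allclose(Q.T @ Q, np.eye(3))                  # Q is orthogonal
assert np.allclose(Q.T @ A @ Q, np.diag([-3.0, 3.0, 3.0]))  # Q^{-1} A Q = Lambda
```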
f. The eigenvalues of $A$ are $\lambda_1 = \lambda_2 = 0$ and $\lambda_3 = \lambda_4 = 2$, with corresponding eigenvectors
$$\mathbf v_1 = \begin{bmatrix} -1 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \quad \mathbf v_2 = \begin{bmatrix} 0 \\ -1 \\ 0 \\ 1 \end{bmatrix}, \quad \mathbf v_3 = \begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \quad \mathbf v_4 = \begin{bmatrix} 0 \\ 1 \\ 0 \\ 1 \end{bmatrix}.$$
An orthogonal matrix that diagonalizes $A$ is
$$Q = \frac{1}{\sqrt 2}\begin{bmatrix} -1 & 0 & 1 & 0 \\ 0 & -1 & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{bmatrix}.$$
9.4.2 By the Spectral Theorem, we know that $E(2)$ must be the orthogonal complement of $E(5)$. Since $A$ is a $3 \times 3$ matrix, $E(2)$ is one-dimensional, spanned by $\begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}$. Therefore, $A\begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} = 2\begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \\ 4 \end{bmatrix}$.
9.4.3 Since $\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$ gives a basis for $E(2)$, we deduce from the Spectral Theorem that $E(1) = E(2)^{\perp}$, and therefore a basis for $E(1)$ is $\begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}$, $\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}$. We then use the change-of-basis formula, Theorem 1.1, to construct $A$. Letting
$$\Lambda = \begin{bmatrix} 2 & & \\ & 1 & \\ & & 1 \end{bmatrix} \quad\text{and}\quad P = \begin{bmatrix} 1 & -1 & -1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix},$$
we have
$$A = P\Lambda P^{-1} = \frac{1}{3}\begin{bmatrix} 4 & 1 & 1 \\ 1 & 4 & 1 \\ 1 & 1 & 4 \end{bmatrix}.$$
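The change-of-basis computation is easy to confirm numerically (a sketch using numpy):

```python
import numpy as np

Lam = np.diag([2.0, 1.0, 1.0])
P = np.array([[1.0, -1.0, -1.0],
              [1.0,  1.0,  0.0],
              [1.0,  0.0,  1.0]])
A = P @ Lam @ np.linalg.inv(P)
assert np.allclose(A, np.array([[4, 1, 1], [1, 4, 1], [1, 1, 4]]) / 3)
# Sanity check on the eigendata: (1,1,1) lies in E(2).
assert np.allclose(A @ np.ones(3), 2 * np.ones(3))
```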
9.4. THE SPECTRAL THEOREM 291
9.4.4 Letting
$$\Lambda = \begin{bmatrix} 2 & \\ & 3 \end{bmatrix} \quad\text{and}\quad P = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix},$$
we use the change-of-basis formula to derive $A = P\Lambda P^{-1} = \dfrac{1}{2}\begin{bmatrix} 5 & -1 \\ -1 & 5 \end{bmatrix}$.
9.4.5 Since $A$ is symmetric, $A$ is diagonalizable. But since $\lambda$ is its only eigenvalue, we must have $P^{-1}AP = \lambda I$ for some invertible matrix $P$, and hence $A = P(\lambda I)P^{-1} = \lambda I$.
9.4.6 First of all, we see that B is symmetric and therefore diagonalizable. Since A, C, and
D are upper triangular, we read off their eigenvalues from the diagonal entries. A has eigenvalue
5 with algebraic multiplicity 3, but geometric multiplicity 2 (because rank(A − 5I) = 1), and is
therefore not diagonalizable. C has distinct eigenvalues and, according to Corollary 2.7, is therefore
diagonalizable. As far as D is concerned, the eigenvalues are 1—with algebraic multiplicity 2—and
2. We see that the matrix D − I has rank 1, and so the geometric multiplicity of the eigenvalue 1
is also 2. Therefore, the matrix D is diagonalizable as well.
9.4.7 Suppose A is diagonalizable and its eigenspaces are orthogonal. Then there are a diagonal
matrix Λ and an orthogonal matrix Q so that Q−1 AQ = Λ. Then A = QΛQ−1 = QΛQT and
AT = (QΛQT )T = QΛT QT = QΛQT = A, as desired.
9.4.11 Choose $\mathbf x_0 \in \mathbb R^n$ with $\|\mathbf x_0\| = 1$ so that $\|A\mathbf x_0\| = \|A\|$. By the proof of the Spectral Theorem, $\|A\mathbf x_0\|^2 = \mathbf x_0^T(A^TA)\mathbf x_0 \le \lambda\|\mathbf x_0\|^2$, so $\|A\|^2 \le \lambda$. On the other hand, by the properties of the norm (see Exercises 5.1.6 and 7), we have $\lambda \le \|A^TA\| \le \|A\|\|A^T\| = \|A\|^2$. Therefore, we have $\|A\| = \sqrt\lambda$.
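The identity $\|A\| = \sqrt\lambda$, with $\lambda$ the largest eigenvalue of $A^TA$, is easy to spot-check (the random test matrix below is our own choice):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))   # arbitrary test matrix (our choice)
lam_max = np.max(np.linalg.eigvalsh(A.T @ A))
# The operator norm ||A|| is the largest singular value, i.e. sqrt(lam_max).
assert np.isclose(np.linalg.norm(A, 2), np.sqrt(lam_max))
```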
Since, by hypothesis, all the $\lambda_i$'s are positive and some $c_i \ne 0$, we conclude that $A\mathbf x\cdot\mathbf x > 0$, so $A$ is positive definite. (Likewise, if all the eigenvalues are negative, then $A$ is negative definite.)
c. Replace the $>$'s with $\ge$'s in the arguments given in part b.
d. For any $\mathbf x \ne \mathbf 0$, we have $A\mathbf x\cdot\mathbf x = (C^TC)\mathbf x\cdot\mathbf x = C\mathbf x\cdot C\mathbf x = \|C\mathbf x\|^2$. Since $\operatorname{rank}(C) = n$, we know that $N(C) = \{\mathbf 0\}$, and so $\|C\mathbf x\|^2 > 0$, from which we conclude that $A$ is positive definite.
It follows from part b that the eigenvalues of A are all positive.
e. Since $A$ and $B$ are symmetric, the matrix $AB + BA$ is symmetric, but it need not be positive definite. Take $A = \begin{bmatrix} 1 & -2 \\ -2 & 5 \end{bmatrix}$ and $B = \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}$. Then $A$ and $B$ are positive definite, yet $AB + BA = \begin{bmatrix} 6 & -8 \\ -8 & 10 \end{bmatrix}$ has negative determinant and therefore cannot be positive definite.
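The counterexample is easy to confirm numerically:

```python
import numpy as np

A = np.array([[1.0, -2.0], [-2.0, 5.0]])
B = np.array([[3.0, 0.0], [0.0, 1.0]])
# Both A and B are positive definite...
assert np.all(np.linalg.eigvalsh(A) > 0) and np.all(np.linalg.eigvalsh(B) > 0)
S = A @ B + B @ A
assert np.allclose(S, [[6, -8], [-8, 10]])
# ...yet AB + BA has negative determinant, hence a negative eigenvalue.
assert np.linalg.det(S) < 0
```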
9.4.13 This follows immediately from parts b and d of Exercise 12. Nevertheless, here is a
self-contained proof.
Suppose $\mathbf v$ is an eigenvector of $A^TA$ with eigenvalue $\lambda$. Then $\lambda\|\mathbf v\|^2 = (A^TA)\mathbf v\cdot\mathbf v = A\mathbf v\cdot A\mathbf v = \|A\mathbf v\|^2$. Since $A$ is nonsingular, $A\mathbf v \ne \mathbf 0$, and we conclude that $\lambda > 0$.
Conversely, suppose the eigenvalues $\lambda_1, \dots, \lambda_n$ of $A^TA$ are all positive and let $\{\mathbf q_1, \dots, \mathbf q_n\}$ be an orthonormal basis for $\mathbb R^n$ consisting of eigenvectors of $A^TA$. Now let $\mathbf x \ne \mathbf 0$ be arbitrary. Writing $\mathbf x = \sum_{i=1}^n c_i\mathbf q_i$, we have
$$\|A\mathbf x\|^2 = A\mathbf x\cdot A\mathbf x = (A^TA)\mathbf x\cdot\mathbf x = \left(\sum_i c_i\lambda_i\mathbf q_i\right)\cdot\left(\sum_j c_j\mathbf q_j\right) = \sum_i \lambda_i c_i^2 > 0,$$
since some $c_i \ne 0$. Thus $A\mathbf x \ne \mathbf 0$ for every $\mathbf x \ne \mathbf 0$, and $A$ is nonsingular.
9.4.14 By the Spectral Theorem, we have a diagonal matrix $\Lambda$, whose diagonal entries are the eigenvalues of $A$, and an orthogonal matrix $Q$ so that $Q^{-1}AQ = \Lambda$. By assumption, the eigenvalues $\lambda_1, \dots, \lambda_n \ge 0$. Define $\sqrt\Lambda$ to be the diagonal matrix whose entries are $\sqrt{\lambda_1}, \dots, \sqrt{\lambda_n}$. Set $B = Q\sqrt\Lambda Q^{-1}$. Then $B^2 = \big(Q\sqrt\Lambda Q^{-1}\big)^2 = Q\big(\sqrt\Lambda\big)^2 Q^{-1} = Q\Lambda Q^{-1} = A$, as desired.
Now we must argue that there is a unique positive semidefinite square root of A. Let C be any
such. The eigenvalues of A are the squares of the eigenvalues of C and the eigenvalues of C are
nonnegative. Let λ1 , . . . , λs be the distinct eigenvalues of A. Then the only possible eigenvalues
of $C$ are $\sqrt{\lambda_1}, \dots, \sqrt{\lambda_s}$. We claim that $E_C(\sqrt{\lambda_i}) = E_A(\lambda_i)$ for all $i = 1, \dots, s$. Since $A = C^2$, any eigenvector of $C$ must be an eigenvector of $A$; thus, $E_C(\sqrt{\lambda_i}) \subset E_A(\lambda_i)$. But since there is a basis for $\mathbb R^n$ consisting of eigenvectors of $C$, it follows that
$$\sum_{i=1}^s \dim E_C(\sqrt{\lambda_i}) = n = \sum_{i=1}^s \dim E_A(\lambda_i),$$
and so, much as in the proof of Theorem 2.9, we conclude that $\dim E_C(\sqrt{\lambda_i}) = \dim E_A(\lambda_i)$, and hence that $E_C(\sqrt{\lambda_i}) = E_A(\lambda_i)$, for each $i = 1, \dots, s$. This means that the linear transformation $\mu_C$ is
uniquely determined by decomposing Rn into the eigenspaces of A, and on each such eigenspace,
µC must act by multiplication by the square root of the appropriate eigenvalue of A.
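The construction $B = Q\sqrt\Lambda Q^{-1}$ can be sketched in a few lines (the positive-definite test matrix below is our own choice):

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])   # positive definite test matrix (our choice)
lam, Q = np.linalg.eigh(A)               # Q orthogonal; lam >= 0 since A is PSD
B = Q @ np.diag(np.sqrt(lam)) @ Q.T      # B = Q sqrt(Lambda) Q^{-1}
assert np.allclose(B, B.T)                  # B is symmetric...
assert np.all(np.linalg.eigvalsh(B) >= 0)   # ...positive semidefinite...
assert np.allclose(B @ B, A)                # ...and B^2 = A
```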
9.4.16 Since $A - \lambda I$ is singular, it follows that $B$ is singular, and so there is a nonzero vector $\mathbf v \in \mathbb R^n$ so that $B\mathbf v = \mathbf 0$. Therefore, we have $B\mathbf v\cdot\mathbf v = 0$. So, using the symmetry of $A - aI$,
$$0 = B\mathbf v\cdot\mathbf v = (A - aI)\mathbf v\cdot(A - aI)\mathbf v + b^2\,\mathbf v\cdot\mathbf v = \|(A - aI)\mathbf v\|^2 + b^2\|\mathbf v\|^2.$$
Now, the only way the sum of two nonnegative numbers can be zero is for them both to be zero. That is, since $\mathbf v \ne \mathbf 0$, $\|\mathbf v\|^2 \ne 0$, and we infer that $b = 0$ and $(A - aI)\mathbf v = \mathbf 0$. So $\lambda = a$ is a real number, and $\mathbf v$ is the corresponding (real) eigenvector.
9.4.17 According to the Spectral Theorem, there is an orthonormal basis in whose coordinates $y_i$ the quadratic form $Q(\mathbf x) = A\mathbf x\cdot\mathbf x$ becomes $\tilde Q(\mathbf y) = \sum_{i=1}^n \lambda_i y_i^2$. Since $A$ is positive definite, we know that all the $\lambda_i$ are positive. The ellipsoid $E$ has the equation
$$\sum_{i=1}^n \left(\frac{y_i}{1/\sqrt{\lambda_i}}\right)^2 \le 1,$$
so an easy application of the Change of Variables Theorem tells us that $\operatorname{vol}(E) = \operatorname{vol}(\text{unit ball})/\sqrt{\lambda_1\lambda_2\cdots\lambda_n} = \operatorname{vol}(\text{unit ball})/\sqrt{\det A}$.
9.4.18 We write each of the quadratic forms in the form Ax · x for the appropriate symmetric
matrix A and then deal with the linear terms as is necessary.
[Figure for part a: the conic sketched in both the $x_1x_2$- and $y_1y_2$-coordinate systems.]
a. Here $A = \begin{bmatrix} 0 & 3 \\ 3 & -8 \end{bmatrix}$, and, with $Q = \dfrac{1}{\sqrt{10}}\begin{bmatrix} 1 & 3 \\ -3 & 1 \end{bmatrix}$, we have $Q^{-1}AQ = \Lambda = \begin{bmatrix} -9 & \\ & 1 \end{bmatrix}$. Letting $\mathbf x = Q\mathbf y$, the equation of the conic becomes $-9y_1^2 + y_2^2 = 9$, or, equivalently, $-y_1^2 + (y_2/3)^2 = 1$, which is a hyperbola with asymptotes $y_2 = \pm 3y_1$. The asymptotes in the $x_1x_2$-coordinates are given by $x_2 = 0$ and $x_2 = (3/4)x_1$.
b. Here $A = \begin{bmatrix} 3 & -1 \\ -1 & 3 \end{bmatrix}$, and, with $Q = \dfrac{1}{\sqrt 2}\begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}$, we have $Q^{-1}AQ = \Lambda = \begin{bmatrix} 2 & \\ & 4 \end{bmatrix}$. Letting $\mathbf x = Q\mathbf y$, the equation of the conic becomes $2y_1^2 + 4y_2^2 = 4$, or $\frac12 y_1^2 + y_2^2 = 1$, which is an ellipse with semimajor axis $\sqrt 2$ and semiminor axis $1$.
[Figures for parts b and c: the conics sketched in both coordinate systems.]
c. Here $A = \begin{bmatrix} 16 & 12 \\ 12 & 9 \end{bmatrix}$, and, with $Q = \dfrac{1}{5}\begin{bmatrix} 4 & -3 \\ 3 & 4 \end{bmatrix}$, we have $Q^{-1}AQ = \Lambda = \begin{bmatrix} 25 & \\ & 0 \end{bmatrix}$. Letting $\mathbf x = Q\mathbf y$, the equation of the conic becomes $25y_1^2 + 5y_2 = 5$, or $y_2 = 1 - 5y_1^2$, so this is a downward-opening parabola symmetric about the $y_2$-axis.
d. Here $A = \begin{bmatrix} 10 & 3 \\ 3 & 2 \end{bmatrix}$, and, with $Q = \dfrac{1}{\sqrt{10}}\begin{bmatrix} 3 & -1 \\ 1 & 3 \end{bmatrix}$, we have $Q^{-1}AQ = \Lambda = \begin{bmatrix} 11 & \\ & 1 \end{bmatrix}$. Letting $\mathbf x = Q\mathbf y$, the equation of the conic becomes $11y_1^2 + y_2^2 = 11$, or $y_1^2 + \frac{1}{11}y_2^2 = 1$, which is an ellipse with semimajor axis $\sqrt{11}$ and semiminor axis $1$.
[Figures for parts d and e: the conics sketched in both coordinate systems.]
e. Here $A = \begin{bmatrix} 7 & 6 \\ 6 & -2 \end{bmatrix}$, and, with $Q = \dfrac{1}{\sqrt 5}\begin{bmatrix} 2 & -1 \\ 1 & 2 \end{bmatrix}$, we have $Q^{-1}AQ = \Lambda = \begin{bmatrix} 10 & \\ & -5 \end{bmatrix}$. Letting $\mathbf x = Q\mathbf y$, the equation of the conic becomes $10y_1^2 - 5y_2^2 + 2\sqrt 5\, y_2 = 6$, or $10y_1^2 - 5\big(y_2 - \frac{1}{\sqrt 5}\big)^2 = 5$, which we can rewrite as $2y_1^2 - \big(y_2 - \frac{1}{\sqrt 5}\big)^2 = 1$. This is a hyperbola with center at $(0, 1/\sqrt 5)$ and asymptotes $y_2 = 1/\sqrt 5 \pm \sqrt 2\, y_1$. Thus, in the $x_1x_2$-coordinates, the asymptotes are given by $x_2 = \frac{1}{2-\sqrt 2}\big(1 + (1 + 2\sqrt 2)x_1\big) \approx 1.7 + 6.5x_1$ and $x_2 = \frac{1}{2+\sqrt 2}\big(1 + (1 - 2\sqrt 2)x_1\big) \approx 0.3 - 0.5x_1$.
9.4.19 We write each of the quadratic forms in the form Ax · x for the appropriate symmetric
matrix A and then deal with the linear terms as is necessary.
a. Here $A = \begin{bmatrix} 3 & 1 & 1 \\ 1 & 0 & 2 \\ 1 & 2 & 0 \end{bmatrix}$, and, with
$$Q = \begin{bmatrix} \frac{2}{\sqrt 6} & -\frac{1}{\sqrt 3} & 0 \\[2pt] \frac{1}{\sqrt 6} & \frac{1}{\sqrt 3} & -\frac{1}{\sqrt 2} \\[2pt] \frac{1}{\sqrt 6} & \frac{1}{\sqrt 3} & \frac{1}{\sqrt 2} \end{bmatrix},$$
we have $Q^{-1}AQ = \Lambda = \begin{bmatrix} 4 & & \\ & 1 & \\ & & -2 \end{bmatrix}$. Letting $\mathbf x = Q\mathbf y$, the equation of the quadric surface becomes $4y_1^2 + y_2^2 - 2y_3^2 = 4$, which is a hyperboloid of one sheet.
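The diagonalization in part a can be verified numerically:

```python
import numpy as np

s2, s3, s6 = np.sqrt(2), np.sqrt(3), np.sqrt(6)
A = np.array([[3.0, 1.0, 1.0],
              [1.0, 0.0, 2.0],
              [1.0, 2.0, 0.0]])
Q = np.array([[2/s6, -1/s3,  0.0],
              [1/s6,  1/s3, -1/s2],
              [1/s6,  1/s3,  1/s2]])
assert np.allclose(Q.T @ Q, np.eye(3))                      # Q is orthogonal
assert np.allclose(Q.T @ A @ Q, np.diag([4.0, 1.0, -2.0]))  # Q^{-1} A Q = Lambda
```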
[Figure for part a: the hyperboloid of one sheet in $x$- and $y$-coordinates.]
b. Here $A = \begin{bmatrix} 4 & -1 & -1 \\ -1 & 3 & 2 \\ -1 & 2 & 3 \end{bmatrix}$, and, with
$$Q = \begin{bmatrix} \frac{2}{\sqrt 6} & -\frac{1}{\sqrt 3} & 0 \\[2pt] \frac{1}{\sqrt 6} & \frac{1}{\sqrt 3} & -\frac{1}{\sqrt 2} \\[2pt] \frac{1}{\sqrt 6} & \frac{1}{\sqrt 3} & \frac{1}{\sqrt 2} \end{bmatrix},$$
we have $Q^{-1}AQ = \Lambda = \begin{bmatrix} 3 & & \\ & 6 & \\ & & 1 \end{bmatrix}$. Letting $\mathbf x = Q\mathbf y$, the equation of the quadric surface becomes $3y_1^2 + 6y_2^2 + y_3^2 = 6$, which is an ellipsoid with semi-axes $1$, $\sqrt 2$, and $\sqrt 6$.
[Figure for part b: the ellipsoid in $x$- and $y$-coordinates.]
c. Here we have $A = \begin{bmatrix} -1 & -2 & -5 \\ -2 & 2 & 2 \\ -5 & 2 & -1 \end{bmatrix}$, and, with
$$Q = \begin{bmatrix} \frac{1}{\sqrt 2} & \frac{1}{\sqrt 3} & -\frac{1}{\sqrt 6} \\[2pt] 0 & -\frac{1}{\sqrt 3} & -\frac{2}{\sqrt 6} \\[2pt] \frac{1}{\sqrt 2} & -\frac{1}{\sqrt 3} & \frac{1}{\sqrt 6} \end{bmatrix},$$
we have $Q^{-1}AQ = \Lambda = \begin{bmatrix} -6 & & \\ & 6 & \\ & & 0 \end{bmatrix}$. Letting $\mathbf x = Q\mathbf y$, the equation of the quadric surface becomes $-6y_1^2 + 6y_2^2 = 6$, or $-y_1^2 + y_2^2 = 1$, which we recognize as a hyperbolic cylinder.
[Figure for part c: the hyperbolic cylinder in $x$- and $y$-coordinates.]
d. Here we have $A = \begin{bmatrix} 2 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}$, and, with
$$Q = \begin{bmatrix} \frac{2}{\sqrt 6} & 0 & -\frac{1}{\sqrt 3} \\[2pt] \frac{1}{\sqrt 6} & -\frac{1}{\sqrt 2} & \frac{1}{\sqrt 3} \\[2pt] \frac{1}{\sqrt 6} & \frac{1}{\sqrt 2} & \frac{1}{\sqrt 3} \end{bmatrix},$$
we have $Q^{-1}AQ = \Lambda = \begin{bmatrix} 3 & & \\ & -1 & \\ & & 0 \end{bmatrix}$. Letting $\mathbf x = Q\mathbf y$, the equation of the quadric surface becomes $3y_1^2 - y_2^2 + \sqrt 3\, y_3 = 1$, which we recognize as a hyperbolic paraboloid (or saddle).
[Figure for part d: the hyperbolic paraboloid in $x$- and $y$-coordinates.]
e. Here we have $A = \begin{bmatrix} 3 & 2 & 4 \\ 2 & 0 & 2 \\ 4 & 2 & 3 \end{bmatrix}$, and, with
$$Q = \begin{bmatrix} -\frac{1}{\sqrt 2} & \frac{1}{3\sqrt 2} & \frac{2}{3} \\[2pt] 0 & -\frac{4}{3\sqrt 2} & \frac{1}{3} \\[2pt] \frac{1}{\sqrt 2} & \frac{1}{3\sqrt 2} & \frac{2}{3} \end{bmatrix},$$
we have $Q^{-1}AQ = \Lambda = \begin{bmatrix} -1 & & \\ & -1 & \\ & & 8 \end{bmatrix}$. Letting $\mathbf x = Q\mathbf y$, the equation of the quadric surface becomes $-y_1^2 - y_2^2 + 8y_3^2 = 8$, which is the equation of a hyperboloid of two sheets.
f. Here we have $A = \begin{bmatrix} 3 & 0 & 1 \\ 0 & -1 & 0 \\ 1 & 0 & 3 \end{bmatrix}$, and, with
$$Q = \begin{bmatrix} -\frac{1}{\sqrt 2} & \frac{1}{\sqrt 2} & 0 \\[2pt] 0 & 0 & 1 \\[2pt] \frac{1}{\sqrt 2} & \frac{1}{\sqrt 2} & 0 \end{bmatrix},$$
we have $Q^{-1}AQ = \Lambda = \begin{bmatrix} 2 & & \\ & 4 & \\ & & -1 \end{bmatrix}$. Letting $\mathbf x = Q\mathbf y$, the equation of the quadric surface becomes $2y_1^2 + 4y_2^2 - y_3^2 + 2y_3 = 0$, or $2y_1^2 + 4y_2^2 - (y_3 - 1)^2 = -1$, which we recognize as the equation of a hyperboloid of two sheets.
[Figures for parts e and f: the hyperboloids of two sheets in $x$- and $y$-coordinates.]
9.4.20 Given the quadratic form $Q(\mathbf x) = ax_1^2 + 2bx_1x_2 + cx_2^2$, define the symmetric matrix $A = \begin{bmatrix} a & b \\ b & c \end{bmatrix}$ as usual.
a. Let $\{\mathbf q_1, \mathbf q_2\}$ be an orthonormal basis for $\mathbb R^2$ consisting of the eigenvectors of $A$, and suppose $\mathbf q_1 = \begin{bmatrix} \cos\alpha \\ \sin\alpha \end{bmatrix}$, $\mathbf q_2 = \begin{bmatrix} -\sin\alpha \\ \cos\alpha \end{bmatrix}$. If $\lambda$ is an eigenvalue of $A$, then the corresponding eigenvector spans the nullspace of the matrix $A - \lambda I = \begin{bmatrix} a - \lambda & b \\ b & c - \lambda \end{bmatrix}$, and therefore, as we saw at the beginning of the section, is given by $\mathbf v = \begin{bmatrix} b \\ \lambda - a \end{bmatrix}$. The angle $\alpha$ this vector makes with the $x_1$-axis satisfies $\tan\alpha = (\lambda - a)/b$, and so
$$\tan 2\alpha = \frac{2\tan\alpha}{1 - \tan^2\alpha} = \frac{2\cdot\frac{\lambda - a}{b}}{1 - \left(\frac{\lambda - a}{b}\right)^2} = \frac{2b(\lambda - a)}{b^2 - (\lambda - a)^2} = \frac{2b(\lambda - a)}{b^2 - \lambda^2 + 2a\lambda - a^2} = \frac{2b(\lambda - a)}{(a - c)(\lambda - a)} = \frac{2b}{a - c},$$
where at the penultimate step we've used the equation $\lambda^2 - (a + c)\lambda + (ac - b^2) = 0$ to eliminate $\lambda^2$.
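The identity $\tan 2\alpha = 2b/(a - c)$ is easy to spot-check numerically (the coefficients $a$, $b$, $c$ below are our own choice, with $a \ne c$ and $b \ne 0$):

```python
import numpy as np

# A hypothetical quadratic form a x1^2 + 2b x1 x2 + c x2^2 (our choice).
a, b, c = 3.0, 1.0, 1.0
A = np.array([[a, b], [b, c]])
lam, Q = np.linalg.eigh(A)
q1 = Q[:, 0]                       # a unit eigenvector of A
alpha = np.arctan2(q1[1], q1[0])   # angle it makes with the x1-axis
# tan(2*alpha) is insensitive to the sign and choice of eigenvector.
assert np.isclose(np.tan(2 * alpha), 2 * b / (a - c))
```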
so the maximum value is λ, attained when y1 = ±1, and the minimum value is µ, attained when
y1 = 0. (Cf. our proof of the Spectral Theorem.)
9.4.21 a. Let A′ denote the (n − 1) × (n − 1) matrix obtained by deleting the nth row and
column from A. Expanding the given determinant in cofactors along the last column and then
along the last row, we see that the given determinant is equal to − det(A′ − tI); the roots of this
polynomial are the eigenvalues of A′ . The restriction of the quadratic form Q to the hyperplane
xn = 0 is positive definite if and only if all the eigenvalues of A′ are positive.
b. Without loss of generality, we may assume b is a unit vector. Choose an orthonormal
basis {v1 , . . . , vn } for Rn with vn = b. Let Q be the orthogonal matrix whose columns are
v1 , . . . , vn . Set à = Q−1 AQ. Note that Q(x) = Q̃(QT x) and b · x = 0 if and only if en · QT x = 0.
Now, by part a, we know that the quadratic form $\tilde Q(\mathbf y) = \mathbf y^T\tilde A\mathbf y$ is positive definite on the subspace $y_n = 0$ precisely when all the roots of
$$\det\begin{bmatrix} \tilde A - tI & \begin{matrix} 0 \\ \vdots \\ 0 \\ 1 \end{matrix} \\ \begin{matrix} 0 & \cdots & 0 & 1 \end{matrix} & 0 \end{bmatrix} = 0$$
are positive.
9.4.22 a. If $A$ is nonsingular, when we write $A = LDL^T$, the entries of $D$ are all nonzero. As suggested in the problem, we consider the "straight-line homotopy" $L_s$ between $I$ and $L$, defined by multiplying each non-diagonal entry of $L$ by $s$, $0 \le s \le 1$. Then we obtain a continuous path, $g(s) = A_s = L_s D L_s^T$, $0 \le s \le 1$, in $\mathcal M_{n\times n}$. Since $A_s$ is the product of nonsingular matrices, $A_s$
is nonsingular for every s. Now, A0 = D and A1 = A, and, by Exercise 8.7.9, the eigenvalues of
As change continuously as s varies. Since 0 is never an eigenvalue, as we watch each eigenvalue
(starting with the ith entry of D and seeing it change continuously until we get the ith eigenvalue of
A), the sign cannot change. Therefore, the number of positive eigenvalues of A equals the number
of positive entries in D, and the number of negative eigenvalues of A equals the number of negative
entries in D.
b. We know that rank(A) = rank(D) = r, so dim E(0) = dim N(A) = dim N(D), which
is the number of zero entries on the diagonal of D. If d1 , . . . , dk are the negative entries of D, choose
ε0 < min(|d1 |, . . . , |dk |). Then, for all 0 < ε ≤ ε0 , all the diagonal entries of D + εI are nonzero,
k negative and n − k positive. It follows from part a that A + εI has k negative eigenvalues and
n − k positive eigenvalues. Since ε can be chosen as small as we want, it follows that A must have
k negative eigenvalues (and r − k positive).
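The sign-counting argument of part a can be illustrated numerically; the $LDL^T$ routine below is our own sketch (it assumes all pivots are nonzero, so no row exchanges are needed), and it compares the signs of the pivots of $D$ with the signs of the eigenvalues:

```python
import numpy as np

def ldl_pivots(A):
    # Symmetric Gaussian elimination A = L D L^T; assumes nonzero pivots
    # (no row exchanges), which holds for the test matrix below.
    M = A.astype(float).copy()
    n = len(M)
    d = np.empty(n)
    for i in range(n):
        d[i] = M[i, i]
        if i + 1 < n:
            col = M[i+1:, i].copy()
            M[i+1:, i+1:] -= np.outer(col, col) / d[i]   # Schur complement
    return d

A = np.array([[2.0, 1.0, 1.0],
              [1.0, -1.0, 0.0],
              [1.0, 0.0, 3.0]])   # arbitrary symmetric test matrix (our choice)
d = ldl_pivots(A)
eig = np.linalg.eigvalsh(A)
# The number of positive (resp. negative) pivots matches the number of
# positive (resp. negative) eigenvalues.
assert np.sum(d > 0) == np.sum(eig > 0) == 2
assert np.sum(d < 0) == np.sum(eig < 0) == 1
```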