In Chapter 9.1 and Chapter 9.3, we saw that eigenvalues and eigenvectors allow us to better understand the behavior of the linear transformation from Rn to Rn given by an n×n matrix A.
When studying matrix powers and adjacency matrices, we often decomposed an arbitrary vector x as a linear combination of eigenvectors of A. But sometimes, the eigenvectors of A don’t span all of Rn, meaning we can’t write every vector as a linear combination of eigenvectors of A. When does this happen, and why do we care?
Recall that if A is an n×n matrix, then A has n eigenvalues; some of them may be complex (leading to complex-valued eigenvectors), and some may be repeated. These n eigenvalues are the roots of the characteristic polynomial of A,
p(λ)=det(A−λI)
The corresponding eigenvectors are the solutions to the system of equations (A−λI)v=0.
Suppose A has n linearly independent eigenvectors v1,v2,⋯,vn, with eigenvalues λ1,λ2,⋯,λn. What I’m about to propose next will seem a little arbitrary, but bear with me – you’ll see the power of this idea shortly. What happens if we multiply A by a matrix, say V, whose columns are the eigenvectors of A?
Each column of AV is A times one eigenvector: the i-th column of AV is Avi=λivi. Collecting these scaled eigenvectors as the columns of a matrix gives

AV=VΛ

Here, Λ (the capitalized Greek letter for λ) is a diagonal matrix of eigenvalues, in the same order as the eigenvectors in V.
We can take this a step further. If V is invertible – which it is here, since we assumed A has n linearly independent eigenvectors – then we can multiply both sides of the equation above by V−1 on the right to get
A=VΛV−1
The existence of this decomposition is contingent on V being invertible, which happens when A has n linearly independent eigenvectors. If A doesn’t have “enough” eigenvectors, then V wouldn’t be invertible, and we wouldn’t be able to decompose A in this way.
Let’s start with a familiar example, $A=\begin{bmatrix}1&2\\2&1\end{bmatrix}$, which was the first example we saw in Chapter 9.1. A has eigenvalues λ1=3 and λ2=−1, with corresponding eigenvectors $v_1=\begin{bmatrix}1\\1\end{bmatrix}$ and $v_2=\begin{bmatrix}1\\-1\end{bmatrix}$. These eigenvectors are linearly independent, so A is diagonalizable, and we can write A=VΛV−1:
V = np.array([[1, 1],
[1, -1]])
Lambda = np.diag([3, -1]) # New tool.
V_inv = np.linalg.inv(V)
V @ Lambda @ V_inv # The same as A!
array([[1., 2.],
[2., 1.]])
Is this decomposition unique? No, because we could have chosen a different set of eigenvectors, or written them in a different order (in which case Λ would be different). We’ll keep the eigenvectors in the same order, but instead let’s consider
$$V=\begin{bmatrix}2&-5\\2&5\end{bmatrix}$$
which is also a valid eigenvector matrix for A. (Remember that any scalar multiple of an eigenvector is still an eigenvector!)
It is still true that A=VΛV−1. This may seem a little unbelievable (I wasn’t convinced at first), but remember that V−1 “undoes” any changes in scaling we introduce to V.
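If you’re still skeptical, here’s a quick numerical sketch along the lines of the earlier code, using the rescaled eigenvector matrix above:

```python
import numpy as np

A = np.array([[1, 2],
              [2, 1]])

# Scaled eigenvectors as columns: 2 * [1, 1] and -5 * [1, -1].
V = np.array([[2, -5],
              [2,  5]])
Lambda = np.diag([3, -1])

# V inverse "undoes" the rescaling, so the product is still A.
print(V @ Lambda @ np.linalg.inv(V))
```

The printed result matches A exactly, even though V’s columns are scaled differently than before.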
Multiplying c1v1+c2v2=0 by A gives c1λ1v1+c2λ2v2=0, and multiplying it by λ1 gives c1λ1v1+c2λ1v2=0. Subtracting the second equation from the first, we get
c2(λ2−λ1)v2=0
Since λ1 and λ2 are distinct, λ2−λ1≠0, and since v2 is an eigenvector, v2≠0. So, we must have c2=0.
But if c2=0, then c1=0 also! We can see this by substituting c2=0 into the first equation we started working with.
c1v1+c2v2=0⟹c1v1=0⟹c1=0
So, the only solution to c1v1+c2v2=0 is c1=c2=0. This means that v1 and v2 are linearly independent.
Part 2
Now we need to show that if v1, v2, and v3 are eigenvectors of A with distinct eigenvalues (λ1, λ2, and λ3, respectively), then they are linearly independent. We need to show that the only solution to
c1v1+c2v2+c3v3=0
is c1=c2=c3=0. The general idea is the same as in Part 1: we need to use the fact that λ1, λ2, and λ3 are different.
Instead of multiplying c1v1+c2v2+c3v3=0 by A and then by λ1 and subtracting the two equations, here’s an idea: why don’t we multiply by A−λ1I and see what happens? Since (A−λ1I)v1=0, multiplying by A−λ1I leaves us with

c2(λ2−λ1)v2+c3(λ3−λ1)v3=0

Multiplying this by A−λ2I kills v2 in the same way. Note that each multiplication by A−λiI “killed” one of the eigenvectors. What we’re left with is
c3(λ3−λ1)(λ3−λ2)v3=0
But since λ1, λ2, and λ3 are distinct, λ3−λ1 and λ3−λ2 are non-zero, so we must have c3=0.
Substituting c3=0 into the equation involving c2 and c3 only gives
c2(λ2−λ1)v2=0⟹c2=0
And finally, substituting c2=c3=0 into c1v1+c2v2+c3v3=0 gives
c1v1+c2v2+c3v3=0⟹c1v1=0⟹c1=0
So, the only solution to c1v1+c2v2+c3v3=0 is c1=c2=c3=0. This means that v1, v2, and v3 are linearly independent! This argument generalizes to any number of eigenvectors, as long as they all correspond to distinct eigenvalues.
So, to compute $A^k$, we don’t need to multiply k matrices together (which would be a computational nightmare for large k). Instead, all we need to do is compute $V\Lambda^k V^{-1}$. And remember, Λ is a diagonal matrix, so computing $\Lambda^k$ is easy: we just raise each diagonal entry to the k-th power.
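To make this concrete, here’s a sketch using the matrix from earlier in the section (the variable names are mine):

```python
import numpy as np

A = np.array([[1, 2],
              [2, 1]])
V = np.array([[1, 1],
              [1, -1]])
Lambda = np.diag([3, -1])

k = 10
# Raising a diagonal matrix to a power just raises its diagonal entries.
Lambda_k = np.diag(np.diag(Lambda) ** k)

A_k = V @ Lambda_k @ np.linalg.inv(V)
# Compare against repeated multiplication.
print(np.allclose(A_k, np.linalg.matrix_power(A, k)))  # True
```

For large k, the diagonalized version needs only two matrix multiplications (plus one inverse), no matter how big k gets.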
Application 2: Understanding Linear Transformations
Remember that if A is an n×n matrix, then f(x)=Ax is a linear transformation from Rn to Rn. But, if A=VΛV−1, then
f(x)=Ax=VΛV−1x
This allows us to understand the effect of f on x in three stages. Remember that V’s columns are the eigenvectors of A, so the act of multiplying a vector y by V is equivalent to taking a linear combination of V’s columns (A’s eigenvectors) using the weights in y.
For example, $V\begin{bmatrix}3\\-4\end{bmatrix}$ says to take 3 of the first eigenvector and −4 of the second eigenvector, and add them together to get a new vector. If $V\begin{bmatrix}3\\-4\end{bmatrix}=\begin{bmatrix}1\\2\end{bmatrix}$, this says that taking 3 of the first eigenvector and −4 of the second eigenvector is the same as taking 1 of the first standard basis vector, $\begin{bmatrix}1\\0\end{bmatrix}$, and 2 of the second standard basis vector, $\begin{bmatrix}0\\1\end{bmatrix}$.
Intuitively, think of V as a matrix that takes in “amounts of each eigenvector” and outputs “amounts of each standard basis vector”, where the standard basis vectors are the columns of I. V−1 plays the opposite role:

V: amounts of each eigenvector → amounts of each standard basis vector
V−1: amounts of each standard basis vector → amounts of each eigenvector

In other words, multiplying V−1 by x expresses x as a linear combination of the eigenvectors of A. If z is the output of V−1x, then x=Vz, meaning z contains the amounts of each eigenvector needed to produce x.

(This is non-standard notation, since V is not a function and eigenvector amounts and standard basis vector amounts are not sets but concepts, but I hope this helps convey the role of V and V−1 in this context.)
Taking a step back, recall
f(x)=Ax=VΛV−1x
What this is saying is f does three things:
First, it takes x and expresses it as a linear combination of the eigenvectors of A, which is V−1x. (This contains the “amounts of each eigenvector” needed to produce x.)
Then, it takes that linear combination V−1x and scales each eigenvector by its corresponding eigenvalue, i.e. it scales or stretches each eigenvector by a different amount. Remember that diagonal matrices only scale; they don’t do anything else!
Finally, it takes the resulting scaled vector ΛV−1x and expresses it as a linear combination of the standard basis vectors, i.e. it combines the correct amounts of the stretched eigenvectors to get the final result.
multiply by V−1 (express in eigenvector basis) → multiply by Λ (scale eigenvectors) → multiply by V (express in standard basis)
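The three stages can be traced one at a time in numpy; this sketch reuses the example matrix from earlier in the section:

```python
import numpy as np

A = np.array([[1, 2],
              [2, 1]])
V = np.array([[1, 1],
              [1, -1]])
Lambda = np.diag([3, -1])

x = np.array([3, -1])

z = np.linalg.inv(V) @ x  # stage 1: amounts of each eigenvector in x
z_scaled = Lambda @ z     # stage 2: each amount scaled by its eigenvalue
result = V @ z_scaled     # stage 3: recombined in the standard basis

print(np.allclose(result, A @ x))  # True
```

The three stages together produce exactly the same vector as applying A directly.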
Most matrices we’ve seen in Chapter 9.1 and here so far in this section have been diagonalizable. I’ve tried to shield you from the messy world of non-diagonalizable matrices until now, but we’re ready to dive deeper.
Consider $A=\begin{bmatrix}1&1\\0&1\end{bmatrix}$. The characteristic polynomial of A is

$$p(\lambda)=\det(A-\lambda I)=\det\begin{bmatrix}1-\lambda&1\\0&1-\lambda\end{bmatrix}=(1-\lambda)^2$$
which has a double root of λ=1. So, A has a single eigenvalue λ=1. What are all of A’s eigenvectors? They’re solutions to Av=1v, i.e. (A−I)v=0. Let’s look at the null space of A−I:
$$A-I=\begin{bmatrix}1-1&1\\0&1-1\end{bmatrix}=\begin{bmatrix}0&1\\0&0\end{bmatrix}$$

The null space of A−I is spanned by the single vector $\begin{bmatrix}1\\0\end{bmatrix}$. So, A has a single line of eigenvectors, spanned by $v_1=\begin{bmatrix}1\\0\end{bmatrix}$.
So, does A have an eigenvalue decomposition? No, because we don’t have enough linearly independent eigenvectors to form an invertible V. If we try to form V by using v1 in the first column and a column of zeros in the second column, we’d have

$$V=\begin{bmatrix}1&0\\0&0\end{bmatrix}$$

but this matrix is not invertible, so we can’t write A=VΛV−1. A is not diagonalizable! This means that it’s harder to interpret A as a linear transformation through the lens of eigenvectors, and it’s harder to understand the long-run behavior of $A^k x$ for an arbitrary x.
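numpy doesn’t complain if you ask for the eigenvectors of this matrix, but the eigenvector matrix it returns is singular, which is the numerical symptom of what we just saw. (A sketch; the exact floating-point values returned may vary.)

```python
import numpy as np

A = np.array([[1., 1.],
              [0., 1.]])

eigvals, V = np.linalg.eig(A)
print(eigvals)  # [1. 1.]: a single repeated eigenvalue

# numpy returns two eigenvector columns, but they're (numerically) parallel,
# so V has rank 1 and can't be inverted to form A = V Λ V^{-1}.
print(np.linalg.matrix_rank(V))
```

The rank-deficient V is exactly the "not enough linearly independent eigenvectors" problem, seen through floating point.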
Let me emphasize what I mean by A needing to have n linearly independent eigenvectors. Above, we found that $\begin{bmatrix}1\\0\end{bmatrix}$ is an eigenvector of A, which means that any scalar multiple of $\begin{bmatrix}1\\0\end{bmatrix}$ is also an eigenvector of A. But, the matrix

$$\begin{bmatrix}1&2\\0&0\end{bmatrix}$$
is not invertible, meaning it can’t be the V in A=VΛV−1.
The 2×2 identity matrix $I=\begin{bmatrix}1&0\\0&1\end{bmatrix}$ has the characteristic polynomial $p(\lambda)=(\lambda-1)^2$. So, I has a single eigenvalue λ=1, just like the last example.

But, λ=1 corresponds to two different eigenvector directions! I can pick any two linearly independent vectors in R2 and they’ll both be eigenvectors for I. For example, let $v_1=\begin{bmatrix}1\\-3\end{bmatrix}$ and $v_2=\begin{bmatrix}11\\98\end{bmatrix}$, meaning

$$V=\begin{bmatrix}1&11\\-3&98\end{bmatrix}$$

The matrix Λ is just $\begin{bmatrix}1&0\\0&1\end{bmatrix}$, which is the same as I itself. This means that as long as V is invertible (i.e. as long as I pick two linearly independent vectors in R2),
VΛV−1=VIV−1=VV−1=I
which means that indeed, I is diagonalizable. (Any matrix that is diagonal to begin with is diagonalizable: in A=VΛV−1, we can take V=I and Λ=A.)
If you’d rather look at this example through the lens of solving systems of equations, an eigenvector $v=\begin{bmatrix}a\\b\end{bmatrix}$ of I with eigenvalue λ=1 satisfies

$$Iv=\begin{bmatrix}1&0\\0&1\end{bmatrix}\begin{bmatrix}a\\b\end{bmatrix}=\begin{bmatrix}a\\b\end{bmatrix}=1\begin{bmatrix}a\\b\end{bmatrix}$$
But, as a system this just says a=a and b=b, which is true for any v. Think of a and b both as independent variables; the set of possible v’s, then, is two dimensional (and is all of R2).
Consider $A=\begin{bmatrix}1&2\\2&4\end{bmatrix}$. A is not invertible (column 2 is double column 1), so it has an eigenvalue of λ1=0, which corresponds to the eigenvector $v_1=\begin{bmatrix}-2\\1\end{bmatrix}$.

It also has an eigenvalue of λ2=5, corresponding to the eigenvector $v_2=\begin{bmatrix}1\\2\end{bmatrix}$. (Remember, the quick way to spot this eigenvalue is that the sum of the eigenvalues of A equals the trace of A, which is 1+4=5; since 0 is an eigenvalue, the other eigenvalue must be 5.)
Does A have an eigenvalue decomposition? Let’s see if anything goes wrong when we try to construct the eigenvector matrix V and the diagonal matrix Λ and multiply out VΛV−1.
$$V=\begin{bmatrix}-2&1\\1&2\end{bmatrix},\quad \Lambda=\begin{bmatrix}0&0\\0&5\end{bmatrix}$$

$$V\Lambda V^{-1}=\begin{bmatrix}-2&1\\1&2\end{bmatrix}\begin{bmatrix}0&0\\0&5\end{bmatrix}\begin{bmatrix}-2/5&1/5\\1/5&2/5\end{bmatrix}=\begin{bmatrix}1&2\\2&4\end{bmatrix}$$
Everything worked out just fine! A is diagonalizable, even though it’s not invertible.
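Here’s a sketch confirming both facts at once, using the V and Λ we just built:

```python
import numpy as np

A = np.array([[1, 2],
              [2, 4]])
V = np.array([[-2, 1],
              [ 1, 2]])
Lambda = np.diag([0, 5])

print(np.isclose(np.linalg.det(A), 0))                # True: A is not invertible...
print(np.allclose(V @ Lambda @ np.linalg.inv(V), A))  # True: ...but A is diagonalizable
```

Invertibility and diagonalizability are separate questions: one is about whether 0 is an eigenvalue, the other is about whether there are enough linearly independent eigenvectors.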
As we saw in an earlier example, the matrix $\begin{bmatrix}1&1\\0&1\end{bmatrix}$ does not have two linearly independent eigenvectors, and so is not diagonalizable. I’d like to dive deeper into identifying when a matrix is diagonalizable.
Suppose an n×n matrix A has the characteristic polynomial
$$p(\lambda)=(\lambda+1)^2\lambda(\lambda-1)^3(\lambda-4)^2(\lambda-5)(\lambda-12)^2$$
What is n (i.e. the number of rows/columns of A)?
What is the determinant of A?
What are all of A’s eigenvalues and their algebraic multiplicities?
Solution
n is equal to the sum of the exponents in the factored characteristic polynomial, which is 2+1+3+2+1+2=11. Remember that p(λ) is a polynomial of degree n.
The determinant of A is the product of the eigenvalues. Notice that p(λ) has a factor of λ, i.e. p(0)=0, meaning 0 is an eigenvalue, so det(A)=0.
A has eigenvalues:
-1 with multiplicity AM(−1)=2
0 with multiplicity AM(0)=1
1 with multiplicity AM(1)=3
4 with multiplicity AM(4)=2
5 with multiplicity AM(5)=1
12 with multiplicity AM(12)=2
These algebraic multiplicities come from the exponents of the factors in p(λ).
The algebraic multiplicity of an eigenvalue, as the name suggests, is purely a property of the characteristic polynomial. Alone, algebraic multiplicities don’t tell us whether or not a matrix is diagonalizable. Instead, we’ll need to look at another form of multiplicity alongside the algebraic multiplicity.
The geometric multiplicity of an eigenvalue can’t be determined from the characteristic polynomial alone – instead, GM(λi) is the dimension of nullsp(A−λiI), i.e. the number of linearly independent vectors needed to span that null space. But in general,
1≤GM(λi)≤AM(λi)
Think of the algebraic multiplicity of an eigenvalue as the “potential” number of linearly independent eigenvectors for an eigenvalue, sort of like the number of slots we have for that eigenvalue. The geometric multiplicity, on the other hand, is the number of linearly independent eigenvectors we actually have for that eigenvalue. As we’ll see in the following examples, when the geometric multiplicity is less than the algebraic multiplicity for any eigenvalue, the matrix in question is not diagonalizable.
In each of the following matrices A, we’ll
Find the algebraic multiplicity of each of A’s eigenvalues.
For each eigenvalue, find the geometric multiplicity and a basis for the eigenspace.
Conclude whether A is diagonalizable.
As usual, attempt these examples on your own first before peeking at the solutions.
First, we’ll find the characteristic polynomial of $A=\begin{bmatrix}1&1&0\\0&1&1\\0&0&1\end{bmatrix}$:

$$p(\lambda)=\det(A-\lambda I)=\begin{vmatrix}1-\lambda&1&0\\0&1-\lambda&1\\0&0&1-\lambda\end{vmatrix}=(1-\lambda)^3$$
(You should practice quickly finding the characteristic polynomial of a 3×3 matrix; Chapter 9.2 has relevant examples.)
This tells us that A has a single eigenvalue λ=1, with algebraic multiplicity AM(1)=3.
The geometric multiplicity of λ=1 is dim(nullsp(A−I)). Let’s look at A−I:
$$A-I=\begin{bmatrix}0&1&0\\0&0&1\\0&0&0\end{bmatrix}$$
A−I has rank 2, so dim(nullsp(A−I))=3−2=1 (from the rank-nullity theorem), so the geometric multiplicity of λ=1 is GM(1)=1.
The vector $v_1=\begin{bmatrix}1\\0\\0\end{bmatrix}$ is an eigenvector for λ=1, and so is any scalar multiple of v1; I found this by noticing that the first column of A−I is all zeros. So,

$$\text{nullsp}(A-I)=\text{span}\left(\left\{\begin{bmatrix}1\\0\\0\end{bmatrix}\right\}\right)$$
Since GM(1)=1<AM(1)=3, A is not diagonalizable. A only has one linearly independent eigenvector!
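The rank-nullity computation above can be checked numerically; this sketch uses the matrix from this example:

```python
import numpy as np

A = np.array([[1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]])

rank = np.linalg.matrix_rank(A - np.eye(3))
gm = 3 - rank  # rank-nullity: dim(nullsp(A - I)) = n - rank(A - I)
print(rank, gm)  # 2 1
```

The geometric multiplicity of 1 falls short of the algebraic multiplicity of 3, so the matrix is not diagonalizable.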
So, the eigenvalues are λ1=1 and λ2=0.5, each with algebraic multiplicity AM(1)=AM(0.5)=1.
The fact that A has two distinct eigenvalues alone tells us that A is diagonalizable, since each eigenvalue has exactly one independent eigenvector (which really means a line of eigenvectors, or a 1-dimensional eigenspace) and eigenvectors for different eigenvalues are always linearly independent. Let’s find those eigenvectors to be sure.
For λ1=1, we solve (A−I)v=0:
$$A-I=\begin{bmatrix}-0.2&0.3\\0.2&-0.3\end{bmatrix}$$

The null space of this matrix is spanned by $\begin{bmatrix}3\\2\end{bmatrix}$, so $\text{nullsp}(A-I)=\text{span}\left(\left\{\begin{bmatrix}3\\2\end{bmatrix}\right\}\right)$ and GM(1)=1.
For λ2=0.5, we solve (A−0.5I)v=0:
$$A-0.5I=\begin{bmatrix}0.3&0.3\\0.2&0.2\end{bmatrix}$$

The null space of this matrix is spanned by $\begin{bmatrix}-1\\1\end{bmatrix}$, so $\text{nullsp}(A-0.5I)=\text{span}\left(\left\{\begin{bmatrix}-1\\1\end{bmatrix}\right\}\right)$ and GM(0.5)=1.
Since A has two linearly independent eigenvectors, it is diagonalizable: we can take $V=\begin{bmatrix}3&-1\\2&1\end{bmatrix}$ and $\Lambda=\begin{bmatrix}1&0\\0&0.5\end{bmatrix}$.
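As a numerical check, here’s a sketch; note that I’m reconstructing A from the A−I shown above, so treat the entries of A as an assumption:

```python
import numpy as np

# Assumed: A - I = [[-0.2, 0.3], [0.2, -0.3]], so A should be:
A = np.array([[0.8, 0.3],
              [0.2, 0.7]])

# Eigenvectors [3, 2] and [-1, 1] as columns, eigenvalues 1 and 0.5.
V = np.array([[3, -1],
              [2,  1]])
Lambda = np.diag([1, 0.5])

print(np.allclose(V @ Lambda @ np.linalg.inv(V), A))  # True
```

Multiplying the decomposition back together recovers A, as expected.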
So, A has eigenvalues λ1=3 with AM(3)=2 and λ2=1 with AM(1)=1. There’s a quicker way we could have spotted A’s eigenvalues, discussed after the solutions box.
For λ1=3, we solve (A−3I)v=0:
$$A-3I=\begin{bmatrix}-1&1&0\\1&-1&0\\0&0&0\end{bmatrix}$$

rank(A−3I)=1 (since every column is a multiple of $\begin{bmatrix}-1\\1\\0\end{bmatrix}$), so dim(nullsp(A−3I))=3−1=2, which means GM(3)=2. At this point, we can conclude that A is diagonalizable! But, let’s find a basis for the eigenspace nullsp(A−3I) to be thorough.

Two linearly independent vectors in nullsp(A−3I) are $\begin{bmatrix}1\\1\\0\end{bmatrix}$ and $\begin{bmatrix}0\\0\\1\end{bmatrix}$, so $\text{nullsp}(A-3I)=\text{span}\left(\left\{\begin{bmatrix}1\\1\\0\end{bmatrix},\begin{bmatrix}0\\0\\1\end{bmatrix}\right\}\right)$. This is just one possible basis for the eigenspace; it’s not the only one.

What this says is that any linear combination of $\begin{bmatrix}1\\1\\0\end{bmatrix}$ and $\begin{bmatrix}0\\0\\1\end{bmatrix}$ is also an eigenvector for λ=3. For example, $\begin{bmatrix}2&1&0\\1&2&0\\0&0&3\end{bmatrix}\begin{bmatrix}-1\\-1\\13\end{bmatrix}=3\begin{bmatrix}-1\\-1\\13\end{bmatrix}$.
For λ2=1, we solve (A−I)v=0:
$$A-I=\begin{bmatrix}1&1&0\\1&1&0\\0&0&2\end{bmatrix}$$

Since AM(1)=1, we know that there can only be one linearly independent eigenvector for λ2=1, so GM(1)=1 too. One such eigenvector is $\begin{bmatrix}1\\-1\\0\end{bmatrix}$, so $\text{nullsp}(A-I)=\text{span}\left(\left\{\begin{bmatrix}1\\-1\\0\end{bmatrix}\right\}\right)$.
Since A has three linearly independent eigenvectors, it is diagonalizable: for example, we can take $V=\begin{bmatrix}1&0&1\\1&0&-1\\0&1&0\end{bmatrix}$ and $\Lambda=\begin{bmatrix}3&0&0\\0&3&0\\0&0&1\end{bmatrix}$.
In the solutions above, I found A’s eigenvalues the traditional way, by solving for the roots of A’s characteristic polynomial. But there’s a quicker way to find A’s eigenvalues in this case that I want to highlight. A is a block diagonal matrix:
$$A=\begin{bmatrix}2&1&0\\1&2&0\\0&0&3\end{bmatrix}=\begin{bmatrix}A_1&0_{2\times1}\\0_{1\times2}&A_2\end{bmatrix}$$

In this case, $A_1=\begin{bmatrix}2&1\\1&2\end{bmatrix}$ and $A_2=\begin{bmatrix}3\end{bmatrix}$. Think of this as us placing A1 and A2 on the diagonal of A, and then filling the rest of A with zeros.

In general, if A is a block diagonal matrix, then the eigenvalues of A are the eigenvalues of the blocks on the diagonal, combined. To see why this is true, suppose you know that λ is an eigenvalue of one of A’s diagonal blocks with a corresponding eigenvector of v. To get a corresponding eigenvector of A, just extend v by placing 0’s in the positions of the other blocks. In other words, if v is an eigenvector of A1, then $\begin{bmatrix}v\\0\\\vdots\\0\end{bmatrix}$ is an eigenvector of A.
Here, the eigenvalues of A1 are 1 and 3, and the sole eigenvalue of A2 is 3, so A’s eigenvalues are 1 (with algebraic multiplicity 1) and 3 (with algebraic multiplicity 2).
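Here’s a sketch of the block diagonal construction in numpy, using np.block to place A1 and A2 on the diagonal:

```python
import numpy as np

A1 = np.array([[2, 1],
               [1, 2]])
A2 = np.array([[3]])

# Place A1 and A2 on the diagonal, filling the rest with zeros.
A = np.block([[A1, np.zeros((2, 1))],
              [np.zeros((1, 2)), A2]])

# A's eigenvalues are A1's (1 and 3) combined with A2's (3).
print(np.sort(np.linalg.eigvals(A)))  # [1. 3. 3.]
```

The combined spectrum matches what we read off from the blocks, without ever touching A’s characteristic polynomial.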
Consider $A=\begin{bmatrix}2&1&0\\0&2&0\\0&0&3\end{bmatrix}$. Is A diagonalizable? If so, find V and Λ such that A=VΛV−1. Tip: Use the fact that A is upper triangular to find its eigenvalues quickly. We first discussed this trick in Chapter 9.2.
Solution
A is upper triangular, so its eigenvalues are the entries on the diagonal: λ1=2, λ2=2, λ3=3. So, AM(2)=2 and AM(3)=1.
For λ1=2, we solve (A−2I)v=0:
$$A-2I=\begin{bmatrix}0&1&0\\0&0&0\\0&0&1\end{bmatrix}$$

rank(A−2I)=2, so dim(nullsp(A−2I))=1, so GM(2)=1. At this point, we can conclude that A is not diagonalizable, since GM(2)<AM(2). Noticing that the first column of A−2I is all zeros, $v_1=\begin{bmatrix}1\\0\\0\end{bmatrix}$ is an eigenvector, and

$$\text{nullsp}(A-2I)=\text{span}\left(\left\{\begin{bmatrix}1\\0\\0\end{bmatrix}\right\}\right)$$
For λ3=3, we solve (A−3I)v=0:
$$A-3I=\begin{bmatrix}-1&1&0\\0&-1&0\\0&0&0\end{bmatrix}$$
Using similar logic, GM(3)=1 and
$$\text{nullsp}(A-3I)=\text{span}\left(\left\{\begin{bmatrix}0\\0\\1\end{bmatrix}\right\}\right)$$
A only has two linearly independent eigenvectors – one for λ=2 and one for λ=3 – but it is a 3×3 matrix, and hence it’s not diagonalizable. (Also, for λ1=2, the geometric multiplicity, 1, is less than the algebraic multiplicity, 2, but we need equality for all eigenvalues in order for A to be diagonalizable.)
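The same rank-nullity check as before confirms the shortfall; this sketch uses the matrix from this example:

```python
import numpy as np

A = np.array([[2, 1, 0],
              [0, 2, 0],
              [0, 0, 3]])

# Geometric multiplicity of each eigenvalue via rank-nullity.
for lam in [2, 3]:
    gm = 3 - np.linalg.matrix_rank(A - lam * np.eye(3))
    print(lam, gm)
# GM(2) = 1 < AM(2) = 2, so A is not diagonalizable.
```

The geometric multiplicities sum to 2, not 3, which is exactly the "missing eigenvector" that blocks diagonalization.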
In the previous examples, we were given a matrix A and then found its eigenvalues and eigenvectors. Let’s go in the other direction.
Suppose A is a 3×3 matrix with:
eigenvalue λ1=3 with eigenvector $v_1=\begin{bmatrix}2\\0\\1\end{bmatrix}$

eigenvalue λ2=−2 with eigenspace

$$\text{span}\left(\left\{\begin{bmatrix}1\\1\\0\end{bmatrix},\begin{bmatrix}0\\1\\2\end{bmatrix}\right\}\right)$$
Find A.
Solution
If we’re given eigenvalues and enough linearly independent eigenvectors, we can build A directly from the decomposition A=VΛV−1.
Since the eigenspace for λ=−2 is 2-dimensional, we can use $\begin{bmatrix}1\\1\\0\end{bmatrix}$ and $\begin{bmatrix}0\\1\\2\end{bmatrix}$ as two linearly independent eigenvectors for λ=−2. So, one valid choice is

$$V=\begin{bmatrix}2&1&0\\0&1&1\\1&0&2\end{bmatrix},\quad \Lambda=\begin{bmatrix}3&0&0\\0&-2&0\\0&0&-2\end{bmatrix}$$

where the columns of V are the eigenvectors, and the diagonal entries of Λ are the corresponding eigenvalues in the same order.
The main takeaway is that if you know a full eigenbasis and the matching eigenvalues, you can reconstruct the original matrix by putting the eigenvectors into V, the eigenvalues into Λ, and multiplying VΛV−1.
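Here’s that reconstruction carried out in numpy (a sketch; the variable names are mine):

```python
import numpy as np

# Columns are the eigenvectors [2, 0, 1], [1, 1, 0], [0, 1, 2].
V = np.array([[2, 1, 0],
              [0, 1, 1],
              [1, 0, 2]])
Lambda = np.diag([3, -2, -2])

A = V @ Lambda @ np.linalg.inv(V)

# Sanity check: A should send each eigenvector to eigenvalue times itself.
print(np.allclose(A @ np.array([2, 0, 1]), 3 * np.array([2, 0, 1])))   # True
print(np.allclose(A @ np.array([1, 1, 0]), -2 * np.array([1, 1, 0])))  # True
```

Any invertible choice of eigenvector columns (with the eigenvalues in matching order) produces the same A.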
In Chapter 9.5, I want to build on our understanding of diagonalizability by focusing on a special class of matrices: symmetric matrices. It turns out that symmetric matrices are always diagonalizable, and beyond that, their corresponding eigenvectors obey a really nice property.
For each statement, determine whether it is true or false and provide a brief justification. (For false statements, provide a counterexample.)
True or False: If all of A’s eigenvalues are distinct, then A is diagonalizable.
True or False: If all of A’s eigenvalues are positive, then A is diagonalizable.
True or False: If A is diagonalizable, then A is invertible.
True or False: If A is invertible, then A is diagonalizable.
Solution
If all of A’s eigenvalues are distinct, then A is diagonalizable. True. Remember that the minimum possible geometric multiplicity is 1, so if all eigenvalues are distinct, each has an algebraic and geometric multiplicity of 1. And as we proved earlier in this section, eigenvectors for different eigenvalues are always linearly independent. Pairing these two facts, A must have n linearly independent eigenvectors, and so is diagonalizable.
If all of A’s eigenvalues are positive, then A is diagonalizable. False. In the example above titled “Another Non-Diagonalizable Matrix”, A had eigenvalues of 2, 2, and 3, all of which are positive, but A was not diagonalizable.
If A is diagonalizable, then A is invertible. False. We saw a counterexample earlier in this section: $A=\begin{bmatrix}1&2\\2&4\end{bmatrix}$ has 0 as an eigenvalue, so it’s not invertible, yet it is diagonalizable.
If A is invertible, then A is diagonalizable. False. The matrix from the example “Another Non-Diagonalizable Matrix” above had eigenvalues of 2, 2, and 3, none of which are 0, meaning A is invertible, but A was not diagonalizable.