So far, we’ve found the eigenvalues and eigenvectors of a matrix by eyeballing them or reasoning about them geometrically, but this is not a sustainable strategy. Let’s develop a more systematic approach.
We’re looking for combinations of scalars, λ, and vectors, v, such that
Av=λv
In some ways, we’re trying to “solve” for λ and v given A. Let’s experiment a little:
$$
\begin{aligned}
Av &= \lambda v \\
Av - \lambda v &= 0 \\
(A - \lambda I)v &= 0
\end{aligned}
$$
What the above says is that if v is an eigenvector of A with eigenvalue λ, then v is in the null space of A−λI. But, since eigenvectors can’t be the zero vector, this means that A−λI is not invertible, since it has a non-trivial null space!
Thinking back to Chapter 6.2, we know that there are several equivalent ways to check if a matrix is not invertible. Perhaps the most computational approach is to compute its determinant; if a (square) matrix’s determinant is 0, then it is not invertible, otherwise it is.
So, since A−λI is not invertible, its determinant must be 0!
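$$Av = \lambda v \implies \det(A - \lambda I) = 0$$
The expression $p(\lambda) = \det(A - \lambda I)$ is called the characteristic polynomial of $A$, and its roots are the eigenvalues of $A$.
(We won’t always use the symbol $p(\lambda)$; I just introduced it above to make it clear that $\det(A - \lambda I)$ is a polynomial function of $\lambda$.)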
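Let’s revisit our example $A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$.
The matrix $A - \lambda I$ is
$$A - \lambda I = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix} - \lambda \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 - \lambda & 2 \\ 2 & 1 - \lambda \end{bmatrix}$$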
Note that A−λI involves subtracting λ from the diagonal elements of A, and leaving all other elements unchanged.
The determinant of A−λI is
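$$\det(A - \lambda I) = (1 - \lambda)(1 - \lambda) - 2 \cdot 2 = \lambda^2 - 2\lambda + 1 - 4 = \underbrace{\lambda^2 - 2\lambda - 3}_{p(\lambda)}$$
The eigenvalues of $A$ are the values of $\lambda$ where $\lambda^2 - 2\lambda - 3 = 0$.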
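As a quick numerical sanity check on the idea that the eigenvalues are exactly the values of λ where det(A−λI) vanishes, here is a minimal sketch in plain Python. The helper name `char_poly_value` is my own and is not defined anywhere in these notes; it only handles 2×2 matrices.

```python
def char_poly_value(A, lam):
    """Evaluate det(A - lam*I) for a 2x2 matrix A given as nested lists."""
    (a, b), (c, d) = A
    return (a - lam) * (d - lam) - b * c

A = [[1, 2], [2, 1]]

# Evaluate p(lambda) at a handful of integer values of lambda.
values = {lam: char_poly_value(A, lam) for lam in range(-2, 5)}
for lam, p in values.items():
    print(f"lambda = {lam:2d}  ->  p(lambda) = {p}")

# The values of lambda where p(lambda) is zero are the eigenvalues.
zeros = [lam for lam, p in values.items() if p == 0]
print(zeros)  # [-1, 3]
```

Scanning integers only works here because this particular polynomial happens to have integer roots; in general the roots have to be solved for.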
(In general, we’re not going to plot the characteristic polynomial each time, but I think it’s useful to see once or twice to give the idea some context.)
By factoring, I can write this equation as
(λ+1)(λ−3)=0
which tells me that the eigenvalues of $A$ are $\lambda_1 = -1$ and $\lambda_2 = 3$. If this equation weren’t factorable, I’d need to use the quadratic formula to find the eigenvalues. For now, there’s no particular “ordering” to the eigenvalues, i.e. I could have said $\lambda_1 = 3$ and $\lambda_2 = -1$; all that matters is that I stay consistent throughout a particular example.
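When factoring isn’t obvious, the quadratic formula does the same job. The sketch below applies it to the expanded form of det(A−λI) for a 2×2 matrix. The function name `eigenvalues_2x2` is hypothetical, and it deliberately gives up when the discriminant is negative rather than returning complex numbers.

```python
import math

def eigenvalues_2x2(a, b, c, d):
    """Real eigenvalues of [[a, b], [c, d]] via the quadratic formula."""
    # det(A - lam*I) = (a - lam)(d - lam) - bc
    #                = lam^2 - (a + d)*lam + (ad - bc)
    lin = -(a + d)
    const = a * d - b * c
    disc = lin ** 2 - 4 * const
    if disc < 0:
        raise ValueError("no real eigenvalues")
    root = math.sqrt(disc)
    return ((-lin - root) / 2, (-lin + root) / 2)

eigs = eigenvalues_2x2(1, 2, 2, 1)
print(eigs)  # (-1.0, 3.0)
```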
Once we find the eigenvalues by solving p(λ)=0, we can find the eigenvectors by solving (A−λI)v=0 for each eigenvalue.
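For $\lambda_1 = -1$, we’re looking for vectors $v$ such that $Av = -1v$, i.e. if $v = \begin{bmatrix} a \\ b \end{bmatrix}$, then
$$\begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} -a \\ -b \end{bmatrix}$$
(I’m using $a$ and $b$ instead of $v_1$ and $v_2$ because I’ll refer to the vectors $v_1$ and $v_2$ in just a moment.) As a system of equations, this says
$$
\begin{aligned}
a + 2b &= -a \\
2a + b &= -b
\end{aligned}
$$
The first and second equations both tell us that $b = -a$. Remember, we’d expect there to be infinitely many solutions to this system, since any non-zero scalar multiple of an eigenvector is still an eigenvector. So, the “simple” solution is $v_1 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$, but $\begin{bmatrix} 2 \\ -2 \end{bmatrix}$, $\begin{bmatrix} 3 \\ -3 \end{bmatrix}$, etc. are also solutions.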
For λ2=3, let me introduce another way of finding the corresponding eigenvector. There’s nothing wrong with the system of equations approach, but it’s useful to have multiple techniques for solving problems in our toolkit. λ2=3 tells us that Av=3v, or equivalently, (A−3I)v=0. So, the eigenvector we’re looking for is in the null space of A−3I.
$$A - 3I = \begin{bmatrix} 1 - 3 & 2 \\ 2 & 1 - 3 \end{bmatrix} = \begin{bmatrix} -2 & 2 \\ 2 & -2 \end{bmatrix}$$
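Here, we notice that both rows of $A - 3I$ sum to 0, meaning that $(A - 3I)\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$. So, $v_2 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ is an eigenvector for $\lambda_2 = 3$.
As a second example, consider $A = \begin{bmatrix} 3 & 1 \\ 2 & 4 \end{bmatrix}$. Its characteristic polynomial is
$$\det(A - \lambda I) = (3 - \lambda)(4 - \lambda) - 1 \cdot 2 = \lambda^2 - 7\lambda + 10 = (\lambda - 2)(\lambda - 5)$$
So our eigenvalues are $\lambda_1 = 2$ and $\lambda_2 = 5$. Next, let’s find eigenvectors for them. As I did in the first example, I’ll show two different techniques for finding eigenvectors.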
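For $\lambda_1 = 2$, we’re looking for a vector $v = \begin{bmatrix} a \\ b \end{bmatrix}$ such that $Av = 2v$:
$$
\begin{aligned}
Av &= 2v \\
\begin{bmatrix} 3 & 1 \\ 2 & 4 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} &= \begin{bmatrix} 2a \\ 2b \end{bmatrix} \\
3a + b &= 2a \\
2a + 4b &= 2b
\end{aligned}
$$
As expected, there are infinitely many choices for $a$ and $b$ (since both equations tell us that $b = -a$), and the resulting eigenvectors $v$ all lie on the same line. So, if we let $a = 1$, then we have that $v_1 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$ is an eigenvector for $\lambda_1$.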
For λ2=5, I’ll try the null space approach, just to illustrate it once more. Again, I’m looking for a vector v such that Av=5v, or equivalently, (A−5I)v=0.
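$$A - 5I = \begin{bmatrix} 3 - 5 & 1 \\ 2 & 4 - 5 \end{bmatrix} = \begin{bmatrix} -2 & 1 \\ 2 & -1 \end{bmatrix}$$
The vector $v_2 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ is orthogonal to both rows of $A - 5I$, meaning $(A - 5I)\begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$. So, $v_2 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ is an eigenvector for $\lambda_2 = 5$, though again any non-zero scalar multiple of $\begin{bmatrix} 1 \\ 2 \end{bmatrix}$ will also be an eigenvector for $\lambda_2 = 5$.
So, $A$ has eigenvalues $\lambda_1 = 2$ and $\lambda_2 = 5$, and corresponding eigenvectors $v_1 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$ and $v_2 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$.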
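The null space approach can be mechanized for 2×2 matrices. A sketch follows, assuming the example matrix is [[3, 1], [2, 4]] (read off the system of equations above) and that A−λI really is singular. The helpers `null_vector_2x2` and `matvec` are hypothetical names of mine. The trick is that a vector orthogonal to the row (r1, r2) is (−r2, r1), and for a singular 2×2 matrix both rows are parallel, so being orthogonal to one non-zero row is enough.

```python
def null_vector_2x2(M):
    """Return a non-zero null space vector of a singular 2x2 matrix M."""
    for r1, r2 in M:
        if (r1, r2) != (0, 0):
            # (-r2, r1) is orthogonal to the row (r1, r2).
            return [-r2, r1]
    raise ValueError("M is the zero matrix; every vector is in its null space")

def matvec(M, v):
    """Multiply a matrix (nested lists) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

A = [[3, 1], [2, 4]]

eigenpairs = {}
for lam in (2, 5):
    # Subtract lam from the diagonal entries only.
    shifted = [[A[0][0] - lam, A[0][1]], [A[1][0], A[1][1] - lam]]
    v = null_vector_2x2(shifted)
    # Confirm the defining property A v = lam v.
    assert matvec(A, v) == [lam * x for x in v]
    eigenpairs[lam] = v

print(eigenpairs)  # {2: [-1, 1], 5: [-1, -2]}
```

The vectors printed are −1 times the ones found by hand, which is fine: any non-zero scalar multiple of an eigenvector is an eigenvector for the same eigenvalue.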
Find the eigenvalues and eigenvectors of $A = \begin{bmatrix} 3 & 0 \\ 0 & -4 \end{bmatrix}$. What do you notice about the eigenvalues? The eigenvectors?
Solution
The key is to notice that for diagonal matrices – that is, matrices where all the entries off the diagonal are 0 – the eigenvalues are simply the entries on the diagonal.
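$$
\begin{aligned}
A - \lambda I &= \begin{bmatrix} 3 - \lambda & 0 \\ 0 & -4 - \lambda \end{bmatrix} \\
\det(A - \lambda I) &= (3 - \lambda)(-4 - \lambda)
\end{aligned}
$$
The eigenvalues $\lambda_1 = 3$ and $\lambda_2 = -4$ are the same as the values on the diagonal! The corresponding eigenvectors are $v_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $v_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$. Just to write one of those cases out,
$$Av_2 = \begin{bmatrix} 3 & 0 \\ 0 & -4 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ -4 \end{bmatrix} = -4 \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$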
The fact that eigenvalues and eigenvectors are so easy to find for diagonal matrices, coupled with the fact that diagonal matrices are really easy to compute powers of (e.g. $A^2$ is just the diagonal matrix with each entry squared), makes them very useful in practice.
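To see the “powers are easy” claim concretely, here is a small plain-Python sketch that multiplies the diagonal matrix out the long way and compares against simply raising each diagonal entry to the power. The `matmul` helper is my own, written only for this illustration.

```python
def matmul(X, Y):
    """Multiply two square matrices given as nested lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[3, 0], [0, -4]]

# Powers computed by honest matrix multiplication.
A_squared = matmul(A, A)
A_cubed = matmul(A_squared, A)

# Powers computed by raising each diagonal entry to the power.
diag_squared = [[3 ** 2, 0], [0, (-4) ** 2]]
diag_cubed = [[3 ** 3, 0], [0, (-4) ** 3]]

print(A_squared)              # [[9, 0], [0, 16]]
print(A_cubed == diag_cubed)  # True
```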
Find two different $2 \times 2$ matrices that have the characteristic polynomial
$$p(\lambda) = \lambda^2 - 4\lambda + 3$$
Solution
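Recall that for a $2 \times 2$ matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$:
$$p(\lambda) = \det(A - \lambda I) = \begin{vmatrix} a - \lambda & b \\ c & d - \lambda \end{vmatrix} = (a - \lambda)(d - \lambda) - bc$$
A simple approach to this problem is to find two diagonal matrices $D = \begin{bmatrix} a & 0 \\ 0 & d \end{bmatrix}$ and $D' = \begin{bmatrix} d & 0 \\ 0 & a \end{bmatrix}$. That way, all we have to do is factor $p(\lambda)$ into the form $(a - \lambda)(d - \lambda)$:
$$\lambda^2 - 4\lambda + 3 = (3 - \lambda)(1 - \lambda)$$
So, our two different matrices are $D = \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}$ and $D' = \begin{bmatrix} 1 & 0 \\ 0 & 3 \end{bmatrix}$.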
There exist non-diagonal matrices with the same characteristic polynomial, too. For instance,
$$\begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$$
also has eigenvalues of 1 and 3, and its characteristic polynomial is also $p(\lambda) = \lambda^2 - 4\lambda + 3$.
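A short check that all three matrices really do share one characteristic polynomial. The sketch below expands (a−λ)(d−λ)−bc into coefficients of λ², λ, and 1; `char_poly_coeffs` is a hypothetical helper name, and the dictionary keys are just labels I picked.

```python
def char_poly_coeffs(A):
    """Coefficients (of lam^2, lam, 1) of det(A - lam*I) for a 2x2 matrix."""
    (a, b), (c, d) = A
    # (a - lam)(d - lam) - bc = lam^2 - (a + d)*lam + (ad - bc)
    return (1, -(a + d), a * d - b * c)

matrices = {
    "D": [[3, 0], [0, 1]],
    "D_prime": [[1, 0], [0, 3]],
    "non_diagonal": [[2, 1], [1, 2]],
}

coeffs = {name: char_poly_coeffs(M) for name, M in matrices.items()}
for name, c in coeffs.items():
    print(name, c)  # each line shows (1, -4, 3)
```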
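Next, consider the upper triangular matrix $A = \begin{bmatrix} 3 & 3 & 0 \\ 0 & 1 & 5 \\ 0 & 0 & 2 \end{bmatrix}$. Since $A - \lambda I$ is also upper triangular, its determinant is the product of its diagonal entries, $\det(A - \lambda I) = (3 - \lambda)(1 - \lambda)(2 - \lambda)$.
So, the eigenvalues are the values along the diagonal: $\lambda_1 = 3$, $\lambda_2 = 1$, and $\lambda_3 = 2$. This is a convenient property of triangular matrices (which is why I’ve planted this example here). Next, let’s find their eigenvectors.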
For λ1=3, I’ll do it the null space way: I’m looking for a vector in the null space of A−3I. (I say “the null space way” but it’s really just a different way of writing the system of equations approach, and one that can sometimes be easier to eyeball.)
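$$A - 3I = \begin{bmatrix} 0 & 3 & 0 \\ 0 & -2 & 5 \\ 0 & 0 & -1 \end{bmatrix}$$
The zeros in the first column tell me that $(A - 3I)\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$, so $v_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$ is an eigenvector for $\lambda_1 = 3$. Because I have three unique eigenvalues and my matrix is $3 \times 3$, I know that this is the only possible eigenvector direction for $\lambda_1 = 3$; in Chapter 9.4, we’ll run into situations where the null space of $A - 3I$ (for instance) is spanned by more than one vector, but that’s not the case here.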
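For $\lambda_2 = 1$, I’ll do it the system of equations way: I’m looking for a vector $v = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$ such that $Av = 1v$.
$$
\begin{aligned}
3a + 3b &= a \\
b + 5c &= b \\
2c &= c
\end{aligned}
$$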
Once again, there are infinitely many solutions to this system, which we’d expect since there’s a whole line of eigenvectors.
The third equation above tells us that c=0.
Substituting that into the second equation, we have that b+5⋅0=b, i.e. b=b, so let’s treat b as a free variable for now.
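The first equation above tells us that $3a + 3b = a$, i.e. $a = -\frac{3}{2}b$.
So, any vector of the form $\begin{bmatrix} -\frac{3}{2}b \\ b \\ 0 \end{bmatrix}$ is an eigenvector for $\lambda_2 = 1$. But we just need to find one, so let’s take $b = 2$, which gives us $v_2 = \begin{bmatrix} -3 \\ 2 \\ 0 \end{bmatrix}$.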
Finally, for λ3=2, I’ll do it the null space way.
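$$A - 2I = \begin{bmatrix} 3 - 2 & 3 & 0 \\ 0 & 1 - 2 & 5 \\ 0 & 0 & 2 - 2 \end{bmatrix} = \begin{bmatrix} 1 & 3 & 0 \\ 0 & -1 & 5 \\ 0 & 0 & 0 \end{bmatrix}$$
I’m looking for a vector $v_3 = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$ in the null space of $A - 2I$, i.e. a vector whose dot product with each row of $A - 2I$ is 0. The last row of $A - 2I$ is all zeros, so I can focus my attention on the first two rows. If the first two components of $v_3$ are $-3$ and $1$, then the dot product of $v_3$ with the first row of $A - 2I$ is $(1)(-3) + (3)(1) = 0$, which is what I want.
Then, its dot product with the second row is $(0)(-3) + (-1)(1) + (5)(c) = -1 + 5c$, where $c$ is the last component of $v_3$. For $-1 + 5c = 0$, we need $c = \frac{1}{5}$. So, one vector in the null space of $A - 2I$ (which is an eigenvector for $\lambda_3 = 2$) is $v_3 = \begin{bmatrix} -3 \\ 1 \\ \frac{1}{5} \end{bmatrix}$. To clean the numbers up a bit, I’ll scale this vector by 5, which gives us $v_3 = \begin{bmatrix} -15 \\ 5 \\ 1 \end{bmatrix}$.
So, our eigenvectors for $\lambda_1 = 3$, $\lambda_2 = 1$, and $\lambda_3 = 2$ are $v_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$, $v_2 = \begin{bmatrix} -3 \\ 2 \\ 0 \end{bmatrix}$, and $v_3 = \begin{bmatrix} -15 \\ 5 \\ 1 \end{bmatrix}$.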
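# When in doubt, check with numpy!
import numpy as np

# np.linalg.eig returns the eigenvalues and a matrix whose columns are
# the corresponding unit-length eigenvectors.
np.linalg.eig(
    np.array([
        [2, 0, 0],
        [-1, 5, 0],
        [4, 3, -7]
    ])
)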
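The same kind of check can be pointed at the 3×3 upper triangular example worked by hand in this section. The sketch below assumes that matrix is [[3, 3, 0], [0, 1, 5], [0, 0, 2]] (entries read off the worked solution) and compares numpy’s eigenvectors to the hand-derived ones. Since numpy returns unit-length eigenvectors, the comparison tests whether the two vectors are parallel rather than equal; the variable names are mine.

```python
import numpy as np

# The 3x3 upper triangular example (an assumption: entries are read off
# the worked solution in this section).
A = np.array([[3.0, 3.0, 0.0],
              [0.0, 1.0, 5.0],
              [0.0, 0.0, 2.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)

# Eigenpairs found by hand: eigenvalue -> eigenvector.
hand_pairs = {3.0: [1.0, 0.0, 0.0],
              1.0: [-3.0, 2.0, 0.0],
              2.0: [-15.0, 5.0, 1.0]}

parallel = {}
for lam, v in hand_pairs.items():
    v = np.array(v)
    # Direct check of the definition A v = lambda v.
    assert np.allclose(A @ v, lam * v)
    # Column k of `eigenvectors` pairs with eigenvalues[k].
    k = int(np.argmin(np.abs(eigenvalues - lam)))
    u = eigenvectors[:, k]
    # Two 3D vectors are parallel exactly when their cross product is zero.
    parallel[lam] = bool(np.allclose(np.cross(u, v), 0.0))

print(np.sort(eigenvalues))
print(parallel)
```

numpy may return the eigenvalues in a different order, and its eigenvectors may differ from the hand-derived ones by a scale factor or sign, which is why the check is on direction only.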
The characteristic polynomial of an n×n matrix is a polynomial of degree n. A fact from algebra is that a polynomial of degree n has exactly n roots. The intuitive way of thinking about this is that degree n polynomials can have at most n−1 “bends” (a line can’t bend, a quadratic can bend once, a cubic can bend twice, etc.), and each time it bends, it can change directions to cross the x-axis again.
The issue is that some of these roots may be repeated, and some may be complex numbers, meaning that they don’t actually cross the x-axis in the standard xy-plane (or in our case, the λ-axis and λ,p(λ)-plane).
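For example, the matrix $A = \begin{bmatrix} 4 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 4 \end{bmatrix}$ is diagonal, meaning it’s easy to read off its characteristic polynomial:
$$p(\lambda) = (4 - \lambda)(1 - \lambda)(0 - \lambda)(4 - \lambda) = \lambda(\lambda - 1)(\lambda - 4)^2$$
This characteristic polynomial has a double root at $\lambda = 4$, a single root at $\lambda = 1$, and a single root at $\lambda = 0$. This $A$ has 3 distinct eigenvalues, but one of them, $\lambda = 4$, is repeated, and has an algebraic multiplicity of 2.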
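For a diagonal matrix, counting algebraic multiplicities amounts to counting how often each value appears on the diagonal. A minimal sketch using Python’s standard library follows, assuming the diagonal of the matrix above is 4, 1, 0, 4; the variable names are my own.

```python
from collections import Counter

# Diagonal entries of the 4x4 diagonal matrix; for a diagonal matrix
# these are exactly its eigenvalues, repeats included.
diagonal = [4, 1, 0, 4]

# Algebraic multiplicity = number of times each eigenvalue is repeated.
multiplicity = Counter(diagonal)
distinct = sorted(multiplicity)

print(distinct)               # [0, 1, 4]
print(dict(multiplicity))     # {4: 2, 1: 1, 0: 1}

# Counting with multiplicity recovers n eigenvalues for an n x n matrix.
print(sum(multiplicity.values()) == len(diagonal))  # True
```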
As another example, the matrix $A = \begin{bmatrix} 3 & -4 \\ 4 & 3 \end{bmatrix}$ has the characteristic polynomial
$$p(\lambda) = \begin{vmatrix} 3 - \lambda & -4 \\ 4 & 3 - \lambda \end{vmatrix} = (\lambda - 3)^2 + 16$$
I can expand this out to get $\lambda^2 - 6\lambda + 25$, but in a case like this I think the above form is more telling. Visually, $p(\lambda)$ here is a parabola that sits entirely above the $\lambda$-axis, meaning it has no real roots, so $A$ has no real eigenvalues.
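Geometrically, this makes sense, because $A$ can be written as
$$A = 5 \begin{bmatrix} \frac{3}{5} & -\frac{4}{5} \\ \frac{4}{5} & \frac{3}{5} \end{bmatrix} = 5 \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$
where $\theta = \cos^{-1}\left(\frac{3}{5}\right)$. This matrix rotates vectors by $\theta$ radians counterclockwise (and scales them by 5), which means that no real-valued vector will remain in the same direction after being multiplied by $A$. In the $2 \times 2$ case, the only way to get a real-valued eigenvalue out of a rotation matrix is if the matrix rotates by an integer multiple of $\pi$ radians (180 degrees), which either corresponds to reflecting/negating the vector (like in $\begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}$) or returning back the same vector itself (like in $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, the identity matrix, which can be thought of as a rotation by $2\pi$).
The solutions to $p(\lambda) = (\lambda - 3)^2 + 16 = 0$ are the complex numbers $\lambda = 3 + 4i$ and $\lambda = 3 - 4i$, again where $i$ is the imaginary unit, defined by $i^2 = -1$. The corresponding eigenvectors are complex too, and we won’t worry about finding them. (If you’re familiar with complex numbers, you might recognize these eigenvalues as $5e^{i\theta}$ and $5e^{-i\theta}$, where $\theta = \cos^{-1}\left(\frac{3}{5}\right)$. This comes from Euler’s formula, $e^{i\theta} = \cos\theta + i\sin\theta$.)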
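Python’s standard `cmath` module can carry the quadratic formula through when the discriminant is negative. The sketch below assumes the matrix is [[3, −4], [4, 3]] and also checks the polar form mentioned above; the variable names are mine.

```python
import cmath
import math

# Entries of A = [[a, b], [c, d]].
a, b, c, d = 3, -4, 4, 3

# det(A - lam*I) = lam^2 - (a + d)*lam + (ad - bc)
lin = -(a + d)             # -6
const = a * d - b * c      # 9 + 16 = 25
disc = lin ** 2 - 4 * const  # 36 - 100 = -64

# cmath.sqrt handles negative inputs by returning a complex number.
root = cmath.sqrt(disc)
lam_plus = (-lin + root) / 2
lam_minus = (-lin - root) / 2
print(lam_plus, lam_minus)  # (3+4j) (3-4j)

# Polar form: modulus 5 and angle arccos(3/5), matching 5e^{i*theta}.
modulus, angle = cmath.polar(lam_plus)
theta = math.acos(3 / 5)
print(math.isclose(modulus, 5.0), math.isclose(angle, theta))  # True True
```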
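For a general $2 \times 2$ matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, expanding the characteristic polynomial gives
$$p(\lambda) = (a - \lambda)(d - \lambda) - bc = \lambda^2 - (a + d)\lambda + (ad - bc)$$
The coefficient on $\lambda$ is $-(a + d)$, which is the negative of the sum of the diagonal entries of $A$. A term often used to refer to the sum of the diagonal entries of a matrix is its trace, $\text{trace}(A)$. And, the constant term of $p(\lambda)$ above is just $\det(A)$! For $2 \times 2$ matrices, this says that the characteristic polynomial is of the form
$$p(\lambda) = \lambda^2 - \text{trace}(A)\lambda + \det(A)$$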
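For example, consider $A = \begin{bmatrix} 3 & 1 \\ 7 & 9 \end{bmatrix}$. The fact above says that $A$’s characteristic polynomial is
$$p(\lambda) = \lambda^2 - (3 + 9)\lambda + (3 \cdot 9 - 1 \cdot 7) = \lambda^2 - 12\lambda + 20$$
which, with enough practice, is a calculation you might be able to do in your head. But this gives way to a more general fact that is true for any $n \times n$ matrix: the sum of the eigenvalues is equal to $\text{trace}(A)$, and the product of the eigenvalues is equal to $\det(A)$.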
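Example 1: The eigenvalues of $A = \begin{bmatrix} 3 & 1 \\ 7 & 9 \end{bmatrix}$ add up to $3 + 9 = 12$ and multiply to $\det(A) = 3 \cdot 9 - 1 \cdot 7 = 20$. This gives me a quick way of figuring out that $A$’s eigenvalues are 10 and 2.
Example 2: $A = \begin{bmatrix} 0.5 & 0.5 & 0 \\ 0.4 & 0.3 & 0.3 \\ 0 & 0.5 & 0.5 \end{bmatrix}$ has an eigenvalue of 1, corresponding to the eigenvector $\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$, since the sum of each row of $A$ is 1. Let $\lambda_2$ and $\lambda_3$ be the other two eigenvalues. The trace fact tells me that $1 + \lambda_2 + \lambda_3 = \text{trace}(A) = 1.3$, and the determinant fact tells me that $1 \cdot \lambda_2 \cdot \lambda_3 = \det(A) = -0.1$.
In other words, the other two eigenvalues sum to 0.3 and multiply to -0.1. This sounds like a job for 0.5 and -0.2, which are indeed the other two eigenvalues of $A$ (other than 1).
Example 3: $A = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}$ has a determinant of $1 \cdot 4 - 2 \cdot 2 = 0$, which means that its eigenvalues multiply to 0, which is another reminder that $A$ has an eigenvalue of 0. The trace fact tells me that the other eigenvalue $\lambda_2$ must satisfy $1 + 4 = 0 + \lambda_2$, so $\lambda_2 = 5$.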
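The three examples above can be checked numerically. The sketch below assumes the matrices are as I read them from the examples, and uses `np.linalg.eigvals`, `np.trace`, and `np.linalg.det` to confirm that the eigenvalues sum to the trace and multiply to the determinant, up to floating point error. The dictionary labels are my own.

```python
import numpy as np

examples = {
    "Example 1": np.array([[3.0, 1.0],
                           [7.0, 9.0]]),
    "Example 2": np.array([[0.5, 0.5, 0.0],
                           [0.4, 0.3, 0.3],
                           [0.0, 0.5, 0.5]]),
    "Example 3": np.array([[1.0, 2.0],
                           [2.0, 4.0]]),
}

checks = {}
for name, A in examples.items():
    eigenvalues = np.linalg.eigvals(A)
    # Sum of eigenvalues vs. trace, product of eigenvalues vs. determinant.
    sum_ok = np.isclose(eigenvalues.sum(), np.trace(A))
    prod_ok = np.isclose(eigenvalues.prod(), np.linalg.det(A))
    checks[name] = (bool(sum_ok), bool(prod_ok))
    print(name, np.sort(eigenvalues.real).round(6), checks[name])
```

The comparison uses `np.isclose` rather than `==` because the eigenvalues come back as floating point approximations.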
Let’s summarize the key ideas from both this section and the last.
Suppose A is an n×n matrix.
λ is an eigenvalue of A, corresponding to the eigenvector v, if
Av=λv
The English interpretation of this is that v is an eigenvector of A if, when multiplied by A, v still points in the same direction, and is just stretched by a factor of λ.
An $n \times n$ matrix $A$ has exactly $n$ eigenvalues, counted with multiplicity. Some of these may be complex, and some of these may be repeated, but all of them are roots of the characteristic polynomial of $A$,
p(λ)=det(A−λI)
A quick way to check if you’ve found the right eigenvalues is that
The product of the eigenvalues of A is equal to det(A), i.e. λ1λ2⋯λn=det(A).
The sum of the eigenvalues of A is equal to the trace of A, which is the sum of the diagonal elements of A.
0 is an eigenvalue of A if and only if A is not invertible.
If $\lambda$ is an eigenvalue of $A$ with eigenvector $v$, then $\lambda^k$ is an eigenvalue of $A^k$ with the same eigenvector $v$.
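The last fact is easy to try out: applying A to an eigenvector k times should scale it by the eigenvalue raised to the k-th power. Here is a sketch in plain Python using the first example from this section, A = [[1, 2], [2, 1]], with eigenpairs (3, [1, 1]) and (−1, [1, −1]). The `matvec` helper and the choice k = 4 are mine.

```python
def matvec(M, v):
    """Multiply a matrix (nested lists) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

A = [[1, 2], [2, 1]]
pairs = [(3, [1, 1]), (-1, [1, -1])]
k = 4

results = {}
for lam, v in pairs:
    # Apply A to v a total of k times, which computes (A^k) v.
    w = v
    for _ in range(k):
        w = matvec(A, w)
    expected = [lam ** k * x for x in v]
    results[lam] = (w, expected)
    print(lam, w, w == expected)
```

For the pair with eigenvalue 3 this gives [81, 81], i.e. 3^4 times [1, 1]; for the eigenvalue −1 the vector comes back unchanged, since (−1)^4 = 1.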