
5.2. Diagonalization

In Chapter 5.1 and Chapter 5.1, Part 2, we saw that eigenvalues and eigenvectors allow us to better understand the behavior of the linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^n$ given by an $n \times n$ matrix $A$.

When working with matrix powers and adjacency matrices, we often decomposed some arbitrary vector $\vec x$ as a linear combination of eigenvectors of $A$. But sometimes, the eigenvectors of $A$ don't span all of $\mathbb{R}^n$, meaning we can't write every other vector as a linear combination of eigenvectors of $A$. When does this happen, and why do we care?


The Eigenvalue Decomposition

Recall that if $A$ is an $n \times n$ matrix, then $A$ has $n$ eigenvalues; it's just that some of the eigenvalues may be complex (leading to complex-valued eigenvectors), and some eigenvalues may be repeated. These $n$ eigenvalues are the roots of the characteristic polynomial of $A$,

$$p(\lambda) = \text{det}(A - \lambda I)$$

The corresponding eigenvectors are the solutions to the system of equations $(A - \lambda I) \vec v = \vec 0$.

Suppose $A$ has $n$ linearly independent eigenvectors $\vec v_1, \vec v_2, \cdots, \vec v_n$, with eigenvalues $\lambda_1, \lambda_2, \cdots, \lambda_n$. What I'm about to propose next will seem a little arbitrary, but bear with me – you'll see the power of this idea shortly. What happens if we multiply $A$ by a matrix, say $V$, whose columns are the eigenvectors of $A$?

$$\begin{align*} AV &= A \begin{bmatrix} | & | && | \\ \vec v_1 & \vec v_2 & \cdots & \vec v_n \\ | & | && | \end{bmatrix} \\ &= \begin{bmatrix} | & | && | \\ A\vec v_1 & A\vec v_2 & \cdots & A\vec v_n \\ | & | && | \end{bmatrix} \\ &= \begin{bmatrix} | & | && | \\ \lambda_1 \vec v_1 & \lambda_2 \vec v_2 & \cdots & \lambda_n \vec v_n \\ | & | && | \end{bmatrix} \end{align*}$$

$AV$ is a matrix whose columns are eigenvectors of $A$, each scaled by the corresponding eigenvalue! Is there another way to write the last line above?

$$\begin{align*} AV &= \begin{bmatrix} | & | && | \\ \lambda_1 \vec v_1 & \lambda_2 \vec v_2 & \cdots & \lambda_n \vec v_n \\ | & | && | \end{bmatrix} \\ &= \begin{bmatrix} | & | && | \\ \vec v_1 & \vec v_2 & \cdots & \vec v_n \\ | & | && | \end{bmatrix} \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} \\ &= V \Lambda \end{align*}$$

Here, $\Lambda$ (the capitalized Greek letter for $\lambda$) is a diagonal matrix of eigenvalues, in the same order as the eigenvectors in $V$.

$$AV = V \Lambda$$

We can take this a step further. If $V$ is invertible – which it is here, since we assumed $A$ has $n$ linearly independent eigenvectors – then we can multiply both sides of the equation above by $V^{-1}$ on the right to get

$$A = V \Lambda V^{-1}$$

The existence of this decomposition is contingent on $V$ being invertible, which happens when $A$ has $n$ linearly independent eigenvectors. If $A$ doesn't have "enough" eigenvectors, then $V$ isn't invertible, and we can't decompose $A$ in this way.

Diagonalizable Matrices

A First Example

Let's start with a familiar example, $A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$, which was the first example we saw in Chapter 5.1. $A$ has eigenvalues $\lambda_1 = 3$ and $\lambda_2 = -1$, and corresponding eigenvectors $\vec v_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $\vec v_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$. These eigenvectors are linearly independent, so $A$ is diagonalizable, and we can write

$$V = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}, \qquad \Lambda = \begin{bmatrix} 3 & 0 \\ 0 & -1 \end{bmatrix}$$

which tells us that

$$V \Lambda V^{-1} = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} 3 & 0 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} 1/2 & 1/2 \\ 1/2 & -1/2 \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix} = A$$

Let’s check with numpy.

import numpy as np

V = np.array([[1, 1],
              [1, -1]])

Lambda = np.diag([3, -1]) # New tool: builds a diagonal matrix from a list.
V_inv = np.linalg.inv(V)
V @ Lambda @ V_inv # The same as A!
array([[1., 2.], [2., 1.]])

Is this decomposition unique? No, because we could have chosen a different set of eigenvectors, or written them in a different order (in which case $\Lambda$ would be different). We'll keep the eigenvectors in the same order, but instead let's consider

$$V = \begin{bmatrix} 2 & -5 \\ 2 & 5 \end{bmatrix}$$

which is also a valid eigenvector matrix for $A$. (Remember that any scalar multiple of an eigenvector is still an eigenvector!)

It is still true that $A = V \Lambda V^{-1}$. This may seem a little unbelievable (I wasn't convinced at first), but remember that $V^{-1}$ "undoes" any changes in scaling we introduce to $V$.

$$V \Lambda V^{-1} = \begin{bmatrix} 2 & -5 \\ 2 & 5 \end{bmatrix} \begin{bmatrix} 3 & 0 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} 1/4 & 1/4 \\ -1/10 & 1/10 \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix} = A$$
V = np.array([[2, -5],
              [2, 5]])

Lambda = np.diag([3, -1])
V_inv = np.linalg.inv(V)
V @ Lambda @ V_inv # Same as above!
array([[1., 2.], [2., 1.]])

Key Applications

Rightfully, you might be asking what the point of this is. There are (at least) two main uses of the eigenvalue decomposition.

Application 1: Matrix Powers

The first is that it makes it easy to compute powers of $A$, which we know is a useful concept in understanding the long-run behavior of a Markov chain.

Suppose $A = V \Lambda V^{-1}$. Then,

$$A^2 = (V \Lambda V^{-1})(V \Lambda V^{-1}) = V \Lambda (V^{-1}V) \Lambda V^{-1} = V \Lambda^2 V^{-1}$$

What does this say about $A^3$?

$$A^3 = (V \Lambda V^{-1})(V \Lambda V^{-1})(V \Lambda V^{-1}) = V \Lambda (V^{-1}V) \Lambda (V^{-1}V) \Lambda V^{-1} = V \Lambda^3 V^{-1}$$

In general, if $k$ is a positive integer,

$$\boxed{A^k = V \Lambda^k V^{-1}}$$

So, to compute $A^k$, we don't need to multiply $k$ matrices together (which would be a computational nightmare for large $k$). Instead, all we need to do is compute $V \Lambda^k V^{-1}$. And remember, $\Lambda$ is a diagonal matrix, so computing $\Lambda^k$ is easy: we just raise the diagonal entries to the power in question.

For example, $A^{10} = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}^{10}$ is

$$\begin{align*} A^{10} &= V \Lambda^{10} V^{-1} \\ &= \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} 3^{10} & 0 \\ 0 & (-1)^{10} \end{bmatrix} \begin{bmatrix} 1/2 & 1/2 \\ 1/2 & -1/2 \end{bmatrix} \\ &= \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} 3^{10}/2 & 3^{10}/2 \\ 1/2 & -1/2 \end{bmatrix} \\ &= \begin{bmatrix} (3^{10} + 1)/2 & (3^{10} - 1)/2 \\ (3^{10} - 1)/2 & (3^{10} + 1)/2 \end{bmatrix} \end{align*}$$

Pretty neat!

A = np.array([[1., 2.],
              [2., 1.]])

np.linalg.matrix_power(A, 10) # Computes A^10 in one call.
array([[29525., 29524.], [29524., 29525.]])
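
For comparison, here's the same power computed using the decomposition itself. (A quick sketch: I'm re-defining V as the original eigenvector matrix, since we overwrote it with the rescaled version a moment ago.)

import numpy as np

V = np.array([[1, 1],
              [1, -1]])
V_inv = np.linalg.inv(V)
Lambda_10 = np.diag([3**10, (-1)**10]) # Powering Lambda just powers its diagonal entries.
V @ Lambda_10 @ V_inv                  # Matches np.linalg.matrix_power(A, 10).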

Application 2: Understanding Linear Transformations

Remember that if $A$ is an $n \times n$ matrix, then $f(\vec x) = A \vec x$ is a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^n$. But, if $A = V \Lambda V^{-1}$, then

$$f(\vec x) = A \vec x = V \Lambda V^{-1} \vec x$$

This allows us to understand the effect of $f$ on $\vec x$ in three stages. Remember that $V$'s columns are the eigenvectors of $A$, so the act of multiplying a vector $\vec y$ by $V$ is equivalent to taking a linear combination of $V$'s columns ($A$'s eigenvectors) using the weights in $\vec y$.

$$V \vec y = \begin{bmatrix} | & | & \cdots & | \\ \vec v_1 & \vec v_2 & \cdots & \vec v_n \\ | & | & \cdots & | \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = y_1 \vec v_1 + y_2 \vec v_2 + \cdots + y_n \vec v_n$$

For example, $V \begin{bmatrix} 3 \\ -4 \end{bmatrix}$ says take 3 of the first eigenvector, and -4 of the second eigenvector, and add them together to get a new vector. If $V \begin{bmatrix} 3 \\ -4 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$, this says that taking 3 of the first eigenvector and -4 of the second eigenvector is the same as taking 1 of the first standard basis vector, $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$, and 2 of the second standard basis vector, $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$.

Intuitively, think of $V$ as a matrix that takes in "amounts of each eigenvector" and outputs "amounts of each standard basis vector", where the standard basis vectors are the columns of $I$.

So, if

$$V: \text{eigenvector amounts} \to \text{standard basis vector amounts}$$

then $V^{-1}$ does the opposite, and maps

$$V^{-1}: \text{standard basis vector amounts} \to \text{eigenvector amounts}$$

i.e. multiplying $V^{-1}$ by $\vec x$ expresses $\vec x$ as a linear combination of the eigenvectors of $A$. If $\vec z$ is the output of $V^{-1} \vec x$, then $\vec x = V \vec z$, meaning $\vec z$ contains the amounts of each eigenvector needed to produce $\vec x$.

(This is non-standard notation, since $V$ is not a function and "eigenvector amounts" and "standard basis vector amounts" are not sets but concepts, but I hope this helps convey the role of $V$ and $V^{-1}$ in this context.)

Taking a step back, recall

$$f(\vec x) = A \vec x = V \Lambda V^{-1} \vec x$$

What this is saying is that $f$ does three things:

  1. First, it takes $\vec x$ and expresses it as a linear combination of the eigenvectors of $A$, which is $V^{-1} \vec x$. (This contains the "amounts of each eigenvector" needed to produce $\vec x$.)

  2. Then, it takes that linear combination $V^{-1} \vec x$ and scales each eigenvector by its corresponding eigenvalue, i.e. it scales or stretches each eigenvector by a different amount. Remember that diagonal matrices only scale, they don't do anything else!

  3. Finally, it takes the resulting scaled vector $\Lambda V^{-1} \vec x$ and expresses it as a linear combination of the standard basis vectors, i.e. it combines the correct amounts of the stretched eigenvectors to get the final result.

$$\underbrace{\text{express in eigenvector basis}}_{\text{multiply by } V^{-1}} \to \underbrace{\text{scale eigenvectors}}_{\text{multiply by } \Lambda} \to \underbrace{\text{express in standard basis}}_{\text{multiply by } V}$$
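
Here's a small numerical sketch of these three stages, using the $A$, $V$, and $\Lambda$ from earlier in this chapter. (The vector $\vec x$ below is an arbitrary choice, just for illustration.)

import numpy as np

A = np.array([[1, 2],
              [2, 1]])
V = np.array([[1, 1],
              [1, -1]])
Lambda = np.diag([3, -1])

x = np.array([1, 2])            # An arbitrary vector.
amounts = np.linalg.solve(V, x) # Step 1: V^{-1} x, the amounts of each eigenvector in x.
scaled = Lambda @ amounts       # Step 2: scale each amount by its eigenvalue.
result = V @ scaled             # Step 3: recombine into standard coordinates.
np.allclose(result, A @ x)      # True -- the same as applying A directly.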

This is better understood visually, as you’ll see in the Symmetric Matrices and the Spectral Theorem section below.


Examples

Example: Non-Diagonalizable Matrices

Most matrices we’ve seen in Chapter 5.1 and here so far in Chapter 5.2 have been diagonalizable. I’ve tried to shield you from the messy world of non-diagonalizable matrices until now, but we’re ready to dive deeper.

Consider $A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$. The characteristic polynomial of $A$ is

$$p(\lambda) = \text{det}(A - \lambda I) = \text{det} \begin{bmatrix} 1 - \lambda & 1 \\ 0 & 1 - \lambda \end{bmatrix} = (1 - \lambda)^2$$

which has a double root of $\lambda = 1$. So, $A$ has a single eigenvalue $\lambda = 1$. What are all of $A$'s eigenvectors? They're solutions to $A \vec v = 1 \vec v$, i.e. $(A - I) \vec v = \vec 0$. Let's look at the null space of $A - I$:

$$A - I = \begin{bmatrix} 1 - 1 & 1 \\ 0 & 1 - 1 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$$

The null space of $A - I$ is spanned by the single vector $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$. So, $A$ has a single line of eigenvectors, spanned by $\vec v_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$.

So, does $A$ have an eigenvalue decomposition? No, because we don't have enough eigenvectors to form $V$. If we try to form $V$ by using $\vec v_1$ in the first column and a column of zeros in the second column, we'd have

$$V = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$$

but this is not an invertible matrix, so we can't write $A = V \Lambda V^{-1}$. $A$ is not diagonalizable! This means that it's harder to interpret $A$ as a linear transformation through the lens of eigenvectors, and it's harder to understand the long-run behavior of $A^k \vec x$ for an arbitrary $\vec x$.

Let me emphasize what I mean by $A$ needing to have $n$ linearly independent eigenvectors. Above, we found that $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ is an eigenvector of $A$, which means that any scalar multiple of $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ is also an eigenvector of $A$. But, the matrix

$$\begin{bmatrix} 1 & 2 \\ 0 & 0 \end{bmatrix}$$

is not invertible, meaning it can't be the $V$ in $A = V \Lambda V^{-1}$.
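
We can see this numerically, too. np.linalg.eig still returns a $2 \times 2$ matrix of "eigenvectors" for this $A$, but its columns are (numerically) parallel, so that matrix isn't invertible:

import numpy as np

A = np.array([[1, 1],
              [0, 1]])
eigenvalues, V = np.linalg.eig(A)
print(eigenvalues)      # Both eigenvalues are 1.
print(np.linalg.det(V)) # Essentially 0 (up to floating point): V is not invertible.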

Example: The Identity Matrix

The $2 \times 2$ identity matrix $I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ has the characteristic polynomial $p(\lambda) = (\lambda - 1)^2$. So, $I$ has a single eigenvalue $\lambda = 1$, just like the last example.

But, $\lambda = 1$ corresponds to two different eigenvector directions! I can pick any two linearly independent vectors in $\mathbb{R}^2$ and they'll both be eigenvectors for $I$. For example, let $\vec v_1 = \begin{bmatrix} 1 \\ -3 \end{bmatrix}$ and $\vec v_2 = \begin{bmatrix} 11 \\ 98 \end{bmatrix}$, meaning

$$V = \begin{bmatrix} 1 & 11 \\ -3 & 98 \end{bmatrix}$$

The matrix $\Lambda$ is just $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, which is the same as $I$ itself. This means that as long as $V$ is invertible (i.e. as long as I pick two linearly independent vectors in $\mathbb{R}^2$),

$$V \Lambda V^{-1} = V I V^{-1} = V V^{-1} = I$$

which means that indeed, $I$ is diagonalizable. (Any matrix that is diagonal to begin with is diagonalizable: in $A = PDP^{-1}$, $P$ is just the identity matrix.)

If you'd rather look at this example through the lens of solving systems of equations, an eigenvector $\vec v = \begin{bmatrix} a \\ b \end{bmatrix}$ of $I$ with eigenvalue $\lambda = 1$ satisfies

$$I \vec v = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} a \\ b \end{bmatrix} = 1 \begin{bmatrix} a \\ b \end{bmatrix}$$

But, as a system, this just says $a = a$ and $b = b$, which is true for any $\vec v$. Think of $a$ and $b$ both as independent variables; the set of possible $\vec v$'s, then, is two-dimensional (and is all of $\mathbb{R}^2$).
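
A quick numerical check of the claim above, using the $V$ we just wrote down (any invertible $V$ works):

import numpy as np

V = np.array([[1, 11],
              [-3, 98]])
np.allclose(V @ np.eye(2) @ np.linalg.inv(V), np.eye(2)) # True: V I V^{-1} = I.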

Example: Non-Invertible Matrices

Let $A = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}$.

  • $A$ is not invertible (column 2 is double column 1), so it has an eigenvalue of $\lambda_1 = 0$, which corresponds to the eigenvector $\vec v_1 = \begin{bmatrix} -2 \\ 1 \end{bmatrix}$.

  • It also has an eigenvalue of $\lambda_2 = 5$, corresponding to the eigenvector $\vec v_2 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$. (Remember, the quick way to spot this eigenvalue is to remember that the sum of the eigenvalues of $A$ is equal to the trace of $A$, which is $1 + 4 = 5$; since 0 is an eigenvalue, the other eigenvalue must be 5.)

Does $A$ have an eigenvalue decomposition? Let's see if anything goes wrong when we try to construct the eigenvector matrix $V$ and the diagonal matrix $\Lambda$ and multiply out $V \Lambda V^{-1}$.

$$V = \begin{bmatrix} -2 & 1 \\ 1 & 2 \end{bmatrix}, \qquad \Lambda = \begin{bmatrix} 0 & 0 \\ 0 & 5 \end{bmatrix}$$

$$V \Lambda V^{-1} = \begin{bmatrix} -2 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 0 & 0 \\ 0 & 5 \end{bmatrix} \begin{bmatrix} -2/5 & 1/5 \\ 1/5 & 2/5 \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}$$

Everything worked out just fine! $A$ is diagonalizable, even though it's not invertible.
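
Here's the same check in numpy, if you'd like to verify it yourself:

import numpy as np

A = np.array([[1, 2],
              [2, 4]])
V = np.array([[-2, 1],
              [1, 2]])
Lambda = np.diag([0, 5])
np.allclose(V @ Lambda @ np.linalg.inv(V), A) # True: diagonalizable, despite not being invertible.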


Algebraic and Geometric Multiplicity

As we saw in an earlier example, the matrix $\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$ does not have two linearly independent eigenvectors, and so is not diagonalizable. I'd like to dive deeper into identifying when a matrix is diagonalizable.

The algebraic multiplicity of an eigenvalue $\lambda_i$, written $\text{AM}(\lambda_i)$, is the number of times $\lambda_i$ appears as a root of the characteristic polynomial. As the name suggests, it is purely a property of the characteristic polynomial. Alone, algebraic multiplicities don't tell us whether or not a matrix is diagonalizable. Instead, we'll need to look at another form of multiplicity alongside the algebraic multiplicity.

The geometric multiplicity of an eigenvalue $\lambda_i$, written $\text{GM}(\lambda_i)$, can't be determined from the characteristic polynomial alone – instead, it involves finding $\text{nullsp}(A - \lambda_i I)$ and finding its dimension, i.e. the number of linearly independent vectors needed to span it. But in general,

$$1 \leq \text{GM}(\lambda_i) \leq \text{AM}(\lambda_i)$$

Think of the algebraic multiplicity of an eigenvalue as the “potential” number of linearly independent eigenvectors for an eigenvalue, sort of like the number of slots we have for that eigenvalue. The geometric multiplicity, on the other hand, is the number of linearly independent eigenvectors we actually have for that eigenvalue. As we’ll see in the following examples, when the geometric multiplicity is less than the algebraic multiplicity for any eigenvalue, the matrix in question is not diagonalizable.
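
If you'd like a way to experiment with this numerically, here's a rough sketch. (The function name multiplicities is just something I made up for illustration, and the rounding-based grouping of eigenvalues is a crude heuristic that works for small, exact examples like the ones below, not a robust numerical method.)

import numpy as np

def multiplicities(A, decimals=8):
    """Rough sketch: estimate the AM and GM of each eigenvalue of A."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    eigenvalues = np.linalg.eigvals(A)
    results = {}
    for lam in np.unique(np.round(eigenvalues, decimals)):
        am = int(np.sum(np.isclose(eigenvalues, lam)))       # Algebraic multiplicity.
        gm = n - np.linalg.matrix_rank(A - lam * np.eye(n))  # Geometric multiplicity, via rank-nullity.
        results[lam] = (am, gm)
    return results

# A is diagonalizable exactly when GM equals AM for every eigenvalue.
print(multiplicities([[1, 1], [0, 1]])) # lambda = 1 has AM = 2 but GM = 1, so not diagonalizable.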

In each of the following matrices $A$, we'll

  1. Find the algebraic multiplicity of each of $A$'s eigenvalues.

  2. For each eigenvalue, find the geometric multiplicity and a basis for the eigenspace.

  3. Conclude whether $A$ is diagonalizable.

As usual, attempt these examples on your own first before peeking at the solutions.

A First Example

$$A = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}$$

We’ll walk through this one together.

  1. First, we'll find the characteristic polynomial of $A$:

    $$p(\lambda) = \text{det}(A - \lambda I) = \begin{vmatrix} 1 - \lambda & 1 & 0 \\ 0 & 1 - \lambda & 1 \\ 0 & 0 & 1 - \lambda \end{vmatrix} = (1 - \lambda)^3$$

    (You should practice quickly finding the characteristic polynomial of a $3 \times 3$ matrix; Chapter 5.1 has relevant examples.)

    This tells us that $A$ has a single eigenvalue $\lambda = 1$, with algebraic multiplicity $\boxed{\text{AM}(1) = 3}$.

  2. The geometric multiplicity of $\lambda = 1$ is $\text{dim}(\text{nullsp}(A - I))$. Let's look at $A - I$:

    $$A - I = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$$

    $A - I$ has rank 2, so $\text{dim}(\text{nullsp}(A - I)) = 3 - 2 = 1$ (from the rank-nullity theorem), so the geometric multiplicity of $\lambda = 1$ is $\boxed{\text{GM}(1) = 1}$.

    The vector $\vec v_1 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$ is an eigenvector for $\lambda = 1$, and so is any scalar multiple of $\vec v_1$; I found this by noticing that the first column of $A - I$ is all zeros. So,

    $$\boxed{\text{nullsp}(A - I) = \text{span} \left( \left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \right\} \right)}$$
  3. Since $\text{GM}(1) = 1 < \text{AM}(1) = 3$, $A$ is not diagonalizable. $A$ only has one linearly independent eigenvector! (See the quick numpy check below.)
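
Here's that rank computation in numpy:

import numpy as np

A = np.array([[1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]])
rank = np.linalg.matrix_rank(A - np.eye(3))
print(rank, 3 - rank) # Rank 2, so GM(1) = 3 - 2 = 1, which is less than AM(1) = 3.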

Now, it’s your turn.

Example: Adjacency Matrices

$$A = \begin{bmatrix} 0.8 & 0.3 \\ 0.2 & 0.7 \end{bmatrix}$$

(This is the same adjacency matrix we introduced in Chapter 5.1, Part 2.)

Example: Another Diagonalizable Matrix

$$A = \begin{bmatrix} 2 & 1 & 0 \\ 1 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}$$

This is perhaps the single most comprehensive example!

Example: Another Non-Diagonalizable Matrix

$$A = \begin{bmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}$$

(Note that this $A$ is almost identical to the $A$ in the previous example, but with a 1 switched to a 0 in the second row.)


Symmetric Matrices and the Spectral Theorem

Symmetric matrices – that is, square matrices where $A = A^T$ – behave really nicely through the lens of eigenvectors, and understanding exactly how they work is key to Chapter 5.3, when we generalize beyond square matrices.

If you search for the spectral theorem online, you’ll often just see Statement 4 above; I’ve broken the theorem into smaller substatements to see how they are chained together.

  • The proof of Statement 1 is beyond our scope, since it involves fluency with complex numbers. If the term “complex conjugate” means something to you, read the proof here – it’s relatively short.

  • Statement 2 was proved in Lab 11, Activity 5.

  • I’m not going to cover the proof of Statement 3 here, as I don’t think it’ll add to your learning.

Orthogonal matrices $Q$ satisfy $Q^TQ = QQ^T = I$, meaning their columns (and rows) are orthonormal, not just orthogonal to one another. The fact that $Q^TQ = QQ^T = I$ means that $Q^T = Q^{-1}$, so taking the transpose of an orthogonal matrix is the same as taking its inverse.

So, instead of

$$A = V \Lambda V^{-1}$$

we’ve “upgraded” to

$$A = Q \Lambda Q^T$$

This is the main takeaway of the spectral theorem: symmetric matrices can be diagonalized by an orthogonal matrix. Sometimes, $A = Q \Lambda Q^T$ is called the spectral decomposition of $A$, but it is just a special case of the eigenvalue decomposition for symmetric matrices.
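
In code, numpy's np.linalg.eigh is built for symmetric matrices: it returns real eigenvalues (in ascending order) and orthonormal eigenvectors. Here's a quick sketch using the familiar $A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$:

import numpy as np

A = np.array([[1, 2],
              [2, 1]])
eigenvalues, Q = np.linalg.eigh(A)                    # eigh assumes A is symmetric.
print(eigenvalues)                                    # [-1.  3.], in ascending order.
print(np.allclose(Q.T @ Q, np.eye(2)))                # True: the columns of Q are orthonormal.
print(np.allclose(Q @ np.diag(eigenvalues) @ Q.T, A)) # True: A = Q Lambda Q^T.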

Visualizing the Spectral Theorem

Why do we prefer $Q \Lambda Q^T$ over $V \Lambda V^{-1}$? Taking the transpose of a matrix is much easier than inverting it, so actually working with $Q \Lambda Q^T$ is easier.

$$\underbrace{A = Q \Lambda Q^T \implies A^k = Q \Lambda^k Q^T}_{\text{no inversion needed!}}$$

But it's also an improvement in terms of interpretation: remember that orthogonal matrices are matrices that represent rotations. So, if $A$ is symmetric, then the linear transformation $f(\vec x) = A \vec x$ is a sequence of rotations and stretches.

$$f(\vec x) = A \vec x = Q \Lambda Q^T \vec x$$

Let's make sense of this visually. Consider the symmetric matrix $A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$.

Image produced in Jupyter

$A$ appears to perform an arbitrary transformation; it turns the unit square into a parallelogram, as we first saw in Chapter 2.9.

But, since $A$ is symmetric, it can be diagonalized by an orthogonal matrix, $A = Q \Lambda Q^T$.

$$A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$$

has eigenvalues $\lambda_1 = 3$ with eigenvector $\vec v_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $\lambda_2 = -1$ with eigenvector $\vec v_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$. But, the $\vec v_i$'s I've written aren't unit vectors, which they need to be in order for $Q$ to be orthogonal. So, we normalize them to get $\vec q_1 = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix}$ and $\vec q_2 = \begin{bmatrix} -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix}$. Placing these $\vec q_i$'s as columns of $Q$, we get

$$Q = \begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}$$

and so

$$A = Q \Lambda Q^T = \underbrace{\begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}}_Q \underbrace{\begin{bmatrix} 3 & 0 \\ 0 & -1 \end{bmatrix}}_\Lambda \underbrace{\begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}}_{Q^T}$$

We're visualizing how $\vec x$ turns into $A \vec x$, i.e. how $\vec x$ turns into $Q \Lambda Q^T \vec x$. This means that we first need to consider the effect of $Q^T$ on $\vec x$, then the effect of $\Lambda$ on that result, and finally the effect of $Q$ on that result – that is, read the matrices from right to left.

Image produced in Jupyter

The Ellipse Perspective

Another way of visualizing the linear transformation of a symmetric matrix is to consider its effect on the unit circle, not the unit square. Below, I'll apply $A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$ to the unit circle.

Image produced in Jupyter

Notice that $A$ transformed the unit circle into an ellipse. What's more, the axes of the ellipse are the eigenvector directions of $A$!

Why is one axis longer than the other? As you might have guessed, the longer axis – the one in the direction of the eigenvector $\vec v_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ – corresponds to the larger eigenvalue. Remember that $A$ has $\lambda_1 = 3$ and $\lambda_2 = -1$, so the "up and to the right" axis is three times longer than the "down and to the right" axis, defined by $\vec v_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$.
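
If you'd like to verify the axis lengths numerically, here's a quick sketch: sample points on the unit circle, apply $A$, and look at how far the resulting points land from the origin.

import numpy as np

A = np.array([[1, 2],
              [2, 1]])
theta = np.linspace(0, 2 * np.pi, 1000)
circle = np.stack([np.cos(theta), np.sin(theta)]) # Points on the unit circle, as columns.
ellipse = A @ circle
lengths = np.linalg.norm(ellipse, axis=0)
print(lengths.max(), lengths.min()) # Approximately 3 and 1: the |lambda_i|'s.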

To see why this happens, consult the solutions to Lab 11, Activity 6b to try and derive it. It has to do with the expression $\sum_{i = 1}^n \lambda_i y_i^2$ in that derivation. What are the $\lambda_i$'s and where did the $y_i$'s come from?


Positive Semidefinite Matrices

I will keep this section brief; this is mostly meant to be a reference for a specific definition that you used in Lab 11 and will use in Homework 10.

What does this have to do with the diagonalization of a matrix? We just spent a significant amount of time talking about the special properties of symmetric matrices, and positive semidefinite matrices are a subset of symmetric matrices, so the properties implied by the spectral theorem also apply to positive semidefinite matrices.

Positive semidefinite matrices appear in the context of minimizing quadratic forms, $f(\vec x) = \vec x^T A \vec x$. You've toyed around with this in Lab 11, but also note that in Chapter 4.1 we saw the most important quadratic form of all: the mean-squared error!

$$\underbrace{R_\text{sq}(\vec w) = \frac{1}{n} \lVert \vec y - X \vec w \rVert^2}_{\text{this involves a quadratic form}}$$

If we know all of the eigenvalues of $A$ in $\vec x^T A \vec x$ are non-negative, then we know that $\vec x^T A \vec x \geq 0$ for all $\vec x$, meaning that the quadratic form has a global minimum. This is why, as discussed in Lab 11, the quadratic form $\vec x^T A \vec x$ is convex if and only if $A$ is positive semidefinite.

The fact that having non-negative eigenvalues implies the first definition of positive semidefiniteness is not immediately obvious, but is exactly what we proved in Lab 11, Activity 6.

A positive definite matrix is one in which $\vec x^T A \vec x > 0$ for all $\vec x \neq \vec 0$, i.e. where all eigenvalues are positive, not just non-negative (0 is no longer an option).
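
A quick way to check these definitions numerically (the helper name below is mine, just for illustration) is to compute the eigenvalues of the symmetric matrix with np.linalg.eigvalsh and check their signs:

import numpy as np

def is_positive_semidefinite(A, tol=1e-10):
    """Sketch: a symmetric matrix is PSD exactly when all of its eigenvalues are non-negative."""
    return bool(np.all(np.linalg.eigvalsh(A) >= -tol))

print(is_positive_semidefinite(np.array([[1, 2], [2, 4]]))) # True: eigenvalues 0 and 5.
print(is_positive_semidefinite(np.array([[1, 2], [2, 1]]))) # False: eigenvalues -1 and 3.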


Key Takeaways

  1. The eigenvalue decomposition of a matrix $A$ is a decomposition of the form

    $$A = V \Lambda V^{-1}$$

    where $V$ is a matrix containing the eigenvectors of $A$ as columns, and $\Lambda$ is a diagonal matrix of eigenvalues in the same order. Only diagonalizable matrices can be decomposed in this way.

  2. The algebraic multiplicity of an eigenvalue $\lambda_i$ is the number of times $\lambda_i$ appears as a root of the characteristic polynomial of $A$.

  3. The geometric multiplicity of $\lambda$ is the dimension of the eigenspace of $\lambda$, i.e. $\text{dim}(\text{nullsp}(A - \lambda I))$.

  4. An $n \times n$ matrix $A$ is diagonalizable if and only if either of these equivalent conditions is true:

    • $A$ has $n$ linearly independent eigenvectors.

    • For every eigenvalue $\lambda_i$, $\text{GM}(\lambda_i) = \text{AM}(\lambda_i)$.

    (Having $n$ distinct eigenvalues is a sufficient condition for diagonalizability, but not a necessary one – the identity matrix has a repeated eigenvalue, yet is still diagonalizable.)

    When $A$ is diagonalizable, it has an eigenvalue decomposition, $A = V \Lambda V^{-1}$.

  5. If $A$ is a symmetric matrix, then the spectral theorem tells us that $A$ can be diagonalized by an orthogonal matrix $Q$ such that

    $$A = Q \Lambda Q^T$$

    and that all of $A$'s eigenvalues are guaranteed to be real.

What's next? There's the question of how any of this relates to real data. Real data comes in rectangular matrices, not square matrices. And even if it were square, how does any of this enlighten us?