
9.5. Symmetric Matrices and the Spectral Theorem

Symmetric matrices – that is, square matrices where $A = A^T$ – behave really nicely through the lens of eigenvectors, and understanding exactly how they work is key to Chapter 10.1, when we generalize beyond square matrices.

While editing these notes, I came across a fitting tweet:

Most people with “AI/ML” in their bios don’t even know a real symmetric matrix always has real eigenvalues.

vixhaℓ (@TheVixhal) March 29, 2026
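The tweet's claim is easy to check numerically. Here's a minimal sketch (the random matrix is my own example, not one from the text): even though NumPy's general eigenvalue routine works over the complex numbers, a symmetrized real matrix comes back with purely real eigenvalues.

```python
import numpy as np

# Build an arbitrary real symmetric matrix by symmetrizing a random one.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = (M + M.T) / 2          # A is symmetric: A == A.T

# The general routine doesn't assume symmetry, yet every
# eigenvalue of A has (numerically) zero imaginary part.
eigenvalues = np.linalg.eigvals(A)
print(np.allclose(np.imag(eigenvalues), 0))  # → True
```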


The Spectral Theorem

If you search for the spectral theorem online, you’ll often just see Statement 4 above; I’ve broken the theorem into smaller substatements to see how they are chained together.

The proof of Statement 1 is beyond our scope, since it involves fluency with complex numbers. If the term “complex conjugate” means something to you, read the proof here – it’s relatively short.

The key idea to prove is Statement 2: that for a symmetric matrix, eigenvectors corresponding to different eigenvalues are orthogonal. Suppose $\vec v_i$ is an eigenvector of $A$ with eigenvalue $\lambda_i$ and $\vec v_j$ is an eigenvector of $A$ with eigenvalue $\lambda_j$, where $\lambda_i \neq \lambda_j$. Then

$$A \vec v_i = \lambda_i \vec v_i \qquad \text{and} \qquad A \vec v_j = \lambda_j \vec v_j$$

Consider the dot product $\vec v_i \cdot (A \vec v_j)$. Using the fact that $\vec v_j$ is an eigenvector, we get

$$\vec v_i \cdot (A \vec v_j) = \vec v_i \cdot (\lambda_j \vec v_j) = \lambda_j (\vec v_i \cdot \vec v_j)$$

But we can also rewrite the same quantity using the fact that $A$ is symmetric:

$$\vec v_i \cdot (A \vec v_j) = \vec v_i^T A \vec v_j = \vec v_i^T A^T \vec v_j = (A \vec v_i)^T \vec v_j = \lambda_i \vec v_i^T \vec v_j = \lambda_i (\vec v_i \cdot \vec v_j)$$

So,

$$\lambda_j (\vec v_i \cdot \vec v_j) = \lambda_i (\vec v_i \cdot \vec v_j)$$

which means

$$(\lambda_j - \lambda_i)(\vec v_i \cdot \vec v_j) = 0$$

Since $\lambda_i \neq \lambda_j$, the first factor is non-zero, so we must have $\vec v_i \cdot \vec v_j = 0$. Therefore, eigenvectors corresponding to different eigenvalues are orthogonal.
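We can sanity-check this orthogonality numerically. Below is a sketch using an arbitrary symmetric matrix of my own choosing (not one from the text), whose eigenvalues happen to be distinct, so every pair of eigenvectors should be orthogonal.

```python
import numpy as np

# An arbitrary symmetric matrix with distinct eigenvalues (1, 2, and 4).
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

eigenvalues, V = np.linalg.eig(A)  # columns of V are unit eigenvectors

# Eigenvectors for different eigenvalues should have zero dot product,
# so V's columns are orthonormal and V.T @ V should be the identity.
print(np.allclose(V.T @ V, np.eye(3)))  # → True
```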

For a given eigenvector direction, we can pick any vector in that direction to be the eigenvector we store in the $V$ that we use to diagonalize $A$ – if $\vec v$ is an eigenvector, so is $2\vec v$, $-3\vec v$, $\frac{\vec v}{\lVert \vec v \rVert}$, and so on. The convenient choice is to pick unit vectors in each direction. If we take these $n$ unit eigenvectors and place them in the columns of a matrix, that matrix is an orthogonal matrix! Orthogonal matrices $Q$ satisfy $Q^TQ = QQ^T = I$, meaning their columns (and rows) are orthonormal – not just orthogonal to one another, but also unit length. The fact that $Q^TQ = QQ^T = I$ means that $Q^T = Q^{-1}$, so taking the transpose of an orthogonal matrix is the same as taking its inverse.

So, from the general eigenvalue decomposition

$$A = V \Lambda V^{-1}$$

we've "upgraded" to

$$A = Q \Lambda Q^T$$

This is the main takeaway of the spectral theorem: symmetric matrices can be diagonalized by an orthogonal matrix. Sometimes, $A = Q \Lambda Q^T$ is called the spectral decomposition of $A$, but it is just the eigenvalue decomposition specialized to symmetric matrices.
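In code, NumPy's `np.linalg.eigh` is built for exactly this case: it assumes its input is symmetric (Hermitian) and returns real eigenvalues along with an orthogonal matrix of unit eigenvectors. A minimal sketch, using the $2 \times 2$ matrix that appears later in this section:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])

# eigh assumes symmetry; it returns eigenvalues in ascending order
# and an orthogonal matrix Q whose columns are unit eigenvectors.
eigenvalues, Q = np.linalg.eigh(A)
Lambda = np.diag(eigenvalues)

print(np.allclose(Q @ Lambda @ Q.T, A))   # → True: A = Q Λ Qᵀ
print(np.allclose(Q.T @ Q, np.eye(2)))    # → True: Q is orthogonal
```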

Visualizing the Spectral Theorem

Why do we prefer $Q \Lambda Q^T$ over $V \Lambda V^{-1}$? Taking the transpose of a matrix is much easier than inverting it, so actually working with $Q \Lambda Q^T$ is easier.

$$\underbrace{A = Q \Lambda Q^T \implies A^k = Q \Lambda^k Q^T}_{\text{no inversion needed!}}$$
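This power formula is easy to test: raise the diagonal eigenvalue matrix to the $k$-th power elementwise and sandwich it between $Q$ and $Q^T$, with no inverse in sight. A quick sketch:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])
eigenvalues, Q = np.linalg.eigh(A)

# A^k = Q Λ^k Qᵀ: only the diagonal entries get raised to the power k.
k = 5
A_k = Q @ np.diag(eigenvalues ** k) @ Q.T

print(np.allclose(A_k, np.linalg.matrix_power(A, k)))  # → True
```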

But it’s also an improvement in terms of interpretation: remember that orthogonal matrices represent rotations (and reflections). So, if $A$ is symmetric, then the linear transformation $f(\vec x) = A \vec x$ is a sequence of rotations and stretches.

$$f(\vec x) = A \vec x = Q \Lambda Q^T \vec x$$

Let’s make sense of this visually. Consider the symmetric matrix $A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$.

Image produced in Jupyter

$A$ appears to perform an arbitrary transformation; it turns the unit square into a parallelogram, as we first saw in Chapter 6.1.

But, since $A$ is symmetric, it can be diagonalized by an orthogonal matrix, $A = Q \Lambda Q^T$.

$$A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$$

has eigenvalues $\lambda_1 = 3$ with eigenvector $\vec v_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $\lambda_2 = -1$ with eigenvector $\vec v_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$. But the $\vec v_i$’s I’ve written aren’t unit vectors, which they need to be in order for $Q$ to be orthogonal. So, we normalize them to get $\vec q_1 = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix}$ and $\vec q_2 = \begin{bmatrix} -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix}$. Placing these $\vec q_i$’s as columns of $Q$, we get

$$Q = \begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}$$

and so

$$A = Q \Lambda Q^T = \underbrace{\begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}}_{Q} \underbrace{\begin{bmatrix} 3 & 0 \\ 0 & -1 \end{bmatrix}}_{\Lambda} \underbrace{\begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}}_{Q^T}$$
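It's worth verifying this hand computation: multiplying the three factors back together should recover $A$ exactly. A quick check:

```python
import numpy as np

# The hand-computed factors from the text.
s = 1 / np.sqrt(2)
Q = np.array([[s, -s],
              [s,  s]])
Lambda = np.diag([3.0, -1.0])

# Multiplying Q Λ Qᵀ should recover the original matrix A.
A = Q @ Lambda @ Q.T
print(np.allclose(A, [[1, 2], [2, 1]]))  # → True
```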

We’re visualizing how $\vec x$ turns into $A \vec x$, i.e. how $\vec x$ turns into $Q \Lambda Q^T \vec x$. This means that we first need to consider the effect of $Q^T$ on $\vec x$, then the effect of $\Lambda$ on that result, and finally the effect of $Q$ on that result – that is, read the matrices from right to left.

Image produced in Jupyter

The Ellipse Perspective

Another way of visualizing the linear transformation of a symmetric matrix is to consider its effect on the unit circle, not the unit square. Below, I’ll apply $A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$ to the unit circle.

Image produced in Jupyter

Notice that $A$ transformed the unit circle into an ellipse. What’s more, the axes of the ellipse are the eigenvector directions of $A$!

Why is one axis longer than the other? As you might have guessed, the longer axis – the one in the direction of the eigenvector $\vec v_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ – corresponds to the eigenvalue with the larger magnitude. Remember that $A$ has $\lambda_1 = 3$ and $\lambda_2 = -1$, so the “up and to the right” axis is three times longer than the “down and to the right” axis, defined by $\vec v_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$.

Why does this happen? Since $A$ is symmetric, it has a spectral decomposition $A = Q \Lambda Q^T$, where $Q$ is orthogonal and $\Lambda$ is diagonal. For any vector $\vec x$, let $\vec y = Q^T \vec x$. Then

$$\vec x^T A \vec x = \vec x^T (Q \Lambda Q^T) \vec x = \vec y^T \Lambda \vec y = \sum_{i=1}^n \lambda_i y_i^2$$

Now imagine that $\vec x$ lies on one of $A$’s eigenvector directions, say the direction of $\vec v_1$. In that case, after rotating by $Q^T$, the vector $\vec y$ has only one non-zero coordinate, namely the coordinate corresponding to $\lambda_1$. So the sum above collapses to

$$\vec x^T A \vec x = \lambda_1 y_1^2$$

Similarly, along the eigenvector direction of $\vec v_2$, we get $\vec x^T A \vec x = \lambda_2 y_2^2$. The size of the output along each principal axis is therefore controlled by the corresponding eigenvalue: eigenvalues with larger magnitude produce longer axes, and those with smaller magnitude produce shorter axes. Here, $\lambda_1 = 3$ and $\lambda_2 = -1$, so the axis in the $\vec v_1$ direction is longer than the axis in the $\vec v_2$ direction.
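We can watch this collapse happen numerically: evaluating the quadratic form $\vec x^T A \vec x$ at the unit eigenvectors of $A$ should return exactly the corresponding eigenvalues. A quick check:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])

# Unit eigenvectors of A, from the example in the text.
q1 = np.array([1.0, 1.0]) / np.sqrt(2)   # eigenvector for λ1 = 3
q2 = np.array([-1.0, 1.0]) / np.sqrt(2)  # eigenvector for λ2 = -1

# Along each eigenvector direction, x^T A x collapses to λ_i * y_i^2 = λ_i.
print(round(q1 @ A @ q1, 10))  # → 3.0
print(round(q2 @ A @ q2, 10))  # → -1.0
```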


Key Takeaways

  1. The eigenvalue decomposition of a matrix $A$ is a decomposition of the form

    $$A = V \Lambda V^{-1}$$

    where $V$ is a matrix containing the eigenvectors of $A$ as columns, and $\Lambda$ is a diagonal matrix of eigenvalues in the same order. Only diagonalizable matrices can be decomposed in this way.

  2. The algebraic multiplicity of an eigenvalue $\lambda_i$ is the number of times $\lambda_i$ appears as a root of the characteristic polynomial of $A$.

  3. The geometric multiplicity of $\lambda$ is the dimension of the eigenspace of $\lambda$, i.e. $\text{dim}(\text{nullsp}(A - \lambda I))$.

  4. An $n \times n$ matrix $A$ is diagonalizable if and only if either of these equivalent conditions is true:

    • $A$ has $n$ linearly independent eigenvectors.

    • For every eigenvalue $\lambda_i$, $\text{GM}(\lambda_i) = \text{AM}(\lambda_i)$.

    Having $n$ distinct eigenvalues is a sufficient (but not necessary) condition: if $A$ has $n$ distinct eigenvalues, it is guaranteed to be diagonalizable. When $A$ is diagonalizable, it has an eigenvalue decomposition, $A = V \Lambda V^{-1}$.

  5. If $A$ is a symmetric matrix, then the spectral theorem tells us that $A$ can be diagonalized by an orthogonal matrix $Q$ such that

    $$A = Q \Lambda Q^T$$

    and that all of $A$’s eigenvalues are guaranteed to be real.

What’s next? There’s the question of how any of this relates to real data. Real data comes in rectangular matrices, not square matrices. And even if it were square, how would any of this enlighten us?