
10.1. Computing the Singular Value Decomposition

Eigenvalues and eigenvectors are powerful concepts, but unfortunately, they only apply to square matrices. It would be nice if we could extend some of their utility to non-square matrices, like matrices containing real data, which typically have many more rows than columns.

This is precisely where the singular value decomposition (SVD) comes in. You should think of it as a generalization of the eigenvalue decomposition, $A = V \Lambda V^{-1}$, to non-square matrices. When we first introduced eigenvalues and eigenvectors, we said that an eigenvector $\vec v$ of $A$ is a vector whose direction is unchanged when multiplied by $A$ – all that happens to $\vec v$ is that it's scaled by a factor of $\lambda$.

$$A \vec v = \lambda \vec v$$

But now, suppose $X$ is some $n \times d$ matrix. For any vector $\vec v \in \mathbb{R}^d$, the vector $X \vec v$ is in $\mathbb{R}^n$, not $\mathbb{R}^d$, meaning $\vec v$ and $X \vec v$ live in different universes. So, it can't make sense to say that $X \vec v$ results from stretching $\vec v$.

Instead, we'll find pairs of singular vectors, $\vec v \in \mathbb{R}^d$ and $\vec u \in \mathbb{R}^n$, such that

$$X \vec v = \sigma \vec u$$

where $\sigma$ is a singular value of $X$ (not a standard deviation!). Intuitively, this says that when $X$ is multiplied by $\vec v$, the result is a scaled version of $\vec u$. These singular values and vectors are the focus of this chapter. Applications of the singular value decomposition come in Chapter 10.2.


Introduction

I’ll start by giving you the definition of the SVD, and then together we’ll figure out where it came from.

The singular value decomposition of an $n \times d$ matrix $X$ is a factorization

$$X = \underbrace{U}_{n \times n} \: \underbrace{\Sigma}_{n \times d} \: \underbrace{V^T}_{d \times d}$$

where $U$ and $V$ are orthogonal matrices whose columns are the left and right singular vectors of $X$, and $\Sigma$ is a matrix whose only non-zero entries are the singular values of $X$, placed on its diagonal in decreasing order.

Firstly, note that the SVD exists, no matter what $X$ is: it can be non-square, and it doesn't even need to be full rank. But note that unlike in the eigenvalue decomposition, where we decomposed $A$ using just one eigenvector matrix $V$ and a diagonal matrix $\Lambda$, here we need to use two singular vector matrices, $U$ and $V$, and a diagonal matrix $\Sigma$.

There's a lot of notation and new concepts above. Let's start with an example, then try and understand where each piece in $X = U \Sigma V^T$ comes from and what it means. Suppose

$$X = \begin{bmatrix} 3 & 2 & 5 \\ 2 & 3 & 5 \\ 2 & -2 & 0 \\ 5 & 5 & 10 \end{bmatrix}$$

$X$ is a $4 \times 3$ matrix with $\text{rank}(X) = 2$, since its third column is the sum of the first two. Its singular value decomposition is given by $X = U \Sigma V^T$:

$$X = \underbrace{\begin{bmatrix} \frac{1}{\sqrt{6}} & \frac{1}{3\sqrt{2}} & -\frac{1}{\sqrt{3}} & -\frac{2}{3} \\ \frac{1}{\sqrt{6}} & -\frac{1}{3\sqrt{2}} & -\frac{1}{\sqrt{3}} & \frac{2}{3} \\ 0 & \frac{2\sqrt{2}}{3} & 0 & \frac{1}{3} \\ \frac{2}{\sqrt{6}} & 0 & \frac{1}{\sqrt{3}} & 0 \end{bmatrix}}_{U} \underbrace{\begin{bmatrix} 15 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}}_{\Sigma} \underbrace{\begin{bmatrix} \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}} & \frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{3}} \end{bmatrix}}_{V^T}$$

Two important observations:

  1. $U$ and $V$ are both orthogonal matrices, meaning $U^TU = UU^T = I_{4 \times 4}$ and $V^TV = VV^T = I_{3 \times 3}$.

  2. $\Sigma$ contains the singular values of $X$ on the diagonal, arranged in decreasing order. We have that $\sigma_1 = 15$, $\sigma_2 = 3$, and $\sigma_3 = 0$. $X$ has three singular values, but only two are non-zero. In general, the number of non-zero singular values is equal to the rank of $X$.
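
Both observations are easy to sanity-check numerically. Here's a quick sketch in numpy (which we'll lean on throughout this section), confirming that the rank matches the number of non-zero singular values:

```python
import numpy as np

X = np.array([[3, 2, 5],
              [2, 3, 5],
              [2, -2, 0],
              [5, 5, 10]])

# The third column is the sum of the first two, so rank(X) = 2...
print(np.linalg.matrix_rank(X))        # 2

# ...which matches the number of non-zero singular values.
s = np.linalg.svd(X, compute_uv=False)
print(np.round(s, 6))                  # [15.  3.  0.]
```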

Where did all of these numbers come from?


Discovering the SVD

The SVD of $X$ depends heavily on the matrices $X^TX$ and $XX^T$. While $X$ itself is $4 \times 3$,

  • $X^TX$ is a symmetric $3 \times 3$ matrix, containing the dot products of $X$'s columns

  • $XX^T$ is a symmetric $4 \times 4$ matrix, containing the dot products of $X$'s rows

Since $X^TX$ and $XX^T$ are both square matrices, they have eigenvalues and eigenvectors. And since they're both symmetric, their eigenvectors for different eigenvalues are orthogonal to each other, as the spectral theorem $A = Q \Lambda Q^T$ guarantees for any symmetric matrix $A$.

Singular Values and Singular Vectors

The singular value decomposition involves creatively using the eigenvalues and eigenvectors of $X^TX$ and $XX^T$. Suppose $X = U \Sigma V^T$ is the SVD of $X$. Then, using the facts that $U^TU = I$ and $V^TV = I$, we have:

$$X^TX = (U \Sigma V^T)^T (U \Sigma V^T) = V \Sigma^T \underbrace{U^T U}_{I} \Sigma V^T = \underbrace{V \Sigma^T \Sigma V^T}_{\text{looks like } Q \Lambda Q^T}$$

$$XX^T = (U \Sigma V^T) (U \Sigma V^T)^T = U \Sigma \underbrace{V^T V}_{I} \Sigma^T U^T = \underbrace{U \Sigma \Sigma^T U^T}_{\text{looks like } P \Lambda P^T}$$

This just looks like we diagonalized $X^TX$ and $XX^T$! The expressions above are saying that:

  • $V$'s columns are the eigenvectors of $X^TX$

  • $U$'s columns are the eigenvectors of $XX^T$

$X^TX$ and $XX^T$ usually have different sets of eigenvectors, which is why $U$ and $V$ are generally not the same matrix (they don't even have the same shape).

The non-zero eigenvalues of $X^TX$ and $XX^T$ are the same, though: they are the non-zero entries of $\Sigma^T \Sigma$ and $\Sigma \Sigma^T$. Since $\Sigma$ is an $n \times d$ matrix, $\Sigma^T \Sigma$ is a $d \times d$ matrix and $\Sigma \Sigma^T$ is an $n \times n$ matrix. But, when you work out both products, you'll notice that their non-zero values are the same.

Suppose for example that $\Sigma = \begin{bmatrix} \sigma_1 & 0 & 0 \\ 0 & \sigma_2 & 0 \\ 0 & 0 & \sigma_3 \\ 0 & 0 & 0 \end{bmatrix}$. Then,

$$\Sigma^T \Sigma = \begin{bmatrix} \sigma_1 & 0 & 0 & 0 \\ 0 & \sigma_2 & 0 & 0 \\ 0 & 0 & \sigma_3 & 0 \end{bmatrix} \begin{bmatrix} \sigma_1 & 0 & 0 \\ 0 & \sigma_2 & 0 \\ 0 & 0 & \sigma_3 \\ 0 & 0 & 0 \end{bmatrix} = \begin{bmatrix} \sigma_1^2 & 0 & 0 \\ 0 & \sigma_2^2 & 0 \\ 0 & 0 & \sigma_3^2 \end{bmatrix}$$

$$\Sigma \Sigma^T = \begin{bmatrix} \sigma_1 & 0 & 0 \\ 0 & \sigma_2 & 0 \\ 0 & 0 & \sigma_3 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} \sigma_1 & 0 & 0 & 0 \\ 0 & \sigma_2 & 0 & 0 \\ 0 & 0 & \sigma_3 & 0 \end{bmatrix} = \begin{bmatrix} \sigma_1^2 & 0 & 0 & 0 \\ 0 & \sigma_2^2 & 0 & 0 \\ 0 & 0 & \sigma_3^2 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

But, these matrices with the squared terms are precisely the $\Lambda$'s in the spectral decompositions of $X^TX = Q \Lambda Q^T$ and $XX^T = P \Lambda P^T$. This means that

$$\sigma_i^2 = \lambda_i \implies \sigma_i = \sqrt{\lambda_i}$$

where $\sigma_i$ is a singular value of $X$ and $\lambda_i$ is an eigenvalue of $X^TX$ or $XX^T$.

The above derivation is enough to justify that the eigenvalues of $X^TX$ and $XX^T$ are never negative, but for another perspective, note that both $X^TX$ and $XX^T$ are positive semidefinite, meaning their eigenvalues are non-negative. (You're wrestling with this fact in Lab 11 and Homework 10.)
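
Here's a numerical check of the relationship $\sigma_i = \sqrt{\lambda_i}$ on our example matrix. (The clip guards against eigenvalues like $-10^{-15}$ that floating point can produce in place of 0.)

```python
import numpy as np

X = np.array([[3, 2, 5],
              [2, 3, 5],
              [2, -2, 0],
              [5, 5, 10]])

# Eigenvalues of X^T X; eigvalsh works on symmetric matrices, ascending order.
lam = np.linalg.eigvalsh(X.T @ X)

# Square roots of the eigenvalues, largest first, clipped at 0 so that tiny
# negative floating point values don't break the square root.
print(np.round(np.sqrt(np.clip(lam[::-1], 0, None)), 6))  # [15.  3.  0.]

# Singular values of X, which numpy returns in decreasing order.
print(np.round(np.linalg.svd(X, compute_uv=False), 6))    # [15.  3.  0.]
```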

Computing the SVD

To find $U$, $\Sigma$, and $V^T$, we don't actually need to compute both $X^TX$ and $XX^T$: all of these quantities can be uncovered with just one of them.

Let's return to our example, $X = \begin{bmatrix} 3 & 2 & 5 \\ 2 & 3 & 5 \\ 2 & -2 & 0 \\ 5 & 5 & 10 \end{bmatrix}$. Using what we've just learned, let's find $U$, $\Sigma$, and $V^T$ ourselves. $X^TX$ has fewer entries than $XX^T$, so let's start with it. I will delegate some number crunching to numpy.

```python
import numpy as np

X = np.array([[3, 2, 5],
              [2, 3, 5],
              [2, -2, 0],
              [5, 5, 10]])
X.T @ X
```

```
array([[ 42,  33,  75],
       [ 33,  42,  75],
       [ 75,  75, 150]])
```

$X^TX$ is a $3 \times 3$ matrix, but its rank is 2, meaning it will have an eigenvalue of 0. What are its other eigenvalues?

```python
eigvals, eigvecs = np.linalg.eig(X.T @ X)
eigvals
```

```
array([225.,   9.,   0.])
```

The eigenvalues of $X^TX$ are 225, 9, and 0. This tells us that the singular values of $X$ are $\sqrt{225} = 15$, $\sqrt{9} = 3$, and 0. So far, we've discovered that $\Sigma = \begin{bmatrix} 15 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$. Remember that $\Sigma$ always has the same shape as $X$ (both are $n \times d$), and all of its entries are 0 except for the singular values, which are arranged in decreasing order on the diagonal, starting with the largest singular value in the top left corner.
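
As a small numpy sketch, note that $\Sigma$'s shape comes from $X$, not from the number of non-zero singular values:

```python
import numpy as np

# Sigma has the same shape as X (4 x 3), with the singular values
# on the diagonal in decreasing order.
Sigma = np.zeros((4, 3))
Sigma[:2, :2] = np.diag([15, 3])   # sigma_3 = 0, so it contributes nothing
print(Sigma)
```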

Let's now find the eigenvectors of $X^TX$ – that is, the right singular vectors of $X$ – which we should store in $V$. We expect the eigenvectors $\vec v_i$ of $X^TX$ to be orthogonal, since $X^TX$ is symmetric.

```python
X.T @ X
```

```
array([[ 42,  33,  75],
       [ 33,  42,  75],
       [ 75,  75, 150]])
```
  • For $\lambda_1 = 225$, one eigenvector is $\vec v_1 = \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}$, since

$$\underbrace{\begin{bmatrix} 42 & 33 & 75 \\ 33 & 42 & 75 \\ 75 & 75 & 150 \end{bmatrix}}_{X^TX} \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 225 \\ 225 \\ 450 \end{bmatrix} = 225 \begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix}$$
  • For $\lambda_2 = 9$, one eigenvector is $\vec v_2 = \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}$, since

$$\begin{bmatrix} 42 & 33 & 75 \\ 33 & 42 & 75 \\ 75 & 75 & 150 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} = \begin{bmatrix} 9 \\ -9 \\ 0 \end{bmatrix} = 9 \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix}$$
  • For $\lambda_3 = 0$, one eigenvector is $\vec v_3 = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}$, since

$$\begin{bmatrix} 42 & 33 & 75 \\ 33 & 42 & 75 \\ 75 & 75 & 150 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$

Note that $\vec v_3$ forms a basis for $\text{nullsp}(X)$, since $\dim(\text{nullsp}(X)) = 1$ and $X \vec v_3 = \vec 0$. Hold that thought for now.

To create $V$, all we need to do is turn $\vec v_1$, $\vec v_2$, and $\vec v_3$ into unit vectors.

$$\begin{bmatrix} 1 \\ 1 \\ 2 \end{bmatrix} \rightarrow \underbrace{\begin{bmatrix} \frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} \\ \frac{2}{\sqrt{6}} \end{bmatrix}}_{\vec v_1}, \quad \begin{bmatrix} 1 \\ -1 \\ 0 \end{bmatrix} \rightarrow \underbrace{\begin{bmatrix} \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \\ 0 \end{bmatrix}}_{\vec v_2}, \quad \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix} \rightarrow \underbrace{\begin{bmatrix} \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \\ -\frac{1}{\sqrt{3}} \end{bmatrix}}_{\vec v_3}$$

Stacking these unit vectors together gives us $V$:

$$V = \begin{bmatrix} \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{6}} & -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} \\ \frac{2}{\sqrt{6}} & 0 & -\frac{1}{\sqrt{3}} \end{bmatrix}$$

And indeed, since $X^TX$ is symmetric, $V$ is orthogonal: $V^TV = VV^T = I_{3 \times 3}$.
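
A quick numerical confirmation, building $V$ from the normalized eigenvectors we just found:

```python
import numpy as np

# Columns of V are the normalized eigenvectors of X^T X.
V = np.column_stack([
    np.array([1, 1, 2]) / np.sqrt(6),
    np.array([1, -1, 0]) / np.sqrt(2),
    np.array([1, 1, -1]) / np.sqrt(3),
])

# V is orthogonal: V^T V = V V^T = I (up to floating point).
print(np.allclose(V.T @ V, np.eye(3)))  # True
print(np.allclose(V @ V.T, np.eye(3)))  # True
```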

Great! We’re almost done computing the SVD. So far, we have

$$\underbrace{\begin{bmatrix} 3 & 2 & 5 \\ 2 & 3 & 5 \\ 2 & -2 & 0 \\ 5 & 5 & 10 \end{bmatrix}}_X = U \underbrace{\begin{bmatrix} 15 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}}_{\Sigma} \underbrace{\begin{bmatrix} \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}} & \frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{3}} \end{bmatrix}}_{V^T}$$


Ideally, we can avoid having to compute the eigenvectors of $XX^T$ to stack into $U$. And we can. If we start with

$$X = U \Sigma V^T$$

and multiply both sides on the right by $V$, we uncover a relationship between the columns of $U$ and the columns of $V$.

$$XV = U \Sigma$$

Let's unpack this. On the left, the matrix $XV$ is made up of multiplying $X$ by each column of $V$.

$$XV = X_{4 \times 3} \begin{bmatrix} | & | & | \\ \vec v_1 & \vec v_2 & \vec v_3 \\ | & | & | \end{bmatrix}_{3 \times 3} = \begin{bmatrix} | & | & | \\ X \vec v_1 & X \vec v_2 & X \vec v_3 \\ | & | & | \end{bmatrix}_{4 \times 3}$$

On the right, $U \Sigma$ is made up of stretching each column of $U$ by the corresponding singular value on the diagonal of $\Sigma$.

$$U \Sigma = \begin{bmatrix} | & | & | & | \\ \vec u_1 & \vec u_2 & \vec u_3 & \vec u_4 \\ | & | & | & | \end{bmatrix}_{4 \times 4} \begin{bmatrix} \sigma_1 & 0 & 0 \\ 0 & \sigma_2 & 0 \\ 0 & 0 & \sigma_3 \\ 0 & 0 & 0 \end{bmatrix}_{4 \times 3} = \begin{bmatrix} | & | & | \\ \sigma_1 \vec u_1 & \sigma_2 \vec u_2 & \sigma_3 \vec u_3 \\ | & | & | \end{bmatrix}_{4 \times 3}$$

But, since $XV = U \Sigma$, we have

$$\begin{bmatrix} | & | & | \\ X \vec v_1 & X \vec v_2 & X \vec v_3 \\ | & | & | \end{bmatrix} = \begin{bmatrix} | & | & | \\ \sigma_1 \vec u_1 & \sigma_2 \vec u_2 & \sigma_3 \vec u_3 \\ | & | & | \end{bmatrix}$$

Crucially though, $X \vec v_i = \sigma_i \vec u_i$ only holds when we've arranged the singular values and vectors in $U$, $\Sigma$, and $V^T$ consistently. This is one reason why we always arrange the singular values in $\Sigma$ in decreasing order.

Back to our example. Again, we currently have

$$\underbrace{\begin{bmatrix} 3 & 2 & 5 \\ 2 & 3 & 5 \\ 2 & -2 & 0 \\ 5 & 5 & 10 \end{bmatrix}}_X = U \underbrace{\begin{bmatrix} 15 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}}_{\Sigma} \underbrace{\begin{bmatrix} \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}} & \frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{3}} \end{bmatrix}}_{V^T}$$

We know $X$, and we know each $\vec v_i$ and $\sigma_i$. Rearranging $X \vec v_i = \sigma_i \vec u_i$ gives us

$$\vec u_i = \frac{1}{\sigma_i} X \vec v_i$$

which is a recipe for computing $\vec u_1$, $\vec u_2$, and $\vec u_3$.

  1. $\vec u_1 = \frac{1}{15} X \begin{bmatrix} \frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} \\ \frac{2}{\sqrt{6}} \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{6}} \\ \frac{1}{\sqrt{6}} \\ 0 \\ \frac{2}{\sqrt{6}} \end{bmatrix}$

  2. $\vec u_2 = \frac{1}{3} X \begin{bmatrix} \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \\ 0 \end{bmatrix} = \begin{bmatrix} \frac{1}{3\sqrt{2}} \\ -\frac{1}{3\sqrt{2}} \\ \frac{2\sqrt{2}}{3} \\ 0 \end{bmatrix}$

  3. $\vec u_3 = \frac{1}{0} X \begin{bmatrix} \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{3}} \\ -\frac{1}{\sqrt{3}} \end{bmatrix} = \ldots$ wait, what?

Something is not quite right: $\sigma_3 = 0$, which means we can't use $\vec u_i = \frac{1}{\sigma_i} X \vec v_i$ to compute $\vec u_3$. And, even if we could, this recipe tells us nothing about $\vec u_4$, which we also need to find, since $U$ is $4 \times 4$.
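
For the two non-zero singular values, though, the recipe works, and it's quick to verify numerically; here's a sketch reusing our example $X$:

```python
import numpy as np

X = np.array([[3, 2, 5],
              [2, 3, 5],
              [2, -2, 0],
              [5, 5, 10]])

v1 = np.array([1, 1, 2]) / np.sqrt(6)
v2 = np.array([1, -1, 0]) / np.sqrt(2)

# u_i = (1 / sigma_i) X v_i, valid only when sigma_i != 0.
u1 = X @ v1 / 15
u2 = X @ v2 / 3

print(np.round(u1, 4))  # [0.4082 0.4082 0.     0.8165]  matches u_1 above
print(np.round(u2, 4))  # [ 0.2357 -0.2357  0.9428  0.  ] matches u_2 above
```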

Null Spaces Return

So far, we've found two of the four columns of $U$.

$$U = \begin{bmatrix} \frac{1}{\sqrt{6}} & \frac{1}{3\sqrt{2}} & ? & ? \\ \frac{1}{\sqrt{6}} & -\frac{1}{3\sqrt{2}} & ? & ? \\ 0 & \frac{2\sqrt{2}}{3} & ? & ? \\ \frac{2}{\sqrt{6}} & 0 & ? & ? \end{bmatrix}$$

The issue is that we don't have a recipe for what $\vec u_3$ and $\vec u_4$ should be. This problem stems from the fact that $\text{rank}(X) = 2$.

$$X = \begin{bmatrix} 3 & 2 & 5 \\ 2 & 3 & 5 \\ 2 & -2 & 0 \\ 5 & 5 & 10 \end{bmatrix}$$

If we compute $XX^T$, whose eigenvectors are the columns of $U$, we see that it has an eigenvalue of 0 with geometric (and algebraic) multiplicity 2.

```python
X @ X.T
```

```
array([[ 38,  37,   2,  75],
       [ 37,  38,  -2,  75],
       [  2,  -2,   8,   0],
       [ 75,  75,   0, 150]])
```

```python
eigvals, eigvecs = np.linalg.eig(X @ X.T)
eigvals
```

```
array([225.,   9.,  -0.,   0.])
```

So, all we need to do is fill $\vec u_3$ and $\vec u_4$ with two orthonormal vectors that span the eigenspace of $XX^T$ for eigenvalue 0. $\vec u_1$ is already an eigenvector for eigenvalue 225 and $\vec u_2$ is already an eigenvector for eigenvalue 9. Bringing in $\vec u_3$ and $\vec u_4$ will complete $U$, making its columns a basis for $\mathbb{R}^4$, which they need to be for $U$ to be an invertible square matrix. (Remember, $U$ must satisfy $U^T U = UU^T = I$, which means $U$ is invertible.)

But, the eigenspace of $XX^T$ for eigenvalue 0 is $\text{nullsp}(XX^T)$!

Since $XX^T$ is symmetric, $\vec u_3$ and $\vec u_4$ will be orthogonal to $\vec u_1$ and $\vec u_2$ no matter what, since the eigenvectors for different eigenvalues of a symmetric matrix are always orthogonal. (Another perspective on why this is true is that $\vec u_3$ and $\vec u_4$ will be in $\text{nullsp}(XX^T)$, which is equivalent to $\text{nullsp}(X^T)$, and any vector in $\text{nullsp}(X^T)$ is orthogonal to any vector in $\text{colsp}(X)$, which $\vec u_1$ and $\vec u_2$ are in. There's a long chain of reasoning that leads to this conclusion: make sure you're familiar with it.)

Observe that

  • $\begin{bmatrix} -1 \\ -1 \\ 0 \\ 1 \end{bmatrix}$ is in $\text{nullsp}(XX^T)$, since $\begin{bmatrix} 38 & 37 & 2 & 75 \\ 37 & 38 & -2 & 75 \\ 2 & -2 & 8 & 0 \\ 75 & 75 & 0 & 150 \end{bmatrix} \begin{bmatrix} -1 \\ -1 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}$

  • $\begin{bmatrix} -2 \\ 2 \\ 1 \\ 0 \end{bmatrix}$ is in $\text{nullsp}(XX^T)$, since $\begin{bmatrix} 38 & 37 & 2 & 75 \\ 37 & 38 & -2 & 75 \\ 2 & -2 & 8 & 0 \\ 75 & 75 & 0 & 150 \end{bmatrix} \begin{bmatrix} -2 \\ 2 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}$

The vectors $\begin{bmatrix} -1 \\ -1 \\ 0 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} -2 \\ 2 \\ 1 \\ 0 \end{bmatrix}$ are orthogonal to each other, so they're good candidates for $\vec u_3$ and $\vec u_4$; we just need to normalize them first.

$$\begin{bmatrix} -1 \\ -1 \\ 0 \\ 1 \end{bmatrix} \rightarrow \underbrace{\begin{bmatrix} -\frac{1}{\sqrt{3}} \\ -\frac{1}{\sqrt{3}} \\ 0 \\ \frac{1}{\sqrt{3}} \end{bmatrix}}_{\vec u_3}, \quad \begin{bmatrix} -2 \\ 2 \\ 1 \\ 0 \end{bmatrix} \rightarrow \underbrace{\begin{bmatrix} -\frac{2}{3} \\ \frac{2}{3} \\ \frac{1}{3} \\ 0 \end{bmatrix}}_{\vec u_4}$$
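
As a quick numerical check (a sketch, redefining $X$ for completeness), these really are unit vectors in $\text{nullsp}(XX^T)$, orthogonal to each other:

```python
import numpy as np

X = np.array([[3, 2, 5],
              [2, 3, 5],
              [2, -2, 0],
              [5, 5, 10]])

u3 = np.array([-1, -1, 0, 1]) / np.sqrt(3)
u4 = np.array([-2, 2, 1, 0]) / 3

# Both are in nullsp(X X^T)...
print(np.allclose(X @ X.T @ u3, 0), np.allclose(X @ X.T @ u4, 0))  # True True
# ...and they're orthogonal to each other.
print(np.isclose(u3 @ u4, 0))                                      # True
```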

Before we place $\vec u_3$ and $\vec u_4$ in $U$, it's worth noticing that $\vec u_3$ and $\vec u_4$ are both in $\text{nullsp}(X^T)$ too.

$$X^T \vec u_3 = \begin{bmatrix} 3 & 2 & 2 & 5 \\ 2 & 3 & -2 & 5 \\ 5 & 5 & 0 & 10 \end{bmatrix} \begin{bmatrix} -\frac{1}{\sqrt{3}} \\ -\frac{1}{\sqrt{3}} \\ 0 \\ \frac{1}{\sqrt{3}} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$

$$X^T \vec u_4 = \begin{bmatrix} 3 & 2 & 2 & 5 \\ 2 & 3 & -2 & 5 \\ 5 & 5 & 0 & 10 \end{bmatrix} \begin{bmatrix} -\frac{2}{3} \\ \frac{2}{3} \\ \frac{1}{3} \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$

In fact, remember from Chapter 5.4 that

$$\text{nullsp}(X^T) = \text{nullsp}(XX^T)$$
$$\text{nullsp}(X) = \text{nullsp}(X^TX)$$

So, we have

$$U = \begin{bmatrix} \frac{1}{\sqrt{6}} & \frac{1}{3\sqrt{2}} & -\frac{1}{\sqrt{3}} & -\frac{2}{3} \\ \frac{1}{\sqrt{6}} & -\frac{1}{3\sqrt{2}} & -\frac{1}{\sqrt{3}} & \frac{2}{3} \\ 0 & \frac{2\sqrt{2}}{3} & 0 & \frac{1}{3} \\ \frac{2}{\sqrt{6}} & 0 & \frac{1}{\sqrt{3}} & 0 \end{bmatrix}$$

And finally, we have computed the SVD of $X$!

$$\underbrace{\begin{bmatrix} 3 & 2 & 5 \\ 2 & 3 & 5 \\ 2 & -2 & 0 \\ 5 & 5 & 10 \end{bmatrix}}_X = \underbrace{\begin{bmatrix} \frac{1}{\sqrt{6}} & \frac{1}{3\sqrt{2}} & -\frac{1}{\sqrt{3}} & -\frac{2}{3} \\ \frac{1}{\sqrt{6}} & -\frac{1}{3\sqrt{2}} & -\frac{1}{\sqrt{3}} & \frac{2}{3} \\ 0 & \frac{2\sqrt{2}}{3} & 0 & \frac{1}{3} \\ \frac{2}{\sqrt{6}} & 0 & \frac{1}{\sqrt{3}} & 0 \end{bmatrix}}_{U} \underbrace{\begin{bmatrix} 15 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}}_{\Sigma} \underbrace{\begin{bmatrix} \frac{1}{\sqrt{6}} & \frac{1}{\sqrt{6}} & \frac{2}{\sqrt{6}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 \\ \frac{1}{\sqrt{3}} & \frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{3}} \end{bmatrix}}_{V^T}$$

For the most part, we will use numpy to compute the SVD of a matrix. But, it’s important to understand the steps we took to compute the SVD manually.
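
For reference, here's what that looks like with np.linalg.svd. One caveat: numpy is free to flip the signs of matching columns of $U$ and $V$, and to pick a different basis for the null space portion of $U$, so its output may not match our hand computation entry for entry – both are valid SVDs.

```python
import numpy as np

X = np.array([[3, 2, 5],
              [2, 3, 5],
              [2, -2, 0],
              [5, 5, 10]])

# Full SVD: U is 4x4, s holds the singular values, Vt is 3x3.
U, s, Vt = np.linalg.svd(X)
print(np.round(s, 6))                  # [15.  3.  0.]

# Rebuild the 4x3 Sigma, then confirm that U @ Sigma @ Vt reconstructs X.
Sigma = np.zeros(X.shape)
Sigma[:3, :3] = np.diag(s)
print(np.allclose(U @ Sigma @ Vt, X))  # True
```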


Examples

Let’s work through a few computational problems. Most of our focus on the SVD in this class is conceptual, but it’s important to have some baseline computational fluency.

Example: SVD of a $2 \times 2$ Matrix

We developed the SVD to decompose non-square matrices. But, it still works for square matrices too. Find the SVD of $X = \begin{bmatrix} 5 & 4 \\ 0 & 3 \end{bmatrix}$.

Example: SVD of a Wide Matrix

Find the SVD of $X = \begin{bmatrix} 3 & 1 & 0 \\ 4 & 0 & 1 \end{bmatrix}$.

Example: SVD of a Positive Semidefinite Matrix

Recall, a positive semidefinite matrix is a square, symmetric matrix whose eigenvalues are non-negative.

Consider the positive semidefinite matrix $X = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}$. Find the SVD of $X$. How does it compare to the spectral decomposition of $X$ into $Q \Lambda Q^T$?

Example: SVD of a Symmetric Matrix with Negative Eigenvalues

Let $X = \begin{bmatrix} 2 & 3 \\ 3 & 2 \end{bmatrix}$. Find the SVD of $X$. How do its singular values compare to its eigenvalues?

(Remember that only square matrices have eigenvalues, but all matrices have singular values.)
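
Once you've worked it out by hand, here's a way to check your answer numerically:

```python
import numpy as np

X = np.array([[2, 3],
              [3, 2]])

# Eigenvalues (ascending) vs. singular values (descending).
print(np.linalg.eigvalsh(X))               # [-1.  5.]
print(np.linalg.svd(X, compute_uv=False))  # [5. 1.]
```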

Example: SVD of an Orthogonal Matrix

Suppose $X$ itself is an $n \times n$ orthogonal matrix, meaning $X^TX = XX^T = I$. Why are all of $X$'s singular values 1?
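
As a numerical illustration, here's a sketch using a $2 \times 2$ rotation matrix, a standard example of an orthogonal matrix (the angle 0.7 is arbitrary):

```python
import numpy as np

theta = 0.7  # any angle works; rotation matrices are orthogonal
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# All singular values of an orthogonal matrix are 1.
print(np.linalg.svd(Q, compute_uv=False))  # [1. 1.]
```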


Visualizing the SVD

In Chapter 9.5, we visualized the spectral theorem, which decomposes a symmetric matrix as $A = Q \Lambda Q^T$ – a rotation, a stretch, and a rotation back.

$$\underbrace{U}_{\text{orthogonal!}} \quad \underbrace{\Sigma}_{\text{diagonal!}} \quad \underbrace{V^T}_{\text{orthogonal!}}$$

$X = U \Sigma V^T$ can be interpreted similarly. If we think of $f(\vec w) = X \vec w$ as a linear transformation from $\mathbb{R}^d$ to $\mathbb{R}^n$, then $f$ operates on $\vec w$ in three stages:

  1. a rotation by $V^T$ (because $V$ is orthogonal), followed by

  2. a stretch by $\Sigma$, followed by

  3. a rotation by $U$

To illustrate, let’s consider the square (but not symmetric) matrix

$$X = \begin{bmatrix} 5 & 4 \\ 0 & 3 \end{bmatrix}$$

from one of the earlier worked examples. Unfortunately, it's difficult to visualize the SVD of matrices larger than $2 \times 2$, since then either the $\vec u_i$'s or $\vec v_i$'s (or both) are in $\mathbb{R}^3$ or higher.

$X$'s SVD is

$$\underbrace{\begin{bmatrix} 5 & 4 \\ 0 & 3 \end{bmatrix}}_X = \underbrace{\begin{bmatrix} \frac{3}{\sqrt{10}} & -\frac{1}{\sqrt{10}} \\ \frac{1}{\sqrt{10}} & \frac{3}{\sqrt{10}} \end{bmatrix}}_U \underbrace{\begin{bmatrix} 3\sqrt{5} & 0 \\ 0 & \sqrt{5} \end{bmatrix}}_{\Sigma} \underbrace{\begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}}_{V^T}$$
(Figure produced in Jupyter: the effect of $V^T$, $\Sigma$, and $U$ applied in stages.)
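
In code, the three stages can be applied one at a time; here's a sketch with this $X$ and an arbitrary input vector:

```python
import numpy as np

X = np.array([[5, 4],
              [0, 3]])
U, s, Vt = np.linalg.svd(X)

w = np.array([1.0, 2.0])  # an arbitrary input vector

# Rotate by V^T, stretch by Sigma, rotate by U -- same as applying X directly.
staged = U @ (np.diag(s) @ (Vt @ w))
print(np.allclose(staged, X @ w))  # True
```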

Next, we compare full vs. compact SVD and use the decomposition for low-rank approximation.