Eigenvalues and eigenvectors are powerful concepts, but unfortunately, they only apply to square matrices. It would be nice if we could extend some of their utility to non-square matrices, like matrices containing real data, which typically have many more rows than columns.
This is precisely where the singular value decomposition (SVD) comes in. You should think of it as a generalization of the eigenvalue decomposition, $A = V\Lambda V^{-1}$, to non-square matrices. When we first introduced eigenvalues and eigenvectors, we said that an eigenvector $v$ of $A$ is a vector whose direction is unchanged when multiplied by $A$ – all that happens to $v$ is that it's scaled by a factor of $\lambda$:

$$Av = \lambda v$$
But now, suppose $X$ is some $n \times d$ matrix. For any vector $v \in \mathbb{R}^d$, the vector $Xv$ is in $\mathbb{R}^n$, not $\mathbb{R}^d$, meaning $v$ and $Xv$ live in different universes. So, it can't make sense to say that $Xv$ results from stretching $v$.
Instead, we'll find pairs of singular vectors, $v \in \mathbb{R}^d$ and $u \in \mathbb{R}^n$, such that

$$Xv = \sigma u$$

where $\sigma$ is a singular value of $X$ (not a standard deviation!). Intuitively, this says that when $X$ is multiplied by $v$, the result is a scaled version of $u$. These singular values and vectors are the focus of Chapter 5.3. Applications of the singular value decomposition come in Chapter 5.4.
I’ll start by giving you the definition of the SVD, and then together we’ll figure out where it came from.
Firstly, note that the SVD exists, no matter what X is: it can be non-square, and it doesn’t even need to be full rank. But note that unlike in the eigenvalue decomposition, where we decomposed A using just one eigenvector matrix V and a diagonal matrix Λ, here we need to use two singular vector matrices U and V and a diagonal matrix Σ.
There's a lot of notation and new concepts above. Let's start with an example, then try to understand where each piece of $X = U\Sigma V^T$ comes from and what it means. Suppose
$$X = \begin{bmatrix} 3 & 2 & 5 \\ 2 & 3 & 5 \\ 2 & -2 & 0 \\ 5 & 5 & 10 \end{bmatrix}$$
X is a 4×3 matrix with rank(X)=2, since its third column is the sum of the first two. Its singular value decomposition is given by X=UΣVT:
U and V are both orthogonal matrices, meaning UTU=UUT=I4×4 and VTV=VVT=I3×3.
Σ contains the singular values of X on the diagonal, arranged in decreasing order. We have that σ1=15, σ2=3, and σ3=0. X has three singular values, but only two are non-zero. In general, the number of non-zero singular values is equal to the rank of X.
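If you want to sanity-check this connection between rank and non-zero singular values numerically, here's a quick sketch using numpy's built-in `np.linalg.svd` and `np.linalg.matrix_rank` (the `1e-10` tolerance for deciding what counts as "zero" is my choice):

```python
import numpy as np

X = np.array([[3, 2, 5],
              [2, 3, 5],
              [2, -2, 0],
              [5, 5, 10]])

# numpy returns the singular values in decreasing order
U, S, VT = np.linalg.svd(X)
print(np.round(S, 6))                                # approximately [15, 3, 0]

# the number of singular values above a small tolerance equals rank(X)
print(np.sum(S > 1e-10), np.linalg.matrix_rank(X))   # 2 2
```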
The SVD of X depends heavily on the matrices XTX and XXT. While X itself is 4×3,
XTX is a symmetric 3×3 matrix, containing the dot products of X’s columns
XXT is a symmetric 4×4 matrix, containing the dot products of X’s rows
Since XTX and XXT are both square matrices, they have eigenvalues and eigenvectors. And since they’re both symmetric, their eigenvectors for different eigenvalues are orthogonal to each other, as the spectral theorem A=QΛQT guarantees for any symmetric matrix A.
The singular value decomposition involves creatively using the eigenvalues and eigenvectors of XTX and XXT. Suppose X=UΣVT is the SVD of X. Then, using the facts that UTU=I and VTV=I, we have:
$$X^TX = (U\Sigma V^T)^T(U\Sigma V^T) = V\Sigma^T \underbrace{U^TU}_{I}\, \Sigma V^T = \underbrace{V\Sigma^T\Sigma V^T}_{\text{looks like } Q\Lambda Q^T}$$

$$XX^T = (U\Sigma V^T)(U\Sigma V^T)^T = U\Sigma \underbrace{V^TV}_{I}\, \Sigma^T U^T = \underbrace{U\Sigma\Sigma^T U^T}_{\text{looks like } P\Lambda P^T}$$
This just looks like we diagonalized XTX and XXT! The expressions above are saying that:
V’s columns are the eigenvectors of XTX
U’s columns are the eigenvectors of XXT
XTX and XXT usually have different sets of eigenvectors, which is why U and V are generally not the same matrix (they don’t even have the same shape).
The non-zero eigenvalues of $X^TX$ and $XX^T$ are the same, though: they are the non-zero entries of $\Sigma^T\Sigma$ and $\Sigma\Sigma^T$. Since $\Sigma$ is an $n \times d$ matrix, $\Sigma^T\Sigma$ is a $d \times d$ matrix and $\Sigma\Sigma^T$ is an $n \times n$ matrix. But, when you work out both products, you'll notice that their non-zero values are the same.
Suppose for example that $\Sigma = \begin{bmatrix} \sigma_1 & 0 & 0 \\ 0 & \sigma_2 & 0 \\ 0 & 0 & \sigma_3 \\ 0 & 0 & 0 \end{bmatrix}$. Then,

$$\Sigma^T\Sigma = \begin{bmatrix} \sigma_1^2 & 0 & 0 \\ 0 & \sigma_2^2 & 0 \\ 0 & 0 & \sigma_3^2 \end{bmatrix}, \qquad \Sigma\Sigma^T = \begin{bmatrix} \sigma_1^2 & 0 & 0 & 0 \\ 0 & \sigma_2^2 & 0 & 0 \\ 0 & 0 & \sigma_3^2 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$
But, these matrices with the squared terms are precisely the Λ's in the spectral decompositions of XTX=QΛQT and XXT=PΛPT. This means that
$$\sigma_i^2 = \lambda_i \implies \sigma_i = \sqrt{\lambda_i}$$
where σi is a singular value of X and λi is an eigenvalue of XTX or XXT.
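Before moving on, here's a quick numerical check of both claims – that $X^TX$ and $XX^T$ share their non-zero eigenvalues, and that the singular values of $X$ are their square roots – using the example matrix from above (`np.linalg.eigvalsh` returns the eigenvalues of a symmetric matrix in increasing order):

```python
import numpy as np

X = np.array([[3, 2, 5],
              [2, 3, 5],
              [2, -2, 0],
              [5, 5, 10]])

evals_small = np.linalg.eigvalsh(X.T @ X)   # 3 eigenvalues of the 3x3 matrix
evals_big = np.linalg.eigvalsh(X @ X.T)     # 4 eigenvalues of the 4x4 matrix

print(np.round(evals_small))   # approximately [0, 9, 225]
print(np.round(evals_big))     # approximately [0, 0, 9, 225]

# square roots of the non-zero eigenvalues = non-zero singular values of X
print(np.sqrt(evals_small[evals_small > 1e-8]))   # approximately [3, 15]
print(np.linalg.svd(X, compute_uv=False))         # approximately [15, 3, 0]
```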
The above derivation is enough to justify that the eigenvalues of XTX and XXT are never negative, but for another perspective, note that both XTX and XXT are positive semidefinite, meaning their eigenvalues are non-negative. (You’re wrestling with this fact in Lab 11 and Homework 10.)
Another proof that XTX and XXT have the same non-zero eigenvalues
Above, we implicitly used the fact that XTX and XXT have the same non-zero eigenvalues. Here’s a proof of this fact that has nothing to do with the SVD.
Suppose vi is an eigenvector of XTX with eigenvalue λi.
XTXvi=λivi
What happens if we multiply both sides on the left by X?
XXTXvi=Xλivi
Creatively adding parentheses gives us
XXT(Xvi)=λi(Xvi)
This shows that $Xv_i$ is an eigenvector of $XX^T$ with eigenvalue $\lambda_i$ (as long as $Xv_i \neq \vec 0$, which is guaranteed whenever $\lambda_i \neq 0$). The important thing is that the eigenvalue is shared. This logic can be reversed, too: if $u_i$ is an eigenvector of $XX^T$ with non-zero eigenvalue $\lambda_i$, then $\lambda_i$ is also an eigenvalue of $X^TX$.
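Here's the same argument carried out numerically on our example matrix, just as a sanity check (I grab the eigenvector of $X^TX$ for $\lambda = 225$ from `np.linalg.eigh`, which lists eigenvalues in increasing order):

```python
import numpy as np

X = np.array([[3, 2, 5],
              [2, 3, 5],
              [2, -2, 0],
              [5, 5, 10]])

lam, vecs = np.linalg.eigh(X.T @ X)   # eigenvalues in increasing order: 0, 9, 225
v = vecs[:, -1]                       # eigenvector of X^T X for lambda = 225

Xv = X @ v
print(np.allclose((X @ X.T) @ Xv, 225 * Xv))   # True: Xv is an eigenvector of X X^T
```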
To find U, Σ, and VT, we don’t actually need to compute both XTX and XXT: all of these quantities can be uncovered just with one of them.
Let's return to our example, $X = \begin{bmatrix} 3 & 2 & 5 \\ 2 & 3 & 5 \\ 2 & -2 & 0 \\ 5 & 5 & 10 \end{bmatrix}$. Using what we've just learned, let's find $U$, $\Sigma$, and $V^T$ ourselves. $X^TX$ has fewer entries than $XX^T$, so let's start with it. I will delegate some number crunching to numpy.
import numpy as np
X = np.array([[3, 2, 5],
              [2, 3, 5],
              [2, -2, 0],
              [5, 5, 10]])
X.T @ X
XTX is a 3×3 matrix, but its rank is 2, meaning it will have an eigenvalue of 0. What are its other eigenvalues?
np.set_printoptions(precision=0, suppress=True)
eigvals, eigvecs = np.linalg.eig(X.T @ X)
eigvals
array([225., 9., 0.])
The eigenvalues of $X^TX$ are 225, 9, and 0. This tells us that the singular values of $X$ are $\sqrt{225} = 15$, $\sqrt{9} = 3$, and $0$. So far, we've discovered that $\Sigma = \begin{bmatrix} 15 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$. Remember that $\Sigma$ always has the same shape as $X$ (both are $n \times d$), and all of its entries are 0 except for the singular values, which are arranged in decreasing order on the diagonal, starting with the largest singular value in the top left corner.
Let's now find the eigenvectors of $X^TX$ – that is, the right singular vectors of $X$ – which we should store in $V$. We expect the eigenvectors $v_i$ of $X^TX$ to be orthogonal, since $X^TX$ is symmetric.
Crucially though, Xvi=σiui only holds when we’ve arranged the singular values and vectors in U, Σ, and VT consistently. This is one reason why we always arrange the singular values in Σ in decreasing order.
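Here's one way to carry out that computation in numpy. This is just a sketch: `np.linalg.eigh` returns eigenvalues in increasing order (and may flip the signs of the eigenvectors), so we re-sort everything into decreasing order before applying $u_i = \frac{1}{\sigma_i}Xv_i$.

```python
import numpy as np

X = np.array([[3, 2, 5],
              [2, 3, 5],
              [2, -2, 0],
              [5, 5, 10]])

# eigenvalues/eigenvectors of X^T X, re-sorted into decreasing order
lam, vecs = np.linalg.eigh(X.T @ X)
order = np.argsort(lam)[::-1]
sigmas = np.sqrt(np.maximum(lam[order], 0))   # approximately [15, 3, 0]
V = vecs[:, order]                            # columns are v1, v2, v3

# u_i = (1 / sigma_i) X v_i, for the non-zero singular values only
u1 = X @ V[:, 0] / sigmas[0]
u2 = X @ V[:, 1] / sigmas[1]

# u1 and u2 come out as orthogonal unit vectors
print(np.round(np.linalg.norm(u1), 6), np.round(np.linalg.norm(u2), 6))   # 1.0 1.0
print(np.round(u1 @ u2, 6))                                               # 0.0
```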
Something is not quite right: $\sigma_3 = 0$, which means we can't use $u_i = \frac{1}{\sigma_i}Xv_i$ to compute $u_3$. And, even if we could, this recipe tells us nothing about $u_4$, which we also need to find, since $U$ is $4 \times 4$.
The issue is that we don’t have a recipe for what u3 and u4 should be. This problem stems from the fact that rank(X)=2.
$$X = \begin{bmatrix} 3 & 2 & 5 \\ 2 & 3 & 5 \\ 2 & -2 & 0 \\ 5 & 5 & 10 \end{bmatrix}$$
If we were to compute XXT, whose eigenvectors are the columns of U, we would have seen that it has an eigenvalue of 0 with geometric (and algebraic) multiplicity 2.
So, all we need to do is fill $u_3$ and $u_4$ with two orthonormal vectors that span the eigenspace of $XX^T$ for eigenvalue 0. $u_1$ is already an eigenvector for eigenvalue 225 and $u_2$ is already an eigenvector for eigenvalue 9. Bringing in $u_3$ and $u_4$ will complete $U$, making its columns a basis for $\mathbb{R}^4$, which they need to be for $U$ to be an invertible square matrix. (Remember, $U$ must satisfy $U^TU = UU^T = I$, which means $U$ is invertible.)
But, the eigenspace of XXT for eigenvalue 0 is nullsp(XXT)!
Since XXT is symmetric, u3 and u4 will be orthogonal to u1 and u2 no matter what, since the eigenvectors for different eigenvalues of a symmetric matrix are always orthogonal. (Another perspective on why this is true is that u3 and u4 will be in nullsp(XXT) which is equivalent to nullsp(XT), and any vector in nullsp(XT) is orthogonal to any vector in colsp(X), which u1 and u2 are in. There’s a long chain of reasoning that leads to this conclusion: make sure you’re familiar with it.)
Observe that
$\begin{bmatrix} -1 \\ -1 \\ 0 \\ 1 \end{bmatrix}$ is in $\text{nullsp}(XX^T)$, since $\begin{bmatrix} 38 & 37 & 2 & 75 \\ 37 & 38 & -2 & 75 \\ 2 & -2 & 8 & 0 \\ 75 & 75 & 0 & 150 \end{bmatrix}\begin{bmatrix} -1 \\ -1 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}$
$\begin{bmatrix} -2 \\ 2 \\ 1 \\ 0 \end{bmatrix}$ is in $\text{nullsp}(XX^T)$, since $\begin{bmatrix} 38 & 37 & 2 & 75 \\ 37 & 38 & -2 & 75 \\ 2 & -2 & 8 & 0 \\ 75 & 75 & 0 & 150 \end{bmatrix}\begin{bmatrix} -2 \\ 2 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}$
The vectors $\begin{bmatrix} -1 \\ -1 \\ 0 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} -2 \\ 2 \\ 1 \\ 0 \end{bmatrix}$ are orthogonal to each other, so they're good candidates for $u_3$ and $u_4$; we just need to normalize them first.
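To tie this off, here's a small numpy check that the resulting $U$ really is orthogonal. This is a sketch that uses one valid sign choice for the right singular vectors, $v_1 = \frac{1}{\sqrt 6}(1, 1, 2)$ and $v_2 = \frac{1}{\sqrt 2}(1, -1, 0)$, which are unit eigenvectors of $X^TX$ for eigenvalues 225 and 9.

```python
import numpy as np

X = np.array([[3, 2, 5],
              [2, 3, 5],
              [2, -2, 0],
              [5, 5, 10]])

# u1 and u2 come from u_i = (1 / sigma_i) X v_i
u1 = X @ (np.array([1, 1, 2]) / np.sqrt(6)) / 15   # v1, sigma_1 = 15
u2 = X @ (np.array([1, -1, 0]) / np.sqrt(2)) / 3   # v2, sigma_2 = 3

# u3 and u4: normalized vectors spanning nullsp(X X^T) = nullsp(X^T)
u3 = np.array([-1, -1, 0, 1]) / np.sqrt(3)
u4 = np.array([-2, 2, 1, 0]) / 3

U = np.column_stack([u1, u2, u3, u4])
print(np.allclose(U.T @ U, np.eye(4)))   # True: the columns of U are orthonormal
```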
Let’s work through a few computational problems. Most of our focus on the SVD in this class is conceptual, but it’s important to have some baseline computational fluency.
We developed the SVD to decompose non-square matrices. But, it still works for square matrices too. Find the SVD of $X = \begin{bmatrix} 5 & 4 \\ 0 & 3 \end{bmatrix}$.
Solution
Let’s follow the six steps introduced in the “Computing the SVD By Hand” box above.
$$X^TX = \begin{bmatrix} 5 & 0 \\ 4 & 3 \end{bmatrix}\begin{bmatrix} 5 & 4 \\ 0 & 3 \end{bmatrix} = \begin{bmatrix} 25 & 20 \\ 20 & 25 \end{bmatrix}$$
Now, we need to find the eigenvalues and eigenvectors of $X^TX$. Notice that both rows of $X^TX$ have the same sum (45), which suggests checking whether $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ is an eigenvector:
$$X^TX\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 25 & 20 \\ 20 & 25 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 45 \\ 45 \end{bmatrix} = 45\begin{bmatrix} 1 \\ 1 \end{bmatrix}$$
So, $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ is an eigenvector for eigenvalue 45. Before placing it into $V$, I'll need to turn it into a unit vector; since the norm of $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ is $\sqrt 2$, I'll place $\begin{bmatrix} \frac{1}{\sqrt 2} \\ \frac{1}{\sqrt 2} \end{bmatrix}$ into $V$.
Since $X^TX$ is symmetric, its other eigenvector will be orthogonal to $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$, so I'll choose $\begin{bmatrix} -1 \\ 1 \end{bmatrix}$. This works because in $\mathbb{R}^2$, there's only one direction (up to sign) orthogonal to any given vector. To find the corresponding eigenvalue, compute:
$$X^TX\begin{bmatrix} -1 \\ 1 \end{bmatrix} = \begin{bmatrix} 25 & 20 \\ 20 & 25 \end{bmatrix}\begin{bmatrix} -1 \\ 1 \end{bmatrix} = \begin{bmatrix} -5 \\ 5 \end{bmatrix} = 5\begin{bmatrix} -1 \\ 1 \end{bmatrix}$$
So, $\begin{bmatrix} -1 \\ 1 \end{bmatrix}$ – or, equivalently, $\begin{bmatrix} -\frac{1}{\sqrt 2} \\ \frac{1}{\sqrt 2} \end{bmatrix}$ – is an eigenvector for eigenvalue 5.
I now have enough information to form Σ and V:
$$\Sigma = \begin{bmatrix} \sqrt{45} & 0 \\ 0 & \sqrt 5 \end{bmatrix}, \qquad V = \begin{bmatrix} \frac{1}{\sqrt 2} & -\frac{1}{\sqrt 2} \\ \frac{1}{\sqrt 2} & \frac{1}{\sqrt 2} \end{bmatrix}$$
To form $U$, I need to find two orthonormal vectors that are eigenvectors of $XX^T$ for eigenvalues 45 and 5. The easy way is to use the fact that $u_i = \frac{1}{\sigma_i}Xv_i$:

$$u_1 = \frac{1}{\sqrt{45}}\begin{bmatrix} 5 & 4 \\ 0 & 3 \end{bmatrix}\begin{bmatrix} \frac{1}{\sqrt 2} \\ \frac{1}{\sqrt 2} \end{bmatrix} = \begin{bmatrix} \frac{3}{\sqrt{10}} \\ \frac{1}{\sqrt{10}} \end{bmatrix}, \qquad u_2 = \frac{1}{\sqrt 5}\begin{bmatrix} 5 & 4 \\ 0 & 3 \end{bmatrix}\begin{bmatrix} -\frac{1}{\sqrt 2} \\ \frac{1}{\sqrt 2} \end{bmatrix} = \begin{bmatrix} -\frac{1}{\sqrt{10}} \\ \frac{3}{\sqrt{10}} \end{bmatrix}$$
Note that I could have also used the fact that I knew u2 would be orthogonal to u1 to find it, since u1 and u2 both live in R2.
Putting it all together,
$$X = U\Sigma V^T = \begin{bmatrix} \frac{3}{\sqrt{10}} & -\frac{1}{\sqrt{10}} \\ \frac{1}{\sqrt{10}} & \frac{3}{\sqrt{10}} \end{bmatrix}\begin{bmatrix} \sqrt{45} & 0 \\ 0 & \sqrt 5 \end{bmatrix}\begin{bmatrix} \frac{1}{\sqrt 2} & \frac{1}{\sqrt 2} \\ -\frac{1}{\sqrt 2} & \frac{1}{\sqrt 2} \end{bmatrix}$$
Don't forget to transpose $V$ when writing the final SVD! Also, note that this is a totally different decomposition from the eigenvalue decomposition $X = V\Lambda V^{-1}$: $X$'s eigenvalues are 5 and 3, which don't appear above.
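If you'd like to double-check this by machine, here's a quick verification sketch; note that `np.linalg.svd` may return singular vectors whose signs differ from ours, but the product $U\Sigma V^T$ and the singular values themselves will match.

```python
import numpy as np

X = np.array([[5, 4],
              [0, 3]])

# the hand-computed factors
U = np.array([[3, -1],
              [1, 3]]) / np.sqrt(10)
S = np.diag([np.sqrt(45), np.sqrt(5)])
V = np.array([[1, -1],
              [1, 1]]) / np.sqrt(2)

print(np.allclose(U @ S @ V.T, X))          # True
print(np.linalg.svd(X, compute_uv=False))   # approximately [6.708, 2.236] = [sqrt(45), sqrt(5)]
```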
Next, let's find the SVD of the wide matrix $X = \begin{bmatrix} 3 & 1 & 0 \\ 4 & 0 & 1 \end{bmatrix}$. If I were to start with $X^TX$, I'd have to compute a $3 \times 3$ matrix. Instead, I'll start with $XX^T$, which is $2 \times 2$. This will mean filling in the columns of $U$ before I fill in the columns of $V$, which will require a slightly different approach.
$$XX^T = \begin{bmatrix} 3 & 1 & 0 \\ 4 & 0 & 1 \end{bmatrix}\begin{bmatrix} 3 & 4 \\ 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 10 & 12 \\ 12 & 17 \end{bmatrix}$$
$XX^T$'s eigenvalues must add to $10 + 17 = 27$ and multiply to $10 \cdot 17 - 12^2 = 26$. This quickly tells me its eigenvalues are 26 and 1.
For $\lambda = 26$, any eigenvector is of the form $\begin{bmatrix} a \\ b \end{bmatrix}$ where
$$\begin{bmatrix} 10 & 12 \\ 12 & 17 \end{bmatrix}\begin{bmatrix} a \\ b \end{bmatrix} = 26\begin{bmatrix} a \\ b \end{bmatrix}$$
This implies that $10a + 12b = 26a$, i.e. $16a = 12b$, or $a = \frac{3}{4}b$. Eventually we'll need to normalize the vector, but for now I'll pick $b = 4$ so that $a = 3$, making $\begin{bmatrix} 3 \\ 4 \end{bmatrix}$ an eigenvector.
For $\lambda = 1$, the eigenvector needs to be orthogonal to $\begin{bmatrix} 3 \\ 4 \end{bmatrix}$, which is easy to find when working in $\mathbb{R}^2$, since there's only one direction (up to sign) orthogonal to $\begin{bmatrix} 3 \\ 4 \end{bmatrix}$ in $\mathbb{R}^2$. The easy choice is $\begin{bmatrix} -4 \\ 3 \end{bmatrix}$.
I now have enough information to form both U and Σ:
$$U = \begin{bmatrix} 3/5 & -4/5 \\ 4/5 & 3/5 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \sqrt{26} & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$$
The extra column of 0s is to account for the fact that X is 2×3 and Σ has the same shape as X. Everything but the non-zero singular values along the diagonal of Σ is 0.
Finally, how do we find the columns of the $3 \times 3$ matrix $V$? The equation we used in earlier examples, $Xv_i = \sigma_i u_i$, is not useful here: since it's the unknown vector $v_i$ that's being multiplied by $X$, using it would mean solving a system of three equations and three unknowns for each $v_i$. That equation works only when you first find the eigenvectors of $X^TX$ (the columns of $V$). We're proceeding in a different order, so we need a different equation. (There's always the fallback of using the fact that $X^TX$'s non-zero eigenvalues are 26 and 1 and solving for its eigenvectors directly, but let's think of another approach.)
The parallel equation to Xvi=σiui is XTui=σivi. Where did this come from? Start with
Xvi=σiui
and multiply both sides on the left by XT:
XTXvi=XTσiui
But, vi is an eigenvector of XTX with eigenvalue σi2, so replace XTXvi with σi2vi:
$$\sigma_i^2 v_i = X^T\sigma_i u_i \implies X^Tu_i = \sigma_i v_i \implies v_i = \frac{1}{\sigma_i}X^Tu_i$$
This gives us recipes for the first and second columns of the $3 \times 3$ matrix $V$:
$$v_1 = \frac{1}{\sigma_1}X^Tu_1 = \frac{1}{\sqrt{26}}\begin{bmatrix} 3 & 4 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 3/5 \\ 4/5 \end{bmatrix} = \begin{bmatrix} 5/\sqrt{26} \\ 3/(5\sqrt{26}) \\ 4/(5\sqrt{26}) \end{bmatrix}$$
$v_2$ will be orthogonal to $v_1$, but there are infinitely many directions it could point in since these vectors live in $\mathbb{R}^3$ (and not $\mathbb{R}^2$), so I can't pick just any vector orthogonal to $v_1$. So, let's use the equation again:
$$v_2 = \frac{1}{\sigma_2}X^Tu_2 = \frac{1}{1}\begin{bmatrix} 3 & 4 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} -4/5 \\ 3/5 \end{bmatrix} = \begin{bmatrix} 0 \\ -4/5 \\ 3/5 \end{bmatrix}$$
$v_3$ can't be found using $v_3 = \frac{1}{\sigma_3}X^Tu_3$, because $\sigma_3 = 0$: the rank of $X$ is 2, so any $\sigma_i$'s after $\sigma_2$ are 0. Instead, $v_3$ can be found by searching for a vector in $\text{nullsp}(X)$, or equivalently, $\text{nullsp}(X^TX)$.
$$X = \begin{bmatrix} 3 & 1 & 0 \\ 4 & 0 & 1 \end{bmatrix}$$
Notice that $X$'s first column is 3 times its second column plus 4 times its third column, so $\begin{bmatrix} -1 \\ 3 \\ 4 \end{bmatrix}$ is in $\text{nullsp}(X)$. I should normalize this before placing it into $V$, so $v_3 = \begin{bmatrix} -1/\sqrt{26} \\ 3/\sqrt{26} \\ 4/\sqrt{26} \end{bmatrix}$.
Putting all this together, we have that the SVD of $X$ is
$$X = U\Sigma V^T = \begin{bmatrix} 3/5 & -4/5 \\ 4/5 & 3/5 \end{bmatrix}\begin{bmatrix} \sqrt{26} & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}\begin{bmatrix} 5/\sqrt{26} & 3/(5\sqrt{26}) & 4/(5\sqrt{26}) \\ 0 & -4/5 & 3/5 \\ -1/\sqrt{26} & 3/\sqrt{26} & 4/\sqrt{26} \end{bmatrix}$$
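As before, a quick numpy sketch confirms that these hand-computed factors multiply back to $X$, and that $V$ is orthogonal:

```python
import numpy as np

X = np.array([[3, 1, 0],
              [4, 0, 1]])

U = np.array([[3/5, -4/5],
              [4/5,  3/5]])
S = np.array([[np.sqrt(26), 0, 0],
              [0,           1, 0]])
V = np.column_stack([
    np.array([5, 3/5, 4/5]) / np.sqrt(26),   # v1
    np.array([0, -4/5, 3/5]),                # v2
    np.array([-1, 3, 4]) / np.sqrt(26),      # v3
])

print(np.allclose(U @ S @ V.T, X))       # True
print(np.allclose(V.T @ V, np.eye(3)))   # True: V is orthogonal
```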
Recall, a positive semidefinite matrix is a square, symmetric matrix whose eigenvalues are non-negative.
Consider the positive semidefinite matrix $X = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}$. Find the SVD of $X$. How does it compare to the spectral decomposition of $X$ into $Q\Lambda Q^T$?
Solution
Let’s find X’s singular value decomposition first. To prevent this section from being too long, I’ll do this relatively quickly:
$X^TX = \begin{bmatrix} 5 & 10 \\ 10 & 20 \end{bmatrix}$, which has eigenvalues 25 and 0 (we can spot the 0 quickly because $X$ and $X^TX$ are not invertible, so they have an eigenvalue of 0).
For $\lambda = 0$, the corresponding eigenvector direction is $\begin{bmatrix} 2 \\ -1 \end{bmatrix}$.
So, for $\lambda = 25$, the eigenvector direction is the orthogonal one, $\begin{bmatrix} 1 \\ 2 \end{bmatrix}$.
So, the Σ and V are
$$\Sigma = \begin{bmatrix} \sqrt{25} & 0 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 5 & 0 \\ 0 & 0 \end{bmatrix}, \qquad V = \begin{bmatrix} 1/\sqrt 5 & 2/\sqrt 5 \\ 2/\sqrt 5 & -1/\sqrt 5 \end{bmatrix}$$
What's left is $U$, which holds the eigenvectors of $XX^T$. It is tempting to just say that $XX^T$ and $X^TX$ are the same matrix, so they have the same eigenvectors, so $U = V$. This happens to work here, but it's a bad habit to get into, since it can lead us into trouble in similar examples. (For instance, if $u_1$ is a column of $U$ and $v_1$ is a column of $V$, we could replace them with $-u_1$ and $-v_1$, respectively, and still have a valid SVD. But if we were to find the columns of $U$ totally independently from the columns of $V$, we wouldn't know which signs pair up correctly.)
Instead, let's use $u_i = \frac{1}{\sigma_i}Xv_i$ to find the columns of $U$, to ensure the correspondence between each $u_i$ and its partner $v_i$.
$u_1 = \frac{1}{\sigma_1}Xv_1 = \frac{1}{5}\begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}\begin{bmatrix} 1/\sqrt 5 \\ 2/\sqrt 5 \end{bmatrix} = \begin{bmatrix} 1/\sqrt 5 \\ 2/\sqrt 5 \end{bmatrix}$.
$u_2 = \frac{1}{\sigma_2}Xv_2$ can't be evaluated since $\sigma_2 = 0$, but we know $u_2$ must be orthogonal to $u_1$, so pick $\begin{bmatrix} 2/\sqrt 5 \\ -1/\sqrt 5 \end{bmatrix}$.
So,
$$X = U\Sigma V^T = \begin{bmatrix} 1/\sqrt 5 & 2/\sqrt 5 \\ 2/\sqrt 5 & -1/\sqrt 5 \end{bmatrix}\begin{bmatrix} 5 & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} 1/\sqrt 5 & 2/\sqrt 5 \\ 2/\sqrt 5 & -1/\sqrt 5 \end{bmatrix}^T$$
This is exactly the same as the spectral decomposition of $X$ into $Q\Lambda Q^T$, just with $\Sigma = \Lambda$ and $U = V$!
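Here's a small numerical illustration of this special case (a sketch; `np.linalg.eigh` returns eigenvalues in increasing order, so I reverse them to compare against the singular values):

```python
import numpy as np

X = np.array([[1, 2],
              [2, 4]])

U, S, VT = np.linalg.svd(X)
evals, Q = np.linalg.eigh(X)

print(np.round(S, 6))             # approximately [5, 0] -- singular values
print(np.round(evals[::-1], 6))   # approximately [5, 0] -- eigenvalues, largest first

# for this positive semidefinite X, U and V match up to the signs of their columns
print(np.allclose(np.abs(U), np.abs(VT.T)))   # True
```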
Don't try to make too broad a generalization of this: when matrices are square, their SVD $X = U\Sigma V^T$ is generally different from their eigenvalue decomposition $X = V\Lambda V^{-1}$, as we saw with $\begin{bmatrix} 5 & 4 \\ 0 & 3 \end{bmatrix}$. Even when $X$ is symmetric, its SVD and spectral decomposition don't need to be the same. A key aspect of the equivalence in this example is that $X$ is positive semidefinite, meaning its eigenvalues – 5 and 0 (verify this yourself; it wasn't part of our solution above) – are non-negative. If any eigenvalue were negative, the SVD and spectral decomposition would be different, as we're about to see.
Example: SVD of a Symmetric Matrix with Negative Eigenvalues
Let $X = \begin{bmatrix} 2 & 3 \\ 3 & 2 \end{bmatrix}$. Find the SVD of $X$. How do its singular values compare to its eigenvalues?
(Remember that only square matrices have eigenvalues, but all matrices have singular values.)
Solution
$X$ is a symmetric matrix, but it is not positive semidefinite like in the last example, since one of its eigenvalues is negative. Its eigenvalues are 5 and $-1$, corresponding to eigenvectors $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} -1 \\ 1 \end{bmatrix}$. $X$'s spectral decomposition, then, is
$$X = Q\Lambda Q^T = \begin{bmatrix} 1/\sqrt 2 & -1/\sqrt 2 \\ 1/\sqrt 2 & 1/\sqrt 2 \end{bmatrix}\begin{bmatrix} 5 & 0 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} 1/\sqrt 2 & 1/\sqrt 2 \\ -1/\sqrt 2 & 1/\sqrt 2 \end{bmatrix}$$
So, we know $X$'s singular values can't be the same as its eigenvalues, as singular values can't be negative. What are $X$'s singular values? They come from taking the square roots of the eigenvalues of $X^TX$, which is the same as $XX^T$:
$$X^TX = XX^T = \begin{bmatrix} 13 & 12 \\ 12 & 13 \end{bmatrix}$$
$X^TX$ has eigenvalues 25 and 1, which means $X$'s singular values are 5 and 1 (not $-1$), with corresponding eigenvectors $\begin{bmatrix} 1/\sqrt 2 \\ 1/\sqrt 2 \end{bmatrix}$ and $\begin{bmatrix} -1/\sqrt 2 \\ 1/\sqrt 2 \end{bmatrix}$. So, we have
$$\Sigma = \begin{bmatrix} 5 & 0 \\ 0 & 1 \end{bmatrix}, \qquad V = \begin{bmatrix} 1/\sqrt 2 & -1/\sqrt 2 \\ 1/\sqrt 2 & 1/\sqrt 2 \end{bmatrix}$$
Finally, we need to find $U$. Again, it's tempting to say that since $X^TX = XX^T$, we must have $U = V$ – but if we did that, our $U$ would be wrong! Instead, we need to use the one-at-a-time relationship $u_i = \frac{1}{\sigma_i}Xv_i$ to find the columns of $U$:
$$u_1 = \frac{1}{5}\begin{bmatrix} 2 & 3 \\ 3 & 2 \end{bmatrix}\begin{bmatrix} 1/\sqrt 2 \\ 1/\sqrt 2 \end{bmatrix} = \begin{bmatrix} 1/\sqrt 2 \\ 1/\sqrt 2 \end{bmatrix}, \qquad u_2 = \frac{1}{1}\begin{bmatrix} 2 & 3 \\ 3 & 2 \end{bmatrix}\begin{bmatrix} -1/\sqrt 2 \\ 1/\sqrt 2 \end{bmatrix} = \begin{bmatrix} 1/\sqrt 2 \\ -1/\sqrt 2 \end{bmatrix}$$
So, $U = \begin{bmatrix} 1/\sqrt 2 & 1/\sqrt 2 \\ 1/\sqrt 2 & -1/\sqrt 2 \end{bmatrix}$. Note that the negative signs in $U$ and $V$ are in slightly different places; this subtle difference is what makes the whole calculation work. Putting it all together, $X$'s singular value decomposition is
$$X = U\Sigma V^T = \begin{bmatrix} 1/\sqrt 2 & 1/\sqrt 2 \\ 1/\sqrt 2 & -1/\sqrt 2 \end{bmatrix}\begin{bmatrix} 5 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1/\sqrt 2 & 1/\sqrt 2 \\ -1/\sqrt 2 & 1/\sqrt 2 \end{bmatrix}$$
This looks ever so similar to the spectral decomposition $X = Q\Lambda Q^T$ above, but it's slightly different, with minus signs dropped and rearranged.
In short, for non-positive semidefinite but still symmetric matrices, the singular values are the absolute values of the eigenvalues. I don’t think this is a general rule worth remembering.
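Still, it's easy to confirm numerically that this is what happens in our example:

```python
import numpy as np

X = np.array([[2, 3],
              [3, 2]])

print(np.linalg.eigvalsh(X))                # approximately [-1, 5] -- eigenvalues of X
print(np.linalg.svd(X, compute_uv=False))   # approximately [5, 1]  -- singular values of X
```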
For non-symmetric but still square matrices, the eigenvalues of X and singular values of X don’t have any direct relation, other than the general rule that σi is the square root of the corresponding eigenvalue of XTX. Remember that the SVD is mostly designed to solve practical problems that arise with non-square matrices X, which don’t have any eigenvalues to begin with.
Suppose X itself is an n×n orthogonal matrix, meaning XTX=XXT=I. Why are all of X’s singular values 1?
Solution
Recall, in $X = U\Sigma V^T$, the columns of $U$ are the eigenvectors of $XX^T$ and the columns of $V$ are the eigenvectors of $X^TX$. But here, $X^TX = XX^T = I_{n \times n}$. The $n \times n$ identity matrix has the unique eigenvalue 1, with algebraic and geometric multiplicity $n$, meaning every eigenvalue of $X^TX$ is 1. Since each singular value of $X$ is the square root of an eigenvalue of $X^TX$, every singular value of $X$ is $\sqrt{1} = 1$.
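A quick check with a concrete orthogonal matrix (a rotation matrix; the angle 0.7 is an arbitrary choice):

```python
import numpy as np

theta = 0.7   # any angle works
X = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(X.T @ X, np.eye(2)))      # True: X is orthogonal
print(np.linalg.svd(X, compute_uv=False))   # approximately [1, 1]
```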
In Chapter 5.2, we visualized the spectral theorem, which decomposed a symmetric matrix A=QΛQT, which we said represented a rotation, stretch, and rotation back.
$$\underbrace{U}_{\text{orthogonal!}}\;\underbrace{\Sigma}_{\text{diagonal!}}\;\underbrace{V^T}_{\text{orthogonal!}}$$
X=UΣVT can be interpreted similarly. If we think of f(w)=Xw as a linear transformation from Rd to Rn, then f operates on w in three stages:
a rotation by VT (because V is orthogonal), followed by
a stretch by Σ, followed by
a rotation by U
To illustrate, let’s consider the square (but not symmetric) matrix
$$X = \begin{bmatrix} 5 & 4 \\ 0 & 3 \end{bmatrix}$$
from one of the earlier worked examples. Unfortunately, it’s difficult to visualize the SVD of matrices larger than 2×2, since then either the ui’s or vi’s (or both) are in R3 or higher.
The version of the SVD we’ve constructed together is called the full SVD.
In the full SVD, U is n×n, Σ is n×d, and VT is d×d. If rank(X)=r, then
The first r columns of U are the left singular vectors of X and are a basis for colsp(X); the last n−r columns of U are a basis for nullsp(XT). Together, the columns of U span Rn.
The first r columns of V are the right singular vectors of X and are a basis for colsp(XT); the last d−r columns of V are a basis for nullsp(X). Together, the columns of V span Rd.
The full SVD is nice for proofs, but is a little... annoying to use in real applications, because it contains a lot of 0’s. The additional n−r columns of U and d−r columns of V are included to make U and V orthogonal matrices, but end up getting 0’d out when multiplied by the corresponding 0’s in Σ.
Remember that X=UΣVT is equivalent to XV=UΣ, which says that, for i=1,2,...,d,
Xvi=σiui
but when i>r, all this says is 0=0:
$Xv_i = \vec 0$, since $v_{r+1}$, $v_{r+2}$, ..., $v_d$ are all in $\text{nullsp}(X)$, and
$\sigma_i u_i = \vec 0$, since $\sigma_{r+1} = \sigma_{r+2} = \cdots = 0$.
The compact SVD throws away the 0's in $\Sigma$ and the corresponding columns in $U$ and $V$. In the annotated figure above, it keeps only the first $r$ columns in $U$, the first $r$ values in $\Sigma$, and the first $r$ rows in $V^T$.
That is, the compact SVD says X=UrΣrVrT, where Ur is n×r, Σr is r×r, and VrT is r×d.
The full and compact SVDs are equivalent, in that X=UΣVT=UrΣrVrT. But the individual components are of different sizes:
| | Full SVD | Compact SVD | |
| --- | --- | --- | --- |
| $U$ | $n \times n$ | $n \times r$ | First $r$ columns of $U$ are a basis for $\text{colsp}(X)$ |
| $\Sigma$ | $n \times d$ | $r \times r$ | Number of non-zero $\sigma_i$'s is $r = \text{rank}(X)$ |
| $V^T$ | $d \times d$ | $r \times d$ | First $r$ rows of $V^T$ are a basis for $\text{colsp}(X^T)$ |
Ur and Vr are no longer orthogonal matrices, since only square matrices can be orthogonal. However, their columns are still orthonormal, meaning UrTUr=Ir×r and VrTVr=Ir×r.
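In numpy, `np.linalg.svd`'s `full_matrices` argument controls whether you get the $n \times n$ matrix $U$ or a thinner one; note that even with `full_matrices=False`, numpy keeps all $\min(n, d)$ singular values (including zeros), so slicing down to the first $r$ columns/rows is a manual step. A sketch using our rank-2 example:

```python
import numpy as np

X = np.array([[3, 2, 5],
              [2, 3, 5],
              [2, -2, 0],
              [5, 5, 10]])

# full SVD: U is 4x4, there are 3 singular values, VT is 3x3
U, S, VT = np.linalg.svd(X, full_matrices=True)
print(U.shape, S.shape, VT.shape)          # (4, 4) (3,) (3, 3)

# compact SVD: keep only the r = rank(X) = 2 non-zero singular values
r = int(np.sum(S > 1e-10))
Ur, Sr, VrT = U[:, :r], np.diag(S[:r]), VT[:r, :]
print(Ur.shape, Sr.shape, VrT.shape)       # (4, 2) (2, 2) (2, 3)

print(np.allclose(Ur @ Sr @ VrT, X))       # True: the compact SVD still reconstructs X
print(np.allclose(Ur.T @ Ur, np.eye(r)))   # True: the columns of U_r are orthonormal
```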
Each term σiuiviT is an outer product of ui and vi, scaled by σi. Outer products are rank-one matrices: each column of σiuiviT is a multiple of ui, and each row of it is a multiple of viT.
This outer product view of matrix multiplication is not one that we’ve emphasized a ton in this course, but it can be useful in certain contexts, as we’re about to see. To see how it works, let’s revisit our first example.
Since 15>3, the first outer product contributes more to X than the second one does.
We can think of the singular values as representing the importance of the corresponding singular vectors in representing X. Since we sort singular values in decreasing order, σ1≥σ2≥...≥σr>0, the first outer product is always the most important one, the second outer product is the second most important one, and so on.
Our most recent observation is that the SVD X=UΣVT allows us to write X as a sum of rank-one matrices, in decreasing order of importance.
$$X = \underbrace{\sigma_1 u_1 v_1^T}_{\text{most important}} + \underbrace{\sigma_2 u_2 v_2^T}_{\text{second most important}} + \cdots + \underbrace{\sigma_r u_r v_r^T}_{\text{least important}}$$
In many practical applications, the full matrix $X$ can be too large to store or process. In such cases, we can produce a low-rank approximation of $X$ by summing fewer than $r$ of these rank-one matrices. In the example above, $X$ was of rank 2, so a rank-1 approximation of $X$ would just be the first outer product, $\sigma_1 u_1 v_1^T$.
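Here's what this looks like in numpy for our rank-2 example: the rank-1 approximation keeps only the first outer product, and adding the second one recovers $X$ exactly (a sketch; `np.outer` builds the rank-one matrix $u_i v_i^T$):

```python
import numpy as np

X = np.array([[3, 2, 5],
              [2, 3, 5],
              [2, -2, 0],
              [5, 5, 10]])

U, S, VT = np.linalg.svd(X)

# rank-1 approximation: only the most important outer product
X1 = S[0] * np.outer(U[:, 0], VT[0, :])
print(np.round(X1, 2))

# adding the second outer product recovers X exactly, since rank(X) = 2
X2 = X1 + S[1] * np.outer(U[:, 1], VT[1, :])
print(np.allclose(X2, X))   # True
```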
A common application of the SVD and low-rank approximations is to compress images. How so? Consider the following grayscale image of my (16 year old) dog, Junior.
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
# Load the image and convert to grayscale
img = Image.open('imgs/junior.jpeg').convert('L')
img_arr = np.rot90(np.array(img), k=3)
# Show the grayscale image
plt.figure(figsize=(4*1.5, 3*1.5))
plt.imshow(img_arr, cmap='gray')
plt.axis('off')
plt.show()
This image is 300 pixels wide and 400 pixels tall. Since the image is grayscale, each pixel’s intensity can be represented by a number between 0 and 255, where 0 is black and 255 is white. These intensities can be stored in a 400×300 matrix. The rank of this matrix is likely 300, since it’s extremely unlikely that any of the 300 columns of the image are exactly representable as a linear combination of other columns.
But, as the SVD reveals, the image can be approximated well by a rank-k matrix, for a k that is much smaller than 300. We build this low-rank approximation of the image by summing up k rank-one matrices.
$$\text{rank-}k \text{ approximation of image} = \sum_{i=1}^{k}\sigma_i u_i v_i^T$$
A slider should appear below, allowing you to select a value of k and see the corresponding rank-k approximation of Junior.
import numpy as np
import plotly.graph_objs as go
from PIL import Image
IMAGE_PATH = 'imgs/junior.jpeg'
# Load image (fallback to dummy if not found), rotate 90º clockwise, and resize to 3:4 aspect ratio (e.g., 300x400)
try:
    img = Image.open(IMAGE_PATH).convert('L')
    img = img.rotate(-90, expand=True)           # Rotate 90º clockwise
    img = img.resize((300, 400), Image.LANCZOS)  # 3:4 aspect ratio (width, height)
    img_arr = np.array(img)
except FileNotFoundError:
    H, W = 400, 300  # 3:4 aspect ratio
    img_arr = np.zeros((H, W), dtype=np.uint8)
    for i in range(H):
        for j in range(W):
            img_arr[i, j] = 200 if ((i // 10) % 2 == (j // 10) % 2) else 50
# Compute SVD
U, S, VT = np.linalg.svd(img_arr.astype(float), full_matrices=False)
max_k = len(S)
def get_reconstruction(k):
    k = max(1, min(k, len(S)))
    Uk = U[:, :k]
    Sk = S[:k]
    VTk = VT[:k, :]
    recon = Uk @ np.diag(Sk) @ VTk
    return 255 - np.clip(recon, 0, 255).astype(np.uint8)
# k = 1, 2, ..., 50, plus the maximum rank (up to len(S))
ks = list(range(1, 51))
if ks[-1] != max_k - 1:
    ks.append(max_k)
# Create frames (each frame is a Heatmap of the reconstructed image)
frames = []
for k in ks:
    z = get_reconstruction(k)
    frames.append(
        go.Frame(
            data=[
                go.Heatmap(
                    z=z,
                    zmin=0,
                    zmax=255,
                    colorscale=[[0, "white"], [1, "black"]],
                    showscale=False
                )
            ],
            name=str(k)
        )
    )
# Initial trace uses k=1
initial_z = get_reconstruction(1)
initial_trace = go.Heatmap(z=initial_z, zmin=0, zmax=255, colorscale=[[0, "white"], [1, "black"]], showscale=False)
fig = go.Figure(
    data=[initial_trace],
    frames=frames
)
# Slider that calls animate on each frame
slider_steps = []
for k in ks:
    step = {
        "args": [
            [str(k)],  # frame name
            {
                "frame": {"duration": 0, "redraw": True},
                "mode": "immediate",
                "transition": {"duration": 0}
            }
        ],
        "label": str(k),
        "method": "animate"
    }
    slider_steps.append(step)
sliders = [{
    "active": 0,
    "pad": {"t": 50},
    "steps": slider_steps,
    "currentvalue": {"prefix": "Rank k: ", "font": {"size": 14}}
}]
fig.update_layout(
    sliders=sliders,
    width=300,   # width matches aspect ratio
    height=480,  # extra height leaves room for the slider
    margin=dict(l=20, r=20, t=0, b=0),
    font=dict(family="Avenir"),
)
# Hide axes
fig.update_xaxes(visible=False)
fig.update_yaxes(visible=False)
fig.update_yaxes(autorange="reversed")
fig.show()
To store the full image, we need to store $400 \cdot 300 = 120{,}000$ numbers. But to store a rank-$k$ approximation of the image, we only need to store $(1 + 400 + 300)k = 701k$ numbers – only the first $k$ singular values, along with $u_1, u_2, ..., u_k$ (each of which has 400 numbers), and $v_1, v_2, ..., v_k$ (each of which has 300 numbers). If we're satisfied with a rank-30 approximation, we only need to store $701 \cdot 30 = 21{,}030$ numbers, which is a compression factor of $\frac{120{,}000}{21{,}030} \approx 5.7$ times!
Notice: the signs of the columns of $U$ and $V$ are not uniquely determined by the SVD. For example, we could replace $u_1$ with $-u_1$ and $v_1$ with $-v_1$ and still get the same matrix $X$. Comparing the $U$ we found by hand to the one returned by np.linalg.svd:
The first column of numpy’s u is the negative of our first column of U.
Both second columns are the same.
The latter two columns are different. Why? There are infinitely many orthogonal bases for the null space of XT, which are what the last two columns of U represent. We just picked a different choice than numpy did.
All n×d matrices X have a singular value decomposition X=UΣVT, where U is n×n, Σ is n×d, and V is d×d.
The columns of U are orthonormal eigenvectors of XXT; these are called the left singular vectors of X.
The columns of V are orthonormal eigenvectors of XTX; these are called the right singular vectors of X.
Both XXT and XTX share the same non-zero eigenvalues; the singular values of X are the square roots of these eigenvalues.
The number of non-zero singular values is equal to the rank of X. It’s important that we sort the singular values in decreasing order, so that σ1≥σ2≥...≥σr>0, and place the singular vectors in the columns of U and V in the same order.
$$\sigma_i = \sqrt{\lambda_i}$$
5. A typical recipe for computing the SVD is to:
    1. Compute $X^TX$. Place its eigenvectors in the columns of $V$, and place the square roots of its eigenvalues on the diagonal of $\Sigma$.
    2. To find each $u_i$, use $Xv_i = \sigma_i u_i$, i.e. $u_i = \frac{1}{\sigma_i}Xv_i$.
    3. The above rule only works for $\sigma_i > 0$. If $\sigma_i = 0$, then the remaining $u_i$'s must be eigenvectors of $XX^T$ for the eigenvalue 0, meaning they must lie in the nullspace of $X^T$.
6. The SVD allows us to interpret the linear transformation of multiplying by $X$ as a composition of a rotation by $V^T$, a scaling/stretching by $\Sigma$, and another rotation by $U$.
7. The SVD X=UΣVT can be viewed as a sum of rank-one matrices:
$$X = \sum_{i=1}^{r}\sigma_i u_i v_i^T$$
Each piece σiuiviT is a rank-one matrix, consisting of the outer product of ui and vi. This summation view can be used to compute a low-rank approximation of X by summing fewer than r of these rank-one matrices.