9.6. Positive Semidefinite Matrices and the Rayleigh Quotient
In Chapter 9.5, we saw that symmetric matrices can be diagonalized by an orthogonal matrix. In this section, we will use that fact to understand two closely related ideas:
positive semidefinite matrices, which control when a quadratic form has a global minimum and is convex, and
the Rayleigh quotient, which is a normalized quadratic form that isolates the effect of direction.
Quadratic Forms and Positive Semidefinite Matrices
If A is an n×n matrix, then the function
$$f(x) = x^T A x$$
is called a quadratic form. When A is symmetric, quadratic forms are especially nice because the spectral theorem lets us understand them completely in terms of eigenvalues and eigenvectors.
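A symmetric matrix $A$ is positive semidefinite, written $A \succeq 0$, if
$$x^T A x \geq 0 \quad \text{for every } x,$$
which is equivalent to saying that all eigenvalues of $A$ are non-negative. A positive definite matrix satisfies the stronger condition
$$x^T A x > 0 \quad \text{for every } x \neq 0,$$
which is equivalent to saying that all eigenvalues of $A$ are strictly positive.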
Quadratic forms already appeared earlier in the course. For example, the mean-squared error from Chapter 8.1,
$$R_{\text{sq}}(w) = \frac{1}{n} \lVert y - Xw \rVert^2,$$
can be expanded into a quadratic expression in the weight vector $w$.
So understanding when a quadratic form has a minimum is not just abstract linear algebra; it is directly tied to optimization.
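To make that expansion concrete, here is a short worked version. It assumes the usual setup in which $X$ is the design matrix with one row per observation and $y$ is the vector of observed targets; those shapes are not restated in this section.

$$
\begin{aligned}
R_{\text{sq}}(w) &= \frac{1}{n}(y - Xw)^T (y - Xw) \\
&= \frac{1}{n}\left( w^T X^T X w - 2\, y^T X w + y^T y \right)
\end{aligned}
$$

The first term is a quadratic form in $w$ built from the symmetric matrix $X^T X$. Because $w^T X^T X w = \lVert Xw \rVert^2$, that term is never negative, whatever $w$ is.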
Why are the two definitions of positive semidefiniteness equivalent? Since A is symmetric, we can write
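$$A = Q \Lambda Q^T$$
where $Q$ is orthogonal and $\Lambda$ is diagonal with eigenvalues $\lambda_1, \dots, \lambda_n$ on the diagonal. For any vector $x$, let $y = Q^T x$. Then
$$x^T A x = x^T (Q \Lambda Q^T) x = y^T \Lambda y = \sum_{i=1}^{n} \lambda_i y_i^2$$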
This formula is the key. Each $y_i^2$ is non-negative, so if every eigenvalue $\lambda_i \geq 0$, then every term $\lambda_i y_i^2$ is non-negative and therefore $x^T A x \geq 0$ for every $x$. Conversely, if some eigenvalue $\lambda_j$ were negative, then plugging in a corresponding eigenvector $q_j$ would give $q_j^T A q_j = \lambda_j \lVert q_j \rVert^2 < 0$.
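The identity above is easy to check numerically. The following is a minimal sketch using NumPy; the helper name `is_psd`, the tolerance `tol`, and the random test matrix are our own choices for illustration, not part of the course material.

```python
import numpy as np

def is_psd(A, tol=1e-10):
    """Return True if the symmetric matrix A has no eigenvalue below -tol."""
    eigenvalues = np.linalg.eigvalsh(A)  # eigvalsh assumes A is symmetric
    return bool(np.all(eigenvalues >= -tol))

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = B + B.T  # an arbitrary symmetric test matrix

# Spectral decomposition A = Q diag(eigenvalues) Q^T.
eigenvalues, Q = np.linalg.eigh(A)

x = rng.standard_normal(4)
y = Q.T @ x  # coordinates of x in the eigenvector basis

lhs = x @ A @ x                   # the quadratic form x^T A x
rhs = np.sum(eigenvalues * y**2)  # sum of lambda_i * y_i^2
print(np.isclose(lhs, rhs))
```

The small tolerance is there because floating-point eigenvalues of a semidefinite matrix can come out as tiny negative numbers.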
The picture on the left comes from the positive definite matrix
$$A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix},$$
whose eigenvalues are 3 and 1. Every output is non-negative, and the level curves are ellipses surrounding a single global minimum at the origin.
The picture on the right comes from the symmetric matrix
$$A = \begin{bmatrix} 1 & 5 \\ 5 & 1 \end{bmatrix},$$
whose eigenvalues are 6 and -4. Because one eigenvalue is negative, the quadratic form takes both positive and negative values, so it cannot be positive semidefinite and it is not convex.
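As a quick sanity check on the two examples, we can compute their eigenvalues and evaluate the second quadratic form along the eigenvector direction for the negative eigenvalue. The variable names `A_left` and `A_right` are ours and simply mirror the two pictures.

```python
import numpy as np

A_left = np.array([[2.0, 1.0], [1.0, 2.0]])
A_right = np.array([[1.0, 5.0], [5.0, 1.0]])

# eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
eig_left = np.linalg.eigvalsh(A_left)    # approximately [1, 3]
eig_right = np.linalg.eigvalsh(A_right)  # approximately [-4, 6]
print(eig_left, eig_right)

# (1, -1) is an eigenvector of A_right for the eigenvalue -4, so the
# quadratic form there equals -4 * ||x||^2 = -4 * 2 = -8.
x = np.array([1.0, -1.0])
value = x @ A_right @ x
print(value)
```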
For a symmetric quadratic form, convexity is controlled exactly by positive semidefiniteness. If
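$$f(x) = x^T A x,$$
then
$$\nabla f(x) = 2Ax \quad \text{and} \quad H_f = 2A,$$
because $A = A^T$. From multivariable calculus, a twice-differentiable function is convex exactly when its Hessian is positive semidefinite at every point. Since $2A \succeq 0$ exactly when $A \succeq 0$, for symmetric quadratic forms
$$f(x) = x^T A x \text{ is convex} \iff A \succeq 0$$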
This is one reason positive semidefinite matrices show up constantly in optimization: they tell us that the landscape bends upward in every direction.
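Both claims can be probed numerically. The sketch below compares the gradient formula against central finite differences, then tests the midpoint inequality that every convex function must satisfy. The helper names `numerical_gradient` and `midpoint_gap` are hypothetical, and a single failed midpoint test only shows non-convexity; passing it at one pair of points proves nothing by itself.

```python
import numpy as np

def f(A, x):
    """Quadratic form x^T A x."""
    return x @ A @ x

def numerical_gradient(A, x, h=1e-6):
    """Central-difference approximation to the gradient of f at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        grad[i] = (f(A, x + step) - f(A, x - step)) / (2 * h)
    return grad

def midpoint_gap(A, a, b):
    """(f(a) + f(b)) / 2 - f((a + b) / 2); never negative when f is convex."""
    return (f(A, a) + f(A, b)) / 2 - f(A, (a + b) / 2)

A_psd = np.array([[2.0, 1.0], [1.0, 2.0]])
A_indef = np.array([[1.0, 5.0], [5.0, 1.0]])

x = np.array([0.5, -1.5])
gradient_matches = np.allclose(numerical_gradient(A_psd, x), 2 * A_psd @ x, atol=1e-5)
print(gradient_matches)

a = np.array([1.0, -1.0])
b = np.array([-1.0, 1.0])
print(midpoint_gap(A_psd, a, b))    # 2.0: consistent with convexity
print(midpoint_gap(A_indef, a, b))  # -8.0: the midpoint inequality fails
```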
But there is still one issue. The value of $x^T A x$ depends on both the direction of $x$ and its length. If we double $x$, then the value quadruples. To study the effect of direction alone, we normalize by the squared length of the vector.
Suppose A is a symmetric n×n matrix. The Rayleigh quotient of A is the function
$$g(v) = \frac{v^T A v}{v^T v}$$
for all non-zero vectors v.
You should think of this as a normalized quadratic form. The numerator $v^T A v$ measures the output of the quadratic form, while the denominator $v^T v = \lVert v \rVert^2$ removes the effect of scale.
Indeed, if $c \neq 0$, then
$$g(cv) = \frac{(cv)^T A (cv)}{(cv)^T (cv)} = \frac{c^2\, v^T A v}{c^2\, v^T v} = g(v)$$
So the Rayleigh quotient depends only on the direction of v, not on its magnitude. In particular, if ∥v∥=1, then
$$g(v) = v^T A v,$$
so the Rayleigh quotient is just the quadratic form restricted to the unit sphere.
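Here is a small numerical illustration of the scale invariance, using the symmetric matrix with eigenvalues 6 and -4 from earlier. The function name `rayleigh_quotient` and the test vector are our own choices.

```python
import numpy as np

def rayleigh_quotient(A, v):
    """Rayleigh quotient v^T A v / v^T v for a non-zero vector v."""
    return (v @ A @ v) / (v @ v)

A = np.array([[1.0, 5.0], [5.0, 1.0]])
v = np.array([2.0, 1.0])

# Rescaling v, including flipping its sign, leaves the quotient unchanged.
# Here v^T A v = 25 and v^T v = 5, so the quotient is 5.
quotients = [rayleigh_quotient(A, c * v) for c in (1.0, -3.0, 0.25)]
print(quotients)

# On the unit sphere the quotient is just the quadratic form.
u = v / np.linalg.norm(v)
print(np.isclose(u @ A @ u, rayleigh_quotient(A, v)))
```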
In Homework 9, Problem 4, you showed that
$$\nabla g(v) = \frac{2}{v^T v}\left(Av - g(v)\,v\right)$$
If $v$ is a critical point of $g$, then $\nabla g(v) = 0$. Since $v^T v \neq 0$, this forces
$$Av = g(v)\,v$$
That means every critical point of the Rayleigh quotient is an eigenvector of A, and the corresponding value of the Rayleigh quotient is the associated eigenvalue.
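We can see this numerically by evaluating the gradient formula at the eigenvectors returned by NumPy and at a vector that is not an eigenvector. This is only a sketch: `rayleigh_gradient` is a hypothetical helper that implements the formula quoted above, not a library function.

```python
import numpy as np

def rayleigh_quotient(A, v):
    """Rayleigh quotient v^T A v / v^T v for a non-zero vector v."""
    return (v @ A @ v) / (v @ v)

def rayleigh_gradient(A, v):
    """Gradient formula quoted above: 2 / (v^T v) * (A v - g(v) v)."""
    return 2.0 / (v @ v) * (A @ v - rayleigh_quotient(A, v) * v)

A = np.array([[1.0, 5.0], [5.0, 1.0]])
eigenvalues, Q = np.linalg.eigh(A)  # columns of Q are unit eigenvectors

# At each eigenvector the gradient vanishes (up to rounding) and the
# quotient equals the matching eigenvalue.
for lam, q in zip(eigenvalues, Q.T):
    grad_norm = np.linalg.norm(rayleigh_gradient(A, q))
    print(lam, rayleigh_quotient(A, q), grad_norm)

# (2, 1) is not an eigenvector of A, so the gradient there is non-zero.
v = np.array([2.0, 1.0])
print(rayleigh_gradient(A, v))
```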
The dashed lines mark the eigenvector directions of A. Notice what changed compared to the earlier quadratic form plot: the extreme values no longer occur farther and farther away from the origin. After normalization, only direction matters.
The reddest direction is the eigenvector direction corresponding to the largest eigenvalue, which is 6. The bluest direction is the eigenvector direction corresponding to the smallest eigenvalue, which is -4. On the unit circle, those are exactly the maximum and minimum values of the quadratic form.
So for a symmetric matrix A,
the largest possible value of the Rayleigh quotient is the largest eigenvalue of A, and
the smallest possible value of the Rayleigh quotient is the smallest eigenvalue of A.
This gives a geometric interpretation of eigenvectors: they are the directions where the normalized quadratic form is stationary, and the eigenvectors for the largest and smallest eigenvalues are the directions where it is maximized and minimized.
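To see the bounds in action, the sketch below sweeps unit vectors around half of the unit circle (the other half repeats the same values, since $g(-v) = g(v)$) and compares the extreme values of the quadratic form with the eigenvalues. The grid of 1001 angles is an arbitrary choice, and a finite grid can only approximate the true extremes.

```python
import numpy as np

A = np.array([[1.0, 5.0], [5.0, 1.0]])

theta = np.linspace(0.0, np.pi, 1001)
V = np.column_stack([np.cos(theta), np.sin(theta)])  # each row is a unit vector

# Row-wise v^T A v; the denominator v^T v is 1 for unit vectors.
values = np.sum((V @ A) * V, axis=1)

eigenvalues = np.linalg.eigvalsh(A)  # ascending: smallest first
print(values.min(), eigenvalues[0])
print(values.max(), eigenvalues[-1])

# Angle of the maximizing direction, which should be close to 45 degrees,
# the direction of the eigenvector (1, 1).
best_angle = np.degrees(theta[np.argmax(values)])
print(best_angle)
```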