To illustrate, let’s start with our familiar matrix A:
$$A = \begin{bmatrix} 3 & 1 & 4 \\ 2 & 1 & 9 \\ 0 & -1 & 0 \\ 2 & -2 & 0 \end{bmatrix}$$
The transpose of A is:
$$A^T = \begin{bmatrix} 3 & 2 & 0 & 2 \\ 1 & 1 & -1 & -2 \\ 4 & 9 & 0 & 0 \end{bmatrix}$$
Note that $A \in \mathbb{R}^{4 \times 3}$ and $A^T \in \mathbb{R}^{3 \times 4}$.
Why would we ever need to do this? To illustrate, suppose $u = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \end{bmatrix}$, and that we'd like to compute the product $A^Tu$. (Note that $u$ must be in $\mathbb{R}^4$ in order for $A^Tu$ to be defined, unlike $x \in \mathbb{R}^3$ in the product $Ax$.) Then:

$$A^Tu = u_1 \begin{bmatrix} 3 \\ 1 \\ 4 \end{bmatrix} + u_2 \begin{bmatrix} 2 \\ 1 \\ 9 \end{bmatrix} + u_3 \begin{bmatrix} 0 \\ -1 \\ 0 \end{bmatrix} + u_4 \begin{bmatrix} 2 \\ -2 \\ 0 \end{bmatrix}$$
This is a linear combination of the rows of $A$, where the weights are the components of $u$. Remember, the standard product $Ax$ is a linear combination of the columns of $A$, so the transpose helps us if we want to compute a linear combination of the rows of $A$. (Equivalently, it helps us if we want to compute the dot product of each column of $A$ with $u$ – see the "Two Pictures" note from earlier in this chapter.)
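As a quick numerical check, here's a small numpy sketch confirming that $A^Tu$ equals this weighted sum of the rows of $A$ (the particular weights in u below are an arbitrary choice):

import numpy as np

# The 4x3 matrix A from above
A = np.array([[3, 1, 4],
              [2, 1, 9],
              [0, -1, 0],
              [2, -2, 0]])
u = np.array([1, 2, 3, 4])  # arbitrary weights

# A.T @ u is the linear combination u1*(row 1 of A) + ... + u4*(row 4 of A)
lin_combo = sum(u[i] * A[i] for i in range(4))
print(np.allclose(A.T @ u, lin_combo))  # True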
The transpose also gives us another way of expressing the dot product of two vectors. If $u$ and $v$ are two vectors in $\mathbb{R}^n$, then $u^T$ is a row vector with 1 row and $n$ columns. Multiplying $u^T$ by $v$ results in a $1 \times 1$ matrix, which is just the scalar $u \cdot v$:

$$u^Tv = \begin{bmatrix} u_1 & u_2 & \cdots & u_n \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{bmatrix} = u_1v_1 + u_2v_2 + \cdots + u_nv_n = u \cdot v$$
The benefit of using the transpose to express the dot product is that it allows us to write the dot product of two vectors in terms of matrix multiplication, rather than being an entirely different type of operation. (In fact, as we’ve seen here, matrix multiplication is just a generalization of the dot product.)
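To see this in numpy, with two arbitrary vectors:

import numpy as np

u = np.array([1, 2, 3])  # arbitrary vectors
v = np.array([4, 5, 6])

row = u.reshape(1, -1)  # u^T, a 1x3 row vector
col = v.reshape(-1, 1)  # v, a 3x1 column vector

print(np.dot(u, v))  # 32, the dot product as a scalar
print(row @ col)     # [[32]], the same value as a 1x1 matrix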
There are other uses for the transpose, too, so it’s a useful tool to have in your toolbox.
The first three properties are relatively straightforward. The last property, $(AB)^T = B^TA^T$, is a bit more subtle. Try to reason about why it's true on your own, then peek into the box below to verify your reasoning and to see an example.
Why is $(AB)^T = B^TA^T$?
Let's start with just $(AB)^T$ and reason our way from there. Define $C = (AB)^T$. $C$ is presumably the product of two matrices $X$ and $Y$; we just don't know what $X$ and $Y$ are. By the definition of matrix multiplication, we know that $C_{ij}$ is the dot product of the $i$th row of $X$ and the $j$th column of $Y$. How can we express the product $C = XY$ in terms of $A$ and $B$?
Let's work backwards. Since $C_{ij} = \left((AB)^T\right)_{ij} = (AB)_{ji}$ by the definition of the transpose, we know that:

$$C_{ij} = (AB)_{ji} = (\text{row } j \text{ of } A) \cdot (\text{column } i \text{ of } B) = (\text{column } i \text{ of } B) \cdot (\text{row } j \text{ of } A)$$
This is a little backwards relative to the definition of matrix multiplication, which says that:
$$C_{ij} = (XY)_{ij} = (\text{row } i \text{ of } X) \cdot (\text{column } j \text{ of } Y)$$
In order for the two definitions of $C_{ij}$ to be consistent, we must have:
$$(\text{column } i \text{ of } B) \cdot (\text{row } j \text{ of } A) = (\text{row } i \text{ of } X) \cdot (\text{column } j \text{ of } Y)$$
Row $i$ of $X$ is the same as column $i$ of $B$ if $X = B^T$.
Column $j$ of $Y$ is the same as row $j$ of $A$ if $Y = A^T$.
Putting this together, we have:
$$C = (AB)^T = B^TA^T$$
as we hoped!
To make things concrete, let's consider two new matrices $A$ and $B$ whose product $AB$ has 12 elements. $B^TA^T$ is the transpose of $AB$, and both $AB$ and $B^TA^T$ are computed using the same 12 dot products.
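For instance, here's a quick numpy check with a stand-in pair (these particular entries are an arbitrary choice; $A$ is $3 \times 2$ and $B$ is $2 \times 4$, so $AB$ and its transpose each have 12 elements):

import numpy as np

# Stand-in matrices; the specific entries are arbitrary
A = np.array([[1, 2],
              [0, -1],
              [3, 1]])        # 3x2
B = np.array([[2, 0, 1, -1],
              [1, 3, 0, 2]])  # 2x4

print((A @ B).T)                          # 4x3, the transpose of the 3x4 product AB
print(np.allclose((A @ B).T, B.T @ A.T))  # True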
The fact that $(AB)^T = B^TA^T$ comes in handy when finding the norm of a matrix-vector product. If $A$ is an $n \times d$ matrix and $x \in \mathbb{R}^d$, then:

$$\lVert Ax \rVert^2 = (Ax)^T(Ax) = x^TA^TAx$$
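For instance, here's a quick numpy check of this identity, using the $A$ from the start of this section and an arbitrary choice of $x$:

import numpy as np

A = np.array([[3, 1, 4],
              [2, 1, 9],
              [0, -1, 0],
              [2, -2, 0]])
x = np.array([2, -1, 1])  # arbitrary vector in R^3

# Both expressions give the squared norm of Ax
print(np.linalg.norm(A @ x) ** 2)  # 262 (up to floating point)
print(x @ A.T @ A @ x)             # 262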
As we’ll soon see, some matrices A have special properties that make this computation particularly easy.
Below and in later chapters, we'll meet several special types of matrices with exactly these kinds of properties, including symmetric, triangular, diagonal, and orthogonal matrices.
In numpy, the T attribute is used to compute the transpose of a 2D array.
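For example, here's how we might define the $A$ from the start of this section and take its transpose:

import numpy as np

# The 4x3 matrix A from the start of this section
A = np.array([[3, 1, 4],
              [2, 1, 9],
              [0, -1, 0],
              [2, -2, 0]])

print(A.T)        # the 3x4 transpose of A
print(A.T.shape)  # (3, 4)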
Activity 1
Activity 1.1
In the cell above:
Define x to be an array corresponding to the vector $x = \begin{bmatrix} 1 \\ 0 \\ 3 \end{bmatrix}$.
Find the norm of the product $Ax$ using np.linalg.norm.
Find the norm of the product $Ax$ using the fact that $\lVert Ax \rVert^2 = x^TA^TAx$, and verify that you get the same answer.
Activity 1.2
Suppose $M \in \mathbb{R}^{n \times d}$ is a matrix, $v \in \mathbb{R}^d$ is a vector, and $s \in \mathbb{R}$ is a scalar.
Determine whether each of the following quantities is a matrix, vector, scalar, or undefined. If the result is a matrix or vector, determine its dimensions.
$Mv$
$vM$
$v^2$
$M^TM$
$MM^T$
$v^TMv$
$(sMv) \cdot (sMv)$
$(sv^TM^T)^T$
$v^TM^TMv$
$vv^T + M^TM$
$\lVert v \rVert Mv + (v^TM^TMv)Mv$
Activity 1.3
Let $A = \begin{bmatrix} 2 & 1 \\ 3 & 4 \\ -1 & 1 \end{bmatrix}$, $B = \begin{bmatrix} 1 & 0 & 2 \\ 2 & 1 & 3 \end{bmatrix}$, and $C = \begin{bmatrix} 1 & 0 & 2 & -1 \\ 0 & 1 & 1 & 1 \\ 1 & 1 & 0 & -1 \end{bmatrix}$.
Compute $AB$, then multiply the result by $C$.
Compute $BC$, then multiply $A$ by the result. Do you get the same result as above? If so, what property of matrix multiplication guarantees this?
Determine a formula for $(ABC)^T$, and verify that your result works. (Hint: Start with the fact that $(AB)^T = B^TA^T$.)
Saying "the identity matrix" is a bit ambiguous, as there are infinitely many identity matrices: there's a $1 \times 1$ identity matrix, a $2 \times 2$ identity matrix, a $3 \times 3$ identity matrix, and so on. Often, the dimension of the identity matrix is implied by context, and if not, we might provide it as a subscript, e.g. $I_n$ for the $n \times n$ identity matrix.
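For example, the $2 \times 2$ and $3 \times 3$ identity matrices are:

$$I_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \qquad I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$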
Why is the identity matrix defined this way? It's the matrix equivalent of the number 1 in scalar multiplication, also known as the multiplicative identity. If $c$ is a scalar, then $c \cdot 1 = c$ and $1 \cdot c = c$. (Similarly, $0$ is known as the additive identity, since $c + 0 = c$.)
Similarly, if $A$ is a square $n \times n$ matrix and $x \in \mathbb{R}^n$ is a vector, then the $n \times n$ identity matrix $I$ is the unique matrix that satisfies:
$Ix = x$ for all $x \in \mathbb{R}^n$.
$IA = AI = A$ for all $A \in \mathbb{R}^{n \times n}$.
A good exercise is to verify that the identity matrix satisfies these properties.
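In numpy, np.eye(n) constructs the $n \times n$ identity matrix, so one quick way to check these properties numerically looks like this (the particular $A$ and $x$ here are arbitrary choices):

import numpy as np

I = np.eye(3)               # the 3x3 identity matrix
A = np.array([[2, 1, 0],
              [3, 4, 1],
              [-1, 1, 2]])  # arbitrary 3x3 matrix
x = np.array([5, -2, 7])    # arbitrary vector in R^3

print(np.allclose(I @ x, x))  # True
print(np.allclose(I @ A, A))  # True
print(np.allclose(A @ I, A))  # True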
Activity 2
Let $X = \begin{bmatrix} 1 & -2 \\ -1 & 3 \\ 2 & 0 \\ 0 & -1 \\ 3 & 2 \end{bmatrix}$.
Compute $X^TX$.
Then, compute the transpose of $X^TX$. What do you notice? ($X^TX$ is called a symmetric matrix.)
Compute $X^TX + \frac{1}{2}I$. We'll use matrices of the form $X^TX + \lambda I$ in Chapter 5.
This section was relatively mechanical, and I didn’t spend much time explaining why we’d multiply two matrices (or a matrix and a vector) together. More context for this operation will come throughout the rest of the chapter.
To conclude, I want to show you some of the magic behind matrix multiplication.
Consider the relatively innocent-looking $2 \times 2$ matrix
$$A = \begin{bmatrix} \frac{\sqrt{3}}{2} & -\frac{1}{2} \\ \frac{1}{2} & \frac{\sqrt{3}}{2} \end{bmatrix}$$
Below, you'll see that I've drawn out six vectors in $\mathbb{R}^2$.
$u = \begin{bmatrix} 3 \\ 2 \end{bmatrix}$ and $Au$
$v = \begin{bmatrix} 2 \\ -2 \end{bmatrix}$ and $Av$
$w = \begin{bmatrix} -5 \\ -4 \end{bmatrix}$ and $Aw$
What do you notice about the vectors $Au$, $Av$, and $Aw$? How do they relate to $u$, $v$, and $w$?
from utils import plot_vectors
import numpy as np

# A rotates vectors counterclockwise by pi/6 radians (30 degrees)
A = np.array([[np.cos(np.pi/6), -np.sin(np.pi/6)],
              [np.sin(np.pi/6), np.cos(np.pi/6)]])
u = np.array([3, 2])
v = np.array([2, -2])
w = np.array([-5, -4])
# Apply A to each of the three vectors
Au = A @ u
Av = A @ v
Aw = A @ w
fig = plot_vectors([(tuple(u), 'orange', r'$\vec u$'),
(tuple(v), '#3d81f6', r'$\vec v$'),
(tuple(w), '#d81a60', r'$\vec w$'),
(tuple(Au), 'orange', r'$A \vec u$'),
(tuple(Av), '#3d81f6', r'$A \vec v$'),
(tuple(Aw), '#d81a60', r'$A \vec w$')], vdeltay=0.3)
fig.show(scale=3, renderer='png')
$A$ is called a rotation matrix, since it rotates vectors by a certain angle (in this case, $\frac{\pi}{6}$ radians, or $30^\circ$). Rotations are a type of linear transformation.
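Here's a quick numerical sketch of two facts the picture suggests, using the same $u$ as above: rotating a vector doesn't change its length, and the angle between $u$ and $Au$ is indeed $30^\circ$.

import numpy as np

theta = np.pi / 6
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
u = np.array([3, 2])
Au = A @ u

# Rotation preserves length
print(np.linalg.norm(u), np.linalg.norm(Au))  # both equal sqrt(13)

# The angle between u and Au, recovered from the dot product formula
cos_angle = (u @ Au) / (np.linalg.norm(u) * np.linalg.norm(Au))
print(np.degrees(np.arccos(cos_angle)))  # 30 (up to floating point)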
Not all matrices are rotation matrices; there are plenty of other types of linear transformations, like reflections, shears, and projections (which should sound familiar). These will all become familiar in Chapter 6.1. All I wanted to show you for now is that matrix multiplication may look like a bunch of random number crunching, but there's a lot of meaning baked in.