
5.2. Transpose and Special Matrices

There’s an important operation on matrices that we haven’t discussed yet. Let me introduce it, and then walk through a few types of matrices that are important enough to be given names.

Transpose

To illustrate, let’s start with our familiar matrix $A$:

$$
A = \begin{bmatrix} 3 & {\color{#3d81f6} \mathbf{1}} & 4 \\ 2 & {\color{#3d81f6} \mathbf{1}} & 9 \\ 0 & {\color{#3d81f6} \mathbf{-1}} & 0 \\ 2 & {\color{#3d81f6} \mathbf{-2}} & 0 \end{bmatrix}
$$

The transpose of $A$ is:

$$
A^T = \begin{bmatrix} 3 & 2 & 0 & 2 \\ {\color{#3d81f6} \mathbf{1}} & {\color{#3d81f6} \mathbf{1}} & {\color{#3d81f6} \mathbf{-1}} & {\color{#3d81f6} \mathbf{-2}} \\ 4 & 9 & 0 & 0 \end{bmatrix}
$$

Note that $A \in \mathbb{R}^{4 \times 3}$ and $A^T \in \mathbb{R}^{3 \times 4}$. In general, the transpose swaps rows and columns: entry $(i, j)$ of $A^T$ is entry $(j, i)$ of $A$.

Why would we ever need to do this? To illustrate, suppose $\vec u = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \end{bmatrix}$, and that we’d like to compute the product $A^T \vec u$. (Note that $\vec u$ must be in $\mathbb{R}^4$ in order for $A^T \vec u$ to be defined, unlike $\vec x \in \mathbb{R}^3$ in the product $A \vec x$.) Then:

$$
\begin{align*}A^T \vec u &= \begin{bmatrix} 3 & 2 & 0 & 2 \\ 1 & 1 & -1 & -2 \\ 4 & 9 & 0 & 0 \end{bmatrix} \begin{bmatrix} {\color{orange} \mathbf{u_1}} \\ {\color{orange} \mathbf{u_2}} \\ {\color{orange} \mathbf{u_3}} \\ {\color{orange} \mathbf{u_4}} \end{bmatrix} \\ &= {\color{orange} \mathbf{u_1}} \begin{bmatrix} 3 \\ 1 \\ 4 \end{bmatrix} + {\color{orange} \mathbf{u_2}} \begin{bmatrix} 2 \\ 1 \\ 9 \end{bmatrix} + {\color{orange} \mathbf{u_3}} \begin{bmatrix} 0 \\ -1 \\ 0 \end{bmatrix} + {\color{orange} \mathbf{u_4}} \begin{bmatrix} 2 \\ -2 \\ 0 \end{bmatrix} \end{align*}
$$

This is a linear combination of the rows of $A$, where the weights are the components of $\vec u$. Remember, the standard product $A \vec x$ is a linear combination of the columns of $A$, so the transpose helps us if we want to compute a linear combination of the rows of $A$. (Equivalently, it helps us if we want to compute the dot products of the columns of $A$ with $\vec u$ – see the linear combination interpretation section of Chapter 5.1.)

The transpose also gives us another way of expressing the dot product of two vectors. If ${\color{orange} \vec u}$ and ${\color{#3d81f6} \vec v}$ are two vectors in $\mathbb{R}^n$, then ${\color{orange} \vec u}^T$ is a row vector with 1 row and $n$ columns. Multiplying ${\color{orange} \vec u}^T$ by ${\color{#3d81f6} \vec v}$ results in a $1 \times 1$ matrix, which is just the scalar ${\color{orange} \vec u} \cdot {\color{#3d81f6} \vec v}$.

$$
\vec {\color{orange}u}^T \vec{\color{#3d81f6}v} = \begin{bmatrix} {\color{orange}u_1} & {\color{orange}u_2} & \ldots & {\color{orange}u_n} \end{bmatrix} \begin{bmatrix}{\color{#3d81f6}v_1} \\{\color{#3d81f6}v_2} \\ \vdots \\{\color{#3d81f6}v_n} \end{bmatrix} = {\color{orange}u_1}{\color{#3d81f6}v_1} + {\color{orange}u_2}{\color{#3d81f6}v_2} + \ldots + {\color{orange}u_n}{\color{#3d81f6}v_n} = \vec {\color{orange}u} \cdot \vec{\color{#3d81f6}v} = \vec{\color{#3d81f6}v} \cdot \vec {\color{orange}u} = \vec{\color{#3d81f6}v}^T \vec {\color{orange}u}
$$

The benefit of using the transpose to express the dot product is that it allows us to write the dot product of two vectors in terms of matrix multiplication, rather than being an entirely different type of operation. (In fact, as we’ve seen here, matrix multiplication is just a generalization of the dot product.)
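This correspondence is easy to check in code. Here’s a small NumPy sketch (the vector values are arbitrary examples), treating $\vec u$ and $\vec v$ as column vectors:

```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, -5.0, 6.0])

# As column vectors, u^T v is a 1x1 matrix whose single entry is u . v
u_col = u.reshape(-1, 1)   # 3x1 column vector
v_col = v.reshape(-1, 1)   # 3x1 column vector

one_by_one = u_col.T @ v_col   # 1x1 matrix from matrix multiplication
dot = np.dot(u, v)             # plain scalar dot product

print(one_by_one)   # [[12.]]
print(dot)          # 12.0
```

NumPy doesn’t distinguish between the $1 \times 1$ matrix and the scalar once you index into it: `one_by_one[0, 0]` equals `dot` exactly.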

There are other uses for the transpose, too, so it’s a useful tool to have in your toolbox.

Properties

For any matrices $A$ and $B$ with compatible dimensions, and any scalar $c$:

  • $(A^T)^T = A$

  • $(A + B)^T = A^T + B^T$

  • $(cA)^T = cA^T$

  • $(AB)^T = B^T A^T$

The first three properties are relatively straightforward. The last property is a bit more subtle. Try to reason about why it’s true on your own before verifying it with an example.

The fact that $(AB)^T = B^T A^T$ comes in handy when finding the norm of a matrix-vector product. If $A$ is an $n \times d$ matrix and $\vec x \in \mathbb{R}^d$, then:

$$
\lVert A \vec x \rVert^2 = (A \vec x)^T (A \vec x) = \vec x^T A^T A \vec x
$$

As we’ll soon see, some matrices AA have special properties that make this computation particularly easy.

In `numpy`, the `T` attribute is used to compute the transpose of a 2D array.
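For example, here is the matrix $A$ from above in NumPy, along with checks of the linear-combination interpretation of $A^T \vec u$ and the product rule $(AB)^T = B^T A^T$ (the vector `u` and matrix `B` are arbitrary examples):

```python
import numpy as np

A = np.array([[3,  1, 4],
              [2,  1, 9],
              [0, -1, 0],
              [2, -2, 0]])     # the 4x3 matrix A from above

print(A.shape)     # (4, 3)
print(A.T.shape)   # (3, 4)

# A^T u is a linear combination of the rows of A, weighted by u
u = np.array([1, 2, 0, -1])
print(A.T @ u)     # 1*[3,1,4] + 2*[2,1,9] + 0*[0,-1,0] - 1*[2,-2,0] = [5, 5, 22]

# The transpose of a product reverses the order: (AB)^T = B^T A^T
B = np.array([[1, 0],
              [2, 1],
              [-1, 3]])        # arbitrary 3x2 example
print(np.array_equal((A @ B).T, B.T @ A.T))   # True
```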

Activity 1


The Identity Matrix

Now, I’ll introduce several “special” types of matrices that will come in handy at various points throughout our journey.

An identity matrix is a square matrix with 1s along the diagonal and 0s everywhere else. Saying “the identity matrix” is a bit ambiguous, as there are infinitely many identity matrices – there’s a $1 \times 1$ identity matrix, a $2 \times 2$ identity matrix, a $3 \times 3$ identity matrix, and so on. Often, the dimension of the identity matrix is implied by context, and if not, we might provide it as a subscript, e.g. $I_n$ for the $n \times n$ identity matrix. For example:

$$
I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
$$

Why is the identity matrix defined this way? It’s the matrix equivalent of the number 1 in scalar multiplication, also known as the multiplicative identity. If $c$ is a scalar, then $c \cdot 1 = c$ and $1 \cdot c = c$. (Similarly, 0 is known as the additive identity, since $c + 0 = c$.)

Similarly, if $A$ is a square $n \times n$ matrix and $\vec x \in \mathbb{R}^n$ is a vector, then the $n \times n$ identity matrix $I$ is the unique matrix that satisfies:

  • $I \vec x = \vec x$ for all $\vec x \in \mathbb{R}^n$.

  • $IA = AI = A$ for all $A \in \mathbb{R}^{n \times n}$.

A good exercise is to verify that the identity matrix satisfies these properties.
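You can also check them numerically. NumPy’s `np.eye(n)` builds the $n \times n$ identity matrix; the vector and matrix below are arbitrary examples:

```python
import numpy as np

I = np.eye(3)                       # the 3x3 identity matrix
x = np.array([1.0, -2.0, 3.5])      # arbitrary vector in R^3
A = np.array([[3.0, 1.0, 4.0],
              [2.0, 1.0, 9.0],
              [0.0, -1.0, 0.0]])    # arbitrary 3x3 matrix

print(np.array_equal(I @ x, x))     # True: I x = x
print(np.array_equal(I @ A, A))     # True: I A = A
print(np.array_equal(A @ I, A))     # True: A I = A
```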

Activity 2


Symmetric Matrices

A matrix is symmetric if it is equal to its own transpose, i.e. if $A^T = A$. For example, $A$ below is symmetric, but $B$ is not.

$$
\underbrace{A = \begin{bmatrix} 1 & 4 \\ 4 & 3 \end{bmatrix}, \qquad A^T = \begin{bmatrix} 1 & 4 \\ 4 & 3 \end{bmatrix} = A}_{\text{symmetric}}
$$

$$
\underbrace{B = \begin{bmatrix} 1 & 4 \\ -2 & 3 \end{bmatrix}, \qquad B^T = \begin{bmatrix} 1 & -2 \\ 4 & 3 \end{bmatrix} \neq B}_{\text{not symmetric}}
$$

A symmetric matrix is such that row 1 is the same as column 1, row 2 is the same as column 2, and so on. We’ll see several applications of symmetric matrices later in the course – for example, they are easy to work with in multivariate calculus.

But, where do they come from? Data usually isn’t symmetric: if $X$ is a matrix in which each row is a data point and each column is a feature, then usually $X$ is very tall, and thus can’t be symmetric.

Here’s the key: for any $n \times d$ matrix $X$, $X^TX$ is a symmetric $d \times d$ matrix. We can verify this using the fact that $(AB)^T = B^T A^T$:

$$
({\color{orange}X^T}{\color{#3d81f6}X})^T = {\color{#3d81f6}X}^T({\color{orange}X^T})^T = X^TX
$$

I think of $X^TX$ as the dot product matrix of $X$, because its entries are the dot products of the pairs of columns of $X$. For example, let

$$
X = \begin{bmatrix} 2 & -6 & 1 \\ 3 & 1 & -1 \end{bmatrix}
$$

Then,

$$
X^TX = \begin{bmatrix} 2 & 3 \\ -6 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} 2 & -6 & 1 \\ 3 & 1 & -1 \end{bmatrix} = \begin{bmatrix} 13 & -9 & -1 \\ -9 & 37 & -7 \\ -1 & -7 & 2 \end{bmatrix}
$$

$X^TX$ contains the dot products of the columns of $X$ with each other. For instance, the $-9$ in position $(1, 2)$ is the dot product of

  • row 1 of $X^T$, which is column 1 of $X$, with

  • column 2 of $X$.

The elements along the diagonal of $X^TX$ – 13, 37, and 2 – are the dot products of the columns of $X$ with themselves, meaning they are the squared norms of the columns.
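The dot product matrix computation above can be reproduced in NumPy, using the same $X$:

```python
import numpy as np

X = np.array([[2, -6, 1],
              [3, 1, -1]])   # the 2x3 matrix X from above

G = X.T @ X                  # the "dot product matrix" of X
print(G)
# [[13 -9 -1]
#  [-9 37 -7]
#  [-1 -7  2]]

print(np.array_equal(G, G.T))   # True: X^T X is symmetric

# The diagonal holds the squared norms of the columns of X
print(np.array_equal(np.diag(G), np.sum(X**2, axis=0)))   # True
```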

Activity 3


Diagonal Matrices

A diagonal matrix is one whose only nonzero entries lie along the diagonal; all off-diagonal entries are 0. Usually, diagonal matrices are square, like the examples below.

$$
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}, \quad \begin{bmatrix} -\pi & 0 & 0 \\ 0 & 15 & 0 \\ 0 & 0 & 3 \end{bmatrix}
$$

Notice that the diagonal is down and to the right – position (1, 1) is the first diagonal element, position (2, 2) is the second diagonal element, and so on.

Diagonal matrices don’t have to be square. We could also call the following a diagonal matrix.

$$
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \end{bmatrix}
$$

What is their significance? Let’s observe what happens when we multiply a diagonal matrix by a vector, as we did in Activity 4 of Chapter 5.1.

$$
\underbrace{\begin{bmatrix} -3 & 0 & 0 \\ 0 & \pi & 0 \\ 0 & 0 & 5 \end{bmatrix}}_{D} \underbrace{\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}}_{\vec x} = \begin{bmatrix} -3 \\ 2\pi \\ 15 \end{bmatrix}
$$

What happened to $\vec x$ after being multiplied by $D$? Each component was stretched. Element 1 of $\vec x$ was stretched by $-3$, element 2 was stretched by $\pi$, and element 3 was stretched by 5. The 0s in the off-diagonal elements allowed $D$ to scale each component of $\vec x$ independently.
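The example above can be checked with NumPy’s `np.diag`, which builds a diagonal matrix from its diagonal entries (and, applied to a matrix, extracts them):

```python
import numpy as np

D = np.diag([-3.0, np.pi, 5.0])   # the diagonal matrix D from above
x = np.array([1.0, 2.0, 3.0])

print(D @ x)   # [-3, 2*pi, 15]

# Multiplying by D is the same as scaling each component by the diagonal
print(np.allclose(D @ x, np.diag(D) * x))   # True
```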


Triangular Matrices

A matrix is upper triangular if all of its entries below the diagonal are 0, and lower triangular if all of its entries above the diagonal are 0. For instance, $\begin{bmatrix} 1 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \end{bmatrix}$ is upper triangular, and $\begin{bmatrix} 1 & 0 & 0 \\ 2 & 3 & 0 \\ 4 & 5 & 6 \end{bmatrix}$ is lower triangular. I’ll have more to say on these matrices later in the term; for now, I just want you to be aware of their existence.
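If you want to experiment with triangular matrices, NumPy’s `np.triu` and `np.tril` zero out the entries below or above the diagonal of any matrix (the matrix `M` here is an arbitrary example):

```python
import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])   # arbitrary example

U = np.triu(M)   # keep the upper triangle, zero out everything below the diagonal
L = np.tril(M)   # keep the lower triangle, zero out everything above the diagonal

print(U)
# [[1 2 3]
#  [0 5 6]
#  [0 0 9]]
print(L)
# [[1 0 0]
#  [4 5 0]
#  [7 8 9]]
```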


Orthogonal Matrices

The final category of matrix I want to introduce here is the orthogonal matrix. It’s the first type of matrix here whose defining property can’t be determined just by looking at the individual elements of the matrix; it involves some computation.

Let’s consider the matrix

$$
A = \begin{bmatrix} \frac{\sqrt{3}}{2} & -\frac{1}{2} \\[6pt] \frac{1}{2} & \frac{\sqrt{3}}{2} \end{bmatrix}
$$

It is orthogonal, because $A^TA$ and $AA^T$ are both equal to the $2 \times 2$ identity matrix. What does this really tell us? When I discussed symmetric matrices above, I said that $A^TA$ for any matrix $A$ contains the dot products of the columns of $A$ with each other. So, if

$$
A^TA = \begin{bmatrix} \frac{\sqrt{3}}{2} & \frac{1}{2} \\[6pt] -\frac{1}{2} & \frac{\sqrt{3}}{2} \end{bmatrix} \begin{bmatrix} \frac{\sqrt{3}}{2} & -\frac{1}{2} \\[6pt] \frac{1}{2} & \frac{\sqrt{3}}{2} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\[6pt] 0 & 1 \end{bmatrix}
$$

this tells us that $A$’s columns are:

  • unit vectors, since the dot product of each column with itself is 1

  • orthogonal, since the dot product of each column with every other column is 0

Since $AA^T = I$, too, this means that $A$’s rows are also unit vectors that are orthogonal to each other.

So, orthogonal matrices are matrices whose

  • columns are unit vectors

  • rows are unit vectors

  • columns are orthogonal to each other

  • rows are orthogonal to each other

If a collection of vectors are all unit vectors and orthogonal, we may call that collection orthonormal. So, an orthogonal matrix has orthonormal columns and orthonormal rows.

To preview some ideas from Chapter 6.1, let’s visualize six vectors in $\mathbb{R}^2$:

  • ${\color{orange} \vec u} = \begin{bmatrix} 3 \\ 2 \end{bmatrix}$ and $A {\color{orange} \vec u}$

  • ${\color{#3d81f6} \vec v} = \begin{bmatrix} 2 \\ -2 \end{bmatrix}$ and $A {\color{#3d81f6} \vec v}$

  • ${\color{#d81a60} \vec w} = \begin{bmatrix} -5 \\ -4 \end{bmatrix}$ and $A {\color{#d81a60} \vec w}$

What do you notice about the vectors $A {\color{orange} \vec u}$, $A {\color{#3d81f6} \vec v}$, and $A {\color{#d81a60} \vec w}$, and how they relate to ${\color{orange} \vec u}$, ${\color{#3d81f6} \vec v}$, and ${\color{#d81a60} \vec w}$?

*(Figure, produced in Jupyter: the three vectors and their images under $A$.)*

$A$ corresponds to a rotation, since it rotates vectors by a certain angle (in this case, $\frac{\pi}{6}$ radians, or $30^\circ$) but doesn’t change their length. Why $\frac{\pi}{6}$? More on that in Chapter 6.1.

A question we can answer now: why does multiplying a vector $\vec x$ by $A$ preserve its length? Let’s prove this, using the fact that $\lVert \vec v \rVert^2 = \vec v \cdot \vec v$, which is also equal to $\vec v^T \vec v$ using our knowledge of the transpose operator.

$$
\lVert A \vec x \rVert^2 = (A \vec x)^T (A \vec x) = \vec x^T A^T A \vec x = \vec x^T I \vec x = \vec x^T \vec x = \lVert \vec x \rVert^2
$$

The length of $A \vec x$ is the same as the length of $\vec x$, because $A^TA = I$! So, multiplying a vector by an orthogonal matrix preserves its length, and thus the only thing that could change is its direction.
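Here’s a quick numerical confirmation of both facts, using the rotation matrix $A$ from above (written via $\cos\frac{\pi}{6} = \frac{\sqrt 3}{2}$ and $\sin\frac{\pi}{6} = \frac{1}{2}$) and the vector $\vec u = \begin{bmatrix} 3 \\ 2 \end{bmatrix}$:

```python
import numpy as np

theta = np.pi / 6
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # the rotation matrix A from above

# A is orthogonal: A^T A = I
print(np.allclose(A.T @ A, np.eye(2)))   # True

# Multiplying by A preserves length
x = np.array([3.0, 2.0])
print(np.isclose(np.linalg.norm(A @ x), np.linalg.norm(x)))   # True
```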

Orthogonal matrices perform a rotation, which is a type of linear transformation. There exist plenty of other types of linear transformations, like reflections, shears, and projections (which should sound familiar). These will all become familiar in Chapter 6.1.

All I wanted to show you for now is that matrix multiplication may look like a bunch of random number crunching, but there’s a lot of meaning baked in.

The last thing I’ll note on orthogonal matrices is that just because a matrix satisfies $A^TA = I$, it doesn’t mean that it is orthogonal: it just means that its columns are orthonormal. $A$ may not even be square, which is a prerequisite for a matrix to be orthogonal. For instance, let

$$
A = \begin{bmatrix} 3/5 & 4/5 \\ 0 & 0 \\ 4/5 & -3/5 \end{bmatrix}
$$

Then,

$$
A^TA = \begin{bmatrix} 3/5 & 0 & 4/5 \\ 4/5 & 0 & -3/5 \end{bmatrix} \begin{bmatrix} 3/5 & 4/5 \\ 0 & 0 \\ 4/5 & -3/5 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I_{2}
$$

But $AA^T$ is not equal to the above. It’s not even equal to the $3 \times 3$ identity matrix!

$$
AA^T = \begin{bmatrix} 3/5 & 4/5 \\ 0 & 0 \\ 4/5 & -3/5 \end{bmatrix} \begin{bmatrix} 3/5 & 0 & 4/5 \\ 4/5 & 0 & -3/5 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} \neq I_{3}
$$
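This asymmetry is easy to see numerically with the same tall matrix:

```python
import numpy as np

A = np.array([[3/5,  4/5],
              [0.0,  0.0],
              [4/5, -3/5]])   # the 3x2 matrix A from above

# The columns are orthonormal...
print(np.allclose(A.T @ A, np.eye(2)))    # True

# ...but A is not orthogonal: A A^T is not the 3x3 identity
print(np.allclose(A @ A.T, np.eye(3)))    # False
print(A @ A.T)
# [[1. 0. 0.]
#  [0. 0. 0.]
#  [0. 0. 1.]]
```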

Activity 4