
5.2. Special Matrices

There’s an important operation on matrices that we haven’t discussed yet.

Definition

The transpose of a matrix $A$, written $A^T$, is the matrix obtained by swapping the roles of $A$'s rows and columns: the $i$th row of $A$ becomes the $i$th column of $A^T$. To illustrate, let's start with our familiar matrix $A$:

$$A = \begin{bmatrix} 3 & {\color{#3d81f6} \mathbf{1}} & 4 \\ 2 & {\color{#3d81f6} \mathbf{1}} & 9 \\ 0 & {\color{#3d81f6} \mathbf{-1}} & 0 \\ 2 & {\color{#3d81f6} \mathbf{-2}} & 0 \end{bmatrix}$$

The transpose of $A$ is:

$$A^T = \begin{bmatrix} 3 & 2 & 0 & 2 \\ {\color{#3d81f6} \mathbf{1}} & {\color{#3d81f6} \mathbf{1}} & {\color{#3d81f6} \mathbf{-1}} & {\color{#3d81f6} \mathbf{-2}} \\ 4 & 9 & 0 & 0 \end{bmatrix}$$

Note that $A \in \mathbb{R}^{4 \times 3}$ and $A^T \in \mathbb{R}^{3 \times 4}$. (The highlighted second column of $A$ becomes the second row of $A^T$.)

Why would we ever need to do this? To illustrate, suppose $\vec u = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \end{bmatrix}$, and that we'd like to compute the product $A^T \vec u$. (Note that $\vec u$ must be in $\mathbb{R}^4$ in order for $A^T \vec u$ to be defined, unlike $\vec x \in \mathbb{R}^3$ in the product $A \vec x$.) Then:

$$\begin{align*} A^T \vec u &= \begin{bmatrix} 3 & 2 & 0 & 2 \\ 1 & 1 & -1 & -2 \\ 4 & 9 & 0 & 0 \end{bmatrix} \begin{bmatrix} {\color{orange} \mathbf{u_1}} \\ {\color{orange} \mathbf{u_2}} \\ {\color{orange} \mathbf{u_3}} \\ {\color{orange} \mathbf{u_4}} \end{bmatrix} \\ &= {\color{orange} \mathbf{u_1}} \begin{bmatrix} 3 \\ 1 \\ 4 \end{bmatrix} + {\color{orange} \mathbf{u_2}} \begin{bmatrix} 2 \\ 1 \\ 9 \end{bmatrix} + {\color{orange} \mathbf{u_3}} \begin{bmatrix} 0 \\ -1 \\ 0 \end{bmatrix} + {\color{orange} \mathbf{u_4}} \begin{bmatrix} 2 \\ -2 \\ 0 \end{bmatrix} \end{align*}$$

This is a linear combination of the rows of $A$, where the weights are the components of $\vec u$. Remember, the standard product $A \vec x$ is a linear combination of the columns of $A$, so the transpose helps us if we want to compute a linear combination of the rows of $A$. (Equivalently, it helps us if we want to compute the dot product of the columns of $A$ with $\vec u$ – see the "Two Pictures" note from earlier in this chapter.)

The transpose also gives us another way of expressing the dot product of two vectors. If ${\color{orange} \vec u}$ and ${\color{#3d81f6} \vec v}$ are two vectors in $\mathbb{R}^n$, then ${\color{orange} \vec u}^T$ is a row vector with 1 row and $n$ columns. Multiplying ${\color{orange} \vec u}^T$ by ${\color{#3d81f6} \vec v}$ results in a $1 \times 1$ matrix, which is just the scalar ${\color{orange} \vec u} \cdot {\color{#3d81f6} \vec v}$.

$$\vec{\color{orange}u}^T \vec{\color{#3d81f6}v} = \begin{bmatrix} {\color{orange}u_1} & {\color{orange}u_2} & \ldots & {\color{orange}u_n} \end{bmatrix} \begin{bmatrix} {\color{#3d81f6}v_1} \\ {\color{#3d81f6}v_2} \\ \vdots \\ {\color{#3d81f6}v_n} \end{bmatrix} = {\color{orange}u_1}{\color{#3d81f6}v_1} + {\color{orange}u_2}{\color{#3d81f6}v_2} + \ldots + {\color{orange}u_n}{\color{#3d81f6}v_n} = \vec{\color{orange}u} \cdot \vec{\color{#3d81f6}v} = \vec{\color{#3d81f6}v} \cdot \vec{\color{orange}u} = \vec{\color{#3d81f6}v}^T \vec{\color{orange}u}$$

The benefit of using the transpose to express the dot product is that it allows us to write the dot product of two vectors in terms of matrix multiplication, rather than being an entirely different type of operation. (In fact, as we’ve seen here, matrix multiplication is just a generalization of the dot product.)
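In code, this correspondence is direct. Here's a minimal numpy sketch (the vectors are arbitrary examples, not from the text):

```python
import numpy as np

u = np.array([1, 2, 3])
v = np.array([4, 5, 6])

u @ v  # 32; for 1D arrays, @ computes the dot product

# To mirror the u^T v notation literally, make u a 1 x 3 row vector
# and v a 3 x 1 column vector; the product is then a 1 x 1 matrix.
u.reshape(1, -1) @ v.reshape(-1, 1)  # array([[32]])
```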

There are other uses for the transpose, too, so it’s a useful tool to have in your toolbox.

Properties

For any matrices $A$ and $B$ of compatible dimensions, and any scalar $c$:

  • $(A^T)^T = A$

  • $(A + B)^T = A^T + B^T$

  • $(cA)^T = cA^T$

  • $(AB)^T = B^T A^T$

The first three properties are relatively straightforward. The last property is a bit more subtle. Try to reason about why it's true on your own, then check your reasoning against the example below.
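Here's a quick numerical check of the last property (a minimal sketch; the matrices are randomly generated examples, not anything from the text):

```python
import numpy as np

rng = np.random.default_rng(seed=1)
A = rng.normal(size=(4, 3))  # arbitrary 4 x 3 matrix
B = rng.normal(size=(3, 2))  # arbitrary 3 x 2 matrix

print(np.allclose((A @ B).T, B.T @ A.T))  # True

# The order really does flip: A.T @ B.T isn't even defined here,
# since A.T is 3 x 4 and B.T is 2 x 3.
```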

The fact that $(AB)^T = B^T A^T$ comes in handy when finding the norm of a matrix-vector product. If $A$ is an $n \times d$ matrix and $\vec x \in \mathbb{R}^d$, then:

$$\lVert A \vec x \rVert^2 = (A \vec x)^T (A \vec x) = \vec x^T A^T A \vec x$$

As we'll soon see, some matrices $A$ have special properties that make this computation particularly easy.
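We can spot-check this identity too (again a sketch with random data):

```python
import numpy as np

rng = np.random.default_rng(seed=2)
A = rng.normal(size=(5, 3))  # n = 5, d = 3
x = rng.normal(size=3)

lhs = np.linalg.norm(A @ x) ** 2
rhs = x @ A.T @ A @ x  # x^T A^T A x
print(np.allclose(lhs, rhs))  # True
```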

TODO: Add discussion of symmetric, triangular, diagonal, orthogonal, and other special matrices.

Having introduced the transpose, let's look at another special matrix that will keep appearing: the identity matrix.

In numpy, the T attribute is used to compute the transpose of a 2D array.
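For instance, using the matrix $A$ from the start of this section (the choice of $\vec u$ below is arbitrary):

```python
import numpy as np

A = np.array([[3,  1, 4],
              [2,  1, 9],
              [0, -1, 0],
              [2, -2, 0]])

A.T        # the transpose
A.T.shape  # (3, 4)

u = np.array([1, 2, 3, 4])
A.T @ u    # a linear combination of the rows of A, weighted by u
```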


The Identity Matrix

An identity matrix is a square matrix with 1s on the diagonal and 0s everywhere else. Saying "the identity matrix" is a bit ambiguous, as there are infinitely many identity matrices – there's a $1 \times 1$ identity matrix, a $2 \times 2$ identity matrix, a $3 \times 3$ identity matrix, and so on. For example,

$$I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

Often, the dimension of the identity matrix is implied by context, and if not, we might provide it as a subscript, e.g. $I_n$ for the $n \times n$ identity matrix.

Why is the identity matrix defined this way? It's the matrix equivalent of the number 1 in scalar multiplication, also known as the multiplicative identity. If $c$ is a scalar, then $c \cdot 1 = c$ and $1 \cdot c = c$. (0 plays the analogous role for addition: it's the additive identity, since $c + 0 = c$.)

Similarly, if $A$ is a square $n \times n$ matrix and $\vec x \in \mathbb{R}^n$ is a vector, then the $n \times n$ identity matrix $I$ is the unique matrix that satisfies:

  • $I \vec x = \vec x$ for all $\vec x \in \mathbb{R}^n$.

  • $IA = AI = A$ for all $A \in \mathbb{R}^{n \times n}$.

A good exercise is to verify that the identity matrix satisfies these properties.
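You can also check them numerically. In numpy, `np.eye(n)` builds $I_n$; here's a sketch with an arbitrary square matrix and vector:

```python
import numpy as np

I = np.eye(3)  # the 3 x 3 identity matrix

A = np.array([[3, 1, 4],
              [2, 1, 9],
              [0, -1, 0]])  # an arbitrary 3 x 3 matrix
x = np.array([1, 2, 3])     # an arbitrary vector in R^3

print(np.allclose(I @ x, x))                            # True
print(np.allclose(I @ A, A) and np.allclose(A @ I, A))  # True
```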


Preview: Transformations

This section was relatively mechanical, and I didn’t spend much time explaining why we’d multiply two matrices (or a matrix and a vector) together. More context for this operation will come throughout the rest of the chapter.

To conclude, I want to show you some of the magic behind matrix multiplication.

Consider the relatively innocent-looking $2 \times 2$ matrix

$$A = \begin{bmatrix} \frac{\sqrt{3}}{2} & -\frac{1}{2} \\[6pt] \frac{1}{2} & \frac{\sqrt{3}}{2} \end{bmatrix}$$

Below, you'll see that I've drawn out six vectors in $\mathbb{R}^2$:

  • ${\color{orange} \vec u} = \begin{bmatrix} 3 \\ 2 \end{bmatrix}$ and $A {\color{orange} \vec u}$

  • ${\color{#3d81f6} \vec v} = \begin{bmatrix} 2 \\ -2 \end{bmatrix}$ and $A {\color{#3d81f6} \vec v}$

  • ${\color{#d81a60} \vec w} = \begin{bmatrix} -5 \\ -4 \end{bmatrix}$ and $A {\color{#d81a60} \vec w}$

What do you notice about the vectors $A {\color{orange} \vec u}$, $A {\color{#3d81f6} \vec v}$, and $A {\color{#d81a60} \vec w}$, and how they relate to ${\color{orange} \vec u}$, ${\color{#3d81f6} \vec v}$, and ${\color{#d81a60} \vec w}$?

[Figure: $\vec u$, $\vec v$, and $\vec w$, along with $A\vec u$, $A\vec v$, and $A\vec w$; image produced in Jupyter]

$A$ is called a rotation matrix, since it rotates vectors by a certain angle (in this case, $\frac{\pi}{6}$ radians, or $30^\circ$). Rotations are a type of linear transformation.
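To see this concretely, here's a numerical sketch. It relies on the standard fact that the $2 \times 2$ rotation matrix by angle $\theta$ is $\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$, which reproduces the matrix above for $\theta = \frac{\pi}{6}$:

```python
import numpy as np

theta = np.pi / 6  # 30 degrees
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

u = np.array([3, 2])
Au = A @ u

# A rotation preserves length...
print(np.linalg.norm(u), np.linalg.norm(Au))  # both sqrt(13)

# ...and the angle between u and A u is exactly theta.
cos_angle = (u @ Au) / (np.linalg.norm(u) * np.linalg.norm(Au))
print(np.degrees(np.arccos(cos_angle)))  # ~30.0
```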

Not all matrices are rotation matrices; there are plenty of other types of linear transformations, like reflections, shears, and projections (which should sound familiar). These will all be covered in Chapter 6.1. All I wanted to show you for now is that matrix multiplication may look like a bunch of random number crunching, but there's a lot of meaning baked in.