
5.1, Part 2: Adjacency Matrices

This section was originally at the end of Chapter 5.1, but I’ve moved it here to avoid overcrowding Chapter 5.1. Next semester, I plan on renumbering the pages of the notes so that each one is a little shorter and more digestible.

To extend what we’ve seen in Chapter 5.1, I’ll introduce a simplified version of the PageRank problem. This will allow me to highlight a key property of eigenvalues and eigenvectors: that they can be used to understand the long-term behavior of a system.


The Game

That sounds abstract – what is a system, and what is long-term behavior? Let’s illustrate with an example. “The Game” refers to the annual football game between Michigan and Ohio State. Let’s suppose that:

  • If Michigan wins in year $k$, there's an 80% chance we win again in year $k+1$, and a 20% chance that Ohio State wins in year $k+1$.

  • If Ohio State wins in year $k$, there's a 70% chance they win again in year $k+1$, and a 30% chance that Michigan wins in year $k+1$.

Question: In the long-run, what percentage of the time does Michigan win? Perhaps a more approachable way of phrasing this is, what is the (unconditional) probability that Michigan wins a particular game?

This system is an example of a Markov chain, in which the probabilities of transitioning between states don’t change over time. Here, a “state” is the outcome of the game in a particular year. The probabilities described above are fixed: the chance Michigan wins in year 100 given they won in year 99 is 80%, and the chance they win in year 55555 given they won in year 55554 is also 80%.
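Before doing any linear algebra, we can estimate the answer by brute force: simulate the chain for many years and count Michigan's wins. Here's a minimal Monte Carlo sketch (the function, seed, and parameter names are my own, not part of the notes):

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_chain(num_years, p_um_repeat=0.8, p_osu_repeat=0.7):
    """Simulate the chain; state 0 means Michigan won, state 1 means Ohio State won."""
    state = 0  # seed with a Michigan win in year 0
    um_wins = 0
    for _ in range(num_years):
        if state == 0:
            # Michigan won last year: they repeat with probability 0.8.
            state = 0 if rng.random() < p_um_repeat else 1
        else:
            # Ohio State won last year: they repeat with probability 0.7.
            state = 1 if rng.random() < p_osu_repeat else 0
        um_wins += (state == 0)
    return um_wins / num_years

print(sample_chain(100_000))  # an estimate of Michigan's long-run win fraction
```

Running this gives an estimate of roughly 0.6, which is the answer we'll derive exactly below.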

It’s often useful to visualize Markov chains using a state diagram, which is a directed graph in which each node represents a state, and each edge represents a transition between states. Each edge is labeled with the probability of the transition.

(State diagram: two nodes, UM and OSU, with each directed edge labeled by its transition probability.)

Notice that the sum of the probabilities of the outgoing edges from each node (state) is 1. Intuitively, this says we must account for all possible transitions from a given state.

This Markov chain can be described by a $2 \times 2$ adjacency matrix, $A$:

$$A = \begin{bmatrix} p(UM \to UM) & p(OSU \to UM) \\ p(UM \to OSU) & p(OSU \to OSU) \end{bmatrix} = \begin{bmatrix} 0.8 & 0.3 \\ 0.2 & 0.7 \end{bmatrix}$$

Note that in an adjacency matrix, for our purposes:

  • Each column describes the movement from a given node; the sums of the columns in an adjacency matrix are equal to 1.

  • Each row describes the movement into a given node; the sums of the rows in an adjacency matrix don’t necessarily add to 1.

That is, $a_{ij}$ is the probability of transitioning from node $j$ to node $i$.
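This convention is easy to sanity-check in NumPy (a quick sketch, not part of the original notes): the columns of $A$ should each sum to 1, while the rows need not.

```python
import numpy as np

A = np.array([[0.8, 0.3],
              [0.2, 0.7]])

# Column j collects all transition probabilities *out of* node j, so each column sums to 1.
print(A.sum(axis=0))  # [1. 1.]

# Row i collects probabilities *into* node i; these sums carry no constraint.
print(A.sum(axis=1))  # [1.1 0.9]
```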

Recall, our goal is to determine the long-run fraction of games that Michigan wins. How does the adjacency matrix $A$ help us with this?

Simulating a Chain

Let's use it to simulate this system, seeding the simulation in year $k = 0$ with a Michigan win, i.e.

$$\vec x_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$

where $\vec x_k$ is a vector with two components, the first of which is the probability that Michigan wins in year $k$, and the second of which is the probability that Ohio State wins in year $k$.

To find the distribution of wins in year 1, we can multiply $\vec x_0$ by $A$:

$$\vec x_1 = A \vec x_0 = \begin{bmatrix} 0.8 & 0.3 \\ 0.2 & 0.7 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0.8 \\ 0.2 \end{bmatrix}$$

So, in year 1, we’d expect Michigan to win 80% of the time, and Ohio State to win 20% of the time. What about in year 2?

$$\vec x_2 = A \vec x_1 = \begin{bmatrix} 0.8 & 0.3 \\ 0.2 & 0.7 \end{bmatrix} \begin{bmatrix} 0.8 \\ 0.2 \end{bmatrix} = \begin{bmatrix} 0.8 \cdot 0.8 + 0.3 \cdot 0.2 \\ 0.2 \cdot 0.8 + 0.7 \cdot 0.2 \end{bmatrix} = \begin{bmatrix} 0.7 \\ 0.3 \end{bmatrix}$$

Or year 3?

$$\vec x_3 = A \vec x_2 = \begin{bmatrix} 0.8 & 0.3 \\ 0.2 & 0.7 \end{bmatrix} \begin{bmatrix} 0.7 \\ 0.3 \end{bmatrix} = \begin{bmatrix} 0.8 \cdot 0.7 + 0.3 \cdot 0.3 \\ 0.2 \cdot 0.7 + 0.7 \cdot 0.3 \end{bmatrix} = \begin{bmatrix} 0.65 \\ 0.35 \end{bmatrix}$$

In general, the distribution of wins in year $k$ is given by

$$\vec x_k = A \vec x_{k-1} = A^2 \vec x_{k-2} = \cdots = \boxed{A^k \vec x_0}$$

If you look at $\vec x_1$, $\vec x_2$, and $\vec x_3$ above, they don't seem to be exploding to some really large value. Despite repeatedly multiplying by $A$, the values seem to be stabilizing. Are these vectors converging? And, if so, to what?

Python can help verify our intuition.

import numpy as np

def simulate_steps(A, x0, num_steps=15):
    for k in range(1, num_steps + 1):
        # np.linalg.matrix_power(A, k) computes A @ A @ ... @ A (k times).
        # A ** k would instead raise each *element* of A to the kth power, which is not what we want.
        x_k = np.linalg.matrix_power(A, k) @ x0
        print(f'x_{k} = {x_k.flatten()}')

A = np.array([[0.8, 0.3],
              [0.2, 0.7]])

x0 = np.array([[1], [0]])

simulate_steps(A, x0)
x_1 = [0.8 0.2]
x_2 = [0.7 0.3]
x_3 = [0.65 0.35]
x_4 = [0.625 0.375]
x_5 = [0.6125 0.3875]
x_6 = [0.60625 0.39375]
x_7 = [0.603125 0.396875]
x_8 = [0.6015625 0.3984375]
x_9 = [0.60078125 0.39921875]
x_10 = [0.60039063 0.39960938]
x_11 = [0.60019531 0.39980469]
x_12 = [0.60009766 0.39990234]
x_13 = [0.60004883 0.39995117]
x_14 = [0.60002441 0.39997559]
x_15 = [0.60001221 0.39998779]

It seems that the sequence of $\vec x_k$'s is converging to $\begin{bmatrix} 0.6 \\ 0.4 \end{bmatrix}$, which would imply that in the long run, Michigan wins 60% of the time.


An Eigenvalue Problem in Disguise

Could we have found the vector $\begin{bmatrix} 0.6 \\ 0.4 \end{bmatrix}$ without simulating the system? You bet: it's an eigenvector of $A$ corresponding to the eigenvalue 1! Intuitively, since I'm searching for a long-run distribution, I'm really looking for a vector $\vec x$ such that

$$A \vec x = \vec x$$

meaning that advancing $\vec x$ one step in time won't change it. This is precisely the definition of an eigenvector with eigenvalue $\lambda = 1$. Eventually, we will prove that all adjacency matrices have an eigenvalue of 1, and that the corresponding eigenvector is the long-run distribution we're searching for.

But for now, let's verify that 1 is indeed an eigenvalue of $A$ for this specific $A$ by computing $A$'s eigenvalues and eigenvectors ourselves.

$$A = \begin{bmatrix} 0.8 & 0.3 \\ 0.2 & 0.7 \end{bmatrix}$$

You might be able to spot the eigenvalues from the trace and determinant facts (they need to add to 1.5 and multiply to 0.5). If not, there’s nothing wrong with explicitly writing out the characteristic polynomial.

$$\begin{align*} p(\lambda) &= \begin{vmatrix} 0.8 - \lambda & 0.3 \\ 0.2 & 0.7 - \lambda \end{vmatrix} \\ &= (0.8 - \lambda)(0.7 - \lambda) - 0.2 \cdot 0.3 \\ &= \lambda^2 - 1.5\lambda + 0.5 \\ &= (\lambda - 1)(\lambda - 0.5) \end{align*}$$

This tells us that $A$'s eigenvalues are $\lambda_1 = 1$ and $\lambda_2 = 0.5$.
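If you'd rather not factor the characteristic polynomial by hand, `np.linalg.eig` recovers the same eigenvalues (this check isn't in the original notes):

```python
import numpy as np

A = np.array([[0.8, 0.3],
              [0.2, 0.7]])

# eig returns the eigenvalues (in no particular order) and a matrix
# whose columns are the corresponding eigenvectors.
eigenvalues, eigenvectors = np.linalg.eig(A)
print(np.sort(eigenvalues))  # approximately [0.5, 1.0]
```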

  • For $\lambda_1 = 1$, we can find an eigenvector $\vec v_1 = \begin{bmatrix} a \\ b \end{bmatrix}$ by solving $A \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} a \\ b \end{bmatrix}$.

    $$A \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} a \\ b \end{bmatrix} \implies \begin{bmatrix} 0.8a + 0.3b \\ 0.2a + 0.7b \end{bmatrix} = \begin{bmatrix} a \\ b \end{bmatrix}$$

    Both components of the equation above give the relationship

    $$0.8a + 0.3b = a \implies 0.2a = 0.3b \implies b = \frac{2}{3}a$$

    So, one eigenvector is $\vec v_1 = \begin{bmatrix} 3 \\ 2 \end{bmatrix}$. But, since we'd like to interpret the components of $\vec v_1$ as a probability distribution, we should scale it so that its components sum to 1. Since $3 + 2 = 5$, this gives $\vec v_1 = \begin{bmatrix} 3/5 \\ 2/5 \end{bmatrix} = \begin{bmatrix} 0.6 \\ 0.4 \end{bmatrix}$, which is precisely what our simulated distributions converged to!

  • For $\lambda_2 = 0.5$, notice that

    $$A \begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 0.8 & 0.3 \\ 0.2 & 0.7 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 0.8 - 0.3 \\ 0.2 - 0.7 \end{bmatrix} = \begin{bmatrix} 0.5 \\ -0.5 \end{bmatrix} = 0.5 \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$

    So, $\vec v_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$ is an eigenvector of $A$ corresponding to the eigenvalue $\lambda_2 = 0.5$. There's no way to scale $\vec v_2$ so that its components are all positive and sum to 1, but there's no need to, since we're only interested in interpreting $\lambda_1 = 1$'s eigenvector as a probability distribution.
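Numerically verifying both eigenpairs takes only a couple of lines (a sketch using the vectors we just derived):

```python
import numpy as np

A = np.array([[0.8, 0.3],
              [0.2, 0.7]])

v1 = np.array([0.6, 0.4])   # eigenvector for lambda_1 = 1, scaled to sum to 1
v2 = np.array([1.0, -1.0])  # eigenvector for lambda_2 = 0.5

print(A @ v1)  # equals 1 * v1
print(A @ v2)  # equals 0.5 * v2
```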

I want to revisit our Python simulation from above. Using the initial vector $\vec x_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, the state vectors $\vec x_k$ converged to $\vec v_1 = \begin{bmatrix} 0.6 \\ 0.4 \end{bmatrix}$.

simulate_steps(A, x0=np.array([[1], [0]]))
x_1 = [0.8 0.2]
x_2 = [0.7 0.3]
x_3 = [0.65 0.35]
x_4 = [0.625 0.375]
x_5 = [0.6125 0.3875]
x_6 = [0.60625 0.39375]
x_7 = [0.603125 0.396875]
x_8 = [0.6015625 0.3984375]
x_9 = [0.60078125 0.39921875]
x_10 = [0.60039063 0.39960938]
x_11 = [0.60019531 0.39980469]
x_12 = [0.60009766 0.39990234]
x_13 = [0.60004883 0.39995117]
x_14 = [0.60002441 0.39997559]
x_15 = [0.60001221 0.39998779]

But, there was nothing special about our choice of $\vec x_0$. Had we started with, say, $\vec x_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix}$ or even $\vec x_0 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$, the state vectors would have still converged to the eigenvector $\vec v_1 = \begin{bmatrix} 0.6 \\ 0.4 \end{bmatrix}$.

simulate_steps(A, x0=np.array([[0.5], [0.5]]))
x_1 = [0.55 0.45]
x_2 = [0.575 0.425]
x_3 = [0.5875 0.4125]
x_4 = [0.59375 0.40625]
x_5 = [0.596875 0.403125]
x_6 = [0.5984375 0.4015625]
x_7 = [0.59921875 0.40078125]
x_8 = [0.59960938 0.40039063]
x_9 = [0.59980469 0.40019531]
x_10 = [0.59990234 0.40009766]
x_11 = [0.59995117 0.40004883]
x_12 = [0.59997559 0.40002441]
x_13 = [0.59998779 0.40001221]
x_14 = [0.5999939 0.4000061]
x_15 = [0.59999695 0.40000305]
simulate_steps(A, x0=np.array([[0], [1]]))
x_1 = [0.3 0.7]
x_2 = [0.45 0.55]
x_3 = [0.525 0.475]
x_4 = [0.5625 0.4375]
x_5 = [0.58125 0.41875]
x_6 = [0.590625 0.409375]
x_7 = [0.5953125 0.4046875]
x_8 = [0.59765625 0.40234375]
x_9 = [0.59882813 0.40117188]
x_10 = [0.59941406 0.40058594]
x_11 = [0.59970703 0.40029297]
x_12 = [0.59985352 0.40014648]
x_13 = [0.59992676 0.40007324]
x_14 = [0.59996338 0.40003662]
x_15 = [0.59998169 0.40001831]

What if we start with an $\vec x_0$ that isn't a probability distribution, like $\vec x_0 = \begin{bmatrix} 50 \\ -15 \end{bmatrix}$?

simulate_steps(A, x0=np.array([[50], [-15]]))
x_1 = [35.5 -0.5]
x_2 = [28.25  6.75]
x_3 = [24.625 10.375]
x_4 = [22.8125 12.1875]
x_5 = [21.90625 13.09375]
x_6 = [21.453125 13.546875]
x_7 = [21.2265625 13.7734375]
x_8 = [21.11328125 13.88671875]
x_9 = [21.05664063 13.94335938]
x_10 = [21.02832031 13.97167969]
x_11 = [21.01416016 13.98583984]
x_12 = [21.00708008 13.99291992]
x_13 = [21.00354004 13.99645996]
x_14 = [21.00177002 13.99822998]
x_15 = [21.00088501 13.99911499]

The state vector is converging to $\begin{bmatrix} 21 \\ 14 \end{bmatrix}$, which is still on the line described by $\vec v_1 = \begin{bmatrix} 0.6 \\ 0.4 \end{bmatrix}$. This makes sense: since each column of $A$ sums to 1, multiplying by $A$ preserves the sum of a vector's components, and here $50 + (-15) = 35$, so the limit is $35 \begin{bmatrix} 0.6 \\ 0.4 \end{bmatrix} = \begin{bmatrix} 21 \\ 14 \end{bmatrix}$. So, it seems that no matter where we start, the state vectors will converge to the eigenvector $\vec v_1 = \begin{bmatrix} 0.6 \\ 0.4 \end{bmatrix}$ or some scalar multiple of it.


The Dominant Eigenvalue

Why is it the case that $\vec x_k = A^k \vec x_0$ converges to an eigenvector for $\lambda_1 = 1$, and not to an eigenvector for the other eigenvalue of $\lambda_2 = 0.5$?

The answer is that the dominant eigenvalue, i.e. the one with the largest magnitude, is the one that determines the long-run behavior of the system. Let’s see why this is the case.

First, recall that $A$'s eigenvalues are $\lambda_1 = 1$ and $\lambda_2 = 0.5$, and corresponding eigenvectors are $\vec v_1 = \begin{bmatrix} 0.6 \\ 0.4 \end{bmatrix}$ and $\vec v_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$. Note that these two eigenvectors are linearly independent, so any other vector $\vec x \in \mathbb{R}^2$ can be written as a linear combination of $\vec v_1$ and $\vec v_2$. The fact that these eigenvectors are linearly independent is key to the argument below.

So, suppose we start with some vector $\vec x = c_1 \vec v_1 + c_2 \vec v_2$. Then, what we're really interested in is the behavior of the vectors $A \vec x$, $A^2 \vec x$, $A^3 \vec x$, and so on. What is $A \vec x$?

$$A \vec x = A (c_1 \vec v_1 + c_2 \vec v_2) = c_1 A \vec v_1 + c_2 A \vec v_2 = c_1 \lambda_1 \vec v_1 + c_2 \lambda_2 \vec v_2$$

Above, I used the fact that $A \vec v_1 = \lambda_1 \vec v_1$ and $A \vec v_2 = \lambda_2 \vec v_2$ by the definition of eigenvectors.

Recall from the original Chapter 5.1 that if $\lambda$ is an eigenvalue of $A$ with eigenvector $\vec v$, then $A^k \vec v = \lambda^k \vec v$ also. So,

$$A^2 \vec x = A (c_1 \lambda_1 \vec v_1 + c_2 \lambda_2 \vec v_2) = c_1 \lambda_1^2 \vec v_1 + c_2 \lambda_2^2 \vec v_2$$

and more generally, if $k$ is a positive integer,

$$A^k \vec x = c_1 \lambda_1^k \vec v_1 + c_2 \lambda_2^k \vec v_2$$

So, $A^k \vec x$ is a linear combination of $\vec v_1$ and $\vec v_2$ also, with weights $c_1 \lambda_1^k$ and $c_2 \lambda_2^k$. Here, $\lambda_1 = 1$ and $\lambda_2 = 0.5$, so $\lambda_1^k = 1$ and $\lambda_2^k = 0.5^k$.

$$A^k \vec x = c_1 (1)^k \begin{bmatrix} 0.6 \\ 0.4 \end{bmatrix} + c_2 (0.5)^k \begin{bmatrix} 1 \\ -1 \end{bmatrix} = c_1 \begin{bmatrix} 0.6 \\ 0.4 \end{bmatrix} + c_2 (0.5)^k \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$

But, as $k$ increases, $0.5^k$ approaches 0, while $1^k = 1$. So,

$$A^k \vec x = c_1 \begin{bmatrix} 0.6 \\ 0.4 \end{bmatrix} + c_2 (0.5)^k \begin{bmatrix} 1 \\ -1 \end{bmatrix} \to c_1 \begin{bmatrix} 0.6 \\ 0.4 \end{bmatrix} \text{ as } k \to \infty$$

This explains why the state vectors $\vec x_k$ converge to (some scalar multiple of) $\vec v_1 = \begin{bmatrix} 0.6 \\ 0.4 \end{bmatrix}$, no matter where we start!
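We can replay this whole argument numerically. The sketch below (variable names are my own) solves for the weights $c_1, c_2$ in $\vec x_0 = c_1 \vec v_1 + c_2 \vec v_2$, then checks that $A^k \vec x_0$ approaches $c_1 \vec v_1$:

```python
import numpy as np

A = np.array([[0.8, 0.3],
              [0.2, 0.7]])
v1 = np.array([0.6, 0.4])
v2 = np.array([1.0, -1.0])

x0 = np.array([1.0, 0.0])  # the Michigan-wins-year-0 start from before

# Put the eigenvectors in the columns of V; solving V c = x0 gives the weights.
V = np.column_stack([v1, v2])
c1, c2 = np.linalg.solve(V, x0)

x50 = np.linalg.matrix_power(A, 50) @ x0
print(c1 * v1)  # the predicted limit, c1 * v1
print(x50)      # A^50 x0, essentially identical
```

Here $c_1 = 1$ and $c_2 = 0.4$, and since $0.5^{50}$ is vanishingly small, $A^{50} \vec x_0$ matches $c_1 \vec v_1 = \begin{bmatrix} 0.6 \\ 0.4 \end{bmatrix}$ to machine precision.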

Why do Adjacency Matrices Have an Eigenvalue of 1?

Let’s prove that all adjacency matrices have an eigenvalue of 1. Recall that an adjacency matrix is defined such that:

  • Each column describes the movement from a given node; the sums of the columns in an adjacency matrix are equal to 1.

  • Each row describes the movement into a given node; the sums of the rows in an adjacency matrix don’t necessarily add to 1.

Here's the key fact: for any square matrix $A$, the eigenvalues of $A$ are the same as the eigenvalues of $A^T$. The eigenvectors themselves are usually different, but the eigenvalues are the same. This is because $A$ and $A^T$ have the same characteristic polynomial. Remember from Chapter 2.9 that $\det(A) = \det(A^T)$. So, $A$'s characteristic polynomial is

$$p(\lambda) = \det(A - \lambda I)$$

but this is the same as $\det((A - \lambda I)^T) = \det(A^T - \lambda I)$, which is the characteristic polynomial of $A^T$. So, $A$ and $A^T$ have the same characteristic polynomial, and thus the same eigenvalues.

Back to the original question: why does an adjacency matrix have an eigenvalue of 1? Remember, the columns of an adjacency matrix, which describe the movement from a given node, sum to 1. Equivalently, each row of $A^T$ sums to 1. So, if $A$ is an adjacency matrix, then

$$A^T \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} = \begin{bmatrix} \text{sum of row 1 of } A^T \\ \text{sum of row 2 of } A^T \\ \vdots \\ \text{sum of row } n \text{ of } A^T \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}$$

which shows that $A^T$ has an eigenvector (the all-ones vector) with an eigenvalue of 1.

For instance, if $A = \begin{bmatrix} 0.8 & 0.3 \\ 0.2 & 0.7 \end{bmatrix}$, then $A^T = \begin{bmatrix} 0.8 & 0.2 \\ 0.3 & 0.7 \end{bmatrix}$, and $A^T \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$, so $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ is an eigenvector of $A^T$ with eigenvalue 1.

But, since $A$ and $A^T$ have the same eigenvalues, $A$ must have an eigenvalue of 1.
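Both halves of this argument are easy to check for our specific $A$ (a numerical sketch, not a proof):

```python
import numpy as np

A = np.array([[0.8, 0.3],
              [0.2, 0.7]])

# A and A^T share the same eigenvalues...
w_A = np.sort(np.linalg.eig(A)[0])
w_AT = np.sort(np.linalg.eig(A.T)[0])
print(w_A, w_AT)  # approximately [0.5, 1.0] both times

# ...and the all-ones vector is an eigenvector of A^T with eigenvalue 1,
# because each row of A^T (i.e., each column of A) sums to 1.
ones = np.ones(2)
print(A.T @ ones)  # [1. 1.]
```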

A follow-up question: why is it guaranteed that the largest eigenvalue of an adjacency matrix is 1? This seems crucial in guaranteeing that the state vectors $\vec x_k$ converge to the eigenvector corresponding to the eigenvalue 1. Proving this fact is beyond the scope of our course, but it follows from the Perron-Frobenius theorem, if you'd like to read more.

Example: Non-Adjacency Matrices

You might wonder: if $A$'s largest eigenvalue is greater than 1, what happens to the vector $A^k \vec x$ as $k$ increases? (I won't call $A^k \vec x$ a state vector, since it's not a probability distribution.)

Let's consider $A = \begin{bmatrix} 3 & 1 \\ 2 & 4 \end{bmatrix}$, which was used as an example in Chapter 5.1. $A$ has eigenvalues $\lambda_1 = 2$ and $\lambda_2 = 5$, and corresponding eigenvectors $\vec v_1 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$ and $\vec v_2 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$. These eigenvectors are also linearly independent, so any $\vec x \in \mathbb{R}^2$ can be written as a linear combination of $\vec v_1$ and $\vec v_2$. Then, following the logic from the previous example,

$$A^k \vec x = c_1 (2)^k \begin{bmatrix} 1 \\ -1 \end{bmatrix} + c_2 (5)^k \begin{bmatrix} 1 \\ 2 \end{bmatrix}$$

The issue is that both $5^k$ and $2^k$ grow without bound as $k$ increases. So, $A^k \vec x$ will not converge. But, the direction of $A^k \vec x$ is still predictable. Let me start with $\vec x = \begin{bmatrix} -13 \\ 100 \end{bmatrix}$ (nothing special about these values) and evaluate $A^k \vec x$ for $k = 1, 2, \ldots, 15$.

simulate_steps(
    A = np.array([[3, 1],
                  [2, 4]]),
    x0 = np.array([[-13], [100]])
)
x_1 = [ 61 374]
x_2 = [ 557 1618]
x_3 = [3289 7586]
x_4 = [17453 36922]
x_5 = [ 89281 182594]
x_6 = [450437 908938]
x_7 = [2260249 4536626]
x_8 = [11317373 22667002]
x_9 = [ 56619121 113302754]
x_10 = [283160117 566449258]
x_11 = [1415929609 2832117266]
x_12 = [ 7079906093 14160328282]
x_13 = [35400046561 70801125314]
x_14 = [177001264997 354004594378]
x_15 = [ 885008389369 1770020907506]

The numbers are indeed getting bigger and bigger, but a relationship is emerging between the components of $A^k \vec x$. Let's plot the coordinates of the vector $A^k \vec x$ for $k = 1, 2, \ldots, 15$.

(Plot: log-scaled scatter of the points $A^k \vec x$, which all fall along a line in the direction of $\begin{bmatrix} 1 \\ 2 \end{bmatrix}$.)

I had to log-scale the axes, since the components of $A^k \vec x$ are growing exponentially. But, you should notice that the vectors $A^k \vec x$ all lie on the same line: the line described by the eigenvector for the dominant eigenvalue, $\vec v_2 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$! Looking at the coordinates printed above, the $y$-coordinate tends to be roughly double the $x$-coordinate.

So, when the dominant eigenvalue (that is, the eigenvalue with the largest magnitude) is greater than 1, the vector $A^k \vec x$ doesn't converge, but its direction converges to the direction of the eigenvector corresponding to the dominant eigenvalue. Through the lens of our most recent equation for $A^k \vec x$,

$$A^k \vec x = c_1 (2)^k \begin{bmatrix} 1 \\ -1 \end{bmatrix} + c_2 (5)^k \begin{bmatrix} 1 \\ 2 \end{bmatrix}$$

notice that if we divide both sides by $5^k$, we see

$$\frac{A^k \vec x}{5^k} = c_1 \left(\frac{2}{5}\right)^k \begin{bmatrix} 1 \\ -1 \end{bmatrix} + c_2 \begin{bmatrix} 1 \\ 2 \end{bmatrix}$$

And as $k$ increases, the term $c_1 \left(\frac{2}{5}\right)^k$ approaches 0. So, the direction of $\frac{A^k \vec x}{5^k}$ approaches the direction of $\begin{bmatrix} 1 \\ 2 \end{bmatrix}$, and since $A^k \vec x$ and $\frac{A^k \vec x}{5^k}$ point in the same direction (they differ only in magnitude), the direction of $A^k \vec x$ approaches it too.

You can view the process of computing $A^k \vec x$ for larger and larger $k$, until the direction stabilizes, as a technique for approximating the eigenvector corresponding to the dominant eigenvalue. This is called the power method.
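To make this concrete, here's a minimal sketch of the power method. The implementation details, like normalizing after each multiplication and using the Rayleigh quotient to estimate the eigenvalue, are my own choices, not from the notes; normalizing each step keeps the entries from blowing up while preserving the direction.

```python
import numpy as np

def power_method(A, x0, num_iters=50):
    """Approximate the dominant eigenvalue/eigenvector of A by repeated multiplication."""
    x = x0.astype(float)
    for _ in range(num_iters):
        x = A @ x
        x = x / np.linalg.norm(x)  # rescale so only the direction survives
    # With ||x|| = 1, the Rayleigh quotient x . (A x) estimates the dominant eigenvalue.
    eigenvalue = x @ (A @ x)
    return eigenvalue, x

A = np.array([[3.0, 1.0],
              [2.0, 4.0]])
lam, v = power_method(A, np.array([-13.0, 100.0]))
print(lam)       # approximately 5
print(v / v[0])  # approximately [1, 2], the direction of v2
```

Starting from the same $\begin{bmatrix} -13 \\ 100 \end{bmatrix}$ as above, this recovers the dominant eigenvalue 5 and the direction of $\vec v_2 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$.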

Here's the cliffhanger: our analysis of the behavior of $A^k \vec x$ as $k$ increases has depended on the eigenvectors of $A$ (that is, $\vec v_1$ and $\vec v_2$) being linearly independent. What if the eigenvectors of a particular matrix aren't linearly independent? Time for Chapter 5.2 to shine.