This section was originally at the end of Chapter 5.1, but I’ve moved it here to avoid overcrowding Chapter 5.1. Next semester, I plan on renumbering the pages of the notes so that each one is a little shorter and more digestible.
To extend what we’ve seen in Chapter 5.1, I’ll introduce a simplified version of the PageRank problem. This will allow me to highlight a key property of eigenvalues and eigenvectors: that they can be used to understand the long-term behavior of a system.
The Game
That sounds abstract – what is a system, and what is long-term behavior? Let’s illustrate with an example. “The Game” refers to the annual football game between Michigan and Ohio State. Let’s suppose that:
If Michigan wins in year $t$, there's an 80% chance we win again in year $t+1$, and a 20% chance that Ohio State wins in year $t+1$.
If Ohio State wins in year $t$, there's a 70% chance they win again in year $t+1$, and a 30% chance that Michigan wins in year $t+1$.
Question: In the long run, what percentage of the time does Michigan win? Perhaps a more approachable way of phrasing this is: what is the (unconditional) probability that Michigan wins a particular game?
This system is an example of a Markov chain, in which the probabilities of transitioning between states don’t change over time. Here, a “state” is the outcome of the game in a particular year. The probabilities described above are fixed: the chance Michigan wins in year 100 given they won in year 99 is 80%, and the chance they win in year 55555 given they won in year 55554 is also 80%.
It’s often useful to visualize Markov chains using a state diagram, which is a directed graph in which each node represents a state, and each edge represents a transition between states. Each edge is labeled with the probability of the transition.
Notice that the sum of the probabilities of the outgoing edges from each node (state) is 1. Intuitively, this says we must account for all possible transitions from a given state.
This Markov chain can be described by an adjacency matrix, $A$:

$$A = \begin{bmatrix} 0.8 & 0.3 \\ 0.2 & 0.7 \end{bmatrix}$$
Note that in an adjacency matrix, for our purposes:
Each column describes the movement from a given node; the sums of the columns in an adjacency matrix are equal to 1.
Each row describes the movement into a given node; the sums of the rows in an adjacency matrix don’t necessarily add to 1.
That is, $a_{ij}$ is the probability of transitioning from node $j$ to node $i$.
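As a quick check of these two properties in numpy (a small sketch, using the $A$ above):

import numpy as np

A = np.array([[0.8, 0.3],
              [0.2, 0.7]])

print(A.sum(axis=0))   # [1. 1.]  -- each column sums to 1
print(A.sum(axis=1))   # [1.1 0.9] -- rows need not sum to 1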
Recall, our goal is to determine the long-run fraction of games that Michigan wins. How does the adjacency matrix help us with this?
Simulating a Chain
Let’s use it to perform a simulation of this system, seeding the simulation in the year $t = 0$, in which Michigan wins the first game, i.e.

$$\vec{x}_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$
where $\vec{x}_t$ is a vector with two components, the first of which is the probability that Michigan wins in year $t$, and the second of which is the probability that Ohio State wins in year $t$.
To find the distribution of wins in year 1, we can multiply $\vec{x}_0$ by $A$:

$$\vec{x}_1 = A \vec{x}_0 = \begin{bmatrix} 0.8 & 0.3 \\ 0.2 & 0.7 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0.8 \\ 0.2 \end{bmatrix}$$
So, in year 1, we’d expect Michigan to win 80% of the time, and Ohio State to win 20% of the time. What about in year 2?

$$\vec{x}_2 = A \vec{x}_1 = A^2 \vec{x}_0 = \begin{bmatrix} 0.7 \\ 0.3 \end{bmatrix}$$
Or year 3?

$$\vec{x}_3 = A \vec{x}_2 = A^3 \vec{x}_0 = \begin{bmatrix} 0.65 \\ 0.35 \end{bmatrix}$$
In general, the distribution of wins in year $t$ is given by

$$\vec{x}_t = A^t \vec{x}_0$$
If you look at $\vec{x}_1$, $\vec{x}_2$, and $\vec{x}_3$ above, they don’t seem to be exploding to some really large value. Despite repeatedly multiplying by $A$, the values seem to be stabilizing. Are these vectors converging? And, if so, to what?
Python can help verify our intuition.
import numpy as np

def simulate_steps(A, x0, num_steps=15):
    x = x0
    for k in range(1, num_steps + 1):
        # np.linalg.matrix_power(A, k) is the same as A @ A @ ... @ A (k times).
        # A ** k raises each element of A to the kth power, which is not what we want.
        x_k = np.linalg.matrix_power(A, k) @ x
        print(f'x_{k} = {x_k.flatten()}')

A = np.array([[0.8, 0.3],
              [0.2, 0.7]])
x0 = np.array([[1], [0]])

simulate_steps(A, x0)

x_1 = [0.8 0.2]
x_2 = [0.7 0.3]
x_3 = [0.65 0.35]
x_4 = [0.625 0.375]
x_5 = [0.6125 0.3875]
x_6 = [0.60625 0.39375]
x_7 = [0.603125 0.396875]
x_8 = [0.6015625 0.3984375]
x_9 = [0.60078125 0.39921875]
x_10 = [0.60039063 0.39960938]
x_11 = [0.60019531 0.39980469]
x_12 = [0.60009766 0.39990234]
x_13 = [0.60004883 0.39995117]
x_14 = [0.60002441 0.39997559]
x_15 = [0.60001221 0.39998779]
It seems that the sequence of $\vec{x}_t$’s is converging to $\begin{bmatrix} 0.6 \\ 0.4 \end{bmatrix}$, which would imply that in the long run, Michigan wins 60% of the time.
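As a quick numeric check, reusing the A and x0 defined above, jumping far ahead in time lands (to floating-point precision) right on this vector:

print(np.linalg.matrix_power(A, 100) @ x0)

# [[0.6]
#  [0.4]]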
An Eigenvalue Problem in Disguise
Could we have found the vector $\begin{bmatrix} 0.6 \\ 0.4 \end{bmatrix}$ without simulating the system? You bet: it’s the eigenvector of $A$ corresponding to the eigenvalue 1! Intuitively, since I’m searching for a long-run distribution, I’m really looking for a vector $\vec{v}$ such that

$$A \vec{v} = \vec{v}$$
meaning that advancing one step in time won’t change it. This is precisely the definition of an eigenvector with eigenvalue $\lambda = 1$. Eventually, we will prove that all adjacency matrices have an eigenvalue of 1, and that the corresponding eigenvector is the long-run distribution we’re searching for.
But for now, let’s verify that 1 is indeed an eigenvalue of this specific $A$ by computing $A$’s eigenvalues and eigenvectors ourselves.
You might be able to spot the eigenvalues from the trace and determinant facts (they need to add to 1.5 and multiply to 0.5). If not, there’s nothing wrong with explicitly writing out the characteristic polynomial:

$$\det(A - \lambda I) = (0.8 - \lambda)(0.7 - \lambda) - (0.3)(0.2) = \lambda^2 - 1.5\lambda + 0.5 = (\lambda - 1)(\lambda - 0.5)$$
This tells us that $A$’s eigenvalues are $\lambda_1 = 1$ and $\lambda_2 = 0.5$.
For $\lambda_1 = 1$, we can find an eigenvector by solving $(A - I)\vec{v} = \vec{0}$:

$$\begin{bmatrix} -0.2 & 0.3 \\ 0.2 & -0.3 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$
Both components of the equation above give the relationship

$$0.2 v_1 = 0.3 v_2$$
So, one eigenvector is $\vec{v} = \begin{bmatrix} 3 \\ 2 \end{bmatrix}$. But, since we’d like to interpret the components of $\vec{v}$ as a probability distribution, we should scale it so that its components sum to 1. Since $3 + 2 = 5$, this gives

$$\vec{v}_1 = \begin{bmatrix} 3/5 \\ 2/5 \end{bmatrix} = \begin{bmatrix} 0.6 \\ 0.4 \end{bmatrix}$$

which is precisely what our simulated distributions converged to!
For $\lambda_2 = 0.5$, notice that

$$(A - 0.5I)\begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 0.3 & 0.3 \\ 0.2 & 0.2 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$
So, $\vec{v}_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$ is an eigenvector of $A$ corresponding to the eigenvalue $\lambda_2 = 0.5$. There’s no way to scale $\vec{v}_2$ so that its components are all positive and sum to 1, but there’s no need to, since we’re only interested in interpreting the eigenvector for $\lambda_1 = 1$ as a probability distribution.
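As a sanity check, we can ask numpy for $A$’s eigenvalues and eigenvectors directly. A small sketch (np.linalg.eig returns eigenvectors scaled to unit length, so I rescale the eigenvector for $\lambda = 1$ so that its components sum to 1; the variable names here are mine):

import numpy as np

A = np.array([[0.8, 0.3],
              [0.2, 0.7]])

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)    # [1.  0.5] (possibly in a different order)

# Columns of `eigenvectors` are unit-length eigenvectors. Grab the one
# for the largest eigenvalue and rescale it so its components sum to 1.
v1 = eigenvectors[:, np.argmax(eigenvalues)]
print(v1 / v1.sum())  # [0.6 0.4]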
I want to revisit our Python simulation from above. Using the initial vector $\vec{x}_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, the state vectors converged to $\begin{bmatrix} 0.6 \\ 0.4 \end{bmatrix}$.
simulate_steps(A, x0=np.array([[1], [0]]))

x_1 = [0.8 0.2]
x_2 = [0.7 0.3]
x_3 = [0.65 0.35]
x_4 = [0.625 0.375]
x_5 = [0.6125 0.3875]
x_6 = [0.60625 0.39375]
x_7 = [0.603125 0.396875]
x_8 = [0.6015625 0.3984375]
x_9 = [0.60078125 0.39921875]
x_10 = [0.60039063 0.39960938]
x_11 = [0.60019531 0.39980469]
x_12 = [0.60009766 0.39990234]
x_13 = [0.60004883 0.39995117]
x_14 = [0.60002441 0.39997559]
x_15 = [0.60001221 0.39998779]
But, there was nothing special about our choice of $\vec{x}_0$. Had we started with, say, $\begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix}$ or even $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$, the state vectors would have still converged to the eigenvector $\begin{bmatrix} 0.6 \\ 0.4 \end{bmatrix}$.
simulate_steps(A, x0=np.array([[0.5], [0.5]]))

x_1 = [0.55 0.45]
x_2 = [0.575 0.425]
x_3 = [0.5875 0.4125]
x_4 = [0.59375 0.40625]
x_5 = [0.596875 0.403125]
x_6 = [0.5984375 0.4015625]
x_7 = [0.59921875 0.40078125]
x_8 = [0.59960938 0.40039063]
x_9 = [0.59980469 0.40019531]
x_10 = [0.59990234 0.40009766]
x_11 = [0.59995117 0.40004883]
x_12 = [0.59997559 0.40002441]
x_13 = [0.59998779 0.40001221]
x_14 = [0.5999939 0.4000061]
x_15 = [0.59999695 0.40000305]
simulate_steps(A, x0=np.array([[0], [1]]))

x_1 = [0.3 0.7]
x_2 = [0.45 0.55]
x_3 = [0.525 0.475]
x_4 = [0.5625 0.4375]
x_5 = [0.58125 0.41875]
x_6 = [0.590625 0.409375]
x_7 = [0.5953125 0.4046875]
x_8 = [0.59765625 0.40234375]
x_9 = [0.59882813 0.40117188]
x_10 = [0.59941406 0.40058594]
x_11 = [0.59970703 0.40029297]
x_12 = [0.59985352 0.40014648]
x_13 = [0.59992676 0.40007324]
x_14 = [0.59996338 0.40003662]
x_15 = [0.59998169 0.40001831]
What if we start with a $\vec{x}_0$ that isn’t a probability distribution, like $\begin{bmatrix} 50 \\ -15 \end{bmatrix}$?
simulate_steps(A, x0=np.array([[50], [-15]]))

x_1 = [35.5 -0.5]
x_2 = [28.25 6.75]
x_3 = [24.625 10.375]
x_4 = [22.8125 12.1875]
x_5 = [21.90625 13.09375]
x_6 = [21.453125 13.546875]
x_7 = [21.2265625 13.7734375]
x_8 = [21.11328125 13.88671875]
x_9 = [21.05664063 13.94335938]
x_10 = [21.02832031 13.97167969]
x_11 = [21.01416016 13.98583984]
x_12 = [21.00708008 13.99291992]
x_13 = [21.00354004 13.99645996]
x_14 = [21.00177002 13.99822998]
x_15 = [21.00088501 13.99911499]
The state vector is converging on $\begin{bmatrix} 21 \\ 14 \end{bmatrix}$, which is still on the line described by $\begin{bmatrix} 0.6 \\ 0.4 \end{bmatrix}$, since $\begin{bmatrix} 21 \\ 14 \end{bmatrix} = 35 \begin{bmatrix} 0.6 \\ 0.4 \end{bmatrix}$. So, it seems that no matter where we start, the state vectors will converge to the eigenvector $\begin{bmatrix} 0.6 \\ 0.4 \end{bmatrix}$ or some scalar multiple of it.
The Dominant Eigenvalue
Why is it the case that $\vec{x}_t$ converges to an eigenvector for $\lambda_1 = 1$, and not to an eigenvector for $\lambda_2 = 0.5$, the other eigenvalue of $A$?
The answer is that the dominant eigenvalue, i.e. the one with the largest magnitude, is the one that determines the long-run behavior of the system. Let’s see why this is the case.
First, recall that $A$’s eigenvalues are $\lambda_1 = 1$ and $\lambda_2 = 0.5$, and corresponding eigenvectors are $\vec{v}_1 = \begin{bmatrix} 0.6 \\ 0.4 \end{bmatrix}$ and $\vec{v}_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$. Note that these two eigenvectors are linearly independent, so any other vector $\vec{x}_0$ can be written as a linear combination of $\vec{v}_1$ and $\vec{v}_2$. The fact that these eigenvectors are linearly independent is key to the argument below.
So, suppose we start with some vector $\vec{x}_0 = c_1 \vec{v}_1 + c_2 \vec{v}_2$. Then, what we’re really interested in is the behavior of the vectors $A\vec{x}_0$, $A^2\vec{x}_0$, $A^3\vec{x}_0$, and so on. What is $A\vec{x}_0$?

$$A\vec{x}_0 = A(c_1 \vec{v}_1 + c_2 \vec{v}_2) = c_1 A\vec{v}_1 + c_2 A\vec{v}_2 = c_1 \lambda_1 \vec{v}_1 + c_2 \lambda_2 \vec{v}_2$$
Above, I used the fact that $A\vec{v}_1 = \lambda_1 \vec{v}_1$ and $A\vec{v}_2 = \lambda_2 \vec{v}_2$ by the definition of eigenvectors.
Recall from the original Chapter 5.1 that if $\lambda$ is an eigenvalue of $A$ with eigenvector $\vec{v}$, then $\lambda^2$ is an eigenvalue of $A^2$ with eigenvector $\vec{v}$ also. So,

$$A^2\vec{x}_0 = c_1 \lambda_1^2 \vec{v}_1 + c_2 \lambda_2^2 \vec{v}_2$$
and more generally, if $t$ is a positive integer,

$$A^t\vec{x}_0 = c_1 \lambda_1^t \vec{v}_1 + c_2 \lambda_2^t \vec{v}_2$$
So, $\vec{x}_t = A^t\vec{x}_0$ is a linear combination of $\vec{v}_1$ and $\vec{v}_2$ also, with weights $c_1 \lambda_1^t$ and $c_2 \lambda_2^t$. Here, $\lambda_1 = 1$ and $\lambda_2 = 0.5$, so $\lambda_1^t = 1$ and $\lambda_2^t = 0.5^t$.
But, as $t$ increases, $0.5^t$ approaches 0, while $1^t$ remains 1. So,

$$\vec{x}_t = c_1 \vec{v}_1 + c_2 (0.5)^t \vec{v}_2 \longrightarrow c_1 \vec{v}_1$$
This explains why the state vectors converge to (some scalar multiple of) $\vec{v}_1$, no matter where we start!
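We can check this decomposition numerically. Below is a sketch (the variable names are mine) that finds the weights $c_1, c_2$ for the earlier starting vector $\begin{bmatrix} 50 \\ -15 \end{bmatrix}$ by solving a small linear system, then confirms that $c_1 \vec{v}_1 = \begin{bmatrix} 21 \\ 14 \end{bmatrix}$, the vector our simulation converged to:

import numpy as np

v1 = np.array([0.6, 0.4])    # eigenvector for lambda_1 = 1
v2 = np.array([1.0, -1.0])   # eigenvector for lambda_2 = 0.5
x0 = np.array([50.0, -15.0])

# Find c1, c2 such that c1 * v1 + c2 * v2 = x0.
c1, c2 = np.linalg.solve(np.column_stack([v1, v2]), x0)
print(c1, c2)    # 35.0 29.0 (up to rounding)

# As t grows, x_t = c1 * v1 + c2 * 0.5^t * v2 approaches c1 * v1.
print(c1 * v1)   # [21. 14.]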
Why do Adjacency Matrices Have an Eigenvalue of 1?
Let’s prove that all adjacency matrices have an eigenvalue of 1. Recall that an adjacency matrix is defined such that:
Each column describes the movement from a given node; the sums of the columns in an adjacency matrix are equal to 1.
Each row describes the movement into a given node; the sums of the rows in an adjacency matrix don’t necessarily add to 1.
Here’s the key fact: for any square matrix $A$, the eigenvalues of $A$ are the same as the eigenvalues of $A^T$. The eigenvectors themselves are usually different, but the eigenvalues are the same. This is because $A$ and $A^T$ both have the same characteristic polynomial. Remember from Chapter 2.9 that $\det(A) = \det(A^T)$. So, $A^T$’s characteristic polynomial is

$$\det(A^T - \lambda I) = \det\left((A - \lambda I)^T\right)$$
but this is the same as $\det(A - \lambda I)$, which is the characteristic polynomial of $A$. So, $A$ and $A^T$ have the same characteristic polynomial, and thus the same eigenvalues.
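A quick numeric check of this fact, reusing our adjacency matrix A from before (np.linalg.eigvals computes just the eigenvalues; I sort both results since the two calls may order them differently):

print(np.sort(np.linalg.eigvals(A)))      # [0.5 1. ]
print(np.sort(np.linalg.eigvals(A.T)))    # [0.5 1. ] -- same eigenvalues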
Back to the original question: why does an adjacency matrix have an eigenvalue of 1? Remember, the columns of an adjacency matrix, which describe the movement from a given node, sum to 1. This means each row of $A^T$ sums to 1, so if $A$ is an adjacency matrix, then

$$A^T \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix}$$
which shows that $A^T$ has an eigenvector (the all-ones vector) with an eigenvalue of 1.
For instance, if $A = \begin{bmatrix} 0.8 & 0.3 \\ 0.2 & 0.7 \end{bmatrix}$, then $A^T = \begin{bmatrix} 0.8 & 0.2 \\ 0.3 & 0.7 \end{bmatrix}$, and $A^T \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0.8 + 0.2 \\ 0.3 + 0.7 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$, which shows that $\begin{bmatrix} 1 \\ 1 \end{bmatrix}$ is an eigenvector of $A^T$ with eigenvalue 1.
But, since $A$ and $A^T$ have the same eigenvalues, $A$ must have an eigenvalue of 1.
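In code, reusing the same A, this is a one-line check that the all-ones vector is unchanged by $A^T$:

print(A.T @ np.ones(2))    # [1. 1.]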
The follow-up question is why it’s guaranteed that 1 is the largest eigenvalue of an adjacency matrix, since this seems to be crucial in guaranteeing that the state vectors converge to the eigenvector corresponding to the eigenvalue 1. Proving this fact is beyond the scope of our course, but it follows from the Perron-Frobenius theorem, if you’d like to read more.
Example: Non-Adjacency Matrices
You might wonder: if $A$’s largest eigenvalue is greater than 1, what happens to the vector $A^t \vec{x}_0$ as $t$ increases? (I won’t call $A^t \vec{x}_0$ a state vector, since it’s not a probability distribution.)
Let’s consider $A = \begin{bmatrix} 3 & 1 \\ 2 & 4 \end{bmatrix}$, which was used as an example in Chapter 5.1. $A$ has eigenvalues $\lambda_1 = 5$ and $\lambda_2 = 2$, and corresponding eigenvectors $\vec{v}_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$ and $\vec{v}_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$. These eigenvectors are also linearly independent, so any $\vec{x}_0$ can be written as a linear combination of $\vec{v}_1$ and $\vec{v}_2$. Then, following the logic from the previous example,

$$A^t \vec{x}_0 = c_1 5^t \begin{bmatrix} 1 \\ 2 \end{bmatrix} + c_2 2^t \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$
The issue is that both $5^t$ and $2^t$ grow without bound as $t$ increases. So, $A^t \vec{x}_0$ will not converge. But, the direction of $A^t \vec{x}_0$ is still predictable. Let me start with $\vec{x}_0 = \begin{bmatrix} -13 \\ 100 \end{bmatrix}$ (nothing special about these values) and evaluate $A^t \vec{x}_0$ for $t = 1, 2, \dots, 15$.
simulate_steps(
    A = np.array([[3, 1],
                  [2, 4]]),
    x0 = np.array([[-13], [100]])
)

x_1 = [ 61 374]
x_2 = [ 557 1618]
x_3 = [3289 7586]
x_4 = [17453 36922]
x_5 = [ 89281 182594]
x_6 = [450437 908938]
x_7 = [2260249 4536626]
x_8 = [11317373 22667002]
x_9 = [ 56619121 113302754]
x_10 = [283160117 566449258]
x_11 = [1415929609 2832117266]
x_12 = [ 7079906093 14160328282]
x_13 = [35400046561 70801125314]
x_14 = [177001264997 354004594378]
x_15 = [ 885008389369 1770020907506]
The numbers are indeed getting bigger and bigger, but a relationship is emerging between the components of $A^t \vec{x}_0$. Let’s plot the coordinates of the vector $A^t \vec{x}_0$ for $t = 1, 2, \dots, 15$.
I had to log-scale the axes, since the components of $A^t \vec{x}_0$ are growing exponentially. But, you should notice that the vectors all lie on the same line – the line described by $\begin{bmatrix} 1 \\ 2 \end{bmatrix}$, the eigenvector for the dominant eigenvalue, $\lambda_1 = 5$! Hover over any point above to see the coordinates of the vector for a particular $t$, and you’ll see the $y$-coordinate tends to be roughly double the $x$-coordinate.
So, when the dominant eigenvalue (that is, the eigenvalue with the largest magnitude) is greater than 1, the vector $A^t \vec{x}_0$ doesn’t converge, but its direction converges to the direction of the eigenvector corresponding to the dominant eigenvalue. Through the lens of our most recent equation for $A^t \vec{x}_0$,

$$A^t \vec{x}_0 = c_1 5^t \begin{bmatrix} 1 \\ 2 \end{bmatrix} + c_2 2^t \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$
Notice that if we divide both sides by $5^t$, we see

$$\frac{1}{5^t} A^t \vec{x}_0 = c_1 \begin{bmatrix} 1 \\ 2 \end{bmatrix} + c_2 \left(\frac{2}{5}\right)^t \begin{bmatrix} 1 \\ -1 \end{bmatrix}$$
And as $t$ increases, the $\left(\frac{2}{5}\right)^t$ term approaches 0. So, the direction of $\frac{1}{5^t} A^t \vec{x}_0$ approaches the direction of $c_1 \begin{bmatrix} 1 \\ 2 \end{bmatrix}$, but $\frac{1}{5^t} A^t \vec{x}_0$ and $A^t \vec{x}_0$ have the same direction (but different magnitudes).
You can view the process of computing $A^t \vec{x}_0$ for larger and larger $t$ until the direction stabilizes as a technique for approximating the eigenvector corresponding to the dominant eigenvalue. This is called the power method.
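To make this concrete, here’s a minimal sketch of the power method (the function name, the iteration count, and the choice to normalize by length are mine; any consistent rescaling works):

import numpy as np

def power_method(A, x0, num_steps=50):
    # Repeatedly apply A, rescaling to unit length each step so the
    # entries don't blow up; the direction converges to the dominant eigenvector.
    x = x0 / np.linalg.norm(x0)
    for _ in range(num_steps):
        x = A @ x
        x = x / np.linalg.norm(x)
    return x

A = np.array([[3.0, 1.0],
              [2.0, 4.0]])
v = power_method(A, np.array([-13.0, 100.0]))

print(v)            # approximately [0.447, 0.894], i.e. [1, 2] / sqrt(5)
print((A @ v) / v)  # both ratios approximately 5, the dominant eigenvalue

The rescaling step is why this works in practice: it keeps the magnitudes bounded while leaving the direction, which is all we care about, unchanged.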
Here’s the cliffhanger: our analysis of the behavior of $A^t \vec{x}_0$ as $t$ increases has depended on the eigenvectors of $A$ (that is, $\vec{v}_1$ and $\vec{v}_2$) being linearly independent. What if the eigenvectors of a particular matrix aren’t linearly independent? Time for Chapter 5.2 to shine.