10.5. Conclusion - EECS 245 Course Notes

In Chapter 10.4, I introduced principal components analysis (PCA), which is the process of creating new features (called principal components) that are linear combinations of the original features. These new features capture as much of the variation in the data as possible, and are uncorrelated with each other.

The main use case of PCA so far has been for dimensionality reduction – taking a high-dimensional dataset and representing each point as a vector in a lower-dimensional space. The main example in Chapter 10.4 was the penguin dataset, where each penguin was originally described by four features (bill length, bill depth, flipper length, and body mass), but PCA allowed us to describe each penguin using just two features.
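To make the two properties of principal components concrete, here is a minimal sketch on a small synthetic dataset (not the penguin data). The data, the variable names, and the use of `np.linalg.svd` on the centered matrix are assumptions for illustration. It checks that the new features are uncorrelated and that their variances come out in decreasing order.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 points, 4 correlated features (hypothetical, for illustration only).
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 4))
X_demo = latent @ mixing + 0.1 * rng.normal(size=(200, 4))

# Center each column, then take the SVD of the centered matrix.
X_centered = X_demo - X_demo.mean(axis=0)
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Principal components: project the centered data onto the columns of V.
pcs = X_centered @ Vt.T

# Dimensionality reduction from 4 features to 2: keep the first two principal components.
pcs_2 = pcs[:, :2]

# The covariance matrix of the principal components is diagonal (uncorrelated features),
# and the variances on its diagonal are sorted from largest to smallest.
cov = np.cov(pcs, rowvar=False)
off_diagonal = cov - np.diag(np.diag(cov))
print(np.allclose(off_diagonal, 0))
print(np.diag(cov))
```

Each row of `pcs_2` plays the same role as the two-number description of a penguin: a shorter vector that keeps most of the variation in the original four features.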

Handwritten Digits

At the very start of the course, in Chapter 1.1, I introduced you to the MNIST dataset (MNIST stands for Modified National Institute of Standards and Technology). To wrap up the course, I’d like to revisit this dataset.

The dataset contains 70,000 labeled grayscale images of handwritten digits, each of which is 28 pixels by 28 pixels. The dataset exists to train machine learning models to recognize handwritten digits. This is a classification task, not a regression task like the ones we spent the majority of the course studying.

A few of the images in the MNIST dataset.

Each image is a $28 \times 28$ grid of pixels, and each of these pixels is an integer between 0 and 255, representing the pixel’s intensity. In the example below, hover over any pixel to see its intensity value.

# This chunk must be in the first plotting cell of each notebook in order to guarantee that the mathjax script is loaded.

import base64
import html
import json
import uuid
from io import BytesIO

import numpy as np
import pandas as pd
import plotly
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from IPython.display import HTML, display
from PIL import Image
from plotly.utils import PlotlyJSONEncoder
from sklearn.datasets import fetch_openml
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

plotly.offline.init_notebook_mode()
display(HTML(
    """
    <script type="text/javascript" async src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=TeX-MML-AM_SVG"></script>
    <style>
      .js-plotly-plot .plotly, .js-plotly-plot .plotly .main-svg {
        font-family: 'Palatino Linotype', Palatino, serif;
      }
      mjx-container[jax="SVG"] {
        font-family: 'Palatino Linotype', Palatino, serif !important;
      }
    </style>
    """
))

DIGIT_COLORS = dict(zip(range(10), px.colors.qualitative.Bold))


def style_figure(fig, width=640, height=500):
    fig.update_layout(
        width=width,
        height=height,
        template='plotly_white',
        plot_bgcolor='white',
        paper_bgcolor='white',
        font=dict(family='Palatino Linotype, Palatino, serif', size=16, color='black'),
        margin=dict(l=60, r=24, t=70, b=70),
        legend_title_text='Digit',
    )
    fig.update_xaxes(showline=True, linecolor='black', linewidth=1, gridcolor='#f0f0f0')
    fig.update_yaxes(showline=True, linecolor='black', linewidth=1, gridcolor='#f0f0f0')
    return fig


def show_digit_examples(X, y):
    fig = make_subplots(
        rows=2,
        cols=5,
        subplot_titles=[f'Digit {digit}' for digit in range(10)],
        horizontal_spacing=0.04,
        vertical_spacing=0.18,
    )

    for pos, digit in enumerate(range(10)):
        idx = np.where(y == digit)[0][0]
        image = 255 - X[idx].reshape(28, 28)
        row = pos // 5 + 1
        col = pos % 5 + 1
        fig.add_trace(
            go.Heatmap(
                z=image,
                colorscale='gray',
                zmin=0,
                zmax=255,
                showscale=False,
                hoverinfo='skip',
            ),
            row=row,
            col=col,
        )
        fig.update_xaxes(visible=False, row=row, col=col)
        fig.update_yaxes(visible=False, autorange='reversed', scaleanchor=f'x{pos + 1 if pos > 0 else ""}', scaleratio=1, row=row, col=col)

    fig.update_layout(
        width=720,
        height=300,
        margin=dict(l=10, r=10, t=30, b=10),
        paper_bgcolor='white',
        plot_bgcolor='white',
        font=dict(family='Palatino Linotype, Palatino, serif', size=15, color='black'),
    )
    fig.show(renderer='png', scale=3)


def show_digit_and_flattening(flat_digit, label):
    image = 255 - flat_digit.reshape(28, 28)
    raw_image = flat_digit.reshape(28, 28)

    fig = go.Figure(
        data=go.Heatmap(
            z=image,
            customdata=raw_image,
            colorscale='gray',
            zmin=0,
            zmax=255,
            showscale=False,
            hovertemplate='row = %{y}<br>col = %{x}<br>intensity = %{customdata}<extra></extra>',
        )
    )

    fig.update_layout(
        title=f'Original Image (label = {label})',
        width=440,
        height=440,
        margin=dict(l=70, r=20, t=60, b=55),
        paper_bgcolor='white',
        plot_bgcolor='white',
        font=dict(family='Palatino Linotype, Palatino, serif', size=16, color='black'),
    )
    fig.update_xaxes(
        title='Column index',
        tickmode='array',
        tickvals=list(range(28)),
        ticktext=[str(i) for i in range(28)],
        tickfont=dict(size=8),
        showgrid=False,
        range=[-0.5, 27.5],
        ticks='outside',
        side='bottom',
        constrain='domain',
        scaleanchor='y',
        scaleratio=1,
        automargin=False,
    )
    fig.update_yaxes(
        title='Row index',
        tickmode='array',
        tickvals=list(range(28)),
        ticktext=[str(i) for i in range(28)],
        tickfont=dict(size=8),
        showgrid=False,
        range=[27.5, -0.5],
        ticks='outside',
        side='left',
        constrain='domain',
        automargin=False,
    )
    return fig


def plot_pca_scatter(Z, y=None, n=5000, random_state=245, color=False, title=''):
    rng = np.random.default_rng(random_state)
    idx = rng.choice(len(Z), size=n, replace=False)

    if color:
        plot_df = pd.DataFrame({
            'PCA Invented Feature 1': Z[idx, 0],
            'PCA Invented Feature 2': Z[idx, 1],
            'Digit': y[idx].astype(str),
        })
        fig = px.scatter(
            plot_df,
            x='PCA Invented Feature 1',
            y='PCA Invented Feature 2',
            color='Digit',
            render_mode='webgl',
            color_discrete_map={str(d): color for d, color in DIGIT_COLORS.items()},
            category_orders={'Digit': [str(d) for d in range(10)]},
            opacity=0.72,
            title=title,
        )
        fig.update_traces(
            marker=dict(size=5, line=dict(width=0)),
            hovertemplate='Digit %{fullData.name}<br>PCA Invented Feature 1 = %{x:.2f}<br>PCA Invented Feature 2 = %{y:.2f}<extra></extra>',
        )
    else:
        fig = go.Figure()
        fig.add_trace(go.Scattergl(
            x=Z[idx, 0],
            y=Z[idx, 1],
            mode='markers',
            marker=dict(color='#3D81F6', size=5, opacity=0.33),
            hovertemplate='PCA Invented Feature 1 = %{x:.2f}<br>PCA Invented Feature 2 = %{y:.2f}<extra></extra>',
            showlegend=False,
        ))
        fig.update_layout(title=title)

    fig.update_layout(xaxis_title='PCA Invented Feature 1', yaxis_title='PCA Invented Feature 2')
    if color:
        fig.update_layout(
            legend=dict(
                orientation='h',
                yanchor='top',
                y=-0.20,
                xanchor='center',
                x=0.5,
                title_text='Digit',
            ),
            margin=dict(l=60, r=24, t=70, b=110),
        )
    return style_figure(fig, width=610, height=470)


def digit_to_data_uri(flat_digit, scale=5):
    image = Image.fromarray((255 - flat_digit.reshape(28, 28)).astype(np.uint8), mode='L')
    image = image.resize((28 * scale, 28 * scale), Image.Resampling.NEAREST)
    buffer = BytesIO()
    image.save(buffer, format='PNG')
    return 'data:image/png;base64,' + base64.b64encode(buffer.getvalue()).decode('ascii')


def display_hoverable_pca_sample(Z, y, X, n=200, random_state=245):
    rng = np.random.default_rng(random_state)
    idx = rng.choice(len(Z), size=n, replace=False)

    sample_df = pd.DataFrame({
        'Principal Component 1': Z[idx, 0],
        'Principal Component 2': Z[idx, 1],
        'Digit': y[idx].astype(str),
    })
    sample_df['preview'] = [digit_to_data_uri(row) for row in X[idx]]

    fig = px.scatter(
        sample_df,
        x='Principal Component 1',
        y='Principal Component 2',
        color='Digit',
        render_mode='webgl',
        color_discrete_map={str(d): color for d, color in DIGIT_COLORS.items()},
        category_orders={'Digit': [str(d) for d in range(10)]},
        custom_data=['preview', 'Digit'],
        title='Hover to See the Full Original 28 x 28 Image',
    )
    fig.update_traces(
        marker=dict(size=8, line=dict(width=0.5, color='white')),
        hoverinfo='none',
        hovertemplate=None,
        showlegend=False,
    )
    style_figure(fig, width=610, height=470)
    fig.update_layout(margin=dict(l=50, r=20, t=70, b=55), showlegend=False)

    div_id = f'mnist-hover-{uuid.uuid4().hex}'
    fig_html = fig.to_html(
        include_plotlyjs='cdn',
        full_html=False,
        config={'displaylogo': False, 'responsive': True},
        div_id=div_id,
    )

    iframe_html = f"""
    <html>
      <head>
        <meta charset='utf-8' />
        <style>
          body {{ margin: 0; font-family: 'Palatino Linotype', Palatino, serif; background: white; }}
          .wrap {{ position: relative; width: 100%; padding: 0.5rem 0.25rem; box-sizing: border-box; }}
          .tooltip {{
            position: absolute;
            display: none;
            width: 214px;
            padding: 10px;
            border: 1px solid rgba(0, 0, 0, 0.18);
            border-radius: 8px;
            background: rgba(255, 255, 255, 0.97);
            box-shadow: 0 8px 20px rgba(0, 0, 0, 0.16);
            pointer-events: none;
            z-index: 30;
          }}
          .tooltip-title {{ font-size: 14px; font-weight: 600; margin-bottom: 8px; }}
          .tooltip-image {{
            width: 196px;
            height: 196px;
            display: block;
            margin: 0 auto;
            border: 1px solid #cccccc;
            background: white;
            image-rendering: pixelated;
          }}
          .tooltip-label {{ font-size: 13px; margin-top: 8px; text-align: center; color: #333333; }}
        </style>
      </head>
      <body>
        <div class='wrap' id='hover-wrap'>
          {fig_html}
          <div class='tooltip' id='hover-tooltip'>
            <div class='tooltip-title'>Original 28 x 28 image</div>
            <img id='tooltip-image' class='tooltip-image' alt='MNIST preview' />
            <div id='tooltip-label' class='tooltip-label'></div>
          </div>
        </div>
        <script>
          (function() {{
            const wrap = document.getElementById('hover-wrap');
            const plotDiv = document.getElementById('{div_id}');
            const tooltip = document.getElementById('hover-tooltip');
            const image = document.getElementById('tooltip-image');
            const label = document.getElementById('tooltip-label');
            if (!wrap || !plotDiv || !tooltip) return;

            function positionTooltip(point, evt) {{
              const fullLayout = plotDiv._fullLayout;
              const wrapRect = wrap.getBoundingClientRect();
              const tooltipWidth = 236;
              const tooltipHeight = 262;
              const offsetX = 22;
              const pointX = point.xaxis.l2p(point.x) + fullLayout.margin.l;
              const pointY = point.yaxis.l2p(point.y) + fullLayout.margin.t;
              const mouseX = evt && evt.clientX ? evt.clientX - wrapRect.left : pointX;
              const mouseY = evt && evt.clientY ? evt.clientY - wrapRect.top : pointY;

              let left;
              if (mouseX > wrapRect.width / 2) {{
                left = pointX - tooltipWidth - offsetX;
              }} else {{
                left = pointX + offsetX;
              }}
              let top = mouseY - tooltipHeight / 2;

              left = Math.min(left, wrapRect.width - tooltipWidth - 8);
              left = Math.max(left, 8);
              top = Math.min(top, wrapRect.height - tooltipHeight - 8);
              top = Math.max(top, 8);

              tooltip.style.left = `${{left}}px`;
              tooltip.style.top = `${{top}}px`;
            }}

            plotDiv.on('plotly_hover', function(evt) {{
              const point = evt.points[0];
              image.src = point.customdata[0];
              label.textContent = 'Label: ' + point.customdata[1];
              tooltip.style.display = 'block';
              positionTooltip(point, evt.event);
            }});

            plotDiv.on('plotly_unhover', function() {{
              tooltip.style.display = 'none';
              image.removeAttribute('src');
              label.textContent = '';
            }});

            plotDiv.on('plotly_relayout', function() {{
              tooltip.style.display = 'none';
            }});
          }})();
        </script>
      </body>
    </html>
    """

    iframe_srcdoc = html.escape(iframe_html, quote=True)
    iframe = (
        f'<iframe srcdoc="{iframe_srcdoc}" '
        'style="width: 100%; height: 600px; border: none;"></iframe>'
    )
    display(HTML(iframe))


def make_discrete_colorscale(colors):
    scale = []
    n = len(colors)
    for i, color in enumerate(colors):
        scale.append([i / n, color])
        scale.append([(i + 1) / n, color])
    return scale


def plot_decision_regions(Z, y, clf, accuracy, n=5000, random_state=245):
    rng = np.random.default_rng(random_state)
    idx = rng.choice(len(Z), size=n, replace=False)
    Z_plot = Z[idx]
    y_plot = y[idx]

    pad = 1.0
    x_min, x_max = Z_plot[:, 0].min() - pad, Z_plot[:, 0].max() + pad
    y_min, y_max = Z_plot[:, 1].min() - pad, Z_plot[:, 1].max() + pad

    x_grid = np.linspace(x_min, x_max, 260)
    y_grid = np.linspace(y_min, y_max, 260)
    xx, yy = np.meshgrid(x_grid, y_grid)
    grid = np.c_[xx.ravel(), yy.ravel()]
    zz = clf.predict(grid).reshape(xx.shape)

    fig = go.Figure()
    fig.add_trace(go.Contour(
        x=x_grid,
        y=y_grid,
        z=zz,
        zmin=-0.5,
        zmax=9.5,
        colorscale=make_discrete_colorscale([DIGIT_COLORS[d] for d in range(10)]),
        contours=dict(start=-0.5, end=9.5, size=1, coloring='fill'),
        line=dict(width=0),
        opacity=0.22,
        showscale=False,
        hoverinfo='skip',
        showlegend=False,
    ))

    for digit in range(10):
        mask = y_plot == digit
        fig.add_trace(go.Scattergl(
            x=Z_plot[mask, 0],
            y=Z_plot[mask, 1],
            mode='markers',
            name=str(digit),
            marker=dict(color=DIGIT_COLORS[digit], size=5, opacity=0.82, line=dict(width=0)),
            hovertemplate=f'Digit {digit}<br>PC 1 = %{{x:.2f}}<br>PC 2 = %{{y:.2f}}<extra></extra>',
        ))

    fig.update_layout(
        title=f'Multinomial Logistic Regression on Two Principal Components<br><sup>Test accuracy: {accuracy:.1%}</sup>',
        xaxis_title='PC 1',
        yaxis_title='PC 2',
        legend=dict(orientation='h', yanchor='top', y=-0.18, xanchor='center', x=0.5, title_text='Digit'),
        margin=dict(l=60, r=24, t=80, b=110),
    )
    return style_figure(fig, width=630, height=500)

mnist = fetch_openml('mnist_784', version=1, as_frame=True, parser='auto')
mnist_df = mnist.frame

X = mnist_df.iloc[:, :-1].to_numpy(dtype=np.float32)
y = mnist_df.iloc[:, -1].astype(int).to_numpy()

# As in lecture, the first 60,000 examples are training data and the final 10,000 are test data.
X_train, X_test = X[:60000], X[60000:]
y_train, y_test = y[:60000], y[60000:]

example_idx = 98
fig = show_digit_and_flattening(X_train[example_idx], y_train[example_idx])
fig.update_layout(title='Digit 3')

Each image can be stored as a vector in $\mathbb{R}^{784}$, since $28 \times 28 = 784$. A table containing all images, then, has 70,000 rows (one per image) and 784 columns (one per pixel).

Each row above corresponds to one flattened digit. While we can visualize individual images in the dataset one at a time, we can’t visualize the entire dataset at once, since it’s made up of vectors in $\mathbb{R}^{784}$. That’s where PCA comes in!
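As a small sketch of the flattening described above, the snippet below builds a made-up $28 \times 28$ array of pixel intensities (random values, not a real MNIST digit), flattens it into a vector with 784 entries, and reshapes it back. The names `image`, `flat`, and `restored` are my own. The row-by-row order is NumPy's default, which matches how the plotting code calls `reshape(28, 28)`.

```python
import numpy as np

rng = np.random.default_rng(0)

# A made-up 28 x 28 "image" with integer intensities between 0 and 255.
image = rng.integers(0, 256, size=(28, 28))

# Flatten row by row into a single vector with 28 * 28 = 784 entries.
flat = image.reshape(-1)
print(flat.shape)  # (784,)

# The pixel at (row, col) lands at position 28 * row + col in the flattened vector.
row, col = 5, 12
print(flat[28 * row + col] == image[row, col])  # True

# Reshaping recovers the original grid exactly.
restored = flat.reshape(28, 28)
print(np.array_equal(restored, image))  # True
```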

PCA Returns

By creating principal components, we can represent each image using only 2 or 3 features, rather than 784.

Below, I’ve reduced the dimensionality of the 784-feature dataset to just 2 features. Rather than using np.linalg.svd as in Chapter 10.4, I’ve used sklearn’s PCA implementation, just to show you how it works.

pca_2 = PCA(n_components=2)
mnist_pca = pca_2.fit_transform(X)

What fraction of the variance in the full 784-dimensional data is captured by the first two principal components? It looks like about 17%, which is far less than the ~99% we saw for the first two principal components in the penguin dataset. We’re likely seeing a much lower proportion of variance explained here because it’s hard to distinguish between $28 \times 28$ images using just two numbers per image.

# The proportion of variance explained by each principal component.
pca_2.explained_variance_ratio_

array([0.09746116, 0.07155445])

# The total variance explained by the first two principal components.
pca_2.explained_variance_ratio_.sum()

0.16901560509373448
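The same ratios can be computed by hand from the singular values of the centered data matrix: the $j$th ratio is $\sigma_j^2 / \sum_k \sigma_k^2$. Here is a sketch on a small synthetic matrix standing in for the MNIST pixel table, so the numbers will differ from those above. The variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(245)

# Hypothetical stand-in for the data: 500 rows, 10 features with different spreads.
X_demo = rng.normal(size=(500, 10)) * np.arange(1, 11)

# Center the columns and compute only the singular values.
X_centered = X_demo - X_demo.mean(axis=0)
singular_values = np.linalg.svd(X_centered, compute_uv=False)

# Variance captured by each principal component, as a fraction of the total variance.
ratios = singular_values**2 / np.sum(singular_values**2)

# Fraction of variance captured by the first two principal components together.
print(ratios[:2], ratios[:2].sum())
```

The ratios always sum to 1 across all principal components, so the sum of the first two tells us how much is lost by discarding the rest.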

In the scatter plot below, each point corresponds to an image. Its position is determined by its values in the first two principal components. Note that we’ve only included a small sample of the full 70,000 image dataset so as to avoid crowding the plot (and slowing down your browser).

The values of principal components 1 and 2 are not pixel intensities, as they don’t range between 0 and 255. Instead, they are linear combinations of the original pixel intensities.

By construction, PCA did not use the labels when determining the placement of each image. All it knew about each image were the 784 pixel values. Still, the images are roughly clustered by their true digits, meaning that handwritten 0s tend to be placed near handwritten 0s, handwritten 1s near handwritten 1s, and so on.

sorted_labels = [str(label) for label in np.unique(y)]

fig = px.scatter(
    x=mnist_pca[:, 0],
    y=mnist_pca[:, 1],
    color=y.astype(str),
    category_orders={'color': sorted_labels},
    labels={
        'x': 'Principal Component 1',
        'y': 'Principal Component 2',
        'color': 'Digit',
    },
    color_discrete_sequence=px.colors.qualitative.Bold,
)
fig.update_traces(
    marker=dict(size=7, opacity=0.78),
    selector=dict(mode='markers'),
    hovertemplate='Digit %{fullData.name}<br>Principal Component 1 = %{x:.2f}<br>Principal Component 2 = %{y:.2f}<extra></extra>',
)
fig.update_layout(
    legend_title_text='Digit',
    width=610,
    height=470,
    paper_bgcolor='white',
    plot_bgcolor='white',
    font=dict(family='Palatino Linotype, Palatino, serif', size=16, color='black'),
    margin=dict(l=60, r=24, t=24, b=110),
    legend=dict(
        orientation='h',
        yanchor='top',
        y=-0.20,
        xanchor='center',
        x=0.5,
        title_text='Digit',
    ),
)
fig.update_xaxes(
    gridcolor='#f0f0f0',
    zeroline=False,
    showline=True,
    linecolor='black',
    linewidth=1,
)
fig.update_yaxes(
    gridcolor='#f0f0f0',
    zeroline=False,
    showline=True,
    linecolor='black',
    linewidth=1,
)
fig.show()

Notice that even with just two features, the digits are roughly clustered by their true digit label. The 1’s are clustered together in the top left, the 0’s are clustered together in the right center, and so on. But, there’s a fair amount of overlap.

Let’s go one step further: in the graph below (which only uses a random sample of 200 points for speed), hovering over a point reveals the full original 28 x 28 image itself.

Zoom in on the graph, and look at any two points that are close to each other but have different labels. You’ll run into cases like a 9 that looks like a 1, or a 5 that looks like a 3. Again, with just these two features for each image, we retain a lot of information about the original image, which is remarkable!

Feature Maps

How are these principal components computed? As we saw in Chapter 10.4,

$$\text{PC}_j = \tilde X \vec v_j$$

where $\tilde X$ is the centered data matrix, and $\vec v_j$ is the $j$th eigenvector of $\tilde X^\top \tilde X$. In other words, $\vec v_j$ is the $j$th column of $V$ in $\tilde X = U \Sigma V^\top$, the SVD of $\tilde X$.
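To see why the eigenvector description and the SVD description agree, substitute the SVD into $\tilde X^\top \tilde X$. Here I write $\sigma_j$ for the $j$th singular value and $\vec u_j$ for the $j$th column of $U$; this notation is assumed to match Chapter 10.4.

$$
\tilde X^\top \tilde X = (U \Sigma V^\top)^\top (U \Sigma V^\top) = V \Sigma^\top U^\top U \Sigma V^\top = V \Sigma^\top \Sigma V^\top
$$

$$
\tilde X^\top \tilde X \vec v_j = \sigma_j^2 \vec v_j, \qquad \text{PC}_j = \tilde X \vec v_j = U \Sigma V^\top \vec v_j = \sigma_j \vec u_j
$$

So each $\vec v_j$ is an eigenvector of $\tilde X^\top \tilde X$ with eigenvalue $\sigma_j^2$, and the $j$th principal component is the $j$th column of $U$ scaled by $\sigma_j$.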

Here, $\vec v_j$ contains 784 values, one for each pixel in the original $28 \times 28$ image. Here’s an idea: why don’t we visualize $\vec v_j$ as a $28 \times 28$ image? Such a graph is called a feature map, as it shows how the original 784 features are combined to create each principal component.

import plotly.subplots as sp
import plotly.graph_objects as go

# Get the first two principal directions (rows of components_, i.e. v_1 and v_2) and reshape each into a 28x28 image
pc0_image = pca_2.components_[0].reshape(28, 28)
pc1_image = pca_2.components_[1].reshape(28, 28)

# Create subplot with 1 row and 2 columns
fig = sp.make_subplots(
    rows=1, cols=2,
    subplot_titles=(r"$$\vec v_1 \text{ (first column of } V)$$", r"$$\vec v_2 \text{ (second column of } V)$$"),
    horizontal_spacing=0.05,
)

# Add images to the subplots
fig.add_trace(
    go.Heatmap(z=pc0_image, colorscale='RdBu', colorbar=dict(title='Value'), showscale=True),
    row=1, col=1,
)
fig.add_trace(
    go.Heatmap(z=pc1_image, colorscale='RdBu', showscale=False),
    row=1, col=2,
)

# Set axis appearance and size for both
for i in range(1, 3):
    fig.update_xaxes(showticklabels=False, row=1, col=i)
    fig.update_yaxes(showticklabels=False, row=1, col=i)

# Set overall layout with Palatino font
fig.update_layout(
    width=600,
    height=325,
    # title_text=r"$$\text{Columns of } V \text{ in } \tilde X = U \Sigma V^T$$",
    font=dict(family="Palatino, Palatino Linotype, serif"),
    margin=dict(t=50)
)

fig.show()

Now, let’s look at the scatter plot of the first two principal components again, with the digits colored by their true digit label. Notice that the 0’s are in the far right while the 1’s are in the far left. How does the above heat map explain this?

sorted_labels = [str(label) for label in np.unique(y)]

fig = px.scatter(
    x=mnist_pca[:, 0],
    y=mnist_pca[:, 1],
    color=y.astype(str),
    category_orders={'color': sorted_labels},
    labels={
        'x': 'Principal Component 1',
        'y': 'Principal Component 2',
        'color': 'Digit',
    },
    color_discrete_sequence=px.colors.qualitative.Bold,
)
fig.update_traces(
    marker=dict(size=7, opacity=0.78),
    selector=dict(mode='markers'),
    hovertemplate='Digit %{fullData.name}<br>Principal Component 1 = %{x:.2f}<br>Principal Component 2 = %{y:.2f}<extra></extra>',
)
fig.update_layout(
    legend_title_text='Digit',
    width=610,
    height=470,
    paper_bgcolor='white',
    plot_bgcolor='white',
    font=dict(family='Palatino Linotype, Palatino, serif', size=16, color='black'),
    margin=dict(l=60, r=24, t=24, b=110),
    legend=dict(
        orientation='h',
        yanchor='top',
        y=-0.20,
        xanchor='center',
        x=0.5,
        title_text='Digit',
    ),
)
fig.update_xaxes(
    gridcolor='#f0f0f0',
    zeroline=False,
    showline=True,
    linecolor='black',
    linewidth=1,
)
fig.update_yaxes(
    gridcolor='#f0f0f0',
    zeroline=False,
    showline=True,
    linecolor='black',
    linewidth=1,
)
fig.show()

PCA Regression

Here’s the last big idea: what if we use these new features as inputs to a model? We absolutely can. But what kind of predictive task is this: regression or classification? Classification, of course, because the goal is to predict which digit appears in the image.

A standard choice is logistic regression. Despite the name, logistic regression is a linear classification technique that builds on linear regression. Instead of predicting a numerical response directly, it predicts class probabilities and then turns those probabilities into decisions.

For handwritten digits, sklearn uses a multinomial version of logistic regression that predicts a probability for each digit 0 through 9. We won’t focus on the math, the loss function, or the optimization details here; those are beyond our scope. The point is just that PCA can create new features, and a classifier can use those new features as inputs.
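Leaving the loss function and optimization aside, here is a rough sketch of the prediction step only. A multinomial logistic regression model computes one linear score per digit, converts the scores into probabilities with the softmax function, and predicts the digit with the largest probability. The weights below are random placeholders rather than fitted values, and the names `softmax`, `W`, `b`, and `Z_demo` are my own.

```python
import numpy as np


def softmax(scores):
    """Turn each row of scores into probabilities that sum to 1."""
    # Subtracting the row-wise max does not change the result but avoids overflow.
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)

# Hypothetical inputs: 5 images, each described by 2 principal component values.
Z_demo = rng.normal(size=(5, 2))

# Placeholder parameters: one weight vector and one intercept per digit (10 classes).
W = rng.normal(size=(2, 10))
b = rng.normal(size=10)

probabilities = softmax(Z_demo @ W + b)      # shape (5, 10): one probability per digit
predictions = probabilities.argmax(axis=1)   # most likely digit for each image

print(probabilities.sum(axis=1))
print(predictions)
```

In a fitted model, `W` and `b` would be chosen to make the probabilities of the correct digits as large as possible on the training data; that fitting step is what sklearn handles for us.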

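# Assumed pipeline definition: PCA down to 2 components, then logistic regression.
# The hyperparameters here (such as max_iter) are a guess, so the accuracy may differ slightly.
pipe = make_pipeline(PCA(n_components=2), LogisticRegression(max_iter=1000))
pipe.fit(X_train, y_train)
two_pc_accuracy = pipe.score(X_test, y_test)
two_pc_accuracy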

0.4465

Z_train_pipeline = pipe.named_steps['pca'].transform(X_train)
logistic_model = pipe.named_steps['logisticregression']

plot_decision_regions(Z_train_pipeline, y_train, logistic_model, two_pc_accuracy)

Even with just two principal components, the classifier reaches about 44.6% accuracy on the test set. That is far from perfect, but it is still much better than random guessing among 10 classes, which would only be correct about 10% of the time.
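As a quick sanity check on the 10% baseline, the sketch below simulates random guessing against made-up labels. The labels are random stand-ins for `y_test`, and the simulation assumes every guess is uniform over the 10 digits.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 10_000  # same size as the MNIST test set
true_labels = rng.integers(0, 10, size=n)     # stand-in labels, not the real y_test
random_guesses = rng.integers(0, 10, size=n)  # one uniform random guess per image

# Fraction of guesses that happen to match the label; this should be close to 0.10.
baseline_accuracy = (random_guesses == true_labels).mean()
print(baseline_accuracy)
```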

While the full MNIST dataset was too high-dimensional to visualize directly, it can be compressed into principal components, which can be visualized and then fed into a classifier. What a beautiful way to wrap up the course!