Partial Derivatives¶
How do we take the derivative of a function with multiple input variables?
To illustrate, let’s focus on a simpler function with two input variables:

$$f(x, y) = x^2 + y^2$$

This is a quadratic function of two variables, and its graph is known as a paraboloid.
Before optimizing the regression parameters, we take a brief detour into functions of multiple variables and partial derivatives.
In the single-input case – i.e., for functions of the form $f(x)$ – the derivative $\frac{\text{d}f}{\text{d}x}$ captured $f$’s rate of change along the $x$-axis, which was the only axis of motion.
The function $f(x, y) = x^2 + y^2$ has two input variables, and so there are two directions along which we can move. As such, we need two “derivatives” to describe the rate of change of $f$ – one for the $x$-axis and one for the $y$-axis. Think of this like a controlled science experiment: to isolate the effect of one variable, we must hold the other fixed. Our solution to this dilemma comes in the form of partial derivatives.
If $f$ has $n$ input variables, it has $n$ partial derivatives, one for each axis. The function $f(x, y) = x^2 + y^2$ has two partial derivatives, $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$. (The symbol you’re seeing, $\partial$, is a stylized letter d – often read “partial” or “del” – and is used specifically for partial derivatives.)
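In case a formal definition is helpful, a partial derivative is just the ordinary limit definition of the derivative, applied while the other variable is held fixed:

$$\frac{\partial f}{\partial x}(x, y) = \lim_{h \to 0} \frac{f(x + h, y) - f(x, y)}{h}$$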
Let me show you how to compute partial derivatives before we visualize them. We’ll start with $\frac{\partial f}{\partial x}$. To find it, we treat $y$ as a constant and differentiate with respect to $x$ alone:

$$\frac{\partial f}{\partial x} = \frac{\partial}{\partial x} \left( x^2 + y^2 \right) = 2x + 0 = 2x$$

The result, $2x$, is a function of $x$ and $y$. It tells us the rate of change of $f$ along the $x$-axis, at any point $(x, y)$. It just so happens that this function doesn’t involve $y$ since we chose a relatively simple function $f$, but we’ll see more sophisticated examples soon.

Following similar steps, you’ll see that $\frac{\partial f}{\partial y} = 2y$. This gives us:

$$\frac{\partial f}{\partial x} = 2x \qquad \frac{\partial f}{\partial y} = 2y$$
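If you’d like to verify these by computer, here’s a minimal sketch using the SymPy library (my choice of tool – nothing in this section requires it):

```python
import sympy as sp

# Symbolic variables and our paraboloid f(x, y) = x^2 + y^2.
x, y = sp.symbols("x y")
f = x**2 + y**2

# sp.diff differentiates with respect to one variable,
# treating the other as a constant -- exactly the "partial" rule.
print(sp.diff(f, x))  # 2*x
print(sp.diff(f, y))  # 2*y
```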
Let’s pick an arbitrary point and see what the partial derivatives tell us about it. Consider, say, $(x, y) = (-1, 2)$:

- $\frac{\partial f}{\partial x}(-1, 2) = 2(-1) = -2 < 0$, so if we hold $y$ constant, $f$ decreases as $x$ increases.
- $\frac{\partial f}{\partial y}(-1, 2) = 2(2) = 4 > 0$, so if we hold $x$ constant, $f$ increases as $y$ increases.
Above, we’ve shown the tangent lines in both the $x$ and $y$ directions at the point $(-1, 2)$. After all, the derivative of a function at a point tells us the slope of the tangent line at that point; that interpretation remains true with partial derivatives.
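To connect the picture to numbers, here’s a quick check of those two tangent-line slopes at $(-1, 2)$, continuing the SymPy sketch from above:

```python
import sympy as sp

x, y = sp.symbols("x y")
f = x**2 + y**2

point = {x: -1, y: 2}
# Slope of the tangent line in the x-direction: negative, so f is decreasing.
print(sp.diff(f, x).subs(point))  # -2
# Slope of the tangent line in the y-direction: positive, so f is increasing.
print(sp.diff(f, y).subs(point))  # 4
```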
Let’s look at a more complex example. Consider:
Both partial derivatives are functions of both $x$ and $y$, which is typically what we’ll see.
To compute $\frac{\partial f}{\partial x}$, we treated $y$ as a constant. Let me try to make more sense of this.
To help visualize, we’ve drawn the function $f(x, y)$, along with the plane $y = k$. The slider lets you change the value of $k$ being considered, i.e., it lets you change the constant value that we’re assigning to $y$.
The intersection of $f$ and the plane $y = k$ is marked as a gold curve, and is a function of $x$ alone.
Drag the slider to any value of $y$ and look at the gold curve that results. The expression below tells you the derivative of that gold curve with respect to $x$.
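The gold-curve idea can also be imitated numerically: fix $y$ at the slider’s value, and the slope of the resulting one-variable slice is precisely $\frac{\partial f}{\partial x}$. Here’s a small sketch, using the simple paraboloid from before since the interactive’s more complex function isn’t reproduced here:

```python
def f(x, y):
    return x**2 + y**2  # stand-in for the plotted surface

k = 1.0    # the constant value of y chosen by the slider
x0 = 0.5   # where along the gold curve we measure the slope
h = 1e-6   # small step for a finite-difference estimate

# The gold curve is the one-variable slice g(x) = f(x, k).
# Its ordinary derivative at x0...
slice_slope = (f(x0 + h, k) - f(x0 - h, k)) / (2 * h)

# ...matches the partial derivative df/dx = 2x evaluated at (x0, k).
print(slice_slope)  # ~1.0, and indeed 2 * x0 == 1.0
```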
Thinking in three dimensions can be difficult, so don’t fret if you’re confused as to what all of these symbols mean – this is all a bit confusing to me too. (Are professors allowed to say this?) Nonetheless, I hope these interactive visualizations are helping you make some sense of the formulas, and if there’s anything I can do to make them clearer, please do tell me!
Activity 1
Find all three partial derivatives of the function:
Optimization¶
To minimize (or maximize) a function $f(x)$, we solved for critical points, which were points where the (single-variable) derivative was 0, and used the second derivative test to classify them as minima or maxima (or neither, as in the case of $f(x) = x^3$ at $x = 0$).
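For reference, here’s the computation behind that “neither” case, assuming the classic $f(x) = x^3$ example:

$$f(x) = x^3 \quad \Rightarrow \quad f'(x) = 3x^2, \quad f'(0) = 0, \quad f''(0) = 0$$

The second derivative test is inconclusive at $x = 0$, and indeed $x^3$ has neither a minimum nor a maximum there.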
The analog in the two-variable case is solving for the points where both partial derivatives are 0, which corresponds to the points where the function is neither increasing nor decreasing along either axis.
In the case of our first example,

$$f(x, y) = x^2 + y^2$$

the partial derivatives were relatively simple,

$$\frac{\partial f}{\partial x} = 2x \qquad \frac{\partial f}{\partial y} = 2y$$

and both are 0 when $(x, y) = (0, 0)$. So, $(0, 0)$ is a critical point, and we can see visually that it’s a global minimum.
(Notice that above I wrote $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ instead of $\frac{\partial f}{\partial x}(x, y)$ and $\frac{\partial f}{\partial y}(x, y)$ to save space, but don’t forget that both partial derivatives are functions of both $x$ and $y$ in general.)
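Here’s the same critical-point computation done symbolically – again a sketch that assumes the SymPy library from earlier:

```python
import sympy as sp

x, y = sp.symbols("x y")
f = x**2 + y**2

# A critical point is where both partial derivatives vanish at once.
critical_points = sp.solve([sp.diff(f, x), sp.diff(f, y)], [x, y], dict=True)
print(critical_points)  # [{x: 0, y: 0}] -- the global minimum we saw visually
```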
There is a second derivative test for functions of multiple variables, but it’s a bit more complicated than the single variable case, and to give you an honest explanation of it, I’ll need to introduce you to quite a bit of linear algebra first. So, we’ll table that thought for now.
The more complex function from earlier has much more complicated partial derivatives, and so it’s difficult to solve for its critical points by hand. Fear not – in Chapter 8, when we discover the technique of gradient descent, we’ll learn how to minimize such functions just by using their partial derivatives, even when we can’t solve for where they’re 0.
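As a tiny preview of that idea (Chapter 8 develops it properly), here’s a sketch that nudges a point downhill on our simple paraboloid using only its partial derivatives; the starting point and step size are arbitrary choices of mine:

```python
# Partial derivatives of f(x, y) = x**2 + y**2.
def df_dx(x, y):
    return 2 * x

def df_dy(x, y):
    return 2 * y

x, y = 3.0, -4.0  # arbitrary starting point
step = 0.1        # arbitrary step size

for _ in range(100):
    # Step a little bit "downhill" along each axis.
    x, y = x - step * df_dx(x, y), y - step * df_dy(x, y)

print(x, y)  # both nearly 0 -- the critical point from before
```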
Activity 2
Find the values of $w_0$ and $w_1$ that minimize the function:
Here, we’ve used $w_0$ and $w_1$ to denote the two input variables, rather than $x$ and $y$.
With partial derivatives in hand, we can now minimize mean squared error to solve for the optimal regression parameters.