Chapter 0.1. Summation Notation and the Mean

Sums and averages play an important role in machine learning. In Chapter 1.2, we’ll learn to take the average of an important measurement (called a “loss function”) across all of the values in our dataset.

Here, we’ll review the most relevant properties of summation notation, and use the arithmetic mean as a case study of sorts.

Introduction

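Summation notation is a compact way to write the sum of many terms: $\displaystyle \sum_{i = a}^b x_i$ represents $x_a + x_{a+1} + \cdots + x_b$. For example, if we take $x_i = i^2$, then $\displaystyle \sum_{i = 1}^6 i^2$ represents the sum of the squares of all integers from 1 to 6: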

\sum_{i = 1}^6 i^2 = 1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2

Notice that both the starting and ending indices (1 and 6, respectively) are included in the sum. Summation notation allows us to express sums conveniently – the left-hand side above is more compact than the right-hand side.

Often, we’ll take the sum of the first $n$ terms of a sequence. For example, the sum of the squares of the first $n$ positive integers is:

\sum_{i = 1}^n i^2

Note that the index of summation can be any variable name ($i$ is just a typical choice). That is, $\displaystyle \sum_{j = 1}^n j^2$, $\displaystyle \sum_{i = 1}^n i^2$, and $\displaystyle \sum_{\text{zebra} = 1}^n \text{zebra}^2$ all represent the same sum.

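Summation notation can be thought of in terms of a for-loop. In Python, to compute the sum $\displaystyle \sum_{i = a}^b i^2$, we could write:

```python
a, b = 1, 6  # example bounds; any integers with a <= b work
total = 0
for i in range(a, b + 1):  # b + 1 because range excludes its end
    total = total + i ** 2
```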

As we mentioned above, the ending index is inclusive in summation notation. This is in contrast to Python, where the ending index is exclusive, which is why we provided b + 1 as the second argument to the range function instead of b.
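For comparison, here is a minimal sketch of the same computation using Python’s built-in `sum` with a generator expression. The function name `sum_of_squares` is just an illustrative choice, and the bounds 1 and 6 match the earlier example, so the result should agree with writing out $1^2 + 2^2 + \cdots + 6^2$ by hand.

```python
def sum_of_squares(a, b):
    """Compute the sum of i^2 for i = a, ..., b (both ends inclusive)."""
    # range's end is exclusive, so we pass b + 1 to include b itself
    return sum(i ** 2 for i in range(a, b + 1))

# The explicit expansion from the text: 1^2 + 2^2 + ... + 6^2
by_hand = 1**2 + 2**2 + 3**2 + 4**2 + 5**2 + 6**2

print(sum_of_squares(1, 6), by_hand)  # 91 91

# A common off-by-one bug: range(1, 6) silently drops the last term, 6^2 = 36
print(sum(i ** 2 for i in range(1, 6)))  # 55
```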

Properties and Examples

To illustrate various properties of summation notation, we’ll use the following fact:

\sum_{i = 0}^n 2^i = 2^0 + 2^1 + 2^2 + ... + 2^n = 2^{n+1} - 1

The expression $2^{n+1} - 1$ is called a closed form for the summation. When closed forms exist, they make it easy to compute the value of a summation. You’ll also notice that, in addition to writing the sum using summation notation, I wrote out its first few terms and its last term, with a ... to indicate that the pattern continues in between. As convenient as summation notation is, explicitly writing out the first few terms and the last term of a sum can make it easier to understand exactly what is being summed.

For example, $\displaystyle \sum_{i = 0}^3 2^i = 2^0 + 2^1 + 2^2 + 2^3 = 1 + 2 + 4 + 8 = 15$, which is indeed $2^{3+1} - 1 = 15$. The sequence $2^0, 2^1, 2^2, \ldots, 2^n$ is called a geometric sequence, and the resulting sum is called a geometric series.

For convenience, we’ll define $\displaystyle S(n) = \sum_{i = 0}^n 2^i$; again, $S(n) = 2^{n+1} - 1$.
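As a quick numerical sanity check (not a proof) of the closed form, the sketch below computes $S(n)$ by brute force for small $n$ and compares it with $2^{n+1} - 1$. The function name `S` is chosen only to mirror the notation above.

```python
def S(n):
    """Brute-force S(n) = 2^0 + 2^1 + ... + 2^n (the end index n is included)."""
    return sum(2 ** i for i in range(n + 1))

# Compare against the closed form 2^(n+1) - 1 for the first few n
for n in range(10):
    assert S(n) == 2 ** (n + 1) - 1

print(S(3))  # 15, matching 1 + 2 + 4 + 8
```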

The best way to learn through these examples is to try to solve them yourself before looking at the solution.

Example: Partial Sums

Determine the value of $\displaystyle \sum_{i = 8}^{19} 2^i$. (Find an answer that doesn’t involve summation notation or a sum over many terms.)

Solution

We can’t use the formula for $S(n)$ directly, because the starting index is 8, not 0. But, if we look at the expansion of $S(19)$, we’ll see that it contains the sum we’re looking for:

\begin{align*} S(19) &= \sum_{i = 0}^{19} 2^i \\ &= 2^0 + 2^1 + 2^2 + \cdots + 2^{19} \\ &= (2^0 + 2^1 + 2^2 + \cdots + 2^7) + (2^8 + 2^9 + 2^{10} + \cdots + 2^{19}) \\ &= \underbrace{\sum_{i = 0}^{7} 2^i}_{S(7)} \:\:\:\:\:\: + \underbrace{\sum_{i = 8}^{19} 2^i}_{\text{what we're looking for}} \end{align*}

So, the piece we’re looking for is $S(19) - S(7)$, or:

\sum_{i = 8}^{19} 2^i = S(19) - S(7) = (2^{20} - 1) - (2^8 - 1) = 2^{20} - 2^8

The specific value of the sum, $2^{20} - 2^8$, is $1048576 - 256 = 1048320$, if you’re curious.
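The partial-sum identity can be checked by brute force as well. In this sketch, `S` is a helper that mirrors the notation $S(n)$, and `direct` and `via_partial_sums` are illustrative variable names.

```python
def S(n):
    """Brute-force S(n) = 2^0 + 2^1 + ... + 2^n."""
    return sum(2 ** i for i in range(n + 1))

# Direct computation of 2^8 + 2^9 + ... + 2^19 (range's end is exclusive)
direct = sum(2 ** i for i in range(8, 20))

# Partial-sum identity: the terms with i = 0, ..., 7 cancel out
via_partial_sums = S(19) - S(7)

print(direct, via_partial_sums, 2 ** 20 - 2 ** 8)  # 1048320 1048320 1048320
```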

Example: Constant Multiples

Determine the value of $\displaystyle \sum_{i = 0}^{7} 5 \cdot 2^i$.

Solution

\begin{align*} \sum_{i = 0}^{7} 5 \cdot 2^i &= 5 \cdot 2^0 + 5 \cdot 2^1 + 5 \cdot 2^2 + 5 \cdot 2^3 + 5 \cdot 2^4 + 5 \cdot 2^5 + 5 \cdot 2^6 + 5 \cdot 2^7 \\ &= 5 (2^0 + 2^1 + 2^2 + 2^3 + 2^4 + 2^5 + 2^6 + 2^7) \\ &= 5 \sum_{i = 0}^{7} 2^i \\ &= 5 (2^{8} - 1) \end{align*}

This last line is what’s most important; its specific value is $5 \cdot (2^8 - 1) = 5 \cdot 255 = 1275$.

Example: Sum of a Constant

Determine the value of $\displaystyle \sum_{i = 7}^{20} 5$.

Solution

\begin{align*} \sum_{i = 7}^{20} 5 &= \underbrace{5 + 5 + 5 + \cdots + 5}_{14 \text{ times}} \\ &= 5 \cdot 14 \\ &= 70 \end{align*}

Where did 14 come from? The starting index is 7, and the ending index is 20. Since both endpoints are included, the number of terms is $20 - 7 + 1 = 14$; subtracting alone would give 13 and miss one term.
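In code, the $+1$ in the term count is the same `+ 1` that appears in the end argument of `range`. A small sketch, where `indices` is just an illustrative name:

```python
# Summation indices 7, 8, ..., 20 (inclusive) correspond to range(7, 21)
indices = range(7, 20 + 1)

print(len(indices))             # 14 = 20 - 7 + 1 terms
print(sum(5 for _ in indices))  # 70 = 5 * 14
```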

Example: Shifting Indices

Determine the value of $\displaystyle \sum_{i = 10}^{20} 2^{i - 5}$.

Solution

As we did in the very first example, let’s try to write the sum as a difference of two calls to $S(n)$. $S(n)$ involves the sum of terms of the form $2^i$, and the sum we’re looking for involves the terms $2^{i - 5}$, so we’ll need to shift the indices.

When $i = 10$, $i - 5 = 5$, and when $i = 20$, $i - 5 = 15$. So, as $i$ runs from 10 to 20, the exponent $i - 5$ runs from 5 to 15. Since the name of the index doesn’t matter, we can rewrite the sum in question as:

\sum_{i = 10}^{20} 2^{i - 5} = \sum_{i = 5}^{15} 2^i

This equals $S(15) - S(4)$ (the terms $2^0$ through $2^4$ cancel), which is $2^{16} - 1 - (2^5 - 1) = 2^{16} - 2^5 = 65536 - 32 = 65504$.
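A brute-force check that shifting the index does not change the value: the sketch below computes the original sum and the re-indexed sum separately. The variable names are illustrative.

```python
# Original sum: i runs from 10 to 20, and each term is 2^(i - 5)
original = sum(2 ** (i - 5) for i in range(10, 21))

# Shifted sum: the exponent j = i - 5 runs from 5 to 15
shifted = sum(2 ** j for j in range(5, 16))

print(original, shifted, 2 ** 16 - 2 ** 5)  # 65504 65504 65504
```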

Example: Separating Sums

Given that $\displaystyle \sum_{i = 1}^{n} i = \frac{n(n+1)}{2}$, determine the value of $\displaystyle \sum_{i = 0}^{12} (2^i + 5i)$.

Solution

\begin{align*} \sum_{i = 0}^{12} (2^i + 5i) &= ({\color{#3d81f6}{2^0}} + {\color{orange}{5 \cdot 0}}) + ({\color{#3d81f6}{2^1}} + {\color{orange}{5 \cdot 1}}) + ({\color{#3d81f6}{2^2}} + {\color{orange}{5 \cdot 2}}) + \ldots + ({\color{#3d81f6}{2^{12}}} + {\color{orange}{5 \cdot 12}}) \\ &= ({\color{#3d81f6}{2^0}} + {\color{#3d81f6}{2^1}} + {\color{#3d81f6}{2^2}} + \ldots + {\color{#3d81f6}{2^{12}}}) + ({\color{orange}{5 \cdot 0}} + {\color{orange}{5 \cdot 1}} + {\color{orange}{5 \cdot 2}} + \ldots + {\color{orange}{5 \cdot 12}}) \\ &= \sum_{i = 0}^{12} {\color{#3d81f6}{2^i}} + \sum_{i = 0}^{12} {\color{orange}{5i}} \\ &= 2^{13} - 1 + 5 \cdot \frac{12 \cdot 13}{2} \\ &= 8581 \end{align*}

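The key here is that we can split the sum into two sums. In the second sum, the $i = 0$ term is $5 \cdot 0 = 0$, so $\displaystyle \sum_{i = 0}^{12} 5i = 5 \sum_{i = 1}^{12} i = 5 \cdot \frac{12 \cdot 13}{2}$, which is where the given formula comes in.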
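To double-check the arithmetic, this sketch computes the combined sum directly and compares it with the two closed forms used above. The names `combined`, `geometric_part`, and `linear_part` are illustrative.

```python
n = 12

# Left-hand side: a single sum whose terms each combine two pieces
combined = sum(2 ** i + 5 * i for i in range(n + 1))

# Right-hand side: split into two sums, then use the closed forms
geometric_part = 2 ** (n + 1) - 1   # sum of 2^i for i = 0, ..., 12
linear_part = 5 * n * (n + 1) // 2  # 5 * (sum of i for i = 1, ..., 12); the i = 0 term is 0

print(combined, geometric_part + linear_part)  # 8581 8581
```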

Key Takeaways

In the order they were introduced in the examples, here are some useful properties of summations:

Partial Sums:
$\sum_{i = 1}^{n} x_i = \sum_{i = 1}^{k} x_i + \sum_{i = k+1}^{n} x_i$
Constant Multiples:
$\sum_{i = 1}^{n} c x_i = c \sum_{i = 1}^{n} x_i$
Sum of a Constant:
$\sum_{i = 1}^{n} c = c \cdot n$
Shifting Indices:
$\sum_{i = k}^{n} x_i = \sum_{i = 0}^{n-k} x_{i+k}$
Separating Sums:
$\sum_{i = 1}^{n} (x_i + y_i) = \sum_{i = 1}^{n} x_i + \sum_{i = 1}^{n} y_i$

For more practice, try the following activities.

Activity 2

Activity 2.1

Given that $\displaystyle \sum_{i = 1}^n i^2 = \frac{n(n+1)(2n+1)}{6}$, determine the value of $\displaystyle \sum_{j = 3}^{10} (j^2 - 7j)$.

Activity 2.2

Show that $\displaystyle \sum_{n = 1}^{100} \frac{1}{n(n+1)} = 1 - \frac{1}{101}$.

Hint: Try writing $\frac{1}{n(n+1)}$ as a difference of two fractions.

Activity 2.3

Recall that $n! = n \cdot (n-1) \cdot (n-2) \cdot \ldots \cdot 1$, with $0! = 1$ by convention.

The Taylor series expansion for the function $f(x) = e^x$, at the point $x = 0$, is given by:

e^x = \sum_{n = 0}^{\infty} \frac{x^n}{n!}

Find a closed form expression for the infinite series:

\frac{1}{2!} - \frac{1}{3!} + \frac{1}{4!} - \frac{1}{5!} + \frac{1}{6!} - \frac{1}{7!} + \cdots

Activity 2.4

${n \choose k}$, pronounced “n choose k”, is the number of ways to choose $k$ items from a set of $n$ items.

{n \choose k} = \frac{n!}{k!(n-k)!}

For example, ${5 \choose 2} = 10$, because there are 10 ways to choose 2 items from a set of 5 items, and $\frac{5!}{2!(5-2)!} = \frac{120}{2 \cdot 6} = 10$.
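If you want to check small cases by computer, Python’s standard library provides `math.comb` (Python 3.8+) and `math.factorial`. This sketch computes ${5 \choose 2}$ both ways.

```python
import math

n, k = 5, 2

# Built-in binomial coefficient (available in Python 3.8+)
print(math.comb(n, k))  # 10

# The factorial formula n! / (k! (n - k)!); integer division is exact here
print(math.factorial(n) // (math.factorial(k) * math.factorial(n - k)))  # 10
```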

Using the fact introduced in Activity 2.1, find a closed form expression for the following sum:

\sum_{k = 2}^n {k \choose 2}

Activity 2.5

Argue why the following equality holds (it’ll be hard to prove this algebraically):

\sum_{k = 0}^n {n \choose k} = 2^n

Mean and Standard Deviation

As I mentioned at the start of this section, we’ll work with sums of data points quite frequently in this class. We’ll often set up a problem by saying we have a sequence of $n$ scalar^[1] values, represented by $x_1, x_2, \ldots, x_n$. For instance, perhaps there are $n$ students in this course, and $x_i$ represents the height of student $i$.

Mean

The mean, or average, of all $n$ values is given the symbol $\bar{x}$ (pronounced “x-bar”) and is defined as follows:

\bar{x} = \frac{x_1 + x_2 + \ldots + x_n}{n} = \frac{1}{n} \sum_{i = 1}^n x_i

You’ve likely seen this definition before. But, an often-forgotten property of the mean is that the sum of the deviations from the mean is zero. By that, I mean (no pun intended) that if you:

compute the mean of a sequence of numbers,
compute the signed difference between each number and the mean, and then
sum all of those differences, the result will be zero.

Let’s first see this in action, then show why it is true in general. Suppose there are only 4 students in the class, with heights 72, 63, 68, and 65 inches. The mean of these heights is:

\bar{x} = \frac{72 + 63 + 68 + 65}{4} = 67

The deviations from the mean are:

\begin{align*} 72 - 67 &= 5 \\ 63 - 67 &= -4 \\ 68 - 67 &= 1 \\ 65 - 67 &= -2 \end{align*}

The sum of the four deviations, then, is:

5 + (-4) + 1 + (-2) = 0

So, the sum of the deviations from the mean is zero in this example, and therefore so is their average.
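The same computation in Python, using the four heights from the example (the variable names are illustrative):

```python
heights = [72, 63, 68, 65]  # the four heights (in inches) from the example

x_bar = sum(heights) / len(heights)        # the mean
deviations = [x - x_bar for x in heights]  # signed differences from the mean

print(x_bar)            # 67.0
print(deviations)       # [5.0, -4.0, 1.0, -2.0]
print(sum(deviations))  # 0.0
```

With data whose mean is not a “nice” number, floating-point rounding can make the computed sum a tiny nonzero value (on the order of `1e-15`) rather than exactly `0.0`.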

This is also true in general. Precisely, I’m claiming that if $x_1, x_2, ..., x_{n-1}, x_n$ are any $n$ numbers, and $\bar{x}$ is their mean, then $\displaystyle \sum_{i = 1}^n (x_i - \bar{x}) = 0$.

Let’s prove it:

\begin{align*} \sum_{i = 1}^n (x_i - \bar{x}) &= \sum_{i = 1}^n x_i - \sum_{i = 1}^n \bar{x} \\ &= \sum_{i = 1}^n x_i - n \bar{x} \\ &= \sum_{i = 1}^n x_i - n \left( \frac{x_1 + x_2 + \ldots + x_n}{n} \right) \\ &= \sum_{i = 1}^n x_i - (x_1 + x_2 + \ldots + x_n) \\ &= 0 \end{align*}

So, we’ve shown that the sum of the deviations from the mean is 0 in general. A consequence of this is that the positive deviations and the negative deviations have the same total magnitude, since they need to cancel each other out. In the 72, 63, 68, 65 example, the positive deviations are 5 and 1 and the negative deviations are -4 and -2; each group has a total magnitude of 6. As a result, the mean is sometimes thought of as the “balance point” of the dataset – the point at which the negative deviations are balanced by the positive deviations. More on this in Chapter 1.3.

Standard Deviation

Since the sum (and average) of deviations from the mean is 0, no matter the dataset, we can’t use the average deviation from the mean to measure how far values tend to be from the mean. The average deviation will be 0, whether the dataset is tightly clustered or spread out.

To measure how far values in a dataset tend to deviate from their mean, then, we’ll need to address the fact that some deviations are positive and some are negative. A common approach is to:

Compute the mean of the dataset
Compute the deviation of each value from the mean
Square each deviation
Take the average of the squared deviations

The result of this process is called the variance, denoted $s^2$ or $\sigma^2$; its square root is called the standard deviation.

Following the above steps, the variance is given by:

\sigma^2 = \frac{1}{n} \sum_{i = 1}^n (x_i - \bar{x})^2

(If you’ve read Chapter 1.2, you’ll notice some similarities between this formula and the formula for mean squared error.)
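The four steps translate directly into code. The sketch below uses a hypothetical function name `variance` and an arbitrary illustrative dataset (deliberately not the heights, so Activity 3 is left for you). It cross-checks the result against `statistics.pvariance` from Python’s standard library, which also divides by $n$.

```python
import math
import statistics

def variance(values):
    """Variance as defined above: the average of the squared deviations (divides by n)."""
    n = len(values)
    x_bar = sum(values) / n                   # step 1: the mean
    deviations = [x - x_bar for x in values]  # step 2: deviation of each value
    squared = [d ** 2 for d in deviations]    # step 3: square each deviation
    return sum(squared) / n                   # step 4: average the squares

data = [2, 4, 4, 4, 5, 5, 7, 9]  # arbitrary illustrative dataset
var = variance(data)
std = math.sqrt(var)             # standard deviation = square root of the variance

print(var, std)  # 4.0 2.0
assert math.isclose(var, statistics.pvariance(data))
```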

Activity 3

What is the variance of the dataset 72, 63, 68, 65?

As a final note, the variance has a convenient, equivalent form:

\sigma^2 = \frac{1}{n} \sum_{i = 1}^n (x_i - \bar{x})^2 = \frac{1}{n} \sum_{i = 1}^n x_i^2 - \bar{x}^2

In English, we might say the variance is “the mean of the squares of $x$ minus the square of the mean of $x$”.

Your turn: prove this equivalent form of the variance.

Solution

\begin{align*} \sigma^2 &= \frac{1}{n} \sum_{i = 1}^n (x_i - \bar{x})^2 \\ &= \frac{1}{n} \sum_{i = 1}^n (x_i^2 - 2x_i\bar{x} + \bar{x}^2) \\ &= \frac{1}{n} \sum_{i = 1}^n x_i^2 - \frac{2}{n} \sum_{i = 1}^n x_i\bar{x} + \frac{1}{n} \sum_{i = 1}^n \bar{x}^2 \\ &= \frac{1}{n} \sum_{i = 1}^n x_i^2 - 2\bar{x} \frac{1}{n} \sum_{i = 1}^n x_i + \bar{x}^2 \\ &= \frac{1}{n} \sum_{i = 1}^n x_i^2 - 2\bar{x}^2 + \bar{x}^2 \\ &= \frac{1}{n} \sum_{i = 1}^n x_i^2 - \bar{x}^2 \end{align*}
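A numerical spot-check (not a substitute for the proof) that both forms agree, on an arbitrary illustrative dataset; the variable names are illustrative.

```python
data = [2, 4, 4, 4, 5, 5, 7, 9]  # arbitrary illustrative dataset
n = len(data)
x_bar = sum(data) / n

# Definition: the average of the squared deviations from the mean
by_definition = sum((x - x_bar) ** 2 for x in data) / n

# Equivalent form: the mean of the squares minus the square of the mean
mean_of_squares = sum(x ** 2 for x in data) / n
shortcut = mean_of_squares - x_bar ** 2

print(by_definition, shortcut)  # 4.0 4.0
```

In floating-point arithmetic, the second form can lose precision when the values are large relative to their spread, since the two terms being subtracted are then nearly equal; computing the deviations first is generally the safer choice in code.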

Activity 4

Suppose the dataset $x_1, x_2, \ldots, x_n$ has mean $\bar{x}$ and variance $\sigma^2$.

Find the mean and variance of the dataset $-4x_1 + 3, -4x_2 + 3, \ldots, -4x_n + 3$, and justify your answer rigorously using the definitions of mean and variance.

Footnotes

[1] “Scalar” just means “individual number”, as opposed to a vector or matrix, which can contain multiple numbers, as we’ll see later in the course.

EECS 245 Course Notes
