# Deriving the probability distribution for the sum of many dice rolls

Rolling a whole handful of dice is one of a tabletop gamer’s simple pleasures. The math that governs the behavior of that dice roll is not so simple. We previously explored the cases of rolling just one or two dice. Later, we made the general observation that as you roll many dice the probability distribution tends to concentrate near the expected value.

In this post, we are going to look directly at the process of rolling
a whole bunch of dice and derive the probability distribution that governs
its behavior. Fair warning: this post is going to get pretty math-heavy!^{1}

## Counting possible dice roll outcomes

Beginners in probability often work problems involving dice, coins, or
playing cards — really any sort of game of chance. These
problems are typically solved not by appealing to probability distribution or
random variables, but by *counting outcomes*. We will ultimately use the
same approach here. As a refresher, consider the following problem.

What is the probability of rolling a total of 3 on a 2d6 roll? To solve this, we need two numbers: the total number of possible outcomes and the number of outcomes that result in a total of 3.

There are 36 total possible outcomes: 6 for each individual die, multiplied together. When counting up the outcomes that result in a total of 3, we find only two, $(2, 1)$ and $(1, 2)$.

The probability is the number of outcomes with a total of 3 divided by the total number of outcomes, or $\frac{2}{36} \approx 0.06$. It’s easy to check that this answer is identical to the one we would get by appealing to the uniform probability distribution we used previously.

The above is not too hard, but you can easily imagine how much of a pain this would be if you’re using 10 dice instead of 2. Oof. The good news is that there is a better way!
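Before moving on, it may help to see the brute-force count in code. This is a quick sketch of the 2d6 example above, using nothing beyond Python's standard library (the variable names are my own):

```python
from itertools import product

# All 36 ordered outcomes of rolling two six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

# The outcomes whose faces sum to 3.
hits = [o for o in outcomes if sum(o) == 3]

print(len(outcomes))              # 36
print(sorted(hits))               # [(1, 2), (2, 1)]
print(len(hits) / len(outcomes))  # 2/36 ≈ 0.056
```

This works fine for 2 dice, but the outcome list grows as $6^m$, which is exactly why we want the polynomial trick that follows.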

Let’s take a look at this polynomial. The purpose will become clear in a bit.

$$f(x) = x + x^2 + x^3 + x^4 + x^5 + x^6$$

This is the all-ones polynomial without the constant 1. If we square this polynomial and expand the terms, we get:

$$f(x)^2 = x^2 + 2x^3 + 3x^4 + 4x^5 + 5x^6 + 6x^7 + 5x^8 + 4x^9 + 3x^{10} + 2x^{11} + x^{12}$$

Look carefully at the coefficients. You may notice that they correspond exactly to the number of ways to get a given outcome on a 2d6 roll — there’s one way to get a 2, two ways to get a 3 (that’s the one we did above), three ways to get a 4, and so on and so forth.

If we want to get the probability of rolling an outcome like 3, we look at the coefficient of $x^3$. We take that coefficient and divide by 36 to get the probability, $\frac{2}{36}$, just like before.
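Squaring the polynomial is also easy to check mechanically: multiplying two polynomials convolves their coefficient arrays. A small sketch, assuming NumPy (which the post doesn't otherwise depend on):

```python
import numpy as np

# Coefficients of f(x) = x + x^2 + ... + x^6, indexed by power of x.
f = np.array([0, 1, 1, 1, 1, 1, 1])

# Polynomial multiplication is convolution of coefficient arrays,
# so f(x)^2 is one convolution of f with itself.
f_squared = np.convolve(f, f)

print(f_squared[3])       # 2: the two ways to roll a total of 3
print(f_squared[3] / 36)  # the probability, 2/36
```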

This is important, because what we’ve shown here is an alternate way to count the outcomes, and therefore compute the probability, of our dice roll. Instead of directly counting, we find polynomial coefficients. The above example is for two 6-sided dice, but the general approach for $m$ $n$-sided dice is the same. We take the polynomial below and find its coefficients.

$$f(x)^m = (x + x^2 + \cdots + x^n)^m$$

For convenience, we divide the whole polynomial by the total number of outcomes, $n^m$, to make the coefficients probabilities instead of counts. Formally, our problem here can be written with random variables as follows. Let $X_1, \ldots, X_m$ be independent rolls, each uniform on $\{1, \ldots, n\}$, and let

$$Y = \sum_{i=1}^{m} X_i, \qquad g(x) = \frac{f(x)^m}{n^m} = \frac{(x + x^2 + \cdots + x^n)^m}{n^m}$$

Then, **$P(Y=y)$ is calculated by finding the coefficient of
$x^y$ in the polynomial $g(x)$ above.**
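The boldfaced rule amounts, computationally, to raising the coefficient list to the $m$-th power by repeated convolution and reading off entry $y$. A sketch under the same NumPy assumption as before (the helper name `dice_sum_pmf` is my own):

```python
import numpy as np

def dice_sum_pmf(m, n):
    """PMF of the sum of m fair n-sided dice, via the coefficients of g(x)."""
    f = np.array([0] + [1] * n, dtype=float)  # f(x) = x + x^2 + ... + x^n
    coeffs = np.array([1.0])
    for _ in range(m):
        coeffs = np.convolve(coeffs, f)       # multiply by f(x) once per die
    return coeffs / n**m                      # entry y is P(Y = y)

pmf = dice_sum_pmf(2, 6)
print(pmf[3])  # 2/36, matching the worked example
```

This already handles the 10-dice case that was painful to count by hand; the closed form we derive below is the analytic counterpart.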

*Some of you may recognize that we are taking a conceptual approach to
solving this problem with probability generating functions. We
could have gone straight there, but I find the counting line of reasoning to
be a satisfying way to see why this works.*

## Tricky polynomials and series

Just knowing that we *can* compute probabilities using polynomial
coefficients doesn’t mean that it’s easy. In this case, it is possible,
but we are going to have to do some not-so-obvious polynomial manipulation to
get where we want to go.

We’ll start by rewriting our original polynomial $f(x)$:

$$f(x) = x + x^2 + \cdots + x^n = \frac{x(x^n - 1)}{x - 1}$$

This can be directly verified by polynomial division. This transforms $g(x)$ into:

$$g(x) = \frac{x^m (x^n - 1)^m}{n^m (x - 1)^m}$$

From here, we go term by term. We start by taking a look at $(x^n - 1)^m$. We can expand this as a sum with the binomial theorem:

$$(x^n - 1)^m = \sum_{j=0}^{m} \binom{m}{j} (-1)^{m-j} x^{nj}$$

Multiplying through by $x^m$, we get:

$$x^m (x^n - 1)^m = \sum_{j=0}^{m} \binom{m}{j} (-1)^{m-j} x^{nj + m}$$

This represents the numerator of $g(x)$. Let’s now turn our attention to the denominator, written as $(x-1)^{-m}$. Again we can rewrite this as a sum, but this time as a binomial series, which generalizes the binomial theorem to complex exponents:

$$(x - 1)^{-m} = (-1)^{-m} (1 - x)^{-m} = (-1)^{-m} \sum_{k=0}^{\infty} \binom{m + k - 1}{k} x^k$$

We can now write $g(x)$ in its full glory as the product:

$$g(x) = \frac{(-1)^{-m}}{n^m} \left( \sum_{j=0}^{m} \binom{m}{j} (-1)^{m-j} x^{nj+m} \right) \left( \sum_{k=0}^{\infty} \binom{m+k-1}{k} x^k \right)$$

We make a slight simplification by bringing the factor of $(-1)^{-m}$ into the first sum, using $(-1)^{-m}(-1)^{m-j} = (-1)^{-j} = (-1)^j$:

$$g(x) = \frac{1}{n^m} \left( \sum_{j=0}^{m} (-1)^j \binom{m}{j} x^{nj+m} \right) \left( \sum_{k=0}^{\infty} \binom{m+k-1}{k} x^k \right)$$

The finite sum can be thought of as an infinite series whose coefficients are $0$ when $j > m$. With this in mind, we can take the Cauchy product of the two series to get the coefficients, which in general states:

$$\left( \sum_{j=0}^{\infty} a_j x^j \right) \left( \sum_{k=0}^{\infty} b_k x^k \right) = \sum_{k=0}^{\infty} c_k x^k$$

where the coefficients $c_k$ are given by:

$$c_k = \sum_{j=0}^{k} a_j b_{k-j}$$

Those coefficients are the key. This whole business we’re going through is to get those coefficients, because they will be our probabilities. If we do some careful accounting to collect them, we are left with the following expression:

$$g(x) = \frac{1}{n^m} \sum_{y=m}^{nm} \left( \sum_{j=0}^{\lfloor (y-m)/n \rfloor} (-1)^j \binom{m}{j} \binom{y - nj - 1}{m - 1} \right) x^y$$

Where $\lfloor a \rfloor$ is the floor function that rounds down to the nearest integer.

## The probability distribution

To get back to probabilities, we ignore the power series and only look at the coefficients to write our final expression for the probability mass function:

$$P(Y = y) = \frac{1}{n^m} \sum_{j=0}^{\lfloor (y-m)/n \rfloor} (-1)^j \binom{m}{j} \binom{y - nj - 1}{m - 1}, \qquad m \le y \le nm$$

Needless to say, it’s not the cleanest of expressions, and I wouldn’t want to have to calculate any of these probabilities by hand. It is correct though! Exactly so.
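While hand calculation is unpleasant, the closed form is short to implement. This sketch (the function name is my own) uses Python's `math.comb` and reproduces the 2d6 values from earlier:

```python
from math import comb

def pmf_closed_form(y, m, n):
    """P(Y = y) for the sum of m fair n-sided dice, from the closed form."""
    if y < m or y > m * n:
        return 0.0
    upper = (y - m) // n  # the floor in the summation's upper limit
    total = sum((-1)**j * comb(m, j) * comb(y - n * j - 1, m - 1)
                for j in range(upper + 1))
    return total / n**m

print(pmf_closed_form(3, 2, 6))  # 2/36, the worked example from earlier
print(pmf_closed_form(7, 2, 6))  # 6/36, the most likely 2d6 total
```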

I have a hard time divining what this distribution looks like just from the closed form. We get a better idea if we look at a graph of the distribution. Below we show the shapes of the distribution for 2, 3, and 4 d6 rolls.

From top to bottom, we see the triangular distribution that characterizes the 2d6 roll smoothing out as we add more dice. The distribution quickly takes on a familiar bell-shaped curve with as few as 4 dice!

We know from the central limit theorem that the sum approaches a normal distribution for large $m$, but we seem to be getting pretty close even with small $m$ (at least in shape)! Exploring this property further and looking at convergence to a normal distribution will have to be a topic for the future. Stay tuned!

1. The derivation is adapted from *Introduction to Mathematical Probability* (1937) by Uspensky.