Say you have a fair die. (Most of them are these days.) You roll it once and observe a 3. You roll it again and observe a 1. You roll it yet again and get a 4. In three throws you’ve observed 3 of the possible sides. What if you keep going? How many rolls would it take to see all six sides at least once. Obviously the minimum number is 6. But the probability of that happening is about 0.015. (\( 1 \times \frac{5}{6}\times \frac{4}{6}\times \frac{3}{6}\times \frac{2}{6}\times \frac{1}{6}\)). What is the most likely number of rolls? Or put in the language of a statistics textbook, what’s the expected number of rolls?
To solve this problem we need to use the Geometric Distribution. If a series of trials are independent of one another, and each trial has the same probability of success, then the number of trials needed to observe the first success has a geometric distribution. This is the scenario of our roll of the die. Each roll is independent of one another and the probability of success remains the same for each roll, whatever we define “success” to be. The mean of a geometric distribution is \( \frac{1}{p}\). That means the expected number of times we need to roll a dice to observe, say, a four is 6.
Back to our problem. We’re certain to get at least one of the faces on the first roll. So that probability is 1. On the second roll we have a probability of \( \frac{5}{6}\) of getting a different face than what we got on the first roll. It’s not certain. There is a probability of \( \frac{1}{6}\) of getting the same result as the first roll. Once we have observed two sides, the probability of observing a third side becomes \( \frac{4}{6}\). On so on. Notice that as we observe previously unseen sides the definition of success changes, therefore the probability changes. So to determine the expected number of rolls to see all sides of a die at least once, we need to find the expected number of rolls for each definition of “success” and sum them. The steps below describe the process. (The numbers represent the order of the observed sides, not the values on the dice.)
- We’re guaranteed to see one of the sides on the first roll. Expected number of rolls: 1
- We have \( \frac{5}{6}\) probability of seeing a different side than what was previously observed. Expected number of rolls: \( 1/\frac{5}{6}=1.2\)
- We have \( \frac{4}{6}\) probability of seeing a different side than what was previously observed in steps 1 and 2. Expected number of rolls: \( 1/\frac{4}{6}=1.5\)
- We have \( \frac{3}{6}\) probability of seeing a different side than what was previously observed in steps 1 -3. Expected number of rolls: \( 1/\frac{3}{6}=2\)
- We have \( \frac{2}{6}\) probability of seeing a different side than what was previously observed in steps 1-4. Expected number of rolls: \( 1/\frac{2}{6}=3\)
- We have \( \frac{1}{6}\) probability of seeing a different side than what was previously observed in steps 1-5. Expected number of rolls: \( 1/\frac{1}{6}=6\)
Adding all the expected number of rolls for each definition of success we get 14.7. So we expect to roll a die about 15 times on average before we observe all sides at least once.
We can use this same line of thinking to answer such “real-life” questions as how many boxes of cereal would you have to buy to collect the latest series of cheap plastic toys, assuming the toys are randomly added to the cereal in a uniform manner. Of course if you’re desperate to collect the precious toys and have a modicum of patience, the answer is probably 0. You’d just wait to buy the whole set on eBay.