Using the mean to find the mode of a Binomial Distribution
The original motivation behind this was an attempt to save my Statistics students a few precious seconds in their upcoming S1 module paper.
The mean, or expectation, of a Binomial Distribution is always very close to the mode, the value of X that has the greatest probability. I want to know whether you can use the mean to reliably predict the mode.
Binomial Distributions come up all over the place. A classic example would be where you try to score, say, a 5 with an ordinary dice. You perform n trials, and the probability of success on a particular go is 1/6. On each trial you’ll either succeed, i.e. score a 5 (the probability of which is 1/6) or you’ll fail, i.e. not score 5 (the probability of which is 5/6). X is simply the number of times you score 5 out of those n trials. Binomial literally means ‘two numbers’: the probability of success and the probability of failure; and the two numbers add up to 1.
But we can be very general. In the following example we are not even given the context of the experiment; we’re just given n which is the number of trials, and p which is the probability of success on each separate trial.
Let X be the number of successes and X~B(20,0.42)
i.e. X is binomially distributed across 20 trials, with probability of success 0.42 in each trial. Find:
(i) The expected number of successes (the mean).
(ii) The most likely number of successes (the mode).
Part (i) is straightforward. Just use the formula np.
20×0.42 = 8.4
Part (ii) is only a little fiddlier. Textbooks recommend using the expectation np as a guide and calculating the probability that X takes each of the integer values either side of np (here 8 and 9, since np = 8.4). The answer is the value of X that yields the greater probability:
$P(X=8) = {}^{20}C_{8}\,(0.42)^{8}(0.58)^{12} = 0.1768$ (4 s.f.)
$P(X=9) = {}^{20}C_{9}\,(0.42)^{9}(0.58)^{11} = 0.1707$ (4 s.f.)
So the answer is 8 successes.
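The two probabilities above are easy to check by machine. This is only a minimal sketch using Python's standard library; the helper name `binom_pmf` is my own and not anything from a textbook or exam board.

```python
from math import comb


def binom_pmf(n, p, r):
    """P(X = r) for X ~ B(n, p), straight from the nCr formula."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


n, p = 20, 0.42
mean = n * p               # part (i): the expectation np
p8 = binom_pmf(n, p, 8)    # the integer just below np
p9 = binom_pmf(n, p, 9)    # the integer just above np

print(f"mean   = {mean:.1f}")
print(f"P(X=8) = {p8:.4f}")
print(f"P(X=9) = {p9:.4f}")
print("most likely of the two:", 8 if p8 > p9 else 9)
```

Comparing only the two neighbours of np is exactly the textbook method; the code does nothing cleverer than that.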
Is there a quicker way that doesn’t involve using the formula $^{n}C_{r}\,p^{r}(1-p)^{n-r}$?
In this example, the expectation, np, rounded to the nearest integer, is 8. Is it a coincidence that this value is also the most likely number of successes?
Most textbook problems like this one are such that np rounded to the nearest integer gives the most likely number of successes. If this were a reliable fact for any n and p, then we could take all calculations away from the solution to part (ii) and simply write down the answer, saving a minute or more in an exam. So let’s see if we can prove or disprove it…
“If X~B(n,p) then the most likely value of X is equal to np rounded to the nearest integer.”
Consider the possible shapes of a binomial distribution:
Symmetrical (p=0.5)
When n is even, np is an integer and is itself the single most likely value, so the hypothesis clearly holds.
When n is odd, np will be exactly halfway between two integers, and will round to the greater of them. This is OK because the distribution is bimodal and we’re still ending up with one of the two modes.
Asymmetrical or ‘skew’ (p≠0.5)
Usually there is one value of X that has greater probability than any of the others. Consider these two distributions:
The value of p changes only slightly from 0.44 to 0.45, but the most likely value of X has changed from X=3 to X=4. So there must be a value of p between 0.44 and 0.45 where P(X=3) and P(X=4) are exactly equal.
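Here is a rough numerical check of that switch. The text above does not state n, so n = 8 is an assumption: it is the value for which the most likely outcome moves from 3 to 4 between p = 0.44 and p = 0.45. The crossover value p = 4/9 ≈ 0.444 is simply tested numerically here rather than derived, and the helper names are my own.

```python
from math import comb, isclose


def binom_pmf(n, p, r):
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


def most_likely(n, p):
    """The value r in 0..n with the greatest P(X = r)."""
    return max(range(n + 1), key=lambda r: binom_pmf(n, p, r))


n = 8  # assumed: not stated in the text

mode_low = most_likely(n, 0.44)
mode_high = most_likely(n, 0.45)

# At p = 4/9 the two neighbouring probabilities should coincide.
p_tip = 4 / 9
tie = isclose(binom_pmf(n, p_tip, 3), binom_pmf(n, p_tip, 4))

print("mode at p = 0.44:", mode_low)
print("mode at p = 0.45:", mode_high)
print("P(X=3) = P(X=4) at p = 4/9:", tie)
```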
We can find this in the general case, for any n and for any pair P(X=r) and P(X=r-1):
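The working can be reconstructed along these lines (a sketch, assuming 0 < p < 1 and 1 ≤ r ≤ n, and using the fact that the ratio of consecutive binomial coefficients is (n−r+1)/r):

```latex
\begin{align*}
P(X=r) &= P(X=r-1) \\
\binom{n}{r}\,p^{r}(1-p)^{n-r} &= \binom{n}{r-1}\,p^{r-1}(1-p)^{n-r+1} \\
\frac{n-r+1}{r}\,p &= 1-p
  && \text{dividing both sides by } \tbinom{n}{r-1}\,p^{r-1}(1-p)^{n-r} \\
(n-r+1)\,p &= r - rp \\
(n+1)\,p &= r \\
p &= \frac{r}{n+1}
\end{align*}
```

Replacing each "=" with ">" gives P(X=r) > P(X=r-1) exactly when p > r/(n+1), which is why the most likely value steps up from r-1 to r as p passes this point.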
We can derive this in either direction, so: if there is a positive integer r ≤n such that p=r/(n+1), then P(X=r) = P(X=r-1).
But there is a second tipping point, where np switches from being rounded down to r-1 to being rounded up to r: this is halfway between the two integers, at np = r-½. Divide both sides of this equation by n, to give p=(r-½)/n.
If the value of p falls between r/(n+1) and (r-½)/n, then one tipping point has been passed but not the other, so the most likely value and the rounded mean differ by 1 and the hypothesis does not hold.
Example: Given n=20 and r=8, then r/(n+1) = 0.381 and (r-½)/n = 0.375
If p takes a value between these figures then the hypothesis does not hold. Given how close these figures are to each other, this explains why a counterexample seemed so elusive.
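To see a concrete counterexample, here is a small sketch with n = 20 and p = 0.38, a value I have picked from inside that window (0.375 < 0.38 < 0.381). The helper names are my own, and halves are rounded up, as in the symmetrical case earlier.

```python
from math import comb, floor


def binom_pmf(n, p, r):
    """P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


def most_likely(n, p):
    """The value r in 0..n with the greatest P(X = r)."""
    return max(range(n + 1), key=lambda r: binom_pmf(n, p, r))


def round_half_up(x):
    """Round to the nearest integer, with halves going up."""
    return floor(x + 0.5)


n, p = 20, 0.38  # p chosen (by me) inside the window for r = 8

rounded_mean = round_half_up(n * p)
mode = most_likely(n, p)

print("np rounded to nearest integer:", rounded_mean)
print("most likely value of X:       ", mode)
print("hypothesis holds:", rounded_mean == mode)
```

Python's built-in `round` sends exact halves to the nearest even integer, so the explicit `round_half_up` helper is used to match the convention in the text.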
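Formally (rounding halves up, as above), the hypothesis does not hold if and only if, for some positive integer r ≤ n,
$$\frac{r-\tfrac12}{n} \le p < \frac{r}{n+1} \quad\text{or}\quad \frac{r}{n+1} < p < \frac{r-\tfrac12}{n}$$
Though if np has already been calculated, it may make more sense to consider whether np falls within certain bounds for some positive integer r ≤ n:
$$r-\tfrac12 \le np < \frac{rn}{n+1} \quad\text{or}\quad \frac{rn}{n+1} < np < r-\tfrac12$$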
These chain inequalities provide only small windows into which p must fall in order to negate the hypothesis. We will now add up the sizes of these small windows to determine the likelihood that the hypothesis does not hold.
Consider the size of these windows. The window for a given r has width |r/(n+1) - (r-½)/n|. We can sum this expression over all values of r from 1 to n. This gives the probability that, for a value of p chosen uniformly at random between 0 and 1, the hypothesis does not hold.
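The sum can be worked through as follows. This is my own reconstruction of the algebra, so treat it as a sketch: each window runs between r/(n+1) and (r-½)/n, and the windows for different r do not overlap.

```latex
\begin{align*}
\left|\frac{r}{n+1} - \frac{r-\tfrac12}{n}\right|
  &= \frac{\left|\,rn - \left(r-\tfrac12\right)(n+1)\right|}{n(n+1)}
   = \frac{|\,n+1-2r\,|}{2n(n+1)} \\[6pt]
\sum_{r=1}^{n} \frac{|\,n+1-2r\,|}{2n(n+1)}
  &= \begin{cases}
       \dfrac{n}{4(n+1)} & n \text{ even} \\[10pt]
       \dfrac{n-1}{4n}   & n \text{ odd}
     \end{cases}
\end{align*}
```

The closed forms follow from the sum of |n+1-2r| over r = 1, …, n, which is n²/2 when n is even and (n²-1)/2 when n is odd. Both expressions are below ¼ for every n and tend to ¼ as n grows.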
For large values of n, this is approximately equal to ¼, and it never exceeds ¼. So, for a probability of success p chosen uniformly at random between 0 and 1, the probability that the hypothesis holds is at least ¾.
[Interactive diagram: for a chosen value of n, the red regions mark the values of p that negate the hypothesis, i.e. that provide counterexamples.]
This graph shows how the probability that the hypothesis holds varies with increasing n.
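The figures behind a graph like this can be recomputed exactly by adding up the window widths |r/(n+1) - (r-½)/n| for r = 1 to n. The sketch below does that with exact fractions; the function names are my own, and it assumes p is chosen uniformly at random between 0 and 1.

```python
from fractions import Fraction


def window_width(n, r):
    """Width of the window of p-values, for this r, where the hypothesis fails."""
    return abs(Fraction(r, n + 1) - Fraction(2 * r - 1, 2 * n))


def prob_fails(n):
    """Total width of all the windows, i.e. P(hypothesis fails) for uniform p."""
    return sum(window_width(n, r) for r in range(1, n + 1))


for n in (5, 10, 20, 50, 100, 1000):
    fails = prob_fails(n)
    print(f"n = {n:4d}: fails {float(fails):.4f}, holds {float(1 - fails):.4f}")
```

Using `Fraction` keeps every value exact, so the approach towards ¼ is not blurred by floating-point rounding.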
The recommendation by the textbooks, to separately calculate the probabilities that X takes each of the values either side of np, is the best approach to finding the most likely value of X.
But there is at least a ¾ chance that it will be the value closest to np, so if you’re really short of time in an exam, guess!