The binomial distribution

The next few posts will cover the binomial, the normal, and the Poisson distributions. These are just three of the hundreds of distributions that have been named, but they are among the most important and will handle most of the problems you are likely to encounter.

We call these models because they are mathematical descriptions of reality, not reality itself. Often the models work so well that we lose sight of the distinction.

Each of the models involves moderately complicated mathematics. The good news is that R is extremely efficient at these kinds of problems, usually requiring just a single line of code. We will see examples of problems that would take hours to solve by hand but that R can answer in a second.

We’ll start with the binomial probability model, which we use when there are binary outcomes: yes or no, heads or tails, boy or girl. For example, suppose you work at a hospital where 9 out of the last 10 births were girls. “What are the chances of that?” one of the nurses asks you. She is just using a figure of speech, but you take her literally and decide to calculate the answer.

The equation you would need looks like this:

$$P(X = k) = \frac{n!}{k!\,(n-k)!}\, p^k q^{n-k}$$

where n is the number of trials or events (here, 10 births) and k is the number of events you are interested in (statisticians call these “successes”), here 9.

p is the probability of a “success”, here 0.49. (Why not 0.5? Slightly more boys than girls are born, roughly 51 for every 49. A common explanation is that natural selection favors this to counterbalance the higher mortality of males.)

q is the probability of a “non-success”, here 0.51.
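As a sketch, the formula translates into R almost word for word. The function name binom_prob is our own, hypothetical helper (not part of R); it uses base R's choose(n, k), which computes n!/(k!(n-k)!). Comparing it with R's built-in dbinom() function is a quick sanity check that the formula is typed in correctly.

```r
# Hypothetical helper: the binomial formula written out by hand.
# choose(n, k) is base R's binomial coefficient, n! / (k! * (n - k)!).
binom_prob <- function(k, n, p) {
  q <- 1 - p                        # probability of a "non-success"
  choose(n, k) * p^k * q^(n - k)
}

by_hand  <- binom_prob(9, 10, 0.49)  # exactly 9 girls in 10 births
built_in <- dbinom(9, 10, 0.49)      # R's built-in binomial probability
print(c(by_hand = by_hand, built_in = built_in))
```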

Plugging in n = 10, k = 9, p = 0.49, and q = 0.51 gives:

$$P(X = 9) = \frac{10!}{9!\,1!} \times 0.49^9 \times 0.51^1$$

The leftmost part is 10!/(9! * 1!), where ! denotes a factorial. The factorial of a positive integer is that number multiplied by all the positive integers smaller than it, so 10! = 10 * 9 * 8 * 7 * 6 * 5 * 4 * 3 * 2 * 1 = 3,628,800. (In this example the fraction is easy: 1! is just 1, and everything in 10!/9! cancels except the 10, so you don’t have to do all the multiplication. Many real-world problems are not so tidy.)

$$P(X = 9) = 10 \times 0.49^9 \times 0.51 = 10 \times 0.001628 \times 0.51 \approx 0.0083$$
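If you want to check the hand arithmetic, base R has factorial() and choose() built in. This sketch recomputes each piece for exactly 9 girls out of 10 births; the variable names are illustrative only.

```r
# Check the pieces of the hand calculation for exactly 9 girls out of 10
f10 <- factorial(10)                                          # 10! = 3628800
binom_coef <- factorial(10) / (factorial(9) * factorial(1))   # 10!/(9! 1!) = 10
girls_9 <- 0.49^9                                             # about 0.001628
answer <- binom_coef * girls_9 * 0.51                         # about 0.0083

print(c(f10 = f10, binom_coef = binom_coef, girls_9 = girls_9, answer = answer))
print(choose(10, 9))   # base R's shortcut for the same coefficient
```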

That’s less than a 1% chance. Normally in science we say that anything less than a 5% chance is unusual, so yes, 9 out of 10 girls is unusual.

There are two crucial caveats here, however!

When we ask “what are the chances of that?” in science, we usually mean “what are the chances of at least that?”, that is, “what are the chances of either that or something even more extreme?” Not “what are the chances of exactly 9 girls out of 10 births?” but “what are the chances of 9 or 10 girls out of 10 births?” So it’s:

$$P(X \ge 9) = P(X = 9) + P(X = 10) = \frac{10!}{9!\,1!}\, 0.49^9\, 0.51^1 + \frac{10!}{10!\,0!}\, 0.49^{10}\, 0.51^0$$

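The second term becomes 1 * 0.0008 * 1 ≈ 0.0008: there is only one way for all 10 babies to be girls (10!/(10! * 0!) = 1, since 0! = 1), and 0.51^0 = 1.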

So the answer is now 0.0083 + 0.0008 = 0.0091, which hasn’t changed much. In other examples, though, it can change quite a lot.
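Here is a sketch of the same “at least 9 girls” calculation in R, adding the two terms of the formula one at a time with base R's choose(). It assumes, as in the text, that each birth is independently a girl with probability 0.49.

```r
# Probability of at least 9 girls in 10 births, term by term
p <- 0.49
q <- 1 - p
term9  <- choose(10, 9)  * p^9  * q^1   # exactly 9 girls,  about 0.0083
term10 <- choose(10, 10) * p^10 * q^0   # exactly 10 girls, about 0.0008
at_least_9_girls <- term9 + term10      # about 0.0091
print(at_least_9_girls)
```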

Caveat number two: presumably you would be equally impressed if there were 9 out of 10 boys, so it’s only fair to count those too. When a nurse asks “what are the chances of that?”, a statistician hears “what are the chances that at least 9 of 10 babies are the same biological sex?”

Do the math again for boys (about 0.0126) and add it to the girls’ 0.0091, and we are now up to about 2%. That is still unusual, but less so.
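A minimal sketch of the “same sex” version in R: dbinom() accepts a vector of counts, so dbinom(9:10, ...) returns the probabilities of exactly 9 and exactly 10 in one call, and sum() adds them. Adding the girls’ and boys’ totals is safe only because both cannot happen at once in 10 births (9 + 9 > 10).

```r
# At least 9 of 10 births the same sex: count girls and boys separately.
# The two events cannot both happen (9 + 9 > 10), so we can add them.
girls <- sum(dbinom(9:10, size = 10, prob = 0.49))  # 9 or 10 girls
boys  <- sum(dbinom(9:10, size = 10, prob = 0.51))  # 9 or 10 boys
same_sex <- girls + boys
print(same_sex)   # about 0.0217, i.e. roughly 2%
```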

What if a nurse says, “This year we have had 35 boys and 28 girls. That seems unusual to me; the numbers should be more equal”? The probability of that EXACT combination is easy to calculate, and it will be small. What a statistician hears is “Out of 63 births, what is the probability that at least 35 of them will be the same biological sex?” You would have to calculate the probability of 35 boys, plus 36 boys, plus 37 boys, and so on all the way up to 63, then repeat for girls. That is a long and tedious calculation.

Fortunately, R can do it instantly. The functions we want are dbinom(), for the probability of an exact number of occurrences, and pbinom(), for the cumulative probability of a range of occurrences. In practice we use pbinom() much more often.
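To make “long and tedious” concrete, here is a sketch of the slow way: an explicit loop that applies the binomial formula to every count from 35 up to 63, once for boys and once for girls, using only base R's choose() and ^. It assumes, as before, that births are independent and that each is a boy with probability 0.51.

```r
# The "long and tedious" way: add up the binomial formula term by term.
n <- 63
p_boy <- 0.51
p_girl <- 1 - p_boy

total <- 0
for (k in 35:n) {
  # at least 35 boys: add P(exactly k boys)
  total <- total + choose(n, k) * p_boy^k * p_girl^(n - k)
  # at least 35 girls: add P(exactly k girls)
  total <- total + choose(n, k) * p_girl^k * p_boy^(n - k)
}
# The boy and girl cases cannot overlap (35 + 35 > 63), so adding is safe.
print(total)   # about 0.456
```

That is 58 terms; the one-line pbinom() version later in this post gives the same number.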

With dbinom(), we provide the number of “successes” (argument x), the number of trials (size), and the probability of a success (prob):

dbinom(9,10,0.49) #probability of exactly 9 girls

## [1] 0.008304909

dbinom(9,10,0.51) #probability of exactly 9 boys

## [1] 0.01143741

dbinom(10,10,0.49) #probability of exactly 10 girls

## [1] 0.0007979227

dbinom(10,10,0.51) #probability of exactly 10 boys

## [1] 0.001190424

dbinom(9,10,0.49)+dbinom(10,10,0.49) #probability of 9 or 10 girls (adding the two)

## [1] 0.009102832

pbinom() gives you the cumulative probability of less than or equal to the number of successes you supply. If we want the probability of more than that number, we subtract the result from 1. Because “at least 9 girls” is the same as “more than 8 girls”, we pass 8, not 9:

pbinom(8,10,0.49) #probability of 0,1,2,3,4,5,6,7 or 8 girls

## [1] 0.9908972

1-pbinom(8,10,0.49) #probability of 9 or 10 girls

## [1] 0.009102832

This is the same answer we calculated above using dbinom().
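Another way to see what pbinom() does: it is a running total of dbinom() values. This sketch adds up the exact probabilities for 0 through 8 girls and compares, then uses base R's cumsum() to show the whole running total, which ends at 1.

```r
# pbinom(8, ...) is the sum of dbinom() for 0, 1, ..., 8 successes
cumulative <- pbinom(8, 10, 0.49)
summed     <- sum(dbinom(0:8, 10, 0.49))
print(c(cumulative = cumulative, summed = summed))  # both about 0.9909

# The full running total for 0 through 10 girls; the last value is 1
running <- cumsum(dbinom(0:10, 10, 0.49))
print(round(running, 4))
```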

Another way to get this is to tell R to look at the right, or upper, side of the distribution. (“Tail” is the usual word for “side”.) By default, R looks at the left, or lower, tail. Adding the argument lower.tail=FALSE switches to the upper tail, which gives the probability of more than 8 successes, so we get the same answer:

pbinom(8,10,0.49,lower.tail=FALSE)

## [1] 0.009102832

You are free to do it either way you prefer.
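Whichever way you choose, watch the off-by-one: with lower.tail = FALSE, pbinom() returns the probability of strictly more than the number you give it. This sketch shows that passing 9 instead of 8 silently drops the “exactly 9” case and leaves only the probability of 10 girls.

```r
# lower.tail = FALSE returns P(X > q), strictly greater than q.
# So "at least 9 girls" (X >= 9) needs q = 8, not 9.
right <- pbinom(8, 10, 0.49, lower.tail = FALSE)  # 9 or 10 girls
wrong <- pbinom(9, 10, 0.49, lower.tail = FALSE)  # only 10 girls
print(c(right = right, wrong = wrong, ten_girls = dbinom(10, 10, 0.49)))
```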

To calculate the probability of at least 9 girls OR at least 9 boys, we call pbinom() twice and add the results (adding is safe here because both cannot happen in the same 10 births):

(1-pbinom(8,10,0.49)) + (1-pbinom(8,10,0.51))

## [1] 0.02173067
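Since this pattern comes up more than once, it can be wrapped in a small function. The name at_least_same_sex() is our own hypothetical helper, not part of R. It assumes the “girls” and “boys” events cannot both happen, which holds whenever 2k > n; if 2k ≤ n, adding the two tails would double count, so the sketch refuses that case.

```r
# Hypothetical helper: probability that at least k of n births are the
# same sex, where p_girl is the probability that one birth is a girl.
at_least_same_sex <- function(k, n, p_girl = 0.49) {
  # Adding the two tails is only valid when they cannot overlap.
  stopifnot(2 * k > n)
  girls <- pbinom(k - 1, n, p_girl, lower.tail = FALSE)      # P(girls >= k)
  boys  <- pbinom(k - 1, n, 1 - p_girl, lower.tail = FALSE)  # P(boys >= k)
  girls + boys
}

at_least_same_sex(9, 10)   # about 0.0217
```

The same call with k = 35 and n = 63 answers the nurse’s second question.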

We can now answer the question of how likely it is to see at least a 35-28 split in the sex ratio:

(1-pbinom(34,63,0.49)) + (1-pbinom(34,63,0.51))

## [1] 0.4556656

It’s not unusual at all: the probability is about 46%, close to a coin flip.
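Simulation, a different technique from the exact calculation, offers an independent check. This sketch uses base R’s rbinom() to simulate many hypothetical years of 63 births, assuming each birth is independently a girl with probability 0.49, and counts how often at least 35 babies are the same sex. The seed and the number of simulated years are arbitrary choices, so the estimate will wobble slightly around the exact answer of about 0.456.

```r
set.seed(1)                       # arbitrary seed, for reproducibility
n_years <- 100000                 # number of simulated years (arbitrary)
girls <- rbinom(n_years, size = 63, prob = 0.49)  # girls born each year
boys  <- 63 - girls                                # the rest are boys
same_sex_35 <- mean(girls >= 35 | boys >= 35)     # share of "unusual" years
print(same_sex_35)   # close to 0.456; the exact digits depend on the seed
```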