In biomedical research, dealing with probabilities is part of the daily routine for many scientists, regardless of their specific research area. Statistics and probability calculations are important during the design phase of experiments (e.g., for sample size calculation and power estimation), as well as during the analysis of study results and outcomes.
However, for researchers unfamiliar with statistics, probability calculations can be puzzling, as we’ve got very strong instincts about how we expect numbers to work. The following example, called the Birthday Paradox, shows how misleading our instincts and intuition can be:
Let’s consider the probability that at least two individuals in a group of n randomly selected people have the same birthday (assuming that each of the 365 days of the year is equally likely to be a birthday, and that all the birthdays are independent). Imagine a classroom of 30 children. What’s the probability that at least two of the children have the same birthday? 365 possible birthdays, and only 30 children in a classroom? Intuition would say it’s pretty unlikely.
However, the answer is 70.6%! This is called the Birthday Paradox.
Even in a group of only 23 people, the probability that at least two of them have the same birthday is already just over 50%.
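These figures can be checked with a few lines of Python. The sketch below is only an illustration under the assumptions stated above (365 equally likely, independent birthdays); the function name prob_shared_birthday is made up for this example. It multiplies the probabilities that each successive person avoids all birthdays already taken, then subtracts the result from 1.

```python
def prob_shared_birthday(n: int, days: int = 365) -> float:
    """Exact probability that at least two of n people share a birthday.

    Hypothetical helper for illustration; assumes `days` equally likely,
    independent birthdays (no leap years, no seasonal effects).
    """
    if n > days:
        return 1.0  # more people than days: a match is certain
    p_all_distinct = 1.0
    for k in range(n):
        # Person k+1 must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


for n in (23, 30):
    print(f"n = {n}: {prob_shared_birthday(n):.1%}")
# n = 23: 50.7%
# n = 30: 70.6%
```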
Why is that?
For a group of 23 people, the first person has 22 comparisons to make, the second person has 21 comparisons (as this person was already compared to the first person), the third person has 20 comparisons, the fourth person has 19, and so on. This results in 22 + 21 + 20 + 19 + … + 1 = 253 different pairs. Consequently, each group of 23 people involves 253 chances for matching birthdays.
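The count of 253 can be confirmed in three equivalent ways. This is a minimal sketch using only the Python standard library; the variable names are chosen for this example.

```python
from itertools import combinations
from math import comb

group_size = 23  # illustrative variable name

# 1. List every unordered pair of people explicitly and count them.
pairs = list(combinations(range(group_size), 2))
print(len(pairs))  # 253

# 2. Use the binomial coefficient "23 choose 2".
print(comb(group_size, 2))  # 253

# 3. Add up 22 + 21 + ... + 1, as in the text.
print(sum(range(1, group_size)))  # 253
```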
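Each of the 253 pairs has the same probability of 99.73% (p = 364/365) of not being a match. If we treat the 253 comparisons as independent (an approximation, since the comparisons are not fully independent of one another) and multiply 364/365 by itself 253 times [(364/365)^253], we obtain a 49.95% chance that none of the 253 comparisons is a match. Consequently, the probability that there is at least one birthday match among those 253 comparisons is 1 – 0.4995 = 0.5005, or just over half! The exact calculation, which avoids this approximation, gives a very similar value of about 50.7%.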
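The same pairwise reasoning can be written as a short Python sketch. It treats every pair as a separate comparison with a 364/365 chance of not matching, so it is an approximation rather than the exact result; the function name approx_prob_match is made up for this example.

```python
def approx_prob_match(n: int, days: int = 365) -> float:
    """Pairwise approximation of the chance of at least one shared birthday.

    Hypothetical helper for illustration; treats all pairs as independent
    comparisons, which is only approximately true.
    """
    n_pairs = n * (n - 1) // 2            # 253 pairs for n = 23
    p_no_match_per_pair = (days - 1) / days  # 364/365
    p_no_match_at_all = p_no_match_per_pair ** n_pairs
    return 1.0 - p_no_match_at_all


print(f"{approx_prob_match(23):.4f}")  # 0.5005
print(f"{approx_prob_match(30):.4f}")  # 0.6968
```

For the classroom of 30 children, the approximation gives roughly 69.7%, close to the 70.6% quoted earlier.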
The Birthday Paradox shows how seemingly unlikely coincidences can arise by chance alone in a data set that allows a large number of comparisons.
Nowadays, biomedical scientists can access a wealth of tools and software packages to analyze data and to understand the patterns in experimental studies. However, our intuition can often let us down when it comes to interpreting patterns and complex statistical tests, and thorough statistical knowledge is needed to interpret the subtleties within a data set correctly and to avoid drawing incorrect conclusions.
As errors in preclinical research frequently result from the improper analysis of data, this curious example suggests that biostatisticians should have a more prominent role in the experimental planning and analysis of preclinical studies and data sets.