Performing a power analysis is important for planning experimental studies. It helps us estimate the sample size necessary to support or reject the hypothesis in question. In the life sciences, the use of power analysis is further motivated by the need to minimize the number of animal subjects. Moreover, in some areas such as drug discovery and development, the power of a study is directly connected to costly decisions with a potential impact on our health.
For studies that are meant to confirm an earlier finding or to support an important claim (including those that result in scientific publications), power analysis is a must, and it should ideally be conducted before the study takes place (i.e. a priori power analysis). Running an underpowered study and then repeating it with another underpowered experiment is worth less than conducting a correctly powered study from the very beginning. This may sound like common sense, and a lot has been said and written about it. Yet most studies in the life sciences are underpowered (e.g. Button, K.S., Ioannidis, J.P.A., Mokrysz, C., Nosek, B.A., Flint, J., Robinson, E.S.J., Munafò, M.R., 2013. Power failure: why small sample size undermines the reliability of neuroscience. Nat. Rev. Neurosci. 14, 365–376).
A power analysis works best when the effect size can be estimated from prior data (GM Sullivan and R Feinn, J Grad Med Educ. 2012 Sep; 4(3): 279–282). For instance, effect sizes of antidepressant drugs in clinical trials are about 0.4. When testing a new antidepressant that is expected to be as effective as existing ones, the power analysis should therefore be based on an effect size of 0.4 together with chosen α and β values (α is the probability of incorrectly rejecting a true null hypothesis; β is the probability of incorrectly retaining a false null hypothesis; statistical power is 1 − β). This is straightforward, and every decision-enabling study should be based on an exploratory study, or a set of studies, that helps to estimate the effect size.
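As an illustration of such an a priori calculation (a minimal sketch, not taken from any of the cited papers), the following Python snippet uses statsmodels to estimate the per-group sample size for a two-sample comparison, assuming an effect size (Cohen's d) of 0.4, α = 0.05 and 80% power:

```python
# Minimal a priori sample-size sketch for a two-sample t-test.
# Assumptions (not from the cited papers): Cohen's d = 0.4, alpha = 0.05,
# power = 0.80 (i.e. beta = 0.20), two-sided test, equal group sizes.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.4,   # assumed standardized effect size
                                   alpha=0.05,         # false-positive rate
                                   power=0.80,         # 1 - beta
                                   alternative="two-sided")
print(f"Required sample size per group: {n_per_group:.1f}")  # roughly 99-100 per group
```

With these assumptions, roughly 100 subjects per group are needed, which illustrates how easily small studies end up underpowered for effects of this magnitude.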
Awareness of this problem can, however, have an unwanted side effect, which we would like to illustrate:
A recent paper published in The Journal of Neuroscience (Titus et al., July 6, 2016; 36(27):7095–7108) described the effects of a phosphodiesterase inhibitor on the cognitive performance of animals after traumatic brain injury. The authors state: “To determine the minimum number of animals needed for these studies, a power analysis was prospectively performed to detect a 20% difference in water maze probe trial performance between groups at 80% power with a significance level of 0.05 (Titus et al., 2013b). A sample size of 10 animals per group was obtained.” (p. 7096).
Why did the authors choose a 20% difference as the target? No biological or other justification is given, but, as the quotation above indicates, an earlier paper is cited. In that paper, for a similar water maze task, we read the following: “To detect a 30% difference between groups at 80% power and with a significance level of 0.05, the estimated sample size was 8 animals per group (Bramlett et al., 1997a)” (p. 5219). Why is a 30% difference applied in this case? Again, no explanation is provided. Unfortunately, the Bramlett 1997 paper used a different water maze protocol, so it is impossible to trace how it could have served to estimate the effect size for all subsequent experiments.
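A percentage difference alone, in fact, does not define a power analysis: the same relative difference translates into very different sample sizes depending on the variability of the outcome. The sketch below illustrates this with purely hypothetical numbers (assumptions for illustration only, not values taken from Titus et al. or Bramlett et al.):

```python
# Hypothetical illustration: a "20% difference between groups" only yields a
# sample size once the outcome's variability is specified. All numbers below
# are assumptions for illustration, not values from the cited papers.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
control_mean = 50.0                # assumed control mean (e.g. % time in target quadrant)
difference = 0.20 * control_mean   # the targeted "20% difference" in absolute units
for sd in (5.0, 10.0, 15.0):       # assumed standard deviations
    d = difference / sd            # standardized effect size (Cohen's d)
    n = analysis.solve_power(effect_size=d, alpha=0.05, power=0.80,
                             alternative="two-sided")
    print(f"SD = {sd:4.1f}  ->  d = {d:.2f},  n per group = {n:.1f}")
```

Depending on the assumed standard deviation, the required group size ranges from a handful of animals to several dozen, which is why a stated percentage difference without an accompanying variability estimate cannot be traced back to any particular effect size.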
The pattern described above is not rare and can be found quite often in the literature. In some of these cases, the power analysis is indeed conducted prior to the study, but it is used to defend a pre-selected number of animals (hence the effect size that varies from study to study). Such practice is a by-product of the pressure scientists experience when planning their studies. Indeed, in a better world, scientists would have the time and money to run an exploratory experiment to estimate the effect size and then design a well-powered confirmatory study.
Take-home message: a power analysis is not a free pass to avoid distinguishing between exploratory and confirmatory experiments.