We will continue presenting Case Study publications that are particularly interesting from the Good Research Practice perspective. We hope that these cases will be useful in training programs and will help younger scientists learn the basics of study design, planning and analysis. We invite our readers to share examples that can be used for such educational purposes.

This month we would like to turn again to the subject of sample size. We will be using examples from a paper published by Bradley and colleagues in the Journal of Clinical Investigation in December 2016 (link). This paper reported on the therapeutic potential of M1 muscarinic positive allosteric modulators to slow prion-related neurodegeneration and reverse memory loss in mice. We are not challenging the conclusions of this paper and do not mean to question the value of this kind of research. We use it solely to illustrate a phenomenon that seems to be common in many papers that combine multiple research methods.

In this paper, hippocampal-dependent learning and memory were assessed using one of the most frequently used tasks – fear conditioning. When fear conditioning was compared in M1 knockout and wild-type mice, each group contained 8 mice. Additional control experiments compared pain thresholds and locomotor activity using seemingly the same groups of 8 mice each (as shown in Figure 1 in the article). Along with the behavioral data, this Figure presents immunohistochemical evidence of M1 receptor activation as a result of the fear conditioning training (“representative” images, with no quantification and no indication as to whether this analysis was done in more than one mouse). So far so good – this looks rather common and may reflect our unfortunate habit of leaving some technical details unreported (e.g. a power analysis based on previous studies that could justify N=8; a quantification of the IHC studies, since “a nice picture says more than 1000 words”, etc.).
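A power analysis of the kind that could justify N=8 is straightforward to sketch. The effect size used below is purely hypothetical – it is not taken from the paper – and the calculation uses a textbook normal approximation; the point is only that a stated assumption plus a standard formula makes the choice of N transparent and reproducible.

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group N for a two-sided, two-sample comparison,
    given a standardized effect size d (Cohen's d).
    Normal approximation; exact t-based calculations add roughly one subject."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # e.g. 1.96 for alpha = 0.05
    z_beta = z(power)            # e.g. 0.84 for 80% power
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)

# Hypothetical effect size for illustration only (not from the paper):
print(n_per_group(1.5))  # → 7 per group under these assumptions
```

With a large assumed effect (d = 1.5), roughly 7 animals per group suffice at 80% power; a smaller assumed effect drives N up quickly (d = 1.0 already requires 16 per group). Reporting the assumed d is therefore as important as reporting N itself.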

In Figure 3, all studies were done using wild-type animals that were either prion-infected or served as controls. In an experiment combining fear conditioning and prion infection, the authors “randomly” used 19 mice per group and observed impaired fear conditioning (panel 3C). It is interesting, though, that in this case the control experiments were conducted using only N=6 animals (elevated plus maze and pain thresholds, panels 3D and 3E). However, it is not clear to the reader whether these are the same animals or separate groups. The same applies to the results on M1 receptor Bmax (panel 3G), where the sample size is even lower – N=3. Again, there may be reasons, not stated in the manuscript, that justify these sample sizes and explain the allocation of animals to different experiments, as well as the reasons for subjecting prion-infected and control animals to non-comparable training and testing conditions (Figure 3B).

We chose this paper as an example, and point specifically to the sample sizes in Figure 3C versus panels 3D, 3E and 3G, because, if all these data came from the same two groups of animals (prion-infected and controls), the small-sample experiments may not provide accurate estimates of what is likely to happen in a larger population. In other words, if the sample size for the fear conditioning study had been N=6 rather than N=19, the difference between the prion-infected and control mice could have looked less convincing. And, vice versa, if the sample size for the plus maze had been larger (N=19 instead of N=6), differences in the number of open arm visits between prion-infected and control mice could have led to the conclusion that the former are more likely to display anxiety-related behavior.
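How strongly a “convincing” result depends on sample size is easy to demonstrate with a toy simulation. The effect size, variability and critical value below are all assumed for illustration and have nothing to do with the actual data in the paper; the sketch only shows how often the same true group difference crosses a conventional significance threshold at N=6 versus N=19.

```python
import random
from statistics import mean, stdev

random.seed(1)  # fixed seed so the simulation is reproducible

def detected(n, true_diff=1.0, sd=1.0, crit=2.1):
    """One simulated experiment: two groups of size n drawn from normal
    populations whose means truly differ by `true_diff` (in SD units).
    Returns True if a simple two-sample t statistic exceeds `crit`,
    an approximate two-sided 5% critical value."""
    a = [random.gauss(0.0, sd) for _ in range(n)]
    b = [random.gauss(true_diff, sd) for _ in range(n)]
    se = ((stdev(a) ** 2 + stdev(b) ** 2) / n) ** 0.5
    return abs(mean(b) - mean(a)) / se > crit

def power(n, reps=2000):
    """Fraction of simulated experiments in which the true difference
    is flagged as 'significant'."""
    return sum(detected(n) for _ in range(reps)) / reps

print(f"N=6:  ~{power(6):.0%} of runs flagged as significant")
print(f"N=19: ~{power(19):.0%} of runs flagged as significant")
```

Under these assumptions, the same underlying effect is flagged in only a minority of N=6 experiments but in the large majority of N=19 experiments – so a real difference can easily “disappear” at N=6, while an effect that looks absent at N=6 might well emerge at N=19.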

The example above illustrates the need for a particularly detailed and transparent description of the study design (including the flow of the experiments, the total number of subjects and their allocation to individual experiments) and for sample size justification in research where:
– multiple methods are combined (e.g. in vivo and ex vivo)
– certain methods are applied only for randomly (?) selected subsets of subjects or samples, and
– the expected (hypothesized) outcome is a mixture of positive and negative results, such as the learning impairment in the absence of pain sensitivity changes in the paper discussed here.