Pre-clinical animal research is typically based on single-laboratory studies conducted under highly standardized conditions.
However, a new study published by Bernhard Voelkl, Lucile Vogt, Emily S. Sena and Hanno Würbel from the Universities of Bern and Edinburgh showed that such an approach risks producing results that are valid only under very specific conditions. In contrast, multi-laboratory studies, which build diversity into the study design, substantially increase the reproducibility of animal experiments.

The researchers used simulations based on 440 pre-clinical studies on 13 different experimental treatments in animal models of stroke, heart attack, and breast cancer. To simulate multi-laboratory studies, data from multiple studies were combined, as if several laboratories had conducted them in parallel.
Analyzing the effect of hypothermia on stroke severity in rodents, the authors first conducted a meta-analysis of 50 independent studies and found that hypothermia reduces severity by around 50%. They used this number as an estimate of the “true” effect, comparing it to single- and multi-lab simulations of the reduction in severity. The proportion of simulated studies that accurately predicted the 50% reduction of infarct volume increased from under 50% in single-lab studies to 73% in two-lab studies, to 83% in three-lab studies, and to 87% in four-lab studies.
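The logic behind this comparison can be illustrated with a small simulation. The sketch below is not the authors' analysis: it uses synthetic data rather than published studies, and the normal model, the function name `simulate_coverage`, and all parameter values (between-lab spread, within-lab spread, 24 animals in total) are assumptions chosen only for illustration. It keeps the total number of animals fixed, splits them across one or more hypothetical labs, and counts how often a naive 95% confidence interval contains the assumed "true" effect.

```python
import random
import statistics


def simulate_coverage(n_labs, true_effect=0.5, between_lab_sd=0.2,
                      within_lab_sd=0.3, total_animals=24,
                      n_sims=2000, seed=0):
    """Fraction of simulated studies whose 95% CI contains true_effect.

    All parameter values are illustrative assumptions, not estimates
    taken from the published study.
    """
    rng = random.Random(seed)
    per_lab = total_animals // n_labs  # same total sample size for every design
    hits = 0
    for _ in range(n_sims):
        observations = []
        for _ in range(n_labs):
            # Each lab has its own local effect, shifted by lab-specific
            # conditions (husbandry, experimenter, and so on).
            lab_effect = rng.gauss(true_effect, between_lab_sd)
            observations.extend(
                rng.gauss(lab_effect, within_lab_sd) for _ in range(per_lab)
            )
        mean = statistics.fmean(observations)
        sem = statistics.stdev(observations) / len(observations) ** 0.5
        # Naive interval that treats all animals as independent.
        if mean - 1.96 * sem <= true_effect <= mean + 1.96 * sem:
            hits += 1
    return hits / n_sims


if __name__ == "__main__":
    for labs in (1, 2, 3, 4):
        print(labs, "lab(s):", simulate_coverage(labs))
```

Under these assumptions, a single-lab study estimates its own local effect very precisely but often misses the overall effect, whereas spreading the same number of animals over several labs averages out lab-specific deviations. The exact proportions depend entirely on the chosen parameters and should not be read as a reproduction of the published figures.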

Overall, these results demonstrate that single-laboratory studies produced greater variation between study results than multi-laboratory studies. Studies comprising as few as two to four laboratories led to much more consistent results, thereby increasing reproducibility without a need for larger sample sizes.
By using published results from preclinical studies, the authors could show that the standardization of animal experiments is an important cause of poor reproducibility of results in preclinical animal research. Poor reproducibility calls the benefit of animal experiments into question and requires more replicate experiments, and therefore more animals overall, to answer a given research question conclusively.

Besides obvious differences between studies, such as the species or strain of animals (i.e., genotype), animal husbandry, or experimental procedures, many sources of variation are unknown or difficult to control, such as the influence of the experimenter or the microbiome, as well as subtle differences in visual, olfactory and auditory stimulation. All of these factors might affect treatment effects and may therefore explain the low reproducibility rates of single-lab studies observed by the authors. Multi-laboratory designs are well suited to accounting for these sources of between-laboratory variation.

The authors acknowledge that it may be logistically difficult to include multiple labs in every new preclinical study, and that running initial ‘exploratory’ studies under highly standardized conditions should not be a problem. However, as soon as findings need to be generalized and form the basis for go/no-go decisions (e.g., moving to first-in-human trials), heterogeneity becomes important, and multi-laboratory studies should replace standardized single-laboratory studies as the gold standard for late-phase preclinical trials.