In the November issue of our Newsletter, we featured a publication by the NPQIP Collaborative Group (LINK) that analyzed the impact of the Nature journals’ “checklists” for life sciences research on the completeness of reporting of the “Landis 4” items (randomization, blinding, sample size calculation, exclusions). These checklists were introduced in May 2013, and the NPQIP Collaborative Group confirmed that this editorial policy did indeed lead to a substantial improvement in the reporting of risks of bias.

We were very curious to see what kind of statements authors provide when submitting the checklists. Unfortunately, it used to be Nature journals’ policy to restrict these checklists to use by editors and reviewers only. However, we were apparently not the only ones who questioned this practice (LINK), and a few months ago the editorial policy was changed: checklists are now published along with the accepted manuscripts (LINK).
These are indeed great advances in making the publication process more transparent, and Nature is certainly leading the way! What should the next steps be?

One suggestion became obvious after reviewing the statements made by authors in these checklists. For one recently published paper, the authors provided the following statement when asked to describe how the sample size was calculated: “The sample size was determined based on preliminary results or similar experiments carried out in the past. Power analysis was performed using G-power in order to estimate the number of animals required, for a signal-to-noise ratio of 1.4 and 80% to 90% power assuming a 5% significance level”.
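For readers curious what such a calculation implies in practice, here is a minimal sketch, with statsmodels standing in for G*Power and assuming the quoted “signal-to-noise ratio” of 1.4 corresponds to a standardized effect size (Cohen’s d) for a two-group comparison; it is purely illustrative and not the authors’ actual procedure:

```python
# Minimal illustrative sketch, not the authors' actual calculation:
# required sample size for a two-sided, two-group t-test, assuming the quoted
# "signal-to-noise ratio" of 1.4 is a standardized effect size (Cohen's d).
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for power in (0.80, 0.90):
    n_per_group = analysis.solve_power(effect_size=1.4, power=power, alpha=0.05)
    print(f"power = {power:.0%}: about {n_per_group:.1f} animals per group")
```

Under these assumptions, the calculation yields roughly 9–12 animals per group – exactly the kind of concrete number the checklist statement leaves the reader to guess.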
When asked to describe whether the experimental findings were reliably reproduced, the authors provided the following statement: “The attempts at replication were successful”.

Despite the increase in transparency, this is not exactly what most of us would like to see. As Nature papers often contain a large number of different experiments (see here an interesting commentary on this phenomenon – LINK), it is relevant to know whether all of them have been replicated. Furthermore, in some cases simple statements about blinding and randomization may satisfy the requirement to “tick the box” but still not explain how exactly these procedures were performed. This matters because it has a direct impact on data quality and integrity.

Here is an example from the time before checklists were disclosed to the readers:
In the paper by Daniels and colleagues (LINK), one of the experiments concerned the effects of repeated mefenamic acid treatment on novel object recognition in rats that had received a single unilateral intracerebroventricular injection of soluble Aβ1–42 two weeks before the test. No animals were excluded (inferred from the fact that exclusions in other experiments were explicitly mentioned), the study was blinded, and animals were randomly assigned to treatment conditions. Interestingly, the figure legends indicate that, in this 2×2 study design, sample sizes ranged from 5 to 10 (given as “n=5-10”). Although we have no reason to doubt the experimental procedure, this raises a question that is critical for any reproducibility effort: which randomization protocol was actually applied, and why?
This example is, however, not uncommon; similar studies, in which groups are described as randomized despite markedly unequal group sizes, can be found quite often in the literature. In some cases, one can “understand” the reasoning behind this (e.g. different numbers of wild-type and transgenic animals available). In other cases, readers may ask themselves whether they understand the term “randomization” the same way the authors do (see the sketch below).
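To make the distinction concrete, here is a minimal sketch (in Python, with hypothetical group labels for a 2×2 design like the one above) contrasting simple randomization, where group sizes can end up markedly unequal by chance, with block randomization, which keeps them balanced:

```python
import random

# Hypothetical group labels for a 2x2 design (not taken from the cited paper).
GROUPS = ["vehicle/sham", "vehicle/Abeta", "mefenamic/sham", "mefenamic/Abeta"]

def simple_randomization(n_animals, groups):
    # Each animal is assigned independently ("coin flip" per animal):
    # group sizes such as 5 vs. 10 can easily arise by chance.
    return [random.choice(groups) for _ in range(n_animals)]

def block_randomization(n_animals, groups):
    # Animals are assigned in shuffled blocks containing each group once,
    # so the final group sizes are (near-)equal by construction.
    assignment = []
    while len(assignment) < n_animals:
        block = list(groups)
        random.shuffle(block)
        assignment.extend(block)
    return assignment[:n_animals]

if __name__ == "__main__":
    random.seed(1)
    for name, scheme in [("simple", simple_randomization), ("block", block_randomization)]:
        allocation = scheme(28, GROUPS)
        print(name, {g: allocation.count(g) for g in GROUPS})
```

Both schemes count as “random assignment”, but only the first can plausibly produce an “n=5-10” spread in a balanced design – which is why a bare statement that animals were randomized does not tell the reader what was actually done.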

This is where we therefore see a need for further improvement of editorial policies (and of guidelines such as ARRIVE): providing clear definitions of key terms and methodology, and requesting that authors provide sufficient experimental detail to avoid any confusion.