A long-awaited revision of the ARRIVE guidelines has finally been published (The ARRIVE guidelines 2019).What impact can we expect from ARRIVE 2.0? At the very least, we hope that the Nature journals will update their life sciences checklist, which will certainly have an impact. As an example, let’s look at the recently published paper that reported on the impact of gut microbiome on motor function and survival in the ALS mouse model.We picked this example because: i) it is recent, and ii) it uses the same SOD1 mouse model that has become a classical example of how adherence to higher research rigor standards turns “positive” data into “negative”.We would not use this example if the only finding was about changes in motor performance of the SOD1 mice exposed to long-term antibiotic cocktail – this is not be too surprising as antibiotics may penetrate into the CNS and may have effects unrelated to their “class” effects. And the survival data in the germ-free animals also do not make us focus on this paper because there were obvious problems with the rederivation itself (page 2, left column) and only insufficient information on study design is given (e.g. no details on surgery; no information on whether the colonized animals were also obtained via rederivation).The most striking are actually the data in Figure 3 “Akkermansia muciniphila colonization ameliorates motor degeneration and increases life-span in SOD1-Tg mice”.How would ARRIVE 2.0 help the reader to gain confidence in these data sets? In the Table below, we review the responses provided by the authors in the Life Sciences checklist and, stimulated by ARRIVE 2.0, indicate what information is missing to increase confident in these study results:

Published Life Sciences checklistWhat we would like to see
All in vivo experiments were randomized and controlled for gender, weight and cage effects.What methods were used for randomization and how can these methods explain highly unequal sample sizes within an experiment?
Sample sizes were determined based on previous studies and by the gold standards accepted in the field.A reference to the gold standards would be very helpful. The only gold standard in this field we are aware of – Scott et al. 2008 – would certainly not recommend using n = 5.
In all in vivo experiments the researches were blinded.This statement is insufficient to know whether blinding was maintained until data analysis.
No data were excluded from the manuscript.The experiment in Fig. 3 was repeated 6 (!) times with sample sizes between 5 and 26 resulting in the pooled sample sizes of up to 62 mice per group. However, survival data are presented only for 4-8 mice per group. It would be interesting to see survival data from the main pool of animals unless the main experiment was stopped at day 140.
All attempts of replication were successful and individual repeats are presented in ED and SI sections.Given that each of the “replication” experiments was severely underpowered, one may wonder whether these were indeed independent experiments or parts of a single study erroneously presented as “attempts of replication”.

We realize that ARRIVE 2.0 may not be sufficient to obtain answers to all of the above questions but this is certainly a major step forward that should be rigorously endorsed and promoted.