Commentary March 2020

Comment on Walsh et al. “The statistical significance of randomized controlled trial results is frequently fragile: a case for a Fragility Index” J Clin Epidemiol 67: 622-628, 2014

Contrary to a common misconception, a p-value does not tell us anything about truth (i.e. whether an observed finding in a sample is representative of the underlying population of interest); it only describes the probability of finding a difference at least as large as the one observed by chance alone if in reality there is no difference. A p-value can no longer be interpreted at face value if the data being analyzed do not represent random samples, for instance because of unconscious bias in sampling, study execution, data analysis or reporting. Worse still, it cannot be interpreted at face value if the investigators have actively violated the randomness principle by p-hacking (Motulsky, 2014). Even if none of this has happened, a relevant percentage of statistically significant findings may be false, a phenomenon largely driven by the prior probability that the tested hypothesis is true (Ioannidis, 2005). On top of these problems, small sample sizes lead to fickle p-values (Halsey et al., 2015).
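
To illustrate how the prior probability drives the share of false findings, the following Python sketch uses the bias-free form of the positive-predictive-value argument in Ioannidis (2005), expressed with a prior probability instead of pre-study odds. The power of 0.8, the alpha of 0.05 and the prior values are illustrative assumptions, and `false_positive_share` is a hypothetical helper name.

```python
def false_positive_share(prior, power=0.8, alpha=0.05):
    """Expected share of 'significant' results that are false positives.

    prior: assumed probability that a tested hypothesis is true before the study.
    Bias and p-hacking are ignored; both would push the share higher.
    """
    true_pos = power * prior         # real effects detected at p < alpha
    false_pos = alpha * (1 - prior)  # null effects reaching p < alpha by chance
    return false_pos / (true_pos + false_pos)


for prior in (0.5, 0.1, 0.01):
    share = false_positive_share(prior)
    print(f"prior {prior:.2f}: {share:.0%} of significant findings are false")
```

Under these assumptions, roughly 6%, 36% and 86% of significant findings would be false for priors of 0.5, 0.1 and 0.01, respectively, even without any bias.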

Canadian investigators have added a further twist to this (Walsh et al., 2014): based on 399 randomized controlled trials reporting statistically significant dichotomous outcomes, they converted non-events into events one patient at a time in the group with fewer events until the p-value reached or exceeded 0.05, and called the number of patients required the Fragility Index. Interestingly, in 53% of the trials reporting loss to follow-up, the Fragility Index was smaller than the number of patients lost to follow-up. These findings show that the statistical significance of results from randomized clinical trials often hinges on a small number of events. This reinforces the general recommendation to focus reporting on effect sizes with confidence intervals rather than on p-values (Michel et al., 2020).
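
The Fragility Index procedure can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes a two-arm trial with a binary outcome, a two-sided Fisher's exact test and a threshold of 0.05, and converts non-events to events in the arm with fewer events while keeping arm sizes fixed, as described by Walsh et al. The helper names `fisher_exact_two_sided` and `fragility_index` are hypothetical, and the Fisher test is implemented by hand with `math.comb` rather than taken from a statistics library.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are trial arms, columns are (events, non-events). The p-value sums
    the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(n - row1, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # A small relative tolerance treats floating-point ties as "as extreme".
    p = sum(q for q in map(prob, range(lo, hi + 1)) if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def fragility_index(events_a, n_a, events_b, n_b, alpha=0.05):
    """Non-event -> event conversions needed before p >= alpha (hypothetical helper).

    Events are added one at a time to the arm with fewer events, keeping
    both arm sizes fixed. Returns 0 if the result is not significant.
    """
    # Make arm "a" the arm with fewer events.
    if events_b < events_a:
        events_a, n_a, events_b, n_b = events_b, n_b, events_a, n_a
    flips = 0
    while fisher_exact_two_sided(events_a, n_a - events_a, events_b, n_b - events_b) < alpha:
        if events_a == n_a:
            raise ValueError("no non-events left to convert in the arm with fewer events")
        events_a += 1
        flips += 1
    return flips
```

For example, with 0 of 5 patients having an event in one arm and 5 of 5 in the other (two-sided p ≈ 0.008), `fragility_index(0, 5, 5, 5)` returns 2: converting one non-event still leaves p ≈ 0.048, while a second conversion raises it to about 0.17.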

Additional reads in March 2020

Preregistration of exploratory research: Learning from the golden age of discovery

What Research Institutions Can Do to Foster Research Integrity

When Science Needs Self-Correcting

Journal transparency index will be ‘alternative’ to impact scores

Universities ‘should bolster their research integrity policies’

Wozu Tierversuche? Medikamente gibt’s doch in der Apotheke (in German)

Scientists offered €1,000 to publish null results

Irreproducibility is not a sign of failure, but an inspiration for fresh ideas

How bad statistical practices drive away the good ones

How a ‘no raw data, no science’ outlook can resolve the reproducibility crisis in science

The Data Must Be Accessible to All

In praise of replication studies and null results

Improving the trustworthiness, usefulness, and ethics of biomedical research through an innovative and comprehensive institutional initiative

Find a home for every imaging data set

A controlled trial for reproducibility

Challenges to the Reproducibility of Machine Learning Models in Health Care

The BJP expects authors to share data

Getting To The Root Of Poor ELISA Data Reproducibility

Comparing quality of reporting between preprints and peer-reviewed articles in the biomedical literature

Preprint usage is growing rapidly in the life sciences; however, questions remain about the quality of preprints relative to published articles. An objective and readily measurable dimension of quality is completeness of reporting, as transparency can improve the reader’s ability to interpret data independently and to reproduce findings. In this study, the authors compared random samples of articles published in bioRxiv and in PubMed-indexed journals in 2016 using a quality-of-reporting questionnaire. Peer-reviewed articles had, on average, a higher quality of reporting than preprints, but the difference was small: on average, peer review caught just one deficiency per manuscript in about 25 categories of reporting.
Although the sample size was small (only 56 bioRxiv preprints were analysed), these results indicate that the quality of reporting of preprints in the life sciences is within a similar range to that of peer-reviewed articles, supporting the idea that preprints should be considered valid scientific contributions.


Research Culture: Framework for advancing rigorous research

There is a pressing need to increase the rigor of research in the life and biomedical sciences. To address this issue, the authors propose to establish communities of ‘rigor champions’, who campaign for reforms of the research culture that has led to shortcomings in rigor. These rigor champions would also assist in the development and adoption of a comprehensive educational platform that would teach the principles of rigorous science to researchers at all career stages.


Improving the trustworthiness, usefulness, and ethics of biomedical research through an innovative and comprehensive institutional initiative

The reproducibility crisis triggered worldwide initiatives to improve rigor, reproducibility, and transparency in biomedical research. There are many examples of scientists, journals, and funding agencies adopting responsible research practices. The QUEST (Quality-Ethics-Open Science-Translation) Center offers a unique opportunity to examine the role of institutions. The Berlin Institute of Health founded QUEST to increase the likelihood that research conducted at this large academic medical center would be trustworthy, useful for scientists and society, and ethical. QUEST researchers perform “science of science” studies to understand problems with standard practices and develop targeted solutions. The staff work with institutional leadership and local scientists to incentivize and support responsible practices in research, funding, and hiring. Some activities described in this paper focus on the institution, whereas others may benefit the national and international scientific community. The experiences, approaches, and recommendations of the QUEST Center will be informative for faculty leadership, administrators, and researchers interested in improving scientific practice.


Research Culture and Reproducibility

There is an ongoing debate regarding the robustness and credibility of published scientific research. In this article, the authors argue that these issues stem from two broad causal mechanisms: the cognitive biases of researchers and the incentive structures within which researchers operate. To address these issues, the UK Reproducibility Network (UKRN) was founded and is introduced in this paper. The UKRN supports several initiatives at various levels across the UK research system: it investigates the factors that contribute to robust research, promotes training activities, disseminates best practice, and works with stakeholders (such as researchers, institutions, funders and publishers) to coordinate efforts across the sector.