Comment on Walsh et al. “The statistical significance of randomized controlled trial results is frequently fragile: a case for a Fragility Index” J Clin Epidemiol 67: 622-628, 2014
Despite common misconceptions, a p-value does not tell us anything about truth (i.e., that an observed finding in a sample is representative of the underlying population of interest); it only describes the probability that a difference at least as large as the observed one could have arisen by chance alone if in reality there is no difference. A p-value can no longer be interpreted at face value if the data being analyzed do not represent random samples, for instance because of unconscious bias in sampling, study execution, data analysis, or reporting. Even worse, it cannot be interpreted at face value if the investigators have actively violated the randomness principle by p-hacking (Motulsky, 2014). Even if none of this has happened, a relevant percentage of statistically significant findings may be false – a phenomenon largely driven by the a priori probability of an observation (Ioannidis, 2005). On top of these problems comes the issue of small sample sizes leading to fickle p-values (Halsey et al., 2015).
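The "fickle p-values" point can be made concrete with a small simulation. The sketch below is illustrative only, not code from Halsey et al.; it uses a self-contained two-sample permutation test rather than a t-test, and all names, seeds, and parameters are our own choices:

```python
# Illustrative simulation (not from the cited papers) of "fickle" p-values:
# the same true effect (means 0 vs. 1, SD 1), sampled repeatedly with only
# n = 10 per group, yields widely varying p-values from run to run.
import random
from statistics import mean

def perm_p(x, y, n_perm=2000, rng=random.Random(0)):
    """Two-sided permutation-test p-value for a difference in means."""
    obs = abs(mean(x) - mean(y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(mean(pooled[:len(x)]) - mean(pooled[len(x):])) >= obs:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one rule avoids p = 0

data_rng = random.Random(1)
ps = [perm_p([data_rng.gauss(0, 1) for _ in range(10)],
             [data_rng.gauss(1, 1) for _ in range(10)])
      for _ in range(20)]
# Despite an identical underlying difference in every repeat, the 20
# p-values scatter widely, falling on both sides of the 0.05 threshold.
```

With a true effect of one standard deviation and only ten observations per group, the test has modest power, so some repeats come out "significant" and others do not – purely through sampling variation.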
Canadian investigators have added a further twist (Walsh et al., 2014): in modelling experiments based on 399 randomized controlled trials, they added events to the control group in a step-wise fashion until the p-value exceeded 0.05, and called the number of added events required the Fragility Index. Interestingly, the Fragility Index was smaller than the number of patients lost to follow-up in 53% of the trials analyzed. These findings show that the statistical significance of results from randomized clinical trials often hinges on a small number of events. This underlines the general recommendation to focus reporting on effect sizes with confidence intervals rather than on p-values (Michel et al., 2020).
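The step-wise procedure lends itself to a compact illustration. The sketch below is our own minimal reading of the approach described above, not Walsh et al.'s actual code: it assumes a two-sided Fisher's exact test as the significance criterion and converts non-events to events, one patient at a time, in the arm with fewer events until significance is lost:

```python
# Minimal sketch of a Fragility Index computation (our reading of the
# step-wise procedure; assumes Fisher's exact test as the criterion).
from math import comb

def fisher_p(a, b, c, d):
    """Two-sided Fisher's exact p-value for a 2x2 table [[a, b], [c, d]]:
    sum the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one. Integer
    weights keep the tie comparison exact."""
    row1, row2, col1 = a + b, c + d, a + c
    def num(x):  # unnormalized hypergeometric weight of table with cell x
        return comb(row1, x) * comb(row2, col1 - x)
    obs = num(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(num(x) for x in range(lo, hi + 1)
               if num(x) <= obs) / comb(row1 + row2, col1)

def fragility_index(e1, n1, e2, n2, alpha=0.05):
    """Convert non-events to events, one at a time, in the arm with
    fewer events until two-sided significance is lost; the count is the
    Fragility Index (0 if the result was not significant to begin with)."""
    if e1 > e2:  # work on the arm with fewer events
        e1, n1, e2, n2 = e2, n2, e1, n1
    fi = 0
    while e1 < n1 and fisher_p(e1, n1 - e1, e2, n2 - e2) < alpha:
        e1 += 1
        fi += 1
    return fi
```

For example, a trial with 1/100 events in one arm versus 10/100 in the other is clearly significant (p ≈ 0.01), yet converting just two patients from non-event to event pushes p above 0.05 – a Fragility Index of 2.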
Preprint usage is growing rapidly in the life sciences; however, questions remain on the relative quality of preprints when compared to published articles. An objective dimension of quality that is readily measurable is completeness of reporting, as transparency can improve the reader’s ability to independently interpret data and reproduce findings. In this study, the authors compared random samples of articles published in bioRxiv and in PubMed-indexed journals in 2016 using a quality of reporting questionnaire. They found that peer-reviewed articles had, on average, a higher quality of reporting than preprints, although the difference was small: across roughly 25 categories of reporting, peer review caught on average just one deficiency per manuscript. Although the sample size was small and only 56 bioRxiv preprints were analyzed, these results indicate that the quality of reporting in life-science preprints is within a similar range to that of peer-reviewed articles, supporting the idea that preprints should be considered valid scientific contributions.
There is a pressing need to increase the rigor of research in the life and biomedical sciences. To address this issue, the authors propose to establish communities of “rigor champions”, who campaign for reforms of the research culture that has led to shortcomings in rigor. These rigor champions would also assist in the development and adoption of a comprehensive educational platform that would teach the principles of rigorous science to researchers at all career stages.
The reproducibility crisis triggered worldwide initiatives to improve rigor, reproducibility, and transparency in biomedical research, and there are many examples of scientists, journals, and funding agencies adopting responsible research practices. The Berlin Institute of Health founded the QUEST (Quality-Ethics-Open Science-Translation) Center to increase the likelihood that research conducted at this large academic medical center would be trustworthy, useful for scientists and society, and ethical; QUEST therefore offers a unique opportunity to examine the role of institutions. QUEST researchers perform “science of science” studies to understand problems with standard practices and develop targeted solutions. The staff work with institutional leadership and local scientists to incentivize and support responsible practices in research, funding, and hiring. Some activities described in this paper focus on the institution, whereas others may benefit the national and international scientific community. The experiences, approaches, and recommendations of the QUEST Center will be informative for faculty leadership, administrators, and researchers interested in improving scientific practice.
There is an ongoing debate regarding the robustness and credibility of published scientific research. In this article, the authors argue that these issues stem from two broad causal mechanisms: the cognitive biases of researchers and the incentive structures within which researchers operate. To address these issues, the UK Reproducibility Network (UKRN), introduced in this paper, was founded. The UKRN supports initiatives at various levels across the UK research system: it investigates the factors that contribute to robust research, promotes training activities, disseminates best practice, and works with stakeholders (researchers, institutions, funders, publishers, and others) to coordinate efforts across the sector.