The Embassy of Good Science – An online platform fostering research integrity

By Iris Lechner
The field of research integrity is growing substantially. An increasing number of guidelines and initiatives to foster responsible research practices are being implemented worldwide.
Individual researchers, however, sometimes find it difficult to know which policies, codes and rules of good research practice apply to them in their specific context. In addition, surviving in academia is not easy, and short-cuts are still too often rewarded. To make research integrity information easily accessible to researchers, the European consortium EnTIRE developed the online platform The Embassy of Good Science, where a wide range of information, resources, and tools can be found.
Uniquely, the platform is both for researchers and made by researchers. The research community can easily search for and find information about good research practices relevant to their work. The Embassy contains short explanatory ‘theme pages’ which introduce specific research integrity topics in a way that is understandable for all researchers. A wide range of topics has already been covered, including plagiarism, authorship, the FAIR principles, p-hacking and conflicts of interest.
This unique approach is made possible by Semantic MediaWiki, which allows individual researchers to add and edit information. Theme pages are automatically linked to the relevant resources on the platform, including guidelines, cases and educational tools. For example, if you read a theme page on plagiarism, a case on plagiarism and an online training module are shown on the same page. In this way, researchers can access all the information needed to navigate the complex web research integrity has become. Online training modules on specific research integrity topics, aimed at increasing researchers’ knowledge of and commitment to research integrity in their everyday practice, will also be made available shortly.
Interested in the platform? Check out the short introduction video and visit The Embassy of Good Science.
The EQIPD project is also present on The Embassy of Good Science platform and summarized HERE.

Commentary March 2020

Comment on Walsh et al. “The statistical significance of randomized controlled trial results is frequently fragile: a case for a Fragility Index” J Clin Epidemiol 67: 622-628, 2014

Despite common misconceptions, a p-value does not tell us anything about truth (i.e. that an observed finding in a sample is representative for the underlying population of interest); it only describes the probability that a difference at least as large as the one being observed could have been found based on chance alone if in reality there is no difference. A p-value can no longer be interpreted at face value if the data being analyzed do not represent random samples, for instance because of unconscious bias in sampling, study execution, data analysis or reporting. Even worse, it can no longer be interpreted at face value if the investigators have actively violated the randomness principle by p-hacking (Motulsky, 2014). Even if none of this has happened, a relevant percentage of statistically significant findings may be false – a phenomenon largely driven by the a priori probability of an observation (Ioannidis, 2005). Add on top of these problems the issue of small sample sizes leading to fickle p-values (Halsey et al., 2015).
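The arithmetic behind the Ioannidis (2005) point can be sketched in a few lines of Python. The alpha, power, and prior values below are illustrative assumptions, not figures from the cited papers:

```python
# Illustrative sketch (assumed numbers): positive predictive value (PPV)
# of a statistically significant finding, following the logic of
# Ioannidis (2005). The lower the a priori probability that the tested
# hypothesis is true, the more significant findings are false.
def ppv(prior, alpha=0.05, power=0.8):
    """Probability that a 'significant' result reflects a true effect."""
    true_pos = prior * power          # true effects that are detected
    false_pos = (1 - prior) * alpha   # null effects crossing the alpha threshold
    return true_pos / (true_pos + false_pos)

for prior in (0.5, 0.1, 0.01):
    print(f"prior = {prior:<5} PPV = {ppv(prior):.2f}")
```

With a prior of 0.5 the PPV is about 0.94, but with a prior of 0.01 it drops below 0.15: most significant findings would then be false, even with perfectly random sampling and no p-hacking.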

Canadian investigators have added a further twist (Walsh et al., 2014): in modelling experiments based on 399 randomized controlled trials, they added events to the control group in a step-wise fashion until the p-value exceeded 0.05; the number of added events required is the Fragility Index. Interestingly, the Fragility Index was smaller than the number of patients lost to follow-up in 53% of the trials analyzed. These findings show that the statistical significance of results from randomized clinical trials often hinges on a small number of events. This underscores the general recommendation to focus reporting on effect sizes with confidence intervals rather than on p-values (Michel et al., 2020).
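For readers who want to explore the idea, here is a minimal sketch of how a Fragility Index can be computed for a single 2x2 trial result. It uses a hand-rolled two-sided Fisher exact test so that only the Python standard library is needed; this is a simplified illustration, not the exact procedure of Walsh et al.:

```python
import math

def fisher_p(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]:
    sum the hypergeometric probabilities of all tables that are no more
    probable than the observed one."""
    n, row1, col1 = a + b + c + d, a + b, a + c
    def prob(x):
        return (math.comb(col1, x) * math.comb(n - col1, row1 - x)
                / math.comb(n, row1))
    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))  # tolerance for float rounding

def fragility_index(events_a, n_a, events_b, n_b, alpha=0.05):
    """Count how many non-events must be swapped for events in the group
    with fewer events before the result is no longer significant."""
    if events_a > events_b:  # work on the group with fewer events
        events_a, n_a, events_b, n_b = events_b, n_b, events_a, n_a
    index = 0
    while (events_a < n_a and
           fisher_p(events_a, n_a - events_a,
                    events_b, n_b - events_b) < alpha):
        events_a += 1  # one non-event becomes an event
        index += 1
    return index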

Improving quality of preclinical academic research through auditing: A feasibility study

Commentary provided by Claudia Kurreck (Department of Experimental Neurology, Charité-Universitätsmedizin Berlin, Germany)
The performance of audits and assessments is always a sensitive issue in the academic preclinical research environment. The most important argument against conducting audits in the scientific community has been that audits would restrict the freedom of research; in the past, they were even perceived as an inquisition. In this paper we describe that the opposite is true. Within six years, we were able to gain extensive experience with very different forms of audits and assessments within two different QM systems (ISO 9001 and PREMIER).
We investigate practicable options for auditing which have the potential to improve the quality of preclinical research in academia, list specific recommendations regarding their benefits, and provide practical resources for their implementation.
We have been able to show that audits and assessments are an important quality assurance tool that can be used even without an existing QM system. In the course of our investigations, we noticed a change in mentality regarding audits among our employees: audits, such as method audits to avoid protocol drift, are now actively requested rather than strongly rejected.

Of lab notebooks and leaders in science

Good research practices are often said to work against the current success model in science (based on positive results, number and impact factor of publications, etc.). Changing the way we work requires the current science leaders to act as role models and set good examples. We applaud Dr Frances Arnold, a Nobel Prize winner, for retracting a Science paper with the following explanation: “Careful examination of the first author’s lab notebook then revealed missing contemporaneous entries and raw data for key experiments. The authors are therefore retracting the paper.”

Why is this example so important?
First, there are still far too many labs that keep no lab notebooks at all, which makes it impossible even to explain to them what raw data actually mean and why it is so important that raw data meet key expectations (e.g. being contemporaneous).
Second, as a true leader, Dr Arnold took personal responsibility for this error (see her tweet HERE).

Is N-Hacking ever OK?

It has been proposed repeatedly that adding samples based on the results of initial experiments is a form of p-hacking (see e.g. the new Instructions to Authors of the journals of the American Society for Pharmacology and Experimental Therapeutics). While these recommendations were based on sound theoretical considerations, Pamela Reinagel from San Diego uses Monte-Carlo simulations, in a manuscript not yet peer reviewed, to quantify the impact on false positives of dynamically adjusting sample size, which she calls n-hacking. Interestingly, her analysis shows that n-hacking increases false positives and that effect sizes and prior probability are key drivers of this effect.

However, her simulations also suggest that the positive predictive value increases and is higher than that of non-incremental experiments. Apparently, the increase in false positives is more than offset by the increase in true positives. She proposes that post-hoc increases in sample size must be disclosed but could confer previously unappreciated advantages for efficiency and positive predictive value. However, she also warns that adapting sample sizes requires careful consideration of p-value correction, because n-hacking is essentially a form of increasing the statistical alpha.
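A quick null simulation makes the alpha inflation concrete. The sketch below is not Reinagel's analysis; it is a simplified illustration (unit-variance normal data, a z-test instead of a t-test, and arbitrarily chosen look points) of how rechecking the p-value after each batch of added samples pushes the false positive rate above the nominal 0.05:

```python
import math
import random

def p_two_sided(x, y):
    """Two-sided z-test for equal means, assuming known unit variance
    (a simplification; real data would call for a t-test)."""
    n = len(x)
    z = (sum(x) / n - sum(y) / n) / math.sqrt(2 / n)
    return math.erfc(abs(z) / math.sqrt(2))

def null_experiment(n0, n_add, max_n, alpha=0.05):
    """One experiment with no true effect, run with n-hacking: start with
    n0 samples per group and add n_add more per group whenever the result
    is not yet significant, up to max_n samples per group."""
    x = [random.gauss(0, 1) for _ in range(n0)]
    y = [random.gauss(0, 1) for _ in range(n0)]
    while p_two_sided(x, y) >= alpha and len(x) < max_n:
        x += [random.gauss(0, 1) for _ in range(n_add)]
        y += [random.gauss(0, 1) for _ in range(n_add)]
    return p_two_sided(x, y) < alpha  # True = false positive

random.seed(1)
trials = 5000
fpr = sum(null_experiment(10, 10, 50) for _ in range(trials)) / trials
print(f"false positive rate with n-hacking: {fpr:.3f} (nominal alpha: 0.05)")
```

With five looks at the data (n = 10, 20, 30, 40, 50 per group), the simulated false positive rate comes out well above the nominal 5%, which is exactly the inflated-alpha problem the commentary describes.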

Therefore, the proposal by Pamela Reinagel does not invalidate the argument that findings resulting from n-hacking/adapted sample sizes are no longer suitable for hypothesis-testing statistical analysis unless careful pre-specification is in place to adjust for the increased alpha.

Instead of replicating studies with problems, let’s replicate the good studies

A few months ago, the “Reproducibility and Replicability in Science” report from the National Academies of Sciences, Engineering, and Medicine was published. It included a set of criteria to help determine when testing replicability may be warranted:

1) The scientific results are important for individual decision-making or for policy decisions.
2) The results have the potential to make a large contribution to basic scientific knowledge.
3) The original result is particularly surprising, that is, it is unexpected in light of previous evidence and knowledge.
4) There is controversy about the topic.
5) There was potential bias in the original investigation, due, for example, to the source of funding.
6) There was a weakness or flaw in the design, methods, or analysis of the original study.
7) The cost of a replication is offset by the potential value in reaffirming the original results.
8) Future expensive and important studies will build on the original scientific results.

However, points 3-6 in particular encourage the reproduction of poor studies or studies with data quality issues.
Andrew Gelman offers an interesting alternative view: he encourages attempts to replicate the good studies. From this perspective, replication studies could indeed provide real incentives for scientists to focus on Good Research Practice and to conduct their studies with as little bias as possible.
As one of the commenters pointed out: “I put replications of my work in my CV. After all, it shows both that somebody was interested enough to repeat/continue the work and that they could.”