Die Reproduzierbarkeitskrise: Bedrohung oder Chance für die Wissenschaft? (The reproducibility crisis: threat or opportunity for science? In German)

In this Editorial, published in Biologie in unserer Zeit, Martin C. Michel and Ralf Dahm discuss threats and opportunities related to the current reproducibility crisis in biomedical sciences.
The authors highlight several top-down approaches currently in place to increase data quality and reproducibility: the BMBF, the EU and the NIH have launched research programs on the topic of reproducibility; various specialist journals (e.g. Nature or Molecular Pharmacology) have adapted their guidelines for authors; and the DFG has published new guidelines for Good Scientific Practice and declared them binding for all DFG-funded scientists.
In addition, there is an increasing number of bottom-up initiatives, such as the European Quality in Preclinical Data (EQIPD) project (https://quality-preclinical-data.eu/) or the Global Preclinical Data Forum (https://www.preclinicaldataforum.org). Such initiatives, as well as professional organizations like the PAASP Network (www.paasp.net), offer solutions, advice and training to promote preclinical data quality.


Systematic review of guidelines for internal validity in the design, conduct and analysis of preclinical biomedical experiments involving laboratory animals

Several initiatives have set out to increase transparency and internal validity of preclinical studies. While many of the points raised in these various guidelines are identical or similar, they differ in detail and rigour. Most of them focus on reporting; only a few cover the planning and conduct of studies.
The aim of this systematic review was to identify existing experimental design, conduct, analysis and reporting guidelines relating to preclinical animal research. Based on a systematic search in PubMed, Embase and Web of Science, 58 unique recommendations were extracted. Amongst the most recommended items were sample size calculations, adequate statistical methods, concealed and randomised allocation of animals to treatment, blinded outcome assessment and recording of animal flow through the experiment.
The authors highlight that, although these recommendations are valuable, there is a striking lack of experimental evidence on their importance and relative effect on experiments and effect sizes.
This work is part of the European Quality In Preclinical Data (EQIPD) consortium.


Variability in the analysis of a single neuroimaging dataset by many teams

To test the reproducibility and robustness of results obtained in the neuroimaging field, 70 independent teams of neuroimaging experts from across the globe were asked to analyze and interpret the same functional magnetic resonance imaging dataset.
The authors found that no two teams chose identical workflows to analyse the data – a consequence of the degrees of freedom and flexibility around the best suited analytical approaches.
This flexibility resulted in sizeable variation in the results of hypothesis tests, even for teams whose statistical maps were highly correlated at intermediate stages of the analysis pipeline. Variation in reported results was related to several aspects of analysis methodology. Notably, a meta-analytical approach that aggregated information across teams yielded a significant consensus in activated regions. Furthermore, prediction markets of researchers in the field revealed an overestimation of the likelihood of significant findings, even by researchers with direct knowledge of the dataset. These findings show that analytical flexibility can have substantial effects on scientific conclusions, and identify factors that may be related to variability in the analysis of functional magnetic resonance imaging. The results emphasize the importance of validating and sharing complex analysis workflows and the need for experts in the field to come together and discuss what minimum reporting standards are.
The most straightforward way to combat such (unintentional) degrees of freedom is to have detailed data processing and analysis protocols as part of the study plans. As this example illustrates, such protocols need to be checked by independent scientists to make sure that they are complete and unequivocal. While the imaging field is complex and data analysis cannot be described in one sentence, the need for sufficiently detailed study plans is also a message to pre-registration platforms, which should not impose any restrictions on the amount of information being pre-registered.


Covid-19 and the PPV

There has been a lot in the news recently about using antibody tests to detect people who have had Covid-19 and who might therefore be immune to further infection (this remains to be proven). Despite the relatively good performance of many of the proposed antibody tests, with sensitivity and specificity above 95% in most cases, the low prevalence of infection means that the Positive Predictive Value (PPV) of these tests is low.
For example, a test with 98% sensitivity (ability to detect true positives) and 98% specificity (ability to avoid false positives) will still give positive results of which nearly 30% are false if the underlying rate of infection is around 5% (i.e. a PPV of 72.1%). Thus, for every 1000 people tested there will be on average 50 people with antibodies to SARS-CoV-2, and the test will correctly detect 49 of those 50. This is the result of the test being 98% sensitive. However, the number of false positives among the 950 people who do not have antibodies will be 19. Despite the high specificity of the test, the high number of non-infected people means that the 2% of them that give a false positive result will be a large proportion of the overall number of positive tests: in this case 19 out of 68. This has quite rightly been pointed out by health authorities as insufficient to qualify people as ‘immune’.
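The arithmetic above can be reproduced in a few lines. This is only a minimal sketch: the function names `ppv` and `expected_counts` are our own illustrative choices, and the inputs are the 98% / 98% / 5% figures from the example.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value: share of positive results that are true positives."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def expected_counts(n: int, sensitivity: float, specificity: float, prevalence: float):
    """Expected numbers of true and false positives among n people tested."""
    with_antibodies = n * prevalence
    without_antibodies = n - with_antibodies
    true_pos = sensitivity * with_antibodies
    false_pos = (1.0 - specificity) * without_antibodies
    return true_pos, false_pos


# Figures from the example: 98% sensitivity, 98% specificity, 5% prevalence.
tp, fp = expected_counts(1000, 0.98, 0.98, 0.05)
print(f"true positives: {tp:.0f}, false positives: {fp:.0f}")  # 49 and 19
print(f"PPV: {ppv(0.98, 0.98, 0.05):.1%}")                     # 72.1%
```

Changing only the `prevalence` argument shows how strongly the PPV depends on the underlying rate, even when the test itself is unchanged.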
However, there is a relatively simple solution. By using two independent antibody tests with similar specificity and sensitivity, PPV can be increased to 99.2% if we consider as positive only those who are positive under both tests. This is because 48 of the previous positives will again be positive (98% of 49) but no more than 1 of the 19 false positives will give a second positive test (only 0.4 people, i.e. 2% of 19). The two tests would have to be truly independent, but as there are now numerous tests available (or in development) it should be possible to find two that can be combined to achieve a PPV that can be useful for making public health decisions.
What we have done here, in effect, is to increase the underlying rate of true positives for the second test (to 72%, 49 of the initial 68 positive tests). Under these conditions our antibody test meets our expectations of what a 98% test should do.
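The idea that the first test raises the underlying rate for the second can be sketched as a simple chain, in which the PPV after each positive result becomes the prior for the next test. This assumes, as stated above, that the tests are truly independent; the helper name `ppv_after_tests` is hypothetical.

```python
def ppv_after_tests(prevalence: float, tests) -> float:
    """Probability of truly having antibodies after testing positive on every test.

    `tests` is a sequence of (sensitivity, specificity) pairs. The tests are
    ASSUMED to be independent of each other given the true antibody status.
    """
    prior = prevalence
    for sensitivity, specificity in tests:
        true_pos = sensitivity * prior
        false_pos = (1.0 - specificity) * (1.0 - prior)
        # The PPV after this test becomes the underlying rate for the next one.
        prior = true_pos / (true_pos + false_pos)
    return prior


one_test = ppv_after_tests(0.05, [(0.98, 0.98)])
two_tests = ppv_after_tests(0.05, [(0.98, 0.98), (0.98, 0.98)])
print(f"one test: {one_test:.1%}, two tests: {two_tests:.1%}")  # 72.1%, 99.2%
```

If the two tests shared a common source of error (for instance cross-reactivity with the same antibodies), the independence assumption would fail and the real gain would be smaller than this sketch suggests.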
Too often we look at test performance without considering the underlying rate. Tests to predict low-incidence disease must be extremely good to be useful as diagnostic tools, simply because of the large number of true negatives. The same holds true for the analysis of most scientific experiments: we use an alpha of 0.05 and a beta of 0.2 to decide if our studies are significant. This corresponds to 95% specificity (1 minus alpha) and 80% sensitivity (1 minus beta, the power). In the example above that would result in 40 true positives but about 48 false positives, so more than half of the positive results would be false!
We rarely know what the true underlying rate of true positives is in our experiments. We might suspect that for structured, confirmatory studies it is quite high, maybe above 50%. But for exploratory studies, screening of compound libraries etc. it might be less than 5%. Interpretation and analysis of these experiments need to consider such differences if we want to reach robust conclusions. With the importance of underlying rate for calculating PPV now getting a more public airing, we can only hope that it will be considered more in data analysis.
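To make the dependence on the underlying rate concrete, the same calculation can be applied to experiments by treating the power (1 minus beta) as the sensitivity and 1 minus alpha as the specificity. The sketch below uses the conventional alpha of 0.05 and beta of 0.2; the two prior probabilities are illustrative values taken from the rough ranges mentioned above, not measured rates, and `study_ppv` is a name we made up.

```python
def study_ppv(alpha: float, power: float, prior: float) -> float:
    """Share of 'significant' results that reflect a true effect.

    alpha: probability of a significant result when there is no effect
    power: probability of a significant result when there is an effect (1 - beta)
    prior: assumed share of tested hypotheses that are actually true
    """
    true_pos = power * prior
    false_pos = alpha * (1.0 - prior)
    return true_pos / (true_pos + false_pos)


# Illustrative priors only: 5% for exploratory screening, 50% for confirmatory work.
for label, prior in (("exploratory", 0.05), ("confirmatory", 0.50)):
    print(f"{label} (prior {prior:.0%}): PPV = {study_ppv(0.05, 0.80, prior):.1%}")
# exploratory (prior 5%): PPV = 45.7%
# confirmatory (prior 50%): PPV = 94.1%
```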

Additional note:
Whilst this piece was being written, a new FDA-approved test by Roche appeared that claims 100% sensitivity and 99.8% specificity, which would raise the PPV to 96.3% at the same 5% rate of infection. This shows that the bar needs to be set very high for low-incidence events, but that Roche seem to have succeeded.

Conflict of interest – Education needed!

A “Conflict of Interest (CoI)” is a very special situation of which we are usually reminded when a famous scientist gets criticized – most often in connection with highly publicized areas of science such as CRISPR or Covid-19.

For many scientists in academia, awareness of CoI issues is limited to what is found in the journals’ Instructions for Authors. For example, the European Journal of Neuroscience (EJN) states:

“EJN requires that all authors disclose any potential sources of conflict of interest. Any interest or relationship, financial or otherwise, which might be perceived as influencing an author’s objectivity is considered a potential source of conflict of interest. The existence of a conflict of interest does not preclude publication in this journal. The absence of conflicts of interest should also be stated. It is the responsibility of the corresponding author to ensure this policy is adhered to.”

Those who would like to understand whether there is something to disclose will have to do some research of their own and search for relevant guidance. One credible source of information is the 2009 book “Conflict of Interest in Medical Research, Education, and Practice” published by the Institute of Medicine. Those who have the time and patience to read its 392 pages will be equipped with all the essential knowledge and will understand how to judge whether there is a CoI or not:

“Conflicts of interest are defined as circumstances that create a risk that professional judgments or actions regarding a primary interest will be unduly influenced by a secondary interest. Primary interests include promoting and protecting the integrity of research, the quality of medical education, and the welfare of patients. Secondary interests include not only financial interests—the focus of this report—but also other interests, such as the pursuit of professional advancement and recognition and the desire to do favors for friends, family, students, or colleagues. Conflict of interest policies typically focus on financial gain because it is relatively more objective, fungible, and quantifiable. Financial gain can therefore be more effectively and fairly regulated than other secondary interests.

The severity of a conflict of interest depends on (1) the likelihood that professional decisions made under the relevant circumstances would be unduly influenced by a secondary interest and (2) the seriousness of the harm or wrong that could result from such an influence. The likelihood of undue influence is affected by the value of the secondary interest, its duration and depth, and the extent of discretion that the individual has in making important decisions.”

Those who search such books for quick guidance may find the following table useful:

Table 1: Candidate List of Categories of Financial Relationships with Industry to be Disclosed

Research grants and contracts
Consulting agreements
Participation in speakers bureaus
Intellectual property, including patents, royalties, licensing fees
Stock, options, warrants, and other ownership (excepting general mutual funds)
Position with a company:
– Company governing boards
– Technical advisory committees, scientific advisory boards, and marketing panels
– Company employee or officer, full or part time
Authorship of publications prepared by others
Expert witness for a plaintiff or defendant
Other payments or financial relationships

Having such summary tables with examples is indeed very useful, as they may support decision making in some cases. Yet one should not view such tables as checklists that include all potential examples: there is still the need to understand what may constitute a CoI and to treat every situation individually.

Indeed, the title of the table above focuses on relationships with industry, and it may therefore create the misunderstanding that these categories are relevant only where a relationship with industry already exists.

In a recent example, an academic discovery originated from research that was funded by a non-profit patient advocacy group, with apparently no industry involvement either at the time when a patent application was filed or at a later time point when a related scientific publication appeared. However, even in such a case, the author should have disclosed the existence of a patent application and informed the reader about a potential conflict of interest. This is why conflict of interest policies generally emphasize preventive measures such as transparent disclosures.

Fun, fun, fun…

A mathematician, a physicist, and a statistician went hunting for deer. When they chanced upon one buck lounging about, the mathematician fired first, missing the buck’s nose by a few inches.
The physicist then tried his hand and missed the tail by a wee bit.
The statistician started jumping up and down saying, “We got him! We got him!”
(from the website CrossValidated)