Over the last two decades, the neuropeptide oxytocin (OT) has been studied extensively, and many articles have been published on its role in human emotional and social life, e.g. increasing trust and sensitivity to others' feelings. There is even a TED talk ('Trust, morality – and oxytocin?') with over 1.4 million views.
These human trials were built on early animal studies, in which targeted manipulations of the OT system produced behavioral phenotypes affecting social cognition, bonding and individual recognition.
However, several recent publications question the sometimes bewildering evidence for the role of OT in shaping complex social processes in humans, and some of the most influential studies in the field could not be replicated. Furthermore, no elevated cerebrospinal fluid (CSF) OT levels could be detected 45 min after intranasal administration, the time window in which most behavioral tasks took place (Striepens et al., 2013); CSF OT concentrations were increased only after 75 minutes, indicating that OT pharmacokinetics is not fully understood. Moreover, it remains unclear whether the doses usually administered in the field (between 24 and 40 IU) deliver enough OT to the brain to produce significant behavioral changes (Leng et al., 2016).
This ultimately leads to the following question: 'If the published literature on OT effects does not reflect the true state of the world, how has the vast behavioral OT literature accumulated?' (Lane et al., 2016)
Several possible scenarios and reasons are currently discussed and analyzed among OT researchers, demonstrating the crucial importance of implementing Good Research Practice standards, proper study design and a priori statistical power calculations.

Power analysis:
A meta-analysis of the effects of OT on human behavior found that the average OT study in healthy individuals has a statistical power of 16%, with a median sample size of 49 individuals. For clinical trials, the statistical power was even lower (12%), with a median sample size of 26 individuals (Walum et al., 2016).
Hence, OT studies in humans are dangerously underpowered, as 80% is normally considered the minimum adequate statistical power. Even for the studies with the largest effect and sample sizes (N = 112), statistical power was below 70%. To achieve 80% power for the average reported effect size, a sample size of 352 healthy individuals would be needed (310 individuals for clinical trials).
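These numbers can be reproduced approximately with standard power-analysis tools. The sketch below assumes a two-sided, two-sample t-test and an illustrative effect size of d = 0.28; this is not a value taken from Walum et al., whose meta-analytic assumptions differ, which is why the resulting total (~400) only roughly matches the 352 quoted above.

```python
# Minimal power-analysis sketch (assumptions: two-sided two-sample t-test,
# Cohen's d = 0.28 chosen for illustration, alpha = 0.05).
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Power of the "average" OT study: ~49 subjects in total, ~24 per group.
power_typical = analysis.power(effect_size=0.28, nobs1=24, alpha=0.05)
print(f"Power at n = 24 per group: {power_typical:.2f}")  # ~0.16

# Per-group sample size needed to reach the conventional 80% power.
n_needed = analysis.solve_power(effect_size=0.28, alpha=0.05, power=0.80)
print(f"Per-group n for 80% power: {n_needed:.0f}")       # ~200 (~400 total)
```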
Statistical power is the probability that a test will reject the null hypothesis when a true effect of a given size exists. The false negative rate of the average OT study is therefore 84% (healthy individuals) or 88% (clinical trials), meaning that replication attempts of true positive OT studies with the same sample size would be expected to fail 84% or 88% of the time, respectively. To further aggravate the problem, the effect size observed in underpowered studies is likely to be highly exaggerated, a phenomenon known as 'the winner's curse'.
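The winner's curse is easy to demonstrate by simulation. In the sketch below, the true effect size (d = 0.28) and group size (n = 24) are illustrative assumptions, not values from the cited papers:

```python
# Simulating the "winner's curse": if only significant results get published,
# published effect sizes systematically overestimate the true effect.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_d, n, sims = 0.28, 24, 20_000

published = []
for _ in range(sims):
    placebo = rng.normal(0.0, 1.0, n)
    ot = rng.normal(true_d, 1.0, n)          # true effect of size d exists
    if stats.ttest_ind(ot, placebo).pvalue < 0.05:
        published.append(ot.mean() - placebo.mean())

print(f"True effect:           {true_d}")
print(f"Mean 'published' size: {np.mean(published):.2f}")  # ~0.7, ~2.5x too big
```

Conditioned on statistical significance, the average 'published' effect in this toy world is roughly two and a half times the true effect: exactly the inflation one should expect from a literature built on underpowered studies.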
In addition, this meta-analysis demonstrated that the positive predictive value (PPV) of these studies, calculated from the statistical power, the pre-study odds and the alpha level, is low. It was therefore concluded that most of the reported positive findings in this field are likely to be false positives (Walum et al., 2016).
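The PPV follows the standard relation PPV = (power × R) / (power × R + α), where R denotes the pre-study odds that a probed effect is real. A small sketch; the value R = 0.25 below is an assumption for illustration, not a figure from Walum et al.:

```python
# PPV: the probability that a reported positive finding is actually true.
def ppv(power: float, alpha: float = 0.05, prestudy_odds: float = 0.25) -> float:
    # prestudy_odds (R) is an assumed value chosen for illustration.
    return (power * prestudy_odds) / (power * prestudy_odds + alpha)

print(f"PPV at 16% power: {ppv(0.16):.2f}")  # ~0.44, worse than a coin flip
print(f"PPV at 80% power: {ppv(0.80):.2f}")  # ~0.80
```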
Publication bias:

Almost all studies investigated as part of the meta-analysis (29 out of 33) reported at least one positive result (p-value below 0.05) (Walum et al., 2016). This large excess of statistically significant findings clearly points towards a phenomenon referred to as the 'file drawer effect' or publication bias, suggesting that a substantial amount of negative or inconclusive findings remains unpublished.
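How unlikely is such a rate? A back-of-envelope check, which assumes (as simplifications) that the 16% average power applies uniformly, that studies are independent, and that each study contributes one primary test:

```python
# Excess-significance check: expected vs observed positive studies.
from scipy import stats

n_studies, power, observed = 33, 0.16, 29
print(f"Expected positives if all effects were real: {n_studies * power:.1f}")  # ~5.3
print(f"P(>= 29 positives): {stats.binom.sf(observed - 1, n_studies, power):.1e}")
# Vanishingly small: even if every tested effect were real, 29 significant
# studies out of 33 is essentially impossible without selective reporting.
```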
In an admirable attempt to investigate whether there is a file drawer problem in OT research, Anthony Lane at the Catholic University of Louvain analyzed all studies performed in his laboratory from 2009 to 2014, covering a total of 453 subjects (Lane et al., 2016). Indeed, he found a statistically significant effect of OT for only one out of 25 tasks. This large proportion of 'unexpected' null findings, which were never published, raised concerns about the validity of what is known about the influence of OT on human behavior and cognition. Lane therefore states that 'our initial enthusiasm for OT has slowly faded away over the years and the studies have turned us from "believers" into "skeptics"'.
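Notably, one significant task out of 25 is about what chance alone would produce. A quick sanity check, assuming independent tasks and α = 0.05 (a simplification):

```python
# Under a global null (OT does nothing), how do 25 tasks behave?
from scipy import stats

tasks, alpha = 25, 0.05
print(f"Expected false positives: {tasks * alpha:.2f}")                          # 1.25
print(f"P(at least one p < 0.05): {1 - stats.binom.pmf(0, tasks, alpha):.2f}")   # ~0.72
```

With 1.25 false positives expected by chance, the single significant task is fully consistent with OT having no effect in any of them.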
This publication bias is further reinforced by the current publication culture and the strong tendency of journals to favor results that confirm hypotheses while neglecting unconvincing data.

Study design:
In addition to publication bias, the excess of significant OT effects may also result from methodological, measurement or statistical artifacts. Lane's laboratory also reported a heavy reliance on between-subject designs with relatively small sample sizes (around 30 individuals per study), which carries the risk of attributing effects to OT that are in fact generated by unobserved factors, e.g. the personality of participants (Lane et al., 2016).
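One way to see the cost of this design choice: stable individual differences (such as personality) add variance that a between-subject comparison cannot separate from the treatment effect, whereas a within-subject design cancels them out. A minimal simulation, with all numbers assumed for illustration:

```python
# Between- vs within-subject sensitivity under strong person-level traits.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, reps = 30, 5_000
effect, trait_sd, noise_sd = 0.3, 1.0, 0.5   # illustrative values

hits_between = hits_within = 0
for _ in range(reps):
    # Between-subject: two different groups, each with its own traits.
    placebo = rng.normal(0, trait_sd, n) + rng.normal(0, noise_sd, n)
    ot = rng.normal(0, trait_sd, n) + effect + rng.normal(0, noise_sd, n)
    hits_between += stats.ttest_ind(ot, placebo).pvalue < 0.05

    # Within-subject: the same people measured under both conditions.
    trait = rng.normal(0, trait_sd, n)
    p_w = trait + rng.normal(0, noise_sd, n)
    o_w = trait + effect + rng.normal(0, noise_sd, n)
    hits_within += stats.ttest_rel(o_w, p_w).pvalue < 0.05

print(f"Between-subject power: {hits_between / reps:.2f}")  # ~0.2
print(f"Within-subject power:  {hits_within / reps:.2f}")   # ~0.6
```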
Furthermore, Lane et al. twice failed to replicate their own previous study (Lane et al., 2015), which had shown a powerful effect of OT in increasing the trusting behavior of participants. Notably, the original study used a single-blind procedure, in which the subject is blind to the treatment condition but the experimenter is not. This introduces the risk that the experimenter unconsciously acts differently and thereby influences the subjects' behavior to confirm the researcher's hypothesis (unconscious behavioral priming). Both subsequent replication attempts were conducted in a double-blind manner.

Importantly, the statistical and methodological limitations discussed here are not specific to the OT field; they also directly affect other areas of biomedical research. Nevertheless, a systematic change in research practices and in the OT publication process is required to increase the trustworthiness and integrity of the data and to reveal the true state of OT effects. Adherence to detailed Good Research Practices (e.g. a priori power calculations and proper blinding procedures) and transparent reporting of methods and findings should therefore be strongly encouraged.