Error bars can convey misleading information

by Martin C. Michel

The most common form of graphical data reporting is the bar graph depicting means with SEM error bars. Based on simulated data, Weissgerber et al. have argued convincingly that bar graphs hide data rather than show them, because many different patterns of underlying data can lead to the same mean value (Weissgerber et al., 2015). An apparent inter-group difference can reflect symmetric variability in both groups, which is what most readers would assume a difference in means represents. However, it could also be driven by outliers, by a bimodal distribution within each group, or by unequal sample sizes across groups. Each scenario may reach statistical significance, but the story behind the data may differ considerably. Weissgerber et al. have also shown that the choice of how to depict variability affects, at least psychologically, how we perceive data. SEM (SD divided by the square root of n) yields the smallest error bar, and a small error bar can make even a small group difference look large, even if the overlap between the groups is considerable.

To look into this further, I went back to previously published real data from my lab (Frazier et al., 2006). That study explored possible differences in relaxation of the urinary bladder by several β-adrenoceptor agonists between young and old rats. At the time, not knowing any better, we reported means with SEM error bars. In the figure below, I show a bar graph based on means with SEM error bars, as the data were presented in the paper, along with other types of data representation. Looking at this panel only, it appears that there may be a fairly large difference between the young and old rats, i.e. with old rats exhibiting only about half of the maximum relaxation. But if we look at the scatter plot, two problems appear with this interpretation. First, there was one rat among the old rats in which noradrenaline caused hardly any relaxation. It does not look like a major outlier but clearly had an impact on the overall mean. Second, there is considerable overlap in the noradrenaline effects between the two age groups: only 5 out of 9 measurements in old rats yield values smaller than the lowest in the young rats.

These real data thus confirm that means may hide existing variability in data and suggest a certainty of conclusions that may not be warranted. As proposed by Weissgerber et al., the scatter plot conveys the real data much better than the bar graph and lets readers interpret the data as they are. Unless there is a large number of data points, the scatter plot is clearly superior to the bar graph.
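To make the point about identical means concrete, here is a minimal Python sketch. The numbers are invented for illustration and are not taken from Frazier et al. (2006). The helper name `summarize` is likewise hypothetical. One group varies symmetrically around its mean, while the other is tightly clustered except for a single low value, yet a bar graph would draw both bars at the same height.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical values, invented for illustration only.
# Both groups are constructed to share the same mean.
symmetric = [40, 45, 50, 55, 60]
with_outlier = [59, 60, 61, 62, 8]


def summarize(values):
    """Return (mean, SD, SEM) for a list of measurements."""
    n = len(values)
    sd = stdev(values)  # sample SD (n - 1 in the denominator)
    sem = sd / sqrt(n)  # SEM = SD divided by the square root of n
    return mean(values), sd, sem


for label, group in (("symmetric", symmetric), ("with outlier", with_outlier)):
    m, sd, sem = summarize(group)
    print(f"{label:>12}: mean={m:.1f}  SD={sd:.1f}  SEM={sem:.1f}")
```

Both groups report a mean of 50, but only a plot of the individual values reveals that the second mean is pulled down by one measurement.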

However, when data are reported in the main text rather than in a figure, not all data points can be presented and a summarizing number is required. The four bar graphs (each showing the same data, only with a different type of error bar) convey different messages. The graph with SEM error bars makes the difference between the two groups look quite robust, as the group difference is more than three times the magnitude of the error bar. However, we have seen from the scatter plot that this is not what the data really say. SD error bars, on the other hand, are by definition larger. For Gaussian data, about 95% of all values fall within two SD of the mean. Looking at the SD error bars, it is quite clear that the two groups overlap. This is what the raw data say, but it is not the impression given by the SEM error bars.
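The relationship between the two error bars can be checked with a short Python sketch that uses only the standard library. It assumes a Gaussian population, and the helper name `sem_from_sd` is made up for this illustration. It shows how much of a Gaussian population lies within two SD of the mean, and how the SEM bar shrinks relative to the SD bar as the sample size grows.

```python
from math import sqrt
from statistics import NormalDist

# Share of a Gaussian population lying within +/- 2 SD of its mean.
std_normal = NormalDist(mu=0.0, sigma=1.0)
within_two_sd = std_normal.cdf(2.0) - std_normal.cdf(-2.0)
print(f"fraction within 2 SD: {within_two_sd:.3f}")


def sem_from_sd(sd, n):
    """SEM is the SD divided by the square root of the sample size."""
    return sd / sqrt(n)


# For a fixed SD of 1, the SEM bar gets shorter as n grows,
# although the spread of the individual values has not changed.
for n in (4, 9, 16, 25):
    print(f"n={n:>2}: SD=1.00  SEM={sem_from_sd(1.0, n):.2f}")
```

With nine measurements per group, for example, the SEM bar is one third the length of the SD bar, which is why the same data can look far more clear-cut when plotted with SEM.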

There is also a conceptual difference between SD and SEM error bars. SD describes the variability within the sample, whereas SEM describes the precision with which the group mean has been estimated. An alternative way of presenting the precision of the parameter estimate is the 95% confidence interval. In this specific case, it conveys a message similar to that of the SD error bar, i.e. the two populations may differ but probably overlap. Of note, SEM and SD are only meaningful if the samples come from a population with a Gaussian distribution (or at least close to it). In biology, this is often not the case, or we at least do not have sufficient information for an informed decision. In such cases, showing medians involves fewer assumptions. To express the variability of data depicted as medians, the interquartile range is a useful indicator. In this example, it conveys a message similar to that of the SD or confidence interval error bars.
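As a rough sketch of how these summary measures could be computed side by side, the following Python example uses only the standard library. The nine values are invented for illustration and are not the published measurements. The function name `describe` is hypothetical, and the t critical value is hard-coded for 8 degrees of freedom, so it would need to be changed for other sample sizes. The quartiles use the default method of `statistics.quantiles`; other software may use slightly different conventions.

```python
from math import sqrt
from statistics import mean, median, quantiles, stdev

# Hypothetical measurements, invented for illustration only.
values = [12, 35, 41, 44, 48, 52, 55, 60, 67]

# Two-sided 95% critical value of Student's t for 8 degrees of freedom
# (n = 9). This constant must be replaced for other sample sizes.
T_CRIT_DF8 = 2.306


def describe(data, t_crit):
    """Return mean, SD, SEM, 95% CI of the mean, median and IQR."""
    n = len(data)
    m = mean(data)
    sd = stdev(data)            # variability within the sample
    sem = sd / sqrt(n)          # precision of the estimated mean
    half_width = t_crit * sem   # half-width of the 95% CI
    q1, _, q3 = quantiles(data, n=4)  # lower and upper quartiles
    return {
        "mean": m,
        "sd": sd,
        "sem": sem,
        "ci95": (m - half_width, m + half_width),
        "median": median(data),
        "iqr": (q1, q3),
    }


summary = describe(values, T_CRIT_DF8)
for name, value in summary.items():
    print(name, value)
```

The SD and the interquartile range describe the spread of the individual values, whereas the SEM and the confidence interval describe how precisely the mean has been estimated.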

In summary, many different data sets may lead to similar bar graphs, but a different biology may be hiding behind each of them. Therefore, the scatter plot (where possible) is clearly the preferred option for showing quantitative data. If means with error bars have to be shown, e.g. within the main text, SD is the error bar of choice to depict variability and the confidence interval to depict the precision of the parameter estimate. For data from populations with a non-Gaussian distribution, medians with interquartile ranges are the preferred option when scatter plots are not possible.