To examine associations between urinary chemical concentrations and adult health status, the US Centers for Disease Control and Prevention (CDC) measured the urine of 1455 study participants (representing the general US adult population) for the presence of 275 environmental chemicals, including Bisphenol A (BPA).
BPA is used in certain packaging materials such as polycarbonates for baby food bottles. It is also used in epoxy resins for internal protective linings for canned food and metal lids. However, evidence of adverse effects in animals has generated concerns over low-level chronic exposures in humans.
As part of the 2003-2004 National Health and Nutrition Examination Survey (NHANES), the same participants were also questioned about 32 different clinical outcomes. An analysis of these NHANES data found that higher urinary concentrations of BPA were associated with an increased prevalence of cardiovascular disease, diabetes, and liver-enzyme abnormalities (Lang et al., JAMA 2008; 300, 1303-10).
Importantly, however, the potential for false positives in this case is substantial when the complete CDC study design is examined: from the perspective of the full data set, there are 32 x 275 = 8800 different questions at issue. In addition, ten demographic variables (such as ethnicity, education and income) were also analyzed. With 32 possible health outcomes, potentially associated with any of the 275 chemicals, along with each demographic variable and different strategies for covariate adjustment, there could be as many as approximately 9 million statistical models and endpoints available to analyze the data (Young and Yu, JAMA 2009; 301, 720-721).
Given that the publication by Lang et al. focused only on one chemical and 16 health conditions, it is important to understand how many questions were at issue before conducting the study. With this huge search space and all possible modeling variations in the CDC study design, there is a real possibility that the findings reported by the authors could well be the result of chance rather than representing real health concerns. When many questions are asked of the same data, some of those questions will by chance come up as false positives – a consequence known as the multiple testing problem.
The probability P of rejecting at least one true null hypothesis for the case when all tests are independent of each other can be calculated as follows:
P = 1 – (1 – α)^n
(1 – α): probability of not rejecting a true hypothesis for one test
(1 – α)^n: probability of not rejecting any of n true null hypotheses (with n tests in total)
If the conventional significance level of α = 0.05 is used for n = 20 tests, then there is a probability of around 64% that at least one true null hypothesis is rejected.
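The calculation above can be checked with a few lines of Python. This is only a minimal sketch under the same assumptions as the formula (all tests independent, all null hypotheses true); the function names familywise_error_rate and expected_false_positives are chosen here for illustration and do not come from the cited studies. Applying it to the 8800 chemical-outcome questions is likewise illustrative, since tests on real NHANES data are unlikely to be fully independent.

```python
def familywise_error_rate(alpha: float, n: int) -> float:
    """Probability of rejecting at least one true null hypothesis
    across n independent tests, each run at significance level alpha."""
    return 1.0 - (1.0 - alpha) ** n


def expected_false_positives(alpha: float, n: int) -> float:
    """Expected number of false positives among n tests when every
    null hypothesis is true (illustrative assumption)."""
    return alpha * n


if __name__ == "__main__":
    alpha = 0.05

    # The example from the text: 20 tests at alpha = 0.05.
    print(f"n = 20:   P = {familywise_error_rate(alpha, 20):.3f}")

    # The 32 x 275 chemical-outcome questions from the full study design.
    n_questions = 32 * 275
    print(f"n = {n_questions}: P = {familywise_error_rate(alpha, n_questions):.3f}")
    print(f"expected false positives: {expected_false_positives(alpha, n_questions):.0f}")
```

Under these assumptions, testing all 8800 questions at α = 0.05 would make at least one false positive practically certain, with about 440 false positives expected if none of the associations were real.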