February 5, 2016

The Multiple Testing Problem

  • If the null is true, \(Pr(p < .05) = .05\)
  • Can fool yourself (and others) with randomness, make type-I errors
  • Test many hypotheses, get many rejections!
  • If you only report on the rejections, can fish results
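
To make the scale of the problem concrete, here is a minimal sketch. It assumes every null is true, that the tests are independent, and that null p-values are uniform on (0, 1). The function names `count_null_rejections` and `prob_at_least_one` are hypothetical, introduced only for this illustration.

```python
import random


def count_null_rejections(m, alpha=0.05, seed=0):
    """Draw m p-values from true nulls (Uniform(0, 1)) and count those below alpha."""
    rng = random.Random(seed)
    p_values = [rng.random() for _ in range(m)]
    return sum(p < alpha for p in p_values)


def prob_at_least_one(m, alpha=0.05):
    """Chance of one or more false rejections among m independent true nulls."""
    return 1 - (1 - alpha) ** m


m = 1000
rejections = count_null_rejections(m)
prob_any = prob_at_least_one(20)

print(f"{rejections} of {m} true nulls rejected at the .05 level")
print(f"P(at least one rejection in 20 null tests) = {prob_any:.3f}")
```

Roughly 5% of the true nulls are rejected by chance alone, and with only 20 tests the chance of at least one spurious rejection is already well above one half.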

Do you have a Multiple Testing Problem?

  • Yes

Areas in which multiple testing problems are common

  • Experiments with multiple treatment arms (Gerber, Green and Larimer, 2008)
  • Meta-analysis (Eggers et al, 2014)
  • Identifying survey items that predict political traits (Gerber et al, 2011; Jackman and Spahn, 2015)
  • Social Science


  • Do remittances create support for believing that the income distribution is fair? (Doyle, 2015)
  • Determinants of Election to City Councils in Swedish Municipalities (Dancygier et al, 2015)
  • Testing various model specifications for whether transparency predicts the collapse of authoritarian regimes (Hollyer et al, 2015)
  • Does terrorism affect Israelis' tolerance, and on whom is it most impactful? (Peffley et al, 2015)

iid p-values

Multiple Testing Mixtures


  • \(m\) denotes the number of hypotheses being tested
  • \(p_i\) is the \(p\)-value for hypothesis \(i\), for all \(i \in \{1, 2, \ldots, m\}\)
  • \(\alpha\) is the level of a multiple testing error rate.

Bonferroni Correction

  • Classic Solution
  • Controls the probability of one or more type-I errors (Family-Wise Error Rate)
  • Very conservative error measure
  • Reject when \(p \leq \frac{\alpha}{m}\)
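
The rejection rule above can be sketched in a few lines. The function name `bonferroni_reject` and the example p-values are made up for illustration.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return one boolean per hypothesis: True when p <= alpha / m."""
    m = len(p_values)
    threshold = alpha / m  # every test is held to the same, stricter level
    return [p <= threshold for p in p_values]


# Five illustrative p-values; the per-test threshold is .05 / 5 = .01.
example_p = [0.001, 0.008, 0.012, 0.041, 0.20]
bonferroni_decisions = bonferroni_reject(example_p)
print(bonferroni_decisions)  # [True, True, False, False, False]
```

Note that .012 and .041 would both be rejected at the unadjusted .05 level but survive here, which is the sense in which the correction is conservative.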

False Discovery Rate (Benjamini & Hochberg, 1995)

  • New solution, standard practice in genomics
  • Controls the expected proportion of Type-I Errors among all rejections
  • Defined to be 0 when no rejections.
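  • In math, with \(V\) the number of false rejections and \(R\) the total number of rejections: \[\begin{align*} FDR = E\left(\frac{V}{R} \,\middle|\, R>0\right) Pr(R>0) \end{align*}\]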

Benjamini-Hochberg Procedure

  • Order the p-values so that \(p_1 \leq p_2 \leq \ldots \leq p_m\)
  • Reject first \(\hat{k}\) hypotheses where \[\begin{align*} \hat{k}= \max \{k : p_k \leq \tfrac{k}{m} \alpha \} \end{align*}\]
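
A minimal sketch of the step-up rule above, assuming the p-values arrive unsorted and that we want decisions back in the original order. The function name `benjamini_hochberg` and the example p-values are illustrative, not taken from any particular library.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per hypothesis, in the original input order."""
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # k_hat is the LARGEST rank k with p_(k) <= (k / m) * alpha.
    k_hat = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k_hat = rank

    # Reject the k_hat smallest p-values, even those that missed their own cutoff.
    reject = [False] * m
    for i in order[:k_hat]:
        reject[i] = True
    return reject


example_p = [0.20, 0.001, 0.029, 0.041, 0.025]
bh_decisions = benjamini_hochberg(example_p)
print(bh_decisions)  # [False, True, True, False, True]
```

In this example the second-smallest p-value (.025) exceeds its own cutoff of .02, yet it is still rejected because the third-smallest (.029) falls under its cutoff of .03. Taking the maximum over \(k\) is what makes this a step-up procedure.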



  • Recall that null p-values are distributed \(U(0,1)\).
  • Suppose \(m\) hypotheses, of which \(m_T\) are truly null.
  • When you reject \(k\) hypotheses, you form the rejection region \(p \leq \frac{k}{m} \alpha\)
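  • Of these \(k\) rejections, one expects \(m_T \frac{k}{m} \alpha\) to be true nulls, since each of the \(m_T\) null p-values lands in the rejection region with probability \(\frac{k}{m} \alpha\)
  • Thus, heuristically, \(FDR = E\left(\frac{V}{R}\right) \approx \frac{m_T \frac{k}{m} \alpha}{k} = \frac{m_T}{m} \alpha \leq \alpha\) as desired.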
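
The heuristic argument above can be checked by simulation. The sketch below assumes independent tests, uniform p-values for the true nulls, and an arbitrary choice for the false nulls: their p-values are a uniform draw raised to the 8th power, which simply concentrates them near zero. All names and settings (`simulate_fdr`, 200 hypotheses of which 150 are null, \(\alpha = .10\)) are made up for illustration.

```python
import random


def bh_num_rejections(sorted_p, alpha):
    """Largest k with p_(k) <= (k / m) * alpha, for p-values sorted ascending."""
    m = len(sorted_p)
    k_hat = 0
    for k, p in enumerate(sorted_p, start=1):
        if p <= k / m * alpha:
            k_hat = k
    return k_hat


def simulate_fdr(m=200, m_null=150, alpha=0.10, reps=2000, seed=1):
    """Average V / R over repeated experiments, counting V / R as 0 when R = 0."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        # (p-value, is_null) pairs: true nulls are uniform, false nulls are small.
        tests = [(rng.random(), True) for _ in range(m_null)]
        tests += [(rng.random() ** 8, False) for _ in range(m - m_null)]
        tests.sort()
        k_hat = bh_num_rejections([p for p, _ in tests], alpha)
        false_rejections = sum(is_null for _, is_null in tests[:k_hat])
        total += false_rejections / k_hat if k_hat > 0 else 0.0
    return total / reps


estimated_fdr = simulate_fdr()
target = 150 / 200 * 0.10  # (m_T / m) * alpha
print(f"simulated FDR = {estimated_fdr:.3f}, (m_T / m) * alpha = {target:.3f}")
```

Under these assumptions the simulated value should land close to \(\frac{m_T}{m} \alpha = .075\), below the nominal \(\alpha = .10\).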

Why we like FDR

  • Adaptable: tolerance for the number of false rejections changes with the total number of rejections
  • Scalable: everything scales to number of hypotheses being tested: more false positives, more rejections, etc.
  • Hypotheses are separable: same inferences drawn when pooling hypotheses all together or splitting into multiple large and heterogeneous groups of hypotheses

Multiple Testing Mixtures - \(p<.05\)

Shrinkage: It's a Good Thing

Think about a regression that has many country indicator variables in it (cell means model). You could:

  • Not pool: set each country's intercept to its mean
  • Fully pool: set each country's coefficient to the same value
  • Partially pool: balance information from the country itself with what you know about countries in general
  • Partial pooling has better average mean squared error when you have 3 or more groups (Efron & Morris, 1973)

How to shrink:

  • James-Stein Estimator
  • Hierarchical Bayes (350D); for more, see Gelman, Hill and Yajima (2015)
  • Penalized Regression (350C)
  • Mixed Effects Models
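
As one concrete instance of partial pooling, here is a sketch of a positive-part James-Stein estimator. It makes several assumptions not stated above: each group mean has the same known sampling variance, shrinkage is toward the grand mean rather than a fixed point (this variant uses the factor \(k - 3\) and so needs at least four groups), and the shrinkage factor is truncated at zero. The function name and the country means are invented for illustration.

```python
def james_stein(means, sampling_variance):
    """Shrink each group mean toward the grand mean (positive-part James-Stein)."""
    k = len(means)
    if k < 4:
        raise ValueError("shrinking toward the grand mean needs at least 4 groups")
    grand_mean = sum(means) / k
    spread = sum((y - grand_mean) ** 2 for y in means)
    if spread == 0:
        return list(means)  # all means identical: nothing to shrink
    # Noisy means (large variance relative to spread) are pulled in harder.
    shrink = max(0.0, 1.0 - (k - 3) * sampling_variance / spread)
    return [grand_mean + shrink * (y - grand_mean) for y in means]


country_means = [2.0, 4.0, 6.0, 8.0, 10.0]
shrunk = james_stein(country_means, sampling_variance=4.0)
print([round(x, 2) for x in shrunk])  # [2.8, 4.4, 6.0, 7.6, 9.2]
```

Here the shrinkage factor is \(1 - \frac{2 \cdot 4}{40} = 0.8\), so each country keeps 80% of its distance from the grand mean of 6. A factor of 1 would correspond to no pooling and a factor of 0 to full pooling.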

Shrunk Estimates

Weighted False Discovery Rate (Benjamini and Hochberg, 1997)

\[ \begin{align} Q(w) &= \begin{cases} \frac{ \sum_{i=1}^k w_i V_i}{\sum_{i=1}^k w_i},& R > 0 \\ 0,& R= 0 \end{cases} \end{align} \]

Order the p-values from smallest to largest, and reject the first \(k\) hypotheses, choosing k by:

\[ \begin{align} \hat{k}= \max \{k : p_k \leq \tfrac{\sum_{i=1}^k w_i}{m} \alpha \} \label{eqn:wbh} \end{align} \]
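
The weighted rule can be sketched the same way as the unweighted one. This sketch assumes the weights are positive and normalized to sum to \(m\), so that equal weights of 1 recover the ordinary Benjamini-Hochberg cutoffs. The function name `weighted_bh`, the p-values, and the weights are illustrative.

```python
def weighted_bh(p_values, weights, alpha=0.05):
    """Return one boolean per hypothesis, in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    # k_hat is the largest rank k with p_(k) <= (sum of first k weights / m) * alpha.
    k_hat = 0
    cumulative_weight = 0.0
    for rank, i in enumerate(order, start=1):
        cumulative_weight += weights[i]
        if p_values[i] <= cumulative_weight / m * alpha:
            k_hat = rank

    reject = [False] * m
    for i in order[:k_hat]:
        reject[i] = True
    return reject


example_p = [0.20, 0.001, 0.029, 0.041, 0.025]
equal = weighted_bh(example_p, [1.0, 1.0, 1.0, 1.0, 1.0])
upweighted = weighted_bh(example_p, [0.5, 2.0, 0.5, 0.5, 1.5])
print(equal)       # [False, True, True, False, True]
print(upweighted)  # [False, True, True, True, True]
```

Placing more weight on the hypotheses with the smallest p-values raises the cumulative cutoff early in the ordering, which in this example lets the hypothesis with \(p = .041\) be rejected as well.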

Graphical Intuition