February 5, 2016

## The Multiple Testing Problem

• If the null hypothesis is true, $$Pr(p < .05) = .05$$
• Randomness can fool you (and others) into making type-I errors
• Test many hypotheses and you will get many rejections!
• If you report only the rejections, you can fish for results

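These points can be seen in a quick simulation: null p-values are uniform on $$(0,1)$$, so testing 100 true nulls at the .05 level yields about five spurious rejections, and almost surely at least one. A minimal sketch (all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
m, alpha, n_sims = 100, 0.05, 2000

# Under the null, p-values are Uniform(0, 1), so each test rejects
# with probability alpha; with m tests we expect alpha * m rejections.
p = rng.uniform(size=(n_sims, m))
rejections = (p <= alpha).sum(axis=1)

print(rejections.mean())          # close to alpha * m = 5
print((rejections >= 1).mean())   # P(at least one type-I error) is near 1
```

Reporting only those five-or-so "significant" results is exactly the fishing problem above.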

## Areas in which multiple testing problems are common

• Experiments with multiple treatment arms (Gerber, Green and Larimer, 2008)
• Meta-analysis (Eggers et al, 2014)
• Identifying survey items that predict political traits (Gerber et al, 2011; Jackman and Spahn, 2015)
• Social Science

## Examples

• Do remittances build support for the belief that the income distribution is fair? (Doyle, 2015)
• Determinants of Election to City Councils in Swedish Municipalities (Dancygier et al, 2015)
• Testing various model specifications for whether transparency predicts the collapse of authoritarian regimes (Hollyer et al, 2015)
• Does terrorism affect Israelis' tolerance, and on whom does it have the greatest impact? (Peffley et al, 2015)

## Notation

• $$m$$ denotes the number of hypotheses being tested
• $$p_i$$ is the $$p$$-value for hypothesis $$i$$, for each $$i \in \{1, 2, \ldots, m\}$$
• $$\alpha$$ is the level of a multiple testing error rate
• $$R$$ is the number of rejected hypotheses; $$V$$ is the number of type-I errors (false rejections) among them

## Bonferroni Correction

• Classic Solution
• Controls the probability of one or more type-I errors (Family-Wise Error Rate)
• Very conservative error measure
• Reject when $$p \leq \frac{\alpha}{m}$$
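The rule is a one-liner; a minimal sketch (the p-values are hypothetical):

```python
import numpy as np

def bonferroni_reject(pvals, alpha=0.05):
    """Reject hypothesis i when p_i <= alpha / m (controls the FWER)."""
    pvals = np.asarray(pvals)
    return pvals <= alpha / len(pvals)

# With m = 10 the threshold drops to 0.05 / 10 = 0.005,
# so only the two smallest p-values survive the correction.
p = [0.001, 0.004, 0.012, 0.03, 0.04, 0.2, 0.3, 0.5, 0.7, 0.9]
print(bonferroni_reject(p))
```

The conservatism is visible here: 0.012 and 0.03 would clear an uncorrected .05 bar but not the Bonferroni one.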

## False Discovery Rate (Benjamini & Hochberg, 1995)

• New solution, standard practice in genomics
• Controls the expected proportion of type-I errors among all rejections
• Defined to be 0 when there are no rejections
• In math: \begin{align*} FDR = E\left(\frac{V}{R} \,\middle|\, R > 0\right) \Pr(R>0) \end{align*}

## Benjamini-Hochberg Procedure

• Order the p-values so that $$p_1 \leq p_2 \leq \ldots \leq p_m$$
• Reject first $$\hat{k}$$ hypotheses where \begin{align*} \hat{k}= \max \{k : p_k \leq \tfrac{k}{m} \alpha \} \end{align*}
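The step-up rule above translates directly into code; a minimal sketch (the example p-values are hypothetical):

```python
import numpy as np

def bh_reject(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up: reject the k-hat smallest p-values,
    where k-hat = max{k : p_(k) <= (k / m) * alpha} (zero if no such k)."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= (np.arange(1, m + 1) / m) * alpha
    k_hat = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k_hat]] = True
    return reject

# BH rejects the two smallest p-values here, while Bonferroni
# (threshold 0.05 / 10 = 0.005) would reject only one.
p = [0.001, 0.008, 0.039, 0.041, 0.2, 0.3, 0.5, 0.6, 0.7, 0.9]
print(bh_reject(p))
```

Note the rule is genuinely step-up: $$\hat{k}$$ is the *largest* $$k$$ satisfying the inequality, even if smaller $$k$$ fail it.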

## Intuition

• Recall that null p-values are distributed $$U(0,1)$$.
• Suppose $$m$$ hypotheses, of which $$m_T$$ are truly null.
• When you reject $$k$$ hypotheses, you form the rejection region $$p \leq \frac{k}{m} \alpha$$
• Of these rejections, one expects $$m_T \frac{k}{m} \alpha$$ to be true nulls
• Thus, $$FDR = E\left(\frac{V}{R}\right) = \frac{m_T \frac{k}{m} \alpha}{k} = \frac{m_T}{m} \alpha \leq \alpha$$ as desired.
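This heuristic can be checked by simulation: with $$m_T$$ of $$m$$ hypotheses truly null, the empirical FDR of the BH procedure should land near $$\frac{m_T}{m}\alpha$$. A minimal sketch, where the non-null p-values are made artificially tiny so they are always rejected (all numbers are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
m, m_T, alpha, n_sims = 100, 80, 0.05, 2000
fdp = []

for _ in range(n_sims):
    # 80 true nulls (p ~ U(0,1)) and 20 strong signals (tiny p-values).
    p = np.concatenate([rng.uniform(size=m_T),
                        rng.uniform(size=m - m_T) * 1e-4])
    is_null = np.arange(m) < m_T

    # Benjamini-Hochberg step-up rule.
    order = np.argsort(p)
    below = p[order] <= (np.arange(1, m + 1) / m) * alpha
    k_hat = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k_hat]] = True

    R = reject.sum()
    V = (reject & is_null).sum()
    fdp.append(V / R if R > 0 else 0.0)

print(np.mean(fdp))   # should be near (m_T / m) * alpha = 0.04
```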

## Why we like FDR

• Adaptable: the tolerated number of false rejections changes with the total number of rejections
• Scalable: everything scales with the number of hypotheses tested: more tests allow more false positives and more rejections
• Separable: the same inferences are drawn whether the hypotheses are pooled together or split into multiple large, heterogeneous groups

## Shrinkage: It's a Good Thing

Think about a regression that has many country indicator variables in it (cell means model). You could:

• Not pool: set each country's intercept to its mean
• Fully pool: set each country's coefficient to the same value
• Partially pool: balance information from the country itself with what you know about countries in general
• Partial pooling has better average mean squared error when you have 3 or more groups (Efron & Morris, 1973)

## How to shrink:

• James-Stein Estimator
• Hierarchical Bayes (350D); for more see Gelman, Hill and Yajima (2015)
• Penalized Regression (350C)
• Mixed Effects Models
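The first option can be sketched directly: a positive-part James-Stein estimator shrinks each country's raw mean toward the grand mean, and beats the unpooled means on average mean squared error. The simulation and its parameters are illustrative, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(2)
m, sigma, n_sims = 20, 1.0, 2000
theta = rng.normal(0, 0.5, size=m)      # true group (country) means

mse_raw, mse_js = [], []
for _ in range(n_sims):
    x = theta + sigma * rng.normal(size=m)   # one noisy estimate per group
    # Positive-part James-Stein: shrink toward the grand mean.
    xbar = x.mean()
    s = ((x - xbar) ** 2).sum()
    shrink = max(0.0, 1 - (m - 3) * sigma**2 / s)
    js = xbar + shrink * (x - xbar)
    mse_raw.append(((x - theta) ** 2).mean())
    mse_js.append(((js - theta) ** 2).mean())

print(np.mean(mse_raw), np.mean(mse_js))   # partial pooling wins on average
```

The `shrink` factor is the "balance" from the partial-pooling bullet: near 1 when groups are very spread out (trust each group), near 0 when they are tightly clustered (trust the grand mean).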

## Weighted False Discovery Rate (Benjamini and Hochberg, 1997)

The weighted false discovery rate $$Q(w)$$ replaces counts of rejections with sums of per-hypothesis weights $$w_i$$ (here $$V_i$$ indicates a false rejection of hypothesis $$i$$):

\begin{align} Q(w) &= \begin{cases} \frac{ \sum_{i=1}^k w_i V_i}{\sum_{i=1}^k w_i},& R > 0 \\ 0,& R= 0 \end{cases} \end{align}

Order the p-values from smallest to largest and reject the first $$\hat{k}$$ hypotheses, choosing $$\hat{k}$$ by:

\begin{align} \hat{k}= \max \{k : p_k \leq \tfrac{\sum_{i=1}^k w_i}{m} \alpha \} \end{align}
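A sketch of this rule, replacing the count $$k$$ in plain BH with the running sum of weights (the example p-values are hypothetical):

```python
import numpy as np

def weighted_bh_reject(pvals, weights, alpha=0.05):
    """Weighted BH: reject the first k-hat ordered p-values, where
    k-hat = max{k : p_(k) <= (sum of the k smallest-p weights / m) * alpha}."""
    p = np.asarray(pvals)
    w = np.asarray(weights, dtype=float)
    m = len(p)
    order = np.argsort(p)
    cum_w = np.cumsum(w[order])
    below = p[order] <= (cum_w / m) * alpha
    k_hat = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k_hat]] = True
    return reject

# Sanity check: with all weights equal to 1 the rule reduces to plain BH.
p = [0.001, 0.008, 0.039, 0.3, 0.5]
print(weighted_bh_reject(p, np.ones(5)))
```

Upweighting a hypothesis relaxes its threshold (and those of hypotheses with larger p-values), which is the point: prior importance buys power.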