Sensitivity analysis is a useful, but underused, tool for analyzing observational data. In broad terms, the technique asks how strong unobserved confounders would have to be in order to alter the research’s conclusions. There are two formal tests of this general idea that take slightly different approaches. The first, developed by Rosenbaum (2002) and implemented in the rbounds
package, asks how strong the effect of an unobserved confounder (U) on the probability of treatment assignment would have to be to affect the outcome. This test implicitly assumes that the unobserved confounder is strongly correlated with the outcome. This approach is convenient because it only requires reporting a single parameter (\(\Gamma\)) whose substantive significance is easily characterized.
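As a minimal sketch of Rosenbaum’s approach, the following uses the psens() function from rbounds; the treated and control vectors here are simulated stand-ins for matched-pair outcomes, not real data:

```r
# Sketch of Rosenbaum bounds with rbounds::psens().
# 'treated' and 'control' are simulated stand-ins for matched-pair outcomes.
library(rbounds)
set.seed(42)
treated <- rnorm(100, mean = 1)
control <- rnorm(100, mean = 0)
# psens() reports bounds on the Wilcoxon signed-rank p-value as Gamma grows;
# the Gamma at which the upper bound crosses 0.05 summarizes sensitivity
psens(treated, control, Gamma = 2, GammaInc = 0.25)
```

The smallest \(\Gamma\) at which the upper p-value bound exceeds the significance threshold is the single number reported from this analysis.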
An alternative method was derived by Imbens (2003) and is implemented in the treatSens
package in R. A similar implementation is available in Stata. In contrast to Rosenbaum’s method, Imbens uses two parameters, alpha (\(\alpha\)) and delta (\(\delta\)), to capture the effect of an unobserved binary confounder (U) on both assignment to treatment and the outcome. Alpha and delta are directly interpretable as the coefficients on U, but it is often better to present their partial R-squared values in the treatment and outcome models. The advantage of this technique is that it allows one to vary the effect on the treatment and the outcome separately, and that these parameters can be directly compared to covariates in the model to assess their substantive significance.
This post explains the derivations and underlying logic.
This section lays out the derivation of the model, which is also explained in Imbens (2003).
First, the model assumes that the unobserved confounding variable (U) is distributed binomial with probability of success equal to \(1/2\). This assumption is mathematically convenient and could be changed in the derivation if researchers had reason to believe that an important confounding variable was distributed with a different functional form. However, given that the task at hand is to describe the effect of an unobserved covariate, it is unlikely that researchers will have strong substantive reasons to prefer any particular distribution of the variable.
\[\begin{equation} \mathbf{U} \sim Binomial(1, 1/2) \end{equation}\]This variable describes some unknown confounding variable that affects both the assignment to treatment and the outcome directly. If it could be included in the model, conventional statistical techniques would recover the correct estimate of the treatment effect (\(\tau\)). However, the inability to collect data on this variable, or perhaps a failure to realize its importance, means that it has not been included in the analysis and that the estimate of \(\tau\) is potentially biased towards statistical significance. Conceivably, unobserved confounders could be biasing the estimate of \(\tau\) downwards, so that the true value has a larger magnitude, but the purpose of sensitivity analysis is to assuage doubts that the finding would disappear if alternative covariates were measured and included in the model.
Second, the model assumes that the treatment variable (W) is also binary and that it follows a logistic distribution. This is the variable of substantive interest to the researcher. Updates to this package will relax this assumption and allow alternative functional forms for the treatment variable.
\[\begin{equation} \mathbf{W} \sim Logistic(\gamma \mathbf{X} + \alpha \mathbf{U}) \end{equation}\]Regardless of the functional form, the probability of treatment assignment depends on the set of covariates X and the unobserved variable U. The effects of the covariates X on assignment to treatment are captured in the vector of coefficients \(\gamma\), and the effect of the unobserved variable U is captured in the coefficient \(\alpha\).
Third, Y is assumed to be a continuous variable with a normally distributed error term \(\mathbf{\epsilon}\). The implication is that the conditional distributions of Y for treatment and control are also normally distributed given U and X. This functional form assumption can also be relaxed.
\[\begin{equation} \mathbf{Y} \sim Normal(\tau \mathbf{W} + \beta \mathbf{X} + \delta \mathbf{U}, \sigma^2) \end{equation}\]Regardless of the functional form, the distribution of the outcome variable Y depends on the treatment variable W, the set of covariates X, and the unobserved variable U. The effect of the treatment variable W on the outcome is captured by the coefficient \(\tau\), the effects of the covariates X are captured by the vector of coefficients \(\beta\), and the effect of the unobserved variable U is captured in the coefficient \(\delta\). The set of covariates X should include all measurable variables that affect either assignment to treatment or the outcome.
Setting the values of the coefficients \(\alpha\) and \(\delta\) to 0 recovers the standard maximum likelihood estimates for \(\hat{\tau}\), \(\hat{\beta}\), and \(\hat{\gamma}\) (OLS for the outcome model and a logit for the treatment model). Sensitivity analysis is performed by supposing alternative values of \(\alpha\) and \(\delta\) and calculating the maximum likelihood estimate of the parameter \(\hat{\tau}\). By varying these parameters, any estimate of \(\hat{\tau}\) could be achieved. By specifying a desired value of \(\hat{\tau}\), such as half of the original treatment effect or no treatment effect at all, one can trace out a contour of alpha and delta values that would yield this coefficient value.
The reasonability of these pairs of \(\alpha\) and \(\delta\) values can be assessed by comparing their effect on the R-squared of the treatment and outcome models to that of the covariates included in the model. If, in order to reach the targeted value of \(\hat{\tau}\), the impact of alpha and delta on the explanatory power of these models would have to be significantly larger than that of variables already in the model and known to have a large substantive impact, then it is unlikely that such a confounding variable exists.
In order to compute the maximum likelihood estimates of the model parameters \(\tau\), \(\beta\), \(\sigma^2\), \(\gamma\), \(\alpha\), and \(\delta\), we must derive the log likelihood of the model with the assumptions stated above.
First, write out the likelihood function of the parameters given the sample data. For conciseness, I substitute all parameters other than \(\tau\) with \(\cdots\). \[L(\tau, \cdots) = \prod\limits_{i=1}^n p(y_i, w_i, u_i | \tau, \cdots)\] Applying the log operator, we get \[\log L(\tau, \cdots) = \sum\limits_{i=1}^n \log [ p(y_i, w_i, u_i | \tau, \cdots) ] \]
Second, we decompose the joint probability of Y, W, and U using the law of conditional probability. These are still conditional on the parameters \(\tau, \cdots\), which I suppress from here on. \[p(Y, W, U) = p(Y|W,U) \times p(W|U) \times p(U) \]
With the assumptions laid out in the previous section,
\[p(\mathbf{Y}|\mathbf{W},\mathbf{U}, \mathbf{X}) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp \{ \frac{-1}{2\sigma^2}(\mathbf{Y} - \tau \mathbf{W} - \beta \mathbf{X} - \delta \mathbf{U})^2 \}\] \[p(\mathbf{W}|\mathbf{U}, \mathbf{X}) = \frac{\exp(\gamma \mathbf{X} + \alpha \mathbf{U})^{\mathbf{W}}}{1 + \exp(\gamma \mathbf{X} + \alpha \mathbf{U})}\] \[P(\mathbf{U} = 1) = P(\mathbf{U} = 0) = 1/2\]
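The logistic piece can be sanity-checked numerically. Here lp is an arbitrary stand-in for the linear predictor \(\gamma \mathbf{X} + \alpha \mathbf{U}\):

```r
# The treatment-assignment formula above matches R's built-in plogis():
# exp(lp)^w / (1 + exp(lp)) equals plogis(lp) when w = 1
# and 1 - plogis(lp) when w = 0
lp <- 0.3  # arbitrary stand-in for gamma*x + alpha*u
stopifnot(all.equal(exp(lp)^1 / (1 + exp(lp)), plogis(lp)))
stopifnot(all.equal(exp(lp)^0 / (1 + exp(lp)), 1 - plogis(lp)))
```
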
Third, we know that U only takes on two values, 1 or 0, each with probability 1/2. Use the law of total probability to break out conditional probabilities into these two conditions, i.e. when U = 1 and when U = 0. This is equivalent to integrating out the unobserved covariate, which is how Imbens (2003) describes this step.
\[p(\mathbf{Y}, \mathbf{W}) = p(\mathbf{Y}|\mathbf{W},\mathbf{U}=1) \times p(\mathbf{W}|\mathbf{U}=1) \times \dfrac{1}{2} \] \[+\ p(\mathbf{Y}|\mathbf{W},\mathbf{U}=0) \times p(\mathbf{W}|\mathbf{U}=0) \times \dfrac{1}{2} \]
Notice that we are essentially mixing a model with a confounder (U=1) and a model without one (U=0) in equal proportions. However, the addition inside the logarithm prevents us from solving this equation analytically, so we must instead rely on numeric optimization.
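Combining the densities above with this mixture, the objective that must be maximized numerically is:

\[\log L(\tau, \cdots) = \sum\limits_{i=1}^n \log \Big[ \tfrac{1}{2}\, \phi(y_i - \tau w_i - \beta x_i - \delta)\, \frac{\exp(\gamma x_i + \alpha)^{w_i}}{1 + \exp(\gamma x_i + \alpha)} + \tfrac{1}{2}\, \phi(y_i - \tau w_i - \beta x_i)\, \frac{\exp(\gamma x_i)^{w_i}}{1 + \exp(\gamma x_i)} \Big]\]

where \(\phi\) denotes the Normal(0, \(\sigma^2\)) density. Each bracketed term corresponds to one value of U, weighted by its probability of 1/2.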
In the following section, I illustrate sensitivity analysis using simulated toy data (where the true treatment effect is known) and real world experimental data from the Lalonde experiment on earnings after job training.
Let’s generate data where both assignment to treatment W and the outcome Y are correlated with a known variable X and an unknown variable U. In the first chunk of code, I generate variables x and u, set their correlations with w, and then generate w from these correlations.
n <- 1000
#known covariate x
x <- matrix(sort(runif(n, -10, 10)),ncol=1)
gamma <- matrix(.1, nrow=1)
#unobserved variable affecting outcome
u <- matrix(sample(c(0,1),n,replace=TRUE), ncol=1)
alpha <- matrix(1, nrow=1)
#binary treatment is correlated with x and u
w <- matrix(apply(plogis(x%*%gamma + u%*%alpha), 1,
function(p) sample(c(1,0),1, prob=c(p, 1-p))), ncol=1)
Next, I generate the outcome variable y from x, u, and w, and add a normally distributed error term with standard deviation 10.
#outcome variable y
beta <- matrix(1/3, nrow=1) #coefficient on x in the outcome model
tau <- matrix(10, nrow=1) #treatment effect of w on y
delta <- matrix(10, nrow=1) #coefficient on the confounder u
sigma <- 10 #standard deviation of the error term
epsilon <- matrix(rnorm(n, 0, sigma), ncol=1)
y <- x%*%beta + w %*% tau + u %*% delta + epsilon
Without controlling for U, estimates of \(\tau\) are biased. An ordinary least squares regression excluding U returns an estimate for \(\tau\) of 16 when the true value is known to be 10. This is because U induced a higher probability of treatment assignment for units that also had higher outcome values, and its effect was not controlled for in the regression. Including the variable U correctly estimates the causal effect of W at around 10. The effect of the covariates is now also correctly estimated at around .33, but is of less significance to the research at hand.
library(stargazer)
mod1 <- lm(y ~ w + x - 1)
mod2 <- lm(y ~ w + x + u -1)
stargazer(mod1, mod2, type="html")
Dependent variable: y

|                     | (1)                         | (2)                         |
|---------------------|-----------------------------|-----------------------------|
| w                   | 15.681^{***} (0.467)        | 9.992^{***} (0.544)         |
| x                   | 0.171^{***} (0.063)         | 0.315^{***} (0.057)         |
| u                   |                             | 9.724^{***} (0.601)         |
| Observations        | 1,000                       | 1,000                       |
| R^{2}               | 0.543                       | 0.638                       |
| Adjusted R^{2}      | 0.543                       | 0.637                       |
| Residual Std. Error | 11.320 (df = 998)           | 10.079 (df = 997)           |
| F Statistic         | 594.017^{***} (df = 2; 998) | 586.832^{***} (df = 3; 997) |

Note: ^{*}p<0.1; ^{**}p<0.05; ^{***}p<0.01
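Since U is observed in this simulation, we can also illustrate the partial R-squared comparison mentioned earlier. This sketch uses the mod1 and mod2 fits from above; the partial R-squared of u is the share of Model 1’s unexplained variance that adding u accounts for:

```r
# Partial R-squared of u in the outcome model:
# (R2 with u - R2 without u) / (1 - R2 without u)
r2.partial.u <- (summary(mod2)$r.squared - summary(mod1)$r.squared) /
  (1 - summary(mod1)$r.squared)
r2.partial.u  # roughly 0.21 with the R-squared values in the table above
```

This is the scale on which hypothetical confounders are compared to observed covariates in the sensitivity analysis below.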
Knowing the true alpha and delta parameters, we can recover the same results using numeric optimization. I use the log likelihood function defined by Imbens, programmed in the following code section, to find the maximum likelihood estimate of the parameter \(\tau\) using optim. The function needs the data (stored in a list object), a set of start values, and a pair of alpha and delta values. I use the biased coefficients from Model 1 above as reasonable starting values and give optim the true alpha and delta values. Given these, optim recovers the same unbiased estimate of tau as Model 2, with a little random noise.
loglik.lm <- function(pars, alpha, delta, env){
#get the variables
x <- env$data$x
y <- env$data$y
w <- env$data$w
#parameters are grouped as follows in a single vector:
#betas, gammas, tau, sigma
ncovars <- dim(x)[2]
beta <- matrix(pars[seq(1, ncovars, 1)],ncol=1)
gamma <- matrix(pars[seq(ncovars+1, ncovars*2,1)],ncol=1)
tau <- matrix(pars[length(pars)-1], ncol=1)
sigma <- matrix(exp(pars[length(pars)]), ncol=1)
#the sigma parameter is logged to make optimization easier
#so it has to be exponentiated before being used
#parameters to use in the likelihood functions
mu <- x %*% beta + w %*% tau
mu.delta <- mu + delta
lambda <- x %*% gamma
lambda.alpha <- lambda + alpha
ll <- sum(log(1/2 * dnorm(y, mean=mu, sd=sigma) *
((exp(lambda)^w)/(1+exp(lambda))) +
1/2 * dnorm(y, mean=mu.delta, sd=sigma) *
((exp(lambda.alpha)^w)/(1+exp(lambda.alpha)))))
return(ll)
}
#store the data in a list
tmp <- list()
tmp$data$y <- y
tmp$data$x <- x
tmp$data$w <- w
#get reasonable starting values
tau <- lm(y ~ w + x - 1, data=tmp$data)$coefficients[1]
betas <- lm(y ~ w + x - 1, data=tmp$data)$coefficients[-1]
gammas <- glm(w ~ x - 1, family=binomial(link='logit'), data=tmp$data)$coefficients
sigma <- log(summary(lm(y ~ w + x - 1, data=tmp$data))$sigma)
init <- c(betas, gammas, tau, sigma)
optim(par=init, fn=loglik.lm,
alpha=1,delta=10,
env=tmp,
control=list(fnscale=-1))$par["w"]
## w
## 9.645083
But in non-trivial examples we won’t know the true values of alpha and delta. In that case, we take the opposite approach and ask how strong alpha and delta would have to be in order to change our conclusions. This is the approach that the treatSens
package and others take: they search over potential values of alpha and delta to create a contour map of how tau would change under each combination. You can then compare these potential values of alpha and delta against the covariates already included in your model.
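The contour-map idea can be sketched by hand using the loglik.lm function and the tmp and init objects defined above. A coarse grid keeps the run fast; the grid ranges here are arbitrary choices for illustration:

```r
# Re-estimate tau at each (alpha, delta) pair and plot contours of tau-hat;
# the contour where tau-hat crosses zero (or half its value) is of interest
alphas <- seq(0, 2, length.out = 6)
deltas <- seq(0, 15, length.out = 6)
tau.hat <- matrix(NA, length(alphas), length(deltas))
for (i in seq_along(alphas)) {
  for (j in seq_along(deltas)) {
    tau.hat[i, j] <- optim(par = init, fn = loglik.lm,
                           alpha = alphas[i], delta = deltas[j],
                           env = tmp, control = list(fnscale = -1))$par["w"]
  }
}
contour(alphas, deltas, tau.hat, xlab = "alpha", ylab = "delta",
        main = "Estimated tau at each (alpha, delta)")
```
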
The default plot colors for treatSens are not great for presentations, and are difficult to change with base R graphics. I have written a script that makes some simple changes to improve presentation, which can also serve as a guide for making your own changes. I have added the tag “vb-edit” where I have made changes to the original code. Download the file here and place it in the working directory for your analysis. Use the source() function to call it.
source("sensPlot_custom.R")
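For reference, a minimal treatSens run on the simulated data might look like the following. This is a hedged sketch: the argument names (grid.dim, nsim, standardize) reflect my reading of the package’s interface and may differ across versions, so check the documentation of your installed version:

```r
# Hedged sketch of a treatSens run on the simulated data above;
# argument names may vary by package version
library(treatSens)
df <- data.frame(y = as.numeric(y), w = as.numeric(w), x = as.numeric(x))
out <- treatSens(y ~ w + x, data = df,
                 grid.dim = c(5, 5),    # coarse grid of (alpha, delta) values
                 nsim = 5,              # few simulations, for speed only
                 standardize = FALSE)
sensPlot(out)  # or the customized plotting function sourced above
```
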