### Homework 1 Key

#### Fall 2014

Question 1. Using Python

See hw1_key.py

Question 2. Properties of Random Variables

1. Suppose $$X$$ is a random variable, with $$E[X] = \mu$$ and var$$(X) = \sigma^2$$. Show that $$c = E[X]$$ minimizes

$\begin{eqnarray} E[ (X - c)^2] \nonumber \end{eqnarray}$ Why does this suggest $$E[X]$$ is a "good" guess for the value of $$X$$?

Answer: $\begin{eqnarray} E[ (X - c)^2] & = & E[X^2] - 2 c E[X] + E[c^2] \nonumber \\ & = & E[X^2] - 2 cE[X] + c^2 \nonumber \\ \frac{\partial E[ (X - c)^2]}{\partial c} & = & - 2 E[X] + 2 c \nonumber \\ 0 & = & - 2 E[X] + 2 c^{*} \nonumber \\ E[X] & = & c^{*} \nonumber \end{eqnarray}$

It suggests that $$E[X]$$ is the value of $$c$$ that minimizes the average squared distance. (Since the second derivative with respect to $$c$$ is $$2 > 0$$, the critical value $$c^{*} = E[X]$$ is indeed a minimum.)
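As an illustrative check (not part of the original key), the R sketch below simulates draws of $$X$$ and confirms that the average squared distance is smallest when the guess $$c$$ is at the sample mean; the variable names and simulated distribution are my own choices.

```r
# Sanity check: for simulated draws of X, the average squared distance
# mean((x - c)^2) is minimized when c equals the sample mean of x.
set.seed(1)
x <- rnorm(10000, mean = 2, sd = 3)
msd <- function(c) mean((x - c)^2)
c.grid <- seq(-1, 5, by = 0.01)
c.star <- c.grid[which.min(sapply(c.grid, msd))]
c.star        # very close to mean(x), which is itself close to 2
```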

1. Suppose $$Y$$ and $$Z$$ are random variables, with joint density $$f(y, z)$$.

1. How do we obtain the marginal distribution of $$Y$$, $$f_{Y}(y)$$ (should be an expression involving an integral)?

Answer $\begin{eqnarray} f_{Y}(y) & = & \int_{-\infty}^{\infty} f(y, z) dz \nonumber \end{eqnarray}$

1. How do we obtain the marginal distribution of $$Z$$, $$f_{Z}(z)$$ (should be an expression involving an integral)?

Answer $\begin{eqnarray} f_{Z}(z) & = & \int_{-\infty}^{\infty} f(y, z) dy \nonumber \end{eqnarray}$

1. Show that if $$Y$$ and $$Z$$ are independent, $$E[YZ] = E[Y]E[Z]$$.

Answer $\begin{eqnarray} E[YZ] & = & \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} yz f(y, z) dydz \nonumber \\ & = & \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} yz f_{Y}(y) f_{Z}(z) dy dz \nonumber \\ & = & \int_{-\infty}^{\infty} y f_{Y}(y)dy \int_{-\infty}^{\infty} z f_{Z}(z) dz \nonumber \\ & = & E[Y] E[Z] \nonumber \end{eqnarray}$
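A quick Monte Carlo check of this identity (illustrative only; the simulated distributions below are my own assumptions, not part of the key):

```r
# For independent Y and Z, the sample mean of Y*Z should be close to
# the product of the sample means, mirroring E[YZ] = E[Y]E[Z].
set.seed(1)
y <- rnorm(100000, mean = 1)
z <- rnorm(100000, mean = 2)
mean(y * z)          # approximately 2
mean(y) * mean(z)    # also approximately 2
```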

Question 3: Finding Critical Values for a Function

Suppose we have a function $$f:\Re\rightarrow\Re$$, $$f(x) = \sin(x)$$.

1. Using R plot $$\sin(x)$$ for $$x \in [-2 \pi, 2 \pi]$$ (here $$\pi$$ is the mathematical constant).
xx<- seq(-2*pi, 2*pi, len = 1000)
plot(sin(xx)~xx, type='l', lwd = 3, xlab = 'x', ylab = 'sin(x)', main= "f(x) = sin(x)")

1. What is $$f^{'}(x)$$ (first derivative at $$x$$)? Using R plot it over $$[-2 \pi, 2 \pi]$$
plot(cos(xx)~xx, type='l', lwd = 3, xlab = 'x', ylab = 'cos(x)', main= "f'(x) = cos(x)")

1. What is $$f^{''}(x)$$ (second derivative at $$x$$)? Using R plot it over $$[-2 \pi, 2 \pi]$$
plot(I(-sin(xx))~xx, type='l', lwd = 3, xlab = 'x', ylab = '-sin(x)', main= "f''(x) = -sin(x)")

We say that $$x^{*}$$ is a critical value for a function if $$f^{'}(x^{*}) = 0$$. We can find $$x^{*}$$ algebraically, or we can use a computational approach. We discussed the Newton-Raphson approach in class on Thursday. We're going to write our own implementation of the algorithm in R and apply it to find the critical values of $$f(x) = \sin(x)$$.

1. Suppose that we have current guess for the root $$x_{t}$$. Then the updated guess, $$x_{t+1}$$ is given by $\begin{eqnarray} x_{t+1} & = & x_{t} - \frac{f^{'}(x_{t})}{f^{''}(x_{t})} \end{eqnarray}$

where $$f^{'}(x_{t})$$ is the first derivative evaluated at $$x_{t}$$ and $$f^{''}(x_{t})$$ is the second derivative evaluated at $$x_{t}$$.

Write a function in R that provides the update step for some value $$x_{t}$$ if $$f(x) = \sin(x)$$.

up_newt <- function(x){
  x.1 <- x + cos(x)/sin(x)
  return(x.1)
}
1. The Newton-Raphson algorithm continues to update until the size of the update step drops below a threshold. Using the while command in R, write a loop that continues updating until the change ($$|x_{t+1} - x_{t}|$$) drops below $$10^{-5}$$.
while(diff > tol){
  x <- x.1
  x.1 <- up_newt(x)
  diff <- abs(x.1 - x)
}
1. Place the while loop in a function that returns the converged value $$x_{\text{final}}$$
newt_raph <- function(x, tol){
  x.1 <- up_newt(x)
  diff <- abs(x.1 - x)
  while(diff > tol){
    x <- x.1
    x.1 <- up_newt(x)
    diff <- abs(x.1 - x)
  }
  return(x.1)
}
1. Use your function with initial guesses $$-2, -1, 1, 2$$. What values do you obtain? Now examine the behavior close to $$0$$. Why is it so unstable?
guesses <- c(-2, -1, 1, 2)
for (i in guesses){
  print(newt_raph(i, 10^-5))
}
## [1] -1.571
## [1] -1.571
## [1] 1.571
## [1] 1.571

The algorithm is unstable near $$0$$ because $$f^{''}(x) = -\sin(x)$$ and $$\sin(0) = 0$$: the update step divides by a quantity close to zero, so small differences in starting values produce very large differences in the first step of the algorithm.
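To see the instability concretely, here is a small sketch (restating the update rule from above) showing the first Newton-Raphson step for starting values just on either side of $$0$$:

```r
# The update divides by f''(x) = -sin(x), which is near zero around x = 0,
# so the first step is huge and its sign flips with the starting value.
up_newt <- function(x) x + cos(x)/sin(x)
up_newt(0.01)     # roughly  100
up_newt(-0.01)    # roughly -100
```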

Question 4: Probit Regression with a Prior

Suppose that we assume the following data generation process $\begin{eqnarray} Y_{i} & \sim & \text{Bernoulli}(\pi_{i} ) \nonumber \\ \pi_{i} & = & \Phi(\boldsymbol{X}_{i} \boldsymbol{\beta}) \nonumber \\ \beta_{j} & \sim & \text{Normal}(\mu, \sigma^{2}_{j} )\nonumber \end{eqnarray}$ with $$\boldsymbol{X}_{i} = (1, x_{i})$$ for all $$i$$ $$(i = 1, ..., N)$$, $$\boldsymbol{\beta} = (\beta_{1}, \beta_{2})$$, and $$\Phi(\cdot)$$ the standard normal cumulative distribution function.

This is very similar to the model described in class, but now we have added a prior on $$\boldsymbol{\beta}$$. This slightly alters the objective function: $\begin{eqnarray} p(\boldsymbol{\beta} | \boldsymbol{Y}, \boldsymbol{X}) & \propto & p(\boldsymbol{\beta}|\mu, \sigma^2) \times p(\boldsymbol{Y}| \boldsymbol{X}, \boldsymbol{\beta}) \label{e:post} \\ & \propto & \prod_{j=1}^{2} \frac{1}{\sqrt{2\pi} \sigma} \exp\left( - \frac{(\beta_{j} - \mu )^2}{2 \sigma^2} \right) \times \prod_{i=1}^{N} \Phi(\boldsymbol{X}_{i}\boldsymbol{\beta})^{Y_{i}} \left( 1- \Phi(\boldsymbol{X}_{i}\boldsymbol{\beta}) \right)^{1 - Y_{i}} \nonumber \end{eqnarray}$

In this problem, we will examine how the prior on $$\boldsymbol{\beta}$$, and in particular the values we set for $$\mu$$ and $$\sigma^2$$, alters our inferences about $$\boldsymbol{\beta}$$.

1. Analytically, write out the $$\log(p(\boldsymbol{\beta} | \boldsymbol{Y}, \boldsymbol{X}))$$.
Answer $\begin{eqnarray} \log p(\boldsymbol{\beta}| \boldsymbol{Y}, \boldsymbol{X}) & = & \sum_{j=1}^{2} \left(- \log \sigma + \frac{- (\beta_{j} - \mu)^{2}}{2 \sigma^2} \right) + \sum_{i=1}^{N} Y_{i} \log \Phi(\boldsymbol{X}_{i} \boldsymbol{\beta}) + (1- Y_{i}) \log \left( 1 - \Phi(\boldsymbol{X}_{i} \boldsymbol{\beta}) \right) + c \nonumber \end{eqnarray}$

2. In R create a function for the $$\log$$ of Equation 1.

log.post <- function(pars, mu, sig.2, Y, X){
  beta <- pars[1:2]
  part1 <- dnorm(beta, mean = mu, sd = sqrt(sig.2), log = TRUE)
  part2 <- X %*% beta
  probs <- pnorm(part2)
  combine <- log(probs)*Y + log(1 - probs)*(1 - Y)
  out <- sum(part1) + sum(combine)
  return(out)
}
1. Using the synthetic data and the optim guide from class, use optim to find $$\widehat{\boldsymbol{\beta}}$$ with $$\mu = 0$$ and $$\sigma^2 = 1000$$.
data_set <- read.delim('~/Dropbox/TextAsData/ProblemSets/PS1/DataSet.csv', sep = ',')

data_set <- as.matrix(data_set)
Y <- data_set[, 1]
X <- data_set[, -1]

max.output <- optim(c(1, 1), log.post, method = 'BFGS', control = list(trace = 100, fnscale = -1), hessian = TRUE, X = X, Y = Y, mu = 0, sig.2 = 1000)
## initial  value 794.131479
## iter  10 value 194.440790
## final  value 194.440790
## converged
max.output
## $par
## [1]  0.8724 -1.3899
##
## $value
## [1] -194.4
##
## $counts
## function gradient
##       34       10
##
## $convergence
## [1] 0
##
## $message
## NULL
##
## $hessian
##         [,1]   [,2]
## [1,] -181.75 -70.77
## [2,]  -70.77 -97.58
1. Set $$\mu = 1$$ and then vary $$\sigma^2$$. Using a for loop, store estimates of how $$\beta_{2}$$ changes as you vary $$\sigma^2$$ from $$10$$ to $$0.01$$. Plot $$\beta_{2}$$ against $$\sigma^2$$ and describe what happens as $$\sigma^2$$ varies.
sigma.2.seq <- seq(0.01, 10, len = 1000)
store.est <- c()

for(z in 1:length(sigma.2.seq)){
  temp <- optim(c(1, 1), log.post, method = 'BFGS', control = list(trace = 100, fnscale = -1), hessian = TRUE, X = X, Y = Y, mu = 1, sig.2 = sigma.2.seq[z])
  store.est[z] <- temp$par[2]
}
# output suppressed

As $$\sigma^2 \rightarrow 0$$, the prior becomes more informative and affects the inference more.

  plot(store.est~sigma.2.seq, type='b', xlab = 'Sigma^2', ylab = 'Beta2')