Homework 1 Key

PS 452: Text as Data

Fall 2014

Question 1. Using Python

See hw1_key.py

Question 2. Properties of Random Variables

  1. Suppose \(X\) is a random variable, with \(E[X] = \mu\) and var\((X) = \sigma^2\). Show that \(c = E[X]\) minimizes

\[ \begin{eqnarray} E[ (X - c)^2] \nonumber \end{eqnarray} \] Why does this suggest \(E[X]\) is a "good" guess for the value of \(X\)?

Answer: \[ \begin{eqnarray} E[ (X - c)^2] & = & E[X^2] - 2 c E[X] + E[c^2] \nonumber \\ & = & E[X^2] - 2 cE[X] + c^2 \nonumber \\ \frac{\partial E[ (X - c)^2]}{\partial c} & = & - 2 E[X] + 2 c \nonumber \\ 0 & = & - 2 E[X] + 2 c^{*} \nonumber \\ E[X] & = & c^{*} \nonumber \end{eqnarray} \]

Since \(\frac{\partial^2 E[(X - c)^2]}{\partial c^2} = 2 > 0\), \(c^{*} = E[X]\) is indeed a minimum. This suggests \(E[X]\) is a "good" guess because it is the value of \(c\) that minimizes the average squared distance to \(X\).
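
As a quick numerical check (a minimal sketch; the Normal distribution and the grid of candidate guesses are arbitrary choices), we can simulate draws of \(X\) and confirm the average squared distance is smallest when \(c\) is near the sample mean:

  set.seed(1234)
  x.draws<- rnorm(10000, mean = 2, sd = 3)                  # simulated X with E[X] = 2
  c.grid<- seq(-2, 6, by = 0.01)                            # candidate guesses c
  mse<- sapply(c.grid, function(c) mean((x.draws - c)^2))   # average squared distance
  c.grid[which.min(mse)]                                    # close to mean(x.draws), i.e. roughly 2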

  2. Suppose \(Y\) and \(Z\) are random variables, with joint density \(f(y, z)\).

    1. How do we obtain the marginal distribution of \(Y\), \(f_{Y}(y)\) (should be an expression involving an integral)?

    Answer: \[ \begin{eqnarray} f_{Y}(y) & = & \int_{-\infty}^{\infty} f(y, z)\, dz \nonumber \end{eqnarray} \]

    2. How do we obtain the marginal distribution of \(Z\), \(f_{Z}(z)\) (should be an expression involving an integral)?

    Answer: \[ \begin{eqnarray} f_{Z}(z) & = & \int_{-\infty}^{\infty} f(y, z)\, dy \nonumber \end{eqnarray} \]
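
To make the marginalization concrete (a minimal sketch assuming a particular joint density, here independent standard normals so the marginal of \(y\) is known to be the standard normal density), we can carry out the integral numerically:

  f.joint<- function(y, z) dnorm(y)*dnorm(z)   # assumed joint density f(y, z)
  # marginal of Y at a fixed y: integrate the joint density over z
  f.Y<- function(y) integrate(function(z) f.joint(y, z), lower = -Inf, upper = Inf)$value
  f.Y(0.5)     # numerical marginal
  dnorm(0.5)   # analytic marginal; the two should agree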

    3. Show that if \(Y\) and \(Z\) are independent, \(E[YZ] = E[Y]E[Z]\).

    Answer: By independence, \(f(y, z) = f_{Y}(y) f_{Z}(z)\), so \[ \begin{eqnarray} E[YZ] & = & \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} yz \, f(y, z)\, dy\, dz \nonumber \\ & = & \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} yz \, f_{Y}(y) f_{Z}(z)\, dy\, dz \nonumber \\ & = & \int_{-\infty}^{\infty} y f_{Y}(y)\, dy \int_{-\infty}^{\infty} z f_{Z}(z)\, dz \nonumber \\ & = & E[Y] E[Z] \nonumber \end{eqnarray} \]
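
A quick simulation check (a sketch; the particular distributions chosen for \(Y\) and \(Z\) are arbitrary, all that matters is that the draws are independent):

  set.seed(42)
  y<- rnorm(100000, mean = 1, sd = 2)   # Y drawn independently of Z
  z<- runif(100000, min = 0, max = 4)
  mean(y*z)                             # E[YZ]
  mean(y)*mean(z)                       # E[Y]E[Z]; approximately equal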

Question 3: Finding Critical Values for a Function

Suppose we have a function \(f:\Re\rightarrow\Re\), \(f(x) = \sin(x)\).

  1. Using R, plot \(\sin(x)\) for \(x \in [-2 \pi, 2 \pi]\) (here \(\pi\) is the mathematical constant).
xx<- seq(-2*pi, 2*pi, len = 1000)   # grid of x values over [-2*pi, 2*pi]
plot(sin(xx)~xx, type='l', lwd = 3, xlab = 'x', ylab = 'sin(x)', main= "f(x) = sin(x)")

[Plot: f(x) = sin(x) over [-2*pi, 2*pi]]

  2. What is \(f^{'}(x)\) (first derivative at \(x\))? Using R, plot it over \([-2 \pi, 2 \pi]\). Answer: \(f^{'}(x) = \cos(x)\).
plot(cos(xx)~xx, type='l', lwd = 3, xlab = 'x', ylab = 'cos(x)', main= "f'(x) = cos(x)")

[Plot: f'(x) = cos(x)]

  3. What is \(f^{''}(x)\) (second derivative at \(x\))? Using R, plot it over \([-2 \pi, 2 \pi]\). Answer: \(f^{''}(x) = -\sin(x)\).
plot(I(-sin(xx))~xx, type='l', lwd = 3, xlab = 'x', ylab = '-sin(x)', main= "f''(x) = -sin(x)")

[Plot: f''(x) = -sin(x)]

We say that \(x^{*}\) is a critical value for a function if \(f^{'}(x^{*}) = 0\). We can find \(x^{*}\) algebraically, or we can use a computational approach. We discussed the Newton-Raphson approach in class on Thursday. We’re going to write our own implementation of the algorithm in R and apply it to find the critical values of \(f(x) = \sin(x)\).

  4. Suppose that we have a current guess for the root, \(x_{t}\). Then the updated guess \(x_{t+1}\) is given by \[ \begin{eqnarray} x_{t+1} & = & x_{t} - \frac{f^{'}(x_{t})}{f^{''}(x_{t})} \end{eqnarray} \]

where \(f^{'}(x_{t})\) is the first derivative evaluated at \(x_{t}\) and \(f^{''}(x_{t})\) is the second derivative evaluated at \(x_{t}\).

Write a function in R that provides the update step for some value \(x_{t}\) if \(f(x) = \sin(x)\). Since \(f^{'}(x) = \cos(x)\) and \(f^{''}(x) = -\sin(x)\), the update simplifies to \(x_{t+1} = x_{t} + \cos(x_{t})/\sin(x_{t})\).

  up_newt<- function(x){
        # Newton-Raphson step for f(x) = sin(x):
        # x - f'(x)/f''(x) = x - cos(x)/(-sin(x)) = x + cos(x)/sin(x)
        x.1<- x + cos(x)/sin(x)
        return(x.1)
  }  
  5. The Newton-Raphson algorithm continues to update until the size of the update step drops below a threshold. Using the while command in R, write a loop that continues updating until the change (\(|x_{t+1} - x_{t}|\)) drops below \(10^{-5}\).
    # take one step first so that diff is defined before the loop starts
    x.1<- up_newt(x)
    diff<- abs(x.1 - x)
    while(diff>tol){
            x<- x.1
            x.1<- up_newt(x)
            diff<- abs(x.1 - x)
    }
  6. Place the while loop in a function that returns the converged value \(x_{\text{final}}\).
  newt_raph<- function(x, tol){
        # take one step to initialize the convergence check
        x.1<- up_newt(x)
        diff<- abs(x.1 - x)
        while(diff>tol){
            x<- x.1
            x.1<- up_newt(x)
            diff<- abs(x.1 - x)
        }
        return(x.1)
  }
  7. Use your function with initial guesses \(-2, -1, 1, 2\). What values do you obtain? Now examine the behavior close to \(0\). Why is it so unstable?
  guesses<- c(-2, -1, 1, 2)
  for (i in guesses){
        print(newt_raph(i, 10^-5))
  }  
## [1] -1.571
## [1] -1.571
## [1] 1.571
## [1] 1.571

Each starting value converges to the nearest critical value, \(\pm \pi/2 \approx \pm 1.571\). The algorithm is unstable near \(0\) because the update divides by \(f^{''}(x) = -\sin(x)\), and \(\sin(0) = 0\): the step \(\cos(x)/\sin(x)\) becomes enormous as \(x \rightarrow 0\), so small differences in starting values cause very big differences in the first step of the algorithm.
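
To see this, compare the first update step from two nearby starting values on either side of \(0\):

  up_newt(0.01)    # roughly  100: cos(x)/sin(x) blows up near 0
  up_newt(-0.01)   # roughly -100: a tiny change in start flips the direction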

Question 4: Probit Regression with a Prior

Suppose that we assume the following data generation process \[ \begin{eqnarray} Y_{i} & \sim & \text{Bernoulli}(\pi_{i} ) \nonumber \\ \pi_{i} & = & \Phi(\boldsymbol{X}_{i} \boldsymbol{\beta}) \nonumber \\ \beta_{j} & \sim & \text{Normal}(\mu, \sigma^{2}_{j} )\nonumber \end{eqnarray} \] with \(\boldsymbol{X}_{i} = (1, x_{i})\) for all \(i\) \((i = 1, ..., N)\), \(\boldsymbol{\beta} = (\beta_{1}, \beta_{2})\), and where \(\Phi(\cdot)\) is the standard normal cumulative distribution function.

This is very similar to the model described in class, but now we have added a prior on \(\boldsymbol{\beta}\). This slightly alters the objective function: \[ \begin{eqnarray} p(\boldsymbol{\beta} | \boldsymbol{Y}, \boldsymbol{X}) & \propto & p(\boldsymbol{\beta}|\mu, \sigma^2) \times p(\boldsymbol{Y}| \boldsymbol{X}, \boldsymbol{\beta}) \label{e:post} \\ & \propto & \prod_{j=1}^{2} \frac{1}{\sqrt{2\pi} \sigma} \exp\left( - \frac{(\beta_{j} - \mu )^2}{2 \sigma^2} \right) \times \prod_{i=1}^{N} \Phi(\boldsymbol{X}_{i}\boldsymbol{\beta})^{Y_{i} } \left( 1- \Phi(\boldsymbol{X}_{i}\boldsymbol{\beta}) \right)^{1 - Y_{i} } \nonumber \end{eqnarray} \]

In this problem, we will examine how the prior on \(\boldsymbol{\beta}\), and in particular the values we set for \(\mu\) and \(\sigma^2\), alters our inferences about \(\boldsymbol{\beta}\).

  1. Analytically, write out the \(\log(p(\boldsymbol{\beta} | \boldsymbol{Y}, \boldsymbol{X}))\).
    Answer: \[ \begin{eqnarray} \log p(\boldsymbol{\beta}| \boldsymbol{Y}, \boldsymbol{X}) & = & \sum_{j=1}^{2} \left(- \log \sigma - \frac{(\beta_{j} - \mu)^{2}}{2 \sigma^2} \right) + \sum_{i=1}^{N} Y_{i} \log \Phi(\boldsymbol{X}_{i} \boldsymbol{\beta}) + (1- Y_{i}) \log \left( 1 - \Phi(\boldsymbol{X}_{i} \boldsymbol{\beta}) \right) + c \nonumber \end{eqnarray} \]

  2. In R, create a function for the \(\log\) of Equation 1.

    log.post<- function(pars, mu, sig.2, Y, X){
        beta<- pars[1:2]
        # log prior: independent Normal(mu, sig.2) density for each coefficient
        part1<- dnorm(beta, mean = mu, sd = sqrt(sig.2), log = TRUE)
        # log likelihood: probit success probabilities
        part2<- X%*%beta
        probs<- pnorm(part2)
        combine<- log(probs)*Y + log(1- probs)*(1-Y)
        out<- sum(part1) + sum(combine)
        return(out)
    }
  3. Using the synthetic data and the optim guide from class, use optim to find \(\widehat{\boldsymbol{\beta}}\) with \(\mu = 0\) and \(\sigma^2 = 1000\).
  data_set<- read.delim('~/Dropbox/TextAsData/ProblemSets/PS1/DataSet.csv', sep=',')

  data_set<- as.matrix(data_set)
  Y<- data_set[,1]    # outcome in the first column
  X<- data_set[,-1]   # intercept and covariate in the remaining columns
  
  # fnscale = -1 tells optim to maximize the log posterior rather than minimize it
  max.output<- optim(c(1, 1), log.post, method = 'BFGS', control = list(trace =100, fnscale = -1), hessian = TRUE,  X = X, Y = Y , mu = 0, sig.2 = 1000)
## initial  value 794.131479 
## iter  10 value 194.440790
## iter  10 value 194.440790
## iter  10 value 194.440790
## final  value 194.440790 
## converged
  max.output
## $par
## [1]  0.8724 -1.3899
## 
## $value
## [1] -194.4
## 
## $counts
## function gradient 
##       34       10 
## 
## $convergence
## [1] 0
## 
## $message
## NULL
## 
## $hessian
##         [,1]   [,2]
## [1,] -181.75 -70.77
## [2,]  -70.77 -97.58
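
One note on the output (not required by the problem): with fnscale = -1, the Hessian optim returns here is that of the log posterior itself, and it is negative definite at the mode. A minimal sketch of turning it into approximate standard errors, using the max.output object above:

  se.hat<- sqrt(diag(solve(-max.output$hessian)))   # invert the negated Hessian
  se.hat
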
  4. Set \(\mu = 1\) and then vary \(\sigma^2\). Using a for loop, store estimates of how \(\beta_{2}\) changes as you vary \(\sigma^2\) from \(10\) to \(0.01\). Plot \(\beta_{2}\) against \(\sigma^2\) and describe what happens as \(\sigma^2\) varies.
  sigma.2.seq<- seq(0.01, 10, len = 1000)   # grid of prior variances
  store.est<- c()
  
  for(z in 1:length(sigma.2.seq)){
        temp<- optim(c(1, 1), log.post, method = 'BFGS', control = list(trace =100, fnscale = -1), hessian = TRUE,  X = X, Y = Y , mu = 1, sig.2 = sigma.2.seq[z])
        store.est[z]<- temp$par[2]
  }
  # output suppressed

As \(\sigma^2 \rightarrow 0\), the prior becomes more informative and pulls \(\widehat{\beta}_{2}\) toward the prior mean \(\mu = 1\); as \(\sigma^2\) grows, the data dominate and the estimate approaches the no-prior value.

  plot(store.est~sigma.2.seq, type='b', xlab = 'Sigma^2', ylab = 'Beta2')

[Plot: estimated Beta2 against Sigma^2]