Assumptions for t-tests

J. Taylor

2024-01-01

Outline

  • Case studies:

    1. Cloud seeding

    2. Effects of Agent Orange

  • Robustness and resistance of two-sample \(t\)-tests

  • Transformations

Case study A: effect of cloud seeding

Histogram of Rainfall stratified by Treatment

Practical tip: log transformation

Histogram of log(Rainfall) stratified by Treatment

Does cloud seeding help?

  • On the log scale the histograms of the two groups have similar shapes \(\implies\) a \(t\)-test on log(Rainfall) is probably well founded here.
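A minimal sketch of the equal-variance (pooled) two-sample \(t\)-statistic applied on the log scale. The function name `pooled_t` and the rainfall numbers below are hypothetical, made up for illustration; they are not the case-study data. Note that the log transform needs strictly positive values.

```python
import math
import statistics


def pooled_t(a, b):
    """Two-sample t-statistic assuming equal population variances.

    Returns (t, df) with t = (mean(a) - mean(b)) / SE and df = n_a + n_b - 2.
    """
    n_a, n_b = len(a), len(b)
    # Pooled variance: sample variances weighted by their degrees of freedom
    s2_p = ((n_a - 1) * statistics.variance(a)
            + (n_b - 1) * statistics.variance(b)) / (n_a + n_b - 2)
    se = math.sqrt(s2_p * (1 / n_a + 1 / n_b))
    return (statistics.mean(a) - statistics.mean(b)) / se, n_a + n_b - 2


# Hypothetical rainfall values (NOT the case-study data), one list per group.
treated = [120.0, 45.0, 300.0, 80.0, 15.0]
untreated = [20.0, 5.0, 60.0, 12.0, 33.0]

# Run the t-test on log(Rainfall) rather than on Rainfall itself.
t_log, df = pooled_t([math.log(x) for x in treated],
                     [math.log(x) for x in untreated])
print(f"t = {t_log:.2f} on {df} df (log scale)")
```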

Robustness of two-sample \(t\)-tests

  • Our analysis of beaks assumed \(\sigma^2_A=\sigma^2_B\) (as well as normality).

  • What happens if:

    1. Unequal variance: \(\sigma^2_A \neq \sigma^2_B\)?

    2. Populations are not normal?

    3. Observations are not independent?

    4. Data are contaminated with outliers?

Mental model

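  • Draw \(n_A\) observations from the orange population and \(n_B\) from the purple population, compute the \(t\)-statistic, and repeat to see how often the test rejects.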
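The mental model can be turned into a small Monte Carlo experiment. This is a sketch under stated assumptions: the helper names (`pooled_t`, `rejection_rate`), the number of replications, and the seed are hypothetical choices, and 2.024 is an approximate \(t_{0.975}\) critical value for 38 df taken from a \(t\) table. When both populations have the same mean, the rejection rate estimates the test's actual type I error rate, which should be near the nominal 0.05 when all assumptions hold.

```python
import math
import random
import statistics


def pooled_t(a, b):
    """Equal-variance two-sample t-statistic."""
    n_a, n_b = len(a), len(b)
    s2_p = ((n_a - 1) * statistics.variance(a)
            + (n_b - 1) * statistics.variance(b)) / (n_a + n_b - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(s2_p * (1 / n_a + 1 / n_b))


def rejection_rate(draw_a, draw_b, n_a, n_b, crit, reps=4000, seed=1):
    """Fraction of simulated data sets with |t| > crit.

    draw_a and draw_b take a random.Random and return one observation from
    the "orange" and "purple" populations, respectively.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        a = [draw_a(rng) for _ in range(n_a)]
        b = [draw_b(rng) for _ in range(n_b)]
        hits += abs(pooled_t(a, b)) > crit
    return hits / reps


def std_normal(rng):
    return rng.gauss(0, 1)


# Baseline: both populations N(0, 1), n_A = n_B = 20, so df = 38.
rate = rejection_rate(std_normal, std_normal, 20, 20, crit=2.024)
print(f"estimated type I error: {rate:.3f} (nominal 0.05)")
```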

Non-normality

Equal sample size \(n_A \approx n_B\)

  • Long tails and skewness have only a modest effect on the \(t\)-tools.

Unequal sample size \(n_A \neq n_B\)

  • The \(t\)-tools are substantially affected by skewness.

Skewness

  • If the two distributions differ considerably in skewness, the \(t\)-tools can be misleading for small and moderate sample sizes.
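The same simulation idea can probe the non-normality and skewness cases. This sketch assumes a shifted Exponential(1) as the right-skewed population and its mirror image as the left-skewed one (both with mean 0 and variance 1); these choices, the sample sizes, and the helper names are hypothetical. The critical values 2.024 (38 df) and 2.101 (18 df) are approximate \(t_{0.975}\) table values. The printed rates are estimates to compare against 0.05, not fixed results.

```python
import math
import random
import statistics


def pooled_t(a, b):
    """Equal-variance two-sample t-statistic."""
    n_a, n_b = len(a), len(b)
    s2_p = ((n_a - 1) * statistics.variance(a)
            + (n_b - 1) * statistics.variance(b)) / (n_a + n_b - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(s2_p * (1 / n_a + 1 / n_b))


def rejection_rate(draw_a, draw_b, n_a, n_b, crit, reps=4000, seed=1):
    """Fraction of simulated data sets with |t| > crit."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        a = [draw_a(rng) for _ in range(n_a)]
        b = [draw_b(rng) for _ in range(n_b)]
        hits += abs(pooled_t(a, b)) > crit
    return hits / reps


def right_skew(rng):
    # Exponential(1) shifted to mean 0: long right tail
    return rng.expovariate(1.0) - 1.0


def left_skew(rng):
    # Mirror image: same mean and variance, long left tail
    return 1.0 - rng.expovariate(1.0)


# Same skewed population in both groups; total n = 40 either way (df = 38).
equal_n = rejection_rate(right_skew, right_skew, 20, 20, crit=2.024)
unequal_n = rejection_rate(right_skew, right_skew, 5, 35, crit=2.024)

# Different skewness in the two groups, small equal samples (df = 18).
diff_skew = rejection_rate(right_skew, left_skew, 10, 10, crit=2.101)

for label, r in [("equal n", equal_n), ("unequal n", unequal_n),
                 ("different skewness", diff_skew)]:
    print(f"{label:>18}: {r:.3f} (nominal 0.05)")
```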

Unequal standard deviations \(\sigma_A \neq \sigma_B\)

  • If \(n_A \approx n_B\), the effect is small.

  • The effect can be much larger if \(n_A \neq n_B\).
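A sketch of the unequal-SD case, assuming normal populations with SDs 1 and 3 and equal means (hypothetical choices). With \(n_A = 30\) from the small-SD population and \(n_B = 10\) from the large-SD one, the pooled variance is dominated by the larger sample, so the pooled \(SE\) tends to be too small and the rejection rate should land well above 0.05; with \(n_A = n_B = 20\) the distortion should be much milder. Both settings have 38 df, so the approximate critical value 2.024 is used for each.

```python
import math
import random
import statistics


def pooled_t(a, b):
    """Equal-variance two-sample t-statistic."""
    n_a, n_b = len(a), len(b)
    s2_p = ((n_a - 1) * statistics.variance(a)
            + (n_b - 1) * statistics.variance(b)) / (n_a + n_b - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(s2_p * (1 / n_a + 1 / n_b))


def rejection_rate(draw_a, draw_b, n_a, n_b, crit, reps=4000, seed=1):
    """Fraction of simulated data sets with |t| > crit."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        a = [draw_a(rng) for _ in range(n_a)]
        b = [draw_b(rng) for _ in range(n_b)]
        hits += abs(pooled_t(a, b)) > crit
    return hits / reps


def small_sd(rng):
    return rng.gauss(0, 1)


def large_sd(rng):
    return rng.gauss(0, 3)


# Larger sample drawn from the population with the smaller SD.
unequal_n = rejection_rate(small_sd, large_sd, 30, 10, crit=2.024)
# Same two populations, balanced sample sizes.
equal_n = rejection_rate(small_sd, large_sd, 20, 20, crit=2.024)

print(f"n_A != n_B: {unequal_n:.3f}   n_A = n_B: {equal_n:.3f}   (nominal 0.05)")
```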

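Non-independent observations

  • The \(t\)-tools can perform poorly here.

  • The main problem is that the usual \(SE\) formula is wrong; with positively correlated observations it usually underestimates the true \(SE\), so confidence intervals are too narrow and p-values too small.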
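A sketch of how positive correlation makes the naive \(SE = s/\sqrt{n}\) too small. It assumes, purely for illustration, that observations within a sample follow a stationary AR(1) series with correlation \(\rho\) between neighbors; the helper names are hypothetical. The "actual" SE is the Monte Carlo standard deviation of the sample mean across replications, so a ratio below 1 means the naive formula underestimates it.

```python
import math
import random
import statistics


def ar1_sample(rng, n, rho):
    """n consecutive values of a stationary AR(1) series with variance 1."""
    x = rng.gauss(0, 1)
    out = [x]
    for _ in range(n - 1):
        # Innovation SD chosen so the marginal variance stays at 1
        x = rho * x + rng.gauss(0, math.sqrt(1 - rho ** 2))
        out.append(x)
    return out


def se_ratio(n=20, rho=0.5, reps=4000, seed=1):
    """Average naive SE (s / sqrt(n)) divided by the simulated SD of the mean."""
    rng = random.Random(seed)
    means, naive_ses = [], []
    for _ in range(reps):
        x = ar1_sample(rng, n, rho)
        means.append(statistics.mean(x))
        naive_ses.append(statistics.stdev(x) / math.sqrt(n))
    return statistics.mean(naive_ses) / statistics.stdev(means)


print(f"naive SE / actual SE, rho = 0.5: {se_ratio():.2f}")
print(f"naive SE / actual SE, rho = 0.0: {se_ratio(rho=0.0):.2f}")
```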

Outliers

  • An observation that lies far from the rest of the data.

  • May come from an error in constructing the dataset, or may simply reflect a long-tailed population.

  • Try analyzing the data both with and without the candidate outliers, and compare the conclusions.
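A minimal sketch of the with/without comparison. The helper `with_and_without` and the flagging rule passed to it are hypothetical; in practice the candidate outliers would be chosen by looking at the data, and the numbers below are made up for illustration.

```python
import math
import statistics


def pooled_t(a, b):
    """Equal-variance two-sample t-statistic."""
    n_a, n_b = len(a), len(b)
    s2_p = ((n_a - 1) * statistics.variance(a)
            + (n_b - 1) * statistics.variance(b)) / (n_a + n_b - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(s2_p * (1 / n_a + 1 / n_b))


def with_and_without(a, b, is_candidate):
    """Pooled t on all data, and with flagged observations dropped.

    is_candidate is a predicate applied to every observation in both groups;
    each group must keep at least 2 observations.
    """
    keep_a = [x for x in a if not is_candidate(x)]
    keep_b = [x for x in b if not is_candidate(x)]
    return pooled_t(a, b), pooled_t(keep_a, keep_b)


# Hypothetical data with one extreme value in group A.
group_a = [3.0, 5.0, 4.0, 6.0, 25.0, 4.5]
group_b = [3.5, 4.0, 5.0, 4.5, 3.0]

t_all, t_trim = with_and_without(group_a, group_b, lambda x: x > 20)
print(f"t with all data: {t_all:.2f}; without candidates: {t_trim:.2f}")
```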

Case study B: dioxin in veterans

Histogram of Dioxin stratified by Veteran

Outliers?

  • Two Vietnam veterans have dioxin levels > 20.

  • The histograms have similar shapes (hence similar skewness) + large sample sizes \(\implies\) \(t\)-test probably reasonable.

Transformations

  • We saw earlier that the histograms of log(Rainfall) looked more “normal”.

  • A \(t\)-test on log(Rainfall) takes \(\mu_{\tt Treated}\) to be the mean of log(Rainfall) after seeding, not the mean of Rainfall itself.

Parameter \(\mu_{\tt Treated} - \mu_{\tt Untreated}\)

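  • Acts multiplicatively on the original Rainfall scale: back-transforming gives the ratio \(e^{\mu_{\tt Treated}}/e^{\mu_{\tt Untreated}} = e^{\mu_{\tt Treated} - \mu_{\tt Untreated}}\), not a difference.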
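A short derivation of why the log-scale difference becomes a ratio of medians. It assumes log(Rainfall) is symmetric within each group, so that its mean \(\mu\) equals its median, and uses only the fact that \(\exp\) is increasing and therefore maps medians to medians.

```latex
\begin{aligned}
\operatorname{median}(\text{Rainfall})
  &= e^{\operatorname{median}(\log \text{Rainfall})}
  && \text{(exp is increasing)} \\
  &= e^{\mu}
  && \text{(symmetry: median of log(Rainfall) equals its mean } \mu) \\
\frac{\operatorname{median}(\text{Rainfall}_{\tt Treated})}
     {\operatorname{median}(\text{Rainfall}_{\tt Untreated})}
  &= \frac{e^{\mu_{\tt Treated}}}{e^{\mu_{\tt Untreated}}}
   = e^{\mu_{\tt Treated} - \mu_{\tt Untreated}}
\end{aligned}
```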

Interpretation

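  • As noted in the book, the estimated effect is on the log scale.

  • It has a clean interpretation on the original scale when the distribution of the log-transformed data is symmetric: then the mean of log(Rainfall) equals its median, and exponentiating (an increasing function) gives median(Rainfall).

  • We estimate that seeding multiplies median(Rainfall) by \(e^{5.13-3.99} = e^{1.14} \approx 3.13\).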
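The back-transformation is a one-line computation. A sketch, where `multiplicative_effect` is a hypothetical helper name and 5.13 and 3.99 are the two log-scale averages quoted above; reading the result as a ratio of medians relies on the symmetry assumption.

```python
import math


def multiplicative_effect(mean_log_treated, mean_log_untreated):
    """Back-transform a difference of log-scale means into a multiplicative effect.

    Read as a ratio of medians only if log(Rainfall) is roughly symmetric
    in each group.
    """
    return math.exp(mean_log_treated - mean_log_untreated)


ratio = multiplicative_effect(5.13, 3.99)
print(f"median(Rainfall) multiplied by about {ratio:.2f}")  # prints 3.13
```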