2024-01-01
Case studies:
Cloud seeding
Effects of agent orange
Robustness and resistance of two-sample \(t\)-tests
Transformations
[Figure: Rainfall, stratified by Treatment]
[Figure: log(Rainfall), stratified by Treatment]
Our analysis of beaks presumed \(\sigma^2_A=\sigma^2_B\) (as well as normality).
What happens if:
Unequal variance: \(\sigma^2_A \neq \sigma^2_B\)?
Populations are not normal?
Observations are not independent?
Data are contaminated with outliers?
With unequal variances: if \(n_A \approx n_B\), the effect on the \(t\)-test is small.
The issue is larger if \(n_A \neq n_B\).
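The sample-size point can be checked with a small simulation. The sketch below is only an illustration under assumed settings: the group sizes, the standard deviations (1 and 3), the seed, and the helper names `pooled_t` and `rejection_rate` are all hypothetical choices, not taken from the case studies. Both populations have the same mean, so every rejection is a false positive. For simplicity it uses the normal critical value in place of the \(t\) critical value, which is a close approximation at these degrees of freedom.

```python
import random
from math import sqrt
from statistics import NormalDist


def mean(xs):
    return sum(xs) / len(xs)


def sample_var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def pooled_t(a, b):
    """Two-sample t statistic using the pooled (equal-variance) SE."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * sample_var(a) + (nb - 1) * sample_var(b)) / (na + nb - 2)
    se = sqrt(sp2 * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se


def rejection_rate(n_a, n_b, sd_a, sd_b, n_sim=2000, seed=0):
    """Fraction of simulated null datasets where |t| exceeds the cutoff."""
    rng = random.Random(seed)
    # Normal approximation to the t critical value (assumption: df is large).
    crit = NormalDist().inv_cdf(0.975)
    rejections = 0
    for _ in range(n_sim):
        a = [rng.gauss(0, sd_a) for _ in range(n_a)]
        b = [rng.gauss(0, sd_b) for _ in range(n_b)]
        if abs(pooled_t(a, b)) > crit:
            rejections += 1
    return rejections / n_sim


# Same total sample size, same unequal variances, different allocation.
rate_equal_n = rejection_rate(60, 60, sd_a=1.0, sd_b=3.0)
rate_unequal_n = rejection_rate(100, 20, sd_a=1.0, sd_b=3.0)
print(f"false positive rate, n_A = n_B:  {rate_equal_n:.3f}")
print(f"false positive rate, n_A != n_B: {rate_unequal_n:.3f}")
```

In this setup the smaller group has the larger variance, so the pooled SE is too small and the nominal 5% test rejects far too often. With balanced groups the rate should stay close to 5%.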
When observations are not independent, \(t\)-tests work poorly.
The main problem is that the \(SE\) will be off; usually we underestimate it…
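Positive dependence between observations is one way the \(SE\) ends up underestimated. The sketch below is a minimal illustration, assuming an AR(1) series with correlation 0.5 between neighbouring observations. The model, the parameter values, and the helper `ar1_series` are assumptions made for this example and are not part of the case studies. It compares the usual \(s/\sqrt{n}\) with the actual spread of the sample mean across repeated simulations.

```python
import random
from math import sqrt


def mean(xs):
    return sum(xs) / len(xs)


def sample_sd(xs):
    m = mean(xs)
    return sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))


def ar1_series(n, rho, rng):
    """Stationary AR(1) series with marginal variance 1 (illustrative model)."""
    x = rng.gauss(0, 1)
    out = [x]
    innovation_sd = sqrt(1 - rho ** 2)
    for _ in range(n - 1):
        x = rho * x + rng.gauss(0, innovation_sd)
        out.append(x)
    return out


rng = random.Random(1)
n, rho, n_sim = 50, 0.5, 2000

sample_means = []
naive_ses = []
for _ in range(n_sim):
    xs = ar1_series(n, rho, rng)
    sample_means.append(mean(xs))
    # The usual SE formula, which assumes independent observations.
    naive_ses.append(sample_sd(xs) / sqrt(n))

actual_sd_of_mean = sample_sd(sample_means)
average_naive_se = mean(naive_ses)
print(f"actual SD of the sample mean: {actual_sd_of_mean:.3f}")
print(f"average naive SE:             {average_naive_se:.3f}")
```

Under these assumptions the naive SE comes out noticeably smaller than the true variability of the mean, so \(t\) statistics built on it are too large.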
An outlier is a point in the data that is far from the others.
It could be an accident in dataset construction, or it could be due to long tails…
Try analyzing the data with and without the candidate outliers.
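A with/without comparison can be as simple as computing the test statistic twice. The sketch below uses made-up numbers, not the data from either case study. The cutoff of 20 and the names `group_a`, `group_b`, and `pooled_t` are hypothetical and chosen only to show the mechanics.

```python
from math import sqrt


def mean(xs):
    return sum(xs) / len(xs)


def sample_var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def pooled_t(a, b):
    """Two-sample t statistic using the pooled (equal-variance) SE."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * sample_var(a) + (nb - 1) * sample_var(b)) / (na + nb - 2)
    se = sqrt(sp2 * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se


# Made-up values for illustration only; NOT data from the case studies.
group_a = [2.1, 3.4, 4.0, 4.8, 3.7, 5.2, 4.4, 25.0, 45.0]
group_b = [3.9, 4.1, 2.8, 4.4, 3.5, 4.0, 3.2, 4.6, 3.8]

# Hypothetical rule for flagging candidate outliers.
threshold = 20.0
group_a_trimmed = [x for x in group_a if x <= threshold]

t_all = pooled_t(group_a, group_b)
t_trimmed = pooled_t(group_a_trimmed, group_b)
print(f"t with candidate outliers:    {t_all:.2f}")
print(f"t without candidate outliers: {t_trimmed:.2f}")
```

If the two analyses lead to the same conclusion, the outliers matter little. If they disagree, the conclusion depends on a handful of points and both results are worth reporting.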
[Figure: Dioxin, stratified by Veteran]
Two Vietnam vets with level > 20
The two histograms have similar shapes, so the skewness is similar in both groups.
Similar skewness + large sample sizes \(\implies\) the \(t\)-test is probably not too bad.
We saw earlier that the histogram for log(Rainfall) looked more “normal”.
A \(t\)-test on log(Rainfall) takes \(\mu_{\tt Treated}\) to be the mean of the log of rainfall after seeding…
As noted in the book, the estimated effect is on log scale.
It can be interpreted reasonably well when the distribution of the log-transformed data is symmetric.
We estimate that Treated has an \(e^{5.13-3.99}\) multiplicative effect on median(Rainfall).
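The back-transformation is a one-line calculation. When the log-scale distribution is symmetric, its mean equals its median, and exponentiating a median gives the median on the original scale. The difference in log-scale means therefore becomes a ratio of medians. The sketch below assumes 5.13 and 3.99 are the two group means of log(Rainfall) from the estimate above; the variable names are hypothetical.

```python
from math import exp

# Assumed to be the group means on the log scale, as in the estimate above.
mean_log_treated = 5.13
mean_log_comparison = 3.99

# A difference of means on the log scale is a ratio on the original scale.
multiplicative_effect = exp(mean_log_treated - mean_log_comparison)
print(f"estimated multiplicative effect on the median: {multiplicative_effect:.2f}")
# prints: estimated multiplicative effect on the median: 3.13
```

So the estimate says median rainfall in the Treated group is roughly 3.1 times that of the comparison group.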