Multiple samples
One-way ANOVA
2024-01-01
Case study A: does diet affect longevity?
One-way ANOVA model: a generalization of the two-sample model
Model details
Data: \(Y_{ij}, 1 \leq i \leq n_j, j \in {\tt [lopro, N/N85, N/R40, N/R50, NP, R/R50]}\)
Model: \(Y_{ij} \sim N(\mu_j, \sigma^2)\) (Note: assumed equal variance here!)
Null: \(H_0\) no difference: \(\mu_{\tt lopro}=\dots=\mu_{\tt R/R50}\)
Alternative: \(H_a\) model holds for some values \(\mu_{\tt lopro},\dots,\mu_{\tt R/R50}\) but they are not all identical.
Fitting the model
Null model
Comparing the models
How much better a fit is model than null_model?
\[
\begin{aligned}
SSE_R-SSE_F &= \|\hat{Y}_F-\hat{Y}_R\|^2_2 \\
&= \sum_{j=1}^6 \sum_{i=1}^{n_j}(\bar{Y}_j-\bar{Y}_{\cdot})^2
\end{aligned}
\]
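The identity above can be checked numerically. The following is a minimal Python sketch on made-up numbers: three hypothetical groups rather than the six diets, with illustrative values that are not the case-study data. The full model fits each group's own mean and the null model fits the grand mean; the names `groups`, `sse_full`, `sse_null` and `between` are assumptions introduced here.

```python
from statistics import fmean

# Hypothetical toy data: three groups, NOT the longevity measurements.
groups = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],
    "c": [5.0, 6.0, 7.0, 10.0],
}


def sse(values, fitted):
    """Sum of squared errors when every value is predicted by `fitted`."""
    return sum((y - fitted) ** 2 for y in values)


all_values = [y for ys in groups.values() for y in ys]
grand_mean = fmean(all_values)

# Full model: each group is fitted by its own mean.
sse_full = sum(sse(ys, fmean(ys)) for ys in groups.values())

# Null (reduced) model: every observation is fitted by the grand mean.
sse_null = sse(all_values, grand_mean)

# Right-hand side of the identity: the inner sum over i contributes
# n_j copies of (group mean - grand mean)^2.
between = sum(len(ys) * (fmean(ys) - grand_mean) ** 2 for ys in groups.values())

print(sse_null - sse_full, between)  # the two numbers agree up to rounding
```

The drop in SSE from the null model to the full model is exactly the between-group sum of squares, which is what the numerator of the \(F\)-statistic measures.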
\(F\)-statistic
\[
S^2_P = \frac{\sum_{j=1}^6 (n_j-1) \cdot S^2_j}{\sum_{j=1}^6 (n_j-1)}
\]
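The pooled variance is the denominator of the \(F\)-statistic. As a sketch in the notation above, writing \(n=\sum_{j=1}^6 n_j\) for the total sample size (a symbol introduced here, not in the original):

\[
\begin{aligned}
S^2_P &= \frac{SSE_F}{n-6}, \\
F &= \frac{(SSE_R-SSE_F)/(6-1)}{S^2_P} \overset{H_0}{\sim} F_{5,\,n-6}.
\end{aligned}
\]

The first line holds because \((n_j-1) \cdot S^2_j\) is the within-group sum of squares for group \(j\), so the numerator of \(S^2_P\) is \(SSE_F\).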
Using anova
Comparing two groups: N/N85 vs N/R50
Confidence interval
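A minimal Python sketch of such an interval, assuming it is built from the pooled variance \(S^2_P\) over all groups rather than from the two compared groups alone. The data are made up (three hypothetical groups of four, giving 9 degrees of freedom), and the critical value 2.262 is the 0.975 quantile of \(t_9\) read from a standard table, since the Python standard library has no t quantile function. The helper `pooled_ci` is a hypothetical name.

```python
from math import sqrt
from statistics import fmean, variance

# Hypothetical toy data: three groups, NOT the longevity measurements.
groups = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],
    "c": [5.0, 6.0, 7.0, 10.0],
}

# Pooled variance: within-group sample variances weighted by (n_j - 1).
df = sum(len(ys) - 1 for ys in groups.values())
s2_pooled = sum((len(ys) - 1) * variance(ys) for ys in groups.values()) / df


def pooled_ci(name_a, name_b, t_crit):
    """Interval for mean(name_a) - mean(name_b) using the pooled variance."""
    a, b = groups[name_a], groups[name_b]
    diff = fmean(a) - fmean(b)
    se = sqrt(s2_pooled * (1 / len(a) + 1 / len(b)))
    return diff - t_crit * se, diff + t_crit * se


# Assumption: 0.975 quantile of the t distribution with df = 9, from a table.
T_CRIT_9DF = 2.262

lower, upper = pooled_ci("c", "a", T_CRIT_9DF)
print(f"95% CI for mean(c) - mean(a): ({lower:.3f}, {upper:.3f})")
```

Using all groups to estimate \(\sigma^2\) gives more degrees of freedom than a two-sample interval, but it leans on the equal-variance assumption of the model.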
Comparing two groups: R/R50 vs N/R50
Confidence interval
Case study B: is the jury pool representative?
Any differences between judges?
How about Spock’s judge vs. others?
Using anova
How about variation only among others?
Summarizing all 3 models
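The three nested models can be compared through their error sums of squares. The sketch below uses made-up values and hypothetical judge labels, not the case-study data. It fits one common mean, then two means (Spock's judge vs. the rest), then a separate mean for each judge. The helper `sse_for_grouping` is an assumption introduced for illustration.

```python
from statistics import fmean

# Hypothetical toy data: made-up values per judge, NOT the jury data.
judges = {
    "spock": [10.0, 12.0, 14.0],
    "A": [20.0, 22.0, 24.0],
    "B": [25.0, 27.0, 29.0],
    "C": [21.0, 25.0, 26.0],
}


def sse_for_grouping(grouping):
    """SSE when the judges in each sublist of `grouping` share one mean."""
    total = 0.0
    for names in grouping:
        pooled = [y for name in names for y in judges[name]]
        mean = fmean(pooled)
        total += sum((y - mean) ** 2 for y in pooled)
    return total


others = [name for name in judges if name != "spock"]

sse_one_mean = sse_for_grouping([list(judges)])            # all judges alike
sse_two_means = sse_for_grouping([["spock"], others])      # Spock's vs. others
sse_all_means = sse_for_grouping([[name] for name in judges])  # one mean each

n = sum(len(ys) for ys in judges.values())
summary = [
    ("one mean", n - 1, sse_one_mean),
    ("Spock vs. others", n - 2, sse_two_means),
    ("separate means", n - len(judges), sse_all_means),
]
for label, resid_df, sse in summary:
    print(f"{label:18s} residual df = {resid_df:2d}  SSE = {sse:8.3f}")
```

Because the models are nested, the SSE can only decrease from one row to the next. Each \(F\)-test compares the drop in SSE between two rows against the residual variance of the larger model.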
Some diagnostic plots
longevity study
Residual vs. fitted
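The points in a residual vs. fitted plot come straight from the fitted one-way model: the fitted value is the group mean and the residual is the observation minus that mean. A minimal Python sketch on made-up data (hypothetical groups, not the case-study measurements); the plotting is left out, and `points` can be handed to any plotting library.

```python
from statistics import fmean

# Hypothetical toy data: three groups, NOT the case-study measurements.
groups = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],
    "c": [5.0, 6.0, 7.0, 10.0],
}

# Each point is (fitted value, residual) for one observation.
points = []
for ys in groups.values():
    fitted = fmean(ys)  # one-way ANOVA fit: the group's own mean
    points.extend((fitted, y - fitted) for y in ys)

for fitted, resid in points:
    print(f"fitted = {fitted:5.2f}  residual = {resid:6.2f}")
```

Under the equal-variance assumption, the vertical spread of the residuals should look similar at each fitted value.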