Multiple samples

One-way ANOVA

STATS 191

2024-01-01

Outline

  • Case studies:

    1. Does diet affect longevity?

    2. The Spock conspiracy trial

  • Sums of squares

  • F-tests

Case study A: does diet affect longevity?

One-way ANOVA model: a generalization of the two-sample model to more than two groups

Model details

  • Data: \(Y_{ij}, 1 \leq i \leq n_j, j \in \{{\tt lopro}, {\tt N/N85}, {\tt N/R40}, {\tt N/R50}, {\tt NP}, {\tt R/R50}\}\)

  • Model: \(Y_{ij} \sim N(\mu_j, \sigma^2)\) (Note: equal variance is assumed here!)

  • Null: \(H_0\) no difference: \(\mu_{\tt lopro}=\dots=\mu_{\tt R/R50}\)

  • Alternative: \(H_a\) model holds for some values \(\mu_{\tt lopro},\dots,\mu_{\tt R/R50}\) but they are not all identical.

Fitting the model

Null model
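A minimal sketch of the two fits, assuming the responses are held in a dict that maps each group label to a list of values (a hypothetical format; the numbers below are made-up toy values, not the diet data). Under the full model each observation's fitted value is its group mean \(\bar{Y}_j\); under the null model every fitted value is the grand mean \(\bar{Y}_{\cdot}\).

```python
from statistics import mean

def fit_one_way(groups):
    """Return (SSE_F, SSE_R) for the full one-way model and the null model.

    groups: dict mapping a group label to a list of responses (hypothetical format).
    """
    # Full model: group j gets its own mean mu_j, estimated by the group average.
    sse_full = 0.0
    for ys in groups.values():
        m = mean(ys)
        sse_full += sum((y - m) ** 2 for y in ys)
    # Null model: one common mean, estimated by the grand average of all observations.
    all_y = [y for ys in groups.values() for y in ys]
    grand = mean(all_y)
    sse_null = sum((y - grand) ** 2 for y in all_y)
    return sse_full, sse_null

# Toy data for illustration only (not the longevity study).
toy = {"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]}
sse_f, sse_r = fit_one_way(toy)
print(sse_f, sse_r)  # 4.0 17.5
```

Because the null model is the full model with the constraint \(\mu_{\tt lopro}=\dots=\mu_{\tt R/R50}\), \(SSE_R \geq SSE_F\) always holds.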

Comparing the models

How much better a fit is model than null_model?

Extra sum of squares

\[ \begin{aligned} SSE_R-SSE_F &= \|\hat{Y}_F-\hat{Y}_R\|^2_2 \\ &= \sum_{j=1}^6 \sum_{i=1}^{n_j}(\bar{Y}_j-\bar{Y}_{\cdot})^2 \end{aligned} \]
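The second line of the identity says the extra sum of squares depends only on the group means: each of the \(n_j\) observations in group \(j\) contributes \((\bar{Y}_j-\bar{Y}_{\cdot})^2\). A small numerical check on made-up toy data with unequal group sizes (the dict-of-lists format is an assumption for illustration):

```python
from statistics import mean

def ss_about(values, center):
    """Sum of squared deviations of values about a fixed center."""
    return sum((v - center) ** 2 for v in values)

def extra_ss(groups):
    """SSE_R - SSE_F computed directly as sum_j n_j * (Ybar_j - Ybar_.)^2."""
    grand = mean([y for ys in groups.values() for y in ys])
    return sum(len(ys) * (mean(ys) - grand) ** 2 for ys in groups.values())

# Toy data with unequal group sizes (illustrative numbers only).
toy = {"a": [1.0, 2.0, 3.0], "b": [6.0, 8.0]}
all_y = [y for ys in toy.values() for y in ys]
sse_f = sum(ss_about(ys, mean(ys)) for ys in toy.values())  # full model
sse_r = ss_about(all_y, mean(all_y))                         # null model
print(sse_r - sse_f, extra_ss(toy))  # 30.0 30.0
```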

\(F\)-statistic

Convert the “extra” sum of squares to a unitless quantity by scaling it by the pooled estimate of \(\sigma^2\):

\[ S^2_P = \frac{\sum_{j=1}^6 (n_j-1) \cdot S^2_j}{\sum_{j=1}^6 (n_j-1)} \]
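Combining the two pieces: dividing the extra sum of squares by its \(6-1=5\) degrees of freedom and by \(S^2_P\) gives the usual one-way ANOVA \(F\)-statistic, with \(n=\sum_{j=1}^6 n_j\) the total sample size. Under \(H_0\) (and the normal, equal-variance model) it follows an \(F\) distribution with \(5\) and \(n-6\) degrees of freedom:

\[ \begin{aligned} F &= \frac{(SSE_R-SSE_F)/(6-1)}{S^2_P} = \frac{\sum_{j=1}^6 n_j(\bar{Y}_j-\bar{Y}_{\cdot})^2/5}{S^2_P}, \\ S^2_P &= \frac{SSE_F}{n-6}, \qquad F \overset{H_0}{\sim} F_{5,\,n-6}. \end{aligned} \]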

Using anova
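In R, anova(null_model, model) reports the residual degrees of freedom and RSS of each model, the extra sum of squares, the \(F\)-statistic and its p-value. A rough stdlib-only Python sketch of the same \(F\) computation, assuming the hypothetical dict-of-lists data format and toy numbers:

```python
from statistics import mean

def one_way_f(groups):
    """One-way ANOVA F-test of H0: all group means are equal.

    groups: dict mapping a group label to a list of responses (hypothetical format).
    Returns (F, df_num, df_den).
    """
    all_y = [y for ys in groups.values() for y in ys]
    n, k = len(all_y), len(groups)
    grand = mean(all_y)
    sse_f = 0.0   # residual sum of squares of the full model
    extra = 0.0   # SSE_R - SSE_F = sum_j n_j (Ybar_j - Ybar_.)^2
    for ys in groups.values():
        m = mean(ys)
        sse_f += sum((y - m) ** 2 for y in ys)
        extra += len(ys) * (m - grand) ** 2
    df_num, df_den = k - 1, n - k
    s2_pooled = sse_f / df_den  # equals the pooled variance S^2_P
    return (extra / df_num) / s2_pooled, df_num, df_den

# Toy data (illustrative only).
toy = {"a": [1.0, 2.0, 3.0], "b": [6.0, 8.0]}
F, df_num, df_den = one_way_f(toy)
print(F, df_num, df_den)
```

The p-value is the upper tail of the \(F_{k-1,\,n-k}\) distribution; if SciPy is available, scipy.stats.f.sf(F, df_num, df_den) computes it.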

Comparing two groups: N/N85 vs N/R50

Confidence interval
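In the one-way model the interval for a difference \(\mu_a-\mu_b\) is \((\bar{Y}_a-\bar{Y}_b) \pm t_{n-k,1-\alpha/2}\, S_P\sqrt{1/n_a+1/n_b}\), where \(S_P\) pools all \(k\) groups (here \(k=6\)), not only the two being compared. A minimal sketch, assuming the same hypothetical dict-of-lists format; the caller supplies the \(t\) quantile because the Python standard library has no \(t\) distribution:

```python
from math import sqrt
from statistics import mean

def pooled_diff_ci(groups, a, b, t_quantile):
    """CI for mu_a - mu_b using S_P pooled over ALL groups (df = n - k).

    t_quantile: t_{n-k} quantile for the desired level, supplied by the caller,
    e.g. scipy.stats.t.ppf(0.975, n - k) for a 95% interval.
    """
    n = sum(len(ys) for ys in groups.values())
    k = len(groups)
    sse_f = 0.0
    for ys in groups.values():
        m = mean(ys)
        sse_f += sum((y - m) ** 2 for y in ys)
    s_p = sqrt(sse_f / (n - k))
    ya, yb = groups[a], groups[b]
    diff = mean(ya) - mean(yb)
    se = s_p * sqrt(1 / len(ya) + 1 / len(yb))
    return diff - t_quantile * se, diff + t_quantile * se

# Toy data (illustrative only); 3.182 is approximately t_{0.975, 3}.
toy = {"a": [1.0, 2.0, 3.0], "b": [6.0, 8.0]}
lo, hi = pooled_diff_ci(toy, "a", "b", t_quantile=3.182)
print(lo, hi)
```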

Comparing two groups: R/R50 vs N/R50

Confidence interval

Case study B: is the jury pool representative?

Any differences between judges?

How about Spock’s judge vs. others?
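The “Spock’s judge vs. others” model is itself a one-way model with just two groups, obtained by pooling all other judges into one group. A sketch of that regrouping, assuming the data are a dict keyed by judge; the key "Spock's" is a hypothetical label, not taken from the data set:

```python
def spock_vs_others(groups, spock_label="Spock's"):
    """Collapse judge groups into two: Spock's judge and all other judges.

    groups: dict mapping judge label -> list of responses (hypothetical format).
    spock_label: hypothetical key identifying Spock's judge.
    """
    others = [y for label, ys in groups.items() if label != spock_label for y in ys]
    return {spock_label: list(groups[spock_label]), "other": others}

# Toy example (illustrative numbers only).
toy = {"Spock's": [1.0], "A": [2.0, 3.0], "B": [4.0]}
print(spock_vs_others(toy))
```

Comparing this two-group model with the null model uses the same extra-sum-of-squares \(F\)-test as the diet study, now with \(2-1=1\) numerator degree of freedom.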

Using anova

How about variation only among the other judges?

Summarizing all 3 models
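The three models are nested (null \(\subset\) Spock vs. others \(\subset\) one mean per judge), so they can be summarized by sequential \(F\)-tests. When R's anova() is given several lm fits, it uses the residual mean square of the largest model as the denominator for every row. A minimal sketch of that computation from each model's \(SSE\) and residual degrees of freedom (the numbers in the example are hypothetical, not the jury data):

```python
def sequential_f(models):
    """Sequential F-statistics for nested linear models, smallest model first.

    models: list of (SSE, residual_df) pairs ordered from the null model to
    the largest model. The denominator mean square comes from the largest
    (last) model for every comparison.
    Returns a list of (F, df_num, df_den), one per adjacent pair.
    """
    sse_big, df_big = models[-1]
    scale = sse_big / df_big
    rows = []
    for (sse_r, df_r), (sse_f, df_f) in zip(models, models[1:]):
        df_num = df_r - df_f
        rows.append(((sse_r - sse_f) / df_num / scale, df_num, df_big))
    return rows

# Hypothetical (SSE, residual df) values for null, two-group and full models.
for row in sequential_f([(100.0, 20), (60.0, 19), (40.0, 14)]):
    print(row)
```

The last row compares the two-group model with the full model, which is the test of variation among the other judges alone.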

Some diagnostic plots

Longevity study

Residual vs. fitted
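In a one-way model the fitted value of every observation is its group mean, so a residual vs. fitted plot shows one vertical strip per group; strips with clearly different spreads would cast doubt on the equal-variance assumption. A sketch of the points behind such a plot, assuming the hypothetical dict-of-lists format:

```python
from statistics import mean

def residuals_vs_fitted(groups):
    """(fitted, residual) pairs for the one-way model, where fitted = group mean."""
    pairs = []
    for ys in groups.values():
        m = mean(ys)
        pairs.extend((m, y - m) for y in ys)
    return pairs

# Toy data (illustrative only).
toy = {"a": [1.0, 2.0, 3.0], "b": [6.0, 8.0]}
print(residuals_vs_fitted(toy))
```

With matplotlib installed, plt.scatter(*zip(*pairs)) draws the plot from these pairs.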