Drawing Statistical Conclusions

J. Taylor

2024-01-01

Case Study A: Motivation for creative writers

  • Creative writing students randomly assigned to intrinsic vs. extrinsic priming questionnaires.

Summarizing the groups

Extrinsic Group

Intrinsic Group

Histogram of Score stratified by Treatment

Case Study B: Difference in salaries between male and female employees

  • Salaries from Harris Trust and Savings Bank over the years 1969-1977

Females

Males

Histogram of Salary stratified by Sex

Boxplot of Salary stratified by Sex

Key differences between the studies

  • Creative writing study was a randomized experiment.

  • Salary dataset was an observational study.

Implications

  • Differences in strength of conclusions: randomized experiments such as the creative writing study can support causal conclusions; observational studies such as the salary data generally cannot.

  • Generalizability: what population are the data from?

    1. If we consider Harris a typical bank, then the salary data represent a sample of starting salaries.

Making statistical inferences

  • What sort of conclusions are we entitled to make?

Modelling uncertainty in creativity

  • The observed difference (i.e. treatment_effect) is not 0. Is there a real difference?

  • We need a (statistical) model to draw statistical inferences!

Mental model: the world before randomization

  • Potential outcomes before randomization

Mental model: the null hypothesis

  • \(H_0\): green outcome identical to red

Mental model: the world after randomization

  • Observed outcomes after randomization

Computing difference via t.test

Effect

Null hypothesis: no difference in Score between the groups

Repeated 10000 times
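
The repeated re-randomization on this slide can be sketched in a few lines. This is a minimal illustration, not the code behind the slides: the function name `randomization_test` and the two short score lists are hypothetical stand-ins (they are not the case-study data), and the two-sided p-value based on absolute differences is one common convention among several.

```python
import random
from statistics import mean


def randomization_test(group_a, group_b, n_reps=10000, seed=0):
    """Two-sided re-randomization test for a difference in group means.

    Under the null hypothesis the group labels carry no information,
    so we reshuffle the labels and recompute the difference each time.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_reps):
        rng.shuffle(pooled)  # re-assign units to groups at random
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Fraction of re-randomizations at least as extreme as what we saw
    return observed, extreme / n_reps


# Hypothetical scores for illustration only (NOT the case-study data)
intrinsic = [22.1, 19.3, 24.0, 20.6, 26.7, 18.2]
extrinsic = [15.0, 17.2, 12.9, 18.8, 16.4, 14.1]

observed, p_value = randomization_test(intrinsic, extrinsic)
print(f"observed difference = {observed:.2f}, p-value = {p_value:.4f}")
```

A small p-value says that a difference as large as the observed one would rarely arise from the random assignment alone.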

Modelling uncertainty in salaries

  • The difference is not 0. Is the difference real?

  • We need a model to draw statistical inferences!

Mental model: Male and Female salaries

  • There are two populations of salaries

Mental model: the null hypothesis for salaries

  • \(H_0\): distribution of orange box identical to purple

Computing difference via t.test
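
For reference, one common form of the two-sample statistic is Welch's, which is what R's `t.test` computes by default. Whether the slides used that default or the pooled-variance version (`var.equal = TRUE`) is not shown here, so the choice below is an assumption. The symbols \(\bar{Y}_i\), \(s_i^2\) and \(n_i\) (sample mean, sample variance and sample size of group \(i\)) are notation introduced for this sketch.

```latex
\[
  T = \frac{\bar{Y}_1 - \bar{Y}_2}
           {\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}
\]
```

Under \(H_0\), \(T\) is approximately \(t\)-distributed, and large values of \(|T|\) count as evidence against the null hypothesis.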

Repeated 10000 times

Other issues

  • We used the same method even for these different studies… does this make sense?

  • Terminology:

    1. Parameter: a property of the probability model (often written \(\theta\))

    2. Estimate: a function of the sample data (often written \(\hat{\theta}\))

    3. Goal of statistical inference is to learn about the parameter \(\theta\) from the estimate \(\hat{\theta}\)
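
The distinction between parameter and estimate can be made concrete with a small simulation. This is a sketch under stated assumptions: the normal model, the value `theta = 50.0`, the standard deviation of 10, and the helper `draw_estimate` are all hypothetical choices made for illustration.

```python
import random
from statistics import mean, stdev

rng = random.Random(1)

theta = 50.0  # parameter: the mean of the (assumed) probability model


def draw_estimate(n):
    """Draw a sample of size n from the model and return its sample mean."""
    sample = [rng.gauss(theta, 10.0) for _ in range(n)]
    return mean(sample)  # estimate: a function of the sample data only


# Each new sample gives a different estimate of the same fixed parameter
estimates = [draw_estimate(25) for _ in range(1000)]
print(f"parameter theta      = {theta}")
print(f"one estimate         = {estimates[0]:.2f}")
print(f"mean of estimates    = {mean(estimates):.2f}")
print(f"spread of estimates  = {stdev(estimates):.2f}")
```

The parameter stays fixed while the estimate varies from sample to sample; statistical inference quantifies that variation.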

Other issues

  • Experimental design:

    1. Randomization: individuals were randomly assigned to a Treatment in the creative writing study.

    2. Simple random sample: a way of sampling \(n\) points from a population such that every subset of \(n\) points is equally likely to be selected.

    3. Other sampling mechanisms: systematic sampling, cluster sampling.
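
The three sampling mechanisms named above can be sketched as follows. The function names, the population of 100 numbered units, and the grouping into clusters of 10 are hypothetical, chosen only to illustrate the definitions; the systematic sampler assumes the population holds at least \(n\) units.

```python
import random


def simple_random_sample(population, n, rng):
    """Every subset of n units is equally likely to be chosen."""
    return rng.sample(population, n)


def systematic_sample(population, n, rng):
    """Pick a random start, then take every step-th unit."""
    step = len(population) // n
    start = rng.randrange(step)
    return [population[start + i * step] for i in range(n)]


def cluster_sample(clusters, n_clusters, rng):
    """Choose whole clusters at random and keep all of their units."""
    chosen = rng.sample(clusters, n_clusters)
    return [unit for cluster in chosen for unit in cluster]


rng = random.Random(2)
population = list(range(100))  # hypothetical units labelled 0..99
clusters = [population[i:i + 10] for i in range(0, 100, 10)]

srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)
clustered = cluster_sample(clusters, 2, rng)
print(srs, systematic, clustered, sep="\n")
```

Only the first function gives every subset of size \(n\) the same chance; the other two restrict which subsets can occur.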