Syllabus
Syllabus#
Videos: Every lecture will be recorded by SCPD
Email policy: Please use the Piazza site for most questions. For administrative issues that concern only you, email the course staff mailing list: stats202-aut2223-staff@lists.stanford.edu
Website: stats202.stanford.edu
If you are auditing the class (not registered on Axess), email us your SUNet ID to gain access to the lectures and homework on Canvas.
Description#
Stats 202 is an introduction to statistical / machine learning. By the end of the quarter, students will:
Understand the distinction between supervised and unsupervised learning and be able to identify appropriate tools to answer different research questions.
Become familiar with basic unsupervised procedures including clustering and principal components analysis.
Become familiar with the following regression and classification algorithms: linear regression, ridge regression, the lasso, logistic regression, linear discriminant analysis, K-nearest neighbors, splines, generalized additive models, tree-based methods, and support vector machines.
Gain a practical appreciation of the bias-variance tradeoff and apply model selection methods based on cross-validation and bootstrapping to a prediction challenge.
Analyze a real dataset of moderate size using R.
Develop the computational skills for data wrangling, collaboration, and reproducible research.
Be exposed to other topics in machine learning, possibly including missing data, prediction using time series and relational data, non-linear dimensionality reduction techniques, web-based data visualizations, anomaly detection, and representation learning.
Textbook#
Introduction to Statistical Learning (with applications in R), 2nd edition
Free version download
Prerequisites#
Introductory courses in statistics or probability (e.g., Stats 60), linear algebra (e.g., Math 51), and computer programming (e.g., CS 105).
Slides#
Notes on these pages are available as HTML slides:
Labs#
Source code for the labs is available to download as Jupyter notebooks.
Instructions for using Jupyter notebook for labs#
conda create -n stats202_aut2022 python=3.9 -y
conda activate stats202_aut2022
pip install jupyterlab
In R:
install.packages('IRkernel', repos='http://cloud.r-project.org')
library(IRkernel)
IRkernel::installspec()
Remember to remove the .txt extension from a downloaded notebook when you save it. If it was saved in your Downloads directory (common with Chrome):
mv ~/Downloads/Ch2-statlearn-lab.ipynb.txt ~/Downloads/Ch2-statlearn-lab.ipynb
Open a downloaded notebook:
jupyter lab ~/Downloads/Ch2-statlearn-lab.ipynb
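If several labs were downloaded with the extra .txt suffix, the rename step above can be applied to all of them in one pass. The following is a minimal sketch, assuming a POSIX shell; the function name strip_txt_suffix is hypothetical and not part of the course materials.

```shell
# Hypothetical helper: rename every *.ipynb.txt file in a directory
# to *.ipynb. Defaults to ~/Downloads when no directory is given.
strip_txt_suffix() {
    dir=${1:-"$HOME/Downloads"}
    for f in "$dir"/*.ipynb.txt; do
        # If nothing matches, the pattern is left unexpanded; skip it.
        [ -e "$f" ] || continue
        # ${f%.txt} drops the trailing ".txt" from the path.
        mv -- "$f" "${f%.txt}"
    done
}
```

Calling strip_txt_suffix with no argument acts on ~/Downloads; pass another directory as the first argument if your browser saves files elsewhere. Files that do not end in .ipynb.txt are left untouched.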
Where to find files#
The links above will get you to .ipynb versions of the labs through the Download option at the top right of each page.
Alternatively, .Rmd and .ipynb versions of the labs can be downloaded at statlearning.com.
The R markdown files (.Rmd) can be used within RStudio.
Evaluation#
5 assignments (60%)
Midterm (10%) (Tentative date: 11/7 in class)
Final exam (30%): 12/16/2022 @ 8:30 AM according to exam schedule
All work is to be submitted on Gradescope. Use entry code BB55NN.
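The weights above combine into a single course score as a weighted average. The sketch below assumes each component is scored on a 0-100 scale and that the five assignments are averaged into one homework score, which the syllabus does not specify; the function name course_score is hypothetical, and this is an illustration rather than an official grade calculator.

```shell
# Hypothetical helper: weighted course score from three component scores.
# Arguments: homework average, midterm score, final exam score (each 0-100).
# Weights follow the syllabus: assignments 60%, midterm 10%, final 30%.
course_score() {
    awk -v hw="$1" -v mid="$2" -v fin="$3" \
        'BEGIN { printf "%.1f\n", (60 * hw + 10 * mid + 30 * fin) / 100 }'
}

course_score 90 80 70   # prints 83.0
```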
Late policy#
No assignments will be graded if submitted more than three days after the due date.
Each 24 hours or part thereof that a homework is late will be treated as one full day.
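The rounding rule above amounts to a ceiling division of the lateness by 24 hours. As a rough sketch, assuming lateness is measured in whole seconds past the deadline (the function names late_days and is_graded are hypothetical):

```shell
# Hypothetical helper: number of late days charged for a submission.
# Argument: seconds past the due date (0 or negative means on time).
late_days() {
    late=$1
    if [ "$late" -le 0 ]; then
        echo 0
    else
        # Ceiling division: any part of a 24-hour period counts as a full day.
        echo $(( (late + 86399) / 86400 ))
    fi
}

# Succeeds if the submission is still within the three-day grading window.
is_graded() {
    [ "$(late_days "$1")" -le 3 ]
}

late_days 3600    # one hour late: prints 1
late_days 86401   # one second past 24 hours: prints 2
```

Under these assumptions, a submission exactly 72 hours late counts as three late days and is still graded, while one second more counts as four days and is not.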
Piazza#
Gradescope, use entry BB55NN.#
Office hours#
Instructor#
Jonathan Taylor: Friday 1-3pm, Sequoia Hall #137
TAs#
Sophia Lu: M 3:30-4:30pm, Zoom (click here); Th 9-10am, Fishbowl (Sequoia Hall)
Aditya Ghosh: MF 3-4pm, both in Bowker (Sequoia Hall)
Rex Shen: T 3-5pm, 380-380D
David Fager: W 9:30-11:30am, Fishbowl (Sequoia Hall)
Zitong Yang: Th 4-5pm, F 3-4pm, both in Bowker (Sequoia Hall)
Kevin Fry: Th 12-2pm, Zoom (click here)
Debolina Paul: T 1-3pm, Bowker (Sequoia Hall)