Topic |
Videos (on Canvas/Panopto) |
Course Materials |
|
Introduction to Reinforcement Learning |
Lecture 1
|
- Lecture 1 Draft Slides [Post class version]
- Additional Materials:
|
Tabular MDP planning |
Lecture 2
|
- Lecture 2 Slides (pre-class) [Post class, annotated]
- Additional Materials:
- SB (Sutton and Barto) Chp 3, 4.1-4.4
|
Tabular RL policy evaluation |
Lecture 3
|
- Lecture 3 Slides (pre-class) [Post class, with annotations]
- Additional Materials:
- SB (Sutton and Barto) Chp 5.1, 5.5, 6.1-6.3
- David Silver's Lecture 4 [link]
|
Q-learning |
Lecture 4
|
- Lecture 4 Slides (preclass) [Post class, with annotations]
- Additional Materials:
- SB (Sutton and Barto) Chp 5.2, 5.4, 6.4-6.5, 6.7
|
Policy Gradient |
Lecture 5
Lecture 6
Lecture 7
|
- Lecture 5 Slides [Post lecture with annotations]
- Lecture 6 Slides [Post class annotations]
- Lecture 7 Slides [Post class annotations]
- Additional Materials:
- SB (Sutton and Barto) Chp 13
|
Imitation Learning and Learning from Human Input |
Lecture 8
Lecture 9 (including DPO guest lecture by Rafael Rafailov, Archit Sharma, Eric Mitchell)
Lecture 10
|
- Lecture 7 Slides [Post class annotations]
- Lecture 8 Slides (preclass) [Post class with annotations]
- Lecture 9 Slides [Post class]
- Lecture 9 DPO Slides
- Lecture 10 Slides [Post class]
- Additional Materials:
|
Fast Learning / Data Efficient RL |
Lecture 11
Lecture 12
Lecture 13
|
- Lecture 11 Slides [Post class, with annotations]
- Lecture 12 Slides [Post class, with annotations]
- Lecture 13 Slides [Post class, with annotations]
- Additional Materials:
|
MCTS |
Lecture 14
|
- Lecture 14 Slides [Post class, with annotations]
|
Rewards in Reinforcement Learning |
Lecture 15
|
- Lecture 15 Slides (preclass) [Post class, with annotations]
- Lecture 15 (Value Alignment)
|
Review and Looking Forward |
Lecture 16 Slides [post class]
|
|