Statistics 311/Electrical Engineering 377 (Winter 2016)

Instructor: John Duchi (jduchi stanford edu)

Lectures: Tuesday/Thursday, 13:30--14:50, building 200 (History corner) room 205

TA: Hongseok Namkoong (hnamk stanford edu)

Office hours: John: Tuesday/Thursday, 15:00-16:00, 126 Sequoia Hall. Hong: Wednesday, 13:15-15:00, Packard 106.

Description: Information theory was developed to solve fundamental problems in the theory of communications, but its connections to statistical estimation and inference date nearly to the birth of the field. With their focus on fundamental limits, information-theoretic techniques have provided deep insights into optimal procedures for a variety of inferential tasks. In addition, the basic quantities of information theory--entropy, relative entropy, and their generalizations to other divergence measures such as f-divergences--are central in many areas of mathematical statistics and probability.
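For concreteness, here are the standard definitions of these quantities for distributions P and Q with probability mass functions p and q (the discrete case, for simplicity); the notation is standard but may differ slightly from that of the lecture notes:

\[
H(P) = -\sum_x p(x) \log p(x), \qquad
D_{\mathrm{kl}}(P \,\|\, Q) = \sum_x p(x) \log \frac{p(x)}{q(x)}, \qquad
D_f(P \,\|\, Q) = \sum_x q(x) \, f\!\left(\frac{p(x)}{q(x)}\right),
\]

where f is any convex function with f(1) = 0. Taking f(t) = t log t recovers the relative entropy (KL divergence), while f(t) = |t - 1|/2 gives the total variation distance.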

The applications of these tools are numerous. In mathematical statistics, for example, they allow characterization of optimal error probabilities in hypothesis testing, determination of minimax rates of convergence for estimation problems, and demonstration of equivalence between (ostensibly) different estimation problems, and they lead to penalized estimators and the minimum description length principle. In probability, they provide insights into the central limit theorem and large deviations theory (via Sanov's theorem and other results), and they appear in empirical process theory and concentration of measure. Information-theoretic techniques also arise in game playing, gambling, and stochastic optimization and approximation, among other areas.
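One concrete object that ties several of these threads together is the minimax risk. In standard notation (which may differ from the lecture notes), for estimating a parameter θ(P) over a family of distributions \(\mathcal{P}\) from an i.i.d. sample \(X_1, \ldots, X_n \sim P\), it is

\[
\mathfrak{M}_n(\theta(\mathcal{P}), \Phi \circ \rho)
:= \inf_{\widehat{\theta}} \sup_{P \in \mathcal{P}}
\mathbb{E}_P\Big[\Phi\big(\rho\big(\widehat{\theta}(X_1, \ldots, X_n), \theta(P)\big)\big)\Big],
\]

where ρ is a (semi)metric on the parameter space, Φ is a nondecreasing loss, and the infimum is over all estimators. Methods such as Le Cam's and Fano's (appearing late in the schedule below) lower bound this quantity by reducing estimation to hypothesis testing and controlling the testing error via divergence measures.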

In this course, we will study information-theoretic quantities and their connections to estimation and statistics in some depth, with applications to many of the areas above. Except to provide background, we will not cover standard information-theoretic topics such as source coding or channel coding, focusing instead on the probabilistic and statistical consequences of information theory.

Here is a more precise syllabus, including evaluation criteria and topics; we also have last year's course webpage.

The readings listed below are supplementary to the lectures and not required, but they should give additional perspective on the material.

Lectures

Full lecture notes: [pdf] I will post a fully linked and updated version of the lecture notes here as soon as they are prepared; the PDFs associated with each lecture are chapters within this document.

In the readings below, "LN" stands for Lecture Notes, while CT, T, V, BLM, etc., stand for the supplementary books and notes listed above.
January 5: Course overview and introduction. Reading: LN Chapters 1 and 2; CT, Chapter 2.
January 7: Review of information theory; divergences. Reading: LN Chapters 1 and 2; CT, Chapter 2. Assignment: Problem Set 1 out [pdf].
January 12: Concentration inequalities. Reading: LN Chapter 3. Assignment: project information out [pdf].
January 14: Concentration inequalities. Reading: LN Chapter 3.
January 19: Concentration inequalities. Reading: LN Chapter 3.
January 21: Concentration inequalities and generalization bounds. Reading: LN Chapter 4. Assignment: Problem Set 1 due (solutions: [pdf]).
January 26: Generalization bounds; coding and Kraft inequalities. Reading: LN Chapters 4 and 5. Assignment: Problem Set 2 out [pdf].
January 28: Exponential families; Fisher information. Reading: LN Chapters 6, 7, and 8.
February 2: Universal coding. Reading: LN Chapter 9.
February 4: Universal coding and sequential prediction. Reading: LN Chapters 9 and 10. Assignment: Problem Set 2 due (solutions: [pdf]).
February 9: Online convex optimization; bandit problems. Reading: LN Chapter 11. Assignment: Problem Set 3 out [pdf].
February 11: Online convex optimization; bandit problems. Reading: LN Chapters 11 and 12.
February 16: Exploration/exploitation and bandit problems. Reading: LN Chapters 11 and 12.
February 18: Exploration/exploitation and bandit problems. Reading: LN Chapters 11 and 12. Assignment: Problem Set 3 due (solutions: [pdf]).
February 23: Minimax lower bounds; Le Cam's method. Reading: LN Chapter 13; Tsybakov, Chapters 2.1-2.4.
February 25: Minimax lower bounds; Fano's method and packing. Reading: LN Chapter 13; Tsybakov, Chapters 2.5-2.7. Assignment: Problem Set 4 out [pdf].