Instructor: John Duchi (jduchi@stanford.edu)
Lectures: Tuesday/Thursday, 13:30-14:50, Building 200 (History Corner), Room 205
TA: Hongseok Namkoong (hnamk@stanford.edu)
Office hours: John: Tuesday/Thursday 15:00-16:00, 126 Sequoia Hall. Hongseok: Wednesday 13:15-15:00, Packard 106.
Description: Information theory was developed to solve
fundamental problems in the theory of communications, but its
connections to statistical estimation and inference date nearly
to the birth of the field. With their focus on fundamental
limits, information theoretic techniques have provided deep
insights into optimal procedures for a variety of inferential
tasks. In addition, the basic quantities of information
theory (entropy and relative entropy, and their generalizations
to other divergence measures such as f-divergences) are
central in many areas of mathematical statistics and
probability.
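As a small numerical illustration of these basic quantities (a sketch of my own, not course material; the function names below are hypothetical, not from the lecture notes): for discrete distributions, entropy, relative entropy, and a general f-divergence can each be computed in a few lines.

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(p) in nats of a discrete distribution p."""
    p = np.asarray(p, dtype=float)
    nz = p > 0  # convention: 0 log 0 = 0
    return -np.sum(p[nz] * np.log(p[nz]))

def kl_divergence(p, q):
    """Relative entropy D(p || q) in nats; assumes q > 0 wherever p > 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    nz = p > 0
    return np.sum(p[nz] * np.log(p[nz] / q[nz]))

def f_divergence(p, q, f):
    """General f-divergence D_f(p || q) = sum_x q(x) f(p(x) / q(x))."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sum(q * f(p / q))

p = np.array([0.5, 0.5])
q = np.array([0.9, 0.1])
print(entropy(p))            # log 2, about 0.693 nats
print(kl_divergence(p, q))   # log(5/3), about 0.511 nats
# Total variation distance is the f-divergence with f(t) = |t - 1| / 2
print(f_divergence(p, q, lambda t: np.abs(t - 1) / 2))  # 0.4
```

Relative entropy is recovered as the f-divergence with f(t) = t log t, one instance of how the f-divergence family unifies these measures.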
The applications of these tools are numerous. In mathematical
statistics, for example, they allow characterization of optimal error
probabilities in hypothesis testing, determination of minimax rates of
convergence for estimation problems, and demonstration of equivalences
between (ostensibly) different estimation problems, and they lead to
penalized estimators and the minimum description length principle. In
probability, they provide insights into the central limit theorem,
large deviations theory (via Sanov's theorem and other results), and
appear in empirical process theory and concentration of
measure. Information theoretic techniques also arise in game playing,
gambling, stochastic optimization and approximation, among other
areas.
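The large-deviations connection admits a quick numerical check (my own sketch under stated assumptions, not course material; the function names are hypothetical). By Sanov's theorem (here in its scalar Cramér/Chernoff form), the probability that the empirical mean of n fair coin flips exceeds 0.7 decays exponentially with rate D(Ber(0.7) || Ber(0.5)), and the exact binomial tail approaches that rate from above as n grows.

```python
import math

def kl_bernoulli(a, b):
    """Relative entropy D(Ber(a) || Ber(b)) in nats, for a, b in (0, 1)."""
    return a * math.log(a / b) + (1 - a) * math.log((1 - a) / (1 - b))

def log_binom_pmf(n, k, p):
    """log P(Bin(n, p) = k), computed in log space to avoid overflow."""
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_comb + k * math.log(p) + (n - k) * math.log(1 - p)

def log_upper_tail(n, t, p=0.5):
    """log P(empirical mean of n Ber(p) samples >= t), exact binomial sum."""
    terms = [log_binom_pmf(n, k, p) for k in range(math.ceil(n * t), n + 1)]
    m = max(terms)  # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(x - m) for x in terms))

rate = kl_bernoulli(0.7, 0.5)  # large-deviations rate, about 0.0823 nats
for n in (100, 400, 1600):
    # -log P / n decreases toward `rate` from above (Chernoff's bound)
    print(n, -log_upper_tail(n, 0.7) / n)
```

The Chernoff bound gives P(mean >= t) <= exp(-n D(Ber(t) || Ber(p))) for t > p, so the printed quantity always sits above the rate; the gap is the polynomial prefactor, of order (log n)/n.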
In this course, we will study information theoretic quantities,
and their connections to estimation and statistics, in some
depth, showing applications to many of the areas above. Except
to provide background, we will not cover standard
information-theoretic topics such as source coding or
channel coding, focusing instead on the probabilistic and
statistical consequences of information theory.
Here is a more precise syllabus, including evaluation criteria and topics; we also have last year's course webpage.
The readings listed below are supplementary to the lectures
and not required, but they should give additional
perspective on the material.
Cover and Thomas: Elements of Information Theory, Second Edition, Wiley 2006.
Tsybakov: Introduction to Nonparametric Estimation, Springer 2009.
Vershynin: Introduction to the Non-Asymptotic Analysis of Random Matrices, 2012.
Boucheron, Lugosi, Massart: Concentration Inequalities: A Nonasymptotic Theory of Independence, Oxford University Press, 2013.
Date        | Topics                                              | Reading                                   | Assignments
January 5   | Course overview and introduction                    | LN Chapters 1 and 2; CT Chapter 2         |
January 7   | Review of information theory; divergences           | LN Chapters 1 and 2; CT Chapter 2         | Problem set 1 out [pdf]
January 12  | Concentration inequalities                          | LN Chapter 3                              | Project information out [pdf]
January 14  | Concentration inequalities                          | LN Chapter 3                              |
January 19  | Concentration inequalities                          | LN Chapter 3                              |
January 21  | Concentration inequalities and generalization bounds | LN Chapter 4                             | Problem set 1 due; solutions [pdf]
January 26  | Generalization bounds; coding and Kraft inequalities | LN Chapters 4 and 5                      | Problem set 2 out [pdf]
January 28  | Exponential families; Fisher information            | LN Chapters 6, 7, and 8                   |
February 2  | Universal coding                                    | LN Chapter 9                              |
February 4  | Universal coding and sequential prediction          | LN Chapters 9 and 10                      | Problem set 2 due; solutions [pdf]
February 9  | Online convex optimization; bandit problems         | LN Chapter 11                             | Problem set 3 out [pdf]
February 11 | Online convex optimization; bandit problems         | LN Chapters 11 and 12                     |
February 16 | Exploration/exploitation and bandit problems        | LN Chapters 11 and 12                     |
February 18 | Exploration/exploitation and bandit problems        | LN Chapters 11 and 12                     | Problem set 3 due; solutions [pdf]
February 22 | Minimax lower bounds; Le Cam's method               | LN Chapter 13; Tsybakov Chapters 2.1-2.4  |
February 25 | Minimax lower bounds; Fano's method and packing     | LN Chapter 13; Tsybakov Chapters 2.5-2.7  | Problem set 4 out [pdf]