MS&E 325: Topics in stochastic optimization.
HW 1. Given 1/22/09. Due 2/3/2009. You can do it in groups of two and
submit a single HW if you prefer.
Do problems 3.1-3.5 from the lecture notes. Also, consider the very first
algorithm in the paper "Finite-time Analysis of the Multiarmed Bandit
Problem" by Auer et al on the class web-page. Prove that if the discount
factor theta is larger than 1-1/N^2 then this algorithm results in a
near-optimal solution to the discounted infinite horizon problem under the
same input model. Don't forget that we use N to refer to the number of arms.