MS&E 325: Topics in stochastic optimization.

HW 1. Given 1/22/2009. Due 2/3/2009. You may work in groups of two and submit a single HW per group if you prefer.

Do problems 3.1-3.5 from the lecture notes. Also, consider the very first algorithm in the paper "Finite-time Analysis of the Multiarmed Bandit Problem" by Auer et al., available on the class web page. Prove that if the discount factor theta is larger than 1 - 1/N^2, then this algorithm yields a near-optimal solution to the discounted infinite-horizon problem under the same input model. Recall that N denotes the number of arms.
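
For experimenting with the second part, here is a minimal simulation sketch. It assumes the "very first algorithm" in the paper is the UCB1 policy: play each arm once, then always play the arm maximizing its empirical mean plus sqrt(2 ln n / n_j), where n is the total number of plays so far and n_j is the number of plays of arm j. The function name ucb1_discounted, the Bernoulli reward model, and the finite truncation horizon are illustrative assumptions, not part of the assignment. It is not a substitute for the proof.

```python
import math
import random


def ucb1_discounted(means, theta, horizon, seed=0):
    """Simulate UCB1 on Bernoulli arms (an assumed reward model).

    Returns (discounted_reward, counts), where discounted_reward is
    sum over t of theta^t * r_t truncated at `horizon` plays, and
    counts[j] is the number of times arm j was played.
    """
    rng = random.Random(seed)
    n_arms = len(means)
    counts = [0] * n_arms      # n_j: plays of arm j so far
    sums = [0.0] * n_arms      # total reward collected from arm j
    discounted = 0.0
    weight = 1.0               # theta^t for the current play

    for t in range(horizon):
        if t < n_arms:
            arm = t            # initialization: play each arm once
        else:
            # t equals the total number of plays made so far
            def index(j):
                bonus = math.sqrt(2.0 * math.log(t) / counts[j])
                return sums[j] / counts[j] + bonus
            arm = max(range(n_arms), key=index)

        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        discounted += weight * reward
        weight *= theta

    return discounted, counts


if __name__ == "__main__":
    N = 5
    theta = 1.0 - 1.0 / (2 * N ** 2)   # one choice larger than 1 - 1/N^2
    means = [0.2, 0.4, 0.5, 0.6, 0.8]  # hypothetical arm means
    value, counts = ucb1_discounted(means, theta, horizon=5000)
    # Benchmark: always playing the best arm earns max(means)/(1 - theta)
    # in expectation over the infinite horizon.
    print(value, max(means) / (1.0 - theta), counts)
```

Comparing the simulated discounted reward with the benchmark max(means)/(1 - theta) for several values of theta gives a rough feel for how close the policy gets, but a single run is noisy, so average over many seeds before drawing any conclusion.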