MS&E 325: Topics in Stochastic Optimization
Department of Management Science and Engineering, Stanford University,
Winter 2008-09.
TuTh 2:05-3:15 pm, Terman M33.
Instructor: Ashish Goel. Office hours: Mondays 4-5, Terman 311.
Guests/auditors, please subscribe to msande325-win0809-guests by going to
mailman.stanford.edu.
Course Description
From the bulletin: Markov decision processes;
optimization with sparse priors; multi-armed bandit problems and the
Gittins' index; regret bounds for multi-armed bandit problems;
stochastic knapsack and the adaptivity gap; budgeted learning;
adversarial queueing theory; stochastic scheduling and routing;
stochastic inventory problems; multi-stage and multi-objective
stochastic optimization. Prerequisites: MS&E 221 or equivalent; and
MS&E 212 or CS 261 or equivalent.
Informally: The first part of
the class will focus on topics that can loosely be called
"Algorithmic decision theory": Algorithms for designing provably
(near)-optimal and computationally efficient strategies for decision
making under uncertainty, using available and acquired data, as
well as probabilistic models thereon. The second part will
focus on stochastic versions of combinatorial optimization problems.
Course requirements: There
will be four homework assignments and an optional project. Please sign
up for a project only if you believe you will be able to devote a
substantial amount of attention to it. Also, each student may be
expected to scribe one lecture.
Reading list (will be updated continually; papers
marked as [C] will be covered in class; papers marked as [R] are for your
reference; this list is not exhaustive)
[C] A short proof
of the Gittins Index theorem. J. Tsitsiklis. Notice that this proof
assumes that the time for which an arm is played follows a continuous
distribution, whereas the proof in class assumed a discrete distribution.
[C] A spreadsheet illustrating a computation of
the Gittins index for Beta priors.
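If you prefer code to a spreadsheet, the same computation can be done by
calibration: binary-search for the retirement rate lambda at which playing
the arm and retiring are equally attractive, evaluating each candidate
lambda with a truncated-horizon dynamic program. A rough Python sketch
(mine, not part of the course materials; the discount factor, horizon, and
tolerance are arbitrary choices you should tune):

    import functools

    def gittins_index(alpha, beta, gamma=0.9, horizon=200, tol=1e-6):
        # Approximate the Gittins index of a Bernoulli arm with a
        # Beta(alpha, beta) prior by calibration against a standard arm
        # paying lam per step. Truncating the dynamic program at `horizon`
        # costs an error of order gamma**horizon, negligible here.
        def advantage(lam):
            @functools.lru_cache(maxsize=None)
            def V(a, b, t):
                retire = lam / (1.0 - gamma)   # standard arm, forever
                if t == 0:
                    return retire
                p = a / (a + b)                # posterior mean of the arm
                play = p * (1.0 + gamma * V(a + 1, b, t - 1)) \
                    + (1.0 - p) * gamma * V(a, b + 1, t - 1)
                return max(retire, play)
            return V(alpha, beta, horizon) - lam / (1.0 - gamma)

        lo, hi = 0.0, 1.0                      # Bernoulli rewards live in [0, 1]
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            # the index is the largest lam at which playing is still optimal
            if advantage(mid) > 0:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    print(gittins_index(1, 1))   # uniform prior; exceeds the posterior mean 0.5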
[R] Four
proofs of Gittins' multiarmed bandit theorem. E. Frostig and G.
Weiss. In particular, see the proof that involves the fair and prevailing
prices.
[R] Elicitation
of a Beta prior for Bayesian inference in clinical trials. Y. Wu,
W. Shih, and D. Moore. This is just to support my contention that
multi-armed bandit problems are important not just for new Internet-related
applications but also for traditional applications such as health care.
It would be enough to read the abstract.
[R] Asymptotically efficient adaptive
allocation rules.
T. Lai and H. Robbins. This is the original "regret paper", and is a classic. Read the treatment
of the lower bound from this paper, but for upper bounds, the proofs have been
simplified and generalized (sometimes with slightly weaker constants, but that
is not a major concern in this course).
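For orientation, the lower bound says, roughly, that any allocation rule
whose regret is o(n^a) for every a > 0 must sample each suboptimal arm i
at least (1 + o(1)) ln(n) / D(p_i || p*) times, where D(. || .) is the
Kullback-Leibler divergence between the reward distribution of arm i and
that of the best arm; multiplying by the gaps Delta_i and summing over the
suboptimal arms gives the Omega(log n) regret bound.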
[R] Sample mean based index policies with O(log n) regret for the
multi-armed bandit problem.
R. Agrawal. This contains a simplified proof of the main
result of Lai and Robbins. It would be most instructive to read sections 1
and 2 of this paper and skip the rest, since (modulo constants) these
sections are subsumed by the next paper.
[C]
Finite-time Analysis of the Multiarmed Bandit Problem.
P. Auer, N. Cesa-Bianchi, and P. Fischer. Specifically, read the analysis of
UCB1. Then contemplate the following question: why is the performance of this
algorithm apparently worse if the arms have very similar means? Can you derive
a bound that does not depend on the difference in the means?
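To make the question concrete, here is a rough Python sketch of UCB1 (my
rendering, not the authors' code; the Bernoulli arms at the end are an
illustrative assumption). Each arm's index is its empirical mean plus the
confidence radius sqrt(2 ln t / n_i):

    import math, random

    def ucb1(arms, horizon):
        # UCB1 (Auer, Cesa-Bianchi, and Fischer). `arms` is a list of
        # functions, each returning a reward in [0, 1] when called.
        k = len(arms)
        counts = [0] * k      # n_i: number of times arm i has been pulled
        means = [0.0] * k     # empirical mean reward of arm i
        for t in range(1, horizon + 1):
            if t <= k:
                i = t - 1     # initialization: pull every arm once
            else:
                # optimism in the face of uncertainty: mean + confidence radius
                i = max(range(k), key=lambda j:
                        means[j] + math.sqrt(2 * math.log(t) / counts[j]))
            r = arms[i]()
            counts[i] += 1
            means[i] += (r - means[i]) / counts[i]   # incremental mean update
        return counts

    # two Bernoulli arms with close means: the confidence radii must shrink
    # below the gap 0.05 before the better arm reliably wins the argmax
    print(ucb1([lambda: float(random.random() < 0.55),
                lambda: float(random.random() < 0.50)], horizon=10000))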
[C]
The Nonstochastic Multiarmed Bandit Problem.
P. Auer, N. Cesa-Bianchi, Y. Freund, and R. Schapire. Read the proofs for the
very first algorithm (section 3) followed by a quick look at section 4. Then
ask the following question: where did all the probabilistic analysis go
in this proof? Also read the lower bound in the appendix.
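For reference while reading section 3, a rough sketch of Exp3 (my
rendering; the toy adversary at the end is an assumption, since the point
of the model is that the rewards may be arbitrary):

    import math, random

    def exp3(k, reward, horizon, gamma=0.1):
        # Exp3 (Auer, Cesa-Bianchi, Freund, and Schapire). reward(t, i) is
        # the adversary's reward in [0, 1] for arm i at time t; only the
        # pulled arm's reward is ever observed.
        w = [1.0] * k                            # exponential weights
        total = 0.0
        for t in range(horizon):
            s = sum(w)
            # mix gamma of uniform exploration into the weight distribution
            p = [(1 - gamma) * wi / s + gamma / k for wi in w]
            i = random.choices(range(k), weights=p)[0]
            x = reward(t, i)
            total += x
            xhat = x / p[i]                      # importance-weighted, unbiased
            w[i] *= math.exp(gamma * xhat / k)
            m = max(w)
            w = [wi / m for wi in w]             # rescale; only ratios matter
        return total

    # toy oblivious adversary: arm 0 pays 1 w.p. 0.6, arm 1 w.p. 0.4
    print(exp3(2, lambda t, i: float(random.random() < (0.6 - 0.2 * i)), 10000))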
[C]
Efficient algorithms for online decision problems.
A. Kalai and S. Vempala. A nice exposition of the full information model. Read
the first three sections. Note that this algorithm can be used to solve a
large variety of linear optimization problems sequentially, without any
assumptions on the linear utilities in each time period. Also note that the
bounds that they obtain in the simplest multi-armed bandit setting are stronger than
in the partial information model, most notably, the sqrt(N) changes to a
log(N) -- verify this for yourself by finding the optimum epsilon for the
simplest algorithm in this paper.
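Here is a rough sketch of the additive version of their algorithm, follow
the perturbed leader, over a finite decision set (my rendering; the
function names and the experts-style demo are illustrative, and the
perturbation scale 1/epsilon is exactly the knob to optimize when
answering the question above):

    import random

    def fpl(decisions, costs, epsilon=0.1):
        # Follow the Perturbed Leader (Kalai and Vempala), additive version:
        # play the decision minimizing (cumulative cost + perturbation) . d.
        # decisions: finite list of points in R^n; costs: list of cost vectors.
        n = len(decisions[0])
        cum = [0.0] * n                          # cumulative cost vector so far
        total = 0.0
        for c in costs:
            # re-drawing the perturbation each step has the same expected
            # behavior against an oblivious adversary as drawing it once
            p = [random.uniform(0, 1.0 / epsilon) for _ in range(n)]
            d = min(decisions,
                    key=lambda d: sum((cum[j] + p[j]) * d[j] for j in range(n)))
            total += sum(c[j] * d[j] for j in range(n))
            cum = [cum[j] + c[j] for j in range(n)]
        return total

    # experts as unit vectors: expert 0 is slightly cheaper on average
    decisions = [[1, 0], [0, 1]]
    costs = [[random.random() * 0.9, random.random()] for _ in range(5000)]
    print(fpl(decisions, costs))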
[R] Online convex
programming and generalized infinitesimal gradient ascent.
M. Zinkevich. Read the algorithm. Then before reading the proof, consider the
following question: how would you use this technique in a black-box fashion to
solve the linear generalization of the bandits problem, as described by Kalai
and Vempala in section 1.1? Before you assume that this is trivial, note that
Zinkevich requires a convex decision space, whereas Kalai and Vempala assume
that the decision space is arbitrary, possibly just a set of points.
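A rough sketch of Zinkevich's greedy projection algorithm (my rendering;
the step-size schedule eta/sqrt(t) is the standard choice from the paper's
O(sqrt(T)) regret analysis, and the one-dimensional demo is illustrative):

    def ogd(project, gradient, x0, horizon, eta=1.0):
        # Greedy projection (online gradient descent):
        # x_{t+1} = P(x_t - eta_t * grad f_t(x_t)) over a convex feasible set.
        x = x0
        for t in range(1, horizon + 1):
            g = gradient(t, x)            # (sub)gradient of the cost revealed at t
            step = eta / t ** 0.5         # eta_t ~ 1/sqrt(t)
            x = project([xi - step * gi for xi, gi in zip(x, g)])
        return x

    # toy run: feasible set [0, 1], fixed cost f_t(x) = (x - 0.7)^2
    clip = lambda x: [min(1.0, max(0.0, x[0]))]
    print(ogd(clip, lambda t, x: [2 * (x[0] - 0.7)], [0.0], horizon=10000))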
[R]
Logarithmic Regret Algorithms for Online Convex Optimization.
E. Hazan, A. Agarwal, and S. Kale.
Read the first two sections and make sure you understand the connection to
Zinkevich's results and those of Kalai and Vempala.
[R]
Universal portfolios.
T. Cover.
Another classic. Read the problems in this paper and
think about how you would solve them using more recent techniques. Then go back
and read the rest of Hazan et al. (don't worry too much about the proofs since
the techniques used are not similar to those used in the rest of this class).
[R]
The value of knowing the demand curve. R. Kleinberg and T. Leighton. An interesting application of the
ideas in the UCB algorithm of Auer et al., as well as new insights.
[R]
Robbing the bandit: Less regret in online geometric optimization against an
adaptive adversary. V. Dani and T. Hayes. Read the discussion on adaptivity.
[C] Allocating
bandwidth for bursty connections. J. Kleinberg, Y. Rabani,
E. Tardos. Read the sections on load balancing. Observe how the lower bound
and the upper bound seem almost contradictory -- the lower bound says that "no
definition of effective bandwidth works" while the upper bound says that "the
standard definition of effective
bandwidth works"! The trick, of course is in dealing with exceptional jobs
separately and doing binary search ot find the right cut-off. An imporvement
in the constant factor would be very interesting, specially if the improvement
is large.
[R] Stochastic
Load Balancing and Related Problems. A. Goel and P. Indyk. Read the
sections on Poisson and Exponential jobs. Then ask the following questions:
(a) How does the Poisson case for load balancing relate to the
majorization-based proof we saw in class? (b) Most importantly, can the kinds
of techniques we used for exponential jobs be used for Bernoulli jobs? For
general jobs? A PTAS for the Bernoulli or general problem would be
spectacular.
[C] Simultaneous
optimization via approximate majorization for concave profits or convex
costs. A. Goel and A. Meyerson. Read sections 2, 3.4, and section 4 (skip
the proofs of section 4).
[C] Approximating the stochastic knapsack: the benefit of
adaptivity. B. Dean, M. Goemans and J. Vondrak.
[C] Approximation
algorithms for budgeted learning problems. S. Guha and K. Munagala. Notice
how the proof uses techniques very similar to those used by Dean, Goemans, and
Vondrak.
[C] Multi-armed Bandits
with Metric Switching Costs. S. Guha and K. Munagala. Notice the nice
balancing trick in the LP.
[C] The Ratio Index for Budgeted Learning, with Applications.
A. Goel, S. Khanna, B. Null. Read the definition of the ratio
index, the statement of the result, and then just focus on sections 3 and 4.
(Many) more to come shortly.
Project
suggestions.
My preference would be for you to do this in teams of two.
Remember, the project is optional, but if you choose to do one, I would
expect you to devote substantial effort. I will add more project
suggestions if needed. Also, please feel free to come up with a
formulation of your own.
- Fast computation of the Gittins index for finite state spaces.
The best known running time appears to be O(n^3), where n is the size of the state
space of the Markov chain of the arm. This can be achieved by analyzing
pivoting rules for simplex, or more simply by just implementing
Tsitsiklis's recursive rule as in section 3 of this paper.
Can you improve this running time? Contrary to what I may have said or
thought or appeared to say or think in class, I don't think any longer
that this is easy.
- The above will not apply to Beta priors. Assume that you are
allowed an error of epsilon in computing the Gittins index of the state
(alpha, beta). What is the fastest algorithm you can devise?
- For the truly adventurous: can you devise a fully strongly
polynomial time algorithm for solving the discounted infinite horizon
Markov Decision Process? The "most strongly polynomial" known algorithm
is strongly polynomial in everything except the discount factor.
- Can you find the best possible approximation for the budgeted
learning problem? Check out this LP-based and this
index-based approximation algorithm.
- Advertising auctions with priors
- Strategic fraud in multi-armed bandit problems: Suppose each arm
corresponds to a strategic agent. For example, each arm may be an
advertiser who wants good placement. Assume this strategic agent can
inflate the reward probability initially. What kind of game does this
induce? This will involve first doing a survey of known results in this
direction. Also, suppose the agent could return with a new pseudonym if
its prior falls very low. What kind of a social cost does this impose?
- Data-oriented: Can you show, under some reasonable prior, that
the Netflix data set does not have
sufficient information to obtain a better than x% prediction? This will
not help you win the Netflix challenge, since this is in the wrong
direction, but it would be interesting nevertheless.
- Product pricing: Consider the scenario where you can offer a good at one
of K price points. Users arrive sequentially. The paper on the
value of knowing the demand curve studies this problem with a view
towards minimizing regret. Can you design optimum (or near-optimum)
strategies in a model where you are given a prior on the user valuations?
What priors make sense? This will also involve doing a survey of existing
work.
- Correlated bandits: What if different arms are correlated? We
have some problem formulations that I would be happy to discuss
individually, or you can find your own problem in this space.