PhD Oral Defense

Apr 27, 2021

We consider stochastic linear bandits and provide an analysis that yields several existing Bayesian and frequentist regret bounds for OFUL and Thompson Sampling (TS). Specifically, we obtain an O(d \sqrt{T}) worst-case regret bound for OFUL and an O(d \sqrt{T}) Bayesian regret bound for TS. Next, we construct examples showing that the worst-case regret of TS can, in fact, grow linearly in T. One of our examples works by only reducing the noise of the rewards in the model, which implies that a more robust analysis of TS is needed. We then apply our analysis to TS and introduce minor modifications that enable us to obtain sharper bounds for TS under additional conditions. Intuitively, our bound involves a parameter that depends on the “diversity” of the optimal action: when the optimal action is more “diverse,” this parameter is smaller, which leads to improved regret bounds. An important characteristic of this bound is its robustness to distributional mismatches.
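For readers unfamiliar with the setup, the following is a minimal sketch of Thompson Sampling for a stochastic linear bandit with a Gaussian conjugate model. It is illustrative only: the finite action set, noise level, and regularization parameter are assumptions for the example, not details from the talk.

```python
import numpy as np

def linear_ts(actions, theta_star, T, noise_sd=0.1, lam=1.0, seed=0):
    """Thompson Sampling for a linear bandit with a finite action set.

    actions: (K, d) array of arms; theta_star: (d,) true parameter.
    Returns the cumulative (pseudo-)regret after each of the T rounds.
    """
    rng = np.random.default_rng(seed)
    d = actions.shape[1]
    V = lam * np.eye(d)          # regularized design (posterior precision)
    b = np.zeros(d)              # sum of x_t * r_t
    opt = (actions @ theta_star).max()
    regret = np.zeros(T)
    total = 0.0
    for t in range(T):
        mean = np.linalg.solve(V, b)                      # posterior mean
        cov = noise_sd**2 * np.linalg.inv(V)              # posterior covariance
        theta_tilde = rng.multivariate_normal(mean, cov)  # posterior sample
        a = int(np.argmax(actions @ theta_tilde))         # act greedily on sample
        x = actions[a]
        r = x @ theta_star + noise_sd * rng.standard_normal()
        V += np.outer(x, x)                               # rank-one update
        b += r * x
        total += opt - x @ theta_star                     # instantaneous regret
        regret[t] = total
    return regret
```

Under the Bayesian analysis mentioned above, the cumulative regret of this procedure grows on the order of d \sqrt{T}; the talk's linear-regret examples show that shrinking `noise_sd` in the model, counterintuitively, can break the worst-case guarantee.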

Nima Hamidi
Ph.D. Student in Statistics

I’m a Ph.D. student at Stanford, and my research focuses on the theory of bandit problems.