Nov 12, 2020

In this talk, we consider the worst-case regret of Linear Thompson Sampling (LinTS). Russo and Van Roy (2014) show that the Bayesian regret of LinTS is bounded by $O(d\sqrt{T})$, where $T$ is the time horizon and $d$ is the number of parameters. While this bound is minimax near-optimal, the existence of a similar worst-case regret bound remains unknown. The only known worst-case regret bound for LinTS, due to Agrawal and Goyal (2013) and Abeille et al. (2017), is $O(d\sqrt{dT})$, which requires the posterior variance to be inflated by a factor of $O(\sqrt{d})$. In this talk, we show that this extra factor is unavoidable, settling an open problem stated in Russo et al. (2018). On the positive side, we prove that a slightly modified version of LinTS requires only an $O(1)$ inflation, where the constant depends on the diversity of the optimal arm.
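To make the role of the inflation factor concrete, here is a minimal sketch of LinTS with an explicit posterior-inflation parameter. This is an illustrative implementation under simplifying assumptions (fixed finite arm set, Gaussian prior and noise, unit ridge regularization); the function name, parameters, and defaults are chosen for this example and are not from the talk.

```python
import numpy as np

def lin_ts(arms, theta_star, T, inflation=1.0, noise_sd=0.1, seed=0):
    """Illustrative Linear Thompson Sampling with a posterior-inflation factor.

    arms: (K, d) array of fixed arm feature vectors.
    theta_star: (d,) true parameter, used only to simulate rewards.
    inflation: multiplies the posterior standard deviation; the talk concerns
        how large this factor must be for worst-case regret guarantees
        (O(sqrt(d)) in prior analyses, shown here to be unavoidable for
        vanilla LinTS).
    Returns cumulative (pseudo-)regret over T rounds.
    """
    rng = np.random.default_rng(seed)
    K, d = arms.shape
    B = np.eye(d)          # ridge-regularized Gram matrix
    f = np.zeros(d)        # running sum of x_t * r_t
    means = arms @ theta_star
    opt = means.max()
    regret = 0.0
    for _ in range(T):
        mu = np.linalg.solve(B, f)                 # posterior mean
        cov = inflation**2 * np.linalg.inv(B)      # inflated posterior covariance
        theta_tilde = rng.multivariate_normal(mu, cov)  # posterior sample
        a = int(np.argmax(arms @ theta_tilde))     # greedy w.r.t. the sample
        x = arms[a]
        r = x @ theta_star + noise_sd * rng.normal()
        B += np.outer(x, x)                        # update sufficient statistics
        f += r * x
        regret += opt - means[a]                   # pseudo-regret of the pull
    return regret
```

Setting `inflation` above 1 widens the sampling distribution, which is exactly the variance inflation the $O(d\sqrt{dT})$ analyses rely on.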

Nima Hamidi
Ph.D. Student in Statistics

I’m a Ph.D. student in Statistics at Stanford, and my research focuses on the theory of bandit problems.