Date

Nov 12, 2020

In this talk, we consider the worst-case regret of Linear Thompson Sampling (LinTS). Russo and Van Roy (2014) show that the Bayesian regret of LinTS is bounded by $O(d\sqrt{T})$, where $T$ is the time horizon and $d$ is the number of parameters. While this bound is minimax near-optimal, the existence of a similar worst-case regret bound is still unknown. The only known worst-case regret bound for LinTS, due to Agrawal and Goyal (2013) and Abeille et al. (2017), is $O(d\sqrt{dT})$, which requires the posterior variance to be inflated by a factor of $O(\sqrt{d})$. In this talk, we show that this extra factor is unavoidable, settling an open problem stated in Russo et al. (2018). On the positive side, we prove that a slightly modified version of LinTS requires only an $O(1)$ inflation, where the constant depends on the diversity of the optimal arm.
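To make the role of the inflation factor concrete, here is a minimal sketch of one round of Linear Thompson Sampling with a Gaussian posterior. This is a generic illustration, not the talk's specific modified algorithm: the ridge-regression posterior, the `inflation` parameter, and the function name `lints_step` are all assumptions for exposition. The `inflation` argument is the factor by which the posterior covariance is scaled, e.g. $O(\sqrt{d})$ in the Agrawal-Goyal analysis versus $O(1)$ for the modified version discussed in the talk.

```python
import numpy as np

def lints_step(A, b, arms, inflation=1.0, sigma=1.0, rng=None):
    """One round of Linear Thompson Sampling (illustrative sketch).

    A    : (d, d) regularized design matrix, X^T X + lambda * I.
    b    : (d,) response vector, X^T y.
    arms : (K, d) feature vectors of the K available arms.
    inflation : posterior-variance inflation factor. Classical
        worst-case analyses require O(sqrt(d)); the talk argues this
        is unavoidable for vanilla LinTS, while a modified variant
        needs only O(1) (hypothetical knob here, for illustration).
    sigma : assumed noise scale.
    Returns the index of the arm chosen this round.
    """
    rng = np.random.default_rng() if rng is None else rng
    A_inv = np.linalg.inv(A)
    theta_hat = A_inv @ b                         # ridge point estimate
    cov = (inflation ** 2) * (sigma ** 2) * A_inv # inflated posterior covariance
    theta_tilde = rng.multivariate_normal(theta_hat, cov)  # posterior sample
    return int(np.argmax(arms @ theta_tilde))     # act greedily w.r.t. the sample

# Example usage: after playing an arm x and observing reward r, the
# learner updates A += np.outer(x, x) and b += r * x, then calls
# lints_step again with the new (A, b).
```

Setting `inflation=0` recovers the greedy (posterior-mean) policy, which makes the effect of the knob easy to see: larger inflation means more aggressive exploration at the cost of the extra $\sqrt{d}$ in the regret bound.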