Multi-armed bandit (MAB) experiments have recently received significant attention in data-centric enterprises, given their promise of reducing the opportunity cost of experimentation. In this talk we consider the well-known stochastic linear bandit problem, which includes (as special cases) standard MAB experiments and their personalized (contextual) counterparts. In this setting, a decision-maker sequentially chooses among a given set of actions in $\mathbb{R}^d$, observes a noisy reward that is linear in the chosen action, and aims to minimize cumulative regret relative to the best action.
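For concreteness, a minimal formalization of this protocol, under the standard assumption of a fixed action set $\mathcal{X} \subset \mathbb{R}^d$ and an unknown parameter vector $\theta^*$ (notation introduced here for illustration, not taken from the talk), is

  $r_t = \langle x_t, \theta^* \rangle + \varepsilon_t, \qquad R(T) = \sum_{t=1}^{T} \Bigl( \max_{x \in \mathcal{X}} \langle x, \theta^* \rangle - \langle x_t, \theta^* \rangle \Bigr),$

where $x_t$ is the action chosen at round $t$, $\varepsilon_t$ is zero-mean noise, and $R(T)$ is the cumulative regret the decision-maker seeks to minimize.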
A well-known algorithm for this problem, which is also commonly used in practice, is Linear Thompson Sampling (LinTS). LinTS was shown by Russo and Van Roy (2014) to achieve the best possible Bayesian regret, up to logarithmic factors. However, in the frequentist setting there is a gap: existing analyses require inflating the posterior variance by a factor of order $\sqrt{d}$, leaving the best known frequentist regret bound for LinTS a factor of $\sqrt{d}$ away from the optimal rate.
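To make the algorithm concrete, the following is a minimal Python sketch of LinTS; the action set, reward model, and the inflation parameter (which scales the posterior covariance, as frequentist analyses typically require) are illustrative assumptions, not details taken from the talk.

import numpy as np

def lin_ts(actions, reward_fn, T, inflation=1.0, lam=1.0, seed=0):
    # actions: (K, d) array of candidate action vectors (held fixed for simplicity)
    # reward_fn: returns a noisy scalar reward for the chosen action vector
    # inflation: multiplier on the posterior covariance; frequentist analyses
    #            typically inflate it by roughly sqrt(d)
    rng = np.random.default_rng(seed)
    K, d = actions.shape
    V = lam * np.eye(d)   # regularized design matrix (posterior precision)
    b = np.zeros(d)       # running sum of reward-weighted actions
    for _ in range(T):
        theta_hat = np.linalg.solve(V, b)                      # posterior / ridge mean
        cov = inflation**2 * np.linalg.inv(V)                  # (inflated) posterior covariance
        theta_tilde = rng.multivariate_normal(theta_hat, cov)  # posterior sample
        x = actions[np.argmax(actions @ theta_tilde)]          # act greedily on the sample
        r = reward_fn(x)
        V += np.outer(x, x)                                    # rank-one posterior update
        b += r * x
    return theta_hat

The inflation parameter is where the two viewpoints diverge: with inflation equal to one the sampling distribution is the ordinary posterior analyzed in the Bayesian setting, while the frequentist guarantees discussed above are only known to hold when the covariance is inflated.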