Oct 22, 2019

We consider the $k$-armed stochastic contextual bandit problem with $d$ dimensional features, when both $k$ and $d$ can be large. To the best of our knowledge, all existing algorithm for this problem have a regret bound that scale as polynomials of degree at least two in $k$ and $d$. The main contribution of this paper is to introduce and theoretically analyse a new algorithm (REAL-bandit) with a regret that scales by $r^2(k+d)$ when $r$ is rank of the $k\times d$ matrix of unknown parameters. REAL-bandit relies on ideas from low-rank matrix estimation literature and a new row-enhancement subroutine that yields sharper bounds for estimating each row of the parameter matrix that may be of independent interest.

Nima Hamidi
Nima Hamidi
Ph.D. Student in Statistics

I’m a Ph.D. student at Stanford and my area of research includes the theory of bandit problems.