Dynamic Learning of Patient Response Types: An Application to Treating Chronic Diseases
We introduce a framework for developing adaptive, personalized treatments for chronic diseases in which medication is effective for only a subset of patients. Our model is based on a continuous-time multi-armed bandit setting in which drug effectiveness is assessed by aggregating information from several channels: not only continuous monitoring of the patient's state, but also the occurrence, or absence, of particular infrequent health events such as relapses or disease flare-ups. We illustrate the effectiveness of the methodology by developing a set of efficient treatment policies for multiple sclerosis, which we then use to benchmark several existing treatment guidelines.
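The idea that *not* observing an event carries information can be made concrete with a small Bayesian belief update. The sketch below is illustrative only and is not the model developed in this work. It assumes two hypothetical patient types (responder and non-responder), a monitored health signal whose increments are Gaussian with a type-dependent drift (a discretized Brownian motion), and infrequent events arriving as a Poisson process with a type-dependent rate, observed over windows of length `dt`. The names `TypeModel`, `log_likelihood`, `update_belief`, and all parameter values are assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeModel:
    """Hypothetical observation model for one patient type (an assumption, not the paper's model)."""
    drift: float       # mean rate of change of the monitored health signal
    event_rate: float  # Poisson rate of infrequent events such as relapses; must be > 0


def log_likelihood(model: TypeModel, dx: float, n_events: int, dt: float, sigma: float) -> float:
    """Log-likelihood of one observation window, dropping terms shared by all types."""
    # Signal increment over the window: dx ~ N(drift * dt, sigma^2 * dt), same sigma for all types.
    gauss = -((dx - model.drift * dt) ** 2) / (2.0 * sigma**2 * dt)
    # Poisson count of events in the window; the log(n!) term is identical across types and omitted.
    rate_dt = model.event_rate * dt
    poisson = n_events * math.log(rate_dt) - rate_dt
    return gauss + poisson


def update_belief(p: float, dx: float, n_events: int, dt: float, sigma: float,
                  responder: TypeModel, non_responder: TypeModel) -> float:
    """Bayes update of p = P(responder), with 0 < p < 1, after one window of length dt."""
    # Work in log-odds so the two information channels simply add.
    log_odds = math.log(p) - math.log(1.0 - p)
    log_odds += (log_likelihood(responder, dx, n_events, dt, sigma)
                 - log_likelihood(non_responder, dx, n_events, dt, sigma))
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    # Hypothetical parameters: responders hold a stable signal and relapse rarely.
    responder = TypeModel(drift=0.0, event_rate=0.2)
    non_responder = TypeModel(drift=-0.5, event_rate=1.0)
    p = 0.5
    # A quiet month: flat signal and no relapse. Both channels shift belief toward "responder".
    p = update_belief(p, dx=0.0, n_events=0, dt=1 / 12, sigma=1.0,
                      responder=responder, non_responder=non_responder)
    print(f"P(responder) after a quiet month: {p:.4f}")
```

With zero events, the Poisson term contributes `-(event_rate_responder - event_rate_non_responder) * dt` to the log-odds, which is positive whenever responders have the lower event rate; a window without a relapse therefore raises the belief that the treatment is working, while an observed relapse lowers it.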