When reinforcement learning techniques are applied to real data, the data are usually collected in advance, so batch methods have to be considered. Fitted Q iteration (FQI), introduced by Damien Ernst et al., is one popular batch method. Applying FQI to the HIV model of Adams et al. with structured treatment interruptions (STI), Ernst was able to find an optimal treatment policy in which treatment decisions are made every 5 days.
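To make the batch setting concrete, the following is a minimal sketch of fitted Q iteration on a toy three-state problem. It assumes a tabular averaging regressor in place of the tree-based regressors usually paired with FQI, and the states, actions, rewards, and the names `fitted_q_iteration`, `BATCH`, and `ACTIONS` are hypothetical and unrelated to the HIV model.

```python
from collections import defaultdict


def fitted_q_iteration(transitions, actions, gamma=0.9, n_iterations=100):
    """Batch FQI over (state, action, reward, next_state) tuples.

    Assumption: the regressor is a lookup table that averages the targets
    seen for each (state, action) pair, standing in for a real regressor.
    """
    q = defaultdict(float)  # Q_0 is identically zero
    for _ in range(n_iterations):
        sums = defaultdict(float)
        counts = defaultdict(int)
        # Build one regression target per transition from the current Q.
        for s, a, r, s_next in transitions:
            target = r + gamma * max(q[(s_next, b)] for b in actions)
            sums[(s, a)] += target
            counts[(s, a)] += 1
        # "Fit" the next Q by averaging targets for each (state, action).
        q = defaultdict(float, {k: sums[k] / counts[k] for k in sums})
    return q


# Hypothetical toy batch: moving right from state 1 pays a reward of 1.
ACTIONS = ("left", "right")
BATCH = [
    (0, "left", 0.0, 0), (0, "right", 0.0, 1),
    (1, "left", 0.0, 0), (1, "right", 1.0, 2),
    (2, "left", 0.0, 1), (2, "right", 0.0, 2),
]

q = fitted_q_iteration(BATCH, ACTIONS)
# Greedy policy with respect to the fitted Q function.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in (0, 1, 2)}
print(policy)
```

The learner never interacts with the environment: every update is computed from the fixed batch, which is what makes the method suitable for data gathered beforehand.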
We plan to extend this work in two ways. First, we will find optimal treatment policies for other fixed numbers of days between decisions, and use them to determine the longest number of days a patient can undergo a treatment without compromising their health. Second, we will expand the set of actions available to the reinforcement learning agent so that it can choose treatments that last a variable number of days.
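One way the expanded action set could be represented is sketched below, purely as an illustration. The treatment labels, the candidate durations, the per-day discount factor, and the names `TREATMENTS`, `DURATIONS_DAYS`, and `backup_target` are all assumptions made for this example and are not taken from the original work.

```python
from itertools import product

# Hypothetical values, chosen only for illustration.
TREATMENTS = ("none", "drug_a", "drug_b", "both")
DURATIONS_DAYS = (5, 10, 15)

# Each augmented action pairs a treatment with the number of days it is held.
ACTIONS = list(product(TREATMENTS, DURATIONS_DAYS))


def backup_target(reward, duration_days, next_q_values, gamma_per_day=0.98):
    """Regression target for one transition under a variable-duration action.

    Assumptions: `reward` is the total reward collected over the whole
    interval, and the value of the next state is discounted once per
    elapsed day, i.e. by gamma_per_day ** duration_days.
    """
    return reward + gamma_per_day ** duration_days * max(next_q_values)


print(len(ACTIONS))
print(backup_target(1.0, 5, [0.0, 2.0], gamma_per_day=0.5))
```

Discounting by the elapsed number of days keeps long and short actions comparable; with a single fixed interval this reduces to the usual one-step target.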