Online reinforcement learning using a probability density estimation
Journal Article (2017)
Function approximation in online, incremental reinforcement learning must deal with two fundamental problems: biased sampling and non-stationarity. In this kind of task, biased sampling occurs because samples are obtained from specific trajectories dictated by the dynamics of the environment and are usually concentrated in particular convergence regions, which in the long term tend to dominate the approximation in the less sampled regions. The non-stationarity comes from the recursive nature of the estimations typical of temporal difference methods. This non-stationarity has a local profile, varying not only along the learning process but also across different regions of the state space. We propose to deal with these problems using an estimation of the probability density of samples represented with a Gaussian mixture model. To handle non-stationarity, we use the common approach of introducing a forgetting factor in the update formula. However, instead of using the same forgetting factor for the whole domain, we make it depend on the local density of samples, which we use to estimate the non-stationarity of the function at any given input point. To address the biased sampling problem, the forgetting factor applied to each mixture component is modulated according to the new information provided in the update, rather than forgetting depending only on time, thus avoiding undesired distortions of the approximation in less sampled regions.
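The core mechanism described above, a Gaussian mixture updated online with a per-component forgetting factor modulated by how much each component is responsible for the new sample, can be sketched roughly as follows. This is a minimal 1-D illustration under assumed details: the class name `OnlineGMM`, the parameter `base_forget`, and the exact update rule are illustrative choices (a standard stochastic-EM-style update), not the paper's algorithm.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance at x."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

class OnlineGMM:
    """Illustrative 1-D online Gaussian mixture with per-component forgetting.

    Components that explain little of the incoming sample receive an
    (almost) zero effective forgetting factor, so the approximation in
    rarely sampled regions is not distorted by updates elsewhere.
    """

    def __init__(self, means, variances, base_forget=0.05):
        self.means = np.asarray(means, dtype=float).copy()
        self.vars = np.asarray(variances, dtype=float).copy()
        self.weights = np.full(len(self.means), 1.0 / len(self.means))
        self.base_forget = base_forget  # global forgetting rate (assumed name)

    def update(self, x):
        # Responsibilities: fraction of the new sample each component explains.
        # (A robust implementation would guard against an all-zero likelihood.)
        likes = self.weights * gaussian_pdf(x, self.means, self.vars)
        resp = likes / likes.sum()
        # Mixture weights drift toward the responsibilities.
        self.weights = (1.0 - self.base_forget) * self.weights \
            + self.base_forget * resp
        # Per-component step size: forgetting modulated by new information,
        # so components far from the sample are left essentially untouched.
        eta = self.base_forget * resp
        self.means += eta * (x - self.means)
        self.vars += eta * ((x - self.means) ** 2 - self.vars)
        self.vars = np.maximum(self.vars, 1e-6)  # keep variances positive
```

For instance, feeding samples near one component moves only that component's mean and weight appreciably, while the other components retain their estimates.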
Keywords: dynamic programming; learning (artificial intelligence); online reinforcement learning; Gaussian mixture models.
A. Agostini and E. Celaya. Online reinforcement learning using a probability density estimation. Neural Computation, 29(1): 220-246, 2017.