Probably Approximately Correct Guarantees for Reinforcement Learning in Continuous State-Spaces Through Discretization

Published in Journal of AI Research, 2025

In safety-critical settings, for humans to rely on reinforcement learning (RL) based decision making, we need to understand how much experience, measured by the sample complexity, is needed to perform near-optimally. For a Markov Decision Process (MDP) with a finite state-space of size S and a finite action-space of size A, algorithms whose sample complexity can be bounded by a polynomial in S and A, together with probably approximately correct learning guarantees, are referred to as PAC-MDP algorithms. A limitation of model-free PAC-MDP algorithms is that they require finite, discrete state-action spaces and discrete-time decisions, whereas decision-making problems in robotics, healthcare and autonomous transport are often modelled using continuous environments. Consequently, the guarantees that PAC-MDP algorithms place on learning-enabled components in safety-critical systems do not carry over to environments that have been arbitrarily discretized. In this paper, we address the issue of learning guarantees for model-free RL in continuous state-spaces and continuous-time settings. Our approach constructs an optimization problem with probabilistic constraints to find a bucketing function that, when applied to the continuous environment, bounds the discretization error with high confidence. This allows us to apply PAC-MDP algorithms to the resulting discrete MDP. We thus provide performance assurances in a wider range of safety-critical settings, significantly enhancing the applicability of PAC-MDP algorithms.
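
In the paper, the bucketing function is chosen by solving an optimization problem with probabilistic constraints; the sketch below only illustrates the downstream pipeline such a function enables, using a hypothetical fixed uniform grid. The helper `make_bucketing_fn` and its parameters (`lows`, `highs`, `bins_per_dim`) are assumptions for illustration, not the paper's optimized bucketing. Once continuous states are mapped to discrete indices in this way, a tabular PAC-MDP algorithm can maintain its S x A tables on the induced discrete MDP.

```python
import numpy as np

def make_bucketing_fn(lows, highs, bins_per_dim):
    """Return a function mapping a continuous state to a single discrete index.

    lows, highs: per-dimension bounds of the continuous state-space (assumed known).
    bins_per_dim: buckets per dimension (hypothetical fixed resolution; the paper
    instead selects the bucketing via an optimization problem with probabilistic
    constraints so that the discretization error is bounded with high confidence).
    """
    lows = np.asarray(lows, dtype=float)
    highs = np.asarray(highs, dtype=float)
    widths = (highs - lows) / bins_per_dim

    def bucket(state):
        # Clip to the bounds, then compute the per-dimension bucket coordinates.
        state = np.clip(np.asarray(state, dtype=float), lows, highs - 1e-9)
        coords = ((state - lows) // widths).astype(int)
        # Flatten the multi-dimensional bucket coordinates into one discrete index.
        return int(np.ravel_multi_index(coords, (bins_per_dim,) * len(lows)))

    return bucket

# Example: a 2-D continuous state mapped onto a 10x10 grid (S = 100 discrete states),
# after which any tabular PAC-MDP algorithm can be run on the resulting discrete MDP.
bucket = make_bucketing_fn(lows=[-1.0, -1.0], highs=[1.0, 1.0], bins_per_dim=10)
s_discrete = bucket([0.37, -0.52])
print(s_discrete)  # index in {0, ..., 99}
```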

Recommended citation: Preprint