Recording date: 17/10/2023

Sample complexity of Q-learning: from single-agent to federated learning

Q-learning, which seeks to learn the optimal Q-function of a Markov decision process (MDP) in a model-free fashion, lies at the heart of reinforcement learning practice. However, theoretical understanding of its non-asymptotic sample complexity remains unsatisfactory, despite significant recent efforts. In this talk, we first show a tight sample complexity bound for Q-learning in the single-agent setting, together with a matching lower bound that establishes its minimax sub-optimality. We then show how federated versions of Q-learning allow collaborative learning from data collected by multiple agents without central sharing, where an importance averaging scheme is introduced to unveil the blessing of heterogeneity.
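For readers unfamiliar with the algorithm named in the abstract, the following is a minimal sketch of textbook tabular Q-learning, not the speaker's implementation or analysis setup. The two-state MDP, the function name `q_learning`, and all parameter values (`gamma`, `lr`, `steps`) are hypothetical choices made purely for illustration. The federated variant and the importance averaging scheme are not sketched here, since the abstract does not specify them.

```python
import random


def q_learning(transitions, rewards, gamma=0.9, lr=0.1, steps=20000, seed=0):
    """Tabular Q-learning along a single trajectory (illustrative sketch).

    transitions[s][a] gives the next state (deterministic, for simplicity);
    rewards[s][a] gives the immediate reward. Both are hypothetical inputs.
    """
    rng = random.Random(seed)
    n_states = len(transitions)
    n_actions = len(transitions[0])
    q = [[0.0] * n_actions for _ in range(n_states)]

    state = 0
    for _ in range(steps):
        # Behavior policy: uniformly random actions, so every pair is visited.
        action = rng.randrange(n_actions)
        next_state = transitions[state][action]
        reward = rewards[state][action]

        # Model-free update: move Q(s, a) toward the one-step bootstrapped target.
        target = reward + gamma * max(q[next_state])
        q[state][action] = (1 - lr) * q[state][action] + lr * target

        state = next_state
    return q


# Hypothetical toy MDP: action 0 stays put, action 1 switches state;
# the only reward is 1 for staying in state 1.
transitions = [[0, 1], [1, 0]]
rewards = [[0.0, 0.0], [1.0, 0.0]]
q_table = q_learning(transitions, rewards)
```

In this toy problem the optimal value of staying in state 1 is 1 / (1 - 0.9) = 10, so the learned entry `q_table[1][0]` should approach 10. Sample complexity, the subject of the talk, asks how many such transitions are needed to reach a given accuracy.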

Yuejie Chi
