
AIM Seminar: Matryoshka Policy Gradient for Max-Entropy Reinforcement Learning

François Ged, École Polytechnique Fédérale de Lausanne
Friday, January 27, 2023
3:00-4:00 PM
1084 East Hall
Reinforcement Learning (RL) is the area of Machine Learning specialized in tasks where an agent interacts with its environment through a sequence of actions, chosen according to its policy. The agent's goal is to maximize the rewards collected along the way. Regularizing the rewards by giving the agent entropy bonuses induced by its policy has become increasingly common. The major benefits are: enhanced exploration of the environment, a unique stochastic optimal policy, and increased robustness of the agent to adversarial modifications of the rewards.
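The entropy-regularized objective described above can be sketched as follows. This is an illustrative computation of a "soft" return, where each step's reward is augmented by the entropy of the policy's action distribution at that step; the function names and the temperature parameter tau are placeholders for exposition, not taken from the talk.

```python
import math

def soft_return(rewards, action_dists, gamma=0.99, tau=0.1):
    """Discounted return with per-step entropy bonuses.

    rewards       -- list of scalar rewards r_t collected along a trajectory
    action_dists  -- list of the policy's action distributions pi(.|s_t),
                     each a list of probabilities summing to 1
    gamma         -- discount factor
    tau           -- entropy temperature weighting the bonus (illustrative)
    """
    total = 0.0
    for t, (r, probs) in enumerate(zip(rewards, action_dists)):
        # Shannon entropy H(pi(.|s_t)) of the current action distribution.
        entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
        total += gamma**t * (r + tau * entropy)
    return total
```

With tau = 0 this reduces to the ordinary discounted return; a more uniform (higher-entropy) policy earns a larger bonus, which is what drives the improved exploration mentioned above.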

Typically, RL algorithms (such as policy gradient or Q-learning) update the parameters of the agent's policy based on how well the policy performs in future states. In this talk, after briefly illustrating how this way of training makes it hard to establish convergence to a global optimum, I will present a novel algorithm called Matryoshka Policy Gradient (MPG). Its implementation is very intuitive and relies on the following idea: instead of learning to optimize an objective with a fixed (possibly infinite) horizon, the agent with MPG learns to optimize policies for all horizons simultaneously, from small to large in a nested way (recalling the image of Matryoshka dolls). Theoretically, under mild assumptions, our most important results can be summarized as follows:

1. training converges;
2. the limit is the unique optimal policy;
3. for policies parametrized by a neural network, we provide a simple sufficient criterion at convergence for the global optimality of the limit, in terms of the neural tangent kernel of the neural network.

While (1) is known for traditional policy gradient methods, under the name of the Policy Gradient Theorem, (2) has been proved only very recently for some PG methods, in rather specific settings.
Numerically, we confirm the potential of our algorithm by successfully training an agent on two standard benchmarks from OpenAI Gym, namely FrozenLake and Pendulum.
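The nested-horizon idea behind MPG can be sketched as a rollout in which the agent, with k steps remaining, acts according to a horizon-k policy and then recurses with the horizon shrunk by one (dolls within dolls). All names here (`policies`, `env_step`) are illustrative placeholders, not the authors' implementation.

```python
import random

def rollout_nested(policies, env_step, s0, horizon):
    """Roll out a trajectory using a nested family of policies.

    policies  -- dict mapping remaining horizon k to a policy: a function
                 from state to a list of action probabilities
    env_step  -- function (state, action) -> (next_state, reward)
    s0        -- initial state
    horizon   -- total number of steps
    """
    s, traj = s0, []
    for k in range(horizon, 0, -1):
        # With k steps to go, consult the horizon-k policy...
        probs = policies[k](s)
        a = random.choices(range(len(probs)), weights=probs)[0]
        s_next, r = env_step(s, a)
        traj.append((s, a, r))
        # ...then continue with the horizon shrunk by one.
        s = s_next
    return traj
```

Training all horizons jointly in this nested fashion, rather than a single fixed-horizon objective, is what underpins the convergence guarantees (1)-(3) above.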

The talk is mostly introductory and no background on RL is expected.

Based on joint work with Prof. Maria Han Veiga.
Building: East Hall
Event Type: Workshop / Seminar
Tags: Mathematics
Source: Happening @ Michigan, Applied Interdisciplinary Mathematics (AIM) Seminar, Department of Mathematics