Statistics Department Seminar Series: Kihyuk Hong, PhD Candidate, Department of Statistics, University of Michigan

"Learning Infinite-horizon Average-reward MDPs with Linear Function Approximation"
Tuesday, December 10, 2024
4:00-5:00 PM
411 West Hall
Abstract: The infinite-horizon average-reward Markov decision process (MDP) provides a natural framework for sequential decision-making under uncertainty in settings where an agent interacts with the environment continuously, such as inventory management and network routing systems. Unlike episodic settings, where the environment is periodically reset, continuous interaction without resets introduces unique challenges. For example, it necessitates assumptions on the underlying MDP to avoid pathological scenarios where the agent becomes trapped in an unfavorable state with no path to recovery. Additionally, the average-reward optimality criterion complicates algorithm design, as the corresponding Bellman operator is not a contraction, preventing the straightforward application of optimistic value iteration algorithms.
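
As background (these are standard textbook facts, not results from the talk): in the discounted setting the Bellman operator is a gamma-contraction in the sup norm, which is what lets optimistic value iteration analyses go through, whereas the average-reward Bellman equation has no discount factor and hence no contraction modulus below 1.

```latex
% Discounted Bellman operator: a gamma-contraction in the sup norm.
\[
  (T_\gamma V)(s) = \max_{a} \Big[ r(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V(s') \Big],
  \qquad
  \| T_\gamma V - T_\gamma V' \|_\infty \le \gamma \, \| V - V' \|_\infty .
\]
% Average-reward Bellman (optimality) equation: the gain rho* and bias h
% satisfy a fixed-point relation with no discount, so the corresponding
% operator has modulus 1 and the contraction argument fails.
\[
  \rho^* + h(s) = \max_{a} \Big[ r(s,a) + \sum_{s'} P(s' \mid s,a)\, h(s') \Big].
\]
```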

We address reinforcement learning in infinite-horizon average-reward settings, focusing on MDPs where both the reward function and the transition probability kernel are linear in a feature representation of state-action pairs. Existing approaches in this setting either face computational inefficiencies or rely on strong assumptions, such as ergodicity, to achieve order-optimal regret bounds. In this talk, we introduce a computationally efficient algorithm that attains an order-optimal regret bound under a mild assumption on the underlying MDP. The algorithm learns a discounted-reward MDP as a surrogate for the average-reward problem. Leveraging the contraction property of the associated Bellman operator for the surrogate problem, we design an optimistic value iteration algorithm and employ a value function clipping technique to improve statistical efficiency. We show that appropriately tuning the discount factor of the surrogate problem yields an order-optimal regret bound for the original average-reward problem.
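
To make the discounted-surrogate construction concrete, here is a minimal sketch in Python/NumPy of optimistic value iteration on a discounted surrogate: a linear (here one-hot, so exactly linear) feature map, a ridge-regression estimate of the Q-function, an elliptical exploration bonus, and value clipping. All names, constants, and the clipping rule are illustrative assumptions, not the algorithm presented in the talk.

```python
import numpy as np

# Toy setup: a tabular MDP seen through one-hot features, so the
# linear-MDP assumption holds exactly. Everything here is illustrative.
rng = np.random.default_rng(0)
nS, nA = 5, 2
d = nS * nA                       # feature dimension
gamma = 0.99                      # discount factor of the surrogate MDP
beta = 0.1                        # exploration-bonus scale (assumed)
lam = 1.0                         # ridge regularizer

def phi(s, a):
    """One-hot feature map for the state-action pair (s, a)."""
    v = np.zeros(d)
    v[s * nA + a] = 1.0
    return v

# Pretend we have logged n transitions (s, a, r, s') from the environment.
n = 2000
S = rng.integers(nS, size=n)
A = rng.integers(nA, size=n)
R = rng.random(n)
S2 = rng.integers(nS, size=n)

Phi = np.stack([phi(s, a) for s, a in zip(S, A)])   # (n, d) design matrix
Lambda_inv = np.linalg.inv(lam * np.eye(d) + Phi.T @ Phi)

V = np.zeros(nS)
for _ in range(1000):             # value iteration on the discounted surrogate
    y = R + gamma * V[S2]         # regression targets r + gamma * V(s')
    w = Lambda_inv @ (Phi.T @ y)  # ridge estimate of the Q-function weights
    Q = np.empty((nS, nA))
    for s in range(nS):
        for a in range(nA):
            f = phi(s, a)
            bonus = beta * np.sqrt(f @ Lambda_inv @ f)   # optimism
            # Clip to the feasible range of discounted values; the actual
            # clipping rule used in the talk may differ (e.g., span-based).
            Q[s, a] = min(f @ w + bonus, 1.0 / (1.0 - gamma))
    V = Q.max(axis=1)

print("Optimistic state values:", np.round(V, 2))
```

Tuning gamma toward 1 (as a function of the interaction horizon) is what converts a guarantee for the discounted surrogate into a regret bound for the average-reward problem; the sketch fixes gamma = 0.99 purely for illustration.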

https://kihyukh.github.io/
Building: West Hall
Event Type: Workshop / Seminar
Tags: seminar
Source: Happening @ Michigan from Department of Statistics, Department of Statistics Seminar Series