Workshop at the Reinforcement Learning Conference 2024 in Amherst, MA.
Date: August 9, 2024, 9am - 4pm
Location: Hadley Room (posters in the Marriott Center)
Workshop Summary
The RL Safety Workshop will focus on the study of safe reinforcement learning systems that are aligned with human values. The workshop will blend research from several branches of reinforcement learning, ranging from theoretical formalisms of alignment, to strategies for building safe RL agents, to auditing and monitoring of deployed systems. We hope these topics will be of interest to a large subset of the RLC community, as they are crucial for ensuring that increasingly powerful RL systems remain beneficial to humanity.
Schedule
| Time | Event | Details | 
| 9:00a - 9:15a | Opening Remarks | |
| 9:15a - 9:30a | Contributed Talk: Jingwu Tang | Multi-Agent Imitation Learning: Value is Easy, Regret is Hard Abstract: We study a multi-agent imitation learning (MAIL) problem where we take the perspective of a learner attempting to coordinate a group of agents in a Markov Game by recommending actions based on demonstrations of an expert doing so. Most prior work in MAIL essentially reduces the problem to matching the behavior of the expert within the support of the demonstrations. While doing so is sufficient to drive the value gap between the learner and the expert to zero under the assumption that agents are non-strategic, it does not guarantee robustness to deviations by strategic agents. Intuitively, this is because strategic deviations can depend on a counterfactual quantity: the coordinator’s recommendations outside of the state distribution their recommendations induce. In response, we initiate the study of an alternative objective for MAIL in Markov Games we term the regret gap that explicitly accounts for potential deviations by agents in the group. We first perform an in-depth exploration of the relationship between the value and regret gaps. First, we show that while the value gap can be efficiently minimized via a direct extension of single-agent IL algorithms, even value equivalence can lead to an arbitrarily large regret gap. This implies that achieving regret equivalence is harder than achieving value equivalence in MAIL. We then provide a pair of efficient reductions to no-regret online convex optimization that are capable of minimizing the regret gap (a) under a coverage assumption on the expert (MALICE) or (b) with access to a queryable expert (BLADES). | 
| 9:30a - 9:45a | Contributed Talk: Sheila McIlraith | Remembering to Be Fair: Non-Markovian Fairness in Sequential Decision Making Abstract: Fair decision making has largely been studied with respect to a single decision. Here we investigate the notion of fairness in the context of sequential decision making where multiple stakeholders can be affected by the outcomes of decisions. We observe that fairness often depends on the history of the sequential decision-making process, and in this sense that it is inherently non-Markovian. We further observe that fairness often needs to be assessed at time points within the process, not just at the end of the process. To advance our understanding of this class of fairness problems, we explore the notion of non-Markovian fairness in the context of sequential decision making. We identify properties of non-Markovian fairness, including notions of long-term, anytime, periodic, and bounded fairness. We explore the interplay between non-Markovian fairness and memory and how memory can support construction of fair policies. Finally, we introduce the FairQCM algorithm, which can automatically augment its training data to improve sample efficiency in the synthesis of fair policies via reinforcement learning. | 
| 9:45a - 10:00a | Contributed Talk: Tom Ringstrom | Operator Bellman Equations Support Compositional & Verifiable Planning in Hierarchical World-models Abstract: We introduce new reward-free Bellman Equations called Operator Bellman Equations, which produce predictive planning representations called state-time feasibility functions (STFFs) instead of traditional value functions. STFFs offer three key advantages: they are compositional, factorizable, and interpretable. This means: 1) STFFs can be sequentially composed to compute high-dimensional predictions over long horizons of sequential Options (policies). 2) High-dimensional STFFs can be efficiently represented and computed in a factorized form. 3) STFFs record the probabilities of semantically interpretable goal-success and constraint-violation events. We argue that the objective of reward maximization is often in conflict with these critical properties, which are essential for scalable, verifiable planning in dynamic, high-dimensional world-models. | 
| 10:00a - 10:15a | Contributed Talk: Toshinori Kitamura | A Policy Gradient Primal-Dual Algorithm for Constrained MDPs with Uniform PAC Guarantees Abstract: We study a primal-dual (PD) reinforcement learning (RL) algorithm for online constrained Markov decision processes (CMDPs). Despite its widespread practical use, the existing theoretical literature on PD-RL algorithms for this problem only provides sublinear regret guarantees and fails to ensure convergence to optimal policies. In this paper, we introduce a novel policy gradient PD algorithm with uniform probably approximate correctness (Uniform-PAC) guarantees, simultaneously ensuring convergence to optimal policies, sublinear regret, and polynomial sample complexity for any target accuracy. Notably, this represents the first Uniform-PAC algorithm for the online CMDP problem. In addition to the theoretical guarantees, we empirically demonstrate in a simple CMDP that our algorithm converges to optimal policies, while baseline algorithms exhibit oscillatory performance and constraint violation. | 
| 10:15a - 10:30a | Contributed Talk: Nicolas Espinosa Dice | Efficient Inverse Reinforcement Learning without Compounding Errors Abstract: Inverse reinforcement learning (IRL) is an on-policy approach to imitation learning (IL) that allows the learner to observe the consequences of their actions at train-time. Accordingly, there are two seemingly contradictory desiderata for IRL algorithms: (a) preventing the compounding errors that stymie offline approaches like behavioral cloning and (b) avoiding the worst-case exploration complexity of reinforcement learning (RL). Prior work has been able to achieve either (a) or (b) but not both simultaneously. In our work, we first prove a negative result showing that, without further assumptions, there are no efficient IRL algorithms that learn a competitive policy in the worst case. We then provide a positive result: under a novel structural condition we term reward-agnostic policy completeness, we prove that efficient IRL algorithms do avoid compounding errors, giving us the best of both worlds. We also propose a principled method for using sub-optimal data to further improve the sample-efficiency of efficient IRL algorithms. | 
| 10:30a - 11:00a | Coffee & Networking | |
| 11:00a - 12:00p | Invited Talk: Natasha Jaques, University of Washington | Pluralistic Alignment through Personalized Reinforcement Learning from Human Feedback Abstract: Reinforcement Learning from Human Feedback (RLHF) is a powerful paradigm for aligning foundation models to human values and preferences. However, current RLHF techniques cannot account for the naturally occurring differences in individual human preferences across a diverse population. When these differences arise, traditional RLHF frameworks simply average over them, leading to inaccurate rewards and poor performance for minority groups. To address the need for pluralistic alignment, we develop a class of multimodal RLHF methods. Our proposed techniques are based on a latent variable formulation - inferring a novel user-specific latent and learning reward models and policies conditioned on this latent without additional user-specific data. While conceptually simple, we show that in practice, this reward modeling requires careful algorithmic considerations around model architecture and reward scaling. To empirically validate our proposed technique, we first show that it can provide a way to combat underspecification in simulated control problems, inferring and optimizing user-specific reward functions. Next, we conduct experiments on pluralistic language datasets representing diverse user preferences and demonstrate improved reward function accuracy. We additionally show the benefits of this probabilistic framework in terms of measuring uncertainty, and actively learning user preferences. This work enables learning from diverse populations of users with divergent preferences, an important challenge that naturally occurs in problems from robot learning to foundation model alignment. | 
| 12:00p - 13:00p | Lunch | |
| 13:00p - 14:00p | Invited Talk: David Abel, Google DeepMind | Toward a Science of Agency Abstract: Many of the core problems of AI safety are ultimately about agents. Yet, our basic science of agency is underdeveloped. In this talk I summarize a proposal for a few of the building blocks for a science of agency that I take to be necessary on the path fully understanding safe AI. I propose a way to define families of agents that is intended to open the door to study agents in the same way that the MDP opened the door to study sequential decision making problems. I first show that this definition is universal in a particular sense: it can represent every set of agents. I then briefly discuss two core elements of agency—goal-directedness and learning—through this agent-centric lens, and highlight a few consequences and questions that are most relevant for thinking carefully about safety. | 
| 14:00p - 14:45p | Panel Discussion | - Natasha Jaques, University of Washington - Scott Niekum, UMass Amherst - Eugene Vinitsky, NYU - Tan Zhi Xuan, MIT - Micah Carroll, UC Berkeley (moderator) | 
| 14:45p - 16:00p | Coffee & Poster Session | 
Topics of Interest
The following is a more detailed list of potential topic areas:
- Models/formalisms of safety/alignment/ethics in RL
- RL generalization and robustness
- Using LLMs/foundation models for sequential decision making
- Multi-agent RL
- Human-in-the-loop/cooperative RL
- Human value learning and inverse RL
- Using RL for algorithmic recommendations
- RL interpretability/explainability
- Fairness and bias in RL
- Governance/policy/law for RL systems
- Auditing and deployment safeguards for RL systems
Accepted Papers
- A Policy Gradient Primal-Dual Algorithm for Constrained MDPs with Uniform PAC Guarantees - Toshinori Kitamura, Tadashi Kozuno, Masahiro Kato, Yuki Ichihara, Soichiro Nishimori, Akiyoshi Sannai, Sho Sonoda, Wataru Kumagai, Yutaka Matsuo
- DECAF: Learning to be Fair in Multi-agent Resource Allocation - Ashwin Kumar, William Yeoh
- Prioritizing safety via curriculum learning - Cevahir Koprulu, Thiago D. Simão, Nils Jansen, Ufuk Topcu
- Predicting Future Actions of Reinforcement Learning Agents - Stephen Chung, Scott Niekum, David Krueger
- Multi-Agent Imitation Learning: Value is Easy, Regret is Hard - Jingwu Tang, Gokul Swamy, Fei Fang, Steven Wu
- Novelty Detection in Reinforcement Learning with World Models - Geigh Zollicoffer, Kenneth Eaton, Jonathan C Balloch, Julia Kim, Mark Riedl, Robert Wright
- Efficient Inverse Reinforcement Learning without Compounding Errors - Nicolas Espinosa Dice, Gokul Swamy, Sanjiban Choudhury, Wen Sun
- ROSARL: Reward-Only Safe Reinforcement Learning - Geraud Nangue Tasse, Tamlin Love, Mark Nemecek, Steven James, Benjamin Rosman
- Robust Model-Based Reinforcement Learning with an Adversarial Auxiliary Model - Siemen Herremans, Ali Anwar, Siegfried Mercelis
- Simplifying Constraint Inference with Inverse Reinforcement Learning - Adriana Hugessen, Harley Wiltzer, Glen Berseth
- Safe and Efficient Offline Reinforcement Learning: The Critic is Critical - Adam Jelley, Trevor McInroe, Sam Devlin, Amos Storkey
- Sample Complexity for Obtaining Sub-optimality and Violation bound for Distributionally Robust Constrained MDP - Arnob Ghosh
- Pareto-Optimal Learning from Preferences with Hidden Context - Ryan Boldi, Li Ding, Lee Spector, Scott Niekum
- Online Statistical Inference of Sample-averaged Q-Learning - Saunak Kumar Panda, Tong Li, Yisha Xiang
- Remembering to Be Fair: Non-Markovian Fairness in Sequential Decision Making - Parand A. Alamdari, Toryn Q. Klassen, Elliot Creager, Sheila A. McIlraith
- AI Alignment with Changing and Influenceable Reward Functions - Micah Carroll, Davis Foote, Anand Siththaranjan, Stuart Russell, Anca Dragan
- Operator Bellman Equations Support Compositional & Verifiable Planning in Hierarchical World-models - Tom Ringstrom
Important Dates
- Paper submission deadline: May 24, 2024 (AoE)
- Paper acceptance notification: June 3, 2024
Submission Details
Submissions must be relevant to one or more of the topic areas listed above, in the following two categories:
- Long papers - up to 8 pages + unlimited references / appendices
- Short papers - up to 4 pages + unlimited references / appendices
Please format submissions in RLC style (see Author Kit: overleaf, zip).
We will review primarily for topic fit and technical correctness. Some accepted papers will be selected for contributed talks. All accepted papers will present during the poster session.
Submission link: click here
Organizing Committee
- Cameron Allen, UC Berkeley
- Serena Booth, US Senate
- Micah Carroll, UC Berkeley
- Davis Foote, UC Berkeley
- Dylan Hadfield-Menell, MIT
- Tan Zhi-Xuan, MIT
- Rui-Jie Yew, Brown University
Please send your inquiries to rlsworganizers@gmail.com.