Nathan Lambert - Reinforcement Learning

Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2

Nov 20, 2023

H. Ivison and Y. Wang et al.

arXiv

Empirical studies of advances in instruction tuning and RLHF!

The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback

Oct 31, 2023

Nathan Lambert, Roberto Calandra

How the optimization setup of RLHF is limiting the steerability of LLMs.

Zephyr: Direct Distillation of LM Alignment

Oct 25, 2023

Hugging Face H4 Team

The technical report for a small but powerful chat model trained with DPO!

The History and Risks of Reinforcement Learning and Human Feedback

Oct 20, 2023

Nathan Lambert, Thomas Krendl Gilbert, Tom Zick

The complicated history underpinning reinforcement learning from human feedback!

A Unified View on Solving Objective Mismatch in Model-Based Reinforcement Learning

Oct 12, 2023

Ran Wei, Nathan Lambert, Anthony McDonald, Alfredo Garcia, Roberto Calandra

arXiv

Where is model-based RL heading, four years after the seminal paper of my Ph.D.?

Synergy of Prediction and Control in Model-based Reinforcement Learning

May 12, 2022

Nathan Lambert

My thesis on model-based RL. Let's make models work with tasks!

Reward Reports for Reinforcement Learning

Apr 25, 2022

Thomas Krendl Gilbert, Sarah Dean, Nathan Lambert, Tom Zick, Aaron Snoswell

We propose a new type of documentation for dynamic machine learning (and reinforcement learning) systems!

Choices, Risks, and Reward Reports: Charting Public Policy for Reinforcement Learning Systems

Feb 8, 2022

Thomas Krendl Gilbert, Sarah Dean, Tom Zick, Nathan Lambert

Center for Long-Term Cybersecurity White paper Series

We detail why reinforcement learning systems pose a different type of (dynamic) risk to society. This paper outlines the different types of feedback present in RL systems, the risks they pose, and a path forward for policymakers.

The Challenges of Exploration for Offline Reinforcement Learning

Feb 1, 2022

Nathan Lambert, Markus Wulfmeier, William Whitney, Arunkumar Byravan, Michael Bloesch, Vibhavari Dasagi, Tim Hertweck, Martin Riedmiller

We flip the script on Offline RL research and ask the question of "what is the best dataset to collect?" rather than "what is the best algorithm?"

MBRL-Lib: A Modular Library for Model-based Reinforcement Learning

Apr 20, 2021

Luis Pineda, Brandon Amos, Amy Zhang, Nathan O Lambert, Roberto Calandra

An open-source PyTorch repository designed from the bottom up for model-based reinforcement learning research.

On the Importance of Hyperparameter Optimization for Model-based Reinforcement Learning

Feb 26, 2021

Baohe Zhang, Raghu Rajan, Luis Pineda, Nathan Lambert, André Biedenkapp, Kurtland Chua, Frank Hutter, Roberto Calandra

2021 International Conference on Artificial Intelligence and Statistics

We showed that when advances in AutoML are paired with MBRL algorithms on common deep RL tasks, the algorithms perform so well that they break the simulator.

Nonholonomic Yaw Control of an Underactuated Flying Robot with Model-based Reinforcement Learning

Dec 21, 2020

Nathan Lambert, Craig Schindler, Daniel S Drew, Kristofer SJ Pister

IEEE Robotics and Automation Letters

We explored how MBRL can learn multi-step, nonlinear controllers!

Learning Accurate Long-term Dynamics for Model-based Reinforcement Learning

Dec 16, 2020

Nathan O Lambert, Albert Wilcox, Howard Zhang, Kristofer SJ Pister, Roberto Calandra

2021 IEEE Conference on Decision and Control

Trying to reframe the MBRL framework with long-term predictions instead of one-step predictions!

Objective Mismatch in Model-based Reinforcement Learning

Feb 11, 2020

Nathan Lambert, Brandon Amos, Omry Yadan, Roberto Calandra

2020 Conference on Learning for Decision and Control

Studying the numerical effects of the dual-optimization problem in model-based reinforcement learning: control and dynamics. Optimizing model accuracy does not guarantee improved task performance!

Learning Generalizable Locomotion Skills with Hierarchical Reinforcement Learning

Sep 26, 2019

Tianyu Li, Nathan Lambert, Roberto Calandra, Franziska Meier, Akshara Rai

2020 IEEE International Conference on Robotics and Automation (ICRA)

Learning how to walk with a real-world hexapod using a hierarchy that combines model-free RL for basic motion primitives with model-based RL for higher-level planning.

Low Level Control of a Quadrotor with Deep Model-Based Reinforcement Learning

Jan 11, 2019

Nathan Lambert, Daniel Drew, Joseph Yaconelli, Roberto Calandra, Sergey Levine, Kristofer Pister

IEEE Robotics and Automation Letters

We used deep model-based reinforcement learning to have a quadrotor learn to hover from less than 5 minutes of experimental training data in total.

Reinforcement Learning

The most opaque and intriguing system
