My CV · PhD Thesis · Contact · Google Scholar · Semantic Scholar
Representative papers where I am a primary contributor are highlighted

Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2

Advancements in instruction tuning and RLHF, with extensive empirical studies!

The Alignment Ceiling: Objective Mismatch in Reinforcement Learning from Human Feedback

How the optimization setup of RLHF is limiting the steerability of LLMs.

Zephyr: Direct Distillation of LM Alignment

The report for a small and powerful chat model trained with DPO!

The History and Risks of Reinforcement Learning and Human Feedback

The complicated historical past underpinning reinforcement learning from human feedback!

A Unified View on Solving Objective Mismatch in Model-Based Reinforcement Learning

Where is model-based RL heading 4 years after the seminal paper of my Ph.D.?

Measuring Data

When you "measure data," you quantify its characteristics to support dataset comparison and curation. You also begin to understand what your systems will learn. Many ML systems don't reason about this; we posit you should.

Synergy of Prediction and Control in Model-based Reinforcement Learning

My thesis on model-based RL. Let's make models work with tasks!

Reward Reports for Reinforcement Learning

We propose a new type of documentation for dynamic machine learning (and reinforcement learning) systems!

Choices, Risks, and Reward Reports: Charting Public Policy for Reinforcement Learning Systems

We detail why reinforcement learning systems pose a different type of (dynamic) risk to society. This paper outlines the different types of feedback present in RL systems, the risks they pose, and a path forward for policymakers.

The Challenges of Exploration for Offline Reinforcement Learning

We flip the script on offline RL research, asking "what is the best dataset to collect?" rather than "what is the best algorithm?"

Investigating Compounding Prediction Errors in Learned Dynamics Models

In this paper we set out to understand the causes of compounding prediction errors in one-step learned models. With this, we hope a next generation of models can be used to improve model-based reinforcement learning.

BotNet: A Simulator for Studying the Effects of Accurate Communication Models on Multi-agent and Swarm Control

A simulator for studying high-agent-count networked systems!

Axes for Sociotechnical Inquiry in AI Research

We present a concise set of directions for understanding the societal risks of new directions of AI research.

MBRL-Lib: A Modular Library for Model-based Reinforcement Learning

An open-source PyTorch repository designed from the bottom up for model-based reinforcement learning research.

On the Importance of Hyperparameter Optimization for Model-based Reinforcement Learning

We showed that when advancements in AutoML are paired with common deep RL tasks, MBRL algorithms perform so well they break the simulator.

AI Development for the Public Interest: From Abstraction Traps to Sociotechnical Risks

We study three developing subfields of AI research and their growing relationship with the sociotechnical: AI Safety, Fair Machine Learning, and Human-in-the-loop Autonomy.

Nonholonomic Yaw Control of an Underactuated Flying Robot with Model-based Reinforcement Learning

We explored how MBRL can learn multi-step, nonlinear controllers!

Learning Accurate Long-term Dynamics for Model-based Reinforcement Learning

Trying to reframe the MBRL framework with long-term predictions instead of one-step predictions!

Learning for Microrobot Exploration: Model-based Locomotion, Robust Navigation, and Low-Power Deep Classification

A collection of steps towards a data-driven autonomous microrobot.

Objective Mismatch in Model-based Reinforcement Learning

Studying the numerical effects of the dual-optimization problem in model-based reinforcement learning: control and dynamics. Optimizing model accuracy carries no guarantee of improving task performance!

Learning Generalizable Locomotion Skills with Hierarchical Reinforcement Learning

Learning how to walk with a real-world hexapod using a hierarchy of model-free RL for basic motion primitives with model-based RL for higher level planning.

Low Level Control of a Quadrotor with Deep Model-Based Reinforcement Learning

We used deep model-based reinforcement learning to have a quadrotor learn to hover from less than 5 minutes of purely experimental training data.

Toward Controlled Flight of the Ionocraft: A Flying Microrobot Using Electrohydrodynamic Thrust With Onboard Sensing and No Moving Parts

A collection of steps towards controlled flight of The Ionocraft, a completely silent microrobot with ion thrust!

15min History of Reinforcement Learning and Human Feedback

A talk mirroring a recent paper of mine, The History and Risks of Reinforcement Learning and Human Feedback!
[Watch Me]
[Slides]

DPO: Is RL needed for RLHF?

I join the debate on whether Direct Preference Optimization is the final solution for all things RLHF. Warning: technical.
[Watch Me]
[Slides]

Bridging RLHF from LLMs back to control

What the LLM community should learn from roboticists, and how we can all RLHF better together.
[Watch Me]
[Slides]

Objective Mismatch in Reinforcement Learning from Human Feedback

Linking the topics of my Ph.D. in model-based RL to all the happenings in RLHF.
[Watch Me]
[Slides]

Reinforcement Learning from Human Feedback: A Tutorial

Introduction to the basics of RLHF :)!
[Watch Me]
[Slides]

Reinforcement Learning from Human Feedback: Open and Academic Perspectives

A more technical version of my RLHF talk!
[Watch Me]
[Slides]

Intro to Reinforcement Learning from Human Feedback

I try to take you from 0 to ChatGPT on RLHF with language models!
[Watch Me]
[Slides]

Planning through Exploration and Exploitation in Model-based Reinforcement Learning

A talk on the links between exploration, model exploitation, and intrinsic curiosity from the lens of model-based reinforcement learning.
[Watch Me]
[Slides]

(Dissertation Talk) Synergy of Prediction and Control in Model-based Reinforcement Learning

My "defense" at Berkeley. As this is a talk I can only pass, I took liberties in trying to reflect the journey of a Ph.D.!
[Watch Me]
[Slides]

Machine Learning for Microsystem Control

An applied ML talk I gave at a sensor & actuator industry review event. I won best talk!
[Watch Me]
[Slides]

Improving Model Predictive Control Used in Model-based Reinforcement Learning

How can we use a better understanding of dynamics models to improve data-driven model predictive control (MPC)?
[Watch Me]
[Slides]

Bringing Model-based RL to Novel Robots

My practice quals talk! Learn about how I got into model-based reinforcement learning and what I wanted to accomplish by the end of my Ph.D.
[Watch Me]
[Slides]

Model Learning for Low-level Control in Robotics

A mixed talk discussing the research challenges of controlling microrobots and how model-learning can be used to synthesize highly specific controllers.
[Watch Me]
[Slides]

I have been lucky to work with many brilliant younger students:

Berkeley Undergrads

Other Students

  • Brian Li (link forthcoming)