Learning Accurate Long-term Dynamics for Model-based Reinforcement Learning

Accurately predicting the dynamics of robotic systems is crucial for model-based control and reinforcement learning. The most common way to estimate dynamics is by fitting a one-step-ahead prediction model and using it to recursively propagate the predicted state distribution over long horizons. Unfortunately, this approach is known to compound even small prediction errors, making long-term predictions inaccurate. In this paper, we propose a new parametrization of supervised learning on state-action data that predicts stably at longer horizons, which we call a trajectory-based model. This trajectory-based model takes an initial state, a future time index, and control parameters as inputs, and predicts the state at the future time. Our results on simulated and experimental robotic tasks show that trajectory-based models yield significantly more accurate long-term predictions, improved sample efficiency, and an improved ability to predict task reward.
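The contrast between the two parametrizations described in the abstract can be sketched as function signatures. This is an illustrative sketch, not the paper's code: the linear-model parameters and feature layout are placeholder assumptions standing in for learned networks.

```python
import numpy as np

def one_step_model(state, action, theta):
    """Standard parametrization: predict s_{t+1} from (s_t, a_t).
    Long-horizon prediction requires applying this recursively."""
    # Placeholder linear dynamics; a real model would be a learned network.
    A, B = theta
    return A @ state + B @ action

def trajectory_model(initial_state, t, control_params, theta):
    """Trajectory-based parametrization: predict s_t directly from
    (s_0, time index t, controller parameters), with no recursion."""
    # Placeholder linear map over the concatenated inputs.
    (W,) = theta
    features = np.concatenate([initial_state, [float(t)], control_params])
    return W @ features
```

The key difference is the input space: the one-step model must be chained t times to reach horizon t, while the trajectory-based model queries any horizon in a single forward pass.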


  1. Current methods in MBRL for predicting into the future are mismatched with how the models are actually used for control.
  2. Reparametrizing the supervised learning problem can yield benefits beyond prediction accuracy, such as improved sample efficiency.
  3. Creating new control paradigms around new models is hard.
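The first point can be illustrated numerically: a one-step model with a tiny per-step error, applied recursively, accumulates error multiplicatively over the horizon. The scalar linear system below is a hypothetical toy, not from the paper; the coefficients are arbitrary assumptions.

```python
# Toy illustration of compounding error in recursive one-step rollouts.
# True dynamics: s_{t+1} = a_true * s_t (scalar linear system, assumed).
a_true = 1.05
a_hat = 1.06          # learned one-step model with a small (0.01) error

s0 = 1.0
horizon = 50

# Recursive rollout: the model is applied t times to reach horizon t,
# so the coefficient error compounds multiplicatively.
true_final = s0 * a_true ** horizon
pred_final = s0 * a_hat ** horizon

one_step_error = abs(a_hat * s0 - a_true * s0)     # error after 1 step
long_horizon_error = abs(pred_final - true_final)  # error after 50 steps
```

Even with only a 0.01 error in the one-step coefficient, the 50-step prediction error grows to hundreds of times the single-step error, which is the failure mode the trajectory-based parametrization sidesteps by predicting each horizon directly.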


@article{lambert2020learning,
  title={Learning Accurate Long-term Dynamics for Model-based Reinforcement Learning},
  author={Lambert, Nathan O and Wilcox, Albert and Zhang, Howard and Pister, Kristofer SJ and Calandra, Roberto},
  journal={arXiv preprint arXiv:2012.09156},
  year={2020}
}