What is Episodic RL?
--------------------

.. raw:: html

    <div class="justify">

Movement primitive (MP) environments differ from traditional step-based
environments. They align more closely with concepts from stochastic search,
black-box optimization, and methods commonly found in classical robotics
and control. Instead of individual steps, MP environments operate on an
episode basis, executing complete trajectories. These trajectories are
produced by trajectory generators such as Dynamic Movement Primitives
(DMP), Probabilistic Movement Primitives (ProMP), or Probabilistic
Dynamic Movement Primitives (ProDMP).
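
For intuition, the following minimal sketch builds a one-dimensional
trajectory as a weighted sum of radial basis functions, similar in spirit
to a ProMP mean trajectory. It is an illustrative toy only; the function
name, basis width, and number of weights are made up and do not reflect
the trajectory generators actually used by the framework.

.. code-block:: python

    import numpy as np

    def promp_like_trajectory(weights, duration=1.0, dt=0.01, width=0.02):
        """Toy ProMP-style generator: a weighted sum of normalized radial
        basis functions over normalized time. Illustrative only."""
        t = np.arange(0.0, duration, dt) / duration        # normalized time in [0, 1)
        centers = np.linspace(0.0, 1.0, len(weights))      # basis-function centers
        phi = np.exp(-(t[:, None] - centers[None, :]) ** 2 / (2.0 * width))
        phi /= phi.sum(axis=1, keepdims=True)               # normalize basis activations
        return phi @ weights                                 # desired position per step

    # The agent's "action" could be such a weight vector; the result is the
    # desired trajectory for one full episode.
    desired_pos = promp_like_trajectory(np.array([0.0, 0.3, 0.8, 0.5, 0.1]))
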
Once generated, these trajectories are converted into step-by-step
actions by a trajectory tracking controller. The specific controller
chosen depends on the environment’s requirements. Currently, we support
position, velocity, and PD controllers, tailored for position, velocity,
and torque control, respectively. Additionally, we provide a specialized
controller for the MetaWorld control suite.
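
To illustrate the tracking step, the sketch below shows a generic PD
control law that turns the desired positions and velocities of a
trajectory into torque commands. The gains and the function name are
placeholders, not the internal controller implementation of the framework.

.. code-block:: python

    import numpy as np

    def pd_torques(q_des, qd_des, q, qd, kp=10.0, kd=1.0):
        """Generic PD tracking law: torques from position and velocity errors."""
        return kp * (q_des - q) + kd * (qd_des - qd)

    # Executed once per simulation step: track the current point of the
    # desired trajectory with a low-level torque command.
    tau = pd_torques(q_des=np.array([0.5, -0.2]), qd_des=np.zeros(2),
                     q=np.array([0.4, -0.1]), qd=np.array([0.0, 0.1]))
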
While the overarching objective of MP environments remains the learning
of an optimal policy, the actions here represent the parametrization of
the motion primitives that generate the desired trajectory. Our framework
further supports a contextual setting: at the episode’s onset, we present
the context space, a subset of the observation space. This requires
predicting a new action, i.e. a new MP parametrization, for every unique
context.
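
Conceptually, one interaction with an MP environment therefore consists of
a single observation (the context), a single action (the MP
parametrization), and a single step that executes the whole trajectory.
The sketch below assumes a Gymnasium-style interface and uses
``fancy_ProMP/Reacher5d-v0`` as an illustrative environment ID; the IDs
registered by your installation may differ.

.. code-block:: python

    import gymnasium as gym
    import fancy_gym  # assumed to register the MP environments on import

    # Illustrative ID; replace with an MP environment registered in your setup.
    env = gym.make('fancy_ProMP/Reacher5d-v0')

    context, _ = env.reset()               # the context is observed once per episode
    params = env.action_space.sample()     # MP parametrization (stand-in for a policy)
    obs, episode_return, terminated, truncated, info = env.step(params)
    # A single step executes the full trajectory through the tracking
    # controller and returns the accumulated episode reward.
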
.. raw:: html

    </div>