Update README.md

ottofabian 2021-12-01 14:31:47 +01:00 committed by GitHub
parent 4fd17e8a90
commit 1d5c2e6acb
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

@@ -1,26 +1,25 @@
 ## ALR Robotics Control Environments
-This project offers a large variety of reinforcement learning environments under a unifying interface base on OpenAI gym.
-Besides, some custom environments we also provide support for the benchmark suites
-[OpenAI gym](https://gym.openai.com/),
+This project offers a large variety of reinforcement learning environments under the unifying interface of [OpenAI gym](https://gym.openai.com/).
+Besides, we also provide support (under the OpenAI interface) for the benchmark suites
 [DeepMind Control](https://deepmind.com/research/publications/2020/dm-control-Software-and-Tasks-for-Continuous-Control)
-(DMC), and [Metaworld](https://meta-world.github.io/). Custom (Mujoco) gym environment can be created according
+(DMC) and [Metaworld](https://meta-world.github.io/). Custom (Mujoco) gym environments can be created according
 to [this guide](https://github.com/openai/gym/blob/master/docs/creating-environments.md). Unlike existing libraries, we
-further support to control agents with Dynamic Movement Primitives (DMPs) and Probabilistic Movement Primitives (DetPMP,
+additionally support to control agents with Dynamic Movement Primitives (DMPs) and Probabilistic Movement Primitives (ProMP,
 we only consider the mean usually).
 ## Motion Primitive Environments (Episodic environments)
 Unlike step-based environments, motion primitive (MP) environments are closer related to stochastic search, black box
-optimization and methods that often used in robotics. MP environments are trajectory-based and always execute a full
-trajectory, which is generated by a Dynamic Motion Primitive (DMP) or a Probabilistic Motion Primitive (DetPMP). The
+optimization, and methods that are often used in robotics. MP environments are trajectory-based and always execute a full
+trajectory, which is generated by a Dynamic Motion Primitive (DMP) or a Probabilistic Motion Primitive (ProMP). The
 generated trajectory is translated into individual step-wise actions by a controller. The exact choice of controller is,
 however, dependent on the type of environment. We currently support position, velocity, and PD-Controllers for position,
-velocity and torque control, respectively. The goal of all MP environments is still to learn a policy. Yet, an action
+velocity, and torque control, respectively. The goal of all MP environments is still to learn a policy. Yet, an action
 represents the parametrization of the motion primitives to generate a suitable trajectory. Additionally, in this
-framework we support the above setting for the contextual setting, for which we expose all changing substates of the
+framework we support all of this also for the contextual setting, for which we expose all changing substates of the
 task as a single observation in the beginning. This requires to predict a new action/MP parametrization for each
-trajectory. All environments provide the next to the cumulative episode reward also all collected information from each
+trajectory. All environments provide next to the cumulative episode reward all collected information from each
 step as part of the info dictionary. This information should, however, mainly be used for debugging and logging.
 |Key| Description|