diff --git a/README.md b/README.md
index 1364491..4042450 100644
--- a/README.md
+++ b/README.md
@@ -1,26 +1,25 @@
 ## ALR Robotics Control Environments
 
-This project offers a large variety of reinforcement learning environments under a unifying interface base on OpenAI gym.
-Besides, some custom environments we also provide support for the benchmark suites
-[OpenAI gym](https://gym.openai.com/),
+This project offers a large variety of reinforcement learning environments under the unifying interface of [OpenAI gym](https://gym.openai.com/).
+Besides, we also provide support (under the OpenAI interface) for the benchmark suites
 [DeepMind Control](https://deepmind.com/research/publications/2020/dm-control-Software-and-Tasks-for-Continuous-Control)
-(DMC), and [Metaworld](https://meta-world.github.io/). Custom (Mujoco) gym environment can be created according
+(DMC) and [Metaworld](https://meta-world.github.io/). Custom (Mujoco) gym environments can be created according
 to [this guide](https://github.com/openai/gym/blob/master/docs/creating-environments.md). Unlike existing libraries, we
-further support to control agents with Dynamic Movement Primitives (DMPs) and Probabilistic Movement Primitives (DetPMP,
+additionally support to control agents with Dynamic Movement Primitives (DMPs) and Probabilistic Movement Primitives (ProMP,
 we only consider the mean usually).
 
 ## Motion Primitive Environments (Episodic environments)
 
 Unlike step-based environments, motion primitive (MP) environments are closer related to stochastic search, black box
-optimization and methods that often used in robotics. MP environments are trajectory-based and always execute a full
-trajectory, which is generated by a Dynamic Motion Primitive (DMP) or a Probabilistic Motion Primitive (DetPMP). The
+optimization, and methods that are often used in robotics. MP environments are trajectory-based and always execute a full
+trajectory, which is generated by a Dynamic Motion Primitive (DMP) or a Probabilistic Motion Primitive (ProMP). The
 generated trajectory is translated into individual step-wise actions by a controller. The exact choice of controller is,
 however, dependent on the type of environment. We currently support position, velocity, and PD-Controllers for position,
-velocity and torque control, respectively. The goal of all MP environments is still to learn a policy. Yet, an action
+velocity, and torque control, respectively. The goal of all MP environments is still to learn a policy. Yet, an action
 represents the parametrization of the motion primitives to generate a suitable trajectory. Additionally, in this
-framework we support the above setting for the contextual setting, for which we expose all changing substates of the
+framework we support all of this also for the contextual setting, for which we expose all changing substates of the
 task as a single observation in the beginning. This requires to predict a new action/MP parametrization for each
-trajectory. All environments provide the next to the cumulative episode reward also all collected information from each
+trajectory. All environments provide next to the cumulative episode reward all collected information from each
 step as part of the info dictionary. This information should, however, mainly be used for debugging and logging.
 
 |Key| Description|