diff --git a/README.md b/README.md index 74e3838..860bddc 100644 --- a/README.md +++ b/README.md @@ -1,105 +1,212 @@ ## ALR Robotics Control Environments - -This repository collects custom Robotics environments not included in benchmark suites like OpenAI gym, rllab, etc. -Creating a custom (Mujoco) gym environment can be done according to [this guide](https://github.com/openai/gym/blob/master/docs/creating-environments.md). -For stochastic search problems with gym interface use the `Rosenbrock-v0` reference implementation. -We also support to solve environments with Dynamic Movement Primitives (DMPs) and Probabilistic Movement Primitives (DetPMP, we only consider the mean usually). -## Step-based Environments -Currently we have the following environments: - -### Mujoco - -|Name| Description|Horizon|Action Dimension|Observation Dimension -|---|---|---|---|---| -|`ALRReacher-v0`|Modified (5 links) Mujoco gym's `Reacher-v2` (2 links)| 200 | 5 | 21 -|`ALRReacherSparse-v0`|Same as `ALRReacher-v0`, but the distance penalty is only provided in the last time step.| 200 | 5 | 21 -|`ALRReacherSparseBalanced-v0`|Same as `ALRReacherSparse-v0`, but the end-effector has to remain upright.| 200 | 5 | 21 -|`ALRLongReacher-v0`|Modified (7 links) Mujoco gym's `Reacher-v2` (2 links)| 200 | 7 | 27 -|`ALRLongReacherSparse-v0`|Same as `ALRLongReacher-v0`, but the distance penalty is only provided in the last time step.| 200 | 7 | 27 -|`ALRLongReacherSparseBalanced-v0`|Same as `ALRLongReacherSparse-v0`, but the end-effector has to remain upright.| 200 | 7 | 27 -|`ALRBallInACupSimple-v0`| Ball-in-a-cup task where a robot needs to catch a ball attached to a cup at its end-effector. 
| 4000 | 3 | wip -|`ALRBallInACup-v0`| Ball-in-a-cup task where a robot needs to catch a ball attached to a cup at its end-effector | 4000 | 7 | wip -|`ALRBallInACupGoal-v0`| Similiar to `ALRBallInACupSimple-v0` but the ball needs to be caught at a specified goal position | 4000 | 7 | wip - -### Classic Control - -|Name| Description|Horizon|Action Dimension|Observation Dimension -|---|---|---|---|---| -|`SimpleReacher-v0`| Simple reaching task (2 links) without any physics simulation. Provides no reward until 150 time steps. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory.| 200 | 2 | 9 -|`LongSimpleReacher-v0`| Simple reaching task (5 links) without any physics simulation. Provides no reward until 150 time steps. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory.| 200 | 5 | 18 -|`ViaPointReacher-v0`| Simple reaching task leveraging a via point, which supports self collision detection. Provides a reward only at 100 and 199 for reaching the viapoint and goal point, respectively.| 200 | 5 | 18 -|`HoleReacher-v0`| 5 link reaching task where the end-effector needs to reach into a narrow hole without collding with itself or walls | 200 | 5 | 18 +This project offers a large variety of reinforcement learning environments under a unifying interface based on OpenAI gym. +Besides some custom environments, we also provide support for the benchmark suites +[OpenAI gym](https://gym.openai.com/), +[DeepMind Control](https://deepmind.com/research/publications/2020/dm-control-Software-and-Tasks-for-Continuous-Control) +(DMC), and [Metaworld](https://meta-world.github.io/). Custom (Mujoco) gym environments can be created according +to [this guide](https://github.com/openai/gym/blob/master/docs/creating-environments.md). 
Unlike existing libraries, we +further support controlling agents with Dynamic Movement Primitives (DMPs) and Probabilistic Movement Primitives (DetPMP; +we usually consider only the mean). ## Motion Primitive Environments (Episodic environments) -Unlike step-based environments, these motion primitive (MP) environments are closer to stochastic search and what can be found in robotics. They always execute a full trajectory, which is computed by a Dynamic Motion Primitive (DMP) or Probabilitic Motion Primitive (DetPMP) and translated into individual actions with a controller, e.g. a PD controller. The actual Controller, however, depends on the type of environment, i.e. position, velocity, or torque controlled. -The goal is to learn the parametrization of the motion primitives in order to generate a suitable trajectory. -MP This can also be done in a contextual setting, where all changing elements of the task are exposed once in the beginning. This requires to find a new parametrization for each trajectory. -All environments provide the full cumulative episode reward and additional information about early terminations, e.g. due to collisions. -### Classic Control -|Name| Description|Horizon|Action Dimension|Context Dimension -|---|---|---|---|---| -|`ViaPointReacherDMP-v0`| A DMP provides a trajectory for the `ViaPointReacher-v0` task. | 200 | 25 -|`HoleReacherFixedGoalDMP-v0`| A DMP provides a trajectory for the `HoleReacher-v0` task with a fixed goal attractor. | 200 | 25 -|`HoleReacherDMP-v0`| A DMP provides a trajectory for the `HoleReacher-v0` task. The goal attractor needs to be learned. | 200 | 30 -|`ALRBallInACupSimpleDMP-v0`| A DMP provides a trajectory for the `ALRBallInACupSimple-v0` task where only 3 joints are actuated. | 4000 | 15 -|`ALRBallInACupDMP-v0`| A DMP provides a trajectory for the `ALRBallInACup-v0` task. | 4000 | 35 -|`ALRBallInACupGoalDMP-v0`| A DMP provides a trajectory for the `ALRBallInACupGoal-v0` task. 
| 4000 | 35 | 3 +Unlike step-based environments, motion primitive (MP) environments are more closely related to stochastic search, black-box +optimization, and methods that are often used in robotics. MP environments are trajectory-based and always execute a full +trajectory, which is generated by a Dynamic Motion Primitive (DMP) or a Probabilistic Motion Primitive (DetPMP). The +generated trajectory is translated into individual step-wise actions by a controller. The exact choice of controller, +however, depends on the type of environment. We currently support position, velocity, and PD-Controllers for position, +velocity, and torque control, respectively. The goal of all MP environments is still to learn a policy. Yet, an action +represents the parametrization of the motion primitives to generate a suitable trajectory. Additionally, this +framework supports the contextual setting, in which we expose all changing substates of the +task as a single observation at the beginning. This requires predicting a new action/MP parametrization for each +trajectory. Next to the cumulative episode reward, all environments also provide all information collected at each +step as part of the info dictionary. This information should, however, mainly be used for debugging and logging. -[//]: |`HoleReacherDetPMP-v0`| +|Key| Description| +|---|---| +`trajectory`| Generated trajectory from MP +`step_actions`| Step-wise executed action based on controller output +`step_observations`| Step-wise intermediate observations +`step_rewards`| Step-wise rewards +`trajectory_length`| Total number of environment interactions +`other`| All other information from the underlying environment is returned as a list with length `trajectory_length`, maintaining the original key. If some information is not provided at every time step, the missing values are filled with `None`. 
-### OpenAI gym Environments -These environments are wrapped-versions of their OpenAI-gym counterparts. +## Installation -|Name| Description|Trajectory Horizon|Action Dimension|Context Dimension -|---|---|---|---|---| -|`ContinuousMountainCarDetPMP-v0`| A DetPmP wrapped version of the ContinuousMountainCar-v0 environment. | 100 | 1 -|`ReacherDetPMP-v2`| A DetPmP wrapped version of the Reacher-v2 environment. | 50 | 2 -|`FetchSlideDenseDetPMP-v1`| A DetPmP wrapped version of the FetchSlideDense-v1 environment. | 50 | 4 -|`FetchReachDenseDetPMP-v1`| A DetPmP wrapped version of the FetchReachDense-v1 environment. | 50 | 4 +1. Clone the repository -### Deep Mind Control Suite Environments -These environments are wrapped-versions of their Deep Mind Control Suite (DMC) counterparts. -Given most task can be solved in shorter horizon lengths than the original 1000 steps, we often shorten the episodes for those task. - -|Name| Description|Trajectory Horizon|Action Dimension|Context Dimension -|---|---|---|---|---| -|`dmc_ball_in_cup-catch_detpmp-v0`| A DetPmP wrapped version of the "catch" task for the "ball_in_cup" environment. | 50 | 10 | 2 -|`dmc_ball_in_cup-catch_dmp-v0`| A DMP wrapped version of the "catch" task for the "ball_in_cup" environment. | 50| 10 | 2 -|`dmc_reacher-easy_detpmp-v0`| A DetPmP wrapped version of the "easy" task for the "reacher" environment. | 1000 | 10 | 4 -|`dmc_reacher-easy_dmp-v0`| A DMP wrapped version of the "easy" task for the "reacher" environment. | 1000| 10 | 4 -|`dmc_reacher-hard_detpmp-v0`| A DetPmP wrapped version of the "hard" task for the "reacher" environment.| 1000 | 10 | 4 -|`dmc_reacher-hard_dmp-v0`| A DMP wrapped version of the "hard" task for the "reacher" environment. | 1000 | 10 | 4 - -## Install -1. Clone the repository ```bash git clone git@github.com:ALRhub/alr_envs.git ``` -2. Go to the folder + +2. Go to the folder + ```bash cd alr_envs ``` -3. Install with + +3. Install with + ```bash pip install -e . ``` -4. 
Use (see [example.py](alr_envs/examples/examples_general.py)): -```python -import gym -env = gym.make('alr_envs:SimpleReacher-v0') +## Using the framework + +We prepared [multiple examples](alr_envs/examples/); please have a look there for more specific examples. + +### Step-wise environments + +```python +import alr_envs + +env = alr_envs.make('HoleReacher-v0', seed=1) state = env.reset() -for i in range(10000): +for i in range(1000): state, reward, done, info = env.step(env.action_space.sample()) if i % 5 == 0: env.render() if done: state = env.reset() - ``` -For an example using a DMP wrapped env and asynchronous sampling look at [mp_env_async_sampler.py](./alr_envs/utils/mp_env_async_sampler.py) +For DeepMind Control tasks, we expect the `env_id` to be specified as `domain_name-task_name` or, for manipulation tasks, +as `manipulation-environment_name`. All other environments can be created based on their original name. + +Existing MP tasks can be created the same way as above. Just keep in mind that calling `step()` always executes a full +trajectory. + +```python +import alr_envs + +env = alr_envs.make('HoleReacherDetPMP-v0', seed=1) +# render() can be called once at the beginning with all necessary arguments. To turn it off again, just call render(None). +env.render() + +state = env.reset() + +for i in range(5): + state, reward, done, info = env.step(env.action_space.sample()) + + # Not really necessary, as the environment resets itself after each trajectory anyway. + state = env.reset() +``` + +To show all available environments, we provide some additional convenience variables. Each of them is a dictionary with two +keys, `DMP` and `DetPMP`, that each store a list of available environment names. 
+ +```python +import alr_envs + +print("Custom MP tasks:") +print(alr_envs.ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS) + +print("OpenAI Gym MP tasks:") +print(alr_envs.ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS) + +print("Deepmind Control MP tasks:") +print(alr_envs.ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS) + +print("MetaWorld MP tasks:") +print(alr_envs.ALL_METAWORLD_MOTION_PRIMITIVE_ENVIRONMENTS) +``` + +### How to create a new MP task + +In case a required task is not supported yet in the MP framework, it can be created relatively easily. For the task at +hand, the following interface needs to be implemented. + +```python +import numpy as np +from mp_env_api import MPEnvWrapper + + +class MPWrapper(MPEnvWrapper): + + @property + def active_obs(self): + """ + Returns a boolean mask for each substate in the full observation. + It determines whether the observation is returned for the contextual case or not. + This effectively allows filtering unwanted or unnecessary observations from the full step-based case. + E.g., velocities starting at 0 only change after the first action. Given we only receive the first + observation, the velocities are not necessary in the observation for the MP task. + """ + return np.ones(self.observation_space.shape, dtype=bool) + + @property + def current_vel(self): + """ + Returns the current velocity of the action/control dimension. + The dimensionality has to match the action/control dimension. + This is not required when exclusively using position control, + it should, however, be implemented regardless. + E.g. The joint velocities that are directly or indirectly controlled by the action. + """ + raise NotImplementedError() + + @property + def current_pos(self): + """ + Returns the current position of the action/control dimension. + The dimensionality has to match the action/control dimension. + This is not required when exclusively using velocity control, + it should, however, be implemented regardless. + E.g. 
The joint positions that are directly or indirectly controlled by the action. + """ + raise NotImplementedError() + + @property + def goal_pos(self): + """ + Returns a predefined final position of the action/control dimension. + This is only required for the DMP and is most of the time learned instead. + """ + raise NotImplementedError() + + @property + def dt(self): + """ + Returns the time between two simulated steps of the environment + """ + raise NotImplementedError() + +``` + +If you create a new task wrapper, feel free to open a PR so we can integrate it for others to use as well. +Without the integration, the task can still be used. A rough outline is shown here; for more details, we recommend +having a look at the [examples](alr_envs/examples/). + +```python +import alr_envs + +# Base environment name, according to structure of above example +base_env_id = "ball_in_cup-catch" + +# Replace this wrapper with the custom wrapper for your environment by inheriting from the MPEnvWrapper. +# You can also add other gym.Wrappers in case they are needed, +# e.g. 
gym.wrappers.FlattenObservation for dict observations +wrappers = [alr_envs.dmc.suite.ball_in_cup.MPWrapper] +mp_kwargs = {...} +kwargs = {...} +env = alr_envs.make_dmp_env(base_env_id, wrappers=wrappers, seed=1, mp_kwargs=mp_kwargs, **kwargs) +# OR for a deterministic ProMP (other mp_kwargs are required): +# env = alr_envs.make_detpmp_env(base_env_id, wrappers=wrappers, seed=1, mp_kwargs=mp_kwargs) + +rewards = 0 +obs = env.reset() + +# number of samples/full trajectories (multiple environment steps) +for i in range(5): + ac = env.action_space.sample() + obs, reward, done, info = env.step(ac) + rewards += reward + + if done: + print(base_env_id, rewards) + rewards = 0 + obs = env.reset() +``` diff --git a/alr_envs/__init__.py b/alr_envs/__init__.py index a0d1b43..9fb3ae2 100644 --- a/alr_envs/__init__.py +++ b/alr_envs/__init__.py @@ -8,6 +8,13 @@ from alr_envs.utils.make_env_helpers import make_detpmp_env from alr_envs.utils.make_env_helpers import make from alr_envs.utils.make_env_helpers import make_rank +# Convenience function for all MP environments +ALL_MOTION_PRIMITIVE_ENVIRONMENTS = {"DMP": [], "DetPMP": []} +ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS = {"DMP": [], "DetPMP": []} +ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS = {"DMP": [], "DetPMP": []} +ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS = {"DMP": [], "DetPMP": []} +ALL_METAWORLD_MOTION_PRIMITIVE_ENVIRONMENTS = {"DMP": [], "DetPMP": []} + # Mujoco ## Reacher @@ -197,8 +204,9 @@ register( versions = ["SimpleReacher-v0", "SimpleReacher-v1", "LongSimpleReacher-v0", "LongSimpleReacher-v1"] for v in versions: name = v.split("-") + env_id = f'{name[0]}DMP-{name[1]}' register( - id=f'{name[0]}DMP-{name[1]}', + id=env_id, entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper', # max_episode_steps=1, kwargs={ @@ -215,6 +223,28 @@ for v in versions: } } ) + ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"].append(env_id) + + env_id = f'{name[0]}DetPMP-{name[1]}' + register( + id=env_id, + 
entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', + # max_episode_steps=1, + kwargs={ + "name": f"alr_envs:{v}", + "wrappers": [classic_control.simple_reacher.MPWrapper], + "mp_kwargs": { + "num_dof": 2 if "long" not in v.lower() else 5, + "num_basis": 5, + "duration": 2, + "width": 0.025, + "policy_type": "velocity", + "weights_scale": 0.2, + "zero_start": True + } + } + ) + ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append(env_id) register( id='ViaPointReacherDMP-v0', @@ -234,6 +264,7 @@ register( } } ) +ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"].append("ViaPointReacherDMP-v0") register( id='ViaPointReacherDetPMP-v0', @@ -253,12 +284,14 @@ register( } } ) +ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append("ViaPointReacherDetPMP-v0") ## Hole Reacher versions = ["v0", "v1", "v2"] for v in versions: + env_id = f'HoleReacherDMP-{v}' register( - id=f'HoleReacherDMP-{v}', + id=env_id, entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper', # max_episode_steps=1, kwargs={ @@ -277,9 +310,11 @@ for v in versions: } } ) + ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"].append(env_id) + env_id = f'HoleReacherDetPMP-{v}' register( - id=f'HoleReacherDetPMP-{v}', + id=env_id, entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', kwargs={ "name": f"alr_envs:HoleReacher-{v}", @@ -295,6 +330,7 @@ for v in versions: } } ) + ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append(env_id) ## Deep Mind Control Suite (DMC) ### Suite @@ -305,13 +341,13 @@ register( # max_episode_steps=1, kwargs={ "name": f"ball_in_cup-catch", - "time_limit": 2, - "episode_length": 100, + "time_limit": 20, + "episode_length": 1000, "wrappers": [dmc.suite.ball_in_cup.MPWrapper], "mp_kwargs": { "num_dof": 2, "num_basis": 5, - "duration": 2, + "duration": 20, "learn_goal": True, "alpha_phase": 2, "bandwidth_factor": 2, @@ -324,19 +360,20 @@ register( } } ) +ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"].append("dmc_ball_in_cup-catch_dmp-v0") 
register( id=f'dmc_ball_in_cup-catch_detpmp-v0', entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', kwargs={ "name": f"ball_in_cup-catch", - "time_limit": 2, - "episode_length": 100, + "time_limit": 20, + "episode_length": 1000, "wrappers": [dmc.suite.ball_in_cup.MPWrapper], "mp_kwargs": { "num_dof": 2, "num_basis": 5, - "duration": 2, + "duration": 20, "width": 0.025, "policy_type": "motor", "zero_start": True, @@ -347,21 +384,21 @@ register( } } ) +ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append("dmc_ball_in_cup-catch_detpmp-v0") -# TODO tune episode length for all below register( id=f'dmc_reacher-easy_dmp-v0', entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper', # max_episode_steps=1, kwargs={ "name": f"reacher-easy", - "time_limit": 1, - "episode_length": 50, + "time_limit": 20, + "episode_length": 1000, "wrappers": [dmc.suite.reacher.MPWrapper], "mp_kwargs": { "num_dof": 2, "num_basis": 5, - "duration": 1, + "duration": 20, "learn_goal": True, "alpha_phase": 2, "bandwidth_factor": 2, @@ -375,19 +412,20 @@ register( } } ) +ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"].append("dmc_reacher-easy_dmp-v0") register( id=f'dmc_reacher-easy_detpmp-v0', entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', kwargs={ "name": f"reacher-easy", - "time_limit": 1, - "episode_length": 50, + "time_limit": 20, + "episode_length": 1000, "wrappers": [dmc.suite.reacher.MPWrapper], "mp_kwargs": { "num_dof": 2, "num_basis": 5, - "duration": 1, + "duration": 20, "width": 0.025, "policy_type": "motor", "weights_scale": 0.2, @@ -399,6 +437,7 @@ register( } } ) +ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append("dmc_reacher-easy_detpmp-v0") register( id=f'dmc_reacher-hard_dmp-v0', @@ -406,13 +445,13 @@ register( # max_episode_steps=1, kwargs={ "name": f"reacher-hard", - "time_limit": 1, - "episode_length": 50, + "time_limit": 20, + "episode_length": 1000, "wrappers": [dmc.suite.reacher.MPWrapper], "mp_kwargs": { 
"num_dof": 2, "num_basis": 5, - "duration": 1, + "duration": 20, "learn_goal": True, "alpha_phase": 2, "bandwidth_factor": 2, @@ -426,19 +465,20 @@ register( } } ) +ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"].append("dmc_reacher-hard_dmp-v0") register( id=f'dmc_reacher-hard_detpmp-v0', entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', kwargs={ "name": f"reacher-hard", - "time_limit": 1, - "episode_length": 50, + "time_limit": 20, + "episode_length": 1000, "wrappers": [dmc.suite.reacher.MPWrapper], "mp_kwargs": { "num_dof": 2, "num_basis": 5, - "duration": 1, + "duration": 20, "width": 0.025, "policy_type": "motor", "weights_scale": 0.2, @@ -450,323 +490,67 @@ register( } } ) -register( - id=f'dmc_cartpole-balance_dmp-v0', - entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper', - # max_episode_steps=1, - kwargs={ - "name": f"cartpole-balance", - # "time_limit": 1, - "camera_id": 0, - "episode_length": 1000, - "wrappers": [dmc.suite.cartpole.MPWrapper], - "mp_kwargs": { - "num_dof": 1, - "num_basis": 5, - "duration": 10, - "learn_goal": True, - "alpha_phase": 2, - "bandwidth_factor": 2, - "policy_type": "motor", - "weights_scale": 50, - "goal_scale": 0.1, - "policy_kwargs": { - "p_gains": 10, - "d_gains": 10 - } - } - } -) +ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append("dmc_reacher-hard_detpmp-v0") -register( - id=f'dmc_cartpole-balance_detpmp-v0', - entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', - kwargs={ - "name": f"cartpole-balance", - # "time_limit": 1, - "camera_id": 0, - "episode_length": 1000, - "wrappers": [dmc.suite.cartpole.MPWrapper], - "mp_kwargs": { - "num_dof": 1, - "num_basis": 5, - "duration": 10, - "width": 0.025, - "policy_type": "motor", - "weights_scale": 0.2, - "zero_start": True, - "policy_kwargs": { - "p_gains": 10, - "d_gains": 10 - } - } - } -) -register( - id=f'dmc_cartpole-balance_sparse_dmp-v0', - entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper', 
- # max_episode_steps=1, - kwargs={ - "name": f"cartpole-balance_sparse", - # "time_limit": 1, - "camera_id": 0, - "episode_length": 1000, - "wrappers": [dmc.suite.cartpole.MPWrapper], - "mp_kwargs": { - "num_dof": 1, - "num_basis": 5, - "duration": 10, - "learn_goal": True, - "alpha_phase": 2, - "bandwidth_factor": 2, - "policy_type": "motor", - "weights_scale": 50, - "goal_scale": 0.1, - "policy_kwargs": { - "p_gains": 10, - "d_gains": 10 - } - } - } -) +dmc_cartpole_tasks = ["balance", "balance_sparse", "swingup", "swingup_sparse", "two_poles", "three_poles"] -register( - id=f'dmc_cartpole-balance_sparse_detpmp-v0', - entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', - kwargs={ - "name": f"cartpole-balance_sparse", - # "time_limit": 1, - "camera_id": 0, - "episode_length": 1000, - "wrappers": [dmc.suite.cartpole.MPWrapper], - "mp_kwargs": { - "num_dof": 1, - "num_basis": 5, - "duration": 10, - "width": 0.025, - "policy_type": "motor", - "weights_scale": 0.2, - "zero_start": True, - "policy_kwargs": { - "p_gains": 10, - "d_gains": 10 +for task in dmc_cartpole_tasks: + env_id = f'dmc_cartpole-{task}_dmp-v0' + register( + id=env_id, + entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper', + # max_episode_steps=1, + kwargs={ + "name": f"cartpole-{task}", + # "time_limit": 1, + "camera_id": 0, + "episode_length": 1000, + "wrappers": [dmc.suite.cartpole.MPWrapper], + "mp_kwargs": { + "num_dof": 1, + "num_basis": 5, + "duration": 10, + "learn_goal": True, + "alpha_phase": 2, + "bandwidth_factor": 2, + "policy_type": "motor", + "weights_scale": 50, + "goal_scale": 0.1, + "policy_kwargs": { + "p_gains": 10, + "d_gains": 10 + } } } - } -) + ) + ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"].append(env_id) -register( - id=f'dmc_cartpole-swingup_dmp-v0', - entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper', - # max_episode_steps=1, - kwargs={ - "name": f"cartpole-swingup", - # "time_limit": 1, - "camera_id": 0, - 
"episode_length": 1000, - "wrappers": [dmc.suite.cartpole.MPWrapper], - "mp_kwargs": { - "num_dof": 1, - "num_basis": 5, - "duration": 10, - "learn_goal": True, - "alpha_phase": 2, - "bandwidth_factor": 2, - "policy_type": "motor", - "weights_scale": 50, - "goal_scale": 0.1, - "policy_kwargs": { - "p_gains": 10, - "d_gains": 10 + env_id = f'dmc_cartpole-{task}_detpmp-v0' + register( + id=env_id, + entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', + kwargs={ + "name": f"cartpole-{task}", + # "time_limit": 1, + "camera_id": 0, + "episode_length": 1000, + "wrappers": [dmc.suite.cartpole.MPWrapper], + "mp_kwargs": { + "num_dof": 1, + "num_basis": 5, + "duration": 10, + "width": 0.025, + "policy_type": "motor", + "weights_scale": 0.2, + "zero_start": True, + "policy_kwargs": { + "p_gains": 10, + "d_gains": 10 + } } } - } -) - -register( - id=f'dmc_cartpole-swingup_detpmp-v0', - entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', - kwargs={ - "name": f"cartpole-swingup", - # "time_limit": 1, - "camera_id": 0, - "episode_length": 1000, - "wrappers": [dmc.suite.cartpole.MPWrapper], - "mp_kwargs": { - "num_dof": 1, - "num_basis": 5, - "duration": 10, - "width": 0.025, - "policy_type": "motor", - "weights_scale": 0.2, - "zero_start": True, - "policy_kwargs": { - "p_gains": 10, - "d_gains": 10 - } - } - } -) -register( - id=f'dmc_cartpole-swingup_sparse_dmp-v0', - entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper', - # max_episode_steps=1, - kwargs={ - "name": f"cartpole-swingup_sparse", - # "time_limit": 1, - "camera_id": 0, - "episode_length": 1000, - "wrappers": [dmc.suite.cartpole.MPWrapper], - "mp_kwargs": { - "num_dof": 1, - "num_basis": 5, - "duration": 10, - "learn_goal": True, - "alpha_phase": 2, - "bandwidth_factor": 2, - "policy_type": "motor", - "weights_scale": 50, - "goal_scale": 0.1, - "policy_kwargs": { - "p_gains": 10, - "d_gains": 10 - } - } - } -) - -register( - 
id=f'dmc_cartpole-swingup_sparse_detpmp-v0', - entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', - kwargs={ - "name": f"cartpole-swingup_sparse", - # "time_limit": 1, - "camera_id": 0, - "episode_length": 1000, - "wrappers": [dmc.suite.cartpole.MPWrapper], - "mp_kwargs": { - "num_dof": 1, - "num_basis": 5, - "duration": 10, - "width": 0.025, - "policy_type": "motor", - "weights_scale": 0.2, - "zero_start": True, - "policy_kwargs": { - "p_gains": 10, - "d_gains": 10 - } - } - } -) -register( - id=f'dmc_cartpole-two_poles_dmp-v0', - entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper', - # max_episode_steps=1, - kwargs={ - "name": f"cartpole-two_poles", - # "time_limit": 1, - "camera_id": 0, - "episode_length": 1000, - # "wrappers": [partial(DMCCartpoleMPWrapper, n_poles=2)], - "wrappers": [dmc.suite.cartpole.TwoPolesMPWrapper], - "mp_kwargs": { - "num_dof": 1, - "num_basis": 5, - "duration": 10, - "learn_goal": True, - "alpha_phase": 2, - "bandwidth_factor": 2, - "policy_type": "motor", - "weights_scale": 50, - "goal_scale": 0.1, - "policy_kwargs": { - "p_gains": 10, - "d_gains": 10 - } - } - } -) - -register( - id=f'dmc_cartpole-two_poles_detpmp-v0', - entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', - kwargs={ - "name": f"cartpole-two_poles", - # "time_limit": 1, - "camera_id": 0, - "episode_length": 1000, - # "wrappers": [partial(DMCCartpoleMPWrapper, n_poles=2)], - "wrappers": [dmc.suite.cartpole.TwoPolesMPWrapper], - "mp_kwargs": { - "num_dof": 1, - "num_basis": 5, - "duration": 10, - "width": 0.025, - "policy_type": "motor", - "weights_scale": 0.2, - "zero_start": True, - "policy_kwargs": { - "p_gains": 10, - "d_gains": 10 - } - } - } -) -register( - id=f'dmc_cartpole-three_poles_dmp-v0', - entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper', - # max_episode_steps=1, - kwargs={ - "name": f"cartpole-three_poles", - # "time_limit": 1, - "camera_id": 0, - "episode_length": 1000, - # "wrappers": 
[partial(DMCCartpoleMPWrapper, n_poles=3)], - "wrappers": [dmc.suite.cartpole.ThreePolesMPWrapper], - "mp_kwargs": { - "num_dof": 1, - "num_basis": 5, - "duration": 10, - "learn_goal": True, - "alpha_phase": 2, - "bandwidth_factor": 2, - "policy_type": "motor", - "weights_scale": 50, - "goal_scale": 0.1, - "policy_kwargs": { - "p_gains": 10, - "d_gains": 10 - } - } - } -) - -register( - id=f'dmc_cartpole-three_poles_detpmp-v0', - entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', - kwargs={ - "name": f"cartpole-three_poles", - # "time_limit": 1, - "camera_id": 0, - "episode_length": 1000, - # "wrappers": [partial(DMCCartpoleMPWrapper, n_poles=3)], - "wrappers": [dmc.suite.cartpole.ThreePolesMPWrapper], - "mp_kwargs": { - "num_dof": 1, - "num_basis": 5, - "duration": 10, - "width": 0.025, - "policy_type": "motor", - "weights_scale": 0.2, - "zero_start": True, - "policy_kwargs": { - "p_gains": 10, - "d_gains": 10 - } - } - } -) + ) + ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append(env_id) ### Manipulation @@ -792,6 +576,7 @@ register( } } ) +ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"].append("dmc_manipulation-reach_site_dmp-v0") register( id=f'dmc_manipulation-reach_site_detpmp-v0', @@ -812,6 +597,7 @@ register( } } ) +ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append("dmc_manipulation-reach_site_detpmp-v0") ## Open AI register( @@ -835,6 +621,7 @@ register( } } ) +ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append("ContinuousMountainCarDetPMP-v0") register( id='ReacherDetPMP-v2', @@ -857,6 +644,7 @@ register( } } ) +ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append("ReacherDetPMP-v2") register( id='FetchSlideDenseDetPMP-v1', @@ -875,6 +663,7 @@ register( } } ) +ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append("FetchSlideDenseDetPMP-v1") register( id='FetchSlideDetPMP-v1', @@ -893,6 +682,7 @@ register( } } ) +ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append("FetchSlideDetPMP-v1") register( 
id='FetchReachDenseDetPMP-v1', @@ -911,6 +701,7 @@ register( } } ) +ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append("FetchReachDenseDetPMP-v1") register( id='FetchReachDetPMP-v1', @@ -929,19 +720,21 @@ register( } } ) +ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append("FetchReachDetPMP-v1") # MetaWorld goal_change_envs = ["assembly-v2", "pick-out-of-hole-v2", "plate-slide-v2", "plate-slide-back-v2", ] -for env_id in goal_change_envs: - env_id_split = env_id.split("-") - name = "".join([s.capitalize() for s in env_id_split[:-1]]) +for task in goal_change_envs: + task_id_split = task.split("-") + name = "".join([s.capitalize() for s in task_id_split[:-1]]) + env_id = f'{name}DetPMP-{task_id_split[-1]}' register( - id=f'{name}DetPMP-{env_id_split[-1]}', + id=env_id, entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', kwargs={ - "name": env_id, + "name": task, "wrappers": [meta.goal_change.MPWrapper], "mp_kwargs": { "num_dof": 4, @@ -954,13 +747,15 @@ for env_id in goal_change_envs: } } ) + ALL_METAWORLD_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append(env_id) object_change_envs = ["bin-picking-v2", "hammer-v2", "sweep-into-v2"] -for env_id in object_change_envs: - env_id_split = env_id.split("-") - name = "".join([s.capitalize() for s in env_id_split[:-1]]) +for task in object_change_envs: + task_id_split = task.split("-") + name = "".join([s.capitalize() for s in task_id_split[:-1]]) + env_id = f'{name}DetPMP-{task_id_split[-1]}' register( - id=f'{name}DetPMP-{env_id_split[-1]}', + id=env_id, entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', kwargs={ - "name": env_id, + "name": task, @@ -976,6 +771,7 @@ for env_id in object_change_envs: } } ) + ALL_METAWORLD_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append(env_id) goal_and_object_change_envs = ["box-close-v2", "button-press-v2", "button-press-wall-v2", "button-press-topdown-v2", "button-press-topdown-wall-v2", "coffee-button-v2", "coffee-pull-v2", @@ -988,11 +784,12 @@ 
goal_and_object_change_envs = ["box-close-v2", "button-press-v2", "button-press-
 "soccer-v2", "stick-push-v2", "stick-pull-v2", "push-wall-v2", "reach-wall-v2", "shelf-place-v2", "sweep-v2", "window-open-v2", "window-close-v2" ]
-for env_id in goal_and_object_change_envs:
-    env_id_split = env_id.split("-")
-    name = "".join([s.capitalize() for s in env_id_split[:-1]])
+for task in goal_and_object_change_envs:
+    task_id_split = task.split("-")
+    name = "".join([s.capitalize() for s in task_id_split[:-1]])
+    env_id = f'{name}DetPMP-{task_id_split[-1]}'
     register(
-        id=f'{name}DetPMP-{env_id_split[-1]}',
+        id=env_id,
         entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper',
         kwargs={
             "name": env_id,
@@ -1008,13 +805,15 @@ for env_id in goal_and_object_change_envs:
             }
         }
     )
+    ALL_METAWORLD_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append(env_id)

 goal_and_endeffector_change_envs = ["basketball-v2"]
-for env_id in goal_and_endeffector_change_envs:
-    env_id_split = env_id.split("-")
-    name = "".join([s.capitalize() for s in env_id_split[:-1]])
+for task in goal_and_endeffector_change_envs:
+    task_id_split = task.split("-")
+    name = "".join([s.capitalize() for s in task_id_split[:-1]])
+    env_id = f'{name}DetPMP-{task_id_split[-1]}'
     register(
-        id=f'{name}DetPMP-{env_id_split[-1]}',
+        id=env_id,
         entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper',
         kwargs={
             "name": env_id,
@@ -1030,3 +829,4 @@ for env_id in goal_and_endeffector_change_envs:
             }
         }
     )
+    ALL_METAWORLD_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append(env_id)
diff --git a/alr_envs/classic_control/README.MD b/alr_envs/classic_control/README.MD
new file mode 100644
index 0000000..ebe2101
--- /dev/null
+++ b/alr_envs/classic_control/README.MD
@@ -0,0 +1,21 @@
+### Classic Control
+
+## Step-based Environments
+|Name| Description|Horizon|Action Dimension|Observation Dimension
+|---|---|---|---|---|
+|`SimpleReacher-v0`| Simple reaching task (2 links) without any physics simulation.
Provides no reward until 150 time steps. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory.| 200 | 2 | 9
+|`LongSimpleReacher-v0`| Simple reaching task (5 links) without any physics simulation. Provides no reward until 150 time steps. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory.| 200 | 5 | 18
+|`ViaPointReacher-v0`| Simple reaching task leveraging a via point, which supports self-collision detection. Provides a reward only at time steps 100 and 199 for reaching the via point and goal point, respectively.| 200 | 5 | 18
+|`HoleReacher-v0`| 5-link reaching task where the end-effector needs to reach into a narrow hole without colliding with itself or walls | 200 | 5 | 18
+
+## MP Environments
+|Name| Description|Horizon|Action Dimension|Context Dimension
+|---|---|---|---|---|
+|`ViaPointReacherDMP-v0`| A DMP provides a trajectory for the `ViaPointReacher-v0` task. | 200 | 25
+|`HoleReacherFixedGoalDMP-v0`| A DMP provides a trajectory for the `HoleReacher-v0` task with a fixed goal attractor. | 200 | 25
+|`HoleReacherDMP-v0`| A DMP provides a trajectory for the `HoleReacher-v0` task. The goal attractor needs to be learned. | 200 | 30
+|`ALRBallInACupSimpleDMP-v0`| A DMP provides a trajectory for the `ALRBallInACupSimple-v0` task where only 3 joints are actuated. | 4000 | 15
+|`ALRBallInACupDMP-v0`| A DMP provides a trajectory for the `ALRBallInACup-v0` task. | 4000 | 35
+|`ALRBallInACupGoalDMP-v0`| A DMP provides a trajectory for the `ALRBallInACupGoal-v0` task.
| 4000 | 35 | 3
+
+[//]: |`HoleReacherDetPMP-v0`|
\ No newline at end of file
diff --git a/alr_envs/dmc/README.MD b/alr_envs/dmc/README.MD
index f7d7475..791ee84 100644
--- a/alr_envs/dmc/README.MD
+++ b/alr_envs/dmc/README.MD
@@ -1,3 +1,19 @@
 # DeepMind Control (DMC) Wrappers

-These are the Environment Wrappers for selected [DeepMind Control](https://deepmind.com/research/publications/2020/dm-control-Software-and-Tasks-for-Continuous-Control) environments in order to use our Motion Primitive gym interface with them.
\ No newline at end of file
+These are the Environment Wrappers for selected
+[DeepMind Control](https://deepmind.com/research/publications/2020/dm-control-Software-and-Tasks-for-Continuous-Control)
+environments in order to use our Motion Primitive gym interface with them.
+
+## MP Environments
+
+[//]: <> (These environments are wrapped versions of their DeepMind Control Suite (DMC) counterparts. Given that most tasks can be)
+[//]: <> (solved in shorter horizon lengths than the original 1000 steps, we often shorten the episodes for those tasks.)
+
+|Name| Description|Trajectory Horizon|Action Dimension|Context Dimension
+|---|---|---|---|---|
+|`dmc_ball_in_cup-catch_detpmp-v0`| A DetPMP wrapped version of the "catch" task for the "ball_in_cup" environment. | 1000 | 10 | 2
+|`dmc_ball_in_cup-catch_dmp-v0`| A DMP wrapped version of the "catch" task for the "ball_in_cup" environment. | 1000 | 10 | 2
+|`dmc_reacher-easy_detpmp-v0`| A DetPMP wrapped version of the "easy" task for the "reacher" environment. | 1000 | 10 | 4
+|`dmc_reacher-easy_dmp-v0`| A DMP wrapped version of the "easy" task for the "reacher" environment. | 1000 | 10 | 4
+|`dmc_reacher-hard_detpmp-v0`| A DetPMP wrapped version of the "hard" task for the "reacher" environment. | 1000 | 10 | 4
+|`dmc_reacher-hard_dmp-v0`| A DMP wrapped version of the "hard" task for the "reacher" environment.
| 1000 | 10 | 4
diff --git a/alr_envs/examples/examples_dmc.py b/alr_envs/examples/examples_dmc.py
index 95dd51d..2d310c4 100644
--- a/alr_envs/examples/examples_dmc.py
+++ b/alr_envs/examples/examples_dmc.py
@@ -1,5 +1,4 @@
 import alr_envs
-from alr_envs.dmc.suite.ball_in_cup.mp_wrapper import MPWrapper


 def example_dmc(env_id="fish-swim", seed=1, iterations=1000, render=True):
@@ -62,29 +61,29 @@ def example_custom_dmc_and_mp(seed=1, iterations=1, render=True):

     # Replace this wrapper with the custom wrapper for your environment by inheriting from the MPEnvWrapper.
     # You can also add other gym.Wrappers in case they are needed.
-    wrappers = [MPWrapper]
+    wrappers = [alr_envs.dmc.suite.ball_in_cup.MPWrapper]
     mp_kwargs = {
-        "num_dof": 2,
-        "num_basis": 5,
-        "duration": 20,
-        "learn_goal": True,
+        "num_dof": 2,  # degrees of freedom, a.k.a. the old action space dimensionality
+        "num_basis": 5,  # number of basis functions, the new action space has size num_dof x num_basis
+        "duration": 20,  # length of trajectory in s, number of steps = duration / dt
+        "learn_goal": True,  # learn the goal position (recommended)
         "alpha_phase": 2,
         "bandwidth_factor": 2,
-        "policy_type": "motor",
-        "weights_scale": 50,
-        "goal_scale": 0.1,
-        "policy_kwargs": {
+        "policy_type": "motor",  # controller type: 'velocity', 'position', or 'motor' (torque control)
+        "weights_scale": 1,  # scaling of MP weights
+        "goal_scale": 1,  # scaling of learned goal position
+        "policy_kwargs": {  # only required for torque control/PD-Controller
             "p_gains": 0.2,
             "d_gains": 0.05
         }
     }
     kwargs = {
-        "time_limit": 20,
-        "episode_length": 1000,
+        "time_limit": 20,  # same as duration value but as max horizon for underlying DMC environment
+        "episode_length": 1000,  # corresponding number of episode steps
         # "frame_skip": 1
     }
     env = alr_envs.make_dmp_env(base_env, wrappers=wrappers, seed=seed, mp_kwargs=mp_kwargs, **kwargs)
-    # OR for a deterministic ProMP:
+    # OR for a deterministic ProMP (other mp_kwargs are required, see
examples_metaworld.py):
     # env = alr_envs.make_detpmp_env(base_env, wrappers=wrappers, seed=seed, mp_kwargs=mp_args)

     # This renders the full MP trajectory
diff --git a/alr_envs/examples/examples_metaworld.py b/alr_envs/examples/examples_metaworld.py
index b86b624..e88ed6c 100644
--- a/alr_envs/examples/examples_metaworld.py
+++ b/alr_envs/examples/examples_metaworld.py
@@ -1,5 +1,4 @@
 import alr_envs
-from alr_envs.meta.goal_and_object_change import MPWrapper


 def example_dmc(env_id="fish-swim", seed=1, iterations=1000, render=True):
@@ -65,19 +64,20 @@ def example_custom_dmc_and_mp(seed=1, iterations=1, render=True):

     # Replace this wrapper with the custom wrapper for your environment by inheriting from the MPEnvWrapper.
     # You can also add other gym.Wrappers in case they are needed.
-    wrappers = [MPWrapper]
+    wrappers = [alr_envs.meta.goal_and_object_change.MPWrapper]
     mp_kwargs = {
-        "num_dof": 4,
-        "num_basis": 5,
-        "duration": 6.25,
-        "post_traj_time": 0,
-        "width": 0.025,
-        "zero_start": True,
-        "policy_type": "metaworld",
+        "num_dof": 4,  # degrees of freedom, a.k.a.
the old action space dimensionality
+        "num_basis": 5,  # number of basis functions, the new action space has size num_dof x num_basis
+        "duration": 6.25,  # length of trajectory in s, number of steps = duration / dt
+        "post_traj_time": 0,  # pad trajectory with additional zeros at the end (recommended: 0)
+        "width": 0.025,  # width of the basis functions
+        "zero_start": True,  # start from current environment position if True
+        "weights_scale": 1,  # scaling of MP weights
+        "policy_type": "metaworld",  # custom controller type for metaworld environments
     }
     env = alr_envs.make_detpmp_env(base_env, wrappers=wrappers, seed=seed, mp_kwargs=mp_kwargs)
-    # OR for a DMP:
+    # OR for a DMP (other mp_kwargs are required, see examples_dmc.py):
     # env = alr_envs.make_dmp_env(base_env, wrappers=wrappers, seed=seed, mp_kwargs=mp_kwargs, **kwargs)

     # This renders the full MP trajectory
diff --git a/alr_envs/mujoco/README.MD b/alr_envs/mujoco/README.MD
new file mode 100644
index 0000000..0ea5a1f
--- /dev/null
+++ b/alr_envs/mujoco/README.MD
@@ -0,0 +1,15 @@
+# Custom Mujoco tasks
+
+## Step-based Environments
+|Name| Description|Horizon|Action Dimension|Observation Dimension
+|---|---|---|---|---|
+|`ALRReacher-v0`|Modified (5 links) Mujoco gym's `Reacher-v2` (2 links)| 200 | 5 | 21
+|`ALRReacherSparse-v0`|Same as `ALRReacher-v0`, but the distance penalty is only provided in the last time step.| 200 | 5 | 21
+|`ALRReacherSparseBalanced-v0`|Same as `ALRReacherSparse-v0`, but the end-effector has to remain upright.| 200 | 5 | 21
+|`ALRLongReacher-v0`|Modified (7 links) Mujoco gym's `Reacher-v2` (2 links)| 200 | 7 | 27
+|`ALRLongReacherSparse-v0`|Same as `ALRLongReacher-v0`, but the distance penalty is only provided in the last time step.| 200 | 7 | 27
+|`ALRLongReacherSparseBalanced-v0`|Same as `ALRLongReacherSparse-v0`, but the end-effector has to remain upright.| 200 | 7 | 27
+|`ALRBallInACupSimple-v0`| Ball-in-a-cup task where a robot needs to catch a ball attached to a cup at its
end-effector. | 4000 | 3 | wip
+|`ALRBallInACup-v0`| Ball-in-a-cup task where a robot needs to catch a ball attached to a cup at its end-effector | 4000 | 7 | wip
+|`ALRBallInACupGoal-v0`| Similar to `ALRBallInACupSimple-v0` but the ball needs to be caught at a specified goal position | 4000 | 7 | wip
+
\ No newline at end of file
diff --git a/alr_envs/open_ai/README.MD b/alr_envs/open_ai/README.MD
index 9c30ffe..985c093 100644
--- a/alr_envs/open_ai/README.MD
+++ b/alr_envs/open_ai/README.MD
@@ -1,3 +1,14 @@
 # OpenAI Gym Wrappers

-These are the Environment Wrappers for selected [OpenAI Gym](https://gym.openai.com/) environments in order to use our Motion Primitive gym interface with them.
\ No newline at end of file
+These are the Environment Wrappers for selected [OpenAI Gym](https://gym.openai.com/) environments, so that they
+can be used with the Motion Primitive gym interface.
+
+## MP Environments
+These environments are wrapped versions of their OpenAI gym counterparts.
+
+|Name| Description|Trajectory Horizon|Action Dimension|Context Dimension
+|---|---|---|---|---|
+|`ContinuousMountainCarDetPMP-v0`| A DetPMP wrapped version of the ContinuousMountainCar-v0 environment. | 100 | 1
+|`ReacherDetPMP-v2`| A DetPMP wrapped version of the Reacher-v2 environment. | 50 | 2
+|`FetchSlideDenseDetPMP-v1`| A DetPMP wrapped version of the FetchSlideDense-v1 environment. | 50 | 4
+|`FetchReachDenseDetPMP-v1`| A DetPMP wrapped version of the FetchReachDense-v1 environment. | 50 | 4
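
The IDs in the table above are the same ones this change appends to `ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"]` right after each `register(...)` call. The following is a minimal, self-contained sketch of that bookkeeping pattern, assuming the registry is a plain dict mapping an MP type to a list of IDs. The helper `record_mp_env` is hypothetical and only stands in for the registration step; it does not call gym.

```python
# Stand-in for the registry dict used in the diff: MP type -> list of env IDs.
ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS = {"DMP": [], "DetPMP": []}


def record_mp_env(env_id: str, mp_type: str) -> None:
    """Hypothetical helper: only records the ID under its MP type.

    In the diff, the append follows an actual gym register(...) call,
    which is deliberately left out of this sketch.
    """
    ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS[mp_type].append(env_id)


# IDs copied from the table above.
for env_id in (
    "ContinuousMountainCarDetPMP-v0",
    "ReacherDetPMP-v2",
    "FetchSlideDenseDetPMP-v1",
    "FetchReachDenseDetPMP-v1",
):
    record_mp_env(env_id, "DetPMP")

# The registry can then be used to enumerate all variants of one MP type,
# for example to loop over them in a benchmark script.
detpmp_ids = ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"]
for env_id in detpmp_ids:
    print(env_id)
```

Keeping the IDs in one dict per suite means a script can discover the available MP variants without hard-coding the ID strings.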