diff --git a/README.md b/README.md
index 88808e7..74e3838 100644
--- a/README.md
+++ b/README.md
@@ -1,12 +1,11 @@
-## ALR Environments
+## ALR Robotics Control Environments
 This repository collects custom Robotics environments not included in benchmark suites like OpenAI gym, rllab, etc.
 Creating a custom (Mujoco) gym environment can be done according to [this guide](https://github.com/openai/gym/blob/master/docs/creating-environments.md).
 For stochastic search problems with gym interface use the `Rosenbrock-v0` reference implementation.
-We also support to solve environments with DMPs. When adding new DMP tasks check the `ViaPointReacherDMP-v0` reference implementation.
-When simply using the tasks, you can also leverage the wrapper class `DmpWrapper` to turn normal gym environments in to DMP tasks.
+We also support solving environments with Dynamic Movement Primitives (DMPs) and Probabilistic Movement Primitives (DetPMP; we usually only consider the mean).
 
-## Environments
+## Step-based Environments
 Currently we have the following environments:
 
 ### Mujoco
@@ -32,11 +31,13 @@ Currently we have the following environments:
 |`ViaPointReacher-v0`| Simple reaching task leveraging a via point, which supports self collision detection. Provides a reward only at 100 and 199 for reaching the viapoint and goal point, respectively.| 200 | 5 | 18
 |`HoleReacher-v0`| 5 link reaching task where the end-effector needs to reach into a narrow hole without collding with itself or walls | 200 | 5 | 18
 
-### DMP Environments
-These environments are closer to stochastic search. They always execute a full trajectory, which is computed by a DMP and executed by a controller, e.g. a PD controller.
-The goal is to learn the parameters of this DMP to generate a suitable trajectory.
-All environments provide the full episode reward and additional information about early terminations, e.g. due to collisions.
+## Motion Primitive Environments (Episodic environments)
+Unlike step-based environments, these motion primitive (MP) environments are closer to stochastic search and to the episodic settings typically found in robotics. They always execute a full trajectory, which is computed by a Dynamic Movement Primitive (DMP) or a Probabilistic Movement Primitive (DetPMP) and translated into individual actions by a controller, e.g. a PD controller. The actual controller, however, depends on the type of environment, i.e. whether it is position, velocity, or torque controlled.
+The goal is to learn the parametrization of the motion primitives in order to generate a suitable trajectory.
+This can also be done in a contextual setting, where all changing elements of the task are exposed once at the beginning. This requires finding a new parametrization for each trajectory.
+All environments provide the full cumulative episode reward and additional information about early terminations, e.g. due to collisions.
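+
+A minimal usage sketch (adapted from `alr_envs/examples/examples_open_ai.py` in this repository): a single `env.step()` call consumes one full set of MP parameters, executes the resulting trajectory, and returns the cumulative episode reward.
+```python
+from alr_envs.utils.make_env_helpers import make_env
+
+# Any registered MP environment id works here, e.g. the DetPMP version of MountainCarContinuous-v0.
+env = make_env("ContinuousMountainCarDetPMP-v0", 1)  # second argument is the random seed
+obs = env.reset()
+
+# The "action" is the MP parametrization; one step rolls out the whole episode.
+params = env.action_space.sample()
+obs, episode_reward, done, info = env.step(params)
+```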
+
+### Classic Control
 |Name| Description|Horizon|Action Dimension|Context Dimension
 |---|---|---|---|---|
 |`ViaPointReacherDMP-v0`| A DMP provides a trajectory for the `ViaPointReacher-v0` task. | 200 | 25
@@ -48,6 +49,29 @@ All environments provide the full episode reward and additional information abou
 [//]: |`HoleReacherDetPMP-v0`|
 
+### OpenAI gym Environments
+These environments are wrapped versions of their OpenAI-gym counterparts.
+
+|Name| Description|Trajectory Horizon|Action Dimension|Context Dimension
+|---|---|---|---|---|
+|`ContinuousMountainCarDetPMP-v0`| A DetPMP wrapped version of the MountainCarContinuous-v0 environment. | 100 | 1
+|`ReacherDetPMP-v2`| A DetPMP wrapped version of the Reacher-v2 environment. | 50 | 2
+|`FetchSlideDenseDetPMP-v1`| A DetPMP wrapped version of the FetchSlideDense-v1 environment. | 50 | 4
+|`FetchReachDenseDetPMP-v1`| A DetPMP wrapped version of the FetchReachDense-v1 environment. | 50 | 4
+
+### Deep Mind Control Suite Environments
+These environments are wrapped versions of their Deep Mind Control Suite (DMC) counterparts.
+Given that most tasks can be solved with shorter horizons than the original 1000 steps, we often shorten the episodes of these tasks.
+
+|Name| Description|Trajectory Horizon|Action Dimension|Context Dimension
+|---|---|---|---|---|
+|`dmc_ball_in_cup-catch_detpmp-v0`| A DetPMP wrapped version of the "catch" task for the "ball_in_cup" environment. | 50 | 10 | 2
+|`dmc_ball_in_cup-catch_dmp-v0`| A DMP wrapped version of the "catch" task for the "ball_in_cup" environment. | 50 | 10 | 2
+|`dmc_reacher-easy_detpmp-v0`| A DetPMP wrapped version of the "easy" task for the "reacher" environment. | 1000 | 10 | 4
+|`dmc_reacher-easy_dmp-v0`| A DMP wrapped version of the "easy" task for the "reacher" environment. | 1000 | 10 | 4
+|`dmc_reacher-hard_detpmp-v0`| A DetPMP wrapped version of the "hard" task for the "reacher" environment. | 1000 | 10 | 4
+|`dmc_reacher-hard_dmp-v0`| A DMP wrapped version of the "hard" task for the "reacher" environment. | 1000 | 10 | 4
+
 ## Install
 1. Clone the repository
 ```bash
@@ -78,4 +102,4 @@ for i in range(10000):
 ```
 
-For an example using a DMP wrapped env and asynchronous sampling look at [mp_env_async_sampler.py](./alr_envs/utils/mp_env_async_sampler.py)
\ No newline at end of file
+For an example using a DMP wrapped env and asynchronous sampling look at [mp_env_async_sampler.py](./alr_envs/utils/mp_env_async_sampler.py)
diff --git a/alr_envs/__init__.py b/alr_envs/__init__.py
index c3da16d..cfa6251 100644
--- a/alr_envs/__init__.py
+++ b/alr_envs/__init__.py
@@ -8,6 +8,7 @@ from alr_envs.dmc.manipulation.reach.reach_mp_wrapper import DMCReachSiteMPWrapp
 from alr_envs.dmc.suite.ball_in_cup.ball_in_cup_mp_wrapper import DMCBallInCupMPWrapper
 from alr_envs.dmc.suite.cartpole.cartpole_mp_wrapper import DMCCartpoleMPWrapper, DMCCartpoleThreePolesMPWrapper, \
     DMCCartpoleTwoPolesMPWrapper
+from alr_envs.open_ai import reacher_v2, continuous_mountain_car, fetch
 from alr_envs.dmc.suite.reacher.reacher_mp_wrapper import DMCReacherMPWrapper
 
 # Mujoco
@@ -790,3 +791,80 @@ register(
         }
     }
 )
+
+## Open AI
+register(
+    id='ContinuousMountainCarDetPMP-v0',
+    entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper',
+    kwargs={
+        "name": "gym.envs.classic_control:MountainCarContinuous-v0",
+        "wrappers": [continuous_mountain_car.MPWrapper],
+        "mp_kwargs": {
+            "num_dof": 1,
+            "num_basis": 4,
+            "duration": 2,
+            "post_traj_time": 0,
+            "width": 0.02,
+            "policy_type": "motor",
+            "policy_kwargs": {
+                "p_gains": 1.,
+                "d_gains": 1.
+            }
+        }
+    }
+)
+
+register(
+    id='ReacherDetPMP-v2',
+    entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper',
+    kwargs={
+        "name": "gym.envs.mujoco:Reacher-v2",
+        "wrappers": [reacher_v2.MPWrapper],
+        "mp_kwargs": {
+            "num_dof": 2,
+            "num_basis": 6,
+            "duration": 1,
+            "post_traj_time": 0,
+            "width": 0.02,
+            "policy_type": "motor",
+            "policy_kwargs": {
+                "p_gains": .6,
+                "d_gains": .075
+            }
+        }
+    }
+)
+
+register(
+    id='FetchSlideDenseDetPMP-v1',
+    entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper',
+    kwargs={
+        "name": "gym.envs.robotics:FetchSlideDense-v1",
+        "wrappers": [fetch.MPWrapper],
+        "mp_kwargs": {
+            "num_dof": 4,
+            "num_basis": 5,
+            "duration": 2,
+            "post_traj_time": 0,
+            "width": 0.02,
+            "policy_type": "position"
+        }
+    }
+)
+
+register(
+    id='FetchReachDenseDetPMP-v1',
+    entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper',
+    kwargs={
+        "name": "gym.envs.robotics:FetchReachDense-v1",
+        "wrappers": [fetch.MPWrapper],
+        "mp_kwargs": {
+            "num_dof": 4,
+            "num_basis": 5,
+            "duration": 2,
+            "post_traj_time": 0,
+            "width": 0.02,
+            "policy_type": "position"
+        }
+    }
+)
diff --git a/alr_envs/examples/examples_open_ai.py b/alr_envs/examples/examples_open_ai.py
new file mode 100644
index 0000000..d001bc8
--- /dev/null
+++ b/alr_envs/examples/examples_open_ai.py
@@ -0,0 +1,41 @@
+from alr_envs.utils.make_env_helpers import make_env
+
+
+def example_mp(env_name, seed=1):
+    """
+    Example for running a motion primitive based version of an OpenAI-gym environment, which is already registered.
+    For more information on motion primitive specific stuff, look at the mp examples.
+    Args:
+        env_name: DetPMP env_id
+        seed: seed
+
+    Returns:
+
+    """
+    # While gym.make() would also work in this case, we recommend our custom make_env function.
+    env = make_env(env_name, seed)
+
+    rewards = 0
+    obs = env.reset()
+
+    # number of samples/full trajectories (multiple environment steps)
+    for i in range(10):
+        ac = env.action_space.sample()
+        obs, reward, done, info = env.step(ac)
+        rewards += reward
+
+        if done:
+            print(rewards)
+            rewards = 0
+            obs = env.reset()
+
+if __name__ == '__main__':
+    # DMP - not supported yet
+    #example_mp("ReacherDetPMP-v2")
+
+    # DetProMP
+    example_mp("ContinuousMountainCarDetPMP-v0")
+    example_mp("ReacherDetPMP-v2")
+    example_mp("FetchReachDenseDetPMP-v1")
+    example_mp("FetchSlideDenseDetPMP-v1")
+
diff --git a/alr_envs/open_ai/__init__.py b/alr_envs/open_ai/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/alr_envs/open_ai/continuous_mountain_car/__init__.py b/alr_envs/open_ai/continuous_mountain_car/__init__.py
new file mode 100644
index 0000000..36f731d
--- /dev/null
+++ b/alr_envs/open_ai/continuous_mountain_car/__init__.py
@@ -0,0 +1 @@
+from alr_envs.open_ai.continuous_mountain_car.mp_wrapper import MPWrapper
\ No newline at end of file
diff --git a/alr_envs/open_ai/continuous_mountain_car/mp_wrapper.py b/alr_envs/open_ai/continuous_mountain_car/mp_wrapper.py
new file mode 100644
index 0000000..29378ed
--- /dev/null
+++ b/alr_envs/open_ai/continuous_mountain_car/mp_wrapper.py
@@ -0,0 +1,22 @@
+from typing import Union
+
+import numpy as np
+from mp_env_api.interface_wrappers.mp_env_wrapper import MPEnvWrapper
+
+
+class MPWrapper(MPEnvWrapper):
+    @property
+    def current_vel(self) -> Union[float, int, np.ndarray]:
+        return np.array([self.state[1]])
+
+    @property
+    def current_pos(self) -> Union[float, int, np.ndarray]:
+        return np.array([self.state[0]])
+
+    @property
+    def goal_pos(self):
+        raise ValueError("Goal position is not available and has to be learnt based on the environment.")
+
+    @property
+    def dt(self) -> Union[float, int]:
+        return 0.02
\ No newline at end of file
diff --git a/alr_envs/open_ai/fetch/__init__.py b/alr_envs/open_ai/fetch/__init__.py
new file mode 100644
index 0000000..2e68176
--- /dev/null
+++ b/alr_envs/open_ai/fetch/__init__.py
@@ -0,0 +1 @@
+from alr_envs.open_ai.fetch.mp_wrapper import MPWrapper
\ No newline at end of file
diff --git a/alr_envs/open_ai/fetch/mp_wrapper.py b/alr_envs/open_ai/fetch/mp_wrapper.py
new file mode 100644
index 0000000..6602a18
--- /dev/null
+++ b/alr_envs/open_ai/fetch/mp_wrapper.py
@@ -0,0 +1,22 @@
+from typing import Union
+
+import numpy as np
+from mp_env_api.interface_wrappers.mp_env_wrapper import MPEnvWrapper
+
+
+class MPWrapper(MPEnvWrapper):
+    @property
+    def current_vel(self) -> Union[float, int, np.ndarray]:
+        return self.unwrapped._get_obs()["observation"][-5:-1]
+
+    @property
+    def current_pos(self) -> Union[float, int, np.ndarray]:
+        return self.unwrapped._get_obs()["observation"][:4]
+
+    @property
+    def goal_pos(self):
+        raise ValueError("Goal position is not available and has to be learnt based on the environment.")
+
+    @property
+    def dt(self) -> Union[float, int]:
+        return self.env.dt
\ No newline at end of file
diff --git a/alr_envs/open_ai/reacher_v2/__init__.py b/alr_envs/open_ai/reacher_v2/__init__.py
new file mode 100644
index 0000000..48a5615
--- /dev/null
+++ b/alr_envs/open_ai/reacher_v2/__init__.py
@@ -0,0 +1 @@
+from alr_envs.open_ai.reacher_v2.mp_wrapper import MPWrapper
\ No newline at end of file
diff --git a/alr_envs/open_ai/reacher_v2/mp_wrapper.py b/alr_envs/open_ai/reacher_v2/mp_wrapper.py
new file mode 100644
index 0000000..d3181b5
--- /dev/null
+++ b/alr_envs/open_ai/reacher_v2/mp_wrapper.py
@@ -0,0 +1,19 @@
+from typing import Union
+
+import numpy as np
+from mp_env_api.interface_wrappers.mp_env_wrapper import MPEnvWrapper
+
+
+class MPWrapper(MPEnvWrapper):
+
+    @property
+    def current_vel(self) -> Union[float, int, np.ndarray]:
+        return self.sim.data.qvel[:2]
+
+    @property
+    def current_pos(self) -> Union[float, int, np.ndarray]:
+        return self.sim.data.qpos[:2]
+
+    @property
+    def dt(self) -> Union[float, int]:
+        return self.env.dt
\ No newline at end of file
diff --git a/reacher.egg-info/PKG-INFO b/reacher.egg-info/PKG-INFO
deleted file mode 100644
index 9ea9f7e..0000000
--- a/reacher.egg-info/PKG-INFO
+++ /dev/null
@@ -1,10 +0,0 @@
-Metadata-Version: 1.0
-Name: reacher
-Version: 0.0.1
-Summary: UNKNOWN
-Home-page: UNKNOWN
-Author: UNKNOWN
-Author-email: UNKNOWN
-License: UNKNOWN
-Description: UNKNOWN
-Platform: UNKNOWN
diff --git a/reacher.egg-info/SOURCES.txt b/reacher.egg-info/SOURCES.txt
deleted file mode 100644
index b771181..0000000
--- a/reacher.egg-info/SOURCES.txt
+++ /dev/null
@@ -1,7 +0,0 @@
-README.md
-setup.py
-reacher.egg-info/PKG-INFO
-reacher.egg-info/SOURCES.txt
-reacher.egg-info/dependency_links.txt
-reacher.egg-info/requires.txt
-reacher.egg-info/top_level.txt
\ No newline at end of file
diff --git a/reacher.egg-info/dependency_links.txt b/reacher.egg-info/dependency_links.txt
deleted file mode 100644
index 8b13789..0000000
--- a/reacher.egg-info/dependency_links.txt
+++ /dev/null
@@ -1 +0,0 @@
-
diff --git a/reacher.egg-info/requires.txt b/reacher.egg-info/requires.txt
deleted file mode 100644
index 1e6c2dd..0000000
--- a/reacher.egg-info/requires.txt
+++ /dev/null
@@ -1 +0,0 @@
-gym
diff --git a/reacher.egg-info/top_level.txt b/reacher.egg-info/top_level.txt
deleted file mode 100644
index 8b13789..0000000
--- a/reacher.egg-info/top_level.txt
+++ /dev/null
@@ -1 +0,0 @@
-
diff --git a/setup.py b/setup.py
index 6c3c658..189be19 100644
--- a/setup.py
+++ b/setup.py
@@ -3,14 +3,15 @@ from setuptools import setup
 setup(
     name='alr_envs',
     version='0.0.1',
-    packages=['alr_envs', 'alr_envs.classic_control', 'alr_envs.mujoco', 'alr_envs.stochastic_search',
+    packages=['alr_envs', 'alr_envs.classic_control', 'alr_envs.open_ai', 'alr_envs.mujoco', 'alr_envs.stochastic_search',
               'alr_envs.utils'],
     install_requires=[
         'gym',
         'PyQt5',
         'matplotlib',
-        # 'mp_env_api @ git+ssh://git@github.com/ALRhub/motion_primitive_env_api.git',
-        'mujoco_py'
+        'mp_env_api @ git+ssh://git@github.com/ALRhub/motion_primitive_env_api.git',
+        'mujoco-py<2.1,>=2.0',
+        'dm_control'
     ],
     url='https://github.com/ALRhub/alr_envs/',