Merge remote-tracking branch 'origin/dmc_integration' into dmc_integration

# Conflicts:
#	README.md
#	alr_envs/__init__.py
#	setup.py
commit f5fcbf7f54
ottofabian 2021-07-26 17:13:10 +02:00
16 changed files with 222 additions and 32 deletions


@ -1,12 +1,11 @@
-## ALR Environments
+## ALR Robotics Control Environments
This repository collects custom Robotics environments not included in benchmark suites like OpenAI gym, rllab, etc.
Creating a custom (Mujoco) gym environment can be done according to [this guide](https://github.com/openai/gym/blob/master/docs/creating-environments.md).
For stochastic search problems with gym interface use the `Rosenbrock-v0` reference implementation.
-We also support to solve environments with DMPs. When adding new DMP tasks check the `ViaPointReacherDMP-v0` reference implementation.
-When simply using the tasks, you can also leverage the wrapper class `DmpWrapper` to turn normal gym environments into DMP tasks.
+We also support solving environments with Dynamic Movement Primitives (DMPs) and Probabilistic Movement Primitives (DetPMP; we usually consider only the mean).
-## Environments
+## Step-based Environments
Currently we have the following environments:
### Mujoco
@ -32,11 +31,13 @@ Currently we have the following environments:
|`ViaPointReacher-v0`| Simple reaching task leveraging a via point, which supports self collision detection. Provides a reward only at 100 and 199 for reaching the viapoint and goal point, respectively.| 200 | 5 | 18
|`HoleReacher-v0`| 5 link reaching task where the end-effector needs to reach into a narrow hole without colliding with itself or walls | 200 | 5 | 18
-### DMP Environments
-These environments are closer to stochastic search. They always execute a full trajectory, which is computed by a DMP and executed by a controller, e.g. a PD controller.
-The goal is to learn the parameters of this DMP to generate a suitable trajectory.
-All environments provide the full episode reward and additional information about early terminations, e.g. due to collisions.
+## Motion Primitive Environments (Episodic environments)
+Unlike step-based environments, these motion primitive (MP) environments are closer to stochastic search and to what can be found in robotics. They always execute a full trajectory, which is computed by a Dynamic Movement Primitive (DMP) or Probabilistic Movement Primitive (DetPMP) and translated into individual actions with a controller, e.g. a PD controller. The actual controller, however, depends on the type of environment, i.e. position, velocity, or torque controlled.
+The goal is to learn the parametrization of the motion primitives in order to generate a suitable trajectory.
+This can also be done in a contextual setting, where all changing elements of the task are exposed once at the beginning. This requires finding a new parametrization for each trajectory.
+All environments provide the full cumulative episode reward and additional information about early terminations, e.g. due to collisions.
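As a minimal sketch (not part of this commit), interacting with one of these episodic environments could look as follows. It assumes the `make_env` helper used elsewhere in this commit and that the action space of an MP environment is the parameter space of the primitive:

```python
from alr_envs.utils.make_env_helpers import make_env

# Sketch only: a single step of an MP environment executes one full trajectory.
env = make_env("ViaPointReacherDMP-v0", seed=1)
obs = env.reset()
params = env.action_space.sample()  # assumption: actions are the DMP parameters
obs, episode_reward, done, info = env.step(params)
print(episode_reward)  # full cumulative episode reward
```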
### Classic Control
|Name| Description|Horizon|Action Dimension|Context Dimension
|---|---|---|---|---|
|`ViaPointReacherDMP-v0`| A DMP provides a trajectory for the `ViaPointReacher-v0` task. | 200 | 25
@ -48,6 +49,29 @@ All environments provide the full episode reward and additional information about
[//]: |`HoleReacherDetPMP-v0`|
### OpenAI gym Environments
These environments are wrapped versions of their OpenAI-gym counterparts.
|Name| Description|Trajectory Horizon|Action Dimension|Context Dimension
|---|---|---|---|---|
|`ContinuousMountainCarDetPMP-v0`| A DetPMP wrapped version of the MountainCarContinuous-v0 environment. | 100 | 1
|`ReacherDetPMP-v2`| A DetPMP wrapped version of the Reacher-v2 environment. | 50 | 2
|`FetchSlideDenseDetPMP-v1`| A DetPMP wrapped version of the FetchSlideDense-v1 environment. | 50 | 4
|`FetchReachDenseDetPMP-v1`| A DetPMP wrapped version of the FetchReachDense-v1 environment. | 50 | 4
### Deep Mind Control Suite Environments
These environments are wrapped versions of their Deep Mind Control Suite (DMC) counterparts.
Given that most tasks can be solved with shorter horizons than the original 1000 steps, we often shorten the episodes for those tasks.
|Name| Description|Trajectory Horizon|Action Dimension|Context Dimension
|---|---|---|---|---|
|`dmc_ball_in_cup-catch_detpmp-v0`| A DetPMP wrapped version of the "catch" task for the "ball_in_cup" environment. | 50 | 10 | 2
|`dmc_ball_in_cup-catch_dmp-v0`| A DMP wrapped version of the "catch" task for the "ball_in_cup" environment. | 50 | 10 | 2
|`dmc_reacher-easy_detpmp-v0`| A DetPMP wrapped version of the "easy" task for the "reacher" environment. | 1000 | 10 | 4
|`dmc_reacher-easy_dmp-v0`| A DMP wrapped version of the "easy" task for the "reacher" environment. | 1000 | 10 | 4
|`dmc_reacher-hard_detpmp-v0`| A DetPMP wrapped version of the "hard" task for the "reacher" environment. | 1000 | 10 | 4
|`dmc_reacher-hard_dmp-v0`| A DMP wrapped version of the "hard" task for the "reacher" environment. | 1000 | 10 | 4
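The DMC tasks are created through the same interface as the other MP environments. A hedged usage sketch, assuming the ids from the table above and the `make_env` helper from this commit:

```python
from alr_envs.utils.make_env_helpers import make_env

# Sketch only: the whole (shortened) 50-step "catch" trajectory
# is executed by a single step call.
env = make_env("dmc_ball_in_cup-catch_detpmp-v0", seed=1)
obs = env.reset()
obs, episode_reward, done, info = env.step(env.action_space.sample())
```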
## Install
1. Clone the repository
```bash
```


@ -8,6 +8,7 @@ from alr_envs.dmc.manipulation.reach.reach_mp_wrapper import DMCReachSiteMPWrapper
from alr_envs.dmc.suite.ball_in_cup.ball_in_cup_mp_wrapper import DMCBallInCupMPWrapper
from alr_envs.dmc.suite.cartpole.cartpole_mp_wrapper import DMCCartpoleMPWrapper, DMCCartpoleThreePolesMPWrapper, \
    DMCCartpoleTwoPolesMPWrapper
+from alr_envs.open_ai import reacher_v2, continuous_mountain_car, fetch
from alr_envs.dmc.suite.reacher.reacher_mp_wrapper import DMCReacherMPWrapper

# Mujoco
@ -790,3 +791,80 @@ register(
        }
    }
)
## Open AI
register(
    id='ContinuousMountainCarDetPMP-v0',
    entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper',
    kwargs={
        "name": "gym.envs.classic_control:MountainCarContinuous-v0",
        "wrappers": [continuous_mountain_car.MPWrapper],
        "mp_kwargs": {
            "num_dof": 1,
            "num_basis": 4,
            "duration": 2,
            "post_traj_time": 0,
            "width": 0.02,
            "policy_type": "motor",
            "policy_kwargs": {
                "p_gains": 1.,
                "d_gains": 1.
            }
        }
    }
)
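The `motor` policy type with its `p_gains`/`d_gains` suggests a PD controller tracking the generated trajectory. A minimal illustrative sketch of such a law (the actual controller lives in `mp_env_api` and may differ in detail):

```python
import numpy as np

def pd_torques(des_pos, des_vel, cur_pos, cur_vel, p_gains=1., d_gains=1.):
    # Illustrative PD law only; the gains mirror the "policy_kwargs" above.
    return p_gains * (np.asarray(des_pos) - np.asarray(cur_pos)) \
        + d_gains * (np.asarray(des_vel) - np.asarray(cur_vel))
```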
register(
    id='ReacherDetPMP-v2',
    entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper',
    kwargs={
        "name": "gym.envs.mujoco:Reacher-v2",
        "wrappers": [reacher_v2.MPWrapper],
        "mp_kwargs": {
            "num_dof": 2,
            "num_basis": 6,
            "duration": 1,
            "post_traj_time": 0,
            "width": 0.02,
            "policy_type": "motor",
            "policy_kwargs": {
                "p_gains": .6,
                "d_gains": .075
            }
        }
    }
)
register(
    id='FetchSlideDenseDetPMP-v1',
    entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper',
    kwargs={
        "name": "gym.envs.robotics:FetchSlideDense-v1",
        "wrappers": [fetch.MPWrapper],
        "mp_kwargs": {
            "num_dof": 4,
            "num_basis": 5,
            "duration": 2,
            "post_traj_time": 0,
            "width": 0.02,
            "policy_type": "position"
        }
    }
)
register(
    id='FetchReachDenseDetPMP-v1',
    entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper',
    kwargs={
        "name": "gym.envs.robotics:FetchReachDense-v1",
        "wrappers": [fetch.MPWrapper],
        "mp_kwargs": {
            "num_dof": 4,
            "num_basis": 5,
            "duration": 2,
            "post_traj_time": 0,
            "width": 0.02,
            "policy_type": "position"
        }
    }
)
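Further OpenAI gym tasks would presumably be added following the same pattern. A hypothetical example (id and task chosen for illustration only, not registered by this commit):

```python
register(
    id='FetchPushDenseDetPMP-v1',  # hypothetical id
    entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper',
    kwargs={
        "name": "gym.envs.robotics:FetchPushDense-v1",
        "wrappers": [fetch.MPWrapper],
        "mp_kwargs": {
            "num_dof": 4,
            "num_basis": 5,
            "duration": 2,
            "post_traj_time": 0,
            "width": 0.02,
            "policy_type": "position"
        }
    }
)
```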


@ -0,0 +1,41 @@
from alr_envs.utils.make_env_helpers import make_env


def example_mp(env_name, seed=1):
    """
    Example for running a motion primitive based version of an OpenAI-gym environment, which is already registered.
    For more information on motion primitive specific stuff, look at the mp examples.
    Args:
        env_name: DetPMP env_id
        seed: seed

    Returns:

    """
    # While in this case gym.make() is possible to use as well, we recommend our custom make env function.
    env = make_env(env_name, seed)

    rewards = 0
    obs = env.reset()

    # number of samples/full trajectories (multiple environment steps)
    for i in range(10):
        ac = env.action_space.sample()
        obs, reward, done, info = env.step(ac)
        rewards += reward

        if done:
            print(rewards)
            rewards = 0
            obs = env.reset()


if __name__ == '__main__':
    # DMP - not supported yet
    # example_mp("ReacherDetPMP-v2")

    # DetProMP
    example_mp("ContinuousMountainCarDetPMP-v0")
    example_mp("ReacherDetPMP-v2")
    example_mp("FetchReachDenseDetPMP-v1")
    example_mp("FetchSlideDenseDetPMP-v1")


@ -0,0 +1 @@
from alr_envs.open_ai.continuous_mountain_car.mp_wrapper import MPWrapper


@ -0,0 +1,22 @@
from typing import Union
import numpy as np
from mp_env_api.interface_wrappers.mp_env_wrapper import MPEnvWrapper
class MPWrapper(MPEnvWrapper):
@property
def current_vel(self) -> Union[float, int, np.ndarray]:
return np.array([self.state[1]])
@property
def current_pos(self) -> Union[float, int, np.ndarray]:
return np.array([self.state[0]])
@property
def goal_pos(self):
raise ValueError("Goal position is not available and has to be learnt based on the environment.")
@property
def dt(self) -> Union[float, int]:
return 0.02
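A hedged sketch of what these properties expose, assuming `MPEnvWrapper` behaves like a standard `gym.Wrapper` that can be applied directly (normally the wrapper is applied through `make_detpmp_env_helper`):

```python
import gym
from alr_envs.open_ai.continuous_mountain_car.mp_wrapper import MPWrapper

env = MPWrapper(gym.make("MountainCarContinuous-v0"))
env.reset()
# Car position, car velocity, and the assumed 0.02s control step.
print(env.current_pos, env.current_vel, env.dt)
```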


@ -0,0 +1 @@
from alr_envs.open_ai.fetch.mp_wrapper import MPWrapper


@ -0,0 +1,22 @@
from typing import Union
import numpy as np
from mp_env_api.interface_wrappers.mp_env_wrapper import MPEnvWrapper
class MPWrapper(MPEnvWrapper):
@property
def current_vel(self) -> Union[float, int, np.ndarray]:
return self.unwrapped._get_obs()["observation"][-5:-1]
@property
def current_pos(self) -> Union[float, int, np.ndarray]:
return self.unwrapped._get_obs()["observation"][:4]
@property
def goal_pos(self):
raise ValueError("Goal position is not available and has to be learnt based on the environment.")
@property
def dt(self) -> Union[float, int]:
return self.env.dt


@ -0,0 +1 @@
from alr_envs.open_ai.reacher_v2.mp_wrapper import MPWrapper


@ -0,0 +1,19 @@
from typing import Union
import numpy as np
from mp_env_api.interface_wrappers.mp_env_wrapper import MPEnvWrapper
class MPWrapper(MPEnvWrapper):
@property
def current_vel(self) -> Union[float, int, np.ndarray]:
return self.sim.data.qvel[:2]
@property
def current_pos(self) -> Union[float, int, np.ndarray]:
return self.sim.data.qpos[:2]
@property
def dt(self) -> Union[float, int]:
return self.env.dt


@ -1,10 +0,0 @@
Metadata-Version: 1.0
Name: reacher
Version: 0.0.1
Summary: UNKNOWN
Home-page: UNKNOWN
Author: UNKNOWN
Author-email: UNKNOWN
License: UNKNOWN
Description: UNKNOWN
Platform: UNKNOWN


@ -1,7 +0,0 @@
README.md
setup.py
reacher.egg-info/PKG-INFO
reacher.egg-info/SOURCES.txt
reacher.egg-info/dependency_links.txt
reacher.egg-info/requires.txt
reacher.egg-info/top_level.txt


@ -1 +0,0 @@


@ -1 +0,0 @@
gym


@ -1 +0,0 @@


@ -3,14 +3,15 @@ from setuptools import setup
setup(
    name='alr_envs',
    version='0.0.1',
-    packages=['alr_envs', 'alr_envs.classic_control', 'alr_envs.mujoco', 'alr_envs.stochastic_search',
+    packages=['alr_envs', 'alr_envs.classic_control', 'alr_envs.open_ai', 'alr_envs.mujoco', 'alr_envs.stochastic_search',
               'alr_envs.utils'],
    install_requires=[
        'gym',
        'PyQt5',
        'matplotlib',
-        # 'mp_env_api @ git+ssh://git@github.com/ALRhub/motion_primitive_env_api.git',
-        'mujoco_py'
+        'mp_env_api @ git+ssh://git@github.com/ALRhub/motion_primitive_env_api.git',
+        'mujoco-py<2.1,>=2.0',
+        'dm_control'
    ],
    url='https://github.com/ALRhub/alr_envs/',
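Since `mp_env_api` is now pulled via git+ssh, installing requires SSH access to the ALRhub GitHub organization. A sketch of the editable install (commands assumed, derived from the repository URL above):

```bash
git clone git@github.com:ALRhub/alr_envs.git
cd alr_envs
pip install -e .
```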