Merge remote-tracking branch 'origin/dmc_integration' into dmc_integration

# Conflicts:
#	README.md
#	alr_envs/__init__.py
#	setup.py
commit f5fcbf7f54
ottofabian 2021-07-26 17:13:10 +02:00
16 changed files with 222 additions and 32 deletions


@ -1,12 +1,11 @@
-## ALR Environments
+## ALR Robotics Control Environments
This repository collects custom Robotics environments not included in benchmark suites like OpenAI gym, rllab, etc.
Creating a custom (Mujoco) gym environment can be done according to [this guide](https://github.com/openai/gym/blob/master/docs/creating-environments.md).
For stochastic search problems with gym interface use the `Rosenbrock-v0` reference implementation.
-We also support to solve environments with DMPs. When adding new DMP tasks check the `ViaPointReacherDMP-v0` reference implementation.
-When simply using the tasks, you can also leverage the wrapper class `DmpWrapper` to turn normal gym environments into DMP tasks.
+We also support solving environments with Dynamic Movement Primitives (DMPs) and Probabilistic Movement Primitives (DetPMP; we usually consider only the mean).
-## Environments
+## Step-based Environments
Currently we have the following environments:
### Mujoco
@ -32,11 +31,13 @@ Currently we have the following environments:
|`ViaPointReacher-v0`| Simple reaching task leveraging a via point, which supports self collision detection. Provides a reward only at 100 and 199 for reaching the viapoint and goal point, respectively.| 200 | 5 | 18
|`HoleReacher-v0`| 5 link reaching task where the end-effector needs to reach into a narrow hole without colliding with itself or walls | 200 | 5 | 18
-### DMP Environments
-These environments are closer to stochastic search. They always execute a full trajectory, which is computed by a DMP and executed by a controller, e.g. a PD controller.
-The goal is to learn the parameters of this DMP to generate a suitable trajectory.
-All environments provide the full episode reward and additional information about early terminations, e.g. due to collisions.
+## Motion Primitive Environments (Episodic environments)
+Unlike step-based environments, these motion primitive (MP) environments are closer to stochastic search and to what can be found in robotics. They always execute a full trajectory, which is computed by a Dynamic Movement Primitive (DMP) or Probabilistic Movement Primitive (DetPMP) and translated into individual actions with a controller, e.g. a PD controller. The actual controller, however, depends on the type of environment, i.e. position, velocity, or torque controlled.
+The goal is to learn the parametrization of the motion primitives in order to generate a suitable trajectory.
+This can also be done in a contextual setting, where all changing elements of the task are exposed once at the beginning. This requires finding a new parametrization for each trajectory.
+All environments provide the full cumulative episode reward and additional information about early terminations, e.g. due to collisions.
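As a minimal sketch (not part of this commit), interacting with one of these episodic environments could look as follows. It assumes the `make_env` helper used elsewhere in this commit and that the action space of an MP environment is the parameter space of the primitive:

```python
from alr_envs.utils.make_env_helpers import make_env

# Sketch only: a single step of an MP environment executes one full trajectory.
env = make_env("ViaPointReacherDMP-v0", seed=1)
obs = env.reset()
params = env.action_space.sample()  # assumption: actions are the DMP parameters
obs, episode_reward, done, info = env.step(params)
print(episode_reward)  # full cumulative episode reward
```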
### Classic Control
|Name| Description|Horizon|Action Dimension|Context Dimension
|---|---|---|---|---|
|`ViaPointReacherDMP-v0`| A DMP provides a trajectory for the `ViaPointReacher-v0` task. | 200 | 25
@ -48,6 +49,29 @@ All environments provide the full episode reward and additional information about
[//]: |`HoleReacherDetPMP-v0`|
### OpenAI gym Environments
These environments are wrapped versions of their OpenAI-gym counterparts.
|Name| Description|Trajectory Horizon|Action Dimension|Context Dimension
|---|---|---|---|---|
|`ContinuousMountainCarDetPMP-v0`| A DetPMP wrapped version of the MountainCarContinuous-v0 environment. | 100 | 1
|`ReacherDetPMP-v2`| A DetPMP wrapped version of the Reacher-v2 environment. | 50 | 2
|`FetchSlideDenseDetPMP-v1`| A DetPMP wrapped version of the FetchSlideDense-v1 environment. | 50 | 4
|`FetchReachDenseDetPMP-v1`| A DetPMP wrapped version of the FetchReachDense-v1 environment. | 50 | 4
### Deep Mind Control Suite Environments
These environments are wrapped versions of their Deep Mind Control Suite (DMC) counterparts.
Given that most tasks can be solved with shorter horizons than the original 1000 steps, we often shorten the episodes for those tasks.
|Name| Description|Trajectory Horizon|Action Dimension|Context Dimension
|---|---|---|---|---|
|`dmc_ball_in_cup-catch_detpmp-v0`| A DetPMP wrapped version of the "catch" task for the "ball_in_cup" environment. | 50 | 10 | 2
|`dmc_ball_in_cup-catch_dmp-v0`| A DMP wrapped version of the "catch" task for the "ball_in_cup" environment. | 50 | 10 | 2
|`dmc_reacher-easy_detpmp-v0`| A DetPMP wrapped version of the "easy" task for the "reacher" environment. | 1000 | 10 | 4
|`dmc_reacher-easy_dmp-v0`| A DMP wrapped version of the "easy" task for the "reacher" environment. | 1000 | 10 | 4
|`dmc_reacher-hard_detpmp-v0`| A DetPMP wrapped version of the "hard" task for the "reacher" environment. | 1000 | 10 | 4
|`dmc_reacher-hard_dmp-v0`| A DMP wrapped version of the "hard" task for the "reacher" environment. | 1000 | 10 | 4
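The DMC tasks are created through the same interface as the other MP environments. A hedged usage sketch, assuming the ids from the table above and the `make_env` helper from this commit:

```python
from alr_envs.utils.make_env_helpers import make_env

# Sketch only: the whole (shortened) 50-step "catch" trajectory
# is executed by a single step call.
env = make_env("dmc_ball_in_cup-catch_detpmp-v0", seed=1)
obs = env.reset()
obs, episode_reward, done, info = env.step(env.action_space.sample())
```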
## Install
1. Clone the repository
```bash
```


@ -8,6 +8,7 @@ from alr_envs.dmc.manipulation.reach.reach_mp_wrapper import DMCReachSiteMPWrapper
from alr_envs.dmc.suite.ball_in_cup.ball_in_cup_mp_wrapper import DMCBallInCupMPWrapper
from alr_envs.dmc.suite.cartpole.cartpole_mp_wrapper import DMCCartpoleMPWrapper, DMCCartpoleThreePolesMPWrapper, \
    DMCCartpoleTwoPolesMPWrapper
+from alr_envs.open_ai import reacher_v2, continuous_mountain_car, fetch
from alr_envs.dmc.suite.reacher.reacher_mp_wrapper import DMCReacherMPWrapper

# Mujoco
@ -790,3 +791,80 @@ register(
        }
    }
)
## Open AI
register(
    id='ContinuousMountainCarDetPMP-v0',
    entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper',
    kwargs={
        "name": "gym.envs.classic_control:MountainCarContinuous-v0",
        "wrappers": [continuous_mountain_car.MPWrapper],
        "mp_kwargs": {
            "num_dof": 1,
            "num_basis": 4,
            "duration": 2,
            "post_traj_time": 0,
            "width": 0.02,
            "policy_type": "motor",
            "policy_kwargs": {
                "p_gains": 1.,
                "d_gains": 1.
            }
        }
    }
)
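The `motor` policy type with its `p_gains`/`d_gains` suggests a PD controller tracking the generated trajectory. A minimal illustrative sketch of such a law (the actual controller lives in `mp_env_api` and may differ in detail):

```python
import numpy as np

def pd_torques(des_pos, des_vel, cur_pos, cur_vel, p_gains=1., d_gains=1.):
    # Illustrative PD law only; the gains mirror the "policy_kwargs" above.
    return p_gains * (np.asarray(des_pos) - np.asarray(cur_pos)) \
        + d_gains * (np.asarray(des_vel) - np.asarray(cur_vel))
```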
register(
    id='ReacherDetPMP-v2',
    entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper',
    kwargs={
        "name": "gym.envs.mujoco:Reacher-v2",
        "wrappers": [reacher_v2.MPWrapper],
        "mp_kwargs": {
            "num_dof": 2,
            "num_basis": 6,
            "duration": 1,
            "post_traj_time": 0,
            "width": 0.02,
            "policy_type": "motor",
            "policy_kwargs": {
                "p_gains": .6,
                "d_gains": .075
            }
        }
    }
)
register(
    id='FetchSlideDenseDetPMP-v1',
    entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper',
    kwargs={
        "name": "gym.envs.robotics:FetchSlideDense-v1",
        "wrappers": [fetch.MPWrapper],
        "mp_kwargs": {
            "num_dof": 4,
            "num_basis": 5,
            "duration": 2,
            "post_traj_time": 0,
            "width": 0.02,
            "policy_type": "position"
        }
    }
)
register(
    id='FetchReachDenseDetPMP-v1',
    entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper',
    kwargs={
        "name": "gym.envs.robotics:FetchReachDense-v1",
        "wrappers": [fetch.MPWrapper],
        "mp_kwargs": {
            "num_dof": 4,
            "num_basis": 5,
            "duration": 2,
            "post_traj_time": 0,
            "width": 0.02,
            "policy_type": "position"
        }
    }
)
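Further OpenAI gym tasks would presumably be added following the same pattern. A hypothetical example (id and task chosen for illustration only, not registered by this commit):

```python
register(
    id='FetchPushDenseDetPMP-v1',  # hypothetical id
    entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper',
    kwargs={
        "name": "gym.envs.robotics:FetchPushDense-v1",
        "wrappers": [fetch.MPWrapper],
        "mp_kwargs": {
            "num_dof": 4,
            "num_basis": 5,
            "duration": 2,
            "post_traj_time": 0,
            "width": 0.02,
            "policy_type": "position"
        }
    }
)
```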


@ -0,0 +1,41 @@
from alr_envs.utils.make_env_helpers import make_env


def example_mp(env_name, seed=1):
    """
    Example for running a motion primitive based version of an OpenAI-gym environment, which is already registered.
    For more information on motion primitive specific stuff, look at the mp examples.
    Args:
        env_name: DetPMP env_id
        seed: seed

    Returns:

    """
    # While in this case gym.make() is possible to use as well, we recommend our custom make env function.
    env = make_env(env_name, seed)

    rewards = 0
    obs = env.reset()

    # number of samples/full trajectories (multiple environment steps)
    for i in range(10):
        ac = env.action_space.sample()
        obs, reward, done, info = env.step(ac)
        rewards += reward

        if done:
            print(rewards)
            rewards = 0
            obs = env.reset()


if __name__ == '__main__':
    # DMP - not supported yet
    # example_mp("ReacherDetPMP-v2")

    # DetProMP
    example_mp("ContinuousMountainCarDetPMP-v0")
    example_mp("ReacherDetPMP-v2")
    example_mp("FetchReachDenseDetPMP-v1")
    example_mp("FetchSlideDenseDetPMP-v1")


@ -0,0 +1 @@
from alr_envs.open_ai.continuous_mountain_car.mp_wrapper import MPWrapper


@ -0,0 +1,22 @@
from typing import Union
import numpy as np
from mp_env_api.interface_wrappers.mp_env_wrapper import MPEnvWrapper
class MPWrapper(MPEnvWrapper):
@property
def current_vel(self) -> Union[float, int, np.ndarray]:
return np.array([self.state[1]])
@property
def current_pos(self) -> Union[float, int, np.ndarray]:
return np.array([self.state[0]])
@property
def goal_pos(self):
raise ValueError("Goal position is not available and has to be learnt based on the environment.")
@property
def dt(self) -> Union[float, int]:
return 0.02
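A hedged sketch of what these properties expose, assuming `MPEnvWrapper` behaves like a standard `gym.Wrapper` that can be applied directly (normally the wrapper is applied through `make_detpmp_env_helper`):

```python
import gym
from alr_envs.open_ai.continuous_mountain_car.mp_wrapper import MPWrapper

env = MPWrapper(gym.make("MountainCarContinuous-v0"))
env.reset()
# Car position, car velocity, and the assumed 0.02s control step.
print(env.current_pos, env.current_vel, env.dt)
```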


@ -0,0 +1 @@
from alr_envs.open_ai.fetch.mp_wrapper import MPWrapper


@ -0,0 +1,22 @@
from typing import Union
import numpy as np
from mp_env_api.interface_wrappers.mp_env_wrapper import MPEnvWrapper
class MPWrapper(MPEnvWrapper):
@property
def current_vel(self) -> Union[float, int, np.ndarray]:
return self.unwrapped._get_obs()["observation"][-5:-1]
@property
def current_pos(self) -> Union[float, int, np.ndarray]:
return self.unwrapped._get_obs()["observation"][:4]
@property
def goal_pos(self):
raise ValueError("Goal position is not available and has to be learnt based on the environment.")
@property
def dt(self) -> Union[float, int]:
return self.env.dt


@ -0,0 +1 @@
from alr_envs.open_ai.reacher_v2.mp_wrapper import MPWrapper


@ -0,0 +1,19 @@
from typing import Union
import numpy as np
from mp_env_api.interface_wrappers.mp_env_wrapper import MPEnvWrapper
class MPWrapper(MPEnvWrapper):
@property
def current_vel(self) -> Union[float, int, np.ndarray]:
return self.sim.data.qvel[:2]
@property
def current_pos(self) -> Union[float, int, np.ndarray]:
return self.sim.data.qpos[:2]
@property
def dt(self) -> Union[float, int]:
return self.env.dt


@ -1,10 +0,0 @@
Metadata-Version: 1.0
Name: reacher
Version: 0.0.1
Summary: UNKNOWN
Home-page: UNKNOWN
Author: UNKNOWN
Author-email: UNKNOWN
License: UNKNOWN
Description: UNKNOWN
Platform: UNKNOWN


@ -1,7 +0,0 @@
README.md
setup.py
reacher.egg-info/PKG-INFO
reacher.egg-info/SOURCES.txt
reacher.egg-info/dependency_links.txt
reacher.egg-info/requires.txt
reacher.egg-info/top_level.txt


@ -1 +0,0 @@


@ -1 +0,0 @@
gym


@ -1 +0,0 @@


@ -3,14 +3,15 @@ from setuptools import setup
setup(
    name='alr_envs',
    version='0.0.1',
-    packages=['alr_envs', 'alr_envs.classic_control', 'alr_envs.mujoco', 'alr_envs.stochastic_search',
+    packages=['alr_envs', 'alr_envs.classic_control', 'alr_envs.open_ai', 'alr_envs.mujoco', 'alr_envs.stochastic_search',
               'alr_envs.utils'],
    install_requires=[
        'gym',
        'PyQt5',
        'matplotlib',
-        # 'mp_env_api @ git+ssh://git@github.com/ALRhub/motion_primitive_env_api.git',
-        'mujoco_py'
+        'mp_env_api @ git+ssh://git@github.com/ALRhub/motion_primitive_env_api.git',
+        'mujoco-py<2.1,>=2.0',
+        'dm_control'
    ],
    url='https://github.com/ALRhub/alr_envs/',
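Since `mp_env_api` is now pulled via git+ssh, installing requires SSH access to the ALRhub GitHub organization. A sketch of the editable install (commands assumed, derived from the repository URL above):

```bash
git clone git@github.com:ALRhub/alr_envs.git
cd alr_envs
pip install -e .
```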