diff --git a/README.md b/README.md index 74e3838..860bddc 100644 --- a/README.md +++ b/README.md @@ -1,105 +1,212 @@ ## ALR Robotics Control Environments - -This repository collects custom Robotics environments not included in benchmark suites like OpenAI gym, rllab, etc. -Creating a custom (Mujoco) gym environment can be done according to [this guide](https://github.com/openai/gym/blob/master/docs/creating-environments.md). -For stochastic search problems with gym interface use the `Rosenbrock-v0` reference implementation. -We also support to solve environments with Dynamic Movement Primitives (DMPs) and Probabilistic Movement Primitives (DetPMP, we only consider the mean usually). -## Step-based Environments -Currently we have the following environments: - -### Mujoco - -|Name| Description|Horizon|Action Dimension|Observation Dimension -|---|---|---|---|---| -|`ALRReacher-v0`|Modified (5 links) Mujoco gym's `Reacher-v2` (2 links)| 200 | 5 | 21 -|`ALRReacherSparse-v0`|Same as `ALRReacher-v0`, but the distance penalty is only provided in the last time step.| 200 | 5 | 21 -|`ALRReacherSparseBalanced-v0`|Same as `ALRReacherSparse-v0`, but the end-effector has to remain upright.| 200 | 5 | 21 -|`ALRLongReacher-v0`|Modified (7 links) Mujoco gym's `Reacher-v2` (2 links)| 200 | 7 | 27 -|`ALRLongReacherSparse-v0`|Same as `ALRLongReacher-v0`, but the distance penalty is only provided in the last time step.| 200 | 7 | 27 -|`ALRLongReacherSparseBalanced-v0`|Same as `ALRLongReacherSparse-v0`, but the end-effector has to remain upright.| 200 | 7 | 27 -|`ALRBallInACupSimple-v0`| Ball-in-a-cup task where a robot needs to catch a ball attached to a cup at its end-effector. | 4000 | 3 | wip -|`ALRBallInACup-v0`| Ball-in-a-cup task where a robot needs to catch a ball attached to a cup at its end-effector | 4000 | 7 | wip -|`ALRBallInACupGoal-v0`| Similiar to `ALRBallInACupSimple-v0` but the ball needs to be caught at a specified goal position | 4000 | 7 | wip - -### Classic Control - -|Name| Description|Horizon|Action Dimension|Observation Dimension -|---|---|---|---|---| -|`SimpleReacher-v0`| Simple reaching task (2 links) without any physics simulation. Provides no reward until 150 time steps. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory.| 200 | 2 | 9 -|`LongSimpleReacher-v0`| Simple reaching task (5 links) without any physics simulation. Provides no reward until 150 time steps. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory.| 200 | 5 | 18 -|`ViaPointReacher-v0`| Simple reaching task leveraging a via point, which supports self collision detection. Provides a reward only at 100 and 199 for reaching the viapoint and goal point, respectively.| 200 | 5 | 18 -|`HoleReacher-v0`| 5 link reaching task where the end-effector needs to reach into a narrow hole without collding with itself or walls | 200 | 5 | 18 +This project offers a large verity of reinforcement learning environments under a unifying interface base on OpenAI gym. +Besides, some custom environments we also provide support for the benchmark suites +[OpenAI gym](https://gym.openai.com/), +[DeepMind Control](https://deepmind.com/research/publications/2020/dm-control-Software-and-Tasks-for-Continuous-Control) +(DMC), and [Metaworld](https://meta-world.github.io/). Custom (Mujoco) gym environment can be created according +to [this guide](https://github.com/openai/gym/blob/master/docs/creating-environments.md). Unlike existing libraries, we +further support to control agents with Dynamic Movement Primitives (DMPs) and Probabilistic Movement Primitives (DetPMP, +we only consider the mean usually). ## Motion Primitive Environments (Episodic environments) -Unlike step-based environments, these motion primitive (MP) environments are closer to stochastic search and what can be found in robotics. They always execute a full trajectory, which is computed by a Dynamic Motion Primitive (DMP) or Probabilitic Motion Primitive (DetPMP) and translated into individual actions with a controller, e.g. a PD controller. The actual Controller, however, depends on the type of environment, i.e. position, velocity, or torque controlled. -The goal is to learn the parametrization of the motion primitives in order to generate a suitable trajectory. -MP This can also be done in a contextual setting, where all changing elements of the task are exposed once in the beginning. This requires to find a new parametrization for each trajectory. -All environments provide the full cumulative episode reward and additional information about early terminations, e.g. due to collisions. -### Classic Control -|Name| Description|Horizon|Action Dimension|Context Dimension -|---|---|---|---|---| -|`ViaPointReacherDMP-v0`| A DMP provides a trajectory for the `ViaPointReacher-v0` task. | 200 | 25 -|`HoleReacherFixedGoalDMP-v0`| A DMP provides a trajectory for the `HoleReacher-v0` task with a fixed goal attractor. | 200 | 25 -|`HoleReacherDMP-v0`| A DMP provides a trajectory for the `HoleReacher-v0` task. The goal attractor needs to be learned. | 200 | 30 -|`ALRBallInACupSimpleDMP-v0`| A DMP provides a trajectory for the `ALRBallInACupSimple-v0` task where only 3 joints are actuated. | 4000 | 15 -|`ALRBallInACupDMP-v0`| A DMP provides a trajectory for the `ALRBallInACup-v0` task. | 4000 | 35 -|`ALRBallInACupGoalDMP-v0`| A DMP provides a trajectory for the `ALRBallInACupGoal-v0` task. | 4000 | 35 | 3 +Unlike step-based environments, motion primitive (MP) environments are closer related to stochastic search, black box +optimization and methods that often used in robotics. MP environments are trajectory-based and always execute a full +trajectory, which is generated by a Dynamic Motion Primitive (DMP) or a Probabilistic Motion Primitive (DetPMP). The +generated trajectory is translated into individual step-wise actions by a controller. The exact choice of controller is, +however, dependent on the type of environment. We currently support position, velocity, and PD-Controllers for position, +velocity and torque control, respectively. The goal of all MP environments is still to learn a policy. Yet, an action +represents the parametrization of the motion primitives to generate a suitable trajectory. Additionally, in this +framework we support the above setting for the contextual setting, for which we expose all changing substates of the +task as a single observation in the beginning. This requires to predict a new action/MP parametrization for each +trajectory. All environments provide the next to the cumulative episode reward also all collected information from each +step as part of the info dictionary. This information should, however, mainly be used for debugging and logging. -[//]: |`HoleReacherDetPMP-v0`| +|Key| Description| +|---|---| +`trajectory`| Generated trajectory from MP +`step_actions`| Step-wise executed action based on controller output +`step_observations`| Step-wise intermediate observations +`step_rewards`| Step-wise rewards +`trajectory_length`| Total number of environment interactions +`other`| All other information from the underlying environment are returned as a list with length `trajectory_length` maintaining the original key. In case some information are not provided every time step, the missing values are filled with `None`. -### OpenAI gym Environments -These environments are wrapped-versions of their OpenAI-gym counterparts. +## Installation -|Name| Description|Trajectory Horizon|Action Dimension|Context Dimension -|---|---|---|---|---| -|`ContinuousMountainCarDetPMP-v0`| A DetPmP wrapped version of the ContinuousMountainCar-v0 environment. | 100 | 1 -|`ReacherDetPMP-v2`| A DetPmP wrapped version of the Reacher-v2 environment. | 50 | 2 -|`FetchSlideDenseDetPMP-v1`| A DetPmP wrapped version of the FetchSlideDense-v1 environment. | 50 | 4 -|`FetchReachDenseDetPMP-v1`| A DetPmP wrapped version of the FetchReachDense-v1 environment. | 50 | 4 +1. Clone the repository -### Deep Mind Control Suite Environments -These environments are wrapped-versions of their Deep Mind Control Suite (DMC) counterparts. -Given most task can be solved in shorter horizon lengths than the original 1000 steps, we often shorten the episodes for those task. - -|Name| Description|Trajectory Horizon|Action Dimension|Context Dimension -|---|---|---|---|---| -|`dmc_ball_in_cup-catch_detpmp-v0`| A DetPmP wrapped version of the "catch" task for the "ball_in_cup" environment. | 50 | 10 | 2 -|`dmc_ball_in_cup-catch_dmp-v0`| A DMP wrapped version of the "catch" task for the "ball_in_cup" environment. | 50| 10 | 2 -|`dmc_reacher-easy_detpmp-v0`| A DetPmP wrapped version of the "easy" task for the "reacher" environment. | 1000 | 10 | 4 -|`dmc_reacher-easy_dmp-v0`| A DMP wrapped version of the "easy" task for the "reacher" environment. | 1000| 10 | 4 -|`dmc_reacher-hard_detpmp-v0`| A DetPmP wrapped version of the "hard" task for the "reacher" environment.| 1000 | 10 | 4 -|`dmc_reacher-hard_dmp-v0`| A DMP wrapped version of the "hard" task for the "reacher" environment. | 1000 | 10 | 4 - -## Install -1. Clone the repository ```bash git clone git@github.com:ALRhub/alr_envs.git ``` -2. Go to the folder + +2. Go to the folder + ```bash cd alr_envs ``` -3. Install with + +3. Install with + ```bash pip install -e . ``` -4. Use (see [example.py](alr_envs/examples/examples_general.py)): -```python -import gym -env = gym.make('alr_envs:SimpleReacher-v0') +## Using the framework + +We prepared [multiple examples](alr_envs/examples/), please have a look there for more specific examples. + +### Step-wise environments + +```python +import alr_envs + +env = alr_envs.make('HoleReacher-v0', seed=1) state = env.reset() -for i in range(10000): +for i in range(1000): state, reward, done, info = env.step(env.action_space.sample()) if i % 5 == 0: env.render() if done: state = env.reset() - ``` -For an example using a DMP wrapped env and asynchronous sampling look at [mp_env_async_sampler.py](./alr_envs/utils/mp_env_async_sampler.py) +For Deepmind control tasks we expect the `env_id` to be specified as `domain_name-task_name` or for manipulation tasks +as `manipulation-environment_name`. All other environments can be created based on their original name. + +Existing MP tasks can be created the same way as above. Just keep in mind, calling `step()` always executs a full +trajectory. + +```python +import alr_envs + +env = alr_envs.make('HoleReacherDetPMP-v0', seed=1) +# render() can be called once in the beginning with all necessary arguments. To turn it of again just call render(None). +env.render() + +state = env.reset() + +for i in range(5): + state, reward, done, info = env.step(env.action_space.sample()) + + # Not really necessary as the environments resets itself after each trajectory anyway. + state = env.reset() +``` + +To show all available environments, we provide some additional convenience. Each value will return a dictionary with two +keys `DMP` and `DetPMP` that store a list of available environment names. + +```python +import alr_envs + +print("Custom MP tasks:") +print(alr_envs.ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS) + +print("OpenAI Gym MP tasks:") +print(alr_envs.ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS) + +print("Deepmind Control MP tasks:") +print(alr_envs.ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS) + +print("MetaWorld MP tasks:") +print(alr_envs.ALL_METAWORLD_MOTION_PRIMITIVE_ENVIRONMENTS) +``` + +### How to create a new MP task + +In case a required task is not supported yet in the MP framework, it can be created relatively easy. For the task at +hand, the following interface needs to be implemented. + +```python +import numpy as np +from mp_env_api import MPEnvWrapper + + +class MPWrapper(MPEnvWrapper): + + @property + def active_obs(self): + """ + Returns boolean mask for each substate in the full observation. + It determines whether the observation is returned for the contextual case or not. + This effectively allows to filter unwanted or unnecessary observations from the full step-based case. + E.g. Velocities starting at 0 are only changing after the first action. Given we only receive the first + observation, the velocities are not necessary in the observation for the MP task. + """ + return np.ones(self.observation_space.shape, dtype=bool) + + @property + def current_vel(self): + """ + Returns the current velocity of the action/control dimension. + The dimensionality has to match the action/control dimension. + This is not required when exclusively using position control, + it should, however, be implemented regardless. + E.g. The joint velocities that are directly or indirectly controlled by the action. + """ + raise NotImplementedError() + + @property + def current_pos(self): + """ + Returns the current position of the action/control dimension. + The dimensionality has to match the action/control dimension. + This is not required when exclusively using velocity control, + it should, however, be implemented regardless. + E.g. The joint positions that are directly or indirectly controlled by the action. + """ + raise NotImplementedError() + + @property + def goal_pos(self): + """ + Returns a predefined final position of the action/control dimension. + This is only required for the DMP and is most of the time learned instead. + """ + raise NotImplementedError() + + @property + def dt(self): + """ + Returns the time between two simulated steps of the environment + """ + raise NotImplementedError() + +``` + +If you created a new task wrapper, feel free to open a PR, so we can integrate it for others to use as well. +Without the integration the task can still be used. A rough outline can be shown here, for more details we recommend +having a look at the [examples](alr_envs/examples/). + +```python +import alr_envs + +# Base environment name, according to structure of above example +base_env_id = "ball_in_cup-catch" + +# Replace this wrapper with the custom wrapper for your environment by inheriting from the MPEnvWrapper. +# You can also add other gym.Wrappers in case they are needed, +# e.g. gym.wrappers.FlattenObservation for dict observations +wrappers = [alr_envs.dmc.suite.ball_in_cup.MPWrapper] +mp_kwargs = {...} +kwargs = {...} +env = alr_envs.make_dmp_env(base_env_id, wrappers=wrappers, seed=1, mp_kwargs=mp_kwargs, **kwargs) +# OR for a deterministic ProMP (other mp_kwargs are required): +# env = alr_envs.make_detpmp_env(base_env, wrappers=wrappers, seed=seed, mp_kwargs=mp_args) + +rewards = 0 +obs = env.reset() + +# number of samples/full trajectories (multiple environment steps) +for i in range(5): + ac = env.action_space.sample() + obs, reward, done, info = env.step(ac) + rewards += reward + + if done: + print(base_env_id, rewards) + rewards = 0 + obs = env.reset() +``` diff --git a/alr_envs/__init__.py b/alr_envs/__init__.py index e9c09db..e43e3b1 100644 --- a/alr_envs/__init__.py +++ b/alr_envs/__init__.py @@ -1,901 +1,15 @@ -from gym.envs.registration import register -from gym.wrappers import FlattenObservation +from alr_envs import dmc, meta, open_ai +from alr_envs.utils.make_env_helpers import make, make_detpmp_env, make_dmp_env, make_rank +from alr_envs.utils import make_dmc -from alr_envs import classic_control, dmc, open_ai +# Convenience function for all MP environments +from .alr import ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS +from .dmc import ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS +from .meta import ALL_METAWORLD_MOTION_PRIMITIVE_ENVIRONMENTS +from .open_ai import ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS -from alr_envs.utils.make_env_helpers import make_dmp_env -from alr_envs.utils.make_env_helpers import make_detpmp_env -from alr_envs.utils.make_env_helpers import make_env -from alr_envs.utils.make_env_helpers import make_env_rank - -# Mujoco - -## Reacher -register( - id='ALRReacher-v0', - entry_point='alr_envs.mujoco:ALRReacherEnv', - max_episode_steps=200, - kwargs={ - "steps_before_reward": 0, - "n_links": 5, - "balance": False, - } -) - -register( - id='ALRReacherSparse-v0', - entry_point='alr_envs.mujoco:ALRReacherEnv', - max_episode_steps=200, - kwargs={ - "steps_before_reward": 200, - "n_links": 5, - "balance": False, - } -) - -register( - id='ALRReacherSparseBalanced-v0', - entry_point='alr_envs.mujoco:ALRReacherEnv', - max_episode_steps=200, - kwargs={ - "steps_before_reward": 200, - "n_links": 5, - "balance": True, - } -) - -register( - id='ALRLongReacher-v0', - entry_point='alr_envs.mujoco:ALRReacherEnv', - max_episode_steps=200, - kwargs={ - "steps_before_reward": 0, - "n_links": 7, - "balance": False, - } -) - -register( - id='ALRLongReacherSparse-v0', - entry_point='alr_envs.mujoco:ALRReacherEnv', - max_episode_steps=200, - kwargs={ - "steps_before_reward": 200, - "n_links": 7, - "balance": False, - } -) - -register( - id='ALRLongReacherSparseBalanced-v0', - entry_point='alr_envs.mujoco:ALRReacherEnv', - max_episode_steps=200, - kwargs={ - "steps_before_reward": 200, - "n_links": 7, - "balance": True, - } -) - -## Balancing Reacher - -register( - id='Balancing-v0', - entry_point='alr_envs.mujoco:BalancingEnv', - max_episode_steps=200, - kwargs={ - "n_links": 5, - } -) - -# Classic control - -## Simple Reacher -register( - id='SimpleReacher-v0', - entry_point='alr_envs.classic_control:SimpleReacherEnv', - max_episode_steps=200, - kwargs={ - "n_links": 2, - } -) - -register( - id='SimpleReacher-v1', - entry_point='alr_envs.classic_control:SimpleReacherEnv', - max_episode_steps=200, - kwargs={ - "n_links": 2, - "random_start": False - } -) - -register( - id='LongSimpleReacher-v0', - entry_point='alr_envs.classic_control:SimpleReacherEnv', - max_episode_steps=200, - kwargs={ - "n_links": 5, - } -) - -register( - id='LongSimpleReacher-v1', - entry_point='alr_envs.classic_control:SimpleReacherEnv', - max_episode_steps=200, - kwargs={ - "n_links": 5, - "random_start": False - } -) - -## Viapoint Reacher - -register( - id='ViaPointReacher-v0', - entry_point='alr_envs.classic_control:ViaPointReacher', - max_episode_steps=200, - kwargs={ - "n_links": 5, - "allow_self_collision": False, - "collision_penalty": 1000 - } -) - -## Hole Reacher -register( - id='HoleReacher-v0', - entry_point='alr_envs.classic_control:HoleReacherEnv', - max_episode_steps=200, - kwargs={ - "n_links": 5, - "random_start": True, - "allow_self_collision": False, - "allow_wall_collision": False, - "hole_width": None, - "hole_depth": 1, - "hole_x": None, - "collision_penalty": 100, - } -) - -register( - id='HoleReacher-v1', - entry_point='alr_envs.classic_control:HoleReacherEnv', - max_episode_steps=200, - kwargs={ - "n_links": 5, - "random_start": False, - "allow_self_collision": False, - "allow_wall_collision": False, - "hole_width": 0.25, - "hole_depth": 1, - "hole_x": None, - "collision_penalty": 100, - } -) - -register( - id='HoleReacher-v2', - entry_point='alr_envs.classic_control:HoleReacherEnv', - max_episode_steps=200, - kwargs={ - "n_links": 5, - "random_start": False, - "allow_self_collision": False, - "allow_wall_collision": False, - "hole_width": 0.25, - "hole_depth": 1, - "hole_x": 2, - "collision_penalty": 100, - } -) - -# Short Continuous Mountain Car -register( - id="MountainCarContinuous-v1", - entry_point="gym.envs.classic_control:Continuous_MountainCarEnv", - max_episode_steps=100, - reward_threshold=90.0, -) - -# Motion Primitive Environments - -## Simple Reacher -versions = ["SimpleReacher-v0", "SimpleReacher-v1", "LongSimpleReacher-v0", "LongSimpleReacher-v1"] -for v in versions: - name = v.split("-") - register( - id=f'{name[0]}DMP-{name[1]}', - entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper', - # max_episode_steps=1, - kwargs={ - "name": f"alr_envs:{v}", - "wrappers": [classic_control.simple_reacher.MPWrapper], - "mp_kwargs": { - "num_dof": 2 if "long" not in v.lower() else 5, - "num_basis": 5, - "duration": 20, - "alpha_phase": 2, - "learn_goal": True, - "policy_type": "velocity", - "weights_scale": 50, - } - } - ) - -register( - id='ViaPointReacherDMP-v0', - entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper', - # max_episode_steps=1, - kwargs={ - "name": "alr_envs:ViaPointReacher-v0", - "wrappers": [classic_control.viapoint_reacher.MPWrapper], - "mp_kwargs": { - "num_dof": 5, - "num_basis": 5, - "duration": 2, - "learn_goal": True, - "alpha_phase": 2, - "policy_type": "velocity", - "weights_scale": 50, - } - } -) - -register( - id='ViaPointReacherDetPMP-v0', - entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', - # max_episode_steps=1, - kwargs={ - "name": "alr_envs:ViaPointReacher-v0", - "wrappers": [classic_control.viapoint_reacher.MPWrapper], - "mp_kwargs": { - "num_dof": 5, - "num_basis": 5, - "duration": 2, - "width": 0.025, - "policy_type": "velocity", - "weights_scale": 0.2, - "zero_start": True - } - } -) - -## Hole Reacher -versions = ["v0", "v1", "v2"] -for v in versions: - register( - id=f'HoleReacherDMP-{v}', - entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper', - # max_episode_steps=1, - kwargs={ - "name": f"alr_envs:HoleReacher-{v}", - "wrappers": [classic_control.hole_reacher.MPWrapper], - "mp_kwargs": { - "num_dof": 5, - "num_basis": 5, - "duration": 2, - "learn_goal": True, - "alpha_phase": 2, - "bandwidth_factor": 2, - "policy_type": "velocity", - "weights_scale": 50, - "goal_scale": 0.1 - } - } - ) - - register( - id=f'HoleReacherDetPMP-{v}', - entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', - kwargs={ - "name": f"alr_envs:HoleReacher-{v}", - "wrappers": [classic_control.hole_reacher.MPWrapper], - "mp_kwargs": { - "num_dof": 5, - "num_basis": 5, - "duration": 2, - "width": 0.025, - "policy_type": "velocity", - "weights_scale": 0.2, - "zero_start": True - } - } - ) - -## Deep Mind Control Suite (DMC) -### Suite - -register( - id=f'dmc_ball_in_cup-catch_dmp-v0', - entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper', - # max_episode_steps=1, - kwargs={ - "name": f"ball_in_cup-catch", - "time_limit": 1, - "episode_length": 50, - "wrappers": [dmc.suite.ball_in_cup.MPWrapper], - "mp_kwargs": { - "num_dof": 2, - "num_basis": 5, - "duration": 1, - "learn_goal": True, - "alpha_phase": 2, - "bandwidth_factor": 2, - "policy_type": "motor", - "weights_scale": 50, - "goal_scale": 0.1, - "policy_kwargs": { - "p_gains": 50, - "d_gains": 1 - } - } - } -) - -register( - id=f'dmc_ball_in_cup-catch_detpmp-v0', - entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', - kwargs={ - "name": f"ball_in_cup-catch", - "time_limit": 1, - "episode_length": 50, - "wrappers": [dmc.suite.ball_in_cup.MPWrapper], - "mp_kwargs": { - "num_dof": 2, - "num_basis": 5, - "duration": 1, - "width": 0.025, - "policy_type": "motor", - "weights_scale": 0.2, - "zero_start": True, - "policy_kwargs": { - "p_gains": 50, - "d_gains": 1 - } - } - } -) - -# TODO tune episode length for all below -register( - id=f'dmc_reacher-easy_dmp-v0', - entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper', - # max_episode_steps=1, - kwargs={ - "name": f"reacher-easy", - "time_limit": 1, - "episode_length": 50, - "wrappers": [dmc.suite.reacher.MPWrapper], - "mp_kwargs": { - "num_dof": 2, - "num_basis": 5, - "duration": 1, - "learn_goal": True, - "alpha_phase": 2, - "bandwidth_factor": 2, - "policy_type": "motor", - "weights_scale": 50, - "goal_scale": 0.1, - "policy_kwargs": { - "p_gains": 50, - "d_gains": 1 - } - } - } -) - -register( - id=f'dmc_reacher-easy_detpmp-v0', - entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', - kwargs={ - "name": f"reacher-easy", - "time_limit": 1, - "episode_length": 50, - "wrappers": [dmc.suite.reacher.MPWrapper], - "mp_kwargs": { - "num_dof": 2, - "num_basis": 5, - "duration": 1, - "width": 0.025, - "policy_type": "motor", - "weights_scale": 0.2, - "zero_start": True, - "policy_kwargs": { - "p_gains": 50, - "d_gains": 1 - } - } - } -) - -register( - id=f'dmc_reacher-hard_dmp-v0', - entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper', - # max_episode_steps=1, - kwargs={ - "name": f"reacher-hard", - "time_limit": 1, - "episode_length": 50, - "wrappers": [dmc.suite.reacher.MPWrapper], - "mp_kwargs": { - "num_dof": 2, - "num_basis": 5, - "duration": 1, - "learn_goal": True, - "alpha_phase": 2, - "bandwidth_factor": 2, - "policy_type": "motor", - "weights_scale": 50, - "goal_scale": 0.1, - "policy_kwargs": { - "p_gains": 50, - "d_gains": 1 - } - } - } -) - -register( - id=f'dmc_reacher-hard_detpmp-v0', - entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', - kwargs={ - "name": f"reacher-hard", - "time_limit": 1, - "episode_length": 50, - "wrappers": [dmc.suite.reacher.MPWrapper], - "mp_kwargs": { - "num_dof": 2, - "num_basis": 5, - "duration": 1, - "width": 0.025, - "policy_type": "motor", - "weights_scale": 0.2, - "zero_start": True, - "policy_kwargs": { - "p_gains": 50, - "d_gains": 1 - } - } - } -) -register( - id=f'dmc_cartpole-balance_dmp-v0', - entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper', - # max_episode_steps=1, - kwargs={ - "name": f"cartpole-balance", - # "time_limit": 1, - "camera_id": 0, - "episode_length": 1000, - "wrappers": [dmc.suite.cartpole.MPWrapper], - "mp_kwargs": { - "num_dof": 1, - "num_basis": 5, - "duration": 10, - "learn_goal": True, - "alpha_phase": 2, - "bandwidth_factor": 2, - "policy_type": "motor", - "weights_scale": 50, - "goal_scale": 0.1, - "policy_kwargs": { - "p_gains": 10, - "d_gains": 10 - } - } - } -) - -register( - id=f'dmc_cartpole-balance_detpmp-v0', - entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', - kwargs={ - "name": f"cartpole-balance", - # "time_limit": 1, - "camera_id": 0, - "episode_length": 1000, - "wrappers": [dmc.suite.cartpole.MPWrapper], - "mp_kwargs": { - "num_dof": 1, - "num_basis": 5, - "duration": 10, - "width": 0.025, - "policy_type": "motor", - "weights_scale": 0.2, - "zero_start": True, - "policy_kwargs": { - "p_gains": 10, - "d_gains": 10 - } - } - } -) -register( - id=f'dmc_cartpole-balance_sparse_dmp-v0', - entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper', - # max_episode_steps=1, - kwargs={ - "name": f"cartpole-balance_sparse", - # "time_limit": 1, - "camera_id": 0, - "episode_length": 1000, - "wrappers": [dmc.suite.cartpole.MPWrapper], - "mp_kwargs": { - "num_dof": 1, - "num_basis": 5, - "duration": 10, - "learn_goal": True, - "alpha_phase": 2, - "bandwidth_factor": 2, - "policy_type": "motor", - "weights_scale": 50, - "goal_scale": 0.1, - "policy_kwargs": { - "p_gains": 10, - "d_gains": 10 - } - } - } -) - -register( - id=f'dmc_cartpole-balance_sparse_detpmp-v0', - entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', - kwargs={ - "name": f"cartpole-balance_sparse", - # "time_limit": 1, - "camera_id": 0, - "episode_length": 1000, - "wrappers": [dmc.suite.cartpole.MPWrapper], - "mp_kwargs": { - "num_dof": 1, - "num_basis": 5, - "duration": 10, - "width": 0.025, - "policy_type": "motor", - "weights_scale": 0.2, - "zero_start": True, - "policy_kwargs": { - "p_gains": 10, - "d_gains": 10 - } - } - } -) - -register( - id=f'dmc_cartpole-swingup_dmp-v0', - entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper', - # max_episode_steps=1, - kwargs={ - "name": f"cartpole-swingup", - # "time_limit": 1, - "camera_id": 0, - "episode_length": 1000, - "wrappers": [dmc.suite.cartpole.MPWrapper], - "mp_kwargs": { - "num_dof": 1, - "num_basis": 5, - "duration": 10, - "learn_goal": True, - "alpha_phase": 2, - "bandwidth_factor": 2, - "policy_type": "motor", - "weights_scale": 50, - "goal_scale": 0.1, - "policy_kwargs": { - "p_gains": 10, - "d_gains": 10 - } - } - } -) - -register( - id=f'dmc_cartpole-swingup_detpmp-v0', - entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', - kwargs={ - "name": f"cartpole-swingup", - # "time_limit": 1, - "camera_id": 0, - "episode_length": 1000, - "wrappers": [dmc.suite.cartpole.MPWrapper], - "mp_kwargs": { - "num_dof": 1, - "num_basis": 5, - "duration": 10, - "width": 0.025, - "policy_type": "motor", - "weights_scale": 0.2, - "zero_start": True, - "policy_kwargs": { - "p_gains": 10, - "d_gains": 10 - } - } - } -) -register( - id=f'dmc_cartpole-swingup_sparse_dmp-v0', - entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper', - # max_episode_steps=1, - kwargs={ - "name": f"cartpole-swingup_sparse", - # "time_limit": 1, - "camera_id": 0, - "episode_length": 1000, - "wrappers": [dmc.suite.cartpole.MPWrapper], - "mp_kwargs": { - "num_dof": 1, - "num_basis": 5, - "duration": 10, - "learn_goal": True, - "alpha_phase": 2, - "bandwidth_factor": 2, - "policy_type": "motor", - "weights_scale": 50, - "goal_scale": 0.1, - "policy_kwargs": { - "p_gains": 10, - "d_gains": 10 - } - } - } -) - -register( - id=f'dmc_cartpole-swingup_sparse_detpmp-v0', - entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', - kwargs={ - "name": f"cartpole-swingup_sparse", - # "time_limit": 1, - "camera_id": 0, - "episode_length": 1000, - "wrappers": [dmc.suite.cartpole.MPWrapper], - "mp_kwargs": { - "num_dof": 1, - "num_basis": 5, - "duration": 10, - "width": 0.025, - "policy_type": "motor", - "weights_scale": 0.2, - "zero_start": True, - "policy_kwargs": { - "p_gains": 10, - "d_gains": 10 - } - } - } -) -register( - id=f'dmc_cartpole-two_poles_dmp-v0', - entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper', - # max_episode_steps=1, - kwargs={ - "name": f"cartpole-two_poles", - # "time_limit": 1, - "camera_id": 0, - "episode_length": 1000, - # "wrappers": [partial(DMCCartpoleMPWrapper, n_poles=2)], - "wrappers": [dmc.suite.cartpole.TwoPolesMPWrapper], - "mp_kwargs": { - "num_dof": 1, - "num_basis": 5, - "duration": 10, - "learn_goal": True, - "alpha_phase": 2, - "bandwidth_factor": 2, - "policy_type": "motor", - "weights_scale": 50, - "goal_scale": 0.1, - "policy_kwargs": { - "p_gains": 10, - "d_gains": 10 - } - } - } -) - -register( - id=f'dmc_cartpole-two_poles_detpmp-v0', - entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', - kwargs={ - "name": f"cartpole-two_poles", - # "time_limit": 1, - "camera_id": 0, - "episode_length": 1000, - # "wrappers": [partial(DMCCartpoleMPWrapper, n_poles=2)], - "wrappers": [dmc.suite.cartpole.TwoPolesMPWrapper], - "mp_kwargs": { - "num_dof": 1, - "num_basis": 5, - "duration": 10, - "width": 0.025, - "policy_type": "motor", - "weights_scale": 0.2, - "zero_start": True, - "policy_kwargs": { - "p_gains": 10, - "d_gains": 10 - } - } - } -) -register( - id=f'dmc_cartpole-three_poles_dmp-v0', - entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper', - # max_episode_steps=1, - kwargs={ - "name": f"cartpole-three_poles", - # "time_limit": 1, - "camera_id": 0, - "episode_length": 1000, - # "wrappers": [partial(DMCCartpoleMPWrapper, n_poles=3)], - "wrappers": [dmc.suite.cartpole.ThreePolesMPWrapper], - "mp_kwargs": { - "num_dof": 1, - "num_basis": 5, - "duration": 10, - "learn_goal": True, - "alpha_phase": 2, - "bandwidth_factor": 2, - "policy_type": "motor", - "weights_scale": 50, - "goal_scale": 0.1, - "policy_kwargs": { - "p_gains": 10, - "d_gains": 10 - } - } - } -) - -register( - id=f'dmc_cartpole-three_poles_detpmp-v0', - entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', - kwargs={ - "name": f"cartpole-three_poles", - # "time_limit": 1, - "camera_id": 0, - "episode_length": 1000, - # "wrappers": [partial(DMCCartpoleMPWrapper, n_poles=3)], - "wrappers": [dmc.suite.cartpole.ThreePolesMPWrapper], - "mp_kwargs": { - "num_dof": 1, - "num_basis": 5, - "duration": 10, - "width": 0.025, - "policy_type": "motor", - "weights_scale": 0.2, - "zero_start": True, - "policy_kwargs": { - "p_gains": 10, - "d_gains": 10 - } - } - } -) - -### Manipulation - -register( - id=f'dmc_manipulation-reach_site_dmp-v0', - entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper', - # max_episode_steps=1, - kwargs={ - "name": f"manipulation-reach_site_features", - # "time_limit": 1, - "episode_length": 250, - "wrappers": [dmc.manipulation.reach.MPWrapper], - "mp_kwargs": { - "num_dof": 9, - "num_basis": 5, - "duration": 10, - "learn_goal": True, - "alpha_phase": 2, - "bandwidth_factor": 2, - "policy_type": "velocity", - "weights_scale": 50, - "goal_scale": 0.1, - } - } -) - -register( - id=f'dmc_manipulation-reach_site_detpmp-v0', - entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', - kwargs={ - "name": f"manipulation-reach_site_features", - # "time_limit": 1, - "episode_length": 250, - "wrappers": [dmc.manipulation.reach.MPWrapper], - "mp_kwargs": { - "num_dof": 9, - "num_basis": 5, - "duration": 10, - "width": 0.025, - "policy_type": "velocity", - "weights_scale": 0.2, - "zero_start": True, - } - } -) - -## Open AI -register( - id='ContinuousMountainCarDetPMP-v1', - entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', - kwargs={ - "name": "alr_envs:MountainCarContinuous-v1", - "wrappers": [open_ai.classic_control.continuous_mountain_car.MPWrapper], - "mp_kwargs": { - "num_dof": 1, - "num_basis": 4, - "duration": 2, - "post_traj_time": 0, - "width": 0.02, - "policy_type": "motor", - "policy_kwargs": { - "p_gains": 1., - "d_gains": 1. - } - } - } -) - -register( - id='ReacherDetPMP-v2', - entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', - kwargs={ - "name": "gym.envs.mujoco:Reacher-v2", - "wrappers": [open_ai.mujoco.reacher_v2.MPWrapper], - "mp_kwargs": { - "num_dof": 2, - "num_basis": 6, - "duration": 1, - "post_traj_time": 0, - "width": 0.02, - "policy_type": "motor", - "policy_kwargs": { - "p_gains": .6, - "d_gains": .075 - } - } - } -) - -register( - id='FetchSlideDenseDetPMP-v1', - entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', - kwargs={ - "name": "gym.envs.robotics:FetchSlideDense-v1", - "wrappers": [FlattenObservation, open_ai.robotics.fetch.MPWrapper], - "mp_kwargs": { - "num_dof": 4, - "num_basis": 5, - "duration": 2, - "post_traj_time": 0, - "width": 0.02, - "policy_type": "position" - } - } -) - -register( - id='FetchReachDenseDetPMP-v1', - entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', - kwargs={ - "name": "gym.envs.robotics:FetchReachDense-v1", - "wrappers": [FlattenObservation, open_ai.robotics.fetch.MPWrapper], - "mp_kwargs": { - "num_dof": 4, - "num_basis": 5, - "duration": 2, - "post_traj_time": 0, - "width": 0.02, - "policy_type": "position" - } - } -) +ALL_MOTION_PRIMITIVE_ENVIRONMENTS = { + key: value + ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS[key] + + ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS[key] + + ALL_METAWORLD_MOTION_PRIMITIVE_ENVIRONMENTS[key] + for key, value in ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS.items()} diff --git a/alr_envs/alr/__init__.py b/alr_envs/alr/__init__.py new file mode 100644 index 0000000..179839c --- /dev/null +++ b/alr_envs/alr/__init__.py @@ -0,0 +1,329 @@ +from gym import register + +from . import classic_control, mujoco +from .classic_control.hole_reacher.hole_reacher import HoleReacherEnv +from .classic_control.simple_reacher.simple_reacher import SimpleReacherEnv +from .classic_control.viapoint_reacher.viapoint_reacher import ViaPointReacherEnv +from .mujoco.ball_in_a_cup.ball_in_a_cup import ALRBallInACupEnv +from .mujoco.ball_in_a_cup.biac_pd import ALRBallInACupPDEnv +from .mujoco.reacher.alr_reacher import ALRReacherEnv +from .mujoco.reacher.balancing import BalancingEnv + +ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS = {"DMP": [], "DetPMP": []} + +# Classic Control +## Simple Reacher +register( + id='SimpleReacher-v0', + entry_point='alr_envs.alr.classic_control:SimpleReacherEnv', + max_episode_steps=200, + kwargs={ + "n_links": 2, + } +) + +register( + id='SimpleReacher-v1', + entry_point='alr_envs.alr.classic_control:SimpleReacherEnv', + max_episode_steps=200, + kwargs={ + "n_links": 2, + "random_start": False + } +) + +register( + id='LongSimpleReacher-v0', + entry_point='alr_envs.alr.classic_control:SimpleReacherEnv', + max_episode_steps=200, + kwargs={ + "n_links": 5, + } +) + +register( + id='LongSimpleReacher-v1', + entry_point='alr_envs.alr.classic_control:SimpleReacherEnv', + max_episode_steps=200, + kwargs={ + "n_links": 5, + "random_start": False + } +) + +## Viapoint Reacher + +register( + id='ViaPointReacher-v0', + entry_point='alr_envs.alr.classic_control:ViaPointReacherEnv', + max_episode_steps=200, + kwargs={ + "n_links": 5, + "allow_self_collision": False, + "collision_penalty": 1000 + } +) + +## Hole Reacher +register( + id='HoleReacher-v0', + entry_point='alr_envs.alr.classic_control:HoleReacherEnv', + max_episode_steps=200, + kwargs={ + "n_links": 5, + "random_start": True, + "allow_self_collision": False, + "allow_wall_collision": False, + "hole_width": None, + "hole_depth": 1, + "hole_x": None, + "collision_penalty": 100, + } +) + +register( + id='HoleReacher-v1', + entry_point='alr_envs.alr.classic_control:HoleReacherEnv', + max_episode_steps=200, + kwargs={ + "n_links": 5, + "random_start": False, + "allow_self_collision": False, + "allow_wall_collision": False, + "hole_width": 0.25, + "hole_depth": 1, + "hole_x": None, + "collision_penalty": 100, + } +) + +register( + id='HoleReacher-v2', + entry_point='alr_envs.alr.classic_control:HoleReacherEnv', + max_episode_steps=200, + kwargs={ + "n_links": 5, + "random_start": False, + "allow_self_collision": False, + "allow_wall_collision": False, + "hole_width": 0.25, + "hole_depth": 1, + "hole_x": 2, + "collision_penalty": 100, + } +) + +# Mujoco + +## Reacher +register( + id='ALRReacher-v0', + entry_point='alr_envs.alr.mujoco:ALRReacherEnv', + max_episode_steps=200, + kwargs={ + "steps_before_reward": 0, + "n_links": 5, + "balance": False, + } +) + +register( + id='ALRReacherSparse-v0', + entry_point='alr_envs.alr.mujoco:ALRReacherEnv', + max_episode_steps=200, + kwargs={ + "steps_before_reward": 200, + "n_links": 5, + "balance": False, + } +) + +register( + id='ALRReacherSparseBalanced-v0', + entry_point='alr_envs.alr.mujoco:ALRReacherEnv', + max_episode_steps=200, + kwargs={ + "steps_before_reward": 200, + "n_links": 5, + "balance": True, + } +) + +register( + id='ALRLongReacher-v0', + entry_point='alr_envs.alr.mujoco:ALRReacherEnv', + max_episode_steps=200, + kwargs={ + "steps_before_reward": 0, + "n_links": 7, + "balance": False, + } +) + +register( + id='ALRLongReacherSparse-v0', + entry_point='alr_envs.alr.mujoco:ALRReacherEnv', + max_episode_steps=200, + kwargs={ + "steps_before_reward": 200, + "n_links": 7, + "balance": False, + } +) + +register( + id='ALRLongReacherSparseBalanced-v0', + entry_point='alr_envs.alr.mujoco:ALRReacherEnv', + max_episode_steps=200, + kwargs={ + "steps_before_reward": 200, + "n_links": 7, + "balance": True, + } +) + +## Balancing Reacher + +register( + id='Balancing-v0', + entry_point='alr_envs.alr.mujoco:BalancingEnv', + max_episode_steps=200, + kwargs={ + "n_links": 5, + } +) + +# Motion Primitive Environments + +## Simple Reacher +_versions = ["SimpleReacher-v0", "SimpleReacher-v1", "LongSimpleReacher-v0", "LongSimpleReacher-v1"] +for _v in _versions: + _name = _v.split("-") + _env_id = f'{_name[0]}DMP-{_name[1]}' + register( + id=_env_id, + entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper', + # max_episode_steps=1, + kwargs={ + "name": f"alr_envs:{_v}", + "wrappers": [classic_control.simple_reacher.MPWrapper], + "mp_kwargs": { + "num_dof": 2 if "long" not in _v.lower() else 5, + "num_basis": 5, + "duration": 20, + "alpha_phase": 2, + "learn_goal": True, + "policy_type": "velocity", + "weights_scale": 50, + } + } + ) + ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id) + + _env_id = f'{_name[0]}DetPMP-{_name[1]}' + register( + id=_env_id, + entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', + # max_episode_steps=1, + kwargs={ + "name": f"alr_envs:{_v}", + "wrappers": [classic_control.simple_reacher.MPWrapper], + "mp_kwargs": { + "num_dof": 2 if "long" not in _v.lower() else 5, + "num_basis": 5, + "duration": 20, + "width": 0.025, + "policy_type": "velocity", + "weights_scale": 0.2, + "zero_start": True + } + } + ) + ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append(_env_id) + +# Viapoint reacher +register( + id='ViaPointReacherDMP-v0', + entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper', + # max_episode_steps=1, + kwargs={ + "name": "alr_envs:ViaPointReacher-v0", + "wrappers": [classic_control.viapoint_reacher.MPWrapper], + "mp_kwargs": { + "num_dof": 5, + "num_basis": 5, + "duration": 2, + "learn_goal": True, + "alpha_phase": 2, + "policy_type": "velocity", + "weights_scale": 50, + } + } +) +ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"].append("ViaPointReacherDMP-v0") + +register( + id='ViaPointReacherDetPMP-v0', + entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', + # max_episode_steps=1, + kwargs={ + "name": "alr_envs:ViaPointReacher-v0", + "wrappers": [classic_control.viapoint_reacher.MPWrapper], + "mp_kwargs": { + "num_dof": 5, + "num_basis": 5, + "duration": 2, + "width": 0.025, + "policy_type": "velocity", + "weights_scale": 0.2, + "zero_start": True + } + } +) +ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append("ViaPointReacherDetPMP-v0") + +## Hole Reacher +_versions = ["v0", "v1", "v2"] +for _v in _versions: + _env_id = f'HoleReacherDMP-{_v}' + register( + id=_env_id, + entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper', + # max_episode_steps=1, + kwargs={ + "name": f"alr_envs:HoleReacher-{_v}", + "wrappers": [classic_control.hole_reacher.MPWrapper], + "mp_kwargs": { + "num_dof": 5, + "num_basis": 5, + "duration": 2, + "learn_goal": True, + "alpha_phase": 2, + "bandwidth_factor": 2, + "policy_type": "velocity", + "weights_scale": 50, + "goal_scale": 0.1 + } + } + ) + ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id) + + _env_id = f'HoleReacherDetPMP-{_v}' + register( + id=_env_id, + entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', + kwargs={ + "name": f"alr_envs:HoleReacher-{_v}", + "wrappers": [classic_control.hole_reacher.MPWrapper], + "mp_kwargs": { + "num_dof": 5, + "num_basis": 5, + "duration": 2, + "width": 0.025, + "policy_type": "velocity", + "weights_scale": 0.2, + "zero_start": True + } + } + ) + ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append(_env_id) diff --git a/alr_envs/alr/classic_control/README.MD b/alr_envs/alr/classic_control/README.MD new file mode 100644 index 0000000..ebe2101 --- /dev/null +++ b/alr_envs/alr/classic_control/README.MD @@ -0,0 +1,21 @@ +### Classic Control + +## Step-based Environments +|Name| Description|Horizon|Action Dimension|Observation Dimension +|---|---|---|---|---| +|`SimpleReacher-v0`| Simple reaching task (2 links) without any physics simulation. Provides no reward until 150 time steps. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory.| 200 | 2 | 9 +|`LongSimpleReacher-v0`| Simple reaching task (5 links) without any physics simulation. Provides no reward until 150 time steps. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory.| 200 | 5 | 18 +|`ViaPointReacher-v0`| Simple reaching task leveraging a via point, which supports self collision detection. Provides a reward only at 100 and 199 for reaching the viapoint and goal point, respectively.| 200 | 5 | 18 +|`HoleReacher-v0`| 5 link reaching task where the end-effector needs to reach into a narrow hole without collding with itself or walls | 200 | 5 | 18 + +## MP Environments +|Name| Description|Horizon|Action Dimension|Context Dimension +|---|---|---|---|---| +|`ViaPointReacherDMP-v0`| A DMP provides a trajectory for the `ViaPointReacher-v0` task. | 200 | 25 +|`HoleReacherFixedGoalDMP-v0`| A DMP provides a trajectory for the `HoleReacher-v0` task with a fixed goal attractor. | 200 | 25 +|`HoleReacherDMP-v0`| A DMP provides a trajectory for the `HoleReacher-v0` task. The goal attractor needs to be learned. | 200 | 30 +|`ALRBallInACupSimpleDMP-v0`| A DMP provides a trajectory for the `ALRBallInACupSimple-v0` task where only 3 joints are actuated. | 4000 | 15 +|`ALRBallInACupDMP-v0`| A DMP provides a trajectory for the `ALRBallInACup-v0` task. | 4000 | 35 +|`ALRBallInACupGoalDMP-v0`| A DMP provides a trajectory for the `ALRBallInACupGoal-v0` task. | 4000 | 35 | 3 + +[//]: |`HoleReacherDetPMP-v0`| \ No newline at end of file diff --git a/alr_envs/alr/classic_control/__init__.py b/alr_envs/alr/classic_control/__init__.py new file mode 100644 index 0000000..73575ab --- /dev/null +++ b/alr_envs/alr/classic_control/__init__.py @@ -0,0 +1,3 @@ +from .hole_reacher.hole_reacher import HoleReacherEnv +from .simple_reacher.simple_reacher import SimpleReacherEnv +from .viapoint_reacher.viapoint_reacher import ViaPointReacherEnv diff --git a/alr_envs/classic_control/hole_reacher/__init__.py b/alr_envs/alr/classic_control/hole_reacher/__init__.py similarity index 100% rename from alr_envs/classic_control/hole_reacher/__init__.py rename to alr_envs/alr/classic_control/hole_reacher/__init__.py diff --git a/alr_envs/classic_control/hole_reacher/hole_reacher.py b/alr_envs/alr/classic_control/hole_reacher/hole_reacher.py similarity index 99% rename from alr_envs/classic_control/hole_reacher/hole_reacher.py rename to alr_envs/alr/classic_control/hole_reacher/hole_reacher.py index 2482714..ee1a997 100644 --- a/alr_envs/classic_control/hole_reacher/hole_reacher.py +++ b/alr_envs/alr/classic_control/hole_reacher/hole_reacher.py @@ -6,7 +6,7 @@ import numpy as np from gym.utils import seeding from matplotlib import patches -from alr_envs.classic_control.utils import check_self_collision +from alr_envs.alr.classic_control.utils import check_self_collision class HoleReacherEnv(gym.Env): diff --git a/alr_envs/classic_control/hole_reacher/mp_wrapper.py b/alr_envs/alr/classic_control/hole_reacher/mp_wrapper.py similarity index 100% rename from alr_envs/classic_control/hole_reacher/mp_wrapper.py rename to alr_envs/alr/classic_control/hole_reacher/mp_wrapper.py diff --git a/alr_envs/classic_control/simple_reacher/__init__.py b/alr_envs/alr/classic_control/simple_reacher/__init__.py similarity index 100% rename from alr_envs/classic_control/simple_reacher/__init__.py rename to alr_envs/alr/classic_control/simple_reacher/__init__.py diff --git a/alr_envs/classic_control/simple_reacher/mp_wrapper.py b/alr_envs/alr/classic_control/simple_reacher/mp_wrapper.py similarity index 100% rename from alr_envs/classic_control/simple_reacher/mp_wrapper.py rename to alr_envs/alr/classic_control/simple_reacher/mp_wrapper.py diff --git a/alr_envs/classic_control/simple_reacher/simple_reacher.py b/alr_envs/alr/classic_control/simple_reacher/simple_reacher.py similarity index 100% rename from alr_envs/classic_control/simple_reacher/simple_reacher.py rename to alr_envs/alr/classic_control/simple_reacher/simple_reacher.py diff --git a/alr_envs/classic_control/utils.py b/alr_envs/alr/classic_control/utils.py similarity index 100% rename from alr_envs/classic_control/utils.py rename to alr_envs/alr/classic_control/utils.py diff --git a/alr_envs/classic_control/viapoint_reacher/__init__.py b/alr_envs/alr/classic_control/viapoint_reacher/__init__.py similarity index 100% rename from alr_envs/classic_control/viapoint_reacher/__init__.py rename to alr_envs/alr/classic_control/viapoint_reacher/__init__.py diff --git a/alr_envs/classic_control/viapoint_reacher/mp_wrapper.py b/alr_envs/alr/classic_control/viapoint_reacher/mp_wrapper.py similarity index 100% rename from alr_envs/classic_control/viapoint_reacher/mp_wrapper.py rename to alr_envs/alr/classic_control/viapoint_reacher/mp_wrapper.py diff --git a/alr_envs/classic_control/viapoint_reacher/viapoint_reacher.py b/alr_envs/alr/classic_control/viapoint_reacher/viapoint_reacher.py similarity index 99% rename from alr_envs/classic_control/viapoint_reacher/viapoint_reacher.py rename to alr_envs/alr/classic_control/viapoint_reacher/viapoint_reacher.py index 99f8bcf..95d1e7f 100644 --- a/alr_envs/classic_control/viapoint_reacher/viapoint_reacher.py +++ b/alr_envs/alr/classic_control/viapoint_reacher/viapoint_reacher.py @@ -5,10 +5,10 @@ import matplotlib.pyplot as plt import numpy as np from gym.utils import seeding -from alr_envs.classic_control.utils import check_self_collision +from alr_envs.alr.classic_control.utils import check_self_collision -class ViaPointReacher(gym.Env): +class ViaPointReacherEnv(gym.Env): def __init__(self, n_links, random_start: bool = False, via_target: Union[None, Iterable] = None, target: Union[None, Iterable] = None, allow_self_collision=False, collision_penalty=1000): diff --git a/alr_envs/alr/mujoco/README.MD b/alr_envs/alr/mujoco/README.MD new file mode 100644 index 0000000..0ea5a1f --- /dev/null +++ b/alr_envs/alr/mujoco/README.MD @@ -0,0 +1,15 @@ +# Custom Mujoco tasks + +## Step-based Environments +|Name| Description|Horizon|Action Dimension|Observation Dimension +|---|---|---|---|---| +|`ALRReacher-v0`|Modified (5 links) Mujoco gym's `Reacher-v2` (2 links)| 200 | 5 | 21 +|`ALRReacherSparse-v0`|Same as `ALRReacher-v0`, but the distance penalty is only provided in the last time step.| 200 | 5 | 21 +|`ALRReacherSparseBalanced-v0`|Same as `ALRReacherSparse-v0`, but the end-effector has to remain upright.| 200 | 5 | 21 +|`ALRLongReacher-v0`|Modified (7 links) Mujoco gym's `Reacher-v2` (2 links)| 200 | 7 | 27 +|`ALRLongReacherSparse-v0`|Same as `ALRLongReacher-v0`, but the distance penalty is only provided in the last time step.| 200 | 7 | 27 +|`ALRLongReacherSparseBalanced-v0`|Same as `ALRLongReacherSparse-v0`, but the end-effector has to remain upright.| 200 | 7 | 27 +|`ALRBallInACupSimple-v0`| Ball-in-a-cup task where a robot needs to catch a ball attached to a cup at its end-effector. | 4000 | 3 | wip +|`ALRBallInACup-v0`| Ball-in-a-cup task where a robot needs to catch a ball attached to a cup at its end-effector | 4000 | 7 | wip +|`ALRBallInACupGoal-v0`| Similar to `ALRBallInACupSimple-v0` but the ball needs to be caught at a specified goal position | 4000 | 7 | wip + \ No newline at end of file diff --git a/alr_envs/alr/mujoco/__init__.py b/alr_envs/alr/mujoco/__init__.py new file mode 100644 index 0000000..6744d15 --- /dev/null +++ b/alr_envs/alr/mujoco/__init__.py @@ -0,0 +1,4 @@ +from .reacher.alr_reacher import ALRReacherEnv +from .reacher.balancing import BalancingEnv +from .ball_in_a_cup.ball_in_a_cup import ALRBallInACupEnv +from .ball_in_a_cup.biac_pd import ALRBallInACupPDEnv \ No newline at end of file diff --git a/alr_envs/mujoco/alr_reward_fct.py b/alr_envs/alr/mujoco/alr_reward_fct.py similarity index 100% rename from alr_envs/mujoco/alr_reward_fct.py rename to alr_envs/alr/mujoco/alr_reward_fct.py diff --git a/alr_envs/mujoco/ball_in_a_cup/__init__.py b/alr_envs/alr/mujoco/ball_in_a_cup/__init__.py similarity index 100% rename from alr_envs/mujoco/ball_in_a_cup/__init__.py rename to alr_envs/alr/mujoco/ball_in_a_cup/__init__.py diff --git a/alr_envs/mujoco/ball_in_a_cup/assets/biac_base.xml b/alr_envs/alr/mujoco/ball_in_a_cup/assets/biac_base.xml similarity index 100% rename from alr_envs/mujoco/ball_in_a_cup/assets/biac_base.xml rename to alr_envs/alr/mujoco/ball_in_a_cup/assets/biac_base.xml diff --git a/alr_envs/mujoco/ball_in_a_cup/ball_in_a_cup.py b/alr_envs/alr/mujoco/ball_in_a_cup/ball_in_a_cup.py similarity index 96% rename from alr_envs/mujoco/ball_in_a_cup/ball_in_a_cup.py rename to alr_envs/alr/mujoco/ball_in_a_cup/ball_in_a_cup.py index f870877..7cff670 100644 --- a/alr_envs/mujoco/ball_in_a_cup/ball_in_a_cup.py +++ b/alr_envs/alr/mujoco/ball_in_a_cup/ball_in_a_cup.py @@ -35,10 +35,10 @@ class ALRBallInACupEnv(MujocoEnv, utils.EzPickle): self.sim_time = 8 # seconds self.sim_steps = int(self.sim_time / self.dt) if reward_type == "no_context": - from alr_envs.mujoco.ball_in_a_cup.ball_in_a_cup_reward_simple import BallInACupReward + from alr_envs.alr.mujoco.ball_in_a_cup.ball_in_a_cup_reward_simple import BallInACupReward reward_function = BallInACupReward elif reward_type == "contextual_goal": - from alr_envs.mujoco.ball_in_a_cup.ball_in_a_cup_reward import BallInACupReward + from alr_envs.alr.mujoco.ball_in_a_cup.ball_in_a_cup_reward import BallInACupReward reward_function = BallInACupReward else: raise ValueError("Unknown reward type: {}".format(reward_type)) diff --git a/alr_envs/mujoco/ball_in_a_cup/ball_in_a_cup_mp_wrapper.py b/alr_envs/alr/mujoco/ball_in_a_cup/ball_in_a_cup_mp_wrapper.py similarity index 100% rename from alr_envs/mujoco/ball_in_a_cup/ball_in_a_cup_mp_wrapper.py rename to alr_envs/alr/mujoco/ball_in_a_cup/ball_in_a_cup_mp_wrapper.py diff --git a/alr_envs/mujoco/ball_in_a_cup/ball_in_a_cup_reward.py b/alr_envs/alr/mujoco/ball_in_a_cup/ball_in_a_cup_reward.py similarity index 99% rename from alr_envs/mujoco/ball_in_a_cup/ball_in_a_cup_reward.py rename to alr_envs/alr/mujoco/ball_in_a_cup/ball_in_a_cup_reward.py index e34aecf..4ea4381 100644 --- a/alr_envs/mujoco/ball_in_a_cup/ball_in_a_cup_reward.py +++ b/alr_envs/alr/mujoco/ball_in_a_cup/ball_in_a_cup_reward.py @@ -1,5 +1,5 @@ import numpy as np -from alr_envs.mujoco import alr_reward_fct +from alr_envs.alr.mujoco import alr_reward_fct class BallInACupReward(alr_reward_fct.AlrReward): diff --git a/alr_envs/mujoco/ball_in_a_cup/ball_in_a_cup_reward_simple.py b/alr_envs/alr/mujoco/ball_in_a_cup/ball_in_a_cup_reward_simple.py similarity index 99% rename from alr_envs/mujoco/ball_in_a_cup/ball_in_a_cup_reward_simple.py rename to alr_envs/alr/mujoco/ball_in_a_cup/ball_in_a_cup_reward_simple.py index 295dcf9..a147d89 100644 --- a/alr_envs/mujoco/ball_in_a_cup/ball_in_a_cup_reward_simple.py +++ b/alr_envs/alr/mujoco/ball_in_a_cup/ball_in_a_cup_reward_simple.py @@ -1,5 +1,5 @@ import numpy as np -from alr_envs.mujoco import alr_reward_fct +from alr_envs.alr.mujoco import alr_reward_fct class BallInACupReward(alr_reward_fct.AlrReward): diff --git a/alr_envs/mujoco/ball_in_a_cup/biac_pd.py b/alr_envs/alr/mujoco/ball_in_a_cup/biac_pd.py similarity index 97% rename from alr_envs/mujoco/ball_in_a_cup/biac_pd.py rename to alr_envs/alr/mujoco/ball_in_a_cup/biac_pd.py index 31545d0..b047c54 100644 --- a/alr_envs/mujoco/ball_in_a_cup/biac_pd.py +++ b/alr_envs/alr/mujoco/ball_in_a_cup/biac_pd.py @@ -42,10 +42,10 @@ class ALRBallInACupPDEnv(mujoco_env.MujocoEnv, utils.EzPickle): self._dt = 0.02 self.ep_length = 4000 # based on 8 seconds with dt = 0.02 int(self.sim_time / self.dt) if reward_type == "no_context": - from alr_envs.mujoco.ball_in_a_cup.ball_in_a_cup_reward_simple import BallInACupReward + from alr_envs.alr.mujoco.ball_in_a_cup.ball_in_a_cup_reward_simple import BallInACupReward reward_function = BallInACupReward elif reward_type == "contextual_goal": - from alr_envs.mujoco.ball_in_a_cup.ball_in_a_cup_reward import BallInACupReward + from alr_envs.alr.mujoco.ball_in_a_cup.ball_in_a_cup_reward import BallInACupReward reward_function = BallInACupReward else: raise ValueError("Unknown reward type: {}".format(reward_type)) diff --git a/alr_envs/mujoco/ball_in_a_cup/utils.py b/alr_envs/alr/mujoco/ball_in_a_cup/utils.py similarity index 97% rename from alr_envs/mujoco/ball_in_a_cup/utils.py rename to alr_envs/alr/mujoco/ball_in_a_cup/utils.py index b851cae..0dd972c 100644 --- a/alr_envs/mujoco/ball_in_a_cup/utils.py +++ b/alr_envs/alr/mujoco/ball_in_a_cup/utils.py @@ -1,4 +1,4 @@ -from alr_envs.mujoco.ball_in_a_cup.ball_in_a_cup import ALRBallInACupEnv +from alr_envs.alr.mujoco.ball_in_a_cup.ball_in_a_cup import ALRBallInACupEnv from mp_env_api.mp_wrappers.detpmp_wrapper import DetPMPWrapper from mp_env_api.mp_wrappers.dmp_wrapper import DmpWrapper diff --git a/alr_envs/mujoco/beerpong/__init__.py b/alr_envs/alr/mujoco/beerpong/__init__.py similarity index 100% rename from alr_envs/mujoco/beerpong/__init__.py rename to alr_envs/alr/mujoco/beerpong/__init__.py diff --git a/alr_envs/mujoco/beerpong/assets/beerpong.xml b/alr_envs/alr/mujoco/beerpong/assets/beerpong.xml similarity index 100% rename from alr_envs/mujoco/beerpong/assets/beerpong.xml rename to alr_envs/alr/mujoco/beerpong/assets/beerpong.xml diff --git a/alr_envs/mujoco/beerpong/beerpong.py b/alr_envs/alr/mujoco/beerpong/beerpong.py similarity index 98% rename from alr_envs/mujoco/beerpong/beerpong.py rename to alr_envs/alr/mujoco/beerpong/beerpong.py index 1f3f781..4d8f389 100644 --- a/alr_envs/mujoco/beerpong/beerpong.py +++ b/alr_envs/alr/mujoco/beerpong/beerpong.py @@ -37,7 +37,7 @@ class ALRBeerpongEnv(MujocoEnv, utils.EzPickle): self.sim_time = 8 # seconds self.sim_steps = int(self.sim_time / self.dt) if reward_function is None: - from alr_envs.mujoco.beerpong.beerpong_reward import BeerpongReward + from alr_envs.alr.mujoco.beerpong.beerpong_reward import BeerpongReward reward_function = BeerpongReward self.reward_function = reward_function(self.sim, self.sim_steps) self.cup_robot_id = self.sim.model._site_name2id["cup_robot_final"] diff --git a/alr_envs/mujoco/beerpong/beerpong_reward.py b/alr_envs/alr/mujoco/beerpong/beerpong_reward.py similarity index 99% rename from alr_envs/mujoco/beerpong/beerpong_reward.py rename to alr_envs/alr/mujoco/beerpong/beerpong_reward.py index 3281409..c3ca3c6 100644 --- a/alr_envs/mujoco/beerpong/beerpong_reward.py +++ b/alr_envs/alr/mujoco/beerpong/beerpong_reward.py @@ -1,5 +1,5 @@ import numpy as np -from alr_envs.mujoco import alr_reward_fct +from alr_envs.alr.mujoco import alr_reward_fct class BeerpongReward(alr_reward_fct.AlrReward): diff --git a/alr_envs/mujoco/beerpong/beerpong_reward_simple.py b/alr_envs/alr/mujoco/beerpong/beerpong_reward_simple.py similarity index 99% rename from alr_envs/mujoco/beerpong/beerpong_reward_simple.py rename to alr_envs/alr/mujoco/beerpong/beerpong_reward_simple.py index b79e963..fbe2163 100644 --- a/alr_envs/mujoco/beerpong/beerpong_reward_simple.py +++ b/alr_envs/alr/mujoco/beerpong/beerpong_reward_simple.py @@ -1,5 +1,5 @@ import numpy as np -from alr_envs.mujoco import alr_reward_fct +from alr_envs.alr.mujoco import alr_reward_fct class BeerpongReward(alr_reward_fct.AlrReward): diff --git a/alr_envs/mujoco/beerpong/beerpong_simple.py b/alr_envs/alr/mujoco/beerpong/beerpong_simple.py similarity index 98% rename from alr_envs/mujoco/beerpong/beerpong_simple.py rename to alr_envs/alr/mujoco/beerpong/beerpong_simple.py index 5f6f6d2..73da83d 100644 --- a/alr_envs/mujoco/beerpong/beerpong_simple.py +++ b/alr_envs/alr/mujoco/beerpong/beerpong_simple.py @@ -38,7 +38,7 @@ class ALRBeerpongEnv(MujocoEnv, utils.EzPickle): self.sim_time = 8 # seconds self.sim_steps = int(self.sim_time / self.dt) if reward_function is None: - from alr_envs.mujoco.beerpong.beerpong_reward_simple import BeerpongReward + from alr_envs.alr.mujoco.beerpong.beerpong_reward_simple import BeerpongReward reward_function = BeerpongReward self.reward_function = reward_function(self.sim, self.sim_steps) self.cup_robot_id = self.sim.model._site_name2id["cup_robot_final"] diff --git a/alr_envs/mujoco/beerpong/utils.py b/alr_envs/alr/mujoco/beerpong/utils.py similarity index 93% rename from alr_envs/mujoco/beerpong/utils.py rename to alr_envs/alr/mujoco/beerpong/utils.py index 7cd4682..f43e881 100644 --- a/alr_envs/mujoco/beerpong/utils.py +++ b/alr_envs/alr/mujoco/beerpong/utils.py @@ -1,6 +1,6 @@ from alr_envs.utils.mps.detpmp_wrapper import DetPMPWrapper -from alr_envs.mujoco.beerpong.beerpong import ALRBeerpongEnv -from alr_envs.mujoco.beerpong.beerpong_simple import ALRBeerpongEnv as ALRBeerpongEnvSimple +from alr_envs.alr.mujoco.beerpong.beerpong import ALRBeerpongEnv +from alr_envs.alr.mujoco.beerpong.beerpong_simple import ALRBeerpongEnv as ALRBeerpongEnvSimple def make_contextual_env(rank, seed=0): diff --git a/alr_envs/mujoco/gym_table_tennis/__init__.py b/alr_envs/alr/mujoco/gym_table_tennis/__init__.py similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/__init__.py rename to alr_envs/alr/mujoco/gym_table_tennis/__init__.py diff --git a/alr_envs/mujoco/gym_table_tennis/envs/__init__.py b/alr_envs/alr/mujoco/gym_table_tennis/envs/__init__.py similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/envs/__init__.py rename to alr_envs/alr/mujoco/gym_table_tennis/envs/__init__.py diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/include_7_motor_actuator.xml b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/include_7_motor_actuator.xml similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/envs/assets/include_7_motor_actuator.xml rename to alr_envs/alr/mujoco/gym_table_tennis/envs/assets/include_7_motor_actuator.xml diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/include_barrett_wam_7dof_left.xml b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/include_barrett_wam_7dof_left.xml similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/envs/assets/include_barrett_wam_7dof_left.xml rename to alr_envs/alr/mujoco/gym_table_tennis/envs/assets/include_barrett_wam_7dof_left.xml diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/include_barrett_wam_7dof_right.xml b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/include_barrett_wam_7dof_right.xml similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/envs/assets/include_barrett_wam_7dof_right.xml rename to alr_envs/alr/mujoco/gym_table_tennis/envs/assets/include_barrett_wam_7dof_right.xml diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/include_table.xml b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/include_table.xml similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/envs/assets/include_table.xml rename to alr_envs/alr/mujoco/gym_table_tennis/envs/assets/include_table.xml diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/include_target_ball.xml b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/include_target_ball.xml similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/envs/assets/include_target_ball.xml rename to alr_envs/alr/mujoco/gym_table_tennis/envs/assets/include_target_ball.xml diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/include_test_balls.xml b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/include_test_balls.xml similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/envs/assets/include_test_balls.xml rename to alr_envs/alr/mujoco/gym_table_tennis/envs/assets/include_test_balls.xml diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/base_link_convex.stl b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/base_link_convex.stl similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/base_link_convex.stl rename to alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/base_link_convex.stl diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/base_link_fine.stl b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/base_link_fine.stl similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/base_link_fine.stl rename to alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/base_link_fine.stl diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_dist_link_fine.stl b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_dist_link_fine.stl similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_dist_link_fine.stl rename to alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_dist_link_fine.stl diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_med_link_convex.stl b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_med_link_convex.stl similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_med_link_convex.stl rename to alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_med_link_convex.stl diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_med_link_fine.stl b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_med_link_fine.stl similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_med_link_fine.stl rename to alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_med_link_fine.stl diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_prox_link_convex_decomposition_p1.stl b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_prox_link_convex_decomposition_p1.stl similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_prox_link_convex_decomposition_p1.stl rename to alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_prox_link_convex_decomposition_p1.stl diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_prox_link_convex_decomposition_p3.stl b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_prox_link_convex_decomposition_p3.stl similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_prox_link_convex_decomposition_p3.stl rename to alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_prox_link_convex_decomposition_p3.stl diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_prox_link_fine.stl b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_prox_link_fine.stl similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_prox_link_fine.stl rename to alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_prox_link_fine.stl diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_link_convex_decomposition_p1.stl b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_link_convex_decomposition_p1.stl similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_link_convex_decomposition_p1.stl rename to alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_link_convex_decomposition_p1.stl diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_link_convex_decomposition_p2.stl b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_link_convex_decomposition_p2.stl similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_link_convex_decomposition_p2.stl rename to alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_link_convex_decomposition_p2.stl diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_link_convex_decomposition_p3.stl b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_link_convex_decomposition_p3.stl similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_link_convex_decomposition_p3.stl rename to alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_link_convex_decomposition_p3.stl diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_link_convex_decomposition_p4.stl b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_link_convex_decomposition_p4.stl similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_link_convex_decomposition_p4.stl rename to alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_link_convex_decomposition_p4.stl diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/elbow_link_convex.stl b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/elbow_link_convex.stl similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/elbow_link_convex.stl rename to alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/elbow_link_convex.stl diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/forearm_link_convex_decomposition_p1.stl b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/forearm_link_convex_decomposition_p1.stl similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/forearm_link_convex_decomposition_p1.stl rename to alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/forearm_link_convex_decomposition_p1.stl diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/forearm_link_convex_decomposition_p2.stl b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/forearm_link_convex_decomposition_p2.stl similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/forearm_link_convex_decomposition_p2.stl rename to alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/forearm_link_convex_decomposition_p2.stl diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/forearm_link_fine.stl b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/forearm_link_fine.stl similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/forearm_link_fine.stl rename to alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/forearm_link_fine.stl diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_link_convex_decomposition_p1.stl b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_link_convex_decomposition_p1.stl similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_link_convex_decomposition_p1.stl rename to alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_link_convex_decomposition_p1.stl diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_link_convex_decomposition_p2.stl b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_link_convex_decomposition_p2.stl similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_link_convex_decomposition_p2.stl rename to alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_link_convex_decomposition_p2.stl diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_link_convex_decomposition_p3.stl b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_link_convex_decomposition_p3.stl similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_link_convex_decomposition_p3.stl rename to alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_link_convex_decomposition_p3.stl diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_link_fine.stl b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_link_fine.stl similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_link_fine.stl rename to alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_link_fine.stl diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_pitch_link_convex.stl b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_pitch_link_convex.stl similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_pitch_link_convex.stl rename to alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_pitch_link_convex.stl diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_pitch_link_fine.stl b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_pitch_link_fine.stl similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_pitch_link_fine.stl rename to alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_pitch_link_fine.stl diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/upper_arm_link_convex_decomposition_p1.stl b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/upper_arm_link_convex_decomposition_p1.stl similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/upper_arm_link_convex_decomposition_p1.stl rename to alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/upper_arm_link_convex_decomposition_p1.stl diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/upper_arm_link_convex_decomposition_p2.stl b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/upper_arm_link_convex_decomposition_p2.stl similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/upper_arm_link_convex_decomposition_p2.stl rename to alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/upper_arm_link_convex_decomposition_p2.stl diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/upper_arm_link_fine.stl b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/upper_arm_link_fine.stl similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/upper_arm_link_fine.stl rename to alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/upper_arm_link_fine.stl diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/wrist_palm_link_convex.stl b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_palm_link_convex.stl similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/wrist_palm_link_convex.stl rename to alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_palm_link_convex.stl diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/wrist_pitch_link_convex_decomposition_p1.stl b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_pitch_link_convex_decomposition_p1.stl similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/wrist_pitch_link_convex_decomposition_p1.stl rename to alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_pitch_link_convex_decomposition_p1.stl diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/wrist_pitch_link_convex_decomposition_p2.stl b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_pitch_link_convex_decomposition_p2.stl similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/wrist_pitch_link_convex_decomposition_p2.stl rename to alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_pitch_link_convex_decomposition_p2.stl diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/wrist_pitch_link_convex_decomposition_p3.stl b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_pitch_link_convex_decomposition_p3.stl similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/wrist_pitch_link_convex_decomposition_p3.stl rename to alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_pitch_link_convex_decomposition_p3.stl diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/wrist_pitch_link_fine.stl b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_pitch_link_fine.stl similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/wrist_pitch_link_fine.stl rename to alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_pitch_link_fine.stl diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/wrist_yaw_link_convex_decomposition_p1.stl b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_yaw_link_convex_decomposition_p1.stl similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/wrist_yaw_link_convex_decomposition_p1.stl rename to alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_yaw_link_convex_decomposition_p1.stl diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/wrist_yaw_link_convex_decomposition_p2.stl b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_yaw_link_convex_decomposition_p2.stl similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/wrist_yaw_link_convex_decomposition_p2.stl rename to alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_yaw_link_convex_decomposition_p2.stl diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/wrist_yaw_link_fine.stl b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_yaw_link_fine.stl similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/wrist_yaw_link_fine.stl rename to alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_yaw_link_fine.stl diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/right_arm_actuator.xml b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/right_arm_actuator.xml similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/envs/assets/right_arm_actuator.xml rename to alr_envs/alr/mujoco/gym_table_tennis/envs/assets/right_arm_actuator.xml diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/shared.xml b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/shared.xml similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/envs/assets/shared.xml rename to alr_envs/alr/mujoco/gym_table_tennis/envs/assets/shared.xml diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/table_tennis_env.xml b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/table_tennis_env.xml similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/envs/assets/table_tennis_env.xml rename to alr_envs/alr/mujoco/gym_table_tennis/envs/assets/table_tennis_env.xml diff --git a/alr_envs/mujoco/gym_table_tennis/envs/table_tennis_env.py b/alr_envs/alr/mujoco/gym_table_tennis/envs/table_tennis_env.py similarity index 98% rename from alr_envs/mujoco/gym_table_tennis/envs/table_tennis_env.py rename to alr_envs/alr/mujoco/gym_table_tennis/envs/table_tennis_env.py index f155f06..9b16ae1 100644 --- a/alr_envs/mujoco/gym_table_tennis/envs/table_tennis_env.py +++ b/alr_envs/alr/mujoco/gym_table_tennis/envs/table_tennis_env.py @@ -2,9 +2,9 @@ import numpy as np from gym import spaces from gym.envs.robotics import robot_env, utils # import xml.etree.ElementTree as ET -from alr_envs.mujoco.gym_table_tennis.utils.rewards.hierarchical_reward import HierarchicalRewardTableTennis +from alr_envs.alr.mujoco.gym_table_tennis.utils.rewards.hierarchical_reward import HierarchicalRewardTableTennis import glfw -from alr_envs.mujoco.gym_table_tennis.utils.experiment import ball_initialize +from alr_envs.alr.mujoco.gym_table_tennis.utils.experiment import ball_initialize from pathlib import Path import os diff --git a/alr_envs/mujoco/gym_table_tennis/utils/__init__.py b/alr_envs/alr/mujoco/gym_table_tennis/utils/__init__.py similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/utils/__init__.py rename to alr_envs/alr/mujoco/gym_table_tennis/utils/__init__.py diff --git a/alr_envs/mujoco/gym_table_tennis/utils/experiment.py b/alr_envs/alr/mujoco/gym_table_tennis/utils/experiment.py similarity index 96% rename from alr_envs/mujoco/gym_table_tennis/utils/experiment.py rename to alr_envs/alr/mujoco/gym_table_tennis/utils/experiment.py index addd6c5..a106d68 100644 --- a/alr_envs/mujoco/gym_table_tennis/utils/experiment.py +++ b/alr_envs/alr/mujoco/gym_table_tennis/utils/experiment.py @@ -1,6 +1,6 @@ import numpy as np from gym.utils import seeding -from alr_envs.mujoco.gym_table_tennis.utils.util import read_yaml, read_json +from alr_envs.alr.mujoco.gym_table_tennis.utils.util import read_yaml, read_json from pathlib import Path diff --git a/alr_envs/mujoco/gym_table_tennis/utils/rewards/__init__.py b/alr_envs/alr/mujoco/gym_table_tennis/utils/rewards/__init__.py similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/utils/rewards/__init__.py rename to alr_envs/alr/mujoco/gym_table_tennis/utils/rewards/__init__.py diff --git a/alr_envs/mujoco/gym_table_tennis/utils/rewards/hierarchical_reward.py b/alr_envs/alr/mujoco/gym_table_tennis/utils/rewards/hierarchical_reward.py similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/utils/rewards/hierarchical_reward.py rename to alr_envs/alr/mujoco/gym_table_tennis/utils/rewards/hierarchical_reward.py diff --git a/alr_envs/mujoco/gym_table_tennis/utils/rewards/rewards.py b/alr_envs/alr/mujoco/gym_table_tennis/utils/rewards/rewards.py similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/utils/rewards/rewards.py rename to alr_envs/alr/mujoco/gym_table_tennis/utils/rewards/rewards.py diff --git a/alr_envs/mujoco/gym_table_tennis/utils/util.py b/alr_envs/alr/mujoco/gym_table_tennis/utils/util.py similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/utils/util.py rename to alr_envs/alr/mujoco/gym_table_tennis/utils/util.py diff --git a/alr_envs/mujoco/meshes/wam/base_link_convex.stl b/alr_envs/alr/mujoco/meshes/wam/base_link_convex.stl similarity index 100% rename from alr_envs/mujoco/meshes/wam/base_link_convex.stl rename to alr_envs/alr/mujoco/meshes/wam/base_link_convex.stl diff --git a/alr_envs/mujoco/meshes/wam/base_link_fine.stl b/alr_envs/alr/mujoco/meshes/wam/base_link_fine.stl similarity index 100% rename from alr_envs/mujoco/meshes/wam/base_link_fine.stl rename to alr_envs/alr/mujoco/meshes/wam/base_link_fine.stl diff --git a/alr_envs/mujoco/meshes/wam/cup.stl b/alr_envs/alr/mujoco/meshes/wam/cup.stl similarity index 100% rename from alr_envs/mujoco/meshes/wam/cup.stl rename to alr_envs/alr/mujoco/meshes/wam/cup.stl diff --git a/alr_envs/mujoco/meshes/wam/cup_split1.stl b/alr_envs/alr/mujoco/meshes/wam/cup_split1.stl similarity index 100% rename from alr_envs/mujoco/meshes/wam/cup_split1.stl rename to alr_envs/alr/mujoco/meshes/wam/cup_split1.stl diff --git a/alr_envs/mujoco/meshes/wam/cup_split10.stl b/alr_envs/alr/mujoco/meshes/wam/cup_split10.stl similarity index 100% rename from alr_envs/mujoco/meshes/wam/cup_split10.stl rename to alr_envs/alr/mujoco/meshes/wam/cup_split10.stl diff --git a/alr_envs/mujoco/meshes/wam/cup_split11.stl b/alr_envs/alr/mujoco/meshes/wam/cup_split11.stl similarity index 100% rename from alr_envs/mujoco/meshes/wam/cup_split11.stl rename to alr_envs/alr/mujoco/meshes/wam/cup_split11.stl diff --git a/alr_envs/mujoco/meshes/wam/cup_split12.stl b/alr_envs/alr/mujoco/meshes/wam/cup_split12.stl similarity index 100% rename from alr_envs/mujoco/meshes/wam/cup_split12.stl rename to alr_envs/alr/mujoco/meshes/wam/cup_split12.stl diff --git a/alr_envs/mujoco/meshes/wam/cup_split13.stl b/alr_envs/alr/mujoco/meshes/wam/cup_split13.stl similarity index 100% rename from alr_envs/mujoco/meshes/wam/cup_split13.stl rename to alr_envs/alr/mujoco/meshes/wam/cup_split13.stl diff --git a/alr_envs/mujoco/meshes/wam/cup_split14.stl b/alr_envs/alr/mujoco/meshes/wam/cup_split14.stl similarity index 100% rename from alr_envs/mujoco/meshes/wam/cup_split14.stl rename to alr_envs/alr/mujoco/meshes/wam/cup_split14.stl diff --git a/alr_envs/mujoco/meshes/wam/cup_split15.stl b/alr_envs/alr/mujoco/meshes/wam/cup_split15.stl similarity index 100% rename from alr_envs/mujoco/meshes/wam/cup_split15.stl rename to alr_envs/alr/mujoco/meshes/wam/cup_split15.stl diff --git a/alr_envs/mujoco/meshes/wam/cup_split16.stl b/alr_envs/alr/mujoco/meshes/wam/cup_split16.stl similarity index 100% rename from alr_envs/mujoco/meshes/wam/cup_split16.stl rename to alr_envs/alr/mujoco/meshes/wam/cup_split16.stl diff --git a/alr_envs/mujoco/meshes/wam/cup_split17.stl b/alr_envs/alr/mujoco/meshes/wam/cup_split17.stl similarity index 100% rename from alr_envs/mujoco/meshes/wam/cup_split17.stl rename to alr_envs/alr/mujoco/meshes/wam/cup_split17.stl diff --git a/alr_envs/mujoco/meshes/wam/cup_split18.stl b/alr_envs/alr/mujoco/meshes/wam/cup_split18.stl similarity index 100% rename from alr_envs/mujoco/meshes/wam/cup_split18.stl rename to alr_envs/alr/mujoco/meshes/wam/cup_split18.stl diff --git a/alr_envs/mujoco/meshes/wam/cup_split2.stl b/alr_envs/alr/mujoco/meshes/wam/cup_split2.stl similarity index 100% rename from alr_envs/mujoco/meshes/wam/cup_split2.stl rename to alr_envs/alr/mujoco/meshes/wam/cup_split2.stl diff --git a/alr_envs/mujoco/meshes/wam/cup_split3.stl b/alr_envs/alr/mujoco/meshes/wam/cup_split3.stl similarity index 100% rename from alr_envs/mujoco/meshes/wam/cup_split3.stl rename to alr_envs/alr/mujoco/meshes/wam/cup_split3.stl diff --git a/alr_envs/mujoco/meshes/wam/cup_split4.stl b/alr_envs/alr/mujoco/meshes/wam/cup_split4.stl similarity index 100% rename from alr_envs/mujoco/meshes/wam/cup_split4.stl rename to alr_envs/alr/mujoco/meshes/wam/cup_split4.stl diff --git a/alr_envs/mujoco/meshes/wam/cup_split5.stl b/alr_envs/alr/mujoco/meshes/wam/cup_split5.stl similarity index 100% rename from alr_envs/mujoco/meshes/wam/cup_split5.stl rename to alr_envs/alr/mujoco/meshes/wam/cup_split5.stl diff --git a/alr_envs/mujoco/meshes/wam/cup_split6.stl b/alr_envs/alr/mujoco/meshes/wam/cup_split6.stl similarity index 100% rename from alr_envs/mujoco/meshes/wam/cup_split6.stl rename to alr_envs/alr/mujoco/meshes/wam/cup_split6.stl diff --git a/alr_envs/mujoco/meshes/wam/cup_split7.stl b/alr_envs/alr/mujoco/meshes/wam/cup_split7.stl similarity index 100% rename from alr_envs/mujoco/meshes/wam/cup_split7.stl rename to alr_envs/alr/mujoco/meshes/wam/cup_split7.stl diff --git a/alr_envs/mujoco/meshes/wam/cup_split8.stl b/alr_envs/alr/mujoco/meshes/wam/cup_split8.stl similarity index 100% rename from alr_envs/mujoco/meshes/wam/cup_split8.stl rename to alr_envs/alr/mujoco/meshes/wam/cup_split8.stl diff --git a/alr_envs/mujoco/meshes/wam/cup_split9.stl b/alr_envs/alr/mujoco/meshes/wam/cup_split9.stl similarity index 100% rename from alr_envs/mujoco/meshes/wam/cup_split9.stl rename to alr_envs/alr/mujoco/meshes/wam/cup_split9.stl diff --git a/alr_envs/mujoco/meshes/wam/elbow_link_convex.stl b/alr_envs/alr/mujoco/meshes/wam/elbow_link_convex.stl similarity index 100% rename from alr_envs/mujoco/meshes/wam/elbow_link_convex.stl rename to alr_envs/alr/mujoco/meshes/wam/elbow_link_convex.stl diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/elbow_link_fine.stl b/alr_envs/alr/mujoco/meshes/wam/elbow_link_fine.stl old mode 100644 new mode 100755 similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/elbow_link_fine.stl rename to alr_envs/alr/mujoco/meshes/wam/elbow_link_fine.stl diff --git a/alr_envs/mujoco/meshes/wam/forearm_link_convex_decomposition_p1.stl b/alr_envs/alr/mujoco/meshes/wam/forearm_link_convex_decomposition_p1.stl similarity index 100% rename from alr_envs/mujoco/meshes/wam/forearm_link_convex_decomposition_p1.stl rename to alr_envs/alr/mujoco/meshes/wam/forearm_link_convex_decomposition_p1.stl diff --git a/alr_envs/mujoco/meshes/wam/forearm_link_convex_decomposition_p2.stl b/alr_envs/alr/mujoco/meshes/wam/forearm_link_convex_decomposition_p2.stl similarity index 100% rename from alr_envs/mujoco/meshes/wam/forearm_link_convex_decomposition_p2.stl rename to alr_envs/alr/mujoco/meshes/wam/forearm_link_convex_decomposition_p2.stl diff --git a/alr_envs/mujoco/meshes/wam/forearm_link_fine.stl b/alr_envs/alr/mujoco/meshes/wam/forearm_link_fine.stl similarity index 100% rename from alr_envs/mujoco/meshes/wam/forearm_link_fine.stl rename to alr_envs/alr/mujoco/meshes/wam/forearm_link_fine.stl diff --git a/alr_envs/mujoco/meshes/wam/shoulder_link_convex_decomposition_p1.stl b/alr_envs/alr/mujoco/meshes/wam/shoulder_link_convex_decomposition_p1.stl similarity index 100% rename from alr_envs/mujoco/meshes/wam/shoulder_link_convex_decomposition_p1.stl rename to alr_envs/alr/mujoco/meshes/wam/shoulder_link_convex_decomposition_p1.stl diff --git a/alr_envs/mujoco/meshes/wam/shoulder_link_convex_decomposition_p2.stl b/alr_envs/alr/mujoco/meshes/wam/shoulder_link_convex_decomposition_p2.stl similarity index 100% rename from alr_envs/mujoco/meshes/wam/shoulder_link_convex_decomposition_p2.stl rename to alr_envs/alr/mujoco/meshes/wam/shoulder_link_convex_decomposition_p2.stl diff --git a/alr_envs/mujoco/meshes/wam/shoulder_link_convex_decomposition_p3.stl b/alr_envs/alr/mujoco/meshes/wam/shoulder_link_convex_decomposition_p3.stl similarity index 100% rename from alr_envs/mujoco/meshes/wam/shoulder_link_convex_decomposition_p3.stl rename to alr_envs/alr/mujoco/meshes/wam/shoulder_link_convex_decomposition_p3.stl diff --git a/alr_envs/mujoco/meshes/wam/shoulder_link_fine.stl b/alr_envs/alr/mujoco/meshes/wam/shoulder_link_fine.stl similarity index 100% rename from alr_envs/mujoco/meshes/wam/shoulder_link_fine.stl rename to alr_envs/alr/mujoco/meshes/wam/shoulder_link_fine.stl diff --git a/alr_envs/mujoco/meshes/wam/shoulder_pitch_link_convex.stl b/alr_envs/alr/mujoco/meshes/wam/shoulder_pitch_link_convex.stl similarity index 100% rename from alr_envs/mujoco/meshes/wam/shoulder_pitch_link_convex.stl rename to alr_envs/alr/mujoco/meshes/wam/shoulder_pitch_link_convex.stl diff --git a/alr_envs/mujoco/meshes/wam/shoulder_pitch_link_fine.stl b/alr_envs/alr/mujoco/meshes/wam/shoulder_pitch_link_fine.stl similarity index 100% rename from alr_envs/mujoco/meshes/wam/shoulder_pitch_link_fine.stl rename to alr_envs/alr/mujoco/meshes/wam/shoulder_pitch_link_fine.stl diff --git a/alr_envs/mujoco/meshes/wam/upper_arm_link_convex_decomposition_p1.stl b/alr_envs/alr/mujoco/meshes/wam/upper_arm_link_convex_decomposition_p1.stl similarity index 100% rename from alr_envs/mujoco/meshes/wam/upper_arm_link_convex_decomposition_p1.stl rename to alr_envs/alr/mujoco/meshes/wam/upper_arm_link_convex_decomposition_p1.stl diff --git a/alr_envs/mujoco/meshes/wam/upper_arm_link_convex_decomposition_p2.stl b/alr_envs/alr/mujoco/meshes/wam/upper_arm_link_convex_decomposition_p2.stl similarity index 100% rename from alr_envs/mujoco/meshes/wam/upper_arm_link_convex_decomposition_p2.stl rename to alr_envs/alr/mujoco/meshes/wam/upper_arm_link_convex_decomposition_p2.stl diff --git a/alr_envs/mujoco/meshes/wam/upper_arm_link_fine.stl b/alr_envs/alr/mujoco/meshes/wam/upper_arm_link_fine.stl similarity index 100% rename from alr_envs/mujoco/meshes/wam/upper_arm_link_fine.stl rename to alr_envs/alr/mujoco/meshes/wam/upper_arm_link_fine.stl diff --git a/alr_envs/mujoco/meshes/wam/wrist_palm_link_convex.stl b/alr_envs/alr/mujoco/meshes/wam/wrist_palm_link_convex.stl similarity index 100% rename from alr_envs/mujoco/meshes/wam/wrist_palm_link_convex.stl rename to alr_envs/alr/mujoco/meshes/wam/wrist_palm_link_convex.stl diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/wrist_palm_link_fine.stl b/alr_envs/alr/mujoco/meshes/wam/wrist_palm_link_fine.stl old mode 100644 new mode 100755 similarity index 100% rename from alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/wrist_palm_link_fine.stl rename to alr_envs/alr/mujoco/meshes/wam/wrist_palm_link_fine.stl diff --git a/alr_envs/mujoco/meshes/wam/wrist_pitch_link_convex_decomposition_p3.stl b/alr_envs/alr/mujoco/meshes/wam/wrist_pitch_link_convex_decomposition_p3.stl similarity index 100% rename from alr_envs/mujoco/meshes/wam/wrist_pitch_link_convex_decomposition_p3.stl rename to alr_envs/alr/mujoco/meshes/wam/wrist_pitch_link_convex_decomposition_p3.stl diff --git a/alr_envs/mujoco/meshes/wam/wrist_yaw_link_convex_decomposition_p1.stl b/alr_envs/alr/mujoco/meshes/wam/wrist_yaw_link_convex_decomposition_p1.stl similarity index 100% rename from alr_envs/mujoco/meshes/wam/wrist_yaw_link_convex_decomposition_p1.stl rename to alr_envs/alr/mujoco/meshes/wam/wrist_yaw_link_convex_decomposition_p1.stl diff --git a/alr_envs/mujoco/meshes/wam/wrist_yaw_link_convex_decomposition_p2.stl b/alr_envs/alr/mujoco/meshes/wam/wrist_yaw_link_convex_decomposition_p2.stl similarity index 100% rename from alr_envs/mujoco/meshes/wam/wrist_yaw_link_convex_decomposition_p2.stl rename to alr_envs/alr/mujoco/meshes/wam/wrist_yaw_link_convex_decomposition_p2.stl diff --git a/alr_envs/mujoco/reacher/__init__.py b/alr_envs/alr/mujoco/reacher/__init__.py similarity index 100% rename from alr_envs/mujoco/reacher/__init__.py rename to alr_envs/alr/mujoco/reacher/__init__.py diff --git a/alr_envs/mujoco/reacher/alr_reacher.py b/alr_envs/alr/mujoco/reacher/alr_reacher.py similarity index 100% rename from alr_envs/mujoco/reacher/alr_reacher.py rename to alr_envs/alr/mujoco/reacher/alr_reacher.py diff --git a/alr_envs/mujoco/reacher/assets/reacher_5links.xml b/alr_envs/alr/mujoco/reacher/assets/reacher_5links.xml similarity index 100% rename from alr_envs/mujoco/reacher/assets/reacher_5links.xml rename to alr_envs/alr/mujoco/reacher/assets/reacher_5links.xml diff --git a/alr_envs/mujoco/reacher/assets/reacher_7links.xml b/alr_envs/alr/mujoco/reacher/assets/reacher_7links.xml similarity index 100% rename from alr_envs/mujoco/reacher/assets/reacher_7links.xml rename to alr_envs/alr/mujoco/reacher/assets/reacher_7links.xml diff --git a/alr_envs/mujoco/reacher/balancing.py b/alr_envs/alr/mujoco/reacher/balancing.py similarity index 100% rename from alr_envs/mujoco/reacher/balancing.py rename to alr_envs/alr/mujoco/reacher/balancing.py diff --git a/alr_envs/classic_control/__init__.py b/alr_envs/classic_control/__init__.py deleted file mode 100644 index 8d95b6a..0000000 --- a/alr_envs/classic_control/__init__.py +++ /dev/null @@ -1,3 +0,0 @@ -from alr_envs.classic_control.hole_reacher.hole_reacher import HoleReacherEnv -from alr_envs.classic_control.viapoint_reacher.viapoint_reacher import ViaPointReacher -from alr_envs.classic_control.simple_reacher.simple_reacher import SimpleReacherEnv diff --git a/alr_envs/dmc/README.MD b/alr_envs/dmc/README.MD new file mode 100644 index 0000000..791ee84 --- /dev/null +++ b/alr_envs/dmc/README.MD @@ -0,0 +1,19 @@ +# DeepMind Control (DMC) Wrappers + +These are the Environment Wrappers for selected +[DeepMind Control](https://deepmind.com/research/publications/2020/dm-control-Software-and-Tasks-for-Continuous-Control) +environments in order to use our Motion Primitive gym interface with them. + +## MP Environments + +[//]: <> (These environments are wrapped-versions of their Deep Mind Control Suite (DMC) counterparts. Given most task can be) +[//]: <> (solved in shorter horizon lengths than the original 1000 steps, we often shorten the episodes for those task.) + +|Name| Description|Trajectory Horizon|Action Dimension|Context Dimension +|---|---|---|---|---| +|`dmc_ball_in_cup-catch_detpmp-v0`| A DetPmP wrapped version of the "catch" task for the "ball_in_cup" environment. | 1000 | 10 | 2 +|`dmc_ball_in_cup-catch_dmp-v0`| A DMP wrapped version of the "catch" task for the "ball_in_cup" environment. | 1000| 10 | 2 +|`dmc_reacher-easy_detpmp-v0`| A DetPmP wrapped version of the "easy" task for the "reacher" environment. | 1000 | 10 | 4 +|`dmc_reacher-easy_dmp-v0`| A DMP wrapped version of the "easy" task for the "reacher" environment. | 1000| 10 | 4 +|`dmc_reacher-hard_detpmp-v0`| A DetPmP wrapped version of the "hard" task for the "reacher" environment.| 1000 | 10 | 4 +|`dmc_reacher-hard_dmp-v0`| A DMP wrapped version of the "hard" task for the "reacher" environment. | 1000 | 10 | 4 diff --git a/alr_envs/dmc/__init__.py b/alr_envs/dmc/__init__.py index c5a343d..17d1f7f 100644 --- a/alr_envs/dmc/__init__.py +++ b/alr_envs/dmc/__init__.py @@ -1,5 +1,385 @@ -# from alr_envs.dmc import manipulation, suite -from alr_envs.dmc.suite import ball_in_cup -from alr_envs.dmc.suite import reacher -from alr_envs.dmc.suite import cartpole -from alr_envs.dmc.manipulation import reach \ No newline at end of file +from . import manipulation, suite + +ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS = {"DMP": [], "DetPMP": []} + +from gym.envs.registration import register + +# DeepMind Control Suite (DMC) + +register( + id=f'dmc_ball_in_cup-catch_dmp-v0', + entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper', + # max_episode_steps=1, + kwargs={ + "name": f"ball_in_cup-catch", + "time_limit": 20, + "episode_length": 1000, + "wrappers": [suite.ball_in_cup.MPWrapper], + "mp_kwargs": { + "num_dof": 2, + "num_basis": 5, + "duration": 20, + "learn_goal": True, + "alpha_phase": 2, + "bandwidth_factor": 2, + "policy_type": "motor", + "goal_scale": 0.1, + "policy_kwargs": { + "p_gains": 50, + "d_gains": 1 + } + } + } +) +ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"].append("dmc_ball_in_cup-catch_dmp-v0") + +register( + id=f'dmc_ball_in_cup-catch_detpmp-v0', + entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', + kwargs={ + "name": f"ball_in_cup-catch", + "time_limit": 20, + "episode_length": 1000, + "wrappers": [suite.ball_in_cup.MPWrapper], + "mp_kwargs": { + "num_dof": 2, + "num_basis": 5, + "duration": 20, + "width": 0.025, + "policy_type": "motor", + "zero_start": True, + "policy_kwargs": { + "p_gains": 50, + "d_gains": 1 + } + } + } +) +ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append("dmc_ball_in_cup-catch_detpmp-v0") + +register( + id=f'dmc_reacher-easy_dmp-v0', + entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper', + # max_episode_steps=1, + kwargs={ + "name": f"reacher-easy", + "time_limit": 20, + "episode_length": 1000, + "wrappers": [suite.reacher.MPWrapper], + "mp_kwargs": { + "num_dof": 2, + "num_basis": 5, + "duration": 20, + "learn_goal": True, + "alpha_phase": 2, + "bandwidth_factor": 2, + "policy_type": "motor", + "weights_scale": 50, + "goal_scale": 0.1, + "policy_kwargs": { + "p_gains": 50, + "d_gains": 1 + } + } + } +) +ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"].append("dmc_reacher-easy_dmp-v0") + +register( + id=f'dmc_reacher-easy_detpmp-v0', + entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', + kwargs={ + "name": f"reacher-easy", + "time_limit": 20, + "episode_length": 1000, + "wrappers": [suite.reacher.MPWrapper], + "mp_kwargs": { + "num_dof": 2, + "num_basis": 5, + "duration": 20, + "width": 0.025, + "policy_type": "motor", + "weights_scale": 0.2, + "zero_start": True, + "policy_kwargs": { + "p_gains": 50, + "d_gains": 1 + } + } + } +) +ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append("dmc_reacher-easy_detpmp-v0") + +register( + id=f'dmc_reacher-hard_dmp-v0', + entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper', + # max_episode_steps=1, + kwargs={ + "name": f"reacher-hard", + "time_limit": 20, + "episode_length": 1000, + "wrappers": [suite.reacher.MPWrapper], + "mp_kwargs": { + "num_dof": 2, + "num_basis": 5, + "duration": 20, + "learn_goal": True, + "alpha_phase": 2, + "bandwidth_factor": 2, + "policy_type": "motor", + "weights_scale": 50, + "goal_scale": 0.1, + "policy_kwargs": { + "p_gains": 50, + "d_gains": 1 + } + } + } +) +ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"].append("dmc_reacher-hard_dmp-v0") + +register( + id=f'dmc_reacher-hard_detpmp-v0', + entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', + kwargs={ + "name": f"reacher-hard", + "time_limit": 20, + "episode_length": 1000, + "wrappers": [suite.reacher.MPWrapper], + "mp_kwargs": { + "num_dof": 2, + "num_basis": 5, + "duration": 20, + "width": 0.025, + "policy_type": "motor", + "weights_scale": 0.2, + "zero_start": True, + "policy_kwargs": { + "p_gains": 50, + "d_gains": 1 + } + } + } +) +ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append("dmc_reacher-hard_detpmp-v0") + +_dmc_cartpole_tasks = ["balance", "balance_sparse", "swingup", "swingup_sparse"] + +for _task in _dmc_cartpole_tasks: + _env_id = f'dmc_cartpole-{_task}_dmp-v0' + register( + id=_env_id, + entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper', + # max_episode_steps=1, + kwargs={ + "name": f"cartpole-{_task}", + # "time_limit": 1, + "camera_id": 0, + "episode_length": 1000, + "wrappers": [suite.cartpole.MPWrapper], + "mp_kwargs": { + "num_dof": 1, + "num_basis": 5, + "duration": 10, + "learn_goal": True, + "alpha_phase": 2, + "bandwidth_factor": 2, + "policy_type": "motor", + "weights_scale": 50, + "goal_scale": 0.1, + "policy_kwargs": { + "p_gains": 10, + "d_gains": 10 + } + } + } + ) + ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id) + + _env_id = f'dmc_cartpole-{_task}_detpmp-v0' + register( + id=_env_id, + entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', + kwargs={ + "name": f"cartpole-{_task}", + # "time_limit": 1, + "camera_id": 0, + "episode_length": 1000, + "wrappers": [suite.cartpole.MPWrapper], + "mp_kwargs": { + "num_dof": 1, + "num_basis": 5, + "duration": 10, + "width": 0.025, + "policy_type": "motor", + "weights_scale": 0.2, + "zero_start": True, + "policy_kwargs": { + "p_gains": 10, + "d_gains": 10 + } + } + } + ) + ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append(_env_id) + +_env_id = f'dmc_cartpole-two_poles_dmp-v0' +register( + id=_env_id, + entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper', + # max_episode_steps=1, + kwargs={ + "name": f"cartpole-two_poles", + # "time_limit": 1, + "camera_id": 0, + "episode_length": 1000, + "wrappers": [suite.cartpole.TwoPolesMPWrapper], + "mp_kwargs": { + "num_dof": 1, + "num_basis": 5, + "duration": 10, + "learn_goal": True, + "alpha_phase": 2, + "bandwidth_factor": 2, + "policy_type": "motor", + "weights_scale": 50, + "goal_scale": 0.1, + "policy_kwargs": { + "p_gains": 10, + "d_gains": 10 + } + } + } +) +ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id) + +_env_id = f'dmc_cartpole-two_poles_detpmp-v0' +register( + id=_env_id, + entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', + kwargs={ + "name": f"cartpole-two_poles", + # "time_limit": 1, + "camera_id": 0, + "episode_length": 1000, + "wrappers": [suite.cartpole.TwoPolesMPWrapper], + "mp_kwargs": { + "num_dof": 1, + "num_basis": 5, + "duration": 10, + "width": 0.025, + "policy_type": "motor", + "weights_scale": 0.2, + "zero_start": True, + "policy_kwargs": { + "p_gains": 10, + "d_gains": 10 + } + } + } +) +ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append(_env_id) + +_env_id = f'dmc_cartpole-three_poles_dmp-v0' +register( + id=_env_id, + entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper', + # max_episode_steps=1, + kwargs={ + "name": f"cartpole-three_poles", + # "time_limit": 1, + "camera_id": 0, + "episode_length": 1000, + "wrappers": [suite.cartpole.ThreePolesMPWrapper], + "mp_kwargs": { + "num_dof": 1, + "num_basis": 5, + "duration": 10, + "learn_goal": True, + "alpha_phase": 2, + "bandwidth_factor": 2, + "policy_type": "motor", + "weights_scale": 50, + "goal_scale": 0.1, + "policy_kwargs": { + "p_gains": 10, + "d_gains": 10 + } + } + } +) +ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id) + +_env_id = f'dmc_cartpole-three_poles_detpmp-v0' +register( + id=_env_id, + entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', + kwargs={ + "name": f"cartpole-three_poles", + # "time_limit": 1, + "camera_id": 0, + "episode_length": 1000, + "wrappers": [suite.cartpole.ThreePolesMPWrapper], + "mp_kwargs": { + "num_dof": 1, + "num_basis": 5, + "duration": 10, + "width": 0.025, + "policy_type": "motor", + "weights_scale": 0.2, + "zero_start": True, + "policy_kwargs": { + "p_gains": 10, + "d_gains": 10 + } + } + } +) +ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append(_env_id) + +# DeepMind Manipulation + +register( + id=f'dmc_manipulation-reach_site_dmp-v0', + entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper', + # max_episode_steps=1, + kwargs={ + "name": f"manipulation-reach_site_features", + # "time_limit": 1, + "episode_length": 250, + "wrappers": [manipulation.reach_site.MPWrapper], + "mp_kwargs": { + "num_dof": 9, + "num_basis": 5, + "duration": 10, + "learn_goal": True, + "alpha_phase": 2, + "bandwidth_factor": 2, + "policy_type": "velocity", + "weights_scale": 50, + "goal_scale": 0.1, + } + } +) +ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"].append("dmc_manipulation-reach_site_dmp-v0") + +register( + id=f'dmc_manipulation-reach_site_detpmp-v0', + entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', + kwargs={ + "name": f"manipulation-reach_site_features", + # "time_limit": 1, + "episode_length": 250, + "wrappers": [manipulation.reach_site.MPWrapper], + "mp_kwargs": { + "num_dof": 9, + "num_basis": 5, + "duration": 10, + "width": 0.025, + "policy_type": "velocity", + "weights_scale": 0.2, + "zero_start": True, + } + } +) +ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append("dmc_manipulation-reach_site_detpmp-v0") diff --git a/alr_envs/utils/dmc_wrapper.py b/alr_envs/dmc/dmc_wrapper.py similarity index 100% rename from alr_envs/utils/dmc_wrapper.py rename to alr_envs/dmc/dmc_wrapper.py diff --git a/alr_envs/dmc/manipulation/__init__.py b/alr_envs/dmc/manipulation/__init__.py index e69de29..f3a4147 100644 --- a/alr_envs/dmc/manipulation/__init__.py +++ b/alr_envs/dmc/manipulation/__init__.py @@ -0,0 +1 @@ +from . import reach_site diff --git a/alr_envs/dmc/manipulation/reach/__init__.py b/alr_envs/dmc/manipulation/reach_site/__init__.py similarity index 100% rename from alr_envs/dmc/manipulation/reach/__init__.py rename to alr_envs/dmc/manipulation/reach_site/__init__.py diff --git a/alr_envs/dmc/manipulation/reach/mp_wrapper.py b/alr_envs/dmc/manipulation/reach_site/mp_wrapper.py similarity index 100% rename from alr_envs/dmc/manipulation/reach/mp_wrapper.py rename to alr_envs/dmc/manipulation/reach_site/mp_wrapper.py diff --git a/alr_envs/dmc/suite/__init__.py b/alr_envs/dmc/suite/__init__.py index e69de29..7d206fc 100644 --- a/alr_envs/dmc/suite/__init__.py +++ b/alr_envs/dmc/suite/__init__.py @@ -0,0 +1 @@ +from . import cartpole, ball_in_cup, reacher diff --git a/alr_envs/dmc/suite/cartpole/__init__.py b/alr_envs/dmc/suite/cartpole/__init__.py index 823077a..c5f9bee 100644 --- a/alr_envs/dmc/suite/cartpole/__init__.py +++ b/alr_envs/dmc/suite/cartpole/__init__.py @@ -1,3 +1,3 @@ from .mp_wrapper import MPWrapper from .mp_wrapper import TwoPolesMPWrapper -from .mp_wrapper import ThreePolesMPWrapper \ No newline at end of file +from .mp_wrapper import ThreePolesMPWrapper diff --git a/alr_envs/examples/examples_dmc.py b/alr_envs/examples/examples_dmc.py index b29329d..2d310c4 100644 --- a/alr_envs/examples/examples_dmc.py +++ b/alr_envs/examples/examples_dmc.py @@ -1,5 +1,4 @@ import alr_envs -from alr_envs.dmc.suite.ball_in_cup.mp_wrapper import MPWrapper def example_dmc(env_id="fish-swim", seed=1, iterations=1000, render=True): @@ -17,7 +16,7 @@ def example_dmc(env_id="fish-swim", seed=1, iterations=1000, render=True): Returns: """ - env = alr_envs.make_env(env_id, seed) + env = alr_envs.make(env_id, seed) rewards = 0 obs = env.reset() print("observation shape:", env.observation_space.shape) @@ -62,29 +61,29 @@ def example_custom_dmc_and_mp(seed=1, iterations=1, render=True): # Replace this wrapper with the custom wrapper for your environment by inheriting from the MPEnvWrapper. # You can also add other gym.Wrappers in case they are needed. - wrappers = [MPWrapper] + wrappers = [alr_envs.dmc.suite.ball_in_cup.MPWrapper] mp_kwargs = { - "num_dof": 2, - "num_basis": 5, - "duration": 20, - "learn_goal": True, + "num_dof": 2, # degrees of fredom a.k.a. the old action space dimensionality + "num_basis": 5, # number of basis functions, the new action space has size num_dof x num_basis + "duration": 20, # length of trajectory in s, number of steps = duration / dt + "learn_goal": True, # learn the goal position (recommended) "alpha_phase": 2, "bandwidth_factor": 2, - "policy_type": "motor", - "weights_scale": 50, - "goal_scale": 0.1, - "policy_kwargs": { + "policy_type": "motor", # controller type, 'velocity', 'position', and 'motor' (torque control) + "weights_scale": 1, # scaling of MP weights + "goal_scale": 1, # scaling of learned goal position + "policy_kwargs": { # only required for torque control/PD-Controller "p_gains": 0.2, "d_gains": 0.05 } } kwargs = { - "time_limit": 20, - "episode_length": 1000, + "time_limit": 20, # same as duration value but as max horizon for underlying DMC environment + "episode_length": 1000, # corresponding number of episode steps # "frame_skip": 1 } env = alr_envs.make_dmp_env(base_env, wrappers=wrappers, seed=seed, mp_kwargs=mp_kwargs, **kwargs) - # OR for a deterministic ProMP: + # OR for a deterministic ProMP (other mp_kwargs are required, see metaworld_examples): # env = alr_envs.make_detpmp_env(base_env, wrappers=wrappers, seed=seed, mp_kwargs=mp_args) # This renders the full MP trajectory diff --git a/alr_envs/examples/examples_general.py b/alr_envs/examples/examples_general.py index 88d79d5..d043d6d 100644 --- a/alr_envs/examples/examples_general.py +++ b/alr_envs/examples/examples_general.py @@ -21,7 +21,7 @@ def example_general(env_id="Pendulum-v0", seed=1, iterations=1000, render=True): """ - env = alr_envs.make_env(env_id, seed) + env = alr_envs.make(env_id, seed) rewards = 0 obs = env.reset() print("Observation shape: ", env.observation_space.shape) @@ -56,7 +56,7 @@ def example_async(env_id="alr_envs:HoleReacher-v0", n_cpu=4, seed=int('533D', 16 Returns: Tuple of (obs, reward, done, info) with type np.ndarray """ - env = gym.vector.AsyncVectorEnv([alr_envs.make_env_rank(env_id, seed, i) for i in range(n_cpu)]) + env = gym.vector.AsyncVectorEnv([alr_envs.make_rank(env_id, seed, i) for i in range(n_cpu)]) # OR # envs = gym.vector.AsyncVectorEnv([make_env(env_id, seed + i) for i in range(n_cpu)]) @@ -80,20 +80,21 @@ def example_async(env_id="alr_envs:HoleReacher-v0", n_cpu=4, seed=int('533D', 16 rewards[done] = 0 # do not return values above threshold - return *map(lambda v: np.stack(v)[:n_samples], buffer.values()), + return (*map(lambda v: np.stack(v)[:n_samples], buffer.values()),) if __name__ == '__main__': - render = False + render = True + # Basic gym task example_general("Pendulum-v0", seed=10, iterations=200, render=render) - # + # # Basis task from framework example_general("alr_envs:HoleReacher-v0", seed=10, iterations=200, render=render) - # + # # OpenAI Mujoco task example_general("HalfCheetah-v2", seed=10, render=render) - # + # # Mujoco task from framework example_general("alr_envs:ALRReacher-v0", seed=10, iterations=200, render=render) diff --git a/alr_envs/examples/examples_metaworld.py b/alr_envs/examples/examples_metaworld.py new file mode 100644 index 0000000..e88ed6c --- /dev/null +++ b/alr_envs/examples/examples_metaworld.py @@ -0,0 +1,128 @@ +import alr_envs + + +def example_dmc(env_id="fish-swim", seed=1, iterations=1000, render=True): + """ + Example for running a MetaWorld based env in the step based setting. + The env_id has to be specified as `task_name-v2`. V1 versions are not supported and we always + return the observable goal version. + All tasks can be found here: https://arxiv.org/pdf/1910.10897.pdf or https://meta-world.github.io/ + + Args: + env_id: `task_name-v2` + seed: seed for deterministic behaviour (TODO: currently not working due to an issue in MetaWorld code) + iterations: Number of rollout steps to run + render: Render the episode + + Returns: + + """ + env = alr_envs.make(env_id, seed) + rewards = 0 + obs = env.reset() + print("observation shape:", env.observation_space.shape) + print("action shape:", env.action_space.shape) + + for i in range(iterations): + ac = env.action_space.sample() + obs, reward, done, info = env.step(ac) + rewards += reward + + if render: + # THIS NEEDS TO BE SET TO FALSE FOR NOW, BECAUSE THE INTERFACE FOR RENDERING IS DIFFERENT TO BASIC GYM + # TODO: Remove this, when Metaworld fixes its interface. + env.render(False) + + if done: + print(env_id, rewards) + rewards = 0 + obs = env.reset() + + env.close() + del env + + +def example_custom_dmc_and_mp(seed=1, iterations=1, render=True): + """ + Example for running a custom motion primitive based environments. + Our already registered environments follow the same structure. + Hence, this also allows to adjust hyperparameters of the motion primitives. + Yet, we recommend the method above if you are just interested in chaining those parameters for existing tasks. + We appreciate PRs for custom environments (especially MP wrappers of existing tasks) + for our repo: https://github.com/ALRhub/alr_envs/ + Args: + seed: seed for deterministic behaviour (TODO: currently not working due to an issue in MetaWorld code) + iterations: Number of rollout steps to run + render: Render the episode (TODO: currently not working due to an issue in MetaWorld code) + + Returns: + + """ + + # Base MetaWorld name, according to structure of above example + base_env = "button-press-v2" + + # Replace this wrapper with the custom wrapper for your environment by inheriting from the MPEnvWrapper. + # You can also add other gym.Wrappers in case they are needed. + wrappers = [alr_envs.meta.goal_and_object_change.MPWrapper] + mp_kwargs = { + "num_dof": 4, # degrees of fredom a.k.a. the old action space dimensionality + "num_basis": 5, # number of basis functions, the new action space has size num_dof x num_basis + "duration": 6.25, # length of trajectory in s, number of steps = duration / dt + "post_traj_time": 0, # pad trajectory with additional zeros at the end (recommended: 0) + "width": 0.025, # width of the basis functions + "zero_start": True, # start from current environment position if True + "weights_scale": 1, # scaling of MP weights + "policy_type": "metaworld", # custom controller type for metaworld environments + } + + env = alr_envs.make_detpmp_env(base_env, wrappers=wrappers, seed=seed, mp_kwargs=mp_kwargs) + # OR for a DMP (other mp_kwargs are required, see dmc_examples): + # env = alr_envs.make_dmp_env(base_env, wrappers=wrappers, seed=seed, mp_kwargs=mp_kwargs, **kwargs) + + # This renders the full MP trajectory + # It is only required to call render() once in the beginning, which renders every consecutive trajectory. + # Resetting to no rendering, can be achieved by render(mode=None). + # It is also possible to change them mode multiple times when + # e.g. only every nth trajectory should be displayed. + if render: + raise ValueError("Metaworld render interface bug does not allow to render() fixes its interface. " + "A temporary workaround is to alter their code in MujocoEnv render() from " + "`if not offscreen` to `if not offscreen or offscreen == 'human'`.") + # TODO: Remove this, when Metaworld fixes its interface. + # env.render(mode="human") + + rewards = 0 + obs = env.reset() + + # number of samples/full trajectories (multiple environment steps) + for i in range(iterations): + ac = env.action_space.sample() + obs, reward, done, info = env.step(ac) + rewards += reward + + if done: + print(base_env, rewards) + rewards = 0 + obs = env.reset() + + env.close() + del env + + +if __name__ == '__main__': + # Disclaimer: MetaWorld environments require the seed to be specified in the beginning. + # Adjusting it afterwards with env.seed() is not recommended as it may not affect the underlying behavior. + + # For rendering it might be necessary to specify your OpenGL installation + # export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libGLEW.so + render = False + + # # Standard DMC Suite tasks + example_dmc("button-press-v2", seed=10, iterations=500, render=render) + + # MP + MetaWorld hybrid task provided in the our framework + example_dmc("ButtonPressDetPMP-v2", seed=10, iterations=1, render=render) + + # Custom MetaWorld task + example_custom_dmc_and_mp(seed=10, iterations=1, render=render) diff --git a/alr_envs/examples/examples_motion_primitives.py b/alr_envs/examples/examples_motion_primitives.py index 480b58d..6decdb1 100644 --- a/alr_envs/examples/examples_motion_primitives.py +++ b/alr_envs/examples/examples_motion_primitives.py @@ -1,5 +1,4 @@ -from alr_envs import MPWrapper -from alr_envs.utils.make_env_helpers import make_dmp_env, make_env +import alr_envs def example_mp(env_name="alr_envs:HoleReacherDMP-v1", seed=1, iterations=1, render=True): @@ -16,7 +15,7 @@ def example_mp(env_name="alr_envs:HoleReacherDMP-v1", seed=1, iterations=1, rend """ # While in this case gym.make() is possible to use as well, we recommend our custom make env function. # First, it already takes care of seeding and second enables the use of DMC tasks within the gym interface. - env = make_env(env_name, seed) + env = alr_envs.make(env_name, seed) rewards = 0 # env.render(mode=None) @@ -71,7 +70,7 @@ def example_custom_mp(env_name="alr_envs:HoleReacherDMP-v1", seed=1, iterations= "weights_scale": 50, "goal_scale": 0.1 } - env = make_env(env_name, seed, mp_kwargs=mp_kwargs) + env = alr_envs.make(env_name, seed, mp_kwargs=mp_kwargs) # This time rendering every trajectory if render: @@ -113,7 +112,7 @@ def example_fully_custom_mp(seed=1, iterations=1, render=True): # Replace this wrapper with the custom wrapper for your environment by inheriting from the MPEnvWrapper. # You can also add other gym.Wrappers in case they are needed. - wrappers = [MPWrapper] + wrappers = [alr_envs.classic_control.hole_reacher.MPWrapper] mp_kwargs = { "num_dof": 5, "num_basis": 5, @@ -125,7 +124,7 @@ def example_fully_custom_mp(seed=1, iterations=1, render=True): "weights_scale": 50, "goal_scale": 0.1 } - env = make_dmp_env(base_env, wrappers=wrappers, seed=seed, mp_kwargs=mp_kwargs) + env = alr_envs.make_dmp_env(base_env, wrappers=wrappers, seed=seed, mp_kwargs=mp_kwargs) # OR for a deterministic ProMP: # env = make_detpmp_env(base_env, wrappers=wrappers, seed=seed, mp_kwargs=mp_kwargs) diff --git a/alr_envs/examples/examples_open_ai.py b/alr_envs/examples/examples_open_ai.py index d001bc8..9f90be5 100644 --- a/alr_envs/examples/examples_open_ai.py +++ b/alr_envs/examples/examples_open_ai.py @@ -1,4 +1,4 @@ -from alr_envs.utils.make_env_helpers import make_env +import alr_envs def example_mp(env_name, seed=1): @@ -13,7 +13,7 @@ def example_mp(env_name, seed=1): """ # While in this case gym.make() is possible to use as well, we recommend our custom make env function. - env = make_env(env_name, seed) + env = alr_envs.make(env_name, seed) rewards = 0 obs = env.reset() @@ -29,13 +29,13 @@ def example_mp(env_name, seed=1): rewards = 0 obs = env.reset() + if __name__ == '__main__': # DMP - not supported yet - #example_mp("ReacherDetPMP-v2") + # example_mp("ReacherDMP-v2") # DetProMP example_mp("ContinuousMountainCarDetPMP-v0") example_mp("ReacherDetPMP-v2") example_mp("FetchReachDenseDetPMP-v1") example_mp("FetchSlideDenseDetPMP-v1") - diff --git a/alr_envs/examples/pd_control_gain_tuning.py b/alr_envs/examples/pd_control_gain_tuning.py index 55ab1c0..3fb8251 100644 --- a/alr_envs/examples/pd_control_gain_tuning.py +++ b/alr_envs/examples/pd_control_gain_tuning.py @@ -1,30 +1,30 @@ import numpy as np from matplotlib import pyplot as plt -from alr_envs import dmc +from alr_envs import dmc, meta from alr_envs.utils.make_env_helpers import make_detpmp_env # This might work for some environments, however, please verify either way the correct trajectory information # for your environment are extracted below SEED = 10 -env_id = "cartpole-swingup" -wrappers = [dmc.suite.cartpole.MPWrapper] +env_id = "ball_in_cup-catch" +wrappers = [dmc.ball_in_cup.MPWrapper] mp_kwargs = { - "num_dof": 1, - "num_basis": 5, + "num_dof": 2, + "num_basis": 10, "duration": 2, "width": 0.025, "policy_type": "motor", - "weights_scale": 0.2, + "weights_scale": 1, "zero_start": True, "policy_kwargs": { - "p_gains": 10, - "d_gains": 10 # a good starting point is the sqrt of p_gains + "p_gains": 1, + "d_gains": 1 } } -kwargs = dict(time_limit=2, episode_length=200) +kwargs = dict(time_limit=2, episode_length=100) env = make_detpmp_env(env_id, wrappers, seed=SEED, mp_kwargs=mp_kwargs, **kwargs) @@ -35,7 +35,6 @@ pos, vel = env.mp_rollout(env.action_space.sample()) base_shape = env.full_action_space.shape actual_pos = np.zeros((len(pos), *base_shape)) -actual_pos_ball = np.zeros((len(pos), *base_shape)) actual_vel = np.zeros((len(pos), *base_shape)) act = np.zeros((len(pos), *base_shape)) @@ -46,7 +45,6 @@ for t, pos_vel in enumerate(zip(pos, vel)): act[t, :] = actions # TODO verify for your environment actual_pos[t, :] = env.current_pos - # actual_pos_ball[t, :] = env.physics.data.qpos[2:] actual_vel[t, :] = env.current_vel plt.figure(figsize=(15, 5)) diff --git a/alr_envs/meta/README.MD b/alr_envs/meta/README.MD new file mode 100644 index 0000000..c8d9cd1 --- /dev/null +++ b/alr_envs/meta/README.MD @@ -0,0 +1,26 @@ +# MetaWorld Wrappers + +These are the Environment Wrappers for selected [Metaworld](https://meta-world.github.io/) environments in order to use our Motion Primitive gym interface with them. +All Metaworld environments have a 39 dimensional observation space with the same structure. The tasks differ only in the objective and the initial observations that are randomized. +Unused observations are zeroed out. E.g. for `Button-Press-v2` the observation mask looks the following: +```python + return np.hstack([ + # Current observation + [False] * 3, # end-effector position + [False] * 1, # normalized gripper open distance + [True] * 3, # main object position + [False] * 4, # main object quaternion + [False] * 3, # secondary object position + [False] * 4, # secondary object quaternion + # Previous observation + [False] * 3, # previous end-effector position + [False] * 1, # previous normalized gripper open distance + [False] * 3, # previous main object position + [False] * 4, # previous main object quaternion + [False] * 3, # previous second object position + [False] * 4, # previous second object quaternion + # Goal + [True] * 3, # goal position + ]) +``` +For other tasks only the boolean values have to be adjusted accordingly. \ No newline at end of file diff --git a/alr_envs/meta/__init__.py b/alr_envs/meta/__init__.py new file mode 100644 index 0000000..8cf1784 --- /dev/null +++ b/alr_envs/meta/__init__.py @@ -0,0 +1,114 @@ +from gym import register + +from . import goal_object_change_mp_wrapper, goal_change_mp_wrapper, goal_endeffector_change_mp_wrapper, \ + object_change_mp_wrapper + +ALL_METAWORLD_MOTION_PRIMITIVE_ENVIRONMENTS = {"DMP": [], "DetPMP": []} + +# MetaWorld + +_goal_change_envs = ["assembly-v2", "pick-out-of-hole-v2", "plate-slide-v2", "plate-slide-back-v2"] +for _task in _goal_change_envs: + task_id_split = _task.split("-") + name = "".join([s.capitalize() for s in task_id_split[:-1]]) + _env_id = f'{name}DetPMP-{task_id_split[-1]}' + register( + id=_env_id, + entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', + kwargs={ + "name": _task, + "wrappers": [goal_change_mp_wrapper.MPWrapper], + "mp_kwargs": { + "num_dof": 4, + "num_basis": 5, + "duration": 6.25, + "post_traj_time": 0, + "width": 0.025, + "zero_start": True, + "policy_type": "metaworld", + } + } + ) + ALL_METAWORLD_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append(_env_id) + +_object_change_envs = ["bin-picking-v2", "hammer-v2", "sweep-into-v2"] +for _task in _object_change_envs: + task_id_split = _task.split("-") + name = "".join([s.capitalize() for s in task_id_split[:-1]]) + _env_id = f'{name}DetPMP-{task_id_split[-1]}' + register( + id=_env_id, + entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', + kwargs={ + "name": _task, + "wrappers": [object_change_mp_wrapper.MPWrapper], + "mp_kwargs": { + "num_dof": 4, + "num_basis": 5, + "duration": 6.25, + "post_traj_time": 0, + "width": 0.025, + "zero_start": True, + "policy_type": "metaworld", + } + } + ) + ALL_METAWORLD_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append(_env_id) + +_goal_and_object_change_envs = ["box-close-v2", "button-press-v2", "button-press-wall-v2", "button-press-topdown-v2", + "button-press-topdown-wall-v2", "coffee-button-v2", "coffee-pull-v2", + "coffee-push-v2", "dial-turn-v2", "disassemble-v2", "door-close-v2", + "door-lock-v2", "door-open-v2", "door-unlock-v2", "hand-insert-v2", + "drawer-close-v2", "drawer-open-v2", "faucet-open-v2", "faucet-close-v2", + "handle-press-side-v2", "handle-press-v2", "handle-pull-side-v2", + "handle-pull-v2", "lever-pull-v2", "peg-insert-side-v2", "pick-place-wall-v2", + "reach-v2", "push-back-v2", "push-v2", "pick-place-v2", "peg-unplug-side-v2", + "soccer-v2", "stick-push-v2", "stick-pull-v2", "push-wall-v2", "reach-wall-v2", + "shelf-place-v2", "sweep-v2", "window-open-v2", "window-close-v2" + ] +for _task in _goal_and_object_change_envs: + task_id_split = _task.split("-") + name = "".join([s.capitalize() for s in task_id_split[:-1]]) + _env_id = f'{name}DetPMP-{task_id_split[-1]}' + register( + id=_env_id, + entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', + kwargs={ + "name": _task, + "wrappers": [goal_object_change_mp_wrapper.MPWrapper], + "mp_kwargs": { + "num_dof": 4, + "num_basis": 5, + "duration": 6.25, + "post_traj_time": 0, + "width": 0.025, + "zero_start": True, + "policy_type": "metaworld", + } + } + ) + ALL_METAWORLD_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append(_env_id) + +_goal_and_endeffector_change_envs = ["basketball-v2"] +for _task in _goal_and_endeffector_change_envs: + task_id_split = _task.split("-") + name = "".join([s.capitalize() for s in task_id_split[:-1]]) + _env_id = f'{name}DetPMP-{task_id_split[-1]}' + register( + id=_env_id, + entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', + kwargs={ + "name": _task, + "wrappers": [goal_endeffector_change_mp_wrapper.MPWrapper], + "mp_kwargs": { + "num_dof": 4, + "num_basis": 5, + "duration": 6.25, + "post_traj_time": 0, + "width": 0.025, + "zero_start": True, + "policy_type": "metaworld", + } + } + ) + ALL_METAWORLD_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append(_env_id) diff --git a/alr_envs/meta/goal_change_mp_wrapper.py b/alr_envs/meta/goal_change_mp_wrapper.py new file mode 100644 index 0000000..15f4de9 --- /dev/null +++ b/alr_envs/meta/goal_change_mp_wrapper.py @@ -0,0 +1,68 @@ +from typing import Tuple, Union + +import numpy as np + +from mp_env_api import MPEnvWrapper + + +class MPWrapper(MPEnvWrapper): + """ + This Wrapper is for environments where merely the goal changes in the beginning + and no secondary objects or end effectors are altered at the start of an episode. + You can verify this by executing the code below for your environment id and check if the output is non-zero + at the same indices. + ```python + import alr_envs + env = alr_envs.make(env_id, 1) + print(env.reset() - env.reset()) + array([ 0. , 0. , 0. , 0. , 0, + 0 , 0 , 0. , 0. , 0. , + 0. , 0. , 0. , 0. , 0. , + 0. , 0. , 0. , 0. , 0. , + 0. , 0. , 0 , 0 , 0 , + 0. , 0. , 0. , 0. , 0. , + 0. , 0. , 0. , 0. , 0. , + 0. , !=0 , !=0 , !=0]) + ``` + """ + + @property + def active_obs(self): + # This structure is the same for all metaworld environments. + # Only the observations which change could differ + return np.hstack([ + # Current observation + [False] * 3, # end-effector position + [False] * 1, # normalized gripper open distance + [False] * 3, # main object position + [False] * 4, # main object quaternion + [False] * 3, # secondary object position + [False] * 4, # secondary object quaternion + # Previous observation + # TODO: Include previous values? According to their source they might be wrong for the first iteration. + [False] * 3, # previous end-effector position + [False] * 1, # previous normalized gripper open distance + [False] * 3, # previous main object position + [False] * 4, # previous main object quaternion + [False] * 3, # previous second object position + [False] * 4, # previous second object quaternion + # Goal + [True] * 3, # goal position + ]) + + @property + def current_pos(self) -> Union[float, int, np.ndarray]: + r_close = self.env.data.get_joint_qpos("r_close") + return np.hstack([self.env.data.mocap_pos.flatten(), r_close]) + + @property + def current_vel(self) -> Union[float, int, np.ndarray, Tuple]: + raise NotImplementedError("Velocity cannot be retrieved.") + + @property + def goal_pos(self) -> Union[float, int, np.ndarray, Tuple]: + raise ValueError("Goal position is not available and has to be learnt based on the environment.") + + @property + def dt(self) -> Union[float, int]: + return self.env.dt diff --git a/alr_envs/meta/goal_endeffector_change_mp_wrapper.py b/alr_envs/meta/goal_endeffector_change_mp_wrapper.py new file mode 100644 index 0000000..95c29e3 --- /dev/null +++ b/alr_envs/meta/goal_endeffector_change_mp_wrapper.py @@ -0,0 +1,68 @@ +from typing import Tuple, Union + +import numpy as np + +from mp_env_api import MPEnvWrapper + + +class MPWrapper(MPEnvWrapper): + """ + This Wrapper is for environments where merely the goal changes in the beginning + and no secondary objects or end effectors are altered at the start of an episode. + You can verify this by executing the code below for your environment id and check if the output is non-zero + at the same indices. + ```python + import alr_envs + env = alr_envs.make(env_id, 1) + print(env.reset() - env.reset()) + array([ !=0 , !=0 , !=0 , 0. , 0., + 0. , 0. , 0. , 0. , 0. , + 0. , 0. , 0. , 0. , 0. , + 0. , 0. , 0. , !=0 , !=0 , + !=0 , 0. , 0. , 0. , 0. , + 0. , 0. , 0. , 0. , 0. , + 0. , 0. , 0. , 0. , 0. , + 0. , !=0 , !=0 , !=0]) + ``` + """ + + @property + def active_obs(self): + # This structure is the same for all metaworld environments. + # Only the observations which change could differ + return np.hstack([ + # Current observation + [True] * 3, # end-effector position + [False] * 1, # normalized gripper open distance + [False] * 3, # main object position + [False] * 4, # main object quaternion + [False] * 3, # secondary object position + [False] * 4, # secondary object quaternion + # Previous observation + # TODO: Include previous values? According to their source they might be wrong for the first iteration. + [False] * 3, # previous end-effector position + [False] * 1, # previous normalized gripper open distance + [False] * 3, # previous main object position + [False] * 4, # previous main object quaternion + [False] * 3, # previous second object position + [False] * 4, # previous second object quaternion + # Goal + [True] * 3, # goal position + ]) + + @property + def current_pos(self) -> Union[float, int, np.ndarray]: + r_close = self.env.data.get_joint_qpos("r_close") + return np.hstack([self.env.data.mocap_pos.flatten(), r_close]) + + @property + def current_vel(self) -> Union[float, int, np.ndarray, Tuple]: + raise NotImplementedError("Velocity cannot be retrieved.") + + @property + def goal_pos(self) -> Union[float, int, np.ndarray, Tuple]: + raise ValueError("Goal position is not available and has to be learnt based on the environment.") + + @property + def dt(self) -> Union[float, int]: + return self.env.dt diff --git a/alr_envs/meta/goal_object_change_mp_wrapper.py b/alr_envs/meta/goal_object_change_mp_wrapper.py new file mode 100644 index 0000000..6811ace --- /dev/null +++ b/alr_envs/meta/goal_object_change_mp_wrapper.py @@ -0,0 +1,68 @@ +from typing import Tuple, Union + +import numpy as np + +from mp_env_api import MPEnvWrapper + + +class MPWrapper(MPEnvWrapper): + """ + This Wrapper is for environments where merely the goal changes in the beginning + and no secondary objects or end effectors are altered at the start of an episode. + You can verify this by executing the code below for your environment id and check if the output is non-zero + at the same indices. + ```python + import alr_envs + env = alr_envs.make(env_id, 1) + print(env.reset() - env.reset()) + array([ 0. , 0. , 0. , 0. , !=0, + !=0 , !=0 , 0. , 0. , 0. , + 0. , 0. , 0. , 0. , 0. , + 0. , 0. , 0. , 0. , 0. , + 0. , 0. , !=0 , !=0 , !=0 , + 0. , 0. , 0. , 0. , 0. , + 0. , 0. , 0. , 0. , 0. , + 0. , !=0 , !=0 , !=0]) + ``` + """ + + @property + def active_obs(self): + # This structure is the same for all metaworld environments. + # Only the observations which change could differ + return np.hstack([ + # Current observation + [False] * 3, # end-effector position + [False] * 1, # normalized gripper open distance + [True] * 3, # main object position + [False] * 4, # main object quaternion + [False] * 3, # secondary object position + [False] * 4, # secondary object quaternion + # Previous observation + # TODO: Include previous values? According to their source they might be wrong for the first iteration. + [False] * 3, # previous end-effector position + [False] * 1, # previous normalized gripper open distance + [False] * 3, # previous main object position + [False] * 4, # previous main object quaternion + [False] * 3, # previous second object position + [False] * 4, # previous second object quaternion + # Goal + [True] * 3, # goal position + ]) + + @property + def current_pos(self) -> Union[float, int, np.ndarray]: + r_close = self.env.data.get_joint_qpos("r_close") + return np.hstack([self.env.data.mocap_pos.flatten(), r_close]) + + @property + def current_vel(self) -> Union[float, int, np.ndarray, Tuple]: + raise NotImplementedError("Velocity cannot be retrieved.") + + @property + def goal_pos(self) -> Union[float, int, np.ndarray, Tuple]: + raise ValueError("Goal position is not available and has to be learnt based on the environment.") + + @property + def dt(self) -> Union[float, int]: + return self.env.dt diff --git a/alr_envs/meta/object_change_mp_wrapper.py b/alr_envs/meta/object_change_mp_wrapper.py new file mode 100644 index 0000000..c7e506b --- /dev/null +++ b/alr_envs/meta/object_change_mp_wrapper.py @@ -0,0 +1,68 @@ +from typing import Tuple, Union + +import numpy as np + +from mp_env_api import MPEnvWrapper + + +class MPWrapper(MPEnvWrapper): + """ + This Wrapper is for environments where merely the goal changes in the beginning + and no secondary objects or end effectors are altered at the start of an episode. + You can verify this by executing the code below for your environment id and check if the output is non-zero + at the same indices. + ```python + import alr_envs + env = alr_envs.make(env_id, 1) + print(env.reset() - env.reset()) + array([ 0. , 0. , 0. , 0. , !=0 , + !=0 , !=0 , 0. , 0. , 0. , + 0. , 0. , 0. , 0. , 0. , + 0. , 0. , 0. , 0. , 0. , + 0. , 0. , 0 , 0 , 0 , + 0. , 0. , 0. , 0. , 0. , + 0. , 0. , 0. , 0. , 0. , + 0. , 0. , 0. , 0.]) + ``` + """ + + @property + def active_obs(self): + # This structure is the same for all metaworld environments. + # Only the observations which change could differ + return np.hstack([ + # Current observation + [False] * 3, # end-effector position + [False] * 1, # normalized gripper open distance + [False] * 3, # main object position + [False] * 4, # main object quaternion + [False] * 3, # secondary object position + [False] * 4, # secondary object quaternion + # Previous observation + # TODO: Include previous values? According to their source they might be wrong for the first iteration. + [False] * 3, # previous end-effector position + [False] * 1, # previous normalized gripper open distance + [False] * 3, # previous main object position + [False] * 4, # previous main object quaternion + [False] * 3, # previous second object position + [False] * 4, # previous second object quaternion + # Goal + [True] * 3, # goal position + ]) + + @property + def current_pos(self) -> Union[float, int, np.ndarray]: + r_close = self.env.data.get_joint_qpos("r_close") + return np.hstack([self.env.data.mocap_pos.flatten(), r_close]) + + @property + def current_vel(self) -> Union[float, int, np.ndarray, Tuple]: + raise NotImplementedError("Velocity cannot be retrieved.") + + @property + def goal_pos(self) -> Union[float, int, np.ndarray, Tuple]: + raise ValueError("Goal position is not available and has to be learnt based on the environment.") + + @property + def dt(self) -> Union[float, int]: + return self.env.dt diff --git a/alr_envs/mujoco/__init__.py b/alr_envs/mujoco/__init__.py deleted file mode 100644 index d86e772..0000000 --- a/alr_envs/mujoco/__init__.py +++ /dev/null @@ -1,4 +0,0 @@ -from alr_envs.mujoco.reacher.alr_reacher import ALRReacherEnv -from alr_envs.mujoco.reacher.balancing import BalancingEnv -from alr_envs.mujoco.ball_in_a_cup.ball_in_a_cup import ALRBallInACupEnv -from alr_envs.mujoco.ball_in_a_cup.biac_pd import ALRBallInACupPDEnv diff --git a/alr_envs/mujoco/ball_in_a_cup/assets/ball-in-a-cup_base.xml b/alr_envs/mujoco/ball_in_a_cup/assets/ball-in-a-cup_base.xml deleted file mode 100644 index b534205..0000000 --- a/alr_envs/mujoco/ball_in_a_cup/assets/ball-in-a-cup_base.xml +++ /dev/null @@ -1,366 +0,0 @@ - - - diff --git a/alr_envs/mujoco/gym_table_tennis/envs/MUJOCO_LOG.TXT b/alr_envs/mujoco/gym_table_tennis/envs/MUJOCO_LOG.TXT deleted file mode 100644 index 91c2162..0000000 --- a/alr_envs/mujoco/gym_table_tennis/envs/MUJOCO_LOG.TXT +++ /dev/null @@ -1,3 +0,0 @@ -Mon Jan 25 15:45:30 2021 -ERROR: GLEW initalization error: Missing GL version - diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_dist_link_convex.stl b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_dist_link_convex.stl deleted file mode 100644 index 3b05c27..0000000 Binary files a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_dist_link_convex.stl and /dev/null differ diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_prox_link_convex_decomposition_p2.stl b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_prox_link_convex_decomposition_p2.stl deleted file mode 100644 index 7b2f001..0000000 Binary files a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_prox_link_convex_decomposition_p2.stl and /dev/null differ diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_fine.stl b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_fine.stl deleted file mode 100644 index 0a986fa..0000000 Binary files a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_fine.stl and /dev/null differ diff --git a/alr_envs/mujoco/meshes/wam/elbow_link_fine.stl b/alr_envs/mujoco/meshes/wam/elbow_link_fine.stl deleted file mode 100755 index f6a1515..0000000 Binary files a/alr_envs/mujoco/meshes/wam/elbow_link_fine.stl and /dev/null differ diff --git a/alr_envs/mujoco/meshes/wam/wrist_palm_link_fine.stl b/alr_envs/mujoco/meshes/wam/wrist_palm_link_fine.stl deleted file mode 100755 index f6b41ad..0000000 Binary files a/alr_envs/mujoco/meshes/wam/wrist_palm_link_fine.stl and /dev/null differ diff --git a/alr_envs/mujoco/meshes/wam/wrist_pitch_link_convex_decomposition_p1.stl b/alr_envs/mujoco/meshes/wam/wrist_pitch_link_convex_decomposition_p1.stl deleted file mode 100755 index c36f88f..0000000 Binary files a/alr_envs/mujoco/meshes/wam/wrist_pitch_link_convex_decomposition_p1.stl and /dev/null differ diff --git a/alr_envs/mujoco/meshes/wam/wrist_pitch_link_convex_decomposition_p2.stl b/alr_envs/mujoco/meshes/wam/wrist_pitch_link_convex_decomposition_p2.stl deleted file mode 100755 index d00cac1..0000000 Binary files a/alr_envs/mujoco/meshes/wam/wrist_pitch_link_convex_decomposition_p2.stl and /dev/null differ diff --git a/alr_envs/mujoco/meshes/wam/wrist_pitch_link_fine.stl b/alr_envs/mujoco/meshes/wam/wrist_pitch_link_fine.stl deleted file mode 100755 index 13d2f73..0000000 Binary files a/alr_envs/mujoco/meshes/wam/wrist_pitch_link_fine.stl and /dev/null differ diff --git a/alr_envs/mujoco/meshes/wam/wrist_yaw_link_fine.stl b/alr_envs/mujoco/meshes/wam/wrist_yaw_link_fine.stl deleted file mode 100755 index 0d95239..0000000 Binary files a/alr_envs/mujoco/meshes/wam/wrist_yaw_link_fine.stl and /dev/null differ diff --git a/alr_envs/open_ai/README.MD b/alr_envs/open_ai/README.MD new file mode 100644 index 0000000..985c093 --- /dev/null +++ b/alr_envs/open_ai/README.MD @@ -0,0 +1,14 @@ +# OpenAI Gym Wrappers + +These are the Environment Wrappers for selected [OpenAI Gym](https://gym.openai.com/) environments to use +the Motion Primitive gym interface for them. + +## MP Environments +These environments are wrapped-versions of their OpenAI-gym counterparts. + +|Name| Description|Trajectory Horizon|Action Dimension|Context Dimension +|---|---|---|---|---| +|`ContinuousMountainCarDetPMP-v0`| A DetPmP wrapped version of the ContinuousMountainCar-v0 environment. | 100 | 1 +|`ReacherDetPMP-v2`| A DetPmP wrapped version of the Reacher-v2 environment. | 50 | 2 +|`FetchSlideDenseDetPMP-v1`| A DetPmP wrapped version of the FetchSlideDense-v1 environment. | 50 | 4 +|`FetchReachDenseDetPMP-v1`| A DetPmP wrapped version of the FetchReachDense-v1 environment. | 50 | 4 diff --git a/alr_envs/open_ai/__init__.py b/alr_envs/open_ai/__init__.py index 1e531cf..b9e53b4 100644 --- a/alr_envs/open_ai/__init__.py +++ b/alr_envs/open_ai/__init__.py @@ -1,3 +1,130 @@ -from alr_envs.open_ai.mujoco import reacher_v2 -from alr_envs.open_ai.robotics import fetch -from alr_envs.open_ai.classic_control import continuous_mountain_car \ No newline at end of file +from gym import register +from gym.wrappers import FlattenObservation + +from . import classic_control, mujoco, robotics + +ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS = {"DMP": [], "DetPMP": []} + +# Open AI +# Classic Control +register( + id='ContinuousMountainCarDetPMP-v0', + entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', + kwargs={ + "name": "gym.envs.classic_control:MountainCarContinuous-v0", + "wrappers": [classic_control.continuous_mountain_car.MPWrapper], + "mp_kwargs": { + "num_dof": 1, + "num_basis": 4, + "duration": 2, + "post_traj_time": 0, + "width": 0.02, + "zero_start": True, + "policy_type": "motor", + "policy_kwargs": { + "p_gains": 1., + "d_gains": 1. + } + } + } +) +ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append("ContinuousMountainCarDetPMP-v0") + +register( + id='ReacherDetPMP-v2', + entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', + kwargs={ + "name": "gym.envs.mujoco:Reacher-v2", + "wrappers": [mujoco.reacher_v2.MPWrapper], + "mp_kwargs": { + "num_dof": 2, + "num_basis": 6, + "duration": 1, + "post_traj_time": 0, + "width": 0.02, + "zero_start": True, + "policy_type": "motor", + "policy_kwargs": { + "p_gains": .6, + "d_gains": .075 + } + } + } +) +ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append("ReacherDetPMP-v2") + +register( + id='FetchSlideDenseDetPMP-v1', + entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', + kwargs={ + "name": "gym.envs.robotics:FetchSlideDense-v1", + "wrappers": [FlattenObservation, robotics.fetch.MPWrapper], + "mp_kwargs": { + "num_dof": 4, + "num_basis": 5, + "duration": 2, + "post_traj_time": 0, + "width": 0.02, + "zero_start": True, + "policy_type": "position" + } + } +) +ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append("FetchSlideDenseDetPMP-v1") + +register( + id='FetchSlideDetPMP-v1', + entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', + kwargs={ + "name": "gym.envs.robotics:FetchSlide-v1", + "wrappers": [FlattenObservation, robotics.fetch.MPWrapper], + "mp_kwargs": { + "num_dof": 4, + "num_basis": 5, + "duration": 2, + "post_traj_time": 0, + "width": 0.02, + "zero_start": True, + "policy_type": "position" + } + } +) +ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append("FetchSlideDetPMP-v1") + +register( + id='FetchReachDenseDetPMP-v1', + entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', + kwargs={ + "name": "gym.envs.robotics:FetchReachDense-v1", + "wrappers": [FlattenObservation, robotics.fetch.MPWrapper], + "mp_kwargs": { + "num_dof": 4, + "num_basis": 5, + "duration": 2, + "post_traj_time": 0, + "width": 0.02, + "zero_start": True, + "policy_type": "position" + } + } +) +ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append("FetchReachDenseDetPMP-v1") + +register( + id='FetchReachDetPMP-v1', + entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper', + kwargs={ + "name": "gym.envs.robotics:FetchReach-v1", + "wrappers": [FlattenObservation, robotics.fetch.MPWrapper], + "mp_kwargs": { + "num_dof": 4, + "num_basis": 5, + "duration": 2, + "post_traj_time": 0, + "width": 0.02, + "zero_start": True, + "policy_type": "position" + } + } +) +ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append("FetchReachDetPMP-v1") diff --git a/alr_envs/open_ai/classic_control/__init__.py b/alr_envs/open_ai/classic_control/__init__.py index e69de29..b998494 100644 --- a/alr_envs/open_ai/classic_control/__init__.py +++ b/alr_envs/open_ai/classic_control/__init__.py @@ -0,0 +1 @@ +from . import continuous_mountain_car \ No newline at end of file diff --git a/alr_envs/open_ai/mujoco/__init__.py b/alr_envs/open_ai/mujoco/__init__.py index e69de29..3082d19 100644 --- a/alr_envs/open_ai/mujoco/__init__.py +++ b/alr_envs/open_ai/mujoco/__init__.py @@ -0,0 +1 @@ +from . import reacher_v2 diff --git a/alr_envs/open_ai/robotics/__init__.py b/alr_envs/open_ai/robotics/__init__.py index e69de29..13c65b3 100644 --- a/alr_envs/open_ai/robotics/__init__.py +++ b/alr_envs/open_ai/robotics/__init__.py @@ -0,0 +1 @@ +from . import fetch \ No newline at end of file diff --git a/alr_envs/utils/__init__.py b/alr_envs/utils/__init__.py index 758d49f..bb4b0bc 100644 --- a/alr_envs/utils/__init__.py +++ b/alr_envs/utils/__init__.py @@ -4,8 +4,10 @@ from typing import Union import gym from gym.envs.registration import register +from alr_envs.utils.make_env_helpers import make -def make( + +def make_dmc( id: str, seed: int = 1, visualize_reward: bool = True, @@ -45,7 +47,7 @@ def make( task_kwargs['time_limit'] = time_limit register( id=env_id, - entry_point='alr_envs.utils.dmc_wrapper:DMCWrapper', + entry_point='alr_envs.dmc.dmc_wrapper:DMCWrapper', kwargs=dict( domain_name=domain_name, task_name=task_name, diff --git a/alr_envs/utils/make_env_helpers.py b/alr_envs/utils/make_env_helpers.py index 0348492..fc73b05 100644 --- a/alr_envs/utils/make_env_helpers.py +++ b/alr_envs/utils/make_env_helpers.py @@ -1,23 +1,23 @@ -import logging -from typing import Iterable, List, Type, Union +from typing import Iterable, Type, Union import gym import numpy as np +from gym.envs.registration import EnvSpec from mp_env_api import MPEnvWrapper from mp_env_api.mp_wrappers.detpmp_wrapper import DetPMPWrapper from mp_env_api.mp_wrappers.dmp_wrapper import DmpWrapper -def make_env_rank(env_id: str, seed: int, rank: int = 0, return_callable=True, **kwargs): +def make_rank(env_id: str, seed: int, rank: int = 0, return_callable=True, **kwargs): """ TODO: Do we need this? Generate a callable to create a new gym environment with a given seed. The rank is added to the seed and can be used for example when using vector environments. - E.g. [make_env_rank("my_env_name-v0", 123, i) for i in range(8)] creates a list of 8 environments + E.g. [make_rank("my_env_name-v0", 123, i) for i in range(8)] creates a list of 8 environments with seeds 123 through 130. Hence, testing environments should be seeded with a value which is offset by the number of training environments. - Here e.g. [make_env_rank("my_env_name-v0", 123 + 8, i) for i in range(5)] for 5 testing environmetns + Here e.g. [make_rank("my_env_name-v0", 123 + 8, i) for i in range(5)] for 5 testing environmetns Args: env_id: name of the environment @@ -30,12 +30,12 @@ def make_env_rank(env_id: str, seed: int, rank: int = 0, return_callable=True, * """ def f(): - return make_env(env_id, seed + rank, **kwargs) + return make(env_id, seed + rank, **kwargs) return f if return_callable else f() -def make_env(env_id: str, seed, **kwargs): +def make(env_id: str, seed, **kwargs): """ Converts an env_id to an environment with the gym API. This also works for DeepMind Control Suite interface_wrappers @@ -58,13 +58,30 @@ def make_env(env_id: str, seed, **kwargs): env.action_space.seed(seed) env.observation_space.seed(seed) except gym.error.Error: - # DMC - from alr_envs.utils import make - env = make(env_id, seed=seed, **kwargs) - assert env.base_step_limit == env.spec.max_episode_steps, \ - f"The specified 'episode_length' of {env.spec.max_episode_steps} steps for gym is different from " \ - f"the DMC environment specification of {env.base_step_limit} steps." + # MetaWorld env + import metaworld + if env_id in metaworld.ML1.ENV_NAMES: + env = metaworld.envs.ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE[env_id + "-goal-observable"](seed=seed, **kwargs) + # setting this avoids generating the same initialization after each reset + env._freeze_rand_vec = False + # Manually set spec, as metaworld environments are not registered via gym + env.unwrapped.spec = EnvSpec(env_id) + # Set Timelimit based on the maximum allowed path length of the environment + env = gym.wrappers.TimeLimit(env, max_episode_steps=env.max_path_length) + env.seed(seed) + env.action_space.seed(seed) + env.observation_space.seed(seed) + env.goal_space.seed(seed) + + else: + # DMC + from alr_envs import make_dmc + env = make_dmc(env_id, seed=seed, **kwargs) + + assert env.base_step_limit == env.spec.max_episode_steps, \ + f"The specified 'episode_length' of {env.spec.max_episode_steps} steps for gym is different from " \ + f"the DMC environment specification of {env.base_step_limit} steps." return env @@ -84,7 +101,7 @@ def _make_wrapped_env(env_id: str, wrappers: Iterable[Type[gym.Wrapper]], seed=1 """ # _env = gym.make(env_id) - _env = make_env(env_id, seed, **kwargs) + _env = make(env_id, seed, **kwargs) assert any(issubclass(w, MPEnvWrapper) for w in wrappers), \ "At least one MPEnvWrapper is required in order to leverage motion primitive environments." @@ -106,11 +123,11 @@ def make_dmp_env(env_id: str, wrappers: Iterable, seed=1, mp_kwargs={}, **kwargs Returns: DMP wrapped gym env """ - verify_time_limit(mp_kwargs.get("duration", None), kwargs.get("time_limit", None)) + _verify_time_limit(mp_kwargs.get("duration", None), kwargs.get("time_limit", None)) _env = _make_wrapped_env(env_id=env_id, wrappers=wrappers, seed=seed, **kwargs) - verify_dof(_env, mp_kwargs.get("num_dof")) + _verify_dof(_env, mp_kwargs.get("num_dof")) return DmpWrapper(_env, **mp_kwargs) @@ -126,11 +143,11 @@ def make_detpmp_env(env_id: str, wrappers: Iterable, seed=1, mp_kwargs={}, **kwa Returns: DMP wrapped gym env """ - verify_time_limit(mp_kwargs.get("duration", None), kwargs.get("time_limit", None)) + _verify_time_limit(mp_kwargs.get("duration", None), kwargs.get("time_limit", None)) _env = _make_wrapped_env(env_id=env_id, wrappers=wrappers, seed=seed, **kwargs) - verify_dof(_env, mp_kwargs.get("num_dof")) + _verify_dof(_env, mp_kwargs.get("num_dof")) return DetPMPWrapper(_env, **mp_kwargs) @@ -174,14 +191,7 @@ def make_detpmp_env_helper(**kwargs): mp_kwargs=kwargs.pop("mp_kwargs"), **kwargs) -def make_contextual_env(env_id, context, seed, rank): - env = make_env(env_id, seed + rank, context=context) - # env = gym.make(env_id, context=context) - # env.seed(seed + rank) - return lambda: env - - -def verify_time_limit(mp_time_limit: Union[None, float], env_time_limit: Union[None, float]): +def _verify_time_limit(mp_time_limit: Union[None, float], env_time_limit: Union[None, float]): """ When using DMC check if a manually specified time limit matches the trajectory duration the MP receives. Mostly, the time_limit for DMC is not specified and the default values from DMC are taken. @@ -201,7 +211,7 @@ def verify_time_limit(mp_time_limit: Union[None, float], env_time_limit: Union[N f"the duration of {mp_time_limit}s for the MP." -def verify_dof(base_env: gym.Env, dof: int): +def _verify_dof(base_env: gym.Env, dof: int): action_shape = np.prod(base_env.action_space.shape) assert dof == action_shape, \ f"The specified degrees of freedom ('num_dof') {dof} do not match " \ diff --git a/alr_envs/utils/mp_env_async_sampler.py b/alr_envs/utils/mp_env_async_sampler.py deleted file mode 100644 index 67a774c..0000000 --- a/alr_envs/utils/mp_env_async_sampler.py +++ /dev/null @@ -1,122 +0,0 @@ -import gym -from gym.vector.async_vector_env import AsyncVectorEnv -import numpy as np -from _collections import defaultdict - -from alr_envs.utils.make_env_helpers import make_env_rank - - -def split_array(ary, size): - n_samples = len(ary) - if n_samples < size: - tmp = np.zeros((size, ary.shape[1])) - tmp[0:n_samples] = ary - return [tmp] - elif n_samples == size: - return [ary] - else: - repeat = int(np.ceil(n_samples / size)) - split = [k * size for k in range(1, repeat)] - sub_arys = np.split(ary, split) - - if n_samples % size != 0: - tmp = np.zeros_like(sub_arys[0]) - last = sub_arys[-1] - tmp[0: len(last)] = last - sub_arys[-1] = tmp - - return sub_arys - - -def _flatten_list(l): - assert isinstance(l, (list, tuple)) - assert len(l) > 0 - assert all([len(l_) > 0 for l_ in l]) - - return [l__ for l_ in l for l__ in l_] - - -class DummyDist: - def __init__(self, dim): - self.dim = dim - - def sample(self, contexts): - contexts = np.atleast_2d(contexts) - n_samples = contexts.shape[0] - return np.random.normal(size=(n_samples, self.dim)), contexts - - -class AlrMpEnvSampler: - """ - An asynchronous sampler for non contextual MPWrapper environments. A sampler object can be called with a set of - parameters and returns the corresponding final obs, rewards, dones and info dicts. - """ - - def __init__(self, env_id, num_envs, seed=0, **env_kwargs): - self.num_envs = num_envs - self.env = AsyncVectorEnv([make_env_rank(env_id, seed, i, **env_kwargs) for i in range(num_envs)]) - - def __call__(self, params): - params = np.atleast_2d(params) - n_samples = params.shape[0] - split_params = split_array(params, self.num_envs) - - vals = defaultdict(list) - for p in split_params: - self.env.reset() - obs, reward, done, info = self.env.step(p) - vals['obs'].append(obs) - vals['reward'].append(reward) - vals['done'].append(done) - vals['info'].append(info) - - # do not return values above threshold - return np.vstack(vals['obs'])[:n_samples], np.hstack(vals['reward'])[:n_samples], \ - _flatten_list(vals['done'])[:n_samples], _flatten_list(vals['info'])[:n_samples] - - -class AlrContextualMpEnvSampler: - """ - An asynchronous sampler for contextual MPWrapper environments. A sampler object can be called with a set of - parameters and returns the corresponding final obs, rewards, dones and info dicts. - """ - - def __init__(self, env_id, num_envs, seed=0, **env_kwargs): - self.num_envs = num_envs - self.env = AsyncVectorEnv([make_env(env_id, seed, i, **env_kwargs) for i in range(num_envs)]) - - def __call__(self, dist, n_samples): - repeat = int(np.ceil(n_samples / self.env.num_envs)) - vals = defaultdict(list) - - obs = self.env.reset() - for i in range(repeat): - vals['obs'].append(obs) - new_samples, new_contexts = dist.sample(obs) - vals['new_samples'].append(new_samples) - - obs, reward, done, info = self.env.step(new_samples) - - vals['reward'].append(reward) - vals['done'].append(done) - vals['info'].append(info) - - # do not return values above threshold - return np.vstack(vals['new_samples'])[:n_samples], \ - np.vstack(vals['obs'])[:n_samples], np.hstack(vals['reward'])[:n_samples], \ - _flatten_list(vals['done'])[:n_samples], _flatten_list(vals['info'])[:n_samples] - - -if __name__ == "__main__": - env_name = "alr_envs:HoleReacherDetPMP-v1" - n_cpu = 8 - dim = 25 - n_samples = 10 - - sampler = AlrMpEnvSampler(env_name, num_envs=n_cpu) - - thetas = np.random.randn(n_samples, dim) # usually form a search distribution - - _, rewards, __, ___ = sampler(thetas) - - print(rewards) diff --git a/setup.py b/setup.py index 16374e4..beb2095 100644 --- a/setup.py +++ b/setup.py @@ -3,8 +3,7 @@ from setuptools import setup setup( name='alr_envs', version='0.0.1', - packages=['alr_envs', 'alr_envs.classic_control', 'alr_envs.open_ai', 'alr_envs.mujoco', 'alr_envs.dmc', - 'alr_envs.utils'], + packages=['alr_envs', 'alr_envs.alr', 'alr_envs.open_ai', 'alr_envs.dmc', 'alr_envs.meta', 'alr_envs.utils'], install_requires=[ 'gym', 'PyQt5', @@ -12,6 +11,7 @@ setup( 'mp_env_api @ git+ssh://git@github.com/ALRhub/motion_primitive_env_api.git', 'mujoco-py<2.1,>=2.0', 'dm_control' + 'metaworld @ git+https://github.com/rlworkgroup/metaworld.git@master#egg=metaworld' ], url='https://github.com/ALRhub/alr_envs/', diff --git a/test/test_dmc_envs.py b/test/test_dmc_envs.py new file mode 100644 index 0000000..a7cd9be --- /dev/null +++ b/test/test_dmc_envs.py @@ -0,0 +1,127 @@ +import unittest + +import gym +import numpy as np + +from dm_control import suite, manipulation + +from alr_envs import make + +DMC_ENVS = [f'{env}-{task}' for env, task in suite.ALL_TASKS if env != "lqr"] +MANIPULATION_SPECS = [f'manipulation-{task}' for task in manipulation.ALL if task.endswith('_features')] +SEED = 1 + + +class TestStepDMCEnvironments(unittest.TestCase): + + def _run_env(self, env_id, iterations=None, seed=SEED, render=False): + """ + Example for running a DMC based env in the step based setting. + The env_id has to be specified as `domain_name-task_name` or + for manipulation tasks as `manipulation-environment_name` + + Args: + env_id: Either `domain_name-task_name` or `manipulation-environment_name` + iterations: Number of rollout steps to run + seed= random seeding + render: Render the episode + + Returns: + + """ + env: gym.Env = make(env_id, seed=seed) + rewards = [] + observations = [] + dones = [] + obs = env.reset() + self._verify_observations(obs, env.observation_space, "reset()") + + length = env.spec.max_episode_steps + if iterations is None: + if length is None: + iterations = 1 + else: + iterations = length + + # number of samples(multiple environment steps) + for i in range(iterations): + observations.append(obs) + + ac = env.action_space.sample() + # ac = np.random.uniform(env.action_space.low, env.action_space.high, env.action_space.shape) + obs, reward, done, info = env.step(ac) + + self._verify_observations(obs, env.observation_space, "step()") + self._verify_reward(reward) + self._verify_done(done) + + rewards.append(reward) + dones.append(done) + + if render: + env.render("human") + + if done: + obs = env.reset() + + assert done, "Done flag is not True after max episode length." + observations.append(obs) + env.close() + del env + return np.array(observations), np.array(rewards), np.array(dones) + + def _verify_observations(self, obs, observation_space, obs_type="reset()"): + self.assertTrue(observation_space.contains(obs), + f"Observation {obs} received from {obs_type} " + f"not contained in observation space {observation_space}.") + + def _verify_reward(self, reward): + self.assertIsInstance(reward, float, f"Returned {reward} as reward, expected float.") + + def _verify_done(self, done): + self.assertIsInstance(done, bool, f"Returned {done} as done flag, expected bool.") + + def test_dmc_functionality(self): + """Tests that environments runs without errors using random actions.""" + for env_id in DMC_ENVS: + with self.subTest(msg=env_id): + self._run_env(env_id) + + def test_dmc_determinism(self): + """Tests that identical seeds produce identical trajectories.""" + seed = 0 + # Iterate over two trajectories, which should have the same state and action sequence + for env_id in DMC_ENVS: + with self.subTest(msg=env_id): + traj1 = self._run_env(env_id, seed=seed) + traj2 = self._run_env(env_id, seed=seed) + for i, time_step in enumerate(zip(*traj1, *traj2)): + obs1, rwd1, done1, obs2, rwd2, done2 = time_step + self.assertTrue(np.array_equal(obs1, obs2), f"Observations [{i}] {obs1} and {obs2} do not match.") + self.assertEqual(rwd1, rwd2, f"Rewards [{i}] {rwd1} and {rwd2} do not match.") + self.assertEqual(done1, done2, f"Dones [{i}] {done1} and {done2} do not match.") + + def test_manipulation_functionality(self): + """Tests that environments runs without errors using random actions.""" + for env_id in MANIPULATION_SPECS: + with self.subTest(msg=env_id): + self._run_env(env_id) + + def test_manipulation_determinism(self): + """Tests that identical seeds produce identical trajectories.""" + seed = 0 + # Iterate over two trajectories, which should have the same state and action sequence + for env_id in MANIPULATION_SPECS: + with self.subTest(msg=env_id): + traj1 = self._run_env(env_id, seed=seed) + traj2 = self._run_env(env_id, seed=seed) + for i, time_step in enumerate(zip(*traj1, *traj2)): + obs1, rwd1, done1, obs2, rwd2, done2 = time_step + self.assertTrue(np.array_equal(obs1, obs2), f"Observations [{i}] {obs1} and {obs2} do not match.") + self.assertEqual(rwd1, rwd2, f"Rewards [{i}] {rwd1} and {rwd2} do not match.") + self.assertEqual(done1, done2, f"Dones [{i}] {done1} and {done2} do not match.") + self.assertEqual(done1, done2, f"Dones [{i}] {done1} and {done2} do not match.") + + +if __name__ == '__main__': + unittest.main() diff --git a/test/test_envs.py b/test/test_envs.py index 2e5bf7e..f8d7269 100644 --- a/test/test_envs.py +++ b/test/test_envs.py @@ -4,13 +4,13 @@ import gym import numpy as np import alr_envs # noqa -from alr_envs.utils.make_env_helpers import make_env +from alr_envs.utils.make_env_helpers import make ALL_SPECS = list(spec for spec in gym.envs.registry.all() if "alr_envs" in spec.entry_point) SEED = 1 -class TestEnvironments(unittest.TestCase): +class TestMPEnvironments(unittest.TestCase): def _run_env(self, env_id, iterations=None, seed=SEED, render=False): """ @@ -27,7 +27,7 @@ class TestEnvironments(unittest.TestCase): Returns: """ - env: gym.Env = make_env(env_id, seed=seed) + env: gym.Env = make(env_id, seed=seed) rewards = [] observations = [] dones = [] @@ -62,11 +62,24 @@ class TestEnvironments(unittest.TestCase): if done: obs = env.reset() + assert done, "Done flag is not True after max episode length." observations.append(obs) env.close() del env return np.array(observations), np.array(rewards), np.array(dones) + def _run_env_determinism(self, ids): + seed = 0 + for env_id in ids: + with self.subTest(msg=env_id): + traj1 = self._run_env(env_id, seed=seed) + traj2 = self._run_env(env_id, seed=seed) + for i, time_step in enumerate(zip(*traj1, *traj2)): + obs1, rwd1, done1, obs2, rwd2, done2 = time_step + self.assertTrue(np.array_equal(obs1, obs2), f"Observations [{i}] {obs1} and {obs2} do not match.") + self.assertEqual(rwd1, rwd2, f"Rewards [{i}] {rwd1} and {rwd2} do not match.") + self.assertEqual(done1, done2, f"Dones [{i}] {done1} and {done2} do not match.") + def _verify_observations(self, obs, observation_space, obs_type="reset()"): self.assertTrue(observation_space.contains(obs), f"Observation {obs} received from {obs_type} " @@ -78,27 +91,81 @@ class TestEnvironments(unittest.TestCase): def _verify_done(self, done): self.assertIsInstance(done, bool, f"Returned {done} as done flag, expected bool.") - def test_environment_functionality(self): - """Tests that environments runs without errors using random actions.""" - for spec in ALL_SPECS: - # try: - with self.subTest(msg=spec.id): - self._run_env(spec.id) + def test_alr_environment_functionality(self): + """Tests that environments runs without errors using random actions for ALR MP envs.""" + with self.subTest(msg="DMP"): + for env_id in alr_envs.ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS['DMP']: + with self.subTest(msg=env_id): + self._run_env(env_id) - def test_environment_determinism(self): - """Tests that identical seeds produce identical trajectories.""" - seed = 0 - # Iterate over two trajectories, which should have the same state and action sequence - for spec in ALL_SPECS: - with self.subTest(msg=spec.id): - self._run_env(spec.id) - traj1 = self._run_env(spec.id, seed=seed) - traj2 = self._run_env(spec.id, seed=seed) - for i, time_step in enumerate(zip(*traj1, *traj2)): - obs1, rwd1, done1, obs2, rwd2, done2 = time_step - self.assertTrue(np.array_equal(obs1, obs2), f"Observations [{i}] {obs1} and {obs2} do not match.") - self.assertEqual(rwd1, rwd2, f"Rewards [{i}] {rwd1} and {rwd2} do not match.") - self.assertEqual(done1, done2, f"Dones [{i}] {done1} and {done2} do not match.") + with self.subTest(msg="DetPMP"): + for env_id in alr_envs.ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS['DetPMP']: + with self.subTest(msg=env_id): + self._run_env(env_id) + + def test_openai_environment_functionality(self): + """Tests that environments runs without errors using random actions for OpenAI gym MP envs.""" + with self.subTest(msg="DMP"): + for env_id in alr_envs.ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS['DMP']: + with self.subTest(msg=env_id): + self._run_env(env_id) + + with self.subTest(msg="DetPMP"): + for env_id in alr_envs.ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS['DetPMP']: + with self.subTest(msg=env_id): + self._run_env(env_id) + + def test_dmc_environment_functionality(self): + """Tests that environments runs without errors using random actions for DMC MP envs.""" + with self.subTest(msg="DMP"): + for env_id in alr_envs.ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS['DMP']: + with self.subTest(msg=env_id): + self._run_env(env_id) + + with self.subTest(msg="DetPMP"): + for env_id in alr_envs.ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS['DetPMP']: + with self.subTest(msg=env_id): + self._run_env(env_id) + + def test_metaworld_environment_functionality(self): + """Tests that environments runs without errors using random actions for Metaworld MP envs.""" + with self.subTest(msg="DMP"): + for env_id in alr_envs.ALL_METAWORLD_MOTION_PRIMITIVE_ENVIRONMENTS['DMP']: + with self.subTest(msg=env_id): + self._run_env(env_id) + + with self.subTest(msg="DetPMP"): + for env_id in alr_envs.ALL_METAWORLD_MOTION_PRIMITIVE_ENVIRONMENTS['DetPMP']: + with self.subTest(msg=env_id): + self._run_env(env_id) + + def test_alr_environment_determinism(self): + """Tests that identical seeds produce identical trajectories for ALR MP Envs.""" + with self.subTest(msg="DMP"): + self._run_env_determinism(alr_envs.ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"]) + with self.subTest(msg="DetPMP"): + self._run_env_determinism(alr_envs.ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"]) + + def test_openai_environment_determinism(self): + """Tests that identical seeds produce identical trajectories for OpenAI gym MP Envs.""" + with self.subTest(msg="DMP"): + self._run_env_determinism(alr_envs.ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"]) + with self.subTest(msg="DetPMP"): + self._run_env_determinism(alr_envs.ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"]) + + def test_dmc_environment_determinism(self): + """Tests that identical seeds produce identical trajectories for DMC MP Envs.""" + with self.subTest(msg="DMP"): + self._run_env_determinism(alr_envs.ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"]) + with self.subTest(msg="DetPMP"): + self._run_env_determinism(alr_envs.ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"]) + + def test_metaworld_environment_determinism(self): + """Tests that identical seeds produce identical trajectories for Metaworld MP Envs.""" + with self.subTest(msg="DMP"): + self._run_env_determinism(alr_envs.ALL_METAWORLD_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"]) + with self.subTest(msg="DetPMP"): + self._run_env_determinism(alr_envs.ALL_METAWORLD_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"]) if __name__ == '__main__': diff --git a/test/test_metaworld_envs.py b/test/test_metaworld_envs.py new file mode 100644 index 0000000..a6bd244 --- /dev/null +++ b/test/test_metaworld_envs.py @@ -0,0 +1,107 @@ +import unittest + +import gym +import numpy as np + +from alr_envs import make +from metaworld.envs import ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE + +ALL_ENVS = [env.split("-goal-observable")[0] for env, _ in ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE.items()] +SEED = 1 + + +class TestStepMetaWorlEnvironments(unittest.TestCase): + + def _run_env(self, env_id, iterations=None, seed=SEED, render=False): + """ + Example for running a DMC based env in the step based setting. + The env_id has to be specified as `domain_name-task_name` or + for manipulation tasks as `manipulation-environment_name` + + Args: + env_id: Either `domain_name-task_name` or `manipulation-environment_name` + iterations: Number of rollout steps to run + seed= random seeding + render: Render the episode + + Returns: + + """ + env: gym.Env = make(env_id, seed=seed) + rewards = [] + observations = [] + actions = [] + dones = [] + obs = env.reset() + self._verify_observations(obs, env.observation_space, "reset()") + + length = env.max_path_length + if iterations is None: + if length is None: + iterations = 1 + else: + iterations = length + + # number of samples(multiple environment steps) + for i in range(iterations): + observations.append(obs) + + ac = env.action_space.sample() + actions.append(ac) + # ac = np.random.uniform(env.action_space.low, env.action_space.high, env.action_space.shape) + obs, reward, done, info = env.step(ac) + + self._verify_observations(obs, env.observation_space, "step()") + self._verify_reward(reward) + self._verify_done(done) + + rewards.append(reward) + dones.append(done) + + if render: + env.render("human") + + if done: + obs = env.reset() + + assert done, "Done flag is not True after max episode length." + observations.append(obs) + env.close() + del env + return np.array(observations), np.array(rewards), np.array(dones), np.array(actions) + + def _verify_observations(self, obs, observation_space, obs_type="reset()"): + self.assertTrue(observation_space.contains(obs), + f"Observation {obs} received from {obs_type} " + f"not contained in observation space {observation_space}.") + + def _verify_reward(self, reward): + self.assertIsInstance(reward, float, f"Returned {reward} as reward, expected float.") + + def _verify_done(self, done): + self.assertIsInstance(done, bool, f"Returned {done} as done flag, expected bool.") + + def test_dmc_functionality(self): + """Tests that environments runs without errors using random actions.""" + for env_id in ALL_ENVS: + with self.subTest(msg=env_id): + self._run_env(env_id) + + def test_dmc_determinism(self): + """Tests that identical seeds produce identical trajectories.""" + seed = 0 + # Iterate over two trajectories, which should have the same state and action sequence + for env_id in ALL_ENVS: + with self.subTest(msg=env_id): + traj1 = self._run_env(env_id, seed=seed) + traj2 = self._run_env(env_id, seed=seed) + for i, time_step in enumerate(zip(*traj1, *traj2)): + obs1, rwd1, done1, ac1, obs2, rwd2, done2, ac2 = time_step + self.assertTrue(np.array_equal(ac1, ac2), f"Actions [{i}] delta {ac1 - ac2} is not zero.") + self.assertTrue(np.array_equal(obs1, obs2), f"Observations [{i}] delta {obs1 - obs2} is not zero.") + self.assertAlmostEqual(rwd1, rwd2, f"Rewards [{i}] {rwd1} and {rwd2} do not match.") + self.assertEqual(done1, done2, f"Dones [{i}] {done1} and {done2} do not match.") + + +if __name__ == '__main__': + unittest.main()