Merge branch 'master' into reacher_env_cleanup

# Conflicts:
# alr_envs/examples/examples_general.py
2021-11-15 10:53:53 +01:00 · 2021-11-15 10:53:53 +01:00 · bb65584429
commit bb65584429
parent 7a725077e2 ac969d490a
200 changed files with 2695 additions and 1444 deletions
--- a/README.md
+++ b/README.md
@@ -1,87 +1,212 @@
-## ALR Environments
-    
-This repository collects custom Robotics environments not included in benchmark suites like OpenAI gym, rllab, etc. 
-Creating a custom (Mujoco) gym environment can be done according to [this guide](https://github.com/openai/gym/blob/master/docs/creating-environments.md).
-For stochastic search problems with gym interface use the `Rosenbrock-v0` reference implementation.
-We also support to solve environments with DMPs. When adding new DMP tasks check the `ViaPointReacherDMP-v0` reference implementation.
-When simply using the tasks, you can also leverage the wrapper class `DmpWrapper` to turn normal gym environments in to DMP tasks.
+## ALR Robotics Control Environments

-## Environments
-Currently we have the following environments: 
+This project offers a large variety of reinforcement learning environments under a unifying interface based on OpenAI gym.
+Besides some custom environments, we also provide support for the benchmark suites
+[OpenAI gym](https://gym.openai.com/),
+[DeepMind Control](https://deepmind.com/research/publications/2020/dm-control-Software-and-Tasks-for-Continuous-Control)
+(DMC), and [Metaworld](https://meta-world.github.io/). Custom (Mujoco) gym environments can be created according
+to [this guide](https://github.com/openai/gym/blob/master/docs/creating-environments.md). Unlike existing libraries, we
+additionally support controlling agents with Dynamic Movement Primitives (DMPs) and Probabilistic Movement Primitives
+(ProMPs), of which we usually only consider the mean trajectory (DetPMP).
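For readers unfamiliar with DetPMPs: the mean trajectory of a ProMP is a weighted sum of basis functions evaluated over a phase variable. The sketch below assumes normalized Gaussian basis functions with evenly spaced centers and interprets `width` as their variance. The function names are hypothetical, and this is not necessarily how alr_envs computes its trajectories. Under these assumptions, an action would contain one weight per basis function and degree of freedom.

```python
import numpy as np


def normalized_gaussian_basis(n_steps: int, num_basis: int, width: float) -> np.ndarray:
    """Normalized Gaussian basis functions over a phase variable in [0, 1].

    Returns an array of shape (n_steps, num_basis) whose rows sum to 1.
    Assumption: `width` is the variance of each Gaussian.
    """
    phase = np.linspace(0.0, 1.0, n_steps)       # one phase value per time step
    centers = np.linspace(0.0, 1.0, num_basis)   # evenly spaced basis centers
    # Unnormalized activations, shape (n_steps, num_basis)
    activations = np.exp(-((phase[:, None] - centers[None, :]) ** 2) / (2.0 * width))
    return activations / activations.sum(axis=1, keepdims=True)


def detpmp_mean_trajectory(weights: np.ndarray, n_steps: int, width: float) -> np.ndarray:
    """Mean ProMP trajectory: weighted sum of basis functions per degree of freedom.

    `weights` has shape (num_basis, num_dof); the result has shape (n_steps, num_dof).
    """
    num_basis = weights.shape[0]
    basis = normalized_gaussian_basis(n_steps, num_basis, width)
    return basis @ weights


# Example with 5 basis functions and 5 DoF over 200 steps (illustrative values)
rng = np.random.default_rng(0)
w = rng.normal(size=(5, 5))
traj = detpmp_mean_trajectory(w, n_steps=200, width=0.025)
print(traj.shape)  # (200, 5)
```

Normalizing the basis functions means that setting all weights of a DoF to the same value yields a constant trajectory at that value, which keeps the weights directly interpretable as positions.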

-### Mujoco
+## Motion Primitive Environments (Episodic environments)

-|Name| Description|Horizon|Action Dimension|Observation Dimension
-|---|---|---|---|---|
-|`ALRReacher-v0`|Modified (5 links) Mujoco gym's `Reacher-v2` (2 links)| 200 | 5 | 21
-|`ALRReacherSparse-v0`|Same as `ALRReacher-v0`, but the distance penalty is only provided in the last time step.| 200 | 5 | 21
-|`ALRReacherSparseBalanced-v0`|Same as `ALRReacherSparse-v0`, but the end-effector has to remain upright.| 200 | 5 | 21
-|`ALRLongReacher-v0`|Modified (7 links) Mujoco gym's `Reacher-v2` (2 links)| 200 | 7 | 27
-|`ALRLongReacherSparse-v0`|Same as `ALRLongReacher-v0`, but the distance penalty is only provided in the last time step.| 200 | 7 | 27
-|`ALRLongReacherSparseBalanced-v0`|Same as `ALRLongReacherSparse-v0`, but the end-effector has to remain upright.| 200 | 7 | 27
-|`ALRBallInACupSimple-v0`| Ball-in-a-cup task where a robot needs to catch a ball attached to a cup at its end-effector. | 4000 | 3 | wip
-|`ALRBallInACup-v0`| Ball-in-a-cup task where a robot needs to catch a ball attached to a cup at its end-effector | 4000 | 7 | wip
-|`ALRBallInACupGoal-v0`| Similiar to `ALRBallInACupSimple-v0` but the ball needs to be caught at a specified goal position | 4000 | 7 | wip
-    
-### Classic Control
+Unlike step-based environments, motion primitive (MP) environments are more closely related to stochastic search, black-box
+optimization, and other methods often used in robotics. MP environments are trajectory-based and always execute a full
+trajectory, which is generated by a Dynamic Movement Primitive (DMP) or a deterministic Probabilistic Movement Primitive
+(DetPMP). A controller translates the generated trajectory into individual step-wise actions; which controller is used
+depends on the type of environment. We currently support position, velocity, and PD controllers for position, velocity,
+and torque control, respectively. The goal of all MP environments is still to learn a policy, but an action now
+represents the parametrization of the motion primitive that generates a suitable trajectory. Additionally, the framework
+supports the contextual setting, in which all changing substates of the task are exposed as a single observation at the
+beginning of an episode. This requires predicting a new action/MP parametrization for each trajectory. In addition to
+the cumulative episode reward, all environments return all information collected at each step as part of the info
+dictionary. This information should, however, mainly be used for debugging and logging.
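As a rough illustration of the PD case described above, the sketch below computes one torque command from a desired and a measured joint state. It is a generic PD tracking law, not necessarily the exact controller used by alr_envs. The function name `pd_torque` is hypothetical. The gain values are only illustrative; scalar `p_gains`/`d_gains` of this form appear under `policy_kwargs` in the DMC ball-in-cup registration removed in this commit.

```python
import numpy as np


def pd_torque(des_pos, des_vel, cur_pos, cur_vel, p_gains, d_gains):
    """Generic PD tracking law: torque pulling the joints toward the desired state.

    Gains may be scalars or per-joint arrays; numpy broadcasts them elementwise.
    """
    des_pos, des_vel = np.asarray(des_pos, dtype=float), np.asarray(des_vel, dtype=float)
    cur_pos, cur_vel = np.asarray(cur_pos, dtype=float), np.asarray(cur_vel, dtype=float)
    # Proportional term on the position error plus derivative term on the velocity error
    return p_gains * (des_pos - cur_pos) + d_gains * (des_vel - cur_vel)


# One control step for a 2-DoF system: the desired state would come from the MP trajectory,
# the current state from the simulator.
torque = pd_torque(des_pos=[1.0, 0.0], des_vel=[0.0, 0.0],
                   cur_pos=[0.5, 0.0], cur_vel=[0.0, 1.0],
                   p_gains=0.2, d_gains=0.05)
print(torque)
```

In an MP environment, a computation like this would run once per simulation step while the trajectory is executed, so a single MP action expands into many torque commands.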

-|Name| Description|Horizon|Action Dimension|Observation Dimension
-|---|---|---|---|---|
-|`SimpleReacher-v0`| Simple reaching task (2 links) without any physics simulation. Provides no reward until 150 time steps. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory.| 200 | 2 | 9
-|`LongSimpleReacher-v0`| Simple reaching task (5 links) without any physics simulation. Provides no reward until 150 time steps. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory.| 200 | 5 | 18
-|`ViaPointReacher-v0`| Simple reaching task leveraging a via point, which supports self collision detection. Provides a reward only at 100 and 199 for reaching the viapoint and goal point, respectively.| 200 | 5 | 18 
-|`HoleReacher-v0`| 5 link reaching task where the end-effector needs to reach into a narrow hole without collding with itself or walls | 200 | 5 | 18
+|Key| Description|
+|---|---|
+|`trajectory`| Trajectory generated by the MP|
+|`step_actions`| Step-wise executed actions based on the controller output|
+|`step_observations`| Step-wise intermediate observations|
+|`step_rewards`| Step-wise rewards|
+|`trajectory_length`| Total number of environment interactions|
+|`other`| All other information from the underlying environment is returned as a list of length `trajectory_length` under its original key. If some information is not provided at every time step, the missing values are filled with `None`.|
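The `other` row can be illustrated with a small sketch of how per-step info dictionaries might be merged into per-key lists of length `trajectory_length`, with missing entries padded with `None`. The function name `collect_step_infos` and the example keys are hypothetical; this only illustrates the described behavior and is not the library's implementation.

```python
from typing import Any, Dict, List


def collect_step_infos(step_infos: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Merge per-step info dicts into one dict of per-key lists.

    Every list has length len(step_infos); steps that did not report a key get None.
    """
    n_steps = len(step_infos)
    merged: Dict[str, List[Any]] = {}
    for t, info in enumerate(step_infos):
        for key, value in info.items():
            if key not in merged:
                # First occurrence of this key: pre-fill every step with None
                merged[key] = [None] * n_steps
            merged[key][t] = value
    return merged


infos = [{"distance": 0.5}, {"distance": 0.3, "collision": True}, {"distance": 0.1}]
merged = collect_step_infos(infos)
print(merged)  # {'distance': [0.5, 0.3, 0.1], 'collision': [None, True, None]}
```

Pre-filling each list with `None` handles keys that appear late as well as keys that disappear before the end of the trajectory.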

-### DMP Environments
-These environments are closer to stochastic search. They always execute a full trajectory, which is computed by a DMP and executed by a controller, e.g. a PD controller.
-The goal is to learn the parameters of this DMP to generate a suitable trajectory. 
-All environments provide the full episode reward and additional information about early terminations, e.g. due to collisions. 
+## Installation

-|Name| Description|Horizon|Action Dimension|Context Dimension
-|---|---|---|---|---|
-|`ViaPointReacherDMP-v0`| A DMP provides a trajectory for the `ViaPointReacher-v0` task. | 200 | 25
-|`HoleReacherFixedGoalDMP-v0`| A DMP provides a trajectory for the `HoleReacher-v0` task with a fixed goal attractor. | 200 | 25
-|`HoleReacherDMP-v0`| A DMP provides a trajectory for the `HoleReacher-v0` task. The goal attractor needs to be learned. | 200 | 30 
-|`ALRBallInACupSimpleDMP-v0`| A DMP provides a trajectory for the `ALRBallInACupSimple-v0` task where only 3 joints are actuated. | 4000 | 15
-|`ALRBallInACupDMP-v0`| A DMP provides a trajectory for the `ALRBallInACup-v0` task. | 4000 | 35
-|`ALRBallInACupGoalDMP-v0`| A DMP provides a trajectory for the `ALRBallInACupGoal-v0` task. | 4000 | 35 | 3 
+1. Clone the repository

-[//]:  |`HoleReacherDetPMP-v0`|
-
-### Stochastic Search
-|Name| Description|Horizon|Action Dimension|Observation Dimension
-|---|---|---|---|---|
-|`Rosenbrock{dim}-v0`| Gym interface for Rosenbrock function. `{dim}` is one of 5, 10, 25, 50 or 100. | 1 | `{dim}` | 0
-
-
-## Install
-1. Clone the repository 
 ```bash 
 git clone git@github.com:ALRhub/alr_envs.git
 ```
-2. Go to the folder 
+
+2. Go to the folder
+
 ```bash 
 cd alr_envs
 ```
-3. Install with 
+
+3. Install with
+
 ```bash 
 pip install -e . 
 ```
-4. Use (see [example.py](alr_envs/examples/examples_general.py)): 
-```python
-import gym

-env = gym.make('alr_envs:SimpleReacher-v0')
+## Using the framework
+
+We prepared [multiple examples](alr_envs/examples/); please have a look there for more specific use cases.
+
+### Step-wise environments
+
+```python
+import alr_envs
+
+env = alr_envs.make('HoleReacher-v0', seed=1)
 state = env.reset()

-for i in range(10000):
+for i in range(1000):
    state, reward, done, info = env.step(env.action_space.sample())
    if i % 5 == 0:
        env.render()

    if done:
        state = env.reset()
-
 ``` 

-For an example using a DMP wrapped env and asynchronous sampling look at [mp_env_async_sampler.py](./alr_envs/utils/mp_env_async_sampler.py)
+For DeepMind Control tasks, we expect the `env_id` to be specified as `domain_name-task_name`, or as
+`manipulation-environment_name` for manipulation tasks. All other environments can be created based on their original name.
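As a hypothetical illustration of this naming convention, the helper below splits a DMC `env_id` at its first dash. It is not part of alr_envs. Note also that gym-style ids such as `HoleReacher-v0` contain a dash as well, so this split only makes sense for ids that are already known to be DMC ids.

```python
from typing import Tuple


def split_dmc_env_id(env_id: str) -> Tuple[str, str]:
    """Split a DeepMind Control id of the form 'domain_name-task_name'.

    For manipulation tasks ('manipulation-environment_name') the first part is
    the literal 'manipulation' and the second part is the environment name.
    """
    domain, sep, task = env_id.partition("-")  # split at the first dash only
    if not sep or not domain or not task:
        raise ValueError(f"expected 'domain_name-task_name', got {env_id!r}")
    return domain, task


print(split_dmc_env_id("ball_in_cup-catch"))                 # ('ball_in_cup', 'catch')
print(split_dmc_env_id("manipulation-reach_site_features"))  # ('manipulation', 'reach_site_features')
```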
+
+Existing MP tasks can be created the same way as above. Just keep in mind that calling `step()` always executes a full
+trajectory.
+
+```python
+import alr_envs
+
+env = alr_envs.make('HoleReacherDetPMP-v0', seed=1)
+# render() can be called once at the beginning with all necessary arguments. To turn it off again, just call render(None).
+env.render()
+
+state = env.reset()
+
+for i in range(5):
+    state, reward, done, info = env.step(env.action_space.sample())
+
+    # Not strictly necessary, as the environment resets itself after each trajectory anyway.
+    state = env.reset()
+```
+
+To show all available MP environments, we provide some additional convenience attributes. Each of them is a dictionary
+with the two keys `DMP` and `DetPMP`, which store a list of available environment names.
+
+```python
+import alr_envs
+
+print("Custom MP tasks:")
+print(alr_envs.ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS)
+
+print("OpenAI Gym MP tasks:")
+print(alr_envs.ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS)
+
+print("Deepmind Control MP tasks:")
+print(alr_envs.ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS)
+
+print("MetaWorld MP tasks:")
+print(alr_envs.ALL_METAWORLD_MOTION_PRIMITIVE_ENVIRONMENTS)
+```
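Since all of these dictionaries share the keys `DMP` and `DetPMP`, they can be combined key by key; `alr_envs/__init__.py` in this commit builds `ALL_MOTION_PRIMITIVE_ENVIRONMENTS` with a dict comprehension along these lines. The sketch below shows the same idea with a hypothetical `merge_by_key` helper. Apart from the HoleReacher ids, which appear in this README, the environment names are placeholders and do not reflect the real registries.

```python
from typing import Dict, List

# Illustrative stand-ins for the per-suite dictionaries (placeholder contents).
alr_mp: Dict[str, List[str]] = {"DMP": ["HoleReacherDMP-v0"], "DetPMP": ["HoleReacherDetPMP-v0"]}
dmc_mp: Dict[str, List[str]] = {"DMP": ["dmc_example_dmp-v0"], "DetPMP": []}
gym_mp: Dict[str, List[str]] = {"DMP": [], "DetPMP": ["gym_example_detpmp-v0"]}
meta_mp: Dict[str, List[str]] = {"DMP": [], "DetPMP": ["metaworld_example_detpmp-v0"]}


def merge_by_key(*suites: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Concatenate the environment-name lists of several suites key by key."""
    merged: Dict[str, List[str]] = {}
    for suite in suites:
        for key, names in suite.items():
            merged.setdefault(key, []).extend(names)
    return merged


all_mp = merge_by_key(alr_mp, dmc_mp, gym_mp, meta_mp)
for mp_type, names in all_mp.items():
    print(f"{mp_type}: {len(names)} environments")
```

Using `setdefault` avoids assuming that every suite defines both keys, whereas a comprehension over one suite's keys (as in `alr_envs/__init__.py`) relies on all suites sharing them.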
+
+### How to create a new MP task
+
+In case a required task is not yet supported in the MP framework, it can be created relatively easily. For the task at
+hand, the following interface needs to be implemented.
+
+```python
+import numpy as np
+from mp_env_api import MPEnvWrapper
+
+
+class MPWrapper(MPEnvWrapper):
+
+    @property
+    def active_obs(self):
+        """
+            Returns a boolean mask for each substate in the full observation.
+            It determines whether the observation is returned for the contextual case or not.
+            This effectively allows filtering unwanted or unnecessary observations from the full step-based case.
+            E.g. velocities starting at 0 only change after the first action. Given that we only receive the first
+            observation, the velocities are not necessary in the observation for the MP task.
+        """
+        return np.ones(self.observation_space.shape, dtype=bool)
+
+    @property
+    def current_vel(self):
+        """
+            Returns the current velocity of the action/control dimension. 
+            The dimensionality has to match the action/control dimension.
+            This is not required when exclusively using position control;
+            it should, however, be implemented regardless.
+            E.g. The joint velocities that are directly or indirectly controlled by the action.
+        """
+        raise NotImplementedError()
+
+    @property
+    def current_pos(self):
+        """
+            Returns the current position of the action/control dimension. 
+            The dimensionality has to match the action/control dimension.
+            This is not required when exclusively using velocity control;
+            it should, however, be implemented regardless.
+            E.g. The joint positions that are directly or indirectly controlled by the action.
+        """
+        raise NotImplementedError()
+
+    @property
+    def goal_pos(self):
+        """
+            Returns a predefined final position of the action/control dimension.
+            This is only required for the DMP and is usually learned instead.
+        """
+        raise NotImplementedError()
+
+    @property
+    def dt(self):
+        """
+            Returns the time between two simulated steps of the environment.
+        """
+        raise NotImplementedError()
+
+```
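To make the interface more concrete, here is a hypothetical filled-in version for a toy 2-link planar reacher. It deliberately does not import `mp_env_api`: the class name, constructor, `full_observation` helper, observation layout, and `dt` value are all assumptions for illustration. It mainly shows how `active_obs` can drop the velocity entries, as suggested in that property's docstring.

```python
import numpy as np


class ToyReacherMPWrapper:
    """Hypothetical illustration of the MPWrapper properties for a 2-link planar reacher.

    Assumed observation layout (8 values):
    [cos(q1), cos(q2), sin(q1), sin(q2), qdot1, qdot2, target_x, target_y].
    A real wrapper would subclass MPEnvWrapper and read these values from the
    wrapped environment instead of storing them itself.
    """

    def __init__(self, joint_pos, joint_vel, target, dt=0.01):
        self._q = np.asarray(joint_pos, dtype=float)
        self._qdot = np.asarray(joint_vel, dtype=float)
        self._target = np.asarray(target, dtype=float)
        self._dt = dt

    def full_observation(self):
        # Step-based observation in the assumed layout above
        return np.hstack([np.cos(self._q), np.sin(self._q), self._qdot, self._target])

    @property
    def active_obs(self):
        # Keep joint angles and target; drop the joint velocities, which start at 0
        # and therefore carry no information in the single context observation.
        return np.hstack([np.ones(4, dtype=bool), np.zeros(2, dtype=bool), np.ones(2, dtype=bool)])

    @property
    def current_pos(self):
        # Positions of the controlled joints (same dimension as the action)
        return self._q

    @property
    def current_vel(self):
        # Velocities of the controlled joints (same dimension as the action)
        return self._qdot

    @property
    def goal_pos(self):
        # Assumption: the DMP goal is learned, so no predefined goal is provided.
        raise NotImplementedError()

    @property
    def dt(self):
        # Time between two simulated environment steps (illustrative value)
        return self._dt


wrapper = ToyReacherMPWrapper(joint_pos=[0.1, -0.2], joint_vel=[0.0, 0.0], target=[1.0, 0.5])
context = wrapper.full_observation()[wrapper.active_obs]
print(context.shape)  # (6,)
```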
+
+If you created a new task wrapper, feel free to open a PR so we can integrate it for others to use as well.
+Even without the integration, the task can still be used. A rough outline is shown here; for more details, we recommend
+having a look at the [examples](alr_envs/examples/).
+
+```python
+import alr_envs
+
+# Base environment name, following the `domain_name-task_name` structure described above
+base_env_id = "ball_in_cup-catch"
+
+# Replace this wrapper with the custom wrapper for your environment by inheriting from the MPEnvWrapper.
+# You can also add other gym.Wrappers in case they are needed, 
+# e.g. gym.wrappers.FlattenObservation for dict observations
+wrappers = [alr_envs.dmc.suite.ball_in_cup.MPWrapper]
+mp_kwargs = {...}
+kwargs = {...}
+env = alr_envs.make_dmp_env(base_env_id, wrappers=wrappers, seed=1, mp_kwargs=mp_kwargs, **kwargs)
+# OR for a deterministic ProMP (other mp_kwargs are required):
+# env = alr_envs.make_detpmp_env(base_env_id, wrappers=wrappers, seed=1, mp_kwargs=mp_kwargs)
+
+rewards = 0
+obs = env.reset()
+
+# number of samples/full trajectories (multiple environment steps)
+for i in range(5):
+    ac = env.action_space.sample()
+    obs, reward, done, info = env.step(ac)
+    rewards += reward
+
+    if done:
+        print(base_env_id, rewards)
+        rewards = 0
+        obs = env.reset()
+```
--- a/alr_envs/__init__.py
+++ b/alr_envs/__init__.py
@@ -1,579 +1,15 @@
-import numpy as np
-from gym.envs.registration import register
+from alr_envs import dmc, meta, open_ai
+from alr_envs.utils.make_env_helpers import make, make_detpmp_env, make_dmp_env, make_rank
+from alr_envs.utils import make_dmc

-from alr_envs.classic_control.hole_reacher.hole_reacher_mp_wrapper import HoleReacherMPWrapper
-from alr_envs.classic_control.simple_reacher.simple_reacher_mp_wrapper import SimpleReacherMPWrapper
-from alr_envs.classic_control.viapoint_reacher.viapoint_reacher_mp_wrapper import ViaPointReacherMPWrapper
-from alr_envs.dmc.ball_in_cup.ball_in_the_cup_mp_wrapper import DMCBallInCupMPWrapper
-from alr_envs.mujoco.ball_in_a_cup.ball_in_a_cup_mp_wrapper import BallInACupMPWrapper
-from alr_envs.stochastic_search.functions.f_rosenbrock import Rosenbrock
+# Convenience function for all MP environments
+from .alr import ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS
+from .dmc import ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS
+from .meta import ALL_METAWORLD_MOTION_PRIMITIVE_ENVIRONMENTS
+from .open_ai import ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS

-# Mujoco
-
-## Reacher
-register(
-    id='ALRReacher-v0',
-    entry_point='alr_envs.mujoco:ALRReacherEnv',
-    max_episode_steps=200,
-    kwargs={
-        "steps_before_reward": 0,
-        "n_links": 5,
-        "balance": False,
-    }
-)
-
-register(
-    id='ALRReacherSparse-v0',
-    entry_point='alr_envs.mujoco:ALRReacherEnv',
-    max_episode_steps=200,
-    kwargs={
-        "steps_before_reward": 200,
-        "n_links": 5,
-        "balance": False,
-    }
-)
-
-register(
-    id='ALRReacherSparseBalanced-v0',
-    entry_point='alr_envs.mujoco:ALRReacherEnv',
-    max_episode_steps=200,
-    kwargs={
-        "steps_before_reward": 200,
-        "n_links": 5,
-        "balance": True,
-    }
-)
-
-register(
-    id='ALRLongReacher-v0',
-    entry_point='alr_envs.mujoco:ALRReacherEnv',
-    max_episode_steps=200,
-    kwargs={
-        "steps_before_reward": 0,
-        "n_links": 7,
-        "balance": False,
-    }
-)
-
-register(
-    id='ALRLongReacherSparse-v0',
-    entry_point='alr_envs.mujoco:ALRReacherEnv',
-    max_episode_steps=200,
-    kwargs={
-        "steps_before_reward": 200,
-        "n_links": 7,
-        "balance": False,
-    }
-)
-
-register(
-    id='ALRLongReacherSparseBalanced-v0',
-    entry_point='alr_envs.mujoco:ALRReacherEnv',
-    max_episode_steps=200,
-    kwargs={
-        "steps_before_reward": 200,
-        "n_links": 7,
-        "balance": True,
-    }
-)
-
-## Balancing Reacher
-
-register(
-    id='Balancing-v0',
-    entry_point='alr_envs.mujoco:BalancingEnv',
-    max_episode_steps=200,
-    kwargs={
-        "n_links": 5,
-    }
-)
-
-register(
-    id='ALRBallInACupSimple-v0',
-    entry_point='alr_envs.mujoco:ALRBallInACupEnv',
-    max_episode_steps=4000,
-    kwargs={
-        "simplified": True,
-        "reward_type": "no_context",
-    }
-)
-
-register(
-    id='ALRBallInACupPDSimple-v0',
-    entry_point='alr_envs.mujoco:ALRBallInACupPDEnv',
-    max_episode_steps=4000,
-    kwargs={
-        "simplified": True,
-        "reward_type": "no_context"
-    }
-)
-
-register(
-    id='ALRBallInACupPD-v0',
-    entry_point='alr_envs.mujoco:ALRBallInACupPDEnv',
-    max_episode_steps=4000,
-    kwargs={
-        "simplified": False,
-        "reward_type": "no_context"
-    }
-)
-
-register(
-    id='ALRBallInACup-v0',
-    entry_point='alr_envs.mujoco:ALRBallInACupEnv',
-    max_episode_steps=4000,
-    kwargs={
-        "reward_type": "no_context"
-    }
-)
-
-register(
-    id='ALRBallInACupGoal-v0',
-    entry_point='alr_envs.mujoco:ALRBallInACupEnv',
-    max_episode_steps=4000,
-    kwargs={
-        "reward_type": "contextual_goal"
-    }
-)
-
-# Classic control
-
-## Simple Reacher
-register(
-    id='SimpleReacher-v0',
-    entry_point='alr_envs.classic_control:SimpleReacherEnv',
-    max_episode_steps=200,
-    kwargs={
-        "n_links": 2,
-    }
-)
-
-register(
-    id='SimpleReacher-v1',
-    entry_point='alr_envs.classic_control:SimpleReacherEnv',
-    max_episode_steps=200,
-    kwargs={
-        "n_links": 2,
-        "random_start": False
-    }
-)
-
-register(
-    id='LongSimpleReacher-v0',
-    entry_point='alr_envs.classic_control:SimpleReacherEnv',
-    max_episode_steps=200,
-    kwargs={
-        "n_links": 5,
-    }
-)
-
-register(
-    id='LongSimpleReacher-v1',
-    entry_point='alr_envs.classic_control:SimpleReacherEnv',
-    max_episode_steps=200,
-    kwargs={
-        "n_links": 5,
-        "random_start": False
-    }
-)
-
-## Viapoint Reacher
-
-register(
-    id='ViaPointReacher-v0',
-    entry_point='alr_envs.classic_control:ViaPointReacher',
-    max_episode_steps=200,
-    kwargs={
-        "n_links": 5,
-        "allow_self_collision": False,
-        "collision_penalty": 1000
-    }
-)
-
-## Hole Reacher
-register(
-    id='HoleReacher-v0',
-    entry_point='alr_envs.classic_control:HoleReacherEnv',
-    max_episode_steps=200,
-    kwargs={
-        "n_links": 5,
-        "random_start": True,
-        "allow_self_collision": False,
-        "allow_wall_collision": False,
-        "hole_width": None,
-        "hole_depth": 1,
-        "hole_x": None,
-        "collision_penalty": 100,
-    }
-)
-
-register(
-    id='HoleReacher-v1',
-    entry_point='alr_envs.classic_control:HoleReacherEnv',
-    max_episode_steps=200,
-    kwargs={
-        "n_links": 5,
-        "random_start": False,
-        "allow_self_collision": False,
-        "allow_wall_collision": False,
-        "hole_width": None,
-        "hole_depth": 1,
-        "hole_x": None,
-        "collision_penalty": 100,
-    }
-)
-
-register(
-    id='HoleReacher-v2',
-    entry_point='alr_envs.classic_control:HoleReacherEnv',
-    max_episode_steps=200,
-    kwargs={
-        "n_links": 5,
-        "random_start": False,
-        "allow_self_collision": False,
-        "allow_wall_collision": False,
-        "hole_width": 0.25,
-        "hole_depth": 1,
-        "hole_x": 2,
-        "collision_penalty": 100,
-    }
-)
-
-# MP environments
-
-## Simple Reacher
-versions = ["SimpleReacher-v0", "SimpleReacher-v1", "LongSimpleReacher-v0", "LongSimpleReacher-v1"]
-for v in versions:
-    name = v.split("-")
-    register(
-        id=f'{name[0]}DMP-{name[1]}',
-        entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper',
-        # max_episode_steps=1,
-        kwargs={
-            "name": f"alr_envs:{v}",
-            "wrappers": [SimpleReacherMPWrapper],
-            "mp_kwargs": {
-                "num_dof": 2 if "long" not in v.lower() else 5,
-                "num_basis": 5,
-                "duration": 2,
-                "alpha_phase": 2,
-                "learn_goal": True,
-                "policy_type": "velocity",
-                "weights_scale": 50,
-            }
-        }
-    )
-
-register(
-    id='ViaPointReacherDMP-v0',
-    entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper',
-    # max_episode_steps=1,
-    kwargs={
-        "name": "alr_envs:ViaPointReacher-v0",
-        "wrappers": [ViaPointReacherMPWrapper],
-        "mp_kwargs": {
-            "num_dof": 5,
-            "num_basis": 5,
-            "duration": 2,
-            "learn_goal": True,
-            "alpha_phase": 2,
-            "policy_type": "velocity",
-            "weights_scale": 50,
-        }
-    }
-)
-
-## Hole Reacher
-versions = ["v0", "v1", "v2"]
-for v in versions:
-    register(
-        id=f'HoleReacherDMP-{v}',
-        entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper',
-        # max_episode_steps=1,
-        kwargs={
-            "name": f"alr_envs:HoleReacher-{v}",
-            "wrappers": [HoleReacherMPWrapper],
-            "mp_kwargs": {
-                "num_dof": 5,
-                "num_basis": 5,
-                "duration": 2,
-                "learn_goal": True,
-                "alpha_phase": 2,
-                "bandwidth_factor": 2,
-                "policy_type": "velocity",
-                "weights_scale": 50,
-                "goal_scale": 0.1
-            }
-        }
-    )
-
-    register(
-        id=f'HoleReacherDetPMP-{v}',
-        entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper',
-        kwargs={
-            "name": f"alr_envs:HoleReacher-{v}",
-            "wrappers": [HoleReacherMPWrapper],
-            "mp_kwargs": {
-                "num_dof": 5,
-                "num_basis": 5,
-                "duration": 2,
-                "width": 0.025,
-                "policy_type": "velocity",
-                "weights_scale": 0.2,
-                "zero_start": True
-            }
-        }
-    )
-
-# TODO: properly add final_pos
-register(
-    id='HoleReacherFixedGoalDMP-v0',
-    entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper',
-    # max_episode_steps=1,
-    kwargs={
-        "name": "alr_envs:HoleReacher-v0",
-        "wrappers": [HoleReacherMPWrapper],
-        "mp_kwargs": {
-            "num_dof": 5,
-            "num_basis": 5,
-            "duration": 2,
-            "learn_goal": False,
-            "alpha_phase": 2,
-            "policy_type": "velocity",
-            "weights_scale": 50,
-            "goal_scale": 0.1
-        }
-    }
-)
-
-## Ball in Cup
-
-register(
-    id='ALRBallInACupSimpleDMP-v0',
-    entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper',
-    kwargs={
-        "name": "alr_envs:ALRBallInACupSimple-v0",
-        "wrappers": [BallInACupMPWrapper],
-        "mp_kwargs": {
-            "num_dof": 3,
-            "num_basis": 5,
-            "duration": 3.5,
-            "post_traj_time": 4.5,
-            "learn_goal": False,
-            "alpha_phase": 3,
-            "bandwidth_factor": 2.5,
-            "policy_type": "motor",
-            "weights_scale": 100,
-            "return_to_start": True,
-            "policy_kwargs": {
-                "p_gains": np.array([4. / 3., 2.4, 2.5, 5. / 3., 2., 2., 1.25]),
-                "d_gains": np.array([0.0466, 0.12, 0.125, 0.04166, 0.06, 0.06, 0.025])
-            }
-        }
-    }
-)
-
-register(
-    id='ALRBallInACupDMP-v0',
-    entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper',
-    kwargs={
-        "name": "alr_envs:ALRBallInACup-v0",
-        "wrappers": [BallInACupMPWrapper],
-        "mp_kwargs": {
-            "num_dof": 7,
-            "num_basis": 5,
-            "duration": 3.5,
-            "post_traj_time": 4.5,
-            "learn_goal": False,
-            "alpha_phase": 3,
-            "bandwidth_factor": 2.5,
-            "policy_type": "motor",
-            "weights_scale": 100,
-            "return_to_start": True,
-            "policy_kwargs": {
-                "p_gains": np.array([4. / 3., 2.4, 2.5, 5. / 3., 2., 2., 1.25]),
-                "d_gains": np.array([0.0466, 0.12, 0.125, 0.04166, 0.06, 0.06, 0.025])
-            }
-        }
-    }
-)
-
-register(
-    id='ALRBallInACupSimpleDetPMP-v0',
-    entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper',
-    kwargs={
-        "name": "alr_envs:ALRBallInACupSimple-v0",
-        "wrappers": [BallInACupMPWrapper],
-        "mp_kwargs": {
-            "num_dof": 3,
-            "num_basis": 5,
-            "duration": 3.5,
-            "post_traj_time": 4.5,
-            "width": 0.0035,
-            # "off": -0.05,
-            "policy_type": "motor",
-            "weights_scale": 0.2,
-            "zero_start": True,
-            "zero_goal": True,
-            "policy_kwargs": {
-                "p_gains": np.array([4. / 3., 2.4, 2.5, 5. / 3., 2., 2., 1.25]),
-                "d_gains": np.array([0.0466, 0.12, 0.125, 0.04166, 0.06, 0.06, 0.025])
-            }
-        }
-    }
-)
-
-register(
-    id='ALRBallInACupPDSimpleDetPMP-v0',
-    entry_point='alr_envs.mujoco.ball_in_a_cup.biac_pd:make_detpmp_env_helper',
-    kwargs={
-        "name": "alr_envs:ALRBallInACupPDSimple-v0",
-        "wrappers": [BallInACupMPWrapper],
-        "mp_kwargs": {
-            "num_dof": 3,
-            "num_basis": 5,
-            "duration": 3.5,
-            "post_traj_time": 4.5,
-            "width": 0.0035,
-            # "off": -0.05,
-            "policy_type": "motor",
-            "weights_scale": 0.2,
-            "zero_start": True,
-            "zero_goal": True,
-            "policy_kwargs": {
-                "p_gains": np.array([4. / 3., 2.4, 2.5, 5. / 3., 2., 2., 1.25]),
-                "d_gains": np.array([0.0466, 0.12, 0.125, 0.04166, 0.06, 0.06, 0.025])
-            }
-        }
-    }
-)
-
-register(
-    id='ALRBallInACupPDDetPMP-v0',
-    entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env',
-    kwargs={
-        "name": "alr_envs:ALRBallInACupPD-v0",
-        "num_dof": 7,
-        "num_basis": 5,
-        "duration": 3.5,
-        "post_traj_time": 4.5,
-        "width": 0.0035,
-        # "off": -0.05,
-        "policy_type": "motor",
-        "weights_scale": 0.2,
-        "zero_start": True,
-        "zero_goal": True,
-        "p_gains": np.array([4. / 3., 2.4, 2.5, 5. / 3., 2., 2., 1.25]),
-        "d_gains": np.array([0.0466, 0.12, 0.125, 0.04166, 0.06, 0.06, 0.025])
-    }
-)
-
-register(
-    id='ALRBallInACupDetPMP-v0',
-    entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper',
-    kwargs={
-        "name": "alr_envs:ALRBallInACupSimple-v0",
-        "wrappers": [BallInACupMPWrapper],
-        "mp_kwargs": {
-            "num_dof": 7,
-            "num_basis": 5,
-            "duration": 3.5,
-            "post_traj_time": 4.5,
-            "width": 0.0035,
-            "policy_type": "motor",
-            "weights_scale": 0.2,
-            "zero_start": True,
-            "zero_goal": True,
-            "policy_kwargs": {
-                "p_gains": np.array([4. / 3., 2.4, 2.5, 5. / 3., 2., 2., 1.25]),
-                "d_gains": np.array([0.0466, 0.12, 0.125, 0.04166, 0.06, 0.06, 0.025])
-            }
-        }
-    }
-)
-
-register(
-    id='ALRBallInACupGoalDMP-v0',
-    entry_point='alr_envs.utils.make_env_helpers:make_contextual_env',
-    kwargs={
-        "name": "alr_envs:ALRBallInACupGoal-v0",
-        "wrappers": [BallInACupMPWrapper],
-        "mp_kwargs": {
-            "num_dof": 7,
-            "num_basis": 5,
-            "duration": 3.5,
-            "post_traj_time": 4.5,
-            "learn_goal": True,
-            "alpha_phase": 3,
-            "bandwidth_factor": 2.5,
-            "policy_type": "motor",
-            "weights_scale": 50,
-            "goal_scale": 0.1,
-            "policy_kwargs": {
-                "p_gains": np.array([4. / 3., 2.4, 2.5, 5. / 3., 2., 2., 1.25]),
-                "d_gains": np.array([0.0466, 0.12, 0.125, 0.04166, 0.06, 0.06, 0.025])
-            }
-        }
-    }
-)
-
-## DMC
-
-register(
-    id=f'dmc_ball_in_cup-catch_dmp-v0',
-    entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper',
-    # max_episode_steps=1,
-    kwargs={
-        "name": f"ball_in_cup-catch",
-        "wrappers": [DMCBallInCupMPWrapper],
-        "mp_kwargs": {
-            "num_dof": 2,
-            "num_basis": 5,
-            "duration": 20,
-            "learn_goal": True,
-            "alpha_phase": 2,
-            "bandwidth_factor": 2,
-            "policy_type": "motor",
-            "weights_scale": 50,
-            "goal_scale": 0.1,
-            "policy_kwargs": {
-                "p_gains": 0.2,
-                "d_gains": 0.05
-            }
-        }
-    }
-)
-
-register(
-    id=f'dmc_ball_in_cup-catch_detpmp-v0',
-    entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper',
-    kwargs={
-        "name": f"ball_in_cup-catch",
-        "wrappers": [DMCBallInCupMPWrapper],
-        "mp_kwargs": {
-            "num_dof": 2,
-            "num_basis": 5,
-            "duration": 20,
-            "width": 0.025,
-            "policy_type": "velocity",
-            "weights_scale": 0.2,
-            "zero_start": True,
-            "policy_kwargs": {
-                "p_gains": 0.2,
-                "d_gains": 0.05
-            }
-        }
-    }
-)
-
-# BBO functions
-
-for dim in [5, 10, 25, 50, 100]:
-    register(
-        id=f'Rosenbrock{dim}-v0',
-        entry_point='alr_envs.stochastic_search:StochasticSearchEnv',
-        max_episode_steps=1,
-        kwargs={
-            "cost_f": Rosenbrock(dim),
-        }
-    )
+ALL_MOTION_PRIMITIVE_ENVIRONMENTS = {
+    key: value + ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS[key] +
+         ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS[key] +
+         ALL_METAWORLD_MOTION_PRIMITIVE_ENVIRONMENTS[key]
+    for key, value in ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS.items()}
--- a/alr_envs/alr/__init__.py
+++ b/alr_envs/alr/__init__.py
@@ -0,0 +1,329 @@
+from gym import register
+
+from . import classic_control, mujoco
+from .classic_control.hole_reacher.hole_reacher import HoleReacherEnv
+from .classic_control.simple_reacher.simple_reacher import SimpleReacherEnv
+from .classic_control.viapoint_reacher.viapoint_reacher import ViaPointReacherEnv
+from .mujoco.ball_in_a_cup.ball_in_a_cup import ALRBallInACupEnv
+from .mujoco.ball_in_a_cup.biac_pd import ALRBallInACupPDEnv
+from .mujoco.reacher.alr_reacher import ALRReacherEnv
+from .mujoco.reacher.balancing import BalancingEnv
+
+ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS = {"DMP": [], "DetPMP": []}
+
+# Classic Control
+## Simple Reacher
+register(
+    id='SimpleReacher-v0',
+    entry_point='alr_envs.alr.classic_control:SimpleReacherEnv',
+    max_episode_steps=200,
+    kwargs={
+        "n_links": 2,
+    }
+)
+
+register(
+    id='SimpleReacher-v1',
+    entry_point='alr_envs.alr.classic_control:SimpleReacherEnv',
+    max_episode_steps=200,
+    kwargs={
+        "n_links": 2,
+        "random_start": False
+    }
+)
+
+register(
+    id='LongSimpleReacher-v0',
+    entry_point='alr_envs.alr.classic_control:SimpleReacherEnv',
+    max_episode_steps=200,
+    kwargs={
+        "n_links": 5,
+    }
+)
+
+register(
+    id='LongSimpleReacher-v1',
+    entry_point='alr_envs.alr.classic_control:SimpleReacherEnv',
+    max_episode_steps=200,
+    kwargs={
+        "n_links": 5,
+        "random_start": False
+    }
+)
+
+## Viapoint Reacher
+
+register(
+    id='ViaPointReacher-v0',
+    entry_point='alr_envs.alr.classic_control:ViaPointReacherEnv',
+    max_episode_steps=200,
+    kwargs={
+        "n_links": 5,
+        "allow_self_collision": False,
+        "collision_penalty": 1000
+    }
+)
+
+## Hole Reacher
+register(
+    id='HoleReacher-v0',
+    entry_point='alr_envs.alr.classic_control:HoleReacherEnv',
+    max_episode_steps=200,
+    kwargs={
+        "n_links": 5,
+        "random_start": True,
+        "allow_self_collision": False,
+        "allow_wall_collision": False,
+        "hole_width": None,
+        "hole_depth": 1,
+        "hole_x": None,
+        "collision_penalty": 100,
+    }
+)
+
+register(
+    id='HoleReacher-v1',
+    entry_point='alr_envs.alr.classic_control:HoleReacherEnv',
+    max_episode_steps=200,
+    kwargs={
+        "n_links": 5,
+        "random_start": False,
+        "allow_self_collision": False,
+        "allow_wall_collision": False,
+        "hole_width": 0.25,
+        "hole_depth": 1,
+        "hole_x": None,
+        "collision_penalty": 100,
+    }
+)
+
+register(
+    id='HoleReacher-v2',
+    entry_point='alr_envs.alr.classic_control:HoleReacherEnv',
+    max_episode_steps=200,
+    kwargs={
+        "n_links": 5,
+        "random_start": False,
+        "allow_self_collision": False,
+        "allow_wall_collision": False,
+        "hole_width": 0.25,
+        "hole_depth": 1,
+        "hole_x": 2,
+        "collision_penalty": 100,
+    }
+)
+
+# Mujoco
+
+## Reacher
+register(
+    id='ALRReacher-v0',
+    entry_point='alr_envs.alr.mujoco:ALRReacherEnv',
+    max_episode_steps=200,
+    kwargs={
+        "steps_before_reward": 0,
+        "n_links": 5,
+        "balance": False,
+    }
+)
+
+register(
+    id='ALRReacherSparse-v0',
+    entry_point='alr_envs.alr.mujoco:ALRReacherEnv',
+    max_episode_steps=200,
+    kwargs={
+        "steps_before_reward": 200,
+        "n_links": 5,
+        "balance": False,
+    }
+)
+
+register(
+    id='ALRReacherSparseBalanced-v0',
+    entry_point='alr_envs.alr.mujoco:ALRReacherEnv',
+    max_episode_steps=200,
+    kwargs={
+        "steps_before_reward": 200,
+        "n_links": 5,
+        "balance": True,
+    }
+)
+
+register(
+    id='ALRLongReacher-v0',
+    entry_point='alr_envs.alr.mujoco:ALRReacherEnv',
+    max_episode_steps=200,
+    kwargs={
+        "steps_before_reward": 0,
+        "n_links": 7,
+        "balance": False,
+    }
+)
+
+register(
+    id='ALRLongReacherSparse-v0',
+    entry_point='alr_envs.alr.mujoco:ALRReacherEnv',
+    max_episode_steps=200,
+    kwargs={
+        "steps_before_reward": 200,
+        "n_links": 7,
+        "balance": False,
+    }
+)
+
+register(
+    id='ALRLongReacherSparseBalanced-v0',
+    entry_point='alr_envs.alr.mujoco:ALRReacherEnv',
+    max_episode_steps=200,
+    kwargs={
+        "steps_before_reward": 200,
+        "n_links": 7,
+        "balance": True,
+    }
+)
+
+## Balancing Reacher
+
+register(
+    id='Balancing-v0',
+    entry_point='alr_envs.alr.mujoco:BalancingEnv',
+    max_episode_steps=200,
+    kwargs={
+        "n_links": 5,
+    }
+)
+
+# Motion Primitive Environments
+
+## Simple Reacher
+_versions = ["SimpleReacher-v0", "SimpleReacher-v1", "LongSimpleReacher-v0", "LongSimpleReacher-v1"]
+for _v in _versions:
+    _name = _v.split("-")
+    _env_id = f'{_name[0]}DMP-{_name[1]}'
+    register(
+        id=_env_id,
+        entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper',
+        # max_episode_steps=1,
+        kwargs={
+            "name": f"alr_envs:{_v}",
+            "wrappers": [classic_control.simple_reacher.MPWrapper],
+            "mp_kwargs": {
+                "num_dof": 2 if "long" not in _v.lower() else 5,
+                "num_basis": 5,
+                "duration": 20,
+                "alpha_phase": 2,
+                "learn_goal": True,
+                "policy_type": "velocity",
+                "weights_scale": 50,
+            }
+        }
+    )
+    ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id)
+
+    _env_id = f'{_name[0]}DetPMP-{_name[1]}'
+    register(
+        id=_env_id,
+        entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper',
+        # max_episode_steps=1,
+        kwargs={
+            "name": f"alr_envs:{_v}",
+            "wrappers": [classic_control.simple_reacher.MPWrapper],
+            "mp_kwargs": {
+                "num_dof": 2 if "long" not in _v.lower() else 5,
+                "num_basis": 5,
+                "duration": 20,
+                "width": 0.025,
+                "policy_type": "velocity",
+                "weights_scale": 0.2,
+                "zero_start": True
+            }
+        }
+    )
+    ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append(_env_id)
+
+## Viapoint Reacher
+register(
+    id='ViaPointReacherDMP-v0',
+    entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper',
+    # max_episode_steps=1,
+    kwargs={
+        "name": "alr_envs:ViaPointReacher-v0",
+        "wrappers": [classic_control.viapoint_reacher.MPWrapper],
+        "mp_kwargs": {
+            "num_dof": 5,
+            "num_basis": 5,
+            "duration": 2,
+            "learn_goal": True,
+            "alpha_phase": 2,
+            "policy_type": "velocity",
+            "weights_scale": 50,
+        }
+    }
+)
+ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"].append("ViaPointReacherDMP-v0")
+
+register(
+    id='ViaPointReacherDetPMP-v0',
+    entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper',
+    # max_episode_steps=1,
+    kwargs={
+        "name": "alr_envs:ViaPointReacher-v0",
+        "wrappers": [classic_control.viapoint_reacher.MPWrapper],
+        "mp_kwargs": {
+            "num_dof": 5,
+            "num_basis": 5,
+            "duration": 2,
+            "width": 0.025,
+            "policy_type": "velocity",
+            "weights_scale": 0.2,
+            "zero_start": True
+        }
+    }
+)
+ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append("ViaPointReacherDetPMP-v0")
+
+## Hole Reacher
+_versions = ["v0", "v1", "v2"]
+for _v in _versions:
+    _env_id = f'HoleReacherDMP-{_v}'
+    register(
+        id=_env_id,
+        entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper',
+        # max_episode_steps=1,
+        kwargs={
+            "name": f"alr_envs:HoleReacher-{_v}",
+            "wrappers": [classic_control.hole_reacher.MPWrapper],
+            "mp_kwargs": {
+                "num_dof": 5,
+                "num_basis": 5,
+                "duration": 2,
+                "learn_goal": True,
+                "alpha_phase": 2,
+                "bandwidth_factor": 2,
+                "policy_type": "velocity",
+                "weights_scale": 50,
+                "goal_scale": 0.1
+            }
+        }
+    )
+    ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id)
+
+    _env_id = f'HoleReacherDetPMP-{_v}'
+    register(
+        id=_env_id,
+        entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper',
+        kwargs={
+            "name": f"alr_envs:HoleReacher-{_v}",
+            "wrappers": [classic_control.hole_reacher.MPWrapper],
+            "mp_kwargs": {
+                "num_dof": 5,
+                "num_basis": 5,
+                "duration": 2,
+                "width": 0.025,
+                "policy_type": "velocity",
+                "weights_scale": 0.2,
+                "zero_start": True
+            }
+        }
+    )
+    ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append(_env_id)
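The registration loops above derive the motion-primitive ids from the step-based ids. The sketch below mirrors only two rules: that naming rule, and the `num_dof` rule used in the simple-reacher loop. The helper names `mp_ids_for` and `num_dof_for` are hypothetical, and nothing is registered with gym:

```python
def mp_ids_for(step_env_id: str) -> tuple:
    # Same split as `_name = _v.split("-")` in the loop: "<Name>-<version>".
    parts = step_env_id.split("-")
    return f"{parts[0]}DMP-{parts[1]}", f"{parts[0]}DetPMP-{parts[1]}"


def num_dof_for(step_env_id: str) -> int:
    # Same rule as the simple-reacher loop: "Long" variants have 5 links, the rest 2.
    return 2 if "long" not in step_env_id.lower() else 5


versions = ["SimpleReacher-v0", "SimpleReacher-v1", "LongSimpleReacher-v0", "LongSimpleReacher-v1"]
for v in versions:
    dmp_id, detpmp_id = mp_ids_for(v)
    print(v, "->", dmp_id, detpmp_id, "num_dof =", num_dof_for(v))
```

The hole-reacher loop builds `HoleReacherDMP-{_v}` and `HoleReacherDetPMP-{_v}` directly. Those ids match what `mp_ids_for("HoleReacher-v0")` returns, although that loop always uses `num_dof = 5`.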
--- a/alr_envs/alr/classic_control/README.MD
+++ b/alr_envs/alr/classic_control/README.MD
@ -0,0 +1,21 @@
+# Classic Control
+
+## Step-based Environments
+|Name| Description|Horizon|Action Dimension|Observation Dimension
+|---|---|---|---|---|
+|`SimpleReacher-v0`| Simple reaching task (2 links) without any physics simulation. Provides no reward until time step 150. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory.| 200 | 2 | 9
+|`LongSimpleReacher-v0`| Simple reaching task (5 links) without any physics simulation. Provides no reward until time step 150. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory.| 200 | 5 | 18
+|`ViaPointReacher-v0`| Simple reaching task leveraging a via point, which supports self-collision detection. Provides a reward only at time steps 100 and 199 for reaching the via point and goal point, respectively.| 200 | 5 | 18 
+|`HoleReacher-v0`| 5-link reaching task where the end-effector needs to reach into a narrow hole without colliding with itself or walls | 200 | 5 | 18
+
+## MP Environments
+|Name| Description|Horizon|Action Dimension|Context Dimension
+|---|---|---|---|---|
+|`ViaPointReacherDMP-v0`| A DMP provides a trajectory for the `ViaPointReacher-v0` task. | 200 | 25
+|`HoleReacherFixedGoalDMP-v0`| A DMP provides a trajectory for the `HoleReacher-v0` task with a fixed goal attractor. | 200 | 25
+|`HoleReacherDMP-v0`| A DMP provides a trajectory for the `HoleReacher-v0` task. The goal attractor needs to be learned. | 200 | 30 
+|`ALRBallInACupSimpleDMP-v0`| A DMP provides a trajectory for the `ALRBallInACupSimple-v0` task where only 3 joints are actuated. | 4000 | 15
+|`ALRBallInACupDMP-v0`| A DMP provides a trajectory for the `ALRBallInACup-v0` task. | 4000 | 35
+|`ALRBallInACupGoalDMP-v0`| A DMP provides a trajectory for the `ALRBallInACupGoal-v0` task. | 4000 | 35 | 3
+
+[//]:  |`HoleReacherDetPMP-v0`|
--- a/alr_envs/alr/classic_control/__init__.py
+++ b/alr_envs/alr/classic_control/__init__.py
@ -0,0 +1,3 @@
+from .hole_reacher.hole_reacher import HoleReacherEnv
+from .simple_reacher.simple_reacher import SimpleReacherEnv
+from .viapoint_reacher.viapoint_reacher import ViaPointReacherEnv
--- a/alr_envs/alr/classic_control/hole_reacher/__init__.py
+++ b/alr_envs/alr/classic_control/hole_reacher/__init__.py
@ -0,0 +1 @@
+from .mp_wrapper import MPWrapper
--- a/alr_envs/alr/classic_control/hole_reacher/hole_reacher.py
+++ b/alr_envs/alr/classic_control/hole_reacher/hole_reacher.py
@ -6,7 +6,7 @@ import numpy as np
 from gym.utils import seeding
 from matplotlib import patches

-from alr_envs.classic_control.utils import check_self_collision
+from alr_envs.alr.classic_control.utils import check_self_collision


 class HoleReacherEnv(gym.Env):
@ -122,12 +122,26 @@ class HoleReacherEnv(gym.Env):
        return self._get_obs().copy()

    def _generate_hole(self):
-        self._tmp_x = self.np_random.uniform(1, 3.5, 1) if self.initial_x is None else np.copy(self.initial_x)
-        self._tmp_width = self.np_random.uniform(0.15, 0.5, 1) if self.initial_width is None else np.copy(
-            self.initial_width)
-        # TODO we do not want this right now.
-        self._tmp_depth = self.np_random.uniform(1, 1, 1) if self.initial_depth is None else np.copy(
-            self.initial_depth)
+        if self.initial_width is None:
+            width = self.np_random.uniform(0.15, 0.5)
+        else:
+            width = np.copy(self.initial_width)
+        if self.initial_x is None:
+            # sample hole on left or right side
+            direction = self.np_random.choice([-1, 1])
+            # Hole center needs to be half the width away from the arm to give a valid setting.
+            x = direction * self.np_random.uniform(width / 2, 3.5)
+        else:
+            x = np.copy(self.initial_x)
+        if self.initial_depth is None:
+            # TODO we do not want this right now.
+            depth = self.np_random.uniform(1, 1)
+        else:
+            depth = np.copy(self.initial_depth)
+
+        self._tmp_width = width
+        self._tmp_x = x
+        self._tmp_depth = depth
        self._goal = np.hstack([self._tmp_x, -self._tmp_depth])

    def _update_joints(self):
@ -202,7 +216,6 @@ class HoleReacherEnv(gym.Env):
        return np.squeeze(end_effector + self._joints[0, :])

    def _check_wall_collision(self, line_points):
-
        # all points that are before the hole in x
        r, c = np.where(line_points[:, :, 0] < (self._tmp_x - self._tmp_width / 2))

@ -250,7 +263,7 @@ class HoleReacherEnv(gym.Env):
            self.fig.show()

        self.fig.gca().set_title(
-            f"Iteration: {self._steps}, distance: {self.end_effector - self._goal}")
+            f"Iteration: {self._steps}, distance: {np.linalg.norm(self.end_effector - self._goal) ** 2}")

        if mode == "human":

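The new `_generate_hole` can place the hole on either side of the arm. The old code sampled its centre only in `[1, 3.5)`. Below is a standalone sketch of the same sampling logic. It assumes a NumPy `Generator` stands in for the env's `np_random`, and that the arm base sits at `x = 0`. `sample_hole` is a hypothetical helper, not part of the environment:

```python
import numpy as np


def sample_hole(rng, initial_width=None, initial_x=None, initial_depth=None):
    # Width: uniform in [0.15, 0.5) unless a fixed width is given.
    width = rng.uniform(0.15, 0.5) if initial_width is None else float(initial_width)
    if initial_x is None:
        # Pick the left (-1) or right (+1) side, then keep the hole centre at least
        # half a width away from the arm base (assumed at x = 0).
        direction = rng.choice([-1, 1])
        x = direction * rng.uniform(width / 2, 3.5)
    else:
        x = float(initial_x)
    # The env samples uniform(1, 1), i.e. a depth of exactly 1, when none is given.
    depth = 1.0 if initial_depth is None else float(initial_depth)
    # Goal is the bottom centre of the hole, as in `np.hstack([self._tmp_x, -self._tmp_depth])`.
    goal = np.hstack([x, -depth])
    return width, x, depth, goal


rng = np.random.default_rng(0)
print(sample_hole(rng))                                  # random width and side
print(sample_hole(rng, initial_width=0.25, initial_x=2))  # settings of HoleReacher-v2
```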
--- a/alr_envs/classic_control/hole_reacher/hole_reacher_mp_wrapper.py
+++ b/alr_envs/classic_control/hole_reacher/hole_reacher_mp_wrapper.py
@ -2,10 +2,10 @@ from typing import Tuple, Union

 import numpy as np

-from mp_env_api.interface_wrappers.mp_env_wrapper import MPEnvWrapper
+from mp_env_api import MPEnvWrapper


-class HoleReacherMPWrapper(MPEnvWrapper):
+class MPWrapper(MPEnvWrapper):
    @property
    def active_obs(self):
        return np.hstack([
--- a/alr_envs/alr/classic_control/simple_reacher/__init__.py
+++ b/alr_envs/alr/classic_control/simple_reacher/__init__.py
@ -0,0 +1 @@
+from .mp_wrapper import MPWrapper
--- a/alr_envs/classic_control/simple_reacher/simple_reacher_mp_wrapper.py
+++ b/alr_envs/classic_control/simple_reacher/simple_reacher_mp_wrapper.py
@ -2,10 +2,10 @@ from typing import Tuple, Union

 import numpy as np

-from mp_env_api.interface_wrappers.mp_env_wrapper import MPEnvWrapper
+from mp_env_api import MPEnvWrapper


-class SimpleReacherMPWrapper(MPEnvWrapper):
+class MPWrapper(MPEnvWrapper):
    @property
    def active_obs(self):
        return np.hstack([
--- a/alr_envs/alr/classic_control/simple_reacher/simple_reacher.py
+++ b/alr_envs/alr/classic_control/simple_reacher/simple_reacher.py
--- a/alr_envs/alr/classic_control/utils.py
+++ b/alr_envs/alr/classic_control/utils.py
--- a/alr_envs/alr/classic_control/viapoint_reacher/__init__.py
+++ b/alr_envs/alr/classic_control/viapoint_reacher/__init__.py
@ -0,0 +1 @@
+from .mp_wrapper import MPWrapper
--- a/alr_envs/classic_control/viapoint_reacher/viapoint_reacher_mp_wrapper.py
+++ b/alr_envs/classic_control/viapoint_reacher/viapoint_reacher_mp_wrapper.py
@ -2,10 +2,10 @@ from typing import Tuple, Union

 import numpy as np

-from mp_env_api.interface_wrappers.mp_env_wrapper import MPEnvWrapper
+from mp_env_api import MPEnvWrapper


-class ViaPointReacherMPWrapper(MPEnvWrapper):
+class MPWrapper(MPEnvWrapper):
    @property
    def active_obs(self):
        return np.hstack([
--- a/alr_envs/alr/classic_control/viapoint_reacher/viapoint_reacher.py
+++ b/alr_envs/alr/classic_control/viapoint_reacher/viapoint_reacher.py
@ -5,10 +5,10 @@ import matplotlib.pyplot as plt
 import numpy as np
 from gym.utils import seeding

-from alr_envs.classic_control.utils import check_self_collision
+from alr_envs.alr.classic_control.utils import check_self_collision


-class ViaPointReacher(gym.Env):
+class ViaPointReacherEnv(gym.Env):

    def __init__(self, n_links, random_start: bool = False, via_target: Union[None, Iterable] = None,
                 target: Union[None, Iterable] = None, allow_self_collision=False, collision_penalty=1000):
--- a/alr_envs/alr/mujoco/README.MD
+++ b/alr_envs/alr/mujoco/README.MD
@ -0,0 +1,15 @@
+# Custom Mujoco tasks
+
+## Step-based Environments
+|Name| Description|Horizon|Action Dimension|Observation Dimension
+|---|---|---|---|---|
+|`ALRReacher-v0`|Modified version (5 links) of Mujoco gym's `Reacher-v2` (2 links)| 200 | 5 | 21
+|`ALRReacherSparse-v0`|Same as `ALRReacher-v0`, but the distance penalty is only provided in the last time step.| 200 | 5 | 21
+|`ALRReacherSparseBalanced-v0`|Same as `ALRReacherSparse-v0`, but the end-effector has to remain upright.| 200 | 5 | 21
+|`ALRLongReacher-v0`|Modified version (7 links) of Mujoco gym's `Reacher-v2` (2 links)| 200 | 7 | 27
+|`ALRLongReacherSparse-v0`|Same as `ALRLongReacher-v0`, but the distance penalty is only provided in the last time step.| 200 | 7 | 27
+|`ALRLongReacherSparseBalanced-v0`|Same as `ALRLongReacherSparse-v0`, but the end-effector has to remain upright.| 200 | 7 | 27
+|`ALRBallInACupSimple-v0`| Ball-in-a-cup task where a robot needs to catch a ball attached to a cup at its end-effector. | 4000 | 3 | wip
+|`ALRBallInACup-v0`| Ball-in-a-cup task where a robot needs to catch a ball attached to a cup at its end-effector | 4000 | 7 | wip
+|`ALRBallInACupGoal-v0`| Similar to `ALRBallInACupSimple-v0` but the ball needs to be caught at a specified goal position | 4000 | 7 | wip
+    
--- a/alr_envs/alr/mujoco/__init__.py
+++ b/alr_envs/alr/mujoco/__init__.py
@ -0,0 +1,4 @@
+from .reacher.alr_reacher import ALRReacherEnv
+from .reacher.balancing import BalancingEnv
+from .ball_in_a_cup.ball_in_a_cup import ALRBallInACupEnv
+from .ball_in_a_cup.biac_pd import ALRBallInACupPDEnv
--- a/alr_envs/alr/mujoco/alr_reward_fct.py
+++ b/alr_envs/alr/mujoco/alr_reward_fct.py
--- a/alr_envs/alr/mujoco/ball_in_a_cup/__init__.py
+++ b/alr_envs/alr/mujoco/ball_in_a_cup/__init__.py
--- a/alr_envs/alr/mujoco/ball_in_a_cup/assets/biac_base.xml
+++ b/alr_envs/alr/mujoco/ball_in_a_cup/assets/biac_base.xml
--- a/alr_envs/alr/mujoco/ball_in_a_cup/ball_in_a_cup.py
+++ b/alr_envs/alr/mujoco/ball_in_a_cup/ball_in_a_cup.py
@ -35,10 +35,10 @@ class ALRBallInACupEnv(MujocoEnv, utils.EzPickle):
        self.sim_time = 8  # seconds
        self.sim_steps = int(self.sim_time / self.dt)
        if reward_type == "no_context":
-            from alr_envs.mujoco.ball_in_a_cup.ball_in_a_cup_reward_simple import BallInACupReward
+            from alr_envs.alr.mujoco.ball_in_a_cup.ball_in_a_cup_reward_simple import BallInACupReward
            reward_function = BallInACupReward
        elif reward_type == "contextual_goal":
-            from alr_envs.mujoco.ball_in_a_cup.ball_in_a_cup_reward import BallInACupReward
+            from alr_envs.alr.mujoco.ball_in_a_cup.ball_in_a_cup_reward import BallInACupReward
            reward_function = BallInACupReward
        else:
            raise ValueError("Unknown reward type: {}".format(reward_type))
--- a/alr_envs/alr/mujoco/ball_in_a_cup/ball_in_a_cup_mp_wrapper.py
+++ b/alr_envs/alr/mujoco/ball_in_a_cup/ball_in_a_cup_mp_wrapper.py
@ -2,7 +2,7 @@ from typing import Tuple, Union

 import numpy as np

-from mp_env_api.interface_wrappers.mp_env_wrapper import MPEnvWrapper
+from mp_env_api import MPEnvWrapper


 class BallInACupMPWrapper(MPEnvWrapper):
--- a/alr_envs/alr/mujoco/ball_in_a_cup/ball_in_a_cup_reward.py
+++ b/alr_envs/alr/mujoco/ball_in_a_cup/ball_in_a_cup_reward.py
@ -1,5 +1,5 @@
 import numpy as np
-from alr_envs.mujoco import alr_reward_fct
+from alr_envs.alr.mujoco import alr_reward_fct


 class BallInACupReward(alr_reward_fct.AlrReward):
--- a/alr_envs/alr/mujoco/ball_in_a_cup/ball_in_a_cup_reward_simple.py
+++ b/alr_envs/alr/mujoco/ball_in_a_cup/ball_in_a_cup_reward_simple.py
@ -1,5 +1,5 @@
 import numpy as np
-from alr_envs.mujoco import alr_reward_fct
+from alr_envs.alr.mujoco import alr_reward_fct


 class BallInACupReward(alr_reward_fct.AlrReward):
--- a/alr_envs/alr/mujoco/ball_in_a_cup/biac_pd.py
+++ b/alr_envs/alr/mujoco/ball_in_a_cup/biac_pd.py
@ -42,10 +42,10 @@ class ALRBallInACupPDEnv(mujoco_env.MujocoEnv, utils.EzPickle):
        self._dt = 0.02
        self.ep_length = 4000  # based on 8 seconds with dt = 0.02 int(self.sim_time / self.dt)
        if reward_type == "no_context":
-            from alr_envs.mujoco.ball_in_a_cup.ball_in_a_cup_reward_simple import BallInACupReward
+            from alr_envs.alr.mujoco.ball_in_a_cup.ball_in_a_cup_reward_simple import BallInACupReward
            reward_function = BallInACupReward
        elif reward_type == "contextual_goal":
-            from alr_envs.mujoco.ball_in_a_cup.ball_in_a_cup_reward import BallInACupReward
+            from alr_envs.alr.mujoco.ball_in_a_cup.ball_in_a_cup_reward import BallInACupReward
            reward_function = BallInACupReward
        else:
            raise ValueError("Unknown reward type: {}".format(reward_type))
--- a/alr_envs/alr/mujoco/ball_in_a_cup/utils.py
+++ b/alr_envs/alr/mujoco/ball_in_a_cup/utils.py
@ -1,4 +1,4 @@
-from alr_envs.mujoco.ball_in_a_cup.ball_in_a_cup import ALRBallInACupEnv
+from alr_envs.alr.mujoco.ball_in_a_cup.ball_in_a_cup import ALRBallInACupEnv
 from mp_env_api.mp_wrappers.detpmp_wrapper import DetPMPWrapper
 from mp_env_api.mp_wrappers.dmp_wrapper import DmpWrapper

--- a/alr_envs/classic_control/hole_reacher/__init__.py
+++ b/alr_envs/classic_control/hole_reacher/__init__.py
--- a/alr_envs/alr/mujoco/beerpong/assets/beerpong.xml
+++ b/alr_envs/alr/mujoco/beerpong/assets/beerpong.xml
--- a/alr_envs/alr/mujoco/beerpong/beerpong.py
+++ b/alr_envs/alr/mujoco/beerpong/beerpong.py
@ -37,7 +37,7 @@ class ALRBeerpongEnv(MujocoEnv, utils.EzPickle):
        self.sim_time = 8  # seconds
        self.sim_steps = int(self.sim_time / self.dt)
        if reward_function is None:
-            from alr_envs.mujoco.beerpong.beerpong_reward import BeerpongReward
+            from alr_envs.alr.mujoco.beerpong.beerpong_reward import BeerpongReward
            reward_function = BeerpongReward
        self.reward_function = reward_function(self.sim, self.sim_steps)
        self.cup_robot_id = self.sim.model._site_name2id["cup_robot_final"]
--- a/alr_envs/alr/mujoco/beerpong/beerpong_reward.py
+++ b/alr_envs/alr/mujoco/beerpong/beerpong_reward.py
@ -1,5 +1,5 @@
 import numpy as np
-from alr_envs.mujoco import alr_reward_fct
+from alr_envs.alr.mujoco import alr_reward_fct


 class BeerpongReward(alr_reward_fct.AlrReward):
--- a/alr_envs/alr/mujoco/beerpong/beerpong_reward_simple.py
+++ b/alr_envs/alr/mujoco/beerpong/beerpong_reward_simple.py
@ -1,5 +1,5 @@
 import numpy as np
-from alr_envs.mujoco import alr_reward_fct
+from alr_envs.alr.mujoco import alr_reward_fct


 class BeerpongReward(alr_reward_fct.AlrReward):
--- a/alr_envs/alr/mujoco/beerpong/beerpong_simple.py
+++ b/alr_envs/alr/mujoco/beerpong/beerpong_simple.py
@ -38,7 +38,7 @@ class ALRBeerpongEnv(MujocoEnv, utils.EzPickle):
        self.sim_time = 8  # seconds
        self.sim_steps = int(self.sim_time / self.dt)
        if reward_function is None:
-            from alr_envs.mujoco.beerpong.beerpong_reward_simple import BeerpongReward
+            from alr_envs.alr.mujoco.beerpong.beerpong_reward_simple import BeerpongReward
            reward_function = BeerpongReward
        self.reward_function = reward_function(self.sim, self.sim_steps)
        self.cup_robot_id = self.sim.model._site_name2id["cup_robot_final"]
--- a/alr_envs/alr/mujoco/beerpong/utils.py
+++ b/alr_envs/alr/mujoco/beerpong/utils.py
@ -1,6 +1,6 @@
 from alr_envs.utils.mps.detpmp_wrapper import DetPMPWrapper
-from alr_envs.mujoco.beerpong.beerpong import ALRBeerpongEnv
-from alr_envs.mujoco.beerpong.beerpong_simple import ALRBeerpongEnv as ALRBeerpongEnvSimple
+from alr_envs.alr.mujoco.beerpong.beerpong import ALRBeerpongEnv
+from alr_envs.alr.mujoco.beerpong.beerpong_simple import ALRBeerpongEnv as ALRBeerpongEnvSimple


 def make_contextual_env(rank, seed=0):
--- a/alr_envs/alr/mujoco/gym_table_tennis/__init__.py
+++ b/alr_envs/alr/mujoco/gym_table_tennis/__init__.py
--- a/alr_envs/alr/mujoco/gym_table_tennis/envs/__init__.py
+++ b/alr_envs/alr/mujoco/gym_table_tennis/envs/__init__.py
--- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/include_7_motor_actuator.xml
+++ b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/include_7_motor_actuator.xml
--- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/include_barrett_wam_7dof_left.xml
+++ b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/include_barrett_wam_7dof_left.xml
--- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/include_barrett_wam_7dof_right.xml
+++ b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/include_barrett_wam_7dof_right.xml
--- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/include_table.xml
+++ b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/include_table.xml
--- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/include_target_ball.xml
+++ b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/include_target_ball.xml
--- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/include_test_balls.xml
+++ b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/include_test_balls.xml
--- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/base_link_convex.stl
+++ b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/base_link_convex.stl
--- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/base_link_fine.stl
+++ b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/base_link_fine.stl
--- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_dist_link_fine.stl
+++ b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_dist_link_fine.stl
--- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_med_link_convex.stl
+++ b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_med_link_convex.stl
--- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_med_link_fine.stl
+++ b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_med_link_fine.stl
--- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_prox_link_convex_decomposition_p1.stl
+++ b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_prox_link_convex_decomposition_p1.stl
--- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_prox_link_convex_decomposition_p3.stl
+++ b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_prox_link_convex_decomposition_p3.stl
--- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_prox_link_fine.stl
+++ b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_prox_link_fine.stl
--- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_link_convex_decomposition_p1.stl
+++ b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_link_convex_decomposition_p1.stl
--- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_link_convex_decomposition_p2.stl
+++ b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_link_convex_decomposition_p2.stl
--- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_link_convex_decomposition_p3.stl
+++ b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_link_convex_decomposition_p3.stl
--- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_link_convex_decomposition_p4.stl
+++ b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_link_convex_decomposition_p4.stl
--- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/elbow_link_convex.stl
+++ b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/elbow_link_convex.stl
--- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/forearm_link_convex_decomposition_p1.stl
+++ b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/forearm_link_convex_decomposition_p1.stl
--- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/forearm_link_convex_decomposition_p2.stl
+++ b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/forearm_link_convex_decomposition_p2.stl
--- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/forearm_link_fine.stl
+++ b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/forearm_link_fine.stl
--- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_link_convex_decomposition_p1.stl
+++ b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_link_convex_decomposition_p1.stl
--- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_link_convex_decomposition_p2.stl
+++ b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_link_convex_decomposition_p2.stl
--- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_link_convex_decomposition_p3.stl
+++ b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_link_convex_decomposition_p3.stl
--- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_link_fine.stl
+++ b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_link_fine.stl
--- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_pitch_link_convex.stl
+++ b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_pitch_link_convex.stl
--- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_pitch_link_fine.stl
+++ b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_pitch_link_fine.stl
--- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/upper_arm_link_convex_decomposition_p1.stl
+++ b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/upper_arm_link_convex_decomposition_p1.stl
--- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/upper_arm_link_convex_decomposition_p2.stl
+++ b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/upper_arm_link_convex_decomposition_p2.stl
--- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/upper_arm_link_fine.stl
+++ b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/upper_arm_link_fine.stl
--- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_palm_link_convex.stl
+++ b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_palm_link_convex.stl
--- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_pitch_link_convex_decomposition_p1.stl
+++ b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_pitch_link_convex_decomposition_p1.stl
--- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_pitch_link_convex_decomposition_p2.stl
+++ b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_pitch_link_convex_decomposition_p2.stl
--- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_pitch_link_convex_decomposition_p3.stl
+++ b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_pitch_link_convex_decomposition_p3.stl
--- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_pitch_link_fine.stl
+++ b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_pitch_link_fine.stl
--- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_yaw_link_convex_decomposition_p1.stl
+++ b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_yaw_link_convex_decomposition_p1.stl
--- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_yaw_link_convex_decomposition_p2.stl
+++ b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_yaw_link_convex_decomposition_p2.stl
--- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_yaw_link_fine.stl
+++ b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_yaw_link_fine.stl
--- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/right_arm_actuator.xml
+++ b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/right_arm_actuator.xml
--- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/shared.xml
+++ b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/shared.xml
--- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/table_tennis_env.xml
+++ b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/table_tennis_env.xml
--- a/alr_envs/alr/mujoco/gym_table_tennis/envs/table_tennis_env.py
+++ b/alr_envs/alr/mujoco/gym_table_tennis/envs/table_tennis_env.py
@ -2,9 +2,9 @@ import numpy as np
 from gym import spaces
 from gym.envs.robotics import robot_env, utils
 # import xml.etree.ElementTree as ET
-from alr_envs.mujoco.gym_table_tennis.utils.rewards.hierarchical_reward import HierarchicalRewardTableTennis
+from alr_envs.alr.mujoco.gym_table_tennis.utils.rewards.hierarchical_reward import HierarchicalRewardTableTennis
 import glfw
-from alr_envs.mujoco.gym_table_tennis.utils.experiment import ball_initialize
+from alr_envs.alr.mujoco.gym_table_tennis.utils.experiment import ball_initialize
 from pathlib import Path
 import os

--- a/alr_envs/alr/mujoco/gym_table_tennis/utils/__init__.py
+++ b/alr_envs/alr/mujoco/gym_table_tennis/utils/__init__.py
--- a/alr_envs/alr/mujoco/gym_table_tennis/utils/experiment.py
+++ b/alr_envs/alr/mujoco/gym_table_tennis/utils/experiment.py
@ -1,6 +1,6 @@
 import numpy as np
 from gym.utils import seeding
-from alr_envs.mujoco.gym_table_tennis.utils.util import read_yaml, read_json
+from alr_envs.alr.mujoco.gym_table_tennis.utils.util import read_yaml, read_json
 from pathlib import Path


--- a/alr_envs/alr/mujoco/gym_table_tennis/utils/rewards/init.py
+++ b/alr_envs/alr/mujoco/gym_table_tennis/utils/rewards/init.py
--- a/alr_envs/alr/mujoco/gym_table_tennis/utils/rewards/hierarchical_reward.py
+++ b/alr_envs/alr/mujoco/gym_table_tennis/utils/rewards/hierarchical_reward.py
--- a/alr_envs/alr/mujoco/gym_table_tennis/utils/rewards/rewards.py
+++ b/alr_envs/alr/mujoco/gym_table_tennis/utils/rewards/rewards.py
--- a/alr_envs/alr/mujoco/gym_table_tennis/utils/util.py
+++ b/alr_envs/alr/mujoco/gym_table_tennis/utils/util.py
--- a/alr_envs/alr/mujoco/meshes/wam/base_link_convex.stl
+++ b/alr_envs/alr/mujoco/meshes/wam/base_link_convex.stl
--- a/alr_envs/alr/mujoco/meshes/wam/base_link_fine.stl
+++ b/alr_envs/alr/mujoco/meshes/wam/base_link_fine.stl
--- a/alr_envs/alr/mujoco/meshes/wam/cup.stl
+++ b/alr_envs/alr/mujoco/meshes/wam/cup.stl
--- a/alr_envs/alr/mujoco/meshes/wam/cup_split1.stl
+++ b/alr_envs/alr/mujoco/meshes/wam/cup_split1.stl
--- a/alr_envs/alr/mujoco/meshes/wam/cup_split10.stl
+++ b/alr_envs/alr/mujoco/meshes/wam/cup_split10.stl
--- a/alr_envs/alr/mujoco/meshes/wam/cup_split11.stl
+++ b/alr_envs/alr/mujoco/meshes/wam/cup_split11.stl
--- a/alr_envs/alr/mujoco/meshes/wam/cup_split12.stl
+++ b/alr_envs/alr/mujoco/meshes/wam/cup_split12.stl
--- a/alr_envs/alr/mujoco/meshes/wam/cup_split13.stl
+++ b/alr_envs/alr/mujoco/meshes/wam/cup_split13.stl
--- a/alr_envs/alr/mujoco/meshes/wam/cup_split14.stl
+++ b/alr_envs/alr/mujoco/meshes/wam/cup_split14.stl
--- a/alr_envs/alr/mujoco/meshes/wam/cup_split15.stl
+++ b/alr_envs/alr/mujoco/meshes/wam/cup_split15.stl
--- a/alr_envs/alr/mujoco/meshes/wam/cup_split16.stl
+++ b/alr_envs/alr/mujoco/meshes/wam/cup_split16.stl
--- a/alr_envs/alr/mujoco/meshes/wam/cup_split17.stl
+++ b/alr_envs/alr/mujoco/meshes/wam/cup_split17.stl
--- a/alr_envs/alr/mujoco/meshes/wam/cup_split18.stl
+++ b/alr_envs/alr/mujoco/meshes/wam/cup_split18.stl
--- a/alr_envs/alr/mujoco/meshes/wam/cup_split2.stl
+++ b/alr_envs/alr/mujoco/meshes/wam/cup_split2.stl
--- a/alr_envs/alr/mujoco/meshes/wam/cup_split3.stl
+++ b/alr_envs/alr/mujoco/meshes/wam/cup_split3.stl
--- a/alr_envs/alr/mujoco/meshes/wam/cup_split4.stl
+++ b/alr_envs/alr/mujoco/meshes/wam/cup_split4.stl