Merge pull request #75 from D-o-d-o-x/great_refactor

Refactor and Upgrade to Gymnasium
This commit is contained in:
Dominik Roth 2023-10-11 13:42:00 +02:00 committed by GitHub
commit c420a96d4f
84 changed files with 3094 additions and 2741 deletions

README.md (251 changes)
View File
@@ -1,103 +1,112 @@
<h1 align="center">
  <br>
  <img src='./icon.svg' width="250px">
  <br><br>
  <b>Fancy Gym</b>
  <br><br>
</h1>

| :exclamation: Fancy Gym has recently received a major refactor, which also updated many of the used dependencies to current versions. The update has brought some breaking changes. If you want to access the old version, check out the [legacy branch](https://github.com/ALRhub/fancy_gym/tree/legacy). Find out more about what changed [here](https://github.com/ALRhub/fancy_gym/pull/75). |
| --- |

Built upon the foundation of [Gymnasium](https://gymnasium.farama.org/) (a maintained fork of OpenAI's renowned Gym library), `fancy_gym` offers a comprehensive collection of reinforcement learning environments.

**Key Features**:

- **New Challenging Environments**: `fancy_gym` includes several new environments (Panda Box Pushing, Table Tennis, etc.) that present a higher degree of difficulty, pushing the boundaries of reinforcement learning research.
- **Support for Movement Primitives**: `fancy_gym` supports a range of movement primitives (MPs), including Dynamic Movement Primitives (DMPs), Probabilistic Movement Primitives (ProMPs), and Probabilistic Dynamic Movement Primitives (ProDMPs).
- **Upgrade to Movement Primitives**: With our framework, it's straightforward to transform standard Gymnasium environments into environments that support movement primitives.
- **Benchmark Suite Compatibility**: `fancy_gym` makes it easy to access renowned benchmark suites such as [DeepMind Control](https://deepmind.com/research/publications/2020/dm-control-Software-and-Tasks-for-Continuous-Control) and [Metaworld](https://meta-world.github.io/), whether you want to use them in the regular step-based setting or with MPs.
- **Contribute Your Own Environments**: If you're inspired to create custom gym environments, both step-based and with movement primitives, this [guide](https://gymnasium.farama.org/tutorials/gymnasium_basics/environment_creation/) will assist you. We encourage and highly appreciate submissions via PRs to integrate these environments into `fancy_gym`.

## Movement Primitive Environments (Episode-Based/Black-Box Environments)

<p align="justify">
Movement primitive (MP) environments differ from traditional step-based environments. They align more with concepts from stochastic search, black-box optimization, and methods commonly found in classical robotics and control. Instead of individual steps, MP environments operate on an episode basis, executing complete trajectories. These trajectories are produced by trajectory generators such as Dynamic Movement Primitives (DMP), Probabilistic Movement Primitives (ProMP) or Probabilistic Dynamic Movement Primitives (ProDMP).
</p>
<p align="justify">
Once generated, these trajectories are converted into step-by-step actions by a trajectory tracking controller. The specific controller chosen depends on the environment's requirements. Currently, we support position, velocity, and PD controllers for position, velocity, and torque control, respectively, as well as a specialized controller designed for the MetaWorld control suite.
</p>
<p align="justify">
While the overarching objective of MP environments remains the learning of an optimal policy, the actions here represent the parametrization of motion primitives used to craft the right trajectory. Our framework additionally supports a contextual setting: at the episode's onset, we expose the context space, a subset of the observation space, and a new action (i.e. a new MP parametrization) has to be predicted for every unique context.
</p>
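
To make this concrete, here is a minimal, hypothetical preview of the interaction pattern described above (the full API and environment ids are introduced later in this README): the policy only sees the context returned by `reset()`, predicts a single parameter vector, and a single call to `step()` rolls out the resulting trajectory.

```python
import gymnasium as gym
import fancy_gym  # registers the fancy_ProMP/ environments used below

env = gym.make('fancy_ProMP/Reacher5d-v0')
context, info = env.reset(seed=1)   # context: the exposed subset of the observation space
params = env.action_space.sample()  # one MP parametrization, predicted once per context
_, episode_return, terminated, truncated, _ = env.step(params)  # executes the whole trajectory
```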

## Installation

1. Clone the repository

```bash
git clone git@github.com:ALRhub/fancy_gym.git
```

2. Go to the folder

```bash
cd fancy_gym
```

3. Install with

```bash
pip install -e .
```

We have a few optional dependencies. If you also want to install those, use

```bash
pip install -e '.[all]' # to install all optional dependencies
pip install -e '.[dmc,metaworld,box2d,mujoco,mujoco-legacy,jax,testing]' # or choose only those you want
```

## How to use Fancy Gym

We will only show the basics here and prepared [multiple examples](fancy_gym/examples/) for a more detailed look.

### Step-Based Environments

Regular step-based environments added by Fancy Gym are registered under the `fancy/` namespace.

| :exclamation: Legacy versions of Fancy Gym used `fancy_gym.make(...)`. This is no longer supported and will raise an Exception on new versions. |
| --- |

```python
import gymnasium as gym
import fancy_gym

env = gym.make('fancy/Reacher5d-v0')
# or env = gym.make('metaworld/reach-v2') # fancy_gym allows access to all metaworld ML1 tasks via the metaworld/ NS
# or env = gym.make('dm_control/ball_in_cup-catch-v0')
# or env = gym.make('Reacher-v2')

observation, info = env.reset(seed=1)

for i in range(1000):
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    if i % 5 == 0:
        env.render()

    if terminated or truncated:
        observation, info = env.reset()
```

### Black-box Environments

All environments provide the cumulative episode reward by default; this can however be changed if necessary. Optionally, each environment returns all collected information from each step as part of the infos. This information is, however, mainly meant for debugging and logging, not for training.

| Key | Description | Type |
| --- | --- | --- |
| `positions` | Generated trajectory from MP | Optional |
| `velocities` | Generated trajectory from MP | Optional |
| `step_actions` | Step-wise executed action based on controller output | Optional |
| `step_observations` | Step-wise intermediate observations | Optional |
| `step_rewards` | Step-wise rewards | Optional |
| `trajectory_length` | Total number of environment interactions | Always |
| `other` | All other information from the underlying environment is returned as a list of length `trajectory_length`, maintaining the original key. In case some information is not provided at every time step, the missing values are filled with `None`. | Always |

Existing MP tasks can be created the same way as above. The namespace of an MP-variant of an environment is given by `<original namespace>_<MP name>/`.
Just keep in mind, calling `step()` executes a full trajectory.

> **Note:**
> Currently, we are also in the process of enabling replanning as well as learning of sub-trajectories.
> This allows splitting the episode into multiple trajectories and is a hybrid setting between step-based and
> black-box learning.
@@ -105,30 +114,38 @@
> Feel free to try it and open an issue with any problems that occur.

```python
import gymnasium as gym
import fancy_gym

env = gym.make('fancy_ProMP/Reacher5d-v0')
# or env = gym.make('metaworld_ProDMP/reach-v2')
# or env = gym.make('dm_control_DMP/ball_in_cup-catch-v0')
# or env = gym.make('gym_ProMP/Reacher-v2') # mp versions of envs added directly by gymnasium are in the gym_<MP-type> NS

# render() can be called once in the beginning with all necessary arguments.
# To turn it off again, just call render() without any arguments.
env.render(mode='human')

# This returns the context information, not the full state observation
observation, info = env.reset(seed=1)

for i in range(5):
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)

    # terminated or truncated is always True as we are working on the episode level, hence we always reset()
    observation, info = env.reset()
```
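
Continuing the example above, the info dict returned by `step()` exposes the keys listed in the earlier table. A small sketch follows; which of the optional keys are actually populated depends on the environment and its verbosity settings, which is an assumption here.

```python
observation, episode_reward, terminated, truncated, info = env.step(env.action_space.sample())

print(info['trajectory_length'])   # always present: number of underlying env interactions
print(info.get('step_rewards'))    # optional: per-step rewards of the executed trajectory
print(info.get('positions'))       # optional: trajectory generated by the MP
```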

To show all available environments, we provide some additional convenience variables. All of them return a dictionary with the keys `DMP`, `ProMP`, `ProDMP` and `all` that store a list of available environment ids.

```python
import fancy_gym

print("All Black-box tasks:")
print(fancy_gym.ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS)

print("Fancy Black-box tasks:")
print(fancy_gym.ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS)
@@ -140,6 +157,9 @@ print(fancy_gym.ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS)
print("MetaWorld Black-box tasks:")
print(fancy_gym.ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS)

print("If you add custom envs, their mp versions will be found in:")
print(fancy_gym.MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS['<my_custom_namespace>'])
```
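
For instance, to list only the ProDMP variants (a short sketch using the dictionary described above):

```python
import fancy_gym

prodmp_ids = fancy_gym.ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS['ProDMP']
print(len(prodmp_ids), prodmp_ids[:5])
```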

### How to create a new MP task
@@ -151,23 +171,27 @@

```python
from abc import abstractmethod
from typing import Union, Tuple

import gymnasium as gym
import numpy as np


class RawInterfaceWrapper(gym.Wrapper):
    mp_config = {
        'ProMP': {},
        'DMP': {},
        'ProDMP': {},
    }

    @property
    def context_mask(self) -> np.ndarray:
        """
        Returns boolean mask of the same shape as the observation space.
        It determines whether the observation is returned for the contextual case or not.
        This effectively allows to filter unwanted or unnecessary observations from the full step-based case.
        E.g. velocities starting at 0 are only changing after the first action. Given we only receive the
        context/part of the first observation, the velocities are not necessary in the observation for the task.
        Returns:
            bool array representing the indices of the observations
        """
        return np.ones(self.env.observation_space.shape[0], dtype=bool)
@@ -197,34 +221,91 @@ class RawInterfaceWrapper(gym.Wrapper):
```
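
A minimal sketch of what implementing this interface for a custom environment could look like (the observation layout and the MuJoCo-style state access below are assumptions for illustration; `current_pos` and `current_vel` follow the interface used by the black-box wrapper in this PR):

```python
import gymnasium as gym
import numpy as np

from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper


class MyCoolEnvMPWrapper(RawInterfaceWrapper):
    mp_config = {'ProMP': {}, 'DMP': {}, 'ProDMP': {}}

    @property
    def context_mask(self) -> np.ndarray:
        # Hypothetical layout: only the first two observation entries are context.
        mask = np.zeros(self.env.observation_space.shape[0], dtype=bool)
        mask[:2] = True
        return mask

    @property
    def current_pos(self) -> np.ndarray:
        # Assumes a MuJoCo-based env exposing qpos/qvel on its unwrapped data.
        return self.env.unwrapped.data.qpos.flat.copy()

    @property
    def current_vel(self) -> np.ndarray:
        return self.env.unwrapped.data.qvel.flat.copy()
```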

Default configurations for MPs can be overwritten by defining attributes in `mp_config`.
Available parameters are documented in the [MP_PyTorch Userguide](https://github.com/ALRhub/MP_PyTorch/blob/main/doc/README.md).

```python
class RawInterfaceWrapper(gym.Wrapper):
    mp_config = {
        'ProMP': {
            'phase_generator_kwargs': {
                'phase_generator_type': 'linear'
                # When selecting another generator type, the default configuration will not be merged for the attribute.
            },
            'controller_kwargs': {
                'p_gains': 0.5 * np.array([1.0, 4.0, 2.0, 4.0, 1.0, 4.0, 1.0]),
                'd_gains': 0.5 * np.array([0.1, 0.4, 0.2, 0.4, 0.1, 0.4, 0.1]),
            },
            'basis_generator_kwargs': {
                'num_basis': 3,
                'num_basis_zero_start': 1,
                'num_basis_zero_goal': 1,
            },
        },
        'DMP': {},
        'ProDMP': {},
    }

    [...]
```

If you created a new task wrapper, feel free to open a PR, so we can integrate it for others to use as well. Without the integration the task can still be used. A rough outline is shown here; for more details we recommend having a look at the [examples](fancy_gym/examples/).

If the step-based version is already registered with gym, you can simply do the following:

```python
fancy_gym.upgrade(
    id='custom/cool_new_env-v0',
    mp_wrapper=my_custom_MPWrapper
)
```

If the step-based version is not yet registered with gym, we can add both the step-based and MP-versions via

```python
fancy_gym.register(
    id='custom/cool_new_env-v0',
    entry_point=my_custom_env,
    mp_wrapper=my_custom_MPWrapper
)
```

From this point on, you can access the MP-version of your environment via

```python
env = gym.make('custom_ProDMP/cool_new_env-v0')

rewards = 0
observation, info = env.reset()

# number of samples/full trajectories (multiple environment steps)
for i in range(5):
    ac = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(ac)
    rewards += reward

    if terminated or truncated:
        print(rewards)
        rewards = 0
        observation, info = env.reset()
```
## Citing the Project
To cite this repository in publications:
```bibtex
@software{fancy_gym,
title = {Fancy Gym},
author = {Otto, Fabian and Celik, Onur and Roth, Dominik and Zhou, Hongyi},
abstract = {Fancy Gym: Unifying interface for various RL benchmarks with support for Black Box approaches.},
url = {https://github.com/ALRhub/fancy_gym},
organization = {Autonomous Learning Robots Lab (ALR) at KIT},
}
```
## Icon Attribution
The icon is based on the [Gymnasium](https://github.com/Farama-Foundation/Gymnasium) icon as can be found [here](https://gymnasium.farama.org/_static/img/gymnasium_black.svg).
View File
@@ -1,13 +1,17 @@
from fancy_gym import dmc, meta, open_ai
from fancy_gym import envs as fancy
from fancy_gym.utils.make_env_helpers import make_bb
from .envs.registry import register, upgrade
from .envs.registry import ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS, MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS

ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS = MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS['dm_control']
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS = MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS['fancy']
ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS = MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS['metaworld']
ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS = MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS['gym']


def make(*args, **kwargs):
    """
    As part of the refactor of Fancy Gym and the upgrade to Gymnasium, the use of fancy_gym.make has been discontinued. Regular gym.make should be used instead. For more details check out the GitHub README. If your codebase was built for older versions of Fancy Gym and relies on the old behavior and dependency versions, please check out the legacy branch.
    """
    raise Exception('As part of the refactor of Fancy Gym and the upgrade to Gymnasium, the use of fancy_gym.make has been discontinued. Regular gym.make should be used instead. For more details check out the GitHub README. If your codebase was built for older versions of Fancy Gym and relies on the old behavior and dependency versions, please check out the legacy branch.')
View File
@@ -1,8 +1,9 @@
from typing import Tuple, Optional, Callable, Dict, Any

import gymnasium as gym
import numpy as np
from gymnasium import spaces
from gymnasium.core import ObsType
from mp_pytorch.mp.mp_interfaces import MPInterface

from fancy_gym.black_box.controller.base_controller import BaseController
@@ -67,7 +68,8 @@ class BlackBoxWrapper(gym.ObservationWrapper):
        self.reward_aggregation = reward_aggregation

        # spaces
        self.return_context_observation = not (
            learn_sub_trajectories or self.do_replanning)
        self.traj_gen_action_space = self._get_traj_gen_action_space()
        self.action_space = self._get_action_space()
        self.observation_space = self._get_observation_space()
@@ -99,14 +101,17 @@ class BlackBoxWrapper(gym.ObservationWrapper):
        # If we do not do this, the traj_gen assumes we are continuing the trajectory.
        self.traj_gen.reset()

        clipped_params = np.clip(
            action, self.traj_gen_action_space.low, self.traj_gen_action_space.high)
        self.traj_gen.set_params(clipped_params)
        init_time = np.array(
            0 if not self.do_replanning else self.current_traj_steps * self.dt)

        condition_pos = self.condition_pos if self.condition_pos is not None else self.env.get_wrapper_attr('current_pos')
        condition_vel = self.condition_vel if self.condition_vel is not None else self.env.get_wrapper_attr('current_vel')

        self.traj_gen.set_initial_conditions(
            init_time, condition_pos, condition_vel)
        self.traj_gen.set_duration(duration, self.dt)

        position = get_numpy(self.traj_gen.get_traj_pos())
@@ -153,7 +158,8 @@ class BlackBoxWrapper(gym.ObservationWrapper):
        trajectory_length = len(position)
        rewards = np.zeros(shape=(trajectory_length,))
        if self.verbose >= 2:
            actions = np.zeros(shape=(trajectory_length,) +
                               self.env.action_space.shape)
            observations = np.zeros(shape=(trajectory_length,) + self.env.observation_space.shape,
                                    dtype=self.env.observation_space.dtype)
@@ -161,16 +167,18 @@ class BlackBoxWrapper(gym.ObservationWrapper):
        done = False

        if not traj_is_valid:
            obs, trajectory_return, terminated, truncated, infos = self.env.invalid_traj_callback(action, position, velocity,
                                                                                                  self.return_context_observation, self.tau_bound, self.delay_bound)
            return self.observation(obs), trajectory_return, terminated, truncated, infos

        self.plan_steps += 1
        for t, (pos, vel) in enumerate(zip(position, velocity)):
            step_action = self.tracking_controller.get_action(
                pos, vel, self.env.get_wrapper_attr('current_pos'), self.env.get_wrapper_attr('current_vel'))
            c_action = np.clip(
                step_action, self.env.action_space.low, self.env.action_space.high)
            obs, c_reward, terminated, truncated, info = self.env.step(
                c_action)
            rewards[t] = c_reward

            if self.verbose >= 2:
@@ -185,9 +193,7 @@ class BlackBoxWrapper(gym.ObservationWrapper):
            if self.render_kwargs:
                self.env.render(**self.render_kwargs)

            if terminated or truncated or (self.replanning_schedule(self.env.get_wrapper_attr('current_pos'), self.env.get_wrapper_attr('current_vel'), obs, c_action, t + 1 + self.current_traj_steps) and self.plan_steps < self.max_planning_times):

                if self.condition_on_desired:
                    self.condition_pos = pos
@@ -207,17 +213,18 @@ class BlackBoxWrapper(gym.ObservationWrapper):
        infos['trajectory_length'] = t + 1
        trajectory_return = self.reward_aggregation(rewards[:t + 1])
        return self.observation(obs), trajectory_return, terminated, truncated, infos

    def render(self, **kwargs):
        """Only set render options here, such that they can be used during the rollout.
        This only needs to be called once"""
        self.render_kwargs = kwargs

    def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
            -> Tuple[ObsType, Dict[str, Any]]:
        self.current_traj_steps = 0
        self.plan_steps = 0

        self.traj_gen.reset()
        self.condition_pos = None
        self.condition_vel = None

        return super(BlackBoxWrapper, self).reset(seed=seed, options=options)
View File
@@ -11,11 +11,11 @@ def get_controller(controller_type: str, **kwargs):
    if controller_type == "motor":
        return PDController(**kwargs)
    elif controller_type == "velocity":
        return VelController(**kwargs)
    elif controller_type == "position":
        return PosController(**kwargs)
    elif controller_type == "metaworld":
        return MetaWorldController(**kwargs)
    else:
        raise ValueError(f"Specified controller type {controller_type} not supported, "
                         f"please choose one of {ALL_TYPES}.")
View File
@@ -1,6 +1,6 @@
from typing import Union, Tuple

import gymnasium as gym
import numpy as np
from mp_pytorch.mp.mp_interfaces import MPInterface
@@ -114,7 +114,8 @@ class RawInterfaceWrapper(gym.Wrapper):
        Returns:
            obs: artificial observation if the trajectory is invalid, by default a zero vector
            reward: artificial reward if the trajectory is invalid, by default 0
            terminated: artificial terminated if the trajectory is invalid, by default True
            truncated: artificial truncated if the trajectory is invalid, by default False
            info: artificial info if the trajectory is invalid, by default empty dict
        """
        return np.zeros(1), 0, True, False, {}
View File
@@ -1,7 +1,7 @@
# DeepMind Control (DMC) Wrappers

These are the environment wrappers for selected
[DeepMind Control](https://deepmind.com/research/publications/2020/dm-control-Software-and-Tasks-for-Continuous-Control)
environments in order to use our Motion Primitive gym interface with them.

## MP Environments
@@ -9,11 +9,11 @@ environments in order to use our Motion Primitive gym interface with them.

[//]: <> (These environments are wrapped versions of their DeepMind Control Suite &#40;DMC&#41; counterparts. Given most tasks can be)
[//]: <> (solved in shorter horizon lengths than the original 1000 steps, we often shorten the episodes for those tasks.)

| Name | Description | Trajectory Horizon | Action Dimension | Context Dimension |
| --- | --- | --- | --- | --- |
| `dm_control_ProMP/ball_in_cup-catch-v0` | A ProMP wrapped version of the "catch" task for the "ball_in_cup" environment. | 1000 | 10 | 2 |
| `dm_control_DMP/ball_in_cup-catch-v0` | A DMP wrapped version of the "catch" task for the "ball_in_cup" environment. | 1000 | 10 | 2 |
| `dm_control_ProMP/reacher-easy-v0` | A ProMP wrapped version of the "easy" task for the "reacher" environment. | 1000 | 10 | 4 |
| `dm_control_DMP/reacher-easy-v0` | A DMP wrapped version of the "easy" task for the "reacher" environment. | 1000 | 10 | 4 |
| `dm_control_ProMP/reacher-hard-v0` | A ProMP wrapped version of the "hard" task for the "reacher" environment. | 1000 | 10 | 4 |
| `dm_control_DMP/reacher-hard-v0` | A DMP wrapped version of the "hard" task for the "reacher" environment. | 1000 | 10 | 4 |
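
For example, one of the environments listed above can be created like this (a sketch mirroring the main README; it assumes the dm_control optional dependency is installed):

```python
import gymnasium as gym
import fancy_gym

env = gym.make('dm_control_ProMP/ball_in_cup-catch-v0')
obs, info = env.reset(seed=1)
print(env.action_space.shape)  # action dimension as listed in the table above
```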
View File
@@ -1,245 +1,61 @@
from copy import deepcopy

from gymnasium.wrappers import FlattenObservation
from gymnasium.envs.registration import register

from ..envs.registry import register

from . import manipulation, suite

# DeepMind Control Suite (DMC)

register(
    id=f"dm_control/ball_in_cup-catch-v0",
    register_step_based=False,
    mp_wrapper=suite.ball_in_cup.MPWrapper,
    add_mp_types=['DMP', 'ProMP'],
)

register(
    id=f"dm_control/reacher-easy-v0",
    register_step_based=False,
    mp_wrapper=suite.reacher.MPWrapper,
    add_mp_types=['DMP', 'ProMP'],
)

register(
    id=f"dm_control/reacher-hard-v0",
    register_step_based=False,
    mp_wrapper=suite.reacher.MPWrapper,
    add_mp_types=['DMP', 'ProMP'],
)

_dmc_cartpole_tasks = ["balance", "balance_sparse", "swingup", "swingup_sparse"]

for _task in _dmc_cartpole_tasks:
    register(
        id=f'dm_control/cartpole-{_task}-v0',
        register_step_based=False,
        mp_wrapper=suite.cartpole.MPWrapper,
        add_mp_types=['DMP', 'ProMP'],
    )

register(
    id=f"dm_control/cartpole-two_poles-v0",
    register_step_based=False,
    mp_wrapper=suite.cartpole.TwoPolesMPWrapper,
    add_mp_types=['DMP', 'ProMP'],
)

register(
    id=f"dm_control/cartpole-three_poles-v0",
    register_step_based=False,
    mp_wrapper=suite.cartpole.ThreePolesMPWrapper,
    add_mp_types=['DMP', 'ProMP'],
)

# DeepMind Manipulation

register(
    id=f"dm_control/reach_site_features-v0",
    register_step_based=False,
    mp_wrapper=manipulation.reach_site.MPWrapper,
    add_mp_types=['DMP', 'ProMP'],
)
View File
@ -1,186 +0,0 @@
# Adopted from: https://github.com/denisyarats/dmc2gym/blob/master/dmc2gym/wrappers.py
# License: MIT
# Copyright (c) 2020 Denis Yarats
import collections
from collections.abc import MutableMapping
from typing import Any, Dict, Tuple, Optional, Union, Callable
import gym
import numpy as np
from dm_control import composer
from dm_control.rl import control
from dm_env import specs
from gym import spaces
from gym.core import ObsType
def _spec_to_box(spec):
def extract_min_max(s):
assert s.dtype == np.float64 or s.dtype == np.float32, \
f"Only float64 and float32 types are allowed, instead {s.dtype} was found"
dim = int(np.prod(s.shape))
if type(s) == specs.Array:
bound = np.inf * np.ones(dim, dtype=s.dtype)
return -bound, bound
elif type(s) == specs.BoundedArray:
zeros = np.zeros(dim, dtype=s.dtype)
return s.minimum + zeros, s.maximum + zeros
mins, maxs = [], []
for s in spec:
mn, mx = extract_min_max(s)
mins.append(mn)
maxs.append(mx)
low = np.concatenate(mins, axis=0)
high = np.concatenate(maxs, axis=0)
assert low.shape == high.shape
return spaces.Box(low, high, dtype=s.dtype)
def _flatten_obs(obs: MutableMapping):
"""
Flattens an observation of type MutableMapping, e.g. a dict to a 1D array.
Args:
obs: observation to flatten
Returns: 1D array of observation
"""
if not isinstance(obs, MutableMapping):
raise ValueError(f'Requires dict-like observations structure. {type(obs)} found.')
# Keep key order consistent for non OrderedDicts
keys = obs.keys() if isinstance(obs, collections.OrderedDict) else sorted(obs.keys())
obs_vals = [np.array([obs[key]]) if np.isscalar(obs[key]) else obs[key].ravel() for key in keys]
return np.concatenate(obs_vals)
class DMCWrapper(gym.Env):
def __init__(self,
env: Callable[[], Union[composer.Environment, control.Environment]],
):
# TODO: Currently this is required to be a function because dmc does not allow to copy composers environments
self._env = env()
# action and observation space
self._action_space = _spec_to_box([self._env.action_spec()])
self._observation_space = _spec_to_box(self._env.observation_spec().values())
self._window = None
self.id = 'dmc'
def __getattr__(self, item):
"""Propagate only non-existent properties to wrapped env."""
if item.startswith('_'):
raise AttributeError("attempted to get missing private attribute '{}'".format(item))
if item in self.__dict__:
return getattr(self, item)
return getattr(self._env, item)
def _get_obs(self, time_step):
obs = _flatten_obs(time_step.observation).astype(self.observation_space.dtype)
return obs
@property
def observation_space(self):
return self._observation_space
@property
def action_space(self):
return self._action_space
@property
def dt(self):
return self._env.control_timestep()
def seed(self, seed=None):
self._action_space.seed(seed)
self._observation_space.seed(seed)
def step(self, action) -> Tuple[np.ndarray, float, bool, Dict[str, Any]]:
assert self._action_space.contains(action)
extra = {'internal_state': self._env.physics.get_state().copy()}
time_step = self._env.step(action)
reward = time_step.reward or 0.
done = time_step.last()
obs = self._get_obs(time_step)
extra['discount'] = time_step.discount
return obs, reward, done, extra
def reset(self, *, seed: Optional[int] = None, return_info: bool = False,
options: Optional[dict] = None, ) -> Union[ObsType, Tuple[ObsType, dict]]:
time_step = self._env.reset()
obs = self._get_obs(time_step)
return obs
def render(self, mode='rgb_array', height=240, width=320, camera_id=-1, overlays=(), depth=False,
segmentation=False, scene_option=None, render_flag_overrides=None):
# assert mode == 'rgb_array', 'only support rgb_array mode, given %s' % mode
if mode == "rgb_array":
return self._env.physics.render(height=height, width=width, camera_id=camera_id, overlays=overlays,
depth=depth, segmentation=segmentation, scene_option=scene_option,
render_flag_overrides=render_flag_overrides)
# Render max available buffer size. Larger is only possible by altering the XML.
img = self._env.physics.render(height=self._env.physics.model.vis.global_.offheight,
width=self._env.physics.model.vis.global_.offwidth,
camera_id=camera_id, overlays=overlays, depth=depth, segmentation=segmentation,
scene_option=scene_option, render_flag_overrides=render_flag_overrides)
if depth:
img = np.dstack([img.astype(np.uint8)] * 3)
if mode == 'human':
try:
import cv2
if self._window is None:
self._window = cv2.namedWindow(self.id, cv2.WINDOW_AUTOSIZE)
cv2.imshow(self.id, img[..., ::-1]) # Image in BGR
cv2.waitKey(1)
except ImportError:
raise gym.error.DependencyNotInstalled("Rendering requires opencv. Run `pip install opencv-python`")
# PYGAME seems to destroy some global rendering configs from the physics render
# except ImportError:
# import pygame
# img_copy = img.copy().transpose((1, 0, 2))
# if self._window is None:
# pygame.init()
# pygame.display.init()
# self._window = pygame.display.set_mode(img_copy.shape[:2])
# self.clock = pygame.time.Clock()
#
# surf = pygame.surfarray.make_surface(img_copy)
# self._window.blit(surf, (0, 0))
# pygame.event.pump()
# self.clock.tick(30)
# pygame.display.flip()
def close(self):
super().close()
if self._window is not None:
try:
import cv2
cv2.destroyWindow(self.id)
except ImportError:
import pygame
pygame.display.quit()
pygame.quit()
@property
def reward_range(self) -> Tuple[float, float]:
reward_spec = self._env.reward_spec()
if isinstance(reward_spec, specs.BoundedArray):
return reward_spec.minimum, reward_spec.maximum
return -float('inf'), float('inf')
@property
def metadata(self):
return {'render.modes': ['human', 'rgb_array'],
'video.frames_per_second': round(1.0 / self._env.control_timestep())}
View File
@@ -6,6 +6,28 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper

class MPWrapper(RawInterfaceWrapper):

    mp_config = {
        'ProMP': {
            'controller_kwargs': {
                'p_gains': 50.0,
            },
            'trajectory_generator_kwargs': {
                'weights_scale': 0.2,
            },
        },
        'DMP': {
            'controller_kwargs': {
                'p_gains': 50.0,
            },
            'phase_generator': {
                'alpha_phase': 2,
            },
            'trajectory_generator_kwargs': {
                'weights_scale': 500,
            },
        },
        'ProDMP': {},
    }

    @property
    def context_mask(self) -> np.ndarray:
@@ -35,4 +57,4 @@ class MPWrapper(RawInterfaceWrapper):

    @property
    def dt(self) -> Union[float, int]:
        return self.env.control_timestep()
View File
@@ -6,6 +6,25 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper

class MPWrapper(RawInterfaceWrapper):

    mp_config = {
        'ProMP': {
            'controller_kwargs': {
                'p_gains': 50.0,
            },
        },
        'DMP': {
            'controller_kwargs': {
                'p_gains': 50.0,
            },
            'phase_generator': {
                'alpha_phase': 2,
            },
            'trajectory_generator_kwargs': {
                'weights_scale': 10
            },
        },
        'ProDMP': {},
    }

    @property
    def context_mask(self) -> np.ndarray:
@@ -31,4 +50,4 @@ class MPWrapper(RawInterfaceWrapper):

    @property
    def dt(self) -> Union[float, int]:
        return self.env.control_timestep()
View File
@@ -6,6 +6,30 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper

class MPWrapper(RawInterfaceWrapper):

    mp_config = {
        'ProMP': {
            'controller_kwargs': {
                'p_gains': 10,
                'd_gains': 10,
            },
            'trajectory_generator_kwargs': {
                'weights_scale': 0.2,
            },
        },
        'DMP': {
            'controller_kwargs': {
                'p_gains': 10,
                'd_gains': 10,
            },
            'phase_generator': {
                'alpha_phase': 2,
            },
            'trajectory_generator_kwargs': {
                'weights_scale': 500,
            },
        },
        'ProDMP': {},
    }

    def __init__(self, env, n_poles: int = 1):
        self.n_poles = n_poles
@@ -35,7 +59,7 @@ class MPWrapper(RawInterfaceWrapper):

    @property
    def dt(self) -> Union[float, int]:
        return self.env.control_timestep()


class TwoPolesMPWrapper(MPWrapper):
View File
@@ -6,6 +6,30 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper

class MPWrapper(RawInterfaceWrapper):

    mp_config = {
        'ProMP': {
            'controller_kwargs': {
                'p_gains': 50.0,
                'd_gains': 1.0,
            },
            'trajectory_generator_kwargs': {
                'weights_scale': 0.2,
            },
        },
        'DMP': {
            'controller_kwargs': {
                'p_gains': 50.0,
                'd_gains': 1.0,
            },
            'phase_generator': {
                'alpha_phase': 2,
            },
            'trajectory_generator_kwargs': {
                'weights_scale': 500,
            },
        },
        'ProDMP': {},
    }

    @property
    def context_mask(self) -> np.ndarray:
@@ -30,4 +54,4 @@ class MPWrapper(RawInterfaceWrapper):

    @property
    def dt(self) -> Union[float, int]:
        return self.env.control_timestep()
View File
@@ -1,103 +1,43 @@
from copy import deepcopy

import numpy as np
from gymnasium import register as gym_register

from .registry import register, upgrade

from . import classic_control, mujoco
from .classic_control.hole_reacher.hole_reacher import HoleReacherEnv
from .classic_control.simple_reacher.simple_reacher import SimpleReacherEnv
from .classic_control.viapoint_reacher.viapoint_reacher import ViaPointReacherEnv
from .mujoco.ant_jump.ant_jump import MAX_EPISODE_STEPS_ANTJUMP
from .mujoco.beerpong.beerpong import MAX_EPISODE_STEPS_BEERPONG, FIXED_RELEASE_STEP
from .mujoco.half_cheetah_jump.half_cheetah_jump import MAX_EPISODE_STEPS_HALFCHEETAHJUMP
from .mujoco.hopper_jump.hopper_jump import MAX_EPISODE_STEPS_HOPPERJUMP
from .mujoco.hopper_jump.hopper_jump_on_box import MAX_EPISODE_STEPS_HOPPERJUMPONBOX
from .mujoco.hopper_throw.hopper_throw import MAX_EPISODE_STEPS_HOPPERTHROW
from .mujoco.hopper_throw.hopper_throw_in_basket import MAX_EPISODE_STEPS_HOPPERTHROWINBASKET
from .mujoco.reacher.reacher import ReacherEnv, MAX_EPISODE_STEPS_REACHER
from .mujoco.walker_2d_jump.walker_2d_jump import MAX_EPISODE_STEPS_WALKERJUMP
from .mujoco.box_pushing.box_pushing_env import BoxPushingDense, BoxPushingTemporalSparse, \
    BoxPushingTemporalSpatialSparse, MAX_EPISODE_STEPS_BOX_PUSHING
from .mujoco.table_tennis.table_tennis_env import TableTennisEnv, TableTennisWind, TableTennisGoalSwitching, \
    MAX_EPISODE_STEPS_TABLE_TENNIS
from .mujoco.table_tennis.mp_wrapper import TT_MPWrapper as MPWrapper_TableTennis
from .mujoco.table_tennis.mp_wrapper import TT_MPWrapper_Replan as MPWrapper_TableTennis_Replan
from .mujoco.table_tennis.mp_wrapper import TTVelObs_MPWrapper as MPWrapper_TableTennis_VelObs
from .mujoco.table_tennis.mp_wrapper import TTVelObs_MPWrapper_Replan as MPWrapper_TableTennis_VelObs_Replan

# Classic Control
# Simple Reacher
register(
    id='fancy/SimpleReacher-v0',
    entry_point=SimpleReacherEnv,
mp_wrapper=MPWrapper_SimpleReacher,
max_episode_steps=200, max_episode_steps=200,
kwargs={ kwargs={
"n_links": 2, "n_links": 2,
@ -105,19 +45,20 @@ register(
) )
register( register(
id='LongSimpleReacher-v0', id='fancy/LongSimpleReacher-v0',
entry_point='fancy_gym.envs.classic_control:SimpleReacherEnv', entry_point=SimpleReacherEnv,
mp_wrapper=MPWrapper_SimpleReacher,
max_episode_steps=200, max_episode_steps=200,
kwargs={ kwargs={
"n_links": 5, "n_links": 5,
} }
) )
## Viapoint Reacher # Viapoint Reacher
register( register(
id='ViaPointReacher-v0', id='fancy/ViaPointReacher-v0',
entry_point='fancy_gym.envs.classic_control:ViaPointReacherEnv', entry_point=ViaPointReacherEnv,
mp_wrapper=MPWrapper_ViaPointReacher,
max_episode_steps=200, max_episode_steps=200,
kwargs={ kwargs={
"n_links": 5, "n_links": 5,
@ -126,10 +67,11 @@ register(
} }
) )
## Hole Reacher # Hole Reacher
register( register(
id='HoleReacher-v0', id='fancy/HoleReacher-v0',
entry_point='fancy_gym.envs.classic_control:HoleReacherEnv', entry_point=HoleReacherEnv,
mp_wrapper=MPWrapper_HoleReacher,
max_episode_steps=200, max_episode_steps=200,
kwargs={ kwargs={
"n_links": 5, "n_links": 5,
@ -145,31 +87,35 @@ register(
# Mujoco # Mujoco
## Mujoco Reacher # Mujoco Reacher
for _dims in [5, 7]: for dims in [5, 7]:
register( register(
id=f'Reacher{_dims}d-v0', id=f'fancy/Reacher{dims}d-v0',
entry_point='fancy_gym.envs.mujoco:ReacherEnv', entry_point=ReacherEnv,
mp_wrapper=MPWrapper_Reacher,
max_episode_steps=MAX_EPISODE_STEPS_REACHER, max_episode_steps=MAX_EPISODE_STEPS_REACHER,
kwargs={ kwargs={
"n_links": _dims, "n_links": dims,
} }
) )
register( register(
id=f'Reacher{_dims}dSparse-v0', id=f'fancy/Reacher{dims}dSparse-v0',
entry_point='fancy_gym.envs.mujoco:ReacherEnv', entry_point=ReacherEnv,
mp_wrapper=MPWrapper_Reacher,
max_episode_steps=MAX_EPISODE_STEPS_REACHER, max_episode_steps=MAX_EPISODE_STEPS_REACHER,
kwargs={ kwargs={
"sparse": True, "sparse": True,
'reward_weight': 200, 'reward_weight': 200,
"n_links": _dims, "n_links": dims,
} }
) )
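The `register` imported from `.registry` accepts `mp_wrapper` and `add_mp_types` in addition to the usual Gymnasium arguments, so the step-based environment and its MP variants come out of a single call. A hedged sketch of registering a user-defined task this way, assuming `register` is also re-exported as `fancy_gym.register`; `MyEnv` and `MyMPWrapper` are placeholders, not part of this commit:

```python
import gymnasium as gym
import numpy as np

import fancy_gym
from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper


class MyEnv(gym.Env):
    """Tiny stand-in environment, only here to make the registration call concrete."""
    observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(4,), dtype=np.float64)
    action_space = gym.spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float64)
    dt = 0.02  # used by the default RawInterfaceWrapper.dt

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        return np.zeros(4), {}

    def step(self, action):
        return np.zeros(4), 0.0, False, False, {}


class MyMPWrapper(RawInterfaceWrapper):
    mp_config = {'ProMP': {}, 'DMP': {}, 'ProDMP': {}}  # rely on framework defaults

    @property
    def context_mask(self):
        return np.array([True, True, False, False])

    @property
    def current_pos(self):
        return np.zeros(2)

    @property
    def current_vel(self):
        return np.zeros(2)


fancy_gym.register(
    id='fancy/MyEnv-v0',               # hypothetical id in the fancy namespace
    entry_point=MyEnv,
    mp_wrapper=MyMPWrapper,
    max_episode_steps=200,
    add_mp_types=['ProMP', 'ProDMP'],  # MP variants generated alongside the step-based env
)
```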
register( register(
id='HopperJumpSparse-v0', id='fancy/HopperJumpSparse-v0',
entry_point='fancy_gym.envs.mujoco:HopperJumpEnv', entry_point='fancy_gym.envs.mujoco:HopperJumpEnv',
mp_wrapper=mujoco.hopper_jump.MPWrapper,
max_episode_steps=MAX_EPISODE_STEPS_HOPPERJUMP, max_episode_steps=MAX_EPISODE_STEPS_HOPPERJUMP,
kwargs={ kwargs={
"sparse": True, "sparse": True,
@ -177,8 +123,9 @@ register(
) )
register( register(
id='HopperJump-v0', id='fancy/HopperJump-v0',
entry_point='fancy_gym.envs.mujoco:HopperJumpEnv', entry_point='fancy_gym.envs.mujoco:HopperJumpEnv',
mp_wrapper=mujoco.hopper_jump.MPWrapper,
max_episode_steps=MAX_EPISODE_STEPS_HOPPERJUMP, max_episode_steps=MAX_EPISODE_STEPS_HOPPERJUMP,
kwargs={ kwargs={
"sparse": False, "sparse": False,
@ -188,76 +135,117 @@ register(
} }
) )
# TODO: Add [MPs] later when finished (old TODO I moved here during refactor)
register( register(
id='AntJump-v0', id='fancy/AntJump-v0',
entry_point='fancy_gym.envs.mujoco:AntJumpEnv', entry_point='fancy_gym.envs.mujoco:AntJumpEnv',
max_episode_steps=MAX_EPISODE_STEPS_ANTJUMP, max_episode_steps=MAX_EPISODE_STEPS_ANTJUMP,
add_mp_types=[],
) )
register( register(
id='HalfCheetahJump-v0', id='fancy/HalfCheetahJump-v0',
entry_point='fancy_gym.envs.mujoco:HalfCheetahJumpEnv', entry_point='fancy_gym.envs.mujoco:HalfCheetahJumpEnv',
max_episode_steps=MAX_EPISODE_STEPS_HALFCHEETAHJUMP, max_episode_steps=MAX_EPISODE_STEPS_HALFCHEETAHJUMP,
add_mp_types=[],
) )
register( register(
id='HopperJumpOnBox-v0', id='fancy/HopperJumpOnBox-v0',
entry_point='fancy_gym.envs.mujoco:HopperJumpOnBoxEnv', entry_point='fancy_gym.envs.mujoco:HopperJumpOnBoxEnv',
max_episode_steps=MAX_EPISODE_STEPS_HOPPERJUMPONBOX, max_episode_steps=MAX_EPISODE_STEPS_HOPPERJUMPONBOX,
add_mp_types=[],
) )
register( register(
id='HopperThrow-v0', id='fancy/HopperThrow-v0',
entry_point='fancy_gym.envs.mujoco:HopperThrowEnv', entry_point='fancy_gym.envs.mujoco:HopperThrowEnv',
max_episode_steps=MAX_EPISODE_STEPS_HOPPERTHROW, max_episode_steps=MAX_EPISODE_STEPS_HOPPERTHROW,
add_mp_types=[],
) )
register( register(
id='HopperThrowInBasket-v0', id='fancy/HopperThrowInBasket-v0',
entry_point='fancy_gym.envs.mujoco:HopperThrowInBasketEnv', entry_point='fancy_gym.envs.mujoco:HopperThrowInBasketEnv',
max_episode_steps=MAX_EPISODE_STEPS_HOPPERTHROWINBASKET, max_episode_steps=MAX_EPISODE_STEPS_HOPPERTHROWINBASKET,
add_mp_types=[],
) )
register( register(
id='Walker2DJump-v0', id='fancy/Walker2DJump-v0',
entry_point='fancy_gym.envs.mujoco:Walker2dJumpEnv', entry_point='fancy_gym.envs.mujoco:Walker2dJumpEnv',
max_episode_steps=MAX_EPISODE_STEPS_WALKERJUMP, max_episode_steps=MAX_EPISODE_STEPS_WALKERJUMP,
add_mp_types=[],
)
register( # [MPDone
id='fancy/BeerPong-v0',
entry_point='fancy_gym.envs.mujoco:BeerPongEnv',
mp_wrapper=MPWrapper_Beerpong,
max_episode_steps=MAX_EPISODE_STEPS_BEERPONG,
add_mp_types=['ProMP'],
)
# Here we use the same reward as in BeerPong-v0, but now consider after the release,
# only one time step, i.e. we simulate until the end of th episode
register(
id='fancy/BeerPongStepBased-v0',
entry_point='fancy_gym.envs.mujoco:BeerPongEnvStepBasedEpisodicReward',
mp_wrapper=MPWrapper_Beerpong_FixedRelease,
max_episode_steps=FIXED_RELEASE_STEP,
add_mp_types=['ProMP'],
) )
register( register(
id='BeerPong-v0', id='fancy/BeerPongFixedRelease-v0',
entry_point='fancy_gym.envs.mujoco:BeerPongEnv', entry_point='fancy_gym.envs.mujoco:BeerPongEnv',
max_episode_steps=MAX_EPISODE_STEPS_BEERPONG, mp_wrapper=MPWrapper_Beerpong_FixedRelease,
max_episode_steps=FIXED_RELEASE_STEP,
add_mp_types=['ProMP'],
) )
# Box pushing environments with different rewards # Box pushing environments with different rewards
for reward_type in ["Dense", "TemporalSparse", "TemporalSpatialSparse"]: for reward_type in ["Dense", "TemporalSparse", "TemporalSpatialSparse"]:
register( register(
id='BoxPushing{}-v0'.format(reward_type), id='fancy/BoxPushing{}-v0'.format(reward_type),
entry_point='fancy_gym.envs.mujoco:BoxPushing{}'.format(reward_type), entry_point='fancy_gym.envs.mujoco:BoxPushing{}'.format(reward_type),
mp_wrapper=mujoco.box_pushing.MPWrapper,
max_episode_steps=MAX_EPISODE_STEPS_BOX_PUSHING, max_episode_steps=MAX_EPISODE_STEPS_BOX_PUSHING,
) )
register( register(
id='BoxPushingRandomInit{}-v0'.format(reward_type), id='fancy/BoxPushingRandomInit{}-v0'.format(reward_type),
entry_point='fancy_gym.envs.mujoco:BoxPushing{}'.format(reward_type), entry_point='fancy_gym.envs.mujoco:BoxPushing{}'.format(reward_type),
mp_wrapper=mujoco.box_pushing.MPWrapper,
max_episode_steps=MAX_EPISODE_STEPS_BOX_PUSHING, max_episode_steps=MAX_EPISODE_STEPS_BOX_PUSHING,
kwargs={"random_init": True} kwargs={"random_init": True}
) )
# Here we use the same reward as in BeerPong-v0, but now consider after the release, upgrade(
# only one time step, i.e. we simulate until the end of th episode id='fancy/BoxPushing{}Replan-v0'.format(reward_type),
register( base_id='fancy/BoxPushing{}-v0'.format(reward_type),
id='BeerPongStepBased-v0', mp_wrapper=mujoco.box_pushing.ReplanMPWrapper,
entry_point='fancy_gym.envs.mujoco:BeerPongEnvStepBasedEpisodicReward', )
max_episode_steps=FIXED_RELEASE_STEP,
)
# Table Tennis environments # Table Tennis environments
for ctxt_dim in [2, 4]: for ctxt_dim in [2, 4]:
register( register(
id='TableTennis{}D-v0'.format(ctxt_dim), id='fancy/TableTennis{}D-v0'.format(ctxt_dim),
entry_point='fancy_gym.envs.mujoco:TableTennisEnv', entry_point='fancy_gym.envs.mujoco:TableTennisEnv',
mp_wrapper=MPWrapper_TableTennis,
max_episode_steps=MAX_EPISODE_STEPS_TABLE_TENNIS, max_episode_steps=MAX_EPISODE_STEPS_TABLE_TENNIS,
add_mp_types=['ProMP', 'ProDMP'],
kwargs={
"ctxt_dim": ctxt_dim,
'frame_skip': 4,
}
)
register(
id='fancy/TableTennis{}DReplan-v0'.format(ctxt_dim),
entry_point='fancy_gym.envs.mujoco:TableTennisEnv',
mp_wrapper=MPWrapper_TableTennis,
max_episode_steps=MAX_EPISODE_STEPS_TABLE_TENNIS,
add_mp_types=['ProDMP'],
kwargs={ kwargs={
"ctxt_dim": ctxt_dim, "ctxt_dim": ctxt_dim,
'frame_skip': 4, 'frame_skip': 4,
@ -265,626 +253,39 @@ for ctxt_dim in [2, 4]:
) )
register( register(
id='TableTennisWind-v0', id='fancy/TableTennisWind-v0',
entry_point='fancy_gym.envs.mujoco:TableTennisWind', entry_point='fancy_gym.envs.mujoco:TableTennisWind',
mp_wrapper=MPWrapper_TableTennis_VelObs,
add_mp_types=['ProMP', 'ProDMP'],
max_episode_steps=MAX_EPISODE_STEPS_TABLE_TENNIS, max_episode_steps=MAX_EPISODE_STEPS_TABLE_TENNIS,
) )
register( register(
id='TableTennisGoalSwitching-v0', id='fancy/TableTennisWindReplan-v0',
entry_point='fancy_gym.envs.mujoco:TableTennisWind',
mp_wrapper=MPWrapper_TableTennis_VelObs_Replan,
add_mp_types=['ProDMP'],
max_episode_steps=MAX_EPISODE_STEPS_TABLE_TENNIS,
)
register(
id='fancy/TableTennisGoalSwitching-v0',
entry_point='fancy_gym.envs.mujoco:TableTennisGoalSwitching', entry_point='fancy_gym.envs.mujoco:TableTennisGoalSwitching',
mp_wrapper=MPWrapper_TableTennis,
add_mp_types=['ProMP', 'ProDMP'],
max_episode_steps=MAX_EPISODE_STEPS_TABLE_TENNIS, max_episode_steps=MAX_EPISODE_STEPS_TABLE_TENNIS,
kwargs={ kwargs={
'goal_switching_step': 99 'goal_switching_step': 99
} }
) )
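Once registered under the `fancy/` namespace, the environments are created through Gymnasium's standard `make`; the MP variants generated from `mp_wrapper=...` / `add_mp_types=[...]` are assumed to live in the `fancy_ProMP/`, `fancy_DMP/` and `fancy_ProDMP/` namespaces, matching the `fancy_DMP/` naming used in the README tables below:

```python
import gymnasium as gym
import fancy_gym  # noqa: F401  # importing the package performs the registrations above

# Step-based environment, new Gymnasium API
env = gym.make('fancy/BoxPushingDense-v0')
obs, info = env.reset(seed=1)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())

# Movement-primitive variant generated from the same registration
mp_env = gym.make('fancy_ProMP/BoxPushingDense-v0')
```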
# movement Primitive Environments
## Simple Reacher
_versions = ["SimpleReacher-v0", "LongSimpleReacher-v0"]
for _v in _versions:
_name = _v.split("-")
_env_id = f'{_name[0]}DMP-{_name[1]}'
kwargs_dict_simple_reacher_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
kwargs_dict_simple_reacher_dmp['wrappers'].append(classic_control.simple_reacher.MPWrapper)
kwargs_dict_simple_reacher_dmp['controller_kwargs']['p_gains'] = 0.6
kwargs_dict_simple_reacher_dmp['controller_kwargs']['d_gains'] = 0.075
kwargs_dict_simple_reacher_dmp['trajectory_generator_kwargs']['weight_scale'] = 50
kwargs_dict_simple_reacher_dmp['phase_generator_kwargs']['alpha_phase'] = 2
kwargs_dict_simple_reacher_dmp['name'] = f"{_v}"
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_simple_reacher_dmp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id)
_env_id = f'{_name[0]}ProMP-{_name[1]}'
kwargs_dict_simple_reacher_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
kwargs_dict_simple_reacher_promp['wrappers'].append(classic_control.simple_reacher.MPWrapper)
kwargs_dict_simple_reacher_promp['controller_kwargs']['p_gains'] = 0.6
kwargs_dict_simple_reacher_promp['controller_kwargs']['d_gains'] = 0.075
kwargs_dict_simple_reacher_promp['name'] = _v
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_simple_reacher_promp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
# Viapoint reacher
kwargs_dict_via_point_reacher_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
kwargs_dict_via_point_reacher_dmp['wrappers'].append(classic_control.viapoint_reacher.MPWrapper)
kwargs_dict_via_point_reacher_dmp['controller_kwargs']['controller_type'] = 'velocity'
kwargs_dict_via_point_reacher_dmp['trajectory_generator_kwargs']['weight_scale'] = 50
kwargs_dict_via_point_reacher_dmp['phase_generator_kwargs']['alpha_phase'] = 2
kwargs_dict_via_point_reacher_dmp['name'] = "ViaPointReacher-v0"
register( register(
id='ViaPointReacherDMP-v0', id='fancy/TableTennisGoalSwitchingReplan-v0',
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper', entry_point='fancy_gym.envs.mujoco:TableTennisGoalSwitching',
# max_episode_steps=1, mp_wrapper=MPWrapper_TableTennis_Replan,
kwargs=kwargs_dict_via_point_reacher_dmp add_mp_types=['ProDMP'],
) max_episode_steps=MAX_EPISODE_STEPS_TABLE_TENNIS,
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append("ViaPointReacherDMP-v0")
kwargs_dict_via_point_reacher_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
kwargs_dict_via_point_reacher_promp['wrappers'].append(classic_control.viapoint_reacher.MPWrapper)
kwargs_dict_via_point_reacher_promp['controller_kwargs']['controller_type'] = 'velocity'
kwargs_dict_via_point_reacher_promp['name'] = "ViaPointReacher-v0"
register(
id="ViaPointReacherProMP-v0",
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_via_point_reacher_promp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append("ViaPointReacherProMP-v0")
## Hole Reacher
_versions = ["HoleReacher-v0"]
for _v in _versions:
_name = _v.split("-")
_env_id = f'{_name[0]}DMP-{_name[1]}'
kwargs_dict_hole_reacher_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
kwargs_dict_hole_reacher_dmp['wrappers'].append(classic_control.hole_reacher.MPWrapper)
kwargs_dict_hole_reacher_dmp['controller_kwargs']['controller_type'] = 'velocity'
# TODO: Before it was weight scale 50 and goal scale 0.1. We now only have weight scale and thus set it to 500. Check
kwargs_dict_hole_reacher_dmp['trajectory_generator_kwargs']['weight_scale'] = 500
kwargs_dict_hole_reacher_dmp['phase_generator_kwargs']['alpha_phase'] = 2.5
kwargs_dict_hole_reacher_dmp['name'] = _v
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
# max_episode_steps=1,
kwargs=kwargs_dict_hole_reacher_dmp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id)
_env_id = f'{_name[0]}ProMP-{_name[1]}'
kwargs_dict_hole_reacher_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
kwargs_dict_hole_reacher_promp['wrappers'].append(classic_control.hole_reacher.MPWrapper)
kwargs_dict_hole_reacher_promp['trajectory_generator_kwargs']['weight_scale'] = 2
kwargs_dict_hole_reacher_promp['controller_kwargs']['controller_type'] = 'velocity'
kwargs_dict_hole_reacher_promp['name'] = f"{_v}"
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_hole_reacher_promp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
## ReacherNd
_versions = ["Reacher5d-v0", "Reacher7d-v0", "Reacher5dSparse-v0", "Reacher7dSparse-v0"]
for _v in _versions:
_name = _v.split("-")
_env_id = f'{_name[0]}DMP-{_name[1]}'
kwargs_dict_reacher_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
kwargs_dict_reacher_dmp['wrappers'].append(mujoco.reacher.MPWrapper)
kwargs_dict_reacher_dmp['phase_generator_kwargs']['alpha_phase'] = 2
kwargs_dict_reacher_dmp['name'] = _v
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
# max_episode_steps=1,
kwargs=kwargs_dict_reacher_dmp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id)
_env_id = f'{_name[0]}ProMP-{_name[1]}'
kwargs_dict_reacher_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
kwargs_dict_reacher_promp['wrappers'].append(mujoco.reacher.MPWrapper)
kwargs_dict_reacher_promp['name'] = _v
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_reacher_promp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
########################################################################################################################
## Beerpong ProMP
_versions = ['BeerPong-v0']
for _v in _versions:
_name = _v.split("-")
_env_id = f'{_name[0]}ProMP-{_name[1]}'
kwargs_dict_bp_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
kwargs_dict_bp_promp['wrappers'].append(mujoco.beerpong.MPWrapper)
kwargs_dict_bp_promp['phase_generator_kwargs']['learn_tau'] = True
kwargs_dict_bp_promp['controller_kwargs']['p_gains'] = np.array([1.5, 5, 2.55, 3, 2., 2, 1.25])
kwargs_dict_bp_promp['controller_kwargs']['d_gains'] = np.array([0.02333333, 0.1, 0.0625, 0.08, 0.03, 0.03, 0.0125])
kwargs_dict_bp_promp['basis_generator_kwargs']['num_basis'] = 2
kwargs_dict_bp_promp['basis_generator_kwargs']['num_basis_zero_start'] = 2
kwargs_dict_bp_promp['name'] = _v
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_bp_promp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
### BP with Fixed release
_versions = ["BeerPongStepBased-v0", 'BeerPong-v0']
for _v in _versions:
if _v != 'BeerPong-v0':
_name = _v.split("-")
_env_id = f'{_name[0]}ProMP-{_name[1]}'
else:
_env_id = 'BeerPongFixedReleaseProMP-v0'
kwargs_dict_bp_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
kwargs_dict_bp_promp['wrappers'].append(mujoco.beerpong.MPWrapper)
kwargs_dict_bp_promp['phase_generator_kwargs']['tau'] = 0.62
kwargs_dict_bp_promp['controller_kwargs']['p_gains'] = np.array([1.5, 5, 2.55, 3, 2., 2, 1.25])
kwargs_dict_bp_promp['controller_kwargs']['d_gains'] = np.array([0.02333333, 0.1, 0.0625, 0.08, 0.03, 0.03, 0.0125])
kwargs_dict_bp_promp['basis_generator_kwargs']['num_basis'] = 2
kwargs_dict_bp_promp['basis_generator_kwargs']['num_basis_zero_start'] = 2
kwargs_dict_bp_promp['name'] = _v
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_bp_promp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
########################################################################################################################
## Table Tennis needs to be fixed according to Zhou's implementation
# TODO: Add later when finished
# ########################################################################################################################
#
# ## AntJump
# _versions = ['AntJump-v0']
# for _v in _versions:
# _name = _v.split("-")
# _env_id = f'{_name[0]}ProMP-{_name[1]}'
# kwargs_dict_ant_jump_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
# kwargs_dict_ant_jump_promp['wrappers'].append(mujoco.ant_jump.MPWrapper)
# kwargs_dict_ant_jump_promp['name'] = _v
# register(
# id=_env_id,
# entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
# kwargs=kwargs_dict_ant_jump_promp
# )
# ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
#
# ########################################################################################################################
#
# ## HalfCheetahJump
# _versions = ['HalfCheetahJump-v0']
# for _v in _versions:
# _name = _v.split("-")
# _env_id = f'{_name[0]}ProMP-{_name[1]}'
# kwargs_dict_halfcheetah_jump_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
# kwargs_dict_halfcheetah_jump_promp['wrappers'].append(mujoco.half_cheetah_jump.MPWrapper)
# kwargs_dict_halfcheetah_jump_promp['name'] = _v
# register(
# id=_env_id,
# entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
# kwargs=kwargs_dict_halfcheetah_jump_promp
# )
# ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
#
# ########################################################################################################################
## HopperJump
_versions = ['HopperJump-v0', 'HopperJumpSparse-v0',
# 'HopperJumpOnBox-v0', 'HopperThrow-v0', 'HopperThrowInBasket-v0'
]
# TODO: Check if all environments work with the same MPWrapper
for _v in _versions:
_name = _v.split("-")
_env_id = f'{_name[0]}ProMP-{_name[1]}'
kwargs_dict_hopper_jump_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
kwargs_dict_hopper_jump_promp['wrappers'].append(mujoco.hopper_jump.MPWrapper)
kwargs_dict_hopper_jump_promp['name'] = _v
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_hopper_jump_promp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
# ########################################################################################################################
## Box Pushing
_versions = ['BoxPushingDense-v0', 'BoxPushingTemporalSparse-v0', 'BoxPushingTemporalSpatialSparse-v0',
'BoxPushingRandomInitDense-v0', 'BoxPushingRandomInitTemporalSparse-v0',
'BoxPushingRandomInitTemporalSpatialSparse-v0']
for _v in _versions:
_name = _v.split("-")
_env_id = f'{_name[0]}ProMP-{_name[1]}'
kwargs_dict_box_pushing_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
kwargs_dict_box_pushing_promp['wrappers'].append(mujoco.box_pushing.MPWrapper)
kwargs_dict_box_pushing_promp['name'] = _v
kwargs_dict_box_pushing_promp['controller_kwargs']['p_gains'] = 0.01 * np.array([120., 120., 120., 120., 50., 30., 10.])
kwargs_dict_box_pushing_promp['controller_kwargs']['d_gains'] = 0.01 * np.array([10., 10., 10., 10., 6., 5., 3.])
kwargs_dict_box_pushing_promp['basis_generator_kwargs']['basis_bandwidth_factor'] = 2 # 3.5, 4 to try
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_box_pushing_promp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
for _v in _versions:
_name = _v.split("-")
_env_id = f'{_name[0]}ProDMP-{_name[1]}'
kwargs_dict_box_pushing_prodmp = deepcopy(DEFAULT_BB_DICT_ProDMP)
kwargs_dict_box_pushing_prodmp['wrappers'].append(mujoco.box_pushing.MPWrapper)
kwargs_dict_box_pushing_prodmp['name'] = _v
kwargs_dict_box_pushing_prodmp['controller_kwargs']['p_gains'] = 0.01 * np.array([120., 120., 120., 120., 50., 30., 10.])
kwargs_dict_box_pushing_prodmp['controller_kwargs']['d_gains'] = 0.01 * np.array([10., 10., 10., 10., 6., 5., 3.])
kwargs_dict_box_pushing_prodmp['trajectory_generator_kwargs']['weights_scale'] = 0.3
kwargs_dict_box_pushing_prodmp['trajectory_generator_kwargs']['goal_scale'] = 0.3
kwargs_dict_box_pushing_prodmp['trajectory_generator_kwargs']['auto_scale_basis'] = True
kwargs_dict_box_pushing_prodmp['basis_generator_kwargs']['num_basis'] = 4
kwargs_dict_box_pushing_prodmp['basis_generator_kwargs']['basis_bandwidth_factor'] = 3
kwargs_dict_box_pushing_prodmp['phase_generator_kwargs']['alpha_phase'] = 3
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_box_pushing_prodmp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProDMP"].append(_env_id)
for _v in _versions:
_name = _v.split("-")
_env_id = f'{_name[0]}ReplanProDMP-{_name[1]}'
kwargs_dict_box_pushing_prodmp = deepcopy(DEFAULT_BB_DICT_ProDMP)
kwargs_dict_box_pushing_prodmp['wrappers'].append(mujoco.box_pushing.MPWrapper)
kwargs_dict_box_pushing_prodmp['name'] = _v
kwargs_dict_box_pushing_prodmp['controller_kwargs']['p_gains'] = 0.01 * np.array([120., 120., 120., 120., 50., 30., 10.])
kwargs_dict_box_pushing_prodmp['controller_kwargs']['d_gains'] = 0.01 * np.array([10., 10., 10., 10., 6., 5., 3.])
kwargs_dict_box_pushing_prodmp['trajectory_generator_kwargs']['weights_scale'] = 0.3
kwargs_dict_box_pushing_prodmp['trajectory_generator_kwargs']['goal_scale'] = 0.3
kwargs_dict_box_pushing_prodmp['trajectory_generator_kwargs']['auto_scale_basis'] = True
kwargs_dict_box_pushing_prodmp['basis_generator_kwargs']['num_basis'] = 4
kwargs_dict_box_pushing_prodmp['basis_generator_kwargs']['basis_bandwidth_factor'] = 3
kwargs_dict_box_pushing_prodmp['phase_generator_kwargs']['alpha_phase'] = 3
kwargs_dict_box_pushing_prodmp['black_box_kwargs']['max_planning_times'] = 4
kwargs_dict_box_pushing_prodmp['black_box_kwargs']['replanning_schedule'] = lambda pos, vel, obs, action, t : t % 25 == 0
kwargs_dict_box_pushing_prodmp['black_box_kwargs']['condition_on_desired'] = True
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_box_pushing_prodmp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProDMP"].append(_env_id)
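In the new registry, this replanning setup moves out of these global dictionaries and into the wrapper handed to `register`/`upgrade` (e.g. `mujoco.box_pushing.ReplanMPWrapper` above). A rough sketch of what such a wrapper's `mp_config` could look like, assuming the keys mirror the dictionary structure used here; the values are illustrative, not taken from this commit:

```python
from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper


class MyReplanMPWrapper(RawInterfaceWrapper):
    mp_config = {
        'ProDMP': {
            'black_box_kwargs': {
                'max_planning_times': 4,
                # re-plan every 25 environment steps, analogous to the ProDMP setup above
                'replanning_schedule': lambda pos, vel, obs, action, t: t % 25 == 0,
                'condition_on_desired': True,
            },
        },
    }
    # context_mask / current_pos / current_vel omitted here for brevity
```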
## Table Tennis
_versions = ['TableTennis2D-v0', 'TableTennis4D-v0', 'TableTennisWind-v0', 'TableTennisGoalSwitching-v0']
for _v in _versions:
_name = _v.split("-")
_env_id = f'{_name[0]}ProMP-{_name[1]}'
kwargs_dict_tt_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
if _v == 'TableTennisWind-v0':
kwargs_dict_tt_promp['wrappers'].append(mujoco.table_tennis.TTVelObs_MPWrapper)
else:
kwargs_dict_tt_promp['wrappers'].append(mujoco.table_tennis.TT_MPWrapper)
kwargs_dict_tt_promp['name'] = _v
kwargs_dict_tt_promp['controller_kwargs']['p_gains'] = 0.5 * np.array([1.0, 4.0, 2.0, 4.0, 1.0, 4.0, 1.0])
kwargs_dict_tt_promp['controller_kwargs']['d_gains'] = 0.5 * np.array([0.1, 0.4, 0.2, 0.4, 0.1, 0.4, 0.1])
kwargs_dict_tt_promp['phase_generator_kwargs']['learn_tau'] = True
kwargs_dict_tt_promp['phase_generator_kwargs']['learn_delay'] = True
kwargs_dict_tt_promp['phase_generator_kwargs']['tau_bound'] = [0.8, 1.5]
kwargs_dict_tt_promp['phase_generator_kwargs']['delay_bound'] = [0.05, 0.15]
kwargs_dict_tt_promp['basis_generator_kwargs']['num_basis'] = 3
kwargs_dict_tt_promp['basis_generator_kwargs']['num_basis_zero_start'] = 1
kwargs_dict_tt_promp['basis_generator_kwargs']['num_basis_zero_goal'] = 1
kwargs_dict_tt_promp['black_box_kwargs']['verbose'] = 2
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_tt_promp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
for _v in _versions:
_name = _v.split("-")
_env_id = f'{_name[0]}ProDMP-{_name[1]}'
kwargs_dict_tt_prodmp = deepcopy(DEFAULT_BB_DICT_ProDMP)
if _v == 'TableTennisWind-v0':
kwargs_dict_tt_prodmp['wrappers'].append(mujoco.table_tennis.TTVelObs_MPWrapper)
else:
kwargs_dict_tt_prodmp['wrappers'].append(mujoco.table_tennis.TT_MPWrapper)
kwargs_dict_tt_prodmp['name'] = _v
kwargs_dict_tt_prodmp['controller_kwargs']['p_gains'] = 0.5 * np.array([1.0, 4.0, 2.0, 4.0, 1.0, 4.0, 1.0])
kwargs_dict_tt_prodmp['controller_kwargs']['d_gains'] = 0.5 * np.array([0.1, 0.4, 0.2, 0.4, 0.1, 0.4, 0.1])
kwargs_dict_tt_prodmp['trajectory_generator_kwargs']['weights_scale'] = 0.7
kwargs_dict_tt_prodmp['trajectory_generator_kwargs']['auto_scale_basis'] = True
kwargs_dict_tt_prodmp['trajectory_generator_kwargs']['relative_goal'] = True
kwargs_dict_tt_prodmp['trajectory_generator_kwargs']['disable_goal'] = True
kwargs_dict_tt_prodmp['phase_generator_kwargs']['tau_bound'] = [0.8, 1.5]
kwargs_dict_tt_prodmp['phase_generator_kwargs']['delay_bound'] = [0.05, 0.15]
kwargs_dict_tt_prodmp['phase_generator_kwargs']['learn_tau'] = True
kwargs_dict_tt_prodmp['phase_generator_kwargs']['learn_delay'] = True
kwargs_dict_tt_prodmp['basis_generator_kwargs']['num_basis'] = 3
kwargs_dict_tt_prodmp['basis_generator_kwargs']['alpha'] = 25.
kwargs_dict_tt_prodmp['basis_generator_kwargs']['basis_bandwidth_factor'] = 3
kwargs_dict_tt_prodmp['phase_generator_kwargs']['alpha_phase'] = 3
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_tt_prodmp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProDMP"].append(_env_id)
for _v in _versions:
_name = _v.split("-")
_env_id = f'{_name[0]}ReplanProDMP-{_name[1]}'
kwargs_dict_tt_prodmp = deepcopy(DEFAULT_BB_DICT_ProDMP)
if _v == 'TableTennisWind-v0':
kwargs_dict_tt_prodmp['wrappers'].append(mujoco.table_tennis.TTVelObs_MPWrapper)
else:
kwargs_dict_tt_prodmp['wrappers'].append(mujoco.table_tennis.TT_MPWrapper)
kwargs_dict_tt_prodmp['name'] = _v
kwargs_dict_tt_prodmp['controller_kwargs']['p_gains'] = 0.5 * np.array([1.0, 4.0, 2.0, 4.0, 1.0, 4.0, 1.0])
kwargs_dict_tt_prodmp['controller_kwargs']['d_gains'] = 0.5 * np.array([0.1, 0.4, 0.2, 0.4, 0.1, 0.4, 0.1])
kwargs_dict_tt_prodmp['trajectory_generator_kwargs']['auto_scale_basis'] = False
kwargs_dict_tt_prodmp['trajectory_generator_kwargs']['goal_offset'] = 1.0
kwargs_dict_tt_prodmp['phase_generator_kwargs']['tau_bound'] = [0.8, 1.5]
kwargs_dict_tt_prodmp['phase_generator_kwargs']['delay_bound'] = [0.05, 0.15]
kwargs_dict_tt_prodmp['phase_generator_kwargs']['learn_tau'] = True
kwargs_dict_tt_prodmp['phase_generator_kwargs']['learn_delay'] = True
kwargs_dict_tt_prodmp['basis_generator_kwargs']['num_basis'] = 2
kwargs_dict_tt_prodmp['basis_generator_kwargs']['alpha'] = 25.
kwargs_dict_tt_prodmp['basis_generator_kwargs']['basis_bandwidth_factor'] = 3
kwargs_dict_tt_prodmp['phase_generator_kwargs']['alpha_phase'] = 3
kwargs_dict_tt_prodmp['black_box_kwargs']['max_planning_times'] = 3
kwargs_dict_tt_prodmp['black_box_kwargs']['replanning_schedule'] = lambda pos, vel, obs, action, t : t % 50 == 0
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_tt_prodmp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProDMP"].append(_env_id)
#
# ## Walker2DJump
# _versions = ['Walker2DJump-v0']
# for _v in _versions:
# _name = _v.split("-")
# _env_id = f'{_name[0]}ProMP-{_name[1]}'
# kwargs_dict_walker2d_jump_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
# kwargs_dict_walker2d_jump_promp['wrappers'].append(mujoco.walker_2d_jump.MPWrapper)
# kwargs_dict_walker2d_jump_promp['name'] = _v
# register(
# id=_env_id,
# entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
# kwargs=kwargs_dict_walker2d_jump_promp
# )
# ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
### Depricated, we will not provide non random starts anymore
"""
register(
id='SimpleReacher-v1',
entry_point='fancy_gym.envs.classic_control:SimpleReacherEnv',
max_episode_steps=200,
kwargs={ kwargs={
"n_links": 2, 'goal_switching_step': 99
"random_start": False
} }
) )
register(
id='LongSimpleReacher-v1',
entry_point='fancy_gym.envs.classic_control:SimpleReacherEnv',
max_episode_steps=200,
kwargs={
"n_links": 5,
"random_start": False
}
)
register(
id='HoleReacher-v1',
entry_point='fancy_gym.envs.classic_control:HoleReacherEnv',
max_episode_steps=200,
kwargs={
"n_links": 5,
"random_start": False,
"allow_self_collision": False,
"allow_wall_collision": False,
"hole_width": 0.25,
"hole_depth": 1,
"hole_x": None,
"collision_penalty": 100,
}
)
register(
id='HoleReacher-v2',
entry_point='fancy_gym.envs.classic_control:HoleReacherEnv',
max_episode_steps=200,
kwargs={
"n_links": 5,
"random_start": False,
"allow_self_collision": False,
"allow_wall_collision": False,
"hole_width": 0.25,
"hole_depth": 1,
"hole_x": 2,
"collision_penalty": 1,
}
)
# CtxtFree are v0, Contextual are v1
register(
id='AntJump-v0',
entry_point='fancy_gym.envs.mujoco:AntJumpEnv',
max_episode_steps=MAX_EPISODE_STEPS_ANTJUMP,
kwargs={
"max_episode_steps": MAX_EPISODE_STEPS_ANTJUMP,
"context": False
}
)
# CtxtFree are v0, Contextual are v1
register(
id='HalfCheetahJump-v0',
entry_point='fancy_gym.envs.mujoco:HalfCheetahJumpEnv',
max_episode_steps=MAX_EPISODE_STEPS_HALFCHEETAHJUMP,
kwargs={
"max_episode_steps": MAX_EPISODE_STEPS_HALFCHEETAHJUMP,
"context": False
}
)
register(
id='HopperJump-v0',
entry_point='fancy_gym.envs.mujoco:HopperJumpEnv',
max_episode_steps=MAX_EPISODE_STEPS_HOPPERJUMP,
kwargs={
"max_episode_steps": MAX_EPISODE_STEPS_HOPPERJUMP,
"context": False,
"healthy_reward": 1.0
}
)
"""
### Deprecated used for CorL paper
"""
_vs = np.arange(101).tolist() + [1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3, 1e-2, 5e-2, 1e-1, 5e-1]
for i in _vs:
_env_id = f'ALRReacher{i}-v0'
register(
id=_env_id,
entry_point='fancy_gym.envs.mujoco:ReacherEnv',
max_episode_steps=200,
kwargs={
"steps_before_reward": 0,
"n_links": 5,
"balance": False,
'_ctrl_cost_weight': i
}
)
_env_id = f'ALRReacherSparse{i}-v0'
register(
id=_env_id,
entry_point='fancy_gym.envs.mujoco:ReacherEnv',
max_episode_steps=200,
kwargs={
"steps_before_reward": 200,
"n_links": 5,
"balance": False,
'_ctrl_cost_weight': i
}
)
_vs = np.arange(101).tolist() + [1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3, 1e-2, 5e-2, 1e-1, 5e-1]
for i in _vs:
_env_id = f'ALRReacher{i}ProMP-v0'
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_promp_env_helper',
kwargs={
"name": f"{_env_id.replace('ProMP', '')}",
"wrappers": [mujoco.reacher.MPWrapper],
"mp_kwargs": {
"num_dof": 5,
"num_basis": 5,
"duration": 4,
"policy_type": "motor",
# "weights_scale": 5,
"n_zero_basis": 1,
"zero_start": True,
"policy_kwargs": {
"p_gains": 1,
"d_gains": 0.1
}
}
}
)
_env_id = f'ALRReacherSparse{i}ProMP-v0'
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_promp_env_helper',
kwargs={
"name": f"{_env_id.replace('ProMP', '')}",
"wrappers": [mujoco.reacher.MPWrapper],
"mp_kwargs": {
"num_dof": 5,
"num_basis": 5,
"duration": 4,
"policy_type": "motor",
# "weights_scale": 5,
"n_zero_basis": 1,
"zero_start": True,
"policy_kwargs": {
"p_gains": 1,
"d_gains": 0.1
}
}
}
)
register(
id='HopperJumpOnBox-v0',
entry_point='fancy_gym.envs.mujoco:HopperJumpOnBoxEnv',
max_episode_steps=MAX_EPISODE_STEPS_HOPPERJUMPONBOX,
kwargs={
"max_episode_steps": MAX_EPISODE_STEPS_HOPPERJUMPONBOX,
"context": False
}
)
register(
id='HopperThrow-v0',
entry_point='fancy_gym.envs.mujoco:HopperThrowEnv',
max_episode_steps=MAX_EPISODE_STEPS_HOPPERTHROW,
kwargs={
"max_episode_steps": MAX_EPISODE_STEPS_HOPPERTHROW,
"context": False
}
)
register(
id='HopperThrowInBasket-v0',
entry_point='fancy_gym.envs.mujoco:HopperThrowInBasketEnv',
max_episode_steps=MAX_EPISODE_STEPS_HOPPERTHROWINBASKET,
kwargs={
"max_episode_steps": MAX_EPISODE_STEPS_HOPPERTHROWINBASKET,
"context": False
}
)
register(
id='Walker2DJump-v0',
entry_point='fancy_gym.envs.mujoco:Walker2dJumpEnv',
max_episode_steps=MAX_EPISODE_STEPS_WALKERJUMP,
kwargs={
"max_episode_steps": MAX_EPISODE_STEPS_WALKERJUMP,
"context": False
}
)
register(id='TableTennis2DCtxt-v1',
entry_point='fancy_gym.envs.mujoco:TTEnvGym',
max_episode_steps=MAX_EPISODE_STEPS,
kwargs={'ctxt_dim': 2, 'fixed_goal': True})
register(
id='BeerPong-v0',
entry_point='fancy_gym.envs.mujoco:BeerBongEnv',
max_episode_steps=300,
kwargs={
"rndm_goal": False,
"cup_goal_pos": [0.1, -2.0],
"frame_skip": 2
}
)
"""


@ -1,18 +1,20 @@
### Classic Control
## Step-based Environments

|Name| Description|Horizon|Action Dimension|Observation Dimension
|---|---|---|---|---|
|`SimpleReacher-v0`| Simple reaching task (2 links) without any physics simulation. Provides no reward until 150 time steps. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory.| 200 | 2 | 9
|`LongSimpleReacher-v0`| Simple reaching task (5 links) without any physics simulation. Provides no reward until 150 time steps. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory.| 200 | 5 | 18
|`ViaPointReacher-v0`| Simple reaching task leveraging a via point, which supports self collision detection. Provides a reward only at 100 and 199 for reaching the viapoint and goal point, respectively.| 200 | 5 | 18
|`HoleReacher-v0`| 5 link reaching task where the end-effector needs to reach into a narrow hole without colliding with itself or walls | 200 | 5 | 18

| Name | Description | Horizon | Action Dimension | Observation Dimension |
| --- | --- | --- | --- | --- |
| `fancy/SimpleReacher-v0` | Simple reaching task (2 links) without any physics simulation. Provides no reward until 150 time steps. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory. | 200 | 2 | 9 |
| `fancy/LongSimpleReacher-v0` | Simple reaching task (5 links) without any physics simulation. Provides no reward until 150 time steps. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory. | 200 | 5 | 18 |
| `fancy/ViaPointReacher-v0` | Simple reaching task leveraging a via point, which supports self collision detection. Provides a reward only at 100 and 199 for reaching the viapoint and goal point, respectively. | 200 | 5 | 18 |
| `fancy/HoleReacher-v0` | 5 link reaching task where the end-effector needs to reach into a narrow hole without colliding with itself or walls | 200 | 5 | 18 |
## MP Environments
|Name| Description|Horizon|Action Dimension|Context Dimension
|---|---|---|---|---|
|`ViaPointReacherDMP-v0`| A DMP provides a trajectory for the `ViaPointReacher-v0` task. | 200 | 25
|`HoleReacherFixedGoalDMP-v0`| A DMP provides a trajectory for the `HoleReacher-v0` task with a fixed goal attractor. | 200 | 25
|`HoleReacherDMP-v0`| A DMP provides a trajectory for the `HoleReacher-v0` task. The goal attractor needs to be learned. | 200 | 30
[//]: |`HoleReacherProMPP-v0`|

| Name | Description | Horizon | Action Dimension | Context Dimension |
| ----------------------------------- | -------------------------------------------------------------------------------------------------------- | ------- | ---------------- | ----------------- |
| `fancy_DMP/ViaPointReacher-v0` | A DMP provides a trajectory for the `fancy/ViaPointReacher-v0` task. | 200 | 25 |
| `fancy_DMP/HoleReacherFixedGoal-v0` | A DMP provides a trajectory for the `fancy/HoleReacher-v0` task with a fixed goal attractor. | 200 | 25 |
| `fancy_DMP/HoleReacher-v0` | A DMP provides a trajectory for the `fancy/HoleReacher-v0` task. The goal attractor needs to be learned. | 200 | 30 |
[//]: |`fancy/HoleReacherProMPP-v0`|
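For these MP environments a single `step` call consumes the full trajectory-generator parameter vector (e.g. the DMP weights) and rolls the whole trajectory out internally, so one episode is a single interaction from the agent's point of view. A short usage sketch under that assumption:

```python
import gymnasium as gym
import fancy_gym  # noqa: F401

env = gym.make('fancy_DMP/HoleReacher-v0')
obs, info = env.reset(seed=0)

# The action is the DMP parameter vector (cf. "Action Dimension" above); the returned
# reward is the return accumulated over the executed trajectory.
params = env.action_space.sample()
obs, trajectory_return, terminated, truncated, info = env.step(params)
```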


@ -1,10 +1,10 @@
from typing import Union, Tuple, Optional from typing import Union, Tuple, Optional, Any, Dict
import gym import gymnasium as gym
import numpy as np import numpy as np
from gym import spaces from gymnasium import spaces
from gym.core import ObsType from gymnasium.core import ObsType
from gym.utils import seeding from gymnasium.utils import seeding
from fancy_gym.envs.classic_control.utils import intersect from fancy_gym.envs.classic_control.utils import intersect
@ -55,7 +55,6 @@ class BaseReacherEnv(gym.Env):
self.fig = None self.fig = None
self._steps = 0 self._steps = 0
self.seed()
@property @property
def dt(self) -> Union[float, int]: def dt(self) -> Union[float, int]:
@ -69,10 +68,15 @@ class BaseReacherEnv(gym.Env):
def current_vel(self): def current_vel(self):
return self._angle_velocity.copy() return self._angle_velocity.copy()
def reset(self, *, seed: Optional[int] = None, return_info: bool = False, def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
options: Optional[dict] = None, ) -> Union[ObsType, Tuple[ObsType, dict]]: -> Tuple[ObsType, Dict[str, Any]]:
# Sample only orientation of first link, i.e. the arm is always straight. # Sample only orientation of first link, i.e. the arm is always straight.
if self.random_start: super(BaseReacherEnv, self).reset(seed=seed, options=options)
try:
random_start = options.get('random_start', self.random_start)
except AttributeError:
random_start = self.random_start
if random_start:
first_joint = self.np_random.uniform(np.pi / 4, 3 * np.pi / 4) first_joint = self.np_random.uniform(np.pi / 4, 3 * np.pi / 4)
self._joint_angles = np.hstack([[first_joint], np.zeros(self.n_links - 1)]) self._joint_angles = np.hstack([[first_joint], np.zeros(self.n_links - 1)])
self._start_pos = self._joint_angles.copy() self._start_pos = self._joint_angles.copy()
@ -84,7 +88,7 @@ class BaseReacherEnv(gym.Env):
self._update_joints() self._update_joints()
self._steps = 0 self._steps = 0
return self._get_obs().copy() return self._get_obs().copy(), {}
def _update_joints(self): def _update_joints(self):
""" """
@ -124,10 +128,6 @@ class BaseReacherEnv(gym.Env):
def _terminate(self, info) -> bool: def _terminate(self, info) -> bool:
raise NotImplementedError raise NotImplementedError
def seed(self, seed=None):
self.np_random, seed = seeding.np_random(seed)
return [seed]
def close(self): def close(self):
super(BaseReacherEnv, self).close() super(BaseReacherEnv, self).close()
del self.fig del self.fig


@ -1,5 +1,5 @@
import numpy as np import numpy as np
from gym import spaces from gymnasium import spaces
from fancy_gym.envs.classic_control.base_reacher.base_reacher import BaseReacherEnv from fancy_gym.envs.classic_control.base_reacher.base_reacher import BaseReacherEnv
@ -32,6 +32,7 @@ class BaseReacherDirectEnv(BaseReacherEnv):
reward, info = self._get_reward(action) reward, info = self._get_reward(action)
self._steps += 1 self._steps += 1
done = self._terminate(info) terminated = self._terminate(info)
truncated = False
return self._get_obs().copy(), reward, done, info return self._get_obs().copy(), reward, terminated, truncated, info


@ -1,5 +1,5 @@
import numpy as np import numpy as np
from gym import spaces from gymnasium import spaces
from fancy_gym.envs.classic_control.base_reacher.base_reacher import BaseReacherEnv from fancy_gym.envs.classic_control.base_reacher.base_reacher import BaseReacherEnv
@ -31,6 +31,7 @@ class BaseReacherTorqueEnv(BaseReacherEnv):
reward, info = self._get_reward(action) reward, info = self._get_reward(action)
self._steps += 1 self._steps += 1
done = False terminated = False
truncated = False
return self._get_obs().copy(), reward, done, info return self._get_obs().copy(), reward, terminated, truncated, info
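These changes follow the Gymnasium API: `seed()` is gone (seeding happens through `reset`), `reset` returns `(obs, info)`, and `step` returns a five-tuple with separate `terminated` and `truncated` flags. A generic before/after sketch (not specific to the reacher classes above):

```python
import gymnasium as gym

env = gym.make('Pendulum-v1')

# Old gym API (pre-refactor):
#   obs = env.reset()
#   obs, reward, done, info = env.step(action)

# New Gymnasium API used throughout fancy_gym after this refactor:
obs, info = env.reset(seed=42)
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
```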


@ -1,17 +1,20 @@
from typing import Union, Optional, Tuple from typing import Union, Optional, Tuple, Any, Dict
import gym import gymnasium as gym
import matplotlib.pyplot as plt import matplotlib.pyplot as plt
import numpy as np import numpy as np
from gym.core import ObsType from gymnasium import spaces
from gymnasium.core import ObsType
from matplotlib import patches from matplotlib import patches
from fancy_gym.envs.classic_control.base_reacher.base_reacher_direct import BaseReacherDirectEnv from fancy_gym.envs.classic_control.base_reacher.base_reacher_direct import BaseReacherDirectEnv
from . import MPWrapper
MAX_EPISODE_STEPS_HOLEREACHER = 200 MAX_EPISODE_STEPS_HOLEREACHER = 200
class HoleReacherEnv(BaseReacherDirectEnv): class HoleReacherEnv(BaseReacherDirectEnv):
def __init__(self, n_links: int, hole_x: Union[None, float] = None, hole_depth: Union[None, float] = None, def __init__(self, n_links: int, hole_x: Union[None, float] = None, hole_depth: Union[None, float] = None,
hole_width: float = 1., random_start: bool = False, allow_self_collision: bool = False, hole_width: float = 1., random_start: bool = False, allow_self_collision: bool = False,
allow_wall_collision: bool = False, collision_penalty: float = 1000, rew_fct: str = "simple"): allow_wall_collision: bool = False, collision_penalty: float = 1000, rew_fct: str = "simple"):
@ -40,7 +43,7 @@ class HoleReacherEnv(BaseReacherDirectEnv):
[np.inf] # env steps, because reward start after n steps TODO: Maybe [np.inf] # env steps, because reward start after n steps TODO: Maybe
]) ])
# self.action_space = gym.spaces.Box(low=-action_bound, high=action_bound, shape=action_bound.shape) # self.action_space = gym.spaces.Box(low=-action_bound, high=action_bound, shape=action_bound.shape)
self.observation_space = gym.spaces.Box(low=-state_bound, high=state_bound, shape=state_bound.shape) self.observation_space = spaces.Box(low=-state_bound, high=state_bound, shape=state_bound.shape)
if rew_fct == "simple": if rew_fct == "simple":
from fancy_gym.envs.classic_control.hole_reacher.hr_simple_reward import HolereacherReward from fancy_gym.envs.classic_control.hole_reacher.hr_simple_reward import HolereacherReward
@ -54,13 +57,18 @@ class HoleReacherEnv(BaseReacherDirectEnv):
else: else:
raise ValueError("Unknown reward function {}".format(rew_fct)) raise ValueError("Unknown reward function {}".format(rew_fct))
def reset(self, *, seed: Optional[int] = None, return_info: bool = False, def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
options: Optional[dict] = None, ) -> Union[ObsType, Tuple[ObsType, dict]]: -> Tuple[ObsType, Dict[str, Any]]:
# initialize seed here as the random goal needs to be generated before the super reset()
gym.Env.reset(self, seed=seed, options=options)
self._generate_hole() self._generate_hole()
self._set_patches() self._set_patches()
self.reward_function.reset() self.reward_function.reset()
return super().reset() # do not provide seed to avoid setting it twice
return super(HoleReacherEnv, self).reset(options=options)
def _get_reward(self, action: np.ndarray) -> (float, dict): def _get_reward(self, action: np.ndarray) -> (float, dict):
return self.reward_function.get_reward(self) return self.reward_function.get_reward(self)
@ -160,7 +168,7 @@ class HoleReacherEnv(BaseReacherDirectEnv):
# all points that are above the hole # all points that are above the hole
r, c = np.where((line_points[:, :, 0] > (self._tmp_x - self._tmp_width / 2)) & ( r, c = np.where((line_points[:, :, 0] > (self._tmp_x - self._tmp_width / 2)) & (
line_points[:, :, 0] < (self._tmp_x + self._tmp_width / 2))) line_points[:, :, 0] < (self._tmp_x + self._tmp_width / 2)))
# check if any of those points are below surface # check if any of those points are below surface
nr_line_points_below_surface_in_hole = np.sum(line_points[r, c, 1] < -self._tmp_depth) nr_line_points_below_surface_in_hole = np.sum(line_points[r, c, 1] < -self._tmp_depth)
@ -223,16 +231,3 @@ class HoleReacherEnv(BaseReacherDirectEnv):
self.fig.gca().add_patch(left_block) self.fig.gca().add_patch(left_block)
self.fig.gca().add_patch(right_block) self.fig.gca().add_patch(right_block)
self.fig.gca().add_patch(hole_floor) self.fig.gca().add_patch(hole_floor)
if __name__ == "__main__":
env = HoleReacherEnv(5)
env.reset()
for i in range(10000):
ac = env.action_space.sample()
obs, rew, done, info = env.step(ac)
env.render()
if done:
env.reset()


@ -7,6 +7,30 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper): class MPWrapper(RawInterfaceWrapper):
mp_config = {
'ProMP': {
'controller_kwargs': {
'controller_type': 'velocity',
},
'trajectory_generator_kwargs': {
'weights_scale': 2,
},
},
'DMP': {
'controller_kwargs': {
'controller_type': 'velocity',
},
'trajectory_generator_kwargs': {
# TODO: Before it was weight scale 50 and goal scale 0.1. We now only have weight scale and thus set it to 500. Check
'weights_scale': 500,
},
'phase_generator_kwargs': {
'alpha_phase': 2.5,
},
},
'ProDMP': {},
}
@property @property
def context_mask(self): def context_mask(self):
return np.hstack([ return np.hstack([


@ -7,6 +7,28 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper): class MPWrapper(RawInterfaceWrapper):
mp_config = {
'ProMP': {
'controller_kwargs': {
'p_gains': 0.6,
'd_gains': 0.075,
},
},
'DMP': {
'controller_kwargs': {
'p_gains': 0.6,
'd_gains': 0.075,
},
'trajectory_generator_kwargs': {
'weights_scale': 50,
},
'phase_generator_kwargs': {
'alpha_phase': 2,
},
},
'ProDMP': {},
}
@property @property
def context_mask(self): def context_mask(self):
return np.hstack([ return np.hstack([


@ -1,11 +1,12 @@
from typing import Iterable, Union, Optional, Tuple from typing import Iterable, Union, Optional, Tuple, Any, Dict
import matplotlib.pyplot as plt import matplotlib.pyplot as plt
import numpy as np import numpy as np
from gym import spaces from gymnasium import spaces
from gym.core import ObsType from gymnasium.core import ObsType
from fancy_gym.envs.classic_control.base_reacher.base_reacher_torque import BaseReacherTorqueEnv from fancy_gym.envs.classic_control.base_reacher.base_reacher_torque import BaseReacherTorqueEnv
from . import MPWrapper
class SimpleReacherEnv(BaseReacherTorqueEnv): class SimpleReacherEnv(BaseReacherTorqueEnv):
@ -42,11 +43,15 @@ class SimpleReacherEnv(BaseReacherTorqueEnv):
# def start_pos(self): # def start_pos(self):
# return self._start_pos # return self._start_pos
def reset(self, *, seed: Optional[int] = None, return_info: bool = False, def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
options: Optional[dict] = None, ) -> Union[ObsType, Tuple[ObsType, dict]]: -> Tuple[ObsType, Dict[str, Any]]:
# Reset twice to ensure we return obs after generating goal and generating goal after executing seeded reset.
# (Env will not behave deterministic otherwise)
# Yes, there is probably a more elegant solution to this problem...
self._generate_goal() self._generate_goal()
super().reset(seed=seed, options=options)
return super().reset() self._generate_goal()
return super().reset(seed=seed, options=options)
def _get_reward(self, action: np.ndarray): def _get_reward(self, action: np.ndarray):
diff = self.end_effector - self._goal diff = self.end_effector - self._goal
@ -127,15 +132,3 @@ class SimpleReacherEnv(BaseReacherTorqueEnv):
self.fig.canvas.draw() self.fig.canvas.draw()
self.fig.canvas.flush_events() self.fig.canvas.flush_events()
if __name__ == "__main__":
env = SimpleReacherEnv(5)
env.reset()
for i in range(200):
ac = env.action_space.sample()
obs, rew, done, info = env.step(ac)
env.render()
if done:
break


@ -7,6 +7,26 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper): class MPWrapper(RawInterfaceWrapper):
mp_config = {
'ProMP': {
'controller_kwargs': {
'controller_type': 'velocity',
},
},
'DMP': {
'controller_kwargs': {
'controller_type': 'velocity',
},
'trajectory_generator_kwargs': {
'weights_scale': 50,
},
'phase_generator_kwargs': {
'alpha_phase': 2,
},
},
'ProDMP': {},
}
@property @property
def context_mask(self): def context_mask(self):
return np.hstack([ return np.hstack([


@ -1,11 +1,13 @@
from typing import Iterable, Union, Tuple, Optional from typing import Iterable, Union, Tuple, Optional, Any, Dict
import gym import gymnasium as gym
import matplotlib.pyplot as plt import matplotlib.pyplot as plt
import numpy as np import numpy as np
from gym.core import ObsType from gymnasium import spaces
from gymnasium.core import ObsType
from fancy_gym.envs.classic_control.base_reacher.base_reacher_direct import BaseReacherDirectEnv from fancy_gym.envs.classic_control.base_reacher.base_reacher_direct import BaseReacherDirectEnv
from . import MPWrapper
class ViaPointReacherEnv(BaseReacherDirectEnv): class ViaPointReacherEnv(BaseReacherDirectEnv):
@ -34,16 +36,21 @@ class ViaPointReacherEnv(BaseReacherDirectEnv):
[np.inf] * 2, # x-y coordinates of target distance [np.inf] * 2, # x-y coordinates of target distance
[np.inf] # env steps, because reward start after n steps [np.inf] # env steps, because reward start after n steps
]) ])
self.observation_space = gym.spaces.Box(low=-state_bound, high=state_bound, shape=state_bound.shape) self.observation_space = spaces.Box(low=-state_bound, high=state_bound, shape=state_bound.shape)
# @property # @property
# def start_pos(self): # def start_pos(self):
# return self._start_pos # return self._start_pos
def reset(self, *, seed: Optional[int] = None, return_info: bool = False, def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
options: Optional[dict] = None, ) -> Union[ObsType, Tuple[ObsType, dict]]: -> Tuple[ObsType, Dict[str, Any]]:
# Reset twice to ensure we return obs after generating goal and generating goal after executing seeded reset.
# (Env will not behave deterministic otherwise)
# Yes, there is probably a more elegant solution to this problem...
self._generate_goal() self._generate_goal()
return super().reset() super().reset(seed=seed, options=options)
self._generate_goal()
return super().reset(seed=seed, options=options)
def _generate_goal(self): def _generate_goal(self):
# TODO: Maybe improve this later, this can yield quite a lot of invalid settings # TODO: Maybe improve this later, this can yield quite a lot of invalid settings
@ -183,16 +190,3 @@ class ViaPointReacherEnv(BaseReacherDirectEnv):
plt.plot(self._joints[:, 0], self._joints[:, 1], 'ro-', markerfacecolor='k') plt.plot(self._joints[:, 0], self._joints[:, 1], 'ro-', markerfacecolor='k')
plt.pause(0.01) plt.pause(0.01)
if __name__ == "__main__":
env = ViaPointReacherEnv(5)
env.reset()
for i in range(10000):
ac = env.action_space.sample()
obs, rew, done, info = env.step(ac)
env.render()
if done:
env.reset()


@ -1,15 +1,48 @@
# Custom Mujoco tasks
## Step-based Environments

|Name| Description|Horizon|Action Dimension|Observation Dimension
|---|---|---|---|---|
|`ALRReacher-v0`|Modified (5 links) Mujoco gym's `Reacher-v2` (2 links)| 200 | 5 | 21
|`ALRReacherSparse-v0`|Same as `ALRReacher-v0`, but the distance penalty is only provided in the last time step.| 200 | 5 | 21
|`ALRReacherSparseBalanced-v0`|Same as `ALRReacherSparse-v0`, but the end-effector has to remain upright.| 200 | 5 | 21
|`ALRLongReacher-v0`|Modified (7 links) Mujoco gym's `Reacher-v2` (2 links)| 200 | 7 | 27
|`ALRLongReacherSparse-v0`|Same as `ALRLongReacher-v0`, but the distance penalty is only provided in the last time step.| 200 | 7 | 27
|`ALRLongReacherSparseBalanced-v0`|Same as `ALRLongReacherSparse-v0`, but the end-effector has to remain upright.| 200 | 7 | 27
|`ALRBallInACupSimple-v0`| Ball-in-a-cup task where a robot needs to catch a ball attached to a cup at its end-effector. | 4000 | 3 | wip
|`ALRBallInACup-v0`| Ball-in-a-cup task where a robot needs to catch a ball attached to a cup at its end-effector | 4000 | 7 | wip
|`ALRBallInACupGoal-v0`| Similar to `ALRBallInACupSimple-v0` but the ball needs to be caught at a specified goal position | 4000 | 7 | wip

| Name | Description | Horizon | Action Dimension | Observation Dimension |
| --- | --- | --- | --- | --- |
| `fancy/Reacher-v0` | Modified (5 links) Gymnasium's mujoco `Reacher-v2` (2 links) | 200 | 5 | 21 |
| `fancy/ReacherSparse-v0` | Same as `fancy/Reacher-v0`, but the distance penalty is only provided in the last time step. | 200 | 5 | 21 |
| `fancy/ReacherSparseBalanced-v0` | Same as `fancy/ReacherSparse-v0`, but the end-effector has to remain upright. | 200 | 5 | 21 |
| `fancy/LongReacher-v0` | Modified (7 links) Gymnasium's mujoco `Reacher-v2` (2 links) | 200 | 7 | 27 |
| `fancy/LongReacherSparse-v0` | Same as `fancy/LongReacher-v0`, but the distance penalty is only provided in the last time step. | 200 | 7 | 27 |
| `fancy/LongReacherSparseBalanced-v0` | Same as `fancy/LongReacherSparse-v0`, but the end-effector has to remain upright. | 200 | 7 | 27 |
| `fancy/Reacher5d-v0` | Reacher task with 5 links, based on Gymnasium's `gym.envs.mujoco.ReacherEnv` | 200 | 5 | 20 |
| `fancy/Reacher5dSparse-v0` | Sparse Reacher task with 5 links, based on Gymnasium's `gym.envs.mujoco.ReacherEnv` | 200 | 5 | 20 |
| `fancy/Reacher7d-v0` | Reacher task with 7 links, based on Gymnasium's `gym.envs.mujoco.ReacherEnv` | 200 | 7 | 22 |
| `fancy/Reacher7dSparse-v0` | Sparse Reacher task with 7 links, based on Gymnasium's `gym.envs.mujoco.ReacherEnv` | 200 | 7 | 22 |
| `fancy/HopperJumpSparse-v0` | Hopper Jump task with sparse rewards, based on Gymnasium's `gym.envs.mujoco.Hopper` | 250 | 3 | 15 / 16\* |
| `fancy/HopperJump-v0` | Hopper Jump task with continuous rewards, based on Gymnasium's `gym.envs.mujoco.Hopper` | 250 | 3 | 15 / 16\* |
| `fancy/AntJump-v0` | Ant Jump task, based on Gymnasium's `gym.envs.mujoco.Ant` | 200 | 8 | 119 |
| `fancy/HalfCheetahJump-v0` | HalfCheetah Jump task, based on Gymnasium's `gym.envs.mujoco.HalfCheetah` | 100 | 6 | 112 |
| `fancy/HopperJumpOnBox-v0` | Hopper Jump on Box task, based on Gymnasium's `gym.envs.mujoco.Hopper` | 250 | 4 | 16 / 100\* |
| `fancy/HopperThrow-v0` | Hopper Throw task, based on Gymnasium's `gym.envs.mujoco.Hopper` | 250 | 3 | 18 / 100\* |
| `fancy/HopperThrowInBasket-v0` | Hopper Throw in Basket task, based on Gymnasium's `gym.envs.mujoco.Hopper` | 250 | 3 | 18 / 100\* |
| `fancy/Walker2DJump-v0` | Walker 2D Jump task, based on Gymnasium's `gym.envs.mujoco.Walker2d` | 300 | 6 | 18 / 19\* |
| `fancy/BeerPong-v0` | Beer Pong task, based on a custom environment with multiple task variations | 300 | 3 | 29 |
| `fancy/BeerPongStepBased-v0` | Step-based Beer Pong task, based on a custom environment with episodic rewards | 300 | 3 | 29 |
| `fancy/BeerPongFixedRelease-v0` | Beer Pong with fixed release, based on a custom environment with episodic rewards | 300 | 3 | 29 |
| `fancy/BoxPushingDense-v0` | Custom Box-pushing task with dense rewards | 100 | 3 | 13 |
| `fancy/BoxPushingTemporalSparse-v0` | Custom Box-pushing task with temporally sparse rewards | 100 | 3 | 13 |
| `fancy/BoxPushingTemporalSpatialSparse-v0` | Custom Box-pushing task with temporally and spatially sparse rewards | 100 | 3 | 13 |
| `fancy/TableTennis2D-v0` | Table Tennis task with 2D context, based on a custom environment for table tennis | 350 | 7 | 19 |
| `fancy/TableTennis2DReplan-v0` | Table Tennis task with 2D context and replanning, based on a custom environment for table tennis | 350 | 7 | 19 |
| `fancy/TableTennis4D-v0` | Table Tennis task with 4D context, based on a custom environment for table tennis | 350 | 7 | 22 |
| `fancy/TableTennis4DReplan-v0` | Table Tennis task with 4D context and replanning, based on a custom environment for table tennis | 350 | 7 | 22 |
| `fancy/TableTennisWind-v0` | Table Tennis task with wind effects, based on a custom environment for table tennis | 350 | 7 | 19 |
| `fancy/TableTennisGoalSwitching-v0` | Table Tennis task with goal switching, based on a custom environment for table tennis | 350 | 7 | 19 |
| `fancy/TableTennisWindReplan-v0` | Table Tennis task with wind effects and replanning, based on a custom environment for table tennis | 350 | 7 | 19 |
\*Observation dimensions depend on configuration.
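Once `fancy_gym` is imported, these IDs can be created through the regular Gymnasium API. A minimal sketch (it assumes the `fancy/` registration introduced by this refactor):

```python
import gymnasium as gym
import fancy_gym  # noqa: F401  # importing fancy_gym registers the fancy/... IDs

env = gym.make("fancy/Reacher5d-v0")
obs, info = env.reset(seed=1)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
```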
<!--
No longer used?
| Name | Description | Horizon | Action Dimension | Observation Dimension |
| --------------------------- | --------------------------------------------------------------------------------------------------- | ------- | ---------------- | --------------------- |
| `fancy/BallInACupSimple-v0` | Ball-in-a-cup task where a robot needs to catch a ball attached to a cup at its end-effector. | 4000 | 3 | wip |
| `fancy/BallInACup-v0` | Ball-in-a-cup task where a robot needs to catch a ball attached to a cup at its end-effector | 4000 | 7 | wip |
| `fancy/BallInACupGoal-v0` | Similar to `fancy/BallInACupSimple-v0` but the ball needs to be caught at a specified goal position | 4000 | 7 | wip |
-->


@ -1,8 +1,11 @@
from typing import Tuple, Union, Optional, Any, Dict

import numpy as np
from gymnasium.core import ObsType
from gymnasium.envs.mujoco.ant_v4 import AntEnv, DEFAULT_CAMERA_CONFIG
from gymnasium import utils
from gymnasium.envs.mujoco import MujocoEnv
from gymnasium.spaces import Box

MAX_EPISODE_STEPS_ANTJUMP = 200
@ -12,8 +15,74 @@ MAX_EPISODE_STEPS_ANTJUMP = 200
# to the same structure as the Hopper, where the angles are randomized (->contexts) and the agent should jump as high
# as possible, while landing at a specific target position
class AntEnvCustomXML(AntEnv):
def __init__(
self,
xml_file="ant.xml",
ctrl_cost_weight=0.5,
use_contact_forces=False,
contact_cost_weight=5e-4,
healthy_reward=1.0,
terminate_when_unhealthy=True,
healthy_z_range=(0.2, 1.0),
contact_force_range=(-1.0, 1.0),
reset_noise_scale=0.1,
exclude_current_positions_from_observation=True,
**kwargs,
):
utils.EzPickle.__init__(
self,
xml_file,
ctrl_cost_weight,
use_contact_forces,
contact_cost_weight,
healthy_reward,
terminate_when_unhealthy,
healthy_z_range,
contact_force_range,
reset_noise_scale,
exclude_current_positions_from_observation,
**kwargs,
)
self._ctrl_cost_weight = ctrl_cost_weight
self._contact_cost_weight = contact_cost_weight
self._healthy_reward = healthy_reward
self._terminate_when_unhealthy = terminate_when_unhealthy
self._healthy_z_range = healthy_z_range
self._contact_force_range = contact_force_range
self._reset_noise_scale = reset_noise_scale
self._use_contact_forces = use_contact_forces
self._exclude_current_positions_from_observation = (
exclude_current_positions_from_observation
)
obs_shape = 27 + 1
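# assumption: 27 is the base Ant observation size (current positions excluded); the +1 is the goal height appended in _get_obs below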
if not exclude_current_positions_from_observation:
obs_shape += 2
if use_contact_forces:
obs_shape += 84
observation_space = Box(
low=-np.inf, high=np.inf, shape=(obs_shape,), dtype=np.float64
)
MujocoEnv.__init__(
self,
xml_file,
5,
observation_space=observation_space,
default_camera_config=DEFAULT_CAMERA_CONFIG,
**kwargs,
)
class AntJumpEnv(AntEnvCustomXML):
""" """
Initialization changes to normal Ant: Initialization changes to normal Ant:
- healthy_reward: 1.0 -> 0.01 -> 0.0 no healthy reward needed - Paul and Marc - healthy_reward: 1.0 -> 0.01 -> 0.0 no healthy reward needed - Paul and Marc
@ -61,9 +130,10 @@ class AntJumpEnv(AntEnv):
        costs = ctrl_cost + contact_cost

        terminated = bool(
            height < 0.3)  # fall over -> is the 0.3 value from healthy_z_range? TODO change 0.3 to the value of healthy z angle

        if self.current_step == MAX_EPISODE_STEPS_ANTJUMP or terminated:
            # -10 for scaling the value of the distance between the max_height and the goal height; only used when context is enabled
            # height_reward = -10 * (np.linalg.norm(self.max_height - self.goal))
            height_reward = -10 * np.linalg.norm(self.max_height - self.goal)
@ -80,19 +150,21 @@ class AntJumpEnv(AntEnv):
            'max_height': self.max_height,
            'goal': self.goal
        }
        truncated = False

        return obs, reward, terminated, truncated, info

    def _get_obs(self):
        return np.append(super()._get_obs(), self.goal)

    def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
            -> Tuple[ObsType, Dict[str, Any]]:
        self.current_step = 0
        self.max_height = 0
        # goal heights from 1.0 to 2.5; can be increased, but didnt work well with CMORE
        ret = super().reset(seed=seed, options=options)
        self.goal = self.np_random.uniform(1.0, 2.5, 1)
        return ret

    # reset_model had to be implemented in every env to make it deterministic
    def reset_model(self):


@ -1,9 +1,13 @@
import os
from typing import Optional, Any, Dict, Tuple

import numpy as np
from gymnasium import utils
from gymnasium.core import ObsType
from gymnasium.envs.mujoco import MujocoEnv
from gymnasium.spaces import Box
import mujoco

MAX_EPISODE_STEPS_BEERPONG = 300
FIXED_RELEASE_STEP = 62  # empirically evaluated for frame_skip=2!
@ -30,7 +34,16 @@ CUP_COLLISION_OBJ = ["cup_geom_table3", "cup_geom_table4", "cup_geom_table5", "c
class BeerPongEnv(MujocoEnv, utils.EzPickle):
metadata = {
"render_modes": [
"human",
"rgb_array",
"depth_array",
],
"render_fps": 100
}
def __init__(self, **kwargs):
self._steps = 0
# Small Context -> Easier. Todo: Should we do different versions?
# self.xml_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "assets", "beerpong_wo_cup.xml")
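With the Gymnasium `MujocoEnv`, the render mode is now fixed at construction time and must be one of `metadata["render_modes"]` above, instead of being passed to `render()`. A minimal usage sketch (env ID taken from the README table; illustration only):

```python
import gymnasium as gym
import fancy_gym  # noqa: F401

env = gym.make("fancy/BeerPong-v0", render_mode="rgb_array")
env.reset(seed=0)
env.step(env.action_space.sample())
frame = env.render()  # RGB array; with render_mode="human" a viewer window opens instead
```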
@ -50,9 +63,9 @@ class BeerPongEnv(MujocoEnv, utils.EzPickle):
self.repeat_action = 2
# TODO: If accessing IDs is easier in the (new) official mujoco bindings, remove this
self.model = None
self.geom_id = lambda x: mujoco.mj_name2id(self.model,
                                            mujoco.mjtObj.mjOBJ_GEOM,
                                            x)
# for reward calculation
self.dists = []
@ -65,7 +78,17 @@ class BeerPongEnv(MujocoEnv, utils.EzPickle):
self.ball_in_cup = False
self.dist_ground_cup = -1  # distance floor to cup if first floor contact

self.observation_space = Box(
    low=-np.inf, high=np.inf, shape=(29,), dtype=np.float64
)

MujocoEnv.__init__(
    self,
    self.xml_path,
    frame_skip=1,
    observation_space=self.observation_space,
    **kwargs
)
utils.EzPickle.__init__(self)
@property
@ -76,7 +99,8 @@ class BeerPongEnv(MujocoEnv, utils.EzPickle):
def start_vel(self):
    return self._start_vel

def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
        -> Tuple[ObsType, Dict[str, Any]]:
    self.dists = []
    self.dists_final = []
    self.action_costs = []
@ -86,7 +110,7 @@ class BeerPongEnv(MujocoEnv, utils.EzPickle):
self.ball_cup_contact = False
self.ball_in_cup = False
self.dist_ground_cup = -1  # distance floor to cup if first floor contact
return super().reset(seed=seed, options=options)

def reset_model(self):
    init_pos_all = self.init_qpos.copy()
@ -128,11 +152,11 @@ class BeerPongEnv(MujocoEnv, utils.EzPickle):
if not crash:
    reward, reward_infos = self._get_reward(applied_action)
    is_collided = reward_infos['is_collided']  # TODO: Remove if self collision does not make a difference
    terminated = is_collided
    self._steps += 1
else:
    reward = -30
    terminated = True
    reward_infos = {"success": False, "ball_pos": np.zeros(3), "ball_vel": np.zeros(3), "is_collided": False}

infos = dict(
@ -142,7 +166,10 @@ class BeerPongEnv(MujocoEnv, utils.EzPickle):
    q_vel=self.data.qvel[0:7].ravel().copy(), sim_crash=crash,
)
infos.update(reward_infos)

truncated = False

return ob, reward, terminated, truncated, infos

def _get_obs(self):
    theta = self.data.qpos.flat[:7].copy()
@ -197,13 +224,13 @@ class BeerPongEnv(MujocoEnv, utils.EzPickle):
    min_dist_coeff, final_dist_coeff, ground_contact_dist_coeff, rew_offset = 0, 1, 0, 0
    action_cost = 1e-4 * np.mean(action_cost)
    reward = rew_offset - min_dist_coeff * min_dist ** 2 - final_dist_coeff * final_dist ** 2 - \
        action_cost - ground_contact_dist_coeff * self.dist_ground_cup ** 2
    # release step punishment
    min_time_bound = 0.1
    max_time_bound = 1.0
    release_time = self.release_step * self.dt
    release_time_rew = int(release_time < min_time_bound) * (-30 - 10 * (release_time - min_time_bound) ** 2) + \
        int(release_time > max_time_bound) * (-30 - 10 * (release_time - max_time_bound) ** 2)
    reward += release_time_rew
    success = self.ball_in_cup
else:
@ -258,9 +285,9 @@ class BeerPongEnvStepBasedEpisodicReward(BeerPongEnv):
    return super(BeerPongEnvStepBasedEpisodicReward, self).step(a)
else:
    reward = 0
    terminated, truncated = True, False
    while self._steps < MAX_EPISODE_STEPS_BEERPONG:
        obs, sub_reward, terminated, truncated, infos = super(BeerPongEnvStepBasedEpisodicReward, self).step(
            np.zeros(a.shape))
        reward += sub_reward
    return obs, reward, terminated, truncated, infos


@ -1,9 +1,8 @@
import os

import numpy as np
from gymnasium import utils
from gymnasium.envs.mujoco import MujocoEnv

from fancy_gym.envs.mujoco.beerpong.deprecated.beerpong_reward_staged import BeerPongReward
@ -74,27 +73,24 @@ class BeerPongEnv(MujocoEnv, utils.EzPickle):
crash = False
for _ in range(self.repeat_action):
    applied_action = a + self.sim.data.qfrc_bias[:len(a)].copy() / self.model.actuator_gear[:, 0]
    self.do_simulation(applied_action, self.frame_skip)
    self.reward_function.initialize(self)
    # self.reward_function.check_contacts(self.sim) # I assume this is not important?
    if self._steps < self.release_step:
        self.sim.data.qpos[7::] = self.sim.data.site_xpos[self.site_id("init_ball_pos"), :].copy()
        self.sim.data.qvel[7::] = self.sim.data.site_xvelp[self.site_id("init_ball_pos"), :].copy()
    crash = False
ob = self._get_obs()

if not crash:
    reward, reward_infos = self.reward_function.compute_reward(self, applied_action)
    is_collided = reward_infos['is_collided']
    terminated = is_collided or self._steps == self.ep_length - 1
    self._steps += 1
else:
    reward = -30
    terminated = True
    reward_infos = {"success": False, "ball_pos": np.zeros(3), "ball_vel": np.zeros(3), "is_collided": False}

infos = dict(
@ -104,7 +100,7 @@ class BeerPongEnv(MujocoEnv, utils.EzPickle):
    q_vel=self.sim.data.qvel[0:7].ravel().copy(), sim_crash=crash,
)
infos.update(reward_infos)
return ob, reward, terminated, infos

def _get_obs(self):
    theta = self.sim.data.qpos.flat[:7]
@ -143,16 +139,16 @@ class BeerPongEnvStepBasedEpisodicReward(BeerPongEnv):
    return super(BeerPongEnvStepBasedEpisodicReward, self).step(a)
else:
    reward = 0
    terminated, truncated = False, False
    while not (terminated or truncated):
        sub_ob, sub_reward, terminated, truncated, sub_infos = super(BeerPongEnvStepBasedEpisodicReward,
                                                                     self).step(np.zeros(a.shape))
        reward += sub_reward
        infos = sub_infos
        ob = sub_ob
    ob[-1] = self.release_step + 1  # Since we simulate until the end of the episode, PPO does not see the
    # internal steps and thus, the observation also needs to be set correctly
    return ob, reward, terminated, truncated, infos
# class BeerBongEnvStepBased(BeerBongEnv):
@ -186,27 +182,3 @@ class BeerPongEnvStepBasedEpisodicReward(BeerPongEnv):
# ob[-1] = self.release_step + 1 # Since we simulate until the end of the episode, PPO does not see the
# # internal steps and thus, the observation also needs to be set correctly
# return ob, reward, done, infos
if __name__ == "__main__":
env = BeerPongEnv(frame_skip=2)
env.seed(0)
# env = BeerBongEnvStepBased(frame_skip=2)
# env = BeerBongEnvStepBasedEpisodicReward(frame_skip=2)
# env = BeerBongEnvFixedReleaseStep(frame_skip=2)
import time
env.reset()
env.render("human")
for i in range(600):
# ac = 10 * env.action_space.sample()
ac = 0.05 * np.ones(7)
obs, rew, d, info = env.step(ac)
env.render("human")
if d:
print('reward:', rew)
print('RESETTING')
env.reset()
time.sleep(1)
env.close()


@ -6,6 +6,23 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper):
mp_config = {
'ProMP': {
'phase_generator_kwargs': {
'learn_tau': True
},
'controller_kwargs': {
'p_gains': np.array([1.5, 5, 2.55, 3, 2., 2, 1.25]),
'd_gains': np.array([0.02333333, 0.1, 0.0625, 0.08, 0.03, 0.03, 0.0125]),
},
'basis_generator_kwargs': {
'num_basis': 2,
'num_basis_zero_start': 2,
},
},
'DMP': {},
'ProDMP': {},
}
@property
def context_mask(self) -> np.ndarray:
@ -39,3 +56,23 @@ class MPWrapper(RawInterfaceWrapper):
xyz[-1] = 0.840
self.model.body_pos[self.cup_table_id] = xyz
return self.get_observation_from_step(self.get_obs())
class MPWrapper_FixedRelease(MPWrapper):
mp_config = {
'ProMP': {
'phase_generator_kwargs': {
'tau': 0.62,
},
'controller_kwargs': {
'p_gains': np.array([1.5, 5, 2.55, 3, 2., 2, 1.25]),
'd_gains': np.array([0.02333333, 0.1, 0.0625, 0.08, 0.03, 0.03, 0.0125]),
},
'basis_generator_kwargs': {
'num_basis': 2,
'num_basis_zero_start': 2,
},
},
'DMP': {},
'ProDMP': {},
}
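The `mp_config` dictionaries above define how the step-based environment is wrapped into a movement-primitive environment (ProMP here, with PD controller gains and a two-basis trajectory generator). A minimal usage sketch, assuming the MP variants are exposed under a separate `fancy_ProMP/` namespace as introduced by this refactor:

```python
import gymnasium as gym
import fancy_gym  # noqa: F401

# assumption: the ProMP variant of BeerPong is registered as fancy_ProMP/BeerPong-v0
env = gym.make("fancy_ProMP/BeerPong-v0")
obs, info = env.reset(seed=3)
# a single step executes one full ProMP trajectory in the underlying step-based env
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
```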


@ -1 +1 @@
from .mp_wrapper import MPWrapper, ReplanMPWrapper


@ -1,8 +1,8 @@
import os

import numpy as np
from gymnasium import utils, spaces
from gymnasium.envs.mujoco import MujocoEnv
from fancy_gym.envs.mujoco.box_pushing.box_pushing_utils import rot_to_quat, get_quaternion_error, rotation_distance
from fancy_gym.envs.mujoco.box_pushing.box_pushing_utils import q_max, q_min, q_dot_max, q_torque_max
from fancy_gym.envs.mujoco.box_pushing.box_pushing_utils import desired_rod_quat
@ -13,6 +13,7 @@ MAX_EPISODE_STEPS_BOX_PUSHING = 100
BOX_POS_BOUND = np.array([[0.3, -0.45, -0.01], [0.6, 0.45, -0.01]])


class BoxPushingEnvBase(MujocoEnv, utils.EzPickle):
    """
    franka box pushing environment
@ -26,6 +27,15 @@ class BoxPushingEnvBase(MujocoEnv, utils.EzPickle):
    3. time-spatial-depend sparse reward
    """
metadata = {
"render_modes": [
"human",
"rgb_array",
"depth_array",
],
"render_fps": 50
}
def __init__(self, frame_skip: int = 10, random_init: bool = False):
    utils.EzPickle.__init__(**locals())
    self._steps = 0
@ -39,11 +49,16 @@ class BoxPushingEnvBase(MujocoEnv, utils.EzPickle):
self._desired_rod_quat = desired_rod_quat

self._episode_energy = 0.

self.observation_space = spaces.Box(
    low=-np.inf, high=np.inf, shape=(28,), dtype=np.float64
)

self.random_init = random_init
MujocoEnv.__init__(self,
                   model_path=os.path.join(os.path.dirname(__file__), "assets", "box_pushing.xml"),
                   frame_skip=self.frame_skip,
                   observation_space=self.observation_space)
self.action_space = spaces.Box(low=-1, high=1, shape=(7,))
def step(self, action):
@ -89,7 +104,11 @@ class BoxPushingEnvBase(MujocoEnv, utils.EzPickle):
    'is_success': True if episode_end and box_goal_pos_dist < 0.05 and box_goal_quat_dist < 0.5 else False,
    'num_steps': self._steps
}
terminated = episode_end and infos['is_success']
truncated = episode_end and not infos['is_success']
return obs, reward, terminated, truncated, infos
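Splitting the old `done` flag into `terminated` (goal reached) and `truncated` (episode ended without success) follows the Gymnasium step API; a generic rollout loop then treats both as episode boundaries (illustration only, not part of this diff):

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)
done = False
while not done:
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    done = terminated or truncated  # an episode ends on either condition
```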
def reset_model(self):
    # rest box to initial position
@ -250,7 +269,7 @@ class BoxPushingEnvBase(MujocoEnv, utils.EzPickle):
old_err_norm = err_norm

# get Jacobian by mujoco
self.data.qpos[:7] = q
mujoco.mj_forward(self.model, self.data)
@ -284,6 +303,7 @@ class BoxPushingEnvBase(MujocoEnv, utils.EzPickle):
return q


class BoxPushingDense(BoxPushingEnvBase):
    def __init__(self, frame_skip: int = 10, random_init: bool = False):
        super(BoxPushingDense, self).__init__(frame_skip=frame_skip, random_init=random_init)
@ -299,7 +319,7 @@ class BoxPushingDense(BoxPushingEnvBase):
energy_cost = -0.0005 * np.sum(np.square(action))
reward = joint_penalty + tcp_box_dist_reward + \
    box_goal_pos_dist_reward + box_goal_rot_dist_reward + energy_cost

rod_inclined_angle = rotation_distance(rod_quat, self._desired_rod_quat)
if rod_inclined_angle > np.pi / 4:
@ -307,6 +327,7 @@ class BoxPushingDense(BoxPushingEnvBase):
return reward


class BoxPushingTemporalSparse(BoxPushingEnvBase):
    def __init__(self, frame_skip: int = 10, random_init: bool = False):
        super(BoxPushingTemporalSparse, self).__init__(frame_skip=frame_skip, random_init=random_init)
@ -368,6 +389,7 @@ class BoxPushingTemporalSpatialSparse(BoxPushingEnvBase):
return reward


class BoxPushingTemporalSpatialSparse2(BoxPushingEnvBase):
    def __init__(self, frame_skip: int = 10, random_init: bool = False):


@ -6,6 +6,27 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper):
mp_config = {
'ProMP': {
'controller_kwargs': {
'p_gains': 0.01 * np.array([120., 120., 120., 120., 50., 30., 10.]),
'd_gains': 0.01 * np.array([10., 10., 10., 10., 6., 5., 3.]),
},
'basis_generator_kwargs': {
'basis_bandwidth_factor': 2 # 3.5, 4 to try
}
},
'DMP': {},
'ProDMP': {
'controller_kwargs': {
'p_gains': 0.01 * np.array([120., 120., 120., 120., 50., 30., 10.]),
'd_gains': 0.01 * np.array([10., 10., 10., 10., 6., 5., 3.]),
},
'basis_generator_kwargs': {
'basis_bandwidth_factor': 2 # 3.5, 4 to try
}
},
}
# Random x goal + random init pos
@property
@ -38,3 +59,35 @@ class MPWrapper(RawInterfaceWrapper):
@property
def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
    return self.data.qvel[:7].copy()
class ReplanMPWrapper(MPWrapper):
mp_config = {
'ProMP': {},
'DMP': {},
'ProDMP': {
'controller_kwargs': {
'p_gains': 0.01 * np.array([120., 120., 120., 120., 50., 30., 10.]),
'd_gains': 0.01 * np.array([10., 10., 10., 10., 6., 5., 3.]),
},
'trajectory_generator_kwargs': {
'weights_scale': 0.3,
'goal_scale': 0.3,
'auto_scale_basis': True,
'goal_offset': 1.0,
'disable_goal': True,
},
'basis_generator_kwargs': {
'num_basis': 5,
'basis_bandwidth_factor': 3,
},
'phase_generator_kwargs': {
'alpha_phase': 3,
},
'black_box_kwargs': {
'max_planning_times': 4,
'replanning_schedule': lambda pos, vel, obs, action, t: t % 25 == 0,
'condition_on_desired': True,
}
}
}
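The `replanning_schedule` above requests a new ProDMP segment every 25 environment steps, bounded by `max_planning_times`. A small standalone sketch of how that predicate behaves over the 100-step BoxPushing horizon (hypothetical check; the black-box wrapper calls it internally with the real state):

```python
# the schedule receives current position, velocity, observation, action and the step index t
def replanning_schedule(pos, vel, obs, action, t):
    return t % 25 == 0

# over a 100-step episode this fires at steps 0, 25, 50 and 75,
# i.e. at most four segments, matching max_planning_times=4
replan_steps = [t for t in range(100) if replanning_schedule(None, None, None, None, t)]
print(replan_steps)  # [0, 25, 50, 75]
```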


@ -1,14 +1,68 @@
import os
from typing import Tuple, Union, Optional, Any, Dict

import numpy as np
from gymnasium.core import ObsType
from gymnasium.envs.mujoco.half_cheetah_v4 import HalfCheetahEnv, DEFAULT_CAMERA_CONFIG
from gymnasium import utils
from gymnasium.envs.mujoco import MujocoEnv
from gymnasium.spaces import Box

MAX_EPISODE_STEPS_HALFCHEETAHJUMP = 100
class HalfCheetahEnvCustomXML(HalfCheetahEnv):
def __init__(
self,
xml_file,
forward_reward_weight=1.0,
ctrl_cost_weight=0.1,
reset_noise_scale=0.1,
exclude_current_positions_from_observation=True,
**kwargs,
):
utils.EzPickle.__init__(
self,
xml_file,
forward_reward_weight,
ctrl_cost_weight,
reset_noise_scale,
exclude_current_positions_from_observation,
**kwargs,
)
self._forward_reward_weight = forward_reward_weight
self._ctrl_cost_weight = ctrl_cost_weight
self._reset_noise_scale = reset_noise_scale
self._exclude_current_positions_from_observation = (
exclude_current_positions_from_observation
)
if exclude_current_positions_from_observation:
observation_space = Box(
low=-np.inf, high=np.inf, shape=(18,), dtype=np.float64
)
else:
observation_space = Box(
low=-np.inf, high=np.inf, shape=(19,), dtype=np.float64
)
MujocoEnv.__init__(
self,
xml_file,
5,
observation_space=observation_space,
default_camera_config=DEFAULT_CAMERA_CONFIG,
**kwargs,
)
class HalfCheetahJumpEnv(HalfCheetahEnvCustomXML):
""" """
_ctrl_cost_weight 0.1 -> 0.0 _ctrl_cost_weight 0.1 -> 0.0
""" """
@ -41,10 +95,11 @@ class HalfCheetahJumpEnv(HalfCheetahEnv):
height_after = self.get_body_com("torso")[2]
self.max_height = max(height_after, self.max_height)

# Didnt use fell_over, because base env also has no done condition - Paul and Marc
# fell_over = abs(self.sim.data.qpos[2]) > 2.5 # how to figure out if the cheetah fell over? -> 2.5 oke?
# TODO: Should a fall over be checked here?
terminated = False
truncated = False

ctrl_cost = self.control_cost(action)
costs = ctrl_cost
@ -63,17 +118,18 @@ class HalfCheetahJumpEnv(HalfCheetahEnv):
    'max_height': self.max_height
}

return observation, reward, terminated, truncated, info

def _get_obs(self):
    return np.append(super()._get_obs(), self.goal)

def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
        -> Tuple[ObsType, Dict[str, Any]]:
    self.max_height = 0
    self.current_step = 0
    ret = super().reset(seed=seed, options=options)
    self.goal = self.np_random.uniform(1.1, 1.6, 1)  # 1.1 1.6
    return ret

# overwrite reset_model to make it deterministic
def reset_model(self):


@ -6,6 +6,12 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper):
mp_config = {
'ProMP': {},
'DMP': {},
'ProDMP': {},
}
@property
def context_mask(self) -> np.ndarray:
    return np.hstack([


@ -0,0 +1,52 @@
<mujoco model="hopper">
<compiler angle="degree" coordinate="global" inertiafromgeom="true"/>
<default>
<joint armature="1" damping="1" limited="true"/>
<geom conaffinity="1" condim="1" contype="1" margin="0.001" material="geom" rgba="0.8 0.6 .4 1" solimp=".8 .8 .01" solref=".02 1"/>
<motor ctrllimited="true" ctrlrange="-.4 .4"/>
</default>
<option integrator="RK4" timestep="0.002"/>
<visual>
<map znear="0.02"/>
</visual>
<worldbody>
<light cutoff="100" diffuse="1 1 1" dir="-0 0 -1.3" directional="true" exponent="1" pos="0 0 1.3" specular=".1 .1 .1"/>
<geom conaffinity="1" condim="3" name="floor" pos="0 0 0" rgba="0.8 0.9 0.8 1" size="20 20 .125" type="plane" material="MatPlane"/>
<body name="torso" pos="0 0 1.25">
<camera name="track" mode="trackcom" pos="0 -3 1" xyaxes="1 0 0 0 0 1"/>
<joint armature="0" axis="1 0 0" damping="0" limited="false" name="rootx" pos="0 0 0" stiffness="0" type="slide"/>
<joint armature="0" axis="0 0 1" damping="0" limited="false" name="rootz" pos="0 0 0" ref="1.25" stiffness="0" type="slide"/>
<joint armature="0" axis="0 1 0" damping="0" limited="false" name="rooty" pos="0 0 1.25" stiffness="0" type="hinge"/>
<geom friction="0.9" fromto="0 0 1.45 0 0 1.05" name="torso_geom" size="0.05" type="capsule"/>
<body name="thigh" pos="0 0 1.05">
<joint axis="0 -1 0" name="thigh_joint" pos="0 0 1.05" range="-150 0" type="hinge"/>
<geom friction="0.9" fromto="0 0 1.05 0 0 0.6" name="thigh_geom" size="0.05" type="capsule"/>
<body name="leg" pos="0 0 0.35">
<joint axis="0 -1 0" name="leg_joint" pos="0 0 0.6" range="-150 0" type="hinge"/>
<geom friction="0.9" fromto="0 0 0.6 0 0 0.1" name="leg_geom" size="0.04" type="capsule"/>
<body name="foot" pos="0.13/2 0 0.1">
<site name="foot_site" pos="0 0 0.04" size="0.02 0.02 0.02" rgba="1 0 0 1" type="sphere"/>
<joint axis="0 -1 0" name="foot_joint" pos="0 0 0.1" range="-45 45" type="hinge"/>
<geom friction="2.0" fromto="-0.13 0 0.1 0.26 0 0.1" name="foot_geom" size="0.06" type="capsule"/>
</body>
</body>
</body>
</body>
<body name="goal_site_body" pos = "0 0 0">
<site name="goal_site" pos="0 0 0.0" size="0.02 0.02 0.02" rgba="0 1 0 1" type="sphere"/>
</body>
</worldbody>
<actuator>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="thigh_joint"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="leg_joint"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="foot_joint"/>
</actuator>
<asset>
<texture type="skybox" builtin="gradient" rgb1=".4 .5 .6" rgb2="0 0 0"
width="100" height="100"/>
<texture builtin="flat" height="1278" mark="cross" markrgb="1 1 1" name="texgeom" random="0.01" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" type="cube" width="127"/>
<texture builtin="checker" height="100" name="texplane" rgb1="0 0 0" rgb2="0.8 0.8 0.8" type="2d" width="100"/>
<material name="MatPlane" reflectance="0.5" shininess="1" specular="1" texrepeat="60 60" texture="texplane"/>
<material name="geom" texture="texgeom" texuniform="true"/>
</asset>
</mujoco>


@ -1,52 +1,51 @@
<mujoco model="hopper"> <mujoco model="hopper">
<compiler angle="degree" coordinate="global" inertiafromgeom="true"/> <compiler angle="radian" autolimits="true"/>
<default> <option integrator="RK4"/>
<joint armature="1" damping="1" limited="true"/>
<geom conaffinity="1" condim="1" contype="1" margin="0.001" material="geom" rgba="0.8 0.6 .4 1" solimp=".8 .8 .01" solref=".02 1"/>
<motor ctrllimited="true" ctrlrange="-.4 .4"/>
</default>
<option integrator="RK4" timestep="0.002"/>
<visual> <visual>
<map znear="0.02"/> <map znear="0.02"/>
</visual> </visual>
<default class="main">
<joint limited="true" armature="1" damping="1"/>
<geom condim="1" solimp="0.8 0.8 0.01 0.5 2" margin="0.001" material="geom" rgba="0.8 0.6 0.4 1"/>
<general ctrllimited="true" ctrlrange="-0.4 0.4"/>
</default>
<asset>
<texture type="skybox" builtin="gradient" rgb1="0.4 0.5 0.6" rgb2="0 0 0" width="100" height="600"/>
<texture type="cube" name="texgeom" builtin="flat" mark="cross" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" markrgb="1 1 1" width="127" height="762"/>
<texture type="2d" name="texplane" builtin="checker" rgb1="0 0 0" rgb2="0.8 0.8 0.8" width="100" height="100"/>
<material name="MatPlane" texture="texplane" texrepeat="60 60" specular="1" shininess="1" reflectance="0.5"/>
<material name="geom" texture="texgeom" texuniform="true"/>
</asset>
<worldbody> <worldbody>
<light cutoff="100" diffuse="1 1 1" dir="-0 0 -1.3" directional="true" exponent="1" pos="0 0 1.3" specular=".1 .1 .1"/> <geom name="floor" size="20 20 0.125" type="plane" condim="3" material="MatPlane" rgba="0.8 0.9 0.8 1"/>
<geom conaffinity="1" condim="3" name="floor" pos="0 0 0" rgba="0.8 0.9 0.8 1" size="20 20 .125" type="plane" material="MatPlane"/> <light pos="0 0 1.3" dir="0 0 -1" directional="true" cutoff="100" exponent="1" diffuse="1 1 1" specular="0.1 0.1 0.1"/>
<body name="torso" pos="0 0 1.25"> <body name="torso" pos="0 0 1.25" gravcomp="0">
<camera name="track" mode="trackcom" pos="0 -3 1" xyaxes="1 0 0 0 0 1"/> <joint name="rootx" pos="0 0 -1.25" axis="1 0 0" limited="false" type="slide" armature="0" damping="0"/>
<joint armature="0" axis="1 0 0" damping="0" limited="false" name="rootx" pos="0 0 0" stiffness="0" type="slide"/> <joint name="rootz" pos="0 0 -1.25" axis="0 0 1" limited="false" type="slide" ref="1.25" armature="0" damping="0"/>
<joint armature="0" axis="0 0 1" damping="0" limited="false" name="rootz" pos="0 0 0" ref="1.25" stiffness="0" type="slide"/> <joint name="rooty" pos="0 0 0" axis="0 1 0" limited="false" armature="0" damping="0"/>
<joint armature="0" axis="0 1 0" damping="0" limited="false" name="rooty" pos="0 0 1.25" stiffness="0" type="hinge"/> <geom name="torso_geom" size="0.05 0.2" type="capsule" friction="0.9 0.005 0.0001"/>
<geom friction="0.9" fromto="0 0 1.45 0 0 1.05" name="torso_geom" size="0.05" type="capsule"/> <camera name="track" pos="0 -3 -0.25" quat="0.707107 0.707107 0 0" mode="trackcom"/>
<body name="thigh" pos="0 0 1.05"> <body name="thigh" pos="0 0 -0.2" gravcomp="0">
<joint axis="0 -1 0" name="thigh_joint" pos="0 0 1.05" range="-150 0" type="hinge"/> <joint name="thigh_joint" pos="0 0 0" axis="0 -1 0" range="-2.61799 0"/>
<geom friction="0.9" fromto="0 0 1.05 0 0 0.6" name="thigh_geom" size="0.05" type="capsule"/> <geom name="thigh_geom" size="0.05 0.225" pos="0 0 -0.225" type="capsule" friction="0.9 0.005 0.0001"/>
<body name="leg" pos="0 0 0.35"> <body name="leg" pos="0 0 -0.7" gravcomp="0">
<joint axis="0 -1 0" name="leg_joint" pos="0 0 0.6" range="-150 0" type="hinge"/> <joint name="leg_joint" pos="0 0 0.25" axis="0 -1 0" range="-2.61799 0"/>
<geom friction="0.9" fromto="0 0 0.6 0 0 0.1" name="leg_geom" size="0.04" type="capsule"/> <geom name="leg_geom" size="0.04 0.25" type="capsule" friction="0.9 0.005 0.0001"/>
<body name="foot" pos="0.13/2 0 0.1"> <body name="foot" pos="0.065 0 -0.25" gravcomp="0">
<site name="foot_site" pos="0 0 0.04" size="0.02 0.02 0.02" rgba="1 0 0 1" type="sphere"/> <joint name="foot_joint" pos="-0.065 0 0" axis="0 -1 0" range="-0.785398 0.785398"/>
<joint axis="0 -1 0" name="foot_joint" pos="0 0 0.1" range="-45 45" type="hinge"/> <geom name="foot_geom" size="0.06 0.195" quat="0.707107 0 -0.707107 0" type="capsule" friction="2 0.005 0.0001"/>
<geom friction="2.0" fromto="-0.13 0 0.1 0.26 0 0.1" name="foot_geom" size="0.06" type="capsule"/> <site name="foot_site" pos="-0.065 0 -0.06" size="0.02" rgba="1 0 0 1"/>
</body> </body>
</body> </body>
</body> </body>
</body> </body>
<body name="goal_site_body" pos = "0 0 0"> <body name="goal_site_body" pos="0 0 0" gravcomp="0">
<site name="goal_site" pos="0 0 0.0" size="0.02 0.02 0.02" rgba="0 1 0 1" type="sphere"/> <site name="goal_site" pos="0 0 0" size="0.02" rgba="0 1 0 1"/>
</body> </body>
</worldbody> </worldbody>
<actuator> <actuator>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="thigh_joint"/> <general joint="thigh_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="leg_joint"/> <general joint="leg_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="foot_joint"/> <general joint="foot_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
</actuator> </actuator>
<asset>
<texture type="skybox" builtin="gradient" rgb1=".4 .5 .6" rgb2="0 0 0"
width="100" height="100"/>
<texture builtin="flat" height="1278" mark="cross" markrgb="1 1 1" name="texgeom" random="0.01" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" type="cube" width="127"/>
<texture builtin="checker" height="100" name="texplane" rgb1="0 0 0" rgb2="0.8 0.8 0.8" type="2d" width="100"/>
<material name="MatPlane" reflectance="0.5" shininess="1" specular="1" texrepeat="60 60" texture="texplane"/>
<material name="geom" texture="texgeom" texuniform="true"/>
</asset>
</mujoco> </mujoco>


@ -1,51 +1,50 @@
<mujoco model="hopper"> <mujoco model="hopper">
<compiler angle="degree" coordinate="global" inertiafromgeom="true"/> <compiler angle="radian" autolimits="true"/>
<default> <option integrator="RK4"/>
<joint armature="1" damping="1" limited="true"/>
<geom conaffinity="1" condim="1" contype="1" margin="0.001" material="geom" rgba="0.8 0.6 .4 1" solimp=".8 .8 .01" solref=".02 1"/>
<motor ctrllimited="true" ctrlrange="-.4 .4"/>
</default>
<option integrator="RK4" timestep="0.002"/>
<visual> <visual>
<map znear="0.02"/> <map znear="0.02"/>
</visual> </visual>
<default class="main">
<joint limited="true" armature="1" damping="1"/>
<geom condim="1" solimp="0.8 0.8 0.01 0.5 2" margin="0.001" material="geom" rgba="0.8 0.6 0.4 1"/>
<general ctrllimited="true" ctrlrange="-0.4 0.4"/>
</default>
<asset>
<texture type="skybox" builtin="gradient" rgb1="0.4 0.5 0.6" rgb2="0 0 0" width="100" height="600"/>
<texture type="cube" name="texgeom" builtin="flat" mark="cross" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" markrgb="1 1 1" width="127" height="762"/>
<texture type="2d" name="texplane" builtin="checker" rgb1="0 0 0" rgb2="0.8 0.8 0.8" width="100" height="100"/>
<material name="MatPlane" texture="texplane" texrepeat="60 60" specular="1" shininess="1" reflectance="0.5"/>
<material name="geom" texture="texgeom" texuniform="true"/>
</asset>
<worldbody> <worldbody>
<light cutoff="100" diffuse="1 1 1" dir="-0 0 -1.3" directional="true" exponent="1" pos="0 0 1.3" specular=".1 .1 .1"/> <geom name="floor" size="20 20 0.125" type="plane" condim="3" material="MatPlane" rgba="0.8 0.9 0.8 1"/>
<geom conaffinity="1" condim="3" name="floor" pos="0 0 0" rgba="0.8 0.9 0.8 1" size="20 20 .125" type="plane" material="MatPlane"/> <light pos="0 0 1.3" dir="0 0 -1" directional="true" cutoff="100" exponent="1" diffuse="1 1 1" specular="0.1 0.1 0.1"/>
<body name="torso" pos="0 0 1.25"> <body name="torso" pos="0 0 1.25" gravcomp="0">
<camera name="track" mode="trackcom" pos="0 -3 1" xyaxes="1 0 0 0 0 1"/> <joint name="rootx" pos="0 0 -1.25" axis="1 0 0" limited="false" type="slide" armature="0" damping="0"/>
<joint armature="0" axis="1 0 0" damping="0" limited="false" name="rootx" pos="0 0 0" stiffness="0" type="slide"/> <joint name="rootz" pos="0 0 -1.25" axis="0 0 1" limited="false" type="slide" ref="1.25" armature="0" damping="0"/>
<joint armature="0" axis="0 0 1" damping="0" limited="false" name="rootz" pos="0 0 0" ref="1.25" stiffness="0" type="slide"/> <joint name="rooty" pos="0 0 0" axis="0 1 0" limited="false" armature="0" damping="0"/>
<joint armature="0" axis="0 1 0" damping="0" limited="false" name="rooty" pos="0 0 1.25" stiffness="0" type="hinge"/> <geom name="torso_geom" size="0.05 0.2" type="capsule" friction="0.9 0.005 0.0001"/>
<geom friction="0.9" fromto="0 0 1.45 0 0 1.05" name="torso_geom" size="0.05" type="capsule"/> <camera name="track" pos="0 -3 -0.25" quat="0.707107 0.707107 0 0" mode="trackcom"/>
<body name="thigh" pos="0 0 1.05"> <body name="thigh" pos="0 0 -0.2" gravcomp="0">
<joint axis="0 -1 0" name="thigh_joint" pos="0 0 1.05" range="-150 0" type="hinge"/> <joint name="thigh_joint" pos="0 0 0" axis="0 -1 0" range="-2.61799 0"/>
<geom friction="0.9" fromto="0 0 1.05 0 0 0.6" name="thigh_geom" size="0.05" type="capsule"/> <geom name="thigh_geom" size="0.05 0.225" pos="0 0 -0.225" type="capsule" friction="0.9 0.005 0.0001"/>
<body name="leg" pos="0 0 0.35"> <body name="leg" pos="0 0 -0.7" gravcomp="0">
<joint axis="0 -1 0" name="leg_joint" pos="0 0 0.6" range="-150 0" type="hinge"/> <joint name="leg_joint" pos="0 0 0.25" axis="0 -1 0" range="-2.61799 0"/>
<geom friction="0.9" fromto="0 0 0.6 0 0 0.1" name="leg_geom" size="0.04" type="capsule"/> <geom name="leg_geom" size="0.04 0.25" type="capsule" friction="0.9 0.005 0.0001"/>
<body name="foot" pos="0.13/2 0 0.1"> <body name="foot" pos="0.065 0 -0.25" gravcomp="0">
<joint axis="0 -1 0" name="foot_joint" pos="0 0 0.1" range="-45 45" type="hinge"/> <joint name="foot_joint" pos="-0.065 0 0" axis="0 -1 0" range="-0.785398 0.785398"/>
<geom friction="2.0" fromto="-0.13 0 0.1 0.26 0 0.1" name="foot_geom" size="0.06" type="capsule"/> <geom name="foot_geom" size="0.06 0.195" quat="0.707107 0 -0.707107 0" type="capsule" friction="2 0.005 0.0001"/>
</body> </body>
</body> </body>
</body> </body>
</body> </body>
<body name="box" pos="1 0 0"> <body name="box" pos="1 0 0" gravcomp="0">
<geom friction="1.0" fromto="0.48 0 0 1 0 0" name="basket_ground_geom" size="0.3" type="box" rgba="1 0 0 1"/> <geom name="basket_ground_geom" size="0.3 0.3 0.26" pos="-0.26 0 0" quat="0.707107 0 -0.707107 0" type="box" rgba="1 0 0 1"/>
</body> </body>
</worldbody> </worldbody>
<actuator> <actuator>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="thigh_joint"/> <general joint="thigh_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="leg_joint"/> <general joint="leg_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="foot_joint"/> <general joint="foot_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
</actuator> </actuator>
<asset> </mujoco>
<texture type="skybox" builtin="gradient" rgb1=".4 .5 .6" rgb2="0 0 0"
width="100" height="100"/>
<texture builtin="flat" height="1278" mark="cross" markrgb="1 1 1" name="texgeom" random="0.01" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" type="cube" width="127"/>
<texture builtin="checker" height="100" name="texplane" rgb1="0 0 0" rgb2="0.8 0.8 0.8" type="2d" width="100"/>
<material name="MatPlane" reflectance="0.5" shininess="1" specular="1" texrepeat="60 60" texture="texplane"/>
<material name="geom" texture="texgeom" texuniform="true"/>
</asset>
</mujoco>


@ -1,12 +1,95 @@
import os

import numpy as np
from gymnasium.envs.mujoco.hopper_v4 import HopperEnv, DEFAULT_CAMERA_CONFIG
from gymnasium import utils
from gymnasium.envs.mujoco import MujocoEnv
from gymnasium.spaces import Box
import mujoco

MAX_EPISODE_STEPS_HOPPERJUMP = 250


class HopperEnvCustomXML(HopperEnv):
"""
Initialization changes to normal Hopper:
- terminate_when_unhealthy: True -> False
- healthy_reward: 1.0 -> 2.0
- healthy_z_range: (0.7, float('inf')) -> (0.5, float('inf'))
- healthy_angle_range: (-0.2, 0.2) -> (-float('inf'), float('inf'))
- exclude_current_positions_from_observation: True -> False
"""
def __init__(
self,
xml_file,
forward_reward_weight=1.0,
ctrl_cost_weight=1e-3,
healthy_reward=1.0,
terminate_when_unhealthy=True,
healthy_state_range=(-100.0, 100.0),
healthy_z_range=(0.7, float("inf")),
healthy_angle_range=(-0.2, 0.2),
reset_noise_scale=5e-3,
exclude_current_positions_from_observation=True,
**kwargs,
):
xml_file = os.path.join(os.path.dirname(__file__), "assets", xml_file)
utils.EzPickle.__init__(
self,
xml_file,
forward_reward_weight,
ctrl_cost_weight,
healthy_reward,
terminate_when_unhealthy,
healthy_state_range,
healthy_z_range,
healthy_angle_range,
reset_noise_scale,
exclude_current_positions_from_observation,
**kwargs
)
self._forward_reward_weight = forward_reward_weight
self._ctrl_cost_weight = ctrl_cost_weight
self._healthy_reward = healthy_reward
self._terminate_when_unhealthy = terminate_when_unhealthy
self._healthy_state_range = healthy_state_range
self._healthy_z_range = healthy_z_range
self._healthy_angle_range = healthy_angle_range
self._reset_noise_scale = reset_noise_scale
self._exclude_current_positions_from_observation = (
exclude_current_positions_from_observation
)
if not hasattr(self, 'observation_space'):
if exclude_current_positions_from_observation:
self.observation_space = Box(
low=-np.inf, high=np.inf, shape=(15,), dtype=np.float64
)
else:
self.observation_space = Box(
low=-np.inf, high=np.inf, shape=(16,), dtype=np.float64
)
MujocoEnv.__init__(
self,
xml_file,
4,
observation_space=self.observation_space,
default_camera_config=DEFAULT_CAMERA_CONFIG,
**kwargs,
)
class HopperJumpEnv(HopperEnvCustomXML):
""" """
Initialization changes to normal Hopper: Initialization changes to normal Hopper:
- terminate_when_unhealthy: True -> False - terminate_when_unhealthy: True -> False
@ -73,7 +156,7 @@ class HopperJumpEnv(HopperEnv):
self.do_simulation(action, self.frame_skip)
height_after = self.get_body_com("torso")[2]
# site_pos_after = self.data.get_site_xpos('foot_site')
site_pos_after = self.data.site('foot_site').xpos
self.max_height = max(height_after, self.max_height)
@ -88,7 +171,8 @@ class HopperJumpEnv(HopperEnv):
ctrl_cost = self.control_cost(action)
costs = ctrl_cost
terminated = False
truncated = False

goal_dist = np.linalg.norm(site_pos_after - self.goal)
if self.contact_dist is None and self.contact_with_floor:
@ -115,7 +199,7 @@ class HopperJumpEnv(HopperEnv):
    healthy=self.is_healthy,
    contact_dist=self.contact_dist or 0
)
return observation, reward, terminated, truncated, info

def _get_obs(self):
    # goal_dist = self.data.get_site_xpos('foot_site') - self.goal
@ -140,8 +224,8 @@ class HopperJumpEnv(HopperEnv):
noise_high[5] = 0.785

qpos = (
    self.np_random.uniform(low=noise_low, high=noise_high, size=self.model.nq) +
    self.init_qpos
)
qvel = (
    # self.np_random.uniform(low=noise_low, high=noise_high, size=self.model.nv) +
@ -162,12 +246,12 @@ class HopperJumpEnv(HopperEnv):
# floor_geom_id = self.model.geom_name2id('floor')
# foot_geom_id = self.model.geom_name2id('foot_geom')
# TODO: do this properly over a sensor in the xml file, see dmc hopper
floor_geom_id = mujoco.mj_name2id(self.model,
                                  mujoco.mjtObj.mjOBJ_GEOM,
                                  'floor')
foot_geom_id = mujoco.mj_name2id(self.model,
                                 mujoco.mjtObj.mjOBJ_GEOM,
                                 'foot_geom')
for i in range(self.data.ncon):
    contact = self.data.contact[i]
    collision = contact.geom1 == floor_geom_id and contact.geom2 == foot_geom_id


@ -1,12 +1,16 @@
import os
from typing import Optional, Dict, Any, Tuple

import numpy as np
from gymnasium.core import ObsType
from fancy_gym.envs.mujoco.hopper_jump.hopper_jump import HopperEnvCustomXML
from gymnasium import spaces

MAX_EPISODE_STEPS_HOPPERJUMPONBOX = 250


class HopperJumpOnBoxEnv(HopperEnvCustomXML):
""" """
Initialization changes to normal Hopper: Initialization changes to normal Hopper:
- healthy_reward: 1.0 -> 0.01 -> 0.001 - healthy_reward: 1.0 -> 0.01 -> 0.001
@ -33,6 +37,16 @@ class HopperJumpOnBoxEnv(HopperEnv):
self.hopper_on_box = False
self.context = context
self.box_x = 1
if exclude_current_positions_from_observation:
self.observation_space = spaces.Box(
low=-np.inf, high=np.inf, shape=(12,), dtype=np.float64
)
else:
self.observation_space = spaces.Box(
low=-np.inf, high=np.inf, shape=(13,), dtype=np.float64
)
xml_file = os.path.join(os.path.dirname(__file__), "assets", xml_file)
super().__init__(xml_file, forward_reward_weight, ctrl_cost_weight, healthy_reward, terminate_when_unhealthy,
                 healthy_state_range, healthy_z_range, healthy_angle_range, reset_noise_scale,
@ -74,10 +88,10 @@ class HopperJumpOnBoxEnv(HopperEnv):
costs = ctrl_cost

terminated = fell_over or self.hopper_on_box

if self.current_step >= self.max_episode_steps or terminated:
    done = False  # TODO why are we doing this???

    max_height = self.max_height.copy()
    min_distance = self.min_distance.copy()
@ -122,21 +136,25 @@ class HopperJumpOnBoxEnv(HopperEnv):
    'goal': self.box_x,
}

truncated = self.current_step >= self.max_episode_steps and not terminated
return observation, reward, terminated, truncated, info

def _get_obs(self):
    return np.append(super()._get_obs(), self.box_x)
def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
        -> Tuple[ObsType, Dict[str, Any]]:
    self.max_height = 0
    self.min_distance = 5000
    self.current_step = 0
    self.hopper_on_box = False
    ret = super().reset(seed=seed, options=options)
    if self.context:
        self.box_x = self.np_random.uniform(1, 3, 1)
        self.model.body("box").pos = [self.box_x[0], 0, 0]
    return ret
# overwrite reset_model to make it deterministic # overwrite reset_model to make it deterministic
def reset_model(self): def reset_model(self):
@ -150,21 +168,3 @@ class HopperJumpOnBoxEnv(HopperEnv):
observation = self._get_obs() observation = self._get_obs()
return observation return observation
if __name__ == '__main__':
render_mode = "human" # "human" or "partial" or "final"
env = HopperJumpOnBoxEnv()
obs = env.reset()
for i in range(2000):
# objective.load_result("/tmp/cma")
# test with random actions
ac = env.action_space.sample()
obs, rew, d, info = env.step(ac)
if i % 10 == 0:
env.render(mode=render_mode)
if d:
print('After ', i, ' steps, done: ', d)
env.reset()
env.close()
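The demo block removed above still used the old 4-tuple Gym API (`obs, rew, done, info`) and `render(mode=...)`. Under Gymnasium, an equivalent random-action rollout would look roughly like this (a sketch; `HopperJumpOnBoxEnv` is the class defined in this file, and rendering is now configured at construction time rather than per `render()` call):

```python
# HopperJumpOnBoxEnv as defined in the file above
env = HopperJumpOnBoxEnv()
obs, info = env.reset(seed=0)  # reset now returns (observation, info)
for i in range(2000):
    action = env.action_space.sample()
    # step now returns a 5-tuple with separate terminated/truncated flags
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```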


@ -6,6 +6,11 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper): class MPWrapper(RawInterfaceWrapper):
mp_config = {
'ProMP': {},
'DMP': {},
'ProDMP': {},
}
# Random x goal + random init pos # Random x goal + random init pos
@property @property
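The new `mp_config` attribute declares, per trajectory-generator type, which settings deviate from the framework defaults; empty dicts mean the defaults are used as-is. Once such a wrapper is registered, the MP variants become available under namespaced ids. A hedged usage sketch (the id below is an assumed example following the `fancy/` / `fancy_ProMP/` naming scheme of this PR):

```python
import gymnasium as gym
import fancy_gym  # noqa: F401  # importing registers the fancy_* namespaces

# Assumed example id: step-based tasks live under 'fancy/',
# MP variants under 'fancy_ProMP/', 'fancy_DMP/' and 'fancy_ProDMP/'.
env = gym.make('fancy_ProMP/HopperJumpOnBox-v0')

obs, info = env.reset(seed=0)
# One MP step consumes a full parameter vector and rolls out a whole episode.
params = env.action_space.sample()
obs, reward, terminated, truncated, info = env.step(params)
```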


@ -1,56 +1,54 @@
<mujoco model="hopper"> <mujoco model="hopper">
<compiler angle="degree" coordinate="global" inertiafromgeom="true"/> <compiler angle="radian" autolimits="true"/>
<default> <option integrator="RK4"/>
<joint armature="1" damping="1" limited="true"/>
<geom conaffinity="1" condim="1" contype="1" margin="0.001" material="geom" rgba="0.8 0.6 .4 1" solimp=".8 .8 .01" solref=".02 1"/>
<motor ctrllimited="true" ctrlrange="-.4 .4"/>
</default>
<option integrator="RK4" timestep="0.002"/>
<visual> <visual>
<map znear="0.02"/> <map znear="0.02"/>
</visual> </visual>
<default class="main">
<joint limited="true" armature="1" damping="1"/>
<geom condim="1" solimp="0.8 0.8 0.01 0.5 2" margin="0.001" material="geom" rgba="0.8 0.6 0.4 1"/>
<general ctrllimited="true" ctrlrange="-0.4 0.4"/>
</default>
<asset>
<texture type="skybox" builtin="gradient" rgb1="0.4 0.5 0.6" rgb2="0 0 0" width="100" height="600"/>
<texture type="cube" name="texgeom" builtin="flat" mark="cross" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" markrgb="1 1 1" width="127" height="762"/>
<texture type="2d" name="texplane" builtin="checker" rgb1="0 0 0" rgb2="0.8 0.8 0.8" width="100" height="100"/>
<material name="MatPlane" texture="texplane" texrepeat="60 60" specular="1" shininess="1" reflectance="0.5"/>
<material name="geom" texture="texgeom" texuniform="true"/>
</asset>
<worldbody> <worldbody>
<light cutoff="100" diffuse="1 1 1" dir="-0 0 -1.3" directional="true" exponent="1" pos="0 0 1.3" specular=".1 .1 .1"/> <geom name="floor" size="20 20 0.125" type="plane" condim="3" material="MatPlane" rgba="0.8 0.9 0.8 1"/>
<geom conaffinity="1" condim="3" name="floor" pos="0 0 0" rgba="0.8 0.9 0.8 1" size="20 20 .125" type="plane" material="MatPlane"/> <light pos="0 0 1.3" dir="0 0 -1" directional="true" cutoff="100" exponent="1" diffuse="1 1 1" specular="0.1 0.1 0.1"/>
<body name="torso" pos="0 0 1.25"> <body name="torso" pos="0 0 1.25" gravcomp="0">
<camera name="track" mode="trackcom" pos="0 -3 1" xyaxes="1 0 0 0 0 1"/> <joint name="rootx" pos="0 0 -1.25" axis="1 0 0" limited="false" type="slide" armature="0" damping="0"/>
<joint armature="0" axis="1 0 0" damping="0" limited="false" name="rootx" pos="0 0 0" stiffness="0" type="slide"/> <joint name="rootz" pos="0 0 -1.25" axis="0 0 1" limited="false" type="slide" ref="1.25" armature="0" damping="0"/>
<joint armature="0" axis="0 0 1" damping="0" limited="false" name="rootz" pos="0 0 0" ref="1.25" stiffness="0" type="slide"/> <joint name="rooty" pos="0 0 0" axis="0 1 0" limited="false" armature="0" damping="0"/>
<joint armature="0" axis="0 1 0" damping="0" limited="false" name="rooty" pos="0 0 1.25" stiffness="0" type="hinge"/> <geom name="torso_geom" size="0.05 0.2" type="capsule" friction="0.9 0.005 0.0001"/>
<geom friction="0.9" fromto="0 0 1.45 0 0 1.05" name="torso_geom" size="0.05" type="capsule"/> <camera name="track" pos="0 -3 -0.25" quat="0.707107 0.707107 0 0" mode="trackcom"/>
<body name="thigh" pos="0 0 1.05"> <body name="thigh" pos="0 0 -0.2" gravcomp="0">
<joint axis="0 -1 0" name="thigh_joint" pos="0 0 1.05" range="-150 0" type="hinge"/> <joint name="thigh_joint" pos="0 0 0" axis="0 -1 0" range="-2.61799 0"/>
<geom friction="0.9" fromto="0 0 1.05 0 0 0.6" name="thigh_geom" size="0.05" type="capsule"/> <geom name="thigh_geom" size="0.05 0.225" pos="0 0 -0.225" type="capsule" friction="0.9 0.005 0.0001"/>
<body name="leg" pos="0 0 0.35"> <body name="leg" pos="0 0 -0.7" gravcomp="0">
<joint axis="0 -1 0" name="leg_joint" pos="0 0 0.6" range="-150 0" type="hinge"/> <joint name="leg_joint" pos="0 0 0.25" axis="0 -1 0" range="-2.61799 0"/>
<geom friction="0.9" fromto="0 0 0.6 0 0 0.1" name="leg_geom" size="0.04" type="capsule"/> <geom name="leg_geom" size="0.04 0.25" type="capsule" friction="0.9 0.005 0.0001"/>
<body name="foot" pos="0.13/2 0 0.1"> <body name="foot" pos="0.065 0 -0.25" gravcomp="0">
<joint axis="0 -1 0" name="foot_joint" pos="0 0 0.1" range="-45 45" type="hinge"/> <joint name="foot_joint" pos="-0.065 0 0" axis="0 -1 0" range="-0.785398 0.785398"/>
<geom friction="2.0" fromto="-0.13 0 0.1 0.26 0 0.1" name="foot_geom" size="0.06" type="capsule"/> <geom name="foot_geom" size="0.06 0.195" quat="0.707107 0 -0.707107 0" type="capsule" friction="2 0.005 0.0001"/>
</body> </body>
</body> </body>
</body> </body>
</body> </body>
<body name="ball" pos="0 0 1.53"> <body name="ball" pos="0 0 1.53" gravcomp="0">
<joint armature="0" axis="1 0 0" damping="0.0" name="tar:x" pos="0 0 1.53" stiffness="0" type="slide" frictionloss="0" limited="false"/> <joint name="tar:x" pos="0 0 0" axis="1 0 0" limited="false" type="slide" armature="0" damping="0"/>
<joint armature="0" axis="0 1 0" damping="0.0" name="tar:y" pos="0 0 1.53" stiffness="0" type="slide" frictionloss="0" limited="false"/> <joint name="tar:y" pos="0 0 0" axis="0 1 0" limited="false" type="slide" armature="0" damping="0"/>
<joint armature="0" axis="0 0 1" damping="0.0" name="tar:z" pos="0 0 1.53" stiffness="0" type="slide" frictionloss="0" limited="false"/> <joint name="tar:z" pos="0 0 0" axis="0 0 1" limited="false" type="slide" armature="0" damping="0"/>
<geom pos="0 0 1.53" priority= "1" size="0.025 0.025 0.025" type="sphere" condim="4" name="ball_geom" rgba="0.8 0.2 0.1 1" mass="0.1" <geom name="ball_geom" size="0.025" condim="4" priority="1" friction="0.1 0.1 0.1" solref="-10000 -10" solimp="0.9 0.95 0.001 0.5 2" mass="0.1" rgba="0.8 0.2 0.1 1"/>
friction="0.1 0.1 0.1" solimp="0.9 0.95 0.001 0.5 2" solref="-10000 -10"/> <site name="target_ball" pos="0 0 0" size="0.04" rgba="1 0 0 1"/>
<site name="target_ball" pos="0 0 1.53" size="0.04 0.04 0.04" rgba="1 0 0 1" type="sphere"/>
</body> </body>
</worldbody> </worldbody>
<actuator> <actuator>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="thigh_joint"/> <general joint="thigh_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="leg_joint"/> <general joint="leg_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="foot_joint"/> <general joint="foot_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
</actuator> </actuator>
<asset>
<texture type="skybox" builtin="gradient" rgb1=".4 .5 .6" rgb2="0 0 0"
width="100" height="100"/>
<texture builtin="flat" height="1278" mark="cross" markrgb="1 1 1" name="texgeom" random="0.01" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" type="cube" width="127"/>
<texture builtin="checker" height="100" name="texplane" rgb1="0 0 0" rgb2="0.8 0.8 0.8" type="2d" width="100"/>
<material name="MatPlane" reflectance="0.5" shininess="1" specular="1" texrepeat="60 60" texture="texplane"/>
<material name="geom" texture="texgeom" texuniform="true"/>
</asset>
</mujoco> </mujoco>
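The rewritten model switches the compiler from degree/global to radian/local coordinates, so joint ranges are now stated in radians. A quick check of the converted limits (plain NumPy, not part of the commit):

```python
import numpy as np

# Old hinge ranges were given in degrees; the new XML states them in radians.
print(np.deg2rad(-150))  # -> -2.6179939..., matching range="-2.61799 0"
print(np.deg2rad(45))    # ->  0.7853982..., matching range="-0.785398 0.785398"
```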


@ -1,132 +1,129 @@
<mujoco model="hopper"> <mujoco model="hopper">
<compiler angle="degree" coordinate="global" inertiafromgeom="true"/> <compiler angle="radian" autolimits="true"/>
<default> <option integrator="RK4"/>
<joint armature="1" damping="1" limited="true"/>
<geom conaffinity="1" condim="1" contype="1" margin="0.001" material="geom" rgba="0.8 0.6 .4 1" solimp=".8 .8 .01" solref=".02 1"/>
<motor ctrllimited="true" ctrlrange="-.4 .4"/>
</default>
<option integrator="RK4" timestep="0.002"/>
<visual> <visual>
<map znear="0.02"/> <map znear="0.02"/>
</visual> </visual>
<default class="main">
<joint limited="true" armature="1" damping="1"/>
<geom condim="1" solimp="0.8 0.8 0.01 0.5 2" margin="0.001" material="geom" rgba="0.8 0.6 0.4 1"/>
<general ctrllimited="true" ctrlrange="-0.4 0.4"/>
</default>
<asset>
<texture type="skybox" builtin="gradient" rgb1="0.4 0.5 0.6" rgb2="0 0 0" width="100" height="600"/>
<texture type="cube" name="texgeom" builtin="flat" mark="cross" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" markrgb="1 1 1" width="127" height="762"/>
<texture type="2d" name="texplane" builtin="checker" rgb1="0 0 0" rgb2="0.8 0.8 0.8" width="100" height="100"/>
<material name="MatPlane" texture="texplane" texrepeat="60 60" specular="1" shininess="1" reflectance="0.5"/>
<material name="geom" texture="texgeom" texuniform="true"/>
</asset>
<worldbody> <worldbody>
<light cutoff="100" diffuse="1 1 1" dir="-0 0 -1.3" directional="true" exponent="1" pos="0 0 1.3" specular=".1 .1 .1"/> <geom name="floor" size="20 20 0.125" type="plane" condim="3" material="MatPlane" rgba="0.8 0.9 0.8 1"/>
<geom conaffinity="1" condim="3" name="floor" pos="0 0 0" rgba="0.8 0.9 0.8 1" size="20 20 .125" type="plane" material="MatPlane"/> <light pos="0 0 1.3" dir="0 0 -1" directional="true" cutoff="100" exponent="1" diffuse="1 1 1" specular="0.1 0.1 0.1"/>
<body name="torso" pos="0 0 1.25"> <body name="torso" pos="0 0 1.25" gravcomp="0">
<camera name="track" mode="trackcom" pos="0 -3 1" xyaxes="1 0 0 0 0 1"/> <joint name="rootx" pos="0 0 -1.25" axis="1 0 0" limited="false" type="slide" armature="0" damping="0"/>
<joint armature="0" axis="1 0 0" damping="0" limited="false" name="rootx" pos="0 0 0" stiffness="0" type="slide"/> <joint name="rootz" pos="0 0 -1.25" axis="0 0 1" limited="false" type="slide" ref="1.25" armature="0" damping="0"/>
<joint armature="0" axis="0 0 1" damping="0" limited="false" name="rootz" pos="0 0 0" ref="1.25" stiffness="0" type="slide"/> <joint name="rooty" pos="0 0 0" axis="0 1 0" limited="false" armature="0" damping="0"/>
<joint armature="0" axis="0 1 0" damping="0" limited="false" name="rooty" pos="0 0 1.25" stiffness="0" type="hinge"/> <geom name="torso_geom" size="0.05 0.2" type="capsule" friction="0.9 0.005 0.0001"/>
<geom friction="0.9" fromto="0 0 1.45 0 0 1.05" name="torso_geom" size="0.05" type="capsule"/> <camera name="track" pos="0 -3 -0.25" quat="0.707107 0.707107 0 0" mode="trackcom"/>
<body name="thigh" pos="0 0 1.05"> <body name="thigh" pos="0 0 -0.2" gravcomp="0">
<joint axis="0 -1 0" name="thigh_joint" pos="0 0 1.05" range="-150 0" type="hinge"/> <joint name="thigh_joint" pos="0 0 0" axis="0 -1 0" range="-2.61799 0"/>
<geom friction="0.9" fromto="0 0 1.05 0 0 0.6" name="thigh_geom" size="0.05" type="capsule"/> <geom name="thigh_geom" size="0.05 0.225" pos="0 0 -0.225" type="capsule" friction="0.9 0.005 0.0001"/>
<body name="leg" pos="0 0 0.35"> <body name="leg" pos="0 0 -0.7" gravcomp="0">
<joint axis="0 -1 0" name="leg_joint" pos="0 0 0.6" range="-150 0" type="hinge"/> <joint name="leg_joint" pos="0 0 0.25" axis="0 -1 0" range="-2.61799 0"/>
<geom friction="0.9" fromto="0 0 0.6 0 0 0.1" name="leg_geom" size="0.04" type="capsule"/> <geom name="leg_geom" size="0.04 0.25" type="capsule" friction="0.9 0.005 0.0001"/>
<body name="foot" pos="0.13/2 0 0.1"> <body name="foot" pos="0.065 0 -0.25" gravcomp="0">
<joint axis="0 -1 0" name="foot_joint" pos="0 0 0.1" range="-45 45" type="hinge"/> <joint name="foot_joint" pos="-0.065 0 0" axis="0 -1 0" range="-0.785398 0.785398"/>
<geom friction="2.0" fromto="-0.13 0 0.1 0.26 0 0.1" name="foot_geom" size="0.06" type="capsule"/> <geom name="foot_geom" size="0.06 0.195" quat="0.707107 0 -0.707107 0" type="capsule" friction="2 0.005 0.0001"/>
</body> </body>
</body> </body>
</body> </body>
</body> </body>
<body name="ball" pos="0 0 1.53"> <body name="ball" pos="0 0 1.53" gravcomp="0">
<joint armature="0" axis="1 0 0" damping="0.0" name="tar:x" pos="0 0 1.53" stiffness="0" type="slide" frictionloss="0" limited="false"/> <joint name="tar:x" pos="0 0 0" axis="1 0 0" limited="false" type="slide" armature="0" damping="0"/>
<joint armature="0" axis="0 1 0" damping="0.0" name="tar:y" pos="0 0 1.53" stiffness="0" type="slide" frictionloss="0" limited="false"/> <joint name="tar:y" pos="0 0 0" axis="0 1 0" limited="false" type="slide" armature="0" damping="0"/>
<joint armature="0" axis="0 0 1" damping="0.0" name="tar:z" pos="0 0 1.53" stiffness="0" type="slide" frictionloss="0" limited="false"/> <joint name="tar:z" pos="0 0 0" axis="0 0 1" limited="false" type="slide" armature="0" damping="0"/>
<geom pos="0 0 1.53" priority= "1" size="0.025 0.025 0.025" type="sphere" condim="4" name="ball_geom" rgba="0.8 0.2 0.1 1" mass="0.1" <geom name="ball_geom" size="0.025" condim="4" priority="1" friction="0.1 0.1 0.1" solref="-10000 -10" solimp="0.9 0.95 0.001 0.5 2" mass="0.1" rgba="0.8 0.2 0.1 1"/>
friction="0.1 0.1 0.1" solimp="0.9 0.95 0.001 0.5 2" solref="-10000 -10"/> <site name="target_ball" pos="0 0 0" size="0.04" rgba="1 0 0 1"/>
<site name="target_ball" pos="0 0 1.53" size="0.04 0.04 0.04" rgba="1 0 0 1" type="sphere"/>
</body> </body>
<body name="basket_ground" pos="5 0 0"> <body name="basket_ground" pos="5 0 0" gravcomp="0">
<geom friction="0.9" fromto="5 0 0 5.3 0 0" name="basket_ground_geom" size="0.1 0.4 0.3" type="box"/> <geom name="basket_ground_geom" size="0.1 0.1 0.15" pos="0.15 0 0" quat="0.707107 0 -0.707107 0" type="box" friction="0.9 0.005 0.0001"/>
<body name="edge1" pos="5 0 0"> <body name="edge1" pos="0 0 0" gravcomp="0">
<geom friction="2.0" fromto="5 0 0 5 0 0.2" name="edge1_geom" size="0.04" type="capsule"/> <geom name="edge1_geom" size="0.04 0.1" pos="0 0 0.1" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body> </body>
<body name="edge2" pos="5 0 0.05"> <body name="edge2" pos="0 0 0.05" gravcomp="0">
<geom friction="2.0" fromto="5 0.05 0 5 0.05 0.2" name="edge2_geom" size="0.04" type="capsule"/> <geom name="edge2_geom" size="0.04 0.1" pos="0 0.05 0.05" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body> </body>
<body name="edge3" pos="5 0 0.1"> <body name="edge3" pos="0 0 0.1" gravcomp="0">
<geom friction="2.0" fromto="5 0.1 0 5 0.1 0.2" name="edge3_geom" size="0.04" type="capsule"/> <geom name="edge3_geom" size="0.04 0.1" pos="0 0.1 0" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body> </body>
<body name="edge4" pos="5 0 0.15"> <body name="edge4" pos="0 0 0.15" gravcomp="0">
<geom friction="2.0" fromto="5 0.15 0 5 0.15 0.2" name="edge4_geom" size="0.04" type="capsule"/> <geom name="edge4_geom" size="0.04 0.1" pos="0 0.15 -0.05" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body> </body>
<body name="edge5" pos="5.05 0 0.15"> <body name="edge5" pos="0.05 0 0.15" gravcomp="0">
<geom friction="2.0" fromto="5.05 0.15 0 5.05 0.15 0.2" name="edge5_geom" size="0.04" type="capsule"/> <geom name="edge5_geom" size="0.04 0.1" pos="0 0.15 -0.05" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body> </body>
<body name="edge6" pos="5.1 0 0.15"> <body name="edge6" pos="0.1 0 0.15" gravcomp="0">
<geom friction="2.0" fromto="5.1 0.15 0 5.1 0.15 0.2" name="edge6_geom" size="0.04" type="capsule"/> <geom name="edge6_geom" size="0.04 0.1" pos="0 0.15 -0.05" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body> </body>
<body name="edge7" pos="5.15 0 0.15"> <body name="edge7" pos="0.15 0 0.15" gravcomp="0">
<geom friction="2.0" fromto="5.15 0.15 0 5.15 0.15 0.2" name="edge7_geom" size="0.04" type="capsule"/> <geom name="edge7_geom" size="0.04 0.1" pos="0 0.15 -0.05" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body> </body>
<body name="edge8" pos="5.2 0 0.15"> <body name="edge8" pos="0.2 0 0.15" gravcomp="0">
<geom friction="2.0" fromto="5.2 0.15 0 5.2 0.15 0.2" name="edge8_geom" size="0.04" type="capsule"/> <geom name="edge8_geom" size="0.04 0.1" pos="0 0.15 -0.05" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body> </body>
<body name="edge9" pos="5.25 0 0.15"> <body name="edge9" pos="0.25 0 0.15" gravcomp="0">
<geom friction="2.0" fromto="5.25 0.15 0 5.25 0.15 0.2" name="edge9_geom" size="0.04" type="capsule"/> <geom name="edge9_geom" size="0.04 0.1" pos="0 0.15 -0.05" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body> </body>
<body name="edge10" pos="5.3 0 0.15"> <body name="edge10" pos="0.3 0 0.15" gravcomp="0">
<geom friction="2.0" fromto="5.3 0.15 0 5.3 0.15 0.2" name="edge10_geom" size="0.04" type="capsule"/> <geom name="edge10_geom" size="0.04 0.1" pos="0 0.15 -0.05" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body> </body>
<body name="edge11" pos="5.3 0 0.1"> <body name="edge11" pos="0.3 0 0.1" gravcomp="0">
<geom friction="2.0" fromto="5.3 0.1 0 5.3 0.1 0.2" name="edge11_geom" size="0.04" type="capsule"/> <geom name="edge11_geom" size="0.04 0.1" pos="0 0.1 0" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body> </body>
<body name="edge12" pos="5.3 0 0.05"> <body name="edge12" pos="0.3 0 0.05" gravcomp="0">
<geom friction="2.0" fromto="5.3 0.05 0 5.3 0.05 0.2" name="edge12_geom" size="0.04" type="capsule"/> <geom name="edge12_geom" size="0.04 0.1" pos="0 0.05 0.05" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body> </body>
<body name="edge13" pos="5.3 0 0.0"> <body name="edge13" pos="0.3 0 0" gravcomp="0">
<geom friction="2.0" fromto="5.3 0 0 5.3 0 0.2" name="edge13_geom" size="0.04" type="capsule"/> <geom name="edge13_geom" size="0.04 0.1" pos="0 0 0.1" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body> </body>
<body name="edge14" pos="5.3 0 -0.05"> <body name="edge14" pos="0.3 0 -0.05" gravcomp="0">
<geom friction="2.0" fromto="5.3 -0.05 0 5.3 -0.05 0.2" name="edge14_geom" size="0.04" type="capsule"/> <geom name="edge14_geom" size="0.04 0.1" pos="0 -0.05 0.15" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body> </body>
<body name="edge15" pos="5.3 0 -0.1"> <body name="edge15" pos="0.3 0 -0.1" gravcomp="0">
<geom friction="2.0" fromto="5.3 -0.1 0 5.3 -0.1 0.2" name="edge15_geom" size="0.04" type="capsule"/> <geom name="edge15_geom" size="0.04 0.1" pos="0 -0.1 0.2" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body> </body>
<body name="edge16" pos="5.3 0 -0.15"> <body name="edge16" pos="0.3 0 -0.15" gravcomp="0">
<geom friction="2.0" fromto="5.3 -0.15 0 5.3 -0.15 0.2" name="edge16_geom" size="0.04" type="capsule"/> <geom name="edge16_geom" size="0.04 0.1" pos="0 -0.15 0.25" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body> </body>
<body name="edge20" pos="0.25 0 -0.15" gravcomp="0">
<body name="edge20" pos="5.25 0 -0.15"> <geom name="edge20_geom" size="0.04 0.1" pos="0 -0.15 0.25" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
<geom friction="2.0" fromto="5.25 -0.15 0 5.25 -0.15 0.2" name="edge20_geom" size="0.04" type="capsule"/> </body>
</body> <body name="edge21" pos="0.2 0 -0.15" gravcomp="0">
<body name="edge21" pos="5.2 0 -0.15"> <geom name="edge21_geom" size="0.04 0.1" pos="0 -0.15 0.25" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
<geom friction="2.0" fromto="5.2 -0.15 0 5.2 -0.15 0.2" name="edge21_geom" size="0.04" type="capsule"/> </body>
</body> <body name="edge22" pos="0.15 0 -0.15" gravcomp="0">
<body name="edge22" pos="5.15 0 -0.15"> <geom name="edge22_geom" size="0.04 0.1" pos="0 -0.15 0.25" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
<geom friction="2.0" fromto="5.15 -0.15 0 5.15 -0.15 0.2" name="edge22_geom" size="0.04" type="capsule"/> </body>
</body> <body name="edge23" pos="0.1 0 -0.15" gravcomp="0">
<body name="edge23" pos="5.1 0 -0.15"> <geom name="edge23_geom" size="0.04 0.1" pos="0 -0.15 0.25" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
<geom friction="2.0" fromto="5.1 -0.15 0 5.1 -0.15 0.2" name="edge23_geom" size="0.04" type="capsule"/> </body>
</body> <body name="edge24" pos="0.05 0 -0.15" gravcomp="0">
<body name="edge24" pos="5.05 0 -0.15"> <geom name="edge24_geom" size="0.04 0.1" pos="0 -0.15 0.25" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
<geom friction="2.0" fromto="5.05 -0.15 0 5.05 -0.15 0.2" name="edge24_geom" size="0.04" type="capsule"/> </body>
</body> <body name="edge25" pos="0 0 -0.15" gravcomp="0">
<body name="edge25" pos="5 0 -0.15"> <geom name="edge25_geom" size="0.04 0.1" pos="0 -0.15 0.25" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
<geom friction="2.0" fromto="5 -0.15 0 5 -0.15 0.2" name="edge25_geom" size="0.04" type="capsule"/> </body>
</body> <body name="edge26" pos="0 0 -0.1" gravcomp="0">
<body name="edge26" pos="5 0 -0.1"> <geom name="edge26_geom" size="0.04 0.1" pos="0 -0.1 0.2" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
<geom friction="2.0" fromto="5 -0.1 0 5 -0.1 0.2" name="edge26_geom" size="0.04" type="capsule"/> </body>
</body> <body name="edge27" pos="0 0 -0.05" gravcomp="0">
<body name="edge27" pos="5 0 -0.05"> <geom name="edge27_geom" size="0.04 0.1" pos="0 -0.05 0.15" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
<geom friction="2.0" fromto="5 -0.05 0 5 -0.05 0.2" name="edge27_geom" size="0.04" type="capsule"/> </body>
</body>
</body> </body>
</worldbody> </worldbody>
<actuator> <actuator>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="thigh_joint"/> <general joint="thigh_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="leg_joint"/> <general joint="leg_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="foot_joint"/> <general joint="foot_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
</actuator> </actuator>
<asset> </mujoco>
<texture type="skybox" builtin="gradient" rgb1=".4 .5 .6" rgb2="0 0 0"
width="100" height="100"/>
<texture builtin="flat" height="1278" mark="cross" markrgb="1 1 1" name="texgeom" random="0.01" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" type="cube" width="127"/>
<texture builtin="checker" height="100" name="texplane" rgb1="0 0 0" rgb2="0.8 0.8 0.8" type="2d" width="100"/>
<material name="MatPlane" reflectance="0.5" shininess="1" specular="1" texrepeat="60 60" texture="texplane"/>
<material name="geom" texture="texgeom" texuniform="true"/>
</asset>
</mujoco>
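As in the model above, capsules that were previously given via `fromto` endpoints are rewritten as a radius/half-length pair plus an orientation quaternion. A quick sanity check of that conversion for the foot geom (not part of the commit):

```python
import numpy as np

# Old: fromto="-0.13 0 0.1  0.26 0 0.1" with size="0.06" (radius only).
start, end = np.array([-0.13, 0.0, 0.1]), np.array([0.26, 0.0, 0.1])
half_length = np.linalg.norm(end - start) / 2
print(half_length)  # -> 0.195, matching the new size="0.06 0.195"

# quat="0.707107 0 -0.707107 0" is a -90 degree rotation about y,
# which aligns the capsule's long axis (default z) with the x axis.
```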


@ -1,13 +1,15 @@
import os import os
from typing import Optional from typing import Optional, Any, Dict, Tuple
import numpy as np import numpy as np
from gym.envs.mujoco.hopper_v4 import HopperEnv from gymnasium.core import ObsType
from fancy_gym.envs.mujoco.hopper_jump.hopper_jump import HopperEnvCustomXML
from gymnasium import spaces
MAX_EPISODE_STEPS_HOPPERTHROW = 250 MAX_EPISODE_STEPS_HOPPERTHROW = 250
class HopperThrowEnv(HopperEnv): class HopperThrowEnv(HopperEnvCustomXML):
""" """
Initialization changes to normal Hopper: Initialization changes to normal Hopper:
- healthy_reward: 1.0 -> 0.0 -> 0.1 - healthy_reward: 1.0 -> 0.0 -> 0.1
@ -36,6 +38,16 @@ class HopperThrowEnv(HopperEnv):
self.max_episode_steps = max_episode_steps self.max_episode_steps = max_episode_steps
self.context = context self.context = context
self.goal = 0 self.goal = 0
if not hasattr(self, 'observation_space'):
self.observation_space = spaces.Box(
low=-np.inf, high=np.inf, shape=(18,), dtype=np.float64
)
else:
self.observation_space = spaces.Box(
low=-np.inf, high=np.inf, shape=(19,), dtype=np.float64
)
super().__init__(xml_file=xml_file, super().__init__(xml_file=xml_file,
forward_reward_weight=forward_reward_weight, forward_reward_weight=forward_reward_weight,
ctrl_cost_weight=ctrl_cost_weight, ctrl_cost_weight=ctrl_cost_weight,
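The explicit `(18,)`/`(19,)` observation shapes follow from the model shown above, assuming the throw task uses the ball variant of the hopper XML: six hopper joints plus the three `tar:x/y/z` ball slides, with the goal appended by `_get_obs`. A back-of-the-envelope check (not part of the commit):

```python
n_qpos = 6 + 3  # hopper joints + ball slide joints (tar:x, tar:y, tar:z)
n_qvel = 6 + 3
goal_dim = 1    # appended in _get_obs

print((n_qpos - 1) + n_qvel + goal_dim)  # 18, x position excluded
print(n_qpos + n_qvel + goal_dim)        # 19, x position included
```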
@ -56,14 +68,14 @@ class HopperThrowEnv(HopperEnv):
# done = self.done TODO We should use this, not sure why there is no other termination; ball_landed should be enough, because we only look at the throw itself? - Paul and Marc # done = self.done TODO We should use this, not sure why there is no other termination; ball_landed should be enough, because we only look at the throw itself? - Paul and Marc
ball_landed = bool(self.get_body_com("ball")[2] <= 0.05) ball_landed = bool(self.get_body_com("ball")[2] <= 0.05)
done = ball_landed terminated = ball_landed
ctrl_cost = self.control_cost(action) ctrl_cost = self.control_cost(action)
costs = ctrl_cost costs = ctrl_cost
rewards = 0 rewards = 0
if self.current_step >= self.max_episode_steps or done: if self.current_step >= self.max_episode_steps or terminated:
distance_reward = -np.linalg.norm(ball_pos_after - self.goal) if self.context else \ distance_reward = -np.linalg.norm(ball_pos_after - self.goal) if self.context else \
self._forward_reward_weight * ball_pos_after self._forward_reward_weight * ball_pos_after
healthy_reward = 0 if self.context else self.healthy_reward * self.current_step healthy_reward = 0 if self.context else self.healthy_reward * self.current_step
@ -78,16 +90,19 @@ class HopperThrowEnv(HopperEnv):
'_steps': self.current_step, '_steps': self.current_step,
'goal': self.goal, 'goal': self.goal,
} }
truncated = False
return observation, reward, done, info return observation, reward, terminated, truncated, info
def _get_obs(self): def _get_obs(self):
return np.append(super()._get_obs(), self.goal) return np.append(super()._get_obs(), self.goal)
def reset(self, *, seed: Optional[int] = None, return_info: bool = False, options: Optional[dict] = None): def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
-> Tuple[ObsType, Dict[str, Any]]:
self.current_step = 0 self.current_step = 0
ret = super().reset(seed=seed, options=options)
self.goal = self.goal = self.np_random.uniform(2.0, 6.0, 1) # 0.5 8.0 self.goal = self.goal = self.np_random.uniform(2.0, 6.0, 1) # 0.5 8.0
return super().reset() return ret
# overwrite reset_model to make it deterministic # overwrite reset_model to make it deterministic
def reset_model(self): def reset_model(self):
@ -101,22 +116,3 @@ class HopperThrowEnv(HopperEnv):
observation = self._get_obs() observation = self._get_obs()
return observation return observation
if __name__ == '__main__':
render_mode = "human" # "human" or "partial" or "final"
env = HopperThrowEnv()
obs = env.reset()
for i in range(2000):
# objective.load_result("/tmp/cma")
# test with random actions
ac = env.action_space.sample()
obs, rew, d, info = env.step(ac)
if i % 10 == 0:
env.render(mode=render_mode)
if d:
print('After ', i, ' steps, done: ', d)
env.reset()
env.close()


@ -1,13 +1,16 @@
import os import os
from typing import Optional from typing import Optional, Any, Dict, Tuple
import numpy as np import numpy as np
from gym.envs.mujoco.hopper_v4 import HopperEnv from fancy_gym.envs.mujoco.hopper_jump.hopper_jump import HopperEnvCustomXML
from gymnasium.core import ObsType
from gymnasium import spaces
MAX_EPISODE_STEPS_HOPPERTHROWINBASKET = 250 MAX_EPISODE_STEPS_HOPPERTHROWINBASKET = 250
class HopperThrowInBasketEnv(HopperEnv): class HopperThrowInBasketEnv(HopperEnvCustomXML):
""" """
Initialization changes to normal Hopper: Initialization changes to normal Hopper:
- healthy_reward: 1.0 -> 0.0 - healthy_reward: 1.0 -> 0.0
@ -42,6 +45,16 @@ class HopperThrowInBasketEnv(HopperEnv):
self.context = context self.context = context
self.penalty = penalty self.penalty = penalty
self.basket_x = 5 self.basket_x = 5
if exclude_current_positions_from_observation:
self.observation_space = spaces.Box(
low=-np.inf, high=np.inf, shape=(18,), dtype=np.float64
)
else:
self.observation_space = spaces.Box(
low=-np.inf, high=np.inf, shape=(19,), dtype=np.float64
)
xml_file = os.path.join(os.path.dirname(__file__), "assets", xml_file) xml_file = os.path.join(os.path.dirname(__file__), "assets", xml_file)
super().__init__(xml_file=xml_file, super().__init__(xml_file=xml_file,
forward_reward_weight=forward_reward_weight, forward_reward_weight=forward_reward_weight,
@ -65,14 +78,14 @@ class HopperThrowInBasketEnv(HopperEnv):
is_in_basket_x = ball_pos[0] >= basket_pos[0] and ball_pos[0] <= basket_pos[0] + self.basket_size is_in_basket_x = ball_pos[0] >= basket_pos[0] and ball_pos[0] <= basket_pos[0] + self.basket_size
is_in_basket_y = ball_pos[1] >= basket_pos[1] - (self.basket_size / 2) and ball_pos[1] <= basket_pos[1] + ( is_in_basket_y = ball_pos[1] >= basket_pos[1] - (self.basket_size / 2) and ball_pos[1] <= basket_pos[1] + (
self.basket_size / 2) self.basket_size / 2)
is_in_basket_z = ball_pos[2] < 0.1 is_in_basket_z = ball_pos[2] < 0.1
is_in_basket = is_in_basket_x and is_in_basket_y and is_in_basket_z is_in_basket = is_in_basket_x and is_in_basket_y and is_in_basket_z
if is_in_basket: if is_in_basket:
self.ball_in_basket = True self.ball_in_basket = True
ball_landed = self.get_body_com("ball")[2] <= 0.05 ball_landed = self.get_body_com("ball")[2] <= 0.05
done = bool(ball_landed or is_in_basket) terminated = bool(ball_landed or is_in_basket)
rewards = 0 rewards = 0
@ -80,7 +93,7 @@ class HopperThrowInBasketEnv(HopperEnv):
costs = ctrl_cost costs = ctrl_cost
if self.current_step >= self.max_episode_steps or done: if self.current_step >= self.max_episode_steps or terminated:
if is_in_basket: if is_in_basket:
if not self.context: if not self.context:
@ -101,23 +114,27 @@ class HopperThrowInBasketEnv(HopperEnv):
info = { info = {
'ball_pos': ball_pos[0], 'ball_pos': ball_pos[0],
} }
truncated = False
return observation, reward, done, info return observation, reward, terminated, truncated, info
def _get_obs(self): def _get_obs(self):
return np.append(super()._get_obs(), self.basket_x) return np.append(super()._get_obs(), self.basket_x)
def reset(self, *, seed: Optional[int] = None, return_info: bool = False, options: Optional[dict] = None): def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
-> Tuple[ObsType, Dict[str, Any]]:
if self.max_episode_steps == 10: if self.max_episode_steps == 10:
# We have to initialize this here, because the spec is only added after creating the env. # We have to initialize this here, because the spec is only added after creating the env.
self.max_episode_steps = self.spec.max_episode_steps self.max_episode_steps = self.spec.max_episode_steps
self.current_step = 0 self.current_step = 0
self.ball_in_basket = False self.ball_in_basket = False
ret = super().reset(seed=seed, options=options)
if self.context: if self.context:
self.basket_x = self.np_random.uniform(low=3, high=7, size=1) self.basket_x = self.np_random.uniform(low=3, high=7, size=1)
self.model.body("basket_ground").pos[:] = [self.basket_x[0], 0, 0] self.model.body("basket_ground").pos[:] = [self.basket_x[0], 0, 0]
return super().reset() return ret
# overwrite reset_model to make it deterministic # overwrite reset_model to make it deterministic
def reset_model(self): def reset_model(self):
@ -132,22 +149,3 @@ class HopperThrowInBasketEnv(HopperEnv):
observation = self._get_obs() observation = self._get_obs()
return observation return observation
if __name__ == '__main__':
render_mode = "human" # "human" or "partial" or "final"
env = HopperThrowInBasketEnv()
obs = env.reset()
for i in range(2000):
# objective.load_result("/tmp/cma")
# test with random actions
ac = env.action_space.sample()
obs, rew, d, info = env.step(ac)
if i % 10 == 0:
env.render(mode=render_mode)
if d:
print('After ', i, ' steps, done: ', d)
env.reset()
env.close()


@ -6,6 +6,11 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper): class MPWrapper(RawInterfaceWrapper):
mp_config = {
'ProMP': {},
'DMP': {},
'ProDMP': {},
}
@property @property
def context_mask(self): def context_mask(self):


@ -7,6 +7,16 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper): class MPWrapper(RawInterfaceWrapper):
mp_config = {
'ProMP': {},
'DMP': {
'phase_generator_kwargs': {
'alpha_phase': 2,
},
},
'ProDMP': {},
}
@property @property
def context_mask(self): def context_mask(self):
return np.concatenate([[False] * self.n_links, # cos return np.concatenate([[False] * self.n_links, # cos


@ -1,8 +1,9 @@
import os import os
import numpy as np import numpy as np
from gym import utils from gymnasium import utils
from gym.envs.mujoco import MujocoEnv from gymnasium.envs.mujoco import MujocoEnv
from gymnasium.spaces import Box
MAX_EPISODE_STEPS_REACHER = 200 MAX_EPISODE_STEPS_REACHER = 200
@ -12,7 +13,17 @@ class ReacherEnv(MujocoEnv, utils.EzPickle):
More general version of the gym mujoco Reacher environment More general version of the gym mujoco Reacher environment
""" """
def __init__(self, sparse: bool = False, n_links: int = 5, reward_weight: float = 1, ctrl_cost_weight: float = 1): metadata = {
"render_modes": [
"human",
"rgb_array",
"depth_array",
],
"render_fps": 50,
}
def __init__(self, sparse: bool = False, n_links: int = 5, reward_weight: float = 1, ctrl_cost_weight: float = 1.,
**kwargs):
utils.EzPickle.__init__(**locals()) utils.EzPickle.__init__(**locals())
self._steps = 0 self._steps = 0
@ -25,10 +36,16 @@ class ReacherEnv(MujocoEnv, utils.EzPickle):
file_name = f'reacher_{n_links}links.xml' file_name = f'reacher_{n_links}links.xml'
# sin, cos, velocity * n_Links + goal position (2) and goal distance (3)
shape = (self.n_links * 3 + 5,)
observation_space = Box(low=-np.inf, high=np.inf, shape=shape, dtype=np.float64)
MujocoEnv.__init__(self, MujocoEnv.__init__(self,
model_path=os.path.join(os.path.dirname(__file__), "assets", file_name), model_path=os.path.join(os.path.dirname(__file__), "assets", file_name),
frame_skip=2, frame_skip=2,
mujoco_bindings="mujoco") observation_space=observation_space,
**kwargs
)
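The reacher observation is now sized explicitly from the link count (sin, cos and joint velocity per link, plus the 2D goal position and the 3D fingertip-goal distance), and the resulting `Box` is handed to `MujocoEnv.__init__` instead of the removed `mujoco_bindings` argument. For the default 5-link reacher this gives (not part of the commit):

```python
n_links = 5
obs_dim = n_links * 3 + 5  # sin + cos + qvel per link, goal xy (2), goal distance (3)
print(obs_dim)             # -> 20
```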
def step(self, action): def step(self, action):
self._steps += 1 self._steps += 1
@ -45,10 +62,14 @@ class ReacherEnv(MujocoEnv, utils.EzPickle):
reward = reward_dist + reward_ctrl + angular_vel reward = reward_dist + reward_ctrl + angular_vel
self.do_simulation(action, self.frame_skip) self.do_simulation(action, self.frame_skip)
ob = self._get_obs() if self.render_mode == "human":
done = False self.render()
infos = dict( ob = self._get_obs()
terminated = False
truncated = False
info = dict(
reward_dist=reward_dist, reward_dist=reward_dist,
reward_ctrl=reward_ctrl, reward_ctrl=reward_ctrl,
velocity=angular_vel, velocity=angular_vel,
@ -56,7 +77,7 @@ class ReacherEnv(MujocoEnv, utils.EzPickle):
goal=self.goal if hasattr(self, "goal") else None goal=self.goal if hasattr(self, "goal") else None
) )
return ob, reward, done, infos return ob, reward, terminated, truncated, info
def distance_reward(self): def distance_reward(self):
vec = self.get_body_com("fingertip") - self.get_body_com("target") vec = self.get_body_com("fingertip") - self.get_body_com("target")
@ -66,6 +87,7 @@ class ReacherEnv(MujocoEnv, utils.EzPickle):
return -10 * np.square(self.data.qvel.flat[:self.n_links]).sum() if self.sparse else 0.0 return -10 * np.square(self.data.qvel.flat[:self.n_links]).sum() if self.sparse else 0.0
def viewer_setup(self): def viewer_setup(self):
assert self.viewer is not None
self.viewer.cam.trackbodyid = 0 self.viewer.cam.trackbodyid = 0
def reset_model(self): def reset_model(self):


@ -7,6 +7,53 @@ from fancy_gym.envs.mujoco.table_tennis.table_tennis_utils import jnt_pos_low, j
class TT_MPWrapper(RawInterfaceWrapper): class TT_MPWrapper(RawInterfaceWrapper):
mp_config = {
'ProMP': {
'phase_generator_kwargs': {
'learn_tau': False,
'learn_delay': False,
'tau_bound': [0.8, 1.5],
'delay_bound': [0.05, 0.15],
},
'controller_kwargs': {
'p_gains': 0.5 * np.array([1.0, 4.0, 2.0, 4.0, 1.0, 4.0, 1.0]),
'd_gains': 0.5 * np.array([0.1, 0.4, 0.2, 0.4, 0.1, 0.4, 0.1]),
},
'basis_generator_kwargs': {
'num_basis': 3,
'num_basis_zero_start': 1,
'num_basis_zero_goal': 1,
},
'black_box_kwargs': {
'verbose': 2,
},
},
'DMP': {},
'ProDMP': {
'phase_generator_kwargs': {
'learn_tau': True,
'learn_delay': True,
'tau_bound': [0.8, 1.5],
'delay_bound': [0.05, 0.15],
'alpha_phase': 3,
},
'controller_kwargs': {
'p_gains': 0.5 * np.array([1.0, 4.0, 2.0, 4.0, 1.0, 4.0, 1.0]),
'd_gains': 0.5 * np.array([0.1, 0.4, 0.2, 0.4, 0.1, 0.4, 0.1]),
},
'basis_generator_kwargs': {
'num_basis': 3,
'alpha': 25,
'basis_bandwidth_factor': 3,
},
'trajectory_generator_kwargs': {
'weights_scale': 0.7,
'auto_scale_basis': True,
'relative_goal': True,
'disable_goal': True,
},
},
}
# Random x goal + random init pos # Random x goal + random init pos
@property @property
@ -16,7 +63,7 @@ class TT_MPWrapper(RawInterfaceWrapper):
[False] * 7, # joints velocity [False] * 7, # joints velocity
[True] * 2, # position ball x, y [True] * 2, # position ball x, y
[False] * 1, # position ball z [False] * 1, # position ball z
#[True] * 3, # velocity ball x, y, z # [True] * 3, # velocity ball x, y, z
[True] * 2, # target landing position [True] * 2, # target landing position
# [True] * 1, # time # [True] * 1, # time
]) ])
@ -40,7 +87,42 @@ class TT_MPWrapper(RawInterfaceWrapper):
return_contextual_obs: bool, tau_bound:list, delay_bound:list) -> Tuple[np.ndarray, float, bool, dict]: return_contextual_obs: bool, tau_bound:list, delay_bound:list) -> Tuple[np.ndarray, float, bool, dict]:
return self.get_invalid_traj_step_return(action, pos_traj, return_contextual_obs, tau_bound, delay_bound) return self.get_invalid_traj_step_return(action, pos_traj, return_contextual_obs, tau_bound, delay_bound)
class TT_MPWrapper_Replan(TT_MPWrapper):
mp_config = {
'ProMP': {},
'DMP': {},
'ProDMP': {
'phase_generator_kwargs': {
'learn_tau': True,
'learn_delay': True,
'tau_bound': [0.8, 1.5],
'delay_bound': [0.05, 0.15],
'alpha_phase': 3,
},
'controller_kwargs': {
'p_gains': 0.5 * np.array([1.0, 4.0, 2.0, 4.0, 1.0, 4.0, 1.0]),
'd_gains': 0.5 * np.array([0.1, 0.4, 0.2, 0.4, 0.1, 0.4, 0.1]),
},
'basis_generator_kwargs': {
'num_basis': 2,
'alpha': 25,
'basis_bandwidth_factor': 3,
},
'trajectory_generator_kwargs': {
'auto_scale_basis': True,
'goal_offset': 1.0,
},
'black_box_kwargs': {
'max_planning_times': 3,
'replanning_schedule': lambda pos, vel, obs, action, t: t % 50 == 0,
},
},
}
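The replanning variant keeps the ProDMP setup but adds a schedule: `t % 50 == 0` marks the steps at which a new plan may be generated, and `max_planning_times` caps how many plans an episode may use. A rough illustration, assuming the black-box wrapper consults the schedule at every step and stops replanning once the cap is reached (not part of the commit):

```python
# Steps at which the schedule alone would trigger a new plan in a 250-step episode:
fire_steps = [t for t in range(250) if t % 50 == 0]
print(fire_steps)  # [0, 50, 100, 150, 200] -- capped to 3 plans by max_planning_times
```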
class TTVelObs_MPWrapper(TT_MPWrapper): class TTVelObs_MPWrapper(TT_MPWrapper):
# Will inherit mp_config from TT_MPWrapper
@property @property
def context_mask(self): def context_mask(self):
@ -52,4 +134,20 @@ class TTVelObs_MPWrapper(TT_MPWrapper):
[True] * 3, # velocity ball x, y, z [True] * 3, # velocity ball x, y, z
[True] * 2, # target landing position [True] * 2, # target landing position
# [True] * 1, # time # [True] * 1, # time
]) ])
class TTVelObs_MPWrapper_Replan(TT_MPWrapper_Replan):
# Will inherit mp_config from TT_MPWrapper_Replan
@property
def context_mask(self):
return np.hstack([
[False] * 7, # joints position
[False] * 7, # joints velocity
[True] * 2, # position ball x, y
[False] * 1, # position ball z
[True] * 3, # velocity ball x, y, z
[True] * 2, # target landing position
# [True] * 1, # time
])
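These context masks line up with the observation sizes declared in the environment file below: 7 joint positions, 7 joint velocities, the ball position (x, y and z), the 2D target landing position, and optionally the ball velocity. A quick count (not part of the commit):

```python
import numpy as np

base_mask = np.hstack([[False] * 7, [False] * 7, [True] * 2, [False] * 1, [True] * 2])
vel_mask = np.hstack([[False] * 7, [False] * 7, [True] * 2, [False] * 1, [True] * 3, [True] * 2])
print(base_mask.size, vel_mask.size)  # -> 19 22, matching the Box shapes (19,) and (22,)
```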


@ -1,8 +1,8 @@
import os import os
import numpy as np import numpy as np
from gym import utils, spaces from gymnasium import utils, spaces
from gym.envs.mujoco import MujocoEnv from gymnasium.envs.mujoco import MujocoEnv
from fancy_gym.envs.mujoco.table_tennis.table_tennis_utils import is_init_state_valid, magnus_force from fancy_gym.envs.mujoco.table_tennis.table_tennis_utils import is_init_state_valid, magnus_force
from fancy_gym.envs.mujoco.table_tennis.table_tennis_utils import jnt_pos_low, jnt_pos_high from fancy_gym.envs.mujoco.table_tennis.table_tennis_utils import jnt_pos_low, jnt_pos_high
@ -22,6 +22,16 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
""" """
7 DoF table tennis environment 7 DoF table tennis environment
""" """
metadata = {
"render_modes": [
"human",
"rgb_array",
"depth_array",
],
"render_fps": 125
}
def __init__(self, ctxt_dim: int = 4, frame_skip: int = 4, def __init__(self, ctxt_dim: int = 4, frame_skip: int = 4,
goal_switching_step: int = None, goal_switching_step: int = None,
enable_artificial_wind: bool = False): enable_artificial_wind: bool = False):
@ -50,11 +60,16 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
self._artificial_force = 0. self._artificial_force = 0.
if not hasattr(self, 'observation_space'):
self.observation_space = spaces.Box(
low=-np.inf, high=np.inf, shape=(19,), dtype=np.float64
)
MujocoEnv.__init__(self, MujocoEnv.__init__(self,
model_path=os.path.join(os.path.dirname(__file__), "assets", "xml", "table_tennis_env.xml"), model_path=os.path.join(os.path.dirname(__file__), "assets", "xml", "table_tennis_env.xml"),
frame_skip=frame_skip, frame_skip=frame_skip,
mujoco_bindings="mujoco") observation_space=self.observation_space)
if ctxt_dim == 2: if ctxt_dim == 2:
self.context_bounds = CONTEXT_BOUNDS_2DIMS self.context_bounds = CONTEXT_BOUNDS_2DIMS
elif ctxt_dim == 4: elif ctxt_dim == 4:
@ -83,11 +98,11 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
unstable_simulation = False unstable_simulation = False
if self._steps == self._goal_switching_step and self.np_random.uniform() < 0.5: if self._steps == self._goal_switching_step and self.np_random.uniform() < 0.5:
new_goal_pos = self._generate_goal_pos(random=True) new_goal_pos = self._generate_goal_pos(random=True)
new_goal_pos[1] = -new_goal_pos[1] new_goal_pos[1] = -new_goal_pos[1]
self._goal_pos = new_goal_pos self._goal_pos = new_goal_pos
self.model.body_pos[5] = np.concatenate([self._goal_pos, [0.77]]) self.model.body_pos[5] = np.concatenate([self._goal_pos, [0.77]])
mujoco.mj_forward(self.model, self.data) mujoco.mj_forward(self.model, self.data)
for _ in range(self.frame_skip): for _ in range(self.frame_skip):
if self._enable_artificial_wind: if self._enable_artificial_wind:
@ -102,7 +117,7 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
if not self._hit_ball: if not self._hit_ball:
self._hit_ball = self._contact_checker(self._ball_contact_id, self._bat_front_id) or \ self._hit_ball = self._contact_checker(self._ball_contact_id, self._bat_front_id) or \
self._contact_checker(self._ball_contact_id, self._bat_back_id) self._contact_checker(self._ball_contact_id, self._bat_back_id)
if not self._hit_ball: if not self._hit_ball:
ball_land_on_floor_no_hit = self._contact_checker(self._ball_contact_id, self._floor_contact_id) ball_land_on_floor_no_hit = self._contact_checker(self._ball_contact_id, self._floor_contact_id)
if ball_land_on_floor_no_hit: if ball_land_on_floor_no_hit:
@ -130,9 +145,9 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
reward = -25 if unstable_simulation else self._get_reward(self._terminated) reward = -25 if unstable_simulation else self._get_reward(self._terminated)
land_dist_err = np.linalg.norm(self._ball_landing_pos[:-1] - self._goal_pos) \ land_dist_err = np.linalg.norm(self._ball_landing_pos[:-1] - self._goal_pos) \
if self._ball_landing_pos is not None else 10. if self._ball_landing_pos is not None else 10.
return self._get_obs(), reward, self._terminated, { info = {
"hit_ball": self._hit_ball, "hit_ball": self._hit_ball,
"ball_returned_success": self._ball_return_success, "ball_returned_success": self._ball_return_success,
"land_dist_error": land_dist_err, "land_dist_error": land_dist_err,
@ -140,6 +155,10 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
"num_steps": self._steps, "num_steps": self._steps,
} }
terminated, truncated = self._terminated, False
return self._get_obs(), reward, terminated, truncated, info
def _contact_checker(self, id_1, id_2): def _contact_checker(self, id_1, id_2):
for coni in range(0, self.data.ncon): for coni in range(0, self.data.ncon):
con = self.data.contact[coni] con = self.data.contact[coni]
@ -202,7 +221,7 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
if not self._hit_ball: if not self._hit_ball:
return 0.2 * (1 - np.tanh(min_r_b_dist**2)) return 0.2 * (1 - np.tanh(min_r_b_dist**2))
if self._ball_landing_pos is None: if self._ball_landing_pos is None:
min_b_des_b_dist = np.min(np.linalg.norm(np.array(self._ball_traj)[:,:2] - self._goal_pos[:2], axis=1)) min_b_des_b_dist = np.min(np.linalg.norm(np.array(self._ball_traj)[:, :2] - self._goal_pos[:2], axis=1))
return 2 * (1 - np.tanh(min_r_b_dist ** 2)) + (1 - np.tanh(min_b_des_b_dist**2)) return 2 * (1 - np.tanh(min_r_b_dist ** 2)) + (1 - np.tanh(min_b_des_b_dist**2))
min_b_des_b_land_dist = np.linalg.norm(self._goal_pos[:2] - self._ball_landing_pos[:2]) min_b_des_b_land_dist = np.linalg.norm(self._goal_pos[:2] - self._ball_landing_pos[:2])
over_net_bonus = int(self._ball_landing_pos[0] < 0) over_net_bonus = int(self._ball_landing_pos[0] < 0)
@ -231,13 +250,13 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
violate_high_bound_error = np.mean(np.maximum(pos_traj - jnt_pos_high, 0)) violate_high_bound_error = np.mean(np.maximum(pos_traj - jnt_pos_high, 0))
violate_low_bound_error = np.mean(np.maximum(jnt_pos_low - pos_traj, 0)) violate_low_bound_error = np.mean(np.maximum(jnt_pos_low - pos_traj, 0))
invalid_penalty = tau_invalid_penalty + delay_invalid_penalty + \ invalid_penalty = tau_invalid_penalty + delay_invalid_penalty + \
violate_high_bound_error + violate_low_bound_error violate_high_bound_error + violate_low_bound_error
return -invalid_penalty return -invalid_penalty
def get_invalid_traj_step_return(self, action, pos_traj, contextual_obs, tau_bound, delay_bound): def get_invalid_traj_step_return(self, action, pos_traj, contextual_obs, tau_bound, delay_bound):
obs = self._get_obs() if contextual_obs else np.concatenate([self._get_obs(), np.array([0])]) # 0 for invalid traj obs = self._get_obs() if contextual_obs else np.concatenate([self._get_obs(), np.array([0])]) # 0 for invalid traj
penalty = self._get_traj_invalid_penalty(action, pos_traj, tau_bound, delay_bound) penalty = self._get_traj_invalid_penalty(action, pos_traj, tau_bound, delay_bound)
return obs, penalty, True, { return obs, penalty, True, False, {
"hit_ball": [False], "hit_ball": [False],
"ball_returned_success": [False], "ball_returned_success": [False],
"land_dist_error": [10.], "land_dist_error": [10.],
@ -249,7 +268,7 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
@staticmethod @staticmethod
def check_traj_validity(action, pos_traj, vel_traj, tau_bound, delay_bound): def check_traj_validity(action, pos_traj, vel_traj, tau_bound, delay_bound):
time_invalid = action[0] > tau_bound[1] or action[0] < tau_bound[0] \ time_invalid = action[0] > tau_bound[1] or action[0] < tau_bound[0] \
or action[1] > delay_bound[1] or action[1] < delay_bound[0] or action[1] > delay_bound[1] or action[1] < delay_bound[0]
if time_invalid or np.any(pos_traj > jnt_pos_high) or np.any(pos_traj < jnt_pos_low): if time_invalid or np.any(pos_traj > jnt_pos_high) or np.any(pos_traj < jnt_pos_low):
return False, pos_traj, vel_traj return False, pos_traj, vel_traj
return True, pos_traj, vel_traj return True, pos_traj, vel_traj
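To make the validity check concrete: an MP action whose learned `tau` (or `delay`) falls outside its bound is rejected before any simulation, and the wrapper falls back to `get_invalid_traj_step_return`. A minimal sketch of the time check with a hypothetical action head (not part of the commit):

```python
import numpy as np

tau_bound, delay_bound = [0.8, 1.5], [0.05, 0.15]  # bounds from the ProDMP config above
action = np.array([2.0, 0.10])                     # hypothetical (tau, delay) head of an MP action

# Mirrors the time check in check_traj_validity: tau above its upper bound invalidates the trajectory.
time_invalid = (action[0] > tau_bound[1] or action[0] < tau_bound[0]
                or action[1] > delay_bound[1] or action[1] < delay_bound[0])
print(time_invalid)  # -> True: the invalid-trajectory penalty path is taken instead of a rollout
```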
@ -257,6 +276,9 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
class TableTennisWind(TableTennisEnv): class TableTennisWind(TableTennisEnv):
def __init__(self, ctxt_dim: int = 4, frame_skip: int = 4): def __init__(self, ctxt_dim: int = 4, frame_skip: int = 4):
self.observation_space = spaces.Box(
low=-np.inf, high=np.inf, shape=(22,), dtype=np.float64
)
super().__init__(ctxt_dim=ctxt_dim, frame_skip=frame_skip, enable_artificial_wind=True) super().__init__(ctxt_dim=ctxt_dim, frame_skip=frame_skip, enable_artificial_wind=True)
def _get_obs(self): def _get_obs(self):


@ -1,64 +1,60 @@
<mujoco model="walker2d"> <mujoco model="walker2d">
<compiler angle="degree" coordinate="global" inertiafromgeom="true"/> <compiler angle="radian" autolimits="true"/>
<default> <option integrator="RK4"/>
<joint armature="0.01" damping=".1" limited="true"/> <default class="main">
<geom conaffinity="0" condim="3" contype="1" density="1000" friction=".7 .1 .1" rgba="0.8 0.6 .4 1"/> <joint limited="true" armature="0.01" damping="0.1"/>
<geom conaffinity="0" friction="0.7 0.1 0.1" rgba="0.8 0.6 0.4 1"/>
</default> </default>
<option integrator="RK4" timestep="0.002"/> <asset>
<texture type="skybox" builtin="gradient" rgb1="0.4 0.5 0.6" rgb2="0 0 0" width="100" height="600"/>
<texture type="cube" name="texgeom" builtin="flat" mark="cross" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" markrgb="1 1 1" width="127" height="762"/>
<texture type="2d" name="texplane" builtin="checker" rgb1="0 0 0" rgb2="0.8 0.8 0.8" width="100" height="100"/>
<material name="MatPlane" texture="texplane" texrepeat="60 60" specular="1" shininess="1" reflectance="0.5"/>
<material name="geom" texture="texgeom" texuniform="true"/>
</asset>
<worldbody> <worldbody>
<light cutoff="100" diffuse="1 1 1" dir="-0 0 -1.3" directional="true" exponent="1" pos="0 0 1.3" specular=".1 .1 .1"/> <geom name="floor" size="40 40 40" type="plane" conaffinity="1" material="MatPlane" rgba="0.8 0.9 0.8 1"/>
<geom conaffinity="1" condim="3" name="floor" pos="0 0 0" rgba="0.8 0.9 0.8 1" size="40 40 40" type="plane" material="MatPlane"/> <light pos="0 0 1.3" dir="0 0 -1" directional="true" cutoff="100" exponent="1" diffuse="1 1 1" specular="0.1 0.1 0.1"/>
<body name="torso" pos="0 0 1.25"> <body name="torso" pos="0 0 1.25" gravcomp="0">
<camera name="track" mode="trackcom" pos="0 -3 1" xyaxes="1 0 0 0 0 1"/> <joint name="rootx" pos="0 0 -1.25" axis="1 0 0" limited="false" type="slide" armature="0" damping="0"/>
<joint armature="0" axis="1 0 0" damping="0" limited="false" name="rootx" pos="0 0 0" stiffness="0" type="slide"/> <joint name="rootz" pos="0 0 -1.25" axis="0 0 1" limited="false" type="slide" ref="1.25" armature="0" damping="0"/>
<joint armature="0" axis="0 0 1" damping="0" limited="false" name="rootz" pos="0 0 0" ref="1.25" stiffness="0" type="slide"/> <joint name="rooty" pos="0 0 0" axis="0 1 0" limited="false" armature="0" damping="0"/>
<joint armature="0" axis="0 1 0" damping="0" limited="false" name="rooty" pos="0 0 1.25" stiffness="0" type="hinge"/> <geom name="torso_geom" size="0.05 0.2" type="capsule" friction="0.9 0.1 0.1"/>
<geom friction="0.9" fromto="0 0 1.45 0 0 1.05" name="torso_geom" size="0.05" type="capsule"/> <camera name="track" pos="0 -3 -0.25" quat="0.707107 0.707107 0 0" mode="trackcom"/>
<body name="thigh" pos="0 0 1.05"> <body name="thigh" pos="0 0 -0.2" gravcomp="0">
<joint axis="0 -1 0" name="thigh_joint" pos="0 0 1.05" range="-150 0" type="hinge"/> <joint name="thigh_joint" pos="0 0 0" axis="0 -1 0" range="-2.61799 0"/>
<geom friction="0.9" fromto="0 0 1.05 0 0 0.6" name="thigh_geom" size="0.05" type="capsule"/> <geom name="thigh_geom" size="0.05 0.225" pos="0 0 -0.225" type="capsule" friction="0.9 0.1 0.1"/>
<body name="leg" pos="0 0 0.35"> <body name="leg" pos="0 0 -0.7" gravcomp="0">
<joint axis="0 -1 0" name="leg_joint" pos="0 0 0.6" range="-150 0" type="hinge"/> <joint name="leg_joint" pos="0 0 0.25" axis="0 -1 0" range="-2.61799 0"/>
<geom friction="0.9" fromto="0 0 0.6 0 0 0.1" name="leg_geom" size="0.04" type="capsule"/> <geom name="leg_geom" size="0.04 0.25" type="capsule" friction="0.9 0.1 0.1"/>
<body name="foot" pos="0.2/2 0 0.1"> <body name="foot" pos="0.1 0 -0.25" gravcomp="0">
<site name="foot_right_site" pos="0 0 0.04" size="0.02 0.02 0.02" rgba="0 0 1 1" type="sphere"/> <joint name="foot_joint" pos="-0.1 0 0" axis="0 -1 0" range="-0.785398 0.785398"/>
<joint axis="0 -1 0" name="foot_joint" pos="0 0 0.1" range="-45 45" type="hinge"/> <geom name="foot_geom" size="0.06 0.1" quat="0.707107 0 -0.707107 0" type="capsule" friction="0.9 0.1 0.1"/>
<geom friction="0.9" fromto="-0.0 0 0.1 0.2 0 0.1" name="foot_geom" size="0.06" type="capsule"/> <site name="foot_right_site" pos="-0.1 0 -0.06" size="0.02" rgba="0 0 1 1"/>
</body> </body>
</body> </body>
</body> </body>
<!-- copied and then replace thigh->thigh_left, leg->leg_left, foot->foot_right --> <body name="thigh_left" pos="0 0 -0.2" gravcomp="0">
<body name="thigh_left" pos="0 0 1.05"> <joint name="thigh_left_joint" pos="0 0 0" axis="0 -1 0" range="-2.61799 0"/>
<joint axis="0 -1 0" name="thigh_left_joint" pos="0 0 1.05" range="-150 0" type="hinge"/> <geom name="thigh_left_geom" size="0.05 0.225" pos="0 0 -0.225" type="capsule" friction="0.9 0.1 0.1" rgba="0.7 0.3 0.6 1"/>
<geom friction="0.9" fromto="0 0 1.05 0 0 0.6" name="thigh_left_geom" rgba=".7 .3 .6 1" size="0.05" type="capsule"/> <body name="leg_left" pos="0 0 -0.7" gravcomp="0">
<body name="leg_left" pos="0 0 0.35"> <joint name="leg_left_joint" pos="0 0 0.25" axis="0 -1 0" range="-2.61799 0"/>
<joint axis="0 -1 0" name="leg_left_joint" pos="0 0 0.6" range="-150 0" type="hinge"/> <geom name="leg_left_geom" size="0.04 0.25" type="capsule" friction="0.9 0.1 0.1" rgba="0.7 0.3 0.6 1"/>
<geom friction="0.9" fromto="0 0 0.6 0 0 0.1" name="leg_left_geom" rgba=".7 .3 .6 1" size="0.04" type="capsule"/> <body name="foot_left" pos="0.1 0 -0.25" gravcomp="0">
<body name="foot_left" pos="0.2/2 0 0.1"> <joint name="foot_left_joint" pos="-0.1 0 0" axis="0 -1 0" range="-0.785398 0.785398"/>
<site name="foot_left_site" pos="0 0 0.04" size="0.02 0.02 0.02" rgba="1 0 0 1" type="sphere"/> <geom name="foot_left_geom" size="0.06 0.1" quat="0.707107 0 -0.707107 0" type="capsule" friction="1.9 0.1 0.1" rgba="0.7 0.3 0.6 1"/>
<joint axis="0 -1 0" name="foot_left_joint" pos="0 0 0.1" range="-45 45" type="hinge"/> <site name="foot_left_site" pos="-0.1 0 -0.06" size="0.02" rgba="1 0 0 1"/>
<geom friction="1.9" fromto="-0.0 0 0.1 0.2 0 0.1" name="foot_left_geom" rgba=".7 .3 .6 1" size="0.06" type="capsule"/>
</body> </body>
</body> </body>
</body> </body>
</body> </body>
</worldbody> </worldbody>
<actuator> <actuator>
<!-- <motor joint="torso_joint" ctrlrange="-100.0 100.0" isctrllimited="true"/>--> <general joint="thigh_joint" ctrlrange="-1 1" gear="100 0 0 0 0 0" actdim="0"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="100" joint="thigh_joint"/> <general joint="leg_joint" ctrlrange="-1 1" gear="100 0 0 0 0 0" actdim="0"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="100" joint="leg_joint"/> <general joint="foot_joint" ctrlrange="-1 1" gear="100 0 0 0 0 0" actdim="0"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="100" joint="foot_joint"/> <general joint="thigh_left_joint" ctrlrange="-1 1" gear="100 0 0 0 0 0" actdim="0"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="100" joint="thigh_left_joint"/> <general joint="leg_left_joint" ctrlrange="-1 1" gear="100 0 0 0 0 0" actdim="0"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="100" joint="leg_left_joint"/> <general joint="foot_left_joint" ctrlrange="-1 1" gear="100 0 0 0 0 0" actdim="0"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="100" joint="foot_left_joint"/>
<!-- <motor joint="finger2_rot" ctrlrange="-20.0 20.0" isctrllimited="true"/>-->
</actuator> </actuator>
<asset>
<texture type="skybox" builtin="gradient" rgb1=".4 .5 .6" rgb2="0 0 0"
width="100" height="100"/>
<texture builtin="flat" height="1278" mark="cross" markrgb="1 1 1" name="texgeom" random="0.01" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" type="cube" width="127"/>
<texture builtin="checker" height="100" name="texplane" rgb1="0 0 0" rgb2="0.8 0.8 0.8" type="2d" width="100"/>
<material name="MatPlane" reflectance="0.5" shininess="1" specular="1" texrepeat="60 60" texture="texplane"/>
<material name="geom" texture="texgeom" texuniform="true"/>
</asset>
</mujoco> </mujoco>


@ -6,6 +6,11 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper): class MPWrapper(RawInterfaceWrapper):
mp_config = {
'ProMP': {},
'DMP': {},
'ProDMP': {},
}
@property @property
def context_mask(self): def context_mask(self):


@ -1,8 +1,13 @@
import os import os
from typing import Optional from typing import Optional, Any, Dict, Tuple
import numpy as np import numpy as np
from gym.envs.mujoco.walker2d_v4 import Walker2dEnv from gymnasium.envs.mujoco.walker2d_v4 import Walker2dEnv, DEFAULT_CAMERA_CONFIG
from gymnasium.core import ObsType
from gymnasium import utils
from gymnasium.envs.mujoco import MujocoEnv
from gymnasium.spaces import Box
MAX_EPISODE_STEPS_WALKERJUMP = 300 MAX_EPISODE_STEPS_WALKERJUMP = 300
@ -11,8 +16,71 @@ MAX_EPISODE_STEPS_WALKERJUMP = 300
# to the same structure as the Hopper, where the angles are randomized (->contexts) and the agent should jump as height # to the same structure as the Hopper, where the angles are randomized (->contexts) and the agent should jump as height
# as possible, while landing at a specific target position # as possible, while landing at a specific target position
class Walker2dEnvCustomXML(Walker2dEnv):
def __init__(
self,
xml_file,
forward_reward_weight=1.0,
ctrl_cost_weight=1e-3,
healthy_reward=1.0,
terminate_when_unhealthy=True,
healthy_z_range=(0.8, 2.0),
healthy_angle_range=(-1.0, 1.0),
reset_noise_scale=5e-3,
exclude_current_positions_from_observation=True,
**kwargs,
):
utils.EzPickle.__init__(
self,
xml_file,
forward_reward_weight,
ctrl_cost_weight,
healthy_reward,
terminate_when_unhealthy,
healthy_z_range,
healthy_angle_range,
reset_noise_scale,
exclude_current_positions_from_observation,
**kwargs,
)
class Walker2dJumpEnv(Walker2dEnv): self._forward_reward_weight = forward_reward_weight
self._ctrl_cost_weight = ctrl_cost_weight
self._healthy_reward = healthy_reward
self._terminate_when_unhealthy = terminate_when_unhealthy
self._healthy_z_range = healthy_z_range
self._healthy_angle_range = healthy_angle_range
self._reset_noise_scale = reset_noise_scale
self._exclude_current_positions_from_observation = (
exclude_current_positions_from_observation
)
if exclude_current_positions_from_observation:
observation_space = Box(
low=-np.inf, high=np.inf, shape=(18,), dtype=np.float64
)
else:
observation_space = Box(
low=-np.inf, high=np.inf, shape=(19,), dtype=np.float64
)
self.observation_space = observation_space
MujocoEnv.__init__(
self,
xml_file,
4,
observation_space=observation_space,
default_camera_config=DEFAULT_CAMERA_CONFIG,
**kwargs,
)
class Walker2dJumpEnv(Walker2dEnvCustomXML):
""" """
healthy reward 1.0 -> 0.005 -> 0.0025 not from alex healthy reward 1.0 -> 0.005 -> 0.0025 not from alex
penalty 10 -> 0 not from alex penalty 10 -> 0 not from alex
@ -54,13 +122,13 @@ class Walker2dJumpEnv(Walker2dEnv):
self.max_height = max(height, self.max_height) self.max_height = max(height, self.max_height)
done = bool(height < 0.2) terminated = bool(height < 0.2)
ctrl_cost = self.control_cost(action) ctrl_cost = self.control_cost(action)
costs = ctrl_cost costs = ctrl_cost
rewards = 0 rewards = 0
if self.current_step >= self.max_episode_steps or done: if self.current_step >= self.max_episode_steps or terminated:
done = True terminated = True
height_goal_distance = -10 * (np.linalg.norm(self.max_height - self.goal)) height_goal_distance = -10 * (np.linalg.norm(self.max_height - self.goal))
healthy_reward = self.healthy_reward * self.current_step healthy_reward = self.healthy_reward * self.current_step
@ -73,17 +141,20 @@ class Walker2dJumpEnv(Walker2dEnv):
'max_height': self.max_height, 'max_height': self.max_height,
'goal': self.goal, 'goal': self.goal,
} }
truncated = False
return observation, reward, done, info return observation, reward, terminated, truncated, info
def _get_obs(self): def _get_obs(self):
return np.append(super()._get_obs(), self.goal) return np.append(super()._get_obs(), self.goal)
def reset(self, *, seed: Optional[int] = None, return_info: bool = False, options: Optional[dict] = None): def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
-> Tuple[ObsType, Dict[str, Any]]:
self.current_step = 0 self.current_step = 0
self.max_height = 0 self.max_height = 0
ret = super().reset(seed=seed, options=options)
self.goal = self.np_random.uniform(1.5, 2.5, 1) # 1.5 3.0 self.goal = self.np_random.uniform(1.5, 2.5, 1) # 1.5 3.0
return super().reset() return ret
# overwrite reset_model to make it deterministic # overwrite reset_model to make it deterministic
def reset_model(self): def reset_model(self):
@ -97,21 +168,3 @@ class Walker2dJumpEnv(Walker2dEnv):
observation = self._get_obs() observation = self._get_obs()
return observation return observation
if __name__ == '__main__':
render_mode = "human" # "human" or "partial" or "final"
env = Walker2dJumpEnv()
obs = env.reset()
for i in range(6000):
# test with random actions
ac = env.action_space.sample()
obs, rew, d, info = env.step(ac)
if i % 10 == 0:
env.render(mode=render_mode)
if d:
print('After ', i, ' steps, done: ', d)
env.reset()
env.close()

309
fancy_gym/envs/registry.py Normal file
View File

@ -0,0 +1,309 @@
from typing import Tuple, Union, Callable, List, Dict, Any, Optional
import copy
import importlib
import numpy as np
from collections import defaultdict
from collections.abc import Mapping, MutableMapping
from fancy_gym.utils.make_env_helpers import make_bb
from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
from gymnasium import register as gym_register
from gymnasium import make as gym_make
from gymnasium.envs.registration import registry as gym_registry
class DefaultMPWrapper(RawInterfaceWrapper):
@property
def context_mask(self):
"""
Returns boolean mask of the same shape as the observation space.
It determines, per dimension, whether that part of the observation is included in the context or not.
This effectively allows filtering out unwanted or unnecessary observations from the full step-based case.
E.g. velocities that start at 0 only change after the first action. Given that we only receive the
context (i.e. part of the first observation), the velocities are not necessary in the observation for the task.
Returns:
bool array representing the indices of the observations
"""
# If the env already defines a context_mask, we will use that
if hasattr(self.env, 'context_mask'):
return self.env.context_mask
# Otherwise we will use the whole observation as the context. (Write a custom MPWrapper to change this behavior; a minimal sketch follows after this class.)
return np.full(self.env.observation_space.shape, True)
@property
def current_pos(self) -> Union[float, int, np.ndarray, Tuple]:
"""
Returns the current position of the action/control dimension.
The dimensionality has to match the action/control dimension.
This is not required when exclusively using velocity control;
it should, however, be implemented regardless.
E.g. The joint positions that are directly or indirectly controlled by the action.
"""
assert hasattr(self.env, 'current_pos'), 'DefaultMPWrapper was unable to access env.current_pos. Please write a custom MPWrapper (recommended) or expose this attribute directly.'
return self.env.current_pos
@property
def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
"""
Returns the current velocity of the action/control dimension.
The dimensionality has to match the action/control dimension.
This is not required when exclusively using position control;
it should, however, be implemented regardless.
E.g. The joint velocities that are directly or indirectly controlled by the action.
"""
assert hasattr(self.env, 'current_vel'), 'DefaultMPWrapper was unable to access env.current_vel. Please write a custom MPWrapper (recommended) or expose this attribute directly.'
return self.env.current_vel
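To illustrate the interface that `DefaultMPWrapper` expects, a minimal custom wrapper for a hypothetical 5-joint, MuJoCo-style reacher env (the observation layout, the `qpos`/`qvel` access, and the class name are assumptions) could look like this:

```python
import numpy as np
from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper


class MyReacherMPWrapper(RawInterfaceWrapper):
    @property
    def context_mask(self):
        # observation assumed to be [5 joint positions, 5 joint velocities, 2D goal];
        # keep only positions and goal as context, drop the initially-zero velocities
        return np.hstack([np.full(5, True), np.full(5, False), np.full(2, True)])

    @property
    def current_pos(self):
        return self.env.data.qpos[:5].copy()  # assumed MuJoCo-style state access

    @property
    def current_vel(self):
        return self.env.data.qvel[:5].copy()
```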
_BB_DEFAULTS = {
'ProMP': {
'wrappers': [],
'trajectory_generator_kwargs': {
'trajectory_generator_type': 'promp'
},
'phase_generator_kwargs': {
'phase_generator_type': 'linear'
},
'controller_kwargs': {
'controller_type': 'motor',
'p_gains': 1.0,
'd_gains': 0.1,
},
'basis_generator_kwargs': {
'basis_generator_type': 'zero_rbf',
'num_basis': 5,
'num_basis_zero_start': 1,
'basis_bandwidth_factor': 3.0,
},
'black_box_kwargs': {
}
},
'DMP': {
'wrappers': [],
'trajectory_generator_kwargs': {
'trajectory_generator_type': 'dmp'
},
'phase_generator_kwargs': {
'phase_generator_type': 'exp'
},
'controller_kwargs': {
'controller_type': 'motor',
'p_gains': 1.0,
'd_gains': 0.1,
},
'basis_generator_kwargs': {
'basis_generator_type': 'rbf',
'num_basis': 5
},
'black_box_kwargs': {
}
},
'ProDMP': {
'wrappers': [],
'trajectory_generator_kwargs': {
'trajectory_generator_type': 'prodmp',
'duration': 2.0,
'weights_scale': 1.0,
},
'phase_generator_kwargs': {
'phase_generator_type': 'exp',
'tau': 1.5,
},
'controller_kwargs': {
'controller_type': 'motor',
'p_gains': 1.0,
'd_gains': 0.1,
},
'basis_generator_kwargs': {
'basis_generator_type': 'prodmp',
'alpha': 10,
'num_basis': 5,
},
'black_box_kwargs': {
}
}
}
KNOWN_MPS = list(_BB_DEFAULTS.keys())
_KNOWN_MPS_PLUS_ALL = KNOWN_MPS + ['all']
ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS = {mp_type: [] for mp_type in _KNOWN_MPS_PLUS_ALL}
MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS = {}
def register(
id: str,
entry_point: Optional[Union[Callable, str]] = None,
mp_wrapper: RawInterfaceWrapper = DefaultMPWrapper,
register_step_based: bool = True, # TODO: Detect
add_mp_types: List[str] = KNOWN_MPS,
mp_config_override: Dict[str, Any] = {},
**kwargs
):
"""
Registers a Gymnasium environment, including Movement Primitives (MP) versions.
If you only want to register MP versions for an already registered environment, use fancy_gym.upgrade instead.
Args:
id (str): The unique identifier for the environment.
entry_point (Optional[Union[Callable, str]]): The entry point for creating the environment.
mp_wrapper (RawInterfaceWrapper): The MP wrapper for the environment.
register_step_based (bool): Whether to also register the raw step-based version of the environment (default True).
add_mp_types (List[str]): List of additional MP types to register.
mp_config_override (Dict[str, Any]): Dictionary for overriding MP configuration.
**kwargs: Additional keyword arguments which are passed to the environment constructor.
Notes:
- When `register_step_based` is True, the raw environment will also be registered with Gymnasium; otherwise only the MP versions will be registered.
- `entry_point` can be given as a string, allowing the same notation as gymnasium.
- If `id` already exists in the Gymnasium registry and `register_step_based` is True,
a warning message will be printed, suggesting to set `register_step_based=False` or use `fancy_gym.upgrade`.
Example:
To register a step-based environment with Movement Primitive versions (will use default mp_wrapper):
>>> register("MyEnv-v0", MyEnvClass"my_module:MyEnvClass")
The entry point can also be provided as a string:
>>> register("MyEnv-v0", "my_module:MyEnvClass")
"""
if register_step_based and id in gym_registry:
print(f'[Info] Gymnasium env with id "{id}" already exists. You should supply register_step_based=False or use fancy_gym.upgrade if you only want to register mp versions of an existing env.')
if register_step_based:
assert entry_point != None, 'You need to provide an entry-point, when registering step-based.'
if not callable(mp_wrapper): # mp_wrapper can be given as a String (same notation as for entry_point)
mod_name, attr_name = mp_wrapper.split(':')
mod = importlib.import_module(mod_name)
mp_wrapper = getattr(mod, attr_name)
if register_step_based:
gym_register(id=id, entry_point=entry_point, **kwargs)
upgrade(id, mp_wrapper, add_mp_types, mp_config_override)
def upgrade(
id: str,
mp_wrapper: RawInterfaceWrapper = DefaultMPWrapper,
add_mp_types: List[str] = KNOWN_MPS,
base_id: Optional[str] = None,
mp_config_override: Dict[str, Any] = {},
):
"""
Upgrades an existing Gymnasium environment to include Movement Primitives (MP) versions.
We expect the raw step-based env to be already registered with gymnasium. Otherwise please use fancy_gym.register instead.
Args:
id (str): The unique identifier for the environment.
mp_wrapper (RawInterfaceWrapper): The MP wrapper for the environment (default is DefaultMPWrapper).
add_mp_types (List[str]): List of additional MP types to register (default is KNOWN_MPS).
base_id (Optional[str]): The unique identifier for the environment to upgrade. Will use `id` if none is provided. Can be defined to allow multiple registrations of different versions for the same step-based environment.
mp_config_override (Dict[str, Any]): Dictionary for overriding MP configuration.
Notes:
- The `id` parameter should match the ID of the existing Gymnasium environment you wish to upgrade. You can also pick a new one, but then `base_id` needs to be provided.
- The `mp_wrapper` parameter specifies the MP wrapper to use, allowing for customization.
- `add_mp_types` can be used to specify additional MP types to register alongside the base environment.
- The `base_id` parameter should match the ID of the existing Gymnasium environment you wish to upgrade.
- `mp_config_override` allows for customizing MP configuration if needed.
Example:
To upgrade an existing environment with MP versions:
>>> upgrade("MyEnv-v0", mp_wrapper=CustomMPWrapper)
To upgrade an existing environment with custom MP types and configuration:
>>> upgrade("MyEnv-v0", mp_wrapper=CustomMPWrapper, add_mp_types=["ProDMP", "DMP"], mp_config_override={"param": 42})
"""
if not base_id:
base_id = id
register_mps(id, base_id, mp_wrapper, add_mp_types, mp_config_override)
def register_mps(id: str, base_id: str, mp_wrapper: RawInterfaceWrapper, add_mp_types: List[str] = KNOWN_MPS, mp_config_override: Dict[str, Any] = {}):
for mp_type in add_mp_types:
register_mp(id, base_id, mp_wrapper, mp_type, mp_config_override.get(mp_type, {}))
def register_mp(id: str, base_id: str, mp_wrapper: RawInterfaceWrapper, mp_type: List[str], mp_config_override: Dict[str, Any] = {}):
assert mp_type in KNOWN_MPS, 'Unknown mp_type'
assert id not in ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS[mp_type], f'The environment {id} is already registered for {mp_type}.'
parts = id.split('/')
if len(parts) == 1:
ns, name = 'gym', parts[0]
elif len(parts) == 2:
ns, name = parts[0], parts[1]
else:
raise ValueError('env id can not contain multiple "/".')
parts = name.split('-')
assert len(parts) >= 2 and parts[-1].startswith('v'), 'Malformed env id, must end in -v{int}.'
fancy_id = f'{ns}_{mp_type}/{name}'
gym_register(
id=fancy_id,
entry_point=bb_env_constructor,
kwargs={
'underlying_id': base_id,
'mp_wrapper': mp_wrapper,
'mp_type': mp_type,
'_mp_config_override_register': mp_config_override
}
)
ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS[mp_type].append(fancy_id)
ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS['all'].append(fancy_id)
if ns not in MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS:
MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS[ns] = {mp_type: [] for mp_type in _KNOWN_MPS_PLUS_ALL}
MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS[ns][mp_type].append(fancy_id)
MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS[ns]['all'].append(fancy_id)
def nested_update(base: MutableMapping, update):
"""
Update method for nested Mappings (see the illustration after this function)
Args:
base: main Mapping to be updated
update: updated values for base Mapping
"""
if any([item.endswith('_type') for item in update]):
base = update
return base
for k, v in update.items():
base[k] = nested_update(base.get(k, {}), v) if isinstance(v, Mapping) else v
return base
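Note that `nested_update` treats any override that contains a `*_type` key as a full replacement of that sub-dict rather than a merge. A small illustration (values are arbitrary):

```python
base = {'phase_generator_kwargs': {'phase_generator_type': 'exp', 'tau': 1.5}}

# an override carrying a '*_type' key replaces the whole sub-dict ('tau' is dropped) ...
nested_update(base, {'phase_generator_kwargs': {'phase_generator_type': 'linear'}})
assert base['phase_generator_kwargs'] == {'phase_generator_type': 'linear'}

# ... while overrides without a '*_type' key are merged key by key
nested_update(base, {'phase_generator_kwargs': {'tau': 2.0}})
assert base['phase_generator_kwargs'] == {'phase_generator_type': 'linear', 'tau': 2.0}
```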
def bb_env_constructor(underlying_id, mp_wrapper, mp_type, mp_config_override={}, _mp_config_override_register={}, **kwargs):
raw_underlying_env = gym_make(underlying_id, **kwargs)
underlying_env = mp_wrapper(raw_underlying_env)
mp_config = getattr(underlying_env, 'mp_config') if hasattr(underlying_env, 'mp_config') else {}
active_mp_config = copy.deepcopy(mp_config.get(mp_type, {}))
global_inherit_defaults = mp_config.get('inherit_defaults', True)
inherit_defaults = active_mp_config.pop('inherit_defaults', global_inherit_defaults)
config = copy.deepcopy(_BB_DEFAULTS[mp_type]) if inherit_defaults else {}
nested_update(config, active_mp_config)
nested_update(config, _mp_config_override_register)
nested_update(config, mp_config_override)
wrappers = config.pop('wrappers')
traj_gen_kwargs = config.pop('trajectory_generator_kwargs', {})
black_box_kwargs = config.pop('black_box_kwargs', {})
contr_kwargs = config.pop('controller_kwargs', {})
phase_kwargs = config.pop('phase_generator_kwargs', {})
basis_kwargs = config.pop('basis_generator_kwargs', {})
return make_bb(underlying_env,
wrappers=wrappers,
black_box_kwargs=black_box_kwargs,
traj_gen_kwargs=traj_gen_kwargs,
controller_kwargs=contr_kwargs,
phase_kwargs=phase_kwargs,
basis_kwargs=basis_kwargs,
**config)
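`bb_env_constructor` is what `gym.make` ends up calling for every MP id registered above. A minimal usage sketch of the new registry (the ids, entry point, and override values below are hypothetical):

```python
import gymnasium as gym
import fancy_gym

# register a step-based env together with its MP variants (DefaultMPWrapper is used here)
fancy_gym.register(
    id='fancy/MyEnv-v0',                   # hypothetical id
    entry_point='my_package.envs:MyEnv',   # hypothetical entry point
    add_mp_types=['ProMP', 'ProDMP'],
)

# if the step-based env is already known to gymnasium, only add the MP versions:
# fancy_gym.upgrade('SomeExisting-v0', mp_wrapper=MyReacherMPWrapper)

env = gym.make('fancy/MyEnv-v0')           # plain step-based version
mp_env = gym.make('fancy_ProMP/MyEnv-v0',  # MP version, named '<namespace>_<MP type>/<name>'
                  mp_config_override={'basis_generator_kwargs': {'num_basis': 8}})
```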

View File

@ -1,20 +1,23 @@
import gymnasium as gym
import fancy_gym import fancy_gym
def example_run_replanning_env(env_name="BoxPushingDenseReplanProDMP-v0", seed=1, iterations=1, render=False):
env = fancy_gym.make(env_name, seed=seed) def example_run_replanning_env(env_name="fancy_ProDMP/BoxPushingDenseReplan-v0", seed=1, iterations=1, render=False):
env.reset() env = gym.make(env_name)
env.reset(seed=seed)
for i in range(iterations): for i in range(iterations):
done = False done = False
while done is False: while done is False:
ac = env.action_space.sample() ac = env.action_space.sample()
obs, reward, done, info = env.step(ac) obs, reward, terminated, truncated, info = env.step(ac)
if render: if render:
env.render(mode="human") env.render(mode="human")
if done: if terminated or truncated:
env.reset() env.reset()
env.close() env.close()
del env del env
def example_custom_replanning_envs(seed=0, iteration=100, render=True): def example_custom_replanning_envs(seed=0, iteration=100, render=True):
# id for a step-based environment # id for a step-based environment
base_env_id = "BoxPushingDense-v0" base_env_id = "BoxPushingDense-v0"
@ -22,7 +25,7 @@ def example_custom_replanning_envs(seed=0, iteration=100, render=True):
wrappers = [fancy_gym.envs.mujoco.box_pushing.mp_wrapper.MPWrapper] wrappers = [fancy_gym.envs.mujoco.box_pushing.mp_wrapper.MPWrapper]
trajectory_generator_kwargs = {'trajectory_generator_type': 'prodmp', trajectory_generator_kwargs = {'trajectory_generator_type': 'prodmp',
'weight_scale': 1} 'weights_scale': 1}
phase_generator_kwargs = {'phase_generator_type': 'exp'} phase_generator_kwargs = {'phase_generator_type': 'exp'}
controller_kwargs = {'controller_type': 'velocity'} controller_kwargs = {'controller_type': 'velocity'}
basis_generator_kwargs = {'basis_generator_type': 'prodmp', basis_generator_kwargs = {'basis_generator_type': 'prodmp',
@ -46,8 +49,8 @@ def example_custom_replanning_envs(seed=0, iteration=100, render=True):
for i in range(iteration): for i in range(iteration):
ac = env.action_space.sample() ac = env.action_space.sample()
obs, reward, done, info = env.step(ac) obs, reward, terminated, truncated, info = env.step(ac)
if done: if terminated or truncated:
env.reset() env.reset()
env.close() env.close()
@ -56,7 +59,7 @@ def example_custom_replanning_envs(seed=0, iteration=100, render=True):
if __name__ == "__main__": if __name__ == "__main__":
# run a registered replanning environment # run a registered replanning environment
example_run_replanning_env(env_name="BoxPushingDenseReplanProDMP-v0", seed=1, iterations=1, render=False) example_run_replanning_env(env_name="fancy_ProDMP/BoxPushingDenseReplan-v0", seed=1, iterations=1, render=False)
# run a custom replanning environment # run a custom replanning environment
example_custom_replanning_envs(seed=0, iteration=8, render=True) example_custom_replanning_envs(seed=0, iteration=8, render=True)

View File

@ -1,7 +1,8 @@
import gymnasium as gym
import fancy_gym import fancy_gym
def example_dmc(env_id="dmc:fish-swim", seed=1, iterations=1000, render=True): def example_dmc(env_id="dm_control/fish-swim", seed=1, iterations=1000, render=True):
""" """
Example for running a DMC based env in the step based setting. Example for running a DMC based env in the step based setting.
The env_id has to be specified as `domain_name:task_name` or The env_id has to be specified as `domain_name:task_name` or
@ -16,9 +17,9 @@ def example_dmc(env_id="dmc:fish-swim", seed=1, iterations=1000, render=True):
Returns: Returns:
""" """
env = fancy_gym.make(env_id, seed) env = gym.make(env_id)
rewards = 0 rewards = 0
obs = env.reset() obs = env.reset(seed=seed)
print("observation shape:", env.observation_space.shape) print("observation shape:", env.observation_space.shape)
print("action shape:", env.action_space.shape) print("action shape:", env.action_space.shape)
@ -26,10 +27,10 @@ def example_dmc(env_id="dmc:fish-swim", seed=1, iterations=1000, render=True):
ac = env.action_space.sample() ac = env.action_space.sample()
if render: if render:
env.render(mode="human") env.render(mode="human")
obs, reward, done, info = env.step(ac) obs, reward, terminated, truncated, info = env.step(ac)
rewards += reward rewards += reward
if done: if terminated or truncated:
print(env_id, rewards) print(env_id, rewards)
rewards = 0 rewards = 0
obs = env.reset() obs = env.reset()
@ -56,7 +57,7 @@ def example_custom_dmc_and_mp(seed=1, iterations=1, render=True):
""" """
# Base DMC name, according to structure of above example # Base DMC name, according to structure of above example
base_env_id = "dmc:ball_in_cup-catch" base_env_id = "dm_control/ball_in_cup-catch"
# Replace this wrapper with the custom wrapper for your environment by inheriting from the RawInterfaceWrapper. # Replace this wrapper with the custom wrapper for your environment by inheriting from the RawInterfaceWrapper.
# You can also add other gym.Wrappers in case they are needed. # You can also add other gym.Wrappers in case they are needed.
@ -65,8 +66,8 @@ def example_custom_dmc_and_mp(seed=1, iterations=1, render=True):
trajectory_generator_kwargs = {'trajectory_generator_type': 'promp'} trajectory_generator_kwargs = {'trajectory_generator_type': 'promp'}
phase_generator_kwargs = {'phase_generator_type': 'linear'} phase_generator_kwargs = {'phase_generator_type': 'linear'}
controller_kwargs = {'controller_type': 'motor', controller_kwargs = {'controller_type': 'motor',
"p_gains": 1.0, "p_gains": 1.0,
"d_gains": 0.1,} "d_gains": 0.1, }
basis_generator_kwargs = {'basis_generator_type': 'zero_rbf', basis_generator_kwargs = {'basis_generator_type': 'zero_rbf',
'num_basis': 5, 'num_basis': 5,
'num_basis_zero_start': 1 'num_basis_zero_start': 1
@ -102,10 +103,10 @@ def example_custom_dmc_and_mp(seed=1, iterations=1, render=True):
# number of samples/full trajectories (multiple environment steps) # number of samples/full trajectories (multiple environment steps)
for i in range(iterations): for i in range(iterations):
ac = env.action_space.sample() ac = env.action_space.sample()
obs, reward, done, info = env.step(ac) obs, reward, terminated, truncated, info = env.step(ac)
rewards += reward rewards += reward
if done: if terminated or truncated:
print(base_env_id, rewards) print(base_env_id, rewards)
rewards = 0 rewards = 0
obs = env.reset() obs = env.reset()
@ -123,14 +124,14 @@ if __name__ == '__main__':
render = True render = True
# # Standard DMC Suite tasks # # Standard DMC Suite tasks
example_dmc("dmc:fish-swim", seed=10, iterations=1000, render=render) example_dmc("dm_control/fish-swim", seed=10, iterations=1000, render=render)
# #
# # Manipulation tasks # # Manipulation tasks
# # Disclaimer: The vision versions are currently not integrated and yield an error # # Disclaimer: The vision versions are currently not integrated and yield an error
example_dmc("dmc:manipulation-reach_site_features", seed=10, iterations=250, render=render) example_dmc("dm_control/manipulation-reach_site_features", seed=10, iterations=250, render=render)
# #
# # Gym + DMC hybrid task provided in the MP framework # # Gym + DMC hybrid task provided in the MP framework
example_dmc("dmc_ball_in_cup-catch_promp-v0", seed=10, iterations=1, render=render) example_dmc("dm_control_ProMP/ball_in_cup-catch-v0", seed=10, iterations=1, render=render)
# Custom DMC task # Different seed, because the episode is longer for this example and the name+seed combo is # Custom DMC task # Different seed, because the episode is longer for this example and the name+seed combo is
# already registered above # already registered above

View File

@ -1,6 +1,6 @@
from collections import defaultdict from collections import defaultdict
import gym import gymnasium as gym
import numpy as np import numpy as np
import fancy_gym import fancy_gym
@ -21,27 +21,27 @@ def example_general(env_id="Pendulum-v1", seed=1, iterations=1000, render=True):
""" """
env = fancy_gym.make(env_id, seed) env = gym.make(env_id)
rewards = 0 rewards = 0
obs = env.reset() obs = env.reset(seed=seed)
print("Observation shape: ", env.observation_space.shape) print("Observation shape: ", env.observation_space.shape)
print("Action shape: ", env.action_space.shape) print("Action shape: ", env.action_space.shape)
# number of environment steps # number of environment steps
for i in range(iterations): for i in range(iterations):
obs, reward, done, info = env.step(env.action_space.sample()) obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
rewards += reward rewards += reward
if render: if render:
env.render() env.render()
if done: if terminated or truncated:
print(rewards) print(rewards)
rewards = 0 rewards = 0
obs = env.reset() obs = env.reset()
def example_async(env_id="HoleReacher-v0", n_cpu=4, seed=int('533D', 16), n_samples=800): def example_async(env_id="fancy/HoleReacher-v0", n_cpu=4, seed=int('533D', 16), n_samples=800):
""" """
Example for running any env in a vectorized multiprocessing setting to generate more samples faster. Example for running any env in a vectorized multiprocessing setting to generate more samples faster.
This also includes DMC and DMP environments when leveraging our custom make_env function. This also includes DMC and DMP environments when leveraging our custom make_env function.
@ -69,12 +69,15 @@ def example_async(env_id="HoleReacher-v0", n_cpu=4, seed=int('533D', 16), n_samp
# this would generate more samples than requested if n_samples % num_envs != 0 # this would generate more samples than requested if n_samples % num_envs != 0
repeat = int(np.ceil(n_samples / env.num_envs)) repeat = int(np.ceil(n_samples / env.num_envs))
for i in range(repeat): for i in range(repeat):
obs, reward, done, info = env.step(env.action_space.sample()) obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
buffer['obs'].append(obs) buffer['obs'].append(obs)
buffer['reward'].append(reward) buffer['reward'].append(reward)
buffer['done'].append(done) buffer['terminated'].append(terminated)
buffer['truncated'].append(truncated)
buffer['info'].append(info) buffer['info'].append(info)
rewards += reward rewards += reward
done = np.logical_or(terminated, truncated)
if np.any(done): if np.any(done):
print(f"Reward at iteration {i}: {rewards[done]}") print(f"Reward at iteration {i}: {rewards[done]}")
rewards[done] = 0 rewards[done] = 0
@ -90,11 +93,10 @@ if __name__ == '__main__':
example_general("Pendulum-v1", seed=10, iterations=200, render=render) example_general("Pendulum-v1", seed=10, iterations=200, render=render)
# Mujoco task from framework # Mujoco task from framework
example_general("Reacher5d-v0", seed=10, iterations=200, render=render) example_general("fancy/Reacher5d-v0", seed=10, iterations=200, render=render)
# # OpenAI Mujoco task # # OpenAI Mujoco task
example_general("HalfCheetah-v2", seed=10, render=render) example_general("HalfCheetah-v2", seed=10, render=render)
# Vectorized multiprocessing environments # Vectorized multiprocessing environments
# example_async(env_id="HoleReacher-v0", n_cpu=2, seed=int('533D', 16), n_samples=2 * 200) # example_async(env_id="HoleReacher-v0", n_cpu=2, seed=int('533D', 16), n_samples=2 * 200)

View File

@ -1,7 +1,8 @@
import gymnasium as gym
import fancy_gym import fancy_gym
def example_dmc(env_id="fish-swim", seed=1, iterations=1000, render=True): def example_meta(env_id="fish-swim", seed=1, iterations=1000, render=True):
""" """
Example for running a MetaWorld based env in the step based setting. Example for running a MetaWorld based env in the step based setting.
The env_id has to be specified as `task_name-v2`. V1 versions are not supported and we always The env_id has to be specified as `task_name-v2`. V1 versions are not supported and we always
@ -17,9 +18,9 @@ def example_dmc(env_id="fish-swim", seed=1, iterations=1000, render=True):
Returns: Returns:
""" """
env = fancy_gym.make(env_id, seed) env = gym.make(env_id)
rewards = 0 rewards = 0
obs = env.reset() obs = env.reset(seed=seed)
print("observation shape:", env.observation_space.shape) print("observation shape:", env.observation_space.shape)
print("action shape:", env.action_space.shape) print("action shape:", env.action_space.shape)
@ -29,9 +30,9 @@ def example_dmc(env_id="fish-swim", seed=1, iterations=1000, render=True):
# THIS NEEDS TO BE SET TO FALSE FOR NOW, BECAUSE THE INTERFACE FOR RENDERING IS DIFFERENT TO BASIC GYM # THIS NEEDS TO BE SET TO FALSE FOR NOW, BECAUSE THE INTERFACE FOR RENDERING IS DIFFERENT TO BASIC GYM
# TODO: Remove this, when Metaworld fixes its interface. # TODO: Remove this, when Metaworld fixes its interface.
env.render(False) env.render(False)
obs, reward, done, info = env.step(ac) obs, reward, terminated, truncated, info = env.step(ac)
rewards += reward rewards += reward
if done: if terminated or truncated:
print(env_id, rewards) print(env_id, rewards)
rewards = 0 rewards = 0
obs = env.reset() obs = env.reset()
@ -40,7 +41,7 @@ def example_dmc(env_id="fish-swim", seed=1, iterations=1000, render=True):
del env del env
def example_custom_dmc_and_mp(seed=1, iterations=1, render=True): def example_custom_meta_and_mp(seed=1, iterations=1, render=True):
""" """
Example for running custom movement primitive based environments. Example for running custom movement primitive based environments.
Our already registered environments follow the same structure. Our already registered environments follow the same structure.
@ -58,7 +59,7 @@ def example_custom_dmc_and_mp(seed=1, iterations=1, render=True):
""" """
# Base MetaWorld name, according to structure of above example # Base MetaWorld name, according to structure of above example
base_env_id = "metaworld:button-press-v2" base_env_id = "metaworld/button-press-v2"
# Replace this wrapper with the custom wrapper for your environment by inheriting from the RawInterfaceWrapper. # Replace this wrapper with the custom wrapper for your environment by inheriting from the RawInterfaceWrapper.
# You can also add other gym.Wrappers in case they are needed. # You can also add other gym.Wrappers in case they are needed.
@ -103,10 +104,10 @@ def example_custom_dmc_and_mp(seed=1, iterations=1, render=True):
# number of samples/full trajectories (multiple environment steps) # number of samples/full trajectories (multiple environment steps)
for i in range(iterations): for i in range(iterations):
ac = env.action_space.sample() ac = env.action_space.sample()
obs, reward, done, info = env.step(ac) obs, reward, terminated, truncated, info = env.step(ac)
rewards += reward rewards += reward
if done: if terminated or truncated:
print(base_env_id, rewards) print(base_env_id, rewards)
rewards = 0 rewards = 0
obs = env.reset() obs = env.reset()
@ -124,11 +125,10 @@ if __name__ == '__main__':
render = False render = False
# # Standard Meta world tasks # # Standard Meta world tasks
example_dmc("metaworld:button-press-v2", seed=10, iterations=500, render=render) example_meta("metaworld/button-press-v2", seed=10, iterations=500, render=render)
# # MP + MetaWorld hybrid task provided in our framework # # MP + MetaWorld hybrid task provided in our framework
example_dmc("ButtonPressProMP-v2", seed=10, iterations=1, render=render) example_meta("metaworld_ProMP/ButtonPress-v2", seed=10, iterations=1, render=render)
# #
# # Custom MetaWorld task # # Custom MetaWorld task
example_custom_dmc_and_mp(seed=10, iterations=1, render=render) example_custom_meta_and_mp(seed=10, iterations=1, render=render)

View File

@ -1,7 +1,8 @@
import gymnasium as gym
import fancy_gym import fancy_gym
def example_mp(env_name="HoleReacherProMP-v0", seed=1, iterations=1, render=True): def example_mp(env_name="fancy_ProMP/HoleReacher-v0", seed=1, iterations=1, render=True):
""" """
Example for running a black box based environment, which is already registered Example for running a black box based environment, which is already registered
Args: Args:
@ -15,11 +16,11 @@ def example_mp(env_name="HoleReacherProMP-v0", seed=1, iterations=1, render=True
""" """
# Equivalent to gym, we have a make function which can be used to create environments. # Equivalent to gym, we have a make function which can be used to create environments.
# It takes care of seeding and enables the use of a variety of external environments using the gym interface. # It takes care of seeding and enables the use of a variety of external environments using the gym interface.
env = fancy_gym.make(env_name, seed) env = gym.make(env_name)
returns = 0 returns = 0
# env.render(mode=None) # env.render(mode=None)
obs = env.reset() obs = env.reset(seed=seed)
# number of samples/full trajectories (multiple environment steps) # number of samples/full trajectories (multiple environment steps)
for i in range(iterations): for i in range(iterations):
@ -41,16 +42,16 @@ def example_mp(env_name="HoleReacherProMP-v0", seed=1, iterations=1, render=True
# This executes a full trajectory and gives back the context (obs) of the last step in the trajectory, or the # This executes a full trajectory and gives back the context (obs) of the last step in the trajectory, or the
# full observation space of the last step, if replanning/sub-trajectory learning is used. The 'reward' is equal # full observation space of the last step, if replanning/sub-trajectory learning is used. The 'reward' is equal
# to the return of a trajectory. Default is the sum over the step-wise rewards. # to the return of a trajectory. Default is the sum over the step-wise rewards.
obs, reward, done, info = env.step(ac) obs, reward, terminated, truncated, info = env.step(ac)
# Aggregated returns # Aggregated returns
returns += reward returns += reward
if done: if terminated or truncated:
print(reward) print(reward)
obs = env.reset() obs = env.reset()
def example_custom_mp(env_name="Reacher5dProMP-v0", seed=1, iterations=1, render=True): def example_custom_mp(env_name="fancy_ProMP/Reacher5d-v0", seed=1, iterations=1, render=True):
""" """
Example for running a movement primitive based environment, which is already registered Example for running a movement primitive based environment, which is already registered
Args: Args:
@ -62,12 +63,9 @@ def example_custom_mp(env_name="Reacher5dProMP-v0", seed=1, iterations=1, render
Returns: Returns:
""" """
# Changing the arguments of the black box env is possible by providing them to gym as with all kwargs. # Changing the arguments of the black box env is possible by providing them to gym through mp_config_override.
# E.g. here for way too many basis functions # E.g. here for way too many basis functions
env = fancy_gym.make(env_name, seed, basis_generator_kwargs={'num_basis': 1000}) env = gym.make(env_name, mp_config_override={'basis_generator_kwargs': {'num_basis': 1000}})
# env = fancy_gym.make(env_name, seed)
# mp_dict.update({'black_box_kwargs': {'learn_sub_trajectories': True}})
# mp_dict.update({'black_box_kwargs': {'do_replanning': lambda pos, vel, t: lambda t: t % 100}})
returns = 0 returns = 0
obs = env.reset() obs = env.reset()
@ -79,10 +77,10 @@ def example_custom_mp(env_name="Reacher5dProMP-v0", seed=1, iterations=1, render
# number of samples/full trajectories (multiple environment steps) # number of samples/full trajectories (multiple environment steps)
for i in range(iterations): for i in range(iterations):
ac = env.action_space.sample() ac = env.action_space.sample()
obs, reward, done, info = env.step(ac) obs, reward, terminated, truncated, info = env.step(ac)
returns += reward returns += reward
if done: if terminated or truncated:
print(i, reward) print(i, reward)
obs = env.reset() obs = env.reset()
@ -106,7 +104,7 @@ def example_fully_custom_mp(seed=1, iterations=1, render=True):
""" """
base_env_id = "Reacher5d-v0" base_env_id = "fancy/Reacher5d-v0"
# Replace this wrapper with the custom wrapper for your environment by inheriting from the RawInterfaceWrapper. # Replace this wrapper with the custom wrapper for your environment by inheriting from the RawInterfaceWrapper.
# You can also add other gym.Wrappers in case they are needed. # You can also add other gym.Wrappers in case they are needed.
@ -114,7 +112,7 @@ def example_fully_custom_mp(seed=1, iterations=1, render=True):
# For a ProMP # For a ProMP
trajectory_generator_kwargs = {'trajectory_generator_type': 'promp', trajectory_generator_kwargs = {'trajectory_generator_type': 'promp',
'weight_scale': 2} 'weights_scale': 2}
phase_generator_kwargs = {'phase_generator_type': 'linear'} phase_generator_kwargs = {'phase_generator_type': 'linear'}
controller_kwargs = {'controller_type': 'velocity'} controller_kwargs = {'controller_type': 'velocity'}
basis_generator_kwargs = {'basis_generator_type': 'zero_rbf', basis_generator_kwargs = {'basis_generator_type': 'zero_rbf',
@ -124,7 +122,7 @@ def example_fully_custom_mp(seed=1, iterations=1, render=True):
# # For a DMP # # For a DMP
# trajectory_generator_kwargs = {'trajectory_generator_type': 'dmp', # trajectory_generator_kwargs = {'trajectory_generator_type': 'dmp',
# 'weight_scale': 500} # 'weights_scale': 500}
# phase_generator_kwargs = {'phase_generator_type': 'exp', # phase_generator_kwargs = {'phase_generator_type': 'exp',
# 'alpha_phase': 2.5} # 'alpha_phase': 2.5}
# controller_kwargs = {'controller_type': 'velocity'} # controller_kwargs = {'controller_type': 'velocity'}
@ -145,10 +143,10 @@ def example_fully_custom_mp(seed=1, iterations=1, render=True):
# number of samples/full trajectories (multiple environment steps) # number of samples/full trajectories (multiple environment steps)
for i in range(iterations): for i in range(iterations):
ac = env.action_space.sample() ac = env.action_space.sample()
obs, reward, done, info = env.step(ac) obs, reward, terminated, truncated, info = env.step(ac)
rewards += reward rewards += reward
if done: if terminated or truncated:
print(rewards) print(rewards)
rewards = 0 rewards = 0
obs = env.reset() obs = env.reset()
@ -157,20 +155,20 @@ def example_fully_custom_mp(seed=1, iterations=1, render=True):
if __name__ == '__main__': if __name__ == '__main__':
render = False render = False
# DMP # DMP
example_mp("HoleReacherDMP-v0", seed=10, iterations=5, render=render) example_mp("fancy_DMP/HoleReacher-v0", seed=10, iterations=5, render=render)
# ProMP # ProMP
example_mp("HoleReacherProMP-v0", seed=10, iterations=5, render=render) example_mp("fancy_ProMP/HoleReacher-v0", seed=10, iterations=5, render=render)
example_mp("BoxPushingTemporalSparseProMP-v0", seed=10, iterations=1, render=render) example_mp("fancy_ProMP/BoxPushingTemporalSparse-v0", seed=10, iterations=1, render=render)
example_mp("TableTennis4DProMP-v0", seed=10, iterations=20, render=render) example_mp("fancy_ProMP/TableTennis4D-v0", seed=10, iterations=20, render=render)
# ProDMP with Replanning # ProDMP with Replanning
example_mp("BoxPushingDenseReplanProDMP-v0", seed=10, iterations=4, render=render) example_mp("fancy_ProDMP/BoxPushingDenseReplan-v0", seed=10, iterations=4, render=render)
example_mp("TableTennis4DReplanProDMP-v0", seed=10, iterations=20, render=render) example_mp("fancy_ProDMP/TableTennis4DReplan-v0", seed=10, iterations=20, render=render)
example_mp("TableTennisWindReplanProDMP-v0", seed=10, iterations=20, render=render) example_mp("fancy_ProDMP/TableTennisWindReplan-v0", seed=10, iterations=20, render=render)
# Altered basis functions # Altered basis functions
obs1 = example_custom_mp("Reacher5dProMP-v0", seed=10, iterations=1, render=render) obs1 = example_custom_mp("fancy_ProMP/Reacher5d-v0", seed=10, iterations=1, render=render)
# Custom MP # Custom MP
example_fully_custom_mp(seed=10, iterations=1, render=render) example_fully_custom_mp(seed=10, iterations=1, render=render)

View File

@ -1,3 +1,4 @@
import gymnasium as gym
import fancy_gym import fancy_gym
@ -12,11 +13,10 @@ def example_mp(env_name, seed=1, render=True):
Returns: Returns:
""" """
# While in this case gym.make() is possible to use as well, we recommend our custom make env function. env = gym.make(env_name)
env = fancy_gym.make(env_name, seed)
returns = 0 returns = 0
obs = env.reset() obs = env.reset(seed=seed)
# number of samples/full trajectories (multiple environment steps) # number of samples/full trajectories (multiple environment steps)
for i in range(10): for i in range(10):
if render and i % 2 == 0: if render and i % 2 == 0:
@ -24,14 +24,13 @@ def example_mp(env_name, seed=1, render=True):
else: else:
env.render() env.render()
ac = env.action_space.sample() ac = env.action_space.sample()
obs, reward, done, info = env.step(ac) obs, reward, terminated, truncated, info = env.step(ac)
returns += reward returns += reward
if done: if terminated or truncated:
print(returns) print(returns)
obs = env.reset() obs = env.reset()
if __name__ == '__main__': if __name__ == '__main__':
example_mp("ReacherProMP-v2") example_mp("gym_ProMP/Reacher-v2")

View File

@ -1,10 +1,14 @@
import gymnasium as gym
import fancy_gym import fancy_gym
def compare_bases_shape(env1_id, env2_id): def compare_bases_shape(env1_id, env2_id):
env1 = fancy_gym.make(env1_id, seed=0) env1 = gym.make(env1_id)
env1.traj_gen.show_scaled_basis(plot=True) env1.traj_gen.show_scaled_basis(plot=True)
env2 = fancy_gym.make(env2_id, seed=0) env2 = gym.make(env2_id)
env2.traj_gen.show_scaled_basis(plot=True) env2.traj_gen.show_scaled_basis(plot=True)
return return
if __name__ == '__main__': if __name__ == '__main__':
compare_bases_shape("TableTennis4DProDMP-v0", "TableTennis4DProMP-v0") compare_bases_shape("fancy_ProDMP/TableTennis4D-v0", "fancy_ProMP/TableTennis4D-v0")

View File

@ -3,19 +3,20 @@ from collections import OrderedDict
import numpy as np import numpy as np
from matplotlib import pyplot as plt from matplotlib import pyplot as plt
import gymnasium as gym
import fancy_gym import fancy_gym
# This might work for some environments, however, please verify either way the correct trajectory information # This might work for some environments, however, please verify either way the correct trajectory information
# for your environment are extracted below # for your environment are extracted below
SEED = 1 SEED = 1
env_id = "Reacher5dProMP-v0" env_id = "fancy_ProMP/Reacher5d-v0"
env = fancy_gym.make(env_id, seed=SEED, controller_kwargs={'p_gains': 0.05, 'd_gains': 0.05}).env env = fancy_gym.make(env_id, mp_config_override={'controller_kwargs': {'p_gains': 0.05, 'd_gains': 0.05}}).env
env.action_space.seed(SEED) env.action_space.seed(SEED)
# Plot difference between real trajectory and target MP trajectory # Plot difference between real trajectory and target MP trajectory
env.reset() env.reset(seed=SEED)
w = env.action_space.sample() w = env.action_space.sample()
pos, vel = env.get_trajectory(w) pos, vel = env.get_trajectory(w)
@ -34,7 +35,7 @@ fig.show()
for t, (des_pos, des_vel) in enumerate(zip(pos, vel)): for t, (des_pos, des_vel) in enumerate(zip(pos, vel)):
actions = env.tracking_controller.get_action(des_pos, des_vel, env.current_pos, env.current_vel) actions = env.tracking_controller.get_action(des_pos, des_vel, env.current_pos, env.current_vel)
actions = np.clip(actions, env.env.action_space.low, env.env.action_space.high) actions = np.clip(actions, env.env.action_space.low, env.env.action_space.high)
_, _, _, _ = env.env.step(actions) env.env.step(actions)
if t % 15 == 0: if t % 15 == 0:
img.set_data(env.env.render(mode="rgb_array")) img.set_data(env.env.render(mode="rgb_array"))
fig.canvas.draw() fig.canvas.draw()

View File

@ -1,26 +1,64 @@
# MetaWorld Wrappers # Metaworld
These are the Environment Wrappers for selected [Metaworld](https://meta-world.github.io/) environments in order to use our Movement Primitive gym interface with them. [Metaworld](https://meta-world.github.io/) is an open-source simulated benchmark designed to advance meta-reinforcement learning and multi-task learning, comprising 50 diverse robotic manipulation tasks. The benchmark features a universal tabletop environment equipped with a simulated Sawyer arm and a variety of everyday objects. This shared environment is pivotal for reusing structured learning and efficiently acquiring related tasks.
All Metaworld environments have a 39 dimensional observation space with the same structure. The tasks differ only in the objective and the initial observations that are randomized.
Unused observations are zeroed out. E.g. for `Button-Press-v2` the observation mask looks the following: ## Step-Based Envs
```python
return np.hstack([ `fancy_gym` makes all metaworld ML1 tasks available via the standard gym interface (see the usage sketch below the table). To access metaworld environments using a different mode of operation (MT1 / ML100 / etc.), please use the functionality provided by metaworld directly.
# Current observation
[False] * 3, # end-effector position | Name | Description | Horizon | Action Dimension | Observation Dimension | Context Dimension |
[False] * 1, # normalized gripper open distance | ---------------------------------------- | ------------------------------------------------------------------------------------- | ------- | ---------------- | --------------------- | ----------------- |
[True] * 3, # main object position | `metaworld/assembly-v2` | A task where the robot must assemble components. | 500 | 4 | 39 | 6 |
[False] * 4, # main object quaternion | `metaworld/basketball-v2` | A task where the robot must play a game of basketball. | 500 | 4 | 39 | 6 |
[False] * 3, # secondary object position | `metaworld/bin-picking-v2` | A task involving the robot picking objects from a bin. | 500 | 4 | 39 | 6 |
[False] * 4, # secondary object quaternion | `metaworld/box-close-v2` | A task requiring the robot to close a box. | 500 | 4 | 39 | 6 |
# Previous observation | `metaworld/button-press-topdown-v2` | A task where the robot must press a button from a top-down perspective. | 500 | 4 | 39 | 6 |
[False] * 3, # previous end-effector position | `metaworld/button-press-topdown-wall-v2` | A task involving the robot pressing a button with a wall from a top-down perspective. | 500 | 4 | 39 | 6 |
[False] * 1, # previous normalized gripper open distance | `metaworld/button-press-v2` | A task where the robot must press a button. | 500 | 4 | 39 | 6 |
[False] * 3, # previous main object position | `metaworld/button-press-wall-v2` | A task involving the robot pressing a button with a wall. | 500 | 4 | 39 | 6 |
[False] * 4, # previous main object quaternion | `metaworld/coffee-button-v2` | A task where the robot must press a button on a coffee machine. | 500 | 4 | 39 | 6 |
[False] * 3, # previous second object position | `metaworld/coffee-pull-v2` | A task involving the robot pulling a lever on a coffee machine. | 500 | 4 | 39 | 6 |
[False] * 4, # previous second object quaternion | `metaworld/coffee-push-v2` | A task involving the robot pushing a component on a coffee machine. | 500 | 4 | 39 | 6 |
# Goal | `metaworld/dial-turn-v2` | A task where the robot must turn a dial. | 500 | 4 | 39 | 6 |
[True] * 3, # goal position | `metaworld/disassemble-v2` | A task requiring the robot to disassemble an object. | 500 | 4 | 39 | 6 |
]) | `metaworld/door-close-v2` | A task where the robot must close a door. | 500 | 4 | 39 | 6 |
``` | `metaworld/door-lock-v2` | A task involving the robot locking a door. | 500 | 4 | 39 | 6 |
For other tasks only the boolean values have to be adjusted accordingly. | `metaworld/door-open-v2` | A task where the robot must open a door. | 500 | 4 | 39 | 6 |
| `metaworld/door-unlock-v2` | A task involving the robot unlocking a door. | 500 | 4 | 39 | 6 |
| `metaworld/hand-insert-v2` | A task requiring the robot to insert a hand into an object. | 500 | 4 | 39 | 6 |
| `metaworld/drawer-close-v2` | A task where the robot must close a drawer. | 500 | 4 | 39 | 6 |
| `metaworld/drawer-open-v2` | A task involving the robot opening a drawer. | 500 | 4 | 39 | 6 |
| `metaworld/faucet-open-v2` | A task requiring the robot to open a faucet. | 500 | 4 | 39 | 6 |
| `metaworld/faucet-close-v2` | A task where the robot must close a faucet. | 500 | 4 | 39 | 6 |
| `metaworld/hammer-v2` | A task where the robot must use a hammer. | 500 | 4 | 39 | 6 |
| `metaworld/handle-press-side-v2` | A task involving the robot pressing a handle from the side. | 500 | 4 | 39 | 6 |
| `metaworld/handle-press-v2` | A task where the robot must press a handle. | 500 | 4 | 39 | 6 |
| `metaworld/handle-pull-side-v2` | A task requiring the robot to pull a handle from the side. | 500 | 4 | 39 | 6 |
| `metaworld/handle-pull-v2` | A task where the robot must pull a handle. | 500 | 4 | 39 | 6 |
| `metaworld/lever-pull-v2` | A task involving the robot pulling a lever. | 500 | 4 | 39 | 6 |
| `metaworld/peg-insert-side-v2` | A task requiring the robot to insert a peg from the side. | 500 | 4 | 39 | 6 |
| `metaworld/pick-place-wall-v2` | A task involving the robot picking and placing an object with a wall. | 500 | 4 | 39 | 6 |
| `metaworld/pick-out-of-hole-v2` | A task where the robot must pick an object out of a hole. | 500 | 4 | 39 | 6 |
| `metaworld/reach-v2` | A task where the robot must reach an object. | 500 | 4 | 39 | 6 |
| `metaworld/push-back-v2` | A task involving the robot pushing an object backward. | 500 | 4 | 39 | 6 |
| `metaworld/push-v2` | A task where the robot must push an object. | 500 | 4 | 39 | 6 |
| `metaworld/pick-place-v2` | A task involving the robot picking up and placing an object. | 500 | 4 | 39 | 6 |
| `metaworld/plate-slide-v2` | A task requiring the robot to slide a plate. | 500 | 4 | 39 | 6 |
| `metaworld/plate-slide-side-v2` | A task involving the robot sliding a plate from the side. | 500 | 4 | 39 | 6 |
| `metaworld/plate-slide-back-v2` | A task where the robot must slide a plate backward. | 500 | 4 | 39 | 6 |
| `metaworld/plate-slide-back-side-v2` | A task involving the robot sliding a plate backward from the side. | 500 | 4 | 39 | 6 |
| `metaworld/peg-unplug-side-v2` | A task where the robot must unplug a peg from the side. | 500 | 4 | 39 | 6 |
| `metaworld/soccer-v2` | A task where the robot must play soccer. | 500 | 4 | 39 | 6 |
| `metaworld/stick-push-v2` | A task involving the robot pushing a stick. | 500 | 4 | 39 | 6 |
| `metaworld/stick-pull-v2` | A task where the robot must pull a stick. | 500 | 4 | 39 | 6 |
| `metaworld/push-wall-v2` | A task involving the robot pushing against a wall. | 500 | 4 | 39 | 6 |
| `metaworld/reach-wall-v2` | A task where the robot must reach an object with a wall. | 500 | 4 | 39 | 6 |
| `metaworld/shelf-place-v2` | A task involving the robot placing an object on a shelf. | 500 | 4 | 39 | 6 |
| `metaworld/sweep-into-v2` | A task where the robot must sweep objects into a container. | 500 | 4 | 39 | 6 |
| `metaworld/sweep-v2` | A task requiring the robot to sweep. | 500 | 4 | 39 | 6 |
| `metaworld/window-open-v2` | A task where the robot must open a window. | 500 | 4 | 39 | 6 |
| `metaworld/window-close-v2` | A task involving the robot closing a window. | 500 | 4 | 39 | 6 |
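
A minimal step-based usage sketch under the new Gymnasium API (assuming `metaworld` is installed; ids as in the table above):

```python
import gymnasium as gym
import fancy_gym  # registers the metaworld/* ids listed above

env = gym.make('metaworld/button-press-v2')
obs, info = env.reset(seed=1)
for _ in range(500):
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```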
## MP-Based Envs
All envs also exist as MP variants. Refer to them using `metaworld_ProMP/<name-v2>` or `metaworld_ProDMP/<name-v2>` (DMP is currently not supported); see the sketch below.
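
For the MP variants a single `step` executes a whole trajectory; a short sketch (the id follows the naming scheme above):

```python
import gymnasium as gym
import fancy_gym

env = gym.make('metaworld_ProMP/button-press-v2')
obs, info = env.reset(seed=1)
# one action = the parameters of a full ProMP trajectory; the reward is the trajectory return
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
env.close()
```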

View File

@ -1,125 +1,37 @@
from typing import Iterable, Type, Union, Optional
from copy import deepcopy from copy import deepcopy
from gym import register from ..envs.registry import register
from . import goal_object_change_mp_wrapper, goal_change_mp_wrapper, goal_endeffector_change_mp_wrapper, \ from . import goal_object_change_mp_wrapper, goal_change_mp_wrapper, goal_endeffector_change_mp_wrapper, \
object_change_mp_wrapper object_change_mp_wrapper
from . import metaworld_adapter
metaworld_adapter.register_all_ML1()
ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS = {"DMP": [], "ProMP": [], "ProDMP": []} ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS = {"DMP": [], "ProMP": [], "ProDMP": []}
# MetaWorld # MetaWorld
DEFAULT_BB_DICT_ProMP = {
"name": 'EnvName',
"wrappers": [],
"trajectory_generator_kwargs": {
'trajectory_generator_type': 'promp',
'weights_scale': 10,
},
"phase_generator_kwargs": {
'phase_generator_type': 'linear'
},
"controller_kwargs": {
'controller_type': 'metaworld',
},
"basis_generator_kwargs": {
'basis_generator_type': 'zero_rbf',
'num_basis': 5,
'num_basis_zero_start': 1
},
'black_box_kwargs': {
'condition_on_desired': False,
}
}
DEFAULT_BB_DICT_ProDMP = {
"name": 'EnvName',
"wrappers": [],
"trajectory_generator_kwargs": {
'trajectory_generator_type': 'prodmp',
'auto_scale_basis': True,
'weights_scale': 10,
# 'goal_scale': 0.,
'disable_goal': True,
},
"phase_generator_kwargs": {
'phase_generator_type': 'exp',
# 'alpha_phase' : 3,
},
"controller_kwargs": {
'controller_type': 'metaworld',
},
"basis_generator_kwargs": {
'basis_generator_type': 'prodmp',
'num_basis': 5,
'alpha': 10
},
'black_box_kwargs': {
'condition_on_desired': False,
}
}
_goal_change_envs = ["assembly-v2", "pick-out-of-hole-v2", "plate-slide-v2", "plate-slide-back-v2", _goal_change_envs = ["assembly-v2", "pick-out-of-hole-v2", "plate-slide-v2", "plate-slide-back-v2",
"plate-slide-side-v2", "plate-slide-back-side-v2"] "plate-slide-side-v2", "plate-slide-back-side-v2"]
for _task in _goal_change_envs: for _task in _goal_change_envs:
task_id_split = _task.split("-")
name = "".join([s.capitalize() for s in task_id_split[:-1]])
# ProMP
_env_id = f'{name}ProMP-{task_id_split[-1]}'
kwargs_dict_goal_change_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
kwargs_dict_goal_change_promp['wrappers'].append(goal_change_mp_wrapper.MPWrapper)
kwargs_dict_goal_change_promp['name'] = f'metaworld:{_task}'
register( register(
id=_env_id, id=f'metaworld/{_task}',
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper', register_step_based=False,
kwargs=kwargs_dict_goal_change_promp mp_wrapper=goal_change_mp_wrapper.MPWrapper,
add_mp_types=['ProMP', 'ProDMP'],
) )
ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
# ProDMP
_env_id = f'{name}ProDMP-{task_id_split[-1]}'
kwargs_dict_goal_change_prodmp = deepcopy(DEFAULT_BB_DICT_ProDMP)
kwargs_dict_goal_change_prodmp['wrappers'].append(goal_change_mp_wrapper.MPWrapper)
kwargs_dict_goal_change_prodmp['name'] = f'metaworld:{_task}'
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_goal_change_prodmp
)
ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProDMP"].append(_env_id)
_object_change_envs = ["bin-picking-v2", "hammer-v2", "sweep-into-v2"] _object_change_envs = ["bin-picking-v2", "hammer-v2", "sweep-into-v2"]
for _task in _object_change_envs: for _task in _object_change_envs:
task_id_split = _task.split("-")
name = "".join([s.capitalize() for s in task_id_split[:-1]])
# ProMP
_env_id = f'{name}ProMP-{task_id_split[-1]}'
kwargs_dict_object_change_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
kwargs_dict_object_change_promp['wrappers'].append(object_change_mp_wrapper.MPWrapper)
kwargs_dict_object_change_promp['name'] = f'metaworld:{_task}'
register( register(
id=_env_id, id=f'metaworld/{_task}',
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper', register_step_based=False,
kwargs=kwargs_dict_object_change_promp mp_wrapper=object_change_mp_wrapper.MPWrapper,
add_mp_types=['ProMP', 'ProDMP'],
) )
ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
# ProDMP
_env_id = f'{name}ProDMP-{task_id_split[-1]}'
kwargs_dict_object_change_prodmp = deepcopy(DEFAULT_BB_DICT_ProDMP)
kwargs_dict_object_change_prodmp['wrappers'].append(object_change_mp_wrapper.MPWrapper)
kwargs_dict_object_change_prodmp['name'] = f'metaworld:{_task}'
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_object_change_prodmp
)
ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProDMP"].append(_env_id)
_goal_and_object_change_envs = ["box-close-v2", "button-press-v2", "button-press-wall-v2", "button-press-topdown-v2", _goal_and_object_change_envs = ["box-close-v2", "button-press-v2", "button-press-wall-v2", "button-press-topdown-v2",
"button-press-topdown-wall-v2", "coffee-button-v2", "coffee-pull-v2", "button-press-topdown-wall-v2", "coffee-button-v2", "coffee-pull-v2",
@ -133,62 +45,18 @@ _goal_and_object_change_envs = ["box-close-v2", "button-press-v2", "button-press
"shelf-place-v2", "sweep-v2", "window-open-v2", "window-close-v2" "shelf-place-v2", "sweep-v2", "window-open-v2", "window-close-v2"
] ]
for _task in _goal_and_object_change_envs: for _task in _goal_and_object_change_envs:
task_id_split = _task.split("-")
name = "".join([s.capitalize() for s in task_id_split[:-1]])
# ProMP
_env_id = f'{name}ProMP-{task_id_split[-1]}'
kwargs_dict_goal_and_object_change_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
kwargs_dict_goal_and_object_change_promp['wrappers'].append(goal_object_change_mp_wrapper.MPWrapper)
kwargs_dict_goal_and_object_change_promp['name'] = f'metaworld:{_task}'
register( register(
id=_env_id, id=f'metaworld/{_task}',
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper', register_step_based=False,
kwargs=kwargs_dict_goal_and_object_change_promp mp_wrapper=goal_object_change_mp_wrapper.MPWrapper,
add_mp_types=['ProMP', 'ProDMP'],
) )
ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
# ProDMP
_env_id = f'{name}ProDMP-{task_id_split[-1]}'
kwargs_dict_goal_and_object_change_prodmp = deepcopy(DEFAULT_BB_DICT_ProDMP)
kwargs_dict_goal_and_object_change_prodmp['wrappers'].append(goal_object_change_mp_wrapper.MPWrapper)
kwargs_dict_goal_and_object_change_prodmp['name'] = f'metaworld:{_task}'
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_goal_and_object_change_prodmp
)
ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProDMP"].append(_env_id)
_goal_and_endeffector_change_envs = ["basketball-v2"] _goal_and_endeffector_change_envs = ["basketball-v2"]
for _task in _goal_and_endeffector_change_envs: for _task in _goal_and_endeffector_change_envs:
task_id_split = _task.split("-")
name = "".join([s.capitalize() for s in task_id_split[:-1]])
# ProMP
_env_id = f'{name}ProMP-{task_id_split[-1]}'
kwargs_dict_goal_and_endeffector_change_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
kwargs_dict_goal_and_endeffector_change_promp['wrappers'].append(goal_endeffector_change_mp_wrapper.MPWrapper)
kwargs_dict_goal_and_endeffector_change_promp['name'] = f'metaworld:{_task}'
register( register(
id=_env_id, id=f'metaworld/{_task}',
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper', register_step_based=False,
kwargs=kwargs_dict_goal_and_endeffector_change_promp mp_wrapper=goal_endeffector_change_mp_wrapper.MPWrapper,
add_mp_types=['ProMP', 'ProDMP'],
) )
ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
# ProDMP
_env_id = f'{name}ProDMP-{task_id_split[-1]}'
kwargs_dict_goal_and_endeffector_change_prodmp = deepcopy(DEFAULT_BB_DICT_ProDMP)
kwargs_dict_goal_and_endeffector_change_prodmp['wrappers'].append(goal_endeffector_change_mp_wrapper.MPWrapper)
kwargs_dict_goal_and_endeffector_change_prodmp['name'] = f'metaworld:{_task}'
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_goal_and_endeffector_change_prodmp
)
ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProDMP"].append(_env_id)


@ -6,12 +6,63 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class BaseMetaworldMPWrapper(RawInterfaceWrapper): class BaseMetaworldMPWrapper(RawInterfaceWrapper):
mp_config = {
'inherit_defaults': False,
'ProMP': {
'wrappers': [],
'trajectory_generator_kwargs': {
'trajectory_generator_type': 'promp',
'weights_scale': 10,
},
'phase_generator_kwargs': {
'phase_generator_type': 'linear'
},
'controller_kwargs': {
'controller_type': 'metaworld',
},
'basis_generator_kwargs': {
'basis_generator_type': 'zero_rbf',
'num_basis': 5,
'num_basis_zero_start': 1
},
'black_box_kwargs': {
'condition_on_desired': False,
},
},
'DMP': {},
'ProDMP': {
'wrappers': [],
'trajectory_generator_kwargs': {
'trajectory_generator_type': 'prodmp',
'auto_scale_basis': True,
'weights_scale': 10,
# 'goal_scale': 0.,
'disable_goal': True,
},
'phase_generator_kwargs': {
'phase_generator_type': 'exp',
# 'alpha_phase' : 3,
},
'controller_kwargs': {
'controller_type': 'metaworld',
},
'basis_generator_kwargs': {
'basis_generator_type': 'prodmp',
'num_basis': 5,
'alpha': 10
},
'black_box_kwargs': {
'condition_on_desired': False,
},
},
}
@property @property
def current_pos(self) -> Union[float, int, np.ndarray]: def current_pos(self) -> Union[float, int, np.ndarray]:
r_close = self.env.data.get_joint_qpos("r_close") r_close = self.env.data.joint('r_close').qpos
return np.hstack([self.env.data.mocap_pos.flatten() / self.env.action_scale, r_close]) return np.hstack([self.env.data.mocap_pos.flatten() / self.env.action_scale, r_close])
@property @property
def current_vel(self) -> Union[float, int, np.ndarray, Tuple]: def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
return np.zeros(4, ) return np.zeros(4, )
# raise NotImplementedError("Velocity cannot be retrieved.") # raise NotImplementedError('Velocity cannot be retrieved.')


@ -9,19 +9,6 @@ class MPWrapper(BaseMetaworldMPWrapper):
and no secondary objects or end effectors are altered at the start of an episode. and no secondary objects or end effectors are altered at the start of an episode.
You can verify this by executing the code below for your environment id and check if the output is non-zero You can verify this by executing the code below for your environment id and check if the output is non-zero
at the same indices. at the same indices.
```python
import fancy_gym
env = fancy_gym.make(env_id, 1)
print(env.reset() - env.reset())
array([ 0. , 0. , 0. , 0. , 0,
0 , 0 , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0 , 0 , 0 ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , !=0 , !=0 , !=0])
```
""" """
@property @property
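The verification snippet that this docstring refers to was removed above; under the new Gymnasium-style API it could be reproduced roughly as follows (a sketch; `metaworld/reach-v2` merely stands in for the environment id in question):

```python
import gymnasium as gym
import fancy_gym  # registers the metaworld/* step-based ids

env = gym.make('metaworld/reach-v2')
obs_a, _ = env.reset()
obs_b, _ = env.reset()
# entries that differ between two resets are the components randomized per episode (e.g. the goal)
print(obs_a - obs_b)
```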


@ -9,19 +9,6 @@ class MPWrapper(BaseMetaworldMPWrapper):
and no secondary objects or end effectors are altered at the start of an episode. and no secondary objects or end effectors are altered at the start of an episode.
You can verify this by executing the code below for your environment id and check if the output is non-zero You can verify this by executing the code below for your environment id and check if the output is non-zero
at the same indices. at the same indices.
```python
import fancy_gym
env = fancy_gym.make(env_id, 1)
print(env.reset() - env.reset())
array([ !=0 , !=0 , !=0 , 0. , 0.,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , !=0 , !=0 ,
!=0 , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , !=0 , !=0 , !=0])
```
""" """
@property @property


@ -9,19 +9,6 @@ class MPWrapper(BaseMetaworldMPWrapper):
and no secondary objects or end effectors are altered at the start of an episode. and no secondary objects or end effectors are altered at the start of an episode.
You can verify this by executing the code below for your environment id and check if the output is non-zero You can verify this by executing the code below for your environment id and check if the output is non-zero
at the same indices. at the same indices.
```python
import fancy_gym
env = fancy_gym.make(env_id, 1)
print(env.reset() - env.reset())
array([ 0. , 0. , 0. , 0. , !=0,
!=0 , !=0 , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , !=0 , !=0 , !=0 ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , !=0 , !=0 , !=0])
```
""" """
@property @property


@ -0,0 +1,97 @@
import random
from typing import Iterable, Type, Union, Optional
import numpy as np
from gymnasium import register as gym_register
import uuid
import gymnasium as gym
import numpy as np
from fancy_gym.utils.env_compatibility import EnvCompatibility
try:
import metaworld
except Exception:
print('[FANCY GYM] Metaworld not available')
class FixMetaworldHasIncorrectObsSpaceWrapper(gym.Wrapper, gym.utils.RecordConstructorArgs):
def __init__(self, env: gym.Env):
gym.utils.RecordConstructorArgs.__init__(self)
gym.Wrapper.__init__(self, env)
eos = env.observation_space
eas = env.action_space
Obs_Space_Class = getattr(gym.spaces, str(eos.__class__).split("'")[1].split('.')[-1])
Act_Space_Class = getattr(gym.spaces, str(eas.__class__).split("'")[1].split('.')[-1])
self.observation_space = Obs_Space_Class(low=eos.low-np.inf, high=eos.high+np.inf, dtype=eos.dtype)
self.action_space = Act_Space_Class(low=eas.low, high=eas.high, dtype=eas.dtype)
class FixMetaworldIncorrectResetPathLengthWrapper(gym.Wrapper, gym.utils.RecordConstructorArgs):
def __init__(self, env: gym.Env):
gym.utils.RecordConstructorArgs.__init__(self)
gym.Wrapper.__init__(self, env)
def reset(self, **kwargs):
ret = self.env.reset(**kwargs)
head = self.env
try:
for i in range(16):
head.curr_path_length = 0
head = head.env
except:
pass
return ret
class FixMetaworldIgnoresSeedOnResetWrapper(gym.Wrapper, gym.utils.RecordConstructorArgs):
def __init__(self, env: gym.Env):
gym.utils.RecordConstructorArgs.__init__(self)
gym.Wrapper.__init__(self, env)
def reset(self, **kwargs):
print('[!] You just called .reset on a Metaworld env and supplied a seed. Metaworld currently does not correctly implement seeding. Do not rely on deterministic behavior.')
if 'seed' in kwargs:
self.env.seed(kwargs['seed'])
return self.env.reset(**kwargs)
def make_metaworld(underlying_id: str, seed: int = 1, render_mode: Optional[str] = None, **kwargs):
if underlying_id not in metaworld.ML1.ENV_NAMES:
raise ValueError(f'Specified environment "{underlying_id}" not present in metaworld ML1.')
env = metaworld.envs.ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE[underlying_id + "-goal-observable"](seed=seed, **kwargs)
# setting this avoids generating the same initialization after each reset
env._freeze_rand_vec = False
# New argument to use global seeding
env.seeded_rand_vec = True
# TODO remove, when this has been fixed upstream
env = FixMetaworldHasIncorrectObsSpaceWrapper(env)
# TODO remove, when this has been fixed upstream
# env = FixMetaworldIncorrectResetPathLengthWrapper(env)
# TODO remove, when this has been fixed upstream
env = FixMetaworldIgnoresSeedOnResetWrapper(env)
return env
def register_all_ML1(**kwargs):
for env_id in metaworld.ML1.ENV_NAMES:
_env = metaworld.envs.ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE[env_id + "-goal-observable"](seed=0)
max_episode_steps = _env.max_path_length
gym_register(
id='metaworld/'+env_id,
entry_point=make_metaworld,
max_episode_steps=max_episode_steps,
kwargs={
'underlying_id': env_id
},
**kwargs
)
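A usage sketch for the step-based ids registered by `register_all_ML1` (this assumes the package invokes it on import; note the seeding caveat printed by `FixMetaworldIgnoresSeedOnResetWrapper` above):

```python
import gymnasium as gym
import fancy_gym  # assumed to invoke register_all_ML1() on import

env = gym.make('metaworld/bin-picking-v2')
obs, info = env.reset(seed=1)  # seeding is best-effort, see the warning wrapper above
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
```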


@ -4,11 +4,12 @@ These are the Environment Wrappers for selected [OpenAI Gym](https://gym.openai.
the Motion Primitive gym interface for them. the Motion Primitive gym interface for them.
## MP Environments ## MP Environments
These environments are wrapped-versions of their OpenAI-gym counterparts. These environments are wrapped-versions of their OpenAI-gym counterparts.
|Name| Description|Trajectory Horizon|Action Dimension|Context Dimension | Name | Description | Trajectory Horizon | Action Dimension |
|---|---|---|---|---| | ------------------------------------ | -------------------------------------------------------------------- | ------------------ | ---------------- |
|`ContinuousMountainCarProMP-v0`| A ProMP wrapped version of the ContinuousMountainCar-v0 environment. | 100 | 1 | `gym_ProMP/ContinuousMountainCar-v0` | A ProMP wrapped version of the ContinuousMountainCar-v0 environment. | 100 | 1 |
|`ReacherProMP-v2`| A ProMP wrapped version of the Reacher-v2 environment. | 50 | 2 | `gym_ProMP/Reacher-v2` | A ProMP wrapped version of the Reacher-v2 environment. | 50 | 2 |
|`FetchSlideDenseProMP-v1`| A ProMP wrapped version of the FetchSlideDense-v1 environment. | 50 | 4 | `gym_ProMP/FetchSlideDense-v1` | A ProMP wrapped version of the FetchSlideDense-v1 environment. | 50 | 4 |
|`FetchReachDenseProMP-v1`| A ProMP wrapped version of the FetchReachDense-v1 environment. | 50 | 4 | `gym_ProMP/FetchReachDense-v1` | A ProMP wrapped version of the FetchReachDense-v1 environment. | 50 | 4 |
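For reference, the renamed ids from the table can be instantiated directly through Gymnasium once `fancy_gym` is imported; a minimal sketch (one step executes the full ProMP trajectory):

```python
import gymnasium as gym
import fancy_gym  # registers the gym_ProMP/* ids listed above

env = gym.make('gym_ProMP/Reacher-v2')
obs, info = env.reset(seed=1)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
print(info['trajectory_length'])  # number of underlying simulation steps executed
```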


@ -1,45 +1,16 @@
from copy import deepcopy from copy import deepcopy
from gym import register from ..envs.registry import register, upgrade
from . import mujoco from . import mujoco
from .deprecated_needs_gym_robotics import robotics from .deprecated_needs_gym_robotics import robotics
ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS = {"DMP": [], "ProMP": [], "ProDMP": []} upgrade(
id='Reacher-v2',
DEFAULT_BB_DICT_ProMP = { mp_wrapper=mujoco.reacher_v2.MPWrapper,
"name": 'EnvName', add_mp_types=['ProMP'],
"wrappers": [],
"trajectory_generator_kwargs": {
'trajectory_generator_type': 'promp'
},
"phase_generator_kwargs": {
'phase_generator_type': 'linear'
},
"controller_kwargs": {
'controller_type': 'motor',
"p_gains": 1.0,
"d_gains": 0.1,
},
"basis_generator_kwargs": {
'basis_generator_type': 'zero_rbf',
'num_basis': 5,
'num_basis_zero_start': 1
}
}
kwargs_dict_reacher_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
kwargs_dict_reacher_promp['controller_kwargs']['p_gains'] = 0.6
kwargs_dict_reacher_promp['controller_kwargs']['d_gains'] = 0.075
kwargs_dict_reacher_promp['basis_generator_kwargs']['num_basis'] = 6
kwargs_dict_reacher_promp['name'] = "Reacher-v2"
kwargs_dict_reacher_promp['wrappers'].append(mujoco.reacher_v2.MPWrapper)
register(
id='ReacherProMP-v2',
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_reacher_promp
) )
ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append("ReacherProMP-v2")
""" """
The Fetch environments are not supported by gym anymore. A new repository (gym_robotics) is supporting the environments. The Fetch environments are not supported by gym anymore. A new repository (gym_robotics) is supporting the environments.
However, the usage and so on needs to be checked However, the usage and so on needs to be checked


@ -6,6 +6,28 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper): class MPWrapper(RawInterfaceWrapper):
mp_config = {
'ProMP': {
"trajectory_generator_kwargs": {
'trajectory_generator_type': 'promp'
},
"phase_generator_kwargs": {
'phase_generator_type': 'linear'
},
"controller_kwargs": {
'controller_type': 'motor',
"p_gains": 0.6,
"d_gains": 0.075,
},
"basis_generator_kwargs": {
'basis_generator_type': 'zero_rbf',
'num_basis': 6,
'num_basis_zero_start': 1
}
},
'DMP': {},
'ProDMP': {},
}
@property @property
def current_vel(self) -> Union[float, int, np.ndarray]: def current_vel(self) -> Union[float, int, np.ndarray]:


@ -0,0 +1,11 @@
import gymnasium as gym
class EnvCompatibility(gym.wrappers.EnvCompatibility):
def __getattr__(self, item):
"""Propagate only non-existent properties to wrapped env."""
if item.startswith('_'):
raise AttributeError("attempted to get missing private attribute '{}'".format(item))
if item in self.__dict__:
return getattr(self, item)
return getattr(self.env, item)


@ -1,17 +1,27 @@
import logging from fancy_gym.utils.wrappers import TimeAwareObservation
import re from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
from fancy_gym.black_box.factory.trajectory_generator_factory import get_trajectory_generator
from fancy_gym.black_box.factory.phase_generator_factory import get_phase_generator
from fancy_gym.black_box.factory.controller_factory import get_controller
from fancy_gym.black_box.factory.basis_generator_factory import get_basis_generator
from fancy_gym.black_box.black_box_wrapper import BlackBoxWrapper
import uuid import uuid
from collections.abc import MutableMapping from collections.abc import MutableMapping
from copy import deepcopy
from math import ceil from math import ceil
from typing import Iterable, Type, Union from typing import Iterable, Type, Union, Optional
import gym import gymnasium as gym
from gymnasium import make
import numpy as np import numpy as np
from gym.envs.registration import register, registry from gymnasium.envs.registration import register, registry
from gymnasium.wrappers import TimeLimit
from fancy_gym.utils.env_compatibility import EnvCompatibility
from fancy_gym.utils.wrappers import FlattenObservation
try: try:
from dm_control import suite, manipulation import shimmy
from shimmy.dm_control_compatibility import EnvType
except ImportError: except ImportError:
pass pass
@ -21,111 +31,44 @@ except Exception:
# catch Exception as Import error does not catch missing mujoco-py # catch Exception as Import error does not catch missing mujoco-py
pass pass
import fancy_gym
from fancy_gym.black_box.black_box_wrapper import BlackBoxWrapper
from fancy_gym.black_box.factory.basis_generator_factory import get_basis_generator
from fancy_gym.black_box.factory.controller_factory import get_controller
from fancy_gym.black_box.factory.phase_generator_factory import get_phase_generator
from fancy_gym.black_box.factory.trajectory_generator_factory import get_trajectory_generator
from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
from fancy_gym.utils.time_aware_observation import TimeAwareObservation
from fancy_gym.utils.utils import nested_update
def _make_wrapped_env(env: gym.Env, wrappers: Iterable[Type[gym.Wrapper]], seed=1, fallback_max_steps=None):
def make_rank(env_id: str, seed: int, rank: int = 0, return_callable=True, **kwargs):
"""
TODO: Do we need this?
Generate a callable to create a new gym environment with a given seed.
The rank is added to the seed and can be used for example when using vector environments.
E.g. [make_rank("my_env_name-v0", 123, i) for i in range(8)] creates a list of 8 environments
with seeds 123 through 130.
Hence, testing environments should be seeded with a value which is offset by the number of training environments.
Here e.g. [make_rank("my_env_name-v0", 123 + 8, i) for i in range(5)] for 5 testing environments
Args:
env_id: name of the environment
seed: seed for deterministic behaviour
rank: environment rank for deterministic over multiple seeds behaviour
return_callable: If True returns a callable to create the environment instead of the environment itself.
Returns:
"""
def f():
return make(env_id, seed + rank, **kwargs)
return f if return_callable else f()
def make(env_id: str, seed: int, **kwargs):
"""
Converts an env_id to an environment with the gym API.
This also works for DeepMind Control Suite environments that are wrapped using the DMCWrapper, they can be
specified with "dmc:domain_name-task_name"
Analogously, metaworld tasks can be created as "metaworld:env_id-v2".
Args:
env_id: spec or env_id for gym tasks, external environments require a domain specification
**kwargs: Additional kwargs for the constructor such as pixel observations, etc.
Returns: Gym environment
"""
if ':' in env_id:
split_id = env_id.split(':')
framework, env_id = split_id[-2:]
else:
framework = None
if framework == 'metaworld':
# MetaWorld environment
env = make_metaworld(env_id, seed, **kwargs)
elif framework == 'dmc':
# DeepMind Control environment
env = make_dmc(env_id, seed, **kwargs)
else:
env = make_gym(env_id, seed, **kwargs)
env.seed(seed)
env.action_space.seed(seed)
env.observation_space.seed(seed)
return env
def _make_wrapped_env(env_id: str, wrappers: Iterable[Type[gym.Wrapper]], seed=1, **kwargs):
""" """
Helper function for creating a wrapped gym environment using MPs. Helper function for creating a wrapped gym environment using MPs.
It adds all provided wrappers to the specified environment and verifies at least one RawInterfaceWrapper is It adds all provided wrappers to the specified environment and verifies at least one RawInterfaceWrapper is
provided to expose the interface for MPs. provided to expose the interface for MPs.
Args: Args:
env_id: name of the environment env: base environment to wrap
wrappers: list of wrappers (at least a RawInterfaceWrapper), wrappers: list of wrappers (at least a RawInterfaceWrapper),
seed: seed of environment seed: seed of environment
Returns: gym environment with all specified wrappers applied Returns: gym environment with all specified wrappers applied
""" """
# _env = gym.make(env_id) if fallback_max_steps:
_env = make(env_id, seed, **kwargs) env = ensure_finite_time(env, fallback_max_steps)
has_black_box_wrapper = False has_black_box_wrapper = False
head = env
while hasattr(head, 'env'):
if isinstance(head, RawInterfaceWrapper):
has_black_box_wrapper = True
break
head = head.env
for w in wrappers: for w in wrappers:
# only wrap the environment if not BlackBoxWrapper, e.g. for vision # only wrap the environment if not BlackBoxWrapper, e.g. for vision
if issubclass(w, RawInterfaceWrapper): if issubclass(w, RawInterfaceWrapper):
has_black_box_wrapper = True has_black_box_wrapper = True
_env = w(_env) env = w(env)
if not has_black_box_wrapper: if not has_black_box_wrapper:
raise ValueError("A RawInterfaceWrapper is required in order to leverage movement primitive environments.") raise ValueError("A RawInterfaceWrapper is required in order to leverage movement primitive environments.")
return _env return env
def make_bb( def make_bb(
env_id: str, wrappers: Iterable, black_box_kwargs: MutableMapping, traj_gen_kwargs: MutableMapping, env: Union[gym.Env, str], wrappers: Iterable, black_box_kwargs: MutableMapping, traj_gen_kwargs: MutableMapping,
controller_kwargs: MutableMapping, phase_kwargs: MutableMapping, basis_kwargs: MutableMapping, seed: int = 1, controller_kwargs: MutableMapping, phase_kwargs: MutableMapping, basis_kwargs: MutableMapping,
**kwargs): time_limit: int = None, fallback_max_steps: int = None, **kwargs):
""" """
This can also be used standalone for manually building a custom DMP environment. This can also be used standalone for manually building a custom DMP environment.
Args: Args:
@ -133,7 +76,7 @@ def make_bb(
basis_kwargs: kwargs for the basis generator basis_kwargs: kwargs for the basis generator
phase_kwargs: kwargs for the phase generator phase_kwargs: kwargs for the phase generator
controller_kwargs: kwargs for the tracking controller controller_kwargs: kwargs for the tracking controller
env_id: base_env_name, env: step based environment (or environment id),
wrappers: list of wrappers (at least a RawInterfaceWrapper), wrappers: list of wrappers (at least a RawInterfaceWrapper),
seed: seed of environment seed: seed of environment
traj_gen_kwargs: dict of at least {num_dof: int, num_basis: int} for DMP traj_gen_kwargs: dict of at least {num_dof: int, num_basis: int} for DMP
@ -141,7 +84,7 @@ def make_bb(
Returns: DMP wrapped gym env Returns: DMP wrapped gym env
""" """
_verify_time_limit(traj_gen_kwargs.get("duration"), kwargs.get("time_limit")) _verify_time_limit(traj_gen_kwargs.get("duration"), time_limit)
learn_sub_trajs = black_box_kwargs.get('learn_sub_trajectories') learn_sub_trajs = black_box_kwargs.get('learn_sub_trajectories')
do_replanning = black_box_kwargs.get('replanning_schedule') do_replanning = black_box_kwargs.get('replanning_schedule')
@ -153,12 +96,19 @@ def make_bb(
# Add as first wrapper in order to alter observation # Add as first wrapper in order to alter observation
wrappers.insert(0, TimeAwareObservation) wrappers.insert(0, TimeAwareObservation)
env = _make_wrapped_env(env_id=env_id, wrappers=wrappers, seed=seed, **kwargs) if isinstance(env, str):
env = make(env, **kwargs)
env = _make_wrapped_env(env=env, wrappers=wrappers, fallback_max_steps=fallback_max_steps)
# BB expects a spaces.Box to be exposed, need to convert for dict-observations
if type(env.observation_space) == gym.spaces.dict.Dict:
env = FlattenObservation(env)
traj_gen_kwargs['action_dim'] = traj_gen_kwargs.get('action_dim', np.prod(env.action_space.shape).item()) traj_gen_kwargs['action_dim'] = traj_gen_kwargs.get('action_dim', np.prod(env.action_space.shape).item())
if black_box_kwargs.get('duration') is None: if black_box_kwargs.get('duration') is None:
black_box_kwargs['duration'] = env.spec.max_episode_steps * env.dt black_box_kwargs['duration'] = get_env_duration(env)
if phase_kwargs.get('tau') is None: if phase_kwargs.get('tau') is None:
phase_kwargs['tau'] = black_box_kwargs['duration'] phase_kwargs['tau'] = black_box_kwargs['duration']
@ -186,156 +136,27 @@ def make_bb(
return bb_env return bb_env
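A standalone usage sketch for the updated `make_bb` signature, which now takes an environment instance (or id) first and `fallback_max_steps` as an explicit keyword; the environment id, wrapper, and generator kwargs below mirror values used in the tests of this diff and are illustrative only:

```python
import gymnasium as gym
import fancy_gym
from fancy_gym.utils.make_env_helpers import make_bb

env = make_bb(
    gym.make('fancy/Reacher5d-v0'),                      # step-based env; an id string also works
    wrappers=[fancy_gym.envs.mujoco.reacher.MPWrapper],  # must include a RawInterfaceWrapper
    black_box_kwargs={},
    traj_gen_kwargs={'trajectory_generator_type': 'promp'},
    controller_kwargs={'controller_type': 'motor'},
    phase_kwargs={'phase_generator_type': 'linear'},
    basis_kwargs={'basis_generator_type': 'zero_rbf'},
    fallback_max_steps=100,
)
obs, info = env.reset(seed=1)
```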
def make_bb_env_helper(**kwargs): def ensure_finite_time(env: gym.Env, fallback_max_steps=500):
""" cur_limit = env.spec.max_episode_steps
Helper function for registering a black box gym environment. if not cur_limit:
Args: if hasattr(env.unwrapped, 'max_path_length'):
**kwargs: expects at least the following: return TimeLimit(env, env.unwrapped.__getattribute__('max_path_length'))
{ return TimeLimit(env, fallback_max_steps)
"name": base environment name.
"wrappers": list of wrappers (at least an BlackBoxWrapper is required),
"traj_gen_kwargs": {
"trajectory_generator_type": type_of_your_movement_primitive,
non default arguments for the movement primitive instance
...
}
"controller_kwargs": {
"controller_type": type_of_your_controller,
non default arguments for the tracking_controller instance
...
},
"basis_generator_kwargs": {
"basis_generator_type": type_of_your_basis_generator,
non default arguments for the basis generator instance
...
},
"phase_generator_kwargs": {
"phase_generator_type": type_of_your_phase_generator,
non default arguments for the phase generator instance
...
},
}
Returns: MP wrapped gym env
"""
seed = kwargs.pop("seed", None)
wrappers = kwargs.pop("wrappers")
traj_gen_kwargs = kwargs.pop("trajectory_generator_kwargs", {})
black_box_kwargs = kwargs.pop('black_box_kwargs', {})
contr_kwargs = kwargs.pop("controller_kwargs", {})
phase_kwargs = kwargs.pop("phase_generator_kwargs", {})
basis_kwargs = kwargs.pop("basis_generator_kwargs", {})
return make_bb(env_id=kwargs.pop("name"), wrappers=wrappers,
black_box_kwargs=black_box_kwargs,
traj_gen_kwargs=traj_gen_kwargs, controller_kwargs=contr_kwargs,
phase_kwargs=phase_kwargs,
basis_kwargs=basis_kwargs, **kwargs, seed=seed)
def make_dmc(
env_id: str,
seed: int = None,
visualize_reward: bool = True,
time_limit: Union[None, float] = None,
**kwargs
):
if not re.match(r"\w+-\w+", env_id):
raise ValueError("env_id does not have the following structure: 'domain_name-task_name'")
domain_name, task_name = env_id.split("-")
if task_name.endswith("_vision"):
# TODO
raise ValueError("The vision interface for manipulation tasks is currently not supported.")
if (domain_name, task_name) not in suite.ALL_TASKS and task_name not in manipulation.ALL:
raise ValueError(f'Specified domain "{domain_name}" and task "{task_name}" combination does not exist.')
# env_id = f'dmc_{domain_name}_{task_name}_{seed}-v1'
gym_id = uuid.uuid4().hex + '-v1'
task_kwargs = {'random': seed}
if time_limit is not None:
task_kwargs['time_limit'] = time_limit
# create task
# Accessing private attribute because DMC does not expose time_limit or step_limit.
# Only the current time_step/time as well as the control_timestep can be accessed.
if domain_name == "manipulation":
env = manipulation.load(environment_name=task_name, seed=seed)
max_episode_steps = ceil(env._time_limit / env.control_timestep())
else:
env = suite.load(domain_name=domain_name, task_name=task_name, task_kwargs=task_kwargs,
visualize_reward=visualize_reward, environment_kwargs=kwargs)
max_episode_steps = int(env._step_limit)
register(
id=gym_id,
entry_point='fancy_gym.dmc.dmc_wrapper:DMCWrapper',
kwargs={'env': lambda: env},
max_episode_steps=max_episode_steps,
)
env = gym.make(gym_id)
env.seed(seed)
return env return env
def make_metaworld(env_id: str, seed: int, **kwargs): def get_env_duration(env: gym.Env):
if env_id not in metaworld.ML1.ENV_NAMES:
raise ValueError(f'Specified environment "{env_id}" not present in metaworld ML1.')
_env = metaworld.envs.ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE[env_id + "-goal-observable"](seed=seed, **kwargs)
# setting this avoids generating the same initialization after each reset
_env._freeze_rand_vec = False
# New argument to use global seeding
_env.seeded_rand_vec = True
gym_id = uuid.uuid4().hex + '-v1'
register(
id=gym_id,
entry_point=lambda: _env,
max_episode_steps=_env.max_path_length,
)
# TODO enable checker when the incorrect dtype of obs and observation space are fixed by metaworld
env = gym.make(gym_id, disable_env_checker=True)
return env
def make_gym(env_id, seed, **kwargs):
"""
Create
Args:
env_id:
seed:
**kwargs:
Returns:
"""
# Getting the existing keywords to allow for nested dict updates for BB envs
# gym only allows for non nested updates.
try: try:
all_kwargs = deepcopy(registry.get(env_id).kwargs) duration = env.spec.max_episode_steps * env.dt
except AttributeError as e: except (AttributeError, TypeError) as e:
logging.error(f'The gym environment with id {env_id} could not been found.') if env.env_type is EnvType.COMPOSER:
raise e max_episode_steps = ceil(env.unwrapped._time_limit / env.dt)
nested_update(all_kwargs, kwargs) elif env.env_type is EnvType.RL_CONTROL:
kwargs = all_kwargs max_episode_steps = int(env.unwrapped._step_limit)
else:
# Add seed to kwargs for bb environments to pass seed to step environments raise e
all_bb_envs = sum(fancy_gym.ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS.values(), []) duration = max_episode_steps * env.control_timestep()
if env_id in all_bb_envs: return duration
kwargs.update({"seed": seed})
# Gym
env = gym.make(env_id, **kwargs)
return env
def _verify_time_limit(mp_time_limit: Union[None, float], env_time_limit: Union[None, float]): def _verify_time_limit(mp_time_limit: Union[None, float], env_time_limit: Union[None, float]):


@ -1,78 +0,0 @@
"""
Adapted from: https://github.com/openai/gym/blob/907b1b20dd9ac0cba5803225059b9c6673702467/gym/wrappers/time_aware_observation.py
License: MIT
Copyright (c) 2016 OpenAI (https://openai.com)
Wrapper for adding time aware observations to environment observation.
"""
import gym
import numpy as np
from gym.spaces import Box
class TimeAwareObservation(gym.ObservationWrapper):
"""Augment the observation with the current time step in the episode.
The observation space of the wrapped environment is assumed to be a flat :class:`Box`.
In particular, pixel observations are not supported. This wrapper will append the current timestep
within the current episode to the observation.
Example:
>>> import gym
>>> env = gym.make('CartPole-v1')
>>> env = TimeAwareObservation(env)
>>> env.reset()
array([ 0.03810719, 0.03522411, 0.02231044, -0.01088205, 0. ])
>>> env.step(env.action_space.sample())[0]
array([ 0.03881167, -0.16021058, 0.0220928 , 0.28875574, 1. ])
"""
def __init__(self, env: gym.Env):
"""Initialize :class:`TimeAwareObservation` that requires an environment with a flat :class:`Box`
observation space.
Args:
env: The environment to apply the wrapper
"""
super().__init__(env)
assert isinstance(env.observation_space, Box)
low = np.append(self.observation_space.low, 0.0)
high = np.append(self.observation_space.high, 1.0)
self.observation_space = Box(low, high, dtype=self.observation_space.dtype)
self.t = 0
self._max_episode_steps = env.spec.max_episode_steps
def observation(self, observation):
"""Adds to the observation with the current time step normalized with max steps.
Args:
observation: The observation to add the time step to
Returns:
The observation with the time step appended to
"""
return np.append(observation, self.t / self._max_episode_steps)
def step(self, action):
"""Steps through the environment, incrementing the time step.
Args:
action: The action to take
Returns:
The environment's step using the action.
"""
self.t += 1
return super().step(action)
def reset(self, **kwargs):
"""Reset the environment setting the time to zero.
Args:
**kwargs: Kwargs to apply to env.reset()
Returns:
The reset environment
"""
self.t = 0
return super().reset(**kwargs)

fancy_gym/utils/wrappers.py Normal file

@ -0,0 +1,130 @@
from gymnasium.spaces import Box, Dict, flatten, flatten_space
try:
from gym.spaces import Box as OldBox
except ImportError:
OldBox = None
import gymnasium as gym
import numpy as np
import copy
class TimeAwareObservation(gym.ObservationWrapper, gym.utils.RecordConstructorArgs):
"""Augment the observation with the current time step in the episode.
The observation space of the wrapped environment is assumed to be a flat :class:`Box` or flattable :class:`Dict`.
In particular, pixel observations are not supported. This wrapper will append the current progress within the current episode to the observation.
The progress will be indicated as a number between 0 and 1.
"""
def __init__(self, env: gym.Env, enforce_dtype_float32=False):
"""Initialize :class:`TimeAwareObservation` that requires an environment with a flat :class:`Box` or flattable :class:`Dict` observation space.
Args:
env: The environment to apply the wrapper
"""
gym.utils.RecordConstructorArgs.__init__(self)
gym.ObservationWrapper.__init__(self, env)
allowed_classes = [Box, OldBox, Dict]
if enforce_dtype_float32:
assert env.observation_space.dtype == np.float32, 'TimeAwareObservation was given an environment with a dtype!=np.float32 ('+str(
env.observation_space.dtype)+'). This requirement can be removed by setting enforce_dtype_float32=False.'
assert env.observation_space.__class__ in allowed_classes, str(env.observation_space)+' is not supported. Only Box or Dict'
if env.observation_space.__class__ in [Box, OldBox]:
dtype = env.observation_space.dtype
low = np.append(env.observation_space.low, 0.0)
high = np.append(env.observation_space.high, 1.0)
self.observation_space = Box(low, high, dtype=dtype)
else:
spaces = copy.copy(env.observation_space.spaces)
dtype = np.float64
spaces['time_awareness'] = Box(0, 1, dtype=dtype)
self.observation_space = Dict(spaces)
self.is_vector_env = getattr(env, "is_vector_env", False)
def observation(self, observation):
"""Adds to the observation with the current time step.
Args:
observation: The observation to add the time step to
Returns:
The observation with the time step appended to (relative to total number of steps)
"""
if self.observation_space.__class__ in [Box, OldBox]:
return np.append(observation, self.t / self.env.spec.max_episode_steps)
else:
obs = copy.copy(observation)
obs['time_awareness'] = self.t / self.env.spec.max_episode_steps
return obs
def step(self, action):
"""Steps through the environment, incrementing the time step.
Args:
action: The action to take
Returns:
The environment's step using the action.
"""
self.t += 1
return super().step(action)
def reset(self, **kwargs):
"""Reset the environment setting the time to zero.
Args:
**kwargs: Kwargs to apply to env.reset()
Returns:
The reset environment
"""
self.t = 0
return super().reset(**kwargs)
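A minimal sketch of the new wrapper on a flat-Box environment; the appended entry is the episode progress in [0, 1]:

```python
import gymnasium as gym
from fancy_gym.utils.wrappers import TimeAwareObservation

env = TimeAwareObservation(gym.make('CartPole-v1'))
obs, info = env.reset(seed=1)
print(obs[-1])   # 0.0 right after reset
obs, *_ = env.step(env.action_space.sample())
print(obs[-1])   # 1 / spec.max_episode_steps after one step
```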
class FlattenObservation(gym.ObservationWrapper, gym.utils.RecordConstructorArgs):
"""Observation wrapper that flattens the observation.
Example:
>>> import gymnasium as gym
>>> from gymnasium.wrappers import FlattenObservation
>>> env = gym.make("CarRacing-v2")
>>> env.observation_space.shape
(96, 96, 3)
>>> env = FlattenObservation(env)
>>> env.observation_space.shape
(27648,)
>>> obs, _ = env.reset()
>>> obs.shape
(27648,)
"""
def __init__(self, env: gym.Env):
"""Flattens the observations of an environment.
Args:
env: The environment to apply the wrapper
"""
gym.utils.RecordConstructorArgs.__init__(self)
gym.ObservationWrapper.__init__(self, env)
self.observation_space = flatten_space(env.observation_space)
def observation(self, observation):
"""Flattens an observation.
Args:
observation: The observation to flatten
Returns:
The flattened observation
"""
try:
return flatten(self.env.observation_space, observation)
except:
return np.array([flatten(self.env.observation_space, observation[i]) for i in range(len(observation))])

icon.svg Normal file

File diff suppressed because one or more lines are too long

@ -6,33 +6,38 @@ from setuptools import setup, find_packages
# Environment-specific dependencies for dmc and metaworld # Environment-specific dependencies for dmc and metaworld
extras = { extras = {
"dmc": ["dm_control>=1.0.1"], 'dmc': ['shimmy[dm-control]', 'Shimmy==1.0.0'],
"metaworld": ["metaworld @ git+https://github.com/rlworkgroup/metaworld.git@master#egg=metaworld", 'metaworld': ['metaworld @ git+https://github.com/Farama-Foundation/Metaworld.git@d155d0051630bb365ea6a824e02c66c068947439#egg=metaworld'],
'mujoco-py<2.2,>=2.1', 'box2d': ['gymnasium[box2d]>=0.26.0'],
'scipy' 'mujoco': ['mujoco==2.3.3', 'gymnasium[mujoco]>0.26.0'],
], 'mujoco-legacy': ['mujoco-py >=2.1,<2.2', 'cython<3'],
'jax': ["jax >=0.4.0", "jaxlib >=0.4.0"],
} }
# All dependencies # All dependencies
all_groups = set(extras.keys()) all_groups = set(extras.keys())
extras["all"] = list(set(itertools.chain.from_iterable(map(lambda group: extras[group], all_groups)))) extras["all"] = list(set(itertools.chain.from_iterable(
map(lambda group: extras[group], all_groups))))
extras['testing'] = extras["all"] + ['pytest']
def find_package_data(extensions_to_include: List[str]) -> List[str]: def find_package_data(extensions_to_include: List[str]) -> List[str]:
envs_dir = Path("fancy_gym/envs/mujoco") envs_dir = Path("fancy_gym/envs/mujoco")
package_data_paths = [] package_data_paths = []
for extension in extensions_to_include: for extension in extensions_to_include:
package_data_paths.extend([str(path)[10:] for path in envs_dir.rglob(extension)]) package_data_paths.extend([str(path)[10:]
for path in envs_dir.rglob(extension)])
return package_data_paths return package_data_paths
setup( setup(
author='Fabian Otto, Onur Celik', author='Fabian Otto, Onur Celik, Dominik Roth, Hongyi Zhou',
name='fancy_gym', name='fancy_gym',
version='0.2', version='1.0',
classifiers=[ classifiers=[
'Development Status :: 3 - Alpha', 'Development Status :: 4 - Beta',
'Intended Audience :: Science/Research', 'Intended Audience :: Science/Research',
'License :: OSI Approved :: MIT License', 'License :: OSI Approved :: MIT License',
'Natural Language :: English', 'Natural Language :: English',
@ -46,10 +51,11 @@ setup(
], ],
extras_require=extras, extras_require=extras,
install_requires=[ install_requires=[
'gym[mujoco]<0.25.0,>=0.24.1', 'gymnasium>=0.26.0',
'mp_pytorch<=0.1.3' 'mp_pytorch<=0.1.3'
], ],
packages=[package for package in find_packages() if package.startswith("fancy_gym")], packages=[package for package in find_packages(
) if package.startswith("fancy_gym")],
package_data={ package_data={
"fancy_gym": find_package_data(extensions_to_include=["*.stl", "*.xml"]) "fancy_gym": find_package_data(extensions_to_include=["*.stl", "*.xml"])
}, },


@ -1,14 +1,21 @@
import re
from itertools import chain from itertools import chain
from typing import Callable
import gym import gymnasium as gym
import pytest import pytest
import fancy_gym import fancy_gym
from test.utils import run_env, run_env_determinism from test.utils import run_env, run_env_determinism
GYM_IDS = [spec.id for spec in gym.envs.registry.all() if GYM_IDS = [spec.id for spec in gym.envs.registry.values() if
"fancy_gym" not in spec.entry_point and 'make_bb_env_helper' not in spec.entry_point] not isinstance(spec.entry_point, Callable) and
GYM_MP_IDS = chain(*fancy_gym.ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS.values()) "fancy_gym" not in spec.entry_point and 'make_bb_env_helper' not in spec.entry_point
and 'jax' not in spec.id.lower()
and not re.match(r'GymV2.Environment', spec.id)
]
GYM_MP_IDS = fancy_gym.ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS['all']
SEED = 1 SEED = 1


@ -1,21 +1,23 @@
from itertools import chain from itertools import chain
from typing import Tuple, Type, Union, Optional, Callable from typing import Tuple, Type, Union, Optional, Callable
import gym import gymnasium as gym
import numpy as np import numpy as np
import pytest import pytest
from gym import register from gymnasium import register, make
from gym.core import ActType, ObsType from gymnasium.core import ActType, ObsType
import fancy_gym import fancy_gym
from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
from fancy_gym.utils.time_aware_observation import TimeAwareObservation from fancy_gym.utils.wrappers import TimeAwareObservation
SEED = 1 SEED = 1
ENV_IDS = ['Reacher5d-v0', 'dmc:ball_in_cup-catch', 'metaworld:reach-v2', 'Reacher-v2'] ENV_IDS = ['fancy/Reacher5d-v0', 'dm_control/ball_in_cup-catch-v0', 'metaworld/reach-v2', 'Reacher-v2']
WRAPPERS = [fancy_gym.envs.mujoco.reacher.MPWrapper, fancy_gym.dmc.suite.ball_in_cup.MPWrapper, WRAPPERS = [fancy_gym.envs.mujoco.reacher.MPWrapper, fancy_gym.dmc.suite.ball_in_cup.MPWrapper,
fancy_gym.meta.goal_object_change_mp_wrapper.MPWrapper, fancy_gym.open_ai.mujoco.reacher_v2.MPWrapper] fancy_gym.meta.goal_object_change_mp_wrapper.MPWrapper, fancy_gym.open_ai.mujoco.reacher_v2.MPWrapper]
ALL_MP_ENVS = chain(*fancy_gym.ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS.values()) ALL_MP_ENVS = fancy_gym.ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS['all']
MAX_STEPS_FALLBACK = 100
class Object(object): class Object(object):
@ -32,10 +34,12 @@ class ToyEnv(gym.Env):
def reset(self, *, seed: Optional[int] = None, return_info: bool = False, def reset(self, *, seed: Optional[int] = None, return_info: bool = False,
options: Optional[dict] = None) -> Union[ObsType, Tuple[ObsType, dict]]: options: Optional[dict] = None) -> Union[ObsType, Tuple[ObsType, dict]]:
return np.array([-1]) obs, options = np.array([-1]), {}
return obs, options
def step(self, action: ActType) -> Tuple[ObsType, float, bool, dict]: def step(self, action: ActType) -> Tuple[ObsType, float, bool, dict]:
return np.array([-1]), 1, False, {} obs, reward, terminated, truncated, info = np.array([-1]), 1, False, False, {}
return obs, reward, terminated, truncated, info
def render(self, mode="human"): def render(self, mode="human"):
pass pass
@ -76,7 +80,7 @@ def test_missing_local_state(mp_type: str):
{'controller_type': 'motor'}, {'controller_type': 'motor'},
{'phase_generator_type': 'exp'}, {'phase_generator_type': 'exp'},
{'basis_generator_type': basis_generator_type}) {'basis_generator_type': basis_generator_type})
env.reset() env.reset(seed=SEED)
with pytest.raises(NotImplementedError): with pytest.raises(NotImplementedError):
env.step(env.action_space.sample()) env.step(env.action_space.sample())
@ -93,12 +97,14 @@ def test_verbosity(mp_type: str, env_wrap: Tuple[str, Type[RawInterfaceWrapper]]
{'controller_type': 'motor'}, {'controller_type': 'motor'},
{'phase_generator_type': 'exp'}, {'phase_generator_type': 'exp'},
{'basis_generator_type': basis_generator_type}) {'basis_generator_type': basis_generator_type})
env.reset() env.reset(seed=SEED)
info_keys = list(env.step(env.action_space.sample())[3].keys()) _obs, _reward, _terminated, _truncated, info = env.step(env.action_space.sample())
info_keys = list(info.keys())
env_step = fancy_gym.make(env_id, SEED) env_step = make(env_id)
env_step.reset() env_step.reset()
info_keys_step = env_step.step(env_step.action_space.sample())[3].keys() _obs, _reward, _terminated, _truncated, info = env_step.step(env_step.action_space.sample())
info_keys_step = info.keys()
assert all(e in info_keys for e in info_keys_step) assert all(e in info_keys for e in info_keys_step)
assert 'trajectory_length' in info_keys assert 'trajectory_length' in info_keys
@ -118,13 +124,15 @@ def test_length(mp_type: str, env_wrap: Tuple[str, Type[RawInterfaceWrapper]]):
{'trajectory_generator_type': mp_type}, {'trajectory_generator_type': mp_type},
{'controller_type': 'motor'}, {'controller_type': 'motor'},
{'phase_generator_type': 'exp'}, {'phase_generator_type': 'exp'},
{'basis_generator_type': basis_generator_type}) {'basis_generator_type': basis_generator_type}, fallback_max_steps=MAX_STEPS_FALLBACK)
for _ in range(5): for i in range(5):
env.reset() env.reset(seed=SEED)
length = env.step(env.action_space.sample())[3]['trajectory_length']
assert length == env.spec.max_episode_steps _obs, _reward, _terminated, _truncated, info = env.step(env.action_space.sample())
length = info['trajectory_length']
assert length == env.spec.max_episode_steps, f'Expected total simulation length ({length}) to be equal to spec.max_episode_steps ({env.spec.max_episode_steps}), but was not during test nr. {i}'
@pytest.mark.parametrize('mp_type', ['promp', 'dmp', 'prodmp']) @pytest.mark.parametrize('mp_type', ['promp', 'dmp', 'prodmp'])
@ -136,9 +144,10 @@ def test_aggregation(mp_type: str, reward_aggregation: Callable[[np.ndarray], fl
{'controller_type': 'motor'}, {'controller_type': 'motor'},
{'phase_generator_type': 'exp'}, {'phase_generator_type': 'exp'},
{'basis_generator_type': basis_generator_type}) {'basis_generator_type': basis_generator_type})
env.reset() env.reset(seed=SEED)
# ToyEnv only returns 1 as reward # ToyEnv only returns 1 as reward
assert env.step(env.action_space.sample())[1] == reward_aggregation(np.ones(50, )) _obs, reward, _terminated, _truncated, _info = env.step(env.action_space.sample())
assert reward == reward_aggregation(np.ones(50, ))
@pytest.mark.parametrize('mp_type', ['promp', 'dmp']) @pytest.mark.parametrize('mp_type', ['promp', 'dmp'])
@ -151,14 +160,16 @@ def test_context_space(mp_type: str, env_wrap: Tuple[str, Type[RawInterfaceWrapp
{'phase_generator_type': 'exp'}, {'phase_generator_type': 'exp'},
{'basis_generator_type': 'rbf'}) {'basis_generator_type': 'rbf'})
# check if observation space matches with the specified mask values which are true # check if observation space matches with the specified mask values which are true
env_step = fancy_gym.make(env_id, SEED) env_step = make(env_id)
wrapper = wrapper_class(env_step) wrapper = wrapper_class(env_step)
assert env.observation_space.shape == wrapper.context_mask[wrapper.context_mask].shape assert env.observation_space.shape == wrapper.context_mask[wrapper.context_mask].shape
@pytest.mark.parametrize('mp_type', ['promp', 'dmp', 'prodmp']) @pytest.mark.parametrize('mp_type', ['promp', 'dmp', 'prodmp'])
@pytest.mark.parametrize('num_dof', [0, 1, 2, 5]) @pytest.mark.parametrize('num_dof', [0, 1, 2, 5])
@pytest.mark.parametrize('num_basis', [0, 1, 2, 5]) @pytest.mark.parametrize('num_basis', [
pytest.param(0, marks=pytest.mark.xfail(reason="Basis Length 0 is not yet implemented.")),
1, 2, 5])
@pytest.mark.parametrize('learn_tau', [True, False]) @pytest.mark.parametrize('learn_tau', [True, False])
@pytest.mark.parametrize('learn_delay', [True, False]) @pytest.mark.parametrize('learn_delay', [True, False])
def test_action_space(mp_type: str, num_dof: int, num_basis: int, learn_tau: bool, learn_delay: bool): def test_action_space(mp_type: str, num_dof: int, num_basis: int, learn_tau: bool, learn_delay: bool):
@ -219,16 +230,18 @@ def test_learn_tau(mp_type: str, tau: float):
'learn_delay': False 'learn_delay': False
}, },
{'basis_generator_type': basis_generator_type, {'basis_generator_type': basis_generator_type,
}, seed=SEED) })
d = True env.reset(seed=SEED)
done = True
for i in range(5): for i in range(5):
if d: if done:
env.reset() env.reset(seed=SEED)
action = env.action_space.sample() action = env.action_space.sample()
action[0] = tau action[0] = tau
obs, r, d, info = env.step(action) _obs, _reward, terminated, truncated, info = env.step(action)
done = terminated or truncated
length = info['trajectory_length'] length = info['trajectory_length']
assert length == env.spec.max_episode_steps assert length == env.spec.max_episode_steps
@ -248,6 +261,8 @@ def test_learn_tau(mp_type: str, tau: float):
assert np.all(vel[:tau_time_steps - 2] != vel[-1]) assert np.all(vel[:tau_time_steps - 2] != vel[-1])
# #
# #
@pytest.mark.parametrize('mp_type', ['promp', 'prodmp']) @pytest.mark.parametrize('mp_type', ['promp', 'prodmp'])
@pytest.mark.parametrize('delay', [0, 0.25, 0.5, 0.75]) @pytest.mark.parametrize('delay', [0, 0.25, 0.5, 0.75])
def test_learn_delay(mp_type: str, delay: float): def test_learn_delay(mp_type: str, delay: float):
@ -262,16 +277,18 @@ def test_learn_delay(mp_type: str, delay: float):
'learn_delay': True 'learn_delay': True
}, },
{'basis_generator_type': basis_generator_type, {'basis_generator_type': basis_generator_type,
}, seed=SEED) })
d = True env.reset(seed=SEED)
done = True
for i in range(5): for i in range(5):
if d: if done:
env.reset() env.reset(seed=SEED)
action = env.action_space.sample() action = env.action_space.sample()
action[0] = delay action[0] = delay
obs, r, d, info = env.step(action) _obs, _reward, terminated, truncated, info = env.step(action)
done = terminated or truncated
length = info['trajectory_length'] length = info['trajectory_length']
assert length == env.spec.max_episode_steps assert length == env.spec.max_episode_steps
@ -290,6 +307,8 @@ def test_learn_delay(mp_type: str, delay: float):
assert np.all(vel[max(1, delay_time_steps)] != vel[0]) assert np.all(vel[max(1, delay_time_steps)] != vel[0])
# #
# #
@pytest.mark.parametrize('mp_type', ['promp', 'prodmp']) @pytest.mark.parametrize('mp_type', ['promp', 'prodmp'])
@pytest.mark.parametrize('tau', [0.25, 0.5, 0.75, 1]) @pytest.mark.parametrize('tau', [0.25, 0.5, 0.75, 1])
@pytest.mark.parametrize('delay', [0.25, 0.5, 0.75, 1]) @pytest.mark.parametrize('delay', [0.25, 0.5, 0.75, 1])
@ -305,20 +324,23 @@ def test_learn_tau_and_delay(mp_type: str, tau: float, delay: float):
'learn_delay': True 'learn_delay': True
}, },
{'basis_generator_type': basis_generator_type, {'basis_generator_type': basis_generator_type,
}, seed=SEED) })
env.reset(seed=SEED)
if env.spec.max_episode_steps * env.dt < delay + tau: if env.spec.max_episode_steps * env.dt < delay + tau:
return return
d = True done = True
for i in range(5): for i in range(5):
if d: if done:
env.reset() env.reset(seed=SEED)
action = env.action_space.sample() action = env.action_space.sample()
action[0] = tau action[0] = tau
action[1] = delay action[1] = delay
obs, r, d, info = env.step(action) _obs, _reward, terminated, truncated, info = env.step(action)
done = terminated or truncated
length = info['trajectory_length'] length = info['trajectory_length']
assert length == env.spec.max_episode_steps assert length == env.spec.max_episode_steps
@ -343,4 +365,4 @@ def test_learn_tau_and_delay(mp_type: str, tau: float, delay: float):
active_pos = pos[delay_time_steps: joint_time_steps - 1] active_pos = pos[delay_time_steps: joint_time_steps - 1]
active_vel = vel[delay_time_steps: joint_time_steps - 2] active_vel = vel[delay_time_steps: joint_time_steps - 2]
assert np.all(active_pos != pos[-1]) and np.all(active_pos != pos[0]) assert np.all(active_pos != pos[-1]) and np.all(active_pos != pos[0])
assert np.all(active_vel != vel[-1]) and np.all(active_vel != vel[0]) assert np.all(active_vel != vel[-1]) and np.all(active_vel != vel[0])


@ -1,39 +1,30 @@
from itertools import chain from itertools import chain
from typing import Callable
import gymnasium as gym
import pytest import pytest
from dm_control import suite, manipulation
import fancy_gym import fancy_gym
from test.utils import run_env, run_env_determinism from test.utils import run_env, run_env_determinism
SUITE_IDS = [f'dmc:{env}-{task}' for env, task in suite.ALL_TASKS if env != "lqr"] DMC_IDS = [spec.id for spec in gym.envs.registry.values() if
MANIPULATION_IDS = [f'dmc:manipulation-{task}' for task in manipulation.ALL if task.endswith('_features')] spec.id.startswith('dm_control/')
DMC_MP_IDS = chain(*fancy_gym.ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS.values()) and 'compatibility-env-v0' not in spec.id
and 'lqr-lqr' not in spec.id]
DMC_MP_IDS = fancy_gym.ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS['all']
SEED = 1 SEED = 1
@pytest.mark.parametrize('env_id', SUITE_IDS) @pytest.mark.parametrize('env_id', DMC_IDS)
def test_step_suite_functionality(env_id: str): def test_step_dm_control_functionality(env_id: str):
"""Tests that suite step environments run without errors using random actions.""" """Tests that suite step environments run without errors using random actions."""
run_env(env_id) run_env(env_id, 5000, wrappers=[gym.wrappers.FlattenObservation])
@pytest.mark.parametrize('env_id', SUITE_IDS) @pytest.mark.parametrize('env_id', DMC_IDS)
def test_step_suite_determinism(env_id: str): def test_step_dm_control_determinism(env_id: str):
"""Tests that for step environments identical seeds produce identical trajectories.""" """Tests that for step environments identical seeds produce identical trajectories."""
run_env_determinism(env_id, SEED) run_env_determinism(env_id, SEED, 5000, wrappers=[gym.wrappers.FlattenObservation])
@pytest.mark.parametrize('env_id', MANIPULATION_IDS)
def test_step_manipulation_functionality(env_id: str):
"""Tests that manipulation step environments run without errors using random actions."""
run_env(env_id)
@pytest.mark.parametrize('env_id', MANIPULATION_IDS)
def test_step_manipulation_determinism(env_id: str):
"""Tests that for step environments identical seeds produce identical trajectories."""
run_env_determinism(env_id, SEED)
@pytest.mark.parametrize('env_id', DMC_MP_IDS) @pytest.mark.parametrize('env_id', DMC_MP_IDS)


@ -1,14 +1,16 @@
import itertools from itertools import chain
from typing import Callable
import fancy_gym import fancy_gym
import gym import gymnasium as gym
import pytest import pytest
from test.utils import run_env, run_env_determinism from test.utils import run_env, run_env_determinism
CUSTOM_IDS = [spec.id for spec in gym.envs.registry.all() if CUSTOM_IDS = [id for id, spec in gym.envs.registry.items() if
not isinstance(spec.entry_point, Callable) and
"fancy_gym" in spec.entry_point and 'make_bb_env_helper' not in spec.entry_point] "fancy_gym" in spec.entry_point and 'make_bb_env_helper' not in spec.entry_point]
CUSTOM_MP_IDS = itertools.chain(*fancy_gym.ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS.values()) CUSTOM_MP_IDS = fancy_gym.ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS['all']
SEED = 1 SEED = 1


@ -0,0 +1,78 @@
from typing import Tuple, Type, Union, Optional, Callable
import gymnasium as gym
import numpy as np
import pytest
from gymnasium import make
from gymnasium.core import ActType, ObsType
import fancy_gym
from fancy_gym import register
KNOWN_NS = ['dm_control', 'fancy', 'metaworld', 'gym']
class Object(object):
pass
class ToyEnv(gym.Env):
observation_space = gym.spaces.Box(low=-1, high=1, shape=(1,), dtype=np.float64)
action_space = gym.spaces.Box(low=-1, high=1, shape=(1,), dtype=np.float64)
dt = 0.02
def __init__(self, a: int = 0, b: float = 0.0, c: list = [], d: dict = {}, e: Object = Object()):
self.a, self.b, self.c, self.d, self.e = a, b, c, d, e
def reset(self, *, seed: Optional[int] = None, return_info: bool = False,
options: Optional[dict] = None) -> Union[ObsType, Tuple[ObsType, dict]]:
obs, options = np.array([-1]), {}
return obs, options
def step(self, action: ActType) -> Tuple[ObsType, float, bool, dict]:
obs, reward, terminated, truncated, info = np.array([-1]), 1, False, False, {}
return obs, reward, terminated, truncated, info
def render(self, mode="human"):
pass
@pytest.fixture(scope="session", autouse=True)
def setup():
register(
id=f'dummy/toy2-v0',
entry_point='test.test_black_box:ToyEnv',
max_episode_steps=50,
)
@pytest.mark.parametrize('env_id', ['dummy/toy2-v0'])
@pytest.mark.parametrize('mp_type', ['ProMP', 'DMP', 'ProDMP'])
def test_make_mp(env_id: str, mp_type: str):
parts = env_id.split('/')
if len(parts) == 1:
ns, name = 'gym', parts[0]
elif len(parts) == 2:
ns, name = parts[0], parts[1]
else:
raise ValueError('env id can not contain multiple "/".')
fancy_id = f'{ns}_{mp_type}/{name}'
make(fancy_id)
def test_make_raw_toy():
make('dummy/toy2-v0')
@pytest.mark.parametrize('mp_type', ['ProMP', 'DMP', 'ProDMP'])
def test_make_mp_toy(mp_type: str):
fancy_id = f'dummy_{mp_type}/toy2-v0'
make(fancy_id)
@pytest.mark.parametrize('ns', KNOWN_NS)
def test_ns_nonempty(ns):
assert len(fancy_gym.MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS[ns]), f'The namespace {ns} is empty even though it should not be...'


@ -6,9 +6,9 @@ from metaworld.envs import ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE
import fancy_gym import fancy_gym
from test.utils import run_env, run_env_determinism from test.utils import run_env, run_env_determinism
METAWORLD_IDS = [f'metaworld:{env.split("-goal-observable")[0]}' for env, _ in METAWORLD_IDS = [f'metaworld/{env.split("-goal-observable")[0]}' for env, _ in
ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE.items()] ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE.items()]
METAWORLD_MP_IDS = chain(*fancy_gym.ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS.values()) METAWORLD_MP_IDS = fancy_gym.ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS['all']
SEED = 1 SEED = 1
@ -18,6 +18,7 @@ def test_step_metaworld_functionality(env_id: str):
run_env(env_id) run_env(env_id)
@pytest.mark.skip(reason="Seeding does not correctly work on current Metaworld.")
@pytest.mark.parametrize('env_id', METAWORLD_IDS) @pytest.mark.parametrize('env_id', METAWORLD_IDS)
def test_step_metaworld_determinism(env_id: str): def test_step_metaworld_determinism(env_id: str):
"""Tests that for step environments identical seeds produce identical trajectories.""" """Tests that for step environments identical seeds produce identical trajectories."""
@ -30,6 +31,7 @@ def test_bb_metaworld_functionality(env_id: str):
run_env(env_id) run_env(env_id)
@pytest.mark.skip(reason="Seeding does not correctly work on current Metaworld.")
@pytest.mark.parametrize('env_id', METAWORLD_MP_IDS) @pytest.mark.parametrize('env_id', METAWORLD_MP_IDS)
def test_bb_metaworld_determinism(env_id: str): def test_bb_metaworld_determinism(env_id: str):
"""Tests that for black box environment identical seeds produce identical trajectories.""" """Tests that for black box environment identical seeds produce identical trajectories."""


@@ -2,21 +2,25 @@ from itertools import chain
 from types import FunctionType
 from typing import Tuple, Type, Union, Optional
-import gym
+import gymnasium as gym
 import numpy as np
 import pytest
-from gym import register
-from gym.core import ActType, ObsType
+from gymnasium import register, make
+from gymnasium.core import ActType, ObsType
+from gymnasium import spaces
 import fancy_gym
 from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
-from fancy_gym.utils.time_aware_observation import TimeAwareObservation
+from fancy_gym.utils.wrappers import TimeAwareObservation
+from fancy_gym.utils.make_env_helpers import ensure_finite_time
 SEED = 1
-ENV_IDS = ['Reacher5d-v0', 'dmc:ball_in_cup-catch', 'metaworld:reach-v2', 'Reacher-v2']
+ENV_IDS = ['fancy/Reacher5d-v0', 'dm_control/ball_in_cup-catch-v0', 'metaworld/reach-v2', 'Reacher-v2']
 WRAPPERS = [fancy_gym.envs.mujoco.reacher.MPWrapper, fancy_gym.dmc.suite.ball_in_cup.MPWrapper,
             fancy_gym.meta.goal_object_change_mp_wrapper.MPWrapper, fancy_gym.open_ai.mujoco.reacher_v2.MPWrapper]
-ALL_MP_ENVS = chain(*fancy_gym.ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS.values())
+ALL_MP_ENVS = fancy_gym.ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS['all']
+MAX_STEPS_FALLBACK = 50
 class ToyEnv(gym.Env):
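A rough sketch (not part of the commit) of how the newly imported helpers are combined in the tests below. The semantics of ensure_finite_time are inferred from its use here: it appears to enforce a finite horizon, falling back to MAX_STEPS_FALLBACK whenever the env spec defines no max_episode_steps, while TimeAwareObservation now comes from fancy_gym.utils.wrappers.

import fancy_gym  # noqa: registers the fancy/ namespace
from gymnasium import make
from fancy_gym.utils.make_env_helpers import ensure_finite_time
from fancy_gym.utils.wrappers import TimeAwareObservation

MAX_STEPS_FALLBACK = 50
env = TimeAwareObservation(
    ensure_finite_time(make('fancy/Reacher5d-v0'), MAX_STEPS_FALLBACK))
obs, info = env.reset(seed=1)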
@@ -26,10 +30,12 @@ class ToyEnv(gym.Env):
     def reset(self, *, seed: Optional[int] = None, return_info: bool = False,
               options: Optional[dict] = None) -> Union[ObsType, Tuple[ObsType, dict]]:
-        return np.array([-1])
+        obs, options = np.array([-1]), {}
+        return obs, options
     def step(self, action: ActType) -> Tuple[ObsType, float, bool, dict]:
-        return np.array([-1]), 1, False, {}
+        obs, reward, terminated, truncated, info = np.array([-1]), 1, False, False, {}
+        return obs, reward, terminated, truncated, info
     def render(self, mode="human"):
         pass
@@ -61,7 +67,7 @@ def setup():
 def test_learn_sub_trajectories(mp_type: str, env_wrap: Tuple[str, Type[RawInterfaceWrapper]],
                                 add_time_aware_wrapper_before: bool):
     env_id, wrapper_class = env_wrap
-    env_step = TimeAwareObservation(fancy_gym.make(env_id, SEED))
+    env_step = TimeAwareObservation(ensure_finite_time(make(env_id, SEED), MAX_STEPS_FALLBACK))
     wrappers = [wrapper_class]
     # has time aware wrapper
@@ -72,24 +78,29 @@ def test_learn_sub_trajectories(mp_type: str, env_wrap: Tuple[str, Type[RawInter
                            {'trajectory_generator_type': mp_type},
                            {'controller_type': 'motor'},
                            {'phase_generator_type': 'exp'},
-                           {'basis_generator_type': 'rbf'}, seed=SEED)
+                           {'basis_generator_type': 'rbf'}, fallback_max_steps=MAX_STEPS_FALLBACK)
+    env.reset(seed=SEED)
     assert env.learn_sub_trajectories
+    assert env.spec.max_episode_steps
+    assert env_step.spec.max_episode_steps
     assert env.traj_gen.learn_tau
     # This also verifies we are not adding the TimeAwareObservationWrapper twice
-    assert env.observation_space == env_step.observation_space
+    assert spaces.flatten_space(env_step.observation_space) == spaces.flatten_space(env.observation_space)
-    d = True
+    done = True
     for i in range(25):
-        if d:
-            env.reset()
+        if done:
+            env.reset(seed=SEED)
         action = env.action_space.sample()
-        obs, r, d, info = env.step(action)
+        _obs, _reward, terminated, truncated, info = env.step(action)
+        done = terminated or truncated
         length = info['trajectory_length']
-        if not d:
+        if not done:
             assert length == np.round(action[0] / env.dt)
             assert length == np.round(env.traj_gen.tau.numpy() / env.dt)
         else:
@@ -105,14 +116,14 @@ def test_learn_sub_trajectories(mp_type: str, env_wrap: Tuple[str, Type[RawInter
 def test_replanning_time(mp_type: str, env_wrap: Tuple[str, Type[RawInterfaceWrapper]],
                          add_time_aware_wrapper_before: bool, replanning_time: int):
     env_id, wrapper_class = env_wrap
-    env_step = TimeAwareObservation(fancy_gym.make(env_id, SEED))
+    env_step = TimeAwareObservation(ensure_finite_time(make(env_id, SEED), MAX_STEPS_FALLBACK))
     wrappers = [wrapper_class]
     # has time aware wrapper
     if add_time_aware_wrapper_before:
         wrappers += [TimeAwareObservation]
-    replanning_schedule = lambda c_pos, c_vel, obs, c_action, t: t % replanning_time == 0
+    def replanning_schedule(c_pos, c_vel, obs, c_action, t): return t % replanning_time == 0
     basis_generator_type = 'prodmp' if mp_type == 'prodmp' else 'rbf'
     phase_generator_type = 'exp' if 'dmp' in mp_type else 'linear'
@@ -121,31 +132,36 @@ def test_replanning_time(mp_type: str, env_wrap: Tuple[str, Type[RawInterfaceWra
                            {'trajectory_generator_type': mp_type},
                            {'controller_type': 'motor'},
                            {'phase_generator_type': phase_generator_type},
-                           {'basis_generator_type': basis_generator_type}, seed=SEED)
+                           {'basis_generator_type': basis_generator_type}, fallback_max_steps=MAX_STEPS_FALLBACK)
+    env.reset(seed=SEED)
     assert env.do_replanning
+    assert env.spec.max_episode_steps
+    assert env_step.spec.max_episode_steps
     assert callable(env.replanning_schedule)
     # This also verifies we are not adding the TimeAwareObservationWrapper twice
-    assert env.observation_space == env_step.observation_space
-    env.reset()
+    assert spaces.flatten_space(env_step.observation_space) == spaces.flatten_space(env.observation_space)
+    env.reset(seed=SEED)
     episode_steps = env_step.spec.max_episode_steps // replanning_time
     # Make 3 episodes, total steps depend on the replanning steps
     for i in range(3 * episode_steps):
         action = env.action_space.sample()
-        obs, r, d, info = env.step(action)
+        _obs, _reward, terminated, truncated, info = env.step(action)
+        done = terminated or truncated
         length = info['trajectory_length']
-        if d:
+        if done:
             # Check if number of steps until termination match the replanning interval
-            print(d, (i + 1), episode_steps)
+            print(done, (i + 1), episode_steps)
             assert (i + 1) % episode_steps == 0
-            env.reset()
+            env.reset(seed=SEED)
     assert replanning_schedule(None, None, None, None, length)
 @pytest.mark.parametrize('mp_type', ['promp', 'prodmp'])
 @pytest.mark.parametrize('max_planning_times', [1, 2, 3, 4])
 @pytest.mark.parametrize('sub_segment_steps', [5, 10])
@@ -165,15 +181,19 @@ def test_max_planning_times(mp_type: str, max_planning_times: int, sub_segment_s
                             },
                            {'basis_generator_type': basis_generator_type,
                             },
-                           seed=SEED)
-    _ = env.reset()
-    d = False
+                           fallback_max_steps=MAX_STEPS_FALLBACK)
+    _ = env.reset(seed=SEED)
+    done = False
     planning_times = 0
-    while not d:
-        _, _, d, _ = env.step(env.action_space.sample())
+    while not done:
+        action = env.action_space.sample()
+        _obs, _reward, terminated, truncated, _info = env.step(action)
+        done = terminated or truncated
         planning_times += 1
     assert planning_times == max_planning_times
 @pytest.mark.parametrize('mp_type', ['promp', 'prodmp'])
 @pytest.mark.parametrize('max_planning_times', [1, 2, 3, 4])
 @pytest.mark.parametrize('sub_segment_steps', [5, 10])
@@ -194,17 +214,20 @@ def test_replanning_with_learn_tau(mp_type: str, max_planning_times: int, sub_se
                             },
                            {'basis_generator_type': basis_generator_type,
                             },
-                           seed=SEED)
-    _ = env.reset()
-    d = False
+                           fallback_max_steps=MAX_STEPS_FALLBACK)
+    _ = env.reset(seed=SEED)
+    done = False
     planning_times = 0
-    while not d:
+    while not done:
         action = env.action_space.sample()
         action[0] = tau
-        _, _, d, info = env.step(action)
+        _obs, _reward, terminated, truncated, _info = env.step(action)
+        done = terminated or truncated
         planning_times += 1
     assert planning_times == max_planning_times
 @pytest.mark.parametrize('mp_type', ['promp', 'prodmp'])
 @pytest.mark.parametrize('max_planning_times', [1, 2, 3, 4])
 @pytest.mark.parametrize('sub_segment_steps', [5, 10])
@@ -213,26 +236,28 @@ def test_replanning_with_learn_delay(mp_type: str, max_planning_times: int, sub_
     basis_generator_type = 'prodmp' if mp_type == 'prodmp' else 'rbf'
     phase_generator_type = 'exp' if mp_type == 'prodmp' else 'linear'
     env = fancy_gym.make_bb('toy-v0', [ToyWrapper],
                            {'replanning_schedule': lambda pos, vel, obs, action, t: t % sub_segment_steps == 0,
                             'max_planning_times': max_planning_times,
                             'verbose': 2},
                            {'trajectory_generator_type': mp_type,
                             },
                            {'controller_type': 'motor'},
                            {'phase_generator_type': phase_generator_type,
                             'learn_tau': False,
                             'learn_delay': True
                             },
                            {'basis_generator_type': basis_generator_type,
                             },
-                           seed=SEED)
-    _ = env.reset()
-    d = False
+                           fallback_max_steps=MAX_STEPS_FALLBACK)
+    _ = env.reset(seed=SEED)
+    done = False
     planning_times = 0
-    while not d:
+    while not done:
         action = env.action_space.sample()
         action[0] = delay
-        _, _, d, info = env.step(action)
+        _obs, _reward, terminated, truncated, info = env.step(action)
+        done = terminated or truncated
         delay_time_steps = int(np.round(delay / env.dt))
         pos = info['positions'].flatten()
@@ -256,6 +281,7 @@ def test_replanning_with_learn_delay(mp_type: str, max_planning_times: int, sub_
     assert planning_times == max_planning_times
 @pytest.mark.parametrize('mp_type', ['promp', 'prodmp'])
 @pytest.mark.parametrize('max_planning_times', [1, 2, 3])
 @pytest.mark.parametrize('sub_segment_steps', [5, 10, 15])
@@ -266,27 +292,29 @@ def test_replanning_with_learn_delay_and_tau(mp_type: str, max_planning_times: i
     basis_generator_type = 'prodmp' if mp_type == 'prodmp' else 'rbf'
     phase_generator_type = 'exp' if mp_type == 'prodmp' else 'linear'
     env = fancy_gym.make_bb('toy-v0', [ToyWrapper],
                            {'replanning_schedule': lambda pos, vel, obs, action, t: t % sub_segment_steps == 0,
                             'max_planning_times': max_planning_times,
                             'verbose': 2},
                            {'trajectory_generator_type': mp_type,
                             },
                            {'controller_type': 'motor'},
                            {'phase_generator_type': phase_generator_type,
                             'learn_tau': True,
                             'learn_delay': True
                             },
                            {'basis_generator_type': basis_generator_type,
                             },
-                           seed=SEED)
-    _ = env.reset()
-    d = False
+                           fallback_max_steps=MAX_STEPS_FALLBACK)
+    _ = env.reset(seed=SEED)
+    done = False
     planning_times = 0
-    while not d:
+    while not done:
         action = env.action_space.sample()
         action[0] = tau
         action[1] = delay
-        _, _, d, info = env.step(action)
+        _obs, _reward, terminated, truncated, info = env.step(action)
+        done = terminated or truncated
         delay_time_steps = int(np.round(delay / env.dt))
@@ -306,6 +334,7 @@ def test_replanning_with_learn_delay_and_tau(mp_type: str, max_planning_times: i
     assert planning_times == max_planning_times
 @pytest.mark.parametrize('mp_type', ['promp', 'prodmp'])
 @pytest.mark.parametrize('max_planning_times', [1, 2, 3, 4])
 @pytest.mark.parametrize('sub_segment_steps', [5, 10])
@@ -325,9 +354,11 @@ def test_replanning_schedule(mp_type: str, max_planning_times: int, sub_segment_
                             },
                            {'basis_generator_type': basis_generator_type,
                             },
-                           seed=SEED)
-    _ = env.reset()
-    d = False
+                           fallback_max_steps=MAX_STEPS_FALLBACK)
+    _ = env.reset(seed=SEED)
     for i in range(max_planning_times):
-        _, _, d, _ = env.step(env.action_space.sample())
-    assert d
+        action = env.action_space.sample()
+        _obs, _reward, terminated, truncated, _info = env.step(action)
+        done = terminated or truncated
+    assert done
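A condensed sketch of the updated make_bb call pattern used throughout this file (values are arbitrary; 'toy-v0' and ToyWrapper are the fixtures defined earlier in this test module, so this is illustrative rather than standalone):

import fancy_gym

env = fancy_gym.make_bb(
    'toy-v0', [ToyWrapper],
    {'replanning_schedule': lambda pos, vel, obs, action, t: t % 5 == 0,
     'max_planning_times': 2,
     'verbose': 2},
    {'trajectory_generator_type': 'prodmp'},
    {'controller_type': 'motor'},
    {'phase_generator_type': 'exp'},
    {'basis_generator_type': 'prodmp'},
    fallback_max_steps=50)       # replaces the old seed=SEED keyword
env.reset(seed=1)                # seeding moved to reset()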

View File

@@ -1,9 +1,12 @@
-import gym
+from typing import List, Type
+import gymnasium as gym
 import numpy as np
-from fancy_gym import make
+from gymnasium import make
-def run_env(env_id, iterations=None, seed=0, render=False):
+def run_env(env_id: str, iterations: int = None, seed: int = 0, wrappers: List[Type[gym.Wrapper]] = [],
+            render: bool = False):
     """
     Example for running a DMC based env in the step based setting.
     The env_id has to be specified as `dmc:domain_name-task_name` or
@@ -13,70 +16,88 @@ def run_env(env_id, iterations=None, seed=0, render=False):
         env_id: Either `dmc:domain_name-task_name` or `dmc:manipulation-environment_name`
         iterations: Number of rollout steps to run
         seed: random seeding
+        wrappers: List of Wrappers to apply to the environment
         render: Render the episode
-    Returns: observations, rewards, dones, actions
+    Returns: observations, rewards, terminations, truncations, actions
     """
-    env: gym.Env = make(env_id, seed=seed)
+    env: gym.Env = make(env_id)
+    for w in wrappers:
+        env = w(env)
     rewards = []
     observations = []
     actions = []
-    dones = []
-    obs = env.reset()
+    terminations = []
+    truncations = []
+    obs, _ = env.reset(seed=seed)
+    env.action_space.seed(seed)
     verify_observations(obs, env.observation_space, "reset()")
     iterations = iterations or (env.spec.max_episode_steps or 1)
-    # number of samples(multiple environment steps)
+    # number of samples (multiple environment steps)
     for i in range(iterations):
         observations.append(obs)
         ac = env.action_space.sample()
         actions.append(ac)
         # ac = np.random.uniform(env.action_space.low, env.action_space.high, env.action_space.shape)
-        obs, reward, done, info = env.step(ac)
+        obs, reward, terminated, truncated, info = env.step(ac)
         verify_observations(obs, env.observation_space, "step()")
         verify_reward(reward)
-        verify_done(done)
+        verify_done(terminated)
+        verify_done(truncated)
         rewards.append(reward)
-        dones.append(done)
+        terminations.append(terminated)
+        truncations.append(truncated)
         if render:
             env.render("human")
-        if done:
+        if terminated or truncated:
             break
     if not hasattr(env, "replanning_schedule"):
-        assert done, "Done flag is not True after end of episode."
+        assert terminated or truncated, f"Termination or truncation flag is not True after {i + 1} iterations."
     observations.append(obs)
     env.close()
     del env
-    return np.array(observations), np.array(rewards), np.array(dones), np.array(actions)
+    return np.array(observations), np.array(rewards), np.array(terminations), np.array(truncations), np.array(actions)
-def run_env_determinism(env_id: str, seed: int):
-    traj1 = run_env(env_id, seed=seed)
-    traj2 = run_env(env_id, seed=seed)
+def run_env_determinism(env_id: str, seed: int, iterations: int = None, wrappers: List[Type[gym.Wrapper]] = []):
+    traj1 = run_env(env_id, iterations=iterations,
+                    seed=seed, wrappers=wrappers)
+    traj2 = run_env(env_id, iterations=iterations,
+                    seed=seed, wrappers=wrappers)
     # Iterate over two trajectories, which should have the same state and action sequence
     for i, time_step in enumerate(zip(*traj1, *traj2)):
-        obs1, rwd1, done1, ac1, obs2, rwd2, done2, ac2 = time_step
-        assert np.array_equal(obs1, obs2), f"Observations [{i}] {obs1} and {obs2} do not match."
-        assert np.array_equal(ac1, ac2), f"Actions [{i}] {ac1} and {ac2} do not match."
-        assert np.array_equal(rwd1, rwd2), f"Rewards [{i}] {rwd1} and {rwd2} do not match."
-        assert np.array_equal(done1, done2), f"Dones [{i}] {done1} and {done2} do not match."
+        obs1, rwd1, term1, trunc1, ac1, obs2, rwd2, term2, trunc2, ac2 = time_step
+        assert np.allclose(
+            obs1, obs2), f"Observations [{i}] {obs1} ({obs1.shape}) and {obs2} ({obs2.shape}) do not match: Biggest difference is {np.abs(obs1-obs2).max()} at index {np.abs(obs1-obs2).argmax()}."
+        assert np.array_equal(
+            ac1, ac2), f"Actions [{i}] {ac1} and {ac2} do not match."
+        assert np.array_equal(
+            rwd1, rwd2), f"Rewards [{i}] {rwd1} and {rwd2} do not match."
+        assert np.array_equal(
+            term1, term2), f"Terminateds [{i}] {term1} and {term2} do not match."
+        assert np.array_equal(
+            trunc1, trunc2), f"Truncateds [{i}] {trunc1} and {trunc2} do not match."
 def verify_observations(obs, observation_space: gym.Space, obs_type="reset()"):
     assert observation_space.contains(obs), \
-        f"Observation {obs} received from {obs_type} not contained in observation space {observation_space}."
+        f"Observation {obs} ({obs.shape}) received from {obs_type} not contained in observation space {observation_space}."
 def verify_reward(reward):
-    assert isinstance(reward, (float, int)), f"Returned type {type(reward)} as reward, expected float or int."
+    assert isinstance(
+        reward, (float, int)), f"Returned type {type(reward)} as reward, expected float or int."
 def verify_done(done):
-    assert isinstance(done, bool), f"Returned {done} as done flag, expected bool."
+    assert isinstance(
+        done, bool), f"Returned {done} as done flag, expected bool."