diff --git a/README.md b/README.md
index edd1aac..25cb6d8 100644
--- a/README.md
+++ b/README.md
@@ -1,199 +1,218 @@
-## ALR Robotics Control Environments
+# Fancy Gym
-This project offers a large variety of reinforcement learning environments under the unifying interface of [OpenAI gym](https://gym.openai.com/).
-Besides, we also provide support (under the OpenAI interface) for the benchmark suites
+`fancy_gym` offers a large variety of reinforcement learning environments under the unifying interface
+of [OpenAI gym](https://gym.openai.com/). We provide support (under the OpenAI gym interface) for the benchmark suites
[DeepMind Control](https://deepmind.com/research/publications/2020/dm-control-Software-and-Tasks-for-Continuous-Control)
-(DMC) and [Metaworld](https://meta-world.github.io/). Custom (Mujoco) gym environments can be created according
-to [this guide](https://github.com/openai/gym/blob/master/docs/creating-environments.md). Unlike existing libraries, we
-additionally support to control agents with Dynamic Movement Primitives (DMPs) and Probabilistic Movement Primitives (ProMP,
-we only consider the mean usually).
+(DMC) and [Metaworld](https://meta-world.github.io/). If those are not sufficient and you want to create your own custom
+gym environments, use [this guide](https://www.gymlibrary.ml/content/environment_creation/). We highly appreciate it if
+you then submit a PR for this environment to become part of `fancy_gym`.
+In comparison to existing libraries, we additionally support controlling agents with movement primitives, such as Dynamic
+Movement Primitives (DMPs) and Probabilistic Movement Primitives (ProMPs).
-## Motion Primitive Environments (Episodic environments)
+## Movement Primitive Environments (Episode-Based/Black-Box Environments)
-Unlike step-based environments, motion primitive (MP) environments are closer related to stochastic search, black box
-optimization, and methods that are often used in robotics. MP environments are trajectory-based and always execute a full
-trajectory, which is generated by a Dynamic Motion Primitive (DMP) or a Probabilistic Motion Primitive (ProMP). The
-generated trajectory is translated into individual step-wise actions by a controller. The exact choice of controller is,
-however, dependent on the type of environment. We currently support position, velocity, and PD-Controllers for position,
-velocity, and torque control, respectively. The goal of all MP environments is still to learn a policy. Yet, an action
-represents the parametrization of the motion primitives to generate a suitable trajectory. Additionally, in this
-framework we support all of this also for the contextual setting, for which we expose all changing substates of the
-task as a single observation in the beginning. This requires to predict a new action/MP parametrization for each
-trajectory. All environments provide next to the cumulative episode reward all collected information from each
-step as part of the info dictionary. This information should, however, mainly be used for debugging and logging.
-
-|Key| Description|
-|---|---|
-`trajectory`| Generated trajectory from MP
-`step_actions`| Step-wise executed action based on controller output
-`step_observations`| Step-wise intermediate observations
-`step_rewards`| Step-wise rewards
-`trajectory_length`| Total number of environment interactions
-`other`| All other information from the underlying environment are returned as a list with length `trajectory_length` maintaining the original key. In case some information are not provided every time step, the missing values are filled with `None`.
+Unlike step-based environments, movement primitive (MP) environments are more closely related to stochastic search,
+black-box optimization, and methods that are often used in traditional robotics and control. MP environments are
+typically episode-based and execute a full trajectory, which is generated by a trajectory generator, such as a Dynamic
+Movement Primitive (DMP) or a Probabilistic Movement Primitive (ProMP). The generated trajectory is translated into
+individual step-wise actions by a trajectory tracking controller. The exact choice of controller depends on the type of
+environment. We currently support position, velocity, and PD-Controllers for position, velocity, and torque control,
+respectively, as well as a special controller for the MetaWorld control suite.
+The goal of all MP environments is still to learn an optimal policy. Yet, an action represents the parametrization of
+the movement primitive that generates a suitable trajectory. Additionally, this framework supports all of this for the
+contextual setting, i.e. we expose the context space - a subset of the observation space - at the beginning of the
+episode. This requires predicting a new action/MP parametrization for each context.
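+
+To make the interplay of these components concrete, the following is a simplified, illustrative sketch of what happens
+internally during a single black-box step (the actual implementation lives in the `BlackBoxWrapper` of `fancy_gym` and
+additionally handles action clipping, verbosity, and replanning; the function name here is purely illustrative):
+
+```python
+import numpy as np
+
+
+def rollout_movement_primitive(env, trajectory_generator, tracking_controller, mp_parameters):
+    """Illustrative sketch: turn MP parameters into a trajectory and track it step by step."""
+    trajectory_generator.set_params(mp_parameters)
+    positions = trajectory_generator.get_traj_pos()
+    velocities = trajectory_generator.get_traj_vel()
+    rewards = []
+    for des_pos, des_vel in zip(positions, velocities):
+        # the tracking controller translates the desired state into a raw step-based action
+        action = tracking_controller.get_action(des_pos, des_vel, env.current_pos, env.current_vel)
+        obs, reward, done, info = env.step(action)
+        rewards.append(reward)
+    # the trajectory return is an aggregation (by default the sum) of the step rewards
+    return obs, np.sum(rewards), done, info
+```
+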
## Installation
1. Clone the repository
```bash
-git clone git@github.com:ALRhub/alr_envs.git
+git clone git@github.com:ALRhub/fancy_gym.git
```
2. Go to the folder
```bash
-cd alr_envs
+cd fancy_gym
```
3. Install with
```bash
-pip install -e .
+pip install -e .
```
-## Using the framework
+In case you want to use dm_control or metaworld, you can install them by specifying extras
-We prepared [multiple examples](alr_envs/examples/), please have a look there for more specific examples.
+```bash
+pip install -e .[dmc,metaworld]
+```
-### Step-wise environments
+> **Note:**
+> While our library already fully supports the new mujoco bindings, metaworld still relies on
+> [mujoco_py](https://github.com/openai/mujoco-py), hence make sure to have mujoco 2.1 installed beforehand.
+
+## How to use Fancy Gym
+
+We will only show the basics here and prepared [multiple examples](fancy_gym/examples/) for a more detailed look.
+
+### Step-wise Environments
```python
-import alr_envs
+import fancy_gym
-env = alr_envs.make('HoleReacher-v0', seed=1)
-state = env.reset()
+env = fancy_gym.make('Reacher5d-v0', seed=1)
+obs = env.reset()
for i in range(1000):
- state, reward, done, info = env.step(env.action_space.sample())
+ action = env.action_space.sample()
+ obs, reward, done, info = env.step(action)
if i % 5 == 0:
env.render()
if done:
- state = env.reset()
+ obs = env.reset()
```
-For Deepmind control tasks we expect the `env_id` to be specified as `domain_name-task_name` or for manipulation tasks
-as `manipulation-environment_name`. All other environments can be created based on their original name.
+When using `dm_control` tasks, we expect the `env_id` to be specified as `dmc:domain_name-task_name` or, for
+manipulation tasks, as `dmc:manipulation-environment_name`. For `metaworld` tasks, we require the structure
+`metaworld:env_id-v2`. Our custom tasks and standard gym environments can be created without prefixes.
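+
+For example (a minimal sketch; the listed ids are only representatives of the respective naming schemes, and the
+Metaworld id is one of its standard v2 tasks):
+
+```python
+import fancy_gym
+
+# DeepMind Control suite: dmc:domain_name-task_name
+env = fancy_gym.make('dmc:ball_in_cup-catch', seed=1)
+# DeepMind Control manipulation: dmc:manipulation-environment_name
+env = fancy_gym.make('dmc:manipulation-reach_site_features', seed=1)
+# Metaworld: metaworld:env_id-v2
+env = fancy_gym.make('metaworld:button-press-v2', seed=1)
+# fancy_gym custom tasks and standard gym environments: no prefix
+env = fancy_gym.make('Reacher5d-v0', seed=1)
+```
+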
-Existing MP tasks can be created the same way as above. Just keep in mind, calling `step()` always executs a full
-trajectory.
+### Black-box Environments
+
+By default, all environments return the cumulative episode reward; this can, however, be changed if necessary.
+Optionally, each environment also returns all collected information from each step as part of the info dictionary, see
+the example below the table. This information is mainly meant for debugging and logging, not for training.
+
+|Key| Description|Type
+|---|---|---|
+`positions`| Generated position trajectory of the MP | Optional
+`velocities`| Generated velocity trajectory of the MP | Optional
+`step_actions`| Step-wise executed action based on controller output | Optional
+`step_observations`| Step-wise intermediate observations | Optional
+`step_rewards`| Step-wise rewards | Optional
+`trajectory_length`| Total number of environment interactions | Always
+`other`| All other information from the underlying environment are returned as a list with length `trajectory_length` maintaining the original key. In case some information are not provided every time step, the missing values are filled with `None`. | Always
+
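+A minimal sketch of how to access this information after a single black-box step (the optional entries are only filled
+when the underlying `BlackBoxWrapper` is configured with a higher verbosity level):
+
+```python
+import fancy_gym
+
+env = fancy_gym.make('Reacher5dProMP-v0', seed=1)
+env.reset()
+_, trajectory_return, done, info = env.step(env.action_space.sample())
+
+print(info['trajectory_length'])  # total number of environment interactions
+print(info.keys())                # which further keys are present depends on the verbosity level
+```
+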
+Existing MP tasks can be created the same way as above. Just keep in mind that calling `step()` executes a full trajectory.
+
+> **Note:**
+> Currently, we are also in the process of enabling replanning as well as learning of sub-trajectories.
+> This allows splitting the episode into multiple trajectories and is a hybrid setting between step-based and
+> black-box learning.
+> While this is already implemented, it is still in beta and requires further testing.
+> Feel free to try it and open an issue with any problems that occur.
```python
-import alr_envs
+import fancy_gym
-env = alr_envs.make('HoleReacherProMP-v0', seed=1)
-# render() can be called once in the beginning with all necessary arguments. To turn it of again just call render(None).
-env.render()
+env = fancy_gym.make('Reacher5dProMP-v0', seed=1)
+# render() can be called once in the beginning with all necessary arguments.
+# To turn it off again, just call render() without any arguments.
+env.render(mode='human')
-state = env.reset()
+# This returns the context information, not the full state observation
+obs = env.reset()
for i in range(5):
- state, reward, done, info = env.step(env.action_space.sample())
+ action = env.action_space.sample()
+ obs, reward, done, info = env.step(action)
- # Not really necessary as the environments resets itself after each trajectory anyway.
- state = env.reset()
+ # Done is always True as we are working on the episode level, hence we always reset()
+ obs = env.reset()
```
-To show all available environments, we provide some additional convenience. Each value will return a dictionary with two
-keys `DMP` and `ProMP` that store a list of available environment names.
+To show all available environments, we provide some additional convenience variables. All of them return a dictionary
+with two keys `DMP` and `ProMP` that store a list of available environment ids.
```python
-import alr_envs
+import fancy_gym
-print("Custom MP tasks:")
-print(alr_envs.ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS)
+print("Fancy Black-box tasks:")
+print(fancy_gym.ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS)
-print("OpenAI Gym MP tasks:")
-print(alr_envs.ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS)
+print("OpenAI Gym Black-box tasks:")
+print(fancy_gym.ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS)
-print("Deepmind Control MP tasks:")
-print(alr_envs.ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS)
+print("Deepmind Control Black-box tasks:")
+print(fancy_gym.ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS)
-print("MetaWorld MP tasks:")
-print(alr_envs.ALL_METAWORLD_MOTION_PRIMITIVE_ENVIRONMENTS)
+print("MetaWorld Black-box tasks:")
+print(fancy_gym.ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS)
```
### How to create a new MP task
In case a required task is not supported yet in the MP framework, it can be created relatively easy. For the task at
-hand, the following interface needs to be implemented.
+hand, the following [interface](fancy_gym/black_box/raw_interface_wrapper.py) needs to be implemented.
```python
+from abc import abstractmethod
+from typing import Union, Tuple
+
+import gym
import numpy as np
-from mp_env_api import MPEnvWrapper
-class MPWrapper(MPEnvWrapper):
+class RawInterfaceWrapper(gym.Wrapper):
@property
- def active_obs(self):
+ def context_mask(self) -> np.ndarray:
"""
- Returns boolean mask for each substate in the full observation.
- It determines whether the observation is returned for the contextual case or not.
- This effectively allows to filter unwanted or unnecessary observations from the full step-based case.
- E.g. Velocities starting at 0 are only changing after the first action. Given we only receive the first
- observation, the velocities are not necessary in the observation for the MP task.
+ Returns boolean mask of the same shape as the observation space.
+ It determines whether the observation is returned for the contextual case or not.
+        This effectively allows filtering unwanted or unnecessary observations from the full step-based case.
+ E.g. Velocities starting at 0 are only changing after the first action. Given we only receive the
+ context/part of the first observation, the velocities are not necessary in the observation for the task.
+ Returns:
+ bool array representing the indices of the observations
+
"""
- return np.ones(self.observation_space.shape, dtype=bool)
+ return np.ones(self.env.observation_space.shape[0], dtype=bool)
@property
- def current_vel(self):
+ @abstractmethod
+ def current_pos(self) -> Union[float, int, np.ndarray, Tuple]:
"""
- Returns the current velocity of the action/control dimension.
+ Returns the current position of the action/control dimension.
The dimensionality has to match the action/control dimension.
- This is not required when exclusively using position control,
- it should, however, be implemented regardless.
- E.g. The joint velocities that are directly or indirectly controlled by the action.
- """
- raise NotImplementedError()
-
- @property
- def current_pos(self):
- """
- Returns the current position of the action/control dimension.
- The dimensionality has to match the action/control dimension.
- This is not required when exclusively using velocity control,
+ This is not required when exclusively using velocity control,
it should, however, be implemented regardless.
E.g. The joint positions that are directly or indirectly controlled by the action.
"""
raise NotImplementedError()
@property
- def goal_pos(self):
+ @abstractmethod
+ def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
"""
- Returns a predefined final position of the action/control dimension.
- This is only required for the DMP and is most of the time learned instead.
- """
- raise NotImplementedError()
-
- @property
- def dt(self):
- """
- Returns the time between two simulated steps of the environment
+ Returns the current velocity of the action/control dimension.
+ The dimensionality has to match the action/control dimension.
+ This is not required when exclusively using position control,
+ it should, however, be implemented regardless.
+ E.g. The joint velocities that are directly or indirectly controlled by the action.
"""
raise NotImplementedError()
```
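+
+As a concrete, hypothetical example, a wrapper for a simple joint-space robot environment could look as follows. The
+`data.qpos`/`data.qvel` attributes and the assumption that the goal is stored in the last three observation entries are
+placeholders and need to be adapted to your environment:
+
+```python
+from typing import Tuple, Union
+
+import numpy as np
+
+from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
+
+
+class MyRobotMPWrapper(RawInterfaceWrapper):
+
+    @property
+    def context_mask(self) -> np.ndarray:
+        # expose only the goal position (here assumed to be the last three entries) as context
+        mask = np.zeros(self.env.observation_space.shape[0], dtype=bool)
+        mask[-3:] = True
+        return mask
+
+    @property
+    def current_pos(self) -> Union[float, int, np.ndarray, Tuple]:
+        return self.env.data.qpos.flat[:]
+
+    @property
+    def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
+        return self.env.data.qvel.flat[:]
+```
+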
-If you created a new task wrapper, feel free to open a PR, so we can integrate it for others to use as well.
-Without the integration the task can still be used. A rough outline can be shown here, for more details we recommend
-having a look at the [examples](alr_envs/examples/).
+If you created a new task wrapper, feel free to open a PR, so we can integrate it for others to use as well. Without the
+integration the task can still be used. A rough outline is shown here; for more details we recommend having a look
+at the [examples](fancy_gym/examples/).
```python
-import alr_envs
+import fancy_gym
# Base environment name, according to structure of above example
base_env_id = "ball_in_cup-catch"
-# Replace this wrapper with the custom wrapper for your environment by inheriting from the MPEnvWrapper.
+# Replace this wrapper with the custom wrapper for your environment by inheriting from the RawInterfaceWrapper.
# You can also add other gym.Wrappers in case they are needed,
# e.g. gym.wrappers.FlattenObservation for dict observations
-wrappers = [alr_envs.dmc.suite.ball_in_cup.MPWrapper]
-mp_kwargs = {...}
+wrappers = [fancy_gym.dmc.suite.ball_in_cup.MPWrapper]
kwargs = {...}
-env = alr_envs.make_dmp_env(base_env_id, wrappers=wrappers, seed=1, mp_kwargs=mp_kwargs, **kwargs)
-# OR for a deterministic ProMP (other mp_kwargs are required):
-# env = alr_envs.make_promp_env(base_env, wrappers=wrappers, seed=seed, mp_kwargs=mp_args)
+env = fancy_gym.make_bb(base_env_id, wrappers=wrappers, seed=0, **kwargs)
rewards = 0
obs = env.reset()
diff --git a/fancy_gym/__init__.py b/fancy_gym/__init__.py
new file mode 100644
index 0000000..f6f690a
--- /dev/null
+++ b/fancy_gym/__init__.py
@@ -0,0 +1,13 @@
+from fancy_gym import dmc, meta, open_ai
+from fancy_gym.utils.make_env_helpers import make, make_bb, make_rank
+from .dmc import ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS
+# Convenience function for all MP environments
+from .envs import ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS
+from .meta import ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS
+from .open_ai import ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS
+
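+# Merge the suite-specific dictionaries into one overview dictionary per movement primitive type (DMP/ProMP)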
+ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS = {
+ key: value + ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS[key] +
+ ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS[key] +
+ ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS[key]
+ for key, value in ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS.items()}
diff --git a/__init__.py b/fancy_gym/black_box/__init__.py
similarity index 100%
rename from __init__.py
rename to fancy_gym/black_box/__init__.py
diff --git a/fancy_gym/black_box/black_box_wrapper.py b/fancy_gym/black_box/black_box_wrapper.py
new file mode 100644
index 0000000..9b3e95e
--- /dev/null
+++ b/fancy_gym/black_box/black_box_wrapper.py
@@ -0,0 +1,183 @@
+from typing import Tuple, Optional
+
+import gym
+import numpy as np
+from gym import spaces
+from mp_pytorch.mp.mp_interfaces import MPInterface
+
+from fancy_gym.black_box.controller.base_controller import BaseController
+from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
+from fancy_gym.utils.utils import get_numpy
+
+
+class BlackBoxWrapper(gym.ObservationWrapper):
+
+ def __init__(self,
+ env: RawInterfaceWrapper,
+ trajectory_generator: MPInterface,
+ tracking_controller: BaseController,
+ duration: float,
+ verbose: int = 1,
+ learn_sub_trajectories: bool = False,
+ replanning_schedule: Optional[callable] = None,
+ reward_aggregation: callable = np.sum
+ ):
+ """
+ gym.Wrapper for leveraging a black box approach with a trajectory generator.
+
+ Args:
+ env: The (wrapped) environment this wrapper is applied on
+ trajectory_generator: Generates the full or partial trajectory
+ tracking_controller: Translates the desired trajectory to raw action sequences
+ duration: Length of the trajectory of the movement primitive in seconds
+ verbose: level of detail for returned values in info dict.
+ learn_sub_trajectories: Transforms full episode learning into learning sub-trajectories, similar to
+ step-based learning
+            replanning_schedule: callable that receives the current position, velocity, observation, action, and the
+                number of environment steps since the start of the episode and returns whether to replan the trajectory
+ reward_aggregation: function that takes the np.ndarray of step rewards as input and returns the trajectory
+ reward, default summation over all values.
+ """
+ super().__init__(env)
+
+ self.duration = duration
+ self.learn_sub_trajectories = learn_sub_trajectories
+ self.do_replanning = replanning_schedule is not None
+ self.replanning_schedule = replanning_schedule or (lambda *x: False)
+ self.current_traj_steps = 0
+
+ # trajectory generation
+ self.traj_gen = trajectory_generator
+ self.tracking_controller = tracking_controller
+ # self.time_steps = np.linspace(0, self.duration, self.traj_steps)
+ # self.traj_gen.set_mp_times(self.time_steps)
+ self.traj_gen.set_duration(self.duration, self.dt)
+
+ # reward computation
+ self.reward_aggregation = reward_aggregation
+
+ # spaces
+ self.return_context_observation = not (learn_sub_trajectories or self.do_replanning)
+ self.traj_gen_action_space = self._get_traj_gen_action_space()
+ self.action_space = self._get_action_space()
+ self.observation_space = self._get_observation_space()
+
+ # rendering
+ self.render_kwargs = {}
+ self.verbose = verbose
+
+ def observation(self, observation):
+        # return only the context space if we learn the whole episode at once (no sub-trajectories or replanning)
+ obs = observation[self.env.context_mask] if self.return_context_observation else observation
+        # cast dtype because metaworld returns an incorrect dtype, which throws a gym error
+ return obs.astype(self.observation_space.dtype)
+
+ def get_trajectory(self, action: np.ndarray) -> Tuple:
+ clipped_params = np.clip(action, self.traj_gen_action_space.low, self.traj_gen_action_space.high)
+ self.traj_gen.set_params(clipped_params)
+ # TODO: is this correct for replanning? Do we need to adjust anything here?
+ bc_time = np.array(0 if not self.do_replanning else self.current_traj_steps * self.dt)
+ self.traj_gen.set_boundary_conditions(bc_time, self.current_pos, self.current_vel)
+ # TODO: remove the - self.dt after Bruces fix.
+ self.traj_gen.set_duration(None if self.learn_sub_trajectories else self.duration, self.dt)
+ # traj_dict = self.traj_gen.get_trajs(get_pos=True, get_vel=True)
+ trajectory = get_numpy(self.traj_gen.get_traj_pos())
+ velocity = get_numpy(self.traj_gen.get_traj_vel())
+
+ if self.do_replanning:
+ # Remove first part of trajectory as this is already over
+ trajectory = trajectory[self.current_traj_steps:]
+ velocity = velocity[self.current_traj_steps:]
+
+ return trajectory, velocity
+
+ def _get_traj_gen_action_space(self):
+ """This function can be used to set up an individual space for the parameters of the traj_gen."""
+ min_action_bounds, max_action_bounds = self.traj_gen.get_params_bounds()
+ action_space = gym.spaces.Box(low=min_action_bounds.numpy(), high=max_action_bounds.numpy(),
+ dtype=self.env.action_space.dtype)
+ return action_space
+
+ def _get_action_space(self):
+ """
+ This function can be used to modify the action space for considering actions which are not learned via movement
+ primitives. E.g. ball releasing time for the beer pong task. By default, it is the parameter space of the
+ movement primitive.
+ Only needs to be overwritten if the action space needs to be modified.
+ """
+ try:
+ return self.traj_gen_action_space
+ except AttributeError:
+ return self._get_traj_gen_action_space()
+
+ def _get_observation_space(self):
+ if self.return_context_observation:
+ mask = self.env.context_mask
+            # return only the masked (context) part of the observation space
+ min_obs_bound = self.env.observation_space.low[mask]
+ max_obs_bound = self.env.observation_space.high[mask]
+ return spaces.Box(low=min_obs_bound, high=max_obs_bound, dtype=self.env.observation_space.dtype)
+ return self.env.observation_space
+
+ def step(self, action: np.ndarray):
+        """Generate a trajectory from the MP parameters and execute it step-wise with the tracking controller."""
+
+ # TODO remove this part, right now only needed for beer pong
+ mp_params, env_spec_params = self.env.episode_callback(action, self.traj_gen)
+ trajectory, velocity = self.get_trajectory(mp_params)
+
+ trajectory_length = len(trajectory)
+ rewards = np.zeros(shape=(trajectory_length,))
+ if self.verbose >= 2:
+ actions = np.zeros(shape=(trajectory_length,) + self.env.action_space.shape)
+ observations = np.zeros(shape=(trajectory_length,) + self.env.observation_space.shape,
+ dtype=self.env.observation_space.dtype)
+
+ infos = dict()
+ done = False
+
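+        # roll out the generated trajectory step by step with the tracking controller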
+ for t, (pos, vel) in enumerate(zip(trajectory, velocity)):
+ step_action = self.tracking_controller.get_action(pos, vel, self.current_pos, self.current_vel)
+ c_action = np.clip(step_action, self.env.action_space.low, self.env.action_space.high)
+ obs, c_reward, done, info = self.env.step(c_action)
+ rewards[t] = c_reward
+
+ if self.verbose >= 2:
+ actions[t, :] = c_action
+ observations[t, :] = obs
+
+ for k, v in info.items():
+ elems = infos.get(k, [None] * trajectory_length)
+ elems[t] = v
+ infos[k] = elems
+
+ if self.render_kwargs:
+ self.env.render(**self.render_kwargs)
+
+ if done or self.replanning_schedule(self.current_pos, self.current_vel, obs, c_action,
+ t + 1 + self.current_traj_steps):
+ break
+
+        infos.update({k: v[:t + 1] for k, v in infos.items()})
+ self.current_traj_steps += t + 1
+
+ if self.verbose >= 2:
+ infos['positions'] = trajectory
+ infos['velocities'] = velocity
+ infos['step_actions'] = actions[:t + 1]
+ infos['step_observations'] = observations[:t + 1]
+ infos['step_rewards'] = rewards[:t + 1]
+
+ infos['trajectory_length'] = t + 1
+ trajectory_return = self.reward_aggregation(rewards[:t + 1])
+ return self.observation(obs), trajectory_return, done, infos
+
+ def render(self, **kwargs):
+ """Only set render options here, such that they can be used during the rollout.
+ This only needs to be called once"""
+ self.render_kwargs = kwargs
+
+ def reset(self, *, seed: Optional[int] = None, return_info: bool = False, options: Optional[dict] = None):
+ self.current_traj_steps = 0
+ return super(BlackBoxWrapper, self).reset()
diff --git a/fancy_gym/black_box/controller/__init__.py b/fancy_gym/black_box/controller/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/fancy_gym/black_box/controller/base_controller.py b/fancy_gym/black_box/controller/base_controller.py
new file mode 100644
index 0000000..1ac1522
--- /dev/null
+++ b/fancy_gym/black_box/controller/base_controller.py
@@ -0,0 +1,4 @@
+class BaseController:
+
+ def get_action(self, des_pos, des_vel, c_pos, c_vel):
+ raise NotImplementedError
diff --git a/fancy_gym/black_box/controller/meta_world_controller.py b/fancy_gym/black_box/controller/meta_world_controller.py
new file mode 100644
index 0000000..efd8983
--- /dev/null
+++ b/fancy_gym/black_box/controller/meta_world_controller.py
@@ -0,0 +1,24 @@
+import numpy as np
+
+from fancy_gym.black_box.controller.base_controller import BaseController
+
+
+class MetaWorldController(BaseController):
+ """
+ A Metaworld Controller. Using position and velocity information from a provided environment,
+ the tracking_controller calculates a response based on the desired position and velocity.
+ Unlike the other Controllers, this is a special tracking_controller for MetaWorld environments.
+ They use a position delta for the xyz coordinates and a raw position for the gripper opening.
+
+ """
+
+ def get_action(self, des_pos, des_vel, c_pos, c_vel):
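+        # the last entry of the desired position is the raw (absolute) gripper opening,
+        # the remaining entries are commanded as xyz position deltas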
+ gripper_pos = des_pos[-1]
+
+ cur_pos = c_pos[:-1]
+ xyz_pos = des_pos[:-1]
+
+ assert xyz_pos.shape == cur_pos.shape, \
+ f"Mismatch in dimension between desired position {xyz_pos.shape} and current position {cur_pos.shape}"
+        action = np.hstack([(xyz_pos - cur_pos), gripper_pos])
+        return action
diff --git a/fancy_gym/black_box/controller/pd_controller.py b/fancy_gym/black_box/controller/pd_controller.py
new file mode 100644
index 0000000..35203d8
--- /dev/null
+++ b/fancy_gym/black_box/controller/pd_controller.py
@@ -0,0 +1,28 @@
+from typing import Union, Tuple
+
+from fancy_gym.black_box.controller.base_controller import BaseController
+
+
+class PDController(BaseController):
+ """
+ A PD-Controller. Using position and velocity information from a provided environment,
+    the tracking_controller calculates a response based on the desired position and velocity.
+
+ :param p_gains: Factors for the proportional gains
+ :param d_gains: Factors for the differential gains
+ """
+
+ def __init__(self,
+ p_gains: Union[float, Tuple] = 1,
+ d_gains: Union[float, Tuple] = 0.5):
+ self.p_gains = p_gains
+ self.d_gains = d_gains
+
+ def get_action(self, des_pos, des_vel, c_pos, c_vel):
+ assert des_pos.shape == c_pos.shape, \
+ f"Mismatch in dimension between desired position {des_pos.shape} and current position {c_pos.shape}"
+ assert des_vel.shape == c_vel.shape, \
+ f"Mismatch in dimension between desired velocity {des_vel.shape} and current velocity {c_vel.shape}"
+ trq = self.p_gains * (des_pos - c_pos) + self.d_gains * (des_vel - c_vel)
+ return trq
diff --git a/fancy_gym/black_box/controller/pos_controller.py b/fancy_gym/black_box/controller/pos_controller.py
new file mode 100644
index 0000000..517187d
--- /dev/null
+++ b/fancy_gym/black_box/controller/pos_controller.py
@@ -0,0 +1,9 @@
+from fancy_gym.black_box.controller.base_controller import BaseController
+
+
+class PosController(BaseController):
+ """
+ A Position Controller. The tracking_controller calculates a response only based on the desired position.
+ """
+ def get_action(self, des_pos, des_vel, c_pos, c_vel):
+ return des_pos
diff --git a/fancy_gym/black_box/controller/vel_controller.py b/fancy_gym/black_box/controller/vel_controller.py
new file mode 100644
index 0000000..b26d9ba
--- /dev/null
+++ b/fancy_gym/black_box/controller/vel_controller.py
@@ -0,0 +1,9 @@
+from fancy_gym.black_box.controller.base_controller import BaseController
+
+
+class VelController(BaseController):
+ """
+ A Velocity Controller. The tracking_controller calculates a response only based on the desired velocity.
+ """
+ def get_action(self, des_pos, des_vel, c_pos, c_vel):
+ return des_vel
diff --git a/fancy_gym/black_box/factory/__init__.py b/fancy_gym/black_box/factory/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/fancy_gym/black_box/factory/basis_generator_factory.py b/fancy_gym/black_box/factory/basis_generator_factory.py
new file mode 100644
index 0000000..4601953
--- /dev/null
+++ b/fancy_gym/black_box/factory/basis_generator_factory.py
@@ -0,0 +1,18 @@
+from mp_pytorch.basis_gn import NormalizedRBFBasisGenerator, ZeroPaddingNormalizedRBFBasisGenerator
+from mp_pytorch.phase_gn import PhaseGenerator
+
+ALL_TYPES = ["rbf", "zero_rbf", "rhythmic"]
+
+
+def get_basis_generator(basis_generator_type: str, phase_generator: PhaseGenerator, **kwargs):
+ basis_generator_type = basis_generator_type.lower()
+ if basis_generator_type == "rbf":
+ return NormalizedRBFBasisGenerator(phase_generator, **kwargs)
+ elif basis_generator_type == "zero_rbf":
+ return ZeroPaddingNormalizedRBFBasisGenerator(phase_generator, **kwargs)
+ elif basis_generator_type == "rhythmic":
+ raise NotImplementedError()
+ # return RhythmicBasisGenerator(phase_generator, **kwargs)
+ else:
+ raise ValueError(f"Specified basis generator type {basis_generator_type} not supported, "
+ f"please choose one of {ALL_TYPES}.")
diff --git a/fancy_gym/black_box/factory/controller_factory.py b/fancy_gym/black_box/factory/controller_factory.py
new file mode 100644
index 0000000..8b2d865
--- /dev/null
+++ b/fancy_gym/black_box/factory/controller_factory.py
@@ -0,0 +1,21 @@
+from fancy_gym.black_box.controller.meta_world_controller import MetaWorldController
+from fancy_gym.black_box.controller.pd_controller import PDController
+from fancy_gym.black_box.controller.pos_controller import PosController
+from fancy_gym.black_box.controller.vel_controller import VelController
+
+ALL_TYPES = ["motor", "velocity", "position", "metaworld"]
+
+
+def get_controller(controller_type: str, **kwargs):
+ controller_type = controller_type.lower()
+ if controller_type == "motor":
+ return PDController(**kwargs)
+ elif controller_type == "velocity":
+ return VelController()
+ elif controller_type == "position":
+ return PosController()
+ elif controller_type == "metaworld":
+ return MetaWorldController()
+ else:
+ raise ValueError(f"Specified controller type {controller_type} not supported, "
+ f"please choose one of {ALL_TYPES}.")
diff --git a/fancy_gym/black_box/factory/phase_generator_factory.py b/fancy_gym/black_box/factory/phase_generator_factory.py
new file mode 100644
index 0000000..67e17f8
--- /dev/null
+++ b/fancy_gym/black_box/factory/phase_generator_factory.py
@@ -0,0 +1,23 @@
+from mp_pytorch.phase_gn import LinearPhaseGenerator, ExpDecayPhaseGenerator
+
+# from mp_pytorch.phase_gn.rhythmic_phase_generator import RhythmicPhaseGenerator
+# from mp_pytorch.phase_gn.smooth_phase_generator import SmoothPhaseGenerator
+
+ALL_TYPES = ["linear", "exp", "rhythmic", "smooth"]
+
+
+def get_phase_generator(phase_generator_type, **kwargs):
+ phase_generator_type = phase_generator_type.lower()
+ if phase_generator_type == "linear":
+ return LinearPhaseGenerator(**kwargs)
+ elif phase_generator_type == "exp":
+ return ExpDecayPhaseGenerator(**kwargs)
+ elif phase_generator_type == "rhythmic":
+ raise NotImplementedError()
+ # return RhythmicPhaseGenerator(**kwargs)
+ elif phase_generator_type == "smooth":
+ raise NotImplementedError()
+ # return SmoothPhaseGenerator(**kwargs)
+ else:
+ raise ValueError(f"Specified phase generator type {phase_generator_type} not supported, "
+ f"please choose one of {ALL_TYPES}.")
diff --git a/fancy_gym/black_box/factory/trajectory_generator_factory.py b/fancy_gym/black_box/factory/trajectory_generator_factory.py
new file mode 100644
index 0000000..5a36fdd
--- /dev/null
+++ b/fancy_gym/black_box/factory/trajectory_generator_factory.py
@@ -0,0 +1,21 @@
+from mp_pytorch.basis_gn import BasisGenerator
+from mp_pytorch.mp import ProDMP, DMP, ProMP
+
+ALL_TYPES = ["promp", "dmp", "prodmp"]
+
+
+def get_trajectory_generator(
+ trajectory_generator_type: str, action_dim: int, basis_generator: BasisGenerator, **kwargs
+):
+ trajectory_generator_type = trajectory_generator_type.lower()
+ if trajectory_generator_type == "promp":
+ return ProMP(basis_generator, action_dim, **kwargs)
+ elif trajectory_generator_type == "dmp":
+ return DMP(basis_generator, action_dim, **kwargs)
+ elif trajectory_generator_type == 'prodmp':
+ from mp_pytorch.basis_gn import ProDMPBasisGenerator
+ assert isinstance(basis_generator, ProDMPBasisGenerator)
+ return ProDMP(basis_generator, action_dim, **kwargs)
+ else:
+ raise ValueError(f"Specified movement primitive type {trajectory_generator_type} not supported, "
+ f"please choose one of {ALL_TYPES}.")
diff --git a/fancy_gym/black_box/raw_interface_wrapper.py b/fancy_gym/black_box/raw_interface_wrapper.py
new file mode 100644
index 0000000..7b2e996
--- /dev/null
+++ b/fancy_gym/black_box/raw_interface_wrapper.py
@@ -0,0 +1,71 @@
+from abc import abstractmethod
+from typing import Union, Tuple
+
+import gym
+import numpy as np
+from mp_pytorch.mp.mp_interfaces import MPInterface
+
+
+class RawInterfaceWrapper(gym.Wrapper):
+
+ @property
+ def context_mask(self) -> np.ndarray:
+ """
+ Returns boolean mask of the same shape as the observation space.
+ It determines whether the observation is returned for the contextual case or not.
+        This effectively allows filtering unwanted or unnecessary observations from the full step-based case.
+ E.g. Velocities starting at 0 are only changing after the first action. Given we only receive the
+ context/part of the first observation, the velocities are not necessary in the observation for the task.
+ Returns:
+ bool array representing the indices of the observations
+
+ """
+ return np.ones(self.env.observation_space.shape[0], dtype=bool)
+
+ @property
+ @abstractmethod
+ def current_pos(self) -> Union[float, int, np.ndarray, Tuple]:
+ """
+ Returns the current position of the action/control dimension.
+ The dimensionality has to match the action/control dimension.
+ This is not required when exclusively using velocity control,
+ it should, however, be implemented regardless.
+ E.g. The joint positions that are directly or indirectly controlled by the action.
+ """
+ raise NotImplementedError()
+
+ @property
+ @abstractmethod
+ def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
+ """
+ Returns the current velocity of the action/control dimension.
+ The dimensionality has to match the action/control dimension.
+ This is not required when exclusively using position control,
+ it should, however, be implemented regardless.
+ E.g. The joint velocities that are directly or indirectly controlled by the action.
+ """
+ raise NotImplementedError()
+
+ @property
+ def dt(self) -> float:
+ """
+ Control frequency of the environment
+ Returns: float
+
+ """
+ return self.env.dt
+
+ def episode_callback(self, action: np.ndarray, traj_gen: MPInterface) -> Tuple[
+ np.ndarray, Union[np.ndarray, None]]:
+ """
+ Used to extract the parameters for the movement primitive and other parameters from an action array which might
+ include other actions like ball releasing time for the beer pong environment.
+ This only needs to be overwritten if the action space is modified.
+ Args:
+            action: a vector instance of the whole action space; it includes the traj_gen parameters and, if specified,
+                additional parameters, otherwise only the traj_gen parameters
+
+ Returns:
+ Tuple: mp_arguments and other arguments
+ """
+ return action, None
diff --git a/fancy_gym/dmc/README.MD b/fancy_gym/dmc/README.MD
new file mode 100644
index 0000000..040a9a0
--- /dev/null
+++ b/fancy_gym/dmc/README.MD
@@ -0,0 +1,19 @@
+# DeepMind Control (DMC) Wrappers
+
+These are the Environment Wrappers for selected
+[DeepMind Control](https://deepmind.com/research/publications/2020/dm-control-Software-and-Tasks-for-Continuous-Control)
+environments in order to use our Movement Primitive gym interface with them.
+
+## MP Environments
+
+[//]: <> (These environments are wrapped-versions of their Deep Mind Control Suite (DMC) counterparts. Given most task can be)
+[//]: <> (solved in shorter horizon lengths than the original 1000 steps, we often shorten the episodes for those task.)
+
+|Name| Description|Trajectory Horizon|Action Dimension|Context Dimension
+|---|---|---|---|---|
+|`dmc_ball_in_cup-catch_promp-v0`| A ProMP wrapped version of the "catch" task for the "ball_in_cup" environment. | 1000 | 10 | 2
+|`dmc_ball_in_cup-catch_dmp-v0`| A DMP wrapped version of the "catch" task for the "ball_in_cup" environment. | 1000| 10 | 2
+|`dmc_reacher-easy_promp-v0`| A ProMP wrapped version of the "easy" task for the "reacher" environment. | 1000 | 10 | 4
+|`dmc_reacher-easy_dmp-v0`| A DMP wrapped version of the "easy" task for the "reacher" environment. | 1000| 10 | 4
+|`dmc_reacher-hard_promp-v0`| A ProMP wrapped version of the "hard" task for the "reacher" environment.| 1000 | 10 | 4
+|`dmc_reacher-hard_dmp-v0`| A DMP wrapped version of the "hard" task for the "reacher" environment. | 1000 | 10 | 4
diff --git a/fancy_gym/dmc/__init__.py b/fancy_gym/dmc/__init__.py
new file mode 100644
index 0000000..22ae47f
--- /dev/null
+++ b/fancy_gym/dmc/__init__.py
@@ -0,0 +1,245 @@
+from copy import deepcopy
+
+from . import manipulation, suite
+
+ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS = {"DMP": [], "ProMP": []}
+
+from gym.envs.registration import register
+
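+# Default black-box configurations; the registrations below copy these via deepcopy and adapt individual entries.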
+DEFAULT_BB_DICT_ProMP = {
+ "name": 'EnvName',
+ "wrappers": [],
+ "trajectory_generator_kwargs": {
+ 'trajectory_generator_type': 'promp'
+ },
+ "phase_generator_kwargs": {
+ 'phase_generator_type': 'linear'
+ },
+ "controller_kwargs": {
+ 'controller_type': 'motor',
+ "p_gains": 50.,
+ "d_gains": 1.,
+ },
+ "basis_generator_kwargs": {
+ 'basis_generator_type': 'zero_rbf',
+ 'num_basis': 5,
+ 'num_basis_zero_start': 1
+ }
+}
+
+DEFAULT_BB_DICT_DMP = {
+ "name": 'EnvName',
+ "wrappers": [],
+ "trajectory_generator_kwargs": {
+ 'trajectory_generator_type': 'dmp'
+ },
+ "phase_generator_kwargs": {
+ 'phase_generator_type': 'exp'
+ },
+ "controller_kwargs": {
+ 'controller_type': 'motor',
+ "p_gains": 50.,
+ "d_gains": 1.,
+ },
+ "basis_generator_kwargs": {
+ 'basis_generator_type': 'rbf',
+ 'num_basis': 5
+ }
+}
+
+# DeepMind Control Suite (DMC)
+kwargs_dict_bic_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
+kwargs_dict_bic_dmp['name'] = f"dmc:ball_in_cup-catch"
+kwargs_dict_bic_dmp['wrappers'].append(suite.ball_in_cup.MPWrapper)
+# bandwidth_factor=2
+kwargs_dict_bic_dmp['phase_generator_kwargs']['alpha_phase'] = 2
+kwargs_dict_bic_dmp['trajectory_generator_kwargs']['weight_scale'] = 10 # TODO: weight scale 1, but goal scale 0.1
+register(
+ id=f'dmc_ball_in_cup-catch_dmp-v0',
+ entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
+ kwargs=kwargs_dict_bic_dmp
+)
+ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append("dmc_ball_in_cup-catch_dmp-v0")
+
+kwargs_dict_bic_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
+kwargs_dict_bic_promp['name'] = f"dmc:ball_in_cup-catch"
+kwargs_dict_bic_promp['wrappers'].append(suite.ball_in_cup.MPWrapper)
+register(
+ id=f'dmc_ball_in_cup-catch_promp-v0',
+ entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
+ kwargs=kwargs_dict_bic_promp
+)
+ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append("dmc_ball_in_cup-catch_promp-v0")
+
+kwargs_dict_reacher_easy_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
+kwargs_dict_reacher_easy_dmp['name'] = f"dmc:reacher-easy"
+kwargs_dict_reacher_easy_dmp['wrappers'].append(suite.reacher.MPWrapper)
+# bandwidth_factor=2
+kwargs_dict_reacher_easy_dmp['phase_generator_kwargs']['alpha_phase'] = 2
+# TODO: weight scale 50, but goal scale 0.1
+kwargs_dict_reacher_easy_dmp['trajectory_generator_kwargs']['weight_scale'] = 500
+register(
+ id=f'dmc_reacher-easy_dmp-v0',
+ entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
+    kwargs=kwargs_dict_reacher_easy_dmp
+)
+ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append("dmc_reacher-easy_dmp-v0")
+
+kwargs_dict_reacher_easy_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
+kwargs_dict_reacher_easy_promp['name'] = f"dmc:reacher-easy"
+kwargs_dict_reacher_easy_promp['wrappers'].append(suite.reacher.MPWrapper)
+kwargs_dict_reacher_easy_promp['trajectory_generator_kwargs']['weight_scale'] = 0.2
+register(
+ id=f'dmc_reacher-easy_promp-v0',
+ entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
+ kwargs=kwargs_dict_reacher_easy_promp
+)
+ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append("dmc_reacher-easy_promp-v0")
+
+kwargs_dict_reacher_hard_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
+kwargs_dict_reacher_hard_dmp['name'] = f"dmc:reacher-hard"
+kwargs_dict_reacher_hard_dmp['wrappers'].append(suite.reacher.MPWrapper)
+# bandwidth_factor = 2
+kwargs_dict_reacher_hard_dmp['phase_generator_kwargs']['alpha_phase'] = 2
+# TODO: weight scale 50, but goal scale 0.1
+kwargs_dict_reacher_hard_dmp['trajectory_generator_kwargs']['weight_scale'] = 500
+register(
+ id=f'dmc_reacher-hard_dmp-v0',
+ entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
+ kwargs=kwargs_dict_reacher_hard_dmp
+)
+ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append("dmc_reacher-hard_dmp-v0")
+
+kwargs_dict_reacher_hard_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
+kwargs_dict_reacher_hard_promp['name'] = f"dmc:reacher-hard"
+kwargs_dict_reacher_hard_promp['wrappers'].append(suite.reacher.MPWrapper)
+kwargs_dict_reacher_hard_promp['trajectory_generator_kwargs']['weight_scale'] = 0.2
+register(
+ id=f'dmc_reacher-hard_promp-v0',
+ entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
+ kwargs=kwargs_dict_reacher_hard_promp
+)
+ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append("dmc_reacher-hard_promp-v0")
+
+_dmc_cartpole_tasks = ["balance", "balance_sparse", "swingup", "swingup_sparse"]
+
+for _task in _dmc_cartpole_tasks:
+ _env_id = f'dmc_cartpole-{_task}_dmp-v0'
+ kwargs_dict_cartpole_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
+ kwargs_dict_cartpole_dmp['name'] = f"dmc:cartpole-{_task}"
+ kwargs_dict_cartpole_dmp['wrappers'].append(suite.cartpole.MPWrapper)
+ # bandwidth_factor = 2
+ kwargs_dict_cartpole_dmp['phase_generator_kwargs']['alpha_phase'] = 2
+ # TODO: weight scale 50, but goal scale 0.1
+ kwargs_dict_cartpole_dmp['trajectory_generator_kwargs']['weight_scale'] = 500
+ kwargs_dict_cartpole_dmp['controller_kwargs']['p_gains'] = 10
+ kwargs_dict_cartpole_dmp['controller_kwargs']['d_gains'] = 10
+ register(
+ id=_env_id,
+ entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
+ kwargs=kwargs_dict_cartpole_dmp
+ )
+ ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id)
+
+ _env_id = f'dmc_cartpole-{_task}_promp-v0'
+    kwargs_dict_cartpole_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
+ kwargs_dict_cartpole_promp['name'] = f"dmc:cartpole-{_task}"
+ kwargs_dict_cartpole_promp['wrappers'].append(suite.cartpole.MPWrapper)
+ kwargs_dict_cartpole_promp['controller_kwargs']['p_gains'] = 10
+ kwargs_dict_cartpole_promp['controller_kwargs']['d_gains'] = 10
+ kwargs_dict_cartpole_promp['trajectory_generator_kwargs']['weight_scale'] = 0.2
+ register(
+ id=_env_id,
+ entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
+ kwargs=kwargs_dict_cartpole_promp
+ )
+ ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
+
+kwargs_dict_cartpole2poles_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
+kwargs_dict_cartpole2poles_dmp['name'] = f"dmc:cartpole-two_poles"
+kwargs_dict_cartpole2poles_dmp['wrappers'].append(suite.cartpole.TwoPolesMPWrapper)
+# bandwidth_factor = 2
+kwargs_dict_cartpole2poles_dmp['phase_generator_kwargs']['alpha_phase'] = 2
+# TODO: weight scale 50, but goal scale 0.1
+kwargs_dict_cartpole2poles_dmp['trajectory_generator_kwargs']['weight_scale'] = 500
+kwargs_dict_cartpole2poles_dmp['controller_kwargs']['p_gains'] = 10
+kwargs_dict_cartpole2poles_dmp['controller_kwargs']['d_gains'] = 10
+_env_id = f'dmc_cartpole-two_poles_dmp-v0'
+register(
+ id=_env_id,
+ entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
+ kwargs=kwargs_dict_cartpole2poles_dmp
+)
+ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id)
+
+kwargs_dict_cartpole2poles_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
+kwargs_dict_cartpole2poles_promp['name'] = f"dmc:cartpole-two_poles"
+kwargs_dict_cartpole2poles_promp['wrappers'].append(suite.cartpole.TwoPolesMPWrapper)
+kwargs_dict_cartpole2poles_promp['controller_kwargs']['p_gains'] = 10
+kwargs_dict_cartpole2poles_promp['controller_kwargs']['d_gains'] = 10
+kwargs_dict_cartpole2poles_promp['trajectory_generator_kwargs']['weight_scale'] = 0.2
+_env_id = f'dmc_cartpole-two_poles_promp-v0'
+register(
+ id=_env_id,
+ entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
+ kwargs=kwargs_dict_cartpole2poles_promp
+)
+ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
+
+kwargs_dict_cartpole3poles_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
+kwargs_dict_cartpole3poles_dmp['name'] = f"dmc:cartpole-three_poles"
+kwargs_dict_cartpole3poles_dmp['wrappers'].append(suite.cartpole.ThreePolesMPWrapper)
+# bandwidth_factor = 2
+kwargs_dict_cartpole3poles_dmp['phase_generator_kwargs']['alpha_phase'] = 2
+# TODO: weight scale 50, but goal scale 0.1
+kwargs_dict_cartpole3poles_dmp['trajectory_generator_kwargs']['weight_scale'] = 500
+kwargs_dict_cartpole3poles_dmp['controller_kwargs']['p_gains'] = 10
+kwargs_dict_cartpole3poles_dmp['controller_kwargs']['d_gains'] = 10
+_env_id = f'dmc_cartpole-three_poles_dmp-v0'
+register(
+ id=_env_id,
+ entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
+ kwargs=kwargs_dict_cartpole3poles_dmp
+)
+ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id)
+
+kwargs_dict_cartpole3poles_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
+kwargs_dict_cartpole3poles_promp['name'] = f"dmc:cartpole-three_poles"
+kwargs_dict_cartpole3poles_promp['wrappers'].append(suite.cartpole.ThreePolesMPWrapper)
+kwargs_dict_cartpole3poles_promp['controller_kwargs']['p_gains'] = 10
+kwargs_dict_cartpole3poles_promp['controller_kwargs']['d_gains'] = 10
+kwargs_dict_cartpole3poles_promp['trajectory_generator_kwargs']['weight_scale'] = 0.2
+_env_id = f'dmc_cartpole-three_poles_promp-v0'
+register(
+ id=_env_id,
+ entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
+ kwargs=kwargs_dict_cartpole3poles_promp
+)
+ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
+
+# DeepMind Manipulation
+kwargs_dict_mani_reach_site_features_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
+kwargs_dict_mani_reach_site_features_dmp['name'] = f"dmc:manipulation-reach_site_features"
+kwargs_dict_mani_reach_site_features_dmp['wrappers'].append(manipulation.reach_site.MPWrapper)
+kwargs_dict_mani_reach_site_features_dmp['phase_generator_kwargs']['alpha_phase'] = 2
+# TODO: weight scale 50, but goal scale 0.1
+kwargs_dict_mani_reach_site_features_dmp['trajectory_generator_kwargs']['weight_scale'] = 500
+kwargs_dict_mani_reach_site_features_dmp['controller_kwargs']['controller_type'] = 'velocity'
+register(
+ id=f'dmc_manipulation-reach_site_dmp-v0',
+ entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
+ kwargs=kwargs_dict_mani_reach_site_features_dmp
+)
+ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append("dmc_manipulation-reach_site_dmp-v0")
+
+kwargs_dict_mani_reach_site_features_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
+kwargs_dict_mani_reach_site_features_promp['name'] = f"dmc:manipulation-reach_site_features"
+kwargs_dict_mani_reach_site_features_promp['wrappers'].append(manipulation.reach_site.MPWrapper)
+kwargs_dict_mani_reach_site_features_promp['trajectory_generator_kwargs']['weight_scale'] = 0.2
+kwargs_dict_mani_reach_site_features_promp['controller_kwargs']['controller_type'] = 'velocity'
+register(
+ id=f'dmc_manipulation-reach_site_promp-v0',
+ entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
+ kwargs=kwargs_dict_mani_reach_site_features_promp
+)
+ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append("dmc_manipulation-reach_site_promp-v0")
diff --git a/fancy_gym/dmc/dmc_wrapper.py b/fancy_gym/dmc/dmc_wrapper.py
new file mode 100644
index 0000000..b1522c3
--- /dev/null
+++ b/fancy_gym/dmc/dmc_wrapper.py
@@ -0,0 +1,186 @@
+# Adopted from: https://github.com/denisyarats/dmc2gym/blob/master/dmc2gym/wrappers.py
+# License: MIT
+# Copyright (c) 2020 Denis Yarats
+import collections
+from collections.abc import MutableMapping
+from typing import Any, Dict, Tuple, Optional, Union, Callable
+
+import gym
+import numpy as np
+from dm_control import composer
+from dm_control.rl import control
+from dm_env import specs
+from gym import spaces
+from gym.core import ObsType
+
+
+def _spec_to_box(spec):
+ def extract_min_max(s):
+ assert s.dtype == np.float64 or s.dtype == np.float32, \
+ f"Only float64 and float32 types are allowed, instead {s.dtype} was found"
+ dim = int(np.prod(s.shape))
+ if type(s) == specs.Array:
+ bound = np.inf * np.ones(dim, dtype=s.dtype)
+ return -bound, bound
+ elif type(s) == specs.BoundedArray:
+ zeros = np.zeros(dim, dtype=s.dtype)
+ return s.minimum + zeros, s.maximum + zeros
+
+ mins, maxs = [], []
+ for s in spec:
+ mn, mx = extract_min_max(s)
+ mins.append(mn)
+ maxs.append(mx)
+ low = np.concatenate(mins, axis=0)
+ high = np.concatenate(maxs, axis=0)
+ assert low.shape == high.shape
+ return spaces.Box(low, high, dtype=s.dtype)
+
+
+def _flatten_obs(obs: MutableMapping):
+ """
+ Flattens an observation of type MutableMapping, e.g. a dict to a 1D array.
+ Args:
+ obs: observation to flatten
+
+ Returns: 1D array of observation
+
+ """
+
+ if not isinstance(obs, MutableMapping):
+ raise ValueError(f'Requires dict-like observations structure. {type(obs)} found.')
+
+ # Keep key order consistent for non OrderedDicts
+ keys = obs.keys() if isinstance(obs, collections.OrderedDict) else sorted(obs.keys())
+
+ obs_vals = [np.array([obs[key]]) if np.isscalar(obs[key]) else obs[key].ravel() for key in keys]
+ return np.concatenate(obs_vals)
+
+
+class DMCWrapper(gym.Env):
+ def __init__(self,
+ env: Callable[[], Union[composer.Environment, control.Environment]],
+ ):
+
+ # TODO: Currently this is required to be a function because dmc does not allow to copy composers environments
+ self._env = env()
+
+ # action and observation space
+ self._action_space = _spec_to_box([self._env.action_spec()])
+ self._observation_space = _spec_to_box(self._env.observation_spec().values())
+
+ self._window = None
+ self.id = 'dmc'
+
+ def __getattr__(self, item):
+ """Propagate only non-existent properties to wrapped env."""
+ if item.startswith('_'):
+ raise AttributeError("attempted to get missing private attribute '{}'".format(item))
+ if item in self.__dict__:
+ return getattr(self, item)
+ return getattr(self._env, item)
+
+ def _get_obs(self, time_step):
+ obs = _flatten_obs(time_step.observation).astype(self.observation_space.dtype)
+ return obs
+
+ @property
+ def observation_space(self):
+ return self._observation_space
+
+ @property
+ def action_space(self):
+ return self._action_space
+
+ @property
+ def dt(self):
+ return self._env.control_timestep()
+
+ def seed(self, seed=None):
+ self._action_space.seed(seed)
+ self._observation_space.seed(seed)
+
+ def step(self, action) -> Tuple[np.ndarray, float, bool, Dict[str, Any]]:
+ assert self._action_space.contains(action)
+ extra = {'internal_state': self._env.physics.get_state().copy()}
+
+ time_step = self._env.step(action)
+ reward = time_step.reward or 0.
+ done = time_step.last()
+ obs = self._get_obs(time_step)
+ extra['discount'] = time_step.discount
+
+ return obs, reward, done, extra
+
+ def reset(self, *, seed: Optional[int] = None, return_info: bool = False,
+ options: Optional[dict] = None, ) -> Union[ObsType, Tuple[ObsType, dict]]:
+ time_step = self._env.reset()
+ obs = self._get_obs(time_step)
+ return obs
+
+ def render(self, mode='rgb_array', height=240, width=320, camera_id=-1, overlays=(), depth=False,
+ segmentation=False, scene_option=None, render_flag_overrides=None):
+
+ # assert mode == 'rgb_array', 'only support rgb_array mode, given %s' % mode
+ if mode == "rgb_array":
+ return self._env.physics.render(height=height, width=width, camera_id=camera_id, overlays=overlays,
+ depth=depth, segmentation=segmentation, scene_option=scene_option,
+ render_flag_overrides=render_flag_overrides)
+
+ # Render max available buffer size. Larger is only possible by altering the XML.
+ img = self._env.physics.render(height=self._env.physics.model.vis.global_.offheight,
+ width=self._env.physics.model.vis.global_.offwidth,
+ camera_id=camera_id, overlays=overlays, depth=depth, segmentation=segmentation,
+ scene_option=scene_option, render_flag_overrides=render_flag_overrides)
+
+ if depth:
+ img = np.dstack([img.astype(np.uint8)] * 3)
+
+ if mode == 'human':
+ try:
+ import cv2
+ if self._window is None:
+ self._window = cv2.namedWindow(self.id, cv2.WINDOW_AUTOSIZE)
+ cv2.imshow(self.id, img[..., ::-1]) # Image in BGR
+ cv2.waitKey(1)
+ except ImportError:
+ raise gym.error.DependencyNotInstalled("Rendering requires opencv. Run `pip install opencv-python`")
+ # PYGAME seems to destroy some global rendering configs from the physics render
+ # except ImportError:
+ # import pygame
+ # img_copy = img.copy().transpose((1, 0, 2))
+ # if self._window is None:
+ # pygame.init()
+ # pygame.display.init()
+ # self._window = pygame.display.set_mode(img_copy.shape[:2])
+ # self.clock = pygame.time.Clock()
+ #
+ # surf = pygame.surfarray.make_surface(img_copy)
+ # self._window.blit(surf, (0, 0))
+ # pygame.event.pump()
+ # self.clock.tick(30)
+ # pygame.display.flip()
+
+ def close(self):
+ super().close()
+ if self._window is not None:
+ try:
+ import cv2
+ cv2.destroyWindow(self.id)
+ except ImportError:
+ import pygame
+
+ pygame.display.quit()
+ pygame.quit()
+
+ @property
+ def reward_range(self) -> Tuple[float, float]:
+ reward_spec = self._env.reward_spec()
+ if isinstance(reward_spec, specs.BoundedArray):
+ return reward_spec.minimum, reward_spec.maximum
+ return -float('inf'), float('inf')
+
+ @property
+ def metadata(self):
+ return {'render.modes': ['human', 'rgb_array'],
+ 'video.frames_per_second': round(1.0 / self._env.control_timestep())}
diff --git a/fancy_gym/dmc/manipulation/__init__.py b/fancy_gym/dmc/manipulation/__init__.py
new file mode 100644
index 0000000..f3a4147
--- /dev/null
+++ b/fancy_gym/dmc/manipulation/__init__.py
@@ -0,0 +1 @@
+from . import reach_site
diff --git a/fancy_gym/dmc/manipulation/reach_site/__init__.py b/fancy_gym/dmc/manipulation/reach_site/__init__.py
new file mode 100644
index 0000000..989b5a9
--- /dev/null
+++ b/fancy_gym/dmc/manipulation/reach_site/__init__.py
@@ -0,0 +1 @@
+from .mp_wrapper import MPWrapper
\ No newline at end of file
diff --git a/fancy_gym/dmc/manipulation/reach_site/mp_wrapper.py b/fancy_gym/dmc/manipulation/reach_site/mp_wrapper.py
new file mode 100644
index 0000000..f64ac4a
--- /dev/null
+++ b/fancy_gym/dmc/manipulation/reach_site/mp_wrapper.py
@@ -0,0 +1,38 @@
+from typing import Tuple, Union
+
+import numpy as np
+
+from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
+
+
+class MPWrapper(RawInterfaceWrapper):
+
+ @property
+ def context_mask(self) -> np.ndarray:
+ # Joint and target positions are randomized, velocities are always set to 0.
+ return np.hstack([
+ [True] * 3, # target position
+ [True] * 12, # sin/cos arm joint position
+ [True] * 6, # arm joint torques
+ [False] * 6, # arm joint velocities
+ [True] * 3, # sin/cos hand joint position
+ [False] * 3, # hand joint velocities
+ [True] * 3, # hand pinch site position
+ [True] * 9, # pinch site rmat
+ ])
+
+ @property
+ def current_pos(self) -> Union[float, int, np.ndarray]:
+ return self.env.physics.named.data.qpos[:]
+
+ @property
+ def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
+ return self.env.physics.named.data.qvel[:]
+
+ @property
+ def goal_pos(self) -> Union[float, int, np.ndarray, Tuple]:
+ raise ValueError("Goal position is not available and has to be learnt based on the environment.")
+
+ @property
+ def dt(self) -> Union[float, int]:
+ return self.env.dt
diff --git a/fancy_gym/dmc/suite/__init__.py b/fancy_gym/dmc/suite/__init__.py
new file mode 100644
index 0000000..7d206fc
--- /dev/null
+++ b/fancy_gym/dmc/suite/__init__.py
@@ -0,0 +1 @@
+from . import cartpole, ball_in_cup, reacher
diff --git a/fancy_gym/dmc/suite/ball_in_cup/__init__.py b/fancy_gym/dmc/suite/ball_in_cup/__init__.py
new file mode 100644
index 0000000..989b5a9
--- /dev/null
+++ b/fancy_gym/dmc/suite/ball_in_cup/__init__.py
@@ -0,0 +1 @@
+from .mp_wrapper import MPWrapper
\ No newline at end of file
diff --git a/fancy_gym/dmc/suite/ball_in_cup/mp_wrapper.py b/fancy_gym/dmc/suite/ball_in_cup/mp_wrapper.py
new file mode 100644
index 0000000..dc6a539
--- /dev/null
+++ b/fancy_gym/dmc/suite/ball_in_cup/mp_wrapper.py
@@ -0,0 +1,34 @@
+from typing import Tuple, Union
+
+import numpy as np
+
+from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
+
+
+class MPWrapper(RawInterfaceWrapper):
+
+ @property
+ def context_mask(self) -> np.ndarray:
+ # Only the ball position is randomized; the cup position and all velocities always start at 0.
+ return np.hstack([
+ [False] * 2, # cup position
+ [True] * 2, # ball position
+ [False] * 2, # cup velocity
+ [False] * 2, # ball velocity
+ ])
+
+ @property
+ def current_pos(self) -> Union[float, int, np.ndarray]:
+ return np.hstack([self.env.physics.named.data.qpos['cup_x'], self.env.physics.named.data.qpos['cup_z']])
+
+ @property
+ def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
+ return np.hstack([self.env.physics.named.data.qvel['cup_x'], self.env.physics.named.data.qvel['cup_z']])
+
+ @property
+ def goal_pos(self) -> Union[float, int, np.ndarray, Tuple]:
+ raise ValueError("Goal position is not available and has to be learnt based on the environment.")
+
+ @property
+ def dt(self) -> Union[float, int]:
+ return self.env.dt
diff --git a/fancy_gym/dmc/suite/cartpole/__init__.py b/fancy_gym/dmc/suite/cartpole/__init__.py
new file mode 100644
index 0000000..8867457
--- /dev/null
+++ b/fancy_gym/dmc/suite/cartpole/__init__.py
@@ -0,0 +1,3 @@
+from .mp_wrapper import MPWrapper
+from .mp_wrapper import ThreePolesMPWrapper
+from .mp_wrapper import TwoPolesMPWrapper
diff --git a/fancy_gym/dmc/suite/cartpole/mp_wrapper.py b/fancy_gym/dmc/suite/cartpole/mp_wrapper.py
new file mode 100644
index 0000000..7edd51f
--- /dev/null
+++ b/fancy_gym/dmc/suite/cartpole/mp_wrapper.py
@@ -0,0 +1,50 @@
+from typing import Tuple, Union
+
+import numpy as np
+
+from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
+
+
+class MPWrapper(RawInterfaceWrapper):
+
+ def __init__(self, env, n_poles: int = 1):
+ self.n_poles = n_poles
+ super().__init__(env)
+
+ @property
+ def context_mask(self) -> np.ndarray:
+ # The full cart-pole observation (slider position, hinge angles, and velocities) is exposed as context.
+ return np.hstack([
+ [True], # slider position
+ [True] * 2 * self.n_poles, # sin/cos hinge angles
+ [True], # slider velocity
+ [True] * self.n_poles, # hinge velocities
+ ])
+
+ @property
+ def current_pos(self) -> Union[float, int, np.ndarray]:
+ return self.env.physics.named.data.qpos["slider"]
+
+ @property
+ def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
+ return self.env.physics.named.data.qvel["slider"]
+
+ @property
+ def goal_pos(self) -> Union[float, int, np.ndarray, Tuple]:
+ raise ValueError("Goal position is not available and has to be learnt based on the environment.")
+
+ @property
+ def dt(self) -> Union[float, int]:
+ return self.env.dt
+
+
+class TwoPolesMPWrapper(MPWrapper):
+
+ def __init__(self, env):
+ super().__init__(env, n_poles=2)
+
+
+class ThreePolesMPWrapper(MPWrapper):
+
+ def __init__(self, env):
+ super().__init__(env, n_poles=3)
diff --git a/fancy_gym/dmc/suite/reacher/__init__.py b/fancy_gym/dmc/suite/reacher/__init__.py
new file mode 100644
index 0000000..989b5a9
--- /dev/null
+++ b/fancy_gym/dmc/suite/reacher/__init__.py
@@ -0,0 +1 @@
+from .mp_wrapper import MPWrapper
\ No newline at end of file
diff --git a/fancy_gym/dmc/suite/reacher/mp_wrapper.py b/fancy_gym/dmc/suite/reacher/mp_wrapper.py
new file mode 100644
index 0000000..5ac52e5
--- /dev/null
+++ b/fancy_gym/dmc/suite/reacher/mp_wrapper.py
@@ -0,0 +1,33 @@
+from typing import Tuple, Union
+
+import numpy as np
+
+from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
+
+
+class MPWrapper(RawInterfaceWrapper):
+
+ @property
+ def context_mask(self) -> np.ndarray:
+ # Joint and target positions are randomized, velocities are always set to 0.
+ return np.hstack([
+ [True] * 2, # joint position
+ [True] * 2, # target position
+ [False] * 2, # joint velocity
+ ])
+
+ @property
+ def current_pos(self) -> Union[float, int, np.ndarray]:
+ return self.env.physics.named.data.qpos[:]
+
+ @property
+ def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
+ return self.env.physics.named.data.qvel[:]
+
+ @property
+ def goal_pos(self) -> Union[float, int, np.ndarray, Tuple]:
+ raise ValueError("Goal position is not available and has to be learnt based on the environment.")
+
+ @property
+ def dt(self) -> Union[float, int]:
+ return self.env.dt
diff --git a/fancy_gym/envs/__init__.py b/fancy_gym/envs/__init__.py
new file mode 100644
index 0000000..a00570c
--- /dev/null
+++ b/fancy_gym/envs/__init__.py
@@ -0,0 +1,664 @@
+from copy import deepcopy
+
+import numpy as np
+from gym import register
+
+from . import classic_control, mujoco
+from .classic_control.hole_reacher.hole_reacher import HoleReacherEnv
+from .classic_control.simple_reacher.simple_reacher import SimpleReacherEnv
+from .classic_control.viapoint_reacher.viapoint_reacher import ViaPointReacherEnv
+from .mujoco.ant_jump.ant_jump import MAX_EPISODE_STEPS_ANTJUMP
+from .mujoco.beerpong.beerpong import MAX_EPISODE_STEPS_BEERPONG, FIXED_RELEASE_STEP
+from .mujoco.half_cheetah_jump.half_cheetah_jump import MAX_EPISODE_STEPS_HALFCHEETAHJUMP
+from .mujoco.hopper_jump.hopper_jump import MAX_EPISODE_STEPS_HOPPERJUMP
+from .mujoco.hopper_jump.hopper_jump_on_box import MAX_EPISODE_STEPS_HOPPERJUMPONBOX
+from .mujoco.hopper_throw.hopper_throw import MAX_EPISODE_STEPS_HOPPERTHROW
+from .mujoco.hopper_throw.hopper_throw_in_basket import MAX_EPISODE_STEPS_HOPPERTHROWINBASKET
+from .mujoco.reacher.reacher import ReacherEnv, MAX_EPISODE_STEPS_REACHER
+from .mujoco.walker_2d_jump.walker_2d_jump import MAX_EPISODE_STEPS_WALKERJUMP
+
+ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS = {"DMP": [], "ProMP": []}
+
+DEFAULT_BB_DICT_ProMP = {
+ "name": 'EnvName',
+ "wrappers": [],
+ "trajectory_generator_kwargs": {
+ 'trajectory_generator_type': 'promp'
+ },
+ "phase_generator_kwargs": {
+ 'phase_generator_type': 'linear'
+ },
+ "controller_kwargs": {
+ 'controller_type': 'motor',
+ "p_gains": 1.0,
+ "d_gains": 0.1,
+ },
+ "basis_generator_kwargs": {
+ 'basis_generator_type': 'zero_rbf',
+ 'num_basis': 5,
+ 'num_basis_zero_start': 1
+ }
+}
+
+DEFAULT_BB_DICT_DMP = {
+ "name": 'EnvName',
+ "wrappers": [],
+ "trajectory_generator_kwargs": {
+ 'trajectory_generator_type': 'dmp'
+ },
+ "phase_generator_kwargs": {
+ 'phase_generator_type': 'exp'
+ },
+ "controller_kwargs": {
+ 'controller_type': 'motor',
+ "p_gains": 1.0,
+ "d_gains": 0.1,
+ },
+ "basis_generator_kwargs": {
+ 'basis_generator_type': 'rbf',
+ 'num_basis': 5
+ }
+}
+
+# Classic Control
+## Simple Reacher
+register(
+ id='SimpleReacher-v0',
+ entry_point='fancy_gym.envs.classic_control:SimpleReacherEnv',
+ max_episode_steps=200,
+ kwargs={
+ "n_links": 2,
+ }
+)
+
+register(
+ id='LongSimpleReacher-v0',
+ entry_point='fancy_gym.envs.classic_control:SimpleReacherEnv',
+ max_episode_steps=200,
+ kwargs={
+ "n_links": 5,
+ }
+)
+
+## Viapoint Reacher
+
+register(
+ id='ViaPointReacher-v0',
+ entry_point='fancy_gym.envs.classic_control:ViaPointReacherEnv',
+ max_episode_steps=200,
+ kwargs={
+ "n_links": 5,
+ "allow_self_collision": False,
+ "collision_penalty": 1000
+ }
+)
+
+## Hole Reacher
+register(
+ id='HoleReacher-v0',
+ entry_point='fancy_gym.envs.classic_control:HoleReacherEnv',
+ max_episode_steps=200,
+ kwargs={
+ "n_links": 5,
+ "random_start": True,
+ "allow_self_collision": False,
+ "allow_wall_collision": False,
+ "hole_width": None,
+ "hole_depth": 1,
+ "hole_x": None,
+ "collision_penalty": 100,
+ }
+)
+
+# Mujoco
+
+## Mujoco Reacher
+for _dims in [5, 7]:
+ register(
+ id=f'Reacher{_dims}d-v0',
+ entry_point='fancy_gym.envs.mujoco:ReacherEnv',
+ max_episode_steps=MAX_EPISODE_STEPS_REACHER,
+ kwargs={
+ "n_links": _dims,
+ }
+ )
+
+ register(
+ id=f'Reacher{_dims}dSparse-v0',
+ entry_point='fancy_gym.envs.mujoco:ReacherEnv',
+ max_episode_steps=200,
+ kwargs={
+ "sparse": True,
+ 'reward_weight': 200,
+ "n_links": _dims,
+ }
+ )
+
+register(
+ id='HopperJumpSparse-v0',
+ entry_point='fancy_gym.envs.mujoco:HopperJumpEnv',
+ max_episode_steps=MAX_EPISODE_STEPS_HOPPERJUMP,
+ kwargs={
+ "sparse": True,
+ }
+)
+
+register(
+ id='HopperJump-v0',
+ entry_point='fancy_gym.envs.mujoco:HopperJumpEnv',
+ max_episode_steps=MAX_EPISODE_STEPS_HOPPERJUMP,
+ kwargs={
+ "sparse": False,
+ "healthy_reward": 1.0,
+ "contact_weight": 0.0,
+ "height_weight": 3.0,
+ }
+)
+
+register(
+ id='AntJump-v0',
+ entry_point='fancy_gym.envs.mujoco:AntJumpEnv',
+ max_episode_steps=MAX_EPISODE_STEPS_ANTJUMP,
+)
+
+register(
+ id='HalfCheetahJump-v0',
+ entry_point='fancy_gym.envs.mujoco:HalfCheetahJumpEnv',
+ max_episode_steps=MAX_EPISODE_STEPS_HALFCHEETAHJUMP,
+)
+
+register(
+ id='HopperJumpOnBox-v0',
+ entry_point='fancy_gym.envs.mujoco:HopperJumpOnBoxEnv',
+ max_episode_steps=MAX_EPISODE_STEPS_HOPPERJUMPONBOX,
+)
+
+register(
+ id='HopperThrow-v0',
+ entry_point='fancy_gym.envs.mujoco:HopperThrowEnv',
+ max_episode_steps=MAX_EPISODE_STEPS_HOPPERTHROW,
+)
+
+register(
+ id='HopperThrowInBasket-v0',
+ entry_point='fancy_gym.envs.mujoco:HopperThrowInBasketEnv',
+ max_episode_steps=MAX_EPISODE_STEPS_HOPPERTHROWINBASKET,
+)
+
+register(
+ id='Walker2DJump-v0',
+ entry_point='fancy_gym.envs.mujoco:Walker2dJumpEnv',
+ max_episode_steps=MAX_EPISODE_STEPS_WALKERJUMP,
+)
+
+register(
+ id='BeerPong-v0',
+ entry_point='fancy_gym.envs.mujoco:BeerPongEnv',
+ max_episode_steps=MAX_EPISODE_STEPS_BEERPONG,
+)
+
+# Here we use the same reward as in BeerPong-v0, but after the release only a single
+# environment step is taken, i.e. that one step simulates until the end of the episode.
+register(
+ id='BeerPongStepBased-v0',
+ entry_point='fancy_gym.envs.mujoco:BeerPongEnvStepBasedEpisodicReward',
+ max_episode_steps=FIXED_RELEASE_STEP,
+)
+
+
+# Movement Primitive Environments
+
+## Simple Reacher
+_versions = ["SimpleReacher-v0", "LongSimpleReacher-v0"]
+for _v in _versions:
+ _name = _v.split("-")
+ _env_id = f'{_name[0]}DMP-{_name[1]}'
+ kwargs_dict_simple_reacher_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
+ kwargs_dict_simple_reacher_dmp['wrappers'].append(classic_control.simple_reacher.MPWrapper)
+ kwargs_dict_simple_reacher_dmp['controller_kwargs']['p_gains'] = 0.6
+ kwargs_dict_simple_reacher_dmp['controller_kwargs']['d_gains'] = 0.075
+ kwargs_dict_simple_reacher_dmp['trajectory_generator_kwargs']['weight_scale'] = 50
+ kwargs_dict_simple_reacher_dmp['phase_generator_kwargs']['alpha_phase'] = 2
+ kwargs_dict_simple_reacher_dmp['name'] = f"{_v}"
+ register(
+ id=_env_id,
+ entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
+ kwargs=kwargs_dict_simple_reacher_dmp
+ )
+ ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id)
+
+ _env_id = f'{_name[0]}ProMP-{_name[1]}'
+ kwargs_dict_simple_reacher_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
+ kwargs_dict_simple_reacher_promp['wrappers'].append(classic_control.simple_reacher.MPWrapper)
+ kwargs_dict_simple_reacher_promp['controller_kwargs']['p_gains'] = 0.6
+ kwargs_dict_simple_reacher_promp['controller_kwargs']['d_gains'] = 0.075
+ kwargs_dict_simple_reacher_promp['name'] = _v
+ register(
+ id=_env_id,
+ entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
+ kwargs=kwargs_dict_simple_reacher_promp
+ )
+ ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
+
+# Viapoint reacher
+kwargs_dict_via_point_reacher_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
+kwargs_dict_via_point_reacher_dmp['wrappers'].append(classic_control.viapoint_reacher.MPWrapper)
+kwargs_dict_via_point_reacher_dmp['controller_kwargs']['controller_type'] = 'velocity'
+kwargs_dict_via_point_reacher_dmp['trajectory_generator_kwargs']['weight_scale'] = 50
+kwargs_dict_via_point_reacher_dmp['phase_generator_kwargs']['alpha_phase'] = 2
+kwargs_dict_via_point_reacher_dmp['name'] = "ViaPointReacher-v0"
+register(
+ id='ViaPointReacherDMP-v0',
+ entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
+ # max_episode_steps=1,
+ kwargs=kwargs_dict_via_point_reacher_dmp
+)
+ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append("ViaPointReacherDMP-v0")
+
+kwargs_dict_via_point_reacher_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
+kwargs_dict_via_point_reacher_promp['wrappers'].append(classic_control.viapoint_reacher.MPWrapper)
+kwargs_dict_via_point_reacher_promp['controller_kwargs']['controller_type'] = 'velocity'
+kwargs_dict_via_point_reacher_promp['name'] = "ViaPointReacher-v0"
+register(
+ id="ViaPointReacherProMP-v0",
+ entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
+ kwargs=kwargs_dict_via_point_reacher_promp
+)
+ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append("ViaPointReacherProMP-v0")
+
+## Hole Reacher
+_versions = ["HoleReacher-v0"]
+for _v in _versions:
+ _name = _v.split("-")
+ _env_id = f'{_name[0]}DMP-{_name[1]}'
+ kwargs_dict_hole_reacher_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
+ kwargs_dict_hole_reacher_dmp['wrappers'].append(classic_control.hole_reacher.MPWrapper)
+ kwargs_dict_hole_reacher_dmp['controller_kwargs']['controller_type'] = 'velocity'
+ # TODO: Previously a weight scale of 50 and a goal scale of 0.1 were used. We now only have a weight scale and thus set it to 500. Check this.
+ kwargs_dict_hole_reacher_dmp['trajectory_generator_kwargs']['weight_scale'] = 500
+ kwargs_dict_hole_reacher_dmp['phase_generator_kwargs']['alpha_phase'] = 2.5
+ kwargs_dict_hole_reacher_dmp['name'] = _v
+ register(
+ id=_env_id,
+ entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
+ # max_episode_steps=1,
+ kwargs=kwargs_dict_hole_reacher_dmp
+ )
+ ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id)
+
+ _env_id = f'{_name[0]}ProMP-{_name[1]}'
+ kwargs_dict_hole_reacher_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
+ kwargs_dict_hole_reacher_promp['wrappers'].append(classic_control.hole_reacher.MPWrapper)
+ kwargs_dict_hole_reacher_promp['trajectory_generator_kwargs']['weight_scale'] = 2
+ kwargs_dict_hole_reacher_promp['controller_kwargs']['controller_type'] = 'velocity'
+ kwargs_dict_hole_reacher_promp['name'] = f"{_v}"
+ register(
+ id=_env_id,
+ entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
+ kwargs=kwargs_dict_hole_reacher_promp
+ )
+ ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
+
+## ReacherNd
+_versions = ["Reacher5d-v0", "Reacher7d-v0", "Reacher5dSparse-v0", "Reacher7dSparse-v0"]
+for _v in _versions:
+ _name = _v.split("-")
+ _env_id = f'{_name[0]}DMP-{_name[1]}'
+ kwargs_dict_reacher_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
+ kwargs_dict_reacher_dmp['wrappers'].append(mujoco.reacher.MPWrapper)
+ kwargs_dict_reacher_dmp['phase_generator_kwargs']['alpha_phase'] = 2
+ kwargs_dict_reacher_dmp['name'] = _v
+ register(
+ id=_env_id,
+ entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
+ # max_episode_steps=1,
+ kwargs=kwargs_dict_reacher_dmp
+ )
+ ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id)
+
+ _env_id = f'{_name[0]}ProMP-{_name[1]}'
+ kwargs_dict_reacher_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
+ kwargs_dict_reacher_promp['wrappers'].append(mujoco.reacher.MPWrapper)
+ kwargs_dict_reacher_promp['name'] = _v
+ register(
+ id=_env_id,
+ entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
+ kwargs=kwargs_dict_reacher_promp
+ )
+ ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
+
+########################################################################################################################
+## Beerpong ProMP
+_versions = ['BeerPong-v0']
+for _v in _versions:
+ _name = _v.split("-")
+ _env_id = f'{_name[0]}ProMP-{_name[1]}'
+ kwargs_dict_bp_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
+ kwargs_dict_bp_promp['wrappers'].append(mujoco.beerpong.MPWrapper)
+ kwargs_dict_bp_promp['phase_generator_kwargs']['learn_tau'] = True
+ kwargs_dict_bp_promp['controller_kwargs']['p_gains'] = np.array([1.5, 5, 2.55, 3, 2., 2, 1.25])
+ kwargs_dict_bp_promp['controller_kwargs']['d_gains'] = np.array([0.02333333, 0.1, 0.0625, 0.08, 0.03, 0.03, 0.0125])
+ kwargs_dict_bp_promp['basis_generator_kwargs']['num_basis'] = 2
+ kwargs_dict_bp_promp['basis_generator_kwargs']['num_basis_zero_start'] = 2
+ kwargs_dict_bp_promp['name'] = _v
+ register(
+ id=_env_id,
+ entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
+ kwargs=kwargs_dict_bp_promp
+ )
+ ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
+
+### BP with Fixed release
+_versions = ["BeerPongStepBased-v0", 'BeerPong-v0']
+for _v in _versions:
+ if _v != 'BeerPong-v0':
+ _name = _v.split("-")
+ _env_id = f'{_name[0]}ProMP-{_name[1]}'
+ else:
+ _env_id = 'BeerPongFixedReleaseProMP-v0'
+ kwargs_dict_bp_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
+ kwargs_dict_bp_promp['wrappers'].append(mujoco.beerpong.MPWrapper)
+ kwargs_dict_bp_promp['phase_generator_kwargs']['tau'] = 0.62
+ kwargs_dict_bp_promp['controller_kwargs']['p_gains'] = np.array([1.5, 5, 2.55, 3, 2., 2, 1.25])
+ kwargs_dict_bp_promp['controller_kwargs']['d_gains'] = np.array([0.02333333, 0.1, 0.0625, 0.08, 0.03, 0.03, 0.0125])
+ kwargs_dict_bp_promp['basis_generator_kwargs']['num_basis'] = 2
+ kwargs_dict_bp_promp['basis_generator_kwargs']['num_basis_zero_start'] = 2
+ kwargs_dict_bp_promp['name'] = _v
+ register(
+ id=_env_id,
+ entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
+ kwargs=kwargs_dict_bp_promp
+ )
+ ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
+########################################################################################################################
+
+## Table Tennis needs to be fixed according to Zhou's implementation
+
+# TODO: Add later when finished
+# ########################################################################################################################
+#
+# ## AntJump
+# _versions = ['AntJump-v0']
+# for _v in _versions:
+# _name = _v.split("-")
+# _env_id = f'{_name[0]}ProMP-{_name[1]}'
+# kwargs_dict_ant_jump_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
+# kwargs_dict_ant_jump_promp['wrappers'].append(mujoco.ant_jump.MPWrapper)
+# kwargs_dict_ant_jump_promp['name'] = _v
+# register(
+# id=_env_id,
+# entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
+# kwargs=kwargs_dict_ant_jump_promp
+# )
+# ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
+#
+# ########################################################################################################################
+#
+# ## HalfCheetahJump
+# _versions = ['HalfCheetahJump-v0']
+# for _v in _versions:
+# _name = _v.split("-")
+# _env_id = f'{_name[0]}ProMP-{_name[1]}'
+# kwargs_dict_halfcheetah_jump_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
+# kwargs_dict_halfcheetah_jump_promp['wrappers'].append(mujoco.half_cheetah_jump.MPWrapper)
+# kwargs_dict_halfcheetah_jump_promp['name'] = _v
+# register(
+# id=_env_id,
+# entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
+# kwargs=kwargs_dict_halfcheetah_jump_promp
+# )
+# ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
+#
+# ########################################################################################################################
+
+
+## HopperJump
+_versions = ['HopperJump-v0', 'HopperJumpSparse-v0',
+ # 'HopperJumpOnBox-v0', 'HopperThrow-v0', 'HopperThrowInBasket-v0'
+ ]
+# TODO: Check if all environments work with the same MPWrapper
+for _v in _versions:
+ _name = _v.split("-")
+ _env_id = f'{_name[0]}ProMP-{_name[1]}'
+ kwargs_dict_hopper_jump_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
+ kwargs_dict_hopper_jump_promp['wrappers'].append(mujoco.hopper_jump.MPWrapper)
+ kwargs_dict_hopper_jump_promp['name'] = _v
+ register(
+ id=_env_id,
+ entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
+ kwargs=kwargs_dict_hopper_jump_promp
+ )
+ ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
+
+# ########################################################################################################################
+#
+#
+# ## Walker2DJump
+# _versions = ['Walker2DJump-v0']
+# for _v in _versions:
+# _name = _v.split("-")
+# _env_id = f'{_name[0]}ProMP-{_name[1]}'
+# kwargs_dict_walker2d_jump_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
+# kwargs_dict_walker2d_jump_promp['wrappers'].append(mujoco.walker_2d_jump.MPWrapper)
+# kwargs_dict_walker2d_jump_promp['name'] = _v
+# register(
+# id=_env_id,
+# entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
+# kwargs=kwargs_dict_walker2d_jump_promp
+# )
+# ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
+
+### Deprecated: we will not provide non-random starts anymore
+"""
+register(
+ id='SimpleReacher-v1',
+ entry_point='fancy_gym.envs.classic_control:SimpleReacherEnv',
+ max_episode_steps=200,
+ kwargs={
+ "n_links": 2,
+ "random_start": False
+ }
+)
+
+register(
+ id='LongSimpleReacher-v1',
+ entry_point='fancy_gym.envs.classic_control:SimpleReacherEnv',
+ max_episode_steps=200,
+ kwargs={
+ "n_links": 5,
+ "random_start": False
+ }
+)
+register(
+ id='HoleReacher-v1',
+ entry_point='fancy_gym.envs.classic_control:HoleReacherEnv',
+ max_episode_steps=200,
+ kwargs={
+ "n_links": 5,
+ "random_start": False,
+ "allow_self_collision": False,
+ "allow_wall_collision": False,
+ "hole_width": 0.25,
+ "hole_depth": 1,
+ "hole_x": None,
+ "collision_penalty": 100,
+ }
+)
+register(
+ id='HoleReacher-v2',
+ entry_point='fancy_gym.envs.classic_control:HoleReacherEnv',
+ max_episode_steps=200,
+ kwargs={
+ "n_links": 5,
+ "random_start": False,
+ "allow_self_collision": False,
+ "allow_wall_collision": False,
+ "hole_width": 0.25,
+ "hole_depth": 1,
+ "hole_x": 2,
+ "collision_penalty": 1,
+ }
+)
+
+# CtxtFree are v0, Contextual are v1
+register(
+ id='AntJump-v0',
+ entry_point='fancy_gym.envs.mujoco:AntJumpEnv',
+ max_episode_steps=MAX_EPISODE_STEPS_ANTJUMP,
+ kwargs={
+ "max_episode_steps": MAX_EPISODE_STEPS_ANTJUMP,
+ "context": False
+ }
+)
+# CtxtFree are v0, Contextual are v1
+register(
+ id='HalfCheetahJump-v0',
+ entry_point='fancy_gym.envs.mujoco:HalfCheetahJumpEnv',
+ max_episode_steps=MAX_EPISODE_STEPS_HALFCHEETAHJUMP,
+ kwargs={
+ "max_episode_steps": MAX_EPISODE_STEPS_HALFCHEETAHJUMP,
+ "context": False
+ }
+)
+register(
+ id='HopperJump-v0',
+ entry_point='fancy_gym.envs.mujoco:HopperJumpEnv',
+ max_episode_steps=MAX_EPISODE_STEPS_HOPPERJUMP,
+ kwargs={
+ "max_episode_steps": MAX_EPISODE_STEPS_HOPPERJUMP,
+ "context": False,
+ "healthy_reward": 1.0
+ }
+)
+
+"""
+
+### Deprecated: used for the CoRL paper
+"""
+_vs = np.arange(101).tolist() + [1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3, 1e-2, 5e-2, 1e-1, 5e-1]
+for i in _vs:
+ _env_id = f'ALRReacher{i}-v0'
+ register(
+ id=_env_id,
+ entry_point='fancy_gym.envs.mujoco:ReacherEnv',
+ max_episode_steps=200,
+ kwargs={
+ "steps_before_reward": 0,
+ "n_links": 5,
+ "balance": False,
+ '_ctrl_cost_weight': i
+ }
+ )
+
+ _env_id = f'ALRReacherSparse{i}-v0'
+ register(
+ id=_env_id,
+ entry_point='fancy_gym.envs.mujoco:ReacherEnv',
+ max_episode_steps=200,
+ kwargs={
+ "steps_before_reward": 200,
+ "n_links": 5,
+ "balance": False,
+ '_ctrl_cost_weight': i
+ }
+ )
+ _vs = np.arange(101).tolist() + [1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3, 1e-2, 5e-2, 1e-1, 5e-1]
+for i in _vs:
+ _env_id = f'ALRReacher{i}ProMP-v0'
+ register(
+ id=_env_id,
+ entry_point='fancy_gym.utils.make_env_helpers:make_promp_env_helper',
+ kwargs={
+ "name": f"{_env_id.replace('ProMP', '')}",
+ "wrappers": [mujoco.reacher.MPWrapper],
+ "mp_kwargs": {
+ "num_dof": 5,
+ "num_basis": 5,
+ "duration": 4,
+ "policy_type": "motor",
+ # "weights_scale": 5,
+ "n_zero_basis": 1,
+ "zero_start": True,
+ "policy_kwargs": {
+ "p_gains": 1,
+ "d_gains": 0.1
+ }
+ }
+ }
+ )
+
+ _env_id = f'ALRReacherSparse{i}ProMP-v0'
+ register(
+ id=_env_id,
+ entry_point='fancy_gym.utils.make_env_helpers:make_promp_env_helper',
+ kwargs={
+ "name": f"{_env_id.replace('ProMP', '')}",
+ "wrappers": [mujoco.reacher.MPWrapper],
+ "mp_kwargs": {
+ "num_dof": 5,
+ "num_basis": 5,
+ "duration": 4,
+ "policy_type": "motor",
+ # "weights_scale": 5,
+ "n_zero_basis": 1,
+ "zero_start": True,
+ "policy_kwargs": {
+ "p_gains": 1,
+ "d_gains": 0.1
+ }
+ }
+ }
+ )
+
+ register(
+ id='HopperJumpOnBox-v0',
+ entry_point='fancy_gym.envs.mujoco:HopperJumpOnBoxEnv',
+ max_episode_steps=MAX_EPISODE_STEPS_HOPPERJUMPONBOX,
+ kwargs={
+ "max_episode_steps": MAX_EPISODE_STEPS_HOPPERJUMPONBOX,
+ "context": False
+ }
+ )
+ register(
+ id='HopperThrow-v0',
+ entry_point='fancy_gym.envs.mujoco:HopperThrowEnv',
+ max_episode_steps=MAX_EPISODE_STEPS_HOPPERTHROW,
+ kwargs={
+ "max_episode_steps": MAX_EPISODE_STEPS_HOPPERTHROW,
+ "context": False
+ }
+ )
+ register(
+ id='HopperThrowInBasket-v0',
+ entry_point='fancy_gym.envs.mujoco:HopperThrowInBasketEnv',
+ max_episode_steps=MAX_EPISODE_STEPS_HOPPERTHROWINBASKET,
+ kwargs={
+ "max_episode_steps": MAX_EPISODE_STEPS_HOPPERTHROWINBASKET,
+ "context": False
+ }
+ )
+ register(
+ id='Walker2DJump-v0',
+ entry_point='fancy_gym.envs.mujoco:Walker2dJumpEnv',
+ max_episode_steps=MAX_EPISODE_STEPS_WALKERJUMP,
+ kwargs={
+ "max_episode_steps": MAX_EPISODE_STEPS_WALKERJUMP,
+ "context": False
+ }
+ )
+ register(id='TableTennis2DCtxt-v1',
+ entry_point='fancy_gym.envs.mujoco:TTEnvGym',
+ max_episode_steps=MAX_EPISODE_STEPS,
+ kwargs={'ctxt_dim': 2, 'fixed_goal': True})
+
+ register(
+ id='BeerPong-v0',
+ entry_point='fancy_gym.envs.mujoco:BeerBongEnv',
+ max_episode_steps=300,
+ kwargs={
+ "rndm_goal": False,
+ "cup_goal_pos": [0.1, -2.0],
+ "frame_skip": 2
+ }
+ )
+"""
diff --git a/fancy_gym/envs/classic_control/README.MD b/fancy_gym/envs/classic_control/README.MD
new file mode 100644
index 0000000..bd1b68b
--- /dev/null
+++ b/fancy_gym/envs/classic_control/README.MD
@@ -0,0 +1,18 @@
+## Classic Control
+
+### Step-based Environments
+|Name| Description|Horizon|Action Dimension|Observation Dimension
+|---|---|---|---|---|
+|`SimpleReacher-v0`| Simple reaching task (2 links) without any physics simulation. Provides no reward until time step 150. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory.| 200 | 2 | 9
+|`LongSimpleReacher-v0`| Simple reaching task (5 links) without any physics simulation. Provides no reward until time step 150. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory.| 200 | 5 | 18
+|`ViaPointReacher-v0`| Simple reaching task leveraging a via point, which supports self-collision detection. Provides a reward only at time steps 100 and 199 for reaching the via point and goal point, respectively.| 200 | 5 | 18
+|`HoleReacher-v0`| 5-link reaching task where the end-effector needs to reach into a narrow hole without colliding with itself or the walls.| 200 | 5 | 18
+
+### MP Environments
+|Name| Description|Horizon|Action Dimension|Context Dimension
+|---|---|---|---|---|
+|`ViaPointReacherDMP-v0`| A DMP provides a trajectory for the `ViaPointReacher-v0` task. | 200 | 25
+|`HoleReacherFixedGoalDMP-v0`| A DMP provides a trajectory for the `HoleReacher-v0` task with a fixed goal attractor. | 200 | 25
+|`HoleReacherDMP-v0`| A DMP provides a trajectory for the `HoleReacher-v0` task. The goal attractor needs to be learned. | 200 | 30
+
+[//]: |`HoleReacherProMP-v0`|
\ No newline at end of file
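Complementing the MP table above: the DMP/ProMP variants are episodic, so a single action provides the full trajectory parametrization and one `step` call executes the whole rollout. A rough sketch of the intended interaction (assuming the IDs above are registered by importing `fancy_gym.envs`):

```python
import gym
import fancy_gym.envs  # noqa: F401  # assumption: registers the MP IDs listed above

env = gym.make('ViaPointReacherDMP-v0')
env.reset()

# One action corresponds to one full set of DMP parameters;
# a single step executes the complete trajectory.
params = env.action_space.sample()
obs, episode_reward, done, info = env.step(params)
```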
diff --git a/fancy_gym/envs/classic_control/__init__.py b/fancy_gym/envs/classic_control/__init__.py
new file mode 100644
index 0000000..73575ab
--- /dev/null
+++ b/fancy_gym/envs/classic_control/__init__.py
@@ -0,0 +1,3 @@
+from .hole_reacher.hole_reacher import HoleReacherEnv
+from .simple_reacher.simple_reacher import SimpleReacherEnv
+from .viapoint_reacher.viapoint_reacher import ViaPointReacherEnv
diff --git a/fancy_gym/envs/classic_control/base_reacher/__init__.py b/fancy_gym/envs/classic_control/base_reacher/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/fancy_gym/envs/classic_control/base_reacher/base_reacher.py b/fancy_gym/envs/classic_control/base_reacher/base_reacher.py
new file mode 100644
index 0000000..abfe2ca
--- /dev/null
+++ b/fancy_gym/envs/classic_control/base_reacher/base_reacher.py
@@ -0,0 +1,148 @@
+from abc import ABC, abstractmethod
+from typing import Union, Tuple, Optional
+
+import gym
+import numpy as np
+from gym import spaces
+from gym.core import ObsType
+from gym.utils import seeding
+
+from fancy_gym.envs.classic_control.utils import intersect
+
+
+class BaseReacherEnv(gym.Env, ABC):
+ """
+ Base class for all reaching environments.
+ """
+
+ def __init__(self, n_links: int, random_start: bool = True, allow_self_collision: bool = False):
+ super().__init__()
+ self.link_lengths = np.ones(n_links)
+ self.n_links = n_links
+ self._dt = 0.01
+
+ self.random_start = random_start
+
+ self.allow_self_collision = allow_self_collision
+
+ # state
+ self._joints = None
+ self._joint_angles = None
+ self._angle_velocity = None
+ self._acc = None
+ self._start_pos = np.hstack([[np.pi / 2], np.zeros(self.n_links - 1)])
+ self._start_vel = np.zeros(self.n_links)
+
+ # joint limits
+ self.j_min = -np.pi * np.ones(n_links)
+ self.j_max = np.pi * np.ones(n_links)
+
+ self.steps_before_reward = 199
+
+ state_bound = np.hstack([
+ [np.pi] * self.n_links, # cos
+ [np.pi] * self.n_links, # sin
+ [np.inf] * self.n_links, # velocity
+ [np.inf] * 2, # x-y coordinates of target distance
+ [np.inf] # env steps, because the reward starts after n steps TODO: Maybe
+ ])
+
+ self.observation_space = spaces.Box(low=-state_bound, high=state_bound, shape=state_bound.shape)
+
+ self.reward_function = None # Needs to be set in sub class
+
+ # containers for plotting
+ self.metadata = {'render.modes': ["human"]}
+ self.fig = None
+
+ self._steps = 0
+ self.seed()
+
+ @property
+ def dt(self) -> Union[float, int]:
+ return self._dt
+
+ @property
+ def current_pos(self):
+ return self._joint_angles.copy()
+
+ @property
+ def current_vel(self):
+ return self._angle_velocity.copy()
+
+ def reset(self, *, seed: Optional[int] = None, return_info: bool = False,
+ options: Optional[dict] = None, ) -> Union[ObsType, Tuple[ObsType, dict]]:
+ # Sample only orientation of first link, i.e. the arm is always straight.
+ if self.random_start:
+ first_joint = self.np_random.uniform(np.pi / 4, 3 * np.pi / 4)
+ self._joint_angles = np.hstack([[first_joint], np.zeros(self.n_links - 1)])
+ self._start_pos = self._joint_angles.copy()
+ else:
+ self._joint_angles = self._start_pos
+
+ self._angle_velocity = self._start_vel
+ self._joints = np.zeros((self.n_links + 1, 2))
+ self._update_joints()
+ self._steps = 0
+
+ return self._get_obs().copy()
+
+ @abstractmethod
+ def step(self, action: np.ndarray):
+ """
+ A single step with action in angular velocity space
+ """
+ raise NotImplementedError
+
+ def _update_joints(self):
+ """
+ Update the joint positions to obtain the new end-effector position. The other links are only required for rendering.
+ Returns:
+
+ """
+ angles = np.cumsum(self._joint_angles)
+ x = self.link_lengths * np.vstack([np.cos(angles), np.sin(angles)])
+ self._joints[1:] = self._joints[0] + np.cumsum(x.T, axis=0)
+
+ def _check_self_collision(self):
+ """Checks whether line segments intersect"""
+
+ if self.allow_self_collision:
+ return False
+
+ if np.any(self._joint_angles > self.j_max) or np.any(self._joint_angles < self.j_min):
+ return True
+
+ link_lines = np.stack((self._joints[:-1, :], self._joints[1:, :]), axis=1)
+ for i, line1 in enumerate(link_lines):
+ for line2 in link_lines[i + 2:, :]:
+ if intersect(line1[0], line1[-1], line2[0], line2[-1]):
+ return True
+ return False
+
+ @abstractmethod
+ def _get_reward(self, action: np.ndarray) -> (float, dict):
+ pass
+
+ @abstractmethod
+ def _get_obs(self) -> np.ndarray:
+ pass
+
+ @abstractmethod
+ def _check_collisions(self) -> bool:
+ pass
+
+ @abstractmethod
+ def _terminate(self, info) -> bool:
+ return False
+
+ def seed(self, seed=None):
+ self.np_random, seed = seeding.np_random(seed)
+ return [seed]
+
+ def close(self):
+ del self.fig
+
+ @property
+ def end_effector(self):
+ return self._joints[self.n_links].T
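`BaseReacherEnv` defers `step`, `_get_reward`, `_get_obs`, `_check_collisions` and `_terminate` to its subclasses (the direct- and torque-controlled variants below). The following hypothetical subclass is only a sketch of what an implementation minimally has to provide; the class name, action space, and reward are made up for illustration:

```python
import numpy as np
from gym import spaces

from fancy_gym.envs.classic_control.base_reacher.base_reacher import BaseReacherEnv


class MinimalReacherEnv(BaseReacherEnv):
    """Hypothetical velocity-controlled reacher with a dense distance-to-origin reward."""

    def __init__(self, n_links: int = 3):
        super().__init__(n_links, random_start=True, allow_self_collision=True)
        self.action_space = spaces.Box(low=-np.pi, high=np.pi, shape=(n_links,))

    def step(self, action: np.ndarray):
        # Integrate the commanded joint velocities for one time step.
        self._acc = (action - self._angle_velocity) / self.dt
        self._angle_velocity = action
        self._joint_angles = self._joint_angles + self.dt * self._angle_velocity
        self._update_joints()
        reward, info = self._get_reward(action)
        self._steps += 1
        return self._get_obs().copy(), reward, self._terminate(info), info

    def _get_reward(self, action: np.ndarray):
        # Negative end-effector distance to the origin plus a small action penalty.
        reward = -np.linalg.norm(self.end_effector) - 1e-3 * np.sum(action ** 2)
        return reward, {}

    def _get_obs(self) -> np.ndarray:
        theta = self._joint_angles
        # Same layout (and dimensionality) as the observation bound in the base class.
        return np.hstack([np.cos(theta), np.sin(theta), self._angle_velocity,
                          self.end_effector, self._steps]).astype(np.float32)

    def _check_collisions(self) -> bool:
        return self._check_self_collision()

    def _terminate(self, info) -> bool:
        return False
```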
diff --git a/fancy_gym/envs/classic_control/base_reacher/base_reacher_direct.py b/fancy_gym/envs/classic_control/base_reacher/base_reacher_direct.py
new file mode 100644
index 0000000..0e44033
--- /dev/null
+++ b/fancy_gym/envs/classic_control/base_reacher/base_reacher_direct.py
@@ -0,0 +1,39 @@
+from abc import ABC
+
+import numpy as np
+from gym import spaces
+
+from fancy_gym.envs.classic_control.base_reacher.base_reacher import BaseReacherEnv
+
+
+class BaseReacherDirectEnv(BaseReacherEnv, ABC):
+ """
+ Base class for directly controlled reaching environments
+ """
+
+ def __init__(self, n_links: int, random_start: bool = True,
+ allow_self_collision: bool = False):
+ super().__init__(n_links, random_start, allow_self_collision)
+
+ self.max_vel = 2 * np.pi
+ action_bound = np.ones((self.n_links,)) * self.max_vel
+ self.action_space = spaces.Box(low=-action_bound, high=action_bound, shape=action_bound.shape)
+
+ def step(self, action: np.ndarray):
+ """
+ A single step with action in angular velocity space
+ """
+
+ self._acc = (action - self._angle_velocity) / self.dt
+ self._angle_velocity = action
+ self._joint_angles = self._joint_angles + self.dt * self._angle_velocity
+ self._update_joints()
+
+ self._is_collided = self._check_collisions()
+
+ reward, info = self._get_reward(action)
+
+ self._steps += 1
+ done = self._terminate(info)
+
+ return self._get_obs().copy(), reward, done, info
diff --git a/fancy_gym/envs/classic_control/base_reacher/base_reacher_torque.py b/fancy_gym/envs/classic_control/base_reacher/base_reacher_torque.py
new file mode 100644
index 0000000..79f3005
--- /dev/null
+++ b/fancy_gym/envs/classic_control/base_reacher/base_reacher_torque.py
@@ -0,0 +1,38 @@
+from abc import ABC
+
+import numpy as np
+from gym import spaces
+
+from fancy_gym.envs.classic_control.base_reacher.base_reacher import BaseReacherEnv
+
+
+class BaseReacherTorqueEnv(BaseReacherEnv, ABC):
+ """
+ Base class for torque controlled reaching environments
+ """
+
+ def __init__(self, n_links: int, random_start: bool = True,
+ allow_self_collision: bool = False):
+ super().__init__(n_links, random_start, allow_self_collision)
+
+ self.max_torque = 1000
+ action_bound = np.ones((self.n_links,)) * self.max_torque
+ self.action_space = spaces.Box(low=-action_bound, high=action_bound, shape=action_bound.shape)
+
+ def step(self, action: np.ndarray):
+ """
+ A single step with action in torque space
+ """
+
+ self._angle_velocity = self._angle_velocity + self.dt * action
+ self._joint_angles = self._joint_angles + self.dt * self._angle_velocity
+ self._update_joints()
+
+ self._is_collided = self._check_collisions()
+
+ reward, info = self._get_reward(action)
+
+ self._steps += 1
+ done = False
+
+ return self._get_obs().copy(), reward, done, info
diff --git a/fancy_gym/envs/classic_control/hole_reacher/__init__.py b/fancy_gym/envs/classic_control/hole_reacher/__init__.py
new file mode 100644
index 0000000..c5e6d2f
--- /dev/null
+++ b/fancy_gym/envs/classic_control/hole_reacher/__init__.py
@@ -0,0 +1 @@
+from .mp_wrapper import MPWrapper
diff --git a/fancy_gym/envs/classic_control/hole_reacher/hole_reacher.py b/fancy_gym/envs/classic_control/hole_reacher/hole_reacher.py
new file mode 100644
index 0000000..5563ea6
--- /dev/null
+++ b/fancy_gym/envs/classic_control/hole_reacher/hole_reacher.py
@@ -0,0 +1,238 @@
+from typing import Union, Optional, Tuple
+
+import gym
+import matplotlib.pyplot as plt
+import numpy as np
+from gym.core import ObsType
+from matplotlib import patches
+
+from fancy_gym.envs.classic_control.base_reacher.base_reacher_direct import BaseReacherDirectEnv
+
+MAX_EPISODE_STEPS_HOLEREACHER = 200
+
+
+class HoleReacherEnv(BaseReacherDirectEnv):
+ def __init__(self, n_links: int, hole_x: Union[None, float] = None, hole_depth: Union[None, float] = None,
+ hole_width: float = 1., random_start: bool = False, allow_self_collision: bool = False,
+ allow_wall_collision: bool = False, collision_penalty: float = 1000, rew_fct: str = "simple"):
+
+ super().__init__(n_links, random_start, allow_self_collision)
+
+ # provided initial parameters
+ self.initial_x = hole_x # x-position of center of hole
+ self.initial_width = hole_width # width of hole
+ self.initial_depth = hole_depth # depth of hole
+
+ # temp container for current env state
+ self._tmp_x = None
+ self._tmp_width = None
+ self._tmp_depth = None
+ self._goal = None # x-y coordinates for reaching the center at the bottom of the hole
+
+ # action_bound = np.pi * np.ones((self.n_links,))
+ state_bound = np.hstack([
+ [np.pi] * self.n_links, # cos
+ [np.pi] * self.n_links, # sin
+ [np.inf] * self.n_links, # velocity
+ [np.inf], # hole width
+ # [np.inf], # hole depth
+ [np.inf] * 2, # x-y coordinates of target distance
+ [np.inf] # env steps, because the reward starts after n steps TODO: Maybe
+ ])
+ # self.action_space = gym.spaces.Box(low=-action_bound, high=action_bound, shape=action_bound.shape)
+ self.observation_space = gym.spaces.Box(low=-state_bound, high=state_bound, shape=state_bound.shape)
+
+ if rew_fct == "simple":
+ from fancy_gym.envs.classic_control.hole_reacher.hr_simple_reward import HolereacherReward
+ self.reward_function = HolereacherReward(allow_self_collision, allow_wall_collision, collision_penalty)
+ elif rew_fct == "vel_acc":
+ from fancy_gym.envs.classic_control.hole_reacher.hr_dist_vel_acc_reward import HolereacherReward
+ self.reward_function = HolereacherReward(allow_self_collision, allow_wall_collision, collision_penalty)
+ elif rew_fct == "unbounded":
+ from fancy_gym.envs.classic_control.hole_reacher.hr_unbounded_reward import HolereacherReward
+ self.reward_function = HolereacherReward(allow_self_collision, allow_wall_collision)
+ else:
+ raise ValueError("Unknown reward function {}".format(rew_fct))
+
+ def reset(self, *, seed: Optional[int] = None, return_info: bool = False,
+ options: Optional[dict] = None, ) -> Union[ObsType, Tuple[ObsType, dict]]:
+ self._generate_hole()
+ self._set_patches()
+ self.reward_function.reset()
+
+ return super().reset()
+
+ def _get_reward(self, action: np.ndarray) -> (float, dict):
+ return self.reward_function.get_reward(self)
+
+ def _terminate(self, info):
+ return info["is_collided"]
+
+ def _generate_hole(self):
+ if self.initial_width is None:
+ width = self.np_random.uniform(0.15, 0.5)
+ else:
+ width = np.copy(self.initial_width)
+ if self.initial_x is None:
+ # sample the hole on the left or right side
+ direction = self.np_random.choice([-1, 1])
+ # Hole center needs to be half the width away from the arm to give a valid setting.
+ x = direction * self.np_random.uniform(width / 2, 3.5)
+ else:
+ x = np.copy(self.initial_x)
+ if self.initial_depth is None:
+ # TODO we do not want this right now.
+ depth = self.np_random.uniform(1, 1)
+ else:
+ depth = np.copy(self.initial_depth)
+
+ self._tmp_width = width
+ self._tmp_x = x
+ self._tmp_depth = depth
+ self._goal = np.hstack([self._tmp_x, -self._tmp_depth])
+
+ self._line_ground_left = np.array([-self.n_links, 0, x - width / 2, 0])
+ self._line_ground_right = np.array([x + width / 2, 0, self.n_links, 0])
+ self._line_ground_hole = np.array([x - width / 2, -depth, x + width / 2, -depth])
+ self._line_hole_left = np.array([x - width / 2, -depth, x - width / 2, 0])
+ self._line_hole_right = np.array([x + width / 2, -depth, x + width / 2, 0])
+
+ self.ground_lines = np.stack((self._line_ground_left,
+ self._line_ground_right,
+ self._line_ground_hole,
+ self._line_hole_left,
+ self._line_hole_right))
+
+ def _get_obs(self):
+ theta = self._joint_angles
+ return np.hstack([
+ np.cos(theta),
+ np.sin(theta),
+ self._angle_velocity,
+ self._tmp_width,
+ # self._tmp_hole_depth,
+ self.end_effector - self._goal,
+ self._steps
+ ]).astype(np.float32)
+
+ def _get_line_points(self, num_points_per_link=1):
+ theta = self._joint_angles[:, None]
+
+ intermediate_points = np.linspace(0, 1, num_points_per_link) if num_points_per_link > 1 else 1
+ accumulated_theta = np.cumsum(theta, axis=0)
+ end_effector = np.zeros(shape=(self.n_links, num_points_per_link, 2))
+
+ x = np.cos(accumulated_theta) * self.link_lengths[:, None] * intermediate_points
+ y = np.sin(accumulated_theta) * self.link_lengths[:, None] * intermediate_points
+
+ end_effector[0, :, 0] = x[0, :]
+ end_effector[0, :, 1] = y[0, :]
+
+ for i in range(1, self.n_links):
+ end_effector[i, :, 0] = x[i, :] + end_effector[i - 1, -1, 0]
+ end_effector[i, :, 1] = y[i, :] + end_effector[i - 1, -1, 1]
+
+ return np.squeeze(end_effector + self._joints[0, :])
+
+ def _check_collisions(self) -> bool:
+ return self._check_self_collision() or self.check_wall_collision()
+
+ def check_wall_collision(self):
+ line_points = self._get_line_points(num_points_per_link=100)
+
+ # all points that are before the hole in x
+ r, c = np.where(line_points[:, :, 0] < (self._tmp_x - self._tmp_width / 2))
+
+ # check if any of those points are below surface
+ nr_line_points_below_surface_before_hole = np.sum(line_points[r, c, 1] < 0)
+
+ if nr_line_points_below_surface_before_hole > 0:
+ return True
+
+ # all points that are after the hole in x
+ r, c = np.where(line_points[:, :, 0] > (self._tmp_x + self._tmp_width / 2))
+
+ # check if any of those points are below surface
+ nr_line_points_below_surface_after_hole = np.sum(line_points[r, c, 1] < 0)
+
+ if nr_line_points_below_surface_after_hole > 0:
+ return True
+
+ # all points that are above the hole
+ r, c = np.where((line_points[:, :, 0] > (self._tmp_x - self._tmp_width / 2)) & (
+ line_points[:, :, 0] < (self._tmp_x + self._tmp_width / 2)))
+
+ # check if any of those points are below surface
+ nr_line_points_below_surface_in_hole = np.sum(line_points[r, c, 1] < -self._tmp_depth)
+
+ if nr_line_points_below_surface_in_hole > 0:
+ return True
+
+ return False
+
+ def render(self, mode='human'):
+ if self.fig is None:
+ # Create the base figure once at the beginning. Afterwards, only update it.
+ plt.ion()
+ self.fig = plt.figure()
+ ax = self.fig.add_subplot(1, 1, 1)
+
+ # limits
+ lim = np.sum(self.link_lengths) + 0.5
+ ax.set_xlim([-lim, lim])
+ ax.set_ylim([-1.1, lim])
+
+ self.line, = ax.plot(self._joints[:, 0], self._joints[:, 1], 'ro-', markerfacecolor='k')
+ self._set_patches()
+ self.fig.show()
+
+ self.fig.gca().set_title(
+ f"Iteration: {self._steps}, distance: {np.linalg.norm(self.end_effector - self._goal) ** 2}")
+
+ if mode == "human":
+
+ # arm
+ self.line.set_data(self._joints[:, 0], self._joints[:, 1])
+
+ self.fig.canvas.draw()
+ self.fig.canvas.flush_events()
+
+ elif mode == "partial":
+ if self._steps % 20 == 0 or self._steps in [1, 199] or self._is_collided:
+ # Arm
+ plt.plot(self._joints[:, 0], self._joints[:, 1], 'ro-', markerfacecolor='k',
+ alpha=self._steps / 200)
+
+ def _set_patches(self):
+ if self.fig is not None:
+ # self.fig.gca().patches = []
+ left_block = patches.Rectangle((-self.n_links, -self._tmp_depth),
+ self.n_links + self._tmp_x - self._tmp_width / 2,
+ self._tmp_depth,
+ fill=True, edgecolor='k', facecolor='k')
+ right_block = patches.Rectangle((self._tmp_x + self._tmp_width / 2, -self._tmp_depth),
+ self.n_links - self._tmp_x + self._tmp_width / 2,
+ self._tmp_depth,
+ fill=True, edgecolor='k', facecolor='k')
+ hole_floor = patches.Rectangle((self._tmp_x - self._tmp_width / 2, -self._tmp_depth),
+ self._tmp_width,
+ 1 - self._tmp_depth,
+ fill=True, edgecolor='k', facecolor='k')
+
+ # Add the patch to the Axes
+ self.fig.gca().add_patch(left_block)
+ self.fig.gca().add_patch(right_block)
+ self.fig.gca().add_patch(hole_floor)
+
+
+if __name__ == "__main__":
+
+ env = HoleReacherEnv(5)
+ env.reset()
+
+ for i in range(10000):
+ ac = env.action_space.sample()
+ obs, rew, done, info = env.step(ac)
+ env.render()
+ if done:
+ env.reset()
diff --git a/fancy_gym/envs/classic_control/hole_reacher/hr_dist_vel_acc_reward.py b/fancy_gym/envs/classic_control/hole_reacher/hr_dist_vel_acc_reward.py
new file mode 100644
index 0000000..1d75b66
--- /dev/null
+++ b/fancy_gym/envs/classic_control/hole_reacher/hr_dist_vel_acc_reward.py
@@ -0,0 +1,60 @@
+import numpy as np
+
+
+class HolereacherReward:
+ def __init__(self, allow_self_collision, allow_wall_collision, collision_penalty):
+ self.collision_penalty = collision_penalty
+
+ # collision
+ self.allow_self_collision = allow_self_collision
+ self.allow_wall_collision = allow_wall_collision
+ self.collision_penalty = collision_penalty
+ self._is_collided = False
+
+ self.reward_factors = np.array((-1, -1e-4, -1e-6, -collision_penalty, 0))
+
+ def reset(self):
+ self._is_collided = False
+ self.collision_dist = 0
+
+ def get_reward(self, env):
+ dist_cost = 0
+ collision_cost = 0
+ time_cost = 0
+ success = False
+
+ self_collision = False
+ wall_collision = False
+
+ if not self._is_collided:
+ if not self.allow_self_collision:
+ self_collision = env._check_self_collision()
+
+ if not self.allow_wall_collision:
+ wall_collision = env.check_wall_collision()
+
+ self._is_collided = self_collision or wall_collision
+
+ self.collision_dist = np.linalg.norm(env.end_effector - env._goal)
+
+ if env._steps == 199: # or self._is_collided:
+ # return reward only in last time step
+ # Episode also terminates when colliding, hence return reward
+ dist = np.linalg.norm(env.end_effector - env._goal)
+
+ success = dist < 0.005 and not self._is_collided
+ dist_cost = dist ** 2
+ collision_cost = self._is_collided * self.collision_dist ** 2
+ time_cost = 199 - env._steps
+
+ info = {"is_success": success,
+ "is_collided": self._is_collided,
+ "end_effector": np.copy(env.end_effector)}
+
+ vel_cost = np.sum(env._angle_velocity ** 2)
+ acc_cost = np.sum(env._acc ** 2)
+
+ reward_features = np.array((dist_cost, vel_cost, acc_cost, collision_cost, time_cost))
+ reward = np.dot(reward_features, self.reward_factors)
+
+ return reward, info
diff --git a/fancy_gym/envs/classic_control/hole_reacher/hr_simple_reward.py b/fancy_gym/envs/classic_control/hole_reacher/hr_simple_reward.py
new file mode 100644
index 0000000..5820ab1
--- /dev/null
+++ b/fancy_gym/envs/classic_control/hole_reacher/hr_simple_reward.py
@@ -0,0 +1,53 @@
+import numpy as np
+
+
+class HolereacherReward:
+ def __init__(self, allow_self_collision, allow_wall_collision, collision_penalty):
+ self.collision_penalty = collision_penalty
+
+ # collision
+ self.allow_self_collision = allow_self_collision
+ self.allow_wall_collision = allow_wall_collision
+ self.collision_penalty = collision_penalty
+ self._is_collided = False
+
+ self.reward_factors = np.array((-1, -5e-8, -collision_penalty))
+
+ def reset(self):
+ self._is_collided = False
+
+ def get_reward(self, env):
+ dist_cost = 0
+ collision_cost = 0
+ success = False
+
+ self_collision = False
+ wall_collision = False
+
+ if not self.allow_self_collision:
+ self_collision = env._check_self_collision()
+
+ if not self.allow_wall_collision:
+ wall_collision = env.check_wall_collision()
+
+ self._is_collided = self_collision or wall_collision
+
+ if env._steps == 199 or self._is_collided:
+ # return reward only in last time step
+ # Episode also terminates when colliding, hence return reward
+ dist = np.linalg.norm(env.end_effector - env._goal)
+ dist_cost = dist ** 2
+ collision_cost = int(self._is_collided)
+
+ success = dist < 0.005 and not self._is_collided
+
+ info = {"is_success": success,
+ "is_collided": self._is_collided,
+ "end_effector": np.copy(env.end_effector)}
+
+ acc_cost = np.sum(env._acc ** 2)
+
+ reward_features = np.array((dist_cost, acc_cost, collision_cost))
+ reward = np.dot(reward_features, self.reward_factors)
+
+ return reward, info
diff --git a/fancy_gym/envs/classic_control/hole_reacher/hr_unbounded_reward.py b/fancy_gym/envs/classic_control/hole_reacher/hr_unbounded_reward.py
new file mode 100644
index 0000000..7ed13a1
--- /dev/null
+++ b/fancy_gym/envs/classic_control/hole_reacher/hr_unbounded_reward.py
@@ -0,0 +1,60 @@
+import numpy as np
+
+
+class HolereacherReward:
+ def __init__(self, allow_self_collision, allow_wall_collision):
+
+ # collision
+ self.allow_self_collision = allow_self_collision
+ self.allow_wall_collision = allow_wall_collision
+ self._is_collided = False
+
+ self.reward_factors = np.array((1, -5e-6))
+
+ def reset(self):
+ self._is_collided = False
+
+ def get_reward(self, env):
+ dist_reward = 0
+ success = False
+
+ self_collision = False
+ wall_collision = False
+
+ if not self.allow_self_collision:
+ self_collision = env._check_self_collision()
+
+ if not self.allow_wall_collision:
+ wall_collision = env.check_wall_collision()
+
+ self._is_collided = self_collision or wall_collision
+
+ if env._steps == 180 or self._is_collided:
+ self.end_eff_pos = np.copy(env.end_effector)
+
+ if env._steps == 199 or self._is_collided:
+ # return reward only in last time step
+ # Episode also terminates when colliding, hence return reward
+ dist = np.linalg.norm(self.end_eff_pos - env._goal)
+
+ if self._is_collided:
+ dist_reward = 0.25 * np.exp(- dist)
+ else:
+ if env.end_effector[1] > 0:
+ dist_reward = np.exp(- dist)
+ else:
+ dist_reward = 1 - self.end_eff_pos[1]
+
+ success = not self._is_collided
+
+ info = {"is_success": success,
+ "is_collided": self._is_collided,
+ "end_effector": np.copy(env.end_effector),
+ "joints": np.copy(env.current_pos)}
+
+ acc_cost = np.sum(env._acc ** 2)
+
+ reward_features = np.array((dist_reward, acc_cost))
+ reward = np.dot(reward_features, self.reward_factors)
+
+ return reward, info
\ No newline at end of file
diff --git a/fancy_gym/envs/classic_control/hole_reacher/mp_wrapper.py b/fancy_gym/envs/classic_control/hole_reacher/mp_wrapper.py
new file mode 100644
index 0000000..d160b5c
--- /dev/null
+++ b/fancy_gym/envs/classic_control/hole_reacher/mp_wrapper.py
@@ -0,0 +1,28 @@
+from typing import Tuple, Union
+
+import numpy as np
+
+from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
+
+
+class MPWrapper(RawInterfaceWrapper):
+
+ @property
+ def context_mask(self):
+ return np.hstack([
+ [self.env.random_start] * self.env.n_links, # cos
+ [self.env.random_start] * self.env.n_links, # sin
+ [self.env.random_start] * self.env.n_links, # velocity
+ [self.env.initial_width is None], # hole width
+ # [self.env.hole_depth is None], # hole depth
+ [True] * 2, # x-y coordinates of target distance
+ [False] # env steps
+ ])
+
+ @property
+ def current_pos(self) -> Union[float, int, np.ndarray, Tuple]:
+ return self.env.current_pos
+
+ @property
+ def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
+ return self.env.current_vel
diff --git a/fancy_gym/envs/classic_control/simple_reacher/__init__.py b/fancy_gym/envs/classic_control/simple_reacher/__init__.py
new file mode 100644
index 0000000..989b5a9
--- /dev/null
+++ b/fancy_gym/envs/classic_control/simple_reacher/__init__.py
@@ -0,0 +1 @@
+from .mp_wrapper import MPWrapper
\ No newline at end of file
diff --git a/fancy_gym/envs/classic_control/simple_reacher/mp_wrapper.py b/fancy_gym/envs/classic_control/simple_reacher/mp_wrapper.py
new file mode 100644
index 0000000..6d1fda1
--- /dev/null
+++ b/fancy_gym/envs/classic_control/simple_reacher/mp_wrapper.py
@@ -0,0 +1,26 @@
+from typing import Tuple, Union
+
+import numpy as np
+
+from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
+
+
+class MPWrapper(RawInterfaceWrapper):
+
+ @property
+ def context_mask(self):
+ return np.hstack([
+ [self.env.random_start] * self.env.n_links, # cos
+ [self.env.random_start] * self.env.n_links, # sin
+ [self.env.random_start] * self.env.n_links, # velocity
+ [True] * 2, # x-y coordinates of target distance
+ [False] # env steps
+ ])
+
+ @property
+ def current_pos(self) -> Union[float, int, np.ndarray, Tuple]:
+ return self.env.current_pos
+
+ @property
+ def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
+ return self.env.current_vel
diff --git a/fancy_gym/envs/classic_control/simple_reacher/simple_reacher.py b/fancy_gym/envs/classic_control/simple_reacher/simple_reacher.py
new file mode 100644
index 0000000..9b03147
--- /dev/null
+++ b/fancy_gym/envs/classic_control/simple_reacher/simple_reacher.py
@@ -0,0 +1,141 @@
+from typing import Iterable, Union, Optional, Tuple
+
+import matplotlib.pyplot as plt
+import numpy as np
+from gym import spaces
+from gym.core import ObsType
+
+from fancy_gym.envs.classic_control.base_reacher.base_reacher_torque import BaseReacherTorqueEnv
+
+
+class SimpleReacherEnv(BaseReacherTorqueEnv):
+ """
+ Simple Reaching Task without any physics simulation.
+ Returns no reward until time step 150. This allows the agent to explore the space, but requires precise actions
+ towards the end of the trajectory.
+ """
+
+ def __init__(self, n_links: int, target: Union[None, Iterable] = None, random_start: bool = True,
+ allow_self_collision: bool = False, ):
+ super().__init__(n_links, random_start, allow_self_collision)
+
+ # provided initial parameters
+ self.initial_target = target
+
+ # temp container for current env state
+ self._goal = None
+
+ self._start_pos = np.zeros(self.n_links)
+
+ self.steps_before_reward = 199
+
+ state_bound = np.hstack([
+ [np.pi] * self.n_links, # cos
+ [np.pi] * self.n_links, # sin
+ [np.inf] * self.n_links, # velocity
+ [np.inf] * 2, # x-y coordinates of target distance
+ [np.inf] # env steps, because the reward starts after n steps TODO: Maybe
+ ])
+ self.observation_space = spaces.Box(low=-state_bound, high=state_bound, shape=state_bound.shape)
+
+ # @property
+ # def start_pos(self):
+ # return self._start_pos
+
+ def reset(self, *, seed: Optional[int] = None, return_info: bool = False,
+ options: Optional[dict] = None, ) -> Union[ObsType, Tuple[ObsType, dict]]:
+ self._generate_goal()
+
+ return super().reset()
+
+ def _get_reward(self, action: np.ndarray):
+ diff = self.end_effector - self._goal
+ reward_dist = 0
+
+ if not self.allow_self_collision:
+ self._is_collided = self._check_self_collision()
+
+ if self._steps >= self.steps_before_reward:
+ reward_dist -= np.linalg.norm(diff)
+ # reward_dist = np.exp(-0.1 * diff ** 2).mean()
+ # reward_dist = - (diff ** 2).mean()
+
+ reward_ctrl = (action ** 2).sum()
+ reward = reward_dist - reward_ctrl
+ return reward, dict(reward_dist=reward_dist, reward_ctrl=reward_ctrl)
+
+ def _terminate(self, info):
+ return False
+
+ def _get_obs(self):
+ theta = self._joint_angles
+ return np.hstack([
+ np.cos(theta),
+ np.sin(theta),
+ self._angle_velocity,
+ self.end_effector - self._goal,
+ self._steps
+ ]).astype(np.float32)
+
+ def _generate_goal(self):
+
+ if self.initial_target is None:
+
+ total_length = np.sum(self.link_lengths)
+ goal = np.array([total_length, total_length])
+ while np.linalg.norm(goal) >= total_length:
+ goal = self.np_random.uniform(low=-total_length, high=total_length, size=2)
+ else:
+ goal = np.copy(self.initial_target)
+
+ self._goal = goal
+
+ def _check_collisions(self) -> bool:
+ return self._check_self_collision()
+
+ def render(self, mode='human'): # pragma: no cover
+ if self.fig is None:
+ # Create the base figure once at the beginning. Afterwards, only update it.
+ plt.ion()
+ self.fig = plt.figure()
+ ax = self.fig.add_subplot(1, 1, 1)
+
+ # limits
+ lim = np.sum(self.link_lengths) + 0.5
+ ax.set_xlim([-lim, lim])
+ ax.set_ylim([-lim, lim])
+
+ self.line, = ax.plot(self._joints[:, 0], self._joints[:, 1], 'ro-', markerfacecolor='k')
+ goal_pos = self._goal.T
+ self.goal_point, = ax.plot(goal_pos[0], goal_pos[1], 'gx')
+ self.goal_dist, = ax.plot([self.end_effector[0], goal_pos[0]], [self.end_effector[1], goal_pos[1]], 'g--')
+
+ self.fig.show()
+
+ self.fig.gca().set_title(f"Iteration: {self._steps}, distance: {self.end_effector - self._goal}")
+
+ # goal
+ goal_pos = self._goal.T
+ if self._steps == 1:
+ self.goal_point.set_data(goal_pos[0], goal_pos[1])
+
+ # arm
+ self.line.set_data(self._joints[:, 0], self._joints[:, 1])
+
+ # distance between end effector and goal
+ self.goal_dist.set_data([self.end_effector[0], goal_pos[0]], [self.end_effector[1], goal_pos[1]])
+
+ self.fig.canvas.draw()
+ self.fig.canvas.flush_events()
+
+
+if __name__ == "__main__":
+ env = SimpleReacherEnv(5)
+ env.reset()
+ for i in range(200):
+ ac = env.action_space.sample()
+ obs, rew, done, info = env.step(ac)
+
+ env.render()
+ if done:
+ break
diff --git a/fancy_gym/envs/classic_control/utils.py b/fancy_gym/envs/classic_control/utils.py
new file mode 100644
index 0000000..ea378d1
--- /dev/null
+++ b/fancy_gym/envs/classic_control/utils.py
@@ -0,0 +1,18 @@
+def ccw(A, B, C):
+ return (C[1] - A[1]) * (B[0] - A[0]) - (B[1] - A[1]) * (C[0] - A[0]) > 1e-12
+
+
+def intersect(A, B, C, D):
+ """
+ Checks whether line segments AB and CD intersect
+ """
+ return ccw(A, C, D) != ccw(B, C, D) and ccw(A, B, C) != ccw(A, B, D)
+
+
+def check_self_collision(line_points):
+ """Checks whether line segments intersect"""
+ for i, line1 in enumerate(line_points):
+ for line2 in line_points[i + 2:, :, :]:
+ if intersect(line1[0], line1[-1], line2[0], line2[-1]):
+ return True
+ return False
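+
+
+# Usage sketch (illustrative); it assumes segments are passed as an array of shape
+# (n_links, n_points, 2), where only the first and last point of each segment are used:
+# >>> import numpy as np
+# >>> straight = np.array([[[0., 0.], [1., 0.]], [[1., 0.], [2., 0.]], [[2., 0.], [3., 0.]]])
+# >>> folded = np.array([[[0., 0.], [1., 0.]], [[1., 0.], [1., 1.]], [[1., 1.], [.5, -.5]]])
+# >>> check_self_collision(straight)  # non-adjacent segments never cross
+# False
+# >>> check_self_collision(folded)  # the last segment crosses the first one
+# True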
diff --git a/fancy_gym/envs/classic_control/viapoint_reacher/__init__.py b/fancy_gym/envs/classic_control/viapoint_reacher/__init__.py
new file mode 100644
index 0000000..c5e6d2f
--- /dev/null
+++ b/fancy_gym/envs/classic_control/viapoint_reacher/__init__.py
@@ -0,0 +1 @@
+from .mp_wrapper import MPWrapper
diff --git a/fancy_gym/envs/classic_control/viapoint_reacher/mp_wrapper.py b/fancy_gym/envs/classic_control/viapoint_reacher/mp_wrapper.py
new file mode 100644
index 0000000..47da749
--- /dev/null
+++ b/fancy_gym/envs/classic_control/viapoint_reacher/mp_wrapper.py
@@ -0,0 +1,27 @@
+from typing import Tuple, Union
+
+import numpy as np
+
+from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
+
+
+class MPWrapper(RawInterfaceWrapper):
+
+ @property
+ def context_mask(self):
+ return np.hstack([
+ [self.env.random_start] * self.env.n_links, # cos
+ [self.env.random_start] * self.env.n_links, # sin
+ [self.env.random_start] * self.env.n_links, # velocity
+ [self.env.initial_via_target is None] * 2, # x-y coordinates of via point distance
+ [True] * 2, # x-y coordinates of target distance
+ [False] # env steps
+ ])
+
+ @property
+ def current_pos(self) -> Union[float, int, np.ndarray, Tuple]:
+ return self.env.current_pos
+
+ @property
+ def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
+ return self.env.current_vel
diff --git a/fancy_gym/envs/classic_control/viapoint_reacher/viapoint_reacher.py b/fancy_gym/envs/classic_control/viapoint_reacher/viapoint_reacher.py
new file mode 100644
index 0000000..f3412ac
--- /dev/null
+++ b/fancy_gym/envs/classic_control/viapoint_reacher/viapoint_reacher.py
@@ -0,0 +1,198 @@
+from typing import Iterable, Union, Tuple, Optional
+
+import gym
+import matplotlib.pyplot as plt
+import numpy as np
+from gym.core import ObsType
+
+from fancy_gym.envs.classic_control.base_reacher.base_reacher_direct import BaseReacherDirectEnv
+
+
+class ViaPointReacherEnv(BaseReacherDirectEnv):
+
+ def __init__(self, n_links, random_start: bool = False, via_target: Union[None, Iterable] = None,
+ target: Union[None, Iterable] = None, allow_self_collision=False, collision_penalty=1000):
+
+ super().__init__(n_links, random_start, allow_self_collision)
+
+ # provided initial parameters
+ self.initial_target = target # provided target value
+ self.initial_via_target = via_target # provided via point target value
+
+ # temp container for current env state
+ self._via_point = np.ones(2)
+ self._goal = np.array((n_links, 0))
+
+ # collision
+ self.collision_penalty = collision_penalty
+
+ state_bound = np.hstack([
+ [np.pi] * self.n_links, # cos
+ [np.pi] * self.n_links, # sin
+ [np.inf] * self.n_links, # velocity
+ [np.inf] * 2, # x-y coordinates of via point distance
+ [np.inf] * 2, # x-y coordinates of target distance
+ [np.inf] # env steps, because the reward starts after n steps
+ ])
+ self.observation_space = gym.spaces.Box(low=-state_bound, high=state_bound, shape=state_bound.shape)
+
+ # @property
+ # def start_pos(self):
+ # return self._start_pos
+
+ def reset(self, *, seed: Optional[int] = None, return_info: bool = False,
+ options: Optional[dict] = None, ) -> Union[ObsType, Tuple[ObsType, dict]]:
+ self._generate_goal()
+ return super().reset()
+
+ def _generate_goal(self):
+ # TODO: Maybe improve this later, this can yield quite a lot of invalid settings
+
+ total_length = np.sum(self.link_lengths)
+
+ # rejection-sample a point in the inner circle with radius 0.5 * total_length
+ if self.initial_via_target is None:
+ via_target = np.array([total_length, total_length])
+ while np.linalg.norm(via_target) >= 0.5 * total_length:
+ via_target = self.np_random.uniform(low=-0.5 * total_length, high=0.5 * total_length, size=2)
+ else:
+ via_target = np.copy(self.initial_via_target)
+
+ # rejection-sample a point in the annulus between 0.5 * total_length and total_length
+ if self.initial_target is None:
+ goal = np.array([total_length, total_length])
+ while np.linalg.norm(goal) >= total_length or np.linalg.norm(goal) <= 0.5 * total_length:
+ goal = self.np_random.uniform(low=-total_length, high=total_length, size=2)
+ else:
+ goal = np.copy(self.initial_target)
+
+ self._via_point = via_target
+ self._goal = goal
+
+ def _get_reward(self, acc):
+ success = False
+ reward = -np.inf
+
+ if not self.allow_self_collision:
+ self._is_collided = self._check_self_collision()
+
+ if not self._is_collided:
+ dist = np.inf
+ # return intermediate reward for via point
+ if self._steps == 100:
+ dist = np.linalg.norm(self.end_effector - self._via_point)
+ # return reward in last time step for goal
+ elif self._steps == 199:
+ dist = np.linalg.norm(self.end_effector - self._goal)
+
+ success = dist < 0.005
+ else:
+ # Episode terminates when colliding, hence return reward
+ dist = np.linalg.norm(self.end_effector - self._goal)
+ reward = -self.collision_penalty
+
+ reward -= dist ** 2
+ reward -= 5e-8 * np.sum(acc ** 2)
+ info = {"is_success": success,
+ "is_collided": self._is_collided,
+ "end_effector": np.copy(self.end_effector)}
+
+ return reward, info
+
+ def _terminate(self, info):
+ return info["is_collided"]
+
+ def _get_obs(self):
+ theta = self._joint_angles
+ return np.hstack([
+ np.cos(theta),
+ np.sin(theta),
+ self._angle_velocity,
+ self.end_effector - self._via_point,
+ self.end_effector - self._goal,
+ self._steps
+ ]).astype(np.float32)
+
+ def _check_collisions(self) -> bool:
+ return self._check_self_collision()
+
+ def render(self, mode='human'):
+ goal_pos = self._goal.T
+ via_pos = self._via_point.T
+
+ if self.fig is None:
+ # Create the base figure once at the beginning. Afterwards only update it.
+ plt.ion()
+ self.fig = plt.figure()
+ ax = self.fig.add_subplot(1, 1, 1)
+
+ # limits
+ lim = np.sum(self.link_lengths) + 0.5
+ ax.set_xlim([-lim, lim])
+ ax.set_ylim([-lim, lim])
+
+ self.line, = ax.plot(self._joints[:, 0], self._joints[:, 1], 'ro-', markerfacecolor='k')
+ self.goal_point_plot, = ax.plot(goal_pos[0], goal_pos[1], 'go')
+ self.via_point_plot, = ax.plot(via_pos[0], via_pos[1], 'gx')
+
+ self.fig.show()
+
+ self.fig.gca().set_title(f"Iteration: {self._steps}, distance: {self.end_effector - self._goal}")
+
+ if mode == "human":
+ # goal
+ if self._steps == 1:
+ self.goal_point_plot.set_data(goal_pos[0], goal_pos[1])
+ self.via_point_plot.set_data(via_pos[0], via_pos[1])
+
+ # arm
+ self.line.set_data(self._joints[:, 0], self._joints[:, 1])
+
+ self.fig.canvas.draw()
+ self.fig.canvas.flush_events()
+
+ elif mode == "partial":
+ if self._steps == 1:
+ # fig, ax = plt.subplots()
+ # Add the patch to the Axes
+ [plt.gca().add_patch(rect) for rect in self.patches]
+ # plt.pause(0.01)
+
+ if self._steps % 20 == 0 or self._steps in [1, 199] or self._is_collided:
+ # Arm
+ plt.plot(self._joints[:, 0], self._joints[:, 1], 'ro-', markerfacecolor='k', alpha=self._steps / 200)
+ # ax.plot(line_points_in_taskspace[:, 0, 0],
+ # line_points_in_taskspace[:, 0, 1],
+ # line_points_in_taskspace[:, -1, 0],
+ # line_points_in_taskspace[:, -1, 1], marker='o', color='k', alpha=t / 200)
+
+ lim = np.sum(self.link_lengths) + 0.5
+ plt.xlim([-lim, lim])
+ plt.ylim([-1.1, lim])
+ plt.pause(0.01)
+
+ elif mode == "final":
+ if self._steps == 199 or self._is_collided:
+ # fig, ax = plt.subplots()
+
+ # Add the patch to the Axes
+ [plt.gca().add_patch(rect) for rect in self.patches]
+
+ plt.xlim(-self.n_links, self.n_links), plt.ylim(-1, self.n_links)
+ # Arm
+ plt.plot(self._joints[:, 0], self._joints[:, 1], 'ro-', markerfacecolor='k')
+
+ plt.pause(0.01)
+
+
+if __name__ == "__main__":
+
+ env = ViaPointReacherEnv(5)
+ env.reset()
+
+ for i in range(10000):
+ ac = env.action_space.sample()
+ obs, rew, done, info = env.step(ac)
+ env.render()
+ if done:
+ env.reset()
diff --git a/fancy_gym/envs/mujoco/README.MD b/fancy_gym/envs/mujoco/README.MD
new file mode 100644
index 0000000..0ea5a1f
--- /dev/null
+++ b/fancy_gym/envs/mujoco/README.MD
@@ -0,0 +1,15 @@
+# Custom Mujoco tasks
+
+## Step-based Environments
+|Name| Description|Horizon|Action Dimension|Observation Dimension|
+|---|---|---|---|---|
+|`ALRReacher-v0`|Modified version of Mujoco gym's `Reacher-v2` with 5 links instead of 2.| 200 | 5 | 21 |
+|`ALRReacherSparse-v0`|Same as `ALRReacher-v0`, but the distance penalty is only provided in the last time step.| 200 | 5 | 21 |
+|`ALRReacherSparseBalanced-v0`|Same as `ALRReacherSparse-v0`, but the end-effector has to remain upright.| 200 | 5 | 21 |
+|`ALRLongReacher-v0`|Modified version of Mujoco gym's `Reacher-v2` with 7 links instead of 2.| 200 | 7 | 27 |
+|`ALRLongReacherSparse-v0`|Same as `ALRLongReacher-v0`, but the distance penalty is only provided in the last time step.| 200 | 7 | 27 |
+|`ALRLongReacherSparseBalanced-v0`|Same as `ALRLongReacherSparse-v0`, but the end-effector has to remain upright.| 200 | 7 | 27 |
+|`ALRBallInACupSimple-v0`|Ball-in-a-cup task where a robot needs to catch a ball attached to a cup at its end-effector.| 4000 | 3 | wip |
+|`ALRBallInACup-v0`|Ball-in-a-cup task where a robot needs to catch a ball attached to a cup at its end-effector.| 4000 | 7 | wip |
+|`ALRBallInACupGoal-v0`|Similar to `ALRBallInACupSimple-v0`, but the ball needs to be caught at a specified goal position.| 4000 | 7 | wip |
+
\ No newline at end of file
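+
+## Example usage
+
+A minimal rollout sketch for one of the step-based tasks above. It assumes that importing
+`fancy_gym` registers the listed environment ids with gym (the env id is taken from the table):
+
+```python
+import gym
+import fancy_gym  # noqa: F401  # importing is assumed to register the ALR* environments
+
+env = gym.make("ALRReacherSparse-v0")
+obs = env.reset()
+for _ in range(200):
+    # sample a random action and step the environment
+    obs, reward, done, info = env.step(env.action_space.sample())
+    env.render()
+    if done:
+        break
+env.close()
+```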
diff --git a/fancy_gym/envs/mujoco/__init__.py b/fancy_gym/envs/mujoco/__init__.py
new file mode 100644
index 0000000..840691f
--- /dev/null
+++ b/fancy_gym/envs/mujoco/__init__.py
@@ -0,0 +1,9 @@
+from .ant_jump.ant_jump import AntJumpEnv
+from .beerpong.beerpong import BeerPongEnv, BeerPongEnvStepBasedEpisodicReward
+from .half_cheetah_jump.half_cheetah_jump import HalfCheetahJumpEnv
+from .hopper_jump.hopper_jump import HopperJumpEnv
+from .hopper_jump.hopper_jump_on_box import HopperJumpOnBoxEnv
+from .hopper_throw.hopper_throw import HopperThrowEnv
+from .hopper_throw.hopper_throw_in_basket import HopperThrowInBasketEnv
+from .reacher.reacher import ReacherEnv
+from .walker_2d_jump.walker_2d_jump import Walker2dJumpEnv
diff --git a/fancy_gym/envs/mujoco/ant_jump/__init__.py b/fancy_gym/envs/mujoco/ant_jump/__init__.py
new file mode 100644
index 0000000..c5e6d2f
--- /dev/null
+++ b/fancy_gym/envs/mujoco/ant_jump/__init__.py
@@ -0,0 +1 @@
+from .mp_wrapper import MPWrapper
diff --git a/fancy_gym/envs/mujoco/ant_jump/ant_jump.py b/fancy_gym/envs/mujoco/ant_jump/ant_jump.py
new file mode 100644
index 0000000..9311ae1
--- /dev/null
+++ b/fancy_gym/envs/mujoco/ant_jump/ant_jump.py
@@ -0,0 +1,109 @@
+from typing import Tuple, Union, Optional
+
+import numpy as np
+from gym.core import ObsType
+from gym.envs.mujoco.ant_v4 import AntEnv
+
+MAX_EPISODE_STEPS_ANTJUMP = 200
+
+
+# TODO: This environment was not tested yet. Do the following todos and test it.
+# TODO: Right now this environment only considers jumping to a specific height, which is not ideal. It should be
+# extended to the same structure as the Hopper, where the angles are randomized (-> contexts) and the agent should
+# jump as high as possible while landing at a specific target position.
+
+
+class AntJumpEnv(AntEnv):
+ """
+ Initialization changes to normal Ant:
+ - healthy_reward: 1.0 -> 0.01 -> 0.0 no healthy reward needed - Paul and Marc
+ - _ctrl_cost_weight 0.5 -> 0.0
+ - contact_cost_weight: 5e-4 -> 0.0
+ - healthy_z_range: (0.2, 1.0) -> (0.3, float('inf')) !!!!! Does that make sense, limiting height?
+ """
+
+ def __init__(self,
+ xml_file='ant.xml',
+ ctrl_cost_weight=0.0,
+ contact_cost_weight=0.0,
+ healthy_reward=0.0,
+ terminate_when_unhealthy=True,
+ healthy_z_range=(0.3, float('inf')),
+ contact_force_range=(-1.0, 1.0),
+ reset_noise_scale=0.1,
+ exclude_current_positions_from_observation=True,
+ ):
+ self.current_step = 0
+ self.max_height = 0
+ self.goal = 0
+ super().__init__(xml_file=xml_file,
+ ctrl_cost_weight=ctrl_cost_weight,
+ contact_cost_weight=contact_cost_weight,
+ healthy_reward=healthy_reward,
+ terminate_when_unhealthy=terminate_when_unhealthy,
+ healthy_z_range=healthy_z_range,
+ contact_force_range=contact_force_range,
+ reset_noise_scale=reset_noise_scale,
+ exclude_current_positions_from_observation=exclude_current_positions_from_observation)
+
+ def step(self, action):
+ self.current_step += 1
+ self.do_simulation(action, self.frame_skip)
+
+ height = self.get_body_com("torso")[2].copy()
+
+ self.max_height = max(height, self.max_height)
+
+ rewards = 0
+
+ ctrl_cost = self.control_cost(action)
+ contact_cost = self.contact_cost
+
+ costs = ctrl_cost + contact_cost
+
+ done = bool(height < 0.3) # fell over; TODO: use the lower bound of healthy_z_range instead of hard-coding 0.3
+
+ if self.current_step == MAX_EPISODE_STEPS_ANTJUMP or done:
+ # -10 for scaling the value of the distance between the max_height and the goal height; only used when context is enabled
+ # height_reward = -10 * (np.linalg.norm(self.max_height - self.goal))
+ height_reward = -10 * np.linalg.norm(self.max_height - self.goal)
+ # no healthy reward when using context, because we optimize a negative value
+ healthy_reward = 0
+
+ rewards = height_reward + healthy_reward
+
+ obs = self._get_obs()
+ reward = rewards - costs
+
+ info = {
+ 'height': height,
+ 'max_height': self.max_height,
+ 'goal': self.goal
+ }
+
+ return obs, reward, done, info
+
+ def _get_obs(self):
+ return np.append(super()._get_obs(), self.goal)
+
+ def reset(self, *, seed: Optional[int] = None, return_info: bool = False,
+ options: Optional[dict] = None, ) -> Union[ObsType, Tuple[ObsType, dict]]:
+ self.current_step = 0
+ self.max_height = 0
+ # goal heights from 1.0 to 2.5; could be increased, but that didn't work well with CMORE
+ self.goal = self.np_random.uniform(1.0, 2.5, 1)
+ return super().reset()
+
+ # reset_model had to be implemented in every env to make it deterministic
+ def reset_model(self):
+ # Todo remove if not needed
+ noise_low = -self._reset_noise_scale
+ noise_high = self._reset_noise_scale
+
+ qpos = self.init_qpos # + self.np_random.uniform(low=noise_low, high=noise_high, size=self.model.nq)
+ qvel = self.init_qvel # + self.np_random.uniform(low=noise_low, high=noise_high, size=self.model.nv)
+
+ self.set_state(qpos, qvel)
+
+ observation = self._get_obs()
+ return observation
diff --git a/fancy_gym/envs/mujoco/ant_jump/assets/ant.xml b/fancy_gym/envs/mujoco/ant_jump/assets/ant.xml
new file mode 100644
index 0000000..ee4d679
--- /dev/null
+++ b/fancy_gym/envs/mujoco/ant_jump/assets/ant.xml
@@ -0,0 +1,81 @@
diff --git a/fancy_gym/envs/mujoco/ant_jump/mp_wrapper.py b/fancy_gym/envs/mujoco/ant_jump/mp_wrapper.py
new file mode 100644
index 0000000..2f2eb76
--- /dev/null
+++ b/fancy_gym/envs/mujoco/ant_jump/mp_wrapper.py
@@ -0,0 +1,23 @@
+from typing import Union, Tuple
+
+import numpy as np
+
+from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
+
+
+class MPWrapper(RawInterfaceWrapper):
+
+ @property
+ def context_mask(self):
+ return np.hstack([
+ [False] * 111, # the Ant observation space is 111-dimensional
+ [True] # goal height
+ ])
+
+ @property
+ def current_pos(self) -> Union[float, int, np.ndarray]:
+ return self.data.qpos[7:15].copy()
+
+ @property
+ def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
+ return self.data.qvel[6:14].copy()
diff --git a/fancy_gym/envs/mujoco/beerpong/__init__.py b/fancy_gym/envs/mujoco/beerpong/__init__.py
new file mode 100644
index 0000000..c5e6d2f
--- /dev/null
+++ b/fancy_gym/envs/mujoco/beerpong/__init__.py
@@ -0,0 +1 @@
+from .mp_wrapper import MPWrapper
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/beerpong.xml b/fancy_gym/envs/mujoco/beerpong/assets/beerpong.xml
new file mode 100644
index 0000000..017bbad
--- /dev/null
+++ b/fancy_gym/envs/mujoco/beerpong/assets/beerpong.xml
@@ -0,0 +1,238 @@
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/beerpong_wo_cup.xml b/fancy_gym/envs/mujoco/beerpong/assets/beerpong_wo_cup.xml
new file mode 100644
index 0000000..6786ecf
--- /dev/null
+++ b/fancy_gym/envs/mujoco/beerpong/assets/beerpong_wo_cup.xml
@@ -0,0 +1,187 @@
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/beerpong_wo_cup_big_table.xml b/fancy_gym/envs/mujoco/beerpong/assets/beerpong_wo_cup_big_table.xml
new file mode 100644
index 0000000..4497818
--- /dev/null
+++ b/fancy_gym/envs/mujoco/beerpong/assets/beerpong_wo_cup_big_table.xml
@@ -0,0 +1,193 @@
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/base_link_convex.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/base_link_convex.stl
new file mode 100755
index 0000000..133b112
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/base_link_convex.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/base_link_fine.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/base_link_fine.stl
new file mode 100755
index 0000000..047e9df
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/base_link_fine.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_finger_dist_link_convex.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_finger_dist_link_convex.stl
new file mode 100644
index 0000000..3b05c27
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_finger_dist_link_convex.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_finger_dist_link_fine.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_finger_dist_link_fine.stl
new file mode 100644
index 0000000..5ff94a2
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_finger_dist_link_fine.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_finger_med_link_convex.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_finger_med_link_convex.stl
new file mode 100644
index 0000000..c548448
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_finger_med_link_convex.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_finger_med_link_fine.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_finger_med_link_fine.stl
new file mode 100644
index 0000000..495160d
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_finger_med_link_fine.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_finger_prox_link_convex_decomposition_p1.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_finger_prox_link_convex_decomposition_p1.stl
new file mode 100644
index 0000000..b4bb322
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_finger_prox_link_convex_decomposition_p1.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_finger_prox_link_convex_decomposition_p2.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_finger_prox_link_convex_decomposition_p2.stl
new file mode 100644
index 0000000..7b2f001
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_finger_prox_link_convex_decomposition_p2.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_finger_prox_link_convex_decomposition_p3.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_finger_prox_link_convex_decomposition_p3.stl
new file mode 100644
index 0000000..f05174e
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_finger_prox_link_convex_decomposition_p3.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_finger_prox_link_fine.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_finger_prox_link_fine.stl
new file mode 100644
index 0000000..eb252d9
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_finger_prox_link_fine.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_palm_fine.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_palm_fine.stl
new file mode 100644
index 0000000..0a986fa
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_palm_fine.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_palm_link_convex_decomposition_p1.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_palm_link_convex_decomposition_p1.stl
new file mode 100644
index 0000000..c039f0d
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_palm_link_convex_decomposition_p1.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_palm_link_convex_decomposition_p2.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_palm_link_convex_decomposition_p2.stl
new file mode 100644
index 0000000..250acaf
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_palm_link_convex_decomposition_p2.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_palm_link_convex_decomposition_p3.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_palm_link_convex_decomposition_p3.stl
new file mode 100644
index 0000000..993d0f7
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_palm_link_convex_decomposition_p3.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_palm_link_convex_decomposition_p4.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_palm_link_convex_decomposition_p4.stl
new file mode 100644
index 0000000..8448a3f
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_palm_link_convex_decomposition_p4.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup.stl
new file mode 100644
index 0000000..bc34058
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split1.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split1.stl
new file mode 100644
index 0000000..c80aa61
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split1.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split10.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split10.stl
new file mode 100644
index 0000000..bd5708b
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split10.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split11.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split11.stl
new file mode 100644
index 0000000..ac81da2
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split11.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split12.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split12.stl
new file mode 100644
index 0000000..a18e96e
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split12.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split13.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split13.stl
new file mode 100644
index 0000000..f0e5832
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split13.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split14.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split14.stl
new file mode 100644
index 0000000..41a3e94
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split14.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split15.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split15.stl
new file mode 100644
index 0000000..7a26643
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split15.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split16.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split16.stl
new file mode 100644
index 0000000..155b24e
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split16.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split17.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split17.stl
new file mode 100644
index 0000000..2fe8d95
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split17.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split18.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split18.stl
new file mode 100644
index 0000000..f5287b2
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split18.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split2.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split2.stl
new file mode 100644
index 0000000..5c1e50c
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split2.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split3.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split3.stl
new file mode 100644
index 0000000..ef6d547
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split3.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split4.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split4.stl
new file mode 100644
index 0000000..5476296
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split4.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split5.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split5.stl
new file mode 100644
index 0000000..ccfcd42
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split5.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split6.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split6.stl
new file mode 100644
index 0000000..72d6287
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split6.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split7.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split7.stl
new file mode 100644
index 0000000..d4918f2
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split7.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split8.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split8.stl
new file mode 100644
index 0000000..8a0cd84
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split8.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split9.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split9.stl
new file mode 100644
index 0000000..4281a69
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split9.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/elbow_link_convex.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/elbow_link_convex.stl
new file mode 100755
index 0000000..b34963d
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/elbow_link_convex.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/elbow_link_fine.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/elbow_link_fine.stl
new file mode 100755
index 0000000..f6a1515
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/elbow_link_fine.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/forearm_link_convex_decomposition_p1.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/forearm_link_convex_decomposition_p1.stl
new file mode 100755
index 0000000..e6aa6b6
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/forearm_link_convex_decomposition_p1.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/forearm_link_convex_decomposition_p2.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/forearm_link_convex_decomposition_p2.stl
new file mode 100755
index 0000000..667902e
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/forearm_link_convex_decomposition_p2.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/forearm_link_fine.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/forearm_link_fine.stl
new file mode 100755
index 0000000..ed66bbb
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/forearm_link_fine.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/shoulder_link_convex_decomposition_p1.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/shoulder_link_convex_decomposition_p1.stl
new file mode 100755
index 0000000..aba957d
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/shoulder_link_convex_decomposition_p1.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/shoulder_link_convex_decomposition_p2.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/shoulder_link_convex_decomposition_p2.stl
new file mode 100755
index 0000000..5cca6a9
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/shoulder_link_convex_decomposition_p2.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/shoulder_link_convex_decomposition_p3.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/shoulder_link_convex_decomposition_p3.stl
new file mode 100755
index 0000000..3343e27
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/shoulder_link_convex_decomposition_p3.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/shoulder_link_fine.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/shoulder_link_fine.stl
new file mode 100755
index 0000000..ae505fd
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/shoulder_link_fine.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/shoulder_pitch_link_convex.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/shoulder_pitch_link_convex.stl
new file mode 100755
index 0000000..c36cfec
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/shoulder_pitch_link_convex.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/shoulder_pitch_link_fine.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/shoulder_pitch_link_fine.stl
new file mode 100755
index 0000000..dc633c4
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/shoulder_pitch_link_fine.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/upper_arm_link_convex_decomposition_p1.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/upper_arm_link_convex_decomposition_p1.stl
new file mode 100755
index 0000000..82d0093
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/upper_arm_link_convex_decomposition_p1.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/upper_arm_link_convex_decomposition_p2.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/upper_arm_link_convex_decomposition_p2.stl
new file mode 100755
index 0000000..7fd5a55
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/upper_arm_link_convex_decomposition_p2.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/upper_arm_link_fine.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/upper_arm_link_fine.stl
new file mode 100755
index 0000000..76353ae
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/upper_arm_link_fine.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/wrist_palm_link_convex.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/wrist_palm_link_convex.stl
new file mode 100755
index 0000000..a0386f6
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/wrist_palm_link_convex.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/wrist_palm_link_fine.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/wrist_palm_link_fine.stl
new file mode 100755
index 0000000..f6b41ad
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/wrist_palm_link_fine.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/wrist_pitch_link_convex_decomposition_p1.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/wrist_pitch_link_convex_decomposition_p1.stl
new file mode 100644
index 0000000..c36f88f
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/wrist_pitch_link_convex_decomposition_p1.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/wrist_pitch_link_convex_decomposition_p2.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/wrist_pitch_link_convex_decomposition_p2.stl
new file mode 100644
index 0000000..d00cac1
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/wrist_pitch_link_convex_decomposition_p2.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/wrist_pitch_link_convex_decomposition_p3.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/wrist_pitch_link_convex_decomposition_p3.stl
new file mode 100755
index 0000000..34d1d8b
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/wrist_pitch_link_convex_decomposition_p3.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/wrist_pitch_link_fine.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/wrist_pitch_link_fine.stl
new file mode 100644
index 0000000..13d2f73
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/wrist_pitch_link_fine.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/wrist_yaw_link_convex_decomposition_p1.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/wrist_yaw_link_convex_decomposition_p1.stl
new file mode 100755
index 0000000..06e857f
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/wrist_yaw_link_convex_decomposition_p1.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/wrist_yaw_link_convex_decomposition_p2.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/wrist_yaw_link_convex_decomposition_p2.stl
new file mode 100755
index 0000000..48e1bb1
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/wrist_yaw_link_convex_decomposition_p2.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/wrist_yaw_link_fine.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/wrist_yaw_link_fine.stl
new file mode 100644
index 0000000..0d95239
Binary files /dev/null and b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/wrist_yaw_link_fine.stl differ
diff --git a/fancy_gym/envs/mujoco/beerpong/beerpong.py b/fancy_gym/envs/mujoco/beerpong/beerpong.py
new file mode 100644
index 0000000..368425d
--- /dev/null
+++ b/fancy_gym/envs/mujoco/beerpong/beerpong.py
@@ -0,0 +1,266 @@
+import os
+from typing import Optional
+
+import numpy as np
+from gym import utils
+from gym.envs.mujoco import MujocoEnv
+
+MAX_EPISODE_STEPS_BEERPONG = 300
+FIXED_RELEASE_STEP = 62 # empirically evaluated for frame_skip=2!
+
+# XML Variables
+ROBOT_COLLISION_OBJ = ["wrist_palm_link_convex_geom",
+ "wrist_pitch_link_convex_decomposition_p1_geom",
+ "wrist_pitch_link_convex_decomposition_p2_geom",
+ "wrist_pitch_link_convex_decomposition_p3_geom",
+ "wrist_yaw_link_convex_decomposition_p1_geom",
+ "wrist_yaw_link_convex_decomposition_p2_geom",
+ "forearm_link_convex_decomposition_p1_geom",
+ "forearm_link_convex_decomposition_p2_geom",
+ "upper_arm_link_convex_decomposition_p1_geom",
+ "upper_arm_link_convex_decomposition_p2_geom",
+ "shoulder_link_convex_decomposition_p1_geom",
+ "shoulder_link_convex_decomposition_p2_geom",
+ "shoulder_link_convex_decomposition_p3_geom",
+ "base_link_convex_geom", "table_contact_geom"]
+
+CUP_COLLISION_OBJ = ["cup_geom_table3", "cup_geom_table4", "cup_geom_table5", "cup_geom_table6",
+ "cup_geom_table7", "cup_geom_table8", "cup_geom_table9", "cup_geom_table10",
+ "cup_geom_table15", "cup_geom_table16", "cup_geom_table17", "cup_geom1_table8"]
+
+
+class BeerPongEnv(MujocoEnv, utils.EzPickle):
+ def __init__(self):
+ self._steps = 0
+ # Small Context -> Easier. Todo: Should we do different versions?
+ # self.xml_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "assets", "beerpong_wo_cup.xml")
+ # self._cup_pos_min = np.array([-0.32, -2.2])
+ # self._cup_pos_max = np.array([0.32, -1.2])
+
+ self.xml_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "assets",
+ "beerpong_wo_cup_big_table.xml")
+ self._cup_pos_min = np.array([-1.42, -4.05])
+ self._cup_pos_max = np.array([1.42, -1.25])
+
+ self._start_pos = np.array([0.0, 1.35, 0.0, 1.18, 0.0, -0.786, -1.59])
+ self._start_vel = np.zeros(7)
+
+ self.release_step = FIXED_RELEASE_STEP
+
+ self.repeat_action = 2
+ # TODO: If accessing IDs is easier in the (new) official mujoco bindings, remove this
+ self.model = None
+ self.geom_id = lambda x: self._mujoco_bindings.mj_name2id(self.model,
+ self._mujoco_bindings.mjtObj.mjOBJ_GEOM,
+ x)
+
+ # for reward calculation
+ self.dists = []
+ self.dists_final = []
+ self.action_costs = []
+ self.ball_ground_contact_first = False
+ self.ball_table_contact = False
+ self.ball_wall_contact = False
+ self.ball_cup_contact = False
+ self.ball_in_cup = False
+ self.dist_ground_cup = -1 # distance floor to cup if first floor contact
+
+ MujocoEnv.__init__(self, model_path=self.xml_path, frame_skip=1, mujoco_bindings="mujoco")
+ utils.EzPickle.__init__(self)
+
+ @property
+ def start_pos(self):
+ return self._start_pos
+
+ @property
+ def start_vel(self):
+ return self._start_vel
+
+ def reset(self, *, seed: Optional[int] = None, return_info: bool = False, options: Optional[dict] = None):
+ self.dists = []
+ self.dists_final = []
+ self.action_costs = []
+ self.ball_ground_contact_first = False
+ self.ball_table_contact = False
+ self.ball_wall_contact = False
+ self.ball_cup_contact = False
+ self.ball_in_cup = False
+ self.dist_ground_cup = -1 # distance floor to cup if first floor contact
+ return super().reset()
+
+ def reset_model(self):
+ init_pos_all = self.init_qpos.copy()
+ init_pos_robot = self.start_pos
+ init_vel = np.zeros_like(init_pos_all)
+
+ self._steps = 0
+
+ start_pos = init_pos_all
+ start_pos[0:7] = init_pos_robot
+
+ # TODO: Ask Max why we need to set the state twice.
+ self.set_state(start_pos, init_vel)
+ start_pos[7::] = self.data.site("init_ball_pos").xpos.copy()
+ self.set_state(start_pos, init_vel)
+ xy = self.np_random.uniform(self._cup_pos_min, self._cup_pos_max)
+ xyz = np.zeros(3)
+ xyz[:2] = xy
+ xyz[-1] = 0.840
+ self.model.body("cup_table").pos[:] = xyz
+ return self._get_obs()
+
+ def step(self, a):
+ crash = False
+ for _ in range(self.repeat_action):
+ applied_action = a + self.data.qfrc_bias[:len(a)].copy() / self.model.actuator_gear[:, 0]
+ try:
+ self.do_simulation(applied_action, self.frame_skip)
+ # self.reward_function.check_contacts(self.sim) # I assume this is not important?
+ if self._steps < self.release_step:
+ self.data.qpos[7::] = self.data.site('init_ball_pos').xpos.copy()
+ self.data.qvel[7::] = self.data.sensor('init_ball_vel').data.copy()
+ crash = False
+ except Exception as e:
+ crash = True
+
+ ob = self._get_obs()
+
+ if not crash:
+ reward, reward_infos = self._get_reward(applied_action)
+ is_collided = reward_infos['is_collided'] # TODO: Remove if self collision does not make a difference
+ done = is_collided
+ self._steps += 1
+ else:
+ reward = -30
+ done = True
+ reward_infos = {"success": False, "ball_pos": np.zeros(3), "ball_vel": np.zeros(3), "is_collided": False}
+
+ infos = dict(
+ reward=reward,
+ action=a,
+ q_pos=self.data.qpos[0:7].ravel().copy(),
+ q_vel=self.data.qvel[0:7].ravel().copy(), sim_crash=crash,
+ )
+ infos.update(reward_infos)
+ return ob, reward, done, infos
+
+ def _get_obs(self):
+ theta = self.data.qpos.flat[:7].copy()
+ theta_dot = self.data.qvel.flat[:7].copy()
+ ball_pos = self.data.qpos.flat[7:].copy()
+ cup_goal_diff_final = ball_pos - self.data.site("cup_goal_final_table").xpos.copy()
+ cup_goal_diff_top = ball_pos - self.data.site("cup_goal_table").xpos.copy()
+ return np.concatenate([
+ np.cos(theta),
+ np.sin(theta),
+ theta_dot,
+ cup_goal_diff_final,
+ cup_goal_diff_top,
+ self.model.body("cup_table").pos[:2].copy(),
+ # [self._steps], # Use TimeAwareObservation Wrapper instead ....
+ ])
+
+ @property
+ def dt(self):
+ return super(BeerPongEnv, self).dt * self.repeat_action
+
+ def _get_reward(self, action):
+ goal_pos = self.data.site("cup_goal_table").xpos
+ goal_final_pos = self.data.site("cup_goal_final_table").xpos
+ ball_pos = self.data.qpos.flat[7:].copy()
+ ball_vel = self.data.qvel.flat[7:].copy()
+
+ self._check_contacts()
+ self.dists.append(np.linalg.norm(goal_pos - ball_pos))
+ self.dists_final.append(np.linalg.norm(goal_final_pos - ball_pos))
+ self.dist_ground_cup = np.linalg.norm(ball_pos - goal_pos) \
+ if self.ball_ground_contact_first and self.dist_ground_cup == -1 else self.dist_ground_cup
+ action_cost = np.sum(np.square(action))
+ self.action_costs.append(np.copy(action_cost))
+ # # ##################### Reward function which does not force to bounce once on the table (quad dist) #########
+
+ # Is this needed?
+ # self._is_collided = self._check_collision_with_itself([self.geom_id(name) for name in CUP_COLLISION_OBJ])
+
+ if self._steps == MAX_EPISODE_STEPS_BEERPONG - 1: # or self._is_collided:
+ min_dist = np.min(self.dists)
+ final_dist = self.dists_final[-1]
+ if self.ball_ground_contact_first:
+ min_dist_coeff, final_dist_coeff, ground_contact_dist_coeff, rew_offset = 1, 0.5, 2, -4
+ else:
+ if not self.ball_in_cup:
+ if not self.ball_table_contact and not self.ball_cup_contact and not self.ball_wall_contact:
+ min_dist_coeff, final_dist_coeff, ground_contact_dist_coeff, rew_offset = 1, 0.5, 0, -4
+ else:
+ min_dist_coeff, final_dist_coeff, ground_contact_dist_coeff, rew_offset = 1, 0.5, 0, -2
+ else:
+ min_dist_coeff, final_dist_coeff, ground_contact_dist_coeff, rew_offset = 0, 1, 0, 0
+ action_cost = 1e-4 * np.mean(action_cost)
+ reward = rew_offset - min_dist_coeff * min_dist ** 2 - final_dist_coeff * final_dist ** 2 - \
+ action_cost - ground_contact_dist_coeff * self.dist_ground_cup ** 2
+ # release step punishment
+ min_time_bound = 0.1
+ max_time_bound = 1.0
+ release_time = self.release_step * self.dt
+ release_time_rew = int(release_time < min_time_bound) * (-30 - 10 * (release_time - min_time_bound) ** 2) + \
+ int(release_time > max_time_bound) * (-30 - 10 * (release_time - max_time_bound) ** 2)
+ reward += release_time_rew
+ success = self.ball_in_cup
+ else:
+ action_cost = 1e-2 * action_cost
+ reward = - action_cost
+ success = False
+ # ##############################################################################################################
+ infos = {"success": success, "ball_pos": ball_pos.copy(),
+ "ball_vel": ball_vel.copy(), "action_cost": action_cost, "task_reward": reward,
+ "table_contact_first": int(not self.ball_ground_contact_first),
+ "is_collided": False} # TODO: Check if is collided is needed
+ return reward, infos
+
+ def _check_contacts(self):
+ # TODO proper contact detection using sensors?
+ if not self.ball_table_contact:
+ self.ball_table_contact = self._check_collision({self.geom_id("ball_geom")},
+ {self.geom_id("table_contact_geom")})
+ if not self.ball_cup_contact:
+ self.ball_cup_contact = self._check_collision({self.geom_id("ball_geom")},
+ {self.geom_id(name) for name in CUP_COLLISION_OBJ})
+ if not self.ball_wall_contact:
+ self.ball_wall_contact = self._check_collision({self.geom_id("ball_geom")},
+ {self.geom_id("wall")})
+ if not self.ball_in_cup:
+ self.ball_in_cup = self._check_collision({self.geom_id("ball_geom")},
+ {self.geom_id("cup_base_table_contact")})
+ if not self.ball_ground_contact_first:
+ if not self.ball_table_contact and not self.ball_cup_contact and not self.ball_wall_contact \
+ and not self.ball_in_cup:
+ self.ball_ground_contact_first = self._check_collision({self.geom_id("ball_geom")},
+ {self.geom_id("ground")})
+
+ def _check_collision(self, id_set_1, id_set_2=None):
+ """
+ Checks if any geom in id_set_1 is in contact with any geom in id_set_2.
+ If id_set_2 is set to None, it will check for collisions within id_set_1 itself.
+ """
+ collision_id_set = id_set_2 - id_set_1 if id_set_2 is not None else id_set_1
+ for coni in range(self.data.ncon):
+ con = self.data.contact[coni]
+ if ((con.geom1 in id_set_1 and con.geom2 in collision_id_set) or
+ (con.geom2 in id_set_1 and con.geom1 in collision_id_set)):
+ return True
+ return False
+
+
+class BeerPongEnvStepBasedEpisodicReward(BeerPongEnv):
+
+ def step(self, a):
+ if self._steps < FIXED_RELEASE_STEP:
+ return super(BeerPongEnvStepBasedEpisodicReward, self).step(a)
+ else:
+ reward = 0
+ done = True
+ while self._steps < MAX_EPISODE_STEPS_BEERPONG:
+ obs, sub_reward, done, infos = super(BeerPongEnvStepBasedEpisodicReward, self).step(
+ np.zeros(a.shape))
+ reward += sub_reward
+ return obs, reward, done, infos
diff --git a/fancy_gym/envs/mujoco/beerpong/deprecated/__init__.py b/fancy_gym/envs/mujoco/beerpong/deprecated/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/fancy_gym/envs/mujoco/beerpong/deprecated/beerpong.py b/fancy_gym/envs/mujoco/beerpong/deprecated/beerpong.py
new file mode 100644
index 0000000..015e887
--- /dev/null
+++ b/fancy_gym/envs/mujoco/beerpong/deprecated/beerpong.py
@@ -0,0 +1,212 @@
+import os
+
+import mujoco_py.builder
+import numpy as np
+from gym import utils
+from gym.envs.mujoco import MujocoEnv
+
+from fancy_gym.envs.mujoco.beerpong.deprecated.beerpong_reward_staged import BeerPongReward
+
+
+class BeerPongEnv(MujocoEnv, utils.EzPickle):
+ def __init__(self, frame_skip=2):
+ self._steps = 0
+ # Small Context -> Easier. Todo: Should we do different versions?
+ # self.xml_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "assets",
+ # "beerpong_wo_cup" + ".xml")
+ # self._cup_pos_min = np.array([-0.32, -2.2])
+ # self._cup_pos_max = np.array([0.32, -1.2])
+
+ self.xml_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "../assets",
+ "beerpong_wo_cup_big_table" + ".xml")
+ self._cup_pos_min = np.array([-1.42, -4.05])
+ self._cup_pos_max = np.array([1.42, -1.25])
+
+ self._start_pos = np.array([0.0, 1.35, 0.0, 1.18, 0.0, -0.786, -1.59])
+ self._start_vel = np.zeros(7)
+
+ self.release_step = 100 # time step of ball release
+ self.ep_length = 600 // frame_skip
+
+ self.reward_function = BeerPongReward()
+ self.repeat_action = frame_skip
+ self.model = None
+ self.site_id = lambda x: self.model.site_name2id(x)
+ self.body_id = lambda x: self.model.body_name2id(x)
+
+ MujocoEnv.__init__(self, self.xml_path, frame_skip=1)
+ utils.EzPickle.__init__(self)
+
+ @property
+ def start_pos(self):
+ return self._start_pos
+
+ @property
+ def start_vel(self):
+ return self._start_vel
+
+ def reset(self):
+ self.reward_function.reset()
+ return super().reset()
+
+ def reset_model(self):
+ init_pos_all = self.init_qpos.copy()
+ init_pos_robot = self.start_pos
+ init_vel = np.zeros_like(init_pos_all)
+
+ self._steps = 0
+
+ start_pos = init_pos_all
+ start_pos[0:7] = init_pos_robot
+
+ # TODO: Ask Max why we need to set the state twice.
+ self.set_state(start_pos, init_vel)
+ start_pos[7::] = self.sim.data.site_xpos[self.site_id("init_ball_pos"), :].copy()
+ self.set_state(start_pos, init_vel)
+ xy = self.np_random.uniform(self._cup_pos_min, self._cup_pos_max)
+ xyz = np.zeros(3)
+ xyz[:2] = xy
+ xyz[-1] = 0.840
+ self.sim.model.body_pos[self.body_id("cup_table")] = xyz
+ return self._get_obs()
+
+ def step(self, a):
+ crash = False
+ for _ in range(self.repeat_action):
+ applied_action = a + self.sim.data.qfrc_bias[:len(a)].copy() / self.model.actuator_gear[:, 0]
+ try:
+ self.do_simulation(applied_action, self.frame_skip)
+ self.reward_function.initialize(self)
+ # self.reward_function.check_contacts(self.sim) # I assume this is not important?
+ if self._steps < self.release_step:
+ self.sim.data.qpos[7::] = self.sim.data.site_xpos[self.site_id("init_ball_pos"), :].copy()
+ self.sim.data.qvel[7::] = self.sim.data.site_xvelp[self.site_id("init_ball_pos"), :].copy()
+ crash = False
+ except mujoco_py.builder.MujocoException:
+ crash = True
+
+ ob = self._get_obs()
+
+ if not crash:
+ reward, reward_infos = self.reward_function.compute_reward(self, applied_action)
+ is_collided = reward_infos['is_collided']
+ done = is_collided or self._steps == self.ep_length - 1
+ self._steps += 1
+ else:
+ reward = -30
+ done = True
+ reward_infos = {"success": False, "ball_pos": np.zeros(3), "ball_vel": np.zeros(3), "is_collided": False}
+
+ infos = dict(
+ reward=reward,
+ action=a,
+ q_pos=self.sim.data.qpos[0:7].ravel().copy(),
+ q_vel=self.sim.data.qvel[0:7].ravel().copy(), sim_crash=crash,
+ )
+ infos.update(reward_infos)
+ return ob, reward, done, infos
+
+ def _get_obs(self):
+ theta = self.sim.data.qpos.flat[:7]
+ theta_dot = self.sim.data.qvel.flat[:7]
+ ball_pos = self.data.get_body_xpos("ball").copy()
+ cup_goal_diff_final = ball_pos - self.data.get_site_xpos("cup_goal_final_table").copy()
+ cup_goal_diff_top = ball_pos - self.data.get_site_xpos("cup_goal_table").copy()
+ return np.concatenate([
+ np.cos(theta),
+ np.sin(theta),
+ theta_dot,
+ cup_goal_diff_final,
+ cup_goal_diff_top,
+ self.sim.model.body_pos[self.body_id("cup_table")][:2].copy(),
+ [self._steps],
+ ])
+
+ @property
+ def dt(self):
+ return super(BeerPongEnv, self).dt * self.repeat_action
+
+
+class BeerPongEnvFixedReleaseStep(BeerPongEnv):
+ def __init__(self, frame_skip=2):
+ super().__init__(frame_skip)
+ self.release_step = 62 # empirically evaluated for frame_skip=2!
+
+
+class BeerPongEnvStepBasedEpisodicReward(BeerPongEnv):
+ def __init__(self, frame_skip=2):
+ super().__init__(frame_skip)
+ self.release_step = 62 # empirically evaluated for frame_skip=2!
+
+ def step(self, a):
+ if self._steps < self.release_step:
+ return super(BeerPongEnvStepBasedEpisodicReward, self).step(a)
+ else:
+ reward = 0
+ done = False
+ while not done:
+ sub_ob, sub_reward, done, sub_infos = super(BeerPongEnvStepBasedEpisodicReward, self).step(
+ np.zeros(a.shape))
+ reward += sub_reward
+ infos = sub_infos
+ ob = sub_ob
+ ob[-1] = self.release_step + 1 # Since we simulate until the end of the episode, PPO does not see the
+ # internal steps and thus, the observation also needs to be set correctly
+ return ob, reward, done, infos
+
+
+# class BeerBongEnvStepBased(BeerBongEnv):
+# def __init__(self, frame_skip=1, apply_gravity_comp=True, noisy=False, rndm_goal=False, cup_goal_pos=None):
+# super().__init__(frame_skip, apply_gravity_comp, noisy, rndm_goal, cup_goal_pos)
+# self.release_step = 62 # empirically evaluated for frame_skip=2!
+#
+# def step(self, a):
+# if self._steps < self.release_step:
+# return super(BeerBongEnvStepBased, self).step(a)
+# else:
+# reward = 0
+# done = False
+# while not done:
+# sub_ob, sub_reward, done, sub_infos = super(BeerBongEnvStepBased, self).step(np.zeros(a.shape))
+# if not done or sub_infos['sim_crash']:
+# reward += sub_reward
+# else:
+# ball_pos = self.sim.data.body_xpos[self.sim.model._body_name2id["ball"]].copy()
+# cup_goal_dist_final = np.linalg.norm(ball_pos - self.sim.data.site_xpos[
+# self.sim.model._site_name2id["cup_goal_final_table"]].copy())
+# cup_goal_dist_top = np.linalg.norm(ball_pos - self.sim.data.site_xpos[
+# self.sim.model._site_name2id["cup_goal_table"]].copy())
+# if sub_infos['success']:
+# dist_rew = -cup_goal_dist_final ** 2
+# else:
+# dist_rew = -0.5 * cup_goal_dist_final ** 2 - cup_goal_dist_top ** 2
+# reward = reward - sub_infos['action_cost'] + dist_rew
+# infos = sub_infos
+# ob = sub_ob
+# ob[-1] = self.release_step + 1 # Since we simulate until the end of the episode, PPO does not see the
+# # internal steps and thus, the observation also needs to be set correctly
+# return ob, reward, done, infos
+
+
+if __name__ == "__main__":
+ env = BeerPongEnv(frame_skip=2)
+ env.seed(0)
+ # env = BeerBongEnvStepBased(frame_skip=2)
+ # env = BeerBongEnvStepBasedEpisodicReward(frame_skip=2)
+ # env = BeerBongEnvFixedReleaseStep(frame_skip=2)
+ import time
+
+ env.reset()
+ env.render("human")
+ for i in range(600):
+ # ac = 10 * env.action_space.sample()
+ ac = 0.05 * np.ones(7)
+ obs, rew, d, info = env.step(ac)
+ env.render("human")
+
+ if d:
+ print('reward:', rew)
+ print('RESETTING')
+ env.reset()
+ time.sleep(1)
+ env.close()
diff --git a/fancy_gym/envs/mujoco/beerpong/deprecated/beerpong_reward_staged.py b/fancy_gym/envs/mujoco/beerpong/deprecated/beerpong_reward_staged.py
new file mode 100644
index 0000000..4308921
--- /dev/null
+++ b/fancy_gym/envs/mujoco/beerpong/deprecated/beerpong_reward_staged.py
@@ -0,0 +1,183 @@
+import numpy as np
+
+
+class BeerPongReward:
+ def __init__(self):
+
+ self.robot_collision_objects = ["wrist_palm_link_convex_geom",
+ "wrist_pitch_link_convex_decomposition_p1_geom",
+ "wrist_pitch_link_convex_decomposition_p2_geom",
+ "wrist_pitch_link_convex_decomposition_p3_geom",
+ "wrist_yaw_link_convex_decomposition_p1_geom",
+ "wrist_yaw_link_convex_decomposition_p2_geom",
+ "forearm_link_convex_decomposition_p1_geom",
+ "forearm_link_convex_decomposition_p2_geom",
+ "upper_arm_link_convex_decomposition_p1_geom",
+ "upper_arm_link_convex_decomposition_p2_geom",
+ "shoulder_link_convex_decomposition_p1_geom",
+ "shoulder_link_convex_decomposition_p2_geom",
+ "shoulder_link_convex_decomposition_p3_geom",
+ "base_link_convex_geom", "table_contact_geom"]
+
+ self.cup_collision_objects = ["cup_geom_table3", "cup_geom_table4", "cup_geom_table5", "cup_geom_table6",
+ "cup_geom_table7", "cup_geom_table8", "cup_geom_table9", "cup_geom_table10",
+ "cup_geom_table15",
+ "cup_geom_table16",
+ "cup_geom_table17", "cup_geom1_table8",
+ ]
+
+ self.dists = None
+ self.dists_final = None
+ self.action_costs = None
+ self.ball_ground_contact_first = False
+ self.ball_table_contact = False
+ self.ball_wall_contact = False
+ self.ball_cup_contact = False
+ self.ball_in_cup = False
+ self.dist_ground_cup = -1 # distance floor to cup if first floor contact
+
+ # IDs
+ self.ball_collision_id = None
+ self.table_collision_id = None
+ self.wall_collision_id = None
+ self.cup_table_collision_id = None
+ self.ground_collision_id = None
+ self.cup_collision_ids = None
+ self.robot_collision_ids = None
+ self.reset()
+ self.is_initialized = False
+
+ def reset(self):
+ self.dists = []
+ self.dists_final = []
+ self.action_costs = []
+ self.ball_ground_contact_first = False
+ self.ball_table_contact = False
+ self.ball_wall_contact = False
+ self.ball_cup_contact = False
+ self.ball_in_cup = False
+ self.dist_ground_cup = -1 # distance floor to cup if first floor contact
+
+ def initialize(self, env):
+
+ if not self.is_initialized:
+ self.is_initialized = True
+ # TODO: Find a more elegant way to access the geom ids in each step -> less code
+ self.ball_collision_id = {env.model.geom_name2id("ball_geom")}
+ # self.ball_collision_id = env.model.geom_name2id("ball_geom")
+ self.table_collision_id = {env.model.geom_name2id("table_contact_geom")}
+ # self.table_collision_id = env.model.geom_name2id("table_contact_geom")
+ self.wall_collision_id = {env.model.geom_name2id("wall")}
+ # self.wall_collision_id = env.model.geom_name2id("wall")
+ self.cup_table_collision_id = {env.model.geom_name2id("cup_base_table_contact")}
+ # self.cup_table_collision_id = env.model.geom_name2id("cup_base_table_contact")
+ self.ground_collision_id = {env.model.geom_name2id("ground")}
+ # self.ground_collision_id = env.model.geom_name2id("ground")
+ self.cup_collision_ids = {env.model.geom_name2id(name) for name in self.cup_collision_objects}
+ # self.cup_collision_ids = [env.model.geom_name2id(name) for name in self.cup_collision_objects]
+ self.robot_collision_ids = [env.model.geom_name2id(name) for name in self.robot_collision_objects]
+
+ def compute_reward(self, env, action):
+
+ goal_pos = env.data.get_site_xpos("cup_goal_table")
+ ball_pos = env.data.get_body_xpos("ball")
+ ball_vel = env.data.get_body_xvelp("ball")
+ goal_final_pos = env.data.get_site_xpos("cup_goal_final_table")
+
+ self.check_contacts(env.sim)
+ self.dists.append(np.linalg.norm(goal_pos - ball_pos))
+ self.dists_final.append(np.linalg.norm(goal_final_pos - ball_pos))
+ self.dist_ground_cup = np.linalg.norm(ball_pos - goal_pos) \
+ if self.ball_ground_contact_first and self.dist_ground_cup == -1 else self.dist_ground_cup
+ action_cost = np.sum(np.square(action))
+ self.action_costs.append(np.copy(action_cost))
+ # # ##################### Reward function which does not force to bounce once on the table (quad dist) #########
+
+ # Is this needed?
+ # self._is_collided = self._check_collision_with_itself(env.sim, self.robot_collision_ids)
+
+ if env._steps == env.ep_length - 1: # or self._is_collided:
+ min_dist = np.min(self.dists)
+ final_dist = self.dists_final[-1]
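+ # Staged reward: the coefficients for (min distance, final distance, ground-contact distance) and the
+ # offset encode how far the throw progressed (ground first < no contact < some contact < ball in cup).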
+ if self.ball_ground_contact_first:
+ min_dist_coeff, final_dist_coeff, ground_contact_dist_coeff, rew_offset = 1, 0.5, 2, -4
+ else:
+ if not self.ball_in_cup:
+ if not self.ball_table_contact and not self.ball_cup_contact and not self.ball_wall_contact:
+ min_dist_coeff, final_dist_coeff, ground_contact_dist_coeff, rew_offset = 1, 0.5, 0, -4
+ else:
+ min_dist_coeff, final_dist_coeff, ground_contact_dist_coeff, rew_offset = 1, 0.5, 0, -2
+ else:
+ min_dist_coeff, final_dist_coeff, ground_contact_dist_coeff, rew_offset = 0, 1, 0, 0
+ action_cost = 1e-4 * np.mean(action_cost)
+ reward = rew_offset - min_dist_coeff * min_dist ** 2 - final_dist_coeff * final_dist ** 2 - \
+ action_cost - ground_contact_dist_coeff * self.dist_ground_cup ** 2
+ # release step punishment
+ min_time_bound = 0.1
+ max_time_bound = 1.0
+ release_time = env.release_step * env.dt
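+ # Constant plus quadratic penalty if the ball is released before min_time_bound or after max_time_bound.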
+ release_time_rew = int(release_time < min_time_bound) * (-30 - 10 * (release_time - min_time_bound) ** 2) + \
+ int(release_time > max_time_bound) * (-30 - 10 * (release_time - max_time_bound) ** 2)
+ reward += release_time_rew
+ success = self.ball_in_cup
+ else:
+ action_cost = 1e-2 * action_cost
+ reward = - action_cost
+ success = False
+ # ##############################################################################################################
+ infos = {"success": success, "ball_pos": ball_pos.copy(),
+ "ball_vel": ball_vel.copy(), "action_cost": action_cost, "task_reward": reward,
+ "table_contact_first": int(not self.ball_ground_contact_first),
+ "is_collided": False} # TODO: Check if is collided is needed
+ return reward, infos
+
+ def check_contacts(self, sim):
+ if not self.ball_table_contact:
+ self.ball_table_contact = self._check_collision(sim, self.ball_collision_id, self.table_collision_id)
+ if not self.ball_cup_contact:
+ self.ball_cup_contact = self._check_collision(sim, self.ball_collision_id, self.cup_collision_ids)
+ if not self.ball_wall_contact:
+ self.ball_wall_contact = self._check_collision(sim, self.ball_collision_id, self.wall_collision_id)
+ if not self.ball_in_cup:
+ self.ball_in_cup = self._check_collision(sim, self.ball_collision_id, self.cup_table_collision_id)
+ if not self.ball_ground_contact_first:
+ if not self.ball_table_contact and not self.ball_cup_contact and not self.ball_wall_contact \
+ and not self.ball_in_cup:
+ self.ball_ground_contact_first = self._check_collision(sim, self.ball_collision_id,
+ self.ground_collision_id)
+
+ # Checks if id_set_1 has a collision with id_set_2
+ def _check_collision(self, sim, id_set_1, id_set_2):
+ """
+ If id_set_2 is set to None, it will check for a collision with itself (id_set_1).
+ """
+ collision_id_set = id_set_2 - id_set_1 if id_set_2 is not None else id_set_1
+ for coni in range(sim.data.ncon):
+ con = sim.data.contact[coni]
+ if ((con.geom1 in id_set_1 and con.geom2 in collision_id_set) or
+ (con.geom2 in id_set_1 and con.geom1 in collision_id_set)):
+ return True
+ return False
+
+ # def _check_collision_with_itself(self, sim, collision_ids):
+ # col_1, col_2 = False, False
+ # for j, id in enumerate(collision_ids):
+ # col_1 = self._check_collision_with_set_of_objects(sim, id, collision_ids[:j])
+ # if j != len(collision_ids) - 1:
+ # col_2 = self._check_collision_with_set_of_objects(sim, id, collision_ids[j + 1:])
+ # else:
+ # col_2 = False
+ # collision = True if col_1 or col_2 else False
+ # return collision
+
+ ### This function will not be needed if we really do not need to check for collision with itself
+ # def _check_collision_with_set_of_objects(self, sim, id_1, id_list):
+ # for coni in range(0, sim.data.ncon):
+ # con = sim.data.contact[coni]
+ #
+ # collision = con.geom1 in id_list and con.geom2 == id_1
+ # collision_trans = con.geom1 == id_1 and con.geom2 in id_list
+ #
+ # if collision or collision_trans:
+ # return True
+ # return False
diff --git a/fancy_gym/envs/mujoco/beerpong/mp_wrapper.py b/fancy_gym/envs/mujoco/beerpong/mp_wrapper.py
new file mode 100644
index 0000000..8988f5a
--- /dev/null
+++ b/fancy_gym/envs/mujoco/beerpong/mp_wrapper.py
@@ -0,0 +1,41 @@
+from typing import Union, Tuple
+
+import numpy as np
+
+from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
+
+
+class MPWrapper(RawInterfaceWrapper):
+
+ @property
+ def context_mask(self) -> np.ndarray:
+ return np.hstack([
+ [False] * 7, # cos
+ [False] * 7, # sin
+ [False] * 7, # joint velocities
+ [False] * 3, # cup_goal_diff_final
+ [False] * 3, # cup_goal_diff_top
+ [True] * 2, # xy position of cup
+ # [False] # env steps
+ ])
+
+ @property
+ def current_pos(self) -> Union[float, int, np.ndarray, Tuple]:
+ return self.data.qpos[0:7].copy()
+
+ @property
+ def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
+ return self.data.qvel[0:7].copy()
+
+ # TODO: Fix this
+ def episode_callback(self, action: np.ndarray, mp) -> Tuple[np.ndarray, Union[np.ndarray, None]]:
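+ # When the trajectory length (tau) is learned, the first action entry is the duration;
+ # convert it to the simulation step at which the ball is released.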
+ if mp.learn_tau:
+ self.release_step = action[0] / self.dt # Tau value
+ return action, None
+
+ def set_context(self, context):
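+ # Move the cup to the sampled xy position (fixed table height of 0.840) and return the new observation.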
+ xyz = np.zeros(3)
+ xyz[:2] = context
+ xyz[-1] = 0.840
+ self.model.body_pos[self.cup_table_id] = xyz
+ return self.get_observation_from_step(self.get_obs())
diff --git a/fancy_gym/envs/mujoco/half_cheetah_jump/__init__.py b/fancy_gym/envs/mujoco/half_cheetah_jump/__init__.py
new file mode 100644
index 0000000..c5e6d2f
--- /dev/null
+++ b/fancy_gym/envs/mujoco/half_cheetah_jump/__init__.py
@@ -0,0 +1 @@
+from .mp_wrapper import MPWrapper
diff --git a/fancy_gym/envs/mujoco/half_cheetah_jump/assets/cheetah.xml b/fancy_gym/envs/mujoco/half_cheetah_jump/assets/cheetah.xml
new file mode 100644
index 0000000..41daadf
--- /dev/null
+++ b/fancy_gym/envs/mujoco/half_cheetah_jump/assets/cheetah.xml
@@ -0,0 +1,62 @@
+<!-- cheetah.xml: MuJoCo model for the half-cheetah jump task (XML content omitted) -->
diff --git a/fancy_gym/envs/mujoco/half_cheetah_jump/half_cheetah_jump.py b/fancy_gym/envs/mujoco/half_cheetah_jump/half_cheetah_jump.py
new file mode 100644
index 0000000..e0a5982
--- /dev/null
+++ b/fancy_gym/envs/mujoco/half_cheetah_jump/half_cheetah_jump.py
@@ -0,0 +1,90 @@
+import os
+from typing import Tuple, Union, Optional
+
+import numpy as np
+from gym.core import ObsType
+from gym.envs.mujoco.half_cheetah_v4 import HalfCheetahEnv
+
+MAX_EPISODE_STEPS_HALFCHEETAHJUMP = 100
+
+
+class HalfCheetahJumpEnv(HalfCheetahEnv):
+ """
+ Initialization changes to normal HalfCheetah:
+ - _ctrl_cost_weight: 0.1 -> 0.0
+ """
+
+ def __init__(self,
+ xml_file='cheetah.xml',
+ forward_reward_weight=1.0,
+ ctrl_cost_weight=0.0,
+ reset_noise_scale=0.1,
+ context=True,
+ exclude_current_positions_from_observation=True,
+ max_episode_steps=100):
+ self.current_step = 0
+ self.max_height = 0
+ # self.max_episode_steps = max_episode_steps
+ self.goal = 0
+ self.context = context
+ xml_file = os.path.join(os.path.dirname(__file__), "assets", xml_file)
+ super().__init__(xml_file=xml_file,
+ forward_reward_weight=forward_reward_weight,
+ ctrl_cost_weight=ctrl_cost_weight,
+ reset_noise_scale=reset_noise_scale,
+ exclude_current_positions_from_observation=exclude_current_positions_from_observation)
+
+ def step(self, action):
+
+ self.current_step += 1
+ self.do_simulation(action, self.frame_skip)
+
+ height_after = self.get_body_com("torso")[2]
+ self.max_height = max(height_after, self.max_height)
+
+ # Did not use fell_over, because the base env also has no done condition - Paul and Marc
+ # fell_over = abs(self.sim.data.qpos[2]) > 2.5 # how to figure out if the cheetah fell over? -> 2.5 oke?
+ # TODO: Should a fall over be checked here?
+ done = False
+
+ ctrl_cost = self.control_cost(action)
+ costs = ctrl_cost
+
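+ # The jump is only rewarded at the final step: negative distance of the maximum reached height to the
+ # goal height (contextual case) or the maximum height itself (non-contextual case).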
+ if self.current_step == MAX_EPISODE_STEPS_HALFCHEETAHJUMP:
+ height_goal_distance = -10 * np.linalg.norm(self.max_height - self.goal) + 1e-8 if self.context \
+ else self.max_height
+ rewards = self._forward_reward_weight * height_goal_distance
+ else:
+ rewards = 0
+
+ observation = self._get_obs()
+ reward = rewards - costs
+ info = {
+ 'height': height_after,
+ 'max_height': self.max_height
+ }
+
+ return observation, reward, done, info
+
+ def _get_obs(self):
+ return np.append(super()._get_obs(), self.goal)
+
+ def reset(self, *, seed: Optional[int] = None, return_info: bool = False,
+ options: Optional[dict] = None, ) -> Union[ObsType, Tuple[ObsType, dict]]:
+ self.max_height = 0
+ self.current_step = 0
+ self.goal = self.np_random.uniform(1.1, 1.6, 1) # 1.1 1.6
+ return super().reset()
+
+ # overwrite reset_model to make it deterministic
+ def reset_model(self):
+ # TODO remove if not needed!
+ noise_low = -self._reset_noise_scale
+ noise_high = self._reset_noise_scale
+
+ qpos = self.init_qpos # + self.np_random.uniform(low=noise_low, high=noise_high, size=self.model.nq)
+ qvel = self.init_qvel # + self.np_random.uniform(low=noise_low, high=noise_high, size=self.model.nv)
+
+ self.set_state(qpos, qvel)
+
+ observation = self._get_obs()
+ return observation
diff --git a/fancy_gym/envs/mujoco/half_cheetah_jump/mp_wrapper.py b/fancy_gym/envs/mujoco/half_cheetah_jump/mp_wrapper.py
new file mode 100644
index 0000000..11b169b
--- /dev/null
+++ b/fancy_gym/envs/mujoco/half_cheetah_jump/mp_wrapper.py
@@ -0,0 +1,22 @@
+from typing import Tuple, Union
+
+import numpy as np
+
+from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
+
+
+class MPWrapper(RawInterfaceWrapper):
+ @property
+ def context_mask(self) -> np.ndarray:
+ return np.hstack([
+ [False] * 17,
+ [True] # goal height
+ ])
+
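+ # qpos/qvel[3:9] are the six actuated joints; the root x, z and rotation entries are excluded.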
+ @property
+ def current_pos(self) -> Union[float, int, np.ndarray]:
+ return self.data.qpos[3:9].copy()
+
+ @property
+ def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
+ return self.data.qvel[3:9].copy()
diff --git a/fancy_gym/envs/mujoco/hopper_jump/__init__.py b/fancy_gym/envs/mujoco/hopper_jump/__init__.py
new file mode 100644
index 0000000..c5e6d2f
--- /dev/null
+++ b/fancy_gym/envs/mujoco/hopper_jump/__init__.py
@@ -0,0 +1 @@
+from .mp_wrapper import MPWrapper
diff --git a/fancy_gym/envs/mujoco/hopper_jump/assets/hopper_jump.xml b/fancy_gym/envs/mujoco/hopper_jump/assets/hopper_jump.xml
new file mode 100644
index 0000000..3348bab
--- /dev/null
+++ b/fancy_gym/envs/mujoco/hopper_jump/assets/hopper_jump.xml
@@ -0,0 +1,52 @@
+<!-- hopper_jump.xml: MuJoCo model for the hopper jump task (XML content omitted) -->
diff --git a/fancy_gym/envs/mujoco/hopper_jump/assets/hopper_jump_on_box.xml b/fancy_gym/envs/mujoco/hopper_jump/assets/hopper_jump_on_box.xml
new file mode 100644
index 0000000..69d78ff
--- /dev/null
+++ b/fancy_gym/envs/mujoco/hopper_jump/assets/hopper_jump_on_box.xml
@@ -0,0 +1,51 @@
+<!-- hopper_jump_on_box.xml: MuJoCo model for the hopper jump-on-box task (XML content omitted) -->
\ No newline at end of file
diff --git a/fancy_gym/envs/mujoco/hopper_jump/hopper_jump.py b/fancy_gym/envs/mujoco/hopper_jump/hopper_jump.py
new file mode 100644
index 0000000..63f40c8
--- /dev/null
+++ b/fancy_gym/envs/mujoco/hopper_jump/hopper_jump.py
@@ -0,0 +1,251 @@
+import copy
+import os
+
+import numpy as np
+from gym.envs.mujoco.hopper_v4 import HopperEnv
+
+MAX_EPISODE_STEPS_HOPPERJUMP = 250
+
+
+class HopperJumpEnv(HopperEnv):
+ """
+ Initialization changes to normal Hopper:
+ - terminate_when_unhealthy: True -> False
+ - healthy_reward: 1.0 -> 2.0
+ - healthy_z_range: (0.7, float('inf')) -> (0.5, float('inf'))
+ - healthy_angle_range: (-0.2, 0.2) -> (-float('inf'), float('inf'))
+ - exclude_current_positions_from_observation: True -> False
+ """
+
+ def __init__(
+ self,
+ xml_file='hopper_jump.xml',
+ forward_reward_weight=1.0,
+ ctrl_cost_weight=1e-3,
+ healthy_reward=2.0,
+ contact_weight=2.0,
+ height_weight=10.0,
+ dist_weight=3.0,
+ terminate_when_unhealthy=False,
+ healthy_state_range=(-100.0, 100.0),
+ healthy_z_range=(0.5, float('inf')),
+ healthy_angle_range=(-float('inf'), float('inf')),
+ reset_noise_scale=5e-3,
+ exclude_current_positions_from_observation=False,
+ sparse=False,
+ ):
+
+ self.sparse = sparse
+ self._height_weight = height_weight
+ self._dist_weight = dist_weight
+ self._contact_weight = contact_weight
+
+ self.max_height = 0
+ self.goal = np.zeros(3, )
+
+ self._steps = 0
+ self.contact_with_floor = False
+ self.init_floor_contact = False
+ self.has_left_floor = False
+ self.contact_dist = None
+
+ xml_file = os.path.join(os.path.dirname(__file__), "assets", xml_file)
+ super().__init__(xml_file=xml_file,
+ forward_reward_weight=forward_reward_weight,
+ ctrl_cost_weight=ctrl_cost_weight,
+ healthy_reward=healthy_reward,
+ terminate_when_unhealthy=terminate_when_unhealthy,
+ healthy_state_range=healthy_state_range,
+ healthy_z_range=healthy_z_range,
+ healthy_angle_range=healthy_angle_range,
+ reset_noise_scale=reset_noise_scale,
+ exclude_current_positions_from_observation=exclude_current_positions_from_observation)
+
+ # increase initial height
+ self.init_qpos[1] = 1.5
+
+ @property
+ def exclude_current_positions_from_observation(self):
+ return self._exclude_current_positions_from_observation
+
+ def step(self, action):
+ self._steps += 1
+
+ self.do_simulation(action, self.frame_skip)
+
+ height_after = self.get_body_com("torso")[2]
+ #site_pos_after = self.data.get_site_xpos('foot_site')
+ site_pos_after = self.data.site('foot_site').xpos
+ self.max_height = max(height_after, self.max_height)
+
+ has_floor_contact = self._is_floor_foot_contact() if not self.contact_with_floor else False
+
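+ # Jump phase bookkeeping: first floor contact -> foot leaves the floor -> foot touches down again.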
+ if not self.init_floor_contact:
+ self.init_floor_contact = has_floor_contact
+ if self.init_floor_contact and not self.has_left_floor:
+ self.has_left_floor = not has_floor_contact
+ if not self.contact_with_floor and self.has_left_floor:
+ self.contact_with_floor = has_floor_contact
+
+ ctrl_cost = self.control_cost(action)
+ costs = ctrl_cost
+ done = False
+
+ goal_dist = np.linalg.norm(site_pos_after - self.goal)
+ if self.contact_dist is None and self.contact_with_floor:
+ self.contact_dist = goal_dist
+
+ rewards = 0
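+ # Dense setting: reward every step; sparse setting: reward only at the final step
+ # (using the maximum height and the distance at touchdown).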
+ if not self.sparse or (self.sparse and self._steps >= MAX_EPISODE_STEPS_HOPPERJUMP):
+ healthy_reward = self.healthy_reward
+ distance_reward = -goal_dist * self._dist_weight
+ height_reward = (self.max_height if self.sparse else self.get_body_com("torso")[2]) * self._height_weight
+ contact_reward = -(self.contact_dist or 5) * self._contact_weight
+ rewards = self._forward_reward_weight * (distance_reward + height_reward + contact_reward + healthy_reward)
+
+ observation = self._get_obs()
+ reward = rewards - costs
+ info = dict(
+ height=height_after,
+ x_pos=site_pos_after,
+ max_height=self.max_height,
+ goal=self.goal[:1],
+ goal_dist=goal_dist,
+ height_rew=self.max_height,
+ healthy_reward=self.healthy_reward,
+ healthy=self.is_healthy,
+ contact_dist=self.contact_dist or 0
+ )
+ return observation, reward, done, info
+
+ def _get_obs(self):
+ # goal_dist = self.data.get_site_xpos('foot_site') - self.goal
+ goal_dist = self.data.site('foot_site').xpos - self.goal
+ return np.concatenate((super(HopperJumpEnv, self)._get_obs(), goal_dist.copy(), self.goal[:1]))
+
+ def reset_model(self):
+ super(HopperJumpEnv, self).reset_model()
+
+ # self.goal = self.np_random.uniform(0.3, 1.35, 1)[0]
+ self.goal = np.concatenate([self.np_random.uniform(0.3, 1.35, 1), np.zeros(2, )])
+ # self.sim.model.body_pos[self.sim.model.body_name2id('goal_site_body')] = self.goal
+ self.model.body('goal_site_body').pos[:] = np.copy(self.goal)
+ self.max_height = 0
+ self._steps = 0
+
+ noise_low = np.zeros(self.model.nq)
+ noise_low[3] = -0.5
+ noise_low[4] = -0.2
+
+ noise_high = np.zeros(self.model.nq)
+ noise_high[5] = 0.785
+
+ qpos = (
+ self.np_random.uniform(low=noise_low, high=noise_high, size=self.model.nq) +
+ self.init_qpos
+ )
+ qvel = (
+ # self.np_random.uniform(low=noise_low, high=noise_high, size=self.model.nv) +
+ self.init_qvel
+ )
+
+ self.set_state(qpos, qvel)
+
+ observation = self._get_obs()
+ self.has_left_floor = False
+ self.contact_with_floor = False
+ self.init_floor_contact = False
+ self.contact_dist = None
+
+ return observation
+
+ def _is_floor_foot_contact(self):
+ # floor_geom_id = self.model.geom_name2id('floor')
+ # foot_geom_id = self.model.geom_name2id('foot_geom')
+ # TODO: do this properly over a sensor in the xml file, see dmc hopper
+ floor_geom_id = self._mujoco_bindings.mj_name2id(self.model,
+ self._mujoco_bindings.mjtObj.mjOBJ_GEOM,
+ 'floor')
+ foot_geom_id = self._mujoco_bindings.mj_name2id(self.model,
+ self._mujoco_bindings.mjtObj.mjOBJ_GEOM,
+ 'foot_geom')
+ for i in range(self.data.ncon):
+ contact = self.data.contact[i]
+ collision = contact.geom1 == floor_geom_id and contact.geom2 == foot_geom_id
+ collision_trans = contact.geom1 == foot_geom_id and contact.geom2 == floor_geom_id
+ if collision or collision_trans:
+ return True
+ return False
+
+# TODO is that needed? if so test it
+class HopperJumpStepEnv(HopperJumpEnv):
+
+ def __init__(self,
+ xml_file='hopper_jump.xml',
+ forward_reward_weight=1.0,
+ ctrl_cost_weight=1e-3,
+ healthy_reward=1.0,
+ height_weight=3,
+ dist_weight=3,
+ terminate_when_unhealthy=False,
+ healthy_state_range=(-100.0, 100.0),
+ healthy_z_range=(0.5, float('inf')),
+ healthy_angle_range=(-float('inf'), float('inf')),
+ reset_noise_scale=5e-3,
+ exclude_current_positions_from_observation=False
+ ):
+
+ self._height_weight = height_weight
+ self._dist_weight = dist_weight
+ # Pass arguments by keyword: HopperJumpEnv additionally has contact/height/dist weight parameters.
+ super().__init__(xml_file=xml_file, forward_reward_weight=forward_reward_weight,
+ ctrl_cost_weight=ctrl_cost_weight, healthy_reward=healthy_reward,
+ terminate_when_unhealthy=terminate_when_unhealthy, healthy_state_range=healthy_state_range,
+ healthy_z_range=healthy_z_range, healthy_angle_range=healthy_angle_range, reset_noise_scale=reset_noise_scale,
+ exclude_current_positions_from_observation=exclude_current_positions_from_observation)
+
+ def step(self, action):
+ self._steps += 1
+
+ self.do_simulation(action, self.frame_skip)
+
+ height_after = self.get_body_com("torso")[2]
+ site_pos_after = self.data.site('foot_site').xpos.copy()
+ self.max_height = max(height_after, self.max_height)
+
+ ctrl_cost = self.control_cost(action)
+ healthy_reward = self.healthy_reward
+ height_reward = self._height_weight * height_after
+ goal_dist = np.linalg.norm(site_pos_after - self.goal)
+ goal_dist_reward = -self._dist_weight * goal_dist
+ dist_reward = self._forward_reward_weight * (goal_dist_reward + height_reward)
+
+ rewards = dist_reward + healthy_reward
+ costs = ctrl_cost
+ done = False
+
+ # This is only for logging the distance to goal when first having the contact
+ has_floor_contact = self._is_floor_foot_contact() if not self.contact_with_floor else False
+
+ if not self.init_floor_contact:
+ self.init_floor_contact = has_floor_contact
+ if self.init_floor_contact and not self.has_left_floor:
+ self.has_left_floor = not has_floor_contact
+ if not self.contact_with_floor and self.has_left_floor:
+ self.contact_with_floor = has_floor_contact
+
+ if self.contact_dist is None and self.contact_with_floor:
+ self.contact_dist = goal_dist
+
+ ##############################################################
+
+ observation = self._get_obs()
+ reward = rewards - costs
+ info = {
+ 'height': height_after,
+ 'x_pos': site_pos_after,
+ 'max_height': copy.copy(self.max_height),
+ 'goal': copy.copy(self.goal),
+ 'goal_dist': goal_dist,
+ 'height_rew': height_reward,
+ 'healthy_reward': healthy_reward,
+ 'healthy': copy.copy(self.is_healthy),
+ 'contact_dist': copy.copy(self.contact_dist) or 0
+ }
+ return observation, reward, done, info
diff --git a/fancy_gym/envs/mujoco/hopper_jump/hopper_jump_on_box.py b/fancy_gym/envs/mujoco/hopper_jump/hopper_jump_on_box.py
new file mode 100644
index 0000000..f9834bd
--- /dev/null
+++ b/fancy_gym/envs/mujoco/hopper_jump/hopper_jump_on_box.py
@@ -0,0 +1,170 @@
+import os
+
+import numpy as np
+from gym.envs.mujoco.hopper_v4 import HopperEnv
+
+MAX_EPISODE_STEPS_HOPPERJUMPONBOX = 250
+
+
+class HopperJumpOnBoxEnv(HopperEnv):
+ """
+ Initialization changes to normal Hopper:
+ - healthy_reward: 1.0 -> 0.01 -> 0.001
+ - healthy_angle_range: (-0.2, 0.2) -> (-float('inf'), float('inf'))
+ """
+
+ def __init__(self,
+ xml_file='hopper_jump_on_box.xml',
+ forward_reward_weight=1.0,
+ ctrl_cost_weight=1e-3,
+ healthy_reward=0.001,
+ terminate_when_unhealthy=True,
+ healthy_state_range=(-100.0, 100.0),
+ healthy_z_range=(0.7, float('inf')),
+ healthy_angle_range=(-float('inf'), float('inf')),
+ reset_noise_scale=5e-3,
+ context=True,
+ exclude_current_positions_from_observation=True,
+ max_episode_steps=250):
+ self.current_step = 0
+ self.max_height = 0
+ self.max_episode_steps = max_episode_steps
+ self.min_distance = 5000 # what value?
+ self.hopper_on_box = False
+ self.context = context
+ self.box_x = 1
+ xml_file = os.path.join(os.path.dirname(__file__), "assets", xml_file)
+ super().__init__(xml_file, forward_reward_weight, ctrl_cost_weight, healthy_reward, terminate_when_unhealthy,
+ healthy_state_range, healthy_z_range, healthy_angle_range, reset_noise_scale,
+ exclude_current_positions_from_observation)
+
+ def step(self, action):
+
+ self.current_step += 1
+ self.do_simulation(action, self.frame_skip)
+ height_after = self.get_body_com("torso")[2]
+ foot_pos = self.get_body_com("foot")
+ self.max_height = max(height_after, self.max_height)
+
+ vx, vz, vangle = self.data.qvel[0:3]
+
+ s = self.state_vector()
+ fell_over = not (np.isfinite(s).all() and (np.abs(s[2:]) < 100).all() and (height_after > .7))
+
+ box_pos = self.get_body_com("box")
+ box_size = 0.3
+ box_height = 0.3
+ box_center = (box_pos[0] + (box_size / 2), box_pos[1], box_height)
+ foot_length = 0.3
+ foot_center = foot_pos[0] - (foot_length / 2)
+
+ dist = np.linalg.norm(foot_pos - box_center)
+
+ self.min_distance = min(dist, self.min_distance)
+
+ # check if foot is on box
+ is_on_box_x = box_pos[0] <= foot_center <= box_pos[0] + box_size
+ is_on_box_y = True # is y always true because he can only move in x and z direction?
+ is_on_box_z = box_height - 0.02 <= foot_pos[2] <= box_height + 0.02
+ is_on_box = is_on_box_x and is_on_box_y and is_on_box_z
+ if is_on_box:
+ self.hopper_on_box = True
+
+ ctrl_cost = self.control_cost(action)
+
+ costs = ctrl_cost
+
+ done = fell_over or self.hopper_on_box
+
+ if self.current_step >= self.max_episode_steps or done:
+ done = False
+
+ max_height = self.max_height.copy()
+ min_distance = self.min_distance.copy()
+
+ alive_bonus = self._healthy_reward * self.current_step
+ box_bonus = 0
+ rewards = 0
+
+ # TODO explain what we did here for the calculation of the reward
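+ # If the foot is on the box: penalize the remaining horizontal velocity (and, in the non-contextual
+ # case, add a box bonus, the maximum height and an alive bonus). Otherwise: penalize the minimal
+ # distance to the box and reward the maximum height reached.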
+ if is_on_box:
+ if self.context:
+ rewards -= 100 * vx ** 2 if 100 * vx ** 2 < 1 else 1
+ else:
+ box_bonus = 10
+ rewards += box_bonus
+ # rewards -= dist * dist ???? why when already on box?
+ # reward -= 90 - abs(angle)
+ rewards -= 100 * vx ** 2 if 100 * vx ** 2 < 1 else 1
+ rewards += max_height * 3
+ rewards += alive_bonus
+
+ else:
+ if self.context:
+ rewards = -10 - min_distance
+ rewards += max_height * 3
+ else:
+ # reward -= (dist*dist)
+ rewards -= min_distance * min_distance
+ # rewards -= dist / self.max_distance
+ rewards += max_height
+ rewards += alive_bonus
+
+ else:
+ rewards = 0
+
+ observation = self._get_obs()
+ reward = rewards - costs
+ info = {
+ 'height': height_after,
+ 'max_height': self.max_height.copy(),
+ 'min_distance': self.min_distance,
+ 'goal': self.box_x,
+ }
+
+ return observation, reward, done, info
+
+ def _get_obs(self):
+ return np.append(super()._get_obs(), self.box_x)
+
+ def reset(self):
+
+ self.max_height = 0
+ self.min_distance = 5000
+ self.current_step = 0
+ self.hopper_on_box = False
+ if self.context:
+ self.box_x = self.np_random.uniform(1, 3, 1)
+ self.model.body("box").pos = [self.box_x[0], 0, 0]
+ return super().reset()
+
+ # overwrite reset_model to make it deterministic
+ def reset_model(self):
+ noise_low = -self._reset_noise_scale
+ noise_high = self._reset_noise_scale
+
+ qpos = self.init_qpos # + self.np_random.uniform(low=noise_low, high=noise_high, size=self.model.nq)
+ qvel = self.init_qvel # + self.np_random.uniform(low=noise_low, high=noise_high, size=self.model.nv)
+
+ self.set_state(qpos, qvel)
+
+ observation = self._get_obs()
+ return observation
+
+if __name__ == '__main__':
+ render_mode = "human" # "human" or "partial" or "final"
+ env = HopperJumpOnBoxEnv()
+ obs = env.reset()
+
+ for i in range(2000):
+ # objective.load_result("/tmp/cma")
+ # test with random actions
+ ac = env.action_space.sample()
+ obs, rew, d, info = env.step(ac)
+ if i % 10 == 0:
+ env.render(mode=render_mode)
+ if d:
+ print('After ', i, ' steps, done: ', d)
+ env.reset()
+
+ env.close()
\ No newline at end of file
diff --git a/fancy_gym/envs/mujoco/hopper_jump/mp_wrapper.py b/fancy_gym/envs/mujoco/hopper_jump/mp_wrapper.py
new file mode 100644
index 0000000..d46fa92
--- /dev/null
+++ b/fancy_gym/envs/mujoco/hopper_jump/mp_wrapper.py
@@ -0,0 +1,27 @@
+from typing import Union, Tuple
+
+import numpy as np
+
+from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
+
+
+class MPWrapper(RawInterfaceWrapper):
+
+ # Random x goal + random init pos
+ @property
+ def context_mask(self):
+ return np.hstack([
+ [False] * (2 + int(not self.exclude_current_positions_from_observation)), # position
+ [True] * 3, # set to true if randomize initial pos
+ [False] * 6, # velocity
+ [True] * 3, # goal distance
+ [True] # goal
+ ])
+
+ @property
+ def current_pos(self) -> Union[float, int, np.ndarray, Tuple]:
+ return self.data.qpos[3:6].copy()
+
+ @property
+ def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
+ return self.data.qvel[3:6].copy()
diff --git a/fancy_gym/envs/mujoco/hopper_throw/__init__.py b/fancy_gym/envs/mujoco/hopper_throw/__init__.py
new file mode 100644
index 0000000..989b5a9
--- /dev/null
+++ b/fancy_gym/envs/mujoco/hopper_throw/__init__.py
@@ -0,0 +1 @@
+from .mp_wrapper import MPWrapper
\ No newline at end of file
diff --git a/fancy_gym/envs/mujoco/hopper_throw/assets/hopper_throw.xml b/fancy_gym/envs/mujoco/hopper_throw/assets/hopper_throw.xml
new file mode 100644
index 0000000..1c39602
--- /dev/null
+++ b/fancy_gym/envs/mujoco/hopper_throw/assets/hopper_throw.xml
@@ -0,0 +1,56 @@
+<!-- hopper_throw.xml: MuJoCo model for the hopper throw task (XML content omitted) -->
diff --git a/fancy_gym/envs/mujoco/hopper_throw/assets/hopper_throw_in_basket.xml b/fancy_gym/envs/mujoco/hopper_throw/assets/hopper_throw_in_basket.xml
new file mode 100644
index 0000000..b4f0342
--- /dev/null
+++ b/fancy_gym/envs/mujoco/hopper_throw/assets/hopper_throw_in_basket.xml
@@ -0,0 +1,132 @@
+<!-- hopper_throw_in_basket.xml: MuJoCo model for the hopper throw-in-basket task (XML content omitted) -->
\ No newline at end of file
diff --git a/fancy_gym/envs/mujoco/hopper_throw/hopper_throw.py b/fancy_gym/envs/mujoco/hopper_throw/hopper_throw.py
new file mode 100644
index 0000000..e69cea6
--- /dev/null
+++ b/fancy_gym/envs/mujoco/hopper_throw/hopper_throw.py
@@ -0,0 +1,122 @@
+import os
+from typing import Optional
+
+import numpy as np
+from gym.envs.mujoco.hopper_v4 import HopperEnv
+
+MAX_EPISODE_STEPS_HOPPERTHROW = 250
+
+
+class HopperThrowEnv(HopperEnv):
+ """
+ Initialization changes to normal Hopper:
+ - healthy_reward: 1.0 -> 0.0 -> 0.1
+ - forward_reward_weight -> 5.0
+ - healthy_angle_range: (-0.2, 0.2) -> (-float('inf'), float('inf'))
+
+ Reward changes to normal Hopper:
+ - velocity: (x_position_after - x_position_before) -> self.get_body_com("ball")[0]
+ """
+
+ def __init__(self,
+ xml_file='hopper_throw.xml',
+ forward_reward_weight=5.0,
+ ctrl_cost_weight=1e-3,
+ healthy_reward=0.1,
+ terminate_when_unhealthy=True,
+ healthy_state_range=(-100.0, 100.0),
+ healthy_z_range=(0.7, float('inf')),
+ healthy_angle_range=(-float('inf'), float('inf')),
+ reset_noise_scale=5e-3,
+ context=True,
+ exclude_current_positions_from_observation=True,
+ max_episode_steps=250):
+ xml_file = os.path.join(os.path.dirname(__file__), "assets", xml_file)
+ self.current_step = 0
+ self.max_episode_steps = max_episode_steps
+ self.context = context
+ self.goal = 0
+ super().__init__(xml_file=xml_file,
+ forward_reward_weight=forward_reward_weight,
+ ctrl_cost_weight=ctrl_cost_weight,
+ healthy_reward=healthy_reward,
+ terminate_when_unhealthy=terminate_when_unhealthy,
+ healthy_state_range=healthy_state_range,
+ healthy_z_range=healthy_z_range,
+ healthy_angle_range=healthy_angle_range,
+ reset_noise_scale=reset_noise_scale,
+ exclude_current_positions_from_observation=exclude_current_positions_from_observation)
+
+ def step(self, action):
+ self.current_step += 1
+ self.do_simulation(action, self.frame_skip)
+ ball_pos_after = self.get_body_com("ball")[
+ 0] # abs(self.get_body_com("ball")[0]) # use x and y to get point and use euclid distance as reward?
+ ball_pos_after_y = self.get_body_com("ball")[2]
+
+ # done = self.done TODO We should use this, not sure why there is no other termination; ball_landed should be enough, because we only look at the throw itself? - Paul and Marc
+ ball_landed = bool(self.get_body_com("ball")[2] <= 0.05)
+ done = ball_landed
+
+ ctrl_cost = self.control_cost(action)
+ costs = ctrl_cost
+
+ rewards = 0
+
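+ # The throw is only evaluated once the ball has landed or the episode ends.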
+ if self.current_step >= self.max_episode_steps or done:
+ distance_reward = -np.linalg.norm(ball_pos_after - self.goal) if self.context else \
+ self._forward_reward_weight * ball_pos_after
+ healthy_reward = 0 if self.context else self.healthy_reward * self.current_step
+
+ rewards = distance_reward + healthy_reward
+
+ observation = self._get_obs()
+ reward = rewards - costs
+ info = {
+ 'ball_pos': ball_pos_after,
+ 'ball_pos_y': ball_pos_after_y,
+ '_steps': self.current_step,
+ 'goal': self.goal,
+ }
+
+ return observation, reward, done, info
+
+ def _get_obs(self):
+ return np.append(super()._get_obs(), self.goal)
+
+ def reset(self, *, seed: Optional[int] = None, return_info: bool = False, options: Optional[dict] = None):
+ self.current_step = 0
+ self.goal = self.np_random.uniform(2.0, 6.0, 1) # 0.5 8.0
+ return super().reset()
+
+ # overwrite reset_model to make it deterministic
+ def reset_model(self):
+ noise_low = -self._reset_noise_scale
+ noise_high = self._reset_noise_scale
+
+ qpos = self.init_qpos # + self.np_random.uniform(low=noise_low, high=noise_high, size=self.model.nq)
+ qvel = self.init_qvel # + self.np_random.uniform(low=noise_low, high=noise_high, size=self.model.nv)
+
+ self.set_state(qpos, qvel)
+
+ observation = self._get_obs()
+ return observation
+
+
+if __name__ == '__main__':
+ render_mode = "human" # "human" or "partial" or "final"
+ env = HopperThrowEnv()
+ obs = env.reset()
+
+ for i in range(2000):
+ # objective.load_result("/tmp/cma")
+ # test with random actions
+ ac = env.action_space.sample()
+ obs, rew, d, info = env.step(ac)
+ if i % 10 == 0:
+ env.render(mode=render_mode)
+ if d:
+ print('After ', i, ' steps, done: ', d)
+ env.reset()
+
+ env.close()
diff --git a/fancy_gym/envs/mujoco/hopper_throw/hopper_throw_in_basket.py b/fancy_gym/envs/mujoco/hopper_throw/hopper_throw_in_basket.py
new file mode 100644
index 0000000..76ef861
--- /dev/null
+++ b/fancy_gym/envs/mujoco/hopper_throw/hopper_throw_in_basket.py
@@ -0,0 +1,153 @@
+import os
+from typing import Optional
+
+import numpy as np
+from gym.envs.mujoco.hopper_v4 import HopperEnv
+
+MAX_EPISODE_STEPS_HOPPERTHROWINBASKET = 250
+
+
+class HopperThrowInBasketEnv(HopperEnv):
+ """
+ Initialization changes to normal Hopper:
+ - healthy_reward: 1.0 -> 0.0
+ - healthy_angle_range: (-0.2, 0.2) -> (-float('inf'), float('inf'))
+ - hit_basket_reward: - -> 10
+
+ Reward changes to normal Hopper:
+ - velocity: (x_position_after - x_position_before) -> (ball_position_after - ball_position_before)
+ """
+
+ def __init__(self,
+ xml_file='hopper_throw_in_basket.xml',
+ forward_reward_weight=1.0,
+ ctrl_cost_weight=1e-3,
+ healthy_reward=0.0,
+ hit_basket_reward=10,
+ basket_size=0.3,
+ terminate_when_unhealthy=True,
+ healthy_state_range=(-100.0, 100.0),
+ healthy_z_range=(0.7, float('inf')),
+ healthy_angle_range=(-float('inf'), float('inf')),
+ reset_noise_scale=5e-3,
+ context=True,
+ penalty=0.0,
+ exclude_current_positions_from_observation=True,
+ max_episode_steps=250):
+ self.hit_basket_reward = hit_basket_reward
+ self.current_step = 0
+ self.max_episode_steps = max_episode_steps
+ self.ball_in_basket = False
+ self.basket_size = basket_size
+ self.context = context
+ self.penalty = penalty
+ self.basket_x = 5
+ xml_file = os.path.join(os.path.dirname(__file__), "assets", xml_file)
+ super().__init__(xml_file=xml_file,
+ forward_reward_weight=forward_reward_weight,
+ ctrl_cost_weight=ctrl_cost_weight,
+ healthy_reward=healthy_reward,
+ terminate_when_unhealthy=terminate_when_unhealthy,
+ healthy_state_range=healthy_state_range,
+ healthy_z_range=healthy_z_range,
+ healthy_angle_range=healthy_angle_range,
+ reset_noise_scale=reset_noise_scale,
+ exclude_current_positions_from_observation=exclude_current_positions_from_observation)
+
+ def step(self, action):
+
+ self.current_step += 1
+ self.do_simulation(action, self.frame_skip)
+ ball_pos = self.get_body_com("ball")
+
+ basket_pos = self.get_body_com("basket_ground")
+ basket_center = (basket_pos[0] + 0.5, basket_pos[1], basket_pos[2])
+
+ is_in_basket_x = basket_pos[0] <= ball_pos[0] <= basket_pos[0] + self.basket_size
+ is_in_basket_y = basket_pos[1] - (self.basket_size / 2) <= ball_pos[1] <= basket_pos[1] + (self.basket_size / 2)
+ is_in_basket_z = ball_pos[2] < 0.1
+ is_in_basket = is_in_basket_x and is_in_basket_y and is_in_basket_z
+ if is_in_basket:
+ self.ball_in_basket = True
+
+ ball_landed = self.get_body_com("ball")[2] <= 0.05
+ done = bool(ball_landed or is_in_basket)
+
+ rewards = 0
+
+ ctrl_cost = self.control_cost(action)
+
+ costs = ctrl_cost
+
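+ # The reward is only computed once the ball lands, enters the basket, or the episode times out.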
+ if self.current_step >= self.max_episode_steps or done:
+
+ if is_in_basket:
+ if not self.context:
+ rewards += self.hit_basket_reward
+ else:
+ dist = np.linalg.norm(ball_pos - basket_center)
+ if self.context:
+ rewards = -10 * dist
+ else:
+ rewards -= (dist * dist)
+ else:
+ # penalty not needed
+ rewards += ((action[:2] > 0) * self.penalty).sum() if self.current_step < 10 else 0 # too much of a penalty?
+
+ observation = self._get_obs()
+ reward = rewards - costs
+ info = {
+ 'ball_pos': ball_pos[0],
+ }
+
+ return observation, reward, done, info
+
+ def _get_obs(self):
+ return np.append(super()._get_obs(), self.basket_x)
+
+ def reset(self, *, seed: Optional[int] = None, return_info: bool = False, options: Optional[dict] = None):
+ if self.max_episode_steps == 10:
+ # We have to initialize this here, because the spec is only added after creating the env.
+ self.max_episode_steps = self.spec.max_episode_steps
+
+ self.current_step = 0
+ self.ball_in_basket = False
+ if self.context:
+ self.basket_x = self.np_random.uniform(low=3, high=7, size=1)
+ self.model.body("basket_ground").pos[:] = [self.basket_x[0], 0, 0]
+ return super().reset()
+
+ # overwrite reset_model to make it deterministic
+ def reset_model(self):
+ # Todo remove if not needed!
+ noise_low = -self._reset_noise_scale
+ noise_high = self._reset_noise_scale
+
+ qpos = self.init_qpos # + self.np_random.uniform(low=noise_low, high=noise_high, size=self.model.nq)
+ qvel = self.init_qvel # + self.np_random.uniform(low=noise_low, high=noise_high, size=self.model.nv)
+
+ self.set_state(qpos, qvel)
+
+ observation = self._get_obs()
+ return observation
+
+
+if __name__ == '__main__':
+ render_mode = "human" # "human" or "partial" or "final"
+ env = HopperThrowInBasketEnv()
+ obs = env.reset()
+
+ for i in range(2000):
+ # objective.load_result("/tmp/cma")
+ # test with random actions
+ ac = env.action_space.sample()
+ obs, rew, d, info = env.step(ac)
+ if i % 10 == 0:
+ env.render(mode=render_mode)
+ if d:
+ print('After ', i, ' steps, done: ', d)
+ env.reset()
+
+ env.close()
diff --git a/fancy_gym/envs/mujoco/hopper_throw/mp_wrapper.py b/fancy_gym/envs/mujoco/hopper_throw/mp_wrapper.py
new file mode 100644
index 0000000..cad680a
--- /dev/null
+++ b/fancy_gym/envs/mujoco/hopper_throw/mp_wrapper.py
@@ -0,0 +1,23 @@
+from typing import Tuple, Union
+
+import numpy as np
+
+from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
+
+
+class MPWrapper(RawInterfaceWrapper):
+
+ @property
+ def context_mask(self):
+ return np.hstack([
+ [False] * 17,
+ [True] # goal pos
+ ])
+
+ @property
+ def current_pos(self) -> Union[float, int, np.ndarray]:
+ return self.data.qpos[3:6].copy()
+
+ @property
+ def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
+ return self.data.qvel[3:6].copy()
diff --git a/fancy_gym/envs/mujoco/reacher/__init__.py b/fancy_gym/envs/mujoco/reacher/__init__.py
new file mode 100644
index 0000000..c5e6d2f
--- /dev/null
+++ b/fancy_gym/envs/mujoco/reacher/__init__.py
@@ -0,0 +1 @@
+from .mp_wrapper import MPWrapper
diff --git a/fancy_gym/envs/mujoco/reacher/assets/reacher_5links.xml b/fancy_gym/envs/mujoco/reacher/assets/reacher_5links.xml
new file mode 100644
index 0000000..742f45c
--- /dev/null
+++ b/fancy_gym/envs/mujoco/reacher/assets/reacher_5links.xml
@@ -0,0 +1,57 @@
+<!-- reacher_5links.xml: MuJoCo model for the 5-link reacher (XML content omitted) -->
\ No newline at end of file
diff --git a/fancy_gym/envs/mujoco/reacher/assets/reacher_7links.xml b/fancy_gym/envs/mujoco/reacher/assets/reacher_7links.xml
new file mode 100644
index 0000000..6da5461
--- /dev/null
+++ b/fancy_gym/envs/mujoco/reacher/assets/reacher_7links.xml
@@ -0,0 +1,75 @@
+<!-- reacher_7links.xml: MuJoCo model for the 7-link reacher (XML content omitted) -->
\ No newline at end of file
diff --git a/fancy_gym/envs/mujoco/reacher/mp_wrapper.py b/fancy_gym/envs/mujoco/reacher/mp_wrapper.py
new file mode 100644
index 0000000..0464640
--- /dev/null
+++ b/fancy_gym/envs/mujoco/reacher/mp_wrapper.py
@@ -0,0 +1,26 @@
+from typing import Union, Tuple
+
+import numpy as np
+
+from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
+
+
+class MPWrapper(RawInterfaceWrapper):
+
+ @property
+ def context_mask(self):
+ return np.concatenate([[False] * self.n_links, # cos
+ [False] * self.n_links, # sin
+ [True] * 2, # goal position
+ [False] * self.n_links, # angular velocity
+ [False] * 3, # goal distance
+ # [False], # step
+ ])
+
+ @property
+ def current_pos(self) -> Union[float, int, np.ndarray, Tuple]:
+ return self.data.qpos.flat[:self.n_links]
+
+ @property
+ def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
+ return self.data.qvel.flat[:self.n_links]
diff --git a/fancy_gym/envs/mujoco/reacher/reacher.py b/fancy_gym/envs/mujoco/reacher/reacher.py
new file mode 100644
index 0000000..ccd0073
--- /dev/null
+++ b/fancy_gym/envs/mujoco/reacher/reacher.py
@@ -0,0 +1,110 @@
+import os
+
+import numpy as np
+from gym import utils
+from gym.envs.mujoco import MujocoEnv
+
+MAX_EPISODE_STEPS_REACHER = 200
+
+
+class ReacherEnv(MujocoEnv, utils.EzPickle):
+ """
+ More general version of the gym mujoco Reacher environment
+ """
+
+ def __init__(self, sparse: bool = False, n_links: int = 5, reward_weight: float = 1, ctrl_cost_weight: float = 1):
+ utils.EzPickle.__init__(**locals())
+
+ self._steps = 0
+ self.n_links = n_links
+
+ self.sparse = sparse
+
+ self._ctrl_cost_weight = ctrl_cost_weight
+ self._reward_weight = reward_weight
+
+ file_name = f'reacher_{n_links}links.xml'
+
+ MujocoEnv.__init__(self,
+ model_path=os.path.join(os.path.dirname(__file__), "assets", file_name),
+ frame_skip=2,
+ mujoco_bindings="mujoco")
+
+ def step(self, action):
+ self._steps += 1
+
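+ # In the sparse setting, the distance and velocity rewards are only given at the last step of the episode.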
+ is_reward = not self.sparse or (self.sparse and self._steps == MAX_EPISODE_STEPS_REACHER)
+
+ reward_dist = 0.0
+ angular_vel = 0.0
+ if is_reward:
+ reward_dist = self.distance_reward()
+ angular_vel = self.velocity_reward()
+
+ reward_ctrl = -self._ctrl_cost_weight * np.square(action).sum()
+
+ reward = reward_dist + reward_ctrl + angular_vel
+ self.do_simulation(action, self.frame_skip)
+ ob = self._get_obs()
+ done = False
+
+ infos = dict(
+ reward_dist=reward_dist,
+ reward_ctrl=reward_ctrl,
+ velocity=angular_vel,
+ end_effector=self.get_body_com("fingertip").copy(),
+ goal=self.goal if hasattr(self, "goal") else None
+ )
+
+ return ob, reward, done, infos
+
+ def distance_reward(self):
+ vec = self.get_body_com("fingertip") - self.get_body_com("target")
+ return -self._reward_weight * np.linalg.norm(vec)
+
+ def velocity_reward(self):
+ return -10 * np.square(self.data.qvel.flat[:self.n_links]).sum() if self.sparse else 0.0
+
+ def viewer_setup(self):
+ self.viewer.cam.trackbodyid = 0
+
+ def reset_model(self):
+ qpos = (
+ # self.np_random.uniform(low=-0.1, high=0.1, size=self.model.nq) +
+ self.init_qpos.copy()
+ )
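+ # Rejection sampling: draw goals until they lie within the reachable radius of n_links / 10.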
+ while True:
+ # full space
+ self.goal = self.np_random.uniform(low=-self.n_links / 10, high=self.n_links / 10, size=2)
+ # I Quadrant
+ # self.goal = self.np_random.uniform(low=0, high=self.n_links / 10, size=2)
+ # II Quadrant
+ # self.goal = np.random.uniform(low=[-self.n_links / 10, 0], high=[0, self.n_links / 10], size=2)
+ # II + III Quadrant
+ # self.goal = np.random.uniform(low=-self.n_links / 10, high=[0, self.n_links / 10], size=2)
+ # I + II Quadrant
+ # self.goal = np.random.uniform(low=[-self.n_links / 10, 0], high=self.n_links, size=2)
+ if np.linalg.norm(self.goal) < self.n_links / 10:
+ break
+
+ qpos[-2:] = self.goal
+ qvel = (
+ # self.np_random.uniform(low=-0.005, high=0.005, size=self.model.nv) +
+ self.init_qvel.copy()
+ )
+ qvel[-2:] = 0
+ self.set_state(qpos, qvel)
+ self._steps = 0
+
+ return self._get_obs()
+
+ def _get_obs(self):
+ theta = self.data.qpos.flat[:self.n_links]
+ target = self.get_body_com("target")
+ return np.concatenate([
+ np.cos(theta),
+ np.sin(theta),
+ target[:2], # x-y of goal position
+ self.data.qvel.flat[:self.n_links], # angular velocity
+ self.get_body_com("fingertip") - target, # goal distance
+ ])
diff --git a/fancy_gym/envs/mujoco/walker_2d_jump/__init__.py b/fancy_gym/envs/mujoco/walker_2d_jump/__init__.py
new file mode 100644
index 0000000..c5e6d2f
--- /dev/null
+++ b/fancy_gym/envs/mujoco/walker_2d_jump/__init__.py
@@ -0,0 +1 @@
+from .mp_wrapper import MPWrapper
diff --git a/fancy_gym/envs/mujoco/walker_2d_jump/assets/walker2d.xml b/fancy_gym/envs/mujoco/walker_2d_jump/assets/walker2d.xml
new file mode 100644
index 0000000..f3bcbd1
--- /dev/null
+++ b/fancy_gym/envs/mujoco/walker_2d_jump/assets/walker2d.xml
@@ -0,0 +1,64 @@
+<!-- walker2d.xml: MuJoCo model for the walker 2d jump task (XML content omitted) -->
diff --git a/fancy_gym/envs/mujoco/walker_2d_jump/mp_wrapper.py b/fancy_gym/envs/mujoco/walker_2d_jump/mp_wrapper.py
new file mode 100644
index 0000000..d55e9d2
--- /dev/null
+++ b/fancy_gym/envs/mujoco/walker_2d_jump/mp_wrapper.py
@@ -0,0 +1,23 @@
+from typing import Tuple, Union
+
+import numpy as np
+
+from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
+
+
+class MPWrapper(RawInterfaceWrapper):
+
+ @property
+ def context_mask(self):
+ return np.hstack([
+ [False] * 17,
+ [True] # goal pos
+ ])
+
+ @property
+ def current_pos(self) -> Union[float, int, np.ndarray]:
+ return self.data.qpos[3:9].copy()
+
+ @property
+ def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
+ return self.data.qvel[3:9].copy()
diff --git a/fancy_gym/envs/mujoco/walker_2d_jump/walker_2d_jump.py b/fancy_gym/envs/mujoco/walker_2d_jump/walker_2d_jump.py
new file mode 100644
index 0000000..ed663d2
--- /dev/null
+++ b/fancy_gym/envs/mujoco/walker_2d_jump/walker_2d_jump.py
@@ -0,0 +1,117 @@
+import os
+from typing import Optional
+
+import numpy as np
+from gym.envs.mujoco.walker2d_v4 import Walker2dEnv
+
+MAX_EPISODE_STEPS_WALKERJUMP = 300
+
+
+# TODO: Right now this environment only considers jumping to a specific height, which is not nice. It should be
+#  extended to the same structure as the Hopper, where the angles are randomized (-> contexts) and the agent should
+#  jump as high as possible, while landing at a specific target position.
+
+
+class Walker2dJumpEnv(Walker2dEnv):
+ """
+ Initialization changes to normal Walker2d:
+ - healthy_reward: 1.0 -> 0.005 -> 0.0025 (not from Alex)
+ - penalty: 10 -> 0 (not from Alex)
+ """
+
+ def __init__(self,
+ xml_file='walker2d.xml',
+ forward_reward_weight=1.0,
+ ctrl_cost_weight=1e-3,
+ healthy_reward=0.0025,
+ terminate_when_unhealthy=True,
+ healthy_z_range=(0.8, 2.0),
+ healthy_angle_range=(-1.0, 1.0),
+ reset_noise_scale=5e-3,
+ penalty=0,
+ exclude_current_positions_from_observation=True,
+ max_episode_steps=300):
+ self.current_step = 0
+ self.max_episode_steps = max_episode_steps
+ self.max_height = 0
+ self._penalty = penalty
+ self.goal = 0
+ xml_file = os.path.join(os.path.dirname(__file__), "assets", xml_file)
+ super().__init__(xml_file=xml_file,
+ forward_reward_weight=forward_reward_weight,
+ ctrl_cost_weight=ctrl_cost_weight,
+ healthy_reward=healthy_reward,
+ terminate_when_unhealthy=terminate_when_unhealthy,
+ healthy_z_range=healthy_z_range,
+ healthy_angle_range=healthy_angle_range,
+ reset_noise_scale=reset_noise_scale,
+ exclude_current_positions_from_observation=exclude_current_positions_from_observation)
+
+ def step(self, action):
+ self.current_step += 1
+ self.do_simulation(action, self.frame_skip)
+ # pos_after = self.get_body_com("torso")[0]
+ height = self.get_body_com("torso")[2]
+
+ self.max_height = max(height, self.max_height)
+
+ done = bool(height < 0.2)
+
+ ctrl_cost = self.control_cost(action)
+ costs = ctrl_cost
+ rewards = 0
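+ # The jump is only evaluated at the end of the episode: negative distance of the maximum torso height
+ # to the goal height plus an alive bonus.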
+ if self.current_step >= self.max_episode_steps or done:
+ done = True
+ height_goal_distance = -10 * (np.linalg.norm(self.max_height - self.goal))
+ healthy_reward = self.healthy_reward * self.current_step
+
+ rewards = height_goal_distance + healthy_reward
+
+ observation = self._get_obs()
+ reward = rewards - costs
+ info = {
+ 'height': height,
+ 'max_height': self.max_height,
+ 'goal': self.goal,
+ }
+
+ return observation, reward, done, info
+
+ def _get_obs(self):
+ return np.append(super()._get_obs(), self.goal)
+
+ def reset(self, *, seed: Optional[int] = None, return_info: bool = False, options: Optional[dict] = None):
+ self.current_step = 0
+ self.max_height = 0
+ self.goal = self.np_random.uniform(1.5, 2.5, 1) # 1.5 3.0
+ return super().reset()
+
+ # overwrite reset_model to make it deterministic
+ def reset_model(self):
+ noise_low = -self._reset_noise_scale
+ noise_high = self._reset_noise_scale
+
+ qpos = self.init_qpos # + self.np_random.uniform(low=noise_low, high=noise_high, size=self.model.nq)
+ qvel = self.init_qvel # + self.np_random.uniform(low=noise_low, high=noise_high, size=self.model.nv)
+
+ self.set_state(qpos, qvel)
+
+ observation = self._get_obs()
+ return observation
+
+
+if __name__ == '__main__':
+ render_mode = "human" # "human" or "partial" or "final"
+ env = Walker2dJumpEnv()
+ obs = env.reset()
+
+ for i in range(6000):
+ # test with random actions
+ ac = env.action_space.sample()
+ obs, rew, d, info = env.step(ac)
+ if i % 10 == 0:
+ env.render(mode=render_mode)
+ if d:
+ print('After ', i, ' steps, done: ', d)
+ env.reset()
+
+ env.close()
diff --git a/fancy_gym/examples/__init__.py b/fancy_gym/examples/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/fancy_gym/examples/examples_dmc.py b/fancy_gym/examples/examples_dmc.py
new file mode 100644
index 0000000..75648b7
--- /dev/null
+++ b/fancy_gym/examples/examples_dmc.py
@@ -0,0 +1,137 @@
+import fancy_gym
+
+
+def example_dmc(env_id="dmc:fish-swim", seed=1, iterations=1000, render=True):
+ """
+ Example for running a DMC based env in the step based setting.
+ The env_id has to be specified as `dmc:domain_name-task_name` or
+ for manipulation tasks as `dmc:manipulation-environment_name`.
+
+ Args:
+ env_id: Either `dmc:domain_name-task_name` or `dmc:manipulation-environment_name`
+ seed: seed for deterministic behaviour
+ iterations: Number of rollout steps to run
+ render: Render the episode
+
+ Returns:
+
+ """
+ env = fancy_gym.make(env_id, seed)
+ rewards = 0
+ obs = env.reset()
+ print("observation shape:", env.observation_space.shape)
+ print("action shape:", env.action_space.shape)
+
+ for i in range(iterations):
+ ac = env.action_space.sample()
+ if render:
+ env.render(mode="human")
+ obs, reward, done, info = env.step(ac)
+ rewards += reward
+
+ if done:
+ print(env_id, rewards)
+ rewards = 0
+ obs = env.reset()
+
+ env.close()
+ del env
+
+
+def example_custom_dmc_and_mp(seed=1, iterations=1, render=True):
+ """
+ Example for running a custom movement primitive based environment.
+ Our already registered environments follow the same structure.
+ Hence, this also allows to adjust hyperparameters of the movement primitives.
+ Yet, we recommend the method above if you are just interested in changing those parameters for existing tasks.
+ We appreciate PRs for custom environments (especially MP wrappers of existing tasks)
+ for our repo: https://github.com/ALRhub/fancy_gym/
+ Args:
+ seed: seed for deterministic behaviour
+ iterations: Number of rollout steps to run
+ render: Render the episode
+
+ Returns:
+
+ """
+
+ # Base DMC name, according to structure of above example
+ base_env_id = "dmc:ball_in_cup-catch"
+
+ # Replace this wrapper with the custom wrapper for your environment by inheriting from the RawInterfaceWrapper.
+ # You can also add other gym.Wrappers in case they are needed.
+ wrappers = [fancy_gym.dmc.suite.ball_in_cup.MPWrapper]
+ # # For a ProMP
+ trajectory_generator_kwargs = {'trajectory_generator_type': 'promp'}
+ phase_generator_kwargs = {'phase_generator_type': 'linear'}
+ controller_kwargs = {'controller_type': 'motor',
+ "p_gains": 1.0,
+ "d_gains": 0.1,}
+ basis_generator_kwargs = {'basis_generator_type': 'zero_rbf',
+ 'num_basis': 5,
+ 'num_basis_zero_start': 1
+ }
+
+ # For a DMP
+ # trajectory_generator_kwargs = {'trajectory_generator_type': 'dmp'}
+ # phase_generator_kwargs = {'phase_generator_type': 'exp',
+ # 'alpha_phase': 2}
+ # controller_kwargs = {'controller_type': 'motor',
+ # "p_gains": 1.0,
+ # "d_gains": 0.1,
+ # }
+ # basis_generator_kwargs = {'basis_generator_type': 'rbf',
+ # 'num_basis': 5
+ # }
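+ # make_bb wraps the base env with the given wrappers and combines trajectory generator, phase/basis
+ # generators and controller into an episode-based (black-box) env.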
+ env = fancy_gym.make_bb(env_id=base_env_id, wrappers=wrappers, black_box_kwargs={},
+ traj_gen_kwargs=trajectory_generator_kwargs, controller_kwargs=controller_kwargs,
+ phase_kwargs=phase_generator_kwargs, basis_kwargs=basis_generator_kwargs,
+ seed=seed)
+
+ # This renders the full MP trajectory
+ # It is only required to call render() once in the beginning, which renders every consecutive trajectory.
+ # Resetting to no rendering can be achieved by render(mode=None).
+ # It is also possible to change the mode multiple times,
+ # e.g. when only every nth trajectory should be displayed.
+ if render:
+ env.render(mode="human")
+
+ rewards = 0
+ obs = env.reset()
+
+ # number of samples/full trajectories (multiple environment steps)
+ for i in range(iterations):
+ ac = env.action_space.sample()
+ obs, reward, done, info = env.step(ac)
+ rewards += reward
+
+ if done:
+ print(base_env_id, rewards)
+ rewards = 0
+ obs = env.reset()
+
+ env.close()
+ del env
+
+
+if __name__ == '__main__':
+ # Disclaimer: DMC environments require the seed to be specified in the beginning.
+ # Adjusting it afterwards with env.seed() is not recommended as it does not affect the underlying physics.
+
+ # For rendering DMC
+ # export MUJOCO_GL="osmesa"
+ render = True
+
+ # # Standard DMC Suite tasks
+ example_dmc("dmc:fish-swim", seed=10, iterations=1000, render=render)
+ #
+ # # Manipulation tasks
+ # # Disclaimer: The vision versions are currently not integrated and yield an error
+ example_dmc("dmc:manipulation-reach_site_features", seed=10, iterations=250, render=render)
+ #
+ # # Gym + DMC hybrid task provided in the MP framework
+ example_dmc("dmc_ball_in_cup-catch_promp-v0", seed=10, iterations=1, render=render)
+
+ # Custom DMC task # Different seed, because the episode is longer for this example and the name+seed combo is
+ # already registered above
+ example_custom_dmc_and_mp(seed=11, iterations=1, render=render)
diff --git a/fancy_gym/examples/examples_general.py b/fancy_gym/examples/examples_general.py
new file mode 100644
index 0000000..1a89e30
--- /dev/null
+++ b/fancy_gym/examples/examples_general.py
@@ -0,0 +1,100 @@
+from collections import defaultdict
+
+import gym
+import numpy as np
+
+import fancy_gym
+
+
+def example_general(env_id="Pendulum-v1", seed=1, iterations=1000, render=True):
+ """
+ Example for running any env in the step based setting.
+ This also includes DMC environments when leveraging our custom make_env function.
+
+ Args:
+ env_id: OpenAI/Custom gym task id or either `domain_name-task_name` or `manipulation-environment_name` for DMC tasks
+ seed: seed for deterministic behaviour
+ iterations: Number of rollout steps to run
+ render: Render the episode
+
+ Returns:
+
+ """
+
+ env = fancy_gym.make(env_id, seed)
+ rewards = 0
+ obs = env.reset()
+ print("Observation shape: ", env.observation_space.shape)
+ print("Action shape: ", env.action_space.shape)
+
+ # number of environment steps
+ for i in range(iterations):
+ obs, reward, done, info = env.step(env.action_space.sample())
+ rewards += reward
+
+ if render:
+ env.render()
+
+ if done:
+ print(rewards)
+ rewards = 0
+ obs = env.reset()
+
+
+def example_async(env_id="HoleReacher-v0", n_cpu=4, seed=int('533D', 16), n_samples=800):
+ """
+ Example for running any env in a vectorized multiprocessing setting to generate more samples faster.
+ This also includes DMC and DMP environments when leveraging our custom make_env function.
+ Be aware, increasing the number of environments reduces the total length of the individual episodes.
+
+ Args:
+ env_id: OpenAI/Custom gym task id or either `domain_name-task_name` or `manipulation-environment_name` for DMC tasks
+ seed: seed for deterministic behaviour
+ n_cpu: Number of cpus cores to use in parallel
+ n_samples: number of samples generated in total by all environments.
+
+ Returns: Tuple of (obs, reward, done, info) with type np.ndarray
+
+ """
+ env = gym.vector.AsyncVectorEnv([fancy_gym.make_rank(env_id, seed, i) for i in range(n_cpu)])
+ # OR
+ # envs = gym.vector.AsyncVectorEnv([make_env(env_id, seed + i) for i in range(n_cpu)])
+
+ # for plotting
+ rewards = np.zeros(n_cpu)
+ buffer = defaultdict(list)
+
+ obs = env.reset()
+
+ # this would generate more samples than requested if n_samples % num_envs != 0
+ repeat = int(np.ceil(n_samples / env.num_envs))
+ for i in range(repeat):
+ obs, reward, done, info = env.step(env.action_space.sample())
+ buffer['obs'].append(obs)
+ buffer['reward'].append(reward)
+ buffer['done'].append(done)
+ buffer['info'].append(info)
+ rewards += reward
+ if np.any(done):
+ print(f"Reward at iteration {i}: {rewards[done]}")
+ rewards[done] = 0
+
+ # trim the buffers so that at most n_samples transitions are returned
+ return *map(lambda v: np.stack(v)[:n_samples], buffer.values()),
+
+
+if __name__ == '__main__':
+ render = True
+
+ # Basic gym task
+ example_general("Pendulum-v1", seed=10, iterations=200, render=render)
+
+ # Mujoco task from framework
+ example_general("Reacher5d-v0", seed=10, iterations=200, render=render)
+
+ # # OpenAI Mujoco task
+ example_general("HalfCheetah-v2", seed=10, render=render)
+
+ # Vectorized multiprocessing environments
+ # example_async(env_id="HoleReacher-v0", n_cpu=2, seed=int('533D', 16), n_samples=2 * 200)
+
diff --git a/fancy_gym/examples/examples_metaworld.py b/fancy_gym/examples/examples_metaworld.py
new file mode 100644
index 0000000..0fa7066
--- /dev/null
+++ b/fancy_gym/examples/examples_metaworld.py
@@ -0,0 +1,134 @@
+import fancy_gym
+
+
+def example_dmc(env_id="fish-swim", seed=1, iterations=1000, render=True):
+ """
+ Example for running a MetaWorld based env in the step based setting.
+    The env_id has to be specified as `metaworld:task_name-v2`. V1 versions are not supported and we always
+ return the observable goal version.
+ All tasks can be found here: https://arxiv.org/pdf/1910.10897.pdf or https://meta-world.github.io/
+
+ Args:
+        env_id: `metaworld:task_name-v2`
+ seed: seed for deterministic behaviour (TODO: currently not working due to an issue in MetaWorld code)
+ iterations: Number of rollout steps to run
+ render: Render the episode
+
+ Returns:
+
+ """
+ env = fancy_gym.make(env_id, seed)
+ rewards = 0
+ obs = env.reset()
+ print("observation shape:", env.observation_space.shape)
+ print("action shape:", env.action_space.shape)
+
+ for i in range(iterations):
+ ac = env.action_space.sample()
+ if render:
+ # THIS NEEDS TO BE SET TO FALSE FOR NOW, BECAUSE THE INTERFACE FOR RENDERING IS DIFFERENT TO BASIC GYM
+ # TODO: Remove this, when Metaworld fixes its interface.
+ env.render(False)
+ obs, reward, done, info = env.step(ac)
+ rewards += reward
+ if done:
+ print(env_id, rewards)
+ rewards = 0
+ obs = env.reset()
+
+ env.close()
+ del env
+
+
+def example_custom_dmc_and_mp(seed=1, iterations=1, render=True):
+ """
+    Example for running a custom movement primitive based environment.
+    Our already registered environments follow the same structure.
+    Hence, this also allows adjusting hyperparameters of the movement primitives.
+    Yet, we recommend the method above if you are just interested in changing those parameters for existing tasks.
+ We appreciate PRs for custom environments (especially MP wrappers of existing tasks)
+ for our repo: https://github.com/ALRhub/fancy_gym/
+ Args:
+ seed: seed for deterministic behaviour (TODO: currently not working due to an issue in MetaWorld code)
+ iterations: Number of rollout steps to run
+ render: Render the episode (TODO: currently not working due to an issue in MetaWorld code)
+
+ Returns:
+
+ """
+
+ # Base MetaWorld name, according to structure of above example
+ base_env_id = "metaworld:button-press-v2"
+
+ # Replace this wrapper with the custom wrapper for your environment by inheriting from the RawInterfaceWrapper.
+ # You can also add other gym.Wrappers in case they are needed.
+ wrappers = [fancy_gym.meta.goal_object_change_mp_wrapper.MPWrapper]
+ # # For a ProMP
+ # trajectory_generator_kwargs = {'trajectory_generator_type': 'promp'}
+ # phase_generator_kwargs = {'phase_generator_type': 'linear'}
+ # controller_kwargs = {'controller_type': 'metaworld'}
+ # basis_generator_kwargs = {'basis_generator_type': 'zero_rbf',
+ # 'num_basis': 5,
+ # 'num_basis_zero_start': 1
+ # }
+
+ # For a DMP
+ trajectory_generator_kwargs = {'trajectory_generator_type': 'dmp'}
+ phase_generator_kwargs = {'phase_generator_type': 'exp',
+ 'alpha_phase': 2}
+ controller_kwargs = {'controller_type': 'metaworld'}
+ basis_generator_kwargs = {'basis_generator_type': 'rbf',
+ 'num_basis': 5
+ }
+ env = fancy_gym.make_bb(env_id=base_env_id, wrappers=wrappers, black_box_kwargs={},
+ traj_gen_kwargs=trajectory_generator_kwargs, controller_kwargs=controller_kwargs,
+ phase_kwargs=phase_generator_kwargs, basis_kwargs=basis_generator_kwargs,
+ seed=seed)
+
+ # This renders the full MP trajectory
+ # It is only required to call render() once in the beginning, which renders every consecutive trajectory.
+    # Resetting to no rendering can be achieved by render(mode=None).
+    # It is also possible to change the mode multiple times when,
+    # e.g., only every nth trajectory should be displayed.
+ if render:
+ raise ValueError("Metaworld render interface bug does not allow to render() fixes its interface. "
+ "A temporary workaround is to alter their code in MujocoEnv render() from "
+ "`if not offscreen` to `if not offscreen or offscreen == 'human'`.")
+ # TODO: Remove this, when Metaworld fixes its interface.
+ # env.render(mode="human")
+
+ rewards = 0
+ obs = env.reset()
+
+ # number of samples/full trajectories (multiple environment steps)
+ for i in range(iterations):
+ ac = env.action_space.sample()
+ obs, reward, done, info = env.step(ac)
+ rewards += reward
+
+ if done:
+ print(base_env_id, rewards)
+ rewards = 0
+ obs = env.reset()
+
+ env.close()
+ del env
+
+
+if __name__ == '__main__':
+ # Disclaimer: MetaWorld environments require the seed to be specified in the beginning.
+ # Adjusting it afterwards with env.seed() is not recommended as it may not affect the underlying behavior.
+
+ # For rendering it might be necessary to specify your OpenGL installation
+ # export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libGLEW.so
+ render = False
+
+    # # Standard MetaWorld tasks
+ example_dmc("metaworld:button-press-v2", seed=10, iterations=500, render=render)
+
+    # # MP + MetaWorld hybrid task provided in our framework
+ example_dmc("ButtonPressProMP-v2", seed=10, iterations=1, render=render)
+ #
+ # # Custom MetaWorld task
+ example_custom_dmc_and_mp(seed=10, iterations=1, render=render)
+
diff --git a/fancy_gym/examples/examples_movement_primitives.py b/fancy_gym/examples/examples_movement_primitives.py
new file mode 100644
index 0000000..3794112
--- /dev/null
+++ b/fancy_gym/examples/examples_movement_primitives.py
@@ -0,0 +1,169 @@
+import fancy_gym
+
+
+def example_mp(env_name="HoleReacherProMP-v0", seed=1, iterations=1, render=True):
+ """
+ Example for running a black box based environment, which is already registered
+ Args:
+ env_name: Black box env_id
+ seed: seed for deterministic behaviour
+ iterations: Number of rollout steps to run
+ render: Render the episode
+
+ Returns:
+
+ """
+ # Equivalent to gym, we have a make function which can be used to create environments.
+ # It takes care of seeding and enables the use of a variety of external environments using the gym interface.
+ env = fancy_gym.make(env_name, seed)
+
+ returns = 0
+ # env.render(mode=None)
+ obs = env.reset()
+
+ # number of samples/full trajectories (multiple environment steps)
+ for i in range(iterations):
+
+ if render and i % 2 == 0:
+ # This renders the full MP trajectory
+ # It is only required to call render() once in the beginning, which renders every consecutive trajectory.
+            # Resetting to no rendering can be achieved by render(mode=None).
+            # It is also possible to change the mode multiple times when,
+            # e.g., only every second trajectory should be displayed, such as here.
+ # Just make sure the correct mode is set before executing the step.
+ env.render(mode="human")
+ else:
+ env.render(mode=None)
+
+ # Now the action space is not the raw action but the parametrization of the trajectory generator,
+ # such as a ProMP
+ ac = env.action_space.sample()
+ # This executes a full trajectory and gives back the context (obs) of the last step in the trajectory, or the
+ # full observation space of the last step, if replanning/sub-trajectory learning is used. The 'reward' is equal
+ # to the return of a trajectory. Default is the sum over the step-wise rewards.
+ obs, reward, done, info = env.step(ac)
+ # Aggregated returns
+ returns += reward
+
+ if done:
+ print(reward)
+ obs = env.reset()
+
+
+def example_custom_mp(env_name="Reacher5dProMP-v0", seed=1, iterations=1, render=True):
+ """
+ Example for running a movement primitive based environment, which is already registered
+ Args:
+        env_name: Movement primitive env_id
+ seed: seed for deterministic behaviour
+ iterations: Number of rollout steps to run
+ render: Render the episode
+
+ Returns:
+
+ """
+ # Changing the arguments of the black box env is possible by providing them to gym as with all kwargs.
+    # E.g. here for way too many basis functions
+ env = fancy_gym.make(env_name, seed, basis_generator_kwargs={'num_basis': 1000})
+ # env = fancy_gym.make(env_name, seed)
+ # mp_dict.update({'black_box_kwargs': {'learn_sub_trajectories': True}})
+ # mp_dict.update({'black_box_kwargs': {'do_replanning': lambda pos, vel, t: lambda t: t % 100}})
+
+ returns = 0
+ obs = env.reset()
+
+ # This time rendering every trajectory
+ if render:
+ env.render(mode="human")
+
+ # number of samples/full trajectories (multiple environment steps)
+ for i in range(iterations):
+ ac = env.action_space.sample()
+ obs, reward, done, info = env.step(ac)
+ returns += reward
+
+ if done:
+ print(i, reward)
+ obs = env.reset()
+
+ return obs
+
+
+def example_fully_custom_mp(seed=1, iterations=1, render=True):
+ """
+    Example for running a custom movement primitive based environment.
+    Our already registered environments follow the same structure.
+    Hence, this also allows adjusting hyperparameters of the movement primitives.
+ Yet, we recommend the method above if you are just interested in changing those parameters for existing tasks.
+ We appreciate PRs for custom environments (especially MP wrappers of existing tasks)
+ for our repo: https://github.com/ALRhub/fancy_gym/
+ Args:
+ seed: seed
+ iterations: Number of rollout steps to run
+ render: Render the episode
+
+ Returns:
+
+ """
+
+ base_env_id = "Reacher5d-v0"
+
+ # Replace this wrapper with the custom wrapper for your environment by inheriting from the RawInterfaceWrapper.
+ # You can also add other gym.Wrappers in case they are needed.
+ wrappers = [fancy_gym.envs.mujoco.reacher.MPWrapper]
+
+ # For a ProMP
+ trajectory_generator_kwargs = {'trajectory_generator_type': 'promp',
+ 'weight_scale': 2}
+ phase_generator_kwargs = {'phase_generator_type': 'linear'}
+ controller_kwargs = {'controller_type': 'velocity'}
+ basis_generator_kwargs = {'basis_generator_type': 'zero_rbf',
+ 'num_basis': 5,
+ 'num_basis_zero_start': 1
+ }
+
+ # # For a DMP
+ # trajectory_generator_kwargs = {'trajectory_generator_type': 'dmp',
+ # 'weight_scale': 500}
+ # phase_generator_kwargs = {'phase_generator_type': 'exp',
+ # 'alpha_phase': 2.5}
+ # controller_kwargs = {'controller_type': 'velocity'}
+ # basis_generator_kwargs = {'basis_generator_type': 'rbf',
+ # 'num_basis': 5
+ # }
+ env = fancy_gym.make_bb(env_id=base_env_id, wrappers=wrappers, black_box_kwargs={},
+ traj_gen_kwargs=trajectory_generator_kwargs, controller_kwargs=controller_kwargs,
+ phase_kwargs=phase_generator_kwargs, basis_kwargs=basis_generator_kwargs,
+ seed=seed)
+
+ if render:
+ env.render(mode="human")
+
+ rewards = 0
+ obs = env.reset()
+
+ # number of samples/full trajectories (multiple environment steps)
+ for i in range(iterations):
+ ac = env.action_space.sample()
+ obs, reward, done, info = env.step(ac)
+ rewards += reward
+
+ if done:
+ print(rewards)
+ rewards = 0
+ obs = env.reset()
+
+
+if __name__ == '__main__':
+ render = False
+ # DMP
+ # example_mp("HoleReacherDMP-v0", seed=10, iterations=5, render=render)
+ #
+ # # ProMP
+ # example_mp("HoleReacherProMP-v0", seed=10, iterations=5, render=render)
+
+ # Altered basis functions
+ # obs1 = example_custom_mp("Reacher5dProMP-v0", seed=10, iterations=1, render=render)
+
+ # Custom MP
+ example_fully_custom_mp(seed=10, iterations=1, render=render)
diff --git a/fancy_gym/examples/examples_open_ai.py b/fancy_gym/examples/examples_open_ai.py
new file mode 100644
index 0000000..a4a162d
--- /dev/null
+++ b/fancy_gym/examples/examples_open_ai.py
@@ -0,0 +1,37 @@
+import fancy_gym
+
+
+def example_mp(env_name, seed=1, render=True):
+ """
+    Example for running a movement primitive based version of an OpenAI-gym environment, which is already registered.
+ For more information on movement primitive specific stuff, look at the traj_gen examples.
+ Args:
+ env_name: ProMP env_id
+ seed: seed
+ render: boolean
+ Returns:
+
+ """
+    # While in this case gym.make() could be used as well, we recommend our custom make function.
+ env = fancy_gym.make(env_name, seed)
+
+ returns = 0
+ obs = env.reset()
+ # number of samples/full trajectories (multiple environment steps)
+ for i in range(10):
+ if render and i % 2 == 0:
+ env.render(mode="human")
+ else:
+ env.render(mode=None)
+ ac = env.action_space.sample()
+ obs, reward, done, info = env.step(ac)
+ returns += reward
+
+ if done:
+ print(returns)
+ obs = env.reset()
+
+
+if __name__ == '__main__':
+ example_mp("ReacherProMP-v2")
+
diff --git a/fancy_gym/examples/pd_control_gain_tuning.py b/fancy_gym/examples/pd_control_gain_tuning.py
new file mode 100644
index 0000000..407bfa1
--- /dev/null
+++ b/fancy_gym/examples/pd_control_gain_tuning.py
@@ -0,0 +1,69 @@
+from collections import OrderedDict
+
+import numpy as np
+from matplotlib import pyplot as plt
+
+import fancy_gym
+
+# This might work for some environments. However, please verify either way that the correct trajectory information
+# for your environment is extracted below.
+SEED = 1
+
+env_id = "Reacher5dProMP-v0"
+
+env = fancy_gym.make(env_id, seed=SEED, controller_kwargs={'p_gains': 0.05, 'd_gains': 0.05}).env
+env.action_space.seed(SEED)
+
+# Plot difference between real trajectory and target MP trajectory
+env.reset()
+w = env.action_space.sample()
+pos, vel = env.get_trajectory(w)
+
+base_shape = env.env.action_space.shape
+actual_pos = np.zeros((len(pos), *base_shape))
+actual_vel = np.zeros((len(pos), *base_shape))
+act = np.zeros((len(pos), *base_shape))
+
+plt.ion()
+fig = plt.figure()
+ax = fig.add_subplot(1, 1, 1)
+
+img = ax.imshow(env.env.render(mode="rgb_array"))
+fig.show()
+
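+# Roll out the sampled MP trajectory: at each step the tracking (PD) controller converts the desired
+# position/velocity into a low-level action, which is clipped to the action space and applied to the base env.
+# The actually reached positions/velocities are recorded to compare them against the MP targets below.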
+for t, pos_vel in enumerate(zip(pos, vel)):
+ actions = env.tracking_controller.get_action(pos_vel[0], pos_vel[1], env.current_vel, env.current_pos)
+ actions = np.clip(actions, env.env.action_space.low, env.env.action_space.high)
+ _, _, _, _ = env.env.step(actions)
+ if t % 15 == 0:
+ img.set_data(env.env.render(mode="rgb_array"))
+ fig.canvas.draw()
+ fig.canvas.flush_events()
+ act[t, :] = actions
+ # TODO verify for your environment
+ actual_pos[t, :] = env.current_pos
+ actual_vel[t, :] = env.current_vel
+
+plt.figure(figsize=(15, 5))
+
+plt.subplot(131)
+plt.title("Position")
+p1 = plt.plot(actual_pos, c='C0', label="true")
+p2 = plt.plot(pos, c='C1', label="MP")
+plt.xlabel("Episode steps")
+handles, labels = plt.gca().get_legend_handles_labels()
+
+by_label = OrderedDict(zip(labels, handles))
+plt.legend(by_label.values(), by_label.keys())
+
+plt.subplot(132)
+plt.title("Velocity")
+plt.plot(actual_vel, c='C0', label="true")
+plt.plot(vel, c='C1', label="MP")
+plt.xlabel("Episode steps")
+
+plt.subplot(133)
+plt.title(f"Actions {np.std(act, axis=0)}")
+plt.plot(act, c="C0"),
+plt.xlabel("Episode steps")
+plt.show()
diff --git a/fancy_gym/meta/README.MD b/fancy_gym/meta/README.MD
new file mode 100644
index 0000000..c8d9cd1
--- /dev/null
+++ b/fancy_gym/meta/README.MD
@@ -0,0 +1,26 @@
+# MetaWorld Wrappers
+
+These are the Environment Wrappers for selected [Metaworld](https://meta-world.github.io/) environments in order to use our Movement Primitive gym interface with them.
+All Metaworld environments have a 39-dimensional observation space with the same structure. The tasks differ only in the objective and in which initial observations are randomized.
+Unused observations are zeroed out. E.g. for `Button-Press-v2` the observation mask looks as follows:
+```python
+ return np.hstack([
+ # Current observation
+ [False] * 3, # end-effector position
+ [False] * 1, # normalized gripper open distance
+ [True] * 3, # main object position
+ [False] * 4, # main object quaternion
+ [False] * 3, # secondary object position
+ [False] * 4, # secondary object quaternion
+ # Previous observation
+ [False] * 3, # previous end-effector position
+ [False] * 1, # previous normalized gripper open distance
+ [False] * 3, # previous main object position
+ [False] * 4, # previous main object quaternion
+ [False] * 3, # previous second object position
+ [False] * 4, # previous second object quaternion
+ # Goal
+ [True] * 3, # goal position
+ ])
+```
+For other tasks only the boolean values have to be adjusted accordingly.
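+
+As a quick sanity check, the following minimal sketch compares two consecutive resets; the entries that differ are
+exactly the ones that should be marked `True` in the context mask:
+```python
+import fancy_gym
+
+env = fancy_gym.make("metaworld:button-press-v2", seed=1)
+# Entries that change between resets (here: main object and goal position) form the task context
+print(env.reset() - env.reset())
+```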
\ No newline at end of file
diff --git a/fancy_gym/meta/__init__.py b/fancy_gym/meta/__init__.py
new file mode 100644
index 0000000..04438f4
--- /dev/null
+++ b/fancy_gym/meta/__init__.py
@@ -0,0 +1,103 @@
+from copy import deepcopy
+
+from gym import register
+
+from . import goal_object_change_mp_wrapper, goal_change_mp_wrapper, goal_endeffector_change_mp_wrapper, \
+ object_change_mp_wrapper
+
+ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS = {"DMP": [], "ProMP": []}
+
+# MetaWorld
+
+DEFAULT_BB_DICT_ProMP = {
+ "name": 'EnvName',
+ "wrappers": [],
+ "trajectory_generator_kwargs": {
+ 'trajectory_generator_type': 'promp'
+ },
+ "phase_generator_kwargs": {
+ 'phase_generator_type': 'linear'
+ },
+ "controller_kwargs": {
+ 'controller_type': 'metaworld',
+ },
+ "basis_generator_kwargs": {
+ 'basis_generator_type': 'zero_rbf',
+ 'num_basis': 5,
+ 'num_basis_zero_start': 1
+ }
+}
+
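+# Tasks in which only the goal position is randomized between episodes.
+# Each MetaWorld task `<task>-v2` is registered as `<Task>ProMP-v2`, e.g. "assembly-v2" -> "AssemblyProMP-v2".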
+_goal_change_envs = ["assembly-v2", "pick-out-of-hole-v2", "plate-slide-v2", "plate-slide-back-v2",
+ "plate-slide-side-v2", "plate-slide-back-side-v2"]
+for _task in _goal_change_envs:
+ task_id_split = _task.split("-")
+ name = "".join([s.capitalize() for s in task_id_split[:-1]])
+ _env_id = f'{name}ProMP-{task_id_split[-1]}'
+ kwargs_dict_goal_change_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
+ kwargs_dict_goal_change_promp['wrappers'].append(goal_change_mp_wrapper.MPWrapper)
+ kwargs_dict_goal_change_promp['name'] = f'metaworld:{_task}'
+
+ register(
+ id=_env_id,
+ entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
+ kwargs=kwargs_dict_goal_change_promp
+ )
+ ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
+
+_object_change_envs = ["bin-picking-v2", "hammer-v2", "sweep-into-v2"]
+for _task in _object_change_envs:
+ task_id_split = _task.split("-")
+ name = "".join([s.capitalize() for s in task_id_split[:-1]])
+ _env_id = f'{name}ProMP-{task_id_split[-1]}'
+ kwargs_dict_object_change_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
+ kwargs_dict_object_change_promp['wrappers'].append(object_change_mp_wrapper.MPWrapper)
+ kwargs_dict_object_change_promp['name'] = f'metaworld:{_task}'
+ register(
+ id=_env_id,
+ entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
+ kwargs=kwargs_dict_object_change_promp
+ )
+ ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
+
+_goal_and_object_change_envs = ["box-close-v2", "button-press-v2", "button-press-wall-v2", "button-press-topdown-v2",
+ "button-press-topdown-wall-v2", "coffee-button-v2", "coffee-pull-v2",
+ "coffee-push-v2", "dial-turn-v2", "disassemble-v2", "door-close-v2",
+ "door-lock-v2", "door-open-v2", "door-unlock-v2", "hand-insert-v2",
+ "drawer-close-v2", "drawer-open-v2", "faucet-open-v2", "faucet-close-v2",
+ "handle-press-side-v2", "handle-press-v2", "handle-pull-side-v2",
+ "handle-pull-v2", "lever-pull-v2", "peg-insert-side-v2", "pick-place-wall-v2",
+ "reach-v2", "push-back-v2", "push-v2", "pick-place-v2", "peg-unplug-side-v2",
+ "soccer-v2", "stick-push-v2", "stick-pull-v2", "push-wall-v2", "reach-wall-v2",
+ "shelf-place-v2", "sweep-v2", "window-open-v2", "window-close-v2"
+ ]
+for _task in _goal_and_object_change_envs:
+ task_id_split = _task.split("-")
+ name = "".join([s.capitalize() for s in task_id_split[:-1]])
+ _env_id = f'{name}ProMP-{task_id_split[-1]}'
+ kwargs_dict_goal_and_object_change_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
+ kwargs_dict_goal_and_object_change_promp['wrappers'].append(goal_object_change_mp_wrapper.MPWrapper)
+ kwargs_dict_goal_and_object_change_promp['name'] = f'metaworld:{_task}'
+
+ register(
+ id=_env_id,
+ entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
+ kwargs=kwargs_dict_goal_and_object_change_promp
+ )
+ ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
+
+_goal_and_endeffector_change_envs = ["basketball-v2"]
+for _task in _goal_and_endeffector_change_envs:
+ task_id_split = _task.split("-")
+ name = "".join([s.capitalize() for s in task_id_split[:-1]])
+ _env_id = f'{name}ProMP-{task_id_split[-1]}'
+ kwargs_dict_goal_and_endeffector_change_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
+ kwargs_dict_goal_and_endeffector_change_promp['wrappers'].append(goal_endeffector_change_mp_wrapper.MPWrapper)
+ kwargs_dict_goal_and_endeffector_change_promp['name'] = f'metaworld:{_task}'
+
+ register(
+ id=_env_id,
+ entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
+ kwargs=kwargs_dict_goal_and_endeffector_change_promp
+ )
+ ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
diff --git a/fancy_gym/meta/base_metaworld_mp_wrapper.py b/fancy_gym/meta/base_metaworld_mp_wrapper.py
new file mode 100644
index 0000000..ae800ff
--- /dev/null
+++ b/fancy_gym/meta/base_metaworld_mp_wrapper.py
@@ -0,0 +1,21 @@
+from abc import ABC
+from typing import Tuple, Union
+
+import numpy as np
+
+from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
+
+
+class BaseMetaworldMPWrapper(RawInterfaceWrapper, ABC):
+ @property
+ def current_pos(self) -> Union[float, int, np.ndarray]:
+ r_close = self.env.data.get_joint_qpos("r_close")
+ # TODO check if this is correct
+ # return np.hstack([self.env.data.get_body_xpos('hand').flatten() / self.env.action_scale, r_close])
+ return np.hstack([self.env.data.mocap_pos.flatten() / self.env.action_scale, r_close])
+
+ @property
+ def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
+ # TODO check if this is correct
+ return np.zeros(4, )
+ # raise NotImplementedError("Velocity cannot be retrieved.")
diff --git a/fancy_gym/meta/goal_change_mp_wrapper.py b/fancy_gym/meta/goal_change_mp_wrapper.py
new file mode 100644
index 0000000..a8eabb5
--- /dev/null
+++ b/fancy_gym/meta/goal_change_mp_wrapper.py
@@ -0,0 +1,49 @@
+import numpy as np
+
+from fancy_gym.meta.base_metaworld_mp_wrapper import BaseMetaworldMPWrapper
+
+
+class MPWrapper(BaseMetaworldMPWrapper):
+ """
+ This Wrapper is for environments where merely the goal changes in the beginning
+ and no secondary objects or end effectors are altered at the start of an episode.
+    You can verify this by executing the code below for your environment id and checking whether the output is
+    non-zero at the same indices.
+ ```python
+ import fancy_gym
+ env = fancy_gym.make(env_id, 1)
+ print(env.reset() - env.reset())
+ array([ 0. , 0. , 0. , 0. , 0,
+ 0 , 0 , 0. , 0. , 0. ,
+ 0. , 0. , 0. , 0. , 0. ,
+ 0. , 0. , 0. , 0. , 0. ,
+ 0. , 0. , 0 , 0 , 0 ,
+ 0. , 0. , 0. , 0. , 0. ,
+ 0. , 0. , 0. , 0. , 0. ,
+ 0. , !=0 , !=0 , !=0])
+ ```
+ """
+
+ @property
+ def context_mask(self) -> np.ndarray:
+ # This structure is the same for all metaworld environments.
+ # Only the observations which change could differ
+ return np.hstack([
+ # Current observation
+ [False] * 3, # end-effector position
+ [False] * 1, # normalized gripper open distance
+ [False] * 3, # main object position
+ [False] * 4, # main object quaternion
+ [False] * 3, # secondary object position
+ [False] * 4, # secondary object quaternion
+ # Previous observation
+ # TODO: Include previous values? According to their source they might be wrong for the first iteration.
+ [False] * 3, # previous end-effector position
+ [False] * 1, # previous normalized gripper open distance
+ [False] * 3, # previous main object position
+ [False] * 4, # previous main object quaternion
+ [False] * 3, # previous second object position
+ [False] * 4, # previous second object quaternion
+ # Goal
+ [True] * 3, # goal position
+ ])
diff --git a/fancy_gym/meta/goal_endeffector_change_mp_wrapper.py b/fancy_gym/meta/goal_endeffector_change_mp_wrapper.py
new file mode 100644
index 0000000..c299597
--- /dev/null
+++ b/fancy_gym/meta/goal_endeffector_change_mp_wrapper.py
@@ -0,0 +1,49 @@
+import numpy as np
+
+from fancy_gym.meta.base_metaworld_mp_wrapper import BaseMetaworldMPWrapper
+
+
+class MPWrapper(BaseMetaworldMPWrapper):
+ """
+    This Wrapper is for environments where the goal and the initial end-effector position change at the start
+    of an episode, while no objects are altered.
+    You can verify this by executing the code below for your environment id and checking whether the output is
+    non-zero at the same indices.
+ ```python
+ import fancy_gym
+ env = fancy_gym.make(env_id, 1)
+ print(env.reset() - env.reset())
+ array([ !=0 , !=0 , !=0 , 0. , 0.,
+ 0. , 0. , 0. , 0. , 0. ,
+ 0. , 0. , 0. , 0. , 0. ,
+ 0. , 0. , 0. , !=0 , !=0 ,
+ !=0 , 0. , 0. , 0. , 0. ,
+ 0. , 0. , 0. , 0. , 0. ,
+ 0. , 0. , 0. , 0. , 0. ,
+ 0. , !=0 , !=0 , !=0])
+ ```
+ """
+
+ @property
+ def context_mask(self) -> np.ndarray:
+ # This structure is the same for all metaworld environments.
+ # Only the observations which change could differ
+ return np.hstack([
+ # Current observation
+ [True] * 3, # end-effector position
+ [False] * 1, # normalized gripper open distance
+ [False] * 3, # main object position
+ [False] * 4, # main object quaternion
+ [False] * 3, # secondary object position
+ [False] * 4, # secondary object quaternion
+ # Previous observation
+ # TODO: Include previous values? According to their source they might be wrong for the first iteration.
+ [False] * 3, # previous end-effector position
+ [False] * 1, # previous normalized gripper open distance
+ [False] * 3, # previous main object position
+ [False] * 4, # previous main object quaternion
+ [False] * 3, # previous second object position
+ [False] * 4, # previous second object quaternion
+ # Goal
+ [True] * 3, # goal position
+ ])
diff --git a/fancy_gym/meta/goal_object_change_mp_wrapper.py b/fancy_gym/meta/goal_object_change_mp_wrapper.py
new file mode 100644
index 0000000..ae667a6
--- /dev/null
+++ b/fancy_gym/meta/goal_object_change_mp_wrapper.py
@@ -0,0 +1,49 @@
+import numpy as np
+
+from fancy_gym.meta.base_metaworld_mp_wrapper import BaseMetaworldMPWrapper
+
+
+class MPWrapper(BaseMetaworldMPWrapper):
+ """
+    This Wrapper is for environments where the goal and the position of the main object change at the start
+    of an episode, while the end effector and secondary objects are not altered.
+    You can verify this by executing the code below for your environment id and checking whether the output is
+    non-zero at the same indices.
+ ```python
+ import fancy_gym
+ env = fancy_gym.make(env_id, 1)
+ print(env.reset() - env.reset())
+ array([ 0. , 0. , 0. , 0. , !=0,
+ !=0 , !=0 , 0. , 0. , 0. ,
+ 0. , 0. , 0. , 0. , 0. ,
+ 0. , 0. , 0. , 0. , 0. ,
+ 0. , 0. , !=0 , !=0 , !=0 ,
+ 0. , 0. , 0. , 0. , 0. ,
+ 0. , 0. , 0. , 0. , 0. ,
+ 0. , !=0 , !=0 , !=0])
+ ```
+ """
+
+ @property
+ def context_mask(self) -> np.ndarray:
+ # This structure is the same for all metaworld environments.
+ # Only the observations which change could differ
+ return np.hstack([
+ # Current observation
+ [False] * 3, # end-effector position
+ [False] * 1, # normalized gripper open distance
+ [True] * 3, # main object position
+ [False] * 4, # main object quaternion
+ [False] * 3, # secondary object position
+ [False] * 4, # secondary object quaternion
+ # Previous observation
+ # TODO: Include previous values? According to their source they might be wrong for the first iteration.
+ [False] * 3, # previous end-effector position
+ [False] * 1, # previous normalized gripper open distance
+ [False] * 3, # previous main object position
+ [False] * 4, # previous main object quaternion
+ [False] * 3, # previous second object position
+ [False] * 4, # previous second object quaternion
+ # Goal
+ [True] * 3, # goal position
+ ])
diff --git a/fancy_gym/meta/object_change_mp_wrapper.py b/fancy_gym/meta/object_change_mp_wrapper.py
new file mode 100644
index 0000000..6faecc9
--- /dev/null
+++ b/fancy_gym/meta/object_change_mp_wrapper.py
@@ -0,0 +1,49 @@
+import numpy as np
+
+from fancy_gym.meta.base_metaworld_mp_wrapper import BaseMetaworldMPWrapper
+
+
+class MPWrapper(BaseMetaworldMPWrapper):
+ """
+    This Wrapper is for environments where merely the position of the main object changes at the start
+    of an episode, while the goal, the end effector, and secondary objects are not altered.
+    You can verify this by executing the code below for your environment id and checking whether the output is
+    non-zero at the same indices.
+ ```python
+ import fancy_gym
+ env = fancy_gym.make(env_id, 1)
+ print(env.reset() - env.reset())
+ array([ 0. , 0. , 0. , 0. , !=0 ,
+ !=0 , !=0 , 0. , 0. , 0. ,
+ 0. , 0. , 0. , 0. , 0. ,
+ 0. , 0. , 0. , 0. , 0. ,
+ 0. , 0. , 0 , 0 , 0 ,
+ 0. , 0. , 0. , 0. , 0. ,
+ 0. , 0. , 0. , 0. , 0. ,
+ 0. , 0. , 0. , 0.])
+ ```
+ """
+
+ @property
+ def context_mask(self) -> np.ndarray:
+ # This structure is the same for all metaworld environments.
+ # Only the observations which change could differ
+ return np.hstack([
+ # Current observation
+ [False] * 3, # end-effector position
+ [False] * 1, # normalized gripper open distance
+ [False] * 3, # main object position
+ [False] * 4, # main object quaternion
+ [False] * 3, # secondary object position
+ [False] * 4, # secondary object quaternion
+ # Previous observation
+ # TODO: Include previous values? According to their source they might be wrong for the first iteration.
+ [False] * 3, # previous end-effector position
+ [False] * 1, # previous normalized gripper open distance
+ [False] * 3, # previous main object position
+ [False] * 4, # previous main object quaternion
+ [False] * 3, # previous second object position
+ [False] * 4, # previous second object quaternion
+ # Goal
+ [True] * 3, # goal position
+ ])
diff --git a/fancy_gym/open_ai/README.MD b/fancy_gym/open_ai/README.MD
new file mode 100644
index 0000000..62d1f20
--- /dev/null
+++ b/fancy_gym/open_ai/README.MD
@@ -0,0 +1,14 @@
+# OpenAI Gym Wrappers
+
+These are the Environment Wrappers for selected [OpenAI Gym](https://gym.openai.com/) environments in order to use
+the Movement Primitive gym interface with them.
+
+## MP Environments
+These environments are wrapped versions of their OpenAI-gym counterparts.
+
+|Name| Description|Trajectory Horizon|Action Dimension|Context Dimension|
+|---|---|---|---|---|
+|`ContinuousMountainCarProMP-v0`| A ProMP wrapped version of the ContinuousMountainCar-v0 environment. | 100 | 1 | |
+|`ReacherProMP-v2`| A ProMP wrapped version of the Reacher-v2 environment. | 50 | 2 | |
+|`FetchSlideDenseProMP-v1`| A ProMP wrapped version of the FetchSlideDense-v1 environment. | 50 | 4 | |
+|`FetchReachDenseProMP-v1`| A ProMP wrapped version of the FetchReachDense-v1 environment. | 50 | 4 | |
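+
+A minimal usage sketch (assuming the required mujoco dependencies are installed); the env id is taken from the table
+above, and a single action corresponds to the ProMP parametrization of a full trajectory:
+```python
+import fancy_gym
+
+env = fancy_gym.make("ReacherProMP-v2", seed=1)
+obs = env.reset()
+# One step executes the whole trajectory generated from the sampled ProMP parameters
+obs, reward, done, info = env.step(env.action_space.sample())
+```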
diff --git a/fancy_gym/open_ai/__init__.py b/fancy_gym/open_ai/__init__.py
new file mode 100644
index 0000000..3e0b770
--- /dev/null
+++ b/fancy_gym/open_ai/__init__.py
@@ -0,0 +1,118 @@
+from copy import deepcopy
+
+from gym import register
+
+from . import mujoco
+from .deprecated_needs_gym_robotics import robotics
+
+ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS = {"DMP": [], "ProMP": []}
+
+DEFAULT_BB_DICT_ProMP = {
+ "name": 'EnvName',
+ "wrappers": [],
+ "trajectory_generator_kwargs": {
+ 'trajectory_generator_type': 'promp'
+ },
+ "phase_generator_kwargs": {
+ 'phase_generator_type': 'linear'
+ },
+ "controller_kwargs": {
+ 'controller_type': 'motor',
+ "p_gains": 1.0,
+ "d_gains": 0.1,
+ },
+ "basis_generator_kwargs": {
+ 'basis_generator_type': 'zero_rbf',
+ 'num_basis': 5,
+ 'num_basis_zero_start': 1
+ }
+}
+
+kwargs_dict_reacher_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
+kwargs_dict_reacher_promp['controller_kwargs']['p_gains'] = 0.6
+kwargs_dict_reacher_promp['controller_kwargs']['d_gains'] = 0.075
+kwargs_dict_reacher_promp['basis_generator_kwargs']['num_basis'] = 6
+kwargs_dict_reacher_promp['name'] = "Reacher-v2"
+kwargs_dict_reacher_promp['wrappers'].append(mujoco.reacher_v2.MPWrapper)
+register(
+ id='ReacherProMP-v2',
+ entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
+ kwargs=kwargs_dict_reacher_promp
+)
+ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append("ReacherProMP-v2")
+"""
+The Fetch environments are not supported by gym anymore. A new repository (gym_robotics) now maintains these environments.
+However, their usage with this interface still needs to be verified.
+
+register(
+ id='FetchSlideDenseProMP-v1',
+ entry_point='fancy_gym.utils.make_env_helpers:make_promp_env_helper',
+ kwargs={
+ "name": "gym.envs.robotics:FetchSlideDense-v1",
+ "wrappers": [FlattenObservation, robotics.fetch.MPWrapper],
+ "traj_gen_kwargs": {
+ "num_dof": 4,
+ "num_basis": 5,
+ "duration": 2,
+ "post_traj_time": 0,
+ "zero_start": True,
+ "policy_type": "position"
+ }
+ }
+)
+ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append("FetchSlideDenseProMP-v1")
+
+register(
+ id='FetchSlideProMP-v1',
+ entry_point='fancy_gym.utils.make_env_helpers:make_promp_env_helper',
+ kwargs={
+ "name": "gym.envs.robotics:FetchSlide-v1",
+ "wrappers": [FlattenObservation, robotics.fetch.MPWrapper],
+ "traj_gen_kwargs": {
+ "num_dof": 4,
+ "num_basis": 5,
+ "duration": 2,
+ "post_traj_time": 0,
+ "zero_start": True,
+ "policy_type": "position"
+ }
+ }
+)
+ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append("FetchSlideProMP-v1")
+
+register(
+ id='FetchReachDenseProMP-v1',
+ entry_point='fancy_gym.utils.make_env_helpers:make_promp_env_helper',
+ kwargs={
+ "name": "gym.envs.robotics:FetchReachDense-v1",
+ "wrappers": [FlattenObservation, robotics.fetch.MPWrapper],
+ "traj_gen_kwargs": {
+ "num_dof": 4,
+ "num_basis": 5,
+ "duration": 2,
+ "post_traj_time": 0,
+ "zero_start": True,
+ "policy_type": "position"
+ }
+ }
+)
+ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append("FetchReachDenseProMP-v1")
+
+register(
+ id='FetchReachProMP-v1',
+ entry_point='fancy_gym.utils.make_env_helpers:make_promp_env_helper',
+ kwargs={
+ "name": "gym.envs.robotics:FetchReach-v1",
+ "wrappers": [FlattenObservation, robotics.fetch.MPWrapper],
+ "traj_gen_kwargs": {
+ "num_dof": 4,
+ "num_basis": 5,
+ "duration": 2,
+ "post_traj_time": 0,
+ "zero_start": True,
+ "policy_type": "position"
+ }
+ }
+)
+ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append("FetchReachProMP-v1")
+"""
diff --git a/fancy_gym/open_ai/deprecated_needs_gym_robotics/__init__.py b/fancy_gym/open_ai/deprecated_needs_gym_robotics/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/fancy_gym/open_ai/deprecated_needs_gym_robotics/robotics/__init__.py b/fancy_gym/open_ai/deprecated_needs_gym_robotics/robotics/__init__.py
new file mode 100644
index 0000000..13c65b3
--- /dev/null
+++ b/fancy_gym/open_ai/deprecated_needs_gym_robotics/robotics/__init__.py
@@ -0,0 +1 @@
+from . import fetch
\ No newline at end of file
diff --git a/fancy_gym/open_ai/deprecated_needs_gym_robotics/robotics/fetch/__init__.py b/fancy_gym/open_ai/deprecated_needs_gym_robotics/robotics/fetch/__init__.py
new file mode 100644
index 0000000..989b5a9
--- /dev/null
+++ b/fancy_gym/open_ai/deprecated_needs_gym_robotics/robotics/fetch/__init__.py
@@ -0,0 +1 @@
+from .mp_wrapper import MPWrapper
\ No newline at end of file
diff --git a/fancy_gym/open_ai/deprecated_needs_gym_robotics/robotics/fetch/mp_wrapper.py b/fancy_gym/open_ai/deprecated_needs_gym_robotics/robotics/fetch/mp_wrapper.py
new file mode 100644
index 0000000..e96698a
--- /dev/null
+++ b/fancy_gym/open_ai/deprecated_needs_gym_robotics/robotics/fetch/mp_wrapper.py
@@ -0,0 +1,49 @@
+from typing import Union
+
+import numpy as np
+
+from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
+
+
+class MPWrapper(RawInterfaceWrapper):
+
+ @property
+ def active_obs(self):
+ return np.hstack([
+ [False] * 3, # achieved goal
+ [True] * 3, # desired/true goal
+ [False] * 3, # grip pos
+ [True, True, False] * int(self.has_object), # object position
+ [True, True, False] * int(self.has_object), # object relative position
+ [False] * 2, # gripper state
+ [False] * 3 * int(self.has_object), # object rotation
+ [False] * 3 * int(self.has_object), # object velocity position
+ [False] * 3 * int(self.has_object), # object velocity rotation
+ [False] * 3, # grip velocity position
+ [False] * 2, # gripper velocity
+ ]).astype(bool)
+
+ @property
+ def current_vel(self) -> Union[float, int, np.ndarray]:
+ dt = self.sim.nsubsteps * self.sim.model.opt.timestep
+ grip_velp = self.sim.data.get_site_xvelp("robot0:grip") * dt
+ # gripper state should be symmetric for left and right.
+ # They are controlled with only one action for both gripper joints
+ gripper_state = self.sim.data.get_joint_qvel('robot0:r_gripper_finger_joint') * dt
+ return np.hstack([grip_velp, gripper_state])
+
+ @property
+ def current_pos(self) -> Union[float, int, np.ndarray]:
+ grip_pos = self.sim.data.get_site_xpos("robot0:grip")
+ # gripper state should be symmetric for left and right.
+ # They are controlled with only one action for both gripper joints
+ gripper_state = self.sim.data.get_joint_qpos('robot0:r_gripper_finger_joint')
+ return np.hstack([grip_pos, gripper_state])
+
+ @property
+ def goal_pos(self):
+ raise ValueError("Goal position is not available and has to be learnt based on the environment.")
+
+ @property
+ def dt(self) -> Union[float, int]:
+ return self.env.dt
diff --git a/fancy_gym/open_ai/mujoco/__init__.py b/fancy_gym/open_ai/mujoco/__init__.py
new file mode 100644
index 0000000..3082d19
--- /dev/null
+++ b/fancy_gym/open_ai/mujoco/__init__.py
@@ -0,0 +1 @@
+from . import reacher_v2
diff --git a/fancy_gym/open_ai/mujoco/reacher_v2/__init__.py b/fancy_gym/open_ai/mujoco/reacher_v2/__init__.py
new file mode 100644
index 0000000..989b5a9
--- /dev/null
+++ b/fancy_gym/open_ai/mujoco/reacher_v2/__init__.py
@@ -0,0 +1 @@
+from .mp_wrapper import MPWrapper
\ No newline at end of file
diff --git a/fancy_gym/open_ai/mujoco/reacher_v2/mp_wrapper.py b/fancy_gym/open_ai/mujoco/reacher_v2/mp_wrapper.py
new file mode 100644
index 0000000..b2fa04c
--- /dev/null
+++ b/fancy_gym/open_ai/mujoco/reacher_v2/mp_wrapper.py
@@ -0,0 +1,26 @@
+from typing import Union
+
+import numpy as np
+
+from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
+
+
+class MPWrapper(RawInterfaceWrapper):
+
+ @property
+ def current_vel(self) -> Union[float, int, np.ndarray]:
+ return self.sim.data.qvel[:2]
+
+ @property
+ def current_pos(self) -> Union[float, int, np.ndarray]:
+ return self.sim.data.qpos[:2]
+
+ @property
+ def context_mask(self):
+ return np.concatenate([
+ [False] * 2, # cos of two links
+ [False] * 2, # sin of two links
+ [True] * 2, # goal position
+ [False] * 2, # angular velocity
+ [False] * 3, # goal distance
+ ])
diff --git a/fancy_gym/utils/__init__.py b/fancy_gym/utils/__init__.py
new file mode 100644
index 0000000..8b13789
--- /dev/null
+++ b/fancy_gym/utils/__init__.py
@@ -0,0 +1 @@
+
diff --git a/fancy_gym/utils/make_env_helpers.py b/fancy_gym/utils/make_env_helpers.py
new file mode 100644
index 0000000..e4537e6
--- /dev/null
+++ b/fancy_gym/utils/make_env_helpers.py
@@ -0,0 +1,351 @@
+import re
+import uuid
+from collections.abc import MutableMapping
+from copy import deepcopy
+from math import ceil
+from typing import Iterable, Type, Union
+
+import gym
+import numpy as np
+from gym.envs.registration import register, registry
+
+try:
+ from dm_control import suite, manipulation
+except ImportError:
+ pass
+
+try:
+ import metaworld
+except Exception:
+ # catch Exception as Import error does not catch missing mujoco-py
+ pass
+
+import fancy_gym
+from fancy_gym.black_box.black_box_wrapper import BlackBoxWrapper
+from fancy_gym.black_box.factory.basis_generator_factory import get_basis_generator
+from fancy_gym.black_box.factory.controller_factory import get_controller
+from fancy_gym.black_box.factory.phase_generator_factory import get_phase_generator
+from fancy_gym.black_box.factory.trajectory_generator_factory import get_trajectory_generator
+from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
+from fancy_gym.utils.time_aware_observation import TimeAwareObservation
+from fancy_gym.utils.utils import nested_update
+
+
+def make_rank(env_id: str, seed: int, rank: int = 0, return_callable=True, **kwargs):
+ """
+ TODO: Do we need this?
+ Generate a callable to create a new gym environment with a given seed.
+ The rank is added to the seed and can be used for example when using vector environments.
+ E.g. [make_rank("my_env_name-v0", 123, i) for i in range(8)] creates a list of 8 environments
+ with seeds 123 through 130.
+ Hence, testing environments should be seeded with a value which is offset by the number of training environments.
+    Here, e.g., [make_rank("my_env_name-v0", 123 + 8, i) for i in range(5)] for 5 testing environments.
+
+ Args:
+ env_id: name of the environment
+ seed: seed for deterministic behaviour
+ rank: environment rank for deterministic over multiple seeds behaviour
+ return_callable: If True returns a callable to create the environment instead of the environment itself.
+
+ Returns:
+
+ """
+
+ def f():
+ return make(env_id, seed + rank, **kwargs)
+
+ return f if return_callable else f()
+
+
+def make(env_id: str, seed: int, **kwargs):
+ """
+ Converts an env_id to an environment with the gym API.
+    This also works for DeepMind Control Suite environments that are wrapped using the DMCWrapper; they can be
+ specified with "dmc:domain_name-task_name"
+ Analogously, metaworld tasks can be created as "metaworld:env_id-v2".
+
+ Args:
+        env_id: spec or env_id for gym tasks, external environments require a domain specification
+        seed: seed for deterministic behaviour
+        **kwargs: Additional kwargs for the constructor such as pixel observations, etc.
+
+ Returns: Gym environment
+
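+    Example (illustrative ids; any registered gym id or supported external id works):
+        >>> import fancy_gym
+        >>> env = fancy_gym.make("Reacher5d-v0", seed=1)                # fancy_gym step-based task
+        >>> env = fancy_gym.make("metaworld:button-press-v2", seed=1)   # MetaWorld task
+        >>> env = fancy_gym.make("dmc:ball_in_cup-catch", seed=1)       # DeepMind Control task
+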
+ """
+
+ if ':' in env_id:
+ split_id = env_id.split(':')
+ framework, env_id = split_id[-2:]
+ else:
+ framework = None
+
+ if framework == 'metaworld':
+ # MetaWorld environment
+ env = make_metaworld(env_id, seed, **kwargs)
+ elif framework == 'dmc':
+ # DeepMind Control environment
+ env = make_dmc(env_id, seed, **kwargs)
+ else:
+ env = make_gym(env_id, seed, **kwargs)
+
+ env.seed(seed)
+ env.action_space.seed(seed)
+ env.observation_space.seed(seed)
+
+ return env
+
+
+def _make_wrapped_env(env_id: str, wrappers: Iterable[Type[gym.Wrapper]], seed=1, **kwargs):
+ """
+ Helper function for creating a wrapped gym environment using MPs.
+ It adds all provided wrappers to the specified environment and verifies at least one RawInterfaceWrapper is
+ provided to expose the interface for MPs.
+
+ Args:
+ env_id: name of the environment
+ wrappers: list of wrappers (at least an RawInterfaceWrapper),
+ seed: seed of environment
+
+ Returns: gym environment with all specified wrappers applied
+
+ """
+ # _env = gym.make(env_id)
+ _env = make(env_id, seed, **kwargs)
+ has_black_box_wrapper = False
+ for w in wrappers:
+ # only wrap the environment if not BlackBoxWrapper, e.g. for vision
+ if issubclass(w, RawInterfaceWrapper):
+ has_black_box_wrapper = True
+ _env = w(_env)
+ if not has_black_box_wrapper:
+ raise ValueError("A RawInterfaceWrapper is required in order to leverage movement primitive environments.")
+ return _env
+
+
+def make_bb(
+ env_id: str, wrappers: Iterable, black_box_kwargs: MutableMapping, traj_gen_kwargs: MutableMapping,
+ controller_kwargs: MutableMapping, phase_kwargs: MutableMapping, basis_kwargs: MutableMapping, seed: int = 1,
+ **kwargs):
+ """
+    This can also be used standalone for manually building a custom movement primitive environment.
+ Args:
+ black_box_kwargs: kwargs for the black-box wrapper
+ basis_kwargs: kwargs for the basis generator
+ phase_kwargs: kwargs for the phase generator
+ controller_kwargs: kwargs for the tracking controller
+ env_id: base_env_name,
+ wrappers: list of wrappers (at least an RawInterfaceWrapper),
+ seed: seed of environment
+        traj_gen_kwargs: kwargs for the trajectory generator, i.e. the movement primitive
+
+    Returns: Black-box (movement primitive) wrapped gym env
+
+ """
+ _verify_time_limit(traj_gen_kwargs.get("duration", None), kwargs.get("time_limit", None))
+
+ learn_sub_trajs = black_box_kwargs.get('learn_sub_trajectories')
+ do_replanning = black_box_kwargs.get('replanning_schedule')
+ if learn_sub_trajs and do_replanning:
+        raise ValueError('Cannot use sub-trajectory learning and replanning together.')
+
+ # add time_step observation when replanning
+ if (learn_sub_trajs or do_replanning) and not any(issubclass(w, TimeAwareObservation) for w in wrappers):
+ # Add as first wrapper in order to alter observation
+ wrappers.insert(0, TimeAwareObservation)
+
+ env = _make_wrapped_env(env_id=env_id, wrappers=wrappers, seed=seed, **kwargs)
+
+ traj_gen_kwargs['action_dim'] = traj_gen_kwargs.get('action_dim', np.prod(env.action_space.shape).item())
+
+ if black_box_kwargs.get('duration') is None:
+ black_box_kwargs['duration'] = env.spec.max_episode_steps * env.dt
+ if phase_kwargs.get('tau') is None:
+ phase_kwargs['tau'] = black_box_kwargs['duration']
+
+ if learn_sub_trajs is not None:
+ # We have to learn the length when learning sub_trajectories trajectories
+ phase_kwargs['learn_tau'] = True
+
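+    # Assemble the black-box pipeline: phase generator -> basis generator -> trajectory generator (movement
+    # primitive), plus a tracking controller that translates the generated trajectory into step-wise actions.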
+ phase_gen = get_phase_generator(**phase_kwargs)
+ basis_gen = get_basis_generator(phase_generator=phase_gen, **basis_kwargs)
+ controller = get_controller(**controller_kwargs)
+ traj_gen = get_trajectory_generator(basis_generator=basis_gen, **traj_gen_kwargs)
+
+ bb_env = BlackBoxWrapper(env, trajectory_generator=traj_gen, tracking_controller=controller,
+ **black_box_kwargs)
+
+ return bb_env
+
+
+def make_bb_env_helper(**kwargs):
+ """
+ Helper function for registering a black box gym environment.
+ Args:
+ **kwargs: expects at least the following:
+ {
+ "name": base environment name.
+ "wrappers": list of wrappers (at least an BlackBoxWrapper is required),
+ "traj_gen_kwargs": {
+ "trajectory_generator_type": type_of_your_movement_primitive,
+ non default arguments for the movement primitive instance
+ ...
+ }
+ "controller_kwargs": {
+ "controller_type": type_of_your_controller,
+ non default arguments for the tracking_controller instance
+ ...
+ },
+ "basis_generator_kwargs": {
+ "basis_generator_type": type_of_your_basis_generator,
+ non default arguments for the basis generator instance
+ ...
+ },
+ "phase_generator_kwargs": {
+ "phase_generator_type": type_of_your_phase_generator,
+ non default arguments for the phase generator instance
+ ...
+ },
+ }
+
+ Returns: MP wrapped gym env
+
+ """
+ seed = kwargs.pop("seed", None)
+ wrappers = kwargs.pop("wrappers")
+
+ traj_gen_kwargs = kwargs.pop("trajectory_generator_kwargs", {})
+ black_box_kwargs = kwargs.pop('black_box_kwargs', {})
+ contr_kwargs = kwargs.pop("controller_kwargs", {})
+ phase_kwargs = kwargs.pop("phase_generator_kwargs", {})
+ basis_kwargs = kwargs.pop("basis_generator_kwargs", {})
+
+ return make_bb(env_id=kwargs.pop("name"), wrappers=wrappers,
+ black_box_kwargs=black_box_kwargs,
+ traj_gen_kwargs=traj_gen_kwargs, controller_kwargs=contr_kwargs,
+ phase_kwargs=phase_kwargs,
+ basis_kwargs=basis_kwargs, **kwargs, seed=seed)
+
+
+def make_dmc(
+ env_id: str,
+ seed: int = None,
+ visualize_reward: bool = True,
+ time_limit: Union[None, float] = None,
+ **kwargs
+):
+ if not re.match(r"\w+-\w+", env_id):
+ raise ValueError("env_id does not have the following structure: 'domain_name-task_name'")
+ domain_name, task_name = env_id.split("-")
+
+ if task_name.endswith("_vision"):
+ # TODO
+ raise ValueError("The vision interface for manipulation tasks is currently not supported.")
+
+ if (domain_name, task_name) not in suite.ALL_TASKS and task_name not in manipulation.ALL:
+ raise ValueError(f'Specified domain "{domain_name}" and task "{task_name}" combination does not exist.')
+
+ # env_id = f'dmc_{domain_name}_{task_name}_{seed}-v1'
+ gym_id = uuid.uuid4().hex + '-v1'
+
+ task_kwargs = {'random': seed}
+ if time_limit is not None:
+ task_kwargs['time_limit'] = time_limit
+
+ # create task
+ # Accessing private attribute because DMC does not expose time_limit or step_limit.
+ # Only the current time_step/time as well as the control_timestep can be accessed.
+ if domain_name == "manipulation":
+ env = manipulation.load(environment_name=task_name, seed=seed)
+ max_episode_steps = ceil(env._time_limit / env.control_timestep())
+ else:
+ env = suite.load(domain_name=domain_name, task_name=task_name, task_kwargs=task_kwargs,
+ visualize_reward=visualize_reward, environment_kwargs=kwargs)
+ max_episode_steps = int(env._step_limit)
+
+ register(
+ id=gym_id,
+ entry_point='fancy_gym.dmc.dmc_wrapper:DMCWrapper',
+ kwargs={'env': lambda: env},
+ max_episode_steps=max_episode_steps,
+ )
+
+ env = gym.make(gym_id)
+ env.seed(seed)
+ return env
+
+
+def make_metaworld(env_id: str, seed: int, **kwargs):
+ if env_id not in metaworld.ML1.ENV_NAMES:
+ raise ValueError(f'Specified environment "{env_id}" not present in metaworld ML1.')
+
+ _env = metaworld.envs.ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE[env_id + "-goal-observable"](seed=seed, **kwargs)
+
+ # setting this avoids generating the same initialization after each reset
+ _env._freeze_rand_vec = False
+ # New argument to use global seeding
+ _env.seeded_rand_vec = True
+
+ gym_id = uuid.uuid4().hex + '-v1'
+
+ register(
+ id=gym_id,
+ entry_point=lambda: _env,
+ max_episode_steps=_env.max_path_length,
+ )
+
+ # TODO enable checker when the incorrect dtype of obs and observation space are fixed by metaworld
+ env = gym.make(gym_id, disable_env_checker=True)
+ return env
+
+
+def make_gym(env_id, seed, **kwargs):
+ """
+    Create a standard gym environment, merging the kwargs of the registered environment with the provided kwargs.
+    Args:
+        env_id: name of the (registered) gym environment
+        seed: seed for deterministic behaviour (passed on to black-box environments)
+        **kwargs: Additional kwargs for the environment constructor; nested dicts are merged for black-box envs.
+
+    Returns: Gym environment
+
+ """
+ # Getting the existing keywords to allow for nested dict updates for BB envs
+ # gym only allows for non nested updates.
+ all_kwargs = deepcopy(registry.get(env_id).kwargs)
+ nested_update(all_kwargs, kwargs)
+ kwargs = all_kwargs
+
+ # Add seed to kwargs for bb environments to pass seed to step environments
+ all_bb_envs = sum(fancy_gym.ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS.values(), [])
+ if env_id in all_bb_envs:
+ kwargs.update({"seed": seed})
+
+ # Gym
+ env = gym.make(env_id, **kwargs)
+ return env
+
+
+def _verify_time_limit(mp_time_limit: Union[None, float], env_time_limit: Union[None, float]):
+ """
+ When using DMC check if a manually specified time limit matches the trajectory duration the MP receives.
+ Mostly, the time_limit for DMC is not specified and the default values from DMC are taken.
+    This check, however, can only be done after instantiating the environment.
+ It can be found in the BaseMP class.
+
+ Args:
+ mp_time_limit: max trajectory length of traj_gen in seconds
+ env_time_limit: max trajectory length of DMC environment in seconds
+
+ Returns:
+
+ """
+ if mp_time_limit is not None and env_time_limit is not None:
+ assert mp_time_limit == env_time_limit, \
+ f"The specified 'time_limit' of {env_time_limit}s does not match " \
+ f"the duration of {mp_time_limit}s for the MP."
+
+
+def _verify_dof(base_env: gym.Env, dof: int):
+ action_shape = np.prod(base_env.action_space.shape)
+ assert dof == action_shape, \
+ f"The specified degrees of freedom ('num_dof') {dof} do not match " \
+ f"the action space of {action_shape} the base environments"
diff --git a/fancy_gym/utils/time_aware_observation.py b/fancy_gym/utils/time_aware_observation.py
new file mode 100644
index 0000000..b2cbc78
--- /dev/null
+++ b/fancy_gym/utils/time_aware_observation.py
@@ -0,0 +1,78 @@
+"""
+Adapted from: https://github.com/openai/gym/blob/907b1b20dd9ac0cba5803225059b9c6673702467/gym/wrappers/time_aware_observation.py
+License: MIT
+Copyright (c) 2016 OpenAI (https://openai.com)
+
+Wrapper for adding time aware observations to environment observation.
+"""
+import gym
+import numpy as np
+from gym.spaces import Box
+
+
+class TimeAwareObservation(gym.ObservationWrapper):
+ """Augment the observation with the current time step in the episode.
+
+ The observation space of the wrapped environment is assumed to be a flat :class:`Box`.
+ In particular, pixel observations are not supported. This wrapper will append the current timestep
+ within the current episode to the observation.
+
+ Example:
+ >>> import gym
+ >>> env = gym.make('CartPole-v1')
+ >>> env = TimeAwareObservation(env)
+ >>> env.reset()
+ array([ 0.03810719, 0.03522411, 0.02231044, -0.01088205, 0. ])
+ >>> env.step(env.action_space.sample())[0]
+ array([ 0.03881167, -0.16021058, 0.0220928 , 0.28875574, 1. ])
+ """
+
+ def __init__(self, env: gym.Env):
+ """Initialize :class:`TimeAwareObservation` that requires an environment with a flat :class:`Box`
+ observation space.
+
+ Args:
+ env: The environment to apply the wrapper
+ """
+ super().__init__(env)
+ assert isinstance(env.observation_space, Box)
+ low = np.append(self.observation_space.low, 0.0)
+ high = np.append(self.observation_space.high, 1.0)
+ self.observation_space = Box(low, high, dtype=self.observation_space.dtype)
+ self.t = 0
+ self._max_episode_steps = env.spec.max_episode_steps
+
+ def observation(self, observation):
+ """Adds to the observation with the current time step normalized with max steps.
+
+ Args:
+ observation: The observation to add the time step to
+
+ Returns:
+            The observation with the normalized time step appended
+ """
+ return np.append(observation, self.t / self._max_episode_steps)
+
+ def step(self, action):
+ """Steps through the environment, incrementing the time step.
+
+ Args:
+ action: The action to take
+
+ Returns:
+ The environment's step using the action.
+ """
+ self.t += 1
+ return super().step(action)
+
+ def reset(self, **kwargs):
+ """Reset the environment setting the time to zero.
+
+ Args:
+ **kwargs: Kwargs to apply to env.reset()
+
+ Returns:
+ The reset environment
+ """
+ self.t = 0
+ return super().reset(**kwargs)
diff --git a/fancy_gym/utils/utils.py b/fancy_gym/utils/utils.py
new file mode 100644
index 0000000..f4fea74
--- /dev/null
+++ b/fancy_gym/utils/utils.py
@@ -0,0 +1,50 @@
+from collections.abc import Mapping, MutableMapping
+
+import numpy as np
+import torch as ch
+
+
+def angle_normalize(x, type="deg"):
+ """
+ normalize angle x to [-pi,pi].
+ Args:
+ x: Angle in either degrees or radians
+ type: one of "deg" or "rad" for x being in degrees or radians
+
+ Returns:
+
+ """
+
+ if type not in ["deg", "rad"]: raise ValueError(f"Invalid type {type}. Choose one of 'deg' or 'rad'.")
+
+ if type == "deg":
+ x = np.deg2rad(x) # x * pi / 180
+
+ two_pi = 2 * np.pi
+ return x - two_pi * np.floor((x + np.pi) / two_pi)
+
+
+def get_numpy(x: ch.Tensor):
+ """
+ Returns numpy array from torch tensor
+ Args:
+ x:
+
+ Returns:
+
+ """
+ return x.detach().cpu().numpy()
+
+
+def nested_update(base: MutableMapping, update):
+ """
+    Update function for nested Mappings; unlike dict.update, nested values are merged recursively.
+ Args:
+ base: main Mapping to be updated
+ update: updated values for base Mapping
+
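+    Example (illustrative):
+        >>> nested_update({'a': {'b': 1, 'c': 2}}, {'a': {'b': 3}})
+        {'a': {'b': 3, 'c': 2}}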
+ """
+ for k, v in update.items():
+ base[k] = nested_update(base.get(k, {}), v) if isinstance(v, Mapping) else v
+ return base
+
diff --git a/setup.py b/setup.py
index b5aa424..9c539da 100644
--- a/setup.py
+++ b/setup.py
@@ -1,24 +1,44 @@
-from setuptools import setup
+import itertools
+
+from setuptools import setup, find_packages
+
+# Environment-specific dependencies for dmc and metaworld
+extras = {
+ "dmc": ["dm_control>=1.0.1"],
+ "metaworld": ["metaworld @ git+https://github.com/rlworkgroup/metaworld.git@master#egg=metaworld",
+ 'mujoco-py<2.2,>=2.1'],
+}
+
+# All dependencies
+all_groups = set(extras.keys())
+extras["all"] = list(set(itertools.chain.from_iterable(map(lambda group: extras[group], all_groups))))
setup(
- name='alr_envs',
- version='0.0.1',
- packages=['alr_envs', 'alr_envs.alr', 'alr_envs.open_ai', 'alr_envs.dmc', 'alr_envs.meta', 'alr_envs.utils'],
- install_requires=[
- 'gym',
- 'PyQt5',
- 'matplotlib',
- 'mp_env_api @ git+https://github.com/ALRhub/motion_primitive_env_api.git',
-# 'mp_env_api @ git+ssh://git@github.com/ALRhub/motion_primitive_env_api.git',
- 'mujoco-py<2.1,>=2.0',
- 'dm_control',
- 'metaworld @ git+https://github.com/rlworkgroup/metaworld.git@master#egg=metaworld',
+ author='Fabian Otto, Onur Celik',
+ name='fancy_gym',
+ version='0.2',
+ classifiers=[
+ # Python 3.7 is minimally supported
+ "Programming Language :: Python :: 3",
+ "Programming Language :: Python :: 3.7",
+ "Programming Language :: Python :: 3.8",
+ "Programming Language :: Python :: 3.9",
+ "Programming Language :: Python :: 3.10",
],
-
- url='https://github.com/ALRhub/alr_envs/',
- license='MIT',
- author='Fabian Otto, Marcel Sandermann, Maximilian Huettenrauch',
+ extras_require=extras,
+ install_requires=[
+ 'gym[mujoco]<0.25.0,>=0.24.0',
+ 'mp_pytorch @ git+https://github.com/ALRhub/MP_PyTorch.git@main'
+ ],
+ packages=[package for package in find_packages() if package.startswith("fancy_gym")],
+ package_data={
+ "fancy_gym": [
+ "envs/mujoco/*/assets/*.xml",
+ ]
+ },
+ python_requires=">=3.7",
+ url='https://github.com/ALRhub/fancy_gym/',
+ # license='AGPL-3.0 license',
author_email='',
- description='Custom Gym environments for various (robotics) tasks. integration of DMC environments into the'
- 'gym interface, and support for using motion primitives with gym environments.'
+ description='Fancy Gym: Unifying interface for various RL benchmarks with support for Black Box approaches.'
)
diff --git a/test/test_dmc.py b/test/test_dmc.py
new file mode 100644
index 0000000..0800a67
--- /dev/null
+++ b/test/test_dmc.py
@@ -0,0 +1,130 @@
+import unittest
+
+import gym
+import numpy as np
+from dm_control import suite, manipulation
+
+import fancy_gym
+from fancy_gym import make
+
+SUITE_IDS = [f'dmc:{env}-{task}' for env, task in suite.ALL_TASKS if env != "lqr"]
+MANIPULATION_IDS = [f'dmc:manipulation-{task}' for task in manipulation.ALL if task.endswith('_features')]
+SEED = 1
+
+
+class TestDMCEnvironments(unittest.TestCase):
+
+ def _run_env(self, env_id, iterations=None, seed=SEED, render=False):
+ """
+ Example for running a DMC-based env in the step-based setting.
+ The env_id has to be specified as `dmc:domain_name-task_name` or,
+ for manipulation tasks, as `dmc:manipulation-environment_name`.
+
+ Args:
+ env_id: Either `dmc:domain_name-task_name` or `dmc:manipulation-environment_name`
+ iterations: Number of rollout steps to run
+ seed: Random seed
+ render: Render the episode
+
+ Returns: observations, rewards, dones, actions
+
+ """
+ env: gym.Env = make(env_id, seed=seed)
+ rewards = []
+ observations = []
+ actions = []
+ dones = []
+ obs = env.reset()
+ self._verify_observations(obs, env.observation_space, "reset()")
+
+ iterations = iterations or (env.spec.max_episode_steps or 1)
+
+ # number of samples (multiple environment steps)
+ for i in range(iterations):
+ observations.append(obs)
+
+ ac = env.action_space.sample()
+ actions.append(ac)
+ # ac = np.random.uniform(env.action_space.low, env.action_space.high, env.action_space.shape)
+ obs, reward, done, info = env.step(ac)
+
+ self._verify_observations(obs, env.observation_space, "step()")
+ self._verify_reward(reward)
+ self._verify_done(done)
+
+ rewards.append(reward)
+ dones.append(done)
+
+ if render:
+ env.render("human")
+
+ if done:
+ break
+
+ assert done, "Done flag is not True after end of episode."
+ observations.append(obs)
+ env.close()
+ del env
+ return np.array(observations), np.array(rewards), np.array(dones), np.array(actions)
+
+ def _run_env_determinism(self, ids):
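+ # Roll out each environment twice with an identical seed and require the
+ # observation, action, reward, and done sequences to match exactly.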
+ seed = 0
+ for env_id in ids:
+ with self.subTest(msg=env_id):
+ traj1 = self._run_env(env_id, seed=seed)
+ traj2 = self._run_env(env_id, seed=seed)
+ for i, time_step in enumerate(zip(*traj1, *traj2)):
+ obs1, rwd1, done1, ac1, obs2, rwd2, done2, ac2 = time_step
+ self.assertTrue(np.array_equal(obs1, obs2), f"Observations [{i}] delta {obs1 - obs2} is not zero.")
+ self.assertTrue(np.array_equal(ac1, ac2), f"Actions [{i}] delta {ac1 - ac2} is not zero.")
+ self.assertEqual(done1, done2, f"Dones [{i}] {done1} and {done2} do not match.")
+ self.assertEqual(rwd1, rwd2, f"Rewards [{i}] {rwd1} and {rwd2} do not match.")
+
+ def _verify_observations(self, obs, observation_space, obs_type="reset()"):
+ self.assertTrue(observation_space.contains(obs),
+ f"Observation {obs} received from {obs_type} "
+ f"not contained in observation space {observation_space}.")
+
+ def _verify_reward(self, reward):
+ self.assertIsInstance(reward, (float, int), f"Returned type {type(reward)} as reward, expected float or int.")
+
+ def _verify_done(self, done):
+ self.assertIsInstance(done, bool, f"Returned {done} as done flag, expected bool.")
+
+ def test_suite_functionality(self):
+ """Tests that suite step environments run without errors using random actions."""
+ for env_id in SUITE_IDS:
+ with self.subTest(msg=env_id):
+ self._run_env(env_id)
+
+ def test_suite_determinism(self):
+ """Tests that for step environments identical seeds produce identical trajectories."""
+ self._run_env_determinism(SUITE_IDS)
+
+ def test_manipulation_functionality(self):
+ """Tests that manipulation step environments run without errors using random actions."""
+ for env_id in MANIPULATION_IDS:
+ with self.subTest(msg=env_id):
+ self._run_env(env_id)
+
+ def test_manipulation_determinism(self):
+ """Tests that for step environments identical seeds produce identical trajectories."""
+ self._run_env_determinism(MANIPULATION_IDS)
+
+ def test_bb_functionality(self):
+ """Tests that black box environments run without errors using random actions."""
+ for traj_gen, env_ids in fancy_gym.ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS.items():
+ with self.subTest(msg=traj_gen):
+ for env_id in env_ids:
+ with self.subTest(msg=env_id):
+ self._run_env(env_id)
+
+ def test_bb_determinism(self):
+ """Tests that for black box environment identical seeds produce identical trajectories."""
+ for traj_gen, env_ids in fancy_gym.ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS.items():
+ with self.subTest(msg=traj_gen):
+ self._run_env_determinism(env_ids)
+
+
+if __name__ == '__main__':
+ unittest.main()
diff --git a/test/test_dmc_envs.py b/test/test_dmc_envs.py
deleted file mode 100644
index 7293030..0000000
--- a/test/test_dmc_envs.py
+++ /dev/null
@@ -1,49 +0,0 @@
-from typing import Tuple
-
-import fancy_gym
-import pytest
-from dm_control import suite, manipulation
-
-from test.utils import run_env_determinism, run_env
-
-SUITE_IDS = [f'dmc:{env}-{task}' for env, task in suite.ALL_TASKS if env != "lqr"]
-MANIPULATION_IDS = [f'dmc:manipulation-{task}' for task in manipulation.ALL if task.endswith('_features')]
-SEED = 1
-
-
-@pytest.mark.parametrize('env_id', SUITE_IDS)
-def test_step_suite_functionality(env_id: str):
- """Tests that suite step environments run without errors using random actions."""
- run_env(env_id)
-
-
-@pytest.mark.parametrize('env_id', SUITE_IDS)
-def test_step_suite_determinism(env_id: str):
- """Tests that for step environments identical seeds produce identical trajectories."""
- seed = 0
- run_env_determinism(env_id, seed)
-
-
-@pytest.mark.parametrize('env_id', MANIPULATION_IDS)
-def test_step_manipulation_functionality(env_id: str):
- """Tests that manipulation step environments run without errors using random actions."""
- run_env(env_id)
-
-
-@pytest.mark.parametrize('env_id', MANIPULATION_IDS)
-def test_step_manipulation_determinism(env_id: str):
- """Tests that for step environments identical seeds produce identical trajectories."""
- seed = 0
- run_env_determinism(env_id, seed)
-
-
-@pytest.mark.parametrize('env_id', [fancy_gym.ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS.values()])
-def test_bb_dmc_functionality(env_id: str):
- """Tests that black box environments run without errors using random actions."""
- run_env(env_id)
-
-
-@pytest.mark.parametrize('env_id', [fancy_gym.ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS.values()])
-def test_bb_dmc_determinism(env_id: str):
- """Tests that for black box environment identical seeds produce identical trajectories."""
- run_env_determinism(env_id)
diff --git a/test/test_metaworld_envs.py b/test/test_fancy.py
similarity index 59%
rename from test/test_metaworld_envs.py
rename to test/test_fancy.py
index b84ba3c..d4890cc 100644
--- a/test/test_metaworld_envs.py
+++ b/test/test_fancy.py
@@ -3,14 +3,15 @@ import unittest
import gym
import numpy as np
-from alr_envs import make
-from metaworld.envs import ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE
+import fancy_gym # noqa
+from fancy_gym.utils.make_env_helpers import make
-ALL_ENVS = [env.split("-goal-observable")[0] for env, _ in ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE.items()]
+CUSTOM_IDS = [spec.id for spec in gym.envs.registry.all() if
+ "fancy_gym" in spec.entry_point and 'make_bb_env_helper' not in spec.entry_point]
SEED = 1
-class TestStepMetaWorlEnvironments(unittest.TestCase):
+class TestCustomEnvironments(unittest.TestCase):
def _run_env(self, env_id, iterations=None, seed=SEED, render=False):
"""
@@ -21,26 +22,21 @@ class TestStepMetaWorlEnvironments(unittest.TestCase):
Args:
env_id: Either `domain_name-task_name` or `manipulation-environment_name`
iterations: Number of rollout steps to run
- seed= random seeding
+ seed: Random seed
render: Render the episode
- Returns:
+ Returns: observations, rewards, dones, actions
"""
env: gym.Env = make(env_id, seed=seed)
rewards = []
- observations = []
actions = []
+ observations = []
dones = []
obs = env.reset()
self._verify_observations(obs, env.observation_space, "reset()")
- length = env.max_path_length
- if iterations is None:
- if length is None:
- iterations = 1
- else:
- iterations = length
+ iterations = iterations or (env.spec.max_episode_steps or 1)
# number of samples(multiple environment steps)
for i in range(iterations):
@@ -48,7 +44,6 @@ class TestStepMetaWorlEnvironments(unittest.TestCase):
ac = env.action_space.sample()
actions.append(ac)
- # ac = np.random.uniform(env.action_space.low, env.action_space.high, env.action_space.shape)
obs, reward, done, info = env.step(ac)
self._verify_observations(obs, env.observation_space, "step()")
@@ -62,36 +57,17 @@ class TestStepMetaWorlEnvironments(unittest.TestCase):
env.render("human")
if done:
- obs = env.reset()
+ break
- assert done, "Done flag is not True after max episode length."
+ assert done, "Done flag is not True after end of episode."
observations.append(obs)
env.close()
del env
return np.array(observations), np.array(rewards), np.array(dones), np.array(actions)
- def _verify_observations(self, obs, observation_space, obs_type="reset()"):
- self.assertTrue(observation_space.contains(obs),
- f"Observation {obs} received from {obs_type} "
- f"not contained in observation space {observation_space}.")
-
- def _verify_reward(self, reward):
- self.assertIsInstance(reward, float, f"Returned {reward} as reward, expected float.")
-
- def _verify_done(self, done):
- self.assertIsInstance(done, bool, f"Returned {done} as done flag, expected bool.")
-
- def test_metaworld_functionality(self):
- """Tests that environments runs without errors using random actions."""
- for env_id in ALL_ENVS:
- with self.subTest(msg=env_id):
- self._run_env(env_id)
-
- def test_metaworld_determinism(self):
- """Tests that identical seeds produce identical trajectories."""
+ def _run_env_determinism(self, ids):
seed = 0
- # Iterate over two trajectories, which should have the same state and action sequence
- for env_id in ALL_ENVS:
+ for env_id in ids:
with self.subTest(msg=env_id):
traj1 = self._run_env(env_id, seed=seed)
traj2 = self._run_env(env_id, seed=seed)
@@ -99,9 +75,44 @@ class TestStepMetaWorlEnvironments(unittest.TestCase):
obs1, rwd1, done1, ac1, obs2, rwd2, done2, ac2 = time_step
self.assertTrue(np.array_equal(ac1, ac2), f"Actions [{i}] delta {ac1 - ac2} is not zero.")
self.assertTrue(np.array_equal(obs1, obs2), f"Observations [{i}] delta {obs1 - obs2} is not zero.")
- self.assertAlmostEqual(rwd1, rwd2, f"Rewards [{i}] {rwd1} and {rwd2} do not match.")
+ self.assertEqual(rwd1, rwd2, f"Rewards [{i}] {rwd1} and {rwd2} do not match.")
self.assertEqual(done1, done2, f"Dones [{i}] {done1} and {done2} do not match.")
+ def _verify_observations(self, obs, observation_space, obs_type="reset()"):
+ self.assertTrue(observation_space.contains(obs),
+ f"Observation {obs} received from {obs_type} "
+ f"not contained in observation space {observation_space}.")
+
+ def _verify_reward(self, reward):
+ self.assertIsInstance(reward, (float, int), f"Returned type {type(reward)} as reward, expected float or int.")
+
+ def _verify_done(self, done):
+ self.assertIsInstance(done, bool, f"Returned {done} as done flag, expected bool.")
+
+ def test_step_functionality(self):
+ """Tests that step environments run without errors using random actions."""
+ for env_id in CUSTOM_IDS:
+ with self.subTest(msg=env_id):
+ self._run_env(env_id)
+
+ def test_step_determinism(self):
+ """Tests that for step environments identical seeds produce identical trajectories."""
+ self._run_env_determinism(CUSTOM_IDS)
+
+ def test_bb_functionality(self):
+ """Tests that black box environments run without errors using random actions."""
+ for traj_gen, env_ids in fancy_gym.ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS.items():
+ with self.subTest(msg=traj_gen):
+ for env_id in env_ids:
+ with self.subTest(msg=env_id):
+ self._run_env(env_id)
+
+ def test_bb_determinism(self):
+ """Tests that for black box environment identical seeds produce identical trajectories."""
+ for traj_gen, env_ids in fancy_gym.ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS.items():
+ with self.subTest(msg=traj_gen):
+ self._run_env_determinism(env_ids)
+
if __name__ == '__main__':
unittest.main()
diff --git a/test/test_gym.py b/test/test_gym.py
new file mode 100644
index 0000000..c8252d0
--- /dev/null
+++ b/test/test_gym.py
@@ -0,0 +1,118 @@
+import unittest
+
+import gym
+import numpy as np
+
+import fancy_gym
+from fancy_gym import make
+
+GYM_IDS = [spec.id for spec in gym.envs.registry.all() if
+ "fancy_gym" not in spec.entry_point and 'make_bb_env_helper' not in spec.entry_point]
+SEED = 1
+
+
+class TestGymEnvironments(unittest.TestCase):
+
+ def _run_env(self, env_id, iterations=None, seed=SEED, render=False):
+ """
+ Example for running an OpenAI gym env in the step-based setting.
+ The env_id has to be specified as `env_id-vX`.
+
+ Args:
+ env_id: env id in the form `env_id-vX`
+ iterations: Number of rollout steps to run
+ seed: Random seed
+ render: Render the episode
+
+ Returns: observations, rewards, dones, actions
+
+ """
+ env: gym.Env = make(env_id, seed=seed)
+ rewards = []
+ observations = []
+ actions = []
+ dones = []
+ obs = env.reset()
+ self._verify_observations(obs, env.observation_space, "reset()")
+
+ iterations = iterations or (env.spec.max_episode_steps or 1)
+
+ # number of samples (multiple environment steps)
+ for i in range(iterations):
+ observations.append(obs)
+
+ ac = env.action_space.sample()
+ actions.append(ac)
+ # ac = np.random.uniform(env.action_space.low, env.action_space.high, env.action_space.shape)
+ obs, reward, done, info = env.step(ac)
+
+ self._verify_observations(obs, env.observation_space, "step()")
+ self._verify_reward(reward)
+ self._verify_done(done)
+
+ rewards.append(reward)
+ dones.append(done)
+
+ if render:
+ env.render("human")
+
+ if done:
+ break
+
+ assert done or env.spec.max_episode_steps is None, "Done flag is not True after end of episode."
+ observations.append(obs)
+ env.close()
+ del env
+ return np.array(observations), np.array(rewards), np.array(dones), np.array(actions)
+
+ def _run_env_determinism(self, ids):
+ seed = 0
+ for env_id in ids:
+ with self.subTest(msg=env_id):
+ traj1 = self._run_env(env_id, seed=seed)
+ traj2 = self._run_env(env_id, seed=seed)
+ for i, time_step in enumerate(zip(*traj1, *traj2)):
+ obs1, rwd1, done1, ac1, obs2, rwd2, done2, ac2 = time_step
+ self.assertTrue(np.array_equal(ac1, ac2), f"Actions [{i}] delta {ac1 - ac2} is not zero.")
+ self.assertTrue(np.array_equal(obs1, obs2), f"Observations [{i}] delta {obs1 - obs2} is not zero.")
+ self.assertEqual(rwd1, rwd2, f"Rewards [{i}] {rwd1} and {rwd2} do not match.")
+ self.assertEqual(done1, done2, f"Dones [{i}] {done1} and {done2} do not match.")
+
+ def _verify_observations(self, obs, observation_space, obs_type="reset()"):
+ self.assertTrue(observation_space.contains(obs),
+ f"Observation {obs} received from {obs_type} "
+ f"not contained in observation space {observation_space}.")
+
+ def _verify_reward(self, reward):
+ self.assertIsInstance(reward, (float, int), f"Returned type {type(reward)} as reward, expected float or int.")
+
+ def _verify_done(self, done):
+ self.assertIsInstance(done, bool, f"Returned {done} as done flag, expected bool.")
+
+ def test_step_functionality(self):
+ """Tests that step environments run without errors using random actions."""
+ for env_id in GYM_IDS:
+ with self.subTest(msg=env_id):
+ self._run_env(env_id)
+
+ def test_step_determinism(self):
+ """Tests that for step environments identical seeds produce identical trajectories."""
+ self._run_env_determinism(GYM_IDS)
+
+ def test_bb_functionality(self):
+ """Tests that black box environments run without errors using random actions."""
+ for traj_gen, env_ids in fancy_gym.ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS.items():
+ with self.subTest(msg=traj_gen):
+ for env_id in env_ids:
+ with self.subTest(msg=env_id):
+ self._run_env(env_id)
+
+ def test_bb_determinism(self):
+ """Tests that for black box environment identical seeds produce identical trajectories."""
+ for traj_gen, env_ids in fancy_gym.ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS.items():
+ with self.subTest(msg=traj_gen):
+ self._run_env_determinism(env_ids)
+
+
+if __name__ == '__main__':
+ unittest.main()
diff --git a/test/test_metaworld.py b/test/test_metaworld.py
new file mode 100644
index 0000000..f10d54a
--- /dev/null
+++ b/test/test_metaworld.py
@@ -0,0 +1,119 @@
+import unittest
+
+import gym
+import numpy as np
+from metaworld.envs import ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE
+
+import fancy_gym
+from fancy_gym import make
+
+METAWORLD_IDS = [f'metaworld:{env.split("-goal-observable")[0]}' for env, _ in
+ ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE.items()]
+SEED = 1
+
+
+class TestMetaWorldEnvironments(unittest.TestCase):
+
+ def _run_env(self, env_id, iterations=None, seed=SEED, render=False):
+ """
+ Example for running a Metaworld-based env in the step-based setting.
+ The env_id has to be specified as `metaworld:env_id-vX`.
+
+ Args:
+ env_id: env id in the form `metaworld:env_id-vX`
+ iterations: Number of rollout steps to run
+ seed: Random seed
+ render: Render the episode
+
+ Returns: observations, rewards, dones, actions
+
+ """
+ env: gym.Env = make(env_id, seed=seed)
+ rewards = []
+ observations = []
+ actions = []
+ dones = []
+ obs = env.reset()
+ self._verify_observations(obs, env.observation_space, "reset()")
+
+ iterations = iterations or (env.spec.max_episode_steps or 1)
+
+ # number of samples (multiple environment steps)
+ for i in range(iterations):
+ observations.append(obs)
+
+ ac = env.action_space.sample()
+ actions.append(ac)
+ # ac = np.random.uniform(env.action_space.low, env.action_space.high, env.action_space.shape)
+ obs, reward, done, info = env.step(ac)
+
+ self._verify_observations(obs, env.observation_space, "step()")
+ self._verify_reward(reward)
+ self._verify_done(done)
+
+ rewards.append(reward)
+ dones.append(done)
+
+ if render:
+ env.render("human")
+
+ if done:
+ break
+
+ assert done, "Done flag is not True after end of episode."
+ observations.append(obs)
+ env.close()
+ del env
+ return np.array(observations), np.array(rewards), np.array(dones), np.array(actions)
+
+ def _run_env_determinism(self, ids):
+ seed = 0
+ for env_id in ids:
+ with self.subTest(msg=env_id):
+ traj1 = self._run_env(env_id, seed=seed)
+ traj2 = self._run_env(env_id, seed=seed)
+ for i, time_step in enumerate(zip(*traj1, *traj2)):
+ obs1, rwd1, done1, ac1, obs2, rwd2, done2, ac2 = time_step
+ self.assertTrue(np.array_equal(ac1, ac2), f"Actions [{i}] delta {ac1 - ac2} is not zero.")
+ self.assertTrue(np.array_equal(obs1, obs2), f"Observations [{i}] delta {obs1 - obs2} is not zero.")
+ self.assertEqual(rwd1, rwd2, f"Rewards [{i}] {rwd1} and {rwd2} do not match.")
+ self.assertEqual(done1, done2, f"Dones [{i}] {done1} and {done2} do not match.")
+
+ def _verify_observations(self, obs, observation_space, obs_type="reset()"):
+ self.assertTrue(observation_space.contains(obs),
+ f"Observation {obs} received from {obs_type} "
+ f"not contained in observation space {observation_space}.")
+
+ def _verify_reward(self, reward):
+ self.assertIsInstance(reward, (float, int), f"Returned type {type(reward)} as reward, expected float or int.")
+
+ def _verify_done(self, done):
+ self.assertIsInstance(done, bool, f"Returned {done} as done flag, expected bool.")
+
+ def test_step_functionality(self):
+ """Tests that step environments run without errors using random actions."""
+ for env_id in METAWORLD_IDS:
+ with self.subTest(msg=env_id):
+ self._run_env(env_id)
+
+ def test_step_determinism(self):
+ """Tests that for step environments identical seeds produce identical trajectories."""
+ self._run_env_determinism(METAWORLD_IDS)
+
+ def test_bb_functionality(self):
+ """Tests that black box environments run without errors using random actions."""
+ for traj_gen, env_ids in fancy_gym.ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS.items():
+ with self.subTest(msg=traj_gen):
+ for env_id in env_ids:
+ with self.subTest(msg=env_id):
+ self._run_env(env_id)
+
+ def test_bb_determinism(self):
+ """Tests that for black box environment identical seeds produce identical trajectories."""
+ for traj_gen, env_ids in fancy_gym.ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS.items():
+ with self.subTest(msg=traj_gen):
+ self._run_env_determinism(env_ids)
+
+
+if __name__ == '__main__':
+ unittest.main()