-`fancy_gym` offers a large variety of reinforcement learning environments under the unifying interface
-of [OpenAI gym](https://gymlibrary.dev/). We provide support (under the OpenAI gym interface) for the benchmark suites
-[DeepMind Control](https://deepmind.com/research/publications/2020/dm-control-Software-and-Tasks-for-Continuous-Control)
-(DMC) and [Metaworld](https://meta-world.github.io/). If those are not sufficient and you want to create your own custom
-gym environments, use [this guide](https://www.gymlibrary.dev/content/environment_creation/). We highly appreciate it, if
-you would then submit a PR for this environment to become part of `fancy_gym`.
-In comparison to existing libraries, we additionally support to control agents with movement primitives, such as Dynamic
-Movement Primitives (DMPs) and Probabilistic Movement Primitives (ProMP).
+| :exclamation: Fancy Gym has recently received a major refactor, which also updated many of the used dependencies to current versions. The update has brought some breaking changes. If you want to access the old version, check out the [legacy branch](https://github.com/ALRhub/fancy_gym/tree/legacy). Find out more about what changed [here](https://github.com/ALRhub/fancy_gym/pull/75). |
+| --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+
+Built upon the foundation of [Gymnasium](https://gymnasium.farama.org/) (a maintained fork of OpenAI’s renowned Gym library), `fancy_gym` offers a comprehensive collection of reinforcement learning environments.
+
+**Key Features**:
+
+- **New Challenging Environments**: `fancy_gym` includes several new environments (Panda Box Pushing, Table Tennis, etc.) that present a higher degree of difficulty, pushing the boundaries of reinforcement learning research.
+- **Support for Movement Primitives**: `fancy_gym` supports a range of movement primitives (MPs), including Dynamic Movement Primitives (DMPs), Probabilistic Movement Primitives (ProMP), and Probabilistic Dynamic Movement Primitives (ProDMP).
+- **Upgrade to Movement Primitives**: With our framework, it's straightforward to transform standard Gymnasium environments into environments that support movement primitives.
+- **Benchmark Suite Compatibility**: `fancy_gym` makes it easy to access renowned benchmark suites such as [DeepMind Control](https://deepmind.com/research/publications/2020/dm-control-Software-and-Tasks-for-Continuous-Control) and [Metaworld](https://meta-world.github.io/), whether you want to use them in the regular step-based setting or using MPs.
+- **Contribute Your Own Environments**: If you're inspired to create custom gym environments, both step-based and with movement primitives, this [guide](https://gymnasium.farama.org/tutorials/gymnasium_basics/environment_creation/) will assist you. We encourage and highly appreciate submissions via PRs to integrate these environments into `fancy_gym`.
## Movement Primitive Environments (Episode-Based/Black-Box Environments)
-Unlike step-based environments, movement primitive (MP) environments are closer related to stochastic search, black-box
-optimization, and methods that are often used in traditional robotics and control. MP environments are typically
-episode-based and execute a full trajectory, which is generated by a trajectory generator, such as a Dynamic Movement
-Primitive (DMP) or a Probabilistic Movement Primitive (ProMP). The generated trajectory is translated into individual
-step-wise actions by a trajectory tracking controller. The exact choice of controller is, however, dependent on the type
-of environment. We currently support position, velocity, and PD-Controllers for position, velocity, and torque control,
-respectively as well as a special controller for the MetaWorld control suite.
-The goal of all MP environments is still to learn an optimal policy. Yet, an action represents the parametrization of
-the motion primitives to generate a suitable trajectory. Additionally, in this framework we support all of this also for
-the contextual setting, i.e. we expose the context space - a subset of the observation space - in the beginning of the
-episode. This requires to predict a new action/MP parametrization for each context.
+
+Movement primitive (MP) environments differ from traditional step-based environments. They align more with concepts from stochastic search, black-box optimization, and methods commonly found in classical robotics and control. Instead of individual steps, MP environments operate on an episode basis, executing complete trajectories. These trajectories are produced by trajectory generators like Dynamic Movement Primitives (DMP), Probabilistic Movement Primitives (ProMP) or Probabilistic Dynamic Movement Primitives (ProDMP).
+
+
+Once generated, these trajectories are converted into step-by-step actions by a trajectory tracking controller. Which controller is used depends on the environment. Currently, we support position, velocity, and PD controllers for position, velocity, and torque control, respectively, as well as a specialized controller for the MetaWorld control suite.
+
+
+While the overarching objective of MP environments remains the learning of an optimal policy, the actions here represent the parametrization of motion primitives to craft the right trajectory. Our framework further enhances this by accommodating a contextual setting. At the episode's onset, we present the context space—a subset of the observation space. This demands the prediction of a new action or MP parametrization for every unique context.
+
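To make the role of the tracking controller concrete, here is a minimal sketch of a PD controller turning a desired trajectory into step-wise torque commands. It is not Fancy Gym's implementation: the 1-D unit point mass, the gains, and the names `pd_action` and `track` are assumptions chosen purely for illustration.

```python
from typing import List, Tuple


def pd_action(des_pos: float, des_vel: float, cur_pos: float, cur_vel: float,
              p_gain: float, d_gain: float) -> float:
    """PD law: torque computed from the position and velocity tracking errors."""
    return p_gain * (des_pos - cur_pos) + d_gain * (des_vel - cur_vel)


def track(trajectory: List[Tuple[float, float]], dt: float = 0.01,
          p_gain: float = 50.0, d_gain: float = 10.0) -> float:
    """Follow (position, velocity) targets with a toy point mass; return the final position."""
    pos, vel = 0.0, 0.0
    for des_pos, des_vel in trajectory:
        torque = pd_action(des_pos, des_vel, pos, vel, p_gain, d_gain)
        # Semi-implicit Euler integration of the toy dynamics (mass = 1).
        vel += torque * dt
        pos += vel * dt
    return pos


# Desired trajectory: hold position 1.0 at zero velocity for 1000 steps.
final_pos = track([(1.0, 0.0)] * 1000)
```

In an MP environment, the list of targets would come from the trajectory generator, and each torque would be passed to the step-based environment's `step()` instead of the toy integrator.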
## Installation
1. Clone the repository
-```bash
+```bash
git clone git@github.com:ALRhub/fancy_gym.git
```
2. Go to the folder
-```bash
+```bash
cd fancy_gym
```
3. Install with
-```bash
+```bash
pip install -e .
```
-In case you want to use dm_control oder metaworld, you can install them by specifying extras
+We have a few optional dependencies. To install them as well, use
-```bash
-pip install -e .[dmc,metaworld]
+```bash
+pip install -e '.[all]' # to install all optional dependencies
+pip install -e '.[dmc,metaworld,box2d,mujoco,mujoco-legacy,jax,testing]' # or choose only those you want
```
-> **Note:**
-> While our library already fully supports the new mujoco bindings, metaworld still relies on
-> [mujoco_py](https://github.com/openai/mujoco-py), hence make sure to have mujoco 2.1 installed beforehand.
-
## How to use Fancy Gym
We will only show the basics here; we have prepared [multiple examples](fancy_gym/examples/) for a more detailed look.
-### Step-wise Environments
+### Step-Based Environments
+
+Regular step-based environments added by Fancy Gym are registered in the `fancy/` namespace.
+
+| :exclamation: Legacy versions of Fancy Gym used `fancy_gym.make(...)`. This is no longer supported and will raise an Exception on new versions. |
+| ----------------------------------------------------------------------------------------------------------------------------------------------- |
```python
+import gymnasium as gym
import fancy_gym
-env = fancy_gym.make('Reacher5d-v0', seed=1)
-obs = env.reset()
+env = gym.make('fancy/Reacher5d-v0')
+# or env = gym.make('metaworld/reach-v2') # fancy_gym allows access to all metaworld ML1 tasks via the metaworld/ NS
+# or env = gym.make('dm_control/ball_in_cup-catch-v0')
+# or env = gym.make('Reacher-v2')
+observation, info = env.reset(seed=1)
 for i in range(1000):
     action = env.action_space.sample()
-    obs, reward, done, info = env.step(action)
+    observation, reward, terminated, truncated, info = env.step(action)
     if i % 5 == 0:
         env.render()
-    if done:
-        obs = env.reset()
-```
-
-When using `dm_control` tasks we expect the `env_id` to be specified as `dmc:domain_name-task_name` or for manipulation
-tasks as `dmc:manipulation-environment_name`. For `metaworld` tasks, we require the structure `metaworld:env_id-v2`, our
-custom tasks and standard gym environments can be created without prefixes.
+    if terminated or truncated:
+        observation, info = env.reset()
+```
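
Code written against the old Gym API often expects a single `done` flag. A minimal sketch of bridging the two conventions is shown below. The helper name `to_legacy_step` and the `"truncated"` info key are hypothetical and not part of Fancy Gym or Gymnasium.

```python
from typing import Any, Dict, Tuple


def to_legacy_step(step_result: Tuple[Any, float, bool, bool, Dict[str, Any]]
                   ) -> Tuple[Any, float, bool, Dict[str, Any]]:
    """Collapse (obs, reward, terminated, truncated, info) into the old 4-tuple."""
    observation, reward, terminated, truncated, info = step_result
    # Keep the truncation flag in info so time-limit cut-offs stay distinguishable.
    info = {**info, "truncated": truncated}
    return observation, reward, terminated or truncated, info


# A truncated (but not terminated) step still counts as done in the old convention.
legacy = to_legacy_step(([0.0], 1.0, False, True, {}))
```

Such a shim can ease migration, but new code should handle `terminated` and `truncated` separately, as in the example above.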
### Black-box Environments
-All environments provide by default the cumulative episode reward, this can however be changed if necessary. Optionally,
-each environment returns all collected information from each step as part of the infos. This information is, however,
-mainly meant for debugging as well as logging and not for training.
+By default, all environments return the cumulative episode reward; this can be changed if necessary. Optionally, each environment returns all information collected at each step as part of the infos. This information is mainly meant for debugging and logging, not for training.
-|Key| Description|Type
-|---|---|---|
-`positions`| Generated trajectory from MP | Optional
-`velocities`| Generated trajectory from MP | Optional
-`step_actions`| Step-wise executed action based on controller output | Optional
-`step_observations`| Step-wise intermediate observations | Optional
-`step_rewards`| Step-wise rewards | Optional
-`trajectory_length`| Total number of environment interactions | Always
-`other`| All other information from the underlying environment are returned as a list with length `trajectory_length` maintaining the original key. In case some information are not provided every time step, the missing values are filled with `None`. | Always
+| Key | Description | Type |
+| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -------- |
+| `positions` | Generated trajectory from MP | Optional |
+| `velocities` | Generated trajectory from MP | Optional |
+| `step_actions` | Step-wise executed action based on controller output | Optional |
+| `step_observations` | Step-wise intermediate observations | Optional |
+| `step_rewards` | Step-wise rewards | Optional |
+| `trajectory_length` | Total number of environment interactions | Always |
+| `other` | All other information from the underlying environment are returned as a list with length `trajectory_length` maintaining the original key. In case some information are not provided every time step, the missing values are filled with `None`. | Always |
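
As a rough illustration of the cumulative episode reward, the sketch below reduces step-wise rewards to a single trajectory return, mirroring the wrapper's `reward_aggregation(rewards[:t + 1])` call. The function `aggregate` and the sample numbers are assumptions for illustration; how the aggregation is configured in Fancy Gym is not shown here.

```python
from typing import Callable, Sequence


def aggregate(step_rewards: Sequence[float], steps_executed: int,
              reward_aggregation: Callable[[Sequence[float]], float] = sum) -> float:
    """Reduce the rewards of the executed steps to one trajectory return."""
    # Only the steps that were actually executed contribute.
    return reward_aggregation(step_rewards[:steps_executed])


step_rewards = [1.0, 2.0, 3.0, 0.0, 0.0]  # buffer pre-allocated for 5 steps
episode_return = aggregate(step_rewards, steps_executed=3)  # cumulative reward
mean_return = aggregate(step_rewards, 3, lambda r: sum(r) / len(r))  # an alternative
```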
-Existing MP tasks can be created the same way as above. Just keep in mind, calling `step()` executes a full trajectory.
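+Existing MP tasks can be created the same way as above. The namespace of the MP variant of an environment is the original namespace followed by an underscore and the MP type, e.g. `fancy_ProMP/` for the ProMP variants of environments in `fancy/`.
+Just keep in mind that calling `step()` executes a full trajectory.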
-> **Note:**
+> **Note:**
> Currently, we are also in the process of enabling replanning as well as learning of sub-trajectories.
> This allows splitting the episode into multiple trajectories and is a hybrid setting between step-based and
> black-box learning.
@@ -105,30 +114,38 @@ Existing MP tasks can be created the same way as above. Just keep in mind, calli
> Feel free to try it and open an issue with any problems that occur.
```python
+import gymnasium as gym
import fancy_gym
-env = fancy_gym.make('Reacher5dProMP-v0', seed=1)
+env = gym.make('fancy_ProMP/Reacher5d-v0')
+# or env = gym.make('metaworld_ProDMP/reach-v2')
+# or env = gym.make('dm_control_DMP/ball_in_cup-catch-v0')
+# or env = gym.make('gym_ProMP/Reacher-v2') # mp versions of envs added directly by gymnasium are in the gym_ NS
+
# render() can be called once in the beginning with all necessary arguments.
-# To turn it of again just call render() without any arguments.
+# To turn it off again just call render() without any arguments.
env.render(mode='human')
# This returns the context information, not the full state observation
-obs = env.reset()
+observation, info = env.reset(seed=1)
 for i in range(5):
     action = env.action_space.sample()
-    obs, reward, done, info = env.step(action)
+    observation, reward, terminated, truncated, info = env.step(action)
-    # Done is always True as we are working on the episode level, hence we always reset()
-    obs = env.reset()
+    # terminated or truncated is always True as we are working on the episode level, hence we always reset()
+    observation, info = env.reset()
```
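
The naming scheme used in the IDs above can be summarized in a small helper. This is a sketch inferred from the example IDs only. `mp_env_id` is a hypothetical function, not part of Fancy Gym, and it merely builds the string without checking that the environment is registered.

```python
def mp_env_id(step_based_id: str, mp_type: str) -> str:
    """Build the ID of the MP variant, e.g. 'fancy/Reacher5d-v0' -> 'fancy_ProMP/Reacher5d-v0'."""
    if mp_type not in ("DMP", "ProMP", "ProDMP"):
        raise ValueError(f"Unknown MP type: {mp_type}")
    namespace, sep, name = step_based_id.partition("/")
    if not sep:
        # IDs without a namespace (plain Gymnasium envs) use the 'gym' namespace.
        namespace, name = "gym", step_based_id
    return f"{namespace}_{mp_type}/{name}"


fancy_id = mp_env_id("fancy/Reacher5d-v0", "ProMP")
gym_id = mp_env_id("Reacher-v2", "ProMP")
```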
To show all available environments, we provide some additional convenience variables. All of them return a dictionary
-with two keys `DMP` and `ProMP` that store a list of available environment ids.
+with the keys `DMP`, `ProMP`, `ProDMP` and `all` that store a list of available environment ids.
```python
import fancy_gym
+print("All Black-box tasks:")
+print(fancy_gym.ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS)
+
print("Fancy Black-box tasks:")
print(fancy_gym.ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS)
@@ -140,6 +157,9 @@ print(fancy_gym.ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS)
print("MetaWorld Black-box tasks:")
print(fancy_gym.ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS)
+
+print("If you add custom envs, their mp versions will be found in:")
+print(fancy_gym.MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS[''])
```
### How to create a new MP task
@@ -151,23 +171,27 @@ hand, the following [interface](fancy_gym/black_box/raw_interface_wrapper.py) ne
from abc import abstractmethod
from typing import Union, Tuple
-import gym
+import gymnasium as gym
import numpy as np
class RawInterfaceWrapper(gym.Wrapper):
+ mp_config = {
+ 'ProMP': {},
+ 'DMP': {},
+ 'ProDMP': {},
+ }
@property
def context_mask(self) -> np.ndarray:
"""
- Returns boolean mask of the same shape as the observation space.
- It determines whether the observation is returned for the contextual case or not.
- This effectively allows to filter unwanted or unnecessary observations from the full step-based case.
- E.g. Velocities starting at 0 are only changing after the first action. Given we only receive the
- context/part of the first observation, the velocities are not necessary in the observation for the task.
- Returns:
- bool array representing the indices of the observations
-
+ Returns boolean mask of the same shape as the observation space.
+ It determines whether the observation is returned for the contextual case or not.
+ This effectively allows to filter unwanted or unnecessary observations from the full step-based case.
+ E.g. Velocities starting at 0 are only changing after the first action. Given we only receive the
+ context/part of the first observation, the velocities are not necessary in the observation for the task.
+ Returns:
+ bool array representing the indices of the observations
"""
return np.ones(self.env.observation_space.shape[0], dtype=bool)
@@ -197,34 +221,91 @@ class RawInterfaceWrapper(gym.Wrapper):
```
+Default configurations for MPs can be overwritten by defining attributes in `mp_config`.
+Available parameters are documented in the [MP_PyTorch Userguide](https://github.com/ALRhub/MP_PyTorch/blob/main/doc/README.md).
+
+```python
+class RawInterfaceWrapper(gym.Wrapper):
+    mp_config = {
+        'ProMP': {
+            'phase_generator_kwargs': {
+                'phase_generator_type': 'linear'
+                # When selecting another generator type, the default configuration will not be merged for the attribute.
+            },
+            'controller_kwargs': {
+                'p_gains': 0.5 * np.array([1.0, 4.0, 2.0, 4.0, 1.0, 4.0, 1.0]),
+                'd_gains': 0.5 * np.array([0.1, 0.4, 0.2, 0.4, 0.1, 0.4, 0.1]),
+            },
+            'basis_generator_kwargs': {
+                'num_basis': 3,
+                'num_basis_zero_start': 1,
+                'num_basis_zero_goal': 1,
+            },
+        },
+        'DMP': {},
+        'ProDMP': {},
+    }
+
+    [...]
+```
+
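One way to picture how such overrides could combine with the defaults is sketched below. The merge semantics are an assumption inferred from the comment in the example above (a changed `*_type` discards the defaults of that block). `merge_mp_config` is a hypothetical helper, not Fancy Gym's actual merge code, and the default values are sample numbers.

```python
from copy import deepcopy
from typing import Any, Dict


def merge_mp_config(defaults: Dict[str, Dict[str, Any]],
                    overrides: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Merge per-block overrides (e.g. 'controller_kwargs') into a copy of the defaults."""
    merged = deepcopy(defaults)
    for block, params in overrides.items():
        base = merged.get(block, {})
        type_key = block.replace("_kwargs", "_type")
        if type_key in params and params[type_key] != base.get(type_key):
            # A different generator/controller type: defaults of the old type no longer apply.
            base = {}
        merged[block] = {**base, **params}
    return merged


defaults = {
    "phase_generator_kwargs": {"phase_generator_type": "exp", "alpha_phase": 2},
    "basis_generator_kwargs": {"basis_generator_type": "rbf", "num_basis": 5},
}
overrides = {
    "phase_generator_kwargs": {"phase_generator_type": "linear"},
    "basis_generator_kwargs": {"num_basis": 3},
}
config = merge_mp_config(defaults, overrides)
```

Here the phase generator block is replaced entirely because its type changed, while the basis generator block keeps its default type and only `num_basis` is overridden.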
If you created a new task wrapper, feel free to open a PR, so we can integrate it for others to use as well. Without the
integration the task can still be used. A rough outline is shown here; for more details we recommend having a look
at the [examples](fancy_gym/examples/).
+If the step-based environment is already registered with gym, you can simply do the following:
+
```python
-import fancy_gym
+fancy_gym.upgrade(
+ id='custom/cool_new_env-v0',
+ mp_wrapper=my_custom_MPWrapper
+)
+```
-# Base environment name, according to structure of above example
-base_env_id = "dmc:ball_in_cup-catch"
+If the step-based environment is not yet registered with gym, we can add both the step-based and MP versions via
-# Replace this wrapper with the custom wrapper for your environment by inheriting from the RawInferfaceWrapper.
-# You can also add other gym.Wrappers in case they are needed,
-# e.g. gym.wrappers.FlattenObservation for dict observations
-wrappers = [fancy_gym.dmc.suite.ball_in_cup.MPWrapper]
-kwargs = {...}
-env = fancy_gym.make_bb(base_env_id, wrappers=wrappers, seed=0, **kwargs)
+```python
+fancy_gym.register(
+ id='custom/cool_new_env-v0',
+ entry_point=my_custom_env,
+ mp_wrapper=my_custom_MPWrapper
+)
+```
+
+From this point on, you can access the MP versions of your environments via
+
+```python
+env = gym.make('custom_ProDMP/cool_new_env-v0')
rewards = 0
-obs = env.reset()
+observation, info = env.reset()
# number of samples/full trajectories (multiple environment steps)
 for i in range(5):
     ac = env.action_space.sample()
-    obs, reward, done, info = env.step(ac)
+    observation, reward, terminated, truncated, info = env.step(ac)
     rewards += reward
-    if done:
-        print(base_env_id, rewards)
+    if terminated or truncated:
+        print(rewards)
         rewards = 0
-        obs = env.reset()
+        observation, info = env.reset()
```
+
+## Citing the Project
+
+To cite this repository in publications:
+
+```bibtex
+@software{fancy_gym,
+ title = {Fancy Gym},
+ author = {Otto, Fabian and Celik, Onur and Roth, Dominik and Zhou, Hongyi},
+ abstract = {Fancy Gym: Unifying interface for various RL benchmarks with support for Black Box approaches.},
+ url = {https://github.com/ALRhub/fancy_gym},
+ organization = {Autonomous Learning Robots Lab (ALR) at KIT},
+}
+```
+
+## Icon Attribution
+
+The icon is based on the [Gymnasium](https://github.com/Farama-Foundation/Gymnasium) icon as can be found [here](https://gymnasium.farama.org/_static/img/gymnasium_black.svg).
diff --git a/fancy_gym/__init__.py b/fancy_gym/__init__.py
index f6f690a..c3adaad 100644
--- a/fancy_gym/__init__.py
+++ b/fancy_gym/__init__.py
@@ -1,13 +1,17 @@
from fancy_gym import dmc, meta, open_ai
-from fancy_gym.utils.make_env_helpers import make, make_bb, make_rank
-from .dmc import ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS
-# Convenience function for all MP environments
-from .envs import ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS
-from .meta import ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS
-from .open_ai import ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS
+from fancy_gym import envs as fancy
+from fancy_gym.utils.make_env_helpers import make_bb
+from .envs.registry import register, upgrade
+from .envs.registry import ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS, MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS
-ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS = {
- key: value + ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS[key] +
- ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS[key] +
- ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS[key]
- for key, value in ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS.items()}
+ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS = MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS['dm_control']
+ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS = MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS['fancy']
+ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS = MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS['metaworld']
+ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS = MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS['gym']
+
+
+def make(*args, **kwargs):
+ """
+ As part of the refactor of Fancy Gym and upgrade to gymnasium the use of fancy_gym.make has been discontinued. Regular gym.make should be used instead. For more details check out the GitHub README. If your codebase was built for older versions of Fancy Gym and relies on the old behavior and dependency versions, please check out the legacy branch.
+ """
+ raise Exception('As part of the refactor of Fancy Gym and upgrade to gymnasium the use of fancy_gym.make has been discontinued. Regular gym.make should be used instead. For more details check out the GitHub README. If your codebase was built for older versions of Fancy Gym and relies on the old behavior and dependency versions, please check out the legacy branch.')
diff --git a/fancy_gym/black_box/black_box_wrapper.py b/fancy_gym/black_box/black_box_wrapper.py
index 7c33428..6da24c7 100644
--- a/fancy_gym/black_box/black_box_wrapper.py
+++ b/fancy_gym/black_box/black_box_wrapper.py
@@ -1,8 +1,9 @@
-from typing import Tuple, Optional, Callable
+from typing import Tuple, Optional, Callable, Dict, Any
-import gym
+import gymnasium as gym
import numpy as np
-from gym import spaces
+from gymnasium import spaces
+from gymnasium.core import ObsType
from mp_pytorch.mp.mp_interfaces import MPInterface
from fancy_gym.black_box.controller.base_controller import BaseController
@@ -67,7 +68,8 @@ class BlackBoxWrapper(gym.ObservationWrapper):
self.reward_aggregation = reward_aggregation
# spaces
- self.return_context_observation = not (learn_sub_trajectories or self.do_replanning)
+ self.return_context_observation = not (
+ learn_sub_trajectories or self.do_replanning)
self.traj_gen_action_space = self._get_traj_gen_action_space()
self.action_space = self._get_action_space()
self.observation_space = self._get_observation_space()
@@ -99,14 +101,17 @@ class BlackBoxWrapper(gym.ObservationWrapper):
# If we do not do this, the traj_gen assumes we are continuing the trajectory.
self.traj_gen.reset()
- clipped_params = np.clip(action, self.traj_gen_action_space.low, self.traj_gen_action_space.high)
+ clipped_params = np.clip(
+ action, self.traj_gen_action_space.low, self.traj_gen_action_space.high)
self.traj_gen.set_params(clipped_params)
- init_time = np.array(0 if not self.do_replanning else self.current_traj_steps * self.dt)
+ init_time = np.array(
+ 0 if not self.do_replanning else self.current_traj_steps * self.dt)
- condition_pos = self.condition_pos if self.condition_pos is not None else self.current_pos
- condition_vel = self.condition_vel if self.condition_vel is not None else self.current_vel
+ condition_pos = self.condition_pos if self.condition_pos is not None else self.env.get_wrapper_attr('current_pos')
+ condition_vel = self.condition_vel if self.condition_vel is not None else self.env.get_wrapper_attr('current_vel')
- self.traj_gen.set_initial_conditions(init_time, condition_pos, condition_vel)
+ self.traj_gen.set_initial_conditions(
+ init_time, condition_pos, condition_vel)
self.traj_gen.set_duration(duration, self.dt)
position = get_numpy(self.traj_gen.get_traj_pos())
@@ -153,7 +158,8 @@ class BlackBoxWrapper(gym.ObservationWrapper):
trajectory_length = len(position)
rewards = np.zeros(shape=(trajectory_length,))
if self.verbose >= 2:
- actions = np.zeros(shape=(trajectory_length,) + self.env.action_space.shape)
+ actions = np.zeros(shape=(trajectory_length,) +
+ self.env.action_space.shape)
observations = np.zeros(shape=(trajectory_length,) + self.env.observation_space.shape,
dtype=self.env.observation_space.dtype)
@@ -161,16 +167,18 @@ class BlackBoxWrapper(gym.ObservationWrapper):
done = False
if not traj_is_valid:
- obs, trajectory_return, done, infos = self.env.invalid_traj_callback(action, position, velocity,
- self.return_context_observation,
- self.tau_bound, self.delay_bound)
- return self.observation(obs), trajectory_return, done, infos
+ obs, trajectory_return, terminated, truncated, infos = self.env.invalid_traj_callback(action, position, velocity,
+ self.return_context_observation, self.tau_bound, self.delay_bound)
+ return self.observation(obs), trajectory_return, terminated, truncated, infos
self.plan_steps += 1
for t, (pos, vel) in enumerate(zip(position, velocity)):
- step_action = self.tracking_controller.get_action(pos, vel, self.current_pos, self.current_vel)
- c_action = np.clip(step_action, self.env.action_space.low, self.env.action_space.high)
- obs, c_reward, done, info = self.env.step(c_action)
+ step_action = self.tracking_controller.get_action(
+ pos, vel, self.env.get_wrapper_attr('current_pos'), self.env.get_wrapper_attr('current_vel'))
+ c_action = np.clip(
+ step_action, self.env.action_space.low, self.env.action_space.high)
+ obs, c_reward, terminated, truncated, info = self.env.step(
+ c_action)
rewards[t] = c_reward
if self.verbose >= 2:
@@ -185,9 +193,7 @@ class BlackBoxWrapper(gym.ObservationWrapper):
if self.render_kwargs:
self.env.render(**self.render_kwargs)
- if done or (self.replanning_schedule(self.current_pos, self.current_vel, obs, c_action,
- t + 1 + self.current_traj_steps)
- and self.plan_steps < self.max_planning_times):
+ if terminated or truncated or (self.replanning_schedule(self.env.get_wrapper_attr('current_pos'), self.env.get_wrapper_attr('current_vel'), obs, c_action, t + 1 + self.current_traj_steps) and self.plan_steps < self.max_planning_times):
if self.condition_on_desired:
self.condition_pos = pos
@@ -207,17 +213,18 @@ class BlackBoxWrapper(gym.ObservationWrapper):
infos['trajectory_length'] = t + 1
trajectory_return = self.reward_aggregation(rewards[:t + 1])
- return self.observation(obs), trajectory_return, done, infos
+ return self.observation(obs), trajectory_return, terminated, truncated, infos
def render(self, **kwargs):
"""Only set render options here, such that they can be used during the rollout.
This only needs to be called once"""
self.render_kwargs = kwargs
- def reset(self, *, seed: Optional[int] = None, return_info: bool = False, options: Optional[dict] = None):
+ def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
+ -> Tuple[ObsType, Dict[str, Any]]:
self.current_traj_steps = 0
self.plan_steps = 0
self.traj_gen.reset()
self.condition_pos = None
self.condition_vel = None
- return super(BlackBoxWrapper, self).reset()
+ return super(BlackBoxWrapper, self).reset(seed=seed, options=options)
diff --git a/fancy_gym/black_box/factory/controller_factory.py b/fancy_gym/black_box/factory/controller_factory.py
index 8b2d865..7a4bc34 100644
--- a/fancy_gym/black_box/factory/controller_factory.py
+++ b/fancy_gym/black_box/factory/controller_factory.py
@@ -11,11 +11,11 @@ def get_controller(controller_type: str, **kwargs):
if controller_type == "motor":
return PDController(**kwargs)
elif controller_type == "velocity":
- return VelController()
+ return VelController(**kwargs)
elif controller_type == "position":
- return PosController()
+ return PosController(**kwargs)
elif controller_type == "metaworld":
- return MetaWorldController()
+ return MetaWorldController(**kwargs)
else:
raise ValueError(f"Specified controller type {controller_type} not supported, "
f"please choose one of {ALL_TYPES}.")
diff --git a/fancy_gym/black_box/raw_interface_wrapper.py b/fancy_gym/black_box/raw_interface_wrapper.py
index c8f7273..6dba765 100644
--- a/fancy_gym/black_box/raw_interface_wrapper.py
+++ b/fancy_gym/black_box/raw_interface_wrapper.py
@@ -1,6 +1,6 @@
from typing import Union, Tuple
-import gym
+import gymnasium as gym
import numpy as np
from mp_pytorch.mp.mp_interfaces import MPInterface
@@ -114,7 +114,8 @@ class RawInterfaceWrapper(gym.Wrapper):
Returns:
obs: artificial observation if the trajectory is invalid, by default a zero vector
reward: artificial reward if the trajectory is invalid, by default 0
- done: artificial done if the trajectory is invalid, by default True
+ terminated: artificial terminated if the trajectory is invalid, by default True
+ truncated: artificial truncated if the trajectory is invalid, by default False
info: artificial info if the trajectory is invalid, by default empty dict
"""
- return np.zeros(1), 0, True, {}
\ No newline at end of file
+ return np.zeros(1), 0, True, False, {}
diff --git a/fancy_gym/dmc/README.MD b/fancy_gym/dmc/README.MD
index 040a9a0..a360e44 100644
--- a/fancy_gym/dmc/README.MD
+++ b/fancy_gym/dmc/README.MD
@@ -1,7 +1,7 @@
# DeepMind Control (DMC) Wrappers
-These are the Environment Wrappers for selected
-[DeepMind Control](https://deepmind.com/research/publications/2020/dm-control-Software-and-Tasks-for-Continuous-Control)
+These are the Environment Wrappers for selected
+[DeepMind Control](https://deepmind.com/research/publications/2020/dm-control-Software-and-Tasks-for-Continuous-Control)
environments in order to use our Motion Primitive gym interface with them.
## MP Environments
@@ -9,11 +9,11 @@ environments in order to use our Motion Primitive gym interface with them.
[//]: <> (These environments are wrapped-versions of their Deep Mind Control Suite (DMC) counterparts. Given most task can be)
[//]: <> (solved in shorter horizon lengths than the original 1000 steps, we often shorten the episodes for those task.)
-|Name| Description|Trajectory Horizon|Action Dimension|Context Dimension
-|---|---|---|---|---|
-|`dmc_ball_in_cup-catch_promp-v0`| A ProMP wrapped version of the "catch" task for the "ball_in_cup" environment. | 1000 | 10 | 2
-|`dmc_ball_in_cup-catch_dmp-v0`| A DMP wrapped version of the "catch" task for the "ball_in_cup" environment. | 1000| 10 | 2
-|`dmc_reacher-easy_promp-v0`| A ProMP wrapped version of the "easy" task for the "reacher" environment. | 1000 | 10 | 4
-|`dmc_reacher-easy_dmp-v0`| A DMP wrapped version of the "easy" task for the "reacher" environment. | 1000| 10 | 4
-|`dmc_reacher-hard_promp-v0`| A ProMP wrapped version of the "hard" task for the "reacher" environment.| 1000 | 10 | 4
-|`dmc_reacher-hard_dmp-v0`| A DMP wrapped version of the "hard" task for the "reacher" environment. | 1000 | 10 | 4
+| Name | Description | Trajectory Horizon | Action Dimension | Context Dimension |
+| ---------------------------------------- | ------------------------------------------------------------------------------ | ------------------ | ---------------- | ----------------- |
+| `dm_control_ProMP/ball_in_cup-catch-v0` | A ProMP wrapped version of the "catch" task for the "ball_in_cup" environment. | 1000 | 10 | 2 |
+| `dm_control_DMP/ball_in_cup-catch-v0` | A DMP wrapped version of the "catch" task for the "ball_in_cup" environment. | 1000 | 10 | 2 |
+| `dm_control_ProMP/reacher-easy-v0` | A ProMP wrapped version of the "easy" task for the "reacher" environment. | 1000 | 10 | 4 |
+| `dm_control_DMP/reacher-easy-v0` | A DMP wrapped version of the "easy" task for the "reacher" environment. | 1000 | 10 | 4 |
+| `dm_control_ProMP/reacher-hard-v0` | A ProMP wrapped version of the "hard" task for the "reacher" environment. | 1000 | 10 | 4 |
+| `dm_control_DMP/reacher-hard-v0` | A DMP wrapped version of the "hard" task for the "reacher" environment. | 1000 | 10 | 4 |
diff --git a/fancy_gym/dmc/__init__.py b/fancy_gym/dmc/__init__.py
index 397e6fa..7fcebba 100644
--- a/fancy_gym/dmc/__init__.py
+++ b/fancy_gym/dmc/__init__.py
@@ -1,245 +1,61 @@
from copy import deepcopy
+from gymnasium.wrappers import FlattenObservation
+from gymnasium.envs.registration import register
+
+from ..envs.registry import register
+
from . import manipulation, suite
-ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS = {"DMP": [], "ProMP": [], "ProDMP": []}
-
-from gym.envs.registration import register
-
-DEFAULT_BB_DICT_ProMP = {
- "name": 'EnvName',
- "wrappers": [],
- "trajectory_generator_kwargs": {
- 'trajectory_generator_type': 'promp'
- },
- "phase_generator_kwargs": {
- 'phase_generator_type': 'linear'
- },
- "controller_kwargs": {
- 'controller_type': 'motor',
- "p_gains": 50.,
- "d_gains": 1.,
- },
- "basis_generator_kwargs": {
- 'basis_generator_type': 'zero_rbf',
- 'num_basis': 5,
- 'num_basis_zero_start': 1
- }
-}
-
-DEFAULT_BB_DICT_DMP = {
- "name": 'EnvName',
- "wrappers": [],
- "trajectory_generator_kwargs": {
- 'trajectory_generator_type': 'dmp'
- },
- "phase_generator_kwargs": {
- 'phase_generator_type': 'exp'
- },
- "controller_kwargs": {
- 'controller_type': 'motor',
- "p_gains": 50.,
- "d_gains": 1.,
- },
- "basis_generator_kwargs": {
- 'basis_generator_type': 'rbf',
- 'num_basis': 5
- }
-}
-
# DeepMind Control Suite (DMC)
-kwargs_dict_bic_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
-kwargs_dict_bic_dmp['name'] = f"dmc:ball_in_cup-catch"
-kwargs_dict_bic_dmp['wrappers'].append(suite.ball_in_cup.MPWrapper)
-# bandwidth_factor=2
-kwargs_dict_bic_dmp['phase_generator_kwargs']['alpha_phase'] = 2
-kwargs_dict_bic_dmp['trajectory_generator_kwargs']['weight_scale'] = 10 # TODO: weight scale 1, but goal scale 0.1
register(
- id=f'dmc_ball_in_cup-catch_dmp-v0',
- entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
- kwargs=kwargs_dict_bic_dmp
+ id=f"dm_control/ball_in_cup-catch-v0",
+ register_step_based=False,
+ mp_wrapper=suite.ball_in_cup.MPWrapper,
+ add_mp_types=['DMP', 'ProMP'],
)
-ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append("dmc_ball_in_cup-catch_dmp-v0")
-kwargs_dict_bic_promp = deepcopy(DEFAULT_BB_DICT_DMP)
-kwargs_dict_bic_promp['name'] = f"dmc:ball_in_cup-catch"
-kwargs_dict_bic_promp['wrappers'].append(suite.ball_in_cup.MPWrapper)
register(
- id=f'dmc_ball_in_cup-catch_promp-v0',
- entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
- kwargs=kwargs_dict_bic_promp
+ id=f"dm_control/reacher-easy-v0",
+ register_step_based=False,
+ mp_wrapper=suite.reacher.MPWrapper,
+ add_mp_types=['DMP', 'ProMP'],
)
-ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append("dmc_ball_in_cup-catch_promp-v0")
-kwargs_dict_reacher_easy_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
-kwargs_dict_reacher_easy_dmp['name'] = f"dmc:reacher-easy"
-kwargs_dict_reacher_easy_dmp['wrappers'].append(suite.reacher.MPWrapper)
-# bandwidth_factor=2
-kwargs_dict_reacher_easy_dmp['phase_generator_kwargs']['alpha_phase'] = 2
-# TODO: weight scale 50, but goal scale 0.1
-kwargs_dict_reacher_easy_dmp['trajectory_generator_kwargs']['weight_scale'] = 500
register(
- id=f'dmc_reacher-easy_dmp-v0',
- entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
- kwargs=kwargs_dict_bic_dmp
+ id=f"dm_control/reacher-hard-v0",
+ register_step_based=False,
+ mp_wrapper=suite.reacher.MPWrapper,
+ add_mp_types=['DMP', 'ProMP'],
)
-ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append("dmc_reacher-easy_dmp-v0")
-
-kwargs_dict_reacher_easy_promp = deepcopy(DEFAULT_BB_DICT_DMP)
-kwargs_dict_reacher_easy_promp['name'] = f"dmc:reacher-easy"
-kwargs_dict_reacher_easy_promp['wrappers'].append(suite.reacher.MPWrapper)
-kwargs_dict_reacher_easy_promp['trajectory_generator_kwargs']['weight_scale'] = 0.2
-register(
- id=f'dmc_reacher-easy_promp-v0',
- entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
- kwargs=kwargs_dict_reacher_easy_promp
-)
-ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append("dmc_reacher-easy_promp-v0")
-
-kwargs_dict_reacher_hard_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
-kwargs_dict_reacher_hard_dmp['name'] = f"dmc:reacher-hard"
-kwargs_dict_reacher_hard_dmp['wrappers'].append(suite.reacher.MPWrapper)
-# bandwidth_factor = 2
-kwargs_dict_reacher_hard_dmp['phase_generator_kwargs']['alpha_phase'] = 2
-# TODO: weight scale 50, but goal scale 0.1
-kwargs_dict_reacher_hard_dmp['trajectory_generator_kwargs']['weight_scale'] = 500
-register(
- id=f'dmc_reacher-hard_dmp-v0',
- entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
- kwargs=kwargs_dict_reacher_hard_dmp
-)
-ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append("dmc_reacher-hard_dmp-v0")
-
-kwargs_dict_reacher_hard_promp = deepcopy(DEFAULT_BB_DICT_DMP)
-kwargs_dict_reacher_hard_promp['name'] = f"dmc:reacher-hard"
-kwargs_dict_reacher_hard_promp['wrappers'].append(suite.reacher.MPWrapper)
-kwargs_dict_reacher_hard_promp['trajectory_generator_kwargs']['weight_scale'] = 0.2
-register(
- id=f'dmc_reacher-hard_promp-v0',
- entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
- kwargs=kwargs_dict_reacher_hard_promp
-)
-ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append("dmc_reacher-hard_promp-v0")
_dmc_cartpole_tasks = ["balance", "balance_sparse", "swingup", "swingup_sparse"]
-
for _task in _dmc_cartpole_tasks:
- _env_id = f'dmc_cartpole-{_task}_dmp-v0'
- kwargs_dict_cartpole_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
- kwargs_dict_cartpole_dmp['name'] = f"dmc:cartpole-{_task}"
- kwargs_dict_cartpole_dmp['wrappers'].append(suite.cartpole.MPWrapper)
- # bandwidth_factor = 2
- kwargs_dict_cartpole_dmp['phase_generator_kwargs']['alpha_phase'] = 2
- # TODO: weight scale 50, but goal scale 0.1
- kwargs_dict_cartpole_dmp['trajectory_generator_kwargs']['weight_scale'] = 500
- kwargs_dict_cartpole_dmp['controller_kwargs']['p_gains'] = 10
- kwargs_dict_cartpole_dmp['controller_kwargs']['d_gains'] = 10
register(
- id=_env_id,
- entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
- kwargs=kwargs_dict_cartpole_dmp
+ id=f'dm_control/cartpole-{_task}-v0',
+ register_step_based=False,
+ mp_wrapper=suite.cartpole.MPWrapper,
+ add_mp_types=['DMP', 'ProMP'],
)
- ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id)
- _env_id = f'dmc_cartpole-{_task}_promp-v0'
- kwargs_dict_cartpole_promp = deepcopy(DEFAULT_BB_DICT_DMP)
- kwargs_dict_cartpole_promp['name'] = f"dmc:cartpole-{_task}"
- kwargs_dict_cartpole_promp['wrappers'].append(suite.cartpole.MPWrapper)
- kwargs_dict_cartpole_promp['controller_kwargs']['p_gains'] = 10
- kwargs_dict_cartpole_promp['controller_kwargs']['d_gains'] = 10
- kwargs_dict_cartpole_promp['trajectory_generator_kwargs']['weight_scale'] = 0.2
- register(
- id=_env_id,
- entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
- kwargs=kwargs_dict_cartpole_promp
- )
- ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
-
-kwargs_dict_cartpole2poles_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
-kwargs_dict_cartpole2poles_dmp['name'] = f"dmc:cartpole-two_poles"
-kwargs_dict_cartpole2poles_dmp['wrappers'].append(suite.cartpole.TwoPolesMPWrapper)
-# bandwidth_factor = 2
-kwargs_dict_cartpole2poles_dmp['phase_generator_kwargs']['alpha_phase'] = 2
-# TODO: weight scale 50, but goal scale 0.1
-kwargs_dict_cartpole2poles_dmp['trajectory_generator_kwargs']['weight_scale'] = 500
-kwargs_dict_cartpole2poles_dmp['controller_kwargs']['p_gains'] = 10
-kwargs_dict_cartpole2poles_dmp['controller_kwargs']['d_gains'] = 10
-_env_id = f'dmc_cartpole-two_poles_dmp-v0'
register(
- id=_env_id,
- entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
- kwargs=kwargs_dict_cartpole2poles_dmp
+ id=f"dm_control/cartpole-two_poles-v0",
+ register_step_based=False,
+ mp_wrapper=suite.cartpole.TwoPolesMPWrapper,
+ add_mp_types=['DMP', 'ProMP'],
)
-ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id)
-kwargs_dict_cartpole2poles_promp = deepcopy(DEFAULT_BB_DICT_DMP)
-kwargs_dict_cartpole2poles_promp['name'] = f"dmc:cartpole-two_poles"
-kwargs_dict_cartpole2poles_promp['wrappers'].append(suite.cartpole.TwoPolesMPWrapper)
-kwargs_dict_cartpole2poles_promp['controller_kwargs']['p_gains'] = 10
-kwargs_dict_cartpole2poles_promp['controller_kwargs']['d_gains'] = 10
-kwargs_dict_cartpole2poles_promp['trajectory_generator_kwargs']['weight_scale'] = 0.2
-_env_id = f'dmc_cartpole-two_poles_promp-v0'
register(
- id=_env_id,
- entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
- kwargs=kwargs_dict_cartpole2poles_promp
+ id=f"dm_control/cartpole-three_poles-v0",
+ register_step_based=False,
+ mp_wrapper=suite.cartpole.ThreePolesMPWrapper,
+ add_mp_types=['DMP', 'ProMP'],
)
-ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
-
-kwargs_dict_cartpole3poles_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
-kwargs_dict_cartpole3poles_dmp['name'] = f"dmc:cartpole-three_poles"
-kwargs_dict_cartpole3poles_dmp['wrappers'].append(suite.cartpole.ThreePolesMPWrapper)
-# bandwidth_factor = 2
-kwargs_dict_cartpole3poles_dmp['phase_generator_kwargs']['alpha_phase'] = 2
-# TODO: weight scale 50, but goal scale 0.1
-kwargs_dict_cartpole3poles_dmp['trajectory_generator_kwargs']['weight_scale'] = 500
-kwargs_dict_cartpole3poles_dmp['controller_kwargs']['p_gains'] = 10
-kwargs_dict_cartpole3poles_dmp['controller_kwargs']['d_gains'] = 10
-_env_id = f'dmc_cartpole-three_poles_dmp-v0'
-register(
- id=_env_id,
- entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
- kwargs=kwargs_dict_cartpole3poles_dmp
-)
-ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id)
-
-kwargs_dict_cartpole3poles_promp = deepcopy(DEFAULT_BB_DICT_DMP)
-kwargs_dict_cartpole3poles_promp['name'] = f"dmc:cartpole-three_poles"
-kwargs_dict_cartpole3poles_promp['wrappers'].append(suite.cartpole.ThreePolesMPWrapper)
-kwargs_dict_cartpole3poles_promp['controller_kwargs']['p_gains'] = 10
-kwargs_dict_cartpole3poles_promp['controller_kwargs']['d_gains'] = 10
-kwargs_dict_cartpole3poles_promp['trajectory_generator_kwargs']['weight_scale'] = 0.2
-_env_id = f'dmc_cartpole-three_poles_promp-v0'
-register(
- id=_env_id,
- entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
- kwargs=kwargs_dict_cartpole3poles_promp
-)
-ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
# DeepMind Manipulation
-kwargs_dict_mani_reach_site_features_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
-kwargs_dict_mani_reach_site_features_dmp['name'] = f"dmc:manipulation-reach_site_features"
-kwargs_dict_mani_reach_site_features_dmp['wrappers'].append(manipulation.reach_site.MPWrapper)
-kwargs_dict_mani_reach_site_features_dmp['phase_generator_kwargs']['alpha_phase'] = 2
-# TODO: weight scale 50, but goal scale 0.1
-kwargs_dict_mani_reach_site_features_dmp['trajectory_generator_kwargs']['weight_scale'] = 500
-kwargs_dict_mani_reach_site_features_dmp['controller_kwargs']['controller_type'] = 'velocity'
register(
- id=f'dmc_manipulation-reach_site_dmp-v0',
- entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
- kwargs=kwargs_dict_mani_reach_site_features_dmp
+ id=f"dm_control/reach_site_features-v0",
+ register_step_based=False,
+ mp_wrapper=manipulation.reach_site.MPWrapper,
+ add_mp_types=['DMP', 'ProMP'],
)
-ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append("dmc_manipulation-reach_site_dmp-v0")
-
-kwargs_dict_mani_reach_site_features_promp = deepcopy(DEFAULT_BB_DICT_DMP)
-kwargs_dict_mani_reach_site_features_promp['name'] = f"dmc:manipulation-reach_site_features"
-kwargs_dict_mani_reach_site_features_promp['wrappers'].append(manipulation.reach_site.MPWrapper)
-kwargs_dict_mani_reach_site_features_promp['trajectory_generator_kwargs']['weight_scale'] = 0.2
-kwargs_dict_mani_reach_site_features_promp['controller_kwargs']['controller_type'] = 'velocity'
-register(
- id=f'dmc_manipulation-reach_site_promp-v0',
- entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
- kwargs=kwargs_dict_mani_reach_site_features_promp
-)
-ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append("dmc_manipulation-reach_site_promp-v0")
diff --git a/fancy_gym/dmc/dmc_wrapper.py b/fancy_gym/dmc/dmc_wrapper.py
deleted file mode 100644
index b1522c3..0000000
--- a/fancy_gym/dmc/dmc_wrapper.py
+++ /dev/null
@@ -1,186 +0,0 @@
-# Adopted from: https://github.com/denisyarats/dmc2gym/blob/master/dmc2gym/wrappers.py
-# License: MIT
-# Copyright (c) 2020 Denis Yarats
-import collections
-from collections.abc import MutableMapping
-from typing import Any, Dict, Tuple, Optional, Union, Callable
-
-import gym
-import numpy as np
-from dm_control import composer
-from dm_control.rl import control
-from dm_env import specs
-from gym import spaces
-from gym.core import ObsType
-
-
-def _spec_to_box(spec):
- def extract_min_max(s):
- assert s.dtype == np.float64 or s.dtype == np.float32, \
- f"Only float64 and float32 types are allowed, instead {s.dtype} was found"
- dim = int(np.prod(s.shape))
- if type(s) == specs.Array:
- bound = np.inf * np.ones(dim, dtype=s.dtype)
- return -bound, bound
- elif type(s) == specs.BoundedArray:
- zeros = np.zeros(dim, dtype=s.dtype)
- return s.minimum + zeros, s.maximum + zeros
-
- mins, maxs = [], []
- for s in spec:
- mn, mx = extract_min_max(s)
- mins.append(mn)
- maxs.append(mx)
- low = np.concatenate(mins, axis=0)
- high = np.concatenate(maxs, axis=0)
- assert low.shape == high.shape
- return spaces.Box(low, high, dtype=s.dtype)
-
-
-def _flatten_obs(obs: MutableMapping):
- """
- Flattens an observation of type MutableMapping, e.g. a dict to a 1D array.
- Args:
- obs: observation to flatten
-
- Returns: 1D array of observation
-
- """
-
- if not isinstance(obs, MutableMapping):
- raise ValueError(f'Requires dict-like observations structure. {type(obs)} found.')
-
- # Keep key order consistent for non OrderedDicts
- keys = obs.keys() if isinstance(obs, collections.OrderedDict) else sorted(obs.keys())
-
- obs_vals = [np.array([obs[key]]) if np.isscalar(obs[key]) else obs[key].ravel() for key in keys]
- return np.concatenate(obs_vals)
-
-
-class DMCWrapper(gym.Env):
- def __init__(self,
- env: Callable[[], Union[composer.Environment, control.Environment]],
- ):
-
- # TODO: Currently this is required to be a function because dmc does not allow to copy composers environments
- self._env = env()
-
- # action and observation space
- self._action_space = _spec_to_box([self._env.action_spec()])
- self._observation_space = _spec_to_box(self._env.observation_spec().values())
-
- self._window = None
- self.id = 'dmc'
-
- def __getattr__(self, item):
- """Propagate only non-existent properties to wrapped env."""
- if item.startswith('_'):
- raise AttributeError("attempted to get missing private attribute '{}'".format(item))
- if item in self.__dict__:
- return getattr(self, item)
- return getattr(self._env, item)
-
- def _get_obs(self, time_step):
- obs = _flatten_obs(time_step.observation).astype(self.observation_space.dtype)
- return obs
-
- @property
- def observation_space(self):
- return self._observation_space
-
- @property
- def action_space(self):
- return self._action_space
-
- @property
- def dt(self):
- return self._env.control_timestep()
-
- def seed(self, seed=None):
- self._action_space.seed(seed)
- self._observation_space.seed(seed)
-
- def step(self, action) -> Tuple[np.ndarray, float, bool, Dict[str, Any]]:
- assert self._action_space.contains(action)
- extra = {'internal_state': self._env.physics.get_state().copy()}
-
- time_step = self._env.step(action)
- reward = time_step.reward or 0.
- done = time_step.last()
- obs = self._get_obs(time_step)
- extra['discount'] = time_step.discount
-
- return obs, reward, done, extra
-
- def reset(self, *, seed: Optional[int] = None, return_info: bool = False,
- options: Optional[dict] = None, ) -> Union[ObsType, Tuple[ObsType, dict]]:
- time_step = self._env.reset()
- obs = self._get_obs(time_step)
- return obs
-
- def render(self, mode='rgb_array', height=240, width=320, camera_id=-1, overlays=(), depth=False,
- segmentation=False, scene_option=None, render_flag_overrides=None):
-
- # assert mode == 'rgb_array', 'only support rgb_array mode, given %s' % mode
- if mode == "rgb_array":
- return self._env.physics.render(height=height, width=width, camera_id=camera_id, overlays=overlays,
- depth=depth, segmentation=segmentation, scene_option=scene_option,
- render_flag_overrides=render_flag_overrides)
-
- # Render max available buffer size. Larger is only possible by altering the XML.
- img = self._env.physics.render(height=self._env.physics.model.vis.global_.offheight,
- width=self._env.physics.model.vis.global_.offwidth,
- camera_id=camera_id, overlays=overlays, depth=depth, segmentation=segmentation,
- scene_option=scene_option, render_flag_overrides=render_flag_overrides)
-
- if depth:
- img = np.dstack([img.astype(np.uint8)] * 3)
-
- if mode == 'human':
- try:
- import cv2
- if self._window is None:
- self._window = cv2.namedWindow(self.id, cv2.WINDOW_AUTOSIZE)
- cv2.imshow(self.id, img[..., ::-1]) # Image in BGR
- cv2.waitKey(1)
- except ImportError:
- raise gym.error.DependencyNotInstalled("Rendering requires opencv. Run `pip install opencv-python`")
- # PYGAME seems to destroy some global rendering configs from the physics render
- # except ImportError:
- # import pygame
- # img_copy = img.copy().transpose((1, 0, 2))
- # if self._window is None:
- # pygame.init()
- # pygame.display.init()
- # self._window = pygame.display.set_mode(img_copy.shape[:2])
- # self.clock = pygame.time.Clock()
- #
- # surf = pygame.surfarray.make_surface(img_copy)
- # self._window.blit(surf, (0, 0))
- # pygame.event.pump()
- # self.clock.tick(30)
- # pygame.display.flip()
-
- def close(self):
- super().close()
- if self._window is not None:
- try:
- import cv2
- cv2.destroyWindow(self.id)
- except ImportError:
- import pygame
-
- pygame.display.quit()
- pygame.quit()
-
- @property
- def reward_range(self) -> Tuple[float, float]:
- reward_spec = self._env.reward_spec()
- if isinstance(reward_spec, specs.BoundedArray):
- return reward_spec.minimum, reward_spec.maximum
- return -float('inf'), float('inf')
-
- @property
- def metadata(self):
- return {'render.modes': ['human', 'rgb_array'],
- 'video.frames_per_second': round(1.0 / self._env.control_timestep())}
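
Reviewer note: the deleted `DMCWrapper.step` returned the old Gym 4-tuple `(obs, reward, done, info)`, whereas Gymnasium expects `(obs, reward, terminated, truncated, info)`. For downstream code that still produces 4-tuples, a conversion could look roughly like the sketch below. It assumes that truncation is signalled through the `"TimeLimit.truncated"` info key, as the old Gym `TimeLimit` wrapper did. `to_gymnasium_step` is a hypothetical helper and is not part of this PR.

```python
from typing import Any, Dict, Tuple


def to_gymnasium_step(
    result: Tuple[Any, float, bool, Dict[str, Any]],
) -> Tuple[Any, float, bool, bool, Dict[str, Any]]:
    """Convert an old-style 4-tuple step result into the 5-tuple form.

    Assumption: a time-limit cut-off is marked by info["TimeLimit.truncated"].
    """
    obs, reward, done, info = result
    truncated = bool(info.get("TimeLimit.truncated", False))
    # An episode that only hit the time limit did not terminate on its own.
    terminated = bool(done) and not truncated
    return obs, reward, terminated, truncated, info


if __name__ == "__main__":
    print(to_gymnasium_step(([0.0], 1.0, True, {"TimeLimit.truncated": True})))
```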
diff --git a/fancy_gym/dmc/manipulation/reach_site/mp_wrapper.py b/fancy_gym/dmc/manipulation/reach_site/mp_wrapper.py
index f64ac4a..0eaf8b9 100644
--- a/fancy_gym/dmc/manipulation/reach_site/mp_wrapper.py
+++ b/fancy_gym/dmc/manipulation/reach_site/mp_wrapper.py
@@ -6,6 +6,28 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper):
+ mp_config = {
+ 'ProMP': {
+ 'controller_kwargs': {
+ 'p_gains': 50.0,
+ },
+ 'trajectory_generator_kwargs': {
+ 'weights_scale': 0.2,
+ },
+ },
+ 'DMP': {
+ 'controller_kwargs': {
+ 'p_gains': 50.0,
+ },
+ 'phase_generator_kwargs': {
+ 'alpha_phase': 2,
+ },
+ 'trajectory_generator_kwargs': {
+ 'weights_scale': 500,
+ },
+ },
+ 'ProDMP': {},
+ }
@property
def context_mask(self) -> np.ndarray:
@@ -35,4 +57,4 @@ class MPWrapper(RawInterfaceWrapper):
@property
def dt(self) -> Union[float, int]:
- return self.env.dt
+ return self.env.control_timestep()
diff --git a/fancy_gym/dmc/suite/ball_in_cup/mp_wrapper.py b/fancy_gym/dmc/suite/ball_in_cup/mp_wrapper.py
index dc6a539..4441fb0 100644
--- a/fancy_gym/dmc/suite/ball_in_cup/mp_wrapper.py
+++ b/fancy_gym/dmc/suite/ball_in_cup/mp_wrapper.py
@@ -6,6 +6,25 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper):
+ mp_config = {
+ 'ProMP': {
+ 'controller_kwargs': {
+ 'p_gains': 50.0,
+ },
+ },
+ 'DMP': {
+ 'controller_kwargs': {
+ 'p_gains': 50.0,
+ },
+ 'phase_generator_kwargs': {
+ 'alpha_phase': 2,
+ },
+ 'trajectory_generator_kwargs': {
+ 'weights_scale': 10
+ },
+ },
+ 'ProDMP': {},
+ }
@property
def context_mask(self) -> np.ndarray:
@@ -31,4 +50,4 @@ class MPWrapper(RawInterfaceWrapper):
@property
def dt(self) -> Union[float, int]:
- return self.env.dt
+ return self.env.control_timestep()
diff --git a/fancy_gym/dmc/suite/cartpole/mp_wrapper.py b/fancy_gym/dmc/suite/cartpole/mp_wrapper.py
index 7edd51f..d4c8dcc 100644
--- a/fancy_gym/dmc/suite/cartpole/mp_wrapper.py
+++ b/fancy_gym/dmc/suite/cartpole/mp_wrapper.py
@@ -6,6 +6,30 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper):
+ mp_config = {
+ 'ProMP': {
+ 'controller_kwargs': {
+ 'p_gains': 10,
+ 'd_gains': 10,
+ },
+ 'trajectory_generator_kwargs': {
+ 'weights_scale': 0.2,
+ },
+ },
+ 'DMP': {
+ 'controller_kwargs': {
+ 'p_gains': 10,
+ 'd_gains': 10,
+ },
+ 'phase_generator_kwargs': {
+ 'alpha_phase': 2,
+ },
+ 'trajectory_generator_kwargs': {
+ 'weights_scale': 500,
+ },
+ },
+ 'ProDMP': {},
+ }
def __init__(self, env, n_poles: int = 1):
self.n_poles = n_poles
@@ -35,7 +59,7 @@ class MPWrapper(RawInterfaceWrapper):
@property
def dt(self) -> Union[float, int]:
- return self.env.dt
+ return self.env.control_timestep()
class TwoPolesMPWrapper(MPWrapper):
diff --git a/fancy_gym/dmc/suite/reacher/mp_wrapper.py b/fancy_gym/dmc/suite/reacher/mp_wrapper.py
index 5ac52e5..d713fb6 100644
--- a/fancy_gym/dmc/suite/reacher/mp_wrapper.py
+++ b/fancy_gym/dmc/suite/reacher/mp_wrapper.py
@@ -6,6 +6,30 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper):
+ mp_config = {
+ 'ProMP': {
+ 'controller_kwargs': {
+ 'p_gains': 50.0,
+ 'd_gains': 1.0,
+ },
+ 'trajectory_generator_kwargs': {
+ 'weights_scale': 0.2,
+ },
+ },
+ 'DMP': {
+ 'controller_kwargs': {
+ 'p_gains': 50.0,
+ 'd_gains': 1.0,
+ },
+ 'phase_generator_kwargs': {
+ 'alpha_phase': 2,
+ },
+ 'trajectory_generator_kwargs': {
+ 'weights_scale': 500,
+ },
+ },
+ 'ProDMP': {},
+ }
@property
def context_mask(self) -> np.ndarray:
@@ -30,4 +54,4 @@ class MPWrapper(RawInterfaceWrapper):
@property
def dt(self) -> Union[float, int]:
- return self.env.dt
+ return self.env.control_timestep()
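
Reviewer note: the per-wrapper `mp_config` dictionaries replace the old pattern of copying a default dict and patching individual keys. One plausible way to combine such overrides with defaults is a recursive merge, sketched below. This is an assumption about the mechanism, not the registry's actual implementation: `nested_update` and `ILLUSTRATIVE_DEFAULTS` are hypothetical names, and the default values are only borrowed from the removed `DEFAULT_BB_DICT_ProMP` for illustration.

```python
from copy import deepcopy
from typing import Any, Dict


def nested_update(base: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `base` with `overrides` merged in recursively."""
    merged = deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge them instead of replacing wholesale.
            merged[key] = nested_update(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Illustrative defaults only (hypothetical, not the library's real defaults).
ILLUSTRATIVE_DEFAULTS = {
    "controller_kwargs": {"controller_type": "motor", "p_gains": 1.0, "d_gains": 0.1},
    "trajectory_generator_kwargs": {"trajectory_generator_type": "promp"},
}

# Overrides taken from the reacher wrapper's 'ProMP' entry in this diff.
REACHER_PROMP = {
    "controller_kwargs": {"p_gains": 50.0, "d_gains": 1.0},
    "trajectory_generator_kwargs": {"weights_scale": 0.2},
}

merged = nested_update(ILLUSTRATIVE_DEFAULTS, REACHER_PROMP)

if __name__ == "__main__":
    print(merged)
```

With a merge like this, a key that does not exist in the defaults is added as a new entry rather than overriding anything, so the top-level key names in `mp_config` have to match the default names exactly.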
diff --git a/fancy_gym/envs/__init__.py b/fancy_gym/envs/__init__.py
index 32bd8f8..a40c81f 100644
--- a/fancy_gym/envs/__init__.py
+++ b/fancy_gym/envs/__init__.py
@@ -1,103 +1,43 @@
from copy import deepcopy
import numpy as np
-from gym import register
+from gymnasium import register as gym_register
+from .registry import register, upgrade
from . import classic_control, mujoco
-from .classic_control.hole_reacher.hole_reacher import HoleReacherEnv
from .classic_control.simple_reacher.simple_reacher import SimpleReacherEnv
+from .classic_control.simple_reacher import MPWrapper as MPWrapper_SimpleReacher
+from .classic_control.hole_reacher.hole_reacher import HoleReacherEnv
+from .classic_control.hole_reacher import MPWrapper as MPWrapper_HoleReacher
from .classic_control.viapoint_reacher.viapoint_reacher import ViaPointReacherEnv
+from .classic_control.viapoint_reacher import MPWrapper as MPWrapper_ViaPointReacher
+from .mujoco.reacher.reacher import ReacherEnv, MAX_EPISODE_STEPS_REACHER
+from .mujoco.reacher.mp_wrapper import MPWrapper as MPWrapper_Reacher
from .mujoco.ant_jump.ant_jump import MAX_EPISODE_STEPS_ANTJUMP
from .mujoco.beerpong.beerpong import MAX_EPISODE_STEPS_BEERPONG, FIXED_RELEASE_STEP
+from .mujoco.beerpong.mp_wrapper import MPWrapper as MPWrapper_Beerpong
+from .mujoco.beerpong.mp_wrapper import MPWrapper_FixedRelease as MPWrapper_Beerpong_FixedRelease
from .mujoco.half_cheetah_jump.half_cheetah_jump import MAX_EPISODE_STEPS_HALFCHEETAHJUMP
from .mujoco.hopper_jump.hopper_jump import MAX_EPISODE_STEPS_HOPPERJUMP
from .mujoco.hopper_jump.hopper_jump_on_box import MAX_EPISODE_STEPS_HOPPERJUMPONBOX
from .mujoco.hopper_throw.hopper_throw import MAX_EPISODE_STEPS_HOPPERTHROW
from .mujoco.hopper_throw.hopper_throw_in_basket import MAX_EPISODE_STEPS_HOPPERTHROWINBASKET
-from .mujoco.reacher.reacher import ReacherEnv, MAX_EPISODE_STEPS_REACHER
from .mujoco.walker_2d_jump.walker_2d_jump import MAX_EPISODE_STEPS_WALKERJUMP
from .mujoco.box_pushing.box_pushing_env import BoxPushingDense, BoxPushingTemporalSparse, \
- BoxPushingTemporalSpatialSparse, MAX_EPISODE_STEPS_BOX_PUSHING
+ BoxPushingTemporalSpatialSparse, MAX_EPISODE_STEPS_BOX_PUSHING
from .mujoco.table_tennis.table_tennis_env import TableTennisEnv, TableTennisWind, TableTennisGoalSwitching, \
- MAX_EPISODE_STEPS_TABLE_TENNIS
-
-ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS = {"DMP": [], "ProMP": [], "ProDMP": []}
-
-DEFAULT_BB_DICT_ProMP = {
- "name": 'EnvName',
- "wrappers": [],
- "trajectory_generator_kwargs": {
- 'trajectory_generator_type': 'promp'
- },
- "phase_generator_kwargs": {
- 'phase_generator_type': 'linear'
- },
- "controller_kwargs": {
- 'controller_type': 'motor',
- "p_gains": 1.0,
- "d_gains": 0.1,
- },
- "basis_generator_kwargs": {
- 'basis_generator_type': 'zero_rbf',
- 'num_basis': 5,
- 'num_basis_zero_start': 1,
- 'basis_bandwidth_factor': 3.0,
- },
- "black_box_kwargs": {
- }
-}
-
-DEFAULT_BB_DICT_DMP = {
- "name": 'EnvName',
- "wrappers": [],
- "trajectory_generator_kwargs": {
- 'trajectory_generator_type': 'dmp'
- },
- "phase_generator_kwargs": {
- 'phase_generator_type': 'exp'
- },
- "controller_kwargs": {
- 'controller_type': 'motor',
- "p_gains": 1.0,
- "d_gains": 0.1,
- },
- "basis_generator_kwargs": {
- 'basis_generator_type': 'rbf',
- 'num_basis': 5
- }
-}
-
-DEFAULT_BB_DICT_ProDMP = {
- "name": 'EnvName',
- "wrappers": [],
- "trajectory_generator_kwargs": {
- 'trajectory_generator_type': 'prodmp',
- 'duration': 2.0,
- 'weights_scale': 1.0,
- },
- "phase_generator_kwargs": {
- 'phase_generator_type': 'exp',
- 'tau': 1.5,
- },
- "controller_kwargs": {
- 'controller_type': 'motor',
- "p_gains": 1.0,
- "d_gains": 0.1,
- },
- "basis_generator_kwargs": {
- 'basis_generator_type': 'prodmp',
- 'alpha': 10,
- 'num_basis': 5,
- },
- "black_box_kwargs": {
- }
-}
+ MAX_EPISODE_STEPS_TABLE_TENNIS
+from .mujoco.table_tennis.mp_wrapper import TT_MPWrapper as MPWrapper_TableTennis
+from .mujoco.table_tennis.mp_wrapper import TT_MPWrapper_Replan as MPWrapper_TableTennis_Replan
+from .mujoco.table_tennis.mp_wrapper import TTVelObs_MPWrapper as MPWrapper_TableTennis_VelObs
+from .mujoco.table_tennis.mp_wrapper import TTVelObs_MPWrapper_Replan as MPWrapper_TableTennis_VelObs_Replan
# Classic Control
-## Simple Reacher
+# Simple Reacher
register(
- id='SimpleReacher-v0',
- entry_point='fancy_gym.envs.classic_control:SimpleReacherEnv',
+ id='fancy/SimpleReacher-v0',
+ entry_point=SimpleReacherEnv,
+ mp_wrapper=MPWrapper_SimpleReacher,
max_episode_steps=200,
kwargs={
"n_links": 2,
@@ -105,19 +45,20 @@ register(
)
register(
- id='LongSimpleReacher-v0',
- entry_point='fancy_gym.envs.classic_control:SimpleReacherEnv',
+ id='fancy/LongSimpleReacher-v0',
+ entry_point=SimpleReacherEnv,
+ mp_wrapper=MPWrapper_SimpleReacher,
max_episode_steps=200,
kwargs={
"n_links": 5,
}
)
-## Viapoint Reacher
-
+# Viapoint Reacher
register(
- id='ViaPointReacher-v0',
- entry_point='fancy_gym.envs.classic_control:ViaPointReacherEnv',
+ id='fancy/ViaPointReacher-v0',
+ entry_point=ViaPointReacherEnv,
+ mp_wrapper=MPWrapper_ViaPointReacher,
max_episode_steps=200,
kwargs={
"n_links": 5,
@@ -126,10 +67,11 @@ register(
}
)
-## Hole Reacher
+# Hole Reacher
register(
- id='HoleReacher-v0',
- entry_point='fancy_gym.envs.classic_control:HoleReacherEnv',
+ id='fancy/HoleReacher-v0',
+ entry_point=HoleReacherEnv,
+ mp_wrapper=MPWrapper_HoleReacher,
max_episode_steps=200,
kwargs={
"n_links": 5,
@@ -145,31 +87,35 @@ register(
# Mujoco
-## Mujoco Reacher
-for _dims in [5, 7]:
+# Mujoco Reacher
+for dims in [5, 7]:
register(
- id=f'Reacher{_dims}d-v0',
- entry_point='fancy_gym.envs.mujoco:ReacherEnv',
+ id=f'fancy/Reacher{dims}d-v0',
+ entry_point=ReacherEnv,
+ mp_wrapper=MPWrapper_Reacher,
max_episode_steps=MAX_EPISODE_STEPS_REACHER,
kwargs={
- "n_links": _dims,
+ "n_links": dims,
}
)
register(
- id=f'Reacher{_dims}dSparse-v0',
- entry_point='fancy_gym.envs.mujoco:ReacherEnv',
+ id=f'fancy/Reacher{dims}dSparse-v0',
+ entry_point=ReacherEnv,
+ mp_wrapper=MPWrapper_Reacher,
max_episode_steps=MAX_EPISODE_STEPS_REACHER,
kwargs={
"sparse": True,
'reward_weight': 200,
- "n_links": _dims,
+ "n_links": dims,
}
)
+
register(
- id='HopperJumpSparse-v0',
+ id='fancy/HopperJumpSparse-v0',
entry_point='fancy_gym.envs.mujoco:HopperJumpEnv',
+ mp_wrapper=mujoco.hopper_jump.MPWrapper,
max_episode_steps=MAX_EPISODE_STEPS_HOPPERJUMP,
kwargs={
"sparse": True,
@@ -177,8 +123,9 @@ register(
)
register(
- id='HopperJump-v0',
+ id='fancy/HopperJump-v0',
entry_point='fancy_gym.envs.mujoco:HopperJumpEnv',
+ mp_wrapper=mujoco.hopper_jump.MPWrapper,
max_episode_steps=MAX_EPISODE_STEPS_HOPPERJUMP,
kwargs={
"sparse": False,
@@ -188,76 +135,117 @@ register(
}
)
+# TODO: Add [MPs] later when finished (old TODO I moved here during refactor)
register(
- id='AntJump-v0',
+ id='fancy/AntJump-v0',
entry_point='fancy_gym.envs.mujoco:AntJumpEnv',
max_episode_steps=MAX_EPISODE_STEPS_ANTJUMP,
+ add_mp_types=[],
)
register(
- id='HalfCheetahJump-v0',
+ id='fancy/HalfCheetahJump-v0',
entry_point='fancy_gym.envs.mujoco:HalfCheetahJumpEnv',
max_episode_steps=MAX_EPISODE_STEPS_HALFCHEETAHJUMP,
+ add_mp_types=[],
)
register(
- id='HopperJumpOnBox-v0',
+ id='fancy/HopperJumpOnBox-v0',
entry_point='fancy_gym.envs.mujoco:HopperJumpOnBoxEnv',
max_episode_steps=MAX_EPISODE_STEPS_HOPPERJUMPONBOX,
+ add_mp_types=[],
)
register(
- id='HopperThrow-v0',
+ id='fancy/HopperThrow-v0',
entry_point='fancy_gym.envs.mujoco:HopperThrowEnv',
max_episode_steps=MAX_EPISODE_STEPS_HOPPERTHROW,
+ add_mp_types=[],
)
register(
- id='HopperThrowInBasket-v0',
+ id='fancy/HopperThrowInBasket-v0',
entry_point='fancy_gym.envs.mujoco:HopperThrowInBasketEnv',
max_episode_steps=MAX_EPISODE_STEPS_HOPPERTHROWINBASKET,
+ add_mp_types=[],
)
register(
- id='Walker2DJump-v0',
+ id='fancy/Walker2DJump-v0',
entry_point='fancy_gym.envs.mujoco:Walker2dJumpEnv',
max_episode_steps=MAX_EPISODE_STEPS_WALKERJUMP,
+ add_mp_types=[],
+)
+
+register(
+ id='fancy/BeerPong-v0',
+ entry_point='fancy_gym.envs.mujoco:BeerPongEnv',
+ mp_wrapper=MPWrapper_Beerpong,
+ max_episode_steps=MAX_EPISODE_STEPS_BEERPONG,
+ add_mp_types=['ProMP'],
+)
+
+# Here we use the same reward as in BeerPong-v0, but after the release we consider
+# only one time step, i.e. we simulate until the end of the episode.
+register(
+ id='fancy/BeerPongStepBased-v0',
+ entry_point='fancy_gym.envs.mujoco:BeerPongEnvStepBasedEpisodicReward',
+ mp_wrapper=MPWrapper_Beerpong_FixedRelease,
+ max_episode_steps=FIXED_RELEASE_STEP,
+ add_mp_types=['ProMP'],
)
register(
- id='BeerPong-v0',
+ id='fancy/BeerPongFixedRelease-v0',
entry_point='fancy_gym.envs.mujoco:BeerPongEnv',
- max_episode_steps=MAX_EPISODE_STEPS_BEERPONG,
+ mp_wrapper=MPWrapper_Beerpong_FixedRelease,
+ max_episode_steps=FIXED_RELEASE_STEP,
+ add_mp_types=['ProMP'],
)
# Box pushing environments with different rewards
for reward_type in ["Dense", "TemporalSparse", "TemporalSpatialSparse"]:
register(
- id='BoxPushing{}-v0'.format(reward_type),
+ id='fancy/BoxPushing{}-v0'.format(reward_type),
entry_point='fancy_gym.envs.mujoco:BoxPushing{}'.format(reward_type),
+ mp_wrapper=mujoco.box_pushing.MPWrapper,
max_episode_steps=MAX_EPISODE_STEPS_BOX_PUSHING,
)
register(
- id='BoxPushingRandomInit{}-v0'.format(reward_type),
+ id='fancy/BoxPushingRandomInit{}-v0'.format(reward_type),
entry_point='fancy_gym.envs.mujoco:BoxPushing{}'.format(reward_type),
+ mp_wrapper=mujoco.box_pushing.MPWrapper,
max_episode_steps=MAX_EPISODE_STEPS_BOX_PUSHING,
kwargs={"random_init": True}
)
-# Here we use the same reward as in BeerPong-v0, but now consider after the release,
-# only one time step, i.e. we simulate until the end of th episode
-register(
- id='BeerPongStepBased-v0',
- entry_point='fancy_gym.envs.mujoco:BeerPongEnvStepBasedEpisodicReward',
- max_episode_steps=FIXED_RELEASE_STEP,
-)
+ upgrade(
+ id='fancy/BoxPushing{}Replan-v0'.format(reward_type),
+ base_id='fancy/BoxPushing{}-v0'.format(reward_type),
+ mp_wrapper=mujoco.box_pushing.ReplanMPWrapper,
+ )
# Table Tennis environments
for ctxt_dim in [2, 4]:
register(
- id='TableTennis{}D-v0'.format(ctxt_dim),
+ id='fancy/TableTennis{}D-v0'.format(ctxt_dim),
entry_point='fancy_gym.envs.mujoco:TableTennisEnv',
+ mp_wrapper=MPWrapper_TableTennis,
max_episode_steps=MAX_EPISODE_STEPS_TABLE_TENNIS,
+ add_mp_types=['ProMP', 'ProDMP'],
+ kwargs={
+ "ctxt_dim": ctxt_dim,
+ 'frame_skip': 4,
+ }
+ )
+
+ register(
+ id='fancy/TableTennis{}DReplan-v0'.format(ctxt_dim),
+ entry_point='fancy_gym.envs.mujoco:TableTennisEnv',
+ mp_wrapper=MPWrapper_TableTennis_Replan,
+ max_episode_steps=MAX_EPISODE_STEPS_TABLE_TENNIS,
+ add_mp_types=['ProDMP'],
kwargs={
"ctxt_dim": ctxt_dim,
'frame_skip': 4,
@@ -265,626 +253,39 @@ for ctxt_dim in [2, 4]:
)
register(
- id='TableTennisWind-v0',
+ id='fancy/TableTennisWind-v0',
entry_point='fancy_gym.envs.mujoco:TableTennisWind',
+ mp_wrapper=MPWrapper_TableTennis_VelObs,
+ add_mp_types=['ProMP', 'ProDMP'],
max_episode_steps=MAX_EPISODE_STEPS_TABLE_TENNIS,
)
register(
- id='TableTennisGoalSwitching-v0',
+ id='fancy/TableTennisWindReplan-v0',
+ entry_point='fancy_gym.envs.mujoco:TableTennisWind',
+ mp_wrapper=MPWrapper_TableTennis_VelObs_Replan,
+ add_mp_types=['ProDMP'],
+ max_episode_steps=MAX_EPISODE_STEPS_TABLE_TENNIS,
+)
+
+register(
+ id='fancy/TableTennisGoalSwitching-v0',
entry_point='fancy_gym.envs.mujoco:TableTennisGoalSwitching',
+ mp_wrapper=MPWrapper_TableTennis,
+ add_mp_types=['ProMP', 'ProDMP'],
max_episode_steps=MAX_EPISODE_STEPS_TABLE_TENNIS,
kwargs={
'goal_switching_step': 99
}
)
-
-# movement Primitive Environments
-
-## Simple Reacher
-_versions = ["SimpleReacher-v0", "LongSimpleReacher-v0"]
-for _v in _versions:
- _name = _v.split("-")
- _env_id = f'{_name[0]}DMP-{_name[1]}'
- kwargs_dict_simple_reacher_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
- kwargs_dict_simple_reacher_dmp['wrappers'].append(classic_control.simple_reacher.MPWrapper)
- kwargs_dict_simple_reacher_dmp['controller_kwargs']['p_gains'] = 0.6
- kwargs_dict_simple_reacher_dmp['controller_kwargs']['d_gains'] = 0.075
- kwargs_dict_simple_reacher_dmp['trajectory_generator_kwargs']['weight_scale'] = 50
- kwargs_dict_simple_reacher_dmp['phase_generator_kwargs']['alpha_phase'] = 2
- kwargs_dict_simple_reacher_dmp['name'] = f"{_v}"
- register(
- id=_env_id,
- entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
- kwargs=kwargs_dict_simple_reacher_dmp
- )
- ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id)
-
- _env_id = f'{_name[0]}ProMP-{_name[1]}'
- kwargs_dict_simple_reacher_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
- kwargs_dict_simple_reacher_promp['wrappers'].append(classic_control.simple_reacher.MPWrapper)
- kwargs_dict_simple_reacher_promp['controller_kwargs']['p_gains'] = 0.6
- kwargs_dict_simple_reacher_promp['controller_kwargs']['d_gains'] = 0.075
- kwargs_dict_simple_reacher_promp['name'] = _v
- register(
- id=_env_id,
- entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
- kwargs=kwargs_dict_simple_reacher_promp
- )
- ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
-
-# Viapoint reacher
-kwargs_dict_via_point_reacher_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
-kwargs_dict_via_point_reacher_dmp['wrappers'].append(classic_control.viapoint_reacher.MPWrapper)
-kwargs_dict_via_point_reacher_dmp['controller_kwargs']['controller_type'] = 'velocity'
-kwargs_dict_via_point_reacher_dmp['trajectory_generator_kwargs']['weight_scale'] = 50
-kwargs_dict_via_point_reacher_dmp['phase_generator_kwargs']['alpha_phase'] = 2
-kwargs_dict_via_point_reacher_dmp['name'] = "ViaPointReacher-v0"
register(
- id='ViaPointReacherDMP-v0',
- entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
- # max_episode_steps=1,
- kwargs=kwargs_dict_via_point_reacher_dmp
-)
-ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append("ViaPointReacherDMP-v0")
-
-kwargs_dict_via_point_reacher_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
-kwargs_dict_via_point_reacher_promp['wrappers'].append(classic_control.viapoint_reacher.MPWrapper)
-kwargs_dict_via_point_reacher_promp['controller_kwargs']['controller_type'] = 'velocity'
-kwargs_dict_via_point_reacher_promp['name'] = "ViaPointReacher-v0"
-register(
- id="ViaPointReacherProMP-v0",
- entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
- kwargs=kwargs_dict_via_point_reacher_promp
-)
-ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append("ViaPointReacherProMP-v0")
-
-## Hole Reacher
-_versions = ["HoleReacher-v0"]
-for _v in _versions:
- _name = _v.split("-")
- _env_id = f'{_name[0]}DMP-{_name[1]}'
- kwargs_dict_hole_reacher_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
- kwargs_dict_hole_reacher_dmp['wrappers'].append(classic_control.hole_reacher.MPWrapper)
- kwargs_dict_hole_reacher_dmp['controller_kwargs']['controller_type'] = 'velocity'
- # TODO: Before it was weight scale 50 and goal scale 0.1. We now only have weight scale and thus set it to 500. Check
- kwargs_dict_hole_reacher_dmp['trajectory_generator_kwargs']['weight_scale'] = 500
- kwargs_dict_hole_reacher_dmp['phase_generator_kwargs']['alpha_phase'] = 2.5
- kwargs_dict_hole_reacher_dmp['name'] = _v
- register(
- id=_env_id,
- entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
- # max_episode_steps=1,
- kwargs=kwargs_dict_hole_reacher_dmp
- )
- ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id)
-
- _env_id = f'{_name[0]}ProMP-{_name[1]}'
- kwargs_dict_hole_reacher_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
- kwargs_dict_hole_reacher_promp['wrappers'].append(classic_control.hole_reacher.MPWrapper)
- kwargs_dict_hole_reacher_promp['trajectory_generator_kwargs']['weight_scale'] = 2
- kwargs_dict_hole_reacher_promp['controller_kwargs']['controller_type'] = 'velocity'
- kwargs_dict_hole_reacher_promp['name'] = f"{_v}"
- register(
- id=_env_id,
- entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
- kwargs=kwargs_dict_hole_reacher_promp
- )
- ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
-
-## ReacherNd
-_versions = ["Reacher5d-v0", "Reacher7d-v0", "Reacher5dSparse-v0", "Reacher7dSparse-v0"]
-for _v in _versions:
- _name = _v.split("-")
- _env_id = f'{_name[0]}DMP-{_name[1]}'
- kwargs_dict_reacher_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
- kwargs_dict_reacher_dmp['wrappers'].append(mujoco.reacher.MPWrapper)
- kwargs_dict_reacher_dmp['phase_generator_kwargs']['alpha_phase'] = 2
- kwargs_dict_reacher_dmp['name'] = _v
- register(
- id=_env_id,
- entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
- # max_episode_steps=1,
- kwargs=kwargs_dict_reacher_dmp
- )
- ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id)
-
- _env_id = f'{_name[0]}ProMP-{_name[1]}'
- kwargs_dict_reacher_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
- kwargs_dict_reacher_promp['wrappers'].append(mujoco.reacher.MPWrapper)
- kwargs_dict_reacher_promp['name'] = _v
- register(
- id=_env_id,
- entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
- kwargs=kwargs_dict_reacher_promp
- )
- ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
-########################################################################################################################
-## Beerpong ProMP
-_versions = ['BeerPong-v0']
-for _v in _versions:
- _name = _v.split("-")
- _env_id = f'{_name[0]}ProMP-{_name[1]}'
- kwargs_dict_bp_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
- kwargs_dict_bp_promp['wrappers'].append(mujoco.beerpong.MPWrapper)
- kwargs_dict_bp_promp['phase_generator_kwargs']['learn_tau'] = True
- kwargs_dict_bp_promp['controller_kwargs']['p_gains'] = np.array([1.5, 5, 2.55, 3, 2., 2, 1.25])
- kwargs_dict_bp_promp['controller_kwargs']['d_gains'] = np.array([0.02333333, 0.1, 0.0625, 0.08, 0.03, 0.03, 0.0125])
- kwargs_dict_bp_promp['basis_generator_kwargs']['num_basis'] = 2
- kwargs_dict_bp_promp['basis_generator_kwargs']['num_basis_zero_start'] = 2
- kwargs_dict_bp_promp['name'] = _v
- register(
- id=_env_id,
- entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
- kwargs=kwargs_dict_bp_promp
- )
- ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
-
-### BP with Fixed release
-_versions = ["BeerPongStepBased-v0", 'BeerPong-v0']
-for _v in _versions:
- if _v != 'BeerPong-v0':
- _name = _v.split("-")
- _env_id = f'{_name[0]}ProMP-{_name[1]}'
- else:
- _env_id = 'BeerPongFixedReleaseProMP-v0'
- kwargs_dict_bp_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
- kwargs_dict_bp_promp['wrappers'].append(mujoco.beerpong.MPWrapper)
- kwargs_dict_bp_promp['phase_generator_kwargs']['tau'] = 0.62
- kwargs_dict_bp_promp['controller_kwargs']['p_gains'] = np.array([1.5, 5, 2.55, 3, 2., 2, 1.25])
- kwargs_dict_bp_promp['controller_kwargs']['d_gains'] = np.array([0.02333333, 0.1, 0.0625, 0.08, 0.03, 0.03, 0.0125])
- kwargs_dict_bp_promp['basis_generator_kwargs']['num_basis'] = 2
- kwargs_dict_bp_promp['basis_generator_kwargs']['num_basis_zero_start'] = 2
- kwargs_dict_bp_promp['name'] = _v
- register(
- id=_env_id,
- entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
- kwargs=kwargs_dict_bp_promp
- )
- ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
-########################################################################################################################
-
-## Table Tennis needs to be fixed according to Zhou's implementation
-
-# TODO: Add later when finished
-# ########################################################################################################################
-#
-# ## AntJump
-# _versions = ['AntJump-v0']
-# for _v in _versions:
-# _name = _v.split("-")
-# _env_id = f'{_name[0]}ProMP-{_name[1]}'
-# kwargs_dict_ant_jump_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
-# kwargs_dict_ant_jump_promp['wrappers'].append(mujoco.ant_jump.MPWrapper)
-# kwargs_dict_ant_jump_promp['name'] = _v
-# register(
-# id=_env_id,
-# entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
-# kwargs=kwargs_dict_ant_jump_promp
-# )
-# ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
-#
-# ########################################################################################################################
-#
-# ## HalfCheetahJump
-# _versions = ['HalfCheetahJump-v0']
-# for _v in _versions:
-# _name = _v.split("-")
-# _env_id = f'{_name[0]}ProMP-{_name[1]}'
-# kwargs_dict_halfcheetah_jump_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
-# kwargs_dict_halfcheetah_jump_promp['wrappers'].append(mujoco.half_cheetah_jump.MPWrapper)
-# kwargs_dict_halfcheetah_jump_promp['name'] = _v
-# register(
-# id=_env_id,
-# entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
-# kwargs=kwargs_dict_halfcheetah_jump_promp
-# )
-# ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
-#
-# ########################################################################################################################
-
-
-## HopperJump
-_versions = ['HopperJump-v0', 'HopperJumpSparse-v0',
- # 'HopperJumpOnBox-v0', 'HopperThrow-v0', 'HopperThrowInBasket-v0'
- ]
-# TODO: Check if all environments work with the same MPWrapper
-for _v in _versions:
- _name = _v.split("-")
- _env_id = f'{_name[0]}ProMP-{_name[1]}'
- kwargs_dict_hopper_jump_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
- kwargs_dict_hopper_jump_promp['wrappers'].append(mujoco.hopper_jump.MPWrapper)
- kwargs_dict_hopper_jump_promp['name'] = _v
- register(
- id=_env_id,
- entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
- kwargs=kwargs_dict_hopper_jump_promp
- )
- ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
-
-# ########################################################################################################################
-
-## Box Pushing
-_versions = ['BoxPushingDense-v0', 'BoxPushingTemporalSparse-v0', 'BoxPushingTemporalSpatialSparse-v0',
- 'BoxPushingRandomInitDense-v0', 'BoxPushingRandomInitTemporalSparse-v0',
- 'BoxPushingRandomInitTemporalSpatialSparse-v0']
-for _v in _versions:
- _name = _v.split("-")
- _env_id = f'{_name[0]}ProMP-{_name[1]}'
- kwargs_dict_box_pushing_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
- kwargs_dict_box_pushing_promp['wrappers'].append(mujoco.box_pushing.MPWrapper)
- kwargs_dict_box_pushing_promp['name'] = _v
- kwargs_dict_box_pushing_promp['controller_kwargs']['p_gains'] = 0.01 * np.array([120., 120., 120., 120., 50., 30., 10.])
- kwargs_dict_box_pushing_promp['controller_kwargs']['d_gains'] = 0.01 * np.array([10., 10., 10., 10., 6., 5., 3.])
- kwargs_dict_box_pushing_promp['basis_generator_kwargs']['basis_bandwidth_factor'] = 2 # 3.5, 4 to try
-
- register(
- id=_env_id,
- entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
- kwargs=kwargs_dict_box_pushing_promp
- )
- ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
-
-for _v in _versions:
- _name = _v.split("-")
- _env_id = f'{_name[0]}ProDMP-{_name[1]}'
- kwargs_dict_box_pushing_prodmp = deepcopy(DEFAULT_BB_DICT_ProDMP)
- kwargs_dict_box_pushing_prodmp['wrappers'].append(mujoco.box_pushing.MPWrapper)
- kwargs_dict_box_pushing_prodmp['name'] = _v
- kwargs_dict_box_pushing_prodmp['controller_kwargs']['p_gains'] = 0.01 * np.array([120., 120., 120., 120., 50., 30., 10.])
- kwargs_dict_box_pushing_prodmp['controller_kwargs']['d_gains'] = 0.01 * np.array([10., 10., 10., 10., 6., 5., 3.])
- kwargs_dict_box_pushing_prodmp['trajectory_generator_kwargs']['weights_scale'] = 0.3
- kwargs_dict_box_pushing_prodmp['trajectory_generator_kwargs']['goal_scale'] = 0.3
- kwargs_dict_box_pushing_prodmp['trajectory_generator_kwargs']['auto_scale_basis'] = True
- kwargs_dict_box_pushing_prodmp['basis_generator_kwargs']['num_basis'] = 4
- kwargs_dict_box_pushing_prodmp['basis_generator_kwargs']['basis_bandwidth_factor'] = 3
- kwargs_dict_box_pushing_prodmp['phase_generator_kwargs']['alpha_phase'] = 3
- register(
- id=_env_id,
- entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
- kwargs=kwargs_dict_box_pushing_prodmp
- )
- ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProDMP"].append(_env_id)
-
-for _v in _versions:
- _name = _v.split("-")
- _env_id = f'{_name[0]}ReplanProDMP-{_name[1]}'
- kwargs_dict_box_pushing_prodmp = deepcopy(DEFAULT_BB_DICT_ProDMP)
- kwargs_dict_box_pushing_prodmp['wrappers'].append(mujoco.box_pushing.MPWrapper)
- kwargs_dict_box_pushing_prodmp['name'] = _v
- kwargs_dict_box_pushing_prodmp['controller_kwargs']['p_gains'] = 0.01 * np.array([120., 120., 120., 120., 50., 30., 10.])
- kwargs_dict_box_pushing_prodmp['controller_kwargs']['d_gains'] = 0.01 * np.array([10., 10., 10., 10., 6., 5., 3.])
- kwargs_dict_box_pushing_prodmp['trajectory_generator_kwargs']['weights_scale'] = 0.3
- kwargs_dict_box_pushing_prodmp['trajectory_generator_kwargs']['goal_scale'] = 0.3
- kwargs_dict_box_pushing_prodmp['trajectory_generator_kwargs']['auto_scale_basis'] = True
- kwargs_dict_box_pushing_prodmp['basis_generator_kwargs']['num_basis'] = 4
- kwargs_dict_box_pushing_prodmp['basis_generator_kwargs']['basis_bandwidth_factor'] = 3
- kwargs_dict_box_pushing_prodmp['phase_generator_kwargs']['alpha_phase'] = 3
- kwargs_dict_box_pushing_prodmp['black_box_kwargs']['max_planning_times'] = 4
- kwargs_dict_box_pushing_prodmp['black_box_kwargs']['replanning_schedule'] = lambda pos, vel, obs, action, t : t % 25 == 0
- kwargs_dict_box_pushing_prodmp['black_box_kwargs']['condition_on_desired'] = True
- register(
- id=_env_id,
- entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
- kwargs=kwargs_dict_box_pushing_prodmp
- )
- ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProDMP"].append(_env_id)
-
-## Table Tennis
-_versions = ['TableTennis2D-v0', 'TableTennis4D-v0', 'TableTennisWind-v0', 'TableTennisGoalSwitching-v0']
-for _v in _versions:
- _name = _v.split("-")
- _env_id = f'{_name[0]}ProMP-{_name[1]}'
- kwargs_dict_tt_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
- if _v == 'TableTennisWind-v0':
- kwargs_dict_tt_promp['wrappers'].append(mujoco.table_tennis.TTVelObs_MPWrapper)
- else:
- kwargs_dict_tt_promp['wrappers'].append(mujoco.table_tennis.TT_MPWrapper)
- kwargs_dict_tt_promp['name'] = _v
- kwargs_dict_tt_promp['controller_kwargs']['p_gains'] = 0.5 * np.array([1.0, 4.0, 2.0, 4.0, 1.0, 4.0, 1.0])
- kwargs_dict_tt_promp['controller_kwargs']['d_gains'] = 0.5 * np.array([0.1, 0.4, 0.2, 0.4, 0.1, 0.4, 0.1])
- kwargs_dict_tt_promp['phase_generator_kwargs']['learn_tau'] = True
- kwargs_dict_tt_promp['phase_generator_kwargs']['learn_delay'] = True
- kwargs_dict_tt_promp['phase_generator_kwargs']['tau_bound'] = [0.8, 1.5]
- kwargs_dict_tt_promp['phase_generator_kwargs']['delay_bound'] = [0.05, 0.15]
- kwargs_dict_tt_promp['basis_generator_kwargs']['num_basis'] = 3
- kwargs_dict_tt_promp['basis_generator_kwargs']['num_basis_zero_start'] = 1
- kwargs_dict_tt_promp['basis_generator_kwargs']['num_basis_zero_goal'] = 1
- kwargs_dict_tt_promp['black_box_kwargs']['verbose'] = 2
- register(
- id=_env_id,
- entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
- kwargs=kwargs_dict_tt_promp
- )
- ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
-
-for _v in _versions:
- _name = _v.split("-")
- _env_id = f'{_name[0]}ProDMP-{_name[1]}'
- kwargs_dict_tt_prodmp = deepcopy(DEFAULT_BB_DICT_ProDMP)
- if _v == 'TableTennisWind-v0':
- kwargs_dict_tt_prodmp['wrappers'].append(mujoco.table_tennis.TTVelObs_MPWrapper)
- else:
- kwargs_dict_tt_prodmp['wrappers'].append(mujoco.table_tennis.TT_MPWrapper)
- kwargs_dict_tt_prodmp['name'] = _v
- kwargs_dict_tt_prodmp['controller_kwargs']['p_gains'] = 0.5 * np.array([1.0, 4.0, 2.0, 4.0, 1.0, 4.0, 1.0])
- kwargs_dict_tt_prodmp['controller_kwargs']['d_gains'] = 0.5 * np.array([0.1, 0.4, 0.2, 0.4, 0.1, 0.4, 0.1])
- kwargs_dict_tt_prodmp['trajectory_generator_kwargs']['weights_scale'] = 0.7
- kwargs_dict_tt_prodmp['trajectory_generator_kwargs']['auto_scale_basis'] = True
- kwargs_dict_tt_prodmp['trajectory_generator_kwargs']['relative_goal'] = True
- kwargs_dict_tt_prodmp['trajectory_generator_kwargs']['disable_goal'] = True
- kwargs_dict_tt_prodmp['phase_generator_kwargs']['tau_bound'] = [0.8, 1.5]
- kwargs_dict_tt_prodmp['phase_generator_kwargs']['delay_bound'] = [0.05, 0.15]
- kwargs_dict_tt_prodmp['phase_generator_kwargs']['learn_tau'] = True
- kwargs_dict_tt_prodmp['phase_generator_kwargs']['learn_delay'] = True
- kwargs_dict_tt_prodmp['basis_generator_kwargs']['num_basis'] = 3
- kwargs_dict_tt_prodmp['basis_generator_kwargs']['alpha'] = 25.
- kwargs_dict_tt_prodmp['basis_generator_kwargs']['basis_bandwidth_factor'] = 3
- kwargs_dict_tt_prodmp['phase_generator_kwargs']['alpha_phase'] = 3
- register(
- id=_env_id,
- entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
- kwargs=kwargs_dict_tt_prodmp
- )
- ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProDMP"].append(_env_id)
-
-for _v in _versions:
- _name = _v.split("-")
- _env_id = f'{_name[0]}ReplanProDMP-{_name[1]}'
- kwargs_dict_tt_prodmp = deepcopy(DEFAULT_BB_DICT_ProDMP)
- if _v == 'TableTennisWind-v0':
- kwargs_dict_tt_prodmp['wrappers'].append(mujoco.table_tennis.TTVelObs_MPWrapper)
- else:
- kwargs_dict_tt_prodmp['wrappers'].append(mujoco.table_tennis.TT_MPWrapper)
- kwargs_dict_tt_prodmp['name'] = _v
- kwargs_dict_tt_prodmp['controller_kwargs']['p_gains'] = 0.5 * np.array([1.0, 4.0, 2.0, 4.0, 1.0, 4.0, 1.0])
- kwargs_dict_tt_prodmp['controller_kwargs']['d_gains'] = 0.5 * np.array([0.1, 0.4, 0.2, 0.4, 0.1, 0.4, 0.1])
- kwargs_dict_tt_prodmp['trajectory_generator_kwargs']['auto_scale_basis'] = False
- kwargs_dict_tt_prodmp['trajectory_generator_kwargs']['goal_offset'] = 1.0
- kwargs_dict_tt_prodmp['phase_generator_kwargs']['tau_bound'] = [0.8, 1.5]
- kwargs_dict_tt_prodmp['phase_generator_kwargs']['delay_bound'] = [0.05, 0.15]
- kwargs_dict_tt_prodmp['phase_generator_kwargs']['learn_tau'] = True
- kwargs_dict_tt_prodmp['phase_generator_kwargs']['learn_delay'] = True
- kwargs_dict_tt_prodmp['basis_generator_kwargs']['num_basis'] = 2
- kwargs_dict_tt_prodmp['basis_generator_kwargs']['alpha'] = 25.
- kwargs_dict_tt_prodmp['basis_generator_kwargs']['basis_bandwidth_factor'] = 3
- kwargs_dict_tt_prodmp['phase_generator_kwargs']['alpha_phase'] = 3
- kwargs_dict_tt_prodmp['black_box_kwargs']['max_planning_times'] = 3
- kwargs_dict_tt_prodmp['black_box_kwargs']['replanning_schedule'] = lambda pos, vel, obs, action, t : t % 50 == 0
- register(
- id=_env_id,
- entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
- kwargs=kwargs_dict_tt_prodmp
- )
- ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProDMP"].append(_env_id)
-#
-# ## Walker2DJump
-# _versions = ['Walker2DJump-v0']
-# for _v in _versions:
-# _name = _v.split("-")
-# _env_id = f'{_name[0]}ProMP-{_name[1]}'
-# kwargs_dict_walker2d_jump_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
-# kwargs_dict_walker2d_jump_promp['wrappers'].append(mujoco.walker_2d_jump.MPWrapper)
-# kwargs_dict_walker2d_jump_promp['name'] = _v
-# register(
-# id=_env_id,
-# entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
-# kwargs=kwargs_dict_walker2d_jump_promp
-# )
-# ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
-
-### Depricated, we will not provide non random starts anymore
-"""
-register(
- id='SimpleReacher-v1',
- entry_point='fancy_gym.envs.classic_control:SimpleReacherEnv',
- max_episode_steps=200,
+ id='fancy/TableTennisGoalSwitchingReplan-v0',
+ entry_point='fancy_gym.envs.mujoco:TableTennisGoalSwitching',
+ mp_wrapper=MPWrapper_TableTennis_Replan,
+ add_mp_types=['ProDMP'],
+ max_episode_steps=MAX_EPISODE_STEPS_TABLE_TENNIS,
kwargs={
- "n_links": 2,
- "random_start": False
+ 'goal_switching_step': 99
}
)
-
-register(
- id='LongSimpleReacher-v1',
- entry_point='fancy_gym.envs.classic_control:SimpleReacherEnv',
- max_episode_steps=200,
- kwargs={
- "n_links": 5,
- "random_start": False
- }
-)
-register(
- id='HoleReacher-v1',
- entry_point='fancy_gym.envs.classic_control:HoleReacherEnv',
- max_episode_steps=200,
- kwargs={
- "n_links": 5,
- "random_start": False,
- "allow_self_collision": False,
- "allow_wall_collision": False,
- "hole_width": 0.25,
- "hole_depth": 1,
- "hole_x": None,
- "collision_penalty": 100,
- }
-)
-register(
- id='HoleReacher-v2',
- entry_point='fancy_gym.envs.classic_control:HoleReacherEnv',
- max_episode_steps=200,
- kwargs={
- "n_links": 5,
- "random_start": False,
- "allow_self_collision": False,
- "allow_wall_collision": False,
- "hole_width": 0.25,
- "hole_depth": 1,
- "hole_x": 2,
- "collision_penalty": 1,
- }
-)
-
-# CtxtFree are v0, Contextual are v1
-register(
- id='AntJump-v0',
- entry_point='fancy_gym.envs.mujoco:AntJumpEnv',
- max_episode_steps=MAX_EPISODE_STEPS_ANTJUMP,
- kwargs={
- "max_episode_steps": MAX_EPISODE_STEPS_ANTJUMP,
- "context": False
- }
-)
-# CtxtFree are v0, Contextual are v1
-register(
- id='HalfCheetahJump-v0',
- entry_point='fancy_gym.envs.mujoco:HalfCheetahJumpEnv',
- max_episode_steps=MAX_EPISODE_STEPS_HALFCHEETAHJUMP,
- kwargs={
- "max_episode_steps": MAX_EPISODE_STEPS_HALFCHEETAHJUMP,
- "context": False
- }
-)
-register(
- id='HopperJump-v0',
- entry_point='fancy_gym.envs.mujoco:HopperJumpEnv',
- max_episode_steps=MAX_EPISODE_STEPS_HOPPERJUMP,
- kwargs={
- "max_episode_steps": MAX_EPISODE_STEPS_HOPPERJUMP,
- "context": False,
- "healthy_reward": 1.0
- }
-)
-
-"""
-
-### Deprecated used for CorL paper
-"""
-_vs = np.arange(101).tolist() + [1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3, 1e-2, 5e-2, 1e-1, 5e-1]
-for i in _vs:
- _env_id = f'ALRReacher{i}-v0'
- register(
- id=_env_id,
- entry_point='fancy_gym.envs.mujoco:ReacherEnv',
- max_episode_steps=200,
- kwargs={
- "steps_before_reward": 0,
- "n_links": 5,
- "balance": False,
- '_ctrl_cost_weight': i
- }
- )
-
- _env_id = f'ALRReacherSparse{i}-v0'
- register(
- id=_env_id,
- entry_point='fancy_gym.envs.mujoco:ReacherEnv',
- max_episode_steps=200,
- kwargs={
- "steps_before_reward": 200,
- "n_links": 5,
- "balance": False,
- '_ctrl_cost_weight': i
- }
- )
- _vs = np.arange(101).tolist() + [1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3, 1e-2, 5e-2, 1e-1, 5e-1]
-for i in _vs:
- _env_id = f'ALRReacher{i}ProMP-v0'
- register(
- id=_env_id,
- entry_point='fancy_gym.utils.make_env_helpers:make_promp_env_helper',
- kwargs={
- "name": f"{_env_id.replace('ProMP', '')}",
- "wrappers": [mujoco.reacher.MPWrapper],
- "mp_kwargs": {
- "num_dof": 5,
- "num_basis": 5,
- "duration": 4,
- "policy_type": "motor",
- # "weights_scale": 5,
- "n_zero_basis": 1,
- "zero_start": True,
- "policy_kwargs": {
- "p_gains": 1,
- "d_gains": 0.1
- }
- }
- }
- )
-
- _env_id = f'ALRReacherSparse{i}ProMP-v0'
- register(
- id=_env_id,
- entry_point='fancy_gym.utils.make_env_helpers:make_promp_env_helper',
- kwargs={
- "name": f"{_env_id.replace('ProMP', '')}",
- "wrappers": [mujoco.reacher.MPWrapper],
- "mp_kwargs": {
- "num_dof": 5,
- "num_basis": 5,
- "duration": 4,
- "policy_type": "motor",
- # "weights_scale": 5,
- "n_zero_basis": 1,
- "zero_start": True,
- "policy_kwargs": {
- "p_gains": 1,
- "d_gains": 0.1
- }
- }
- }
- )
-
- register(
- id='HopperJumpOnBox-v0',
- entry_point='fancy_gym.envs.mujoco:HopperJumpOnBoxEnv',
- max_episode_steps=MAX_EPISODE_STEPS_HOPPERJUMPONBOX,
- kwargs={
- "max_episode_steps": MAX_EPISODE_STEPS_HOPPERJUMPONBOX,
- "context": False
- }
- )
- register(
- id='HopperThrow-v0',
- entry_point='fancy_gym.envs.mujoco:HopperThrowEnv',
- max_episode_steps=MAX_EPISODE_STEPS_HOPPERTHROW,
- kwargs={
- "max_episode_steps": MAX_EPISODE_STEPS_HOPPERTHROW,
- "context": False
- }
- )
- register(
- id='HopperThrowInBasket-v0',
- entry_point='fancy_gym.envs.mujoco:HopperThrowInBasketEnv',
- max_episode_steps=MAX_EPISODE_STEPS_HOPPERTHROWINBASKET,
- kwargs={
- "max_episode_steps": MAX_EPISODE_STEPS_HOPPERTHROWINBASKET,
- "context": False
- }
- )
- register(
- id='Walker2DJump-v0',
- entry_point='fancy_gym.envs.mujoco:Walker2dJumpEnv',
- max_episode_steps=MAX_EPISODE_STEPS_WALKERJUMP,
- kwargs={
- "max_episode_steps": MAX_EPISODE_STEPS_WALKERJUMP,
- "context": False
- }
- )
- register(id='TableTennis2DCtxt-v1',
- entry_point='fancy_gym.envs.mujoco:TTEnvGym',
- max_episode_steps=MAX_EPISODE_STEPS,
- kwargs={'ctxt_dim': 2, 'fixed_goal': True})
-
- register(
- id='BeerPong-v0',
- entry_point='fancy_gym.envs.mujoco:BeerBongEnv',
- max_episode_steps=300,
- kwargs={
- "rndm_goal": False,
- "cup_goal_pos": [0.1, -2.0],
- "frame_skip": 2
- }
- )
-"""
diff --git a/fancy_gym/envs/classic_control/README.MD b/fancy_gym/envs/classic_control/README.MD
index bd1b68b..b714554 100644
--- a/fancy_gym/envs/classic_control/README.MD
+++ b/fancy_gym/envs/classic_control/README.MD
@@ -1,18 +1,20 @@
### Classic Control
## Step-based Environments
-|Name| Description|Horizon|Action Dimension|Observation Dimension
-|---|---|---|---|---|
-|`SimpleReacher-v0`| Simple reaching task (2 links) without any physics simulation. Provides no reward until 150 time steps. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory.| 200 | 2 | 9
-|`LongSimpleReacher-v0`| Simple reaching task (5 links) without any physics simulation. Provides no reward until 150 time steps. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory.| 200 | 5 | 18
-|`ViaPointReacher-v0`| Simple reaching task leveraging a via point, which supports self collision detection. Provides a reward only at 100 and 199 for reaching the viapoint and goal point, respectively.| 200 | 5 | 18
-|`HoleReacher-v0`| 5 link reaching task where the end-effector needs to reach into a narrow hole without collding with itself or walls | 200 | 5 | 18
+
+| Name | Description | Horizon | Action Dimension | Observation Dimension |
+| ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------- | ---------------- | --------------------- |
+| `fancy/SimpleReacher-v0` | Simple reaching task (2 links) without any physics simulation. Provides no reward until 150 time steps. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory. | 200 | 2 | 9 |
+| `fancy/LongSimpleReacher-v0` | Simple reaching task (5 links) without any physics simulation. Provides no reward until 150 time steps. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory. | 200 | 5 | 18 |
+| `fancy/ViaPointReacher-v0` | Simple reaching task leveraging a via point, which supports self collision detection. Provides a reward only at 100 and 199 for reaching the viapoint and goal point, respectively. | 200 | 5 | 18 |
+| `fancy/HoleReacher-v0` | 5 link reaching task where the end-effector needs to reach into a narrow hole without colliding with itself or walls | 200 | 5 | 18 |
## MP Environments
-|Name| Description|Horizon|Action Dimension|Context Dimension
-|---|---|---|---|---|
-|`ViaPointReacherDMP-v0`| A DMP provides a trajectory for the `ViaPointReacher-v0` task. | 200 | 25
-|`HoleReacherFixedGoalDMP-v0`| A DMP provides a trajectory for the `HoleReacher-v0` task with a fixed goal attractor. | 200 | 25
-|`HoleReacherDMP-v0`| A DMP provides a trajectory for the `HoleReacher-v0` task. The goal attractor needs to be learned. | 200 | 30
-[//]: |`HoleReacherProMPP-v0`|
\ No newline at end of file
+| Name | Description | Horizon | Action Dimension | Context Dimension |
+| ----------------------------------- | -------------------------------------------------------------------------------------------------------- | ------- | ---------------- | ----------------- |
+| `fancy_DMP/ViaPointReacher-v0` | A DMP provides a trajectory for the `fancy/ViaPointReacher-v0` task. | 200 | 25 |
+| `fancy_DMP/HoleReacherFixedGoal-v0` | A DMP provides a trajectory for the `fancy/HoleReacher-v0` task with a fixed goal attractor. | 200 | 25 |
+| `fancy_DMP/HoleReacher-v0` | A DMP provides a trajectory for the `fancy/HoleReacher-v0` task. The goal attractor needs to be learned. | 200 | 30 |
+
+[//]: |`fancy/HoleReacherProMPP-v0`|
diff --git a/fancy_gym/envs/classic_control/base_reacher/base_reacher.py b/fancy_gym/envs/classic_control/base_reacher/base_reacher.py
index f2ba135..18305fd 100644
--- a/fancy_gym/envs/classic_control/base_reacher/base_reacher.py
+++ b/fancy_gym/envs/classic_control/base_reacher/base_reacher.py
@@ -1,10 +1,10 @@
-from typing import Union, Tuple, Optional
+from typing import Union, Tuple, Optional, Any, Dict
-import gym
+import gymnasium as gym
import numpy as np
-from gym import spaces
-from gym.core import ObsType
-from gym.utils import seeding
+from gymnasium import spaces
+from gymnasium.core import ObsType
+from gymnasium.utils import seeding
from fancy_gym.envs.classic_control.utils import intersect
@@ -55,7 +55,6 @@ class BaseReacherEnv(gym.Env):
self.fig = None
self._steps = 0
- self.seed()
@property
def dt(self) -> Union[float, int]:
@@ -69,10 +68,15 @@ class BaseReacherEnv(gym.Env):
def current_vel(self):
return self._angle_velocity.copy()
- def reset(self, *, seed: Optional[int] = None, return_info: bool = False,
- options: Optional[dict] = None, ) -> Union[ObsType, Tuple[ObsType, dict]]:
+ def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
+ -> Tuple[ObsType, Dict[str, Any]]:
# Sample only orientation of first link, i.e. the arm is always straight.
- if self.random_start:
+ super(BaseReacherEnv, self).reset(seed=seed, options=options)
+ try:
+ random_start = options.get('random_start', self.random_start)
+ except AttributeError:
+ random_start = self.random_start
+ if random_start:
first_joint = self.np_random.uniform(np.pi / 4, 3 * np.pi / 4)
self._joint_angles = np.hstack([[first_joint], np.zeros(self.n_links - 1)])
self._start_pos = self._joint_angles.copy()
@@ -84,7 +88,7 @@ class BaseReacherEnv(gym.Env):
self._update_joints()
self._steps = 0
- return self._get_obs().copy()
+ return self._get_obs().copy(), {}
def _update_joints(self):
"""
@@ -124,10 +128,6 @@ class BaseReacherEnv(gym.Env):
def _terminate(self, info) -> bool:
raise NotImplementedError
- def seed(self, seed=None):
- self.np_random, seed = seeding.np_random(seed)
- return [seed]
-
def close(self):
super(BaseReacherEnv, self).close()
del self.fig
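
The new `reset` lets a caller override `random_start` per episode through `options`, and falls back to the constructor value when `options` is `None` by catching `AttributeError`. A minimal sketch of the same lookup with an explicit `None` check follows; `resolve_random_start` is a hypothetical name used only for illustration. Unlike the `try`/`except` version, it does not fall back when `options` is some other object without a `.get` method.

```python
from typing import Any, Dict, Optional


def resolve_random_start(options: Optional[Dict[str, Any]], default: bool) -> bool:
    """Return the per-episode override if one was passed, else the default."""
    if options is None:
        # reset() was called without options: keep the configured behaviour.
        return default
    # A missing key also falls back to the configured behaviour.
    return options.get("random_start", default)


if __name__ == "__main__":
    print(resolve_random_start(None, default=False))
    print(resolve_random_start({"random_start": True}, default=False))
```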
diff --git a/fancy_gym/envs/classic_control/base_reacher/base_reacher_direct.py b/fancy_gym/envs/classic_control/base_reacher/base_reacher_direct.py
index ab21b39..6878922 100644
--- a/fancy_gym/envs/classic_control/base_reacher/base_reacher_direct.py
+++ b/fancy_gym/envs/classic_control/base_reacher/base_reacher_direct.py
@@ -1,5 +1,5 @@
import numpy as np
-from gym import spaces
+from gymnasium import spaces
from fancy_gym.envs.classic_control.base_reacher.base_reacher import BaseReacherEnv
@@ -32,6 +32,7 @@ class BaseReacherDirectEnv(BaseReacherEnv):
reward, info = self._get_reward(action)
self._steps += 1
- done = self._terminate(info)
+ terminated = self._terminate(info)
+ truncated = False
- return self._get_obs().copy(), reward, done, info
+ return self._get_obs().copy(), reward, terminated, truncated, info
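
The change above replaces the single `done` flag with the Gymnasium pair `terminated, truncated`, and `reset` now returns `(obs, info)`. The sketch below shows how calling code typically consumes the five-tuple. `ToyEnv` is a stand-in written for this example so that it runs without `gymnasium` or `fancy_gym` installed; it is not one of the reacher environments.

```python
from typing import Any, Dict, Optional, Tuple


class ToyEnv:
    """Stand-in environment that follows the reset/step signatures used in this diff."""

    def __init__(self, horizon: int = 3) -> None:
        self._horizon = horizon
        self._steps = 0

    def reset(self, *, seed: Optional[int] = None,
              options: Optional[Dict[str, Any]] = None) -> Tuple[float, Dict[str, Any]]:
        self._steps = 0
        return 0.0, {}  # (observation, info)

    def step(self, action: Any) -> Tuple[float, float, bool, bool, Dict[str, Any]]:
        self._steps += 1
        terminated = self._steps >= self._horizon  # task-defined end of episode
        truncated = False  # time-limit cut-offs are left to an outer wrapper
        return float(self._steps), 1.0, terminated, truncated, {}


def rollout(env: ToyEnv) -> float:
    """Run one episode and return the summed reward."""
    obs, info = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, terminated, truncated, info = env.step(None)
        total += reward
        done = terminated or truncated  # the old `done` is the OR of both flags
    return total


if __name__ == "__main__":
    print(rollout(ToyEnv(horizon=3)))
```

Keeping the two flags separate matters mainly for value bootstrapping: an episode cut off by a time limit is not a true terminal state, whereas a terminated one is.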
diff --git a/fancy_gym/envs/classic_control/base_reacher/base_reacher_torque.py b/fancy_gym/envs/classic_control/base_reacher/base_reacher_torque.py
index 7364948..c9a7d4f 100644
--- a/fancy_gym/envs/classic_control/base_reacher/base_reacher_torque.py
+++ b/fancy_gym/envs/classic_control/base_reacher/base_reacher_torque.py
@@ -1,5 +1,5 @@
import numpy as np
-from gym import spaces
+from gymnasium import spaces
from fancy_gym.envs.classic_control.base_reacher.base_reacher import BaseReacherEnv
@@ -31,6 +31,7 @@ class BaseReacherTorqueEnv(BaseReacherEnv):
reward, info = self._get_reward(action)
self._steps += 1
- done = False
+ terminated = False
+ truncated = False
- return self._get_obs().copy(), reward, done, info
+ return self._get_obs().copy(), reward, terminated, truncated, info
diff --git a/fancy_gym/envs/classic_control/hole_reacher/hole_reacher.py b/fancy_gym/envs/classic_control/hole_reacher/hole_reacher.py
index 5563ea6..c9e0a61 100644
--- a/fancy_gym/envs/classic_control/hole_reacher/hole_reacher.py
+++ b/fancy_gym/envs/classic_control/hole_reacher/hole_reacher.py
@@ -1,17 +1,20 @@
-from typing import Union, Optional, Tuple
+from typing import Union, Optional, Tuple, Any, Dict
-import gym
+import gymnasium as gym
import matplotlib.pyplot as plt
import numpy as np
-from gym.core import ObsType
+from gymnasium import spaces
+from gymnasium.core import ObsType
from matplotlib import patches
from fancy_gym.envs.classic_control.base_reacher.base_reacher_direct import BaseReacherDirectEnv
+from . import MPWrapper
MAX_EPISODE_STEPS_HOLEREACHER = 200
class HoleReacherEnv(BaseReacherDirectEnv):
+
def __init__(self, n_links: int, hole_x: Union[None, float] = None, hole_depth: Union[None, float] = None,
hole_width: float = 1., random_start: bool = False, allow_self_collision: bool = False,
allow_wall_collision: bool = False, collision_penalty: float = 1000, rew_fct: str = "simple"):
@@ -40,7 +43,7 @@ class HoleReacherEnv(BaseReacherDirectEnv):
[np.inf] # env steps, because reward start after n steps TODO: Maybe
])
# self.action_space = gym.spaces.Box(low=-action_bound, high=action_bound, shape=action_bound.shape)
- self.observation_space = gym.spaces.Box(low=-state_bound, high=state_bound, shape=state_bound.shape)
+ self.observation_space = spaces.Box(low=-state_bound, high=state_bound, shape=state_bound.shape)
if rew_fct == "simple":
from fancy_gym.envs.classic_control.hole_reacher.hr_simple_reward import HolereacherReward
@@ -54,13 +57,18 @@ class HoleReacherEnv(BaseReacherDirectEnv):
else:
raise ValueError("Unknown reward function {}".format(rew_fct))
- def reset(self, *, seed: Optional[int] = None, return_info: bool = False,
- options: Optional[dict] = None, ) -> Union[ObsType, Tuple[ObsType, dict]]:
+ def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
+ -> Tuple[ObsType, Dict[str, Any]]:
+
+ # initialize seed here as the random goal needs to be generated before the super reset()
+ gym.Env.reset(self, seed=seed, options=options)
+
self._generate_hole()
self._set_patches()
self.reward_function.reset()
- return super().reset()
+ # do not provide seed to avoid setting it twice
+ return super(HoleReacherEnv, self).reset(options=options)
def _get_reward(self, action: np.ndarray) -> (float, dict):
return self.reward_function.get_reward(self)
@@ -160,7 +168,7 @@ class HoleReacherEnv(BaseReacherDirectEnv):
# all points that are above the hole
r, c = np.where((line_points[:, :, 0] > (self._tmp_x - self._tmp_width / 2)) & (
- line_points[:, :, 0] < (self._tmp_x + self._tmp_width / 2)))
+ line_points[:, :, 0] < (self._tmp_x + self._tmp_width / 2)))
# check if any of those points are below surface
nr_line_points_below_surface_in_hole = np.sum(line_points[r, c, 1] < -self._tmp_depth)
@@ -223,16 +231,3 @@ class HoleReacherEnv(BaseReacherDirectEnv):
self.fig.gca().add_patch(left_block)
self.fig.gca().add_patch(right_block)
self.fig.gca().add_patch(hole_floor)
-
-
-if __name__ == "__main__":
-
- env = HoleReacherEnv(5)
- env.reset()
-
- for i in range(10000):
- ac = env.action_space.sample()
- obs, rew, done, info = env.step(ac)
- env.render()
- if done:
- env.reset()
diff --git a/fancy_gym/envs/classic_control/hole_reacher/mp_wrapper.py b/fancy_gym/envs/classic_control/hole_reacher/mp_wrapper.py
index d160b5c..4c56f87 100644
--- a/fancy_gym/envs/classic_control/hole_reacher/mp_wrapper.py
+++ b/fancy_gym/envs/classic_control/hole_reacher/mp_wrapper.py
@@ -7,6 +7,30 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper):
+ mp_config = {
+ 'ProMP': {
+ 'controller_kwargs': {
+ 'controller_type': 'velocity',
+ },
+ 'trajectory_generator_kwargs': {
+ 'weights_scale': 2,
+ },
+ },
+ 'DMP': {
+ 'controller_kwargs': {
+ 'controller_type': 'velocity',
+ },
+ 'trajectory_generator_kwargs': {
+ # TODO: Before it was weight scale 50 and goal scale 0.1. We now only have weight scale and thus set it to 500. Check
+ 'weights_scale': 500,
+ },
+ 'phase_generator_kwargs': {
+ 'alpha_phase': 2.5,
+ },
+ },
+ 'ProDMP': {},
+ }
+
@property
def context_mask(self):
return np.hstack([
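Note on the hunk above: each `MPWrapper` now carries an `mp_config` dictionary keyed by primitive type (`'ProMP'`, `'DMP'`, `'ProDMP'`), with nested `*_kwargs` dictionaries. How `fancy_gym` consumes these entries is not shown in this part of the diff. The sketch below only illustrates one plausible way such nested overrides could be layered over defaults; `merge_config` and `DEFAULTS` are hypothetical names and values, not part of the library.

```python
import copy
from typing import Any, Dict


def merge_config(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively overlay `overrides` on `defaults` without mutating either."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge key by key instead of replacing wholesale.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical library-wide defaults; only the nesting follows the diff.
DEFAULTS = {
    'controller_kwargs': {'controller_type': 'motor'},
    'trajectory_generator_kwargs': {'weights_scale': 1},
}

# The 'DMP' entry of the hole reacher's mp_config, copied from the diff.
HOLE_REACHER_DMP = {
    'controller_kwargs': {'controller_type': 'velocity'},
    'trajectory_generator_kwargs': {'weights_scale': 500},
    'phase_generator_kwargs': {'alpha_phase': 2.5},
}

resolved = merge_config(DEFAULTS, HOLE_REACHER_DMP)
```

Under this assumed scheme, an empty entry such as `'ProDMP': {}` would simply resolve to the defaults.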
diff --git a/fancy_gym/envs/classic_control/simple_reacher/mp_wrapper.py b/fancy_gym/envs/classic_control/simple_reacher/mp_wrapper.py
index 6d1fda1..d2f90d5 100644
--- a/fancy_gym/envs/classic_control/simple_reacher/mp_wrapper.py
+++ b/fancy_gym/envs/classic_control/simple_reacher/mp_wrapper.py
@@ -7,6 +7,28 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper):
+ mp_config = {
+ 'ProMP': {
+ 'controller_kwargs': {
+ 'p_gains': 0.6,
+ 'd_gains': 0.075,
+ },
+ },
+ 'DMP': {
+ 'controller_kwargs': {
+ 'p_gains': 0.6,
+ 'd_gains': 0.075,
+ },
+ 'trajectory_generator_kwargs': {
+ 'weights_scale': 50,
+ },
+ 'phase_generator_kwargs': {
+ 'alpha_phase': 2,
+ },
+ },
+ 'ProDMP': {},
+ }
+
@property
def context_mask(self):
return np.hstack([
diff --git a/fancy_gym/envs/classic_control/simple_reacher/simple_reacher.py b/fancy_gym/envs/classic_control/simple_reacher/simple_reacher.py
index 9b03147..3afd021 100644
--- a/fancy_gym/envs/classic_control/simple_reacher/simple_reacher.py
+++ b/fancy_gym/envs/classic_control/simple_reacher/simple_reacher.py
@@ -1,11 +1,12 @@
-from typing import Iterable, Union, Optional, Tuple
+from typing import Iterable, Union, Optional, Tuple, Any, Dict
import matplotlib.pyplot as plt
import numpy as np
-from gym import spaces
-from gym.core import ObsType
+from gymnasium import spaces
+from gymnasium.core import ObsType
from fancy_gym.envs.classic_control.base_reacher.base_reacher_torque import BaseReacherTorqueEnv
+from . import MPWrapper
class SimpleReacherEnv(BaseReacherTorqueEnv):
@@ -42,11 +43,15 @@ class SimpleReacherEnv(BaseReacherTorqueEnv):
# def start_pos(self):
# return self._start_pos
- def reset(self, *, seed: Optional[int] = None, return_info: bool = False,
- options: Optional[dict] = None, ) -> Union[ObsType, Tuple[ObsType, dict]]:
+ def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
+ -> Tuple[ObsType, Dict[str, Any]]:
+ # Reset twice: the goal must be generated after the seeded reset, and the returned obs must reflect that goal.
+ # (The env will not behave deterministically otherwise.)
+ # Yes, there is probably a more elegant solution to this problem...
self._generate_goal()
-
- return super().reset()
+ super().reset(seed=seed, options=options)
+ self._generate_goal()
+ return super().reset(seed=seed, options=options)
def _get_reward(self, action: np.ndarray):
diff = self.end_effector - self._goal
@@ -127,15 +132,3 @@ class SimpleReacherEnv(BaseReacherTorqueEnv):
self.fig.canvas.draw()
self.fig.canvas.flush_events()
-
-
-if __name__ == "__main__":
- env = SimpleReacherEnv(5)
- env.reset()
- for i in range(200):
- ac = env.action_space.sample()
- obs, rew, done, info = env.step(ac)
-
- env.render()
- if done:
- break
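Note on the `reset` hunk above: `SimpleReacherEnv` resets twice so that the goal is drawn from a seeded generator, while `HoleReacherEnv` in this same diff seeds first via `gym.Env.reset(self, seed=seed, options=options)` and then calls the parent reset without a seed. The sketch below illustrates the seed-first ordering with a plain-Python stand-in. `SeededGoalEnv` and its helpers are hypothetical, and `random.Random` stands in for `self.np_random`.

```python
import random
from typing import Optional, Tuple


class SeededGoalEnv:
    """Hypothetical toy env: seed once, sample the goal, then sample the start."""

    def __init__(self) -> None:
        self._rng = random.Random()
        self.goal = 0.0
        self.start = 0.0

    def _seed(self, seed: Optional[int]) -> None:
        # Plays the role of the seeding-only base reset: reseed only if a seed is given.
        if seed is not None:
            self._rng.seed(seed)

    def _base_reset(self) -> float:
        # Plays the role of the parent reset: draws the start state, never reseeds.
        self.start = self._rng.uniform(0.0, 1.0)
        return self.start

    def reset(self, *, seed: Optional[int] = None) -> Tuple[float, dict]:
        self._seed(seed)                          # 1. seed first
        self.goal = self._rng.uniform(-1.0, 1.0)  # 2. goal comes from the seeded RNG
        obs = self._base_reset()                  # 3. parent reset without a seed
        return obs, {"goal": self.goal}
```

With this ordering each reset draws the goal exactly once, so two instances reset with the same seed stay in lockstep on later unseeded resets as well.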
diff --git a/fancy_gym/envs/classic_control/viapoint_reacher/mp_wrapper.py b/fancy_gym/envs/classic_control/viapoint_reacher/mp_wrapper.py
index 47da749..b915ec0 100644
--- a/fancy_gym/envs/classic_control/viapoint_reacher/mp_wrapper.py
+++ b/fancy_gym/envs/classic_control/viapoint_reacher/mp_wrapper.py
@@ -7,6 +7,26 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper):
+ mp_config = {
+ 'ProMP': {
+ 'controller_kwargs': {
+ 'controller_type': 'velocity',
+ },
+ },
+ 'DMP': {
+ 'controller_kwargs': {
+ 'controller_type': 'velocity',
+ },
+ 'trajectory_generator_kwargs': {
+ 'weights_scale': 50,
+ },
+ 'phase_generator_kwargs': {
+ 'alpha_phase': 2,
+ },
+ },
+ 'ProDMP': {},
+ }
+
@property
def context_mask(self):
return np.hstack([
diff --git a/fancy_gym/envs/classic_control/viapoint_reacher/viapoint_reacher.py b/fancy_gym/envs/classic_control/viapoint_reacher/viapoint_reacher.py
index f3412ac..e4d9091 100644
--- a/fancy_gym/envs/classic_control/viapoint_reacher/viapoint_reacher.py
+++ b/fancy_gym/envs/classic_control/viapoint_reacher/viapoint_reacher.py
@@ -1,11 +1,13 @@
-from typing import Iterable, Union, Tuple, Optional
+from typing import Iterable, Union, Tuple, Optional, Any, Dict
-import gym
+import gymnasium as gym
import matplotlib.pyplot as plt
import numpy as np
-from gym.core import ObsType
+from gymnasium import spaces
+from gymnasium.core import ObsType
from fancy_gym.envs.classic_control.base_reacher.base_reacher_direct import BaseReacherDirectEnv
+from . import MPWrapper
class ViaPointReacherEnv(BaseReacherDirectEnv):
@@ -34,16 +36,21 @@ class ViaPointReacherEnv(BaseReacherDirectEnv):
[np.inf] * 2, # x-y coordinates of target distance
[np.inf] # env steps, because reward start after n steps
])
- self.observation_space = gym.spaces.Box(low=-state_bound, high=state_bound, shape=state_bound.shape)
+ self.observation_space = spaces.Box(low=-state_bound, high=state_bound, shape=state_bound.shape)
# @property
# def start_pos(self):
# return self._start_pos
- def reset(self, *, seed: Optional[int] = None, return_info: bool = False,
- options: Optional[dict] = None, ) -> Union[ObsType, Tuple[ObsType, dict]]:
+ def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
+ -> Tuple[ObsType, Dict[str, Any]]:
+ # Reset twice to ensure we return obs after generating goal and generating goal after executing seeded reset.
+ # (Env will not behave deterministic otherwise)
+ # Yes, there is probably a more elegant solution to this problem...
self._generate_goal()
- return super().reset()
+ super().reset(seed=seed, options=options)
+ self._generate_goal()
+ return super().reset(seed=seed, options=options)
def _generate_goal(self):
# TODO: Maybe improve this later, this can yield quite a lot of invalid settings
@@ -183,16 +190,3 @@ class ViaPointReacherEnv(BaseReacherDirectEnv):
plt.plot(self._joints[:, 0], self._joints[:, 1], 'ro-', markerfacecolor='k')
plt.pause(0.01)
-
-
-if __name__ == "__main__":
-
- env = ViaPointReacherEnv(5)
- env.reset()
-
- for i in range(10000):
- ac = env.action_space.sample()
- obs, rew, done, info = env.step(ac)
- env.render()
- if done:
- env.reset()
diff --git a/fancy_gym/envs/mujoco/README.MD b/fancy_gym/envs/mujoco/README.MD
index 0ea5a1f..ff74085 100644
--- a/fancy_gym/envs/mujoco/README.MD
+++ b/fancy_gym/envs/mujoco/README.MD
@@ -1,15 +1,48 @@
# Custom Mujoco tasks
## Step-based Environments
-|Name| Description|Horizon|Action Dimension|Observation Dimension
-|---|---|---|---|---|
-|`ALRReacher-v0`|Modified (5 links) Mujoco gym's `Reacher-v2` (2 links)| 200 | 5 | 21
-|`ALRReacherSparse-v0`|Same as `ALRReacher-v0`, but the distance penalty is only provided in the last time step.| 200 | 5 | 21
-|`ALRReacherSparseBalanced-v0`|Same as `ALRReacherSparse-v0`, but the end-effector has to remain upright.| 200 | 5 | 21
-|`ALRLongReacher-v0`|Modified (7 links) Mujoco gym's `Reacher-v2` (2 links)| 200 | 7 | 27
-|`ALRLongReacherSparse-v0`|Same as `ALRLongReacher-v0`, but the distance penalty is only provided in the last time step.| 200 | 7 | 27
-|`ALRLongReacherSparseBalanced-v0`|Same as `ALRLongReacherSparse-v0`, but the end-effector has to remain upright.| 200 | 7 | 27
-|`ALRBallInACupSimple-v0`| Ball-in-a-cup task where a robot needs to catch a ball attached to a cup at its end-effector. | 4000 | 3 | wip
-|`ALRBallInACup-v0`| Ball-in-a-cup task where a robot needs to catch a ball attached to a cup at its end-effector | 4000 | 7 | wip
-|`ALRBallInACupGoal-v0`| Similar to `ALRBallInACupSimple-v0` but the ball needs to be caught at a specified goal position | 4000 | 7 | wip
-
\ No newline at end of file
+
+| Name | Description | Horizon | Action Dimension | Observation Dimension |
+| ------------------------------------------ | -------------------------------------------------------------------------------------------------- | ------- | ---------------- | --------------------- |
+| `fancy/Reacher-v0` | Modified (5 links) Gymnasium's MuJoCo `Reacher-v2` (2 links) | 200 | 5 | 21 |
+| `fancy/ReacherSparse-v0` | Same as `fancy/Reacher-v0`, but the distance penalty is only provided in the last time step. | 200 | 5 | 21 |
+| `fancy/ReacherSparseBalanced-v0` | Same as `fancy/ReacherSparse-v0`, but the end-effector has to remain upright. | 200 | 5 | 21 |
+| `fancy/LongReacher-v0` | Modified (7 links) Gymnasium's MuJoCo `Reacher-v2` (2 links) | 200 | 7 | 27 |
+| `fancy/LongReacherSparse-v0` | Same as `fancy/LongReacher-v0`, but the distance penalty is only provided in the last time step. | 200 | 7 | 27 |
+| `fancy/LongReacherSparseBalanced-v0` | Same as `fancy/LongReacherSparse-v0`, but the end-effector has to remain upright. | 200 | 7 | 27 |
+| `fancy/Reacher5d-v0` | Reacher task with 5 links, based on Gymnasium's `gym.envs.mujoco.ReacherEnv` | 200 | 5 | 20 |
+| `fancy/Reacher5dSparse-v0` | Sparse Reacher task with 5 links, based on Gymnasium's `gym.envs.mujoco.ReacherEnv` | 200 | 5 | 20 |
+| `fancy/Reacher7d-v0` | Reacher task with 7 links, based on Gymnasium's `gym.envs.mujoco.ReacherEnv` | 200 | 7 | 22 |
+| `fancy/Reacher7dSparse-v0` | Sparse Reacher task with 7 links, based on Gymnasium's `gym.envs.mujoco.ReacherEnv` | 200 | 7 | 22 |
+| `fancy/HopperJumpSparse-v0` | Hopper Jump task with sparse rewards, based on Gymnasium's `gym.envs.mujoco.Hopper` | 250 | 3 | 15 / 16\* |
+| `fancy/HopperJump-v0` | Hopper Jump task with continuous rewards, based on Gymnasium's `gym.envs.mujoco.Hopper` | 250 | 3 | 15 / 16\* |
+| `fancy/AntJump-v0` | Ant Jump task, based on Gymnasium's `gym.envs.mujoco.Ant` | 200 | 8 | 119 |
+| `fancy/HalfCheetahJump-v0` | HalfCheetah Jump task, based on Gymnasium's `gym.envs.mujoco.HalfCheetah` | 100 | 6 | 112 |
+| `fancy/HopperJumpOnBox-v0` | Hopper Jump on Box task, based on Gymnasium's `gym.envs.mujoco.Hopper` | 250 | 4 | 16 / 100\* |
+| `fancy/HopperThrow-v0` | Hopper Throw task, based on Gymnasium's `gym.envs.mujoco.Hopper` | 250 | 3 | 18 / 100\* |
+| `fancy/HopperThrowInBasket-v0` | Hopper Throw in Basket task, based on Gymnasium's `gym.envs.mujoco.Hopper` | 250 | 3 | 18 / 100\* |
+| `fancy/Walker2DJump-v0` | Walker 2D Jump task, based on Gymnasium's `gym.envs.mujoco.Walker2d` | 300 | 6 | 18 / 19\* |
+| `fancy/BeerPong-v0` | Beer Pong task, based on a custom environment with multiple task variations | 300 | 3 | 29 |
+| `fancy/BeerPongStepBased-v0` | Step-based Beer Pong task, based on a custom environment with episodic rewards | 300 | 3 | 29 |
+| `fancy/BeerPongFixedRelease-v0` | Beer Pong with fixed release, based on a custom environment with episodic rewards | 300 | 3 | 29 |
+| `fancy/BoxPushingDense-v0` | Custom Box-pushing task with dense rewards | 100 | 3 | 13 |
+| `fancy/BoxPushingTemporalSparse-v0` | Custom Box-pushing task with temporally sparse rewards | 100 | 3 | 13 |
+| `fancy/BoxPushingTemporalSpatialSparse-v0` | Custom Box-pushing task with temporally and spatially sparse rewards | 100 | 3 | 13 |
+| `fancy/TableTennis2D-v0` | Table Tennis task with 2D context, based on a custom environment for table tennis | 350 | 7 | 19 |
+| `fancy/TableTennis2DReplan-v0` | Table Tennis task with 2D context and replanning, based on a custom environment for table tennis | 350 | 7 | 19 |
+| `fancy/TableTennis4D-v0` | Table Tennis task with 4D context, based on a custom environment for table tennis | 350 | 7 | 22 |
+| `fancy/TableTennis4DReplan-v0` | Table Tennis task with 4D context and replanning, based on a custom environment for table tennis | 350 | 7 | 22 |
+| `fancy/TableTennisWind-v0` | Table Tennis task with wind effects, based on a custom environment for table tennis | 350 | 7 | 19 |
+| `fancy/TableTennisGoalSwitching-v0` | Table Tennis task with goal switching, based on a custom environment for table tennis | 350 | 7 | 19 |
+| `fancy/TableTennisWindReplan-v0` | Table Tennis task with wind effects and replanning, based on a custom environment for table tennis | 350 | 7 | 19 |
+
+\*Observation dimensions depend on configuration.
+
+
diff --git a/fancy_gym/envs/mujoco/ant_jump/ant_jump.py b/fancy_gym/envs/mujoco/ant_jump/ant_jump.py
index 9311ae1..ed6bea5 100644
--- a/fancy_gym/envs/mujoco/ant_jump/ant_jump.py
+++ b/fancy_gym/envs/mujoco/ant_jump/ant_jump.py
@@ -1,8 +1,11 @@
-from typing import Tuple, Union, Optional
+from typing import Tuple, Union, Optional, Any, Dict
import numpy as np
-from gym.core import ObsType
-from gym.envs.mujoco.ant_v4 import AntEnv
+from gymnasium.core import ObsType
+from gymnasium.envs.mujoco.ant_v4 import AntEnv, DEFAULT_CAMERA_CONFIG
+from gymnasium import utils
+from gymnasium.envs.mujoco import MujocoEnv
+from gymnasium.spaces import Box
MAX_EPISODE_STEPS_ANTJUMP = 200
@@ -12,8 +15,74 @@ MAX_EPISODE_STEPS_ANTJUMP = 200
# to the same structure as the Hopper, where the angles are randomized (->contexts) and the agent should jump as heigh
# as possible, while landing at a specific target position
+class AntEnvCustomXML(AntEnv):
+ def __init__(
+ self,
+ xml_file="ant.xml",
+ ctrl_cost_weight=0.5,
+ use_contact_forces=False,
+ contact_cost_weight=5e-4,
+ healthy_reward=1.0,
+ terminate_when_unhealthy=True,
+ healthy_z_range=(0.2, 1.0),
+ contact_force_range=(-1.0, 1.0),
+ reset_noise_scale=0.1,
+ exclude_current_positions_from_observation=True,
+ **kwargs,
+ ):
+ utils.EzPickle.__init__(
+ self,
+ xml_file,
+ ctrl_cost_weight,
+ use_contact_forces,
+ contact_cost_weight,
+ healthy_reward,
+ terminate_when_unhealthy,
+ healthy_z_range,
+ contact_force_range,
+ reset_noise_scale,
+ exclude_current_positions_from_observation,
+ **kwargs,
+ )
-class AntJumpEnv(AntEnv):
+ self._ctrl_cost_weight = ctrl_cost_weight
+ self._contact_cost_weight = contact_cost_weight
+
+ self._healthy_reward = healthy_reward
+ self._terminate_when_unhealthy = terminate_when_unhealthy
+ self._healthy_z_range = healthy_z_range
+
+ self._contact_force_range = contact_force_range
+
+ self._reset_noise_scale = reset_noise_scale
+
+ self._use_contact_forces = use_contact_forces
+
+ self._exclude_current_positions_from_observation = (
+ exclude_current_positions_from_observation
+ )
+
+ obs_shape = 27 + 1
+ if not exclude_current_positions_from_observation:
+ obs_shape += 2
+ if use_contact_forces:
+ obs_shape += 84
+
+ observation_space = Box(
+ low=-np.inf, high=np.inf, shape=(obs_shape,), dtype=np.float64
+ )
+
+ MujocoEnv.__init__(
+ self,
+ xml_file,
+ 5,
+ observation_space=observation_space,
+ default_camera_config=DEFAULT_CAMERA_CONFIG,
+ **kwargs,
+ )
+
+
+class AntJumpEnv(AntEnvCustomXML):
"""
Initialization changes to normal Ant:
- healthy_reward: 1.0 -> 0.01 -> 0.0 no healthy reward needed - Paul and Marc
@@ -61,9 +130,10 @@ class AntJumpEnv(AntEnv):
costs = ctrl_cost + contact_cost
- done = bool(height < 0.3) # fall over -> is the 0.3 value from healthy_z_range? TODO change 0.3 to the value of healthy z angle
+ terminated = bool(
+ height < 0.3) # fall over -> is the 0.3 value from healthy_z_range? TODO change 0.3 to the value of healthy z angle
- if self.current_step == MAX_EPISODE_STEPS_ANTJUMP or done:
+ if self.current_step == MAX_EPISODE_STEPS_ANTJUMP or terminated:
# -10 for scaling the value of the distance between the max_height and the goal height; only used when context is enabled
# height_reward = -10 * (np.linalg.norm(self.max_height - self.goal))
height_reward = -10 * np.linalg.norm(self.max_height - self.goal)
@@ -80,19 +150,21 @@ class AntJumpEnv(AntEnv):
'max_height': self.max_height,
'goal': self.goal
}
+ truncated = False
- return obs, reward, done, info
+ return obs, reward, terminated, truncated, info
def _get_obs(self):
return np.append(super()._get_obs(), self.goal)
- def reset(self, *, seed: Optional[int] = None, return_info: bool = False,
- options: Optional[dict] = None, ) -> Union[ObsType, Tuple[ObsType, dict]]:
+ def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
+ -> Tuple[ObsType, Dict[str, Any]]:
self.current_step = 0
self.max_height = 0
# goal heights from 1.0 to 2.5; can be increased, but didnt work well with CMORE
+ ret = super().reset(seed=seed, options=options)
self.goal = self.np_random.uniform(1.0, 2.5, 1)
- return super().reset()
+ return ret
# reset_model had to be implemented in every env to make it deterministic
def reset_model(self):
diff --git a/fancy_gym/envs/mujoco/beerpong/beerpong.py b/fancy_gym/envs/mujoco/beerpong/beerpong.py
index 368425d..802776f 100644
--- a/fancy_gym/envs/mujoco/beerpong/beerpong.py
+++ b/fancy_gym/envs/mujoco/beerpong/beerpong.py
@@ -1,9 +1,13 @@
import os
-from typing import Optional
+from typing import Optional, Any, Dict, Tuple
import numpy as np
-from gym import utils
-from gym.envs.mujoco import MujocoEnv
+from gymnasium import utils
+from gymnasium.core import ObsType
+from gymnasium.envs.mujoco import MujocoEnv
+from gymnasium.spaces import Box
+
+import mujoco
MAX_EPISODE_STEPS_BEERPONG = 300
FIXED_RELEASE_STEP = 62 # empirically evaluated for frame_skip=2!
@@ -30,7 +34,16 @@ CUP_COLLISION_OBJ = ["cup_geom_table3", "cup_geom_table4", "cup_geom_table5", "c
class BeerPongEnv(MujocoEnv, utils.EzPickle):
- def __init__(self):
+ metadata = {
+ "render_modes": [
+ "human",
+ "rgb_array",
+ "depth_array",
+ ],
+ "render_fps": 100
+ }
+
+ def __init__(self, **kwargs):
self._steps = 0
# Small Context -> Easier. Todo: Should we do different versions?
# self.xml_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "assets", "beerpong_wo_cup.xml")
@@ -50,9 +63,9 @@ class BeerPongEnv(MujocoEnv, utils.EzPickle):
self.repeat_action = 2
# TODO: If accessing IDs is easier in the (new) official mujoco bindings, remove this
self.model = None
- self.geom_id = lambda x: self._mujoco_bindings.mj_name2id(self.model,
- self._mujoco_bindings.mjtObj.mjOBJ_GEOM,
- x)
+ self.geom_id = lambda x: mujoco.mj_name2id(self.model,
+ mujoco.mjtObj.mjOBJ_GEOM,
+ x)
# for reward calculation
self.dists = []
@@ -65,7 +78,17 @@ class BeerPongEnv(MujocoEnv, utils.EzPickle):
self.ball_in_cup = False
self.dist_ground_cup = -1 # distance floor to cup if first floor contact
- MujocoEnv.__init__(self, model_path=self.xml_path, frame_skip=1, mujoco_bindings="mujoco")
+ self.observation_space = Box(
+ low=-np.inf, high=np.inf, shape=(29,), dtype=np.float64
+ )
+
+ MujocoEnv.__init__(
+ self,
+ self.xml_path,
+ frame_skip=1,
+ observation_space=self.observation_space,
+ **kwargs
+ )
utils.EzPickle.__init__(self)
@property
@@ -76,7 +99,8 @@ class BeerPongEnv(MujocoEnv, utils.EzPickle):
def start_vel(self):
return self._start_vel
- def reset(self, *, seed: Optional[int] = None, return_info: bool = False, options: Optional[dict] = None):
+ def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
+ -> Tuple[ObsType, Dict[str, Any]]:
self.dists = []
self.dists_final = []
self.action_costs = []
@@ -86,7 +110,7 @@ class BeerPongEnv(MujocoEnv, utils.EzPickle):
self.ball_cup_contact = False
self.ball_in_cup = False
self.dist_ground_cup = -1 # distance floor to cup if first floor contact
- return super().reset()
+ return super().reset(seed=seed, options=options)
def reset_model(self):
init_pos_all = self.init_qpos.copy()
@@ -128,11 +152,11 @@ class BeerPongEnv(MujocoEnv, utils.EzPickle):
if not crash:
reward, reward_infos = self._get_reward(applied_action)
is_collided = reward_infos['is_collided'] # TODO: Remove if self collision does not make a difference
- done = is_collided
+ terminated = is_collided
self._steps += 1
else:
reward = -30
- done = True
+ terminated = True
reward_infos = {"success": False, "ball_pos": np.zeros(3), "ball_vel": np.zeros(3), "is_collided": False}
infos = dict(
@@ -142,7 +166,10 @@ class BeerPongEnv(MujocoEnv, utils.EzPickle):
q_vel=self.data.qvel[0:7].ravel().copy(), sim_crash=crash,
)
infos.update(reward_infos)
- return ob, reward, done, infos
+
+ truncated = False
+
+ return ob, reward, terminated, truncated, infos
def _get_obs(self):
theta = self.data.qpos.flat[:7].copy()
@@ -197,13 +224,13 @@ class BeerPongEnv(MujocoEnv, utils.EzPickle):
min_dist_coeff, final_dist_coeff, ground_contact_dist_coeff, rew_offset = 0, 1, 0, 0
action_cost = 1e-4 * np.mean(action_cost)
reward = rew_offset - min_dist_coeff * min_dist ** 2 - final_dist_coeff * final_dist ** 2 - \
- action_cost - ground_contact_dist_coeff * self.dist_ground_cup ** 2
+ action_cost - ground_contact_dist_coeff * self.dist_ground_cup ** 2
# release step punishment
min_time_bound = 0.1
max_time_bound = 1.0
release_time = self.release_step * self.dt
release_time_rew = int(release_time < min_time_bound) * (-30 - 10 * (release_time - min_time_bound) ** 2) + \
- int(release_time > max_time_bound) * (-30 - 10 * (release_time - max_time_bound) ** 2)
+ int(release_time > max_time_bound) * (-30 - 10 * (release_time - max_time_bound) ** 2)
reward += release_time_rew
success = self.ball_in_cup
else:
@@ -258,9 +285,9 @@ class BeerPongEnvStepBasedEpisodicReward(BeerPongEnv):
return super(BeerPongEnvStepBasedEpisodicReward, self).step(a)
else:
reward = 0
- done = True
+ terminated, truncated = True, False
while self._steps < MAX_EPISODE_STEPS_BEERPONG:
- obs, sub_reward, done, infos = super(BeerPongEnvStepBasedEpisodicReward, self).step(
+ obs, sub_reward, terminated, truncated, infos = super(BeerPongEnvStepBasedEpisodicReward, self).step(
np.zeros(a.shape))
reward += sub_reward
- return obs, reward, done, infos
+ return obs, reward, terminated, truncated, infos
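Note on the last hunk above: once the ball is released, `BeerPongEnvStepBasedEpisodicReward.step` keeps stepping the parent env with zero actions until `MAX_EPISODE_STEPS_BEERPONG` is reached and returns the summed reward as a single transition. The sketch below reproduces that accumulation pattern in isolation. `CountingEnv` and `coast_to_horizon` are hypothetical stand-ins, and the constant reward of -1 per step is chosen only to make the sum easy to check.

```python
from typing import Any, Dict, Optional, Tuple


class CountingEnv:
    """Hypothetical stub: each step yields reward -1 and advances a step counter."""

    def __init__(self) -> None:
        self.steps = 0

    def step(self, action: float) -> Tuple[int, float, bool, bool, Dict[str, Any]]:
        self.steps += 1
        return self.steps, -1.0, False, False, {}


def coast_to_horizon(env: CountingEnv, max_steps: int
                     ) -> Tuple[Optional[int], float, bool, bool, Dict[str, Any]]:
    """Step with a zero action until the horizon and sum the per-step rewards."""
    total_reward = 0.0
    obs: Optional[int] = None
    # Defaults mirror the diff: if no step is taken, the episode counts as terminated.
    terminated, truncated = True, False
    info: Dict[str, Any] = {}
    while env.steps < max_steps:
        obs, sub_reward, terminated, truncated, info = env.step(0.0)
        total_reward += sub_reward
    return obs, total_reward, terminated, truncated, info
```

The flags returned are those of the last inner step, so a caller sees one transition per episode tail rather than one per simulator step.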
diff --git a/fancy_gym/envs/mujoco/beerpong/deprecated/beerpong.py b/fancy_gym/envs/mujoco/beerpong/deprecated/beerpong.py
index 015e887..93bba06 100644
--- a/fancy_gym/envs/mujoco/beerpong/deprecated/beerpong.py
+++ b/fancy_gym/envs/mujoco/beerpong/deprecated/beerpong.py
@@ -1,9 +1,8 @@
import os
-import mujoco_py.builder
import numpy as np
-from gym import utils
-from gym.envs.mujoco import MujocoEnv
+from gymnasium import utils
+from gymnasium.envs.mujoco import MujocoEnv
from fancy_gym.envs.mujoco.beerpong.deprecated.beerpong_reward_staged import BeerPongReward
@@ -74,27 +73,24 @@ class BeerPongEnv(MujocoEnv, utils.EzPickle):
crash = False
for _ in range(self.repeat_action):
applied_action = a + self.sim.data.qfrc_bias[:len(a)].copy() / self.model.actuator_gear[:, 0]
- try:
- self.do_simulation(applied_action, self.frame_skip)
- self.reward_function.initialize(self)
- # self.reward_function.check_contacts(self.sim) # I assume this is not important?
- if self._steps < self.release_step:
- self.sim.data.qpos[7::] = self.sim.data.site_xpos[self.site_id("init_ball_pos"), :].copy()
- self.sim.data.qvel[7::] = self.sim.data.site_xvelp[self.site_id("init_ball_pos"), :].copy()
- crash = False
- except mujoco_py.builder.MujocoException:
- crash = True
+ self.do_simulation(applied_action, self.frame_skip)
+ self.reward_function.initialize(self)
+ # self.reward_function.check_contacts(self.sim) # I assume this is not important?
+ if self._steps < self.release_step:
+ self.sim.data.qpos[7::] = self.sim.data.site_xpos[self.site_id("init_ball_pos"), :].copy()
+ self.sim.data.qvel[7::] = self.sim.data.site_xvelp[self.site_id("init_ball_pos"), :].copy()
+ crash = False
ob = self._get_obs()
if not crash:
reward, reward_infos = self.reward_function.compute_reward(self, applied_action)
is_collided = reward_infos['is_collided']
- done = is_collided or self._steps == self.ep_length - 1
+ terminated = is_collided or self._steps == self.ep_length - 1
self._steps += 1
else:
reward = -30
- done = True
+ terminated = True
reward_infos = {"success": False, "ball_pos": np.zeros(3), "ball_vel": np.zeros(3), "is_collided": False}
infos = dict(
@@ -104,7 +100,7 @@ class BeerPongEnv(MujocoEnv, utils.EzPickle):
q_vel=self.sim.data.qvel[0:7].ravel().copy(), sim_crash=crash,
)
infos.update(reward_infos)
- return ob, reward, done, infos
+ return ob, reward, terminated, False, infos
def _get_obs(self):
theta = self.sim.data.qpos.flat[:7]
@@ -143,16 +139,16 @@ class BeerPongEnvStepBasedEpisodicReward(BeerPongEnv):
return super(BeerPongEnvStepBasedEpisodicReward, self).step(a)
else:
reward = 0
- done = False
- while not done:
- sub_ob, sub_reward, done, sub_infos = super(BeerPongEnvStepBasedEpisodicReward, self).step(
- np.zeros(a.shape))
+ terminated, truncated = False, False
+ while not (terminated or truncated):
+ sub_ob, sub_reward, terminated, truncated, sub_infos = super(BeerPongEnvStepBasedEpisodicReward,
+ self).step(np.zeros(a.shape))
reward += sub_reward
infos = sub_infos
ob = sub_ob
ob[-1] = self.release_step + 1 # Since we simulate until the end of the episode, PPO does not see the
# internal steps and thus, the observation also needs to be set correctly
- return ob, reward, done, infos
+ return ob, reward, terminated, truncated, infos
# class BeerBongEnvStepBased(BeerBongEnv):
@@ -186,27 +182,3 @@ class BeerPongEnvStepBasedEpisodicReward(BeerPongEnv):
# ob[-1] = self.release_step + 1 # Since we simulate until the end of the episode, PPO does not see the
# # internal steps and thus, the observation also needs to be set correctly
# return ob, reward, done, infos
-
-
-if __name__ == "__main__":
- env = BeerPongEnv(frame_skip=2)
- env.seed(0)
- # env = BeerBongEnvStepBased(frame_skip=2)
- # env = BeerBongEnvStepBasedEpisodicReward(frame_skip=2)
- # env = BeerBongEnvFixedReleaseStep(frame_skip=2)
- import time
-
- env.reset()
- env.render("human")
- for i in range(600):
- # ac = 10 * env.action_space.sample()
- ac = 0.05 * np.ones(7)
- obs, rew, d, info = env.step(ac)
- env.render("human")
-
- if d:
- print('reward:', rew)
- print('RESETTING')
- env.reset()
- time.sleep(1)
- env.close()
diff --git a/fancy_gym/envs/mujoco/beerpong/mp_wrapper.py b/fancy_gym/envs/mujoco/beerpong/mp_wrapper.py
index 17a11e1..452ee05 100644
--- a/fancy_gym/envs/mujoco/beerpong/mp_wrapper.py
+++ b/fancy_gym/envs/mujoco/beerpong/mp_wrapper.py
@@ -6,6 +6,23 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper):
+ mp_config = {
+ 'ProMP': {
+ 'phase_generator_kwargs': {
+ 'learn_tau': True
+ },
+ 'controller_kwargs': {
+ 'p_gains': np.array([1.5, 5, 2.55, 3, 2., 2, 1.25]),
+ 'd_gains': np.array([0.02333333, 0.1, 0.0625, 0.08, 0.03, 0.03, 0.0125]),
+ },
+ 'basis_generator_kwargs': {
+ 'num_basis': 2,
+ 'num_basis_zero_start': 2,
+ },
+ },
+ 'DMP': {},
+ 'ProDMP': {},
+ }
@property
def context_mask(self) -> np.ndarray:
@@ -39,3 +56,23 @@ class MPWrapper(RawInterfaceWrapper):
xyz[-1] = 0.840
self.model.body_pos[self.cup_table_id] = xyz
return self.get_observation_from_step(self.get_obs())
+
+
+class MPWrapper_FixedRelease(MPWrapper):
+ mp_config = {
+ 'ProMP': {
+ 'phase_generator_kwargs': {
+ 'tau': 0.62,
+ },
+ 'controller_kwargs': {
+ 'p_gains': np.array([1.5, 5, 2.55, 3, 2., 2, 1.25]),
+ 'd_gains': np.array([0.02333333, 0.1, 0.0625, 0.08, 0.03, 0.03, 0.0125]),
+ },
+ 'basis_generator_kwargs': {
+ 'num_basis': 2,
+ 'num_basis_zero_start': 2,
+ },
+ },
+ 'DMP': {},
+ 'ProDMP': {},
+ }
diff --git a/fancy_gym/envs/mujoco/box_pushing/__init__.py b/fancy_gym/envs/mujoco/box_pushing/__init__.py
index c5e6d2f..d683024 100644
--- a/fancy_gym/envs/mujoco/box_pushing/__init__.py
+++ b/fancy_gym/envs/mujoco/box_pushing/__init__.py
@@ -1 +1 @@
-from .mp_wrapper import MPWrapper
+from .mp_wrapper import MPWrapper, ReplanMPWrapper
diff --git a/fancy_gym/envs/mujoco/box_pushing/box_pushing_env.py b/fancy_gym/envs/mujoco/box_pushing/box_pushing_env.py
index 06f7e02..932e3df 100644
--- a/fancy_gym/envs/mujoco/box_pushing/box_pushing_env.py
+++ b/fancy_gym/envs/mujoco/box_pushing/box_pushing_env.py
@@ -1,8 +1,8 @@
import os
import numpy as np
-from gym import utils, spaces
-from gym.envs.mujoco import MujocoEnv
+from gymnasium import utils, spaces
+from gymnasium.envs.mujoco import MujocoEnv
from fancy_gym.envs.mujoco.box_pushing.box_pushing_utils import rot_to_quat, get_quaternion_error, rotation_distance
from fancy_gym.envs.mujoco.box_pushing.box_pushing_utils import q_max, q_min, q_dot_max, q_torque_max
from fancy_gym.envs.mujoco.box_pushing.box_pushing_utils import desired_rod_quat
@@ -13,6 +13,7 @@ MAX_EPISODE_STEPS_BOX_PUSHING = 100
BOX_POS_BOUND = np.array([[0.3, -0.45, -0.01], [0.6, 0.45, -0.01]])
+
class BoxPushingEnvBase(MujocoEnv, utils.EzPickle):
"""
franka box pushing environment
@@ -26,6 +27,15 @@ class BoxPushingEnvBase(MujocoEnv, utils.EzPickle):
3. time-spatial-depend sparse reward
"""
+ metadata = {
+ "render_modes": [
+ "human",
+ "rgb_array",
+ "depth_array",
+ ],
+ "render_fps": 50
+ }
+
def __init__(self, frame_skip: int = 10, random_init: bool = False):
utils.EzPickle.__init__(**locals())
self._steps = 0
@@ -39,11 +49,16 @@ class BoxPushingEnvBase(MujocoEnv, utils.EzPickle):
self._desired_rod_quat = desired_rod_quat
self._episode_energy = 0.
+
+ self.observation_space = spaces.Box(
+ low=-np.inf, high=np.inf, shape=(28,), dtype=np.float64
+ )
+
self.random_init = random_init
MujocoEnv.__init__(self,
model_path=os.path.join(os.path.dirname(__file__), "assets", "box_pushing.xml"),
frame_skip=self.frame_skip,
- mujoco_bindings="mujoco")
+ observation_space=self.observation_space)
self.action_space = spaces.Box(low=-1, high=1, shape=(7,))
def step(self, action):
@@ -89,7 +104,11 @@ class BoxPushingEnvBase(MujocoEnv, utils.EzPickle):
'is_success': True if episode_end and box_goal_pos_dist < 0.05 and box_goal_quat_dist < 0.5 else False,
'num_steps': self._steps
}
- return obs, reward, episode_end, infos
+
+ terminated = episode_end and infos['is_success']
+ truncated = episode_end and not infos['is_success']
+
+ return obs, reward, terminated, truncated, infos
def reset_model(self):
# reset box to initial position
@@ -250,7 +269,7 @@ class BoxPushingEnvBase(MujocoEnv, utils.EzPickle):
old_err_norm = err_norm
- ### get Jacobian by mujoco
+ # get Jacobian by mujoco
self.data.qpos[:7] = q
mujoco.mj_forward(self.model, self.data)
@@ -284,6 +303,7 @@ class BoxPushingEnvBase(MujocoEnv, utils.EzPickle):
return q
+
class BoxPushingDense(BoxPushingEnvBase):
def __init__(self, frame_skip: int = 10, random_init: bool = False):
super(BoxPushingDense, self).__init__(frame_skip=frame_skip, random_init=random_init)
@@ -299,7 +319,7 @@ class BoxPushingDense(BoxPushingEnvBase):
energy_cost = -0.0005 * np.sum(np.square(action))
reward = joint_penalty + tcp_box_dist_reward + \
- box_goal_pos_dist_reward + box_goal_rot_dist_reward + energy_cost
+ box_goal_pos_dist_reward + box_goal_rot_dist_reward + energy_cost
rod_inclined_angle = rotation_distance(rod_quat, self._desired_rod_quat)
if rod_inclined_angle > np.pi / 4:
@@ -307,6 +327,7 @@ class BoxPushingDense(BoxPushingEnvBase):
return reward
+
class BoxPushingTemporalSparse(BoxPushingEnvBase):
def __init__(self, frame_skip: int = 10, random_init: bool = False):
super(BoxPushingTemporalSparse, self).__init__(frame_skip=frame_skip, random_init=random_init)
@@ -368,6 +389,7 @@ class BoxPushingTemporalSpatialSparse(BoxPushingEnvBase):
return reward
+
class BoxPushingTemporalSpatialSparse2(BoxPushingEnvBase):
def __init__(self, frame_skip: int = 10, random_init: bool = False):
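The `step()` change in `BoxPushingEnvBase` maps the old single `done` flag onto Gymnasium's two flags: an episode that ends successfully is reported as `terminated`, and one that ends without success as `truncated`. A minimal sketch of that mapping, where `split_done` is a hypothetical helper name that does not exist in `fancy_gym`:

```python
from typing import Tuple


def split_done(episode_end: bool, is_success: bool) -> Tuple[bool, bool]:
    """Map the old `done` flag to (terminated, truncated).

    Hypothetical helper mirroring the two lines added to
    BoxPushingEnvBase.step(); not part of fancy_gym.
    """
    terminated = episode_end and is_success
    truncated = episode_end and not is_success
    return terminated, truncated


# Episode still running: neither flag is set.
print(split_done(episode_end=False, is_success=False))  # (False, False)
# Episode ended with the box at the goal.
print(split_done(episode_end=True, is_success=True))    # (True, False)
# Episode ended without success.
print(split_done(episode_end=True, is_success=False))   # (False, True)
```

Callers that only need the old behaviour can recover it as `done = terminated or truncated`.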
diff --git a/fancy_gym/envs/mujoco/box_pushing/mp_wrapper.py b/fancy_gym/envs/mujoco/box_pushing/mp_wrapper.py
index 8da6855..06bb7dc 100644
--- a/fancy_gym/envs/mujoco/box_pushing/mp_wrapper.py
+++ b/fancy_gym/envs/mujoco/box_pushing/mp_wrapper.py
@@ -6,6 +6,27 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper):
+ mp_config = {
+ 'ProMP': {
+ 'controller_kwargs': {
+ 'p_gains': 0.01 * np.array([120., 120., 120., 120., 50., 30., 10.]),
+ 'd_gains': 0.01 * np.array([10., 10., 10., 10., 6., 5., 3.]),
+ },
+ 'basis_generator_kwargs': {
+ 'basis_bandwidth_factor': 2 # 3.5, 4 to try
+ }
+ },
+ 'DMP': {},
+ 'ProDMP': {
+ 'controller_kwargs': {
+ 'p_gains': 0.01 * np.array([120., 120., 120., 120., 50., 30., 10.]),
+ 'd_gains': 0.01 * np.array([10., 10., 10., 10., 6., 5., 3.]),
+ },
+ 'basis_generator_kwargs': {
+ 'basis_bandwidth_factor': 2 # 3.5, 4 to try
+ }
+ },
+ }
# Random x goal + random init pos
@property
@@ -38,3 +59,35 @@ class MPWrapper(RawInterfaceWrapper):
@property
def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
return self.data.qvel[:7].copy()
+
+
+class ReplanMPWrapper(MPWrapper):
+ mp_config = {
+ 'ProMP': {},
+ 'DMP': {},
+ 'ProDMP': {
+ 'controller_kwargs': {
+ 'p_gains': 0.01 * np.array([120., 120., 120., 120., 50., 30., 10.]),
+ 'd_gains': 0.01 * np.array([10., 10., 10., 10., 6., 5., 3.]),
+ },
+ 'trajectory_generator_kwargs': {
+ 'weights_scale': 0.3,
+ 'goal_scale': 0.3,
+ 'auto_scale_basis': True,
+ 'goal_offset': 1.0,
+ 'disable_goal': True,
+ },
+ 'basis_generator_kwargs': {
+ 'num_basis': 5,
+ 'basis_bandwidth_factor': 3,
+ },
+ 'phase_generator_kwargs': {
+ 'alpha_phase': 3,
+ },
+ 'black_box_kwargs': {
+ 'max_planning_times': 4,
+ 'replanning_schedule': lambda pos, vel, obs, action, t: t % 25 == 0,
+ 'condition_on_desired': True,
+ }
+ }
+ }
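The `replanning_schedule` entry in `ReplanMPWrapper` is a predicate over `(pos, vel, obs, action, t)` that only inspects the step counter `t`. The sketch below evaluates the same lambda over one 100-step episode (the value of `MAX_EPISODE_STEPS_BOX_PUSHING`) to show which steps satisfy it. How the black-box wrapper counts `t`, and how it combines the result with `max_planning_times`, is not shown in this diff, so the range of `t` used here is an assumption.

```python
# Same predicate as in ReplanMPWrapper.mp_config; only `t` is inspected.
replanning_schedule = lambda pos, vel, obs, action, t: t % 25 == 0

# Assumption: t runs over 1..100 within an episode.
replan_steps = [t for t in range(1, 101)
                if replanning_schedule(None, None, None, None, t)]
print(replan_steps)  # [25, 50, 75, 100]
```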
diff --git a/fancy_gym/envs/mujoco/half_cheetah_jump/half_cheetah_jump.py b/fancy_gym/envs/mujoco/half_cheetah_jump/half_cheetah_jump.py
index e0a5982..f15a9f4 100644
--- a/fancy_gym/envs/mujoco/half_cheetah_jump/half_cheetah_jump.py
+++ b/fancy_gym/envs/mujoco/half_cheetah_jump/half_cheetah_jump.py
@@ -1,14 +1,68 @@
import os
-from typing import Tuple, Union, Optional
+from typing import Tuple, Union, Optional, Any, Dict
import numpy as np
-from gym.core import ObsType
-from gym.envs.mujoco.half_cheetah_v4 import HalfCheetahEnv
+from gymnasium.core import ObsType
+from gymnasium.envs.mujoco.half_cheetah_v4 import HalfCheetahEnv, DEFAULT_CAMERA_CONFIG
+
+from gymnasium import utils
+from gymnasium.envs.mujoco import MujocoEnv
+from gymnasium.spaces import Box
MAX_EPISODE_STEPS_HALFCHEETAHJUMP = 100
-class HalfCheetahJumpEnv(HalfCheetahEnv):
+class HalfCheetahEnvCustomXML(HalfCheetahEnv):
+
+ def __init__(
+ self,
+ xml_file,
+ forward_reward_weight=1.0,
+ ctrl_cost_weight=0.1,
+ reset_noise_scale=0.1,
+ exclude_current_positions_from_observation=True,
+ **kwargs,
+ ):
+ utils.EzPickle.__init__(
+ self,
+ xml_file,
+ forward_reward_weight,
+ ctrl_cost_weight,
+ reset_noise_scale,
+ exclude_current_positions_from_observation,
+ **kwargs,
+ )
+
+ self._forward_reward_weight = forward_reward_weight
+
+ self._ctrl_cost_weight = ctrl_cost_weight
+
+ self._reset_noise_scale = reset_noise_scale
+
+ self._exclude_current_positions_from_observation = (
+ exclude_current_positions_from_observation
+ )
+
+ if exclude_current_positions_from_observation:
+ observation_space = Box(
+ low=-np.inf, high=np.inf, shape=(18,), dtype=np.float64
+ )
+ else:
+ observation_space = Box(
+ low=-np.inf, high=np.inf, shape=(19,), dtype=np.float64
+ )
+
+ MujocoEnv.__init__(
+ self,
+ xml_file,
+ 5,
+ observation_space=observation_space,
+ default_camera_config=DEFAULT_CAMERA_CONFIG,
+ **kwargs,
+ )
+
+
+class HalfCheetahJumpEnv(HalfCheetahEnvCustomXML):
"""
_ctrl_cost_weight 0.1 -> 0.0
"""
@@ -41,10 +95,11 @@ class HalfCheetahJumpEnv(HalfCheetahEnv):
height_after = self.get_body_com("torso")[2]
self.max_height = max(height_after, self.max_height)
- ## Didnt use fell_over, because base env also has no done condition - Paul and Marc
+ # Didn't use fell_over, because the base env also has no done condition - Paul and Marc
# fell_over = abs(self.sim.data.qpos[2]) > 2.5 # how to figure out if the cheetah fell over? -> 2.5 oke?
# TODO: Should a fall over be checked here?
- done = False
+ terminated = False
+ truncated = False
ctrl_cost = self.control_cost(action)
costs = ctrl_cost
@@ -63,17 +118,18 @@ class HalfCheetahJumpEnv(HalfCheetahEnv):
'max_height': self.max_height
}
- return observation, reward, done, info
+ return observation, reward, terminated, truncated, info
def _get_obs(self):
return np.append(super()._get_obs(), self.goal)
- def reset(self, *, seed: Optional[int] = None, return_info: bool = False,
- options: Optional[dict] = None, ) -> Union[ObsType, Tuple[ObsType, dict]]:
+ def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
+ -> Tuple[ObsType, Dict[str, Any]]:
self.max_height = 0
self.current_step = 0
+ ret = super().reset(seed=seed, options=options)
self.goal = self.np_random.uniform(1.1, 1.6, 1) # 1.1 1.6
- return super().reset()
+ return ret
# overwrite reset_model to make it deterministic
def reset_model(self):
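In the new `HalfCheetahJumpEnv.reset()`, `super().reset(seed=seed, options=options)` runs before the goal is drawn from `self.np_random`. In Gymnasium, `Env.reset(seed=...)` is the call that reseeds `self.np_random`, so sampling after it makes the goal reproducible for a given seed. The sketch below illustrates only this ordering, with a stand-in base class built on the standard-library `random` module; `BaseEnv` and `JumpEnv` are hypothetical and do not use Gymnasium.

```python
import random
from typing import Optional


class BaseEnv:
    """Stand-in for gymnasium.Env: reset(seed=...) reseeds the RNG."""

    def __init__(self) -> None:
        self.rng = random.Random()

    def reset(self, *, seed: Optional[int] = None) -> None:
        if seed is not None:
            self.rng = random.Random(seed)


class JumpEnv(BaseEnv):
    """Samples a goal height after the base class has handled the seed."""

    def reset(self, *, seed: Optional[int] = None) -> float:
        super().reset(seed=seed)                # seed first ...
        self.goal = self.rng.uniform(1.1, 1.6)  # ... then sample
        return self.goal


env = JumpEnv()
first = env.reset(seed=42)
second = env.reset(seed=42)
print(first == second)  # True: same seed, same goal
```

Had the goal been sampled before the `super().reset(...)` call, it would have come from the generator state left over from the previous episode, and the seed passed to `reset()` would not have affected it.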
diff --git a/fancy_gym/envs/mujoco/half_cheetah_jump/mp_wrapper.py b/fancy_gym/envs/mujoco/half_cheetah_jump/mp_wrapper.py
index 11b169b..f5f7634 100644
--- a/fancy_gym/envs/mujoco/half_cheetah_jump/mp_wrapper.py
+++ b/fancy_gym/envs/mujoco/half_cheetah_jump/mp_wrapper.py
@@ -6,6 +6,12 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper):
+ mp_config = {
+ 'ProMP': {},
+ 'DMP': {},
+ 'ProDMP': {},
+ }
+
@property
def context_mask(self) -> np.ndarray:
return np.hstack([
diff --git a/fancy_gym/envs/mujoco/hopper_jump/assets/hopper_jump.before_convert.xml b/fancy_gym/envs/mujoco/hopper_jump/assets/hopper_jump.before_convert.xml
new file mode 100644
index 0000000..3348bab
--- /dev/null
+++ b/fancy_gym/envs/mujoco/hopper_jump/assets/hopper_jump.before_convert.xml
@@ -0,0 +1,52 @@
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/fancy_gym/envs/mujoco/hopper_jump/assets/hopper_jump.xml b/fancy_gym/envs/mujoco/hopper_jump/assets/hopper_jump.xml
index 3348bab..fb1b978 100644
--- a/fancy_gym/envs/mujoco/hopper_jump/assets/hopper_jump.xml
+++ b/fancy_gym/envs/mujoco/hopper_jump/assets/hopper_jump.xml
@@ -1,52 +1,51 @@
-
-
-
-
-
-
-
+
+
+
+
+
+
+
+
+
+
+
+
+
+
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
-
-
-
+
+
+
-
-
-
+
+
+
-
-
-
-
-
-
-
diff --git a/fancy_gym/envs/mujoco/hopper_jump/assets/hopper_jump_on_box.xml b/fancy_gym/envs/mujoco/hopper_jump/assets/hopper_jump_on_box.xml
index 69d78ff..b66c3ca 100644
--- a/fancy_gym/envs/mujoco/hopper_jump/assets/hopper_jump_on_box.xml
+++ b/fancy_gym/envs/mujoco/hopper_jump/assets/hopper_jump_on_box.xml
@@ -1,51 +1,50 @@
-
-
-
-
-
-
-
+
+
+
+
+
+
+
+
+
+
+
+
+
+
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
-
-
+
+
-
-
-
+
+
+
-
-
-
-
-
-
-
-
\ No newline at end of file
+
diff --git a/fancy_gym/envs/mujoco/hopper_jump/hopper_jump.py b/fancy_gym/envs/mujoco/hopper_jump/hopper_jump.py
index da9ac4d..b77cab1 100644
--- a/fancy_gym/envs/mujoco/hopper_jump/hopper_jump.py
+++ b/fancy_gym/envs/mujoco/hopper_jump/hopper_jump.py
@@ -1,12 +1,95 @@
import os
import numpy as np
-from gym.envs.mujoco.hopper_v4 import HopperEnv
+from gymnasium.envs.mujoco.hopper_v4 import HopperEnv, DEFAULT_CAMERA_CONFIG
+
+from gymnasium import utils
+from gymnasium.envs.mujoco import MujocoEnv
+from gymnasium.spaces import Box
+
+import mujoco
MAX_EPISODE_STEPS_HOPPERJUMP = 250
-class HopperJumpEnv(HopperEnv):
+class HopperEnvCustomXML(HopperEnv):
+ """
+ Initialization changes to normal Hopper:
+ - terminate_when_unhealthy: True -> False
+ - healthy_reward: 1.0 -> 2.0
+ - healthy_z_range: (0.7, float('inf')) -> (0.5, float('inf'))
+ - healthy_angle_range: (-0.2, 0.2) -> (-float('inf'), float('inf'))
+ - exclude_current_positions_from_observation: True -> False
+ """
+
+ def __init__(
+ self,
+ xml_file,
+ forward_reward_weight=1.0,
+ ctrl_cost_weight=1e-3,
+ healthy_reward=1.0,
+ terminate_when_unhealthy=True,
+ healthy_state_range=(-100.0, 100.0),
+ healthy_z_range=(0.7, float("inf")),
+ healthy_angle_range=(-0.2, 0.2),
+ reset_noise_scale=5e-3,
+ exclude_current_positions_from_observation=True,
+ **kwargs,
+ ):
+ xml_file = os.path.join(os.path.dirname(__file__), "assets", xml_file)
+ utils.EzPickle.__init__(
+ self,
+ xml_file,
+ forward_reward_weight,
+ ctrl_cost_weight,
+ healthy_reward,
+ terminate_when_unhealthy,
+ healthy_state_range,
+ healthy_z_range,
+ healthy_angle_range,
+ reset_noise_scale,
+ exclude_current_positions_from_observation,
+ **kwargs
+ )
+
+ self._forward_reward_weight = forward_reward_weight
+
+ self._ctrl_cost_weight = ctrl_cost_weight
+
+ self._healthy_reward = healthy_reward
+ self._terminate_when_unhealthy = terminate_when_unhealthy
+
+ self._healthy_state_range = healthy_state_range
+ self._healthy_z_range = healthy_z_range
+ self._healthy_angle_range = healthy_angle_range
+
+ self._reset_noise_scale = reset_noise_scale
+
+ self._exclude_current_positions_from_observation = (
+ exclude_current_positions_from_observation
+ )
+
+ if not hasattr(self, 'observation_space'):
+ if exclude_current_positions_from_observation:
+ self.observation_space = Box(
+ low=-np.inf, high=np.inf, shape=(15,), dtype=np.float64
+ )
+ else:
+ self.observation_space = Box(
+ low=-np.inf, high=np.inf, shape=(16,), dtype=np.float64
+ )
+
+ MujocoEnv.__init__(
+ self,
+ xml_file,
+ 4,
+ observation_space=self.observation_space,
+ default_camera_config=DEFAULT_CAMERA_CONFIG,
+ **kwargs,
+ )
+
+
+class HopperJumpEnv(HopperEnvCustomXML):
"""
Initialization changes to normal Hopper:
- terminate_when_unhealthy: True -> False
@@ -73,7 +156,7 @@ class HopperJumpEnv(HopperEnv):
self.do_simulation(action, self.frame_skip)
height_after = self.get_body_com("torso")[2]
- #site_pos_after = self.data.get_site_xpos('foot_site')
+ # site_pos_after = self.data.get_site_xpos('foot_site')
site_pos_after = self.data.site('foot_site').xpos
self.max_height = max(height_after, self.max_height)
@@ -88,7 +171,8 @@ class HopperJumpEnv(HopperEnv):
ctrl_cost = self.control_cost(action)
costs = ctrl_cost
- done = False
+ terminated = False
+ truncated = False
goal_dist = np.linalg.norm(site_pos_after - self.goal)
if self.contact_dist is None and self.contact_with_floor:
@@ -115,7 +199,7 @@ class HopperJumpEnv(HopperEnv):
healthy=self.is_healthy,
contact_dist=self.contact_dist or 0
)
- return observation, reward, done, info
+ return observation, reward, terminated, truncated, info
def _get_obs(self):
# goal_dist = self.data.get_site_xpos('foot_site') - self.goal
@@ -140,8 +224,8 @@ class HopperJumpEnv(HopperEnv):
noise_high[5] = 0.785
qpos = (
- self.np_random.uniform(low=noise_low, high=noise_high, size=self.model.nq) +
- self.init_qpos
+ self.np_random.uniform(low=noise_low, high=noise_high, size=self.model.nq) +
+ self.init_qpos
)
qvel = (
# self.np_random.uniform(low=noise_low, high=noise_high, size=self.model.nv) +
@@ -162,12 +246,12 @@ class HopperJumpEnv(HopperEnv):
# floor_geom_id = self.model.geom_name2id('floor')
# foot_geom_id = self.model.geom_name2id('foot_geom')
# TODO: do this properly over a sensor in the xml file, see dmc hopper
- floor_geom_id = self._mujoco_bindings.mj_name2id(self.model,
- self._mujoco_bindings.mjtObj.mjOBJ_GEOM,
- 'floor')
- foot_geom_id = self._mujoco_bindings.mj_name2id(self.model,
- self._mujoco_bindings.mjtObj.mjOBJ_GEOM,
- 'foot_geom')
+ floor_geom_id = mujoco.mj_name2id(self.model,
+ mujoco.mjtObj.mjOBJ_GEOM,
+ 'floor')
+ foot_geom_id = mujoco.mj_name2id(self.model,
+ mujoco.mjtObj.mjOBJ_GEOM,
+ 'foot_geom')
for i in range(self.data.ncon):
contact = self.data.contact[i]
collision = contact.geom1 == floor_geom_id and contact.geom2 == foot_geom_id
diff --git a/fancy_gym/envs/mujoco/hopper_jump/hopper_jump_on_box.py b/fancy_gym/envs/mujoco/hopper_jump/hopper_jump_on_box.py
index f9834bd..506344b 100644
--- a/fancy_gym/envs/mujoco/hopper_jump/hopper_jump_on_box.py
+++ b/fancy_gym/envs/mujoco/hopper_jump/hopper_jump_on_box.py
@@ -1,12 +1,16 @@
import os
+from typing import Optional, Dict, Any, Tuple
import numpy as np
-from gym.envs.mujoco.hopper_v4 import HopperEnv
+from gymnasium.core import ObsType
+from fancy_gym.envs.mujoco.hopper_jump.hopper_jump import HopperEnvCustomXML
+from gymnasium import spaces
+
MAX_EPISODE_STEPS_HOPPERJUMPONBOX = 250
-class HopperJumpOnBoxEnv(HopperEnv):
+class HopperJumpOnBoxEnv(HopperEnvCustomXML):
"""
Initialization changes to normal Hopper:
- healthy_reward: 1.0 -> 0.01 -> 0.001
@@ -33,6 +37,16 @@ class HopperJumpOnBoxEnv(HopperEnv):
self.hopper_on_box = False
self.context = context
self.box_x = 1
+
+ if exclude_current_positions_from_observation:
+ self.observation_space = spaces.Box(
+ low=-np.inf, high=np.inf, shape=(12,), dtype=np.float64
+ )
+ else:
+ self.observation_space = spaces.Box(
+ low=-np.inf, high=np.inf, shape=(13,), dtype=np.float64
+ )
+
xml_file = os.path.join(os.path.dirname(__file__), "assets", xml_file)
super().__init__(xml_file, forward_reward_weight, ctrl_cost_weight, healthy_reward, terminate_when_unhealthy,
healthy_state_range, healthy_z_range, healthy_angle_range, reset_noise_scale,
@@ -74,10 +88,10 @@ class HopperJumpOnBoxEnv(HopperEnv):
costs = ctrl_cost
- done = fell_over or self.hopper_on_box
+ terminated = fell_over or self.hopper_on_box
- if self.current_step >= self.max_episode_steps or done:
- done = False
+ if self.current_step >= self.max_episode_steps or terminated:
+ done = False # TODO why are we doing this???
max_height = self.max_height.copy()
min_distance = self.min_distance.copy()
@@ -122,21 +136,25 @@ class HopperJumpOnBoxEnv(HopperEnv):
'goal': self.box_x,
}
- return observation, reward, done, info
+ truncated = self.current_step >= self.max_episode_steps and not terminated
+
+ return observation, reward, terminated, truncated, info
def _get_obs(self):
return np.append(super()._get_obs(), self.box_x)
- def reset(self):
+ def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
+ -> Tuple[ObsType, Dict[str, Any]]:
self.max_height = 0
self.min_distance = 5000
self.current_step = 0
self.hopper_on_box = False
+ ret = super().reset(seed=seed, options=options)
if self.context:
self.box_x = self.np_random.uniform(1, 3, 1)
self.model.body("box").pos = [self.box_x[0], 0, 0]
- return super().reset()
+ return ret
# overwrite reset_model to make it deterministic
def reset_model(self):
@@ -150,21 +168,3 @@ class HopperJumpOnBoxEnv(HopperEnv):
observation = self._get_obs()
return observation
-
-if __name__ == '__main__':
- render_mode = "human" # "human" or "partial" or "final"
- env = HopperJumpOnBoxEnv()
- obs = env.reset()
-
- for i in range(2000):
- # objective.load_result("/tmp/cma")
- # test with random actions
- ac = env.action_space.sample()
- obs, rew, d, info = env.step(ac)
- if i % 10 == 0:
- env.render(mode=render_mode)
- if d:
- print('After ', i, ' steps, done: ', d)
- env.reset()
-
- env.close()
\ No newline at end of file
diff --git a/fancy_gym/envs/mujoco/hopper_jump/mp_wrapper.py b/fancy_gym/envs/mujoco/hopper_jump/mp_wrapper.py
index ed95b3d..4faeaad 100644
--- a/fancy_gym/envs/mujoco/hopper_jump/mp_wrapper.py
+++ b/fancy_gym/envs/mujoco/hopper_jump/mp_wrapper.py
@@ -6,6 +6,11 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper):
+ mp_config = {
+ 'ProMP': {},
+ 'DMP': {},
+ 'ProDMP': {},
+ }
# Random x goal + random init pos
@property
diff --git a/fancy_gym/envs/mujoco/hopper_throw/assets/hopper_throw.xml b/fancy_gym/envs/mujoco/hopper_throw/assets/hopper_throw.xml
index 1c39602..fd17979 100644
--- a/fancy_gym/envs/mujoco/hopper_throw/assets/hopper_throw.xml
+++ b/fancy_gym/envs/mujoco/hopper_throw/assets/hopper_throw.xml
@@ -1,56 +1,54 @@
-
-
-
-
-
-
-
+
+
+
+
+
+
+
+
+
+
+
+
+
+
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
-
-
-
-
-
-
+
+
+
+
+
+
-
-
-
+
+
+
-
-
-
-
-
-
-
diff --git a/fancy_gym/envs/mujoco/hopper_throw/assets/hopper_throw_in_basket.xml b/fancy_gym/envs/mujoco/hopper_throw/assets/hopper_throw_in_basket.xml
index b4f0342..655b056 100644
--- a/fancy_gym/envs/mujoco/hopper_throw/assets/hopper_throw_in_basket.xml
+++ b/fancy_gym/envs/mujoco/hopper_throw/assets/hopper_throw_in_basket.xml
@@ -1,132 +1,129 @@
-
-
-
-
-
-
-
+
+
+
+
+
+
+
+
+
+
+
+
+
+
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
-
-
-
-
-
-
+
+
+
+
+
+
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
-
-
-
+
+
+
-
-
-
-
-
-
-
-
\ No newline at end of file
+
diff --git a/fancy_gym/envs/mujoco/hopper_throw/hopper_throw.py b/fancy_gym/envs/mujoco/hopper_throw/hopper_throw.py
index e69cea6..b5afc8b 100644
--- a/fancy_gym/envs/mujoco/hopper_throw/hopper_throw.py
+++ b/fancy_gym/envs/mujoco/hopper_throw/hopper_throw.py
@@ -1,13 +1,15 @@
import os
-from typing import Optional
+from typing import Optional, Any, Dict, Tuple
import numpy as np
-from gym.envs.mujoco.hopper_v4 import HopperEnv
+from gymnasium.core import ObsType
+from fancy_gym.envs.mujoco.hopper_jump.hopper_jump import HopperEnvCustomXML
+from gymnasium import spaces
MAX_EPISODE_STEPS_HOPPERTHROW = 250
-class HopperThrowEnv(HopperEnv):
+class HopperThrowEnv(HopperEnvCustomXML):
"""
Initialization changes to normal Hopper:
- healthy_reward: 1.0 -> 0.0 -> 0.1
@@ -36,6 +38,16 @@ class HopperThrowEnv(HopperEnv):
self.max_episode_steps = max_episode_steps
self.context = context
self.goal = 0
+
+ if not hasattr(self, 'observation_space'):
+ self.observation_space = spaces.Box(
+ low=-np.inf, high=np.inf, shape=(18,), dtype=np.float64
+ )
+ else:
+ self.observation_space = spaces.Box(
+ low=-np.inf, high=np.inf, shape=(19,), dtype=np.float64
+ )
+
super().__init__(xml_file=xml_file,
forward_reward_weight=forward_reward_weight,
ctrl_cost_weight=ctrl_cost_weight,
@@ -56,14 +68,14 @@ class HopperThrowEnv(HopperEnv):
# done = self.done TODO We should use this, not sure why there is no other termination; ball_landed should be enough, because we only look at the throw itself? - Paul and Marc
ball_landed = bool(self.get_body_com("ball")[2] <= 0.05)
- done = ball_landed
+ terminated = ball_landed
ctrl_cost = self.control_cost(action)
costs = ctrl_cost
rewards = 0
- if self.current_step >= self.max_episode_steps or done:
+ if self.current_step >= self.max_episode_steps or terminated:
distance_reward = -np.linalg.norm(ball_pos_after - self.goal) if self.context else \
self._forward_reward_weight * ball_pos_after
healthy_reward = 0 if self.context else self.healthy_reward * self.current_step
@@ -78,16 +90,19 @@ class HopperThrowEnv(HopperEnv):
'_steps': self.current_step,
'goal': self.goal,
}
+ truncated = False
- return observation, reward, done, info
+ return observation, reward, terminated, truncated, info
def _get_obs(self):
return np.append(super()._get_obs(), self.goal)
- def reset(self, *, seed: Optional[int] = None, return_info: bool = False, options: Optional[dict] = None):
+ def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
+ -> Tuple[ObsType, Dict[str, Any]]:
self.current_step = 0
+ ret = super().reset(seed=seed, options=options)
+ self.goal = self.np_random.uniform(2.0, 6.0, 1) # 0.5 8.0
- return super().reset()
+ return ret
# overwrite reset_model to make it deterministic
def reset_model(self):
@@ -101,22 +116,3 @@ class HopperThrowEnv(HopperEnv):
observation = self._get_obs()
return observation
-
-
-if __name__ == '__main__':
- render_mode = "human" # "human" or "partial" or "final"
- env = HopperThrowEnv()
- obs = env.reset()
-
- for i in range(2000):
- # objective.load_result("/tmp/cma")
- # test with random actions
- ac = env.action_space.sample()
- obs, rew, d, info = env.step(ac)
- if i % 10 == 0:
- env.render(mode=render_mode)
- if d:
- print('After ', i, ' steps, done: ', d)
- env.reset()
-
- env.close()
diff --git a/fancy_gym/envs/mujoco/hopper_throw/hopper_throw_in_basket.py b/fancy_gym/envs/mujoco/hopper_throw/hopper_throw_in_basket.py
index 76ef861..00d1bdb 100644
--- a/fancy_gym/envs/mujoco/hopper_throw/hopper_throw_in_basket.py
+++ b/fancy_gym/envs/mujoco/hopper_throw/hopper_throw_in_basket.py
@@ -1,13 +1,16 @@
import os
-from typing import Optional
+from typing import Optional, Any, Dict, Tuple
import numpy as np
-from gym.envs.mujoco.hopper_v4 import HopperEnv
+from fancy_gym.envs.mujoco.hopper_jump.hopper_jump import HopperEnvCustomXML
+from gymnasium.core import ObsType
+from gymnasium import spaces
+
MAX_EPISODE_STEPS_HOPPERTHROWINBASKET = 250
-class HopperThrowInBasketEnv(HopperEnv):
+class HopperThrowInBasketEnv(HopperEnvCustomXML):
"""
Initialization changes to normal Hopper:
- healthy_reward: 1.0 -> 0.0
@@ -42,6 +45,16 @@ class HopperThrowInBasketEnv(HopperEnv):
self.context = context
self.penalty = penalty
self.basket_x = 5
+
+ if exclude_current_positions_from_observation:
+ self.observation_space = spaces.Box(
+ low=-np.inf, high=np.inf, shape=(18,), dtype=np.float64
+ )
+ else:
+ self.observation_space = spaces.Box(
+ low=-np.inf, high=np.inf, shape=(19,), dtype=np.float64
+ )
+
xml_file = os.path.join(os.path.dirname(__file__), "assets", xml_file)
super().__init__(xml_file=xml_file,
forward_reward_weight=forward_reward_weight,
@@ -65,14 +78,14 @@ class HopperThrowInBasketEnv(HopperEnv):
is_in_basket_x = ball_pos[0] >= basket_pos[0] and ball_pos[0] <= basket_pos[0] + self.basket_size
is_in_basket_y = ball_pos[1] >= basket_pos[1] - (self.basket_size / 2) and ball_pos[1] <= basket_pos[1] + (
- self.basket_size / 2)
+ self.basket_size / 2)
is_in_basket_z = ball_pos[2] < 0.1
is_in_basket = is_in_basket_x and is_in_basket_y and is_in_basket_z
if is_in_basket:
self.ball_in_basket = True
ball_landed = self.get_body_com("ball")[2] <= 0.05
- done = bool(ball_landed or is_in_basket)
+ terminated = bool(ball_landed or is_in_basket)
rewards = 0
@@ -80,7 +93,7 @@ class HopperThrowInBasketEnv(HopperEnv):
costs = ctrl_cost
- if self.current_step >= self.max_episode_steps or done:
+ if self.current_step >= self.max_episode_steps or terminated:
if is_in_basket:
if not self.context:
@@ -101,23 +114,27 @@ class HopperThrowInBasketEnv(HopperEnv):
info = {
'ball_pos': ball_pos[0],
}
+ truncated = False
- return observation, reward, done, info
+ return observation, reward, terminated, truncated, info
def _get_obs(self):
return np.append(super()._get_obs(), self.basket_x)
- def reset(self, *, seed: Optional[int] = None, return_info: bool = False, options: Optional[dict] = None):
+ def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
+ -> Tuple[ObsType, Dict[str, Any]]:
+
if self.max_episode_steps == 10:
# We have to initialize this here, because the spec is only added after creating the env.
self.max_episode_steps = self.spec.max_episode_steps
self.current_step = 0
self.ball_in_basket = False
+ ret = super().reset(seed=seed, options=options)
if self.context:
self.basket_x = self.np_random.uniform(low=3, high=7, size=1)
self.model.body("basket_ground").pos[:] = [self.basket_x[0], 0, 0]
- return super().reset()
+ return ret
# overwrite reset_model to make it deterministic
def reset_model(self):
@@ -132,22 +149,3 @@ class HopperThrowInBasketEnv(HopperEnv):
observation = self._get_obs()
return observation
-
-
-if __name__ == '__main__':
- render_mode = "human" # "human" or "partial" or "final"
- env = HopperThrowInBasketEnv()
- obs = env.reset()
-
- for i in range(2000):
- # objective.load_result("/tmp/cma")
- # test with random actions
- ac = env.action_space.sample()
- obs, rew, d, info = env.step(ac)
- if i % 10 == 0:
- env.render(mode=render_mode)
- if d:
- print('After ', i, ' steps, done: ', d)
- env.reset()
-
- env.close()
diff --git a/fancy_gym/envs/mujoco/hopper_throw/mp_wrapper.py b/fancy_gym/envs/mujoco/hopper_throw/mp_wrapper.py
index cad680a..03588a2 100644
--- a/fancy_gym/envs/mujoco/hopper_throw/mp_wrapper.py
+++ b/fancy_gym/envs/mujoco/hopper_throw/mp_wrapper.py
@@ -6,6 +6,11 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper):
+ mp_config = {
+ 'ProMP': {},
+ 'DMP': {},
+ 'ProDMP': {},
+ }
@property
def context_mask(self):
diff --git a/fancy_gym/envs/mujoco/reacher/mp_wrapper.py b/fancy_gym/envs/mujoco/reacher/mp_wrapper.py
index 0464640..d47737a 100644
--- a/fancy_gym/envs/mujoco/reacher/mp_wrapper.py
+++ b/fancy_gym/envs/mujoco/reacher/mp_wrapper.py
@@ -7,6 +7,16 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper):
+ mp_config = {
+ 'ProMP': {},
+ 'DMP': {
+ 'phase_generator_kwargs': {
+ 'alpha_phase': 2,
+ },
+ },
+ 'ProDMP': {},
+ }
+
@property
def context_mask(self):
return np.concatenate([[False] * self.n_links, # cos
diff --git a/fancy_gym/envs/mujoco/reacher/reacher.py b/fancy_gym/envs/mujoco/reacher/reacher.py
index e55f13a..f5af7f6 100644
--- a/fancy_gym/envs/mujoco/reacher/reacher.py
+++ b/fancy_gym/envs/mujoco/reacher/reacher.py
@@ -1,8 +1,9 @@
import os
import numpy as np
-from gym import utils
-from gym.envs.mujoco import MujocoEnv
+from gymnasium import utils
+from gymnasium.envs.mujoco import MujocoEnv
+from gymnasium.spaces import Box
MAX_EPISODE_STEPS_REACHER = 200
@@ -12,7 +13,17 @@ class ReacherEnv(MujocoEnv, utils.EzPickle):
More general version of the gym mujoco Reacher environment
"""
- def __init__(self, sparse: bool = False, n_links: int = 5, reward_weight: float = 1, ctrl_cost_weight: float = 1):
+ metadata = {
+ "render_modes": [
+ "human",
+ "rgb_array",
+ "depth_array",
+ ],
+ "render_fps": 50,
+ }
+
+ def __init__(self, sparse: bool = False, n_links: int = 5, reward_weight: float = 1, ctrl_cost_weight: float = 1.,
+ **kwargs):
utils.EzPickle.__init__(**locals())
self._steps = 0
@@ -25,10 +36,16 @@ class ReacherEnv(MujocoEnv, utils.EzPickle):
file_name = f'reacher_{n_links}links.xml'
+ # (sin, cos, velocity) * n_links + goal position (2) + goal distance (3)
+ shape = (self.n_links * 3 + 5,)
+ observation_space = Box(low=-np.inf, high=np.inf, shape=shape, dtype=np.float64)
+
MujocoEnv.__init__(self,
model_path=os.path.join(os.path.dirname(__file__), "assets", file_name),
frame_skip=2,
- mujoco_bindings="mujoco")
+ observation_space=observation_space,
+ **kwargs
+ )
def step(self, action):
self._steps += 1
@@ -45,10 +62,14 @@ class ReacherEnv(MujocoEnv, utils.EzPickle):
reward = reward_dist + reward_ctrl + angular_vel
self.do_simulation(action, self.frame_skip)
- ob = self._get_obs()
- done = False
+ if self.render_mode == "human":
+ self.render()
- infos = dict(
+ ob = self._get_obs()
+ terminated = False
+ truncated = False
+
+ info = dict(
reward_dist=reward_dist,
reward_ctrl=reward_ctrl,
velocity=angular_vel,
@@ -56,7 +77,7 @@ class ReacherEnv(MujocoEnv, utils.EzPickle):
goal=self.goal if hasattr(self, "goal") else None
)
- return ob, reward, done, infos
+ return ob, reward, terminated, truncated, info
def distance_reward(self):
vec = self.get_body_com("fingertip") - self.get_body_com("target")
@@ -66,6 +87,7 @@ class ReacherEnv(MujocoEnv, utils.EzPickle):
return -10 * np.square(self.data.qvel.flat[:self.n_links]).sum() if self.sparse else 0.0
def viewer_setup(self):
+ assert self.viewer is not None
self.viewer.cam.trackbodyid = 0
def reset_model(self):
diff --git a/fancy_gym/envs/mujoco/table_tennis/mp_wrapper.py b/fancy_gym/envs/mujoco/table_tennis/mp_wrapper.py
index 3370047..fcc31a8 100644
--- a/fancy_gym/envs/mujoco/table_tennis/mp_wrapper.py
+++ b/fancy_gym/envs/mujoco/table_tennis/mp_wrapper.py
@@ -7,6 +7,53 @@ from fancy_gym.envs.mujoco.table_tennis.table_tennis_utils import jnt_pos_low, j
class TT_MPWrapper(RawInterfaceWrapper):
+ mp_config = {
+ 'ProMP': {
+ 'phase_generator_kwargs': {
+ 'learn_tau': False,
+ 'learn_delay': False,
+ 'tau_bound': [0.8, 1.5],
+ 'delay_bound': [0.05, 0.15],
+ },
+ 'controller_kwargs': {
+ 'p_gains': 0.5 * np.array([1.0, 4.0, 2.0, 4.0, 1.0, 4.0, 1.0]),
+ 'd_gains': 0.5 * np.array([0.1, 0.4, 0.2, 0.4, 0.1, 0.4, 0.1]),
+ },
+ 'basis_generator_kwargs': {
+ 'num_basis': 3,
+ 'num_basis_zero_start': 1,
+ 'num_basis_zero_goal': 1,
+ },
+ 'black_box_kwargs': {
+ 'verbose': 2,
+ },
+ },
+ 'DMP': {},
+ 'ProDMP': {
+ 'phase_generator_kwargs': {
+ 'learn_tau': True,
+ 'learn_delay': True,
+ 'tau_bound': [0.8, 1.5],
+ 'delay_bound': [0.05, 0.15],
+ 'alpha_phase': 3,
+ },
+ 'controller_kwargs': {
+ 'p_gains': 0.5 * np.array([1.0, 4.0, 2.0, 4.0, 1.0, 4.0, 1.0]),
+ 'd_gains': 0.5 * np.array([0.1, 0.4, 0.2, 0.4, 0.1, 0.4, 0.1]),
+ },
+ 'basis_generator_kwargs': {
+ 'num_basis': 3,
+ 'alpha': 25,
+ 'basis_bandwidth_factor': 3,
+ },
+ 'trajectory_generator_kwargs': {
+ 'weights_scale': 0.7,
+ 'auto_scale_basis': True,
+ 'relative_goal': True,
+ 'disable_goal': True,
+ },
+ },
+ }
# Random x goal + random init pos
@property
@@ -16,7 +63,7 @@ class TT_MPWrapper(RawInterfaceWrapper):
[False] * 7, # joints velocity
[True] * 2, # position ball x, y
[False] * 1, # position ball z
- #[True] * 3, # velocity ball x, y, z
+ # [True] * 3, # velocity ball x, y, z
[True] * 2, # target landing position
# [True] * 1, # time
])
@@ -40,7 +87,42 @@ class TT_MPWrapper(RawInterfaceWrapper):
return_contextual_obs: bool, tau_bound:list, delay_bound:list) -> Tuple[np.ndarray, float, bool, dict]:
return self.get_invalid_traj_step_return(action, pos_traj, return_contextual_obs, tau_bound, delay_bound)
+
+class TT_MPWrapper_Replan(TT_MPWrapper):
+ mp_config = {
+ 'ProMP': {},
+ 'DMP': {},
+ 'ProDMP': {
+ 'phase_generator_kwargs': {
+ 'learn_tau': True,
+ 'learn_delay': True,
+ 'tau_bound': [0.8, 1.5],
+ 'delay_bound': [0.05, 0.15],
+ 'alpha_phase': 3,
+ },
+ 'controller_kwargs': {
+ 'p_gains': 0.5 * np.array([1.0, 4.0, 2.0, 4.0, 1.0, 4.0, 1.0]),
+ 'd_gains': 0.5 * np.array([0.1, 0.4, 0.2, 0.4, 0.1, 0.4, 0.1]),
+ },
+ 'basis_generator_kwargs': {
+ 'num_basis': 2,
+ 'alpha': 25,
+ 'basis_bandwidth_factor': 3,
+ },
+ 'trajectory_generator_kwargs': {
+ 'auto_scale_basis': True,
+ 'goal_offset': 1.0,
+ },
+ 'black_box_kwargs': {
+ 'max_planning_times': 3,
+ 'replanning_schedule': lambda pos, vel, obs, action, t: t % 50 == 0,
+ },
+ },
+ }
+
+
class TTVelObs_MPWrapper(TT_MPWrapper):
+ # Will inherit mp_config from TT_MPWrapper
@property
def context_mask(self):
@@ -52,4 +134,20 @@ class TTVelObs_MPWrapper(TT_MPWrapper):
[True] * 3, # velocity ball x, y, z
[True] * 2, # target landing position
# [True] * 1, # time
- ])
\ No newline at end of file
+ ])
+
+
+class TTVelObs_MPWrapper_Replan(TT_MPWrapper_Replan):
+ # Will inherit mp_config from TT_MPWrapper_Replan
+
+ @property
+ def context_mask(self):
+ return np.hstack([
+ [False] * 7, # joints position
+ [False] * 7, # joints velocity
+ [True] * 2, # position ball x, y
+ [False] * 1, # position ball z
+ [True] * 3, # velocity ball x, y, z
+ [True] * 2, # target landing position
+ # [True] * 1, # time
+ ])
diff --git a/fancy_gym/envs/mujoco/table_tennis/table_tennis_env.py b/fancy_gym/envs/mujoco/table_tennis/table_tennis_env.py
index 3f86463..5395de7 100644
--- a/fancy_gym/envs/mujoco/table_tennis/table_tennis_env.py
+++ b/fancy_gym/envs/mujoco/table_tennis/table_tennis_env.py
@@ -1,8 +1,8 @@
import os
import numpy as np
-from gym import utils, spaces
-from gym.envs.mujoco import MujocoEnv
+from gymnasium import utils, spaces
+from gymnasium.envs.mujoco import MujocoEnv
from fancy_gym.envs.mujoco.table_tennis.table_tennis_utils import is_init_state_valid, magnus_force
from fancy_gym.envs.mujoco.table_tennis.table_tennis_utils import jnt_pos_low, jnt_pos_high
@@ -22,6 +22,16 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
"""
7 DoF table tennis environment
"""
+
+ metadata = {
+ "render_modes": [
+ "human",
+ "rgb_array",
+ "depth_array",
+ ],
+ "render_fps": 125
+ }
+
def __init__(self, ctxt_dim: int = 4, frame_skip: int = 4,
goal_switching_step: int = None,
enable_artificial_wind: bool = False):
@@ -50,11 +60,16 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
self._artificial_force = 0.
+ if not hasattr(self, 'observation_space'):
+ self.observation_space = spaces.Box(
+ low=-np.inf, high=np.inf, shape=(19,), dtype=np.float64
+ )
+
MujocoEnv.__init__(self,
model_path=os.path.join(os.path.dirname(__file__), "assets", "xml", "table_tennis_env.xml"),
frame_skip=frame_skip,
- mujoco_bindings="mujoco")
-
+ observation_space=self.observation_space)
+
if ctxt_dim == 2:
self.context_bounds = CONTEXT_BOUNDS_2DIMS
elif ctxt_dim == 4:
@@ -83,11 +98,11 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
unstable_simulation = False
if self._steps == self._goal_switching_step and self.np_random.uniform() < 0.5:
- new_goal_pos = self._generate_goal_pos(random=True)
- new_goal_pos[1] = -new_goal_pos[1]
- self._goal_pos = new_goal_pos
- self.model.body_pos[5] = np.concatenate([self._goal_pos, [0.77]])
- mujoco.mj_forward(self.model, self.data)
+ new_goal_pos = self._generate_goal_pos(random=True)
+ new_goal_pos[1] = -new_goal_pos[1]
+ self._goal_pos = new_goal_pos
+ self.model.body_pos[5] = np.concatenate([self._goal_pos, [0.77]])
+ mujoco.mj_forward(self.model, self.data)
for _ in range(self.frame_skip):
if self._enable_artificial_wind:
@@ -102,7 +117,7 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
if not self._hit_ball:
self._hit_ball = self._contact_checker(self._ball_contact_id, self._bat_front_id) or \
- self._contact_checker(self._ball_contact_id, self._bat_back_id)
+ self._contact_checker(self._ball_contact_id, self._bat_back_id)
if not self._hit_ball:
ball_land_on_floor_no_hit = self._contact_checker(self._ball_contact_id, self._floor_contact_id)
if ball_land_on_floor_no_hit:
@@ -130,9 +145,9 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
reward = -25 if unstable_simulation else self._get_reward(self._terminated)
land_dist_err = np.linalg.norm(self._ball_landing_pos[:-1] - self._goal_pos) \
- if self._ball_landing_pos is not None else 10.
+ if self._ball_landing_pos is not None else 10.
- return self._get_obs(), reward, self._terminated, {
+ info = {
"hit_ball": self._hit_ball,
"ball_returned_success": self._ball_return_success,
"land_dist_error": land_dist_err,
@@ -140,6 +155,10 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
"num_steps": self._steps,
}
+ terminated, truncated = self._terminated, False
+
+ return self._get_obs(), reward, terminated, truncated, info
+
def _contact_checker(self, id_1, id_2):
for coni in range(0, self.data.ncon):
con = self.data.contact[coni]
@@ -202,7 +221,7 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
if not self._hit_ball:
return 0.2 * (1 - np.tanh(min_r_b_dist**2))
if self._ball_landing_pos is None:
- min_b_des_b_dist = np.min(np.linalg.norm(np.array(self._ball_traj)[:,:2] - self._goal_pos[:2], axis=1))
+ min_b_des_b_dist = np.min(np.linalg.norm(np.array(self._ball_traj)[:, :2] - self._goal_pos[:2], axis=1))
return 2 * (1 - np.tanh(min_r_b_dist ** 2)) + (1 - np.tanh(min_b_des_b_dist**2))
min_b_des_b_land_dist = np.linalg.norm(self._goal_pos[:2] - self._ball_landing_pos[:2])
over_net_bonus = int(self._ball_landing_pos[0] < 0)
@@ -231,13 +250,13 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
violate_high_bound_error = np.mean(np.maximum(pos_traj - jnt_pos_high, 0))
violate_low_bound_error = np.mean(np.maximum(jnt_pos_low - pos_traj, 0))
invalid_penalty = tau_invalid_penalty + delay_invalid_penalty + \
- violate_high_bound_error + violate_low_bound_error
+ violate_high_bound_error + violate_low_bound_error
return -invalid_penalty
def get_invalid_traj_step_return(self, action, pos_traj, contextual_obs, tau_bound, delay_bound):
- obs = self._get_obs() if contextual_obs else np.concatenate([self._get_obs(), np.array([0])]) # 0 for invalid traj
+ obs = self._get_obs() if contextual_obs else np.concatenate([self._get_obs(), np.array([0])]) # 0 for invalid traj
penalty = self._get_traj_invalid_penalty(action, pos_traj, tau_bound, delay_bound)
- return obs, penalty, True, {
+ return obs, penalty, True, False, {
"hit_ball": [False],
"ball_returned_success": [False],
"land_dist_error": [10.],
@@ -249,7 +268,7 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
@staticmethod
def check_traj_validity(action, pos_traj, vel_traj, tau_bound, delay_bound):
time_invalid = action[0] > tau_bound[1] or action[0] < tau_bound[0] \
- or action[1] > delay_bound[1] or action[1] < delay_bound[0]
+ or action[1] > delay_bound[1] or action[1] < delay_bound[0]
if time_invalid or np.any(pos_traj > jnt_pos_high) or np.any(pos_traj < jnt_pos_low):
return False, pos_traj, vel_traj
return True, pos_traj, vel_traj
@@ -257,6 +276,9 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
class TableTennisWind(TableTennisEnv):
def __init__(self, ctxt_dim: int = 4, frame_skip: int = 4):
+ self.observation_space = spaces.Box(
+ low=-np.inf, high=np.inf, shape=(22,), dtype=np.float64
+ )
super().__init__(ctxt_dim=ctxt_dim, frame_skip=frame_skip, enable_artificial_wind=True)
def _get_obs(self):
diff --git a/fancy_gym/envs/mujoco/walker_2d_jump/assets/walker2d.xml b/fancy_gym/envs/mujoco/walker_2d_jump/assets/walker2d.xml
index f3bcbd1..96621c7 100644
--- a/fancy_gym/envs/mujoco/walker_2d_jump/assets/walker2d.xml
+++ b/fancy_gym/envs/mujoco/walker_2d_jump/assets/walker2d.xml
@@ -1,64 +1,60 @@
diff --git a/fancy_gym/envs/mujoco/walker_2d_jump/mp_wrapper.py b/fancy_gym/envs/mujoco/walker_2d_jump/mp_wrapper.py
index d55e9d2..3dd8c55 100644
--- a/fancy_gym/envs/mujoco/walker_2d_jump/mp_wrapper.py
+++ b/fancy_gym/envs/mujoco/walker_2d_jump/mp_wrapper.py
@@ -6,6 +6,11 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper):
+ mp_config = {
+ 'ProMP': {},
+ 'DMP': {},
+ 'ProDMP': {},
+ }
@property
def context_mask(self):
diff --git a/fancy_gym/envs/mujoco/walker_2d_jump/walker_2d_jump.py b/fancy_gym/envs/mujoco/walker_2d_jump/walker_2d_jump.py
index ed663d2..6ad2be0 100644
--- a/fancy_gym/envs/mujoco/walker_2d_jump/walker_2d_jump.py
+++ b/fancy_gym/envs/mujoco/walker_2d_jump/walker_2d_jump.py
@@ -1,8 +1,13 @@
import os
-from typing import Optional
+from typing import Optional, Any, Dict, Tuple
import numpy as np
-from gym.envs.mujoco.walker2d_v4 import Walker2dEnv
+from gymnasium.envs.mujoco.walker2d_v4 import Walker2dEnv, DEFAULT_CAMERA_CONFIG
+from gymnasium.core import ObsType
+
+from gymnasium import utils
+from gymnasium.envs.mujoco import MujocoEnv
+from gymnasium.spaces import Box
MAX_EPISODE_STEPS_WALKERJUMP = 300
@@ -11,8 +16,71 @@ MAX_EPISODE_STEPS_WALKERJUMP = 300
# to the same structure as the Hopper, where the angles are randomized (->contexts) and the agent should jump as high
# as possible, while landing at a specific target position
+class Walker2dEnvCustomXML(Walker2dEnv):
+ def __init__(
+ self,
+ xml_file,
+ forward_reward_weight=1.0,
+ ctrl_cost_weight=1e-3,
+ healthy_reward=1.0,
+ terminate_when_unhealthy=True,
+ healthy_z_range=(0.8, 2.0),
+ healthy_angle_range=(-1.0, 1.0),
+ reset_noise_scale=5e-3,
+ exclude_current_positions_from_observation=True,
+ **kwargs,
+ ):
+ utils.EzPickle.__init__(
+ self,
+ xml_file,
+ forward_reward_weight,
+ ctrl_cost_weight,
+ healthy_reward,
+ terminate_when_unhealthy,
+ healthy_z_range,
+ healthy_angle_range,
+ reset_noise_scale,
+ exclude_current_positions_from_observation,
+ **kwargs,
+ )
-class Walker2dJumpEnv(Walker2dEnv):
+ self._forward_reward_weight = forward_reward_weight
+ self._ctrl_cost_weight = ctrl_cost_weight
+
+ self._healthy_reward = healthy_reward
+ self._terminate_when_unhealthy = terminate_when_unhealthy
+
+ self._healthy_z_range = healthy_z_range
+ self._healthy_angle_range = healthy_angle_range
+
+ self._reset_noise_scale = reset_noise_scale
+
+ self._exclude_current_positions_from_observation = (
+ exclude_current_positions_from_observation
+ )
+
+ if exclude_current_positions_from_observation:
+ observation_space = Box(
+ low=-np.inf, high=np.inf, shape=(18,), dtype=np.float64
+ )
+ else:
+ observation_space = Box(
+ low=-np.inf, high=np.inf, shape=(19,), dtype=np.float64
+ )
+
+ self.observation_space = observation_space
+
+ MujocoEnv.__init__(
+ self,
+ xml_file,
+ 4,
+ observation_space=observation_space,
+ default_camera_config=DEFAULT_CAMERA_CONFIG,
+ **kwargs,
+ )
+
+
+class Walker2dJumpEnv(Walker2dEnvCustomXML):
"""
healthy reward 1.0 -> 0.005 -> 0.0025 not from alex
penalty 10 -> 0 not from alex
@@ -54,13 +122,13 @@ class Walker2dJumpEnv(Walker2dEnv):
self.max_height = max(height, self.max_height)
- done = bool(height < 0.2)
+ terminated = bool(height < 0.2)
ctrl_cost = self.control_cost(action)
costs = ctrl_cost
rewards = 0
- if self.current_step >= self.max_episode_steps or done:
- done = True
+ if self.current_step >= self.max_episode_steps or terminated:
+ terminated = True
height_goal_distance = -10 * (np.linalg.norm(self.max_height - self.goal))
healthy_reward = self.healthy_reward * self.current_step
@@ -73,17 +141,20 @@ class Walker2dJumpEnv(Walker2dEnv):
'max_height': self.max_height,
'goal': self.goal,
}
+ truncated = False
- return observation, reward, done, info
+ return observation, reward, terminated, truncated, info
def _get_obs(self):
return np.append(super()._get_obs(), self.goal)
- def reset(self, *, seed: Optional[int] = None, return_info: bool = False, options: Optional[dict] = None):
+ def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
+ -> Tuple[ObsType, Dict[str, Any]]:
self.current_step = 0
self.max_height = 0
+ ret = super().reset(seed=seed, options=options)
self.goal = self.np_random.uniform(1.5, 2.5, 1) # 1.5 3.0
- return super().reset()
+ return ret
# overwrite reset_model to make it deterministic
def reset_model(self):
@@ -97,21 +168,3 @@ class Walker2dJumpEnv(Walker2dEnv):
observation = self._get_obs()
return observation
-
-
-if __name__ == '__main__':
- render_mode = "human" # "human" or "partial" or "final"
- env = Walker2dJumpEnv()
- obs = env.reset()
-
- for i in range(6000):
- # test with random actions
- ac = env.action_space.sample()
- obs, rew, d, info = env.step(ac)
- if i % 10 == 0:
- env.render(mode=render_mode)
- if d:
- print('After ', i, ' steps, done: ', d)
- env.reset()
-
- env.close()
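The reordered `reset` above calls `super().reset(seed=seed, options=options)` before drawing the goal from `self.np_random`. A minimal sketch of why that order matters, assuming a base class that re-seeds its generator inside `reset`. `StubBaseEnv` and `GoalEnv` are hypothetical stand-ins that use the standard-library `random` module instead of Gymnasium and NumPy:

```python
import random
from typing import Any, Dict, Optional, Tuple


class StubBaseEnv:
    """Hypothetical stand-in for a Gymnasium env: reset(seed=...) re-seeds the RNG."""

    def __init__(self) -> None:
        self.rng = random.Random()

    def reset(self, *, seed: Optional[int] = None,
              options: Optional[Dict[str, Any]] = None) -> Tuple[float, Dict[str, Any]]:
        if seed is not None:
            self.rng = random.Random(seed)
        return 0.0, {}


class GoalEnv(StubBaseEnv):
    def reset(self, *, seed: Optional[int] = None,
              options: Optional[Dict[str, Any]] = None) -> Tuple[float, Dict[str, Any]]:
        # Seed first, sample second: the goal is then reproducible for a given seed.
        ret = super().reset(seed=seed, options=options)
        self.goal = self.rng.uniform(1.5, 2.5)
        return ret


env = GoalEnv()
env.reset(seed=7)
first_goal = env.goal
env.reset(seed=7)
second_goal = env.goal
```

Sampling the goal before the base `reset` would draw from the generator's previous state, so the same seed could yield different goals.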
diff --git a/fancy_gym/envs/registry.py b/fancy_gym/envs/registry.py
new file mode 100644
index 0000000..321996f
--- /dev/null
+++ b/fancy_gym/envs/registry.py
@@ -0,0 +1,309 @@
+from typing import Tuple, Union, Callable, List, Dict, Any, Optional
+
+import copy
+import importlib
+import numpy as np
+from collections import defaultdict
+
+from collections.abc import Mapping, MutableMapping
+
+from fancy_gym.utils.make_env_helpers import make_bb
+from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
+
+from gymnasium import register as gym_register
+from gymnasium import make as gym_make
+from gymnasium.envs.registration import registry as gym_registry
+
+
+class DefaultMPWrapper(RawInterfaceWrapper):
+ @property
+ def context_mask(self):
+ """
+ Returns boolean mask of the same shape as the observation space.
+ It determines whether the observation is returned for the contextual case or not.
+ This effectively allows to filter unwanted or unnecessary observations from the full step-based case.
+ E.g. Velocities starting at 0 are only changing after the first action. Given we only receive the
+ context/part of the first observation, the velocities are not necessary in the observation for the task.
+ Returns:
+ bool array representing the indices of the observations
+ """
+ # If the env already defines a context_mask, we will use that
+ if hasattr(self.env, 'context_mask'):
+ return self.env.context_mask
+
+ # Otherwise we will use the whole observation as the context. (Write a custom MPWrapper to change this behavior)
+ return np.full(self.env.observation_space.shape, True)
+
+ @property
+ def current_pos(self) -> Union[float, int, np.ndarray, Tuple]:
+ """
+ Returns the current position of the action/control dimension.
+ The dimensionality has to match the action/control dimension.
+ This is not required when exclusively using velocity control,
+ it should, however, be implemented regardless.
+ E.g. The joint positions that are directly or indirectly controlled by the action.
+ """
+ assert hasattr(self.env, 'current_pos'), 'DefaultMPWrapper was unable to access env.current_pos. Please write a custom MPWrapper (recommended) or expose this attribute directly.'
+ return self.env.current_pos
+
+ @property
+ def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
+ """
+ Returns the current velocity of the action/control dimension.
+ The dimensionality has to match the action/control dimension.
+ This is not required when exclusively using position control,
+ it should, however, be implemented regardless.
+ E.g. The joint velocities that are directly or indirectly controlled by the action.
+ """
+ assert hasattr(self.env, 'current_vel'), 'DefaultMPWrapper was unable to access env.current_vel. Please write a custom MPWrapper (recommended) or expose this attribute directly.'
+ return self.env.current_vel
+
+
+_BB_DEFAULTS = {
+ 'ProMP': {
+ 'wrappers': [],
+ 'trajectory_generator_kwargs': {
+ 'trajectory_generator_type': 'promp'
+ },
+ 'phase_generator_kwargs': {
+ 'phase_generator_type': 'linear'
+ },
+ 'controller_kwargs': {
+ 'controller_type': 'motor',
+ 'p_gains': 1.0,
+ 'd_gains': 0.1,
+ },
+ 'basis_generator_kwargs': {
+ 'basis_generator_type': 'zero_rbf',
+ 'num_basis': 5,
+ 'num_basis_zero_start': 1,
+ 'basis_bandwidth_factor': 3.0,
+ },
+ 'black_box_kwargs': {
+ }
+ },
+ 'DMP': {
+ 'wrappers': [],
+ 'trajectory_generator_kwargs': {
+ 'trajectory_generator_type': 'dmp'
+ },
+ 'phase_generator_kwargs': {
+ 'phase_generator_type': 'exp'
+ },
+ 'controller_kwargs': {
+ 'controller_type': 'motor',
+ 'p_gains': 1.0,
+ 'd_gains': 0.1,
+ },
+ 'basis_generator_kwargs': {
+ 'basis_generator_type': 'rbf',
+ 'num_basis': 5
+ },
+ 'black_box_kwargs': {
+ }
+ },
+ 'ProDMP': {
+ 'wrappers': [],
+ 'trajectory_generator_kwargs': {
+ 'trajectory_generator_type': 'prodmp',
+ 'duration': 2.0,
+ 'weights_scale': 1.0,
+ },
+ 'phase_generator_kwargs': {
+ 'phase_generator_type': 'exp',
+ 'tau': 1.5,
+ },
+ 'controller_kwargs': {
+ 'controller_type': 'motor',
+ 'p_gains': 1.0,
+ 'd_gains': 0.1,
+ },
+ 'basis_generator_kwargs': {
+ 'basis_generator_type': 'prodmp',
+ 'alpha': 10,
+ 'num_basis': 5,
+ },
+ 'black_box_kwargs': {
+ }
+ }
+}
+
+KNOWN_MPS = list(_BB_DEFAULTS.keys())
+_KNOWN_MPS_PLUS_ALL = KNOWN_MPS + ['all']
+ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS = {mp_type: [] for mp_type in _KNOWN_MPS_PLUS_ALL}
+MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS = {}
+
+
+def register(
+ id: str,
+ entry_point: Optional[Union[Callable, str]] = None,
+ mp_wrapper: RawInterfaceWrapper = DefaultMPWrapper,
+ register_step_based: bool = True, # TODO: Detect
+ add_mp_types: List[str] = KNOWN_MPS,
+ mp_config_override: Dict[str, Any] = {},
+ **kwargs
+):
+ """
+ Registers a Gymnasium environment, including Movement Primitives (MP) versions.
+ If you only want to register MP versions for an already registered environment, use fancy_gym.upgrade instead.
+
+ Args:
+ id (str): The unique identifier for the environment.
+ entry_point (Optional[Union[Callable, str]]): The entry point for creating the environment.
+ mp_wrapper (RawInterfaceWrapper): The MP wrapper for the environment.
+ register_step_based (bool): Whether to also register the raw step-based version of the environment (default True).
+ add_mp_types (List[str]): List of additional MP types to register.
+ mp_config_override (Dict[str, Any]): Dictionary for overriding MP configuration.
+ **kwargs: Additional keyword arguments which are passed to the environment constructor.
+
+ Notes:
+ - When `register_step_based` is True, the raw environment will also be registered to gymnasium; otherwise only MP versions will be registered.
+ - `entry_point` can be given as a string, allowing the same notation as gymnasium.
+ - If `id` already exists in the Gymnasium registry and `register_step_based` is True,
+ a warning message will be printed, suggesting to set `register_step_based=False` or use `fancy_gym.upgrade`.
+
+ Example:
+ To register a step-based environment with Movement Primitive versions (will use default mp_wrapper):
+ >>> register("MyEnv-v0", MyEnvClass)
+
+ The entry point can also be provided as a string:
+ >>> register("MyEnv-v0", "my_module:MyEnvClass")
+
+ """
+ if register_step_based and id in gym_registry:
+ print(f'[Info] Gymnasium env with id "{id}" already exists. You should supply register_step_based=False or use fancy_gym.upgrade if you only want to register mp versions of an existing env.')
+ if register_step_based:
+ assert entry_point is not None, 'You need to provide an entry-point, when registering step-based.'
+ if not callable(mp_wrapper): # mp_wrapper can be given as a String (same notation as for entry_point)
+ mod_name, attr_name = mp_wrapper.split(':')
+ mod = importlib.import_module(mod_name)
+ mp_wrapper = getattr(mod, attr_name)
+ if register_step_based:
+ gym_register(id=id, entry_point=entry_point, **kwargs)
+ upgrade(id, mp_wrapper, add_mp_types, mp_config_override)
+
+
+def upgrade(
+ id: str,
+ mp_wrapper: RawInterfaceWrapper = DefaultMPWrapper,
+ add_mp_types: List[str] = KNOWN_MPS,
+ base_id: Optional[str] = None,
+ mp_config_override: Dict[str, Any] = {},
+):
+ """
+ Upgrades an existing Gymnasium environment to include Movement Primitives (MP) versions.
+ We expect the raw step-based env to be already registered with gymnasium. Otherwise please use fancy_gym.register instead.
+
+ Args:
+ id (str): The unique identifier for the environment.
+ mp_wrapper (RawInterfaceWrapper): The MP wrapper for the environment (default is DefaultMPWrapper).
+ add_mp_types (List[str]): List of additional MP types to register (default is KNOWN_MPS).
+ base_id (Optional[str]): The unique identifier for the environment to upgrade. Will use id if none is provided. Can be defined to allow multiple registrations of different versions for the same step-based environment.
+ mp_config_override (Dict[str, Any]): Dictionary for overriding MP configuration.
+
+ Notes:
+ - The `id` parameter should match the ID of the existing Gymnasium environment you wish to upgrade. You can also pick a new one, but then `base_id` needs to be provided.
+ - The `mp_wrapper` parameter specifies the MP wrapper to use, allowing for customization.
+ - `add_mp_types` can be used to specify additional MP types to register alongside the base environment.
+ - The `base_id` parameter should match the ID of the existing Gymnasium environment you wish to upgrade.
+ - `mp_config_override` allows for customizing MP configuration if needed.
+
+ Example:
+ To upgrade an existing environment with MP versions:
+ >>> upgrade("MyEnv-v0", mp_wrapper=CustomMPWrapper)
+
+ To upgrade an existing environment with custom MP types and configuration:
+ >>> upgrade("MyEnv-v0", mp_wrapper=CustomMPWrapper, add_mp_types=["ProDMP", "DMP"], mp_config_override={"ProDMP": {"basis_generator_kwargs": {"num_basis": 3}}})
+ """
+ if not base_id:
+ base_id = id
+ register_mps(id, base_id, mp_wrapper, add_mp_types, mp_config_override)
+
+
+def register_mps(id: str, base_id: str, mp_wrapper: RawInterfaceWrapper, add_mp_types: List[str] = KNOWN_MPS, mp_config_override: Dict[str, Any] = {}):
+ for mp_type in add_mp_types:
+ register_mp(id, base_id, mp_wrapper, mp_type, mp_config_override.get(mp_type, {}))
+
+
+def register_mp(id: str, base_id: str, mp_wrapper: RawInterfaceWrapper, mp_type: str, mp_config_override: Dict[str, Any] = {}):
+ assert mp_type in KNOWN_MPS, 'Unknown mp_type'
+ assert id not in ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS[mp_type], f'The environment {id} is already registered for {mp_type}.'
+
+ parts = id.split('/')
+ if len(parts) == 1:
+ ns, name = 'gym', parts[0]
+ elif len(parts) == 2:
+ ns, name = parts[0], parts[1]
+ else:
+ raise ValueError('env id can not contain multiple "/".')
+
+ parts = name.split('-')
+ assert len(parts) >= 2 and parts[-1].startswith('v'), 'Malformed env id, must end in -v{int}.'
+
+ fancy_id = f'{ns}_{mp_type}/{name}'
+
+ gym_register(
+ id=fancy_id,
+ entry_point=bb_env_constructor,
+ kwargs={
+ 'underlying_id': base_id,
+ 'mp_wrapper': mp_wrapper,
+ 'mp_type': mp_type,
+ '_mp_config_override_register': mp_config_override
+ }
+ )
+
+ ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS[mp_type].append(fancy_id)
+ ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS['all'].append(fancy_id)
+ if ns not in MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS:
+ MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS[ns] = {mp_type: [] for mp_type in _KNOWN_MPS_PLUS_ALL}
+ MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS[ns][mp_type].append(fancy_id)
+ MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS[ns]['all'].append(fancy_id)
+
+
+def nested_update(base: MutableMapping, update):
+ """
+ Update method for nested Mappings
+ Args:
+ base: main Mapping to be updated
+ update: updated values for base Mapping
+
+ """
+ if any([item.endswith('_type') for item in update]):
+ base = update
+ return base
+ for k, v in update.items():
+ base[k] = nested_update(base.get(k, {}), v) if isinstance(v, Mapping) else v
+ return base
+
+
+def bb_env_constructor(underlying_id, mp_wrapper, mp_type, mp_config_override={}, _mp_config_override_register={}, **kwargs):
+ raw_underlying_env = gym_make(underlying_id, **kwargs)
+ underlying_env = mp_wrapper(raw_underlying_env)
+
+ mp_config = getattr(underlying_env, 'mp_config') if hasattr(underlying_env, 'mp_config') else {}
+ active_mp_config = copy.deepcopy(mp_config.get(mp_type, {}))
+ global_inherit_defaults = mp_config.get('inherit_defaults', True)
+ inherit_defaults = active_mp_config.pop('inherit_defaults', global_inherit_defaults)
+
+ config = copy.deepcopy(_BB_DEFAULTS[mp_type]) if inherit_defaults else {}
+ nested_update(config, active_mp_config)
+ nested_update(config, _mp_config_override_register)
+ nested_update(config, mp_config_override)
+
+ wrappers = config.pop('wrappers')
+
+ traj_gen_kwargs = config.pop('trajectory_generator_kwargs', {})
+ black_box_kwargs = config.pop('black_box_kwargs', {})
+ contr_kwargs = config.pop('controller_kwargs', {})
+ phase_kwargs = config.pop('phase_generator_kwargs', {})
+ basis_kwargs = config.pop('basis_generator_kwargs', {})
+
+ return make_bb(underlying_env,
+ wrappers=wrappers,
+ black_box_kwargs=black_box_kwargs,
+ traj_gen_kwargs=traj_gen_kwargs,
+ controller_kwargs=contr_kwargs,
+ phase_kwargs=phase_kwargs,
+ basis_kwargs=basis_kwargs,
+ **config)
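`bb_env_constructor` layers the wrapper's `mp_config`, the registration-time override, and the call-time override on top of `_BB_DEFAULTS` via `nested_update`. The sketch below restates `nested_update` as it appears in this diff and runs it on a trimmed-down, illustrative defaults dict (the `defaults` values are an assumption chosen for the demo). It shows the two behaviours: plain keys are merged level by level, while an update that names a `*_type` key replaces that whole level.

```python
import copy
from collections.abc import Mapping, MutableMapping


def nested_update(base: MutableMapping, update: Mapping) -> MutableMapping:
    # A level that sets any '*_type' key replaces the base level wholesale.
    if any(key.endswith('_type') for key in update):
        return update
    for k, v in update.items():
        base[k] = nested_update(base.get(k, {}), v) if isinstance(v, Mapping) else v
    return base


# Illustrative subset of defaults, not the full _BB_DEFAULTS.
defaults = {
    'controller_kwargs': {'controller_type': 'motor', 'p_gains': 1.0, 'd_gains': 0.1},
    'basis_generator_kwargs': {'basis_generator_type': 'zero_rbf', 'num_basis': 5},
}

# Case 1: no '*_type' key in the update, so only 'num_basis' changes.
config = copy.deepcopy(defaults)
nested_update(config, {'basis_generator_kwargs': {'num_basis': 3}})
merged_basis = config['basis_generator_kwargs']

# Case 2: switching the controller type discards the old gains.
config = copy.deepcopy(defaults)
nested_update(config, {'controller_kwargs': {'controller_type': 'velocity'}})
replaced_controller = config['controller_kwargs']
```

The wholesale replacement in the second case presumably avoids carrying parameters of one generator or controller type over to another, so an override that changes a type has to restate every parameter it still needs.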
diff --git a/fancy_gym/examples/example_replanning_envs.py b/fancy_gym/examples/example_replanning_envs.py
index 977ce9e..2c3c3f4 100644
--- a/fancy_gym/examples/example_replanning_envs.py
+++ b/fancy_gym/examples/example_replanning_envs.py
@@ -1,20 +1,23 @@
+import gymnasium as gym
import fancy_gym
-def example_run_replanning_env(env_name="BoxPushingDenseReplanProDMP-v0", seed=1, iterations=1, render=False):
- env = fancy_gym.make(env_name, seed=seed)
- env.reset()
+
+def example_run_replanning_env(env_name="fancy_ProDMP/BoxPushingDenseReplan-v0", seed=1, iterations=1, render=False):
+ env = gym.make(env_name)
+ env.reset(seed=seed)
for i in range(iterations):
done = False
while done is False:
ac = env.action_space.sample()
- obs, reward, done, info = env.step(ac)
+ obs, reward, terminated, truncated, info = env.step(ac)
+ done = terminated or truncated
if render:
env.render(mode="human")
- if done:
+ if terminated or truncated:
env.reset()
env.close()
del env
+
def example_custom_replanning_envs(seed=0, iteration=100, render=True):
# id for a step-based environment
base_env_id = "BoxPushingDense-v0"
@@ -22,7 +25,7 @@ def example_custom_replanning_envs(seed=0, iteration=100, render=True):
wrappers = [fancy_gym.envs.mujoco.box_pushing.mp_wrapper.MPWrapper]
trajectory_generator_kwargs = {'trajectory_generator_type': 'prodmp',
- 'weight_scale': 1}
+ 'weights_scale': 1}
phase_generator_kwargs = {'phase_generator_type': 'exp'}
controller_kwargs = {'controller_type': 'velocity'}
basis_generator_kwargs = {'basis_generator_type': 'prodmp',
@@ -46,8 +49,8 @@ def example_custom_replanning_envs(seed=0, iteration=100, render=True):
for i in range(iteration):
ac = env.action_space.sample()
- obs, reward, done, info = env.step(ac)
- if done:
+ obs, reward, terminated, truncated, info = env.step(ac)
+ if terminated or truncated:
env.reset()
env.close()
@@ -56,7 +59,7 @@ def example_custom_replanning_envs(seed=0, iteration=100, render=True):
if __name__ == "__main__":
# run a registered replanning environment
- example_run_replanning_env(env_name="BoxPushingDenseReplanProDMP-v0", seed=1, iterations=1, render=False)
+ example_run_replanning_env(env_name="fancy_ProDMP/BoxPushingDenseReplan-v0", seed=1, iterations=1, render=False)
# run a custom replanning environment
- example_custom_replanning_envs(seed=0, iteration=8, render=True)
\ No newline at end of file
+ example_custom_replanning_envs(seed=0, iteration=8, render=True)
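With the five-value `step` return, an episode ends when either `terminated` or `truncated` is set, so a `while`-style rollout has to derive its own loop flag from both. A minimal sketch of such a loop follows; `StubEnv` and `run_episodes` are hypothetical names, and the stub only mimics the Gymnasium `reset`/`step` signatures so the example runs without a simulator:

```python
from typing import Any, Dict, List, Tuple


class StubEnv:
    """Hypothetical env following the Gymnasium reset/step return conventions."""

    def __init__(self, horizon: int = 3) -> None:
        self.horizon = horizon
        self.t = 0

    def reset(self, *, seed=None, options=None) -> Tuple[int, Dict[str, Any]]:
        self.t = 0
        return self.t, {}

    def step(self, action) -> Tuple[int, float, bool, bool, Dict[str, Any]]:
        self.t += 1
        terminated = False                  # this stub has no terminal state
        truncated = self.t >= self.horizon  # time limit reached
        return self.t, 1.0, terminated, truncated, {}


def run_episodes(env: StubEnv, episodes: int) -> List[float]:
    returns = []
    for _ in range(episodes):
        obs, info = env.reset()
        done, total = False, 0.0
        while not done:
            obs, reward, terminated, truncated, info = env.step(None)
            total += reward
            # Either signal ends the episode.
            done = terminated or truncated
        returns.append(total)
    return returns


episode_returns = run_episodes(StubEnv(horizon=3), episodes=2)
```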
diff --git a/fancy_gym/examples/examples_dmc.py b/fancy_gym/examples/examples_dmc.py
index 75648b7..fbb1473 100644
--- a/fancy_gym/examples/examples_dmc.py
+++ b/fancy_gym/examples/examples_dmc.py
@@ -1,7 +1,8 @@
+import gymnasium as gym
import fancy_gym
-def example_dmc(env_id="dmc:fish-swim", seed=1, iterations=1000, render=True):
+def example_dmc(env_id="dm_control/fish-swim", seed=1, iterations=1000, render=True):
"""
Example for running a DMC based env in the step based setting.
The env_id has to be specified as `domain_name:task_name` or
@@ -16,9 +17,9 @@ def example_dmc(env_id="dmc:fish-swim", seed=1, iterations=1000, render=True):
Returns:
"""
- env = fancy_gym.make(env_id, seed)
+ env = gym.make(env_id)
rewards = 0
- obs = env.reset()
+ obs = env.reset(seed=seed)
print("observation shape:", env.observation_space.shape)
print("action shape:", env.action_space.shape)
@@ -26,10 +27,10 @@ def example_dmc(env_id="dmc:fish-swim", seed=1, iterations=1000, render=True):
ac = env.action_space.sample()
if render:
env.render(mode="human")
- obs, reward, done, info = env.step(ac)
+ obs, reward, terminated, truncated, info = env.step(ac)
rewards += reward
- if done:
+ if terminated or truncated:
print(env_id, rewards)
rewards = 0
obs = env.reset()
@@ -56,7 +57,7 @@ def example_custom_dmc_and_mp(seed=1, iterations=1, render=True):
"""
# Base DMC name, according to structure of above example
- base_env_id = "dmc:ball_in_cup-catch"
+ base_env_id = "dm_control/ball_in_cup-catch"
# Replace this wrapper with the custom wrapper for your environment by inheriting from the RawInterfaceWrapper.
# You can also add other gym.Wrappers in case they are needed.
@@ -65,8 +66,8 @@ def example_custom_dmc_and_mp(seed=1, iterations=1, render=True):
trajectory_generator_kwargs = {'trajectory_generator_type': 'promp'}
phase_generator_kwargs = {'phase_generator_type': 'linear'}
controller_kwargs = {'controller_type': 'motor',
- "p_gains": 1.0,
- "d_gains": 0.1,}
+ "p_gains": 1.0,
+ "d_gains": 0.1, }
basis_generator_kwargs = {'basis_generator_type': 'zero_rbf',
'num_basis': 5,
'num_basis_zero_start': 1
@@ -102,10 +103,10 @@ def example_custom_dmc_and_mp(seed=1, iterations=1, render=True):
# number of samples/full trajectories (multiple environment steps)
for i in range(iterations):
ac = env.action_space.sample()
- obs, reward, done, info = env.step(ac)
+ obs, reward, terminated, truncated, info = env.step(ac)
rewards += reward
- if done:
+ if terminated or truncated:
print(base_env_id, rewards)
rewards = 0
obs = env.reset()
@@ -123,14 +124,14 @@ if __name__ == '__main__':
render = True
# # Standard DMC Suite tasks
- example_dmc("dmc:fish-swim", seed=10, iterations=1000, render=render)
+ example_dmc("dm_control/fish-swim", seed=10, iterations=1000, render=render)
#
# # Manipulation tasks
# # Disclaimer: The vision versions are currently not integrated and yield an error
- example_dmc("dmc:manipulation-reach_site_features", seed=10, iterations=250, render=render)
+ example_dmc("dm_control/manipulation-reach_site_features", seed=10, iterations=250, render=render)
#
# # Gym + DMC hybrid task provided in the MP framework
- example_dmc("dmc_ball_in_cup-catch_promp-v0", seed=10, iterations=1, render=render)
+ example_dmc("dm_control_ProMP/ball_in_cup-catch-v0", seed=10, iterations=1, render=render)
# Custom DMC task # Different seed, because the episode is longer for this example and the name+seed combo is
# already registered above
diff --git a/fancy_gym/examples/examples_general.py b/fancy_gym/examples/examples_general.py
index 1a89e30..e341bfe 100644
--- a/fancy_gym/examples/examples_general.py
+++ b/fancy_gym/examples/examples_general.py
@@ -1,6 +1,6 @@
from collections import defaultdict
-import gym
+import gymnasium as gym
import numpy as np
import fancy_gym
@@ -21,27 +21,27 @@ def example_general(env_id="Pendulum-v1", seed=1, iterations=1000, render=True):
"""
- env = fancy_gym.make(env_id, seed)
+ env = gym.make(env_id)
rewards = 0
- obs = env.reset()
+ obs = env.reset(seed=seed)
print("Observation shape: ", env.observation_space.shape)
print("Action shape: ", env.action_space.shape)
# number of environment steps
for i in range(iterations):
- obs, reward, done, info = env.step(env.action_space.sample())
+ obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
rewards += reward
if render:
env.render()
- if done:
+ if terminated or truncated:
print(rewards)
rewards = 0
obs = env.reset()
-def example_async(env_id="HoleReacher-v0", n_cpu=4, seed=int('533D', 16), n_samples=800):
+def example_async(env_id="fancy/HoleReacher-v0", n_cpu=4, seed=int('533D', 16), n_samples=800):
"""
Example for running any env in a vectorized multiprocessing setting to generate more samples faster.
This also includes DMC and DMP environments when leveraging our custom make_env function.
@@ -69,12 +69,15 @@ def example_async(env_id="HoleReacher-v0", n_cpu=4, seed=int('533D', 16), n_samp
# this would generate more samples than requested if n_samples % num_envs != 0
repeat = int(np.ceil(n_samples / env.num_envs))
for i in range(repeat):
- obs, reward, done, info = env.step(env.action_space.sample())
+ obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
buffer['obs'].append(obs)
buffer['reward'].append(reward)
- buffer['done'].append(done)
+ buffer['terminated'].append(terminated)
+ buffer['truncated'].append(truncated)
buffer['info'].append(info)
rewards += reward
+
+ done = np.logical_or(terminated, truncated)
if np.any(done):
print(f"Reward at iteration {i}: {rewards[done]}")
rewards[done] = 0
@@ -90,11 +93,10 @@ if __name__ == '__main__':
example_general("Pendulum-v1", seed=10, iterations=200, render=render)
# Mujoco task from framework
- example_general("Reacher5d-v0", seed=10, iterations=200, render=render)
+ example_general("fancy/Reacher5d-v0", seed=10, iterations=200, render=render)
# # OpenAI Mujoco task
example_general("HalfCheetah-v2", seed=10, render=render)
# Vectorized multiprocessing environments
# example_async(env_id="HoleReacher-v0", n_cpu=2, seed=int('533D', 16), n_samples=2 * 200)
-
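Note on the vectorized loop in `example_async`: with a vectorized environment, `terminated` and `truncated` arrive as arrays with one entry per sub-environment, so they need an element-wise combination before they can index `rewards`. The following is a minimal sketch using NumPy only; the flag and reward values are made up for illustration and no environment is created.

```python
import numpy as np

# Hypothetical per-environment flags, shaped like the output of one vectorized step
# over 4 sub-environments (values invented for illustration).
terminated = np.array([False, True, False, False])
truncated = np.array([False, False, True, False])
rewards = np.array([1.5, 2.0, 0.5, 3.0])

# Element-wise OR. Python's `or` would raise a ValueError on multi-element arrays.
done = np.logical_or(terminated, truncated)

if np.any(done):
    # Keep the returns of the finished sub-environments, then reset their accumulators.
    finished_returns = rewards[done].copy()
    rewards[done] = 0
```

The boolean mask `done` selects exactly the sub-environments whose episode ended for either reason, which is what the `rewards[done]` indexing in the example relies on.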
diff --git a/fancy_gym/examples/examples_metaworld.py b/fancy_gym/examples/examples_metaworld.py
index 0fa7066..7919b71 100644
--- a/fancy_gym/examples/examples_metaworld.py
+++ b/fancy_gym/examples/examples_metaworld.py
@@ -1,7 +1,8 @@
+import gymnasium as gym
import fancy_gym
-def example_dmc(env_id="fish-swim", seed=1, iterations=1000, render=True):
+def example_meta(env_id="fish-swim", seed=1, iterations=1000, render=True):
"""
Example for running a MetaWorld based env in the step based setting.
The env_id has to be specified as `task_name-v2`. V1 versions are not supported and we always
@@ -17,9 +18,9 @@ def example_dmc(env_id="fish-swim", seed=1, iterations=1000, render=True):
Returns:
"""
- env = fancy_gym.make(env_id, seed)
+ env = gym.make(env_id)
rewards = 0
- obs = env.reset()
+ obs = env.reset(seed=seed)
print("observation shape:", env.observation_space.shape)
print("action shape:", env.action_space.shape)
@@ -29,9 +30,9 @@ def example_dmc(env_id="fish-swim", seed=1, iterations=1000, render=True):
# THIS NEEDS TO BE SET TO FALSE FOR NOW, BECAUSE THE INTERFACE FOR RENDERING IS DIFFERENT TO BASIC GYM
# TODO: Remove this, when Metaworld fixes its interface.
env.render(False)
- obs, reward, done, info = env.step(ac)
+ obs, reward, terminated, truncated, info = env.step(ac)
rewards += reward
- if done:
+ if terminated or truncated:
print(env_id, rewards)
rewards = 0
obs = env.reset()
@@ -40,7 +41,7 @@ def example_dmc(env_id="fish-swim", seed=1, iterations=1000, render=True):
del env
-def example_custom_dmc_and_mp(seed=1, iterations=1, render=True):
+def example_custom_meta_and_mp(seed=1, iterations=1, render=True):
"""
Example for running a custom movement primitive based environment.
Our already registered environments follow the same structure.
@@ -58,7 +59,7 @@ def example_custom_dmc_and_mp(seed=1, iterations=1, render=True):
"""
# Base MetaWorld name, according to structure of above example
- base_env_id = "metaworld:button-press-v2"
+ base_env_id = "metaworld/button-press-v2"
# Replace this wrapper with the custom wrapper for your environment by inheriting from the RawInterfaceWrapper.
# You can also add other gym.Wrappers in case they are needed.
@@ -103,10 +104,10 @@ def example_custom_dmc_and_mp(seed=1, iterations=1, render=True):
# number of samples/full trajectories (multiple environment steps)
for i in range(iterations):
ac = env.action_space.sample()
- obs, reward, done, info = env.step(ac)
+ obs, reward, terminated, truncated, info = env.step(ac)
rewards += reward
- if done:
+ if terminated or truncated:
print(base_env_id, rewards)
rewards = 0
obs = env.reset()
@@ -124,11 +125,10 @@ if __name__ == '__main__':
render = False
# # Standard Meta world tasks
- example_dmc("metaworld:button-press-v2", seed=10, iterations=500, render=render)
+ example_meta("metaworld/button-press-v2", seed=10, iterations=500, render=render)
# # MP + MetaWorld hybrid task provided in our framework
- example_dmc("ButtonPressProMP-v2", seed=10, iterations=1, render=render)
+ example_meta("metaworld_ProMP/button-press-v2", seed=10, iterations=1, render=render)
#
# # Custom MetaWorld task
- example_custom_dmc_and_mp(seed=10, iterations=1, render=render)
-
+ example_custom_meta_and_mp(seed=10, iterations=1, render=render)
diff --git a/fancy_gym/examples/examples_movement_primitives.py b/fancy_gym/examples/examples_movement_primitives.py
index 485f71a..317a103 100644
--- a/fancy_gym/examples/examples_movement_primitives.py
+++ b/fancy_gym/examples/examples_movement_primitives.py
@@ -1,7 +1,8 @@
+import gymnasium as gym
import fancy_gym
-def example_mp(env_name="HoleReacherProMP-v0", seed=1, iterations=1, render=True):
+def example_mp(env_name="fancy_ProMP/HoleReacher-v0", seed=1, iterations=1, render=True):
"""
Example for running a black box based environment, which is already registered
Args:
@@ -15,11 +16,11 @@ def example_mp(env_name="HoleReacherProMP-v0", seed=1, iterations=1, render=True
"""
# Equivalent to gym, we have a make function which can be used to create environments.
# It takes care of seeding and enables the use of a variety of external environments using the gym interface.
- env = fancy_gym.make(env_name, seed)
+ env = gym.make(env_name)
returns = 0
# env.render(mode=None)
- obs = env.reset()
+ obs = env.reset(seed=seed)
# number of samples/full trajectories (multiple environment steps)
for i in range(iterations):
@@ -41,16 +42,16 @@ def example_mp(env_name="HoleReacherProMP-v0", seed=1, iterations=1, render=True
# This executes a full trajectory and gives back the context (obs) of the last step in the trajectory, or the
# full observation space of the last step, if replanning/sub-trajectory learning is used. The 'reward' is equal
# to the return of a trajectory. Default is the sum over the step-wise rewards.
- obs, reward, done, info = env.step(ac)
+ obs, reward, terminated, truncated, info = env.step(ac)
# Aggregated returns
returns += reward
- if done:
+ if terminated or truncated:
print(reward)
obs = env.reset()
-def example_custom_mp(env_name="Reacher5dProMP-v0", seed=1, iterations=1, render=True):
+def example_custom_mp(env_name="fancy_ProMP/Reacher5d-v0", seed=1, iterations=1, render=True):
"""
Example for running a movement primitive based environment, which is already registered
Args:
@@ -62,12 +63,9 @@ def example_custom_mp(env_name="Reacher5dProMP-v0", seed=1, iterations=1, render
Returns:
"""
- # Changing the arguments of the black box env is possible by providing them to gym as with all kwargs.
+ # Changing the arguments of the black box env is possible by providing them to gym through mp_config_override.
# E.g. here for way too many basis functions
- env = fancy_gym.make(env_name, seed, basis_generator_kwargs={'num_basis': 1000})
- # env = fancy_gym.make(env_name, seed)
- # mp_dict.update({'black_box_kwargs': {'learn_sub_trajectories': True}})
- # mp_dict.update({'black_box_kwargs': {'do_replanning': lambda pos, vel, t: lambda t: t % 100}})
+ env = gym.make(env_name, mp_config_override={'basis_generator_kwargs': {'num_basis': 1000}})
returns = 0
- obs = env.reset()
+ obs = env.reset(seed=seed)
@@ -79,10 +77,10 @@ def example_custom_mp(env_name="Reacher5dProMP-v0", seed=1, iterations=1, render
# number of samples/full trajectories (multiple environment steps)
for i in range(iterations):
ac = env.action_space.sample()
- obs, reward, done, info = env.step(ac)
+ obs, reward, terminated, truncated, info = env.step(ac)
returns += reward
- if done:
+ if terminated or truncated:
print(i, reward)
obs = env.reset()
@@ -106,7 +104,7 @@ def example_fully_custom_mp(seed=1, iterations=1, render=True):
"""
- base_env_id = "Reacher5d-v0"
+ base_env_id = "fancy/Reacher5d-v0"
# Replace this wrapper with the custom wrapper for your environment by inheriting from the RawInterfaceWrapper.
# You can also add other gym.Wrappers in case they are needed.
@@ -114,7 +112,7 @@ def example_fully_custom_mp(seed=1, iterations=1, render=True):
# For a ProMP
trajectory_generator_kwargs = {'trajectory_generator_type': 'promp',
- 'weight_scale': 2}
+ 'weights_scale': 2}
phase_generator_kwargs = {'phase_generator_type': 'linear'}
controller_kwargs = {'controller_type': 'velocity'}
basis_generator_kwargs = {'basis_generator_type': 'zero_rbf',
@@ -124,7 +122,7 @@ def example_fully_custom_mp(seed=1, iterations=1, render=True):
# # For a DMP
# trajectory_generator_kwargs = {'trajectory_generator_type': 'dmp',
- # 'weight_scale': 500}
+ # 'weights_scale': 500}
# phase_generator_kwargs = {'phase_generator_type': 'exp',
# 'alpha_phase': 2.5}
# controller_kwargs = {'controller_type': 'velocity'}
@@ -145,10 +143,10 @@ def example_fully_custom_mp(seed=1, iterations=1, render=True):
# number of samples/full trajectories (multiple environment steps)
for i in range(iterations):
ac = env.action_space.sample()
- obs, reward, done, info = env.step(ac)
+ obs, reward, terminated, truncated, info = env.step(ac)
rewards += reward
- if done:
+ if terminated or truncated:
print(rewards)
rewards = 0
obs = env.reset()
@@ -157,20 +155,20 @@ def example_fully_custom_mp(seed=1, iterations=1, render=True):
if __name__ == '__main__':
render = False
# DMP
- example_mp("HoleReacherDMP-v0", seed=10, iterations=5, render=render)
+ example_mp("fancy_DMP/HoleReacher-v0", seed=10, iterations=5, render=render)
# ProMP
- example_mp("HoleReacherProMP-v0", seed=10, iterations=5, render=render)
- example_mp("BoxPushingTemporalSparseProMP-v0", seed=10, iterations=1, render=render)
- example_mp("TableTennis4DProMP-v0", seed=10, iterations=20, render=render)
+ example_mp("fancy_ProMP/HoleReacher-v0", seed=10, iterations=5, render=render)
+ example_mp("fancy_ProMP/BoxPushingTemporalSparse-v0", seed=10, iterations=1, render=render)
+ example_mp("fancy_ProMP/TableTennis4D-v0", seed=10, iterations=20, render=render)
# ProDMP with Replanning
- example_mp("BoxPushingDenseReplanProDMP-v0", seed=10, iterations=4, render=render)
- example_mp("TableTennis4DReplanProDMP-v0", seed=10, iterations=20, render=render)
- example_mp("TableTennisWindReplanProDMP-v0", seed=10, iterations=20, render=render)
+ example_mp("fancy_ProDMP/BoxPushingDenseReplan-v0", seed=10, iterations=4, render=render)
+ example_mp("fancy_ProDMP/TableTennis4DReplan-v0", seed=10, iterations=20, render=render)
+ example_mp("fancy_ProDMP/TableTennisWindReplan-v0", seed=10, iterations=20, render=render)
# Altered basis functions
- obs1 = example_custom_mp("Reacher5dProMP-v0", seed=10, iterations=1, render=render)
+ obs1 = example_custom_mp("fancy_ProMP/Reacher5d-v0", seed=10, iterations=1, render=render)
# Custom MP
example_fully_custom_mp(seed=10, iterations=1, render=render)
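Note on `mp_config_override`: the examples above pass a nested dict that changes a single entry (`num_basis`) while the remaining MP settings keep their defaults. The sketch below shows one way such an override could be combined with defaults, assuming a recursive key-by-key merge. `deep_merge` and the `defaults` dict are hypothetical illustrations, not fancy_gym's actual implementation.

```python
from copy import deepcopy


def deep_merge(defaults: dict, override: dict) -> dict:
    """Hypothetical helper: return a copy of `defaults` with `override` applied recursively."""
    merged = deepcopy(defaults)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: descend so that sibling keys are preserved.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, and new keys simply replace or extend the defaults.
            merged[key] = deepcopy(value)
    return merged


# Illustrative defaults, loosely modeled on the kwargs dicts used in the examples.
defaults = {
    'basis_generator_kwargs': {'basis_generator_type': 'zero_rbf', 'num_basis': 5},
    'controller_kwargs': {'controller_type': 'velocity'},
}
override = {'basis_generator_kwargs': {'num_basis': 1000}}

merged = deep_merge(defaults, override)
```

Under this assumption, only `num_basis` changes; `basis_generator_type` and the controller settings are carried over untouched, and the original `defaults` dict is not mutated.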
diff --git a/fancy_gym/examples/examples_open_ai.py b/fancy_gym/examples/examples_open_ai.py
index 789271f..07f1719 100644
--- a/fancy_gym/examples/examples_open_ai.py
+++ b/fancy_gym/examples/examples_open_ai.py
@@ -1,3 +1,4 @@
+import gymnasium as gym
import fancy_gym
@@ -12,11 +13,10 @@ def example_mp(env_name, seed=1, render=True):
Returns:
"""
- # While in this case gym.make() is possible to use as well, we recommend our custom make env function.
- env = fancy_gym.make(env_name, seed)
+ env = gym.make(env_name)
returns = 0
- obs = env.reset()
+ obs = env.reset(seed=seed)
# number of samples/full trajectories (multiple environment steps)
for i in range(10):
if render and i % 2 == 0:
@@ -24,14 +24,13 @@ def example_mp(env_name, seed=1, render=True):
else:
env.render()
ac = env.action_space.sample()
- obs, reward, done, info = env.step(ac)
+ obs, reward, terminated, truncated, info = env.step(ac)
returns += reward
- if done:
+ if terminated or truncated:
print(returns)
obs = env.reset()
if __name__ == '__main__':
- example_mp("ReacherProMP-v2")
-
+ example_mp("gym_ProMP/Reacher-v2")
diff --git a/fancy_gym/examples/mp_params_tuning.py b/fancy_gym/examples/mp_params_tuning.py
index 644d86b..71a579a 100644
--- a/fancy_gym/examples/mp_params_tuning.py
+++ b/fancy_gym/examples/mp_params_tuning.py
@@ -1,10 +1,14 @@
+import gymnasium as gym
import fancy_gym
+
def compare_bases_shape(env1_id, env2_id):
- env1 = fancy_gym.make(env1_id, seed=0)
+ env1 = gym.make(env1_id)
env1.traj_gen.show_scaled_basis(plot=True)
- env2 = fancy_gym.make(env2_id, seed=0)
+ env2 = gym.make(env2_id)
env2.traj_gen.show_scaled_basis(plot=True)
return
+
+
if __name__ == '__main__':
- compare_bases_shape("TableTennis4DProDMP-v0", "TableTennis4DProMP-v0")
\ No newline at end of file
+ compare_bases_shape("fancy_ProDMP/TableTennis4D-v0", "fancy_ProMP/TableTennis4D-v0")
diff --git a/fancy_gym/examples/pd_control_gain_tuning.py b/fancy_gym/examples/pd_control_gain_tuning.py
index 71a3ba4..d0905ca 100644
--- a/fancy_gym/examples/pd_control_gain_tuning.py
+++ b/fancy_gym/examples/pd_control_gain_tuning.py
@@ -3,19 +3,20 @@ from collections import OrderedDict
import numpy as np
from matplotlib import pyplot as plt
+import gymnasium as gym
import fancy_gym
# This might work for some environments; however, please verify that the correct trajectory information
# for your environment is extracted below
SEED = 1
-env_id = "Reacher5dProMP-v0"
+env_id = "fancy_ProMP/Reacher5d-v0"
-env = fancy_gym.make(env_id, seed=SEED, controller_kwargs={'p_gains': 0.05, 'd_gains': 0.05}).env
+env = gym.make(env_id, mp_config_override={'controller_kwargs': {'p_gains': 0.05, 'd_gains': 0.05}}).env
env.action_space.seed(SEED)
# Plot difference between real trajectory and target MP trajectory
-env.reset()
+env.reset(seed=SEED)
w = env.action_space.sample()
pos, vel = env.get_trajectory(w)
@@ -34,7 +35,7 @@ fig.show()
for t, (des_pos, des_vel) in enumerate(zip(pos, vel)):
actions = env.tracking_controller.get_action(des_pos, des_vel, env.current_pos, env.current_vel)
actions = np.clip(actions, env.env.action_space.low, env.env.action_space.high)
- _, _, _, _ = env.env.step(actions)
+ env.env.step(actions)
if t % 15 == 0:
img.set_data(env.env.render(mode="rgb_array"))
fig.canvas.draw()
diff --git a/fancy_gym/meta/README.MD b/fancy_gym/meta/README.MD
index 1664cb0..9ec5594 100644
--- a/fancy_gym/meta/README.MD
+++ b/fancy_gym/meta/README.MD
@@ -1,26 +1,64 @@
-# MetaWorld Wrappers
+# Metaworld
-These are the Environment Wrappers for selected [Metaworld](https://meta-world.github.io/) environments in order to use our Movement Primitive gym interface with them.
-All Metaworld environments have a 39 dimensional observation space with the same structure. The tasks differ only in the objective and the initial observations that are randomized.
-Unused observations are zeroed out. E.g. for `Button-Press-v2` the observation mask looks the following:
-```python
- return np.hstack([
- # Current observation
- [False] * 3, # end-effector position
- [False] * 1, # normalized gripper open distance
- [True] * 3, # main object position
- [False] * 4, # main object quaternion
- [False] * 3, # secondary object position
- [False] * 4, # secondary object quaternion
- # Previous observation
- [False] * 3, # previous end-effector position
- [False] * 1, # previous normalized gripper open distance
- [False] * 3, # previous main object position
- [False] * 4, # previous main object quaternion
- [False] * 3, # previous second object position
- [False] * 4, # previous second object quaternion
- # Goal
- [True] * 3, # goal position
- ])
-```
-For other tasks only the boolean values have to be adjusted accordingly.
\ No newline at end of file
+[Metaworld](https://meta-world.github.io/) is an open-source simulated benchmark designed to advance meta-reinforcement learning and multi-task learning, comprising 50 diverse robotic manipulation tasks. The benchmark features a universal tabletop environment equipped with a simulated Sawyer arm and a variety of everyday objects. This shared environment is pivotal for reusing structured learning and efficiently acquiring related tasks.
+
+## Step-Based Envs
+
+`fancy_gym` makes all Metaworld ML1 tasks available via the standard gym interface. To access Metaworld environments using a different mode of operation (MT1 / ML100 / etc.), please use the functionality provided by Metaworld directly.
+
+| Name | Description | Horizon | Action Dimension | Observation Dimension | Context Dimension |
+| ---------------------------------------- | ------------------------------------------------------------------------------------- | ------- | ---------------- | --------------------- | ----------------- |
+| `metaworld/assembly-v2` | A task where the robot must assemble components. | 500 | 4 | 39 | 6 |
+| `metaworld/basketball-v2` | A task where the robot must play a game of basketball. | 500 | 4 | 39 | 6 |
+| `metaworld/bin-picking-v2` | A task involving the robot picking objects from a bin. | 500 | 4 | 39 | 6 |
+| `metaworld/box-close-v2` | A task requiring the robot to close a box. | 500 | 4 | 39 | 6 |
+| `metaworld/button-press-topdown-v2` | A task where the robot must press a button from a top-down perspective. | 500 | 4 | 39 | 6 |
+| `metaworld/button-press-topdown-wall-v2` | A task involving the robot pressing a button with a wall from a top-down perspective. | 500 | 4 | 39 | 6 |
+| `metaworld/button-press-v2` | A task where the robot must press a button. | 500 | 4 | 39 | 6 |
+| `metaworld/button-press-wall-v2` | A task involving the robot pressing a button with a wall. | 500 | 4 | 39 | 6 |
+| `metaworld/coffee-button-v2` | A task where the robot must press a button on a coffee machine. | 500 | 4 | 39 | 6 |
+| `metaworld/coffee-pull-v2` | A task involving the robot pulling a lever on a coffee machine. | 500 | 4 | 39 | 6 |
+| `metaworld/coffee-push-v2` | A task involving the robot pushing a component on a coffee machine. | 500 | 4 | 39 | 6 |
+| `metaworld/dial-turn-v2` | A task where the robot must turn a dial. | 500 | 4 | 39 | 6 |
+| `metaworld/disassemble-v2` | A task requiring the robot to disassemble an object. | 500 | 4 | 39 | 6 |
+| `metaworld/door-close-v2` | A task where the robot must close a door. | 500 | 4 | 39 | 6 |
+| `metaworld/door-lock-v2` | A task involving the robot locking a door. | 500 | 4 | 39 | 6 |
+| `metaworld/door-open-v2` | A task where the robot must open a door. | 500 | 4 | 39 | 6 |
+| `metaworld/door-unlock-v2` | A task involving the robot unlocking a door. | 500 | 4 | 39 | 6 |
+| `metaworld/hand-insert-v2` | A task requiring the robot to insert a hand into an object. | 500 | 4 | 39 | 6 |
+| `metaworld/drawer-close-v2` | A task where the robot must close a drawer. | 500 | 4 | 39 | 6 |
+| `metaworld/drawer-open-v2` | A task involving the robot opening a drawer. | 500 | 4 | 39 | 6 |
+| `metaworld/faucet-open-v2` | A task requiring the robot to open a faucet. | 500 | 4 | 39 | 6 |
+| `metaworld/faucet-close-v2` | A task where the robot must close a faucet. | 500 | 4 | 39 | 6 |
+| `metaworld/hammer-v2` | A task where the robot must use a hammer. | 500 | 4 | 39 | 6 |
+| `metaworld/handle-press-side-v2` | A task involving the robot pressing a handle from the side. | 500 | 4 | 39 | 6 |
+| `metaworld/handle-press-v2` | A task where the robot must press a handle. | 500 | 4 | 39 | 6 |
+| `metaworld/handle-pull-side-v2` | A task requiring the robot to pull a handle from the side. | 500 | 4 | 39 | 6 |
+| `metaworld/handle-pull-v2` | A task where the robot must pull a handle. | 500 | 4 | 39 | 6 |
+| `metaworld/lever-pull-v2` | A task involving the robot pulling a lever. | 500 | 4 | 39 | 6 |
+| `metaworld/peg-insert-side-v2` | A task requiring the robot to insert a peg from the side. | 500 | 4 | 39 | 6 |
+| `metaworld/pick-place-wall-v2` | A task involving the robot picking and placing an object with a wall. | 500 | 4 | 39 | 6 |
+| `metaworld/pick-out-of-hole-v2` | A task where the robot must pick an object out of a hole. | 500 | 4 | 39 | 6 |
+| `metaworld/reach-v2` | A task where the robot must reach an object. | 500 | 4 | 39 | 6 |
+| `metaworld/push-back-v2` | A task involving the robot pushing an object backward. | 500 | 4 | 39 | 6 |
+| `metaworld/push-v2` | A task where the robot must push an object. | 500 | 4 | 39 | 6 |
+| `metaworld/pick-place-v2` | A task involving the robot picking up and placing an object. | 500 | 4 | 39 | 6 |
+| `metaworld/plate-slide-v2` | A task requiring the robot to slide a plate. | 500 | 4 | 39 | 6 |
+| `metaworld/plate-slide-side-v2` | A task involving the robot sliding a plate from the side. | 500 | 4 | 39 | 6 |
+| `metaworld/plate-slide-back-v2` | A task where the robot must slide a plate backward. | 500 | 4 | 39 | 6 |
+| `metaworld/plate-slide-back-side-v2` | A task involving the robot sliding a plate backward from the side. | 500 | 4 | 39 | 6 |
+| `metaworld/peg-unplug-side-v2` | A task where the robot must unplug a peg from the side. | 500 | 4 | 39 | 6 |
+| `metaworld/soccer-v2` | A task where the robot must play soccer. | 500 | 4 | 39 | 6 |
+| `metaworld/stick-push-v2` | A task involving the robot pushing a stick. | 500 | 4 | 39 | 6 |
+| `metaworld/stick-pull-v2` | A task where the robot must pull a stick. | 500 | 4 | 39 | 6 |
+| `metaworld/push-wall-v2` | A task involving the robot pushing against a wall. | 500 | 4 | 39 | 6 |
+| `metaworld/reach-wall-v2` | A task where the robot must reach an object with a wall. | 500 | 4 | 39 | 6 |
+| `metaworld/shelf-place-v2` | A task involving the robot placing an object on a shelf. | 500 | 4 | 39 | 6 |
+| `metaworld/sweep-into-v2` | A task where the robot must sweep objects into a container. | 500 | 4 | 39 | 6 |
+| `metaworld/sweep-v2` | A task requiring the robot to sweep. | 500 | 4 | 39 | 6 |
+| `metaworld/window-open-v2` | A task where the robot must open a window. | 500 | 4 | 39 | 6 |
+| `metaworld/window-close-v2` | A task involving the robot closing a window. | 500 | 4 | 39 | 6 |
+
+## MP-Based Envs
+
+All envs also exist as MP variants. Refer to them using the `metaworld_ProMP/` or `metaworld_ProDMP/` prefix (DMP is currently not supported).
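Note on the MP naming scheme: the IDs in this change follow the pattern `<namespace>_<MPType>/<task>`, e.g. `metaworld/button-press-v2` becomes `metaworld_ProMP/button-press-v2`. The sketch below derives an MP ID from a step-based ID by plain string manipulation. `mp_env_id` is a hypothetical helper written for illustration and is not part of `fancy_gym`; it assumes the pattern holds for every registered Metaworld task.

```python
def mp_env_id(step_env_id: str, mp_type: str) -> str:
    """Hypothetical helper: map 'metaworld/reach-v2' + 'ProMP' to 'metaworld_ProMP/reach-v2'."""
    supported = ("ProMP", "ProDMP")  # DMP variants are not registered for Metaworld
    if mp_type not in supported:
        raise ValueError(f"Unsupported MP type {mp_type!r}; expected one of {supported}")

    # Split 'namespace/task' at the first slash.
    namespace, sep, task = step_env_id.partition("/")
    if not sep or not namespace or not task:
        raise ValueError(f"Expected an ID of the form 'namespace/task', got {step_env_id!r}")

    return f"{namespace}_{mp_type}/{task}"


promp_id = mp_env_id("metaworld/button-press-v2", "ProMP")
prodmp_id = mp_env_id("metaworld/reach-v2", "ProDMP")
```

The resulting string would then be passed to `gym.make` after importing `fancy_gym`, in the same way as the step-based IDs in the table above.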
diff --git a/fancy_gym/meta/__init__.py b/fancy_gym/meta/__init__.py
index 401fc44..78ec73c 100644
--- a/fancy_gym/meta/__init__.py
+++ b/fancy_gym/meta/__init__.py
@@ -1,125 +1,37 @@
+from typing import Iterable, Type, Union, Optional
+
from copy import deepcopy
-from gym import register
+from ..envs.registry import register
from . import goal_object_change_mp_wrapper, goal_change_mp_wrapper, goal_endeffector_change_mp_wrapper, \
object_change_mp_wrapper
+from . import metaworld_adapter
+
+metaworld_adapter.register_all_ML1()
+
ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS = {"DMP": [], "ProMP": [], "ProDMP": []}
# MetaWorld
-
-DEFAULT_BB_DICT_ProMP = {
- "name": 'EnvName',
- "wrappers": [],
- "trajectory_generator_kwargs": {
- 'trajectory_generator_type': 'promp',
- 'weights_scale': 10,
- },
- "phase_generator_kwargs": {
- 'phase_generator_type': 'linear'
- },
- "controller_kwargs": {
- 'controller_type': 'metaworld',
- },
- "basis_generator_kwargs": {
- 'basis_generator_type': 'zero_rbf',
- 'num_basis': 5,
- 'num_basis_zero_start': 1
- },
- 'black_box_kwargs': {
- 'condition_on_desired': False,
- }
-}
-
-DEFAULT_BB_DICT_ProDMP = {
- "name": 'EnvName',
- "wrappers": [],
- "trajectory_generator_kwargs": {
- 'trajectory_generator_type': 'prodmp',
- 'auto_scale_basis': True,
- 'weights_scale': 10,
- # 'goal_scale': 0.,
- 'disable_goal': True,
- },
- "phase_generator_kwargs": {
- 'phase_generator_type': 'exp',
- # 'alpha_phase' : 3,
- },
- "controller_kwargs": {
- 'controller_type': 'metaworld',
- },
- "basis_generator_kwargs": {
- 'basis_generator_type': 'prodmp',
- 'num_basis': 5,
- 'alpha': 10
- },
- 'black_box_kwargs': {
- 'condition_on_desired': False,
- }
-
-}
-
_goal_change_envs = ["assembly-v2", "pick-out-of-hole-v2", "plate-slide-v2", "plate-slide-back-v2",
"plate-slide-side-v2", "plate-slide-back-side-v2"]
for _task in _goal_change_envs:
- task_id_split = _task.split("-")
- name = "".join([s.capitalize() for s in task_id_split[:-1]])
-
- # ProMP
- _env_id = f'{name}ProMP-{task_id_split[-1]}'
- kwargs_dict_goal_change_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
- kwargs_dict_goal_change_promp['wrappers'].append(goal_change_mp_wrapper.MPWrapper)
- kwargs_dict_goal_change_promp['name'] = f'metaworld:{_task}'
-
register(
- id=_env_id,
- entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
- kwargs=kwargs_dict_goal_change_promp
+ id=f'metaworld/{_task}',
+ register_step_based=False,
+ mp_wrapper=goal_change_mp_wrapper.MPWrapper,
+ add_mp_types=['ProMP', 'ProDMP'],
)
- ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
-
- # ProDMP
- _env_id = f'{name}ProDMP-{task_id_split[-1]}'
- kwargs_dict_goal_change_prodmp = deepcopy(DEFAULT_BB_DICT_ProDMP)
- kwargs_dict_goal_change_prodmp['wrappers'].append(goal_change_mp_wrapper.MPWrapper)
- kwargs_dict_goal_change_prodmp['name'] = f'metaworld:{_task}'
-
- register(
- id=_env_id,
- entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
- kwargs=kwargs_dict_goal_change_prodmp
- )
- ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProDMP"].append(_env_id)
_object_change_envs = ["bin-picking-v2", "hammer-v2", "sweep-into-v2"]
for _task in _object_change_envs:
- task_id_split = _task.split("-")
- name = "".join([s.capitalize() for s in task_id_split[:-1]])
-
- # ProMP
- _env_id = f'{name}ProMP-{task_id_split[-1]}'
- kwargs_dict_object_change_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
- kwargs_dict_object_change_promp['wrappers'].append(object_change_mp_wrapper.MPWrapper)
- kwargs_dict_object_change_promp['name'] = f'metaworld:{_task}'
register(
- id=_env_id,
- entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
- kwargs=kwargs_dict_object_change_promp
+ id=f'metaworld/{_task}',
+ register_step_based=False,
+ mp_wrapper=object_change_mp_wrapper.MPWrapper,
+ add_mp_types=['ProMP', 'ProDMP'],
)
- ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
-
- # ProDMP
- _env_id = f'{name}ProDMP-{task_id_split[-1]}'
- kwargs_dict_object_change_prodmp = deepcopy(DEFAULT_BB_DICT_ProDMP)
- kwargs_dict_object_change_prodmp['wrappers'].append(object_change_mp_wrapper.MPWrapper)
- kwargs_dict_object_change_prodmp['name'] = f'metaworld:{_task}'
- register(
- id=_env_id,
- entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
- kwargs=kwargs_dict_object_change_prodmp
- )
- ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProDMP"].append(_env_id)
_goal_and_object_change_envs = ["box-close-v2", "button-press-v2", "button-press-wall-v2", "button-press-topdown-v2",
"button-press-topdown-wall-v2", "coffee-button-v2", "coffee-pull-v2",
@@ -133,62 +45,18 @@ _goal_and_object_change_envs = ["box-close-v2", "button-press-v2", "button-press
"shelf-place-v2", "sweep-v2", "window-open-v2", "window-close-v2"
]
for _task in _goal_and_object_change_envs:
- task_id_split = _task.split("-")
- name = "".join([s.capitalize() for s in task_id_split[:-1]])
-
- # ProMP
- _env_id = f'{name}ProMP-{task_id_split[-1]}'
- kwargs_dict_goal_and_object_change_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
- kwargs_dict_goal_and_object_change_promp['wrappers'].append(goal_object_change_mp_wrapper.MPWrapper)
- kwargs_dict_goal_and_object_change_promp['name'] = f'metaworld:{_task}'
-
register(
- id=_env_id,
- entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
- kwargs=kwargs_dict_goal_and_object_change_promp
+ id=f'metaworld/{_task}',
+ register_step_based=False,
+ mp_wrapper=goal_object_change_mp_wrapper.MPWrapper,
+ add_mp_types=['ProMP', 'ProDMP'],
)
- ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
-
- # ProDMP
- _env_id = f'{name}ProDMP-{task_id_split[-1]}'
- kwargs_dict_goal_and_object_change_prodmp = deepcopy(DEFAULT_BB_DICT_ProDMP)
- kwargs_dict_goal_and_object_change_prodmp['wrappers'].append(goal_object_change_mp_wrapper.MPWrapper)
- kwargs_dict_goal_and_object_change_prodmp['name'] = f'metaworld:{_task}'
-
- register(
- id=_env_id,
- entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
- kwargs=kwargs_dict_goal_and_object_change_prodmp
- )
- ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProDMP"].append(_env_id)
_goal_and_endeffector_change_envs = ["basketball-v2"]
for _task in _goal_and_endeffector_change_envs:
- task_id_split = _task.split("-")
- name = "".join([s.capitalize() for s in task_id_split[:-1]])
-
- # ProMP
- _env_id = f'{name}ProMP-{task_id_split[-1]}'
- kwargs_dict_goal_and_endeffector_change_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
- kwargs_dict_goal_and_endeffector_change_promp['wrappers'].append(goal_endeffector_change_mp_wrapper.MPWrapper)
- kwargs_dict_goal_and_endeffector_change_promp['name'] = f'metaworld:{_task}'
-
register(
- id=_env_id,
- entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
- kwargs=kwargs_dict_goal_and_endeffector_change_promp
+ id=f'metaworld/{_task}',
+ register_step_based=False,
+ mp_wrapper=goal_endeffector_change_mp_wrapper.MPWrapper,
+ add_mp_types=['ProMP', 'ProDMP'],
)
- ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
-
- # ProDMP
- _env_id = f'{name}ProDMP-{task_id_split[-1]}'
- kwargs_dict_goal_and_endeffector_change_prodmp = deepcopy(DEFAULT_BB_DICT_ProDMP)
- kwargs_dict_goal_and_endeffector_change_prodmp['wrappers'].append(goal_endeffector_change_mp_wrapper.MPWrapper)
- kwargs_dict_goal_and_endeffector_change_prodmp['name'] = f'metaworld:{_task}'
-
- register(
- id=_env_id,
- entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
- kwargs=kwargs_dict_goal_and_endeffector_change_prodmp
- )
- ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProDMP"].append(_env_id)
diff --git a/fancy_gym/meta/base_metaworld_mp_wrapper.py b/fancy_gym/meta/base_metaworld_mp_wrapper.py
index 0f1a9a9..03b78dc 100644
--- a/fancy_gym/meta/base_metaworld_mp_wrapper.py
+++ b/fancy_gym/meta/base_metaworld_mp_wrapper.py
@@ -6,12 +6,63 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class BaseMetaworldMPWrapper(RawInterfaceWrapper):
+ mp_config = {
+ 'inherit_defaults': False,
+ 'ProMP': {
+ 'wrappers': [],
+ 'trajectory_generator_kwargs': {
+ 'trajectory_generator_type': 'promp',
+ 'weights_scale': 10,
+ },
+ 'phase_generator_kwargs': {
+ 'phase_generator_type': 'linear'
+ },
+ 'controller_kwargs': {
+ 'controller_type': 'metaworld',
+ },
+ 'basis_generator_kwargs': {
+ 'basis_generator_type': 'zero_rbf',
+ 'num_basis': 5,
+ 'num_basis_zero_start': 1
+ },
+ 'black_box_kwargs': {
+ 'condition_on_desired': False,
+ },
+ },
+ 'DMP': {},
+ 'ProDMP': {
+ 'wrappers': [],
+ 'trajectory_generator_kwargs': {
+ 'trajectory_generator_type': 'prodmp',
+ 'auto_scale_basis': True,
+ 'weights_scale': 10,
+ # 'goal_scale': 0.,
+ 'disable_goal': True,
+ },
+ 'phase_generator_kwargs': {
+ 'phase_generator_type': 'exp',
+ # 'alpha_phase' : 3,
+ },
+ 'controller_kwargs': {
+ 'controller_type': 'metaworld',
+ },
+ 'basis_generator_kwargs': {
+ 'basis_generator_type': 'prodmp',
+ 'num_basis': 5,
+ 'alpha': 10
+ },
+ 'black_box_kwargs': {
+ 'condition_on_desired': False,
+ },
+ },
+ }
+
@property
def current_pos(self) -> Union[float, int, np.ndarray]:
- r_close = self.env.data.get_joint_qpos("r_close")
+ r_close = self.env.data.joint('r_close').qpos
return np.hstack([self.env.data.mocap_pos.flatten() / self.env.action_scale, r_close])
@property
def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
return np.zeros(4, )
- # raise NotImplementedError("Velocity cannot be retrieved.")
+ # raise NotImplementedError('Velocity cannot be retrieved.')
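Note on `mp_config`: the dictionary added above groups default settings by movement-primitive type (`ProMP`, `DMP`, `ProDMP`). How the registry combines these defaults with per-environment overrides is not shown in this diff. The following is a minimal sketch, assuming a recursive dictionary merge; `merge_nested` and the override values are hypothetical and written only for illustration, they are not part of `fancy_gym`.

```python
from copy import deepcopy


def merge_nested(base: dict, overrides: dict) -> dict:
    """Return a copy of `base` with `overrides` merged in recursively (hypothetical helper)."""
    merged = deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: descend instead of replacing the whole sub-dict.
            merged[key] = merge_nested(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Subset of the ProMP defaults from BaseMetaworldMPWrapper.mp_config.
promp_defaults = {
    'trajectory_generator_kwargs': {
        'trajectory_generator_type': 'promp',
        'weights_scale': 10,
    },
    'basis_generator_kwargs': {
        'basis_generator_type': 'zero_rbf',
        'num_basis': 5,
        'num_basis_zero_start': 1,
    },
}

# Hypothetical per-environment override: only the number of basis functions changes.
overrides = {'basis_generator_kwargs': {'num_basis': 8}}

config = merge_nested(promp_defaults, overrides)
```

Merging recursively keeps the sibling keys (`basis_generator_type`, `num_basis_zero_start`) intact, which a flat `dict.update` would discard.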
diff --git a/fancy_gym/meta/goal_change_mp_wrapper.py b/fancy_gym/meta/goal_change_mp_wrapper.py
index a8eabb5..41cd9be 100644
--- a/fancy_gym/meta/goal_change_mp_wrapper.py
+++ b/fancy_gym/meta/goal_change_mp_wrapper.py
@@ -9,19 +9,6 @@ class MPWrapper(BaseMetaworldMPWrapper):
and no secondary objects or end effectors are altered at the start of an episode.
You can verify this by executing the code below for your environment id and check if the output is non-zero
at the same indices.
- ```python
- import fancy_gym
- env = fancy_gym.make(env_id, 1)
- print(env.reset() - env.reset())
- array([ 0. , 0. , 0. , 0. , 0,
- 0 , 0 , 0. , 0. , 0. ,
- 0. , 0. , 0. , 0. , 0. ,
- 0. , 0. , 0. , 0. , 0. ,
- 0. , 0. , 0 , 0 , 0 ,
- 0. , 0. , 0. , 0. , 0. ,
- 0. , 0. , 0. , 0. , 0. ,
- 0. , !=0 , !=0 , !=0])
- ```
"""
@property
diff --git a/fancy_gym/meta/goal_endeffector_change_mp_wrapper.py b/fancy_gym/meta/goal_endeffector_change_mp_wrapper.py
index c299597..ec89702 100644
--- a/fancy_gym/meta/goal_endeffector_change_mp_wrapper.py
+++ b/fancy_gym/meta/goal_endeffector_change_mp_wrapper.py
@@ -9,19 +9,6 @@ class MPWrapper(BaseMetaworldMPWrapper):
and no secondary objects or end effectors are altered at the start of an episode.
You can verify this by executing the code below for your environment id and check if the output is non-zero
at the same indices.
- ```python
- import fancy_gym
- env = fancy_gym.make(env_id, 1)
- print(env.reset() - env.reset())
- array([ !=0 , !=0 , !=0 , 0. , 0.,
- 0. , 0. , 0. , 0. , 0. ,
- 0. , 0. , 0. , 0. , 0. ,
- 0. , 0. , 0. , !=0 , !=0 ,
- !=0 , 0. , 0. , 0. , 0. ,
- 0. , 0. , 0. , 0. , 0. ,
- 0. , 0. , 0. , 0. , 0. ,
- 0. , !=0 , !=0 , !=0])
- ```
"""
@property
diff --git a/fancy_gym/meta/goal_object_change_mp_wrapper.py b/fancy_gym/meta/goal_object_change_mp_wrapper.py
index ae667a6..b42f142 100644
--- a/fancy_gym/meta/goal_object_change_mp_wrapper.py
+++ b/fancy_gym/meta/goal_object_change_mp_wrapper.py
@@ -9,19 +9,6 @@ class MPWrapper(BaseMetaworldMPWrapper):
and no secondary objects or end effectors are altered at the start of an episode.
You can verify this by executing the code below for your environment id and check if the output is non-zero
at the same indices.
- ```python
- import fancy_gym
- env = fancy_gym.make(env_id, 1)
- print(env.reset() - env.reset())
- array([ 0. , 0. , 0. , 0. , !=0,
- !=0 , !=0 , 0. , 0. , 0. ,
- 0. , 0. , 0. , 0. , 0. ,
- 0. , 0. , 0. , 0. , 0. ,
- 0. , 0. , !=0 , !=0 , !=0 ,
- 0. , 0. , 0. , 0. , 0. ,
- 0. , 0. , 0. , 0. , 0. ,
- 0. , !=0 , !=0 , !=0])
- ```
"""
@property
diff --git a/fancy_gym/meta/metaworld_adapter.py b/fancy_gym/meta/metaworld_adapter.py
new file mode 100644
index 0000000..21dfed7
--- /dev/null
+++ b/fancy_gym/meta/metaworld_adapter.py
@@ -0,0 +1,97 @@
+import random
+from typing import Iterable, Type, Union, Optional
+
+import numpy as np
+from gymnasium import register as gym_register
+
+import uuid
+
+import gymnasium as gym
+import numpy as np
+
+from fancy_gym.utils.env_compatibility import EnvCompatibility
+
+try:
+ import metaworld
+except Exception:
+ print('[FANCY GYM] Metaworld not available')
+
+
+class FixMetaworldHasIncorrectObsSpaceWrapper(gym.Wrapper, gym.utils.RecordConstructorArgs):
+ def __init__(self, env: gym.Env):
+ gym.utils.RecordConstructorArgs.__init__(self)
+ gym.Wrapper.__init__(self, env)
+
+ eos = env.observation_space
+ eas = env.action_space
+
+ Obs_Space_Class = getattr(gym.spaces, str(eos.__class__).split("'")[1].split('.')[-1])
+ Act_Space_Class = getattr(gym.spaces, str(eas.__class__).split("'")[1].split('.')[-1])
+
+ self.observation_space = Obs_Space_Class(low=eos.low-np.inf, high=eos.high+np.inf, dtype=eos.dtype)
+ self.action_space = Act_Space_Class(low=eas.low, high=eas.high, dtype=eas.dtype)
+
+
+class FixMetaworldIncorrectResetPathLengthWrapper(gym.Wrapper, gym.utils.RecordConstructorArgs):
+ def __init__(self, env: gym.Env):
+ gym.utils.RecordConstructorArgs.__init__(self)
+ gym.Wrapper.__init__(self, env)
+
+ def reset(self, **kwargs):
+ ret = self.env.reset(**kwargs)
+ head = self.env
+ try:
+ for i in range(16):
+ head.curr_path_length = 0
+ head = head.env
+ except AttributeError:
+ pass
+ return ret
+
+
+class FixMetaworldIgnoresSeedOnResetWrapper(gym.Wrapper, gym.utils.RecordConstructorArgs):
+ def __init__(self, env: gym.Env):
+ gym.utils.RecordConstructorArgs.__init__(self)
+ gym.Wrapper.__init__(self, env)
+
+ def reset(self, **kwargs):
+ if 'seed' in kwargs:
+ print('[!] You just called .reset on a Metaworld env and supplied a seed. Metaworld currently does not correctly implement seeding. Do not rely on deterministic behavior.')
+ self.env.seed(kwargs['seed'])
+ return self.env.reset(**kwargs)
+
+
+def make_metaworld(underlying_id: str, seed: int = 1, render_mode: Optional[str] = None, **kwargs):
+ if underlying_id not in metaworld.ML1.ENV_NAMES:
+ raise ValueError(f'Specified environment "{underlying_id}" not present in metaworld ML1.')
+
+ env = metaworld.envs.ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE[underlying_id + "-goal-observable"](seed=seed, **kwargs)
+
+ # setting this avoids generating the same initialization after each reset
+ env._freeze_rand_vec = False
+ # New argument to use global seeding
+ env.seeded_rand_vec = True
+
+ # TODO remove, when this has been fixed upstream
+ env = FixMetaworldHasIncorrectObsSpaceWrapper(env)
+ # TODO remove, when this has been fixed upstream
+ # env = FixMetaworldIncorrectResetPathLengthWrapper(env)
+ # TODO remove, when this has been fixed upstream
+ env = FixMetaworldIgnoresSeedOnResetWrapper(env)
+ return env
+
+
+def register_all_ML1(**kwargs):
+ for env_id in metaworld.ML1.ENV_NAMES:
+ _env = metaworld.envs.ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE[env_id + "-goal-observable"](seed=0)
+ max_episode_steps = _env.max_path_length
+
+ gym_register(
+ id='metaworld/'+env_id,
+ entry_point=make_metaworld,
+ max_episode_steps=max_episode_steps,
+ kwargs={
+ 'underlying_id': env_id
+ },
+ **kwargs
+ )
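Note on the seeding wrapper: `FixMetaworldIgnoresSeedOnResetWrapper` forwards a `seed` passed to `reset` to the wrapped environment's `seed` method. The pattern can be sketched without Gymnasium or Metaworld installed. `FakeEnv` and `SeedOnResetWrapper` below are stand-ins invented for this illustration, and the sketch assumes the warning should be raised only when a seed is actually supplied.

```python
import warnings


class FakeEnv:
    """Stand-in for a Metaworld env; not part of fancy_gym or metaworld."""

    def __init__(self):
        self.last_seed = None

    def seed(self, seed):
        self.last_seed = seed

    def reset(self, **kwargs):
        # Gymnasium-style return value: (observation, info).
        return [0.0, 0.0], {}


class SeedOnResetWrapper:
    """Forward a `seed` keyword given to reset() to the wrapped env's seed()."""

    def __init__(self, env):
        self.env = env

    def reset(self, **kwargs):
        if 'seed' in kwargs:
            warnings.warn('Seeding is not reliably supported; do not rely on deterministic behavior.')
            self.env.seed(kwargs['seed'])
        return self.env.reset(**kwargs)


env = SeedOnResetWrapper(FakeEnv())
obs, info = env.reset()  # no seed supplied, so no warning is raised

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    env.reset(seed=42)  # seed is forwarded and a single warning is recorded
```

Using `warnings.warn` instead of `print` lets callers filter or escalate the message through the standard `warnings` machinery.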
diff --git a/fancy_gym/open_ai/README.MD b/fancy_gym/open_ai/README.MD
index 62d1f20..1db09ff 100644
--- a/fancy_gym/open_ai/README.MD
+++ b/fancy_gym/open_ai/README.MD
@@ -4,11 +4,12 @@ These are the Environment Wrappers for selected [OpenAI Gym](https://gym.openai.
the Motion Primitive gym interface for them.
## MP Environments
+
 These environments are wrapped versions of their OpenAI-gym counterparts.
-|Name| Description|Trajectory Horizon|Action Dimension|Context Dimension
-|---|---|---|---|---|
-|`ContinuousMountainCarProMP-v0`| A ProMP wrapped version of the ContinuousMountainCar-v0 environment. | 100 | 1
-|`ReacherProMP-v2`| A ProMP wrapped version of the Reacher-v2 environment. | 50 | 2
-|`FetchSlideDenseProMP-v1`| A ProMP wrapped version of the FetchSlideDense-v1 environment. | 50 | 4
-|`FetchReachDenseProMP-v1`| A ProMP wrapped version of the FetchReachDense-v1 environment. | 50 | 4
+| Name | Description | Trajectory Horizon | Action Dimension |
+| ------------------------------------ | -------------------------------------------------------------------- | ------------------ | ---------------- |
+| `gym_ProMP/ContinuousMountainCar-v0` | A ProMP wrapped version of the ContinuousMountainCar-v0 environment. | 100 | 1 |
+| `gym_ProMP/Reacher-v2` | A ProMP wrapped version of the Reacher-v2 environment. | 50 | 2 |
+| `gym_ProMP/FetchSlideDense-v1` | A ProMP wrapped version of the FetchSlideDense-v1 environment. | 50 | 4 |
+| `gym_ProMP/FetchReachDense-v1` | A ProMP wrapped version of the FetchReachDense-v1 environment. | 50 | 4 |
diff --git a/fancy_gym/open_ai/__init__.py b/fancy_gym/open_ai/__init__.py
index ca87c84..c8422d2 100644
--- a/fancy_gym/open_ai/__init__.py
+++ b/fancy_gym/open_ai/__init__.py
@@ -1,45 +1,16 @@
from copy import deepcopy
-from gym import register
+from ..envs.registry import register, upgrade
from . import mujoco
from .deprecated_needs_gym_robotics import robotics
-ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS = {"DMP": [], "ProMP": [], "ProDMP": []}
-
-DEFAULT_BB_DICT_ProMP = {
- "name": 'EnvName',
- "wrappers": [],
- "trajectory_generator_kwargs": {
- 'trajectory_generator_type': 'promp'
- },
- "phase_generator_kwargs": {
- 'phase_generator_type': 'linear'
- },
- "controller_kwargs": {
- 'controller_type': 'motor',
- "p_gains": 1.0,
- "d_gains": 0.1,
- },
- "basis_generator_kwargs": {
- 'basis_generator_type': 'zero_rbf',
- 'num_basis': 5,
- 'num_basis_zero_start': 1
- }
-}
-
-kwargs_dict_reacher_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
-kwargs_dict_reacher_promp['controller_kwargs']['p_gains'] = 0.6
-kwargs_dict_reacher_promp['controller_kwargs']['d_gains'] = 0.075
-kwargs_dict_reacher_promp['basis_generator_kwargs']['num_basis'] = 6
-kwargs_dict_reacher_promp['name'] = "Reacher-v2"
-kwargs_dict_reacher_promp['wrappers'].append(mujoco.reacher_v2.MPWrapper)
-register(
- id='ReacherProMP-v2',
- entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
- kwargs=kwargs_dict_reacher_promp
+upgrade(
+ id='Reacher-v2',
+ mp_wrapper=mujoco.reacher_v2.MPWrapper,
+ add_mp_types=['ProMP'],
)
-ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append("ReacherProMP-v2")
+
"""
The Fetch environments are not supported by gym anymore. A new repository (gym_robotics) is supporting the environments.
However, the usage and so on needs to be checked
diff --git a/fancy_gym/open_ai/mujoco/reacher_v2/mp_wrapper.py b/fancy_gym/open_ai/mujoco/reacher_v2/mp_wrapper.py
index b2fa04c..3000353 100644
--- a/fancy_gym/open_ai/mujoco/reacher_v2/mp_wrapper.py
+++ b/fancy_gym/open_ai/mujoco/reacher_v2/mp_wrapper.py
@@ -6,6 +6,28 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper):
+ mp_config = {
+ 'ProMP': {
+ "trajectory_generator_kwargs": {
+ 'trajectory_generator_type': 'promp'
+ },
+ "phase_generator_kwargs": {
+ 'phase_generator_type': 'linear'
+ },
+ "controller_kwargs": {
+ 'controller_type': 'motor',
+ "p_gains": 0.6,
+ "d_gains": 0.075,
+ },
+ "basis_generator_kwargs": {
+ 'basis_generator_type': 'zero_rbf',
+ 'num_basis': 6,
+ 'num_basis_zero_start': 1
+ }
+ },
+ 'DMP': {},
+ 'ProDMP': {},
+ }
@property
def current_vel(self) -> Union[float, int, np.ndarray]:
diff --git a/fancy_gym/utils/env_compatibility.py b/fancy_gym/utils/env_compatibility.py
new file mode 100644
index 0000000..a278451
--- /dev/null
+++ b/fancy_gym/utils/env_compatibility.py
@@ -0,0 +1,11 @@
+import gymnasium as gym
+
+
+class EnvCompatibility(gym.wrappers.EnvCompatibility):
+ def __getattr__(self, item):
+ """Propagate only non-existent properties to wrapped env."""
+ if item.startswith('_'):
+ raise AttributeError("attempted to get missing private attribute '{}'".format(item))
+ if item in self.__dict__:
+ return getattr(self, item)
+ return getattr(self.env, item)
diff --git a/fancy_gym/utils/make_env_helpers.py b/fancy_gym/utils/make_env_helpers.py
index 2e04d71..cebf7aa 100644
--- a/fancy_gym/utils/make_env_helpers.py
+++ b/fancy_gym/utils/make_env_helpers.py
@@ -1,17 +1,27 @@
-import logging
-import re
+from fancy_gym.utils.wrappers import TimeAwareObservation
+from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
+from fancy_gym.black_box.factory.trajectory_generator_factory import get_trajectory_generator
+from fancy_gym.black_box.factory.phase_generator_factory import get_phase_generator
+from fancy_gym.black_box.factory.controller_factory import get_controller
+from fancy_gym.black_box.factory.basis_generator_factory import get_basis_generator
+from fancy_gym.black_box.black_box_wrapper import BlackBoxWrapper
import uuid
from collections.abc import MutableMapping
-from copy import deepcopy
from math import ceil
-from typing import Iterable, Type, Union
+from typing import Iterable, Type, Union, Optional
-import gym
+import gymnasium as gym
+from gymnasium import make
import numpy as np
-from gym.envs.registration import register, registry
+from gymnasium.envs.registration import register, registry
+from gymnasium.wrappers import TimeLimit
+
+from fancy_gym.utils.env_compatibility import EnvCompatibility
+from fancy_gym.utils.wrappers import FlattenObservation
try:
- from dm_control import suite, manipulation
+ import shimmy
+ from shimmy.dm_control_compatibility import EnvType
except ImportError:
pass
@@ -21,111 +31,44 @@ except Exception:
# catch Exception as Import error does not catch missing mujoco-py
pass
-import fancy_gym
-from fancy_gym.black_box.black_box_wrapper import BlackBoxWrapper
-from fancy_gym.black_box.factory.basis_generator_factory import get_basis_generator
-from fancy_gym.black_box.factory.controller_factory import get_controller
-from fancy_gym.black_box.factory.phase_generator_factory import get_phase_generator
-from fancy_gym.black_box.factory.trajectory_generator_factory import get_trajectory_generator
-from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
-from fancy_gym.utils.time_aware_observation import TimeAwareObservation
-from fancy_gym.utils.utils import nested_update
-
-def make_rank(env_id: str, seed: int, rank: int = 0, return_callable=True, **kwargs):
- """
- TODO: Do we need this?
- Generate a callable to create a new gym environment with a given seed.
- The rank is added to the seed and can be used for example when using vector environments.
- E.g. [make_rank("my_env_name-v0", 123, i) for i in range(8)] creates a list of 8 environments
- with seeds 123 through 130.
- Hence, testing environments should be seeded with a value which is offset by the number of training environments.
- Here e.g. [make_rank("my_env_name-v0", 123 + 8, i) for i in range(5)] for 5 testing environmetns
-
- Args:
- env_id: name of the environment
- seed: seed for deterministic behaviour
- rank: environment rank for deterministic over multiple seeds behaviour
- return_callable: If True returns a callable to create the environment instead of the environment itself.
-
- Returns:
-
- """
-
- def f():
- return make(env_id, seed + rank, **kwargs)
-
- return f if return_callable else f()
-
-
-def make(env_id: str, seed: int, **kwargs):
- """
- Converts an env_id to an environment with the gym API.
- This also works for DeepMind Control Suite environments that are wrapped using the DMCWrapper, they can be
- specified with "dmc:domain_name-task_name"
- Analogously, metaworld tasks can be created as "metaworld:env_id-v2".
-
- Args:
- env_id: spec or env_id for gym tasks, external environments require a domain specification
- **kwargs: Additional kwargs for the constructor such as pixel observations, etc.
-
- Returns: Gym environment
-
- """
-
- if ':' in env_id:
- split_id = env_id.split(':')
- framework, env_id = split_id[-2:]
- else:
- framework = None
-
- if framework == 'metaworld':
- # MetaWorld environment
- env = make_metaworld(env_id, seed, **kwargs)
- elif framework == 'dmc':
- # DeepMind Control environment
- env = make_dmc(env_id, seed, **kwargs)
- else:
- env = make_gym(env_id, seed, **kwargs)
-
- env.seed(seed)
- env.action_space.seed(seed)
- env.observation_space.seed(seed)
-
- return env
-
-
-def _make_wrapped_env(env_id: str, wrappers: Iterable[Type[gym.Wrapper]], seed=1, **kwargs):
+def _make_wrapped_env(env: gym.Env, wrappers: Iterable[Type[gym.Wrapper]], seed=1, fallback_max_steps=None):
"""
Helper function for creating a wrapped gym environment using MPs.
It adds all provided wrappers to the specified environment and verifies at least one RawInterfaceWrapper is
provided to expose the interface for MPs.
Args:
- env_id: name of the environment
+ env: base environment to wrap
 wrappers: list of wrappers (at least a RawInterfaceWrapper),
seed: seed of environment
Returns: gym environment with all specified wrappers applied
"""
- # _env = gym.make(env_id)
- _env = make(env_id, seed, **kwargs)
+ if fallback_max_steps:
+ env = ensure_finite_time(env, fallback_max_steps)
has_black_box_wrapper = False
+ head = env
+ while hasattr(head, 'env'):
+ if isinstance(head, RawInterfaceWrapper):
+ has_black_box_wrapper = True
+ break
+ head = head.env
for w in wrappers:
# only wrap the environment if not BlackBoxWrapper, e.g. for vision
if issubclass(w, RawInterfaceWrapper):
has_black_box_wrapper = True
- _env = w(_env)
+ env = w(env)
if not has_black_box_wrapper:
raise ValueError("A RawInterfaceWrapper is required in order to leverage movement primitive environments.")
- return _env
+ return env
def make_bb(
- env_id: str, wrappers: Iterable, black_box_kwargs: MutableMapping, traj_gen_kwargs: MutableMapping,
- controller_kwargs: MutableMapping, phase_kwargs: MutableMapping, basis_kwargs: MutableMapping, seed: int = 1,
- **kwargs):
+ env: Union[gym.Env, str], wrappers: Iterable, black_box_kwargs: MutableMapping, traj_gen_kwargs: MutableMapping,
+ controller_kwargs: MutableMapping, phase_kwargs: MutableMapping, basis_kwargs: MutableMapping,
+ time_limit: int = None, fallback_max_steps: int = None, **kwargs):
"""
This can also be used standalone for manually building a custom DMP environment.
Args:
@@ -133,7 +76,7 @@ def make_bb(
basis_kwargs: kwargs for the basis generator
phase_kwargs: kwargs for the phase generator
controller_kwargs: kwargs for the tracking controller
- env_id: base_env_name,
+ env: step based environment (or environment id),
 wrappers: list of wrappers (at least a RawInterfaceWrapper),
seed: seed of environment
traj_gen_kwargs: dict of at least {num_dof: int, num_basis: int} for DMP
@@ -141,7 +84,7 @@ def make_bb(
Returns: DMP wrapped gym env
"""
- _verify_time_limit(traj_gen_kwargs.get("duration"), kwargs.get("time_limit"))
+ _verify_time_limit(traj_gen_kwargs.get("duration"), time_limit)
learn_sub_trajs = black_box_kwargs.get('learn_sub_trajectories')
do_replanning = black_box_kwargs.get('replanning_schedule')
@@ -153,12 +96,19 @@ def make_bb(
# Add as first wrapper in order to alter observation
wrappers.insert(0, TimeAwareObservation)
- env = _make_wrapped_env(env_id=env_id, wrappers=wrappers, seed=seed, **kwargs)
+ if isinstance(env, str):
+ env = make(env, **kwargs)
+
+ env = _make_wrapped_env(env=env, wrappers=wrappers, fallback_max_steps=fallback_max_steps)
+
+ # BB expects a spaces.Box to be exposed, need to convert for dict-observations
+ if type(env.observation_space) == gym.spaces.dict.Dict:
+ env = FlattenObservation(env)
traj_gen_kwargs['action_dim'] = traj_gen_kwargs.get('action_dim', np.prod(env.action_space.shape).item())
if black_box_kwargs.get('duration') is None:
- black_box_kwargs['duration'] = env.spec.max_episode_steps * env.dt
+ black_box_kwargs['duration'] = get_env_duration(env)
if phase_kwargs.get('tau') is None:
phase_kwargs['tau'] = black_box_kwargs['duration']
@@ -186,156 +136,27 @@ def make_bb(
return bb_env
-def make_bb_env_helper(**kwargs):
- """
- Helper function for registering a black box gym environment.
- Args:
- **kwargs: expects at least the following:
- {
- "name": base environment name.
- "wrappers": list of wrappers (at least an BlackBoxWrapper is required),
- "traj_gen_kwargs": {
- "trajectory_generator_type": type_of_your_movement_primitive,
- non default arguments for the movement primitive instance
- ...
- }
- "controller_kwargs": {
- "controller_type": type_of_your_controller,
- non default arguments for the tracking_controller instance
- ...
- },
- "basis_generator_kwargs": {
- "basis_generator_type": type_of_your_basis_generator,
- non default arguments for the basis generator instance
- ...
- },
- "phase_generator_kwargs": {
- "phase_generator_type": type_of_your_phase_generator,
- non default arguments for the phase generator instance
- ...
- },
- }
-
- Returns: MP wrapped gym env
-
- """
- seed = kwargs.pop("seed", None)
- wrappers = kwargs.pop("wrappers")
-
- traj_gen_kwargs = kwargs.pop("trajectory_generator_kwargs", {})
- black_box_kwargs = kwargs.pop('black_box_kwargs', {})
- contr_kwargs = kwargs.pop("controller_kwargs", {})
- phase_kwargs = kwargs.pop("phase_generator_kwargs", {})
- basis_kwargs = kwargs.pop("basis_generator_kwargs", {})
-
- return make_bb(env_id=kwargs.pop("name"), wrappers=wrappers,
- black_box_kwargs=black_box_kwargs,
- traj_gen_kwargs=traj_gen_kwargs, controller_kwargs=contr_kwargs,
- phase_kwargs=phase_kwargs,
- basis_kwargs=basis_kwargs, **kwargs, seed=seed)
-
-
-def make_dmc(
- env_id: str,
- seed: int = None,
- visualize_reward: bool = True,
- time_limit: Union[None, float] = None,
- **kwargs
-):
- if not re.match(r"\w+-\w+", env_id):
- raise ValueError("env_id does not have the following structure: 'domain_name-task_name'")
- domain_name, task_name = env_id.split("-")
-
- if task_name.endswith("_vision"):
- # TODO
- raise ValueError("The vision interface for manipulation tasks is currently not supported.")
-
- if (domain_name, task_name) not in suite.ALL_TASKS and task_name not in manipulation.ALL:
- raise ValueError(f'Specified domain "{domain_name}" and task "{task_name}" combination does not exist.')
-
- # env_id = f'dmc_{domain_name}_{task_name}_{seed}-v1'
- gym_id = uuid.uuid4().hex + '-v1'
-
- task_kwargs = {'random': seed}
- if time_limit is not None:
- task_kwargs['time_limit'] = time_limit
-
- # create task
- # Accessing private attribute because DMC does not expose time_limit or step_limit.
- # Only the current time_step/time as well as the control_timestep can be accessed.
- if domain_name == "manipulation":
- env = manipulation.load(environment_name=task_name, seed=seed)
- max_episode_steps = ceil(env._time_limit / env.control_timestep())
- else:
- env = suite.load(domain_name=domain_name, task_name=task_name, task_kwargs=task_kwargs,
- visualize_reward=visualize_reward, environment_kwargs=kwargs)
- max_episode_steps = int(env._step_limit)
-
- register(
- id=gym_id,
- entry_point='fancy_gym.dmc.dmc_wrapper:DMCWrapper',
- kwargs={'env': lambda: env},
- max_episode_steps=max_episode_steps,
- )
-
- env = gym.make(gym_id)
- env.seed(seed)
+def ensure_finite_time(env: gym.Env, fallback_max_steps=500):
+ cur_limit = env.spec.max_episode_steps
+ if not cur_limit:
+ if hasattr(env.unwrapped, 'max_path_length'):
+ return TimeLimit(env, env.unwrapped.max_path_length)
+ return TimeLimit(env, fallback_max_steps)
return env
-def make_metaworld(env_id: str, seed: int, **kwargs):
- if env_id not in metaworld.ML1.ENV_NAMES:
- raise ValueError(f'Specified environment "{env_id}" not present in metaworld ML1.')
-
- _env = metaworld.envs.ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE[env_id + "-goal-observable"](seed=seed, **kwargs)
-
- # setting this avoids generating the same initialization after each reset
- _env._freeze_rand_vec = False
- # New argument to use global seeding
- _env.seeded_rand_vec = True
-
- gym_id = uuid.uuid4().hex + '-v1'
-
- register(
- id=gym_id,
- entry_point=lambda: _env,
- max_episode_steps=_env.max_path_length,
- )
-
- # TODO enable checker when the incorrect dtype of obs and observation space are fixed by metaworld
- env = gym.make(gym_id, disable_env_checker=True)
- return env
-
-
-def make_gym(env_id, seed, **kwargs):
- """
- Create
- Args:
- env_id:
- seed:
- **kwargs:
-
- Returns:
-
- """
- # Getting the existing keywords to allow for nested dict updates for BB envs
- # gym only allows for non nested updates.
+def get_env_duration(env: gym.Env):
try:
- all_kwargs = deepcopy(registry.get(env_id).kwargs)
- except AttributeError as e:
- logging.error(f'The gym environment with id {env_id} could not been found.')
- raise e
- nested_update(all_kwargs, kwargs)
- kwargs = all_kwargs
-
- # Add seed to kwargs for bb environments to pass seed to step environments
- all_bb_envs = sum(fancy_gym.ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS.values(), [])
- if env_id in all_bb_envs:
- kwargs.update({"seed": seed})
-
- # Gym
- env = gym.make(env_id, **kwargs)
- return env
+ duration = env.spec.max_episode_steps * env.dt
+ except (AttributeError, TypeError) as e:
+ if env.env_type is EnvType.COMPOSER:
+ max_episode_steps = ceil(env.unwrapped._time_limit / env.dt)
+ elif env.env_type is EnvType.RL_CONTROL:
+ max_episode_steps = int(env.unwrapped._step_limit)
+ else:
+ raise e
+ duration = max_episode_steps * env.control_timestep()
+ return duration
def _verify_time_limit(mp_time_limit: Union[None, float], env_time_limit: Union[None, float]):
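Note on `ensure_finite_time` and `get_env_duration`: the step limit is taken from the environment spec if present, then from a `max_path_length` attribute, and finally from `fallback_max_steps`; the duration is the step limit multiplied by the time step. The sketch below restates that order with plain values so it runs without Gymnasium. `resolve_max_steps` and `episode_duration` are hypothetical names used only here, and the numbers are illustrative.

```python
from typing import Optional


def resolve_max_steps(spec_limit: Optional[int],
                      max_path_length: Optional[int],
                      fallback_max_steps: int = 500) -> int:
    """Pick a finite step limit in the same order as ensure_finite_time."""
    if spec_limit:
        # The registered spec already defines a limit.
        return spec_limit
    if max_path_length is not None:
        # Metaworld-style envs expose their horizon as max_path_length.
        return max_path_length
    return fallback_max_steps


def episode_duration(max_steps: int, dt: float) -> float:
    """Episode duration in seconds: number of steps times the control time step."""
    return max_steps * dt


steps = resolve_max_steps(spec_limit=None, max_path_length=150)
duration = episode_duration(steps, dt=0.02)
```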
diff --git a/fancy_gym/utils/time_aware_observation.py b/fancy_gym/utils/time_aware_observation.py
deleted file mode 100644
index b2cbc78..0000000
--- a/fancy_gym/utils/time_aware_observation.py
+++ /dev/null
@@ -1,78 +0,0 @@
-"""
-Adapted from: https://github.com/openai/gym/blob/907b1b20dd9ac0cba5803225059b9c6673702467/gym/wrappers/time_aware_observation.py
-License: MIT
-Copyright (c) 2016 OpenAI (https://openai.com)
-
-Wrapper for adding time aware observations to environment observation.
-"""
-import gym
-import numpy as np
-from gym.spaces import Box
-
-
-class TimeAwareObservation(gym.ObservationWrapper):
- """Augment the observation with the current time step in the episode.
-
- The observation space of the wrapped environment is assumed to be a flat :class:`Box`.
- In particular, pixel observations are not supported. This wrapper will append the current timestep
- within the current episode to the observation.
-
- Example:
- >>> import gym
- >>> env = gym.make('CartPole-v1')
- >>> env = TimeAwareObservation(env)
- >>> env.reset()
- array([ 0.03810719, 0.03522411, 0.02231044, -0.01088205, 0. ])
- >>> env.step(env.action_space.sample())[0]
- array([ 0.03881167, -0.16021058, 0.0220928 , 0.28875574, 1. ])
- """
-
- def __init__(self, env: gym.Env):
- """Initialize :class:`TimeAwareObservation` that requires an environment with a flat :class:`Box`
- observation space.
-
- Args:
- env: The environment to apply the wrapper
- """
- super().__init__(env)
- assert isinstance(env.observation_space, Box)
- low = np.append(self.observation_space.low, 0.0)
- high = np.append(self.observation_space.high, 1.0)
- self.observation_space = Box(low, high, dtype=self.observation_space.dtype)
- self.t = 0
- self._max_episode_steps = env.spec.max_episode_steps
-
- def observation(self, observation):
- """Adds to the observation with the current time step normalized with max steps.
-
- Args:
- observation: The observation to add the time step to
-
- Returns:
- The observation with the time step appended to
- """
- return np.append(observation, self.t / self._max_episode_steps)
-
- def step(self, action):
- """Steps through the environment, incrementing the time step.
-
- Args:
- action: The action to take
-
- Returns:
- The environment's step using the action.
- """
- self.t += 1
- return super().step(action)
-
- def reset(self, **kwargs):
- """Reset the environment setting the time to zero.
-
- Args:
- **kwargs: Kwargs to apply to env.reset()
-
- Returns:
- The reset environment
- """
- self.t = 0
- return super().reset(**kwargs)
diff --git a/fancy_gym/utils/wrappers.py b/fancy_gym/utils/wrappers.py
new file mode 100644
index 0000000..7526269
--- /dev/null
+++ b/fancy_gym/utils/wrappers.py
@@ -0,0 +1,130 @@
+from gymnasium.spaces import Box, Dict, flatten, flatten_space
+try:
+ from gym.spaces import Box as OldBox
+except ImportError:
+ OldBox = None
+import gymnasium as gym
+import numpy as np
+import copy
+
+
+class TimeAwareObservation(gym.ObservationWrapper, gym.utils.RecordConstructorArgs):
+ """Augment the observation with the current time step in the episode.
+
+ The observation space of the wrapped environment is assumed to be a flat :class:`Box` or flattable :class:`Dict`.
+ In particular, pixel observations are not supported. This wrapper will append the current progress within the current episode to the observation.
+ The progress will be indicated as a number between 0 and 1.
+ """
+
+ def __init__(self, env: gym.Env, enforce_dtype_float32=False):
+ """Initialize :class:`TimeAwareObservation` that requires an environment with a flat :class:`Box` or flattable :class:`Dict` observation space.
+
+ Args:
+ env: The environment to apply the wrapper
+ """
+ gym.utils.RecordConstructorArgs.__init__(self)
+ gym.ObservationWrapper.__init__(self, env)
+ allowed_classes = [Box, OldBox, Dict]
+ if enforce_dtype_float32:
+ assert env.observation_space.dtype == np.float32, 'TimeAwareObservation was given an environment with a dtype!=np.float32 ('+str(
+ env.observation_space.dtype)+'). This requirement can be removed by setting enforce_dtype_float32=False.'
+ assert env.observation_space.__class__ in allowed_classes, str(env.observation_space)+' is not supported. Only Box or Dict'
+
+ if env.observation_space.__class__ in [Box, OldBox]:
+ dtype = env.observation_space.dtype
+
+ low = np.append(env.observation_space.low, 0.0)
+ high = np.append(env.observation_space.high, 1.0)
+
+ self.observation_space = Box(low, high, dtype=dtype)
+ else:
+ spaces = copy.copy(env.observation_space.spaces)
+ dtype = np.float64
+ spaces['time_awareness'] = Box(0, 1, dtype=dtype)
+
+ self.observation_space = Dict(spaces)
+
+ self.is_vector_env = getattr(env, "is_vector_env", False)
+
+ def observation(self, observation):
+ """Appends the current time step to the observation.
+
+ Args:
+ observation: The observation to add the time step to
+
+ Returns:
+ The observation with the time step appended (relative to the total number of steps)
+ """
+ if self.observation_space.__class__ in [Box, OldBox]:
+ return np.append(observation, self.t / self.env.spec.max_episode_steps)
+ else:
+ obs = copy.copy(observation)
+ obs['time_awareness'] = self.t / self.env.spec.max_episode_steps
+ return obs
+
+ def step(self, action):
+ """Steps through the environment, incrementing the time step.
+
+ Args:
+ action: The action to take
+
+ Returns:
+ The environment's step using the action.
+ """
+ self.t += 1
+ return super().step(action)
+
+ def reset(self, **kwargs):
+ """Reset the environment setting the time to zero.
+
+ Args:
+ **kwargs: Kwargs to apply to env.reset()
+
+ Returns:
+ The reset environment
+ """
+ self.t = 0
+ return super().reset(**kwargs)
+
+
+class FlattenObservation(gym.ObservationWrapper, gym.utils.RecordConstructorArgs):
+ """Observation wrapper that flattens the observation.
+
+ Example:
+ >>> import gymnasium as gym
+ >>> from gymnasium.wrappers import FlattenObservation
+ >>> env = gym.make("CarRacing-v2")
+ >>> env.observation_space.shape
+ (96, 96, 3)
+ >>> env = FlattenObservation(env)
+ >>> env.observation_space.shape
+ (27648,)
+ >>> obs, _ = env.reset()
+ >>> obs.shape
+ (27648,)
+ """
+
+ def __init__(self, env: gym.Env):
+ """Flattens the observations of an environment.
+
+ Args:
+ env: The environment to apply the wrapper
+ """
+ gym.utils.RecordConstructorArgs.__init__(self)
+ gym.ObservationWrapper.__init__(self, env)
+
+ self.observation_space = flatten_space(env.observation_space)
+
+ def observation(self, observation):
+ """Flattens an observation.
+
+ Args:
+ observation: The observation to flatten
+
+ Returns:
+ The flattened observation
+ """
+ try:
+ return flatten(self.env.observation_space, observation)
+ except Exception:
+ return np.array([flatten(self.env.observation_space, observation[i]) for i in range(len(observation))])
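Note on the `TimeAwareObservation` wrapper above: its core idea is to count steps and append `t / max_episode_steps` to every observation. The following is a minimal plain-Python sketch of that idea, not the fancy_gym or Gymnasium implementation. The class name `TimeAwareSketch` and its list-based observations are assumptions made for illustration only.

```python
# Sketch only: a plain-Python stand-in, not the fancy_gym or Gymnasium wrapper.
from typing import List, Sequence


class TimeAwareSketch:
    """Tracks the step count and appends t / max_episode_steps to flat observations."""

    def __init__(self, max_episode_steps: int):
        if max_episode_steps <= 0:
            raise ValueError("max_episode_steps must be positive")
        self.max_episode_steps = max_episode_steps
        self.t = 0

    def reset(self, observation: Sequence[float]) -> List[float]:
        # Resetting sets the time back to zero, so the appended value is 0.0.
        self.t = 0
        return self.observation(observation)

    def step(self, observation: Sequence[float]) -> List[float]:
        # The counter is incremented before the observation is augmented,
        # mirroring the order used in the wrapper's step() above.
        self.t += 1
        return self.observation(observation)

    def observation(self, observation: Sequence[float]) -> List[float]:
        # Relative time in [0, 1], appended as the last entry.
        return list(observation) + [self.t / self.max_episode_steps]
```

With `max_episode_steps=4`, the appended value runs from 0.0 after a reset to 1.0 on the fourth step, which matches the `[0, 1]` bounds the wrapper adds to the observation space.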
diff --git a/icon.svg b/icon.svg
new file mode 100644
index 0000000..64ec435
--- /dev/null
+++ b/icon.svg
@@ -0,0 +1,101 @@
+
+
+
+
diff --git a/setup.py b/setup.py
index 5993519..1daa568 100644
--- a/setup.py
+++ b/setup.py
@@ -6,33 +6,38 @@ from setuptools import setup, find_packages
# Environment-specific dependencies for dmc and metaworld
extras = {
- "dmc": ["dm_control>=1.0.1"],
- "metaworld": ["metaworld @ git+https://github.com/rlworkgroup/metaworld.git@master#egg=metaworld",
- 'mujoco-py<2.2,>=2.1',
- 'scipy'
- ],
+ 'dmc': ['shimmy[dm-control]', 'Shimmy==1.0.0'],
+ 'metaworld': ['metaworld @ git+https://github.com/Farama-Foundation/Metaworld.git@d155d0051630bb365ea6a824e02c66c068947439#egg=metaworld'],
+ 'box2d': ['gymnasium[box2d]>=0.26.0'],
+ 'mujoco': ['mujoco==2.3.3', 'gymnasium[mujoco]>0.26.0'],
+ 'mujoco-legacy': ['mujoco-py >=2.1,<2.2', 'cython<3'],
+ 'jax': ["jax >=0.4.0", "jaxlib >=0.4.0"],
}
# All dependencies
all_groups = set(extras.keys())
-extras["all"] = list(set(itertools.chain.from_iterable(map(lambda group: extras[group], all_groups))))
+extras["all"] = list(set(itertools.chain.from_iterable(
+ map(lambda group: extras[group], all_groups))))
+
+extras['testing'] = extras["all"] + ['pytest']
def find_package_data(extensions_to_include: List[str]) -> List[str]:
envs_dir = Path("fancy_gym/envs/mujoco")
package_data_paths = []
for extension in extensions_to_include:
- package_data_paths.extend([str(path)[10:] for path in envs_dir.rglob(extension)])
+ package_data_paths.extend([str(path)[10:]
+ for path in envs_dir.rglob(extension)])
return package_data_paths
setup(
- author='Fabian Otto, Onur Celik',
+ author='Fabian Otto, Onur Celik, Dominik Roth, Hongyi Zhou',
name='fancy_gym',
- version='0.2',
+ version='1.0',
classifiers=[
- 'Development Status :: 3 - Alpha',
+ 'Development Status :: 4 - Beta',
'Intended Audience :: Science/Research',
'License :: OSI Approved :: MIT License',
'Natural Language :: English',
@@ -46,10 +51,11 @@ setup(
],
extras_require=extras,
install_requires=[
- 'gym[mujoco]<0.25.0,>=0.24.1',
+ 'gymnasium>=0.26.0',
'mp_pytorch<=0.1.3'
],
- packages=[package for package in find_packages() if package.startswith("fancy_gym")],
+ packages=[package for package in find_packages(
+ ) if package.startswith("fancy_gym")],
package_data={
"fancy_gym": find_package_data(extensions_to_include=["*.stl", "*.xml"])
},
diff --git a/test/test_gym_envs.py b/test/test_all_gym_builtin_envs.py
similarity index 69%
rename from test/test_gym_envs.py
rename to test/test_all_gym_builtin_envs.py
index dae5944..f2eeac6 100644
--- a/test/test_gym_envs.py
+++ b/test/test_all_gym_builtin_envs.py
@@ -1,14 +1,21 @@
+import re
from itertools import chain
+from typing import Callable
-import gym
+import gymnasium as gym
import pytest
import fancy_gym
from test.utils import run_env, run_env_determinism
-GYM_IDS = [spec.id for spec in gym.envs.registry.all() if
- "fancy_gym" not in spec.entry_point and 'make_bb_env_helper' not in spec.entry_point]
-GYM_MP_IDS = chain(*fancy_gym.ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS.values())
+GYM_IDS = [spec.id for spec in gym.envs.registry.values() if
+ not isinstance(spec.entry_point, Callable) and
+ "fancy_gym" not in spec.entry_point and 'make_bb_env_helper' not in spec.entry_point
+ and 'jax' not in spec.id.lower()
+ and not re.match(r'GymV2.Environment', spec.id)
+ ]
+GYM_MP_IDS = fancy_gym.ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS['all']
SEED = 1
diff --git a/test/test_black_box.py b/test/test_black_box.py
index 5ade1ae..8cdc543 100644
--- a/test/test_black_box.py
+++ b/test/test_black_box.py
@@ -1,21 +1,23 @@
from itertools import chain
from typing import Tuple, Type, Union, Optional, Callable
-import gym
+import gymnasium as gym
import numpy as np
import pytest
-from gym import register
-from gym.core import ActType, ObsType
+from gymnasium import register, make
+from gymnasium.core import ActType, ObsType
import fancy_gym
from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
-from fancy_gym.utils.time_aware_observation import TimeAwareObservation
+from fancy_gym.utils.wrappers import TimeAwareObservation
SEED = 1
-ENV_IDS = ['Reacher5d-v0', 'dmc:ball_in_cup-catch', 'metaworld:reach-v2', 'Reacher-v2']
+ENV_IDS = ['fancy/Reacher5d-v0', 'dm_control/ball_in_cup-catch-v0', 'metaworld/reach-v2', 'Reacher-v2']
WRAPPERS = [fancy_gym.envs.mujoco.reacher.MPWrapper, fancy_gym.dmc.suite.ball_in_cup.MPWrapper,
fancy_gym.meta.goal_object_change_mp_wrapper.MPWrapper, fancy_gym.open_ai.mujoco.reacher_v2.MPWrapper]
-ALL_MP_ENVS = chain(*fancy_gym.ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS.values())
+ALL_MP_ENVS = fancy_gym.ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS['all']
+
+MAX_STEPS_FALLBACK = 100
class Object(object):
@@ -32,10 +34,12 @@ class ToyEnv(gym.Env):
def reset(self, *, seed: Optional[int] = None, return_info: bool = False,
options: Optional[dict] = None) -> Union[ObsType, Tuple[ObsType, dict]]:
- return np.array([-1])
+ obs, options = np.array([-1]), {}
+ return obs, options
def step(self, action: ActType) -> Tuple[ObsType, float, bool, dict]:
- return np.array([-1]), 1, False, {}
+ obs, reward, terminated, truncated, info = np.array([-1]), 1, False, False, {}
+ return obs, reward, terminated, truncated, info
def render(self, mode="human"):
pass
@@ -76,7 +80,7 @@ def test_missing_local_state(mp_type: str):
{'controller_type': 'motor'},
{'phase_generator_type': 'exp'},
{'basis_generator_type': basis_generator_type})
- env.reset()
+ env.reset(seed=SEED)
with pytest.raises(NotImplementedError):
env.step(env.action_space.sample())
@@ -93,12 +97,14 @@ def test_verbosity(mp_type: str, env_wrap: Tuple[str, Type[RawInterfaceWrapper]]
{'controller_type': 'motor'},
{'phase_generator_type': 'exp'},
{'basis_generator_type': basis_generator_type})
- env.reset()
- info_keys = list(env.step(env.action_space.sample())[3].keys())
+ env.reset(seed=SEED)
+ _obs, _reward, _terminated, _truncated, info = env.step(env.action_space.sample())
+ info_keys = list(info.keys())
- env_step = fancy_gym.make(env_id, SEED)
+ env_step = make(env_id)
env_step.reset()
- info_keys_step = env_step.step(env_step.action_space.sample())[3].keys()
+ _obs, _reward, _terminated, _truncated, info = env_step.step(env_step.action_space.sample())
+ info_keys_step = info.keys()
assert all(e in info_keys for e in info_keys_step)
assert 'trajectory_length' in info_keys
@@ -118,13 +124,15 @@ def test_length(mp_type: str, env_wrap: Tuple[str, Type[RawInterfaceWrapper]]):
{'trajectory_generator_type': mp_type},
{'controller_type': 'motor'},
{'phase_generator_type': 'exp'},
- {'basis_generator_type': basis_generator_type})
+ {'basis_generator_type': basis_generator_type}, fallback_max_steps=MAX_STEPS_FALLBACK)
- for _ in range(5):
- env.reset()
- length = env.step(env.action_space.sample())[3]['trajectory_length']
+ for i in range(5):
+ env.reset(seed=SEED)
- assert length == env.spec.max_episode_steps
+ _obs, _reward, _terminated, _truncated, info = env.step(env.action_space.sample())
+ length = info['trajectory_length']
+
+ assert length == env.spec.max_episode_steps, f'Expected total simulation length ({length}) to be equal to spec.max_episode_steps ({env.spec.max_episode_steps}), but was not during test nr. {i}'
@pytest.mark.parametrize('mp_type', ['promp', 'dmp', 'prodmp'])
@@ -136,9 +144,10 @@ def test_aggregation(mp_type: str, reward_aggregation: Callable[[np.ndarray], fl
{'controller_type': 'motor'},
{'phase_generator_type': 'exp'},
{'basis_generator_type': basis_generator_type})
- env.reset()
+ env.reset(seed=SEED)
# ToyEnv only returns 1 as reward
- assert env.step(env.action_space.sample())[1] == reward_aggregation(np.ones(50, ))
+ _obs, reward, _terminated, _truncated, _info = env.step(env.action_space.sample())
+ assert reward == reward_aggregation(np.ones(50, ))
@pytest.mark.parametrize('mp_type', ['promp', 'dmp'])
@@ -151,14 +160,16 @@ def test_context_space(mp_type: str, env_wrap: Tuple[str, Type[RawInterfaceWrapp
{'phase_generator_type': 'exp'},
{'basis_generator_type': 'rbf'})
# check if observation space matches with the specified mask values which are true
- env_step = fancy_gym.make(env_id, SEED)
+ env_step = make(env_id)
wrapper = wrapper_class(env_step)
assert env.observation_space.shape == wrapper.context_mask[wrapper.context_mask].shape
@pytest.mark.parametrize('mp_type', ['promp', 'dmp', 'prodmp'])
@pytest.mark.parametrize('num_dof', [0, 1, 2, 5])
-@pytest.mark.parametrize('num_basis', [0, 1, 2, 5])
+@pytest.mark.parametrize('num_basis', [
+ pytest.param(0, marks=pytest.mark.xfail(reason="Basis Length 0 is not yet implemented.")),
+ 1, 2, 5])
@pytest.mark.parametrize('learn_tau', [True, False])
@pytest.mark.parametrize('learn_delay', [True, False])
def test_action_space(mp_type: str, num_dof: int, num_basis: int, learn_tau: bool, learn_delay: bool):
@@ -219,16 +230,18 @@ def test_learn_tau(mp_type: str, tau: float):
'learn_delay': False
},
{'basis_generator_type': basis_generator_type,
- }, seed=SEED)
+ })
- d = True
+ env.reset(seed=SEED)
+ done = True
for i in range(5):
- if d:
- env.reset()
+ if done:
+ env.reset(seed=SEED)
action = env.action_space.sample()
action[0] = tau
- obs, r, d, info = env.step(action)
+ _obs, _reward, terminated, truncated, info = env.step(action)
+ done = terminated or truncated
length = info['trajectory_length']
assert length == env.spec.max_episode_steps
@@ -248,6 +261,8 @@ def test_learn_tau(mp_type: str, tau: float):
assert np.all(vel[:tau_time_steps - 2] != vel[-1])
#
#
+
+
@pytest.mark.parametrize('mp_type', ['promp', 'prodmp'])
@pytest.mark.parametrize('delay', [0, 0.25, 0.5, 0.75])
def test_learn_delay(mp_type: str, delay: float):
@@ -262,16 +277,18 @@ def test_learn_delay(mp_type: str, delay: float):
'learn_delay': True
},
{'basis_generator_type': basis_generator_type,
- }, seed=SEED)
+ })
- d = True
+ env.reset(seed=SEED)
+ done = True
for i in range(5):
- if d:
- env.reset()
+ if done:
+ env.reset(seed=SEED)
action = env.action_space.sample()
action[0] = delay
- obs, r, d, info = env.step(action)
+ _obs, _reward, terminated, truncated, info = env.step(action)
+ done = terminated or truncated
length = info['trajectory_length']
assert length == env.spec.max_episode_steps
@@ -290,6 +307,8 @@ def test_learn_delay(mp_type: str, delay: float):
assert np.all(vel[max(1, delay_time_steps)] != vel[0])
#
#
+
+
@pytest.mark.parametrize('mp_type', ['promp', 'prodmp'])
@pytest.mark.parametrize('tau', [0.25, 0.5, 0.75, 1])
@pytest.mark.parametrize('delay', [0.25, 0.5, 0.75, 1])
@@ -305,20 +324,23 @@ def test_learn_tau_and_delay(mp_type: str, tau: float, delay: float):
'learn_delay': True
},
{'basis_generator_type': basis_generator_type,
- }, seed=SEED)
+ })
+
+ env.reset(seed=SEED)
if env.spec.max_episode_steps * env.dt < delay + tau:
return
- d = True
+ done = True
for i in range(5):
- if d:
- env.reset()
+ if done:
+ env.reset(seed=SEED)
action = env.action_space.sample()
action[0] = tau
action[1] = delay
- obs, r, d, info = env.step(action)
+ _obs, _reward, terminated, truncated, info = env.step(action)
+ done = terminated or truncated
length = info['trajectory_length']
assert length == env.spec.max_episode_steps
@@ -343,4 +365,4 @@ def test_learn_tau_and_delay(mp_type: str, tau: float, delay: float):
active_pos = pos[delay_time_steps: joint_time_steps - 1]
active_vel = vel[delay_time_steps: joint_time_steps - 2]
assert np.all(active_pos != pos[-1]) and np.all(active_pos != pos[0])
- assert np.all(active_vel != vel[-1]) and np.all(active_vel != vel[0])
\ No newline at end of file
+ assert np.all(active_vel != vel[-1]) and np.all(active_vel != vel[0])
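Note on the test changes above: most hunks replace the old four-tuple `obs, r, d, info = env.step(action)` with the five-tuple form and derive `done = terminated or truncated`. The sketch below shows that loop on a plain-Python toy environment. `ToyEnvSketch` and `rollout` are hypothetical names, and folding the step limit into the environment itself is an assumption made to keep the example self-contained (in the tests the limit comes from `max_episode_steps` at registration).

```python
# Sketch only: a toy environment that mimics the five-tuple step API.
from typing import Optional, Tuple


class ToyEnvSketch:
    """Returns a constant observation and reward; truncates after a step limit."""

    def __init__(self, max_episode_steps: int = 50):
        self.max_episode_steps = max_episode_steps
        self.t = 0

    def reset(self, seed: Optional[int] = None) -> Tuple[list, dict]:
        # The seed is accepted for API parity; this toy env is deterministic.
        self.t = 0
        return [-1.0], {}

    def step(self, action: float) -> Tuple[list, float, bool, bool, dict]:
        self.t += 1
        terminated = False  # the toy task never reaches a terminal state
        truncated = self.t >= self.max_episode_steps  # time limit reached
        return [-1.0], 1.0, terminated, truncated, {}


def rollout(env: ToyEnvSketch, seed: int = 1) -> Tuple[int, float]:
    """Runs one episode and returns (number of steps, summed reward)."""
    _obs, _info = env.reset(seed=seed)
    steps, total, done = 0, 0.0, False
    while not done:
        _obs, reward, terminated, truncated, _info = env.step(0.0)
        done = terminated or truncated  # either flag ends the episode
        steps += 1
        total += reward
    return steps, total
```

Keeping `terminated` and `truncated` separate until the last moment lets a caller still tell a time-limit cutoff apart from a true terminal state when that distinction matters.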
diff --git a/test/test_dmc_envs.py b/test/test_dmc_envs.py
index 410f3c1..3602da6 100644
--- a/test/test_dmc_envs.py
+++ b/test/test_dmc_envs.py
@@ -1,39 +1,30 @@
from itertools import chain
+from typing import Callable
+import gymnasium as gym
import pytest
-from dm_control import suite, manipulation
import fancy_gym
from test.utils import run_env, run_env_determinism
-SUITE_IDS = [f'dmc:{env}-{task}' for env, task in suite.ALL_TASKS if env != "lqr"]
-MANIPULATION_IDS = [f'dmc:manipulation-{task}' for task in manipulation.ALL if task.endswith('_features')]
-DMC_MP_IDS = chain(*fancy_gym.ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS.values())
+DMC_IDS = [spec.id for spec in gym.envs.registry.values() if
+ spec.id.startswith('dm_control/')
+ and 'compatibility-env-v0' not in spec.id
+ and 'lqr-lqr' not in spec.id]
+DMC_MP_IDS = fancy_gym.ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS['all']
SEED = 1
-@pytest.mark.parametrize('env_id', SUITE_IDS)
-def test_step_suite_functionality(env_id: str):
+@pytest.mark.parametrize('env_id', DMC_IDS)
+def test_step_dm_control_functionality(env_id: str):
"""Tests that suite step environments run without errors using random actions."""
- run_env(env_id)
+ run_env(env_id, 5000, wrappers=[gym.wrappers.FlattenObservation])
-@pytest.mark.parametrize('env_id', SUITE_IDS)
-def test_step_suite_determinism(env_id: str):
+@pytest.mark.parametrize('env_id', DMC_IDS)
+def test_step_dm_control_determinism(env_id: str):
"""Tests that for step environments identical seeds produce identical trajectories."""
- run_env_determinism(env_id, SEED)
-
-
-@pytest.mark.parametrize('env_id', MANIPULATION_IDS)
-def test_step_manipulation_functionality(env_id: str):
- """Tests that manipulation step environments run without errors using random actions."""
- run_env(env_id)
-
-
-@pytest.mark.parametrize('env_id', MANIPULATION_IDS)
-def test_step_manipulation_determinism(env_id: str):
- """Tests that for step environments identical seeds produce identical trajectories."""
- run_env_determinism(env_id, SEED)
+ run_env_determinism(env_id, SEED, 5000, wrappers=[gym.wrappers.FlattenObservation])
@pytest.mark.parametrize('env_id', DMC_MP_IDS)
diff --git a/test/test_fancy_envs.py b/test/test_fancy_envs.py
index 9acd696..a15c837 100644
--- a/test/test_fancy_envs.py
+++ b/test/test_fancy_envs.py
@@ -1,14 +1,16 @@
-import itertools
+from itertools import chain
+from typing import Callable
import fancy_gym
-import gym
+import gymnasium as gym
import pytest
from test.utils import run_env, run_env_determinism
-CUSTOM_IDS = [spec.id for spec in gym.envs.registry.all() if
+CUSTOM_IDS = [id for id, spec in gym.envs.registry.items() if
+ not isinstance(spec.entry_point, Callable) and
"fancy_gym" in spec.entry_point and 'make_bb_env_helper' not in spec.entry_point]
-CUSTOM_MP_IDS = itertools.chain(*fancy_gym.ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS.values())
+CUSTOM_MP_IDS = fancy_gym.ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS['all']
SEED = 1
diff --git a/test/test_fancy_registry.py b/test/test_fancy_registry.py
new file mode 100644
index 0000000..aad076b
--- /dev/null
+++ b/test/test_fancy_registry.py
@@ -0,0 +1,78 @@
+from typing import Tuple, Type, Union, Optional, Callable
+
+import gymnasium as gym
+import numpy as np
+import pytest
+from gymnasium import make
+from gymnasium.core import ActType, ObsType
+
+import fancy_gym
+from fancy_gym import register
+
+KNOWN_NS = ['dm_control', 'fancy', 'metaworld', 'gym']
+
+
+class Object(object):
+ pass
+
+
+class ToyEnv(gym.Env):
+ observation_space = gym.spaces.Box(low=-1, high=1, shape=(1,), dtype=np.float64)
+ action_space = gym.spaces.Box(low=-1, high=1, shape=(1,), dtype=np.float64)
+ dt = 0.02
+
+ def __init__(self, a: int = 0, b: float = 0.0, c: list = [], d: dict = {}, e: Object = Object()):
+ self.a, self.b, self.c, self.d, self.e = a, b, c, d, e
+
+ def reset(self, *, seed: Optional[int] = None, return_info: bool = False,
+ options: Optional[dict] = None) -> Union[ObsType, Tuple[ObsType, dict]]:
+ obs, options = np.array([-1]), {}
+ return obs, options
+
+ def step(self, action: ActType) -> Tuple[ObsType, float, bool, dict]:
+ obs, reward, terminated, truncated, info = np.array([-1]), 1, False, False, {}
+ return obs, reward, terminated, truncated, info
+
+ def render(self, mode="human"):
+ pass
+
+
+@pytest.fixture(scope="session", autouse=True)
+def setup():
+ register(
+ id=f'dummy/toy2-v0',
+ entry_point='test.test_black_box:ToyEnv',
+ max_episode_steps=50,
+ )
+
+
+@pytest.mark.parametrize('env_id', ['dummy/toy2-v0'])
+@pytest.mark.parametrize('mp_type', ['ProMP', 'DMP', 'ProDMP'])
+def test_make_mp(env_id: str, mp_type: str):
+ parts = env_id.split('/')
+ if len(parts) == 1:
+ ns, name = 'gym', parts[0]
+ elif len(parts) == 2:
+ ns, name = parts[0], parts[1]
+ else:
+ raise ValueError('env id cannot contain multiple "/".')
+
+ fancy_id = f'{ns}_{mp_type}/{name}'
+
+ make(fancy_id)
+
+
+def test_make_raw_toy():
+ make('dummy/toy2-v0')
+
+
+@pytest.mark.parametrize('mp_type', ['ProMP', 'DMP', 'ProDMP'])
+def test_make_mp_toy(mp_type: str):
+ fancy_id = f'dummy_{mp_type}/toy2-v0'
+
+ make(fancy_id)
+
+
+@pytest.mark.parametrize('ns', KNOWN_NS)
+def test_ns_nonempty(ns):
+ assert len(fancy_gym.MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS[ns]), f'The namespace {ns} is empty even though it should not be.'
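Note on `test_make_mp` above: it builds the movement-primitive id by splitting the step-based id into namespace and name, defaulting to the `gym` namespace, and inserting the MP type after the namespace. The same mapping can be sketched as a small standalone function. `to_mp_id` is a hypothetical helper written for illustration and is not part of fancy_gym.

```python
# Sketch only: mirrors the id construction done inline in test_make_mp.
def to_mp_id(env_id: str, mp_type: str) -> str:
    """Maps 'ns/name' (or bare 'name') to 'ns_<mp_type>/name'."""
    parts = env_id.split('/')
    if len(parts) == 1:
        # Ids without a namespace are treated as belonging to 'gym'.
        ns, name = 'gym', parts[0]
    elif len(parts) == 2:
        ns, name = parts
    else:
        raise ValueError('env id cannot contain multiple "/".')
    return f'{ns}_{mp_type}/{name}'


if __name__ == '__main__':
    print(to_mp_id('dummy/toy2-v0', 'ProMP'))
    print(to_mp_id('Reacher-v2', 'DMP'))
```

For the toy environment registered in the fixture, this yields the same ids that `test_make_mp_toy` constructs by hand, for example `dummy_ProDMP/toy2-v0`.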
diff --git a/test/test_metaworld_envs.py b/test/test_metaworld_envs.py
index ed300f4..90d98a3 100644
--- a/test/test_metaworld_envs.py
+++ b/test/test_metaworld_envs.py
@@ -6,9 +6,9 @@ from metaworld.envs import ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE
import fancy_gym
from test.utils import run_env, run_env_determinism
-METAWORLD_IDS = [f'metaworld:{env.split("-goal-observable")[0]}' for env, _ in
+METAWORLD_IDS = [f'metaworld/{env.split("-goal-observable")[0]}' for env, _ in
ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE.items()]
-METAWORLD_MP_IDS = chain(*fancy_gym.ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS.values())
+METAWORLD_MP_IDS = fancy_gym.ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS['all']
SEED = 1
@@ -18,6 +18,7 @@ def test_step_metaworld_functionality(env_id: str):
run_env(env_id)
+@pytest.mark.skip(reason="Seeding does not work correctly on the current Metaworld.")
@pytest.mark.parametrize('env_id', METAWORLD_IDS)
def test_step_metaworld_determinism(env_id: str):
"""Tests that for step environments identical seeds produce identical trajectories."""
@@ -30,6 +31,7 @@ def test_bb_metaworld_functionality(env_id: str):
run_env(env_id)
+@pytest.mark.skip(reason="Seeding does not work correctly on the current Metaworld.")
@pytest.mark.parametrize('env_id', METAWORLD_MP_IDS)
def test_bb_metaworld_determinism(env_id: str):
"""Tests that for black box environment identical seeds produce identical trajectories."""
diff --git a/test/test_replanning_sequencing.py b/test/test_replanning_sequencing.py
index 9d04d02..c2edf42 100644
--- a/test/test_replanning_sequencing.py
+++ b/test/test_replanning_sequencing.py
@@ -2,21 +2,25 @@ from itertools import chain
from types import FunctionType
from typing import Tuple, Type, Union, Optional
-import gym
+import gymnasium as gym
import numpy as np
import pytest
-from gym import register
-from gym.core import ActType, ObsType
+from gymnasium import register, make
+from gymnasium.core import ActType, ObsType
+from gymnasium import spaces
import fancy_gym
from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
-from fancy_gym.utils.time_aware_observation import TimeAwareObservation
+from fancy_gym.utils.wrappers import TimeAwareObservation
+from fancy_gym.utils.make_env_helpers import ensure_finite_time
SEED = 1
-ENV_IDS = ['Reacher5d-v0', 'dmc:ball_in_cup-catch', 'metaworld:reach-v2', 'Reacher-v2']
+ENV_IDS = ['fancy/Reacher5d-v0', 'dm_control/ball_in_cup-catch-v0', 'metaworld/reach-v2', 'Reacher-v2']
WRAPPERS = [fancy_gym.envs.mujoco.reacher.MPWrapper, fancy_gym.dmc.suite.ball_in_cup.MPWrapper,
fancy_gym.meta.goal_object_change_mp_wrapper.MPWrapper, fancy_gym.open_ai.mujoco.reacher_v2.MPWrapper]
-ALL_MP_ENVS = chain(*fancy_gym.ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS.values())
+ALL_MP_ENVS = fancy_gym.ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS['all']
+
+MAX_STEPS_FALLBACK = 50
class ToyEnv(gym.Env):
@@ -26,10 +30,12 @@ class ToyEnv(gym.Env):
def reset(self, *, seed: Optional[int] = None, return_info: bool = False,
options: Optional[dict] = None) -> Union[ObsType, Tuple[ObsType, dict]]:
- return np.array([-1])
+ obs, options = np.array([-1]), {}
+ return obs, options
def step(self, action: ActType) -> Tuple[ObsType, float, bool, dict]:
- return np.array([-1]), 1, False, {}
+ obs, reward, terminated, truncated, info = np.array([-1]), 1, False, False, {}
+ return obs, reward, terminated, truncated, info
def render(self, mode="human"):
pass
@@ -61,7 +67,7 @@ def setup():
def test_learn_sub_trajectories(mp_type: str, env_wrap: Tuple[str, Type[RawInterfaceWrapper]],
add_time_aware_wrapper_before: bool):
env_id, wrapper_class = env_wrap
- env_step = TimeAwareObservation(fancy_gym.make(env_id, SEED))
+ env_step = TimeAwareObservation(ensure_finite_time(make(env_id, SEED), MAX_STEPS_FALLBACK))
wrappers = [wrapper_class]
# has time aware wrapper
@@ -72,24 +78,29 @@ def test_learn_sub_trajectories(mp_type: str, env_wrap: Tuple[str, Type[RawInter
{'trajectory_generator_type': mp_type},
{'controller_type': 'motor'},
{'phase_generator_type': 'exp'},
- {'basis_generator_type': 'rbf'}, seed=SEED)
+ {'basis_generator_type': 'rbf'}, fallback_max_steps=MAX_STEPS_FALLBACK)
+ env.reset(seed=SEED)
assert env.learn_sub_trajectories
+ assert env.spec.max_episode_steps
+ assert env_step.spec.max_episode_steps
assert env.traj_gen.learn_tau
# This also verifies we are not adding the TimeAwareObservationWrapper twice
- assert env.observation_space == env_step.observation_space
+ assert spaces.flatten_space(env_step.observation_space) == spaces.flatten_space(env.observation_space)
- d = True
+ done = True
for i in range(25):
- if d:
- env.reset()
+ if done:
+ env.reset(seed=SEED)
+
action = env.action_space.sample()
- obs, r, d, info = env.step(action)
+ _obs, _reward, terminated, truncated, info = env.step(action)
+ done = terminated or truncated
length = info['trajectory_length']
- if not d:
+ if not done:
assert length == np.round(action[0] / env.dt)
assert length == np.round(env.traj_gen.tau.numpy() / env.dt)
else:
@@ -105,14 +116,14 @@ def test_learn_sub_trajectories(mp_type: str, env_wrap: Tuple[str, Type[RawInter
def test_replanning_time(mp_type: str, env_wrap: Tuple[str, Type[RawInterfaceWrapper]],
add_time_aware_wrapper_before: bool, replanning_time: int):
env_id, wrapper_class = env_wrap
- env_step = TimeAwareObservation(fancy_gym.make(env_id, SEED))
+ env_step = TimeAwareObservation(ensure_finite_time(make(env_id, SEED), MAX_STEPS_FALLBACK))
wrappers = [wrapper_class]
# has time aware wrapper
if add_time_aware_wrapper_before:
wrappers += [TimeAwareObservation]
- replanning_schedule = lambda c_pos, c_vel, obs, c_action, t: t % replanning_time == 0
+ def replanning_schedule(c_pos, c_vel, obs, c_action, t): return t % replanning_time == 0
basis_generator_type = 'prodmp' if mp_type == 'prodmp' else 'rbf'
phase_generator_type = 'exp' if 'dmp' in mp_type else 'linear'
@@ -121,31 +132,36 @@ def test_replanning_time(mp_type: str, env_wrap: Tuple[str, Type[RawInterfaceWra
{'trajectory_generator_type': mp_type},
{'controller_type': 'motor'},
{'phase_generator_type': phase_generator_type},
- {'basis_generator_type': basis_generator_type}, seed=SEED)
+ {'basis_generator_type': basis_generator_type}, fallback_max_steps=MAX_STEPS_FALLBACK)
+ env.reset(seed=SEED)
assert env.do_replanning
+ assert env.spec.max_episode_steps
+ assert env_step.spec.max_episode_steps
assert callable(env.replanning_schedule)
# This also verifies we are not adding the TimeAwareObservationWrapper twice
- assert env.observation_space == env_step.observation_space
+ assert spaces.flatten_space(env_step.observation_space) == spaces.flatten_space(env.observation_space)
- env.reset()
+ env.reset(seed=SEED)
episode_steps = env_step.spec.max_episode_steps // replanning_time
# Make 3 episodes, total steps depend on the replanning steps
for i in range(3 * episode_steps):
action = env.action_space.sample()
- obs, r, d, info = env.step(action)
+ _obs, _reward, terminated, truncated, info = env.step(action)
+ done = terminated or truncated
length = info['trajectory_length']
- if d:
+ if done:
# Check if number of steps until termination match the replanning interval
- print(d, (i + 1), episode_steps)
+ print(done, (i + 1), episode_steps)
assert (i + 1) % episode_steps == 0
- env.reset()
+ env.reset(seed=SEED)
assert replanning_schedule(None, None, None, None, length)
+
@pytest.mark.parametrize('mp_type', ['promp', 'prodmp'])
@pytest.mark.parametrize('max_planning_times', [1, 2, 3, 4])
@pytest.mark.parametrize('sub_segment_steps', [5, 10])
@@ -165,15 +181,19 @@ def test_max_planning_times(mp_type: str, max_planning_times: int, sub_segment_s
},
{'basis_generator_type': basis_generator_type,
},
- seed=SEED)
- _ = env.reset()
- d = False
+ fallback_max_steps=MAX_STEPS_FALLBACK)
+
+ _ = env.reset(seed=SEED)
+ done = False
planning_times = 0
- while not d:
- _, _, d, _ = env.step(env.action_space.sample())
+ while not done:
+ action = env.action_space.sample()
+ _obs, _reward, terminated, truncated, _info = env.step(action)
+ done = terminated or truncated
planning_times += 1
assert planning_times == max_planning_times
+
@pytest.mark.parametrize('mp_type', ['promp', 'prodmp'])
@pytest.mark.parametrize('max_planning_times', [1, 2, 3, 4])
@pytest.mark.parametrize('sub_segment_steps', [5, 10])
@@ -194,17 +214,20 @@ def test_replanning_with_learn_tau(mp_type: str, max_planning_times: int, sub_se
},
{'basis_generator_type': basis_generator_type,
},
- seed=SEED)
- _ = env.reset()
- d = False
+ fallback_max_steps=MAX_STEPS_FALLBACK)
+
+ _ = env.reset(seed=SEED)
+ done = False
planning_times = 0
- while not d:
+ while not done:
action = env.action_space.sample()
action[0] = tau
- _, _, d, info = env.step(action)
+ _obs, _reward, terminated, truncated, _info = env.step(action)
+ done = terminated or truncated
planning_times += 1
assert planning_times == max_planning_times
+
@pytest.mark.parametrize('mp_type', ['promp', 'prodmp'])
@pytest.mark.parametrize('max_planning_times', [1, 2, 3, 4])
@pytest.mark.parametrize('sub_segment_steps', [5, 10])
@@ -213,26 +236,28 @@ def test_replanning_with_learn_delay(mp_type: str, max_planning_times: int, sub_
basis_generator_type = 'prodmp' if mp_type == 'prodmp' else 'rbf'
phase_generator_type = 'exp' if mp_type == 'prodmp' else 'linear'
env = fancy_gym.make_bb('toy-v0', [ToyWrapper],
- {'replanning_schedule': lambda pos, vel, obs, action, t: t % sub_segment_steps == 0,
- 'max_planning_times': max_planning_times,
- 'verbose': 2},
- {'trajectory_generator_type': mp_type,
- },
- {'controller_type': 'motor'},
- {'phase_generator_type': phase_generator_type,
- 'learn_tau': False,
- 'learn_delay': True
- },
- {'basis_generator_type': basis_generator_type,
- },
- seed=SEED)
- _ = env.reset()
- d = False
+ {'replanning_schedule': lambda pos, vel, obs, action, t: t % sub_segment_steps == 0,
+ 'max_planning_times': max_planning_times,
+ 'verbose': 2},
+ {'trajectory_generator_type': mp_type,
+ },
+ {'controller_type': 'motor'},
+ {'phase_generator_type': phase_generator_type,
+ 'learn_tau': False,
+ 'learn_delay': True
+ },
+ {'basis_generator_type': basis_generator_type,
+ },
+ fallback_max_steps=MAX_STEPS_FALLBACK)
+
+ _ = env.reset(seed=SEED)
+ done = False
planning_times = 0
- while not d:
+ while not done:
action = env.action_space.sample()
action[0] = delay
- _, _, d, info = env.step(action)
+ _obs, _reward, terminated, truncated, info = env.step(action)
+ done = terminated or truncated
delay_time_steps = int(np.round(delay / env.dt))
pos = info['positions'].flatten()
@@ -256,6 +281,7 @@ def test_replanning_with_learn_delay(mp_type: str, max_planning_times: int, sub_
assert planning_times == max_planning_times
+
@pytest.mark.parametrize('mp_type', ['promp', 'prodmp'])
@pytest.mark.parametrize('max_planning_times', [1, 2, 3])
@pytest.mark.parametrize('sub_segment_steps', [5, 10, 15])
@@ -266,27 +292,29 @@ def test_replanning_with_learn_delay_and_tau(mp_type: str, max_planning_times: i
basis_generator_type = 'prodmp' if mp_type == 'prodmp' else 'rbf'
phase_generator_type = 'exp' if mp_type == 'prodmp' else 'linear'
env = fancy_gym.make_bb('toy-v0', [ToyWrapper],
- {'replanning_schedule': lambda pos, vel, obs, action, t: t % sub_segment_steps == 0,
- 'max_planning_times': max_planning_times,
- 'verbose': 2},
- {'trajectory_generator_type': mp_type,
- },
- {'controller_type': 'motor'},
- {'phase_generator_type': phase_generator_type,
- 'learn_tau': True,
- 'learn_delay': True
- },
- {'basis_generator_type': basis_generator_type,
- },
- seed=SEED)
- _ = env.reset()
- d = False
+ {'replanning_schedule': lambda pos, vel, obs, action, t: t % sub_segment_steps == 0,
+ 'max_planning_times': max_planning_times,
+ 'verbose': 2},
+ {'trajectory_generator_type': mp_type,
+ },
+ {'controller_type': 'motor'},
+ {'phase_generator_type': phase_generator_type,
+ 'learn_tau': True,
+ 'learn_delay': True
+ },
+ {'basis_generator_type': basis_generator_type,
+ },
+ fallback_max_steps=MAX_STEPS_FALLBACK)
+
+ _ = env.reset(seed=SEED)
+ done = False
planning_times = 0
- while not d:
+ while not done:
action = env.action_space.sample()
action[0] = tau
action[1] = delay
- _, _, d, info = env.step(action)
+ _obs, _reward, terminated, truncated, info = env.step(action)
+ done = terminated or truncated
delay_time_steps = int(np.round(delay / env.dt))
@@ -306,6 +334,7 @@ def test_replanning_with_learn_delay_and_tau(mp_type: str, max_planning_times: i
assert planning_times == max_planning_times
+
@pytest.mark.parametrize('mp_type', ['promp', 'prodmp'])
@pytest.mark.parametrize('max_planning_times', [1, 2, 3, 4])
@pytest.mark.parametrize('sub_segment_steps', [5, 10])
@@ -325,9 +354,11 @@ def test_replanning_schedule(mp_type: str, max_planning_times: int, sub_segment_
},
{'basis_generator_type': basis_generator_type,
},
- seed=SEED)
- _ = env.reset()
- d = False
+ fallback_max_steps=MAX_STEPS_FALLBACK)
+
+ _ = env.reset(seed=SEED)
for i in range(max_planning_times):
- _, _, d, _ = env.step(env.action_space.sample())
- assert d
+ action = env.action_space.sample()
+ _obs, _reward, terminated, truncated, _info = env.step(action)
+ done = terminated or truncated
+ assert done
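Note on the replanning tests above: they use a schedule of the form `t % replanning_time == 0` and expect `max_episode_steps // replanning_time` sub-trajectories per episode. The sketch below reproduces only that arithmetic in plain Python. `make_schedule` and `replanning_steps` are hypothetical helpers, and counting time steps from 1 is an assumption about how the black-box wrapper advances `t`.

```python
# Sketch only: the schedule arithmetic from test_replanning_time, without an env.
from typing import Callable, List


def make_schedule(replanning_time: int) -> Callable[..., bool]:
    """Builds a schedule with the same signature as the one used in the tests."""
    def schedule(c_pos, c_vel, obs, c_action, t: int) -> bool:
        # Replan whenever the elapsed step count is a multiple of the interval.
        return t % replanning_time == 0
    return schedule


def replanning_steps(max_episode_steps: int, replanning_time: int) -> List[int]:
    """Lists the time steps (counted from 1) at which the schedule fires."""
    schedule = make_schedule(replanning_time)
    return [t for t in range(1, max_episode_steps + 1)
            if schedule(None, None, None, None, t)]
```

Under these assumptions, a 50-step episode with an interval of 10 replans at steps 10, 20, 30, 40 and 50, which is the `50 // 10` black-box steps the test uses as `episode_steps`.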
diff --git a/test/utils.py b/test/utils.py
index dff2292..427622d 100644
--- a/test/utils.py
+++ b/test/utils.py
@@ -1,9 +1,12 @@
-import gym
+from typing import List, Type
+
+import gymnasium as gym
import numpy as np
-from fancy_gym import make
+from gymnasium import make
-def run_env(env_id, iterations=None, seed=0, render=False):
+def run_env(env_id: str, iterations: int = None, seed: int = 0, wrappers: List[Type[gym.Wrapper]] = [],
+ render: bool = False):
"""
Example for running a DMC based env in the step based setting.
The env_id has to be specified as `dmc:domain_name-task_name` or
@@ -13,70 +16,88 @@ def run_env(env_id, iterations=None, seed=0, render=False):
env_id: Either `dmc:domain_name-task_name` or `dmc:manipulation-environment_name`
iterations: Number of rollout steps to run
seed: random seeding
+ wrappers: List of Wrappers to apply to the environment
render: Render the episode
- Returns: observations, rewards, dones, actions
+ Returns: observations, rewards, terminations, truncations, actions
"""
- env: gym.Env = make(env_id, seed=seed)
+ env: gym.Env = make(env_id)
+ for w in wrappers:
+ env = w(env)
rewards = []
observations = []
actions = []
- dones = []
- obs = env.reset()
+ terminations = []
+ truncations = []
+ obs, _ = env.reset(seed=seed)
+ env.action_space.seed(seed)
verify_observations(obs, env.observation_space, "reset()")
iterations = iterations or (env.spec.max_episode_steps or 1)
- # number of samples(multiple environment steps)
+ # number of samples (multiple environment steps)
for i in range(iterations):
observations.append(obs)
ac = env.action_space.sample()
actions.append(ac)
# ac = np.random.uniform(env.action_space.low, env.action_space.high, env.action_space.shape)
- obs, reward, done, info = env.step(ac)
+ obs, reward, terminated, truncated, info = env.step(ac)
verify_observations(obs, env.observation_space, "step()")
verify_reward(reward)
- verify_done(done)
+ verify_done(terminated)
+ verify_done(truncated)
rewards.append(reward)
- dones.append(done)
+ terminations.append(terminated)
+ truncations.append(truncated)
if render:
env.render("human")
- if done:
+ if terminated or truncated:
break
if not hasattr(env, "replanning_schedule"):
- assert done, "Done flag is not True after end of episode."
+ assert terminated or truncated, f"Termination or truncation flag is not True after {i + 1} iterations."
+
observations.append(obs)
env.close()
del env
- return np.array(observations), np.array(rewards), np.array(dones), np.array(actions)
+ return np.array(observations), np.array(rewards), np.array(terminations), np.array(truncations), np.array(actions)
-def run_env_determinism(env_id: str, seed: int):
- traj1 = run_env(env_id, seed=seed)
- traj2 = run_env(env_id, seed=seed)
+def run_env_determinism(env_id: str, seed: int, iterations: int = None, wrappers: List[Type[gym.Wrapper]] = []):
+ traj1 = run_env(env_id, iterations=iterations,
+ seed=seed, wrappers=wrappers)
+ traj2 = run_env(env_id, iterations=iterations,
+ seed=seed, wrappers=wrappers)
# Iterate over two trajectories, which should have the same state and action sequence
for i, time_step in enumerate(zip(*traj1, *traj2)):
- obs1, rwd1, done1, ac1, obs2, rwd2, done2, ac2 = time_step
- assert np.array_equal(obs1, obs2), f"Observations [{i}] {obs1} and {obs2} do not match."
- assert np.array_equal(ac1, ac2), f"Actions [{i}] {ac1} and {ac2} do not match."
- assert np.array_equal(rwd1, rwd2), f"Rewards [{i}] {rwd1} and {rwd2} do not match."
- assert np.array_equal(done1, done2), f"Dones [{i}] {done1} and {done2} do not match."
+ obs1, rwd1, term1, trunc1, ac1, obs2, rwd2, term2, trunc2, ac2 = time_step
+ assert np.allclose(
+ obs1, obs2), f"Observations [{i}] {obs1} ({obs1.shape}) and {obs2} ({obs2.shape}) do not match: Biggest difference is {np.abs(obs1-obs2).max()} at index {np.abs(obs1-obs2).argmax()}."
+ assert np.array_equal(
+ ac1, ac2), f"Actions [{i}] {ac1} and {ac2} do not match."
+ assert np.array_equal(
+ rwd1, rwd2), f"Rewards [{i}] {rwd1} and {rwd2} do not match."
+ assert np.array_equal(
+ term1, term2), f"Terminations [{i}] {term1} and {term2} do not match."
+ assert np.array_equal(
+ trunc1, trunc2), f"Truncations [{i}] {trunc1} and {trunc2} do not match."
def verify_observations(obs, observation_space: gym.Space, obs_type="reset()"):
assert observation_space.contains(obs), \
- f"Observation {obs} received from {obs_type} not contained in observation space {observation_space}."
+ f"Observation {obs} ({obs.shape}) received from {obs_type} not contained in observation space {observation_space}."
def verify_reward(reward):
- assert isinstance(reward, (float, int)), f"Returned type {type(reward)} as reward, expected float or int."
+ assert isinstance(
+ reward, (float, int)), f"Returned type {type(reward)} as reward, expected float or int."
def verify_done(done):
- assert isinstance(done, bool), f"Returned {done} as done flag, expected bool."
+ assert isinstance(
+ done, bool), f"Returned {done} as terminated/truncated flag, expected bool."