Merge pull request #75 from D-o-d-o-x/great_refactor

Refactor and Upgrade to Gymnasium
This commit is contained in:
Dominik Roth 2023-10-11 13:42:00 +02:00 committed by GitHub
commit c420a96d4f
84 changed files with 3094 additions and 2741 deletions

README.md

@ -1,27 +1,35 @@
# Fancy Gym
<h1 align="center">
<br>
<img src='./icon.svg' width="250px">
<br><br>
<b>Fancy Gym</b>
<br><br>
</h1>
`fancy_gym` offers a large variety of reinforcement learning environments under the unifying interface
of [OpenAI gym](https://gymlibrary.dev/). We provide support (under the OpenAI gym interface) for the benchmark suites
[DeepMind Control](https://deepmind.com/research/publications/2020/dm-control-Software-and-Tasks-for-Continuous-Control)
(DMC) and [Metaworld](https://meta-world.github.io/). If those are not sufficient and you want to create your own custom
gym environments, use [this guide](https://www.gymlibrary.dev/content/environment_creation/). We highly appreciate it if
you then submit a PR for this environment to become part of `fancy_gym`.
In comparison to existing libraries, we additionally support controlling agents with movement primitives, such as Dynamic
Movement Primitives (DMPs) and Probabilistic Movement Primitives (ProMPs).
| :exclamation: Fancy Gym has recently received a major refactor, which also updated many of the used dependencies to current versions. The update has brought some breaking changes. If you want to access the old version, check out the [legacy branch](https://github.com/ALRhub/fancy_gym/tree/legacy). Find out more about what changed [here](https://github.com/ALRhub/fancy_gym/pull/75). |
| --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
Built upon the foundation of [Gymnasium](https://gymnasium.farama.org/) (a maintained fork of OpenAI's renowned Gym library), `fancy_gym` offers a comprehensive collection of reinforcement learning environments.
**Key Features**:
- **New Challenging Environments**: `fancy_gym` includes several new environments (Panda Box Pushing, Table Tennis, etc.) that present a higher degree of difficulty, pushing the boundaries of reinforcement learning research.
- **Support for Movement Primitives**: `fancy_gym` supports a range of movement primitives (MPs), including Dynamic Movement Primitives (DMPs), Probabilistic Movement Primitives (ProMP), and Probabilistic Dynamic Movement Primitives (ProDMP).
- **Upgrade to Movement Primitives**: With our framework, it's straightforward to transform standard Gymnasium environments into environments that support movement primitives.
- **Benchmark Suite Compatibility**: `fancy_gym` makes it easy to access renowned benchmark suites such as [DeepMind Control](https://deepmind.com/research/publications/2020/dm-control-Software-and-Tasks-for-Continuous-Control) and [Metaworld](https://meta-world.github.io/), whether you want to use them in the regular step-based setting or with MPs.
- **Contribute Your Own Environments**: If you're inspired to create custom gym environments, both step-based and with movement primitives, this [guide](https://gymnasium.farama.org/tutorials/gymnasium_basics/environment_creation/) will assist you. We encourage and highly appreciate submissions via PRs to integrate these environments into `fancy_gym`.
## Movement Primitive Environments (Episode-Based/Black-Box Environments)
Unlike step-based environments, movement primitive (MP) environments are more closely related to stochastic search, black-box
optimization, and methods that are often used in traditional robotics and control. MP environments are typically
episode-based and execute a full trajectory, which is generated by a trajectory generator, such as a Dynamic Movement
Primitive (DMP) or a Probabilistic Movement Primitive (ProMP). The generated trajectory is translated into individual
step-wise actions by a trajectory tracking controller. The exact choice of controller is, however, dependent on the type
of environment. We currently support position, velocity, and PD-Controllers for position, velocity, and torque control,
respectively, as well as a special controller for the MetaWorld control suite.
The goal of all MP environments is still to learn an optimal policy. Yet, an action represents the parametrization of
the motion primitives to generate a suitable trajectory. Additionally, in this framework we support all of this also for
the contextual setting, i.e. we expose the context space - a subset of the observation space - at the beginning of the
episode. This requires predicting a new action/MP parametrization for each context.
<p align="justify">
Movement primitive (MP) environments differ from traditional step-based environments. They align more with concepts from stochastic search, black-box optimization, and methods commonly found in classical robotics and control. Instead of individual steps, MP environments operate on an episode basis, executing complete trajectories. These trajectories are produced by trajectory generators like Dynamic Movement Primitives (DMP), Probabilistic Movement Primitives (ProMP) or Probabilistic Dynamic Movement Primitives (ProDMP).
</p>
<p align="justify">
Once generated, these trajectories are converted into step-by-step actions using a trajectory tracking controller. The specific controller chosen depends on the environment's requirements. Currently, we support position, velocity, and PD-Controllers tailored for position, velocity, and torque control. Additionally, we have a specialized controller designed for the MetaWorld control suite.
</p>
<p align="justify">
While the overarching objective of MP environments remains the learning of an optimal policy, the actions here represent the parametrization of motion primitives to craft the right trajectory. Our framework further enhances this by accommodating a contextual setting. At the episode's onset, we present the context space—a subset of the observation space. This demands the prediction of a new action or MP parametrization for every unique context.
</p>
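To make the episode-level interaction more concrete, here is a small, self-contained numpy sketch of the underlying idea (a toy stand-in, not the actual MP_PyTorch/fancy_gym implementation): the episode-level action is a small weight vector, a trajectory generator expands it into a full desired trajectory, and a tracking controller turns each desired state into a step-wise action.

```python
import numpy as np

def rbf_trajectory(weights, steps=200):
    """ProMP-like toy generator: weighted sum of radial basis functions over time."""
    t = np.linspace(0, 1, steps)[:, None]                # (steps, 1)
    centers = np.linspace(0, 1, len(weights))[None, :]   # (1, num_basis)
    basis = np.exp(-0.5 * ((t - centers) / 0.1) ** 2)    # (steps, num_basis)
    basis /= basis.sum(axis=1, keepdims=True)
    return basis @ weights                               # desired positions, (steps,)

def pd_action(des_pos, des_vel, cur_pos, cur_vel, p=10.0, d=1.0):
    """Toy tracking (PD) controller: step-wise action from the tracking error."""
    return p * (des_pos - cur_pos) + d * (des_vel - cur_vel)

weights = np.array([0.0, 0.3, 0.8, 0.5, 0.1])  # the episode-level "action": MP parameters
desired_pos = rbf_trajectory(weights)
desired_vel = np.gradient(desired_pos)
# Inside the wrapper, each (desired_pos, desired_vel) pair is fed to the controller
# and the resulting step-wise action is applied to the underlying environment.
```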
## Installation
@ -43,59 +51,60 @@ cd fancy_gym
pip install -e .
```
In case you want to use dm_control or metaworld, you can install them by specifying extras
We have a few optional dependencies. If you also want to install those, use
```bash
pip install -e .[dmc,metaworld]
pip install -e '.[all]' # to install all optional dependencies
pip install -e '.[dmc,metaworld,box2d,mujoco,mujoco-legacy,jax,testing]' # or choose only those you want
```
> **Note:**
> While our library already fully supports the new mujoco bindings, metaworld still relies on
> [mujoco_py](https://github.com/openai/mujoco-py), hence make sure to have mujoco 2.1 installed beforehand.
## How to use Fancy Gym
We will only show the basics here and have prepared [multiple examples](fancy_gym/examples/) for a more detailed look.
### Step-wise Environments
### Step-Based Environments
Regular step-based environments added by Fancy Gym are registered under the `fancy/` namespace.
| :exclamation: Legacy versions of Fancy Gym used `fancy_gym.make(...)`. This is no longer supported and will raise an Exception on new versions. |
| ----------------------------------------------------------------------------------------------------------------------------------------------- |
```python
import gymnasium as gym
import fancy_gym
env = fancy_gym.make('Reacher5d-v0', seed=1)
obs = env.reset()
env = gym.make('fancy/Reacher5d-v0')
# or env = gym.make('metaworld/reach-v2') # fancy_gym allows access to all metaworld ML1 tasks via the metaworld/ NS
# or env = gym.make('dm_control/ball_in_cup-catch-v0')
# or env = gym.make('Reacher-v2')
observation, info = env.reset(seed=1)
for i in range(1000):
action = env.action_space.sample()
obs, reward, done, info = env.step(action)
observation, reward, terminated, truncated, info = env.step(action)
if i % 5 == 0:
env.render()
if done:
obs = env.reset()
if terminated or truncated:
observation, info = env.reset()
```
When using `dm_control` tasks we expect the `env_id` to be specified as `dmc:domain_name-task_name` or for manipulation
tasks as `dmc:manipulation-environment_name`. For `metaworld` tasks, we require the structure `metaworld:env_id-v2`; our
custom tasks and standard gym environments can be created without prefixes.
### Black-box Environments
All environments provide by default the cumulative episode reward, this can however be changed if necessary. Optionally,
each environment returns all collected information from each step as part of the infos. This information is, however,
mainly meant for debugging as well as logging and not for training.
By default, all environments provide the cumulative episode reward; this can be changed if necessary. Optionally, each environment returns all collected information from each step as part of the infos. This information is, however, mainly meant for debugging and logging, not for training.
|Key| Description|Type
|---|---|---|
`positions`| Generated trajectory from MP | Optional
`velocities`| Generated trajectory from MP | Optional
`step_actions`| Step-wise executed action based on controller output | Optional
`step_observations`| Step-wise intermediate observations | Optional
`step_rewards`| Step-wise rewards | Optional
`trajectory_length`| Total number of environment interactions | Always
`other`| All other information from the underlying environment are returned as a list with length `trajectory_length` maintaining the original key. In case some information are not provided every time step, the missing values are filled with `None`. | Always
| Key | Description | Type |
| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -------- |
| `positions` | Generated trajectory from MP | Optional |
| `velocities` | Generated trajectory from MP | Optional |
| `step_actions` | Step-wise executed action based on controller output | Optional |
| `step_observations` | Step-wise intermediate observations | Optional |
| `step_rewards` | Step-wise rewards | Optional |
| `trajectory_length` | Total number of environment interactions | Always |
| `other` | All other information from the underlying environment are returned as a list with length `trajectory_length` maintaining the original key. In case some information are not provided every time step, the missing values are filled with `None`. | Always |
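As a small illustration (environment id taken from the black-box example below; which of the optional keys are present depends on the environment's configuration):

```python
import gymnasium as gym
import fancy_gym

env = gym.make('fancy_ProMP/Reacher5d-v0')
observation, info = env.reset(seed=1)
observation, reward, terminated, truncated, info = env.step(env.action_space.sample())

print(info['trajectory_length'])  # always present: number of environment interactions
print(info.get('step_rewards'))   # optional: per-step rewards, None if not provided
```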
Existing MP tasks can be created the same way as above. Just keep in mind, calling `step()` executes a full trajectory.
Existing MP tasks can be created the same way as above. The namespace of an MP-variant of an environment is given by `<original namespace>_<MP name>/`.
Just keep in mind that calling `step()` executes a full trajectory.
> **Note:**
> Currently, we are also in the process of enabling replanning as well as learning of sub-trajectories.
@ -105,30 +114,38 @@ Existing MP tasks can be created the same way as above. Just keep in mind, calli
> Feel free to try it and open an issue with any problems that occur.
```python
import gymnasium as gym
import fancy_gym
env = fancy_gym.make('Reacher5dProMP-v0', seed=1)
env = gym.make('fancy_ProMP/Reacher5d-v0')
# or env = gym.make('metaworld_ProDMP/reach-v2')
# or env = gym.make('dm_control_DMP/ball_in_cup-catch-v0')
# or env = gym.make('gym_ProMP/Reacher-v2') # mp versions of envs added directly by gymnasium are in the gym_<MP-type> NS
# render() can be called once at the beginning with all necessary arguments.
# To turn it off again, just call render() without any arguments.
env.render(mode='human')
# This returns the context information, not the full state observation
obs = env.reset()
observation, info = env.reset(seed=1)
for i in range(5):
action = env.action_space.sample()
obs, reward, done, info = env.step(action)
observation, reward, terminated, truncated, info = env.step(action)
# Done is always True as we are working on the episode level, hence we always reset()
obs = env.reset()
# terminated or truncated is always True as we are working on the episode level, hence we always reset()
observation, info = env.reset()
```
To show all available environments, we provide some additional convenience variables. All of them return a dictionary
with two keys `DMP` and `ProMP` that store a list of available environment ids.
with the keys `DMP`, `ProMP`, `ProDMP` and `all` that store a list of available environment ids.
```python
import fancy_gym
print("All Black-box tasks:")
print(fancy_gym.ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS)
print("Fancy Black-box tasks:")
print(fancy_gym.ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS)
@ -140,6 +157,9 @@ print(fancy_gym.ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS)
print("MetaWorld Black-box tasks:")
print(fancy_gym.ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS)
print("If you add custom envs, their mp versions will be found in:")
print(fancy_gym.MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS['<my_custom_namespace>'])
```
### How to create a new MP task
@ -151,23 +171,27 @@ hand, the following [interface](fancy_gym/black_box/raw_interface_wrapper.py) ne
from abc import abstractmethod
from typing import Union, Tuple
import gym
import gymnasium as gym
import numpy as np
class RawInterfaceWrapper(gym.Wrapper):
mp_config = {
'ProMP': {},
'DMP': {},
'ProDMP': {},
}
@property
def context_mask(self) -> np.ndarray:
"""
Returns boolean mask of the same shape as the observation space.
It determines whether the observation is returned for the contextual case or not.
This effectively allows to filter unwanted or unnecessary observations from the full step-based case.
E.g. Velocities starting at 0 are only changing after the first action. Given we only receive the
context/part of the first observation, the velocities are not necessary in the observation for the task.
Returns:
bool array representing the indices of the observations
Returns boolean mask of the same shape as the observation space.
It determines whether the observation is returned for the contextual case or not.
This effectively allows filtering unwanted or unnecessary observations from the full step-based case.
E.g. Velocities starting at 0 are only changing after the first action. Given we only receive the
context/part of the first observation, the velocities are not necessary in the observation for the task.
Returns:
bool array representing the indices of the observations
"""
return np.ones(self.env.observation_space.shape[0], dtype=bool)
@ -197,34 +221,91 @@ class RawInterfaceWrapper(gym.Wrapper):
```
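For illustration, a hypothetical wrapper could mask out velocity entries from the context, along the lines of the docstring above (the observation layout here is an assumption, not taken from a real environment; a complete wrapper would additionally implement the rest of the interface, e.g. `current_pos` and `current_vel`):

```python
import numpy as np
from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper

class MyMPWrapper(RawInterfaceWrapper):
    @property
    def context_mask(self) -> np.ndarray:
        # Assumed layout: first half of the observation are joint positions,
        # second half are joint velocities. Velocities are zero at reset and
        # carry no information for the context, so they are masked out.
        n = self.env.observation_space.shape[0]
        mask = np.ones(n, dtype=bool)
        mask[n // 2:] = False
        return mask
```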
Default configurations for MPs can be overwritten by defining attributes in mp_config.
Available parameters are documented in the [MP_PyTorch Userguide](https://github.com/ALRhub/MP_PyTorch/blob/main/doc/README.md).
```python
class RawInterfaceWrapper(gym.Wrapper):
mp_config = {
'ProMP': {
'phase_generator_kwargs': {
'phase_generator_type': 'linear'
# When selecting another generator type, the default configuration will not be merged for the attribute.
},
'controller_kwargs': {
'p_gains': 0.5 * np.array([1.0, 4.0, 2.0, 4.0, 1.0, 4.0, 1.0]),
'd_gains': 0.5 * np.array([0.1, 0.4, 0.2, 0.4, 0.1, 0.4, 0.1]),
},
'basis_generator_kwargs': {
'num_basis': 3,
'num_basis_zero_start': 1,
'num_basis_zero_goal': 1,
},
},
'DMP': {},
'ProDMP': {},
}
[...]
```
If you created a new task wrapper, feel free to open a PR so we can integrate it for others to use as well. Without the
integration the task can still be used. A rough outline is shown below; for more details we recommend having a look
at the [examples](fancy_gym/examples/).
If the step-based environment is already registered with gym, you can simply do the following:
```python
import fancy_gym
fancy_gym.upgrade(
id='custom/cool_new_env-v0',
mp_wrapper=my_custom_MPWrapper
)
```
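After such an upgrade, the MP variants should be reachable via the namespace pattern described above, e.g. (illustrative, using the hypothetical id from the snippet):

```python
import gymnasium as gym
import fancy_gym

env = gym.make('custom_ProMP/cool_new_env-v0')
```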
# Base environment name, according to structure of above example
base_env_id = "dmc:ball_in_cup-catch"
If the step-based environment is not yet registered with gym, we can add both the step-based and MP-versions via
# Replace this wrapper with the custom wrapper for your environment by inheriting from the RawInterfaceWrapper.
# You can also add other gym.Wrappers in case they are needed,
# e.g. gym.wrappers.FlattenObservation for dict observations
wrappers = [fancy_gym.dmc.suite.ball_in_cup.MPWrapper]
kwargs = {...}
env = fancy_gym.make_bb(base_env_id, wrappers=wrappers, seed=0, **kwargs)
```python
fancy_gym.register(
id='custom/cool_new_env-v0',
entry_point=my_custom_env,
mp_wrapper=my_custom_MPWrapper
)
```
From this point on, you can access the MP-versions of your environments via
```python
env = gym.make('custom_ProDMP/cool_new_env-v0')
rewards = 0
obs = env.reset()
observation, info = env.reset()
# number of samples/full trajectories (multiple environment steps)
for i in range(5):
ac = env.action_space.sample()
obs, reward, done, info = env.step(ac)
observation, reward, terminated, truncated, info = env.step(ac)
rewards += reward
if done:
print(base_env_id, rewards)
if terminated or truncated:
print(rewards)
rewards = 0
obs = env.reset()
observation, info = env.reset()
```
## Citing the Project
To cite this repository in publications:
```bibtex
@software{fancy_gym,
title = {Fancy Gym},
author = {Otto, Fabian and Celik, Onur and Roth, Dominik and Zhou, Hongyi},
abstract = {Fancy Gym: Unifying interface for various RL benchmarks with support for Black Box approaches.},
url = {https://github.com/ALRhub/fancy_gym},
organization = {Autonomous Learning Robots Lab (ALR) at KIT},
}
```
## Icon Attribution
The icon is based on the [Gymnasium](https://github.com/Farama-Foundation/Gymnasium) icon as can be found [here](https://gymnasium.farama.org/_static/img/gymnasium_black.svg).


@ -1,13 +1,17 @@
from fancy_gym import dmc, meta, open_ai
from fancy_gym.utils.make_env_helpers import make, make_bb, make_rank
from .dmc import ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS
# Convenience function for all MP environments
from .envs import ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS
from .meta import ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS
from .open_ai import ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS
from fancy_gym import envs as fancy
from fancy_gym.utils.make_env_helpers import make_bb
from .envs.registry import register, upgrade
from .envs.registry import ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS, MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS
ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS = {
key: value + ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS[key] +
ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS[key] +
ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS[key]
for key, value in ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS.items()}
ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS = MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS['dm_control']
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS = MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS['fancy']
ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS = MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS['metaworld']
ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS = MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS['gym']
def make(*args, **kwargs):
"""
As part of the refactor of Fancy Gym and the upgrade to gymnasium, the use of fancy_gym.make has been discontinued. Regular gym.make should be used instead. For more details check out the GitHub README. If your codebase was built for older versions of Fancy Gym and relies on the old behavior and dependency versions, please check out the legacy branch.
"""
raise Exception('As part of the refactor of Fancy Gym and the upgrade to gymnasium, the use of fancy_gym.make has been discontinued. Regular gym.make should be used instead. For more details check out the GitHub README. If your codebase was built for older versions of Fancy Gym and relies on the old behavior and dependency versions, please check out the legacy branch.')
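# Migration sketch (illustrative, not part of this module): the legacy call
#   env = fancy_gym.make('Reacher5d-v0', seed=1)
# is replaced by a plain Gymnasium call on the namespaced id:
#   import gymnasium as gym
#   env = gym.make('fancy/Reacher5d-v0')
#   obs, info = env.reset(seed=1)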


@ -1,8 +1,9 @@
from typing import Tuple, Optional, Callable
from typing import Tuple, Optional, Callable, Dict, Any
import gym
import gymnasium as gym
import numpy as np
from gym import spaces
from gymnasium import spaces
from gymnasium.core import ObsType
from mp_pytorch.mp.mp_interfaces import MPInterface
from fancy_gym.black_box.controller.base_controller import BaseController
@ -67,7 +68,8 @@ class BlackBoxWrapper(gym.ObservationWrapper):
self.reward_aggregation = reward_aggregation
# spaces
self.return_context_observation = not (learn_sub_trajectories or self.do_replanning)
self.return_context_observation = not (
learn_sub_trajectories or self.do_replanning)
self.traj_gen_action_space = self._get_traj_gen_action_space()
self.action_space = self._get_action_space()
self.observation_space = self._get_observation_space()
@ -99,14 +101,17 @@ class BlackBoxWrapper(gym.ObservationWrapper):
# If we do not do this, the traj_gen assumes we are continuing the trajectory.
self.traj_gen.reset()
clipped_params = np.clip(action, self.traj_gen_action_space.low, self.traj_gen_action_space.high)
clipped_params = np.clip(
action, self.traj_gen_action_space.low, self.traj_gen_action_space.high)
self.traj_gen.set_params(clipped_params)
init_time = np.array(0 if not self.do_replanning else self.current_traj_steps * self.dt)
init_time = np.array(
0 if not self.do_replanning else self.current_traj_steps * self.dt)
condition_pos = self.condition_pos if self.condition_pos is not None else self.current_pos
condition_vel = self.condition_vel if self.condition_vel is not None else self.current_vel
condition_pos = self.condition_pos if self.condition_pos is not None else self.env.get_wrapper_attr('current_pos')
condition_vel = self.condition_vel if self.condition_vel is not None else self.env.get_wrapper_attr('current_vel')
self.traj_gen.set_initial_conditions(init_time, condition_pos, condition_vel)
self.traj_gen.set_initial_conditions(
init_time, condition_pos, condition_vel)
self.traj_gen.set_duration(duration, self.dt)
position = get_numpy(self.traj_gen.get_traj_pos())
@ -153,7 +158,8 @@ class BlackBoxWrapper(gym.ObservationWrapper):
trajectory_length = len(position)
rewards = np.zeros(shape=(trajectory_length,))
if self.verbose >= 2:
actions = np.zeros(shape=(trajectory_length,) + self.env.action_space.shape)
actions = np.zeros(shape=(trajectory_length,) +
self.env.action_space.shape)
observations = np.zeros(shape=(trajectory_length,) + self.env.observation_space.shape,
dtype=self.env.observation_space.dtype)
@ -161,16 +167,18 @@ class BlackBoxWrapper(gym.ObservationWrapper):
done = False
if not traj_is_valid:
obs, trajectory_return, done, infos = self.env.invalid_traj_callback(action, position, velocity,
self.return_context_observation,
self.tau_bound, self.delay_bound)
return self.observation(obs), trajectory_return, done, infos
obs, trajectory_return, terminated, truncated, infos = self.env.invalid_traj_callback(action, position, velocity,
self.return_context_observation, self.tau_bound, self.delay_bound)
return self.observation(obs), trajectory_return, terminated, truncated, infos
self.plan_steps += 1
for t, (pos, vel) in enumerate(zip(position, velocity)):
step_action = self.tracking_controller.get_action(pos, vel, self.current_pos, self.current_vel)
c_action = np.clip(step_action, self.env.action_space.low, self.env.action_space.high)
obs, c_reward, done, info = self.env.step(c_action)
step_action = self.tracking_controller.get_action(
pos, vel, self.env.get_wrapper_attr('current_pos'), self.env.get_wrapper_attr('current_vel'))
c_action = np.clip(
step_action, self.env.action_space.low, self.env.action_space.high)
obs, c_reward, terminated, truncated, info = self.env.step(
c_action)
rewards[t] = c_reward
if self.verbose >= 2:
@ -185,9 +193,7 @@ class BlackBoxWrapper(gym.ObservationWrapper):
if self.render_kwargs:
self.env.render(**self.render_kwargs)
if done or (self.replanning_schedule(self.current_pos, self.current_vel, obs, c_action,
t + 1 + self.current_traj_steps)
and self.plan_steps < self.max_planning_times):
if terminated or truncated or (self.replanning_schedule(self.env.get_wrapper_attr('current_pos'), self.env.get_wrapper_attr('current_vel'), obs, c_action, t + 1 + self.current_traj_steps) and self.plan_steps < self.max_planning_times):
if self.condition_on_desired:
self.condition_pos = pos
@ -207,17 +213,18 @@ class BlackBoxWrapper(gym.ObservationWrapper):
infos['trajectory_length'] = t + 1
trajectory_return = self.reward_aggregation(rewards[:t + 1])
return self.observation(obs), trajectory_return, done, infos
return self.observation(obs), trajectory_return, terminated, truncated, infos
def render(self, **kwargs):
"""Only set render options here, such that they can be used during the rollout.
This only needs to be called once"""
self.render_kwargs = kwargs
def reset(self, *, seed: Optional[int] = None, return_info: bool = False, options: Optional[dict] = None):
def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
-> Tuple[ObsType, Dict[str, Any]]:
self.current_traj_steps = 0
self.plan_steps = 0
self.traj_gen.reset()
self.condition_pos = None
self.condition_vel = None
return super(BlackBoxWrapper, self).reset()
return super(BlackBoxWrapper, self).reset(seed=seed, options=options)


@ -11,11 +11,11 @@ def get_controller(controller_type: str, **kwargs):
if controller_type == "motor":
return PDController(**kwargs)
elif controller_type == "velocity":
return VelController()
return VelController(**kwargs)
elif controller_type == "position":
return PosController()
return PosController(**kwargs)
elif controller_type == "metaworld":
return MetaWorldController()
return MetaWorldController(**kwargs)
else:
raise ValueError(f"Specified controller type {controller_type} not supported, "
f"please choose one of {ALL_TYPES}.")


@ -1,6 +1,6 @@
from typing import Union, Tuple
import gym
import gymnasium as gym
import numpy as np
from mp_pytorch.mp.mp_interfaces import MPInterface
@ -114,7 +114,8 @@ class RawInterfaceWrapper(gym.Wrapper):
Returns:
obs: artificial observation if the trajectory is invalid, by default a zero vector
reward: artificial reward if the trajectory is invalid, by default 0
done: artificial done if the trajectory is invalid, by default True
terminated: artificial terminated if the trajectory is invalid, by default True
truncated: artificial truncated if the trajectory is invalid, by default False
info: artificial info if the trajectory is invalid, by default empty dict
"""
return np.zeros(1), 0, True, {}
return np.zeros(1), 0, True, False, {}


@ -9,11 +9,11 @@ environments in order to use our Motion Primitive gym interface with them.
[//]: <> (These environments are wrapped-versions of their Deep Mind Control Suite &#40;DMC&#41; counterparts. Given most task can be)
[//]: <> (solved in shorter horizon lengths than the original 1000 steps, we often shorten the episodes for those task.)
|Name| Description|Trajectory Horizon|Action Dimension|Context Dimension
|---|---|---|---|---|
|`dmc_ball_in_cup-catch_promp-v0`| A ProMP wrapped version of the "catch" task for the "ball_in_cup" environment. | 1000 | 10 | 2
|`dmc_ball_in_cup-catch_dmp-v0`| A DMP wrapped version of the "catch" task for the "ball_in_cup" environment. | 1000| 10 | 2
|`dmc_reacher-easy_promp-v0`| A ProMP wrapped version of the "easy" task for the "reacher" environment. | 1000 | 10 | 4
|`dmc_reacher-easy_dmp-v0`| A DMP wrapped version of the "easy" task for the "reacher" environment. | 1000| 10 | 4
|`dmc_reacher-hard_promp-v0`| A ProMP wrapped version of the "hard" task for the "reacher" environment.| 1000 | 10 | 4
|`dmc_reacher-hard_dmp-v0`| A DMP wrapped version of the "hard" task for the "reacher" environment. | 1000 | 10 | 4
| Name | Description | Trajectory Horizon | Action Dimension | Context Dimension |
| ---------------------------------------- | ------------------------------------------------------------------------------ | ------------------ | ---------------- | ----------------- |
| `dm_control_ProMP/ball_in_cup-catch-v0`  | A ProMP wrapped version of the "catch" task for the "ball_in_cup" environment. | 1000               | 10               | 2                 |
| `dm_control_DMP/ball_in_cup-catch-v0`    | A DMP wrapped version of the "catch" task for the "ball_in_cup" environment.   | 1000               | 10               | 2                 |
| `dm_control_ProMP/reacher-easy-v0`       | A ProMP wrapped version of the "easy" task for the "reacher" environment.      | 1000               | 10               | 4                 |
| `dm_control_DMP/reacher-easy-v0`         | A DMP wrapped version of the "easy" task for the "reacher" environment.        | 1000               | 10               | 4                 |
| `dm_control_ProMP/reacher-hard-v0`       | A ProMP wrapped version of the "hard" task for the "reacher" environment.      | 1000               | 10               | 4                 |
| `dm_control_DMP/reacher-hard-v0` | A DMP wrapped version of the "hard" task for the "reacher" environment. | 1000 | 10 | 4 |
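One of the ids from the table can be instantiated like any other environment (sketch following the usage shown in the main README):

```python
import gymnasium as gym
import fancy_gym  # registers the dm_control MP variants

env = gym.make('dm_control_DMP/ball_in_cup-catch-v0')
observation, info = env.reset(seed=1)
observation, reward, terminated, truncated, info = env.step(env.action_space.sample())
```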


@ -1,245 +1,61 @@
from copy import deepcopy
from gymnasium.wrappers import FlattenObservation
from gymnasium.envs.registration import register
from ..envs.registry import register
from . import manipulation, suite
ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS = {"DMP": [], "ProMP": [], "ProDMP": []}
from gym.envs.registration import register
DEFAULT_BB_DICT_ProMP = {
"name": 'EnvName',
"wrappers": [],
"trajectory_generator_kwargs": {
'trajectory_generator_type': 'promp'
},
"phase_generator_kwargs": {
'phase_generator_type': 'linear'
},
"controller_kwargs": {
'controller_type': 'motor',
"p_gains": 50.,
"d_gains": 1.,
},
"basis_generator_kwargs": {
'basis_generator_type': 'zero_rbf',
'num_basis': 5,
'num_basis_zero_start': 1
}
}
DEFAULT_BB_DICT_DMP = {
"name": 'EnvName',
"wrappers": [],
"trajectory_generator_kwargs": {
'trajectory_generator_type': 'dmp'
},
"phase_generator_kwargs": {
'phase_generator_type': 'exp'
},
"controller_kwargs": {
'controller_type': 'motor',
"p_gains": 50.,
"d_gains": 1.,
},
"basis_generator_kwargs": {
'basis_generator_type': 'rbf',
'num_basis': 5
}
}
# DeepMind Control Suite (DMC)
kwargs_dict_bic_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
kwargs_dict_bic_dmp['name'] = f"dmc:ball_in_cup-catch"
kwargs_dict_bic_dmp['wrappers'].append(suite.ball_in_cup.MPWrapper)
# bandwidth_factor=2
kwargs_dict_bic_dmp['phase_generator_kwargs']['alpha_phase'] = 2
kwargs_dict_bic_dmp['trajectory_generator_kwargs']['weight_scale'] = 10 # TODO: weight scale 1, but goal scale 0.1
register(
id=f'dmc_ball_in_cup-catch_dmp-v0',
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_bic_dmp
id=f"dm_control/ball_in_cup-catch-v0",
register_step_based=False,
mp_wrapper=suite.ball_in_cup.MPWrapper,
add_mp_types=['DMP', 'ProMP'],
)
ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append("dmc_ball_in_cup-catch_dmp-v0")
kwargs_dict_bic_promp = deepcopy(DEFAULT_BB_DICT_DMP)
kwargs_dict_bic_promp['name'] = f"dmc:ball_in_cup-catch"
kwargs_dict_bic_promp['wrappers'].append(suite.ball_in_cup.MPWrapper)
register(
id=f'dmc_ball_in_cup-catch_promp-v0',
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_bic_promp
id=f"dm_control/reacher-easy-v0",
register_step_based=False,
mp_wrapper=suite.reacher.MPWrapper,
add_mp_types=['DMP', 'ProMP'],
)
ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append("dmc_ball_in_cup-catch_promp-v0")
kwargs_dict_reacher_easy_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
kwargs_dict_reacher_easy_dmp['name'] = f"dmc:reacher-easy"
kwargs_dict_reacher_easy_dmp['wrappers'].append(suite.reacher.MPWrapper)
# bandwidth_factor=2
kwargs_dict_reacher_easy_dmp['phase_generator_kwargs']['alpha_phase'] = 2
# TODO: weight scale 50, but goal scale 0.1
kwargs_dict_reacher_easy_dmp['trajectory_generator_kwargs']['weight_scale'] = 500
register(
id=f'dmc_reacher-easy_dmp-v0',
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_bic_dmp
id=f"dm_control/reacher-hard-v0",
register_step_based=False,
mp_wrapper=suite.reacher.MPWrapper,
add_mp_types=['DMP', 'ProMP'],
)
ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append("dmc_reacher-easy_dmp-v0")
kwargs_dict_reacher_easy_promp = deepcopy(DEFAULT_BB_DICT_DMP)
kwargs_dict_reacher_easy_promp['name'] = f"dmc:reacher-easy"
kwargs_dict_reacher_easy_promp['wrappers'].append(suite.reacher.MPWrapper)
kwargs_dict_reacher_easy_promp['trajectory_generator_kwargs']['weight_scale'] = 0.2
register(
id=f'dmc_reacher-easy_promp-v0',
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_reacher_easy_promp
)
ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append("dmc_reacher-easy_promp-v0")
kwargs_dict_reacher_hard_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
kwargs_dict_reacher_hard_dmp['name'] = f"dmc:reacher-hard"
kwargs_dict_reacher_hard_dmp['wrappers'].append(suite.reacher.MPWrapper)
# bandwidth_factor = 2
kwargs_dict_reacher_hard_dmp['phase_generator_kwargs']['alpha_phase'] = 2
# TODO: weight scale 50, but goal scale 0.1
kwargs_dict_reacher_hard_dmp['trajectory_generator_kwargs']['weight_scale'] = 500
register(
id=f'dmc_reacher-hard_dmp-v0',
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_reacher_hard_dmp
)
ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append("dmc_reacher-hard_dmp-v0")
kwargs_dict_reacher_hard_promp = deepcopy(DEFAULT_BB_DICT_DMP)
kwargs_dict_reacher_hard_promp['name'] = f"dmc:reacher-hard"
kwargs_dict_reacher_hard_promp['wrappers'].append(suite.reacher.MPWrapper)
kwargs_dict_reacher_hard_promp['trajectory_generator_kwargs']['weight_scale'] = 0.2
register(
id=f'dmc_reacher-hard_promp-v0',
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_reacher_hard_promp
)
ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append("dmc_reacher-hard_promp-v0")
_dmc_cartpole_tasks = ["balance", "balance_sparse", "swingup", "swingup_sparse"]
for _task in _dmc_cartpole_tasks:
_env_id = f'dmc_cartpole-{_task}_dmp-v0'
kwargs_dict_cartpole_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
kwargs_dict_cartpole_dmp['name'] = f"dmc:cartpole-{_task}"
kwargs_dict_cartpole_dmp['wrappers'].append(suite.cartpole.MPWrapper)
# bandwidth_factor = 2
kwargs_dict_cartpole_dmp['phase_generator_kwargs']['alpha_phase'] = 2
# TODO: weight scale 50, but goal scale 0.1
kwargs_dict_cartpole_dmp['trajectory_generator_kwargs']['weight_scale'] = 500
kwargs_dict_cartpole_dmp['controller_kwargs']['p_gains'] = 10
kwargs_dict_cartpole_dmp['controller_kwargs']['d_gains'] = 10
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_cartpole_dmp
id=f'dm_control/cartpole-{_task}-v0',
register_step_based=False,
mp_wrapper=suite.cartpole.MPWrapper,
add_mp_types=['DMP', 'ProMP'],
)
ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id)
_env_id = f'dmc_cartpole-{_task}_promp-v0'
kwargs_dict_cartpole_promp = deepcopy(DEFAULT_BB_DICT_DMP)
kwargs_dict_cartpole_promp['name'] = f"dmc:cartpole-{_task}"
kwargs_dict_cartpole_promp['wrappers'].append(suite.cartpole.MPWrapper)
kwargs_dict_cartpole_promp['controller_kwargs']['p_gains'] = 10
kwargs_dict_cartpole_promp['controller_kwargs']['d_gains'] = 10
kwargs_dict_cartpole_promp['trajectory_generator_kwargs']['weight_scale'] = 0.2
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_cartpole_promp
)
ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
kwargs_dict_cartpole2poles_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
kwargs_dict_cartpole2poles_dmp['name'] = f"dmc:cartpole-two_poles"
kwargs_dict_cartpole2poles_dmp['wrappers'].append(suite.cartpole.TwoPolesMPWrapper)
# bandwidth_factor = 2
kwargs_dict_cartpole2poles_dmp['phase_generator_kwargs']['alpha_phase'] = 2
# TODO: weight scale 50, but goal scale 0.1
kwargs_dict_cartpole2poles_dmp['trajectory_generator_kwargs']['weight_scale'] = 500
kwargs_dict_cartpole2poles_dmp['controller_kwargs']['p_gains'] = 10
kwargs_dict_cartpole2poles_dmp['controller_kwargs']['d_gains'] = 10
_env_id = f'dmc_cartpole-two_poles_dmp-v0'
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_cartpole2poles_dmp
id=f"dm_control/cartpole-two_poles-v0",
register_step_based=False,
mp_wrapper=suite.cartpole.TwoPolesMPWrapper,
add_mp_types=['DMP', 'ProMP'],
)
ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id)
kwargs_dict_cartpole2poles_promp = deepcopy(DEFAULT_BB_DICT_DMP)
kwargs_dict_cartpole2poles_promp['name'] = f"dmc:cartpole-two_poles"
kwargs_dict_cartpole2poles_promp['wrappers'].append(suite.cartpole.TwoPolesMPWrapper)
kwargs_dict_cartpole2poles_promp['controller_kwargs']['p_gains'] = 10
kwargs_dict_cartpole2poles_promp['controller_kwargs']['d_gains'] = 10
kwargs_dict_cartpole2poles_promp['trajectory_generator_kwargs']['weight_scale'] = 0.2
_env_id = f'dmc_cartpole-two_poles_promp-v0'
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_cartpole2poles_promp
id=f"dm_control/cartpole-three_poles-v0",
register_step_based=False,
mp_wrapper=suite.cartpole.ThreePolesMPWrapper,
add_mp_types=['DMP', 'ProMP'],
)
ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
kwargs_dict_cartpole3poles_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
kwargs_dict_cartpole3poles_dmp['name'] = f"dmc:cartpole-three_poles"
kwargs_dict_cartpole3poles_dmp['wrappers'].append(suite.cartpole.ThreePolesMPWrapper)
# bandwidth_factor = 2
kwargs_dict_cartpole3poles_dmp['phase_generator_kwargs']['alpha_phase'] = 2
# TODO: weight scale 50, but goal scale 0.1
kwargs_dict_cartpole3poles_dmp['trajectory_generator_kwargs']['weight_scale'] = 500
kwargs_dict_cartpole3poles_dmp['controller_kwargs']['p_gains'] = 10
kwargs_dict_cartpole3poles_dmp['controller_kwargs']['d_gains'] = 10
_env_id = f'dmc_cartpole-three_poles_dmp-v0'
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_cartpole3poles_dmp
)
ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id)
kwargs_dict_cartpole3poles_promp = deepcopy(DEFAULT_BB_DICT_DMP)
kwargs_dict_cartpole3poles_promp['name'] = f"dmc:cartpole-three_poles"
kwargs_dict_cartpole3poles_promp['wrappers'].append(suite.cartpole.ThreePolesMPWrapper)
kwargs_dict_cartpole3poles_promp['controller_kwargs']['p_gains'] = 10
kwargs_dict_cartpole3poles_promp['controller_kwargs']['d_gains'] = 10
kwargs_dict_cartpole3poles_promp['trajectory_generator_kwargs']['weight_scale'] = 0.2
_env_id = f'dmc_cartpole-three_poles_promp-v0'
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_cartpole3poles_promp
)
ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
# DeepMind Manipulation
kwargs_dict_mani_reach_site_features_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
kwargs_dict_mani_reach_site_features_dmp['name'] = f"dmc:manipulation-reach_site_features"
kwargs_dict_mani_reach_site_features_dmp['wrappers'].append(manipulation.reach_site.MPWrapper)
kwargs_dict_mani_reach_site_features_dmp['phase_generator_kwargs']['alpha_phase'] = 2
# TODO: weight scale 50, but goal scale 0.1
kwargs_dict_mani_reach_site_features_dmp['trajectory_generator_kwargs']['weight_scale'] = 500
kwargs_dict_mani_reach_site_features_dmp['controller_kwargs']['controller_type'] = 'velocity'
register(
id=f'dmc_manipulation-reach_site_dmp-v0',
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_mani_reach_site_features_dmp
id=f"dm_control/reach_site_features-v0",
register_step_based=False,
mp_wrapper=manipulation.reach_site.MPWrapper,
add_mp_types=['DMP', 'ProMP'],
)
ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append("dmc_manipulation-reach_site_dmp-v0")
kwargs_dict_mani_reach_site_features_promp = deepcopy(DEFAULT_BB_DICT_DMP)
kwargs_dict_mani_reach_site_features_promp['name'] = f"dmc:manipulation-reach_site_features"
kwargs_dict_mani_reach_site_features_promp['wrappers'].append(manipulation.reach_site.MPWrapper)
kwargs_dict_mani_reach_site_features_promp['trajectory_generator_kwargs']['weight_scale'] = 0.2
kwargs_dict_mani_reach_site_features_promp['controller_kwargs']['controller_type'] = 'velocity'
register(
id=f'dmc_manipulation-reach_site_promp-v0',
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_mani_reach_site_features_promp
)
ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append("dmc_manipulation-reach_site_promp-v0")


@ -1,186 +0,0 @@
# Adopted from: https://github.com/denisyarats/dmc2gym/blob/master/dmc2gym/wrappers.py
# License: MIT
# Copyright (c) 2020 Denis Yarats
import collections
from collections.abc import MutableMapping
from typing import Any, Dict, Tuple, Optional, Union, Callable
import gym
import numpy as np
from dm_control import composer
from dm_control.rl import control
from dm_env import specs
from gym import spaces
from gym.core import ObsType
def _spec_to_box(spec):
def extract_min_max(s):
assert s.dtype == np.float64 or s.dtype == np.float32, \
f"Only float64 and float32 types are allowed, instead {s.dtype} was found"
dim = int(np.prod(s.shape))
if type(s) == specs.Array:
bound = np.inf * np.ones(dim, dtype=s.dtype)
return -bound, bound
elif type(s) == specs.BoundedArray:
zeros = np.zeros(dim, dtype=s.dtype)
return s.minimum + zeros, s.maximum + zeros
mins, maxs = [], []
for s in spec:
mn, mx = extract_min_max(s)
mins.append(mn)
maxs.append(mx)
low = np.concatenate(mins, axis=0)
high = np.concatenate(maxs, axis=0)
assert low.shape == high.shape
return spaces.Box(low, high, dtype=s.dtype)
def _flatten_obs(obs: MutableMapping):
"""
Flattens an observation of type MutableMapping, e.g. a dict to a 1D array.
Args:
obs: observation to flatten
Returns: 1D array of observation
"""
if not isinstance(obs, MutableMapping):
raise ValueError(f'Requires dict-like observations structure. {type(obs)} found.')
# Keep key order consistent for non OrderedDicts
keys = obs.keys() if isinstance(obs, collections.OrderedDict) else sorted(obs.keys())
obs_vals = [np.array([obs[key]]) if np.isscalar(obs[key]) else obs[key].ravel() for key in keys]
return np.concatenate(obs_vals)
class DMCWrapper(gym.Env):
def __init__(self,
env: Callable[[], Union[composer.Environment, control.Environment]],
):
# TODO: Currently this is required to be a function because dmc does not allow to copy composers environments
self._env = env()
# action and observation space
self._action_space = _spec_to_box([self._env.action_spec()])
self._observation_space = _spec_to_box(self._env.observation_spec().values())
self._window = None
self.id = 'dmc'
def __getattr__(self, item):
"""Propagate only non-existent properties to wrapped env."""
if item.startswith('_'):
raise AttributeError("attempted to get missing private attribute '{}'".format(item))
if item in self.__dict__:
return getattr(self, item)
return getattr(self._env, item)
def _get_obs(self, time_step):
obs = _flatten_obs(time_step.observation).astype(self.observation_space.dtype)
return obs
@property
def observation_space(self):
return self._observation_space
@property
def action_space(self):
return self._action_space
@property
def dt(self):
return self._env.control_timestep()
def seed(self, seed=None):
self._action_space.seed(seed)
self._observation_space.seed(seed)
def step(self, action) -> Tuple[np.ndarray, float, bool, Dict[str, Any]]:
assert self._action_space.contains(action)
extra = {'internal_state': self._env.physics.get_state().copy()}
time_step = self._env.step(action)
reward = time_step.reward or 0.
done = time_step.last()
obs = self._get_obs(time_step)
extra['discount'] = time_step.discount
return obs, reward, done, extra
def reset(self, *, seed: Optional[int] = None, return_info: bool = False,
options: Optional[dict] = None, ) -> Union[ObsType, Tuple[ObsType, dict]]:
time_step = self._env.reset()
obs = self._get_obs(time_step)
return obs
def render(self, mode='rgb_array', height=240, width=320, camera_id=-1, overlays=(), depth=False,
segmentation=False, scene_option=None, render_flag_overrides=None):
# assert mode == 'rgb_array', 'only support rgb_array mode, given %s' % mode
if mode == "rgb_array":
return self._env.physics.render(height=height, width=width, camera_id=camera_id, overlays=overlays,
depth=depth, segmentation=segmentation, scene_option=scene_option,
render_flag_overrides=render_flag_overrides)
# Render max available buffer size. Larger is only possible by altering the XML.
img = self._env.physics.render(height=self._env.physics.model.vis.global_.offheight,
width=self._env.physics.model.vis.global_.offwidth,
camera_id=camera_id, overlays=overlays, depth=depth, segmentation=segmentation,
scene_option=scene_option, render_flag_overrides=render_flag_overrides)
if depth:
img = np.dstack([img.astype(np.uint8)] * 3)
if mode == 'human':
try:
import cv2
if self._window is None:
self._window = cv2.namedWindow(self.id, cv2.WINDOW_AUTOSIZE)
cv2.imshow(self.id, img[..., ::-1]) # Image in BGR
cv2.waitKey(1)
except ImportError:
raise gym.error.DependencyNotInstalled("Rendering requires opencv. Run `pip install opencv-python`")
# PYGAME seems to destroy some global rendering configs from the physics render
# except ImportError:
# import pygame
# img_copy = img.copy().transpose((1, 0, 2))
# if self._window is None:
# pygame.init()
# pygame.display.init()
# self._window = pygame.display.set_mode(img_copy.shape[:2])
# self.clock = pygame.time.Clock()
#
# surf = pygame.surfarray.make_surface(img_copy)
# self._window.blit(surf, (0, 0))
# pygame.event.pump()
# self.clock.tick(30)
# pygame.display.flip()
def close(self):
super().close()
if self._window is not None:
try:
import cv2
cv2.destroyWindow(self.id)
except ImportError:
import pygame
pygame.display.quit()
pygame.quit()
@property
def reward_range(self) -> Tuple[float, float]:
reward_spec = self._env.reward_spec()
if isinstance(reward_spec, specs.BoundedArray):
return reward_spec.minimum, reward_spec.maximum
return -float('inf'), float('inf')
@property
def metadata(self):
return {'render.modes': ['human', 'rgb_array'],
'video.frames_per_second': round(1.0 / self._env.control_timestep())}


@ -6,6 +6,28 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper):
mp_config = {
'ProMP': {
'controller_kwargs': {
'p_gains': 50.0,
},
'trajectory_generator_kwargs': {
'weights_scale': 0.2,
},
},
'DMP': {
'controller_kwargs': {
'p_gains': 50.0,
},
'phase_generator': {
'alpha_phase': 2,
},
'trajectory_generator_kwargs': {
'weights_scale': 500,
},
},
'ProDMP': {},
}
@property
def context_mask(self) -> np.ndarray:
@ -35,4 +57,4 @@ class MPWrapper(RawInterfaceWrapper):
@property
def dt(self) -> Union[float, int]:
return self.env.dt
return self.env.control_timestep()


@ -6,6 +6,25 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper):
mp_config = {
'ProMP': {
'controller_kwargs': {
'p_gains': 50.0,
},
},
'DMP': {
'controller_kwargs': {
'p_gains': 50.0,
},
'phase_generator': {
'alpha_phase': 2,
},
'trajectory_generator_kwargs': {
'weights_scale': 10
},
},
'ProDMP': {},
}
@property
def context_mask(self) -> np.ndarray:
@ -31,4 +50,4 @@ class MPWrapper(RawInterfaceWrapper):
@property
def dt(self) -> Union[float, int]:
return self.env.dt
return self.env.control_timestep()


@ -6,6 +6,30 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper):
mp_config = {
'ProMP': {
'controller_kwargs': {
'p_gains': 10,
'd_gains': 10,
},
'trajectory_generator_kwargs': {
'weights_scale': 0.2,
},
},
'DMP': {
'controller_kwargs': {
'p_gains': 10,
'd_gains': 10,
},
'phase_generator': {
'alpha_phase': 2,
},
'trajectory_generator_kwargs': {
'weights_scale': 500,
},
},
'ProDMP': {},
}
def __init__(self, env, n_poles: int = 1):
self.n_poles = n_poles
@ -35,7 +59,7 @@ class MPWrapper(RawInterfaceWrapper):
@property
def dt(self) -> Union[float, int]:
return self.env.dt
return self.env.control_timestep()
class TwoPolesMPWrapper(MPWrapper):


@ -6,6 +6,30 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper):
mp_config = {
'ProMP': {
'controller_kwargs': {
'p_gains': 50.0,
'd_gains': 1.0,
},
'trajectory_generator_kwargs': {
'weights_scale': 0.2,
},
},
'DMP': {
'controller_kwargs': {
'p_gains': 50.0,
'd_gains': 1.0,
},
'phase_generator': {
'alpha_phase': 2,
},
'trajectory_generator_kwargs': {
'weights_scale': 500,
},
},
'ProDMP': {},
}
@property
def context_mask(self) -> np.ndarray:
@ -30,4 +54,4 @@ class MPWrapper(RawInterfaceWrapper):
@property
def dt(self) -> Union[float, int]:
return self.env.dt
return self.env.control_timestep()


@ -1,103 +1,43 @@
from copy import deepcopy
import numpy as np
from gym import register
from gymnasium import register as gym_register
from .registry import register, upgrade
from . import classic_control, mujoco
from .classic_control.hole_reacher.hole_reacher import HoleReacherEnv
from .classic_control.simple_reacher.simple_reacher import SimpleReacherEnv
from .classic_control.simple_reacher import MPWrapper as MPWrapper_SimpleReacher
from .classic_control.hole_reacher.hole_reacher import HoleReacherEnv
from .classic_control.hole_reacher import MPWrapper as MPWrapper_HoleReacher
from .classic_control.viapoint_reacher.viapoint_reacher import ViaPointReacherEnv
from .classic_control.viapoint_reacher import MPWrapper as MPWrapper_ViaPointReacher
from .mujoco.reacher.reacher import ReacherEnv, MAX_EPISODE_STEPS_REACHER
from .mujoco.reacher.mp_wrapper import MPWrapper as MPWrapper_Reacher
from .mujoco.ant_jump.ant_jump import MAX_EPISODE_STEPS_ANTJUMP
from .mujoco.beerpong.beerpong import MAX_EPISODE_STEPS_BEERPONG, FIXED_RELEASE_STEP
from .mujoco.beerpong.mp_wrapper import MPWrapper as MPWrapper_Beerpong
from .mujoco.beerpong.mp_wrapper import MPWrapper_FixedRelease as MPWrapper_Beerpong_FixedRelease
from .mujoco.half_cheetah_jump.half_cheetah_jump import MAX_EPISODE_STEPS_HALFCHEETAHJUMP
from .mujoco.hopper_jump.hopper_jump import MAX_EPISODE_STEPS_HOPPERJUMP
from .mujoco.hopper_jump.hopper_jump_on_box import MAX_EPISODE_STEPS_HOPPERJUMPONBOX
from .mujoco.hopper_throw.hopper_throw import MAX_EPISODE_STEPS_HOPPERTHROW
from .mujoco.hopper_throw.hopper_throw_in_basket import MAX_EPISODE_STEPS_HOPPERTHROWINBASKET
from .mujoco.reacher.reacher import ReacherEnv, MAX_EPISODE_STEPS_REACHER
from .mujoco.walker_2d_jump.walker_2d_jump import MAX_EPISODE_STEPS_WALKERJUMP
from .mujoco.box_pushing.box_pushing_env import BoxPushingDense, BoxPushingTemporalSparse, \
BoxPushingTemporalSpatialSparse, MAX_EPISODE_STEPS_BOX_PUSHING
BoxPushingTemporalSpatialSparse, MAX_EPISODE_STEPS_BOX_PUSHING
from .mujoco.table_tennis.table_tennis_env import TableTennisEnv, TableTennisWind, TableTennisGoalSwitching, \
MAX_EPISODE_STEPS_TABLE_TENNIS
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS = {"DMP": [], "ProMP": [], "ProDMP": []}
DEFAULT_BB_DICT_ProMP = {
"name": 'EnvName',
"wrappers": [],
"trajectory_generator_kwargs": {
'trajectory_generator_type': 'promp'
},
"phase_generator_kwargs": {
'phase_generator_type': 'linear'
},
"controller_kwargs": {
'controller_type': 'motor',
"p_gains": 1.0,
"d_gains": 0.1,
},
"basis_generator_kwargs": {
'basis_generator_type': 'zero_rbf',
'num_basis': 5,
'num_basis_zero_start': 1,
'basis_bandwidth_factor': 3.0,
},
"black_box_kwargs": {
}
}
DEFAULT_BB_DICT_DMP = {
"name": 'EnvName',
"wrappers": [],
"trajectory_generator_kwargs": {
'trajectory_generator_type': 'dmp'
},
"phase_generator_kwargs": {
'phase_generator_type': 'exp'
},
"controller_kwargs": {
'controller_type': 'motor',
"p_gains": 1.0,
"d_gains": 0.1,
},
"basis_generator_kwargs": {
'basis_generator_type': 'rbf',
'num_basis': 5
}
}
DEFAULT_BB_DICT_ProDMP = {
"name": 'EnvName',
"wrappers": [],
"trajectory_generator_kwargs": {
'trajectory_generator_type': 'prodmp',
'duration': 2.0,
'weights_scale': 1.0,
},
"phase_generator_kwargs": {
'phase_generator_type': 'exp',
'tau': 1.5,
},
"controller_kwargs": {
'controller_type': 'motor',
"p_gains": 1.0,
"d_gains": 0.1,
},
"basis_generator_kwargs": {
'basis_generator_type': 'prodmp',
'alpha': 10,
'num_basis': 5,
},
"black_box_kwargs": {
}
}
MAX_EPISODE_STEPS_TABLE_TENNIS
from .mujoco.table_tennis.mp_wrapper import TT_MPWrapper as MPWrapper_TableTennis
from .mujoco.table_tennis.mp_wrapper import TT_MPWrapper_Replan as MPWrapper_TableTennis_Replan
from .mujoco.table_tennis.mp_wrapper import TTVelObs_MPWrapper as MPWrapper_TableTennis_VelObs
from .mujoco.table_tennis.mp_wrapper import TTVelObs_MPWrapper_Replan as MPWrapper_TableTennis_VelObs_Replan
# Classic Control
## Simple Reacher
# Simple Reacher
register(
id='SimpleReacher-v0',
entry_point='fancy_gym.envs.classic_control:SimpleReacherEnv',
id='fancy/SimpleReacher-v0',
entry_point=SimpleReacherEnv,
mp_wrapper=MPWrapper_SimpleReacher,
max_episode_steps=200,
kwargs={
"n_links": 2,
@ -105,19 +45,20 @@ register(
)
register(
id='LongSimpleReacher-v0',
entry_point='fancy_gym.envs.classic_control:SimpleReacherEnv',
id='fancy/LongSimpleReacher-v0',
entry_point=SimpleReacherEnv,
mp_wrapper=MPWrapper_SimpleReacher,
max_episode_steps=200,
kwargs={
"n_links": 5,
}
)
## Viapoint Reacher
# Viapoint Reacher
register(
id='ViaPointReacher-v0',
entry_point='fancy_gym.envs.classic_control:ViaPointReacherEnv',
id='fancy/ViaPointReacher-v0',
entry_point=ViaPointReacherEnv,
mp_wrapper=MPWrapper_ViaPointReacher,
max_episode_steps=200,
kwargs={
"n_links": 5,
@ -126,10 +67,11 @@ register(
}
)
## Hole Reacher
# Hole Reacher
register(
id='HoleReacher-v0',
entry_point='fancy_gym.envs.classic_control:HoleReacherEnv',
id='fancy/HoleReacher-v0',
entry_point=HoleReacherEnv,
mp_wrapper=MPWrapper_HoleReacher,
max_episode_steps=200,
kwargs={
"n_links": 5,
@ -145,31 +87,35 @@ register(
# Mujoco
## Mujoco Reacher
for _dims in [5, 7]:
# Mujoco Reacher
for dims in [5, 7]:
register(
id=f'Reacher{_dims}d-v0',
entry_point='fancy_gym.envs.mujoco:ReacherEnv',
id=f'fancy/Reacher{dims}d-v0',
entry_point=ReacherEnv,
mp_wrapper=MPWrapper_Reacher,
max_episode_steps=MAX_EPISODE_STEPS_REACHER,
kwargs={
"n_links": _dims,
"n_links": dims,
}
)
register(
id=f'Reacher{_dims}dSparse-v0',
entry_point='fancy_gym.envs.mujoco:ReacherEnv',
id=f'fancy/Reacher{dims}dSparse-v0',
entry_point=ReacherEnv,
mp_wrapper=MPWrapper_Reacher,
max_episode_steps=MAX_EPISODE_STEPS_REACHER,
kwargs={
"sparse": True,
'reward_weight': 200,
"n_links": _dims,
"n_links": dims,
}
)
register(
id='HopperJumpSparse-v0',
id='fancy/HopperJumpSparse-v0',
entry_point='fancy_gym.envs.mujoco:HopperJumpEnv',
mp_wrapper=mujoco.hopper_jump.MPWrapper,
max_episode_steps=MAX_EPISODE_STEPS_HOPPERJUMP,
kwargs={
"sparse": True,
@ -177,8 +123,9 @@ register(
)
register(
id='HopperJump-v0',
id='fancy/HopperJump-v0',
entry_point='fancy_gym.envs.mujoco:HopperJumpEnv',
mp_wrapper=mujoco.hopper_jump.MPWrapper,
max_episode_steps=MAX_EPISODE_STEPS_HOPPERJUMP,
kwargs={
"sparse": False,
@ -188,76 +135,117 @@ register(
}
)
# TODO: Add [MPs] later when finished (old TODO I moved here during refactor)
register(
id='AntJump-v0',
id='fancy/AntJump-v0',
entry_point='fancy_gym.envs.mujoco:AntJumpEnv',
max_episode_steps=MAX_EPISODE_STEPS_ANTJUMP,
add_mp_types=[],
)
register(
id='HalfCheetahJump-v0',
id='fancy/HalfCheetahJump-v0',
entry_point='fancy_gym.envs.mujoco:HalfCheetahJumpEnv',
max_episode_steps=MAX_EPISODE_STEPS_HALFCHEETAHJUMP,
add_mp_types=[],
)
register(
id='HopperJumpOnBox-v0',
id='fancy/HopperJumpOnBox-v0',
entry_point='fancy_gym.envs.mujoco:HopperJumpOnBoxEnv',
max_episode_steps=MAX_EPISODE_STEPS_HOPPERJUMPONBOX,
add_mp_types=[],
)
register(
id='HopperThrow-v0',
id='fancy/HopperThrow-v0',
entry_point='fancy_gym.envs.mujoco:HopperThrowEnv',
max_episode_steps=MAX_EPISODE_STEPS_HOPPERTHROW,
add_mp_types=[],
)
register(
id='HopperThrowInBasket-v0',
id='fancy/HopperThrowInBasket-v0',
entry_point='fancy_gym.envs.mujoco:HopperThrowInBasketEnv',
max_episode_steps=MAX_EPISODE_STEPS_HOPPERTHROWINBASKET,
add_mp_types=[],
)
register(
id='Walker2DJump-v0',
id='fancy/Walker2DJump-v0',
entry_point='fancy_gym.envs.mujoco:Walker2dJumpEnv',
max_episode_steps=MAX_EPISODE_STEPS_WALKERJUMP,
add_mp_types=[],
)
register( # [MPDone
id='fancy/BeerPong-v0',
entry_point='fancy_gym.envs.mujoco:BeerPongEnv',
mp_wrapper=MPWrapper_Beerpong,
max_episode_steps=MAX_EPISODE_STEPS_BEERPONG,
add_mp_types=['ProMP'],
)
# Here we use the same reward as in BeerPong-v0, but only consider one time step after the release,
# i.e. we simulate until the end of the episode
register(
id='fancy/BeerPongStepBased-v0',
entry_point='fancy_gym.envs.mujoco:BeerPongEnvStepBasedEpisodicReward',
mp_wrapper=MPWrapper_Beerpong_FixedRelease,
max_episode_steps=FIXED_RELEASE_STEP,
add_mp_types=['ProMP'],
)
register(
id='BeerPong-v0',
id='fancy/BeerPongFixedRelease-v0',
entry_point='fancy_gym.envs.mujoco:BeerPongEnv',
max_episode_steps=MAX_EPISODE_STEPS_BEERPONG,
mp_wrapper=MPWrapper_Beerpong_FixedRelease,
max_episode_steps=FIXED_RELEASE_STEP,
add_mp_types=['ProMP'],
)
# Box pushing environments with different rewards
for reward_type in ["Dense", "TemporalSparse", "TemporalSpatialSparse"]:
register(
id='BoxPushing{}-v0'.format(reward_type),
id='fancy/BoxPushing{}-v0'.format(reward_type),
entry_point='fancy_gym.envs.mujoco:BoxPushing{}'.format(reward_type),
mp_wrapper=mujoco.box_pushing.MPWrapper,
max_episode_steps=MAX_EPISODE_STEPS_BOX_PUSHING,
)
register(
id='BoxPushingRandomInit{}-v0'.format(reward_type),
id='fancy/BoxPushingRandomInit{}-v0'.format(reward_type),
entry_point='fancy_gym.envs.mujoco:BoxPushing{}'.format(reward_type),
mp_wrapper=mujoco.box_pushing.MPWrapper,
max_episode_steps=MAX_EPISODE_STEPS_BOX_PUSHING,
kwargs={"random_init": True}
)
# Here we use the same reward as in BeerPong-v0, but now consider after the release,
# only one time step, i.e. we simulate until the end of th episode
register(
id='BeerPongStepBased-v0',
entry_point='fancy_gym.envs.mujoco:BeerPongEnvStepBasedEpisodicReward',
max_episode_steps=FIXED_RELEASE_STEP,
)
upgrade(
id='fancy/BoxPushing{}Replan-v0'.format(reward_type),
base_id='fancy/BoxPushing{}-v0'.format(reward_type),
mp_wrapper=mujoco.box_pushing.ReplanMPWrapper,
)
# Table Tennis environments
for ctxt_dim in [2, 4]:
register(
id='TableTennis{}D-v0'.format(ctxt_dim),
id='fancy/TableTennis{}D-v0'.format(ctxt_dim),
entry_point='fancy_gym.envs.mujoco:TableTennisEnv',
mp_wrapper=MPWrapper_TableTennis,
max_episode_steps=MAX_EPISODE_STEPS_TABLE_TENNIS,
add_mp_types=['ProMP', 'ProDMP'],
kwargs={
"ctxt_dim": ctxt_dim,
'frame_skip': 4,
}
)
register(
id='fancy/TableTennis{}DReplan-v0'.format(ctxt_dim),
entry_point='fancy_gym.envs.mujoco:TableTennisEnv',
mp_wrapper=MPWrapper_TableTennis,
max_episode_steps=MAX_EPISODE_STEPS_TABLE_TENNIS,
add_mp_types=['ProDMP'],
kwargs={
"ctxt_dim": ctxt_dim,
'frame_skip': 4,
@ -265,626 +253,39 @@ for ctxt_dim in [2, 4]:
)
register(
id='TableTennisWind-v0',
id='fancy/TableTennisWind-v0',
entry_point='fancy_gym.envs.mujoco:TableTennisWind',
mp_wrapper=MPWrapper_TableTennis_VelObs,
add_mp_types=['ProMP', 'ProDMP'],
max_episode_steps=MAX_EPISODE_STEPS_TABLE_TENNIS,
)
register(
id='TableTennisGoalSwitching-v0',
id='fancy/TableTennisWindReplan-v0',
entry_point='fancy_gym.envs.mujoco:TableTennisWind',
mp_wrapper=MPWrapper_TableTennis_VelObs_Replan,
add_mp_types=['ProDMP'],
max_episode_steps=MAX_EPISODE_STEPS_TABLE_TENNIS,
)
register(
id='fancy/TableTennisGoalSwitching-v0',
entry_point='fancy_gym.envs.mujoco:TableTennisGoalSwitching',
mp_wrapper=MPWrapper_TableTennis,
add_mp_types=['ProMP', 'ProDMP'],
max_episode_steps=MAX_EPISODE_STEPS_TABLE_TENNIS,
kwargs={
'goal_switching_step': 99
}
)
# movement Primitive Environments
## Simple Reacher
_versions = ["SimpleReacher-v0", "LongSimpleReacher-v0"]
for _v in _versions:
_name = _v.split("-")
_env_id = f'{_name[0]}DMP-{_name[1]}'
kwargs_dict_simple_reacher_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
kwargs_dict_simple_reacher_dmp['wrappers'].append(classic_control.simple_reacher.MPWrapper)
kwargs_dict_simple_reacher_dmp['controller_kwargs']['p_gains'] = 0.6
kwargs_dict_simple_reacher_dmp['controller_kwargs']['d_gains'] = 0.075
kwargs_dict_simple_reacher_dmp['trajectory_generator_kwargs']['weight_scale'] = 50
kwargs_dict_simple_reacher_dmp['phase_generator_kwargs']['alpha_phase'] = 2
kwargs_dict_simple_reacher_dmp['name'] = f"{_v}"
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_simple_reacher_dmp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id)
_env_id = f'{_name[0]}ProMP-{_name[1]}'
kwargs_dict_simple_reacher_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
kwargs_dict_simple_reacher_promp['wrappers'].append(classic_control.simple_reacher.MPWrapper)
kwargs_dict_simple_reacher_promp['controller_kwargs']['p_gains'] = 0.6
kwargs_dict_simple_reacher_promp['controller_kwargs']['d_gains'] = 0.075
kwargs_dict_simple_reacher_promp['name'] = _v
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_simple_reacher_promp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
# Viapoint reacher
kwargs_dict_via_point_reacher_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
kwargs_dict_via_point_reacher_dmp['wrappers'].append(classic_control.viapoint_reacher.MPWrapper)
kwargs_dict_via_point_reacher_dmp['controller_kwargs']['controller_type'] = 'velocity'
kwargs_dict_via_point_reacher_dmp['trajectory_generator_kwargs']['weight_scale'] = 50
kwargs_dict_via_point_reacher_dmp['phase_generator_kwargs']['alpha_phase'] = 2
kwargs_dict_via_point_reacher_dmp['name'] = "ViaPointReacher-v0"
register(
id='ViaPointReacherDMP-v0',
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
# max_episode_steps=1,
kwargs=kwargs_dict_via_point_reacher_dmp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append("ViaPointReacherDMP-v0")
kwargs_dict_via_point_reacher_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
kwargs_dict_via_point_reacher_promp['wrappers'].append(classic_control.viapoint_reacher.MPWrapper)
kwargs_dict_via_point_reacher_promp['controller_kwargs']['controller_type'] = 'velocity'
kwargs_dict_via_point_reacher_promp['name'] = "ViaPointReacher-v0"
register(
id="ViaPointReacherProMP-v0",
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_via_point_reacher_promp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append("ViaPointReacherProMP-v0")
## Hole Reacher
_versions = ["HoleReacher-v0"]
for _v in _versions:
_name = _v.split("-")
_env_id = f'{_name[0]}DMP-{_name[1]}'
kwargs_dict_hole_reacher_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
kwargs_dict_hole_reacher_dmp['wrappers'].append(classic_control.hole_reacher.MPWrapper)
kwargs_dict_hole_reacher_dmp['controller_kwargs']['controller_type'] = 'velocity'
# TODO: Before it was weight scale 50 and goal scale 0.1. We now only have weight scale and thus set it to 500. Check
kwargs_dict_hole_reacher_dmp['trajectory_generator_kwargs']['weight_scale'] = 500
kwargs_dict_hole_reacher_dmp['phase_generator_kwargs']['alpha_phase'] = 2.5
kwargs_dict_hole_reacher_dmp['name'] = _v
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
# max_episode_steps=1,
kwargs=kwargs_dict_hole_reacher_dmp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id)
_env_id = f'{_name[0]}ProMP-{_name[1]}'
kwargs_dict_hole_reacher_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
kwargs_dict_hole_reacher_promp['wrappers'].append(classic_control.hole_reacher.MPWrapper)
kwargs_dict_hole_reacher_promp['trajectory_generator_kwargs']['weight_scale'] = 2
kwargs_dict_hole_reacher_promp['controller_kwargs']['controller_type'] = 'velocity'
kwargs_dict_hole_reacher_promp['name'] = f"{_v}"
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_hole_reacher_promp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
## ReacherNd
_versions = ["Reacher5d-v0", "Reacher7d-v0", "Reacher5dSparse-v0", "Reacher7dSparse-v0"]
for _v in _versions:
_name = _v.split("-")
_env_id = f'{_name[0]}DMP-{_name[1]}'
kwargs_dict_reacher_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
kwargs_dict_reacher_dmp['wrappers'].append(mujoco.reacher.MPWrapper)
kwargs_dict_reacher_dmp['phase_generator_kwargs']['alpha_phase'] = 2
kwargs_dict_reacher_dmp['name'] = _v
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
# max_episode_steps=1,
kwargs=kwargs_dict_reacher_dmp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id)
_env_id = f'{_name[0]}ProMP-{_name[1]}'
kwargs_dict_reacher_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
kwargs_dict_reacher_promp['wrappers'].append(mujoco.reacher.MPWrapper)
kwargs_dict_reacher_promp['name'] = _v
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_reacher_promp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
########################################################################################################################
## Beerpong ProMP
_versions = ['BeerPong-v0']
for _v in _versions:
_name = _v.split("-")
_env_id = f'{_name[0]}ProMP-{_name[1]}'
kwargs_dict_bp_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
kwargs_dict_bp_promp['wrappers'].append(mujoco.beerpong.MPWrapper)
kwargs_dict_bp_promp['phase_generator_kwargs']['learn_tau'] = True
kwargs_dict_bp_promp['controller_kwargs']['p_gains'] = np.array([1.5, 5, 2.55, 3, 2., 2, 1.25])
kwargs_dict_bp_promp['controller_kwargs']['d_gains'] = np.array([0.02333333, 0.1, 0.0625, 0.08, 0.03, 0.03, 0.0125])
kwargs_dict_bp_promp['basis_generator_kwargs']['num_basis'] = 2
kwargs_dict_bp_promp['basis_generator_kwargs']['num_basis_zero_start'] = 2
kwargs_dict_bp_promp['name'] = _v
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_bp_promp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
### BP with Fixed release
_versions = ["BeerPongStepBased-v0", 'BeerPong-v0']
for _v in _versions:
if _v != 'BeerPong-v0':
_name = _v.split("-")
_env_id = f'{_name[0]}ProMP-{_name[1]}'
else:
_env_id = 'BeerPongFixedReleaseProMP-v0'
kwargs_dict_bp_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
kwargs_dict_bp_promp['wrappers'].append(mujoco.beerpong.MPWrapper)
kwargs_dict_bp_promp['phase_generator_kwargs']['tau'] = 0.62
kwargs_dict_bp_promp['controller_kwargs']['p_gains'] = np.array([1.5, 5, 2.55, 3, 2., 2, 1.25])
kwargs_dict_bp_promp['controller_kwargs']['d_gains'] = np.array([0.02333333, 0.1, 0.0625, 0.08, 0.03, 0.03, 0.0125])
kwargs_dict_bp_promp['basis_generator_kwargs']['num_basis'] = 2
kwargs_dict_bp_promp['basis_generator_kwargs']['num_basis_zero_start'] = 2
kwargs_dict_bp_promp['name'] = _v
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_bp_promp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
########################################################################################################################
## Table Tennis needs to be fixed according to Zhou's implementation
# TODO: Add later when finished
# ########################################################################################################################
#
# ## AntJump
# _versions = ['AntJump-v0']
# for _v in _versions:
# _name = _v.split("-")
# _env_id = f'{_name[0]}ProMP-{_name[1]}'
# kwargs_dict_ant_jump_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
# kwargs_dict_ant_jump_promp['wrappers'].append(mujoco.ant_jump.MPWrapper)
# kwargs_dict_ant_jump_promp['name'] = _v
# register(
# id=_env_id,
# entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
# kwargs=kwargs_dict_ant_jump_promp
# )
# ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
#
# ########################################################################################################################
#
# ## HalfCheetahJump
# _versions = ['HalfCheetahJump-v0']
# for _v in _versions:
# _name = _v.split("-")
# _env_id = f'{_name[0]}ProMP-{_name[1]}'
# kwargs_dict_halfcheetah_jump_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
# kwargs_dict_halfcheetah_jump_promp['wrappers'].append(mujoco.half_cheetah_jump.MPWrapper)
# kwargs_dict_halfcheetah_jump_promp['name'] = _v
# register(
# id=_env_id,
# entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
# kwargs=kwargs_dict_halfcheetah_jump_promp
# )
# ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
#
# ########################################################################################################################
## HopperJump
_versions = ['HopperJump-v0', 'HopperJumpSparse-v0',
# 'HopperJumpOnBox-v0', 'HopperThrow-v0', 'HopperThrowInBasket-v0'
]
# TODO: Check if all environments work with the same MPWrapper
for _v in _versions:
_name = _v.split("-")
_env_id = f'{_name[0]}ProMP-{_name[1]}'
kwargs_dict_hopper_jump_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
kwargs_dict_hopper_jump_promp['wrappers'].append(mujoco.hopper_jump.MPWrapper)
kwargs_dict_hopper_jump_promp['name'] = _v
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_hopper_jump_promp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
# ########################################################################################################################
## Box Pushing
_versions = ['BoxPushingDense-v0', 'BoxPushingTemporalSparse-v0', 'BoxPushingTemporalSpatialSparse-v0',
'BoxPushingRandomInitDense-v0', 'BoxPushingRandomInitTemporalSparse-v0',
'BoxPushingRandomInitTemporalSpatialSparse-v0']
for _v in _versions:
_name = _v.split("-")
_env_id = f'{_name[0]}ProMP-{_name[1]}'
kwargs_dict_box_pushing_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
kwargs_dict_box_pushing_promp['wrappers'].append(mujoco.box_pushing.MPWrapper)
kwargs_dict_box_pushing_promp['name'] = _v
kwargs_dict_box_pushing_promp['controller_kwargs']['p_gains'] = 0.01 * np.array([120., 120., 120., 120., 50., 30., 10.])
kwargs_dict_box_pushing_promp['controller_kwargs']['d_gains'] = 0.01 * np.array([10., 10., 10., 10., 6., 5., 3.])
kwargs_dict_box_pushing_promp['basis_generator_kwargs']['basis_bandwidth_factor'] = 2 # 3.5, 4 to try
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_box_pushing_promp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
for _v in _versions:
_name = _v.split("-")
_env_id = f'{_name[0]}ProDMP-{_name[1]}'
kwargs_dict_box_pushing_prodmp = deepcopy(DEFAULT_BB_DICT_ProDMP)
kwargs_dict_box_pushing_prodmp['wrappers'].append(mujoco.box_pushing.MPWrapper)
kwargs_dict_box_pushing_prodmp['name'] = _v
kwargs_dict_box_pushing_prodmp['controller_kwargs']['p_gains'] = 0.01 * np.array([120., 120., 120., 120., 50., 30., 10.])
kwargs_dict_box_pushing_prodmp['controller_kwargs']['d_gains'] = 0.01 * np.array([10., 10., 10., 10., 6., 5., 3.])
kwargs_dict_box_pushing_prodmp['trajectory_generator_kwargs']['weights_scale'] = 0.3
kwargs_dict_box_pushing_prodmp['trajectory_generator_kwargs']['goal_scale'] = 0.3
kwargs_dict_box_pushing_prodmp['trajectory_generator_kwargs']['auto_scale_basis'] = True
kwargs_dict_box_pushing_prodmp['basis_generator_kwargs']['num_basis'] = 4
kwargs_dict_box_pushing_prodmp['basis_generator_kwargs']['basis_bandwidth_factor'] = 3
kwargs_dict_box_pushing_prodmp['phase_generator_kwargs']['alpha_phase'] = 3
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_box_pushing_prodmp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProDMP"].append(_env_id)
for _v in _versions:
_name = _v.split("-")
_env_id = f'{_name[0]}ReplanProDMP-{_name[1]}'
kwargs_dict_box_pushing_prodmp = deepcopy(DEFAULT_BB_DICT_ProDMP)
kwargs_dict_box_pushing_prodmp['wrappers'].append(mujoco.box_pushing.MPWrapper)
kwargs_dict_box_pushing_prodmp['name'] = _v
kwargs_dict_box_pushing_prodmp['controller_kwargs']['p_gains'] = 0.01 * np.array([120., 120., 120., 120., 50., 30., 10.])
kwargs_dict_box_pushing_prodmp['controller_kwargs']['d_gains'] = 0.01 * np.array([10., 10., 10., 10., 6., 5., 3.])
kwargs_dict_box_pushing_prodmp['trajectory_generator_kwargs']['weights_scale'] = 0.3
kwargs_dict_box_pushing_prodmp['trajectory_generator_kwargs']['goal_scale'] = 0.3
kwargs_dict_box_pushing_prodmp['trajectory_generator_kwargs']['auto_scale_basis'] = True
kwargs_dict_box_pushing_prodmp['basis_generator_kwargs']['num_basis'] = 4
kwargs_dict_box_pushing_prodmp['basis_generator_kwargs']['basis_bandwidth_factor'] = 3
kwargs_dict_box_pushing_prodmp['phase_generator_kwargs']['alpha_phase'] = 3
kwargs_dict_box_pushing_prodmp['black_box_kwargs']['max_planning_times'] = 4
kwargs_dict_box_pushing_prodmp['black_box_kwargs']['replanning_schedule'] = lambda pos, vel, obs, action, t : t % 25 == 0
kwargs_dict_box_pushing_prodmp['black_box_kwargs']['condition_on_desired'] = True
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_box_pushing_prodmp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProDMP"].append(_env_id)
## Table Tennis
_versions = ['TableTennis2D-v0', 'TableTennis4D-v0', 'TableTennisWind-v0', 'TableTennisGoalSwitching-v0']
for _v in _versions:
_name = _v.split("-")
_env_id = f'{_name[0]}ProMP-{_name[1]}'
kwargs_dict_tt_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
if _v == 'TableTennisWind-v0':
kwargs_dict_tt_promp['wrappers'].append(mujoco.table_tennis.TTVelObs_MPWrapper)
else:
kwargs_dict_tt_promp['wrappers'].append(mujoco.table_tennis.TT_MPWrapper)
kwargs_dict_tt_promp['name'] = _v
kwargs_dict_tt_promp['controller_kwargs']['p_gains'] = 0.5 * np.array([1.0, 4.0, 2.0, 4.0, 1.0, 4.0, 1.0])
kwargs_dict_tt_promp['controller_kwargs']['d_gains'] = 0.5 * np.array([0.1, 0.4, 0.2, 0.4, 0.1, 0.4, 0.1])
kwargs_dict_tt_promp['phase_generator_kwargs']['learn_tau'] = True
kwargs_dict_tt_promp['phase_generator_kwargs']['learn_delay'] = True
kwargs_dict_tt_promp['phase_generator_kwargs']['tau_bound'] = [0.8, 1.5]
kwargs_dict_tt_promp['phase_generator_kwargs']['delay_bound'] = [0.05, 0.15]
kwargs_dict_tt_promp['basis_generator_kwargs']['num_basis'] = 3
kwargs_dict_tt_promp['basis_generator_kwargs']['num_basis_zero_start'] = 1
kwargs_dict_tt_promp['basis_generator_kwargs']['num_basis_zero_goal'] = 1
kwargs_dict_tt_promp['black_box_kwargs']['verbose'] = 2
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_tt_promp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
for _v in _versions:
_name = _v.split("-")
_env_id = f'{_name[0]}ProDMP-{_name[1]}'
kwargs_dict_tt_prodmp = deepcopy(DEFAULT_BB_DICT_ProDMP)
if _v == 'TableTennisWind-v0':
kwargs_dict_tt_prodmp['wrappers'].append(mujoco.table_tennis.TTVelObs_MPWrapper)
else:
kwargs_dict_tt_prodmp['wrappers'].append(mujoco.table_tennis.TT_MPWrapper)
kwargs_dict_tt_prodmp['name'] = _v
kwargs_dict_tt_prodmp['controller_kwargs']['p_gains'] = 0.5 * np.array([1.0, 4.0, 2.0, 4.0, 1.0, 4.0, 1.0])
kwargs_dict_tt_prodmp['controller_kwargs']['d_gains'] = 0.5 * np.array([0.1, 0.4, 0.2, 0.4, 0.1, 0.4, 0.1])
kwargs_dict_tt_prodmp['trajectory_generator_kwargs']['weights_scale'] = 0.7
kwargs_dict_tt_prodmp['trajectory_generator_kwargs']['auto_scale_basis'] = True
kwargs_dict_tt_prodmp['trajectory_generator_kwargs']['relative_goal'] = True
kwargs_dict_tt_prodmp['trajectory_generator_kwargs']['disable_goal'] = True
kwargs_dict_tt_prodmp['phase_generator_kwargs']['tau_bound'] = [0.8, 1.5]
kwargs_dict_tt_prodmp['phase_generator_kwargs']['delay_bound'] = [0.05, 0.15]
kwargs_dict_tt_prodmp['phase_generator_kwargs']['learn_tau'] = True
kwargs_dict_tt_prodmp['phase_generator_kwargs']['learn_delay'] = True
kwargs_dict_tt_prodmp['basis_generator_kwargs']['num_basis'] = 3
kwargs_dict_tt_prodmp['basis_generator_kwargs']['alpha'] = 25.
kwargs_dict_tt_prodmp['basis_generator_kwargs']['basis_bandwidth_factor'] = 3
kwargs_dict_tt_prodmp['phase_generator_kwargs']['alpha_phase'] = 3
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_tt_prodmp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProDMP"].append(_env_id)
for _v in _versions:
_name = _v.split("-")
_env_id = f'{_name[0]}ReplanProDMP-{_name[1]}'
kwargs_dict_tt_prodmp = deepcopy(DEFAULT_BB_DICT_ProDMP)
if _v == 'TableTennisWind-v0':
kwargs_dict_tt_prodmp['wrappers'].append(mujoco.table_tennis.TTVelObs_MPWrapper)
else:
kwargs_dict_tt_prodmp['wrappers'].append(mujoco.table_tennis.TT_MPWrapper)
kwargs_dict_tt_prodmp['name'] = _v
kwargs_dict_tt_prodmp['controller_kwargs']['p_gains'] = 0.5 * np.array([1.0, 4.0, 2.0, 4.0, 1.0, 4.0, 1.0])
kwargs_dict_tt_prodmp['controller_kwargs']['d_gains'] = 0.5 * np.array([0.1, 0.4, 0.2, 0.4, 0.1, 0.4, 0.1])
kwargs_dict_tt_prodmp['trajectory_generator_kwargs']['auto_scale_basis'] = False
kwargs_dict_tt_prodmp['trajectory_generator_kwargs']['goal_offset'] = 1.0
kwargs_dict_tt_prodmp['phase_generator_kwargs']['tau_bound'] = [0.8, 1.5]
kwargs_dict_tt_prodmp['phase_generator_kwargs']['delay_bound'] = [0.05, 0.15]
kwargs_dict_tt_prodmp['phase_generator_kwargs']['learn_tau'] = True
kwargs_dict_tt_prodmp['phase_generator_kwargs']['learn_delay'] = True
kwargs_dict_tt_prodmp['basis_generator_kwargs']['num_basis'] = 2
kwargs_dict_tt_prodmp['basis_generator_kwargs']['alpha'] = 25.
kwargs_dict_tt_prodmp['basis_generator_kwargs']['basis_bandwidth_factor'] = 3
kwargs_dict_tt_prodmp['phase_generator_kwargs']['alpha_phase'] = 3
kwargs_dict_tt_prodmp['black_box_kwargs']['max_planning_times'] = 3
kwargs_dict_tt_prodmp['black_box_kwargs']['replanning_schedule'] = lambda pos, vel, obs, action, t : t % 50 == 0
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_tt_prodmp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProDMP"].append(_env_id)
#
# ## Walker2DJump
# _versions = ['Walker2DJump-v0']
# for _v in _versions:
# _name = _v.split("-")
# _env_id = f'{_name[0]}ProMP-{_name[1]}'
# kwargs_dict_walker2d_jump_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
# kwargs_dict_walker2d_jump_promp['wrappers'].append(mujoco.walker_2d_jump.MPWrapper)
# kwargs_dict_walker2d_jump_promp['name'] = _v
# register(
# id=_env_id,
# entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
# kwargs=kwargs_dict_walker2d_jump_promp
# )
# ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
### Depricated, we will not provide non random starts anymore
"""
register(
id='SimpleReacher-v1',
entry_point='fancy_gym.envs.classic_control:SimpleReacherEnv',
max_episode_steps=200,
id='fancy/TableTennisGoalSwitchingReplan-v0',
entry_point='fancy_gym.envs.mujoco:TableTennisGoalSwitching',
mp_wrapper=MPWrapper_TableTennis_Replan,
add_mp_types=['ProDMP'],
max_episode_steps=MAX_EPISODE_STEPS_TABLE_TENNIS,
kwargs={
"n_links": 2,
"random_start": False
'goal_switching_step': 99
}
)
register(
id='LongSimpleReacher-v1',
entry_point='fancy_gym.envs.classic_control:SimpleReacherEnv',
max_episode_steps=200,
kwargs={
"n_links": 5,
"random_start": False
}
)
register(
id='HoleReacher-v1',
entry_point='fancy_gym.envs.classic_control:HoleReacherEnv',
max_episode_steps=200,
kwargs={
"n_links": 5,
"random_start": False,
"allow_self_collision": False,
"allow_wall_collision": False,
"hole_width": 0.25,
"hole_depth": 1,
"hole_x": None,
"collision_penalty": 100,
}
)
register(
id='HoleReacher-v2',
entry_point='fancy_gym.envs.classic_control:HoleReacherEnv',
max_episode_steps=200,
kwargs={
"n_links": 5,
"random_start": False,
"allow_self_collision": False,
"allow_wall_collision": False,
"hole_width": 0.25,
"hole_depth": 1,
"hole_x": 2,
"collision_penalty": 1,
}
)
# CtxtFree are v0, Contextual are v1
register(
id='AntJump-v0',
entry_point='fancy_gym.envs.mujoco:AntJumpEnv',
max_episode_steps=MAX_EPISODE_STEPS_ANTJUMP,
kwargs={
"max_episode_steps": MAX_EPISODE_STEPS_ANTJUMP,
"context": False
}
)
# CtxtFree are v0, Contextual are v1
register(
id='HalfCheetahJump-v0',
entry_point='fancy_gym.envs.mujoco:HalfCheetahJumpEnv',
max_episode_steps=MAX_EPISODE_STEPS_HALFCHEETAHJUMP,
kwargs={
"max_episode_steps": MAX_EPISODE_STEPS_HALFCHEETAHJUMP,
"context": False
}
)
register(
id='HopperJump-v0',
entry_point='fancy_gym.envs.mujoco:HopperJumpEnv',
max_episode_steps=MAX_EPISODE_STEPS_HOPPERJUMP,
kwargs={
"max_episode_steps": MAX_EPISODE_STEPS_HOPPERJUMP,
"context": False,
"healthy_reward": 1.0
}
)
"""
### Deprecated used for CorL paper
"""
_vs = np.arange(101).tolist() + [1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3, 1e-2, 5e-2, 1e-1, 5e-1]
for i in _vs:
_env_id = f'ALRReacher{i}-v0'
register(
id=_env_id,
entry_point='fancy_gym.envs.mujoco:ReacherEnv',
max_episode_steps=200,
kwargs={
"steps_before_reward": 0,
"n_links": 5,
"balance": False,
'_ctrl_cost_weight': i
}
)
_env_id = f'ALRReacherSparse{i}-v0'
register(
id=_env_id,
entry_point='fancy_gym.envs.mujoco:ReacherEnv',
max_episode_steps=200,
kwargs={
"steps_before_reward": 200,
"n_links": 5,
"balance": False,
'_ctrl_cost_weight': i
}
)
_vs = np.arange(101).tolist() + [1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3, 1e-2, 5e-2, 1e-1, 5e-1]
for i in _vs:
_env_id = f'ALRReacher{i}ProMP-v0'
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_promp_env_helper',
kwargs={
"name": f"{_env_id.replace('ProMP', '')}",
"wrappers": [mujoco.reacher.MPWrapper],
"mp_kwargs": {
"num_dof": 5,
"num_basis": 5,
"duration": 4,
"policy_type": "motor",
# "weights_scale": 5,
"n_zero_basis": 1,
"zero_start": True,
"policy_kwargs": {
"p_gains": 1,
"d_gains": 0.1
}
}
}
)
_env_id = f'ALRReacherSparse{i}ProMP-v0'
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_promp_env_helper',
kwargs={
"name": f"{_env_id.replace('ProMP', '')}",
"wrappers": [mujoco.reacher.MPWrapper],
"mp_kwargs": {
"num_dof": 5,
"num_basis": 5,
"duration": 4,
"policy_type": "motor",
# "weights_scale": 5,
"n_zero_basis": 1,
"zero_start": True,
"policy_kwargs": {
"p_gains": 1,
"d_gains": 0.1
}
}
}
)
register(
id='HopperJumpOnBox-v0',
entry_point='fancy_gym.envs.mujoco:HopperJumpOnBoxEnv',
max_episode_steps=MAX_EPISODE_STEPS_HOPPERJUMPONBOX,
kwargs={
"max_episode_steps": MAX_EPISODE_STEPS_HOPPERJUMPONBOX,
"context": False
}
)
register(
id='HopperThrow-v0',
entry_point='fancy_gym.envs.mujoco:HopperThrowEnv',
max_episode_steps=MAX_EPISODE_STEPS_HOPPERTHROW,
kwargs={
"max_episode_steps": MAX_EPISODE_STEPS_HOPPERTHROW,
"context": False
}
)
register(
id='HopperThrowInBasket-v0',
entry_point='fancy_gym.envs.mujoco:HopperThrowInBasketEnv',
max_episode_steps=MAX_EPISODE_STEPS_HOPPERTHROWINBASKET,
kwargs={
"max_episode_steps": MAX_EPISODE_STEPS_HOPPERTHROWINBASKET,
"context": False
}
)
register(
id='Walker2DJump-v0',
entry_point='fancy_gym.envs.mujoco:Walker2dJumpEnv',
max_episode_steps=MAX_EPISODE_STEPS_WALKERJUMP,
kwargs={
"max_episode_steps": MAX_EPISODE_STEPS_WALKERJUMP,
"context": False
}
)
register(id='TableTennis2DCtxt-v1',
entry_point='fancy_gym.envs.mujoco:TTEnvGym',
max_episode_steps=MAX_EPISODE_STEPS,
kwargs={'ctxt_dim': 2, 'fixed_goal': True})
register(
id='BeerPong-v0',
entry_point='fancy_gym.envs.mujoco:BeerBongEnv',
max_episode_steps=300,
kwargs={
"rndm_goal": False,
"cup_goal_pos": [0.1, -2.0],
"frame_skip": 2
}
)
"""

View File

@ -1,18 +1,20 @@
### Classic Control
## Step-based Environments
|Name| Description|Horizon|Action Dimension|Observation Dimension
|---|---|---|---|---|
|`SimpleReacher-v0`| Simple reaching task (2 links) without any physics simulation. Provides no reward until 150 time steps. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory.| 200 | 2 | 9
|`LongSimpleReacher-v0`| Simple reaching task (5 links) without any physics simulation. Provides no reward until 150 time steps. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory.| 200 | 5 | 18
|`ViaPointReacher-v0`| Simple reaching task leveraging a via point, which supports self collision detection. Provides a reward only at 100 and 199 for reaching the viapoint and goal point, respectively.| 200 | 5 | 18
|`HoleReacher-v0`| 5 link reaching task where the end-effector needs to reach into a narrow hole without collding with itself or walls | 200 | 5 | 18
| Name | Description | Horizon | Action Dimension | Observation Dimension |
| ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------- | ---------------- | --------------------- |
| `fancy/SimpleReacher-v0`     | Simple reaching task (2 links) without any physics simulation. Provides no reward for the first 150 time steps. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory. | 200     | 2                | 9                     |
| `fancy/LongSimpleReacher-v0` | Simple reaching task (5 links) without any physics simulation. Provides no reward for the first 150 time steps. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory. | 200     | 5                | 18                    |
| `fancy/ViaPointReacher-v0`   | Simple reaching task leveraging a via point, which supports self collision detection. Provides a reward only at 100 and 199 for reaching the via point and goal point, respectively.                                        | 200     | 5                | 18                    |
| `fancy/HoleReacher-v0`       | 5-link reaching task where the end-effector needs to reach into a narrow hole without colliding with itself or the walls.                                                                                                   | 200     | 5                | 18                    |
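
These step-based tasks follow the regular Gymnasium interface. As a minimal sketch (assuming the `fancy/` namespace is registered once `fancy_gym` is imported), an episode of `fancy/SimpleReacher-v0` can be rolled out like this:

```python
import gymnasium as gym
import fancy_gym  # noqa: F401, importing registers the `fancy/` environments

env = gym.make('fancy/SimpleReacher-v0')
obs, info = env.reset(seed=0)

for _ in range(200):  # horizon of the task, see table above
    action = env.action_space.sample()  # replace with a policy
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```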
## MP Environments
|Name| Description|Horizon|Action Dimension|Context Dimension
|---|---|---|---|---|
|`ViaPointReacherDMP-v0`| A DMP provides a trajectory for the `ViaPointReacher-v0` task. | 200 | 25
|`HoleReacherFixedGoalDMP-v0`| A DMP provides a trajectory for the `HoleReacher-v0` task with a fixed goal attractor. | 200 | 25
|`HoleReacherDMP-v0`| A DMP provides a trajectory for the `HoleReacher-v0` task. The goal attractor needs to be learned. | 200 | 30
[//]: |`HoleReacherProMPP-v0`|
| Name | Description | Horizon | Action Dimension | Context Dimension |
| ----------------------------------- | -------------------------------------------------------------------------------------------------------- | ------- | ---------------- | ----------------- |
| `fancy_DMP/ViaPointReacher-v0` | A DMP provides a trajectory for the `fancy/ViaPointReacher-v0` task. | 200 | 25 |
| `fancy_DMP/HoleReacherFixedGoal-v0` | A DMP provides a trajectory for the `fancy/HoleReacher-v0` task with a fixed goal attractor. | 200 | 25 |
| `fancy_DMP/HoleReacher-v0` | A DMP provides a trajectory for the `fancy/HoleReacher-v0` task. The goal attractor needs to be learned. | 200 | 30 |
[//]: |`fancy/HoleReacherProMPP-v0`|
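
For the MP variants, a single `env.step` call executes the whole primitive rollout: the action is the parameter vector of the trajectory generator (e.g. the DMP basis weights), and the returned reward is typically accumulated over the executed trajectory. A minimal sketch, assuming the `fancy_DMP/` namespace from the table above is available after importing `fancy_gym`:

```python
import gymnasium as gym
import fancy_gym  # noqa: F401, importing registers the `fancy_DMP/` environments

env = gym.make('fancy_DMP/ViaPointReacher-v0')
obs, info = env.reset(seed=1)

# One step = one full DMP trajectory (assumption based on the black-box MP interface).
params = env.action_space.sample()  # replace with learned primitive parameters
obs, reward, terminated, truncated, info = env.step(params)
print(reward)
env.close()
```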

View File

@ -1,10 +1,10 @@
from typing import Union, Tuple, Optional
from typing import Union, Tuple, Optional, Any, Dict
import gym
import gymnasium as gym
import numpy as np
from gym import spaces
from gym.core import ObsType
from gym.utils import seeding
from gymnasium import spaces
from gymnasium.core import ObsType
from gymnasium.utils import seeding
from fancy_gym.envs.classic_control.utils import intersect
@ -55,7 +55,6 @@ class BaseReacherEnv(gym.Env):
self.fig = None
self._steps = 0
self.seed()
@property
def dt(self) -> Union[float, int]:
@ -69,10 +68,15 @@ class BaseReacherEnv(gym.Env):
def current_vel(self):
return self._angle_velocity.copy()
def reset(self, *, seed: Optional[int] = None, return_info: bool = False,
options: Optional[dict] = None, ) -> Union[ObsType, Tuple[ObsType, dict]]:
def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
-> Tuple[ObsType, Dict[str, Any]]:
# Sample only orientation of first link, i.e. the arm is always straight.
if self.random_start:
super(BaseReacherEnv, self).reset(seed=seed, options=options)
try:
random_start = options.get('random_start', self.random_start)
except AttributeError:
random_start = self.random_start
if random_start:
first_joint = self.np_random.uniform(np.pi / 4, 3 * np.pi / 4)
self._joint_angles = np.hstack([[first_joint], np.zeros(self.n_links - 1)])
self._start_pos = self._joint_angles.copy()
@ -84,7 +88,7 @@ class BaseReacherEnv(gym.Env):
self._update_joints()
self._steps = 0
return self._get_obs().copy()
return self._get_obs().copy(), {}
def _update_joints(self):
"""
@ -124,10 +128,6 @@ class BaseReacherEnv(gym.Env):
def _terminate(self, info) -> bool:
raise NotImplementedError
def seed(self, seed=None):
self.np_random, seed = seeding.np_random(seed)
return [seed]
def close(self):
super(BaseReacherEnv, self).close()
del self.fig

View File

@ -1,5 +1,5 @@
import numpy as np
from gym import spaces
from gymnasium import spaces
from fancy_gym.envs.classic_control.base_reacher.base_reacher import BaseReacherEnv
@ -32,6 +32,7 @@ class BaseReacherDirectEnv(BaseReacherEnv):
reward, info = self._get_reward(action)
self._steps += 1
done = self._terminate(info)
terminated = self._terminate(info)
truncated = False
return self._get_obs().copy(), reward, done, info
return self._get_obs().copy(), reward, terminated, truncated, info

View File

@ -1,5 +1,5 @@
import numpy as np
from gym import spaces
from gymnasium import spaces
from fancy_gym.envs.classic_control.base_reacher.base_reacher import BaseReacherEnv
@ -31,6 +31,7 @@ class BaseReacherTorqueEnv(BaseReacherEnv):
reward, info = self._get_reward(action)
self._steps += 1
done = False
terminated = False
truncated = False
return self._get_obs().copy(), reward, done, info
return self._get_obs().copy(), reward, terminated, truncated, info

View File

@ -1,17 +1,20 @@
from typing import Union, Optional, Tuple
from typing import Union, Optional, Tuple, Any, Dict
import gym
import gymnasium as gym
import matplotlib.pyplot as plt
import numpy as np
from gym.core import ObsType
from gymnasium import spaces
from gymnasium.core import ObsType
from matplotlib import patches
from fancy_gym.envs.classic_control.base_reacher.base_reacher_direct import BaseReacherDirectEnv
from . import MPWrapper
MAX_EPISODE_STEPS_HOLEREACHER = 200
class HoleReacherEnv(BaseReacherDirectEnv):
def __init__(self, n_links: int, hole_x: Union[None, float] = None, hole_depth: Union[None, float] = None,
hole_width: float = 1., random_start: bool = False, allow_self_collision: bool = False,
allow_wall_collision: bool = False, collision_penalty: float = 1000, rew_fct: str = "simple"):
@ -40,7 +43,7 @@ class HoleReacherEnv(BaseReacherDirectEnv):
[np.inf] # env steps, because reward starts after n steps TODO: Maybe
])
# self.action_space = gym.spaces.Box(low=-action_bound, high=action_bound, shape=action_bound.shape)
self.observation_space = gym.spaces.Box(low=-state_bound, high=state_bound, shape=state_bound.shape)
self.observation_space = spaces.Box(low=-state_bound, high=state_bound, shape=state_bound.shape)
if rew_fct == "simple":
from fancy_gym.envs.classic_control.hole_reacher.hr_simple_reward import HolereacherReward
@ -54,13 +57,18 @@ class HoleReacherEnv(BaseReacherDirectEnv):
else:
raise ValueError("Unknown reward function {}".format(rew_fct))
def reset(self, *, seed: Optional[int] = None, return_info: bool = False,
options: Optional[dict] = None, ) -> Union[ObsType, Tuple[ObsType, dict]]:
def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
-> Tuple[ObsType, Dict[str, Any]]:
# initialize seed here as the random goal needs to be generated before the super reset()
gym.Env.reset(self, seed=seed, options=options)
self._generate_hole()
self._set_patches()
self.reward_function.reset()
return super().reset()
# do not provide seed to avoid setting it twice
return super(HoleReacherEnv, self).reset(options=options)
def _get_reward(self, action: np.ndarray) -> (float, dict):
return self.reward_function.get_reward(self)
@ -160,7 +168,7 @@ class HoleReacherEnv(BaseReacherDirectEnv):
# all points that are above the hole
r, c = np.where((line_points[:, :, 0] > (self._tmp_x - self._tmp_width / 2)) & (
line_points[:, :, 0] < (self._tmp_x + self._tmp_width / 2)))
line_points[:, :, 0] < (self._tmp_x + self._tmp_width / 2)))
# check if any of those points are below surface
nr_line_points_below_surface_in_hole = np.sum(line_points[r, c, 1] < -self._tmp_depth)
@ -223,16 +231,3 @@ class HoleReacherEnv(BaseReacherDirectEnv):
self.fig.gca().add_patch(left_block)
self.fig.gca().add_patch(right_block)
self.fig.gca().add_patch(hole_floor)
if __name__ == "__main__":
env = HoleReacherEnv(5)
env.reset()
for i in range(10000):
ac = env.action_space.sample()
obs, rew, done, info = env.step(ac)
env.render()
if done:
env.reset()

View File

@ -7,6 +7,30 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper):
mp_config = {
'ProMP': {
'controller_kwargs': {
'controller_type': 'velocity',
},
'trajectory_generator_kwargs': {
'weights_scale': 2,
},
},
'DMP': {
'controller_kwargs': {
'controller_type': 'velocity',
},
'trajectory_generator_kwargs': {
# TODO: Before it was weight scale 50 and goal scale 0.1. We now only have weight scale and thus set it to 500. Check
'weights_scale': 500,
},
'phase_generator_kwargs': {
'alpha_phase': 2.5,
},
},
'ProDMP': {},
}
@property
def context_mask(self):
return np.hstack([

View File

@ -7,6 +7,28 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper):
mp_config = {
'ProMP': {
'controller_kwargs': {
'p_gains': 0.6,
'd_gains': 0.075,
},
},
'DMP': {
'controller_kwargs': {
'p_gains': 0.6,
'd_gains': 0.075,
},
'trajectory_generator_kwargs': {
'weights_scale': 50,
},
'phase_generator_kwargs': {
'alpha_phase': 2,
},
},
'ProDMP': {},
}
@property
def context_mask(self):
return np.hstack([

View File

@ -1,11 +1,12 @@
from typing import Iterable, Union, Optional, Tuple
from typing import Iterable, Union, Optional, Tuple, Any, Dict
import matplotlib.pyplot as plt
import numpy as np
from gym import spaces
from gym.core import ObsType
from gymnasium import spaces
from gymnasium.core import ObsType
from fancy_gym.envs.classic_control.base_reacher.base_reacher_torque import BaseReacherTorqueEnv
from . import MPWrapper
class SimpleReacherEnv(BaseReacherTorqueEnv):
@ -42,11 +43,15 @@ class SimpleReacherEnv(BaseReacherTorqueEnv):
# def start_pos(self):
# return self._start_pos
def reset(self, *, seed: Optional[int] = None, return_info: bool = False,
options: Optional[dict] = None, ) -> Union[ObsType, Tuple[ObsType, dict]]:
def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
-> Tuple[ObsType, Dict[str, Any]]:
# Reset twice: the goal must be generated after the seeded reset, and the returned obs must
# reflect the newly generated goal. (The env would not behave deterministically otherwise.)
# Yes, there is probably a more elegant solution to this problem...
self._generate_goal()
return super().reset()
super().reset(seed=seed, options=options)
self._generate_goal()
return super().reset(seed=seed, options=options)
def _get_reward(self, action: np.ndarray):
diff = self.end_effector - self._goal
@ -127,15 +132,3 @@ class SimpleReacherEnv(BaseReacherTorqueEnv):
self.fig.canvas.draw()
self.fig.canvas.flush_events()
if __name__ == "__main__":
env = SimpleReacherEnv(5)
env.reset()
for i in range(200):
ac = env.action_space.sample()
obs, rew, done, info = env.step(ac)
env.render()
if done:
break

View File

@ -7,6 +7,26 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper):
mp_config = {
'ProMP': {
'controller_kwargs': {
'controller_type': 'velocity',
},
},
'DMP': {
'controller_kwargs': {
'controller_type': 'velocity',
},
'trajectory_generator_kwargs': {
'weights_scale': 50,
},
'phase_generator_kwargs': {
'alpha_phase': 2,
},
},
'ProDMP': {},
}
@property
def context_mask(self):
return np.hstack([

View File

@ -1,11 +1,13 @@
from typing import Iterable, Union, Tuple, Optional
from typing import Iterable, Union, Tuple, Optional, Any, Dict
import gym
import gymnasium as gym
import matplotlib.pyplot as plt
import numpy as np
from gym.core import ObsType
from gymnasium import spaces
from gymnasium.core import ObsType
from fancy_gym.envs.classic_control.base_reacher.base_reacher_direct import BaseReacherDirectEnv
from . import MPWrapper
class ViaPointReacherEnv(BaseReacherDirectEnv):
@ -34,16 +36,21 @@ class ViaPointReacherEnv(BaseReacherDirectEnv):
[np.inf] * 2, # x-y coordinates of target distance
[np.inf] # env steps, because reward starts after n steps
])
self.observation_space = gym.spaces.Box(low=-state_bound, high=state_bound, shape=state_bound.shape)
self.observation_space = spaces.Box(low=-state_bound, high=state_bound, shape=state_bound.shape)
# @property
# def start_pos(self):
# return self._start_pos
def reset(self, *, seed: Optional[int] = None, return_info: bool = False,
options: Optional[dict] = None, ) -> Union[ObsType, Tuple[ObsType, dict]]:
def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
-> Tuple[ObsType, Dict[str, Any]]:
# Reset twice: the goal must be generated after the seeded reset, and the returned obs must
# reflect the newly generated goal. (The env would not behave deterministically otherwise.)
# Yes, there is probably a more elegant solution to this problem...
self._generate_goal()
return super().reset()
super().reset(seed=seed, options=options)
self._generate_goal()
return super().reset(seed=seed, options=options)
def _generate_goal(self):
# TODO: Maybe improve this later, this can yield quite a lot of invalid settings
@ -183,16 +190,3 @@ class ViaPointReacherEnv(BaseReacherDirectEnv):
plt.plot(self._joints[:, 0], self._joints[:, 1], 'ro-', markerfacecolor='k')
plt.pause(0.01)
if __name__ == "__main__":
env = ViaPointReacherEnv(5)
env.reset()
for i in range(10000):
ac = env.action_space.sample()
obs, rew, done, info = env.step(ac)
env.render()
if done:
env.reset()

View File

@ -1,15 +1,48 @@
# Custom Mujoco tasks
## Step-based Environments
|Name| Description|Horizon|Action Dimension|Observation Dimension
|---|---|---|---|---|
|`ALRReacher-v0`|Modified (5 links) Mujoco gym's `Reacher-v2` (2 links)| 200 | 5 | 21
|`ALRReacherSparse-v0`|Same as `ALRReacher-v0`, but the distance penalty is only provided in the last time step.| 200 | 5 | 21
|`ALRReacherSparseBalanced-v0`|Same as `ALRReacherSparse-v0`, but the end-effector has to remain upright.| 200 | 5 | 21
|`ALRLongReacher-v0`|Modified (7 links) Mujoco gym's `Reacher-v2` (2 links)| 200 | 7 | 27
|`ALRLongReacherSparse-v0`|Same as `ALRLongReacher-v0`, but the distance penalty is only provided in the last time step.| 200 | 7 | 27
|`ALRLongReacherSparseBalanced-v0`|Same as `ALRLongReacherSparse-v0`, but the end-effector has to remain upright.| 200 | 7 | 27
|`ALRBallInACupSimple-v0`| Ball-in-a-cup task where a robot needs to catch a ball attached to a cup at its end-effector. | 4000 | 3 | wip
|`ALRBallInACup-v0`| Ball-in-a-cup task where a robot needs to catch a ball attached to a cup at its end-effector | 4000 | 7 | wip
|`ALRBallInACupGoal-v0`| Similar to `ALRBallInACupSimple-v0` but the ball needs to be caught at a specified goal position | 4000 | 7 | wip
| Name | Description | Horizon | Action Dimension | Observation Dimension |
| ------------------------------------------ | -------------------------------------------------------------------------------------------------- | ------- | ---------------- | --------------------- |
| `fancy/Reacher-v0`                         | Modified (5 links) Gymnasium's MuJoCo `Reacher-v2` (2 links)                                         | 200     | 5                | 21                    |
| `fancy/ReacherSparse-v0` | Same as `fancy/Reacher-v0`, but the distance penalty is only provided in the last time step. | 200 | 5 | 21 |
| `fancy/ReacherSparseBalanced-v0` | Same as `fancy/ReacherSparse-v0`, but the end-effector has to remain upright. | 200 | 5 | 21 |
| `fancy/LongReacher-v0`                     | Modified (7 links) Gymnasium's MuJoCo `Reacher-v2` (2 links)                                         | 200     | 7                | 27                    |
| `fancy/LongReacherSparse-v0` | Same as `fancy/LongReacher-v0`, but the distance penalty is only provided in the last time step. | 200 | 7 | 27 |
| `fancy/LongReacherSparseBalanced-v0` | Same as `fancy/LongReacherSparse-v0`, but the end-effector has to remain upright. | 200 | 7 | 27 |
| `fancy/Reacher5d-v0` | Reacher task with 5 links, based on Gymnasium's `gym.envs.mujoco.ReacherEnv` | 200 | 5 | 20 |
| `fancy/Reacher5dSparse-v0` | Sparse Reacher task with 5 links, based on Gymnasium's `gym.envs.mujoco.ReacherEnv` | 200 | 5 | 20 |
| `fancy/Reacher7d-v0` | Reacher task with 7 links, based on Gymnasium's `gym.envs.mujoco.ReacherEnv` | 200 | 7 | 22 |
| `fancy/Reacher7dSparse-v0` | Sparse Reacher task with 7 links, based on Gymnasium's `gym.envs.mujoco.ReacherEnv` | 200 | 7 | 22 |
| `fancy/HopperJumpSparse-v0` | Hopper Jump task with sparse rewards, based on Gymnasium's `gym.envs.mujoco.Hopper` | 250 | 3 | 15 / 16\* |
| `fancy/HopperJump-v0` | Hopper Jump task with continuous rewards, based on Gymnasium's `gym.envs.mujoco.Hopper` | 250 | 3 | 15 / 16\* |
| `fancy/AntJump-v0` | Ant Jump task, based on Gymnasium's `gym.envs.mujoco.Ant` | 200 | 8 | 119 |
| `fancy/HalfCheetahJump-v0` | HalfCheetah Jump task, based on Gymnasium's `gym.envs.mujoco.HalfCheetah` | 100 | 6 | 112 |
| `fancy/HopperJumpOnBox-v0` | Hopper Jump on Box task, based on Gymnasium's `gym.envs.mujoco.Hopper` | 250 | 4 | 16 / 100\* |
| `fancy/HopperThrow-v0` | Hopper Throw task, based on Gymnasium's `gym.envs.mujoco.Hopper` | 250 | 3 | 18 / 100\* |
| `fancy/HopperThrowInBasket-v0` | Hopper Throw in Basket task, based on Gymnasium's `gym.envs.mujoco.Hopper` | 250 | 3 | 18 / 100\* |
| `fancy/Walker2DJump-v0` | Walker 2D Jump task, based on Gymnasium's `gym.envs.mujoco.Walker2d` | 300 | 6 | 18 / 19\* |
| `fancy/BeerPong-v0` | Beer Pong task, based on a custom environment with multiple task variations | 300 | 3 | 29 |
| `fancy/BeerPongStepBased-v0` | Step-based Beer Pong task, based on a custom environment with episodic rewards | 300 | 3 | 29 |
| `fancy/BeerPongFixedRelease-v0` | Beer Pong with fixed release, based on a custom environment with episodic rewards | 300 | 3 | 29 |
| `fancy/BoxPushingDense-v0` | Custom Box-pushing task with dense rewards | 100 | 3 | 13 |
| `fancy/BoxPushingTemporalSparse-v0` | Custom Box-pushing task with temporally sparse rewards | 100 | 3 | 13 |
| `fancy/BoxPushingTemporalSpatialSparse-v0` | Custom Box-pushing task with temporally and spatially sparse rewards | 100 | 3 | 13 |
| `fancy/TableTennis2D-v0` | Table Tennis task with 2D context, based on a custom environment for table tennis | 350 | 7 | 19 |
| `fancy/TableTennis2DReplan-v0` | Table Tennis task with 2D context and replanning, based on a custom environment for table tennis | 350 | 7 | 19 |
| `fancy/TableTennis4D-v0` | Table Tennis task with 4D context, based on a custom environment for table tennis | 350 | 7 | 22 |
| `fancy/TableTennis4DReplan-v0` | Table Tennis task with 4D context and replanning, based on a custom environment for table tennis | 350 | 7 | 22 |
| `fancy/TableTennisWind-v0` | Table Tennis task with wind effects, based on a custom environment for table tennis | 350 | 7 | 19 |
| `fancy/TableTennisGoalSwitching-v0` | Table Tennis task with goal switching, based on a custom environment for table tennis | 350 | 7 | 19 |
| `fancy/TableTennisWindReplan-v0` | Table Tennis task with wind effects and replanning, based on a custom environment for table tennis | 350 | 7 | 19 |
\*Observation dimensions depend on configuration.
<!--
No longer used?
| Name | Description | Horizon | Action Dimension | Observation Dimension |
| --------------------------- | --------------------------------------------------------------------------------------------------- | ------- | ---------------- | --------------------- |
| `fancy/BallInACupSimple-v0` | Ball-in-a-cup task where a robot needs to catch a ball attached to a cup at its end-effector. | 4000 | 3 | wip |
| `fancy/BallInACup-v0` | Ball-in-a-cup task where a robot needs to catch a ball attached to a cup at its end-effector | 4000 | 7 | wip |
| `fancy/BallInACupGoal-v0` | Similar to `fancy/BallInACupSimple-v0` but the ball needs to be caught at a specified goal position | 4000 | 7 | wip |
-->
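
Since the refactor targets the Gymnasium API, these tasks return the five-tuple `(obs, reward, terminated, truncated, info)` from `step` and `(obs, info)` from `reset`. A minimal sketch for one of the box-pushing variants (the `render_mode` keyword follows the standard Gymnasium convention; whether a given task supports human rendering is an assumption):

```python
import gymnasium as gym
import fancy_gym  # noqa: F401, importing registers the `fancy/` environments

env = gym.make('fancy/BoxPushingDense-v0', render_mode='human')
obs, info = env.reset(seed=42)
done = False
while not done:
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    done = terminated or truncated
env.close()
```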

View File

@ -1,8 +1,11 @@
from typing import Tuple, Union, Optional
from typing import Tuple, Union, Optional, Any, Dict
import numpy as np
from gym.core import ObsType
from gym.envs.mujoco.ant_v4 import AntEnv
from gymnasium.core import ObsType
from gymnasium.envs.mujoco.ant_v4 import AntEnv, DEFAULT_CAMERA_CONFIG
from gymnasium import utils
from gymnasium.envs.mujoco import MujocoEnv
from gymnasium.spaces import Box
MAX_EPISODE_STEPS_ANTJUMP = 200
@ -12,8 +15,74 @@ MAX_EPISODE_STEPS_ANTJUMP = 200
# to the same structure as the Hopper, where the angles are randomized (-> contexts) and the agent should jump as high
# as possible, while landing at a specific target position
class AntEnvCustomXML(AntEnv):
def __init__(
self,
xml_file="ant.xml",
ctrl_cost_weight=0.5,
use_contact_forces=False,
contact_cost_weight=5e-4,
healthy_reward=1.0,
terminate_when_unhealthy=True,
healthy_z_range=(0.2, 1.0),
contact_force_range=(-1.0, 1.0),
reset_noise_scale=0.1,
exclude_current_positions_from_observation=True,
**kwargs,
):
utils.EzPickle.__init__(
self,
xml_file,
ctrl_cost_weight,
use_contact_forces,
contact_cost_weight,
healthy_reward,
terminate_when_unhealthy,
healthy_z_range,
contact_force_range,
reset_noise_scale,
exclude_current_positions_from_observation,
**kwargs,
)
class AntJumpEnv(AntEnv):
self._ctrl_cost_weight = ctrl_cost_weight
self._contact_cost_weight = contact_cost_weight
self._healthy_reward = healthy_reward
self._terminate_when_unhealthy = terminate_when_unhealthy
self._healthy_z_range = healthy_z_range
self._contact_force_range = contact_force_range
self._reset_noise_scale = reset_noise_scale
self._use_contact_forces = use_contact_forces
self._exclude_current_positions_from_observation = (
exclude_current_positions_from_observation
)
obs_shape = 27 + 1
if not exclude_current_positions_from_observation:
obs_shape += 2
if use_contact_forces:
obs_shape += 84
observation_space = Box(
low=-np.inf, high=np.inf, shape=(obs_shape,), dtype=np.float64
)
MujocoEnv.__init__(
self,
xml_file,
5,
observation_space=observation_space,
default_camera_config=DEFAULT_CAMERA_CONFIG,
**kwargs,
)
class AntJumpEnv(AntEnvCustomXML):
"""
Initialization changes to normal Ant:
- healthy_reward: 1.0 -> 0.01 -> 0.0 no healthy reward needed - Paul and Marc
@ -61,9 +130,10 @@ class AntJumpEnv(AntEnv):
costs = ctrl_cost + contact_cost
done = bool(height < 0.3) # fall over -> is the 0.3 value from healthy_z_range? TODO change 0.3 to the value of healthy z angle
terminated = bool(
height < 0.3) # fall over -> is the 0.3 value from healthy_z_range? TODO change 0.3 to the value of healthy z angle
if self.current_step == MAX_EPISODE_STEPS_ANTJUMP or done:
if self.current_step == MAX_EPISODE_STEPS_ANTJUMP or terminated:
# -10 for scaling the value of the distance between the max_height and the goal height; only used when context is enabled
# height_reward = -10 * (np.linalg.norm(self.max_height - self.goal))
height_reward = -10 * np.linalg.norm(self.max_height - self.goal)
@ -80,19 +150,21 @@ class AntJumpEnv(AntEnv):
'max_height': self.max_height,
'goal': self.goal
}
truncated = False
return obs, reward, done, info
return obs, reward, terminated, truncated, info
def _get_obs(self):
return np.append(super()._get_obs(), self.goal)
def reset(self, *, seed: Optional[int] = None, return_info: bool = False,
options: Optional[dict] = None, ) -> Union[ObsType, Tuple[ObsType, dict]]:
def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
-> Tuple[ObsType, Dict[str, Any]]:
self.current_step = 0
self.max_height = 0
# goal heights from 1.0 to 2.5; can be increased, but didn't work well with CMORE
ret = super().reset(seed=seed, options=options)
self.goal = self.np_random.uniform(1.0, 2.5, 1)
return super().reset()
return ret
# reset_model had to be implemented in every env to make it deterministic
def reset_model(self):

View File

@ -1,9 +1,13 @@
import os
from typing import Optional
from typing import Optional, Any, Dict, Tuple
import numpy as np
from gym import utils
from gym.envs.mujoco import MujocoEnv
from gymnasium import utils
from gymnasium.core import ObsType
from gymnasium.envs.mujoco import MujocoEnv
from gymnasium.spaces import Box
import mujoco
MAX_EPISODE_STEPS_BEERPONG = 300
FIXED_RELEASE_STEP = 62 # empirically evaluated for frame_skip=2!
@ -30,7 +34,16 @@ CUP_COLLISION_OBJ = ["cup_geom_table3", "cup_geom_table4", "cup_geom_table5", "c
class BeerPongEnv(MujocoEnv, utils.EzPickle):
def __init__(self):
metadata = {
"render_modes": [
"human",
"rgb_array",
"depth_array",
],
"render_fps": 100
}
def __init__(self, **kwargs):
self._steps = 0
# Small Context -> Easier. Todo: Should we do different versions?
# self.xml_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "assets", "beerpong_wo_cup.xml")
@ -50,9 +63,9 @@ class BeerPongEnv(MujocoEnv, utils.EzPickle):
self.repeat_action = 2
# TODO: If accessing IDs is easier in the (new) official mujoco bindings, remove this
self.model = None
self.geom_id = lambda x: self._mujoco_bindings.mj_name2id(self.model,
self._mujoco_bindings.mjtObj.mjOBJ_GEOM,
x)
self.geom_id = lambda x: mujoco.mj_name2id(self.model,
mujoco.mjtObj.mjOBJ_GEOM,
x)
# for reward calculation
self.dists = []
@ -65,7 +78,17 @@ class BeerPongEnv(MujocoEnv, utils.EzPickle):
self.ball_in_cup = False
self.dist_ground_cup = -1 # distance floor to cup if first floor contact
MujocoEnv.__init__(self, model_path=self.xml_path, frame_skip=1, mujoco_bindings="mujoco")
self.observation_space = Box(
low=-np.inf, high=np.inf, shape=(29,), dtype=np.float64
)
MujocoEnv.__init__(
self,
self.xml_path,
frame_skip=1,
observation_space=self.observation_space,
**kwargs
)
utils.EzPickle.__init__(self)
@property
@ -76,7 +99,8 @@ class BeerPongEnv(MujocoEnv, utils.EzPickle):
def start_vel(self):
return self._start_vel
def reset(self, *, seed: Optional[int] = None, return_info: bool = False, options: Optional[dict] = None):
def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
-> Tuple[ObsType, Dict[str, Any]]:
self.dists = []
self.dists_final = []
self.action_costs = []
@ -86,7 +110,7 @@ class BeerPongEnv(MujocoEnv, utils.EzPickle):
self.ball_cup_contact = False
self.ball_in_cup = False
self.dist_ground_cup = -1 # distance floor to cup if first floor contact
return super().reset()
return super().reset(seed=seed, options=options)
def reset_model(self):
init_pos_all = self.init_qpos.copy()
@ -128,11 +152,11 @@ class BeerPongEnv(MujocoEnv, utils.EzPickle):
if not crash:
reward, reward_infos = self._get_reward(applied_action)
is_collided = reward_infos['is_collided'] # TODO: Remove if self collision does not make a difference
done = is_collided
terminated = is_collided
self._steps += 1
else:
reward = -30
done = True
terminated = True
reward_infos = {"success": False, "ball_pos": np.zeros(3), "ball_vel": np.zeros(3), "is_collided": False}
infos = dict(
@ -142,7 +166,10 @@ class BeerPongEnv(MujocoEnv, utils.EzPickle):
q_vel=self.data.qvel[0:7].ravel().copy(), sim_crash=crash,
)
infos.update(reward_infos)
return ob, reward, done, infos
truncated = False
return ob, reward, terminated, truncated, infos
def _get_obs(self):
theta = self.data.qpos.flat[:7].copy()
@ -197,13 +224,13 @@ class BeerPongEnv(MujocoEnv, utils.EzPickle):
min_dist_coeff, final_dist_coeff, ground_contact_dist_coeff, rew_offset = 0, 1, 0, 0
action_cost = 1e-4 * np.mean(action_cost)
reward = rew_offset - min_dist_coeff * min_dist ** 2 - final_dist_coeff * final_dist ** 2 - \
action_cost - ground_contact_dist_coeff * self.dist_ground_cup ** 2
action_cost - ground_contact_dist_coeff * self.dist_ground_cup ** 2
# release step punishment
min_time_bound = 0.1
max_time_bound = 1.0
release_time = self.release_step * self.dt
release_time_rew = int(release_time < min_time_bound) * (-30 - 10 * (release_time - min_time_bound) ** 2) + \
int(release_time > max_time_bound) * (-30 - 10 * (release_time - max_time_bound) ** 2)
int(release_time > max_time_bound) * (-30 - 10 * (release_time - max_time_bound) ** 2)
reward += release_time_rew
success = self.ball_in_cup
else:
@ -258,9 +285,9 @@ class BeerPongEnvStepBasedEpisodicReward(BeerPongEnv):
return super(BeerPongEnvStepBasedEpisodicReward, self).step(a)
else:
reward = 0
done = True
terminated, truncated = True, False
while self._steps < MAX_EPISODE_STEPS_BEERPONG:
obs, sub_reward, done, infos = super(BeerPongEnvStepBasedEpisodicReward, self).step(
obs, sub_reward, terminated, truncated, infos = super(BeerPongEnvStepBasedEpisodicReward, self).step(
np.zeros(a.shape))
reward += sub_reward
return obs, reward, done, infos
return obs, reward, terminated, truncated, infos

View File

@ -1,9 +1,8 @@
import os
import mujoco_py.builder
import numpy as np
from gym import utils
from gym.envs.mujoco import MujocoEnv
from gymnasium import utils
from gymnasium.envs.mujoco import MujocoEnv
from fancy_gym.envs.mujoco.beerpong.deprecated.beerpong_reward_staged import BeerPongReward
@ -74,27 +73,24 @@ class BeerPongEnv(MujocoEnv, utils.EzPickle):
crash = False
for _ in range(self.repeat_action):
applied_action = a + self.sim.data.qfrc_bias[:len(a)].copy() / self.model.actuator_gear[:, 0]
try:
self.do_simulation(applied_action, self.frame_skip)
self.reward_function.initialize(self)
# self.reward_function.check_contacts(self.sim) # I assume this is not important?
if self._steps < self.release_step:
self.sim.data.qpos[7::] = self.sim.data.site_xpos[self.site_id("init_ball_pos"), :].copy()
self.sim.data.qvel[7::] = self.sim.data.site_xvelp[self.site_id("init_ball_pos"), :].copy()
crash = False
except mujoco_py.builder.MujocoException:
crash = True
self.do_simulation(applied_action, self.frame_skip)
self.reward_function.initialize(self)
# self.reward_function.check_contacts(self.sim) # I assume this is not important?
if self._steps < self.release_step:
self.sim.data.qpos[7::] = self.sim.data.site_xpos[self.site_id("init_ball_pos"), :].copy()
self.sim.data.qvel[7::] = self.sim.data.site_xvelp[self.site_id("init_ball_pos"), :].copy()
crash = False
ob = self._get_obs()
if not crash:
reward, reward_infos = self.reward_function.compute_reward(self, applied_action)
is_collided = reward_infos['is_collided']
done = is_collided or self._steps == self.ep_length - 1
terminated = is_collided or self._steps == self.ep_length - 1
self._steps += 1
else:
reward = -30
done = True
terminated = True
reward_infos = {"success": False, "ball_pos": np.zeros(3), "ball_vel": np.zeros(3), "is_collided": False}
infos = dict(
@ -104,7 +100,7 @@ class BeerPongEnv(MujocoEnv, utils.EzPickle):
q_vel=self.sim.data.qvel[0:7].ravel().copy(), sim_crash=crash,
)
infos.update(reward_infos)
return ob, reward, done, infos
return ob, reward, terminated, infos
def _get_obs(self):
theta = self.sim.data.qpos.flat[:7]
@ -143,16 +139,16 @@ class BeerPongEnvStepBasedEpisodicReward(BeerPongEnv):
return super(BeerPongEnvStepBasedEpisodicReward, self).step(a)
else:
reward = 0
done = False
while not done:
sub_ob, sub_reward, done, sub_infos = super(BeerPongEnvStepBasedEpisodicReward, self).step(
np.zeros(a.shape))
terminated, truncated = False, False
while not (terminated or truncated):
sub_ob, sub_reward, terminated, truncated, sub_infos = super(BeerPongEnvStepBasedEpisodicReward,
self).step(np.zeros(a.shape))
reward += sub_reward
infos = sub_infos
ob = sub_ob
ob[-1] = self.release_step + 1 # Since we simulate until the end of the episode, PPO does not see the
# internal steps and thus, the observation also needs to be set correctly
return ob, reward, done, infos
return ob, reward, terminated, truncated, infos
# class BeerBongEnvStepBased(BeerBongEnv):
@ -186,27 +182,3 @@ class BeerPongEnvStepBasedEpisodicReward(BeerPongEnv):
# ob[-1] = self.release_step + 1 # Since we simulate until the end of the episode, PPO does not see the
# # internal steps and thus, the observation also needs to be set correctly
# return ob, reward, done, infos
if __name__ == "__main__":
env = BeerPongEnv(frame_skip=2)
env.seed(0)
# env = BeerBongEnvStepBased(frame_skip=2)
# env = BeerBongEnvStepBasedEpisodicReward(frame_skip=2)
# env = BeerBongEnvFixedReleaseStep(frame_skip=2)
import time
env.reset()
env.render("human")
for i in range(600):
# ac = 10 * env.action_space.sample()
ac = 0.05 * np.ones(7)
obs, rew, d, info = env.step(ac)
env.render("human")
if d:
print('reward:', rew)
print('RESETTING')
env.reset()
time.sleep(1)
env.close()

View File

@ -6,6 +6,23 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper):
mp_config = {
'ProMP': {
'phase_generator_kwargs': {
'learn_tau': True
},
'controller_kwargs': {
'p_gains': np.array([1.5, 5, 2.55, 3, 2., 2, 1.25]),
'd_gains': np.array([0.02333333, 0.1, 0.0625, 0.08, 0.03, 0.03, 0.0125]),
},
'basis_generator_kwargs': {
'num_basis': 2,
'num_basis_zero_start': 2,
},
},
'DMP': {},
'ProDMP': {},
}
@property
def context_mask(self) -> np.ndarray:
@ -39,3 +56,23 @@ class MPWrapper(RawInterfaceWrapper):
xyz[-1] = 0.840
self.model.body_pos[self.cup_table_id] = xyz
return self.get_observation_from_step(self.get_obs())
class MPWrapper_FixedRelease(MPWrapper):
mp_config = {
'ProMP': {
'phase_generator_kwargs': {
'tau': 0.62,
},
'controller_kwargs': {
'p_gains': np.array([1.5, 5, 2.55, 3, 2., 2, 1.25]),
'd_gains': np.array([0.02333333, 0.1, 0.0625, 0.08, 0.03, 0.03, 0.0125]),
},
'basis_generator_kwargs': {
'num_basis': 2,
'num_basis_zero_start': 2,
},
},
'DMP': {},
'ProDMP': {},
}
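Both wrappers above share the same PD gains and basis setup; the difference is that `MPWrapper` learns the trajectory duration (`learn_tau: True`), while `MPWrapper_FixedRelease` pins it to `tau: 0.62`. A rough usage sketch for such a movement-primitive variant, where a single `step()` call consumes one parameter vector and rolls out the whole trajectory internally (the `fancy_ProMP/BeerPong-v0` id is an assumption of this example):

```python
import gymnasium as gym
import fancy_gym  # noqa: registers both step-based and MP variants

env = gym.make("fancy_ProMP/BeerPong-v0")    # assumed id of the ProMP variant
obs, info = env.reset(seed=1)
params = env.action_space.sample()           # ProMP weights (plus tau when learn_tau=True)
obs, reward, terminated, truncated, info = env.step(params)  # executes the full episode
env.close()
```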

View File

@ -1 +1 @@
from .mp_wrapper import MPWrapper
from .mp_wrapper import MPWrapper, ReplanMPWrapper

View File

@ -1,8 +1,8 @@
import os
import numpy as np
from gym import utils, spaces
from gym.envs.mujoco import MujocoEnv
from gymnasium import utils, spaces
from gymnasium.envs.mujoco import MujocoEnv
from fancy_gym.envs.mujoco.box_pushing.box_pushing_utils import rot_to_quat, get_quaternion_error, rotation_distance
from fancy_gym.envs.mujoco.box_pushing.box_pushing_utils import q_max, q_min, q_dot_max, q_torque_max
from fancy_gym.envs.mujoco.box_pushing.box_pushing_utils import desired_rod_quat
@ -13,6 +13,7 @@ MAX_EPISODE_STEPS_BOX_PUSHING = 100
BOX_POS_BOUND = np.array([[0.3, -0.45, -0.01], [0.6, 0.45, -0.01]])
class BoxPushingEnvBase(MujocoEnv, utils.EzPickle):
"""
franka box pushing environment
@ -26,6 +27,15 @@ class BoxPushingEnvBase(MujocoEnv, utils.EzPickle):
3. time-spatial-depend sparse reward
"""
metadata = {
"render_modes": [
"human",
"rgb_array",
"depth_array",
],
"render_fps": 50
}
def __init__(self, frame_skip: int = 10, random_init: bool = False):
utils.EzPickle.__init__(**locals())
self._steps = 0
@ -39,11 +49,16 @@ class BoxPushingEnvBase(MujocoEnv, utils.EzPickle):
self._desired_rod_quat = desired_rod_quat
self._episode_energy = 0.
self.observation_space = spaces.Box(
low=-np.inf, high=np.inf, shape=(28,), dtype=np.float64
)
self.random_init = random_init
MujocoEnv.__init__(self,
model_path=os.path.join(os.path.dirname(__file__), "assets", "box_pushing.xml"),
frame_skip=self.frame_skip,
mujoco_bindings="mujoco")
observation_space=self.observation_space)
self.action_space = spaces.Box(low=-1, high=1, shape=(7,))
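Unlike the old gym `MujocoEnv`, the Gymnasium version takes no `mujoco_bindings` argument and no longer infers the observation space from a probe step, so the space is built explicitly before `MujocoEnv.__init__` above. A minimal sketch of the same constructor pattern (class name illustrative, not part of fancy_gym; `step`/`reset_model` omitted):

```python
import os
import numpy as np
from gymnasium import spaces
from gymnasium.envs.mujoco import MujocoEnv


class MyPushingEnv(MujocoEnv):
    # recent Gymnasium versions check render_fps against 1 / (timestep * frame_skip)
    metadata = {"render_modes": ["human", "rgb_array", "depth_array"], "render_fps": 50}

    def __init__(self, frame_skip: int = 10):
        # the space must be handed over explicitly instead of being inferred
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(28,), dtype=np.float64)
        MujocoEnv.__init__(
            self,
            model_path=os.path.join(os.path.dirname(__file__), "assets", "box_pushing.xml"),
            frame_skip=frame_skip,
            observation_space=self.observation_space,
        )
```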
def step(self, action):
@ -89,7 +104,11 @@ class BoxPushingEnvBase(MujocoEnv, utils.EzPickle):
'is_success': True if episode_end and box_goal_pos_dist < 0.05 and box_goal_quat_dist < 0.5 else False,
'num_steps': self._steps
}
return obs, reward, episode_end, infos
terminated = episode_end and infos['is_success']
truncated = episode_end and not infos['is_success']
return obs, reward, terminated, truncated, infos
def reset_model(self):
# rest box to initial position
@ -250,7 +269,7 @@ class BoxPushingEnvBase(MujocoEnv, utils.EzPickle):
old_err_norm = err_norm
### get Jacobian by mujoco
# get Jacobian by mujoco
self.data.qpos[:7] = q
mujoco.mj_forward(self.model, self.data)
@ -284,6 +303,7 @@ class BoxPushingEnvBase(MujocoEnv, utils.EzPickle):
return q
class BoxPushingDense(BoxPushingEnvBase):
def __init__(self, frame_skip: int = 10, random_init: bool = False):
super(BoxPushingDense, self).__init__(frame_skip=frame_skip, random_init=random_init)
@ -299,7 +319,7 @@ class BoxPushingDense(BoxPushingEnvBase):
energy_cost = -0.0005 * np.sum(np.square(action))
reward = joint_penalty + tcp_box_dist_reward + \
box_goal_pos_dist_reward + box_goal_rot_dist_reward + energy_cost
box_goal_pos_dist_reward + box_goal_rot_dist_reward + energy_cost
rod_inclined_angle = rotation_distance(rod_quat, self._desired_rod_quat)
if rod_inclined_angle > np.pi / 4:
@ -307,6 +327,7 @@ class BoxPushingDense(BoxPushingEnvBase):
return reward
class BoxPushingTemporalSparse(BoxPushingEnvBase):
def __init__(self, frame_skip: int = 10, random_init: bool = False):
super(BoxPushingTemporalSparse, self).__init__(frame_skip=frame_skip, random_init=random_init)
@ -368,6 +389,7 @@ class BoxPushingTemporalSpatialSparse(BoxPushingEnvBase):
return reward
class BoxPushingTemporalSpatialSparse2(BoxPushingEnvBase):
def __init__(self, frame_skip: int = 10, random_init: bool = False):

View File

@ -6,6 +6,27 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper):
mp_config = {
'ProMP': {
'controller_kwargs': {
'p_gains': 0.01 * np.array([120., 120., 120., 120., 50., 30., 10.]),
'd_gains': 0.01 * np.array([10., 10., 10., 10., 6., 5., 3.]),
},
'basis_generator_kwargs': {
'basis_bandwidth_factor': 2 # 3.5, 4 to try
}
},
'DMP': {},
'ProDMP': {
'controller_kwargs': {
'p_gains': 0.01 * np.array([120., 120., 120., 120., 50., 30., 10.]),
'd_gains': 0.01 * np.array([10., 10., 10., 10., 6., 5., 3.]),
},
'basis_generator_kwargs': {
'basis_bandwidth_factor': 2 # 3.5, 4 to try
}
},
}
# Random x goal + random init pos
@property
@ -38,3 +59,35 @@ class MPWrapper(RawInterfaceWrapper):
@property
def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
return self.data.qvel[:7].copy()
class ReplanMPWrapper(MPWrapper):
mp_config = {
'ProMP': {},
'DMP': {},
'ProDMP': {
'controller_kwargs': {
'p_gains': 0.01 * np.array([120., 120., 120., 120., 50., 30., 10.]),
'd_gains': 0.01 * np.array([10., 10., 10., 10., 6., 5., 3.]),
},
'trajectory_generator_kwargs': {
'weights_scale': 0.3,
'goal_scale': 0.3,
'auto_scale_basis': True,
'goal_offset': 1.0,
'disable_goal': True,
},
'basis_generator_kwargs': {
'num_basis': 5,
'basis_bandwidth_factor': 3,
},
'phase_generator_kwargs': {
'alpha_phase': 3,
},
'black_box_kwargs': {
'max_planning_times': 4,
'replanning_schedule': lambda pos, vel, obs, action, t: t % 25 == 0,
'condition_on_desired': True,
}
}
}
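The `replanning_schedule` above triggers a fresh ProDMP segment whenever the environment step counter is a multiple of 25, bounded by `max_planning_times`. A small standalone check of that schedule (illustrative only; the callable receives `(pos, vel, obs, action, t)` but only `t` is used here):

```python
def replanning_schedule(pos, vel, obs, action, t):
    return t % 25 == 0

plan_steps = [t for t in range(100) if replanning_schedule(None, None, None, None, t)]
print(plan_steps)  # [0, 25, 50, 75] -> at most 4 segments over the 100-step episode
```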

View File

@ -1,14 +1,68 @@
import os
from typing import Tuple, Union, Optional
from typing import Tuple, Union, Optional, Any, Dict
import numpy as np
from gym.core import ObsType
from gym.envs.mujoco.half_cheetah_v4 import HalfCheetahEnv
from gymnasium.core import ObsType
from gymnasium.envs.mujoco.half_cheetah_v4 import HalfCheetahEnv, DEFAULT_CAMERA_CONFIG
from gymnasium import utils
from gymnasium.envs.mujoco import MujocoEnv
from gymnasium.spaces import Box
MAX_EPISODE_STEPS_HALFCHEETAHJUMP = 100
class HalfCheetahJumpEnv(HalfCheetahEnv):
class HalfCheetahEnvCustomXML(HalfCheetahEnv):
def __init__(
self,
xml_file,
forward_reward_weight=1.0,
ctrl_cost_weight=0.1,
reset_noise_scale=0.1,
exclude_current_positions_from_observation=True,
**kwargs,
):
utils.EzPickle.__init__(
self,
xml_file,
forward_reward_weight,
ctrl_cost_weight,
reset_noise_scale,
exclude_current_positions_from_observation,
**kwargs,
)
self._forward_reward_weight = forward_reward_weight
self._ctrl_cost_weight = ctrl_cost_weight
self._reset_noise_scale = reset_noise_scale
self._exclude_current_positions_from_observation = (
exclude_current_positions_from_observation
)
if exclude_current_positions_from_observation:
observation_space = Box(
low=-np.inf, high=np.inf, shape=(18,), dtype=np.float64
)
else:
observation_space = Box(
low=-np.inf, high=np.inf, shape=(19,), dtype=np.float64
)
MujocoEnv.__init__(
self,
xml_file,
5,
observation_space=observation_space,
default_camera_config=DEFAULT_CAMERA_CONFIG,
**kwargs,
)
class HalfCheetahJumpEnv(HalfCheetahEnvCustomXML):
"""
_ctrl_cost_weight 0.1 -> 0.0
"""
@ -41,10 +95,11 @@ class HalfCheetahJumpEnv(HalfCheetahEnv):
height_after = self.get_body_com("torso")[2]
self.max_height = max(height_after, self.max_height)
## Didnt use fell_over, because base env also has no done condition - Paul and Marc
# Didnt use fell_over, because base env also has no done condition - Paul and Marc
# fell_over = abs(self.sim.data.qpos[2]) > 2.5 # how to figure out if the cheetah fell over? -> 2.5 oke?
# TODO: Should a fall over be checked here?
done = False
terminated = False
truncated = False
ctrl_cost = self.control_cost(action)
costs = ctrl_cost
@ -63,17 +118,18 @@ class HalfCheetahJumpEnv(HalfCheetahEnv):
'max_height': self.max_height
}
return observation, reward, done, info
return observation, reward, terminated, truncated, info
def _get_obs(self):
return np.append(super()._get_obs(), self.goal)
def reset(self, *, seed: Optional[int] = None, return_info: bool = False,
options: Optional[dict] = None, ) -> Union[ObsType, Tuple[ObsType, dict]]:
def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
-> Tuple[ObsType, Dict[str, Any]]:
self.max_height = 0
self.current_step = 0
ret = super().reset(seed=seed, options=options)
self.goal = self.np_random.uniform(1.1, 1.6, 1) # 1.1 1.6
return super().reset()
return ret
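As elsewhere in this refactor, seeding moves into `reset()`: passing `seed` seeds `self.np_random`, the generator the goal height is subsequently drawn from. A rough usage sketch; the environment id is assumed for illustration:

```python
import gymnasium as gym
import fancy_gym  # noqa: registers the fancy_gym environments

env = gym.make("fancy/HalfCheetahJump-v0")   # assumed id
obs, info = env.reset(seed=42)               # seeds np_random before the goal is sampled
obs, info = env.reset()                      # later resets reuse the seeded RNG state
env.close()
```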
# overwrite reset_model to make it deterministic
def reset_model(self):

View File

@ -6,6 +6,12 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper):
mp_config = {
'ProMP': {},
'DMP': {},
'ProDMP': {},
}
@property
def context_mask(self) -> np.ndarray:
return np.hstack([

View File

@ -0,0 +1,52 @@
<mujoco model="hopper">
<compiler angle="degree" coordinate="global" inertiafromgeom="true"/>
<default>
<joint armature="1" damping="1" limited="true"/>
<geom conaffinity="1" condim="1" contype="1" margin="0.001" material="geom" rgba="0.8 0.6 .4 1" solimp=".8 .8 .01" solref=".02 1"/>
<motor ctrllimited="true" ctrlrange="-.4 .4"/>
</default>
<option integrator="RK4" timestep="0.002"/>
<visual>
<map znear="0.02"/>
</visual>
<worldbody>
<light cutoff="100" diffuse="1 1 1" dir="-0 0 -1.3" directional="true" exponent="1" pos="0 0 1.3" specular=".1 .1 .1"/>
<geom conaffinity="1" condim="3" name="floor" pos="0 0 0" rgba="0.8 0.9 0.8 1" size="20 20 .125" type="plane" material="MatPlane"/>
<body name="torso" pos="0 0 1.25">
<camera name="track" mode="trackcom" pos="0 -3 1" xyaxes="1 0 0 0 0 1"/>
<joint armature="0" axis="1 0 0" damping="0" limited="false" name="rootx" pos="0 0 0" stiffness="0" type="slide"/>
<joint armature="0" axis="0 0 1" damping="0" limited="false" name="rootz" pos="0 0 0" ref="1.25" stiffness="0" type="slide"/>
<joint armature="0" axis="0 1 0" damping="0" limited="false" name="rooty" pos="0 0 1.25" stiffness="0" type="hinge"/>
<geom friction="0.9" fromto="0 0 1.45 0 0 1.05" name="torso_geom" size="0.05" type="capsule"/>
<body name="thigh" pos="0 0 1.05">
<joint axis="0 -1 0" name="thigh_joint" pos="0 0 1.05" range="-150 0" type="hinge"/>
<geom friction="0.9" fromto="0 0 1.05 0 0 0.6" name="thigh_geom" size="0.05" type="capsule"/>
<body name="leg" pos="0 0 0.35">
<joint axis="0 -1 0" name="leg_joint" pos="0 0 0.6" range="-150 0" type="hinge"/>
<geom friction="0.9" fromto="0 0 0.6 0 0 0.1" name="leg_geom" size="0.04" type="capsule"/>
<body name="foot" pos="0.13/2 0 0.1">
<site name="foot_site" pos="0 0 0.04" size="0.02 0.02 0.02" rgba="1 0 0 1" type="sphere"/>
<joint axis="0 -1 0" name="foot_joint" pos="0 0 0.1" range="-45 45" type="hinge"/>
<geom friction="2.0" fromto="-0.13 0 0.1 0.26 0 0.1" name="foot_geom" size="0.06" type="capsule"/>
</body>
</body>
</body>
</body>
<body name="goal_site_body" pos = "0 0 0">
<site name="goal_site" pos="0 0 0.0" size="0.02 0.02 0.02" rgba="0 1 0 1" type="sphere"/>
</body>
</worldbody>
<actuator>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="thigh_joint"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="leg_joint"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="foot_joint"/>
</actuator>
<asset>
<texture type="skybox" builtin="gradient" rgb1=".4 .5 .6" rgb2="0 0 0"
width="100" height="100"/>
<texture builtin="flat" height="1278" mark="cross" markrgb="1 1 1" name="texgeom" random="0.01" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" type="cube" width="127"/>
<texture builtin="checker" height="100" name="texplane" rgb1="0 0 0" rgb2="0.8 0.8 0.8" type="2d" width="100"/>
<material name="MatPlane" reflectance="0.5" shininess="1" specular="1" texrepeat="60 60" texture="texplane"/>
<material name="geom" texture="texgeom" texuniform="true"/>
</asset>
</mujoco>

View File

@ -1,52 +1,51 @@
<mujoco model="hopper">
<compiler angle="degree" coordinate="global" inertiafromgeom="true"/>
<default>
<joint armature="1" damping="1" limited="true"/>
<geom conaffinity="1" condim="1" contype="1" margin="0.001" material="geom" rgba="0.8 0.6 .4 1" solimp=".8 .8 .01" solref=".02 1"/>
<motor ctrllimited="true" ctrlrange="-.4 .4"/>
</default>
<option integrator="RK4" timestep="0.002"/>
<compiler angle="radian" autolimits="true"/>
<option integrator="RK4"/>
<visual>
<map znear="0.02"/>
</visual>
<default class="main">
<joint limited="true" armature="1" damping="1"/>
<geom condim="1" solimp="0.8 0.8 0.01 0.5 2" margin="0.001" material="geom" rgba="0.8 0.6 0.4 1"/>
<general ctrllimited="true" ctrlrange="-0.4 0.4"/>
</default>
<asset>
<texture type="skybox" builtin="gradient" rgb1="0.4 0.5 0.6" rgb2="0 0 0" width="100" height="600"/>
<texture type="cube" name="texgeom" builtin="flat" mark="cross" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" markrgb="1 1 1" width="127" height="762"/>
<texture type="2d" name="texplane" builtin="checker" rgb1="0 0 0" rgb2="0.8 0.8 0.8" width="100" height="100"/>
<material name="MatPlane" texture="texplane" texrepeat="60 60" specular="1" shininess="1" reflectance="0.5"/>
<material name="geom" texture="texgeom" texuniform="true"/>
</asset>
<worldbody>
<light cutoff="100" diffuse="1 1 1" dir="-0 0 -1.3" directional="true" exponent="1" pos="0 0 1.3" specular=".1 .1 .1"/>
<geom conaffinity="1" condim="3" name="floor" pos="0 0 0" rgba="0.8 0.9 0.8 1" size="20 20 .125" type="plane" material="MatPlane"/>
<body name="torso" pos="0 0 1.25">
<camera name="track" mode="trackcom" pos="0 -3 1" xyaxes="1 0 0 0 0 1"/>
<joint armature="0" axis="1 0 0" damping="0" limited="false" name="rootx" pos="0 0 0" stiffness="0" type="slide"/>
<joint armature="0" axis="0 0 1" damping="0" limited="false" name="rootz" pos="0 0 0" ref="1.25" stiffness="0" type="slide"/>
<joint armature="0" axis="0 1 0" damping="0" limited="false" name="rooty" pos="0 0 1.25" stiffness="0" type="hinge"/>
<geom friction="0.9" fromto="0 0 1.45 0 0 1.05" name="torso_geom" size="0.05" type="capsule"/>
<body name="thigh" pos="0 0 1.05">
<joint axis="0 -1 0" name="thigh_joint" pos="0 0 1.05" range="-150 0" type="hinge"/>
<geom friction="0.9" fromto="0 0 1.05 0 0 0.6" name="thigh_geom" size="0.05" type="capsule"/>
<body name="leg" pos="0 0 0.35">
<joint axis="0 -1 0" name="leg_joint" pos="0 0 0.6" range="-150 0" type="hinge"/>
<geom friction="0.9" fromto="0 0 0.6 0 0 0.1" name="leg_geom" size="0.04" type="capsule"/>
<body name="foot" pos="0.13/2 0 0.1">
<site name="foot_site" pos="0 0 0.04" size="0.02 0.02 0.02" rgba="1 0 0 1" type="sphere"/>
<joint axis="0 -1 0" name="foot_joint" pos="0 0 0.1" range="-45 45" type="hinge"/>
<geom friction="2.0" fromto="-0.13 0 0.1 0.26 0 0.1" name="foot_geom" size="0.06" type="capsule"/>
<geom name="floor" size="20 20 0.125" type="plane" condim="3" material="MatPlane" rgba="0.8 0.9 0.8 1"/>
<light pos="0 0 1.3" dir="0 0 -1" directional="true" cutoff="100" exponent="1" diffuse="1 1 1" specular="0.1 0.1 0.1"/>
<body name="torso" pos="0 0 1.25" gravcomp="0">
<joint name="rootx" pos="0 0 -1.25" axis="1 0 0" limited="false" type="slide" armature="0" damping="0"/>
<joint name="rootz" pos="0 0 -1.25" axis="0 0 1" limited="false" type="slide" ref="1.25" armature="0" damping="0"/>
<joint name="rooty" pos="0 0 0" axis="0 1 0" limited="false" armature="0" damping="0"/>
<geom name="torso_geom" size="0.05 0.2" type="capsule" friction="0.9 0.005 0.0001"/>
<camera name="track" pos="0 -3 -0.25" quat="0.707107 0.707107 0 0" mode="trackcom"/>
<body name="thigh" pos="0 0 -0.2" gravcomp="0">
<joint name="thigh_joint" pos="0 0 0" axis="0 -1 0" range="-2.61799 0"/>
<geom name="thigh_geom" size="0.05 0.225" pos="0 0 -0.225" type="capsule" friction="0.9 0.005 0.0001"/>
<body name="leg" pos="0 0 -0.7" gravcomp="0">
<joint name="leg_joint" pos="0 0 0.25" axis="0 -1 0" range="-2.61799 0"/>
<geom name="leg_geom" size="0.04 0.25" type="capsule" friction="0.9 0.005 0.0001"/>
<body name="foot" pos="0.065 0 -0.25" gravcomp="0">
<joint name="foot_joint" pos="-0.065 0 0" axis="0 -1 0" range="-0.785398 0.785398"/>
<geom name="foot_geom" size="0.06 0.195" quat="0.707107 0 -0.707107 0" type="capsule" friction="2 0.005 0.0001"/>
<site name="foot_site" pos="-0.065 0 -0.06" size="0.02" rgba="1 0 0 1"/>
</body>
</body>
</body>
</body>
<body name="goal_site_body" pos = "0 0 0">
<site name="goal_site" pos="0 0 0.0" size="0.02 0.02 0.02" rgba="0 1 0 1" type="sphere"/>
</body>
<body name="goal_site_body" pos="0 0 0" gravcomp="0">
<site name="goal_site" pos="0 0 0" size="0.02" rgba="0 1 0 1"/>
</body>
</worldbody>
<actuator>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="thigh_joint"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="leg_joint"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="foot_joint"/>
<general joint="thigh_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
<general joint="leg_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
<general joint="foot_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
</actuator>
<asset>
<texture type="skybox" builtin="gradient" rgb1=".4 .5 .6" rgb2="0 0 0"
width="100" height="100"/>
<texture builtin="flat" height="1278" mark="cross" markrgb="1 1 1" name="texgeom" random="0.01" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" type="cube" width="127"/>
<texture builtin="checker" height="100" name="texplane" rgb1="0 0 0" rgb2="0.8 0.8 0.8" type="2d" width="100"/>
<material name="MatPlane" reflectance="0.5" shininess="1" specular="1" texrepeat="60 60" texture="texplane"/>
<material name="geom" texture="texgeom" texuniform="true"/>
</asset>
</mujoco>
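The rewritten hopper models switch the compiler from degrees and global coordinates to radians with local positions, which is why joint ranges such as `-150 0` become `-2.61799 0` and `-45 45` becomes `-0.785398 0.785398`. The numbers are plain degree-to-radian conversions:

```python
import math

print(math.radians(-150))  # -2.6179938779914944
print(math.radians(-45))   # -0.7853981633974483
print(math.radians(45))    #  0.7853981633974483
```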

View File

@ -1,51 +1,50 @@
<mujoco model="hopper">
<compiler angle="degree" coordinate="global" inertiafromgeom="true"/>
<default>
<joint armature="1" damping="1" limited="true"/>
<geom conaffinity="1" condim="1" contype="1" margin="0.001" material="geom" rgba="0.8 0.6 .4 1" solimp=".8 .8 .01" solref=".02 1"/>
<motor ctrllimited="true" ctrlrange="-.4 .4"/>
</default>
<option integrator="RK4" timestep="0.002"/>
<compiler angle="radian" autolimits="true"/>
<option integrator="RK4"/>
<visual>
<map znear="0.02"/>
</visual>
<default class="main">
<joint limited="true" armature="1" damping="1"/>
<geom condim="1" solimp="0.8 0.8 0.01 0.5 2" margin="0.001" material="geom" rgba="0.8 0.6 0.4 1"/>
<general ctrllimited="true" ctrlrange="-0.4 0.4"/>
</default>
<asset>
<texture type="skybox" builtin="gradient" rgb1="0.4 0.5 0.6" rgb2="0 0 0" width="100" height="600"/>
<texture type="cube" name="texgeom" builtin="flat" mark="cross" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" markrgb="1 1 1" width="127" height="762"/>
<texture type="2d" name="texplane" builtin="checker" rgb1="0 0 0" rgb2="0.8 0.8 0.8" width="100" height="100"/>
<material name="MatPlane" texture="texplane" texrepeat="60 60" specular="1" shininess="1" reflectance="0.5"/>
<material name="geom" texture="texgeom" texuniform="true"/>
</asset>
<worldbody>
<light cutoff="100" diffuse="1 1 1" dir="-0 0 -1.3" directional="true" exponent="1" pos="0 0 1.3" specular=".1 .1 .1"/>
<geom conaffinity="1" condim="3" name="floor" pos="0 0 0" rgba="0.8 0.9 0.8 1" size="20 20 .125" type="plane" material="MatPlane"/>
<body name="torso" pos="0 0 1.25">
<camera name="track" mode="trackcom" pos="0 -3 1" xyaxes="1 0 0 0 0 1"/>
<joint armature="0" axis="1 0 0" damping="0" limited="false" name="rootx" pos="0 0 0" stiffness="0" type="slide"/>
<joint armature="0" axis="0 0 1" damping="0" limited="false" name="rootz" pos="0 0 0" ref="1.25" stiffness="0" type="slide"/>
<joint armature="0" axis="0 1 0" damping="0" limited="false" name="rooty" pos="0 0 1.25" stiffness="0" type="hinge"/>
<geom friction="0.9" fromto="0 0 1.45 0 0 1.05" name="torso_geom" size="0.05" type="capsule"/>
<body name="thigh" pos="0 0 1.05">
<joint axis="0 -1 0" name="thigh_joint" pos="0 0 1.05" range="-150 0" type="hinge"/>
<geom friction="0.9" fromto="0 0 1.05 0 0 0.6" name="thigh_geom" size="0.05" type="capsule"/>
<body name="leg" pos="0 0 0.35">
<joint axis="0 -1 0" name="leg_joint" pos="0 0 0.6" range="-150 0" type="hinge"/>
<geom friction="0.9" fromto="0 0 0.6 0 0 0.1" name="leg_geom" size="0.04" type="capsule"/>
<body name="foot" pos="0.13/2 0 0.1">
<joint axis="0 -1 0" name="foot_joint" pos="0 0 0.1" range="-45 45" type="hinge"/>
<geom friction="2.0" fromto="-0.13 0 0.1 0.26 0 0.1" name="foot_geom" size="0.06" type="capsule"/>
<geom name="floor" size="20 20 0.125" type="plane" condim="3" material="MatPlane" rgba="0.8 0.9 0.8 1"/>
<light pos="0 0 1.3" dir="0 0 -1" directional="true" cutoff="100" exponent="1" diffuse="1 1 1" specular="0.1 0.1 0.1"/>
<body name="torso" pos="0 0 1.25" gravcomp="0">
<joint name="rootx" pos="0 0 -1.25" axis="1 0 0" limited="false" type="slide" armature="0" damping="0"/>
<joint name="rootz" pos="0 0 -1.25" axis="0 0 1" limited="false" type="slide" ref="1.25" armature="0" damping="0"/>
<joint name="rooty" pos="0 0 0" axis="0 1 0" limited="false" armature="0" damping="0"/>
<geom name="torso_geom" size="0.05 0.2" type="capsule" friction="0.9 0.005 0.0001"/>
<camera name="track" pos="0 -3 -0.25" quat="0.707107 0.707107 0 0" mode="trackcom"/>
<body name="thigh" pos="0 0 -0.2" gravcomp="0">
<joint name="thigh_joint" pos="0 0 0" axis="0 -1 0" range="-2.61799 0"/>
<geom name="thigh_geom" size="0.05 0.225" pos="0 0 -0.225" type="capsule" friction="0.9 0.005 0.0001"/>
<body name="leg" pos="0 0 -0.7" gravcomp="0">
<joint name="leg_joint" pos="0 0 0.25" axis="0 -1 0" range="-2.61799 0"/>
<geom name="leg_geom" size="0.04 0.25" type="capsule" friction="0.9 0.005 0.0001"/>
<body name="foot" pos="0.065 0 -0.25" gravcomp="0">
<joint name="foot_joint" pos="-0.065 0 0" axis="0 -1 0" range="-0.785398 0.785398"/>
<geom name="foot_geom" size="0.06 0.195" quat="0.707107 0 -0.707107 0" type="capsule" friction="2 0.005 0.0001"/>
</body>
</body>
</body>
</body>
<body name="box" pos="1 0 0">
<geom friction="1.0" fromto="0.48 0 0 1 0 0" name="basket_ground_geom" size="0.3" type="box" rgba="1 0 0 1"/>
<body name="box" pos="1 0 0" gravcomp="0">
<geom name="basket_ground_geom" size="0.3 0.3 0.26" pos="-0.26 0 0" quat="0.707107 0 -0.707107 0" type="box" rgba="1 0 0 1"/>
</body>
</worldbody>
<actuator>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="thigh_joint"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="leg_joint"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="foot_joint"/>
<general joint="thigh_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
<general joint="leg_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
<general joint="foot_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
</actuator>
<asset>
<texture type="skybox" builtin="gradient" rgb1=".4 .5 .6" rgb2="0 0 0"
width="100" height="100"/>
<texture builtin="flat" height="1278" mark="cross" markrgb="1 1 1" name="texgeom" random="0.01" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" type="cube" width="127"/>
<texture builtin="checker" height="100" name="texplane" rgb1="0 0 0" rgb2="0.8 0.8 0.8" type="2d" width="100"/>
<material name="MatPlane" reflectance="0.5" shininess="1" specular="1" texrepeat="60 60" texture="texplane"/>
<material name="geom" texture="texgeom" texuniform="true"/>
</asset>
</mujoco>

View File

@ -1,12 +1,95 @@
import os
import numpy as np
from gym.envs.mujoco.hopper_v4 import HopperEnv
from gymnasium.envs.mujoco.hopper_v4 import HopperEnv, DEFAULT_CAMERA_CONFIG
from gymnasium import utils
from gymnasium.envs.mujoco import MujocoEnv
from gymnasium.spaces import Box
import mujoco
MAX_EPISODE_STEPS_HOPPERJUMP = 250
class HopperJumpEnv(HopperEnv):
class HopperEnvCustomXML(HopperEnv):
"""
Initialization changes to normal Hopper:
- terminate_when_unhealthy: True -> False
- healthy_reward: 1.0 -> 2.0
- healthy_z_range: (0.7, float('inf')) -> (0.5, float('inf'))
- healthy_angle_range: (-0.2, 0.2) -> (-float('inf'), float('inf'))
- exclude_current_positions_from_observation: True -> False
"""
def __init__(
self,
xml_file,
forward_reward_weight=1.0,
ctrl_cost_weight=1e-3,
healthy_reward=1.0,
terminate_when_unhealthy=True,
healthy_state_range=(-100.0, 100.0),
healthy_z_range=(0.7, float("inf")),
healthy_angle_range=(-0.2, 0.2),
reset_noise_scale=5e-3,
exclude_current_positions_from_observation=True,
**kwargs,
):
xml_file = os.path.join(os.path.dirname(__file__), "assets", xml_file)
utils.EzPickle.__init__(
self,
xml_file,
forward_reward_weight,
ctrl_cost_weight,
healthy_reward,
terminate_when_unhealthy,
healthy_state_range,
healthy_z_range,
healthy_angle_range,
reset_noise_scale,
exclude_current_positions_from_observation,
**kwargs
)
self._forward_reward_weight = forward_reward_weight
self._ctrl_cost_weight = ctrl_cost_weight
self._healthy_reward = healthy_reward
self._terminate_when_unhealthy = terminate_when_unhealthy
self._healthy_state_range = healthy_state_range
self._healthy_z_range = healthy_z_range
self._healthy_angle_range = healthy_angle_range
self._reset_noise_scale = reset_noise_scale
self._exclude_current_positions_from_observation = (
exclude_current_positions_from_observation
)
if not hasattr(self, 'observation_space'):
if exclude_current_positions_from_observation:
self.observation_space = Box(
low=-np.inf, high=np.inf, shape=(15,), dtype=np.float64
)
else:
self.observation_space = Box(
low=-np.inf, high=np.inf, shape=(16,), dtype=np.float64
)
MujocoEnv.__init__(
self,
xml_file,
4,
observation_space=self.observation_space,
default_camera_config=DEFAULT_CAMERA_CONFIG,
**kwargs,
)
class HopperJumpEnv(HopperEnvCustomXML):
"""
Initialization changes to normal Hopper:
- terminate_when_unhealthy: True -> False
@ -73,7 +156,7 @@ class HopperJumpEnv(HopperEnv):
self.do_simulation(action, self.frame_skip)
height_after = self.get_body_com("torso")[2]
#site_pos_after = self.data.get_site_xpos('foot_site')
# site_pos_after = self.data.get_site_xpos('foot_site')
site_pos_after = self.data.site('foot_site').xpos
self.max_height = max(height_after, self.max_height)
@ -88,7 +171,8 @@ class HopperJumpEnv(HopperEnv):
ctrl_cost = self.control_cost(action)
costs = ctrl_cost
done = False
terminated = False
truncated = False
goal_dist = np.linalg.norm(site_pos_after - self.goal)
if self.contact_dist is None and self.contact_with_floor:
@ -115,7 +199,7 @@ class HopperJumpEnv(HopperEnv):
healthy=self.is_healthy,
contact_dist=self.contact_dist or 0
)
return observation, reward, done, info
return observation, reward, terminated, truncated, info
def _get_obs(self):
# goal_dist = self.data.get_site_xpos('foot_site') - self.goal
@ -140,8 +224,8 @@ class HopperJumpEnv(HopperEnv):
noise_high[5] = 0.785
qpos = (
self.np_random.uniform(low=noise_low, high=noise_high, size=self.model.nq) +
self.init_qpos
self.np_random.uniform(low=noise_low, high=noise_high, size=self.model.nq) +
self.init_qpos
)
qvel = (
# self.np_random.uniform(low=noise_low, high=noise_high, size=self.model.nv) +
@ -162,12 +246,12 @@ class HopperJumpEnv(HopperEnv):
# floor_geom_id = self.model.geom_name2id('floor')
# foot_geom_id = self.model.geom_name2id('foot_geom')
# TODO: do this properly over a sensor in the xml file, see dmc hopper
floor_geom_id = self._mujoco_bindings.mj_name2id(self.model,
self._mujoco_bindings.mjtObj.mjOBJ_GEOM,
'floor')
foot_geom_id = self._mujoco_bindings.mj_name2id(self.model,
self._mujoco_bindings.mjtObj.mjOBJ_GEOM,
'foot_geom')
floor_geom_id = mujoco.mj_name2id(self.model,
mujoco.mjtObj.mjOBJ_GEOM,
'floor')
foot_geom_id = mujoco.mj_name2id(self.model,
mujoco.mjtObj.mjOBJ_GEOM,
'foot_geom')
for i in range(self.data.ncon):
contact = self.data.contact[i]
collision = contact.geom1 == floor_geom_id and contact.geom2 == foot_geom_id
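The contact check above now calls the official `mujoco` bindings directly instead of going through gym's former `self._mujoco_bindings` indirection. A self-contained sketch of the same name-to-id lookup on a toy model (illustrative, not fancy_gym code):

```python
import mujoco

XML = """
<mujoco>
  <worldbody>
    <geom name="floor" type="plane" size="1 1 0.1"/>
    <body pos="0 0 1">
      <freejoint/>
      <geom name="foot_geom" type="sphere" size="0.05"/>
    </body>
  </worldbody>
</mujoco>
"""
model = mujoco.MjModel.from_xml_string(XML)
floor_geom_id = mujoco.mj_name2id(model, mujoco.mjtObj.mjOBJ_GEOM, "floor")
foot_geom_id = mujoco.mj_name2id(model, mujoco.mjtObj.mjOBJ_GEOM, "foot_geom")
print(floor_geom_id, foot_geom_id)  # 0 1
```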

View File

@ -1,12 +1,16 @@
import os
from typing import Optional, Dict, Any, Tuple
import numpy as np
from gym.envs.mujoco.hopper_v4 import HopperEnv
from gymnasium.core import ObsType
from fancy_gym.envs.mujoco.hopper_jump.hopper_jump import HopperEnvCustomXML
from gymnasium import spaces
MAX_EPISODE_STEPS_HOPPERJUMPONBOX = 250
class HopperJumpOnBoxEnv(HopperEnv):
class HopperJumpOnBoxEnv(HopperEnvCustomXML):
"""
Initialization changes to normal Hopper:
- healthy_reward: 1.0 -> 0.01 -> 0.001
@ -33,6 +37,16 @@ class HopperJumpOnBoxEnv(HopperEnv):
self.hopper_on_box = False
self.context = context
self.box_x = 1
if exclude_current_positions_from_observation:
self.observation_space = spaces.Box(
low=-np.inf, high=np.inf, shape=(12,), dtype=np.float64
)
else:
self.observation_space = spaces.Box(
low=-np.inf, high=np.inf, shape=(13,), dtype=np.float64
)
xml_file = os.path.join(os.path.dirname(__file__), "assets", xml_file)
super().__init__(xml_file, forward_reward_weight, ctrl_cost_weight, healthy_reward, terminate_when_unhealthy,
healthy_state_range, healthy_z_range, healthy_angle_range, reset_noise_scale,
@ -74,10 +88,10 @@ class HopperJumpOnBoxEnv(HopperEnv):
costs = ctrl_cost
done = fell_over or self.hopper_on_box
terminated = fell_over or self.hopper_on_box
if self.current_step >= self.max_episode_steps or done:
done = False
if self.current_step >= self.max_episode_steps or terminated:
done = False # TODO why are we doing this???
max_height = self.max_height.copy()
min_distance = self.min_distance.copy()
@ -122,21 +136,25 @@ class HopperJumpOnBoxEnv(HopperEnv):
'goal': self.box_x,
}
return observation, reward, done, info
truncated = self.current_step >= self.max_episode_steps and not terminated
return observation, reward, terminated, truncated, info
def _get_obs(self):
return np.append(super()._get_obs(), self.box_x)
def reset(self):
def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
-> Tuple[ObsType, Dict[str, Any]]:
self.max_height = 0
self.min_distance = 5000
self.current_step = 0
self.hopper_on_box = False
ret = super().reset(seed=seed, options=options)
if self.context:
self.box_x = self.np_random.uniform(1, 3, 1)
self.model.body("box").pos = [self.box_x[0], 0, 0]
return super().reset()
return ret
# overwrite reset_model to make it deterministic
def reset_model(self):
@ -150,21 +168,3 @@ class HopperJumpOnBoxEnv(HopperEnv):
observation = self._get_obs()
return observation
if __name__ == '__main__':
render_mode = "human" # "human" or "partial" or "final"
env = HopperJumpOnBoxEnv()
obs = env.reset()
for i in range(2000):
# objective.load_result("/tmp/cma")
# test with random actions
ac = env.action_space.sample()
obs, rew, d, info = env.step(ac)
if i % 10 == 0:
env.render(mode=render_mode)
if d:
print('After ', i, ' steps, done: ', d)
env.reset()
env.close()

View File

@ -6,6 +6,11 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper):
mp_config = {
'ProMP': {},
'DMP': {},
'ProDMP': {},
}
# Random x goal + random init pos
@property

View File

@ -1,56 +1,54 @@
<mujoco model="hopper">
<compiler angle="degree" coordinate="global" inertiafromgeom="true"/>
<default>
<joint armature="1" damping="1" limited="true"/>
<geom conaffinity="1" condim="1" contype="1" margin="0.001" material="geom" rgba="0.8 0.6 .4 1" solimp=".8 .8 .01" solref=".02 1"/>
<motor ctrllimited="true" ctrlrange="-.4 .4"/>
</default>
<option integrator="RK4" timestep="0.002"/>
<compiler angle="radian" autolimits="true"/>
<option integrator="RK4"/>
<visual>
<map znear="0.02"/>
</visual>
<default class="main">
<joint limited="true" armature="1" damping="1"/>
<geom condim="1" solimp="0.8 0.8 0.01 0.5 2" margin="0.001" material="geom" rgba="0.8 0.6 0.4 1"/>
<general ctrllimited="true" ctrlrange="-0.4 0.4"/>
</default>
<asset>
<texture type="skybox" builtin="gradient" rgb1="0.4 0.5 0.6" rgb2="0 0 0" width="100" height="600"/>
<texture type="cube" name="texgeom" builtin="flat" mark="cross" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" markrgb="1 1 1" width="127" height="762"/>
<texture type="2d" name="texplane" builtin="checker" rgb1="0 0 0" rgb2="0.8 0.8 0.8" width="100" height="100"/>
<material name="MatPlane" texture="texplane" texrepeat="60 60" specular="1" shininess="1" reflectance="0.5"/>
<material name="geom" texture="texgeom" texuniform="true"/>
</asset>
<worldbody>
<light cutoff="100" diffuse="1 1 1" dir="-0 0 -1.3" directional="true" exponent="1" pos="0 0 1.3" specular=".1 .1 .1"/>
<geom conaffinity="1" condim="3" name="floor" pos="0 0 0" rgba="0.8 0.9 0.8 1" size="20 20 .125" type="plane" material="MatPlane"/>
<body name="torso" pos="0 0 1.25">
<camera name="track" mode="trackcom" pos="0 -3 1" xyaxes="1 0 0 0 0 1"/>
<joint armature="0" axis="1 0 0" damping="0" limited="false" name="rootx" pos="0 0 0" stiffness="0" type="slide"/>
<joint armature="0" axis="0 0 1" damping="0" limited="false" name="rootz" pos="0 0 0" ref="1.25" stiffness="0" type="slide"/>
<joint armature="0" axis="0 1 0" damping="0" limited="false" name="rooty" pos="0 0 1.25" stiffness="0" type="hinge"/>
<geom friction="0.9" fromto="0 0 1.45 0 0 1.05" name="torso_geom" size="0.05" type="capsule"/>
<body name="thigh" pos="0 0 1.05">
<joint axis="0 -1 0" name="thigh_joint" pos="0 0 1.05" range="-150 0" type="hinge"/>
<geom friction="0.9" fromto="0 0 1.05 0 0 0.6" name="thigh_geom" size="0.05" type="capsule"/>
<body name="leg" pos="0 0 0.35">
<joint axis="0 -1 0" name="leg_joint" pos="0 0 0.6" range="-150 0" type="hinge"/>
<geom friction="0.9" fromto="0 0 0.6 0 0 0.1" name="leg_geom" size="0.04" type="capsule"/>
<body name="foot" pos="0.13/2 0 0.1">
<joint axis="0 -1 0" name="foot_joint" pos="0 0 0.1" range="-45 45" type="hinge"/>
<geom friction="2.0" fromto="-0.13 0 0.1 0.26 0 0.1" name="foot_geom" size="0.06" type="capsule"/>
<geom name="floor" size="20 20 0.125" type="plane" condim="3" material="MatPlane" rgba="0.8 0.9 0.8 1"/>
<light pos="0 0 1.3" dir="0 0 -1" directional="true" cutoff="100" exponent="1" diffuse="1 1 1" specular="0.1 0.1 0.1"/>
<body name="torso" pos="0 0 1.25" gravcomp="0">
<joint name="rootx" pos="0 0 -1.25" axis="1 0 0" limited="false" type="slide" armature="0" damping="0"/>
<joint name="rootz" pos="0 0 -1.25" axis="0 0 1" limited="false" type="slide" ref="1.25" armature="0" damping="0"/>
<joint name="rooty" pos="0 0 0" axis="0 1 0" limited="false" armature="0" damping="0"/>
<geom name="torso_geom" size="0.05 0.2" type="capsule" friction="0.9 0.005 0.0001"/>
<camera name="track" pos="0 -3 -0.25" quat="0.707107 0.707107 0 0" mode="trackcom"/>
<body name="thigh" pos="0 0 -0.2" gravcomp="0">
<joint name="thigh_joint" pos="0 0 0" axis="0 -1 0" range="-2.61799 0"/>
<geom name="thigh_geom" size="0.05 0.225" pos="0 0 -0.225" type="capsule" friction="0.9 0.005 0.0001"/>
<body name="leg" pos="0 0 -0.7" gravcomp="0">
<joint name="leg_joint" pos="0 0 0.25" axis="0 -1 0" range="-2.61799 0"/>
<geom name="leg_geom" size="0.04 0.25" type="capsule" friction="0.9 0.005 0.0001"/>
<body name="foot" pos="0.065 0 -0.25" gravcomp="0">
<joint name="foot_joint" pos="-0.065 0 0" axis="0 -1 0" range="-0.785398 0.785398"/>
<geom name="foot_geom" size="0.06 0.195" quat="0.707107 0 -0.707107 0" type="capsule" friction="2 0.005 0.0001"/>
</body>
</body>
</body>
</body>
<body name="ball" pos="0 0 1.53">
<joint armature="0" axis="1 0 0" damping="0.0" name="tar:x" pos="0 0 1.53" stiffness="0" type="slide" frictionloss="0" limited="false"/>
<joint armature="0" axis="0 1 0" damping="0.0" name="tar:y" pos="0 0 1.53" stiffness="0" type="slide" frictionloss="0" limited="false"/>
<joint armature="0" axis="0 0 1" damping="0.0" name="tar:z" pos="0 0 1.53" stiffness="0" type="slide" frictionloss="0" limited="false"/>
<geom pos="0 0 1.53" priority= "1" size="0.025 0.025 0.025" type="sphere" condim="4" name="ball_geom" rgba="0.8 0.2 0.1 1" mass="0.1"
friction="0.1 0.1 0.1" solimp="0.9 0.95 0.001 0.5 2" solref="-10000 -10"/>
<site name="target_ball" pos="0 0 1.53" size="0.04 0.04 0.04" rgba="1 0 0 1" type="sphere"/>
<body name="ball" pos="0 0 1.53" gravcomp="0">
<joint name="tar:x" pos="0 0 0" axis="1 0 0" limited="false" type="slide" armature="0" damping="0"/>
<joint name="tar:y" pos="0 0 0" axis="0 1 0" limited="false" type="slide" armature="0" damping="0"/>
<joint name="tar:z" pos="0 0 0" axis="0 0 1" limited="false" type="slide" armature="0" damping="0"/>
<geom name="ball_geom" size="0.025" condim="4" priority="1" friction="0.1 0.1 0.1" solref="-10000 -10" solimp="0.9 0.95 0.001 0.5 2" mass="0.1" rgba="0.8 0.2 0.1 1"/>
<site name="target_ball" pos="0 0 0" size="0.04" rgba="1 0 0 1"/>
</body>
</worldbody>
<actuator>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="thigh_joint"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="leg_joint"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="foot_joint"/>
<general joint="thigh_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
<general joint="leg_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
<general joint="foot_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
</actuator>
<asset>
<texture type="skybox" builtin="gradient" rgb1=".4 .5 .6" rgb2="0 0 0"
width="100" height="100"/>
<texture builtin="flat" height="1278" mark="cross" markrgb="1 1 1" name="texgeom" random="0.01" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" type="cube" width="127"/>
<texture builtin="checker" height="100" name="texplane" rgb1="0 0 0" rgb2="0.8 0.8 0.8" type="2d" width="100"/>
<material name="MatPlane" reflectance="0.5" shininess="1" specular="1" texrepeat="60 60" texture="texplane"/>
<material name="geom" texture="texgeom" texuniform="true"/>
</asset>
</mujoco>

View File

@ -1,132 +1,129 @@
<mujoco model="hopper">
<compiler angle="degree" coordinate="global" inertiafromgeom="true"/>
<default>
<joint armature="1" damping="1" limited="true"/>
<geom conaffinity="1" condim="1" contype="1" margin="0.001" material="geom" rgba="0.8 0.6 .4 1" solimp=".8 .8 .01" solref=".02 1"/>
<motor ctrllimited="true" ctrlrange="-.4 .4"/>
</default>
<option integrator="RK4" timestep="0.002"/>
<compiler angle="radian" autolimits="true"/>
<option integrator="RK4"/>
<visual>
<map znear="0.02"/>
</visual>
<default class="main">
<joint limited="true" armature="1" damping="1"/>
<geom condim="1" solimp="0.8 0.8 0.01 0.5 2" margin="0.001" material="geom" rgba="0.8 0.6 0.4 1"/>
<general ctrllimited="true" ctrlrange="-0.4 0.4"/>
</default>
<asset>
<texture type="skybox" builtin="gradient" rgb1="0.4 0.5 0.6" rgb2="0 0 0" width="100" height="600"/>
<texture type="cube" name="texgeom" builtin="flat" mark="cross" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" markrgb="1 1 1" width="127" height="762"/>
<texture type="2d" name="texplane" builtin="checker" rgb1="0 0 0" rgb2="0.8 0.8 0.8" width="100" height="100"/>
<material name="MatPlane" texture="texplane" texrepeat="60 60" specular="1" shininess="1" reflectance="0.5"/>
<material name="geom" texture="texgeom" texuniform="true"/>
</asset>
<worldbody>
<light cutoff="100" diffuse="1 1 1" dir="-0 0 -1.3" directional="true" exponent="1" pos="0 0 1.3" specular=".1 .1 .1"/>
<geom conaffinity="1" condim="3" name="floor" pos="0 0 0" rgba="0.8 0.9 0.8 1" size="20 20 .125" type="plane" material="MatPlane"/>
<body name="torso" pos="0 0 1.25">
<camera name="track" mode="trackcom" pos="0 -3 1" xyaxes="1 0 0 0 0 1"/>
<joint armature="0" axis="1 0 0" damping="0" limited="false" name="rootx" pos="0 0 0" stiffness="0" type="slide"/>
<joint armature="0" axis="0 0 1" damping="0" limited="false" name="rootz" pos="0 0 0" ref="1.25" stiffness="0" type="slide"/>
<joint armature="0" axis="0 1 0" damping="0" limited="false" name="rooty" pos="0 0 1.25" stiffness="0" type="hinge"/>
<geom friction="0.9" fromto="0 0 1.45 0 0 1.05" name="torso_geom" size="0.05" type="capsule"/>
<body name="thigh" pos="0 0 1.05">
<joint axis="0 -1 0" name="thigh_joint" pos="0 0 1.05" range="-150 0" type="hinge"/>
<geom friction="0.9" fromto="0 0 1.05 0 0 0.6" name="thigh_geom" size="0.05" type="capsule"/>
<body name="leg" pos="0 0 0.35">
<joint axis="0 -1 0" name="leg_joint" pos="0 0 0.6" range="-150 0" type="hinge"/>
<geom friction="0.9" fromto="0 0 0.6 0 0 0.1" name="leg_geom" size="0.04" type="capsule"/>
<body name="foot" pos="0.13/2 0 0.1">
<joint axis="0 -1 0" name="foot_joint" pos="0 0 0.1" range="-45 45" type="hinge"/>
<geom friction="2.0" fromto="-0.13 0 0.1 0.26 0 0.1" name="foot_geom" size="0.06" type="capsule"/>
<geom name="floor" size="20 20 0.125" type="plane" condim="3" material="MatPlane" rgba="0.8 0.9 0.8 1"/>
<light pos="0 0 1.3" dir="0 0 -1" directional="true" cutoff="100" exponent="1" diffuse="1 1 1" specular="0.1 0.1 0.1"/>
<body name="torso" pos="0 0 1.25" gravcomp="0">
<joint name="rootx" pos="0 0 -1.25" axis="1 0 0" limited="false" type="slide" armature="0" damping="0"/>
<joint name="rootz" pos="0 0 -1.25" axis="0 0 1" limited="false" type="slide" ref="1.25" armature="0" damping="0"/>
<joint name="rooty" pos="0 0 0" axis="0 1 0" limited="false" armature="0" damping="0"/>
<geom name="torso_geom" size="0.05 0.2" type="capsule" friction="0.9 0.005 0.0001"/>
<camera name="track" pos="0 -3 -0.25" quat="0.707107 0.707107 0 0" mode="trackcom"/>
<body name="thigh" pos="0 0 -0.2" gravcomp="0">
<joint name="thigh_joint" pos="0 0 0" axis="0 -1 0" range="-2.61799 0"/>
<geom name="thigh_geom" size="0.05 0.225" pos="0 0 -0.225" type="capsule" friction="0.9 0.005 0.0001"/>
<body name="leg" pos="0 0 -0.7" gravcomp="0">
<joint name="leg_joint" pos="0 0 0.25" axis="0 -1 0" range="-2.61799 0"/>
<geom name="leg_geom" size="0.04 0.25" type="capsule" friction="0.9 0.005 0.0001"/>
<body name="foot" pos="0.065 0 -0.25" gravcomp="0">
<joint name="foot_joint" pos="-0.065 0 0" axis="0 -1 0" range="-0.785398 0.785398"/>
<geom name="foot_geom" size="0.06 0.195" quat="0.707107 0 -0.707107 0" type="capsule" friction="2 0.005 0.0001"/>
</body>
</body>
</body>
</body>
<body name="ball" pos="0 0 1.53">
<joint armature="0" axis="1 0 0" damping="0.0" name="tar:x" pos="0 0 1.53" stiffness="0" type="slide" frictionloss="0" limited="false"/>
<joint armature="0" axis="0 1 0" damping="0.0" name="tar:y" pos="0 0 1.53" stiffness="0" type="slide" frictionloss="0" limited="false"/>
<joint armature="0" axis="0 0 1" damping="0.0" name="tar:z" pos="0 0 1.53" stiffness="0" type="slide" frictionloss="0" limited="false"/>
<geom pos="0 0 1.53" priority= "1" size="0.025 0.025 0.025" type="sphere" condim="4" name="ball_geom" rgba="0.8 0.2 0.1 1" mass="0.1"
friction="0.1 0.1 0.1" solimp="0.9 0.95 0.001 0.5 2" solref="-10000 -10"/>
<site name="target_ball" pos="0 0 1.53" size="0.04 0.04 0.04" rgba="1 0 0 1" type="sphere"/>
<body name="ball" pos="0 0 1.53" gravcomp="0">
<joint name="tar:x" pos="0 0 0" axis="1 0 0" limited="false" type="slide" armature="0" damping="0"/>
<joint name="tar:y" pos="0 0 0" axis="0 1 0" limited="false" type="slide" armature="0" damping="0"/>
<joint name="tar:z" pos="0 0 0" axis="0 0 1" limited="false" type="slide" armature="0" damping="0"/>
<geom name="ball_geom" size="0.025" condim="4" priority="1" friction="0.1 0.1 0.1" solref="-10000 -10" solimp="0.9 0.95 0.001 0.5 2" mass="0.1" rgba="0.8 0.2 0.1 1"/>
<site name="target_ball" pos="0 0 0" size="0.04" rgba="1 0 0 1"/>
</body>
<body name="basket_ground" pos="5 0 0">
<geom friction="0.9" fromto="5 0 0 5.3 0 0" name="basket_ground_geom" size="0.1 0.4 0.3" type="box"/>
<body name="edge1" pos="5 0 0">
<geom friction="2.0" fromto="5 0 0 5 0 0.2" name="edge1_geom" size="0.04" type="capsule"/>
</body>
<body name="edge2" pos="5 0 0.05">
<geom friction="2.0" fromto="5 0.05 0 5 0.05 0.2" name="edge2_geom" size="0.04" type="capsule"/>
</body>
<body name="edge3" pos="5 0 0.1">
<geom friction="2.0" fromto="5 0.1 0 5 0.1 0.2" name="edge3_geom" size="0.04" type="capsule"/>
</body>
<body name="edge4" pos="5 0 0.15">
<geom friction="2.0" fromto="5 0.15 0 5 0.15 0.2" name="edge4_geom" size="0.04" type="capsule"/>
</body>
<body name="edge5" pos="5.05 0 0.15">
<geom friction="2.0" fromto="5.05 0.15 0 5.05 0.15 0.2" name="edge5_geom" size="0.04" type="capsule"/>
</body>
<body name="edge6" pos="5.1 0 0.15">
<geom friction="2.0" fromto="5.1 0.15 0 5.1 0.15 0.2" name="edge6_geom" size="0.04" type="capsule"/>
</body>
<body name="edge7" pos="5.15 0 0.15">
<geom friction="2.0" fromto="5.15 0.15 0 5.15 0.15 0.2" name="edge7_geom" size="0.04" type="capsule"/>
</body>
<body name="edge8" pos="5.2 0 0.15">
<geom friction="2.0" fromto="5.2 0.15 0 5.2 0.15 0.2" name="edge8_geom" size="0.04" type="capsule"/>
</body>
<body name="edge9" pos="5.25 0 0.15">
<geom friction="2.0" fromto="5.25 0.15 0 5.25 0.15 0.2" name="edge9_geom" size="0.04" type="capsule"/>
</body>
<body name="edge10" pos="5.3 0 0.15">
<geom friction="2.0" fromto="5.3 0.15 0 5.3 0.15 0.2" name="edge10_geom" size="0.04" type="capsule"/>
</body>
<body name="edge11" pos="5.3 0 0.1">
<geom friction="2.0" fromto="5.3 0.1 0 5.3 0.1 0.2" name="edge11_geom" size="0.04" type="capsule"/>
</body>
<body name="edge12" pos="5.3 0 0.05">
<geom friction="2.0" fromto="5.3 0.05 0 5.3 0.05 0.2" name="edge12_geom" size="0.04" type="capsule"/>
</body>
<body name="edge13" pos="5.3 0 0.0">
<geom friction="2.0" fromto="5.3 0 0 5.3 0 0.2" name="edge13_geom" size="0.04" type="capsule"/>
</body>
<body name="edge14" pos="5.3 0 -0.05">
<geom friction="2.0" fromto="5.3 -0.05 0 5.3 -0.05 0.2" name="edge14_geom" size="0.04" type="capsule"/>
</body>
<body name="edge15" pos="5.3 0 -0.1">
<geom friction="2.0" fromto="5.3 -0.1 0 5.3 -0.1 0.2" name="edge15_geom" size="0.04" type="capsule"/>
</body>
<body name="edge16" pos="5.3 0 -0.15">
<geom friction="2.0" fromto="5.3 -0.15 0 5.3 -0.15 0.2" name="edge16_geom" size="0.04" type="capsule"/>
</body>
<body name="edge20" pos="5.25 0 -0.15">
<geom friction="2.0" fromto="5.25 -0.15 0 5.25 -0.15 0.2" name="edge20_geom" size="0.04" type="capsule"/>
</body>
<body name="edge21" pos="5.2 0 -0.15">
<geom friction="2.0" fromto="5.2 -0.15 0 5.2 -0.15 0.2" name="edge21_geom" size="0.04" type="capsule"/>
</body>
<body name="edge22" pos="5.15 0 -0.15">
<geom friction="2.0" fromto="5.15 -0.15 0 5.15 -0.15 0.2" name="edge22_geom" size="0.04" type="capsule"/>
</body>
<body name="edge23" pos="5.1 0 -0.15">
<geom friction="2.0" fromto="5.1 -0.15 0 5.1 -0.15 0.2" name="edge23_geom" size="0.04" type="capsule"/>
</body>
<body name="edge24" pos="5.05 0 -0.15">
<geom friction="2.0" fromto="5.05 -0.15 0 5.05 -0.15 0.2" name="edge24_geom" size="0.04" type="capsule"/>
</body>
<body name="edge25" pos="5 0 -0.15">
<geom friction="2.0" fromto="5 -0.15 0 5 -0.15 0.2" name="edge25_geom" size="0.04" type="capsule"/>
</body>
<body name="edge26" pos="5 0 -0.1">
<geom friction="2.0" fromto="5 -0.1 0 5 -0.1 0.2" name="edge26_geom" size="0.04" type="capsule"/>
</body>
<body name="edge27" pos="5 0 -0.05">
<geom friction="2.0" fromto="5 -0.05 0 5 -0.05 0.2" name="edge27_geom" size="0.04" type="capsule"/>
</body>
<body name="basket_ground" pos="5 0 0" gravcomp="0">
<geom name="basket_ground_geom" size="0.1 0.1 0.15" pos="0.15 0 0" quat="0.707107 0 -0.707107 0" type="box" friction="0.9 0.005 0.0001"/>
<body name="edge1" pos="0 0 0" gravcomp="0">
<geom name="edge1_geom" size="0.04 0.1" pos="0 0 0.1" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body>
<body name="edge2" pos="0 0 0.05" gravcomp="0">
<geom name="edge2_geom" size="0.04 0.1" pos="0 0.05 0.05" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body>
<body name="edge3" pos="0 0 0.1" gravcomp="0">
<geom name="edge3_geom" size="0.04 0.1" pos="0 0.1 0" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body>
<body name="edge4" pos="0 0 0.15" gravcomp="0">
<geom name="edge4_geom" size="0.04 0.1" pos="0 0.15 -0.05" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body>
<body name="edge5" pos="0.05 0 0.15" gravcomp="0">
<geom name="edge5_geom" size="0.04 0.1" pos="0 0.15 -0.05" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body>
<body name="edge6" pos="0.1 0 0.15" gravcomp="0">
<geom name="edge6_geom" size="0.04 0.1" pos="0 0.15 -0.05" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body>
<body name="edge7" pos="0.15 0 0.15" gravcomp="0">
<geom name="edge7_geom" size="0.04 0.1" pos="0 0.15 -0.05" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body>
<body name="edge8" pos="0.2 0 0.15" gravcomp="0">
<geom name="edge8_geom" size="0.04 0.1" pos="0 0.15 -0.05" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body>
<body name="edge9" pos="0.25 0 0.15" gravcomp="0">
<geom name="edge9_geom" size="0.04 0.1" pos="0 0.15 -0.05" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body>
<body name="edge10" pos="0.3 0 0.15" gravcomp="0">
<geom name="edge10_geom" size="0.04 0.1" pos="0 0.15 -0.05" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body>
<body name="edge11" pos="0.3 0 0.1" gravcomp="0">
<geom name="edge11_geom" size="0.04 0.1" pos="0 0.1 0" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body>
<body name="edge12" pos="0.3 0 0.05" gravcomp="0">
<geom name="edge12_geom" size="0.04 0.1" pos="0 0.05 0.05" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body>
<body name="edge13" pos="0.3 0 0" gravcomp="0">
<geom name="edge13_geom" size="0.04 0.1" pos="0 0 0.1" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body>
<body name="edge14" pos="0.3 0 -0.05" gravcomp="0">
<geom name="edge14_geom" size="0.04 0.1" pos="0 -0.05 0.15" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body>
<body name="edge15" pos="0.3 0 -0.1" gravcomp="0">
<geom name="edge15_geom" size="0.04 0.1" pos="0 -0.1 0.2" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body>
<body name="edge16" pos="0.3 0 -0.15" gravcomp="0">
<geom name="edge16_geom" size="0.04 0.1" pos="0 -0.15 0.25" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body>
<body name="edge20" pos="0.25 0 -0.15" gravcomp="0">
<geom name="edge20_geom" size="0.04 0.1" pos="0 -0.15 0.25" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body>
<body name="edge21" pos="0.2 0 -0.15" gravcomp="0">
<geom name="edge21_geom" size="0.04 0.1" pos="0 -0.15 0.25" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body>
<body name="edge22" pos="0.15 0 -0.15" gravcomp="0">
<geom name="edge22_geom" size="0.04 0.1" pos="0 -0.15 0.25" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body>
<body name="edge23" pos="0.1 0 -0.15" gravcomp="0">
<geom name="edge23_geom" size="0.04 0.1" pos="0 -0.15 0.25" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body>
<body name="edge24" pos="0.05 0 -0.15" gravcomp="0">
<geom name="edge24_geom" size="0.04 0.1" pos="0 -0.15 0.25" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body>
<body name="edge25" pos="0 0 -0.15" gravcomp="0">
<geom name="edge25_geom" size="0.04 0.1" pos="0 -0.15 0.25" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body>
<body name="edge26" pos="0 0 -0.1" gravcomp="0">
<geom name="edge26_geom" size="0.04 0.1" pos="0 -0.1 0.2" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body>
<body name="edge27" pos="0 0 -0.05" gravcomp="0">
<geom name="edge27_geom" size="0.04 0.1" pos="0 -0.05 0.15" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body>
</body>
</worldbody>
<actuator>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="thigh_joint"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="leg_joint"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="foot_joint"/>
<general joint="thigh_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
<general joint="leg_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
<general joint="foot_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
</actuator>
<asset>
<texture type="skybox" builtin="gradient" rgb1=".4 .5 .6" rgb2="0 0 0"
width="100" height="100"/>
<texture builtin="flat" height="1278" mark="cross" markrgb="1 1 1" name="texgeom" random="0.01" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" type="cube" width="127"/>
<texture builtin="checker" height="100" name="texplane" rgb1="0 0 0" rgb2="0.8 0.8 0.8" type="2d" width="100"/>
<material name="MatPlane" reflectance="0.5" shininess="1" specular="1" texrepeat="60 60" texture="texplane"/>
<material name="geom" texture="texgeom" texuniform="true"/>
</asset>
</mujoco>

View File

@ -1,13 +1,15 @@
import os
from typing import Optional
from typing import Optional, Any, Dict, Tuple
import numpy as np
from gym.envs.mujoco.hopper_v4 import HopperEnv
from gymnasium.core import ObsType
from fancy_gym.envs.mujoco.hopper_jump.hopper_jump import HopperEnvCustomXML
from gymnasium import spaces
MAX_EPISODE_STEPS_HOPPERTHROW = 250
class HopperThrowEnv(HopperEnv):
class HopperThrowEnv(HopperEnvCustomXML):
"""
Initialization changes to normal Hopper:
- healthy_reward: 1.0 -> 0.0 -> 0.1
@ -36,6 +38,16 @@ class HopperThrowEnv(HopperEnv):
self.max_episode_steps = max_episode_steps
self.context = context
self.goal = 0
if not hasattr(self, 'observation_space'):
self.observation_space = spaces.Box(
low=-np.inf, high=np.inf, shape=(18,), dtype=np.float64
)
else:
self.observation_space = spaces.Box(
low=-np.inf, high=np.inf, shape=(19,), dtype=np.float64
)
super().__init__(xml_file=xml_file,
forward_reward_weight=forward_reward_weight,
ctrl_cost_weight=ctrl_cost_weight,
@ -56,14 +68,14 @@ class HopperThrowEnv(HopperEnv):
# done = self.done TODO We should use this, not sure why there is no other termination; ball_landed should be enough, because we only look at the throw itself? - Paul and Marc
ball_landed = bool(self.get_body_com("ball")[2] <= 0.05)
done = ball_landed
terminated = ball_landed
ctrl_cost = self.control_cost(action)
costs = ctrl_cost
rewards = 0
if self.current_step >= self.max_episode_steps or done:
if self.current_step >= self.max_episode_steps or terminated:
distance_reward = -np.linalg.norm(ball_pos_after - self.goal) if self.context else \
self._forward_reward_weight * ball_pos_after
healthy_reward = 0 if self.context else self.healthy_reward * self.current_step
@ -78,16 +90,19 @@ class HopperThrowEnv(HopperEnv):
'_steps': self.current_step,
'goal': self.goal,
}
truncated = False
return observation, reward, done, info
return observation, reward, terminated, truncated, info
def _get_obs(self):
return np.append(super()._get_obs(), self.goal)
def reset(self, *, seed: Optional[int] = None, return_info: bool = False, options: Optional[dict] = None):
def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
-> Tuple[ObsType, Dict[str, Any]]:
self.current_step = 0
ret = super().reset(seed=seed, options=options)
self.goal = self.np_random.uniform(2.0, 6.0, 1) # 0.5 8.0
return super().reset()
return ret
# overwrite reset_model to make it deterministic
def reset_model(self):
@ -101,22 +116,3 @@ class HopperThrowEnv(HopperEnv):
observation = self._get_obs()
return observation
if __name__ == '__main__':
render_mode = "human" # "human" or "partial" or "final"
env = HopperThrowEnv()
obs = env.reset()
for i in range(2000):
# objective.load_result("/tmp/cma")
# test with random actions
ac = env.action_space.sample()
obs, rew, d, info = env.step(ac)
if i % 10 == 0:
env.render(mode=render_mode)
if d:
print('After ', i, ' steps, done: ', d)
env.reset()
env.close()

View File

@ -1,13 +1,16 @@
import os
from typing import Optional
from typing import Optional, Any, Dict, Tuple
import numpy as np
from gym.envs.mujoco.hopper_v4 import HopperEnv
from fancy_gym.envs.mujoco.hopper_jump.hopper_jump import HopperEnvCustomXML
from gymnasium.core import ObsType
from gymnasium import spaces
MAX_EPISODE_STEPS_HOPPERTHROWINBASKET = 250
class HopperThrowInBasketEnv(HopperEnv):
class HopperThrowInBasketEnv(HopperEnvCustomXML):
"""
Initialization changes to normal Hopper:
- healthy_reward: 1.0 -> 0.0
@ -42,6 +45,16 @@ class HopperThrowInBasketEnv(HopperEnv):
self.context = context
self.penalty = penalty
self.basket_x = 5
if exclude_current_positions_from_observation:
self.observation_space = spaces.Box(
low=-np.inf, high=np.inf, shape=(18,), dtype=np.float64
)
else:
self.observation_space = spaces.Box(
low=-np.inf, high=np.inf, shape=(19,), dtype=np.float64
)
xml_file = os.path.join(os.path.dirname(__file__), "assets", xml_file)
super().__init__(xml_file=xml_file,
forward_reward_weight=forward_reward_weight,
@ -65,14 +78,14 @@ class HopperThrowInBasketEnv(HopperEnv):
is_in_basket_x = ball_pos[0] >= basket_pos[0] and ball_pos[0] <= basket_pos[0] + self.basket_size
is_in_basket_y = ball_pos[1] >= basket_pos[1] - (self.basket_size / 2) and ball_pos[1] <= basket_pos[1] + (
self.basket_size / 2)
self.basket_size / 2)
is_in_basket_z = ball_pos[2] < 0.1
is_in_basket = is_in_basket_x and is_in_basket_y and is_in_basket_z
if is_in_basket:
self.ball_in_basket = True
ball_landed = self.get_body_com("ball")[2] <= 0.05
done = bool(ball_landed or is_in_basket)
terminated = bool(ball_landed or is_in_basket)
rewards = 0
@ -80,7 +93,7 @@ class HopperThrowInBasketEnv(HopperEnv):
costs = ctrl_cost
if self.current_step >= self.max_episode_steps or done:
if self.current_step >= self.max_episode_steps or terminated:
if is_in_basket:
if not self.context:
@ -101,23 +114,27 @@ class HopperThrowInBasketEnv(HopperEnv):
info = {
'ball_pos': ball_pos[0],
}
truncated = False
return observation, reward, done, info
return observation, reward, terminated, truncated, info
def _get_obs(self):
return np.append(super()._get_obs(), self.basket_x)
def reset(self, *, seed: Optional[int] = None, return_info: bool = False, options: Optional[dict] = None):
def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
-> Tuple[ObsType, Dict[str, Any]]:
if self.max_episode_steps == 10:
# We have to initialize this here, because the spec is only added after creating the env.
self.max_episode_steps = self.spec.max_episode_steps
self.current_step = 0
self.ball_in_basket = False
ret = super().reset(seed=seed, options=options)
if self.context:
self.basket_x = self.np_random.uniform(low=3, high=7, size=1)
self.model.body("basket_ground").pos[:] = [self.basket_x[0], 0, 0]
return super().reset()
return ret
# overwrite reset_model to make it deterministic
def reset_model(self):
@ -132,22 +149,3 @@ class HopperThrowInBasketEnv(HopperEnv):
observation = self._get_obs()
return observation
if __name__ == '__main__':
render_mode = "human" # "human" or "partial" or "final"
env = HopperThrowInBasketEnv()
obs = env.reset()
for i in range(2000):
# objective.load_result("/tmp/cma")
# test with random actions
ac = env.action_space.sample()
obs, rew, d, info = env.step(ac)
if i % 10 == 0:
env.render(mode=render_mode)
if d:
print('After ', i, ' steps, done: ', d)
env.reset()
env.close()

View File

@ -6,6 +6,11 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper):
mp_config = {
'ProMP': {},
'DMP': {},
'ProDMP': {},
}
@property
def context_mask(self):

View File

@ -7,6 +7,16 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper):
mp_config = {
'ProMP': {},
'DMP': {
'phase_generator_kwargs': {
'alpha_phase': 2,
},
},
'ProDMP': {},
}
@property
def context_mask(self):
return np.concatenate([[False] * self.n_links, # cos

View File

@ -1,8 +1,9 @@
import os
import numpy as np
from gym import utils
from gym.envs.mujoco import MujocoEnv
from gymnasium import utils
from gymnasium.envs.mujoco import MujocoEnv
from gymnasium.spaces import Box
MAX_EPISODE_STEPS_REACHER = 200
@ -12,7 +13,17 @@ class ReacherEnv(MujocoEnv, utils.EzPickle):
More general version of the gym mujoco Reacher environment
"""
def __init__(self, sparse: bool = False, n_links: int = 5, reward_weight: float = 1, ctrl_cost_weight: float = 1):
metadata = {
"render_modes": [
"human",
"rgb_array",
"depth_array",
],
"render_fps": 50,
}
def __init__(self, sparse: bool = False, n_links: int = 5, reward_weight: float = 1, ctrl_cost_weight: float = 1.,
**kwargs):
utils.EzPickle.__init__(**locals())
self._steps = 0
@ -25,10 +36,16 @@ class ReacherEnv(MujocoEnv, utils.EzPickle):
file_name = f'reacher_{n_links}links.xml'
# sin, cos, velocity * n_Links + goal position (2) and goal distance (3)
shape = (self.n_links * 3 + 5,)
observation_space = Box(low=-np.inf, high=np.inf, shape=shape, dtype=np.float64)
MujocoEnv.__init__(self,
model_path=os.path.join(os.path.dirname(__file__), "assets", file_name),
frame_skip=2,
mujoco_bindings="mujoco")
observation_space=observation_space,
**kwargs
)
def step(self, action):
self._steps += 1
@ -45,10 +62,14 @@ class ReacherEnv(MujocoEnv, utils.EzPickle):
reward = reward_dist + reward_ctrl + angular_vel
self.do_simulation(action, self.frame_skip)
ob = self._get_obs()
done = False
if self.render_mode == "human":
self.render()
infos = dict(
ob = self._get_obs()
terminated = False
truncated = False
info = dict(
reward_dist=reward_dist,
reward_ctrl=reward_ctrl,
velocity=angular_vel,
@ -56,7 +77,7 @@ class ReacherEnv(MujocoEnv, utils.EzPickle):
goal=self.goal if hasattr(self, "goal") else None
)
return ob, reward, done, infos
return ob, reward, terminated, truncated, info
def distance_reward(self):
vec = self.get_body_com("fingertip") - self.get_body_com("target")
@ -66,6 +87,7 @@ class ReacherEnv(MujocoEnv, utils.EzPickle):
return -10 * np.square(self.data.qvel.flat[:self.n_links]).sum() if self.sparse else 0.0
def viewer_setup(self):
assert self.viewer is not None
self.viewer.cam.trackbodyid = 0
def reset_model(self):

View File

@ -7,6 +7,53 @@ from fancy_gym.envs.mujoco.table_tennis.table_tennis_utils import jnt_pos_low, j
class TT_MPWrapper(RawInterfaceWrapper):
mp_config = {
'ProMP': {
'phase_generator_kwargs': {
'learn_tau': False,
'learn_delay': False,
'tau_bound': [0.8, 1.5],
'delay_bound': [0.05, 0.15],
},
'controller_kwargs': {
'p_gains': 0.5 * np.array([1.0, 4.0, 2.0, 4.0, 1.0, 4.0, 1.0]),
'd_gains': 0.5 * np.array([0.1, 0.4, 0.2, 0.4, 0.1, 0.4, 0.1]),
},
'basis_generator_kwargs': {
'num_basis': 3,
'num_basis_zero_start': 1,
'num_basis_zero_goal': 1,
},
'black_box_kwargs': {
'verbose': 2,
},
},
'DMP': {},
'ProDMP': {
'phase_generator_kwargs': {
'learn_tau': True,
'learn_delay': True,
'tau_bound': [0.8, 1.5],
'delay_bound': [0.05, 0.15],
'alpha_phase': 3,
},
'controller_kwargs': {
'p_gains': 0.5 * np.array([1.0, 4.0, 2.0, 4.0, 1.0, 4.0, 1.0]),
'd_gains': 0.5 * np.array([0.1, 0.4, 0.2, 0.4, 0.1, 0.4, 0.1]),
},
'basis_generator_kwargs': {
'num_basis': 3,
'alpha': 25,
'basis_bandwidth_factor': 3,
},
'trajectory_generator_kwargs': {
'weights_scale': 0.7,
'auto_scale_basis': True,
'relative_goal': True,
'disable_goal': True,
},
},
}
# Random x goal + random init pos
@property
@ -16,7 +63,7 @@ class TT_MPWrapper(RawInterfaceWrapper):
[False] * 7, # joints velocity
[True] * 2, # position ball x, y
[False] * 1, # position ball z
#[True] * 3, # velocity ball x, y, z
# [True] * 3, # velocity ball x, y, z
[True] * 2, # target landing position
# [True] * 1, # time
])
@ -40,7 +87,58 @@ class TT_MPWrapper(RawInterfaceWrapper):
return_contextual_obs: bool, tau_bound:list, delay_bound:list) -> Tuple[np.ndarray, float, bool, dict]:
return self.get_invalid_traj_step_return(action, pos_traj, return_contextual_obs, tau_bound, delay_bound)
class TT_MPWrapper_Replan(TT_MPWrapper):
mp_config = {
'ProMP': {},
'DMP': {},
'ProDMP': {
'phase_generator_kwargs': {
'learn_tau': True,
'learn_delay': True,
'tau_bound': [0.8, 1.5],
'delay_bound': [0.05, 0.15],
'alpha_phase': 3,
},
'controller_kwargs': {
'p_gains': 0.5 * np.array([1.0, 4.0, 2.0, 4.0, 1.0, 4.0, 1.0]),
'd_gains': 0.5 * np.array([0.1, 0.4, 0.2, 0.4, 0.1, 0.4, 0.1]),
},
'basis_generator_kwargs': {
'num_basis': 2,
'alpha': 25,
'basis_bandwidth_factor': 3,
},
'trajectory_generator_kwargs': {
'auto_scale_basis': True,
'goal_offset': 1.0,
},
'black_box_kwargs': {
'max_planning_times': 3,
'replanning_schedule': lambda pos, vel, obs, action, t: t % 50 == 0,
},
},
}
class TTVelObs_MPWrapper(TT_MPWrapper):
# Will inherit mp_config from TT_MPWrapper
@property
def context_mask(self):
return np.hstack([
[False] * 7, # joints position
[False] * 7, # joints velocity
[True] * 2, # position ball x, y
[False] * 1, # position ball z
[True] * 3, # velocity ball x, y, z
[True] * 2, # target landing position
# [True] * 1, # time
])
class TTVelObs_MPWrapper_Replan(TT_MPWrapper_Replan):
# Will inherit mp_config from TT_MPWrapper_Replan
@property
def context_mask(self):

View File

@ -1,8 +1,8 @@
import os
import numpy as np
from gym import utils, spaces
from gym.envs.mujoco import MujocoEnv
from gymnasium import utils, spaces
from gymnasium.envs.mujoco import MujocoEnv
from fancy_gym.envs.mujoco.table_tennis.table_tennis_utils import is_init_state_valid, magnus_force
from fancy_gym.envs.mujoco.table_tennis.table_tennis_utils import jnt_pos_low, jnt_pos_high
@ -22,6 +22,16 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
"""
7 DoF table tennis environment
"""
metadata = {
"render_modes": [
"human",
"rgb_array",
"depth_array",
],
"render_fps": 125
}
def __init__(self, ctxt_dim: int = 4, frame_skip: int = 4,
goal_switching_step: int = None,
enable_artificial_wind: bool = False):
@ -50,10 +60,15 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
self._artificial_force = 0.
if not hasattr(self, 'observation_space'):
self.observation_space = spaces.Box(
low=-np.inf, high=np.inf, shape=(19,), dtype=np.float64
)
MujocoEnv.__init__(self,
model_path=os.path.join(os.path.dirname(__file__), "assets", "xml", "table_tennis_env.xml"),
frame_skip=frame_skip,
mujoco_bindings="mujoco")
observation_space=self.observation_space)
if ctxt_dim == 2:
self.context_bounds = CONTEXT_BOUNDS_2DIMS
@ -83,11 +98,11 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
unstable_simulation = False
if self._steps == self._goal_switching_step and self.np_random.uniform() < 0.5:
new_goal_pos = self._generate_goal_pos(random=True)
new_goal_pos[1] = -new_goal_pos[1]
self._goal_pos = new_goal_pos
self.model.body_pos[5] = np.concatenate([self._goal_pos, [0.77]])
mujoco.mj_forward(self.model, self.data)
new_goal_pos = self._generate_goal_pos(random=True)
new_goal_pos[1] = -new_goal_pos[1]
self._goal_pos = new_goal_pos
self.model.body_pos[5] = np.concatenate([self._goal_pos, [0.77]])
mujoco.mj_forward(self.model, self.data)
for _ in range(self.frame_skip):
if self._enable_artificial_wind:
@ -102,7 +117,7 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
if not self._hit_ball:
self._hit_ball = self._contact_checker(self._ball_contact_id, self._bat_front_id) or \
self._contact_checker(self._ball_contact_id, self._bat_back_id)
self._contact_checker(self._ball_contact_id, self._bat_back_id)
if not self._hit_ball:
ball_land_on_floor_no_hit = self._contact_checker(self._ball_contact_id, self._floor_contact_id)
if ball_land_on_floor_no_hit:
@ -130,9 +145,9 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
reward = -25 if unstable_simulation else self._get_reward(self._terminated)
land_dist_err = np.linalg.norm(self._ball_landing_pos[:-1] - self._goal_pos) \
if self._ball_landing_pos is not None else 10.
if self._ball_landing_pos is not None else 10.
return self._get_obs(), reward, self._terminated, {
info = {
"hit_ball": self._hit_ball,
"ball_returned_success": self._ball_return_success,
"land_dist_error": land_dist_err,
@ -140,6 +155,10 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
"num_steps": self._steps,
}
terminated, truncated = self._terminated, False
return self._get_obs(), reward, terminated, truncated, info
def _contact_checker(self, id_1, id_2):
for coni in range(0, self.data.ncon):
con = self.data.contact[coni]
@ -202,7 +221,7 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
if not self._hit_ball:
return 0.2 * (1 - np.tanh(min_r_b_dist**2))
if self._ball_landing_pos is None:
min_b_des_b_dist = np.min(np.linalg.norm(np.array(self._ball_traj)[:,:2] - self._goal_pos[:2], axis=1))
min_b_des_b_dist = np.min(np.linalg.norm(np.array(self._ball_traj)[:, :2] - self._goal_pos[:2], axis=1))
return 2 * (1 - np.tanh(min_r_b_dist ** 2)) + (1 - np.tanh(min_b_des_b_dist**2))
min_b_des_b_land_dist = np.linalg.norm(self._goal_pos[:2] - self._ball_landing_pos[:2])
over_net_bonus = int(self._ball_landing_pos[0] < 0)
@ -231,13 +250,13 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
violate_high_bound_error = np.mean(np.maximum(pos_traj - jnt_pos_high, 0))
violate_low_bound_error = np.mean(np.maximum(jnt_pos_low - pos_traj, 0))
invalid_penalty = tau_invalid_penalty + delay_invalid_penalty + \
violate_high_bound_error + violate_low_bound_error
violate_high_bound_error + violate_low_bound_error
return -invalid_penalty
def get_invalid_traj_step_return(self, action, pos_traj, contextual_obs, tau_bound, delay_bound):
obs = self._get_obs() if contextual_obs else np.concatenate([self._get_obs(), np.array([0])]) # 0 for invalid traj
obs = self._get_obs() if contextual_obs else np.concatenate([self._get_obs(), np.array([0])]) # 0 for invalid traj
penalty = self._get_traj_invalid_penalty(action, pos_traj, tau_bound, delay_bound)
return obs, penalty, True, {
return obs, penalty, True, False, {
"hit_ball": [False],
"ball_returned_success": [False],
"land_dist_error": [10.],
@ -249,7 +268,7 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
@staticmethod
def check_traj_validity(action, pos_traj, vel_traj, tau_bound, delay_bound):
time_invalid = action[0] > tau_bound[1] or action[0] < tau_bound[0] \
or action[1] > delay_bound[1] or action[1] < delay_bound[0]
or action[1] > delay_bound[1] or action[1] < delay_bound[0]
if time_invalid or np.any(pos_traj > jnt_pos_high) or np.any(pos_traj < jnt_pos_low):
return False, pos_traj, vel_traj
return True, pos_traj, vel_traj
@ -257,6 +276,9 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
class TableTennisWind(TableTennisEnv):
def __init__(self, ctxt_dim: int = 4, frame_skip: int = 4):
self.observation_space = spaces.Box(
low=-np.inf, high=np.inf, shape=(22,), dtype=np.float64
)
super().__init__(ctxt_dim=ctxt_dim, frame_skip=frame_skip, enable_artificial_wind=True)
def _get_obs(self):

View File

@ -1,64 +1,60 @@
<mujoco model="walker2d">
<compiler angle="degree" coordinate="global" inertiafromgeom="true"/>
<default>
<joint armature="0.01" damping=".1" limited="true"/>
<geom conaffinity="0" condim="3" contype="1" density="1000" friction=".7 .1 .1" rgba="0.8 0.6 .4 1"/>
<compiler angle="radian" autolimits="true"/>
<option integrator="RK4"/>
<default class="main">
<joint limited="true" armature="0.01" damping="0.1"/>
<geom conaffinity="0" friction="0.7 0.1 0.1" rgba="0.8 0.6 0.4 1"/>
</default>
<option integrator="RK4" timestep="0.002"/>
<asset>
<texture type="skybox" builtin="gradient" rgb1="0.4 0.5 0.6" rgb2="0 0 0" width="100" height="600"/>
<texture type="cube" name="texgeom" builtin="flat" mark="cross" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" markrgb="1 1 1" width="127" height="762"/>
<texture type="2d" name="texplane" builtin="checker" rgb1="0 0 0" rgb2="0.8 0.8 0.8" width="100" height="100"/>
<material name="MatPlane" texture="texplane" texrepeat="60 60" specular="1" shininess="1" reflectance="0.5"/>
<material name="geom" texture="texgeom" texuniform="true"/>
</asset>
<worldbody>
<light cutoff="100" diffuse="1 1 1" dir="-0 0 -1.3" directional="true" exponent="1" pos="0 0 1.3" specular=".1 .1 .1"/>
<geom conaffinity="1" condim="3" name="floor" pos="0 0 0" rgba="0.8 0.9 0.8 1" size="40 40 40" type="plane" material="MatPlane"/>
<body name="torso" pos="0 0 1.25">
<camera name="track" mode="trackcom" pos="0 -3 1" xyaxes="1 0 0 0 0 1"/>
<joint armature="0" axis="1 0 0" damping="0" limited="false" name="rootx" pos="0 0 0" stiffness="0" type="slide"/>
<joint armature="0" axis="0 0 1" damping="0" limited="false" name="rootz" pos="0 0 0" ref="1.25" stiffness="0" type="slide"/>
<joint armature="0" axis="0 1 0" damping="0" limited="false" name="rooty" pos="0 0 1.25" stiffness="0" type="hinge"/>
<geom friction="0.9" fromto="0 0 1.45 0 0 1.05" name="torso_geom" size="0.05" type="capsule"/>
<body name="thigh" pos="0 0 1.05">
<joint axis="0 -1 0" name="thigh_joint" pos="0 0 1.05" range="-150 0" type="hinge"/>
<geom friction="0.9" fromto="0 0 1.05 0 0 0.6" name="thigh_geom" size="0.05" type="capsule"/>
<body name="leg" pos="0 0 0.35">
<joint axis="0 -1 0" name="leg_joint" pos="0 0 0.6" range="-150 0" type="hinge"/>
<geom friction="0.9" fromto="0 0 0.6 0 0 0.1" name="leg_geom" size="0.04" type="capsule"/>
<body name="foot" pos="0.2/2 0 0.1">
<site name="foot_right_site" pos="0 0 0.04" size="0.02 0.02 0.02" rgba="0 0 1 1" type="sphere"/>
<joint axis="0 -1 0" name="foot_joint" pos="0 0 0.1" range="-45 45" type="hinge"/>
<geom friction="0.9" fromto="-0.0 0 0.1 0.2 0 0.1" name="foot_geom" size="0.06" type="capsule"/>
<geom name="floor" size="40 40 40" type="plane" conaffinity="1" material="MatPlane" rgba="0.8 0.9 0.8 1"/>
<light pos="0 0 1.3" dir="0 0 -1" directional="true" cutoff="100" exponent="1" diffuse="1 1 1" specular="0.1 0.1 0.1"/>
<body name="torso" pos="0 0 1.25" gravcomp="0">
<joint name="rootx" pos="0 0 -1.25" axis="1 0 0" limited="false" type="slide" armature="0" damping="0"/>
<joint name="rootz" pos="0 0 -1.25" axis="0 0 1" limited="false" type="slide" ref="1.25" armature="0" damping="0"/>
<joint name="rooty" pos="0 0 0" axis="0 1 0" limited="false" armature="0" damping="0"/>
<geom name="torso_geom" size="0.05 0.2" type="capsule" friction="0.9 0.1 0.1"/>
<camera name="track" pos="0 -3 -0.25" quat="0.707107 0.707107 0 0" mode="trackcom"/>
<body name="thigh" pos="0 0 -0.2" gravcomp="0">
<joint name="thigh_joint" pos="0 0 0" axis="0 -1 0" range="-2.61799 0"/>
<geom name="thigh_geom" size="0.05 0.225" pos="0 0 -0.225" type="capsule" friction="0.9 0.1 0.1"/>
<body name="leg" pos="0 0 -0.7" gravcomp="0">
<joint name="leg_joint" pos="0 0 0.25" axis="0 -1 0" range="-2.61799 0"/>
<geom name="leg_geom" size="0.04 0.25" type="capsule" friction="0.9 0.1 0.1"/>
<body name="foot" pos="0.1 0 -0.25" gravcomp="0">
<joint name="foot_joint" pos="-0.1 0 0" axis="0 -1 0" range="-0.785398 0.785398"/>
<geom name="foot_geom" size="0.06 0.1" quat="0.707107 0 -0.707107 0" type="capsule" friction="0.9 0.1 0.1"/>
<site name="foot_right_site" pos="-0.1 0 -0.06" size="0.02" rgba="0 0 1 1"/>
</body>
</body>
</body>
<!-- copied and then replace thigh->thigh_left, leg->leg_left, foot->foot_right -->
<body name="thigh_left" pos="0 0 1.05">
<joint axis="0 -1 0" name="thigh_left_joint" pos="0 0 1.05" range="-150 0" type="hinge"/>
<geom friction="0.9" fromto="0 0 1.05 0 0 0.6" name="thigh_left_geom" rgba=".7 .3 .6 1" size="0.05" type="capsule"/>
<body name="leg_left" pos="0 0 0.35">
<joint axis="0 -1 0" name="leg_left_joint" pos="0 0 0.6" range="-150 0" type="hinge"/>
<geom friction="0.9" fromto="0 0 0.6 0 0 0.1" name="leg_left_geom" rgba=".7 .3 .6 1" size="0.04" type="capsule"/>
<body name="foot_left" pos="0.2/2 0 0.1">
<site name="foot_left_site" pos="0 0 0.04" size="0.02 0.02 0.02" rgba="1 0 0 1" type="sphere"/>
<joint axis="0 -1 0" name="foot_left_joint" pos="0 0 0.1" range="-45 45" type="hinge"/>
<geom friction="1.9" fromto="-0.0 0 0.1 0.2 0 0.1" name="foot_left_geom" rgba=".7 .3 .6 1" size="0.06" type="capsule"/>
<body name="thigh_left" pos="0 0 -0.2" gravcomp="0">
<joint name="thigh_left_joint" pos="0 0 0" axis="0 -1 0" range="-2.61799 0"/>
<geom name="thigh_left_geom" size="0.05 0.225" pos="0 0 -0.225" type="capsule" friction="0.9 0.1 0.1" rgba="0.7 0.3 0.6 1"/>
<body name="leg_left" pos="0 0 -0.7" gravcomp="0">
<joint name="leg_left_joint" pos="0 0 0.25" axis="0 -1 0" range="-2.61799 0"/>
<geom name="leg_left_geom" size="0.04 0.25" type="capsule" friction="0.9 0.1 0.1" rgba="0.7 0.3 0.6 1"/>
<body name="foot_left" pos="0.1 0 -0.25" gravcomp="0">
<joint name="foot_left_joint" pos="-0.1 0 0" axis="0 -1 0" range="-0.785398 0.785398"/>
<geom name="foot_left_geom" size="0.06 0.1" quat="0.707107 0 -0.707107 0" type="capsule" friction="1.9 0.1 0.1" rgba="0.7 0.3 0.6 1"/>
<site name="foot_left_site" pos="-0.1 0 -0.06" size="0.02" rgba="1 0 0 1"/>
</body>
</body>
</body>
</body>
</worldbody>
<actuator>
<!-- <motor joint="torso_joint" ctrlrange="-100.0 100.0" isctrllimited="true"/>-->
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="100" joint="thigh_joint"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="100" joint="leg_joint"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="100" joint="foot_joint"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="100" joint="thigh_left_joint"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="100" joint="leg_left_joint"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="100" joint="foot_left_joint"/>
<!-- <motor joint="finger2_rot" ctrlrange="-20.0 20.0" isctrllimited="true"/>-->
<general joint="thigh_joint" ctrlrange="-1 1" gear="100 0 0 0 0 0" actdim="0"/>
<general joint="leg_joint" ctrlrange="-1 1" gear="100 0 0 0 0 0" actdim="0"/>
<general joint="foot_joint" ctrlrange="-1 1" gear="100 0 0 0 0 0" actdim="0"/>
<general joint="thigh_left_joint" ctrlrange="-1 1" gear="100 0 0 0 0 0" actdim="0"/>
<general joint="leg_left_joint" ctrlrange="-1 1" gear="100 0 0 0 0 0" actdim="0"/>
<general joint="foot_left_joint" ctrlrange="-1 1" gear="100 0 0 0 0 0" actdim="0"/>
</actuator>
<asset>
<texture type="skybox" builtin="gradient" rgb1=".4 .5 .6" rgb2="0 0 0"
width="100" height="100"/>
<texture builtin="flat" height="1278" mark="cross" markrgb="1 1 1" name="texgeom" random="0.01" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" type="cube" width="127"/>
<texture builtin="checker" height="100" name="texplane" rgb1="0 0 0" rgb2="0.8 0.8 0.8" type="2d" width="100"/>
<material name="MatPlane" reflectance="0.5" shininess="1" specular="1" texrepeat="60 60" texture="texplane"/>
<material name="geom" texture="texgeom" texuniform="true"/>
</asset>
</mujoco>

View File

@ -6,6 +6,11 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper):
mp_config = {
'ProMP': {},
'DMP': {},
'ProDMP': {},
}
@property
def context_mask(self):

View File

@ -1,8 +1,13 @@
import os
from typing import Optional
from typing import Optional, Any, Dict, Tuple
import numpy as np
from gym.envs.mujoco.walker2d_v4 import Walker2dEnv
from gymnasium.envs.mujoco.walker2d_v4 import Walker2dEnv, DEFAULT_CAMERA_CONFIG
from gymnasium.core import ObsType
from gymnasium import utils
from gymnasium.envs.mujoco import MujocoEnv
from gymnasium.spaces import Box
MAX_EPISODE_STEPS_WALKERJUMP = 300
@ -11,8 +16,71 @@ MAX_EPISODE_STEPS_WALKERJUMP = 300
# to the same structure as the Hopper, where the angles are randomized (->contexts) and the agent should jump as high
# as possible, while landing at a specific target position
class Walker2dEnvCustomXML(Walker2dEnv):
def __init__(
self,
xml_file,
forward_reward_weight=1.0,
ctrl_cost_weight=1e-3,
healthy_reward=1.0,
terminate_when_unhealthy=True,
healthy_z_range=(0.8, 2.0),
healthy_angle_range=(-1.0, 1.0),
reset_noise_scale=5e-3,
exclude_current_positions_from_observation=True,
**kwargs,
):
utils.EzPickle.__init__(
self,
xml_file,
forward_reward_weight,
ctrl_cost_weight,
healthy_reward,
terminate_when_unhealthy,
healthy_z_range,
healthy_angle_range,
reset_noise_scale,
exclude_current_positions_from_observation,
**kwargs,
)
class Walker2dJumpEnv(Walker2dEnv):
self._forward_reward_weight = forward_reward_weight
self._ctrl_cost_weight = ctrl_cost_weight
self._healthy_reward = healthy_reward
self._terminate_when_unhealthy = terminate_when_unhealthy
self._healthy_z_range = healthy_z_range
self._healthy_angle_range = healthy_angle_range
self._reset_noise_scale = reset_noise_scale
self._exclude_current_positions_from_observation = (
exclude_current_positions_from_observation
)
if exclude_current_positions_from_observation:
observation_space = Box(
low=-np.inf, high=np.inf, shape=(18,), dtype=np.float64
)
else:
observation_space = Box(
low=-np.inf, high=np.inf, shape=(19,), dtype=np.float64
)
self.observation_space = observation_space
MujocoEnv.__init__(
self,
xml_file,
4,
observation_space=observation_space,
default_camera_config=DEFAULT_CAMERA_CONFIG,
**kwargs,
)
class Walker2dJumpEnv(Walker2dEnvCustomXML):
"""
healthy reward 1.0 -> 0.005 -> 0.0025 not from alex
penalty 10 -> 0 not from alex
@ -54,13 +122,13 @@ class Walker2dJumpEnv(Walker2dEnv):
self.max_height = max(height, self.max_height)
done = bool(height < 0.2)
terminated = bool(height < 0.2)
ctrl_cost = self.control_cost(action)
costs = ctrl_cost
rewards = 0
if self.current_step >= self.max_episode_steps or done:
done = True
if self.current_step >= self.max_episode_steps or terminated:
terminated = True
height_goal_distance = -10 * (np.linalg.norm(self.max_height - self.goal))
healthy_reward = self.healthy_reward * self.current_step
@ -73,17 +141,20 @@ class Walker2dJumpEnv(Walker2dEnv):
'max_height': self.max_height,
'goal': self.goal,
}
truncated = False
return observation, reward, done, info
return observation, reward, terminated, truncated, info
def _get_obs(self):
return np.append(super()._get_obs(), self.goal)
def reset(self, *, seed: Optional[int] = None, return_info: bool = False, options: Optional[dict] = None):
def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
-> Tuple[ObsType, Dict[str, Any]]:
self.current_step = 0
self.max_height = 0
ret = super().reset(seed=seed, options=options)
self.goal = self.np_random.uniform(1.5, 2.5, 1) # 1.5 3.0
return super().reset()
return ret
# overwrite reset_model to make it deterministic
def reset_model(self):
@ -97,21 +168,3 @@ class Walker2dJumpEnv(Walker2dEnv):
observation = self._get_obs()
return observation
if __name__ == '__main__':
render_mode = "human" # "human" or "partial" or "final"
env = Walker2dJumpEnv()
obs = env.reset()
for i in range(6000):
# test with random actions
ac = env.action_space.sample()
obs, rew, d, info = env.step(ac)
if i % 10 == 0:
env.render(mode=render_mode)
if d:
print('After ', i, ' steps, done: ', d)
env.reset()
env.close()

309
fancy_gym/envs/registry.py Normal file
View File

@ -0,0 +1,309 @@
from typing import Tuple, Union, Callable, List, Dict, Any, Optional
import copy
import importlib
import numpy as np
from collections import defaultdict
from collections.abc import Mapping, MutableMapping
from fancy_gym.utils.make_env_helpers import make_bb
from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
from gymnasium import register as gym_register
from gymnasium import make as gym_make
from gymnasium.envs.registration import registry as gym_registry
class DefaultMPWrapper(RawInterfaceWrapper):
@property
def context_mask(self):
"""
Returns a boolean mask of the same shape as the observation space.
It determines whether an observation entry is returned for the contextual case or not.
This effectively allows filtering out unwanted or unnecessary observations from the full step-based case.
E.g. velocities starting at 0 only change after the first action. Given we only receive the
context/part of the first observation, the velocities are not necessary in the observation for the task.
Returns:
bool array representing the indices of the observations
"""
# If the env already defines a context_mask, we will use that
if hasattr(self.env, 'context_mask'):
return self.env.context_mask
# Otherwise we will use the whole observation as the context. (Write a custom MPWrapper to change this behavior)
return np.full(self.env.observation_space.shape, True)
@property
def current_pos(self) -> Union[float, int, np.ndarray, Tuple]:
"""
Returns the current position of the action/control dimension.
The dimensionality has to match the action/control dimension.
This is not required when exclusively using velocity control;
it should, however, be implemented regardless.
E.g. The joint positions that are directly or indirectly controlled by the action.
"""
assert hasattr(self.env, 'current_pos'), 'DefaultMPWrapper was unable to access env.current_pos. Please write a custom MPWrapper (recommended) or expose this attribute directly.'
return self.env.current_pos
@property
def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
"""
Returns the current velocity of the action/control dimension.
The dimensionality has to match the action/control dimension.
This is not required when exclusively using position control;
it should, however, be implemented regardless.
E.g. The joint velocities that are directly or indirectly controlled by the action.
"""
assert hasattr(self.env, 'current_vel'), 'DefaultMPWrapper was unable to access env.current_vel. Please write a custom MPWrapper (recommended) or expose this attribute directly.'
return self.env.current_vel
_BB_DEFAULTS = {
'ProMP': {
'wrappers': [],
'trajectory_generator_kwargs': {
'trajectory_generator_type': 'promp'
},
'phase_generator_kwargs': {
'phase_generator_type': 'linear'
},
'controller_kwargs': {
'controller_type': 'motor',
'p_gains': 1.0,
'd_gains': 0.1,
},
'basis_generator_kwargs': {
'basis_generator_type': 'zero_rbf',
'num_basis': 5,
'num_basis_zero_start': 1,
'basis_bandwidth_factor': 3.0,
},
'black_box_kwargs': {
}
},
'DMP': {
'wrappers': [],
'trajectory_generator_kwargs': {
'trajectory_generator_type': 'dmp'
},
'phase_generator_kwargs': {
'phase_generator_type': 'exp'
},
'controller_kwargs': {
'controller_type': 'motor',
'p_gains': 1.0,
'd_gains': 0.1,
},
'basis_generator_kwargs': {
'basis_generator_type': 'rbf',
'num_basis': 5
},
'black_box_kwargs': {
}
},
'ProDMP': {
'wrappers': [],
'trajectory_generator_kwargs': {
'trajectory_generator_type': 'prodmp',
'duration': 2.0,
'weights_scale': 1.0,
},
'phase_generator_kwargs': {
'phase_generator_type': 'exp',
'tau': 1.5,
},
'controller_kwargs': {
'controller_type': 'motor',
'p_gains': 1.0,
'd_gains': 0.1,
},
'basis_generator_kwargs': {
'basis_generator_type': 'prodmp',
'alpha': 10,
'num_basis': 5,
},
'black_box_kwargs': {
}
}
}
KNOWN_MPS = list(_BB_DEFAULTS.keys())
_KNOWN_MPS_PLUS_ALL = KNOWN_MPS + ['all']
ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS = {mp_type: [] for mp_type in _KNOWN_MPS_PLUS_ALL}
MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS = {}
def register(
id: str,
entry_point: Optional[Union[Callable, str]] = None,
mp_wrapper: RawInterfaceWrapper = DefaultMPWrapper,
register_step_based: bool = True, # TODO: Detect
add_mp_types: List[str] = KNOWN_MPS,
mp_config_override: Dict[str, Any] = {},
**kwargs
):
"""
Registers a Gymnasium environment, including Movement Primitives (MP) versions.
If you only want to register MP versions for an already registered environment, use fancy_gym.upgrade instead.
Args:
id (str): The unique identifier for the environment.
entry_point (Optional[Union[Callable, str]]): The entry point for creating the environment.
mp_wrapper (RawInterfaceWrapper): The MP wrapper for the environment.
register_step_based (bool): Whether to also register the raw step-based version of the environment (default True).
add_mp_types (List[str]): List of additional MP types to register.
mp_config_override (Dict[str, Any]): Dictionary for overriding MP configuration.
**kwargs: Additional keyword arguments which are passed to the environment constructor.
Notes:
- When `register_step_based` is True, the raw environment will also be registered with Gymnasium; otherwise, only the MP versions will be registered.
- `entry_point` can be given as a string, allowing the same notation as gymnasium.
- If `id` already exists in the Gymnasium registry and `register_step_based` is True,
a warning message will be printed, suggesting to set `register_step_based=False` or use `fancy_gym.upgrade`.
Example:
To register a step-based environment with Movement Primitive versions (will use default mp_wrapper):
>>> register("MyEnv-v0", MyEnvClass"my_module:MyEnvClass")
The entry point can also be provided as a string:
>>> register("MyEnv-v0", "my_module:MyEnvClass")
"""
if register_step_based and id in gym_registry:
print(f'[Info] Gymnasium env with id "{id}" already exists. You should supply register_step_based=False or use fancy_gym.upgrade if you only want to register mp versions of an existing env.')
if register_step_based:
assert entry_point is not None, 'You need to provide an entry-point when registering step-based.'
if not callable(mp_wrapper): # mp_wrapper can be given as a String (same notation as for entry_point)
mod_name, attr_name = mp_wrapper.split(':')
mod = importlib.import_module(mod_name)
mp_wrapper = getattr(mod, attr_name)
if register_step_based:
gym_register(id=id, entry_point=entry_point, **kwargs)
upgrade(id, mp_wrapper, add_mp_types, mp_config_override)
def upgrade(
id: str,
mp_wrapper: RawInterfaceWrapper = DefaultMPWrapper,
add_mp_types: List[str] = KNOWN_MPS,
base_id: Optional[str] = None,
mp_config_override: Dict[str, Any] = {},
):
"""
Upgrades an existing Gymnasium environment to include Movement Primitives (MP) versions.
We expect the raw step-based env to be already registered with gymnasium. Otherwise please use fancy_gym.register instead.
Args:
id (str): The unique identifier for the environment.
mp_wrapper (RawInterfaceWrapper): The MP wrapper for the environment (default is DefaultMPWrapper).
add_mp_types (List[str]): List of additional MP types to register (default is KNOWN_MPS).
base_id (Optional[str]): The unique identifier for the environment to upgrade. Will use id if none is provided. Can be defined to allow multiple registrations of different versions for the same step-based environment.
mp_config_override (Dict[str, Any]): Dictionary for overriding MP configuration.
Notes:
- The `id` parameter should match the ID of the existing Gymnasium environment you wish to upgrade. You can also pick a new one, but then `base_id` needs to be provided.
- The `mp_wrapper` parameter specifies the MP wrapper to use, allowing for customization.
- `add_mp_types` can be used to specify additional MP types to register alongside the base environment.
- The `base_id` parameter should match the ID of the existing Gymnasium environment you wish to upgrade.
- `mp_config_override` allows for customizing MP configuration if needed.
Example:
To upgrade an existing environment with MP versions:
>>> upgrade("MyEnv-v0", mp_wrapper=CustomMPWrapper)
To upgrade an existing environment with custom MP types and configuration:
>>> upgrade("MyEnv-v0", mp_wrapper=CustomMPWrapper, add_mp_types=["ProDMP", "DMP"], mp_config_override={"param": 42})
"""
if not base_id:
base_id = id
register_mps(id, base_id, mp_wrapper, add_mp_types, mp_config_override)
def register_mps(id: str, base_id: str, mp_wrapper: RawInterfaceWrapper, add_mp_types: List[str] = KNOWN_MPS, mp_config_override: Dict[str, Any] = {}):
for mp_type in add_mp_types:
register_mp(id, base_id, mp_wrapper, mp_type, mp_config_override.get(mp_type, {}))
def register_mp(id: str, base_id: str, mp_wrapper: RawInterfaceWrapper, mp_type: List[str], mp_config_override: Dict[str, Any] = {}):
assert mp_type in KNOWN_MPS, 'Unknown mp_type'
assert id not in ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS[mp_type], f'The environment {id} is already registered for {mp_type}.'
parts = id.split('/')
if len(parts) == 1:
ns, name = 'gym', parts[0]
elif len(parts) == 2:
ns, name = parts[0], parts[1]
else:
raise ValueError('env id can not contain multiple "/".')
parts = name.split('-')
assert len(parts) >= 2 and parts[-1].startswith('v'), 'Malformed env id, must end in -v{int}.'
fancy_id = f'{ns}_{mp_type}/{name}'
gym_register(
id=fancy_id,
entry_point=bb_env_constructor,
kwargs={
'underlying_id': base_id,
'mp_wrapper': mp_wrapper,
'mp_type': mp_type,
'_mp_config_override_register': mp_config_override
}
)
ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS[mp_type].append(fancy_id)
ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS['all'].append(fancy_id)
if ns not in MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS:
MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS[ns] = {mp_type: [] for mp_type in _KNOWN_MPS_PLUS_ALL}
MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS[ns][mp_type].append(fancy_id)
MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS[ns]['all'].append(fancy_id)
def nested_update(base: MutableMapping, update):
"""
Recursively update a nested Mapping in place.
Args:
base: main Mapping to be updated
update: updated values for base Mapping
"""
if any([item.endswith('_type') for item in update]):
base = update
return base
for k, v in update.items():
base[k] = nested_update(base.get(k, {}), v) if isinstance(v, Mapping) else v
return base
def bb_env_constructor(underlying_id, mp_wrapper, mp_type, mp_config_override={}, _mp_config_override_register={}, **kwargs):
raw_underlying_env = gym_make(underlying_id, **kwargs)
underlying_env = mp_wrapper(raw_underlying_env)
mp_config = getattr(underlying_env, 'mp_config') if hasattr(underlying_env, 'mp_config') else {}
active_mp_config = copy.deepcopy(mp_config.get(mp_type, {}))
global_inherit_defaults = mp_config.get('inherit_defaults', True)
inherit_defaults = active_mp_config.pop('inherit_defaults', global_inherit_defaults)
config = copy.deepcopy(_BB_DEFAULTS[mp_type]) if inherit_defaults else {}
nested_update(config, active_mp_config)
nested_update(config, _mp_config_override_register)
nested_update(config, mp_config_override)
wrappers = config.pop('wrappers')
traj_gen_kwargs = config.pop('trajectory_generator_kwargs', {})
black_box_kwargs = config.pop('black_box_kwargs', {})
contr_kwargs = config.pop('controller_kwargs', {})
phase_kwargs = config.pop('phase_generator_kwargs', {})
basis_kwargs = config.pop('basis_generator_kwargs', {})
return make_bb(underlying_env,
wrappers=wrappers,
black_box_kwargs=black_box_kwargs,
traj_gen_kwargs=traj_gen_kwargs,
controller_kwargs=contr_kwargs,
phase_kwargs=phase_kwargs,
basis_kwargs=basis_kwargs,
**config)
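A minimal usage sketch of the new registry API, based on the docstrings above. It assumes `register` and `upgrade` are re-exported at the `fancy_gym` package level (as the docstrings suggest); the env id, entry point, and module path are placeholders.

```python
import gymnasium as gym
import fancy_gym

# Register a new step-based env together with its MP versions.
# The entry point can be given as a string, same notation as gymnasium.
fancy_gym.register(
    id='custom/MyEnv-v0',                # placeholder id
    entry_point='my_module:MyEnvClass',  # placeholder entry point
)

# Or only add MP versions to an env that gymnasium already knows about
# (uses DefaultMPWrapper; a custom wrapper is recommended for real tasks):
fancy_gym.upgrade(id='Pendulum-v1')

# MP variants are registered under '<namespace>_<MP-type>/<name>':
env = gym.make('custom_ProMP/MyEnv-v0')
obs, info = env.reset()
```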

View File

@ -1,20 +1,23 @@
import gymnasium as gym
import fancy_gym
def example_run_replanning_env(env_name="BoxPushingDenseReplanProDMP-v0", seed=1, iterations=1, render=False):
env = fancy_gym.make(env_name, seed=seed)
env.reset()
def example_run_replanning_env(env_name="fancy_ProDMP/BoxPushingDenseReplan-v0", seed=1, iterations=1, render=False):
env = gym.make(env_name)
env.reset(seed=seed)
for i in range(iterations):
done = False
while done is False:
ac = env.action_space.sample()
obs, reward, done, info = env.step(ac)
obs, reward, terminated, truncated, info = env.step(ac)
done = terminated or truncated
if render:
env.render(mode="human")
if done:
if terminated or truncated:
env.reset()
env.close()
del env
def example_custom_replanning_envs(seed=0, iteration=100, render=True):
# id for a step-based environment
base_env_id = "BoxPushingDense-v0"
@ -22,7 +25,7 @@ def example_custom_replanning_envs(seed=0, iteration=100, render=True):
wrappers = [fancy_gym.envs.mujoco.box_pushing.mp_wrapper.MPWrapper]
trajectory_generator_kwargs = {'trajectory_generator_type': 'prodmp',
'weight_scale': 1}
'weights_scale': 1}
phase_generator_kwargs = {'phase_generator_type': 'exp'}
controller_kwargs = {'controller_type': 'velocity'}
basis_generator_kwargs = {'basis_generator_type': 'prodmp',
@ -46,8 +49,8 @@ def example_custom_replanning_envs(seed=0, iteration=100, render=True):
for i in range(iteration):
ac = env.action_space.sample()
obs, reward, done, info = env.step(ac)
if done:
obs, reward, terminated, truncated, info = env.step(ac)
if terminated or truncated:
env.reset()
env.close()
@ -56,7 +59,7 @@ def example_custom_replanning_envs(seed=0, iteration=100, render=True):
if __name__ == "__main__":
# run a registered replanning environment
example_run_replanning_env(env_name="BoxPushingDenseReplanProDMP-v0", seed=1, iterations=1, render=False)
example_run_replanning_env(env_name="fancy_ProDMP/BoxPushingDenseReplan-v0", seed=1, iterations=1, render=False)
# run a custom replanning environment
example_custom_replanning_envs(seed=0, iteration=8, render=True)

View File

@ -1,7 +1,8 @@
import gymnasium as gym
import fancy_gym
def example_dmc(env_id="dmc:fish-swim", seed=1, iterations=1000, render=True):
def example_dmc(env_id="dm_control/fish-swim", seed=1, iterations=1000, render=True):
"""
Example for running a DMC based env in the step based setting.
The env_id has to be specified as `domain_name:task_name` or
@ -16,9 +17,9 @@ def example_dmc(env_id="dmc:fish-swim", seed=1, iterations=1000, render=True):
Returns:
"""
env = fancy_gym.make(env_id, seed)
env = gym.make(env_id)
rewards = 0
obs = env.reset()
obs = env.reset(seed=seed)
print("observation shape:", env.observation_space.shape)
print("action shape:", env.action_space.shape)
@ -26,10 +27,10 @@ def example_dmc(env_id="dmc:fish-swim", seed=1, iterations=1000, render=True):
ac = env.action_space.sample()
if render:
env.render(mode="human")
obs, reward, done, info = env.step(ac)
obs, reward, terminated, truncated, info = env.step(ac)
rewards += reward
if done:
if terminated or truncated:
print(env_id, rewards)
rewards = 0
obs = env.reset()
@ -56,7 +57,7 @@ def example_custom_dmc_and_mp(seed=1, iterations=1, render=True):
"""
# Base DMC name, according to structure of above example
base_env_id = "dmc:ball_in_cup-catch"
base_env_id = "dm_control/ball_in_cup-catch"
# Replace this wrapper with the custom wrapper for your environment by inheriting from the RawInterfaceWrapper.
# You can also add other gym.Wrappers in case they are needed.
@ -65,8 +66,8 @@ def example_custom_dmc_and_mp(seed=1, iterations=1, render=True):
trajectory_generator_kwargs = {'trajectory_generator_type': 'promp'}
phase_generator_kwargs = {'phase_generator_type': 'linear'}
controller_kwargs = {'controller_type': 'motor',
"p_gains": 1.0,
"d_gains": 0.1,}
"p_gains": 1.0,
"d_gains": 0.1, }
basis_generator_kwargs = {'basis_generator_type': 'zero_rbf',
'num_basis': 5,
'num_basis_zero_start': 1
@ -102,10 +103,10 @@ def example_custom_dmc_and_mp(seed=1, iterations=1, render=True):
# number of samples/full trajectories (multiple environment steps)
for i in range(iterations):
ac = env.action_space.sample()
obs, reward, done, info = env.step(ac)
obs, reward, terminated, truncated, info = env.step(ac)
rewards += reward
if done:
if terminated or truncated:
print(base_env_id, rewards)
rewards = 0
obs = env.reset()
@ -123,14 +124,14 @@ if __name__ == '__main__':
render = True
# # Standard DMC Suite tasks
example_dmc("dmc:fish-swim", seed=10, iterations=1000, render=render)
example_dmc("dm_control/fish-swim", seed=10, iterations=1000, render=render)
#
# # Manipulation tasks
# # Disclaimer: The vision versions are currently not integrated and yield an error
example_dmc("dmc:manipulation-reach_site_features", seed=10, iterations=250, render=render)
example_dmc("dm_control/manipulation-reach_site_features", seed=10, iterations=250, render=render)
#
# # Gym + DMC hybrid task provided in the MP framework
example_dmc("dmc_ball_in_cup-catch_promp-v0", seed=10, iterations=1, render=render)
example_dmc("dm_control_ProMP/ball_in_cup-catch-v0", seed=10, iterations=1, render=render)
# Custom DMC task # Different seed, because the episode is longer for this example and the name+seed combo is
# already registered above

View File

@ -1,6 +1,6 @@
from collections import defaultdict
import gym
import gymnasium as gym
import numpy as np
import fancy_gym
@ -21,27 +21,27 @@ def example_general(env_id="Pendulum-v1", seed=1, iterations=1000, render=True):
"""
env = fancy_gym.make(env_id, seed)
env = gym.make(env_id)
rewards = 0
obs = env.reset()
obs = env.reset(seed=seed)
print("Observation shape: ", env.observation_space.shape)
print("Action shape: ", env.action_space.shape)
# number of environment steps
for i in range(iterations):
obs, reward, done, info = env.step(env.action_space.sample())
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
rewards += reward
if render:
env.render()
if done:
if terminated or truncated:
print(rewards)
rewards = 0
obs = env.reset()
def example_async(env_id="HoleReacher-v0", n_cpu=4, seed=int('533D', 16), n_samples=800):
def example_async(env_id="fancy/HoleReacher-v0", n_cpu=4, seed=int('533D', 16), n_samples=800):
"""
Example for running any env in a vectorized multiprocessing setting to generate more samples faster.
This also includes DMC and DMP environments when leveraging our custom make_env function.
@ -69,12 +69,15 @@ def example_async(env_id="HoleReacher-v0", n_cpu=4, seed=int('533D', 16), n_samp
# this would generate more samples than requested if n_samples % num_envs != 0
repeat = int(np.ceil(n_samples / env.num_envs))
for i in range(repeat):
obs, reward, done, info = env.step(env.action_space.sample())
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
buffer['obs'].append(obs)
buffer['reward'].append(reward)
buffer['done'].append(done)
buffer['terminated'].append(terminated)
buffer['truncated'].append(truncated)
buffer['info'].append(info)
rewards += reward
done = terminated or truncated
if np.any(done):
print(f"Reward at iteration {i}: {rewards[done]}")
rewards[done] = 0
@ -90,11 +93,10 @@ if __name__ == '__main__':
example_general("Pendulum-v1", seed=10, iterations=200, render=render)
# Mujoco task from framework
example_general("Reacher5d-v0", seed=10, iterations=200, render=render)
example_general("fancy/Reacher5d-v0", seed=10, iterations=200, render=render)
# # OpenAI Mujoco task
example_general("HalfCheetah-v2", seed=10, render=render)
# Vectorized multiprocessing environments
# example_async(env_id="HoleReacher-v0", n_cpu=2, seed=int('533D', 16), n_samples=2 * 200)

View File

@ -1,7 +1,8 @@
import gymnasium as gym
import fancy_gym
def example_dmc(env_id="fish-swim", seed=1, iterations=1000, render=True):
def example_meta(env_id="fish-swim", seed=1, iterations=1000, render=True):
"""
Example for running a MetaWorld based env in the step based setting.
The env_id has to be specified as `task_name-v2`. V1 versions are not supported and we always
@ -17,9 +18,9 @@ def example_dmc(env_id="fish-swim", seed=1, iterations=1000, render=True):
Returns:
"""
env = fancy_gym.make(env_id, seed)
env = gym.make(env_id)
rewards = 0
obs = env.reset()
obs = env.reset(seed=seed)
print("observation shape:", env.observation_space.shape)
print("action shape:", env.action_space.shape)
@ -29,9 +30,9 @@ def example_dmc(env_id="fish-swim", seed=1, iterations=1000, render=True):
# THIS NEEDS TO BE SET TO FALSE FOR NOW, BECAUSE THE INTERFACE FOR RENDERING IS DIFFERENT TO BASIC GYM
# TODO: Remove this, when Metaworld fixes its interface.
env.render(False)
obs, reward, done, info = env.step(ac)
obs, reward, terminated, truncated, info = env.step(ac)
rewards += reward
if done:
if terminated or truncated:
print(env_id, rewards)
rewards = 0
obs = env.reset()
@ -40,7 +41,7 @@ def example_dmc(env_id="fish-swim", seed=1, iterations=1000, render=True):
del env
def example_custom_dmc_and_mp(seed=1, iterations=1, render=True):
def example_custom_meta_and_mp(seed=1, iterations=1, render=True):
"""
Example for running a custom movement primitive based environments.
Our already registered environments follow the same structure.
@ -58,7 +59,7 @@ def example_custom_dmc_and_mp(seed=1, iterations=1, render=True):
"""
# Base MetaWorld name, according to structure of above example
base_env_id = "metaworld:button-press-v2"
base_env_id = "metaworld/button-press-v2"
# Replace this wrapper with the custom wrapper for your environment by inheriting from the RawInterfaceWrapper.
# You can also add other gym.Wrappers in case they are needed.
@ -103,10 +104,10 @@ def example_custom_dmc_and_mp(seed=1, iterations=1, render=True):
# number of samples/full trajectories (multiple environment steps)
for i in range(iterations):
ac = env.action_space.sample()
obs, reward, done, info = env.step(ac)
obs, reward, terminated, truncated, info = env.step(ac)
rewards += reward
if done:
if terminated or truncated:
print(base_env_id, rewards)
rewards = 0
obs = env.reset()
@ -124,11 +125,10 @@ if __name__ == '__main__':
render = False
# # Standard Meta world tasks
example_dmc("metaworld:button-press-v2", seed=10, iterations=500, render=render)
example_meta("metaworld/button-press-v2", seed=10, iterations=500, render=render)
# # MP + MetaWorld hybrid task provided in our framework
example_dmc("ButtonPressProMP-v2", seed=10, iterations=1, render=render)
example_meta("metaworld_ProMP/ButtonPress-v2", seed=10, iterations=1, render=render)
#
# # Custom MetaWorld task
example_custom_dmc_and_mp(seed=10, iterations=1, render=render)
example_custom_meta_and_mp(seed=10, iterations=1, render=render)

View File

@ -1,7 +1,8 @@
import gymnasium as gym
import fancy_gym
def example_mp(env_name="HoleReacherProMP-v0", seed=1, iterations=1, render=True):
def example_mp(env_name="fancy_ProMP/HoleReacher-v0", seed=1, iterations=1, render=True):
"""
Example for running a black box based environment, which is already registered
Args:
@ -15,11 +16,11 @@ def example_mp(env_name="HoleReacherProMP-v0", seed=1, iterations=1, render=True
"""
# Equivalent to gym, we have a make function which can be used to create environments.
# It takes care of seeding and enables the use of a variety of external environments using the gym interface.
env = fancy_gym.make(env_name, seed)
env = gym.make(env_name)
returns = 0
# env.render(mode=None)
obs = env.reset()
obs = env.reset(seed=seed)
# number of samples/full trajectories (multiple environment steps)
for i in range(iterations):
@ -41,16 +42,16 @@ def example_mp(env_name="HoleReacherProMP-v0", seed=1, iterations=1, render=True
# This executes a full trajectory and gives back the context (obs) of the last step in the trajectory, or the
# full observation space of the last step, if replanning/sub-trajectory learning is used. The 'reward' is equal
# to the return of a trajectory. Default is the sum over the step-wise rewards.
obs, reward, done, info = env.step(ac)
obs, reward, terminated, truncated, info = env.step(ac)
# Aggregated returns
returns += reward
if done:
if terminated or truncated:
print(reward)
obs = env.reset()
def example_custom_mp(env_name="Reacher5dProMP-v0", seed=1, iterations=1, render=True):
def example_custom_mp(env_name="fancy_ProMP/Reacher5d-v0", seed=1, iterations=1, render=True):
"""
Example for running a movement primitive based environment, which is already registered
Args:
@ -62,12 +63,9 @@ def example_custom_mp(env_name="Reacher5dProMP-v0", seed=1, iterations=1, render
Returns:
"""
# Changing the arguments of the black box env is possible by providing them to gym as with all kwargs.
# Changing the arguments of the black box env is possible by providing them to gym through mp_config_override.
# E.g. here for way too many basis functions
env = fancy_gym.make(env_name, seed, basis_generator_kwargs={'num_basis': 1000})
# env = fancy_gym.make(env_name, seed)
# mp_dict.update({'black_box_kwargs': {'learn_sub_trajectories': True}})
# mp_dict.update({'black_box_kwargs': {'do_replanning': lambda pos, vel, t: lambda t: t % 100}})
env = gym.make(env_name, mp_config_override={'basis_generator_kwargs': {'num_basis': 1000}})
returns = 0
obs = env.reset()
@ -79,10 +77,10 @@ def example_custom_mp(env_name="Reacher5dProMP-v0", seed=1, iterations=1, render
# number of samples/full trajectories (multiple environment steps)
for i in range(iterations):
ac = env.action_space.sample()
obs, reward, done, info = env.step(ac)
obs, reward, terminated, truncated, info = env.step(ac)
returns += reward
if done:
if terminated or truncated:
print(i, reward)
obs = env.reset()
@ -106,7 +104,7 @@ def example_fully_custom_mp(seed=1, iterations=1, render=True):
"""
base_env_id = "Reacher5d-v0"
base_env_id = "fancy/Reacher5d-v0"
# Replace this wrapper with the custom wrapper for your environment by inheriting from the RawInterfaceWrapper.
# You can also add other gym.Wrappers in case they are needed.
@ -114,7 +112,7 @@ def example_fully_custom_mp(seed=1, iterations=1, render=True):
# For a ProMP
trajectory_generator_kwargs = {'trajectory_generator_type': 'promp',
'weight_scale': 2}
'weights_scale': 2}
phase_generator_kwargs = {'phase_generator_type': 'linear'}
controller_kwargs = {'controller_type': 'velocity'}
basis_generator_kwargs = {'basis_generator_type': 'zero_rbf',
@ -124,7 +122,7 @@ def example_fully_custom_mp(seed=1, iterations=1, render=True):
# # For a DMP
# trajectory_generator_kwargs = {'trajectory_generator_type': 'dmp',
# 'weight_scale': 500}
# 'weights_scale': 500}
# phase_generator_kwargs = {'phase_generator_type': 'exp',
# 'alpha_phase': 2.5}
# controller_kwargs = {'controller_type': 'velocity'}
@ -145,10 +143,10 @@ def example_fully_custom_mp(seed=1, iterations=1, render=True):
# number of samples/full trajectories (multiple environment steps)
for i in range(iterations):
ac = env.action_space.sample()
obs, reward, done, info = env.step(ac)
obs, reward, terminated, truncated, info = env.step(ac)
rewards += reward
if done:
if terminated or truncated:
print(rewards)
rewards = 0
obs = env.reset()
@ -157,20 +155,20 @@ def example_fully_custom_mp(seed=1, iterations=1, render=True):
if __name__ == '__main__':
render = False
# DMP
example_mp("HoleReacherDMP-v0", seed=10, iterations=5, render=render)
example_mp("fancy_DMP/HoleReacher-v0", seed=10, iterations=5, render=render)
# ProMP
example_mp("HoleReacherProMP-v0", seed=10, iterations=5, render=render)
example_mp("BoxPushingTemporalSparseProMP-v0", seed=10, iterations=1, render=render)
example_mp("TableTennis4DProMP-v0", seed=10, iterations=20, render=render)
example_mp("fancy_ProMP/HoleReacher-v0", seed=10, iterations=5, render=render)
example_mp("fancy_ProMP/BoxPushingTemporalSparse-v0", seed=10, iterations=1, render=render)
example_mp("fancy_ProMP/TableTennis4D-v0", seed=10, iterations=20, render=render)
# ProDMP with Replanning
example_mp("BoxPushingDenseReplanProDMP-v0", seed=10, iterations=4, render=render)
example_mp("TableTennis4DReplanProDMP-v0", seed=10, iterations=20, render=render)
example_mp("TableTennisWindReplanProDMP-v0", seed=10, iterations=20, render=render)
example_mp("fancy_ProDMP/BoxPushingDenseReplan-v0", seed=10, iterations=4, render=render)
example_mp("fancy_ProDMP/TableTennis4DReplan-v0", seed=10, iterations=20, render=render)
example_mp("fancy_ProDMP/TableTennisWindReplan-v0", seed=10, iterations=20, render=render)
# Altered basis functions
obs1 = example_custom_mp("Reacher5dProMP-v0", seed=10, iterations=1, render=render)
obs1 = example_custom_mp("fancy_ProMP/Reacher5d-v0", seed=10, iterations=1, render=render)
# Custom MP
example_fully_custom_mp(seed=10, iterations=1, render=render)

View File

@ -1,3 +1,4 @@
import gymnasium as gym
import fancy_gym
@ -12,11 +13,10 @@ def example_mp(env_name, seed=1, render=True):
Returns:
"""
# While in this case gym.make() is possible to use as well, we recommend our custom make env function.
env = fancy_gym.make(env_name, seed)
env = gym.make(env_name)
returns = 0
obs = env.reset()
obs = env.reset(seed=seed)
# number of samples/full trajectories (multiple environment steps)
for i in range(10):
if render and i % 2 == 0:
@ -24,14 +24,13 @@ def example_mp(env_name, seed=1, render=True):
else:
env.render()
ac = env.action_space.sample()
obs, reward, done, info = env.step(ac)
obs, reward, terminated, truncated, info = env.step(ac)
returns += reward
if done:
if terminated or truncated:
print(returns)
obs = env.reset()
if __name__ == '__main__':
example_mp("ReacherProMP-v2")
example_mp("gym_ProMP/Reacher-v2")

View File

@ -1,10 +1,14 @@
import gymnasium as gym
import fancy_gym
def compare_bases_shape(env1_id, env2_id):
env1 = fancy_gym.make(env1_id, seed=0)
env1 = gym.make(env1_id)
env1.traj_gen.show_scaled_basis(plot=True)
env2 = fancy_gym.make(env2_id, seed=0)
env2 = gym.make(env2_id)
env2.traj_gen.show_scaled_basis(plot=True)
return
if __name__ == '__main__':
compare_bases_shape("TableTennis4DProDMP-v0", "TableTennis4DProMP-v0")
compare_bases_shape("fancy_ProDMP/TableTennis4D-v0", "fancy_ProMP/TableTennis4D-v0")

View File

@ -3,19 +3,20 @@ from collections import OrderedDict
import numpy as np
from matplotlib import pyplot as plt
import gymnasium as gym
import fancy_gym
# This might work for some environments, however, please verify either way the correct trajectory information
# for your environment are extracted below
SEED = 1
env_id = "Reacher5dProMP-v0"
env_id = "fancy_ProMP/Reacher5d-v0"
env = fancy_gym.make(env_id, seed=SEED, controller_kwargs={'p_gains': 0.05, 'd_gains': 0.05}).env
env = fancy_gym.make(env_id, mp_config_override={'controller_kwargs': {'p_gains': 0.05, 'd_gains': 0.05}}).env
env.action_space.seed(SEED)
# Plot difference between real trajectory and target MP trajectory
env.reset()
env.reset(seed=SEED)
w = env.action_space.sample()
pos, vel = env.get_trajectory(w)
@ -34,7 +35,7 @@ fig.show()
for t, (des_pos, des_vel) in enumerate(zip(pos, vel)):
actions = env.tracking_controller.get_action(des_pos, des_vel, env.current_pos, env.current_vel)
actions = np.clip(actions, env.env.action_space.low, env.env.action_space.high)
_, _, _, _ = env.env.step(actions)
env.env.step(actions)
if t % 15 == 0:
img.set_data(env.env.render(mode="rgb_array"))
fig.canvas.draw()

View File

@ -1,26 +1,64 @@
# MetaWorld Wrappers
# Metaworld
These are the Environment Wrappers for selected [Metaworld](https://meta-world.github.io/) environments in order to use our Movement Primitive gym interface with them.
All Metaworld environments have a 39 dimensional observation space with the same structure. The tasks differ only in the objective and the initial observations that are randomized.
Unused observations are zeroed out. E.g. for `Button-Press-v2` the observation mask looks the following:
```python
return np.hstack([
# Current observation
[False] * 3, # end-effector position
[False] * 1, # normalized gripper open distance
[True] * 3, # main object position
[False] * 4, # main object quaternion
[False] * 3, # secondary object position
[False] * 4, # secondary object quaternion
# Previous observation
[False] * 3, # previous end-effector position
[False] * 1, # previous normalized gripper open distance
[False] * 3, # previous main object position
[False] * 4, # previous main object quaternion
[False] * 3, # previous second object position
[False] * 4, # previous second object quaternion
# Goal
[True] * 3, # goal position
])
```
For other tasks only the boolean values have to be adjusted accordingly.
[Metaworld](https://meta-world.github.io/) is an open-source simulated benchmark designed to advance meta-reinforcement learning and multi-task learning, comprising 50 diverse robotic manipulation tasks. The benchmark features a universal tabletop environment equipped with a simulated Sawyer arm and a variety of everyday objects. This shared environment is pivotal for reusing structured learning and efficiently acquiring related tasks.
## Step-Based Envs
`fancy_gym` makes all Metaworld ML1 tasks available via the standard Gymnasium interface. To access Metaworld environments using a different mode of operation (MT1 / ML100 / etc.), please use the functionality provided by Metaworld directly. A minimal usage sketch follows the table below.
| Name | Description | Horizon | Action Dimension | Observation Dimension | Context Dimension |
| ---------------------------------------- | ------------------------------------------------------------------------------------- | ------- | ---------------- | --------------------- | ----------------- |
| `metaworld/assembly-v2` | A task where the robot must assemble components. | 500 | 4 | 39 | 6 |
| `metaworld/basketball-v2` | A task where the robot must play a game of basketball. | 500 | 4 | 39 | 6 |
| `metaworld/bin-picking-v2` | A task involving the robot picking objects from a bin. | 500 | 4 | 39 | 6 |
| `metaworld/box-close-v2` | A task requiring the robot to close a box. | 500 | 4 | 39 | 6 |
| `metaworld/button-press-topdown-v2` | A task where the robot must press a button from a top-down perspective. | 500 | 4 | 39 | 6 |
| `metaworld/button-press-topdown-wall-v2` | A task involving the robot pressing a button with a wall from a top-down perspective. | 500 | 4 | 39 | 6 |
| `metaworld/button-press-v2` | A task where the robot must press a button. | 500 | 4 | 39 | 6 |
| `metaworld/button-press-wall-v2` | A task involving the robot pressing a button with a wall. | 500 | 4 | 39 | 6 |
| `metaworld/coffee-button-v2` | A task where the robot must press a button on a coffee machine. | 500 | 4 | 39 | 6 |
| `metaworld/coffee-pull-v2` | A task involving the robot pulling a lever on a coffee machine. | 500 | 4 | 39 | 6 |
| `metaworld/coffee-push-v2` | A task involving the robot pushing a component on a coffee machine. | 500 | 4 | 39 | 6 |
| `metaworld/dial-turn-v2` | A task where the robot must turn a dial. | 500 | 4 | 39 | 6 |
| `metaworld/disassemble-v2` | A task requiring the robot to disassemble an object. | 500 | 4 | 39 | 6 |
| `metaworld/door-close-v2` | A task where the robot must close a door. | 500 | 4 | 39 | 6 |
| `metaworld/door-lock-v2` | A task involving the robot locking a door. | 500 | 4 | 39 | 6 |
| `metaworld/door-open-v2` | A task where the robot must open a door. | 500 | 4 | 39 | 6 |
| `metaworld/door-unlock-v2` | A task involving the robot unlocking a door. | 500 | 4 | 39 | 6 |
| `metaworld/hand-insert-v2` | A task requiring the robot to insert a hand into an object. | 500 | 4 | 39 | 6 |
| `metaworld/drawer-close-v2` | A task where the robot must close a drawer. | 500 | 4 | 39 | 6 |
| `metaworld/drawer-open-v2` | A task involving the robot opening a drawer. | 500 | 4 | 39 | 6 |
| `metaworld/faucet-open-v2` | A task requiring the robot to open a faucet. | 500 | 4 | 39 | 6 |
| `metaworld/faucet-close-v2` | A task where the robot must close a faucet. | 500 | 4 | 39 | 6 |
| `metaworld/hammer-v2` | A task where the robot must use a hammer. | 500 | 4 | 39 | 6 |
| `metaworld/handle-press-side-v2` | A task involving the robot pressing a handle from the side. | 500 | 4 | 39 | 6 |
| `metaworld/handle-press-v2` | A task where the robot must press a handle. | 500 | 4 | 39 | 6 |
| `metaworld/handle-pull-side-v2` | A task requiring the robot to pull a handle from the side. | 500 | 4 | 39 | 6 |
| `metaworld/handle-pull-v2` | A task where the robot must pull a handle. | 500 | 4 | 39 | 6 |
| `metaworld/lever-pull-v2` | A task involving the robot pulling a lever. | 500 | 4 | 39 | 6 |
| `metaworld/peg-insert-side-v2` | A task requiring the robot to insert a peg from the side. | 500 | 4 | 39 | 6 |
| `metaworld/pick-place-wall-v2` | A task involving the robot picking and placing an object with a wall. | 500 | 4 | 39 | 6 |
| `metaworld/pick-out-of-hole-v2` | A task where the robot must pick an object out of a hole. | 500 | 4 | 39 | 6 |
| `metaworld/reach-v2` | A task where the robot must reach an object. | 500 | 4 | 39 | 6 |
| `metaworld/push-back-v2` | A task involving the robot pushing an object backward. | 500 | 4 | 39 | 6 |
| `metaworld/push-v2` | A task where the robot must push an object. | 500 | 4 | 39 | 6 |
| `metaworld/pick-place-v2` | A task involving the robot picking up and placing an object. | 500 | 4 | 39 | 6 |
| `metaworld/plate-slide-v2` | A task requiring the robot to slide a plate. | 500 | 4 | 39 | 6 |
| `metaworld/plate-slide-side-v2` | A task involving the robot sliding a plate from the side. | 500 | 4 | 39 | 6 |
| `metaworld/plate-slide-back-v2` | A task where the robot must slide a plate backward. | 500 | 4 | 39 | 6 |
| `metaworld/plate-slide-back-side-v2` | A task involving the robot sliding a plate backward from the side. | 500 | 4 | 39 | 6 |
| `metaworld/peg-unplug-side-v2` | A task where the robot must unplug a peg from the side. | 500 | 4 | 39 | 6 |
| `metaworld/soccer-v2` | A task where the robot must play soccer. | 500 | 4 | 39 | 6 |
| `metaworld/stick-push-v2` | A task involving the robot pushing a stick. | 500 | 4 | 39 | 6 |
| `metaworld/stick-pull-v2` | A task where the robot must pull a stick. | 500 | 4 | 39 | 6 |
| `metaworld/push-wall-v2` | A task involving the robot pushing against a wall. | 500 | 4 | 39 | 6 |
| `metaworld/reach-wall-v2` | A task where the robot must reach an object with a wall. | 500 | 4 | 39 | 6 |
| `metaworld/shelf-place-v2` | A task involving the robot placing an object on a shelf. | 500 | 4 | 39 | 6 |
| `metaworld/sweep-into-v2` | A task where the robot must sweep objects into a container. | 500 | 4 | 39 | 6 |
| `metaworld/sweep-v2` | A task requiring the robot to sweep. | 500 | 4 | 39 | 6 |
| `metaworld/window-open-v2` | A task where the robot must open a window. | 500 | 4 | 39 | 6 |
| `metaworld/window-close-v2` | A task involving the robot closing a window. | 500 | 4 | 39 | 6 |
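A minimal step-based usage sketch (the environment id is picked arbitrarily from the table above; importing `fancy_gym` registers the `metaworld/...` ids):
```python
import gymnasium as gym
import fancy_gym  # registers the metaworld/... environments on import

env = gym.make('metaworld/button-press-v2')
obs, info = env.reset(seed=1)
for _ in range(500):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```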
## MP-Based Envs
All envs also exist in MP variants. Refer to them using `metaworld_ProMP/<name-v2>` or `metaworld_ProDMP/<name-v2>` (DMP is currently not supported).
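As an illustrative sketch (again with an arbitrarily chosen task), the MP variants are created the same way; a single `step` call then rolls out the full trajectory generated from the sampled MP parameters:
```python
import gymnasium as gym
import fancy_gym

env = gym.make('metaworld_ProMP/button-press-v2')
obs, info = env.reset(seed=1)
# One step executes the whole generated trajectory on the underlying step-based env.
params = env.action_space.sample()
obs, reward, terminated, truncated, info = env.step(params)
env.close()
```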

View File

@ -1,125 +1,37 @@
from typing import Iterable, Type, Union, Optional
from copy import deepcopy
from gym import register
from ..envs.registry import register
from . import goal_object_change_mp_wrapper, goal_change_mp_wrapper, goal_endeffector_change_mp_wrapper, \
object_change_mp_wrapper
from . import metaworld_adapter
metaworld_adapter.register_all_ML1()
ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS = {"DMP": [], "ProMP": [], "ProDMP": []}
# MetaWorld
DEFAULT_BB_DICT_ProMP = {
"name": 'EnvName',
"wrappers": [],
"trajectory_generator_kwargs": {
'trajectory_generator_type': 'promp',
'weights_scale': 10,
},
"phase_generator_kwargs": {
'phase_generator_type': 'linear'
},
"controller_kwargs": {
'controller_type': 'metaworld',
},
"basis_generator_kwargs": {
'basis_generator_type': 'zero_rbf',
'num_basis': 5,
'num_basis_zero_start': 1
},
'black_box_kwargs': {
'condition_on_desired': False,
}
}
DEFAULT_BB_DICT_ProDMP = {
"name": 'EnvName',
"wrappers": [],
"trajectory_generator_kwargs": {
'trajectory_generator_type': 'prodmp',
'auto_scale_basis': True,
'weights_scale': 10,
# 'goal_scale': 0.,
'disable_goal': True,
},
"phase_generator_kwargs": {
'phase_generator_type': 'exp',
# 'alpha_phase' : 3,
},
"controller_kwargs": {
'controller_type': 'metaworld',
},
"basis_generator_kwargs": {
'basis_generator_type': 'prodmp',
'num_basis': 5,
'alpha': 10
},
'black_box_kwargs': {
'condition_on_desired': False,
}
}
_goal_change_envs = ["assembly-v2", "pick-out-of-hole-v2", "plate-slide-v2", "plate-slide-back-v2",
"plate-slide-side-v2", "plate-slide-back-side-v2"]
for _task in _goal_change_envs:
task_id_split = _task.split("-")
name = "".join([s.capitalize() for s in task_id_split[:-1]])
# ProMP
_env_id = f'{name}ProMP-{task_id_split[-1]}'
kwargs_dict_goal_change_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
kwargs_dict_goal_change_promp['wrappers'].append(goal_change_mp_wrapper.MPWrapper)
kwargs_dict_goal_change_promp['name'] = f'metaworld:{_task}'
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_goal_change_promp
id=f'metaworld/{_task}',
register_step_based=False,
mp_wrapper=goal_change_mp_wrapper.MPWrapper,
add_mp_types=['ProMP', 'ProDMP'],
)
ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
# ProDMP
_env_id = f'{name}ProDMP-{task_id_split[-1]}'
kwargs_dict_goal_change_prodmp = deepcopy(DEFAULT_BB_DICT_ProDMP)
kwargs_dict_goal_change_prodmp['wrappers'].append(goal_change_mp_wrapper.MPWrapper)
kwargs_dict_goal_change_prodmp['name'] = f'metaworld:{_task}'
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_goal_change_prodmp
)
ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProDMP"].append(_env_id)
_object_change_envs = ["bin-picking-v2", "hammer-v2", "sweep-into-v2"]
for _task in _object_change_envs:
task_id_split = _task.split("-")
name = "".join([s.capitalize() for s in task_id_split[:-1]])
# ProMP
_env_id = f'{name}ProMP-{task_id_split[-1]}'
kwargs_dict_object_change_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
kwargs_dict_object_change_promp['wrappers'].append(object_change_mp_wrapper.MPWrapper)
kwargs_dict_object_change_promp['name'] = f'metaworld:{_task}'
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_object_change_promp
id=f'metaworld/{_task}',
register_step_based=False,
mp_wrapper=object_change_mp_wrapper.MPWrapper,
add_mp_types=['ProMP', 'ProDMP'],
)
ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
# ProDMP
_env_id = f'{name}ProDMP-{task_id_split[-1]}'
kwargs_dict_object_change_prodmp = deepcopy(DEFAULT_BB_DICT_ProDMP)
kwargs_dict_object_change_prodmp['wrappers'].append(object_change_mp_wrapper.MPWrapper)
kwargs_dict_object_change_prodmp['name'] = f'metaworld:{_task}'
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_object_change_prodmp
)
ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProDMP"].append(_env_id)
_goal_and_object_change_envs = ["box-close-v2", "button-press-v2", "button-press-wall-v2", "button-press-topdown-v2",
"button-press-topdown-wall-v2", "coffee-button-v2", "coffee-pull-v2",
@ -133,62 +45,18 @@ _goal_and_object_change_envs = ["box-close-v2", "button-press-v2", "button-press
"shelf-place-v2", "sweep-v2", "window-open-v2", "window-close-v2"
]
for _task in _goal_and_object_change_envs:
task_id_split = _task.split("-")
name = "".join([s.capitalize() for s in task_id_split[:-1]])
# ProMP
_env_id = f'{name}ProMP-{task_id_split[-1]}'
kwargs_dict_goal_and_object_change_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
kwargs_dict_goal_and_object_change_promp['wrappers'].append(goal_object_change_mp_wrapper.MPWrapper)
kwargs_dict_goal_and_object_change_promp['name'] = f'metaworld:{_task}'
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_goal_and_object_change_promp
id=f'metaworld/{_task}',
register_step_based=False,
mp_wrapper=goal_object_change_mp_wrapper.MPWrapper,
add_mp_types=['ProMP', 'ProDMP'],
)
ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
# ProDMP
_env_id = f'{name}ProDMP-{task_id_split[-1]}'
kwargs_dict_goal_and_object_change_prodmp = deepcopy(DEFAULT_BB_DICT_ProDMP)
kwargs_dict_goal_and_object_change_prodmp['wrappers'].append(goal_object_change_mp_wrapper.MPWrapper)
kwargs_dict_goal_and_object_change_prodmp['name'] = f'metaworld:{_task}'
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_goal_and_object_change_prodmp
)
ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProDMP"].append(_env_id)
_goal_and_endeffector_change_envs = ["basketball-v2"]
for _task in _goal_and_endeffector_change_envs:
task_id_split = _task.split("-")
name = "".join([s.capitalize() for s in task_id_split[:-1]])
# ProMP
_env_id = f'{name}ProMP-{task_id_split[-1]}'
kwargs_dict_goal_and_endeffector_change_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
kwargs_dict_goal_and_endeffector_change_promp['wrappers'].append(goal_endeffector_change_mp_wrapper.MPWrapper)
kwargs_dict_goal_and_endeffector_change_promp['name'] = f'metaworld:{_task}'
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_goal_and_endeffector_change_promp
id=f'metaworld/{_task}',
register_step_based=False,
mp_wrapper=goal_endeffector_change_mp_wrapper.MPWrapper,
add_mp_types=['ProMP', 'ProDMP'],
)
ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
# ProDMP
_env_id = f'{name}ProDMP-{task_id_split[-1]}'
kwargs_dict_goal_and_endeffector_change_prodmp = deepcopy(DEFAULT_BB_DICT_ProDMP)
kwargs_dict_goal_and_endeffector_change_prodmp['wrappers'].append(goal_endeffector_change_mp_wrapper.MPWrapper)
kwargs_dict_goal_and_endeffector_change_prodmp['name'] = f'metaworld:{_task}'
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_goal_and_endeffector_change_prodmp
)
ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProDMP"].append(_env_id)

View File

@ -6,12 +6,63 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class BaseMetaworldMPWrapper(RawInterfaceWrapper):
mp_config = {
'inherit_defaults': False,
'ProMP': {
'wrappers': [],
'trajectory_generator_kwargs': {
'trajectory_generator_type': 'promp',
'weights_scale': 10,
},
'phase_generator_kwargs': {
'phase_generator_type': 'linear'
},
'controller_kwargs': {
'controller_type': 'metaworld',
},
'basis_generator_kwargs': {
'basis_generator_type': 'zero_rbf',
'num_basis': 5,
'num_basis_zero_start': 1
},
'black_box_kwargs': {
'condition_on_desired': False,
},
},
'DMP': {},
'ProDMP': {
'wrappers': [],
'trajectory_generator_kwargs': {
'trajectory_generator_type': 'prodmp',
'auto_scale_basis': True,
'weights_scale': 10,
# 'goal_scale': 0.,
'disable_goal': True,
},
'phase_generator_kwargs': {
'phase_generator_type': 'exp',
# 'alpha_phase' : 3,
},
'controller_kwargs': {
'controller_type': 'metaworld',
},
'basis_generator_kwargs': {
'basis_generator_type': 'prodmp',
'num_basis': 5,
'alpha': 10
},
'black_box_kwargs': {
'condition_on_desired': False,
},
},
}
@property
def current_pos(self) -> Union[float, int, np.ndarray]:
r_close = self.env.data.get_joint_qpos("r_close")
r_close = self.env.data.joint('r_close').qpos
return np.hstack([self.env.data.mocap_pos.flatten() / self.env.action_scale, r_close])
@property
def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
return np.zeros(4, )
# raise NotImplementedError("Velocity cannot be retrieved.")
# raise NotImplementedError('Velocity cannot be retrieved.')

View File

@ -9,19 +9,6 @@ class MPWrapper(BaseMetaworldMPWrapper):
and no secondary objects or end effectors are altered at the start of an episode.
You can verify this by executing the code below for your environment id and check if the output is non-zero
at the same indices.
```python
import fancy_gym
env = fancy_gym.make(env_id, 1)
print(env.reset() - env.reset())
array([ 0. , 0. , 0. , 0. , 0,
0 , 0 , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0 , 0 , 0 ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , !=0 , !=0 , !=0])
```
"""
@property

View File

@ -9,19 +9,6 @@ class MPWrapper(BaseMetaworldMPWrapper):
and no secondary objects or end effectors are altered at the start of an episode.
You can verify this by executing the code below for your environment id and check if the output is non-zero
at the same indices.
```python
import fancy_gym
env = fancy_gym.make(env_id, 1)
print(env.reset() - env.reset())
array([ !=0 , !=0 , !=0 , 0. , 0.,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , !=0 , !=0 ,
!=0 , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , !=0 , !=0 , !=0])
```
"""
@property

View File

@ -9,19 +9,6 @@ class MPWrapper(BaseMetaworldMPWrapper):
and no secondary objects or end effectors are altered at the start of an episode.
You can verify this by executing the code below for your environment id and check if the output is non-zero
at the same indices.
```python
import fancy_gym
env = fancy_gym.make(env_id, 1)
print(env.reset() - env.reset())
array([ 0. , 0. , 0. , 0. , !=0,
!=0 , !=0 , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , !=0 , !=0 , !=0 ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , !=0 , !=0 , !=0])
```
"""
@property

View File

@ -0,0 +1,97 @@
import random
from typing import Iterable, Type, Union, Optional
import numpy as np
from gymnasium import register as gym_register
import uuid
import gymnasium as gym
from fancy_gym.utils.env_compatibility import EnvCompatibility
try:
import metaworld
except Exception:
print('[FANCY GYM] Metaworld not available')
class FixMetaworldHasIncorrectObsSpaceWrapper(gym.Wrapper, gym.utils.RecordConstructorArgs):
def __init__(self, env: gym.Env):
gym.utils.RecordConstructorArgs.__init__(self)
gym.Wrapper.__init__(self, env)
eos = env.observation_space
eas = env.action_space
Obs_Space_Class = getattr(gym.spaces, str(eos.__class__).split("'")[1].split('.')[-1])
Act_Space_Class = getattr(gym.spaces, str(eas.__class__).split("'")[1].split('.')[-1])
self.observation_space = Obs_Space_Class(low=eos.low-np.inf, high=eos.high+np.inf, dtype=eos.dtype)
self.action_space = Act_Space_Class(low=eas.low, high=eas.high, dtype=eas.dtype)
class FixMetaworldIncorrectResetPathLengthWrapper(gym.Wrapper, gym.utils.RecordConstructorArgs):
def __init__(self, env: gym.Env):
gym.utils.RecordConstructorArgs.__init__(self)
gym.Wrapper.__init__(self, env)
def reset(self, **kwargs):
ret = self.env.reset(**kwargs)
head = self.env
try:
for i in range(16):
head.curr_path_length = 0
head = head.env
except:
pass
return ret
class FixMetaworldIgnoresSeedOnResetWrapper(gym.Wrapper, gym.utils.RecordConstructorArgs):
def __init__(self, env: gym.Env):
gym.utils.RecordConstructorArgs.__init__(self)
gym.Wrapper.__init__(self, env)
def reset(self, **kwargs):
print('[!] You just called .reset on a Metaworld env and supplied a seed. Metaworld currently does not correctly implement seeding. Do not rely on deterministic behavior.')
if 'seed' in kwargs:
self.env.seed(kwargs['seed'])
return self.env.reset(**kwargs)
def make_metaworld(underlying_id: str, seed: int = 1, render_mode: Optional[str] = None, **kwargs):
if underlying_id not in metaworld.ML1.ENV_NAMES:
raise ValueError(f'Specified environment "{underlying_id}" not present in metaworld ML1.')
env = metaworld.envs.ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE[underlying_id + "-goal-observable"](seed=seed, **kwargs)
# setting this avoids generating the same initialization after each reset
env._freeze_rand_vec = False
# New argument to use global seeding
env.seeded_rand_vec = True
# TODO remove, when this has been fixed upstream
env = FixMetaworldHasIncorrectObsSpaceWrapper(env)
# TODO remove, when this has been fixed upstream
# env = FixMetaworldIncorrectResetPathLengthWrapper(env)
# TODO remove, when this has been fixed upstream
env = FixMetaworldIgnoresSeedOnResetWrapper(env)
return env
def register_all_ML1(**kwargs):
for env_id in metaworld.ML1.ENV_NAMES:
_env = metaworld.envs.ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE[env_id + "-goal-observable"](seed=0)
max_episode_steps = _env.max_path_length
gym_register(
id='metaworld/'+env_id,
entry_point=make_metaworld,
max_episode_steps=max_episode_steps,
kwargs={
'underlying_id': env_id
},
**kwargs
)

View File

@ -4,11 +4,12 @@ These are the Environment Wrappers for selected [OpenAI Gym](https://gym.openai.
the Motion Primitive gym interface for them.
## MP Environments
These environments are wrapped versions of their OpenAI Gym counterparts.
|Name| Description|Trajectory Horizon|Action Dimension|Context Dimension
|---|---|---|---|---|
|`ContinuousMountainCarProMP-v0`| A ProMP wrapped version of the ContinuousMountainCar-v0 environment. | 100 | 1
|`ReacherProMP-v2`| A ProMP wrapped version of the Reacher-v2 environment. | 50 | 2
|`FetchSlideDenseProMP-v1`| A ProMP wrapped version of the FetchSlideDense-v1 environment. | 50 | 4
|`FetchReachDenseProMP-v1`| A ProMP wrapped version of the FetchReachDense-v1 environment. | 50 | 4
| Name | Description | Trajectory Horizon | Action Dimension |
| ------------------------------------ | -------------------------------------------------------------------- | ------------------ | ---------------- |
| `gym_ProMP/ContinuousMountainCar-v0` | A ProMP wrapped version of the ContinuousMountainCar-v0 environment. | 100 | 1 |
| `gym_ProMP/Reacher-v2` | A ProMP wrapped version of the Reacher-v2 environment. | 50 | 2 |
| `gym_ProMP/FetchSlideDense-v1` | A ProMP wrapped version of the FetchSlideDense-v1 environment. | 50 | 4 |
| `gym_ProMP/FetchReachDense-v1` | A ProMP wrapped version of the FetchReachDense-v1 environment. | 50 | 4 |
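A minimal usage sketch (assuming the MuJoCo-based Reacher-v2 dependencies are installed; a single `step` executes the full ProMP trajectory):
```python
import gymnasium as gym
import fancy_gym  # registers the gym_ProMP/... variants on import

env = gym.make('gym_ProMP/Reacher-v2')
obs, info = env.reset(seed=1)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
print(reward)  # aggregated reward of the executed trajectory
env.close()
```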

View File

@ -1,45 +1,16 @@
from copy import deepcopy
from gym import register
from ..envs.registry import register, upgrade
from . import mujoco
from .deprecated_needs_gym_robotics import robotics
ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS = {"DMP": [], "ProMP": [], "ProDMP": []}
DEFAULT_BB_DICT_ProMP = {
"name": 'EnvName',
"wrappers": [],
"trajectory_generator_kwargs": {
'trajectory_generator_type': 'promp'
},
"phase_generator_kwargs": {
'phase_generator_type': 'linear'
},
"controller_kwargs": {
'controller_type': 'motor',
"p_gains": 1.0,
"d_gains": 0.1,
},
"basis_generator_kwargs": {
'basis_generator_type': 'zero_rbf',
'num_basis': 5,
'num_basis_zero_start': 1
}
}
kwargs_dict_reacher_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
kwargs_dict_reacher_promp['controller_kwargs']['p_gains'] = 0.6
kwargs_dict_reacher_promp['controller_kwargs']['d_gains'] = 0.075
kwargs_dict_reacher_promp['basis_generator_kwargs']['num_basis'] = 6
kwargs_dict_reacher_promp['name'] = "Reacher-v2"
kwargs_dict_reacher_promp['wrappers'].append(mujoco.reacher_v2.MPWrapper)
register(
id='ReacherProMP-v2',
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_reacher_promp
upgrade(
id='Reacher-v2',
mp_wrapper=mujoco.reacher_v2.MPWrapper,
add_mp_types=['ProMP'],
)
ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append("ReacherProMP-v2")
"""
The Fetch environments are not supported by gym anymore. A new repository (gym_robotics) is supporting the environments.
However, the usage and so on needs to be checked

View File

@ -6,6 +6,28 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper):
mp_config = {
'ProMP': {
"trajectory_generator_kwargs": {
'trajectory_generator_type': 'promp'
},
"phase_generator_kwargs": {
'phase_generator_type': 'linear'
},
"controller_kwargs": {
'controller_type': 'motor',
"p_gains": 0.6,
"d_gains": 0.075,
},
"basis_generator_kwargs": {
'basis_generator_type': 'zero_rbf',
'num_basis': 6,
'num_basis_zero_start': 1
}
},
'DMP': {},
'ProDMP': {},
}
@property
def current_vel(self) -> Union[float, int, np.ndarray]:

View File

@ -0,0 +1,11 @@
import gymnasium as gym
class EnvCompatibility(gym.wrappers.EnvCompatibility):
def __getattr__(self, item):
"""Propagate only non-existent properties to wrapped env."""
if item.startswith('_'):
raise AttributeError("attempted to get missing private attribute '{}'".format(item))
if item in self.__dict__:
return getattr(self, item)
return getattr(self.env, item)

View File

@ -1,17 +1,27 @@
import logging
import re
from fancy_gym.utils.wrappers import TimeAwareObservation
from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
from fancy_gym.black_box.factory.trajectory_generator_factory import get_trajectory_generator
from fancy_gym.black_box.factory.phase_generator_factory import get_phase_generator
from fancy_gym.black_box.factory.controller_factory import get_controller
from fancy_gym.black_box.factory.basis_generator_factory import get_basis_generator
from fancy_gym.black_box.black_box_wrapper import BlackBoxWrapper
import uuid
from collections.abc import MutableMapping
from copy import deepcopy
from math import ceil
from typing import Iterable, Type, Union
from typing import Iterable, Type, Union, Optional
import gym
import gymnasium as gym
from gymnasium import make
import numpy as np
from gym.envs.registration import register, registry
from gymnasium.envs.registration import register, registry
from gymnasium.wrappers import TimeLimit
from fancy_gym.utils.env_compatibility import EnvCompatibility
from fancy_gym.utils.wrappers import FlattenObservation
try:
from dm_control import suite, manipulation
import shimmy
from shimmy.dm_control_compatibility import EnvType
except ImportError:
pass
@ -21,111 +31,44 @@ except Exception:
# catch Exception as Import error does not catch missing mujoco-py
pass
import fancy_gym
from fancy_gym.black_box.black_box_wrapper import BlackBoxWrapper
from fancy_gym.black_box.factory.basis_generator_factory import get_basis_generator
from fancy_gym.black_box.factory.controller_factory import get_controller
from fancy_gym.black_box.factory.phase_generator_factory import get_phase_generator
from fancy_gym.black_box.factory.trajectory_generator_factory import get_trajectory_generator
from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
from fancy_gym.utils.time_aware_observation import TimeAwareObservation
from fancy_gym.utils.utils import nested_update
def make_rank(env_id: str, seed: int, rank: int = 0, return_callable=True, **kwargs):
"""
TODO: Do we need this?
Generate a callable to create a new gym environment with a given seed.
The rank is added to the seed and can be used for example when using vector environments.
E.g. [make_rank("my_env_name-v0", 123, i) for i in range(8)] creates a list of 8 environments
with seeds 123 through 130.
Hence, testing environments should be seeded with a value which is offset by the number of training environments.
Here e.g. [make_rank("my_env_name-v0", 123 + 8, i) for i in range(5)] for 5 testing environmetns
Args:
env_id: name of the environment
seed: seed for deterministic behaviour
rank: environment rank for deterministic over multiple seeds behaviour
return_callable: If True returns a callable to create the environment instead of the environment itself.
Returns:
"""
def f():
return make(env_id, seed + rank, **kwargs)
return f if return_callable else f()
def make(env_id: str, seed: int, **kwargs):
"""
Converts an env_id to an environment with the gym API.
This also works for DeepMind Control Suite environments that are wrapped using the DMCWrapper, they can be
specified with "dmc:domain_name-task_name"
Analogously, metaworld tasks can be created as "metaworld:env_id-v2".
Args:
env_id: spec or env_id for gym tasks, external environments require a domain specification
**kwargs: Additional kwargs for the constructor such as pixel observations, etc.
Returns: Gym environment
"""
if ':' in env_id:
split_id = env_id.split(':')
framework, env_id = split_id[-2:]
else:
framework = None
if framework == 'metaworld':
# MetaWorld environment
env = make_metaworld(env_id, seed, **kwargs)
elif framework == 'dmc':
# DeepMind Control environment
env = make_dmc(env_id, seed, **kwargs)
else:
env = make_gym(env_id, seed, **kwargs)
env.seed(seed)
env.action_space.seed(seed)
env.observation_space.seed(seed)
return env
def _make_wrapped_env(env_id: str, wrappers: Iterable[Type[gym.Wrapper]], seed=1, **kwargs):
def _make_wrapped_env(env: gym.Env, wrappers: Iterable[Type[gym.Wrapper]], seed=1, fallback_max_steps=None):
"""
Helper function for creating a wrapped gym environment using MPs.
It adds all provided wrappers to the specified environment and verifies at least one RawInterfaceWrapper is
provided to expose the interface for MPs.
Args:
env_id: name of the environment
env: base environemnt to wrap
wrappers: list of wrappers (at least an RawInterfaceWrapper),
seed: seed of environment
Returns: gym environment with all specified wrappers applied
"""
# _env = gym.make(env_id)
_env = make(env_id, seed, **kwargs)
if fallback_max_steps:
env = ensure_finite_time(env, fallback_max_steps)
has_black_box_wrapper = False
head = env
while hasattr(head, 'env'):
if isinstance(head, RawInterfaceWrapper):
has_black_box_wrapper = True
break
head = head.env
for w in wrappers:
# only wrap the environment if not BlackBoxWrapper, e.g. for vision
if issubclass(w, RawInterfaceWrapper):
has_black_box_wrapper = True
_env = w(_env)
env = w(env)
if not has_black_box_wrapper:
raise ValueError("A RawInterfaceWrapper is required in order to leverage movement primitive environments.")
return _env
return env
def make_bb(
env_id: str, wrappers: Iterable, black_box_kwargs: MutableMapping, traj_gen_kwargs: MutableMapping,
controller_kwargs: MutableMapping, phase_kwargs: MutableMapping, basis_kwargs: MutableMapping, seed: int = 1,
**kwargs):
env: Union[gym.Env, str], wrappers: Iterable, black_box_kwargs: MutableMapping, traj_gen_kwargs: MutableMapping,
controller_kwargs: MutableMapping, phase_kwargs: MutableMapping, basis_kwargs: MutableMapping,
time_limit: int = None, fallback_max_steps: int = None, **kwargs):
"""
This can also be used standalone for manually building a custom DMP environment.
Args:
@ -133,7 +76,7 @@ def make_bb(
basis_kwargs: kwargs for the basis generator
phase_kwargs: kwargs for the phase generator
controller_kwargs: kwargs for the tracking controller
env_id: base_env_name,
env: step based environment (or environment id),
wrappers: list of wrappers (at least an RawInterfaceWrapper),
seed: seed of environment
traj_gen_kwargs: dict of at least {num_dof: int, num_basis: int} for DMP
@ -141,7 +84,7 @@ def make_bb(
Returns: DMP wrapped gym env
"""
_verify_time_limit(traj_gen_kwargs.get("duration"), kwargs.get("time_limit"))
_verify_time_limit(traj_gen_kwargs.get("duration"), time_limit)
learn_sub_trajs = black_box_kwargs.get('learn_sub_trajectories')
do_replanning = black_box_kwargs.get('replanning_schedule')
@ -153,12 +96,19 @@ def make_bb(
# Add as first wrapper in order to alter observation
wrappers.insert(0, TimeAwareObservation)
env = _make_wrapped_env(env_id=env_id, wrappers=wrappers, seed=seed, **kwargs)
if isinstance(env, str):
env = make(env, **kwargs)
env = _make_wrapped_env(env=env, wrappers=wrappers, fallback_max_steps=fallback_max_steps)
# BB expects a spaces.Box to be exposed, need to convert for dict-observations
if type(env.observation_space) == gym.spaces.dict.Dict:
env = FlattenObservation(env)
traj_gen_kwargs['action_dim'] = traj_gen_kwargs.get('action_dim', np.prod(env.action_space.shape).item())
if black_box_kwargs.get('duration') is None:
black_box_kwargs['duration'] = env.spec.max_episode_steps * env.dt
black_box_kwargs['duration'] = get_env_duration(env)
if phase_kwargs.get('tau') is None:
phase_kwargs['tau'] = black_box_kwargs['duration']
@ -186,156 +136,27 @@ def make_bb(
return bb_env
def make_bb_env_helper(**kwargs):
"""
Helper function for registering a black box gym environment.
Args:
**kwargs: expects at least the following:
{
"name": base environment name.
"wrappers": list of wrappers (at least an BlackBoxWrapper is required),
"traj_gen_kwargs": {
"trajectory_generator_type": type_of_your_movement_primitive,
non default arguments for the movement primitive instance
...
}
"controller_kwargs": {
"controller_type": type_of_your_controller,
non default arguments for the tracking_controller instance
...
},
"basis_generator_kwargs": {
"basis_generator_type": type_of_your_basis_generator,
non default arguments for the basis generator instance
...
},
"phase_generator_kwargs": {
"phase_generator_type": type_of_your_phase_generator,
non default arguments for the phase generator instance
...
},
}
Returns: MP wrapped gym env
"""
seed = kwargs.pop("seed", None)
wrappers = kwargs.pop("wrappers")
traj_gen_kwargs = kwargs.pop("trajectory_generator_kwargs", {})
black_box_kwargs = kwargs.pop('black_box_kwargs', {})
contr_kwargs = kwargs.pop("controller_kwargs", {})
phase_kwargs = kwargs.pop("phase_generator_kwargs", {})
basis_kwargs = kwargs.pop("basis_generator_kwargs", {})
return make_bb(env_id=kwargs.pop("name"), wrappers=wrappers,
black_box_kwargs=black_box_kwargs,
traj_gen_kwargs=traj_gen_kwargs, controller_kwargs=contr_kwargs,
phase_kwargs=phase_kwargs,
basis_kwargs=basis_kwargs, **kwargs, seed=seed)
def make_dmc(
env_id: str,
seed: int = None,
visualize_reward: bool = True,
time_limit: Union[None, float] = None,
**kwargs
):
if not re.match(r"\w+-\w+", env_id):
raise ValueError("env_id does not have the following structure: 'domain_name-task_name'")
domain_name, task_name = env_id.split("-")
if task_name.endswith("_vision"):
# TODO
raise ValueError("The vision interface for manipulation tasks is currently not supported.")
if (domain_name, task_name) not in suite.ALL_TASKS and task_name not in manipulation.ALL:
raise ValueError(f'Specified domain "{domain_name}" and task "{task_name}" combination does not exist.')
# env_id = f'dmc_{domain_name}_{task_name}_{seed}-v1'
gym_id = uuid.uuid4().hex + '-v1'
task_kwargs = {'random': seed}
if time_limit is not None:
task_kwargs['time_limit'] = time_limit
# create task
# Accessing private attribute because DMC does not expose time_limit or step_limit.
# Only the current time_step/time as well as the control_timestep can be accessed.
if domain_name == "manipulation":
env = manipulation.load(environment_name=task_name, seed=seed)
max_episode_steps = ceil(env._time_limit / env.control_timestep())
else:
env = suite.load(domain_name=domain_name, task_name=task_name, task_kwargs=task_kwargs,
visualize_reward=visualize_reward, environment_kwargs=kwargs)
max_episode_steps = int(env._step_limit)
register(
id=gym_id,
entry_point='fancy_gym.dmc.dmc_wrapper:DMCWrapper',
kwargs={'env': lambda: env},
max_episode_steps=max_episode_steps,
)
env = gym.make(gym_id)
env.seed(seed)
def ensure_finite_time(env: gym.Env, fallback_max_steps=500):
cur_limit = env.spec.max_episode_steps
if not cur_limit:
if hasattr(env.unwrapped, 'max_path_length'):
return TimeLimit(env, env.unwrapped.__getattribute__('max_path_length'))
return TimeLimit(env, fallback_max_steps)
return env
def make_metaworld(env_id: str, seed: int, **kwargs):
if env_id not in metaworld.ML1.ENV_NAMES:
raise ValueError(f'Specified environment "{env_id}" not present in metaworld ML1.')
_env = metaworld.envs.ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE[env_id + "-goal-observable"](seed=seed, **kwargs)
# setting this avoids generating the same initialization after each reset
_env._freeze_rand_vec = False
# New argument to use global seeding
_env.seeded_rand_vec = True
gym_id = uuid.uuid4().hex + '-v1'
register(
id=gym_id,
entry_point=lambda: _env,
max_episode_steps=_env.max_path_length,
)
# TODO enable checker when the incorrect dtype of obs and observation space are fixed by metaworld
env = gym.make(gym_id, disable_env_checker=True)
return env
def make_gym(env_id, seed, **kwargs):
"""
Create
Args:
env_id:
seed:
**kwargs:
Returns:
"""
# Getting the existing keywords to allow for nested dict updates for BB envs
# gym only allows for non nested updates.
def get_env_duration(env: gym.Env):
try:
all_kwargs = deepcopy(registry.get(env_id).kwargs)
except AttributeError as e:
logging.error(f'The gym environment with id {env_id} could not been found.')
raise e
nested_update(all_kwargs, kwargs)
kwargs = all_kwargs
# Add seed to kwargs for bb environments to pass seed to step environments
all_bb_envs = sum(fancy_gym.ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS.values(), [])
if env_id in all_bb_envs:
kwargs.update({"seed": seed})
# Gym
env = gym.make(env_id, **kwargs)
return env
duration = env.spec.max_episode_steps * env.dt
except (AttributeError, TypeError) as e:
if env.env_type is EnvType.COMPOSER:
max_episode_steps = ceil(env.unwrapped._time_limit / env.dt)
elif env.env_type is EnvType.RL_CONTROL:
max_episode_steps = int(env.unwrapped._step_limit)
else:
raise e
duration = max_episode_steps * env.control_timestep()
return duration
def _verify_time_limit(mp_time_limit: Union[None, float], env_time_limit: Union[None, float]):

View File

@ -1,78 +0,0 @@
"""
Adapted from: https://github.com/openai/gym/blob/907b1b20dd9ac0cba5803225059b9c6673702467/gym/wrappers/time_aware_observation.py
License: MIT
Copyright (c) 2016 OpenAI (https://openai.com)
Wrapper for adding time aware observations to environment observation.
"""
import gym
import numpy as np
from gym.spaces import Box
class TimeAwareObservation(gym.ObservationWrapper):
"""Augment the observation with the current time step in the episode.
The observation space of the wrapped environment is assumed to be a flat :class:`Box`.
In particular, pixel observations are not supported. This wrapper will append the current timestep
within the current episode to the observation.
Example:
>>> import gym
>>> env = gym.make('CartPole-v1')
>>> env = TimeAwareObservation(env)
>>> env.reset()
array([ 0.03810719, 0.03522411, 0.02231044, -0.01088205, 0. ])
>>> env.step(env.action_space.sample())[0]
array([ 0.03881167, -0.16021058, 0.0220928 , 0.28875574, 1. ])
"""
def __init__(self, env: gym.Env):
"""Initialize :class:`TimeAwareObservation` that requires an environment with a flat :class:`Box`
observation space.
Args:
env: The environment to apply the wrapper
"""
super().__init__(env)
assert isinstance(env.observation_space, Box)
low = np.append(self.observation_space.low, 0.0)
high = np.append(self.observation_space.high, 1.0)
self.observation_space = Box(low, high, dtype=self.observation_space.dtype)
self.t = 0
self._max_episode_steps = env.spec.max_episode_steps
def observation(self, observation):
"""Adds to the observation with the current time step normalized with max steps.
Args:
observation: The observation to add the time step to
Returns:
The observation with the time step appended to
"""
return np.append(observation, self.t / self._max_episode_steps)
def step(self, action):
"""Steps through the environment, incrementing the time step.
Args:
action: The action to take
Returns:
The environment's step using the action.
"""
self.t += 1
return super().step(action)
def reset(self, **kwargs):
"""Reset the environment setting the time to zero.
Args:
**kwargs: Kwargs to apply to env.reset()
Returns:
The reset environment
"""
self.t = 0
return super().reset(**kwargs)

130
fancy_gym/utils/wrappers.py Normal file
View File

@ -0,0 +1,130 @@
from gymnasium.spaces import Box, Dict, flatten, flatten_space
try:
from gym.spaces import Box as OldBox
except ImportError:
OldBox = None
import gymnasium as gym
import numpy as np
import copy
class TimeAwareObservation(gym.ObservationWrapper, gym.utils.RecordConstructorArgs):
"""Augment the observation with the current time step in the episode.
The observation space of the wrapped environment is assumed to be a flat :class:`Box` or flattable :class:`Dict`.
In particular, pixel observations are not supported. This wrapper will append the current progress within the current episode to the observation.
The progress will be indicated as a number between 0 and 1.
"""
def __init__(self, env: gym.Env, enforce_dtype_float32=False):
"""Initialize :class:`TimeAwareObservation` that requires an environment with a flat :class:`Box` or flattable :class:`Dict` observation space.
Args:
env: The environment to apply the wrapper
"""
gym.utils.RecordConstructorArgs.__init__(self)
gym.ObservationWrapper.__init__(self, env)
allowed_classes = [Box, OldBox, Dict]
if enforce_dtype_float32:
assert env.observation_space.dtype == np.float32, 'TimeAwareObservation was given an environment with a dtype!=np.float32 ('+str(
env.observation_space.dtype)+'). This requirement can be removed by setting enforce_dtype_float32=False.'
assert env.observation_space.__class__ in allowed_classes, str(env.observation_space)+' is not supported. Only Box or Dict'
if env.observation_space.__class__ in [Box, OldBox]:
dtype = env.observation_space.dtype
low = np.append(env.observation_space.low, 0.0)
high = np.append(env.observation_space.high, 1.0)
self.observation_space = Box(low, high, dtype=dtype)
else:
spaces = copy.copy(env.observation_space.spaces)
dtype = np.float64
spaces['time_awareness'] = Box(0, 1, dtype=dtype)
self.observation_space = Dict(spaces)
self.is_vector_env = getattr(env, "is_vector_env", False)
def observation(self, observation):
"""Adds to the observation with the current time step.
Args:
observation: The observation to add the time step to
Returns:
The observation with the time step appended to (relative to total number of steps)
"""
if self.observation_space.__class__ in [Box, OldBox]:
return np.append(observation, self.t / self.env.spec.max_episode_steps)
else:
obs = copy.copy(observation)
obs['time_awareness'] = self.t / self.env.spec.max_episode_steps
return obs
def step(self, action):
"""Steps through the environment, incrementing the time step.
Args:
action: The action to take
Returns:
The environment's step using the action.
"""
self.t += 1
return super().step(action)
def reset(self, **kwargs):
"""Reset the environment setting the time to zero.
Args:
**kwargs: Kwargs to apply to env.reset()
Returns:
The reset environment
"""
self.t = 0
return super().reset(**kwargs)
class FlattenObservation(gym.ObservationWrapper, gym.utils.RecordConstructorArgs):
"""Observation wrapper that flattens the observation.
Example:
>>> import gymnasium as gym
>>> from gymnasium.wrappers import FlattenObservation
>>> env = gym.make("CarRacing-v2")
>>> env.observation_space.shape
(96, 96, 3)
>>> env = FlattenObservation(env)
>>> env.observation_space.shape
(27648,)
>>> obs, _ = env.reset()
>>> obs.shape
(27648,)
"""
def __init__(self, env: gym.Env):
"""Flattens the observations of an environment.
Args:
env: The environment to apply the wrapper
"""
gym.utils.RecordConstructorArgs.__init__(self)
gym.ObservationWrapper.__init__(self, env)
self.observation_space = flatten_space(env.observation_space)
def observation(self, observation):
"""Flattens an observation.
Args:
observation: The observation to flatten
Returns:
The flattened observation
"""
try:
return flatten(self.env.observation_space, observation)
except:
return np.array([flatten(self.env.observation_space, observation[i]) for i in range(len(observation))])

101
icon.svg Normal file

File diff suppressed because one or more lines are too long


View File

@ -6,33 +6,38 @@ from setuptools import setup, find_packages
# Environment-specific dependencies for dmc and metaworld
extras = {
"dmc": ["dm_control>=1.0.1"],
"metaworld": ["metaworld @ git+https://github.com/rlworkgroup/metaworld.git@master#egg=metaworld",
'mujoco-py<2.2,>=2.1',
'scipy'
],
'dmc': ['shimmy[dm-control]', 'Shimmy==1.0.0'],
'metaworld': ['metaworld @ git+https://github.com/Farama-Foundation/Metaworld.git@d155d0051630bb365ea6a824e02c66c068947439#egg=metaworld'],
'box2d': ['gymnasium[box2d]>=0.26.0'],
'mujoco': ['mujoco==2.3.3', 'gymnasium[mujoco]>0.26.0'],
'mujoco-legacy': ['mujoco-py >=2.1,<2.2', 'cython<3'],
'jax': ["jax >=0.4.0", "jaxlib >=0.4.0"],
}
# All dependencies
all_groups = set(extras.keys())
extras["all"] = list(set(itertools.chain.from_iterable(map(lambda group: extras[group], all_groups))))
extras["all"] = list(set(itertools.chain.from_iterable(
map(lambda group: extras[group], all_groups))))
extras['testing'] = extras["all"] + ['pytest']
def find_package_data(extensions_to_include: List[str]) -> List[str]:
envs_dir = Path("fancy_gym/envs/mujoco")
package_data_paths = []
for extension in extensions_to_include:
package_data_paths.extend([str(path)[10:] for path in envs_dir.rglob(extension)])
package_data_paths.extend([str(path)[10:]
for path in envs_dir.rglob(extension)])
return package_data_paths
setup(
author='Fabian Otto, Onur Celik',
author='Fabian Otto, Onur Celik, Dominik Roth, Hongyi Zhou',
name='fancy_gym',
version='0.2',
version='1.0',
classifiers=[
'Development Status :: 3 - Alpha',
'Development Status :: 4 - Beta',
'Intended Audience :: Science/Research',
'License :: OSI Approved :: MIT License',
'Natural Language :: English',
@ -46,10 +51,11 @@ setup(
],
extras_require=extras,
install_requires=[
'gym[mujoco]<0.25.0,>=0.24.1',
'gymnasium>=0.26.0',
'mp_pytorch<=0.1.3'
],
packages=[package for package in find_packages() if package.startswith("fancy_gym")],
packages=[package for package in find_packages(
) if package.startswith("fancy_gym")],
package_data={
"fancy_gym": find_package_data(extensions_to_include=["*.stl", "*.xml"])
},

View File

@ -1,14 +1,21 @@
import re
from itertools import chain
from typing import Callable
import gym
import gymnasium as gym
import pytest
import fancy_gym
from test.utils import run_env, run_env_determinism
GYM_IDS = [spec.id for spec in gym.envs.registry.all() if
"fancy_gym" not in spec.entry_point and 'make_bb_env_helper' not in spec.entry_point]
GYM_MP_IDS = chain(*fancy_gym.ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS.values())
GYM_IDS = [spec.id for spec in gym.envs.registry.values() if
not isinstance(spec.entry_point, Callable) and
"fancy_gym" not in spec.entry_point and 'make_bb_env_helper' not in spec.entry_point
and 'jax' not in spec.id.lower()
and not re.match(r'GymV2.Environment', spec.id)
]
GYM_MP_IDS = fancy_gym.ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS['all']
SEED = 1

View File

@ -1,21 +1,23 @@
from itertools import chain
from typing import Tuple, Type, Union, Optional, Callable
import gym
import gymnasium as gym
import numpy as np
import pytest
from gym import register
from gym.core import ActType, ObsType
from gymnasium import register, make
from gymnasium.core import ActType, ObsType
import fancy_gym
from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
from fancy_gym.utils.time_aware_observation import TimeAwareObservation
from fancy_gym.utils.wrappers import TimeAwareObservation
SEED = 1
ENV_IDS = ['Reacher5d-v0', 'dmc:ball_in_cup-catch', 'metaworld:reach-v2', 'Reacher-v2']
ENV_IDS = ['fancy/Reacher5d-v0', 'dm_control/ball_in_cup-catch-v0', 'metaworld/reach-v2', 'Reacher-v2']
WRAPPERS = [fancy_gym.envs.mujoco.reacher.MPWrapper, fancy_gym.dmc.suite.ball_in_cup.MPWrapper,
fancy_gym.meta.goal_object_change_mp_wrapper.MPWrapper, fancy_gym.open_ai.mujoco.reacher_v2.MPWrapper]
ALL_MP_ENVS = chain(*fancy_gym.ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS.values())
ALL_MP_ENVS = fancy_gym.ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS['all']
MAX_STEPS_FALLBACK = 100
class Object(object):
@ -32,10 +34,12 @@ class ToyEnv(gym.Env):
def reset(self, *, seed: Optional[int] = None, return_info: bool = False,
options: Optional[dict] = None) -> Union[ObsType, Tuple[ObsType, dict]]:
return np.array([-1])
obs, options = np.array([-1]), {}
return obs, options
def step(self, action: ActType) -> Tuple[ObsType, float, bool, dict]:
return np.array([-1]), 1, False, {}
obs, reward, terminated, truncated, info = np.array([-1]), 1, False, False, {}
return obs, reward, terminated, truncated, info
def render(self, mode="human"):
pass
@ -76,7 +80,7 @@ def test_missing_local_state(mp_type: str):
{'controller_type': 'motor'},
{'phase_generator_type': 'exp'},
{'basis_generator_type': basis_generator_type})
env.reset()
env.reset(seed=SEED)
with pytest.raises(NotImplementedError):
env.step(env.action_space.sample())
@ -93,12 +97,14 @@ def test_verbosity(mp_type: str, env_wrap: Tuple[str, Type[RawInterfaceWrapper]]
{'controller_type': 'motor'},
{'phase_generator_type': 'exp'},
{'basis_generator_type': basis_generator_type})
env.reset()
info_keys = list(env.step(env.action_space.sample())[3].keys())
env.reset(seed=SEED)
_obs, _reward, _terminated, _truncated, info = env.step(env.action_space.sample())
info_keys = list(info.keys())
env_step = fancy_gym.make(env_id, SEED)
env_step = make(env_id)
env_step.reset()
info_keys_step = env_step.step(env_step.action_space.sample())[3].keys()
_obs, _reward, _terminated, _truncated, info = env_step.step(env_step.action_space.sample())
info_keys_step = info.keys()
assert all(e in info_keys for e in info_keys_step)
assert 'trajectory_length' in info_keys
@ -118,13 +124,15 @@ def test_length(mp_type: str, env_wrap: Tuple[str, Type[RawInterfaceWrapper]]):
{'trajectory_generator_type': mp_type},
{'controller_type': 'motor'},
{'phase_generator_type': 'exp'},
{'basis_generator_type': basis_generator_type})
{'basis_generator_type': basis_generator_type}, fallback_max_steps=MAX_STEPS_FALLBACK)
for _ in range(5):
env.reset()
length = env.step(env.action_space.sample())[3]['trajectory_length']
for i in range(5):
env.reset(seed=SEED)
assert length == env.spec.max_episode_steps
_obs, _reward, _terminated, _truncated, info = env.step(env.action_space.sample())
length = info['trajectory_length']
assert length == env.spec.max_episode_steps, f'Expected total simulation length ({length}) to be equal to spec.max_episode_steps ({env.spec.max_episode_steps}), but was not during test nr. {i}'
@pytest.mark.parametrize('mp_type', ['promp', 'dmp', 'prodmp'])
@ -136,9 +144,10 @@ def test_aggregation(mp_type: str, reward_aggregation: Callable[[np.ndarray], fl
{'controller_type': 'motor'},
{'phase_generator_type': 'exp'},
{'basis_generator_type': basis_generator_type})
env.reset()
env.reset(seed=SEED)
# ToyEnv only returns 1 as reward
assert env.step(env.action_space.sample())[1] == reward_aggregation(np.ones(50, ))
_obs, reward, _terminated, _truncated, _info = env.step(env.action_space.sample())
assert reward == reward_aggregation(np.ones(50, ))
@pytest.mark.parametrize('mp_type', ['promp', 'dmp'])
@ -151,14 +160,16 @@ def test_context_space(mp_type: str, env_wrap: Tuple[str, Type[RawInterfaceWrapp
{'phase_generator_type': 'exp'},
{'basis_generator_type': 'rbf'})
# check if observation space matches with the specified mask values which are true
env_step = fancy_gym.make(env_id, SEED)
env_step = make(env_id)
wrapper = wrapper_class(env_step)
assert env.observation_space.shape == wrapper.context_mask[wrapper.context_mask].shape
@pytest.mark.parametrize('mp_type', ['promp', 'dmp', 'prodmp'])
@pytest.mark.parametrize('num_dof', [0, 1, 2, 5])
@pytest.mark.parametrize('num_basis', [0, 1, 2, 5])
@pytest.mark.parametrize('num_basis', [
pytest.param(0, marks=pytest.mark.xfail(reason="Basis Length 0 is not yet implemented.")),
1, 2, 5])
@pytest.mark.parametrize('learn_tau', [True, False])
@pytest.mark.parametrize('learn_delay', [True, False])
def test_action_space(mp_type: str, num_dof: int, num_basis: int, learn_tau: bool, learn_delay: bool):
@ -219,16 +230,18 @@ def test_learn_tau(mp_type: str, tau: float):
'learn_delay': False
},
{'basis_generator_type': basis_generator_type,
}, seed=SEED)
})
d = True
env.reset(seed=SEED)
done = True
for i in range(5):
if d:
env.reset()
if done:
env.reset(seed=SEED)
action = env.action_space.sample()
action[0] = tau
obs, r, d, info = env.step(action)
_obs, _reward, terminated, truncated, info = env.step(action)
done = terminated or truncated
length = info['trajectory_length']
assert length == env.spec.max_episode_steps
@ -248,6 +261,8 @@ def test_learn_tau(mp_type: str, tau: float):
assert np.all(vel[:tau_time_steps - 2] != vel[-1])
#
#
@pytest.mark.parametrize('mp_type', ['promp', 'prodmp'])
@pytest.mark.parametrize('delay', [0, 0.25, 0.5, 0.75])
def test_learn_delay(mp_type: str, delay: float):
@ -262,16 +277,18 @@ def test_learn_delay(mp_type: str, delay: float):
'learn_delay': True
},
{'basis_generator_type': basis_generator_type,
}, seed=SEED)
})
d = True
env.reset(seed=SEED)
done = True
for i in range(5):
if d:
env.reset()
if done:
env.reset(seed=SEED)
action = env.action_space.sample()
action[0] = delay
obs, r, d, info = env.step(action)
_obs, _reward, terminated, truncated, info = env.step(action)
done = terminated or truncated
length = info['trajectory_length']
assert length == env.spec.max_episode_steps
@ -290,6 +307,8 @@ def test_learn_delay(mp_type: str, delay: float):
assert np.all(vel[max(1, delay_time_steps)] != vel[0])
#
#
@pytest.mark.parametrize('mp_type', ['promp', 'prodmp'])
@pytest.mark.parametrize('tau', [0.25, 0.5, 0.75, 1])
@pytest.mark.parametrize('delay', [0.25, 0.5, 0.75, 1])
@ -305,20 +324,23 @@ def test_learn_tau_and_delay(mp_type: str, tau: float, delay: float):
'learn_delay': True
},
{'basis_generator_type': basis_generator_type,
}, seed=SEED)
})
env.reset(seed=SEED)
if env.spec.max_episode_steps * env.dt < delay + tau:
return
d = True
done = True
for i in range(5):
if d:
env.reset()
if done:
env.reset(seed=SEED)
action = env.action_space.sample()
action[0] = tau
action[1] = delay
obs, r, d, info = env.step(action)
_obs, _reward, terminated, truncated, info = env.step(action)
done = terminated or truncated
length = info['trajectory_length']
assert length == env.spec.max_episode_steps

View File

@ -1,39 +1,30 @@
from itertools import chain
from typing import Callable
import gymnasium as gym
import pytest
from dm_control import suite, manipulation
import fancy_gym
from test.utils import run_env, run_env_determinism
SUITE_IDS = [f'dmc:{env}-{task}' for env, task in suite.ALL_TASKS if env != "lqr"]
MANIPULATION_IDS = [f'dmc:manipulation-{task}' for task in manipulation.ALL if task.endswith('_features')]
DMC_MP_IDS = chain(*fancy_gym.ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS.values())
DMC_IDS = [spec.id for spec in gym.envs.registry.values() if
spec.id.startswith('dm_control/')
and 'compatibility-env-v0' not in spec.id
and 'lqr-lqr' not in spec.id]
DMC_MP_IDS = fancy_gym.ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS['all']
SEED = 1
@pytest.mark.parametrize('env_id', SUITE_IDS)
def test_step_suite_functionality(env_id: str):
@pytest.mark.parametrize('env_id', DMC_IDS)
def test_step_dm_control_functionality(env_id: str):
"""Tests that suite step environments run without errors using random actions."""
run_env(env_id)
run_env(env_id, 5000, wrappers=[gym.wrappers.FlattenObservation])
@pytest.mark.parametrize('env_id', SUITE_IDS)
def test_step_suite_determinism(env_id: str):
@pytest.mark.parametrize('env_id', DMC_IDS)
def test_step_dm_control_determinism(env_id: str):
"""Tests that for step environments identical seeds produce identical trajectories."""
run_env_determinism(env_id, SEED)
@pytest.mark.parametrize('env_id', MANIPULATION_IDS)
def test_step_manipulation_functionality(env_id: str):
"""Tests that manipulation step environments run without errors using random actions."""
run_env(env_id)
@pytest.mark.parametrize('env_id', MANIPULATION_IDS)
def test_step_manipulation_determinism(env_id: str):
"""Tests that for step environments identical seeds produce identical trajectories."""
run_env_determinism(env_id, SEED)
run_env_determinism(env_id, SEED, 5000, wrappers=[gym.wrappers.FlattenObservation])
@pytest.mark.parametrize('env_id', DMC_MP_IDS)
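The DeepMind Control ids are now discovered through the Gymnasium registry under the `dm_control/` namespace, and their dict observations are flattened before running the checks. A small sketch of the same setup (assuming `fancy_gym` is installed with its dm_control support, which registers these ids):

```python
import gymnasium as gym
import fancy_gym  # noqa: F401 -- importing registers the dm_control/* ids

# dm_control envs expose dict observations; FlattenObservation turns them into
# a flat Box, matching the wrapper passed to run_env in the tests above.
env = gym.wrappers.FlattenObservation(gym.make("dm_control/ball_in_cup-catch-v0"))
obs, info = env.reset(seed=1)
print(obs.shape)
```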

@ -1,14 +1,16 @@
import itertools
from itertools import chain
from typing import Callable
import fancy_gym
import gym
import gymnasium as gym
import pytest
from test.utils import run_env, run_env_determinism
CUSTOM_IDS = [spec.id for spec in gym.envs.registry.all() if
CUSTOM_IDS = [id for id, spec in gym.envs.registry.items() if
not isinstance(spec.entry_point, Callable) and
"fancy_gym" in spec.entry_point and 'make_bb_env_helper' not in spec.entry_point]
CUSTOM_MP_IDS = itertools.chain(*fancy_gym.ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS.values())
CUSTOM_MP_IDS = fancy_gym.ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS['all']
SEED = 1
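The id collection above now iterates `gym.envs.registry.items()` (Gymnasium exposes the registry as a plain dict) instead of the removed `registry.all()`. A self-contained sketch of the same filter:

```python
from typing import Callable
import gymnasium as gym
import fancy_gym  # noqa: F401 -- importing registers the fancy_gym environments

# Collect step-based fancy_gym ids: skip callable entry points and the
# MP variants generated through the make_bb_env_helper entry point.
fancy_ids = [env_id for env_id, spec in gym.envs.registry.items()
             if not isinstance(spec.entry_point, Callable)
             and "fancy_gym" in spec.entry_point
             and "make_bb_env_helper" not in spec.entry_point]
print(f"{len(fancy_ids)} step-based fancy_gym environments registered")
```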

@ -0,0 +1,78 @@
from typing import Tuple, Type, Union, Optional, Callable
import gymnasium as gym
import numpy as np
import pytest
from gymnasium import make
from gymnasium.core import ActType, ObsType
import fancy_gym
from fancy_gym import register
KNOWN_NS = ['dm_control', 'fancy', 'metaworld', 'gym']
class Object(object):
pass
class ToyEnv(gym.Env):
observation_space = gym.spaces.Box(low=-1, high=1, shape=(1,), dtype=np.float64)
action_space = gym.spaces.Box(low=-1, high=1, shape=(1,), dtype=np.float64)
dt = 0.02
def __init__(self, a: int = 0, b: float = 0.0, c: list = [], d: dict = {}, e: Object = Object()):
self.a, self.b, self.c, self.d, self.e = a, b, c, d, e
def reset(self, *, seed: Optional[int] = None, return_info: bool = False,
options: Optional[dict] = None) -> Union[ObsType, Tuple[ObsType, dict]]:
obs, options = np.array([-1]), {}
return obs, options
def step(self, action: ActType) -> Tuple[ObsType, float, bool, dict]:
obs, reward, terminated, truncated, info = np.array([-1]), 1, False, False, {}
return obs, reward, terminated, truncated, info
def render(self, mode="human"):
pass
@pytest.fixture(scope="session", autouse=True)
def setup():
register(
id=f'dummy/toy2-v0',
entry_point='test.test_black_box:ToyEnv',
max_episode_steps=50,
)
@pytest.mark.parametrize('env_id', ['dummy/toy2-v0'])
@pytest.mark.parametrize('mp_type', ['ProMP', 'DMP', 'ProDMP'])
def test_make_mp(env_id: str, mp_type: str):
parts = env_id.split('/')
if len(parts) == 1:
ns, name = 'gym', parts[0]
elif len(parts) == 2:
ns, name = parts[0], parts[1]
else:
raise ValueError('env id can not contain multiple "/".')
fancy_id = f'{ns}_{mp_type}/{name}'
make(fancy_id)
def test_make_raw_toy():
make('dummy/toy2-v0')
@pytest.mark.parametrize('mp_type', ['ProMP', 'DMP', 'ProDMP'])
def test_make_mp_toy(mp_type: str):
fancy_id = f'dummy_{mp_type}/toy2-v0'
make(fancy_id)
@pytest.mark.parametrize('ns', KNOWN_NS)
def test_ns_nonempty(ns):
assert len(fancy_gym.MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS[ns]), f'The namespace {ns} is empty even though it should not be...'
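The naming scheme these tests exercise derives the MP variant of a registered env by inserting the MP type into its namespace, e.g. `fancy/Reacher5d-v0` becomes `fancy_ProMP/Reacher5d-v0`. A short sketch (assuming the ProMP variant of this env is registered, as the fancy-namespace test above implies):

```python
import gymnasium as gym
import fancy_gym  # noqa: F401 -- registers both the step-based and MP variants

# '<ns>/<name>' -> '<ns>_<mp_type>/<name>' is the id convention checked above.
step_env = gym.make("fancy/Reacher5d-v0")
mp_env = gym.make("fancy_ProMP/Reacher5d-v0")
obs, info = mp_env.reset(seed=1)
```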

@ -6,9 +6,9 @@ from metaworld.envs import ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE
import fancy_gym
from test.utils import run_env, run_env_determinism
METAWORLD_IDS = [f'metaworld:{env.split("-goal-observable")[0]}' for env, _ in
METAWORLD_IDS = [f'metaworld/{env.split("-goal-observable")[0]}' for env, _ in
ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE.items()]
METAWORLD_MP_IDS = chain(*fancy_gym.ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS.values())
METAWORLD_MP_IDS = fancy_gym.ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS['all']
SEED = 1
@ -18,6 +18,7 @@ def test_step_metaworld_functionality(env_id: str):
run_env(env_id)
@pytest.mark.skip(reason="Seeding does not correctly work on current Metaworld.")
@pytest.mark.parametrize('env_id', METAWORLD_IDS)
def test_step_metaworld_determinism(env_id: str):
"""Tests that for step environments identical seeds produce identical trajectories."""
@ -30,6 +31,7 @@ def test_bb_metaworld_functionality(env_id: str):
run_env(env_id)
@pytest.mark.skip(reason="Seeding does not correctly work on current Metaworld.")
@pytest.mark.parametrize('env_id', METAWORLD_MP_IDS)
def test_bb_metaworld_determinism(env_id: str):
"""Tests that for black box environment identical seeds produce identical trajectories."""

@ -2,21 +2,25 @@ from itertools import chain
from types import FunctionType
from typing import Tuple, Type, Union, Optional
import gym
import gymnasium as gym
import numpy as np
import pytest
from gym import register
from gym.core import ActType, ObsType
from gymnasium import register, make
from gymnasium.core import ActType, ObsType
from gymnasium import spaces
import fancy_gym
from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
from fancy_gym.utils.time_aware_observation import TimeAwareObservation
from fancy_gym.utils.wrappers import TimeAwareObservation
from fancy_gym.utils.make_env_helpers import ensure_finite_time
SEED = 1
ENV_IDS = ['Reacher5d-v0', 'dmc:ball_in_cup-catch', 'metaworld:reach-v2', 'Reacher-v2']
ENV_IDS = ['fancy/Reacher5d-v0', 'dm_control/ball_in_cup-catch-v0', 'metaworld/reach-v2', 'Reacher-v2']
WRAPPERS = [fancy_gym.envs.mujoco.reacher.MPWrapper, fancy_gym.dmc.suite.ball_in_cup.MPWrapper,
fancy_gym.meta.goal_object_change_mp_wrapper.MPWrapper, fancy_gym.open_ai.mujoco.reacher_v2.MPWrapper]
ALL_MP_ENVS = chain(*fancy_gym.ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS.values())
ALL_MP_ENVS = fancy_gym.ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS['all']
MAX_STEPS_FALLBACK = 50
class ToyEnv(gym.Env):
@ -26,10 +30,12 @@ class ToyEnv(gym.Env):
def reset(self, *, seed: Optional[int] = None, return_info: bool = False,
options: Optional[dict] = None) -> Union[ObsType, Tuple[ObsType, dict]]:
return np.array([-1])
obs, options = np.array([-1]), {}
return obs, options
def step(self, action: ActType) -> Tuple[ObsType, float, bool, dict]:
return np.array([-1]), 1, False, {}
obs, reward, terminated, truncated, info = np.array([-1]), 1, False, False, {}
return obs, reward, terminated, truncated, info
def render(self, mode="human"):
pass
@ -61,7 +67,7 @@ def setup():
def test_learn_sub_trajectories(mp_type: str, env_wrap: Tuple[str, Type[RawInterfaceWrapper]],
add_time_aware_wrapper_before: bool):
env_id, wrapper_class = env_wrap
env_step = TimeAwareObservation(fancy_gym.make(env_id, SEED))
env_step = TimeAwareObservation(ensure_finite_time(make(env_id, SEED), MAX_STEPS_FALLBACK))
wrappers = [wrapper_class]
# has time aware wrapper
@ -72,24 +78,29 @@ def test_learn_sub_trajectories(mp_type: str, env_wrap: Tuple[str, Type[RawInter
{'trajectory_generator_type': mp_type},
{'controller_type': 'motor'},
{'phase_generator_type': 'exp'},
{'basis_generator_type': 'rbf'}, seed=SEED)
{'basis_generator_type': 'rbf'}, fallback_max_steps=MAX_STEPS_FALLBACK)
env.reset(seed=SEED)
assert env.learn_sub_trajectories
assert env.spec.max_episode_steps
assert env_step.spec.max_episode_steps
assert env.traj_gen.learn_tau
# This also verifies we are not adding the TimeAwareObservationWrapper twice
assert env.observation_space == env_step.observation_space
assert spaces.flatten_space(env_step.observation_space) == spaces.flatten_space(env.observation_space)
d = True
done = True
for i in range(25):
if d:
env.reset()
if done:
env.reset(seed=SEED)
action = env.action_space.sample()
obs, r, d, info = env.step(action)
_obs, _reward, terminated, truncated, info = env.step(action)
done = terminated or truncated
length = info['trajectory_length']
if not d:
if not done:
assert length == np.round(action[0] / env.dt)
assert length == np.round(env.traj_gen.tau.numpy() / env.dt)
else:
@ -105,14 +116,14 @@ def test_learn_sub_trajectories(mp_type: str, env_wrap: Tuple[str, Type[RawInter
def test_replanning_time(mp_type: str, env_wrap: Tuple[str, Type[RawInterfaceWrapper]],
add_time_aware_wrapper_before: bool, replanning_time: int):
env_id, wrapper_class = env_wrap
env_step = TimeAwareObservation(fancy_gym.make(env_id, SEED))
env_step = TimeAwareObservation(ensure_finite_time(make(env_id, SEED), MAX_STEPS_FALLBACK))
wrappers = [wrapper_class]
# has time aware wrapper
if add_time_aware_wrapper_before:
wrappers += [TimeAwareObservation]
replanning_schedule = lambda c_pos, c_vel, obs, c_action, t: t % replanning_time == 0
def replanning_schedule(c_pos, c_vel, obs, c_action, t): return t % replanning_time == 0
basis_generator_type = 'prodmp' if mp_type == 'prodmp' else 'rbf'
phase_generator_type = 'exp' if 'dmp' in mp_type else 'linear'
@ -121,31 +132,36 @@ def test_replanning_time(mp_type: str, env_wrap: Tuple[str, Type[RawInterfaceWra
{'trajectory_generator_type': mp_type},
{'controller_type': 'motor'},
{'phase_generator_type': phase_generator_type},
{'basis_generator_type': basis_generator_type}, seed=SEED)
{'basis_generator_type': basis_generator_type}, fallback_max_steps=MAX_STEPS_FALLBACK)
env.reset(seed=SEED)
assert env.do_replanning
assert env.spec.max_episode_steps
assert env_step.spec.max_episode_steps
assert callable(env.replanning_schedule)
# This also verifies we are not adding the TimeAwareObservationWrapper twice
assert env.observation_space == env_step.observation_space
assert spaces.flatten_space(env_step.observation_space) == spaces.flatten_space(env.observation_space)
env.reset()
env.reset(seed=SEED)
episode_steps = env_step.spec.max_episode_steps // replanning_time
# Make 3 episodes, total steps depend on the replanning steps
for i in range(3 * episode_steps):
action = env.action_space.sample()
obs, r, d, info = env.step(action)
_obs, _reward, terminated, truncated, info = env.step(action)
done = terminated or truncated
length = info['trajectory_length']
if d:
if done:
# Check if number of steps until termination match the replanning interval
print(d, (i + 1), episode_steps)
print(done, (i + 1), episode_steps)
assert (i + 1) % episode_steps == 0
env.reset()
env.reset(seed=SEED)
assert replanning_schedule(None, None, None, None, length)
@pytest.mark.parametrize('mp_type', ['promp', 'prodmp'])
@pytest.mark.parametrize('max_planning_times', [1, 2, 3, 4])
@pytest.mark.parametrize('sub_segment_steps', [5, 10])
@ -165,15 +181,19 @@ def test_max_planning_times(mp_type: str, max_planning_times: int, sub_segment_s
},
{'basis_generator_type': basis_generator_type,
},
seed=SEED)
_ = env.reset()
d = False
fallback_max_steps=MAX_STEPS_FALLBACK)
_ = env.reset(seed=SEED)
done = False
planning_times = 0
while not d:
_, _, d, _ = env.step(env.action_space.sample())
while not done:
action = env.action_space.sample()
_obs, _reward, terminated, truncated, _info = env.step(action)
done = terminated or truncated
planning_times += 1
assert planning_times == max_planning_times
@pytest.mark.parametrize('mp_type', ['promp', 'prodmp'])
@pytest.mark.parametrize('max_planning_times', [1, 2, 3, 4])
@pytest.mark.parametrize('sub_segment_steps', [5, 10])
@ -194,17 +214,20 @@ def test_replanning_with_learn_tau(mp_type: str, max_planning_times: int, sub_se
},
{'basis_generator_type': basis_generator_type,
},
seed=SEED)
_ = env.reset()
d = False
fallback_max_steps=MAX_STEPS_FALLBACK)
_ = env.reset(seed=SEED)
done = False
planning_times = 0
while not d:
while not done:
action = env.action_space.sample()
action[0] = tau
_, _, d, info = env.step(action)
_obs, _reward, terminated, truncated, _info = env.step(action)
done = terminated or truncated
planning_times += 1
assert planning_times == max_planning_times
@pytest.mark.parametrize('mp_type', ['promp', 'prodmp'])
@pytest.mark.parametrize('max_planning_times', [1, 2, 3, 4])
@pytest.mark.parametrize('sub_segment_steps', [5, 10])
@ -213,26 +236,28 @@ def test_replanning_with_learn_delay(mp_type: str, max_planning_times: int, sub_
basis_generator_type = 'prodmp' if mp_type == 'prodmp' else 'rbf'
phase_generator_type = 'exp' if mp_type == 'prodmp' else 'linear'
env = fancy_gym.make_bb('toy-v0', [ToyWrapper],
{'replanning_schedule': lambda pos, vel, obs, action, t: t % sub_segment_steps == 0,
'max_planning_times': max_planning_times,
'verbose': 2},
{'trajectory_generator_type': mp_type,
},
{'controller_type': 'motor'},
{'phase_generator_type': phase_generator_type,
'learn_tau': False,
'learn_delay': True
},
{'basis_generator_type': basis_generator_type,
},
seed=SEED)
_ = env.reset()
d = False
{'replanning_schedule': lambda pos, vel, obs, action, t: t % sub_segment_steps == 0,
'max_planning_times': max_planning_times,
'verbose': 2},
{'trajectory_generator_type': mp_type,
},
{'controller_type': 'motor'},
{'phase_generator_type': phase_generator_type,
'learn_tau': False,
'learn_delay': True
},
{'basis_generator_type': basis_generator_type,
},
fallback_max_steps=MAX_STEPS_FALLBACK)
_ = env.reset(seed=SEED)
done = False
planning_times = 0
while not d:
while not done:
action = env.action_space.sample()
action[0] = delay
_, _, d, info = env.step(action)
_obs, _reward, terminated, truncated, info = env.step(action)
done = terminated or truncated
delay_time_steps = int(np.round(delay / env.dt))
pos = info['positions'].flatten()
@ -256,6 +281,7 @@ def test_replanning_with_learn_delay(mp_type: str, max_planning_times: int, sub_
assert planning_times == max_planning_times
@pytest.mark.parametrize('mp_type', ['promp', 'prodmp'])
@pytest.mark.parametrize('max_planning_times', [1, 2, 3])
@pytest.mark.parametrize('sub_segment_steps', [5, 10, 15])
@ -266,27 +292,29 @@ def test_replanning_with_learn_delay_and_tau(mp_type: str, max_planning_times: i
basis_generator_type = 'prodmp' if mp_type == 'prodmp' else 'rbf'
phase_generator_type = 'exp' if mp_type == 'prodmp' else 'linear'
env = fancy_gym.make_bb('toy-v0', [ToyWrapper],
{'replanning_schedule': lambda pos, vel, obs, action, t: t % sub_segment_steps == 0,
'max_planning_times': max_planning_times,
'verbose': 2},
{'trajectory_generator_type': mp_type,
},
{'controller_type': 'motor'},
{'phase_generator_type': phase_generator_type,
'learn_tau': True,
'learn_delay': True
},
{'basis_generator_type': basis_generator_type,
},
seed=SEED)
_ = env.reset()
d = False
{'replanning_schedule': lambda pos, vel, obs, action, t: t % sub_segment_steps == 0,
'max_planning_times': max_planning_times,
'verbose': 2},
{'trajectory_generator_type': mp_type,
},
{'controller_type': 'motor'},
{'phase_generator_type': phase_generator_type,
'learn_tau': True,
'learn_delay': True
},
{'basis_generator_type': basis_generator_type,
},
fallback_max_steps=MAX_STEPS_FALLBACK)
_ = env.reset(seed=SEED)
done = False
planning_times = 0
while not d:
while not done:
action = env.action_space.sample()
action[0] = tau
action[1] = delay
_, _, d, info = env.step(action)
_obs, _reward, terminated, truncated, info = env.step(action)
done = terminated or truncated
delay_time_steps = int(np.round(delay / env.dt))
@ -306,6 +334,7 @@ def test_replanning_with_learn_delay_and_tau(mp_type: str, max_planning_times: i
assert planning_times == max_planning_times
@pytest.mark.parametrize('mp_type', ['promp', 'prodmp'])
@pytest.mark.parametrize('max_planning_times', [1, 2, 3, 4])
@pytest.mark.parametrize('sub_segment_steps', [5, 10])
@ -325,9 +354,11 @@ def test_replanning_schedule(mp_type: str, max_planning_times: int, sub_segment_
},
{'basis_generator_type': basis_generator_type,
},
seed=SEED)
_ = env.reset()
d = False
fallback_max_steps=MAX_STEPS_FALLBACK)
_ = env.reset(seed=SEED)
for i in range(max_planning_times):
_, _, d, _ = env.step(env.action_space.sample())
assert d
action = env.action_space.sample()
_obs, _reward, terminated, truncated, _info = env.step(action)
done = terminated or truncated
assert done
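All of these tests build their environments through `fancy_gym.make_bb`, passing the black-box options and the trajectory-generator, controller, phase-generator, and basis-generator configs as positional dicts, with `fallback_max_steps` replacing the old `seed` keyword. A condensed sketch of that call shape using an env id and wrapper taken from the fixtures above; the config values are illustrative placeholders, not a tuned setup:

```python
import fancy_gym

# Illustrative only: mirrors the positional argument order used in the tests.
env = fancy_gym.make_bb(
    'fancy/Reacher5d-v0',
    [fancy_gym.envs.mujoco.reacher.MPWrapper],
    {},                                       # black-box kwargs (e.g. replanning_schedule, verbose)
    {'trajectory_generator_type': 'promp'},   # trajectory generator
    {'controller_type': 'motor'},             # controller
    {'phase_generator_type': 'linear'},       # phase generator
    {'basis_generator_type': 'rbf'},          # basis generator
    fallback_max_steps=50,
)
obs, info = env.reset(seed=1)
```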

@ -1,9 +1,12 @@
import gym
from typing import List, Type
import gymnasium as gym
import numpy as np
from fancy_gym import make
from gymnasium import make
def run_env(env_id, iterations=None, seed=0, render=False):
def run_env(env_id: str, iterations: int = None, seed: int = 0, wrappers: List[Type[gym.Wrapper]] = [],
render: bool = False):
"""
Example for running a DMC based env in the step based setting.
The env_id has to be specified as `dmc:domain_name-task_name` or
@ -13,70 +16,88 @@ def run_env(env_id, iterations=None, seed=0, render=False):
env_id: Either `dmc:domain_name-task_name` or `dmc:manipulation-environment_name`
iterations: Number of rollout steps to run
seed: random seeding
wrappers: List of Wrappers to apply to the environment
render: Render the episode
Returns: observations, rewards, dones, actions
Returns: observations, rewards, terminations, truncations, actions
"""
env: gym.Env = make(env_id, seed=seed)
env: gym.Env = make(env_id)
for w in wrappers:
env = w(env)
rewards = []
observations = []
actions = []
dones = []
obs = env.reset()
terminations = []
truncations = []
obs, _ = env.reset(seed=seed)
env.action_space.seed(seed)
verify_observations(obs, env.observation_space, "reset()")
iterations = iterations or (env.spec.max_episode_steps or 1)
# number of samples(multiple environment steps)
# number of samples (multiple environment steps)
for i in range(iterations):
observations.append(obs)
ac = env.action_space.sample()
actions.append(ac)
# ac = np.random.uniform(env.action_space.low, env.action_space.high, env.action_space.shape)
obs, reward, done, info = env.step(ac)
obs, reward, terminated, truncated, info = env.step(ac)
verify_observations(obs, env.observation_space, "step()")
verify_reward(reward)
verify_done(done)
verify_done(terminated)
verify_done(truncated)
rewards.append(reward)
dones.append(done)
terminations.append(terminated)
truncations.append(truncated)
if render:
env.render("human")
if done:
if terminated or truncated:
break
if not hasattr(env, "replanning_schedule"):
assert done, "Done flag is not True after end of episode."
assert terminated or truncated, f"Termination or truncation flag is not True after {i + 1} iterations."
observations.append(obs)
env.close()
del env
return np.array(observations), np.array(rewards), np.array(dones), np.array(actions)
return np.array(observations), np.array(rewards), np.array(terminations), np.array(truncations), np.array(actions)
def run_env_determinism(env_id: str, seed: int):
traj1 = run_env(env_id, seed=seed)
traj2 = run_env(env_id, seed=seed)
def run_env_determinism(env_id: str, seed: int, iterations: int = None, wrappers: List[Type[gym.Wrapper]] = []):
traj1 = run_env(env_id, iterations=iterations,
seed=seed, wrappers=wrappers)
traj2 = run_env(env_id, iterations=iterations,
seed=seed, wrappers=wrappers)
# Iterate over two trajectories, which should have the same state and action sequence
for i, time_step in enumerate(zip(*traj1, *traj2)):
obs1, rwd1, done1, ac1, obs2, rwd2, done2, ac2 = time_step
assert np.array_equal(obs1, obs2), f"Observations [{i}] {obs1} and {obs2} do not match."
assert np.array_equal(ac1, ac2), f"Actions [{i}] {ac1} and {ac2} do not match."
assert np.array_equal(rwd1, rwd2), f"Rewards [{i}] {rwd1} and {rwd2} do not match."
assert np.array_equal(done1, done2), f"Dones [{i}] {done1} and {done2} do not match."
obs1, rwd1, term1, trunc1, ac1, obs2, rwd2, term2, trunc2, ac2 = time_step
assert np.allclose(
obs1, obs2), f"Observations [{i}] {obs1} ({obs1.shape}) and {obs2} ({obs2.shape}) do not match: Biggest difference is {np.abs(obs1-obs2).max()} at index {np.abs(obs1-obs2).argmax()}."
assert np.array_equal(
ac1, ac2), f"Actions [{i}] {ac1} and {ac2} do not match."
assert np.array_equal(
rwd1, rwd2), f"Rewards [{i}] {rwd1} and {rwd2} do not match."
assert np.array_equal(
term1, term2), f"Terminateds [{i}] {term1} and {term2} do not match."
assert np.array_equal(
trunc1, trunc2), f"Truncateds [{i}] {trunc1} and {trunc2} do not match."
def verify_observations(obs, observation_space: gym.Space, obs_type="reset()"):
assert observation_space.contains(obs), \
f"Observation {obs} received from {obs_type} not contained in observation space {observation_space}."
f"Observation {obs} ({obs.shape}) received from {obs_type} not contained in observation space {observation_space}."
def verify_reward(reward):
assert isinstance(reward, (float, int)), f"Returned type {type(reward)} as reward, expected float or int."
assert isinstance(
reward, (float, int)), f"Returned type {type(reward)} as reward, expected float or int."
def verify_done(done):
assert isinstance(done, bool), f"Returned {done} as done flag, expected bool."
assert isinstance(
done, bool), f"Returned {done} as done flag, expected bool."