Merge pull request #75 from D-o-d-o-x/great_refactor

Refactor and Upgrade to Gymnasium
This commit is contained in:
Dominik Roth 2023-10-11 13:42:00 +02:00 committed by GitHub
commit c420a96d4f
84 changed files with 3094 additions and 2741 deletions

README.md (251 changes)
View File
@@ -1,103 +1,112 @@
<h1 align="center">
  <br>
  <img src='./icon.svg' width="250px">
  <br><br>
  <b>Fancy Gym</b>
  <br><br>
</h1>

| :exclamation: Fancy Gym has recently received a major refactor, which also updated many of the used dependencies to current versions. The update has brought some breaking changes. If you want to access the old version, check out the [legacy branch](https://github.com/ALRhub/fancy_gym/tree/legacy). Find out more about what changed [here](https://github.com/ALRhub/fancy_gym/pull/75). |
| --- |

Built upon the foundation of [Gymnasium](https://gymnasium.farama.org/) (a maintained fork of OpenAI's renowned Gym library), `fancy_gym` offers a comprehensive collection of reinforcement learning environments.

**Key Features**:

- **New Challenging Environments**: `fancy_gym` includes several new environments (Panda Box Pushing, Table Tennis, etc.) that present a higher degree of difficulty, pushing the boundaries of reinforcement learning research.
- **Support for Movement Primitives**: `fancy_gym` supports a range of movement primitives (MPs), including Dynamic Movement Primitives (DMPs), Probabilistic Movement Primitives (ProMPs), and Probabilistic Dynamic Movement Primitives (ProDMPs).
- **Upgrade to Movement Primitives**: With our framework, it's straightforward to transform standard Gymnasium environments into environments that support movement primitives.
- **Benchmark Suite Compatibility**: `fancy_gym` makes it easy to access renowned benchmark suites such as [DeepMind Control](https://deepmind.com/research/publications/2020/dm-control-Software-and-Tasks-for-Continuous-Control) and [Metaworld](https://meta-world.github.io/), whether you want to use them in the regular step-based setting or with MPs.
- **Contribute Your Own Environments**: If you're inspired to create custom gym environments, both step-based and with movement primitives, this [guide](https://gymnasium.farama.org/tutorials/gymnasium_basics/environment_creation/) will assist you. We encourage and highly appreciate submissions via PRs to integrate these environments into `fancy_gym`.

## Movement Primitive Environments (Episode-Based/Black-Box Environments)

<p align="justify">
Movement primitive (MP) environments differ from traditional step-based environments. They align more with concepts from stochastic search, black-box optimization, and methods commonly found in classical robotics and control. Instead of individual steps, MP environments operate on an episode basis, executing complete trajectories. These trajectories are produced by trajectory generators such as Dynamic Movement Primitives (DMP), Probabilistic Movement Primitives (ProMP) or Probabilistic Dynamic Movement Primitives (ProDMP).
</p>
<p align="justify">
Once generated, these trajectories are converted into step-by-step actions by a trajectory tracking controller. The specific controller chosen depends on the environment's requirements. Currently, we support position, velocity, and PD controllers for position, velocity, and torque control, respectively, as well as a specialized controller designed for the MetaWorld control suite.
</p>
<p align="justify">
While the overarching objective of MP environments remains the learning of an optimal policy, the actions here represent the parametrization of motion primitives used to craft the right trajectory. Our framework additionally supports a contextual setting: at the episode's onset, we expose the context space, a subset of the observation space, and a new action (i.e. a new MP parametrization) has to be predicted for every unique context.
</p>
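
To make this concrete, here is a minimal, hypothetical preview of the interaction pattern described above (the full API and environment ids are introduced later in this README): the policy only sees the context returned by `reset()`, predicts a single parameter vector, and a single call to `step()` rolls out the resulting trajectory.

```python
import gymnasium as gym
import fancy_gym  # registers the fancy_ProMP/ environments used below

env = gym.make('fancy_ProMP/Reacher5d-v0')
context, info = env.reset(seed=1)   # context: the exposed subset of the observation space
params = env.action_space.sample()  # one MP parametrization, predicted once per context
_, episode_return, terminated, truncated, _ = env.step(params)  # executes the whole trajectory
```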

## Installation

1. Clone the repository

```bash
git clone git@github.com:ALRhub/fancy_gym.git
```

2. Go to the folder

```bash
cd fancy_gym
```

3. Install with

```bash
pip install -e .
```

We have a few optional dependencies. If you also want to install those, use

```bash
pip install -e '.[all]' # to install all optional dependencies
pip install -e '.[dmc,metaworld,box2d,mujoco,mujoco-legacy,jax,testing]' # or choose only those you want
```

## How to use Fancy Gym

We will only show the basics here and prepared [multiple examples](fancy_gym/examples/) for a more detailed look.

### Step-Based Environments

Regular step-based environments added by Fancy Gym are registered under the `fancy/` namespace.

| :exclamation: Legacy versions of Fancy Gym used `fancy_gym.make(...)`. This is no longer supported and will raise an Exception on new versions. |
| --- |

```python
import gymnasium as gym
import fancy_gym

env = gym.make('fancy/Reacher5d-v0')
# or env = gym.make('metaworld/reach-v2') # fancy_gym allows access to all metaworld ML1 tasks via the metaworld/ NS
# or env = gym.make('dm_control/ball_in_cup-catch-v0')
# or env = gym.make('Reacher-v2')

observation, info = env.reset(seed=1)

for i in range(1000):
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    if i % 5 == 0:
        env.render()

    if terminated or truncated:
        observation, info = env.reset()
```

### Black-box Environments

All environments provide the cumulative episode reward by default; this can however be changed if necessary. Optionally, each environment returns all collected information from each step as part of the infos. This information is, however, mainly meant for debugging and logging, not for training.

| Key | Description | Type |
| --- | --- | --- |
| `positions` | Generated trajectory from MP | Optional |
| `velocities` | Generated trajectory from MP | Optional |
| `step_actions` | Step-wise executed action based on controller output | Optional |
| `step_observations` | Step-wise intermediate observations | Optional |
| `step_rewards` | Step-wise rewards | Optional |
| `trajectory_length` | Total number of environment interactions | Always |
| `other` | All other information from the underlying environment is returned as a list of length `trajectory_length`, maintaining the original key. In case some information is not provided at every time step, the missing values are filled with `None`. | Always |

Existing MP tasks can be created the same way as above. The namespace of an MP-variant of an environment is given by `<original namespace>_<MP name>/`.
Just keep in mind, calling `step()` executes a full trajectory.

> **Note:**
> Currently, we are also in the process of enabling replanning as well as learning of sub-trajectories.
> This allows splitting the episode into multiple trajectories and is a hybrid setting between step-based and
> black-box learning.
@@ -105,30 +114,38 @@
> Feel free to try it and open an issue with any problems that occur.

```python
import gymnasium as gym
import fancy_gym

env = gym.make('fancy_ProMP/Reacher5d-v0')
# or env = gym.make('metaworld_ProDMP/reach-v2')
# or env = gym.make('dm_control_DMP/ball_in_cup-catch-v0')
# or env = gym.make('gym_ProMP/Reacher-v2') # mp versions of envs added directly by gymnasium are in the gym_<MP-type> NS

# render() can be called once in the beginning with all necessary arguments.
# To turn it off again, just call render() without any arguments.
env.render(mode='human')

# This returns the context information, not the full state observation
observation, info = env.reset(seed=1)

for i in range(5):
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)

    # terminated or truncated is always True as we are working on the episode level, hence we always reset()
    observation, info = env.reset()
```
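
Continuing the example above, the info dict returned by `step()` exposes the keys listed in the earlier table. A small sketch follows; which of the optional keys are actually populated depends on the environment and its verbosity settings, which is an assumption here.

```python
observation, episode_reward, terminated, truncated, info = env.step(env.action_space.sample())

print(info['trajectory_length'])   # always present: number of underlying env interactions
print(info.get('step_rewards'))    # optional: per-step rewards of the executed trajectory
print(info.get('positions'))       # optional: trajectory generated by the MP
```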

To show all available environments, we provide some additional convenience variables. All of them return a dictionary with the keys `DMP`, `ProMP`, `ProDMP` and `all` that store a list of available environment ids.

```python
import fancy_gym

print("All Black-box tasks:")
print(fancy_gym.ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS)

print("Fancy Black-box tasks:")
print(fancy_gym.ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS)
@@ -140,6 +157,9 @@ print(fancy_gym.ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS)
print("MetaWorld Black-box tasks:")
print(fancy_gym.ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS)

print("If you add custom envs, their mp versions will be found in:")
print(fancy_gym.MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS['<my_custom_namespace>'])
```
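
For instance, to list only the ProDMP variants (a short sketch using the dictionary described above):

```python
import fancy_gym

prodmp_ids = fancy_gym.ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS['ProDMP']
print(len(prodmp_ids), prodmp_ids[:5])
```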

### How to create a new MP task
@@ -151,23 +171,27 @@

```python
from abc import abstractmethod
from typing import Union, Tuple

import gymnasium as gym
import numpy as np


class RawInterfaceWrapper(gym.Wrapper):
    mp_config = {
        'ProMP': {},
        'DMP': {},
        'ProDMP': {},
    }

    @property
    def context_mask(self) -> np.ndarray:
        """
        Returns boolean mask of the same shape as the observation space.
        It determines whether the observation is returned for the contextual case or not.
        This effectively allows to filter unwanted or unnecessary observations from the full step-based case.
        E.g. velocities starting at 0 are only changing after the first action. Given we only receive the
        context/part of the first observation, the velocities are not necessary in the observation for the task.
        Returns:
            bool array representing the indices of the observations
        """
        return np.ones(self.env.observation_space.shape[0], dtype=bool)
@@ -197,34 +221,91 @@ class RawInterfaceWrapper(gym.Wrapper):
```
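
A minimal sketch of what implementing this interface for a custom environment could look like (the observation layout and the MuJoCo-style state access below are assumptions for illustration; `current_pos` and `current_vel` follow the interface used by the black-box wrapper in this PR):

```python
import gymnasium as gym
import numpy as np

from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper


class MyCoolEnvMPWrapper(RawInterfaceWrapper):
    mp_config = {'ProMP': {}, 'DMP': {}, 'ProDMP': {}}

    @property
    def context_mask(self) -> np.ndarray:
        # Hypothetical layout: only the first two observation entries are context.
        mask = np.zeros(self.env.observation_space.shape[0], dtype=bool)
        mask[:2] = True
        return mask

    @property
    def current_pos(self) -> np.ndarray:
        # Assumes a MuJoCo-based env exposing qpos/qvel on its unwrapped data.
        return self.env.unwrapped.data.qpos.flat.copy()

    @property
    def current_vel(self) -> np.ndarray:
        return self.env.unwrapped.data.qvel.flat.copy()
```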

Default configurations for MPs can be overwritten by defining attributes in `mp_config`.
Available parameters are documented in the [MP_PyTorch Userguide](https://github.com/ALRhub/MP_PyTorch/blob/main/doc/README.md).

```python
class RawInterfaceWrapper(gym.Wrapper):
    mp_config = {
        'ProMP': {
            'phase_generator_kwargs': {
                'phase_generator_type': 'linear'
                # When selecting another generator type, the default configuration will not be merged for the attribute.
            },
            'controller_kwargs': {
                'p_gains': 0.5 * np.array([1.0, 4.0, 2.0, 4.0, 1.0, 4.0, 1.0]),
                'd_gains': 0.5 * np.array([0.1, 0.4, 0.2, 0.4, 0.1, 0.4, 0.1]),
            },
            'basis_generator_kwargs': {
                'num_basis': 3,
                'num_basis_zero_start': 1,
                'num_basis_zero_goal': 1,
            },
        },
        'DMP': {},
        'ProDMP': {},
    }

    [...]
```

If you created a new task wrapper, feel free to open a PR, so we can integrate it for others to use as well. Without the integration the task can still be used. A rough outline is shown here; for more details we recommend having a look at the [examples](fancy_gym/examples/).

If the step-based version is already registered with gym, you can simply do the following:

```python
fancy_gym.upgrade(
    id='custom/cool_new_env-v0',
    mp_wrapper=my_custom_MPWrapper
)
```

If the step-based version is not yet registered with gym, we can add both the step-based and MP-versions via

```python
fancy_gym.register(
    id='custom/cool_new_env-v0',
    entry_point=my_custom_env,
    mp_wrapper=my_custom_MPWrapper
)
```

From this point on, you can access the MP-version of your environment via

```python
env = gym.make('custom_ProDMP/cool_new_env-v0')

rewards = 0
observation, info = env.reset()

# number of samples/full trajectories (multiple environment steps)
for i in range(5):
    ac = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(ac)
    rewards += reward

    if terminated or truncated:
        print(rewards)
        rewards = 0
        observation, info = env.reset()
```
## Citing the Project
To cite this repository in publications:
```bibtex
@software{fancy_gym,
title = {Fancy Gym},
author = {Otto, Fabian and Celik, Onur and Roth, Dominik and Zhou, Hongyi},
abstract = {Fancy Gym: Unifying interface for various RL benchmarks with support for Black Box approaches.},
url = {https://github.com/ALRhub/fancy_gym},
organization = {Autonomous Learning Robots Lab (ALR) at KIT},
}
```
## Icon Attribution
The icon is based on the [Gymnasium](https://github.com/Farama-Foundation/Gymnasium) icon as can be found [here](https://gymnasium.farama.org/_static/img/gymnasium_black.svg).
View File
@@ -1,13 +1,17 @@
from fancy_gym import dmc, meta, open_ai
from fancy_gym import envs as fancy
from fancy_gym.utils.make_env_helpers import make_bb
from .envs.registry import register, upgrade
from .envs.registry import ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS, MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS

ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS = MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS['dm_control']
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS = MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS['fancy']
ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS = MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS['metaworld']
ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS = MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS['gym']


def make(*args, **kwargs):
    """
    As part of the refactor of Fancy Gym and the upgrade to Gymnasium, the use of fancy_gym.make has been discontinued. Regular gym.make should be used instead. For more details check out the GitHub README. If your codebase was built for older versions of Fancy Gym and relies on the old behavior and dependency versions, please check out the legacy branch.
    """
    raise Exception('As part of the refactor of Fancy Gym and the upgrade to Gymnasium, the use of fancy_gym.make has been discontinued. Regular gym.make should be used instead. For more details check out the GitHub README. If your codebase was built for older versions of Fancy Gym and relies on the old behavior and dependency versions, please check out the legacy branch.')
View File
@@ -1,8 +1,9 @@
from typing import Tuple, Optional, Callable, Dict, Any

import gymnasium as gym
import numpy as np
from gymnasium import spaces
from gymnasium.core import ObsType
from mp_pytorch.mp.mp_interfaces import MPInterface

from fancy_gym.black_box.controller.base_controller import BaseController
@@ -67,7 +68,8 @@ class BlackBoxWrapper(gym.ObservationWrapper):
        self.reward_aggregation = reward_aggregation

        # spaces
        self.return_context_observation = not (
            learn_sub_trajectories or self.do_replanning)
        self.traj_gen_action_space = self._get_traj_gen_action_space()
        self.action_space = self._get_action_space()
        self.observation_space = self._get_observation_space()
@@ -99,14 +101,17 @@ class BlackBoxWrapper(gym.ObservationWrapper):
        # If we do not do this, the traj_gen assumes we are continuing the trajectory.
        self.traj_gen.reset()

        clipped_params = np.clip(
            action, self.traj_gen_action_space.low, self.traj_gen_action_space.high)
        self.traj_gen.set_params(clipped_params)
        init_time = np.array(
            0 if not self.do_replanning else self.current_traj_steps * self.dt)

        condition_pos = self.condition_pos if self.condition_pos is not None else self.env.get_wrapper_attr('current_pos')
        condition_vel = self.condition_vel if self.condition_vel is not None else self.env.get_wrapper_attr('current_vel')

        self.traj_gen.set_initial_conditions(
            init_time, condition_pos, condition_vel)
        self.traj_gen.set_duration(duration, self.dt)

        position = get_numpy(self.traj_gen.get_traj_pos())
@@ -153,7 +158,8 @@ class BlackBoxWrapper(gym.ObservationWrapper):
        trajectory_length = len(position)
        rewards = np.zeros(shape=(trajectory_length,))
        if self.verbose >= 2:
            actions = np.zeros(shape=(trajectory_length,) +
                               self.env.action_space.shape)
            observations = np.zeros(shape=(trajectory_length,) + self.env.observation_space.shape,
                                    dtype=self.env.observation_space.dtype)
@@ -161,16 +167,18 @@ class BlackBoxWrapper(gym.ObservationWrapper):
        done = False

        if not traj_is_valid:
            obs, trajectory_return, terminated, truncated, infos = self.env.invalid_traj_callback(action, position, velocity,
                                                                                                  self.return_context_observation, self.tau_bound, self.delay_bound)
            return self.observation(obs), trajectory_return, terminated, truncated, infos

        self.plan_steps += 1
        for t, (pos, vel) in enumerate(zip(position, velocity)):
            step_action = self.tracking_controller.get_action(
                pos, vel, self.env.get_wrapper_attr('current_pos'), self.env.get_wrapper_attr('current_vel'))
            c_action = np.clip(
                step_action, self.env.action_space.low, self.env.action_space.high)
            obs, c_reward, terminated, truncated, info = self.env.step(
                c_action)
            rewards[t] = c_reward

            if self.verbose >= 2:
@@ -185,9 +193,7 @@ class BlackBoxWrapper(gym.ObservationWrapper):
            if self.render_kwargs:
                self.env.render(**self.render_kwargs)

            if terminated or truncated or (self.replanning_schedule(self.env.get_wrapper_attr('current_pos'), self.env.get_wrapper_attr('current_vel'), obs, c_action, t + 1 + self.current_traj_steps) and self.plan_steps < self.max_planning_times):

                if self.condition_on_desired:
                    self.condition_pos = pos
@@ -207,17 +213,18 @@ class BlackBoxWrapper(gym.ObservationWrapper):
        infos['trajectory_length'] = t + 1
        trajectory_return = self.reward_aggregation(rewards[:t + 1])
        return self.observation(obs), trajectory_return, terminated, truncated, infos

    def render(self, **kwargs):
        """Only set render options here, such that they can be used during the rollout.
        This only needs to be called once"""
        self.render_kwargs = kwargs

    def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
            -> Tuple[ObsType, Dict[str, Any]]:
        self.current_traj_steps = 0
        self.plan_steps = 0

        self.traj_gen.reset()
        self.condition_pos = None
        self.condition_vel = None

        return super(BlackBoxWrapper, self).reset(seed=seed, options=options)
View File
@@ -11,11 +11,11 @@ def get_controller(controller_type: str, **kwargs):
    if controller_type == "motor":
        return PDController(**kwargs)
    elif controller_type == "velocity":
        return VelController(**kwargs)
    elif controller_type == "position":
        return PosController(**kwargs)
    elif controller_type == "metaworld":
        return MetaWorldController(**kwargs)
    else:
        raise ValueError(f"Specified controller type {controller_type} not supported, "
                         f"please choose one of {ALL_TYPES}.")
View File
@@ -1,6 +1,6 @@
from typing import Union, Tuple

import gymnasium as gym
import numpy as np
from mp_pytorch.mp.mp_interfaces import MPInterface
@@ -114,7 +114,8 @@ class RawInterfaceWrapper(gym.Wrapper):
        Returns:
            obs: artificial observation if the trajectory is invalid, by default a zero vector
            reward: artificial reward if the trajectory is invalid, by default 0
            terminated: artificial terminated if the trajectory is invalid, by default True
            truncated: artificial truncated if the trajectory is invalid, by default False
            info: artificial info if the trajectory is invalid, by default empty dict
        """
        return np.zeros(1), 0, True, False, {}
View File
@@ -1,7 +1,7 @@
# DeepMind Control (DMC) Wrappers

These are the environment wrappers for selected
[DeepMind Control](https://deepmind.com/research/publications/2020/dm-control-Software-and-Tasks-for-Continuous-Control)
environments in order to use our Motion Primitive gym interface with them.

## MP Environments
@@ -9,11 +9,11 @@ environments in order to use our Motion Primitive gym interface with them.

[//]: <> (These environments are wrapped versions of their DeepMind Control Suite &#40;DMC&#41; counterparts. Given most tasks can be)
[//]: <> (solved in shorter horizon lengths than the original 1000 steps, we often shorten the episodes for those tasks.)

| Name | Description | Trajectory Horizon | Action Dimension | Context Dimension |
| --- | --- | --- | --- | --- |
| `dm_control_ProMP/ball_in_cup-catch-v0` | A ProMP wrapped version of the "catch" task for the "ball_in_cup" environment. | 1000 | 10 | 2 |
| `dm_control_DMP/ball_in_cup-catch-v0` | A DMP wrapped version of the "catch" task for the "ball_in_cup" environment. | 1000 | 10 | 2 |
| `dm_control_ProMP/reacher-easy-v0` | A ProMP wrapped version of the "easy" task for the "reacher" environment. | 1000 | 10 | 4 |
| `dm_control_DMP/reacher-easy-v0` | A DMP wrapped version of the "easy" task for the "reacher" environment. | 1000 | 10 | 4 |
| `dm_control_ProMP/reacher-hard-v0` | A ProMP wrapped version of the "hard" task for the "reacher" environment. | 1000 | 10 | 4 |
| `dm_control_DMP/reacher-hard-v0` | A DMP wrapped version of the "hard" task for the "reacher" environment. | 1000 | 10 | 4 |
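
For example, one of the environments listed above can be created like this (a sketch mirroring the main README; it assumes the dm_control optional dependency is installed):

```python
import gymnasium as gym
import fancy_gym

env = gym.make('dm_control_ProMP/ball_in_cup-catch-v0')
obs, info = env.reset(seed=1)
print(env.action_space.shape)  # action dimension as listed in the table above
```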
View File
@@ -1,245 +1,61 @@
from copy import deepcopy

from gymnasium.wrappers import FlattenObservation
from gymnasium.envs.registration import register

from ..envs.registry import register

from . import manipulation, suite

# DeepMind Control Suite (DMC)

register(
    id=f"dm_control/ball_in_cup-catch-v0",
    register_step_based=False,
    mp_wrapper=suite.ball_in_cup.MPWrapper,
    add_mp_types=['DMP', 'ProMP'],
)

register(
    id=f"dm_control/reacher-easy-v0",
    register_step_based=False,
    mp_wrapper=suite.reacher.MPWrapper,
    add_mp_types=['DMP', 'ProMP'],
)

register(
    id=f"dm_control/reacher-hard-v0",
    register_step_based=False,
    mp_wrapper=suite.reacher.MPWrapper,
    add_mp_types=['DMP', 'ProMP'],
)

_dmc_cartpole_tasks = ["balance", "balance_sparse", "swingup", "swingup_sparse"]

for _task in _dmc_cartpole_tasks:
    register(
        id=f'dm_control/cartpole-{_task}-v0',
        register_step_based=False,
        mp_wrapper=suite.cartpole.MPWrapper,
        add_mp_types=['DMP', 'ProMP'],
    )

register(
    id=f"dm_control/cartpole-two_poles-v0",
    register_step_based=False,
    mp_wrapper=suite.cartpole.TwoPolesMPWrapper,
    add_mp_types=['DMP', 'ProMP'],
)

register(
    id=f"dm_control/cartpole-three_poles-v0",
    register_step_based=False,
    mp_wrapper=suite.cartpole.ThreePolesMPWrapper,
    add_mp_types=['DMP', 'ProMP'],
)

# DeepMind Manipulation

register(
    id=f"dm_control/reach_site_features-v0",
    register_step_based=False,
    mp_wrapper=manipulation.reach_site.MPWrapper,
    add_mp_types=['DMP', 'ProMP'],
)
View File
@ -1,186 +0,0 @@
# Adopted from: https://github.com/denisyarats/dmc2gym/blob/master/dmc2gym/wrappers.py
# License: MIT
# Copyright (c) 2020 Denis Yarats
import collections
from collections.abc import MutableMapping
from typing import Any, Dict, Tuple, Optional, Union, Callable
import gym
import numpy as np
from dm_control import composer
from dm_control.rl import control
from dm_env import specs
from gym import spaces
from gym.core import ObsType
def _spec_to_box(spec):
def extract_min_max(s):
assert s.dtype == np.float64 or s.dtype == np.float32, \
f"Only float64 and float32 types are allowed, instead {s.dtype} was found"
dim = int(np.prod(s.shape))
if type(s) == specs.Array:
bound = np.inf * np.ones(dim, dtype=s.dtype)
return -bound, bound
elif type(s) == specs.BoundedArray:
zeros = np.zeros(dim, dtype=s.dtype)
return s.minimum + zeros, s.maximum + zeros
mins, maxs = [], []
for s in spec:
mn, mx = extract_min_max(s)
mins.append(mn)
maxs.append(mx)
low = np.concatenate(mins, axis=0)
high = np.concatenate(maxs, axis=0)
assert low.shape == high.shape
return spaces.Box(low, high, dtype=s.dtype)
def _flatten_obs(obs: MutableMapping):
"""
Flattens an observation of type MutableMapping, e.g. a dict to a 1D array.
Args:
obs: observation to flatten
Returns: 1D array of observation
"""
if not isinstance(obs, MutableMapping):
raise ValueError(f'Requires dict-like observations structure. {type(obs)} found.')
# Keep key order consistent for non OrderedDicts
keys = obs.keys() if isinstance(obs, collections.OrderedDict) else sorted(obs.keys())
obs_vals = [np.array([obs[key]]) if np.isscalar(obs[key]) else obs[key].ravel() for key in keys]
return np.concatenate(obs_vals)
class DMCWrapper(gym.Env):
def __init__(self,
env: Callable[[], Union[composer.Environment, control.Environment]],
):
# TODO: Currently this is required to be a function because dmc does not allow to copy composers environments
self._env = env()
# action and observation space
self._action_space = _spec_to_box([self._env.action_spec()])
self._observation_space = _spec_to_box(self._env.observation_spec().values())
self._window = None
self.id = 'dmc'
def __getattr__(self, item):
"""Propagate only non-existent properties to wrapped env."""
if item.startswith('_'):
raise AttributeError("attempted to get missing private attribute '{}'".format(item))
if item in self.__dict__:
return getattr(self, item)
return getattr(self._env, item)
def _get_obs(self, time_step):
obs = _flatten_obs(time_step.observation).astype(self.observation_space.dtype)
return obs
@property
def observation_space(self):
return self._observation_space
@property
def action_space(self):
return self._action_space
@property
def dt(self):
return self._env.control_timestep()
def seed(self, seed=None):
self._action_space.seed(seed)
self._observation_space.seed(seed)
def step(self, action) -> Tuple[np.ndarray, float, bool, Dict[str, Any]]:
assert self._action_space.contains(action)
extra = {'internal_state': self._env.physics.get_state().copy()}
time_step = self._env.step(action)
reward = time_step.reward or 0.
done = time_step.last()
obs = self._get_obs(time_step)
extra['discount'] = time_step.discount
return obs, reward, done, extra
def reset(self, *, seed: Optional[int] = None, return_info: bool = False,
options: Optional[dict] = None, ) -> Union[ObsType, Tuple[ObsType, dict]]:
time_step = self._env.reset()
obs = self._get_obs(time_step)
return obs
def render(self, mode='rgb_array', height=240, width=320, camera_id=-1, overlays=(), depth=False,
segmentation=False, scene_option=None, render_flag_overrides=None):
# assert mode == 'rgb_array', 'only support rgb_array mode, given %s' % mode
if mode == "rgb_array":
return self._env.physics.render(height=height, width=width, camera_id=camera_id, overlays=overlays,
depth=depth, segmentation=segmentation, scene_option=scene_option,
render_flag_overrides=render_flag_overrides)
# Render max available buffer size. Larger is only possible by altering the XML.
img = self._env.physics.render(height=self._env.physics.model.vis.global_.offheight,
width=self._env.physics.model.vis.global_.offwidth,
camera_id=camera_id, overlays=overlays, depth=depth, segmentation=segmentation,
scene_option=scene_option, render_flag_overrides=render_flag_overrides)
if depth:
img = np.dstack([img.astype(np.uint8)] * 3)
if mode == 'human':
try:
import cv2
if self._window is None:
self._window = cv2.namedWindow(self.id, cv2.WINDOW_AUTOSIZE)
cv2.imshow(self.id, img[..., ::-1]) # Image in BGR
cv2.waitKey(1)
except ImportError:
raise gym.error.DependencyNotInstalled("Rendering requires opencv. Run `pip install opencv-python`")
# PYGAME seems to destroy some global rendering configs from the physics render
# except ImportError:
# import pygame
# img_copy = img.copy().transpose((1, 0, 2))
# if self._window is None:
# pygame.init()
# pygame.display.init()
# self._window = pygame.display.set_mode(img_copy.shape[:2])
# self.clock = pygame.time.Clock()
#
# surf = pygame.surfarray.make_surface(img_copy)
# self._window.blit(surf, (0, 0))
# pygame.event.pump()
# self.clock.tick(30)
# pygame.display.flip()
def close(self):
super().close()
if self._window is not None:
try:
import cv2
cv2.destroyWindow(self.id)
except ImportError:
import pygame
pygame.display.quit()
pygame.quit()
@property
def reward_range(self) -> Tuple[float, float]:
reward_spec = self._env.reward_spec()
if isinstance(reward_spec, specs.BoundedArray):
return reward_spec.minimum, reward_spec.maximum
return -float('inf'), float('inf')
@property
def metadata(self):
return {'render.modes': ['human', 'rgb_array'],
'video.frames_per_second': round(1.0 / self._env.control_timestep())}
View File
@@ -6,6 +6,28 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper

class MPWrapper(RawInterfaceWrapper):

    mp_config = {
        'ProMP': {
            'controller_kwargs': {
                'p_gains': 50.0,
            },
            'trajectory_generator_kwargs': {
                'weights_scale': 0.2,
            },
        },
        'DMP': {
            'controller_kwargs': {
                'p_gains': 50.0,
            },
            'phase_generator': {
                'alpha_phase': 2,
            },
            'trajectory_generator_kwargs': {
                'weights_scale': 500,
            },
        },
        'ProDMP': {},
    }

    @property
    def context_mask(self) -> np.ndarray:
@@ -35,4 +57,4 @@ class MPWrapper(RawInterfaceWrapper):

    @property
    def dt(self) -> Union[float, int]:
        return self.env.control_timestep()
View File
@@ -6,6 +6,25 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper

class MPWrapper(RawInterfaceWrapper):

    mp_config = {
        'ProMP': {
            'controller_kwargs': {
                'p_gains': 50.0,
            },
        },
        'DMP': {
            'controller_kwargs': {
                'p_gains': 50.0,
            },
            'phase_generator': {
                'alpha_phase': 2,
            },
            'trajectory_generator_kwargs': {
                'weights_scale': 10
            },
        },
        'ProDMP': {},
    }

    @property
    def context_mask(self) -> np.ndarray:
@@ -31,4 +50,4 @@ class MPWrapper(RawInterfaceWrapper):

    @property
    def dt(self) -> Union[float, int]:
        return self.env.control_timestep()
View File
@@ -6,6 +6,30 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper

class MPWrapper(RawInterfaceWrapper):

    mp_config = {
        'ProMP': {
            'controller_kwargs': {
                'p_gains': 10,
                'd_gains': 10,
            },
            'trajectory_generator_kwargs': {
                'weights_scale': 0.2,
            },
        },
        'DMP': {
            'controller_kwargs': {
                'p_gains': 10,
                'd_gains': 10,
            },
            'phase_generator': {
                'alpha_phase': 2,
            },
            'trajectory_generator_kwargs': {
                'weights_scale': 500,
            },
        },
        'ProDMP': {},
    }

    def __init__(self, env, n_poles: int = 1):
        self.n_poles = n_poles
@@ -35,7 +59,7 @@ class MPWrapper(RawInterfaceWrapper):

    @property
    def dt(self) -> Union[float, int]:
        return self.env.control_timestep()


class TwoPolesMPWrapper(MPWrapper):
View File
@@ -6,6 +6,30 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper

class MPWrapper(RawInterfaceWrapper):

    mp_config = {
        'ProMP': {
            'controller_kwargs': {
                'p_gains': 50.0,
                'd_gains': 1.0,
            },
            'trajectory_generator_kwargs': {
                'weights_scale': 0.2,
            },
        },
        'DMP': {
            'controller_kwargs': {
                'p_gains': 50.0,
                'd_gains': 1.0,
            },
            'phase_generator': {
                'alpha_phase': 2,
            },
            'trajectory_generator_kwargs': {
                'weights_scale': 500,
            },
        },
        'ProDMP': {},
    }

    @property
    def context_mask(self) -> np.ndarray:
@@ -30,4 +54,4 @@ class MPWrapper(RawInterfaceWrapper):

    @property
    def dt(self) -> Union[float, int]:
        return self.env.control_timestep()
View File
@@ -1,103 +1,43 @@
from copy import deepcopy

import numpy as np
from gymnasium import register as gym_register

from .registry import register, upgrade

from . import classic_control, mujoco
from .classic_control.hole_reacher.hole_reacher import HoleReacherEnv
from .classic_control.simple_reacher.simple_reacher import SimpleReacherEnv
from .classic_control.viapoint_reacher.viapoint_reacher import ViaPointReacherEnv
from .mujoco.ant_jump.ant_jump import MAX_EPISODE_STEPS_ANTJUMP
from .mujoco.beerpong.beerpong import MAX_EPISODE_STEPS_BEERPONG, FIXED_RELEASE_STEP
from .mujoco.half_cheetah_jump.half_cheetah_jump import MAX_EPISODE_STEPS_HALFCHEETAHJUMP
from .mujoco.hopper_jump.hopper_jump import MAX_EPISODE_STEPS_HOPPERJUMP
from .mujoco.hopper_jump.hopper_jump_on_box import MAX_EPISODE_STEPS_HOPPERJUMPONBOX
from .mujoco.hopper_throw.hopper_throw import MAX_EPISODE_STEPS_HOPPERTHROW
from .mujoco.hopper_throw.hopper_throw_in_basket import MAX_EPISODE_STEPS_HOPPERTHROWINBASKET
from .mujoco.reacher.reacher import ReacherEnv, MAX_EPISODE_STEPS_REACHER
from .mujoco.walker_2d_jump.walker_2d_jump import MAX_EPISODE_STEPS_WALKERJUMP
from .mujoco.box_pushing.box_pushing_env import BoxPushingDense, BoxPushingTemporalSparse, \
    BoxPushingTemporalSpatialSparse, MAX_EPISODE_STEPS_BOX_PUSHING
from .mujoco.table_tennis.table_tennis_env import TableTennisEnv, TableTennisWind, TableTennisGoalSwitching, \
    MAX_EPISODE_STEPS_TABLE_TENNIS
from .mujoco.table_tennis.mp_wrapper import TT_MPWrapper as MPWrapper_TableTennis
from .mujoco.table_tennis.mp_wrapper import TT_MPWrapper_Replan as MPWrapper_TableTennis_Replan
from .mujoco.table_tennis.mp_wrapper import TTVelObs_MPWrapper as MPWrapper_TableTennis_VelObs
from .mujoco.table_tennis.mp_wrapper import TTVelObs_MPWrapper_Replan as MPWrapper_TableTennis_VelObs_Replan

# Classic Control
# Simple Reacher
register(
    id='fancy/SimpleReacher-v0',
    entry_point=SimpleReacherEnv,
mp_wrapper=MPWrapper_SimpleReacher,
max_episode_steps=200, max_episode_steps=200,
kwargs={ kwargs={
"n_links": 2, "n_links": 2,
@ -105,19 +45,20 @@ register(
) )
register( register(
id='LongSimpleReacher-v0', id='fancy/LongSimpleReacher-v0',
entry_point='fancy_gym.envs.classic_control:SimpleReacherEnv', entry_point=SimpleReacherEnv,
mp_wrapper=MPWrapper_SimpleReacher,
max_episode_steps=200, max_episode_steps=200,
kwargs={ kwargs={
"n_links": 5, "n_links": 5,
} }
) )
## Viapoint Reacher # Viapoint Reacher
register( register(
id='ViaPointReacher-v0', id='fancy/ViaPointReacher-v0',
entry_point='fancy_gym.envs.classic_control:ViaPointReacherEnv', entry_point=ViaPointReacherEnv,
mp_wrapper=MPWrapper_ViaPointReacher,
max_episode_steps=200, max_episode_steps=200,
kwargs={ kwargs={
"n_links": 5, "n_links": 5,
@ -126,10 +67,11 @@ register(
} }
) )
## Hole Reacher # Hole Reacher
register( register(
id='HoleReacher-v0', id='fancy/HoleReacher-v0',
entry_point='fancy_gym.envs.classic_control:HoleReacherEnv', entry_point=HoleReacherEnv,
mp_wrapper=MPWrapper_HoleReacher,
max_episode_steps=200, max_episode_steps=200,
kwargs={ kwargs={
"n_links": 5, "n_links": 5,
@ -145,31 +87,35 @@ register(
# Mujoco # Mujoco
## Mujoco Reacher # Mujoco Reacher
for _dims in [5, 7]: for dims in [5, 7]:
register( register(
id=f'Reacher{_dims}d-v0', id=f'fancy/Reacher{dims}d-v0',
entry_point='fancy_gym.envs.mujoco:ReacherEnv', entry_point=ReacherEnv,
mp_wrapper=MPWrapper_Reacher,
max_episode_steps=MAX_EPISODE_STEPS_REACHER, max_episode_steps=MAX_EPISODE_STEPS_REACHER,
kwargs={ kwargs={
"n_links": _dims, "n_links": dims,
} }
) )
register( register(
id=f'Reacher{_dims}dSparse-v0', id=f'fancy/Reacher{dims}dSparse-v0',
entry_point='fancy_gym.envs.mujoco:ReacherEnv', entry_point=ReacherEnv,
mp_wrapper=MPWrapper_Reacher,
max_episode_steps=MAX_EPISODE_STEPS_REACHER, max_episode_steps=MAX_EPISODE_STEPS_REACHER,
kwargs={ kwargs={
"sparse": True, "sparse": True,
'reward_weight': 200, 'reward_weight': 200,
"n_links": _dims, "n_links": dims,
} }
) )
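The `register` imported from `.registry` accepts `mp_wrapper` and `add_mp_types` in addition to the usual Gymnasium arguments, so the step-based environment and its MP variants come out of a single call. A hedged sketch of registering a user-defined task this way, assuming `register` is also re-exported as `fancy_gym.register`; `MyEnv` and `MyMPWrapper` are placeholders, not part of this commit:

```python
import gymnasium as gym
import numpy as np

import fancy_gym
from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper


class MyEnv(gym.Env):
    """Tiny stand-in environment, only here to make the registration call concrete."""
    observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(4,), dtype=np.float64)
    action_space = gym.spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float64)
    dt = 0.02  # used by the default RawInterfaceWrapper.dt

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        return np.zeros(4), {}

    def step(self, action):
        return np.zeros(4), 0.0, False, False, {}


class MyMPWrapper(RawInterfaceWrapper):
    mp_config = {'ProMP': {}, 'DMP': {}, 'ProDMP': {}}  # rely on framework defaults

    @property
    def context_mask(self):
        return np.array([True, True, False, False])

    @property
    def current_pos(self):
        return np.zeros(2)

    @property
    def current_vel(self):
        return np.zeros(2)


fancy_gym.register(
    id='fancy/MyEnv-v0',               # hypothetical id in the fancy namespace
    entry_point=MyEnv,
    mp_wrapper=MyMPWrapper,
    max_episode_steps=200,
    add_mp_types=['ProMP', 'ProDMP'],  # MP variants generated alongside the step-based env
)
```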
register( register(
id='HopperJumpSparse-v0', id='fancy/HopperJumpSparse-v0',
entry_point='fancy_gym.envs.mujoco:HopperJumpEnv', entry_point='fancy_gym.envs.mujoco:HopperJumpEnv',
mp_wrapper=mujoco.hopper_jump.MPWrapper,
max_episode_steps=MAX_EPISODE_STEPS_HOPPERJUMP, max_episode_steps=MAX_EPISODE_STEPS_HOPPERJUMP,
kwargs={ kwargs={
"sparse": True, "sparse": True,
@ -177,8 +123,9 @@ register(
) )
register( register(
id='HopperJump-v0', id='fancy/HopperJump-v0',
entry_point='fancy_gym.envs.mujoco:HopperJumpEnv', entry_point='fancy_gym.envs.mujoco:HopperJumpEnv',
mp_wrapper=mujoco.hopper_jump.MPWrapper,
max_episode_steps=MAX_EPISODE_STEPS_HOPPERJUMP, max_episode_steps=MAX_EPISODE_STEPS_HOPPERJUMP,
kwargs={ kwargs={
"sparse": False, "sparse": False,
@ -188,76 +135,117 @@ register(
} }
) )
# TODO: Add [MPs] later when finished (old TODO I moved here during refactor)
register( register(
id='AntJump-v0', id='fancy/AntJump-v0',
entry_point='fancy_gym.envs.mujoco:AntJumpEnv', entry_point='fancy_gym.envs.mujoco:AntJumpEnv',
max_episode_steps=MAX_EPISODE_STEPS_ANTJUMP, max_episode_steps=MAX_EPISODE_STEPS_ANTJUMP,
add_mp_types=[],
) )
register( register(
id='HalfCheetahJump-v0', id='fancy/HalfCheetahJump-v0',
entry_point='fancy_gym.envs.mujoco:HalfCheetahJumpEnv', entry_point='fancy_gym.envs.mujoco:HalfCheetahJumpEnv',
max_episode_steps=MAX_EPISODE_STEPS_HALFCHEETAHJUMP, max_episode_steps=MAX_EPISODE_STEPS_HALFCHEETAHJUMP,
add_mp_types=[],
) )
register( register(
id='HopperJumpOnBox-v0', id='fancy/HopperJumpOnBox-v0',
entry_point='fancy_gym.envs.mujoco:HopperJumpOnBoxEnv', entry_point='fancy_gym.envs.mujoco:HopperJumpOnBoxEnv',
max_episode_steps=MAX_EPISODE_STEPS_HOPPERJUMPONBOX, max_episode_steps=MAX_EPISODE_STEPS_HOPPERJUMPONBOX,
add_mp_types=[],
) )
register( register(
id='HopperThrow-v0', id='fancy/HopperThrow-v0',
entry_point='fancy_gym.envs.mujoco:HopperThrowEnv', entry_point='fancy_gym.envs.mujoco:HopperThrowEnv',
max_episode_steps=MAX_EPISODE_STEPS_HOPPERTHROW, max_episode_steps=MAX_EPISODE_STEPS_HOPPERTHROW,
add_mp_types=[],
) )
register( register(
id='HopperThrowInBasket-v0', id='fancy/HopperThrowInBasket-v0',
entry_point='fancy_gym.envs.mujoco:HopperThrowInBasketEnv', entry_point='fancy_gym.envs.mujoco:HopperThrowInBasketEnv',
max_episode_steps=MAX_EPISODE_STEPS_HOPPERTHROWINBASKET, max_episode_steps=MAX_EPISODE_STEPS_HOPPERTHROWINBASKET,
add_mp_types=[],
) )
register( register(
id='Walker2DJump-v0', id='fancy/Walker2DJump-v0',
entry_point='fancy_gym.envs.mujoco:Walker2dJumpEnv', entry_point='fancy_gym.envs.mujoco:Walker2dJumpEnv',
max_episode_steps=MAX_EPISODE_STEPS_WALKERJUMP, max_episode_steps=MAX_EPISODE_STEPS_WALKERJUMP,
add_mp_types=[],
)
register( # [MPDone
id='fancy/BeerPong-v0',
entry_point='fancy_gym.envs.mujoco:BeerPongEnv',
mp_wrapper=MPWrapper_Beerpong,
max_episode_steps=MAX_EPISODE_STEPS_BEERPONG,
add_mp_types=['ProMP'],
)
# Here we use the same reward as in BeerPong-v0, but now consider after the release,
# only one time step, i.e. we simulate until the end of th episode
register(
id='fancy/BeerPongStepBased-v0',
entry_point='fancy_gym.envs.mujoco:BeerPongEnvStepBasedEpisodicReward',
mp_wrapper=MPWrapper_Beerpong_FixedRelease,
max_episode_steps=FIXED_RELEASE_STEP,
add_mp_types=['ProMP'],
) )
register( register(
id='BeerPong-v0', id='fancy/BeerPongFixedRelease-v0',
entry_point='fancy_gym.envs.mujoco:BeerPongEnv', entry_point='fancy_gym.envs.mujoco:BeerPongEnv',
max_episode_steps=MAX_EPISODE_STEPS_BEERPONG, mp_wrapper=MPWrapper_Beerpong_FixedRelease,
max_episode_steps=FIXED_RELEASE_STEP,
add_mp_types=['ProMP'],
) )
# Box pushing environments with different rewards # Box pushing environments with different rewards
for reward_type in ["Dense", "TemporalSparse", "TemporalSpatialSparse"]: for reward_type in ["Dense", "TemporalSparse", "TemporalSpatialSparse"]:
register( register(
id='BoxPushing{}-v0'.format(reward_type), id='fancy/BoxPushing{}-v0'.format(reward_type),
entry_point='fancy_gym.envs.mujoco:BoxPushing{}'.format(reward_type), entry_point='fancy_gym.envs.mujoco:BoxPushing{}'.format(reward_type),
mp_wrapper=mujoco.box_pushing.MPWrapper,
max_episode_steps=MAX_EPISODE_STEPS_BOX_PUSHING, max_episode_steps=MAX_EPISODE_STEPS_BOX_PUSHING,
) )
register( register(
id='BoxPushingRandomInit{}-v0'.format(reward_type), id='fancy/BoxPushingRandomInit{}-v0'.format(reward_type),
entry_point='fancy_gym.envs.mujoco:BoxPushing{}'.format(reward_type), entry_point='fancy_gym.envs.mujoco:BoxPushing{}'.format(reward_type),
mp_wrapper=mujoco.box_pushing.MPWrapper,
max_episode_steps=MAX_EPISODE_STEPS_BOX_PUSHING, max_episode_steps=MAX_EPISODE_STEPS_BOX_PUSHING,
kwargs={"random_init": True} kwargs={"random_init": True}
) )
# Here we use the same reward as in BeerPong-v0, but now consider after the release, upgrade(
# only one time step, i.e. we simulate until the end of th episode id='fancy/BoxPushing{}Replan-v0'.format(reward_type),
register( base_id='fancy/BoxPushing{}-v0'.format(reward_type),
id='BeerPongStepBased-v0', mp_wrapper=mujoco.box_pushing.ReplanMPWrapper,
entry_point='fancy_gym.envs.mujoco:BeerPongEnvStepBasedEpisodicReward', )
max_episode_steps=FIXED_RELEASE_STEP,
)
# Table Tennis environments # Table Tennis environments
for ctxt_dim in [2, 4]: for ctxt_dim in [2, 4]:
register( register(
id='TableTennis{}D-v0'.format(ctxt_dim), id='fancy/TableTennis{}D-v0'.format(ctxt_dim),
entry_point='fancy_gym.envs.mujoco:TableTennisEnv', entry_point='fancy_gym.envs.mujoco:TableTennisEnv',
mp_wrapper=MPWrapper_TableTennis,
max_episode_steps=MAX_EPISODE_STEPS_TABLE_TENNIS, max_episode_steps=MAX_EPISODE_STEPS_TABLE_TENNIS,
add_mp_types=['ProMP', 'ProDMP'],
kwargs={
"ctxt_dim": ctxt_dim,
'frame_skip': 4,
}
)
register(
id='fancy/TableTennis{}DReplan-v0'.format(ctxt_dim),
entry_point='fancy_gym.envs.mujoco:TableTennisEnv',
mp_wrapper=MPWrapper_TableTennis,
max_episode_steps=MAX_EPISODE_STEPS_TABLE_TENNIS,
add_mp_types=['ProDMP'],
kwargs={ kwargs={
"ctxt_dim": ctxt_dim, "ctxt_dim": ctxt_dim,
'frame_skip': 4, 'frame_skip': 4,
@ -265,626 +253,39 @@ for ctxt_dim in [2, 4]:
) )
register( register(
id='TableTennisWind-v0', id='fancy/TableTennisWind-v0',
entry_point='fancy_gym.envs.mujoco:TableTennisWind', entry_point='fancy_gym.envs.mujoco:TableTennisWind',
mp_wrapper=MPWrapper_TableTennis_VelObs,
add_mp_types=['ProMP', 'ProDMP'],
max_episode_steps=MAX_EPISODE_STEPS_TABLE_TENNIS, max_episode_steps=MAX_EPISODE_STEPS_TABLE_TENNIS,
) )
register( register(
id='TableTennisGoalSwitching-v0', id='fancy/TableTennisWindReplan-v0',
entry_point='fancy_gym.envs.mujoco:TableTennisWind',
mp_wrapper=MPWrapper_TableTennis_VelObs_Replan,
add_mp_types=['ProDMP'],
max_episode_steps=MAX_EPISODE_STEPS_TABLE_TENNIS,
)
register(
id='fancy/TableTennisGoalSwitching-v0',
entry_point='fancy_gym.envs.mujoco:TableTennisGoalSwitching', entry_point='fancy_gym.envs.mujoco:TableTennisGoalSwitching',
mp_wrapper=MPWrapper_TableTennis,
add_mp_types=['ProMP', 'ProDMP'],
max_episode_steps=MAX_EPISODE_STEPS_TABLE_TENNIS, max_episode_steps=MAX_EPISODE_STEPS_TABLE_TENNIS,
kwargs={ kwargs={
'goal_switching_step': 99 'goal_switching_step': 99
} }
) )
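Once registered under the `fancy/` namespace, the environments are created through Gymnasium's standard `make`; the MP variants generated from `mp_wrapper=...` / `add_mp_types=[...]` are assumed to live in the `fancy_ProMP/`, `fancy_DMP/` and `fancy_ProDMP/` namespaces, matching the `fancy_DMP/` naming used in the README tables below:

```python
import gymnasium as gym
import fancy_gym  # noqa: F401  # importing the package performs the registrations above

# Step-based environment, new Gymnasium API
env = gym.make('fancy/BoxPushingDense-v0')
obs, info = env.reset(seed=1)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())

# Movement-primitive variant generated from the same registration
mp_env = gym.make('fancy_ProMP/BoxPushingDense-v0')
```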
# movement Primitive Environments
## Simple Reacher
_versions = ["SimpleReacher-v0", "LongSimpleReacher-v0"]
for _v in _versions:
_name = _v.split("-")
_env_id = f'{_name[0]}DMP-{_name[1]}'
kwargs_dict_simple_reacher_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
kwargs_dict_simple_reacher_dmp['wrappers'].append(classic_control.simple_reacher.MPWrapper)
kwargs_dict_simple_reacher_dmp['controller_kwargs']['p_gains'] = 0.6
kwargs_dict_simple_reacher_dmp['controller_kwargs']['d_gains'] = 0.075
kwargs_dict_simple_reacher_dmp['trajectory_generator_kwargs']['weight_scale'] = 50
kwargs_dict_simple_reacher_dmp['phase_generator_kwargs']['alpha_phase'] = 2
kwargs_dict_simple_reacher_dmp['name'] = f"{_v}"
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_simple_reacher_dmp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id)
_env_id = f'{_name[0]}ProMP-{_name[1]}'
kwargs_dict_simple_reacher_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
kwargs_dict_simple_reacher_promp['wrappers'].append(classic_control.simple_reacher.MPWrapper)
kwargs_dict_simple_reacher_promp['controller_kwargs']['p_gains'] = 0.6
kwargs_dict_simple_reacher_promp['controller_kwargs']['d_gains'] = 0.075
kwargs_dict_simple_reacher_promp['name'] = _v
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_simple_reacher_promp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
# Viapoint reacher
kwargs_dict_via_point_reacher_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
kwargs_dict_via_point_reacher_dmp['wrappers'].append(classic_control.viapoint_reacher.MPWrapper)
kwargs_dict_via_point_reacher_dmp['controller_kwargs']['controller_type'] = 'velocity'
kwargs_dict_via_point_reacher_dmp['trajectory_generator_kwargs']['weight_scale'] = 50
kwargs_dict_via_point_reacher_dmp['phase_generator_kwargs']['alpha_phase'] = 2
kwargs_dict_via_point_reacher_dmp['name'] = "ViaPointReacher-v0"
register( register(
id='ViaPointReacherDMP-v0', id='fancy/TableTennisGoalSwitchingReplan-v0',
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper', entry_point='fancy_gym.envs.mujoco:TableTennisGoalSwitching',
# max_episode_steps=1, mp_wrapper=MPWrapper_TableTennis_Replan,
kwargs=kwargs_dict_via_point_reacher_dmp add_mp_types=['ProDMP'],
) max_episode_steps=MAX_EPISODE_STEPS_TABLE_TENNIS,
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append("ViaPointReacherDMP-v0")
kwargs_dict_via_point_reacher_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
kwargs_dict_via_point_reacher_promp['wrappers'].append(classic_control.viapoint_reacher.MPWrapper)
kwargs_dict_via_point_reacher_promp['controller_kwargs']['controller_type'] = 'velocity'
kwargs_dict_via_point_reacher_promp['name'] = "ViaPointReacher-v0"
register(
id="ViaPointReacherProMP-v0",
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_via_point_reacher_promp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append("ViaPointReacherProMP-v0")
## Hole Reacher
_versions = ["HoleReacher-v0"]
for _v in _versions:
_name = _v.split("-")
_env_id = f'{_name[0]}DMP-{_name[1]}'
kwargs_dict_hole_reacher_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
kwargs_dict_hole_reacher_dmp['wrappers'].append(classic_control.hole_reacher.MPWrapper)
kwargs_dict_hole_reacher_dmp['controller_kwargs']['controller_type'] = 'velocity'
# TODO: Before it was weight scale 50 and goal scale 0.1. We now only have weight scale and thus set it to 500. Check
kwargs_dict_hole_reacher_dmp['trajectory_generator_kwargs']['weight_scale'] = 500
kwargs_dict_hole_reacher_dmp['phase_generator_kwargs']['alpha_phase'] = 2.5
kwargs_dict_hole_reacher_dmp['name'] = _v
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
# max_episode_steps=1,
kwargs=kwargs_dict_hole_reacher_dmp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id)
_env_id = f'{_name[0]}ProMP-{_name[1]}'
kwargs_dict_hole_reacher_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
kwargs_dict_hole_reacher_promp['wrappers'].append(classic_control.hole_reacher.MPWrapper)
kwargs_dict_hole_reacher_promp['trajectory_generator_kwargs']['weight_scale'] = 2
kwargs_dict_hole_reacher_promp['controller_kwargs']['controller_type'] = 'velocity'
kwargs_dict_hole_reacher_promp['name'] = f"{_v}"
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_hole_reacher_promp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
## ReacherNd
_versions = ["Reacher5d-v0", "Reacher7d-v0", "Reacher5dSparse-v0", "Reacher7dSparse-v0"]
for _v in _versions:
_name = _v.split("-")
_env_id = f'{_name[0]}DMP-{_name[1]}'
kwargs_dict_reacher_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
kwargs_dict_reacher_dmp['wrappers'].append(mujoco.reacher.MPWrapper)
kwargs_dict_reacher_dmp['phase_generator_kwargs']['alpha_phase'] = 2
kwargs_dict_reacher_dmp['name'] = _v
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
# max_episode_steps=1,
kwargs=kwargs_dict_reacher_dmp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id)
_env_id = f'{_name[0]}ProMP-{_name[1]}'
kwargs_dict_reacher_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
kwargs_dict_reacher_promp['wrappers'].append(mujoco.reacher.MPWrapper)
kwargs_dict_reacher_promp['name'] = _v
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_reacher_promp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
########################################################################################################################
## Beerpong ProMP
_versions = ['BeerPong-v0']
for _v in _versions:
_name = _v.split("-")
_env_id = f'{_name[0]}ProMP-{_name[1]}'
kwargs_dict_bp_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
kwargs_dict_bp_promp['wrappers'].append(mujoco.beerpong.MPWrapper)
kwargs_dict_bp_promp['phase_generator_kwargs']['learn_tau'] = True
kwargs_dict_bp_promp['controller_kwargs']['p_gains'] = np.array([1.5, 5, 2.55, 3, 2., 2, 1.25])
kwargs_dict_bp_promp['controller_kwargs']['d_gains'] = np.array([0.02333333, 0.1, 0.0625, 0.08, 0.03, 0.03, 0.0125])
kwargs_dict_bp_promp['basis_generator_kwargs']['num_basis'] = 2
kwargs_dict_bp_promp['basis_generator_kwargs']['num_basis_zero_start'] = 2
kwargs_dict_bp_promp['name'] = _v
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_bp_promp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
### BP with Fixed release
_versions = ["BeerPongStepBased-v0", 'BeerPong-v0']
for _v in _versions:
if _v != 'BeerPong-v0':
_name = _v.split("-")
_env_id = f'{_name[0]}ProMP-{_name[1]}'
else:
_env_id = 'BeerPongFixedReleaseProMP-v0'
kwargs_dict_bp_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
kwargs_dict_bp_promp['wrappers'].append(mujoco.beerpong.MPWrapper)
kwargs_dict_bp_promp['phase_generator_kwargs']['tau'] = 0.62
kwargs_dict_bp_promp['controller_kwargs']['p_gains'] = np.array([1.5, 5, 2.55, 3, 2., 2, 1.25])
kwargs_dict_bp_promp['controller_kwargs']['d_gains'] = np.array([0.02333333, 0.1, 0.0625, 0.08, 0.03, 0.03, 0.0125])
kwargs_dict_bp_promp['basis_generator_kwargs']['num_basis'] = 2
kwargs_dict_bp_promp['basis_generator_kwargs']['num_basis_zero_start'] = 2
kwargs_dict_bp_promp['name'] = _v
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_bp_promp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
########################################################################################################################
## Table Tennis needs to be fixed according to Zhou's implementation
# TODO: Add later when finished
# ########################################################################################################################
#
# ## AntJump
# _versions = ['AntJump-v0']
# for _v in _versions:
# _name = _v.split("-")
# _env_id = f'{_name[0]}ProMP-{_name[1]}'
# kwargs_dict_ant_jump_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
# kwargs_dict_ant_jump_promp['wrappers'].append(mujoco.ant_jump.MPWrapper)
# kwargs_dict_ant_jump_promp['name'] = _v
# register(
# id=_env_id,
# entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
# kwargs=kwargs_dict_ant_jump_promp
# )
# ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
#
# ########################################################################################################################
#
# ## HalfCheetahJump
# _versions = ['HalfCheetahJump-v0']
# for _v in _versions:
# _name = _v.split("-")
# _env_id = f'{_name[0]}ProMP-{_name[1]}'
# kwargs_dict_halfcheetah_jump_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
# kwargs_dict_halfcheetah_jump_promp['wrappers'].append(mujoco.half_cheetah_jump.MPWrapper)
# kwargs_dict_halfcheetah_jump_promp['name'] = _v
# register(
# id=_env_id,
# entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
# kwargs=kwargs_dict_halfcheetah_jump_promp
# )
# ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
#
# ########################################################################################################################
## HopperJump
_versions = ['HopperJump-v0', 'HopperJumpSparse-v0',
# 'HopperJumpOnBox-v0', 'HopperThrow-v0', 'HopperThrowInBasket-v0'
]
# TODO: Check if all environments work with the same MPWrapper
for _v in _versions:
_name = _v.split("-")
_env_id = f'{_name[0]}ProMP-{_name[1]}'
kwargs_dict_hopper_jump_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
kwargs_dict_hopper_jump_promp['wrappers'].append(mujoco.hopper_jump.MPWrapper)
kwargs_dict_hopper_jump_promp['name'] = _v
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_hopper_jump_promp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
# ########################################################################################################################
## Box Pushing
_versions = ['BoxPushingDense-v0', 'BoxPushingTemporalSparse-v0', 'BoxPushingTemporalSpatialSparse-v0',
'BoxPushingRandomInitDense-v0', 'BoxPushingRandomInitTemporalSparse-v0',
'BoxPushingRandomInitTemporalSpatialSparse-v0']
for _v in _versions:
_name = _v.split("-")
_env_id = f'{_name[0]}ProMP-{_name[1]}'
kwargs_dict_box_pushing_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
kwargs_dict_box_pushing_promp['wrappers'].append(mujoco.box_pushing.MPWrapper)
kwargs_dict_box_pushing_promp['name'] = _v
kwargs_dict_box_pushing_promp['controller_kwargs']['p_gains'] = 0.01 * np.array([120., 120., 120., 120., 50., 30., 10.])
kwargs_dict_box_pushing_promp['controller_kwargs']['d_gains'] = 0.01 * np.array([10., 10., 10., 10., 6., 5., 3.])
kwargs_dict_box_pushing_promp['basis_generator_kwargs']['basis_bandwidth_factor'] = 2 # 3.5, 4 to try
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_box_pushing_promp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
for _v in _versions:
_name = _v.split("-")
_env_id = f'{_name[0]}ProDMP-{_name[1]}'
kwargs_dict_box_pushing_prodmp = deepcopy(DEFAULT_BB_DICT_ProDMP)
kwargs_dict_box_pushing_prodmp['wrappers'].append(mujoco.box_pushing.MPWrapper)
kwargs_dict_box_pushing_prodmp['name'] = _v
kwargs_dict_box_pushing_prodmp['controller_kwargs']['p_gains'] = 0.01 * np.array([120., 120., 120., 120., 50., 30., 10.])
kwargs_dict_box_pushing_prodmp['controller_kwargs']['d_gains'] = 0.01 * np.array([10., 10., 10., 10., 6., 5., 3.])
kwargs_dict_box_pushing_prodmp['trajectory_generator_kwargs']['weights_scale'] = 0.3
kwargs_dict_box_pushing_prodmp['trajectory_generator_kwargs']['goal_scale'] = 0.3
kwargs_dict_box_pushing_prodmp['trajectory_generator_kwargs']['auto_scale_basis'] = True
kwargs_dict_box_pushing_prodmp['basis_generator_kwargs']['num_basis'] = 4
kwargs_dict_box_pushing_prodmp['basis_generator_kwargs']['basis_bandwidth_factor'] = 3
kwargs_dict_box_pushing_prodmp['phase_generator_kwargs']['alpha_phase'] = 3
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_box_pushing_prodmp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProDMP"].append(_env_id)
for _v in _versions:
_name = _v.split("-")
_env_id = f'{_name[0]}ReplanProDMP-{_name[1]}'
kwargs_dict_box_pushing_prodmp = deepcopy(DEFAULT_BB_DICT_ProDMP)
kwargs_dict_box_pushing_prodmp['wrappers'].append(mujoco.box_pushing.MPWrapper)
kwargs_dict_box_pushing_prodmp['name'] = _v
kwargs_dict_box_pushing_prodmp['controller_kwargs']['p_gains'] = 0.01 * np.array([120., 120., 120., 120., 50., 30., 10.])
kwargs_dict_box_pushing_prodmp['controller_kwargs']['d_gains'] = 0.01 * np.array([10., 10., 10., 10., 6., 5., 3.])
kwargs_dict_box_pushing_prodmp['trajectory_generator_kwargs']['weights_scale'] = 0.3
kwargs_dict_box_pushing_prodmp['trajectory_generator_kwargs']['goal_scale'] = 0.3
kwargs_dict_box_pushing_prodmp['trajectory_generator_kwargs']['auto_scale_basis'] = True
kwargs_dict_box_pushing_prodmp['basis_generator_kwargs']['num_basis'] = 4
kwargs_dict_box_pushing_prodmp['basis_generator_kwargs']['basis_bandwidth_factor'] = 3
kwargs_dict_box_pushing_prodmp['phase_generator_kwargs']['alpha_phase'] = 3
kwargs_dict_box_pushing_prodmp['black_box_kwargs']['max_planning_times'] = 4
kwargs_dict_box_pushing_prodmp['black_box_kwargs']['replanning_schedule'] = lambda pos, vel, obs, action, t : t % 25 == 0
kwargs_dict_box_pushing_prodmp['black_box_kwargs']['condition_on_desired'] = True
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_box_pushing_prodmp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProDMP"].append(_env_id)
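In the new registry, this replanning setup moves out of these global dictionaries and into the wrapper handed to `register`/`upgrade` (e.g. `mujoco.box_pushing.ReplanMPWrapper` above). A rough sketch of what such a wrapper's `mp_config` could look like, assuming the keys mirror the dictionary structure used here; the values are illustrative, not taken from this commit:

```python
from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper


class MyReplanMPWrapper(RawInterfaceWrapper):
    mp_config = {
        'ProDMP': {
            'black_box_kwargs': {
                'max_planning_times': 4,
                # re-plan every 25 environment steps, analogous to the ProDMP setup above
                'replanning_schedule': lambda pos, vel, obs, action, t: t % 25 == 0,
                'condition_on_desired': True,
            },
        },
    }
    # context_mask / current_pos / current_vel omitted here for brevity
```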
## Table Tennis
_versions = ['TableTennis2D-v0', 'TableTennis4D-v0', 'TableTennisWind-v0', 'TableTennisGoalSwitching-v0']
for _v in _versions:
_name = _v.split("-")
_env_id = f'{_name[0]}ProMP-{_name[1]}'
kwargs_dict_tt_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
if _v == 'TableTennisWind-v0':
kwargs_dict_tt_promp['wrappers'].append(mujoco.table_tennis.TTVelObs_MPWrapper)
else:
kwargs_dict_tt_promp['wrappers'].append(mujoco.table_tennis.TT_MPWrapper)
kwargs_dict_tt_promp['name'] = _v
kwargs_dict_tt_promp['controller_kwargs']['p_gains'] = 0.5 * np.array([1.0, 4.0, 2.0, 4.0, 1.0, 4.0, 1.0])
kwargs_dict_tt_promp['controller_kwargs']['d_gains'] = 0.5 * np.array([0.1, 0.4, 0.2, 0.4, 0.1, 0.4, 0.1])
kwargs_dict_tt_promp['phase_generator_kwargs']['learn_tau'] = True
kwargs_dict_tt_promp['phase_generator_kwargs']['learn_delay'] = True
kwargs_dict_tt_promp['phase_generator_kwargs']['tau_bound'] = [0.8, 1.5]
kwargs_dict_tt_promp['phase_generator_kwargs']['delay_bound'] = [0.05, 0.15]
kwargs_dict_tt_promp['basis_generator_kwargs']['num_basis'] = 3
kwargs_dict_tt_promp['basis_generator_kwargs']['num_basis_zero_start'] = 1
kwargs_dict_tt_promp['basis_generator_kwargs']['num_basis_zero_goal'] = 1
kwargs_dict_tt_promp['black_box_kwargs']['verbose'] = 2
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_tt_promp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
for _v in _versions:
_name = _v.split("-")
_env_id = f'{_name[0]}ProDMP-{_name[1]}'
kwargs_dict_tt_prodmp = deepcopy(DEFAULT_BB_DICT_ProDMP)
if _v == 'TableTennisWind-v0':
kwargs_dict_tt_prodmp['wrappers'].append(mujoco.table_tennis.TTVelObs_MPWrapper)
else:
kwargs_dict_tt_prodmp['wrappers'].append(mujoco.table_tennis.TT_MPWrapper)
kwargs_dict_tt_prodmp['name'] = _v
kwargs_dict_tt_prodmp['controller_kwargs']['p_gains'] = 0.5 * np.array([1.0, 4.0, 2.0, 4.0, 1.0, 4.0, 1.0])
kwargs_dict_tt_prodmp['controller_kwargs']['d_gains'] = 0.5 * np.array([0.1, 0.4, 0.2, 0.4, 0.1, 0.4, 0.1])
kwargs_dict_tt_prodmp['trajectory_generator_kwargs']['weights_scale'] = 0.7
kwargs_dict_tt_prodmp['trajectory_generator_kwargs']['auto_scale_basis'] = True
kwargs_dict_tt_prodmp['trajectory_generator_kwargs']['relative_goal'] = True
kwargs_dict_tt_prodmp['trajectory_generator_kwargs']['disable_goal'] = True
kwargs_dict_tt_prodmp['phase_generator_kwargs']['tau_bound'] = [0.8, 1.5]
kwargs_dict_tt_prodmp['phase_generator_kwargs']['delay_bound'] = [0.05, 0.15]
kwargs_dict_tt_prodmp['phase_generator_kwargs']['learn_tau'] = True
kwargs_dict_tt_prodmp['phase_generator_kwargs']['learn_delay'] = True
kwargs_dict_tt_prodmp['basis_generator_kwargs']['num_basis'] = 3
kwargs_dict_tt_prodmp['basis_generator_kwargs']['alpha'] = 25.
kwargs_dict_tt_prodmp['basis_generator_kwargs']['basis_bandwidth_factor'] = 3
kwargs_dict_tt_prodmp['phase_generator_kwargs']['alpha_phase'] = 3
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_tt_prodmp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProDMP"].append(_env_id)
for _v in _versions:
_name = _v.split("-")
_env_id = f'{_name[0]}ReplanProDMP-{_name[1]}'
kwargs_dict_tt_prodmp = deepcopy(DEFAULT_BB_DICT_ProDMP)
if _v == 'TableTennisWind-v0':
kwargs_dict_tt_prodmp['wrappers'].append(mujoco.table_tennis.TTVelObs_MPWrapper)
else:
kwargs_dict_tt_prodmp['wrappers'].append(mujoco.table_tennis.TT_MPWrapper)
kwargs_dict_tt_prodmp['name'] = _v
kwargs_dict_tt_prodmp['controller_kwargs']['p_gains'] = 0.5 * np.array([1.0, 4.0, 2.0, 4.0, 1.0, 4.0, 1.0])
kwargs_dict_tt_prodmp['controller_kwargs']['d_gains'] = 0.5 * np.array([0.1, 0.4, 0.2, 0.4, 0.1, 0.4, 0.1])
kwargs_dict_tt_prodmp['trajectory_generator_kwargs']['auto_scale_basis'] = False
kwargs_dict_tt_prodmp['trajectory_generator_kwargs']['goal_offset'] = 1.0
kwargs_dict_tt_prodmp['phase_generator_kwargs']['tau_bound'] = [0.8, 1.5]
kwargs_dict_tt_prodmp['phase_generator_kwargs']['delay_bound'] = [0.05, 0.15]
kwargs_dict_tt_prodmp['phase_generator_kwargs']['learn_tau'] = True
kwargs_dict_tt_prodmp['phase_generator_kwargs']['learn_delay'] = True
kwargs_dict_tt_prodmp['basis_generator_kwargs']['num_basis'] = 2
kwargs_dict_tt_prodmp['basis_generator_kwargs']['alpha'] = 25.
kwargs_dict_tt_prodmp['basis_generator_kwargs']['basis_bandwidth_factor'] = 3
kwargs_dict_tt_prodmp['phase_generator_kwargs']['alpha_phase'] = 3
kwargs_dict_tt_prodmp['black_box_kwargs']['max_planning_times'] = 3
kwargs_dict_tt_prodmp['black_box_kwargs']['replanning_schedule'] = lambda pos, vel, obs, action, t : t % 50 == 0
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_tt_prodmp
)
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProDMP"].append(_env_id)
#
# ## Walker2DJump
# _versions = ['Walker2DJump-v0']
# for _v in _versions:
# _name = _v.split("-")
# _env_id = f'{_name[0]}ProMP-{_name[1]}'
# kwargs_dict_walker2d_jump_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
# kwargs_dict_walker2d_jump_promp['wrappers'].append(mujoco.walker_2d_jump.MPWrapper)
# kwargs_dict_walker2d_jump_promp['name'] = _v
# register(
# id=_env_id,
# entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
# kwargs=kwargs_dict_walker2d_jump_promp
# )
# ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
### Depricated, we will not provide non random starts anymore
"""
register(
id='SimpleReacher-v1',
entry_point='fancy_gym.envs.classic_control:SimpleReacherEnv',
max_episode_steps=200,
kwargs={ kwargs={
"n_links": 2, 'goal_switching_step': 99
"random_start": False
} }
) )
register(
id='LongSimpleReacher-v1',
entry_point='fancy_gym.envs.classic_control:SimpleReacherEnv',
max_episode_steps=200,
kwargs={
"n_links": 5,
"random_start": False
}
)
register(
id='HoleReacher-v1',
entry_point='fancy_gym.envs.classic_control:HoleReacherEnv',
max_episode_steps=200,
kwargs={
"n_links": 5,
"random_start": False,
"allow_self_collision": False,
"allow_wall_collision": False,
"hole_width": 0.25,
"hole_depth": 1,
"hole_x": None,
"collision_penalty": 100,
}
)
register(
id='HoleReacher-v2',
entry_point='fancy_gym.envs.classic_control:HoleReacherEnv',
max_episode_steps=200,
kwargs={
"n_links": 5,
"random_start": False,
"allow_self_collision": False,
"allow_wall_collision": False,
"hole_width": 0.25,
"hole_depth": 1,
"hole_x": 2,
"collision_penalty": 1,
}
)
# CtxtFree are v0, Contextual are v1
register(
id='AntJump-v0',
entry_point='fancy_gym.envs.mujoco:AntJumpEnv',
max_episode_steps=MAX_EPISODE_STEPS_ANTJUMP,
kwargs={
"max_episode_steps": MAX_EPISODE_STEPS_ANTJUMP,
"context": False
}
)
# CtxtFree are v0, Contextual are v1
register(
id='HalfCheetahJump-v0',
entry_point='fancy_gym.envs.mujoco:HalfCheetahJumpEnv',
max_episode_steps=MAX_EPISODE_STEPS_HALFCHEETAHJUMP,
kwargs={
"max_episode_steps": MAX_EPISODE_STEPS_HALFCHEETAHJUMP,
"context": False
}
)
register(
id='HopperJump-v0',
entry_point='fancy_gym.envs.mujoco:HopperJumpEnv',
max_episode_steps=MAX_EPISODE_STEPS_HOPPERJUMP,
kwargs={
"max_episode_steps": MAX_EPISODE_STEPS_HOPPERJUMP,
"context": False,
"healthy_reward": 1.0
}
)
"""
### Deprecated used for CorL paper
"""
_vs = np.arange(101).tolist() + [1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3, 1e-2, 5e-2, 1e-1, 5e-1]
for i in _vs:
_env_id = f'ALRReacher{i}-v0'
register(
id=_env_id,
entry_point='fancy_gym.envs.mujoco:ReacherEnv',
max_episode_steps=200,
kwargs={
"steps_before_reward": 0,
"n_links": 5,
"balance": False,
'_ctrl_cost_weight': i
}
)
_env_id = f'ALRReacherSparse{i}-v0'
register(
id=_env_id,
entry_point='fancy_gym.envs.mujoco:ReacherEnv',
max_episode_steps=200,
kwargs={
"steps_before_reward": 200,
"n_links": 5,
"balance": False,
'_ctrl_cost_weight': i
}
)
_vs = np.arange(101).tolist() + [1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3, 1e-2, 5e-2, 1e-1, 5e-1]
for i in _vs:
_env_id = f'ALRReacher{i}ProMP-v0'
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_promp_env_helper',
kwargs={
"name": f"{_env_id.replace('ProMP', '')}",
"wrappers": [mujoco.reacher.MPWrapper],
"mp_kwargs": {
"num_dof": 5,
"num_basis": 5,
"duration": 4,
"policy_type": "motor",
# "weights_scale": 5,
"n_zero_basis": 1,
"zero_start": True,
"policy_kwargs": {
"p_gains": 1,
"d_gains": 0.1
}
}
}
)
_env_id = f'ALRReacherSparse{i}ProMP-v0'
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_promp_env_helper',
kwargs={
"name": f"{_env_id.replace('ProMP', '')}",
"wrappers": [mujoco.reacher.MPWrapper],
"mp_kwargs": {
"num_dof": 5,
"num_basis": 5,
"duration": 4,
"policy_type": "motor",
# "weights_scale": 5,
"n_zero_basis": 1,
"zero_start": True,
"policy_kwargs": {
"p_gains": 1,
"d_gains": 0.1
}
}
}
)
register(
id='HopperJumpOnBox-v0',
entry_point='fancy_gym.envs.mujoco:HopperJumpOnBoxEnv',
max_episode_steps=MAX_EPISODE_STEPS_HOPPERJUMPONBOX,
kwargs={
"max_episode_steps": MAX_EPISODE_STEPS_HOPPERJUMPONBOX,
"context": False
}
)
register(
id='HopperThrow-v0',
entry_point='fancy_gym.envs.mujoco:HopperThrowEnv',
max_episode_steps=MAX_EPISODE_STEPS_HOPPERTHROW,
kwargs={
"max_episode_steps": MAX_EPISODE_STEPS_HOPPERTHROW,
"context": False
}
)
register(
id='HopperThrowInBasket-v0',
entry_point='fancy_gym.envs.mujoco:HopperThrowInBasketEnv',
max_episode_steps=MAX_EPISODE_STEPS_HOPPERTHROWINBASKET,
kwargs={
"max_episode_steps": MAX_EPISODE_STEPS_HOPPERTHROWINBASKET,
"context": False
}
)
register(
id='Walker2DJump-v0',
entry_point='fancy_gym.envs.mujoco:Walker2dJumpEnv',
max_episode_steps=MAX_EPISODE_STEPS_WALKERJUMP,
kwargs={
"max_episode_steps": MAX_EPISODE_STEPS_WALKERJUMP,
"context": False
}
)
register(id='TableTennis2DCtxt-v1',
entry_point='fancy_gym.envs.mujoco:TTEnvGym',
max_episode_steps=MAX_EPISODE_STEPS,
kwargs={'ctxt_dim': 2, 'fixed_goal': True})
register(
id='BeerPong-v0',
entry_point='fancy_gym.envs.mujoco:BeerBongEnv',
max_episode_steps=300,
kwargs={
"rndm_goal": False,
"cup_goal_pos": [0.1, -2.0],
"frame_skip": 2
}
)
"""


@ -1,18 +1,20 @@
### Classic Control
## Step-based Environments

|Name| Description|Horizon|Action Dimension|Observation Dimension
|---|---|---|---|---|
|`SimpleReacher-v0`| Simple reaching task (2 links) without any physics simulation. Provides no reward until 150 time steps. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory.| 200 | 2 | 9
|`LongSimpleReacher-v0`| Simple reaching task (5 links) without any physics simulation. Provides no reward until 150 time steps. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory.| 200 | 5 | 18
|`ViaPointReacher-v0`| Simple reaching task leveraging a via point, which supports self collision detection. Provides a reward only at 100 and 199 for reaching the viapoint and goal point, respectively.| 200 | 5 | 18
|`HoleReacher-v0`| 5 link reaching task where the end-effector needs to reach into a narrow hole without colliding with itself or walls | 200 | 5 | 18

| Name | Description | Horizon | Action Dimension | Observation Dimension |
| --- | --- | --- | --- | --- |
| `fancy/SimpleReacher-v0` | Simple reaching task (2 links) without any physics simulation. Provides no reward until 150 time steps. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory. | 200 | 2 | 9 |
| `fancy/LongSimpleReacher-v0` | Simple reaching task (5 links) without any physics simulation. Provides no reward until 150 time steps. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory. | 200 | 5 | 18 |
| `fancy/ViaPointReacher-v0` | Simple reaching task leveraging a via point, which supports self collision detection. Provides a reward only at 100 and 199 for reaching the viapoint and goal point, respectively. | 200 | 5 | 18 |
| `fancy/HoleReacher-v0` | 5 link reaching task where the end-effector needs to reach into a narrow hole without colliding with itself or walls | 200 | 5 | 18 |
## MP Environments
|Name| Description|Horizon|Action Dimension|Context Dimension
|---|---|---|---|---|
|`ViaPointReacherDMP-v0`| A DMP provides a trajectory for the `ViaPointReacher-v0` task. | 200 | 25
|`HoleReacherFixedGoalDMP-v0`| A DMP provides a trajectory for the `HoleReacher-v0` task with a fixed goal attractor. | 200 | 25
|`HoleReacherDMP-v0`| A DMP provides a trajectory for the `HoleReacher-v0` task. The goal attractor needs to be learned. | 200 | 30
[//]: |`HoleReacherProMPP-v0`|

| Name | Description | Horizon | Action Dimension | Context Dimension |
| ----------------------------------- | -------------------------------------------------------------------------------------------------------- | ------- | ---------------- | ----------------- |
| `fancy_DMP/ViaPointReacher-v0` | A DMP provides a trajectory for the `fancy/ViaPointReacher-v0` task. | 200 | 25 |
| `fancy_DMP/HoleReacherFixedGoal-v0` | A DMP provides a trajectory for the `fancy/HoleReacher-v0` task with a fixed goal attractor. | 200 | 25 |
| `fancy_DMP/HoleReacher-v0` | A DMP provides a trajectory for the `fancy/HoleReacher-v0` task. The goal attractor needs to be learned. | 200 | 30 |
[//]: |`fancy/HoleReacherProMPP-v0`|
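For these MP environments a single `step` call consumes the full trajectory-generator parameter vector (e.g. the DMP weights) and rolls the whole trajectory out internally, so one episode is a single interaction from the agent's point of view. A short usage sketch under that assumption:

```python
import gymnasium as gym
import fancy_gym  # noqa: F401

env = gym.make('fancy_DMP/HoleReacher-v0')
obs, info = env.reset(seed=0)

# The action is the DMP parameter vector (cf. "Action Dimension" above); the returned
# reward is the return accumulated over the executed trajectory.
params = env.action_space.sample()
obs, trajectory_return, terminated, truncated, info = env.step(params)
```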


@ -1,10 +1,10 @@
from typing import Union, Tuple, Optional from typing import Union, Tuple, Optional, Any, Dict
import gym import gymnasium as gym
import numpy as np import numpy as np
from gym import spaces from gymnasium import spaces
from gym.core import ObsType from gymnasium.core import ObsType
from gym.utils import seeding from gymnasium.utils import seeding
from fancy_gym.envs.classic_control.utils import intersect from fancy_gym.envs.classic_control.utils import intersect
@ -55,7 +55,6 @@ class BaseReacherEnv(gym.Env):
self.fig = None self.fig = None
self._steps = 0 self._steps = 0
self.seed()
@property @property
def dt(self) -> Union[float, int]: def dt(self) -> Union[float, int]:
@ -69,10 +68,15 @@ class BaseReacherEnv(gym.Env):
def current_vel(self): def current_vel(self):
return self._angle_velocity.copy() return self._angle_velocity.copy()
def reset(self, *, seed: Optional[int] = None, return_info: bool = False, def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
options: Optional[dict] = None, ) -> Union[ObsType, Tuple[ObsType, dict]]: -> Tuple[ObsType, Dict[str, Any]]:
# Sample only orientation of first link, i.e. the arm is always straight. # Sample only orientation of first link, i.e. the arm is always straight.
if self.random_start: super(BaseReacherEnv, self).reset(seed=seed, options=options)
try:
random_start = options.get('random_start', self.random_start)
except AttributeError:
random_start = self.random_start
if random_start:
first_joint = self.np_random.uniform(np.pi / 4, 3 * np.pi / 4) first_joint = self.np_random.uniform(np.pi / 4, 3 * np.pi / 4)
self._joint_angles = np.hstack([[first_joint], np.zeros(self.n_links - 1)]) self._joint_angles = np.hstack([[first_joint], np.zeros(self.n_links - 1)])
self._start_pos = self._joint_angles.copy() self._start_pos = self._joint_angles.copy()
@ -84,7 +88,7 @@ class BaseReacherEnv(gym.Env):
self._update_joints() self._update_joints()
self._steps = 0 self._steps = 0
return self._get_obs().copy() return self._get_obs().copy(), {}
def _update_joints(self): def _update_joints(self):
""" """
@ -124,10 +128,6 @@ class BaseReacherEnv(gym.Env):
def _terminate(self, info) -> bool: def _terminate(self, info) -> bool:
raise NotImplementedError raise NotImplementedError
def seed(self, seed=None):
self.np_random, seed = seeding.np_random(seed)
return [seed]
def close(self): def close(self):
super(BaseReacherEnv, self).close() super(BaseReacherEnv, self).close()
del self.fig del self.fig


@ -1,5 +1,5 @@
import numpy as np import numpy as np
from gym import spaces from gymnasium import spaces
from fancy_gym.envs.classic_control.base_reacher.base_reacher import BaseReacherEnv from fancy_gym.envs.classic_control.base_reacher.base_reacher import BaseReacherEnv
@ -32,6 +32,7 @@ class BaseReacherDirectEnv(BaseReacherEnv):
reward, info = self._get_reward(action) reward, info = self._get_reward(action)
self._steps += 1 self._steps += 1
done = self._terminate(info) terminated = self._terminate(info)
truncated = False
return self._get_obs().copy(), reward, done, info return self._get_obs().copy(), reward, terminated, truncated, info


@ -1,5 +1,5 @@
import numpy as np import numpy as np
from gym import spaces from gymnasium import spaces
from fancy_gym.envs.classic_control.base_reacher.base_reacher import BaseReacherEnv from fancy_gym.envs.classic_control.base_reacher.base_reacher import BaseReacherEnv
@ -31,6 +31,7 @@ class BaseReacherTorqueEnv(BaseReacherEnv):
reward, info = self._get_reward(action) reward, info = self._get_reward(action)
self._steps += 1 self._steps += 1
done = False terminated = False
truncated = False
return self._get_obs().copy(), reward, done, info return self._get_obs().copy(), reward, terminated, truncated, info
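These changes follow the Gymnasium API: `seed()` is gone (seeding happens through `reset`), `reset` returns `(obs, info)`, and `step` returns a five-tuple with separate `terminated` and `truncated` flags. A generic before/after sketch (not specific to the reacher classes above):

```python
import gymnasium as gym

env = gym.make('Pendulum-v1')

# Old gym API (pre-refactor):
#   obs = env.reset()
#   obs, reward, done, info = env.step(action)

# New Gymnasium API used throughout fancy_gym after this refactor:
obs, info = env.reset(seed=42)
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
```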


@ -1,17 +1,20 @@
from typing import Union, Optional, Tuple from typing import Union, Optional, Tuple, Any, Dict
import gym import gymnasium as gym
import matplotlib.pyplot as plt import matplotlib.pyplot as plt
import numpy as np import numpy as np
from gym.core import ObsType from gymnasium import spaces
from gymnasium.core import ObsType
from matplotlib import patches from matplotlib import patches
from fancy_gym.envs.classic_control.base_reacher.base_reacher_direct import BaseReacherDirectEnv from fancy_gym.envs.classic_control.base_reacher.base_reacher_direct import BaseReacherDirectEnv
from . import MPWrapper
MAX_EPISODE_STEPS_HOLEREACHER = 200 MAX_EPISODE_STEPS_HOLEREACHER = 200
class HoleReacherEnv(BaseReacherDirectEnv): class HoleReacherEnv(BaseReacherDirectEnv):
def __init__(self, n_links: int, hole_x: Union[None, float] = None, hole_depth: Union[None, float] = None, def __init__(self, n_links: int, hole_x: Union[None, float] = None, hole_depth: Union[None, float] = None,
hole_width: float = 1., random_start: bool = False, allow_self_collision: bool = False, hole_width: float = 1., random_start: bool = False, allow_self_collision: bool = False,
allow_wall_collision: bool = False, collision_penalty: float = 1000, rew_fct: str = "simple"): allow_wall_collision: bool = False, collision_penalty: float = 1000, rew_fct: str = "simple"):
@ -40,7 +43,7 @@ class HoleReacherEnv(BaseReacherDirectEnv):
[np.inf] # env steps, because reward start after n steps TODO: Maybe [np.inf] # env steps, because reward start after n steps TODO: Maybe
]) ])
# self.action_space = gym.spaces.Box(low=-action_bound, high=action_bound, shape=action_bound.shape) # self.action_space = gym.spaces.Box(low=-action_bound, high=action_bound, shape=action_bound.shape)
self.observation_space = gym.spaces.Box(low=-state_bound, high=state_bound, shape=state_bound.shape) self.observation_space = spaces.Box(low=-state_bound, high=state_bound, shape=state_bound.shape)
if rew_fct == "simple": if rew_fct == "simple":
from fancy_gym.envs.classic_control.hole_reacher.hr_simple_reward import HolereacherReward from fancy_gym.envs.classic_control.hole_reacher.hr_simple_reward import HolereacherReward
@ -54,13 +57,18 @@ class HoleReacherEnv(BaseReacherDirectEnv):
else: else:
raise ValueError("Unknown reward function {}".format(rew_fct)) raise ValueError("Unknown reward function {}".format(rew_fct))
def reset(self, *, seed: Optional[int] = None, return_info: bool = False, def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
options: Optional[dict] = None, ) -> Union[ObsType, Tuple[ObsType, dict]]: -> Tuple[ObsType, Dict[str, Any]]:
# initialize seed here as the random goal needs to be generated before the super reset()
gym.Env.reset(self, seed=seed, options=options)
self._generate_hole() self._generate_hole()
self._set_patches() self._set_patches()
self.reward_function.reset() self.reward_function.reset()
return super().reset() # do not provide seed to avoid setting it twice
return super(HoleReacherEnv, self).reset(options=options)
def _get_reward(self, action: np.ndarray) -> (float, dict): def _get_reward(self, action: np.ndarray) -> (float, dict):
return self.reward_function.get_reward(self) return self.reward_function.get_reward(self)
@ -160,7 +168,7 @@ class HoleReacherEnv(BaseReacherDirectEnv):
# all points that are above the hole # all points that are above the hole
r, c = np.where((line_points[:, :, 0] > (self._tmp_x - self._tmp_width / 2)) & ( r, c = np.where((line_points[:, :, 0] > (self._tmp_x - self._tmp_width / 2)) & (
line_points[:, :, 0] < (self._tmp_x + self._tmp_width / 2))) line_points[:, :, 0] < (self._tmp_x + self._tmp_width / 2)))
# check if any of those points are below surface # check if any of those points are below surface
nr_line_points_below_surface_in_hole = np.sum(line_points[r, c, 1] < -self._tmp_depth) nr_line_points_below_surface_in_hole = np.sum(line_points[r, c, 1] < -self._tmp_depth)
@ -223,16 +231,3 @@ class HoleReacherEnv(BaseReacherDirectEnv):
self.fig.gca().add_patch(left_block) self.fig.gca().add_patch(left_block)
self.fig.gca().add_patch(right_block) self.fig.gca().add_patch(right_block)
self.fig.gca().add_patch(hole_floor) self.fig.gca().add_patch(hole_floor)
if __name__ == "__main__":
env = HoleReacherEnv(5)
env.reset()
for i in range(10000):
ac = env.action_space.sample()
obs, rew, done, info = env.step(ac)
env.render()
if done:
env.reset()


@ -7,6 +7,30 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper): class MPWrapper(RawInterfaceWrapper):
mp_config = {
'ProMP': {
'controller_kwargs': {
'controller_type': 'velocity',
},
'trajectory_generator_kwargs': {
'weights_scale': 2,
},
},
'DMP': {
'controller_kwargs': {
'controller_type': 'velocity',
},
'trajectory_generator_kwargs': {
# TODO: Before it was weight scale 50 and goal scale 0.1. We now only have weight scale and thus set it to 500. Check
'weights_scale': 500,
},
'phase_generator_kwargs': {
'alpha_phase': 2.5,
},
},
'ProDMP': {},
}
@property @property
def context_mask(self): def context_mask(self):
return np.hstack([ return np.hstack([


@ -7,6 +7,28 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper): class MPWrapper(RawInterfaceWrapper):
mp_config = {
'ProMP': {
'controller_kwargs': {
'p_gains': 0.6,
'd_gains': 0.075,
},
},
'DMP': {
'controller_kwargs': {
'p_gains': 0.6,
'd_gains': 0.075,
},
'trajectory_generator_kwargs': {
'weights_scale': 50,
},
'phase_generator_kwargs': {
'alpha_phase': 2,
},
},
'ProDMP': {},
}
@property @property
def context_mask(self): def context_mask(self):
return np.hstack([ return np.hstack([


@ -1,11 +1,12 @@
from typing import Iterable, Union, Optional, Tuple from typing import Iterable, Union, Optional, Tuple, Any, Dict
import matplotlib.pyplot as plt import matplotlib.pyplot as plt
import numpy as np import numpy as np
from gym import spaces from gymnasium import spaces
from gym.core import ObsType from gymnasium.core import ObsType
from fancy_gym.envs.classic_control.base_reacher.base_reacher_torque import BaseReacherTorqueEnv from fancy_gym.envs.classic_control.base_reacher.base_reacher_torque import BaseReacherTorqueEnv
from . import MPWrapper
class SimpleReacherEnv(BaseReacherTorqueEnv): class SimpleReacherEnv(BaseReacherTorqueEnv):
@ -42,11 +43,15 @@ class SimpleReacherEnv(BaseReacherTorqueEnv):
# def start_pos(self): # def start_pos(self):
# return self._start_pos # return self._start_pos
def reset(self, *, seed: Optional[int] = None, return_info: bool = False, def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
options: Optional[dict] = None, ) -> Union[ObsType, Tuple[ObsType, dict]]: -> Tuple[ObsType, Dict[str, Any]]:
# Reset twice to ensure we return obs after generating goal and generating goal after executing seeded reset.
# (Env will not behave deterministic otherwise)
# Yes, there is probably a more elegant solution to this problem...
self._generate_goal() self._generate_goal()
super().reset(seed=seed, options=options)
return super().reset() self._generate_goal()
return super().reset(seed=seed, options=options)
def _get_reward(self, action: np.ndarray): def _get_reward(self, action: np.ndarray):
diff = self.end_effector - self._goal diff = self.end_effector - self._goal
@ -127,15 +132,3 @@ class SimpleReacherEnv(BaseReacherTorqueEnv):
self.fig.canvas.draw() self.fig.canvas.draw()
self.fig.canvas.flush_events() self.fig.canvas.flush_events()
if __name__ == "__main__":
env = SimpleReacherEnv(5)
env.reset()
for i in range(200):
ac = env.action_space.sample()
obs, rew, done, info = env.step(ac)
env.render()
if done:
break


@ -7,6 +7,26 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper): class MPWrapper(RawInterfaceWrapper):
mp_config = {
'ProMP': {
'controller_kwargs': {
'controller_type': 'velocity',
},
},
'DMP': {
'controller_kwargs': {
'controller_type': 'velocity',
},
'trajectory_generator_kwargs': {
'weights_scale': 50,
},
'phase_generator_kwargs': {
'alpha_phase': 2,
},
},
'ProDMP': {},
}
@property @property
def context_mask(self): def context_mask(self):
return np.hstack([ return np.hstack([


@ -1,11 +1,13 @@
from typing import Iterable, Union, Tuple, Optional from typing import Iterable, Union, Tuple, Optional, Any, Dict
import gym import gymnasium as gym
import matplotlib.pyplot as plt import matplotlib.pyplot as plt
import numpy as np import numpy as np
from gym.core import ObsType from gymnasium import spaces
from gymnasium.core import ObsType
from fancy_gym.envs.classic_control.base_reacher.base_reacher_direct import BaseReacherDirectEnv from fancy_gym.envs.classic_control.base_reacher.base_reacher_direct import BaseReacherDirectEnv
from . import MPWrapper
class ViaPointReacherEnv(BaseReacherDirectEnv): class ViaPointReacherEnv(BaseReacherDirectEnv):
@ -34,16 +36,21 @@ class ViaPointReacherEnv(BaseReacherDirectEnv):
[np.inf] * 2, # x-y coordinates of target distance [np.inf] * 2, # x-y coordinates of target distance
[np.inf] # env steps, because reward start after n steps [np.inf] # env steps, because reward start after n steps
]) ])
self.observation_space = gym.spaces.Box(low=-state_bound, high=state_bound, shape=state_bound.shape) self.observation_space = spaces.Box(low=-state_bound, high=state_bound, shape=state_bound.shape)
# @property # @property
# def start_pos(self): # def start_pos(self):
# return self._start_pos # return self._start_pos
def reset(self, *, seed: Optional[int] = None, return_info: bool = False, def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
options: Optional[dict] = None, ) -> Union[ObsType, Tuple[ObsType, dict]]: -> Tuple[ObsType, Dict[str, Any]]:
# Reset twice to ensure we return obs after generating goal and generating goal after executing seeded reset.
# (Env will not behave deterministic otherwise)
# Yes, there is probably a more elegant solution to this problem...
self._generate_goal() self._generate_goal()
return super().reset() super().reset(seed=seed, options=options)
self._generate_goal()
return super().reset(seed=seed, options=options)
def _generate_goal(self): def _generate_goal(self):
# TODO: Maybe improve this later, this can yield quite a lot of invalid settings # TODO: Maybe improve this later, this can yield quite a lot of invalid settings
@ -183,16 +190,3 @@ class ViaPointReacherEnv(BaseReacherDirectEnv):
plt.plot(self._joints[:, 0], self._joints[:, 1], 'ro-', markerfacecolor='k') plt.plot(self._joints[:, 0], self._joints[:, 1], 'ro-', markerfacecolor='k')
plt.pause(0.01) plt.pause(0.01)
if __name__ == "__main__":
env = ViaPointReacherEnv(5)
env.reset()
for i in range(10000):
ac = env.action_space.sample()
obs, rew, done, info = env.step(ac)
env.render()
if done:
env.reset()


@ -1,15 +1,48 @@
# Custom Mujoco tasks
## Step-based Environments

|Name| Description|Horizon|Action Dimension|Observation Dimension
|---|---|---|---|---|
|`ALRReacher-v0`|Modified (5 links) Mujoco gym's `Reacher-v2` (2 links)| 200 | 5 | 21
|`ALRReacherSparse-v0`|Same as `ALRReacher-v0`, but the distance penalty is only provided in the last time step.| 200 | 5 | 21
|`ALRReacherSparseBalanced-v0`|Same as `ALRReacherSparse-v0`, but the end-effector has to remain upright.| 200 | 5 | 21
|`ALRLongReacher-v0`|Modified (7 links) Mujoco gym's `Reacher-v2` (2 links)| 200 | 7 | 27
|`ALRLongReacherSparse-v0`|Same as `ALRLongReacher-v0`, but the distance penalty is only provided in the last time step.| 200 | 7 | 27
|`ALRLongReacherSparseBalanced-v0`|Same as `ALRLongReacherSparse-v0`, but the end-effector has to remain upright.| 200 | 7 | 27
|`ALRBallInACupSimple-v0`| Ball-in-a-cup task where a robot needs to catch a ball attached to a cup at its end-effector. | 4000 | 3 | wip
|`ALRBallInACup-v0`| Ball-in-a-cup task where a robot needs to catch a ball attached to a cup at its end-effector | 4000 | 7 | wip
|`ALRBallInACupGoal-v0`| Similar to `ALRBallInACupSimple-v0` but the ball needs to be caught at a specified goal position | 4000 | 7 | wip

| Name | Description | Horizon | Action Dimension | Observation Dimension |
| --- | --- | --- | --- | --- |
| `fancy/Reacher-v0` | Modified (5 links) Gymnasium's mujoco `Reacher-v2` (2 links) | 200 | 5 | 21 |
| `fancy/ReacherSparse-v0` | Same as `fancy/Reacher-v0`, but the distance penalty is only provided in the last time step. | 200 | 5 | 21 |
| `fancy/ReacherSparseBalanced-v0` | Same as `fancy/ReacherSparse-v0`, but the end-effector has to remain upright. | 200 | 5 | 21 |
| `fancy/LongReacher-v0` | Modified (7 links) Gymnasium's mujoco `Reacher-v2` (2 links) | 200 | 7 | 27 |
| `fancy/LongReacherSparse-v0` | Same as `fancy/LongReacher-v0`, but the distance penalty is only provided in the last time step. | 200 | 7 | 27 |
| `fancy/LongReacherSparseBalanced-v0` | Same as `fancy/LongReacherSparse-v0`, but the end-effector has to remain upright. | 200 | 7 | 27 |
| `fancy/Reacher5d-v0` | Reacher task with 5 links, based on Gymnasium's `gym.envs.mujoco.ReacherEnv` | 200 | 5 | 20 |
| `fancy/Reacher5dSparse-v0` | Sparse Reacher task with 5 links, based on Gymnasium's `gym.envs.mujoco.ReacherEnv` | 200 | 5 | 20 |
| `fancy/Reacher7d-v0` | Reacher task with 7 links, based on Gymnasium's `gym.envs.mujoco.ReacherEnv` | 200 | 7 | 22 |
| `fancy/Reacher7dSparse-v0` | Sparse Reacher task with 7 links, based on Gymnasium's `gym.envs.mujoco.ReacherEnv` | 200 | 7 | 22 |
| `fancy/HopperJumpSparse-v0` | Hopper Jump task with sparse rewards, based on Gymnasium's `gym.envs.mujoco.Hopper` | 250 | 3 | 15 / 16\* |
| `fancy/HopperJump-v0` | Hopper Jump task with continuous rewards, based on Gymnasium's `gym.envs.mujoco.Hopper` | 250 | 3 | 15 / 16\* |
| `fancy/AntJump-v0` | Ant Jump task, based on Gymnasium's `gym.envs.mujoco.Ant` | 200 | 8 | 119 |
| `fancy/HalfCheetahJump-v0` | HalfCheetah Jump task, based on Gymnasium's `gym.envs.mujoco.HalfCheetah` | 100 | 6 | 112 |
| `fancy/HopperJumpOnBox-v0` | Hopper Jump on Box task, based on Gymnasium's `gym.envs.mujoco.Hopper` | 250 | 4 | 16 / 100\* |
| `fancy/HopperThrow-v0` | Hopper Throw task, based on Gymnasium's `gym.envs.mujoco.Hopper` | 250 | 3 | 18 / 100\* |
| `fancy/HopperThrowInBasket-v0` | Hopper Throw in Basket task, based on Gymnasium's `gym.envs.mujoco.Hopper` | 250 | 3 | 18 / 100\* |
| `fancy/Walker2DJump-v0` | Walker 2D Jump task, based on Gymnasium's `gym.envs.mujoco.Walker2d` | 300 | 6 | 18 / 19\* |
| `fancy/BeerPong-v0` | Beer Pong task, based on a custom environment with multiple task variations | 300 | 3 | 29 |
| `fancy/BeerPongStepBased-v0` | Step-based Beer Pong task, based on a custom environment with episodic rewards | 300 | 3 | 29 |
| `fancy/BeerPongFixedRelease-v0` | Beer Pong with fixed release, based on a custom environment with episodic rewards | 300 | 3 | 29 |
| `fancy/BoxPushingDense-v0` | Custom Box-pushing task with dense rewards | 100 | 3 | 13 |
| `fancy/BoxPushingTemporalSparse-v0` | Custom Box-pushing task with temporally sparse rewards | 100 | 3 | 13 |
| `fancy/BoxPushingTemporalSpatialSparse-v0` | Custom Box-pushing task with temporally and spatially sparse rewards | 100 | 3 | 13 |
| `fancy/TableTennis2D-v0` | Table Tennis task with 2D context, based on a custom environment for table tennis | 350 | 7 | 19 |
| `fancy/TableTennis2DReplan-v0` | Table Tennis task with 2D context and replanning, based on a custom environment for table tennis | 350 | 7 | 19 |
| `fancy/TableTennis4D-v0` | Table Tennis task with 4D context, based on a custom environment for table tennis | 350 | 7 | 22 |
| `fancy/TableTennis4DReplan-v0` | Table Tennis task with 4D context and replanning, based on a custom environment for table tennis | 350 | 7 | 22 |
| `fancy/TableTennisWind-v0` | Table Tennis task with wind effects, based on a custom environment for table tennis | 350 | 7 | 19 |
| `fancy/TableTennisGoalSwitching-v0` | Table Tennis task with goal switching, based on a custom environment for table tennis | 350 | 7 | 19 |
| `fancy/TableTennisWindReplan-v0` | Table Tennis task with wind effects and replanning, based on a custom environment for table tennis | 350 | 7 | 19 |
\*Observation dimensions depend on configuration.
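Once `fancy_gym` is imported, these IDs can be created through the regular Gymnasium API. A minimal sketch (it assumes the `fancy/` registration introduced by this refactor):

```python
import gymnasium as gym
import fancy_gym  # noqa: F401  # importing fancy_gym registers the fancy/... IDs

env = gym.make("fancy/Reacher5d-v0")
obs, info = env.reset(seed=1)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
```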
<!--
No longer used?
| Name | Description | Horizon | Action Dimension | Observation Dimension |
| --------------------------- | --------------------------------------------------------------------------------------------------- | ------- | ---------------- | --------------------- |
| `fancy/BallInACupSimple-v0` | Ball-in-a-cup task where a robot needs to catch a ball attached to a cup at its end-effector. | 4000 | 3 | wip |
| `fancy/BallInACup-v0` | Ball-in-a-cup task where a robot needs to catch a ball attached to a cup at its end-effector | 4000 | 7 | wip |
| `fancy/BallInACupGoal-v0` | Similar to `fancy/BallInACupSimple-v0` but the ball needs to be caught at a specified goal position | 4000 | 7 | wip |
-->


@ -1,8 +1,11 @@
from typing import Tuple, Union, Optional, Any, Dict

import numpy as np
from gymnasium.core import ObsType
from gymnasium.envs.mujoco.ant_v4 import AntEnv, DEFAULT_CAMERA_CONFIG
from gymnasium import utils
from gymnasium.envs.mujoco import MujocoEnv
from gymnasium.spaces import Box

MAX_EPISODE_STEPS_ANTJUMP = 200
@ -12,8 +15,74 @@ MAX_EPISODE_STEPS_ANTJUMP = 200
# to the same structure as the Hopper, where the angles are randomized (->contexts) and the agent should jump as high
# as possible, while landing at a specific target position
class AntEnvCustomXML(AntEnv):
def __init__(
self,
xml_file="ant.xml",
ctrl_cost_weight=0.5,
use_contact_forces=False,
contact_cost_weight=5e-4,
healthy_reward=1.0,
terminate_when_unhealthy=True,
healthy_z_range=(0.2, 1.0),
contact_force_range=(-1.0, 1.0),
reset_noise_scale=0.1,
exclude_current_positions_from_observation=True,
**kwargs,
):
utils.EzPickle.__init__(
self,
xml_file,
ctrl_cost_weight,
use_contact_forces,
contact_cost_weight,
healthy_reward,
terminate_when_unhealthy,
healthy_z_range,
contact_force_range,
reset_noise_scale,
exclude_current_positions_from_observation,
**kwargs,
)
self._ctrl_cost_weight = ctrl_cost_weight
self._contact_cost_weight = contact_cost_weight
self._healthy_reward = healthy_reward
self._terminate_when_unhealthy = terminate_when_unhealthy
self._healthy_z_range = healthy_z_range
self._contact_force_range = contact_force_range
self._reset_noise_scale = reset_noise_scale
self._use_contact_forces = use_contact_forces
self._exclude_current_positions_from_observation = (
exclude_current_positions_from_observation
)
obs_shape = 27 + 1
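# assumption: 27 is the base Ant observation size (current positions excluded); the +1 is the goal height appended in _get_obs below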
if not exclude_current_positions_from_observation:
obs_shape += 2
if use_contact_forces:
obs_shape += 84
observation_space = Box(
low=-np.inf, high=np.inf, shape=(obs_shape,), dtype=np.float64
)
MujocoEnv.__init__(
self,
xml_file,
5,
observation_space=observation_space,
default_camera_config=DEFAULT_CAMERA_CONFIG,
**kwargs,
)
class AntJumpEnv(AntEnvCustomXML):
""" """
Initialization changes to normal Ant: Initialization changes to normal Ant:
- healthy_reward: 1.0 -> 0.01 -> 0.0 no healthy reward needed - Paul and Marc - healthy_reward: 1.0 -> 0.01 -> 0.0 no healthy reward needed - Paul and Marc
@ -61,9 +130,10 @@ class AntJumpEnv(AntEnv):
        costs = ctrl_cost + contact_cost

        terminated = bool(
            height < 0.3)  # fall over -> is the 0.3 value from healthy_z_range? TODO change 0.3 to the value of healthy z angle

        if self.current_step == MAX_EPISODE_STEPS_ANTJUMP or terminated:
            # -10 for scaling the value of the distance between the max_height and the goal height; only used when context is enabled
            # height_reward = -10 * (np.linalg.norm(self.max_height - self.goal))
            height_reward = -10 * np.linalg.norm(self.max_height - self.goal)
@ -80,19 +150,21 @@ class AntJumpEnv(AntEnv):
            'max_height': self.max_height,
            'goal': self.goal
        }
        truncated = False

        return obs, reward, terminated, truncated, info

    def _get_obs(self):
        return np.append(super()._get_obs(), self.goal)

    def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
            -> Tuple[ObsType, Dict[str, Any]]:
        self.current_step = 0
        self.max_height = 0
        # goal heights from 1.0 to 2.5; can be increased, but didnt work well with CMORE
        ret = super().reset(seed=seed, options=options)
        self.goal = self.np_random.uniform(1.0, 2.5, 1)
        return ret

    # reset_model had to be implemented in every env to make it deterministic
    def reset_model(self):


@ -1,9 +1,13 @@
import os
from typing import Optional, Any, Dict, Tuple

import numpy as np
from gymnasium import utils
from gymnasium.core import ObsType
from gymnasium.envs.mujoco import MujocoEnv
from gymnasium.spaces import Box
import mujoco

MAX_EPISODE_STEPS_BEERPONG = 300
FIXED_RELEASE_STEP = 62  # empirically evaluated for frame_skip=2!
@ -30,7 +34,16 @@ CUP_COLLISION_OBJ = ["cup_geom_table3", "cup_geom_table4", "cup_geom_table5", "c
class BeerPongEnv(MujocoEnv, utils.EzPickle):
metadata = {
"render_modes": [
"human",
"rgb_array",
"depth_array",
],
"render_fps": 100
}
def __init__(self, **kwargs):
self._steps = 0
# Small Context -> Easier. Todo: Should we do different versions?
# self.xml_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "assets", "beerpong_wo_cup.xml")
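With the Gymnasium `MujocoEnv`, the render mode is now fixed at construction time and must be one of `metadata["render_modes"]` above, instead of being passed to `render()`. A minimal usage sketch (env ID taken from the README table; illustration only):

```python
import gymnasium as gym
import fancy_gym  # noqa: F401

env = gym.make("fancy/BeerPong-v0", render_mode="rgb_array")
env.reset(seed=0)
env.step(env.action_space.sample())
frame = env.render()  # RGB array; with render_mode="human" a viewer window opens instead
```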
@ -50,9 +63,9 @@ class BeerPongEnv(MujocoEnv, utils.EzPickle):
self.repeat_action = 2
# TODO: If accessing IDs is easier in the (new) official mujoco bindings, remove this
self.model = None
self.geom_id = lambda x: mujoco.mj_name2id(self.model,
                                            mujoco.mjtObj.mjOBJ_GEOM,
                                            x)
# for reward calculation
self.dists = []
@ -65,7 +78,17 @@ class BeerPongEnv(MujocoEnv, utils.EzPickle):
self.ball_in_cup = False
self.dist_ground_cup = -1  # distance floor to cup if first floor contact

self.observation_space = Box(
    low=-np.inf, high=np.inf, shape=(29,), dtype=np.float64
)

MujocoEnv.__init__(
    self,
    self.xml_path,
    frame_skip=1,
    observation_space=self.observation_space,
    **kwargs
)
utils.EzPickle.__init__(self)
@property
@ -76,7 +99,8 @@ class BeerPongEnv(MujocoEnv, utils.EzPickle):
def start_vel(self):
    return self._start_vel

def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
        -> Tuple[ObsType, Dict[str, Any]]:
    self.dists = []
    self.dists_final = []
    self.action_costs = []
@ -86,7 +110,7 @@ class BeerPongEnv(MujocoEnv, utils.EzPickle):
self.ball_cup_contact = False
self.ball_in_cup = False
self.dist_ground_cup = -1  # distance floor to cup if first floor contact
return super().reset(seed=seed, options=options)

def reset_model(self):
    init_pos_all = self.init_qpos.copy()
@ -128,11 +152,11 @@ class BeerPongEnv(MujocoEnv, utils.EzPickle):
if not crash:
    reward, reward_infos = self._get_reward(applied_action)
    is_collided = reward_infos['is_collided']  # TODO: Remove if self collision does not make a difference
    terminated = is_collided
    self._steps += 1
else:
    reward = -30
    terminated = True
    reward_infos = {"success": False, "ball_pos": np.zeros(3), "ball_vel": np.zeros(3), "is_collided": False}

infos = dict(
@ -142,7 +166,10 @@ class BeerPongEnv(MujocoEnv, utils.EzPickle):
    q_vel=self.data.qvel[0:7].ravel().copy(), sim_crash=crash,
)
infos.update(reward_infos)

truncated = False

return ob, reward, terminated, truncated, infos

def _get_obs(self):
    theta = self.data.qpos.flat[:7].copy()
@ -197,13 +224,13 @@ class BeerPongEnv(MujocoEnv, utils.EzPickle):
    min_dist_coeff, final_dist_coeff, ground_contact_dist_coeff, rew_offset = 0, 1, 0, 0
    action_cost = 1e-4 * np.mean(action_cost)
    reward = rew_offset - min_dist_coeff * min_dist ** 2 - final_dist_coeff * final_dist ** 2 - \
        action_cost - ground_contact_dist_coeff * self.dist_ground_cup ** 2
    # release step punishment
    min_time_bound = 0.1
    max_time_bound = 1.0
    release_time = self.release_step * self.dt
    release_time_rew = int(release_time < min_time_bound) * (-30 - 10 * (release_time - min_time_bound) ** 2) + \
        int(release_time > max_time_bound) * (-30 - 10 * (release_time - max_time_bound) ** 2)
    reward += release_time_rew
    success = self.ball_in_cup
else:
@ -258,9 +285,9 @@ class BeerPongEnvStepBasedEpisodicReward(BeerPongEnv):
    return super(BeerPongEnvStepBasedEpisodicReward, self).step(a)
else:
    reward = 0
    terminated, truncated = True, False
    while self._steps < MAX_EPISODE_STEPS_BEERPONG:
        obs, sub_reward, terminated, truncated, infos = super(BeerPongEnvStepBasedEpisodicReward, self).step(
            np.zeros(a.shape))
        reward += sub_reward
    return obs, reward, terminated, truncated, infos


@ -1,9 +1,8 @@
import os

import numpy as np
from gymnasium import utils
from gymnasium.envs.mujoco import MujocoEnv

from fancy_gym.envs.mujoco.beerpong.deprecated.beerpong_reward_staged import BeerPongReward
@ -74,27 +73,24 @@ class BeerPongEnv(MujocoEnv, utils.EzPickle):
crash = False
for _ in range(self.repeat_action):
    applied_action = a + self.sim.data.qfrc_bias[:len(a)].copy() / self.model.actuator_gear[:, 0]
    self.do_simulation(applied_action, self.frame_skip)
    self.reward_function.initialize(self)
    # self.reward_function.check_contacts(self.sim) # I assume this is not important?
    if self._steps < self.release_step:
        self.sim.data.qpos[7::] = self.sim.data.site_xpos[self.site_id("init_ball_pos"), :].copy()
        self.sim.data.qvel[7::] = self.sim.data.site_xvelp[self.site_id("init_ball_pos"), :].copy()
    crash = False
ob = self._get_obs()

if not crash:
    reward, reward_infos = self.reward_function.compute_reward(self, applied_action)
    is_collided = reward_infos['is_collided']
    terminated = is_collided or self._steps == self.ep_length - 1
    self._steps += 1
else:
    reward = -30
    terminated = True
    reward_infos = {"success": False, "ball_pos": np.zeros(3), "ball_vel": np.zeros(3), "is_collided": False}

infos = dict(
@ -104,7 +100,7 @@ class BeerPongEnv(MujocoEnv, utils.EzPickle):
    q_vel=self.sim.data.qvel[0:7].ravel().copy(), sim_crash=crash,
)
infos.update(reward_infos)
return ob, reward, terminated, infos

def _get_obs(self):
    theta = self.sim.data.qpos.flat[:7]
@ -143,16 +139,16 @@ class BeerPongEnvStepBasedEpisodicReward(BeerPongEnv):
    return super(BeerPongEnvStepBasedEpisodicReward, self).step(a)
else:
    reward = 0
    terminated, truncated = False, False
    while not (terminated or truncated):
        sub_ob, sub_reward, terminated, truncated, sub_infos = super(BeerPongEnvStepBasedEpisodicReward,
                                                                     self).step(np.zeros(a.shape))
        reward += sub_reward
        infos = sub_infos
        ob = sub_ob
    ob[-1] = self.release_step + 1  # Since we simulate until the end of the episode, PPO does not see the
    # internal steps and thus, the observation also needs to be set correctly
    return ob, reward, terminated, truncated, infos
# class BeerBongEnvStepBased(BeerBongEnv):
@ -186,27 +182,3 @@ class BeerPongEnvStepBasedEpisodicReward(BeerPongEnv):
# ob[-1] = self.release_step + 1 # Since we simulate until the end of the episode, PPO does not see the
# # internal steps and thus, the observation also needs to be set correctly
# return ob, reward, done, infos
if __name__ == "__main__":
env = BeerPongEnv(frame_skip=2)
env.seed(0)
# env = BeerBongEnvStepBased(frame_skip=2)
# env = BeerBongEnvStepBasedEpisodicReward(frame_skip=2)
# env = BeerBongEnvFixedReleaseStep(frame_skip=2)
import time
env.reset()
env.render("human")
for i in range(600):
# ac = 10 * env.action_space.sample()
ac = 0.05 * np.ones(7)
obs, rew, d, info = env.step(ac)
env.render("human")
if d:
print('reward:', rew)
print('RESETTING')
env.reset()
time.sleep(1)
env.close()


@ -6,6 +6,23 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper):
mp_config = {
'ProMP': {
'phase_generator_kwargs': {
'learn_tau': True
},
'controller_kwargs': {
'p_gains': np.array([1.5, 5, 2.55, 3, 2., 2, 1.25]),
'd_gains': np.array([0.02333333, 0.1, 0.0625, 0.08, 0.03, 0.03, 0.0125]),
},
'basis_generator_kwargs': {
'num_basis': 2,
'num_basis_zero_start': 2,
},
},
'DMP': {},
'ProDMP': {},
}
@property
def context_mask(self) -> np.ndarray:
@ -39,3 +56,23 @@ class MPWrapper(RawInterfaceWrapper):
xyz[-1] = 0.840
self.model.body_pos[self.cup_table_id] = xyz
return self.get_observation_from_step(self.get_obs())
class MPWrapper_FixedRelease(MPWrapper):
mp_config = {
'ProMP': {
'phase_generator_kwargs': {
'tau': 0.62,
},
'controller_kwargs': {
'p_gains': np.array([1.5, 5, 2.55, 3, 2., 2, 1.25]),
'd_gains': np.array([0.02333333, 0.1, 0.0625, 0.08, 0.03, 0.03, 0.0125]),
},
'basis_generator_kwargs': {
'num_basis': 2,
'num_basis_zero_start': 2,
},
},
'DMP': {},
'ProDMP': {},
}
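The `mp_config` dictionaries above define how the step-based environment is wrapped into a movement-primitive environment (ProMP here, with PD controller gains and a two-basis trajectory generator). A minimal usage sketch, assuming the MP variants are exposed under a separate `fancy_ProMP/` namespace as introduced by this refactor:

```python
import gymnasium as gym
import fancy_gym  # noqa: F401

# assumption: the ProMP variant of BeerPong is registered as fancy_ProMP/BeerPong-v0
env = gym.make("fancy_ProMP/BeerPong-v0")
obs, info = env.reset(seed=3)
# a single step executes one full ProMP trajectory in the underlying step-based env
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
```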


@ -1 +1 @@
from .mp_wrapper import MPWrapper, ReplanMPWrapper


@ -1,8 +1,8 @@
import os

import numpy as np
from gymnasium import utils, spaces
from gymnasium.envs.mujoco import MujocoEnv
from fancy_gym.envs.mujoco.box_pushing.box_pushing_utils import rot_to_quat, get_quaternion_error, rotation_distance
from fancy_gym.envs.mujoco.box_pushing.box_pushing_utils import q_max, q_min, q_dot_max, q_torque_max
from fancy_gym.envs.mujoco.box_pushing.box_pushing_utils import desired_rod_quat
@ -13,6 +13,7 @@ MAX_EPISODE_STEPS_BOX_PUSHING = 100
BOX_POS_BOUND = np.array([[0.3, -0.45, -0.01], [0.6, 0.45, -0.01]])


class BoxPushingEnvBase(MujocoEnv, utils.EzPickle):
    """
    franka box pushing environment
@ -26,6 +27,15 @@ class BoxPushingEnvBase(MujocoEnv, utils.EzPickle):
    3. time-spatial-depend sparse reward
    """
metadata = {
"render_modes": [
"human",
"rgb_array",
"depth_array",
],
"render_fps": 50
}
def __init__(self, frame_skip: int = 10, random_init: bool = False):
    utils.EzPickle.__init__(**locals())
    self._steps = 0
@ -39,11 +49,16 @@ class BoxPushingEnvBase(MujocoEnv, utils.EzPickle):
self._desired_rod_quat = desired_rod_quat

self._episode_energy = 0.

self.observation_space = spaces.Box(
    low=-np.inf, high=np.inf, shape=(28,), dtype=np.float64
)

self.random_init = random_init
MujocoEnv.__init__(self,
                   model_path=os.path.join(os.path.dirname(__file__), "assets", "box_pushing.xml"),
                   frame_skip=self.frame_skip,
                   observation_space=self.observation_space)
self.action_space = spaces.Box(low=-1, high=1, shape=(7,))
def step(self, action):
@ -89,7 +104,11 @@ class BoxPushingEnvBase(MujocoEnv, utils.EzPickle):
    'is_success': True if episode_end and box_goal_pos_dist < 0.05 and box_goal_quat_dist < 0.5 else False,
    'num_steps': self._steps
}
terminated = episode_end and infos['is_success']
truncated = episode_end and not infos['is_success']
return obs, reward, terminated, truncated, infos
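Splitting the old `done` flag into `terminated` (goal reached) and `truncated` (episode ended without success) follows the Gymnasium step API; a generic rollout loop then treats both as episode boundaries (illustration only, not part of this diff):

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)
done = False
while not done:
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    done = terminated or truncated  # an episode ends on either condition
```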
def reset_model(self):
    # rest box to initial position
@ -250,7 +269,7 @@ class BoxPushingEnvBase(MujocoEnv, utils.EzPickle):
old_err_norm = err_norm

# get Jacobian by mujoco
self.data.qpos[:7] = q
mujoco.mj_forward(self.model, self.data)
@ -284,6 +303,7 @@ class BoxPushingEnvBase(MujocoEnv, utils.EzPickle):
return q


class BoxPushingDense(BoxPushingEnvBase):
    def __init__(self, frame_skip: int = 10, random_init: bool = False):
        super(BoxPushingDense, self).__init__(frame_skip=frame_skip, random_init=random_init)
@ -299,7 +319,7 @@ class BoxPushingDense(BoxPushingEnvBase):
energy_cost = -0.0005 * np.sum(np.square(action))
reward = joint_penalty + tcp_box_dist_reward + \
    box_goal_pos_dist_reward + box_goal_rot_dist_reward + energy_cost

rod_inclined_angle = rotation_distance(rod_quat, self._desired_rod_quat)
if rod_inclined_angle > np.pi / 4:
@ -307,6 +327,7 @@ class BoxPushingDense(BoxPushingEnvBase):
return reward


class BoxPushingTemporalSparse(BoxPushingEnvBase):
    def __init__(self, frame_skip: int = 10, random_init: bool = False):
        super(BoxPushingTemporalSparse, self).__init__(frame_skip=frame_skip, random_init=random_init)
@ -368,6 +389,7 @@ class BoxPushingTemporalSpatialSparse(BoxPushingEnvBase):
return reward


class BoxPushingTemporalSpatialSparse2(BoxPushingEnvBase):
    def __init__(self, frame_skip: int = 10, random_init: bool = False):


@ -6,6 +6,27 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper):
mp_config = {
'ProMP': {
'controller_kwargs': {
'p_gains': 0.01 * np.array([120., 120., 120., 120., 50., 30., 10.]),
'd_gains': 0.01 * np.array([10., 10., 10., 10., 6., 5., 3.]),
},
'basis_generator_kwargs': {
'basis_bandwidth_factor': 2 # 3.5, 4 to try
}
},
'DMP': {},
'ProDMP': {
'controller_kwargs': {
'p_gains': 0.01 * np.array([120., 120., 120., 120., 50., 30., 10.]),
'd_gains': 0.01 * np.array([10., 10., 10., 10., 6., 5., 3.]),
},
'basis_generator_kwargs': {
'basis_bandwidth_factor': 2 # 3.5, 4 to try
}
},
}
# Random x goal + random init pos
@property
@ -38,3 +59,35 @@ class MPWrapper(RawInterfaceWrapper):
@property
def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
    return self.data.qvel[:7].copy()
class ReplanMPWrapper(MPWrapper):
mp_config = {
'ProMP': {},
'DMP': {},
'ProDMP': {
'controller_kwargs': {
'p_gains': 0.01 * np.array([120., 120., 120., 120., 50., 30., 10.]),
'd_gains': 0.01 * np.array([10., 10., 10., 10., 6., 5., 3.]),
},
'trajectory_generator_kwargs': {
'weights_scale': 0.3,
'goal_scale': 0.3,
'auto_scale_basis': True,
'goal_offset': 1.0,
'disable_goal': True,
},
'basis_generator_kwargs': {
'num_basis': 5,
'basis_bandwidth_factor': 3,
},
'phase_generator_kwargs': {
'alpha_phase': 3,
},
'black_box_kwargs': {
'max_planning_times': 4,
'replanning_schedule': lambda pos, vel, obs, action, t: t % 25 == 0,
'condition_on_desired': True,
}
}
}
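The `replanning_schedule` above requests a new ProDMP segment every 25 environment steps, bounded by `max_planning_times`. A small standalone sketch of how that predicate behaves over the 100-step BoxPushing horizon (hypothetical check; the black-box wrapper calls it internally with the real state):

```python
# the schedule receives current position, velocity, observation, action and the step index t
def replanning_schedule(pos, vel, obs, action, t):
    return t % 25 == 0

# over a 100-step episode this fires at steps 0, 25, 50 and 75,
# i.e. at most four segments, matching max_planning_times=4
replan_steps = [t for t in range(100) if replanning_schedule(None, None, None, None, t)]
print(replan_steps)  # [0, 25, 50, 75]
```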


@ -1,14 +1,68 @@
import os
from typing import Tuple, Union, Optional, Any, Dict

import numpy as np
from gymnasium.core import ObsType
from gymnasium.envs.mujoco.half_cheetah_v4 import HalfCheetahEnv, DEFAULT_CAMERA_CONFIG
from gymnasium import utils
from gymnasium.envs.mujoco import MujocoEnv
from gymnasium.spaces import Box

MAX_EPISODE_STEPS_HALFCHEETAHJUMP = 100
class HalfCheetahEnvCustomXML(HalfCheetahEnv):
def __init__(
self,
xml_file,
forward_reward_weight=1.0,
ctrl_cost_weight=0.1,
reset_noise_scale=0.1,
exclude_current_positions_from_observation=True,
**kwargs,
):
utils.EzPickle.__init__(
self,
xml_file,
forward_reward_weight,
ctrl_cost_weight,
reset_noise_scale,
exclude_current_positions_from_observation,
**kwargs,
)
self._forward_reward_weight = forward_reward_weight
self._ctrl_cost_weight = ctrl_cost_weight
self._reset_noise_scale = reset_noise_scale
self._exclude_current_positions_from_observation = (
exclude_current_positions_from_observation
)
if exclude_current_positions_from_observation:
observation_space = Box(
low=-np.inf, high=np.inf, shape=(18,), dtype=np.float64
)
else:
observation_space = Box(
low=-np.inf, high=np.inf, shape=(19,), dtype=np.float64
)
MujocoEnv.__init__(
self,
xml_file,
5,
observation_space=observation_space,
default_camera_config=DEFAULT_CAMERA_CONFIG,
**kwargs,
)
class HalfCheetahJumpEnv(HalfCheetahEnvCustomXML):
""" """
_ctrl_cost_weight 0.1 -> 0.0 _ctrl_cost_weight 0.1 -> 0.0
""" """
@ -41,10 +95,11 @@ class HalfCheetahJumpEnv(HalfCheetahEnv):
height_after = self.get_body_com("torso")[2]
self.max_height = max(height_after, self.max_height)

# Didnt use fell_over, because base env also has no done condition - Paul and Marc
# fell_over = abs(self.sim.data.qpos[2]) > 2.5 # how to figure out if the cheetah fell over? -> 2.5 oke?
# TODO: Should a fall over be checked here?
terminated = False
truncated = False

ctrl_cost = self.control_cost(action)
costs = ctrl_cost
@ -63,17 +118,18 @@ class HalfCheetahJumpEnv(HalfCheetahEnv):
    'max_height': self.max_height
}

return observation, reward, terminated, truncated, info

def _get_obs(self):
    return np.append(super()._get_obs(), self.goal)

def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
        -> Tuple[ObsType, Dict[str, Any]]:
    self.max_height = 0
    self.current_step = 0
    ret = super().reset(seed=seed, options=options)
    self.goal = self.np_random.uniform(1.1, 1.6, 1)  # 1.1 1.6
    return ret

# overwrite reset_model to make it deterministic
def reset_model(self):


@ -6,6 +6,12 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper):
mp_config = {
'ProMP': {},
'DMP': {},
'ProDMP': {},
}
@property
def context_mask(self) -> np.ndarray:
    return np.hstack([


@ -0,0 +1,52 @@
<mujoco model="hopper">
<compiler angle="degree" coordinate="global" inertiafromgeom="true"/>
<default>
<joint armature="1" damping="1" limited="true"/>
<geom conaffinity="1" condim="1" contype="1" margin="0.001" material="geom" rgba="0.8 0.6 .4 1" solimp=".8 .8 .01" solref=".02 1"/>
<motor ctrllimited="true" ctrlrange="-.4 .4"/>
</default>
<option integrator="RK4" timestep="0.002"/>
<visual>
<map znear="0.02"/>
</visual>
<worldbody>
<light cutoff="100" diffuse="1 1 1" dir="-0 0 -1.3" directional="true" exponent="1" pos="0 0 1.3" specular=".1 .1 .1"/>
<geom conaffinity="1" condim="3" name="floor" pos="0 0 0" rgba="0.8 0.9 0.8 1" size="20 20 .125" type="plane" material="MatPlane"/>
<body name="torso" pos="0 0 1.25">
<camera name="track" mode="trackcom" pos="0 -3 1" xyaxes="1 0 0 0 0 1"/>
<joint armature="0" axis="1 0 0" damping="0" limited="false" name="rootx" pos="0 0 0" stiffness="0" type="slide"/>
<joint armature="0" axis="0 0 1" damping="0" limited="false" name="rootz" pos="0 0 0" ref="1.25" stiffness="0" type="slide"/>
<joint armature="0" axis="0 1 0" damping="0" limited="false" name="rooty" pos="0 0 1.25" stiffness="0" type="hinge"/>
<geom friction="0.9" fromto="0 0 1.45 0 0 1.05" name="torso_geom" size="0.05" type="capsule"/>
<body name="thigh" pos="0 0 1.05">
<joint axis="0 -1 0" name="thigh_joint" pos="0 0 1.05" range="-150 0" type="hinge"/>
<geom friction="0.9" fromto="0 0 1.05 0 0 0.6" name="thigh_geom" size="0.05" type="capsule"/>
<body name="leg" pos="0 0 0.35">
<joint axis="0 -1 0" name="leg_joint" pos="0 0 0.6" range="-150 0" type="hinge"/>
<geom friction="0.9" fromto="0 0 0.6 0 0 0.1" name="leg_geom" size="0.04" type="capsule"/>
<body name="foot" pos="0.13/2 0 0.1">
<site name="foot_site" pos="0 0 0.04" size="0.02 0.02 0.02" rgba="1 0 0 1" type="sphere"/>
<joint axis="0 -1 0" name="foot_joint" pos="0 0 0.1" range="-45 45" type="hinge"/>
<geom friction="2.0" fromto="-0.13 0 0.1 0.26 0 0.1" name="foot_geom" size="0.06" type="capsule"/>
</body>
</body>
</body>
</body>
<body name="goal_site_body" pos = "0 0 0">
<site name="goal_site" pos="0 0 0.0" size="0.02 0.02 0.02" rgba="0 1 0 1" type="sphere"/>
</body>
</worldbody>
<actuator>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="thigh_joint"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="leg_joint"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="foot_joint"/>
</actuator>
<asset>
<texture type="skybox" builtin="gradient" rgb1=".4 .5 .6" rgb2="0 0 0"
width="100" height="100"/>
<texture builtin="flat" height="1278" mark="cross" markrgb="1 1 1" name="texgeom" random="0.01" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" type="cube" width="127"/>
<texture builtin="checker" height="100" name="texplane" rgb1="0 0 0" rgb2="0.8 0.8 0.8" type="2d" width="100"/>
<material name="MatPlane" reflectance="0.5" shininess="1" specular="1" texrepeat="60 60" texture="texplane"/>
<material name="geom" texture="texgeom" texuniform="true"/>
</asset>
</mujoco>


@ -1,52 +1,51 @@
<mujoco model="hopper"> <mujoco model="hopper">
<compiler angle="degree" coordinate="global" inertiafromgeom="true"/> <compiler angle="radian" autolimits="true"/>
<default> <option integrator="RK4"/>
<joint armature="1" damping="1" limited="true"/>
<geom conaffinity="1" condim="1" contype="1" margin="0.001" material="geom" rgba="0.8 0.6 .4 1" solimp=".8 .8 .01" solref=".02 1"/>
<motor ctrllimited="true" ctrlrange="-.4 .4"/>
</default>
<option integrator="RK4" timestep="0.002"/>
<visual> <visual>
<map znear="0.02"/> <map znear="0.02"/>
</visual> </visual>
<default class="main">
<joint limited="true" armature="1" damping="1"/>
<geom condim="1" solimp="0.8 0.8 0.01 0.5 2" margin="0.001" material="geom" rgba="0.8 0.6 0.4 1"/>
<general ctrllimited="true" ctrlrange="-0.4 0.4"/>
</default>
<asset>
<texture type="skybox" builtin="gradient" rgb1="0.4 0.5 0.6" rgb2="0 0 0" width="100" height="600"/>
<texture type="cube" name="texgeom" builtin="flat" mark="cross" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" markrgb="1 1 1" width="127" height="762"/>
<texture type="2d" name="texplane" builtin="checker" rgb1="0 0 0" rgb2="0.8 0.8 0.8" width="100" height="100"/>
<material name="MatPlane" texture="texplane" texrepeat="60 60" specular="1" shininess="1" reflectance="0.5"/>
<material name="geom" texture="texgeom" texuniform="true"/>
</asset>
<worldbody> <worldbody>
<light cutoff="100" diffuse="1 1 1" dir="-0 0 -1.3" directional="true" exponent="1" pos="0 0 1.3" specular=".1 .1 .1"/> <geom name="floor" size="20 20 0.125" type="plane" condim="3" material="MatPlane" rgba="0.8 0.9 0.8 1"/>
<geom conaffinity="1" condim="3" name="floor" pos="0 0 0" rgba="0.8 0.9 0.8 1" size="20 20 .125" type="plane" material="MatPlane"/> <light pos="0 0 1.3" dir="0 0 -1" directional="true" cutoff="100" exponent="1" diffuse="1 1 1" specular="0.1 0.1 0.1"/>
<body name="torso" pos="0 0 1.25"> <body name="torso" pos="0 0 1.25" gravcomp="0">
<camera name="track" mode="trackcom" pos="0 -3 1" xyaxes="1 0 0 0 0 1"/> <joint name="rootx" pos="0 0 -1.25" axis="1 0 0" limited="false" type="slide" armature="0" damping="0"/>
<joint armature="0" axis="1 0 0" damping="0" limited="false" name="rootx" pos="0 0 0" stiffness="0" type="slide"/> <joint name="rootz" pos="0 0 -1.25" axis="0 0 1" limited="false" type="slide" ref="1.25" armature="0" damping="0"/>
<joint armature="0" axis="0 0 1" damping="0" limited="false" name="rootz" pos="0 0 0" ref="1.25" stiffness="0" type="slide"/> <joint name="rooty" pos="0 0 0" axis="0 1 0" limited="false" armature="0" damping="0"/>
<joint armature="0" axis="0 1 0" damping="0" limited="false" name="rooty" pos="0 0 1.25" stiffness="0" type="hinge"/> <geom name="torso_geom" size="0.05 0.2" type="capsule" friction="0.9 0.005 0.0001"/>
<geom friction="0.9" fromto="0 0 1.45 0 0 1.05" name="torso_geom" size="0.05" type="capsule"/> <camera name="track" pos="0 -3 -0.25" quat="0.707107 0.707107 0 0" mode="trackcom"/>
<body name="thigh" pos="0 0 1.05"> <body name="thigh" pos="0 0 -0.2" gravcomp="0">
<joint axis="0 -1 0" name="thigh_joint" pos="0 0 1.05" range="-150 0" type="hinge"/> <joint name="thigh_joint" pos="0 0 0" axis="0 -1 0" range="-2.61799 0"/>
<geom friction="0.9" fromto="0 0 1.05 0 0 0.6" name="thigh_geom" size="0.05" type="capsule"/> <geom name="thigh_geom" size="0.05 0.225" pos="0 0 -0.225" type="capsule" friction="0.9 0.005 0.0001"/>
<body name="leg" pos="0 0 0.35"> <body name="leg" pos="0 0 -0.7" gravcomp="0">
<joint axis="0 -1 0" name="leg_joint" pos="0 0 0.6" range="-150 0" type="hinge"/> <joint name="leg_joint" pos="0 0 0.25" axis="0 -1 0" range="-2.61799 0"/>
<geom friction="0.9" fromto="0 0 0.6 0 0 0.1" name="leg_geom" size="0.04" type="capsule"/> <geom name="leg_geom" size="0.04 0.25" type="capsule" friction="0.9 0.005 0.0001"/>
<body name="foot" pos="0.13/2 0 0.1"> <body name="foot" pos="0.065 0 -0.25" gravcomp="0">
<site name="foot_site" pos="0 0 0.04" size="0.02 0.02 0.02" rgba="1 0 0 1" type="sphere"/> <joint name="foot_joint" pos="-0.065 0 0" axis="0 -1 0" range="-0.785398 0.785398"/>
<joint axis="0 -1 0" name="foot_joint" pos="0 0 0.1" range="-45 45" type="hinge"/> <geom name="foot_geom" size="0.06 0.195" quat="0.707107 0 -0.707107 0" type="capsule" friction="2 0.005 0.0001"/>
<geom friction="2.0" fromto="-0.13 0 0.1 0.26 0 0.1" name="foot_geom" size="0.06" type="capsule"/> <site name="foot_site" pos="-0.065 0 -0.06" size="0.02" rgba="1 0 0 1"/>
</body> </body>
</body> </body>
</body> </body>
</body> </body>
<body name="goal_site_body" pos = "0 0 0"> <body name="goal_site_body" pos="0 0 0" gravcomp="0">
<site name="goal_site" pos="0 0 0.0" size="0.02 0.02 0.02" rgba="0 1 0 1" type="sphere"/> <site name="goal_site" pos="0 0 0" size="0.02" rgba="0 1 0 1"/>
</body> </body>
</worldbody> </worldbody>
<actuator> <actuator>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="thigh_joint"/> <general joint="thigh_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="leg_joint"/> <general joint="leg_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="foot_joint"/> <general joint="foot_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
</actuator> </actuator>
<asset>
<texture type="skybox" builtin="gradient" rgb1=".4 .5 .6" rgb2="0 0 0"
width="100" height="100"/>
<texture builtin="flat" height="1278" mark="cross" markrgb="1 1 1" name="texgeom" random="0.01" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" type="cube" width="127"/>
<texture builtin="checker" height="100" name="texplane" rgb1="0 0 0" rgb2="0.8 0.8 0.8" type="2d" width="100"/>
<material name="MatPlane" reflectance="0.5" shininess="1" specular="1" texrepeat="60 60" texture="texplane"/>
<material name="geom" texture="texgeom" texuniform="true"/>
</asset>
</mujoco> </mujoco>


@ -1,51 +1,50 @@
<mujoco model="hopper"> <mujoco model="hopper">
<compiler angle="degree" coordinate="global" inertiafromgeom="true"/> <compiler angle="radian" autolimits="true"/>
<default> <option integrator="RK4"/>
<joint armature="1" damping="1" limited="true"/>
<geom conaffinity="1" condim="1" contype="1" margin="0.001" material="geom" rgba="0.8 0.6 .4 1" solimp=".8 .8 .01" solref=".02 1"/>
<motor ctrllimited="true" ctrlrange="-.4 .4"/>
</default>
<option integrator="RK4" timestep="0.002"/>
<visual> <visual>
<map znear="0.02"/> <map znear="0.02"/>
</visual> </visual>
<default class="main">
<joint limited="true" armature="1" damping="1"/>
<geom condim="1" solimp="0.8 0.8 0.01 0.5 2" margin="0.001" material="geom" rgba="0.8 0.6 0.4 1"/>
<general ctrllimited="true" ctrlrange="-0.4 0.4"/>
</default>
<asset>
<texture type="skybox" builtin="gradient" rgb1="0.4 0.5 0.6" rgb2="0 0 0" width="100" height="600"/>
<texture type="cube" name="texgeom" builtin="flat" mark="cross" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" markrgb="1 1 1" width="127" height="762"/>
<texture type="2d" name="texplane" builtin="checker" rgb1="0 0 0" rgb2="0.8 0.8 0.8" width="100" height="100"/>
<material name="MatPlane" texture="texplane" texrepeat="60 60" specular="1" shininess="1" reflectance="0.5"/>
<material name="geom" texture="texgeom" texuniform="true"/>
</asset>
<worldbody> <worldbody>
<light cutoff="100" diffuse="1 1 1" dir="-0 0 -1.3" directional="true" exponent="1" pos="0 0 1.3" specular=".1 .1 .1"/> <geom name="floor" size="20 20 0.125" type="plane" condim="3" material="MatPlane" rgba="0.8 0.9 0.8 1"/>
<geom conaffinity="1" condim="3" name="floor" pos="0 0 0" rgba="0.8 0.9 0.8 1" size="20 20 .125" type="plane" material="MatPlane"/> <light pos="0 0 1.3" dir="0 0 -1" directional="true" cutoff="100" exponent="1" diffuse="1 1 1" specular="0.1 0.1 0.1"/>
<body name="torso" pos="0 0 1.25"> <body name="torso" pos="0 0 1.25" gravcomp="0">
<camera name="track" mode="trackcom" pos="0 -3 1" xyaxes="1 0 0 0 0 1"/> <joint name="rootx" pos="0 0 -1.25" axis="1 0 0" limited="false" type="slide" armature="0" damping="0"/>
<joint armature="0" axis="1 0 0" damping="0" limited="false" name="rootx" pos="0 0 0" stiffness="0" type="slide"/> <joint name="rootz" pos="0 0 -1.25" axis="0 0 1" limited="false" type="slide" ref="1.25" armature="0" damping="0"/>
<joint armature="0" axis="0 0 1" damping="0" limited="false" name="rootz" pos="0 0 0" ref="1.25" stiffness="0" type="slide"/> <joint name="rooty" pos="0 0 0" axis="0 1 0" limited="false" armature="0" damping="0"/>
<joint armature="0" axis="0 1 0" damping="0" limited="false" name="rooty" pos="0 0 1.25" stiffness="0" type="hinge"/> <geom name="torso_geom" size="0.05 0.2" type="capsule" friction="0.9 0.005 0.0001"/>
<geom friction="0.9" fromto="0 0 1.45 0 0 1.05" name="torso_geom" size="0.05" type="capsule"/> <camera name="track" pos="0 -3 -0.25" quat="0.707107 0.707107 0 0" mode="trackcom"/>
<body name="thigh" pos="0 0 1.05"> <body name="thigh" pos="0 0 -0.2" gravcomp="0">
<joint axis="0 -1 0" name="thigh_joint" pos="0 0 1.05" range="-150 0" type="hinge"/> <joint name="thigh_joint" pos="0 0 0" axis="0 -1 0" range="-2.61799 0"/>
<geom friction="0.9" fromto="0 0 1.05 0 0 0.6" name="thigh_geom" size="0.05" type="capsule"/> <geom name="thigh_geom" size="0.05 0.225" pos="0 0 -0.225" type="capsule" friction="0.9 0.005 0.0001"/>
<body name="leg" pos="0 0 0.35"> <body name="leg" pos="0 0 -0.7" gravcomp="0">
<joint axis="0 -1 0" name="leg_joint" pos="0 0 0.6" range="-150 0" type="hinge"/> <joint name="leg_joint" pos="0 0 0.25" axis="0 -1 0" range="-2.61799 0"/>
<geom friction="0.9" fromto="0 0 0.6 0 0 0.1" name="leg_geom" size="0.04" type="capsule"/> <geom name="leg_geom" size="0.04 0.25" type="capsule" friction="0.9 0.005 0.0001"/>
<body name="foot" pos="0.13/2 0 0.1"> <body name="foot" pos="0.065 0 -0.25" gravcomp="0">
<joint axis="0 -1 0" name="foot_joint" pos="0 0 0.1" range="-45 45" type="hinge"/> <joint name="foot_joint" pos="-0.065 0 0" axis="0 -1 0" range="-0.785398 0.785398"/>
<geom friction="2.0" fromto="-0.13 0 0.1 0.26 0 0.1" name="foot_geom" size="0.06" type="capsule"/> <geom name="foot_geom" size="0.06 0.195" quat="0.707107 0 -0.707107 0" type="capsule" friction="2 0.005 0.0001"/>
</body> </body>
</body> </body>
</body> </body>
</body> </body>
<body name="box" pos="1 0 0"> <body name="box" pos="1 0 0" gravcomp="0">
<geom friction="1.0" fromto="0.48 0 0 1 0 0" name="basket_ground_geom" size="0.3" type="box" rgba="1 0 0 1"/> <geom name="basket_ground_geom" size="0.3 0.3 0.26" pos="-0.26 0 0" quat="0.707107 0 -0.707107 0" type="box" rgba="1 0 0 1"/>
</body> </body>
</worldbody> </worldbody>
<actuator> <actuator>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="thigh_joint"/> <general joint="thigh_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="leg_joint"/> <general joint="leg_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="foot_joint"/> <general joint="foot_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
</actuator> </actuator>
<asset> </mujoco>
<texture type="skybox" builtin="gradient" rgb1=".4 .5 .6" rgb2="0 0 0"
width="100" height="100"/>
<texture builtin="flat" height="1278" mark="cross" markrgb="1 1 1" name="texgeom" random="0.01" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" type="cube" width="127"/>
<texture builtin="checker" height="100" name="texplane" rgb1="0 0 0" rgb2="0.8 0.8 0.8" type="2d" width="100"/>
<material name="MatPlane" reflectance="0.5" shininess="1" specular="1" texrepeat="60 60" texture="texplane"/>
<material name="geom" texture="texgeom" texuniform="true"/>
</asset>
</mujoco>


@ -1,12 +1,95 @@
import os

import numpy as np
from gymnasium.envs.mujoco.hopper_v4 import HopperEnv, DEFAULT_CAMERA_CONFIG
from gymnasium import utils
from gymnasium.envs.mujoco import MujocoEnv
from gymnasium.spaces import Box
import mujoco

MAX_EPISODE_STEPS_HOPPERJUMP = 250


class HopperEnvCustomXML(HopperEnv):
"""
Initialization changes to normal Hopper:
- terminate_when_unhealthy: True -> False
- healthy_reward: 1.0 -> 2.0
- healthy_z_range: (0.7, float('inf')) -> (0.5, float('inf'))
- healthy_angle_range: (-0.2, 0.2) -> (-float('inf'), float('inf'))
- exclude_current_positions_from_observation: True -> False
"""
def __init__(
self,
xml_file,
forward_reward_weight=1.0,
ctrl_cost_weight=1e-3,
healthy_reward=1.0,
terminate_when_unhealthy=True,
healthy_state_range=(-100.0, 100.0),
healthy_z_range=(0.7, float("inf")),
healthy_angle_range=(-0.2, 0.2),
reset_noise_scale=5e-3,
exclude_current_positions_from_observation=True,
**kwargs,
):
xml_file = os.path.join(os.path.dirname(__file__), "assets", xml_file)
utils.EzPickle.__init__(
self,
xml_file,
forward_reward_weight,
ctrl_cost_weight,
healthy_reward,
terminate_when_unhealthy,
healthy_state_range,
healthy_z_range,
healthy_angle_range,
reset_noise_scale,
exclude_current_positions_from_observation,
**kwargs
)
self._forward_reward_weight = forward_reward_weight
self._ctrl_cost_weight = ctrl_cost_weight
self._healthy_reward = healthy_reward
self._terminate_when_unhealthy = terminate_when_unhealthy
self._healthy_state_range = healthy_state_range
self._healthy_z_range = healthy_z_range
self._healthy_angle_range = healthy_angle_range
self._reset_noise_scale = reset_noise_scale
self._exclude_current_positions_from_observation = (
exclude_current_positions_from_observation
)
if not hasattr(self, 'observation_space'):
if exclude_current_positions_from_observation:
self.observation_space = Box(
low=-np.inf, high=np.inf, shape=(15,), dtype=np.float64
)
else:
self.observation_space = Box(
low=-np.inf, high=np.inf, shape=(16,), dtype=np.float64
)
MujocoEnv.__init__(
self,
xml_file,
4,
observation_space=self.observation_space,
default_camera_config=DEFAULT_CAMERA_CONFIG,
**kwargs,
)
class HopperJumpEnv(HopperEnvCustomXML):
""" """
Initialization changes to normal Hopper: Initialization changes to normal Hopper:
- terminate_when_unhealthy: True -> False - terminate_when_unhealthy: True -> False
@ -73,7 +156,7 @@ class HopperJumpEnv(HopperEnv):
self.do_simulation(action, self.frame_skip)
height_after = self.get_body_com("torso")[2]
# site_pos_after = self.data.get_site_xpos('foot_site')
site_pos_after = self.data.site('foot_site').xpos
self.max_height = max(height_after, self.max_height)
@ -88,7 +171,8 @@ class HopperJumpEnv(HopperEnv):
ctrl_cost = self.control_cost(action)
costs = ctrl_cost
terminated = False
truncated = False

goal_dist = np.linalg.norm(site_pos_after - self.goal)
if self.contact_dist is None and self.contact_with_floor:
@ -115,7 +199,7 @@ class HopperJumpEnv(HopperEnv):
    healthy=self.is_healthy,
    contact_dist=self.contact_dist or 0
)
return observation, reward, terminated, truncated, info

def _get_obs(self):
    # goal_dist = self.data.get_site_xpos('foot_site') - self.goal
@ -140,8 +224,8 @@ class HopperJumpEnv(HopperEnv):
noise_high[5] = 0.785

qpos = (
    self.np_random.uniform(low=noise_low, high=noise_high, size=self.model.nq) +
    self.init_qpos
)
qvel = (
    # self.np_random.uniform(low=noise_low, high=noise_high, size=self.model.nv) +
@ -162,12 +246,12 @@ class HopperJumpEnv(HopperEnv):
# floor_geom_id = self.model.geom_name2id('floor')
# foot_geom_id = self.model.geom_name2id('foot_geom')
# TODO: do this properly over a sensor in the xml file, see dmc hopper
floor_geom_id = mujoco.mj_name2id(self.model,
                                  mujoco.mjtObj.mjOBJ_GEOM,
                                  'floor')
foot_geom_id = mujoco.mj_name2id(self.model,
                                 mujoco.mjtObj.mjOBJ_GEOM,
                                 'foot_geom')
for i in range(self.data.ncon):
    contact = self.data.contact[i]
    collision = contact.geom1 == floor_geom_id and contact.geom2 == foot_geom_id


@ -1,12 +1,16 @@
import os
from typing import Optional, Dict, Any, Tuple

import numpy as np
from gymnasium.core import ObsType
from fancy_gym.envs.mujoco.hopper_jump.hopper_jump import HopperEnvCustomXML
from gymnasium import spaces

MAX_EPISODE_STEPS_HOPPERJUMPONBOX = 250


class HopperJumpOnBoxEnv(HopperEnvCustomXML):
""" """
Initialization changes to normal Hopper: Initialization changes to normal Hopper:
- healthy_reward: 1.0 -> 0.01 -> 0.001 - healthy_reward: 1.0 -> 0.01 -> 0.001
@ -33,6 +37,16 @@ class HopperJumpOnBoxEnv(HopperEnv):
self.hopper_on_box = False
self.context = context
self.box_x = 1
if exclude_current_positions_from_observation:
self.observation_space = spaces.Box(
low=-np.inf, high=np.inf, shape=(12,), dtype=np.float64
)
else:
self.observation_space = spaces.Box(
low=-np.inf, high=np.inf, shape=(13,), dtype=np.float64
)
xml_file = os.path.join(os.path.dirname(__file__), "assets", xml_file)
super().__init__(xml_file, forward_reward_weight, ctrl_cost_weight, healthy_reward, terminate_when_unhealthy,
                 healthy_state_range, healthy_z_range, healthy_angle_range, reset_noise_scale,
@ -74,10 +88,10 @@ class HopperJumpOnBoxEnv(HopperEnv):
costs = ctrl_cost

terminated = fell_over or self.hopper_on_box

if self.current_step >= self.max_episode_steps or terminated:
    done = False  # TODO why are we doing this???

    max_height = self.max_height.copy()
    min_distance = self.min_distance.copy()
@ -122,21 +136,25 @@ class HopperJumpOnBoxEnv(HopperEnv):
    'goal': self.box_x,
}

truncated = self.current_step >= self.max_episode_steps and not terminated
return observation, reward, terminated, truncated, info

def _get_obs(self):
    return np.append(super()._get_obs(), self.box_x)
def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
        -> Tuple[ObsType, Dict[str, Any]]:
    self.max_height = 0
    self.min_distance = 5000
    self.current_step = 0
    self.hopper_on_box = False
    ret = super().reset(seed=seed, options=options)
    if self.context:
        self.box_x = self.np_random.uniform(1, 3, 1)
        self.model.body("box").pos = [self.box_x[0], 0, 0]
    return ret
# overwrite reset_model to make it deterministic # overwrite reset_model to make it deterministic
def reset_model(self): def reset_model(self):
@ -150,21 +168,3 @@ class HopperJumpOnBoxEnv(HopperEnv):
observation = self._get_obs() observation = self._get_obs()
return observation return observation
if __name__ == '__main__':
render_mode = "human" # "human" or "partial" or "final"
env = HopperJumpOnBoxEnv()
obs = env.reset()
for i in range(2000):
# objective.load_result("/tmp/cma")
# test with random actions
ac = env.action_space.sample()
obs, rew, d, info = env.step(ac)
if i % 10 == 0:
env.render(mode=render_mode)
if d:
print('After ', i, ' steps, done: ', d)
env.reset()
env.close()
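The demo block removed above still used the old 4-tuple Gym API (`obs, rew, done, info`) and `render(mode=...)`. Under Gymnasium, an equivalent random-action rollout would look roughly like this (a sketch; `HopperJumpOnBoxEnv` is the class defined in this file, and rendering is now configured at construction time rather than per `render()` call):

```python
# HopperJumpOnBoxEnv as defined in the file above
env = HopperJumpOnBoxEnv()
obs, info = env.reset(seed=0)  # reset now returns (observation, info)
for i in range(2000):
    action = env.action_space.sample()
    # step now returns a 5-tuple with separate terminated/truncated flags
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```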


@ -6,6 +6,11 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper): class MPWrapper(RawInterfaceWrapper):
mp_config = {
'ProMP': {},
'DMP': {},
'ProDMP': {},
}
# Random x goal + random init pos # Random x goal + random init pos
@property @property
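The new `mp_config` attribute declares, per trajectory-generator type, which settings deviate from the framework defaults; empty dicts mean the defaults are used as-is. Once such a wrapper is registered, the MP variants become available under namespaced ids. A hedged usage sketch (the id below is an assumed example following the `fancy/` / `fancy_ProMP/` naming scheme of this PR):

```python
import gymnasium as gym
import fancy_gym  # noqa: F401  # importing registers the fancy_* namespaces

# Assumed example id: step-based tasks live under 'fancy/',
# MP variants under 'fancy_ProMP/', 'fancy_DMP/' and 'fancy_ProDMP/'.
env = gym.make('fancy_ProMP/HopperJumpOnBox-v0')

obs, info = env.reset(seed=0)
# One MP step consumes a full parameter vector and rolls out a whole episode.
params = env.action_space.sample()
obs, reward, terminated, truncated, info = env.step(params)
```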


@ -1,56 +1,54 @@
<mujoco model="hopper"> <mujoco model="hopper">
<compiler angle="degree" coordinate="global" inertiafromgeom="true"/> <compiler angle="radian" autolimits="true"/>
<default> <option integrator="RK4"/>
<joint armature="1" damping="1" limited="true"/>
<geom conaffinity="1" condim="1" contype="1" margin="0.001" material="geom" rgba="0.8 0.6 .4 1" solimp=".8 .8 .01" solref=".02 1"/>
<motor ctrllimited="true" ctrlrange="-.4 .4"/>
</default>
<option integrator="RK4" timestep="0.002"/>
<visual> <visual>
<map znear="0.02"/> <map znear="0.02"/>
</visual> </visual>
<default class="main">
<joint limited="true" armature="1" damping="1"/>
<geom condim="1" solimp="0.8 0.8 0.01 0.5 2" margin="0.001" material="geom" rgba="0.8 0.6 0.4 1"/>
<general ctrllimited="true" ctrlrange="-0.4 0.4"/>
</default>
<asset>
<texture type="skybox" builtin="gradient" rgb1="0.4 0.5 0.6" rgb2="0 0 0" width="100" height="600"/>
<texture type="cube" name="texgeom" builtin="flat" mark="cross" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" markrgb="1 1 1" width="127" height="762"/>
<texture type="2d" name="texplane" builtin="checker" rgb1="0 0 0" rgb2="0.8 0.8 0.8" width="100" height="100"/>
<material name="MatPlane" texture="texplane" texrepeat="60 60" specular="1" shininess="1" reflectance="0.5"/>
<material name="geom" texture="texgeom" texuniform="true"/>
</asset>
<worldbody> <worldbody>
<light cutoff="100" diffuse="1 1 1" dir="-0 0 -1.3" directional="true" exponent="1" pos="0 0 1.3" specular=".1 .1 .1"/> <geom name="floor" size="20 20 0.125" type="plane" condim="3" material="MatPlane" rgba="0.8 0.9 0.8 1"/>
<geom conaffinity="1" condim="3" name="floor" pos="0 0 0" rgba="0.8 0.9 0.8 1" size="20 20 .125" type="plane" material="MatPlane"/> <light pos="0 0 1.3" dir="0 0 -1" directional="true" cutoff="100" exponent="1" diffuse="1 1 1" specular="0.1 0.1 0.1"/>
<body name="torso" pos="0 0 1.25"> <body name="torso" pos="0 0 1.25" gravcomp="0">
<camera name="track" mode="trackcom" pos="0 -3 1" xyaxes="1 0 0 0 0 1"/> <joint name="rootx" pos="0 0 -1.25" axis="1 0 0" limited="false" type="slide" armature="0" damping="0"/>
<joint armature="0" axis="1 0 0" damping="0" limited="false" name="rootx" pos="0 0 0" stiffness="0" type="slide"/> <joint name="rootz" pos="0 0 -1.25" axis="0 0 1" limited="false" type="slide" ref="1.25" armature="0" damping="0"/>
<joint armature="0" axis="0 0 1" damping="0" limited="false" name="rootz" pos="0 0 0" ref="1.25" stiffness="0" type="slide"/> <joint name="rooty" pos="0 0 0" axis="0 1 0" limited="false" armature="0" damping="0"/>
<joint armature="0" axis="0 1 0" damping="0" limited="false" name="rooty" pos="0 0 1.25" stiffness="0" type="hinge"/> <geom name="torso_geom" size="0.05 0.2" type="capsule" friction="0.9 0.005 0.0001"/>
<geom friction="0.9" fromto="0 0 1.45 0 0 1.05" name="torso_geom" size="0.05" type="capsule"/> <camera name="track" pos="0 -3 -0.25" quat="0.707107 0.707107 0 0" mode="trackcom"/>
<body name="thigh" pos="0 0 1.05"> <body name="thigh" pos="0 0 -0.2" gravcomp="0">
<joint axis="0 -1 0" name="thigh_joint" pos="0 0 1.05" range="-150 0" type="hinge"/> <joint name="thigh_joint" pos="0 0 0" axis="0 -1 0" range="-2.61799 0"/>
<geom friction="0.9" fromto="0 0 1.05 0 0 0.6" name="thigh_geom" size="0.05" type="capsule"/> <geom name="thigh_geom" size="0.05 0.225" pos="0 0 -0.225" type="capsule" friction="0.9 0.005 0.0001"/>
<body name="leg" pos="0 0 0.35"> <body name="leg" pos="0 0 -0.7" gravcomp="0">
<joint axis="0 -1 0" name="leg_joint" pos="0 0 0.6" range="-150 0" type="hinge"/> <joint name="leg_joint" pos="0 0 0.25" axis="0 -1 0" range="-2.61799 0"/>
<geom friction="0.9" fromto="0 0 0.6 0 0 0.1" name="leg_geom" size="0.04" type="capsule"/> <geom name="leg_geom" size="0.04 0.25" type="capsule" friction="0.9 0.005 0.0001"/>
<body name="foot" pos="0.13/2 0 0.1"> <body name="foot" pos="0.065 0 -0.25" gravcomp="0">
<joint axis="0 -1 0" name="foot_joint" pos="0 0 0.1" range="-45 45" type="hinge"/> <joint name="foot_joint" pos="-0.065 0 0" axis="0 -1 0" range="-0.785398 0.785398"/>
<geom friction="2.0" fromto="-0.13 0 0.1 0.26 0 0.1" name="foot_geom" size="0.06" type="capsule"/> <geom name="foot_geom" size="0.06 0.195" quat="0.707107 0 -0.707107 0" type="capsule" friction="2 0.005 0.0001"/>
</body> </body>
</body> </body>
</body> </body>
</body> </body>
<body name="ball" pos="0 0 1.53"> <body name="ball" pos="0 0 1.53" gravcomp="0">
<joint armature="0" axis="1 0 0" damping="0.0" name="tar:x" pos="0 0 1.53" stiffness="0" type="slide" frictionloss="0" limited="false"/> <joint name="tar:x" pos="0 0 0" axis="1 0 0" limited="false" type="slide" armature="0" damping="0"/>
<joint armature="0" axis="0 1 0" damping="0.0" name="tar:y" pos="0 0 1.53" stiffness="0" type="slide" frictionloss="0" limited="false"/> <joint name="tar:y" pos="0 0 0" axis="0 1 0" limited="false" type="slide" armature="0" damping="0"/>
<joint armature="0" axis="0 0 1" damping="0.0" name="tar:z" pos="0 0 1.53" stiffness="0" type="slide" frictionloss="0" limited="false"/> <joint name="tar:z" pos="0 0 0" axis="0 0 1" limited="false" type="slide" armature="0" damping="0"/>
<geom pos="0 0 1.53" priority= "1" size="0.025 0.025 0.025" type="sphere" condim="4" name="ball_geom" rgba="0.8 0.2 0.1 1" mass="0.1" <geom name="ball_geom" size="0.025" condim="4" priority="1" friction="0.1 0.1 0.1" solref="-10000 -10" solimp="0.9 0.95 0.001 0.5 2" mass="0.1" rgba="0.8 0.2 0.1 1"/>
friction="0.1 0.1 0.1" solimp="0.9 0.95 0.001 0.5 2" solref="-10000 -10"/> <site name="target_ball" pos="0 0 0" size="0.04" rgba="1 0 0 1"/>
<site name="target_ball" pos="0 0 1.53" size="0.04 0.04 0.04" rgba="1 0 0 1" type="sphere"/>
</body> </body>
</worldbody> </worldbody>
<actuator> <actuator>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="thigh_joint"/> <general joint="thigh_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="leg_joint"/> <general joint="leg_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="foot_joint"/> <general joint="foot_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
</actuator> </actuator>
<asset>
<texture type="skybox" builtin="gradient" rgb1=".4 .5 .6" rgb2="0 0 0"
width="100" height="100"/>
<texture builtin="flat" height="1278" mark="cross" markrgb="1 1 1" name="texgeom" random="0.01" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" type="cube" width="127"/>
<texture builtin="checker" height="100" name="texplane" rgb1="0 0 0" rgb2="0.8 0.8 0.8" type="2d" width="100"/>
<material name="MatPlane" reflectance="0.5" shininess="1" specular="1" texrepeat="60 60" texture="texplane"/>
<material name="geom" texture="texgeom" texuniform="true"/>
</asset>
</mujoco> </mujoco>
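The rewritten model switches the compiler from degree/global to radian/local coordinates, so joint ranges are now stated in radians. A quick check of the converted limits (plain NumPy, not part of the commit):

```python
import numpy as np

# Old hinge ranges were given in degrees; the new XML states them in radians.
print(np.deg2rad(-150))  # -> -2.6179939..., matching range="-2.61799 0"
print(np.deg2rad(45))    # ->  0.7853982..., matching range="-0.785398 0.785398"
```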


@ -1,132 +1,129 @@
<mujoco model="hopper"> <mujoco model="hopper">
<compiler angle="degree" coordinate="global" inertiafromgeom="true"/> <compiler angle="radian" autolimits="true"/>
<default> <option integrator="RK4"/>
<joint armature="1" damping="1" limited="true"/>
<geom conaffinity="1" condim="1" contype="1" margin="0.001" material="geom" rgba="0.8 0.6 .4 1" solimp=".8 .8 .01" solref=".02 1"/>
<motor ctrllimited="true" ctrlrange="-.4 .4"/>
</default>
<option integrator="RK4" timestep="0.002"/>
<visual> <visual>
<map znear="0.02"/> <map znear="0.02"/>
</visual> </visual>
<default class="main">
<joint limited="true" armature="1" damping="1"/>
<geom condim="1" solimp="0.8 0.8 0.01 0.5 2" margin="0.001" material="geom" rgba="0.8 0.6 0.4 1"/>
<general ctrllimited="true" ctrlrange="-0.4 0.4"/>
</default>
<asset>
<texture type="skybox" builtin="gradient" rgb1="0.4 0.5 0.6" rgb2="0 0 0" width="100" height="600"/>
<texture type="cube" name="texgeom" builtin="flat" mark="cross" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" markrgb="1 1 1" width="127" height="762"/>
<texture type="2d" name="texplane" builtin="checker" rgb1="0 0 0" rgb2="0.8 0.8 0.8" width="100" height="100"/>
<material name="MatPlane" texture="texplane" texrepeat="60 60" specular="1" shininess="1" reflectance="0.5"/>
<material name="geom" texture="texgeom" texuniform="true"/>
</asset>
<worldbody> <worldbody>
<light cutoff="100" diffuse="1 1 1" dir="-0 0 -1.3" directional="true" exponent="1" pos="0 0 1.3" specular=".1 .1 .1"/> <geom name="floor" size="20 20 0.125" type="plane" condim="3" material="MatPlane" rgba="0.8 0.9 0.8 1"/>
<geom conaffinity="1" condim="3" name="floor" pos="0 0 0" rgba="0.8 0.9 0.8 1" size="20 20 .125" type="plane" material="MatPlane"/> <light pos="0 0 1.3" dir="0 0 -1" directional="true" cutoff="100" exponent="1" diffuse="1 1 1" specular="0.1 0.1 0.1"/>
<body name="torso" pos="0 0 1.25"> <body name="torso" pos="0 0 1.25" gravcomp="0">
<camera name="track" mode="trackcom" pos="0 -3 1" xyaxes="1 0 0 0 0 1"/> <joint name="rootx" pos="0 0 -1.25" axis="1 0 0" limited="false" type="slide" armature="0" damping="0"/>
<joint armature="0" axis="1 0 0" damping="0" limited="false" name="rootx" pos="0 0 0" stiffness="0" type="slide"/> <joint name="rootz" pos="0 0 -1.25" axis="0 0 1" limited="false" type="slide" ref="1.25" armature="0" damping="0"/>
<joint armature="0" axis="0 0 1" damping="0" limited="false" name="rootz" pos="0 0 0" ref="1.25" stiffness="0" type="slide"/> <joint name="rooty" pos="0 0 0" axis="0 1 0" limited="false" armature="0" damping="0"/>
<joint armature="0" axis="0 1 0" damping="0" limited="false" name="rooty" pos="0 0 1.25" stiffness="0" type="hinge"/> <geom name="torso_geom" size="0.05 0.2" type="capsule" friction="0.9 0.005 0.0001"/>
<geom friction="0.9" fromto="0 0 1.45 0 0 1.05" name="torso_geom" size="0.05" type="capsule"/> <camera name="track" pos="0 -3 -0.25" quat="0.707107 0.707107 0 0" mode="trackcom"/>
<body name="thigh" pos="0 0 1.05"> <body name="thigh" pos="0 0 -0.2" gravcomp="0">
<joint axis="0 -1 0" name="thigh_joint" pos="0 0 1.05" range="-150 0" type="hinge"/> <joint name="thigh_joint" pos="0 0 0" axis="0 -1 0" range="-2.61799 0"/>
<geom friction="0.9" fromto="0 0 1.05 0 0 0.6" name="thigh_geom" size="0.05" type="capsule"/> <geom name="thigh_geom" size="0.05 0.225" pos="0 0 -0.225" type="capsule" friction="0.9 0.005 0.0001"/>
<body name="leg" pos="0 0 0.35"> <body name="leg" pos="0 0 -0.7" gravcomp="0">
<joint axis="0 -1 0" name="leg_joint" pos="0 0 0.6" range="-150 0" type="hinge"/> <joint name="leg_joint" pos="0 0 0.25" axis="0 -1 0" range="-2.61799 0"/>
<geom friction="0.9" fromto="0 0 0.6 0 0 0.1" name="leg_geom" size="0.04" type="capsule"/> <geom name="leg_geom" size="0.04 0.25" type="capsule" friction="0.9 0.005 0.0001"/>
<body name="foot" pos="0.13/2 0 0.1"> <body name="foot" pos="0.065 0 -0.25" gravcomp="0">
<joint axis="0 -1 0" name="foot_joint" pos="0 0 0.1" range="-45 45" type="hinge"/> <joint name="foot_joint" pos="-0.065 0 0" axis="0 -1 0" range="-0.785398 0.785398"/>
<geom friction="2.0" fromto="-0.13 0 0.1 0.26 0 0.1" name="foot_geom" size="0.06" type="capsule"/> <geom name="foot_geom" size="0.06 0.195" quat="0.707107 0 -0.707107 0" type="capsule" friction="2 0.005 0.0001"/>
</body> </body>
</body> </body>
</body> </body>
</body> </body>
<body name="ball" pos="0 0 1.53"> <body name="ball" pos="0 0 1.53" gravcomp="0">
<joint armature="0" axis="1 0 0" damping="0.0" name="tar:x" pos="0 0 1.53" stiffness="0" type="slide" frictionloss="0" limited="false"/> <joint name="tar:x" pos="0 0 0" axis="1 0 0" limited="false" type="slide" armature="0" damping="0"/>
<joint armature="0" axis="0 1 0" damping="0.0" name="tar:y" pos="0 0 1.53" stiffness="0" type="slide" frictionloss="0" limited="false"/> <joint name="tar:y" pos="0 0 0" axis="0 1 0" limited="false" type="slide" armature="0" damping="0"/>
<joint armature="0" axis="0 0 1" damping="0.0" name="tar:z" pos="0 0 1.53" stiffness="0" type="slide" frictionloss="0" limited="false"/> <joint name="tar:z" pos="0 0 0" axis="0 0 1" limited="false" type="slide" armature="0" damping="0"/>
<geom pos="0 0 1.53" priority= "1" size="0.025 0.025 0.025" type="sphere" condim="4" name="ball_geom" rgba="0.8 0.2 0.1 1" mass="0.1" <geom name="ball_geom" size="0.025" condim="4" priority="1" friction="0.1 0.1 0.1" solref="-10000 -10" solimp="0.9 0.95 0.001 0.5 2" mass="0.1" rgba="0.8 0.2 0.1 1"/>
friction="0.1 0.1 0.1" solimp="0.9 0.95 0.001 0.5 2" solref="-10000 -10"/> <site name="target_ball" pos="0 0 0" size="0.04" rgba="1 0 0 1"/>
<site name="target_ball" pos="0 0 1.53" size="0.04 0.04 0.04" rgba="1 0 0 1" type="sphere"/>
</body> </body>
<body name="basket_ground" pos="5 0 0"> <body name="basket_ground" pos="5 0 0" gravcomp="0">
<geom friction="0.9" fromto="5 0 0 5.3 0 0" name="basket_ground_geom" size="0.1 0.4 0.3" type="box"/> <geom name="basket_ground_geom" size="0.1 0.1 0.15" pos="0.15 0 0" quat="0.707107 0 -0.707107 0" type="box" friction="0.9 0.005 0.0001"/>
<body name="edge1" pos="5 0 0"> <body name="edge1" pos="0 0 0" gravcomp="0">
<geom friction="2.0" fromto="5 0 0 5 0 0.2" name="edge1_geom" size="0.04" type="capsule"/> <geom name="edge1_geom" size="0.04 0.1" pos="0 0 0.1" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body> </body>
<body name="edge2" pos="5 0 0.05"> <body name="edge2" pos="0 0 0.05" gravcomp="0">
<geom friction="2.0" fromto="5 0.05 0 5 0.05 0.2" name="edge2_geom" size="0.04" type="capsule"/> <geom name="edge2_geom" size="0.04 0.1" pos="0 0.05 0.05" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body> </body>
<body name="edge3" pos="5 0 0.1"> <body name="edge3" pos="0 0 0.1" gravcomp="0">
<geom friction="2.0" fromto="5 0.1 0 5 0.1 0.2" name="edge3_geom" size="0.04" type="capsule"/> <geom name="edge3_geom" size="0.04 0.1" pos="0 0.1 0" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body> </body>
<body name="edge4" pos="5 0 0.15"> <body name="edge4" pos="0 0 0.15" gravcomp="0">
<geom friction="2.0" fromto="5 0.15 0 5 0.15 0.2" name="edge4_geom" size="0.04" type="capsule"/> <geom name="edge4_geom" size="0.04 0.1" pos="0 0.15 -0.05" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body> </body>
<body name="edge5" pos="5.05 0 0.15"> <body name="edge5" pos="0.05 0 0.15" gravcomp="0">
<geom friction="2.0" fromto="5.05 0.15 0 5.05 0.15 0.2" name="edge5_geom" size="0.04" type="capsule"/> <geom name="edge5_geom" size="0.04 0.1" pos="0 0.15 -0.05" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body> </body>
<body name="edge6" pos="5.1 0 0.15"> <body name="edge6" pos="0.1 0 0.15" gravcomp="0">
<geom friction="2.0" fromto="5.1 0.15 0 5.1 0.15 0.2" name="edge6_geom" size="0.04" type="capsule"/> <geom name="edge6_geom" size="0.04 0.1" pos="0 0.15 -0.05" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body> </body>
<body name="edge7" pos="5.15 0 0.15"> <body name="edge7" pos="0.15 0 0.15" gravcomp="0">
<geom friction="2.0" fromto="5.15 0.15 0 5.15 0.15 0.2" name="edge7_geom" size="0.04" type="capsule"/> <geom name="edge7_geom" size="0.04 0.1" pos="0 0.15 -0.05" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body> </body>
<body name="edge8" pos="5.2 0 0.15"> <body name="edge8" pos="0.2 0 0.15" gravcomp="0">
<geom friction="2.0" fromto="5.2 0.15 0 5.2 0.15 0.2" name="edge8_geom" size="0.04" type="capsule"/> <geom name="edge8_geom" size="0.04 0.1" pos="0 0.15 -0.05" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body> </body>
<body name="edge9" pos="5.25 0 0.15"> <body name="edge9" pos="0.25 0 0.15" gravcomp="0">
<geom friction="2.0" fromto="5.25 0.15 0 5.25 0.15 0.2" name="edge9_geom" size="0.04" type="capsule"/> <geom name="edge9_geom" size="0.04 0.1" pos="0 0.15 -0.05" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body> </body>
<body name="edge10" pos="5.3 0 0.15"> <body name="edge10" pos="0.3 0 0.15" gravcomp="0">
<geom friction="2.0" fromto="5.3 0.15 0 5.3 0.15 0.2" name="edge10_geom" size="0.04" type="capsule"/> <geom name="edge10_geom" size="0.04 0.1" pos="0 0.15 -0.05" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body> </body>
<body name="edge11" pos="5.3 0 0.1"> <body name="edge11" pos="0.3 0 0.1" gravcomp="0">
<geom friction="2.0" fromto="5.3 0.1 0 5.3 0.1 0.2" name="edge11_geom" size="0.04" type="capsule"/> <geom name="edge11_geom" size="0.04 0.1" pos="0 0.1 0" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body> </body>
<body name="edge12" pos="5.3 0 0.05"> <body name="edge12" pos="0.3 0 0.05" gravcomp="0">
<geom friction="2.0" fromto="5.3 0.05 0 5.3 0.05 0.2" name="edge12_geom" size="0.04" type="capsule"/> <geom name="edge12_geom" size="0.04 0.1" pos="0 0.05 0.05" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body> </body>
<body name="edge13" pos="5.3 0 0.0"> <body name="edge13" pos="0.3 0 0" gravcomp="0">
<geom friction="2.0" fromto="5.3 0 0 5.3 0 0.2" name="edge13_geom" size="0.04" type="capsule"/> <geom name="edge13_geom" size="0.04 0.1" pos="0 0 0.1" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body> </body>
<body name="edge14" pos="5.3 0 -0.05"> <body name="edge14" pos="0.3 0 -0.05" gravcomp="0">
<geom friction="2.0" fromto="5.3 -0.05 0 5.3 -0.05 0.2" name="edge14_geom" size="0.04" type="capsule"/> <geom name="edge14_geom" size="0.04 0.1" pos="0 -0.05 0.15" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body> </body>
<body name="edge15" pos="5.3 0 -0.1"> <body name="edge15" pos="0.3 0 -0.1" gravcomp="0">
<geom friction="2.0" fromto="5.3 -0.1 0 5.3 -0.1 0.2" name="edge15_geom" size="0.04" type="capsule"/> <geom name="edge15_geom" size="0.04 0.1" pos="0 -0.1 0.2" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body> </body>
<body name="edge16" pos="5.3 0 -0.15"> <body name="edge16" pos="0.3 0 -0.15" gravcomp="0">
<geom friction="2.0" fromto="5.3 -0.15 0 5.3 -0.15 0.2" name="edge16_geom" size="0.04" type="capsule"/> <geom name="edge16_geom" size="0.04 0.1" pos="0 -0.15 0.25" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
</body> </body>
<body name="edge20" pos="0.25 0 -0.15" gravcomp="0">
<body name="edge20" pos="5.25 0 -0.15"> <geom name="edge20_geom" size="0.04 0.1" pos="0 -0.15 0.25" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
<geom friction="2.0" fromto="5.25 -0.15 0 5.25 -0.15 0.2" name="edge20_geom" size="0.04" type="capsule"/> </body>
</body> <body name="edge21" pos="0.2 0 -0.15" gravcomp="0">
<body name="edge21" pos="5.2 0 -0.15"> <geom name="edge21_geom" size="0.04 0.1" pos="0 -0.15 0.25" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
<geom friction="2.0" fromto="5.2 -0.15 0 5.2 -0.15 0.2" name="edge21_geom" size="0.04" type="capsule"/> </body>
</body> <body name="edge22" pos="0.15 0 -0.15" gravcomp="0">
<body name="edge22" pos="5.15 0 -0.15"> <geom name="edge22_geom" size="0.04 0.1" pos="0 -0.15 0.25" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
<geom friction="2.0" fromto="5.15 -0.15 0 5.15 -0.15 0.2" name="edge22_geom" size="0.04" type="capsule"/> </body>
</body> <body name="edge23" pos="0.1 0 -0.15" gravcomp="0">
<body name="edge23" pos="5.1 0 -0.15"> <geom name="edge23_geom" size="0.04 0.1" pos="0 -0.15 0.25" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
<geom friction="2.0" fromto="5.1 -0.15 0 5.1 -0.15 0.2" name="edge23_geom" size="0.04" type="capsule"/> </body>
</body> <body name="edge24" pos="0.05 0 -0.15" gravcomp="0">
<body name="edge24" pos="5.05 0 -0.15"> <geom name="edge24_geom" size="0.04 0.1" pos="0 -0.15 0.25" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
<geom friction="2.0" fromto="5.05 -0.15 0 5.05 -0.15 0.2" name="edge24_geom" size="0.04" type="capsule"/> </body>
</body> <body name="edge25" pos="0 0 -0.15" gravcomp="0">
<body name="edge25" pos="5 0 -0.15"> <geom name="edge25_geom" size="0.04 0.1" pos="0 -0.15 0.25" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
<geom friction="2.0" fromto="5 -0.15 0 5 -0.15 0.2" name="edge25_geom" size="0.04" type="capsule"/> </body>
</body> <body name="edge26" pos="0 0 -0.1" gravcomp="0">
<body name="edge26" pos="5 0 -0.1"> <geom name="edge26_geom" size="0.04 0.1" pos="0 -0.1 0.2" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
<geom friction="2.0" fromto="5 -0.1 0 5 -0.1 0.2" name="edge26_geom" size="0.04" type="capsule"/> </body>
</body> <body name="edge27" pos="0 0 -0.05" gravcomp="0">
<body name="edge27" pos="5 0 -0.05"> <geom name="edge27_geom" size="0.04 0.1" pos="0 -0.05 0.15" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
<geom friction="2.0" fromto="5 -0.05 0 5 -0.05 0.2" name="edge27_geom" size="0.04" type="capsule"/> </body>
</body>
</body> </body>
</worldbody> </worldbody>
<actuator> <actuator>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="thigh_joint"/> <general joint="thigh_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="leg_joint"/> <general joint="leg_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="foot_joint"/> <general joint="foot_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
</actuator> </actuator>
<asset> </mujoco>
<texture type="skybox" builtin="gradient" rgb1=".4 .5 .6" rgb2="0 0 0"
width="100" height="100"/>
<texture builtin="flat" height="1278" mark="cross" markrgb="1 1 1" name="texgeom" random="0.01" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" type="cube" width="127"/>
<texture builtin="checker" height="100" name="texplane" rgb1="0 0 0" rgb2="0.8 0.8 0.8" type="2d" width="100"/>
<material name="MatPlane" reflectance="0.5" shininess="1" specular="1" texrepeat="60 60" texture="texplane"/>
<material name="geom" texture="texgeom" texuniform="true"/>
</asset>
</mujoco>
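As in the model above, capsules that were previously given via `fromto` endpoints are rewritten as a radius/half-length pair plus an orientation quaternion. A quick sanity check of that conversion for the foot geom (not part of the commit):

```python
import numpy as np

# Old: fromto="-0.13 0 0.1  0.26 0 0.1" with size="0.06" (radius only).
start, end = np.array([-0.13, 0.0, 0.1]), np.array([0.26, 0.0, 0.1])
half_length = np.linalg.norm(end - start) / 2
print(half_length)  # -> 0.195, matching the new size="0.06 0.195"

# quat="0.707107 0 -0.707107 0" is a -90 degree rotation about y,
# which aligns the capsule's long axis (default z) with the x axis.
```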


@ -1,13 +1,15 @@
import os import os
from typing import Optional from typing import Optional, Any, Dict, Tuple
import numpy as np import numpy as np
from gym.envs.mujoco.hopper_v4 import HopperEnv from gymnasium.core import ObsType
from fancy_gym.envs.mujoco.hopper_jump.hopper_jump import HopperEnvCustomXML
from gymnasium import spaces
MAX_EPISODE_STEPS_HOPPERTHROW = 250 MAX_EPISODE_STEPS_HOPPERTHROW = 250
class HopperThrowEnv(HopperEnv): class HopperThrowEnv(HopperEnvCustomXML):
""" """
Initialization changes to normal Hopper: Initialization changes to normal Hopper:
- healthy_reward: 1.0 -> 0.0 -> 0.1 - healthy_reward: 1.0 -> 0.0 -> 0.1
@ -36,6 +38,16 @@ class HopperThrowEnv(HopperEnv):
self.max_episode_steps = max_episode_steps self.max_episode_steps = max_episode_steps
self.context = context self.context = context
self.goal = 0 self.goal = 0
if not hasattr(self, 'observation_space'):
self.observation_space = spaces.Box(
low=-np.inf, high=np.inf, shape=(18,), dtype=np.float64
)
else:
self.observation_space = spaces.Box(
low=-np.inf, high=np.inf, shape=(19,), dtype=np.float64
)
super().__init__(xml_file=xml_file, super().__init__(xml_file=xml_file,
forward_reward_weight=forward_reward_weight, forward_reward_weight=forward_reward_weight,
ctrl_cost_weight=ctrl_cost_weight, ctrl_cost_weight=ctrl_cost_weight,
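The explicit `(18,)`/`(19,)` observation shapes follow from the model shown above, assuming the throw task uses the ball variant of the hopper XML: six hopper joints plus the three `tar:x/y/z` ball slides, with the goal appended by `_get_obs`. A back-of-the-envelope check (not part of the commit):

```python
n_qpos = 6 + 3  # hopper joints + ball slide joints (tar:x, tar:y, tar:z)
n_qvel = 6 + 3
goal_dim = 1    # appended in _get_obs

print((n_qpos - 1) + n_qvel + goal_dim)  # 18, x position excluded
print(n_qpos + n_qvel + goal_dim)        # 19, x position included
```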
@ -56,14 +68,14 @@ class HopperThrowEnv(HopperEnv):
# done = self.done TODO We should use this, not sure why there is no other termination; ball_landed should be enough, because we only look at the throw itself? - Paul and Marc # done = self.done TODO We should use this, not sure why there is no other termination; ball_landed should be enough, because we only look at the throw itself? - Paul and Marc
ball_landed = bool(self.get_body_com("ball")[2] <= 0.05) ball_landed = bool(self.get_body_com("ball")[2] <= 0.05)
done = ball_landed terminated = ball_landed
ctrl_cost = self.control_cost(action) ctrl_cost = self.control_cost(action)
costs = ctrl_cost costs = ctrl_cost
rewards = 0 rewards = 0
if self.current_step >= self.max_episode_steps or done: if self.current_step >= self.max_episode_steps or terminated:
distance_reward = -np.linalg.norm(ball_pos_after - self.goal) if self.context else \ distance_reward = -np.linalg.norm(ball_pos_after - self.goal) if self.context else \
self._forward_reward_weight * ball_pos_after self._forward_reward_weight * ball_pos_after
healthy_reward = 0 if self.context else self.healthy_reward * self.current_step healthy_reward = 0 if self.context else self.healthy_reward * self.current_step
@ -78,16 +90,19 @@ class HopperThrowEnv(HopperEnv):
'_steps': self.current_step, '_steps': self.current_step,
'goal': self.goal, 'goal': self.goal,
} }
truncated = False
return observation, reward, done, info return observation, reward, terminated, truncated, info
def _get_obs(self): def _get_obs(self):
return np.append(super()._get_obs(), self.goal) return np.append(super()._get_obs(), self.goal)
def reset(self, *, seed: Optional[int] = None, return_info: bool = False, options: Optional[dict] = None): def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
-> Tuple[ObsType, Dict[str, Any]]:
self.current_step = 0 self.current_step = 0
ret = super().reset(seed=seed, options=options)
self.goal = self.goal = self.np_random.uniform(2.0, 6.0, 1) # 0.5 8.0 self.goal = self.goal = self.np_random.uniform(2.0, 6.0, 1) # 0.5 8.0
return super().reset() return ret
# overwrite reset_model to make it deterministic # overwrite reset_model to make it deterministic
def reset_model(self): def reset_model(self):
@ -101,22 +116,3 @@ class HopperThrowEnv(HopperEnv):
observation = self._get_obs() observation = self._get_obs()
return observation return observation
if __name__ == '__main__':
render_mode = "human" # "human" or "partial" or "final"
env = HopperThrowEnv()
obs = env.reset()
for i in range(2000):
# objective.load_result("/tmp/cma")
# test with random actions
ac = env.action_space.sample()
obs, rew, d, info = env.step(ac)
if i % 10 == 0:
env.render(mode=render_mode)
if d:
print('After ', i, ' steps, done: ', d)
env.reset()
env.close()


@ -1,13 +1,16 @@
import os import os
from typing import Optional from typing import Optional, Any, Dict, Tuple
import numpy as np import numpy as np
from gym.envs.mujoco.hopper_v4 import HopperEnv from fancy_gym.envs.mujoco.hopper_jump.hopper_jump import HopperEnvCustomXML
from gymnasium.core import ObsType
from gymnasium import spaces
MAX_EPISODE_STEPS_HOPPERTHROWINBASKET = 250 MAX_EPISODE_STEPS_HOPPERTHROWINBASKET = 250
class HopperThrowInBasketEnv(HopperEnv): class HopperThrowInBasketEnv(HopperEnvCustomXML):
""" """
Initialization changes to normal Hopper: Initialization changes to normal Hopper:
- healthy_reward: 1.0 -> 0.0 - healthy_reward: 1.0 -> 0.0
@ -42,6 +45,16 @@ class HopperThrowInBasketEnv(HopperEnv):
self.context = context self.context = context
self.penalty = penalty self.penalty = penalty
self.basket_x = 5 self.basket_x = 5
if exclude_current_positions_from_observation:
self.observation_space = spaces.Box(
low=-np.inf, high=np.inf, shape=(18,), dtype=np.float64
)
else:
self.observation_space = spaces.Box(
low=-np.inf, high=np.inf, shape=(19,), dtype=np.float64
)
xml_file = os.path.join(os.path.dirname(__file__), "assets", xml_file) xml_file = os.path.join(os.path.dirname(__file__), "assets", xml_file)
super().__init__(xml_file=xml_file, super().__init__(xml_file=xml_file,
forward_reward_weight=forward_reward_weight, forward_reward_weight=forward_reward_weight,
@ -65,14 +78,14 @@ class HopperThrowInBasketEnv(HopperEnv):
is_in_basket_x = ball_pos[0] >= basket_pos[0] and ball_pos[0] <= basket_pos[0] + self.basket_size is_in_basket_x = ball_pos[0] >= basket_pos[0] and ball_pos[0] <= basket_pos[0] + self.basket_size
is_in_basket_y = ball_pos[1] >= basket_pos[1] - (self.basket_size / 2) and ball_pos[1] <= basket_pos[1] + ( is_in_basket_y = ball_pos[1] >= basket_pos[1] - (self.basket_size / 2) and ball_pos[1] <= basket_pos[1] + (
self.basket_size / 2) self.basket_size / 2)
is_in_basket_z = ball_pos[2] < 0.1 is_in_basket_z = ball_pos[2] < 0.1
is_in_basket = is_in_basket_x and is_in_basket_y and is_in_basket_z is_in_basket = is_in_basket_x and is_in_basket_y and is_in_basket_z
if is_in_basket: if is_in_basket:
self.ball_in_basket = True self.ball_in_basket = True
ball_landed = self.get_body_com("ball")[2] <= 0.05 ball_landed = self.get_body_com("ball")[2] <= 0.05
done = bool(ball_landed or is_in_basket) terminated = bool(ball_landed or is_in_basket)
rewards = 0 rewards = 0
@ -80,7 +93,7 @@ class HopperThrowInBasketEnv(HopperEnv):
costs = ctrl_cost costs = ctrl_cost
if self.current_step >= self.max_episode_steps or done: if self.current_step >= self.max_episode_steps or terminated:
if is_in_basket: if is_in_basket:
if not self.context: if not self.context:
@ -101,23 +114,27 @@ class HopperThrowInBasketEnv(HopperEnv):
info = { info = {
'ball_pos': ball_pos[0], 'ball_pos': ball_pos[0],
} }
truncated = False
return observation, reward, done, info return observation, reward, terminated, truncated, info
def _get_obs(self): def _get_obs(self):
return np.append(super()._get_obs(), self.basket_x) return np.append(super()._get_obs(), self.basket_x)
def reset(self, *, seed: Optional[int] = None, return_info: bool = False, options: Optional[dict] = None): def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
-> Tuple[ObsType, Dict[str, Any]]:
if self.max_episode_steps == 10: if self.max_episode_steps == 10:
# We have to initialize this here, because the spec is only added after creating the env. # We have to initialize this here, because the spec is only added after creating the env.
self.max_episode_steps = self.spec.max_episode_steps self.max_episode_steps = self.spec.max_episode_steps
self.current_step = 0 self.current_step = 0
self.ball_in_basket = False self.ball_in_basket = False
ret = super().reset(seed=seed, options=options)
if self.context: if self.context:
self.basket_x = self.np_random.uniform(low=3, high=7, size=1) self.basket_x = self.np_random.uniform(low=3, high=7, size=1)
self.model.body("basket_ground").pos[:] = [self.basket_x[0], 0, 0] self.model.body("basket_ground").pos[:] = [self.basket_x[0], 0, 0]
return super().reset() return ret
# overwrite reset_model to make it deterministic # overwrite reset_model to make it deterministic
def reset_model(self): def reset_model(self):
@ -132,22 +149,3 @@ class HopperThrowInBasketEnv(HopperEnv):
observation = self._get_obs() observation = self._get_obs()
return observation return observation
if __name__ == '__main__':
render_mode = "human" # "human" or "partial" or "final"
env = HopperThrowInBasketEnv()
obs = env.reset()
for i in range(2000):
# objective.load_result("/tmp/cma")
# test with random actions
ac = env.action_space.sample()
obs, rew, d, info = env.step(ac)
if i % 10 == 0:
env.render(mode=render_mode)
if d:
print('After ', i, ' steps, done: ', d)
env.reset()
env.close()


@ -6,6 +6,11 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper): class MPWrapper(RawInterfaceWrapper):
mp_config = {
'ProMP': {},
'DMP': {},
'ProDMP': {},
}
@property @property
def context_mask(self): def context_mask(self):


@ -7,6 +7,16 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper): class MPWrapper(RawInterfaceWrapper):
mp_config = {
'ProMP': {},
'DMP': {
'phase_generator_kwargs': {
'alpha_phase': 2,
},
},
'ProDMP': {},
}
@property @property
def context_mask(self): def context_mask(self):
return np.concatenate([[False] * self.n_links, # cos return np.concatenate([[False] * self.n_links, # cos


@ -1,8 +1,9 @@
import os import os
import numpy as np import numpy as np
from gym import utils from gymnasium import utils
from gym.envs.mujoco import MujocoEnv from gymnasium.envs.mujoco import MujocoEnv
from gymnasium.spaces import Box
MAX_EPISODE_STEPS_REACHER = 200 MAX_EPISODE_STEPS_REACHER = 200
@ -12,7 +13,17 @@ class ReacherEnv(MujocoEnv, utils.EzPickle):
More general version of the gym mujoco Reacher environment More general version of the gym mujoco Reacher environment
""" """
def __init__(self, sparse: bool = False, n_links: int = 5, reward_weight: float = 1, ctrl_cost_weight: float = 1): metadata = {
"render_modes": [
"human",
"rgb_array",
"depth_array",
],
"render_fps": 50,
}
def __init__(self, sparse: bool = False, n_links: int = 5, reward_weight: float = 1, ctrl_cost_weight: float = 1.,
**kwargs):
utils.EzPickle.__init__(**locals()) utils.EzPickle.__init__(**locals())
self._steps = 0 self._steps = 0
@ -25,10 +36,16 @@ class ReacherEnv(MujocoEnv, utils.EzPickle):
file_name = f'reacher_{n_links}links.xml' file_name = f'reacher_{n_links}links.xml'
# sin, cos, velocity * n_Links + goal position (2) and goal distance (3)
shape = (self.n_links * 3 + 5,)
observation_space = Box(low=-np.inf, high=np.inf, shape=shape, dtype=np.float64)
MujocoEnv.__init__(self, MujocoEnv.__init__(self,
model_path=os.path.join(os.path.dirname(__file__), "assets", file_name), model_path=os.path.join(os.path.dirname(__file__), "assets", file_name),
frame_skip=2, frame_skip=2,
mujoco_bindings="mujoco") observation_space=observation_space,
**kwargs
)
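The reacher observation is now sized explicitly from the link count (sin, cos and joint velocity per link, plus the 2D goal position and the 3D fingertip-goal distance), and the resulting `Box` is handed to `MujocoEnv.__init__` instead of the removed `mujoco_bindings` argument. For the default 5-link reacher this gives (not part of the commit):

```python
n_links = 5
obs_dim = n_links * 3 + 5  # sin + cos + qvel per link, goal xy (2), goal distance (3)
print(obs_dim)             # -> 20
```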
def step(self, action): def step(self, action):
self._steps += 1 self._steps += 1
@ -45,10 +62,14 @@ class ReacherEnv(MujocoEnv, utils.EzPickle):
reward = reward_dist + reward_ctrl + angular_vel reward = reward_dist + reward_ctrl + angular_vel
self.do_simulation(action, self.frame_skip) self.do_simulation(action, self.frame_skip)
ob = self._get_obs() if self.render_mode == "human":
done = False self.render()
infos = dict( ob = self._get_obs()
terminated = False
truncated = False
info = dict(
reward_dist=reward_dist, reward_dist=reward_dist,
reward_ctrl=reward_ctrl, reward_ctrl=reward_ctrl,
velocity=angular_vel, velocity=angular_vel,
@ -56,7 +77,7 @@ class ReacherEnv(MujocoEnv, utils.EzPickle):
goal=self.goal if hasattr(self, "goal") else None goal=self.goal if hasattr(self, "goal") else None
) )
return ob, reward, done, infos return ob, reward, terminated, truncated, info
def distance_reward(self): def distance_reward(self):
vec = self.get_body_com("fingertip") - self.get_body_com("target") vec = self.get_body_com("fingertip") - self.get_body_com("target")
@ -66,6 +87,7 @@ class ReacherEnv(MujocoEnv, utils.EzPickle):
return -10 * np.square(self.data.qvel.flat[:self.n_links]).sum() if self.sparse else 0.0 return -10 * np.square(self.data.qvel.flat[:self.n_links]).sum() if self.sparse else 0.0
def viewer_setup(self): def viewer_setup(self):
assert self.viewer is not None
self.viewer.cam.trackbodyid = 0 self.viewer.cam.trackbodyid = 0
def reset_model(self): def reset_model(self):


@ -7,6 +7,53 @@ from fancy_gym.envs.mujoco.table_tennis.table_tennis_utils import jnt_pos_low, j
class TT_MPWrapper(RawInterfaceWrapper): class TT_MPWrapper(RawInterfaceWrapper):
mp_config = {
'ProMP': {
'phase_generator_kwargs': {
'learn_tau': False,
'learn_delay': False,
'tau_bound': [0.8, 1.5],
'delay_bound': [0.05, 0.15],
},
'controller_kwargs': {
'p_gains': 0.5 * np.array([1.0, 4.0, 2.0, 4.0, 1.0, 4.0, 1.0]),
'd_gains': 0.5 * np.array([0.1, 0.4, 0.2, 0.4, 0.1, 0.4, 0.1]),
},
'basis_generator_kwargs': {
'num_basis': 3,
'num_basis_zero_start': 1,
'num_basis_zero_goal': 1,
},
'black_box_kwargs': {
'verbose': 2,
},
},
'DMP': {},
'ProDMP': {
'phase_generator_kwargs': {
'learn_tau': True,
'learn_delay': True,
'tau_bound': [0.8, 1.5],
'delay_bound': [0.05, 0.15],
'alpha_phase': 3,
},
'controller_kwargs': {
'p_gains': 0.5 * np.array([1.0, 4.0, 2.0, 4.0, 1.0, 4.0, 1.0]),
'd_gains': 0.5 * np.array([0.1, 0.4, 0.2, 0.4, 0.1, 0.4, 0.1]),
},
'basis_generator_kwargs': {
'num_basis': 3,
'alpha': 25,
'basis_bandwidth_factor': 3,
},
'trajectory_generator_kwargs': {
'weights_scale': 0.7,
'auto_scale_basis': True,
'relative_goal': True,
'disable_goal': True,
},
},
}
# Random x goal + random init pos # Random x goal + random init pos
@property @property
@ -16,7 +63,7 @@ class TT_MPWrapper(RawInterfaceWrapper):
[False] * 7, # joints velocity [False] * 7, # joints velocity
[True] * 2, # position ball x, y [True] * 2, # position ball x, y
[False] * 1, # position ball z [False] * 1, # position ball z
#[True] * 3, # velocity ball x, y, z # [True] * 3, # velocity ball x, y, z
[True] * 2, # target landing position [True] * 2, # target landing position
# [True] * 1, # time # [True] * 1, # time
]) ])
@ -40,7 +87,42 @@ class TT_MPWrapper(RawInterfaceWrapper):
return_contextual_obs: bool, tau_bound:list, delay_bound:list) -> Tuple[np.ndarray, float, bool, dict]: return_contextual_obs: bool, tau_bound:list, delay_bound:list) -> Tuple[np.ndarray, float, bool, dict]:
return self.get_invalid_traj_step_return(action, pos_traj, return_contextual_obs, tau_bound, delay_bound) return self.get_invalid_traj_step_return(action, pos_traj, return_contextual_obs, tau_bound, delay_bound)
class TT_MPWrapper_Replan(TT_MPWrapper):
mp_config = {
'ProMP': {},
'DMP': {},
'ProDMP': {
'phase_generator_kwargs': {
'learn_tau': True,
'learn_delay': True,
'tau_bound': [0.8, 1.5],
'delay_bound': [0.05, 0.15],
'alpha_phase': 3,
},
'controller_kwargs': {
'p_gains': 0.5 * np.array([1.0, 4.0, 2.0, 4.0, 1.0, 4.0, 1.0]),
'd_gains': 0.5 * np.array([0.1, 0.4, 0.2, 0.4, 0.1, 0.4, 0.1]),
},
'basis_generator_kwargs': {
'num_basis': 2,
'alpha': 25,
'basis_bandwidth_factor': 3,
},
'trajectory_generator_kwargs': {
'auto_scale_basis': True,
'goal_offset': 1.0,
},
'black_box_kwargs': {
'max_planning_times': 3,
'replanning_schedule': lambda pos, vel, obs, action, t: t % 50 == 0,
},
},
}
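The replanning variant keeps the ProDMP setup but adds a schedule: `t % 50 == 0` marks the steps at which a new plan may be generated, and `max_planning_times` caps how many plans an episode may use. A rough illustration, assuming the black-box wrapper consults the schedule at every step and stops replanning once the cap is reached (not part of the commit):

```python
# Steps at which the schedule alone would trigger a new plan in a 250-step episode:
fire_steps = [t for t in range(250) if t % 50 == 0]
print(fire_steps)  # [0, 50, 100, 150, 200] -- capped to 3 plans by max_planning_times
```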
class TTVelObs_MPWrapper(TT_MPWrapper): class TTVelObs_MPWrapper(TT_MPWrapper):
# Will inherit mp_config from TT_MPWrapper
@property @property
def context_mask(self): def context_mask(self):
@ -52,4 +134,20 @@ class TTVelObs_MPWrapper(TT_MPWrapper):
[True] * 3, # velocity ball x, y, z [True] * 3, # velocity ball x, y, z
[True] * 2, # target landing position [True] * 2, # target landing position
# [True] * 1, # time # [True] * 1, # time
]) ])
class TTVelObs_MPWrapper_Replan(TT_MPWrapper_Replan):
# Will inherit mp_config from TT_MPWrapper_Replan
@property
def context_mask(self):
return np.hstack([
[False] * 7, # joints position
[False] * 7, # joints velocity
[True] * 2, # position ball x, y
[False] * 1, # position ball z
[True] * 3, # velocity ball x, y, z
[True] * 2, # target landing position
# [True] * 1, # time
])
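These context masks line up with the observation sizes declared in the environment file below: 7 joint positions, 7 joint velocities, the ball position (x, y and z), the 2D target landing position, and optionally the ball velocity. A quick count (not part of the commit):

```python
import numpy as np

base_mask = np.hstack([[False] * 7, [False] * 7, [True] * 2, [False] * 1, [True] * 2])
vel_mask = np.hstack([[False] * 7, [False] * 7, [True] * 2, [False] * 1, [True] * 3, [True] * 2])
print(base_mask.size, vel_mask.size)  # -> 19 22, matching the Box shapes (19,) and (22,)
```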


@ -1,8 +1,8 @@
import os import os
import numpy as np import numpy as np
from gym import utils, spaces from gymnasium import utils, spaces
from gym.envs.mujoco import MujocoEnv from gymnasium.envs.mujoco import MujocoEnv
from fancy_gym.envs.mujoco.table_tennis.table_tennis_utils import is_init_state_valid, magnus_force from fancy_gym.envs.mujoco.table_tennis.table_tennis_utils import is_init_state_valid, magnus_force
from fancy_gym.envs.mujoco.table_tennis.table_tennis_utils import jnt_pos_low, jnt_pos_high from fancy_gym.envs.mujoco.table_tennis.table_tennis_utils import jnt_pos_low, jnt_pos_high
@ -22,6 +22,16 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
""" """
7 DoF table tennis environment 7 DoF table tennis environment
""" """
metadata = {
"render_modes": [
"human",
"rgb_array",
"depth_array",
],
"render_fps": 125
}
def __init__(self, ctxt_dim: int = 4, frame_skip: int = 4, def __init__(self, ctxt_dim: int = 4, frame_skip: int = 4,
goal_switching_step: int = None, goal_switching_step: int = None,
enable_artificial_wind: bool = False): enable_artificial_wind: bool = False):
@ -50,11 +60,16 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
self._artificial_force = 0. self._artificial_force = 0.
if not hasattr(self, 'observation_space'):
self.observation_space = spaces.Box(
low=-np.inf, high=np.inf, shape=(19,), dtype=np.float64
)
MujocoEnv.__init__(self, MujocoEnv.__init__(self,
model_path=os.path.join(os.path.dirname(__file__), "assets", "xml", "table_tennis_env.xml"), model_path=os.path.join(os.path.dirname(__file__), "assets", "xml", "table_tennis_env.xml"),
frame_skip=frame_skip, frame_skip=frame_skip,
mujoco_bindings="mujoco") observation_space=self.observation_space)
if ctxt_dim == 2: if ctxt_dim == 2:
self.context_bounds = CONTEXT_BOUNDS_2DIMS self.context_bounds = CONTEXT_BOUNDS_2DIMS
elif ctxt_dim == 4: elif ctxt_dim == 4:
@ -83,11 +98,11 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
unstable_simulation = False unstable_simulation = False
if self._steps == self._goal_switching_step and self.np_random.uniform() < 0.5: if self._steps == self._goal_switching_step and self.np_random.uniform() < 0.5:
new_goal_pos = self._generate_goal_pos(random=True) new_goal_pos = self._generate_goal_pos(random=True)
new_goal_pos[1] = -new_goal_pos[1] new_goal_pos[1] = -new_goal_pos[1]
self._goal_pos = new_goal_pos self._goal_pos = new_goal_pos
self.model.body_pos[5] = np.concatenate([self._goal_pos, [0.77]]) self.model.body_pos[5] = np.concatenate([self._goal_pos, [0.77]])
mujoco.mj_forward(self.model, self.data) mujoco.mj_forward(self.model, self.data)
for _ in range(self.frame_skip): for _ in range(self.frame_skip):
if self._enable_artificial_wind: if self._enable_artificial_wind:
@ -102,7 +117,7 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
if not self._hit_ball: if not self._hit_ball:
self._hit_ball = self._contact_checker(self._ball_contact_id, self._bat_front_id) or \ self._hit_ball = self._contact_checker(self._ball_contact_id, self._bat_front_id) or \
self._contact_checker(self._ball_contact_id, self._bat_back_id) self._contact_checker(self._ball_contact_id, self._bat_back_id)
if not self._hit_ball: if not self._hit_ball:
ball_land_on_floor_no_hit = self._contact_checker(self._ball_contact_id, self._floor_contact_id) ball_land_on_floor_no_hit = self._contact_checker(self._ball_contact_id, self._floor_contact_id)
if ball_land_on_floor_no_hit: if ball_land_on_floor_no_hit:
@ -130,9 +145,9 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
reward = -25 if unstable_simulation else self._get_reward(self._terminated) reward = -25 if unstable_simulation else self._get_reward(self._terminated)
land_dist_err = np.linalg.norm(self._ball_landing_pos[:-1] - self._goal_pos) \ land_dist_err = np.linalg.norm(self._ball_landing_pos[:-1] - self._goal_pos) \
if self._ball_landing_pos is not None else 10. if self._ball_landing_pos is not None else 10.
return self._get_obs(), reward, self._terminated, { info = {
"hit_ball": self._hit_ball, "hit_ball": self._hit_ball,
"ball_returned_success": self._ball_return_success, "ball_returned_success": self._ball_return_success,
"land_dist_error": land_dist_err, "land_dist_error": land_dist_err,
@ -140,6 +155,10 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
"num_steps": self._steps, "num_steps": self._steps,
} }
terminated, truncated = self._terminated, False
return self._get_obs(), reward, terminated, truncated, info
def _contact_checker(self, id_1, id_2): def _contact_checker(self, id_1, id_2):
for coni in range(0, self.data.ncon): for coni in range(0, self.data.ncon):
con = self.data.contact[coni] con = self.data.contact[coni]
@ -202,7 +221,7 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
if not self._hit_ball: if not self._hit_ball:
return 0.2 * (1 - np.tanh(min_r_b_dist**2)) return 0.2 * (1 - np.tanh(min_r_b_dist**2))
if self._ball_landing_pos is None: if self._ball_landing_pos is None:
min_b_des_b_dist = np.min(np.linalg.norm(np.array(self._ball_traj)[:,:2] - self._goal_pos[:2], axis=1)) min_b_des_b_dist = np.min(np.linalg.norm(np.array(self._ball_traj)[:, :2] - self._goal_pos[:2], axis=1))
return 2 * (1 - np.tanh(min_r_b_dist ** 2)) + (1 - np.tanh(min_b_des_b_dist**2)) return 2 * (1 - np.tanh(min_r_b_dist ** 2)) + (1 - np.tanh(min_b_des_b_dist**2))
min_b_des_b_land_dist = np.linalg.norm(self._goal_pos[:2] - self._ball_landing_pos[:2]) min_b_des_b_land_dist = np.linalg.norm(self._goal_pos[:2] - self._ball_landing_pos[:2])
over_net_bonus = int(self._ball_landing_pos[0] < 0) over_net_bonus = int(self._ball_landing_pos[0] < 0)
@ -231,13 +250,13 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
violate_high_bound_error = np.mean(np.maximum(pos_traj - jnt_pos_high, 0)) violate_high_bound_error = np.mean(np.maximum(pos_traj - jnt_pos_high, 0))
violate_low_bound_error = np.mean(np.maximum(jnt_pos_low - pos_traj, 0)) violate_low_bound_error = np.mean(np.maximum(jnt_pos_low - pos_traj, 0))
invalid_penalty = tau_invalid_penalty + delay_invalid_penalty + \ invalid_penalty = tau_invalid_penalty + delay_invalid_penalty + \
violate_high_bound_error + violate_low_bound_error violate_high_bound_error + violate_low_bound_error
return -invalid_penalty return -invalid_penalty
def get_invalid_traj_step_return(self, action, pos_traj, contextual_obs, tau_bound, delay_bound): def get_invalid_traj_step_return(self, action, pos_traj, contextual_obs, tau_bound, delay_bound):
obs = self._get_obs() if contextual_obs else np.concatenate([self._get_obs(), np.array([0])]) # 0 for invalid traj obs = self._get_obs() if contextual_obs else np.concatenate([self._get_obs(), np.array([0])]) # 0 for invalid traj
penalty = self._get_traj_invalid_penalty(action, pos_traj, tau_bound, delay_bound) penalty = self._get_traj_invalid_penalty(action, pos_traj, tau_bound, delay_bound)
return obs, penalty, True, { return obs, penalty, True, False, {
"hit_ball": [False], "hit_ball": [False],
"ball_returned_success": [False], "ball_returned_success": [False],
"land_dist_error": [10.], "land_dist_error": [10.],
@ -249,7 +268,7 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
@staticmethod @staticmethod
def check_traj_validity(action, pos_traj, vel_traj, tau_bound, delay_bound): def check_traj_validity(action, pos_traj, vel_traj, tau_bound, delay_bound):
time_invalid = action[0] > tau_bound[1] or action[0] < tau_bound[0] \ time_invalid = action[0] > tau_bound[1] or action[0] < tau_bound[0] \
or action[1] > delay_bound[1] or action[1] < delay_bound[0] or action[1] > delay_bound[1] or action[1] < delay_bound[0]
if time_invalid or np.any(pos_traj > jnt_pos_high) or np.any(pos_traj < jnt_pos_low): if time_invalid or np.any(pos_traj > jnt_pos_high) or np.any(pos_traj < jnt_pos_low):
return False, pos_traj, vel_traj return False, pos_traj, vel_traj
return True, pos_traj, vel_traj return True, pos_traj, vel_traj
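To make the validity check concrete: an MP action whose learned `tau` (or `delay`) falls outside its bound is rejected before any simulation, and the wrapper falls back to `get_invalid_traj_step_return`. A minimal sketch of the time check with a hypothetical action head (not part of the commit):

```python
import numpy as np

tau_bound, delay_bound = [0.8, 1.5], [0.05, 0.15]  # bounds from the ProDMP config above
action = np.array([2.0, 0.10])                     # hypothetical (tau, delay) head of an MP action

# Mirrors the time check in check_traj_validity: tau above its upper bound invalidates the trajectory.
time_invalid = (action[0] > tau_bound[1] or action[0] < tau_bound[0]
                or action[1] > delay_bound[1] or action[1] < delay_bound[0])
print(time_invalid)  # -> True: the invalid-trajectory penalty path is taken instead of a rollout
```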
@ -257,6 +276,9 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
class TableTennisWind(TableTennisEnv): class TableTennisWind(TableTennisEnv):
def __init__(self, ctxt_dim: int = 4, frame_skip: int = 4): def __init__(self, ctxt_dim: int = 4, frame_skip: int = 4):
self.observation_space = spaces.Box(
low=-np.inf, high=np.inf, shape=(22,), dtype=np.float64
)
super().__init__(ctxt_dim=ctxt_dim, frame_skip=frame_skip, enable_artificial_wind=True) super().__init__(ctxt_dim=ctxt_dim, frame_skip=frame_skip, enable_artificial_wind=True)
def _get_obs(self): def _get_obs(self):


@ -1,64 +1,60 @@
<mujoco model="walker2d"> <mujoco model="walker2d">
<compiler angle="degree" coordinate="global" inertiafromgeom="true"/> <compiler angle="radian" autolimits="true"/>
<default> <option integrator="RK4"/>
<joint armature="0.01" damping=".1" limited="true"/> <default class="main">
<geom conaffinity="0" condim="3" contype="1" density="1000" friction=".7 .1 .1" rgba="0.8 0.6 .4 1"/> <joint limited="true" armature="0.01" damping="0.1"/>
<geom conaffinity="0" friction="0.7 0.1 0.1" rgba="0.8 0.6 0.4 1"/>
</default> </default>
<option integrator="RK4" timestep="0.002"/> <asset>
<texture type="skybox" builtin="gradient" rgb1="0.4 0.5 0.6" rgb2="0 0 0" width="100" height="600"/>
<texture type="cube" name="texgeom" builtin="flat" mark="cross" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" markrgb="1 1 1" width="127" height="762"/>
<texture type="2d" name="texplane" builtin="checker" rgb1="0 0 0" rgb2="0.8 0.8 0.8" width="100" height="100"/>
<material name="MatPlane" texture="texplane" texrepeat="60 60" specular="1" shininess="1" reflectance="0.5"/>
<material name="geom" texture="texgeom" texuniform="true"/>
</asset>
<worldbody> <worldbody>
<light cutoff="100" diffuse="1 1 1" dir="-0 0 -1.3" directional="true" exponent="1" pos="0 0 1.3" specular=".1 .1 .1"/> <geom name="floor" size="40 40 40" type="plane" conaffinity="1" material="MatPlane" rgba="0.8 0.9 0.8 1"/>
<geom conaffinity="1" condim="3" name="floor" pos="0 0 0" rgba="0.8 0.9 0.8 1" size="40 40 40" type="plane" material="MatPlane"/> <light pos="0 0 1.3" dir="0 0 -1" directional="true" cutoff="100" exponent="1" diffuse="1 1 1" specular="0.1 0.1 0.1"/>
<body name="torso" pos="0 0 1.25"> <body name="torso" pos="0 0 1.25" gravcomp="0">
<camera name="track" mode="trackcom" pos="0 -3 1" xyaxes="1 0 0 0 0 1"/> <joint name="rootx" pos="0 0 -1.25" axis="1 0 0" limited="false" type="slide" armature="0" damping="0"/>
<joint armature="0" axis="1 0 0" damping="0" limited="false" name="rootx" pos="0 0 0" stiffness="0" type="slide"/> <joint name="rootz" pos="0 0 -1.25" axis="0 0 1" limited="false" type="slide" ref="1.25" armature="0" damping="0"/>
<joint armature="0" axis="0 0 1" damping="0" limited="false" name="rootz" pos="0 0 0" ref="1.25" stiffness="0" type="slide"/> <joint name="rooty" pos="0 0 0" axis="0 1 0" limited="false" armature="0" damping="0"/>
<joint armature="0" axis="0 1 0" damping="0" limited="false" name="rooty" pos="0 0 1.25" stiffness="0" type="hinge"/> <geom name="torso_geom" size="0.05 0.2" type="capsule" friction="0.9 0.1 0.1"/>
<geom friction="0.9" fromto="0 0 1.45 0 0 1.05" name="torso_geom" size="0.05" type="capsule"/> <camera name="track" pos="0 -3 -0.25" quat="0.707107 0.707107 0 0" mode="trackcom"/>
<body name="thigh" pos="0 0 1.05"> <body name="thigh" pos="0 0 -0.2" gravcomp="0">
<joint axis="0 -1 0" name="thigh_joint" pos="0 0 1.05" range="-150 0" type="hinge"/> <joint name="thigh_joint" pos="0 0 0" axis="0 -1 0" range="-2.61799 0"/>
<geom friction="0.9" fromto="0 0 1.05 0 0 0.6" name="thigh_geom" size="0.05" type="capsule"/> <geom name="thigh_geom" size="0.05 0.225" pos="0 0 -0.225" type="capsule" friction="0.9 0.1 0.1"/>
<body name="leg" pos="0 0 0.35"> <body name="leg" pos="0 0 -0.7" gravcomp="0">
<joint axis="0 -1 0" name="leg_joint" pos="0 0 0.6" range="-150 0" type="hinge"/> <joint name="leg_joint" pos="0 0 0.25" axis="0 -1 0" range="-2.61799 0"/>
<geom friction="0.9" fromto="0 0 0.6 0 0 0.1" name="leg_geom" size="0.04" type="capsule"/> <geom name="leg_geom" size="0.04 0.25" type="capsule" friction="0.9 0.1 0.1"/>
<body name="foot" pos="0.2/2 0 0.1"> <body name="foot" pos="0.1 0 -0.25" gravcomp="0">
<site name="foot_right_site" pos="0 0 0.04" size="0.02 0.02 0.02" rgba="0 0 1 1" type="sphere"/> <joint name="foot_joint" pos="-0.1 0 0" axis="0 -1 0" range="-0.785398 0.785398"/>
<joint axis="0 -1 0" name="foot_joint" pos="0 0 0.1" range="-45 45" type="hinge"/> <geom name="foot_geom" size="0.06 0.1" quat="0.707107 0 -0.707107 0" type="capsule" friction="0.9 0.1 0.1"/>
<geom friction="0.9" fromto="-0.0 0 0.1 0.2 0 0.1" name="foot_geom" size="0.06" type="capsule"/> <site name="foot_right_site" pos="-0.1 0 -0.06" size="0.02" rgba="0 0 1 1"/>
</body> </body>
</body> </body>
</body> </body>
<!-- copied and then replace thigh->thigh_left, leg->leg_left, foot->foot_right --> <body name="thigh_left" pos="0 0 -0.2" gravcomp="0">
<body name="thigh_left" pos="0 0 1.05"> <joint name="thigh_left_joint" pos="0 0 0" axis="0 -1 0" range="-2.61799 0"/>
<joint axis="0 -1 0" name="thigh_left_joint" pos="0 0 1.05" range="-150 0" type="hinge"/> <geom name="thigh_left_geom" size="0.05 0.225" pos="0 0 -0.225" type="capsule" friction="0.9 0.1 0.1" rgba="0.7 0.3 0.6 1"/>
<geom friction="0.9" fromto="0 0 1.05 0 0 0.6" name="thigh_left_geom" rgba=".7 .3 .6 1" size="0.05" type="capsule"/> <body name="leg_left" pos="0 0 -0.7" gravcomp="0">
<body name="leg_left" pos="0 0 0.35"> <joint name="leg_left_joint" pos="0 0 0.25" axis="0 -1 0" range="-2.61799 0"/>
<joint axis="0 -1 0" name="leg_left_joint" pos="0 0 0.6" range="-150 0" type="hinge"/> <geom name="leg_left_geom" size="0.04 0.25" type="capsule" friction="0.9 0.1 0.1" rgba="0.7 0.3 0.6 1"/>
<geom friction="0.9" fromto="0 0 0.6 0 0 0.1" name="leg_left_geom" rgba=".7 .3 .6 1" size="0.04" type="capsule"/> <body name="foot_left" pos="0.1 0 -0.25" gravcomp="0">
<body name="foot_left" pos="0.2/2 0 0.1"> <joint name="foot_left_joint" pos="-0.1 0 0" axis="0 -1 0" range="-0.785398 0.785398"/>
<site name="foot_left_site" pos="0 0 0.04" size="0.02 0.02 0.02" rgba="1 0 0 1" type="sphere"/> <geom name="foot_left_geom" size="0.06 0.1" quat="0.707107 0 -0.707107 0" type="capsule" friction="1.9 0.1 0.1" rgba="0.7 0.3 0.6 1"/>
<joint axis="0 -1 0" name="foot_left_joint" pos="0 0 0.1" range="-45 45" type="hinge"/> <site name="foot_left_site" pos="-0.1 0 -0.06" size="0.02" rgba="1 0 0 1"/>
<geom friction="1.9" fromto="-0.0 0 0.1 0.2 0 0.1" name="foot_left_geom" rgba=".7 .3 .6 1" size="0.06" type="capsule"/>
</body> </body>
</body> </body>
</body> </body>
</body> </body>
</worldbody> </worldbody>
<actuator> <actuator>
<!-- <motor joint="torso_joint" ctrlrange="-100.0 100.0" isctrllimited="true"/>--> <general joint="thigh_joint" ctrlrange="-1 1" gear="100 0 0 0 0 0" actdim="0"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="100" joint="thigh_joint"/> <general joint="leg_joint" ctrlrange="-1 1" gear="100 0 0 0 0 0" actdim="0"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="100" joint="leg_joint"/> <general joint="foot_joint" ctrlrange="-1 1" gear="100 0 0 0 0 0" actdim="0"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="100" joint="foot_joint"/> <general joint="thigh_left_joint" ctrlrange="-1 1" gear="100 0 0 0 0 0" actdim="0"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="100" joint="thigh_left_joint"/> <general joint="leg_left_joint" ctrlrange="-1 1" gear="100 0 0 0 0 0" actdim="0"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="100" joint="leg_left_joint"/> <general joint="foot_left_joint" ctrlrange="-1 1" gear="100 0 0 0 0 0" actdim="0"/>
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="100" joint="foot_left_joint"/>
<!-- <motor joint="finger2_rot" ctrlrange="-20.0 20.0" isctrllimited="true"/>-->
</actuator> </actuator>
<asset>
<texture type="skybox" builtin="gradient" rgb1=".4 .5 .6" rgb2="0 0 0"
width="100" height="100"/>
<texture builtin="flat" height="1278" mark="cross" markrgb="1 1 1" name="texgeom" random="0.01" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" type="cube" width="127"/>
<texture builtin="checker" height="100" name="texplane" rgb1="0 0 0" rgb2="0.8 0.8 0.8" type="2d" width="100"/>
<material name="MatPlane" reflectance="0.5" shininess="1" specular="1" texrepeat="60 60" texture="texplane"/>
<material name="geom" texture="texgeom" texuniform="true"/>
</asset>
</mujoco> </mujoco>


@ -6,6 +6,11 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper): class MPWrapper(RawInterfaceWrapper):
mp_config = {
'ProMP': {},
'DMP': {},
'ProDMP': {},
}
@property @property
def context_mask(self): def context_mask(self):


@ -1,8 +1,13 @@
import os import os
from typing import Optional from typing import Optional, Any, Dict, Tuple
import numpy as np import numpy as np
from gym.envs.mujoco.walker2d_v4 import Walker2dEnv from gymnasium.envs.mujoco.walker2d_v4 import Walker2dEnv, DEFAULT_CAMERA_CONFIG
from gymnasium.core import ObsType
from gymnasium import utils
from gymnasium.envs.mujoco import MujocoEnv
from gymnasium.spaces import Box
MAX_EPISODE_STEPS_WALKERJUMP = 300 MAX_EPISODE_STEPS_WALKERJUMP = 300
@ -11,8 +16,71 @@ MAX_EPISODE_STEPS_WALKERJUMP = 300
# to the same structure as the Hopper, where the angles are randomized (->contexts) and the agent should jump as height # to the same structure as the Hopper, where the angles are randomized (->contexts) and the agent should jump as height
# as possible, while landing at a specific target position # as possible, while landing at a specific target position
class Walker2dEnvCustomXML(Walker2dEnv):
def __init__(
self,
xml_file,
forward_reward_weight=1.0,
ctrl_cost_weight=1e-3,
healthy_reward=1.0,
terminate_when_unhealthy=True,
healthy_z_range=(0.8, 2.0),
healthy_angle_range=(-1.0, 1.0),
reset_noise_scale=5e-3,
exclude_current_positions_from_observation=True,
**kwargs,
):
utils.EzPickle.__init__(
self,
xml_file,
forward_reward_weight,
ctrl_cost_weight,
healthy_reward,
terminate_when_unhealthy,
healthy_z_range,
healthy_angle_range,
reset_noise_scale,
exclude_current_positions_from_observation,
**kwargs,
)
class Walker2dJumpEnv(Walker2dEnv): self._forward_reward_weight = forward_reward_weight
self._ctrl_cost_weight = ctrl_cost_weight
self._healthy_reward = healthy_reward
self._terminate_when_unhealthy = terminate_when_unhealthy
self._healthy_z_range = healthy_z_range
self._healthy_angle_range = healthy_angle_range
self._reset_noise_scale = reset_noise_scale
self._exclude_current_positions_from_observation = (
exclude_current_positions_from_observation
)
if exclude_current_positions_from_observation:
observation_space = Box(
low=-np.inf, high=np.inf, shape=(18,), dtype=np.float64
)
else:
observation_space = Box(
low=-np.inf, high=np.inf, shape=(19,), dtype=np.float64
)
self.observation_space = observation_space
MujocoEnv.__init__(
self,
xml_file,
4,
observation_space=observation_space,
default_camera_config=DEFAULT_CAMERA_CONFIG,
**kwargs,
)
class Walker2dJumpEnv(Walker2dEnvCustomXML):
""" """
healthy reward 1.0 -> 0.005 -> 0.0025 not from alex healthy reward 1.0 -> 0.005 -> 0.0025 not from alex
penalty 10 -> 0 not from alex penalty 10 -> 0 not from alex
@ -54,13 +122,13 @@ class Walker2dJumpEnv(Walker2dEnv):
self.max_height = max(height, self.max_height) self.max_height = max(height, self.max_height)
done = bool(height < 0.2) terminated = bool(height < 0.2)
ctrl_cost = self.control_cost(action) ctrl_cost = self.control_cost(action)
costs = ctrl_cost costs = ctrl_cost
rewards = 0 rewards = 0
if self.current_step >= self.max_episode_steps or done: if self.current_step >= self.max_episode_steps or terminated:
done = True terminated = True
height_goal_distance = -10 * (np.linalg.norm(self.max_height - self.goal)) height_goal_distance = -10 * (np.linalg.norm(self.max_height - self.goal))
healthy_reward = self.healthy_reward * self.current_step healthy_reward = self.healthy_reward * self.current_step
@ -73,17 +141,20 @@ class Walker2dJumpEnv(Walker2dEnv):
'max_height': self.max_height, 'max_height': self.max_height,
'goal': self.goal, 'goal': self.goal,
} }
truncated = False
return observation, reward, done, info return observation, reward, terminated, truncated, info
def _get_obs(self): def _get_obs(self):
return np.append(super()._get_obs(), self.goal) return np.append(super()._get_obs(), self.goal)
def reset(self, *, seed: Optional[int] = None, return_info: bool = False, options: Optional[dict] = None): def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
-> Tuple[ObsType, Dict[str, Any]]:
self.current_step = 0 self.current_step = 0
self.max_height = 0 self.max_height = 0
ret = super().reset(seed=seed, options=options)
self.goal = self.np_random.uniform(1.5, 2.5, 1) # 1.5 3.0 self.goal = self.np_random.uniform(1.5, 2.5, 1) # 1.5 3.0
return super().reset() return ret
# overwrite reset_model to make it deterministic # overwrite reset_model to make it deterministic
def reset_model(self): def reset_model(self):
@ -97,21 +168,3 @@ class Walker2dJumpEnv(Walker2dEnv):
observation = self._get_obs() observation = self._get_obs()
return observation return observation
if __name__ == '__main__':
render_mode = "human" # "human" or "partial" or "final"
env = Walker2dJumpEnv()
obs = env.reset()
for i in range(6000):
# test with random actions
ac = env.action_space.sample()
obs, rew, d, info = env.step(ac)
if i % 10 == 0:
env.render(mode=render_mode)
if d:
print('After ', i, ' steps, done: ', d)
env.reset()
env.close()

309
fancy_gym/envs/registry.py Normal file
View File

@ -0,0 +1,309 @@
from typing import Tuple, Union, Callable, List, Dict, Any, Optional
import copy
import importlib
import numpy as np
from collections import defaultdict
from collections.abc import Mapping, MutableMapping
from fancy_gym.utils.make_env_helpers import make_bb
from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
from gymnasium import register as gym_register
from gymnasium import make as gym_make
from gymnasium.envs.registration import registry as gym_registry
class DefaultMPWrapper(RawInterfaceWrapper):
@property
def context_mask(self):
"""
Returns boolean mask of the same shape as the observation space.
It determines, per dimension, whether that part of the observation is included in the context or not.
This effectively allows filtering out unwanted or unnecessary observations from the full step-based case.
E.g. velocities that start at 0 only change after the first action. Given that we only receive the
context (i.e. part of the first observation), the velocities are not necessary in the observation for the task.
Returns:
bool array representing the indices of the observations
"""
# If the env already defines a context_mask, we will use that
if hasattr(self.env, 'context_mask'):
return self.env.context_mask
# Otherwise we will use the whole observation as the context. (Write a custom MPWrapper to change this behavior; a minimal sketch follows after this class.)
return np.full(self.env.observation_space.shape, True)
@property
def current_pos(self) -> Union[float, int, np.ndarray, Tuple]:
"""
Returns the current position of the action/control dimension.
The dimensionality has to match the action/control dimension.
This is not required when exclusively using velocity control;
it should, however, be implemented regardless.
E.g. The joint positions that are directly or indirectly controlled by the action.
"""
assert hasattr(self.env, 'current_pos'), 'DefaultMPWrapper was unable to access env.current_pos. Please write a custom MPWrapper (recommended) or expose this attribute directly.'
return self.env.current_pos
@property
def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
"""
Returns the current velocity of the action/control dimension.
The dimensionality has to match the action/control dimension.
This is not required when exclusively using position control;
it should, however, be implemented regardless.
E.g. The joint velocities that are directly or indirectly controlled by the action.
"""
assert hasattr(self.env, 'current_vel'), 'DefaultMPWrapper was unable to access env.current_vel. Please write a custom MPWrapper (recommended) or expose this attribute directly.'
return self.env.current_vel
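To illustrate the interface that `DefaultMPWrapper` expects, a minimal custom wrapper for a hypothetical 5-joint, MuJoCo-style reacher env (the observation layout, the `qpos`/`qvel` access, and the class name are assumptions) could look like this:

```python
import numpy as np
from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper


class MyReacherMPWrapper(RawInterfaceWrapper):
    @property
    def context_mask(self):
        # observation assumed to be [5 joint positions, 5 joint velocities, 2D goal];
        # keep only positions and goal as context, drop the initially-zero velocities
        return np.hstack([np.full(5, True), np.full(5, False), np.full(2, True)])

    @property
    def current_pos(self):
        return self.env.data.qpos[:5].copy()  # assumed MuJoCo-style state access

    @property
    def current_vel(self):
        return self.env.data.qvel[:5].copy()
```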
_BB_DEFAULTS = {
'ProMP': {
'wrappers': [],
'trajectory_generator_kwargs': {
'trajectory_generator_type': 'promp'
},
'phase_generator_kwargs': {
'phase_generator_type': 'linear'
},
'controller_kwargs': {
'controller_type': 'motor',
'p_gains': 1.0,
'd_gains': 0.1,
},
'basis_generator_kwargs': {
'basis_generator_type': 'zero_rbf',
'num_basis': 5,
'num_basis_zero_start': 1,
'basis_bandwidth_factor': 3.0,
},
'black_box_kwargs': {
}
},
'DMP': {
'wrappers': [],
'trajectory_generator_kwargs': {
'trajectory_generator_type': 'dmp'
},
'phase_generator_kwargs': {
'phase_generator_type': 'exp'
},
'controller_kwargs': {
'controller_type': 'motor',
'p_gains': 1.0,
'd_gains': 0.1,
},
'basis_generator_kwargs': {
'basis_generator_type': 'rbf',
'num_basis': 5
},
'black_box_kwargs': {
}
},
'ProDMP': {
'wrappers': [],
'trajectory_generator_kwargs': {
'trajectory_generator_type': 'prodmp',
'duration': 2.0,
'weights_scale': 1.0,
},
'phase_generator_kwargs': {
'phase_generator_type': 'exp',
'tau': 1.5,
},
'controller_kwargs': {
'controller_type': 'motor',
'p_gains': 1.0,
'd_gains': 0.1,
},
'basis_generator_kwargs': {
'basis_generator_type': 'prodmp',
'alpha': 10,
'num_basis': 5,
},
'black_box_kwargs': {
}
}
}
KNOWN_MPS = list(_BB_DEFAULTS.keys())
_KNOWN_MPS_PLUS_ALL = KNOWN_MPS + ['all']
ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS = {mp_type: [] for mp_type in _KNOWN_MPS_PLUS_ALL}
MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS = {}
def register(
id: str,
entry_point: Optional[Union[Callable, str]] = None,
mp_wrapper: RawInterfaceWrapper = DefaultMPWrapper,
register_step_based: bool = True, # TODO: Detect
add_mp_types: List[str] = KNOWN_MPS,
mp_config_override: Dict[str, Any] = {},
**kwargs
):
"""
Registers a Gymnasium environment, including Movement Primitives (MP) versions.
If you only want to register MP versions for an already registered environment, use fancy_gym.upgrade instead.
Args:
id (str): The unique identifier for the environment.
entry_point (Optional[Union[Callable, str]]): The entry point for creating the environment.
mp_wrapper (RawInterfaceWrapper): The MP wrapper for the environment.
register_step_based (bool): Whether to also register the raw step-based version of the environment (default True).
add_mp_types (List[str]): List of additional MP types to register.
mp_config_override (Dict[str, Any]): Dictionary for overriding MP configuration.
**kwargs: Additional keyword arguments which are passed to the environment constructor.
Notes:
- When `register_step_based` is True, the raw environment will also be registered with Gymnasium; otherwise only the MP versions will be registered.
- `entry_point` can be given as a string, allowing the same notation as gymnasium.
- If `id` already exists in the Gymnasium registry and `register_step_based` is True,
a warning message will be printed, suggesting to set `register_step_based=False` or use `fancy_gym.upgrade`.
Example:
To register a step-based environment with Movement Primitive versions (will use default mp_wrapper):
>>> register("MyEnv-v0", MyEnvClass"my_module:MyEnvClass")
The entry point can also be provided as a string:
>>> register("MyEnv-v0", "my_module:MyEnvClass")
"""
if register_step_based and id in gym_registry:
print(f'[Info] Gymnasium env with id "{id}" already exists. You should supply register_step_based=False or use fancy_gym.upgrade if you only want to register mp versions of an existing env.')
if register_step_based:
assert entry_point != None, 'You need to provide an entry-point, when registering step-based.'
if not callable(mp_wrapper): # mp_wrapper can be given as a String (same notation as for entry_point)
mod_name, attr_name = mp_wrapper.split(':')
mod = importlib.import_module(mod_name)
mp_wrapper = getattr(mod, attr_name)
if register_step_based:
gym_register(id=id, entry_point=entry_point, **kwargs)
upgrade(id, mp_wrapper, add_mp_types, mp_config_override)
def upgrade(
id: str,
mp_wrapper: RawInterfaceWrapper = DefaultMPWrapper,
add_mp_types: List[str] = KNOWN_MPS,
base_id: Optional[str] = None,
mp_config_override: Dict[str, Any] = {},
):
"""
Upgrades an existing Gymnasium environment to include Movement Primitives (MP) versions.
We expect the raw step-based env to be already registered with gymnasium. Otherwise please use fancy_gym.register instead.
Args:
id (str): The unique identifier for the environment.
mp_wrapper (RawInterfaceWrapper): The MP wrapper for the environment (default is DefaultMPWrapper).
add_mp_types (List[str]): List of additional MP types to register (default is KNOWN_MPS).
base_id (Optional[str]): The unique identifier for the environment to upgrade. Will use `id` if none is provided. Can be defined to allow multiple registrations of different versions for the same step-based environment.
mp_config_override (Dict[str, Any]): Dictionary for overriding MP configuration.
Notes:
- The `id` parameter should match the ID of the existing Gymnasium environment you wish to upgrade. You can also pick a new one, but then `base_id` needs to be provided.
- The `mp_wrapper` parameter specifies the MP wrapper to use, allowing for customization.
- `add_mp_types` can be used to specify additional MP types to register alongside the base environment.
- The `base_id` parameter should match the ID of the existing Gymnasium environment you wish to upgrade.
- `mp_config_override` allows for customizing MP configuration if needed.
Example:
To upgrade an existing environment with MP versions:
>>> upgrade("MyEnv-v0", mp_wrapper=CustomMPWrapper)
To upgrade an existing environment with custom MP types and configuration:
>>> upgrade("MyEnv-v0", mp_wrapper=CustomMPWrapper, add_mp_types=["ProDMP", "DMP"], mp_config_override={"param": 42})
"""
if not base_id:
base_id = id
register_mps(id, base_id, mp_wrapper, add_mp_types, mp_config_override)
def register_mps(id: str, base_id: str, mp_wrapper: RawInterfaceWrapper, add_mp_types: List[str] = KNOWN_MPS, mp_config_override: Dict[str, Any] = {}):
for mp_type in add_mp_types:
register_mp(id, base_id, mp_wrapper, mp_type, mp_config_override.get(mp_type, {}))
def register_mp(id: str, base_id: str, mp_wrapper: RawInterfaceWrapper, mp_type: List[str], mp_config_override: Dict[str, Any] = {}):
assert mp_type in KNOWN_MPS, 'Unknown mp_type'
assert id not in ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS[mp_type], f'The environment {id} is already registered for {mp_type}.'
parts = id.split('/')
if len(parts) == 1:
ns, name = 'gym', parts[0]
elif len(parts) == 2:
ns, name = parts[0], parts[1]
else:
raise ValueError('env id can not contain multiple "/".')
parts = name.split('-')
assert len(parts) >= 2 and parts[-1].startswith('v'), 'Malformed env id, must end in -v{int}.'
fancy_id = f'{ns}_{mp_type}/{name}'
gym_register(
id=fancy_id,
entry_point=bb_env_constructor,
kwargs={
'underlying_id': base_id,
'mp_wrapper': mp_wrapper,
'mp_type': mp_type,
'_mp_config_override_register': mp_config_override
}
)
ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS[mp_type].append(fancy_id)
ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS['all'].append(fancy_id)
if ns not in MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS:
MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS[ns] = {mp_type: [] for mp_type in _KNOWN_MPS_PLUS_ALL}
MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS[ns][mp_type].append(fancy_id)
MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS[ns]['all'].append(fancy_id)
def nested_update(base: MutableMapping, update):
"""
Update method for nested Mappings (see the illustration after this function)
Args:
base: main Mapping to be updated
update: updated values for base Mapping
"""
if any([item.endswith('_type') for item in update]):
base = update
return base
for k, v in update.items():
base[k] = nested_update(base.get(k, {}), v) if isinstance(v, Mapping) else v
return base
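Note that `nested_update` treats any override that contains a `*_type` key as a full replacement of that sub-dict rather than a merge. A small illustration (values are arbitrary):

```python
base = {'phase_generator_kwargs': {'phase_generator_type': 'exp', 'tau': 1.5}}

# an override carrying a '*_type' key replaces the whole sub-dict ('tau' is dropped) ...
nested_update(base, {'phase_generator_kwargs': {'phase_generator_type': 'linear'}})
assert base['phase_generator_kwargs'] == {'phase_generator_type': 'linear'}

# ... while overrides without a '*_type' key are merged key by key
nested_update(base, {'phase_generator_kwargs': {'tau': 2.0}})
assert base['phase_generator_kwargs'] == {'phase_generator_type': 'linear', 'tau': 2.0}
```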
def bb_env_constructor(underlying_id, mp_wrapper, mp_type, mp_config_override={}, _mp_config_override_register={}, **kwargs):
raw_underlying_env = gym_make(underlying_id, **kwargs)
underlying_env = mp_wrapper(raw_underlying_env)
mp_config = getattr(underlying_env, 'mp_config') if hasattr(underlying_env, 'mp_config') else {}
active_mp_config = copy.deepcopy(mp_config.get(mp_type, {}))
global_inherit_defaults = mp_config.get('inherit_defaults', True)
inherit_defaults = active_mp_config.pop('inherit_defaults', global_inherit_defaults)
config = copy.deepcopy(_BB_DEFAULTS[mp_type]) if inherit_defaults else {}
nested_update(config, active_mp_config)
nested_update(config, _mp_config_override_register)
nested_update(config, mp_config_override)
wrappers = config.pop('wrappers')
traj_gen_kwargs = config.pop('trajectory_generator_kwargs', {})
black_box_kwargs = config.pop('black_box_kwargs', {})
contr_kwargs = config.pop('controller_kwargs', {})
phase_kwargs = config.pop('phase_generator_kwargs', {})
basis_kwargs = config.pop('basis_generator_kwargs', {})
return make_bb(underlying_env,
wrappers=wrappers,
black_box_kwargs=black_box_kwargs,
traj_gen_kwargs=traj_gen_kwargs,
controller_kwargs=contr_kwargs,
phase_kwargs=phase_kwargs,
basis_kwargs=basis_kwargs,
**config)
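`bb_env_constructor` is what `gym.make` ends up calling for every MP id registered above. A minimal usage sketch of the new registry (the ids, entry point, and override values below are hypothetical):

```python
import gymnasium as gym
import fancy_gym

# register a step-based env together with its MP variants (DefaultMPWrapper is used here)
fancy_gym.register(
    id='fancy/MyEnv-v0',                   # hypothetical id
    entry_point='my_package.envs:MyEnv',   # hypothetical entry point
    add_mp_types=['ProMP', 'ProDMP'],
)

# if the step-based env is already known to gymnasium, only add the MP versions:
# fancy_gym.upgrade('SomeExisting-v0', mp_wrapper=MyReacherMPWrapper)

env = gym.make('fancy/MyEnv-v0')           # plain step-based version
mp_env = gym.make('fancy_ProMP/MyEnv-v0',  # MP version, named '<namespace>_<MP type>/<name>'
                  mp_config_override={'basis_generator_kwargs': {'num_basis': 8}})
```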

View File

@ -1,20 +1,23 @@
import gymnasium as gym
import fancy_gym import fancy_gym
def example_run_replanning_env(env_name="BoxPushingDenseReplanProDMP-v0", seed=1, iterations=1, render=False):
env = fancy_gym.make(env_name, seed=seed) def example_run_replanning_env(env_name="fancy_ProDMP/BoxPushingDenseReplan-v0", seed=1, iterations=1, render=False):
env.reset() env = gym.make(env_name)
env.reset(seed=seed)
for i in range(iterations): for i in range(iterations):
done = False done = False
while done is False: while done is False:
ac = env.action_space.sample() ac = env.action_space.sample()
obs, reward, done, info = env.step(ac) obs, reward, terminated, truncated, info = env.step(ac)
if render: if render:
env.render(mode="human") env.render(mode="human")
if done: if terminated or truncated:
env.reset() env.reset()
env.close() env.close()
del env del env
def example_custom_replanning_envs(seed=0, iteration=100, render=True): def example_custom_replanning_envs(seed=0, iteration=100, render=True):
# id for a step-based environment # id for a step-based environment
base_env_id = "BoxPushingDense-v0" base_env_id = "BoxPushingDense-v0"
@ -22,7 +25,7 @@ def example_custom_replanning_envs(seed=0, iteration=100, render=True):
wrappers = [fancy_gym.envs.mujoco.box_pushing.mp_wrapper.MPWrapper] wrappers = [fancy_gym.envs.mujoco.box_pushing.mp_wrapper.MPWrapper]
trajectory_generator_kwargs = {'trajectory_generator_type': 'prodmp', trajectory_generator_kwargs = {'trajectory_generator_type': 'prodmp',
'weight_scale': 1} 'weights_scale': 1}
phase_generator_kwargs = {'phase_generator_type': 'exp'} phase_generator_kwargs = {'phase_generator_type': 'exp'}
controller_kwargs = {'controller_type': 'velocity'} controller_kwargs = {'controller_type': 'velocity'}
basis_generator_kwargs = {'basis_generator_type': 'prodmp', basis_generator_kwargs = {'basis_generator_type': 'prodmp',
@ -46,8 +49,8 @@ def example_custom_replanning_envs(seed=0, iteration=100, render=True):
for i in range(iteration): for i in range(iteration):
ac = env.action_space.sample() ac = env.action_space.sample()
obs, reward, done, info = env.step(ac) obs, reward, terminated, truncated, info = env.step(ac)
if done: if terminated or truncated:
env.reset() env.reset()
env.close() env.close()
@ -56,7 +59,7 @@ def example_custom_replanning_envs(seed=0, iteration=100, render=True):
if __name__ == "__main__": if __name__ == "__main__":
# run a registered replanning environment # run a registered replanning environment
example_run_replanning_env(env_name="BoxPushingDenseReplanProDMP-v0", seed=1, iterations=1, render=False) example_run_replanning_env(env_name="fancy_ProDMP/BoxPushingDenseReplan-v0", seed=1, iterations=1, render=False)
# run a custom replanning environment # run a custom replanning environment
example_custom_replanning_envs(seed=0, iteration=8, render=True) example_custom_replanning_envs(seed=0, iteration=8, render=True)

View File

@ -1,7 +1,8 @@
import gymnasium as gym
import fancy_gym import fancy_gym
def example_dmc(env_id="dmc:fish-swim", seed=1, iterations=1000, render=True): def example_dmc(env_id="dm_control/fish-swim", seed=1, iterations=1000, render=True):
""" """
Example for running a DMC based env in the step based setting. Example for running a DMC based env in the step based setting.
The env_id has to be specified as `domain_name:task_name` or The env_id has to be specified as `domain_name:task_name` or
@ -16,9 +17,9 @@ def example_dmc(env_id="dmc:fish-swim", seed=1, iterations=1000, render=True):
Returns: Returns:
""" """
env = fancy_gym.make(env_id, seed) env = gym.make(env_id)
rewards = 0 rewards = 0
obs = env.reset() obs = env.reset(seed=seed)
print("observation shape:", env.observation_space.shape) print("observation shape:", env.observation_space.shape)
print("action shape:", env.action_space.shape) print("action shape:", env.action_space.shape)
@ -26,10 +27,10 @@ def example_dmc(env_id="dmc:fish-swim", seed=1, iterations=1000, render=True):
ac = env.action_space.sample() ac = env.action_space.sample()
if render: if render:
env.render(mode="human") env.render(mode="human")
obs, reward, done, info = env.step(ac) obs, reward, terminated, truncated, info = env.step(ac)
rewards += reward rewards += reward
if done: if terminated or truncated:
print(env_id, rewards) print(env_id, rewards)
rewards = 0 rewards = 0
obs = env.reset() obs = env.reset()
@ -56,7 +57,7 @@ def example_custom_dmc_and_mp(seed=1, iterations=1, render=True):
""" """
# Base DMC name, according to structure of above example # Base DMC name, according to structure of above example
base_env_id = "dmc:ball_in_cup-catch" base_env_id = "dm_control/ball_in_cup-catch"
# Replace this wrapper with the custom wrapper for your environment by inheriting from the RawInterfaceWrapper. # Replace this wrapper with the custom wrapper for your environment by inheriting from the RawInterfaceWrapper.
# You can also add other gym.Wrappers in case they are needed. # You can also add other gym.Wrappers in case they are needed.
@ -65,8 +66,8 @@ def example_custom_dmc_and_mp(seed=1, iterations=1, render=True):
trajectory_generator_kwargs = {'trajectory_generator_type': 'promp'} trajectory_generator_kwargs = {'trajectory_generator_type': 'promp'}
phase_generator_kwargs = {'phase_generator_type': 'linear'} phase_generator_kwargs = {'phase_generator_type': 'linear'}
controller_kwargs = {'controller_type': 'motor', controller_kwargs = {'controller_type': 'motor',
"p_gains": 1.0, "p_gains": 1.0,
"d_gains": 0.1,} "d_gains": 0.1, }
basis_generator_kwargs = {'basis_generator_type': 'zero_rbf', basis_generator_kwargs = {'basis_generator_type': 'zero_rbf',
'num_basis': 5, 'num_basis': 5,
'num_basis_zero_start': 1 'num_basis_zero_start': 1
@ -102,10 +103,10 @@ def example_custom_dmc_and_mp(seed=1, iterations=1, render=True):
# number of samples/full trajectories (multiple environment steps) # number of samples/full trajectories (multiple environment steps)
for i in range(iterations): for i in range(iterations):
ac = env.action_space.sample() ac = env.action_space.sample()
obs, reward, done, info = env.step(ac) obs, reward, terminated, truncated, info = env.step(ac)
rewards += reward rewards += reward
if done: if terminated or truncated:
print(base_env_id, rewards) print(base_env_id, rewards)
rewards = 0 rewards = 0
obs = env.reset() obs = env.reset()
@ -123,14 +124,14 @@ if __name__ == '__main__':
render = True render = True
# # Standard DMC Suite tasks # # Standard DMC Suite tasks
example_dmc("dmc:fish-swim", seed=10, iterations=1000, render=render) example_dmc("dm_control/fish-swim", seed=10, iterations=1000, render=render)
# #
# # Manipulation tasks # # Manipulation tasks
# # Disclaimer: The vision versions are currently not integrated and yield an error # # Disclaimer: The vision versions are currently not integrated and yield an error
example_dmc("dmc:manipulation-reach_site_features", seed=10, iterations=250, render=render) example_dmc("dm_control/manipulation-reach_site_features", seed=10, iterations=250, render=render)
# #
# # Gym + DMC hybrid task provided in the MP framework # # Gym + DMC hybrid task provided in the MP framework
example_dmc("dmc_ball_in_cup-catch_promp-v0", seed=10, iterations=1, render=render) example_dmc("dm_control_ProMP/ball_in_cup-catch-v0", seed=10, iterations=1, render=render)
# Custom DMC task # Different seed, because the episode is longer for this example and the name+seed combo is # Custom DMC task # Different seed, because the episode is longer for this example and the name+seed combo is
# already registered above # already registered above

View File

@ -1,6 +1,6 @@
from collections import defaultdict from collections import defaultdict
import gym import gymnasium as gym
import numpy as np import numpy as np
import fancy_gym import fancy_gym
@ -21,27 +21,27 @@ def example_general(env_id="Pendulum-v1", seed=1, iterations=1000, render=True):
""" """
env = fancy_gym.make(env_id, seed) env = gym.make(env_id)
rewards = 0 rewards = 0
obs = env.reset() obs = env.reset(seed=seed)
print("Observation shape: ", env.observation_space.shape) print("Observation shape: ", env.observation_space.shape)
print("Action shape: ", env.action_space.shape) print("Action shape: ", env.action_space.shape)
# number of environment steps # number of environment steps
for i in range(iterations): for i in range(iterations):
obs, reward, done, info = env.step(env.action_space.sample()) obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
rewards += reward rewards += reward
if render: if render:
env.render() env.render()
if done: if terminated or truncated:
print(rewards) print(rewards)
rewards = 0 rewards = 0
obs = env.reset() obs = env.reset()
def example_async(env_id="HoleReacher-v0", n_cpu=4, seed=int('533D', 16), n_samples=800): def example_async(env_id="fancy/HoleReacher-v0", n_cpu=4, seed=int('533D', 16), n_samples=800):
""" """
Example for running any env in a vectorized multiprocessing setting to generate more samples faster. Example for running any env in a vectorized multiprocessing setting to generate more samples faster.
This also includes DMC and DMP environments when leveraging our custom make_env function. This also includes DMC and DMP environments when leveraging our custom make_env function.
@ -69,12 +69,15 @@ def example_async(env_id="HoleReacher-v0", n_cpu=4, seed=int('533D', 16), n_samp
# this would generate more samples than requested if n_samples % num_envs != 0 # this would generate more samples than requested if n_samples % num_envs != 0
repeat = int(np.ceil(n_samples / env.num_envs)) repeat = int(np.ceil(n_samples / env.num_envs))
for i in range(repeat): for i in range(repeat):
obs, reward, done, info = env.step(env.action_space.sample()) obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
buffer['obs'].append(obs) buffer['obs'].append(obs)
buffer['reward'].append(reward) buffer['reward'].append(reward)
buffer['done'].append(done) buffer['terminated'].append(terminated)
buffer['truncated'].append(truncated)
buffer['info'].append(info) buffer['info'].append(info)
rewards += reward rewards += reward
done = np.logical_or(terminated, truncated)
if np.any(done): if np.any(done):
print(f"Reward at iteration {i}: {rewards[done]}") print(f"Reward at iteration {i}: {rewards[done]}")
rewards[done] = 0 rewards[done] = 0
@ -90,11 +93,10 @@ if __name__ == '__main__':
example_general("Pendulum-v1", seed=10, iterations=200, render=render) example_general("Pendulum-v1", seed=10, iterations=200, render=render)
# Mujoco task from framework # Mujoco task from framework
example_general("Reacher5d-v0", seed=10, iterations=200, render=render) example_general("fancy/Reacher5d-v0", seed=10, iterations=200, render=render)
# # OpenAI Mujoco task # # OpenAI Mujoco task
example_general("HalfCheetah-v2", seed=10, render=render) example_general("HalfCheetah-v2", seed=10, render=render)
# Vectorized multiprocessing environments # Vectorized multiprocessing environments
# example_async(env_id="HoleReacher-v0", n_cpu=2, seed=int('533D', 16), n_samples=2 * 200) # example_async(env_id="HoleReacher-v0", n_cpu=2, seed=int('533D', 16), n_samples=2 * 200)

View File

@ -1,7 +1,8 @@
import gymnasium as gym
import fancy_gym import fancy_gym
def example_dmc(env_id="fish-swim", seed=1, iterations=1000, render=True): def example_meta(env_id="fish-swim", seed=1, iterations=1000, render=True):
""" """
Example for running a MetaWorld based env in the step based setting. Example for running a MetaWorld based env in the step based setting.
The env_id has to be specified as `task_name-v2`. V1 versions are not supported and we always The env_id has to be specified as `task_name-v2`. V1 versions are not supported and we always
@ -17,9 +18,9 @@ def example_dmc(env_id="fish-swim", seed=1, iterations=1000, render=True):
Returns: Returns:
""" """
env = fancy_gym.make(env_id, seed) env = gym.make(env_id)
rewards = 0 rewards = 0
obs = env.reset() obs = env.reset(seed=seed)
print("observation shape:", env.observation_space.shape) print("observation shape:", env.observation_space.shape)
print("action shape:", env.action_space.shape) print("action shape:", env.action_space.shape)
@ -29,9 +30,9 @@ def example_dmc(env_id="fish-swim", seed=1, iterations=1000, render=True):
# THIS NEEDS TO BE SET TO FALSE FOR NOW, BECAUSE THE INTERFACE FOR RENDERING IS DIFFERENT TO BASIC GYM # THIS NEEDS TO BE SET TO FALSE FOR NOW, BECAUSE THE INTERFACE FOR RENDERING IS DIFFERENT TO BASIC GYM
# TODO: Remove this, when Metaworld fixes its interface. # TODO: Remove this, when Metaworld fixes its interface.
env.render(False) env.render(False)
obs, reward, done, info = env.step(ac) obs, reward, terminated, truncated, info = env.step(ac)
rewards += reward rewards += reward
if done: if terminated or truncated:
print(env_id, rewards) print(env_id, rewards)
rewards = 0 rewards = 0
obs = env.reset() obs = env.reset()
@ -40,7 +41,7 @@ def example_dmc(env_id="fish-swim", seed=1, iterations=1000, render=True):
del env del env
def example_custom_dmc_and_mp(seed=1, iterations=1, render=True): def example_custom_meta_and_mp(seed=1, iterations=1, render=True):
""" """
Example for running custom movement primitive based environments. Example for running custom movement primitive based environments.
Our already registered environments follow the same structure. Our already registered environments follow the same structure.
@ -58,7 +59,7 @@ def example_custom_dmc_and_mp(seed=1, iterations=1, render=True):
""" """
# Base MetaWorld name, according to structure of above example # Base MetaWorld name, according to structure of above example
base_env_id = "metaworld:button-press-v2" base_env_id = "metaworld/button-press-v2"
# Replace this wrapper with the custom wrapper for your environment by inheriting from the RawInterfaceWrapper. # Replace this wrapper with the custom wrapper for your environment by inheriting from the RawInterfaceWrapper.
# You can also add other gym.Wrappers in case they are needed. # You can also add other gym.Wrappers in case they are needed.
@ -103,10 +104,10 @@ def example_custom_dmc_and_mp(seed=1, iterations=1, render=True):
# number of samples/full trajectories (multiple environment steps) # number of samples/full trajectories (multiple environment steps)
for i in range(iterations): for i in range(iterations):
ac = env.action_space.sample() ac = env.action_space.sample()
obs, reward, done, info = env.step(ac) obs, reward, terminated, truncated, info = env.step(ac)
rewards += reward rewards += reward
if done: if terminated or truncated:
print(base_env_id, rewards) print(base_env_id, rewards)
rewards = 0 rewards = 0
obs = env.reset() obs = env.reset()
@ -124,11 +125,10 @@ if __name__ == '__main__':
render = False render = False
# # Standard Meta world tasks # # Standard Meta world tasks
example_dmc("metaworld:button-press-v2", seed=10, iterations=500, render=render) example_meta("metaworld/button-press-v2", seed=10, iterations=500, render=render)
# # MP + MetaWorld hybrid task provided in our framework # # MP + MetaWorld hybrid task provided in our framework
example_dmc("ButtonPressProMP-v2", seed=10, iterations=1, render=render) example_meta("metaworld_ProMP/ButtonPress-v2", seed=10, iterations=1, render=render)
# #
# # Custom MetaWorld task # # Custom MetaWorld task
example_custom_dmc_and_mp(seed=10, iterations=1, render=render) example_custom_meta_and_mp(seed=10, iterations=1, render=render)

View File

@ -1,7 +1,8 @@
import gymnasium as gym
import fancy_gym import fancy_gym
def example_mp(env_name="HoleReacherProMP-v0", seed=1, iterations=1, render=True): def example_mp(env_name="fancy_ProMP/HoleReacher-v0", seed=1, iterations=1, render=True):
""" """
Example for running a black box based environment, which is already registered Example for running a black box based environment, which is already registered
Args: Args:
@ -15,11 +16,11 @@ def example_mp(env_name="HoleReacherProMP-v0", seed=1, iterations=1, render=True
""" """
# Equivalent to gym, we have a make function which can be used to create environments. # Equivalent to gym, we have a make function which can be used to create environments.
# It takes care of seeding and enables the use of a variety of external environments using the gym interface. # It takes care of seeding and enables the use of a variety of external environments using the gym interface.
env = fancy_gym.make(env_name, seed) env = gym.make(env_name)
returns = 0 returns = 0
# env.render(mode=None) # env.render(mode=None)
obs = env.reset() obs = env.reset(seed=seed)
# number of samples/full trajectories (multiple environment steps) # number of samples/full trajectories (multiple environment steps)
for i in range(iterations): for i in range(iterations):
@ -41,16 +42,16 @@ def example_mp(env_name="HoleReacherProMP-v0", seed=1, iterations=1, render=True
# This executes a full trajectory and gives back the context (obs) of the last step in the trajectory, or the # This executes a full trajectory and gives back the context (obs) of the last step in the trajectory, or the
# full observation space of the last step, if replanning/sub-trajectory learning is used. The 'reward' is equal # full observation space of the last step, if replanning/sub-trajectory learning is used. The 'reward' is equal
# to the return of a trajectory. Default is the sum over the step-wise rewards. # to the return of a trajectory. Default is the sum over the step-wise rewards.
obs, reward, done, info = env.step(ac) obs, reward, terminated, truncated, info = env.step(ac)
# Aggregated returns # Aggregated returns
returns += reward returns += reward
if done: if terminated or truncated:
print(reward) print(reward)
obs = env.reset() obs = env.reset()
def example_custom_mp(env_name="Reacher5dProMP-v0", seed=1, iterations=1, render=True): def example_custom_mp(env_name="fancy_ProMP/Reacher5d-v0", seed=1, iterations=1, render=True):
""" """
Example for running a movement primitive based environment, which is already registered Example for running a movement primitive based environment, which is already registered
Args: Args:
@ -62,12 +63,9 @@ def example_custom_mp(env_name="Reacher5dProMP-v0", seed=1, iterations=1, render
Returns: Returns:
""" """
# Changing the arguments of the black box env is possible by providing them to gym as with all kwargs. # Changing the arguments of the black box env is possible by providing them to gym through mp_config_override.
# E.g. here for way too many basis functions # E.g. here for way too many basis functions
env = fancy_gym.make(env_name, seed, basis_generator_kwargs={'num_basis': 1000}) env = gym.make(env_name, mp_config_override={'basis_generator_kwargs': {'num_basis': 1000}})
# env = fancy_gym.make(env_name, seed)
# mp_dict.update({'black_box_kwargs': {'learn_sub_trajectories': True}})
# mp_dict.update({'black_box_kwargs': {'do_replanning': lambda pos, vel, t: lambda t: t % 100}})
returns = 0 returns = 0
obs = env.reset() obs = env.reset()
@ -79,10 +77,10 @@ def example_custom_mp(env_name="Reacher5dProMP-v0", seed=1, iterations=1, render
# number of samples/full trajectories (multiple environment steps) # number of samples/full trajectories (multiple environment steps)
for i in range(iterations): for i in range(iterations):
ac = env.action_space.sample() ac = env.action_space.sample()
obs, reward, done, info = env.step(ac) obs, reward, terminated, truncated, info = env.step(ac)
returns += reward returns += reward
if done: if terminated or truncated:
print(i, reward) print(i, reward)
obs = env.reset() obs = env.reset()
@ -106,7 +104,7 @@ def example_fully_custom_mp(seed=1, iterations=1, render=True):
""" """
base_env_id = "Reacher5d-v0" base_env_id = "fancy/Reacher5d-v0"
# Replace this wrapper with the custom wrapper for your environment by inheriting from the RawInterfaceWrapper. # Replace this wrapper with the custom wrapper for your environment by inheriting from the RawInterfaceWrapper.
# You can also add other gym.Wrappers in case they are needed. # You can also add other gym.Wrappers in case they are needed.
@ -114,7 +112,7 @@ def example_fully_custom_mp(seed=1, iterations=1, render=True):
# For a ProMP # For a ProMP
trajectory_generator_kwargs = {'trajectory_generator_type': 'promp', trajectory_generator_kwargs = {'trajectory_generator_type': 'promp',
'weight_scale': 2} 'weights_scale': 2}
phase_generator_kwargs = {'phase_generator_type': 'linear'} phase_generator_kwargs = {'phase_generator_type': 'linear'}
controller_kwargs = {'controller_type': 'velocity'} controller_kwargs = {'controller_type': 'velocity'}
basis_generator_kwargs = {'basis_generator_type': 'zero_rbf', basis_generator_kwargs = {'basis_generator_type': 'zero_rbf',
@ -124,7 +122,7 @@ def example_fully_custom_mp(seed=1, iterations=1, render=True):
# # For a DMP # # For a DMP
# trajectory_generator_kwargs = {'trajectory_generator_type': 'dmp', # trajectory_generator_kwargs = {'trajectory_generator_type': 'dmp',
# 'weight_scale': 500} # 'weights_scale': 500}
# phase_generator_kwargs = {'phase_generator_type': 'exp', # phase_generator_kwargs = {'phase_generator_type': 'exp',
# 'alpha_phase': 2.5} # 'alpha_phase': 2.5}
# controller_kwargs = {'controller_type': 'velocity'} # controller_kwargs = {'controller_type': 'velocity'}
@ -145,10 +143,10 @@ def example_fully_custom_mp(seed=1, iterations=1, render=True):
# number of samples/full trajectories (multiple environment steps) # number of samples/full trajectories (multiple environment steps)
for i in range(iterations): for i in range(iterations):
ac = env.action_space.sample() ac = env.action_space.sample()
obs, reward, done, info = env.step(ac) obs, reward, terminated, truncated, info = env.step(ac)
rewards += reward rewards += reward
if done: if terminated or truncated:
print(rewards) print(rewards)
rewards = 0 rewards = 0
obs = env.reset() obs = env.reset()
@ -157,20 +155,20 @@ def example_fully_custom_mp(seed=1, iterations=1, render=True):
if __name__ == '__main__': if __name__ == '__main__':
render = False render = False
# DMP # DMP
example_mp("HoleReacherDMP-v0", seed=10, iterations=5, render=render) example_mp("fancy_DMP/HoleReacher-v0", seed=10, iterations=5, render=render)
# ProMP # ProMP
example_mp("HoleReacherProMP-v0", seed=10, iterations=5, render=render) example_mp("fancy_ProMP/HoleReacher-v0", seed=10, iterations=5, render=render)
example_mp("BoxPushingTemporalSparseProMP-v0", seed=10, iterations=1, render=render) example_mp("fancy_ProMP/BoxPushingTemporalSparse-v0", seed=10, iterations=1, render=render)
example_mp("TableTennis4DProMP-v0", seed=10, iterations=20, render=render) example_mp("fancy_ProMP/TableTennis4D-v0", seed=10, iterations=20, render=render)
# ProDMP with Replanning # ProDMP with Replanning
example_mp("BoxPushingDenseReplanProDMP-v0", seed=10, iterations=4, render=render) example_mp("fancy_ProDMP/BoxPushingDenseReplan-v0", seed=10, iterations=4, render=render)
example_mp("TableTennis4DReplanProDMP-v0", seed=10, iterations=20, render=render) example_mp("fancy_ProDMP/TableTennis4DReplan-v0", seed=10, iterations=20, render=render)
example_mp("TableTennisWindReplanProDMP-v0", seed=10, iterations=20, render=render) example_mp("fancy_ProDMP/TableTennisWindReplan-v0", seed=10, iterations=20, render=render)
# Altered basis functions # Altered basis functions
obs1 = example_custom_mp("Reacher5dProMP-v0", seed=10, iterations=1, render=render) obs1 = example_custom_mp("fancy_ProMP/Reacher5d-v0", seed=10, iterations=1, render=render)
# Custom MP # Custom MP
example_fully_custom_mp(seed=10, iterations=1, render=render) example_fully_custom_mp(seed=10, iterations=1, render=render)

View File

@ -1,3 +1,4 @@
import gymnasium as gym
import fancy_gym import fancy_gym
@ -12,11 +13,10 @@ def example_mp(env_name, seed=1, render=True):
Returns: Returns:
""" """
# While in this case gym.make() is possible to use as well, we recommend our custom make env function. env = gym.make(env_name)
env = fancy_gym.make(env_name, seed)
returns = 0 returns = 0
obs = env.reset() obs = env.reset(seed=seed)
# number of samples/full trajectories (multiple environment steps) # number of samples/full trajectories (multiple environment steps)
for i in range(10): for i in range(10):
if render and i % 2 == 0: if render and i % 2 == 0:
@ -24,14 +24,13 @@ def example_mp(env_name, seed=1, render=True):
else: else:
env.render() env.render()
ac = env.action_space.sample() ac = env.action_space.sample()
obs, reward, done, info = env.step(ac) obs, reward, terminated, truncated, info = env.step(ac)
returns += reward returns += reward
if done: if terminated or truncated:
print(returns) print(returns)
obs = env.reset() obs = env.reset()
if __name__ == '__main__': if __name__ == '__main__':
example_mp("ReacherProMP-v2") example_mp("gym_ProMP/Reacher-v2")

View File

@ -1,10 +1,14 @@
import gymnasium as gym
import fancy_gym import fancy_gym
def compare_bases_shape(env1_id, env2_id): def compare_bases_shape(env1_id, env2_id):
env1 = fancy_gym.make(env1_id, seed=0) env1 = gym.make(env1_id)
env1.traj_gen.show_scaled_basis(plot=True) env1.traj_gen.show_scaled_basis(plot=True)
env2 = fancy_gym.make(env2_id, seed=0) env2 = gym.make(env2_id)
env2.traj_gen.show_scaled_basis(plot=True) env2.traj_gen.show_scaled_basis(plot=True)
return return
if __name__ == '__main__': if __name__ == '__main__':
compare_bases_shape("TableTennis4DProDMP-v0", "TableTennis4DProMP-v0") compare_bases_shape("fancy_ProDMP/TableTennis4D-v0", "fancy_ProMP/TableTennis4D-v0")

View File

@ -3,19 +3,20 @@ from collections import OrderedDict
import numpy as np import numpy as np
from matplotlib import pyplot as plt from matplotlib import pyplot as plt
import gymnasium as gym
import fancy_gym import fancy_gym
# This might work for some environments, however, please verify either way the correct trajectory information # This might work for some environments, however, please verify either way the correct trajectory information
# for your environment are extracted below # for your environment are extracted below
SEED = 1 SEED = 1
env_id = "Reacher5dProMP-v0" env_id = "fancy_ProMP/Reacher5d-v0"
env = fancy_gym.make(env_id, seed=SEED, controller_kwargs={'p_gains': 0.05, 'd_gains': 0.05}).env env = fancy_gym.make(env_id, mp_config_override={'controller_kwargs': {'p_gains': 0.05, 'd_gains': 0.05}}).env
env.action_space.seed(SEED) env.action_space.seed(SEED)
# Plot difference between real trajectory and target MP trajectory # Plot difference between real trajectory and target MP trajectory
env.reset() env.reset(seed=SEED)
w = env.action_space.sample() w = env.action_space.sample()
pos, vel = env.get_trajectory(w) pos, vel = env.get_trajectory(w)
@ -34,7 +35,7 @@ fig.show()
for t, (des_pos, des_vel) in enumerate(zip(pos, vel)): for t, (des_pos, des_vel) in enumerate(zip(pos, vel)):
actions = env.tracking_controller.get_action(des_pos, des_vel, env.current_pos, env.current_vel) actions = env.tracking_controller.get_action(des_pos, des_vel, env.current_pos, env.current_vel)
actions = np.clip(actions, env.env.action_space.low, env.env.action_space.high) actions = np.clip(actions, env.env.action_space.low, env.env.action_space.high)
_, _, _, _ = env.env.step(actions) env.env.step(actions)
if t % 15 == 0: if t % 15 == 0:
img.set_data(env.env.render(mode="rgb_array")) img.set_data(env.env.render(mode="rgb_array"))
fig.canvas.draw() fig.canvas.draw()

View File

@ -1,26 +1,64 @@
# MetaWorld Wrappers # Metaworld
These are the Environment Wrappers for selected [Metaworld](https://meta-world.github.io/) environments in order to use our Movement Primitive gym interface with them. [Metaworld](https://meta-world.github.io/) is an open-source simulated benchmark designed to advance meta-reinforcement learning and multi-task learning, comprising 50 diverse robotic manipulation tasks. The benchmark features a universal tabletop environment equipped with a simulated Sawyer arm and a variety of everyday objects. This shared environment is pivotal for reusing structured learning and efficiently acquiring related tasks.
All Metaworld environments have a 39 dimensional observation space with the same structure. The tasks differ only in the objective and the initial observations that are randomized.
Unused observations are zeroed out. E.g. for `Button-Press-v2` the observation mask looks the following: ## Step-Based Envs
```python
return np.hstack([ `fancy_gym` makes all metaworld ML1 tasks available via the standard gym interface (see the usage sketch below the table). To access metaworld environments using a different mode of operation (MT1 / ML100 / etc.), please use the functionality provided by metaworld directly.
# Current observation
[False] * 3, # end-effector position | Name | Description | Horizon | Action Dimension | Observation Dimension | Context Dimension |
[False] * 1, # normalized gripper open distance | ---------------------------------------- | ------------------------------------------------------------------------------------- | ------- | ---------------- | --------------------- | ----------------- |
[True] * 3, # main object position | `metaworld/assembly-v2` | A task where the robot must assemble components. | 500 | 4 | 39 | 6 |
[False] * 4, # main object quaternion | `metaworld/basketball-v2` | A task where the robot must play a game of basketball. | 500 | 4 | 39 | 6 |
[False] * 3, # secondary object position | `metaworld/bin-picking-v2` | A task involving the robot picking objects from a bin. | 500 | 4 | 39 | 6 |
[False] * 4, # secondary object quaternion | `metaworld/box-close-v2` | A task requiring the robot to close a box. | 500 | 4 | 39 | 6 |
# Previous observation | `metaworld/button-press-topdown-v2` | A task where the robot must press a button from a top-down perspective. | 500 | 4 | 39 | 6 |
[False] * 3, # previous end-effector position | `metaworld/button-press-topdown-wall-v2` | A task involving the robot pressing a button with a wall from a top-down perspective. | 500 | 4 | 39 | 6 |
[False] * 1, # previous normalized gripper open distance | `metaworld/button-press-v2` | A task where the robot must press a button. | 500 | 4 | 39 | 6 |
[False] * 3, # previous main object position | `metaworld/button-press-wall-v2` | A task involving the robot pressing a button with a wall. | 500 | 4 | 39 | 6 |
[False] * 4, # previous main object quaternion | `metaworld/coffee-button-v2` | A task where the robot must press a button on a coffee machine. | 500 | 4 | 39 | 6 |
[False] * 3, # previous second object position | `metaworld/coffee-pull-v2` | A task involving the robot pulling a lever on a coffee machine. | 500 | 4 | 39 | 6 |
[False] * 4, # previous second object quaternion | `metaworld/coffee-push-v2` | A task involving the robot pushing a component on a coffee machine. | 500 | 4 | 39 | 6 |
# Goal | `metaworld/dial-turn-v2` | A task where the robot must turn a dial. | 500 | 4 | 39 | 6 |
[True] * 3, # goal position | `metaworld/disassemble-v2` | A task requiring the robot to disassemble an object. | 500 | 4 | 39 | 6 |
]) | `metaworld/door-close-v2` | A task where the robot must close a door. | 500 | 4 | 39 | 6 |
``` | `metaworld/door-lock-v2` | A task involving the robot locking a door. | 500 | 4 | 39 | 6 |
For other tasks only the boolean values have to be adjusted accordingly. | `metaworld/door-open-v2` | A task where the robot must open a door. | 500 | 4 | 39 | 6 |
| `metaworld/door-unlock-v2` | A task involving the robot unlocking a door. | 500 | 4 | 39 | 6 |
| `metaworld/hand-insert-v2` | A task requiring the robot to insert a hand into an object. | 500 | 4 | 39 | 6 |
| `metaworld/drawer-close-v2` | A task where the robot must close a drawer. | 500 | 4 | 39 | 6 |
| `metaworld/drawer-open-v2` | A task involving the robot opening a drawer. | 500 | 4 | 39 | 6 |
| `metaworld/faucet-open-v2` | A task requiring the robot to open a faucet. | 500 | 4 | 39 | 6 |
| `metaworld/faucet-close-v2` | A task where the robot must close a faucet. | 500 | 4 | 39 | 6 |
| `metaworld/hammer-v2` | A task where the robot must use a hammer. | 500 | 4 | 39 | 6 |
| `metaworld/handle-press-side-v2` | A task involving the robot pressing a handle from the side. | 500 | 4 | 39 | 6 |
| `metaworld/handle-press-v2` | A task where the robot must press a handle. | 500 | 4 | 39 | 6 |
| `metaworld/handle-pull-side-v2` | A task requiring the robot to pull a handle from the side. | 500 | 4 | 39 | 6 |
| `metaworld/handle-pull-v2` | A task where the robot must pull a handle. | 500 | 4 | 39 | 6 |
| `metaworld/lever-pull-v2` | A task involving the robot pulling a lever. | 500 | 4 | 39 | 6 |
| `metaworld/peg-insert-side-v2` | A task requiring the robot to insert a peg from the side. | 500 | 4 | 39 | 6 |
| `metaworld/pick-place-wall-v2` | A task involving the robot picking and placing an object with a wall. | 500 | 4 | 39 | 6 |
| `metaworld/pick-out-of-hole-v2` | A task where the robot must pick an object out of a hole. | 500 | 4 | 39 | 6 |
| `metaworld/reach-v2` | A task where the robot must reach an object. | 500 | 4 | 39 | 6 |
| `metaworld/push-back-v2` | A task involving the robot pushing an object backward. | 500 | 4 | 39 | 6 |
| `metaworld/push-v2` | A task where the robot must push an object. | 500 | 4 | 39 | 6 |
| `metaworld/pick-place-v2` | A task involving the robot picking up and placing an object. | 500 | 4 | 39 | 6 |
| `metaworld/plate-slide-v2` | A task requiring the robot to slide a plate. | 500 | 4 | 39 | 6 |
| `metaworld/plate-slide-side-v2` | A task involving the robot sliding a plate from the side. | 500 | 4 | 39 | 6 |
| `metaworld/plate-slide-back-v2` | A task where the robot must slide a plate backward. | 500 | 4 | 39 | 6 |
| `metaworld/plate-slide-back-side-v2` | A task involving the robot sliding a plate backward from the side. | 500 | 4 | 39 | 6 |
| `metaworld/peg-unplug-side-v2` | A task where the robot must unplug a peg from the side. | 500 | 4 | 39 | 6 |
| `metaworld/soccer-v2` | A task where the robot must play soccer. | 500 | 4 | 39 | 6 |
| `metaworld/stick-push-v2` | A task involving the robot pushing a stick. | 500 | 4 | 39 | 6 |
| `metaworld/stick-pull-v2` | A task where the robot must pull a stick. | 500 | 4 | 39 | 6 |
| `metaworld/push-wall-v2` | A task involving the robot pushing against a wall. | 500 | 4 | 39 | 6 |
| `metaworld/reach-wall-v2` | A task where the robot must reach an object with a wall. | 500 | 4 | 39 | 6 |
| `metaworld/shelf-place-v2` | A task involving the robot placing an object on a shelf. | 500 | 4 | 39 | 6 |
| `metaworld/sweep-into-v2` | A task where the robot must sweep objects into a container. | 500 | 4 | 39 | 6 |
| `metaworld/sweep-v2` | A task requiring the robot to sweep. | 500 | 4 | 39 | 6 |
| `metaworld/window-open-v2` | A task where the robot must open a window. | 500 | 4 | 39 | 6 |
| `metaworld/window-close-v2` | A task involving the robot closing a window. | 500 | 4 | 39 | 6 |
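
A minimal step-based usage sketch under the new Gymnasium API (assuming `metaworld` is installed; ids as in the table above):

```python
import gymnasium as gym
import fancy_gym  # registers the metaworld/* ids listed above

env = gym.make('metaworld/button-press-v2')
obs, info = env.reset(seed=1)
for _ in range(500):
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```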
## MP-Based Envs
All envs also exist as MP variants. Refer to them using `metaworld_ProMP/<name-v2>` or `metaworld_ProDMP/<name-v2>` (DMP is currently not supported); see the sketch below.
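
For the MP variants a single `step` executes a whole trajectory; a short sketch (the id follows the naming scheme above):

```python
import gymnasium as gym
import fancy_gym

env = gym.make('metaworld_ProMP/button-press-v2')
obs, info = env.reset(seed=1)
# one action = the parameters of a full ProMP trajectory; the reward is the trajectory return
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
env.close()
```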

View File

@ -1,125 +1,37 @@
from typing import Iterable, Type, Union, Optional
from copy import deepcopy from copy import deepcopy
from gym import register from ..envs.registry import register
from . import goal_object_change_mp_wrapper, goal_change_mp_wrapper, goal_endeffector_change_mp_wrapper, \ from . import goal_object_change_mp_wrapper, goal_change_mp_wrapper, goal_endeffector_change_mp_wrapper, \
object_change_mp_wrapper object_change_mp_wrapper
from . import metaworld_adapter
metaworld_adapter.register_all_ML1()
ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS = {"DMP": [], "ProMP": [], "ProDMP": []} ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS = {"DMP": [], "ProMP": [], "ProDMP": []}
# MetaWorld # MetaWorld
DEFAULT_BB_DICT_ProMP = {
"name": 'EnvName',
"wrappers": [],
"trajectory_generator_kwargs": {
'trajectory_generator_type': 'promp',
'weights_scale': 10,
},
"phase_generator_kwargs": {
'phase_generator_type': 'linear'
},
"controller_kwargs": {
'controller_type': 'metaworld',
},
"basis_generator_kwargs": {
'basis_generator_type': 'zero_rbf',
'num_basis': 5,
'num_basis_zero_start': 1
},
'black_box_kwargs': {
'condition_on_desired': False,
}
}
DEFAULT_BB_DICT_ProDMP = {
"name": 'EnvName',
"wrappers": [],
"trajectory_generator_kwargs": {
'trajectory_generator_type': 'prodmp',
'auto_scale_basis': True,
'weights_scale': 10,
# 'goal_scale': 0.,
'disable_goal': True,
},
"phase_generator_kwargs": {
'phase_generator_type': 'exp',
# 'alpha_phase' : 3,
},
"controller_kwargs": {
'controller_type': 'metaworld',
},
"basis_generator_kwargs": {
'basis_generator_type': 'prodmp',
'num_basis': 5,
'alpha': 10
},
'black_box_kwargs': {
'condition_on_desired': False,
}
}
_goal_change_envs = ["assembly-v2", "pick-out-of-hole-v2", "plate-slide-v2", "plate-slide-back-v2", _goal_change_envs = ["assembly-v2", "pick-out-of-hole-v2", "plate-slide-v2", "plate-slide-back-v2",
"plate-slide-side-v2", "plate-slide-back-side-v2"] "plate-slide-side-v2", "plate-slide-back-side-v2"]
for _task in _goal_change_envs: for _task in _goal_change_envs:
task_id_split = _task.split("-")
name = "".join([s.capitalize() for s in task_id_split[:-1]])
# ProMP
_env_id = f'{name}ProMP-{task_id_split[-1]}'
kwargs_dict_goal_change_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
kwargs_dict_goal_change_promp['wrappers'].append(goal_change_mp_wrapper.MPWrapper)
kwargs_dict_goal_change_promp['name'] = f'metaworld:{_task}'
register( register(
id=_env_id, id=f'metaworld/{_task}',
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper', register_step_based=False,
kwargs=kwargs_dict_goal_change_promp mp_wrapper=goal_change_mp_wrapper.MPWrapper,
add_mp_types=['ProMP', 'ProDMP'],
) )
ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
# ProDMP
_env_id = f'{name}ProDMP-{task_id_split[-1]}'
kwargs_dict_goal_change_prodmp = deepcopy(DEFAULT_BB_DICT_ProDMP)
kwargs_dict_goal_change_prodmp['wrappers'].append(goal_change_mp_wrapper.MPWrapper)
kwargs_dict_goal_change_prodmp['name'] = f'metaworld:{_task}'
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_goal_change_prodmp
)
ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProDMP"].append(_env_id)
_object_change_envs = ["bin-picking-v2", "hammer-v2", "sweep-into-v2"] _object_change_envs = ["bin-picking-v2", "hammer-v2", "sweep-into-v2"]
for _task in _object_change_envs: for _task in _object_change_envs:
task_id_split = _task.split("-")
name = "".join([s.capitalize() for s in task_id_split[:-1]])
# ProMP
_env_id = f'{name}ProMP-{task_id_split[-1]}'
kwargs_dict_object_change_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
kwargs_dict_object_change_promp['wrappers'].append(object_change_mp_wrapper.MPWrapper)
kwargs_dict_object_change_promp['name'] = f'metaworld:{_task}'
register( register(
id=_env_id, id=f'metaworld/{_task}',
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper', register_step_based=False,
kwargs=kwargs_dict_object_change_promp mp_wrapper=object_change_mp_wrapper.MPWrapper,
add_mp_types=['ProMP', 'ProDMP'],
) )
ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
# ProDMP
_env_id = f'{name}ProDMP-{task_id_split[-1]}'
kwargs_dict_object_change_prodmp = deepcopy(DEFAULT_BB_DICT_ProDMP)
kwargs_dict_object_change_prodmp['wrappers'].append(object_change_mp_wrapper.MPWrapper)
kwargs_dict_object_change_prodmp['name'] = f'metaworld:{_task}'
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_object_change_prodmp
)
ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProDMP"].append(_env_id)
_goal_and_object_change_envs = ["box-close-v2", "button-press-v2", "button-press-wall-v2", "button-press-topdown-v2", _goal_and_object_change_envs = ["box-close-v2", "button-press-v2", "button-press-wall-v2", "button-press-topdown-v2",
"button-press-topdown-wall-v2", "coffee-button-v2", "coffee-pull-v2", "button-press-topdown-wall-v2", "coffee-button-v2", "coffee-pull-v2",
@ -133,62 +45,18 @@ _goal_and_object_change_envs = ["box-close-v2", "button-press-v2", "button-press
"shelf-place-v2", "sweep-v2", "window-open-v2", "window-close-v2" "shelf-place-v2", "sweep-v2", "window-open-v2", "window-close-v2"
] ]
for _task in _goal_and_object_change_envs: for _task in _goal_and_object_change_envs:
task_id_split = _task.split("-")
name = "".join([s.capitalize() for s in task_id_split[:-1]])
# ProMP
_env_id = f'{name}ProMP-{task_id_split[-1]}'
kwargs_dict_goal_and_object_change_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
kwargs_dict_goal_and_object_change_promp['wrappers'].append(goal_object_change_mp_wrapper.MPWrapper)
kwargs_dict_goal_and_object_change_promp['name'] = f'metaworld:{_task}'
register( register(
id=_env_id, id=f'metaworld/{_task}',
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper', register_step_based=False,
kwargs=kwargs_dict_goal_and_object_change_promp mp_wrapper=goal_object_change_mp_wrapper.MPWrapper,
add_mp_types=['ProMP', 'ProDMP'],
) )
ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
# ProDMP
_env_id = f'{name}ProDMP-{task_id_split[-1]}'
kwargs_dict_goal_and_object_change_prodmp = deepcopy(DEFAULT_BB_DICT_ProDMP)
kwargs_dict_goal_and_object_change_prodmp['wrappers'].append(goal_object_change_mp_wrapper.MPWrapper)
kwargs_dict_goal_and_object_change_prodmp['name'] = f'metaworld:{_task}'
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_goal_and_object_change_prodmp
)
ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProDMP"].append(_env_id)
_goal_and_endeffector_change_envs = ["basketball-v2"] _goal_and_endeffector_change_envs = ["basketball-v2"]
for _task in _goal_and_endeffector_change_envs: for _task in _goal_and_endeffector_change_envs:
task_id_split = _task.split("-")
name = "".join([s.capitalize() for s in task_id_split[:-1]])
# ProMP
_env_id = f'{name}ProMP-{task_id_split[-1]}'
kwargs_dict_goal_and_endeffector_change_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
kwargs_dict_goal_and_endeffector_change_promp['wrappers'].append(goal_endeffector_change_mp_wrapper.MPWrapper)
kwargs_dict_goal_and_endeffector_change_promp['name'] = f'metaworld:{_task}'
register( register(
id=_env_id, id=f'metaworld/{_task}',
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper', register_step_based=False,
kwargs=kwargs_dict_goal_and_endeffector_change_promp mp_wrapper=goal_endeffector_change_mp_wrapper.MPWrapper,
add_mp_types=['ProMP', 'ProDMP'],
) )
ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
# ProDMP
_env_id = f'{name}ProDMP-{task_id_split[-1]}'
kwargs_dict_goal_and_endeffector_change_prodmp = deepcopy(DEFAULT_BB_DICT_ProDMP)
kwargs_dict_goal_and_endeffector_change_prodmp['wrappers'].append(goal_endeffector_change_mp_wrapper.MPWrapper)
kwargs_dict_goal_and_endeffector_change_prodmp['name'] = f'metaworld:{_task}'
register(
id=_env_id,
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_goal_and_endeffector_change_prodmp
)
ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProDMP"].append(_env_id)


@ -6,12 +6,63 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class BaseMetaworldMPWrapper(RawInterfaceWrapper): class BaseMetaworldMPWrapper(RawInterfaceWrapper):
mp_config = {
'inherit_defaults': False,
'ProMP': {
'wrappers': [],
'trajectory_generator_kwargs': {
'trajectory_generator_type': 'promp',
'weights_scale': 10,
},
'phase_generator_kwargs': {
'phase_generator_type': 'linear'
},
'controller_kwargs': {
'controller_type': 'metaworld',
},
'basis_generator_kwargs': {
'basis_generator_type': 'zero_rbf',
'num_basis': 5,
'num_basis_zero_start': 1
},
'black_box_kwargs': {
'condition_on_desired': False,
},
},
'DMP': {},
'ProDMP': {
'wrappers': [],
'trajectory_generator_kwargs': {
'trajectory_generator_type': 'prodmp',
'auto_scale_basis': True,
'weights_scale': 10,
# 'goal_scale': 0.,
'disable_goal': True,
},
'phase_generator_kwargs': {
'phase_generator_type': 'exp',
# 'alpha_phase' : 3,
},
'controller_kwargs': {
'controller_type': 'metaworld',
},
'basis_generator_kwargs': {
'basis_generator_type': 'prodmp',
'num_basis': 5,
'alpha': 10
},
'black_box_kwargs': {
'condition_on_desired': False,
},
},
}
@property @property
def current_pos(self) -> Union[float, int, np.ndarray]: def current_pos(self) -> Union[float, int, np.ndarray]:
r_close = self.env.data.get_joint_qpos("r_close") r_close = self.env.data.joint('r_close').qpos
return np.hstack([self.env.data.mocap_pos.flatten() / self.env.action_scale, r_close]) return np.hstack([self.env.data.mocap_pos.flatten() / self.env.action_scale, r_close])
@property @property
def current_vel(self) -> Union[float, int, np.ndarray, Tuple]: def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
return np.zeros(4, ) return np.zeros(4, )
# raise NotImplementedError("Velocity cannot be retrieved.") # raise NotImplementedError('Velocity cannot be retrieved.')


@ -9,19 +9,6 @@ class MPWrapper(BaseMetaworldMPWrapper):
and no secondary objects or end effectors are altered at the start of an episode. and no secondary objects or end effectors are altered at the start of an episode.
You can verify this by executing the code below for your environment id and check if the output is non-zero You can verify this by executing the code below for your environment id and check if the output is non-zero
at the same indices. at the same indices.
```python
import fancy_gym
env = fancy_gym.make(env_id, 1)
print(env.reset() - env.reset())
array([ 0. , 0. , 0. , 0. , 0,
0 , 0 , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0 , 0 , 0 ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , !=0 , !=0 , !=0])
```
""" """
@property @property
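The verification snippet that this docstring refers to was removed above; under the new Gymnasium-style API it could be reproduced roughly as follows (a sketch; `metaworld/reach-v2` merely stands in for the environment id in question):

```python
import gymnasium as gym
import fancy_gym  # registers the metaworld/* step-based ids

env = gym.make('metaworld/reach-v2')
obs_a, _ = env.reset()
obs_b, _ = env.reset()
# entries that differ between two resets are the components randomized per episode (e.g. the goal)
print(obs_a - obs_b)
```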


@ -9,19 +9,6 @@ class MPWrapper(BaseMetaworldMPWrapper):
and no secondary objects or end effectors are altered at the start of an episode. and no secondary objects or end effectors are altered at the start of an episode.
You can verify this by executing the code below for your environment id and check if the output is non-zero You can verify this by executing the code below for your environment id and check if the output is non-zero
at the same indices. at the same indices.
```python
import fancy_gym
env = fancy_gym.make(env_id, 1)
print(env.reset() - env.reset())
array([ !=0 , !=0 , !=0 , 0. , 0.,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , !=0 , !=0 ,
!=0 , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , !=0 , !=0 , !=0])
```
""" """
@property @property


@ -9,19 +9,6 @@ class MPWrapper(BaseMetaworldMPWrapper):
and no secondary objects or end effectors are altered at the start of an episode. and no secondary objects or end effectors are altered at the start of an episode.
You can verify this by executing the code below for your environment id and check if the output is non-zero You can verify this by executing the code below for your environment id and check if the output is non-zero
at the same indices. at the same indices.
```python
import fancy_gym
env = fancy_gym.make(env_id, 1)
print(env.reset() - env.reset())
array([ 0. , 0. , 0. , 0. , !=0,
!=0 , !=0 , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , !=0 , !=0 , !=0 ,
0. , 0. , 0. , 0. , 0. ,
0. , 0. , 0. , 0. , 0. ,
0. , !=0 , !=0 , !=0])
```
""" """
@property @property


@ -0,0 +1,97 @@
import random
from typing import Iterable, Type, Union, Optional
import numpy as np
from gymnasium import register as gym_register
import uuid
import gymnasium as gym
import numpy as np
from fancy_gym.utils.env_compatibility import EnvCompatibility
try:
import metaworld
except Exception:
print('[FANCY GYM] Metaworld not available')
class FixMetaworldHasIncorrectObsSpaceWrapper(gym.Wrapper, gym.utils.RecordConstructorArgs):
def __init__(self, env: gym.Env):
gym.utils.RecordConstructorArgs.__init__(self)
gym.Wrapper.__init__(self, env)
eos = env.observation_space
eas = env.action_space
Obs_Space_Class = getattr(gym.spaces, str(eos.__class__).split("'")[1].split('.')[-1])
Act_Space_Class = getattr(gym.spaces, str(eas.__class__).split("'")[1].split('.')[-1])
self.observation_space = Obs_Space_Class(low=eos.low-np.inf, high=eos.high+np.inf, dtype=eos.dtype)
self.action_space = Act_Space_Class(low=eas.low, high=eas.high, dtype=eas.dtype)
class FixMetaworldIncorrectResetPathLengthWrapper(gym.Wrapper, gym.utils.RecordConstructorArgs):
def __init__(self, env: gym.Env):
gym.utils.RecordConstructorArgs.__init__(self)
gym.Wrapper.__init__(self, env)
def reset(self, **kwargs):
ret = self.env.reset(**kwargs)
head = self.env
try:
for i in range(16):
head.curr_path_length = 0
head = head.env
except:
pass
return ret
class FixMetaworldIgnoresSeedOnResetWrapper(gym.Wrapper, gym.utils.RecordConstructorArgs):
def __init__(self, env: gym.Env):
gym.utils.RecordConstructorArgs.__init__(self)
gym.Wrapper.__init__(self, env)
def reset(self, **kwargs):
print('[!] You just called .reset on a Metaworld env and supplied a seed. Metaworld currently does not correctly implement seeding. Do not rely on deterministic behavior.')
if 'seed' in kwargs:
self.env.seed(kwargs['seed'])
return self.env.reset(**kwargs)
def make_metaworld(underlying_id: str, seed: int = 1, render_mode: Optional[str] = None, **kwargs):
if underlying_id not in metaworld.ML1.ENV_NAMES:
raise ValueError(f'Specified environment "{underlying_id}" not present in metaworld ML1.')
env = metaworld.envs.ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE[underlying_id + "-goal-observable"](seed=seed, **kwargs)
# setting this avoids generating the same initialization after each reset
env._freeze_rand_vec = False
# New argument to use global seeding
env.seeded_rand_vec = True
# TODO remove, when this has been fixed upstream
env = FixMetaworldHasIncorrectObsSpaceWrapper(env)
# TODO remove, when this has been fixed upstream
# env = FixMetaworldIncorrectResetPathLengthWrapper(env)
# TODO remove, when this has been fixed upstream
env = FixMetaworldIgnoresSeedOnResetWrapper(env)
return env
def register_all_ML1(**kwargs):
for env_id in metaworld.ML1.ENV_NAMES:
_env = metaworld.envs.ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE[env_id + "-goal-observable"](seed=0)
max_episode_steps = _env.max_path_length
gym_register(
id='metaworld/'+env_id,
entry_point=make_metaworld,
max_episode_steps=max_episode_steps,
kwargs={
'underlying_id': env_id
},
**kwargs
)
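A usage sketch for the step-based ids registered by `register_all_ML1` (this assumes the package invokes it on import; note the seeding caveat printed by `FixMetaworldIgnoresSeedOnResetWrapper` above):

```python
import gymnasium as gym
import fancy_gym  # assumed to invoke register_all_ML1() on import

env = gym.make('metaworld/bin-picking-v2')
obs, info = env.reset(seed=1)  # seeding is best-effort, see the warning wrapper above
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
```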


@ -4,11 +4,12 @@ These are the Environment Wrappers for selected [OpenAI Gym](https://gym.openai.
the Motion Primitive gym interface for them. the Motion Primitive gym interface for them.
## MP Environments ## MP Environments
These environments are wrapped-versions of their OpenAI-gym counterparts. These environments are wrapped-versions of their OpenAI-gym counterparts.
|Name| Description|Trajectory Horizon|Action Dimension|Context Dimension | Name | Description | Trajectory Horizon | Action Dimension |
|---|---|---|---|---| | ------------------------------------ | -------------------------------------------------------------------- | ------------------ | ---------------- |
|`ContinuousMountainCarProMP-v0`| A ProMP wrapped version of the ContinuousMountainCar-v0 environment. | 100 | 1 | `gym_ProMP/ContinuousMountainCar-v0` | A ProMP wrapped version of the ContinuousMountainCar-v0 environment. | 100 | 1 |
|`ReacherProMP-v2`| A ProMP wrapped version of the Reacher-v2 environment. | 50 | 2 | `gym_ProMP/Reacher-v2` | A ProMP wrapped version of the Reacher-v2 environment. | 50 | 2 |
|`FetchSlideDenseProMP-v1`| A ProMP wrapped version of the FetchSlideDense-v1 environment. | 50 | 4 | `gym_ProMP/FetchSlideDense-v1` | A ProMP wrapped version of the FetchSlideDense-v1 environment. | 50 | 4 |
|`FetchReachDenseProMP-v1`| A ProMP wrapped version of the FetchReachDense-v1 environment. | 50 | 4 | `gym_ProMP/FetchReachDense-v1` | A ProMP wrapped version of the FetchReachDense-v1 environment. | 50 | 4 |
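For reference, the renamed ids from the table can be instantiated directly through Gymnasium once `fancy_gym` is imported; a minimal sketch (one step executes the full ProMP trajectory):

```python
import gymnasium as gym
import fancy_gym  # registers the gym_ProMP/* ids listed above

env = gym.make('gym_ProMP/Reacher-v2')
obs, info = env.reset(seed=1)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
print(info['trajectory_length'])  # number of underlying simulation steps executed
```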


@ -1,45 +1,16 @@
from copy import deepcopy from copy import deepcopy
from gym import register from ..envs.registry import register, upgrade
from . import mujoco from . import mujoco
from .deprecated_needs_gym_robotics import robotics from .deprecated_needs_gym_robotics import robotics
ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS = {"DMP": [], "ProMP": [], "ProDMP": []} upgrade(
id='Reacher-v2',
DEFAULT_BB_DICT_ProMP = { mp_wrapper=mujoco.reacher_v2.MPWrapper,
"name": 'EnvName', add_mp_types=['ProMP'],
"wrappers": [],
"trajectory_generator_kwargs": {
'trajectory_generator_type': 'promp'
},
"phase_generator_kwargs": {
'phase_generator_type': 'linear'
},
"controller_kwargs": {
'controller_type': 'motor',
"p_gains": 1.0,
"d_gains": 0.1,
},
"basis_generator_kwargs": {
'basis_generator_type': 'zero_rbf',
'num_basis': 5,
'num_basis_zero_start': 1
}
}
kwargs_dict_reacher_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
kwargs_dict_reacher_promp['controller_kwargs']['p_gains'] = 0.6
kwargs_dict_reacher_promp['controller_kwargs']['d_gains'] = 0.075
kwargs_dict_reacher_promp['basis_generator_kwargs']['num_basis'] = 6
kwargs_dict_reacher_promp['name'] = "Reacher-v2"
kwargs_dict_reacher_promp['wrappers'].append(mujoco.reacher_v2.MPWrapper)
register(
id='ReacherProMP-v2',
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
kwargs=kwargs_dict_reacher_promp
) )
ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append("ReacherProMP-v2")
""" """
The Fetch environments are not supported by gym anymore. A new repository (gym_robotics) is supporting the environments. The Fetch environments are not supported by gym anymore. A new repository (gym_robotics) is supporting the environments.
However, the usage and so on needs to be checked However, the usage and so on needs to be checked


@ -6,6 +6,28 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
class MPWrapper(RawInterfaceWrapper): class MPWrapper(RawInterfaceWrapper):
mp_config = {
'ProMP': {
"trajectory_generator_kwargs": {
'trajectory_generator_type': 'promp'
},
"phase_generator_kwargs": {
'phase_generator_type': 'linear'
},
"controller_kwargs": {
'controller_type': 'motor',
"p_gains": 0.6,
"d_gains": 0.075,
},
"basis_generator_kwargs": {
'basis_generator_type': 'zero_rbf',
'num_basis': 6,
'num_basis_zero_start': 1
}
},
'DMP': {},
'ProDMP': {},
}
@property @property
def current_vel(self) -> Union[float, int, np.ndarray]: def current_vel(self) -> Union[float, int, np.ndarray]:


@ -0,0 +1,11 @@
import gymnasium as gym
class EnvCompatibility(gym.wrappers.EnvCompatibility):
def __getattr__(self, item):
"""Propagate only non-existent properties to wrapped env."""
if item.startswith('_'):
raise AttributeError("attempted to get missing private attribute '{}'".format(item))
if item in self.__dict__:
return getattr(self, item)
return getattr(self.env, item)


@ -1,17 +1,27 @@
import logging from fancy_gym.utils.wrappers import TimeAwareObservation
import re from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
from fancy_gym.black_box.factory.trajectory_generator_factory import get_trajectory_generator
from fancy_gym.black_box.factory.phase_generator_factory import get_phase_generator
from fancy_gym.black_box.factory.controller_factory import get_controller
from fancy_gym.black_box.factory.basis_generator_factory import get_basis_generator
from fancy_gym.black_box.black_box_wrapper import BlackBoxWrapper
import uuid import uuid
from collections.abc import MutableMapping from collections.abc import MutableMapping
from copy import deepcopy
from math import ceil from math import ceil
from typing import Iterable, Type, Union from typing import Iterable, Type, Union, Optional
import gym import gymnasium as gym
from gymnasium import make
import numpy as np import numpy as np
from gym.envs.registration import register, registry from gymnasium.envs.registration import register, registry
from gymnasium.wrappers import TimeLimit
from fancy_gym.utils.env_compatibility import EnvCompatibility
from fancy_gym.utils.wrappers import FlattenObservation
try: try:
from dm_control import suite, manipulation import shimmy
from shimmy.dm_control_compatibility import EnvType
except ImportError: except ImportError:
pass pass
@ -21,111 +31,44 @@ except Exception:
# catch Exception as Import error does not catch missing mujoco-py # catch Exception as Import error does not catch missing mujoco-py
pass pass
import fancy_gym
from fancy_gym.black_box.black_box_wrapper import BlackBoxWrapper
from fancy_gym.black_box.factory.basis_generator_factory import get_basis_generator
from fancy_gym.black_box.factory.controller_factory import get_controller
from fancy_gym.black_box.factory.phase_generator_factory import get_phase_generator
from fancy_gym.black_box.factory.trajectory_generator_factory import get_trajectory_generator
from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
from fancy_gym.utils.time_aware_observation import TimeAwareObservation
from fancy_gym.utils.utils import nested_update
def _make_wrapped_env(env: gym.Env, wrappers: Iterable[Type[gym.Wrapper]], seed=1, fallback_max_steps=None):
def make_rank(env_id: str, seed: int, rank: int = 0, return_callable=True, **kwargs):
"""
TODO: Do we need this?
Generate a callable to create a new gym environment with a given seed.
The rank is added to the seed and can be used for example when using vector environments.
E.g. [make_rank("my_env_name-v0", 123, i) for i in range(8)] creates a list of 8 environments
with seeds 123 through 130.
Hence, testing environments should be seeded with a value which is offset by the number of training environments.
Here e.g. [make_rank("my_env_name-v0", 123 + 8, i) for i in range(5)] for 5 testing environments
Args:
env_id: name of the environment
seed: seed for deterministic behaviour
rank: environment rank for deterministic over multiple seeds behaviour
return_callable: If True returns a callable to create the environment instead of the environment itself.
Returns:
"""
def f():
return make(env_id, seed + rank, **kwargs)
return f if return_callable else f()
def make(env_id: str, seed: int, **kwargs):
"""
Converts an env_id to an environment with the gym API.
This also works for DeepMind Control Suite environments that are wrapped using the DMCWrapper, they can be
specified with "dmc:domain_name-task_name"
Analogously, metaworld tasks can be created as "metaworld:env_id-v2".
Args:
env_id: spec or env_id for gym tasks, external environments require a domain specification
**kwargs: Additional kwargs for the constructor such as pixel observations, etc.
Returns: Gym environment
"""
if ':' in env_id:
split_id = env_id.split(':')
framework, env_id = split_id[-2:]
else:
framework = None
if framework == 'metaworld':
# MetaWorld environment
env = make_metaworld(env_id, seed, **kwargs)
elif framework == 'dmc':
# DeepMind Control environment
env = make_dmc(env_id, seed, **kwargs)
else:
env = make_gym(env_id, seed, **kwargs)
env.seed(seed)
env.action_space.seed(seed)
env.observation_space.seed(seed)
return env
def _make_wrapped_env(env_id: str, wrappers: Iterable[Type[gym.Wrapper]], seed=1, **kwargs):
""" """
Helper function for creating a wrapped gym environment using MPs. Helper function for creating a wrapped gym environment using MPs.
It adds all provided wrappers to the specified environment and verifies at least one RawInterfaceWrapper is It adds all provided wrappers to the specified environment and verifies at least one RawInterfaceWrapper is
provided to expose the interface for MPs. provided to expose the interface for MPs.
Args: Args:
env_id: name of the environment env: base environment to wrap
wrappers: list of wrappers (at least a RawInterfaceWrapper), wrappers: list of wrappers (at least a RawInterfaceWrapper),
seed: seed of environment seed: seed of environment
Returns: gym environment with all specified wrappers applied Returns: gym environment with all specified wrappers applied
""" """
# _env = gym.make(env_id) if fallback_max_steps:
_env = make(env_id, seed, **kwargs) env = ensure_finite_time(env, fallback_max_steps)
has_black_box_wrapper = False has_black_box_wrapper = False
head = env
while hasattr(head, 'env'):
if isinstance(head, RawInterfaceWrapper):
has_black_box_wrapper = True
break
head = head.env
for w in wrappers: for w in wrappers:
# only wrap the environment if not BlackBoxWrapper, e.g. for vision # only wrap the environment if not BlackBoxWrapper, e.g. for vision
if issubclass(w, RawInterfaceWrapper): if issubclass(w, RawInterfaceWrapper):
has_black_box_wrapper = True has_black_box_wrapper = True
_env = w(_env) env = w(env)
if not has_black_box_wrapper: if not has_black_box_wrapper:
raise ValueError("A RawInterfaceWrapper is required in order to leverage movement primitive environments.") raise ValueError("A RawInterfaceWrapper is required in order to leverage movement primitive environments.")
return _env return env
def make_bb( def make_bb(
env_id: str, wrappers: Iterable, black_box_kwargs: MutableMapping, traj_gen_kwargs: MutableMapping, env: Union[gym.Env, str], wrappers: Iterable, black_box_kwargs: MutableMapping, traj_gen_kwargs: MutableMapping,
controller_kwargs: MutableMapping, phase_kwargs: MutableMapping, basis_kwargs: MutableMapping, seed: int = 1, controller_kwargs: MutableMapping, phase_kwargs: MutableMapping, basis_kwargs: MutableMapping,
**kwargs): time_limit: int = None, fallback_max_steps: int = None, **kwargs):
""" """
This can also be used standalone for manually building a custom DMP environment. This can also be used standalone for manually building a custom DMP environment.
Args: Args:
@ -133,7 +76,7 @@ def make_bb(
basis_kwargs: kwargs for the basis generator basis_kwargs: kwargs for the basis generator
phase_kwargs: kwargs for the phase generator phase_kwargs: kwargs for the phase generator
controller_kwargs: kwargs for the tracking controller controller_kwargs: kwargs for the tracking controller
env_id: base_env_name, env: step based environment (or environment id),
wrappers: list of wrappers (at least a RawInterfaceWrapper), wrappers: list of wrappers (at least a RawInterfaceWrapper),
seed: seed of environment seed: seed of environment
traj_gen_kwargs: dict of at least {num_dof: int, num_basis: int} for DMP traj_gen_kwargs: dict of at least {num_dof: int, num_basis: int} for DMP
@ -141,7 +84,7 @@ def make_bb(
Returns: DMP wrapped gym env Returns: DMP wrapped gym env
""" """
_verify_time_limit(traj_gen_kwargs.get("duration"), kwargs.get("time_limit")) _verify_time_limit(traj_gen_kwargs.get("duration"), time_limit)
learn_sub_trajs = black_box_kwargs.get('learn_sub_trajectories') learn_sub_trajs = black_box_kwargs.get('learn_sub_trajectories')
do_replanning = black_box_kwargs.get('replanning_schedule') do_replanning = black_box_kwargs.get('replanning_schedule')
@ -153,12 +96,19 @@ def make_bb(
# Add as first wrapper in order to alter observation # Add as first wrapper in order to alter observation
wrappers.insert(0, TimeAwareObservation) wrappers.insert(0, TimeAwareObservation)
env = _make_wrapped_env(env_id=env_id, wrappers=wrappers, seed=seed, **kwargs) if isinstance(env, str):
env = make(env, **kwargs)
env = _make_wrapped_env(env=env, wrappers=wrappers, fallback_max_steps=fallback_max_steps)
# BB expects a spaces.Box to be exposed, need to convert for dict-observations
if type(env.observation_space) == gym.spaces.dict.Dict:
env = FlattenObservation(env)
traj_gen_kwargs['action_dim'] = traj_gen_kwargs.get('action_dim', np.prod(env.action_space.shape).item()) traj_gen_kwargs['action_dim'] = traj_gen_kwargs.get('action_dim', np.prod(env.action_space.shape).item())
if black_box_kwargs.get('duration') is None: if black_box_kwargs.get('duration') is None:
black_box_kwargs['duration'] = env.spec.max_episode_steps * env.dt black_box_kwargs['duration'] = get_env_duration(env)
if phase_kwargs.get('tau') is None: if phase_kwargs.get('tau') is None:
phase_kwargs['tau'] = black_box_kwargs['duration'] phase_kwargs['tau'] = black_box_kwargs['duration']
@ -186,156 +136,27 @@ def make_bb(
return bb_env return bb_env
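A standalone usage sketch for the updated `make_bb` signature, which now takes an environment instance (or id) first and `fallback_max_steps` as an explicit keyword; the environment id, wrapper, and generator kwargs below mirror values used in the tests of this diff and are illustrative only:

```python
import gymnasium as gym
import fancy_gym
from fancy_gym.utils.make_env_helpers import make_bb

env = make_bb(
    gym.make('fancy/Reacher5d-v0'),                      # step-based env; an id string also works
    wrappers=[fancy_gym.envs.mujoco.reacher.MPWrapper],  # must include a RawInterfaceWrapper
    black_box_kwargs={},
    traj_gen_kwargs={'trajectory_generator_type': 'promp'},
    controller_kwargs={'controller_type': 'motor'},
    phase_kwargs={'phase_generator_type': 'linear'},
    basis_kwargs={'basis_generator_type': 'zero_rbf'},
    fallback_max_steps=100,
)
obs, info = env.reset(seed=1)
```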
def make_bb_env_helper(**kwargs): def ensure_finite_time(env: gym.Env, fallback_max_steps=500):
""" cur_limit = env.spec.max_episode_steps
Helper function for registering a black box gym environment. if not cur_limit:
Args: if hasattr(env.unwrapped, 'max_path_length'):
**kwargs: expects at least the following: return TimeLimit(env, env.unwrapped.__getattribute__('max_path_length'))
{ return TimeLimit(env, fallback_max_steps)
"name": base environment name.
"wrappers": list of wrappers (at least an BlackBoxWrapper is required),
"traj_gen_kwargs": {
"trajectory_generator_type": type_of_your_movement_primitive,
non default arguments for the movement primitive instance
...
}
"controller_kwargs": {
"controller_type": type_of_your_controller,
non default arguments for the tracking_controller instance
...
},
"basis_generator_kwargs": {
"basis_generator_type": type_of_your_basis_generator,
non default arguments for the basis generator instance
...
},
"phase_generator_kwargs": {
"phase_generator_type": type_of_your_phase_generator,
non default arguments for the phase generator instance
...
},
}
Returns: MP wrapped gym env
"""
seed = kwargs.pop("seed", None)
wrappers = kwargs.pop("wrappers")
traj_gen_kwargs = kwargs.pop("trajectory_generator_kwargs", {})
black_box_kwargs = kwargs.pop('black_box_kwargs', {})
contr_kwargs = kwargs.pop("controller_kwargs", {})
phase_kwargs = kwargs.pop("phase_generator_kwargs", {})
basis_kwargs = kwargs.pop("basis_generator_kwargs", {})
return make_bb(env_id=kwargs.pop("name"), wrappers=wrappers,
black_box_kwargs=black_box_kwargs,
traj_gen_kwargs=traj_gen_kwargs, controller_kwargs=contr_kwargs,
phase_kwargs=phase_kwargs,
basis_kwargs=basis_kwargs, **kwargs, seed=seed)
def make_dmc(
env_id: str,
seed: int = None,
visualize_reward: bool = True,
time_limit: Union[None, float] = None,
**kwargs
):
if not re.match(r"\w+-\w+", env_id):
raise ValueError("env_id does not have the following structure: 'domain_name-task_name'")
domain_name, task_name = env_id.split("-")
if task_name.endswith("_vision"):
# TODO
raise ValueError("The vision interface for manipulation tasks is currently not supported.")
if (domain_name, task_name) not in suite.ALL_TASKS and task_name not in manipulation.ALL:
raise ValueError(f'Specified domain "{domain_name}" and task "{task_name}" combination does not exist.')
# env_id = f'dmc_{domain_name}_{task_name}_{seed}-v1'
gym_id = uuid.uuid4().hex + '-v1'
task_kwargs = {'random': seed}
if time_limit is not None:
task_kwargs['time_limit'] = time_limit
# create task
# Accessing private attribute because DMC does not expose time_limit or step_limit.
# Only the current time_step/time as well as the control_timestep can be accessed.
if domain_name == "manipulation":
env = manipulation.load(environment_name=task_name, seed=seed)
max_episode_steps = ceil(env._time_limit / env.control_timestep())
else:
env = suite.load(domain_name=domain_name, task_name=task_name, task_kwargs=task_kwargs,
visualize_reward=visualize_reward, environment_kwargs=kwargs)
max_episode_steps = int(env._step_limit)
register(
id=gym_id,
entry_point='fancy_gym.dmc.dmc_wrapper:DMCWrapper',
kwargs={'env': lambda: env},
max_episode_steps=max_episode_steps,
)
env = gym.make(gym_id)
env.seed(seed)
return env return env
def make_metaworld(env_id: str, seed: int, **kwargs): def get_env_duration(env: gym.Env):
if env_id not in metaworld.ML1.ENV_NAMES:
raise ValueError(f'Specified environment "{env_id}" not present in metaworld ML1.')
_env = metaworld.envs.ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE[env_id + "-goal-observable"](seed=seed, **kwargs)
# setting this avoids generating the same initialization after each reset
_env._freeze_rand_vec = False
# New argument to use global seeding
_env.seeded_rand_vec = True
gym_id = uuid.uuid4().hex + '-v1'
register(
id=gym_id,
entry_point=lambda: _env,
max_episode_steps=_env.max_path_length,
)
# TODO enable checker when the incorrect dtype of obs and observation space are fixed by metaworld
env = gym.make(gym_id, disable_env_checker=True)
return env
def make_gym(env_id, seed, **kwargs):
"""
Create
Args:
env_id:
seed:
**kwargs:
Returns:
"""
# Getting the existing keywords to allow for nested dict updates for BB envs
# gym only allows for non nested updates.
try: try:
all_kwargs = deepcopy(registry.get(env_id).kwargs) duration = env.spec.max_episode_steps * env.dt
except AttributeError as e: except (AttributeError, TypeError) as e:
logging.error(f'The gym environment with id {env_id} could not been found.') if env.env_type is EnvType.COMPOSER:
raise e max_episode_steps = ceil(env.unwrapped._time_limit / env.dt)
nested_update(all_kwargs, kwargs) elif env.env_type is EnvType.RL_CONTROL:
kwargs = all_kwargs max_episode_steps = int(env.unwrapped._step_limit)
else:
# Add seed to kwargs for bb environments to pass seed to step environments raise e
all_bb_envs = sum(fancy_gym.ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS.values(), []) duration = max_episode_steps * env.control_timestep()
if env_id in all_bb_envs: return duration
kwargs.update({"seed": seed})
# Gym
env = gym.make(env_id, **kwargs)
return env
def _verify_time_limit(mp_time_limit: Union[None, float], env_time_limit: Union[None, float]): def _verify_time_limit(mp_time_limit: Union[None, float], env_time_limit: Union[None, float]):


@ -1,78 +0,0 @@
"""
Adapted from: https://github.com/openai/gym/blob/907b1b20dd9ac0cba5803225059b9c6673702467/gym/wrappers/time_aware_observation.py
License: MIT
Copyright (c) 2016 OpenAI (https://openai.com)
Wrapper for adding time aware observations to environment observation.
"""
import gym
import numpy as np
from gym.spaces import Box
class TimeAwareObservation(gym.ObservationWrapper):
"""Augment the observation with the current time step in the episode.
The observation space of the wrapped environment is assumed to be a flat :class:`Box`.
In particular, pixel observations are not supported. This wrapper will append the current timestep
within the current episode to the observation.
Example:
>>> import gym
>>> env = gym.make('CartPole-v1')
>>> env = TimeAwareObservation(env)
>>> env.reset()
array([ 0.03810719, 0.03522411, 0.02231044, -0.01088205, 0. ])
>>> env.step(env.action_space.sample())[0]
array([ 0.03881167, -0.16021058, 0.0220928 , 0.28875574, 1. ])
"""
def __init__(self, env: gym.Env):
"""Initialize :class:`TimeAwareObservation` that requires an environment with a flat :class:`Box`
observation space.
Args:
env: The environment to apply the wrapper
"""
super().__init__(env)
assert isinstance(env.observation_space, Box)
low = np.append(self.observation_space.low, 0.0)
high = np.append(self.observation_space.high, 1.0)
self.observation_space = Box(low, high, dtype=self.observation_space.dtype)
self.t = 0
self._max_episode_steps = env.spec.max_episode_steps
def observation(self, observation):
"""Adds to the observation with the current time step normalized with max steps.
Args:
observation: The observation to add the time step to
Returns:
The observation with the time step appended to
"""
return np.append(observation, self.t / self._max_episode_steps)
def step(self, action):
"""Steps through the environment, incrementing the time step.
Args:
action: The action to take
Returns:
The environment's step using the action.
"""
self.t += 1
return super().step(action)
def reset(self, **kwargs):
"""Reset the environment setting the time to zero.
Args:
**kwargs: Kwargs to apply to env.reset()
Returns:
The reset environment
"""
self.t = 0
return super().reset(**kwargs)

fancy_gym/utils/wrappers.py Normal file

@ -0,0 +1,130 @@
from gymnasium.spaces import Box, Dict, flatten, flatten_space
try:
from gym.spaces import Box as OldBox
except ImportError:
OldBox = None
import gymnasium as gym
import numpy as np
import copy
class TimeAwareObservation(gym.ObservationWrapper, gym.utils.RecordConstructorArgs):
"""Augment the observation with the current time step in the episode.
The observation space of the wrapped environment is assumed to be a flat :class:`Box` or flattable :class:`Dict`.
In particular, pixel observations are not supported. This wrapper will append the current progress within the current episode to the observation.
The progress will be indicated as a number between 0 and 1.
"""
def __init__(self, env: gym.Env, enforce_dtype_float32=False):
"""Initialize :class:`TimeAwareObservation` that requires an environment with a flat :class:`Box` or flattable :class:`Dict` observation space.
Args:
env: The environment to apply the wrapper
"""
gym.utils.RecordConstructorArgs.__init__(self)
gym.ObservationWrapper.__init__(self, env)
allowed_classes = [Box, OldBox, Dict]
if enforce_dtype_float32:
assert env.observation_space.dtype == np.float32, 'TimeAwareObservation was given an environment with a dtype!=np.float32 ('+str(
env.observation_space.dtype)+'). This requirement can be removed by setting enforce_dtype_float32=False.'
assert env.observation_space.__class__ in allowed_classes, str(env.observation_space)+' is not supported. Only Box or Dict'
if env.observation_space.__class__ in [Box, OldBox]:
dtype = env.observation_space.dtype
low = np.append(env.observation_space.low, 0.0)
high = np.append(env.observation_space.high, 1.0)
self.observation_space = Box(low, high, dtype=dtype)
else:
spaces = copy.copy(env.observation_space.spaces)
dtype = np.float64
spaces['time_awareness'] = Box(0, 1, dtype=dtype)
self.observation_space = Dict(spaces)
self.is_vector_env = getattr(env, "is_vector_env", False)
def observation(self, observation):
"""Adds to the observation with the current time step.
Args:
observation: The observation to add the time step to
Returns:
The observation with the time step appended to (relative to total number of steps)
"""
if self.observation_space.__class__ in [Box, OldBox]:
return np.append(observation, self.t / self.env.spec.max_episode_steps)
else:
obs = copy.copy(observation)
obs['time_awareness'] = self.t / self.env.spec.max_episode_steps
return obs
def step(self, action):
"""Steps through the environment, incrementing the time step.
Args:
action: The action to take
Returns:
The environment's step using the action.
"""
self.t += 1
return super().step(action)
def reset(self, **kwargs):
"""Reset the environment setting the time to zero.
Args:
**kwargs: Kwargs to apply to env.reset()
Returns:
The reset environment
"""
self.t = 0
return super().reset(**kwargs)
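A minimal sketch of the new wrapper on a flat-Box environment; the appended entry is the episode progress in [0, 1]:

```python
import gymnasium as gym
from fancy_gym.utils.wrappers import TimeAwareObservation

env = TimeAwareObservation(gym.make('CartPole-v1'))
obs, info = env.reset(seed=1)
print(obs[-1])   # 0.0 right after reset
obs, *_ = env.step(env.action_space.sample())
print(obs[-1])   # 1 / spec.max_episode_steps after one step
```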
class FlattenObservation(gym.ObservationWrapper, gym.utils.RecordConstructorArgs):
"""Observation wrapper that flattens the observation.
Example:
>>> import gymnasium as gym
>>> from gymnasium.wrappers import FlattenObservation
>>> env = gym.make("CarRacing-v2")
>>> env.observation_space.shape
(96, 96, 3)
>>> env = FlattenObservation(env)
>>> env.observation_space.shape
(27648,)
>>> obs, _ = env.reset()
>>> obs.shape
(27648,)
"""
def __init__(self, env: gym.Env):
"""Flattens the observations of an environment.
Args:
env: The environment to apply the wrapper
"""
gym.utils.RecordConstructorArgs.__init__(self)
gym.ObservationWrapper.__init__(self, env)
self.observation_space = flatten_space(env.observation_space)
def observation(self, observation):
"""Flattens an observation.
Args:
observation: The observation to flatten
Returns:
The flattened observation
"""
try:
return flatten(self.env.observation_space, observation)
except:
return np.array([flatten(self.env.observation_space, observation[i]) for i in range(len(observation))])

icon.svg Normal file

File diff suppressed because one or more lines are too long

@ -6,33 +6,38 @@ from setuptools import setup, find_packages
# Environment-specific dependencies for dmc and metaworld # Environment-specific dependencies for dmc and metaworld
extras = { extras = {
"dmc": ["dm_control>=1.0.1"], 'dmc': ['shimmy[dm-control]', 'Shimmy==1.0.0'],
"metaworld": ["metaworld @ git+https://github.com/rlworkgroup/metaworld.git@master#egg=metaworld", 'metaworld': ['metaworld @ git+https://github.com/Farama-Foundation/Metaworld.git@d155d0051630bb365ea6a824e02c66c068947439#egg=metaworld'],
'mujoco-py<2.2,>=2.1', 'box2d': ['gymnasium[box2d]>=0.26.0'],
'scipy' 'mujoco': ['mujoco==2.3.3', 'gymnasium[mujoco]>0.26.0'],
], 'mujoco-legacy': ['mujoco-py >=2.1,<2.2', 'cython<3'],
'jax': ["jax >=0.4.0", "jaxlib >=0.4.0"],
} }
# All dependencies # All dependencies
all_groups = set(extras.keys()) all_groups = set(extras.keys())
extras["all"] = list(set(itertools.chain.from_iterable(map(lambda group: extras[group], all_groups)))) extras["all"] = list(set(itertools.chain.from_iterable(
map(lambda group: extras[group], all_groups))))
extras['testing'] = extras["all"] + ['pytest']
def find_package_data(extensions_to_include: List[str]) -> List[str]: def find_package_data(extensions_to_include: List[str]) -> List[str]:
envs_dir = Path("fancy_gym/envs/mujoco") envs_dir = Path("fancy_gym/envs/mujoco")
package_data_paths = [] package_data_paths = []
for extension in extensions_to_include: for extension in extensions_to_include:
package_data_paths.extend([str(path)[10:] for path in envs_dir.rglob(extension)]) package_data_paths.extend([str(path)[10:]
for path in envs_dir.rglob(extension)])
return package_data_paths return package_data_paths
setup( setup(
author='Fabian Otto, Onur Celik', author='Fabian Otto, Onur Celik, Dominik Roth, Hongyi Zhou',
name='fancy_gym', name='fancy_gym',
version='0.2', version='1.0',
classifiers=[ classifiers=[
'Development Status :: 3 - Alpha', 'Development Status :: 4 - Beta',
'Intended Audience :: Science/Research', 'Intended Audience :: Science/Research',
'License :: OSI Approved :: MIT License', 'License :: OSI Approved :: MIT License',
'Natural Language :: English', 'Natural Language :: English',
@ -46,10 +51,11 @@ setup(
], ],
extras_require=extras, extras_require=extras,
install_requires=[ install_requires=[
'gym[mujoco]<0.25.0,>=0.24.1', 'gymnasium>=0.26.0',
'mp_pytorch<=0.1.3' 'mp_pytorch<=0.1.3'
], ],
packages=[package for package in find_packages() if package.startswith("fancy_gym")], packages=[package for package in find_packages(
) if package.startswith("fancy_gym")],
package_data={ package_data={
"fancy_gym": find_package_data(extensions_to_include=["*.stl", "*.xml"]) "fancy_gym": find_package_data(extensions_to_include=["*.stl", "*.xml"])
}, },


@ -1,14 +1,21 @@
import re
from itertools import chain from itertools import chain
from typing import Callable
import gym import gymnasium as gym
import pytest import pytest
import fancy_gym import fancy_gym
from test.utils import run_env, run_env_determinism from test.utils import run_env, run_env_determinism
GYM_IDS = [spec.id for spec in gym.envs.registry.all() if GYM_IDS = [spec.id for spec in gym.envs.registry.values() if
"fancy_gym" not in spec.entry_point and 'make_bb_env_helper' not in spec.entry_point] not isinstance(spec.entry_point, Callable) and
GYM_MP_IDS = chain(*fancy_gym.ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS.values()) "fancy_gym" not in spec.entry_point and 'make_bb_env_helper' not in spec.entry_point
and 'jax' not in spec.id.lower()
and not re.match(r'GymV2.Environment', spec.id)
]
GYM_MP_IDS = fancy_gym.ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS['all']
SEED = 1 SEED = 1


@ -1,21 +1,23 @@
from itertools import chain from itertools import chain
from typing import Tuple, Type, Union, Optional, Callable from typing import Tuple, Type, Union, Optional, Callable
import gym import gymnasium as gym
import numpy as np import numpy as np
import pytest import pytest
from gym import register from gymnasium import register, make
from gym.core import ActType, ObsType from gymnasium.core import ActType, ObsType
import fancy_gym import fancy_gym
from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
from fancy_gym.utils.time_aware_observation import TimeAwareObservation from fancy_gym.utils.wrappers import TimeAwareObservation
SEED = 1 SEED = 1
ENV_IDS = ['Reacher5d-v0', 'dmc:ball_in_cup-catch', 'metaworld:reach-v2', 'Reacher-v2'] ENV_IDS = ['fancy/Reacher5d-v0', 'dm_control/ball_in_cup-catch-v0', 'metaworld/reach-v2', 'Reacher-v2']
WRAPPERS = [fancy_gym.envs.mujoco.reacher.MPWrapper, fancy_gym.dmc.suite.ball_in_cup.MPWrapper, WRAPPERS = [fancy_gym.envs.mujoco.reacher.MPWrapper, fancy_gym.dmc.suite.ball_in_cup.MPWrapper,
fancy_gym.meta.goal_object_change_mp_wrapper.MPWrapper, fancy_gym.open_ai.mujoco.reacher_v2.MPWrapper] fancy_gym.meta.goal_object_change_mp_wrapper.MPWrapper, fancy_gym.open_ai.mujoco.reacher_v2.MPWrapper]
ALL_MP_ENVS = chain(*fancy_gym.ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS.values()) ALL_MP_ENVS = fancy_gym.ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS['all']
MAX_STEPS_FALLBACK = 100
class Object(object): class Object(object):
@ -32,10 +34,12 @@ class ToyEnv(gym.Env):
def reset(self, *, seed: Optional[int] = None, return_info: bool = False, def reset(self, *, seed: Optional[int] = None, return_info: bool = False,
options: Optional[dict] = None) -> Union[ObsType, Tuple[ObsType, dict]]: options: Optional[dict] = None) -> Union[ObsType, Tuple[ObsType, dict]]:
return np.array([-1]) obs, options = np.array([-1]), {}
return obs, options
def step(self, action: ActType) -> Tuple[ObsType, float, bool, dict]: def step(self, action: ActType) -> Tuple[ObsType, float, bool, dict]:
return np.array([-1]), 1, False, {} obs, reward, terminated, truncated, info = np.array([-1]), 1, False, False, {}
return obs, reward, terminated, truncated, info
def render(self, mode="human"): def render(self, mode="human"):
pass pass
@ -76,7 +80,7 @@ def test_missing_local_state(mp_type: str):
{'controller_type': 'motor'}, {'controller_type': 'motor'},
{'phase_generator_type': 'exp'}, {'phase_generator_type': 'exp'},
{'basis_generator_type': basis_generator_type}) {'basis_generator_type': basis_generator_type})
env.reset() env.reset(seed=SEED)
with pytest.raises(NotImplementedError): with pytest.raises(NotImplementedError):
env.step(env.action_space.sample()) env.step(env.action_space.sample())
@ -93,12 +97,14 @@ def test_verbosity(mp_type: str, env_wrap: Tuple[str, Type[RawInterfaceWrapper]]
{'controller_type': 'motor'}, {'controller_type': 'motor'},
{'phase_generator_type': 'exp'}, {'phase_generator_type': 'exp'},
{'basis_generator_type': basis_generator_type}) {'basis_generator_type': basis_generator_type})
env.reset() env.reset(seed=SEED)
info_keys = list(env.step(env.action_space.sample())[3].keys()) _obs, _reward, _terminated, _truncated, info = env.step(env.action_space.sample())
info_keys = list(info.keys())
env_step = fancy_gym.make(env_id, SEED) env_step = make(env_id)
env_step.reset() env_step.reset()
info_keys_step = env_step.step(env_step.action_space.sample())[3].keys() _obs, _reward, _terminated, _truncated, info = env_step.step(env_step.action_space.sample())
info_keys_step = info.keys()
assert all(e in info_keys for e in info_keys_step) assert all(e in info_keys for e in info_keys_step)
assert 'trajectory_length' in info_keys assert 'trajectory_length' in info_keys
@ -118,13 +124,15 @@ def test_length(mp_type: str, env_wrap: Tuple[str, Type[RawInterfaceWrapper]]):
{'trajectory_generator_type': mp_type}, {'trajectory_generator_type': mp_type},
{'controller_type': 'motor'}, {'controller_type': 'motor'},
{'phase_generator_type': 'exp'}, {'phase_generator_type': 'exp'},
{'basis_generator_type': basis_generator_type}) {'basis_generator_type': basis_generator_type}, fallback_max_steps=MAX_STEPS_FALLBACK)
for _ in range(5): for i in range(5):
env.reset() env.reset(seed=SEED)
length = env.step(env.action_space.sample())[3]['trajectory_length']
assert length == env.spec.max_episode_steps _obs, _reward, _terminated, _truncated, info = env.step(env.action_space.sample())
length = info['trajectory_length']
assert length == env.spec.max_episode_steps, f'Expected total simulation length ({length}) to be equal to spec.max_episode_steps ({env.spec.max_episode_steps}), but was not during test nr. {i}'
@pytest.mark.parametrize('mp_type', ['promp', 'dmp', 'prodmp']) @pytest.mark.parametrize('mp_type', ['promp', 'dmp', 'prodmp'])
@ -136,9 +144,10 @@ def test_aggregation(mp_type: str, reward_aggregation: Callable[[np.ndarray], fl
{'controller_type': 'motor'}, {'controller_type': 'motor'},
{'phase_generator_type': 'exp'}, {'phase_generator_type': 'exp'},
{'basis_generator_type': basis_generator_type}) {'basis_generator_type': basis_generator_type})
env.reset() env.reset(seed=SEED)
# ToyEnv only returns 1 as reward # ToyEnv only returns 1 as reward
assert env.step(env.action_space.sample())[1] == reward_aggregation(np.ones(50, )) _obs, reward, _terminated, _truncated, _info = env.step(env.action_space.sample())
assert reward == reward_aggregation(np.ones(50, ))
@pytest.mark.parametrize('mp_type', ['promp', 'dmp']) @pytest.mark.parametrize('mp_type', ['promp', 'dmp'])
@ -151,14 +160,16 @@ def test_context_space(mp_type: str, env_wrap: Tuple[str, Type[RawInterfaceWrapp
{'phase_generator_type': 'exp'}, {'phase_generator_type': 'exp'},
{'basis_generator_type': 'rbf'}) {'basis_generator_type': 'rbf'})
# check if observation space matches with the specified mask values which are true # check if observation space matches with the specified mask values which are true
env_step = fancy_gym.make(env_id, SEED) env_step = make(env_id)
wrapper = wrapper_class(env_step) wrapper = wrapper_class(env_step)
assert env.observation_space.shape == wrapper.context_mask[wrapper.context_mask].shape assert env.observation_space.shape == wrapper.context_mask[wrapper.context_mask].shape
@pytest.mark.parametrize('mp_type', ['promp', 'dmp', 'prodmp']) @pytest.mark.parametrize('mp_type', ['promp', 'dmp', 'prodmp'])
@pytest.mark.parametrize('num_dof', [0, 1, 2, 5]) @pytest.mark.parametrize('num_dof', [0, 1, 2, 5])
@pytest.mark.parametrize('num_basis', [0, 1, 2, 5]) @pytest.mark.parametrize('num_basis', [
pytest.param(0, marks=pytest.mark.xfail(reason="Basis Length 0 is not yet implemented.")),
1, 2, 5])
@pytest.mark.parametrize('learn_tau', [True, False]) @pytest.mark.parametrize('learn_tau', [True, False])
@pytest.mark.parametrize('learn_delay', [True, False]) @pytest.mark.parametrize('learn_delay', [True, False])
def test_action_space(mp_type: str, num_dof: int, num_basis: int, learn_tau: bool, learn_delay: bool): def test_action_space(mp_type: str, num_dof: int, num_basis: int, learn_tau: bool, learn_delay: bool):
@ -219,16 +230,18 @@ def test_learn_tau(mp_type: str, tau: float):
'learn_delay': False 'learn_delay': False
}, },
{'basis_generator_type': basis_generator_type, {'basis_generator_type': basis_generator_type,
}, seed=SEED) })
d = True env.reset(seed=SEED)
done = True
for i in range(5): for i in range(5):
if d: if done:
env.reset() env.reset(seed=SEED)
action = env.action_space.sample() action = env.action_space.sample()
action[0] = tau action[0] = tau
obs, r, d, info = env.step(action) _obs, _reward, terminated, truncated, info = env.step(action)
done = terminated or truncated
length = info['trajectory_length'] length = info['trajectory_length']
assert length == env.spec.max_episode_steps assert length == env.spec.max_episode_steps
@ -248,6 +261,8 @@ def test_learn_tau(mp_type: str, tau: float):
assert np.all(vel[:tau_time_steps - 2] != vel[-1]) assert np.all(vel[:tau_time_steps - 2] != vel[-1])
# #
# #
@pytest.mark.parametrize('mp_type', ['promp', 'prodmp']) @pytest.mark.parametrize('mp_type', ['promp', 'prodmp'])
@pytest.mark.parametrize('delay', [0, 0.25, 0.5, 0.75]) @pytest.mark.parametrize('delay', [0, 0.25, 0.5, 0.75])
def test_learn_delay(mp_type: str, delay: float): def test_learn_delay(mp_type: str, delay: float):
@ -262,16 +277,18 @@ def test_learn_delay(mp_type: str, delay: float):
'learn_delay': True 'learn_delay': True
}, },
{'basis_generator_type': basis_generator_type, {'basis_generator_type': basis_generator_type,
}, seed=SEED) })
d = True env.reset(seed=SEED)
done = True
for i in range(5): for i in range(5):
if d: if done:
env.reset() env.reset(seed=SEED)
action = env.action_space.sample() action = env.action_space.sample()
action[0] = delay action[0] = delay
obs, r, d, info = env.step(action) _obs, _reward, terminated, truncated, info = env.step(action)
done = terminated or truncated
length = info['trajectory_length'] length = info['trajectory_length']
assert length == env.spec.max_episode_steps assert length == env.spec.max_episode_steps
@ -290,6 +307,8 @@ def test_learn_delay(mp_type: str, delay: float):
assert np.all(vel[max(1, delay_time_steps)] != vel[0]) assert np.all(vel[max(1, delay_time_steps)] != vel[0])
# #
# #
@pytest.mark.parametrize('mp_type', ['promp', 'prodmp']) @pytest.mark.parametrize('mp_type', ['promp', 'prodmp'])
@pytest.mark.parametrize('tau', [0.25, 0.5, 0.75, 1]) @pytest.mark.parametrize('tau', [0.25, 0.5, 0.75, 1])
@pytest.mark.parametrize('delay', [0.25, 0.5, 0.75, 1]) @pytest.mark.parametrize('delay', [0.25, 0.5, 0.75, 1])
@ -305,20 +324,23 @@ def test_learn_tau_and_delay(mp_type: str, tau: float, delay: float):
'learn_delay': True 'learn_delay': True
}, },
{'basis_generator_type': basis_generator_type, {'basis_generator_type': basis_generator_type,
}, seed=SEED) })
env.reset(seed=SEED)
if env.spec.max_episode_steps * env.dt < delay + tau: if env.spec.max_episode_steps * env.dt < delay + tau:
return return
d = True done = True
for i in range(5): for i in range(5):
if d: if done:
env.reset() env.reset(seed=SEED)
action = env.action_space.sample() action = env.action_space.sample()
action[0] = tau action[0] = tau
action[1] = delay action[1] = delay
obs, r, d, info = env.step(action) _obs, _reward, terminated, truncated, info = env.step(action)
done = terminated or truncated
length = info['trajectory_length'] length = info['trajectory_length']
assert length == env.spec.max_episode_steps assert length == env.spec.max_episode_steps
@ -343,4 +365,4 @@ def test_learn_tau_and_delay(mp_type: str, tau: float, delay: float):
active_pos = pos[delay_time_steps: joint_time_steps - 1] active_pos = pos[delay_time_steps: joint_time_steps - 1]
active_vel = vel[delay_time_steps: joint_time_steps - 2] active_vel = vel[delay_time_steps: joint_time_steps - 2]
assert np.all(active_pos != pos[-1]) and np.all(active_pos != pos[0]) assert np.all(active_pos != pos[-1]) and np.all(active_pos != pos[0])
assert np.all(active_vel != vel[-1]) and np.all(active_vel != vel[0]) assert np.all(active_vel != vel[-1]) and np.all(active_vel != vel[0])


@ -1,39 +1,30 @@
from itertools import chain from itertools import chain
from typing import Callable
import gymnasium as gym
import pytest import pytest
from dm_control import suite, manipulation
import fancy_gym import fancy_gym
from test.utils import run_env, run_env_determinism from test.utils import run_env, run_env_determinism
SUITE_IDS = [f'dmc:{env}-{task}' for env, task in suite.ALL_TASKS if env != "lqr"] DMC_IDS = [spec.id for spec in gym.envs.registry.values() if
MANIPULATION_IDS = [f'dmc:manipulation-{task}' for task in manipulation.ALL if task.endswith('_features')] spec.id.startswith('dm_control/')
DMC_MP_IDS = chain(*fancy_gym.ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS.values()) and 'compatibility-env-v0' not in spec.id
and 'lqr-lqr' not in spec.id]
DMC_MP_IDS = fancy_gym.ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS['all']
SEED = 1 SEED = 1
@pytest.mark.parametrize('env_id', SUITE_IDS) @pytest.mark.parametrize('env_id', DMC_IDS)
def test_step_suite_functionality(env_id: str): def test_step_dm_control_functionality(env_id: str):
"""Tests that suite step environments run without errors using random actions.""" """Tests that suite step environments run without errors using random actions."""
run_env(env_id) run_env(env_id, 5000, wrappers=[gym.wrappers.FlattenObservation])
@pytest.mark.parametrize('env_id', SUITE_IDS) @pytest.mark.parametrize('env_id', DMC_IDS)
def test_step_suite_determinism(env_id: str): def test_step_dm_control_determinism(env_id: str):
"""Tests that for step environments identical seeds produce identical trajectories.""" """Tests that for step environments identical seeds produce identical trajectories."""
run_env_determinism(env_id, SEED) run_env_determinism(env_id, SEED, 5000, wrappers=[gym.wrappers.FlattenObservation])
@pytest.mark.parametrize('env_id', MANIPULATION_IDS)
def test_step_manipulation_functionality(env_id: str):
"""Tests that manipulation step environments run without errors using random actions."""
run_env(env_id)
@pytest.mark.parametrize('env_id', MANIPULATION_IDS)
def test_step_manipulation_determinism(env_id: str):
"""Tests that for step environments identical seeds produce identical trajectories."""
run_env_determinism(env_id, SEED)
@pytest.mark.parametrize('env_id', DMC_MP_IDS) @pytest.mark.parametrize('env_id', DMC_MP_IDS)


@ -1,14 +1,16 @@
import itertools from itertools import chain
from typing import Callable
import fancy_gym import fancy_gym
import gym import gymnasium as gym
import pytest import pytest
from test.utils import run_env, run_env_determinism from test.utils import run_env, run_env_determinism
CUSTOM_IDS = [spec.id for spec in gym.envs.registry.all() if CUSTOM_IDS = [id for id, spec in gym.envs.registry.items() if
not isinstance(spec.entry_point, Callable) and
"fancy_gym" in spec.entry_point and 'make_bb_env_helper' not in spec.entry_point] "fancy_gym" in spec.entry_point and 'make_bb_env_helper' not in spec.entry_point]
CUSTOM_MP_IDS = itertools.chain(*fancy_gym.ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS.values()) CUSTOM_MP_IDS = fancy_gym.ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS['all']
SEED = 1 SEED = 1


@ -0,0 +1,78 @@
from typing import Tuple, Type, Union, Optional, Callable
import gymnasium as gym
import numpy as np
import pytest
from gymnasium import make
from gymnasium.core import ActType, ObsType
import fancy_gym
from fancy_gym import register
KNOWN_NS = ['dm_control', 'fancy', 'metaworld', 'gym']
class Object(object):
pass
class ToyEnv(gym.Env):
observation_space = gym.spaces.Box(low=-1, high=1, shape=(1,), dtype=np.float64)
action_space = gym.spaces.Box(low=-1, high=1, shape=(1,), dtype=np.float64)
dt = 0.02
def __init__(self, a: int = 0, b: float = 0.0, c: list = [], d: dict = {}, e: Object = Object()):
self.a, self.b, self.c, self.d, self.e = a, b, c, d, e
def reset(self, *, seed: Optional[int] = None, return_info: bool = False,
options: Optional[dict] = None) -> Union[ObsType, Tuple[ObsType, dict]]:
obs, options = np.array([-1]), {}
return obs, options
def step(self, action: ActType) -> Tuple[ObsType, float, bool, dict]:
obs, reward, terminated, truncated, info = np.array([-1]), 1, False, False, {}
return obs, reward, terminated, truncated, info
def render(self, mode="human"):
pass
@pytest.fixture(scope="session", autouse=True)
def setup():
register(
id=f'dummy/toy2-v0',
entry_point='test.test_black_box:ToyEnv',
max_episode_steps=50,
)
@pytest.mark.parametrize('env_id', ['dummy/toy2-v0'])
@pytest.mark.parametrize('mp_type', ['ProMP', 'DMP', 'ProDMP'])
def test_make_mp(env_id: str, mp_type: str):
parts = env_id.split('/')
if len(parts) == 1:
ns, name = 'gym', parts[0]
elif len(parts) == 2:
ns, name = parts[0], parts[1]
else:
raise ValueError('env id can not contain multiple "/".')
fancy_id = f'{ns}_{mp_type}/{name}'
make(fancy_id)
def test_make_raw_toy():
make('dummy/toy2-v0')
@pytest.mark.parametrize('mp_type', ['ProMP', 'DMP', 'ProDMP'])
def test_make_mp_toy(mp_type: str):
fancy_id = f'dummy_{mp_type}/toy2-v0'
make(fancy_id)
@pytest.mark.parametrize('ns', KNOWN_NS)
def test_ns_nonempty(ns):
assert len(fancy_gym.MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS[ns]), f'The namespace {ns} is empty even though it should not be...'


@ -6,9 +6,9 @@ from metaworld.envs import ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE
import fancy_gym import fancy_gym
from test.utils import run_env, run_env_determinism from test.utils import run_env, run_env_determinism
METAWORLD_IDS = [f'metaworld:{env.split("-goal-observable")[0]}' for env, _ in METAWORLD_IDS = [f'metaworld/{env.split("-goal-observable")[0]}' for env, _ in
ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE.items()] ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE.items()]
METAWORLD_MP_IDS = chain(*fancy_gym.ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS.values()) METAWORLD_MP_IDS = fancy_gym.ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS['all']
SEED = 1 SEED = 1
@ -18,6 +18,7 @@ def test_step_metaworld_functionality(env_id: str):
run_env(env_id) run_env(env_id)
@pytest.mark.skip(reason="Seeding does not correctly work on current Metaworld.")
@pytest.mark.parametrize('env_id', METAWORLD_IDS) @pytest.mark.parametrize('env_id', METAWORLD_IDS)
def test_step_metaworld_determinism(env_id: str): def test_step_metaworld_determinism(env_id: str):
"""Tests that for step environments identical seeds produce identical trajectories.""" """Tests that for step environments identical seeds produce identical trajectories."""
@ -30,6 +31,7 @@ def test_bb_metaworld_functionality(env_id: str):
run_env(env_id) run_env(env_id)
@pytest.mark.skip(reason="Seeding does not correctly work on current Metaworld.")
@pytest.mark.parametrize('env_id', METAWORLD_MP_IDS) @pytest.mark.parametrize('env_id', METAWORLD_MP_IDS)
def test_bb_metaworld_determinism(env_id: str): def test_bb_metaworld_determinism(env_id: str):
"""Tests that for black box environment identical seeds produce identical trajectories.""" """Tests that for black box environment identical seeds produce identical trajectories."""


@@ -2,21 +2,25 @@ from itertools import chain
 from types import FunctionType
 from typing import Tuple, Type, Union, Optional
-import gym
+import gymnasium as gym
 import numpy as np
 import pytest
-from gym import register
-from gym.core import ActType, ObsType
+from gymnasium import register, make
+from gymnasium.core import ActType, ObsType
+from gymnasium import spaces
 import fancy_gym
 from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
-from fancy_gym.utils.time_aware_observation import TimeAwareObservation
+from fancy_gym.utils.wrappers import TimeAwareObservation
+from fancy_gym.utils.make_env_helpers import ensure_finite_time
 SEED = 1
-ENV_IDS = ['Reacher5d-v0', 'dmc:ball_in_cup-catch', 'metaworld:reach-v2', 'Reacher-v2']
+ENV_IDS = ['fancy/Reacher5d-v0', 'dm_control/ball_in_cup-catch-v0', 'metaworld/reach-v2', 'Reacher-v2']
 WRAPPERS = [fancy_gym.envs.mujoco.reacher.MPWrapper, fancy_gym.dmc.suite.ball_in_cup.MPWrapper,
             fancy_gym.meta.goal_object_change_mp_wrapper.MPWrapper, fancy_gym.open_ai.mujoco.reacher_v2.MPWrapper]
-ALL_MP_ENVS = chain(*fancy_gym.ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS.values())
+ALL_MP_ENVS = fancy_gym.ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS['all']
+MAX_STEPS_FALLBACK = 50
 class ToyEnv(gym.Env):
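A rough sketch (not part of the commit) of how the newly imported helpers are combined in the tests below. The semantics of ensure_finite_time are inferred from its use here: it appears to enforce a finite horizon, falling back to MAX_STEPS_FALLBACK whenever the env spec defines no max_episode_steps, while TimeAwareObservation now comes from fancy_gym.utils.wrappers.

import fancy_gym  # noqa: registers the fancy/ namespace
from gymnasium import make
from fancy_gym.utils.make_env_helpers import ensure_finite_time
from fancy_gym.utils.wrappers import TimeAwareObservation

MAX_STEPS_FALLBACK = 50
env = TimeAwareObservation(
    ensure_finite_time(make('fancy/Reacher5d-v0'), MAX_STEPS_FALLBACK))
obs, info = env.reset(seed=1)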
@@ -26,10 +30,12 @@ class ToyEnv(gym.Env):
     def reset(self, *, seed: Optional[int] = None, return_info: bool = False,
               options: Optional[dict] = None) -> Union[ObsType, Tuple[ObsType, dict]]:
-        return np.array([-1])
+        obs, options = np.array([-1]), {}
+        return obs, options
     def step(self, action: ActType) -> Tuple[ObsType, float, bool, dict]:
-        return np.array([-1]), 1, False, {}
+        obs, reward, terminated, truncated, info = np.array([-1]), 1, False, False, {}
+        return obs, reward, terminated, truncated, info
     def render(self, mode="human"):
         pass
@@ -61,7 +67,7 @@ def setup():
 def test_learn_sub_trajectories(mp_type: str, env_wrap: Tuple[str, Type[RawInterfaceWrapper]],
                                 add_time_aware_wrapper_before: bool):
     env_id, wrapper_class = env_wrap
-    env_step = TimeAwareObservation(fancy_gym.make(env_id, SEED))
+    env_step = TimeAwareObservation(ensure_finite_time(make(env_id, SEED), MAX_STEPS_FALLBACK))
     wrappers = [wrapper_class]
     # has time aware wrapper
@@ -72,24 +78,29 @@ def test_learn_sub_trajectories(mp_type: str, env_wrap: Tuple[str, Type[RawInter
                            {'trajectory_generator_type': mp_type},
                            {'controller_type': 'motor'},
                            {'phase_generator_type': 'exp'},
-                           {'basis_generator_type': 'rbf'}, seed=SEED)
+                           {'basis_generator_type': 'rbf'}, fallback_max_steps=MAX_STEPS_FALLBACK)
+    env.reset(seed=SEED)
     assert env.learn_sub_trajectories
+    assert env.spec.max_episode_steps
+    assert env_step.spec.max_episode_steps
     assert env.traj_gen.learn_tau
     # This also verifies we are not adding the TimeAwareObservationWrapper twice
-    assert env.observation_space == env_step.observation_space
+    assert spaces.flatten_space(env_step.observation_space) == spaces.flatten_space(env.observation_space)
-    d = True
+    done = True
     for i in range(25):
-        if d:
-            env.reset()
+        if done:
+            env.reset(seed=SEED)
         action = env.action_space.sample()
-        obs, r, d, info = env.step(action)
+        _obs, _reward, terminated, truncated, info = env.step(action)
+        done = terminated or truncated
         length = info['trajectory_length']
-        if not d:
+        if not done:
             assert length == np.round(action[0] / env.dt)
             assert length == np.round(env.traj_gen.tau.numpy() / env.dt)
         else:
@@ -105,14 +116,14 @@ def test_learn_sub_trajectories(mp_type: str, env_wrap: Tuple[str, Type[RawInter
 def test_replanning_time(mp_type: str, env_wrap: Tuple[str, Type[RawInterfaceWrapper]],
                          add_time_aware_wrapper_before: bool, replanning_time: int):
     env_id, wrapper_class = env_wrap
-    env_step = TimeAwareObservation(fancy_gym.make(env_id, SEED))
+    env_step = TimeAwareObservation(ensure_finite_time(make(env_id, SEED), MAX_STEPS_FALLBACK))
     wrappers = [wrapper_class]
     # has time aware wrapper
     if add_time_aware_wrapper_before:
         wrappers += [TimeAwareObservation]
-    replanning_schedule = lambda c_pos, c_vel, obs, c_action, t: t % replanning_time == 0
+    def replanning_schedule(c_pos, c_vel, obs, c_action, t): return t % replanning_time == 0
     basis_generator_type = 'prodmp' if mp_type == 'prodmp' else 'rbf'
     phase_generator_type = 'exp' if 'dmp' in mp_type else 'linear'
@@ -121,31 +132,36 @@ def test_replanning_time(mp_type: str, env_wrap: Tuple[str, Type[RawInterfaceWra
                            {'trajectory_generator_type': mp_type},
                            {'controller_type': 'motor'},
                            {'phase_generator_type': phase_generator_type},
-                           {'basis_generator_type': basis_generator_type}, seed=SEED)
+                           {'basis_generator_type': basis_generator_type}, fallback_max_steps=MAX_STEPS_FALLBACK)
+    env.reset(seed=SEED)
     assert env.do_replanning
+    assert env.spec.max_episode_steps
+    assert env_step.spec.max_episode_steps
     assert callable(env.replanning_schedule)
     # This also verifies we are not adding the TimeAwareObservationWrapper twice
-    assert env.observation_space == env_step.observation_space
-    env.reset()
+    assert spaces.flatten_space(env_step.observation_space) == spaces.flatten_space(env.observation_space)
+    env.reset(seed=SEED)
     episode_steps = env_step.spec.max_episode_steps // replanning_time
     # Make 3 episodes, total steps depend on the replanning steps
     for i in range(3 * episode_steps):
         action = env.action_space.sample()
-        obs, r, d, info = env.step(action)
+        _obs, _reward, terminated, truncated, info = env.step(action)
+        done = terminated or truncated
         length = info['trajectory_length']
-        if d:
+        if done:
             # Check if number of steps until termination match the replanning interval
-            print(d, (i + 1), episode_steps)
+            print(done, (i + 1), episode_steps)
             assert (i + 1) % episode_steps == 0
-            env.reset()
+            env.reset(seed=SEED)
     assert replanning_schedule(None, None, None, None, length)
 @pytest.mark.parametrize('mp_type', ['promp', 'prodmp'])
 @pytest.mark.parametrize('max_planning_times', [1, 2, 3, 4])
 @pytest.mark.parametrize('sub_segment_steps', [5, 10])
@@ -165,15 +181,19 @@ def test_max_planning_times(mp_type: str, max_planning_times: int, sub_segment_s
                             },
                            {'basis_generator_type': basis_generator_type,
                             },
-                           seed=SEED)
-    _ = env.reset()
-    d = False
+                           fallback_max_steps=MAX_STEPS_FALLBACK)
+    _ = env.reset(seed=SEED)
+    done = False
     planning_times = 0
-    while not d:
-        _, _, d, _ = env.step(env.action_space.sample())
+    while not done:
+        action = env.action_space.sample()
+        _obs, _reward, terminated, truncated, _info = env.step(action)
+        done = terminated or truncated
         planning_times += 1
     assert planning_times == max_planning_times
 @pytest.mark.parametrize('mp_type', ['promp', 'prodmp'])
 @pytest.mark.parametrize('max_planning_times', [1, 2, 3, 4])
 @pytest.mark.parametrize('sub_segment_steps', [5, 10])
@@ -194,17 +214,20 @@ def test_replanning_with_learn_tau(mp_type: str, max_planning_times: int, sub_se
                             },
                            {'basis_generator_type': basis_generator_type,
                             },
-                           seed=SEED)
-    _ = env.reset()
-    d = False
+                           fallback_max_steps=MAX_STEPS_FALLBACK)
+    _ = env.reset(seed=SEED)
+    done = False
     planning_times = 0
-    while not d:
+    while not done:
         action = env.action_space.sample()
         action[0] = tau
-        _, _, d, info = env.step(action)
+        _obs, _reward, terminated, truncated, _info = env.step(action)
+        done = terminated or truncated
         planning_times += 1
     assert planning_times == max_planning_times
 @pytest.mark.parametrize('mp_type', ['promp', 'prodmp'])
 @pytest.mark.parametrize('max_planning_times', [1, 2, 3, 4])
 @pytest.mark.parametrize('sub_segment_steps', [5, 10])
@@ -213,26 +236,28 @@ def test_replanning_with_learn_delay(mp_type: str, max_planning_times: int, sub_
     basis_generator_type = 'prodmp' if mp_type == 'prodmp' else 'rbf'
     phase_generator_type = 'exp' if mp_type == 'prodmp' else 'linear'
     env = fancy_gym.make_bb('toy-v0', [ToyWrapper],
                            {'replanning_schedule': lambda pos, vel, obs, action, t: t % sub_segment_steps == 0,
                             'max_planning_times': max_planning_times,
                             'verbose': 2},
                            {'trajectory_generator_type': mp_type,
                             },
                            {'controller_type': 'motor'},
                            {'phase_generator_type': phase_generator_type,
                             'learn_tau': False,
                             'learn_delay': True
                             },
                            {'basis_generator_type': basis_generator_type,
                             },
-                           seed=SEED)
-    _ = env.reset()
-    d = False
+                           fallback_max_steps=MAX_STEPS_FALLBACK)
+    _ = env.reset(seed=SEED)
+    done = False
     planning_times = 0
-    while not d:
+    while not done:
         action = env.action_space.sample()
         action[0] = delay
-        _, _, d, info = env.step(action)
+        _obs, _reward, terminated, truncated, info = env.step(action)
+        done = terminated or truncated
         delay_time_steps = int(np.round(delay / env.dt))
         pos = info['positions'].flatten()
@@ -256,6 +281,7 @@ def test_replanning_with_learn_delay(mp_type: str, max_planning_times: int, sub_
     assert planning_times == max_planning_times
 @pytest.mark.parametrize('mp_type', ['promp', 'prodmp'])
 @pytest.mark.parametrize('max_planning_times', [1, 2, 3])
 @pytest.mark.parametrize('sub_segment_steps', [5, 10, 15])
@@ -266,27 +292,29 @@ def test_replanning_with_learn_delay_and_tau(mp_type: str, max_planning_times: i
     basis_generator_type = 'prodmp' if mp_type == 'prodmp' else 'rbf'
     phase_generator_type = 'exp' if mp_type == 'prodmp' else 'linear'
     env = fancy_gym.make_bb('toy-v0', [ToyWrapper],
                            {'replanning_schedule': lambda pos, vel, obs, action, t: t % sub_segment_steps == 0,
                             'max_planning_times': max_planning_times,
                             'verbose': 2},
                            {'trajectory_generator_type': mp_type,
                             },
                            {'controller_type': 'motor'},
                            {'phase_generator_type': phase_generator_type,
                             'learn_tau': True,
                             'learn_delay': True
                             },
                            {'basis_generator_type': basis_generator_type,
                             },
-                           seed=SEED)
-    _ = env.reset()
-    d = False
+                           fallback_max_steps=MAX_STEPS_FALLBACK)
+    _ = env.reset(seed=SEED)
+    done = False
     planning_times = 0
-    while not d:
+    while not done:
         action = env.action_space.sample()
         action[0] = tau
         action[1] = delay
-        _, _, d, info = env.step(action)
+        _obs, _reward, terminated, truncated, info = env.step(action)
+        done = terminated or truncated
         delay_time_steps = int(np.round(delay / env.dt))
@@ -306,6 +334,7 @@ def test_replanning_with_learn_delay_and_tau(mp_type: str, max_planning_times: i
     assert planning_times == max_planning_times
 @pytest.mark.parametrize('mp_type', ['promp', 'prodmp'])
 @pytest.mark.parametrize('max_planning_times', [1, 2, 3, 4])
 @pytest.mark.parametrize('sub_segment_steps', [5, 10])
@@ -325,9 +354,11 @@ def test_replanning_schedule(mp_type: str, max_planning_times: int, sub_segment_
                             },
                            {'basis_generator_type': basis_generator_type,
                             },
-                           seed=SEED)
-    _ = env.reset()
-    d = False
+                           fallback_max_steps=MAX_STEPS_FALLBACK)
+    _ = env.reset(seed=SEED)
     for i in range(max_planning_times):
-        _, _, d, _ = env.step(env.action_space.sample())
-    assert d
+        action = env.action_space.sample()
+        _obs, _reward, terminated, truncated, _info = env.step(action)
+        done = terminated or truncated
+    assert done
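A condensed sketch of the updated make_bb call pattern used throughout this file (values are arbitrary; 'toy-v0' and ToyWrapper are the fixtures defined earlier in this test module, so this is illustrative rather than standalone):

import fancy_gym

env = fancy_gym.make_bb(
    'toy-v0', [ToyWrapper],
    {'replanning_schedule': lambda pos, vel, obs, action, t: t % 5 == 0,
     'max_planning_times': 2,
     'verbose': 2},
    {'trajectory_generator_type': 'prodmp'},
    {'controller_type': 'motor'},
    {'phase_generator_type': 'exp'},
    {'basis_generator_type': 'prodmp'},
    fallback_max_steps=50)       # replaces the old seed=SEED keyword
env.reset(seed=1)                # seeding moved to reset()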

View File

@@ -1,9 +1,12 @@
-import gym
+from typing import List, Type
+import gymnasium as gym
 import numpy as np
-from fancy_gym import make
+from gymnasium import make
-def run_env(env_id, iterations=None, seed=0, render=False):
+def run_env(env_id: str, iterations: int = None, seed: int = 0, wrappers: List[Type[gym.Wrapper]] = [],
+            render: bool = False):
     """
     Example for running a DMC based env in the step based setting.
     The env_id has to be specified as `dmc:domain_name-task_name` or
@@ -13,70 +16,88 @@ def run_env(env_id, iterations=None, seed=0, render=False):
         env_id: Either `dmc:domain_name-task_name` or `dmc:manipulation-environment_name`
         iterations: Number of rollout steps to run
         seed: random seeding
+        wrappers: List of Wrappers to apply to the environment
         render: Render the episode
-    Returns: observations, rewards, dones, actions
+    Returns: observations, rewards, terminations, truncations, actions
     """
-    env: gym.Env = make(env_id, seed=seed)
+    env: gym.Env = make(env_id)
+    for w in wrappers:
+        env = w(env)
     rewards = []
     observations = []
     actions = []
-    dones = []
-    obs = env.reset()
+    terminations = []
+    truncations = []
+    obs, _ = env.reset(seed=seed)
+    env.action_space.seed(seed)
     verify_observations(obs, env.observation_space, "reset()")
     iterations = iterations or (env.spec.max_episode_steps or 1)
-    # number of samples(multiple environment steps)
+    # number of samples (multiple environment steps)
     for i in range(iterations):
         observations.append(obs)
         ac = env.action_space.sample()
         actions.append(ac)
         # ac = np.random.uniform(env.action_space.low, env.action_space.high, env.action_space.shape)
-        obs, reward, done, info = env.step(ac)
+        obs, reward, terminated, truncated, info = env.step(ac)
         verify_observations(obs, env.observation_space, "step()")
         verify_reward(reward)
-        verify_done(done)
+        verify_done(terminated)
+        verify_done(truncated)
         rewards.append(reward)
-        dones.append(done)
+        terminations.append(terminated)
+        truncations.append(truncated)
         if render:
             env.render("human")
-        if done:
+        if terminated or truncated:
             break
     if not hasattr(env, "replanning_schedule"):
-        assert done, "Done flag is not True after end of episode."
+        assert terminated or truncated, f"Termination or truncation flag is not True after {i + 1} iterations."
     observations.append(obs)
     env.close()
     del env
-    return np.array(observations), np.array(rewards), np.array(dones), np.array(actions)
+    return np.array(observations), np.array(rewards), np.array(terminations), np.array(truncations), np.array(actions)
-def run_env_determinism(env_id: str, seed: int):
-    traj1 = run_env(env_id, seed=seed)
-    traj2 = run_env(env_id, seed=seed)
+def run_env_determinism(env_id: str, seed: int, iterations: int = None, wrappers: List[Type[gym.Wrapper]] = []):
+    traj1 = run_env(env_id, iterations=iterations,
+                    seed=seed, wrappers=wrappers)
+    traj2 = run_env(env_id, iterations=iterations,
+                    seed=seed, wrappers=wrappers)
     # Iterate over two trajectories, which should have the same state and action sequence
     for i, time_step in enumerate(zip(*traj1, *traj2)):
-        obs1, rwd1, done1, ac1, obs2, rwd2, done2, ac2 = time_step
-        assert np.array_equal(obs1, obs2), f"Observations [{i}] {obs1} and {obs2} do not match."
-        assert np.array_equal(ac1, ac2), f"Actions [{i}] {ac1} and {ac2} do not match."
-        assert np.array_equal(rwd1, rwd2), f"Rewards [{i}] {rwd1} and {rwd2} do not match."
-        assert np.array_equal(done1, done2), f"Dones [{i}] {done1} and {done2} do not match."
+        obs1, rwd1, term1, trunc1, ac1, obs2, rwd2, term2, trunc2, ac2 = time_step
+        assert np.allclose(
+            obs1, obs2), f"Observations [{i}] {obs1} ({obs1.shape}) and {obs2} ({obs2.shape}) do not match: Biggest difference is {np.abs(obs1-obs2).max()} at index {np.abs(obs1-obs2).argmax()}."
+        assert np.array_equal(
+            ac1, ac2), f"Actions [{i}] {ac1} and {ac2} do not match."
+        assert np.array_equal(
+            rwd1, rwd2), f"Rewards [{i}] {rwd1} and {rwd2} do not match."
+        assert np.array_equal(
+            term1, term2), f"Terminateds [{i}] {term1} and {term2} do not match."
+        assert np.array_equal(
+            trunc1, trunc2), f"Truncateds [{i}] {trunc1} and {trunc2} do not match."
 def verify_observations(obs, observation_space: gym.Space, obs_type="reset()"):
     assert observation_space.contains(obs), \
-        f"Observation {obs} received from {obs_type} not contained in observation space {observation_space}."
+        f"Observation {obs} ({obs.shape}) received from {obs_type} not contained in observation space {observation_space}."
 def verify_reward(reward):
-    assert isinstance(reward, (float, int)), f"Returned type {type(reward)} as reward, expected float or int."
+    assert isinstance(
+        reward, (float, int)), f"Returned type {type(reward)} as reward, expected float or int."
 def verify_done(done):
-    assert isinstance(done, bool), f"Returned {done} as done flag, expected bool."
+    assert isinstance(
+        done, bool), f"Returned {done} as done flag, expected bool."