Merge pull request #75 from D-o-d-o-x/great_refactor
Refactor and Upgrade to Gymnasium
Commit c420a96d4f
237
README.md
@@ -1,27 +1,35 @@
|
||||
# Fancy Gym
|
||||
<h1 align="center">
|
||||
<br>
|
||||
<img src='./icon.svg' width="250px">
|
||||
<br><br>
|
||||
<b>Fancy Gym</b>
|
||||
<br><br>
|
||||
</h1>
|
||||
|
||||
`fancy_gym` offers a large variety of reinforcement learning environments under the unifying interface
|
||||
of [OpenAI gym](https://gymlibrary.dev/). We provide support (under the OpenAI gym interface) for the benchmark suites
|
||||
[DeepMind Control](https://deepmind.com/research/publications/2020/dm-control-Software-and-Tasks-for-Continuous-Control)
|
||||
(DMC) and [Metaworld](https://meta-world.github.io/). If those are not sufficient and you want to create your own custom
|
||||
gym environments, use [this guide](https://www.gymlibrary.dev/content/environment_creation/). We highly appreciate it, if
|
||||
you would then submit a PR for this environment to become part of `fancy_gym`.
|
||||
In comparison to existing libraries, we additionally support controlling agents with movement primitives, such as Dynamic
|
||||
Movement Primitives (DMPs) and Probabilistic Movement Primitives (ProMP).
|
||||
| :exclamation: Fancy Gym has recently received a major refactor, which also updated many of the used dependencies to current versions. The update has brought some breaking changes. If you want to access the old version, check out the [legacy branch](https://github.com/ALRhub/fancy_gym/tree/legacy). Find out more about what changed [here](https://github.com/ALRhub/fancy_gym/pull/75). |
|
||||
| --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
|
||||
Built upon the foundation of [Gymnasium](https://gymnasium.farama.org/) (a maintained fork of OpenAI’s renowned Gym library), `fancy_gym` offers a comprehensive collection of reinforcement learning environments.
|
||||
|
||||
**Key Features**:
|
||||
|
||||
- **New Challenging Environments**: `fancy_gym` includes several new environments (Panda Box Pushing, Table Tennis, etc.) that present a higher degree of difficulty, pushing the boundaries of reinforcement learning research.
|
||||
- **Support for Movement Primitives**: `fancy_gym` supports a range of movement primitives (MPs), including Dynamic Movement Primitives (DMPs), Probabilistic Movement Primitives (ProMP), and Probabilistic Dynamic Movement Primitives (ProDMP).
|
||||
- **Upgrade to Movement Primitives**: With our framework, it's straightforward to transform standard Gymnasium environments into environments that support movement primitives.
|
||||
- **Benchmark Suite Compatibility**: `fancy_gym` makes it easy to access renowned benchmark suites such as [DeepMind Control](https://deepmind.com/research/publications/2020/dm-control-Software-and-Tasks-for-Continuous-Control) and [Metaworld](https://meta-world.github.io/), whether you want to use them in the regular step-based setting or using MPs.
|
||||
- **Contribute Your Own Environments**: If you're inspired to create custom gym environments, both step-based and with movement primitives, this [guide](https://gymnasium.farama.org/tutorials/gymnasium_basics/environment_creation/) will assist you. We encourage and highly appreciate submissions via PRs to integrate these environments into `fancy_gym`.
|
||||
|
||||
## Movement Primitive Environments (Episode-Based/Black-Box Environments)
|
||||
|
||||
Unlike step-based environments, movement primitive (MP) environments are more closely related to stochastic search, black-box
|
||||
optimization, and methods that are often used in traditional robotics and control. MP environments are typically
|
||||
episode-based and execute a full trajectory, which is generated by a trajectory generator, such as a Dynamic Movement
|
||||
Primitive (DMP) or a Probabilistic Movement Primitive (ProMP). The generated trajectory is translated into individual
|
||||
step-wise actions by a trajectory tracking controller. The exact choice of controller is, however, dependent on the type
|
||||
of environment. We currently support position, velocity, and PD-Controllers for position, velocity, and torque control,
|
||||
respectively as well as a special controller for the MetaWorld control suite.
|
||||
The goal of all MP environments is still to learn an optimal policy. Yet, an action represents the parametrization of
|
||||
the motion primitives to generate a suitable trajectory. Additionally, in this framework we support all of this also for
|
||||
the contextual setting, i.e. we expose the context space - a subset of the observation space - in the beginning of the
|
||||
episode. This requires predicting a new action/MP parametrization for each context.
|
||||
<p align="justify">
|
||||
Movement primitive (MP) environments differ from traditional step-based environments. They align more with concepts from stochastic search, black-box optimization, and methods commonly found in classical robotics and control. Instead of individual steps, MP environments operate on an episode basis, executing complete trajectories. These trajectories are produced by trajectory generators like Dynamic Movement Primitives (DMP), Probabilistic Movement Primitives (ProMP) or Probabilistic Dynamic Movement Primitives (ProDMP).
|
||||
</p>
|
||||
<p align="justify">
|
||||
Once generated, these trajectories are converted into step-by-step actions using a trajectory tracking controller. The specific controller chosen depends on the environment's requirements. Currently, we support position, velocity, and PD controllers for position, velocity, and torque control, respectively. Additionally, we have a specialized controller designed for the MetaWorld control suite.
|
||||
</p>
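
As a rough illustration of the tracking step described above, the following sketch applies a textbook PD law to turn one desired position/velocity pair into a torque command. The function name `pd_tracking_action` and the plain-list interface are assumptions made for this example; it is not the controller class shipped with `fancy_gym`.

```python
from typing import List, Sequence


def pd_tracking_action(
    des_pos: Sequence[float],
    des_vel: Sequence[float],
    cur_pos: Sequence[float],
    cur_vel: Sequence[float],
    p_gains: Sequence[float],
    d_gains: Sequence[float],
) -> List[float]:
    """Return one torque per joint from a standard PD law (hypothetical helper)."""
    return [
        kp * (dp - cp) + kd * (dv - cv)
        for dp, dv, cp, cv, kp, kd in zip(
            des_pos, des_vel, cur_pos, cur_vel, p_gains, d_gains
        )
    ]


# One step of a two-joint trajectory: each joint is pushed towards the
# desired position and velocity taken from the generated trajectory.
torques = pd_tracking_action(
    des_pos=[1.0, 0.0], des_vel=[0.0, 0.0],
    cur_pos=[0.0, 0.0], cur_vel=[0.0, 1.0],
    p_gains=[2.0, 2.0], d_gains=[0.5, 0.5],
)
print(torques)
```

In a full rollout this computation would run once per trajectory point, with the result clipped to the action space of the step-based environment before it is passed to `step()`.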
|
||||
<p align="justify">
|
||||
While the overarching objective of MP environments remains the learning of an optimal policy, the actions here represent the parametrization of motion primitives to craft the right trajectory. Our framework further enhances this by accommodating a contextual setting. At the episode's onset, we present the context space—a subset of the observation space. This demands the prediction of a new action or MP parametrization for every unique context.
|
||||
</p>
|
||||
|
||||
## Installation
|
||||
|
||||
@@ -43,59 +51,60 @@ cd fancy_gym
|
||||
pip install -e .
|
||||
```
|
||||
|
||||
In case you want to use dm_control or metaworld, you can install them by specifying extras
|
||||
We have a few optional dependencies. If you also want to install them, use:
|
||||
|
||||
```bash
|
||||
pip install -e .[dmc,metaworld]
|
||||
pip install -e '.[all]' # to install all optional dependencies
|
||||
pip install -e '.[dmc,metaworld,box2d,mujoco,mujoco-legacy,jax,testing]' # or choose only those you want
|
||||
```
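
After installing extras, it can be handy to check which optional packages are actually importable. The sketch below uses only the standard library; the mapping from extra names to module names (`OPTIONAL_MODULES`) is an assumption for illustration and may not match the package's real extras one-to-one.

```python
import importlib.util

# Assumed mapping from extra name to importable top-level module (hypothetical).
OPTIONAL_MODULES = {
    "dmc": "dm_control",
    "metaworld": "metaworld",
    "mujoco": "mujoco",
    "jax": "jax",
}


def available_extras(modules=OPTIONAL_MODULES):
    """Return {extra_name: bool}, True if the module can be found for import."""
    return {
        extra: importlib.util.find_spec(module) is not None
        for extra, module in modules.items()
    }


print(available_extras())
```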
|
||||
|
||||
> **Note:**
|
||||
> While our library already fully supports the new mujoco bindings, metaworld still relies on
|
||||
> [mujoco_py](https://github.com/openai/mujoco-py), hence make sure to have mujoco 2.1 installed beforehand.
|
||||
|
||||
## How to use Fancy Gym
|
||||
|
||||
We will only show the basics here and have prepared [multiple examples](fancy_gym/examples/) for a more detailed look.
|
||||
|
||||
### Step-wise Environments
|
||||
### Step-Based Environments
|
||||
|
||||
Regular step-based environments added by Fancy Gym are registered in the `fancy/` namespace.
|
||||
|
||||
| :exclamation: Legacy versions of Fancy Gym used `fancy_gym.make(...)`. This is no longer supported and will raise an Exception on new versions. |
|
||||
| ----------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
|
||||
```python
|
||||
import gymnasium as gym
|
||||
import fancy_gym
|
||||
|
||||
env = fancy_gym.make('Reacher5d-v0', seed=1)
|
||||
obs = env.reset()
|
||||
env = gym.make('fancy/Reacher5d-v0')
|
||||
# or env = gym.make('metaworld/reach-v2') # fancy_gym allows access to all metaworld ML1 tasks via the metaworld/ NS
|
||||
# or env = gym.make('dm_control/ball_in_cup-catch-v0')
|
||||
# or env = gym.make('Reacher-v2')
|
||||
observation, info = env.reset(seed=1)
|
||||
|
||||
for i in range(1000):
|
||||
action = env.action_space.sample()
|
||||
obs, reward, done, info = env.step(action)
|
||||
observation, reward, terminated, truncated, info = env.step(action)
|
||||
if i % 5 == 0:
|
||||
env.render()
|
||||
|
||||
if done:
|
||||
obs = env.reset()
|
||||
if terminated or truncated:
|
||||
observation, info = env.reset()
|
||||
```
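
For code written against the old four-value `step()` result, a small adapter can ease the migration to the five-value Gymnasium form shown above. This is a minimal sketch: `to_five_tuple` is a hypothetical helper, and it assumes the old Gym convention of flagging time-limit endings via `info["TimeLimit.truncated"]`.

```python
from typing import Any, Dict, Tuple


def to_five_tuple(
    step_result: Tuple[Any, float, bool, Dict[str, Any]]
) -> Tuple[Any, float, bool, bool, Dict[str, Any]]:
    """Convert (obs, reward, done, info) to (obs, reward, terminated, truncated, info)."""
    obs, reward, done, info = step_result
    # Assumption: a time-limit ending is marked by this key, as in old Gym.
    hit_time_limit = bool(info.get("TimeLimit.truncated", False))
    truncated = done and hit_time_limit
    terminated = done and not hit_time_limit
    return obs, reward, terminated, truncated, info


obs, reward, terminated, truncated, info = to_five_tuple(
    ([0.0, 0.0], 1.0, True, {"TimeLimit.truncated": True})
)
print(terminated, truncated)
```

The old `done` flag corresponds to `terminated or truncated`, which is why the loop above resets on either condition.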
|
||||
|
||||
When using `dm_control` tasks we expect the `env_id` to be specified as `dmc:domain_name-task_name` or for manipulation
|
||||
tasks as `dmc:manipulation-environment_name`. For `metaworld` tasks, we require the structure `metaworld:env_id-v2`, our
|
||||
custom tasks and standard gym environments can be created without prefixes.
|
||||
|
||||
### Black-box Environments
|
||||
|
||||
All environments provide by default the cumulative episode reward, this can however be changed if necessary. Optionally,
|
||||
each environment returns all collected information from each step as part of the infos. This information is, however,
|
||||
mainly meant for debugging as well as logging and not for training.
|
||||
By default, all environments return the cumulative episode reward; this can be changed if necessary. Optionally, each environment returns all information collected at each step as part of the infos. This information is, however, mainly meant for debugging and logging, not for training.
|
||||
|
||||
|Key| Description|Type
|
||||
|---|---|---|
|
||||
`positions`| Generated trajectory from MP | Optional
|
||||
`velocities`| Generated trajectory from MP | Optional
|
||||
`step_actions`| Step-wise executed action based on controller output | Optional
|
||||
`step_observations`| Step-wise intermediate observations | Optional
|
||||
`step_rewards`| Step-wise rewards | Optional
|
||||
`trajectory_length`| Total number of environment interactions | Always
|
||||
`other`| All other information from the underlying environment are returned as a list with length `trajectory_length` maintaining the original key. In case some information are not provided every time step, the missing values are filled with `None`. | Always
|
||||
| Key | Description | Type |
|
||||
| ------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -------- |
|
||||
| `positions` | Generated trajectory from MP | Optional |
|
||||
| `velocities` | Generated trajectory from MP | Optional |
|
||||
| `step_actions` | Step-wise executed action based on controller output | Optional |
|
||||
| `step_observations` | Step-wise intermediate observations | Optional |
|
||||
| `step_rewards` | Step-wise rewards | Optional |
|
||||
| `trajectory_length` | Total number of environment interactions | Always |
|
||||
| `other`             | All other information from the underlying environment is returned as a list with length `trajectory_length` maintaining the original key. In case some information is not provided every time step, the missing values are filled with `None`.   | Always   |
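
To make the table concrete, here is a minimal sketch of how per-step rewards and infos could be collapsed into one episode-level result. `aggregate_episode` is a hypothetical helper written for this illustration; summing the rewards stands in for the default cumulative reward, and missing per-step values are padded with `None` as described for the `other` row.

```python
from typing import Any, Dict, List, Tuple


def aggregate_episode(
    step_rewards: List[float], step_infos: List[Dict[str, Any]]
) -> Tuple[float, Dict[str, Any]]:
    """Collapse per-step rewards and infos into a single episode-level result."""
    infos: Dict[str, Any] = {"trajectory_length": len(step_rewards)}
    # Collect every key seen in any step; steps lacking a key contribute None.
    keys = {key for info in step_infos for key in info}
    for key in sorted(keys):
        infos[key] = [info.get(key) for info in step_infos]
    # Assumed default aggregation: the cumulative (summed) reward.
    return sum(step_rewards), infos


episode_return, episode_infos = aggregate_episode(
    [1.0, 2.0, 0.5],
    [{"success": False}, {}, {"success": True}],
)
print(episode_return, episode_infos)
```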
|
||||
|
||||
Existing MP tasks can be created the same way as above. Just keep in mind, calling `step()` executes a full trajectory.
|
||||
Existing MP tasks can be created the same way as above. The namespace of the MP variant of an environment is given by `<original namespace>_<MP name>/`.
|
||||
Just keep in mind, calling `step()` executes a full trajectory.
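
The naming rule can be expressed as a small string helper. This is a sketch only: `mp_env_id` is a hypothetical function, and placing ids without a namespace under `gym` is an assumption based on how the README's examples name MP versions of plain Gymnasium environments.

```python
def mp_env_id(env_id: str, mp_type: str) -> str:
    """Build the MP-variant id `<namespace>_<MP name>/<env name>`."""
    namespace, sep, name = env_id.partition("/")
    if not sep:
        # No namespace in the id: assume the `gym` namespace.
        namespace, name = "gym", env_id
    return f"{namespace}_{mp_type}/{name}"


print(mp_env_id("fancy/Reacher5d-v0", "ProMP"))
print(mp_env_id("Reacher-v2", "ProMP"))
```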
|
||||
|
||||
> **Note:**
|
||||
> Currently, we are also in the process of enabling replanning as well as learning of sub-trajectories.
|
||||
@@ -105,30 +114,38 @@ Existing MP tasks can be created the same way as above. Just keep in mind, calli
|
||||
> Feel free to try it and open an issue with any problems that occur.
|
||||
|
||||
```python
|
||||
import gymnasium as gym
|
||||
import fancy_gym
|
||||
|
||||
env = fancy_gym.make('Reacher5dProMP-v0', seed=1)
|
||||
env = gym.make('fancy_ProMP/Reacher5d-v0')
|
||||
# or env = gym.make('metaworld_ProDMP/reach-v2')
|
||||
# or env = gym.make('dm_control_DMP/ball_in_cup-catch-v0')
|
||||
# or env = gym.make('gym_ProMP/Reacher-v2') # mp versions of envs added directly by gymnasium are in the gym_<MP-type> NS
|
||||
|
||||
# render() can be called once in the beginning with all necessary arguments.
|
||||
# To turn it off again, just call render() without any arguments.
|
||||
env.render(mode='human')
|
||||
|
||||
# This returns the context information, not the full state observation
|
||||
obs = env.reset()
|
||||
observation, info = env.reset(seed=1)
|
||||
|
||||
for i in range(5):
|
||||
action = env.action_space.sample()
|
||||
obs, reward, done, info = env.step(action)
|
||||
observation, reward, terminated, truncated, info = env.step(action)
|
||||
|
||||
# Done is always True as we are working on the episode level, hence we always reset()
|
||||
obs = env.reset()
|
||||
# terminated or truncated is always True as we are working on the episode level, hence we always reset()
|
||||
observation, info = env.reset()
|
||||
```
|
||||
|
||||
To show all available environments, we provide some additional convenience variables. All of them return a dictionary
|
||||
with two keys `DMP` and `ProMP` that store a list of available environment ids.
|
||||
with the keys `DMP`, `ProMP`, `ProDMP` and `all` that store a list of available environment ids.
|
||||
|
||||
```python
|
||||
import fancy_gym
|
||||
|
||||
print("All Black-box tasks:")
|
||||
print(fancy_gym.ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS)
|
||||
|
||||
print("Fancy Black-box tasks:")
|
||||
print(fancy_gym.ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS)
|
||||
|
||||
@@ -140,6 +157,9 @@ print(fancy_gym.ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS)
|
||||
|
||||
print("MetaWorld Black-box tasks:")
|
||||
print(fancy_gym.ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS)
|
||||
|
||||
print("If you add custom envs, their mp versions will be found in:")
|
||||
print(fancy_gym.MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS['<my_custom_namespace>'])
|
||||
```
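
The shape of these lookups can be pictured with a toy registry. Everything in this sketch is hypothetical (`new_mp_index`, `record_mp_env`, and the two index variables are stand-ins, not the library's internals), and it assumes the `all` key simply lists every MP variant regardless of type.

```python
from collections import defaultdict

MP_TYPES = ("DMP", "ProMP", "ProDMP")


def new_mp_index():
    """Return an empty index with one list per MP type plus an `all` list."""
    index = {mp_type: [] for mp_type in MP_TYPES}
    index["all"] = []
    return index


# Hypothetical stand-ins for the per-namespace and global lookups.
envs_for_ns = defaultdict(new_mp_index)
all_envs = new_mp_index()


def record_mp_env(namespace, name, mp_type):
    """Record the MP variant of `<namespace>/<name>` in both indices."""
    if mp_type not in MP_TYPES:
        raise ValueError(f"Unknown MP type: {mp_type}")
    env_id = f"{namespace}_{mp_type}/{name}"
    for index in (envs_for_ns[namespace], all_envs):
        index[mp_type].append(env_id)
        index["all"].append(env_id)
    return env_id


record_mp_env("custom", "cool_new_env-v0", "ProDMP")
record_mp_env("fancy", "Reacher5d-v0", "ProMP")
print(envs_for_ns["custom"]["ProDMP"])
```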
|
||||
|
||||
### How to create a new MP task
|
||||
@@ -151,23 +171,27 @@ hand, the following [interface](fancy_gym/black_box/raw_interface_wrapper.py) ne
|
||||
from abc import abstractmethod
|
||||
from typing import Union, Tuple
|
||||
|
||||
import gym
|
||||
import gymnasium as gym
|
||||
import numpy as np
|
||||
|
||||
|
||||
class RawInterfaceWrapper(gym.Wrapper):
|
||||
mp_config = {
|
||||
'ProMP': {},
|
||||
'DMP': {},
|
||||
'ProDMP': {},
|
||||
}
|
||||
|
||||
@property
|
||||
def context_mask(self) -> np.ndarray:
|
||||
"""
|
||||
Returns boolean mask of the same shape as the observation space.
|
||||
It determines whether the observation is returned for the contextual case or not.
|
||||
This effectively allows to filter unwanted or unnecessary observations from the full step-based case.
|
||||
E.g. Velocities starting at 0 are only changing after the first action. Given we only receive the
|
||||
context/part of the first observation, the velocities are not necessary in the observation for the task.
|
||||
Returns:
|
||||
bool array representing the indices of the observations
|
||||
|
||||
Returns boolean mask of the same shape as the observation space.
|
||||
It determines whether the observation is returned for the contextual case or not.
|
||||
This effectively allows to filter unwanted or unnecessary observations from the full step-based case.
|
||||
E.g. Velocities starting at 0 are only changing after the first action. Given we only receive the
|
||||
context/part of the first observation, the velocities are not necessary in the observation for the task.
|
||||
Returns:
|
||||
bool array representing the indices of the observations
|
||||
"""
|
||||
return np.ones(self.env.observation_space.shape[0], dtype=bool)
|
||||
|
||||
@@ -197,34 +221,91 @@ class RawInterfaceWrapper(gym.Wrapper):
|
||||
|
||||
```
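
The effect of a context mask is easiest to see on a concrete observation. The sketch below uses plain Python lists in place of NumPy arrays to stay dependency-free; `apply_context_mask` and the six-entry observation layout (two joint positions, two joint velocities, two goal coordinates) are assumptions for illustration.

```python
from itertools import compress
from typing import List, Sequence


def apply_context_mask(observation: Sequence[float], context_mask: Sequence[bool]) -> List[float]:
    """Keep only the observation entries whose mask value is True."""
    if len(observation) != len(context_mask):
        raise ValueError("observation and context_mask must have the same length")
    return list(compress(observation, context_mask))


# Velocities start at zero, so they carry no information at episode start
# and are masked out of the context.
full_observation = [0.1, -0.3, 0.0, 0.0, 0.7, 0.2]
mask = [True, True, False, False, True, True]
context = apply_context_mask(full_observation, mask)
print(context)
```

With NumPy arrays, the same selection would be written as boolean indexing, `observation[context_mask]`.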
|
||||
|
||||
Default configurations for MPs can be overwritten by defining attributes in `mp_config`.
|
||||
Available parameters are documented in the [MP_PyTorch Userguide](https://github.com/ALRhub/MP_PyTorch/blob/main/doc/README.md).
|
||||
|
||||
```python
|
||||
class RawInterfaceWrapper(gym.Wrapper):
|
||||
mp_config = {
|
||||
'ProMP': {
|
||||
'phase_generator_kwargs': {
|
||||
'phase_generator_type': 'linear'
|
||||
# When selecting another generator type, the default configuration will not be merged for the attribute.
|
||||
},
|
||||
'controller_kwargs': {
|
||||
'p_gains': 0.5 * np.array([1.0, 4.0, 2.0, 4.0, 1.0, 4.0, 1.0]),
|
||||
'd_gains': 0.5 * np.array([0.1, 0.4, 0.2, 0.4, 0.1, 0.4, 0.1]),
|
||||
},
|
||||
'basis_generator_kwargs': {
|
||||
'num_basis': 3,
|
||||
'num_basis_zero_start': 1,
|
||||
'num_basis_zero_goal': 1,
|
||||
},
|
||||
},
|
||||
'DMP': {},
|
||||
'ProDMP': {},
|
||||
}
|
||||
|
||||
[...]
|
||||
```
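
One way to picture the override behavior noted in the comment above is a section-wise merge that falls back to replacement when a `*_type` entry changes. This is a sketch under that assumption only; `merge_mp_config` is a hypothetical helper and not the merge logic used by `fancy_gym`.

```python
from copy import deepcopy
from typing import Any, Dict

Config = Dict[str, Dict[str, Any]]


def merge_mp_config(defaults: Config, overrides: Config) -> Config:
    """Merge overrides into defaults, section by section (hypothetical rule)."""
    merged = deepcopy(defaults)
    for section, params in overrides.items():
        base = merged.get(section, {})
        # If a `*_type` entry differs from the default, the defaults of that
        # section no longer apply, so the section is replaced rather than merged.
        type_changed = any(
            key.endswith("_type") and key in base and base[key] != value
            for key, value in params.items()
        )
        merged[section] = deepcopy(params) if type_changed else {**base, **deepcopy(params)}
    return merged


defaults = {
    "phase_generator_kwargs": {"phase_generator_type": "linear"},
    "basis_generator_kwargs": {
        "basis_generator_type": "zero_rbf",
        "num_basis": 5,
        "num_basis_zero_start": 1,
    },
}
same_type = merge_mp_config(defaults, {"basis_generator_kwargs": {"num_basis": 3}})
new_type = merge_mp_config(
    defaults,
    {"basis_generator_kwargs": {"basis_generator_type": "rbf", "num_basis": 4}},
)
print(same_type["basis_generator_kwargs"])
print(new_type["basis_generator_kwargs"])
```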
|
||||
|
||||
If you created a new task wrapper, feel free to open a PR, so we can integrate it for others to use as well. Without the
|
||||
integration the task can still be used. A rough outline can be shown here, for more details we recommend having a look
|
||||
at the [examples](fancy_gym/examples/).
|
||||
|
||||
If the step-based environment is already registered with gym, you can simply do the following:
|
||||
|
||||
```python
|
||||
import fancy_gym
|
||||
fancy_gym.upgrade(
|
||||
id='custom/cool_new_env-v0',
|
||||
mp_wrapper=my_custom_MPWrapper
|
||||
)
|
||||
```
|
||||
|
||||
# Base environment name, according to structure of above example
|
||||
base_env_id = "dmc:ball_in_cup-catch"
|
||||
If the step-based environment is not yet registered with gym, we can add both the step-based and MP versions via
|
||||
|
||||
# Replace this wrapper with the custom wrapper for your environment by inheriting from the RawInterfaceWrapper.
|
||||
# You can also add other gym.Wrappers in case they are needed,
|
||||
# e.g. gym.wrappers.FlattenObservation for dict observations
|
||||
wrappers = [fancy_gym.dmc.suite.ball_in_cup.MPWrapper]
|
||||
kwargs = {...}
|
||||
env = fancy_gym.make_bb(base_env_id, wrappers=wrappers, seed=0, **kwargs)
|
||||
```python
|
||||
fancy_gym.register(
|
||||
id='custom/cool_new_env-v0',
|
||||
entry_point=my_custom_env,
|
||||
mp_wrapper=my_custom_MPWrapper
|
||||
)
|
||||
```
|
||||
|
||||
From this point on, you can access the MP versions of your environments via
|
||||
|
||||
```python
|
||||
env = gym.make('custom_ProDMP/cool_new_env-v0')
|
||||
|
||||
rewards = 0
|
||||
obs = env.reset()
|
||||
observation, info = env.reset()
|
||||
|
||||
# number of samples/full trajectories (multiple environment steps)
|
||||
for i in range(5):
|
||||
ac = env.action_space.sample()
|
||||
obs, reward, done, info = env.step(ac)
|
||||
observation, reward, terminated, truncated, info = env.step(ac)
|
||||
rewards += reward
|
||||
|
||||
if done:
|
||||
print(base_env_id, rewards)
|
||||
if terminated or truncated:
|
||||
print(rewards)
|
||||
rewards = 0
|
||||
obs = env.reset()
|
||||
observation, info = env.reset()
|
||||
```
|
||||
|
||||
## Citing the Project
|
||||
|
||||
To cite this repository in publications:
|
||||
|
||||
```bibtex
|
||||
@software{fancy_gym,
|
||||
title = {Fancy Gym},
|
||||
author = {Otto, Fabian and Celik, Onur and Roth, Dominik and Zhou, Hongyi},
|
||||
abstract = {Fancy Gym: Unifying interface for various RL benchmarks with support for Black Box approaches.},
|
||||
url = {https://github.com/ALRhub/fancy_gym},
|
||||
organization = {Autonomous Learning Robots Lab (ALR) at KIT},
|
||||
}
|
||||
```
|
||||
|
||||
## Icon Attribution
|
||||
|
||||
The icon is based on the [Gymnasium](https://github.com/Farama-Foundation/Gymnasium) icon as can be found [here](https://gymnasium.farama.org/_static/img/gymnasium_black.svg).
|
||||
|
@@ -1,13 +1,17 @@
|
||||
from fancy_gym import dmc, meta, open_ai
|
||||
from fancy_gym.utils.make_env_helpers import make, make_bb, make_rank
|
||||
from .dmc import ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS
|
||||
# Convenience function for all MP environments
|
||||
from .envs import ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS
|
||||
from .meta import ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS
|
||||
from .open_ai import ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS
|
||||
from fancy_gym import envs as fancy
|
||||
from fancy_gym.utils.make_env_helpers import make_bb
|
||||
from .envs.registry import register, upgrade
|
||||
from .envs.registry import ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS, MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS
|
||||
|
||||
ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS = {
|
||||
key: value + ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS[key] +
|
||||
ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS[key] +
|
||||
ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS[key]
|
||||
for key, value in ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS.items()}
|
||||
ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS = MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS['dm_control']
|
||||
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS = MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS['fancy']
|
||||
ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS = MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS['metaworld']
|
||||
ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS = MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS['gym']
|
||||
|
||||
|
||||
def make(*args, **kwargs):
|
||||
"""
|
||||
As part of the refactor of Fancy Gym and upgrade to gymnasium the use of fancy_gym.make has been discontinued. Regular gym.make should be used instead. For more details check out the GitHub README. If your codebase was built for older versions of Fancy Gym and relies on the old behavior and dependency versions, please check out the legacy branch.
|
||||
"""
|
||||
raise Exception('As part of the refactor of Fancy Gym and upgrade to gymnasium the use of fancy_gym.make has been discontinued. Regular gym.make should be used instead. For more details check out the GitHub README. If your codebase was built for older versions of Fancy Gym and relies on the old behavior and dependency versions, please check out the legacy branch.')
|
||||
|
@@ -1,8 +1,9 @@
|
||||
from typing import Tuple, Optional, Callable
|
||||
from typing import Tuple, Optional, Callable, Dict, Any
|
||||
|
||||
import gym
|
||||
import gymnasium as gym
|
||||
import numpy as np
|
||||
from gym import spaces
|
||||
from gymnasium import spaces
|
||||
from gymnasium.core import ObsType
|
||||
from mp_pytorch.mp.mp_interfaces import MPInterface
|
||||
|
||||
from fancy_gym.black_box.controller.base_controller import BaseController
|
||||
@@ -67,7 +68,8 @@ class BlackBoxWrapper(gym.ObservationWrapper):
|
||||
self.reward_aggregation = reward_aggregation
|
||||
|
||||
# spaces
|
||||
self.return_context_observation = not (learn_sub_trajectories or self.do_replanning)
|
||||
self.return_context_observation = not (
|
||||
learn_sub_trajectories or self.do_replanning)
|
||||
self.traj_gen_action_space = self._get_traj_gen_action_space()
|
||||
self.action_space = self._get_action_space()
|
||||
self.observation_space = self._get_observation_space()
|
||||
@@ -99,14 +101,17 @@ class BlackBoxWrapper(gym.ObservationWrapper):
|
||||
# If we do not do this, the traj_gen assumes we are continuing the trajectory.
|
||||
self.traj_gen.reset()
|
||||
|
||||
clipped_params = np.clip(action, self.traj_gen_action_space.low, self.traj_gen_action_space.high)
|
||||
clipped_params = np.clip(
|
||||
action, self.traj_gen_action_space.low, self.traj_gen_action_space.high)
|
||||
self.traj_gen.set_params(clipped_params)
|
||||
init_time = np.array(0 if not self.do_replanning else self.current_traj_steps * self.dt)
|
||||
init_time = np.array(
|
||||
0 if not self.do_replanning else self.current_traj_steps * self.dt)
|
||||
|
||||
condition_pos = self.condition_pos if self.condition_pos is not None else self.current_pos
|
||||
condition_vel = self.condition_vel if self.condition_vel is not None else self.current_vel
|
||||
condition_pos = self.condition_pos if self.condition_pos is not None else self.env.get_wrapper_attr('current_pos')
|
||||
condition_vel = self.condition_vel if self.condition_vel is not None else self.env.get_wrapper_attr('current_vel')
|
||||
|
||||
self.traj_gen.set_initial_conditions(init_time, condition_pos, condition_vel)
|
||||
self.traj_gen.set_initial_conditions(
|
||||
init_time, condition_pos, condition_vel)
|
||||
self.traj_gen.set_duration(duration, self.dt)
|
||||
|
||||
position = get_numpy(self.traj_gen.get_traj_pos())
|
||||
@@ -153,7 +158,8 @@ class BlackBoxWrapper(gym.ObservationWrapper):
|
||||
trajectory_length = len(position)
|
||||
rewards = np.zeros(shape=(trajectory_length,))
|
||||
if self.verbose >= 2:
|
||||
actions = np.zeros(shape=(trajectory_length,) + self.env.action_space.shape)
|
||||
actions = np.zeros(shape=(trajectory_length,) +
|
||||
self.env.action_space.shape)
|
||||
observations = np.zeros(shape=(trajectory_length,) + self.env.observation_space.shape,
|
||||
dtype=self.env.observation_space.dtype)
|
||||
|
||||
@@ -161,16 +167,18 @@ class BlackBoxWrapper(gym.ObservationWrapper):
|
||||
done = False
|
||||
|
||||
if not traj_is_valid:
|
||||
obs, trajectory_return, done, infos = self.env.invalid_traj_callback(action, position, velocity,
|
||||
self.return_context_observation,
|
||||
self.tau_bound, self.delay_bound)
|
||||
return self.observation(obs), trajectory_return, done, infos
|
||||
obs, trajectory_return, terminated, truncated, infos = self.env.invalid_traj_callback(action, position, velocity,
|
||||
self.return_context_observation, self.tau_bound, self.delay_bound)
|
||||
return self.observation(obs), trajectory_return, terminated, truncated, infos
|
||||
|
||||
self.plan_steps += 1
|
||||
for t, (pos, vel) in enumerate(zip(position, velocity)):
|
||||
step_action = self.tracking_controller.get_action(pos, vel, self.current_pos, self.current_vel)
|
||||
c_action = np.clip(step_action, self.env.action_space.low, self.env.action_space.high)
|
||||
obs, c_reward, done, info = self.env.step(c_action)
|
||||
step_action = self.tracking_controller.get_action(
|
||||
pos, vel, self.env.get_wrapper_attr('current_pos'), self.env.get_wrapper_attr('current_vel'))
|
||||
c_action = np.clip(
|
||||
step_action, self.env.action_space.low, self.env.action_space.high)
|
||||
obs, c_reward, terminated, truncated, info = self.env.step(
|
||||
c_action)
|
||||
rewards[t] = c_reward
|
||||
|
||||
if self.verbose >= 2:
|
||||
@@ -185,9 +193,7 @@ class BlackBoxWrapper(gym.ObservationWrapper):
|
||||
if self.render_kwargs:
|
||||
self.env.render(**self.render_kwargs)
|
||||
|
||||
if done or (self.replanning_schedule(self.current_pos, self.current_vel, obs, c_action,
|
||||
t + 1 + self.current_traj_steps)
|
||||
and self.plan_steps < self.max_planning_times):
|
||||
if terminated or truncated or (self.replanning_schedule(self.env.get_wrapper_attr('current_pos'), self.env.get_wrapper_attr('current_vel'), obs, c_action, t + 1 + self.current_traj_steps) and self.plan_steps < self.max_planning_times):
|
||||
|
||||
if self.condition_on_desired:
|
||||
self.condition_pos = pos
|
||||
@@ -207,17 +213,18 @@ class BlackBoxWrapper(gym.ObservationWrapper):
|
||||
|
||||
infos['trajectory_length'] = t + 1
|
||||
trajectory_return = self.reward_aggregation(rewards[:t + 1])
|
||||
return self.observation(obs), trajectory_return, done, infos
|
||||
return self.observation(obs), trajectory_return, terminated, truncated, infos
|
||||
|
||||
def render(self, **kwargs):
|
||||
"""Only set render options here, such that they can be used during the rollout.
|
||||
This only needs to be called once"""
|
||||
self.render_kwargs = kwargs
|
||||
|
||||
def reset(self, *, seed: Optional[int] = None, return_info: bool = False, options: Optional[dict] = None):
|
||||
def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
|
||||
-> Tuple[ObsType, Dict[str, Any]]:
|
||||
self.current_traj_steps = 0
|
||||
self.plan_steps = 0
|
||||
self.traj_gen.reset()
|
||||
self.condition_pos = None
|
||||
self.condition_vel = None
|
||||
return super(BlackBoxWrapper, self).reset()
|
||||
return super(BlackBoxWrapper, self).reset(seed=seed, options=options)
|
||||
|
@@ -11,11 +11,11 @@ def get_controller(controller_type: str, **kwargs):
|
||||
if controller_type == "motor":
|
||||
return PDController(**kwargs)
|
||||
elif controller_type == "velocity":
|
||||
return VelController()
|
||||
return VelController(**kwargs)
|
||||
elif controller_type == "position":
|
||||
return PosController()
|
||||
return PosController(**kwargs)
|
||||
elif controller_type == "metaworld":
|
||||
return MetaWorldController()
|
||||
return MetaWorldController(**kwargs)
|
||||
else:
|
||||
raise ValueError(f"Specified controller type {controller_type} not supported, "
|
||||
f"please choose one of {ALL_TYPES}.")
|
||||
|
@@ -1,6 +1,6 @@
|
||||
from typing import Union, Tuple
|
||||
|
||||
import gym
|
||||
import gymnasium as gym
|
||||
import numpy as np
|
||||
from mp_pytorch.mp.mp_interfaces import MPInterface
|
||||
|
||||
@@ -114,7 +114,8 @@ class RawInterfaceWrapper(gym.Wrapper):
|
||||
Returns:
|
||||
obs: artificial observation if the trajectory is invalid, by default a zero vector
|
||||
reward: artificial reward if the trajectory is invalid, by default 0
|
||||
done: artificial done if the trajectory is invalid, by default True
|
||||
terminated: artificial terminated if the trajectory is invalid, by default True
|
||||
truncated: artificial truncated if the trajectory is invalid, by default False
|
||||
info: artificial info if the trajectory is invalid, by default empty dict
|
||||
"""
|
||||
return np.zeros(1), 0, True, {}
|
||||
return np.zeros(1), 0, True, False, {}
|
||||
|
@@ -9,11 +9,11 @@ environments in order to use our Motion Primitive gym interface with them.
|
||||
[//]: <> (These environments are wrapped-versions of their Deep Mind Control Suite (DMC) counterparts. Given most task can be)
|
||||
[//]: <> (solved in shorter horizon lengths than the original 1000 steps, we often shorten the episodes for those task.)
|
||||
|
||||
|Name| Description|Trajectory Horizon|Action Dimension|Context Dimension
|
||||
|---|---|---|---|---|
|
||||
|`dmc_ball_in_cup-catch_promp-v0`| A ProMP wrapped version of the "catch" task for the "ball_in_cup" environment. | 1000 | 10 | 2
|
||||
|`dmc_ball_in_cup-catch_dmp-v0`| A DMP wrapped version of the "catch" task for the "ball_in_cup" environment. | 1000| 10 | 2
|
||||
|`dmc_reacher-easy_promp-v0`| A ProMP wrapped version of the "easy" task for the "reacher" environment. | 1000 | 10 | 4
|
||||
|`dmc_reacher-easy_dmp-v0`| A DMP wrapped version of the "easy" task for the "reacher" environment. | 1000| 10 | 4
|
||||
|`dmc_reacher-hard_promp-v0`| A ProMP wrapped version of the "hard" task for the "reacher" environment.| 1000 | 10 | 4
|
||||
|`dmc_reacher-hard_dmp-v0`| A DMP wrapped version of the "hard" task for the "reacher" environment. | 1000 | 10 | 4
|
||||
| Name | Description | Trajectory Horizon | Action Dimension | Context Dimension |
|
||||
| ---------------------------------------- | ------------------------------------------------------------------------------ | ------------------ | ---------------- | ----------------- |
|
||||
| `dm_control_ProDMP/ball_in_cup-catch-v0` | A ProMP wrapped version of the "catch" task for the "ball_in_cup" environment. | 1000 | 10 | 2 |
|
||||
| `dm_control_DMP/ball_in_cup-catch-v0` | A DMP wrapped version of the "catch" task for the "ball_in_cup" environment. | 1000 | 10 | 2 |
|
||||
| `dm_control_ProMP/reacher-easy-v0` | A ProMP wrapped version of the "easy" task for the "reacher" environment. | 1000 | 10 | 4 |
|
||||
| `dm_control_DMP/reacher-easy-v0` | A DMP wrapped version of the "easy" task for the "reacher" environment. | 1000 | 10 | 4 |
|
||||
| `dm_control_ProMP/reacher-hard-v0` | A ProMP wrapped version of the "hard" task for the "reacher" environment. | 1000 | 10 | 4 |
|
||||
| `dm_control_DMP/reacher-hard-v0` | A DMP wrapped version of the "hard" task for the "reacher" environment. | 1000 | 10 | 4 |
|
||||
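The IDs in the new table follow a `<namespace>_<MP type>/<task>-v0` pattern, built on base IDs such as `dm_control/reacher-easy-v0` that appear in the registration code of this PR. A small sketch of that naming scheme, assuming it holds for every registered environment (the function `mp_env_id` is illustrative and not part of `fancy_gym`):

```python
def mp_env_id(base_id: str, mp_type: str) -> str:
    """Derive a movement-primitive environment ID from a step-based base ID.

    Assumption: the MP type is appended to the namespace with an underscore,
    as in the README table of this PR.
    """
    namespace, separator, name = base_id.partition("/")
    if not separator:
        raise ValueError(f"expected an ID of the form 'namespace/name', got {base_id!r}")
    return f"{namespace}_{mp_type}/{name}"


# Expand one base ID for the MP types listed in its registration call.
for mp_type in ["DMP", "ProMP"]:
    print(mp_env_id("dm_control/reacher-easy-v0", mp_type))
```

This only illustrates how the names are composed; whether a given ID actually exists depends on the `add_mp_types` passed when the base environment is registered.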
|
@ -1,245 +1,61 @@
|
||||
from copy import deepcopy
|
||||
|
||||
from gymnasium.wrappers import FlattenObservation
|
||||
from gymnasium.envs.registration import register
|
||||
|
||||
from ..envs.registry import register
|
||||
|
||||
from . import manipulation, suite
|
||||
|
||||
ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS = {"DMP": [], "ProMP": [], "ProDMP": []}
|
||||
|
||||
from gym.envs.registration import register
|
||||
|
||||
DEFAULT_BB_DICT_ProMP = {
|
||||
"name": 'EnvName',
|
||||
"wrappers": [],
|
||||
"trajectory_generator_kwargs": {
|
||||
'trajectory_generator_type': 'promp'
|
||||
},
|
||||
"phase_generator_kwargs": {
|
||||
'phase_generator_type': 'linear'
|
||||
},
|
||||
"controller_kwargs": {
|
||||
'controller_type': 'motor',
|
||||
"p_gains": 50.,
|
||||
"d_gains": 1.,
|
||||
},
|
||||
"basis_generator_kwargs": {
|
||||
'basis_generator_type': 'zero_rbf',
|
||||
'num_basis': 5,
|
||||
'num_basis_zero_start': 1
|
||||
}
|
||||
}
|
||||
|
||||
DEFAULT_BB_DICT_DMP = {
|
||||
"name": 'EnvName',
|
||||
"wrappers": [],
|
||||
"trajectory_generator_kwargs": {
|
||||
'trajectory_generator_type': 'dmp'
|
||||
},
|
||||
"phase_generator_kwargs": {
|
||||
'phase_generator_type': 'exp'
|
||||
},
|
||||
"controller_kwargs": {
|
||||
'controller_type': 'motor',
|
||||
"p_gains": 50.,
|
||||
"d_gains": 1.,
|
||||
},
|
||||
"basis_generator_kwargs": {
|
||||
'basis_generator_type': 'rbf',
|
||||
'num_basis': 5
|
||||
}
|
||||
}
|
||||
|
||||
# DeepMind Control Suite (DMC)
|
||||
kwargs_dict_bic_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
|
||||
kwargs_dict_bic_dmp['name'] = f"dmc:ball_in_cup-catch"
|
||||
kwargs_dict_bic_dmp['wrappers'].append(suite.ball_in_cup.MPWrapper)
|
||||
# bandwidth_factor=2
|
||||
kwargs_dict_bic_dmp['phase_generator_kwargs']['alpha_phase'] = 2
|
||||
kwargs_dict_bic_dmp['trajectory_generator_kwargs']['weight_scale'] = 10 # TODO: weight scale 1, but goal scale 0.1
|
||||
register(
|
||||
id=f'dmc_ball_in_cup-catch_dmp-v0',
|
||||
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
|
||||
kwargs=kwargs_dict_bic_dmp
|
||||
id=f"dm_control/ball_in_cup-catch-v0",
|
||||
register_step_based=False,
|
||||
mp_wrapper=suite.ball_in_cup.MPWrapper,
|
||||
add_mp_types=['DMP', 'ProMP'],
|
||||
)
|
||||
ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append("dmc_ball_in_cup-catch_dmp-v0")
|
||||
|
||||
kwargs_dict_bic_promp = deepcopy(DEFAULT_BB_DICT_DMP)
|
||||
kwargs_dict_bic_promp['name'] = f"dmc:ball_in_cup-catch"
|
||||
kwargs_dict_bic_promp['wrappers'].append(suite.ball_in_cup.MPWrapper)
|
||||
register(
|
||||
id=f'dmc_ball_in_cup-catch_promp-v0',
|
||||
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
|
||||
kwargs=kwargs_dict_bic_promp
|
||||
id=f"dm_control/reacher-easy-v0",
|
||||
register_step_based=False,
|
||||
mp_wrapper=suite.reacher.MPWrapper,
|
||||
add_mp_types=['DMP', 'ProMP'],
|
||||
)
|
||||
ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append("dmc_ball_in_cup-catch_promp-v0")
|
||||
|
||||
kwargs_dict_reacher_easy_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
|
||||
kwargs_dict_reacher_easy_dmp['name'] = f"dmc:reacher-easy"
|
||||
kwargs_dict_reacher_easy_dmp['wrappers'].append(suite.reacher.MPWrapper)
|
||||
# bandwidth_factor=2
|
||||
kwargs_dict_reacher_easy_dmp['phase_generator_kwargs']['alpha_phase'] = 2
|
||||
# TODO: weight scale 50, but goal scale 0.1
|
||||
kwargs_dict_reacher_easy_dmp['trajectory_generator_kwargs']['weight_scale'] = 500
|
||||
register(
|
||||
id=f'dmc_reacher-easy_dmp-v0',
|
||||
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
|
||||
kwargs=kwargs_dict_bic_dmp
|
||||
id=f"dm_control/reacher-hard-v0",
|
||||
register_step_based=False,
|
||||
mp_wrapper=suite.reacher.MPWrapper,
|
||||
add_mp_types=['DMP', 'ProMP'],
|
||||
)
|
||||
ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append("dmc_reacher-easy_dmp-v0")
|
||||
|
||||
kwargs_dict_reacher_easy_promp = deepcopy(DEFAULT_BB_DICT_DMP)
|
||||
kwargs_dict_reacher_easy_promp['name'] = f"dmc:reacher-easy"
|
||||
kwargs_dict_reacher_easy_promp['wrappers'].append(suite.reacher.MPWrapper)
|
||||
kwargs_dict_reacher_easy_promp['trajectory_generator_kwargs']['weight_scale'] = 0.2
|
||||
register(
|
||||
id=f'dmc_reacher-easy_promp-v0',
|
||||
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
|
||||
kwargs=kwargs_dict_reacher_easy_promp
|
||||
)
|
||||
ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append("dmc_reacher-easy_promp-v0")
|
||||
|
||||
kwargs_dict_reacher_hard_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
|
||||
kwargs_dict_reacher_hard_dmp['name'] = f"dmc:reacher-hard"
|
||||
kwargs_dict_reacher_hard_dmp['wrappers'].append(suite.reacher.MPWrapper)
|
||||
# bandwidth_factor = 2
|
||||
kwargs_dict_reacher_hard_dmp['phase_generator_kwargs']['alpha_phase'] = 2
|
||||
# TODO: weight scale 50, but goal scale 0.1
|
||||
kwargs_dict_reacher_hard_dmp['trajectory_generator_kwargs']['weight_scale'] = 500
|
||||
register(
|
||||
id=f'dmc_reacher-hard_dmp-v0',
|
||||
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
|
||||
kwargs=kwargs_dict_reacher_hard_dmp
|
||||
)
|
||||
ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append("dmc_reacher-hard_dmp-v0")
|
||||
|
||||
kwargs_dict_reacher_hard_promp = deepcopy(DEFAULT_BB_DICT_DMP)
|
||||
kwargs_dict_reacher_hard_promp['name'] = f"dmc:reacher-hard"
|
||||
kwargs_dict_reacher_hard_promp['wrappers'].append(suite.reacher.MPWrapper)
|
||||
kwargs_dict_reacher_hard_promp['trajectory_generator_kwargs']['weight_scale'] = 0.2
|
||||
register(
|
||||
id=f'dmc_reacher-hard_promp-v0',
|
||||
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
|
||||
kwargs=kwargs_dict_reacher_hard_promp
|
||||
)
|
||||
ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append("dmc_reacher-hard_promp-v0")
|
||||
|
||||
_dmc_cartpole_tasks = ["balance", "balance_sparse", "swingup", "swingup_sparse"]
|
||||
|
||||
for _task in _dmc_cartpole_tasks:
|
||||
_env_id = f'dmc_cartpole-{_task}_dmp-v0'
|
||||
kwargs_dict_cartpole_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
|
||||
kwargs_dict_cartpole_dmp['name'] = f"dmc:cartpole-{_task}"
|
||||
kwargs_dict_cartpole_dmp['wrappers'].append(suite.cartpole.MPWrapper)
|
||||
# bandwidth_factor = 2
|
||||
kwargs_dict_cartpole_dmp['phase_generator_kwargs']['alpha_phase'] = 2
|
||||
# TODO: weight scale 50, but goal scale 0.1
|
||||
kwargs_dict_cartpole_dmp['trajectory_generator_kwargs']['weight_scale'] = 500
|
||||
kwargs_dict_cartpole_dmp['controller_kwargs']['p_gains'] = 10
|
||||
kwargs_dict_cartpole_dmp['controller_kwargs']['d_gains'] = 10
|
||||
register(
|
||||
id=_env_id,
|
||||
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
|
||||
kwargs=kwargs_dict_cartpole_dmp
|
||||
id=f'dm_control/cartpole-{_task}-v0',
|
||||
register_step_based=False,
|
||||
mp_wrapper=suite.cartpole.MPWrapper,
|
||||
add_mp_types=['DMP', 'ProMP'],
|
||||
)
|
||||
ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id)
|
||||
|
||||
_env_id = f'dmc_cartpole-{_task}_promp-v0'
|
||||
kwargs_dict_cartpole_promp = deepcopy(DEFAULT_BB_DICT_DMP)
|
||||
kwargs_dict_cartpole_promp['name'] = f"dmc:cartpole-{_task}"
|
||||
kwargs_dict_cartpole_promp['wrappers'].append(suite.cartpole.MPWrapper)
|
||||
kwargs_dict_cartpole_promp['controller_kwargs']['p_gains'] = 10
|
||||
kwargs_dict_cartpole_promp['controller_kwargs']['d_gains'] = 10
|
||||
kwargs_dict_cartpole_promp['trajectory_generator_kwargs']['weight_scale'] = 0.2
|
||||
register(
|
||||
id=_env_id,
|
||||
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
|
||||
kwargs=kwargs_dict_cartpole_promp
|
||||
)
|
||||
ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
|
||||
|
||||
kwargs_dict_cartpole2poles_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
|
||||
kwargs_dict_cartpole2poles_dmp['name'] = f"dmc:cartpole-two_poles"
|
||||
kwargs_dict_cartpole2poles_dmp['wrappers'].append(suite.cartpole.TwoPolesMPWrapper)
|
||||
# bandwidth_factor = 2
|
||||
kwargs_dict_cartpole2poles_dmp['phase_generator_kwargs']['alpha_phase'] = 2
|
||||
# TODO: weight scale 50, but goal scale 0.1
|
||||
kwargs_dict_cartpole2poles_dmp['trajectory_generator_kwargs']['weight_scale'] = 500
|
||||
kwargs_dict_cartpole2poles_dmp['controller_kwargs']['p_gains'] = 10
|
||||
kwargs_dict_cartpole2poles_dmp['controller_kwargs']['d_gains'] = 10
|
||||
_env_id = f'dmc_cartpole-two_poles_dmp-v0'
|
||||
register(
|
||||
id=_env_id,
|
||||
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
|
||||
kwargs=kwargs_dict_cartpole2poles_dmp
|
||||
id=f"dm_control/cartpole-two_poles-v0",
|
||||
register_step_based=False,
|
||||
mp_wrapper=suite.cartpole.TwoPolesMPWrapper,
|
||||
add_mp_types=['DMP', 'ProMP'],
|
||||
)
|
||||
ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id)
|
||||
|
||||
kwargs_dict_cartpole2poles_promp = deepcopy(DEFAULT_BB_DICT_DMP)
|
||||
kwargs_dict_cartpole2poles_promp['name'] = f"dmc:cartpole-two_poles"
|
||||
kwargs_dict_cartpole2poles_promp['wrappers'].append(suite.cartpole.TwoPolesMPWrapper)
|
||||
kwargs_dict_cartpole2poles_promp['controller_kwargs']['p_gains'] = 10
|
||||
kwargs_dict_cartpole2poles_promp['controller_kwargs']['d_gains'] = 10
|
||||
kwargs_dict_cartpole2poles_promp['trajectory_generator_kwargs']['weight_scale'] = 0.2
|
||||
_env_id = f'dmc_cartpole-two_poles_promp-v0'
|
||||
register(
|
||||
id=_env_id,
|
||||
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
|
||||
kwargs=kwargs_dict_cartpole2poles_promp
|
||||
id=f"dm_control/cartpole-three_poles-v0",
|
||||
register_step_based=False,
|
||||
mp_wrapper=suite.cartpole.ThreePolesMPWrapper,
|
||||
add_mp_types=['DMP', 'ProMP'],
|
||||
)
|
||||
ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
|
||||
|
||||
kwargs_dict_cartpole3poles_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
|
||||
kwargs_dict_cartpole3poles_dmp['name'] = f"dmc:cartpole-three_poles"
|
||||
kwargs_dict_cartpole3poles_dmp['wrappers'].append(suite.cartpole.ThreePolesMPWrapper)
|
||||
# bandwidth_factor = 2
|
||||
kwargs_dict_cartpole3poles_dmp['phase_generator_kwargs']['alpha_phase'] = 2
|
||||
# TODO: weight scale 50, but goal scale 0.1
|
||||
kwargs_dict_cartpole3poles_dmp['trajectory_generator_kwargs']['weight_scale'] = 500
|
||||
kwargs_dict_cartpole3poles_dmp['controller_kwargs']['p_gains'] = 10
|
||||
kwargs_dict_cartpole3poles_dmp['controller_kwargs']['d_gains'] = 10
|
||||
_env_id = f'dmc_cartpole-three_poles_dmp-v0'
|
||||
register(
|
||||
id=_env_id,
|
||||
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
|
||||
kwargs=kwargs_dict_cartpole3poles_dmp
|
||||
)
|
||||
ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id)
|
||||
|
||||
kwargs_dict_cartpole3poles_promp = deepcopy(DEFAULT_BB_DICT_DMP)
|
||||
kwargs_dict_cartpole3poles_promp['name'] = f"dmc:cartpole-three_poles"
|
||||
kwargs_dict_cartpole3poles_promp['wrappers'].append(suite.cartpole.ThreePolesMPWrapper)
|
||||
kwargs_dict_cartpole3poles_promp['controller_kwargs']['p_gains'] = 10
|
||||
kwargs_dict_cartpole3poles_promp['controller_kwargs']['d_gains'] = 10
|
||||
kwargs_dict_cartpole3poles_promp['trajectory_generator_kwargs']['weight_scale'] = 0.2
|
||||
_env_id = f'dmc_cartpole-three_poles_promp-v0'
|
||||
register(
|
||||
id=_env_id,
|
||||
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
|
||||
kwargs=kwargs_dict_cartpole3poles_promp
|
||||
)
|
||||
ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
|
||||
|
||||
# DeepMind Manipulation
|
||||
kwargs_dict_mani_reach_site_features_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
|
||||
kwargs_dict_mani_reach_site_features_dmp['name'] = f"dmc:manipulation-reach_site_features"
|
||||
kwargs_dict_mani_reach_site_features_dmp['wrappers'].append(manipulation.reach_site.MPWrapper)
|
||||
kwargs_dict_mani_reach_site_features_dmp['phase_generator_kwargs']['alpha_phase'] = 2
|
||||
# TODO: weight scale 50, but goal scale 0.1
|
||||
kwargs_dict_mani_reach_site_features_dmp['trajectory_generator_kwargs']['weight_scale'] = 500
|
||||
kwargs_dict_mani_reach_site_features_dmp['controller_kwargs']['controller_type'] = 'velocity'
|
||||
register(
|
||||
id=f'dmc_manipulation-reach_site_dmp-v0',
|
||||
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
|
||||
kwargs=kwargs_dict_mani_reach_site_features_dmp
|
||||
id=f"dm_control/reach_site_features-v0",
|
||||
register_step_based=False,
|
||||
mp_wrapper=manipulation.reach_site.MPWrapper,
|
||||
add_mp_types=['DMP', 'ProMP'],
|
||||
)
|
||||
ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append("dmc_manipulation-reach_site_dmp-v0")
|
||||
|
||||
kwargs_dict_mani_reach_site_features_promp = deepcopy(DEFAULT_BB_DICT_DMP)
|
||||
kwargs_dict_mani_reach_site_features_promp['name'] = f"dmc:manipulation-reach_site_features"
|
||||
kwargs_dict_mani_reach_site_features_promp['wrappers'].append(manipulation.reach_site.MPWrapper)
|
||||
kwargs_dict_mani_reach_site_features_promp['trajectory_generator_kwargs']['weight_scale'] = 0.2
|
||||
kwargs_dict_mani_reach_site_features_promp['controller_kwargs']['controller_type'] = 'velocity'
|
||||
register(
|
||||
id=f'dmc_manipulation-reach_site_promp-v0',
|
||||
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
|
||||
kwargs=kwargs_dict_mani_reach_site_features_promp
|
||||
)
|
||||
ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append("dmc_manipulation-reach_site_promp-v0")
|
||||
|
@ -1,186 +0,0 @@
|
||||
# Adopted from: https://github.com/denisyarats/dmc2gym/blob/master/dmc2gym/wrappers.py
|
||||
# License: MIT
|
||||
# Copyright (c) 2020 Denis Yarats
|
||||
import collections
|
||||
from collections.abc import MutableMapping
|
||||
from typing import Any, Dict, Tuple, Optional, Union, Callable
|
||||
|
||||
import gym
|
||||
import numpy as np
|
||||
from dm_control import composer
|
||||
from dm_control.rl import control
|
||||
from dm_env import specs
|
||||
from gym import spaces
|
||||
from gym.core import ObsType
|
||||
|
||||
|
||||
def _spec_to_box(spec):
|
||||
def extract_min_max(s):
|
||||
assert s.dtype == np.float64 or s.dtype == np.float32, \
|
||||
f"Only float64 and float32 types are allowed, instead {s.dtype} was found"
|
||||
dim = int(np.prod(s.shape))
|
||||
if type(s) == specs.Array:
|
||||
bound = np.inf * np.ones(dim, dtype=s.dtype)
|
||||
return -bound, bound
|
||||
elif type(s) == specs.BoundedArray:
|
||||
zeros = np.zeros(dim, dtype=s.dtype)
|
||||
return s.minimum + zeros, s.maximum + zeros
|
||||
|
||||
mins, maxs = [], []
|
||||
for s in spec:
|
||||
mn, mx = extract_min_max(s)
|
||||
mins.append(mn)
|
||||
maxs.append(mx)
|
||||
low = np.concatenate(mins, axis=0)
|
||||
high = np.concatenate(maxs, axis=0)
|
||||
assert low.shape == high.shape
|
||||
return spaces.Box(low, high, dtype=s.dtype)
|
||||
|
||||
|
||||
def _flatten_obs(obs: MutableMapping):
|
||||
"""
|
||||
Flattens an observation of type MutableMapping, e.g. a dict to a 1D array.
|
||||
Args:
|
||||
obs: observation to flatten
|
||||
|
||||
Returns: 1D array of observation
|
||||
|
||||
"""
|
||||
|
||||
if not isinstance(obs, MutableMapping):
|
||||
raise ValueError(f'Requires dict-like observations structure. {type(obs)} found.')
|
||||
|
||||
# Keep key order consistent for non OrderedDicts
|
||||
keys = obs.keys() if isinstance(obs, collections.OrderedDict) else sorted(obs.keys())
|
||||
|
||||
obs_vals = [np.array([obs[key]]) if np.isscalar(obs[key]) else obs[key].ravel() for key in keys]
|
||||
return np.concatenate(obs_vals)
|
||||
|
||||
|
||||
class DMCWrapper(gym.Env):
|
||||
def __init__(self,
|
||||
env: Callable[[], Union[composer.Environment, control.Environment]],
|
||||
):
|
||||
|
||||
# TODO: Currently this is required to be a function because dmc does not allow to copy composers environments
|
||||
self._env = env()
|
||||
|
||||
# action and observation space
|
||||
self._action_space = _spec_to_box([self._env.action_spec()])
|
||||
self._observation_space = _spec_to_box(self._env.observation_spec().values())
|
||||
|
||||
self._window = None
|
||||
self.id = 'dmc'
|
||||
|
||||
def __getattr__(self, item):
|
||||
"""Propagate only non-existent properties to wrapped env."""
|
||||
if item.startswith('_'):
|
||||
raise AttributeError("attempted to get missing private attribute '{}'".format(item))
|
||||
if item in self.__dict__:
|
||||
return getattr(self, item)
|
||||
return getattr(self._env, item)
|
||||
|
||||
def _get_obs(self, time_step):
|
||||
obs = _flatten_obs(time_step.observation).astype(self.observation_space.dtype)
|
||||
return obs
|
||||
|
||||
@property
|
||||
def observation_space(self):
|
||||
return self._observation_space
|
||||
|
||||
@property
|
||||
def action_space(self):
|
||||
return self._action_space
|
||||
|
||||
@property
|
||||
def dt(self):
|
||||
return self._env.control_timestep()
|
||||
|
||||
def seed(self, seed=None):
|
||||
self._action_space.seed(seed)
|
||||
self._observation_space.seed(seed)
|
||||
|
||||
def step(self, action) -> Tuple[np.ndarray, float, bool, Dict[str, Any]]:
|
||||
assert self._action_space.contains(action)
|
||||
extra = {'internal_state': self._env.physics.get_state().copy()}
|
||||
|
||||
time_step = self._env.step(action)
|
||||
reward = time_step.reward or 0.
|
||||
done = time_step.last()
|
||||
obs = self._get_obs(time_step)
|
||||
extra['discount'] = time_step.discount
|
||||
|
||||
return obs, reward, done, extra
|
||||
|
||||
def reset(self, *, seed: Optional[int] = None, return_info: bool = False,
|
||||
options: Optional[dict] = None, ) -> Union[ObsType, Tuple[ObsType, dict]]:
|
||||
time_step = self._env.reset()
|
||||
obs = self._get_obs(time_step)
|
||||
return obs
|
||||
|
||||
def render(self, mode='rgb_array', height=240, width=320, camera_id=-1, overlays=(), depth=False,
|
||||
segmentation=False, scene_option=None, render_flag_overrides=None):
|
||||
|
||||
# assert mode == 'rgb_array', 'only support rgb_array mode, given %s' % mode
|
||||
if mode == "rgb_array":
|
||||
return self._env.physics.render(height=height, width=width, camera_id=camera_id, overlays=overlays,
|
||||
depth=depth, segmentation=segmentation, scene_option=scene_option,
|
||||
render_flag_overrides=render_flag_overrides)
|
||||
|
||||
# Render max available buffer size. Larger is only possible by altering the XML.
|
||||
img = self._env.physics.render(height=self._env.physics.model.vis.global_.offheight,
|
||||
width=self._env.physics.model.vis.global_.offwidth,
|
||||
camera_id=camera_id, overlays=overlays, depth=depth, segmentation=segmentation,
|
||||
scene_option=scene_option, render_flag_overrides=render_flag_overrides)
|
||||
|
||||
if depth:
|
||||
img = np.dstack([img.astype(np.uint8)] * 3)
|
||||
|
||||
if mode == 'human':
|
||||
try:
|
||||
import cv2
|
||||
if self._window is None:
|
||||
self._window = cv2.namedWindow(self.id, cv2.WINDOW_AUTOSIZE)
|
||||
cv2.imshow(self.id, img[..., ::-1]) # Image in BGR
|
||||
cv2.waitKey(1)
|
||||
except ImportError:
|
||||
raise gym.error.DependencyNotInstalled("Rendering requires opencv. Run `pip install opencv-python`")
|
||||
# PYGAME seems to destroy some global rendering configs from the physics render
|
||||
# except ImportError:
|
||||
# import pygame
|
||||
# img_copy = img.copy().transpose((1, 0, 2))
|
||||
# if self._window is None:
|
||||
# pygame.init()
|
||||
# pygame.display.init()
|
||||
# self._window = pygame.display.set_mode(img_copy.shape[:2])
|
||||
# self.clock = pygame.time.Clock()
|
||||
#
|
||||
# surf = pygame.surfarray.make_surface(img_copy)
|
||||
# self._window.blit(surf, (0, 0))
|
||||
# pygame.event.pump()
|
||||
# self.clock.tick(30)
|
||||
# pygame.display.flip()
|
||||
|
||||
def close(self):
|
||||
super().close()
|
||||
if self._window is not None:
|
||||
try:
|
||||
import cv2
|
||||
cv2.destroyWindow(self.id)
|
||||
except ImportError:
|
||||
import pygame
|
||||
|
||||
pygame.display.quit()
|
||||
pygame.quit()
|
||||
|
||||
@property
|
||||
def reward_range(self) -> Tuple[float, float]:
|
||||
reward_spec = self._env.reward_spec()
|
||||
if isinstance(reward_spec, specs.BoundedArray):
|
||||
return reward_spec.minimum, reward_spec.maximum
|
||||
return -float('inf'), float('inf')
|
||||
|
||||
@property
|
||||
def metadata(self):
|
||||
return {'render.modes': ['human', 'rgb_array'],
|
||||
'video.frames_per_second': round(1.0 / self._env.control_timestep())}
|
@ -6,6 +6,28 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
|
||||
|
||||
|
||||
class MPWrapper(RawInterfaceWrapper):
|
||||
mp_config = {
|
||||
'ProMP': {
|
||||
'controller_kwargs': {
|
||||
'p_gains': 50.0,
|
||||
},
|
||||
'trajectory_generator_kwargs': {
|
||||
'weights_scale': 0.2,
|
||||
},
|
||||
},
|
||||
'DMP': {
|
||||
'controller_kwargs': {
|
||||
'p_gains': 50.0,
|
||||
},
|
||||
'phase_generator': {
|
||||
'alpha_phase': 2,
|
||||
},
|
||||
'trajectory_generator_kwargs': {
|
||||
'weights_scale': 500,
|
||||
},
|
||||
},
|
||||
'ProDMP': {},
|
||||
}
|
||||
|
||||
@property
|
||||
def context_mask(self) -> np.ndarray:
|
||||
@ -35,4 +57,4 @@ class MPWrapper(RawInterfaceWrapper):
|
||||
|
||||
@property
|
||||
def dt(self) -> Union[float, int]:
|
||||
return self.env.dt
|
||||
return self.env.control_timestep()
|
||||
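The `mp_config` attribute introduced in this hunk lists only the values that differ per movement-primitive type, replacing the earlier pattern of deep-copying a default dictionary and overwriting nested keys by hand. How the new registry combines these overrides with its defaults is not shown in this part of the diff; the following is a minimal sketch of one plausible approach, a recursive dictionary merge (the function `merge_config` is hypothetical):

```python
from copy import deepcopy
from typing import Any, Dict


def merge_config(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `defaults` with `overrides` applied recursively."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested dictionaries instead of replacing them wholesale.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Values taken from the diff: generic controller defaults and the ProMP
# overrides of one of the DMC wrappers.
defaults = {"controller_kwargs": {"controller_type": "motor", "p_gains": 1.0, "d_gains": 0.1}}
overrides = {
    "controller_kwargs": {"p_gains": 50.0},
    "trajectory_generator_kwargs": {"weights_scale": 0.2},
}
merged = merge_config(defaults, overrides)
print(merged["controller_kwargs"])
```

Because the merge works on a deep copy, the shared defaults stay untouched, so one environment's overrides cannot leak into another's configuration.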
|
@ -6,6 +6,25 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
|
||||
|
||||
|
||||
class MPWrapper(RawInterfaceWrapper):
|
||||
mp_config = {
|
||||
'ProMP': {
|
||||
'controller_kwargs': {
|
||||
'p_gains': 50.0,
|
||||
},
|
||||
},
|
||||
'DMP': {
|
||||
'controller_kwargs': {
|
||||
'p_gains': 50.0,
|
||||
},
|
||||
'phase_generator': {
|
||||
'alpha_phase': 2,
|
||||
},
|
||||
'trajectory_generator_kwargs': {
|
||||
'weights_scale': 10
|
||||
},
|
||||
},
|
||||
'ProDMP': {},
|
||||
}
|
||||
|
||||
@property
|
||||
def context_mask(self) -> np.ndarray:
|
||||
@ -31,4 +50,4 @@ class MPWrapper(RawInterfaceWrapper):
|
||||
|
||||
@property
|
||||
def dt(self) -> Union[float, int]:
|
||||
return self.env.dt
|
||||
return self.env.control_timestep()
|
||||
|
@ -6,6 +6,30 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
|
||||
|
||||
|
||||
class MPWrapper(RawInterfaceWrapper):
|
||||
mp_config = {
|
||||
'ProMP': {
|
||||
'controller_kwargs': {
|
||||
'p_gains': 10,
|
||||
'd_gains': 10,
|
||||
},
|
||||
'trajectory_generator_kwargs': {
|
||||
'weights_scale': 0.2,
|
||||
},
|
||||
},
|
||||
'DMP': {
|
||||
'controller_kwargs': {
|
||||
'p_gains': 10,
|
||||
'd_gains': 10,
|
||||
},
|
||||
'phase_generator': {
|
||||
'alpha_phase': 2,
|
||||
},
|
||||
'trajectory_generator_kwargs': {
|
||||
'weights_scale': 500,
|
||||
},
|
||||
},
|
||||
'ProDMP': {},
|
||||
}
|
||||
|
||||
def __init__(self, env, n_poles: int = 1):
|
||||
self.n_poles = n_poles
|
||||
@ -35,7 +59,7 @@ class MPWrapper(RawInterfaceWrapper):
|
||||
|
||||
@property
|
||||
def dt(self) -> Union[float, int]:
|
||||
return self.env.dt
|
||||
return self.env.control_timestep()
|
||||
|
||||
|
||||
class TwoPolesMPWrapper(MPWrapper):
|
||||
|
@ -6,6 +6,30 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
|
||||
|
||||
|
||||
class MPWrapper(RawInterfaceWrapper):
|
||||
mp_config = {
|
||||
'ProMP': {
|
||||
'controller_kwargs': {
|
||||
'p_gains': 50.0,
|
||||
'd_gains': 1.0,
|
||||
},
|
||||
'trajectory_generator_kwargs': {
|
||||
'weights_scale': 0.2,
|
||||
},
|
||||
},
|
||||
'DMP': {
|
||||
'controller_kwargs': {
|
||||
'p_gains': 50.0,
|
||||
'd_gains': 1.0,
|
||||
},
|
||||
'phase_generator': {
|
||||
'alpha_phase': 2,
|
||||
},
|
||||
'trajectory_generator_kwargs': {
|
||||
'weights_scale': 500,
|
||||
},
|
||||
},
|
||||
'ProDMP': {},
|
||||
}
|
||||
|
||||
@property
|
||||
def context_mask(self) -> np.ndarray:
|
||||
@ -30,4 +54,4 @@ class MPWrapper(RawInterfaceWrapper):
|
||||
|
||||
@property
|
||||
def dt(self) -> Union[float, int]:
|
||||
return self.env.dt
|
||||
return self.env.control_timestep()
|
||||
|
@ -1,103 +1,43 @@
|
||||
from copy import deepcopy
|
||||
|
||||
import numpy as np
|
||||
from gym import register
|
||||
from gymnasium import register as gym_register
|
||||
from .registry import register, upgrade
|
||||
|
||||
from . import classic_control, mujoco
|
||||
from .classic_control.hole_reacher.hole_reacher import HoleReacherEnv
|
||||
from .classic_control.simple_reacher.simple_reacher import SimpleReacherEnv
|
||||
from .classic_control.simple_reacher import MPWrapper as MPWrapper_SimpleReacher
|
||||
from .classic_control.hole_reacher.hole_reacher import HoleReacherEnv
|
||||
from .classic_control.hole_reacher import MPWrapper as MPWrapper_HoleReacher
|
||||
from .classic_control.viapoint_reacher.viapoint_reacher import ViaPointReacherEnv
|
||||
from .classic_control.viapoint_reacher import MPWrapper as MPWrapper_ViaPointReacher
|
||||
from .mujoco.reacher.reacher import ReacherEnv, MAX_EPISODE_STEPS_REACHER
|
||||
from .mujoco.reacher.mp_wrapper import MPWrapper as MPWrapper_Reacher
|
||||
from .mujoco.ant_jump.ant_jump import MAX_EPISODE_STEPS_ANTJUMP
|
||||
from .mujoco.beerpong.beerpong import MAX_EPISODE_STEPS_BEERPONG, FIXED_RELEASE_STEP
|
||||
from .mujoco.beerpong.mp_wrapper import MPWrapper as MPWrapper_Beerpong
|
||||
from .mujoco.beerpong.mp_wrapper import MPWrapper_FixedRelease as MPWrapper_Beerpong_FixedRelease
|
||||
from .mujoco.half_cheetah_jump.half_cheetah_jump import MAX_EPISODE_STEPS_HALFCHEETAHJUMP
|
||||
from .mujoco.hopper_jump.hopper_jump import MAX_EPISODE_STEPS_HOPPERJUMP
|
||||
from .mujoco.hopper_jump.hopper_jump_on_box import MAX_EPISODE_STEPS_HOPPERJUMPONBOX
|
||||
from .mujoco.hopper_throw.hopper_throw import MAX_EPISODE_STEPS_HOPPERTHROW
|
||||
from .mujoco.hopper_throw.hopper_throw_in_basket import MAX_EPISODE_STEPS_HOPPERTHROWINBASKET
|
||||
from .mujoco.reacher.reacher import ReacherEnv, MAX_EPISODE_STEPS_REACHER
|
||||
from .mujoco.walker_2d_jump.walker_2d_jump import MAX_EPISODE_STEPS_WALKERJUMP
|
||||
from .mujoco.box_pushing.box_pushing_env import BoxPushingDense, BoxPushingTemporalSparse, \
|
||||
BoxPushingTemporalSpatialSparse, MAX_EPISODE_STEPS_BOX_PUSHING
|
||||
BoxPushingTemporalSpatialSparse, MAX_EPISODE_STEPS_BOX_PUSHING
|
||||
from .mujoco.table_tennis.table_tennis_env import TableTennisEnv, TableTennisWind, TableTennisGoalSwitching, \
|
||||
MAX_EPISODE_STEPS_TABLE_TENNIS
|
||||
|
||||
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS = {"DMP": [], "ProMP": [], "ProDMP": []}
|
||||
|
||||
DEFAULT_BB_DICT_ProMP = {
|
||||
"name": 'EnvName',
|
||||
"wrappers": [],
|
||||
"trajectory_generator_kwargs": {
|
||||
'trajectory_generator_type': 'promp'
|
||||
},
|
||||
"phase_generator_kwargs": {
|
||||
'phase_generator_type': 'linear'
|
||||
},
|
||||
"controller_kwargs": {
|
||||
'controller_type': 'motor',
|
||||
"p_gains": 1.0,
|
||||
"d_gains": 0.1,
|
||||
},
|
||||
"basis_generator_kwargs": {
|
||||
'basis_generator_type': 'zero_rbf',
|
||||
'num_basis': 5,
|
||||
'num_basis_zero_start': 1,
|
||||
'basis_bandwidth_factor': 3.0,
|
||||
},
|
||||
"black_box_kwargs": {
|
||||
}
|
||||
}
|
||||
|
||||
DEFAULT_BB_DICT_DMP = {
|
||||
"name": 'EnvName',
|
||||
"wrappers": [],
|
||||
"trajectory_generator_kwargs": {
|
||||
'trajectory_generator_type': 'dmp'
|
||||
},
|
||||
"phase_generator_kwargs": {
|
||||
'phase_generator_type': 'exp'
|
||||
},
|
||||
"controller_kwargs": {
|
||||
'controller_type': 'motor',
|
||||
"p_gains": 1.0,
|
||||
"d_gains": 0.1,
|
||||
},
|
||||
"basis_generator_kwargs": {
|
||||
'basis_generator_type': 'rbf',
|
||||
'num_basis': 5
|
||||
}
|
||||
}
|
||||
|
||||
DEFAULT_BB_DICT_ProDMP = {
|
||||
"name": 'EnvName',
|
||||
"wrappers": [],
|
||||
"trajectory_generator_kwargs": {
|
||||
'trajectory_generator_type': 'prodmp',
|
||||
'duration': 2.0,
|
||||
'weights_scale': 1.0,
|
||||
},
|
||||
"phase_generator_kwargs": {
|
||||
'phase_generator_type': 'exp',
|
||||
'tau': 1.5,
|
||||
},
|
||||
"controller_kwargs": {
|
||||
'controller_type': 'motor',
|
||||
"p_gains": 1.0,
|
||||
"d_gains": 0.1,
|
||||
},
|
||||
"basis_generator_kwargs": {
|
||||
'basis_generator_type': 'prodmp',
|
||||
'alpha': 10,
|
||||
'num_basis': 5,
|
||||
},
|
||||
"black_box_kwargs": {
|
||||
}
|
||||
}
|
||||
MAX_EPISODE_STEPS_TABLE_TENNIS
|
||||
from .mujoco.table_tennis.mp_wrapper import TT_MPWrapper as MPWrapper_TableTennis
|
||||
from .mujoco.table_tennis.mp_wrapper import TT_MPWrapper_Replan as MPWrapper_TableTennis_Replan
|
||||
from .mujoco.table_tennis.mp_wrapper import TTVelObs_MPWrapper as MPWrapper_TableTennis_VelObs
|
||||
from .mujoco.table_tennis.mp_wrapper import TTVelObs_MPWrapper_Replan as MPWrapper_TableTennis_VelObs_Replan
|
||||
|
||||
# Classic Control
|
||||
## Simple Reacher
|
||||
# Simple Reacher
|
||||
register(
|
||||
id='SimpleReacher-v0',
|
||||
entry_point='fancy_gym.envs.classic_control:SimpleReacherEnv',
|
||||
id='fancy/SimpleReacher-v0',
|
||||
entry_point=SimpleReacherEnv,
|
||||
mp_wrapper=MPWrapper_SimpleReacher,
|
||||
max_episode_steps=200,
|
||||
kwargs={
|
||||
"n_links": 2,
|
||||
@ -105,19 +45,20 @@ register(
|
||||
)
|
||||
|
||||
register(
|
||||
id='LongSimpleReacher-v0',
|
||||
entry_point='fancy_gym.envs.classic_control:SimpleReacherEnv',
|
||||
id='fancy/LongSimpleReacher-v0',
|
||||
entry_point=SimpleReacherEnv,
|
||||
mp_wrapper=MPWrapper_SimpleReacher,
|
||||
max_episode_steps=200,
|
||||
kwargs={
|
||||
"n_links": 5,
|
||||
}
|
||||
)
|
||||
|
||||
## Viapoint Reacher
|
||||
|
||||
# Viapoint Reacher
|
||||
register(
|
||||
id='ViaPointReacher-v0',
|
||||
entry_point='fancy_gym.envs.classic_control:ViaPointReacherEnv',
|
||||
id='fancy/ViaPointReacher-v0',
|
||||
entry_point=ViaPointReacherEnv,
|
||||
mp_wrapper=MPWrapper_ViaPointReacher,
|
||||
max_episode_steps=200,
|
||||
kwargs={
|
||||
"n_links": 5,
|
||||
@ -126,10 +67,11 @@ register(
|
||||
}
|
||||
)
|
||||
|
||||
## Hole Reacher
|
||||
# Hole Reacher
|
||||
register(
|
||||
id='HoleReacher-v0',
|
||||
entry_point='fancy_gym.envs.classic_control:HoleReacherEnv',
|
||||
id='fancy/HoleReacher-v0',
|
||||
entry_point=HoleReacherEnv,
|
||||
mp_wrapper=MPWrapper_HoleReacher,
|
||||
max_episode_steps=200,
|
||||
kwargs={
|
||||
"n_links": 5,
|
||||
@ -145,31 +87,35 @@ register(
|
||||
|
||||
# Mujoco
|
||||
|
||||
## Mujoco Reacher
|
||||
for _dims in [5, 7]:
|
||||
# Mujoco Reacher
|
||||
for dims in [5, 7]:
|
||||
register(
|
||||
id=f'Reacher{_dims}d-v0',
|
||||
entry_point='fancy_gym.envs.mujoco:ReacherEnv',
|
||||
id=f'fancy/Reacher{dims}d-v0',
|
||||
entry_point=ReacherEnv,
|
||||
mp_wrapper=MPWrapper_Reacher,
|
||||
max_episode_steps=MAX_EPISODE_STEPS_REACHER,
|
||||
kwargs={
|
||||
"n_links": _dims,
|
||||
"n_links": dims,
|
||||
}
|
||||
)
|
||||
|
||||
register(
|
||||
id=f'Reacher{_dims}dSparse-v0',
|
||||
entry_point='fancy_gym.envs.mujoco:ReacherEnv',
|
||||
id=f'fancy/Reacher{dims}dSparse-v0',
|
||||
entry_point=ReacherEnv,
|
||||
mp_wrapper=MPWrapper_Reacher,
|
||||
max_episode_steps=MAX_EPISODE_STEPS_REACHER,
|
||||
kwargs={
|
||||
"sparse": True,
|
||||
'reward_weight': 200,
|
||||
"n_links": _dims,
|
||||
"n_links": dims,
|
||||
}
|
||||
)
|
||||
|
||||
|
||||
register(
|
||||
id='HopperJumpSparse-v0',
|
||||
id='fancy/HopperJumpSparse-v0',
|
||||
entry_point='fancy_gym.envs.mujoco:HopperJumpEnv',
|
||||
mp_wrapper=mujoco.hopper_jump.MPWrapper,
|
||||
max_episode_steps=MAX_EPISODE_STEPS_HOPPERJUMP,
|
||||
kwargs={
|
||||
"sparse": True,
|
||||
@ -177,8 +123,9 @@ register(
|
||||
)
|
||||
|
||||
register(
|
||||
id='HopperJump-v0',
|
||||
id='fancy/HopperJump-v0',
|
||||
entry_point='fancy_gym.envs.mujoco:HopperJumpEnv',
|
||||
mp_wrapper=mujoco.hopper_jump.MPWrapper,
|
||||
max_episode_steps=MAX_EPISODE_STEPS_HOPPERJUMP,
|
||||
kwargs={
|
||||
"sparse": False,
|
||||
@ -188,76 +135,117 @@ register(
|
||||
}
|
||||
)
|
||||
|
||||
# TODO: Add [MPs] later when finished (old TODO I moved here during refactor)
|
||||
register(
|
||||
id='AntJump-v0',
|
||||
id='fancy/AntJump-v0',
|
||||
entry_point='fancy_gym.envs.mujoco:AntJumpEnv',
|
||||
max_episode_steps=MAX_EPISODE_STEPS_ANTJUMP,
|
||||
add_mp_types=[],
|
||||
)
|
||||
|
||||
register(
|
||||
id='HalfCheetahJump-v0',
|
||||
id='fancy/HalfCheetahJump-v0',
|
||||
entry_point='fancy_gym.envs.mujoco:HalfCheetahJumpEnv',
|
||||
max_episode_steps=MAX_EPISODE_STEPS_HALFCHEETAHJUMP,
|
||||
add_mp_types=[],
|
||||
)
|
||||
|
||||
register(
|
||||
id='HopperJumpOnBox-v0',
|
||||
id='fancy/HopperJumpOnBox-v0',
|
||||
entry_point='fancy_gym.envs.mujoco:HopperJumpOnBoxEnv',
|
||||
max_episode_steps=MAX_EPISODE_STEPS_HOPPERJUMPONBOX,
|
||||
add_mp_types=[],
|
||||
)
|
||||
|
||||
register(
|
||||
id='HopperThrow-v0',
|
||||
id='fancy/HopperThrow-v0',
|
||||
entry_point='fancy_gym.envs.mujoco:HopperThrowEnv',
|
||||
max_episode_steps=MAX_EPISODE_STEPS_HOPPERTHROW,
|
||||
add_mp_types=[],
|
||||
)
|
||||
|
||||
register(
|
||||
id='HopperThrowInBasket-v0',
|
||||
id='fancy/HopperThrowInBasket-v0',
|
||||
entry_point='fancy_gym.envs.mujoco:HopperThrowInBasketEnv',
|
||||
max_episode_steps=MAX_EPISODE_STEPS_HOPPERTHROWINBASKET,
|
||||
add_mp_types=[],
|
||||
)
|
||||
|
||||
register(
|
||||
id='Walker2DJump-v0',
|
||||
id='fancy/Walker2DJump-v0',
|
||||
entry_point='fancy_gym.envs.mujoco:Walker2dJumpEnv',
|
||||
max_episode_steps=MAX_EPISODE_STEPS_WALKERJUMP,
|
||||
add_mp_types=[],
|
||||
)
|
||||
|
||||
register( # [MPDone
|
||||
id='fancy/BeerPong-v0',
|
||||
entry_point='fancy_gym.envs.mujoco:BeerPongEnv',
|
||||
mp_wrapper=MPWrapper_Beerpong,
|
||||
max_episode_steps=MAX_EPISODE_STEPS_BEERPONG,
|
||||
add_mp_types=['ProMP'],
|
||||
)
|
||||
|
||||
# Here we use the same reward as in BeerPong-v0, but now consider after the release,
|
||||
# only one time step, i.e. we simulate until the end of the episode
|
||||
register(
|
||||
id='fancy/BeerPongStepBased-v0',
|
||||
entry_point='fancy_gym.envs.mujoco:BeerPongEnvStepBasedEpisodicReward',
|
||||
mp_wrapper=MPWrapper_Beerpong_FixedRelease,
|
||||
max_episode_steps=FIXED_RELEASE_STEP,
|
||||
add_mp_types=['ProMP'],
|
||||
)
|
||||
|
||||
register(
|
||||
id='BeerPong-v0',
|
||||
id='fancy/BeerPongFixedRelease-v0',
|
||||
entry_point='fancy_gym.envs.mujoco:BeerPongEnv',
|
||||
max_episode_steps=MAX_EPISODE_STEPS_BEERPONG,
|
||||
mp_wrapper=MPWrapper_Beerpong_FixedRelease,
|
||||
max_episode_steps=FIXED_RELEASE_STEP,
|
||||
add_mp_types=['ProMP'],
|
||||
)
|
||||
|
||||
# Box pushing environments with different rewards
|
||||
for reward_type in ["Dense", "TemporalSparse", "TemporalSpatialSparse"]:
|
||||
register(
|
||||
id='BoxPushing{}-v0'.format(reward_type),
|
||||
id='fancy/BoxPushing{}-v0'.format(reward_type),
|
||||
entry_point='fancy_gym.envs.mujoco:BoxPushing{}'.format(reward_type),
|
||||
mp_wrapper=mujoco.box_pushing.MPWrapper,
|
||||
max_episode_steps=MAX_EPISODE_STEPS_BOX_PUSHING,
|
||||
)
|
||||
register(
|
||||
id='BoxPushingRandomInit{}-v0'.format(reward_type),
|
||||
id='fancy/BoxPushingRandomInit{}-v0'.format(reward_type),
|
||||
entry_point='fancy_gym.envs.mujoco:BoxPushing{}'.format(reward_type),
|
||||
mp_wrapper=mujoco.box_pushing.MPWrapper,
|
||||
max_episode_steps=MAX_EPISODE_STEPS_BOX_PUSHING,
|
||||
kwargs={"random_init": True}
|
||||
)
|
||||
|
||||
# Here we use the same reward as in BeerPong-v0, but now consider after the release,
|
||||
# only one time step, i.e. we simulate until the end of the episode
|
||||
register(
|
||||
id='BeerPongStepBased-v0',
|
||||
entry_point='fancy_gym.envs.mujoco:BeerPongEnvStepBasedEpisodicReward',
|
||||
max_episode_steps=FIXED_RELEASE_STEP,
|
||||
)
|
||||
upgrade(
|
||||
id='fancy/BoxPushing{}Replan-v0'.format(reward_type),
|
||||
base_id='fancy/BoxPushing{}-v0'.format(reward_type),
|
||||
mp_wrapper=mujoco.box_pushing.ReplanMPWrapper,
|
||||
)
|
||||
|
||||
# Table Tennis environments
|
||||
for ctxt_dim in [2, 4]:
|
||||
register(
|
||||
id='TableTennis{}D-v0'.format(ctxt_dim),
|
||||
id='fancy/TableTennis{}D-v0'.format(ctxt_dim),
|
||||
entry_point='fancy_gym.envs.mujoco:TableTennisEnv',
|
||||
mp_wrapper=MPWrapper_TableTennis,
|
||||
max_episode_steps=MAX_EPISODE_STEPS_TABLE_TENNIS,
|
||||
add_mp_types=['ProMP', 'ProDMP'],
|
||||
kwargs={
|
||||
"ctxt_dim": ctxt_dim,
|
||||
'frame_skip': 4,
|
||||
}
|
||||
)
|
||||
|
||||
register(
|
||||
id='fancy/TableTennis{}DReplan-v0'.format(ctxt_dim),
|
||||
entry_point='fancy_gym.envs.mujoco:TableTennisEnv',
|
||||
mp_wrapper=MPWrapper_TableTennis,
|
||||
max_episode_steps=MAX_EPISODE_STEPS_TABLE_TENNIS,
|
||||
add_mp_types=['ProDMP'],
|
||||
kwargs={
|
||||
"ctxt_dim": ctxt_dim,
|
||||
'frame_skip': 4,
|
||||
@ -265,626 +253,39 @@ for ctxt_dim in [2, 4]:
|
||||
)
|
||||
|
||||
register(
|
||||
id='TableTennisWind-v0',
|
||||
id='fancy/TableTennisWind-v0',
|
||||
entry_point='fancy_gym.envs.mujoco:TableTennisWind',
|
||||
mp_wrapper=MPWrapper_TableTennis_VelObs,
|
||||
add_mp_types=['ProMP', 'ProDMP'],
|
||||
max_episode_steps=MAX_EPISODE_STEPS_TABLE_TENNIS,
|
||||
)
|
||||
|
||||
register(
|
||||
id='TableTennisGoalSwitching-v0',
|
||||
id='fancy/TableTennisWindReplan-v0',
|
||||
entry_point='fancy_gym.envs.mujoco:TableTennisWind',
|
||||
mp_wrapper=MPWrapper_TableTennis_VelObs_Replan,
|
||||
add_mp_types=['ProDMP'],
|
||||
max_episode_steps=MAX_EPISODE_STEPS_TABLE_TENNIS,
|
||||
)
|
||||
|
||||
register(
|
||||
id='fancy/TableTennisGoalSwitching-v0',
|
||||
entry_point='fancy_gym.envs.mujoco:TableTennisGoalSwitching',
|
||||
mp_wrapper=MPWrapper_TableTennis,
|
||||
add_mp_types=['ProMP', 'ProDMP'],
|
||||
max_episode_steps=MAX_EPISODE_STEPS_TABLE_TENNIS,
|
||||
kwargs={
|
||||
'goal_switching_step': 99
|
||||
}
|
||||
)
|
||||
|
||||
|
||||
# Movement Primitive Environments
|
||||
|
||||
## Simple Reacher
|
||||
_versions = ["SimpleReacher-v0", "LongSimpleReacher-v0"]
|
||||
for _v in _versions:
|
||||
_name = _v.split("-")
|
||||
_env_id = f'{_name[0]}DMP-{_name[1]}'
|
||||
kwargs_dict_simple_reacher_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
|
||||
kwargs_dict_simple_reacher_dmp['wrappers'].append(classic_control.simple_reacher.MPWrapper)
|
||||
kwargs_dict_simple_reacher_dmp['controller_kwargs']['p_gains'] = 0.6
|
||||
kwargs_dict_simple_reacher_dmp['controller_kwargs']['d_gains'] = 0.075
|
||||
kwargs_dict_simple_reacher_dmp['trajectory_generator_kwargs']['weight_scale'] = 50
|
||||
kwargs_dict_simple_reacher_dmp['phase_generator_kwargs']['alpha_phase'] = 2
|
||||
kwargs_dict_simple_reacher_dmp['name'] = f"{_v}"
|
||||
register(
|
||||
id=_env_id,
|
||||
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
|
||||
kwargs=kwargs_dict_simple_reacher_dmp
|
||||
)
|
||||
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id)
|
||||
|
||||
_env_id = f'{_name[0]}ProMP-{_name[1]}'
|
||||
kwargs_dict_simple_reacher_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
|
||||
kwargs_dict_simple_reacher_promp['wrappers'].append(classic_control.simple_reacher.MPWrapper)
|
||||
kwargs_dict_simple_reacher_promp['controller_kwargs']['p_gains'] = 0.6
|
||||
kwargs_dict_simple_reacher_promp['controller_kwargs']['d_gains'] = 0.075
|
||||
kwargs_dict_simple_reacher_promp['name'] = _v
|
||||
register(
|
||||
id=_env_id,
|
||||
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
|
||||
kwargs=kwargs_dict_simple_reacher_promp
|
||||
)
|
||||
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
|
||||
|
||||
# Viapoint reacher
|
||||
kwargs_dict_via_point_reacher_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
|
||||
kwargs_dict_via_point_reacher_dmp['wrappers'].append(classic_control.viapoint_reacher.MPWrapper)
|
||||
kwargs_dict_via_point_reacher_dmp['controller_kwargs']['controller_type'] = 'velocity'
|
||||
kwargs_dict_via_point_reacher_dmp['trajectory_generator_kwargs']['weight_scale'] = 50
|
||||
kwargs_dict_via_point_reacher_dmp['phase_generator_kwargs']['alpha_phase'] = 2
|
||||
kwargs_dict_via_point_reacher_dmp['name'] = "ViaPointReacher-v0"
|
||||
register(
|
||||
id='ViaPointReacherDMP-v0',
|
||||
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
|
||||
# max_episode_steps=1,
|
||||
kwargs=kwargs_dict_via_point_reacher_dmp
|
||||
)
|
||||
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append("ViaPointReacherDMP-v0")
|
||||
|
||||
kwargs_dict_via_point_reacher_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
|
||||
kwargs_dict_via_point_reacher_promp['wrappers'].append(classic_control.viapoint_reacher.MPWrapper)
|
||||
kwargs_dict_via_point_reacher_promp['controller_kwargs']['controller_type'] = 'velocity'
|
||||
kwargs_dict_via_point_reacher_promp['name'] = "ViaPointReacher-v0"
|
||||
register(
|
||||
id="ViaPointReacherProMP-v0",
|
||||
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
|
||||
kwargs=kwargs_dict_via_point_reacher_promp
|
||||
)
|
||||
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append("ViaPointReacherProMP-v0")
|
||||
|
||||
## Hole Reacher
|
||||
_versions = ["HoleReacher-v0"]
|
||||
for _v in _versions:
|
||||
_name = _v.split("-")
|
||||
_env_id = f'{_name[0]}DMP-{_name[1]}'
|
||||
kwargs_dict_hole_reacher_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
|
||||
kwargs_dict_hole_reacher_dmp['wrappers'].append(classic_control.hole_reacher.MPWrapper)
|
||||
kwargs_dict_hole_reacher_dmp['controller_kwargs']['controller_type'] = 'velocity'
|
||||
# TODO: Before it was weight scale 50 and goal scale 0.1. We now only have weight scale and thus set it to 500. Check
|
||||
kwargs_dict_hole_reacher_dmp['trajectory_generator_kwargs']['weight_scale'] = 500
|
||||
kwargs_dict_hole_reacher_dmp['phase_generator_kwargs']['alpha_phase'] = 2.5
|
||||
kwargs_dict_hole_reacher_dmp['name'] = _v
|
||||
register(
|
||||
id=_env_id,
|
||||
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
|
||||
# max_episode_steps=1,
|
||||
kwargs=kwargs_dict_hole_reacher_dmp
|
||||
)
|
||||
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id)
|
||||
|
||||
_env_id = f'{_name[0]}ProMP-{_name[1]}'
|
||||
kwargs_dict_hole_reacher_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
|
||||
kwargs_dict_hole_reacher_promp['wrappers'].append(classic_control.hole_reacher.MPWrapper)
|
||||
kwargs_dict_hole_reacher_promp['trajectory_generator_kwargs']['weight_scale'] = 2
|
||||
kwargs_dict_hole_reacher_promp['controller_kwargs']['controller_type'] = 'velocity'
|
||||
kwargs_dict_hole_reacher_promp['name'] = f"{_v}"
|
||||
register(
|
||||
id=_env_id,
|
||||
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
|
||||
kwargs=kwargs_dict_hole_reacher_promp
|
||||
)
|
||||
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
|
||||
|
||||
## ReacherNd
|
||||
_versions = ["Reacher5d-v0", "Reacher7d-v0", "Reacher5dSparse-v0", "Reacher7dSparse-v0"]
|
||||
for _v in _versions:
|
||||
_name = _v.split("-")
|
||||
_env_id = f'{_name[0]}DMP-{_name[1]}'
|
||||
kwargs_dict_reacher_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
|
||||
kwargs_dict_reacher_dmp['wrappers'].append(mujoco.reacher.MPWrapper)
|
||||
kwargs_dict_reacher_dmp['phase_generator_kwargs']['alpha_phase'] = 2
|
||||
kwargs_dict_reacher_dmp['name'] = _v
|
||||
register(
|
||||
id=_env_id,
|
||||
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
|
||||
# max_episode_steps=1,
|
||||
kwargs=kwargs_dict_reacher_dmp
|
||||
)
|
||||
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id)
|
||||
|
||||
_env_id = f'{_name[0]}ProMP-{_name[1]}'
|
||||
kwargs_dict_reacher_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
|
||||
kwargs_dict_reacher_promp['wrappers'].append(mujoco.reacher.MPWrapper)
|
||||
kwargs_dict_reacher_promp['name'] = _v
|
||||
register(
|
||||
id=_env_id,
|
||||
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
|
||||
kwargs=kwargs_dict_reacher_promp
|
||||
)
|
||||
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
|
||||
########################################################################################################################
|
||||
## Beerpong ProMP
|
||||
_versions = ['BeerPong-v0']
|
||||
for _v in _versions:
|
||||
_name = _v.split("-")
|
||||
_env_id = f'{_name[0]}ProMP-{_name[1]}'
|
||||
kwargs_dict_bp_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
|
||||
kwargs_dict_bp_promp['wrappers'].append(mujoco.beerpong.MPWrapper)
|
||||
kwargs_dict_bp_promp['phase_generator_kwargs']['learn_tau'] = True
|
||||
kwargs_dict_bp_promp['controller_kwargs']['p_gains'] = np.array([1.5, 5, 2.55, 3, 2., 2, 1.25])
|
||||
kwargs_dict_bp_promp['controller_kwargs']['d_gains'] = np.array([0.02333333, 0.1, 0.0625, 0.08, 0.03, 0.03, 0.0125])
|
||||
kwargs_dict_bp_promp['basis_generator_kwargs']['num_basis'] = 2
|
||||
kwargs_dict_bp_promp['basis_generator_kwargs']['num_basis_zero_start'] = 2
|
||||
kwargs_dict_bp_promp['name'] = _v
|
||||
register(
|
||||
id=_env_id,
|
||||
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
|
||||
kwargs=kwargs_dict_bp_promp
|
||||
)
|
||||
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
|
||||
|
||||
### BP with Fixed release
|
||||
_versions = ["BeerPongStepBased-v0", 'BeerPong-v0']
|
||||
for _v in _versions:
|
||||
if _v != 'BeerPong-v0':
|
||||
_name = _v.split("-")
|
||||
_env_id = f'{_name[0]}ProMP-{_name[1]}'
|
||||
else:
|
||||
_env_id = 'BeerPongFixedReleaseProMP-v0'
|
||||
kwargs_dict_bp_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
|
||||
kwargs_dict_bp_promp['wrappers'].append(mujoco.beerpong.MPWrapper)
|
||||
kwargs_dict_bp_promp['phase_generator_kwargs']['tau'] = 0.62
|
||||
kwargs_dict_bp_promp['controller_kwargs']['p_gains'] = np.array([1.5, 5, 2.55, 3, 2., 2, 1.25])
|
||||
kwargs_dict_bp_promp['controller_kwargs']['d_gains'] = np.array([0.02333333, 0.1, 0.0625, 0.08, 0.03, 0.03, 0.0125])
|
||||
kwargs_dict_bp_promp['basis_generator_kwargs']['num_basis'] = 2
|
||||
kwargs_dict_bp_promp['basis_generator_kwargs']['num_basis_zero_start'] = 2
|
||||
kwargs_dict_bp_promp['name'] = _v
|
||||
register(
|
||||
id=_env_id,
|
||||
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
|
||||
kwargs=kwargs_dict_bp_promp
|
||||
)
|
||||
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
|
||||
########################################################################################################################
|
||||
|
||||
## Table Tennis needs to be fixed according to Zhou's implementation
|
||||
|
||||
# TODO: Add later when finished
|
||||
# ########################################################################################################################
|
||||
#
|
||||
# ## AntJump
|
||||
# _versions = ['AntJump-v0']
|
||||
# for _v in _versions:
|
||||
# _name = _v.split("-")
|
||||
# _env_id = f'{_name[0]}ProMP-{_name[1]}'
|
||||
# kwargs_dict_ant_jump_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
|
||||
# kwargs_dict_ant_jump_promp['wrappers'].append(mujoco.ant_jump.MPWrapper)
|
||||
# kwargs_dict_ant_jump_promp['name'] = _v
|
||||
# register(
|
||||
# id=_env_id,
|
||||
# entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
|
||||
# kwargs=kwargs_dict_ant_jump_promp
|
||||
# )
|
||||
# ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
|
||||
#
|
||||
# ########################################################################################################################
|
||||
#
|
||||
# ## HalfCheetahJump
|
||||
# _versions = ['HalfCheetahJump-v0']
|
||||
# for _v in _versions:
|
||||
# _name = _v.split("-")
|
||||
# _env_id = f'{_name[0]}ProMP-{_name[1]}'
|
||||
# kwargs_dict_halfcheetah_jump_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
|
||||
# kwargs_dict_halfcheetah_jump_promp['wrappers'].append(mujoco.half_cheetah_jump.MPWrapper)
|
||||
# kwargs_dict_halfcheetah_jump_promp['name'] = _v
|
||||
# register(
|
||||
# id=_env_id,
|
||||
# entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
|
||||
# kwargs=kwargs_dict_halfcheetah_jump_promp
|
||||
# )
|
||||
# ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
|
||||
#
|
||||
# ########################################################################################################################
|
||||
|
||||
|
||||
## HopperJump
|
||||
_versions = ['HopperJump-v0', 'HopperJumpSparse-v0',
|
||||
# 'HopperJumpOnBox-v0', 'HopperThrow-v0', 'HopperThrowInBasket-v0'
|
||||
]
|
||||
# TODO: Check if all environments work with the same MPWrapper
|
||||
for _v in _versions:
|
||||
_name = _v.split("-")
|
||||
_env_id = f'{_name[0]}ProMP-{_name[1]}'
|
||||
kwargs_dict_hopper_jump_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
|
||||
kwargs_dict_hopper_jump_promp['wrappers'].append(mujoco.hopper_jump.MPWrapper)
|
||||
kwargs_dict_hopper_jump_promp['name'] = _v
|
||||
register(
|
||||
id=_env_id,
|
||||
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
|
||||
kwargs=kwargs_dict_hopper_jump_promp
|
||||
)
|
||||
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
|
||||
|
||||
# ########################################################################################################################
|
||||
|
||||
## Box Pushing
|
||||
_versions = ['BoxPushingDense-v0', 'BoxPushingTemporalSparse-v0', 'BoxPushingTemporalSpatialSparse-v0',
|
||||
'BoxPushingRandomInitDense-v0', 'BoxPushingRandomInitTemporalSparse-v0',
|
||||
'BoxPushingRandomInitTemporalSpatialSparse-v0']
|
||||
for _v in _versions:
|
||||
_name = _v.split("-")
|
||||
_env_id = f'{_name[0]}ProMP-{_name[1]}'
|
||||
kwargs_dict_box_pushing_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
|
||||
kwargs_dict_box_pushing_promp['wrappers'].append(mujoco.box_pushing.MPWrapper)
|
||||
kwargs_dict_box_pushing_promp['name'] = _v
|
||||
kwargs_dict_box_pushing_promp['controller_kwargs']['p_gains'] = 0.01 * np.array([120., 120., 120., 120., 50., 30., 10.])
|
||||
kwargs_dict_box_pushing_promp['controller_kwargs']['d_gains'] = 0.01 * np.array([10., 10., 10., 10., 6., 5., 3.])
|
||||
kwargs_dict_box_pushing_promp['basis_generator_kwargs']['basis_bandwidth_factor'] = 2 # 3.5, 4 to try
|
||||
|
||||
register(
|
||||
id=_env_id,
|
||||
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
|
||||
kwargs=kwargs_dict_box_pushing_promp
|
||||
)
|
||||
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
|
||||
|
||||
for _v in _versions:
|
||||
_name = _v.split("-")
|
||||
_env_id = f'{_name[0]}ProDMP-{_name[1]}'
|
||||
kwargs_dict_box_pushing_prodmp = deepcopy(DEFAULT_BB_DICT_ProDMP)
|
||||
kwargs_dict_box_pushing_prodmp['wrappers'].append(mujoco.box_pushing.MPWrapper)
|
||||
kwargs_dict_box_pushing_prodmp['name'] = _v
|
||||
kwargs_dict_box_pushing_prodmp['controller_kwargs']['p_gains'] = 0.01 * np.array([120., 120., 120., 120., 50., 30., 10.])
|
||||
kwargs_dict_box_pushing_prodmp['controller_kwargs']['d_gains'] = 0.01 * np.array([10., 10., 10., 10., 6., 5., 3.])
|
||||
kwargs_dict_box_pushing_prodmp['trajectory_generator_kwargs']['weights_scale'] = 0.3
|
||||
kwargs_dict_box_pushing_prodmp['trajectory_generator_kwargs']['goal_scale'] = 0.3
|
||||
kwargs_dict_box_pushing_prodmp['trajectory_generator_kwargs']['auto_scale_basis'] = True
|
||||
kwargs_dict_box_pushing_prodmp['basis_generator_kwargs']['num_basis'] = 4
|
||||
kwargs_dict_box_pushing_prodmp['basis_generator_kwargs']['basis_bandwidth_factor'] = 3
|
||||
kwargs_dict_box_pushing_prodmp['phase_generator_kwargs']['alpha_phase'] = 3
|
||||
register(
|
||||
id=_env_id,
|
||||
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
|
||||
kwargs=kwargs_dict_box_pushing_prodmp
|
||||
)
|
||||
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProDMP"].append(_env_id)
|
||||
|
||||
for _v in _versions:
|
||||
_name = _v.split("-")
|
||||
_env_id = f'{_name[0]}ReplanProDMP-{_name[1]}'
|
||||
kwargs_dict_box_pushing_prodmp = deepcopy(DEFAULT_BB_DICT_ProDMP)
|
||||
kwargs_dict_box_pushing_prodmp['wrappers'].append(mujoco.box_pushing.MPWrapper)
|
||||
kwargs_dict_box_pushing_prodmp['name'] = _v
|
||||
kwargs_dict_box_pushing_prodmp['controller_kwargs']['p_gains'] = 0.01 * np.array([120., 120., 120., 120., 50., 30., 10.])
|
||||
kwargs_dict_box_pushing_prodmp['controller_kwargs']['d_gains'] = 0.01 * np.array([10., 10., 10., 10., 6., 5., 3.])
|
||||
kwargs_dict_box_pushing_prodmp['trajectory_generator_kwargs']['weights_scale'] = 0.3
|
||||
kwargs_dict_box_pushing_prodmp['trajectory_generator_kwargs']['goal_scale'] = 0.3
|
||||
kwargs_dict_box_pushing_prodmp['trajectory_generator_kwargs']['auto_scale_basis'] = True
|
||||
kwargs_dict_box_pushing_prodmp['basis_generator_kwargs']['num_basis'] = 4
|
||||
kwargs_dict_box_pushing_prodmp['basis_generator_kwargs']['basis_bandwidth_factor'] = 3
|
||||
kwargs_dict_box_pushing_prodmp['phase_generator_kwargs']['alpha_phase'] = 3
|
||||
kwargs_dict_box_pushing_prodmp['black_box_kwargs']['max_planning_times'] = 4
|
||||
kwargs_dict_box_pushing_prodmp['black_box_kwargs']['replanning_schedule'] = lambda pos, vel, obs, action, t : t % 25 == 0
|
||||
kwargs_dict_box_pushing_prodmp['black_box_kwargs']['condition_on_desired'] = True
|
||||
register(
|
||||
id=_env_id,
|
||||
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
|
||||
kwargs=kwargs_dict_box_pushing_prodmp
|
||||
)
|
||||
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProDMP"].append(_env_id)
|
||||
|
||||
## Table Tennis
|
||||
_versions = ['TableTennis2D-v0', 'TableTennis4D-v0', 'TableTennisWind-v0', 'TableTennisGoalSwitching-v0']
|
||||
for _v in _versions:
|
||||
_name = _v.split("-")
|
||||
_env_id = f'{_name[0]}ProMP-{_name[1]}'
|
||||
kwargs_dict_tt_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
|
||||
if _v == 'TableTennisWind-v0':
|
||||
kwargs_dict_tt_promp['wrappers'].append(mujoco.table_tennis.TTVelObs_MPWrapper)
|
||||
else:
|
||||
kwargs_dict_tt_promp['wrappers'].append(mujoco.table_tennis.TT_MPWrapper)
|
||||
kwargs_dict_tt_promp['name'] = _v
|
||||
kwargs_dict_tt_promp['controller_kwargs']['p_gains'] = 0.5 * np.array([1.0, 4.0, 2.0, 4.0, 1.0, 4.0, 1.0])
|
||||
kwargs_dict_tt_promp['controller_kwargs']['d_gains'] = 0.5 * np.array([0.1, 0.4, 0.2, 0.4, 0.1, 0.4, 0.1])
|
||||
kwargs_dict_tt_promp['phase_generator_kwargs']['learn_tau'] = True
|
||||
kwargs_dict_tt_promp['phase_generator_kwargs']['learn_delay'] = True
|
||||
kwargs_dict_tt_promp['phase_generator_kwargs']['tau_bound'] = [0.8, 1.5]
|
||||
kwargs_dict_tt_promp['phase_generator_kwargs']['delay_bound'] = [0.05, 0.15]
|
||||
kwargs_dict_tt_promp['basis_generator_kwargs']['num_basis'] = 3
|
||||
kwargs_dict_tt_promp['basis_generator_kwargs']['num_basis_zero_start'] = 1
|
||||
kwargs_dict_tt_promp['basis_generator_kwargs']['num_basis_zero_goal'] = 1
|
||||
kwargs_dict_tt_promp['black_box_kwargs']['verbose'] = 2
|
||||
register(
|
||||
id=_env_id,
|
||||
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
|
||||
kwargs=kwargs_dict_tt_promp
|
||||
)
|
||||
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
|
||||
|
||||
for _v in _versions:
|
||||
_name = _v.split("-")
|
||||
_env_id = f'{_name[0]}ProDMP-{_name[1]}'
|
||||
kwargs_dict_tt_prodmp = deepcopy(DEFAULT_BB_DICT_ProDMP)
|
||||
if _v == 'TableTennisWind-v0':
|
||||
kwargs_dict_tt_prodmp['wrappers'].append(mujoco.table_tennis.TTVelObs_MPWrapper)
|
||||
else:
|
||||
kwargs_dict_tt_prodmp['wrappers'].append(mujoco.table_tennis.TT_MPWrapper)
|
||||
kwargs_dict_tt_prodmp['name'] = _v
|
||||
kwargs_dict_tt_prodmp['controller_kwargs']['p_gains'] = 0.5 * np.array([1.0, 4.0, 2.0, 4.0, 1.0, 4.0, 1.0])
|
||||
kwargs_dict_tt_prodmp['controller_kwargs']['d_gains'] = 0.5 * np.array([0.1, 0.4, 0.2, 0.4, 0.1, 0.4, 0.1])
|
||||
kwargs_dict_tt_prodmp['trajectory_generator_kwargs']['weights_scale'] = 0.7
|
||||
kwargs_dict_tt_prodmp['trajectory_generator_kwargs']['auto_scale_basis'] = True
|
||||
kwargs_dict_tt_prodmp['trajectory_generator_kwargs']['relative_goal'] = True
|
||||
kwargs_dict_tt_prodmp['trajectory_generator_kwargs']['disable_goal'] = True
|
||||
kwargs_dict_tt_prodmp['phase_generator_kwargs']['tau_bound'] = [0.8, 1.5]
|
||||
kwargs_dict_tt_prodmp['phase_generator_kwargs']['delay_bound'] = [0.05, 0.15]
|
||||
kwargs_dict_tt_prodmp['phase_generator_kwargs']['learn_tau'] = True
|
||||
kwargs_dict_tt_prodmp['phase_generator_kwargs']['learn_delay'] = True
|
||||
kwargs_dict_tt_prodmp['basis_generator_kwargs']['num_basis'] = 3
|
||||
kwargs_dict_tt_prodmp['basis_generator_kwargs']['alpha'] = 25.
|
||||
kwargs_dict_tt_prodmp['basis_generator_kwargs']['basis_bandwidth_factor'] = 3
|
||||
kwargs_dict_tt_prodmp['phase_generator_kwargs']['alpha_phase'] = 3
|
||||
register(
|
||||
id=_env_id,
|
||||
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
|
||||
kwargs=kwargs_dict_tt_prodmp
|
||||
)
|
||||
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProDMP"].append(_env_id)
|
||||
|
||||
for _v in _versions:
|
||||
_name = _v.split("-")
|
||||
_env_id = f'{_name[0]}ReplanProDMP-{_name[1]}'
|
||||
kwargs_dict_tt_prodmp = deepcopy(DEFAULT_BB_DICT_ProDMP)
|
||||
if _v == 'TableTennisWind-v0':
|
||||
kwargs_dict_tt_prodmp['wrappers'].append(mujoco.table_tennis.TTVelObs_MPWrapper)
|
||||
else:
|
||||
kwargs_dict_tt_prodmp['wrappers'].append(mujoco.table_tennis.TT_MPWrapper)
|
||||
kwargs_dict_tt_prodmp['name'] = _v
|
||||
kwargs_dict_tt_prodmp['controller_kwargs']['p_gains'] = 0.5 * np.array([1.0, 4.0, 2.0, 4.0, 1.0, 4.0, 1.0])
|
||||
kwargs_dict_tt_prodmp['controller_kwargs']['d_gains'] = 0.5 * np.array([0.1, 0.4, 0.2, 0.4, 0.1, 0.4, 0.1])
|
||||
kwargs_dict_tt_prodmp['trajectory_generator_kwargs']['auto_scale_basis'] = False
|
||||
kwargs_dict_tt_prodmp['trajectory_generator_kwargs']['goal_offset'] = 1.0
|
||||
kwargs_dict_tt_prodmp['phase_generator_kwargs']['tau_bound'] = [0.8, 1.5]
|
||||
kwargs_dict_tt_prodmp['phase_generator_kwargs']['delay_bound'] = [0.05, 0.15]
|
||||
kwargs_dict_tt_prodmp['phase_generator_kwargs']['learn_tau'] = True
|
||||
kwargs_dict_tt_prodmp['phase_generator_kwargs']['learn_delay'] = True
|
||||
kwargs_dict_tt_prodmp['basis_generator_kwargs']['num_basis'] = 2
|
||||
kwargs_dict_tt_prodmp['basis_generator_kwargs']['alpha'] = 25.
|
||||
kwargs_dict_tt_prodmp['basis_generator_kwargs']['basis_bandwidth_factor'] = 3
|
||||
kwargs_dict_tt_prodmp['phase_generator_kwargs']['alpha_phase'] = 3
|
||||
kwargs_dict_tt_prodmp['black_box_kwargs']['max_planning_times'] = 3
|
||||
kwargs_dict_tt_prodmp['black_box_kwargs']['replanning_schedule'] = lambda pos, vel, obs, action, t : t % 50 == 0
|
||||
register(
|
||||
id=_env_id,
|
||||
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
|
||||
kwargs=kwargs_dict_tt_prodmp
|
||||
)
|
||||
ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProDMP"].append(_env_id)
|
||||
#
|
||||
# ## Walker2DJump
|
||||
# _versions = ['Walker2DJump-v0']
|
||||
# for _v in _versions:
|
||||
# _name = _v.split("-")
|
||||
# _env_id = f'{_name[0]}ProMP-{_name[1]}'
|
||||
# kwargs_dict_walker2d_jump_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
|
||||
# kwargs_dict_walker2d_jump_promp['wrappers'].append(mujoco.walker_2d_jump.MPWrapper)
|
||||
# kwargs_dict_walker2d_jump_promp['name'] = _v
|
||||
# register(
|
||||
# id=_env_id,
|
||||
# entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
|
||||
# kwargs=kwargs_dict_walker2d_jump_promp
|
||||
# )
|
||||
# ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
|
||||
|
||||
### Deprecated, we will not provide non-random starts anymore
|
||||
"""
|
||||
register(
|
||||
id='SimpleReacher-v1',
|
||||
entry_point='fancy_gym.envs.classic_control:SimpleReacherEnv',
|
||||
max_episode_steps=200,
|
||||
id='fancy/TableTennisGoalSwitchingReplan-v0',
|
||||
entry_point='fancy_gym.envs.mujoco:TableTennisGoalSwitching',
|
||||
mp_wrapper=MPWrapper_TableTennis_Replan,
|
||||
add_mp_types=['ProDMP'],
|
||||
max_episode_steps=MAX_EPISODE_STEPS_TABLE_TENNIS,
|
||||
kwargs={
|
||||
"n_links": 2,
|
||||
"random_start": False
|
||||
'goal_switching_step': 99
|
||||
}
|
||||
)
|
||||
|
||||
register(
|
||||
id='LongSimpleReacher-v1',
|
||||
entry_point='fancy_gym.envs.classic_control:SimpleReacherEnv',
|
||||
max_episode_steps=200,
|
||||
kwargs={
|
||||
"n_links": 5,
|
||||
"random_start": False
|
||||
}
|
||||
)
|
||||
register(
|
||||
id='HoleReacher-v1',
|
||||
entry_point='fancy_gym.envs.classic_control:HoleReacherEnv',
|
||||
max_episode_steps=200,
|
||||
kwargs={
|
||||
"n_links": 5,
|
||||
"random_start": False,
|
||||
"allow_self_collision": False,
|
||||
"allow_wall_collision": False,
|
||||
"hole_width": 0.25,
|
||||
"hole_depth": 1,
|
||||
"hole_x": None,
|
||||
"collision_penalty": 100,
|
||||
}
|
||||
)
|
||||
register(
|
||||
id='HoleReacher-v2',
|
||||
entry_point='fancy_gym.envs.classic_control:HoleReacherEnv',
|
||||
max_episode_steps=200,
|
||||
kwargs={
|
||||
"n_links": 5,
|
||||
"random_start": False,
|
||||
"allow_self_collision": False,
|
||||
"allow_wall_collision": False,
|
||||
"hole_width": 0.25,
|
||||
"hole_depth": 1,
|
||||
"hole_x": 2,
|
||||
"collision_penalty": 1,
|
||||
}
|
||||
)
|
||||
|
||||
# CtxtFree are v0, Contextual are v1
|
||||
register(
|
||||
id='AntJump-v0',
|
||||
entry_point='fancy_gym.envs.mujoco:AntJumpEnv',
|
||||
max_episode_steps=MAX_EPISODE_STEPS_ANTJUMP,
|
||||
kwargs={
|
||||
"max_episode_steps": MAX_EPISODE_STEPS_ANTJUMP,
|
||||
"context": False
|
||||
}
|
||||
)
|
||||
# CtxtFree are v0, Contextual are v1
|
||||
register(
|
||||
id='HalfCheetahJump-v0',
|
||||
entry_point='fancy_gym.envs.mujoco:HalfCheetahJumpEnv',
|
||||
max_episode_steps=MAX_EPISODE_STEPS_HALFCHEETAHJUMP,
|
||||
kwargs={
|
||||
"max_episode_steps": MAX_EPISODE_STEPS_HALFCHEETAHJUMP,
|
||||
"context": False
|
||||
}
|
||||
)
|
||||
register(
|
||||
id='HopperJump-v0',
|
||||
entry_point='fancy_gym.envs.mujoco:HopperJumpEnv',
|
||||
max_episode_steps=MAX_EPISODE_STEPS_HOPPERJUMP,
|
||||
kwargs={
|
||||
"max_episode_steps": MAX_EPISODE_STEPS_HOPPERJUMP,
|
||||
"context": False,
|
||||
"healthy_reward": 1.0
|
||||
}
|
||||
)
|
||||
|
||||
"""
|
||||
|
||||
### Deprecated, used for CoRL paper
|
||||
"""
|
||||
_vs = np.arange(101).tolist() + [1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3, 1e-2, 5e-2, 1e-1, 5e-1]
|
||||
for i in _vs:
|
||||
_env_id = f'ALRReacher{i}-v0'
|
||||
register(
|
||||
id=_env_id,
|
||||
entry_point='fancy_gym.envs.mujoco:ReacherEnv',
|
||||
max_episode_steps=200,
|
||||
kwargs={
|
||||
"steps_before_reward": 0,
|
||||
"n_links": 5,
|
||||
"balance": False,
|
||||
'_ctrl_cost_weight': i
|
||||
}
|
||||
)
|
||||
|
||||
_env_id = f'ALRReacherSparse{i}-v0'
|
||||
register(
|
||||
id=_env_id,
|
||||
entry_point='fancy_gym.envs.mujoco:ReacherEnv',
|
||||
max_episode_steps=200,
|
||||
kwargs={
|
||||
"steps_before_reward": 200,
|
||||
"n_links": 5,
|
||||
"balance": False,
|
||||
'_ctrl_cost_weight': i
|
||||
}
|
||||
)
|
||||
_vs = np.arange(101).tolist() + [1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3, 1e-2, 5e-2, 1e-1, 5e-1]
|
||||
for i in _vs:
|
||||
_env_id = f'ALRReacher{i}ProMP-v0'
|
||||
register(
|
||||
id=_env_id,
|
||||
entry_point='fancy_gym.utils.make_env_helpers:make_promp_env_helper',
|
||||
kwargs={
|
||||
"name": f"{_env_id.replace('ProMP', '')}",
|
||||
"wrappers": [mujoco.reacher.MPWrapper],
|
||||
"mp_kwargs": {
|
||||
"num_dof": 5,
|
||||
"num_basis": 5,
|
||||
"duration": 4,
|
||||
"policy_type": "motor",
|
||||
# "weights_scale": 5,
|
||||
"n_zero_basis": 1,
|
||||
"zero_start": True,
|
||||
"policy_kwargs": {
|
||||
"p_gains": 1,
|
||||
"d_gains": 0.1
|
||||
}
|
||||
}
|
||||
}
|
||||
)
|
||||
|
||||
_env_id = f'ALRReacherSparse{i}ProMP-v0'
|
||||
register(
|
||||
id=_env_id,
|
||||
entry_point='fancy_gym.utils.make_env_helpers:make_promp_env_helper',
|
||||
kwargs={
|
||||
"name": f"{_env_id.replace('ProMP', '')}",
|
||||
"wrappers": [mujoco.reacher.MPWrapper],
|
||||
"mp_kwargs": {
|
||||
"num_dof": 5,
|
||||
"num_basis": 5,
|
||||
"duration": 4,
|
||||
"policy_type": "motor",
|
||||
# "weights_scale": 5,
|
||||
"n_zero_basis": 1,
|
||||
"zero_start": True,
|
||||
"policy_kwargs": {
|
||||
"p_gains": 1,
|
||||
"d_gains": 0.1
|
||||
}
|
||||
}
|
||||
}
|
||||
)
|
||||
|
||||
register(
|
||||
id='HopperJumpOnBox-v0',
|
||||
entry_point='fancy_gym.envs.mujoco:HopperJumpOnBoxEnv',
|
||||
max_episode_steps=MAX_EPISODE_STEPS_HOPPERJUMPONBOX,
|
||||
kwargs={
|
||||
"max_episode_steps": MAX_EPISODE_STEPS_HOPPERJUMPONBOX,
|
||||
"context": False
|
||||
}
|
||||
)
|
||||
register(
|
||||
id='HopperThrow-v0',
|
||||
entry_point='fancy_gym.envs.mujoco:HopperThrowEnv',
|
||||
max_episode_steps=MAX_EPISODE_STEPS_HOPPERTHROW,
|
||||
kwargs={
|
||||
"max_episode_steps": MAX_EPISODE_STEPS_HOPPERTHROW,
|
||||
"context": False
|
||||
}
|
||||
)
|
||||
register(
|
||||
id='HopperThrowInBasket-v0',
|
||||
entry_point='fancy_gym.envs.mujoco:HopperThrowInBasketEnv',
|
||||
max_episode_steps=MAX_EPISODE_STEPS_HOPPERTHROWINBASKET,
|
||||
kwargs={
|
||||
"max_episode_steps": MAX_EPISODE_STEPS_HOPPERTHROWINBASKET,
|
||||
"context": False
|
||||
}
|
||||
)
|
||||
register(
|
||||
id='Walker2DJump-v0',
|
||||
entry_point='fancy_gym.envs.mujoco:Walker2dJumpEnv',
|
||||
max_episode_steps=MAX_EPISODE_STEPS_WALKERJUMP,
|
||||
kwargs={
|
||||
"max_episode_steps": MAX_EPISODE_STEPS_WALKERJUMP,
|
||||
"context": False
|
||||
}
|
||||
)
|
||||
register(id='TableTennis2DCtxt-v1',
|
||||
entry_point='fancy_gym.envs.mujoco:TTEnvGym',
|
||||
max_episode_steps=MAX_EPISODE_STEPS,
|
||||
kwargs={'ctxt_dim': 2, 'fixed_goal': True})
|
||||
|
||||
register(
|
||||
id='BeerPong-v0',
|
||||
entry_point='fancy_gym.envs.mujoco:BeerBongEnv',
|
||||
max_episode_steps=300,
|
||||
kwargs={
|
||||
"rndm_goal": False,
|
||||
"cup_goal_pos": [0.1, -2.0],
|
||||
"frame_skip": 2
|
||||
}
|
||||
)
|
||||
"""
|
||||
|
@ -1,18 +1,20 @@
|
||||
### Classic Control
|
||||
|
||||
## Step-based Environments
|
||||
|Name| Description|Horizon|Action Dimension|Observation Dimension
|
||||
|---|---|---|---|---|
|
||||
|`SimpleReacher-v0`| Simple reaching task (2 links) without any physics simulation. Provides no reward until 150 time steps. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory.| 200 | 2 | 9
|
||||
|`LongSimpleReacher-v0`| Simple reaching task (5 links) without any physics simulation. Provides no reward until 150 time steps. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory.| 200 | 5 | 18
|
||||
|`ViaPointReacher-v0`| Simple reaching task leveraging a via point, which supports self collision detection. Provides a reward only at 100 and 199 for reaching the viapoint and goal point, respectively.| 200 | 5 | 18
|
||||
|`HoleReacher-v0`| 5 link reaching task where the end-effector needs to reach into a narrow hole without colliding with itself or walls | 200 | 5 | 18
|
||||
|
||||
| Name | Description | Horizon | Action Dimension | Observation Dimension |
|
||||
| ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------- | ---------------- | --------------------- |
|
||||
| `fancy/SimpleReacher-v0` | Simple reaching task (2 links) without any physics simulation. Provides no reward until 150 time steps. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory. | 200 | 2 | 9 |
|
||||
| `fancy/LongSimpleReacher-v0` | Simple reaching task (5 links) without any physics simulation. Provides no reward until 150 time steps. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory. | 200 | 5 | 18 |
|
||||
| `fancy/ViaPointReacher-v0` | Simple reaching task leveraging a via point, which supports self collision detection. Provides a reward only at 100 and 199 for reaching the viapoint and goal point, respectively. | 200 | 5 | 18 |
|
||||
| `fancy/HoleReacher-v0` | 5 link reaching task where the end-effector needs to reach into a narrow hole without colliding with itself or walls | 200 | 5 | 18 |
|
||||
|
||||
## MP Environments
|
||||
|Name| Description|Horizon|Action Dimension|Context Dimension
|
||||
|---|---|---|---|---|
|
||||
|`ViaPointReacherDMP-v0`| A DMP provides a trajectory for the `ViaPointReacher-v0` task. | 200 | 25
|
||||
|`HoleReacherFixedGoalDMP-v0`| A DMP provides a trajectory for the `HoleReacher-v0` task with a fixed goal attractor. | 200 | 25
|
||||
|`HoleReacherDMP-v0`| A DMP provides a trajectory for the `HoleReacher-v0` task. The goal attractor needs to be learned. | 200 | 30
|
||||
|
||||
[//]: |`HoleReacherProMPP-v0`|
|
||||
| Name | Description | Horizon | Action Dimension | Context Dimension |
|
||||
| ----------------------------------- | -------------------------------------------------------------------------------------------------------- | ------- | ---------------- | ----------------- |
|
||||
| `fancy_DMP/ViaPointReacher-v0` | A DMP provides a trajectory for the `fancy/ViaPointReacher-v0` task. | 200 | 25 |
|
||||
| `fancy_DMP/HoleReacherFixedGoal-v0` | A DMP provides a trajectory for the `fancy/HoleReacher-v0` task with a fixed goal attractor. | 200 | 25 |
|
||||
| `fancy_DMP/HoleReacher-v0` | A DMP provides a trajectory for the `fancy/HoleReacher-v0` task. The goal attractor needs to be learned. | 200 | 30 |
|
||||
|
||||
[//]: |`fancy/HoleReacherProMPP-v0`|
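For scripts that still use the old names, the renaming visible in the classic control tables above could be sketched as a small helper. This is only an illustration: `to_namespaced_id` is a hypothetical function that is not part of `fancy_gym`, and it assumes that ProMP and ProDMP variants follow the same `fancy_<MP>/` pattern that the tables show for DMP.

```python
import re

# Hypothetical helper (not part of fancy_gym): maps the pre-refactor IDs from
# the classic control tables to the namespaced IDs introduced by this change.
# Longer suffixes come first so that "ProDMP" is not mistaken for "DMP".
_MP_SUFFIXES = ("ProDMP", "ProMP", "DMP")


def to_namespaced_id(old_id: str) -> str:
    """Translate e.g. 'HoleReacherDMP-v0' into 'fancy_DMP/HoleReacher-v0'."""
    match = re.fullmatch(r"(?P<name>.+)-v(?P<version>\d+)", old_id)
    if match is None:
        raise ValueError(f"Unexpected environment ID: {old_id!r}")
    name, version = match["name"], match["version"]
    for suffix in _MP_SUFFIXES:
        if name.endswith(suffix):
            # Movement-primitive variants move the MP type into the namespace.
            return f"fancy_{suffix}/{name[:-len(suffix)]}-v{version}"
    # Step-based environments only gain the 'fancy/' namespace.
    return f"fancy/{name}-v{version}"
```

The helper does not cover renames that also change the base name, such as dropping a prefix, so those IDs would still need to be looked up by hand.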
|
||||
|
@ -1,10 +1,10 @@
|
||||
from typing import Union, Tuple, Optional
|
||||
from typing import Union, Tuple, Optional, Any, Dict
|
||||
|
||||
import gym
|
||||
import gymnasium as gym
|
||||
import numpy as np
|
||||
from gym import spaces
|
||||
from gym.core import ObsType
|
||||
from gym.utils import seeding
|
||||
from gymnasium import spaces
|
||||
from gymnasium.core import ObsType
|
||||
from gymnasium.utils import seeding
|
||||
|
||||
from fancy_gym.envs.classic_control.utils import intersect
|
||||
|
||||
@ -55,7 +55,6 @@ class BaseReacherEnv(gym.Env):
|
||||
self.fig = None
|
||||
|
||||
self._steps = 0
|
||||
self.seed()
|
||||
|
||||
@property
|
||||
def dt(self) -> Union[float, int]:
|
||||
@ -69,10 +68,15 @@ class BaseReacherEnv(gym.Env):
|
||||
def current_vel(self):
|
||||
return self._angle_velocity.copy()
|
||||
|
||||
def reset(self, *, seed: Optional[int] = None, return_info: bool = False,
|
||||
options: Optional[dict] = None, ) -> Union[ObsType, Tuple[ObsType, dict]]:
|
||||
def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
|
||||
-> Tuple[ObsType, Dict[str, Any]]:
|
||||
# Sample only orientation of first link, i.e. the arm is always straight.
|
||||
if self.random_start:
|
||||
super(BaseReacherEnv, self).reset(seed=seed, options=options)
|
||||
try:
|
||||
random_start = options.get('random_start', self.random_start)
|
||||
except AttributeError:
|
||||
random_start = self.random_start
|
||||
if random_start:
|
||||
first_joint = self.np_random.uniform(np.pi / 4, 3 * np.pi / 4)
|
||||
self._joint_angles = np.hstack([[first_joint], np.zeros(self.n_links - 1)])
|
||||
self._start_pos = self._joint_angles.copy()
|
||||
@ -84,7 +88,7 @@ class BaseReacherEnv(gym.Env):
|
||||
self._update_joints()
|
||||
self._steps = 0
|
||||
|
||||
return self._get_obs().copy()
|
||||
return self._get_obs().copy(), {}
|
||||
|
||||
def _update_joints(self):
|
||||
"""
|
||||
@ -124,10 +128,6 @@ class BaseReacherEnv(gym.Env):
|
||||
def _terminate(self, info) -> bool:
|
||||
raise NotImplementedError
|
||||
|
||||
def seed(self, seed=None):
|
||||
self.np_random, seed = seeding.np_random(seed)
|
||||
return [seed]
|
||||
|
||||
def close(self):
|
||||
super(BaseReacherEnv, self).close()
|
||||
del self.fig
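The `options` handling added to `reset` above can be looked at in isolation. The following is a minimal sketch of the same fallback pattern; `resolve_random_start` is a hypothetical name used only for illustration and does not appear in the commit.

```python
from typing import Any, Dict, Optional


def resolve_random_start(options: Optional[Dict[str, Any]], default: bool) -> bool:
    """Return the per-reset override if one is given, else the env default."""
    try:
        # Works when options is a dict; a missing key falls back to the default.
        return options.get("random_start", default)
    except AttributeError:
        # options is None (nothing was passed to reset), so keep the default.
        return default
```

With this pattern a caller can override the behaviour for a single episode, for example by passing `options={"random_start": False}`, without changing the environment's configured default.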
|
||||
|
@ -1,5 +1,5 @@
|
||||
import numpy as np
|
||||
from gym import spaces
|
||||
from gymnasium import spaces
|
||||
|
||||
from fancy_gym.envs.classic_control.base_reacher.base_reacher import BaseReacherEnv
|
||||
|
||||
@ -32,6 +32,7 @@ class BaseReacherDirectEnv(BaseReacherEnv):
|
||||
reward, info = self._get_reward(action)
|
||||
|
||||
self._steps += 1
|
||||
done = self._terminate(info)
|
||||
terminated = self._terminate(info)
|
||||
truncated = False
|
||||
|
||||
return self._get_obs().copy(), reward, done, info
|
||||
return self._get_obs().copy(), reward, terminated, truncated, info
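The change from a 4-tuple to a 5-tuple return value can be sketched as a standalone adapter. `split_done` is a hypothetical helper, not code from this commit; it mirrors the diff above by mapping the old `done` flag to `terminated` and always reporting `truncated = False`, on the assumption that time limits are enforced elsewhere (for example through `max_episode_steps` at registration).

```python
from typing import Any, Dict, Tuple


def split_done(
    old_step_result: Tuple[Any, float, bool, Dict[str, Any]],
) -> Tuple[Any, float, bool, bool, Dict[str, Any]]:
    """Convert (obs, reward, done, info) into the five-element form."""
    obs, reward, done, info = old_step_result
    terminated = bool(done)  # the task itself ended the episode
    truncated = False        # assumption: time limits are handled outside the env
    return obs, reward, terminated, truncated, info


# Callers that only care whether the episode is over can combine both flags.
obs, reward, terminated, truncated, info = split_done(("obs", 0.5, True, {}))
episode_over = terminated or truncated
```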
|
||||
|
@ -1,5 +1,5 @@
|
||||
import numpy as np
|
||||
from gym import spaces
|
||||
from gymnasium import spaces
|
||||
|
||||
from fancy_gym.envs.classic_control.base_reacher.base_reacher import BaseReacherEnv
|
||||
|
||||
@ -31,6 +31,7 @@ class BaseReacherTorqueEnv(BaseReacherEnv):
|
||||
reward, info = self._get_reward(action)
|
||||
|
||||
self._steps += 1
|
||||
done = False
|
||||
terminated = False
|
||||
truncated = False
|
||||
|
||||
return self._get_obs().copy(), reward, done, info
|
||||
return self._get_obs().copy(), reward, terminated, truncated, info
|
||||
|
@ -1,17 +1,20 @@
|
||||
from typing import Union, Optional, Tuple
|
||||
from typing import Union, Optional, Tuple, Any, Dict
|
||||
|
||||
import gym
|
||||
import gymnasium as gym
|
||||
import matplotlib.pyplot as plt
|
||||
import numpy as np
|
||||
from gym.core import ObsType
|
||||
from gymnasium import spaces
|
||||
from gymnasium.core import ObsType
|
||||
from matplotlib import patches
|
||||
|
||||
from fancy_gym.envs.classic_control.base_reacher.base_reacher_direct import BaseReacherDirectEnv
|
||||
from . import MPWrapper
|
||||
|
||||
MAX_EPISODE_STEPS_HOLEREACHER = 200
|
||||
|
||||
|
||||
class HoleReacherEnv(BaseReacherDirectEnv):
|
||||
|
||||
def __init__(self, n_links: int, hole_x: Union[None, float] = None, hole_depth: Union[None, float] = None,
|
||||
hole_width: float = 1., random_start: bool = False, allow_self_collision: bool = False,
|
||||
allow_wall_collision: bool = False, collision_penalty: float = 1000, rew_fct: str = "simple"):
|
||||
@ -40,7 +43,7 @@ class HoleReacherEnv(BaseReacherDirectEnv):
|
||||
[np.inf] # env steps, because reward start after n steps TODO: Maybe
|
||||
])
|
||||
# self.action_space = gym.spaces.Box(low=-action_bound, high=action_bound, shape=action_bound.shape)
|
||||
self.observation_space = gym.spaces.Box(low=-state_bound, high=state_bound, shape=state_bound.shape)
|
||||
self.observation_space = spaces.Box(low=-state_bound, high=state_bound, shape=state_bound.shape)
|
||||
|
||||
if rew_fct == "simple":
|
||||
from fancy_gym.envs.classic_control.hole_reacher.hr_simple_reward import HolereacherReward
|
||||
@ -54,13 +57,18 @@ class HoleReacherEnv(BaseReacherDirectEnv):
|
||||
else:
|
||||
raise ValueError("Unknown reward function {}".format(rew_fct))
|
||||
|
||||
def reset(self, *, seed: Optional[int] = None, return_info: bool = False,
|
||||
options: Optional[dict] = None, ) -> Union[ObsType, Tuple[ObsType, dict]]:
|
||||
def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
|
||||
-> Tuple[ObsType, Dict[str, Any]]:
|
||||
|
||||
# initialize seed here as the random goal needs to be generated before the super reset()
|
||||
gym.Env.reset(self, seed=seed, options=options)
|
||||
|
||||
self._generate_hole()
|
||||
self._set_patches()
|
||||
self.reward_function.reset()
|
||||
|
||||
return super().reset()
|
||||
# do not provide seed to avoid setting it twice
|
||||
return super(HoleReacherEnv, self).reset(options=options)
|
||||
|
||||
def _get_reward(self, action: np.ndarray) -> (float, dict):
|
||||
return self.reward_function.get_reward(self)
|
||||
@ -160,7 +168,7 @@ class HoleReacherEnv(BaseReacherDirectEnv):
|
||||
|
||||
# all points that are above the hole
|
||||
r, c = np.where((line_points[:, :, 0] > (self._tmp_x - self._tmp_width / 2)) & (
|
||||
line_points[:, :, 0] < (self._tmp_x + self._tmp_width / 2)))
|
||||
line_points[:, :, 0] < (self._tmp_x + self._tmp_width / 2)))
|
||||
|
||||
# check if any of those points are below surface
|
||||
nr_line_points_below_surface_in_hole = np.sum(line_points[r, c, 1] < -self._tmp_depth)
|
||||
@ -223,16 +231,3 @@ class HoleReacherEnv(BaseReacherDirectEnv):
|
||||
self.fig.gca().add_patch(left_block)
|
||||
self.fig.gca().add_patch(right_block)
|
||||
self.fig.gca().add_patch(hole_floor)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
|
||||
env = HoleReacherEnv(5)
|
||||
env.reset()
|
||||
|
||||
for i in range(10000):
|
||||
ac = env.action_space.sample()
|
||||
obs, rew, done, info = env.step(ac)
|
||||
env.render()
|
||||
if done:
|
||||
env.reset()
|
||||
|
@ -7,6 +7,30 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
|
||||
|
||||
class MPWrapper(RawInterfaceWrapper):
|
||||
|
||||
mp_config = {
|
||||
'ProMP': {
|
||||
'controller_kwargs': {
|
||||
'controller_type': 'velocity',
|
||||
},
|
||||
'trajectory_generator_kwargs': {
|
||||
'weights_scale': 2,
|
||||
},
|
||||
},
|
||||
'DMP': {
|
||||
'controller_kwargs': {
|
||||
'controller_type': 'velocity',
|
||||
},
|
||||
'trajectory_generator_kwargs': {
|
||||
# TODO: Before it was weight scale 50 and goal scale 0.1. We now only have weight scale and thus set it to 500. Check
|
||||
'weights_scale': 500,
|
||||
},
|
||||
'phase_generator_kwargs': {
|
||||
'alpha_phase': 2.5,
|
||||
},
|
||||
},
|
||||
'ProDMP': {},
|
||||
}
|
||||
|
||||
@property
|
||||
def context_mask(self):
|
||||
return np.hstack([
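The diff does not show how `mp_config` is consumed. As a sketch only, assuming the nested defaults are combined with user-supplied overrides key by key, a recursive merge could look like this. `merge_config` is a hypothetical helper; the default values are copied from the `'DMP'` entry above.

```python
from copy import deepcopy
from typing import Any, Dict


def merge_config(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge overrides into a copy of defaults."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge one level deeper.
            merged[key] = merge_config(merged[key], value)
        else:
            # Scalars and new keys simply replace or extend the defaults.
            merged[key] = deepcopy(value)
    return merged


# Values copied from the 'DMP' entry of mp_config in the diff above.
dmp_defaults = {
    'controller_kwargs': {'controller_type': 'velocity'},
    'trajectory_generator_kwargs': {'weights_scale': 500},
    'phase_generator_kwargs': {'alpha_phase': 2.5},
}
custom = merge_config(dmp_defaults, {'trajectory_generator_kwargs': {'weights_scale': 50}})
```

Merging into a copy leaves the class-level defaults untouched, so an override for one environment instance cannot leak into another.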
|
||||
|
@ -7,6 +7,28 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
|
||||
|
||||
class MPWrapper(RawInterfaceWrapper):
|
||||
|
||||
mp_config = {
|
||||
'ProMP': {
|
||||
'controller_kwargs': {
|
||||
'p_gains': 0.6,
|
||||
'd_gains': 0.075,
|
||||
},
|
||||
},
|
||||
'DMP': {
|
||||
'controller_kwargs': {
|
||||
'p_gains': 0.6,
|
||||
'd_gains': 0.075,
|
||||
},
|
||||
'trajectory_generator_kwargs': {
|
||||
'weights_scale': 50,
|
||||
},
|
||||
'phase_generator_kwargs': {
|
||||
'alpha_phase': 2,
|
||||
},
|
||||
},
|
||||
'ProDMP': {},
|
||||
}
|
||||
|
||||
@property
|
||||
def context_mask(self):
|
||||
return np.hstack([
|
||||
|
@ -1,11 +1,12 @@
|
||||
from typing import Iterable, Union, Optional, Tuple
|
||||
from typing import Iterable, Union, Optional, Tuple, Any, Dict
|
||||
|
||||
import matplotlib.pyplot as plt
|
||||
import numpy as np
|
||||
from gym import spaces
|
||||
from gym.core import ObsType
|
||||
from gymnasium import spaces
|
||||
from gymnasium.core import ObsType
|
||||
|
||||
from fancy_gym.envs.classic_control.base_reacher.base_reacher_torque import BaseReacherTorqueEnv
|
||||
from . import MPWrapper
|
||||
|
||||
|
||||
class SimpleReacherEnv(BaseReacherTorqueEnv):
|
||||
@ -42,11 +43,15 @@ class SimpleReacherEnv(BaseReacherTorqueEnv):
|
||||
# def start_pos(self):
|
||||
# return self._start_pos
|
||||
|
||||
def reset(self, *, seed: Optional[int] = None, return_info: bool = False,
|
||||
options: Optional[dict] = None, ) -> Union[ObsType, Tuple[ObsType, dict]]:
|
||||
def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
|
||||
-> Tuple[ObsType, Dict[str, Any]]:
|
||||
# Reset twice to ensure we return obs after generating goal and generating goal after executing seeded reset.
|
||||
# (Env will not behave deterministically otherwise)
|
||||
# Yes, there is probably a more elegant solution to this problem...
|
||||
self._generate_goal()
|
||||
|
||||
return super().reset()
|
||||
super().reset(seed=seed, options=options)
|
||||
self._generate_goal()
|
||||
return super().reset(seed=seed, options=options)
|
||||
|
||||
def _get_reward(self, action: np.ndarray):
|
||||
diff = self.end_effector - self._goal
|
||||
@ -127,15 +132,3 @@ class SimpleReacherEnv(BaseReacherTorqueEnv):
|
||||
|
||||
self.fig.canvas.draw()
|
||||
self.fig.canvas.flush_events()
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
env = SimpleReacherEnv(5)
|
||||
env.reset()
|
||||
for i in range(200):
|
||||
ac = env.action_space.sample()
|
||||
obs, rew, done, info = env.step(ac)
|
||||
|
||||
env.render()
|
||||
if done:
|
||||
break
|
||||
|
@ -7,6 +7,26 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
|
||||
|
||||
class MPWrapper(RawInterfaceWrapper):
|
||||
|
||||
mp_config = {
|
||||
'ProMP': {
|
||||
'controller_kwargs': {
|
||||
'controller_type': 'velocity',
|
||||
},
|
||||
},
|
||||
'DMP': {
|
||||
'controller_kwargs': {
|
||||
'controller_type': 'velocity',
|
||||
},
|
||||
'trajectory_generator_kwargs': {
|
||||
'weights_scale': 50,
|
||||
},
|
||||
'phase_generator_kwargs': {
|
||||
'alpha_phase': 2,
|
||||
},
|
||||
},
|
||||
'ProDMP': {},
|
||||
}
|
||||
|
||||
@property
|
||||
def context_mask(self):
|
||||
return np.hstack([
|
||||
|
@ -1,11 +1,13 @@
|
||||
from typing import Iterable, Union, Tuple, Optional
|
||||
from typing import Iterable, Union, Tuple, Optional, Any, Dict
|
||||
|
||||
import gym
|
||||
import gymnasium as gym
|
||||
import matplotlib.pyplot as plt
|
||||
import numpy as np
|
||||
from gym.core import ObsType
|
||||
from gymnasium import spaces
|
||||
from gymnasium.core import ObsType
|
||||
|
||||
from fancy_gym.envs.classic_control.base_reacher.base_reacher_direct import BaseReacherDirectEnv
|
||||
from . import MPWrapper
|
||||
|
||||
|
||||
class ViaPointReacherEnv(BaseReacherDirectEnv):
|
||||
@ -34,16 +36,21 @@ class ViaPointReacherEnv(BaseReacherDirectEnv):
|
||||
[np.inf] * 2, # x-y coordinates of target distance
|
||||
[np.inf] # env steps, because reward start after n steps
|
||||
])
|
||||
self.observation_space = gym.spaces.Box(low=-state_bound, high=state_bound, shape=state_bound.shape)
|
||||
self.observation_space = spaces.Box(low=-state_bound, high=state_bound, shape=state_bound.shape)
|
||||
|
||||
# @property
|
||||
# def start_pos(self):
|
||||
# return self._start_pos
|
||||
|
||||
def reset(self, *, seed: Optional[int] = None, return_info: bool = False,
|
||||
options: Optional[dict] = None, ) -> Union[ObsType, Tuple[ObsType, dict]]:
|
||||
def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
|
||||
-> Tuple[ObsType, Dict[str, Any]]:
|
||||
# Reset twice to ensure we return obs after generating goal and generating goal after executing seeded reset.
|
||||
# (Env will not behave deterministically otherwise)
|
||||
# Yes, there is probably a more elegant solution to this problem...
|
||||
self._generate_goal()
|
||||
return super().reset()
|
||||
super().reset(seed=seed, options=options)
|
||||
self._generate_goal()
|
||||
return super().reset(seed=seed, options=options)
|
||||
|
||||
def _generate_goal(self):
|
||||
# TODO: Maybe improve this later, this can yield quite a lot of invalid settings
|
||||
@ -183,16 +190,3 @@ class ViaPointReacherEnv(BaseReacherDirectEnv):
|
||||
plt.plot(self._joints[:, 0], self._joints[:, 1], 'ro-', markerfacecolor='k')
|
||||
|
||||
plt.pause(0.01)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
|
||||
env = ViaPointReacherEnv(5)
|
||||
env.reset()
|
||||
|
||||
for i in range(10000):
|
||||
ac = env.action_space.sample()
|
||||
obs, rew, done, info = env.step(ac)
|
||||
env.render()
|
||||
if done:
|
||||
env.reset()
|
||||
|
@ -1,15 +1,48 @@
|
||||
# Custom Mujoco tasks
|
||||
|
||||
## Step-based Environments
|
||||
|Name| Description|Horizon|Action Dimension|Observation Dimension
|
||||
|---|---|---|---|---|
|
||||
|`ALRReacher-v0`|Modified (5 links) Mujoco gym's `Reacher-v2` (2 links)| 200 | 5 | 21
|
||||
|`ALRReacherSparse-v0`|Same as `ALRReacher-v0`, but the distance penalty is only provided in the last time step.| 200 | 5 | 21
|
||||
|`ALRReacherSparseBalanced-v0`|Same as `ALRReacherSparse-v0`, but the end-effector has to remain upright.| 200 | 5 | 21
|
||||
|`ALRLongReacher-v0`|Modified (7 links) Mujoco gym's `Reacher-v2` (2 links)| 200 | 7 | 27
|
||||
|`ALRLongReacherSparse-v0`|Same as `ALRLongReacher-v0`, but the distance penalty is only provided in the last time step.| 200 | 7 | 27
|
||||
|`ALRLongReacherSparseBalanced-v0`|Same as `ALRLongReacherSparse-v0`, but the end-effector has to remain upright.| 200 | 7 | 27
|
||||
|`ALRBallInACupSimple-v0`| Ball-in-a-cup task where a robot needs to catch a ball attached to a cup at its end-effector. | 4000 | 3 | wip
|
||||
|`ALRBallInACup-v0`| Ball-in-a-cup task where a robot needs to catch a ball attached to a cup at its end-effector | 4000 | 7 | wip
|
||||
|`ALRBallInACupGoal-v0`| Similar to `ALRBallInACupSimple-v0` but the ball needs to be caught at a specified goal position | 4000 | 7 | wip
|
||||
|
||||
| Name | Description | Horizon | Action Dimension | Observation Dimension |
|
||||
| ------------------------------------------ | -------------------------------------------------------------------------------------------------- | ------- | ---------------- | --------------------- |
|
||||
| `fancy/Reacher-v0` | Modified (5 links) version of Gymnasium's MuJoCo `Reacher-v2` (2 links) | 200 | 5 | 21 |
|
||||
| `fancy/ReacherSparse-v0` | Same as `fancy/Reacher-v0`, but the distance penalty is only provided in the last time step. | 200 | 5 | 21 |
|
||||
| `fancy/ReacherSparseBalanced-v0` | Same as `fancy/ReacherSparse-v0`, but the end-effector has to remain upright. | 200 | 5 | 21 |
|
||||
| `fancy/LongReacher-v0` | Modified (7 links) version of Gymnasium's MuJoCo `Reacher-v2` (2 links) | 200 | 7 | 27 |
|
||||
| `fancy/LongReacherSparse-v0` | Same as `fancy/LongReacher-v0`, but the distance penalty is only provided in the last time step. | 200 | 7 | 27 |
|
||||
| `fancy/LongReacherSparseBalanced-v0` | Same as `fancy/LongReacherSparse-v0`, but the end-effector has to remain upright. | 200 | 7 | 27 |
|
||||
| `fancy/Reacher5d-v0` | Reacher task with 5 links, based on Gymnasium's `gym.envs.mujoco.ReacherEnv` | 200 | 5 | 20 |
|
||||
| `fancy/Reacher5dSparse-v0` | Sparse Reacher task with 5 links, based on Gymnasium's `gym.envs.mujoco.ReacherEnv` | 200 | 5 | 20 |
|
||||
| `fancy/Reacher7d-v0` | Reacher task with 7 links, based on Gymnasium's `gym.envs.mujoco.ReacherEnv` | 200 | 7 | 22 |
|
||||
| `fancy/Reacher7dSparse-v0` | Sparse Reacher task with 7 links, based on Gymnasium's `gym.envs.mujoco.ReacherEnv` | 200 | 7 | 22 |
|
||||
| `fancy/HopperJumpSparse-v0` | Hopper Jump task with sparse rewards, based on Gymnasium's `gym.envs.mujoco.Hopper` | 250 | 3 | 15 / 16\* |
|
||||
| `fancy/HopperJump-v0` | Hopper Jump task with continuous rewards, based on Gymnasium's `gym.envs.mujoco.Hopper` | 250 | 3 | 15 / 16\* |
|
||||
| `fancy/AntJump-v0` | Ant Jump task, based on Gymnasium's `gym.envs.mujoco.Ant` | 200 | 8 | 119 |
|
||||
| `fancy/HalfCheetahJump-v0` | HalfCheetah Jump task, based on Gymnasium's `gym.envs.mujoco.HalfCheetah` | 100 | 6 | 112 |
|
||||
| `fancy/HopperJumpOnBox-v0` | Hopper Jump on Box task, based on Gymnasium's `gym.envs.mujoco.Hopper` | 250 | 4 | 16 / 100\* |
|
||||
| `fancy/HopperThrow-v0` | Hopper Throw task, based on Gymnasium's `gym.envs.mujoco.Hopper` | 250 | 3 | 18 / 100\* |
|
||||
| `fancy/HopperThrowInBasket-v0` | Hopper Throw in Basket task, based on Gymnasium's `gym.envs.mujoco.Hopper` | 250 | 3 | 18 / 100\* |
|
||||
| `fancy/Walker2DJump-v0` | Walker 2D Jump task, based on Gymnasium's `gym.envs.mujoco.Walker2d` | 300 | 6 | 18 / 19\* |
|
||||
| `fancy/BeerPong-v0` | Beer Pong task, based on a custom environment with multiple task variations | 300 | 3 | 29 |
|
||||
| `fancy/BeerPongStepBased-v0` | Step-based Beer Pong task, based on a custom environment with episodic rewards | 300 | 3 | 29 |
|
||||
| `fancy/BeerPongFixedRelease-v0` | Beer Pong with fixed release, based on a custom environment with episodic rewards | 300 | 3 | 29 |
|
||||
| `fancy/BoxPushingDense-v0` | Custom Box-pushing task with dense rewards | 100 | 3 | 13 |
|
||||
| `fancy/BoxPushingTemporalSparse-v0` | Custom Box-pushing task with temporally sparse rewards | 100 | 3 | 13 |
|
||||
| `fancy/BoxPushingTemporalSpatialSparse-v0` | Custom Box-pushing task with temporally and spatially sparse rewards | 100 | 3 | 13 |
|
||||
| `fancy/TableTennis2D-v0` | Table Tennis task with 2D context, based on a custom environment for table tennis | 350 | 7 | 19 |
|
||||
| `fancy/TableTennis2DReplan-v0` | Table Tennis task with 2D context and replanning, based on a custom environment for table tennis | 350 | 7 | 19 |
|
||||
| `fancy/TableTennis4D-v0` | Table Tennis task with 4D context, based on a custom environment for table tennis | 350 | 7 | 22 |
|
||||
| `fancy/TableTennis4DReplan-v0` | Table Tennis task with 4D context and replanning, based on a custom environment for table tennis | 350 | 7 | 22 |
|
||||
| `fancy/TableTennisWind-v0` | Table Tennis task with wind effects, based on a custom environment for table tennis | 350 | 7 | 19 |
|
||||
| `fancy/TableTennisGoalSwitching-v0` | Table Tennis task with goal switching, based on a custom environment for table tennis | 350 | 7 | 19 |
|
||||
| `fancy/TableTennisWindReplan-v0` | Table Tennis task with wind effects and replanning, based on a custom environment for table tennis | 350 | 7 | 19 |
|
||||
|
||||
\*Observation dimensions depend on configuration.
|
||||
|
||||
<!--
|
||||
No longer used?
|
||||
| Name | Description | Horizon | Action Dimension | Observation Dimension |
|
||||
| --------------------------- | --------------------------------------------------------------------------------------------------- | ------- | ---------------- | --------------------- |
|
||||
| `fancy/BallInACupSimple-v0` | Ball-in-a-cup task where a robot needs to catch a ball attached to a cup at its end-effector. | 4000 | 3 | wip |
|
||||
| `fancy/BallInACup-v0` | Ball-in-a-cup task where a robot needs to catch a ball attached to a cup at its end-effector | 4000 | 7 | wip |
|
||||
| `fancy/BallInACupGoal-v0` | Similar to `fancy/BallInACupSimple-v0` but the ball needs to be caught at a specified goal position | 4000 | 7 | wip |
|
||||
-->
|
||||
|
@ -1,8 +1,11 @@
|
||||
from typing import Tuple, Union, Optional
|
||||
from typing import Tuple, Union, Optional, Any, Dict
|
||||
|
||||
import numpy as np
|
||||
from gym.core import ObsType
|
||||
from gym.envs.mujoco.ant_v4 import AntEnv
|
||||
from gymnasium.core import ObsType
|
||||
from gymnasium.envs.mujoco.ant_v4 import AntEnv, DEFAULT_CAMERA_CONFIG
|
||||
from gymnasium import utils
|
||||
from gymnasium.envs.mujoco import MujocoEnv
|
||||
from gymnasium.spaces import Box
|
||||
|
||||
MAX_EPISODE_STEPS_ANTJUMP = 200
|
||||
|
||||
@ -12,8 +15,74 @@ MAX_EPISODE_STEPS_ANTJUMP = 200
|
||||
# to the same structure as the Hopper, where the angles are randomized (->contexts) and the agent should jump as high
|
||||
# as possible, while landing at a specific target position
|
||||
|
||||
class AntEnvCustomXML(AntEnv):
|
||||
def __init__(
|
||||
self,
|
||||
xml_file="ant.xml",
|
||||
ctrl_cost_weight=0.5,
|
||||
use_contact_forces=False,
|
||||
contact_cost_weight=5e-4,
|
||||
healthy_reward=1.0,
|
||||
terminate_when_unhealthy=True,
|
||||
healthy_z_range=(0.2, 1.0),
|
||||
contact_force_range=(-1.0, 1.0),
|
||||
reset_noise_scale=0.1,
|
||||
exclude_current_positions_from_observation=True,
|
||||
**kwargs,
|
||||
):
|
||||
utils.EzPickle.__init__(
|
||||
self,
|
||||
xml_file,
|
||||
ctrl_cost_weight,
|
||||
use_contact_forces,
|
||||
contact_cost_weight,
|
||||
healthy_reward,
|
||||
terminate_when_unhealthy,
|
||||
healthy_z_range,
|
||||
contact_force_range,
|
||||
reset_noise_scale,
|
||||
exclude_current_positions_from_observation,
|
||||
**kwargs,
|
||||
)
|
||||
|
||||
class AntJumpEnv(AntEnv):
|
||||
self._ctrl_cost_weight = ctrl_cost_weight
|
||||
self._contact_cost_weight = contact_cost_weight
|
||||
|
||||
self._healthy_reward = healthy_reward
|
||||
self._terminate_when_unhealthy = terminate_when_unhealthy
|
||||
self._healthy_z_range = healthy_z_range
|
||||
|
||||
self._contact_force_range = contact_force_range
|
||||
|
||||
self._reset_noise_scale = reset_noise_scale
|
||||
|
||||
self._use_contact_forces = use_contact_forces
|
||||
|
||||
self._exclude_current_positions_from_observation = (
|
||||
exclude_current_positions_from_observation
|
||||
)
|
||||
|
||||
obs_shape = 27 + 1
|
||||
if not exclude_current_positions_from_observation:
|
||||
obs_shape += 2
|
||||
if use_contact_forces:
|
||||
obs_shape += 84
|
||||
|
||||
observation_space = Box(
|
||||
low=-np.inf, high=np.inf, shape=(obs_shape,), dtype=np.float64
|
||||
)
|
||||
|
||||
MujocoEnv.__init__(
|
||||
self,
|
||||
xml_file,
|
||||
5,
|
||||
observation_space=observation_space,
|
||||
default_camera_config=DEFAULT_CAMERA_CONFIG,
|
||||
**kwargs,
|
||||
)
|
||||
|
||||
|
||||
class AntJumpEnv(AntEnvCustomXML):
|
||||
"""
|
||||
Initialization changes to normal Ant:
|
||||
- healthy_reward: 1.0 -> 0.01 -> 0.0 no healthy reward needed - Paul and Marc
|
||||
@ -61,9 +130,10 @@ class AntJumpEnv(AntEnv):
|
||||
|
||||
costs = ctrl_cost + contact_cost
|
||||
|
||||
done = bool(height < 0.3) # fall over -> is the 0.3 value from healthy_z_range? TODO change 0.3 to the value of healthy z angle
|
||||
terminated = bool(
|
||||
height < 0.3) # fall over -> is the 0.3 value from healthy_z_range? TODO change 0.3 to the value of healthy z angle
|
||||
|
||||
if self.current_step == MAX_EPISODE_STEPS_ANTJUMP or done:
|
||||
if self.current_step == MAX_EPISODE_STEPS_ANTJUMP or terminated:
|
||||
# -10 for scaling the value of the distance between the max_height and the goal height; only used when context is enabled
|
||||
# height_reward = -10 * (np.linalg.norm(self.max_height - self.goal))
|
||||
height_reward = -10 * np.linalg.norm(self.max_height - self.goal)
|
||||
@ -80,19 +150,21 @@ class AntJumpEnv(AntEnv):
|
||||
'max_height': self.max_height,
|
||||
'goal': self.goal
|
||||
}
|
||||
truncated = False
|
||||
|
||||
return obs, reward, done, info
|
||||
return obs, reward, terminated, truncated, info
|
||||
|
||||
def _get_obs(self):
|
||||
return np.append(super()._get_obs(), self.goal)
|
||||
|
||||
def reset(self, *, seed: Optional[int] = None, return_info: bool = False,
|
||||
options: Optional[dict] = None, ) -> Union[ObsType, Tuple[ObsType, dict]]:
|
||||
def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
|
||||
-> Tuple[ObsType, Dict[str, Any]]:
|
||||
self.current_step = 0
|
||||
self.max_height = 0
|
||||
# goal heights from 1.0 to 2.5; can be increased, but didn't work well with CMORE
|
||||
ret = super().reset(seed=seed, options=options)
|
||||
self.goal = self.np_random.uniform(1.0, 2.5, 1)
|
||||
return super().reset()
|
||||
return ret
|
||||
|
||||
# reset_model had to be implemented in every env to make it deterministic
|
||||
def reset_model(self):
|
||||
|
@ -1,9 +1,13 @@
|
||||
import os
|
||||
from typing import Optional
|
||||
from typing import Optional, Any, Dict, Tuple
|
||||
|
||||
import numpy as np
|
||||
from gym import utils
|
||||
from gym.envs.mujoco import MujocoEnv
|
||||
from gymnasium import utils
|
||||
from gymnasium.core import ObsType
|
||||
from gymnasium.envs.mujoco import MujocoEnv
|
||||
from gymnasium.spaces import Box
|
||||
|
||||
import mujoco
|
||||
|
||||
MAX_EPISODE_STEPS_BEERPONG = 300
|
||||
FIXED_RELEASE_STEP = 62 # empirically evaluated for frame_skip=2!
|
||||
@ -30,7 +34,16 @@ CUP_COLLISION_OBJ = ["cup_geom_table3", "cup_geom_table4", "cup_geom_table5", "c
|
||||
|
||||
|
||||
class BeerPongEnv(MujocoEnv, utils.EzPickle):
|
||||
def __init__(self):
|
||||
metadata = {
|
||||
"render_modes": [
|
||||
"human",
|
||||
"rgb_array",
|
||||
"depth_array",
|
||||
],
|
||||
"render_fps": 100
|
||||
}
|
||||
|
||||
def __init__(self, **kwargs):
|
||||
self._steps = 0
|
||||
# Small Context -> Easier. Todo: Should we do different versions?
|
||||
# self.xml_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "assets", "beerpong_wo_cup.xml")
|
||||
@ -50,9 +63,9 @@ class BeerPongEnv(MujocoEnv, utils.EzPickle):
|
||||
self.repeat_action = 2
|
||||
# TODO: If accessing IDs is easier in the (new) official mujoco bindings, remove this
|
||||
self.model = None
|
||||
self.geom_id = lambda x: self._mujoco_bindings.mj_name2id(self.model,
|
||||
self._mujoco_bindings.mjtObj.mjOBJ_GEOM,
|
||||
x)
|
||||
self.geom_id = lambda x: mujoco.mj_name2id(self.model,
|
||||
mujoco.mjtObj.mjOBJ_GEOM,
|
||||
x)
|
||||
|
||||
# for reward calculation
|
||||
self.dists = []
|
||||
@ -65,7 +78,17 @@ class BeerPongEnv(MujocoEnv, utils.EzPickle):
|
||||
self.ball_in_cup = False
|
||||
self.dist_ground_cup = -1 # distance floor to cup if first floor contact
|
||||
|
||||
MujocoEnv.__init__(self, model_path=self.xml_path, frame_skip=1, mujoco_bindings="mujoco")
|
||||
self.observation_space = Box(
|
||||
low=-np.inf, high=np.inf, shape=(29,), dtype=np.float64
|
||||
)
|
||||
|
||||
MujocoEnv.__init__(
|
||||
self,
|
||||
self.xml_path,
|
||||
frame_skip=1,
|
||||
observation_space=self.observation_space,
|
||||
**kwargs
|
||||
)
|
||||
utils.EzPickle.__init__(self)
|
||||
|
||||
@property
|
||||
@ -76,7 +99,8 @@ class BeerPongEnv(MujocoEnv, utils.EzPickle):
|
||||
def start_vel(self):
|
||||
return self._start_vel
|
||||
|
||||
def reset(self, *, seed: Optional[int] = None, return_info: bool = False, options: Optional[dict] = None):
|
||||
def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
|
||||
-> Tuple[ObsType, Dict[str, Any]]:
|
||||
self.dists = []
|
||||
self.dists_final = []
|
||||
self.action_costs = []
|
||||
@ -86,7 +110,7 @@ class BeerPongEnv(MujocoEnv, utils.EzPickle):
|
||||
self.ball_cup_contact = False
|
||||
self.ball_in_cup = False
|
||||
self.dist_ground_cup = -1 # distance floor to cup if first floor contact
|
||||
return super().reset()
|
||||
return super().reset(seed=seed, options=options)
|
||||
|
||||
def reset_model(self):
|
||||
init_pos_all = self.init_qpos.copy()
|
||||
@ -128,11 +152,11 @@ class BeerPongEnv(MujocoEnv, utils.EzPickle):
|
||||
if not crash:
|
||||
reward, reward_infos = self._get_reward(applied_action)
|
||||
is_collided = reward_infos['is_collided'] # TODO: Remove if self collision does not make a difference
|
||||
done = is_collided
|
||||
terminated = is_collided
|
||||
self._steps += 1
|
||||
else:
|
||||
reward = -30
|
||||
done = True
|
||||
terminated = True
|
||||
reward_infos = {"success": False, "ball_pos": np.zeros(3), "ball_vel": np.zeros(3), "is_collided": False}
|
||||
|
||||
infos = dict(
|
||||
@ -142,7 +166,10 @@ class BeerPongEnv(MujocoEnv, utils.EzPickle):
|
||||
q_vel=self.data.qvel[0:7].ravel().copy(), sim_crash=crash,
|
||||
)
|
||||
infos.update(reward_infos)
|
||||
return ob, reward, done, infos
|
||||
|
||||
truncated = False
|
||||
|
||||
return ob, reward, terminated, truncated, infos
|
||||
|
||||
def _get_obs(self):
|
||||
theta = self.data.qpos.flat[:7].copy()
|
||||
@ -197,13 +224,13 @@ class BeerPongEnv(MujocoEnv, utils.EzPickle):
|
||||
min_dist_coeff, final_dist_coeff, ground_contact_dist_coeff, rew_offset = 0, 1, 0, 0
|
||||
action_cost = 1e-4 * np.mean(action_cost)
|
||||
reward = rew_offset - min_dist_coeff * min_dist ** 2 - final_dist_coeff * final_dist ** 2 - \
|
||||
action_cost - ground_contact_dist_coeff * self.dist_ground_cup ** 2
|
||||
action_cost - ground_contact_dist_coeff * self.dist_ground_cup ** 2
|
||||
# release step punishment
|
||||
min_time_bound = 0.1
|
||||
max_time_bound = 1.0
|
||||
release_time = self.release_step * self.dt
|
||||
release_time_rew = int(release_time < min_time_bound) * (-30 - 10 * (release_time - min_time_bound) ** 2) + \
|
||||
int(release_time > max_time_bound) * (-30 - 10 * (release_time - max_time_bound) ** 2)
|
||||
int(release_time > max_time_bound) * (-30 - 10 * (release_time - max_time_bound) ** 2)
|
||||
reward += release_time_rew
|
||||
success = self.ball_in_cup
|
||||
else:
|
||||
@ -258,9 +285,9 @@ class BeerPongEnvStepBasedEpisodicReward(BeerPongEnv):
|
||||
return super(BeerPongEnvStepBasedEpisodicReward, self).step(a)
|
||||
else:
|
||||
reward = 0
|
||||
done = True
|
||||
terminated, truncated = True, False
|
||||
while self._steps < MAX_EPISODE_STEPS_BEERPONG:
|
||||
obs, sub_reward, done, infos = super(BeerPongEnvStepBasedEpisodicReward, self).step(
|
||||
obs, sub_reward, terminated, truncated, infos = super(BeerPongEnvStepBasedEpisodicReward, self).step(
|
||||
np.zeros(a.shape))
|
||||
reward += sub_reward
|
||||
return obs, reward, done, infos
|
||||
return obs, reward, terminated, truncated, infos
|
||||
|
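The release step punishment in `_get_reward` above only contributes when the ball is released outside the window `[min_time_bound, max_time_bound]`. A minimal standalone sketch of that term, using the constants from the hunk; the function name `release_time_penalty` is hypothetical and not part of `fancy_gym`:

```python
def release_time_penalty(release_time: float,
                         min_time_bound: float = 0.1,
                         max_time_bound: float = 1.0) -> float:
    """Penalty term mirroring the release-time expression in the hunk above."""
    penalty = 0.0
    # Released too early: flat -30 plus a quadratic term in the distance to the bound.
    if release_time < min_time_bound:
        penalty += -30 - 10 * (release_time - min_time_bound) ** 2
    # Released too late: same shape, measured against the upper bound.
    if release_time > max_time_bound:
        penalty += -30 - 10 * (release_time - max_time_bound) ** 2
    return penalty
```

Inside the window the term is zero, so it leaves the distance-based reward untouched; outside it, the flat `-30` dominates the quadratic part.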
@ -1,9 +1,8 @@
|
||||
import os
|
||||
|
||||
import mujoco_py.builder
|
||||
import numpy as np
|
||||
from gym import utils
|
||||
from gym.envs.mujoco import MujocoEnv
|
||||
from gymnasium import utils
|
||||
from gymnasium.envs.mujoco import MujocoEnv
|
||||
|
||||
from fancy_gym.envs.mujoco.beerpong.deprecated.beerpong_reward_staged import BeerPongReward
|
||||
|
||||
@ -74,27 +73,24 @@ class BeerPongEnv(MujocoEnv, utils.EzPickle):
|
||||
crash = False
|
||||
for _ in range(self.repeat_action):
|
||||
applied_action = a + self.sim.data.qfrc_bias[:len(a)].copy() / self.model.actuator_gear[:, 0]
|
||||
try:
|
||||
self.do_simulation(applied_action, self.frame_skip)
|
||||
self.reward_function.initialize(self)
|
||||
# self.reward_function.check_contacts(self.sim) # I assume this is not important?
|
||||
if self._steps < self.release_step:
|
||||
self.sim.data.qpos[7::] = self.sim.data.site_xpos[self.site_id("init_ball_pos"), :].copy()
|
||||
self.sim.data.qvel[7::] = self.sim.data.site_xvelp[self.site_id("init_ball_pos"), :].copy()
|
||||
crash = False
|
||||
except mujoco_py.builder.MujocoException:
|
||||
crash = True
|
||||
self.do_simulation(applied_action, self.frame_skip)
|
||||
self.reward_function.initialize(self)
|
||||
# self.reward_function.check_contacts(self.sim) # I assume this is not important?
|
||||
if self._steps < self.release_step:
|
||||
self.sim.data.qpos[7::] = self.sim.data.site_xpos[self.site_id("init_ball_pos"), :].copy()
|
||||
self.sim.data.qvel[7::] = self.sim.data.site_xvelp[self.site_id("init_ball_pos"), :].copy()
|
||||
crash = False
|
||||
|
||||
ob = self._get_obs()
|
||||
|
||||
if not crash:
|
||||
reward, reward_infos = self.reward_function.compute_reward(self, applied_action)
|
||||
is_collided = reward_infos['is_collided']
|
||||
done = is_collided or self._steps == self.ep_length - 1
|
||||
terminated = is_collided or self._steps == self.ep_length - 1
|
||||
self._steps += 1
|
||||
else:
|
||||
reward = -30
|
||||
done = True
|
||||
terminated = True
|
||||
reward_infos = {"success": False, "ball_pos": np.zeros(3), "ball_vel": np.zeros(3), "is_collided": False}
|
||||
|
||||
infos = dict(
|
||||
@ -104,7 +100,7 @@ class BeerPongEnv(MujocoEnv, utils.EzPickle):
|
||||
q_vel=self.sim.data.qvel[0:7].ravel().copy(), sim_crash=crash,
|
||||
)
|
||||
infos.update(reward_infos)
|
||||
return ob, reward, done, infos
|
||||
return ob, reward, terminated, False, infos
|
||||
|
||||
def _get_obs(self):
|
||||
theta = self.sim.data.qpos.flat[:7]
|
||||
@ -143,16 +139,16 @@ class BeerPongEnvStepBasedEpisodicReward(BeerPongEnv):
|
||||
return super(BeerPongEnvStepBasedEpisodicReward, self).step(a)
|
||||
else:
|
||||
reward = 0
|
||||
done = False
|
||||
while not done:
|
||||
sub_ob, sub_reward, done, sub_infos = super(BeerPongEnvStepBasedEpisodicReward, self).step(
|
||||
np.zeros(a.shape))
|
||||
terminated, truncated = False, False
|
||||
while not (terminated or truncated):
|
||||
sub_ob, sub_reward, terminated, truncated, sub_infos = super(BeerPongEnvStepBasedEpisodicReward,
|
||||
self).step(np.zeros(a.shape))
|
||||
reward += sub_reward
|
||||
infos = sub_infos
|
||||
ob = sub_ob
|
||||
ob[-1] = self.release_step + 1 # Since we simulate until the end of the episode, PPO does not see the
|
||||
# internal steps and thus, the observation also needs to be set correctly
|
||||
return ob, reward, done, infos
|
||||
return ob, reward, terminated, truncated, infos
|
||||
|
||||
|
||||
# class BeerBongEnvStepBased(BeerBongEnv):
|
||||
@ -186,27 +182,3 @@ class BeerPongEnvStepBasedEpisodicReward(BeerPongEnv):
|
||||
# ob[-1] = self.release_step + 1 # Since we simulate until the end of the episode, PPO does not see the
|
||||
# # internal steps and thus, the observation also needs to be set correctly
|
||||
# return ob, reward, done, infos
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
env = BeerPongEnv(frame_skip=2)
|
||||
env.seed(0)
|
||||
# env = BeerBongEnvStepBased(frame_skip=2)
|
||||
# env = BeerBongEnvStepBasedEpisodicReward(frame_skip=2)
|
||||
# env = BeerBongEnvFixedReleaseStep(frame_skip=2)
|
||||
import time
|
||||
|
||||
env.reset()
|
||||
env.render("human")
|
||||
for i in range(600):
|
||||
# ac = 10 * env.action_space.sample()
|
||||
ac = 0.05 * np.ones(7)
|
||||
obs, rew, d, info = env.step(ac)
|
||||
env.render("human")
|
||||
|
||||
if d:
|
||||
print('reward:', rew)
|
||||
print('RESETTING')
|
||||
env.reset()
|
||||
time.sleep(1)
|
||||
env.close()
|
||||
|
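The removed `__main__` demo called `env.seed(0)` and unpacked four values from `env.step`. Under the interface used elsewhere in this commit, the seed goes to `reset` and `step` returns five values. A minimal sketch of an equivalent loop, assuming a hypothetical toy environment `CountdownEnv` in place of `BeerPongEnv` so that it runs without MuJoCo:

```python
from typing import Any, Dict, Tuple


class CountdownEnv:
    """Hypothetical toy env that follows the reset/step contract used in this commit."""

    def __init__(self, horizon: int = 3):
        self.horizon = horizon
        self.steps = 0

    def reset(self, *, seed=None, options=None) -> Tuple[int, Dict[str, Any]]:
        self.steps = 0
        return self.steps, {}

    def step(self, action) -> Tuple[int, float, bool, bool, Dict[str, Any]]:
        self.steps += 1
        terminated = self.steps >= self.horizon  # task-defined end of episode
        truncated = False                        # this toy env has no time limit
        return self.steps, 1.0, terminated, truncated, {}


def rollout(env, max_steps: int = 600) -> float:
    """Run one episode and return the summed reward."""
    obs, info = env.reset(seed=0)
    total = 0.0
    for _ in range(max_steps):
        obs, reward, terminated, truncated, info = env.step(0)
        total += reward
        if terminated or truncated:
            break
    return total
```

The loop stops on either flag, which matches how the step-based episodic wrapper in this file checks `terminated or truncated`.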
@ -6,6 +6,23 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
|
||||
|
||||
|
||||
class MPWrapper(RawInterfaceWrapper):
|
||||
mp_config = {
|
||||
'ProMP': {
|
||||
'phase_generator_kwargs': {
|
||||
'learn_tau': True
|
||||
},
|
||||
'controller_kwargs': {
|
||||
'p_gains': np.array([1.5, 5, 2.55, 3, 2., 2, 1.25]),
|
||||
'd_gains': np.array([0.02333333, 0.1, 0.0625, 0.08, 0.03, 0.03, 0.0125]),
|
||||
},
|
||||
'basis_generator_kwargs': {
|
||||
'num_basis': 2,
|
||||
'num_basis_zero_start': 2,
|
||||
},
|
||||
},
|
||||
'DMP': {},
|
||||
'ProDMP': {},
|
||||
}
|
||||
|
||||
@property
|
||||
def context_mask(self) -> np.ndarray:
|
||||
@ -39,3 +56,23 @@ class MPWrapper(RawInterfaceWrapper):
|
||||
xyz[-1] = 0.840
|
||||
self.model.body_pos[self.cup_table_id] = xyz
|
||||
return self.get_observation_from_step(self.get_obs())
|
||||
|
||||
|
||||
class MPWrapper_FixedRelease(MPWrapper):
|
||||
mp_config = {
|
||||
'ProMP': {
|
||||
'phase_generator_kwargs': {
|
||||
'tau': 0.62,
|
||||
},
|
||||
'controller_kwargs': {
|
||||
'p_gains': np.array([1.5, 5, 2.55, 3, 2., 2, 1.25]),
|
||||
'd_gains': np.array([0.02333333, 0.1, 0.0625, 0.08, 0.03, 0.03, 0.0125]),
|
||||
},
|
||||
'basis_generator_kwargs': {
|
||||
'num_basis': 2,
|
||||
'num_basis_zero_start': 2,
|
||||
},
|
||||
},
|
||||
'DMP': {},
|
||||
'ProDMP': {},
|
||||
}
|
||||
|
@ -1 +1 @@
|
||||
from .mp_wrapper import MPWrapper
|
||||
from .mp_wrapper import MPWrapper, ReplanMPWrapper
|
||||
|
@ -1,8 +1,8 @@
|
||||
import os
|
||||
|
||||
import numpy as np
|
||||
from gym import utils, spaces
|
||||
from gym.envs.mujoco import MujocoEnv
|
||||
from gymnasium import utils, spaces
|
||||
from gymnasium.envs.mujoco import MujocoEnv
|
||||
from fancy_gym.envs.mujoco.box_pushing.box_pushing_utils import rot_to_quat, get_quaternion_error, rotation_distance
|
||||
from fancy_gym.envs.mujoco.box_pushing.box_pushing_utils import q_max, q_min, q_dot_max, q_torque_max
|
||||
from fancy_gym.envs.mujoco.box_pushing.box_pushing_utils import desired_rod_quat
|
||||
@ -13,6 +13,7 @@ MAX_EPISODE_STEPS_BOX_PUSHING = 100
|
||||
|
||||
BOX_POS_BOUND = np.array([[0.3, -0.45, -0.01], [0.6, 0.45, -0.01]])
|
||||
|
||||
|
||||
class BoxPushingEnvBase(MujocoEnv, utils.EzPickle):
|
||||
"""
|
||||
franka box pushing environment
|
||||
@ -26,6 +27,15 @@ class BoxPushingEnvBase(MujocoEnv, utils.EzPickle):
|
||||
3. time-spatial-dependent sparse reward
|
||||
"""
|
||||
|
||||
metadata = {
|
||||
"render_modes": [
|
||||
"human",
|
||||
"rgb_array",
|
||||
"depth_array",
|
||||
],
|
||||
"render_fps": 50
|
||||
}
|
||||
|
||||
def __init__(self, frame_skip: int = 10, random_init: bool = False):
|
||||
utils.EzPickle.__init__(**locals())
|
||||
self._steps = 0
|
||||
@ -39,11 +49,16 @@ class BoxPushingEnvBase(MujocoEnv, utils.EzPickle):
|
||||
self._desired_rod_quat = desired_rod_quat
|
||||
|
||||
self._episode_energy = 0.
|
||||
|
||||
self.observation_space = spaces.Box(
|
||||
low=-np.inf, high=np.inf, shape=(28,), dtype=np.float64
|
||||
)
|
||||
|
||||
self.random_init = random_init
|
||||
MujocoEnv.__init__(self,
|
||||
model_path=os.path.join(os.path.dirname(__file__), "assets", "box_pushing.xml"),
|
||||
frame_skip=self.frame_skip,
|
||||
mujoco_bindings="mujoco")
|
||||
observation_space=self.observation_space)
|
||||
self.action_space = spaces.Box(low=-1, high=1, shape=(7,))
|
||||
|
||||
def step(self, action):
|
||||
@ -89,7 +104,11 @@ class BoxPushingEnvBase(MujocoEnv, utils.EzPickle):
|
||||
'is_success': True if episode_end and box_goal_pos_dist < 0.05 and box_goal_quat_dist < 0.5 else False,
|
||||
'num_steps': self._steps
|
||||
}
|
||||
return obs, reward, episode_end, infos
|
||||
|
||||
terminated = episode_end and infos['is_success']
|
||||
truncated = episode_end and not infos['is_success']
|
||||
|
||||
return obs, reward, terminated, truncated, infos
|
||||
|
||||
def reset_model(self):
|
||||
# reset box to initial position
|
||||
@ -250,7 +269,7 @@ class BoxPushingEnvBase(MujocoEnv, utils.EzPickle):
|
||||
|
||||
old_err_norm = err_norm
|
||||
|
||||
### get Jacobian by mujoco
|
||||
# get Jacobian by mujoco
|
||||
self.data.qpos[:7] = q
|
||||
mujoco.mj_forward(self.model, self.data)
|
||||
|
||||
@ -284,6 +303,7 @@ class BoxPushingEnvBase(MujocoEnv, utils.EzPickle):
|
||||
|
||||
return q
|
||||
|
||||
|
||||
class BoxPushingDense(BoxPushingEnvBase):
|
||||
def __init__(self, frame_skip: int = 10, random_init: bool = False):
|
||||
super(BoxPushingDense, self).__init__(frame_skip=frame_skip, random_init=random_init)
|
||||
@ -299,7 +319,7 @@ class BoxPushingDense(BoxPushingEnvBase):
|
||||
energy_cost = -0.0005 * np.sum(np.square(action))
|
||||
|
||||
reward = joint_penalty + tcp_box_dist_reward + \
|
||||
box_goal_pos_dist_reward + box_goal_rot_dist_reward + energy_cost
|
||||
box_goal_pos_dist_reward + box_goal_rot_dist_reward + energy_cost
|
||||
|
||||
rod_inclined_angle = rotation_distance(rod_quat, self._desired_rod_quat)
|
||||
if rod_inclined_angle > np.pi / 4:
|
||||
@ -307,6 +327,7 @@ class BoxPushingDense(BoxPushingEnvBase):
|
||||
|
||||
return reward
|
||||
|
||||
|
||||
class BoxPushingTemporalSparse(BoxPushingEnvBase):
|
||||
def __init__(self, frame_skip: int = 10, random_init: bool = False):
|
||||
super(BoxPushingTemporalSparse, self).__init__(frame_skip=frame_skip, random_init=random_init)
|
||||
@ -368,6 +389,7 @@ class BoxPushingTemporalSpatialSparse(BoxPushingEnvBase):
|
||||
|
||||
return reward
|
||||
|
||||
|
||||
class BoxPushingTemporalSpatialSparse2(BoxPushingEnvBase):
|
||||
|
||||
def __init__(self, frame_skip: int = 10, random_init: bool = False):
|
||||
|
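The `step` hunk of `BoxPushingEnvBase` above splits the former `episode_end` flag into the two new return values depending on `infos['is_success']`. A minimal sketch of that mapping as a standalone function; the name `split_episode_end` is hypothetical:

```python
from typing import Tuple


def split_episode_end(episode_end: bool, is_success: bool) -> Tuple[bool, bool]:
    """Map the old single flag onto (terminated, truncated) as in the hunk above."""
    terminated = episode_end and is_success      # ended because the task was solved
    truncated = episode_end and not is_success   # ended without success
    return terminated, truncated
```

At most one of the two flags is ever set, and neither is set while the episode is still running.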
@ -6,6 +6,27 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
|
||||
|
||||
|
||||
class MPWrapper(RawInterfaceWrapper):
|
||||
mp_config = {
|
||||
'ProMP': {
|
||||
'controller_kwargs': {
|
||||
'p_gains': 0.01 * np.array([120., 120., 120., 120., 50., 30., 10.]),
|
||||
'd_gains': 0.01 * np.array([10., 10., 10., 10., 6., 5., 3.]),
|
||||
},
|
||||
'basis_generator_kwargs': {
|
||||
'basis_bandwidth_factor': 2 # 3.5, 4 to try
|
||||
}
|
||||
},
|
||||
'DMP': {},
|
||||
'ProDMP': {
|
||||
'controller_kwargs': {
|
||||
'p_gains': 0.01 * np.array([120., 120., 120., 120., 50., 30., 10.]),
|
||||
'd_gains': 0.01 * np.array([10., 10., 10., 10., 6., 5., 3.]),
|
||||
},
|
||||
'basis_generator_kwargs': {
|
||||
'basis_bandwidth_factor': 2 # 3.5, 4 to try
|
||||
}
|
||||
},
|
||||
}
|
||||
|
||||
# Random x goal + random init pos
|
||||
@property
|
||||
@ -38,3 +59,35 @@ class MPWrapper(RawInterfaceWrapper):
|
||||
@property
|
||||
def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
|
||||
return self.data.qvel[:7].copy()
|
||||
|
||||
|
||||
class ReplanMPWrapper(MPWrapper):
|
||||
mp_config = {
|
||||
'ProMP': {},
|
||||
'DMP': {},
|
||||
'ProDMP': {
|
||||
'controller_kwargs': {
|
||||
'p_gains': 0.01 * np.array([120., 120., 120., 120., 50., 30., 10.]),
|
||||
'd_gains': 0.01 * np.array([10., 10., 10., 10., 6., 5., 3.]),
|
||||
},
|
||||
'trajectory_generator_kwargs': {
|
||||
'weights_scale': 0.3,
|
||||
'goal_scale': 0.3,
|
||||
'auto_scale_basis': True,
|
||||
'goal_offset': 1.0,
|
||||
'disable_goal': True,
|
||||
},
|
||||
'basis_generator_kwargs': {
|
||||
'num_basis': 5,
|
||||
'basis_bandwidth_factor': 3,
|
||||
},
|
||||
'phase_generator_kwargs': {
|
||||
'alpha_phase': 3,
|
||||
},
|
||||
'black_box_kwargs': {
|
||||
'max_planning_times': 4,
|
||||
'replanning_schedule': lambda pos, vel, obs, action, t: t % 25 == 0,
|
||||
'condition_on_desired': True,
|
||||
}
|
||||
}
|
||||
}
|
||||
|
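The `replanning_schedule` entry in `ReplanMPWrapper.mp_config` is a predicate over `(pos, vel, obs, action, t)` that ignores everything except `t`. A small sketch of which time steps it selects, assuming `t` counts from 1 to 100 (the value of `MAX_EPISODE_STEPS_BOX_PUSHING`); how the black-box wrapper actually counts `t` and how `max_planning_times` limits the result is not shown in this diff:

```python
# Same predicate as in the config above; only `t` is used.
replanning_schedule = lambda pos, vel, obs, action, t: t % 25 == 0

# Assumption: t runs over 1..100 within one episode.
replan_steps = [t for t in range(1, 101)
                if replanning_schedule(None, None, None, None, t)]
print(replan_steps)  # [25, 50, 75, 100]
```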
@ -1,14 +1,68 @@
|
||||
import os
|
||||
from typing import Tuple, Union, Optional
|
||||
from typing import Tuple, Union, Optional, Any, Dict
|
||||
|
||||
import numpy as np
|
||||
from gym.core import ObsType
|
||||
from gym.envs.mujoco.half_cheetah_v4 import HalfCheetahEnv
|
||||
from gymnasium.core import ObsType
|
||||
from gymnasium.envs.mujoco.half_cheetah_v4 import HalfCheetahEnv, DEFAULT_CAMERA_CONFIG
|
||||
|
||||
from gymnasium import utils
|
||||
from gymnasium.envs.mujoco import MujocoEnv
|
||||
from gymnasium.spaces import Box
|
||||
|
||||
MAX_EPISODE_STEPS_HALFCHEETAHJUMP = 100
|
||||
|
||||
|
||||
class HalfCheetahJumpEnv(HalfCheetahEnv):
|
||||
class HalfCheetahEnvCustomXML(HalfCheetahEnv):
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
xml_file,
|
||||
forward_reward_weight=1.0,
|
||||
ctrl_cost_weight=0.1,
|
||||
reset_noise_scale=0.1,
|
||||
exclude_current_positions_from_observation=True,
|
||||
**kwargs,
|
||||
):
|
||||
utils.EzPickle.__init__(
|
||||
self,
|
||||
xml_file,
|
||||
forward_reward_weight,
|
||||
ctrl_cost_weight,
|
||||
reset_noise_scale,
|
||||
exclude_current_positions_from_observation,
|
||||
**kwargs,
|
||||
)
|
||||
|
||||
self._forward_reward_weight = forward_reward_weight
|
||||
|
||||
self._ctrl_cost_weight = ctrl_cost_weight
|
||||
|
||||
self._reset_noise_scale = reset_noise_scale
|
||||
|
||||
self._exclude_current_positions_from_observation = (
|
||||
exclude_current_positions_from_observation
|
||||
)
|
||||
|
||||
if exclude_current_positions_from_observation:
|
||||
observation_space = Box(
|
||||
low=-np.inf, high=np.inf, shape=(18,), dtype=np.float64
|
||||
)
|
||||
else:
|
||||
observation_space = Box(
|
||||
low=-np.inf, high=np.inf, shape=(19,), dtype=np.float64
|
||||
)
|
||||
|
||||
MujocoEnv.__init__(
|
||||
self,
|
||||
xml_file,
|
||||
5,
|
||||
observation_space=observation_space,
|
||||
default_camera_config=DEFAULT_CAMERA_CONFIG,
|
||||
**kwargs,
|
||||
)
|
||||
|
||||
|
||||
class HalfCheetahJumpEnv(HalfCheetahEnvCustomXML):
|
||||
"""
|
||||
_ctrl_cost_weight 0.1 -> 0.0
|
||||
"""
|
||||
@ -41,10 +95,11 @@ class HalfCheetahJumpEnv(HalfCheetahEnv):
|
||||
height_after = self.get_body_com("torso")[2]
|
||||
self.max_height = max(height_after, self.max_height)
|
||||
|
||||
## Didnt use fell_over, because base env also has no done condition - Paul and Marc
|
||||
# Didn't use fell_over, because base env also has no done condition - Paul and Marc
|
||||
# fell_over = abs(self.sim.data.qpos[2]) > 2.5 # how to figure out if the cheetah fell over? -> 2.5 oke?
|
||||
# TODO: Should a fall over be checked here?
|
||||
done = False
|
||||
terminated = False
|
||||
truncated = False
|
||||
|
||||
ctrl_cost = self.control_cost(action)
|
||||
costs = ctrl_cost
|
||||
@ -63,17 +118,18 @@ class HalfCheetahJumpEnv(HalfCheetahEnv):
|
||||
'max_height': self.max_height
|
||||
}
|
||||
|
||||
return observation, reward, done, info
|
||||
return observation, reward, terminated, truncated, info
|
||||
|
||||
def _get_obs(self):
|
||||
return np.append(super()._get_obs(), self.goal)
|
||||
|
||||
def reset(self, *, seed: Optional[int] = None, return_info: bool = False,
|
||||
options: Optional[dict] = None, ) -> Union[ObsType, Tuple[ObsType, dict]]:
|
||||
def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
|
||||
-> Tuple[ObsType, Dict[str, Any]]:
|
||||
self.max_height = 0
|
||||
self.current_step = 0
|
||||
ret = super().reset(seed=seed, options=options)
|
||||
self.goal = self.np_random.uniform(1.1, 1.6, 1) # 1.1 1.6
|
||||
return super().reset()
|
||||
return ret
|
||||
|
||||
# overwrite reset_model to make it deterministic
|
||||
def reset_model(self):
|
||||
|
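The `reset` hunk above now forwards `seed` to the parent class before drawing `self.goal` from `self.np_random`, so the goal is drawn from the reseeded generator. A minimal sketch of that ordering, with Python's `random` module standing in for the environment's generator; the class `GoalSampler` is hypothetical:

```python
import random
from typing import Optional


class GoalSampler:
    """Hypothetical stand-in for an env that owns a random generator."""

    def __init__(self):
        self.rng = random.Random()
        self.goal = None

    def _base_reset(self, seed: Optional[int]) -> None:
        # Plays the role of super().reset(seed=...): reseed only if a seed is given.
        if seed is not None:
            self.rng = random.Random(seed)

    def reset(self, seed: Optional[int] = None) -> float:
        self._base_reset(seed)                   # reseed first ...
        self.goal = self.rng.uniform(1.1, 1.6)   # ... then sample the goal
        return self.goal
```

With this ordering, two instances reset with the same seed draw the same goal.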
@ -6,6 +6,12 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
|
||||
|
||||
|
||||
class MPWrapper(RawInterfaceWrapper):
|
||||
mp_config = {
|
||||
'ProMP': {},
|
||||
'DMP': {},
|
||||
'ProDMP': {},
|
||||
}
|
||||
|
||||
@property
|
||||
def context_mask(self) -> np.ndarray:
|
||||
return np.hstack([
|
||||
|
@ -0,0 +1,52 @@
|
||||
<mujoco model="hopper">
|
||||
<compiler angle="degree" coordinate="global" inertiafromgeom="true"/>
|
||||
<default>
|
||||
<joint armature="1" damping="1" limited="true"/>
|
||||
<geom conaffinity="1" condim="1" contype="1" margin="0.001" material="geom" rgba="0.8 0.6 .4 1" solimp=".8 .8 .01" solref=".02 1"/>
|
||||
<motor ctrllimited="true" ctrlrange="-.4 .4"/>
|
||||
</default>
|
||||
<option integrator="RK4" timestep="0.002"/>
|
||||
<visual>
|
||||
<map znear="0.02"/>
|
||||
</visual>
|
||||
<worldbody>
|
||||
<light cutoff="100" diffuse="1 1 1" dir="-0 0 -1.3" directional="true" exponent="1" pos="0 0 1.3" specular=".1 .1 .1"/>
|
||||
<geom conaffinity="1" condim="3" name="floor" pos="0 0 0" rgba="0.8 0.9 0.8 1" size="20 20 .125" type="plane" material="MatPlane"/>
|
||||
<body name="torso" pos="0 0 1.25">
|
||||
<camera name="track" mode="trackcom" pos="0 -3 1" xyaxes="1 0 0 0 0 1"/>
|
||||
<joint armature="0" axis="1 0 0" damping="0" limited="false" name="rootx" pos="0 0 0" stiffness="0" type="slide"/>
|
||||
<joint armature="0" axis="0 0 1" damping="0" limited="false" name="rootz" pos="0 0 0" ref="1.25" stiffness="0" type="slide"/>
|
||||
<joint armature="0" axis="0 1 0" damping="0" limited="false" name="rooty" pos="0 0 1.25" stiffness="0" type="hinge"/>
|
||||
<geom friction="0.9" fromto="0 0 1.45 0 0 1.05" name="torso_geom" size="0.05" type="capsule"/>
|
||||
<body name="thigh" pos="0 0 1.05">
|
||||
<joint axis="0 -1 0" name="thigh_joint" pos="0 0 1.05" range="-150 0" type="hinge"/>
|
||||
<geom friction="0.9" fromto="0 0 1.05 0 0 0.6" name="thigh_geom" size="0.05" type="capsule"/>
|
||||
<body name="leg" pos="0 0 0.35">
|
||||
<joint axis="0 -1 0" name="leg_joint" pos="0 0 0.6" range="-150 0" type="hinge"/>
|
||||
<geom friction="0.9" fromto="0 0 0.6 0 0 0.1" name="leg_geom" size="0.04" type="capsule"/>
|
||||
<body name="foot" pos="0.13/2 0 0.1">
|
||||
<site name="foot_site" pos="0 0 0.04" size="0.02 0.02 0.02" rgba="1 0 0 1" type="sphere"/>
|
||||
<joint axis="0 -1 0" name="foot_joint" pos="0 0 0.1" range="-45 45" type="hinge"/>
|
||||
<geom friction="2.0" fromto="-0.13 0 0.1 0.26 0 0.1" name="foot_geom" size="0.06" type="capsule"/>
|
||||
</body>
|
||||
</body>
|
||||
</body>
|
||||
</body>
|
||||
<body name="goal_site_body" pos = "0 0 0">
|
||||
<site name="goal_site" pos="0 0 0.0" size="0.02 0.02 0.02" rgba="0 1 0 1" type="sphere"/>
|
||||
</body>
|
||||
</worldbody>
|
||||
<actuator>
|
||||
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="thigh_joint"/>
|
||||
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="leg_joint"/>
|
||||
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="foot_joint"/>
|
||||
</actuator>
|
||||
<asset>
|
||||
<texture type="skybox" builtin="gradient" rgb1=".4 .5 .6" rgb2="0 0 0"
|
||||
width="100" height="100"/>
|
||||
<texture builtin="flat" height="1278" mark="cross" markrgb="1 1 1" name="texgeom" random="0.01" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" type="cube" width="127"/>
|
||||
<texture builtin="checker" height="100" name="texplane" rgb1="0 0 0" rgb2="0.8 0.8 0.8" type="2d" width="100"/>
|
||||
<material name="MatPlane" reflectance="0.5" shininess="1" specular="1" texrepeat="60 60" texture="texplane"/>
|
||||
<material name="geom" texture="texgeom" texuniform="true"/>
|
||||
</asset>
|
||||
</mujoco>
|
@ -1,52 +1,51 @@
|
||||
<mujoco model="hopper">
|
||||
<compiler angle="degree" coordinate="global" inertiafromgeom="true"/>
|
||||
<default>
|
||||
<joint armature="1" damping="1" limited="true"/>
|
||||
<geom conaffinity="1" condim="1" contype="1" margin="0.001" material="geom" rgba="0.8 0.6 .4 1" solimp=".8 .8 .01" solref=".02 1"/>
|
||||
<motor ctrllimited="true" ctrlrange="-.4 .4"/>
|
||||
</default>
|
||||
<option integrator="RK4" timestep="0.002"/>
|
||||
<compiler angle="radian" autolimits="true"/>
|
||||
<option integrator="RK4"/>
|
||||
<visual>
|
||||
<map znear="0.02"/>
|
||||
</visual>
|
||||
<default class="main">
|
||||
<joint limited="true" armature="1" damping="1"/>
|
||||
<geom condim="1" solimp="0.8 0.8 0.01 0.5 2" margin="0.001" material="geom" rgba="0.8 0.6 0.4 1"/>
|
||||
<general ctrllimited="true" ctrlrange="-0.4 0.4"/>
|
||||
</default>
|
||||
<asset>
|
||||
<texture type="skybox" builtin="gradient" rgb1="0.4 0.5 0.6" rgb2="0 0 0" width="100" height="600"/>
|
||||
<texture type="cube" name="texgeom" builtin="flat" mark="cross" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" markrgb="1 1 1" width="127" height="762"/>
|
||||
<texture type="2d" name="texplane" builtin="checker" rgb1="0 0 0" rgb2="0.8 0.8 0.8" width="100" height="100"/>
|
||||
<material name="MatPlane" texture="texplane" texrepeat="60 60" specular="1" shininess="1" reflectance="0.5"/>
|
||||
<material name="geom" texture="texgeom" texuniform="true"/>
|
||||
</asset>
|
||||
<worldbody>
|
||||
<light cutoff="100" diffuse="1 1 1" dir="-0 0 -1.3" directional="true" exponent="1" pos="0 0 1.3" specular=".1 .1 .1"/>
|
||||
<geom conaffinity="1" condim="3" name="floor" pos="0 0 0" rgba="0.8 0.9 0.8 1" size="20 20 .125" type="plane" material="MatPlane"/>
|
||||
<body name="torso" pos="0 0 1.25">
|
||||
<camera name="track" mode="trackcom" pos="0 -3 1" xyaxes="1 0 0 0 0 1"/>
|
||||
<joint armature="0" axis="1 0 0" damping="0" limited="false" name="rootx" pos="0 0 0" stiffness="0" type="slide"/>
|
||||
<joint armature="0" axis="0 0 1" damping="0" limited="false" name="rootz" pos="0 0 0" ref="1.25" stiffness="0" type="slide"/>
|
||||
<joint armature="0" axis="0 1 0" damping="0" limited="false" name="rooty" pos="0 0 1.25" stiffness="0" type="hinge"/>
|
||||
<geom friction="0.9" fromto="0 0 1.45 0 0 1.05" name="torso_geom" size="0.05" type="capsule"/>
|
||||
<body name="thigh" pos="0 0 1.05">
|
||||
<joint axis="0 -1 0" name="thigh_joint" pos="0 0 1.05" range="-150 0" type="hinge"/>
|
||||
<geom friction="0.9" fromto="0 0 1.05 0 0 0.6" name="thigh_geom" size="0.05" type="capsule"/>
|
||||
<body name="leg" pos="0 0 0.35">
|
||||
<joint axis="0 -1 0" name="leg_joint" pos="0 0 0.6" range="-150 0" type="hinge"/>
|
||||
<geom friction="0.9" fromto="0 0 0.6 0 0 0.1" name="leg_geom" size="0.04" type="capsule"/>
|
||||
<body name="foot" pos="0.13/2 0 0.1">
|
||||
<site name="foot_site" pos="0 0 0.04" size="0.02 0.02 0.02" rgba="1 0 0 1" type="sphere"/>
|
||||
<joint axis="0 -1 0" name="foot_joint" pos="0 0 0.1" range="-45 45" type="hinge"/>
|
||||
<geom friction="2.0" fromto="-0.13 0 0.1 0.26 0 0.1" name="foot_geom" size="0.06" type="capsule"/>
|
||||
<geom name="floor" size="20 20 0.125" type="plane" condim="3" material="MatPlane" rgba="0.8 0.9 0.8 1"/>
|
||||
<light pos="0 0 1.3" dir="0 0 -1" directional="true" cutoff="100" exponent="1" diffuse="1 1 1" specular="0.1 0.1 0.1"/>
|
||||
<body name="torso" pos="0 0 1.25" gravcomp="0">
|
||||
<joint name="rootx" pos="0 0 -1.25" axis="1 0 0" limited="false" type="slide" armature="0" damping="0"/>
|
||||
<joint name="rootz" pos="0 0 -1.25" axis="0 0 1" limited="false" type="slide" ref="1.25" armature="0" damping="0"/>
|
||||
<joint name="rooty" pos="0 0 0" axis="0 1 0" limited="false" armature="0" damping="0"/>
|
||||
<geom name="torso_geom" size="0.05 0.2" type="capsule" friction="0.9 0.005 0.0001"/>
|
||||
<camera name="track" pos="0 -3 -0.25" quat="0.707107 0.707107 0 0" mode="trackcom"/>
|
||||
<body name="thigh" pos="0 0 -0.2" gravcomp="0">
|
||||
<joint name="thigh_joint" pos="0 0 0" axis="0 -1 0" range="-2.61799 0"/>
|
||||
<geom name="thigh_geom" size="0.05 0.225" pos="0 0 -0.225" type="capsule" friction="0.9 0.005 0.0001"/>
|
||||
<body name="leg" pos="0 0 -0.7" gravcomp="0">
|
||||
<joint name="leg_joint" pos="0 0 0.25" axis="0 -1 0" range="-2.61799 0"/>
|
||||
<geom name="leg_geom" size="0.04 0.25" type="capsule" friction="0.9 0.005 0.0001"/>
|
||||
<body name="foot" pos="0.065 0 -0.25" gravcomp="0">
|
||||
<joint name="foot_joint" pos="-0.065 0 0" axis="0 -1 0" range="-0.785398 0.785398"/>
|
||||
<geom name="foot_geom" size="0.06 0.195" quat="0.707107 0 -0.707107 0" type="capsule" friction="2 0.005 0.0001"/>
|
||||
<site name="foot_site" pos="-0.065 0 -0.06" size="0.02" rgba="1 0 0 1"/>
|
||||
</body>
|
||||
</body>
|
||||
</body>
|
||||
</body>
|
||||
<body name="goal_site_body" pos = "0 0 0">
|
||||
<site name="goal_site" pos="0 0 0.0" size="0.02 0.02 0.02" rgba="0 1 0 1" type="sphere"/>
|
||||
</body>
|
||||
<body name="goal_site_body" pos="0 0 0" gravcomp="0">
|
||||
<site name="goal_site" pos="0 0 0" size="0.02" rgba="0 1 0 1"/>
|
||||
</body>
|
||||
</worldbody>
|
||||
<actuator>
|
||||
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="thigh_joint"/>
|
||||
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="leg_joint"/>
|
||||
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="foot_joint"/>
|
||||
<general joint="thigh_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
|
||||
<general joint="leg_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
|
||||
<general joint="foot_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
|
||||
</actuator>
|
||||
<asset>
|
||||
<texture type="skybox" builtin="gradient" rgb1=".4 .5 .6" rgb2="0 0 0"
|
||||
width="100" height="100"/>
|
||||
<texture builtin="flat" height="1278" mark="cross" markrgb="1 1 1" name="texgeom" random="0.01" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" type="cube" width="127"/>
|
||||
<texture builtin="checker" height="100" name="texplane" rgb1="0 0 0" rgb2="0.8 0.8 0.8" type="2d" width="100"/>
|
||||
<material name="MatPlane" reflectance="0.5" shininess="1" specular="1" texrepeat="60 60" texture="texplane"/>
|
||||
<material name="geom" texture="texgeom" texuniform="true"/>
|
||||
</asset>
|
||||
</mujoco>
|
||||
|
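The rewritten model above switches the compiler from `angle="degree"` to `angle="radian"`, so the joint ranges change from `-150 0` and `-45 45` to `-2.61799 0` and `-0.785398 0.785398`. A short sketch that checks this conversion; the helper `range_to_radians` is hypothetical:

```python
import math
from typing import Tuple


def range_to_radians(lo_deg: float, hi_deg: float) -> Tuple[float, float]:
    """Convert a joint range given in degrees to radians."""
    return math.radians(lo_deg), math.radians(hi_deg)


# Joint ranges in degrees, taken from the old XML.
old_ranges_deg = {
    "thigh_joint": (-150, 0),
    "leg_joint": (-150, 0),
    "foot_joint": (-45, 45),
}

for name, (lo, hi) in old_ranges_deg.items():
    lo_rad, hi_rad = range_to_radians(lo, hi)
    print(name, f"{lo_rad:.6g}", f"{hi_rad:.6g}")
```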
@ -1,51 +1,50 @@
|
||||
<mujoco model="hopper">
|
||||
<compiler angle="degree" coordinate="global" inertiafromgeom="true"/>
|
||||
<default>
|
||||
<joint armature="1" damping="1" limited="true"/>
|
||||
<geom conaffinity="1" condim="1" contype="1" margin="0.001" material="geom" rgba="0.8 0.6 .4 1" solimp=".8 .8 .01" solref=".02 1"/>
|
||||
<motor ctrllimited="true" ctrlrange="-.4 .4"/>
|
||||
</default>
|
||||
<option integrator="RK4" timestep="0.002"/>
|
||||
<compiler angle="radian" autolimits="true"/>
|
||||
<option integrator="RK4"/>
|
||||
<visual>
|
||||
<map znear="0.02"/>
|
||||
</visual>
|
||||
<default class="main">
|
||||
<joint limited="true" armature="1" damping="1"/>
|
||||
<geom condim="1" solimp="0.8 0.8 0.01 0.5 2" margin="0.001" material="geom" rgba="0.8 0.6 0.4 1"/>
|
||||
<general ctrllimited="true" ctrlrange="-0.4 0.4"/>
|
||||
</default>
|
||||
<asset>
|
||||
<texture type="skybox" builtin="gradient" rgb1="0.4 0.5 0.6" rgb2="0 0 0" width="100" height="600"/>
|
||||
<texture type="cube" name="texgeom" builtin="flat" mark="cross" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" markrgb="1 1 1" width="127" height="762"/>
|
||||
<texture type="2d" name="texplane" builtin="checker" rgb1="0 0 0" rgb2="0.8 0.8 0.8" width="100" height="100"/>
|
||||
<material name="MatPlane" texture="texplane" texrepeat="60 60" specular="1" shininess="1" reflectance="0.5"/>
|
||||
<material name="geom" texture="texgeom" texuniform="true"/>
|
||||
</asset>
|
||||
<worldbody>
|
||||
<light cutoff="100" diffuse="1 1 1" dir="-0 0 -1.3" directional="true" exponent="1" pos="0 0 1.3" specular=".1 .1 .1"/>
|
||||
<geom conaffinity="1" condim="3" name="floor" pos="0 0 0" rgba="0.8 0.9 0.8 1" size="20 20 .125" type="plane" material="MatPlane"/>
|
||||
<body name="torso" pos="0 0 1.25">
|
||||
<camera name="track" mode="trackcom" pos="0 -3 1" xyaxes="1 0 0 0 0 1"/>
|
||||
<joint armature="0" axis="1 0 0" damping="0" limited="false" name="rootx" pos="0 0 0" stiffness="0" type="slide"/>
|
||||
<joint armature="0" axis="0 0 1" damping="0" limited="false" name="rootz" pos="0 0 0" ref="1.25" stiffness="0" type="slide"/>
|
||||
<joint armature="0" axis="0 1 0" damping="0" limited="false" name="rooty" pos="0 0 1.25" stiffness="0" type="hinge"/>
|
||||
<geom friction="0.9" fromto="0 0 1.45 0 0 1.05" name="torso_geom" size="0.05" type="capsule"/>
|
||||
<body name="thigh" pos="0 0 1.05">
|
||||
<joint axis="0 -1 0" name="thigh_joint" pos="0 0 1.05" range="-150 0" type="hinge"/>
|
||||
<geom friction="0.9" fromto="0 0 1.05 0 0 0.6" name="thigh_geom" size="0.05" type="capsule"/>
|
||||
<body name="leg" pos="0 0 0.35">
|
||||
<joint axis="0 -1 0" name="leg_joint" pos="0 0 0.6" range="-150 0" type="hinge"/>
|
||||
<geom friction="0.9" fromto="0 0 0.6 0 0 0.1" name="leg_geom" size="0.04" type="capsule"/>
|
||||
<body name="foot" pos="0.13/2 0 0.1">
|
||||
<joint axis="0 -1 0" name="foot_joint" pos="0 0 0.1" range="-45 45" type="hinge"/>
|
||||
<geom friction="2.0" fromto="-0.13 0 0.1 0.26 0 0.1" name="foot_geom" size="0.06" type="capsule"/>
|
||||
<geom name="floor" size="20 20 0.125" type="plane" condim="3" material="MatPlane" rgba="0.8 0.9 0.8 1"/>
|
||||
<light pos="0 0 1.3" dir="0 0 -1" directional="true" cutoff="100" exponent="1" diffuse="1 1 1" specular="0.1 0.1 0.1"/>
|
||||
<body name="torso" pos="0 0 1.25" gravcomp="0">
|
||||
<joint name="rootx" pos="0 0 -1.25" axis="1 0 0" limited="false" type="slide" armature="0" damping="0"/>
|
||||
<joint name="rootz" pos="0 0 -1.25" axis="0 0 1" limited="false" type="slide" ref="1.25" armature="0" damping="0"/>
|
||||
<joint name="rooty" pos="0 0 0" axis="0 1 0" limited="false" armature="0" damping="0"/>
|
||||
<geom name="torso_geom" size="0.05 0.2" type="capsule" friction="0.9 0.005 0.0001"/>
|
||||
<camera name="track" pos="0 -3 -0.25" quat="0.707107 0.707107 0 0" mode="trackcom"/>
|
||||
<body name="thigh" pos="0 0 -0.2" gravcomp="0">
|
||||
<joint name="thigh_joint" pos="0 0 0" axis="0 -1 0" range="-2.61799 0"/>
|
||||
<geom name="thigh_geom" size="0.05 0.225" pos="0 0 -0.225" type="capsule" friction="0.9 0.005 0.0001"/>
|
||||
<body name="leg" pos="0 0 -0.7" gravcomp="0">
|
||||
<joint name="leg_joint" pos="0 0 0.25" axis="0 -1 0" range="-2.61799 0"/>
|
||||
<geom name="leg_geom" size="0.04 0.25" type="capsule" friction="0.9 0.005 0.0001"/>
|
||||
<body name="foot" pos="0.065 0 -0.25" gravcomp="0">
|
||||
<joint name="foot_joint" pos="-0.065 0 0" axis="0 -1 0" range="-0.785398 0.785398"/>
|
||||
<geom name="foot_geom" size="0.06 0.195" quat="0.707107 0 -0.707107 0" type="capsule" friction="2 0.005 0.0001"/>
|
||||
</body>
|
||||
</body>
|
||||
</body>
|
||||
</body>
|
||||
<body name="box" pos="1 0 0">
|
||||
<geom friction="1.0" fromto="0.48 0 0 1 0 0" name="basket_ground_geom" size="0.3" type="box" rgba="1 0 0 1"/>
|
||||
<body name="box" pos="1 0 0" gravcomp="0">
|
||||
<geom name="basket_ground_geom" size="0.3 0.3 0.26" pos="-0.26 0 0" quat="0.707107 0 -0.707107 0" type="box" rgba="1 0 0 1"/>
|
||||
</body>
|
||||
</worldbody>
|
||||
<actuator>
|
||||
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="thigh_joint"/>
|
||||
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="leg_joint"/>
|
||||
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="foot_joint"/>
|
||||
<general joint="thigh_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
|
||||
<general joint="leg_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
|
||||
<general joint="foot_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
|
||||
</actuator>
|
||||
<asset>
|
||||
<texture type="skybox" builtin="gradient" rgb1=".4 .5 .6" rgb2="0 0 0"
|
||||
width="100" height="100"/>
|
||||
<texture builtin="flat" height="1278" mark="cross" markrgb="1 1 1" name="texgeom" random="0.01" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" type="cube" width="127"/>
|
||||
<texture builtin="checker" height="100" name="texplane" rgb1="0 0 0" rgb2="0.8 0.8 0.8" type="2d" width="100"/>
|
||||
<material name="MatPlane" reflectance="0.5" shininess="1" specular="1" texrepeat="60 60" texture="texplane"/>
|
||||
<material name="geom" texture="texgeom" texuniform="true"/>
|
||||
</asset>
|
||||
</mujoco>
|
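The converted hopper XML above switches the compiler from `angle="degree"` to `angle="radian"` and replaces global `fromto` capsules with a local `pos` plus a `size` of radius and half-length. The following is a minimal sketch that reproduces a few of those numbers by hand. The helper names `deg_range_to_rad` and `capsule_from_fromto` are hypothetical and are not part of the commit; how the files were actually converted is not stated in the diff.

```python
import math


def deg_range_to_rad(lo_deg, hi_deg):
    """Convert a joint range from degrees to radians, keeping 6 significant digits."""
    return tuple(float(f"{math.radians(v):.6g}") for v in (lo_deg, hi_deg))


def capsule_from_fromto(p0, p1, body_pos):
    """Return (half_length, local_center) for a capsule given its global end points.

    p0, p1   -- capsule end points in global coordinates (the old `fromto`)
    body_pos -- global position of the body that owns the geom
    """
    half_length = math.dist(p0, p1) / 2
    center = tuple((a + b) / 2 - c for a, b, c in zip(p0, p1, body_pos))
    return half_length, center


# Joint ranges taken from the old XML: thigh/leg "-150 0", foot "-45 45".
thigh_range = deg_range_to_rad(-150, 0)
foot_range = deg_range_to_rad(-45, 45)

# Old thigh_geom: fromto="0 0 1.05 0 0 0.6", owned by the thigh body at global z = 1.05.
thigh_half, thigh_center = capsule_from_fromto((0, 0, 1.05), (0, 0, 0.6), (0, 0, 1.05))
```

Under these assumptions the thigh range comes out as `-2.61799 0` and the foot range as `-0.785398 0.785398`, and the thigh capsule gets a half-length of about `0.225` centred at local z of about `-0.225`, which agrees with `size="0.05 0.225" pos="0 0 -0.225"` in the new file.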
@ -1,12 +1,95 @@
|
||||
import os
|
||||
|
||||
import numpy as np
|
||||
from gym.envs.mujoco.hopper_v4 import HopperEnv
|
||||
from gymnasium.envs.mujoco.hopper_v4 import HopperEnv, DEFAULT_CAMERA_CONFIG
|
||||
|
||||
from gymnasium import utils
|
||||
from gymnasium.envs.mujoco import MujocoEnv
|
||||
from gymnasium.spaces import Box
|
||||
|
||||
import mujoco
|
||||
|
||||
MAX_EPISODE_STEPS_HOPPERJUMP = 250
|
||||
|
||||
|
||||
class HopperJumpEnv(HopperEnv):
|
||||
class HopperEnvCustomXML(HopperEnv):
|
||||
"""
|
||||
Initialization changes to normal Hopper:
|
||||
- terminate_when_unhealthy: True -> False
|
||||
- healthy_reward: 1.0 -> 2.0
|
||||
- healthy_z_range: (0.7, float('inf')) -> (0.5, float('inf'))
|
||||
- healthy_angle_range: (-0.2, 0.2) -> (-float('inf'), float('inf'))
|
||||
- exclude_current_positions_from_observation: True -> False
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
xml_file,
|
||||
forward_reward_weight=1.0,
|
||||
ctrl_cost_weight=1e-3,
|
||||
healthy_reward=1.0,
|
||||
terminate_when_unhealthy=True,
|
||||
healthy_state_range=(-100.0, 100.0),
|
||||
healthy_z_range=(0.7, float("inf")),
|
||||
healthy_angle_range=(-0.2, 0.2),
|
||||
reset_noise_scale=5e-3,
|
||||
exclude_current_positions_from_observation=True,
|
||||
**kwargs,
|
||||
):
|
||||
xml_file = os.path.join(os.path.dirname(__file__), "assets", xml_file)
|
||||
utils.EzPickle.__init__(
|
||||
self,
|
||||
xml_file,
|
||||
forward_reward_weight,
|
||||
ctrl_cost_weight,
|
||||
healthy_reward,
|
||||
terminate_when_unhealthy,
|
||||
healthy_state_range,
|
||||
healthy_z_range,
|
||||
healthy_angle_range,
|
||||
reset_noise_scale,
|
||||
exclude_current_positions_from_observation,
|
||||
**kwargs
|
||||
)
|
||||
|
||||
self._forward_reward_weight = forward_reward_weight
|
||||
|
||||
self._ctrl_cost_weight = ctrl_cost_weight
|
||||
|
||||
self._healthy_reward = healthy_reward
|
||||
self._terminate_when_unhealthy = terminate_when_unhealthy
|
||||
|
||||
self._healthy_state_range = healthy_state_range
|
||||
self._healthy_z_range = healthy_z_range
|
||||
self._healthy_angle_range = healthy_angle_range
|
||||
|
||||
self._reset_noise_scale = reset_noise_scale
|
||||
|
||||
self._exclude_current_positions_from_observation = (
|
||||
exclude_current_positions_from_observation
|
||||
)
|
||||
|
||||
if not hasattr(self, 'observation_space'):
|
||||
if exclude_current_positions_from_observation:
|
||||
self.observation_space = Box(
|
||||
low=-np.inf, high=np.inf, shape=(15,), dtype=np.float64
|
||||
)
|
||||
else:
|
||||
self.observation_space = Box(
|
||||
low=-np.inf, high=np.inf, shape=(16,), dtype=np.float64
|
||||
)
|
||||
|
||||
MujocoEnv.__init__(
|
||||
self,
|
||||
xml_file,
|
||||
4,
|
||||
observation_space=self.observation_space,
|
||||
default_camera_config=DEFAULT_CAMERA_CONFIG,
|
||||
**kwargs,
|
||||
)
|
||||
|
||||
|
||||
class HopperJumpEnv(HopperEnvCustomXML):
|
||||
"""
|
||||
Initialization changes to normal Hopper:
|
||||
- terminate_when_unhealthy: True -> False
|
||||
@ -73,7 +156,7 @@ class HopperJumpEnv(HopperEnv):
|
||||
self.do_simulation(action, self.frame_skip)
|
||||
|
||||
height_after = self.get_body_com("torso")[2]
|
||||
#site_pos_after = self.data.get_site_xpos('foot_site')
|
||||
# site_pos_after = self.data.get_site_xpos('foot_site')
|
||||
site_pos_after = self.data.site('foot_site').xpos
|
||||
self.max_height = max(height_after, self.max_height)
|
||||
|
||||
@ -88,7 +171,8 @@ class HopperJumpEnv(HopperEnv):
|
||||
|
||||
ctrl_cost = self.control_cost(action)
|
||||
costs = ctrl_cost
|
||||
done = False
|
||||
terminated = False
|
||||
truncated = False
|
||||
|
||||
goal_dist = np.linalg.norm(site_pos_after - self.goal)
|
||||
if self.contact_dist is None and self.contact_with_floor:
|
||||
@ -115,7 +199,7 @@ class HopperJumpEnv(HopperEnv):
|
||||
healthy=self.is_healthy,
|
||||
contact_dist=self.contact_dist or 0
|
||||
)
|
||||
return observation, reward, done, info
|
||||
return observation, reward, terminated, truncated, info
|
||||
|
||||
def _get_obs(self):
|
||||
# goal_dist = self.data.get_site_xpos('foot_site') - self.goal
|
||||
@ -140,8 +224,8 @@ class HopperJumpEnv(HopperEnv):
|
||||
noise_high[5] = 0.785
|
||||
|
||||
qpos = (
|
||||
self.np_random.uniform(low=noise_low, high=noise_high, size=self.model.nq) +
|
||||
self.init_qpos
|
||||
self.np_random.uniform(low=noise_low, high=noise_high, size=self.model.nq) +
|
||||
self.init_qpos
|
||||
)
|
||||
qvel = (
|
||||
# self.np_random.uniform(low=noise_low, high=noise_high, size=self.model.nv) +
|
||||
@ -162,12 +246,12 @@ class HopperJumpEnv(HopperEnv):
|
||||
# floor_geom_id = self.model.geom_name2id('floor')
|
||||
# foot_geom_id = self.model.geom_name2id('foot_geom')
|
||||
# TODO: do this properly over a sensor in the xml file, see dmc hopper
|
||||
floor_geom_id = self._mujoco_bindings.mj_name2id(self.model,
|
||||
self._mujoco_bindings.mjtObj.mjOBJ_GEOM,
|
||||
'floor')
|
||||
foot_geom_id = self._mujoco_bindings.mj_name2id(self.model,
|
||||
self._mujoco_bindings.mjtObj.mjOBJ_GEOM,
|
||||
'foot_geom')
|
||||
floor_geom_id = mujoco.mj_name2id(self.model,
|
||||
mujoco.mjtObj.mjOBJ_GEOM,
|
||||
'floor')
|
||||
foot_geom_id = mujoco.mj_name2id(self.model,
|
||||
mujoco.mjtObj.mjOBJ_GEOM,
|
||||
'foot_geom')
|
||||
for i in range(self.data.ncon):
|
||||
contact = self.data.contact[i]
|
||||
collision = contact.geom1 == floor_geom_id and contact.geom2 == foot_geom_id
|
||||
|
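The last hunk above resolves the `floor` and `foot_geom` ids with `mujoco.mj_name2id` and then compares `contact.geom1` and `contact.geom2` against them. As a rough illustration of that kind of check, here is a sketch that tests a geom pair in either order. It uses a plain `namedtuple` as a stand-in for a MuJoCo contact record, and `geoms_in_contact` is a hypothetical helper, not a function from the repository or from MuJoCo.

```python
from collections import namedtuple

# Stand-in for a MuJoCo contact record; only the two geom ids are modelled here.
Contact = namedtuple("Contact", ["geom1", "geom2"])


def geoms_in_contact(contacts, geom_a, geom_b):
    """Return True if any contact involves geom_a and geom_b, in either order."""
    wanted = {geom_a, geom_b}
    return any({c.geom1, c.geom2} == wanted for c in contacts)


# Illustrative ids only: 0 = floor, 5 = foot, 1 = torso.
contacts = [Contact(3, 7), Contact(5, 0)]
foot_on_floor = geoms_in_contact(contacts, geom_a=0, geom_b=5)
torso_on_floor = geoms_in_contact(contacts, geom_a=0, geom_b=1)
```

Comparing the pair as a set means the result does not depend on which geom is reported first.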
@ -1,12 +1,16 @@
|
||||
import os
|
||||
from typing import Optional, Dict, Any, Tuple
|
||||
|
||||
import numpy as np
|
||||
from gym.envs.mujoco.hopper_v4 import HopperEnv
|
||||
from gymnasium.core import ObsType
|
||||
from fancy_gym.envs.mujoco.hopper_jump.hopper_jump import HopperEnvCustomXML
|
||||
from gymnasium import spaces
|
||||
|
||||
|
||||
MAX_EPISODE_STEPS_HOPPERJUMPONBOX = 250
|
||||
|
||||
|
||||
class HopperJumpOnBoxEnv(HopperEnv):
|
||||
class HopperJumpOnBoxEnv(HopperEnvCustomXML):
|
||||
"""
|
||||
Initialization changes to normal Hopper:
|
||||
- healthy_reward: 1.0 -> 0.01 -> 0.001
|
||||
@ -33,6 +37,16 @@ class HopperJumpOnBoxEnv(HopperEnv):
|
||||
self.hopper_on_box = False
|
||||
self.context = context
|
||||
self.box_x = 1
|
||||
|
||||
if exclude_current_positions_from_observation:
|
||||
self.observation_space = spaces.Box(
|
||||
low=-np.inf, high=np.inf, shape=(12,), dtype=np.float64
|
||||
)
|
||||
else:
|
||||
self.observation_space = spaces.Box(
|
||||
low=-np.inf, high=np.inf, shape=(13,), dtype=np.float64
|
||||
)
|
||||
|
||||
xml_file = os.path.join(os.path.dirname(__file__), "assets", xml_file)
|
||||
super().__init__(xml_file, forward_reward_weight, ctrl_cost_weight, healthy_reward, terminate_when_unhealthy,
|
||||
healthy_state_range, healthy_z_range, healthy_angle_range, reset_noise_scale,
|
||||
@ -74,10 +88,10 @@ class HopperJumpOnBoxEnv(HopperEnv):
|
||||
|
||||
costs = ctrl_cost
|
||||
|
||||
done = fell_over or self.hopper_on_box
|
||||
terminated = fell_over or self.hopper_on_box
|
||||
|
||||
if self.current_step >= self.max_episode_steps or done:
|
||||
done = False
|
||||
if self.current_step >= self.max_episode_steps or terminated:
|
||||
done = False # TODO why are we doing this???
|
||||
|
||||
max_height = self.max_height.copy()
|
||||
min_distance = self.min_distance.copy()
|
||||
@ -122,21 +136,25 @@ class HopperJumpOnBoxEnv(HopperEnv):
|
||||
'goal': self.box_x,
|
||||
}
|
||||
|
||||
return observation, reward, done, info
|
||||
truncated = self.current_step >= self.max_episode_steps and not terminated
|
||||
|
||||
return observation, reward, terminated, truncated, info
|
||||
|
||||
def _get_obs(self):
|
||||
return np.append(super()._get_obs(), self.box_x)
|
||||
|
||||
def reset(self):
|
||||
def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
|
||||
-> Tuple[ObsType, Dict[str, Any]]:
|
||||
|
||||
self.max_height = 0
|
||||
self.min_distance = 5000
|
||||
self.current_step = 0
|
||||
self.hopper_on_box = False
|
||||
ret = super().reset(seed=seed, options=options)
|
||||
if self.context:
|
||||
self.box_x = self.np_random.uniform(1, 3, 1)
|
||||
self.model.body("box").pos = [self.box_x[0], 0, 0]
|
||||
return super().reset()
|
||||
return ret
|
||||
|
||||
# overwrite reset_model to make it deterministic
|
||||
def reset_model(self):
|
||||
@ -150,21 +168,3 @@ class HopperJumpOnBoxEnv(HopperEnv):
|
||||
|
||||
observation = self._get_obs()
|
||||
return observation
|
||||
|
||||
if __name__ == '__main__':
|
||||
render_mode = "human" # "human" or "partial" or "final"
|
||||
env = HopperJumpOnBoxEnv()
|
||||
obs = env.reset()
|
||||
|
||||
for i in range(2000):
|
||||
# objective.load_result("/tmp/cma")
|
||||
# test with random actions
|
||||
ac = env.action_space.sample()
|
||||
obs, rew, d, info = env.step(ac)
|
||||
if i % 10 == 0:
|
||||
env.render(mode=render_mode)
|
||||
if d:
|
||||
print('After ', i, ' steps, done: ', d)
|
||||
env.reset()
|
||||
|
||||
env.close()
|
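The `step` changes in this file replace the single `done` flag with the `terminated` and `truncated` pair that the Gymnasium step API returns. The sketch below isolates the two expressions used in the diff (`terminated = fell_over or self.hopper_on_box` and `truncated = self.current_step >= self.max_episode_steps and not terminated`) as a pure function. The function names and the `reached_goal` parameter are illustrative assumptions, not code from the commit.

```python
def episode_flags(current_step, max_episode_steps, fell_over, reached_goal):
    """Split the end-of-episode condition into Gymnasium-style flags.

    terminated -- the task itself ended (failure or success)
    truncated  -- the step budget ran out without the task ending
    """
    terminated = fell_over or reached_goal
    truncated = current_step >= max_episode_steps and not terminated
    return terminated, truncated


def legacy_done(terminated, truncated):
    """Collapse the two flags back into the single `done` of the old 4-tuple API."""
    return terminated or truncated


# Time limit hit without falling or reaching the box: truncated only.
timeout_flags = episode_flags(250, 250, fell_over=False, reached_goal=False)

# Falling on the very last step counts as termination, not truncation.
fall_flags = episode_flags(250, 250, fell_over=True, reached_goal=False)
```

Callers that still expect the old four-value return can derive `done` with `legacy_done(*flags)`.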
@ -6,6 +6,11 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
|
||||
|
||||
|
||||
class MPWrapper(RawInterfaceWrapper):
|
||||
mp_config = {
|
||||
'ProMP': {},
|
||||
'DMP': {},
|
||||
'ProDMP': {},
|
||||
}
|
||||
|
||||
# Random x goal + random init pos
|
||||
@property
|
||||
|
@ -1,56 +1,54 @@
|
||||
<mujoco model="hopper">
|
||||
<compiler angle="degree" coordinate="global" inertiafromgeom="true"/>
|
||||
<default>
|
||||
<joint armature="1" damping="1" limited="true"/>
|
||||
<geom conaffinity="1" condim="1" contype="1" margin="0.001" material="geom" rgba="0.8 0.6 .4 1" solimp=".8 .8 .01" solref=".02 1"/>
|
||||
<motor ctrllimited="true" ctrlrange="-.4 .4"/>
|
||||
</default>
|
||||
<option integrator="RK4" timestep="0.002"/>
|
||||
<compiler angle="radian" autolimits="true"/>
|
||||
<option integrator="RK4"/>
|
||||
<visual>
|
||||
<map znear="0.02"/>
|
||||
</visual>
|
||||
<default class="main">
|
||||
<joint limited="true" armature="1" damping="1"/>
|
||||
<geom condim="1" solimp="0.8 0.8 0.01 0.5 2" margin="0.001" material="geom" rgba="0.8 0.6 0.4 1"/>
|
||||
<general ctrllimited="true" ctrlrange="-0.4 0.4"/>
|
||||
</default>
|
||||
<asset>
|
||||
<texture type="skybox" builtin="gradient" rgb1="0.4 0.5 0.6" rgb2="0 0 0" width="100" height="600"/>
|
||||
<texture type="cube" name="texgeom" builtin="flat" mark="cross" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" markrgb="1 1 1" width="127" height="762"/>
|
||||
<texture type="2d" name="texplane" builtin="checker" rgb1="0 0 0" rgb2="0.8 0.8 0.8" width="100" height="100"/>
|
||||
<material name="MatPlane" texture="texplane" texrepeat="60 60" specular="1" shininess="1" reflectance="0.5"/>
|
||||
<material name="geom" texture="texgeom" texuniform="true"/>
|
||||
</asset>
|
||||
<worldbody>
|
||||
<light cutoff="100" diffuse="1 1 1" dir="-0 0 -1.3" directional="true" exponent="1" pos="0 0 1.3" specular=".1 .1 .1"/>
|
||||
<geom conaffinity="1" condim="3" name="floor" pos="0 0 0" rgba="0.8 0.9 0.8 1" size="20 20 .125" type="plane" material="MatPlane"/>
|
||||
<body name="torso" pos="0 0 1.25">
|
||||
<camera name="track" mode="trackcom" pos="0 -3 1" xyaxes="1 0 0 0 0 1"/>
|
||||
<joint armature="0" axis="1 0 0" damping="0" limited="false" name="rootx" pos="0 0 0" stiffness="0" type="slide"/>
|
||||
<joint armature="0" axis="0 0 1" damping="0" limited="false" name="rootz" pos="0 0 0" ref="1.25" stiffness="0" type="slide"/>
|
||||
<joint armature="0" axis="0 1 0" damping="0" limited="false" name="rooty" pos="0 0 1.25" stiffness="0" type="hinge"/>
|
||||
<geom friction="0.9" fromto="0 0 1.45 0 0 1.05" name="torso_geom" size="0.05" type="capsule"/>
|
||||
<body name="thigh" pos="0 0 1.05">
|
||||
<joint axis="0 -1 0" name="thigh_joint" pos="0 0 1.05" range="-150 0" type="hinge"/>
|
||||
<geom friction="0.9" fromto="0 0 1.05 0 0 0.6" name="thigh_geom" size="0.05" type="capsule"/>
|
||||
<body name="leg" pos="0 0 0.35">
|
||||
<joint axis="0 -1 0" name="leg_joint" pos="0 0 0.6" range="-150 0" type="hinge"/>
|
||||
<geom friction="0.9" fromto="0 0 0.6 0 0 0.1" name="leg_geom" size="0.04" type="capsule"/>
|
||||
<body name="foot" pos="0.13/2 0 0.1">
|
||||
<joint axis="0 -1 0" name="foot_joint" pos="0 0 0.1" range="-45 45" type="hinge"/>
|
||||
<geom friction="2.0" fromto="-0.13 0 0.1 0.26 0 0.1" name="foot_geom" size="0.06" type="capsule"/>
|
||||
<geom name="floor" size="20 20 0.125" type="plane" condim="3" material="MatPlane" rgba="0.8 0.9 0.8 1"/>
|
||||
<light pos="0 0 1.3" dir="0 0 -1" directional="true" cutoff="100" exponent="1" diffuse="1 1 1" specular="0.1 0.1 0.1"/>
|
||||
<body name="torso" pos="0 0 1.25" gravcomp="0">
|
||||
<joint name="rootx" pos="0 0 -1.25" axis="1 0 0" limited="false" type="slide" armature="0" damping="0"/>
|
||||
<joint name="rootz" pos="0 0 -1.25" axis="0 0 1" limited="false" type="slide" ref="1.25" armature="0" damping="0"/>
|
||||
<joint name="rooty" pos="0 0 0" axis="0 1 0" limited="false" armature="0" damping="0"/>
|
||||
<geom name="torso_geom" size="0.05 0.2" type="capsule" friction="0.9 0.005 0.0001"/>
|
||||
<camera name="track" pos="0 -3 -0.25" quat="0.707107 0.707107 0 0" mode="trackcom"/>
|
||||
<body name="thigh" pos="0 0 -0.2" gravcomp="0">
|
||||
<joint name="thigh_joint" pos="0 0 0" axis="0 -1 0" range="-2.61799 0"/>
|
||||
<geom name="thigh_geom" size="0.05 0.225" pos="0 0 -0.225" type="capsule" friction="0.9 0.005 0.0001"/>
|
||||
<body name="leg" pos="0 0 -0.7" gravcomp="0">
|
||||
<joint name="leg_joint" pos="0 0 0.25" axis="0 -1 0" range="-2.61799 0"/>
|
||||
<geom name="leg_geom" size="0.04 0.25" type="capsule" friction="0.9 0.005 0.0001"/>
|
||||
<body name="foot" pos="0.065 0 -0.25" gravcomp="0">
|
||||
<joint name="foot_joint" pos="-0.065 0 0" axis="0 -1 0" range="-0.785398 0.785398"/>
|
||||
<geom name="foot_geom" size="0.06 0.195" quat="0.707107 0 -0.707107 0" type="capsule" friction="2 0.005 0.0001"/>
|
||||
</body>
|
||||
</body>
|
||||
</body>
|
||||
</body>
|
||||
<body name="ball" pos="0 0 1.53">
|
||||
<joint armature="0" axis="1 0 0" damping="0.0" name="tar:x" pos="0 0 1.53" stiffness="0" type="slide" frictionloss="0" limited="false"/>
|
||||
<joint armature="0" axis="0 1 0" damping="0.0" name="tar:y" pos="0 0 1.53" stiffness="0" type="slide" frictionloss="0" limited="false"/>
|
||||
<joint armature="0" axis="0 0 1" damping="0.0" name="tar:z" pos="0 0 1.53" stiffness="0" type="slide" frictionloss="0" limited="false"/>
|
||||
<geom pos="0 0 1.53" priority= "1" size="0.025 0.025 0.025" type="sphere" condim="4" name="ball_geom" rgba="0.8 0.2 0.1 1" mass="0.1"
|
||||
friction="0.1 0.1 0.1" solimp="0.9 0.95 0.001 0.5 2" solref="-10000 -10"/>
|
||||
<site name="target_ball" pos="0 0 1.53" size="0.04 0.04 0.04" rgba="1 0 0 1" type="sphere"/>
|
||||
<body name="ball" pos="0 0 1.53" gravcomp="0">
|
||||
<joint name="tar:x" pos="0 0 0" axis="1 0 0" limited="false" type="slide" armature="0" damping="0"/>
|
||||
<joint name="tar:y" pos="0 0 0" axis="0 1 0" limited="false" type="slide" armature="0" damping="0"/>
|
||||
<joint name="tar:z" pos="0 0 0" axis="0 0 1" limited="false" type="slide" armature="0" damping="0"/>
|
||||
<geom name="ball_geom" size="0.025" condim="4" priority="1" friction="0.1 0.1 0.1" solref="-10000 -10" solimp="0.9 0.95 0.001 0.5 2" mass="0.1" rgba="0.8 0.2 0.1 1"/>
|
||||
<site name="target_ball" pos="0 0 0" size="0.04" rgba="1 0 0 1"/>
|
||||
</body>
|
||||
</worldbody>
|
||||
<actuator>
|
||||
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="thigh_joint"/>
|
||||
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="leg_joint"/>
|
||||
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="foot_joint"/>
|
||||
<general joint="thigh_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
|
||||
<general joint="leg_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
|
||||
<general joint="foot_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
|
||||
</actuator>
|
||||
<asset>
|
||||
<texture type="skybox" builtin="gradient" rgb1=".4 .5 .6" rgb2="0 0 0"
|
||||
width="100" height="100"/>
|
||||
<texture builtin="flat" height="1278" mark="cross" markrgb="1 1 1" name="texgeom" random="0.01" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" type="cube" width="127"/>
|
||||
<texture builtin="checker" height="100" name="texplane" rgb1="0 0 0" rgb2="0.8 0.8 0.8" type="2d" width="100"/>
|
||||
<material name="MatPlane" reflectance="0.5" shininess="1" specular="1" texrepeat="60 60" texture="texplane"/>
|
||||
<material name="geom" texture="texgeom" texuniform="true"/>
|
||||
</asset>
|
||||
</mujoco>
|
||||
|
@ -1,132 +1,129 @@
|
||||
<mujoco model="hopper">
|
||||
<compiler angle="degree" coordinate="global" inertiafromgeom="true"/>
|
||||
<default>
|
||||
<joint armature="1" damping="1" limited="true"/>
|
||||
<geom conaffinity="1" condim="1" contype="1" margin="0.001" material="geom" rgba="0.8 0.6 .4 1" solimp=".8 .8 .01" solref=".02 1"/>
|
||||
<motor ctrllimited="true" ctrlrange="-.4 .4"/>
|
||||
</default>
|
||||
<option integrator="RK4" timestep="0.002"/>
|
||||
<compiler angle="radian" autolimits="true"/>
|
||||
<option integrator="RK4"/>
|
||||
<visual>
|
||||
<map znear="0.02"/>
|
||||
</visual>
|
||||
<default class="main">
|
||||
<joint limited="true" armature="1" damping="1"/>
|
||||
<geom condim="1" solimp="0.8 0.8 0.01 0.5 2" margin="0.001" material="geom" rgba="0.8 0.6 0.4 1"/>
|
||||
<general ctrllimited="true" ctrlrange="-0.4 0.4"/>
|
||||
</default>
|
||||
<asset>
|
||||
<texture type="skybox" builtin="gradient" rgb1="0.4 0.5 0.6" rgb2="0 0 0" width="100" height="600"/>
|
||||
<texture type="cube" name="texgeom" builtin="flat" mark="cross" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" markrgb="1 1 1" width="127" height="762"/>
|
||||
<texture type="2d" name="texplane" builtin="checker" rgb1="0 0 0" rgb2="0.8 0.8 0.8" width="100" height="100"/>
|
||||
<material name="MatPlane" texture="texplane" texrepeat="60 60" specular="1" shininess="1" reflectance="0.5"/>
|
||||
<material name="geom" texture="texgeom" texuniform="true"/>
|
||||
</asset>
|
||||
<worldbody>
|
||||
<light cutoff="100" diffuse="1 1 1" dir="-0 0 -1.3" directional="true" exponent="1" pos="0 0 1.3" specular=".1 .1 .1"/>
|
||||
<geom conaffinity="1" condim="3" name="floor" pos="0 0 0" rgba="0.8 0.9 0.8 1" size="20 20 .125" type="plane" material="MatPlane"/>
|
||||
<body name="torso" pos="0 0 1.25">
|
||||
<camera name="track" mode="trackcom" pos="0 -3 1" xyaxes="1 0 0 0 0 1"/>
|
||||
<joint armature="0" axis="1 0 0" damping="0" limited="false" name="rootx" pos="0 0 0" stiffness="0" type="slide"/>
|
||||
<joint armature="0" axis="0 0 1" damping="0" limited="false" name="rootz" pos="0 0 0" ref="1.25" stiffness="0" type="slide"/>
|
||||
<joint armature="0" axis="0 1 0" damping="0" limited="false" name="rooty" pos="0 0 1.25" stiffness="0" type="hinge"/>
|
||||
<geom friction="0.9" fromto="0 0 1.45 0 0 1.05" name="torso_geom" size="0.05" type="capsule"/>
|
||||
<body name="thigh" pos="0 0 1.05">
|
||||
<joint axis="0 -1 0" name="thigh_joint" pos="0 0 1.05" range="-150 0" type="hinge"/>
|
||||
<geom friction="0.9" fromto="0 0 1.05 0 0 0.6" name="thigh_geom" size="0.05" type="capsule"/>
|
||||
<body name="leg" pos="0 0 0.35">
|
||||
<joint axis="0 -1 0" name="leg_joint" pos="0 0 0.6" range="-150 0" type="hinge"/>
|
||||
<geom friction="0.9" fromto="0 0 0.6 0 0 0.1" name="leg_geom" size="0.04" type="capsule"/>
|
||||
<body name="foot" pos="0.13/2 0 0.1">
|
||||
<joint axis="0 -1 0" name="foot_joint" pos="0 0 0.1" range="-45 45" type="hinge"/>
|
||||
<geom friction="2.0" fromto="-0.13 0 0.1 0.26 0 0.1" name="foot_geom" size="0.06" type="capsule"/>
|
||||
<geom name="floor" size="20 20 0.125" type="plane" condim="3" material="MatPlane" rgba="0.8 0.9 0.8 1"/>
|
||||
<light pos="0 0 1.3" dir="0 0 -1" directional="true" cutoff="100" exponent="1" diffuse="1 1 1" specular="0.1 0.1 0.1"/>
|
||||
<body name="torso" pos="0 0 1.25" gravcomp="0">
|
||||
<joint name="rootx" pos="0 0 -1.25" axis="1 0 0" limited="false" type="slide" armature="0" damping="0"/>
|
||||
<joint name="rootz" pos="0 0 -1.25" axis="0 0 1" limited="false" type="slide" ref="1.25" armature="0" damping="0"/>
|
||||
<joint name="rooty" pos="0 0 0" axis="0 1 0" limited="false" armature="0" damping="0"/>
|
||||
<geom name="torso_geom" size="0.05 0.2" type="capsule" friction="0.9 0.005 0.0001"/>
|
||||
<camera name="track" pos="0 -3 -0.25" quat="0.707107 0.707107 0 0" mode="trackcom"/>
|
||||
<body name="thigh" pos="0 0 -0.2" gravcomp="0">
|
||||
<joint name="thigh_joint" pos="0 0 0" axis="0 -1 0" range="-2.61799 0"/>
|
||||
<geom name="thigh_geom" size="0.05 0.225" pos="0 0 -0.225" type="capsule" friction="0.9 0.005 0.0001"/>
|
||||
<body name="leg" pos="0 0 -0.7" gravcomp="0">
|
||||
<joint name="leg_joint" pos="0 0 0.25" axis="0 -1 0" range="-2.61799 0"/>
|
||||
<geom name="leg_geom" size="0.04 0.25" type="capsule" friction="0.9 0.005 0.0001"/>
|
||||
<body name="foot" pos="0.065 0 -0.25" gravcomp="0">
|
||||
<joint name="foot_joint" pos="-0.065 0 0" axis="0 -1 0" range="-0.785398 0.785398"/>
|
||||
<geom name="foot_geom" size="0.06 0.195" quat="0.707107 0 -0.707107 0" type="capsule" friction="2 0.005 0.0001"/>
|
||||
</body>
|
||||
</body>
|
||||
</body>
|
||||
</body>
|
||||
<body name="ball" pos="0 0 1.53">
|
||||
<joint armature="0" axis="1 0 0" damping="0.0" name="tar:x" pos="0 0 1.53" stiffness="0" type="slide" frictionloss="0" limited="false"/>
|
||||
<joint armature="0" axis="0 1 0" damping="0.0" name="tar:y" pos="0 0 1.53" stiffness="0" type="slide" frictionloss="0" limited="false"/>
|
||||
<joint armature="0" axis="0 0 1" damping="0.0" name="tar:z" pos="0 0 1.53" stiffness="0" type="slide" frictionloss="0" limited="false"/>
|
||||
<geom pos="0 0 1.53" priority= "1" size="0.025 0.025 0.025" type="sphere" condim="4" name="ball_geom" rgba="0.8 0.2 0.1 1" mass="0.1"
|
||||
friction="0.1 0.1 0.1" solimp="0.9 0.95 0.001 0.5 2" solref="-10000 -10"/>
|
||||
<site name="target_ball" pos="0 0 1.53" size="0.04 0.04 0.04" rgba="1 0 0 1" type="sphere"/>
|
||||
<body name="ball" pos="0 0 1.53" gravcomp="0">
|
||||
<joint name="tar:x" pos="0 0 0" axis="1 0 0" limited="false" type="slide" armature="0" damping="0"/>
|
||||
<joint name="tar:y" pos="0 0 0" axis="0 1 0" limited="false" type="slide" armature="0" damping="0"/>
|
||||
<joint name="tar:z" pos="0 0 0" axis="0 0 1" limited="false" type="slide" armature="0" damping="0"/>
|
||||
<geom name="ball_geom" size="0.025" condim="4" priority="1" friction="0.1 0.1 0.1" solref="-10000 -10" solimp="0.9 0.95 0.001 0.5 2" mass="0.1" rgba="0.8 0.2 0.1 1"/>
|
||||
<site name="target_ball" pos="0 0 0" size="0.04" rgba="1 0 0 1"/>
|
||||
</body>
|
||||
<body name="basket_ground" pos="5 0 0">
|
||||
<geom friction="0.9" fromto="5 0 0 5.3 0 0" name="basket_ground_geom" size="0.1 0.4 0.3" type="box"/>
|
||||
<body name="edge1" pos="5 0 0">
|
||||
<geom friction="2.0" fromto="5 0 0 5 0 0.2" name="edge1_geom" size="0.04" type="capsule"/>
|
||||
</body>
|
||||
<body name="edge2" pos="5 0 0.05">
|
||||
<geom friction="2.0" fromto="5 0.05 0 5 0.05 0.2" name="edge2_geom" size="0.04" type="capsule"/>
|
||||
</body>
|
||||
<body name="edge3" pos="5 0 0.1">
|
||||
<geom friction="2.0" fromto="5 0.1 0 5 0.1 0.2" name="edge3_geom" size="0.04" type="capsule"/>
|
||||
</body>
|
||||
<body name="edge4" pos="5 0 0.15">
|
||||
<geom friction="2.0" fromto="5 0.15 0 5 0.15 0.2" name="edge4_geom" size="0.04" type="capsule"/>
|
||||
</body>
|
||||
<body name="edge5" pos="5.05 0 0.15">
|
||||
<geom friction="2.0" fromto="5.05 0.15 0 5.05 0.15 0.2" name="edge5_geom" size="0.04" type="capsule"/>
|
||||
</body>
|
||||
<body name="edge6" pos="5.1 0 0.15">
|
||||
<geom friction="2.0" fromto="5.1 0.15 0 5.1 0.15 0.2" name="edge6_geom" size="0.04" type="capsule"/>
|
||||
</body>
|
||||
<body name="edge7" pos="5.15 0 0.15">
|
||||
<geom friction="2.0" fromto="5.15 0.15 0 5.15 0.15 0.2" name="edge7_geom" size="0.04" type="capsule"/>
|
||||
</body>
|
||||
<body name="edge8" pos="5.2 0 0.15">
|
||||
<geom friction="2.0" fromto="5.2 0.15 0 5.2 0.15 0.2" name="edge8_geom" size="0.04" type="capsule"/>
|
||||
</body>
|
||||
<body name="edge9" pos="5.25 0 0.15">
|
||||
<geom friction="2.0" fromto="5.25 0.15 0 5.25 0.15 0.2" name="edge9_geom" size="0.04" type="capsule"/>
|
||||
</body>
|
||||
<body name="edge10" pos="5.3 0 0.15">
|
||||
<geom friction="2.0" fromto="5.3 0.15 0 5.3 0.15 0.2" name="edge10_geom" size="0.04" type="capsule"/>
|
||||
</body>
|
||||
<body name="edge11" pos="5.3 0 0.1">
|
||||
<geom friction="2.0" fromto="5.3 0.1 0 5.3 0.1 0.2" name="edge11_geom" size="0.04" type="capsule"/>
|
||||
</body>
|
||||
<body name="edge12" pos="5.3 0 0.05">
|
||||
<geom friction="2.0" fromto="5.3 0.05 0 5.3 0.05 0.2" name="edge12_geom" size="0.04" type="capsule"/>
|
||||
</body>
|
||||
<body name="edge13" pos="5.3 0 0.0">
|
||||
<geom friction="2.0" fromto="5.3 0 0 5.3 0 0.2" name="edge13_geom" size="0.04" type="capsule"/>
|
||||
</body>
|
||||
<body name="edge14" pos="5.3 0 -0.05">
|
||||
<geom friction="2.0" fromto="5.3 -0.05 0 5.3 -0.05 0.2" name="edge14_geom" size="0.04" type="capsule"/>
|
||||
</body>
|
||||
<body name="edge15" pos="5.3 0 -0.1">
|
||||
<geom friction="2.0" fromto="5.3 -0.1 0 5.3 -0.1 0.2" name="edge15_geom" size="0.04" type="capsule"/>
|
||||
</body>
|
||||
<body name="edge16" pos="5.3 0 -0.15">
|
||||
<geom friction="2.0" fromto="5.3 -0.15 0 5.3 -0.15 0.2" name="edge16_geom" size="0.04" type="capsule"/>
|
||||
</body>
|
||||
|
||||
<body name="edge20" pos="5.25 0 -0.15">
|
||||
<geom friction="2.0" fromto="5.25 -0.15 0 5.25 -0.15 0.2" name="edge20_geom" size="0.04" type="capsule"/>
|
||||
</body>
|
||||
<body name="edge21" pos="5.2 0 -0.15">
|
||||
<geom friction="2.0" fromto="5.2 -0.15 0 5.2 -0.15 0.2" name="edge21_geom" size="0.04" type="capsule"/>
|
||||
</body>
|
||||
<body name="edge22" pos="5.15 0 -0.15">
|
||||
<geom friction="2.0" fromto="5.15 -0.15 0 5.15 -0.15 0.2" name="edge22_geom" size="0.04" type="capsule"/>
|
||||
</body>
|
||||
<body name="edge23" pos="5.1 0 -0.15">
|
||||
<geom friction="2.0" fromto="5.1 -0.15 0 5.1 -0.15 0.2" name="edge23_geom" size="0.04" type="capsule"/>
|
||||
</body>
|
||||
<body name="edge24" pos="5.05 0 -0.15">
|
||||
<geom friction="2.0" fromto="5.05 -0.15 0 5.05 -0.15 0.2" name="edge24_geom" size="0.04" type="capsule"/>
|
||||
</body>
|
||||
<body name="edge25" pos="5 0 -0.15">
|
||||
<geom friction="2.0" fromto="5 -0.15 0 5 -0.15 0.2" name="edge25_geom" size="0.04" type="capsule"/>
|
||||
</body>
|
||||
<body name="edge26" pos="5 0 -0.1">
|
||||
<geom friction="2.0" fromto="5 -0.1 0 5 -0.1 0.2" name="edge26_geom" size="0.04" type="capsule"/>
|
||||
</body>
|
||||
<body name="edge27" pos="5 0 -0.05">
|
||||
<geom friction="2.0" fromto="5 -0.05 0 5 -0.05 0.2" name="edge27_geom" size="0.04" type="capsule"/>
|
||||
</body>
|
||||
<body name="basket_ground" pos="5 0 0" gravcomp="0">
|
||||
<geom name="basket_ground_geom" size="0.1 0.1 0.15" pos="0.15 0 0" quat="0.707107 0 -0.707107 0" type="box" friction="0.9 0.005 0.0001"/>
|
||||
<body name="edge1" pos="0 0 0" gravcomp="0">
|
||||
<geom name="edge1_geom" size="0.04 0.1" pos="0 0 0.1" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
|
||||
</body>
|
||||
<body name="edge2" pos="0 0 0.05" gravcomp="0">
|
||||
<geom name="edge2_geom" size="0.04 0.1" pos="0 0.05 0.05" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
|
||||
</body>
|
||||
<body name="edge3" pos="0 0 0.1" gravcomp="0">
|
||||
<geom name="edge3_geom" size="0.04 0.1" pos="0 0.1 0" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
|
||||
</body>
|
||||
<body name="edge4" pos="0 0 0.15" gravcomp="0">
|
||||
<geom name="edge4_geom" size="0.04 0.1" pos="0 0.15 -0.05" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
|
||||
</body>
|
||||
<body name="edge5" pos="0.05 0 0.15" gravcomp="0">
|
||||
<geom name="edge5_geom" size="0.04 0.1" pos="0 0.15 -0.05" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
|
||||
</body>
|
||||
<body name="edge6" pos="0.1 0 0.15" gravcomp="0">
|
||||
<geom name="edge6_geom" size="0.04 0.1" pos="0 0.15 -0.05" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
|
||||
</body>
|
||||
<body name="edge7" pos="0.15 0 0.15" gravcomp="0">
|
||||
<geom name="edge7_geom" size="0.04 0.1" pos="0 0.15 -0.05" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
|
||||
</body>
|
||||
<body name="edge8" pos="0.2 0 0.15" gravcomp="0">
|
||||
<geom name="edge8_geom" size="0.04 0.1" pos="0 0.15 -0.05" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
|
||||
</body>
|
||||
<body name="edge9" pos="0.25 0 0.15" gravcomp="0">
|
||||
<geom name="edge9_geom" size="0.04 0.1" pos="0 0.15 -0.05" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
|
||||
</body>
|
||||
<body name="edge10" pos="0.3 0 0.15" gravcomp="0">
|
||||
<geom name="edge10_geom" size="0.04 0.1" pos="0 0.15 -0.05" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
|
||||
</body>
|
||||
<body name="edge11" pos="0.3 0 0.1" gravcomp="0">
|
||||
<geom name="edge11_geom" size="0.04 0.1" pos="0 0.1 0" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
|
||||
</body>
|
||||
<body name="edge12" pos="0.3 0 0.05" gravcomp="0">
|
||||
<geom name="edge12_geom" size="0.04 0.1" pos="0 0.05 0.05" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
|
||||
</body>
|
||||
<body name="edge13" pos="0.3 0 0" gravcomp="0">
|
||||
<geom name="edge13_geom" size="0.04 0.1" pos="0 0 0.1" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
|
||||
</body>
|
||||
<body name="edge14" pos="0.3 0 -0.05" gravcomp="0">
|
||||
<geom name="edge14_geom" size="0.04 0.1" pos="0 -0.05 0.15" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
|
||||
</body>
|
||||
<body name="edge15" pos="0.3 0 -0.1" gravcomp="0">
|
||||
<geom name="edge15_geom" size="0.04 0.1" pos="0 -0.1 0.2" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
|
||||
</body>
|
||||
<body name="edge16" pos="0.3 0 -0.15" gravcomp="0">
|
||||
<geom name="edge16_geom" size="0.04 0.1" pos="0 -0.15 0.25" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
|
||||
</body>
|
||||
<body name="edge20" pos="0.25 0 -0.15" gravcomp="0">
|
||||
<geom name="edge20_geom" size="0.04 0.1" pos="0 -0.15 0.25" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
|
||||
</body>
|
||||
<body name="edge21" pos="0.2 0 -0.15" gravcomp="0">
|
||||
<geom name="edge21_geom" size="0.04 0.1" pos="0 -0.15 0.25" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
|
||||
</body>
|
||||
<body name="edge22" pos="0.15 0 -0.15" gravcomp="0">
|
||||
<geom name="edge22_geom" size="0.04 0.1" pos="0 -0.15 0.25" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
|
||||
</body>
|
||||
<body name="edge23" pos="0.1 0 -0.15" gravcomp="0">
|
||||
<geom name="edge23_geom" size="0.04 0.1" pos="0 -0.15 0.25" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
|
||||
</body>
|
||||
<body name="edge24" pos="0.05 0 -0.15" gravcomp="0">
|
||||
<geom name="edge24_geom" size="0.04 0.1" pos="0 -0.15 0.25" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
|
||||
</body>
|
||||
<body name="edge25" pos="0 0 -0.15" gravcomp="0">
|
||||
<geom name="edge25_geom" size="0.04 0.1" pos="0 -0.15 0.25" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
|
||||
</body>
|
||||
<body name="edge26" pos="0 0 -0.1" gravcomp="0">
|
||||
<geom name="edge26_geom" size="0.04 0.1" pos="0 -0.1 0.2" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
|
||||
</body>
|
||||
<body name="edge27" pos="0 0 -0.05" gravcomp="0">
|
||||
<geom name="edge27_geom" size="0.04 0.1" pos="0 -0.05 0.15" quat="0 1 0 0" type="capsule" friction="2 0.005 0.0001"/>
|
||||
</body>
|
||||
</body>
|
||||
</worldbody>
|
||||
<actuator>
|
||||
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="thigh_joint"/>
|
||||
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="leg_joint"/>
|
||||
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="200.0" joint="foot_joint"/>
|
||||
<general joint="thigh_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
|
||||
<general joint="leg_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
|
||||
<general joint="foot_joint" ctrlrange="-1 1" gear="200 0 0 0 0 0" actdim="0"/>
|
||||
</actuator>
|
||||
<asset>
|
||||
<texture type="skybox" builtin="gradient" rgb1=".4 .5 .6" rgb2="0 0 0"
|
||||
width="100" height="100"/>
|
||||
<texture builtin="flat" height="1278" mark="cross" markrgb="1 1 1" name="texgeom" random="0.01" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" type="cube" width="127"/>
|
||||
<texture builtin="checker" height="100" name="texplane" rgb1="0 0 0" rgb2="0.8 0.8 0.8" type="2d" width="100"/>
|
||||
<material name="MatPlane" reflectance="0.5" shininess="1" specular="1" texrepeat="60 60" texture="texplane"/>
|
||||
<material name="geom" texture="texgeom" texuniform="true"/>
|
||||
</asset>
|
||||
</mujoco>
|
@ -1,13 +1,15 @@
|
||||
import os
|
||||
from typing import Optional
|
||||
from typing import Optional, Any, Dict, Tuple
|
||||
|
||||
import numpy as np
|
||||
from gym.envs.mujoco.hopper_v4 import HopperEnv
|
||||
from gymnasium.core import ObsType
|
||||
from fancy_gym.envs.mujoco.hopper_jump.hopper_jump import HopperEnvCustomXML
|
||||
from gymnasium import spaces
|
||||
|
||||
MAX_EPISODE_STEPS_HOPPERTHROW = 250
|
||||
|
||||
|
||||
class HopperThrowEnv(HopperEnv):
|
||||
class HopperThrowEnv(HopperEnvCustomXML):
|
||||
"""
|
||||
Initialization changes to normal Hopper:
|
||||
- healthy_reward: 1.0 -> 0.0 -> 0.1
|
||||
@ -36,6 +38,16 @@ class HopperThrowEnv(HopperEnv):
|
||||
self.max_episode_steps = max_episode_steps
|
||||
self.context = context
|
||||
self.goal = 0
|
||||
|
||||
if not hasattr(self, 'observation_space'):
|
||||
self.observation_space = spaces.Box(
|
||||
low=-np.inf, high=np.inf, shape=(18,), dtype=np.float64
|
||||
)
|
||||
else:
|
||||
self.observation_space = spaces.Box(
|
||||
low=-np.inf, high=np.inf, shape=(19,), dtype=np.float64
|
||||
)
|
||||
|
||||
super().__init__(xml_file=xml_file,
|
||||
forward_reward_weight=forward_reward_weight,
|
||||
ctrl_cost_weight=ctrl_cost_weight,
|
||||
@ -56,14 +68,14 @@ class HopperThrowEnv(HopperEnv):
|
||||
|
||||
# done = self.done TODO We should use this, not sure why there is no other termination; ball_landed should be enough, because we only look at the throw itself? - Paul and Marc
|
||||
ball_landed = bool(self.get_body_com("ball")[2] <= 0.05)
|
||||
done = ball_landed
|
||||
terminated = ball_landed
|
||||
|
||||
ctrl_cost = self.control_cost(action)
|
||||
costs = ctrl_cost
|
||||
|
||||
rewards = 0
|
||||
|
||||
if self.current_step >= self.max_episode_steps or done:
|
||||
if self.current_step >= self.max_episode_steps or terminated:
|
||||
distance_reward = -np.linalg.norm(ball_pos_after - self.goal) if self.context else \
|
||||
self._forward_reward_weight * ball_pos_after
|
||||
healthy_reward = 0 if self.context else self.healthy_reward * self.current_step
|
||||
@ -78,16 +90,19 @@ class HopperThrowEnv(HopperEnv):
|
||||
'_steps': self.current_step,
|
||||
'goal': self.goal,
|
||||
}
|
||||
truncated = False
|
||||
|
||||
return observation, reward, done, info
|
||||
return observation, reward, terminated, truncated, info
|
||||
|
||||
def _get_obs(self):
|
||||
return np.append(super()._get_obs(), self.goal)
|
||||
|
||||
def reset(self, *, seed: Optional[int] = None, return_info: bool = False, options: Optional[dict] = None):
|
||||
def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
|
||||
-> Tuple[ObsType, Dict[str, Any]]:
|
||||
self.current_step = 0
|
||||
ret = super().reset(seed=seed, options=options)
|
||||
self.goal = self.np_random.uniform(2.0, 6.0, 1) # 0.5 8.0
|
||||
return super().reset()
|
||||
return ret
|
||||
|
||||
# overwrite reset_model to make it deterministic
|
||||
def reset_model(self):
|
||||
@ -101,22 +116,3 @@ class HopperThrowEnv(HopperEnv):
|
||||
|
||||
observation = self._get_obs()
|
||||
return observation
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
render_mode = "human" # "human" or "partial" or "final"
|
||||
env = HopperThrowEnv()
|
||||
obs = env.reset()
|
||||
|
||||
for i in range(2000):
|
||||
# objective.load_result("/tmp/cma")
|
||||
# test with random actions
|
||||
ac = env.action_space.sample()
|
||||
obs, rew, d, info = env.step(ac)
|
||||
if i % 10 == 0:
|
||||
env.render(mode=render_mode)
|
||||
if d:
|
||||
print('After ', i, ' steps, done: ', d)
|
||||
env.reset()
|
||||
|
||||
env.close()
|
||||
|
@ -1,13 +1,16 @@
|
||||
import os
|
||||
from typing import Optional
|
||||
from typing import Optional, Any, Dict, Tuple
|
||||
|
||||
import numpy as np
|
||||
from gym.envs.mujoco.hopper_v4 import HopperEnv
|
||||
from fancy_gym.envs.mujoco.hopper_jump.hopper_jump import HopperEnvCustomXML
|
||||
from gymnasium.core import ObsType
|
||||
from gymnasium import spaces
|
||||
|
||||
|
||||
MAX_EPISODE_STEPS_HOPPERTHROWINBASKET = 250
|
||||
|
||||
|
||||
class HopperThrowInBasketEnv(HopperEnv):
|
||||
class HopperThrowInBasketEnv(HopperEnvCustomXML):
|
||||
"""
|
||||
Initialization changes to normal Hopper:
|
||||
- healthy_reward: 1.0 -> 0.0
|
||||
@ -42,6 +45,16 @@ class HopperThrowInBasketEnv(HopperEnv):
|
||||
self.context = context
|
||||
self.penalty = penalty
|
||||
self.basket_x = 5
|
||||
|
||||
if exclude_current_positions_from_observation:
|
||||
self.observation_space = spaces.Box(
|
||||
low=-np.inf, high=np.inf, shape=(18,), dtype=np.float64
|
||||
)
|
||||
else:
|
||||
self.observation_space = spaces.Box(
|
||||
low=-np.inf, high=np.inf, shape=(19,), dtype=np.float64
|
||||
)
|
||||
|
||||
xml_file = os.path.join(os.path.dirname(__file__), "assets", xml_file)
|
||||
super().__init__(xml_file=xml_file,
|
||||
forward_reward_weight=forward_reward_weight,
|
||||
@ -65,14 +78,14 @@ class HopperThrowInBasketEnv(HopperEnv):
|
||||
|
||||
is_in_basket_x = ball_pos[0] >= basket_pos[0] and ball_pos[0] <= basket_pos[0] + self.basket_size
|
||||
is_in_basket_y = ball_pos[1] >= basket_pos[1] - (self.basket_size / 2) and ball_pos[1] <= basket_pos[1] + (
|
||||
self.basket_size / 2)
|
||||
self.basket_size / 2)
|
||||
is_in_basket_z = ball_pos[2] < 0.1
|
||||
is_in_basket = is_in_basket_x and is_in_basket_y and is_in_basket_z
|
||||
if is_in_basket:
|
||||
self.ball_in_basket = True
|
||||
|
||||
ball_landed = self.get_body_com("ball")[2] <= 0.05
|
||||
done = bool(ball_landed or is_in_basket)
|
||||
terminated = bool(ball_landed or is_in_basket)
|
||||
|
||||
rewards = 0
|
||||
|
||||
@ -80,7 +93,7 @@ class HopperThrowInBasketEnv(HopperEnv):
|
||||
|
||||
costs = ctrl_cost
|
||||
|
||||
if self.current_step >= self.max_episode_steps or done:
|
||||
if self.current_step >= self.max_episode_steps or terminated:
|
||||
|
||||
if is_in_basket:
|
||||
if not self.context:
|
||||
@ -101,23 +114,27 @@ class HopperThrowInBasketEnv(HopperEnv):
|
||||
info = {
|
||||
'ball_pos': ball_pos[0],
|
||||
}
|
||||
truncated = False
|
||||
|
||||
return observation, reward, done, info
|
||||
return observation, reward, terminated, truncated, info
|
||||
|
||||
def _get_obs(self):
|
||||
return np.append(super()._get_obs(), self.basket_x)
|
||||
|
||||
def reset(self, *, seed: Optional[int] = None, return_info: bool = False, options: Optional[dict] = None):
|
||||
def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
|
||||
-> Tuple[ObsType, Dict[str, Any]]:
|
||||
|
||||
if self.max_episode_steps == 10:
|
||||
# We have to initialize this here, because the spec is only added after creating the env.
|
||||
self.max_episode_steps = self.spec.max_episode_steps
|
||||
|
||||
self.current_step = 0
|
||||
self.ball_in_basket = False
|
||||
ret = super().reset(seed=seed, options=options)
|
||||
if self.context:
|
||||
self.basket_x = self.np_random.uniform(low=3, high=7, size=1)
|
||||
self.model.body("basket_ground").pos[:] = [self.basket_x[0], 0, 0]
|
||||
return super().reset()
|
||||
return ret
|
||||
|
||||
# overwrite reset_model to make it deterministic
|
||||
def reset_model(self):
|
||||
@ -132,22 +149,3 @@ class HopperThrowInBasketEnv(HopperEnv):
|
||||
|
||||
observation = self._get_obs()
|
||||
return observation
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
render_mode = "human" # "human" or "partial" or "final"
|
||||
env = HopperThrowInBasketEnv()
|
||||
obs = env.reset()
|
||||
|
||||
for i in range(2000):
|
||||
# objective.load_result("/tmp/cma")
|
||||
# test with random actions
|
||||
ac = env.action_space.sample()
|
||||
obs, rew, d, info = env.step(ac)
|
||||
if i % 10 == 0:
|
||||
env.render(mode=render_mode)
|
||||
if d:
|
||||
print('After ', i, ' steps, done: ', d)
|
||||
env.reset()
|
||||
|
||||
env.close()
|
||||
|
@ -6,6 +6,11 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
|
||||
|
||||
|
||||
class MPWrapper(RawInterfaceWrapper):
|
||||
mp_config = {
|
||||
'ProMP': {},
|
||||
'DMP': {},
|
||||
'ProDMP': {},
|
||||
}
|
||||
|
||||
@property
|
||||
def context_mask(self):
|
||||
|
@ -7,6 +7,16 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
|
||||
|
||||
class MPWrapper(RawInterfaceWrapper):
|
||||
|
||||
mp_config = {
|
||||
'ProMP': {},
|
||||
'DMP': {
|
||||
'phase_generator_kwargs': {
|
||||
'alpha_phase': 2,
|
||||
},
|
||||
},
|
||||
'ProDMP': {},
|
||||
}
|
||||
|
||||
@property
|
||||
def context_mask(self):
|
||||
return np.concatenate([[False] * self.n_links, # cos
|
||||
|
@ -1,8 +1,9 @@
|
||||
import os
|
||||
|
||||
import numpy as np
|
||||
from gym import utils
|
||||
from gym.envs.mujoco import MujocoEnv
|
||||
from gymnasium import utils
|
||||
from gymnasium.envs.mujoco import MujocoEnv
|
||||
from gymnasium.spaces import Box
|
||||
|
||||
MAX_EPISODE_STEPS_REACHER = 200
|
||||
|
||||
@ -12,7 +13,17 @@ class ReacherEnv(MujocoEnv, utils.EzPickle):
|
||||
More general version of the gym mujoco Reacher environment
|
||||
"""
|
||||
|
||||
def __init__(self, sparse: bool = False, n_links: int = 5, reward_weight: float = 1, ctrl_cost_weight: float = 1):
|
||||
metadata = {
|
||||
"render_modes": [
|
||||
"human",
|
||||
"rgb_array",
|
||||
"depth_array",
|
||||
],
|
||||
"render_fps": 50,
|
||||
}
|
||||
|
||||
def __init__(self, sparse: bool = False, n_links: int = 5, reward_weight: float = 1, ctrl_cost_weight: float = 1.,
|
||||
**kwargs):
|
||||
utils.EzPickle.__init__(**locals())
|
||||
|
||||
self._steps = 0
|
||||
@ -25,10 +36,16 @@ class ReacherEnv(MujocoEnv, utils.EzPickle):
|
||||
|
||||
file_name = f'reacher_{n_links}links.xml'
|
||||
|
||||
# sin, cos, velocity * n_Links + goal position (2) and goal distance (3)
|
||||
shape = (self.n_links * 3 + 5,)
|
||||
observation_space = Box(low=-np.inf, high=np.inf, shape=shape, dtype=np.float64)
|
||||
|
||||
MujocoEnv.__init__(self,
|
||||
model_path=os.path.join(os.path.dirname(__file__), "assets", file_name),
|
||||
frame_skip=2,
|
||||
mujoco_bindings="mujoco")
|
||||
observation_space=observation_space,
|
||||
**kwargs
|
||||
)
|
||||
|
||||
def step(self, action):
|
||||
self._steps += 1
|
||||
@ -45,10 +62,14 @@ class ReacherEnv(MujocoEnv, utils.EzPickle):
|
||||
|
||||
reward = reward_dist + reward_ctrl + angular_vel
|
||||
self.do_simulation(action, self.frame_skip)
|
||||
ob = self._get_obs()
|
||||
done = False
|
||||
if self.render_mode == "human":
|
||||
self.render()
|
||||
|
||||
infos = dict(
|
||||
ob = self._get_obs()
|
||||
terminated = False
|
||||
truncated = False
|
||||
|
||||
info = dict(
|
||||
reward_dist=reward_dist,
|
||||
reward_ctrl=reward_ctrl,
|
||||
velocity=angular_vel,
|
||||
@ -56,7 +77,7 @@ class ReacherEnv(MujocoEnv, utils.EzPickle):
|
||||
goal=self.goal if hasattr(self, "goal") else None
|
||||
)
|
||||
|
||||
return ob, reward, done, infos
|
||||
return ob, reward, terminated, truncated, info
|
||||
|
||||
def distance_reward(self):
|
||||
vec = self.get_body_com("fingertip") - self.get_body_com("target")
|
||||
@ -66,6 +87,7 @@ class ReacherEnv(MujocoEnv, utils.EzPickle):
|
||||
return -10 * np.square(self.data.qvel.flat[:self.n_links]).sum() if self.sparse else 0.0
|
||||
|
||||
def viewer_setup(self):
|
||||
assert self.viewer is not None
|
||||
self.viewer.cam.trackbodyid = 0
|
||||
|
||||
def reset_model(self):
|
||||
|
@ -7,6 +7,53 @@ from fancy_gym.envs.mujoco.table_tennis.table_tennis_utils import jnt_pos_low, j
|
||||
|
||||
|
||||
class TT_MPWrapper(RawInterfaceWrapper):
|
||||
mp_config = {
|
||||
'ProMP': {
|
||||
'phase_generator_kwargs': {
|
||||
'learn_tau': False,
|
||||
'learn_delay': False,
|
||||
'tau_bound': [0.8, 1.5],
|
||||
'delay_bound': [0.05, 0.15],
|
||||
},
|
||||
'controller_kwargs': {
|
||||
'p_gains': 0.5 * np.array([1.0, 4.0, 2.0, 4.0, 1.0, 4.0, 1.0]),
|
||||
'd_gains': 0.5 * np.array([0.1, 0.4, 0.2, 0.4, 0.1, 0.4, 0.1]),
|
||||
},
|
||||
'basis_generator_kwargs': {
|
||||
'num_basis': 3,
|
||||
'num_basis_zero_start': 1,
|
||||
'num_basis_zero_goal': 1,
|
||||
},
|
||||
'black_box_kwargs': {
|
||||
'verbose': 2,
|
||||
},
|
||||
},
|
||||
'DMP': {},
|
||||
'ProDMP': {
|
||||
'phase_generator_kwargs': {
|
||||
'learn_tau': True,
|
||||
'learn_delay': True,
|
||||
'tau_bound': [0.8, 1.5],
|
||||
'delay_bound': [0.05, 0.15],
|
||||
'alpha_phase': 3,
|
||||
},
|
||||
'controller_kwargs': {
|
||||
'p_gains': 0.5 * np.array([1.0, 4.0, 2.0, 4.0, 1.0, 4.0, 1.0]),
|
||||
'd_gains': 0.5 * np.array([0.1, 0.4, 0.2, 0.4, 0.1, 0.4, 0.1]),
|
||||
},
|
||||
'basis_generator_kwargs': {
|
||||
'num_basis': 3,
|
||||
'alpha': 25,
|
||||
'basis_bandwidth_factor': 3,
|
||||
},
|
||||
'trajectory_generator_kwargs': {
|
||||
'weights_scale': 0.7,
|
||||
'auto_scale_basis': True,
|
||||
'relative_goal': True,
|
||||
'disable_goal': True,
|
||||
},
|
||||
},
|
||||
}
|
||||
|
||||
# Random x goal + random init pos
|
||||
@property
|
||||
@ -16,7 +63,7 @@ class TT_MPWrapper(RawInterfaceWrapper):
|
||||
[False] * 7, # joints velocity
|
||||
[True] * 2, # position ball x, y
|
||||
[False] * 1, # position ball z
|
||||
#[True] * 3, # velocity ball x, y, z
|
||||
# [True] * 3, # velocity ball x, y, z
|
||||
[True] * 2, # target landing position
|
||||
# [True] * 1, # time
|
||||
])
|
||||
@ -40,7 +87,58 @@ class TT_MPWrapper(RawInterfaceWrapper):
|
||||
return_contextual_obs: bool, tau_bound:list, delay_bound:list) -> Tuple[np.ndarray, float, bool, dict]:
|
||||
return self.get_invalid_traj_step_return(action, pos_traj, return_contextual_obs, tau_bound, delay_bound)
|
||||
|
||||
|
||||
class TT_MPWrapper_Replan(TT_MPWrapper):
|
||||
mp_config = {
|
||||
'ProMP': {},
|
||||
'DMP': {},
|
||||
'ProDMP': {
|
||||
'phase_generator_kwargs': {
|
||||
'learn_tau': True,
|
||||
'learn_delay': True,
|
||||
'tau_bound': [0.8, 1.5],
|
||||
'delay_bound': [0.05, 0.15],
|
||||
'alpha_phase': 3,
|
||||
},
|
||||
'controller_kwargs': {
|
||||
'p_gains': 0.5 * np.array([1.0, 4.0, 2.0, 4.0, 1.0, 4.0, 1.0]),
|
||||
'd_gains': 0.5 * np.array([0.1, 0.4, 0.2, 0.4, 0.1, 0.4, 0.1]),
|
||||
},
|
||||
'basis_generator_kwargs': {
|
||||
'num_basis': 2,
|
||||
'alpha': 25,
|
||||
'basis_bandwidth_factor': 3,
|
||||
},
|
||||
'trajectory_generator_kwargs': {
|
||||
'auto_scale_basis': True,
|
||||
'goal_offset': 1.0,
|
||||
},
|
||||
'black_box_kwargs': {
|
||||
'max_planning_times': 3,
|
||||
'replanning_schedule': lambda pos, vel, obs, action, t: t % 50 == 0,
|
||||
},
|
||||
},
|
||||
}
|
||||
|
||||
|
||||
class TTVelObs_MPWrapper(TT_MPWrapper):
|
||||
# Will inherit mp_config from TT_MPWrapper
|
||||
|
||||
@property
|
||||
def context_mask(self):
|
||||
return np.hstack([
|
||||
[False] * 7, # joints position
|
||||
[False] * 7, # joints velocity
|
||||
[True] * 2, # position ball x, y
|
||||
[False] * 1, # position ball z
|
||||
[True] * 3, # velocity ball x, y, z
|
||||
[True] * 2, # target landing position
|
||||
# [True] * 1, # time
|
||||
])
|
||||
|
||||
|
||||
class TTVelObs_MPWrapper_Replan(TT_MPWrapper_Replan):
|
||||
# Will inherit mp_config from TT_MPWrapper_Replan
|
||||
|
||||
@property
|
||||
def context_mask(self):
|
||||
|
@ -1,8 +1,8 @@
|
||||
import os
|
||||
|
||||
import numpy as np
|
||||
from gym import utils, spaces
|
||||
from gym.envs.mujoco import MujocoEnv
|
||||
from gymnasium import utils, spaces
|
||||
from gymnasium.envs.mujoco import MujocoEnv
|
||||
|
||||
from fancy_gym.envs.mujoco.table_tennis.table_tennis_utils import is_init_state_valid, magnus_force
|
||||
from fancy_gym.envs.mujoco.table_tennis.table_tennis_utils import jnt_pos_low, jnt_pos_high
|
||||
@ -22,6 +22,16 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
|
||||
"""
|
||||
7 DoF table tennis environment
|
||||
"""
|
||||
|
||||
metadata = {
|
||||
"render_modes": [
|
||||
"human",
|
||||
"rgb_array",
|
||||
"depth_array",
|
||||
],
|
||||
"render_fps": 125
|
||||
}
|
||||
|
||||
def __init__(self, ctxt_dim: int = 4, frame_skip: int = 4,
|
||||
goal_switching_step: int = None,
|
||||
enable_artificial_wind: bool = False):
|
||||
@ -50,10 +60,15 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
|
||||
|
||||
self._artificial_force = 0.
|
||||
|
||||
if not hasattr(self, 'observation_space'):
|
||||
self.observation_space = spaces.Box(
|
||||
low=-np.inf, high=np.inf, shape=(19,), dtype=np.float64
|
||||
)
|
||||
|
||||
MujocoEnv.__init__(self,
|
||||
model_path=os.path.join(os.path.dirname(__file__), "assets", "xml", "table_tennis_env.xml"),
|
||||
frame_skip=frame_skip,
|
||||
mujoco_bindings="mujoco")
|
||||
observation_space=self.observation_space)
|
||||
|
||||
if ctxt_dim == 2:
|
||||
self.context_bounds = CONTEXT_BOUNDS_2DIMS
|
||||
@ -83,11 +98,11 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
|
||||
unstable_simulation = False
|
||||
|
||||
if self._steps == self._goal_switching_step and self.np_random.uniform() < 0.5:
|
||||
new_goal_pos = self._generate_goal_pos(random=True)
|
||||
new_goal_pos[1] = -new_goal_pos[1]
|
||||
self._goal_pos = new_goal_pos
|
||||
self.model.body_pos[5] = np.concatenate([self._goal_pos, [0.77]])
|
||||
mujoco.mj_forward(self.model, self.data)
|
||||
new_goal_pos = self._generate_goal_pos(random=True)
|
||||
new_goal_pos[1] = -new_goal_pos[1]
|
||||
self._goal_pos = new_goal_pos
|
||||
self.model.body_pos[5] = np.concatenate([self._goal_pos, [0.77]])
|
||||
mujoco.mj_forward(self.model, self.data)
|
||||
|
||||
for _ in range(self.frame_skip):
|
||||
if self._enable_artificial_wind:
|
||||
@ -102,7 +117,7 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
|
||||
|
||||
if not self._hit_ball:
|
||||
self._hit_ball = self._contact_checker(self._ball_contact_id, self._bat_front_id) or \
|
||||
self._contact_checker(self._ball_contact_id, self._bat_back_id)
|
||||
self._contact_checker(self._ball_contact_id, self._bat_back_id)
|
||||
if not self._hit_ball:
|
||||
ball_land_on_floor_no_hit = self._contact_checker(self._ball_contact_id, self._floor_contact_id)
|
||||
if ball_land_on_floor_no_hit:
|
||||
@ -130,9 +145,9 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
|
||||
reward = -25 if unstable_simulation else self._get_reward(self._terminated)
|
||||
|
||||
land_dist_err = np.linalg.norm(self._ball_landing_pos[:-1] - self._goal_pos) \
|
||||
if self._ball_landing_pos is not None else 10.
|
||||
if self._ball_landing_pos is not None else 10.
|
||||
|
||||
return self._get_obs(), reward, self._terminated, {
|
||||
info = {
|
||||
"hit_ball": self._hit_ball,
|
||||
"ball_returned_success": self._ball_return_success,
|
||||
"land_dist_error": land_dist_err,
|
||||
@ -140,6 +155,10 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
|
||||
"num_steps": self._steps,
|
||||
}
|
||||
|
||||
terminated, truncated = self._terminated, False
|
||||
|
||||
return self._get_obs(), reward, terminated, truncated, info
|
||||
|
||||
def _contact_checker(self, id_1, id_2):
|
||||
for coni in range(0, self.data.ncon):
|
||||
con = self.data.contact[coni]
|
||||
@ -202,7 +221,7 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
|
||||
if not self._hit_ball:
|
||||
return 0.2 * (1 - np.tanh(min_r_b_dist**2))
|
||||
if self._ball_landing_pos is None:
|
||||
min_b_des_b_dist = np.min(np.linalg.norm(np.array(self._ball_traj)[:,:2] - self._goal_pos[:2], axis=1))
|
||||
min_b_des_b_dist = np.min(np.linalg.norm(np.array(self._ball_traj)[:, :2] - self._goal_pos[:2], axis=1))
|
||||
return 2 * (1 - np.tanh(min_r_b_dist ** 2)) + (1 - np.tanh(min_b_des_b_dist**2))
|
||||
min_b_des_b_land_dist = np.linalg.norm(self._goal_pos[:2] - self._ball_landing_pos[:2])
|
||||
over_net_bonus = int(self._ball_landing_pos[0] < 0)
|
||||
@ -231,13 +250,13 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
|
||||
violate_high_bound_error = np.mean(np.maximum(pos_traj - jnt_pos_high, 0))
|
||||
violate_low_bound_error = np.mean(np.maximum(jnt_pos_low - pos_traj, 0))
|
||||
invalid_penalty = tau_invalid_penalty + delay_invalid_penalty + \
|
||||
violate_high_bound_error + violate_low_bound_error
|
||||
violate_high_bound_error + violate_low_bound_error
|
||||
return -invalid_penalty
|
||||
|
||||
def get_invalid_traj_step_return(self, action, pos_traj, contextual_obs, tau_bound, delay_bound):
|
||||
obs = self._get_obs() if contextual_obs else np.concatenate([self._get_obs(), np.array([0])]) # 0 for invalid traj
|
||||
obs = self._get_obs() if contextual_obs else np.concatenate([self._get_obs(), np.array([0])]) # 0 for invalid traj
|
||||
penalty = self._get_traj_invalid_penalty(action, pos_traj, tau_bound, delay_bound)
|
||||
return obs, penalty, True, {
|
||||
return obs, penalty, True, False, {
|
||||
"hit_ball": [False],
|
||||
"ball_returned_success": [False],
|
||||
"land_dist_error": [10.],
|
||||
@ -249,7 +268,7 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
|
||||
@staticmethod
|
||||
def check_traj_validity(action, pos_traj, vel_traj, tau_bound, delay_bound):
|
||||
time_invalid = action[0] > tau_bound[1] or action[0] < tau_bound[0] \
|
||||
or action[1] > delay_bound[1] or action[1] < delay_bound[0]
|
||||
or action[1] > delay_bound[1] or action[1] < delay_bound[0]
|
||||
if time_invalid or np.any(pos_traj > jnt_pos_high) or np.any(pos_traj < jnt_pos_low):
|
||||
return False, pos_traj, vel_traj
|
||||
return True, pos_traj, vel_traj
|
||||
@ -257,6 +276,9 @@ class TableTennisEnv(MujocoEnv, utils.EzPickle):
|
||||
|
||||
class TableTennisWind(TableTennisEnv):
|
||||
def __init__(self, ctxt_dim: int = 4, frame_skip: int = 4):
|
||||
self.observation_space = spaces.Box(
|
||||
low=-np.inf, high=np.inf, shape=(22,), dtype=np.float64
|
||||
)
|
||||
super().__init__(ctxt_dim=ctxt_dim, frame_skip=frame_skip, enable_artificial_wind=True)
|
||||
|
||||
def _get_obs(self):
|
||||
|
@ -1,64 +1,60 @@
|
||||
<mujoco model="walker2d">
|
||||
<compiler angle="degree" coordinate="global" inertiafromgeom="true"/>
|
||||
<default>
|
||||
<joint armature="0.01" damping=".1" limited="true"/>
|
||||
<geom conaffinity="0" condim="3" contype="1" density="1000" friction=".7 .1 .1" rgba="0.8 0.6 .4 1"/>
|
||||
<compiler angle="radian" autolimits="true"/>
|
||||
<option integrator="RK4"/>
|
||||
<default class="main">
|
||||
<joint limited="true" armature="0.01" damping="0.1"/>
|
||||
<geom conaffinity="0" friction="0.7 0.1 0.1" rgba="0.8 0.6 0.4 1"/>
|
||||
</default>
|
||||
<option integrator="RK4" timestep="0.002"/>
|
||||
<asset>
|
||||
<texture type="skybox" builtin="gradient" rgb1="0.4 0.5 0.6" rgb2="0 0 0" width="100" height="600"/>
|
||||
<texture type="cube" name="texgeom" builtin="flat" mark="cross" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" markrgb="1 1 1" width="127" height="762"/>
|
||||
<texture type="2d" name="texplane" builtin="checker" rgb1="0 0 0" rgb2="0.8 0.8 0.8" width="100" height="100"/>
|
||||
<material name="MatPlane" texture="texplane" texrepeat="60 60" specular="1" shininess="1" reflectance="0.5"/>
|
||||
<material name="geom" texture="texgeom" texuniform="true"/>
|
||||
</asset>
|
||||
<worldbody>
|
||||
<light cutoff="100" diffuse="1 1 1" dir="-0 0 -1.3" directional="true" exponent="1" pos="0 0 1.3" specular=".1 .1 .1"/>
|
||||
<geom conaffinity="1" condim="3" name="floor" pos="0 0 0" rgba="0.8 0.9 0.8 1" size="40 40 40" type="plane" material="MatPlane"/>
|
||||
<body name="torso" pos="0 0 1.25">
|
||||
<camera name="track" mode="trackcom" pos="0 -3 1" xyaxes="1 0 0 0 0 1"/>
|
||||
<joint armature="0" axis="1 0 0" damping="0" limited="false" name="rootx" pos="0 0 0" stiffness="0" type="slide"/>
|
||||
<joint armature="0" axis="0 0 1" damping="0" limited="false" name="rootz" pos="0 0 0" ref="1.25" stiffness="0" type="slide"/>
|
||||
<joint armature="0" axis="0 1 0" damping="0" limited="false" name="rooty" pos="0 0 1.25" stiffness="0" type="hinge"/>
|
||||
<geom friction="0.9" fromto="0 0 1.45 0 0 1.05" name="torso_geom" size="0.05" type="capsule"/>
|
||||
<body name="thigh" pos="0 0 1.05">
|
||||
<joint axis="0 -1 0" name="thigh_joint" pos="0 0 1.05" range="-150 0" type="hinge"/>
|
||||
<geom friction="0.9" fromto="0 0 1.05 0 0 0.6" name="thigh_geom" size="0.05" type="capsule"/>
|
||||
<body name="leg" pos="0 0 0.35">
|
||||
<joint axis="0 -1 0" name="leg_joint" pos="0 0 0.6" range="-150 0" type="hinge"/>
|
||||
<geom friction="0.9" fromto="0 0 0.6 0 0 0.1" name="leg_geom" size="0.04" type="capsule"/>
|
||||
<body name="foot" pos="0.2/2 0 0.1">
|
||||
<site name="foot_right_site" pos="0 0 0.04" size="0.02 0.02 0.02" rgba="0 0 1 1" type="sphere"/>
|
||||
<joint axis="0 -1 0" name="foot_joint" pos="0 0 0.1" range="-45 45" type="hinge"/>
|
||||
<geom friction="0.9" fromto="-0.0 0 0.1 0.2 0 0.1" name="foot_geom" size="0.06" type="capsule"/>
|
||||
<geom name="floor" size="40 40 40" type="plane" conaffinity="1" material="MatPlane" rgba="0.8 0.9 0.8 1"/>
|
||||
<light pos="0 0 1.3" dir="0 0 -1" directional="true" cutoff="100" exponent="1" diffuse="1 1 1" specular="0.1 0.1 0.1"/>
|
||||
<body name="torso" pos="0 0 1.25" gravcomp="0">
|
||||
<joint name="rootx" pos="0 0 -1.25" axis="1 0 0" limited="false" type="slide" armature="0" damping="0"/>
|
||||
<joint name="rootz" pos="0 0 -1.25" axis="0 0 1" limited="false" type="slide" ref="1.25" armature="0" damping="0"/>
|
||||
<joint name="rooty" pos="0 0 0" axis="0 1 0" limited="false" armature="0" damping="0"/>
|
||||
<geom name="torso_geom" size="0.05 0.2" type="capsule" friction="0.9 0.1 0.1"/>
|
||||
<camera name="track" pos="0 -3 -0.25" quat="0.707107 0.707107 0 0" mode="trackcom"/>
|
||||
<body name="thigh" pos="0 0 -0.2" gravcomp="0">
|
||||
<joint name="thigh_joint" pos="0 0 0" axis="0 -1 0" range="-2.61799 0"/>
|
||||
<geom name="thigh_geom" size="0.05 0.225" pos="0 0 -0.225" type="capsule" friction="0.9 0.1 0.1"/>
|
||||
<body name="leg" pos="0 0 -0.7" gravcomp="0">
|
||||
<joint name="leg_joint" pos="0 0 0.25" axis="0 -1 0" range="-2.61799 0"/>
|
||||
<geom name="leg_geom" size="0.04 0.25" type="capsule" friction="0.9 0.1 0.1"/>
|
||||
<body name="foot" pos="0.1 0 -0.25" gravcomp="0">
|
||||
<joint name="foot_joint" pos="-0.1 0 0" axis="0 -1 0" range="-0.785398 0.785398"/>
|
||||
<geom name="foot_geom" size="0.06 0.1" quat="0.707107 0 -0.707107 0" type="capsule" friction="0.9 0.1 0.1"/>
|
||||
<site name="foot_right_site" pos="-0.1 0 -0.06" size="0.02" rgba="0 0 1 1"/>
|
||||
</body>
|
||||
</body>
|
||||
</body>
|
||||
<!-- copied and then replace thigh->thigh_left, leg->leg_left, foot->foot_right -->
|
||||
<body name="thigh_left" pos="0 0 1.05">
|
||||
<joint axis="0 -1 0" name="thigh_left_joint" pos="0 0 1.05" range="-150 0" type="hinge"/>
|
||||
<geom friction="0.9" fromto="0 0 1.05 0 0 0.6" name="thigh_left_geom" rgba=".7 .3 .6 1" size="0.05" type="capsule"/>
|
||||
<body name="leg_left" pos="0 0 0.35">
|
||||
<joint axis="0 -1 0" name="leg_left_joint" pos="0 0 0.6" range="-150 0" type="hinge"/>
|
||||
<geom friction="0.9" fromto="0 0 0.6 0 0 0.1" name="leg_left_geom" rgba=".7 .3 .6 1" size="0.04" type="capsule"/>
|
||||
<body name="foot_left" pos="0.2/2 0 0.1">
|
||||
<site name="foot_left_site" pos="0 0 0.04" size="0.02 0.02 0.02" rgba="1 0 0 1" type="sphere"/>
|
||||
<joint axis="0 -1 0" name="foot_left_joint" pos="0 0 0.1" range="-45 45" type="hinge"/>
|
||||
<geom friction="1.9" fromto="-0.0 0 0.1 0.2 0 0.1" name="foot_left_geom" rgba=".7 .3 .6 1" size="0.06" type="capsule"/>
|
||||
<body name="thigh_left" pos="0 0 -0.2" gravcomp="0">
|
||||
<joint name="thigh_left_joint" pos="0 0 0" axis="0 -1 0" range="-2.61799 0"/>
|
||||
<geom name="thigh_left_geom" size="0.05 0.225" pos="0 0 -0.225" type="capsule" friction="0.9 0.1 0.1" rgba="0.7 0.3 0.6 1"/>
|
||||
<body name="leg_left" pos="0 0 -0.7" gravcomp="0">
|
||||
<joint name="leg_left_joint" pos="0 0 0.25" axis="0 -1 0" range="-2.61799 0"/>
|
||||
<geom name="leg_left_geom" size="0.04 0.25" type="capsule" friction="0.9 0.1 0.1" rgba="0.7 0.3 0.6 1"/>
|
||||
<body name="foot_left" pos="0.1 0 -0.25" gravcomp="0">
|
||||
<joint name="foot_left_joint" pos="-0.1 0 0" axis="0 -1 0" range="-0.785398 0.785398"/>
|
||||
<geom name="foot_left_geom" size="0.06 0.1" quat="0.707107 0 -0.707107 0" type="capsule" friction="1.9 0.1 0.1" rgba="0.7 0.3 0.6 1"/>
|
||||
<site name="foot_left_site" pos="-0.1 0 -0.06" size="0.02" rgba="1 0 0 1"/>
|
||||
</body>
|
||||
</body>
|
||||
</body>
|
||||
</body>
|
||||
</worldbody>
|
||||
<actuator>
|
||||
<!-- <motor joint="torso_joint" ctrlrange="-100.0 100.0" isctrllimited="true"/>-->
|
||||
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="100" joint="thigh_joint"/>
|
||||
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="100" joint="leg_joint"/>
|
||||
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="100" joint="foot_joint"/>
|
||||
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="100" joint="thigh_left_joint"/>
|
||||
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="100" joint="leg_left_joint"/>
|
||||
<motor ctrllimited="true" ctrlrange="-1.0 1.0" gear="100" joint="foot_left_joint"/>
|
||||
<!-- <motor joint="finger2_rot" ctrlrange="-20.0 20.0" isctrllimited="true"/>-->
|
||||
<general joint="thigh_joint" ctrlrange="-1 1" gear="100 0 0 0 0 0" actdim="0"/>
|
||||
<general joint="leg_joint" ctrlrange="-1 1" gear="100 0 0 0 0 0" actdim="0"/>
|
||||
<general joint="foot_joint" ctrlrange="-1 1" gear="100 0 0 0 0 0" actdim="0"/>
|
||||
<general joint="thigh_left_joint" ctrlrange="-1 1" gear="100 0 0 0 0 0" actdim="0"/>
|
||||
<general joint="leg_left_joint" ctrlrange="-1 1" gear="100 0 0 0 0 0" actdim="0"/>
|
||||
<general joint="foot_left_joint" ctrlrange="-1 1" gear="100 0 0 0 0 0" actdim="0"/>
|
||||
</actuator>
|
||||
<asset>
|
||||
<texture type="skybox" builtin="gradient" rgb1=".4 .5 .6" rgb2="0 0 0"
|
||||
width="100" height="100"/>
|
||||
<texture builtin="flat" height="1278" mark="cross" markrgb="1 1 1" name="texgeom" random="0.01" rgb1="0.8 0.6 0.4" rgb2="0.8 0.6 0.4" type="cube" width="127"/>
|
||||
<texture builtin="checker" height="100" name="texplane" rgb1="0 0 0" rgb2="0.8 0.8 0.8" type="2d" width="100"/>
|
||||
<material name="MatPlane" reflectance="0.5" shininess="1" specular="1" texrepeat="60 60" texture="texplane"/>
|
||||
<material name="geom" texture="texgeom" texuniform="true"/>
|
||||
</asset>
|
||||
</mujoco>
|
||||
|
@ -6,6 +6,11 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
|
||||
|
||||
|
||||
class MPWrapper(RawInterfaceWrapper):
|
||||
mp_config = {
|
||||
'ProMP': {},
|
||||
'DMP': {},
|
||||
'ProDMP': {},
|
||||
}
|
||||
|
||||
@property
|
||||
def context_mask(self):
|
||||
|
@ -1,8 +1,13 @@
|
||||
import os
|
||||
from typing import Optional
|
||||
from typing import Optional, Any, Dict, Tuple
|
||||
|
||||
import numpy as np
|
||||
from gym.envs.mujoco.walker2d_v4 import Walker2dEnv
|
||||
from gymnasium.envs.mujoco.walker2d_v4 import Walker2dEnv, DEFAULT_CAMERA_CONFIG
|
||||
from gymnasium.core import ObsType
|
||||
|
||||
from gymnasium import utils
|
||||
from gymnasium.envs.mujoco import MujocoEnv
|
||||
from gymnasium.spaces import Box
|
||||
|
||||
MAX_EPISODE_STEPS_WALKERJUMP = 300
|
||||
|
||||
@ -11,8 +16,71 @@ MAX_EPISODE_STEPS_WALKERJUMP = 300
|
||||
# to the same structure as the Hopper, where the angles are randomized (->contexts) and the agent should jump as high
|
||||
# as possible, while landing at a specific target position
|
||||
|
||||
class Walker2dEnvCustomXML(Walker2dEnv):
|
||||
def __init__(
|
||||
self,
|
||||
xml_file,
|
||||
forward_reward_weight=1.0,
|
||||
ctrl_cost_weight=1e-3,
|
||||
healthy_reward=1.0,
|
||||
terminate_when_unhealthy=True,
|
||||
healthy_z_range=(0.8, 2.0),
|
||||
healthy_angle_range=(-1.0, 1.0),
|
||||
reset_noise_scale=5e-3,
|
||||
exclude_current_positions_from_observation=True,
|
||||
**kwargs,
|
||||
):
|
||||
utils.EzPickle.__init__(
|
||||
self,
|
||||
xml_file,
|
||||
forward_reward_weight,
|
||||
ctrl_cost_weight,
|
||||
healthy_reward,
|
||||
terminate_when_unhealthy,
|
||||
healthy_z_range,
|
||||
healthy_angle_range,
|
||||
reset_noise_scale,
|
||||
exclude_current_positions_from_observation,
|
||||
**kwargs,
|
||||
)
|
||||
|
||||
class Walker2dJumpEnv(Walker2dEnv):
|
||||
self._forward_reward_weight = forward_reward_weight
|
||||
self._ctrl_cost_weight = ctrl_cost_weight
|
||||
|
||||
self._healthy_reward = healthy_reward
|
||||
self._terminate_when_unhealthy = terminate_when_unhealthy
|
||||
|
||||
self._healthy_z_range = healthy_z_range
|
||||
self._healthy_angle_range = healthy_angle_range
|
||||
|
||||
self._reset_noise_scale = reset_noise_scale
|
||||
|
||||
self._exclude_current_positions_from_observation = (
|
||||
exclude_current_positions_from_observation
|
||||
)
|
||||
|
||||
if exclude_current_positions_from_observation:
|
||||
observation_space = Box(
|
||||
low=-np.inf, high=np.inf, shape=(18,), dtype=np.float64
|
||||
)
|
||||
else:
|
||||
observation_space = Box(
|
||||
low=-np.inf, high=np.inf, shape=(19,), dtype=np.float64
|
||||
)
|
||||
|
||||
self.observation_space = observation_space
|
||||
|
||||
MujocoEnv.__init__(
|
||||
self,
|
||||
xml_file,
|
||||
4,
|
||||
observation_space=observation_space,
|
||||
default_camera_config=DEFAULT_CAMERA_CONFIG,
|
||||
**kwargs,
|
||||
)
|
||||
|
||||
|
||||
class Walker2dJumpEnv(Walker2dEnvCustomXML):
|
||||
"""
|
||||
healthy reward 1.0 -> 0.005 -> 0.0025 not from alex
|
||||
penalty 10 -> 0 not from alex
|
||||
@ -54,13 +122,13 @@ class Walker2dJumpEnv(Walker2dEnv):
|
||||
|
||||
self.max_height = max(height, self.max_height)
|
||||
|
||||
done = bool(height < 0.2)
|
||||
terminated = bool(height < 0.2)
|
||||
|
||||
ctrl_cost = self.control_cost(action)
|
||||
costs = ctrl_cost
|
||||
rewards = 0
|
||||
if self.current_step >= self.max_episode_steps or done:
|
||||
done = True
|
||||
if self.current_step >= self.max_episode_steps or terminated:
|
||||
terminated = True
|
||||
height_goal_distance = -10 * (np.linalg.norm(self.max_height - self.goal))
|
||||
healthy_reward = self.healthy_reward * self.current_step
|
||||
|
||||
@ -73,17 +141,20 @@ class Walker2dJumpEnv(Walker2dEnv):
|
||||
'max_height': self.max_height,
|
||||
'goal': self.goal,
|
||||
}
|
||||
truncated = False
|
||||
|
||||
return observation, reward, done, info
|
||||
return observation, reward, terminated, truncated, info
|
||||
|
||||
def _get_obs(self):
|
||||
return np.append(super()._get_obs(), self.goal)
|
||||
|
||||
def reset(self, *, seed: Optional[int] = None, return_info: bool = False, options: Optional[dict] = None):
|
||||
def reset(self, *, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None) \
|
||||
-> Tuple[ObsType, Dict[str, Any]]:
|
||||
self.current_step = 0
|
||||
self.max_height = 0
|
||||
ret = super().reset(seed=seed, options=options)
|
||||
self.goal = self.np_random.uniform(1.5, 2.5, 1) # 1.5 3.0
|
||||
return super().reset()
|
||||
return ret
|
||||
|
||||
# overwrite reset_model to make it deterministic
|
||||
def reset_model(self):
|
||||
@ -97,21 +168,3 @@ class Walker2dJumpEnv(Walker2dEnv):
|
||||
|
||||
observation = self._get_obs()
|
||||
return observation
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
render_mode = "human" # "human" or "partial" or "final"
|
||||
env = Walker2dJumpEnv()
|
||||
obs = env.reset()
|
||||
|
||||
for i in range(6000):
|
||||
# test with random actions
|
||||
ac = env.action_space.sample()
|
||||
obs, rew, d, info = env.step(ac)
|
||||
if i % 10 == 0:
|
||||
env.render(mode=render_mode)
|
||||
if d:
|
||||
print('After ', i, ' steps, done: ', d)
|
||||
env.reset()
|
||||
|
||||
env.close()
|
||||
|
309
fancy_gym/envs/registry.py
Normal file
@ -0,0 +1,309 @@
|
||||
from typing import Tuple, Union, Callable, List, Dict, Any, Optional
|
||||
|
||||
import copy
|
||||
import importlib
|
||||
import numpy as np
|
||||
from collections import defaultdict
|
||||
|
||||
from collections.abc import Mapping, MutableMapping
|
||||
|
||||
from fancy_gym.utils.make_env_helpers import make_bb
|
||||
from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
|
||||
|
||||
from gymnasium import register as gym_register
|
||||
from gymnasium import make as gym_make
|
||||
from gymnasium.envs.registration import registry as gym_registry
|
||||
|
||||
|
||||
class DefaultMPWrapper(RawInterfaceWrapper):
|
||||
@property
|
||||
def context_mask(self):
|
||||
"""
|
||||
Returns boolean mask of the same shape as the observation space.
|
||||
It determines whether the observation is returned for the contextual case or not.
|
||||
This effectively allows filtering unwanted or unnecessary observations from the full step-based case.
|
||||
E.g. Velocities starting at 0 are only changing after the first action. Given we only receive the
|
||||
context/part of the first observation, the velocities are not necessary in the observation for the task.
|
||||
Returns:
|
||||
bool array representing the indices of the observations
|
||||
"""
|
||||
# If the env already defines a context_mask, we will use that
|
||||
if hasattr(self.env, 'context_mask'):
|
||||
return self.env.context_mask
|
||||
|
||||
# Otherwise we will use the whole observation as the context. (Write a custom MPWrapper to change this behavior)
|
||||
return np.full(self.env.observation_space.shape, True)
|
||||
|
||||
@property
|
||||
def current_pos(self) -> Union[float, int, np.ndarray, Tuple]:
|
||||
"""
|
||||
Returns the current position of the action/control dimension.
|
||||
The dimensionality has to match the action/control dimension.
|
||||
This is not required when exclusively using velocity control,
|
||||
it should, however, be implemented regardless.
|
||||
E.g. The joint positions that are directly or indirectly controlled by the action.
|
||||
"""
|
||||
assert hasattr(self.env, 'current_pos'), 'DefaultMPWrapper was unable to access env.current_pos. Please write a custom MPWrapper (recommended) or expose this attribute directly.'
|
||||
return self.env.current_pos
|
||||
|
||||
@property
|
||||
def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
|
||||
"""
|
||||
Returns the current velocity of the action/control dimension.
|
||||
The dimensionality has to match the action/control dimension.
|
||||
This is not required when exclusively using position control,
|
||||
it should, however, be implemented regardless.
|
||||
E.g. The joint velocities that are directly or indirectly controlled by the action.
|
||||
"""
|
||||
assert hasattr(self.env, 'current_vel'), 'DefaultMPWrapper was unable to access env.current_vel. Please write a custom MPWrapper (recommended) or expose this attribute directly.'
|
||||
return self.env.current_vel
|
||||
|
||||
|
||||
_BB_DEFAULTS = {
|
||||
'ProMP': {
|
||||
'wrappers': [],
|
||||
'trajectory_generator_kwargs': {
|
||||
'trajectory_generator_type': 'promp'
|
||||
},
|
||||
'phase_generator_kwargs': {
|
||||
'phase_generator_type': 'linear'
|
||||
},
|
||||
'controller_kwargs': {
|
||||
'controller_type': 'motor',
|
||||
'p_gains': 1.0,
|
||||
'd_gains': 0.1,
|
||||
},
|
||||
'basis_generator_kwargs': {
|
||||
'basis_generator_type': 'zero_rbf',
|
||||
'num_basis': 5,
|
||||
'num_basis_zero_start': 1,
|
||||
'basis_bandwidth_factor': 3.0,
|
||||
},
|
||||
'black_box_kwargs': {
|
||||
}
|
||||
},
|
||||
'DMP': {
|
||||
'wrappers': [],
|
||||
'trajectory_generator_kwargs': {
|
||||
'trajectory_generator_type': 'dmp'
|
||||
},
|
||||
'phase_generator_kwargs': {
|
||||
'phase_generator_type': 'exp'
|
||||
},
|
||||
'controller_kwargs': {
|
||||
'controller_type': 'motor',
|
||||
'p_gains': 1.0,
|
||||
'd_gains': 0.1,
|
||||
},
|
||||
'basis_generator_kwargs': {
|
||||
'basis_generator_type': 'rbf',
|
||||
'num_basis': 5
|
||||
},
|
||||
'black_box_kwargs': {
|
||||
}
|
||||
},
|
||||
'ProDMP': {
|
||||
'wrappers': [],
|
||||
'trajectory_generator_kwargs': {
|
||||
'trajectory_generator_type': 'prodmp',
|
||||
'duration': 2.0,
|
||||
'weights_scale': 1.0,
|
||||
},
|
||||
'phase_generator_kwargs': {
|
||||
'phase_generator_type': 'exp',
|
||||
'tau': 1.5,
|
||||
},
|
||||
'controller_kwargs': {
|
||||
'controller_type': 'motor',
|
||||
'p_gains': 1.0,
|
||||
'd_gains': 0.1,
|
||||
},
|
||||
'basis_generator_kwargs': {
|
||||
'basis_generator_type': 'prodmp',
|
||||
'alpha': 10,
|
||||
'num_basis': 5,
|
||||
},
|
||||
'black_box_kwargs': {
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
KNOWN_MPS = list(_BB_DEFAULTS.keys())
|
||||
_KNOWN_MPS_PLUS_ALL = KNOWN_MPS + ['all']
|
||||
ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS = {mp_type: [] for mp_type in _KNOWN_MPS_PLUS_ALL}
|
||||
MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS = {}
|
||||
|
||||
|
||||
def register(
|
||||
id: str,
|
||||
entry_point: Optional[Union[Callable, str]] = None,
|
||||
mp_wrapper: RawInterfaceWrapper = DefaultMPWrapper,
|
||||
register_step_based: bool = True, # TODO: Detect
|
||||
add_mp_types: List[str] = KNOWN_MPS,
|
||||
mp_config_override: Dict[str, Any] = {},
|
||||
**kwargs
|
||||
):
|
||||
"""
|
||||
Registers a Gymnasium environment, including Movement Primitives (MP) versions.
|
||||
If you only want to register MP versions for an already registered environment, use fancy_gym.upgrade instead.
|
||||
|
||||
Args:
|
||||
id (str): The unique identifier for the environment.
|
||||
entry_point (Optional[Union[Callable, str]]): The entry point for creating the environment.
|
||||
mp_wrapper (RawInterfaceWrapper): The MP wrapper for the environment.
|
||||
register_step_based (bool): Whether to also register the raw step-based version of the environment (default True).
|
||||
add_mp_types (List[str]): List of additional MP types to register.
|
||||
mp_config_override (Dict[str, Any]): Dictionary for overriding MP configuration.
|
||||
**kwargs: Additional keyword arguments which are passed to the environment constructor.
|
||||
|
||||
Notes:
|
||||
- When `register_step_based` is True, the raw environment will also be registered with Gymnasium; otherwise only the MP versions will be registered.
|
||||
- `entry_point` can be given as a string, allowing the same notation as gymnasium.
|
||||
- If `id` already exists in the Gymnasium registry and `register_step_based` is True,
|
||||
a warning message will be printed, suggesting to set `register_step_based=False` or use `fancy_gym.upgrade`.
|
||||
|
||||
Example:
|
||||
To register a step-based environment with Movement Primitive versions (will use default mp_wrapper):
|
||||
>>> register("MyEnv-v0", MyEnvClass)
|
||||
|
||||
The entry point can also be provided as a string:
|
||||
>>> register("MyEnv-v0", "my_module:MyEnvClass")
|
||||
|
||||
"""
|
||||
if register_step_based and id in gym_registry:
|
||||
print(f'[Info] Gymnasium env with id "{id}" already exists. You should supply register_step_based=False or use fancy_gym.upgrade if you only want to register mp versions of an existing env.')
|
||||
if register_step_based:
|
||||
assert entry_point is not None, 'You need to provide an entry-point, when registering step-based.'
|
||||
if not callable(mp_wrapper): # mp_wrapper can be given as a String (same notation as for entry_point)
|
||||
mod_name, attr_name = mp_wrapper.split(':')
|
||||
mod = importlib.import_module(mod_name)
|
||||
mp_wrapper = getattr(mod, attr_name)
|
||||
if register_step_based:
|
||||
gym_register(id=id, entry_point=entry_point, **kwargs)
|
||||
upgrade(id, mp_wrapper, add_mp_types, mp_config_override)
|
||||
|
||||
|
||||
def upgrade(
|
||||
id: str,
|
||||
mp_wrapper: RawInterfaceWrapper = DefaultMPWrapper,
|
||||
add_mp_types: List[str] = KNOWN_MPS,
|
||||
base_id: Optional[str] = None,
|
||||
mp_config_override: Dict[str, Any] = {},
|
||||
):
|
||||
"""
|
||||
Upgrades an existing Gymnasium environment to include Movement Primitives (MP) versions.
|
||||
We expect the raw step-based env to be already registered with gymnasium. Otherwise please use fancy_gym.register instead.
|
||||
|
||||
Args:
|
||||
id (str): The unique identifier for the environment.
|
||||
mp_wrapper (RawInterfaceWrapper): The MP wrapper for the environment (default is DefaultMPWrapper).
|
||||
add_mp_types (List[str]): List of additional MP types to register (default is KNOWN_MPS).
|
||||
base_id (Optional[str]): The unique identifier for the environment to upgrade. Will use id if none is provided. Can be defined to allow multiple registrations of different versions for the same step-based environment.
|
||||
mp_config_override (Dict[str, Any]): Dictionary for overriding MP configuration.
|
||||
|
||||
Notes:
|
||||
- The `id` parameter should match the ID of the existing Gymnasium environment you wish to upgrade. You can also pick a new one, but then `base_id` needs to be provided.
|
||||
- The `mp_wrapper` parameter specifies the MP wrapper to use, allowing for customization.
|
||||
- `add_mp_types` can be used to specify additional MP types to register alongside the base environment.
|
||||
- The `base_id` parameter should match the ID of the existing Gymnasium environment you wish to upgrade.
|
||||
- `mp_config_override` allows for customizing MP configuration if needed.
|
||||
|
||||
Example:
|
||||
To upgrade an existing environment with MP versions:
|
||||
>>> upgrade("MyEnv-v0", mp_wrapper=CustomMPWrapper)
|
||||
|
||||
To upgrade an existing environment with custom MP types and configuration:
|
||||
>>> upgrade("MyEnv-v0", mp_wrapper=CustomMPWrapper, add_mp_types=["ProDMP", "DMP"], mp_config_override={"param": 42})
|
||||
"""
|
||||
if not base_id:
|
||||
base_id = id
|
||||
register_mps(id, base_id, mp_wrapper, add_mp_types, mp_config_override)
|
||||
|
||||
|
||||
def register_mps(id: str, base_id: str, mp_wrapper: RawInterfaceWrapper, add_mp_types: List[str] = KNOWN_MPS, mp_config_override: Dict[str, Any] = {}):
|
||||
for mp_type in add_mp_types:
|
||||
register_mp(id, base_id, mp_wrapper, mp_type, mp_config_override.get(mp_type, {}))
|
||||
|
||||
|
||||
def register_mp(id: str, base_id: str, mp_wrapper: RawInterfaceWrapper, mp_type: str, mp_config_override: Dict[str, Any] = {}):
|
||||
assert mp_type in KNOWN_MPS, 'Unknown mp_type'
|
||||
assert id not in ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS[mp_type], f'The environment {id} is already registered for {mp_type}.'
|
||||
|
||||
parts = id.split('/')
|
||||
if len(parts) == 1:
|
||||
ns, name = 'gym', parts[0]
|
||||
elif len(parts) == 2:
|
||||
ns, name = parts[0], parts[1]
|
||||
else:
|
||||
raise ValueError('env id can not contain multiple "/".')
|
||||
|
||||
parts = name.split('-')
|
||||
assert len(parts) >= 2 and parts[-1].startswith('v'), 'Malformed env id, must end in -v{int}.'
|
||||
|
||||
fancy_id = f'{ns}_{mp_type}/{name}'
|
||||
|
||||
gym_register(
|
||||
id=fancy_id,
|
||||
entry_point=bb_env_constructor,
|
||||
kwargs={
|
||||
'underlying_id': base_id,
|
||||
'mp_wrapper': mp_wrapper,
|
||||
'mp_type': mp_type,
|
||||
'_mp_config_override_register': mp_config_override
|
||||
}
|
||||
)
|
||||
|
||||
ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS[mp_type].append(fancy_id)
|
||||
ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS['all'].append(fancy_id)
|
||||
if ns not in MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS:
|
||||
MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS[ns] = {mp_type: [] for mp_type in _KNOWN_MPS_PLUS_ALL}
|
||||
MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS[ns][mp_type].append(fancy_id)
|
||||
MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS[ns]['all'].append(fancy_id)
|
||||
|
||||
|
||||
def nested_update(base: MutableMapping, update):
|
||||
"""
|
||||
Update method for nested Mappings
|
||||
Args:
|
||||
base: main Mapping to be updated
|
||||
update: updated values for base Mapping
|
||||
|
||||
"""
|
||||
if any([item.endswith('_type') for item in update]):
|
||||
base = update
|
||||
return base
|
||||
for k, v in update.items():
|
||||
base[k] = nested_update(base.get(k, {}), v) if isinstance(v, Mapping) else v
|
||||
return base
|
||||
|
||||
|
||||
def bb_env_constructor(underlying_id, mp_wrapper, mp_type, mp_config_override={}, _mp_config_override_register={}, **kwargs):
|
||||
raw_underlying_env = gym_make(underlying_id, **kwargs)
|
||||
underlying_env = mp_wrapper(raw_underlying_env)
|
||||
|
||||
mp_config = getattr(underlying_env, 'mp_config') if hasattr(underlying_env, 'mp_config') else {}
|
||||
active_mp_config = copy.deepcopy(mp_config.get(mp_type, {}))
|
||||
global_inherit_defaults = mp_config.get('inherit_defaults', True)
|
||||
inherit_defaults = active_mp_config.pop('inherit_defaults', global_inherit_defaults)
|
||||
|
||||
config = copy.deepcopy(_BB_DEFAULTS[mp_type]) if inherit_defaults else {}
|
||||
nested_update(config, active_mp_config)
|
||||
nested_update(config, _mp_config_override_register)
|
||||
nested_update(config, mp_config_override)
|
||||
|
||||
wrappers = config.pop('wrappers')
|
||||
|
||||
traj_gen_kwargs = config.pop('trajectory_generator_kwargs', {})
|
||||
black_box_kwargs = config.pop('black_box_kwargs', {})
|
||||
contr_kwargs = config.pop('controller_kwargs', {})
|
||||
phase_kwargs = config.pop('phase_generator_kwargs', {})
|
||||
basis_kwargs = config.pop('basis_generator_kwargs', {})
|
||||
|
||||
return make_bb(underlying_env,
|
||||
wrappers=wrappers,
|
||||
black_box_kwargs=black_box_kwargs,
|
||||
traj_gen_kwargs=traj_gen_kwargs,
|
||||
controller_kwargs=contr_kwargs,
|
||||
phase_kwargs=phase_kwargs,
|
||||
basis_kwargs=basis_kwargs,
|
||||
**config)
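
The layering performed in `bb_env_constructor` can be hard to follow from the diff alone, so here is a minimal standalone sketch of it. `nested_update` is copied from the file above; `layer_configs` and the sample dictionaries are hypothetical names introduced only for illustration and are not part of `fancy_gym`. Note one consequence of the `_type` check: an override that names a `*_type` key replaces the whole sub-dictionary instead of merging into it.

```python
import copy
from collections.abc import Mapping


def nested_update(base, update):
    # Copied from registry.py: a sub-dict that names a '*_type' key
    # replaces the existing sub-dict wholesale instead of merging.
    if any([item.endswith('_type') for item in update]):
        base = update
        return base
    for k, v in update.items():
        base[k] = nested_update(base.get(k, {}), v) if isinstance(v, Mapping) else v
    return base


def layer_configs(defaults, *overrides):
    # Hypothetical helper: apply the overrides in order, later ones win,
    # mirroring the three nested_update calls in bb_env_constructor.
    config = copy.deepcopy(defaults)
    for override in overrides:
        nested_update(config, override)
    return config


# Illustrative values only, loosely modelled on the ProMP defaults.
defaults = {
    'controller_kwargs': {'controller_type': 'motor', 'p_gains': 1.0, 'd_gains': 0.1},
    'basis_generator_kwargs': {'basis_generator_type': 'zero_rbf', 'num_basis': 5},
}
env_mp_config = {'basis_generator_kwargs': {'num_basis': 10}}
make_time_override = {'controller_kwargs': {'controller_type': 'velocity'}}

config = layer_configs(defaults, env_mp_config, make_time_override)
print(config['basis_generator_kwargs'])  # merged: num_basis overridden, type kept
print(config['controller_kwargs'])       # replaced: the gains are gone
```

In this sketch, changing only `num_basis` keeps the default basis type, whereas switching `controller_type` discards the default gains, so an override that changes a type presumably needs to restate every parameter it still wants.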
|
@ -1,20 +1,23 @@
|
||||
import gymnasium as gym
|
||||
import fancy_gym
|
||||
|
||||
def example_run_replanning_env(env_name="BoxPushingDenseReplanProDMP-v0", seed=1, iterations=1, render=False):
|
||||
env = fancy_gym.make(env_name, seed=seed)
|
||||
env.reset()
|
||||
|
||||
def example_run_replanning_env(env_name="fancy_ProDMP/BoxPushingDenseReplan-v0", seed=1, iterations=1, render=False):
|
||||
env = gym.make(env_name)
|
||||
env.reset(seed=seed)
|
||||
for i in range(iterations):
|
||||
done = False
|
||||
while done is False:
|
||||
ac = env.action_space.sample()
|
||||
obs, reward, done, info = env.step(ac)
|
||||
obs, reward, terminated, truncated, info = env.step(ac)
|
||||
if render:
|
||||
env.render(mode="human")
|
||||
if done:
|
||||
if terminated or truncated:
|
||||
env.reset()
|
||||
env.close()
|
||||
del env
|
||||
|
||||
|
||||
def example_custom_replanning_envs(seed=0, iteration=100, render=True):
|
||||
# id for a step-based environment
|
||||
base_env_id = "BoxPushingDense-v0"
|
||||
@ -22,7 +25,7 @@ def example_custom_replanning_envs(seed=0, iteration=100, render=True):
|
||||
wrappers = [fancy_gym.envs.mujoco.box_pushing.mp_wrapper.MPWrapper]
|
||||
|
||||
trajectory_generator_kwargs = {'trajectory_generator_type': 'prodmp',
|
||||
'weight_scale': 1}
|
||||
'weights_scale': 1}
|
||||
phase_generator_kwargs = {'phase_generator_type': 'exp'}
|
||||
controller_kwargs = {'controller_type': 'velocity'}
|
||||
basis_generator_kwargs = {'basis_generator_type': 'prodmp',
|
||||
@ -46,8 +49,8 @@ def example_custom_replanning_envs(seed=0, iteration=100, render=True):
|
||||
|
||||
for i in range(iteration):
|
||||
ac = env.action_space.sample()
|
||||
obs, reward, done, info = env.step(ac)
|
||||
if done:
|
||||
obs, reward, terminated, truncated, info = env.step(ac)
|
||||
if terminated or truncated:
|
||||
env.reset()
|
||||
|
||||
env.close()
|
||||
@ -56,7 +59,7 @@ def example_custom_replanning_envs(seed=0, iteration=100, render=True):
|
||||
|
||||
if __name__ == "__main__":
|
||||
# run a registered replanning environment
|
||||
example_run_replanning_env(env_name="BoxPushingDenseReplanProDMP-v0", seed=1, iterations=1, render=False)
|
||||
example_run_replanning_env(env_name="fancy_ProDMP/BoxPushingDenseReplan-v0", seed=1, iterations=1, render=False)
|
||||
|
||||
# run a custom replanning environment
|
||||
example_custom_replanning_envs(seed=0, iteration=8, render=True)
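
The renamed ids used in this example (for instance `fancy_ProDMP/BoxPushingDenseReplan-v0`) follow the pattern built in `register_mp` in `fancy_gym/envs/registry.py`. The sketch below reproduces only that string handling so the scheme can be checked in isolation; `mp_env_id` is a hypothetical helper name, not a `fancy_gym` function, and it raises `ValueError` where the original uses assertions.

```python
KNOWN_MPS = ['ProMP', 'DMP', 'ProDMP']


def mp_env_id(env_id: str, mp_type: str) -> str:
    """Hypothetical helper mirroring the id handling in register_mp."""
    if mp_type not in KNOWN_MPS:
        raise ValueError(f'Unknown mp_type: {mp_type}')

    # Ids without a namespace fall back to the 'gym' namespace.
    parts = env_id.split('/')
    if len(parts) == 1:
        ns, name = 'gym', parts[0]
    elif len(parts) == 2:
        ns, name = parts
    else:
        raise ValueError('env id can not contain multiple "/".')

    # The name has to end in a version suffix such as -v0.
    name_parts = name.split('-')
    if len(name_parts) < 2 or not name_parts[-1].startswith('v'):
        raise ValueError('Malformed env id, must end in -v{int}.')

    return f'{ns}_{mp_type}/{name}'


print(mp_env_id('fancy/BoxPushingDenseReplan-v0', 'ProDMP'))
print(mp_env_id('dm_control/ball_in_cup-catch-v0', 'ProMP'))
print(mp_env_id('Pendulum-v1', 'DMP'))
```

The movement primitive type is appended to the namespace rather than to the environment name, which is why the old `BoxPushingDenseReplanProDMP-v0` becomes `fancy_ProDMP/BoxPushingDenseReplan-v0`.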
|
@ -1,7 +1,8 @@
|
||||
import gymnasium as gym
|
||||
import fancy_gym
|
||||
|
||||
|
||||
def example_dmc(env_id="dmc:fish-swim", seed=1, iterations=1000, render=True):
|
||||
def example_dmc(env_id="dm_control/fish-swim", seed=1, iterations=1000, render=True):
|
||||
"""
|
||||
Example for running a DMC based env in the step based setting.
|
||||
The env_id has to be specified as `domain_name:task_name` or
|
||||
@ -16,9 +17,9 @@ def example_dmc(env_id="dmc:fish-swim", seed=1, iterations=1000, render=True):
|
||||
Returns:
|
||||
|
||||
"""
|
||||
env = fancy_gym.make(env_id, seed)
|
||||
env = gym.make(env_id)
|
||||
rewards = 0
|
||||
obs = env.reset()
|
||||
obs = env.reset(seed=seed)
|
||||
print("observation shape:", env.observation_space.shape)
|
||||
print("action shape:", env.action_space.shape)
|
||||
|
||||
@ -26,10 +27,10 @@ def example_dmc(env_id="dmc:fish-swim", seed=1, iterations=1000, render=True):
|
||||
ac = env.action_space.sample()
|
||||
if render:
|
||||
env.render(mode="human")
|
||||
obs, reward, done, info = env.step(ac)
|
||||
obs, reward, terminated, truncated, info = env.step(ac)
|
||||
rewards += reward
|
||||
|
||||
if done:
|
||||
if terminated or truncated:
|
||||
print(env_id, rewards)
|
||||
rewards = 0
|
||||
obs = env.reset()
|
||||
@ -56,7 +57,7 @@ def example_custom_dmc_and_mp(seed=1, iterations=1, render=True):
|
||||
"""
|
||||
|
||||
# Base DMC name, according to structure of above example
|
||||
base_env_id = "dmc:ball_in_cup-catch"
|
||||
base_env_id = "dm_control/ball_in_cup-catch"
|
||||
|
||||
# Replace this wrapper with the custom wrapper for your environment by inheriting from the RawInterfaceWrapper.
|
||||
# You can also add other gym.Wrappers in case they are needed.
|
||||
@ -65,8 +66,8 @@ def example_custom_dmc_and_mp(seed=1, iterations=1, render=True):
|
||||
trajectory_generator_kwargs = {'trajectory_generator_type': 'promp'}
|
||||
phase_generator_kwargs = {'phase_generator_type': 'linear'}
|
||||
controller_kwargs = {'controller_type': 'motor',
|
||||
"p_gains": 1.0,
|
||||
"d_gains": 0.1,}
|
||||
"p_gains": 1.0,
|
||||
"d_gains": 0.1, }
|
||||
basis_generator_kwargs = {'basis_generator_type': 'zero_rbf',
|
||||
'num_basis': 5,
|
||||
'num_basis_zero_start': 1
|
||||
@ -102,10 +103,10 @@ def example_custom_dmc_and_mp(seed=1, iterations=1, render=True):
|
||||
# number of samples/full trajectories (multiple environment steps)
|
||||
for i in range(iterations):
|
||||
ac = env.action_space.sample()
|
||||
obs, reward, done, info = env.step(ac)
|
||||
obs, reward, terminated, truncated, info = env.step(ac)
|
||||
rewards += reward
|
||||
|
||||
if done:
|
||||
if terminated or truncated:
|
||||
print(base_env_id, rewards)
|
||||
rewards = 0
|
||||
obs = env.reset()
|
||||
@ -123,14 +124,14 @@ if __name__ == '__main__':
|
||||
render = True
|
||||
|
||||
# # Standard DMC Suite tasks
|
||||
example_dmc("dmc:fish-swim", seed=10, iterations=1000, render=render)
|
||||
example_dmc("dm_control/fish-swim", seed=10, iterations=1000, render=render)
|
||||
#
|
||||
# # Manipulation tasks
|
||||
# # Disclaimer: The vision versions are currently not integrated and yield an error
|
||||
example_dmc("dmc:manipulation-reach_site_features", seed=10, iterations=250, render=render)
|
||||
example_dmc("dm_control/manipulation-reach_site_features", seed=10, iterations=250, render=render)
|
||||
#
|
||||
# # Gym + DMC hybrid task provided in the MP framework
|
||||
example_dmc("dmc_ball_in_cup-catch_promp-v0", seed=10, iterations=1, render=render)
|
||||
example_dmc("dm_control_ProMP/ball_in_cup-catch-v0", seed=10, iterations=1, render=render)
|
||||
|
||||
# Custom DMC task # Different seed, because the episode is longer for this example and the name+seed combo is
|
||||
# already registered above
|
||||
|
@ -1,6 +1,6 @@
|
||||
from collections import defaultdict
|
||||
|
||||
import gym
|
||||
import gymnasium as gym
|
||||
import numpy as np
|
||||
|
||||
import fancy_gym
|
||||
@ -21,27 +21,27 @@ def example_general(env_id="Pendulum-v1", seed=1, iterations=1000, render=True):
|
||||
|
||||
"""
|
||||
|
||||
env = fancy_gym.make(env_id, seed)
|
||||
env = gym.make(env_id)
|
||||
rewards = 0
|
||||
obs = env.reset()
|
||||
obs = env.reset(seed=seed)
|
||||
print("Observation shape: ", env.observation_space.shape)
|
||||
print("Action shape: ", env.action_space.shape)
|
||||
|
||||
# number of environment steps
|
||||
for i in range(iterations):
|
||||
obs, reward, done, info = env.step(env.action_space.sample())
|
||||
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
|
||||
rewards += reward
|
||||
|
||||
if render:
|
||||
env.render()
|
||||
|
||||
if done:
|
||||
if terminated or truncated:
|
||||
print(rewards)
|
||||
rewards = 0
|
||||
obs = env.reset()
|
||||
|
||||
|
||||
def example_async(env_id="HoleReacher-v0", n_cpu=4, seed=int('533D', 16), n_samples=800):
|
||||
def example_async(env_id="fancy/HoleReacher-v0", n_cpu=4, seed=int('533D', 16), n_samples=800):
|
||||
"""
|
||||
Example for running any env in a vectorized multiprocessing setting to generate more samples faster.
|
||||
This also includes DMC and DMP environments when leveraging our custom make_env function.
|
||||
@ -69,12 +69,15 @@ def example_async(env_id="HoleReacher-v0", n_cpu=4, seed=int('533D', 16), n_samp
|
||||
# this would generate more samples than requested if n_samples % num_envs != 0
|
||||
repeat = int(np.ceil(n_samples / env.num_envs))
|
||||
for i in range(repeat):
|
||||
obs, reward, done, info = env.step(env.action_space.sample())
|
||||
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
|
||||
buffer['obs'].append(obs)
|
||||
buffer['reward'].append(reward)
|
||||
buffer['done'].append(done)
|
||||
buffer['terminated'].append(terminated)
|
||||
buffer['truncated'].append(truncated)
|
||||
buffer['info'].append(info)
|
||||
rewards += reward
|
||||
|
||||
done = np.logical_or(terminated, truncated)
|
||||
if np.any(done):
|
||||
print(f"Reward at iteration {i}: {rewards[done]}")
|
||||
rewards[done] = 0
|
||||
@ -90,11 +93,10 @@ if __name__ == '__main__':
|
||||
example_general("Pendulum-v1", seed=10, iterations=200, render=render)
|
||||
|
||||
# Mujoco task from framework
|
||||
example_general("Reacher5d-v0", seed=10, iterations=200, render=render)
|
||||
example_general("fancy/Reacher5d-v0", seed=10, iterations=200, render=render)
|
||||
|
||||
# # OpenAI Mujoco task
|
||||
example_general("HalfCheetah-v2", seed=10, render=render)
|
||||
|
||||
# Vectorized multiprocessing environments
|
||||
# example_async(env_id="HoleReacher-v0", n_cpu=2, seed=int('533D', 16), n_samples=2 * 200)
|
||||
|
||||
|
@ -1,7 +1,8 @@
|
||||
import gymnasium as gym
|
||||
import fancy_gym
|
||||
|
||||
|
||||
def example_dmc(env_id="fish-swim", seed=1, iterations=1000, render=True):
|
||||
def example_meta(env_id="fish-swim", seed=1, iterations=1000, render=True):
|
||||
"""
|
||||
Example for running a MetaWorld based env in the step based setting.
|
||||
The env_id has to be specified as `task_name-v2`. V1 versions are not supported and we always
|
||||
@ -17,9 +18,9 @@ def example_dmc(env_id="fish-swim", seed=1, iterations=1000, render=True):
|
||||
Returns:
|
||||
|
||||
"""
|
||||
env = fancy_gym.make(env_id, seed)
|
||||
env = gym.make(env_id)
|
||||
rewards = 0
|
||||
obs = env.reset()
|
||||
obs = env.reset(seed=seed)
|
||||
print("observation shape:", env.observation_space.shape)
|
||||
print("action shape:", env.action_space.shape)
|
||||
|
||||
@ -29,9 +30,9 @@ def example_dmc(env_id="fish-swim", seed=1, iterations=1000, render=True):
|
||||
# THIS NEEDS TO BE SET TO FALSE FOR NOW, BECAUSE THE INTERFACE FOR RENDERING IS DIFFERENT TO BASIC GYM
|
||||
# TODO: Remove this, when Metaworld fixes its interface.
|
||||
env.render(False)
|
||||
obs, reward, done, info = env.step(ac)
|
||||
obs, reward, terminated, truncated, info = env.step(ac)
|
||||
rewards += reward
|
||||
if done:
|
||||
if terminated or truncated:
|
||||
print(env_id, rewards)
|
||||
rewards = 0
|
||||
obs = env.reset()
|
||||
@ -40,7 +41,7 @@ def example_dmc(env_id="fish-swim", seed=1, iterations=1000, render=True):
|
||||
del env
|
||||
|
||||
|
||||
def example_custom_dmc_and_mp(seed=1, iterations=1, render=True):
|
||||
def example_custom_meta_and_mp(seed=1, iterations=1, render=True):
|
||||
"""
|
||||
Example for running a custom movement primitive based environment.
|
||||
Our already registered environments follow the same structure.
|
||||
@ -58,7 +59,7 @@ def example_custom_dmc_and_mp(seed=1, iterations=1, render=True):
|
||||
"""
|
||||
|
||||
# Base MetaWorld name, according to structure of above example
|
||||
base_env_id = "metaworld:button-press-v2"
|
||||
base_env_id = "metaworld/button-press-v2"
|
||||
|
||||
# Replace this wrapper with the custom wrapper for your environment by inheriting from the RawInterfaceWrapper.
|
||||
# You can also add other gym.Wrappers in case they are needed.
|
||||
@ -103,10 +104,10 @@ def example_custom_dmc_and_mp(seed=1, iterations=1, render=True):
|
||||
# number of samples/full trajectories (multiple environment steps)
|
||||
for i in range(iterations):
|
||||
ac = env.action_space.sample()
|
||||
obs, reward, done, info = env.step(ac)
|
||||
obs, reward, terminated, truncated, info = env.step(ac)
|
||||
rewards += reward
|
||||
|
||||
if done:
|
||||
if terminated or truncated:
|
||||
print(base_env_id, rewards)
|
||||
rewards = 0
|
||||
obs = env.reset()
|
||||
@ -124,11 +125,10 @@ if __name__ == '__main__':
|
||||
render = False
|
||||
|
||||
# # Standard Meta world tasks
|
||||
example_dmc("metaworld:button-press-v2", seed=10, iterations=500, render=render)
|
||||
example_meta("metaworld/button-press-v2", seed=10, iterations=500, render=render)
|
||||
|
||||
# # MP + MetaWorld hybrid task provided in our framework
|
||||
example_dmc("ButtonPressProMP-v2", seed=10, iterations=1, render=render)
|
||||
example_meta("metaworld_ProMP/ButtonPress-v2", seed=10, iterations=1, render=render)
|
||||
#
|
||||
# # Custom MetaWorld task
|
||||
example_custom_dmc_and_mp(seed=10, iterations=1, render=render)
|
||||
|
||||
example_custom_meta_and_mp(seed=10, iterations=1, render=render)
|
||||
|
@ -1,7 +1,8 @@
|
||||
import gymnasium as gym
|
||||
import fancy_gym
|
||||
|
||||
|
||||
def example_mp(env_name="HoleReacherProMP-v0", seed=1, iterations=1, render=True):
|
||||
def example_mp(env_name="fancy_ProMP/HoleReacher-v0", seed=1, iterations=1, render=True):
|
||||
"""
|
||||
Example for running a black box based environment, which is already registered
|
||||
Args:
|
||||
@ -15,11 +16,11 @@ def example_mp(env_name="HoleReacherProMP-v0", seed=1, iterations=1, render=True
|
||||
"""
|
||||
# Equivalent to gym, we have a make function which can be used to create environments.
|
||||
# It takes care of seeding and enables the use of a variety of external environments using the gym interface.
|
||||
env = fancy_gym.make(env_name, seed)
|
||||
env = gym.make(env_name)
|
||||
|
||||
returns = 0
|
||||
# env.render(mode=None)
|
||||
obs = env.reset()
|
||||
obs = env.reset(seed=seed)
|
||||
|
||||
# number of samples/full trajectories (multiple environment steps)
|
||||
for i in range(iterations):
|
||||
@ -41,16 +42,16 @@ def example_mp(env_name="HoleReacherProMP-v0", seed=1, iterations=1, render=True
|
||||
# This executes a full trajectory and gives back the context (obs) of the last step in the trajectory, or the
|
||||
# full observation space of the last step, if replanning/sub-trajectory learning is used. The 'reward' is equal
|
||||
# to the return of a trajectory. Default is the sum over the step-wise rewards.
|
||||
obs, reward, done, info = env.step(ac)
|
||||
obs, reward, terminated, truncated, info = env.step(ac)
|
||||
# Aggregated returns
|
||||
returns += reward
|
||||
|
||||
if done:
|
||||
if terminated or truncated:
|
||||
print(reward)
|
||||
obs = env.reset()
|
||||
|
||||
|
||||
def example_custom_mp(env_name="Reacher5dProMP-v0", seed=1, iterations=1, render=True):
|
||||
def example_custom_mp(env_name="fancy_ProMP/Reacher5d-v0", seed=1, iterations=1, render=True):
|
||||
"""
|
||||
Example for running a movement primitive based environment, which is already registered
|
||||
Args:
|
||||
@ -62,12 +63,9 @@ def example_custom_mp(env_name="Reacher5dProMP-v0", seed=1, iterations=1, render
|
||||
Returns:
|
||||
|
||||
"""
|
||||
# Changing the arguments of the black box env is possible by providing them to gym as with all kwargs.
|
||||
# Changing the arguments of the black box env is possible by providing them to gym through mp_config_override.
|
||||
# E.g. here for way too many basis functions
|
||||
env = fancy_gym.make(env_name, seed, basis_generator_kwargs={'num_basis': 1000})
|
||||
# env = fancy_gym.make(env_name, seed)
|
||||
# mp_dict.update({'black_box_kwargs': {'learn_sub_trajectories': True}})
|
||||
# mp_dict.update({'black_box_kwargs': {'do_replanning': lambda pos, vel, t: lambda t: t % 100}})
|
||||
env = gym.make(env_name, mp_config_override={'basis_generator_kwargs': {'num_basis': 1000}})
|
||||
|
||||
returns = 0
|
||||
obs = env.reset(seed=seed)
|
||||
@ -79,10 +77,10 @@ def example_custom_mp(env_name="Reacher5dProMP-v0", seed=1, iterations=1, render
|
||||
# number of samples/full trajectories (multiple environment steps)
|
||||
for i in range(iterations):
|
||||
ac = env.action_space.sample()
|
||||
obs, reward, done, info = env.step(ac)
|
||||
obs, reward, terminated, truncated, info = env.step(ac)
|
||||
returns += reward
|
||||
|
||||
if done:
|
||||
if terminated or truncated:
|
||||
print(i, reward)
|
||||
obs = env.reset()
|
||||
|
||||
@ -106,7 +104,7 @@ def example_fully_custom_mp(seed=1, iterations=1, render=True):
|
||||
|
||||
"""
|
||||
|
||||
base_env_id = "Reacher5d-v0"
|
||||
base_env_id = "fancy/Reacher5d-v0"
|
||||
|
||||
# Replace this wrapper with the custom wrapper for your environment by inheriting from the RawInterfaceWrapper.
|
||||
# You can also add other gym.Wrappers in case they are needed.
|
||||
@ -114,7 +112,7 @@ def example_fully_custom_mp(seed=1, iterations=1, render=True):
|
||||
|
||||
# For a ProMP
|
||||
trajectory_generator_kwargs = {'trajectory_generator_type': 'promp',
|
||||
'weight_scale': 2}
|
||||
'weights_scale': 2}
|
||||
phase_generator_kwargs = {'phase_generator_type': 'linear'}
|
||||
controller_kwargs = {'controller_type': 'velocity'}
|
||||
basis_generator_kwargs = {'basis_generator_type': 'zero_rbf',
|
||||
@ -124,7 +122,7 @@ def example_fully_custom_mp(seed=1, iterations=1, render=True):
|
||||
|
||||
# # For a DMP
|
||||
# trajectory_generator_kwargs = {'trajectory_generator_type': 'dmp',
|
||||
# 'weight_scale': 500}
|
||||
# 'weights_scale': 500}
|
||||
# phase_generator_kwargs = {'phase_generator_type': 'exp',
|
||||
# 'alpha_phase': 2.5}
|
||||
# controller_kwargs = {'controller_type': 'velocity'}
|
||||
@ -145,10 +143,10 @@ def example_fully_custom_mp(seed=1, iterations=1, render=True):
|
||||
# number of samples/full trajectories (multiple environment steps)
|
||||
for i in range(iterations):
|
||||
ac = env.action_space.sample()
|
||||
obs, reward, done, info = env.step(ac)
|
||||
obs, reward, terminated, truncated, info = env.step(ac)
|
||||
rewards += reward
|
||||
|
||||
if done:
|
||||
if terminated or truncated:
|
||||
print(rewards)
|
||||
rewards = 0
|
||||
obs = env.reset()
|
||||
@ -157,20 +155,20 @@ def example_fully_custom_mp(seed=1, iterations=1, render=True):
|
||||
if __name__ == '__main__':
|
||||
render = False
|
||||
# DMP
|
||||
example_mp("HoleReacherDMP-v0", seed=10, iterations=5, render=render)
|
||||
example_mp("fancy_DMP/HoleReacher-v0", seed=10, iterations=5, render=render)
|
||||
|
||||
# ProMP
|
||||
example_mp("HoleReacherProMP-v0", seed=10, iterations=5, render=render)
|
||||
example_mp("BoxPushingTemporalSparseProMP-v0", seed=10, iterations=1, render=render)
|
||||
example_mp("TableTennis4DProMP-v0", seed=10, iterations=20, render=render)
|
||||
example_mp("fancy_ProMP/HoleReacher-v0", seed=10, iterations=5, render=render)
|
||||
example_mp("fancy_ProMP/BoxPushingTemporalSparse-v0", seed=10, iterations=1, render=render)
|
||||
example_mp("fancy_ProMP/TableTennis4D-v0", seed=10, iterations=20, render=render)
|
||||
|
||||
# ProDMP with Replanning
|
||||
example_mp("BoxPushingDenseReplanProDMP-v0", seed=10, iterations=4, render=render)
|
||||
example_mp("TableTennis4DReplanProDMP-v0", seed=10, iterations=20, render=render)
|
||||
example_mp("TableTennisWindReplanProDMP-v0", seed=10, iterations=20, render=render)
|
||||
example_mp("fancy_ProDMP/BoxPushingDenseReplan-v0", seed=10, iterations=4, render=render)
|
||||
example_mp("fancy_ProDMP/TableTennis4DReplan-v0", seed=10, iterations=20, render=render)
|
||||
example_mp("fancy_ProDMP/TableTennisWindReplan-v0", seed=10, iterations=20, render=render)
|
||||
|
||||
# Altered basis functions
|
||||
obs1 = example_custom_mp("Reacher5dProMP-v0", seed=10, iterations=1, render=render)
|
||||
obs1 = example_custom_mp("fancy_ProMP/Reacher5d-v0", seed=10, iterations=1, render=render)
|
||||
|
||||
# Custom MP
|
||||
example_fully_custom_mp(seed=10, iterations=1, render=render)
|
||||
|
@ -1,3 +1,4 @@
|
||||
import gymnasium as gym
|
||||
import fancy_gym
|
||||
|
||||
|
||||
@ -12,11 +13,10 @@ def example_mp(env_name, seed=1, render=True):
|
||||
Returns:
|
||||
|
||||
"""
|
||||
# While in this case gym.make() is possible to use as well, we recommend our custom make env function.
|
||||
env = fancy_gym.make(env_name, seed)
|
||||
env = gym.make(env_name)
|
||||
|
||||
returns = 0
|
||||
obs = env.reset()
|
||||
obs = env.reset(seed=seed)
|
||||
# number of samples/full trajectories (multiple environment steps)
|
||||
for i in range(10):
|
||||
if render and i % 2 == 0:
|
||||
@ -24,14 +24,13 @@ def example_mp(env_name, seed=1, render=True):
|
||||
else:
|
||||
env.render()
|
||||
ac = env.action_space.sample()
|
||||
obs, reward, done, info = env.step(ac)
|
||||
obs, reward, terminated, truncated, info = env.step(ac)
|
||||
returns += reward
|
||||
|
||||
if done:
|
||||
if terminated or truncated:
|
||||
print(returns)
|
||||
obs = env.reset()
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
example_mp("ReacherProMP-v2")
|
||||
|
||||
example_mp("gym_ProMP/Reacher-v2")
|
||||
|
@ -1,10 +1,14 @@
|
||||
import gymnasium as gym
|
||||
import fancy_gym
|
||||
|
||||
|
||||
def compare_bases_shape(env1_id, env2_id):
|
||||
env1 = fancy_gym.make(env1_id, seed=0)
|
||||
env1 = gym.make(env1_id)
|
||||
env1.traj_gen.show_scaled_basis(plot=True)
|
||||
env2 = fancy_gym.make(env2_id, seed=0)
|
||||
env2 = gym.make(env2_id)
|
||||
env2.traj_gen.show_scaled_basis(plot=True)
|
||||
return
|
||||
|
||||
|
||||
if __name__ == '__main__':
|
||||
compare_bases_shape("TableTennis4DProDMP-v0", "TableTennis4DProMP-v0")
|
||||
compare_bases_shape("fancy_ProDMP/TableTennis4D-v0", "fancy_ProMP/TableTennis4D-v0")
|
||||
|
@ -3,19 +3,20 @@ from collections import OrderedDict
|
||||
import numpy as np
|
||||
from matplotlib import pyplot as plt
|
||||
|
||||
import gymnasium as gym
|
||||
import fancy_gym
|
||||
|
||||
# This might work for some environments, however, please verify either way the correct trajectory information
|
||||
# for your environment are extracted below
|
||||
SEED = 1
|
||||
|
||||
env_id = "Reacher5dProMP-v0"
|
||||
env_id = "fancy_ProMP/Reacher5d-v0"
|
||||
|
||||
env = fancy_gym.make(env_id, seed=SEED, controller_kwargs={'p_gains': 0.05, 'd_gains': 0.05}).env
|
||||
env = fancy_gym.make(env_id, mp_config_override={'controller_kwargs': {'p_gains': 0.05, 'd_gains': 0.05}}).env
|
||||
env.action_space.seed(SEED)
|
||||
|
||||
# Plot difference between real trajectory and target MP trajectory
|
||||
env.reset()
|
||||
env.reset(seed=SEED)
|
||||
w = env.action_space.sample()
|
||||
pos, vel = env.get_trajectory(w)
|
||||
|
||||
@ -34,7 +35,7 @@ fig.show()
|
||||
for t, (des_pos, des_vel) in enumerate(zip(pos, vel)):
|
||||
actions = env.tracking_controller.get_action(des_pos, des_vel, env.current_pos, env.current_vel)
|
||||
actions = np.clip(actions, env.env.action_space.low, env.env.action_space.high)
|
||||
_, _, _, _ = env.env.step(actions)
|
||||
env.env.step(actions)
|
||||
if t % 15 == 0:
|
||||
img.set_data(env.env.render(mode="rgb_array"))
|
||||
fig.canvas.draw()
|
||||
|
@ -1,26 +1,64 @@
|
||||
# MetaWorld Wrappers
|
||||
# Metaworld
|
||||
|
||||
These are the Environment Wrappers for selected [Metaworld](https://meta-world.github.io/) environments in order to use our Movement Primitive gym interface with them.
|
||||
All Metaworld environments have a 39 dimensional observation space with the same structure. The tasks differ only in the objective and the initial observations that are randomized.
|
||||
Unused observations are zeroed out. E.g. for `Button-Press-v2` the observation mask looks the following:
|
||||
```python
|
||||
return np.hstack([
|
||||
# Current observation
|
||||
[False] * 3, # end-effector position
|
||||
[False] * 1, # normalized gripper open distance
|
||||
[True] * 3, # main object position
|
||||
[False] * 4, # main object quaternion
|
||||
[False] * 3, # secondary object position
|
||||
[False] * 4, # secondary object quaternion
|
||||
# Previous observation
|
||||
[False] * 3, # previous end-effector position
|
||||
[False] * 1, # previous normalized gripper open distance
|
||||
[False] * 3, # previous main object position
|
||||
[False] * 4, # previous main object quaternion
|
||||
[False] * 3, # previous second object position
|
||||
[False] * 4, # previous second object quaternion
|
||||
# Goal
|
||||
[True] * 3, # goal position
|
||||
])
|
||||
```
|
||||
For other tasks only the boolean values have to be adjusted accordingly.
|
||||
[Metaworld](https://meta-world.github.io/) is an open-source simulated benchmark designed to advance meta-reinforcement learning and multi-task learning, comprising 50 diverse robotic manipulation tasks. The benchmark features a universal tabletop environment equipped with a simulated Sawyer arm and a variety of everyday objects. This shared environment is pivotal for reusing structured learning and efficiently acquiring related tasks.
|
||||
|
||||
## Step-Based Envs
|
||||
|
||||
`fancy_gym` makes all Metaworld ML1 tasks available via the standard gym interface. To access Metaworld environments using a different mode of operation (MT1 / ML100 / etc.), please use the functionality provided by Metaworld directly.
|
||||
|
||||
| Name                                     | Description                                                                           | Horizon | Action Dimension | Observation Dimension | Context Dimension |
| ---------------------------------------- | ------------------------------------------------------------------------------------- | ------- | ---------------- | --------------------- | ----------------- |
| `metaworld/assembly-v2`                  | A task where the robot must assemble components.                                      | 500     | 4                | 39                    | 6                 |
| `metaworld/basketball-v2`                | A task where the robot must play a game of basketball.                                | 500     | 4                | 39                    | 6                 |
| `metaworld/bin-picking-v2`               | A task involving the robot picking objects from a bin.                                | 500     | 4                | 39                    | 6                 |
| `metaworld/box-close-v2`                 | A task requiring the robot to close a box.                                            | 500     | 4                | 39                    | 6                 |
| `metaworld/button-press-topdown-v2`      | A task where the robot must press a button from a top-down perspective.               | 500     | 4                | 39                    | 6                 |
| `metaworld/button-press-topdown-wall-v2` | A task involving the robot pressing a button with a wall from a top-down perspective. | 500     | 4                | 39                    | 6                 |
| `metaworld/button-press-v2`              | A task where the robot must press a button.                                           | 500     | 4                | 39                    | 6                 |
| `metaworld/button-press-wall-v2`         | A task involving the robot pressing a button with a wall.                             | 500     | 4                | 39                    | 6                 |
| `metaworld/coffee-button-v2`             | A task where the robot must press a button on a coffee machine.                       | 500     | 4                | 39                    | 6                 |
| `metaworld/coffee-pull-v2`               | A task involving the robot pulling a lever on a coffee machine.                       | 500     | 4                | 39                    | 6                 |
| `metaworld/coffee-push-v2`               | A task involving the robot pushing a component on a coffee machine.                   | 500     | 4                | 39                    | 6                 |
| `metaworld/dial-turn-v2`                 | A task where the robot must turn a dial.                                              | 500     | 4                | 39                    | 6                 |
| `metaworld/disassemble-v2`               | A task requiring the robot to disassemble an object.                                  | 500     | 4                | 39                    | 6                 |
| `metaworld/door-close-v2`                | A task where the robot must close a door.                                             | 500     | 4                | 39                    | 6                 |
| `metaworld/door-lock-v2`                 | A task involving the robot locking a door.                                            | 500     | 4                | 39                    | 6                 |
| `metaworld/door-open-v2`                 | A task where the robot must open a door.                                              | 500     | 4                | 39                    | 6                 |
| `metaworld/door-unlock-v2`               | A task involving the robot unlocking a door.                                          | 500     | 4                | 39                    | 6                 |
| `metaworld/hand-insert-v2`               | A task requiring the robot to insert a hand into an object.                           | 500     | 4                | 39                    | 6                 |
| `metaworld/drawer-close-v2`              | A task where the robot must close a drawer.                                           | 500     | 4                | 39                    | 6                 |
| `metaworld/drawer-open-v2`               | A task involving the robot opening a drawer.                                          | 500     | 4                | 39                    | 6                 |
| `metaworld/faucet-open-v2`               | A task requiring the robot to open a faucet.                                          | 500     | 4                | 39                    | 6                 |
| `metaworld/faucet-close-v2`              | A task where the robot must close a faucet.                                           | 500     | 4                | 39                    | 6                 |
| `metaworld/hammer-v2`                    | A task where the robot must use a hammer.                                             | 500     | 4                | 39                    | 6                 |
| `metaworld/handle-press-side-v2`         | A task involving the robot pressing a handle from the side.                           | 500     | 4                | 39                    | 6                 |
| `metaworld/handle-press-v2`              | A task where the robot must press a handle.                                           | 500     | 4                | 39                    | 6                 |
| `metaworld/handle-pull-side-v2`          | A task requiring the robot to pull a handle from the side.                            | 500     | 4                | 39                    | 6                 |
| `metaworld/handle-pull-v2`               | A task where the robot must pull a handle.                                            | 500     | 4                | 39                    | 6                 |
| `metaworld/lever-pull-v2`                | A task involving the robot pulling a lever.                                           | 500     | 4                | 39                    | 6                 |
| `metaworld/peg-insert-side-v2`           | A task requiring the robot to insert a peg from the side.                             | 500     | 4                | 39                    | 6                 |
| `metaworld/pick-place-wall-v2`           | A task involving the robot picking and placing an object with a wall.                 | 500     | 4                | 39                    | 6                 |
| `metaworld/pick-out-of-hole-v2`          | A task where the robot must pick an object out of a hole.                             | 500     | 4                | 39                    | 6                 |
| `metaworld/reach-v2`                     | A task where the robot must reach an object.                                          | 500     | 4                | 39                    | 6                 |
| `metaworld/push-back-v2`                 | A task involving the robot pushing an object backward.                                | 500     | 4                | 39                    | 6                 |
| `metaworld/push-v2`                      | A task where the robot must push an object.                                           | 500     | 4                | 39                    | 6                 |
| `metaworld/pick-place-v2`                | A task involving the robot picking up and placing an object.                          | 500     | 4                | 39                    | 6                 |
| `metaworld/plate-slide-v2`               | A task requiring the robot to slide a plate.                                          | 500     | 4                | 39                    | 6                 |
| `metaworld/plate-slide-side-v2`          | A task involving the robot sliding a plate from the side.                             | 500     | 4                | 39                    | 6                 |
| `metaworld/plate-slide-back-v2`          | A task where the robot must slide a plate backward.                                   | 500     | 4                | 39                    | 6                 |
| `metaworld/plate-slide-back-side-v2`     | A task involving the robot sliding a plate backward from the side.                    | 500     | 4                | 39                    | 6                 |
| `metaworld/peg-unplug-side-v2`           | A task where the robot must unplug a peg from the side.                               | 500     | 4                | 39                    | 6                 |
| `metaworld/soccer-v2`                    | A task where the robot must play soccer.                                              | 500     | 4                | 39                    | 6                 |
| `metaworld/stick-push-v2`                | A task involving the robot pushing a stick.                                           | 500     | 4                | 39                    | 6                 |
| `metaworld/stick-pull-v2`                | A task where the robot must pull a stick.                                             | 500     | 4                | 39                    | 6                 |
| `metaworld/push-wall-v2`                 | A task involving the robot pushing against a wall.                                    | 500     | 4                | 39                    | 6                 |
| `metaworld/reach-wall-v2`                | A task where the robot must reach an object with a wall.                              | 500     | 4                | 39                    | 6                 |
| `metaworld/shelf-place-v2`               | A task involving the robot placing an object on a shelf.                              | 500     | 4                | 39                    | 6                 |
| `metaworld/sweep-into-v2`                | A task where the robot must sweep objects into a container.                           | 500     | 4                | 39                    | 6                 |
| `metaworld/sweep-v2`                     | A task requiring the robot to sweep.                                                  | 500     | 4                | 39                    | 6                 |
| `metaworld/window-open-v2`               | A task where the robot must open a window.                                            | 500     | 4                | 39                    | 6                 |
| `metaworld/window-close-v2`              | A task involving the robot closing a window.                                          | 500     | 4                | 39                    | 6                 |
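
To illustrate how the horizon and dimensions in the table interact with the Gymnasium-style `reset`/`step` signatures, here is a minimal sketch of a rollout loop. It does not import `fancy_gym` or Metaworld: `StubMetaworldEnv`, `sample_action` and `rollout` are hypothetical stand-ins written for this example, and only the numbers (horizon 500, action dimension 4, observation dimension 39) are taken from the table. A real environment may also set `terminated`, which the stub never does.

```python
import random

HORIZON = 500     # "Horizon" column of the table
ACTION_DIM = 4    # "Action Dimension" column
OBS_DIM = 39      # "Observation Dimension" column


class StubMetaworldEnv:
    """Hypothetical stand-in that only mimics the Gymnasium reset/step signatures."""

    def __init__(self):
        self._t = 0
        self._rng = random.Random()

    def reset(self, seed=None):
        # Gymnasium-style reset returns (observation, info).
        if seed is not None:
            self._rng.seed(seed)
        self._t = 0
        return [0.0] * OBS_DIM, {}

    def sample_action(self):
        # Stand-in for env.action_space.sample().
        return [self._rng.uniform(-1.0, 1.0) for _ in range(ACTION_DIM)]

    def step(self, action):
        assert len(action) == ACTION_DIM
        self._t += 1
        obs = [0.0] * OBS_DIM
        reward = 1.0
        terminated = False               # the stub never reaches a terminal state
        truncated = self._t >= HORIZON   # episode ends once the horizon is reached
        return obs, reward, terminated, truncated, {}


def rollout(env, n_steps, seed=0):
    """Run n_steps environment steps and collect the return of each finished episode."""
    returns, episode_return = [], 0.0
    obs, info = env.reset(seed=seed)
    for _ in range(n_steps):
        obs, reward, terminated, truncated, info = env.step(env.sample_action())
        episode_return += reward
        if terminated or truncated:
            returns.append(episode_return)
            episode_return = 0.0
            obs, info = env.reset()
    return returns


episode_returns = rollout(StubMetaworldEnv(), n_steps=1000)
print(episode_returns)  # [500.0, 500.0]
```

With a real environment, the stub would be replaced by the object returned from `gym.make("metaworld/button-press-v2")` and `sample_action()` by `env.action_space.sample()`; the loop body stays the same.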
|
||||
|
||||
## MP-Based Envs
|
||||
|
||||
All envs also exist in MP-variants. Refer to them using `metaworld_ProMP/<name-v2>` or `metaworld_ProDMP/<name-v2>` (DMP is currently not supported).
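
The naming scheme above can be sketched as a small string helper. This is only an illustration of the `metaworld_<MP type>/<name-v2>` pattern: `mp_env_id` and `SUPPORTED_MP_TYPES` are hypothetical names introduced here, not part of `fancy_gym`, and the resulting ID should still be checked against the environments that are actually registered.

```python
SUPPORTED_MP_TYPES = ("ProMP", "ProDMP")  # assumption taken from the sentence above; DMP is excluded


def mp_env_id(step_based_id: str, mp_type: str) -> str:
    """Turn 'metaworld/<name-v2>' into 'metaworld_<mp_type>/<name-v2>' (hypothetical helper)."""
    namespace, sep, name = step_based_id.partition("/")
    if not sep or namespace != "metaworld" or not name:
        raise ValueError(f"expected an ID of the form 'metaworld/<name-v2>', got {step_based_id!r}")
    if mp_type not in SUPPORTED_MP_TYPES:
        raise ValueError(f"unsupported MP type {mp_type!r}; expected one of {SUPPORTED_MP_TYPES}")
    return f"{namespace}_{mp_type}/{name}"


print(mp_env_id("metaworld/button-press-v2", "ProMP"))   # metaworld_ProMP/button-press-v2
print(mp_env_id("metaworld/button-press-v2", "ProDMP"))  # metaworld_ProDMP/button-press-v2
```

The returned string would then be passed to `gym.make` after importing `fancy_gym`, in the same way as the step-based IDs.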
|
||||
|
@ -1,125 +1,37 @@
|
||||
from typing import Iterable, Type, Union, Optional
|
||||
|
||||
from copy import deepcopy
|
||||
|
||||
from gym import register
|
||||
from ..envs.registry import register
|
||||
|
||||
from . import goal_object_change_mp_wrapper, goal_change_mp_wrapper, goal_endeffector_change_mp_wrapper, \
|
||||
object_change_mp_wrapper
|
||||
|
||||
from . import metaworld_adapter
|
||||
|
||||
metaworld_adapter.register_all_ML1()
|
||||
|
||||
ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS = {"DMP": [], "ProMP": [], "ProDMP": []}
|
||||
|
||||
# MetaWorld
|
||||
|
||||
DEFAULT_BB_DICT_ProMP = {
|
||||
"name": 'EnvName',
|
||||
"wrappers": [],
|
||||
"trajectory_generator_kwargs": {
|
||||
'trajectory_generator_type': 'promp',
|
||||
'weights_scale': 10,
|
||||
},
|
||||
"phase_generator_kwargs": {
|
||||
'phase_generator_type': 'linear'
|
||||
},
|
||||
"controller_kwargs": {
|
||||
'controller_type': 'metaworld',
|
||||
},
|
||||
"basis_generator_kwargs": {
|
||||
'basis_generator_type': 'zero_rbf',
|
||||
'num_basis': 5,
|
||||
'num_basis_zero_start': 1
|
||||
},
|
||||
'black_box_kwargs': {
|
||||
'condition_on_desired': False,
|
||||
}
|
||||
}
|
||||
|
||||
DEFAULT_BB_DICT_ProDMP = {
|
||||
"name": 'EnvName',
|
||||
"wrappers": [],
|
||||
"trajectory_generator_kwargs": {
|
||||
'trajectory_generator_type': 'prodmp',
|
||||
'auto_scale_basis': True,
|
||||
'weights_scale': 10,
|
||||
# 'goal_scale': 0.,
|
||||
'disable_goal': True,
|
||||
},
|
||||
"phase_generator_kwargs": {
|
||||
'phase_generator_type': 'exp',
|
||||
# 'alpha_phase' : 3,
|
||||
},
|
||||
"controller_kwargs": {
|
||||
'controller_type': 'metaworld',
|
||||
},
|
||||
"basis_generator_kwargs": {
|
||||
'basis_generator_type': 'prodmp',
|
||||
'num_basis': 5,
|
||||
'alpha': 10
|
||||
},
|
||||
'black_box_kwargs': {
|
||||
'condition_on_desired': False,
|
||||
}
|
||||
|
||||
}
|
||||
|
||||
_goal_change_envs = ["assembly-v2", "pick-out-of-hole-v2", "plate-slide-v2", "plate-slide-back-v2",
|
||||
"plate-slide-side-v2", "plate-slide-back-side-v2"]
|
||||
for _task in _goal_change_envs:
|
||||
task_id_split = _task.split("-")
|
||||
name = "".join([s.capitalize() for s in task_id_split[:-1]])
|
||||
|
||||
# ProMP
|
||||
_env_id = f'{name}ProMP-{task_id_split[-1]}'
|
||||
kwargs_dict_goal_change_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
|
||||
kwargs_dict_goal_change_promp['wrappers'].append(goal_change_mp_wrapper.MPWrapper)
|
||||
kwargs_dict_goal_change_promp['name'] = f'metaworld:{_task}'
|
||||
|
||||
register(
|
||||
id=_env_id,
|
||||
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
|
||||
kwargs=kwargs_dict_goal_change_promp
|
||||
id=f'metaworld/{_task}',
|
||||
register_step_based=False,
|
||||
mp_wrapper=goal_change_mp_wrapper.MPWrapper,
|
||||
add_mp_types=['ProMP', 'ProDMP'],
|
||||
)
|
||||
ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
|
||||
|
||||
# ProDMP
|
||||
_env_id = f'{name}ProDMP-{task_id_split[-1]}'
|
||||
kwargs_dict_goal_change_prodmp = deepcopy(DEFAULT_BB_DICT_ProDMP)
|
||||
kwargs_dict_goal_change_prodmp['wrappers'].append(goal_change_mp_wrapper.MPWrapper)
|
||||
kwargs_dict_goal_change_prodmp['name'] = f'metaworld:{_task}'
|
||||
|
||||
register(
|
||||
id=_env_id,
|
||||
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
|
||||
kwargs=kwargs_dict_goal_change_prodmp
|
||||
)
|
||||
ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProDMP"].append(_env_id)
|
||||
|
||||
_object_change_envs = ["bin-picking-v2", "hammer-v2", "sweep-into-v2"]
|
||||
for _task in _object_change_envs:
|
||||
task_id_split = _task.split("-")
|
||||
name = "".join([s.capitalize() for s in task_id_split[:-1]])
|
||||
|
||||
# ProMP
|
||||
_env_id = f'{name}ProMP-{task_id_split[-1]}'
|
||||
kwargs_dict_object_change_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
|
||||
kwargs_dict_object_change_promp['wrappers'].append(object_change_mp_wrapper.MPWrapper)
|
||||
kwargs_dict_object_change_promp['name'] = f'metaworld:{_task}'
|
||||
register(
|
||||
id=_env_id,
|
||||
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
|
||||
kwargs=kwargs_dict_object_change_promp
|
||||
id=f'metaworld/{_task}',
|
||||
register_step_based=False,
|
||||
mp_wrapper=object_change_mp_wrapper.MPWrapper,
|
||||
add_mp_types=['ProMP', 'ProDMP'],
|
||||
)
|
||||
ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
|
||||
|
||||
# ProDMP
|
||||
_env_id = f'{name}ProDMP-{task_id_split[-1]}'
|
||||
kwargs_dict_object_change_prodmp = deepcopy(DEFAULT_BB_DICT_ProDMP)
|
||||
kwargs_dict_object_change_prodmp['wrappers'].append(object_change_mp_wrapper.MPWrapper)
|
||||
kwargs_dict_object_change_prodmp['name'] = f'metaworld:{_task}'
|
||||
register(
|
||||
id=_env_id,
|
||||
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
|
||||
kwargs=kwargs_dict_object_change_prodmp
|
||||
)
|
||||
ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProDMP"].append(_env_id)
|
||||
|
||||
_goal_and_object_change_envs = ["box-close-v2", "button-press-v2", "button-press-wall-v2", "button-press-topdown-v2",
|
||||
"button-press-topdown-wall-v2", "coffee-button-v2", "coffee-pull-v2",
|
||||
@ -133,62 +45,18 @@ _goal_and_object_change_envs = ["box-close-v2", "button-press-v2", "button-press
|
||||
"shelf-place-v2", "sweep-v2", "window-open-v2", "window-close-v2"
|
||||
]
|
||||
for _task in _goal_and_object_change_envs:
|
||||
task_id_split = _task.split("-")
|
||||
name = "".join([s.capitalize() for s in task_id_split[:-1]])
|
||||
|
||||
# ProMP
|
||||
_env_id = f'{name}ProMP-{task_id_split[-1]}'
|
||||
kwargs_dict_goal_and_object_change_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
|
||||
kwargs_dict_goal_and_object_change_promp['wrappers'].append(goal_object_change_mp_wrapper.MPWrapper)
|
||||
kwargs_dict_goal_and_object_change_promp['name'] = f'metaworld:{_task}'
|
||||
|
||||
register(
|
||||
id=_env_id,
|
||||
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
|
||||
kwargs=kwargs_dict_goal_and_object_change_promp
|
||||
id=f'metaworld/{_task}',
|
||||
register_step_based=False,
|
||||
mp_wrapper=goal_object_change_mp_wrapper.MPWrapper,
|
||||
add_mp_types=['ProMP', 'ProDMP'],
|
||||
)
|
||||
ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
|
||||
|
||||
# ProDMP
|
||||
_env_id = f'{name}ProDMP-{task_id_split[-1]}'
|
||||
kwargs_dict_goal_and_object_change_prodmp = deepcopy(DEFAULT_BB_DICT_ProDMP)
|
||||
kwargs_dict_goal_and_object_change_prodmp['wrappers'].append(goal_object_change_mp_wrapper.MPWrapper)
|
||||
kwargs_dict_goal_and_object_change_prodmp['name'] = f'metaworld:{_task}'
|
||||
|
||||
register(
|
||||
id=_env_id,
|
||||
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
|
||||
kwargs=kwargs_dict_goal_and_object_change_prodmp
|
||||
)
|
||||
ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProDMP"].append(_env_id)
|
||||
|
||||
_goal_and_endeffector_change_envs = ["basketball-v2"]
|
||||
for _task in _goal_and_endeffector_change_envs:
|
||||
task_id_split = _task.split("-")
|
||||
name = "".join([s.capitalize() for s in task_id_split[:-1]])
|
||||
|
||||
# ProMP
|
||||
_env_id = f'{name}ProMP-{task_id_split[-1]}'
|
||||
kwargs_dict_goal_and_endeffector_change_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
|
||||
kwargs_dict_goal_and_endeffector_change_promp['wrappers'].append(goal_endeffector_change_mp_wrapper.MPWrapper)
|
||||
kwargs_dict_goal_and_endeffector_change_promp['name'] = f'metaworld:{_task}'
|
||||
|
||||
register(
|
||||
id=_env_id,
|
||||
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
|
||||
kwargs=kwargs_dict_goal_and_endeffector_change_promp
|
||||
id=f'metaworld/{_task}',
|
||||
register_step_based=False,
|
||||
mp_wrapper=goal_endeffector_change_mp_wrapper.MPWrapper,
|
||||
add_mp_types=['ProMP', 'ProDMP'],
|
||||
)
|
||||
ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
|
||||
|
||||
# ProDMP
|
||||
_env_id = f'{name}ProDMP-{task_id_split[-1]}'
|
||||
kwargs_dict_goal_and_endeffector_change_prodmp = deepcopy(DEFAULT_BB_DICT_ProDMP)
|
||||
kwargs_dict_goal_and_endeffector_change_prodmp['wrappers'].append(goal_endeffector_change_mp_wrapper.MPWrapper)
|
||||
kwargs_dict_goal_and_endeffector_change_prodmp['name'] = f'metaworld:{_task}'
|
||||
|
||||
register(
|
||||
id=_env_id,
|
||||
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
|
||||
kwargs=kwargs_dict_goal_and_endeffector_change_prodmp
|
||||
)
|
||||
ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProDMP"].append(_env_id)
|
||||
|
@ -6,12 +6,63 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
|
||||
|
||||
|
||||
class BaseMetaworldMPWrapper(RawInterfaceWrapper):
|
||||
mp_config = {
|
||||
'inherit_defaults': False,
|
||||
'ProMP': {
|
||||
'wrappers': [],
|
||||
'trajectory_generator_kwargs': {
|
||||
'trajectory_generator_type': 'promp',
|
||||
'weights_scale': 10,
|
||||
},
|
||||
'phase_generator_kwargs': {
|
||||
'phase_generator_type': 'linear'
|
||||
},
|
||||
'controller_kwargs': {
|
||||
'controller_type': 'metaworld',
|
||||
},
|
||||
'basis_generator_kwargs': {
|
||||
'basis_generator_type': 'zero_rbf',
|
||||
'num_basis': 5,
|
||||
'num_basis_zero_start': 1
|
||||
},
|
||||
'black_box_kwargs': {
|
||||
'condition_on_desired': False,
|
||||
},
|
||||
},
|
||||
'DMP': {},
|
||||
'ProDMP': {
|
||||
'wrappers': [],
|
||||
'trajectory_generator_kwargs': {
|
||||
'trajectory_generator_type': 'prodmp',
|
||||
'auto_scale_basis': True,
|
||||
'weights_scale': 10,
|
||||
# 'goal_scale': 0.,
|
||||
'disable_goal': True,
|
||||
},
|
||||
'phase_generator_kwargs': {
|
||||
'phase_generator_type': 'exp',
|
||||
# 'alpha_phase' : 3,
|
||||
},
|
||||
'controller_kwargs': {
|
||||
'controller_type': 'metaworld',
|
||||
},
|
||||
'basis_generator_kwargs': {
|
||||
'basis_generator_type': 'prodmp',
|
||||
'num_basis': 5,
|
||||
'alpha': 10
|
||||
},
|
||||
'black_box_kwargs': {
|
||||
'condition_on_desired': False,
|
||||
},
|
||||
},
|
||||
}
|
||||
|
||||
@property
|
||||
def current_pos(self) -> Union[float, int, np.ndarray]:
|
||||
r_close = self.env.data.get_joint_qpos("r_close")
|
||||
r_close = self.env.data.joint('r_close').qpos
|
||||
return np.hstack([self.env.data.mocap_pos.flatten() / self.env.action_scale, r_close])
|
||||
|
||||
@property
|
||||
def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
|
||||
return np.zeros(4, )
|
||||
# raise NotImplementedError("Velocity cannot be retrieved.")
|
||||
# raise NotImplementedError('Velocity cannot be retrieved.')
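The `mp_config` dictionary above keeps one nested entry per primitive type and sets `'inherit_defaults': False`. How the registry resolves these entries is not shown in this diff. The sketch below assumes a recursive merge of library defaults with the wrapper's entry, skipped when `inherit_defaults` is false. `merge_nested`, `resolve_mp_config`, and the default of `True` for the flag are hypothetical and only illustrate the idea.

```python
from copy import deepcopy


def merge_nested(base, override):
    """Return a copy of `base` with `override` merged in recursively."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge key by key instead of replacing.
            merged[key] = merge_nested(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def resolve_mp_config(defaults, mp_config, mp_type):
    """Pick the entry for `mp_type`, optionally layered over defaults.

    Assumption: a missing 'inherit_defaults' key means "inherit".
    """
    entry = mp_config.get(mp_type, {})
    if mp_config.get('inherit_defaults', True):
        return merge_nested(defaults.get(mp_type, {}), entry)
    return deepcopy(entry)
```

With `inherit_defaults` set to `False`, as in the Metaworld base wrapper, the entry would be used exactly as written, so every required key has to be spelled out in the wrapper itself.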
|
||||
|
@ -9,19 +9,6 @@ class MPWrapper(BaseMetaworldMPWrapper):
|
||||
and no secondary objects or end effectors are altered at the start of an episode.
|
||||
You can verify this by executing the code below for your environment id and check if the output is non-zero
|
||||
at the same indices.
|
||||
```python
|
||||
import fancy_gym
|
||||
env = fancy_gym.make(env_id, 1)
|
||||
print(env.reset() - env.reset())
|
||||
array([ 0. , 0. , 0. , 0. , 0,
|
||||
0 , 0 , 0. , 0. , 0. ,
|
||||
0. , 0. , 0. , 0. , 0. ,
|
||||
0. , 0. , 0. , 0. , 0. ,
|
||||
0. , 0. , 0 , 0 , 0 ,
|
||||
0. , 0. , 0. , 0. , 0. ,
|
||||
0. , 0. , 0. , 0. , 0. ,
|
||||
0. , !=0 , !=0 , !=0])
|
||||
```
|
||||
"""
|
||||
|
||||
@property
|
||||
|
@ -9,19 +9,6 @@ class MPWrapper(BaseMetaworldMPWrapper):
|
||||
and no secondary objects or end effectors are altered at the start of an episode.
|
||||
You can verify this by executing the code below for your environment id and check if the output is non-zero
|
||||
at the same indices.
|
||||
```python
|
||||
import fancy_gym
|
||||
env = fancy_gym.make(env_id, 1)
|
||||
print(env.reset() - env.reset())
|
||||
array([ !=0 , !=0 , !=0 , 0. , 0.,
|
||||
0. , 0. , 0. , 0. , 0. ,
|
||||
0. , 0. , 0. , 0. , 0. ,
|
||||
0. , 0. , 0. , !=0 , !=0 ,
|
||||
!=0 , 0. , 0. , 0. , 0. ,
|
||||
0. , 0. , 0. , 0. , 0. ,
|
||||
0. , 0. , 0. , 0. , 0. ,
|
||||
0. , !=0 , !=0 , !=0])
|
||||
```
|
||||
"""
|
||||
|
||||
@property
|
||||
|
@ -9,19 +9,6 @@ class MPWrapper(BaseMetaworldMPWrapper):
|
||||
and no secondary objects or end effectors are altered at the start of an episode.
|
||||
You can verify this by executing the code below for your environment id and check if the output is non-zero
|
||||
at the same indices.
|
||||
```python
|
||||
import fancy_gym
|
||||
env = fancy_gym.make(env_id, 1)
|
||||
print(env.reset() - env.reset())
|
||||
array([ 0. , 0. , 0. , 0. , !=0,
|
||||
!=0 , !=0 , 0. , 0. , 0. ,
|
||||
0. , 0. , 0. , 0. , 0. ,
|
||||
0. , 0. , 0. , 0. , 0. ,
|
||||
0. , 0. , !=0 , !=0 , !=0 ,
|
||||
0. , 0. , 0. , 0. , 0. ,
|
||||
0. , 0. , 0. , 0. , 0. ,
|
||||
0. , !=0 , !=0 , !=0])
|
||||
```
|
||||
"""
|
||||
|
||||
@property
|
||||
|
97
fancy_gym/meta/metaworld_adapter.py
Normal file
@ -0,0 +1,97 @@
|
||||
import random
|
||||
from typing import Iterable, Type, Union, Optional
|
||||
|
||||
import numpy as np
|
||||
from gymnasium import register as gym_register
|
||||
|
||||
import uuid
|
||||
|
||||
import gymnasium as gym
|
||||
import numpy as np
|
||||
|
||||
from fancy_gym.utils.env_compatibility import EnvCompatibility
|
||||
|
||||
try:
|
||||
import metaworld
|
||||
except Exception:
|
||||
print('[FANCY GYM] Metaworld not available')
|
||||
|
||||
|
||||
class FixMetaworldHasIncorrectObsSpaceWrapper(gym.Wrapper, gym.utils.RecordConstructorArgs):
|
||||
def __init__(self, env: gym.Env):
|
||||
gym.utils.RecordConstructorArgs.__init__(self)
|
||||
gym.Wrapper.__init__(self, env)
|
||||
|
||||
eos = env.observation_space
|
||||
eas = env.action_space
|
||||
|
||||
Obs_Space_Class = getattr(gym.spaces, str(eos.__class__).split("'")[1].split('.')[-1])
|
||||
Act_Space_Class = getattr(gym.spaces, str(eas.__class__).split("'")[1].split('.')[-1])
|
||||
|
||||
self.observation_space = Obs_Space_Class(low=eos.low-np.inf, high=eos.high+np.inf, dtype=eos.dtype)
|
||||
self.action_space = Act_Space_Class(low=eas.low, high=eas.high, dtype=eas.dtype)
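The wrapper above recovers the space class name by parsing `str(eos.__class__)`. As a small illustration, the snippet below compares that parsing with reading `__name__` directly. It uses standard-library classes instead of Gymnasium spaces, purely so that it runs without extra dependencies. Both helper names are hypothetical.

```python
from collections import OrderedDict
from fractions import Fraction


def class_name_via_str(obj):
    """Mirror the string parsing used in the wrapper above."""
    return str(obj.__class__).split("'")[1].split('.')[-1]


def class_name_via_attr(obj):
    """Read the class name directly from the type."""
    return type(obj).__name__


# Both approaches agree for module-level and built-in classes.
for sample in (OrderedDict(), Fraction(1, 2), 3.0):
    assert class_name_via_str(sample) == class_name_via_attr(sample)
```

Either name can then be passed to `getattr(gym.spaces, name)` as in the wrapper. The attribute-based form avoids depending on the exact `repr` format of a class.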
|
||||
|
||||
|
||||
class FixMetaworldIncorrectResetPathLengthWrapper(gym.Wrapper, gym.utils.RecordConstructorArgs):
|
||||
def __init__(self, env: gym.Env):
|
||||
gym.utils.RecordConstructorArgs.__init__(self)
|
||||
gym.Wrapper.__init__(self, env)
|
||||
|
||||
def reset(self, **kwargs):
|
||||
ret = self.env.reset(**kwargs)
|
||||
head = self.env
|
||||
try:
|
||||
for i in range(16):
|
||||
head.curr_path_length = 0
|
||||
head = head.env
|
||||
except:
|
||||
pass
|
||||
return ret
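The `reset` above walks up to 16 layers of the `.env` chain and relies on a bare `except` to stop at the innermost environment. A variant without exception handling could look like the sketch below. Note one deliberate difference: it only overwrites the attribute on layers that already define it, whereas the code above assigns it on every layer it visits. `reset_attr_along_chain` and `Layer` are hypothetical names.

```python
def reset_attr_along_chain(env, name, value, max_depth=16):
    """Set `name` to `value` on each layer that already defines it.

    Returns the number of layers that were updated.
    """
    updated = 0
    head = env
    for _ in range(max_depth):
        if head is None:
            break
        # vars() inspects the instance only, so attribute forwarding
        # implemented by wrappers is not triggered here.
        if name in vars(head):
            setattr(head, name, value)
            updated += 1
        head = getattr(head, 'env', None)
    return updated


class Layer:
    """Hypothetical stand-in for an environment or wrapper layer."""

    def __init__(self, env=None, **attrs):
        if env is not None:
            self.env = env
        self.__dict__.update(attrs)
```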
|
||||
|
||||
|
||||
class FixMetaworldIgnoresSeedOnResetWrapper(gym.Wrapper, gym.utils.RecordConstructorArgs):
|
||||
def __init__(self, env: gym.Env):
|
||||
gym.utils.RecordConstructorArgs.__init__(self)
|
||||
gym.Wrapper.__init__(self, env)
|
||||
|
||||
def reset(self, **kwargs):
|
||||
print('[!] You just called .reset on a Metaworld env and supplied a seed. Metaworld currently does not correctly implement seeding. Do not rely on deterministic behavior.')
|
||||
if 'seed' in kwargs:
|
||||
self.env.seed(kwargs['seed'])
|
||||
return self.env.reset(**kwargs)
|
||||
|
||||
|
||||
def make_metaworld(underlying_id: str, seed: int = 1, render_mode: Optional[str] = None, **kwargs):
|
||||
if underlying_id not in metaworld.ML1.ENV_NAMES:
|
||||
raise ValueError(f'Specified environment "{underlying_id}" not present in metaworld ML1.')
|
||||
|
||||
env = metaworld.envs.ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE[underlying_id + "-goal-observable"](seed=seed, **kwargs)
|
||||
|
||||
# setting this avoids generating the same initialization after each reset
|
||||
env._freeze_rand_vec = False
|
||||
# New argument to use global seeding
|
||||
env.seeded_rand_vec = True
|
||||
|
||||
# TODO remove, when this has been fixed upstream
|
||||
env = FixMetaworldHasIncorrectObsSpaceWrapper(env)
|
||||
# TODO remove, when this has been fixed upstream
|
||||
# env = FixMetaworldIncorrectResetPathLengthWrapper(env)
|
||||
# TODO remove, when this has been fixed upstream
|
||||
env = FixMetaworldIgnoresSeedOnResetWrapper(env)
|
||||
return env
|
||||
|
||||
|
||||
def register_all_ML1(**kwargs):
|
||||
for env_id in metaworld.ML1.ENV_NAMES:
|
||||
_env = metaworld.envs.ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE[env_id + "-goal-observable"](seed=0)
|
||||
max_episode_steps = _env.max_path_length
|
||||
|
||||
gym_register(
|
||||
id='metaworld/'+env_id,
|
||||
entry_point=make_metaworld,
|
||||
max_episode_steps=max_episode_steps,
|
||||
kwargs={
|
||||
'underlying_id': env_id
|
||||
},
|
||||
**kwargs
|
||||
)
|
@ -4,11 +4,12 @@ These are the Environment Wrappers for selected [OpenAI Gym](https://gym.openai.
|
||||
the Motion Primitive gym interface for them.
|
||||
|
||||
## MP Environments
|
||||
|
||||
These environments are wrapped versions of their OpenAI Gym counterparts.
|
||||
|
||||
|Name| Description|Trajectory Horizon|Action Dimension|Context Dimension
|
||||
|---|---|---|---|---|
|
||||
|`ContinuousMountainCarProMP-v0`| A ProMP wrapped version of the ContinuousMountainCar-v0 environment. | 100 | 1
|
||||
|`ReacherProMP-v2`| A ProMP wrapped version of the Reacher-v2 environment. | 50 | 2
|
||||
|`FetchSlideDenseProMP-v1`| A ProMP wrapped version of the FetchSlideDense-v1 environment. | 50 | 4
|
||||
|`FetchReachDenseProMP-v1`| A ProMP wrapped version of the FetchReachDense-v1 environment. | 50 | 4
|
||||
| Name | Description | Trajectory Horizon | Action Dimension |
|
||||
| ------------------------------------ | -------------------------------------------------------------------- | ------------------ | ---------------- |
|
||||
| `gym_ProMP/ContinuousMountainCar-v0` | A ProMP wrapped version of the ContinuousMountainCar-v0 environment. | 100 | 1 |
|
||||
| `gym_ProMP/Reacher-v2` | A ProMP wrapped version of the Reacher-v2 environment. | 50 | 2 |
|
||||
| `gym_ProMP/FetchSlideDense-v1` | A ProMP wrapped version of the FetchSlideDense-v1 environment. | 50 | 4 |
|
||||
| `gym_ProMP/FetchReachDense-v1` | A ProMP wrapped version of the FetchReachDense-v1 environment. | 50 | 4 |
|
||||
|
@ -1,45 +1,16 @@
|
||||
from copy import deepcopy
|
||||
|
||||
from gym import register
|
||||
from ..envs.registry import register, upgrade
|
||||
|
||||
from . import mujoco
|
||||
from .deprecated_needs_gym_robotics import robotics
|
||||
|
||||
ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS = {"DMP": [], "ProMP": [], "ProDMP": []}
|
||||
|
||||
DEFAULT_BB_DICT_ProMP = {
|
||||
"name": 'EnvName',
|
||||
"wrappers": [],
|
||||
"trajectory_generator_kwargs": {
|
||||
'trajectory_generator_type': 'promp'
|
||||
},
|
||||
"phase_generator_kwargs": {
|
||||
'phase_generator_type': 'linear'
|
||||
},
|
||||
"controller_kwargs": {
|
||||
'controller_type': 'motor',
|
||||
"p_gains": 1.0,
|
||||
"d_gains": 0.1,
|
||||
},
|
||||
"basis_generator_kwargs": {
|
||||
'basis_generator_type': 'zero_rbf',
|
||||
'num_basis': 5,
|
||||
'num_basis_zero_start': 1
|
||||
}
|
||||
}
|
||||
|
||||
kwargs_dict_reacher_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
|
||||
kwargs_dict_reacher_promp['controller_kwargs']['p_gains'] = 0.6
|
||||
kwargs_dict_reacher_promp['controller_kwargs']['d_gains'] = 0.075
|
||||
kwargs_dict_reacher_promp['basis_generator_kwargs']['num_basis'] = 6
|
||||
kwargs_dict_reacher_promp['name'] = "Reacher-v2"
|
||||
kwargs_dict_reacher_promp['wrappers'].append(mujoco.reacher_v2.MPWrapper)
|
||||
register(
|
||||
id='ReacherProMP-v2',
|
||||
entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
|
||||
kwargs=kwargs_dict_reacher_promp
|
||||
upgrade(
|
||||
id='Reacher-v2',
|
||||
mp_wrapper=mujoco.reacher_v2.MPWrapper,
|
||||
add_mp_types=['ProMP'],
|
||||
)
|
||||
ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append("ReacherProMP-v2")
|
||||
|
||||
"""
|
||||
The Fetch environments are not supported by gym anymore. A new repository (gym_robotics) is supporting the environments.
|
||||
However, their usage still needs to be checked.
|
||||
|
@ -6,6 +6,28 @@ from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
|
||||
|
||||
|
||||
class MPWrapper(RawInterfaceWrapper):
|
||||
mp_config = {
|
||||
'ProMP': {
|
||||
"trajectory_generator_kwargs": {
|
||||
'trajectory_generator_type': 'promp'
|
||||
},
|
||||
"phase_generator_kwargs": {
|
||||
'phase_generator_type': 'linear'
|
||||
},
|
||||
"controller_kwargs": {
|
||||
'controller_type': 'motor',
|
||||
"p_gains": 0.6,
|
||||
"d_gains": 0.075,
|
||||
},
|
||||
"basis_generator_kwargs": {
|
||||
'basis_generator_type': 'zero_rbf',
|
||||
'num_basis': 6,
|
||||
'num_basis_zero_start': 1
|
||||
}
|
||||
},
|
||||
'DMP': {},
|
||||
'ProDMP': {},
|
||||
}
|
||||
|
||||
@property
|
||||
def current_vel(self) -> Union[float, int, np.ndarray]:
|
||||
|
11
fancy_gym/utils/env_compatibility.py
Normal file
@ -0,0 +1,11 @@
|
||||
import gymnasium as gym
|
||||
|
||||
|
||||
class EnvCompatibility(gym.wrappers.EnvCompatibility):
|
||||
def __getattr__(self, item):
|
||||
"""Propagate only non-existent properties to wrapped env."""
|
||||
if item.startswith('_'):
|
||||
raise AttributeError("attempted to get missing private attribute '{}'".format(item))
|
||||
if item in self.__dict__:
|
||||
return getattr(self, item)
|
||||
return getattr(self.env, item)
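The `__getattr__` override above forwards unknown public attributes to the wrapped environment and refuses private ones. Python only calls `__getattr__` after normal lookup has failed, which is what makes this pattern safe for attributes the wrapper defines itself. A minimal sketch of the same idea, with hypothetical `Forwarding` and `Inner` classes in place of the Gymnasium ones:

```python
class Inner:
    """Hypothetical wrapped object."""
    dt = 0.02
    _secret = 'hidden'


class Forwarding:
    """Hypothetical wrapper that forwards unknown public attributes."""

    def __init__(self, inner):
        self.inner = inner

    def __getattr__(self, item):
        # Reached only when regular attribute lookup did not find `item`.
        if item.startswith('_'):
            raise AttributeError(
                "attempted to get missing private attribute '{}'".format(item))
        return getattr(self.inner, item)


wrapped = Forwarding(Inner())
step_size = wrapped.dt  # resolved on Inner through __getattr__
```

Because the hook only runs for missing names, a check such as `item in self.__dict__` inside it would presumably never succeed.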
|
@ -1,17 +1,27 @@
|
||||
import logging
|
||||
import re
|
||||
from fancy_gym.utils.wrappers import TimeAwareObservation
|
||||
from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
|
||||
from fancy_gym.black_box.factory.trajectory_generator_factory import get_trajectory_generator
|
||||
from fancy_gym.black_box.factory.phase_generator_factory import get_phase_generator
|
||||
from fancy_gym.black_box.factory.controller_factory import get_controller
|
||||
from fancy_gym.black_box.factory.basis_generator_factory import get_basis_generator
|
||||
from fancy_gym.black_box.black_box_wrapper import BlackBoxWrapper
|
||||
import uuid
|
||||
from collections.abc import MutableMapping
|
||||
from copy import deepcopy
|
||||
from math import ceil
|
||||
from typing import Iterable, Type, Union
|
||||
from typing import Iterable, Type, Union, Optional
|
||||
|
||||
import gym
|
||||
import gymnasium as gym
|
||||
from gymnasium import make
|
||||
import numpy as np
|
||||
from gym.envs.registration import register, registry
|
||||
from gymnasium.envs.registration import register, registry
|
||||
from gymnasium.wrappers import TimeLimit
|
||||
|
||||
from fancy_gym.utils.env_compatibility import EnvCompatibility
|
||||
from fancy_gym.utils.wrappers import FlattenObservation
|
||||
|
||||
try:
|
||||
from dm_control import suite, manipulation
|
||||
import shimmy
|
||||
from shimmy.dm_control_compatibility import EnvType
|
||||
except ImportError:
|
||||
pass
|
||||
|
||||
@ -21,111 +31,44 @@ except Exception:
|
||||
# catch Exception as Import error does not catch missing mujoco-py
|
||||
pass
|
||||
|
||||
import fancy_gym
|
||||
from fancy_gym.black_box.black_box_wrapper import BlackBoxWrapper
|
||||
from fancy_gym.black_box.factory.basis_generator_factory import get_basis_generator
|
||||
from fancy_gym.black_box.factory.controller_factory import get_controller
|
||||
from fancy_gym.black_box.factory.phase_generator_factory import get_phase_generator
|
||||
from fancy_gym.black_box.factory.trajectory_generator_factory import get_trajectory_generator
|
||||
from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
|
||||
from fancy_gym.utils.time_aware_observation import TimeAwareObservation
|
||||
from fancy_gym.utils.utils import nested_update
|
||||
|
||||
|
||||
def make_rank(env_id: str, seed: int, rank: int = 0, return_callable=True, **kwargs):
|
||||
"""
|
||||
TODO: Do we need this?
|
||||
Generate a callable to create a new gym environment with a given seed.
|
||||
The rank is added to the seed and can be used for example when using vector environments.
|
||||
E.g. [make_rank("my_env_name-v0", 123, i) for i in range(8)] creates a list of 8 environments
|
||||
with seeds 123 through 130.
|
||||
Hence, testing environments should be seeded with a value which is offset by the number of training environments.
|
||||
For example, [make_rank("my_env_name-v0", 123 + 8, i) for i in range(5)] creates 5 testing environments.
|
||||
|
||||
Args:
|
||||
env_id: name of the environment
|
||||
seed: seed for deterministic behaviour
|
||||
rank: environment rank for deterministic over multiple seeds behaviour
|
||||
return_callable: If True returns a callable to create the environment instead of the environment itself.
|
||||
|
||||
Returns:
|
||||
|
||||
"""
|
||||
|
||||
def f():
|
||||
return make(env_id, seed + rank, **kwargs)
|
||||
|
||||
return f if return_callable else f()
|
||||
|
||||
|
||||
def make(env_id: str, seed: int, **kwargs):
|
||||
"""
|
||||
Converts an env_id to an environment with the gym API.
|
||||
This also works for DeepMind Control Suite environments that are wrapped using the DMCWrapper, they can be
|
||||
specified with "dmc:domain_name-task_name"
|
||||
Analogously, metaworld tasks can be created as "metaworld:env_id-v2".
|
||||
|
||||
Args:
|
||||
env_id: spec or env_id for gym tasks, external environments require a domain specification
|
||||
**kwargs: Additional kwargs for the constructor such as pixel observations, etc.
|
||||
|
||||
Returns: Gym environment
|
||||
|
||||
"""
|
||||
|
||||
if ':' in env_id:
|
||||
split_id = env_id.split(':')
|
||||
framework, env_id = split_id[-2:]
|
||||
else:
|
||||
framework = None
|
||||
|
||||
if framework == 'metaworld':
|
||||
# MetaWorld environment
|
||||
env = make_metaworld(env_id, seed, **kwargs)
|
||||
elif framework == 'dmc':
|
||||
# DeepMind Control environment
|
||||
env = make_dmc(env_id, seed, **kwargs)
|
||||
else:
|
||||
env = make_gym(env_id, seed, **kwargs)
|
||||
|
||||
env.seed(seed)
|
||||
env.action_space.seed(seed)
|
||||
env.observation_space.seed(seed)
|
||||
|
||||
return env
|
||||
|
||||
|
||||
def _make_wrapped_env(env_id: str, wrappers: Iterable[Type[gym.Wrapper]], seed=1, **kwargs):
|
||||
def _make_wrapped_env(env: gym.Env, wrappers: Iterable[Type[gym.Wrapper]], seed=1, fallback_max_steps=None):
|
||||
"""
|
||||
Helper function for creating a wrapped gym environment using MPs.
|
||||
It adds all provided wrappers to the specified environment and verifies at least one RawInterfaceWrapper is
|
||||
provided to expose the interface for MPs.
|
||||
|
||||
Args:
|
||||
env_id: name of the environment
|
||||
env: base environment to wrap
|
||||
wrappers: list of wrappers (at least a RawInterfaceWrapper),
|
||||
seed: seed of environment
|
||||
|
||||
Returns: gym environment with all specified wrappers applied
|
||||
|
||||
"""
|
||||
# _env = gym.make(env_id)
|
||||
_env = make(env_id, seed, **kwargs)
|
||||
if fallback_max_steps:
|
||||
env = ensure_finite_time(env, fallback_max_steps)
|
||||
has_black_box_wrapper = False
|
||||
head = env
|
||||
while hasattr(head, 'env'):
|
||||
if isinstance(head, RawInterfaceWrapper):
|
||||
has_black_box_wrapper = True
|
||||
break
|
||||
head = head.env
|
||||
for w in wrappers:
|
||||
# only wrap the environment if not BlackBoxWrapper, e.g. for vision
|
||||
if issubclass(w, RawInterfaceWrapper):
|
||||
has_black_box_wrapper = True
|
||||
_env = w(_env)
|
||||
env = w(env)
|
||||
if not has_black_box_wrapper:
|
||||
raise ValueError("A RawInterfaceWrapper is required in order to leverage movement primitive environments.")
|
||||
return _env
|
||||
return env
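The helper above applies the wrappers in list order, so the first entry ends up innermost, and it insists that a `RawInterfaceWrapper` is present either in the incoming environment or in the list. The following sketch reproduces that control flow with hypothetical stand-in classes (`RawInterface`, `TimeFeature`) and is slightly simplified: it also inspects the innermost object when searching the chain.

```python
class RawInterface:
    """Hypothetical stand-in for RawInterfaceWrapper."""

    def __init__(self, env):
        self.env = env


class TimeFeature:
    """Hypothetical stand-in for any other wrapper."""

    def __init__(self, env):
        self.env = env


def wrap_all(env, wrappers, required=RawInterface):
    """Apply `wrappers` in order and require one `required` layer overall."""
    found = False
    head = env
    # Look for an already present required layer in the existing chain.
    while head is not None:
        if isinstance(head, required):
            found = True
            break
        head = getattr(head, 'env', None)
    for wrapper_cls in wrappers:
        found = found or issubclass(wrapper_cls, required)
        env = wrapper_cls(env)  # earlier list entries sit closer to the base
    if not found:
        raise ValueError('A RawInterface layer is required.')
    return env
```

This ordering is why inserting a wrapper at index 0 of the list places it directly around the base environment, below all other wrappers.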
|
||||
|
||||
|
||||
def make_bb(
|
||||
env_id: str, wrappers: Iterable, black_box_kwargs: MutableMapping, traj_gen_kwargs: MutableMapping,
|
||||
controller_kwargs: MutableMapping, phase_kwargs: MutableMapping, basis_kwargs: MutableMapping, seed: int = 1,
|
||||
**kwargs):
|
||||
env: Union[gym.Env, str], wrappers: Iterable, black_box_kwargs: MutableMapping, traj_gen_kwargs: MutableMapping,
|
||||
controller_kwargs: MutableMapping, phase_kwargs: MutableMapping, basis_kwargs: MutableMapping,
|
||||
time_limit: int = None, fallback_max_steps: int = None, **kwargs):
|
||||
"""
|
||||
This can also be used standalone for manually building a custom DMP environment.
|
||||
Args:
|
||||
@ -133,7 +76,7 @@ def make_bb(
|
||||
basis_kwargs: kwargs for the basis generator
|
||||
phase_kwargs: kwargs for the phase generator
|
||||
controller_kwargs: kwargs for the tracking controller
|
||||
env_id: base_env_name,
|
||||
env: step based environment (or environment id),
|
||||
wrappers: list of wrappers (at least a RawInterfaceWrapper),
|
||||
seed: seed of environment
|
||||
traj_gen_kwargs: dict of at least {num_dof: int, num_basis: int} for DMP
|
||||
@ -141,7 +84,7 @@ def make_bb(
|
||||
Returns: DMP wrapped gym env
|
||||
|
||||
"""
|
||||
_verify_time_limit(traj_gen_kwargs.get("duration"), kwargs.get("time_limit"))
|
||||
_verify_time_limit(traj_gen_kwargs.get("duration"), time_limit)
|
||||
|
||||
learn_sub_trajs = black_box_kwargs.get('learn_sub_trajectories')
|
||||
do_replanning = black_box_kwargs.get('replanning_schedule')
|
||||
@ -153,12 +96,19 @@ def make_bb(
|
||||
# Add as first wrapper in order to alter observation
|
||||
wrappers.insert(0, TimeAwareObservation)
|
||||
|
||||
env = _make_wrapped_env(env_id=env_id, wrappers=wrappers, seed=seed, **kwargs)
|
||||
if isinstance(env, str):
|
||||
env = make(env, **kwargs)
|
||||
|
||||
env = _make_wrapped_env(env=env, wrappers=wrappers, fallback_max_steps=fallback_max_steps)
|
||||
|
||||
# BB expects a spaces.Box to be exposed, need to convert for dict-observations
|
||||
if type(env.observation_space) == gym.spaces.dict.Dict:
|
||||
env = FlattenObservation(env)
|
||||
|
||||
traj_gen_kwargs['action_dim'] = traj_gen_kwargs.get('action_dim', np.prod(env.action_space.shape).item())
|
||||
|
||||
if black_box_kwargs.get('duration') is None:
|
||||
black_box_kwargs['duration'] = env.spec.max_episode_steps * env.dt
|
||||
black_box_kwargs['duration'] = get_env_duration(env)
|
||||
if phase_kwargs.get('tau') is None:
|
||||
phase_kwargs['tau'] = black_box_kwargs['duration']
|
||||
|
||||
@ -186,156 +136,27 @@ def make_bb(
|
||||
return bb_env
|
||||
|
||||
|
||||
def make_bb_env_helper(**kwargs):
|
||||
"""
|
||||
Helper function for registering a black box gym environment.
|
||||
Args:
|
||||
**kwargs: expects at least the following:
|
||||
{
|
||||
"name": base environment name.
|
||||
"wrappers": list of wrappers (at least an BlackBoxWrapper is required),
|
||||
"traj_gen_kwargs": {
|
||||
"trajectory_generator_type": type_of_your_movement_primitive,
|
||||
non default arguments for the movement primitive instance
|
||||
...
|
||||
}
|
||||
"controller_kwargs": {
|
||||
"controller_type": type_of_your_controller,
|
||||
non default arguments for the tracking_controller instance
|
||||
...
|
||||
},
|
||||
"basis_generator_kwargs": {
|
||||
"basis_generator_type": type_of_your_basis_generator,
|
||||
non default arguments for the basis generator instance
|
||||
...
|
||||
},
|
||||
"phase_generator_kwargs": {
|
||||
"phase_generator_type": type_of_your_phase_generator,
|
||||
non default arguments for the phase generator instance
|
||||
...
|
||||
},
|
||||
}
|
||||
|
||||
Returns: MP wrapped gym env
|
||||
|
||||
"""
|
||||
seed = kwargs.pop("seed", None)
|
||||
wrappers = kwargs.pop("wrappers")
|
||||
|
||||
traj_gen_kwargs = kwargs.pop("trajectory_generator_kwargs", {})
|
||||
black_box_kwargs = kwargs.pop('black_box_kwargs', {})
|
||||
contr_kwargs = kwargs.pop("controller_kwargs", {})
|
||||
phase_kwargs = kwargs.pop("phase_generator_kwargs", {})
|
||||
basis_kwargs = kwargs.pop("basis_generator_kwargs", {})
|
||||
|
||||
return make_bb(env_id=kwargs.pop("name"), wrappers=wrappers,
|
||||
black_box_kwargs=black_box_kwargs,
|
||||
traj_gen_kwargs=traj_gen_kwargs, controller_kwargs=contr_kwargs,
|
||||
phase_kwargs=phase_kwargs,
|
||||
basis_kwargs=basis_kwargs, **kwargs, seed=seed)
|
||||
|
||||
|
||||
def make_dmc(
|
||||
env_id: str,
|
||||
seed: int = None,
|
||||
visualize_reward: bool = True,
|
||||
time_limit: Union[None, float] = None,
|
||||
**kwargs
|
||||
):
|
||||
if not re.match(r"\w+-\w+", env_id):
|
||||
raise ValueError("env_id does not have the following structure: 'domain_name-task_name'")
|
||||
domain_name, task_name = env_id.split("-")
|
||||
|
||||
if task_name.endswith("_vision"):
|
||||
# TODO
|
||||
raise ValueError("The vision interface for manipulation tasks is currently not supported.")
|
||||
|
||||
if (domain_name, task_name) not in suite.ALL_TASKS and task_name not in manipulation.ALL:
|
||||
raise ValueError(f'Specified domain "{domain_name}" and task "{task_name}" combination does not exist.')
|
||||
|
||||
# env_id = f'dmc_{domain_name}_{task_name}_{seed}-v1'
|
||||
gym_id = uuid.uuid4().hex + '-v1'
|
||||
|
||||
task_kwargs = {'random': seed}
|
||||
if time_limit is not None:
|
||||
task_kwargs['time_limit'] = time_limit
|
||||
|
||||
# create task
|
||||
# Accessing private attribute because DMC does not expose time_limit or step_limit.
|
||||
# Only the current time_step/time as well as the control_timestep can be accessed.
|
||||
if domain_name == "manipulation":
|
||||
env = manipulation.load(environment_name=task_name, seed=seed)
|
||||
max_episode_steps = ceil(env._time_limit / env.control_timestep())
|
||||
else:
|
||||
env = suite.load(domain_name=domain_name, task_name=task_name, task_kwargs=task_kwargs,
|
||||
visualize_reward=visualize_reward, environment_kwargs=kwargs)
|
||||
max_episode_steps = int(env._step_limit)
|
||||
|
||||
register(
|
||||
id=gym_id,
|
||||
entry_point='fancy_gym.dmc.dmc_wrapper:DMCWrapper',
|
||||
kwargs={'env': lambda: env},
|
||||
max_episode_steps=max_episode_steps,
|
||||
)
|
||||
|
||||
env = gym.make(gym_id)
|
||||
env.seed(seed)
|
||||
def ensure_finite_time(env: gym.Env, fallback_max_steps=500):
|
||||
cur_limit = env.spec.max_episode_steps
|
||||
if not cur_limit:
|
||||
if hasattr(env.unwrapped, 'max_path_length'):
|
||||
return TimeLimit(env, env.unwrapped.__getattribute__('max_path_length'))
|
||||
return TimeLimit(env, fallback_max_steps)
|
||||
return env
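`ensure_finite_time` above picks a step limit in three stages: the limit from the environment spec, then a `max_path_length` attribute on the unwrapped environment, then the fallback value. The selection logic alone can be sketched as follows. `pick_step_limit` is a hypothetical helper that returns the number instead of wrapping the environment in `TimeLimit`.

```python
from types import SimpleNamespace


def pick_step_limit(spec_limit, unwrapped, fallback_max_steps=500):
    """Return the step limit that would apply to an environment."""
    if spec_limit:
        # The spec already defines a finite horizon; keep it.
        return spec_limit
    # Otherwise prefer the environment's own path length, if it has one.
    return getattr(unwrapped, 'max_path_length', fallback_max_steps)


# Hypothetical environments, reduced to the one attribute that matters here.
with_path_length = SimpleNamespace(max_path_length=150)
without_path_length = SimpleNamespace()

limit_from_env = pick_step_limit(None, with_path_length)
limit_from_fallback = pick_step_limit(None, without_path_length)
```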
|
||||
|
||||
|
||||
def make_metaworld(env_id: str, seed: int, **kwargs):
|
||||
if env_id not in metaworld.ML1.ENV_NAMES:
|
||||
raise ValueError(f'Specified environment "{env_id}" not present in metaworld ML1.')
|
||||
|
||||
_env = metaworld.envs.ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE[env_id + "-goal-observable"](seed=seed, **kwargs)
|
||||
|
||||
# setting this avoids generating the same initialization after each reset
|
||||
_env._freeze_rand_vec = False
|
||||
# New argument to use global seeding
|
||||
_env.seeded_rand_vec = True
|
||||
|
||||
gym_id = uuid.uuid4().hex + '-v1'
|
||||
|
||||
register(
|
||||
id=gym_id,
|
||||
entry_point=lambda: _env,
|
||||
max_episode_steps=_env.max_path_length,
|
||||
)
|
||||
|
||||
# TODO enable checker when the incorrect dtype of obs and observation space are fixed by metaworld
|
||||
env = gym.make(gym_id, disable_env_checker=True)
|
||||
return env
|
||||
|
||||
|
||||
def make_gym(env_id, seed, **kwargs):
|
||||
"""
|
||||
Create
|
||||
Args:
|
||||
env_id:
|
||||
seed:
|
||||
**kwargs:
|
||||
|
||||
Returns:
|
||||
|
||||
"""
|
||||
# Getting the existing keywords to allow for nested dict updates for BB envs
|
||||
# gym only allows for non nested updates.
|
||||
def get_env_duration(env: gym.Env):
|
||||
try:
|
||||
all_kwargs = deepcopy(registry.get(env_id).kwargs)
|
||||
except AttributeError as e:
|
||||
logging.error(f'The gym environment with id {env_id} could not be found.')
|
||||
raise e
|
||||
nested_update(all_kwargs, kwargs)
|
||||
kwargs = all_kwargs
|
||||
|
||||
# Add seed to kwargs for bb environments to pass seed to step environments
|
||||
all_bb_envs = sum(fancy_gym.ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS.values(), [])
|
||||
if env_id in all_bb_envs:
|
||||
kwargs.update({"seed": seed})
|
||||
|
||||
# Gym
|
||||
env = gym.make(env_id, **kwargs)
|
||||
return env
|
||||
duration = env.spec.max_episode_steps * env.dt
|
||||
except (AttributeError, TypeError) as e:
|
||||
if env.env_type is EnvType.COMPOSER:
|
||||
max_episode_steps = ceil(env.unwrapped._time_limit / env.dt)
|
||||
elif env.env_type is EnvType.RL_CONTROL:
|
||||
max_episode_steps = int(env.unwrapped._step_limit)
|
||||
else:
|
||||
raise e
|
||||
duration = max_episode_steps * env.control_timestep()
|
||||
return duration
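`get_env_duration` above computes the episode duration as the number of steps times the step size, and derives the number of steps from a time limit when the spec does not provide one. The arithmetic can be sketched in isolation as below. `episode_duration` and its parameters are hypothetical and do not touch any environment object.

```python
from math import ceil


def episode_duration(dt, max_episode_steps=None, time_limit=None):
    """Return the episode duration in seconds.

    `dt` is the control step size. If `max_episode_steps` is unknown, it is
    derived from `time_limit` by rounding up to a whole number of steps.
    """
    if max_episode_steps is None:
        if time_limit is None:
            raise ValueError('need either max_episode_steps or time_limit')
        max_episode_steps = ceil(time_limit / dt)
    return max_episode_steps * dt
```

Rounding up means the resulting duration can be slightly longer than the requested time limit whenever the limit is not a multiple of `dt`.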
|
||||
|
||||
|
||||
def _verify_time_limit(mp_time_limit: Union[None, float], env_time_limit: Union[None, float]):
|
||||
|
@ -1,78 +0,0 @@
|
||||
"""
|
||||
Adapted from: https://github.com/openai/gym/blob/907b1b20dd9ac0cba5803225059b9c6673702467/gym/wrappers/time_aware_observation.py
|
||||
License: MIT
|
||||
Copyright (c) 2016 OpenAI (https://openai.com)
|
||||
|
||||
Wrapper for adding time aware observations to environment observation.
|
||||
"""
|
||||
import gym
|
||||
import numpy as np
|
||||
from gym.spaces import Box
|
||||
|
||||
|
||||
class TimeAwareObservation(gym.ObservationWrapper):
|
||||
"""Augment the observation with the current time step in the episode.
|
||||
|
||||
The observation space of the wrapped environment is assumed to be a flat :class:`Box`.
|
||||
In particular, pixel observations are not supported. This wrapper will append the current timestep
|
||||
within the current episode to the observation.
|
||||
|
||||
Example:
|
||||
>>> import gym
|
||||
>>> env = gym.make('CartPole-v1')
|
||||
>>> env = TimeAwareObservation(env)
|
||||
>>> env.reset()
|
||||
array([ 0.03810719, 0.03522411, 0.02231044, -0.01088205, 0. ])
|
||||
>>> env.step(env.action_space.sample())[0]
|
||||
array([ 0.03881167, -0.16021058, 0.0220928 , 0.28875574, 1. ])
|
||||
"""
|
||||
|
||||
def __init__(self, env: gym.Env):
|
||||
"""Initialize :class:`TimeAwareObservation` that requires an environment with a flat :class:`Box`
|
||||
observation space.
|
||||
|
||||
Args:
|
||||
env: The environment to apply the wrapper
|
||||
"""
|
||||
super().__init__(env)
|
||||
assert isinstance(env.observation_space, Box)
|
||||
low = np.append(self.observation_space.low, 0.0)
|
||||
high = np.append(self.observation_space.high, 1.0)
|
||||
self.observation_space = Box(low, high, dtype=self.observation_space.dtype)
|
||||
self.t = 0
|
||||
self._max_episode_steps = env.spec.max_episode_steps
|
||||
|
||||
def observation(self, observation):
|
||||
"""Adds to the observation with the current time step normalized with max steps.
|
||||
|
||||
Args:
|
||||
observation: The observation to add the time step to
|
||||
|
||||
Returns:
|
||||
The observation with the time step appended
|
||||
"""
|
||||
return np.append(observation, self.t / self._max_episode_steps)
|
||||
|
||||
def step(self, action):
|
||||
"""Steps through the environment, incrementing the time step.
|
||||
|
||||
Args:
|
||||
action: The action to take
|
||||
|
||||
Returns:
|
||||
The environment's step using the action.
|
||||
"""
|
||||
self.t += 1
|
||||
return super().step(action)
|
||||
|
||||
def reset(self, **kwargs):
|
||||
"""Reset the environment setting the time to zero.
|
||||
|
||||
Args:
|
||||
**kwargs: Kwargs to apply to env.reset()
|
||||
|
||||
Returns:
|
||||
The reset environment
|
||||
"""
|
||||
self.t = 0
|
||||
return super().reset(**kwargs)
|
130
fancy_gym/utils/wrappers.py
Normal file
@ -0,0 +1,130 @@
|
||||
from gymnasium.spaces import Box, Dict, flatten, flatten_space
|
||||
try:
|
||||
from gym.spaces import Box as OldBox
|
||||
except ImportError:
|
||||
OldBox = None
|
||||
import gymnasium as gym
|
||||
import numpy as np
|
||||
import copy
|
||||
|
||||
|
||||
class TimeAwareObservation(gym.ObservationWrapper, gym.utils.RecordConstructorArgs):
|
||||
"""Augment the observation with the current time step in the episode.
|
||||
|
||||
The observation space of the wrapped environment is assumed to be a flat :class:`Box` or flattenable :class:`Dict`.
|
||||
In particular, pixel observations are not supported. This wrapper will append the current progress within the current episode to the observation.
|
||||
The progress will be indicated as a number between 0 and 1.
|
||||
"""
|
||||
|
||||
def __init__(self, env: gym.Env, enforce_dtype_float32=False):
|
||||
"""Initialize :class:`TimeAwareObservation` that requires an environment with a flat :class:`Box` or flattable :class:`Dict` observation space.
|
||||
|
||||
Args:
|
||||
env: The environment to apply the wrapper
|
||||
"""
|
||||
gym.utils.RecordConstructorArgs.__init__(self)
|
||||
gym.ObservationWrapper.__init__(self, env)
|
||||
allowed_classes = [Box, OldBox, Dict]
|
||||
if enforce_dtype_float32:
|
||||
assert env.observation_space.dtype == np.float32, 'TimeAwareObservation was given an environment with a dtype!=np.float32 ('+str(
|
||||
env.observation_space.dtype)+'). This requirement can be removed by setting enforce_dtype_float32=False.'
|
||||
assert env.observation_space.__class__ in allowed_classes, str(env.observation_space)+' is not supported. Only Box or Dict'
|
||||
|
||||
if env.observation_space.__class__ in [Box, OldBox]:
|
||||
dtype = env.observation_space.dtype
|
||||
|
||||
low = np.append(env.observation_space.low, 0.0)
|
||||
high = np.append(env.observation_space.high, 1.0)
|
||||
|
||||
self.observation_space = Box(low, high, dtype=dtype)
|
||||
else:
|
||||
spaces = copy.copy(env.observation_space.spaces)
|
||||
dtype = np.float64
|
||||
spaces['time_awareness'] = Box(0, 1, dtype=dtype)
|
||||
|
||||
self.observation_space = Dict(spaces)
|
||||
|
||||
self.is_vector_env = getattr(env, "is_vector_env", False)
|
||||
|
||||
def observation(self, observation):
|
||||
"""Adds the current time step to the observation.
|
||||
|
||||
Args:
|
||||
observation: The observation to add the time step to
|
||||
|
||||
Returns:
|
||||
The observation with the time step appended (relative to the total number of steps)
|
||||
"""
|
||||
if self.observation_space.__class__ in [Box, OldBox]:
|
||||
return np.append(observation, self.t / self.env.spec.max_episode_steps)
|
||||
else:
|
||||
obs = copy.copy(observation)
|
||||
obs['time_awareness'] = self.t / self.env.spec.max_episode_steps
|
||||
return obs
|
||||
|
||||
def step(self, action):
|
||||
"""Steps through the environment, incrementing the time step.
|
||||
|
||||
Args:
|
||||
action: The action to take
|
||||
|
||||
Returns:
|
||||
The environment's step using the action.
|
||||
"""
|
||||
self.t += 1
|
||||
return super().step(action)
|
||||
|
||||
def reset(self, **kwargs):
|
||||
"""Reset the environment setting the time to zero.
|
||||
|
||||
Args:
|
||||
**kwargs: Kwargs to apply to env.reset()
|
||||
|
||||
Returns:
|
||||
The reset environment
|
||||
"""
|
||||
self.t = 0
|
||||
return super().reset(**kwargs)
|
||||
|
||||
|
||||
class FlattenObservation(gym.ObservationWrapper, gym.utils.RecordConstructorArgs):
|
||||
"""Observation wrapper that flattens the observation.
|
||||
|
||||
Example:
|
||||
>>> import gymnasium as gym
|
||||
>>> from gymnasium.wrappers import FlattenObservation
|
||||
>>> env = gym.make("CarRacing-v2")
|
||||
>>> env.observation_space.shape
|
||||
(96, 96, 3)
|
||||
>>> env = FlattenObservation(env)
|
||||
>>> env.observation_space.shape
|
||||
(27648,)
|
||||
>>> obs, _ = env.reset()
|
||||
>>> obs.shape
|
||||
(27648,)
|
||||
"""
|
||||
|
||||
def __init__(self, env: gym.Env):
|
||||
"""Flattens the observations of an environment.
|
||||
|
||||
Args:
|
||||
env: The environment to apply the wrapper
|
||||
"""
|
||||
gym.utils.RecordConstructorArgs.__init__(self)
|
||||
gym.ObservationWrapper.__init__(self, env)
|
||||
|
||||
self.observation_space = flatten_space(env.observation_space)
|
||||
|
||||
def observation(self, observation):
|
||||
"""Flattens an observation.
|
||||
|
||||
Args:
|
||||
observation: The observation to flatten
|
||||
|
||||
Returns:
|
||||
The flattened observation
|
||||
"""
|
||||
try:
|
||||
return flatten(self.env.observation_space, observation)
|
||||
except Exception:  # fall back to flattening each entry of a batch of observations separately
|
||||
return np.array([flatten(self.env.observation_space, observation[i]) for i in range(len(observation))])
|
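The time-aware wrapper above appends the episode progress `t / max_episode_steps` to a flat observation and widens the bounds by one entry in `[0, 1]`. The following is a minimal sketch of just that arithmetic, written with plain Python lists so it runs without `gymnasium` or `numpy`. The helper names `append_progress` and `extend_bounds` are hypothetical and do not exist in `fancy_gym`.

```python
from typing import List, Sequence, Tuple


def append_progress(observation: Sequence[float], t: int, max_episode_steps: int) -> List[float]:
    """Return a copy of a flat observation with the episode progress appended.

    Hypothetical helper: mirrors `np.append(observation, self.t / max_episode_steps)`.
    """
    if max_episode_steps <= 0:
        raise ValueError("max_episode_steps must be positive")
    return list(observation) + [t / max_episode_steps]


def extend_bounds(low: Sequence[float], high: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Extend flat bounds by one entry for the progress value, which lies in [0, 1]."""
    return list(low) + [0.0], list(high) + [1.0]


# Step 25 of a 50-step episode corresponds to a progress of 0.5.
obs = append_progress([0.5, -0.5], t=25, max_episode_steps=50)
low, high = extend_bounds([-1.0, -1.0], [1.0, 1.0])
```

Because the progress is a ratio, the wrapper relies on `env.spec.max_episode_steps` being set; an environment without a step limit would need one assigned first.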
101
icon.svg
Normal file
File diff suppressed because one or more lines are too long
30
setup.py
@ -6,33 +6,38 @@ from setuptools import setup, find_packages
|
||||
|
||||
# Environment-specific dependencies for dmc and metaworld
|
||||
extras = {
|
||||
"dmc": ["dm_control>=1.0.1"],
|
||||
"metaworld": ["metaworld @ git+https://github.com/rlworkgroup/metaworld.git@master#egg=metaworld",
|
||||
'mujoco-py<2.2,>=2.1',
|
||||
'scipy'
|
||||
],
|
||||
'dmc': ['shimmy[dm-control]', 'Shimmy==1.0.0'],
|
||||
'metaworld': ['metaworld @ git+https://github.com/Farama-Foundation/Metaworld.git@d155d0051630bb365ea6a824e02c66c068947439#egg=metaworld'],
|
||||
'box2d': ['gymnasium[box2d]>=0.26.0'],
|
||||
'mujoco': ['mujoco==2.3.3', 'gymnasium[mujoco]>0.26.0'],
|
||||
'mujoco-legacy': ['mujoco-py >=2.1,<2.2', 'cython<3'],
|
||||
'jax': ["jax >=0.4.0", "jaxlib >=0.4.0"],
|
||||
}
|
||||
|
||||
# All dependencies
|
||||
all_groups = set(extras.keys())
|
||||
extras["all"] = list(set(itertools.chain.from_iterable(map(lambda group: extras[group], all_groups))))
|
||||
extras["all"] = list(set(itertools.chain.from_iterable(
|
||||
map(lambda group: extras[group], all_groups))))
|
||||
|
||||
extras['testing'] = extras["all"] + ['pytest']
|
||||
|
||||
|
||||
def find_package_data(extensions_to_include: List[str]) -> List[str]:
|
||||
envs_dir = Path("fancy_gym/envs/mujoco")
|
||||
package_data_paths = []
|
||||
for extension in extensions_to_include:
|
||||
package_data_paths.extend([str(path)[10:] for path in envs_dir.rglob(extension)])
|
||||
package_data_paths.extend([str(path)[10:]
|
||||
for path in envs_dir.rglob(extension)])
|
||||
|
||||
return package_data_paths
|
||||
|
||||
|
||||
setup(
|
||||
author='Fabian Otto, Onur Celik',
|
||||
author='Fabian Otto, Onur Celik, Dominik Roth, Hongyi Zhou',
|
||||
name='fancy_gym',
|
||||
version='0.2',
|
||||
version='1.0',
|
||||
classifiers=[
|
||||
'Development Status :: 3 - Alpha',
|
||||
'Development Status :: 4 - Beta',
|
||||
'Intended Audience :: Science/Research',
|
||||
'License :: OSI Approved :: MIT License',
|
||||
'Natural Language :: English',
|
||||
@ -46,10 +51,11 @@ setup(
|
||||
],
|
||||
extras_require=extras,
|
||||
install_requires=[
|
||||
'gym[mujoco]<0.25.0,>=0.24.1',
|
||||
'gymnasium>=0.26.0',
|
||||
'mp_pytorch<=0.1.3'
|
||||
],
|
||||
packages=[package for package in find_packages() if package.startswith("fancy_gym")],
|
||||
packages=[package for package in find_packages(
|
||||
) if package.startswith("fancy_gym")],
|
||||
package_data={
|
||||
"fancy_gym": find_package_data(extensions_to_include=["*.stl", "*.xml"])
|
||||
},
|
||||
|
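The `all` and `testing` extras in `setup.py` are derived from the other groups rather than listed by hand. A small sketch of that pattern follows, using only two of the groups from the diff. Sorting the result is an addition made here for a deterministic order; the original uses `list(set(...))`.

```python
import itertools

# Two of the dependency groups from the diff, used here only for illustration.
extras = {
    "box2d": ["gymnasium[box2d]>=0.26.0"],
    "jax": ["jax >=0.4.0", "jaxlib >=0.4.0"],
}

# Capture the group names before adding "all", so "all" does not include itself.
all_groups = set(extras.keys())

# Flatten every group into one de-duplicated list (sorted here for a stable order).
extras["all"] = sorted(set(itertools.chain.from_iterable(extras[group] for group in all_groups)))

# The testing extra is everything plus the test runner.
extras["testing"] = extras["all"] + ["pytest"]
```

With this layout, adding a new group to `extras` automatically extends both `all` and `testing`.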
@ -1,14 +1,21 @@
|
||||
import re
|
||||
from itertools import chain
|
||||
from typing import Callable
|
||||
|
||||
import gym
|
||||
import gymnasium as gym
|
||||
import pytest
|
||||
|
||||
import fancy_gym
|
||||
from test.utils import run_env, run_env_determinism
|
||||
|
||||
GYM_IDS = [spec.id for spec in gym.envs.registry.all() if
|
||||
"fancy_gym" not in spec.entry_point and 'make_bb_env_helper' not in spec.entry_point]
|
||||
GYM_MP_IDS = chain(*fancy_gym.ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS.values())
|
||||
GYM_IDS = [spec.id for spec in gym.envs.registry.values() if
|
||||
not isinstance(spec.entry_point, Callable) and
|
||||
"fancy_gym" not in spec.entry_point and 'make_bb_env_helper' not in spec.entry_point
|
||||
and 'jax' not in spec.id.lower()
|
||||
and not re.match(r'GymV2.Environment', spec.id)
|
||||
]
|
||||
GYM_MP_IDS = fancy_gym.ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS['all']
|
||||
SEED = 1
|
||||
|
||||
|
@ -1,21 +1,23 @@
|
||||
from itertools import chain
|
||||
from typing import Tuple, Type, Union, Optional, Callable
|
||||
|
||||
import gym
|
||||
import gymnasium as gym
|
||||
import numpy as np
|
||||
import pytest
|
||||
from gym import register
|
||||
from gym.core import ActType, ObsType
|
||||
from gymnasium import register, make
|
||||
from gymnasium.core import ActType, ObsType
|
||||
|
||||
import fancy_gym
|
||||
from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
|
||||
from fancy_gym.utils.time_aware_observation import TimeAwareObservation
|
||||
from fancy_gym.utils.wrappers import TimeAwareObservation
|
||||
|
||||
SEED = 1
|
||||
ENV_IDS = ['Reacher5d-v0', 'dmc:ball_in_cup-catch', 'metaworld:reach-v2', 'Reacher-v2']
|
||||
ENV_IDS = ['fancy/Reacher5d-v0', 'dm_control/ball_in_cup-catch-v0', 'metaworld/reach-v2', 'Reacher-v2']
|
||||
WRAPPERS = [fancy_gym.envs.mujoco.reacher.MPWrapper, fancy_gym.dmc.suite.ball_in_cup.MPWrapper,
|
||||
fancy_gym.meta.goal_object_change_mp_wrapper.MPWrapper, fancy_gym.open_ai.mujoco.reacher_v2.MPWrapper]
|
||||
ALL_MP_ENVS = chain(*fancy_gym.ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS.values())
|
||||
ALL_MP_ENVS = fancy_gym.ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS['all']
|
||||
|
||||
MAX_STEPS_FALLBACK = 100
|
||||
|
||||
|
||||
class Object(object):
|
||||
@ -32,10 +34,12 @@ class ToyEnv(gym.Env):
|
||||
|
||||
def reset(self, *, seed: Optional[int] = None, return_info: bool = False,
|
||||
options: Optional[dict] = None) -> Union[ObsType, Tuple[ObsType, dict]]:
|
||||
return np.array([-1])
|
||||
obs, options = np.array([-1]), {}
|
||||
return obs, options
|
||||
|
||||
def step(self, action: ActType) -> Tuple[ObsType, float, bool, bool, dict]:
|
||||
return np.array([-1]), 1, False, {}
|
||||
obs, reward, terminated, truncated, info = np.array([-1]), 1, False, False, {}
|
||||
return obs, reward, terminated, truncated, info
|
||||
|
||||
def render(self, mode="human"):
|
||||
pass
|
||||
@ -76,7 +80,7 @@ def test_missing_local_state(mp_type: str):
|
||||
{'controller_type': 'motor'},
|
||||
{'phase_generator_type': 'exp'},
|
||||
{'basis_generator_type': basis_generator_type})
|
||||
env.reset()
|
||||
env.reset(seed=SEED)
|
||||
with pytest.raises(NotImplementedError):
|
||||
env.step(env.action_space.sample())
|
||||
|
||||
@ -93,12 +97,14 @@ def test_verbosity(mp_type: str, env_wrap: Tuple[str, Type[RawInterfaceWrapper]]
|
||||
{'controller_type': 'motor'},
|
||||
{'phase_generator_type': 'exp'},
|
||||
{'basis_generator_type': basis_generator_type})
|
||||
env.reset()
|
||||
info_keys = list(env.step(env.action_space.sample())[3].keys())
|
||||
env.reset(seed=SEED)
|
||||
_obs, _reward, _terminated, _truncated, info = env.step(env.action_space.sample())
|
||||
info_keys = list(info.keys())
|
||||
|
||||
env_step = fancy_gym.make(env_id, SEED)
|
||||
env_step = make(env_id)
|
||||
env_step.reset()
|
||||
info_keys_step = env_step.step(env_step.action_space.sample())[3].keys()
|
||||
_obs, _reward, _terminated, _truncated, info = env_step.step(env_step.action_space.sample())
|
||||
info_keys_step = info.keys()
|
||||
|
||||
assert all(e in info_keys for e in info_keys_step)
|
||||
assert 'trajectory_length' in info_keys
|
||||
@ -118,13 +124,15 @@ def test_length(mp_type: str, env_wrap: Tuple[str, Type[RawInterfaceWrapper]]):
|
||||
{'trajectory_generator_type': mp_type},
|
||||
{'controller_type': 'motor'},
|
||||
{'phase_generator_type': 'exp'},
|
||||
{'basis_generator_type': basis_generator_type})
|
||||
{'basis_generator_type': basis_generator_type}, fallback_max_steps=MAX_STEPS_FALLBACK)
|
||||
|
||||
for _ in range(5):
|
||||
env.reset()
|
||||
length = env.step(env.action_space.sample())[3]['trajectory_length']
|
||||
for i in range(5):
|
||||
env.reset(seed=SEED)
|
||||
|
||||
assert length == env.spec.max_episode_steps
|
||||
_obs, _reward, _terminated, _truncated, info = env.step(env.action_space.sample())
|
||||
length = info['trajectory_length']
|
||||
|
||||
assert length == env.spec.max_episode_steps, f'Expected total simulation length ({length}) to be equal to spec.max_episode_steps ({env.spec.max_episode_steps}), but was not during test nr. {i}'
|
||||
|
||||
|
||||
@pytest.mark.parametrize('mp_type', ['promp', 'dmp', 'prodmp'])
|
||||
@ -136,9 +144,10 @@ def test_aggregation(mp_type: str, reward_aggregation: Callable[[np.ndarray], fl
|
||||
{'controller_type': 'motor'},
|
||||
{'phase_generator_type': 'exp'},
|
||||
{'basis_generator_type': basis_generator_type})
|
||||
env.reset()
|
||||
env.reset(seed=SEED)
|
||||
# ToyEnv only returns 1 as reward
|
||||
assert env.step(env.action_space.sample())[1] == reward_aggregation(np.ones(50, ))
|
||||
_obs, reward, _terminated, _truncated, _info = env.step(env.action_space.sample())
|
||||
assert reward == reward_aggregation(np.ones(50, ))
|
||||
|
||||
|
||||
@pytest.mark.parametrize('mp_type', ['promp', 'dmp'])
|
||||
@ -151,14 +160,16 @@ def test_context_space(mp_type: str, env_wrap: Tuple[str, Type[RawInterfaceWrapp
|
||||
{'phase_generator_type': 'exp'},
|
||||
{'basis_generator_type': 'rbf'})
|
||||
# check if observation space matches with the specified mask values which are true
|
||||
env_step = fancy_gym.make(env_id, SEED)
|
||||
env_step = make(env_id)
|
||||
wrapper = wrapper_class(env_step)
|
||||
assert env.observation_space.shape == wrapper.context_mask[wrapper.context_mask].shape
|
||||
|
||||
|
||||
@pytest.mark.parametrize('mp_type', ['promp', 'dmp', 'prodmp'])
|
||||
@pytest.mark.parametrize('num_dof', [0, 1, 2, 5])
|
||||
@pytest.mark.parametrize('num_basis', [0, 1, 2, 5])
|
||||
@pytest.mark.parametrize('num_basis', [
|
||||
pytest.param(0, marks=pytest.mark.xfail(reason="Basis Length 0 is not yet implemented.")),
|
||||
1, 2, 5])
|
||||
@pytest.mark.parametrize('learn_tau', [True, False])
|
||||
@pytest.mark.parametrize('learn_delay', [True, False])
|
||||
def test_action_space(mp_type: str, num_dof: int, num_basis: int, learn_tau: bool, learn_delay: bool):
|
||||
@ -219,16 +230,18 @@ def test_learn_tau(mp_type: str, tau: float):
|
||||
'learn_delay': False
|
||||
},
|
||||
{'basis_generator_type': basis_generator_type,
|
||||
}, seed=SEED)
|
||||
})
|
||||
|
||||
d = True
|
||||
env.reset(seed=SEED)
|
||||
done = True
|
||||
for i in range(5):
|
||||
if d:
|
||||
env.reset()
|
||||
if done:
|
||||
env.reset(seed=SEED)
|
||||
action = env.action_space.sample()
|
||||
action[0] = tau
|
||||
|
||||
obs, r, d, info = env.step(action)
|
||||
_obs, _reward, terminated, truncated, info = env.step(action)
|
||||
done = terminated or truncated
|
||||
|
||||
length = info['trajectory_length']
|
||||
assert length == env.spec.max_episode_steps
|
||||
@ -248,6 +261,8 @@ def test_learn_tau(mp_type: str, tau: float):
|
||||
assert np.all(vel[:tau_time_steps - 2] != vel[-1])
|
||||
#
|
||||
#
|
||||
|
||||
|
||||
@pytest.mark.parametrize('mp_type', ['promp', 'prodmp'])
|
||||
@pytest.mark.parametrize('delay', [0, 0.25, 0.5, 0.75])
|
||||
def test_learn_delay(mp_type: str, delay: float):
|
||||
@ -262,16 +277,18 @@ def test_learn_delay(mp_type: str, delay: float):
|
||||
'learn_delay': True
|
||||
},
|
||||
{'basis_generator_type': basis_generator_type,
|
||||
}, seed=SEED)
|
||||
})
|
||||
|
||||
d = True
|
||||
env.reset(seed=SEED)
|
||||
done = True
|
||||
for i in range(5):
|
||||
if d:
|
||||
env.reset()
|
||||
if done:
|
||||
env.reset(seed=SEED)
|
||||
action = env.action_space.sample()
|
||||
action[0] = delay
|
||||
|
||||
obs, r, d, info = env.step(action)
|
||||
_obs, _reward, terminated, truncated, info = env.step(action)
|
||||
done = terminated or truncated
|
||||
|
||||
length = info['trajectory_length']
|
||||
assert length == env.spec.max_episode_steps
|
||||
@ -290,6 +307,8 @@ def test_learn_delay(mp_type: str, delay: float):
|
||||
assert np.all(vel[max(1, delay_time_steps)] != vel[0])
|
||||
#
|
||||
#
|
||||
|
||||
|
||||
@pytest.mark.parametrize('mp_type', ['promp', 'prodmp'])
|
||||
@pytest.mark.parametrize('tau', [0.25, 0.5, 0.75, 1])
|
||||
@pytest.mark.parametrize('delay', [0.25, 0.5, 0.75, 1])
|
||||
@ -305,20 +324,23 @@ def test_learn_tau_and_delay(mp_type: str, tau: float, delay: float):
|
||||
'learn_delay': True
|
||||
},
|
||||
{'basis_generator_type': basis_generator_type,
|
||||
}, seed=SEED)
|
||||
})
|
||||
|
||||
env.reset(seed=SEED)
|
||||
|
||||
if env.spec.max_episode_steps * env.dt < delay + tau:
|
||||
return
|
||||
|
||||
d = True
|
||||
done = True
|
||||
for i in range(5):
|
||||
if d:
|
||||
env.reset()
|
||||
if done:
|
||||
env.reset(seed=SEED)
|
||||
action = env.action_space.sample()
|
||||
action[0] = tau
|
||||
action[1] = delay
|
||||
|
||||
obs, r, d, info = env.step(action)
|
||||
_obs, _reward, terminated, truncated, info = env.step(action)
|
||||
done = terminated or truncated
|
||||
|
||||
length = info['trajectory_length']
|
||||
assert length == env.spec.max_episode_steps
|
||||
|
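Most of the changes in this test file follow from the Gymnasium API: `reset` returns `(obs, info)`, and `step` returns five values, with the old `done` flag split into `terminated` and `truncated`. The sketch below shows the resulting loop on a stand-in environment. The `ToyEnv` here is a hypothetical plain class, not the `ToyEnv` from the tests and not a `gymnasium.Env` subclass, so the example runs without any dependencies.

```python
class ToyEnv:
    """Stand-in environment (hypothetical) that follows the 5-tuple step convention."""

    def __init__(self, max_episode_steps=3):
        self.max_episode_steps = max_episode_steps
        self.t = 0

    def reset(self, seed=None):
        # Gymnasium-style reset: returns (observation, info).
        self.t = 0
        return [-1.0], {}

    def step(self, action):
        # Gymnasium-style step: (observation, reward, terminated, truncated, info).
        self.t += 1
        terminated = False                           # the task itself never ends
        truncated = self.t >= self.max_episode_steps  # the time limit was reached
        return [-1.0], 1.0, terminated, truncated, {"t": self.t}


def rollout(env, seed=1):
    """Run one episode and return (number of steps, summed reward)."""
    _obs, _info = env.reset(seed=seed)
    steps, total, done = 0, 0.0, False
    while not done:
        _obs, reward, terminated, truncated, _info = env.step(0.0)
        done = terminated or truncated  # replaces the single `done` flag
        steps += 1
        total += reward
    return steps, total
```

Combining the two flags with `or` reproduces the old `done` behaviour, which is what the updated tests do.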
@ -1,39 +1,30 @@
|
||||
from itertools import chain
|
||||
from typing import Callable
|
||||
|
||||
import gymnasium as gym
|
||||
import pytest
|
||||
from dm_control import suite, manipulation
|
||||
|
||||
import fancy_gym
|
||||
from test.utils import run_env, run_env_determinism
|
||||
|
||||
SUITE_IDS = [f'dmc:{env}-{task}' for env, task in suite.ALL_TASKS if env != "lqr"]
|
||||
MANIPULATION_IDS = [f'dmc:manipulation-{task}' for task in manipulation.ALL if task.endswith('_features')]
|
||||
DMC_MP_IDS = chain(*fancy_gym.ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS.values())
|
||||
DMC_IDS = [spec.id for spec in gym.envs.registry.values() if
|
||||
spec.id.startswith('dm_control/')
|
||||
and 'compatibility-env-v0' not in spec.id
|
||||
and 'lqr-lqr' not in spec.id]
|
||||
DMC_MP_IDS = fancy_gym.ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS['all']
|
||||
SEED = 1
|
||||
|
||||
|
||||
@pytest.mark.parametrize('env_id', SUITE_IDS)
|
||||
def test_step_suite_functionality(env_id: str):
|
||||
@pytest.mark.parametrize('env_id', DMC_IDS)
|
||||
def test_step_dm_control_functionality(env_id: str):
|
||||
"""Tests that suite step environments run without errors using random actions."""
|
||||
run_env(env_id)
|
||||
run_env(env_id, 5000, wrappers=[gym.wrappers.FlattenObservation])
|
||||
|
||||
|
||||
@pytest.mark.parametrize('env_id', SUITE_IDS)
|
||||
def test_step_suite_determinism(env_id: str):
|
||||
@pytest.mark.parametrize('env_id', DMC_IDS)
|
||||
def test_step_dm_control_determinism(env_id: str):
|
||||
"""Tests that for step environments identical seeds produce identical trajectories."""
|
||||
run_env_determinism(env_id, SEED)
|
||||
|
||||
|
||||
@pytest.mark.parametrize('env_id', MANIPULATION_IDS)
|
||||
def test_step_manipulation_functionality(env_id: str):
|
||||
"""Tests that manipulation step environments run without errors using random actions."""
|
||||
run_env(env_id)
|
||||
|
||||
|
||||
@pytest.mark.parametrize('env_id', MANIPULATION_IDS)
|
||||
def test_step_manipulation_determinism(env_id: str):
|
||||
"""Tests that for step environments identical seeds produce identical trajectories."""
|
||||
run_env_determinism(env_id, SEED)
|
||||
run_env_determinism(env_id, SEED, 5000, wrappers=[gym.wrappers.FlattenObservation])
|
||||
|
||||
|
||||
@pytest.mark.parametrize('env_id', DMC_MP_IDS)
|
||||
|
@ -1,14 +1,16 @@
|
||||
import itertools
|
||||
from itertools import chain
|
||||
from typing import Callable
|
||||
|
||||
import fancy_gym
|
||||
import gym
|
||||
import gymnasium as gym
|
||||
import pytest
|
||||
|
||||
from test.utils import run_env, run_env_determinism
|
||||
|
||||
CUSTOM_IDS = [spec.id for spec in gym.envs.registry.all() if
|
||||
CUSTOM_IDS = [id for id, spec in gym.envs.registry.items() if
|
||||
not isinstance(spec.entry_point, Callable) and
|
||||
"fancy_gym" in spec.entry_point and 'make_bb_env_helper' not in spec.entry_point]
|
||||
CUSTOM_MP_IDS = itertools.chain(*fancy_gym.ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS.values())
|
||||
CUSTOM_MP_IDS = fancy_gym.ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS['all']
|
||||
SEED = 1
|
||||
|
||||
|
||||
|
78
test/test_fancy_registry.py
Normal file
@ -0,0 +1,78 @@
|
||||
from typing import Tuple, Type, Union, Optional, Callable
|
||||
|
||||
import gymnasium as gym
|
||||
import numpy as np
|
||||
import pytest
|
||||
from gymnasium import make
|
||||
from gymnasium.core import ActType, ObsType
|
||||
|
||||
import fancy_gym
|
||||
from fancy_gym import register
|
||||
|
||||
KNOWN_NS = ['dm_control', 'fancy', 'metaworld', 'gym']
|
||||
|
||||
|
||||
class Object(object):
|
||||
pass
|
||||
|
||||
|
||||
class ToyEnv(gym.Env):
|
||||
observation_space = gym.spaces.Box(low=-1, high=1, shape=(1,), dtype=np.float64)
|
||||
action_space = gym.spaces.Box(low=-1, high=1, shape=(1,), dtype=np.float64)
|
||||
dt = 0.02
|
||||
|
||||
def __init__(self, a: int = 0, b: float = 0.0, c: list = [], d: dict = {}, e: Object = Object()):
|
||||
self.a, self.b, self.c, self.d, self.e = a, b, c, d, e
|
||||
|
||||
def reset(self, *, seed: Optional[int] = None, return_info: bool = False,
|
||||
options: Optional[dict] = None) -> Union[ObsType, Tuple[ObsType, dict]]:
|
||||
obs, options = np.array([-1]), {}
|
||||
return obs, options
|
||||
|
||||
def step(self, action: ActType) -> Tuple[ObsType, float, bool, bool, dict]:
|
||||
obs, reward, terminated, truncated, info = np.array([-1]), 1, False, False, {}
|
||||
return obs, reward, terminated, truncated, info
|
||||
|
||||
def render(self, mode="human"):
|
||||
pass
|
||||
|
||||
|
||||
@pytest.fixture(scope="session", autouse=True)
|
||||
def setup():
|
||||
register(
|
||||
id='dummy/toy2-v0',
|
||||
entry_point='test.test_black_box:ToyEnv',
|
||||
max_episode_steps=50,
|
||||
)
|
||||
|
||||
|
||||
@pytest.mark.parametrize('env_id', ['dummy/toy2-v0'])
|
||||
@pytest.mark.parametrize('mp_type', ['ProMP', 'DMP', 'ProDMP'])
|
||||
def test_make_mp(env_id: str, mp_type: str):
|
||||
parts = env_id.split('/')
|
||||
if len(parts) == 1:
|
||||
ns, name = 'gym', parts[0]
|
||||
elif len(parts) == 2:
|
||||
ns, name = parts[0], parts[1]
|
||||
else:
|
||||
raise ValueError('env id cannot contain multiple "/".')
|
||||
|
||||
fancy_id = f'{ns}_{mp_type}/{name}'
|
||||
|
||||
make(fancy_id)
|
||||
|
||||
|
||||
def test_make_raw_toy():
|
||||
make('dummy/toy2-v0')
|
||||
|
||||
|
||||
@pytest.mark.parametrize('mp_type', ['ProMP', 'DMP', 'ProDMP'])
|
||||
def test_make_mp_toy(mp_type: str):
|
||||
fancy_id = f'dummy_{mp_type}/toy2-v0'
|
||||
|
||||
make(fancy_id)
|
||||
|
||||
|
||||
@pytest.mark.parametrize('ns', KNOWN_NS)
|
||||
def test_ns_nonempty(ns):
|
||||
assert len(fancy_gym.MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS[ns]), f'The namespace {ns} is empty even though it should not be.'
|
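The registry tests above build the ID of a movement-primitive variant by appending the MP type to the namespace of the step-based ID, with `gym` as the namespace for IDs that have none. A minimal sketch of that string handling, taken from `test_make_mp`, is shown below. The function name `mp_env_id` is hypothetical and only wraps the logic from the test.

```python
def mp_env_id(env_id: str, mp_type: str) -> str:
    """Build the movement-primitive ID for a step-based environment ID.

    Hypothetical helper that mirrors the ID construction in `test_make_mp`.
    """
    parts = env_id.split('/')
    if len(parts) == 1:
        # No namespace given: the test falls back to the 'gym' namespace.
        ns, name = 'gym', parts[0]
    elif len(parts) == 2:
        ns, name = parts
    else:
        raise ValueError('env id cannot contain multiple "/".')
    return f'{ns}_{mp_type}/{name}'


toy_id = mp_env_id('dummy/toy2-v0', 'ProMP')   # 'dummy_ProMP/toy2-v0'
gym_id = mp_env_id('Reacher-v2', 'DMP')        # 'gym_DMP/Reacher-v2'
```

Whether a given ID is actually registered still depends on the environment having been registered through `fancy_gym.register`; the helper only computes the name.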
@ -6,9 +6,9 @@ from metaworld.envs import ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE
|
||||
import fancy_gym
|
||||
from test.utils import run_env, run_env_determinism
|
||||
|
||||
METAWORLD_IDS = [f'metaworld:{env.split("-goal-observable")[0]}' for env, _ in
|
||||
METAWORLD_IDS = [f'metaworld/{env.split("-goal-observable")[0]}' for env, _ in
|
||||
ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE.items()]
|
||||
METAWORLD_MP_IDS = chain(*fancy_gym.ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS.values())
|
||||
METAWORLD_MP_IDS = fancy_gym.ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS['all']
|
||||
SEED = 1
|
||||
|
||||
|
||||
@ -18,6 +18,7 @@ def test_step_metaworld_functionality(env_id: str):
|
||||
run_env(env_id)
|
||||
|
||||
|
||||
@pytest.mark.skip(reason="Seeding does not correctly work on current Metaworld.")
|
||||
@pytest.mark.parametrize('env_id', METAWORLD_IDS)
|
||||
def test_step_metaworld_determinism(env_id: str):
|
||||
"""Tests that for step environments identical seeds produce identical trajectories."""
|
||||
@ -30,6 +31,7 @@ def test_bb_metaworld_functionality(env_id: str):
|
||||
run_env(env_id)
|
||||
|
||||
|
||||
@pytest.mark.skip(reason="Seeding does not correctly work on current Metaworld.")
|
||||
@pytest.mark.parametrize('env_id', METAWORLD_MP_IDS)
|
||||
def test_bb_metaworld_determinism(env_id: str):
|
||||
"""Tests that for black box environment identical seeds produce identical trajectories."""
|
||||
|
@ -2,21 +2,25 @@ from itertools import chain
|
||||
from types import FunctionType
|
||||
from typing import Tuple, Type, Union, Optional
|
||||
|
||||
import gym
|
||||
import gymnasium as gym
|
||||
import numpy as np
|
||||
import pytest
|
||||
from gym import register
|
||||
from gym.core import ActType, ObsType
|
||||
from gymnasium import register, make
|
||||
from gymnasium.core import ActType, ObsType
|
||||
from gymnasium import spaces
|
||||
|
||||
import fancy_gym
|
||||
from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper
|
||||
from fancy_gym.utils.time_aware_observation import TimeAwareObservation
|
||||
from fancy_gym.utils.wrappers import TimeAwareObservation
|
||||
from fancy_gym.utils.make_env_helpers import ensure_finite_time
|
||||
|
||||
SEED = 1
|
||||
ENV_IDS = ['Reacher5d-v0', 'dmc:ball_in_cup-catch', 'metaworld:reach-v2', 'Reacher-v2']
|
||||
ENV_IDS = ['fancy/Reacher5d-v0', 'dm_control/ball_in_cup-catch-v0', 'metaworld/reach-v2', 'Reacher-v2']
|
||||
WRAPPERS = [fancy_gym.envs.mujoco.reacher.MPWrapper, fancy_gym.dmc.suite.ball_in_cup.MPWrapper,
|
||||
fancy_gym.meta.goal_object_change_mp_wrapper.MPWrapper, fancy_gym.open_ai.mujoco.reacher_v2.MPWrapper]
|
||||
ALL_MP_ENVS = chain(*fancy_gym.ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS.values())
|
||||
ALL_MP_ENVS = fancy_gym.ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS['all']
|
||||
|
||||
MAX_STEPS_FALLBACK = 50
|
||||
|
||||
|
||||
class ToyEnv(gym.Env):
|
||||
@ -26,10 +30,12 @@ class ToyEnv(gym.Env):
|
||||
|
||||
def reset(self, *, seed: Optional[int] = None, return_info: bool = False,
|
||||
options: Optional[dict] = None) -> Union[ObsType, Tuple[ObsType, dict]]:
|
||||
return np.array([-1])
|
||||
obs, options = np.array([-1]), {}
|
||||
return obs, options
|
||||
|
||||
def step(self, action: ActType) -> Tuple[ObsType, float, bool, bool, dict]:
|
||||
return np.array([-1]), 1, False, {}
|
||||
obs, reward, terminated, truncated, info = np.array([-1]), 1, False, False, {}
|
||||
return obs, reward, terminated, truncated, info
|
||||
|
||||
def render(self, mode="human"):
|
||||
pass
|
||||
@ -61,7 +67,7 @@ def setup():
|
||||
def test_learn_sub_trajectories(mp_type: str, env_wrap: Tuple[str, Type[RawInterfaceWrapper]],
|
||||
add_time_aware_wrapper_before: bool):
|
||||
env_id, wrapper_class = env_wrap
|
||||
env_step = TimeAwareObservation(fancy_gym.make(env_id, SEED))
|
||||
env_step = TimeAwareObservation(ensure_finite_time(make(env_id, SEED), MAX_STEPS_FALLBACK))
|
||||
wrappers = [wrapper_class]
|
||||
|
||||
# has time aware wrapper
|
||||
@ -72,24 +78,29 @@ def test_learn_sub_trajectories(mp_type: str, env_wrap: Tuple[str, Type[RawInter
|
||||
{'trajectory_generator_type': mp_type},
|
||||
{'controller_type': 'motor'},
|
||||
{'phase_generator_type': 'exp'},
|
||||
{'basis_generator_type': 'rbf'}, seed=SEED)
|
||||
{'basis_generator_type': 'rbf'}, fallback_max_steps=MAX_STEPS_FALLBACK)
|
||||
env.reset(seed=SEED)
|
||||
|
||||
assert env.learn_sub_trajectories
|
||||
assert env.spec.max_episode_steps
|
||||
assert env_step.spec.max_episode_steps
|
||||
assert env.traj_gen.learn_tau
|
||||
# This also verifies we are not adding the TimeAwareObservationWrapper twice
|
||||
assert env.observation_space == env_step.observation_space
|
||||
assert spaces.flatten_space(env_step.observation_space) == spaces.flatten_space(env.observation_space)
|
||||
|
||||
d = True
|
||||
done = True
|
||||
|
||||
for i in range(25):
|
||||
if d:
|
||||
env.reset()
|
||||
if done:
|
||||
env.reset(seed=SEED)
|
||||
|
||||
action = env.action_space.sample()
|
||||
obs, r, d, info = env.step(action)
|
||||
_obs, _reward, terminated, truncated, info = env.step(action)
|
||||
done = terminated or truncated
|
||||
|
||||
length = info['trajectory_length']
|
||||
|
||||
if not d:
|
||||
if not done:
|
||||
assert length == np.round(action[0] / env.dt)
|
||||
assert length == np.round(env.traj_gen.tau.numpy() / env.dt)
|
||||
else:
|
||||
@ -105,14 +116,14 @@ def test_learn_sub_trajectories(mp_type: str, env_wrap: Tuple[str, Type[RawInter
|
||||
def test_replanning_time(mp_type: str, env_wrap: Tuple[str, Type[RawInterfaceWrapper]],
|
||||
add_time_aware_wrapper_before: bool, replanning_time: int):
|
||||
env_id, wrapper_class = env_wrap
|
||||
env_step = TimeAwareObservation(fancy_gym.make(env_id, SEED))
|
||||
env_step = TimeAwareObservation(ensure_finite_time(make(env_id, SEED), MAX_STEPS_FALLBACK))
|
||||
wrappers = [wrapper_class]
|
||||
|
||||
# has time aware wrapper
|
||||
if add_time_aware_wrapper_before:
|
||||
wrappers += [TimeAwareObservation]
|
||||
|
||||
replanning_schedule = lambda c_pos, c_vel, obs, c_action, t: t % replanning_time == 0
|
||||
def replanning_schedule(c_pos, c_vel, obs, c_action, t): return t % replanning_time == 0
|
||||
|
||||
basis_generator_type = 'prodmp' if mp_type == 'prodmp' else 'rbf'
|
||||
phase_generator_type = 'exp' if 'dmp' in mp_type else 'linear'
|
||||
@ -121,31 +132,36 @@ def test_replanning_time(mp_type: str, env_wrap: Tuple[str, Type[RawInterfaceWra
|
||||
{'trajectory_generator_type': mp_type},
|
||||
{'controller_type': 'motor'},
|
||||
{'phase_generator_type': phase_generator_type},
|
||||
{'basis_generator_type': basis_generator_type}, seed=SEED)
|
||||
{'basis_generator_type': basis_generator_type}, fallback_max_steps=MAX_STEPS_FALLBACK)
|
||||
env.reset(seed=SEED)
|
||||
|
||||
assert env.do_replanning
|
||||
assert env.spec.max_episode_steps
|
||||
assert env_step.spec.max_episode_steps
|
||||
assert callable(env.replanning_schedule)
|
||||
# This also verifies we are not adding the TimeAwareObservationWrapper twice
|
||||
assert env.observation_space == env_step.observation_space
|
||||
assert spaces.flatten_space(env_step.observation_space) == spaces.flatten_space(env.observation_space)
|
||||
|
||||
env.reset()
|
||||
env.reset(seed=SEED)
|
||||
|
||||
episode_steps = env_step.spec.max_episode_steps // replanning_time
|
||||
# Make 3 episodes, total steps depend on the replanning steps
|
||||
for i in range(3 * episode_steps):
|
||||
action = env.action_space.sample()
|
||||
obs, r, d, info = env.step(action)
|
||||
_obs, _reward, terminated, truncated, info = env.step(action)
|
||||
done = terminated or truncated
|
||||
|
||||
length = info['trajectory_length']
|
||||
|
||||
if d:
|
||||
if done:
|
||||
# Check if number of steps until termination match the replanning interval
|
||||
print(d, (i + 1), episode_steps)
|
||||
print(done, (i + 1), episode_steps)
|
||||
assert (i + 1) % episode_steps == 0
|
||||
env.reset()
|
||||
env.reset(seed=SEED)
|
||||
|
||||
assert replanning_schedule(None, None, None, None, length)
|
||||
|
||||
|
||||
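The replanning tests pass a schedule callable with the signature `(c_pos, c_vel, obs, c_action, t)` that returns `True` whenever a new plan should be made. The sketch below evaluates such a schedule on its own to show which time steps trigger replanning. The factory `make_replanning_schedule` is a hypothetical name, and how the black-box environment consumes the schedule internally is not shown here.

```python
def make_replanning_schedule(replanning_time: int):
    """Return a schedule that requests replanning every `replanning_time` steps.

    Hypothetical factory; the returned callable has the same signature as the
    `replanning_schedule` used in the tests.
    """
    def replanning_schedule(c_pos, c_vel, obs, c_action, t):
        # Only the step counter `t` is used; the other arguments are ignored.
        return t % replanning_time == 0
    return replanning_schedule


schedule = make_replanning_schedule(5)

# Steps 1..20 at which the schedule asks for a new plan.
replan_steps = [t for t in range(1, 21) if schedule(None, None, None, None, t)]
```

Using a named function instead of a `lambda` assignment, as the diff does, keeps the schedule identifiable in tracebacks and satisfies common linters.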
@pytest.mark.parametrize('mp_type', ['promp', 'prodmp'])
|
||||
@pytest.mark.parametrize('max_planning_times', [1, 2, 3, 4])
|
||||
@pytest.mark.parametrize('sub_segment_steps', [5, 10])
|
||||
@ -165,15 +181,19 @@ def test_max_planning_times(mp_type: str, max_planning_times: int, sub_segment_s
|
||||
},
|
||||
{'basis_generator_type': basis_generator_type,
|
||||
},
|
||||
seed=SEED)
|
||||
_ = env.reset()
|
||||
d = False
|
||||
fallback_max_steps=MAX_STEPS_FALLBACK)
|
||||
|
||||
_ = env.reset(seed=SEED)
|
||||
done = False
|
||||
planning_times = 0
|
||||
while not d:
|
||||
_, _, d, _ = env.step(env.action_space.sample())
|
||||
while not done:
|
||||
action = env.action_space.sample()
|
||||
_obs, _reward, terminated, truncated, _info = env.step(action)
|
||||
done = terminated or truncated
|
||||
planning_times += 1
|
||||
assert planning_times == max_planning_times
|
||||
|
||||
|
||||
@pytest.mark.parametrize('mp_type', ['promp', 'prodmp'])
|
||||
@pytest.mark.parametrize('max_planning_times', [1, 2, 3, 4])
|
||||
@pytest.mark.parametrize('sub_segment_steps', [5, 10])
|
||||
@ -194,17 +214,20 @@ def test_replanning_with_learn_tau(mp_type: str, max_planning_times: int, sub_se
|
||||
},
|
||||
{'basis_generator_type': basis_generator_type,
|
||||
},
|
||||
seed=SEED)
|
||||
_ = env.reset()
|
||||
d = False
|
||||
fallback_max_steps=MAX_STEPS_FALLBACK)
|
||||
|
||||
_ = env.reset(seed=SEED)
|
||||
done = False
|
||||
planning_times = 0
|
||||
while not d:
|
||||
while not done:
|
||||
action = env.action_space.sample()
|
||||
action[0] = tau
|
||||
_, _, d, info = env.step(action)
|
||||
_obs, _reward, terminated, truncated, _info = env.step(action)
|
||||
done = terminated or truncated
|
||||
planning_times += 1
|
||||
assert planning_times == max_planning_times
|
||||
|
||||
|
||||
@pytest.mark.parametrize('mp_type', ['promp', 'prodmp'])
|
||||
@pytest.mark.parametrize('max_planning_times', [1, 2, 3, 4])
|
||||
@pytest.mark.parametrize('sub_segment_steps', [5, 10])
|
||||
@ -213,26 +236,28 @@ def test_replanning_with_learn_delay(mp_type: str, max_planning_times: int, sub_
|
||||
basis_generator_type = 'prodmp' if mp_type == 'prodmp' else 'rbf'
|
||||
phase_generator_type = 'exp' if mp_type == 'prodmp' else 'linear'
|
||||
env = fancy_gym.make_bb('toy-v0', [ToyWrapper],
|
||||
{'replanning_schedule': lambda pos, vel, obs, action, t: t % sub_segment_steps == 0,
|
||||
'max_planning_times': max_planning_times,
|
||||
'verbose': 2},
|
||||
{'trajectory_generator_type': mp_type,
|
||||
},
|
||||
{'controller_type': 'motor'},
|
||||
{'phase_generator_type': phase_generator_type,
|
||||
'learn_tau': False,
|
||||
'learn_delay': True
|
||||
},
|
||||
{'basis_generator_type': basis_generator_type,
|
||||
},
|
||||
seed=SEED)
|
||||
_ = env.reset()
|
||||
d = False
|
||||
{'replanning_schedule': lambda pos, vel, obs, action, t: t % sub_segment_steps == 0,
|
||||
'max_planning_times': max_planning_times,
|
||||
'verbose': 2},
|
||||
{'trajectory_generator_type': mp_type,
|
||||
},
|
||||
{'controller_type': 'motor'},
|
||||
{'phase_generator_type': phase_generator_type,
|
||||
'learn_tau': False,
|
||||
'learn_delay': True
|
||||
},
|
||||
{'basis_generator_type': basis_generator_type,
|
||||
},
|
||||
fallback_max_steps=MAX_STEPS_FALLBACK)
|
||||
|
||||
_ = env.reset(seed=SEED)
|
||||
done = False
|
||||
planning_times = 0
|
||||
while not d:
|
||||
while not done:
|
||||
action = env.action_space.sample()
|
||||
action[0] = delay
|
||||
_, _, d, info = env.step(action)
|
||||
_obs, _reward, terminated, truncated, info = env.step(action)
|
||||
done = terminated or truncated
|
||||
|
||||
delay_time_steps = int(np.round(delay / env.dt))
|
||||
pos = info['positions'].flatten()
|
||||
@ -256,6 +281,7 @@ def test_replanning_with_learn_delay(mp_type: str, max_planning_times: int, sub_
|
||||
|
||||
assert planning_times == max_planning_times
|
||||
|
||||
|
||||
@pytest.mark.parametrize('mp_type', ['promp', 'prodmp'])
|
||||
@pytest.mark.parametrize('max_planning_times', [1, 2, 3])
|
||||
@pytest.mark.parametrize('sub_segment_steps', [5, 10, 15])
|
||||
@ -266,27 +292,29 @@ def test_replanning_with_learn_delay_and_tau(mp_type: str, max_planning_times: i
|
||||
basis_generator_type = 'prodmp' if mp_type == 'prodmp' else 'rbf'
|
||||
phase_generator_type = 'exp' if mp_type == 'prodmp' else 'linear'
|
||||
env = fancy_gym.make_bb('toy-v0', [ToyWrapper],
|
||||
{'replanning_schedule': lambda pos, vel, obs, action, t: t % sub_segment_steps == 0,
|
||||
'max_planning_times': max_planning_times,
|
||||
'verbose': 2},
|
||||
{'trajectory_generator_type': mp_type,
|
||||
},
|
||||
{'controller_type': 'motor'},
|
||||
{'phase_generator_type': phase_generator_type,
|
||||
'learn_tau': True,
|
||||
'learn_delay': True
|
||||
},
|
||||
{'basis_generator_type': basis_generator_type,
|
||||
},
|
||||
seed=SEED)
|
||||
_ = env.reset()
|
||||
d = False
|
||||
{'replanning_schedule': lambda pos, vel, obs, action, t: t % sub_segment_steps == 0,
|
||||
'max_planning_times': max_planning_times,
|
||||
'verbose': 2},
|
||||
{'trajectory_generator_type': mp_type,
|
||||
},
|
||||
{'controller_type': 'motor'},
|
||||
{'phase_generator_type': phase_generator_type,
|
||||
'learn_tau': True,
|
||||
'learn_delay': True
|
||||
},
|
||||
{'basis_generator_type': basis_generator_type,
|
||||
},
|
||||
fallback_max_steps=MAX_STEPS_FALLBACK)
|
||||
|
||||
_ = env.reset(seed=SEED)
|
||||
done = False
|
||||
planning_times = 0
|
||||
while not d:
|
||||
while not done:
|
||||
action = env.action_space.sample()
|
||||
action[0] = tau
|
||||
action[1] = delay
|
||||
_, _, d, info = env.step(action)
|
||||
_obs, _reward, terminated, truncated, info = env.step(action)
|
||||
done = terminated or truncated
|
||||
|
||||
delay_time_steps = int(np.round(delay / env.dt))
|
||||
|
||||
@ -306,6 +334,7 @@ def test_replanning_with_learn_delay_and_tau(mp_type: str, max_planning_times: i
|
||||
|
||||
assert planning_times == max_planning_times
|
||||
|
||||
|
||||
@pytest.mark.parametrize('mp_type', ['promp', 'prodmp'])
|
||||
@pytest.mark.parametrize('max_planning_times', [1, 2, 3, 4])
|
||||
@pytest.mark.parametrize('sub_segment_steps', [5, 10])
|
||||
@ -325,9 +354,11 @@ def test_replanning_schedule(mp_type: str, max_planning_times: int, sub_segment_
|
||||
},
|
||||
{'basis_generator_type': basis_generator_type,
|
||||
},
|
||||
seed=SEED)
|
||||
_ = env.reset()
|
||||
d = False
|
||||
fallback_max_steps=MAX_STEPS_FALLBACK)
|
||||
|
||||
_ = env.reset(seed=SEED)
|
||||
for i in range(max_planning_times):
|
||||
_, _, d, _ = env.step(env.action_space.sample())
|
||||
assert d
|
||||
action = env.action_space.sample()
|
||||
_obs, _reward, terminated, truncated, _info = env.step(action)
|
||||
done = terminated or truncated
|
||||
assert done
|
||||
|
@ -1,9 +1,12 @@
|
||||
import gym
|
||||
from typing import List, Type
|
||||
|
||||
import gymnasium as gym
|
||||
import numpy as np
|
||||
from fancy_gym import make
|
||||
from gymnasium import make
|
||||
|
||||
|
||||
def run_env(env_id, iterations=None, seed=0, render=False):
|
||||
def run_env(env_id: str, iterations: int = None, seed: int = 0, wrappers: List[Type[gym.Wrapper]] = [],
|
||||
render: bool = False):
|
||||
"""
|
||||
Example for running a DMC based env in the step based setting.
|
||||
The env_id has to be specified as `dmc:domain_name-task_name` or
|
||||
@ -13,70 +16,88 @@ def run_env(env_id, iterations=None, seed=0, render=False):
|
||||
env_id: Either `dmc:domain_name-task_name` or `dmc:manipulation-environment_name`
|
||||
iterations: Number of rollout steps to run
|
||||
seed: random seeding
|
||||
wrappers: List of Wrappers to apply to the environment
|
||||
render: Render the episode
|
||||
|
||||
Returns: observations, rewards, dones, actions
|
||||
Returns: observations, rewards, terminations, truncations, actions
|
||||
|
||||
"""
|
||||
env: gym.Env = make(env_id, seed=seed)
|
||||
env: gym.Env = make(env_id)
|
||||
for w in wrappers:
|
||||
env = w(env)
|
||||
rewards = []
|
||||
observations = []
|
||||
actions = []
|
||||
dones = []
|
||||
obs = env.reset()
|
||||
terminations = []
|
||||
truncations = []
|
||||
obs, _ = env.reset(seed=seed)
|
||||
env.action_space.seed(seed)
|
||||
verify_observations(obs, env.observation_space, "reset()")
|
||||
|
||||
iterations = iterations or (env.spec.max_episode_steps or 1)
|
||||
|
||||
# number of samples(multiple environment steps)
|
||||
# number of samples (multiple environment steps)
|
||||
for i in range(iterations):
|
||||
observations.append(obs)
|
||||
|
||||
ac = env.action_space.sample()
|
||||
actions.append(ac)
|
||||
# ac = np.random.uniform(env.action_space.low, env.action_space.high, env.action_space.shape)
|
||||
obs, reward, done, info = env.step(ac)
|
||||
obs, reward, terminated, truncated, info = env.step(ac)
|
||||
|
||||
verify_observations(obs, env.observation_space, "step()")
|
||||
verify_reward(reward)
|
||||
verify_done(done)
|
||||
verify_done(terminated)
|
||||
verify_done(truncated)
|
||||
|
||||
rewards.append(reward)
|
||||
dones.append(done)
|
||||
terminations.append(terminated)
|
||||
truncations.append(truncated)
|
||||
|
||||
if render:
|
||||
env.render("human")
|
||||
|
||||
if done:
|
||||
if terminated or truncated:
|
||||
break
|
||||
if not hasattr(env, "replanning_schedule"):
|
||||
assert done, "Done flag is not True after end of episode."
|
||||
assert terminated or truncated, f"Termination or truncation flag is not True after {i + 1} iterations."
|
||||
|
||||
observations.append(obs)
|
||||
env.close()
|
||||
del env
|
||||
return np.array(observations), np.array(rewards), np.array(dones), np.array(actions)
|
||||
return np.array(observations), np.array(rewards), np.array(terminations), np.array(truncations), np.array(actions)
|
||||
|
||||
|
||||
def run_env_determinism(env_id: str, seed: int):
|
||||
traj1 = run_env(env_id, seed=seed)
|
||||
traj2 = run_env(env_id, seed=seed)
|
||||
def run_env_determinism(env_id: str, seed: int, iterations: int = None, wrappers: List[Type[gym.Wrapper]] = []):
|
||||
traj1 = run_env(env_id, iterations=iterations,
|
||||
seed=seed, wrappers=wrappers)
|
||||
traj2 = run_env(env_id, iterations=iterations,
|
||||
seed=seed, wrappers=wrappers)
|
||||
# Iterate over two trajectories, which should have the same state and action sequence
|
||||
for i, time_step in enumerate(zip(*traj1, *traj2)):
|
||||
obs1, rwd1, done1, ac1, obs2, rwd2, done2, ac2 = time_step
|
||||
assert np.array_equal(obs1, obs2), f"Observations [{i}] {obs1} and {obs2} do not match."
|
||||
assert np.array_equal(ac1, ac2), f"Actions [{i}] {ac1} and {ac2} do not match."
|
||||
assert np.array_equal(rwd1, rwd2), f"Rewards [{i}] {rwd1} and {rwd2} do not match."
|
||||
assert np.array_equal(done1, done2), f"Dones [{i}] {done1} and {done2} do not match."
|
||||
obs1, rwd1, term1, trunc1, ac1, obs2, rwd2, term2, trunc2, ac2 = time_step
|
||||
assert np.allclose(
|
||||
obs1, obs2), f"Observations [{i}] {obs1} ({obs1.shape}) and {obs2} ({obs2.shape}) do not match: Biggest difference is {np.abs(obs1-obs2).max()} at index {np.abs(obs1-obs2).argmax()}."
|
||||
assert np.array_equal(
|
||||
ac1, ac2), f"Actions [{i}] {ac1} and {ac2} do not match."
|
||||
assert np.array_equal(
|
||||
rwd1, rwd2), f"Rewards [{i}] {rwd1} and {rwd2} do not match."
|
||||
assert np.array_equal(
|
||||
term1, term2), f"Terminateds [{i}] {term1} and {term2} do not match."
|
||||
assert np.array_equal(
|
||||
trunc1, trunc2), f"Truncateds [{i}] {trunc1} and {trunc2} do not match."
|
||||
|
||||
|
||||
def verify_observations(obs, observation_space: gym.Space, obs_type="reset()"):
|
||||
assert observation_space.contains(obs), \
|
||||
f"Observation {obs} received from {obs_type} not contained in observation space {observation_space}."
|
||||
f"Observation {obs} ({obs.shape}) received from {obs_type} not contained in observation space {observation_space}."
|
||||
|
||||
|
||||
def verify_reward(reward):
|
||||
assert isinstance(reward, (float, int)), f"Returned type {type(reward)} as reward, expected float or int."
|
||||
assert isinstance(
|
||||
reward, (float, int)), f"Returned type {type(reward)} as reward, expected float or int."
|
||||
|
||||
|
||||
def verify_done(done):
|
||||
assert isinstance(done, bool), f"Returned {done} as done flag, expected bool."
|
||||
assert isinstance(
|
||||
done, bool), f"Returned {done} as done flag, expected bool."
|
||||
|
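The tests in this diff replace the old four-value `step()` return with the five-value Gymnasium form and derive `done = terminated or truncated`. The following is a minimal sketch of that pattern, not code from this PR: `CountdownEnv` and `rollout` are hypothetical names, and the stub environment only mimics the Gymnasium return shapes so the example runs without any dependency.

```python
from typing import Any, Dict, List, Optional, Tuple


class CountdownEnv:
    """Hypothetical stand-in for a Gymnasium-style env (not part of fancy_gym)."""

    def __init__(self, horizon: int = 3, time_limit: int = 5):
        self.horizon = horizon        # step at which the task itself ends
        self.time_limit = time_limit  # step at which the episode is cut off
        self.t = 0

    def reset(self, seed: Optional[int] = None) -> Tuple[int, Dict[str, Any]]:
        # Gymnasium-style reset returns (observation, info).
        self.t = 0
        return self.t, {}

    def step(self, action: int) -> Tuple[int, float, bool, bool, Dict[str, Any]]:
        # Gymnasium-style step returns (obs, reward, terminated, truncated, info).
        self.t += 1
        terminated = self.t >= self.horizon
        truncated = (not terminated) and self.t >= self.time_limit
        return self.t, 1.0, terminated, truncated, {}


def rollout(env: CountdownEnv, max_steps: int = 100) -> Tuple[List[bool], List[bool]]:
    """Step until the episode ends, recording both flags separately."""
    _obs, _info = env.reset(seed=0)
    terminations: List[bool] = []
    truncations: List[bool] = []
    for _ in range(max_steps):
        _obs, _reward, terminated, truncated, _info = env.step(0)
        terminations.append(terminated)
        truncations.append(truncated)
        if terminated or truncated:  # equivalent of the old single `done` flag
            break
    return terminations, truncations
```

Keeping the two flags in separate lists, as `run_env` does, lets a determinism check compare terminations against terminations and truncations against truncations instead of a merged flag.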