readme updated
parent f70f3eeb9a
commit 2e6094982e

README.md
@@ -1,40 +1,27 @@
# Fancy Gym

`fancy_gym` offers a large variety of reinforcement learning environments under the unifying interface
of [OpenAI gym](https://gym.openai.com/). We provide support (under the OpenAI gym interface) for the benchmark suites
[DeepMind Control](https://deepmind.com/research/publications/2020/dm-control-Software-and-Tasks-for-Continuous-Control)
(DMC) and [Metaworld](https://meta-world.github.io/). If those are not sufficient and you want to create your own custom
gym environments, use [this guide](https://www.gymlibrary.ml/content/environment_creation/). We highly appreciate it if
you would then submit a PR for this environment to become part of `fancy_gym`.
In comparison to existing libraries, we additionally support controlling agents with movement primitives, such as Dynamic
Movement Primitives (DMPs) and Probabilistic Movement Primitives (ProMPs).

## Movement Primitive Environments (Episode-Based/Black-Box Environments)

Unlike step-based environments, movement primitive (MP) environments are more closely related to stochastic search,
black-box optimization, and methods that are often used in traditional robotics and control. MP environments are typically
episode-based and execute a full trajectory, which is generated by a trajectory generator, such as a Dynamic Movement
Primitive (DMP) or a Probabilistic Movement Primitive (ProMP). The generated trajectory is translated into individual
step-wise actions by a trajectory tracking controller. The exact choice of controller, however, depends on the type of
environment. We currently support position, velocity, and PD-Controllers for position, velocity, and torque control,
respectively, as well as a special controller for the MetaWorld control suite.
The goal of all MP environments is still to learn an optimal policy. Yet, an action represents the parametrization of
the motion primitives to generate a suitable trajectory. Additionally, in this framework we support all of this also for
the contextual setting, i.e. we expose the context space - a subset of the observation space - at the beginning of the
episode. This requires predicting a new action/MP parametrization for each context.
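
Conceptually, a single rollout of such a black-box environment looks like the following sketch (the environment id and
the loop are only illustrative; see the detailed examples further below):

```python
import fancy_gym

env = fancy_gym.make('Reacher5dProMP-v0', seed=1)

for _ in range(10):
    context = env.reset()                # context: the exposed subset of the observation space
    params = env.action_space.sample()   # one MP parametrization per episode/context
    # step() executes the full trajectory generated from `params` and returns the episode return
    context, episode_reward, done, info = env.step(params)
```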

## Installation

@@ -56,104 +43,137 @@ cd alr_envs
pip install -e .
```

In case you want to use dm_control or metaworld, you can install them by specifying extras:

```bash
pip install -e .[dmc,metaworld]
```

> **Note:**
> While our library already fully supports the new mujoco bindings, metaworld still relies on
> [mujoco_py](https://github.com/openai/mujoco-py), hence make sure to have mujoco 2.1 installed beforehand.

## How to use Fancy Gym

We will only show the basics here and have prepared [multiple examples](fancy_gym/examples/) for a more detailed look.

### Step-wise Environments

```python
import fancy_gym

env = fancy_gym.make('Reacher5d-v0', seed=1)
obs = env.reset()

for i in range(1000):
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)
    if i % 5 == 0:
        env.render()

    if done:
        obs = env.reset()
```

When using `dm_control` tasks we expect the `env_id` to be specified as `dmc:domain_name-task_name` or, for manipulation
tasks, as `dmc:manipulation-environment_name`. For `metaworld` tasks, we require the structure `metaworld:env_id-v2`; our
custom tasks and standard gym environments can be created without prefixes.
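
For example, creating one task from each suite could look as follows (a minimal sketch; the suite-specific task ids used
here, such as `reach_site_features` or `button-press-v2`, are only illustrative and have to exist in your installed
versions of the respective suites):

```python
import fancy_gym

# dm_control suite task: dmc:domain_name-task_name
env = fancy_gym.make('dmc:ball_in_cup-catch', seed=1)
# dm_control manipulation task: dmc:manipulation-environment_name
env = fancy_gym.make('dmc:manipulation-reach_site_features', seed=1)
# metaworld task: metaworld:env_id-v2
env = fancy_gym.make('metaworld:button-press-v2', seed=1)
# custom fancy_gym task or standard gym environment: no prefix
env = fancy_gym.make('Reacher5d-v0', seed=1)
```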

### Black-box Environments

All environments provide the cumulative episode reward by default; this can, however, be changed if necessary. Optionally,
each environment returns all collected information from each step as part of the info dictionary. This information is,
however, mainly meant for debugging as well as logging and not for training.

|Key|Description|Type|
|---|---|---|
|`positions`|Generated position trajectory from the MP|Optional|
|`velocities`|Generated velocity trajectory from the MP|Optional|
|`step_actions`|Step-wise executed action based on controller output|Optional|
|`step_observations`|Step-wise intermediate observations|Optional|
|`step_rewards`|Step-wise rewards|Optional|
|`trajectory_length`|Total number of environment interactions|Always|
|`other`|All other information from the underlying environment is returned as a list with length `trajectory_length`, maintaining the original key. In case some information is not provided at every time step, the missing values are filled with `None`.|Always|

Existing MP tasks can be created the same way as above. Just keep in mind that calling `step()` executes a full trajectory.

> **Note:**
> Currently, we are also in the process of enabling replanning as well as learning of sub-trajectories.
> This allows splitting the episode into multiple trajectories and is a hybrid setting between step-based and
> black-box learning.
> While this is already implemented, it is still in beta and requires further testing.
> Feel free to try it and open an issue with any problems that occur.

```python
import fancy_gym

env = fancy_gym.make('Reacher5dProMP-v0', seed=1)
# render() can be called once in the beginning with all necessary arguments.
# To turn it off again, just call render() without any arguments.
env.render(mode='human')

# This returns the context information, not the full state observation
obs = env.reset()

for i in range(5):
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)

    # Done is always True as we are working on the episode level, hence we always reset()
    obs = env.reset()
```
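
The returned `info` dictionary can then be inspected after a `step()` call, for example like this (a minimal sketch;
which of the optional keys from the table above are present depends on how the environment is configured):

```python
import fancy_gym

env = fancy_gym.make('Reacher5dProMP-v0', seed=1)
obs = env.reset()
obs, episode_reward, done, info = env.step(env.action_space.sample())

# always available
print(info['trajectory_length'])
# optional keys are only present if the environment is configured to return them
if 'step_rewards' in info:
    print(info['step_rewards'])
```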

To show all available environments, we provide some additional convenience variables. All of them return a dictionary
with the two keys `DMP` and `ProMP` that store a list of available environment ids.

```python
import fancy_gym

print("Fancy Black-box tasks:")
print(fancy_gym.ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS)

print("OpenAI Gym Black-box tasks:")
print(fancy_gym.ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS)

print("Deepmind Control Black-box tasks:")
print(fancy_gym.ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS)

print("MetaWorld Black-box tasks:")
print(fancy_gym.ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS)
```
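
These dictionaries can also be used programmatically, for instance to instantiate every available ProMP variant of the
custom tasks (a sketch based on the structure described above):

```python
import fancy_gym

# iterate over all registered ProMP variants of the custom (fancy) tasks
for env_id in fancy_gym.ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS['ProMP']:
    env = fancy_gym.make(env_id, seed=0)
    print(env_id, env.action_space.shape)
```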

### How to create a new MP task

In case a required task is not supported yet in the MP framework, it can be created relatively easily. For the task at
hand, the following [interface](fancy_gym/black_box/raw_interface_wrapper.py) needs to be implemented.

```python
from abc import abstractmethod
from typing import Union, Tuple

import gym
import numpy as np


class RawInterfaceWrapper(gym.Wrapper):

    @property
    def context_mask(self) -> np.ndarray:
        """
        Returns boolean mask of the same shape as the observation space.
        It determines whether the observation is returned for the contextual case or not.
        This effectively allows to filter unwanted or unnecessary observations from the full step-based case.
        E.g. Velocities starting at 0 are only changing after the first action. Given we only receive the
        context/part of the first observation, the velocities are not necessary in the observation for the task.
        Returns:
            bool array representing the indices of the observations
        """
        return np.ones(self.env.observation_space.shape[0], dtype=bool)

    @property
    @abstractmethod
    def current_pos(self) -> Union[float, int, np.ndarray, Tuple]:
        """
        Returns the current position of the action/control dimension.
        The dimensionality has to match the action/control dimension.
@@ -164,17 +184,14 @@ class MPWrapper(MPEnvWrapper):
        raise NotImplementedError()

    @property
    @abstractmethod
    def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
        """
        Returns the current velocity of the action/control dimension.
        The dimensionality has to match the action/control dimension.
        This is not required when exclusively using position control,
        it should, however, be implemented regardless.
        E.g. The joint velocities that are directly or indirectly controlled by the action.
        """
        raise NotImplementedError()
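
# --- Illustration (not part of the diff above): a hypothetical concrete wrapper for a MuJoCo-style
# --- environment that exposes `data.qpos` / `data.qvel`; the attribute names and the 5-joint layout
# --- are assumptions for this sketch only.
class MyReacherMPWrapper(RawInterfaceWrapper):

    @property
    def context_mask(self) -> np.ndarray:
        # expose only the goal position (assumed to be the last two observation entries) as context
        mask = np.zeros(self.env.observation_space.shape[0], dtype=bool)
        mask[-2:] = True
        return mask

    @property
    def current_pos(self) -> np.ndarray:
        return self.env.data.qpos.flat[:5].copy()

    @property
    def current_vel(self) -> np.ndarray:
        return self.env.data.qvel.flat[:5].copy()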

@@ -190,15 +207,12 @@ import fancy_gym
# Base environment name, according to structure of above example
base_env_id = "ball_in_cup-catch"

# Replace this wrapper with the custom wrapper for your environment by inheriting from the RawInterfaceWrapper.
# You can also add other gym.Wrappers in case they are needed,
# e.g. gym.wrappers.FlattenObservation for dict observations
wrappers = [fancy_gym.dmc.suite.ball_in_cup.MPWrapper]
kwargs = {...}
env = fancy_gym.make_bb(base_env_id, wrappers=wrappers, seed=0, **kwargs)

rewards = 0
obs = env.reset()

@@ -1,7 +1,3 @@
from typing import Tuple, Optional

import gym

@@ -9,10 +9,13 @@ from mp_pytorch.mp.mp_interfaces import MPInterface
class RawInterfaceWrapper(gym.Wrapper):

    @property
    def context_mask(self) -> np.ndarray:
        """
        Returns boolean mask of the same shape as the observation space.
        It determines whether the observation is returned for the contextual case or not.
        This effectively allows to filter unwanted or unnecessary observations from the full step-based case.
        E.g. Velocities starting at 0 are only changing after the first action. Given we only receive the
        context/part of the first observation, the velocities are not necessary in the observation for the task.
        Returns:
            bool array representing the indices of the observations

setup.py
@@ -5,8 +5,8 @@ from setuptools import setup, find_packages
# Environment-specific dependencies for dmc and metaworld
extras = {
    "dmc": ["dm_control==1.0.1"],
    "metaworld": ["metaworld @ git+https://github.com/rlworkgroup/metaworld.git@master#egg=metaworld",
                  'mujoco-py<2.2,>=2.1'],
}

# All dependencies

@@ -28,6 +28,7 @@ setup(
    extras_require=extras,
    install_requires=[
        'gym>=0.24.0',
        'mujoco==2.2.0',
    ],
    packages=[package for package in find_packages() if package.startswith("fancy_gym")],
    # packages=['fancy_gym', 'fancy_gym.envs', 'fancy_gym.open_ai', 'fancy_gym.dmc', 'fancy_gym.meta', 'fancy_gym.utils'],