readme updated

This commit is contained in:
Fabian 2022-07-13 16:52:24 +02:00
parent f70f3eeb9a
commit 2e6094982e
4 changed files with 115 additions and 101 deletions

README.md

@@ -1,40 +1,27 @@
# Fancy Gym
`fancy_gym` offers a large variety of reinforcement learning environments under the unifying interface
of [OpenAI gym](https://gym.openai.com/). We provide support (under the OpenAI gym interface) for the benchmark suites
[DeepMind Control](https://deepmind.com/research/publications/2020/dm-control-Software-and-Tasks-for-Continuous-Control)
(DMC) and [Metaworld](https://meta-world.github.io/). If those are not sufficient and you want to create your own custom
gym environments, use [this guide](https://www.gymlibrary.ml/content/environment_creation/). We highly appreciate it if
you then submit a PR so that this environment can become part of `fancy_gym`.
In comparison to existing libraries, we additionally support controlling agents with movement primitives, such as
Dynamic Movement Primitives (DMPs) and Probabilistic Movement Primitives (ProMPs).
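
For orientation, such a custom step-based environment usually boils down to the standard gym structure. The following
minimal sketch uses purely illustrative names, spaces, and dynamics; the linked guide covers the full registration
workflow:

```python
import gym
import numpy as np
from gym import spaces


class MyFancyEnv(gym.Env):
    """Illustrative skeleton of a custom step-based environment (not part of fancy_gym)."""

    def __init__(self):
        self.observation_space = spaces.Box(low=-1.0, high=1.0, shape=(4,), dtype=np.float64)
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float64)

    def reset(self):
        self._state = np.zeros(4)
        return self._state.copy()

    def step(self, action):
        # Toy dynamics: move towards the commanded direction and penalize distance to the origin.
        self._state[:2] += 0.1 * np.clip(action, -1.0, 1.0)
        reward = -np.linalg.norm(self._state[:2])
        done = False
        return self._state.copy(), reward, done, {}
```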
## Movement Primitive Environments (Episode-Based/Black-Box Environments)
Unlike step-based environments, movement primitive (MP) environments are more closely related to stochastic search,
black-box optimization, and methods that are often used in traditional robotics and control. MP environments are
typically episode-based and execute a full trajectory, which is generated by a trajectory generator, such as a Dynamic
Movement Primitive (DMP) or a Probabilistic Movement Primitive (ProMP). The generated trajectory is translated into
individual step-wise actions by a trajectory tracking controller. The exact choice of controller depends on the type of
environment. We currently support position, velocity, and PD-controllers for position, velocity, and torque control,
respectively, as well as a special controller for the MetaWorld control suite.
The goal of all MP environments is still to learn an optimal policy. Yet, an action represents the parametrization of
the motion primitives used to generate a suitable trajectory. Additionally, this framework also supports the contextual
setting, i.e. we expose the context space - a subset of the observation space - at the beginning of the episode. This
requires predicting a new action/MP parametrization for each context.
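
Conceptually, a single `step()` call of such a black-box environment rolls out a full trajectory in the underlying
step-based environment. The following sketch only illustrates this idea; `traj_gen`, `tracking_controller`, and
`mp_params` are placeholder names, not `fancy_gym` API:

```python
def black_box_step(env, traj_gen, tracking_controller, mp_params):
    """Illustrative sketch of one MP 'action': roll out a full trajectory step by step."""
    # The trajectory generator (e.g. a DMP or ProMP) turns the MP parameters
    # into a desired trajectory of positions and velocities.
    positions, velocities = traj_gen(mp_params)

    obs, episode_reward, info = None, 0.0, {}
    for des_pos, des_vel in zip(positions, velocities):
        # The tracking controller (position, velocity, or PD) converts the desired
        # state into a low-level action for the underlying step-based environment.
        action = tracking_controller(des_pos, des_vel)
        obs, reward, done, info = env.step(action)
        episode_reward += reward
        if done:
            break

    # From the black-box perspective, the episode ends after a single trajectory.
    return obs, episode_reward, True, info
```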
## Installation
@@ -56,104 +43,137 @@ cd alr_envs
```bash
pip install -e .
```
In case you want to use dm_control or metaworld, you can install them by specifying extras:

```bash
pip install -e .[dmc, metaworld]
```
> **Note:**
> While our library already fully supports the new mujoco bindings, metaworld still relies on
> [mujoco_py](https://github.com/openai/mujoco-py), hence make sure to have mujoco 2.1 installed beforehand.
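
As an optional sanity check (not part of the official instructions), you can verify that both binding stacks are
importable; this assumes the MuJoCo 2.1 binaries required by `mujoco_py` are already installed:

```python
# Optional sanity check: both MuJoCo stacks should import without errors.
import mujoco      # new official bindings (installed with fancy_gym)
import metaworld   # pulls in mujoco_py, which requires the MuJoCo 2.1 binaries
```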
## How to use Fancy Gym
We will only show the basics here; we prepared [multiple examples](fancy_gym/examples/) for a more detailed look.
### Step-wise Environments
```python
import fancy_gym

env = fancy_gym.make('Reacher5d-v0', seed=1)
obs = env.reset()

for i in range(1000):
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)
    if i % 5 == 0:
        env.render()

    if done:
        obs = env.reset()
```
When using `dm_control` tasks we expect the `env_id` to be specified as `dmc:domain_name-task_name` or, for manipulation
tasks, as `dmc:manipulation-environment_name`. For `metaworld` tasks, we require the structure `metaworld:env_id-v2`;
our custom tasks and standard gym environments can be created without prefixes.
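
For illustration, creating one environment from each suite could look as follows; the concrete DMC and Metaworld ids
below are only examples of the naming scheme, not a guaranteed list:

```python
import fancy_gym

# DeepMind Control task, following the dmc:domain_name-task_name scheme
env_dmc = fancy_gym.make('dmc:ball_in_cup-catch', seed=1)

# Metaworld task, following the metaworld:env_id-v2 scheme
env_meta = fancy_gym.make('metaworld:button-press-v2', seed=1)

# Custom fancy_gym tasks and standard gym tasks need no prefix
env_custom = fancy_gym.make('Reacher5d-v0', seed=1)
```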
### Black-box Environments
All environments provide by default the cumulative episode reward; this can, however, be changed if necessary.
Optionally, each environment returns all collected information from each step as part of the info dictionary. This
information is mainly meant for debugging as well as logging and not for training.
|Key| Description|Type|
|---|---|---|
|`positions`| Generated position trajectory from the MP | Optional|
|`velocities`| Generated velocity trajectory from the MP | Optional|
|`step_actions`| Step-wise executed action based on controller output | Optional|
|`step_observations`| Step-wise intermediate observations | Optional|
|`step_rewards`| Step-wise rewards | Optional|
|`trajectory_length`| Total number of environment interactions | Always|
|`other`| All other information from the underlying environment is returned as a list of length `trajectory_length`, maintaining the original keys. In case some information is not provided at every time step, the missing values are filled with `None`. | Always|
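
A minimal sketch of accessing this information after a single black-box step; which of the optional keys are actually
populated depends on the environment configuration:

```python
import fancy_gym

env = fancy_gym.make('Reacher5dProMP-v0', seed=1)
env.reset()
obs, reward, done, info = env.step(env.action_space.sample())

print(info['trajectory_length'])   # always present
print(info.get('step_rewards'))    # optional, may be missing depending on the configuration
```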
Existing MP tasks can be created the same way as above. Just keep in mind that calling `step()` executes a full trajectory.
> **Note:**
> Currently, we are also in the process of enabling replanning as well as learning of sub-trajectories.
> This allows splitting the episode into multiple trajectories and is a hybrid setting between step-based and
> black-box learning.
> While this is already implemented, it is still in beta and requires further testing.
> Feel free to try it and open an issue with any problems that occur.
```python
import fancy_gym

env = fancy_gym.make('Reacher5dProMP-v0', seed=1)
# render() can be called once in the beginning with all necessary arguments.
# To turn it off again just call render() without any arguments.
env.render(mode='human')

# This returns the context information, not the full state observation
obs = env.reset()

for i in range(5):
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)

    # Done is always True as we are working on the episode level, hence we always reset()
    obs = env.reset()
```
To show all available environments, we provide some additional convenience variables. All of them return a dictionary
with two keys `DMP` and `ProMP` that store a list of available environment ids.
```python
import fancy_gym

print("Fancy Black-box tasks:")
print(fancy_gym.ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS)

print("OpenAI Gym Black-box tasks:")
print(fancy_gym.ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS)

print("Deepmind Control Black-box tasks:")
print(fancy_gym.ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS)

print("MetaWorld Black-box tasks:")
print(fancy_gym.ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS)
```
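
For example, a small usage sketch of these dictionaries:

```python
import fancy_gym

ids = fancy_gym.ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS
print(len(ids['ProMP']), "ProMP tasks, e.g.", ids['ProMP'][:3])
```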
### How to create a new MP task
In case a required task is not supported yet in the MP framework, it can be created relatively easily. For the task at
hand, the following [interface](fancy_gym/black_box/raw_interface_wrapper.py) needs to be implemented.
```python
from abc import abstractmethod
from typing import Union, Tuple

import gym
import numpy as np


class RawInterfaceWrapper(gym.Wrapper):

    @property
    def context_mask(self) -> np.ndarray:
        """
        Returns boolean mask of the same shape as the observation space.
        It determines whether the observation is returned for the contextual case or not.
        This effectively allows to filter unwanted or unnecessary observations from the full step-based case.
        E.g. Velocities starting at 0 are only changing after the first action. Given we only receive the
        context/part of the first observation, the velocities are not necessary in the observation for the task.
        """
        return np.ones(self.env.observation_space.shape[0], dtype=bool)

    @property
    @abstractmethod
    def current_pos(self) -> Union[float, int, np.ndarray, Tuple]:
        """
        Returns the current position of the action/control dimension.
        The dimensionality has to match the action/control dimension.
@@ -164,17 +184,14 @@ class MPWrapper(MPEnvWrapper):
        raise NotImplementedError()

    @property
    @abstractmethod
    def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
        """
        Returns the current velocity of the action/control dimension.
        The dimensionality has to match the action/control dimension.
        This is not required when exclusively using position control,
        it should, however, be implemented regardless.
        E.g. The joint velocities that are directly or indirectly controlled by the action.
        """
        raise NotImplementedError()
```
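
As a rough illustration (this wrapper does not ship with `fancy_gym`), an implementation for a hypothetical
MuJoCo-based robot task whose observation is laid out as `[joint positions, joint velocities, goal]` could look like
this:

```python
import numpy as np

from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper


class MyRobotMPWrapper(RawInterfaceWrapper):
    # Hypothetical example: 7 joint positions, 7 joint velocities, 3-d goal position.

    @property
    def context_mask(self) -> np.ndarray:
        # Only the goal is exposed as context at the beginning of the episode.
        return np.hstack([np.zeros(14, dtype=bool), np.ones(3, dtype=bool)])

    @property
    def current_pos(self) -> np.ndarray:
        # Assumes the wrapped env exposes MuJoCo data with the controlled joints first.
        return self.env.unwrapped.data.qpos[:7].copy()

    @property
    def current_vel(self) -> np.ndarray:
        return self.env.unwrapped.data.qvel[:7].copy()
```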
@@ -190,15 +207,12 @@ import fancy_gym
```python
import fancy_gym

# Base environment name, according to structure of above example
base_env_id = "ball_in_cup-catch"

# Replace this wrapper with the custom wrapper for your environment by inheriting from the RawInterfaceWrapper.
# You can also add other gym.Wrappers in case they are needed,
# e.g. gym.wrappers.FlattenObservation for dict observations
wrappers = [fancy_gym.dmc.suite.ball_in_cup.MPWrapper]
kwargs = {...}
env = fancy_gym.make_bb(base_env_id, wrappers=wrappers, seed=0, **kwargs)

rewards = 0
obs = env.reset()
```


@@ -1,7 +1,3 @@
from typing import Tuple, Optional
import gym

fancy_gym/black_box/raw_interface_wrapper.py

@@ -9,10 +9,13 @@ from mp_pytorch.mp.mp_interfaces import MPInterface
class RawInterfaceWrapper(gym.Wrapper):

    @property
    @abstractmethod
    def context_mask(self) -> np.ndarray:
        """
        Returns boolean mask of the same shape as the observation space.
        It determines whether the observation is returned for the contextual case or not.
        This effectively allows to filter unwanted or unnecessary observations from the full step-based case.
        E.g. Velocities starting at 0 are only changing after the first action. Given we only receive the
        context/part of the first observation, the velocities are not necessary in the observation for the task.
        Returns:
            bool array representing the indices of the observations

setup.py

@@ -5,8 +5,8 @@ from setuptools import setup, find_packages
# Environment-specific dependencies for dmc and metaworld
extras = {
    "dmc": ["dm_control==1.0.1"],
    "metaworld": ["metaworld @ git+https://github.com/rlworkgroup/metaworld.git@master#egg=metaworld",
                  'mujoco-py<2.2,>=2.1'],
}
# All dependencies
@@ -28,6 +28,7 @@ setup(
    extras_require=extras,
    install_requires=[
        'gym>=0.24.0',
        'mujoco==2.2.0',
    ],
    packages=[package for package in find_packages() if package.startswith("fancy_gym")],
    # packages=['fancy_gym', 'fancy_gym.envs', 'fancy_gym.open_ai', 'fancy_gym.dmc', 'fancy_gym.meta', 'fancy_gym.utils'],