# Fancy Gym

`fancy_gym` offers a large variety of reinforcement learning environments under the unifying interface
of [OpenAI gym](https://gymlibrary.dev/). We provide support (under the OpenAI gym interface) for the benchmark suites
[DeepMind Control](https://deepmind.com/research/publications/2020/dm-control-Software-and-Tasks-for-Continuous-Control)
(DMC) and [Metaworld](https://meta-world.github.io/). If those are not sufficient and you want to create your own custom
gym environments, use [this guide](https://www.gymlibrary.dev/content/environment_creation/). We highly appreciate it if
you then submit a PR for this environment to become part of `fancy_gym`.
In comparison to existing libraries, we additionally support controlling agents with movement primitives, such as Dynamic
Movement Primitives (DMPs) and Probabilistic Movement Primitives (ProMPs).
## Movement Primitive Environments (Episode-Based/Black-Box Environments)

Unlike step-based environments, movement primitive (MP) environments are more closely related to stochastic search, black-box
optimization, and methods that are often used in traditional robotics and control. MP environments are typically
episode-based and execute a full trajectory, which is generated by a trajectory generator, such as a Dynamic Movement
Primitive (DMP) or a Probabilistic Movement Primitive (ProMP). The generated trajectory is translated into individual
step-wise actions by a trajectory tracking controller. The exact choice of controller, however, depends on the type
of environment. We currently support position, velocity, and PD-controllers for position, velocity, and torque control,
respectively, as well as a special controller for the MetaWorld control suite.
The goal of all MP environments is still to learn an optimal policy. Yet, an action represents the parametrization of
the motion primitives used to generate a suitable trajectory. Additionally, this framework also supports the contextual
setting, i.e. we expose the context space - a subset of the observation space - at the beginning of the
episode. This requires predicting a new action/MP parametrization for each context.
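
To make the role of the trajectory tracking controller more concrete, the following self-contained sketch shows how a desired trajectory could be turned into step-wise commands by a simple PD controller. This is purely illustrative: the gains, the linear toy trajectory, and the function names are not part of `fancy_gym`.

```python
import numpy as np

def pd_control(des_pos, des_vel, cur_pos, cur_vel, p_gain=1.0, d_gain=0.1):
    """Translate one step of a desired trajectory into a command (e.g. a torque)."""
    return p_gain * (des_pos - cur_pos) + d_gain * (des_vel - cur_vel)

# Toy "trajectory generator": linear interpolation towards a goal over 100 steps.
goal = np.array([0.5, -0.3])
des_positions = np.linspace(np.zeros(2), goal, num=100)
des_velocities = np.gradient(des_positions, axis=0)

dt = 0.02
cur_pos, cur_vel = np.zeros(2), np.zeros(2)
for des_pos, des_vel in zip(des_positions, des_velocities):
    command = pd_control(des_pos, des_vel, cur_pos, cur_vel)
    # In an MP environment this command is applied to the step-based environment;
    # here we simply integrate a toy point mass instead.
    cur_vel = cur_vel + dt * command
    cur_pos = cur_pos + dt * cur_vel
```
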

## Installation

1. Clone the repository

```bash
git clone git@github.com:ALRhub/fancy_gym.git
```

2. Go to the folder

```bash
cd fancy_gym
```

3. Install with

```bash
pip install -e .
```

In case you want to use dm_control or metaworld, you can install them by specifying extras

```bash
pip install -e .[dmc, metaworld]
```

> **Note:**
> While our library already fully supports the new mujoco bindings, metaworld still relies on
> [mujoco_py](https://github.com/openai/mujoco-py), hence make sure to have mujoco 2.1 installed beforehand.

## How to use Fancy Gym

We will only show the basics here and have prepared [multiple examples](fancy_gym/examples/) for a more detailed look.

### Step-wise Environments

```python
import fancy_gym

env = fancy_gym.make('Reacher5d-v0', seed=1)
obs = env.reset()

for i in range(1000):
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)
    if i % 5 == 0:
        env.render()

    if done:
        obs = env.reset()
```

When using `dm_control` tasks we expect the `env_id` to be specified as `dmc:domain_name-task_name` or, for manipulation
tasks, as `dmc:manipulation-environment_name`. For `metaworld` tasks, we require the structure `metaworld:env_id-v2`. Our
custom tasks and standard gym environments can be created without prefixes.
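
For illustration, a few `make` calls following these naming schemes are sketched below. The concrete dm_control and Metaworld task names are just examples and assume the respective extras are installed.

```python
import fancy_gym

# dm_control suite task: dmc:domain_name-task_name
env = fancy_gym.make('dmc:ball_in_cup-catch', seed=1)

# dm_control manipulation task: dmc:manipulation-environment_name
env = fancy_gym.make('dmc:manipulation-reach_site_features', seed=1)

# Metaworld task: metaworld:env_id-v2
env = fancy_gym.make('metaworld:button-press-v2', seed=1)

# Custom fancy_gym tasks and standard gym environments need no prefix
env = fancy_gym.make('Reacher5d-v0', seed=1)
```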

### Black-box Environments

By default, all environments provide the cumulative episode reward; this can, however, be changed if necessary. Optionally,
each environment returns all collected information from each step as part of the info dictionary. This information is
mainly meant for debugging and logging, not for training.

| Key | Description | Type |
|---|---|---|
| `positions` | Generated trajectory from MP | Optional |
| `velocities` | Generated trajectory from MP | Optional |
| `step_actions` | Step-wise executed action based on controller output | Optional |
| `step_observations` | Step-wise intermediate observations | Optional |
| `step_rewards` | Step-wise rewards | Optional |
| `trajectory_length` | Total number of environment interactions | Always |
| `other` | All other information from the underlying environment is returned as a list of length `trajectory_length`, maintaining the original keys. In case some information is not provided at every time step, the missing values are filled with `None`. | Always |

Existing MP tasks can be created the same way as above. Just keep in mind that calling `step()` executes a full trajectory.

> **Note:**
> Currently, we are also in the process of enabling replanning as well as learning of sub-trajectories.
> This allows splitting the episode into multiple trajectories and is a hybrid setting between step-based and
> black-box learning.
> While this is already implemented, it is still in beta and requires further testing.
> Feel free to try it and open an issue with any problems that occur.

```python
import fancy_gym

env = fancy_gym.make('Reacher5dProMP-v0', seed=1)
# render() can be called once in the beginning with all necessary arguments.
# To turn it off again, just call render() without any arguments.
env.render(mode='human')

# This returns the context information, not the full state observation
obs = env.reset()

for i in range(5):
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)

    # done is always True as we are working on the episode level, hence we always reset()
    obs = env.reset()
```
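
The info dictionary returned by such an episode-level `step()` can then be inspected, for example as sketched below. Which of the optional keys from the table above are present depends on the respective environment configuration.

```python
import fancy_gym

env = fancy_gym.make('Reacher5dProMP-v0', seed=1)
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())

# Always available: total number of underlying environment interactions
print(info['trajectory_length'])

# Optional keys are only present if the environment is configured to return them
if 'step_rewards' in info:
    print(info['step_rewards'])
```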

To show all available environments, we provide some additional convenience variables. All of them return a dictionary
with two keys `DMP` and `ProMP` that store a list of available environment ids.

```python
import fancy_gym

print("Fancy Black-box tasks:")
print(fancy_gym.ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS)

print("OpenAI Gym Black-box tasks:")
print(fancy_gym.ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS)

print("Deepmind Control Black-box tasks:")
print(fancy_gym.ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS)

print("MetaWorld Black-box tasks:")
print(fancy_gym.ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS)
```
### How to create a new MP task

In case a required task is not yet supported in the MP framework, it can be created relatively easily. For the task at
hand, the following [interface](fancy_gym/black_box/raw_interface_wrapper.py) needs to be implemented.

```python
from abc import abstractmethod
from typing import Union, Tuple

import gym
import numpy as np


class RawInterfaceWrapper(gym.Wrapper):

    @property
    def context_mask(self) -> np.ndarray:
        """
        Returns a boolean mask of the same shape as the observation space.
        It determines whether the observation is returned for the contextual case or not.
        This effectively allows filtering unwanted or unnecessary observations from the full step-based case.
        E.g. velocities starting at 0 only change after the first action. Given we only receive the
        context/part of the first observation, the velocities are not necessary in the observation for the task.
        Returns:
            bool array representing the indices of the observations
        """
        return np.ones(self.env.observation_space.shape[0], dtype=bool)

    @property
    @abstractmethod
    def current_pos(self) -> Union[float, int, np.ndarray, Tuple]:
        """
        Returns the current position of the action/control dimension.
        The dimensionality has to match the action/control dimension.
        This is not required when exclusively using velocity control;
        it should, however, be implemented regardless.
        E.g. the joint positions that are directly or indirectly controlled by the action.
        """
        raise NotImplementedError()

    @property
    @abstractmethod
    def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
        """
        Returns the current velocity of the action/control dimension.
        The dimensionality has to match the action/control dimension.
        This is not required when exclusively using position control;
        it should, however, be implemented regardless.
        E.g. the joint velocities that are directly or indirectly controlled by the action.
        """
        raise NotImplementedError()
```
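
As an illustration, a wrapper for a hypothetical MuJoCo-based task whose observation consists of joint positions, joint velocities, and a goal position could look roughly as follows. The observation layout and the `data.qpos`/`data.qvel` attributes are assumptions about the underlying environment, not part of `fancy_gym` itself.

```python
from typing import Tuple, Union

import numpy as np

from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper


class MyTaskMPWrapper(RawInterfaceWrapper):
    # Hypothetical observation layout:
    # [5 joint positions, 5 joint velocities, 2D goal position]

    @property
    def context_mask(self) -> np.ndarray:
        # Only expose the goal as context; positions and velocities start
        # from a fixed initial state and carry no task information.
        return np.hstack([
            np.zeros(5, dtype=bool),  # joint positions
            np.zeros(5, dtype=bool),  # joint velocities
            np.ones(2, dtype=bool),   # goal position
        ])

    @property
    def current_pos(self) -> Union[float, int, np.ndarray, Tuple]:
        # Joint positions controlled by the action (assumed attribute of the wrapped env)
        return self.env.data.qpos[:5].copy()

    @property
    def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
        # Joint velocities of the controlled joints (assumed attribute of the wrapped env)
        return self.env.data.qvel[:5].copy()
```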

If you created a new task wrapper, feel free to open a PR, so we can integrate it for others to use as well. Even without the
integration, the task can still be used. A rough outline is shown below; for more details, we recommend having a look
at the [examples](fancy_gym/examples/).


```python
import fancy_gym

# Base environment name, according to the structure of the above example
base_env_id = "dmc:ball_in_cup-catch"

# Replace this wrapper with the custom wrapper for your environment by inheriting from the RawInterfaceWrapper.
# You can also add other gym.Wrappers in case they are needed,
# e.g. gym.wrappers.FlattenObservation for dict observations
wrappers = [fancy_gym.dmc.suite.ball_in_cup.MPWrapper]
kwargs = {...}
env = fancy_gym.make_bb(base_env_id, wrappers=wrappers, seed=0, **kwargs)

rewards = 0
obs = env.reset()

# number of samples/full trajectories (multiple environment steps)
for i in range(5):
    ac = env.action_space.sample()
    obs, reward, done, info = env.step(ac)
    rewards += reward

    if done:
        print(base_env_id, rewards)
        rewards = 0
        obs = env.reset()
```