## ALR Robotics Control Environments
This project offers a large variety of reinforcement learning environments under the unifying interface
of [OpenAI gym](https://gym.openai.com/). We provide support (under the OpenAI interface) for the benchmark suites
[DeepMind Control](https://deepmind.com/research/publications/2020/dm-control-Software-and-Tasks-for-Continuous-Control)
(DMC) and [Metaworld](https://meta-world.github.io/). Custom (Mujoco) gym environments can be created according
to [this guide](https://www.gymlibrary.ml/content/environment_creation/).
Unlike existing libraries, we additionally support controlling agents with movement primitives, such as
Dynamic Movement Primitives (DMPs) and Probabilistic Movement Primitives (ProMPs; we usually only consider the mean).

## Movement Primitive Environments (Episode-Based/Black-Box Environments)
Unlike step-based environments, movement primitive (MP) environments are more closely related to stochastic search,
black-box optimization, and methods that are often used in traditional robotics and control.
MP environments are episode-based and always execute a full trajectory, which is generated by a trajectory generator,
such as a Dynamic Movement Primitive (DMP) or a Probabilistic Movement Primitive (ProMP).
The generated trajectory is translated into individual step-wise actions by a trajectory tracking controller.
The exact choice of controller depends on the type of environment.
We currently support position, velocity, and PD-controllers for position, velocity, and torque control, respectively,
as well as a special controller for the MetaWorld control suite.
The goal of all MP environments is still to learn an optimal policy. Yet, an action
represents the parametrization of the motion primitives that generates a suitable trajectory. Additionally, this
framework also supports the contextual setting, i.e. we expose a subset of the observation space
as a single context at the beginning of the episode. This requires predicting a new action/MP parametrization for each
context.

In addition to the cumulative episode reward, all environments provide the information collected at each
step as part of the info dictionary. This information is, however, mainly meant for debugging and logging,
not for training.

| Key | Description |
|---|---|
| `trajectory` | Generated trajectory from the MP |
| `step_actions` | Step-wise executed actions based on the controller output |
| `step_observations` | Step-wise intermediate observations |
| `step_rewards` | Step-wise rewards |
| `trajectory_length` | Total number of environment interactions |
| `other` | All other information from the underlying environment is returned as a list of length `trajectory_length`, maintaining the original key. In case some information is not provided at every time step, the missing values are filled with `None`. |
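For example, after a single step of an MP environment (i.e. one full trajectory rollout), these entries can be
inspected directly. The following is a minimal sketch, assuming the `HoleReacherProMP-v0` task introduced further below:

```python
import alr_envs

env = alr_envs.make('HoleReacherProMP-v0', seed=1)
obs = env.reset()

# A single step executes the complete trajectory generated from the sampled MP parameters.
obs, reward, done, info = env.step(env.action_space.sample())

# Cumulative episode reward plus the step-wise details from the table above.
print(reward)
print(info['trajectory_length'])
print(len(info['step_rewards']))
```
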
## Installation
1. Clone the repository
```bash
git clone git@github.com:ALRhub/alr_envs.git
```
2. Go to the folder
```bash
cd alr_envs
```
3. Install with
```bash
pip install -e .
```
## Using the framework
We prepared [multiple examples](alr_envs/examples/); please have a look there for more detailed use cases.
### Step-wise environments
```python
import alr_envs

env = alr_envs.make('HoleReacher-v0', seed=1)
state = env.reset()

for i in range(1000):
    state, reward, done, info = env.step(env.action_space.sample())
    if i % 5 == 0:
        env.render()

    if done:
        state = env.reset()
```
For DeepMind Control tasks, we expect the `env_id` to be specified as `domain_name-task_name`, or for manipulation tasks
as `manipulation-environment_name`. All other environments can be created based on their original name.
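For instance, environments from the different suites could be created as follows (a minimal sketch; the concrete task
names are only illustrative examples of the naming scheme above):

```python
import alr_envs

# DeepMind Control suite task: <domain_name>-<task_name>
dmc_env = alr_envs.make('ball_in_cup-catch', seed=1)

# DeepMind manipulation task: manipulation-<environment_name>
manipulation_env = alr_envs.make('manipulation-reach_site_features', seed=1)

# All other environments (custom, OpenAI gym, Metaworld) keep their original name.
custom_env = alr_envs.make('HoleReacher-v0', seed=1)
```
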
Existing MP tasks can be created the same way as above. Just keep in mind that calling `step()` always executes a full
trajectory.
```python
import alr_envs
env = alr_envs.make('HoleReacherProMP-v0', seed=1)

# render() can be called once in the beginning with all necessary arguments.
# To turn it off again, just call render(None).
env.render()

state = env.reset()

for i in range(5):
    state, reward, done, info = env.step(env.action_space.sample())

    # Not strictly necessary, as the environment resets itself after each trajectory anyway.
    state = env.reset()
```
To show all available environments, we provide some additional convenience collections. Each of them is a dictionary
with the two keys `DMP` and `ProMP`, which store a list of the available environment names.
```python
import alr_envs
print("Custom MP tasks:")
print(alr_envs.ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS)
print("OpenAI Gym MP tasks:")
print(alr_envs.ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS)
print("Deepmind Control MP tasks:")
print(alr_envs.ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS)
print("MetaWorld MP tasks:")
print(alr_envs.ALL_METAWORLD_MOTION_PRIMITIVE_ENVIRONMENTS)
```
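These collections can also be used programmatically, e.g. to instantiate one of the listed tasks. A short sketch based
on the dictionary structure described above:

```python
import alr_envs

# Each collection maps 'DMP' and 'ProMP' to a list of registered environment names.
promp_ids = alr_envs.ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS['ProMP']
print(f"{len(promp_ids)} custom ProMP tasks available")

env = alr_envs.make(promp_ids[0], seed=1)
```
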
### How to create a new MP task
In case a required task is not yet supported in the MP framework, it can be added relatively easily. For the task at
hand, the following interface needs to be implemented.
```python
import numpy as np
from mp_env_api import MPEnvWrapper


class MPWrapper(MPEnvWrapper):

    @property
    def active_obs(self):
        """
        Returns a boolean mask for each substate in the full observation.
        It determines whether the observation is returned for the contextual case or not.
        This effectively allows filtering unwanted or unnecessary observations from the full step-based case.
        E.g. velocities starting at 0 only change after the first action. Given that we only receive the first
        observation, the velocities are not necessary in the observation for the MP task.
        """
        return np.ones(self.observation_space.shape, dtype=bool)

    @property
    def current_vel(self):
        """
        Returns the current velocity of the action/control dimension.
        The dimensionality has to match the action/control dimension.
        This is not required when exclusively using position control;
        it should, however, be implemented regardless.
        E.g. the joint velocities that are directly or indirectly controlled by the action.
        """
        raise NotImplementedError()

    @property
    def current_pos(self):
        """
        Returns the current position of the action/control dimension.
        The dimensionality has to match the action/control dimension.
        This is not required when exclusively using velocity control;
        it should, however, be implemented regardless.
        E.g. the joint positions that are directly or indirectly controlled by the action.
        """
        raise NotImplementedError()

    @property
    def goal_pos(self):
        """
        Returns a predefined final position of the action/control dimension.
        This is only required for the DMP and is most of the time learned instead.
        """
        raise NotImplementedError()

    @property
    def dt(self):
        """
        Returns the time between two simulated steps of the environment.
        """
        raise NotImplementedError()
```
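As a rough illustration, a wrapper for a simple joint-space task could look like the sketch below. Note that this is
only a hedged example: the attributes accessed on the underlying environment (`self.env.sim.data`, `self.env.n_links`,
`self.env.dt`) are hypothetical and need to be replaced by whatever your environment actually exposes.

```python
import numpy as np
from mp_env_api import MPEnvWrapper


class MyReacherMPWrapper(MPEnvWrapper):

    @property
    def active_obs(self):
        # Hypothetical layout: expose only the goal position as context and
        # hide the joint velocities (they start at 0 and carry no information
        # in the first observation).
        mask = np.zeros(self.observation_space.shape, dtype=bool)
        mask[-2:] = True  # e.g. the last two entries encode the goal
        return mask

    @property
    def current_pos(self):
        # Joint positions that are controlled by the action (hypothetical attributes).
        return self.env.sim.data.qpos[:self.env.n_links].copy()

    @property
    def current_vel(self):
        # Joint velocities of the controlled joints (hypothetical attributes).
        return self.env.sim.data.qvel[:self.env.n_links].copy()

    @property
    def goal_pos(self):
        # Only required for DMPs and usually learned instead, hence not provided here.
        raise NotImplementedError()

    @property
    def dt(self):
        # Time between two simulated steps (hypothetical attribute).
        return self.env.dt
```
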
If you created a new task wrapper, feel free to open a PR so we can integrate it for others to use as well.
Without the integration, the task can still be used. A rough outline is shown below; for more details, we recommend
having a look at the [examples](alr_envs/examples/).
```python
import alr_envs

# Base environment name, according to the structure of the example above
base_env_id = "ball_in_cup-catch"

# Replace this wrapper with the custom wrapper for your environment by inheriting from MPEnvWrapper.
# You can also add other gym.Wrappers in case they are needed,
# e.g. gym.wrappers.FlattenObservation for dict observations.
wrappers = [alr_envs.dmc.suite.ball_in_cup.MPWrapper]
mp_kwargs = {...}
kwargs = {...}
env = alr_envs.make_dmp_env(base_env_id, wrappers=wrappers, seed=1, mp_kwargs=mp_kwargs, **kwargs)
# OR for a deterministic ProMP (other traj_gen_kwargs are required):
# env = alr_envs.make_promp_env(base_env_id, wrappers=wrappers, seed=seed, traj_gen_kwargs=mp_args)

rewards = 0
obs = env.reset()

# number of samples/full trajectories (multiple environment steps)
for i in range(5):
    ac = env.action_space.sample()
    obs, reward, done, info = env.step(ac)
    rewards += reward

    if done:
        print(base_env_id, rewards)
        rewards = 0
        obs = env.reset()
```