Better README

This commit is contained in:
Dominik Moritz Roth 2023-09-17 18:37:40 +02:00
parent da34db22c8
commit 8749fc52cb

README.md

@ -7,25 +7,26 @@
<br>
</h1>
| :exclamation: Fancy Gym has recently received a major refactor, which also updated many of the used dependencies to current versions. The update has brought some breaking changes. If you want to access the old version, check out the legacy branch. Find out more about what changed [here](TODO). |
| ------------------------------------------------------------ |
Built upon the foundation of [Gymnasium](https://gymnasium.farama.org/) (a maintained fork of OpenAI's renowned Gym library), `fancy_gym` offers a comprehensive collection of reinforcement learning environments.
**Key Features**:
- **New Challenging Environments**: We've introduced several new environments that present a higher degree of difficulty, pushing the boundaries of reinforcement learning research.
- **Advanced Movement Primitives**: `fancy_gym` supports sophisticated movement primitives, including Dynamic Movement Primitives (DMPs), Probabilistic Movement Primitives (ProMP), and Probabilistic Dynamic Movement Primitives (ProDMP).
- **Benchmark Suite Compatibility**: `fancy_gym` makes it easy to access renowned benchmark suites such as [DeepMind Control](https://deepmind.com/research/publications/2020/dm-control-Software-and-Tasks-for-Continuous-Control) and [Metaworld](https://meta-world.github.io/), and to use them with movement primitives.
- **Upgrade to Movement Primitives**: With our framework, it's straightforward to transform standard Gymnasium environments into environments that support movement primitives.
- **Contribute Your Own Environments**: If you're inspired to create custom gym environments, both step-based and with movement primitives, this [guide](https://www.gymlibrary.dev/content/environment_creation/) will assist you. We encourage and highly appreciate submissions via PRs to integrate these environments into `fancy_gym`.
## Movement Primitive Environments (Episode-Based/Black-Box Environments)
Movement primitive (MP) environments differ from traditional step-based environments. They align more with concepts from stochastic search, black-box optimization, and methods commonly found in classical robotics and control. Instead of individual steps, MP environments operate on an episode basis, executing complete trajectories. These trajectories are produced by trajectory generators like Dynamic Movement Primitives (DMP), Probabilistic Movement Primitives (ProMP) or Probabilistic Dynamic Movement Primitives (ProDMP).
Once generated, these trajectories are converted into step-by-step actions using a trajectory tracking controller. The specific controller chosen depends on the environment's requirements. Currently, we support position, velocity, and PD-Controllers tailored for position, velocity, and torque control. Additionally, we have a specialized controller designed for the MetaWorld control suite.
While the overarching objective of MP environments remains the learning of an optimal policy, the actions here represent the parametrization of motion primitives to craft the right trajectory. Our framework further enhances this by accommodating a contextual setting. At the episode's onset, we present the context space—a subset of the observation space. This demands the prediction of a new action or MP parametrization for every unique context.
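For a first impression of what this means in code, here is a minimal sketch contrasting a step-based task with its ProMP variant (the environment ids are taken from the usage examples further below):

```python
import gymnasium as gym
import fancy_gym

# Step-based version: one action per control step.
step_env = gym.make('fancy/Reacher5d-v0')

# ProMP version of the same task: one action is a full ProMP parametrization,
# and a single step() call rolls out the entire trajectory.
mp_env = gym.make('fancy_ProMP/Reacher5d-v0')

print(step_env.action_space)  # per-step action space
print(mp_env.action_space)    # MP parameter space
```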
## Installation
@ -47,47 +48,43 @@ cd fancy_gym
pip install -e .
```
We have a few optional dependencies; check them out in `setup.py`. You can install specific extras, e.g. support for dm_control and Metaworld, via

```bash
pip install -e '.[dmc,metaworld]'
```

or just install all of them via

```bash
pip install -e '.[all]'
```
> **Note:**
> While our library already fully supports the new mujoco bindings, metaworld still relies on
> [mujoco_py](https://github.com/openai/mujoco-py), hence make sure to have mujoco 2.1 installed beforehand.
## How to use Fancy Gym
We will only show the basics here; we have prepared [multiple examples](fancy_gym/examples/) for a more detailed look.
### Step-Based Environments
Regular step-based environments provided by Fancy Gym are registered under the ```fancy/``` namespace.
| :exclamation: Legacy versions of Fancy Gym used ```fancy_gym.make(...)```. This is no longer supported and will raise an Exception on new versions. |
| ------------------------------------------------------------ |
```python
import gymnasium as gym
import fancy_gym

env = gym.make('fancy/Reacher5d-v0')
observation, info = env.reset(seed=1)
for i in range(1000):
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
    if i % 5 == 0:
        env.render()

    if terminated or truncated:
        observation, info = env.reset()
```
When using `dm_control` tasks we expect the `env_id` to be specified as `dm_control/domain_name-task_name-v0` or, for manipulation tasks, as `dm_control/manipulation-environment_name`. For `metaworld` tasks, we require the structure `metaworld/env_id-v2`. Our own tasks live in the `fancy/` namespace, while standard Gymnasium environments can be created without a prefix.
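As a quick, illustrative sketch (assuming the dm_control and Metaworld extras are installed; the ids are inferred from the namespace scheme above and the MP-variant ids shown later in this README):

```python
import gymnasium as gym
import fancy_gym

env_dmc = gym.make('dm_control/ball_in_cup-catch-v0')
env_metaworld = gym.make('metaworld/reach-v2')
env_fancy = gym.make('fancy/Reacher5d-v0')
```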
### Black-box Environments
By default, all environments provide the cumulative episode reward; this can be changed if necessary. Optionally, each environment returns all collected information from each step as part of the infos. This information is, however, mainly meant for debugging and logging, not for training.

| Key | Description | Type |
|---|---|---|
@ -99,7 +96,8 @@ mainly meant for debugging as well as logging and not for training.
| `trajectory_length` | Total number of environment interactions | Always |
| `other` | All other information from the underlying environment is returned as a list of length `trajectory_length`, maintaining the original key. In case some information is not provided at every time step, the missing values are filled with `None`. | Always |
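As a small, hedged illustration of how this aggregated info can be inspected (the exact set of keys beyond `trajectory_length` depends on the underlying environment):

```python
import gymnasium as gym
import fancy_gym

env = gym.make('fancy_ProMP/Reacher5d-v0')
env.reset(seed=1)
_, reward, terminated, truncated, info = env.step(env.action_space.sample())

print(info['trajectory_length'])  # number of environment interactions in this rollout
print(sorted(info.keys()))        # remaining keys are environment-dependent
```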
Existing MP tasks can be created the same way as above. The namespace of a MP-variant of an environment is given by ```<original namespace>_<MP name>/```.
Just keep in mind that calling `step()` executes a full trajectory.
> **Note:**
> Currently, we are also in the process of enabling replanning as well as learning of sub-trajectories.
@ -111,20 +109,23 @@ Existing MP tasks can be created the same way as above. Just keep in mind, calli
```python
import gymnasium as gym
import fancy_gym

env = gym.make('fancy_ProMP/Reacher5d-v0')
# or env = gym.make('metaworld_ProDMP/reach-v2')
# or env = gym.make('dm_control_DMP/ball_in_cup-catch-v0')
# render() can be called once in the beginning with all necessary arguments.
# To turn it off again, just call render() without any arguments.
env.render(mode='human')
# This returns the context information, not the full state observation
observation, info = env.reset(seed=1)
for i in range(5):
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)

    # Each step() call executes a full trajectory, so the episode is always over and we reset()
    observation, info = env.reset()
```
To show all available environments, we provide some additional convenience variables. All of them return a dictionary
@ -133,6 +134,9 @@ with two keys `DMP` and `ProMP` that store a list of available environment ids.
```python
import fancy_gym
print("All Black-box tasks:")
print(fancy_gym.ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS)
print("Fancy Black-box tasks:")
print(fancy_gym.ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS)
@ -155,11 +159,16 @@ hand, the following [interface](fancy_gym/black_box/raw_interface_wrapper.py) ne
from abc import abstractmethod
from typing import Union, Tuple
import gymnasium as gym
import numpy as np
class RawInterfaceWrapper(gym.Wrapper):
    mp_config = {  # Default configurations for MPs can be overridden by defining them here.
        'ProMP': {},
        'DMP': {},
        'ProDMP': {},
    }

    @property
    def context_mask(self) -> np.ndarray:
@ -205,32 +214,43 @@ If you created a new task wrapper, feel free to open a PR, so we can integrate i
Without the integration, the task can still be used. A rough outline is shown here; for more details we recommend having a look at the [examples](fancy_gym/examples/).
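As a hedged, minimal sketch of such a wrapper (the class name `my_custom_MPWrapper` and the mask values are placeholders; the interface it implements is the `RawInterfaceWrapper` shown above):

```python
import numpy as np
from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper


class my_custom_MPWrapper(RawInterfaceWrapper):  # placeholder name, used in the snippets below
    mp_config = {  # optionally override default MP configurations for this task
        'ProMP': {},
        'DMP': {},
        'ProDMP': {},
    }

    @property
    def context_mask(self) -> np.ndarray:
        # Boolean mask selecting the observation entries that form the context
        # exposed at the start of an episode (here: simply all of them).
        return np.ones(self.env.observation_space.shape[0], dtype=bool)

    # Depending on the chosen trajectory tracking controller, further properties
    # (e.g. the current joint position/velocity) may need to be provided; see the
    # interface file linked above and the examples.
```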
If the step-based environment is already registered with gym, you can simply do the following:
```python
import fancy_gym

fancy_gym.upgrade(
    id='custom/cool_new_env-v0',
    mp_wrapper=my_custom_MPWrapper
)
```
If the step-based environment is not yet registered with gym, we can add both the step-based and MP versions via
```python
import fancy_gym

fancy_gym.register(
    id='custom/cool_new_env-v0',
    entry_point=my_custom_env,
    mp_wrapper=my_custom_MPWrapper
)
```
From this point on, you can access the MP versions of your environment via
```python
import gymnasium as gym

env = gym.make('custom_ProDMP/cool_new_env-v0')

rewards = 0
observation, info = env.reset()

# number of samples/full trajectories (multiple environment steps)
for i in range(5):
    ac = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(ac)
    rewards += reward

    if terminated or truncated:
        print(env.spec.id, rewards)
        rewards = 0
        observation, info = env.reset()
```
## Icon Attribution