## ALR Robotics Control Environments

This project offers a large variety of reinforcement learning environments under a unifying interface based on OpenAI gym.
Besides some custom environments, we also provide support for the benchmark suites
[OpenAI gym](https://gym.openai.com/),
[DeepMind Control](https://deepmind.com/research/publications/2020/dm-control-Software-and-Tasks-for-Continuous-Control)
(DMC), and [Metaworld](https://meta-world.github.io/). Custom (Mujoco) gym environments can be created according
to [this guide](https://github.com/openai/gym/blob/master/docs/creating-environments.md). Unlike existing libraries, we
additionally support controlling agents with Dynamic Movement Primitives (DMPs) and Probabilistic Movement Primitives
(DetPMP, we usually only consider the mean).

## Motion Primitive Environments (Episodic environments)

Unlike step-based environments, motion primitive (MP) environments are more closely related to stochastic search,
black-box optimization, and methods that are often used in robotics. MP environments are trajectory-based and always
execute a full trajectory, which is generated by a Dynamic Motion Primitive (DMP) or a Probabilistic Motion Primitive
(DetPMP). The generated trajectory is translated into individual step-wise actions by a controller. The exact choice of
controller, however, depends on the type of environment. We currently support position, velocity, and PD-Controllers
for position, velocity, and torque control, respectively. The goal of all MP environments is still to learn a policy.
Yet, an action represents the parametrization of the motion primitives used to generate a suitable trajectory.
Additionally, this framework supports the contextual setting, for which all changing substates of the task are exposed
as a single observation at the beginning of each trajectory. This requires predicting a new action/MP parametrization
for each trajectory. Besides the cumulative episode reward, all environments also return the information collected at
each step as part of the info dictionary. This information should, however, mainly be used for debugging and logging.

|Key| Description|
|---|---|
|`trajectory`| Generated trajectory from MP|
|`step_actions`| Step-wise executed actions based on controller output|
|`step_observations`| Step-wise intermediate observations|
|`step_rewards`| Step-wise rewards|
|`trajectory_length`| Total number of environment interactions|
|`other`| All other information from the underlying environment is returned as a list with length `trajectory_length`, maintaining the original key. In case some information is not provided at every time step, the missing values are filled with `None`.|
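
As a small illustration (a sketch only; `HoleReacherDetPMP-v0` is one of the MP tasks used in the examples below), the
info dictionary can be inspected after a single `step()` call, i.e. after one full trajectory:

```python
import alr_envs

# Sketch: any MP task can be used here; 'HoleReacherDetPMP-v0' is taken from the examples below.
env = alr_envs.make('HoleReacherDetPMP-v0', seed=1)
env.reset()

# One step() executes a full trajectory and aggregates the keys from the table above.
state, episode_reward, done, info = env.step(env.action_space.sample())

print(info['trajectory_length'])  # total number of environment interactions
print(len(info['step_rewards']))  # one reward entry per interaction
```
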
## Installation

1. Clone the repository

```bash
git clone git@github.com:ALRhub/alr_envs.git
```

2. Go to the folder

```bash
cd alr_envs
```

3. Install with

```bash
pip install -e .
```

## Using the framework

We prepared [multiple examples](alr_envs/examples/); please have a look there for more specific details.

### Step-wise environments

```python
import alr_envs

env = alr_envs.make('HoleReacher-v0', seed=1)
state = env.reset()

for i in range(1000):
    state, reward, done, info = env.step(env.action_space.sample())
    if i % 5 == 0:
        env.render()

    if done:
        state = env.reset()
```

For DeepMind Control tasks we expect the `env_id` to be specified as `domain_name-task_name` or, for manipulation
tasks, as `manipulation-environment_name`. All other environments can be created based on their original name.
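
For example (a minimal sketch; `ball_in_cup-catch` is the DMC task also used further below, the manipulation id is
purely illustrative):

```python
import alr_envs

# DeepMind Control suite task: domain_name-task_name
env = alr_envs.make("ball_in_cup-catch", seed=1)

# DeepMind Control manipulation task: manipulation-environment_name (illustrative id)
# env = alr_envs.make("manipulation-reach_site_features", seed=1)
```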

Existing MP tasks can be created the same way as above. Just keep in mind that calling `step()` always executes a full
trajectory.

```python
import alr_envs

env = alr_envs.make('HoleReacherDetPMP-v0', seed=1)
# render() can be called once in the beginning with all necessary arguments.
# To turn it off again, just call render(None).
env.render()

state = env.reset()

for i in range(5):
    state, reward, done, info = env.step(env.action_space.sample())

    # Not really necessary as the environment resets itself after each trajectory anyway.
    state = env.reset()
```

To show all available environments, we provide some additional convenience variables. Each of them returns a dictionary
with the two keys `DMP` and `DetPMP` that store a list of available environment names.

```python
import alr_envs

print("Custom MP tasks:")
print(alr_envs.ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS)

print("OpenAI Gym MP tasks:")
print(alr_envs.ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS)

print("Deepmind Control MP tasks:")
print(alr_envs.ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS)

print("MetaWorld MP tasks:")
print(alr_envs.ALL_METAWORLD_MOTION_PRIMITIVE_ENVIRONMENTS)
```

### How to create a new MP task

In case a required task is not yet supported in the MP framework, it can be added with relatively little effort. For
the task at hand, the following interface needs to be implemented.

```python
import numpy as np
from mp_env_api import MPEnvWrapper


class MPWrapper(MPEnvWrapper):

    @property
    def active_obs(self):
        """
        Returns a boolean mask for each substate in the full observation.
        It determines whether the observation is returned for the contextual case or not.
        This effectively allows filtering unwanted or unnecessary observations from the full step-based case.
        E.g. velocities starting at 0 only change after the first action. Given we only receive the first
        observation, the velocities are not necessary in the observation for the MP task.
        """
        return np.ones(self.observation_space.shape, dtype=bool)

    @property
    def current_vel(self):
        """
        Returns the current velocity of the action/control dimension.
        The dimensionality has to match the action/control dimension.
        This is not required when exclusively using position control;
        it should, however, be implemented regardless.
        E.g. the joint velocities that are directly or indirectly controlled by the action.
        """
        raise NotImplementedError()

    @property
    def current_pos(self):
        """
        Returns the current position of the action/control dimension.
        The dimensionality has to match the action/control dimension.
        This is not required when exclusively using velocity control;
        it should, however, be implemented regardless.
        E.g. the joint positions that are directly or indirectly controlled by the action.
        """
        raise NotImplementedError()

    @property
    def goal_pos(self):
        """
        Returns a predefined final position of the action/control dimension.
        This is only required for the DMP and is most of the time learned instead.
        """
        raise NotImplementedError()

    @property
    def dt(self):
        """
        Returns the time between two simulated steps of the environment.
        """
        raise NotImplementedError()
```
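
As an illustration, here is a hedged sketch of how such a wrapper could look for a hypothetical MuJoCo-based task; the
`self.env.sim.data` and `self.env.dt` attribute accesses are assumptions about the underlying environment, not part of
this package's interface:

```python
import numpy as np
from mp_env_api import MPEnvWrapper


class MyTaskMPWrapper(MPEnvWrapper):
    """Hypothetical wrapper for a MuJoCo-based environment (attribute names are assumed)."""

    @property
    def active_obs(self):
        # Example: expose only the first three substates (e.g. a goal position) as context.
        mask = np.zeros(self.observation_space.shape, dtype=bool)
        mask[:3] = True
        return mask

    @property
    def current_pos(self):
        return self.env.sim.data.qpos.copy()  # assumed MuJoCo state layout

    @property
    def current_vel(self):
        return self.env.sim.data.qvel.copy()  # assumed MuJoCo state layout

    @property
    def goal_pos(self):
        # Usually learned by the DMP instead of being predefined.
        raise NotImplementedError()

    @property
    def dt(self):
        return self.env.dt  # assumed to be exposed by the underlying environment
```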

If you created a new task wrapper, feel free to open a PR so we can integrate it for others to use as well.
Without the integration the task can still be used. A rough outline is shown below; for more details we recommend
having a look at the [examples](alr_envs/examples/).

```python
import alr_envs

# Base environment name, according to the structure of the example above
base_env_id = "ball_in_cup-catch"

# Replace this wrapper with the custom wrapper for your environment by inheriting from MPEnvWrapper.
# You can also add other gym.Wrappers in case they are needed,
# e.g. gym.wrappers.FlattenObservation for dict observations.
wrappers = [alr_envs.dmc.suite.ball_in_cup.MPWrapper]
mp_kwargs = {...}
kwargs = {...}
env = alr_envs.make_dmp_env(base_env_id, wrappers=wrappers, seed=1, mp_kwargs=mp_kwargs, **kwargs)
# OR for a deterministic ProMP (other mp_kwargs are required):
# env = alr_envs.make_detpmp_env(base_env_id, wrappers=wrappers, seed=1, mp_kwargs=mp_kwargs)

rewards = 0
obs = env.reset()

# number of samples/full trajectories (multiple environment steps)
for i in range(5):
    ac = env.action_space.sample()
    obs, reward, done, info = env.step(ac)
    rewards += reward

    if done:
        print(base_env_id, rewards)
        rewards = 0
        obs = env.reset()
```