fancy_gym

Mirror for https://github.com/ALRhub/fancy_gym

ALR Robotics Control Environments

This project offers a large variety of reinforcement learning environments under a unifying interface based on OpenAI gym. Besides some custom environments, we also provide support for the benchmark suites OpenAI gym, DeepMind Control (DMC), and Metaworld. Custom (Mujoco) gym environments can be created according to this guide. Unlike existing libraries, we further support controlling agents with Dynamic Movement Primitives (DMPs) and Probabilistic Movement Primitives (DetPMP; we usually only consider the mean).

Motion Primitive Environments (Episodic environments)

Unlike step-based environments, motion primitive (MP) environments are more closely related to stochastic search, black-box optimization, and methods that are often used in robotics. MP environments are trajectory-based and always execute a full trajectory, which is generated by a Dynamic Movement Primitive (DMP) or a Probabilistic Movement Primitive (DetPMP). The generated trajectory is translated into individual step-wise actions by a controller. The exact choice of controller, however, depends on the type of environment. We currently support position, velocity, and PD controllers for position, velocity, and torque control, respectively. The goal of all MP environments is still to learn a policy. Yet, an action represents the parametrization of the motion primitive used to generate a suitable trajectory. Additionally, this framework supports the contextual setting, for which we expose all changing substates of the task as a single observation at the beginning of the episode. This requires predicting a new action/MP parametrization for each trajectory. In addition to the cumulative episode reward, all environments also provide all information collected at each step as part of the info dictionary. This information should, however, mainly be used for debugging and logging.

Key	Description
`trajectory`	Generated trajectory from MP
`step_actions`	Step-wise executed action based on controller output
`step_observations`	Step-wise intermediate observations
`step_rewards`	Step-wise rewards
`trajectory_length`	Total number of environment interactions
`other`	All other information from the underlying environment is returned as a list of length `trajectory_length`, maintaining the original key. If some information is not provided at every time step, the missing values are filled with `None`.
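The following is a minimal, self-contained sketch of what a single episodic `step()` does conceptually. It does not use `alr_envs`; the functions `generate_trajectory` and `episodic_step`, the unit point mass, the gains, and the extra `near_goal` key are all assumptions made for illustration. A toy primitive turns a parameter vector into a trajectory, a PD controller converts it into step-wise force commands, and the collected data is returned under the keys listed above.

```python
import math


def generate_trajectory(weights, n_steps, width=0.05):
    """Toy 1-D primitive: normalized Gaussian basis functions weighted by `weights`."""
    centers = [(i + 0.5) / len(weights) for i in range(len(weights))]
    trajectory = []
    for t in range(n_steps):
        phase = t / (n_steps - 1)  # runs from 0 to 1 over the trajectory
        activations = [math.exp(-((phase - c) ** 2) / (2 * width)) for c in centers]
        weighted = sum(w * a for w, a in zip(weights, activations))
        trajectory.append(weighted / sum(activations))
    return trajectory


def episodic_step(weights, n_steps=50, dt=0.02, p_gain=20.0, d_gain=2.0, goal=1.0):
    """Execute one full trajectory on a unit point mass and collect step-wise data."""
    trajectory = generate_trajectory(weights, n_steps)
    pos, vel = 0.0, 0.0
    info = {
        "trajectory": trajectory,
        "step_actions": [],
        "step_observations": [],
        "step_rewards": [],
        "near_goal": [],  # hypothetical extra key, not reported at every step
    }
    for t, desired_pos in enumerate(trajectory):
        # Finite-difference velocity of the desired trajectory (zero at the end).
        desired_vel = (trajectory[t + 1] - desired_pos) / dt if t + 1 < n_steps else 0.0
        # PD controller: turns the desired state into a force command.
        action = p_gain * (desired_pos - pos) + d_gain * (desired_vel - vel)
        vel += action * dt
        pos += vel * dt
        info["step_actions"].append(action)
        info["step_observations"].append((pos, vel))
        info["step_rewards"].append(-abs(goal - pos))
        # Missing values are filled with None to keep the list length constant.
        info["near_goal"].append(True if abs(goal - pos) < 0.05 else None)
    info["trajectory_length"] = n_steps
    return (pos, vel), sum(info["step_rewards"]), True, info


obs, episode_reward, done, info = episodic_step(weights=[0.0, 0.5, 1.0])
print(info["trajectory_length"], len(info["step_actions"]), episode_reward)
```

Note that the "action" passed to `episodic_step` is the weight vector, not a single control command; the step-wise commands only appear in `info["step_actions"]`.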

Installation

Clone the repository

git clone git@github.com:ALRhub/alr_envs.git

Go to the folder

cd alr_envs

Install with

pip install -e .
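To check that the editable install is visible to the current interpreter, a small sketch like the following can help. It only relies on the standard library; the helper name `is_installed` is an assumption for illustration and not part of the package.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be found by the current interpreter."""
    return importlib.util.find_spec(package_name) is not None


print("alr_envs importable:", is_installed("alr_envs"))
```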

Using the framework

We prepared multiple examples; please have a look at them for more specific use cases.

Step-wise environments

import alr_envs

env = alr_envs.make('HoleReacher-v0', seed=1)
state = env.reset()

for i in range(1000):
    state, reward, done, info = env.step(env.action_space.sample())
    if i % 5 == 0:
        env.render()

    if done:
        state = env.reset()

For DeepMind Control tasks, we expect the `env_id` to be specified as `domain_name-task_name` or, for manipulation tasks, as `manipulation-environment_name`. All other environments can be created based on their original name.
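The naming convention for DeepMind Control ids can be illustrated with a small sketch. The helper `split_dmc_env_id` is hypothetical and not part of the library; it assumes the id is already known to refer to a DMC task and that the first hyphen separates the two parts. The manipulation environment name below is only an example value.

```python
def split_dmc_env_id(env_id):
    """Split a DMC-style id at the first hyphen into its two components."""
    prefix, separator, name = env_id.partition("-")
    if not separator or not prefix or not name:
        raise ValueError(f"Expected '<prefix>-<name>', got {env_id!r}")
    if prefix == "manipulation":
        return {"kind": "manipulation", "environment_name": name}
    return {"kind": "suite", "domain_name": prefix, "task_name": name}


print(split_dmc_env_id("ball_in_cup-catch"))
print(split_dmc_env_id("manipulation-reach_site_features"))
```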

Existing MP tasks can be created the same way as step-based environments. Just keep in mind that calling `step()` always executes a full trajectory.

import alr_envs

env = alr_envs.make('HoleReacherDetPMP-v0', seed=1)
# render() can be called once in the beginning with all necessary arguments. To turn it off again, just call render(None).
env.render()

state = env.reset()

for i in range(5):
    state, reward, done, info = env.step(env.action_space.sample())

    # Not really necessary, as the environment resets itself after each trajectory anyway.
    state = env.reset()

To list all available environments, we provide some convenience variables. Each of them holds a dictionary with the two keys DMP and DetPMP, each of which stores a list of available environment names.

import alr_envs

print("Custom MP tasks:")
print(alr_envs.ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS)

print("OpenAI Gym MP tasks:")
print(alr_envs.ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS)

print("Deepmind Control MP tasks:")
print(alr_envs.ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS)

print("MetaWorld MP tasks:")
print(alr_envs.ALL_METAWORLD_MOTION_PRIMITIVE_ENVIRONMENTS)
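Since each of these variables is described as a dictionary with the keys `DMP` and `DetPMP`, it can be traversed like any other dictionary of lists. The sketch below uses a hand-written stand-in dictionary so that it runs without the package; its contents and the helper `summarize_registry` are assumptions, and only `HoleReacherDetPMP-v0` is taken from the examples above.

```python
# Stand-in with the documented shape; the DMP entry is a placeholder name.
example_registry = {
    "DMP": ["PlaceholderDMP-v0"],
    "DetPMP": ["HoleReacherDetPMP-v0"],
}


def summarize_registry(registry):
    """Return one line per MP type with the number of environments and their names."""
    lines = []
    for mp_type in ("DMP", "DetPMP"):
        names = registry.get(mp_type, [])
        lines.append(f"{mp_type} ({len(names)}): {', '.join(sorted(names))}")
    return lines


for line in summarize_registry(example_registry):
    print(line)
```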

How to create a new MP task

If a required task is not yet supported in the MP framework, it can be created relatively easily. For the task at hand, the following interface needs to be implemented.

import numpy as np
from mp_env_api import MPEnvWrapper


class MPWrapper(MPEnvWrapper):

    @property
    def active_obs(self):
        """
            Returns boolean mask for each substate in the full observation.
            It determines whether the observation is returned for the contextual case or not.
            This effectively allows filtering unwanted or unnecessary observations from the full step-based case.
            E.g., velocities starting at 0 only change after the first action. Given that we only receive the first
            observation, the velocities are not necessary in the observation for the MP task.
        """
        return np.ones(self.observation_space.shape, dtype=bool)

    @property
    def current_vel(self):
        """
            Returns the current velocity of the action/control dimension. 
            The dimensionality has to match the action/control dimension.
            This is not required when exclusively using position control, 
            it should, however, be implemented regardless.
            E.g. The joint velocities that are directly or indirectly controlled by the action.
        """
        raise NotImplementedError()

    @property
    def current_pos(self):
        """
            Returns the current position of the action/control dimension. 
            The dimensionality has to match the action/control dimension.
            This is not required when exclusively using velocity control, 
            it should, however, be implemented regardless.
            E.g. The joint positions that are directly or indirectly controlled by the action.
        """
        raise NotImplementedError()

    @property
    def goal_pos(self):
        """
            Returns a predefined final position of the action/control dimension.
            This is only required for the DMP and is most of the time learned instead.
        """
        raise NotImplementedError()

    @property
    def dt(self):
        """
            Returns the time between two simulated steps of the environment
        """
        raise NotImplementedError()
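To make the role of `active_obs` more concrete, here is a minimal sketch for a hypothetical two-joint reacher. The class `ToyReacherWrapper` is a stand-in that only mirrors the property names of the interface above; it does not inherit from `MPEnvWrapper` so that it runs without `mp_env_api`, it uses plain lists where a real wrapper would return NumPy arrays, and it leaves out `goal_pos`, which is only required for the DMP.

```python
class ToyReacherWrapper:
    """Stand-in mirroring the wrapper property names for a toy 2-joint reacher."""

    def __init__(self, joint_pos, joint_vel, target, dt=0.02):
        self.joint_pos = list(joint_pos)
        self.joint_vel = list(joint_vel)
        self.target = list(target)
        self._dt = dt

    @property
    def full_observation(self):
        # Step-based observation layout: positions, velocities, target.
        return self.joint_pos + self.joint_vel + self.target

    @property
    def active_obs(self):
        # Keep positions and target; drop velocities, which start at zero.
        return (
            [True] * len(self.joint_pos)
            + [False] * len(self.joint_vel)
            + [True] * len(self.target)
        )

    @property
    def current_pos(self):
        return self.joint_pos

    @property
    def current_vel(self):
        return self.joint_vel

    @property
    def dt(self):
        return self._dt

    def contextual_observation(self):
        """Apply the boolean mask to the full observation."""
        return [v for v, keep in zip(self.full_observation, self.active_obs) if keep]


wrapper = ToyReacherWrapper(joint_pos=[0.1, -0.2], joint_vel=[0.0, 0.0], target=[0.5, 0.5])
print(wrapper.contextual_observation())
```

The mask has one entry per element of the full observation, while `current_pos` and `current_vel` have one entry per controlled dimension.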

If you created a new task wrapper, feel free to open a PR so we can integrate it for others to use as well. The task can still be used without this integration. A rough outline is shown below; for more details, we recommend having a look at the examples.

import alr_envs

# Base environment name, according to structure of above example
base_env_id = "ball_in_cup-catch"

# Replace this wrapper with the custom wrapper for your environment by inheriting from the MPEnvWrapper.
# You can also add other gym.Wrappers in case they are needed, 
# e.g. gym.wrappers.FlattenObservation for dict observations
wrappers = [alr_envs.dmc.suite.ball_in_cup.MPWrapper]
mp_kwargs = {...}
kwargs = {...}
env = alr_envs.make_dmp_env(base_env_id, wrappers=wrappers, seed=1, mp_kwargs=mp_kwargs, **kwargs)
# OR for a deterministic ProMP (other mp_kwargs are required):
# env = alr_envs.make_detpmp_env(base_env_id, wrappers=wrappers, seed=1, mp_kwargs=mp_kwargs)

rewards = 0
obs = env.reset()

# number of samples/full trajectories (multiple environment steps)
for i in range(5):
    ac = env.action_space.sample()
    obs, reward, done, info = env.step(ac)
    rewards += reward

    if done:
        print(base_env_id, rewards)
        rewards = 0
        obs = env.reset()