## ALR Robotics Control Environments

This project offers a large variety of reinforcement learning environments under the unifying interface of [OpenAI gym](https://gym.openai.com/).
We provide support (under the OpenAI interface) for the benchmark suites [DeepMind Control](https://deepmind.com/research/publications/2020/dm-control-Software-and-Tasks-for-Continuous-Control) (DMC) and [Metaworld](https://meta-world.github.io/).
Custom (Mujoco) gym environments can be created according to [this guide](https://www.gymlibrary.ml/content/environment_creation/).
Unlike existing libraries, we additionally support controlling agents with movement primitives, such as Dynamic Movement Primitives (DMPs) and Probabilistic Movement Primitives (ProMPs; we usually only consider the mean).

## Movement Primitive Environments (Episode-Based/Black-Box Environments)

Unlike step-based environments, movement primitive (MP) environments are more closely related to stochastic search, black-box optimization, and methods often used in traditional robotics and control.
MP environments are episode-based and always execute a full trajectory, which is generated by a trajectory generator such as a Dynamic Movement Primitive (DMP) or a Probabilistic Movement Primitive (ProMP).
The generated trajectory is translated into individual step-wise actions by a trajectory tracking controller.
The exact choice of controller depends on the type of environment.
We currently support position, velocity, and PD-controllers for position, velocity, and torque control, respectively, as well as a special controller for the MetaWorld control suite.
The goal of all MP environments is still to learn an optimal policy. Yet, an action represents the parametrization of the motion primitives used to generate a suitable trajectory.
Additionally, this framework also supports the contextual setting, i.e. we expose a subset of the observation space as a single context at the beginning of the episode. This requires predicting a new action/MP parametrization for each context.
Besides the cumulative episode reward, all environments provide all information collected at each step as part of the info dictionary. This information is, however, mainly meant for debugging and logging, not for training.

| Key | Description |
|---|---|
| `trajectory` | Generated trajectory from the MP |
| `step_actions` | Step-wise executed actions based on the controller output |
| `step_observations` | Step-wise intermediate observations |
| `step_rewards` | Step-wise rewards |
| `trajectory_length` | Total number of environment interactions |
| `other` | All other information from the underlying environment is returned as a list of length `trajectory_length`, maintaining the original keys. If some information is not provided at every time step, the missing values are filled with `None`. |
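
For illustration, here is a minimal sketch of how these entries could be accessed after a single rollout (using the `HoleReacherProMP-v0` task from the examples below; the keys are the ones listed in the table above):

```python
import alr_envs

env = alr_envs.make('HoleReacherProMP-v0', seed=1)
env.reset()

# A single step executes the full trajectory generated by the ProMP.
_, episode_reward, done, info = env.step(env.action_space.sample())

print("cumulative reward:", episode_reward)
print("trajectory length:", info['trajectory_length'])
print("step-wise rewards:", info['step_rewards'])
```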

## Installation

1. Clone the repository

```bash
git clone git@github.com:ALRhub/alr_envs.git
```

2. Go to the folder

```bash
cd alr_envs
```

3. Install with

```bash
pip install -e .
```
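
To quickly check that the installation worked, you can, for example, import the package and instantiate one of the environments used in the examples below:

```python
import alr_envs

# Create one of the step-based environments from the examples below.
env = alr_envs.make('HoleReacher-v0', seed=1)
print(env.observation_space, env.action_space)
```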

## Using the framework

We prepared [multiple examples](alr_envs/examples/); please have a look there for more specific use cases.

### Step-wise environments

```python
import alr_envs

env = alr_envs.make('HoleReacher-v0', seed=1)
state = env.reset()

for i in range(1000):
    state, reward, done, info = env.step(env.action_space.sample())
    if i % 5 == 0:
        env.render()

    if done:
        state = env.reset()
```

For DeepMind Control tasks, we expect the `env_id` to be specified as `domain_name-task_name`, or for manipulation tasks as `manipulation-environment_name`. All other environments can be created based on their original name.
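
For example (the concrete task names below are illustrative assumptions and depend on which suites are installed):

```python
import alr_envs

# DeepMind Control suite: domain_name-task_name
dmc_env = alr_envs.make('ball_in_cup-catch', seed=1)

# DeepMind Control manipulation tasks: manipulation-environment_name
manipulation_env = alr_envs.make('manipulation-reach_site_features', seed=1)

# MetaWorld and custom environments keep their original name.
metaworld_env = alr_envs.make('button-press-v2', seed=1)
```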

Existing MP tasks can be created the same way as above. Just keep in mind that calling `step()` always executes a full trajectory.

```python
import alr_envs

env = alr_envs.make('HoleReacherProMP-v0', seed=1)
# render() can be called once at the beginning with all necessary arguments.
# To turn rendering off again, just call render(None).
env.render()

state = env.reset()

for i in range(5):
    state, reward, done, info = env.step(env.action_space.sample())

    # Not strictly necessary, as the environment resets itself after each trajectory anyway.
    state = env.reset()
```

For convenience, we additionally provide constants that list all available environments. Each of them is a dictionary with the two keys `DMP` and `ProMP`, which store a list of available environment names.

```python
import alr_envs

print("Custom MP tasks:")
print(alr_envs.ALL_ALR_MOVEMENT_PRIMITIVE_ENVIRONMENTS)

print("OpenAI Gym MP tasks:")
print(alr_envs.ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS)

print("Deepmind Control MP tasks:")
print(alr_envs.ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS)

print("MetaWorld MP tasks:")
print(alr_envs.ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS)
```
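
These dictionaries can also be used programmatically, e.g. to instantiate one of the listed tasks directly (a small sketch, assuming the list is non-empty):

```python
import alr_envs

# Pick the first custom ProMP task from the registry dictionary and create it.
promp_ids = alr_envs.ALL_ALR_MOVEMENT_PRIMITIVE_ENVIRONMENTS['ProMP']
env = alr_envs.make(promp_ids[0], seed=1)
print(promp_ids[0], env.action_space)
```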

### How to create a new MP task

In case a required task is not yet supported in the MP framework, it can be created relatively easily. For the task at hand, the following interface needs to be implemented.

```python
import numpy as np
from mp_env_api import MPEnvWrapper


class MPWrapper(MPEnvWrapper):

    @property
    def active_obs(self):
        """
        Returns a boolean mask for each substate in the full observation.
        It determines whether the observation is returned for the contextual case or not.
        This effectively allows filtering unwanted or unnecessary observations from the full step-based case.
        E.g. velocities starting at 0 only change after the first action. Given that we only receive the first
        observation, the velocities are not necessary in the observation for the MP task.
        """
        return np.ones(self.observation_space.shape, dtype=bool)

    @property
    def current_vel(self):
        """
        Returns the current velocity of the action/control dimension.
        The dimensionality has to match the action/control dimension.
        This is not required when exclusively using position control;
        it should, however, be implemented regardless.
        E.g. the joint velocities that are directly or indirectly controlled by the action.
        """
        raise NotImplementedError()

    @property
    def current_pos(self):
        """
        Returns the current position of the action/control dimension.
        The dimensionality has to match the action/control dimension.
        This is not required when exclusively using velocity control;
        it should, however, be implemented regardless.
        E.g. the joint positions that are directly or indirectly controlled by the action.
        """
        raise NotImplementedError()

    @property
    def goal_pos(self):
        """
        Returns a predefined final position of the action/control dimension.
        This is only required for the DMP and is most of the time learned instead.
        """
        raise NotImplementedError()

    @property
    def dt(self):
        """
        Returns the time between two simulated steps of the environment.
        """
        raise NotImplementedError()
```
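
As a concrete illustration, a wrapper for a hypothetical 5-joint MuJoCo reacher could look roughly like the sketch below. The class name, the `self.env.sim.data` handles, and the index ranges are assumptions about the underlying environment, not part of the framework.

```python
import numpy as np
from mp_env_api import MPEnvWrapper


class MyReacherMPWrapper(MPEnvWrapper):
    """Illustrative wrapper for a hypothetical 5-joint MuJoCo reacher."""

    @property
    def active_obs(self):
        # Example: expose everything except the joint velocities as context
        # (assumed observation layout: positions at indices 0-4, velocities at 5-9).
        mask = np.ones(self.observation_space.shape, dtype=bool)
        mask[5:10] = False
        return mask

    @property
    def current_pos(self):
        # Assumed MuJoCo handle of the wrapped environment.
        return self.env.sim.data.qpos[:5].copy()

    @property
    def current_vel(self):
        # Assumed MuJoCo handle of the wrapped environment.
        return self.env.sim.data.qvel[:5].copy()

    @property
    def goal_pos(self):
        # Only needed for DMPs; usually learned, hence left unimplemented here.
        raise NotImplementedError()

    @property
    def dt(self):
        # Assumes the underlying environment exposes its control time step.
        return self.env.dt
```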

If you created a new task wrapper, feel free to open a PR so that we can integrate it for others to use as well.
Even without the integration, the task can still be used. A rough outline is shown below; for more details, we recommend having a look at the [examples](alr_envs/examples/).

```python
import alr_envs

# Base environment name, following the structure of the example above
base_env_id = "ball_in_cup-catch"

# Replace this wrapper with the custom wrapper for your environment by inheriting from MPEnvWrapper.
# You can also add other gym.Wrappers in case they are needed,
# e.g. gym.wrappers.FlattenObservation for dict observations.
wrappers = [alr_envs.dmc.suite.ball_in_cup.MPWrapper]
mp_kwargs = {...}
kwargs = {...}
env = alr_envs.make_dmp_env(base_env_id, wrappers=wrappers, seed=1, mp_kwargs=mp_kwargs, **kwargs)
# OR for a deterministic ProMP (other traj_gen_kwargs are required):
# env = alr_envs.make_promp_env(base_env_id, wrappers=wrappers, seed=seed, traj_gen_kwargs=mp_args)

rewards = 0
obs = env.reset()

# number of samples/full trajectories (multiple environment steps)
for i in range(5):
    ac = env.action_space.sample()
    obs, reward, done, info = env.step(ac)
    rewards += reward

    if done:
        print(base_env_id, rewards)
        rewards = 0
        obs = env.reset()
```