Merge pull request #13 from ALRhub/dmc_integration

Dmc integration
This commit is contained in:
ottofabian 2021-09-29 16:14:10 +02:00 committed by GitHub
commit 4e5b543ebe
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
207 changed files with 3949 additions and 2223 deletions

2
.gitignore vendored
View File

@ -109,3 +109,5 @@ venv.bak/
#configs
/configs/db.cfg
legacy/
MUJOCO_LOG.TXT

View File

@ -1,15 +0,0 @@
Fri Aug 28 14:41:56 2020
ERROR: GLEW initalization error: Missing GL version
Fri Aug 28 14:59:14 2020
ERROR: GLEW initalization error: Missing GL version
Fri Aug 28 15:03:43 2020
ERROR: GLEW initalization error: Missing GL version
Fri Aug 28 15:07:03 2020
ERROR: GLEW initalization error: Missing GL version
Fri Aug 28 15:15:01 2020
ERROR: GLEW initalization error: Missing GL version

239
README.md
View File

@ -1,87 +1,212 @@
## ALR Environments
## ALR Robotics Control Environments
This repository collects custom Robotics environments not included in benchmark suites like OpenAI gym, rllab, etc.
Creating a custom (Mujoco) gym environment can be done according to [this guide](https://github.com/openai/gym/blob/master/docs/creating-environments.md).
For stochastic search problems with gym interface use the `Rosenbrock-v0` reference implementation.
We also support to solve environments with DMPs. When adding new DMP tasks check the `ViaPointReacherDMP-v0` reference implementation.
When simply using the tasks, you can also leverage the wrapper class `DmpWrapper` to turn normal gym environments in to DMP tasks.
This project offers a large variety of reinforcement learning environments under a unifying interface based on OpenAI gym.
Besides some custom environments, we also provide support for the benchmark suites
[OpenAI gym](https://gym.openai.com/),
[DeepMind Control](https://deepmind.com/research/publications/2020/dm-control-Software-and-Tasks-for-Continuous-Control)
(DMC), and [Metaworld](https://meta-world.github.io/). Custom (Mujoco) gym environments can be created according
to [this guide](https://github.com/openai/gym/blob/master/docs/creating-environments.md). Unlike existing libraries, we
additionally support controlling agents with Dynamic Movement Primitives (DMPs) and Probabilistic Movement Primitives
(DetPMP; we usually only consider the mean).
## Environments
Currently we have the following environments:
## Motion Primitive Environments (Episodic environments)
### Mujoco
Unlike step-based environments, motion primitive (MP) environments are more closely related to stochastic search, black-box
optimization, and methods that are often used in robotics. MP environments are trajectory-based and always execute a full
trajectory, which is generated by a Dynamic Motion Primitive (DMP) or a Probabilistic Motion Primitive (DetPMP). The
generated trajectory is translated into individual step-wise actions by a controller. The exact choice of controller,
however, depends on the type of environment. We currently support position, velocity, and PD-controllers for position,
velocity, and torque control, respectively. The goal of all MP environments is still to learn a policy. Yet, an action
represents the parametrization of the motion primitives used to generate a suitable trajectory. Additionally, this
framework supports a contextual setting, for which we expose all changing substates of the task as a single observation
at the beginning of each episode. This requires predicting a new action/MP parametrization for each trajectory. Besides
the cumulative episode reward, all environments also return the information collected at each step as part of the info
dictionary. This information should, however, mainly be used for debugging and logging.
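To make the controller step more concrete, the sketch below shows the basic PD ("motor") control law described above. It is purely illustrative and not the framework's internal implementation; the function name and the sample joint states are assumptions, while the gain vectors are the ones used by the ball-in-a-cup registrations in this repository.
```python
import numpy as np


def pd_control_step(q_des, qd_des, q, qd, p_gains, d_gains):
    """Turn one desired trajectory point (position, velocity) into a torque action.

    Illustrative sketch only; the actual controllers live inside the MP framework.
    """
    return p_gains * (q_des - q) + d_gains * (qd_des - qd)


# Gains as used by the 7-DoF ball-in-a-cup registrations in this repository.
p_gains = np.array([4. / 3., 2.4, 2.5, 5. / 3., 2., 2., 1.25])
d_gains = np.array([0.0466, 0.12, 0.125, 0.04166, 0.06, 0.06, 0.025])

q, qd = np.zeros(7), np.zeros(7)              # current joint positions and velocities (placeholder values)
q_des, qd_des = np.full(7, 0.1), np.zeros(7)  # next point of the generated trajectory (placeholder values)
torque = pd_control_step(q_des, qd_des, q, qd, p_gains, d_gains)
```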
|Name| Description|Horizon|Action Dimension|Observation Dimension
|---|---|---|---|---|
|`ALRReacher-v0`|Modified (5 links) Mujoco gym's `Reacher-v2` (2 links)| 200 | 5 | 21
|`ALRReacherSparse-v0`|Same as `ALRReacher-v0`, but the distance penalty is only provided in the last time step.| 200 | 5 | 21
|`ALRReacherSparseBalanced-v0`|Same as `ALRReacherSparse-v0`, but the end-effector has to remain upright.| 200 | 5 | 21
|`ALRLongReacher-v0`|Modified (7 links) Mujoco gym's `Reacher-v2` (2 links)| 200 | 7 | 27
|`ALRLongReacherSparse-v0`|Same as `ALRLongReacher-v0`, but the distance penalty is only provided in the last time step.| 200 | 7 | 27
|`ALRLongReacherSparseBalanced-v0`|Same as `ALRLongReacherSparse-v0`, but the end-effector has to remain upright.| 200 | 7 | 27
|`ALRBallInACupSimple-v0`| Ball-in-a-cup task where a robot needs to catch a ball attached to a cup at its end-effector. | 4000 | 3 | wip
|`ALRBallInACup-v0`| Ball-in-a-cup task where a robot needs to catch a ball attached to a cup at its end-effector | 4000 | 7 | wip
|`ALRBallInACupGoal-v0`| Similiar to `ALRBallInACupSimple-v0` but the ball needs to be caught at a specified goal position | 4000 | 7 | wip
|Key| Description|
|---|---|
|`trajectory`| Generated trajectory from MP
|`step_actions`| Step-wise executed action based on controller output
|`step_observations`| Step-wise intermediate observations
|`step_rewards`| Step-wise rewards
|`trajectory_length`| Total number of environment interactions
|`other`| All other information from the underlying environment is returned as a list of length `trajectory_length`, maintaining the original key. In case some information is not provided at every time step, the missing values are filled with `None`.
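A minimal sketch of reading these keys after a single MP rollout (the environment id is just one example from the tables; the exact contents of `info` depend on the underlying task):
```python
import alr_envs

env = alr_envs.make('HoleReacherDMP-v0', seed=1)
env.reset()

# One step executes the full trajectory generated from the sampled MP parameters.
_, episode_reward, done, info = env.step(env.action_space.sample())

print("environment interactions:", info['trajectory_length'])
print("cumulative episode reward:", episode_reward)
print("first step-wise reward:", info['step_rewards'][0])
```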
### Classic Control
## Installation
|Name| Description|Horizon|Action Dimension|Observation Dimension
|---|---|---|---|---|
|`SimpleReacher-v0`| Simple reaching task (2 links) without any physics simulation. Provides no reward until 150 time steps. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory.| 200 | 2 | 9
|`LongSimpleReacher-v0`| Simple reaching task (5 links) without any physics simulation. Provides no reward until 150 time steps. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory.| 200 | 5 | 18
|`ViaPointReacher-v0`| Simple reaching task leveraging a via point, which supports self collision detection. Provides a reward only at 100 and 199 for reaching the viapoint and goal point, respectively.| 200 | 5 | 18
|`HoleReacher-v0`| 5 link reaching task where the end-effector needs to reach into a narrow hole without collding with itself or walls | 200 | 5 | 18
### DMP Environments
These environments are closer to stochastic search. They always execute a full trajectory, which is computed by a DMP and executed by a controller, e.g. a PD controller.
The goal is to learn the parameters of this DMP to generate a suitable trajectory.
All environments provide the full episode reward and additional information about early terminations, e.g. due to collisions.
|Name| Description|Horizon|Action Dimension|Context Dimension
|---|---|---|---|---|
|`ViaPointReacherDMP-v0`| A DMP provides a trajectory for the `ViaPointReacher-v0` task. | 200 | 25
|`HoleReacherFixedGoalDMP-v0`| A DMP provides a trajectory for the `HoleReacher-v0` task with a fixed goal attractor. | 200 | 25
|`HoleReacherDMP-v0`| A DMP provides a trajectory for the `HoleReacher-v0` task. The goal attractor needs to be learned. | 200 | 30
|`ALRBallInACupSimpleDMP-v0`| A DMP provides a trajectory for the `ALRBallInACupSimple-v0` task where only 3 joints are actuated. | 4000 | 15
|`ALRBallInACupDMP-v0`| A DMP provides a trajectory for the `ALRBallInACup-v0` task. | 4000 | 35
|`ALRBallInACupGoalDMP-v0`| A DMP provides a trajectory for the `ALRBallInACupGoal-v0` task. | 4000 | 35 | 3
[//]: |`HoleReacherDetPMP-v0`|
### Stochastic Search
|Name| Description|Horizon|Action Dimension|Observation Dimension
|---|---|---|---|---|
|`Rosenbrock{dim}-v0`| Gym interface for Rosenbrock function. `{dim}` is one of 5, 10, 25, 50 or 100. | 1 | `{dim}` | 0
## Install
1. Clone the repository
```bash
git clone git@github.com:ALRhub/alr_envs.git
```
2. Go to the folder
```bash
cd alr_envs
```
3. Install with
```bash
pip install -e .
```
4. Use (see [example.py](./example.py)):
```python
import gym

env = gym.make('alr_envs:SimpleReacher-v0')
```
## Using the framework
We prepared [multiple examples](alr_envs/examples/), please have a look there for more specific examples.
### Step-wise environments
```python
import alr_envs
env = alr_envs.make('HoleReacher-v0', seed=1)
state = env.reset()
for i in range(1000):
    state, reward, done, info = env.step(env.action_space.sample())
    if i % 5 == 0:
        env.render()

    if done:
        state = env.reset()
```
For DeepMind Control tasks we expect the `env_id` to be specified as `domain_name-task_name` or, for manipulation tasks,
as `manipulation-environment_name`. All other environments can be created based on their original name.
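For example (a short sketch; the specific task names below only illustrate the naming scheme, any valid combination can be used the same way):
```python
import alr_envs

# DeepMind Control suite task: <domain_name>-<task_name>
env = alr_envs.make("ball_in_cup-catch", seed=1)

# DeepMind Control manipulation task: manipulation-<environment_name>
env = alr_envs.make("manipulation-reach_site_features", seed=1)

# All other environments, e.g. OpenAI gym, keep their original name
env = alr_envs.make("Reacher-v2", seed=1)
```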
Existing MP tasks can be created the same way as above. Just keep in mind that calling `step()` always executes a full
trajectory.
```python
import alr_envs
env = alr_envs.make('HoleReacherDetPMP-v0', seed=1)
# render() can be called once in the beginning with all necessary arguments.
# To turn it off again, just call render(None).
env.render()

state = env.reset()

for i in range(5):
    state, reward, done, info = env.step(env.action_space.sample())

    # Not really necessary as the environment resets itself after each trajectory anyway.
    state = env.reset()
```
To show all available environments, we provide some additional convenience variables. Each of them returns a dictionary
with the two keys `DMP` and `DetPMP`, which store a list of available environment names.
```python
import alr_envs
print("Custom MP tasks:")
print(alr_envs.ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS)
print("OpenAI Gym MP tasks:")
print(alr_envs.ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS)
print("Deepmind Control MP tasks:")
print(alr_envs.ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS)
print("MetaWorld MP tasks:")
print(alr_envs.ALL_METAWORLD_MOTION_PRIMITIVE_ENVIRONMENTS)
```
### How to create a new MP task
In case a required task is not yet supported in the MP framework, it can be added relatively easily. For the task at
hand, the following interface needs to be implemented.
```python
import numpy as np

from mp_env_api import MPEnvWrapper


class MPWrapper(MPEnvWrapper):

    @property
    def active_obs(self):
        """
        Returns boolean mask for each substate in the full observation.
        It determines whether the observation is returned for the contextual case or not.
        This effectively allows to filter unwanted or unnecessary observations from the full step-based case.
        E.g. Velocities starting at 0 are only changing after the first action. Given we only receive the first
        observation, the velocities are not necessary in the observation for the MP task.
        """
        return np.ones(self.observation_space.shape, dtype=bool)

    @property
    def current_vel(self):
        """
        Returns the current velocity of the action/control dimension.
        The dimensionality has to match the action/control dimension.
        This is not required when exclusively using position control,
        it should, however, be implemented regardless.
        E.g. The joint velocities that are directly or indirectly controlled by the action.
        """
        raise NotImplementedError()

    @property
    def current_pos(self):
        """
        Returns the current position of the action/control dimension.
        The dimensionality has to match the action/control dimension.
        This is not required when exclusively using velocity control,
        it should, however, be implemented regardless.
        E.g. The joint positions that are directly or indirectly controlled by the action.
        """
        raise NotImplementedError()

    @property
    def goal_pos(self):
        """
        Returns a predefined final position of the action/control dimension.
        This is only required for the DMP and is most of the time learned instead.
        """
        raise NotImplementedError()

    @property
    def dt(self):
        """
        Returns the time between two simulated steps of the environment.
        """
        raise NotImplementedError()
```
For an example using a DMP wrapped env and asynchronous sampling, have a look at [mp_env_async_sampler.py](./alr_envs/utils/mp_env_async_sampler.py).
If you created a new task wrapper, feel free to open a PR so we can integrate it for others to use as well.
Without the integration the task can still be used. A rough outline is shown below; for more details we recommend
having a look at the [examples](alr_envs/examples/).
```python
import alr_envs
# Base environment name, according to structure of above example
base_env_id = "ball_in_cup-catch"
# Replace this wrapper with the custom wrapper for your environment by inheriting from the MPEnvWrapper.
# You can also add other gym.Wrappers in case they are needed,
# e.g. gym.wrappers.FlattenObservation for dict observations
wrappers = [alr_envs.dmc.suite.ball_in_cup.MPWrapper]
mp_kwargs = {...}
kwargs = {...}
env = alr_envs.make_dmp_env(base_env_id, wrappers=wrappers, seed=1, mp_kwargs=mp_kwargs, **kwargs)
# OR for a deterministic ProMP (other mp_kwargs are required):
# env = alr_envs.make_detpmp_env(base_env, wrappers=wrappers, seed=seed, mp_kwargs=mp_args)
rewards = 0
obs = env.reset()
# number of samples/full trajectories (multiple environment steps)
for i in range(5):
    ac = env.action_space.sample()
    obs, reward, done, info = env.step(ac)
    rewards += reward

    if done:
        print(base_env_id, rewards)
        rewards = 0
        obs = env.reset()
```
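If the new task should also be available through `alr_envs.make`, it can additionally be registered in the same way as the environments in `alr_envs/alr/__init__.py`. The sketch below follows that pattern; the ids, the wrapper, and the MP parameters are placeholders for your own task.
```python
from gym import register

register(
    id='MyTaskDMP-v0',  # placeholder id for the MP version of your task
    entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper',
    kwargs={
        "name": "alr_envs:MyTask-v0",  # placeholder id of the underlying step-based environment
        "wrappers": [MPWrapper],       # the custom wrapper implemented above
        "mp_kwargs": {
            "num_dof": 5,
            "num_basis": 5,
            "duration": 2,
            "learn_goal": True,
            "alpha_phase": 2,
            "policy_type": "velocity",
            "weights_scale": 50,
        }
    }
)
```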

View File

@ -1,420 +1,15 @@
import numpy as np
from gym.envs.registration import register
from alr_envs import dmc, meta, open_ai
from alr_envs.utils.make_env_helpers import make, make_detpmp_env, make_dmp_env, make_rank
from alr_envs.utils import make_dmc
from alr_envs.stochastic_search.functions.f_rosenbrock import Rosenbrock
# Convenience function for all MP environments
from .alr import ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS
from .dmc import ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS
from .meta import ALL_METAWORLD_MOTION_PRIMITIVE_ENVIRONMENTS
from .open_ai import ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS
# from alr_envs.utils.mps.dmp_wrapper import DmpWrapper
# Mujoco
register(
id='ALRReacher-v0',
entry_point='alr_envs.mujoco:ALRReacherEnv',
max_episode_steps=200,
kwargs={
"steps_before_reward": 0,
"n_links": 5,
"balance": False,
}
)
register(
id='ALRReacherSparse-v0',
entry_point='alr_envs.mujoco:ALRReacherEnv',
max_episode_steps=200,
kwargs={
"steps_before_reward": 200,
"n_links": 5,
"balance": False,
}
)
register(
id='ALRReacherSparseBalanced-v0',
entry_point='alr_envs.mujoco:ALRReacherEnv',
max_episode_steps=200,
kwargs={
"steps_before_reward": 200,
"n_links": 5,
"balance": True,
}
)
register(
id='ALRLongReacher-v0',
entry_point='alr_envs.mujoco:ALRReacherEnv',
max_episode_steps=200,
kwargs={
"steps_before_reward": 0,
"n_links": 7,
"balance": False,
}
)
register(
id='ALRLongReacherSparse-v0',
entry_point='alr_envs.mujoco:ALRReacherEnv',
max_episode_steps=200,
kwargs={
"steps_before_reward": 200,
"n_links": 7,
"balance": False,
}
)
register(
id='ALRLongReacherSparseBalanced-v0',
entry_point='alr_envs.mujoco:ALRReacherEnv',
max_episode_steps=200,
kwargs={
"steps_before_reward": 200,
"n_links": 7,
"balance": True,
}
)
## Balancing Reacher
register(
id='Balancing-v0',
entry_point='alr_envs.mujoco:BalancingEnv',
max_episode_steps=200,
kwargs={
"n_links": 5,
}
)
register(
id='ALRBallInACupSimple-v0',
entry_point='alr_envs.mujoco:ALRBallInACupEnv',
max_episode_steps=4000,
kwargs={
"simplified": True,
"reward_type": "no_context"
}
)
register(
id='ALRBallInACup-v0',
entry_point='alr_envs.mujoco:ALRBallInACupEnv',
max_episode_steps=4000,
kwargs={
"reward_type": "no_context"
}
)
register(
id='ALRBallInACupGoal-v0',
entry_point='alr_envs.mujoco:ALRBallInACupEnv',
max_episode_steps=4000,
kwargs={
"reward_type": "contextual_goal"
}
)
# Classic control
## Simple Reacher
register(
id='SimpleReacher-v0',
entry_point='alr_envs.classic_control:SimpleReacherEnv',
max_episode_steps=200,
kwargs={
"n_links": 2,
}
)
register(
id='SimpleReacher-v1',
entry_point='alr_envs.classic_control:SimpleReacherEnv',
max_episode_steps=200,
kwargs={
"n_links": 2,
"random_start": False
}
)
register(
id='LongSimpleReacher-v0',
entry_point='alr_envs.classic_control:SimpleReacherEnv',
max_episode_steps=200,
kwargs={
"n_links": 5,
}
)
register(
id='LongSimpleReacher-v1',
entry_point='alr_envs.classic_control:SimpleReacherEnv',
max_episode_steps=200,
kwargs={
"n_links": 5,
"random_start": False
}
)
## Viapoint Reacher
register(
id='ViaPointReacher-v0',
entry_point='alr_envs.classic_control.viapoint_reacher:ViaPointReacher',
max_episode_steps=200,
kwargs={
"n_links": 5,
"allow_self_collision": False,
"collision_penalty": 1000
}
)
## Hole Reacher
register(
id='HoleReacher-v0',
entry_point='alr_envs.classic_control.hole_reacher:HoleReacherEnv',
max_episode_steps=200,
kwargs={
"n_links": 5,
"random_start": True,
"allow_self_collision": False,
"allow_wall_collision": False,
"hole_width": None,
"hole_depth": 1,
"hole_x": None,
"collision_penalty": 100,
}
)
register(
id='HoleReacher-v1',
entry_point='alr_envs.classic_control.hole_reacher:HoleReacherEnv',
max_episode_steps=200,
kwargs={
"n_links": 5,
"random_start": False,
"allow_self_collision": False,
"allow_wall_collision": False,
"hole_width": None,
"hole_depth": 1,
"hole_x": None,
"collision_penalty": 100,
}
)
register(
id='HoleReacher-v2',
entry_point='alr_envs.classic_control.hole_reacher:HoleReacherEnv',
max_episode_steps=200,
kwargs={
"n_links": 5,
"random_start": False,
"allow_self_collision": False,
"allow_wall_collision": False,
"hole_width": 0.25,
"hole_depth": 1,
"hole_x": 2,
"collision_penalty": 100,
}
)
# MP environments
## Simple Reacher
versions = ["SimpleReacher-v0", "SimpleReacher-v1", "LongSimpleReacher-v0", "LongSimpleReacher-v1"]
for v in versions:
name = v.split("-")
register(
id=f'{name[0]}DMP-{name[1]}',
entry_point='alr_envs.utils.make_env_helpers:make_dmp_env',
# max_episode_steps=1,
kwargs={
"name": f"alr_envs:{v}",
"num_dof": 2 if "long" not in v.lower() else 5,
"num_basis": 5,
"duration": 2,
"alpha_phase": 2,
"learn_goal": True,
"policy_type": "velocity",
"weights_scale": 50,
}
)
register(
id='ViaPointReacherDMP-v0',
entry_point='alr_envs.utils.make_env_helpers:make_dmp_env',
# max_episode_steps=1,
kwargs={
"name": "alr_envs:ViaPointReacher-v0",
"num_dof": 5,
"num_basis": 5,
"duration": 2,
"alpha_phase": 2,
"learn_goal": False,
"policy_type": "velocity",
"weights_scale": 50,
}
)
## Hole Reacher
versions = ["v0", "v1", "v2"]
for v in versions:
register(
id=f'HoleReacherDMP-{v}',
entry_point='alr_envs.utils.make_env_helpers:make_dmp_env',
# max_episode_steps=1,
kwargs={
"name": f"alr_envs:HoleReacher-{v}",
"num_dof": 5,
"num_basis": 5,
"duration": 2,
"learn_goal": True,
"alpha_phase": 2,
"bandwidth_factor": 2,
"policy_type": "velocity",
"weights_scale": 50,
"goal_scale": 0.1
}
)
register(
id=f'HoleReacherDetPMP-{v}',
entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env',
kwargs={
"name": f"alr_envs:HoleReacher-{v}",
"num_dof": 5,
"num_basis": 5,
"duration": 2,
"width": 0.025,
"policy_type": "velocity",
"weights_scale": 0.2,
"zero_start": True
}
)
# TODO: properly add final_pos
register(
id='HoleReacherFixedGoalDMP-v0',
entry_point='alr_envs.utils.make_env_helpers:make_dmp_env',
# max_episode_steps=1,
kwargs={
"name": "alr_envs:HoleReacher-v0",
"num_dof": 5,
"num_basis": 5,
"duration": 2,
"learn_goal": False,
"alpha_phase": 2,
"policy_type": "velocity",
"weights_scale": 50,
"goal_scale": 0.1
}
)
## Ball in Cup
register(
id='ALRBallInACupSimpleDMP-v0',
entry_point='alr_envs.utils.make_env_helpers:make_dmp_env',
kwargs={
"name": "alr_envs:ALRBallInACupSimple-v0",
"num_dof": 3,
"num_basis": 5,
"duration": 3.5,
"post_traj_time": 4.5,
"learn_goal": False,
"alpha_phase": 3,
"bandwidth_factor": 2.5,
"policy_type": "motor",
"weights_scale": 100,
"return_to_start": True,
"p_gains": np.array([4. / 3., 2.4, 2.5, 5. / 3., 2., 2., 1.25]),
"d_gains": np.array([0.0466, 0.12, 0.125, 0.04166, 0.06, 0.06, 0.025])
}
)
register(
id='ALRBallInACupDMP-v0',
entry_point='alr_envs.utils.make_env_helpers:make_dmp_env',
kwargs={
"name": "alr_envs:ALRBallInACup-v0",
"num_dof": 7,
"num_basis": 5,
"duration": 3.5,
"post_traj_time": 4.5,
"learn_goal": False,
"alpha_phase": 3,
"bandwidth_factor": 2.5,
"policy_type": "motor",
"weights_scale": 100,
"return_to_start": True,
"p_gains": np.array([4. / 3., 2.4, 2.5, 5. / 3., 2., 2., 1.25]),
"d_gains": np.array([0.0466, 0.12, 0.125, 0.04166, 0.06, 0.06, 0.025])
}
)
register(
id='ALRBallInACupSimpleDetPMP-v0',
entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env',
kwargs={
"name": "alr_envs:ALRBallInACupSimple-v0",
"num_dof": 3,
"num_basis": 5,
"duration": 3.5,
"post_traj_time": 4.5,
"width": 0.0035,
# "off": -0.05,
"policy_type": "motor",
"weights_scale": 0.2,
"zero_start": True,
"zero_goal": True,
"p_gains": np.array([4./3., 2.4, 2.5, 5./3., 2., 2., 1.25]),
"d_gains": np.array([0.0466, 0.12, 0.125, 0.04166, 0.06, 0.06, 0.025])
}
)
register(
id='ALRBallInACupDetPMP-v0',
entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env',
kwargs={
"name": "alr_envs:ALRBallInACupSimple-v0",
"num_dof": 7,
"num_basis": 5,
"duration": 3.5,
"post_traj_time": 4.5,
"width": 0.0035,
"policy_type": "motor",
"weights_scale": 0.2,
"zero_start": True,
"zero_goal": True,
"p_gains": np.array([4./3., 2.4, 2.5, 5./3., 2., 2., 1.25]),
"d_gains": np.array([0.0466, 0.12, 0.125, 0.04166, 0.06, 0.06, 0.025])
}
)
register(
id='ALRBallInACupGoalDMP-v0',
entry_point='alr_envs.utils.make_env_helpers:make_contextual_env',
kwargs={
"name": "alr_envs:ALRBallInACupGoal-v0",
"num_dof": 7,
"num_basis": 5,
"duration": 3.5,
"post_traj_time": 4.5,
"learn_goal": True,
"alpha_phase": 3,
"bandwidth_factor": 2.5,
"policy_type": "motor",
"weights_scale": 50,
"goal_scale": 0.1,
"p_gains": np.array([4. / 3., 2.4, 2.5, 5. / 3., 2., 2., 1.25]),
"d_gains": np.array([0.0466, 0.12, 0.125, 0.04166, 0.06, 0.06, 0.025])
}
)
# BBO functions
for dim in [5, 10, 25, 50, 100]:
register(
id=f'Rosenbrock{dim}-v0',
entry_point='alr_envs.stochastic_search:StochasticSearchEnv',
max_episode_steps=1,
kwargs={
"cost_f": Rosenbrock(dim),
}
)
ALL_MOTION_PRIMITIVE_ENVIRONMENTS = {
key: value + ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS[key] +
ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS[key] +
ALL_METAWORLD_MOTION_PRIMITIVE_ENVIRONMENTS[key]
for key, value in ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS.items()}

329
alr_envs/alr/__init__.py Normal file
View File

@ -0,0 +1,329 @@
from gym import register
from . import classic_control, mujoco
from .classic_control.hole_reacher.hole_reacher import HoleReacherEnv
from .classic_control.simple_reacher.simple_reacher import SimpleReacherEnv
from .classic_control.viapoint_reacher.viapoint_reacher import ViaPointReacherEnv
from .mujoco.ball_in_a_cup.ball_in_a_cup import ALRBallInACupEnv
from .mujoco.ball_in_a_cup.biac_pd import ALRBallInACupPDEnv
from .mujoco.reacher.alr_reacher import ALRReacherEnv
from .mujoco.reacher.balancing import BalancingEnv
ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS = {"DMP": [], "DetPMP": []}
# Classic Control
## Simple Reacher
register(
id='SimpleReacher-v0',
entry_point='alr_envs.alr.classic_control:SimpleReacherEnv',
max_episode_steps=200,
kwargs={
"n_links": 2,
}
)
register(
id='SimpleReacher-v1',
entry_point='alr_envs.alr.classic_control:SimpleReacherEnv',
max_episode_steps=200,
kwargs={
"n_links": 2,
"random_start": False
}
)
register(
id='LongSimpleReacher-v0',
entry_point='alr_envs.alr.classic_control:SimpleReacherEnv',
max_episode_steps=200,
kwargs={
"n_links": 5,
}
)
register(
id='LongSimpleReacher-v1',
entry_point='alr_envs.alr.classic_control:SimpleReacherEnv',
max_episode_steps=200,
kwargs={
"n_links": 5,
"random_start": False
}
)
## Viapoint Reacher
register(
id='ViaPointReacher-v0',
entry_point='alr_envs.alr.classic_control:ViaPointReacherEnv',
max_episode_steps=200,
kwargs={
"n_links": 5,
"allow_self_collision": False,
"collision_penalty": 1000
}
)
## Hole Reacher
register(
id='HoleReacher-v0',
entry_point='alr_envs.alr.classic_control:HoleReacherEnv',
max_episode_steps=200,
kwargs={
"n_links": 5,
"random_start": True,
"allow_self_collision": False,
"allow_wall_collision": False,
"hole_width": None,
"hole_depth": 1,
"hole_x": None,
"collision_penalty": 100,
}
)
register(
id='HoleReacher-v1',
entry_point='alr_envs.alr.classic_control:HoleReacherEnv',
max_episode_steps=200,
kwargs={
"n_links": 5,
"random_start": False,
"allow_self_collision": False,
"allow_wall_collision": False,
"hole_width": 0.25,
"hole_depth": 1,
"hole_x": None,
"collision_penalty": 100,
}
)
register(
id='HoleReacher-v2',
entry_point='alr_envs.alr.classic_control:HoleReacherEnv',
max_episode_steps=200,
kwargs={
"n_links": 5,
"random_start": False,
"allow_self_collision": False,
"allow_wall_collision": False,
"hole_width": 0.25,
"hole_depth": 1,
"hole_x": 2,
"collision_penalty": 100,
}
)
# Mujoco
## Reacher
register(
id='ALRReacher-v0',
entry_point='alr_envs.alr.mujoco:ALRReacherEnv',
max_episode_steps=200,
kwargs={
"steps_before_reward": 0,
"n_links": 5,
"balance": False,
}
)
register(
id='ALRReacherSparse-v0',
entry_point='alr_envs.alr.mujoco:ALRReacherEnv',
max_episode_steps=200,
kwargs={
"steps_before_reward": 200,
"n_links": 5,
"balance": False,
}
)
register(
id='ALRReacherSparseBalanced-v0',
entry_point='alr_envs.alr.mujoco:ALRReacherEnv',
max_episode_steps=200,
kwargs={
"steps_before_reward": 200,
"n_links": 5,
"balance": True,
}
)
register(
id='ALRLongReacher-v0',
entry_point='alr_envs.alr.mujoco:ALRReacherEnv',
max_episode_steps=200,
kwargs={
"steps_before_reward": 0,
"n_links": 7,
"balance": False,
}
)
register(
id='ALRLongReacherSparse-v0',
entry_point='alr_envs.alr.mujoco:ALRReacherEnv',
max_episode_steps=200,
kwargs={
"steps_before_reward": 200,
"n_links": 7,
"balance": False,
}
)
register(
id='ALRLongReacherSparseBalanced-v0',
entry_point='alr_envs.alr.mujoco:ALRReacherEnv',
max_episode_steps=200,
kwargs={
"steps_before_reward": 200,
"n_links": 7,
"balance": True,
}
)
## Balancing Reacher
register(
id='Balancing-v0',
entry_point='alr_envs.alr.mujoco:BalancingEnv',
max_episode_steps=200,
kwargs={
"n_links": 5,
}
)
# Motion Primitive Environments
## Simple Reacher
_versions = ["SimpleReacher-v0", "SimpleReacher-v1", "LongSimpleReacher-v0", "LongSimpleReacher-v1"]
for _v in _versions:
_name = _v.split("-")
_env_id = f'{_name[0]}DMP-{_name[1]}'
register(
id=_env_id,
entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper',
# max_episode_steps=1,
kwargs={
"name": f"alr_envs:{_v}",
"wrappers": [classic_control.simple_reacher.MPWrapper],
"mp_kwargs": {
"num_dof": 2 if "long" not in _v.lower() else 5,
"num_basis": 5,
"duration": 20,
"alpha_phase": 2,
"learn_goal": True,
"policy_type": "velocity",
"weights_scale": 50,
}
}
)
ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id)
_env_id = f'{_name[0]}DetPMP-{_name[1]}'
register(
id=_env_id,
entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper',
# max_episode_steps=1,
kwargs={
"name": f"alr_envs:{_v}",
"wrappers": [classic_control.simple_reacher.MPWrapper],
"mp_kwargs": {
"num_dof": 2 if "long" not in _v.lower() else 5,
"num_basis": 5,
"duration": 20,
"width": 0.025,
"policy_type": "velocity",
"weights_scale": 0.2,
"zero_start": True
}
}
)
ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append(_env_id)
# Viapoint reacher
register(
id='ViaPointReacherDMP-v0',
entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper',
# max_episode_steps=1,
kwargs={
"name": "alr_envs:ViaPointReacher-v0",
"wrappers": [classic_control.viapoint_reacher.MPWrapper],
"mp_kwargs": {
"num_dof": 5,
"num_basis": 5,
"duration": 2,
"learn_goal": True,
"alpha_phase": 2,
"policy_type": "velocity",
"weights_scale": 50,
}
}
)
ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"].append("ViaPointReacherDMP-v0")
register(
id='ViaPointReacherDetPMP-v0',
entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper',
# max_episode_steps=1,
kwargs={
"name": "alr_envs:ViaPointReacher-v0",
"wrappers": [classic_control.viapoint_reacher.MPWrapper],
"mp_kwargs": {
"num_dof": 5,
"num_basis": 5,
"duration": 2,
"width": 0.025,
"policy_type": "velocity",
"weights_scale": 0.2,
"zero_start": True
}
}
)
ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append("ViaPointReacherDetPMP-v0")
## Hole Reacher
_versions = ["v0", "v1", "v2"]
for _v in _versions:
_env_id = f'HoleReacherDMP-{_v}'
register(
id=_env_id,
entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper',
# max_episode_steps=1,
kwargs={
"name": f"alr_envs:HoleReacher-{_v}",
"wrappers": [classic_control.hole_reacher.MPWrapper],
"mp_kwargs": {
"num_dof": 5,
"num_basis": 5,
"duration": 2,
"learn_goal": True,
"alpha_phase": 2,
"bandwidth_factor": 2,
"policy_type": "velocity",
"weights_scale": 50,
"goal_scale": 0.1
}
}
)
ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id)
_env_id = f'HoleReacherDetPMP-{_v}'
register(
id=_env_id,
entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper',
kwargs={
"name": f"alr_envs:HoleReacher-{_v}",
"wrappers": [classic_control.hole_reacher.MPWrapper],
"mp_kwargs": {
"num_dof": 5,
"num_basis": 5,
"duration": 2,
"width": 0.025,
"policy_type": "velocity",
"weights_scale": 0.2,
"zero_start": True
}
}
)
ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append(_env_id)

View File

@ -0,0 +1,21 @@
### Classic Control
## Step-based Environments
|Name| Description|Horizon|Action Dimension|Observation Dimension
|---|---|---|---|---|
|`SimpleReacher-v0`| Simple reaching task (2 links) without any physics simulation. Provides no reward until 150 time steps. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory.| 200 | 2 | 9
|`LongSimpleReacher-v0`| Simple reaching task (5 links) without any physics simulation. Provides no reward until 150 time steps. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory.| 200 | 5 | 18
|`ViaPointReacher-v0`| Simple reaching task leveraging a via point, which supports self collision detection. Provides a reward only at 100 and 199 for reaching the viapoint and goal point, respectively.| 200 | 5 | 18
|`HoleReacher-v0`| 5 link reaching task where the end-effector needs to reach into a narrow hole without colliding with itself or walls | 200 | 5 | 18
## MP Environments
|Name| Description|Horizon|Action Dimension|Context Dimension
|---|---|---|---|---|
|`ViaPointReacherDMP-v0`| A DMP provides a trajectory for the `ViaPointReacher-v0` task. | 200 | 25
|`HoleReacherFixedGoalDMP-v0`| A DMP provides a trajectory for the `HoleReacher-v0` task with a fixed goal attractor. | 200 | 25
|`HoleReacherDMP-v0`| A DMP provides a trajectory for the `HoleReacher-v0` task. The goal attractor needs to be learned. | 200 | 30
|`ALRBallInACupSimpleDMP-v0`| A DMP provides a trajectory for the `ALRBallInACupSimple-v0` task where only 3 joints are actuated. | 4000 | 15
|`ALRBallInACupDMP-v0`| A DMP provides a trajectory for the `ALRBallInACup-v0` task. | 4000 | 35
|`ALRBallInACupGoalDMP-v0`| A DMP provides a trajectory for the `ALRBallInACupGoal-v0` task. | 4000 | 35 | 3
[//]: |`HoleReacherDetPMP-v0`|
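Both flavours can be instantiated via `alr_envs.make` with any id from the tables above, for example (a short sketch; the seed is chosen arbitrarily):
```python
import alr_envs

# step-based version
env = alr_envs.make("HoleReacher-v0", seed=1)

# motion primitive version; one step() call executes the whole trajectory
mp_env = alr_envs.make("HoleReacherDMP-v0", seed=1)
```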

View File

@ -0,0 +1,3 @@
from .hole_reacher.hole_reacher import HoleReacherEnv
from .simple_reacher.simple_reacher import SimpleReacherEnv
from .viapoint_reacher.viapoint_reacher import ViaPointReacherEnv

View File

@ -0,0 +1 @@
from .mp_wrapper import MPWrapper

View File

@ -6,11 +6,10 @@ import numpy as np
from gym.utils import seeding
from matplotlib import patches
from alr_envs.classic_control.utils import check_self_collision
from alr_envs.utils.mps.alr_env import AlrEnv
from alr_envs.alr.classic_control.utils import check_self_collision
class HoleReacherEnv(AlrEnv):
class HoleReacherEnv(gym.Env):
def __init__(self, n_links: int, hole_x: Union[None, float] = None, hole_depth: Union[None, float] = None,
hole_width: float = 1., random_start: bool = False, allow_self_collision: bool = False,
@ -22,14 +21,14 @@ class HoleReacherEnv(AlrEnv):
self.random_start = random_start
# provided initial parameters
self._hole_x = hole_x # x-position of center of hole
self._hole_width = hole_width # width of hole
self._hole_depth = hole_depth # depth of hole
self.initial_x = hole_x # x-position of center of hole
self.initial_width = hole_width # width of hole
self.initial_depth = hole_depth # depth of hole
# temp container for current env state
self._tmp_hole_x = None
self._tmp_hole_width = None
self._tmp_hole_depth = None
self._tmp_x = None
self._tmp_width = None
self._tmp_depth = None
self._goal = None # x-y coordinates for reaching the center at the bottom of the hole
# collision
@ -44,7 +43,7 @@ class HoleReacherEnv(AlrEnv):
self._start_pos = np.hstack([[np.pi / 2], np.zeros(self.n_links - 1)])
self._start_vel = np.zeros(self.n_links)
self.dt = 0.01
self._dt = 0.01
action_bound = np.pi * np.ones((self.n_links,))
state_bound = np.hstack([
@ -66,6 +65,22 @@ class HoleReacherEnv(AlrEnv):
self._steps = 0
self.seed()
@property
def dt(self) -> Union[float, int]:
return self._dt
# @property
# def start_pos(self):
# return self._start_pos
@property
def current_pos(self):
return self._joint_angles.copy()
@property
def current_vel(self):
return self._angle_velocity.copy()
def step(self, action: np.ndarray):
"""
A single step with an action in joint velocity space
@ -88,7 +103,7 @@ class HoleReacherEnv(AlrEnv):
def reset(self):
if self.random_start:
# Maybe change more than dirst seed
# Maybe change more than first seed
first_joint = self.np_random.uniform(np.pi / 4, 3 * np.pi / 4)
self._joint_angles = np.hstack([[first_joint], np.zeros(self.n_links - 1)])
self._start_pos = self._joint_angles.copy()
@ -107,13 +122,27 @@ class HoleReacherEnv(AlrEnv):
return self._get_obs().copy()
def _generate_hole(self):
self._tmp_hole_x = self.np_random.uniform(1, 3.5, 1) if self._hole_x is None else np.copy(self._hole_x)
self._tmp_hole_width = self.np_random.uniform(0.15, 0.5, 1) if self._hole_width is None else np.copy(
self._hole_width)
if self.initial_width is None:
width = self.np_random.uniform(0.15, 0.5)
else:
width = np.copy(self.initial_width)
if self.initial_x is None:
# sample hole on left or right side
direction = self.np_random.choice([-1, 1])
# Hole center needs to be half the width away from the arm to give a valid setting.
x = direction * self.np_random.uniform(width / 2, 3.5)
else:
x = np.copy(self.initial_x)
if self.initial_depth is None:
# TODO we do not want this right now.
self._tmp_hole_depth = self.np_random.uniform(1, 1, 1) if self._hole_depth is None else np.copy(
self._hole_depth)
self._goal = np.hstack([self._tmp_hole_x, -self._tmp_hole_depth])
depth = self.np_random.uniform(1, 1)
else:
depth = np.copy(self.initial_depth)
self._tmp_width = width
self._tmp_x = x
self._tmp_depth = depth
self._goal = np.hstack([self._tmp_x, -self._tmp_depth])
def _update_joints(self):
"""
@ -161,7 +190,7 @@ class HoleReacherEnv(AlrEnv):
np.cos(theta),
np.sin(theta),
self._angle_velocity,
self._tmp_hole_width,
self._tmp_width,
# self._tmp_hole_depth,
self.end_effector - self._goal,
self._steps
@ -187,9 +216,8 @@ class HoleReacherEnv(AlrEnv):
return np.squeeze(end_effector + self._joints[0, :])
def _check_wall_collision(self, line_points):
# all points that are before the hole in x
r, c = np.where(line_points[:, :, 0] < (self._tmp_hole_x - self._tmp_hole_width / 2))
r, c = np.where(line_points[:, :, 0] < (self._tmp_x - self._tmp_width / 2))
# check if any of those points are below surface
nr_line_points_below_surface_before_hole = np.sum(line_points[r, c, 1] < 0)
@ -198,7 +226,7 @@ class HoleReacherEnv(AlrEnv):
return True
# all points that are after the hole in x
r, c = np.where(line_points[:, :, 0] > (self._tmp_hole_x + self._tmp_hole_width / 2))
r, c = np.where(line_points[:, :, 0] > (self._tmp_x + self._tmp_width / 2))
# check if any of those points are below surface
nr_line_points_below_surface_after_hole = np.sum(line_points[r, c, 1] < 0)
@ -207,11 +235,11 @@ class HoleReacherEnv(AlrEnv):
return True
# all points that are above the hole
r, c = np.where((line_points[:, :, 0] > (self._tmp_hole_x - self._tmp_hole_width / 2)) & (
line_points[:, :, 0] < (self._tmp_hole_x + self._tmp_hole_width / 2)))
r, c = np.where((line_points[:, :, 0] > (self._tmp_x - self._tmp_width / 2)) & (
line_points[:, :, 0] < (self._tmp_x + self._tmp_width / 2)))
# check if any of those points are below surface
nr_line_points_below_surface_in_hole = np.sum(line_points[r, c, 1] < -self._tmp_hole_depth)
nr_line_points_below_surface_in_hole = np.sum(line_points[r, c, 1] < -self._tmp_depth)
if nr_line_points_below_surface_in_hole > 0:
return True
@ -235,7 +263,7 @@ class HoleReacherEnv(AlrEnv):
self.fig.show()
self.fig.gca().set_title(
f"Iteration: {self._steps}, distance: {self.end_effector - self._goal}")
f"Iteration: {self._steps}, distance: {np.linalg.norm(self.end_effector - self._goal) ** 2}")
if mode == "human":
@ -254,17 +282,17 @@ class HoleReacherEnv(AlrEnv):
def _set_patches(self):
if self.fig is not None:
self.fig.gca().patches = []
left_block = patches.Rectangle((-self.n_links, -self._tmp_hole_depth),
self.n_links + self._tmp_hole_x - self._tmp_hole_width / 2,
self._tmp_hole_depth,
left_block = patches.Rectangle((-self.n_links, -self._tmp_depth),
self.n_links + self._tmp_x - self._tmp_width / 2,
self._tmp_depth,
fill=True, edgecolor='k', facecolor='k')
right_block = patches.Rectangle((self._tmp_hole_x + self._tmp_hole_width / 2, -self._tmp_hole_depth),
self.n_links - self._tmp_hole_x + self._tmp_hole_width / 2,
self._tmp_hole_depth,
right_block = patches.Rectangle((self._tmp_x + self._tmp_width / 2, -self._tmp_depth),
self.n_links - self._tmp_x + self._tmp_width / 2,
self._tmp_depth,
fill=True, edgecolor='k', facecolor='k')
hole_floor = patches.Rectangle((self._tmp_hole_x - self._tmp_hole_width / 2, -self._tmp_hole_depth),
self._tmp_hole_width,
1 - self._tmp_hole_depth,
hole_floor = patches.Rectangle((self._tmp_x - self._tmp_width / 2, -self._tmp_depth),
self._tmp_width,
1 - self._tmp_depth,
fill=True, edgecolor='k', facecolor='k')
# Add the patch to the Axes
@ -272,26 +300,6 @@ class HoleReacherEnv(AlrEnv):
self.fig.gca().add_patch(right_block)
self.fig.gca().add_patch(hole_floor)
@property
def active_obs(self):
return np.hstack([
[self.random_start] * self.n_links, # cos
[self.random_start] * self.n_links, # sin
[self.random_start] * self.n_links, # velocity
[self._hole_width is None], # hole width
# [self._hole_depth is None], # hole depth
[True] * 2, # x-y coordinates of target distance
[False] # env steps
])
@property
def start_pos(self) -> Union[float, int, np.ndarray]:
return self._start_pos
@property
def goal_pos(self) -> Union[float, int, np.ndarray]:
raise ValueError("Goal position is not available and has to be learnt based on the environment.")
def seed(self, seed=None):
self.np_random, seed = seeding.np_random(seed)
return [seed]
@ -301,28 +309,6 @@ class HoleReacherEnv(AlrEnv):
return self._joints[self.n_links].T
def close(self):
super().close()
if self.fig is not None:
plt.close(self.fig)
if __name__ == '__main__':
nl = 5
render_mode = "human" # "human" or "partial" or "final"
env = HoleReacherEnv(n_links=nl, allow_self_collision=False, allow_wall_collision=False, hole_width=None,
hole_depth=1, hole_x=None)
obs = env.reset()
for i in range(2000):
# objective.load_result("/tmp/cma")
# test with random actions
ac = 2 * env.action_space.sample()
obs, rew, d, info = env.step(ac)
if i % 10 == 0:
env.render(mode=render_mode)
print(rew)
if d:
env.reset()
env.close()

View File

@ -0,0 +1,35 @@
from typing import Tuple, Union

import numpy as np

from mp_env_api import MPEnvWrapper


class MPWrapper(MPEnvWrapper):

    @property
    def active_obs(self):
        return np.hstack([
            [self.env.random_start] * self.env.n_links,  # cos
            [self.env.random_start] * self.env.n_links,  # sin
            [self.env.random_start] * self.env.n_links,  # velocity
            [self.env.initial_width is None],  # hole width
            # [self.env.hole_depth is None],  # hole depth
            [True] * 2,  # x-y coordinates of target distance
            [False]  # env steps
        ])

    @property
    def current_pos(self) -> Union[float, int, np.ndarray, Tuple]:
        return self.env.current_pos

    @property
    def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
        return self.env.current_vel

    @property
    def goal_pos(self) -> Union[float, int, np.ndarray, Tuple]:
        raise ValueError("Goal position is not available and has to be learnt based on the environment.")

    @property
    def dt(self) -> Union[float, int]:
        return self.env.dt

View File

@ -0,0 +1 @@
from .mp_wrapper import MPWrapper

View File

@ -0,0 +1,33 @@
from typing import Tuple, Union

import numpy as np

from mp_env_api import MPEnvWrapper


class MPWrapper(MPEnvWrapper):

    @property
    def active_obs(self):
        return np.hstack([
            [self.env.random_start] * self.env.n_links,  # cos
            [self.env.random_start] * self.env.n_links,  # sin
            [self.env.random_start] * self.env.n_links,  # velocity
            [True] * 2,  # x-y coordinates of target distance
            [False]  # env steps
        ])

    @property
    def current_pos(self) -> Union[float, int, np.ndarray, Tuple]:
        return self.env.current_pos

    @property
    def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
        return self.env.current_vel

    @property
    def goal_pos(self) -> Union[float, int, np.ndarray, Tuple]:
        raise ValueError("Goal position is not available and has to be learnt based on the environment.")

    @property
    def dt(self) -> Union[float, int]:
        return self.env.dt

View File

@ -1,14 +1,13 @@
from typing import Iterable, Union
import gym
import matplotlib.pyplot as plt
import numpy as np
from gym import spaces
from gym.utils import seeding
from alr_envs.utils.mps.alr_env import AlrEnv
class SimpleReacherEnv(AlrEnv):
class SimpleReacherEnv(gym.Env):
"""
Simple Reaching Task without any physics simulation.
Returns no reward until 150 time steps. This allows the agent to explore the space, but requires precise actions
@ -19,19 +18,22 @@ class SimpleReacherEnv(AlrEnv):
super().__init__()
self.link_lengths = np.ones(n_links)
self.n_links = n_links
self.dt = 0.1
self._dt = 0.1
self.random_start = random_start
# provided initial parameters
self.inital_target = target
# temp container for current env state
self._goal = None
self._joints = None
self._joint_angles = None
self._angle_velocity = None
self._start_pos = np.zeros(self.n_links)
self._start_vel = np.zeros(self.n_links)
self._target = target # provided target value
self._goal = None # updated goal value, does not change when target != None
self.max_torque = 1
self.steps_before_reward = 199
@ -53,6 +55,22 @@ class SimpleReacherEnv(AlrEnv):
self._steps = 0
self.seed()
@property
def dt(self) -> Union[float, int]:
return self._dt
# @property
# def start_pos(self):
# return self._start_pos
@property
def current_pos(self):
return self._joint_angles
@property
def current_vel(self):
return self._angle_velocity
def step(self, action: np.ndarray):
"""
A single step with action in torque space
@ -126,14 +144,14 @@ class SimpleReacherEnv(AlrEnv):
def _generate_goal(self):
if self._target is None:
if self.inital_target is None:
total_length = np.sum(self.link_lengths)
goal = np.array([total_length, total_length])
while np.linalg.norm(goal) >= total_length:
goal = self.np_random.uniform(low=-total_length, high=total_length, size=2)
else:
goal = np.copy(self._target)
goal = np.copy(self.inital_target)
self._goal = goal
@ -172,24 +190,6 @@ class SimpleReacherEnv(AlrEnv):
self.fig.canvas.draw()
self.fig.canvas.flush_events()
@property
def active_obs(self):
return np.hstack([
[self.random_start] * self.n_links, # cos
[self.random_start] * self.n_links, # sin
[self.random_start] * self.n_links, # velocity
[True] * 2, # x-y coordinates of target distance
[False] # env steps
])
@property
def start_pos(self):
return self._start_pos
@property
def goal_pos(self):
raise ValueError("Goal position is not available and has to be learnt based on the environment.")
def seed(self, seed=None):
self.np_random, seed = seeding.np_random(seed)
return [seed]
@ -200,26 +200,3 @@ class SimpleReacherEnv(AlrEnv):
@property
def end_effector(self):
return self._joints[self.n_links].T
if __name__ == '__main__':
nl = 5
render_mode = "human" # "human" or "partial" or "final"
env = SimpleReacherEnv(n_links=nl)
obs = env.reset()
print("First", obs)
for i in range(2000):
# objective.load_result("/tmp/cma")
# test with random actions
ac = 2 * env.action_space.sample()
# ac = np.ones(env.action_space.shape)
obs, rew, d, info = env.step(ac)
env.render(mode=render_mode)
print(obs[env.active_obs].shape)
if d or i % 200 == 0:
env.reset()
env.close()

View File

@ -10,7 +10,7 @@ def intersect(A, B, C, D):
def check_self_collision(line_points):
"Checks whether line segments and intersect"
"""Checks whether line segments and intersect"""
for i, line1 in enumerate(line_points):
for line2 in line_points[i + 2:, :, :]:
if intersect(line1[0], line1[-1], line2[0], line2[-1]):

View File

@ -0,0 +1 @@
from .mp_wrapper import MPWrapper

View File

@ -0,0 +1,34 @@
from typing import Tuple, Union

import numpy as np

from mp_env_api import MPEnvWrapper


class MPWrapper(MPEnvWrapper):

    @property
    def active_obs(self):
        return np.hstack([
            [self.env.random_start] * self.env.n_links,  # cos
            [self.env.random_start] * self.env.n_links,  # sin
            [self.env.random_start] * self.env.n_links,  # velocity
            [self.env.initial_via_target is None] * 2,  # x-y coordinates of via point distance
            [True] * 2,  # x-y coordinates of target distance
            [False]  # env steps
        ])

    @property
    def current_pos(self) -> Union[float, int, np.ndarray, Tuple]:
        return self.env.current_pos

    @property
    def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
        return self.env.current_vel

    @property
    def goal_pos(self) -> Union[float, int, np.ndarray, Tuple]:
        raise ValueError("Goal position is not available and has to be learnt based on the environment.")

    @property
    def dt(self) -> Union[float, int]:
        return self.env.dt

View File

@ -5,13 +5,12 @@ import matplotlib.pyplot as plt
import numpy as np
from gym.utils import seeding
from alr_envs.classic_control.utils import check_self_collision
from alr_envs.utils.mps.alr_env import AlrEnv
from alr_envs.alr.classic_control.utils import check_self_collision
class ViaPointReacher(AlrEnv):
class ViaPointReacherEnv(gym.Env):
def __init__(self, n_links, random_start: bool = True, via_target: Union[None, Iterable] = None,
def __init__(self, n_links, random_start: bool = False, via_target: Union[None, Iterable] = None,
target: Union[None, Iterable] = None, allow_self_collision=False, collision_penalty=1000):
self.n_links = n_links
@ -20,8 +19,8 @@ class ViaPointReacher(AlrEnv):
self.random_start = random_start
# provided initial parameters
self._target = target # provided target value
self._via_target = via_target # provided via point target value
self.intitial_target = target # provided target value
self.initial_via_target = via_target # provided via point target value
# temp container for current env state
self._via_point = np.ones(2)
@ -39,7 +38,7 @@ class ViaPointReacher(AlrEnv):
self._start_vel = np.zeros(self.n_links)
self.weight_matrix_scale = 1
self.dt = 0.01
self._dt = 0.01
action_bound = np.pi * np.ones((self.n_links,))
state_bound = np.hstack([
@ -60,6 +59,22 @@ class ViaPointReacher(AlrEnv):
self._steps = 0
self.seed()
@property
def dt(self):
return self._dt
# @property
# def start_pos(self):
# return self._start_pos
@property
def current_pos(self):
return self._joint_angles.copy()
@property
def current_vel(self):
return self._angle_velocity.copy()
def step(self, action: np.ndarray):
"""
a single step with an action in joint velocity space
@ -104,22 +119,22 @@ class ViaPointReacher(AlrEnv):
total_length = np.sum(self.link_lengths)
# rejection sampled point in inner circle with 0.5*Radius
if self._via_target is None:
if self.initial_via_target is None:
via_target = np.array([total_length, total_length])
while np.linalg.norm(via_target) >= 0.5 * total_length:
via_target = self.np_random.uniform(low=-0.5 * total_length, high=0.5 * total_length, size=2)
else:
via_target = np.copy(self._via_target)
via_target = np.copy(self.initial_via_target)
# rejection sampled point in outer circle
if self._target is None:
if self.intitial_target is None:
goal = np.array([total_length, total_length])
while np.linalg.norm(goal) >= total_length or np.linalg.norm(goal) <= 0.5 * total_length:
goal = self.np_random.uniform(low=-total_length, high=total_length, size=2)
else:
goal = np.copy(self._target)
goal = np.copy(self.intitial_target)
self._via_target = via_target
self._via_point = via_target
self._goal = goal
def _update_joints(self):
@ -266,25 +281,6 @@ class ViaPointReacher(AlrEnv):
plt.pause(0.01)
@property
def active_obs(self):
return np.hstack([
[self.random_start] * self.n_links, # cos
[self.random_start] * self.n_links, # sin
[self.random_start] * self.n_links, # velocity
[self._via_target is None] * 2, # x-y coordinates of via point distance
[True] * 2, # x-y coordinates of target distance
[False] # env steps
])
@property
def start_pos(self) -> Union[float, int, np.ndarray]:
return self._start_pos
@property
def goal_pos(self) -> Union[float, int, np.ndarray]:
raise ValueError("Goal position is not available and has to be learnt based on the environment.")
def seed(self, seed=None):
self.np_random, seed = seeding.np_random(seed)
return [seed]
@ -296,26 +292,3 @@ class ViaPointReacher(AlrEnv):
def close(self):
if self.fig is not None:
plt.close(self.fig)
if __name__ == '__main__':
nl = 5
render_mode = "human" # "human" or "partial" or "final"
env = ViaPointReacher(n_links=nl, allow_self_collision=False)
env.reset()
env.render(mode=render_mode)
for i in range(300):
# objective.load_result("/tmp/cma")
# test with random actions
ac = env.action_space.sample()
# ac[0] += np.pi/2
obs, rew, d, info = env.step(ac)
env.render(mode=render_mode)
print(rew)
if d:
break
env.close()

View File

@ -0,0 +1,15 @@
# Custom Mujoco tasks
## Step-based Environments
|Name| Description|Horizon|Action Dimension|Observation Dimension
|---|---|---|---|---|
|`ALRReacher-v0`|Modified (5 links) Mujoco gym's `Reacher-v2` (2 links)| 200 | 5 | 21
|`ALRReacherSparse-v0`|Same as `ALRReacher-v0`, but the distance penalty is only provided in the last time step.| 200 | 5 | 21
|`ALRReacherSparseBalanced-v0`|Same as `ALRReacherSparse-v0`, but the end-effector has to remain upright.| 200 | 5 | 21
|`ALRLongReacher-v0`|Modified (7 links) Mujoco gym's `Reacher-v2` (2 links)| 200 | 7 | 27
|`ALRLongReacherSparse-v0`|Same as `ALRLongReacher-v0`, but the distance penalty is only provided in the last time step.| 200 | 7 | 27
|`ALRLongReacherSparseBalanced-v0`|Same as `ALRLongReacherSparse-v0`, but the end-effector has to remain upright.| 200 | 7 | 27
|`ALRBallInACupSimple-v0`| Ball-in-a-cup task where a robot needs to catch a ball attached to a cup at its end-effector. | 4000 | 3 | wip
|`ALRBallInACup-v0`| Ball-in-a-cup task where a robot needs to catch a ball attached to a cup at its end-effector | 4000 | 7 | wip
|`ALRBallInACupGoal-v0`| Similar to `ALRBallInACupSimple-v0` but the ball needs to be caught at a specified goal position | 4000 | 7 | wip
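These tasks are created like any other environment in this repository, for example (a short sketch; the seed is chosen arbitrarily):
```python
import alr_envs

env = alr_envs.make("ALRReacherSparse-v0", seed=1)
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
```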

View File

@ -0,0 +1,4 @@
from .reacher.alr_reacher import ALRReacherEnv
from .reacher.balancing import BalancingEnv
from .ball_in_a_cup.ball_in_a_cup import ALRBallInACupEnv
from .ball_in_a_cup.biac_pd import ALRBallInACupPDEnv

View File

@ -1,16 +1,17 @@
from gym import utils
import os
import numpy as np
from alr_envs.mujoco import alr_mujoco_env
from gym.envs.mujoco import MujocoEnv
class ALRBallInACupEnv(alr_mujoco_env.AlrMujocoEnv, utils.EzPickle):
class ALRBallInACupEnv(MujocoEnv, utils.EzPickle):
def __init__(self, n_substeps=4, apply_gravity_comp=True, simplified: bool = False,
reward_type: str = None, context: np.ndarray = None):
utils.EzPickle.__init__(**locals())
self._steps = 0
self.xml_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "assets",
"biac_base" + ".xml")
self.xml_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "assets", "biac_base.xml")
self._q_pos = []
self._q_vel = []
@ -22,7 +23,6 @@ class ALRBallInACupEnv(alr_mujoco_env.AlrMujocoEnv, utils.EzPickle):
self.context = context
utils.EzPickle.__init__(self)
alr_mujoco_env.AlrMujocoEnv.__init__(self,
self.xml_path,
apply_gravity_comp=apply_gravity_comp,
@ -35,13 +35,13 @@ class ALRBallInACupEnv(alr_mujoco_env.AlrMujocoEnv, utils.EzPickle):
self.sim_time = 8 # seconds
self.sim_steps = int(self.sim_time / self.dt)
if reward_type == "no_context":
from alr_envs.mujoco.ball_in_a_cup.ball_in_a_cup_reward_simple import BallInACupReward
from alr_envs.alr.mujoco.ball_in_a_cup.ball_in_a_cup_reward_simple import BallInACupReward
reward_function = BallInACupReward
elif reward_type == "contextual_goal":
from alr_envs.mujoco.ball_in_a_cup.ball_in_a_cup_reward import BallInACupReward
from alr_envs.alr.mujoco.ball_in_a_cup.ball_in_a_cup_reward import BallInACupReward
reward_function = BallInACupReward
else:
raise ValueError("Unknown reward type")
raise ValueError("Unknown reward type: {}".format(reward_type))
self.reward_function = reward_function(self.sim_steps)
@property
@ -66,6 +66,10 @@ class ALRBallInACupEnv(alr_mujoco_env.AlrMujocoEnv, utils.EzPickle):
def current_vel(self):
return self.sim.data.qvel[0:7].copy()
def reset(self):
self.reward_function.reset(None)
return super().reset()
def reset_model(self):
init_pos_all = self.init_qpos.copy()
init_pos_robot = self._start_pos
@ -100,14 +104,18 @@ class ALRBallInACupEnv(alr_mujoco_env.AlrMujocoEnv, utils.EzPickle):
done = success or self._steps == self.sim_steps - 1 or is_collided
self._steps += 1
else:
reward = -2
reward = -2000
success = False
is_collided = False
done = True
return ob, reward, done, dict(reward_dist=reward_dist,
reward_ctrl=reward_ctrl,
velocity=angular_vel,
traj=self._q_pos, is_success=success,
# traj=self._q_pos,
action=a,
q_pos=self.sim.data.qpos[0:7].ravel().copy(),
q_vel=self.sim.data.qvel[0:7].ravel().copy(),
is_success=success,
is_collided=is_collided, sim_crash=crash)
def check_traj_in_joint_limits(self):
@ -148,6 +156,22 @@ class ALRBallInACupEnv(alr_mujoco_env.AlrMujocoEnv, utils.EzPickle):
des_vel_full[5] = des_vel[2]
return des_vel_full
def render(self, render_mode, **render_kwargs):
if render_mode == "plot_trajectory":
if self._steps == 1:
import matplotlib.pyplot as plt
# plt.ion()
self.fig, self.axs = plt.subplots(3, 1)
if self._steps <= 1750:
for ax, cp in zip(self.axs, self.current_pos[1::2]):
ax.scatter(self._steps, cp, s=2, marker=".")
# self.fig.show()
else:
super().render(render_mode, **render_kwargs)
if __name__ == "__main__":
env = ALRBallInACupEnv()
@ -170,4 +194,3 @@ if __name__ == "__main__":
break
env.close()

View File

@ -0,0 +1,42 @@
from typing import Tuple, Union

import numpy as np

from mp_env_api import MPEnvWrapper


class BallInACupMPWrapper(MPEnvWrapper):

    @property
    def active_obs(self):
        # TODO: @Max Filter observations correctly
        return np.hstack([
            [False] * 7,  # cos
            [False] * 7,  # sin
            # [True] * 2,  # x-y coordinates of target distance
            [False]  # env steps
        ])

    @property
    def start_pos(self):
        if self.simplified:
            return self._start_pos[1::2]
        else:
            return self._start_pos

    @property
    def current_pos(self) -> Union[float, int, np.ndarray, Tuple]:
        return self.sim.data.qpos[0:7].copy()

    @property
    def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
        return self.sim.data.qvel[0:7].copy()

    @property
    def goal_pos(self):
        # TODO: @Max I think the default value of returning to the start is reasonable here
        raise ValueError("Goal position is not available and has to be learnt based on the environment.")

    @property
    def dt(self) -> Union[float, int]:
        return self.env.dt

View File

@ -1,5 +1,5 @@
import numpy as np
from alr_envs.mujoco import alr_reward_fct
from alr_envs.alr.mujoco import alr_reward_fct
class BallInACupReward(alr_reward_fct.AlrReward):

View File

@ -1,11 +1,10 @@
import numpy as np
from alr_envs.mujoco import alr_reward_fct
from alr_envs.alr.mujoco import alr_reward_fct
class BallInACupReward(alr_reward_fct.AlrReward):
def __init__(self, sim_time):
self.sim_time = sim_time
def __init__(self, env):
self.env = env
self.collision_objects = ["cup_geom1", "cup_geom2", "cup_base_contact_below",
"wrist_palm_link_convex_geom",
"wrist_pitch_link_convex_decomposition_p1_geom",
@ -22,7 +21,7 @@ class BallInACupReward(alr_reward_fct.AlrReward):
self.goal_final_id = None
self.collision_ids = None
self._is_collided = False
self.collision_penalty = 1
self.collision_penalty = 1000
self.ball_traj = None
self.dists = None
@ -32,55 +31,62 @@ class BallInACupReward(alr_reward_fct.AlrReward):
self.reset(None)
def reset(self, context):
self.ball_traj = np.zeros(shape=(self.sim_time, 3))
# self.sim_time = self.env.sim.dtsim_time
self.ball_traj = [] # np.zeros(shape=(self.sim_time, 3))
self.dists = []
self.dists_final = []
self.costs = []
self.action_costs = []
self.angle_costs = []
self.cup_angles = []
def compute_reward(self, action, env):
self.ball_id = env.sim.model._body_name2id["ball"]
self.ball_collision_id = env.sim.model._geom_name2id["ball_geom"]
self.goal_id = env.sim.model._site_name2id["cup_goal"]
self.goal_final_id = env.sim.model._site_name2id["cup_goal_final"]
self.collision_ids = [env.sim.model._geom_name2id[name] for name in self.collision_objects]
def compute_reward(self, action):
self.ball_id = self.env.sim.model._body_name2id["ball"]
self.ball_collision_id = self.env.sim.model._geom_name2id["ball_geom"]
self.goal_id = self.env.sim.model._site_name2id["cup_goal"]
self.goal_final_id = self.env.sim.model._site_name2id["cup_goal_final"]
self.collision_ids = [self.env.sim.model._geom_name2id[name] for name in self.collision_objects]
ball_in_cup = self.check_ball_in_cup(env.sim, self.ball_collision_id)
ball_in_cup = self.check_ball_in_cup(self.env.sim, self.ball_collision_id)
# Compute the current distance from the ball to the inner part of the cup
goal_pos = env.sim.data.site_xpos[self.goal_id]
ball_pos = env.sim.data.body_xpos[self.ball_id]
goal_final_pos = env.sim.data.site_xpos[self.goal_final_id]
goal_pos = self.env.sim.data.site_xpos[self.goal_id]
ball_pos = self.env.sim.data.body_xpos[self.ball_id]
goal_final_pos = self.env.sim.data.site_xpos[self.goal_final_id]
self.dists.append(np.linalg.norm(goal_pos - ball_pos))
self.dists_final.append(np.linalg.norm(goal_final_pos - ball_pos))
self.ball_traj[env._steps, :] = ball_pos
cup_quat = np.copy(env.sim.data.body_xquat[env.sim.model._body_name2id["cup"]])
self.cup_angles.append(np.arctan2(2 * (cup_quat[0] * cup_quat[1] + cup_quat[2] * cup_quat[3]),
1 - 2 * (cup_quat[1]**2 + cup_quat[2]**2)))
# self.ball_traj[self.env._steps, :] = ball_pos
self.ball_traj.append(ball_pos)
cup_quat = np.copy(self.env.sim.data.body_xquat[self.env.sim.model._body_name2id["cup"]])
cup_angle = np.arctan2(2 * (cup_quat[0] * cup_quat[1] + cup_quat[2] * cup_quat[3]),
1 - 2 * (cup_quat[1]**2 + cup_quat[2]**2))
cost_angle = (cup_angle - np.pi / 2) ** 2
self.angle_costs.append(cost_angle)
self.cup_angles.append(cup_angle)
action_cost = np.sum(np.square(action))
self.action_costs.append(action_cost)
self._is_collided = self.check_collision(env.sim) or env.check_traj_in_joint_limits()
self._is_collided = self.check_collision(self.env.sim) or self.env.check_traj_in_joint_limits()
if env._steps == env.sim_steps - 1 or self._is_collided:
if self.env._steps == self.env.ep_length - 1 or self._is_collided:
t_min_dist = np.argmin(self.dists)
angle_min_dist = self.cup_angles[t_min_dist]
cost_angle = (angle_min_dist - np.pi / 2)**2
# cost_angle = (angle_min_dist - np.pi / 2)**2
min_dist = self.dists[t_min_dist]
# min_dist = self.dists[t_min_dist]
dist_final = self.dists_final[-1]
min_dist_final = np.min(self.dists_final)
cost = 0.5 * dist_final + 0.05 * cost_angle # TODO: Increase cost_angle weight # 0.5 * min_dist +
# cost = 0.5 * dist_final + 0.05 * cost_angle # TODO: Increase cost_angle weight # 0.5 * min_dist +
# reward = np.exp(-2 * cost) - 1e-2 * action_cost - self.collision_penalty * int(self._is_collided)
# reward = - dist_final**2 - 1e-4 * cost_angle - 1e-5 * action_cost - self.collision_penalty * int(self._is_collided)
reward = - dist_final**2 - min_dist_final**2 - 1e-4 * cost_angle - 1e-5 * action_cost - self.collision_penalty * int(self._is_collided)
reward = - dist_final**2 - min_dist_final**2 - 1e-4 * cost_angle - 1e-3 * action_cost - self.collision_penalty * int(self._is_collided)
success = dist_final < 0.05 and ball_in_cup and not self._is_collided
crash = self._is_collided
else:
reward = - 1e-5 * action_cost # TODO: increase action_cost weight
reward = - 1e-3 * action_cost - 1e-4 * cost_angle # TODO: increase action_cost weight
success = False
crash = False
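
As a quick sanity check on the `arctan2` expression used above (illustrative, not part of the commit): with MuJoCo's `[w, x, y, z]` quaternion convention, the formula recovers the rotation angle about the x-axis, so an upright cup at π/2 yields `cost_angle = 0`.

```python
import numpy as np

# A pure rotation by theta about the x-axis has quaternion [cos(t/2), sin(t/2), 0, 0].
theta = np.pi / 2
q = np.array([np.cos(theta / 2), np.sin(theta / 2), 0.0, 0.0])
cup_angle = np.arctan2(2 * (q[0] * q[1] + q[2] * q[3]),
                       1 - 2 * (q[1] ** 2 + q[2] ** 2))
assert np.isclose(cup_angle, theta)          # recovers theta
cost_angle = (cup_angle - np.pi / 2) ** 2    # = 0 for an upright cup
```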

View File

@ -0,0 +1,205 @@
import os
import gym.envs.mujoco
import gym.envs.mujoco as mujoco_env
import mujoco_py.builder
import numpy as np
from gym import utils
from mp_env_api.mp_wrappers.detpmp_wrapper import DetPMPWrapper
from mp_env_api.utils.policies import PDControllerExtend
def make_detpmp_env(**kwargs):
name = kwargs.pop("name")
_env = gym.make(name)
policy = PDControllerExtend(_env, p_gains=kwargs.pop('p_gains'), d_gains=kwargs.pop('d_gains'))
kwargs['policy_type'] = policy
return DetPMPWrapper(_env, **kwargs)
class ALRBallInACupPDEnv(mujoco_env.MujocoEnv, utils.EzPickle):
def __init__(self, frame_skip=4, apply_gravity_comp=True, simplified: bool = False,
reward_type: str = None, context: np.ndarray = None):
utils.EzPickle.__init__(**locals())
self._steps = 0
self.xml_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "assets", "biac_base.xml")
self.max_ctrl = np.array([150., 125., 40., 60., 5., 5., 2.])
self.j_min = np.array([-2.6, -1.985, -2.8, -0.9, -4.55, -1.5707, -2.7])
self.j_max = np.array([2.6, 1.985, 2.8, 3.14159, 1.25, 1.5707, 2.7])
self.context = context
self.apply_gravity_comp = apply_gravity_comp
self.simplified = simplified
self._start_pos = np.array([0.0, 0.58760536, 0.0, 1.36004913, 0.0, -0.32072943, -1.57])
self._start_vel = np.zeros(7)
self.sim_time = 8 # seconds
self._dt = 0.02
self.ep_length = 4000 # based on 8 seconds with dt = 0.02 int(self.sim_time / self.dt)
if reward_type == "no_context":
from alr_envs.alr.mujoco.ball_in_a_cup.ball_in_a_cup_reward_simple import BallInACupReward
reward_function = BallInACupReward
elif reward_type == "contextual_goal":
from alr_envs.alr.mujoco.ball_in_a_cup.ball_in_a_cup_reward import BallInACupReward
reward_function = BallInACupReward
else:
raise ValueError("Unknown reward type: {}".format(reward_type))
self.reward_function = reward_function(self)
mujoco_env.MujocoEnv.__init__(self, self.xml_path, frame_skip)
@property
def dt(self):
return self._dt
# TODO: @Max is this even needed?
@property
def start_vel(self):
if self.simplified:
return self._start_vel[1::2]
else:
return self._start_vel
# def _set_action_space(self):
# if self.simplified:
# bounds = self.model.actuator_ctrlrange.copy().astype(np.float32)[1::2]
# else:
# bounds = self.model.actuator_ctrlrange.copy().astype(np.float32)
# low, high = bounds.T
# self.action_space = spaces.Box(low=low, high=high, dtype=np.float32)
# return self.action_space
def reset(self):
self.reward_function.reset(None)
return super().reset()
def reset_model(self):
init_pos_all = self.init_qpos.copy()
init_pos_robot = self._start_pos
init_vel = np.zeros_like(init_pos_all)
self._steps = 0
self._q_pos = []
self._q_vel = []
start_pos = init_pos_all
start_pos[0:7] = init_pos_robot
self.set_state(start_pos, init_vel)
return self._get_obs()
def step(self, a):
reward_dist = 0.0
angular_vel = 0.0
reward_ctrl = - np.square(a).sum()
# if self.simplified:
# tmp = np.zeros(7)
# tmp[1::2] = a
# a = tmp
if self.apply_gravity_comp:
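# qfrc_bias contains MuJoCo's bias forces (gravity, Coriolis, centrifugal);
# dividing by the first gear entry converts them to actuator units so the
# commanded action additionally cancels these forces.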
a += self.sim.data.qfrc_bias[:len(a)].copy() / self.model.actuator_gear[:, 0]
crash = False
try:
self.do_simulation(a, self.frame_skip)
except mujoco_py.builder.MujocoException:
crash = True
# joint_cons_viol = self.check_traj_in_joint_limits()
ob = self._get_obs()
if not crash:
reward, success, is_collided = self.reward_function.compute_reward(a)
done = success or is_collided # self._steps == self.sim_steps - 1
self._steps += 1
else:
reward = -2000
success = False
is_collided = False
done = True
return ob, reward, done, dict(reward_dist=reward_dist,
reward_ctrl=reward_ctrl,
velocity=angular_vel,
# traj=self._q_pos,
action=a,
q_pos=self.sim.data.qpos[0:7].ravel().copy(),
q_vel=self.sim.data.qvel[0:7].ravel().copy(),
is_success=success,
is_collided=is_collided, sim_crash=crash)
def check_traj_in_joint_limits(self):
return any(self.current_pos > self.j_max) or any(self.current_pos < self.j_min)
# TODO: extend observation space
def _get_obs(self):
theta = self.sim.data.qpos.flat[:7]
return np.concatenate([
np.cos(theta),
np.sin(theta),
# self.get_body_com("target"), # only return target to make problem harder
[self._steps],
])
# These functions are for the task with 3 joint actuations
def extend_des_pos(self, des_pos):
des_pos_full = self._start_pos.copy()
des_pos_full[1] = des_pos[0]
des_pos_full[3] = des_pos[1]
des_pos_full[5] = des_pos[2]
return des_pos_full
def extend_des_vel(self, des_vel):
des_vel_full = self._start_vel.copy()
des_vel_full[1] = des_vel[0]
des_vel_full[3] = des_vel[1]
des_vel_full[5] = des_vel[2]
return des_vel_full
def render(self, render_mode, **render_kwargs):
if render_mode == "plot_trajectory":
if self._steps == 1:
import matplotlib.pyplot as plt
# plt.ion()
self.fig, self.axs = plt.subplots(3, 1)
if self._steps <= 1750:
for ax, cp in zip(self.axs, self.current_pos[1::2]):
ax.scatter(self._steps, cp, s=2, marker=".")
# self.fig.show()
else:
super().render(render_mode, **render_kwargs)
if __name__ == "__main__":
env = ALRBallInACupPDEnv(reward_type="no_context", simplified=True)
# env = gym.make("alr_envs:ALRBallInACupPDSimpleDetPMP-v0")
# ctxt = np.array([-0.20869846, -0.66376693, 1.18088501])
# env.configure(ctxt)
env.reset()
env.render("human")
for i in range(16000):
# test with random actions
ac = 0.02 * env.action_space.sample()[0:7]
# ac = env.start_pos
# ac[0] += np.pi/2
obs, rew, d, info = env.step(ac)
env.render("human")
print(rew)
if d:
break
env.close()
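
A hedged usage sketch for the `make_detpmp_env` helper defined at the top of this file: the environment id, the PD gains, and the remaining keyword arguments forwarded to `DetPMPWrapper` are placeholders, not values taken from the commit (the real configuration lives in the registration code, which is not shown in this diff). The rollout pattern shown, where a single `step` call consumes one parameter vector and executes the full trajectory, is the typical MP-wrapper interface and may differ in detail.

```python
import numpy as np

# Placeholder configuration; the id, gains, and extra DetPMP kwargs are assumptions.
env = make_detpmp_env(
    name="alr_envs:ALRBallInACupPD-v0",          # hypothetical step-based env id
    p_gains=np.array([200., 300., 100., 100., 10., 10., 2.5]),
    d_gains=np.array([7., 15., 5., 2.5, .3, .3, .05]),
    # ... any further keyword arguments are forwarded to DetPMPWrapper ...
)
params = env.action_space.sample()   # one MP parametrization = one episode
obs, episode_reward, done, info = env.step(params)
```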

View File

@ -1,6 +1,6 @@
from alr_envs.utils.mps.detpmp_wrapper import DetPMPWrapper
from alr_envs.utils.mps.dmp_wrapper import DmpWrapper
from alr_envs.mujoco.ball_in_a_cup.ball_in_a_cup import ALRBallInACupEnv
from alr_envs.alr.mujoco.ball_in_a_cup.ball_in_a_cup import ALRBallInACupEnv
from mp_env_api.mp_wrappers.detpmp_wrapper import DetPMPWrapper
from mp_env_api.mp_wrappers.dmp_wrapper import DmpWrapper
def make_contextual_env(rank, seed=0):
@ -26,7 +26,7 @@ def make_contextual_env(rank, seed=0):
return _init
def make_env(rank, seed=0):
def _make_env(rank, seed=0):
"""
Utility function for multiprocessed env.

View File

@ -1,11 +1,15 @@
from gym import utils
import os
import numpy as np
from alr_envs.mujoco import alr_mujoco_env
from gym import utils
from gym.envs.mujoco import MujocoEnv
class ALRBeerpongEnv(alr_mujoco_env.AlrMujocoEnv, utils.EzPickle):
def __init__(self, n_substeps=4, apply_gravity_comp=True, reward_function=None):
class ALRBeerpongEnv(MujocoEnv, utils.EzPickle):
def __init__(self, model_path, frame_skip, n_substeps=4, apply_gravity_comp=True, reward_function=None):
utils.EzPickle.__init__(**locals())
MujocoEnv.__init__(self, model_path=model_path, frame_skip=frame_skip)
self._steps = 0
self.xml_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "assets",
@ -25,17 +29,15 @@ class ALRBeerpongEnv(alr_mujoco_env.AlrMujocoEnv, utils.EzPickle):
self.j_max = np.array([2.6, 1.985, 2.8, 3.14159, 1.25, 1.5707, 2.7])
self.context = None
utils.EzPickle.__init__(self)
alr_mujoco_env.AlrMujocoEnv.__init__(self,
self.xml_path,
apply_gravity_comp=apply_gravity_comp,
n_substeps=n_substeps)
# alr_mujoco_env.AlrMujocoEnv.__init__(self,
# self.xml_path,
# apply_gravity_comp=apply_gravity_comp,
# n_substeps=n_substeps)
self.sim_time = 8 # seconds
self.sim_steps = int(self.sim_time / self.dt)
if reward_function is None:
from alr_envs.mujoco.beerpong.beerpong_reward import BeerpongReward
from alr_envs.alr.mujoco.beerpong.beerpong_reward import BeerpongReward
reward_function = BeerpongReward
self.reward_function = reward_function(self.sim, self.sim_steps)
self.cup_robot_id = self.sim.model._site_name2id["cup_robot_final"]

View File

@ -1,5 +1,5 @@
import numpy as np
from alr_envs.mujoco import alr_reward_fct
from alr_envs.alr.mujoco import alr_reward_fct
class BeerpongReward(alr_reward_fct.AlrReward):

View File

@ -1,5 +1,5 @@
import numpy as np
from alr_envs.mujoco import alr_reward_fct
from alr_envs.alr.mujoco import alr_reward_fct
class BeerpongReward(alr_reward_fct.AlrReward):

View File

@ -1,11 +1,13 @@
from gym import utils
import os
import numpy as np
from alr_envs.mujoco import alr_mujoco_env
from gym.envs.mujoco import MujocoEnv
class ALRBeerpongEnv(alr_mujoco_env.AlrMujocoEnv, utils.EzPickle):
class ALRBeerpongEnv(MujocoEnv, utils.EzPickle):
def __init__(self, n_substeps=4, apply_gravity_comp=True, reward_function=None):
utils.EzPickle.__init__(**locals())
self._steps = 0
self.xml_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "assets",
@ -26,16 +28,17 @@ class ALRBeerpongEnv(alr_mujoco_env.AlrMujocoEnv, utils.EzPickle):
self.context = None
utils.EzPickle.__init__(self)
alr_mujoco_env.AlrMujocoEnv.__init__(self,
self.xml_path,
apply_gravity_comp=apply_gravity_comp,
n_substeps=n_substeps)
MujocoEnv.__init__(self, model_path=self.xml_path, frame_skip=n_substeps)
# alr_mujoco_env.AlrMujocoEnv.__init__(self,
# self.xml_path,
# apply_gravity_comp=apply_gravity_comp,
# n_substeps=n_substeps)
self.sim_time = 8 # seconds
self.sim_steps = int(self.sim_time / self.dt)
if reward_function is None:
from alr_envs.mujoco.beerpong.beerpong_reward_simple import BeerpongReward
from alr_envs.alr.mujoco.beerpong.beerpong_reward_simple import BeerpongReward
reward_function = BeerpongReward
self.reward_function = reward_function(self.sim, self.sim_steps)
self.cup_robot_id = self.sim.model._site_name2id["cup_robot_final"]

View File

@ -1,6 +1,6 @@
from alr_envs.utils.mps.detpmp_wrapper import DetPMPWrapper
from alr_envs.mujoco.beerpong.beerpong import ALRBeerpongEnv
from alr_envs.mujoco.beerpong.beerpong_simple import ALRBeerpongEnv as ALRBeerpongEnvSimple
from alr_envs.alr.mujoco.beerpong.beerpong import ALRBeerpongEnv
from alr_envs.alr.mujoco.beerpong.beerpong_simple import ALRBeerpongEnv as ALRBeerpongEnvSimple
def make_contextual_env(rank, seed=0):
@ -26,7 +26,7 @@ def make_contextual_env(rank, seed=0):
return _init
def make_env(rank, seed=0):
def _make_env(rank, seed=0):
"""
Utility function for multiprocessed env.

View File

@ -2,9 +2,9 @@ import numpy as np
from gym import spaces
from gym.envs.robotics import robot_env, utils
# import xml.etree.ElementTree as ET
from alr_envs.mujoco.gym_table_tennis.utils.rewards.hierarchical_reward import HierarchicalRewardTableTennis
from alr_envs.alr.mujoco.gym_table_tennis.utils.rewards.hierarchical_reward import HierarchicalRewardTableTennis
import glfw
from alr_envs.mujoco.gym_table_tennis.utils.experiment import ball_initialize
from alr_envs.alr.mujoco.gym_table_tennis.utils.experiment import ball_initialize
from pathlib import Path
import os

View File

@ -1,6 +1,6 @@
import numpy as np
from gym.utils import seeding
from alr_envs.mujoco.gym_table_tennis.utils.util import read_yaml, read_json
from alr_envs.alr.mujoco.gym_table_tennis.utils.util import read_yaml, read_json
from pathlib import Path

View File

@ -39,7 +39,7 @@ def config_save(dir_path, config):
def change_kp_in_xml(kp_list,
model_path="/home/zhou/slow/table_tennis_rl/simulation/gymTableTennis/gym_table_tennis/envs/robotics/assets/table_tennis/right_arm_actuator.xml"):
model_path="/home/zhou/slow/table_tennis_rl/simulation/gymTableTennis/gym_table_tennis/simple_reacher/robotics/assets/table_tennis/right_arm_actuator.xml"):
tree = ET.parse(model_path)
root = tree.getroot()
# for actuator in root.find("actuator"):
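
For reference, a minimal sketch of the ElementTree pattern `change_kp_in_xml` builds on; the assumption that the actuator children carry a `kp` attribute follows typical MuJoCo `<position>` actuator XML and is not taken from this diff.

```python
import xml.etree.ElementTree as ET

def set_position_gains(xml_path, kp_list):
    # Assumed layout: <actuator><position ... kp="..."/> ... </actuator>
    tree = ET.parse(xml_path)
    root = tree.getroot()
    for actuator, kp in zip(root.find("actuator"), kp_list):
        actuator.set("kp", str(kp))
    tree.write(xml_path)
```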

Some files were not shown because too many files have changed in this diff