Merge remote-tracking branch 'origin/master'

# Conflicts:
#	alr_envs/__init__.py
#	alr_envs/alr/classic_control/hole_reacher/hole_reacher.py
#	alr_envs/utils/mps/mp_wrapper.py
Commit: 8d1530823c
.gitignore (vendored, 2 additions)

@@ -109,3 +109,5 @@ venv.bak/
 #configs
 /configs/db.cfg
+legacy/
+MUJOCO_LOG.TXT
MUJOCO_LOG.TXT (deleted, 15 lines)

@@ -1,15 +0,0 @@
-Fri Aug 28 14:41:56 2020
-ERROR: GLEW initalization error: Missing GL version
-
-Fri Aug 28 14:59:14 2020
-ERROR: GLEW initalization error: Missing GL version
-
-Fri Aug 28 15:03:43 2020
-ERROR: GLEW initalization error: Missing GL version
-
-Fri Aug 28 15:07:03 2020
-ERROR: GLEW initalization error: Missing GL version
-
-Fri Aug 28 15:15:01 2020
-ERROR: GLEW initalization error: Missing GL version
README.md (239 changed lines)

@@ -1,87 +1,212 @@
-## ALR Environments
+## ALR Robotics Control Environments
 
-This repository collects custom Robotics environments not included in benchmark suites like OpenAI gym, rllab, etc.
-Creating a custom (Mujoco) gym environment can be done according to [this guide](https://github.com/openai/gym/blob/master/docs/creating-environments.md).
-For stochastic search problems with gym interface use the `Rosenbrock-v0` reference implementation.
-We also support to solve environments with DMPs. When adding new DMP tasks check the `ViaPointReacherDMP-v0` reference implementation.
-When simply using the tasks, you can also leverage the wrapper class `DmpWrapper` to turn normal gym environments in to DMP tasks.
+This project offers a large variety of reinforcement learning environments under a unifying interface based on OpenAI gym.
+Besides some custom environments, we also provide support for the benchmark suites
+[OpenAI gym](https://gym.openai.com/),
+[DeepMind Control](https://deepmind.com/research/publications/2020/dm-control-Software-and-Tasks-for-Continuous-Control)
+(DMC), and [Metaworld](https://meta-world.github.io/). Custom (Mujoco) gym environments can be created according
+to [this guide](https://github.com/openai/gym/blob/master/docs/creating-environments.md). Unlike existing libraries, we
+additionally support controlling agents with Dynamic Movement Primitives (DMPs) and Probabilistic Movement Primitives (DetPMP,
+we usually only consider the mean).
 
-## Environments
-Currently we have the following environments:
+## Motion Primitive Environments (Episodic environments)
 
-### Mujoco
+Unlike step-based environments, motion primitive (MP) environments are more closely related to stochastic search, black-box
+optimization, and methods that are often used in robotics. MP environments are trajectory-based and always execute a full
+trajectory, which is generated by a Dynamic Motion Primitive (DMP) or a Probabilistic Motion Primitive (DetPMP). The
+generated trajectory is translated into individual step-wise actions by a controller. The exact choice of controller
+depends on the type of environment. We currently support position, velocity, and PD-controllers for position,
+velocity, and torque control, respectively. The goal of all MP environments is still to learn a policy. Yet, an action
+represents the parametrization of the motion primitives used to generate a suitable trajectory. Additionally, this
+framework supports a contextual setting, for which we expose all changing substates of the
+task as a single observation at the beginning of the episode. This requires predicting a new action/MP parametrization for each
+trajectory. Besides the cumulative episode reward, all environments also provide the information collected in each
+step as part of the info dictionary. This information should, however, mainly be used for debugging and logging.
 
-|Name| Description|Horizon|Action Dimension|Observation Dimension
-|---|---|---|---|---|
-|`ALRReacher-v0`|Modified (5 links) Mujoco gym's `Reacher-v2` (2 links)| 200 | 5 | 21
-|`ALRReacherSparse-v0`|Same as `ALRReacher-v0`, but the distance penalty is only provided in the last time step.| 200 | 5 | 21
-|`ALRReacherSparseBalanced-v0`|Same as `ALRReacherSparse-v0`, but the end-effector has to remain upright.| 200 | 5 | 21
-|`ALRLongReacher-v0`|Modified (7 links) Mujoco gym's `Reacher-v2` (2 links)| 200 | 7 | 27
-|`ALRLongReacherSparse-v0`|Same as `ALRLongReacher-v0`, but the distance penalty is only provided in the last time step.| 200 | 7 | 27
-|`ALRLongReacherSparseBalanced-v0`|Same as `ALRLongReacherSparse-v0`, but the end-effector has to remain upright.| 200 | 7 | 27
-|`ALRBallInACupSimple-v0`| Ball-in-a-cup task where a robot needs to catch a ball attached to a cup at its end-effector. | 4000 | 3 | wip
-|`ALRBallInACup-v0`| Ball-in-a-cup task where a robot needs to catch a ball attached to a cup at its end-effector | 4000 | 7 | wip
-|`ALRBallInACupGoal-v0`| Similiar to `ALRBallInACupSimple-v0` but the ball needs to be caught at a specified goal position | 4000 | 7 | wip
+|Key| Description|
+|---|---|
+|`trajectory`| Generated trajectory from MP|
+|`step_actions`| Step-wise executed action based on controller output|
+|`step_observations`| Step-wise intermediate observations|
+|`step_rewards`| Step-wise rewards|
+|`trajectory_length`| Total number of environment interactions|
+|`other`| All other information from the underlying environment is returned as a list of length `trajectory_length`, maintaining the original key. In case some information is not provided in every time step, the missing values are filled with `None`.|
 
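To make the role of this info dictionary concrete, here is a minimal sketch of reading it after a single MP rollout; the environment id is one of the registered examples further below, and the exact contents beyond the keys in the table above are an assumption.

```python
import alr_envs

# Any registered MP id works the same way; HoleReacherDMP-v0 is one of the examples below.
env = alr_envs.make("HoleReacherDMP-v0", seed=1)
env.reset()

# One step executes the whole trajectory generated from the MP parametrization.
obs, episode_reward, done, info = env.step(env.action_space.sample())

print(info["trajectory_length"])   # total number of low-level environment interactions
print(len(info["step_rewards"]))   # one entry per executed step
```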
-### Classic Control
+## Installation
 
-|Name| Description|Horizon|Action Dimension|Observation Dimension
-|---|---|---|---|---|
-|`SimpleReacher-v0`| Simple reaching task (2 links) without any physics simulation. Provides no reward until 150 time steps. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory.| 200 | 2 | 9
-|`LongSimpleReacher-v0`| Simple reaching task (5 links) without any physics simulation. Provides no reward until 150 time steps. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory.| 200 | 5 | 18
-|`ViaPointReacher-v0`| Simple reaching task leveraging a via point, which supports self collision detection. Provides a reward only at 100 and 199 for reaching the viapoint and goal point, respectively.| 200 | 5 | 18
-|`HoleReacher-v0`| 5 link reaching task where the end-effector needs to reach into a narrow hole without collding with itself or walls | 200 | 5 | 18
-
-### DMP Environments
-These environments are closer to stochastic search. They always execute a full trajectory, which is computed by a DMP and executed by a controller, e.g. a PD controller.
-The goal is to learn the parameters of this DMP to generate a suitable trajectory.
-All environments provide the full episode reward and additional information about early terminations, e.g. due to collisions.
-
-|Name| Description|Horizon|Action Dimension|Context Dimension
-|---|---|---|---|---|
-|`ViaPointReacherDMP-v0`| A DMP provides a trajectory for the `ViaPointReacher-v0` task. | 200 | 25
-|`HoleReacherFixedGoalDMP-v0`| A DMP provides a trajectory for the `HoleReacher-v0` task with a fixed goal attractor. | 200 | 25
-|`HoleReacherDMP-v0`| A DMP provides a trajectory for the `HoleReacher-v0` task. The goal attractor needs to be learned. | 200 | 30
-|`ALRBallInACupSimpleDMP-v0`| A DMP provides a trajectory for the `ALRBallInACupSimple-v0` task where only 3 joints are actuated. | 4000 | 15
-|`ALRBallInACupDMP-v0`| A DMP provides a trajectory for the `ALRBallInACup-v0` task. | 4000 | 35
-|`ALRBallInACupGoalDMP-v0`| A DMP provides a trajectory for the `ALRBallInACupGoal-v0` task. | 4000 | 35 | 3
-
-[//]: |`HoleReacherDetPMP-v0`|
-
-### Stochastic Search
-|Name| Description|Horizon|Action Dimension|Observation Dimension
-|---|---|---|---|---|
-|`Rosenbrock{dim}-v0`| Gym interface for Rosenbrock function. `{dim}` is one of 5, 10, 25, 50 or 100. | 1 | `{dim}` | 0
-
-## Install
 
 1. Clone the repository
 
 ```bash
 git clone git@github.com:ALRhub/alr_envs.git
 ```
 
 2. Go to the folder
 
 ```bash
 cd alr_envs
 ```
 
 3. Install with
 
 ```bash
 pip install -e .
 ```
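After step 3, a quick sanity check of the editable install is to import the package and list a few of the registered MP ids; this snippet is an illustration and not part of the original README (it relies on `ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS`, which is defined in `alr_envs/alr/__init__.py` below).

```python
import alr_envs

# Importing the package registers all environments; printing a few ids is a cheap smoke test.
print(alr_envs.ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"][:3])
print(alr_envs.ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"][:3])
```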
-4. Use (see [example.py](./example.py)):
-```python
-import gym
 
-env = gym.make('alr_envs:SimpleReacher-v0')
+## Using the framework
 
+We prepared [multiple examples](alr_envs/examples/); please have a look there for more specific use cases.
+
+### Step-wise environments
+
+```python
+import alr_envs
+
+env = alr_envs.make('HoleReacher-v0', seed=1)
 state = env.reset()
 
-for i in range(10000):
+for i in range(1000):
     state, reward, done, info = env.step(env.action_space.sample())
     if i % 5 == 0:
         env.render()
 
     if done:
         state = env.reset()
+```
 
+For DeepMind Control tasks we expect the `env_id` to be specified as `domain_name-task_name`; for manipulation tasks,
+use `manipulation-environment_name`. All other environments can be created based on their original name.
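As a concrete illustration of this naming scheme, the following minimal sketch creates a DMC-based environment; the chosen id is just an example taken from the ball-in-cup snippet further below.

```python
import alr_envs

# DeepMind Control: "<domain_name>-<task_name>", here the cup-and-ball domain with the catch task.
env = alr_envs.make("ball_in_cup-catch", seed=1)
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
```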
 
+Existing MP tasks can be created the same way as above. Just keep in mind that calling `step()` always executes a full
+trajectory.
 
+```python
+import alr_envs
+
+env = alr_envs.make('HoleReacherDetPMP-v0', seed=1)
+# render() can be called once in the beginning with all necessary arguments. To turn it off again, just call render(None).
+env.render()
+
+state = env.reset()
+
+for i in range(5):
+    state, reward, done, info = env.step(env.action_space.sample())
+
+    # Not strictly necessary, as the environment resets itself after each trajectory anyway.
+    state = env.reset()
+```
 
+To show all available environments, we provide some additional convenience variables. Each of them returns a dictionary
+with the two keys `DMP` and `DetPMP` that store a list of available environment names.
 
+```python
+import alr_envs
+
+print("Custom MP tasks:")
+print(alr_envs.ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS)
+
+print("OpenAI Gym MP tasks:")
+print(alr_envs.ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS)
+
+print("Deepmind Control MP tasks:")
+print(alr_envs.ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS)
+
+print("MetaWorld MP tasks:")
+print(alr_envs.ALL_METAWORLD_MOTION_PRIMITIVE_ENVIRONMENTS)
+```
 
+### How to create a new MP task
 
+In case a required task is not yet supported in the MP framework, it can be created relatively easily. For the task at
+hand, the following interface needs to be implemented.
 
+```python
+import numpy as np
+from mp_env_api import MPEnvWrapper
+
+
+class MPWrapper(MPEnvWrapper):
+
+    @property
+    def active_obs(self):
+        """
+        Returns a boolean mask for each substate in the full observation.
+        It determines whether the observation is returned for the contextual case or not.
+        This effectively allows to filter unwanted or unnecessary observations from the full step-based case.
+        E.g. velocities starting at 0 only change after the first action. Given we only receive the first
+        observation, the velocities are not necessary in the observation for the MP task.
+        """
+        return np.ones(self.observation_space.shape, dtype=bool)
+
+    @property
+    def current_vel(self):
+        """
+        Returns the current velocity of the action/control dimension.
+        The dimensionality has to match the action/control dimension.
+        This is not required when exclusively using position control;
+        it should, however, be implemented regardless.
+        E.g. the joint velocities that are directly or indirectly controlled by the action.
+        """
+        raise NotImplementedError()
+
+    @property
+    def current_pos(self):
+        """
+        Returns the current position of the action/control dimension.
+        The dimensionality has to match the action/control dimension.
+        This is not required when exclusively using velocity control;
+        it should, however, be implemented regardless.
+        E.g. the joint positions that are directly or indirectly controlled by the action.
+        """
+        raise NotImplementedError()
+
+    @property
+    def goal_pos(self):
+        """
+        Returns a predefined final position of the action/control dimension.
+        This is only required for the DMP and is most of the time learned instead.
+        """
+        raise NotImplementedError()
+
+    @property
+    def dt(self):
+        """
+        Returns the time between two simulated steps of the environment.
+        """
+        raise NotImplementedError()
+
 ```
 
-For an example using a DMP wrapped env and asynchronous sampling look at [mp_env_async_sampler.py](./alr_envs/utils/mp_env_async_sampler.py)
+If you created a new task wrapper, feel free to open a PR so we can integrate it for others to use as well.
+Without the integration the task can still be used. A rough outline is shown here; for more details we recommend
+having a look at the [examples](alr_envs/examples/).
 
+```python
+import alr_envs
+
+# Base environment name, according to the structure of the example above
+base_env_id = "ball_in_cup-catch"
+
+# Replace this wrapper with the custom wrapper for your environment by inheriting from MPEnvWrapper.
+# You can also add other gym.Wrappers in case they are needed,
+# e.g. gym.wrappers.FlattenObservation for dict observations.
+wrappers = [alr_envs.dmc.suite.ball_in_cup.MPWrapper]
+mp_kwargs = {...}
+kwargs = {...}
+env = alr_envs.make_dmp_env(base_env_id, wrappers=wrappers, seed=1, mp_kwargs=mp_kwargs, **kwargs)
+# OR for a deterministic ProMP (other mp_kwargs are required):
+# env = alr_envs.make_detpmp_env(base_env, wrappers=wrappers, seed=seed, mp_kwargs=mp_args)
+
+rewards = 0
+obs = env.reset()
+
+# number of samples/full trajectories (multiple environment steps)
+for i in range(5):
+    ac = env.action_space.sample()
+    obs, reward, done, info = env.step(ac)
+    rewards += reward
+
+    if done:
+        print(base_env_id, rewards)
+        rewards = 0
+        obs = env.reset()
+```
alr_envs/alr/__init__.py (new file, 329 lines)

@@ -0,0 +1,329 @@
from gym import register

from . import classic_control, mujoco
from .classic_control.hole_reacher.hole_reacher import HoleReacherEnv
from .classic_control.simple_reacher.simple_reacher import SimpleReacherEnv
from .classic_control.viapoint_reacher.viapoint_reacher import ViaPointReacherEnv
from .mujoco.ball_in_a_cup.ball_in_a_cup import ALRBallInACupEnv
from .mujoco.ball_in_a_cup.biac_pd import ALRBallInACupPDEnv
from .mujoco.reacher.alr_reacher import ALRReacherEnv
from .mujoco.reacher.balancing import BalancingEnv

ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS = {"DMP": [], "DetPMP": []}

# Classic Control
## Simple Reacher
register(
    id='SimpleReacher-v0',
    entry_point='alr_envs.alr.classic_control:SimpleReacherEnv',
    max_episode_steps=200,
    kwargs={
        "n_links": 2,
    }
)

register(
    id='SimpleReacher-v1',
    entry_point='alr_envs.alr.classic_control:SimpleReacherEnv',
    max_episode_steps=200,
    kwargs={
        "n_links": 2,
        "random_start": False
    }
)

register(
    id='LongSimpleReacher-v0',
    entry_point='alr_envs.alr.classic_control:SimpleReacherEnv',
    max_episode_steps=200,
    kwargs={
        "n_links": 5,
    }
)

register(
    id='LongSimpleReacher-v1',
    entry_point='alr_envs.alr.classic_control:SimpleReacherEnv',
    max_episode_steps=200,
    kwargs={
        "n_links": 5,
        "random_start": False
    }
)

## Viapoint Reacher

register(
    id='ViaPointReacher-v0',
    entry_point='alr_envs.alr.classic_control:ViaPointReacherEnv',
    max_episode_steps=200,
    kwargs={
        "n_links": 5,
        "allow_self_collision": False,
        "collision_penalty": 1000
    }
)

## Hole Reacher
register(
    id='HoleReacher-v0',
    entry_point='alr_envs.alr.classic_control:HoleReacherEnv',
    max_episode_steps=200,
    kwargs={
        "n_links": 5,
        "random_start": True,
        "allow_self_collision": False,
        "allow_wall_collision": False,
        "hole_width": None,
        "hole_depth": 1,
        "hole_x": None,
        "collision_penalty": 100,
    }
)

register(
    id='HoleReacher-v1',
    entry_point='alr_envs.alr.classic_control:HoleReacherEnv',
    max_episode_steps=200,
    kwargs={
        "n_links": 5,
        "random_start": False,
        "allow_self_collision": False,
        "allow_wall_collision": False,
        "hole_width": 0.25,
        "hole_depth": 1,
        "hole_x": None,
        "collision_penalty": 100,
    }
)

register(
    id='HoleReacher-v2',
    entry_point='alr_envs.alr.classic_control:HoleReacherEnv',
    max_episode_steps=200,
    kwargs={
        "n_links": 5,
        "random_start": False,
        "allow_self_collision": False,
        "allow_wall_collision": False,
        "hole_width": 0.25,
        "hole_depth": 1,
        "hole_x": 2,
        "collision_penalty": 100,
    }
)

# Mujoco

## Reacher
register(
    id='ALRReacher-v0',
    entry_point='alr_envs.alr.mujoco:ALRReacherEnv',
    max_episode_steps=200,
    kwargs={
        "steps_before_reward": 0,
        "n_links": 5,
        "balance": False,
    }
)

register(
    id='ALRReacherSparse-v0',
    entry_point='alr_envs.alr.mujoco:ALRReacherEnv',
    max_episode_steps=200,
    kwargs={
        "steps_before_reward": 200,
        "n_links": 5,
        "balance": False,
    }
)

register(
    id='ALRReacherSparseBalanced-v0',
    entry_point='alr_envs.alr.mujoco:ALRReacherEnv',
    max_episode_steps=200,
    kwargs={
        "steps_before_reward": 200,
        "n_links": 5,
        "balance": True,
    }
)

register(
    id='ALRLongReacher-v0',
    entry_point='alr_envs.alr.mujoco:ALRReacherEnv',
    max_episode_steps=200,
    kwargs={
        "steps_before_reward": 0,
        "n_links": 7,
        "balance": False,
    }
)

register(
    id='ALRLongReacherSparse-v0',
    entry_point='alr_envs.alr.mujoco:ALRReacherEnv',
    max_episode_steps=200,
    kwargs={
        "steps_before_reward": 200,
        "n_links": 7,
        "balance": False,
    }
)

register(
    id='ALRLongReacherSparseBalanced-v0',
    entry_point='alr_envs.alr.mujoco:ALRReacherEnv',
    max_episode_steps=200,
    kwargs={
        "steps_before_reward": 200,
        "n_links": 7,
        "balance": True,
    }
)

## Balancing Reacher

register(
    id='Balancing-v0',
    entry_point='alr_envs.alr.mujoco:BalancingEnv',
    max_episode_steps=200,
    kwargs={
        "n_links": 5,
    }
)

# Motion Primitive Environments

## Simple Reacher
_versions = ["SimpleReacher-v0", "SimpleReacher-v1", "LongSimpleReacher-v0", "LongSimpleReacher-v1"]
for _v in _versions:
    _name = _v.split("-")
    _env_id = f'{_name[0]}DMP-{_name[1]}'
    register(
        id=_env_id,
        entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper',
        # max_episode_steps=1,
        kwargs={
            "name": f"alr_envs:{_v}",
            "wrappers": [classic_control.simple_reacher.MPWrapper],
            "mp_kwargs": {
                "num_dof": 2 if "long" not in _v.lower() else 5,
                "num_basis": 5,
                "duration": 20,
                "alpha_phase": 2,
                "learn_goal": True,
                "policy_type": "velocity",
                "weights_scale": 50,
            }
        }
    )
    ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id)

    _env_id = f'{_name[0]}DetPMP-{_name[1]}'
    register(
        id=_env_id,
        entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper',
        # max_episode_steps=1,
        kwargs={
            "name": f"alr_envs:{_v}",
            "wrappers": [classic_control.simple_reacher.MPWrapper],
            "mp_kwargs": {
                "num_dof": 2 if "long" not in _v.lower() else 5,
                "num_basis": 5,
                "duration": 20,
                "width": 0.025,
                "policy_type": "velocity",
                "weights_scale": 0.2,
                "zero_start": True
            }
        }
    )
    ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append(_env_id)

# Viapoint reacher
register(
    id='ViaPointReacherDMP-v0',
    entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper',
    # max_episode_steps=1,
    kwargs={
        "name": "alr_envs:ViaPointReacher-v0",
        "wrappers": [classic_control.viapoint_reacher.MPWrapper],
        "mp_kwargs": {
            "num_dof": 5,
            "num_basis": 5,
            "duration": 2,
            "learn_goal": True,
            "alpha_phase": 2,
            "policy_type": "velocity",
            "weights_scale": 50,
        }
    }
)
ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"].append("ViaPointReacherDMP-v0")

register(
    id='ViaPointReacherDetPMP-v0',
    entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper',
    # max_episode_steps=1,
    kwargs={
        "name": "alr_envs:ViaPointReacher-v0",
        "wrappers": [classic_control.viapoint_reacher.MPWrapper],
        "mp_kwargs": {
            "num_dof": 5,
            "num_basis": 5,
            "duration": 2,
            "width": 0.025,
            "policy_type": "velocity",
            "weights_scale": 0.2,
            "zero_start": True
        }
    }
)
ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append("ViaPointReacherDetPMP-v0")

## Hole Reacher
_versions = ["v0", "v1", "v2"]
for _v in _versions:
    _env_id = f'HoleReacherDMP-{_v}'
    register(
        id=_env_id,
        entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper',
        # max_episode_steps=1,
        kwargs={
            "name": f"alr_envs:HoleReacher-{_v}",
            "wrappers": [classic_control.hole_reacher.MPWrapper],
            "mp_kwargs": {
                "num_dof": 5,
                "num_basis": 5,
                "duration": 2,
                "learn_goal": True,
                "alpha_phase": 2,
                "bandwidth_factor": 2,
                "policy_type": "velocity",
                "weights_scale": 50,
                "goal_scale": 0.1
            }
        }
    )
    ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id)

    _env_id = f'HoleReacherDetPMP-{_v}'
    register(
        id=_env_id,
        entry_point='alr_envs.utils.make_env_helpers:make_detpmp_env_helper',
        kwargs={
            "name": f"alr_envs:HoleReacher-{_v}",
            "wrappers": [classic_control.hole_reacher.MPWrapper],
            "mp_kwargs": {
                "num_dof": 5,
                "num_basis": 5,
                "duration": 2,
                "width": 0.025,
                "policy_type": "velocity",
                "weights_scale": 0.2,
                "zero_start": True
            }
        }
    )
    ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["DetPMP"].append(_env_id)
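A sketch of instantiating one of the ids registered above (DMP and DetPMP variants are created the same way; `alr_envs.make` is the helper the README uses):

```python
import alr_envs

env = alr_envs.make("HoleReacherDetPMP-v0", seed=1)
obs = env.reset()
# One call to step() rolls out the full DetPMP trajectory with the velocity controller
# configured in the registration above.
obs, reward, done, info = env.step(env.action_space.sample())
print(done, reward)
```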
alr_envs/alr/classic_control/README.MD (new file, 21 lines)

@@ -0,0 +1,21 @@
### Classic Control

## Step-based Environments
|Name| Description|Horizon|Action Dimension|Observation Dimension|
|---|---|---|---|---|
|`SimpleReacher-v0`| Simple reaching task (2 links) without any physics simulation. Provides no reward until 150 time steps. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory.| 200 | 2 | 9 |
|`LongSimpleReacher-v0`| Simple reaching task (5 links) without any physics simulation. Provides no reward until 150 time steps. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory.| 200 | 5 | 18 |
|`ViaPointReacher-v0`| Simple reaching task leveraging a via point, which supports self-collision detection. Provides a reward only at 100 and 199 for reaching the viapoint and goal point, respectively.| 200 | 5 | 18 |
|`HoleReacher-v0`| 5-link reaching task where the end-effector needs to reach into a narrow hole without colliding with itself or walls. | 200 | 5 | 18 |

## MP Environments
|Name| Description|Horizon|Action Dimension|Context Dimension|
|---|---|---|---|---|
|`ViaPointReacherDMP-v0`| A DMP provides a trajectory for the `ViaPointReacher-v0` task. | 200 | 25 | |
|`HoleReacherFixedGoalDMP-v0`| A DMP provides a trajectory for the `HoleReacher-v0` task with a fixed goal attractor. | 200 | 25 | |
|`HoleReacherDMP-v0`| A DMP provides a trajectory for the `HoleReacher-v0` task. The goal attractor needs to be learned. | 200 | 30 | |
|`ALRBallInACupSimpleDMP-v0`| A DMP provides a trajectory for the `ALRBallInACupSimple-v0` task where only 3 joints are actuated. | 4000 | 15 | |
|`ALRBallInACupDMP-v0`| A DMP provides a trajectory for the `ALRBallInACup-v0` task. | 4000 | 35 | |
|`ALRBallInACupGoalDMP-v0`| A DMP provides a trajectory for the `ALRBallInACupGoal-v0` task. | 4000 | 35 | 3 |

[//]: |`HoleReacherDetPMP-v0`|
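A short sketch relating the two tables: a step-based id runs one low-level step per `step()` call, while the corresponding MP id rolls out a whole trajectory per call (ids as registered in `alr_envs/alr/__init__.py`).

```python
import alr_envs

step_env = alr_envs.make("ViaPointReacher-v0", seed=1)    # step-based, horizon 200
mp_env = alr_envs.make("ViaPointReacherDMP-v0", seed=1)   # episodic, 25 MP parameters

step_env.reset()
for _ in range(200):
    _, r, done, _ = step_env.step(step_env.action_space.sample())
    if done:
        break

mp_env.reset()
_, episode_reward, done, info = mp_env.step(mp_env.action_space.sample())
```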
alr_envs/alr/classic_control/__init__.py (new file, 3 lines)

@@ -0,0 +1,3 @@
from .hole_reacher.hole_reacher import HoleReacherEnv
from .simple_reacher.simple_reacher import SimpleReacherEnv
from .viapoint_reacher.viapoint_reacher import ViaPointReacherEnv
alr_envs/alr/classic_control/hole_reacher/__init__.py (new file, 1 line)

@@ -0,0 +1 @@
from .mp_wrapper import MPWrapper
alr_envs/alr/classic_control/hole_reacher/hole_reacher.py

@@ -6,11 +6,10 @@ import numpy as np
 from gym.utils import seeding
 from matplotlib import patches
 
-from alr_envs.classic_control.utils import check_self_collision
-from alr_envs.utils.mps.mp_environments import AlrEnv
+from alr_envs.alr.classic_control.utils import check_self_collision
 
 
-class HoleReacherEnv(AlrEnv):
+class HoleReacherEnv(gym.Env):
 
     def __init__(self, n_links: int, hole_x: Union[None, float] = None, hole_depth: Union[None, float] = None,
                  hole_width: float = 1., random_start: bool = False, allow_self_collision: bool = False,
@@ -22,14 +21,14 @@ class HoleReacherEnv(AlrEnv):
         self.random_start = random_start
 
         # provided initial parameters
-        self._hole_x = hole_x  # x-position of center of hole
-        self._hole_width = hole_width  # width of hole
-        self._hole_depth = hole_depth  # depth of hole
+        self.initial_x = hole_x  # x-position of center of hole
+        self.initial_width = hole_width  # width of hole
+        self.initial_depth = hole_depth  # depth of hole
 
         # temp container for current env state
-        self._tmp_hole_x = None
-        self._tmp_hole_width = None
-        self._tmp_hole_depth = None
+        self._tmp_x = None
+        self._tmp_width = None
+        self._tmp_depth = None
         self._goal = None  # x-y coordinates for reaching the center at the bottom of the hole
 
         # collision
@@ -44,7 +43,7 @@ class HoleReacherEnv(AlrEnv):
         self._start_pos = np.hstack([[np.pi / 2], np.zeros(self.n_links - 1)])
         self._start_vel = np.zeros(self.n_links)
 
-        self.dt = 0.01
+        self._dt = 0.01
 
         action_bound = np.pi * np.ones((self.n_links,))
         state_bound = np.hstack([
@@ -66,6 +65,22 @@ class HoleReacherEnv(AlrEnv):
         self._steps = 0
         self.seed()
 
+    @property
+    def dt(self) -> Union[float, int]:
+        return self._dt
+
+    # @property
+    # def start_pos(self):
+    #     return self._start_pos
+
+    @property
+    def current_pos(self):
+        return self._joint_angles.copy()
+
+    @property
+    def current_vel(self):
+        return self._angle_velocity.copy()
+
     def step(self, action: np.ndarray):
         """
         A single step with an action in joint velocity space
@@ -88,7 +103,7 @@ class HoleReacherEnv(AlrEnv):
 
     def reset(self):
         if self.random_start:
-            # Maybe change more than dirst seed
+            # Maybe change more than first seed
            first_joint = self.np_random.uniform(np.pi / 4, 3 * np.pi / 4)
            self._joint_angles = np.hstack([[first_joint], np.zeros(self.n_links - 1)])
            self._start_pos = self._joint_angles.copy()
@@ -107,30 +122,27 @@ class HoleReacherEnv(AlrEnv):
         return self._get_obs().copy()
 
     def _generate_hole(self):
-        if self._hole_width is None:
-            width = self.np_random.uniform(0.15, 0.5, 1)
+        if self.initial_width is None:
+            width = self.np_random.uniform(0.15, 0.5)
         else:
-            width = np.copy(self._hole_width)
+            width = np.copy(self.initial_width)
 
-        if self._hole_x is None:
+        if self.initial_x is None:
             # sample whole on left or right side
-            direction = np.random.choice([-1, 1])
+            direction = self.np_random.choice([-1, 1])
             # Hole center needs to be half the width away from the arm to give a valid setting.
-            x = direction * self.np_random.uniform(width / 2, 3.5, 1)
+            x = direction * self.np_random.uniform(width / 2, 3.5)
         else:
-            x = np.copy(self._hole_x)
+            x = np.copy(self.initial_x)
 
-        if self._hole_depth is None:
+        if self.initial_depth is None:
             # TODO we do not want this right now.
-            depth = self.np_random.uniform(1, 1, 1)
+            depth = self.np_random.uniform(1, 1)
         else:
-            depth = np.copy(self._hole_depth)
+            depth = np.copy(self.initial_depth)
 
-        self._tmp_hole_width = width
-        self._tmp_hole_x = x
-        self._tmp_hole_depth = depth
-
-        self._goal = np.hstack([self._tmp_hole_x, -self._tmp_hole_depth])
+        self._tmp_width = width
+        self._tmp_x = x
+        self._tmp_depth = depth
+        self._goal = np.hstack([self._tmp_x, -self._tmp_depth])
 
     def _update_joints(self):
         """
@@ -178,7 +190,7 @@ class HoleReacherEnv(AlrEnv):
             np.cos(theta),
             np.sin(theta),
             self._angle_velocity,
-            self._tmp_hole_width,
+            self._tmp_width,
             # self._tmp_hole_depth,
             self.end_effector - self._goal,
             self._steps
@@ -204,9 +216,8 @@ class HoleReacherEnv(AlrEnv):
         return np.squeeze(end_effector + self._joints[0, :])
 
     def _check_wall_collision(self, line_points):
-
         # all points that are before the hole in x
-        r, c = np.where(line_points[:, :, 0] < (self._tmp_hole_x - self._tmp_hole_width / 2))
+        r, c = np.where(line_points[:, :, 0] < (self._tmp_x - self._tmp_width / 2))
 
         # check if any of those points are below surface
         nr_line_points_below_surface_before_hole = np.sum(line_points[r, c, 1] < 0)
@@ -215,7 +226,7 @@ class HoleReacherEnv(AlrEnv):
             return True
 
         # all points that are after the hole in x
-        r, c = np.where(line_points[:, :, 0] > (self._tmp_hole_x + self._tmp_hole_width / 2))
+        r, c = np.where(line_points[:, :, 0] > (self._tmp_x + self._tmp_width / 2))
 
         # check if any of those points are below surface
         nr_line_points_below_surface_after_hole = np.sum(line_points[r, c, 1] < 0)
@@ -224,11 +235,11 @@ class HoleReacherEnv(AlrEnv):
             return True
 
         # all points that are above the hole
-        r, c = np.where((line_points[:, :, 0] > (self._tmp_hole_x - self._tmp_hole_width / 2)) & (
-                line_points[:, :, 0] < (self._tmp_hole_x + self._tmp_hole_width / 2)))
+        r, c = np.where((line_points[:, :, 0] > (self._tmp_x - self._tmp_width / 2)) & (
+                line_points[:, :, 0] < (self._tmp_x + self._tmp_width / 2)))
 
         # check if any of those points are below surface
-        nr_line_points_below_surface_in_hole = np.sum(line_points[r, c, 1] < -self._tmp_hole_depth)
+        nr_line_points_below_surface_in_hole = np.sum(line_points[r, c, 1] < -self._tmp_depth)
 
         if nr_line_points_below_surface_in_hole > 0:
             return True
@@ -252,7 +263,7 @@ class HoleReacherEnv(AlrEnv):
         self.fig.show()
 
         self.fig.gca().set_title(
-            f"Iteration: {self._steps}, distance: {self.end_effector - self._goal}")
+            f"Iteration: {self._steps}, distance: {np.linalg.norm(self.end_effector - self._goal) ** 2}")
 
         if mode == "human":
 
@@ -271,17 +282,17 @@ class HoleReacherEnv(AlrEnv):
     def _set_patches(self):
         if self.fig is not None:
             self.fig.gca().patches = []
-        left_block = patches.Rectangle((-self.n_links, -self._tmp_hole_depth),
-                                       self.n_links + self._tmp_hole_x - self._tmp_hole_width / 2,
-                                       self._tmp_hole_depth,
+        left_block = patches.Rectangle((-self.n_links, -self._tmp_depth),
+                                       self.n_links + self._tmp_x - self._tmp_width / 2,
+                                       self._tmp_depth,
                                        fill=True, edgecolor='k', facecolor='k')
-        right_block = patches.Rectangle((self._tmp_hole_x + self._tmp_hole_width / 2, -self._tmp_hole_depth),
-                                        self.n_links - self._tmp_hole_x + self._tmp_hole_width / 2,
-                                        self._tmp_hole_depth,
+        right_block = patches.Rectangle((self._tmp_x + self._tmp_width / 2, -self._tmp_depth),
+                                        self.n_links - self._tmp_x + self._tmp_width / 2,
+                                        self._tmp_depth,
                                         fill=True, edgecolor='k', facecolor='k')
-        hole_floor = patches.Rectangle((self._tmp_hole_x - self._tmp_hole_width / 2, -self._tmp_hole_depth),
-                                       self._tmp_hole_width,
-                                       1 - self._tmp_hole_depth,
+        hole_floor = patches.Rectangle((self._tmp_x - self._tmp_width / 2, -self._tmp_depth),
+                                       self._tmp_width,
+                                       1 - self._tmp_depth,
                                        fill=True, edgecolor='k', facecolor='k')
 
         # Add the patch to the Axes
@@ -289,26 +300,6 @@ class HoleReacherEnv(AlrEnv):
         self.fig.gca().add_patch(right_block)
         self.fig.gca().add_patch(hole_floor)
 
-    @property
-    def active_obs(self):
-        return np.hstack([
-            [self.random_start] * self.n_links,  # cos
-            [self.random_start] * self.n_links,  # sin
-            [self.random_start] * self.n_links,  # velocity
-            [self._hole_width is None],  # hole width
-            # [self._hole_depth is None],  # hole depth
-            [True] * 2,  # x-y coordinates of target distance
-            [False]  # env steps
-        ])
-
-    @property
-    def start_pos(self) -> Union[float, int, np.ndarray]:
-        return self._start_pos
-
-    @property
-    def goal_pos(self) -> Union[float, int, np.ndarray]:
-        raise ValueError("Goal position is not available and has to be learnt based on the environment.")
-
     def seed(self, seed=None):
         self.np_random, seed = seeding.np_random(seed)
         return [seed]
@@ -318,25 +309,6 @@ class HoleReacherEnv(AlrEnv):
         return self._joints[self.n_links].T
 
     def close(self):
+        super().close()
         if self.fig is not None:
             plt.close(self.fig)
-
-
-if __name__ == '__main__':
-    nl = 5
-    render_mode = "human"  # "human" or "partial" or "final"
-    env = HoleReacherEnv(n_links=nl, allow_self_collision=False, allow_wall_collision=False, hole_width=None,
-                         hole_depth=1, hole_x=None)
-    obs = env.reset()
-
-    for i in range(2000):
-        # objective.load_result("/tmp/cma")
-        # test with random actions
-        ac = env.action_space.sample()
-        obs, rew, d, info = env.step(ac)
-        if i % 10 == 0:
-            env.render(mode=render_mode)
-        if d:
-            env.reset()
-
-    env.close()
alr_envs/alr/classic_control/hole_reacher/mp_wrapper.py (new file, 35 lines)

@@ -0,0 +1,35 @@
from typing import Tuple, Union

import numpy as np

from mp_env_api import MPEnvWrapper


class MPWrapper(MPEnvWrapper):
    @property
    def active_obs(self):
        return np.hstack([
            [self.env.random_start] * self.env.n_links,  # cos
            [self.env.random_start] * self.env.n_links,  # sin
            [self.env.random_start] * self.env.n_links,  # velocity
            [self.env.initial_width is None],  # hole width
            # [self.env.hole_depth is None],  # hole depth
            [True] * 2,  # x-y coordinates of target distance
            [False]  # env steps
        ])

    @property
    def current_pos(self) -> Union[float, int, np.ndarray, Tuple]:
        return self.env.current_pos

    @property
    def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
        return self.env.current_vel

    @property
    def goal_pos(self) -> Union[float, int, np.ndarray, Tuple]:
        raise ValueError("Goal position is not available and has to be learnt based on the environment.")

    @property
    def dt(self) -> Union[float, int]:
        return self.env.dt
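To illustrate what `active_obs` is used for, here is a small sketch of filtering a full step-based observation down to the contextual observation. The direct wrapping and the manual masking are assumptions for illustration only; in practice the MP framework applies this mask internally when the registered MP ids are created.

```python
import numpy as np

from alr_envs.alr.classic_control.hole_reacher.hole_reacher import HoleReacherEnv
from alr_envs.alr.classic_control.hole_reacher.mp_wrapper import MPWrapper

# Wrap the step-based env directly (normally done via the registered MP ids).
env = MPWrapper(HoleReacherEnv(n_links=5, hole_width=0.25, hole_depth=1, hole_x=2))
full_obs = env.reset()

# active_obs is a boolean mask over the full observation; only the marked
# substates are exposed as the context of the MP task.
context = np.asarray(full_obs)[env.active_obs]
print(context.shape)
```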
alr_envs/alr/classic_control/simple_reacher/__init__.py (new file, 1 line)

@@ -0,0 +1 @@
from .mp_wrapper import MPWrapper
alr_envs/alr/classic_control/simple_reacher/mp_wrapper.py (new file, 33 lines)

@@ -0,0 +1,33 @@
from typing import Tuple, Union

import numpy as np

from mp_env_api import MPEnvWrapper


class MPWrapper(MPEnvWrapper):
    @property
    def active_obs(self):
        return np.hstack([
            [self.env.random_start] * self.env.n_links,  # cos
            [self.env.random_start] * self.env.n_links,  # sin
            [self.env.random_start] * self.env.n_links,  # velocity
            [True] * 2,  # x-y coordinates of target distance
            [False]  # env steps
        ])

    @property
    def current_pos(self) -> Union[float, int, np.ndarray, Tuple]:
        return self.env.current_pos

    @property
    def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
        return self.env.current_vel

    @property
    def goal_pos(self) -> Union[float, int, np.ndarray, Tuple]:
        raise ValueError("Goal position is not available and has to be learnt based on the environment.")

    @property
    def dt(self) -> Union[float, int]:
        return self.env.dt
alr_envs/alr/classic_control/simple_reacher/simple_reacher.py

@@ -1,14 +1,13 @@
 from typing import Iterable, Union
 
+import gym
 import matplotlib.pyplot as plt
 import numpy as np
 from gym import spaces
 from gym.utils import seeding
 
-from alr_envs.utils.mps.mp_environments import AlrEnv
 
-
-class SimpleReacherEnv(AlrEnv):
+class SimpleReacherEnv(gym.Env):
     """
     Simple Reaching Task without any physics simulation.
     Returns no reward until 150 time steps. This allows the agent to explore the space, but requires precise actions
@@ -19,19 +18,22 @@ class SimpleReacherEnv(AlrEnv):
         super().__init__()
         self.link_lengths = np.ones(n_links)
         self.n_links = n_links
-        self.dt = 0.1
+        self._dt = 0.1
 
         self.random_start = random_start
 
+        # provided initial parameters
+        self.inital_target = target
+
+        # temp container for current env state
+        self._goal = None
+
         self._joints = None
         self._joint_angles = None
         self._angle_velocity = None
         self._start_pos = np.zeros(self.n_links)
         self._start_vel = np.zeros(self.n_links)
 
-        self._target = target  # provided target value
-        self._goal = None  # updated goal value, does not change when target != None
-
         self.max_torque = 1
         self.steps_before_reward = 199
 
@@ -53,6 +55,22 @@ class SimpleReacherEnv(AlrEnv):
         self._steps = 0
         self.seed()
 
+    @property
+    def dt(self) -> Union[float, int]:
+        return self._dt
+
+    # @property
+    # def start_pos(self):
+    #     return self._start_pos
+
+    @property
+    def current_pos(self):
+        return self._joint_angles
+
+    @property
+    def current_vel(self):
+        return self._angle_velocity
+
     def step(self, action: np.ndarray):
         """
         A single step with action in torque space
@@ -126,14 +144,14 @@ class SimpleReacherEnv(AlrEnv):
 
     def _generate_goal(self):
 
-        if self._target is None:
+        if self.inital_target is None:
 
             total_length = np.sum(self.link_lengths)
             goal = np.array([total_length, total_length])
             while np.linalg.norm(goal) >= total_length:
                 goal = self.np_random.uniform(low=-total_length, high=total_length, size=2)
         else:
-            goal = np.copy(self._target)
+            goal = np.copy(self.inital_target)
 
         self._goal = goal
 
@@ -172,24 +190,6 @@ class SimpleReacherEnv(AlrEnv):
         self.fig.canvas.draw()
         self.fig.canvas.flush_events()
 
-    @property
-    def active_obs(self):
-        return np.hstack([
-            [self.random_start] * self.n_links,  # cos
-            [self.random_start] * self.n_links,  # sin
-            [self.random_start] * self.n_links,  # velocity
-            [True] * 2,  # x-y coordinates of target distance
-            [False]  # env steps
-        ])
-
-    @property
-    def start_pos(self):
-        return self._start_pos
-
-    @property
-    def goal_pos(self):
-        raise ValueError("Goal position is not available and has to be learnt based on the environment.")
-
     def seed(self, seed=None):
         self.np_random, seed = seeding.np_random(seed)
         return [seed]
@@ -200,26 +200,3 @@ class SimpleReacherEnv(AlrEnv):
     @property
     def end_effector(self):
         return self._joints[self.n_links].T
-
-
-if __name__ == '__main__':
-    nl = 5
-    render_mode = "human"  # "human" or "partial" or "final"
-    env = SimpleReacherEnv(n_links=nl)
-    obs = env.reset()
-    print("First", obs)
-
-    for i in range(2000):
-        # objective.load_result("/tmp/cma")
-        # test with random actions
-        ac = 2 * env.action_space.sample()
-        # ac = np.ones(env.action_space.shape)
-        obs, rew, d, info = env.step(ac)
-        env.render(mode=render_mode)
-
-        print(obs[env.active_obs].shape)
-
-        if d or i % 200 == 0:
-            env.reset()
-
-    env.close()
alr_envs/alr/classic_control/utils.py

@@ -10,7 +10,7 @@ def intersect(A, B, C, D):
 
 
 def check_self_collision(line_points):
-    "Checks whether line segments and intersect"
+    """Checks whether line segments intersect."""
     for i, line1 in enumerate(line_points):
         for line2 in line_points[i + 2:, :, :]:
             if intersect(line1[0], line1[-1], line2[0], line2[-1]):
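As an illustration of the check above: it tests every pair of non-adjacent link segments (the `i + 2` offset skips neighbouring links, which always share a joint) for an intersection. A minimal sketch of calling it, assuming `line_points` has the shape `(n_links, n_points_per_link, 2)` used by the reacher environments:

```python
import numpy as np

from alr_envs.alr.classic_control.utils import check_self_collision

# Two points (start, end) per straight link, each an (x, y) coordinate.
line_points = np.array([
    [[0.0, 0.0], [1.0, 0.0]],    # link 0
    [[1.0, 0.0], [1.0, 1.0]],    # link 1 (adjacent to link 0, never checked against it)
    [[1.0, 1.0], [0.5, -0.5]],   # link 2 crosses link 0
])

# Expected to report a self-collision, since links 0 and 2 intersect.
print(check_self_collision(line_points))
```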
alr_envs/alr/classic_control/viapoint_reacher/__init__.py (new file, 1 line)

@@ -0,0 +1 @@
from .mp_wrapper import MPWrapper
alr_envs/alr/classic_control/viapoint_reacher/mp_wrapper.py (new file, 34 lines)

@@ -0,0 +1,34 @@
from typing import Tuple, Union

import numpy as np

from mp_env_api import MPEnvWrapper


class MPWrapper(MPEnvWrapper):
    @property
    def active_obs(self):
        return np.hstack([
            [self.env.random_start] * self.env.n_links,  # cos
            [self.env.random_start] * self.env.n_links,  # sin
            [self.env.random_start] * self.env.n_links,  # velocity
            [self.env.initial_via_target is None] * 2,  # x-y coordinates of via point distance
            [True] * 2,  # x-y coordinates of target distance
            [False]  # env steps
        ])

    @property
    def current_pos(self) -> Union[float, int, np.ndarray, Tuple]:
        return self.env.current_pos

    @property
    def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
        return self.env.current_vel

    @property
    def goal_pos(self) -> Union[float, int, np.ndarray, Tuple]:
        raise ValueError("Goal position is not available and has to be learnt based on the environment.")

    @property
    def dt(self) -> Union[float, int]:
        return self.env.dt
@@ -5,13 +5,12 @@ import matplotlib.pyplot as plt
 import numpy as np
 from gym.utils import seeding

-from alr_envs.classic_control.utils import check_self_collision
+from alr_envs.alr.classic_control.utils import check_self_collision
-from alr_envs.utils.mps.mp_environments import AlrEnv


-class ViaPointReacher(AlrEnv):
+class ViaPointReacherEnv(gym.Env):

-    def __init__(self, n_links, random_start: bool = True, via_target: Union[None, Iterable] = None,
+    def __init__(self, n_links, random_start: bool = False, via_target: Union[None, Iterable] = None,
                  target: Union[None, Iterable] = None, allow_self_collision=False, collision_penalty=1000):

         self.n_links = n_links
@@ -20,8 +19,8 @@ class ViaPointReacher(AlrEnv):
         self.random_start = random_start

         # provided initial parameters
-        self._target = target  # provided target value
+        self.intitial_target = target  # provided target value
-        self._via_target = via_target  # provided via point target value
+        self.initial_via_target = via_target  # provided via point target value

         # temp container for current env state
         self._via_point = np.ones(2)
@@ -39,7 +38,7 @@ class ViaPointReacher(AlrEnv):
         self._start_vel = np.zeros(self.n_links)
         self.weight_matrix_scale = 1

-        self.dt = 0.01
+        self._dt = 0.01

         action_bound = np.pi * np.ones((self.n_links,))
         state_bound = np.hstack([
@@ -60,6 +59,22 @@ class ViaPointReacher(AlrEnv):
         self._steps = 0
         self.seed()

+    @property
+    def dt(self):
+        return self._dt
+
+    # @property
+    # def start_pos(self):
+    #     return self._start_pos
+
+    @property
+    def current_pos(self):
+        return self._joint_angles.copy()
+
+    @property
+    def current_vel(self):
+        return self._angle_velocity.copy()
+
     def step(self, action: np.ndarray):
         """
         a single step with an action in joint velocity space
@@ -104,22 +119,22 @@ class ViaPointReacher(AlrEnv):
         total_length = np.sum(self.link_lengths)

         # rejection sampled point in inner circle with 0.5*Radius
-        if self._via_target is None:
+        if self.initial_via_target is None:
             via_target = np.array([total_length, total_length])
             while np.linalg.norm(via_target) >= 0.5 * total_length:
                 via_target = self.np_random.uniform(low=-0.5 * total_length, high=0.5 * total_length, size=2)
         else:
-            via_target = np.copy(self._via_target)
+            via_target = np.copy(self.initial_via_target)

         # rejection sampled point in outer circle
-        if self._target is None:
+        if self.intitial_target is None:
             goal = np.array([total_length, total_length])
             while np.linalg.norm(goal) >= total_length or np.linalg.norm(goal) <= 0.5 * total_length:
                 goal = self.np_random.uniform(low=-total_length, high=total_length, size=2)
         else:
-            goal = np.copy(self._target)
+            goal = np.copy(self.intitial_target)

-        self._via_target = via_target
+        self._via_point = via_target
         self._goal = goal

     def _update_joints(self):
@@ -266,25 +281,6 @@ class ViaPointReacher(AlrEnv):

         plt.pause(0.01)

-    @property
-    def active_obs(self):
-        return np.hstack([
-            [self.random_start] * self.n_links,  # cos
-            [self.random_start] * self.n_links,  # sin
-            [self.random_start] * self.n_links,  # velocity
-            [self._via_target is None] * 2,  # x-y coordinates of via point distance
-            [True] * 2,  # x-y coordinates of target distance
-            [False]  # env steps
-        ])
-
-    @property
-    def start_pos(self) -> Union[float, int, np.ndarray]:
-        return self._start_pos
-
-    @property
-    def goal_pos(self) -> Union[float, int, np.ndarray]:
-        raise ValueError("Goal position is not available and has to be learnt based on the environment.")
-
     def seed(self, seed=None):
         self.np_random, seed = seeding.np_random(seed)
         return [seed]
@@ -296,26 +292,3 @@ class ViaPointReacher(AlrEnv):
     def close(self):
         if self.fig is not None:
             plt.close(self.fig)
-
-
-if __name__ == '__main__':
-    nl = 5
-    render_mode = "human"  # "human" or "partial" or "final"
-    env = ViaPointReacher(n_links=nl, allow_self_collision=False)
-    env.reset()
-    env.render(mode=render_mode)
-
-    for i in range(300):
-        # objective.load_result("/tmp/cma")
-        # test with random actions
-        ac = env.action_space.sample()
-        # ac[0] += np.pi/2
-        obs, rew, d, info = env.step(ac)
-        env.render(mode=render_mode)
-
-        print(rew)
-
-        if d:
-            break
-
-    env.close()
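Since this file drops its `__main__` smoke test while renaming the class, a minimal rollout sketch against the new name may be useful. It simply mirrors the removed block, updated to `ViaPointReacherEnv`, and assumes the class has been imported from its new location:

# Hedged sketch: mirrors the removed __main__ block for the renamed class.
env = ViaPointReacherEnv(n_links=5, allow_self_collision=False)
env.reset()
for _ in range(300):
    # random joint-velocity actions, just to exercise step()/done handling
    obs, rew, done, info = env.step(env.action_space.sample())
    if done:
        break
env.close()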

alr_envs/alr/mujoco/README.MD (new file, 15 lines)
@@ -0,0 +1,15 @@
+# Custom Mujoco tasks
+
+## Step-based Environments
+
+|Name| Description|Horizon|Action Dimension|Observation Dimension
+|---|---|---|---|---|
+|`ALRReacher-v0`|Modified (5 links) Mujoco gym's `Reacher-v2` (2 links)| 200 | 5 | 21
+|`ALRReacherSparse-v0`|Same as `ALRReacher-v0`, but the distance penalty is only provided in the last time step.| 200 | 5 | 21
+|`ALRReacherSparseBalanced-v0`|Same as `ALRReacherSparse-v0`, but the end-effector has to remain upright.| 200 | 5 | 21
+|`ALRLongReacher-v0`|Modified (7 links) Mujoco gym's `Reacher-v2` (2 links)| 200 | 7 | 27
+|`ALRLongReacherSparse-v0`|Same as `ALRLongReacher-v0`, but the distance penalty is only provided in the last time step.| 200 | 7 | 27
+|`ALRLongReacherSparseBalanced-v0`|Same as `ALRLongReacherSparse-v0`, but the end-effector has to remain upright.| 200 | 7 | 27
+|`ALRBallInACupSimple-v0`| Ball-in-a-cup task where a robot needs to catch a ball attached to a cup at its end-effector. | 4000 | 3 | wip
+|`ALRBallInACup-v0`| Ball-in-a-cup task where a robot needs to catch a ball attached to a cup at its end-effector | 4000 | 7 | wip
+|`ALRBallInACupGoal-v0`| Similar to `ALRBallInACupSimple-v0` but the ball needs to be caught at a specified goal position | 4000 | 7 | wip
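The ids in this table follow the usual gym registration pattern. A minimal usage sketch, assuming the ids are registered when the `alr_envs` package is imported (the registration module itself is not part of this diff):

import gym
import alr_envs  # noqa: F401  # assumed to register the ALR* ids listed above

env = gym.make("ALRReacherSparse-v0")
obs = env.reset()
done = False
while not done:
    # random policy, runs for at most the 200-step horizon listed above
    obs, reward, done, info = env.step(env.action_space.sample())
env.close()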

alr_envs/alr/mujoco/__init__.py (new file, 4 lines)
@@ -0,0 +1,4 @@
+from .reacher.alr_reacher import ALRReacherEnv
+from .reacher.balancing import BalancingEnv
+from .ball_in_a_cup.ball_in_a_cup import ALRBallInACupEnv
+from .ball_in_a_cup.biac_pd import ALRBallInACupPDEnv
@@ -1,30 +1,28 @@
 from gym import utils
 import os
 import numpy as np
-from alr_envs.mujoco import alr_mujoco_env
+from gym.envs.mujoco import MujocoEnv


-class ALRBallInACupEnv(alr_mujoco_env.AlrMujocoEnv, utils.EzPickle):
+class ALRBallInACupEnv(MujocoEnv, utils.EzPickle):
     def __init__(self, n_substeps=4, apply_gravity_comp=True, simplified: bool = False,
                  reward_type: str = None, context: np.ndarray = None):
+        utils.EzPickle.__init__(**locals())
         self._steps = 0

-        self.xml_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "assets",
-                                     "biac_base" + ".xml")
+        self.xml_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "assets", "biac_base.xml")

         self._q_pos = []
         self._q_vel = []
         # self.weight_matrix_scale = 50
         self.max_ctrl = np.array([150., 125., 40., 60., 5., 5., 2.])
-        self.p_gains = 1 / self.max_ctrl * np.array([200, 300, 100, 100, 10, 10, 2.5])
-        self.d_gains = 1 / self.max_ctrl * np.array([7, 15, 5, 2.5, 0.3, 0.3, 0.05])

         self.j_min = np.array([-2.6, -1.985, -2.8, -0.9, -4.55, -1.5707, -2.7])
         self.j_max = np.array([2.6, 1.985, 2.8, 3.14159, 1.25, 1.5707, 2.7])

         self.context = context

-        utils.EzPickle.__init__(self)
         alr_mujoco_env.AlrMujocoEnv.__init__(self,
                                              self.xml_path,
                                              apply_gravity_comp=apply_gravity_comp,
@@ -37,13 +35,13 @@ class ALRBallInACupEnv(alr_mujoco_env.AlrMujocoEnv, utils.EzPickle):
         self.sim_time = 8  # seconds
         self.sim_steps = int(self.sim_time / self.dt)
         if reward_type == "no_context":
-            from alr_envs.mujoco.ball_in_a_cup.ball_in_a_cup_reward_simple import BallInACupReward
+            from alr_envs.alr.mujoco.ball_in_a_cup.ball_in_a_cup_reward_simple import BallInACupReward
             reward_function = BallInACupReward
         elif reward_type == "contextual_goal":
-            from alr_envs.mujoco.ball_in_a_cup.ball_in_a_cup_reward import BallInACupReward
+            from alr_envs.alr.mujoco.ball_in_a_cup.ball_in_a_cup_reward import BallInACupReward
             reward_function = BallInACupReward
         else:
-            raise ValueError("Unknown reward type")
+            raise ValueError("Unknown reward type: {}".format(reward_type))
         self.reward_function = reward_function(self.sim_steps)

     @property
@@ -68,6 +66,10 @@ class ALRBallInACupEnv(alr_mujoco_env.AlrMujocoEnv, utils.EzPickle):
     def current_vel(self):
         return self.sim.data.qvel[0:7].copy()

+    def reset(self):
+        self.reward_function.reset(None)
+        return super().reset()
+
     def reset_model(self):
         init_pos_all = self.init_qpos.copy()
         init_pos_robot = self._start_pos
@@ -102,14 +104,18 @@ class ALRBallInACupEnv(alr_mujoco_env.AlrMujocoEnv, utils.EzPickle):
             done = success or self._steps == self.sim_steps - 1 or is_collided
             self._steps += 1
         else:
-            reward = -2
+            reward = -2000
             success = False
             is_collided = False
             done = True
         return ob, reward, done, dict(reward_dist=reward_dist,
                                       reward_ctrl=reward_ctrl,
                                       velocity=angular_vel,
-                                      traj=self._q_pos, is_success=success,
+                                      # traj=self._q_pos,
+                                      action=a,
+                                      q_pos=self.sim.data.qpos[0:7].ravel().copy(),
+                                      q_vel=self.sim.data.qvel[0:7].ravel().copy(),
+                                      is_success=success,
                                       is_collided=is_collided, sim_crash=crash)

     def check_traj_in_joint_limits(self):
@@ -150,6 +156,22 @@ class ALRBallInACupEnv(alr_mujoco_env.AlrMujocoEnv, utils.EzPickle):
         des_vel_full[5] = des_vel[2]
         return des_vel_full

+    def render(self, render_mode, **render_kwargs):
+        if render_mode == "plot_trajectory":
+            if self._steps == 1:
+                import matplotlib.pyplot as plt
+                # plt.ion()
+                self.fig, self.axs = plt.subplots(3, 1)
+
+            if self._steps <= 1750:
+                for ax, cp in zip(self.axs, self.current_pos[1::2]):
+                    ax.scatter(self._steps, cp, s=2, marker=".")
+
+            # self.fig.show()
+
+        else:
+            super().render(render_mode, **render_kwargs)
+

 if __name__ == "__main__":
     env = ALRBallInACupEnv()
@@ -172,4 +194,3 @@ if __name__ == "__main__":
             break

     env.close()
-
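The step() info dict now carries the executed action and the full 7-DoF joint state at every step. A small helper sketch for consuming it; the env construction is not shown, only the standard gym step API and the info keys introduced above are assumed:

import numpy as np

def collect_q_pos(env, max_steps=4000):
    # Roll out a random policy and stack the per-step joint positions that
    # the extended info dict exposes (keys "q_pos"/"q_vel"/"action").
    env.reset()
    traj = []
    for _ in range(max_steps):
        _, _, done, info = env.step(env.action_space.sample())
        traj.append(info["q_pos"])
        if done:
            break
    return np.stack(traj)  # shape (T, 7)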

New file (42 lines, file path not shown in this diff view)
@@ -0,0 +1,42 @@
+from typing import Tuple, Union
+
+import numpy as np
+
+from mp_env_api import MPEnvWrapper
+
+
+class BallInACupMPWrapper(MPEnvWrapper):
+
+    @property
+    def active_obs(self):
+        # TODO: @Max Filter observations correctly
+        return np.hstack([
+            [False] * 7,  # cos
+            [False] * 7,  # sin
+            # [True] * 2,  # x-y coordinates of target distance
+            [False]  # env steps
+        ])
+
+    @property
+    def start_pos(self):
+        if self.simplified:
+            return self._start_pos[1::2]
+        else:
+            return self._start_pos
+
+    @property
+    def current_pos(self) -> Union[float, int, np.ndarray, Tuple]:
+        return self.sim.data.qpos[0:7].copy()
+
+    @property
+    def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
+        return self.sim.data.qvel[0:7].copy()
+
+    @property
+    def goal_pos(self):
+        # TODO: @Max I think the default value of returning to the start is reasonable here
+        raise ValueError("Goal position is not available and has to be learnt based on the environment.")
+
+    @property
+    def dt(self) -> Union[float, int]:
+        return self.env.dt
@@ -1,5 +1,5 @@
 import numpy as np
-from alr_envs.mujoco import alr_reward_fct
+from alr_envs.alr.mujoco import alr_reward_fct


 class BallInACupReward(alr_reward_fct.AlrReward):
@@ -1,11 +1,10 @@
 import numpy as np
-from alr_envs.mujoco import alr_reward_fct
+from alr_envs.alr.mujoco import alr_reward_fct


 class BallInACupReward(alr_reward_fct.AlrReward):
-    def __init__(self, sim_time):
-        self.sim_time = sim_time
+    def __init__(self, env):
+        self.env = env

         self.collision_objects = ["cup_geom1", "cup_geom2", "cup_base_contact_below",
                                   "wrist_palm_link_convex_geom",
                                   "wrist_pitch_link_convex_decomposition_p1_geom",
@@ -22,7 +21,7 @@ class BallInACupReward(alr_reward_fct.AlrReward):
         self.goal_final_id = None
         self.collision_ids = None
         self._is_collided = False
-        self.collision_penalty = 1
+        self.collision_penalty = 1000

         self.ball_traj = None
         self.dists = None
@@ -32,55 +31,62 @@ class BallInACupReward(alr_reward_fct.AlrReward):
         self.reset(None)

     def reset(self, context):
-        self.ball_traj = np.zeros(shape=(self.sim_time, 3))
+        # self.sim_time = self.env.sim.dtsim_time
+        self.ball_traj = []  # np.zeros(shape=(self.sim_time, 3))
         self.dists = []
         self.dists_final = []
         self.costs = []
         self.action_costs = []
+        self.angle_costs = []
         self.cup_angles = []

-    def compute_reward(self, action, env):
-        self.ball_id = env.sim.model._body_name2id["ball"]
-        self.ball_collision_id = env.sim.model._geom_name2id["ball_geom"]
-        self.goal_id = env.sim.model._site_name2id["cup_goal"]
-        self.goal_final_id = env.sim.model._site_name2id["cup_goal_final"]
-        self.collision_ids = [env.sim.model._geom_name2id[name] for name in self.collision_objects]
+    def compute_reward(self, action):
+        self.ball_id = self.env.sim.model._body_name2id["ball"]
+        self.ball_collision_id = self.env.sim.model._geom_name2id["ball_geom"]
+        self.goal_id = self.env.sim.model._site_name2id["cup_goal"]
+        self.goal_final_id = self.env.sim.model._site_name2id["cup_goal_final"]
+        self.collision_ids = [self.env.sim.model._geom_name2id[name] for name in self.collision_objects]

-        ball_in_cup = self.check_ball_in_cup(env.sim, self.ball_collision_id)
+        ball_in_cup = self.check_ball_in_cup(self.env.sim, self.ball_collision_id)

         # Compute the current distance from the ball to the inner part of the cup
-        goal_pos = env.sim.data.site_xpos[self.goal_id]
-        ball_pos = env.sim.data.body_xpos[self.ball_id]
-        goal_final_pos = env.sim.data.site_xpos[self.goal_final_id]
+        goal_pos = self.env.sim.data.site_xpos[self.goal_id]
+        ball_pos = self.env.sim.data.body_xpos[self.ball_id]
+        goal_final_pos = self.env.sim.data.site_xpos[self.goal_final_id]
         self.dists.append(np.linalg.norm(goal_pos - ball_pos))
         self.dists_final.append(np.linalg.norm(goal_final_pos - ball_pos))
-        self.ball_traj[env._steps, :] = ball_pos
-        cup_quat = np.copy(env.sim.data.body_xquat[env.sim.model._body_name2id["cup"]])
-        self.cup_angles.append(np.arctan2(2 * (cup_quat[0] * cup_quat[1] + cup_quat[2] * cup_quat[3]),
-                                          1 - 2 * (cup_quat[1]**2 + cup_quat[2]**2)))
+        # self.ball_traj[self.env._steps, :] = ball_pos
+        self.ball_traj.append(ball_pos)
+        cup_quat = np.copy(self.env.sim.data.body_xquat[self.env.sim.model._body_name2id["cup"]])
+        cup_angle = np.arctan2(2 * (cup_quat[0] * cup_quat[1] + cup_quat[2] * cup_quat[3]),
+                               1 - 2 * (cup_quat[1]**2 + cup_quat[2]**2))
+        cost_angle = (cup_angle - np.pi / 2) ** 2
+        self.angle_costs.append(cost_angle)
+        self.cup_angles.append(cup_angle)

         action_cost = np.sum(np.square(action))
         self.action_costs.append(action_cost)

-        self._is_collided = self.check_collision(env.sim) or env.check_traj_in_joint_limits()
+        self._is_collided = self.check_collision(self.env.sim) or self.env.check_traj_in_joint_limits()

-        if env._steps == env.sim_steps - 1 or self._is_collided:
+        if self.env._steps == self.env.ep_length - 1 or self._is_collided:
             t_min_dist = np.argmin(self.dists)
             angle_min_dist = self.cup_angles[t_min_dist]
-            cost_angle = (angle_min_dist - np.pi / 2)**2
+            # cost_angle = (angle_min_dist - np.pi / 2)**2

-            min_dist = self.dists[t_min_dist]
+            # min_dist = self.dists[t_min_dist]
             dist_final = self.dists_final[-1]
             min_dist_final = np.min(self.dists_final)

-            cost = 0.5 * dist_final + 0.05 * cost_angle  # TODO: Increase cost_angle weight  # 0.5 * min_dist +
+            # cost = 0.5 * dist_final + 0.05 * cost_angle  # TODO: Increase cost_angle weight  # 0.5 * min_dist +
             # reward = np.exp(-2 * cost) - 1e-2 * action_cost - self.collision_penalty * int(self._is_collided)
             # reward = - dist_final**2 - 1e-4 * cost_angle - 1e-5 * action_cost - self.collision_penalty * int(self._is_collided)
-            reward = - dist_final**2 - min_dist_final**2 - 1e-4 * cost_angle - 1e-5 * action_cost - self.collision_penalty * int(self._is_collided)
+            reward = - dist_final**2 - min_dist_final**2 - 1e-4 * cost_angle - 1e-3 * action_cost - self.collision_penalty * int(self._is_collided)
             success = dist_final < 0.05 and ball_in_cup and not self._is_collided
             crash = self._is_collided
         else:
-            reward = - 1e-5 * action_cost  # TODO: increase action_cost weight
+            reward = - 1e-3 * action_cost - 1e-4 * cost_angle  # TODO: increase action_cost weight
             success = False
             crash = False
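The cup angle above is read off the body quaternion q = (w, x, y, z) with the standard roll extraction, atan2(2(wx + yz), 1 - 2(x^2 + y^2)), i.e. the rotation about the x-axis; the cost then penalizes deviation from pi/2. A quick self-check of that identity:

import numpy as np

def roll_from_quat(q):
    # Same arctan2 expression as in compute_reward above; q = (w, x, y, z).
    w, x, y, z = q
    return np.arctan2(2 * (w * x + y * z), 1 - 2 * (x ** 2 + y ** 2))

# A pure rotation by `a` about the x-axis has quaternion (cos(a/2), sin(a/2), 0, 0),
# so the extracted roll should equal `a` for a in (-pi, pi].
a = 0.3
assert np.isclose(roll_from_quat(np.array([np.cos(a / 2), np.sin(a / 2), 0.0, 0.0])), a)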

alr_envs/alr/mujoco/ball_in_a_cup/biac_pd.py (new file, 205 lines)
@@ -0,0 +1,205 @@
+import os
+
+import gym.envs.mujoco
+import gym.envs.mujoco as mujoco_env
+import mujoco_py.builder
+import numpy as np
+from gym import utils
+
+from mp_env_api.mp_wrappers.detpmp_wrapper import DetPMPWrapper
+from mp_env_api.utils.policies import PDControllerExtend
+
+
+def make_detpmp_env(**kwargs):
+    name = kwargs.pop("name")
+    _env = gym.make(name)
+    policy = PDControllerExtend(_env, p_gains=kwargs.pop('p_gains'), d_gains=kwargs.pop('d_gains'))
+    kwargs['policy_type'] = policy
+    return DetPMPWrapper(_env, **kwargs)
+
+
+class ALRBallInACupPDEnv(mujoco_env.MujocoEnv, utils.EzPickle):
+    def __init__(self, frame_skip=4, apply_gravity_comp=True, simplified: bool = False,
+                 reward_type: str = None, context: np.ndarray = None):
+        utils.EzPickle.__init__(**locals())
+        self._steps = 0
+
+        self.xml_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "assets", "biac_base.xml")
+
+        self.max_ctrl = np.array([150., 125., 40., 60., 5., 5., 2.])
+
+        self.j_min = np.array([-2.6, -1.985, -2.8, -0.9, -4.55, -1.5707, -2.7])
+        self.j_max = np.array([2.6, 1.985, 2.8, 3.14159, 1.25, 1.5707, 2.7])
+
+        self.context = context
+        self.apply_gravity_comp = apply_gravity_comp
+        self.simplified = simplified
+
+        self._start_pos = np.array([0.0, 0.58760536, 0.0, 1.36004913, 0.0, -0.32072943, -1.57])
+        self._start_vel = np.zeros(7)
+
+        self.sim_time = 8  # seconds
+        self._dt = 0.02
+        self.ep_length = 4000  # based on 8 seconds with dt = 0.02  int(self.sim_time / self.dt)
+        if reward_type == "no_context":
+            from alr_envs.alr.mujoco.ball_in_a_cup.ball_in_a_cup_reward_simple import BallInACupReward
+            reward_function = BallInACupReward
+        elif reward_type == "contextual_goal":
+            from alr_envs.alr.mujoco.ball_in_a_cup.ball_in_a_cup_reward import BallInACupReward
+            reward_function = BallInACupReward
+        else:
+            raise ValueError("Unknown reward type: {}".format(reward_type))
+        self.reward_function = reward_function(self)
+
+        mujoco_env.MujocoEnv.__init__(self, self.xml_path, frame_skip)
+
+    @property
+    def dt(self):
+        return self._dt
+
+    # TODO: @Max is this even needed?
+    @property
+    def start_vel(self):
+        if self.simplified:
+            return self._start_vel[1::2]
+        else:
+            return self._start_vel
+
+    # def _set_action_space(self):
+    #     if self.simplified:
+    #         bounds = self.model.actuator_ctrlrange.copy().astype(np.float32)[1::2]
+    #     else:
+    #         bounds = self.model.actuator_ctrlrange.copy().astype(np.float32)
+    #     low, high = bounds.T
+    #     self.action_space = spaces.Box(low=low, high=high, dtype=np.float32)
+    #     return self.action_space
+
+    def reset(self):
+        self.reward_function.reset(None)
+        return super().reset()
+
+    def reset_model(self):
+        init_pos_all = self.init_qpos.copy()
+        init_pos_robot = self._start_pos
+        init_vel = np.zeros_like(init_pos_all)
+
+        self._steps = 0
+        self._q_pos = []
+        self._q_vel = []
+
+        start_pos = init_pos_all
+        start_pos[0:7] = init_pos_robot
+
+        self.set_state(start_pos, init_vel)
+
+        return self._get_obs()
+
+    def step(self, a):
+        reward_dist = 0.0
+        angular_vel = 0.0
+        reward_ctrl = - np.square(a).sum()
+
+        # if self.simplified:
+        #     tmp = np.zeros(7)
+        #     tmp[1::2] = a
+        #     a = tmp
+
+        if self.apply_gravity_comp:
+            a += self.sim.data.qfrc_bias[:len(a)].copy() / self.model.actuator_gear[:, 0]
+
+        crash = False
+        try:
+            self.do_simulation(a, self.frame_skip)
+        except mujoco_py.builder.MujocoException:
+            crash = True
+        # joint_cons_viol = self.check_traj_in_joint_limits()
+
+        ob = self._get_obs()
+
+        if not crash:
+            reward, success, is_collided = self.reward_function.compute_reward(a)
+            done = success or is_collided  # self._steps == self.sim_steps - 1
+            self._steps += 1
+        else:
+            reward = -2000
+            success = False
+            is_collided = False
+            done = True
+
+        return ob, reward, done, dict(reward_dist=reward_dist,
+                                      reward_ctrl=reward_ctrl,
+                                      velocity=angular_vel,
+                                      # traj=self._q_pos,
+                                      action=a,
+                                      q_pos=self.sim.data.qpos[0:7].ravel().copy(),
+                                      q_vel=self.sim.data.qvel[0:7].ravel().copy(),
+                                      is_success=success,
+                                      is_collided=is_collided, sim_crash=crash)
+
+    def check_traj_in_joint_limits(self):
+        return any(self.current_pos > self.j_max) or any(self.current_pos < self.j_min)
+
+    # TODO: extend observation space
+    def _get_obs(self):
+        theta = self.sim.data.qpos.flat[:7]
+        return np.concatenate([
+            np.cos(theta),
+            np.sin(theta),
+            # self.get_body_com("target"),  # only return target to make problem harder
+            [self._steps],
+        ])
+
+    # These functions are for the task with 3 joint actuations
+    def extend_des_pos(self, des_pos):
+        des_pos_full = self._start_pos.copy()
+        des_pos_full[1] = des_pos[0]
+        des_pos_full[3] = des_pos[1]
+        des_pos_full[5] = des_pos[2]
+        return des_pos_full
+
+    def extend_des_vel(self, des_vel):
+        des_vel_full = self._start_vel.copy()
+        des_vel_full[1] = des_vel[0]
+        des_vel_full[3] = des_vel[1]
+        des_vel_full[5] = des_vel[2]
+        return des_vel_full
+
+    def render(self, render_mode, **render_kwargs):
+        if render_mode == "plot_trajectory":
+            if self._steps == 1:
+                import matplotlib.pyplot as plt
+                # plt.ion()
+                self.fig, self.axs = plt.subplots(3, 1)
+
+            if self._steps <= 1750:
+                for ax, cp in zip(self.axs, self.current_pos[1::2]):
+                    ax.scatter(self._steps, cp, s=2, marker=".")
+
+            # self.fig.show()
+
+        else:
+            super().render(render_mode, **render_kwargs)
+
+
+if __name__ == "__main__":
+    env = ALRBallInACupPDEnv(reward_type="no_context", simplified=True)
+    # env = gym.make("alr_envs:ALRBallInACupPDSimpleDetPMP-v0")
+    # ctxt = np.array([-0.20869846, -0.66376693, 1.18088501])
+
+    # env.configure(ctxt)
+    env.reset()
+    env.render("human")
+    for i in range(16000):
+        # test with random actions
+        ac = 0.02 * env.action_space.sample()[0:7]
+        # ac = env.start_pos
+        # ac[0] += np.pi/2
+        obs, rew, d, info = env.step(ac)
+        env.render("human")
+
+        print(rew)
+
+        if d:
+            break
+
+    env.close()
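The gravity-compensation branch in step() adds MuJoCo's bias forces (gravity, Coriolis, centrifugal) to the action after converting them from joint-space torques into actuator command units via the gear ratio. A minimal sketch of that unit conversion with hypothetical values (not taken from the model file):

import numpy as np

qfrc_bias = np.array([1.2, -0.4, 0.05])       # hypothetical joint-space bias torques [Nm]
actuator_gear = np.array([50.0, 50.0, 25.0])  # hypothetical actuator gear ratios
a = np.zeros(3)                               # policy action in actuator units
a = a + qfrc_bias / actuator_gear             # same correction as applied in step() above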
@@ -1,6 +1,6 @@
-from alr_envs.utils.mps.detpmp_wrapper import DetPMPWrapper
-from alr_envs.utils.mps.dmp_wrapper import DmpWrapper
-from alr_envs.mujoco.ball_in_a_cup.ball_in_a_cup import ALRBallInACupEnv
+from alr_envs.alr.mujoco.ball_in_a_cup.ball_in_a_cup import ALRBallInACupEnv
+from mp_env_api.mp_wrappers.detpmp_wrapper import DetPMPWrapper
+from mp_env_api.mp_wrappers.dmp_wrapper import DmpWrapper


 def make_contextual_env(rank, seed=0):
@@ -26,7 +26,7 @@ def make_contextual_env(rank, seed=0):
     return _init


-def make_env(rank, seed=0):
+def _make_env(rank, seed=0):
     """
     Utility function for multiprocessed env.
@ -1,11 +1,15 @@
|
|||||||
from gym import utils
|
|
||||||
import os
|
import os
|
||||||
|
|
||||||
import numpy as np
|
import numpy as np
|
||||||
from alr_envs.mujoco import alr_mujoco_env
|
from gym import utils
|
||||||
|
from gym.envs.mujoco import MujocoEnv
|
||||||
|
|
||||||
|
|
||||||
class ALRBeerpongEnv(alr_mujoco_env.AlrMujocoEnv, utils.EzPickle):
|
|
||||||
def __init__(self, n_substeps=4, apply_gravity_comp=True, reward_function=None):
|
class ALRBeerpongEnv(MujocoEnv, utils.EzPickle):
|
||||||
|
def __init__(self, model_path, frame_skip, n_substeps=4, apply_gravity_comp=True, reward_function=None):
|
||||||
|
utils.EzPickle.__init__(**locals())
|
||||||
|
MujocoEnv.__init__(self, model_path=model_path, frame_skip=frame_skip)
|
||||||
self._steps = 0
|
self._steps = 0
|
||||||
|
|
||||||
self.xml_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "assets",
|
self.xml_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "assets",
|
||||||
@ -25,17 +29,15 @@ class ALRBeerpongEnv(alr_mujoco_env.AlrMujocoEnv, utils.EzPickle):
|
|||||||
self.j_max = np.array([2.6, 1.985, 2.8, 3.14159, 1.25, 1.5707, 2.7])
|
self.j_max = np.array([2.6, 1.985, 2.8, 3.14159, 1.25, 1.5707, 2.7])
|
||||||
|
|
||||||
self.context = None
|
self.context = None
|
||||||
|
# alr_mujoco_env.AlrMujocoEnv.__init__(self,
|
||||||
utils.EzPickle.__init__(self)
|
# self.xml_path,
|
||||||
alr_mujoco_env.AlrMujocoEnv.__init__(self,
|
# apply_gravity_comp=apply_gravity_comp,
|
||||||
self.xml_path,
|
# n_substeps=n_substeps)
|
||||||
apply_gravity_comp=apply_gravity_comp,
|
|
||||||
n_substeps=n_substeps)
|
|
||||||
|
|
||||||
self.sim_time = 8 # seconds
|
self.sim_time = 8 # seconds
|
||||||
self.sim_steps = int(self.sim_time / self.dt)
|
self.sim_steps = int(self.sim_time / self.dt)
|
||||||
if reward_function is None:
|
if reward_function is None:
|
||||||
from alr_envs.mujoco.beerpong.beerpong_reward import BeerpongReward
|
from alr_envs.alr.mujoco.beerpong.beerpong_reward import BeerpongReward
|
||||||
reward_function = BeerpongReward
|
reward_function = BeerpongReward
|
||||||
self.reward_function = reward_function(self.sim, self.sim_steps)
|
self.reward_function = reward_function(self.sim, self.sim_steps)
|
||||||
self.cup_robot_id = self.sim.model._site_name2id["cup_robot_final"]
|
self.cup_robot_id = self.sim.model._site_name2id["cup_robot_final"]
|
@@ -1,5 +1,5 @@
 import numpy as np
-from alr_envs.mujoco import alr_reward_fct
+from alr_envs.alr.mujoco import alr_reward_fct


 class BeerpongReward(alr_reward_fct.AlrReward):

@@ -1,5 +1,5 @@
 import numpy as np
-from alr_envs.mujoco import alr_reward_fct
+from alr_envs.alr.mujoco import alr_reward_fct


 class BeerpongReward(alr_reward_fct.AlrReward):
@@ -1,11 +1,13 @@
 from gym import utils
 import os
 import numpy as np
-from alr_envs.mujoco import alr_mujoco_env
+from gym.envs.mujoco import MujocoEnv


-class ALRBeerpongEnv(alr_mujoco_env.AlrMujocoEnv, utils.EzPickle):
+class ALRBeerpongEnv(MujocoEnv, utils.EzPickle):
     def __init__(self, n_substeps=4, apply_gravity_comp=True, reward_function=None):
+        utils.EzPickle.__init__(**locals())

         self._steps = 0

         self.xml_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "assets",
@@ -26,16 +28,17 @@ class ALRBeerpongEnv(alr_mujoco_env.AlrMujocoEnv, utils.EzPickle):

         self.context = None

-        utils.EzPickle.__init__(self)
-        alr_mujoco_env.AlrMujocoEnv.__init__(self,
-                                             self.xml_path,
-                                             apply_gravity_comp=apply_gravity_comp,
-                                             n_substeps=n_substeps)
+        MujocoEnv.__init__(self, model_path=self.xml_path, frame_skip=n_substeps)
+        # alr_mujoco_env.AlrMujocoEnv.__init__(self,
+        #                                      self.xml_path,
+        #                                      apply_gravity_comp=apply_gravity_comp,
+        #                                      n_substeps=n_substeps)

         self.sim_time = 8  # seconds
         self.sim_steps = int(self.sim_time / self.dt)
         if reward_function is None:
-            from alr_envs.mujoco.beerpong.beerpong_reward_simple import BeerpongReward
+            from alr_envs.alr.mujoco.beerpong.beerpong_reward_simple import BeerpongReward
             reward_function = BeerpongReward
         self.reward_function = reward_function(self.sim, self.sim_steps)
         self.cup_robot_id = self.sim.model._site_name2id["cup_robot_final"]
@@ -1,6 +1,6 @@
 from alr_envs.utils.mps.detpmp_wrapper import DetPMPWrapper
-from alr_envs.mujoco.beerpong.beerpong import ALRBeerpongEnv
+from alr_envs.alr.mujoco.beerpong.beerpong import ALRBeerpongEnv
-from alr_envs.mujoco.beerpong.beerpong_simple import ALRBeerpongEnv as ALRBeerpongEnvSimple
+from alr_envs.alr.mujoco.beerpong.beerpong_simple import ALRBeerpongEnv as ALRBeerpongEnvSimple


 def make_contextual_env(rank, seed=0):
@@ -26,7 +26,7 @@ def make_contextual_env(rank, seed=0):
     return _init


-def make_env(rank, seed=0):
+def _make_env(rank, seed=0):
     """
     Utility function for multiprocessed env.
@@ -2,9 +2,9 @@ import numpy as np
 from gym import spaces
 from gym.envs.robotics import robot_env, utils
 # import xml.etree.ElementTree as ET
-from alr_envs.mujoco.gym_table_tennis.utils.rewards.hierarchical_reward import HierarchicalRewardTableTennis
+from alr_envs.alr.mujoco.gym_table_tennis.utils.rewards.hierarchical_reward import HierarchicalRewardTableTennis
 import glfw
-from alr_envs.mujoco.gym_table_tennis.utils.experiment import ball_initialize
+from alr_envs.alr.mujoco.gym_table_tennis.utils.experiment import ball_initialize
 from pathlib import Path
 import os

@@ -1,6 +1,6 @@
 import numpy as np
 from gym.utils import seeding
-from alr_envs.mujoco.gym_table_tennis.utils.util import read_yaml, read_json
+from alr_envs.alr.mujoco.gym_table_tennis.utils.util import read_yaml, read_json
 from pathlib import Path


@@ -39,7 +39,7 @@ def config_save(dir_path, config):


 def change_kp_in_xml(kp_list,
-                     model_path="/home/zhou/slow/table_tennis_rl/simulation/gymTableTennis/gym_table_tennis/envs/robotics/assets/table_tennis/right_arm_actuator.xml"):
+                     model_path="/home/zhou/slow/table_tennis_rl/simulation/gymTableTennis/gym_table_tennis/simple_reacher/robotics/assets/table_tennis/right_arm_actuator.xml"):
     tree = ET.parse(model_path)
     root = tree.getroot()
     # for actuator in root.find("actuator"):
Some files were not shown because too many files have changed in this diff.