update readme and init

Maximilian Huettenrauch 2021-04-23 12:16:19 +02:00
parent db001c411f
commit ba0b612868
2 changed files with 41 additions and 14 deletions

@ -19,6 +19,8 @@ Currently we have the following environments:
|`ALRLongReacher-v0`|Modified (7 links) Mujoco gym's `Reacher-v2` (2 links)| 200 | 7 | 27 |`ALRLongReacher-v0`|Modified (7 links) Mujoco gym's `Reacher-v2` (2 links)| 200 | 7 | 27
|`ALRLongReacherSparse-v0`|Same as `ALRLongReacher-v0`, but the distance penalty is only provided in the last time step.| 200 | 7 | 27 |`ALRLongReacherSparse-v0`|Same as `ALRLongReacher-v0`, but the distance penalty is only provided in the last time step.| 200 | 7 | 27
|`ALRLongReacherSparseBalanced-v0`|Same as `ALRLongReacherSparse-v0`, but the end-effector has to remain upright.| 200 | 7 | 27 |`ALRLongReacherSparseBalanced-v0`|Same as `ALRLongReacherSparse-v0`, but the end-effector has to remain upright.| 200 | 7 | 27
|`ALRBallInACup-v0`| Ball-in-a-cup task where a robot needs to catch a ball attached to a cup at its end-effector | 4000 | 7 | wip
|`ALRBallInACupGoal-v0`| Similar to `ALRBallInACup-v0` but the ball needs to be caught at a specified goal position | 4000 | 7 | wip
### Classic Control ### Classic Control
@ -26,20 +28,24 @@ Currently we have the following environments:
|---|---|---|---|---| |---|---|---|---|---|
|`SimpleReacher-v0`| Simple reaching task (2 links) without any physics simulation. Provides no reward until 150 time steps. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory.| 200 | 2 | 9 |`SimpleReacher-v0`| Simple reaching task (2 links) without any physics simulation. Provides no reward until 150 time steps. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory.| 200 | 2 | 9
|`LongSimpleReacher-v0`| Simple reaching task (5 links) without any physics simulation. Provides no reward until 150 time steps. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory.| 200 | 5 | 18 |`LongSimpleReacher-v0`| Simple reaching task (5 links) without any physics simulation. Provides no reward until 150 time steps. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory.| 200 | 5 | 18
|`ViaPointReacher-v0`| Simple reaching task leveraging a via point, which supports self collision detection. Provides a reward only at 100 and 199 for reaching the viapoint and goal point, respectively.| 200 | |`ViaPointReacher-v0`| Simple reaching task leveraging a via point, which supports self collision detection. Provides a reward only at time steps 100 and 199 for reaching the via point and goal point, respectively.| 200 | 5 | 18
|`HoleReacher-v0`| |`HoleReacher-v0`| 5-link reaching task where the end-effector needs to reach into a narrow hole without colliding with itself or walls | 200 | 5 | 18
### DMP Environments ### DMP Environments
These environments are closer to stochastic search. They always execute a full trajectory, which is computed by a DMP and executed by a controller, e.g. a PD controller. These environments are closer to stochastic search. They always execute a full trajectory, which is computed by a DMP and executed by a controller, e.g. a PD controller.
The goal is to learn the parameters of this DMP to generate a suitable trajectory. The goal is to learn the parameters of this DMP to generate a suitable trajectory.
All environments provide the full episode reward and additional information about early terminations, e.g. due to collisions. All environments provide the full episode reward and additional information about early terminations, e.g. due to collisions.
|Name| Description|Horizon|Action Dimension|Observation Dimension |Name| Description|Horizon|Action Dimension|Context Dimension
|---|---|---|---|---| |---|---|---|---|---|
|`ViaPointReacherDMP-v0`| Simple reaching task leveraging a via point, which supports self collision detection. Provides a reward only at 100 and 199 for reaching the viapoint and goal point, respectively.| 200 | |`ViaPointReacherDMP-v0`| A DMP provides a trajectory for the `ViaPointReacher-v0` task. | 200 | 25
|`HoleReacherDMP-v0`| |`HoleReacherFixedGoalDMP-v0`| A DMP provides a trajectory for the `HoleReacher-v0` task with a fixed goal attractor. | 200 | 25
|`HoleReacherFixedGoalDMP-v0`| |`HoleReacherDMP-v0`| A DMP provides a trajectory for the `HoleReacher-v0` task. The goal attractor needs to be learned. | 200 | 30
|`HoleReacherDetPMP-v0`| |`ALRBallInACupSimpleDMP-v0`| A DMP provides a trajectory for a simplified `ALRBallInACup-v0` task where only 3 joints are actuated. | 4000 | 15
|`ALRBallInACupDMP-v0`| A DMP provides a trajectory for the `ALRBallInACup-v0` task. | 4000 | 35
|`ALRBallInACupGoalDMP-v0`| A DMP provides a trajectory for the `ALRBallInACupGoal-v0` task. | 4000 | 35 | 3
[//]: |`HoleReacherDetPMP-v0`|
### Stochastic Search ### Stochastic Search
|Name| Description|Horizon|Action Dimension|Observation Dimension |Name| Description|Horizon|Action Dimension|Observation Dimension
@ -76,3 +82,5 @@ for i in range(10000):
state = env.reset() state = env.reset()
``` ```
For an example using a DMP-wrapped environment and asynchronous sampling, see [mp_env_async_sampler.py](./alr_envs/utils/mp_env_async_sampler.py).
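
The action dimensions listed for the DMP environments appear to follow from the number of DMP parameters. The sketch below is only an illustration under an assumption that is not confirmed by `make_dmp_env`: each degree of freedom gets one weight per basis function, plus one goal parameter per degree of freedom when `learn_goal` is set. The helper `dmp_param_count` is hypothetical and not part of `alr_envs`. For the reacher tasks, 5 degrees of freedom and 5 basis functions are assumed as well.

```python
def dmp_param_count(num_dof, num_basis, learn_goal=False):
    """Hypothetical helper: number of DMP parameters (assumed action dimension).

    Assumption: one weight per degree of freedom and basis function,
    plus one goal parameter per degree of freedom if the goal is learned.
    """
    count = num_dof * num_basis
    if learn_goal:
        count += num_dof
    return count


# Values that match the table under the stated assumption
print(dmp_param_count(5, 5))                   # reacher with fixed goal
print(dmp_param_count(5, 5, learn_goal=True))  # reacher with learned goal
print(dmp_param_count(3, 5))                   # ALRBallInACupSimpleDMP-v0
print(dmp_param_count(7, 5))                   # ALRBallInACupDMP-v0
```

The table lists 35 for `ALRBallInACupGoalDMP-v0` even though its registration sets `learn_goal` to `True`, so this rule may not hold for that environment.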

@ -72,7 +72,7 @@ register(
) )
register( register(
id='ALRBallInACupSimple-v0', id='ALRBallInACup-v0',
entry_point='alr_envs.mujoco:ALRBallInACupEnv', entry_point='alr_envs.mujoco:ALRBallInACupEnv',
max_episode_steps=4000, max_episode_steps=4000,
kwargs={ kwargs={
@ -209,7 +209,7 @@ register(
id='ALRBallInACupSimpleDMP-v0', id='ALRBallInACupSimpleDMP-v0',
entry_point='alr_envs.utils.make_env_helpers:make_dmp_env', entry_point='alr_envs.utils.make_env_helpers:make_dmp_env',
kwargs={ kwargs={
"name": "alr_envs:ALRBallInACupSimple-v0", "name": "alr_envs:ALRBallInACup-v0",
"num_dof": 3, "num_dof": 3,
"num_basis": 5, "num_basis": 5,
"duration": 3.5, "duration": 3.5,
@ -223,18 +223,37 @@ register(
} }
) )
register(
id='ALRBallInACupDMP-v0',
entry_point='alr_envs.utils.make_env_helpers:make_dmp_env',
kwargs={
"name": "alr_envs:ALRBallInACup-v0",
"num_dof": 7,
"num_basis": 5,
"duration": 3.5,
"post_traj_time": 4.5,
"learn_goal": False,
"alpha_phase": 3,
"bandwidth_factor": 2.5,
"policy_type": "motor",
"weights_scale": 100,
"return_to_start": True
}
)
register( register(
id='ALRBallInACupGoalDMP-v0', id='ALRBallInACupGoalDMP-v0',
entry_point='alr_envs.utils.make_env_helpers:make_dmp_env', entry_point='alr_envs.utils.make_env_helpers:make_dmp_env',
kwargs={ kwargs={
"name": "alr_envs:ALRBallInACupGoal-v0", "name": "alr_envs:ALRBallInACupGoal-v0",
"num_dof": 5, "num_dof": 7,
"num_basis": 5, "num_basis": 5,
"duration": 2, "duration": 3.5,
"post_traj_time": 4.5,
"learn_goal": True, "learn_goal": True,
"alpha_phase": 2, "alpha_phase": 3,
"bandwidth_factor": 2, "bandwidth_factor": 2.5,
"policy_type": "velocity", "policy_type": "motor",
"weights_scale": 50, "weights_scale": 50,
"goal_scale": 0.1 "goal_scale": 0.1
} }