Compiled new docs

Dominik Moritz Roth 2024-03-20 11:30:03 +01:00
parent 072cbe978c
commit 20d0f97135
5 changed files with 43 additions and 14 deletions

Binary file not shown.

Binary file not shown.


@ -18,6 +18,12 @@ A composite reward function serves as the performance metric for the RL system.
Variations of this environment are available, differing in reward structure and in whether the box's initial position is randomized. These variations are designed to challenge RL algorithms and to test their generalization and adaptation capabilities. Temporally sparse environments provide a reward only at the last timestep. Spatially sparse environments provide a reward only when the goal is almost reached, that is, when the box is close enough to the goal and roughly correctly aligned.
These environments all provide smoothness metrics as part of the returned infos:
- mean_squared_jerk: Averages the square of jerk (rate of acceleration change) across the motion. Lower values indicate smoother movement.
- maximum_jerk: Identifies the highest jerk value encountered.
- dimensionless_jerk: Normalizes the summed squared jerk over the motion's duration and peak velocity, offering a scale-independent metric of smoothness.
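
The three metrics listed above can be sketched as follows. This is a minimal illustration and not the library's implementation: the function name `smoothness_metrics`, the use of a single velocity trace sampled at a fixed step `dt`, the finite-difference jerk estimate, and the exact normalization (integrated squared jerk times duration cubed, divided by peak velocity squared) are all assumptions made for the example.

```python
def smoothness_metrics(velocities, dt):
    """Illustrative jerk-based smoothness metrics for one velocity trace.

    Assumption: `velocities` is a sequence of scalar velocities sampled
    every `dt` seconds. Jerk is estimated as the second finite difference
    of velocity.
    """
    if len(velocities) < 3:
        raise ValueError("need at least three velocity samples")

    # Second finite difference of velocity approximates jerk.
    jerk = [
        (velocities[i + 2] - 2 * velocities[i + 1] + velocities[i]) / dt**2
        for i in range(len(velocities) - 2)
    ]
    squared = [j * j for j in jerk]

    mean_squared_jerk = sum(squared) / len(squared)
    maximum_jerk = max(abs(j) for j in jerk)

    # Assumed normalization: integral of squared jerk, scaled by
    # duration^3 / peak_velocity^2 so that the units cancel.
    duration = dt * (len(velocities) - 1)
    peak_velocity = max(abs(v) for v in velocities)
    if peak_velocity > 0:
        dimensionless_jerk = sum(squared) * dt * duration**3 / peak_velocity**2
    else:
        dimensionless_jerk = 0.0

    return {
        "mean_squared_jerk": mean_squared_jerk,
        "maximum_jerk": maximum_jerk,
        "dimensionless_jerk": dimensionless_jerk,
    }
```

With this normalization, doubling every velocity leaves `dimensionless_jerk` unchanged while `mean_squared_jerk` quadruples, which is what "scale-independent" refers to.
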
| Name | Description | Horizon | Action Dimension | Observation Dimension |
| ------------------------------------------ | -------------------------------------------------------------------- | ------- | ---------------- | --------------------- |
| `fancy/BoxPushingDense-v0` | Custom Box-pushing task with dense rewards | 100 | 3 | 13 |
@ -49,6 +55,9 @@ Variations of the table tennis environment are available to cater to different r
| `fancy/TableTennisWind-v0` | Table Tennis task with wind effects, based on a custom environment for table tennis | 350 | 7 | 19 |
| `fancy/TableTennisGoalSwitching-v0` | Table Tennis task with goal switching, based on a custom environment for table tennis | 350 | 7 | 19 |
| `fancy/TableTennisWindReplan-v0` | Table Tennis task with wind effects and replanning, based on a custom environment for table tennis | 350 | 7 | 19 |
| `fancy/TableTennisRndRobot-v0` | Table Tennis task with random initial robot joint positions \* | 350 | 7 | 19 |
\* Random initialization of the robot's joint positions and velocities can be enabled by passing `random_pos_scale` / `random_vel_scale` to `make`. `TableTennisRndRobot` is equivalent to `TableTennis4D`, except that `random_pos_scale` defaults to 0.1 instead of 0.
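
A hedged sketch of how these options might be passed. The helper names `rnd_robot_kwargs` and `make_table_tennis` are hypothetical, and the example assumes that importing `fancy_gym` registers the `fancy/` ids with Gymnasium and that `gymnasium.make` forwards extra keyword arguments to the environment constructor.

```python
def rnd_robot_kwargs(random_pos_scale=None, random_vel_scale=None):
    """Collect only the randomization options that were explicitly set.

    Hypothetical helper: unset options are omitted so the environment
    keeps its own defaults.
    """
    kwargs = {}
    if random_pos_scale is not None:
        kwargs["random_pos_scale"] = random_pos_scale
    if random_vel_scale is not None:
        kwargs["random_vel_scale"] = random_vel_scale
    return kwargs


def make_table_tennis(env_id="fancy/TableTennis4D-v0", **scales):
    """Create a table tennis environment with optional randomization."""
    # Imported lazily so the helper above works without the packages.
    import gymnasium as gym
    import fancy_gym  # noqa: F401  (assumed to register the fancy/ ids)

    return gym.make(env_id, **rnd_robot_kwargs(**scales))
```

Under the equivalence stated in the note, `make_table_tennis(random_pos_scale=0.1)` should behave like `fancy/TableTennisRndRobot-v0` with its defaults.
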
---
@ -89,8 +98,9 @@ A successful throw in this task is determined by the ball landing in the cup at
| `fancy/Reacher5dSparse-v0` | Sparse Reacher task with 5 links, based on Gymnasium's `gym.envs.mujoco.ReacherEnv` | 200 | 5 | 20 |
| `fancy/Reacher7d-v0` | Reacher task with 7 links, based on Gymnasium's `gym.envs.mujoco.ReacherEnv` | 200 | 7 | 22 |
| `fancy/Reacher7dSparse-v0` | Sparse Reacher task with 7 links, based on Gymnasium's `gym.envs.mujoco.ReacherEnv` | 200 | 7 | 22 |
| `fancy/HopperJumpSparse-v0` | Hopper Jump task with sparse rewards, based on Gymnasium's `gym.envs.mujoco.Hopper` | 250 | 3 | 15 / 16\* |
| `fancy/HopperJump-v0` | Hopper Jump task with continuous rewards, based on Gymnasium's `gym.envs.mujoco.Hopper` | 250 | 3 | 15 / 16\* |
| `fancy/HopperJumpMarkov-v0` | `fancy/HopperJump-v0`, but with an alternative reward that is Markovian. | 250 | 3 | 15 / 16\* |
| `fancy/HopperJumpSparse-v0` | Hopper Jump task with sparse rewards, based on Gymnasium's `gym.envs.mujoco.Hopper` | 250 | 3 | 15 / 16\* |
| `fancy/AntJump-v0` | Ant Jump task, based on Gymnasium's `gym.envs.mujoco.Ant` | 200 | 8 | 119 |
| `fancy/HalfCheetahJump-v0` | HalfCheetah Jump task, based on Gymnasium's `gym.envs.mujoco.HalfCheetah` | 100 | 6 | 112 |
| `fancy/HopperJumpOnBox-v0` | Hopper Jump on Box task, based on Gymnasium's `gym.envs.mujoco.Hopper` | 250 | 4 | 16 / 100\* |


@ -135,6 +135,12 @@
<p>The observation space includes the sine and cosine values of the robotic joint angles, their velocities, and quaternion orientations for the end-effector and the box. The action space describes the applied torques for each joint.</p>
<p>A composite reward function serves as the performance metric for the RL system. It accounts for the distance to the goal, the box’s orientation, maintaining a rod within the box, achieving the rod’s desired orientation, and includes penalties for joint position and velocity limit violations, as well as an action cost for energy expenditure.</p>
<p>Variations of this environment are available, differing in reward structure and in whether the box’s initial position is randomized. These variations are designed to challenge RL algorithms and to test their generalization and adaptation capabilities. Temporally sparse environments provide a reward only at the last timestep. Spatially sparse environments provide a reward only when the goal is almost reached, that is, when the box is close enough to the goal and roughly correctly aligned.</p>
<p>These environments all provide smoothness metrics as part of the returned infos:</p>
<ul class="simple">
<li><p>mean_squared_jerk: Averages the square of jerk (rate of acceleration change) across the motion. Lower values indicate smoother movement.</p></li>
<li><p>maximum_jerk: Identifies the highest jerk value encountered.</p></li>
<li><p>dimensionless_jerk: Normalizes the summed squared jerk over the motion’s duration and peak velocity, offering a scale-independent metric of smoothness.</p></li>
</ul>
<table class="docutils align-default">
<thead>
<tr class="row-odd"><th class="head"><p>Name</p></th>
@ -228,8 +234,15 @@
<td><p>7</p></td>
<td><p>19</p></td>
</tr>
<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">fancy/TableTennisRndRobot-v0</span></code></p></td>
<td><p>Table Tennis task with random initial robot joint positions *</p></td>
<td><p>350</p></td>
<td><p>7</p></td>
<td><p>19</p></td>
</tr>
</tbody>
</table>
<p>* Random initialization of the robot’s joint positions and velocities can be enabled by passing <code class="docutils literal notranslate"><span class="pre">random_pos_scale</span></code> / <code class="docutils literal notranslate"><span class="pre">random_vel_scale</span></code> to <code class="docutils literal notranslate"><span class="pre">make</span></code>. <code class="docutils literal notranslate"><span class="pre">TableTennisRndRobot</span></code> is equivalent to <code class="docutils literal notranslate"><span class="pre">TableTennis4D</span></code>, except that <code class="docutils literal notranslate"><span class="pre">random_pos_scale</span></code> defaults to 0.1 instead of 0.</p>
</section>
<hr class="docutils" />
<section id="beer-pong">
@ -335,49 +348,55 @@
<td><p>7</p></td>
<td><p>22</p></td>
</tr>
<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">fancy/HopperJump-v0</span></code></p></td>
<td><p>Hopper Jump task with continuous rewards, based on Gymnasium’s <code class="docutils literal notranslate"><span class="pre">gym.envs.mujoco.Hopper</span></code></p></td>
<td><p>250</p></td>
<td><p>3</p></td>
<td><p>15 / 16*</p></td>
</tr>
<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">fancy/HopperJumpMarkov-v0</span></code></p></td>
<td><p><code class="docutils literal notranslate"><span class="pre">fancy/HopperJump-v0</span></code>, but with an alternative reward that is Markovian.</p></td>
<td><p>250</p></td>
<td><p>3</p></td>
<td><p>15 / 16*</p></td>
</tr>
<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">fancy/HopperJumpSparse-v0</span></code></p></td>
<td><p>Hopper Jump task with sparse rewards, based on Gymnasium’s <code class="docutils literal notranslate"><span class="pre">gym.envs.mujoco.Hopper</span></code></p></td>
<td><p>250</p></td>
<td><p>3</p></td>
<td><p>15 / 16*</p></td>
</tr>
<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">fancy/HopperJump-v0</span></code></p></td>
<td><p>Hopper Jump task with continuous rewards, based on Gymnasium’s <code class="docutils literal notranslate"><span class="pre">gym.envs.mujoco.Hopper</span></code></p></td>
<td><p>250</p></td>
<td><p>3</p></td>
<td><p>15 / 16*</p></td>
</tr>
<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">fancy/AntJump-v0</span></code></p></td>
<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">fancy/AntJump-v0</span></code></p></td>
<td><p>Ant Jump task, based on Gymnasium’s <code class="docutils literal notranslate"><span class="pre">gym.envs.mujoco.Ant</span></code></p></td>
<td><p>200</p></td>
<td><p>8</p></td>
<td><p>119</p></td>
</tr>
<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">fancy/HalfCheetahJump-v0</span></code></p></td>
<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">fancy/HalfCheetahJump-v0</span></code></p></td>
<td><p>HalfCheetah Jump task, based on Gymnasium’s <code class="docutils literal notranslate"><span class="pre">gym.envs.mujoco.HalfCheetah</span></code></p></td>
<td><p>100</p></td>
<td><p>6</p></td>
<td><p>112</p></td>
</tr>
<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">fancy/HopperJumpOnBox-v0</span></code></p></td>
<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">fancy/HopperJumpOnBox-v0</span></code></p></td>
<td><p>Hopper Jump on Box task, based on Gymnasium’s <code class="docutils literal notranslate"><span class="pre">gym.envs.mujoco.Hopper</span></code></p></td>
<td><p>250</p></td>
<td><p>4</p></td>
<td><p>16 / 100*</p></td>
</tr>
<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">fancy/HopperThrow-v0</span></code></p></td>
<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">fancy/HopperThrow-v0</span></code></p></td>
<td><p>Hopper Throw task, based on Gymnasium’s <code class="docutils literal notranslate"><span class="pre">gym.envs.mujoco.Hopper</span></code></p></td>
<td><p>250</p></td>
<td><p>3</p></td>
<td><p>18 / 100*</p></td>
</tr>
<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">fancy/HopperThrowInBasket-v0</span></code></p></td>
<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">fancy/HopperThrowInBasket-v0</span></code></p></td>
<td><p>Hopper Throw in Basket task, based on Gymnasium’s <code class="docutils literal notranslate"><span class="pre">gym.envs.mujoco.Hopper</span></code></p></td>
<td><p>250</p></td>
<td><p>3</p></td>
<td><p>18 / 100*</p></td>
</tr>
<tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">fancy/Walker2DJump-v0</span></code></p></td>
<tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">fancy/Walker2DJump-v0</span></code></p></td>
<td><p>Walker 2D Jump task, based on Gymnasium’s <code class="docutils literal notranslate"><span class="pre">gym.envs.mujoco.Walker2d</span></code></p></td>
<td><p>300</p></td>
<td><p>6</p></td>

File diff suppressed because one or more lines are too long