Text descriptions of our flagship Mujoco envs

Dominik Moritz Roth 2024-01-18 12:38:01 +01:00
parent 9a42779027
commit 1e55bf13cb


@ -4,27 +4,21 @@
### Environments made by Fancy Gym
#### Beer Pong
TODO: Change image
<div class='center'>
<img src="../../_static/imgs/Box_Pushing.gif" style="margin: 5%; width: 45%;">
</div>
<br>
| Name | Description | Horizon | Action Dimension | Observation Dimension |
| ------------------------------- | ---------------------------------------------------------------------------------------------- | ------- | ---------------- | --------------------- |
| `fancy/BeerPong-v0` | Beer Pong task, based on a custom environment with multiple task variations | 300 | 3 | 29 |
| `fancy/BeerPongStepBased-v0` | Step-based rewards for the Beer Pong task, based on a custom environment with episodic rewards | 300 | 3 | 29 |
| `fancy/BeerPongFixedRelease-v0` | Beer Pong with fixed release, based on a custom environment with episodic rewards | 300 | 3 | 29 |
#### Box Pushing
<div class='center'>
<img src="../../_static/imgs/Box_Pushing.gif" style="margin: 5%; width: 45%;">
</div>
<br>
The box-pushing task is an advanced environment for reinforcement learning (RL) systems built around the Franka Emika Panda robotic arm, which has seven degrees of freedom (DoFs). The objective is to push a box to a specified goal position and orientation.
The context space consists of a goal position, constrained to a fixed range along the x and y axes, and a goal orientation that covers the full 360-degree range about the z-axis. The robot must place the box within 5 centimeters of the goal position and within 0.5 radians of the goal orientation.
The observation space includes the sine and cosine of the robot's joint angles, the joint velocities, and the quaternion orientations of the end-effector and the box. The action space consists of the torques applied to each joint.
A composite reward function serves as the performance metric for the RL system. It accounts for the distance to the goal, the box's orientation, keeping the rod inside the box, and achieving the rod's desired orientation. It also includes penalties for violating joint position and velocity limits, as well as an action cost for energy expenditure.
Variations of this environment differ in their reward structure and in whether the box's initial position is randomized. They are designed to test how well RL algorithms generalize and adapt.
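The success criterion described above can be sketched in a few lines. This is only an illustration, not the environment's actual implementation: the function names are hypothetical, the orientation is reduced to a single yaw angle about the z-axis, and treating the tolerances as inclusive bounds is an assumption.

```python
import math

# Tolerances taken from the text above.
POS_TOL_M = 0.05    # 5 centimeters
ROT_TOL_RAD = 0.5   # 0.5 radians


def yaw_error(yaw, goal_yaw):
    """Smallest absolute angle between two rotations about the z-axis."""
    # Wrap the difference into [-pi, pi) so that 359 degrees vs. 1 degree is small.
    diff = (yaw - goal_yaw + math.pi) % (2.0 * math.pi) - math.pi
    return abs(diff)


def is_success(box_xy, box_yaw, goal_xy, goal_yaw):
    """Hypothetical check: box within both the position and rotation tolerance."""
    pos_err = math.dist(box_xy, goal_xy)
    rot_err = yaw_error(box_yaw, goal_yaw)
    # Inclusive bounds are an assumption; the text only says "within".
    return pos_err <= POS_TOL_M and rot_err <= ROT_TOL_RAD


if __name__ == "__main__":
    print(is_success((0.40, 0.00), 0.1, (0.42, 0.01), 0.3))
    print(is_success((0.40, 0.00), 0.1, (0.50, 0.01), 0.3))
```

Wrapping the angle difference matters because the goal orientation spans the full circle, so a naive subtraction would misjudge orientations near the wrap-around point.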
| Name | Description | Horizon | Action Dimension | Observation Dimension |
| ------------------------------------------ | -------------------------------------------------------------------- | ------- | ---------------- | --------------------- |
@ -32,14 +26,21 @@ TODO: Change image
| `fancy/BoxPushingTemporalSparse-v0` | Custom Box-pushing task with temporally sparse rewards | 100 | 3 | 13 |
| `fancy/BoxPushingTemporalSpatialSparse-v0` | Custom Box-pushing task with temporally and spatially sparse rewards | 100 | 3 | 13 |
---
#### Table Tennis
TODO: Change image
<div class='center'>
<img src="../../_static/imgs/Box_Pushing.gif" style="margin: 5%; width: 45%;">
<img src="../../_static/imgs/Table_Tennis.gif" style="margin: 5%; width: 45%;">
</div>
<br>
The table tennis task is a dynamic, interactive environment for developing and testing reinforcement learning (RL) systems. A robotic arm with seven degrees of freedom (DoFs) must respond to incoming balls and return them accurately to a specified goal location on the opponent's side of the table.
The context space includes the initial ball position, with x-coordinates from -1 to -0.2 meters and y-coordinates from -0.65 to 0.65 meters, and the goal position, with x-coordinates from -1.2 to -0.2 meters and y-coordinates from -0.6 to 0.6 meters. The full observation space comprises the sine and cosine of the joint angles, the joint velocities, and the ball's velocity.
A task counts as successfully completed when the returned ball lands on the opponent's side of the table and within 20 centimeters of the goal location. The reward function reflects several conditions of play: whether the ball was hit, whether it landed on the table, and how close its landing position was to the goal location.
Variations of the table tennis environment are available for different research needs. They keep the core challenge of precise ball return while adding complexity for RL algorithms to overcome.
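As a rough illustration of the context ranges and the success condition above, the following sketch samples a context and checks a landing position. The names are hypothetical, uniform sampling is an assumption (the text only states the ranges), and the real environment derives the landing point from the simulation.

```python
import math
import random

# Ranges in meters, taken from the text above.
BALL_X_RANGE = (-1.0, -0.2)
BALL_Y_RANGE = (-0.65, 0.65)
GOAL_X_RANGE = (-1.2, -0.2)
GOAL_Y_RANGE = (-0.6, 0.6)
SUCCESS_RADIUS_M = 0.2  # 20 centimeters


def sample_context(rng):
    """Draw an initial ball position and a goal position (uniform: an assumption)."""
    ball = (rng.uniform(*BALL_X_RANGE), rng.uniform(*BALL_Y_RANGE))
    goal = (rng.uniform(*GOAL_X_RANGE), rng.uniform(*GOAL_Y_RANGE))
    return ball, goal


def is_success(landing_xy, goal_xy, landed_on_opponent_side):
    """Hypothetical check: ball on the opponent's side and close to the goal."""
    close_enough = math.dist(landing_xy, goal_xy) <= SUCCESS_RADIUS_M
    return landed_on_opponent_side and close_enough


if __name__ == "__main__":
    ball, goal = sample_context(random.Random(0))
    print("ball:", ball, "goal:", goal)
    print(is_success((-0.5, 0.1), (-0.6, 0.0), landed_on_opponent_side=True))
```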
| Name | Description | Horizon | Action Dimension | Observation Dimension |
| ----------------------------------- | -------------------------------------------------------------------------------------------------- | ------- | ---------------- | --------------------- |
@ -51,6 +52,40 @@ TODO: Change image
| `fancy/TableTennisGoalSwitching-v0` | Table Tennis task with goal switching, based on a custom environment for table tennis | 350 | 7 | 19 |
| `fancy/TableTennisWindReplan-v0` | Table Tennis task with wind effects and replanning, based on a custom environment for table tennis | 350 | 7 | 19 |
---
#### Beer Pong
The Beer Pong task uses a robot with seven degrees of freedom (DoFs) that must throw a ball into a cup placed on a large table. The context is the cup's location, with x-coordinates from -1.42 to 1.42 meters and y-coordinates from -4.05 to -1.25 meters.
The observation space includes the cosine and sine of the robot's joint angles, the angular velocities, the distances of the ball to the top and bottom of the cup, the cup's position, and the current timestep. The action space consists of the torques applied to each joint. For episode-based methods, the parameter space has 15 dimensions: two basis-function weights per joint (14 in total) plus the duration of the throw, that is, the ball release time.
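The 15-dimensional parameter count follows from 7 joints × 2 weights + 1 release time. The sketch below splits such a vector into its parts. The function name and the ordering (joint-major weights first, release time last) are purely illustrative assumptions and not necessarily the layout the library uses.

```python
N_JOINTS = 7
WEIGHTS_PER_JOINT = 2
N_PARAMS = N_JOINTS * WEIGHTS_PER_JOINT + 1  # 14 weights + 1 release time = 15


def split_parameters(params):
    """Split a flat parameter vector into per-joint weights and a release time.

    Assumed (hypothetical) layout: [w_0_0, w_0_1, w_1_0, ..., w_6_1, release_time].
    """
    if len(params) != N_PARAMS:
        raise ValueError(f"expected {N_PARAMS} parameters, got {len(params)}")
    flat_weights = params[:-1]
    weights = [
        list(flat_weights[j * WEIGHTS_PER_JOINT:(j + 1) * WEIGHTS_PER_JOINT])
        for j in range(N_JOINTS)
    ]
    release_time = params[-1]
    return weights, release_time


if __name__ == "__main__":
    weights, release_time = split_parameters([0.1] * 14 + [0.8])
    print(len(weights), "joints, release time:", release_time)
```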
The action penalty is the sum of squared torques across all joints, which penalizes excessive force and encourages efficient motion. At each timestep t before the final timestep T, the reward consists only of this action penalty. At t=T, the reward combines the action penalty with a non-Markovian term based on the ball's position relative to the cup.
The task-specific reward distinguishes the following cases:
- The ball contacts the ground before touching the table.
- The ball is not in the cup and has not made contact with the table.
- The ball is not in the cup but has made contact with the table.
- The ball successfully lands in the cup.
An additional reward component at the final timestep T assesses the chosen ball release time to ensure it falls within a reasonable range. The overall return for an episode is the sum of the rewards at each timestep, the task-specific reward, and the release time reward.
A throw counts as successful if the ball is in the cup at the end of the episode.
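The way the episode return is composed can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the `penalty_scale` factor are hypothetical, and the task-specific and release-time rewards are passed in as plain numbers because the text does not give their formulas.

```python
def action_penalty(torques):
    """Sum of squared torques across all joints for one timestep."""
    return sum(tau * tau for tau in torques)


def episode_return(torque_sequence, task_reward, release_time_reward,
                   penalty_scale=1.0):
    """Per-step action penalties plus the two terminal reward terms.

    penalty_scale is a hypothetical weighting; the text does not state one.
    """
    # The penalty enters every step reward with a negative sign.
    step_rewards = [-penalty_scale * action_penalty(t) for t in torque_sequence]
    return sum(step_rewards) + task_reward + release_time_reward


if __name__ == "__main__":
    torques = [[1.0, 2.0], [0.5, 0.0]]  # two timesteps, two joints for brevity
    print(episode_return(torques, task_reward=-2.0, release_time_reward=-0.5))
```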
<div class='center'>
<img src="../../_static/imgs/Beer_Pong.gif" style="margin: 5%; width: 45%;">
</div>
<!-- TODO: Vid is ugly and unsuccessful. Replace. -->
| Name | Description | Horizon | Action Dimension | Observation Dimension |
| ------------------------------- | ---------------------------------------------------------------------------------------------- | ------- | ---------------- | --------------------- |
| `fancy/BeerPong-v0` | Beer Pong task, based on a custom environment with multiple task variations | 300 | 3 | 29 |
| `fancy/BeerPongStepBased-v0` | Step-based rewards for the Beer Pong task, based on a custom environment with episodic rewards | 300 | 3 | 29 |
| `fancy/BeerPongFixedRelease-v0` | Beer Pong with fixed release, based on a custom environment with episodic rewards | 300 | 3 | 29 |
---
### Variations of existing environments
| Name | Description | Horizon | Action Dimension | Observation Dimension |