<p>The box-pushing task is a manipulation environment for reinforcement learning (RL) built around the Franka Emika Panda robotic arm, which has seven degrees of freedom (DoFs). The objective is to push a box to a specified goal position and orientation.</p>
<p>The context space of this environment consists of a goal position, constrained to a fixed range along the x- and y-axes, and a goal orientation covering the full 360-degree rotation about the z-axis. The robot has to reach the goal within 5 centimeters in position and within 0.5 radians in orientation.</p>
<p>The observation space includes the sine and cosine of the joint angles, the joint velocities, and the quaternion orientations of the end-effector and the box. The action space consists of the torques applied to each joint.</p>
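<p>As a minimal sketch of how such an environment can be instantiated and its spaces inspected (the environment ID <code class="docutils literal notranslate"><span class="pre">fancy/BoxPushingDense-v0</span></code> is assumed here for illustration; depending on the installed version, environments may instead be created via <code class="docutils literal notranslate"><span class="pre">fancy_gym.make</span></code>):</p>
<div class="highlight-python notranslate"><div class="highlight"><pre>
import gymnasium as gym
import fancy_gym  # importing fancy_gym registers the fancy/* environments

# Environment ID assumed for illustration; see the environment table for exact names.
env = gym.make("fancy/BoxPushingDense-v0")

obs, info = env.reset(seed=0)
print(env.observation_space)  # joint sin/cos, joint velocities, end-effector and box quaternions
print(env.action_space)       # one torque command per joint

action = env.action_space.sample()
obs, reward, terminated, truncated, info = env.step(action)
env.close()
</pre></div></div>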
<p>A composite reward function serves as the performance metric for the RL system. It combines the distance of the box to the goal, the box’s orientation error, keeping the rod within the box, maintaining the rod’s desired orientation, penalties for joint position and velocity limit violations, and an action cost for energy expenditure.</p>
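<p>The exact weighting of these terms is defined by the environment itself; the snippet below is only an illustrative sketch of how such a composite reward could be combined, using hypothetical weights:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre>
import numpy as np

def composite_reward(box_goal_dist, box_rot_err, rod_in_box_err,
                     rod_rot_err, joint_limit_penalty, action):
    # All weights are hypothetical; the real environment defines its own coefficients.
    return (
        -3.5 * box_goal_dist        # distance of the box to the goal
        - 2.0 * box_rot_err         # box orientation error
        - 1.0 * rod_in_box_err      # keep the rod inside the box
        - 1.0 * rod_rot_err         # rod orientation error
        - joint_limit_penalty       # joint position / velocity limit violations
        - 5e-4 * float(np.sum(np.square(action)))  # action cost (energy expenditure)
    )
</pre></div></div>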
<p>Variations of this environment are available, differing in reward structure and in whether the box’s initial position is randomized. These variations are designed to challenge the generalization and adaptation capabilities of RL algorithms. Temporally sparse variants provide a reward only at the final timestep. Spatially sparse variants provide a reward only once the goal is almost reached, i.e., the box is close enough to the goal and roughly correctly aligned.</p>
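<p>A spatially sparse variant can be understood as gating the reward on the success thresholds above (5 centimeters in position, 0.5 radians in orientation); the following is an illustrative sketch, not the environment’s actual implementation:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre>
import numpy as np

POS_THRESHOLD = 0.05  # 5 cm positional tolerance
ROT_THRESHOLD = 0.5   # 0.5 rad orientation tolerance

def spatially_sparse_reward(box_pos, goal_pos, rot_err, dense_reward):
    """Return a reward only when the box is close enough to the goal and roughly aligned."""
    close_enough = np.linalg.norm(np.asarray(box_pos) - np.asarray(goal_pos)) <= POS_THRESHOLD
    aligned = abs(rot_err) <= ROT_THRESHOLD
    return dense_reward if (close_enough and aligned) else 0.0
</pre></div></div>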
<p>The table tennis task uses a robotic arm with seven degrees of freedom (DoFs). The task is to respond to incoming balls and return them accurately to a specified goal location on the opponent’s side of the table.</p>
<p>The context space for this environment includes the initial ball position, with x-coordinates from -1 to -0.2 meters and y-coordinates from -0.65 to 0.65 meters, and the goal position, with x-coordinates from -1.2 to -0.2 meters and y-coordinates from -0.6 to 0.6 meters. The full observation space comprises the sine and cosine values of the joint angles, the joint velocities, and the ball’s velocity, providing comprehensive information for the RL system to base its decisions on.</p>
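<p>For illustration, sampling a context (initial ball position and goal position) from these ranges could look as follows; the environment performs this sampling internally:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre>
import numpy as np

rng = np.random.default_rng(seed=0)

# Ranges taken from the context-space description above (in meters).
ball_x = rng.uniform(-1.0, -0.2)
ball_y = rng.uniform(-0.65, 0.65)
goal_x = rng.uniform(-1.2, -0.2)
goal_y = rng.uniform(-0.6, 0.6)

context = {"ball_pos": (ball_x, ball_y), "goal_pos": (goal_x, goal_y)}
print(context)
</pre></div></div>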
<p>A task is considered successfully completed when the returned ball not only lands on the opponent’s side of the table but also within 20 centimeters of the goal location. The reward function reflects several conditions of play, including whether the ball was hit, whether it landed on the table, and how close the ball’s landing position is to the goal location.</p>
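<p>The success criterion can be summarized as a simple check on the landing point; the sketch below uses hypothetical variable names:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre>
import numpy as np

def is_successful_return(landing_pos, goal_pos, landed_on_opponent_side):
    """Ball must land on the opponent's side and within 20 cm of the goal location."""
    within_margin = np.linalg.norm(np.asarray(landing_pos) - np.asarray(goal_pos)) <= 0.2
    return landed_on_opponent_side and within_margin
</pre></div></div>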
<p>Variations of the table tennis environment are available to cater to different research needs. These variations maintain the foundational challenge of precise ball return while providing additional complexity for RL algorithms to overcome.</p>
<p>The Beer Pong task uses a robotic arm with seven degrees of freedom (DoFs), challenging the robot to throw a ball into a cup placed on a large table. The environment’s context is given by the cup’s location, with x-coordinates from -1.42 to 1.42 meters and y-coordinates from -4.05 to -1.25 meters.</p>
<p>The observation space includes the cosine and sine of the robot’s joint angles, the angular velocities, and distances of the ball relative to the top and bottom of the cup, along with the cup’s position and the current timestep. The action space for the robot is defined by the torques applied to each joint. For episode-based methods, the parameter space is expanded to 15 dimensions, which includes two weights for the basis functions per joint and the duration of the throw, namely the ball release time.</p>
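<p>For episode-based methods, the 15-dimensional parameter vector can thus be read as two basis-function weights for each of the seven joints plus the ball release time; an illustrative split with hypothetical names:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre>
import numpy as np

def split_parameters(params):
    """Split a 15-dimensional parameter vector: 2 weights x 7 joints + release time."""
    params = np.asarray(params)
    assert params.shape == (15,)
    weights = params[:14].reshape(7, 2)  # two basis-function weights per joint
    release_time = params[14]            # duration of the throw (ball release time)
    return weights, release_time
</pre></div></div>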
<p>Action penalties are implemented as the sum of squared torques across all joints, penalizing excessive force and encouraging efficient motion. At each timestep t before the final timestep T, the reward consists only of this action penalty; at t = T, a non-Markovian reward based on the ball’s position relative to the cup is added on top of the action penalty.</p>
<p>The task-specific reward at the final timestep distinguishes the following cases:</p>
<ul class="simple">
<li><p>The ball contacts the ground before touching the table.</p></li>
<li><p>The ball is not in the cup and has not made contact with the table.</p></li>
<li><p>The ball is not in the cup but has made contact with the table.</p></li>
<li><p>The ball successfully lands in the cup.</p></li>
</ul>
<p>An additional reward component at the final timestep T assesses the chosen ball release time to ensure it falls within a reasonable range. The overall return for an episode is the sum of the rewards at each timestep, the task-specific reward, and the release time reward.</p>
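<p>An illustrative sketch of how the per-step action penalty and the overall episode return are composed (weights and function names are assumptions, not the environment’s exact code):</p>
<div class="highlight-python notranslate"><div class="highlight"><pre>
import numpy as np

def action_penalty(torques):
    """Squared torque sum over all joints, penalizing excessive force."""
    return float(np.sum(np.square(torques)))

def episode_return(torque_trajectory, task_reward, release_time_reward):
    """Sum of per-step penalties plus the final task-specific and release-time rewards."""
    step_rewards = [-action_penalty(tau) for tau in torque_trajectory]
    return sum(step_rewards) + task_reward + release_time_reward
</pre></div></div>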
<p>A successful throw in this task is determined by the ball landing in the cup at the episode’s conclusion, showcasing the robot’s ability to accurately predict and execute the complex motion required for this popular party game.</p>
<td><p>Same as <code class="docutils literal notranslate"><span class="pre">fancy/Reacher-v0</span></code>, but the distance penalty is only provided in the last time step.</p></td>
<td><p>Same as <code class="docutils literal notranslate"><span class="pre">fancy/LongReacher-v0</span></code>, but the distance penalty is only provided in the last time step.</p></td>
<td><p>Reacher task with 5 links, based on Gymnasium’s <code class="docutils literal notranslate"><span class="pre">gym.envs.mujoco.ReacherEnv</span></code></p></td>
<td><p>Sparse Reacher task with 5 links, based on Gymnasium’s <code class="docutils literal notranslate"><span class="pre">gym.envs.mujoco.ReacherEnv</span></code></p></td>
<td><p>Reacher task with 7 links, based on Gymnasium’s <code class="docutils literal notranslate"><span class="pre">gym.envs.mujoco.ReacherEnv</span></code></p></td>
<td><p>Sparse Reacher task with 7 links, based on Gymnasium’s <code class="docutils literal notranslate"><span class="pre">gym.envs.mujoco.ReacherEnv</span></code></p></td>
<td><p>Hopper Jump task with sparse rewards, based on Gymnasium’s <code class="docutils literal notranslate"><span class="pre">gym.envs.mujoco.Hopper</span></code></p></td>
<td><p>Hopper Jump task with continuous rewards, based on Gymnasium’s <code class="docutils literal notranslate"><span class="pre">gym.envs.mujoco.Hopper</span></code></p></td>
<td><p>HalfCheetah Jump task, based on Gymnasium’s <code class="docutils literal notranslate"><span class="pre">gym.envs.mujoco.HalfCheetah</span></code></p></td>
<td><p>Hopper Jump on Box task, based on Gymnasium’s <code class="docutils literal notranslate"><span class="pre">gym.envs.mujoco.Hopper</span></code></p></td>
<td><p>Hopper Throw task, based on Gymnasium’s <code class="docutils literal notranslate"><span class="pre">gym.envs.mujoco.Hopper</span></code></p></td>
<td><p>Hopper Throw in Basket task, based on Gymnasium’s <code class="docutils literal notranslate"><span class="pre">gym.envs.mujoco.Hopper</span></code></p></td>
<td><p>Walker 2D Jump task, based on Gymnasium’s <code class="docutils literal notranslate"><span class="pre">gym.envs.mujoco.Walker2d</span></code></p></td>
<td><p>300</p></td>
<td><p>6</p></td>
<td><p>18 / 19*</p></td>
</tr>
</tbody>
</table>
<p>*Observation dimensions depend on configuration.</p>
<p>Most of these environments also exist as MP variants. Refer to them using <code class="docutils literal notranslate"><span class="pre">fancy_DMP/<name></span></code>, <code class="docutils literal notranslate"><span class="pre">fancy_ProMP/<name></span></code>, or <code class="docutils literal notranslate"><span class="pre">fancy_ProDMP/<name></span></code>.</p>
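<p>For example, an MP variant can be created in the same way as a step-based environment; the concrete ID below is an assumption for illustration:</p>
<div class="highlight-python notranslate"><div class="highlight"><pre>
import gymnasium as gym
import fancy_gym  # also registers the fancy_DMP/*, fancy_ProMP/* and fancy_ProDMP/* variants

# ID assumed for illustration; substitute the environment name you need.
env = gym.make("fancy_ProMP/Reacher5d-v0")
obs, info = env.reset(seed=0)

# MP variants expect one parameter vector per episode instead of per-step torques.
params = env.action_space.sample()
obs, reward, terminated, truncated, info = env.step(params)
env.close()
</pre></div></div>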