Final?! changes to docs
parent 59c980e495
commit 4be05440a1
BIN docs/build/doctrees/environment.pickle vendored
Binary file not shown.
							
								
								
									
										
BIN docs/build/doctrees/envs/fancy/mujoco.doctree vendored
Binary file not shown.
							| @ -65,13 +65,6 @@ The observation space includes the cosine and sine of the robot's joint angles, | |||||||
| 
 | 
 | ||||||
| Action penalties are implemented in the form of squared torque sums applied across all joints, penalizing excessive force and encouraging efficient motion. The reward function at each timestep t before the final timestep T penalizes the action penalty, while at t=T, a non-Markovian reward based on the ball's position relative to the cup and the action penalty is considered. | Action penalties are implemented in the form of squared torque sums applied across all joints, penalizing excessive force and encouraging efficient motion. The reward function at each timestep t before the final timestep T penalizes the action penalty, while at t=T, a non-Markovian reward based on the ball's position relative to the cup and the action penalty is considered. | ||||||
| 
 | 
 | ||||||
| Conditions for the task are specified as follows: |  | ||||||
| 
 |  | ||||||
| - The ball contacts the ground before touching the table. |  | ||||||
| - The ball is not in the cup and has not made contact with the table. |  | ||||||
| - The ball is not in the cup but has made contact with the table. |  | ||||||
| - The ball successfully lands in the cup. |  | ||||||
| 
 |  | ||||||
| An additional reward component at the final timestep T assesses the chosen ball release time to ensure it falls within a reasonable range. The overall return for an episode is the sum of the rewards at each timestep, the task-specific reward, and the release time reward. | An additional reward component at the final timestep T assesses the chosen ball release time to ensure it falls within a reasonable range. The overall return for an episode is the sum of the rewards at each timestep, the task-specific reward, and the release time reward. | ||||||
| 
 | 
 | ||||||
| A successful throw in this task is determined by the ball landing in the cup at the episode's conclusion, showcasing the robot's ability to accurately predict and execute the complex motion required for this popular party game. | A successful throw in this task is determined by the ball landing in the cup at the episode's conclusion, showcasing the robot's ability to accurately predict and execute the complex motion required for this popular party game. | ||||||
|  | |||||||
							
								
								
									
7 docs/build/html/envs/fancy/mujoco.html vendored
							| @ -241,13 +241,6 @@ | |||||||
| <p>The Beer Pong task is based upon a robotic system with seven Degrees of Freedom (DoF), challenging the robot to throw a ball into a cup placed on a large table. The environment’s context is established by the cup’s location, defined within a range of x-coordinates from -1.42 to 1.42 meters and y-coordinates from -4.05 to -1.25 meters.</p> | <p>The Beer Pong task is based upon a robotic system with seven Degrees of Freedom (DoF), challenging the robot to throw a ball into a cup placed on a large table. The environment’s context is established by the cup’s location, defined within a range of x-coordinates from -1.42 to 1.42 meters and y-coordinates from -4.05 to -1.25 meters.</p> | ||||||
| <p>The observation space includes the cosine and sine of the robot’s joint angles, the angular velocities, and distances of the ball relative to the top and bottom of the cup, along with the cup’s position and the current timestep. The action space for the robot is defined by the torques applied to each joint. For episode-based methods, the parameter space is expanded to 15 dimensions, which includes two weights for the basis functions per joint and the duration of the throw, namely the ball release time.</p> | <p>The observation space includes the cosine and sine of the robot’s joint angles, the angular velocities, and distances of the ball relative to the top and bottom of the cup, along with the cup’s position and the current timestep. The action space for the robot is defined by the torques applied to each joint. For episode-based methods, the parameter space is expanded to 15 dimensions, which includes two weights for the basis functions per joint and the duration of the throw, namely the ball release time.</p> | ||||||
| <p>Action penalties are implemented in the form of squared torque sums applied across all joints, penalizing excessive force and encouraging efficient motion. The reward function at each timestep t before the final timestep T penalizes the action penalty, while at t=T, a non-Markovian reward based on the ball’s position relative to the cup and the action penalty is considered.</p> | <p>Action penalties are implemented in the form of squared torque sums applied across all joints, penalizing excessive force and encouraging efficient motion. The reward function at each timestep t before the final timestep T penalizes the action penalty, while at t=T, a non-Markovian reward based on the ball’s position relative to the cup and the action penalty is considered.</p> | ||||||
| <p>Conditions for the task are specified as follows:</p> |  | ||||||
| <ul class="simple"> |  | ||||||
| <li><p>The ball contacts the ground before touching the table.</p></li> |  | ||||||
| <li><p>The ball is not in the cup and has not made contact with the table.</p></li> |  | ||||||
| <li><p>The ball is not in the cup but has made contact with the table.</p></li> |  | ||||||
| <li><p>The ball successfully lands in the cup.</p></li> |  | ||||||
| </ul> |  | ||||||
| <p>An additional reward component at the final timestep T assesses the chosen ball release time to ensure it falls within a reasonable range. The overall return for an episode is the sum of the rewards at each timestep, the task-specific reward, and the release time reward.</p> | <p>An additional reward component at the final timestep T assesses the chosen ball release time to ensure it falls within a reasonable range. The overall return for an episode is the sum of the rewards at each timestep, the task-specific reward, and the release time reward.</p> | ||||||
| <p>A successful throw in this task is determined by the ball landing in the cup at the episode’s conclusion, showcasing the robot’s ability to accurately predict and execute the complex motion required for this popular party game.</p> | <p>A successful throw in this task is determined by the ball landing in the cup at the episode’s conclusion, showcasing the robot’s ability to accurately predict and execute the complex motion required for this popular party game.</p> | ||||||
| <table class="docutils align-default"> | <table class="docutils align-default"> | ||||||
|  | |||||||
							
								
								
									
2 docs/build/html/searchindex.js vendored
File diff suppressed because one or more lines are too long
							| @ -65,13 +65,6 @@ The observation space includes the cosine and sine of the robot's joint angles, | |||||||
| 
 | 
 | ||||||
| Action penalties are implemented in the form of squared torque sums applied across all joints, penalizing excessive force and encouraging efficient motion. The reward function at each timestep t before the final timestep T penalizes the action penalty, while at t=T, a non-Markovian reward based on the ball's position relative to the cup and the action penalty is considered. | Action penalties are implemented in the form of squared torque sums applied across all joints, penalizing excessive force and encouraging efficient motion. The reward function at each timestep t before the final timestep T penalizes the action penalty, while at t=T, a non-Markovian reward based on the ball's position relative to the cup and the action penalty is considered. | ||||||
| 
 | 
 | ||||||
| Conditions for the task are specified as follows: |  | ||||||
| 
 |  | ||||||
| - The ball contacts the ground before touching the table. |  | ||||||
| - The ball is not in the cup and has not made contact with the table. |  | ||||||
| - The ball is not in the cup but has made contact with the table. |  | ||||||
| - The ball successfully lands in the cup. |  | ||||||
| 
 |  | ||||||
| An additional reward component at the final timestep T assesses the chosen ball release time to ensure it falls within a reasonable range. The overall return for an episode is the sum of the rewards at each timestep, the task-specific reward, and the release time reward. | An additional reward component at the final timestep T assesses the chosen ball release time to ensure it falls within a reasonable range. The overall return for an episode is the sum of the rewards at each timestep, the task-specific reward, and the release time reward. | ||||||
| 
 | 
 | ||||||
| A successful throw in this task is determined by the ball landing in the cup at the episode's conclusion, showcasing the robot's ability to accurately predict and execute the complex motion required for this popular party game. | A successful throw in this task is determined by the ball landing in the cup at the episode's conclusion, showcasing the robot's ability to accurately predict and execute the complex motion required for this popular party game. | ||||||
|  | |||||||
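The return structure described in the retained paragraphs (a per-timestep action penalty formed from squared torques, plus a task-specific reward and a release-time reward at the final timestep) can be sketched roughly as follows. This is only an illustration under assumptions: the function names, the `penalty_weight` parameter, and the example values are hypothetical and are not taken from the environment's source.

```python
from typing import Sequence


def action_penalty(torques: Sequence[float]) -> float:
    """Sum of squared joint torques for one timestep."""
    return sum(t * t for t in torques)


def episode_return(
    torque_trajectory: Sequence[Sequence[float]],
    task_reward: float,
    release_time_reward: float,
    penalty_weight: float = 1.0,  # hypothetical scale, not from the docs
) -> float:
    """Sum the per-step penalties, then add the two final-step terms.

    Each timestep contributes a negative reward proportional to its
    action penalty; the task-specific and release-time rewards are
    added once, as they are only evaluated at the final timestep T.
    """
    step_rewards = [
        -penalty_weight * action_penalty(tau) for tau in torque_trajectory
    ]
    return sum(step_rewards) + task_reward + release_time_reward


if __name__ == "__main__":
    # Two timesteps with two joints each, purely illustrative numbers.
    print(episode_return([[1.0, 2.0], [0.0, 3.0]],
                         task_reward=10.0,
                         release_time_reward=-1.0))
```

The actual environment may weight or shape these terms differently; the sketch only mirrors the additive structure stated in the text.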