diff --git a/README.md b/README.md
index e9f88a1..41d87f2 100644
--- a/README.md
+++ b/README.md
@@ -1,34 +1,57 @@
-## ALR Custom Environments
+## ALR Environments
-This repository collects custom RL envs not included in Suits like OpenAI gym, rllab, etc.
-Creating a custom (Mujoco) gym environement can be done according to [this guide](https://github.com/openai/gym/blob/master/docs/creating-environments.md).
-For stochastic search problems with gym interace use the Rosenbrock reference implementation.
+This repository collects custom robotics environments that are not included in benchmark suites such as OpenAI gym, rllab, etc.
+Creating a custom (Mujoco) gym environment can be done according to [this guide](https://github.com/openai/gym/blob/master/docs/creating-environments.md).
+For stochastic search problems with a gym interface, use the `Rosenbrock-v0` reference implementation.
+We also support solving environments with Dynamic Movement Primitives (DMPs). When adding new DMP tasks, check the `ViaPointReacherDMP-v0` reference implementation.
+When simply using the tasks, you can also leverage the wrapper class `DmpWrapper` to turn normal gym environments into DMP tasks, as shown in the sketch below.
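+
+A minimal sketch of creating the registered environments (assuming `alr_envs` is installed; the IDs are taken from the tables below):
+
+```python
+import gym
+
+# step-based task
+env = gym.make("alr_envs:HoleReacher-v0")
+obs = env.reset()
+obs, reward, done, info = env.step(env.action_space.sample())
+
+# the corresponding DMP version of the same task
+dmp_env = gym.make("alr_envs:HoleReacherDMP-v0")
+```
+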
## Environments
-Currently we have the following environements:
+Currently we have the following environments:
### Mujoco
-|Name| Description|
-|---|---|
-|`ALRReacher-v0`|Modified (5 links) Mujoco gym's `Reacher-v2` (2 links)|
-|`ALRReacherSparse-v0`|Same as `ALRReacher-v0`, but the distance penalty is only provided in the last time step.|
-|`ALRReacherSparseBalanced-v0`|Same as `ALRReacherSparse-v0`, but the the end effector has to stay upright.|
-|`ALRReacherShort-v0`|Same as `ALRReacher-v0`, but the episode length is reduced to 50.|
-|`ALRReacherShortSparse-v0`|Combination of `ALRReacherSparse-v0` and `ALRReacherShort-v0`.|
-|`ALRReacher7-v0`|Modified (7 links) Mujoco gym's `Reacher-v2` (2 links)|
-|`ALRReacher7Sparse-v0`|Same as `ALRReacher7-v0`, but the distance penalty is only provided in the last time step.|
+|Name| Description|Horizon|Action Dimension|Observation Dimension
+|---|---|---|---|---|
+|`ALRReacher-v0`|Modified (5 links) version of Mujoco gym's `Reacher-v2` (2 links)| 200 | 5 | 21
+|`ALRReacherSparse-v0`|Same as `ALRReacher-v0`, but the distance penalty is only provided in the last time step.| 200 | 5 | 21
+|`ALRReacherSparseBalanced-v0`|Same as `ALRReacherSparse-v0`, but the end-effector has to remain upright.| 200 | 5 | 21
+|`ALRLongReacher-v0`|Modified (7 links) version of Mujoco gym's `Reacher-v2` (2 links)| 200 | 7 | 27
+|`ALRLongReacherSparse-v0`|Same as `ALRLongReacher-v0`, but the distance penalty is only provided in the last time step.| 200 | 7 | 27
+|`ALRLongReacherSparseBalanced-v0`|Same as `ALRLongReacherSparse-v0`, but the end-effector has to remain upright.| 200 | 7 | 27
+|`ALRBallInACupSimple-v0`| Ball-in-a-cup task where a robot needs to catch a ball attached to a cup at its end-effector. | 4000 | 3 | wip
+|`ALRBallInACup-v0`| Ball-in-a-cup task where a robot needs to catch a ball attached to a cup at its end-effector. | 4000 | 7 | wip
+|`ALRBallInACupGoal-v0`| Similar to `ALRBallInACupSimple-v0`, but the ball needs to be caught at a specified goal position. | 4000 | 7 | wip
### Classic Control
-|Name| Description|
-|---|---|
-|`SimpleReacher-v0`| Simple Reaching Task without any physics simulation. Returns no reward until 150 time steps. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory.|
+|Name| Description|Horizon|Action Dimension|Observation Dimension
+|---|---|---|---|---|
+|`SimpleReacher-v0`| Simple reaching task (2 links) without any physics simulation. Provides no reward until 150 time steps. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory.| 200 | 2 | 9
+|`LongSimpleReacher-v0`| Simple reaching task (5 links) without any physics simulation. Provides no reward until 150 time steps. This allows the agent to explore the space, but requires precise actions towards the end of the trajectory.| 200 | 5 | 18
+|`ViaPointReacher-v0`| Simple reaching task leveraging a via point, which supports self-collision detection. Provides a reward only at time steps 100 and 199 for reaching the via point and the goal point, respectively.| 200 | 5 | 18
+|`HoleReacher-v0`| 5-link reaching task where the end-effector needs to reach into a narrow hole without colliding with itself or the walls. | 200 | 5 | 18
+
+### DMP Environments
+These environments are closer to stochastic search. A single step always executes a full trajectory, which is computed by a DMP and tracked by a controller, e.g. a PD controller.
+The goal is to learn the parameters of this DMP in order to generate a suitable trajectory.
+All environments return the full episode reward and additional information about early terminations, e.g. due to collisions. A short usage sketch follows the table below.
+
+|Name| Description|Horizon|Action Dimension|Context Dimension
+|---|---|---|---|---|
+|`ViaPointReacherDMP-v0`| A DMP provides a trajectory for the `ViaPointReacher-v0` task. | 200 | 25
+|`HoleReacherFixedGoalDMP-v0`| A DMP provides a trajectory for the `HoleReacher-v0` task with a fixed goal attractor. | 200 | 25
+|`HoleReacherDMP-v0`| A DMP provides a trajectory for the `HoleReacher-v0` task. The goal attractor needs to be learned. | 200 | 30
+|`ALRBallInACupSimpleDMP-v0`| A DMP provides a trajectory for the `ALRBallInACupSimple-v0` task where only 3 joints are actuated. | 4000 | 15
+|`ALRBallInACupDMP-v0`| A DMP provides a trajectory for the `ALRBallInACup-v0` task. | 4000 | 35
+|`ALRBallInACupGoalDMP-v0`| A DMP provides a trajectory for the `ALRBallInACupGoal-v0` task. | 4000 | 35 | 3
+
+[//]: |`HoleReacherDetPMP-v0`|
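+
+A hedged sketch of an episodic rollout with one of these DMP environments, assuming the wrapper exposes the flat DMP parameter vector (whose size is given in the table above) through a standard `Box` action space:
+
+```python
+import gym
+
+env = gym.make("alr_envs:ViaPointReacherDMP-v0")
+
+env.reset()
+params = env.action_space.sample()  # 25 DMP weights (5 DoF x 5 basis functions)
+obs, episode_reward, done, info = env.step(params)  # executes the full 200-step trajectory
+```
+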
### Stochastic Search
-|Name| Description|
-|---|---|
-|`Rosenbrock{dim}-v0`| Gym interface for Rosenbrock function. `{dim}` is one of 5, 10, 25, 50 or 100. |
+|Name| Description|Horizon|Action Dimension|Observation Dimension
+|---|---|---|---|---|
+|`Rosenbrock{dim}-v0`| Gym interface for Rosenbrock function. `{dim}` is one of 5, 10, 25, 50 or 100. | 1 | `{dim}` | 0
## Install
@@ -60,3 +83,5 @@ for i in range(10000):
state = env.reset()
```
+
+For an example using a DMP-wrapped environment together with asynchronous sampling, have a look at [mp_env_async_sampler.py](./alr_envs/utils/mp_env_async_sampler.py). A rough sketch of the idea is shown below.
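+
+The following is only an illustration of the underlying idea, not the API of `mp_env_async_sampler.py`: the factory functions shipped with the Mujoco tasks can be handed to separate worker processes, each of which evaluates one DMP parameter vector per rollout.
+
+```python
+from multiprocessing import Pool
+
+import numpy as np
+
+from alr_envs.mujoco.ball_in_a_cup.utils import make_simple_dmp_env
+
+
+def rollout(args):
+    rank, params = args
+    env = make_simple_dmp_env(rank, seed=0)()  # the factory returns a constructor
+    env.reset()
+    _, episode_reward, _, info = env.step(params)
+    return episode_reward
+
+
+if __name__ == "__main__":
+    n_samples = 4
+    thetas = 0.1 * np.random.randn(n_samples, 15)  # 3 DoF x 5 basis, cf. `ALRBallInACupSimpleDMP-v0`
+    with Pool(n_samples) as pool:
+        episode_rewards = pool.map(rollout, list(enumerate(thetas)))
+```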
\ No newline at end of file
diff --git a/alr_envs/__init__.py b/alr_envs/__init__.py
index 2e50f80..f814080 100644
--- a/alr_envs/__init__.py
+++ b/alr_envs/__init__.py
@@ -1,6 +1,9 @@
from gym.envs.registration import register
from alr_envs.stochastic_search.functions.f_rosenbrock import Rosenbrock
+# from alr_envs.utils.wrapper.dmp_wrapper import DmpWrapper
+
+# Mujoco
register(
id='ALRReacher-v0',
@@ -9,6 +12,7 @@ register(
kwargs={
"steps_before_reward": 0,
"n_links": 5,
+ "balance": False,
}
)
@@ -19,6 +23,7 @@ register(
kwargs={
"steps_before_reward": 200,
"n_links": 5,
+ "balance": False,
}
)
@@ -34,65 +39,68 @@ register(
)
register(
- id='ALRReacherShort-v0',
- entry_point='alr_envs.mujoco:ALRReacherEnv',
- max_episode_steps=50,
- kwargs={
- "steps_before_reward": 0,
- "n_links": 5,
- }
-)
-
-register(
- id='ALRReacherShortSparse-v0',
- entry_point='alr_envs.mujoco:ALRReacherEnv',
- max_episode_steps=50,
- kwargs={
- "steps_before_reward": 50,
- "n_links": 5,
- }
-)
-
-register(
- id='ALRReacher7-v0',
+ id='ALRLongReacher-v0',
entry_point='alr_envs.mujoco:ALRReacherEnv',
max_episode_steps=200,
kwargs={
"steps_before_reward": 0,
"n_links": 7,
+ "balance": False,
}
)
register(
- id='ALRReacher7Sparse-v0',
+ id='ALRLongReacherSparse-v0',
entry_point='alr_envs.mujoco:ALRReacherEnv',
max_episode_steps=200,
kwargs={
"steps_before_reward": 200,
"n_links": 7,
+ "balance": False,
}
)
register(
- id='ALRReacher7Short-v0',
+ id='ALRLongReacherSparseBalanced-v0',
entry_point='alr_envs.mujoco:ALRReacherEnv',
- max_episode_steps=50,
+ max_episode_steps=200,
kwargs={
- "steps_before_reward": 0,
+ "steps_before_reward": 200,
"n_links": 7,
+ "balance": True,
}
)
register(
- id='ALRReacher7ShortSparse-v0',
- entry_point='alr_envs.mujoco:ALRReacherEnv',
- max_episode_steps=50,
+ id='ALRBallInACupSimple-v0',
+ entry_point='alr_envs.mujoco:ALRBallInACupEnv',
+ max_episode_steps=4000,
kwargs={
- "steps_before_reward": 50,
- "n_links": 7,
+ "simplified": True,
+ "reward_type": "no_context"
}
)
+register(
+ id='ALRBallInACup-v0',
+ entry_point='alr_envs.mujoco:ALRBallInACupEnv',
+ max_episode_steps=4000,
+ kwargs={
+ "reward_type": "no_context"
+ }
+)
+
+register(
+ id='ALRBallInACupGoal-v0',
+ entry_point='alr_envs.mujoco:ALRBallInACupEnv',
+ max_episode_steps=4000,
+ kwargs={
+ "reward_type": "contextual_goal"
+ }
+)
+
+# Classic control
+
register(
id='Balancing-v0',
entry_point='alr_envs.mujoco:BalancingEnv',
@@ -112,7 +120,7 @@ register(
)
register(
- id='SimpleReacher5-v0',
+ id='LongSimpleReacher-v0',
entry_point='alr_envs.classic_control:SimpleReacherEnv',
max_episode_steps=200,
kwargs={
@@ -120,12 +128,155 @@ register(
}
)
+register(
+ id='ViaPointReacher-v0',
+ entry_point='alr_envs.classic_control.viapoint_reacher:ViaPointReacher',
+ max_episode_steps=200,
+ kwargs={
+ "n_links": 5,
+ "allow_self_collision": False,
+ "collision_penalty": 1000
+ }
+)
+
+register(
+ id='HoleReacher-v0',
+ entry_point='alr_envs.classic_control.hole_reacher:HoleReacher',
+ max_episode_steps=200,
+ kwargs={
+ "n_links": 5,
+ "allow_self_collision": False,
+ "allow_wall_collision": False,
+ "hole_width": 0.25,
+ "hole_depth": 1,
+ "hole_x": 2,
+ "collision_penalty": 100,
+ }
+)
+
+# MP environments
+
+register(
+ id='ViaPointReacherDMP-v0',
+ entry_point='alr_envs.utils.make_env_helpers:make_dmp_env',
+ # max_episode_steps=1,
+ kwargs={
+ "name": "alr_envs:ViaPointReacher-v0",
+ "num_dof": 5,
+ "num_basis": 5,
+ "duration": 2,
+ "alpha_phase": 2,
+ "learn_goal": False,
+ "policy_type": "velocity",
+ "weights_scale": 50,
+ }
+)
+
+register(
+ id='HoleReacherDMP-v0',
+ entry_point='alr_envs.utils.make_env_helpers:make_dmp_env',
+ # max_episode_steps=1,
+ kwargs={
+ "name": "alr_envs:HoleReacher-v0",
+ "num_dof": 5,
+ "num_basis": 5,
+ "duration": 2,
+ "learn_goal": True,
+ "alpha_phase": 2,
+ "bandwidth_factor": 2,
+ "policy_type": "velocity",
+ "weights_scale": 50,
+ "goal_scale": 0.1
+ }
+)
+
+# TODO: properly add final_pos
+register(
+ id='HoleReacherFixedGoalDMP-v0',
+ entry_point='alr_envs.utils.make_env_helpers:make_dmp_env',
+ # max_episode_steps=1,
+ kwargs={
+ "name": "alr_envs:HoleReacher-v0",
+ "num_dof": 5,
+ "num_basis": 5,
+ "duration": 2,
+ "learn_goal": False,
+ "alpha_phase": 2,
+ "policy_type": "velocity",
+ "weights_scale": 50,
+ "goal_scale": 0.1
+ }
+)
+
+# register(
+# id='HoleReacherDetPMP-v0',
+# entry_point='alr_envs.classic_control.hole_reacher:holereacher_detpmp',
+# # max_episode_steps=1,
+# # TODO: add mp kwargs
+# )
+
+register(
+ id='ALRBallInACupSimpleDMP-v0',
+ entry_point='alr_envs.utils.make_env_helpers:make_dmp_env',
+ kwargs={
+ "name": "alr_envs:ALRBallInACupSimple-v0",
+ "num_dof": 3,
+ "num_basis": 5,
+ "duration": 3.5,
+ "post_traj_time": 4.5,
+ "learn_goal": False,
+ "alpha_phase": 3,
+ "bandwidth_factor": 2.5,
+ "policy_type": "motor",
+ "weights_scale": 100,
+ "return_to_start": True
+ }
+)
+
+register(
+ id='ALRBallInACupDMP-v0',
+ entry_point='alr_envs.utils.make_env_helpers:make_dmp_env',
+ kwargs={
+ "name": "alr_envs:ALRBallInACup-v0",
+ "num_dof": 7,
+ "num_basis": 5,
+ "duration": 3.5,
+ "post_traj_time": 4.5,
+ "learn_goal": False,
+ "alpha_phase": 3,
+ "bandwidth_factor": 2.5,
+ "policy_type": "motor",
+ "weights_scale": 100,
+ "return_to_start": True
+ }
+)
+
+register(
+ id='ALRBallInACupGoalDMP-v0',
+ entry_point='alr_envs.utils.make_env_helpers:make_contextual_env',
+ kwargs={
+ "name": "alr_envs:ALRBallInACupGoal-v0",
+ "num_dof": 7,
+ "num_basis": 5,
+ "duration": 3.5,
+ "post_traj_time": 4.5,
+ "learn_goal": True,
+ "alpha_phase": 3,
+ "bandwidth_factor": 2.5,
+ "policy_type": "motor",
+ "weights_scale": 50,
+ "goal_scale": 0.1
+ }
+)
+
+# BBO functions
+
for dim in [5, 10, 25, 50, 100]:
register(
id=f'Rosenbrock{dim}-v0',
entry_point='alr_envs.stochastic_search:StochasticSearchEnv',
max_episode_steps=1,
kwargs={
- "cost_f": Rosenbrock,
+ "cost_f": Rosenbrock(dim),
}
)
diff --git a/alr_envs/classic_control/__init__.py b/alr_envs/classic_control/__init__.py
index 53e2448..8d31d19 100644
--- a/alr_envs/classic_control/__init__.py
+++ b/alr_envs/classic_control/__init__.py
@@ -1 +1,3 @@
from alr_envs.classic_control.simple_reacher import SimpleReacherEnv
+from alr_envs.classic_control.viapoint_reacher import ViaPointReacher
+from alr_envs.classic_control.hole_reacher import HoleReacher
diff --git a/alr_envs/classic_control/hole_reacher.py b/alr_envs/classic_control/hole_reacher.py
new file mode 100644
index 0000000..3b382f9
--- /dev/null
+++ b/alr_envs/classic_control/hole_reacher.py
@@ -0,0 +1,295 @@
+import gym
+import numpy as np
+import matplotlib.pyplot as plt
+from matplotlib import patches
+from alr_envs.classic_control.utils import check_self_collision
+
+
+class HoleReacher(gym.Env):
+
+ def __init__(self, n_links, hole_x, hole_width, hole_depth, allow_self_collision=False,
+ allow_wall_collision=False, collision_penalty=1000):
+
+ self.n_links = n_links
+ self.link_lengths = np.ones((n_links, 1))
+
+ # task
+ self.hole_x = hole_x # x-position of center of hole
+ self.hole_width = hole_width # width of hole
+ self.hole_depth = hole_depth # depth of hole
+
+ self.bottom_center_of_hole = np.hstack([hole_x, -hole_depth])
+ self.top_center_of_hole = np.hstack([hole_x, 0])
+ self.left_wall_edge = np.hstack([hole_x - self.hole_width / 2, 0])
+ self.right_wall_edge = np.hstack([hole_x + self.hole_width / 2, 0])
+
+ # collision
+ self.allow_self_collision = allow_self_collision
+ self.allow_wall_collision = allow_wall_collision
+ self.collision_penalty = collision_penalty
+
+ # state
+ self._joints = None
+ self._joint_angles = None
+ self._angle_velocity = None
+ self.start_pos = np.hstack([[np.pi / 2], np.zeros(self.n_links - 1)])
+ self.start_vel = np.zeros(self.n_links)
+
+ self.dt = 0.01
+ # self.time_limit = 2
+
+ action_bound = np.pi * np.ones((self.n_links,))
+ state_bound = np.hstack([
+ [np.pi] * self.n_links, # cos
+ [np.pi] * self.n_links, # sin
+ [np.inf] * self.n_links, # velocity
+ [np.inf] * 2, # x-y coordinates of target distance
+ [np.inf] # env steps, because reward start after n steps TODO: Maybe
+ ])
+ self.action_space = gym.spaces.Box(low=-action_bound, high=action_bound, shape=action_bound.shape)
+ self.observation_space = gym.spaces.Box(low=-state_bound, high=state_bound, shape=state_bound.shape)
+
+ self.fig = None
+ rect_1 = patches.Rectangle((-self.n_links, -1),
+ self.n_links + self.hole_x - self.hole_width / 2, 1,
+ fill=True, edgecolor='k', facecolor='k')
+ rect_2 = patches.Rectangle((self.hole_x + self.hole_width / 2, -1),
+ self.n_links - self.hole_x + self.hole_width / 2, 1,
+ fill=True, edgecolor='k', facecolor='k')
+ rect_3 = patches.Rectangle((self.hole_x - self.hole_width / 2, -1), self.hole_width,
+ 1 - self.hole_depth,
+ fill=True, edgecolor='k', facecolor='k')
+
+ self.patches = [rect_1, rect_2, rect_3]
+
+ @property
+ def end_effector(self):
+ return self._joints[self.n_links].T
+
+ def configure(self, context):
+ pass
+
+ def reset(self):
+ self._joint_angles = self.start_pos
+ self._angle_velocity = self.start_vel
+ self._joints = np.zeros((self.n_links + 1, 2))
+ self._update_joints()
+ self._steps = 0
+
+ return self._get_obs().copy()
+
+ def step(self, action: np.ndarray):
+ """
+ a single step with an action in joint velocity space
+ """
+ vel = action # + 0.01 * np.random.randn(self.num_links)
+ acc = (vel - self._angle_velocity) / self.dt
+ self._angle_velocity = vel
+ self._joint_angles = self._joint_angles + self.dt * self._angle_velocity
+
+ self._update_joints()
+
+ # rew = self._reward()
+
+ # compute reward directly in step function
+
+ success = False
+ reward = 0
+ if not self._is_collided:
+ if self._steps == 199:
+ dist = np.linalg.norm(self.end_effector - self.bottom_center_of_hole)
+ reward = - dist ** 2
+ success = dist < 0.005
+ else:
+ dist = np.linalg.norm(self.end_effector - self.bottom_center_of_hole)
+ # if self.collision_penalty != 0:
+ # reward = -self.collision_penalty
+ # else:
+ reward = - dist ** 2 - self.collision_penalty
+
+ reward -= 5e-8 * np.sum(acc ** 2)
+
+ info = {"is_collided": self._is_collided, "is_success": success}
+
+ self._steps += 1
+
+ # done = self._steps * self.dt > self.time_limit or self._is_collided
+ done = self._is_collided
+
+ return self._get_obs().copy(), reward, done, info
+
+ def _update_joints(self):
+ """
+ update _joints to get new end effector position. The other links are only required for rendering.
+ Returns:
+
+ """
+ line_points_in_taskspace = self.get_forward_kinematics(num_points_per_link=20)
+
+ self._joints[1:, 0] = self._joints[0, 0] + line_points_in_taskspace[:, -1, 0]
+ self._joints[1:, 1] = self._joints[0, 1] + line_points_in_taskspace[:, -1, 1]
+
+ self_collision = False
+ wall_collision = False
+
+ if not self.allow_self_collision:
+ self_collision = check_self_collision(line_points_in_taskspace)
+ if np.any(np.abs(self._joint_angles) > np.pi) and not self.allow_self_collision:
+ self_collision = True
+
+ if not self.allow_wall_collision:
+ wall_collision = self.check_wall_collision(line_points_in_taskspace)
+
+ self._is_collided = self_collision or wall_collision
+
+ def _get_obs(self):
+ theta = self._joint_angles
+ return np.hstack([
+ np.cos(theta),
+ np.sin(theta),
+ self._angle_velocity,
+ self.end_effector - self.bottom_center_of_hole,
+ self._steps
+ ])
+
+ def get_forward_kinematics(self, num_points_per_link=1):
+ theta = self._joint_angles[:, None]
+
+ if num_points_per_link > 1:
+ intermediate_points = np.linspace(0, 1, num_points_per_link)
+ else:
+ intermediate_points = 1
+
+ accumulated_theta = np.cumsum(theta, axis=0)
+
+ endeffector = np.zeros(shape=(self.n_links, num_points_per_link, 2))
+
+ x = np.cos(accumulated_theta) * self.link_lengths * intermediate_points
+ y = np.sin(accumulated_theta) * self.link_lengths * intermediate_points
+
+ endeffector[0, :, 0] = x[0, :]
+ endeffector[0, :, 1] = y[0, :]
+
+ for i in range(1, self.n_links):
+ endeffector[i, :, 0] = x[i, :] + endeffector[i - 1, -1, 0]
+ endeffector[i, :, 1] = y[i, :] + endeffector[i - 1, -1, 1]
+
+ return np.squeeze(endeffector + self._joints[0, :])
+
+ def check_wall_collision(self, line_points):
+
+ # all points that are before the hole in x
+ r, c = np.where(line_points[:, :, 0] < (self.hole_x - self.hole_width / 2))
+
+ # check if any of those points are below surface
+ nr_line_points_below_surface_before_hole = np.sum(line_points[r, c, 1] < 0)
+
+ if nr_line_points_below_surface_before_hole > 0:
+ return True
+
+ # all points that are after the hole in x
+ r, c = np.where(line_points[:, :, 0] > (self.hole_x + self.hole_width / 2))
+
+ # check if any of those points are below surface
+ nr_line_points_below_surface_after_hole = np.sum(line_points[r, c, 1] < 0)
+
+ if nr_line_points_below_surface_after_hole > 0:
+ return True
+
+ # all points that are above the hole
+ r, c = np.where((line_points[:, :, 0] > (self.hole_x - self.hole_width / 2)) & (
+ line_points[:, :, 0] < (self.hole_x + self.hole_width / 2)))
+
+ # check if any of those points are below surface
+ nr_line_points_below_surface_in_hole = np.sum(line_points[r, c, 1] < -self.hole_depth)
+
+ if nr_line_points_below_surface_in_hole > 0:
+ return True
+
+ return False
+
+ def render(self, mode='human'):
+ if self.fig is None:
+ self.fig = plt.figure()
+ # plt.ion()
+ # plt.pause(0.01)
+ else:
+ plt.figure(self.fig.number)
+
+ if mode == "human":
+ plt.cla()
+ plt.title(f"Iteration: {self._steps}, distance: {self.end_effector - self.bottom_center_of_hole}")
+
+ # Arm
+ plt.plot(self._joints[:, 0], self._joints[:, 1], 'ro-', markerfacecolor='k')
+
+ # Add the patch to the Axes
+ [plt.gca().add_patch(rect) for rect in self.patches]
+
+ lim = np.sum(self.link_lengths) + 0.5
+ plt.xlim([-lim, lim])
+ plt.ylim([-1.1, lim])
+ # plt.draw()
+ plt.pause(1e-4) # pushes window to foreground, which is annoying.
+ # self.fig.canvas.flush_events()
+
+ elif mode == "partial":
+ if self._steps == 1:
+ # fig, ax = plt.subplots()
+ # Add the patch to the Axes
+ [plt.gca().add_patch(rect) for rect in self.patches]
+ # plt.pause(0.01)
+
+ if self._steps % 20 == 0 or self._steps in [1, 199] or self._is_collided:
+ # Arm
+ plt.plot(self._joints[:, 0], self._joints[:, 1], 'ro-', markerfacecolor='k', alpha=self._steps / 200)
+ # ax.plot(line_points_in_taskspace[:, 0, 0],
+ # line_points_in_taskspace[:, 0, 1],
+ # line_points_in_taskspace[:, -1, 0],
+ # line_points_in_taskspace[:, -1, 1], marker='o', color='k', alpha=t / 200)
+
+ lim = np.sum(self.link_lengths) + 0.5
+ plt.xlim([-lim, lim])
+ plt.ylim([-1.1, lim])
+ plt.pause(0.01)
+
+ elif mode == "final":
+ if self._steps == 199 or self._is_collided:
+ # fig, ax = plt.subplots()
+
+ # Add the patch to the Axes
+ [plt.gca().add_patch(rect) for rect in self.patches]
+
+ plt.xlim(-self.n_links, self.n_links), plt.ylim(-1, self.n_links)
+ # Arm
+ plt.plot(self._joints[:, 0], self._joints[:, 1], 'ro-', markerfacecolor='k')
+
+ plt.pause(0.01)
+
+ def close(self):
+ if self.fig is not None:
+ plt.close(self.fig)
+
+
+if __name__ == '__main__':
+ nl = 5
+ render_mode = "human" # "human" or "partial" or "final"
+ env = HoleReacher(n_links=nl, allow_self_collision=False, allow_wall_collision=False, hole_width=0.15,
+ hole_depth=1, hole_x=1)
+ env.reset()
+ # env.render(mode=render_mode)
+
+ for i in range(200):
+ # objective.load_result("/tmp/cma")
+ # test with random actions
+ ac = 2 * env.action_space.sample()
+ # ac[0] += np.pi/2
+ obs, rew, d, info = env.step(ac)
+ env.render(mode=render_mode)
+
+ print(rew)
+
+ if d:
+ break
+
+ env.close()
diff --git a/alr_envs/classic_control/simple_reacher.py b/alr_envs/classic_control/simple_reacher.py
index 042b207..296662c 100644
--- a/alr_envs/classic_control/simple_reacher.py
+++ b/alr_envs/classic_control/simple_reacher.py
@@ -1,14 +1,14 @@
-import os
-
import gym
-import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
from gym import spaces
from gym.utils import seeding
-if os.environ.get("DISPLAY", None):
- mpl.use('Qt5Agg')
+from alr_envs.utils.utils import angle_normalize
+
+
+# if os.environ.get("DISPLAY", None):
+# mpl.use('Qt5Agg')
class SimpleReacherEnv(gym.Env):
@@ -31,7 +31,7 @@ class SimpleReacherEnv(gym.Env):
self._angle_velocity = None
self.max_torque = 1 # 10
- self.steps_before_reward = 180
+ self.steps_before_reward = 199
action_bound = np.ones((self.n_links,))
state_bound = np.hstack([
@@ -90,7 +90,7 @@ class SimpleReacherEnv(gym.Env):
def _update_joints(self):
"""
- update _joints to get new end effector position. The other links are only required for rendering.
+ update joints to get new end-effector position. The other links are only required for rendering.
Returns:
"""
@@ -104,7 +104,7 @@ class SimpleReacherEnv(gym.Env):
# TODO: Is this the best option
if self._steps >= self.steps_before_reward:
- reward_dist = - np.linalg.norm(diff)
+ reward_dist -= np.linalg.norm(diff)
# reward_dist = np.exp(-0.1 * diff ** 2).mean()
# reward_dist = - (diff ** 2).mean()
diff --git a/alr_envs/classic_control/utils.py b/alr_envs/classic_control/utils.py
new file mode 100644
index 0000000..3a9bb50
--- /dev/null
+++ b/alr_envs/classic_control/utils.py
@@ -0,0 +1,17 @@
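+# Returns True if the triangle (A, B, C) is oriented counter-clockwise, based on the 2D cross
+# product, with a small tolerance so that near-collinear points do not count as intersections.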
+def ccw(A, B, C):
+ return (C[1] - A[1]) * (B[0] - A[0]) - (B[1] - A[1]) * (C[0] - A[0]) > 1e-12
+
+
+# Return true if line segments AB and CD intersect
+def intersect(A, B, C, D):
+ return ccw(A, C, D) != ccw(B, C, D) and ccw(A, B, C) != ccw(A, B, D)
+
+
+def check_self_collision(line_points):
+ for i, line1 in enumerate(line_points):
+ for line2 in line_points[i + 2:, :, :]:
+ # if line1 != line2:
+ if intersect(line1[0], line1[-1], line2[0], line2[-1]):
+ return True
+ return False
+
diff --git a/alr_envs/classic_control/viapoint_reacher.py b/alr_envs/classic_control/viapoint_reacher.py
new file mode 100644
index 0000000..127bf77
--- /dev/null
+++ b/alr_envs/classic_control/viapoint_reacher.py
@@ -0,0 +1,230 @@
+import gym
+import matplotlib.pyplot as plt
+import numpy as np
+
+from alr_envs.classic_control.utils import check_self_collision
+
+
+class ViaPointReacher(gym.Env):
+
+ def __init__(self, n_links, allow_self_collision=False, collision_penalty=1000):
+ self.num_links = n_links
+ self.link_lengths = np.ones((n_links, 1))
+
+ # task
+ self.via_point = np.ones(2)
+ self.goal_point = np.array((n_links, 0))
+
+ # collision
+ self.allow_self_collision = allow_self_collision
+ self.collision_penalty = collision_penalty
+
+ # state
+ self._joints = None
+ self._joint_angles = None
+ self._angle_velocity = None
+ self.start_pos = np.hstack([[np.pi / 2], np.zeros(self.num_links - 1)])
+ self.start_vel = np.zeros(self.num_links)
+ self.weight_matrix_scale = 1
+
+ self._steps = 0
+ self.dt = 0.01
+ # self.time_limit = 2
+
+ action_bound = np.pi * np.ones((self.num_links,))
+ state_bound = np.hstack([
+ [np.pi] * self.num_links, # cos
+ [np.pi] * self.num_links, # sin
+ [np.inf] * self.num_links, # velocity
+ [np.inf] * 2, # x-y coordinates of target distance
+ [np.inf] # env steps, because reward start after n steps TODO: Maybe
+ ])
+ self.action_space = gym.spaces.Box(low=-action_bound, high=action_bound, shape=action_bound.shape)
+ self.observation_space = gym.spaces.Box(low=-state_bound, high=state_bound, shape=state_bound.shape)
+
+ self.fig = None
+
+ @property
+ def end_effector(self):
+ return self._joints[self.num_links].T
+
+ def configure(self, context):
+ pass
+
+ def reset(self):
+ self._joint_angles = self.start_pos
+ self._angle_velocity = self.start_vel
+ self._joints = np.zeros((self.num_links + 1, 2))
+ self._update_joints()
+ self._steps = 0
+
+ return self._get_obs().copy()
+
+ def step(self, action: np.ndarray):
+ """
+ a single step with an action in joint velocity space
+ """
+ vel = action
+ acc = (vel - self._angle_velocity) / self.dt
+ self._angle_velocity = vel
+ self._joint_angles = self._joint_angles + self.dt * self._angle_velocity
+
+ self._update_joints()
+
+ dist_reward = 0
+ if not self._is_collided:
+ if self._steps == 100:
+ dist_reward = np.linalg.norm(self.end_effector - self.via_point)
+ elif self._steps == 199:
+ dist_reward = np.linalg.norm(self.end_effector - self.goal_point)
+
+ # TODO: Do we need that?
+ reward = - dist_reward ** 2
+
+ reward -= 5e-8 * np.sum(acc**2)
+
+ if self._is_collided:
+ reward -= self.collision_penalty
+
+ info = {"is_collided": self._is_collided}
+
+ self._steps += 1
+
+ # done = self._steps * self.dt > self.time_limit or self._is_collided
+ done = self._is_collided
+
+ return self._get_obs().copy(), reward, done, info
+
+ def _update_joints(self):
+ """
+ update _joints to get new end effector position. The other links are only required for rendering.
+ Returns:
+
+ """
+ line_points_in_taskspace = self.get_forward_kinematics(num_points_per_link=20)
+
+ self._joints[1:, 0] = self._joints[0, 0] + line_points_in_taskspace[:, -1, 0]
+ self._joints[1:, 1] = self._joints[0, 1] + line_points_in_taskspace[:, -1, 1]
+
+ self_collision = False
+
+ if not self.allow_self_collision:
+ self_collision = check_self_collision(line_points_in_taskspace)
+ if np.any(np.abs(self._joint_angles) > np.pi):
+ self_collision = True
+
+ self._is_collided = self_collision
+
+ def _get_obs(self):
+ theta = self._joint_angles
+ return np.hstack([
+ np.cos(theta),
+ np.sin(theta),
+ self._angle_velocity,
+ self.end_effector - self.via_point,
+ self.end_effector - self.goal_point,
+ self._steps
+ ])
+
+ def get_forward_kinematics(self, num_points_per_link=1):
+ theta = self._joint_angles[:, None]
+
+ intermediate_points = np.linspace(0, 1, num_points_per_link) if num_points_per_link > 1 else 1
+
+ accumulated_theta = np.cumsum(theta, axis=0)
+
+ endeffector = np.zeros(shape=(self.num_links, num_points_per_link, 2))
+
+ x = np.cos(accumulated_theta) * self.link_lengths * intermediate_points
+ y = np.sin(accumulated_theta) * self.link_lengths * intermediate_points
+
+ endeffector[0, :, 0] = x[0, :]
+ endeffector[0, :, 1] = y[0, :]
+
+ for i in range(1, self.num_links):
+ endeffector[i, :, 0] = x[i, :] + endeffector[i - 1, -1, 0]
+ endeffector[i, :, 1] = y[i, :] + endeffector[i - 1, -1, 1]
+
+ return np.squeeze(endeffector + self._joints[0, :])
+
+ def render(self, mode='human'):
+ if self.fig is None:
+ self.fig = plt.figure()
+ # plt.ion()
+ # plt.pause(0.01)
+ else:
+ plt.figure(self.fig.number)
+
+ if mode == "human":
+ plt.cla()
+ plt.title(f"Iteration: {self._steps}")
+
+ # Arm
+ plt.plot(self._joints[:, 0], self._joints[:, 1], 'ro-', markerfacecolor='k')
+
+ lim = np.sum(self.link_lengths) + 0.5
+ plt.xlim([-lim, lim])
+ plt.ylim([-lim, lim])
+ # plt.draw()
+ plt.pause(1e-4) # pushes window to foreground, which is annoying.
+ # self.fig.canvas.flush_events()
+
+ elif mode == "partial":
+ if self._steps == 1:
+ # fig, ax = plt.subplots()
+                # unlike HoleReacher, the ViaPointReacher has no obstacle patches to draw
+                pass
+                # plt.pause(0.01)
+
+ if self._steps % 20 == 0 or self._steps in [1, 199] or self._is_collided:
+ # Arm
+ plt.plot(self._joints[:, 0], self._joints[:, 1], 'ro-', markerfacecolor='k', alpha=self._steps / 200)
+ # ax.plot(line_points_in_taskspace[:, 0, 0],
+ # line_points_in_taskspace[:, 0, 1],
+ # line_points_in_taskspace[:, -1, 0],
+ # line_points_in_taskspace[:, -1, 1], marker='o', color='k', alpha=t / 200)
+
+ lim = np.sum(self.link_lengths) + 0.5
+ plt.xlim([-lim, lim])
+ plt.ylim([-1.1, lim])
+ plt.pause(0.01)
+
+ elif mode == "final":
+ if self._steps == 199 or self._is_collided:
+ # fig, ax = plt.subplots()
+
+                # unlike HoleReacher, the ViaPointReacher has no obstacle patches to draw
+
+ plt.xlim(-self.num_links, self.num_links), plt.ylim(-1, self.num_links)
+ # Arm
+ plt.plot(self._joints[:, 0], self._joints[:, 1], 'ro-', markerfacecolor='k')
+
+ plt.pause(0.01)
+
+ def close(self):
+ if self.fig is not None:
+ plt.close(self.fig)
+
+
+if __name__ == '__main__':
+ nl = 5
+ render_mode = "human" # "human" or "partial" or "final"
+ env = ViaPointReacher(n_links=nl, allow_self_collision=False)
+ env.reset()
+ env.render(mode=render_mode)
+
+ for i in range(300):
+ # objective.load_result("/tmp/cma")
+ # test with random actions
+ ac = env.action_space.sample()
+ # ac[0] += np.pi/2
+ obs, rew, d, info = env.step(ac)
+ env.render(mode=render_mode)
+
+ print(rew)
+
+ if d:
+ break
+
+ env.close()
diff --git a/alr_envs/mujoco/__init__.py b/alr_envs/mujoco/__init__.py
index b149793..eb7b41f 100644
--- a/alr_envs/mujoco/__init__.py
+++ b/alr_envs/mujoco/__init__.py
@@ -1,2 +1,3 @@
-from alr_envs.mujoco.alr_reacher import ALRReacherEnv
+from alr_envs.mujoco.reacher.alr_reacher import ALRReacherEnv
from alr_envs.mujoco.balancing import BalancingEnv
+from alr_envs.mujoco.ball_in_a_cup.ball_in_a_cup import ALRBallInACupEnv
diff --git a/alr_envs/mujoco/alr_mujoco_env.py b/alr_envs/mujoco/alr_mujoco_env.py
new file mode 100644
index 0000000..edc90b6
--- /dev/null
+++ b/alr_envs/mujoco/alr_mujoco_env.py
@@ -0,0 +1,269 @@
+from collections import OrderedDict
+import os
+
+
+from gym import error, spaces
+from gym.utils import seeding
+import numpy as np
+from os import path
+import gym
+
+try:
+ import mujoco_py
+except ImportError as e:
+ raise error.DependencyNotInstalled("{}. (HINT: you need to install mujoco_py, and also perform the setup instructions here: https://github.com/openai/mujoco-py/.)".format(e))
+
+DEFAULT_SIZE = 500
+
+
+def convert_observation_to_space(observation):
+ if isinstance(observation, dict):
+ space = spaces.Dict(OrderedDict([
+ (key, convert_observation_to_space(value))
+ for key, value in observation.items()
+ ]))
+ elif isinstance(observation, np.ndarray):
+ low = np.full(observation.shape, -float('inf'), dtype=np.float32)
+ high = np.full(observation.shape, float('inf'), dtype=np.float32)
+ space = spaces.Box(low, high, dtype=observation.dtype)
+ else:
+ raise NotImplementedError(type(observation), observation)
+
+ return space
+
+
+class AlrMujocoEnv(gym.Env):
+ """
+ Superclass for all MuJoCo environments.
+ """
+
+ def __init__(self, model_path, n_substeps, apply_gravity_comp=True):
+ """
+
+ Args:
+ model_path: path to xml file
+ n_substeps: how many steps mujoco does per call to env.step
+            apply_gravity_comp: if True, gravity compensation torques are added to the commanded controls
+ """
+ if model_path.startswith("/"):
+ fullpath = model_path
+ else:
+ fullpath = os.path.join(os.path.dirname(__file__), "assets", model_path)
+ if not path.exists(fullpath):
+ raise IOError("File %s does not exist" % fullpath)
+ self.n_substeps = n_substeps
+ self.apply_gravity_comp = apply_gravity_comp
+ self.model = mujoco_py.load_model_from_path(fullpath)
+ self.sim = mujoco_py.MjSim(self.model, nsubsteps=n_substeps)
+ self.data = self.sim.data
+ self.viewer = None
+ self._viewers = {}
+
+ self.metadata = {
+ 'render.modes': ['human', 'rgb_array', 'depth_array'],
+ 'video.frames_per_second': int(np.round(1.0 / self.dt))
+ }
+
+ self.init_qpos = self.sim.data.qpos.ravel().copy()
+ self.init_qvel = self.sim.data.qvel.ravel().copy()
+
+ self._start_pos = None
+ self._start_vel = None
+
+ self._set_action_space()
+
+ # action = self.action_space.sample()
+ # observation, _reward, done, _info = self.step(action)
+ # assert not done
+
+ observation = self._get_obs() # TODO: is calling get_obs enough? should we call reset, or even step?
+
+ self._set_observation_space(observation)
+
+ self.seed()
+
+ @property
+ def current_pos(self):
+ """
+ By default returns the joint positions of all simulated objects. May be overridden in subclass.
+ """
+ return self.sim.data.qpos
+
+ @property
+ def current_vel(self):
+ """
+ By default returns the joint velocities of all simulated objects. May be overridden in subclass.
+ """
+ return self.sim.data.qvel
+
+ @property
+ def start_pos(self):
+ """
+ Start position of the agent, for example joint angles of a Panda robot. Necessary for MP wrapped envs.
+ """
+ return self._start_pos
+
+ @property
+ def start_vel(self):
+ """
+ Start velocity of the agent. Necessary for MP wrapped envs.
+ """
+ return self._start_vel
+
+ def extend_des_pos(self, des_pos):
+ """
+ In a simplified environment, the actions may only control a subset of all the joints.
+ Extend the trajectory to match the environments full action space
+ Args:
+ des_pos:
+
+ Returns:
+
+ """
+ pass
+
+ def extend_des_vel(self, des_vel):
+ pass
+
+ def _set_action_space(self):
+ bounds = self.model.actuator_ctrlrange.copy().astype(np.float32)
+ low, high = bounds.T
+ self.action_space = spaces.Box(low=low, high=high, dtype=np.float32)
+ return self.action_space
+
+ def _set_observation_space(self, observation):
+ self.observation_space = convert_observation_to_space(observation)
+ return self.observation_space
+
+ def seed(self, seed=None):
+ self.np_random, seed = seeding.np_random(seed)
+ return [seed]
+
+ # methods to override:
+ # ----------------------------
+
+ def _get_obs(self):
+ """Returns the observation.
+ """
+ raise NotImplementedError()
+
+ def configure(self, *args, **kwargs):
+ """
+ Helper method to set certain environment properties such as contexts in contextual environments since reset()
+ doesn't take arguments. Should be called before reset().
+ """
+ pass
+
+ def reset_model(self):
+ """
+ Reset the robot degrees of freedom (qpos and qvel).
+ Implement this in each subclass.
+ """
+ raise NotImplementedError
+
+ def viewer_setup(self):
+ """
+ This method is called when the viewer is initialized.
+ Optionally implement this method, if you need to tinker with camera position
+ and so forth.
+ """
+ pass
+
+ # -----------------------------
+
+ def reset(self):
+ self.sim.reset()
+ ob = self.reset_model()
+ return ob
+
+ def set_state(self, qpos, qvel):
+ assert qpos.shape == (self.model.nq,) and qvel.shape == (self.model.nv,)
+ old_state = self.sim.get_state()
+ new_state = mujoco_py.MjSimState(old_state.time, qpos, qvel,
+ old_state.act, old_state.udd_state)
+ self.sim.set_state(new_state)
+ self.sim.forward()
+
+ @property
+ def dt(self):
+ return self.model.opt.timestep * self.n_substeps
+
+ def do_simulation(self, ctrl):
+ """
+        Steps the simulation with the given control and returns whether a MuJoCo error occurred while stepping.
+ """
+ error_in_sim = False
+ num_actuations = len(ctrl)
+ if self.apply_gravity_comp:
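+            # gravity compensation: add MuJoCo's bias forces (gravity, Coriolis) as a feed-forward
+            # term, rescaled by the actuator gear so they act in control space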
+ self.sim.data.ctrl[:num_actuations] = ctrl + self.sim.data.qfrc_bias[:num_actuations].copy() / self.model.actuator_gear[:, 0]
+ else:
+ self.sim.data.ctrl[:num_actuations] = ctrl
+
+ try:
+ self.sim.step()
+ except mujoco_py.builder.MujocoException as e:
+ error_in_sim = True
+
+ return error_in_sim
+
+ def render(self,
+ mode='human',
+ width=DEFAULT_SIZE,
+ height=DEFAULT_SIZE,
+ camera_id=None,
+ camera_name=None):
+ if mode == 'rgb_array' or mode == 'depth_array':
+ if camera_id is not None and camera_name is not None:
+ raise ValueError("Both `camera_id` and `camera_name` cannot be"
+ " specified at the same time.")
+
+ no_camera_specified = camera_name is None and camera_id is None
+ if no_camera_specified:
+ camera_name = 'track'
+
+ if camera_id is None and camera_name in self.model._camera_name2id:
+ camera_id = self.model.camera_name2id(camera_name)
+
+ self._get_viewer(mode).render(width, height, camera_id=camera_id)
+
+ if mode == 'rgb_array':
+ # window size used for old mujoco-py:
+ data = self._get_viewer(mode).read_pixels(width, height, depth=False)
+ # original image is upside-down, so flip it
+ return data[::-1, :, :]
+ elif mode == 'depth_array':
+ self._get_viewer(mode).render(width, height)
+ # window size used for old mujoco-py:
+ # Extract depth part of the read_pixels() tuple
+ data = self._get_viewer(mode).read_pixels(width, height, depth=True)[1]
+ # original image is upside-down, so flip it
+ return data[::-1, :]
+ elif mode == 'human':
+ self._get_viewer(mode).render()
+
+ def close(self):
+ if self.viewer is not None:
+ # self.viewer.finish()
+ self.viewer = None
+ self._viewers = {}
+
+ def _get_viewer(self, mode):
+ self.viewer = self._viewers.get(mode)
+ if self.viewer is None:
+ if mode == 'human':
+ self.viewer = mujoco_py.MjViewer(self.sim)
+ elif mode == 'rgb_array' or mode == 'depth_array':
+ self.viewer = mujoco_py.MjRenderContextOffscreen(self.sim, -1)
+
+ self.viewer_setup()
+ self._viewers[mode] = self.viewer
+ return self.viewer
+
+ def get_body_com(self, body_name):
+ return self.data.get_body_xpos(body_name)
+
+ def state_vector(self):
+ return np.concatenate([
+ self.sim.data.qpos.flat,
+ self.sim.data.qvel.flat
+ ])
diff --git a/alr_envs/mujoco/alr_reward_fct.py b/alr_envs/mujoco/alr_reward_fct.py
new file mode 100644
index 0000000..c96dd07
--- /dev/null
+++ b/alr_envs/mujoco/alr_reward_fct.py
@@ -0,0 +1,21 @@
+class AlrReward:
+ """
+ A base class for non-Markovian reward functions which may need trajectory information to calculate an episodic
+ reward. Call the methods in reset() and step() of the environment.
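+
+    A minimal subclass sketch (the signatures follow the ball-in-a-cup reward classes in this package):
+
+        class MyReward(AlrReward):
+            def reset(self, context):
+                self.dists = []
+
+            def compute_reward(self, action, sim, step):
+                reward, success, stop_sim = 0.0, False, False
+                return reward, success, stop_sim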
+ """
+
+ # methods to override:
+ # ----------------------------
+ def reset(self, *args, **kwargs):
+ """
+ Reset the reward function, empty state buffers before an episode, set contexts that influence reward, etc.
+ """
+ raise NotImplementedError
+
+ def compute_reward(self, *args, **kwargs):
+ """
+
+ Returns: Useful things to return are reward values, success flags or crash flags
+
+ """
+ raise NotImplementedError
diff --git a/alr_envs/mujoco/balancing.py b/alr_envs/mujoco/balancing.py
index 5976bc2..3e34298 100644
--- a/alr_envs/mujoco/balancing.py
+++ b/alr_envs/mujoco/balancing.py
@@ -4,7 +4,7 @@ import numpy as np
from gym import utils
from gym.envs.mujoco import mujoco_env
-from alr_envs.utils.utils import angle_normalize
+import alr_envs.utils.utils as alr_utils
class BalancingEnv(mujoco_env.MujocoEnv, utils.EzPickle):
@@ -23,7 +23,7 @@ class BalancingEnv(mujoco_env.MujocoEnv, utils.EzPickle):
mujoco_env.MujocoEnv.__init__(self, os.path.join(os.path.dirname(__file__), "assets", file_name), 2)
def step(self, a):
- angle = angle_normalize(np.sum(self.sim.data.qpos.flat[:self.n_links]), type="rad")
+ angle = alr_utils.angle_normalize(np.sum(self.sim.data.qpos.flat[:self.n_links]), type="rad")
reward = - np.abs(angle)
self.do_simulation(a, self.frame_skip)
diff --git a/alr_envs/mujoco/ball_in_a_cup/__init__.py b/alr_envs/mujoco/ball_in_a_cup/__init__.py
new file mode 100644
index 0000000..8b13789
--- /dev/null
+++ b/alr_envs/mujoco/ball_in_a_cup/__init__.py
@@ -0,0 +1 @@
+
diff --git a/alr_envs/mujoco/ball_in_a_cup/assets/ball-in-a-cup_base.xml b/alr_envs/mujoco/ball_in_a_cup/assets/ball-in-a-cup_base.xml
new file mode 100644
index 0000000..b534205
--- /dev/null
+++ b/alr_envs/mujoco/ball_in_a_cup/assets/ball-in-a-cup_base.xml
@@ -0,0 +1,366 @@
+<!-- ball-in-a-cup_base.xml: MuJoCo model definition for the ball-in-a-cup setup (XML content omitted) -->
diff --git a/alr_envs/mujoco/ball_in_a_cup/assets/biac_base.xml b/alr_envs/mujoco/ball_in_a_cup/assets/biac_base.xml
new file mode 100644
index 0000000..58f0ac6
--- /dev/null
+++ b/alr_envs/mujoco/ball_in_a_cup/assets/biac_base.xml
@@ -0,0 +1,360 @@
+<!-- biac_base.xml: MuJoCo model definition for the ball-in-a-cup task (XML content omitted) -->
diff --git a/alr_envs/mujoco/ball_in_a_cup/ball_in_a_cup.py b/alr_envs/mujoco/ball_in_a_cup/ball_in_a_cup.py
new file mode 100644
index 0000000..66461c0
--- /dev/null
+++ b/alr_envs/mujoco/ball_in_a_cup/ball_in_a_cup.py
@@ -0,0 +1,169 @@
+from gym import utils
+import os
+import numpy as np
+from alr_envs.mujoco import alr_mujoco_env
+
+
+class ALRBallInACupEnv(alr_mujoco_env.AlrMujocoEnv, utils.EzPickle):
+ def __init__(self, n_substeps=4, apply_gravity_comp=True, simplified: bool = False,
+ reward_type: str = None, context: np.ndarray = None):
+ self._steps = 0
+
+ self.xml_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "assets",
+ "biac_base" + ".xml")
+
+ self._q_pos = []
+ self._q_vel = []
+ # self.weight_matrix_scale = 50
+ self.max_ctrl = np.array([150., 125., 40., 60., 5., 5., 2.])
+ self.p_gains = 1 / self.max_ctrl * np.array([200, 300, 100, 100, 10, 10, 2.5])
+ self.d_gains = 1 / self.max_ctrl * np.array([7, 15, 5, 2.5, 0.3, 0.3, 0.05])
+
+ self.j_min = np.array([-2.6, -1.985, -2.8, -0.9, -4.55, -1.5707, -2.7])
+ self.j_max = np.array([2.6, 1.985, 2.8, 3.14159, 1.25, 1.5707, 2.7])
+
+ self.context = None
+
+ utils.EzPickle.__init__(self)
+ alr_mujoco_env.AlrMujocoEnv.__init__(self,
+ self.xml_path,
+ apply_gravity_comp=apply_gravity_comp,
+ n_substeps=n_substeps)
+ self._start_pos = np.array([0.0, 0.58760536, 0.0, 1.36004913, 0.0, -0.32072943, -1.57])
+ self._start_vel = np.zeros(7)
+
+ self.simplified = simplified
+
+ self.sim_time = 8 # seconds
+ self.sim_steps = int(self.sim_time / self.dt)
+ if reward_type == "no_context":
+ from alr_envs.mujoco.ball_in_a_cup.ball_in_a_cup_reward_simple import BallInACupReward
+ reward_function = BallInACupReward
+ elif reward_type == "contextual_goal":
+ from alr_envs.mujoco.ball_in_a_cup.ball_in_a_cup_reward import BallInACupReward
+ reward_function = BallInACupReward
+ else:
+ raise ValueError("Unknown reward type")
+ self.reward_function = reward_function(self.sim_steps)
+ self.configure(context)
+
+ @property
+ def start_pos(self):
+ if self.simplified:
+ return self._start_pos[1::2]
+ else:
+ return self._start_pos
+
+ @property
+ def start_vel(self):
+ if self.simplified:
+ return self._start_vel[1::2]
+ else:
+ return self._start_vel
+
+ @property
+ def current_pos(self):
+ return self.sim.data.qpos[0:7].copy()
+
+ @property
+ def current_vel(self):
+ return self.sim.data.qvel[0:7].copy()
+
+ def configure(self, context):
+ self.context = context
+ self.reward_function.reset(context)
+
+ def reset_model(self):
+ init_pos_all = self.init_qpos.copy()
+ init_pos_robot = self._start_pos
+ init_vel = np.zeros_like(init_pos_all)
+
+ self._steps = 0
+ self._q_pos = []
+ self._q_vel = []
+
+ start_pos = init_pos_all
+ start_pos[0:7] = init_pos_robot
+
+ self.set_state(start_pos, init_vel)
+
+ return self._get_obs()
+
+ def step(self, a):
+ reward_dist = 0.0
+ angular_vel = 0.0
+ reward_ctrl = - np.square(a).sum()
+
+ crash = self.do_simulation(a)
+ joint_cons_viol = self.check_traj_in_joint_limits()
+
+ self._q_pos.append(self.sim.data.qpos[0:7].ravel().copy())
+ self._q_vel.append(self.sim.data.qvel[0:7].ravel().copy())
+
+ ob = self._get_obs()
+
+ if not crash and not joint_cons_viol:
+ reward, success, stop_sim = self.reward_function.compute_reward(a, self.sim, self._steps)
+ done = success or self._steps == self.sim_steps - 1 or stop_sim
+ self._steps += 1
+ else:
+ reward = -1000
+ success = False
+ done = True
+ return ob, reward, done, dict(reward_dist=reward_dist,
+ reward_ctrl=reward_ctrl,
+ velocity=angular_vel,
+ traj=self._q_pos, is_success=success,
+ is_collided=crash or joint_cons_viol)
+
+ def check_traj_in_joint_limits(self):
+ return any(self.current_pos > self.j_max) or any(self.current_pos < self.j_min)
+
+ # TODO: extend observation space
+ def _get_obs(self):
+ theta = self.sim.data.qpos.flat[:7]
+ return np.concatenate([
+ np.cos(theta),
+ np.sin(theta),
+ # self.get_body_com("target"), # only return target to make problem harder
+ [self._steps],
+ ])
+
+ # These functions are for the task with 3 joint actuations
+ def extend_des_pos(self, des_pos):
+ des_pos_full = self._start_pos.copy()
+ des_pos_full[1] = des_pos[0]
+ des_pos_full[3] = des_pos[1]
+ des_pos_full[5] = des_pos[2]
+ return des_pos_full
+
+ def extend_des_vel(self, des_vel):
+ des_vel_full = self._start_vel.copy()
+ des_vel_full[1] = des_vel[0]
+ des_vel_full[3] = des_vel[1]
+ des_vel_full[5] = des_vel[2]
+ return des_vel_full
+
+
+if __name__ == "__main__":
+ env = ALRBallInACupEnv()
+ ctxt = np.array([-0.20869846, -0.66376693, 1.18088501])
+
+ env.configure(ctxt)
+ env.reset()
+ # env.render()
+ for i in range(16000):
+ # test with random actions
+ ac = 0.001 * env.action_space.sample()[0:7]
+ # ac = env.start_pos
+ # ac[0] += np.pi/2
+ obs, rew, d, info = env.step(ac)
+ # env.render()
+
+ print(rew)
+
+ if d:
+ break
+
+ env.close()
+
diff --git a/alr_envs/mujoco/ball_in_a_cup/ball_in_a_cup_reward.py b/alr_envs/mujoco/ball_in_a_cup/ball_in_a_cup_reward.py
new file mode 100644
index 0000000..e34aecf
--- /dev/null
+++ b/alr_envs/mujoco/ball_in_a_cup/ball_in_a_cup_reward.py
@@ -0,0 +1,142 @@
+import numpy as np
+from alr_envs.mujoco import alr_reward_fct
+
+
+class BallInACupReward(alr_reward_fct.AlrReward):
+ def __init__(self, sim_time):
+ self.sim_time = sim_time
+
+ self.collision_objects = ["cup_geom1", "cup_geom2", "wrist_palm_link_convex_geom",
+ "wrist_pitch_link_convex_decomposition_p1_geom",
+ "wrist_pitch_link_convex_decomposition_p2_geom",
+ "wrist_pitch_link_convex_decomposition_p3_geom",
+ "wrist_yaw_link_convex_decomposition_p1_geom",
+ "wrist_yaw_link_convex_decomposition_p2_geom",
+ "forearm_link_convex_decomposition_p1_geom",
+ "forearm_link_convex_decomposition_p2_geom"]
+
+ self.ball_id = None
+ self.ball_collision_id = None
+ self.goal_id = None
+ self.goal_final_id = None
+ self.collision_ids = None
+
+ self.ball_traj = None
+ self.dists = None
+ self.dists_ctxt = None
+ self.dists_final = None
+ self.costs = None
+
+ self.reset(None)
+
+ def reset(self, context):
+ self.ball_traj = np.zeros(shape=(self.sim_time, 3))
+ self.cup_traj = np.zeros(shape=(self.sim_time, 3))
+ self.dists = []
+ self.dists_ctxt = []
+ self.dists_final = []
+ self.costs = []
+ self.context = context
+ self.ball_in_cup = False
+ self.ball_above_threshold = False
+ self.dist_ctxt = 3
+ self.action_costs = []
+ self.cup_angles = []
+
+ def compute_reward(self, action, sim, step):
+ action_cost = np.sum(np.square(action))
+ self.action_costs.append(action_cost)
+
+ stop_sim = False
+ success = False
+
+ self.ball_id = sim.model._body_name2id["ball"]
+ self.ball_collision_id = sim.model._geom_name2id["ball_geom"]
+ self.goal_id = sim.model._site_name2id["cup_goal"]
+ self.goal_final_id = sim.model._site_name2id["cup_goal_final"]
+ self.collision_ids = [sim.model._geom_name2id[name] for name in self.collision_objects]
+
+ if self.check_collision(sim):
+ reward = - 1e-3 * action_cost - 1000
+ stop_sim = True
+ return reward, success, stop_sim
+
+ # Compute the current distance from the ball to the inner part of the cup
+ goal_pos = sim.data.site_xpos[self.goal_id]
+ ball_pos = sim.data.body_xpos[self.ball_id]
+ goal_final_pos = sim.data.site_xpos[self.goal_final_id]
+ self.dists.append(np.linalg.norm(goal_pos - ball_pos))
+ self.dists_final.append(np.linalg.norm(goal_final_pos - ball_pos))
+ self.dists_ctxt.append(np.linalg.norm(ball_pos - self.context))
+ self.ball_traj[step, :] = np.copy(ball_pos)
+ self.cup_traj[step, :] = np.copy(goal_pos) # ?
+ cup_quat = np.copy(sim.data.body_xquat[sim.model._body_name2id["cup"]])
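+        # recover the cup's tilt angle (roll about the x-axis) from its quaternion (w, x, y, z)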
+ self.cup_angles.append(np.arctan2(2 * (cup_quat[0] * cup_quat[1] + cup_quat[2] * cup_quat[3]),
+ 1 - 2 * (cup_quat[1] ** 2 + cup_quat[2] ** 2)))
+
+ # Determine the first time when ball is in cup
+ if not self.ball_in_cup:
+ ball_in_cup = self.check_ball_in_cup(sim, self.ball_collision_id)
+ self.ball_in_cup = ball_in_cup
+ if ball_in_cup:
+ dist_to_ctxt = np.linalg.norm(ball_pos - self.context)
+ self.dist_ctxt = dist_to_ctxt
+
+ if step == self.sim_time - 1:
+ t_min_dist = np.argmin(self.dists)
+ angle_min_dist = self.cup_angles[t_min_dist]
+ cost_angle = (angle_min_dist - np.pi / 2) ** 2
+
+ min_dist = np.min(self.dists)
+ dist_final = self.dists_final[-1]
+ # dist_ctxt = self.dists_ctxt[-1]
+
+ # # max distance between ball and cup and cup height at that time
+ # ball_to_cup_diff = self.ball_traj[:, 2] - self.cup_traj[:, 2]
+ # t_max_diff = np.argmax(ball_to_cup_diff)
+ # t_max_ball_height = np.argmax(self.ball_traj[:, 2])
+ # max_ball_height = np.max(self.ball_traj[:, 2])
+
+ # cost = self._get_stage_wise_cost(ball_in_cup, min_dist, dist_final, dist_ctxt)
+ cost = 0.5 * min_dist + 0.5 * dist_final + 0.3 * np.minimum(self.dist_ctxt, 3) + 0.01 * cost_angle
+ reward = np.exp(-2 * cost) - 1e-3 * action_cost
+ # if max_ball_height < self.context[2] or ball_to_cup_diff[t_max_ball_height] < 0:
+ # reward -= 1
+
+ success = dist_final < 0.05 and self.dist_ctxt < 0.05
+ else:
+ reward = - 1e-3 * action_cost
+ success = False
+
+ return reward, success, stop_sim
+
+ def _get_stage_wise_cost(self, ball_in_cup, min_dist, dist_final, dist_to_ctxt):
+ if not ball_in_cup:
+ cost = 3 + 2*(0.5 * min_dist**2 + 0.5 * dist_final**2)
+ else:
+ cost = 2 * dist_to_ctxt ** 2
+ print('Context Distance:', dist_to_ctxt)
+ return cost
+
+ def check_ball_in_cup(self, sim, ball_collision_id):
+ cup_base_collision_id = sim.model._geom_name2id["cup_base_contact"]
+ for coni in range(0, sim.data.ncon):
+ con = sim.data.contact[coni]
+
+ collision = con.geom1 == cup_base_collision_id and con.geom2 == ball_collision_id
+ collision_trans = con.geom1 == ball_collision_id and con.geom2 == cup_base_collision_id
+
+ if collision or collision_trans:
+ return True
+ return False
+
+ def check_collision(self, sim):
+ for coni in range(0, sim.data.ncon):
+ con = sim.data.contact[coni]
+
+ collision = con.geom1 in self.collision_ids and con.geom2 == self.ball_collision_id
+ collision_trans = con.geom1 == self.ball_collision_id and con.geom2 in self.collision_ids
+
+ if collision or collision_trans:
+ return True
+ return False
diff --git a/alr_envs/mujoco/ball_in_a_cup/ball_in_a_cup_reward_simple.py b/alr_envs/mujoco/ball_in_a_cup/ball_in_a_cup_reward_simple.py
new file mode 100644
index 0000000..13053eb
--- /dev/null
+++ b/alr_envs/mujoco/ball_in_a_cup/ball_in_a_cup_reward_simple.py
@@ -0,0 +1,104 @@
+import numpy as np
+from alr_envs.mujoco import alr_reward_fct
+
+
+class BallInACupReward(alr_reward_fct.AlrReward):
+ def __init__(self, sim_time):
+ self.sim_time = sim_time
+
+ self.collision_objects = ["cup_geom1", "cup_geom2", "wrist_palm_link_convex_geom",
+ "wrist_pitch_link_convex_decomposition_p1_geom",
+ "wrist_pitch_link_convex_decomposition_p2_geom",
+ "wrist_pitch_link_convex_decomposition_p3_geom",
+ "wrist_yaw_link_convex_decomposition_p1_geom",
+ "wrist_yaw_link_convex_decomposition_p2_geom",
+ "forearm_link_convex_decomposition_p1_geom",
+ "forearm_link_convex_decomposition_p2_geom"]
+
+ self.ball_id = None
+ self.ball_collision_id = None
+ self.goal_id = None
+ self.goal_final_id = None
+ self.collision_ids = None
+
+ self.ball_traj = None
+ self.dists = None
+ self.dists_final = None
+ self.costs = None
+
+ self.reset(None)
+
+ def reset(self, context):
+ self.ball_traj = np.zeros(shape=(self.sim_time, 3))
+ self.dists = []
+ self.dists_final = []
+ self.costs = []
+ self.action_costs = []
+ self.cup_angles = []
+
+ def compute_reward(self, action, sim, step, context=None):
+ self.ball_id = sim.model._body_name2id["ball"]
+ self.ball_collision_id = sim.model._geom_name2id["ball_geom"]
+ self.goal_id = sim.model._site_name2id["cup_goal"]
+ self.goal_final_id = sim.model._site_name2id["cup_goal_final"]
+ self.collision_ids = [sim.model._geom_name2id[name] for name in self.collision_objects]
+
+ ball_in_cup = self.check_ball_in_cup(sim, self.ball_collision_id)
+
+ # Compute the current distance from the ball to the inner part of the cup
+ goal_pos = sim.data.site_xpos[self.goal_id]
+ ball_pos = sim.data.body_xpos[self.ball_id]
+ goal_final_pos = sim.data.site_xpos[self.goal_final_id]
+ self.dists.append(np.linalg.norm(goal_pos - ball_pos))
+ self.dists_final.append(np.linalg.norm(goal_final_pos - ball_pos))
+ self.ball_traj[step, :] = ball_pos
+ cup_quat = np.copy(sim.data.body_xquat[sim.model._body_name2id["cup"]])
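+        # recover the cup's tilt angle (roll about the x-axis) from its quaternion (w, x, y, z)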
+ self.cup_angles.append(np.arctan2(2 * (cup_quat[0] * cup_quat[1] + cup_quat[2] * cup_quat[3]),
+ 1 - 2 * (cup_quat[1]**2 + cup_quat[2]**2)))
+
+ action_cost = np.sum(np.square(action))
+ self.action_costs.append(action_cost)
+
+ if self.check_collision(sim):
+ reward = - 1000
+ return reward, False, True
+
+ if step == self.sim_time - 1:
+ t_min_dist = np.argmin(self.dists)
+ angle_min_dist = self.cup_angles[t_min_dist]
+ cost_angle = (angle_min_dist - np.pi / 2)**2
+
+ min_dist = self.dists[t_min_dist]
+ dist_final = self.dists_final[-1]
+
+ cost = 0.5 * min_dist + 0.5 * dist_final + 0.01 * cost_angle
+ reward = np.exp(-2 * cost) - 1e-3 * action_cost
+ success = dist_final < 0.05 and ball_in_cup
+ else:
+ reward = - 1e-3 * action_cost
+ success = False
+
+ return reward, success, False
+
+ def check_ball_in_cup(self, sim, ball_collision_id):
+ cup_base_collision_id = sim.model._geom_name2id["cup_base_contact"]
+ for coni in range(0, sim.data.ncon):
+ con = sim.data.contact[coni]
+
+ collision = con.geom1 == cup_base_collision_id and con.geom2 == ball_collision_id
+ collision_trans = con.geom1 == ball_collision_id and con.geom2 == cup_base_collision_id
+
+ if collision or collision_trans:
+ return True
+ return False
+
+ def check_collision(self, sim):
+ for coni in range(0, sim.data.ncon):
+ con = sim.data.contact[coni]
+
+ collision = con.geom1 in self.collision_ids and con.geom2 == self.ball_collision_id
+ collision_trans = con.geom1 == self.ball_collision_id and con.geom2 in self.collision_ids
+
+ if collision or collision_trans:
+ return True
+ return False
diff --git a/alr_envs/mujoco/ball_in_a_cup/utils.py b/alr_envs/mujoco/ball_in_a_cup/utils.py
new file mode 100644
index 0000000..bfec3cf
--- /dev/null
+++ b/alr_envs/mujoco/ball_in_a_cup/utils.py
@@ -0,0 +1,141 @@
+from alr_envs.utils.wrapper.detpmp_wrapper import DetPMPWrapper
+from alr_envs.utils.wrapper.dmp_wrapper import DmpWrapper
+from alr_envs.mujoco.ball_in_a_cup.ball_in_a_cup import ALRBallInACupEnv
+
+
+def make_contextual_env(rank, seed=0):
+ """
+ Utility function for multiprocessed env.
+
+ :param rank: (int) index of the subprocess
+ :param seed: (int) the initial seed for the RNG
+ :returns: a function that creates the wrapped environment
+ """
+
+ def _init():
+ env = ALRBallInACupEnv(reward_type="contextual_goal")
+
+ env = DetPMPWrapper(env,
+ num_dof=7,
+ num_basis=5,
+ width=0.005,
+ policy_type="motor",
+ start_pos=env.start_pos,
+ duration=3.5,
+ post_traj_time=4.5,
+ dt=env.dt,
+ weights_scale=0.5,
+ zero_start=True,
+ zero_goal=True
+ )
+
+ env.seed(seed + rank)
+ return env
+
+ return _init
+
+
+def make_env(rank, seed=0):
+ """
+ Utility function for multiprocessed env.
+
+ :param rank: (int) index of the subprocess
+ :param seed: (int) the initial seed for the RNG
+ :returns: a function that creates the wrapped environment
+ """
+
+ def _init():
+ env = ALRBallInACupEnv(reward_type="simple")
+
+ env = DetPMPWrapper(env,
+ num_dof=7,
+ num_basis=5,
+ width=0.005,
+ policy_type="motor",
+ start_pos=env.start_pos,
+ duration=3.5,
+ post_traj_time=4.5,
+ dt=env.dt,
+ weights_scale=0.2,
+ zero_start=True,
+ zero_goal=True
+ )
+
+ env.seed(seed + rank)
+ return env
+
+ return _init
+
+
+def make_simple_env(rank, seed=0):
+ """
+ Utility function for multiprocessed env.
+
+ :param rank: (int) index of the subprocess
+ :param seed: (int) the initial seed for the RNG
+ :returns: a function that creates the wrapped environment
+ """
+
+ def _init():
+ env = ALRBallInACupEnv(reward_type="simple")
+
+ env = DetPMPWrapper(env,
+ num_dof=3,
+ num_basis=5,
+ width=0.005,
+ off=-0.1,
+ policy_type="motor",
+ start_pos=env.start_pos[1::2],
+ duration=3.5,
+ post_traj_time=4.5,
+ dt=env.dt,
+ weights_scale=0.25,
+ zero_start=True,
+ zero_goal=True
+ )
+
+ env.seed(seed + rank)
+ return env
+
+ return _init
+
+
+def make_simple_dmp_env(rank, seed=0):
+ """
+ Utility function for multiprocessed env.
+
+ :param rank: (int) index of the subprocess
+ :param seed: (int) the initial seed for the RNG
+ :returns: a function that creates the wrapped environment
+ """
+
+ def _init():
+ _env = ALRBallInACupEnv(reward_type="simple")
+
+ _env = DmpWrapper(_env,
+ num_dof=3,
+ num_basis=5,
+ duration=3.5,
+ post_traj_time=4.5,
+ bandwidth_factor=2.5,
+ dt=_env.dt,
+ learn_goal=False,
+ alpha_phase=3,
+ start_pos=_env.start_pos[1::2],
+ final_pos=_env.start_pos[1::2],
+ policy_type="motor",
+ weights_scale=100,
+ )
+
+ _env.seed(seed + rank)
+ return _env
+
+ return _init
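+
+
+# Usage sketch (illustrative, assuming stable-baselines as the consumer hinted at by the
+# docstrings above): each factory returns a thunk that can either be called directly,
+#     env = make_simple_env(rank=0, seed=0)()
+# or passed to a vectorized-env helper such as
+#     from stable_baselines.common.vec_env import SubprocVecEnv
+#     vec_env = SubprocVecEnv([make_simple_env(rank=i, seed=0) for i in range(4)])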
diff --git a/alr_envs/mujoco/beerpong/__init__.py b/alr_envs/mujoco/beerpong/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/alr_envs/mujoco/beerpong/assets/beerpong.xml b/alr_envs/mujoco/beerpong/assets/beerpong.xml
new file mode 100644
index 0000000..017bbad
--- /dev/null
+++ b/alr_envs/mujoco/beerpong/assets/beerpong.xml
@@ -0,0 +1,238 @@
diff --git a/alr_envs/mujoco/beerpong/beerpong.py b/alr_envs/mujoco/beerpong/beerpong.py
new file mode 100644
index 0000000..dc6278d
--- /dev/null
+++ b/alr_envs/mujoco/beerpong/beerpong.py
@@ -0,0 +1,143 @@
+from gym import utils
+import os
+import numpy as np
+from alr_envs.mujoco import alr_mujoco_env
+
+
+class ALRBeerpongEnv(alr_mujoco_env.AlrMujocoEnv, utils.EzPickle):
+ def __init__(self, n_substeps=4, apply_gravity_comp=True, reward_function=None):
+ self._steps = 0
+
+ self.xml_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "assets",
+ "beerpong" + ".xml")
+
+ self.start_pos = np.array([0.0, 1.35, 0.0, 1.18, 0.0, -0.786, -1.59])
+ self.start_vel = np.zeros(7)
+
+ self._q_pos = []
+ self._q_vel = []
+ # self.weight_matrix_scale = 50
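+ # per-joint torque limits and PD gains (normalized by those limits),
+ # presumably consumed by the "motor" policy type of the trajectory wrappers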
+ self.max_ctrl = np.array([150., 125., 40., 60., 5., 5., 2.])
+ self.p_gains = 1 / self.max_ctrl * np.array([200, 300, 100, 100, 10, 10, 2.5])
+ self.d_gains = 1 / self.max_ctrl * np.array([7, 15, 5, 2.5, 0.3, 0.3, 0.05])
+
+ self.j_min = np.array([-2.6, -1.985, -2.8, -0.9, -4.55, -1.5707, -2.7])
+ self.j_max = np.array([2.6, 1.985, 2.8, 3.14159, 1.25, 1.5707, 2.7])
+
+ self.context = None
+
+ utils.EzPickle.__init__(self)
+ alr_mujoco_env.AlrMujocoEnv.__init__(self,
+ self.xml_path,
+ apply_gravity_comp=apply_gravity_comp,
+ n_substeps=n_substeps)
+
+ self.sim_time = 8 # seconds
+ self.sim_steps = int(self.sim_time / self.dt)
+ if reward_function is None:
+ from alr_envs.mujoco.beerpong.beerpong_reward import BeerpongReward
+ reward_function = BeerpongReward
+ self.reward_function = reward_function(self.sim, self.sim_steps)
+ self.cup_robot_id = self.sim.model._site_name2id["cup_robot_final"]
+ self.ball_id = self.sim.model._body_name2id["ball"]
+ self.cup_table_id = self.sim.model._body_name2id["cup_table"]
+
+ @property
+ def current_pos(self):
+ return self.sim.data.qpos[0:7].copy()
+
+ @property
+ def current_vel(self):
+ return self.sim.data.qvel[0:7].copy()
+
+ def configure(self, context):
+ self.context = context
+ self.reward_function.reset(context)
+
+ def reset_model(self):
+ init_pos_all = self.init_qpos.copy()
+ init_pos_robot = self.start_pos
+ init_vel = np.zeros_like(init_pos_all)
+
+ self._steps = 0
+ self._q_pos = []
+ self._q_vel = []
+
+ start_pos = init_pos_all
+ start_pos[0:7] = init_pos_robot
+ # start_pos[7:] = np.copy(self.sim.data.site_xpos[self.cup_robot_id, :]) + np.array([0., 0.0, 0.05])
+
+ self.set_state(start_pos, init_vel)
+
+ ball_pos = np.copy(self.sim.data.site_xpos[self.cup_robot_id, :]) + np.array([0., 0.0, 0.05])
+ self.sim.model.body_pos[self.ball_id] = ball_pos.copy()
+ self.sim.model.body_pos[self.cup_table_id] = self.context.copy()
+
+ return self._get_obs()
+
+ def step(self, a):
+ reward_dist = 0.0
+ angular_vel = 0.0
+ reward_ctrl = - np.square(a).sum()
+
+ crash = self.do_simulation(a)
+ joint_cons_viol = self.check_traj_in_joint_limits()
+
+ self._q_pos.append(self.sim.data.qpos[0:7].ravel().copy())
+ self._q_vel.append(self.sim.data.qvel[0:7].ravel().copy())
+
+ ob = self._get_obs()
+
+ if not crash and not joint_cons_viol:
+ reward, success, stop_sim = self.reward_function.compute_reward(a, self.sim, self._steps)
+ done = success or self._steps == self.sim_steps - 1 or stop_sim
+ self._steps += 1
+ else:
+ reward = -1000
+ success = False
+ done = True
+ return ob, reward, done, dict(reward_dist=reward_dist,
+ reward_ctrl=reward_ctrl,
+ velocity=angular_vel,
+ traj=self._q_pos, is_success=success,
+ is_collided=crash or joint_cons_viol)
+
+ def check_traj_in_joint_limits(self):
+ return any(self.current_pos > self.j_max) or any(self.current_pos < self.j_min)
+
+ def _get_obs(self):
+ theta = self.sim.data.qpos.flat[:7]
+ return np.concatenate([
+ np.cos(theta),
+ np.sin(theta),
+ # self.get_body_com("target"), # only return target to make problem harder
+ [self._steps],
+ ])
+
+
+
+if __name__ == "__main__":
+ env = ALRBeerpongEnv()
+ ctxt = np.array([0, -2, 0.840]) # initial
+
+ env.configure(ctxt)
+ env.reset()
+ env.render()
+ for i in range(16000):
+ # test with random actions
+ ac = 0.0 * env.action_space.sample()[0:7]
+ ac[1] = -0.8
+ ac[3] = - 0.3
+ ac[5] = -0.2
+ # ac = env.start_pos
+ # ac[0] += np.pi/2
+ obs, rew, d, info = env.step(ac)
+ env.render()
+
+ print(rew)
+
+ if d:
+ break
+
+ env.close()
+
diff --git a/alr_envs/mujoco/beerpong/beerpong_reward.py b/alr_envs/mujoco/beerpong/beerpong_reward.py
new file mode 100644
index 0000000..3281409
--- /dev/null
+++ b/alr_envs/mujoco/beerpong/beerpong_reward.py
@@ -0,0 +1,124 @@
+import numpy as np
+from alr_envs.mujoco import alr_reward_fct
+
+
+class BeerpongReward(alr_reward_fct.AlrReward):
+ def __init__(self, sim, sim_time):
+
+ self.sim = sim
+ self.sim_time = sim_time
+
+ self.collision_objects = ["cup_geom1", "cup_geom2", "wrist_palm_link_convex_geom",
+ "wrist_pitch_link_convex_decomposition_p1_geom",
+ "wrist_pitch_link_convex_decomposition_p2_geom",
+ "wrist_pitch_link_convex_decomposition_p3_geom",
+ "wrist_yaw_link_convex_decomposition_p1_geom",
+ "wrist_yaw_link_convex_decomposition_p2_geom",
+ "forearm_link_convex_decomposition_p1_geom",
+ "forearm_link_convex_decomposition_p2_geom"]
+
+ self.ball_id = None
+ self.ball_collision_id = None
+ self.goal_id = None
+ self.goal_final_id = None
+ self.collision_ids = None
+
+ self.ball_traj = None
+ self.dists = None
+ self.dists_ctxt = None
+ self.dists_final = None
+ self.costs = None
+
+ self.reset(None)
+
+ def reset(self, context):
+ self.ball_traj = np.zeros(shape=(self.sim_time, 3))
+ self.dists = []
+ self.dists_ctxt = []
+ self.dists_final = []
+ self.costs = []
+ self.context = context
+ self.ball_in_cup = False
+ self.dist_ctxt = 5
+
+ self.ball_id = self.sim.model._body_name2id["ball"]
+ self.ball_collision_id = self.sim.model._geom_name2id["ball_geom"]
+ self.cup_robot_id = self.sim.model._site_name2id["cup_robot_final"]
+ self.goal_id = self.sim.model._site_name2id["cup_goal_table"]
+ self.goal_final_id = self.sim.model._site_name2id["cup_goal_final_table"]
+ self.collision_ids = [self.sim.model._geom_name2id[name] for name in self.collision_objects]
+ self.cup_table_id = self.sim.model._body_name2id["cup_table"]
+
+ def compute_reward(self, action, sim, step):
+ action_cost = np.sum(np.square(action))
+
+ stop_sim = False
+ success = False
+
+ if self.check_collision(sim):
+ reward = - 1e-4 * action_cost - 1000
+ stop_sim = True
+ return reward, success, stop_sim
+
+ # Compute the current distance from the ball to the inner part of the cup
+ goal_pos = sim.data.site_xpos[self.goal_id]
+ ball_pos = sim.data.body_xpos[self.ball_id]
+ goal_final_pos = sim.data.site_xpos[self.goal_final_id]
+ self.dists.append(np.linalg.norm(goal_pos - ball_pos))
+ self.dists_final.append(np.linalg.norm(goal_final_pos - ball_pos))
+ self.dists_ctxt.append(np.linalg.norm(ball_pos - self.context))
+ self.ball_traj[step, :] = ball_pos
+
+ # Determine the first time when ball is in cup
+ if not self.ball_in_cup:
+ ball_in_cup = self.check_ball_in_cup(sim, self.ball_collision_id)
+ self.ball_in_cup = ball_in_cup
+ if ball_in_cup:
+ dist_to_ctxt = np.linalg.norm(ball_pos - self.context)
+ self.dist_ctxt = dist_to_ctxt
+
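+ # final step: the cost mixes the closest approach to the table cup, the final distance
+ # and the ball-context distance recorded when the ball first entered the cup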
+ if step == self.sim_time - 1:
+ min_dist = np.min(self.dists)
+ dist_final = self.dists_final[-1]
+ # dist_ctxt = self.dists_ctxt[-1]
+
+ # cost = self._get_stage_wise_cost(ball_in_cup, min_dist, dist_final, dist_ctxt)
+ cost = 2 * (0.5 * min_dist + 0.5 * dist_final + 0.1 * self.dist_ctxt)
+ reward = np.exp(-1 * cost) - 1e-4 * action_cost
+ success = dist_final < 0.05 and self.dist_ctxt < 0.05
+ else:
+ reward = - 1e-4 * action_cost
+ success = False
+
+ return reward, success, stop_sim
+
+ def _get_stage_wise_cost(self, ball_in_cup, min_dist, dist_final, dist_to_ctxt):
+ if not ball_in_cup:
+ cost = 3 + 2*(0.5 * min_dist**2 + 0.5 * dist_final**2)
+ else:
+ cost = 2 * dist_to_ctxt ** 2
+ print('Context Distance:', dist_to_ctxt)
+ return cost
+
+ def check_ball_in_cup(self, sim, ball_collision_id):
+ cup_base_collision_id = sim.model._geom_name2id["cup_base_contact"]
+ for coni in range(0, sim.data.ncon):
+ con = sim.data.contact[coni]
+
+ collision = con.geom1 == cup_base_collision_id and con.geom2 == ball_collision_id
+ collision_trans = con.geom1 == ball_collision_id and con.geom2 == cup_base_collision_id
+
+ if collision or collision_trans:
+ return True
+ return False
+
+ def check_collision(self, sim):
+ for coni in range(0, sim.data.ncon):
+ con = sim.data.contact[coni]
+
+ collision = con.geom1 in self.collision_ids and con.geom2 == self.ball_collision_id
+ collision_trans = con.geom1 == self.ball_collision_id and con.geom2 in self.collision_ids
+
+ if collision or collision_trans:
+ return True
+ return False
diff --git a/alr_envs/mujoco/beerpong/beerpong_reward_simple.py b/alr_envs/mujoco/beerpong/beerpong_reward_simple.py
new file mode 100644
index 0000000..b79e963
--- /dev/null
+++ b/alr_envs/mujoco/beerpong/beerpong_reward_simple.py
@@ -0,0 +1,141 @@
+import numpy as np
+from alr_envs.mujoco import alr_reward_fct
+
+
+class BeerpongReward(alr_reward_fct.AlrReward):
+ def __init__(self, sim, sim_time):
+
+ self.sim = sim
+ self.sim_time = sim_time
+
+ self.collision_objects = ["cup_geom1", "cup_geom2", "wrist_palm_link_convex_geom",
+ "wrist_pitch_link_convex_decomposition_p1_geom",
+ "wrist_pitch_link_convex_decomposition_p2_geom",
+ "wrist_pitch_link_convex_decomposition_p3_geom",
+ "wrist_yaw_link_convex_decomposition_p1_geom",
+ "wrist_yaw_link_convex_decomposition_p2_geom",
+ "forearm_link_convex_decomposition_p1_geom",
+ "forearm_link_convex_decomposition_p2_geom"]
+
+ self.ball_id = None
+ self.ball_collision_id = None
+ self.goal_id = None
+ self.goal_final_id = None
+ self.collision_ids = None
+
+ self.ball_traj = None
+ self.dists = None
+ self.dists_ctxt = None
+ self.dists_final = None
+ self.costs = None
+
+ self.reset(None)
+
+ def reset(self, context):
+ self.ball_traj = np.zeros(shape=(self.sim_time, 3))
+ self.dists = []
+ self.dists_ctxt = []
+ self.dists_final = []
+ self.costs = []
+ self.action_costs = []
+ self.context = context
+ self.ball_in_cup = False
+ self.dist_ctxt = 5
+ self.bounce_dist = 2
+ self.min_dist = 2
+ self.dist_final = 2
+ self.table_contact = False
+
+ self.ball_id = self.sim.model._body_name2id["ball"]
+ self.ball_collision_id = self.sim.model._geom_name2id["ball_geom"]
+ self.cup_robot_id = self.sim.model._site_name2id["cup_robot_final"]
+ self.goal_id = self.sim.model._site_name2id["cup_goal_table"]
+ self.goal_final_id = self.sim.model._site_name2id["cup_goal_final_table"]
+ self.collision_ids = [self.sim.model._geom_name2id[name] for name in self.collision_objects]
+ self.cup_table_id = self.sim.model._body_name2id["cup_table"]
+ self.bounce_table_id = self.sim.model._site_name2id["bounce_table"]
+
+ def compute_reward(self, action, sim, step):
+ action_cost = np.sum(np.square(action))
+ self.action_costs.append(action_cost)
+
+ stop_sim = False
+ success = False
+
+ if self.check_collision(sim):
+ reward = - 1e-2 * action_cost - 10
+ stop_sim = True
+ return reward, success, stop_sim
+
+ # Compute the current distance from the ball to the inner part of the cup
+ goal_pos = sim.data.site_xpos[self.goal_id]
+ ball_pos = sim.data.body_xpos[self.ball_id]
+ bounce_pos = sim.data.site_xpos[self.bounce_table_id]
+ goal_final_pos = sim.data.site_xpos[self.goal_final_id]
+ self.dists.append(np.linalg.norm(goal_pos - ball_pos))
+ self.dists_final.append(np.linalg.norm(goal_final_pos - ball_pos))
+ self.ball_traj[step, :] = ball_pos
+
+ ball_in_cup = self.check_ball_in_cup(sim, self.ball_collision_id)
+ table_contact = self.check_ball_table_contact(sim, self.ball_collision_id)
+
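+ # record the distance to the designated bounce site at the first ball-table contact (capped at 2)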
+ if table_contact and not self.table_contact:
+ self.bounce_dist = np.minimum((np.linalg.norm(bounce_pos - ball_pos)), 2)
+ self.table_contact = True
+
+ if step == self.sim_time - 1:
+ min_dist = np.min(self.dists)
+ self.min_dist = min_dist
+ dist_final = self.dists_final[-1]
+ self.dist_final = dist_final
+
+ cost = 0.33 * min_dist + 0.33 * dist_final + 0.33 * self.bounce_dist
+ reward = np.exp(-2 * cost) - 1e-2 * action_cost
+ success = self.bounce_dist < 0.05 and dist_final < 0.05 and ball_in_cup
+ else:
+ reward = - 1e-2 * action_cost
+ success = False
+
+ return reward, success, stop_sim
+
+ def _get_stage_wise_cost(self, ball_in_cup, min_dist, dist_final, dist_to_ctxt):
+ if not ball_in_cup:
+ cost = 3 + 2*(0.5 * min_dist**2 + 0.5 * dist_final**2)
+ else:
+ cost = 2 * dist_to_ctxt ** 2
+ print('Context Distance:', dist_to_ctxt)
+ return cost
+
+ def check_ball_table_contact(self, sim, ball_collision_id):
+ table_collision_id = sim.model._geom_name2id["table_contact_geom"]
+ for coni in range(0, sim.data.ncon):
+ con = sim.data.contact[coni]
+ collision = con.geom1 == table_collision_id and con.geom2 == ball_collision_id
+ collision_trans = con.geom1 == ball_collision_id and con.geom2 == table_collision_id
+
+ if collision or collision_trans:
+ return True
+ return False
+
+ def check_ball_in_cup(self, sim, ball_collision_id):
+ cup_base_collision_id = sim.model._geom_name2id["cup_base_table_contact"]
+ for coni in range(0, sim.data.ncon):
+ con = sim.data.contact[coni]
+
+ collision = con.geom1 == cup_base_collision_id and con.geom2 == ball_collision_id
+ collision_trans = con.geom1 == ball_collision_id and con.geom2 == cup_base_collision_id
+
+ if collision or collision_trans:
+ return True
+ return False
+
+ def check_collision(self, sim):
+ for coni in range(0, sim.data.ncon):
+ con = sim.data.contact[coni]
+
+ collision = con.geom1 in self.collision_ids and con.geom2 == self.ball_collision_id
+ collision_trans = con.geom1 == self.ball_collision_id and con.geom2 in self.collision_ids
+
+ if collision or collision_trans:
+ return True
+ return False
diff --git a/alr_envs/mujoco/beerpong/beerpong_simple.py b/alr_envs/mujoco/beerpong/beerpong_simple.py
new file mode 100644
index 0000000..7a0908d
--- /dev/null
+++ b/alr_envs/mujoco/beerpong/beerpong_simple.py
@@ -0,0 +1,164 @@
+from gym import utils
+import os
+import numpy as np
+from alr_envs.mujoco import alr_mujoco_env
+
+
+class ALRBeerpongEnv(alr_mujoco_env.AlrMujocoEnv, utils.EzPickle):
+ def __init__(self, n_substeps=4, apply_gravity_comp=True, reward_function=None):
+ self._steps = 0
+
+ self.xml_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "assets",
+ "beerpong" + ".xml")
+
+ self.start_pos = np.array([0.0, 1.35, 0.0, 1.18, 0.0, -0.786, -1.59])
+ self.start_vel = np.zeros(7)
+
+ self._q_pos = []
+ self._q_vel = []
+ # self.weight_matrix_scale = 50
+ self.max_ctrl = np.array([150., 125., 40., 60., 5., 5., 2.])
+ self.p_gains = 1 / self.max_ctrl * np.array([200, 300, 100, 100, 10, 10, 2.5])
+ self.d_gains = 1 / self.max_ctrl * np.array([7, 15, 5, 2.5, 0.3, 0.3, 0.05])
+
+ self.j_min = np.array([-2.6, -1.985, -2.8, -0.9, -4.55, -1.5707, -2.7])
+ self.j_max = np.array([2.6, 1.985, 2.8, 3.14159, 1.25, 1.5707, 2.7])
+
+ self.context = None
+
+ utils.EzPickle.__init__(self)
+ alr_mujoco_env.AlrMujocoEnv.__init__(self,
+ self.xml_path,
+ apply_gravity_comp=apply_gravity_comp,
+ n_substeps=n_substeps)
+
+ self.sim_time = 8 # seconds
+ self.sim_steps = int(self.sim_time / self.dt)
+ if reward_function is None:
+ from alr_envs.mujoco.beerpong.beerpong_reward_simple import BeerpongReward
+ reward_function = BeerpongReward
+ self.reward_function = reward_function(self.sim, self.sim_steps)
+ self.cup_robot_id = self.sim.model._site_name2id["cup_robot_final"]
+ self.ball_id = self.sim.model._body_name2id["ball"]
+ self.cup_table_id = self.sim.model._body_name2id["cup_table"]
+ # self.bounce_table_id = self.sim.model._body_name2id["bounce_table"]
+
+ @property
+ def current_pos(self):
+ return self.sim.data.qpos[0:7].copy()
+
+ @property
+ def current_vel(self):
+ return self.sim.data.qvel[0:7].copy()
+
+ def configure(self, context):
+ if context is None:
+ context = np.array([0, -2, 0.840])
+ self.context = context
+ self.reward_function.reset(context)
+
+ def reset_model(self):
+ init_pos_all = self.init_qpos.copy()
+ init_pos_robot = self.start_pos
+ init_vel = np.zeros_like(init_pos_all)
+
+ self._steps = 0
+ self._q_pos = []
+ self._q_vel = []
+
+ start_pos = init_pos_all
+ start_pos[0:7] = init_pos_robot
+ # start_pos[7:] = np.copy(self.sim.data.site_xpos[self.cup_robot_id, :]) + np.array([0., 0.0, 0.05])
+
+ self.set_state(start_pos, init_vel)
+
+ ball_pos = np.copy(self.sim.data.site_xpos[self.cup_robot_id, :]) + np.array([0., 0.0, 0.05])
+ self.sim.model.body_pos[self.ball_id] = ball_pos.copy()
+ self.sim.model.body_pos[self.cup_table_id] = self.context.copy()
+ # self.sim.model.body_pos[self.bounce_table_id] = self.context.copy()
+
+ self.sim.forward()
+
+ return self._get_obs()
+
+ def step(self, a):
+ reward_dist = 0.0
+ angular_vel = 0.0
+ reward_ctrl = - np.square(a).sum()
+ action_cost = np.sum(np.square(a))
+
+ crash = self.do_simulation(a)
+ joint_cons_viol = self.check_traj_in_joint_limits()
+
+ self._q_pos.append(self.sim.data.qpos[0:7].ravel().copy())
+ self._q_vel.append(self.sim.data.qvel[0:7].ravel().copy())
+
+ ob = self._get_obs()
+
+ if not crash and not joint_cons_viol:
+ reward, success, stop_sim = self.reward_function.compute_reward(a, self.sim, self._steps)
+ done = success or self._steps == self.sim_steps - 1 or stop_sim
+ self._steps += 1
+ else:
+ reward = -10 - 1e-2 * action_cost
+ success = False
+ done = True
+ return ob, reward, done, dict(reward_dist=reward_dist,
+ reward_ctrl=reward_ctrl,
+ velocity=angular_vel,
+ traj=self._q_pos, is_success=success,
+ is_collided=crash or joint_cons_viol)
+
+ def check_traj_in_joint_limits(self):
+ return any(self.current_pos > self.j_max) or any(self.current_pos < self.j_min)
+
+ def extend_des_pos(self, des_pos):
+ des_pos_full = self.start_pos.copy()
+ des_pos_full[1] = des_pos[0]
+ des_pos_full[3] = des_pos[1]
+ des_pos_full[5] = des_pos[2]
+ return des_pos_full
+
+ def extend_des_vel(self, des_vel):
+ des_vel_full = self.start_vel.copy()
+ des_vel_full[1] = des_vel[0]
+ des_vel_full[3] = des_vel[1]
+ des_vel_full[5] = des_vel[2]
+ return des_vel_full
+
+ def _get_obs(self):
+ theta = self.sim.data.qpos.flat[:7]
+ return np.concatenate([
+ np.cos(theta),
+ np.sin(theta),
+ # self.get_body_com("target"), # only return target to make problem harder
+ [self._steps],
+ ])
+
+
+
+if __name__ == "__main__":
+ env = ALRBeerpongEnv()
+ ctxt = np.array([0, -2, 0.840]) # initial
+
+ env.configure(ctxt)
+ env.reset()
+ env.render()
+ for i in range(16000):
+ # test with random actions
+ ac = 0.0 * env.action_space.sample()[0:7]
+ ac[1] = -0.01
+ ac[3] = - 0.01
+ ac[5] = -0.01
+ # ac = env.start_pos
+ # ac[0] += np.pi/2
+ obs, rew, d, info = env.step(ac)
+ env.render()
+
+ print(rew)
+
+ if d:
+ break
+
+ env.close()
+
diff --git a/alr_envs/mujoco/beerpong/utils.py b/alr_envs/mujoco/beerpong/utils.py
new file mode 100644
index 0000000..bfbc2a1
--- /dev/null
+++ b/alr_envs/mujoco/beerpong/utils.py
@@ -0,0 +1,105 @@
+from alr_envs.utils.wrapper.detpmp_wrapper import DetPMPWrapper
+from alr_envs.mujoco.beerpong.beerpong import ALRBeerpongEnv
+from alr_envs.mujoco.beerpong.beerpong_simple import ALRBeerpongEnv as ALRBeerpongEnvSimple
+
+
+def make_contextual_env(rank, seed=0):
+ """
+ Utility function for multiprocessed env.
+
+ :param rank: (int) index of the subprocess
+ :param seed: (int) the initial seed for the RNG
+ :returns: a function that creates the wrapped environment
+ """
+
+ def _init():
+ env = ALRBeerpongEnv()
+
+ env = DetPMPWrapper(env,
+ num_dof=7,
+ num_basis=5,
+ width=0.005,
+ policy_type="motor",
+ start_pos=env.start_pos,
+ duration=3.5,
+ post_traj_time=4.5,
+ dt=env.dt,
+ weights_scale=0.5,
+ zero_start=True,
+ zero_goal=True
+ )
+
+ env.seed(seed + rank)
+ return env
+
+ return _init
+
+
+def make_env(rank, seed=0):
+ """
+ Utility function for multiprocessed env.
+
+ :param rank: (int) index of the subprocess
+ :param seed: (int) the initial seed for the RNG
+ :returns: a function that creates the wrapped environment
+ """
+
+ def _init():
+ env = ALRBeerpongEnvSimple()
+
+ env = DetPMPWrapper(env,
+ num_dof=7,
+ num_basis=5,
+ width=0.005,
+ policy_type="motor",
+ start_pos=env.start_pos,
+ duration=3.5,
+ post_traj_time=4.5,
+ dt=env.dt,
+ weights_scale=0.25,
+ zero_start=True,
+ zero_goal=True
+ )
+
+ env.seed(seed + rank)
+ return env
+
+ return _init
+
+
+def make_simple_env(rank, seed=0):
+ """
+ Utility function for multiprocessed env.
+
+ :param rank: (int) index of the subprocess
+ :param seed: (int) the initial seed for the RNG
+ :returns: a function that creates the wrapped environment
+ """
+
+ def _init():
+ env = ALRBeerpongEnvSimple()
+
+ env = DetPMPWrapper(env,
+ num_dof=3,
+ num_basis=5,
+ width=0.005,
+ policy_type="motor",
+ start_pos=env.start_pos[1::2],
+ duration=3.5,
+ post_traj_time=4.5,
+ dt=env.dt,
+ weights_scale=0.5,
+ zero_start=True,
+ zero_goal=True
+ )
+
+ env.seed(seed + rank)
+ return env
+
+ return _init
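+
+
+# Usage sketch (illustrative): as with the ball-in-a-cup factories, each function returns a
+# thunk, e.g. env = make_simple_env(rank=0, seed=0)(). For the contextual variant the
+# underlying ALRBeerpongEnv receives the goal-cup position via configure(context), as shown
+# in the __main__ block of beerpong.py.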
diff --git a/alr_envs/mujoco/gym_table_tennis/__init__.py b/alr_envs/mujoco/gym_table_tennis/__init__.py
new file mode 100644
index 0000000..8b13789
--- /dev/null
+++ b/alr_envs/mujoco/gym_table_tennis/__init__.py
@@ -0,0 +1 @@
+
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/MUJOCO_LOG.TXT b/alr_envs/mujoco/gym_table_tennis/envs/MUJOCO_LOG.TXT
new file mode 100644
index 0000000..91c2162
--- /dev/null
+++ b/alr_envs/mujoco/gym_table_tennis/envs/MUJOCO_LOG.TXT
@@ -0,0 +1,3 @@
+Mon Jan 25 15:45:30 2021
+ERROR: GLEW initalization error: Missing GL version
+
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/__init__.py b/alr_envs/mujoco/gym_table_tennis/envs/__init__.py
new file mode 100644
index 0000000..8b13789
--- /dev/null
+++ b/alr_envs/mujoco/gym_table_tennis/envs/__init__.py
@@ -0,0 +1 @@
+
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/include_7_motor_actuator.xml b/alr_envs/mujoco/gym_table_tennis/envs/assets/include_7_motor_actuator.xml
new file mode 100644
index 0000000..7772d14
--- /dev/null
+++ b/alr_envs/mujoco/gym_table_tennis/envs/assets/include_7_motor_actuator.xml
@@ -0,0 +1,12 @@
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/include_barrett_wam_7dof_left.xml b/alr_envs/mujoco/gym_table_tennis/envs/assets/include_barrett_wam_7dof_left.xml
new file mode 100644
index 0000000..a8df915
--- /dev/null
+++ b/alr_envs/mujoco/gym_table_tennis/envs/assets/include_barrett_wam_7dof_left.xml
@@ -0,0 +1,76 @@
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/include_barrett_wam_7dof_right.xml b/alr_envs/mujoco/gym_table_tennis/envs/assets/include_barrett_wam_7dof_right.xml
new file mode 100644
index 0000000..011b95a
--- /dev/null
+++ b/alr_envs/mujoco/gym_table_tennis/envs/assets/include_barrett_wam_7dof_right.xml
@@ -0,0 +1,95 @@
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/include_table.xml b/alr_envs/mujoco/gym_table_tennis/envs/assets/include_table.xml
new file mode 100644
index 0000000..ad1ae35
--- /dev/null
+++ b/alr_envs/mujoco/gym_table_tennis/envs/assets/include_table.xml
@@ -0,0 +1,38 @@
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/include_target_ball.xml b/alr_envs/mujoco/gym_table_tennis/envs/assets/include_target_ball.xml
new file mode 100644
index 0000000..eb2b347
--- /dev/null
+++ b/alr_envs/mujoco/gym_table_tennis/envs/assets/include_target_ball.xml
@@ -0,0 +1,10 @@
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/include_test_balls.xml b/alr_envs/mujoco/gym_table_tennis/envs/assets/include_test_balls.xml
new file mode 100644
index 0000000..29a21e1
--- /dev/null
+++ b/alr_envs/mujoco/gym_table_tennis/envs/assets/include_test_balls.xml
@@ -0,0 +1,80 @@
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/base_link_convex.stl b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/base_link_convex.stl
new file mode 100644
index 0000000..133b112
Binary files /dev/null and b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/base_link_convex.stl differ
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/base_link_fine.stl b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/base_link_fine.stl
new file mode 100644
index 0000000..047e9df
Binary files /dev/null and b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/base_link_fine.stl differ
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_dist_link_convex.stl b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_dist_link_convex.stl
new file mode 100644
index 0000000..3b05c27
Binary files /dev/null and b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_dist_link_convex.stl differ
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_dist_link_fine.stl b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_dist_link_fine.stl
new file mode 100644
index 0000000..5ff94a2
Binary files /dev/null and b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_dist_link_fine.stl differ
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_med_link_convex.stl b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_med_link_convex.stl
new file mode 100644
index 0000000..c548448
Binary files /dev/null and b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_med_link_convex.stl differ
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_med_link_fine.stl b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_med_link_fine.stl
new file mode 100644
index 0000000..495160d
Binary files /dev/null and b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_med_link_fine.stl differ
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_prox_link_convex_decomposition_p1.stl b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_prox_link_convex_decomposition_p1.stl
new file mode 100644
index 0000000..b4bb322
Binary files /dev/null and b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_prox_link_convex_decomposition_p1.stl differ
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_prox_link_convex_decomposition_p2.stl b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_prox_link_convex_decomposition_p2.stl
new file mode 100644
index 0000000..7b2f001
Binary files /dev/null and b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_prox_link_convex_decomposition_p2.stl differ
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_prox_link_convex_decomposition_p3.stl b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_prox_link_convex_decomposition_p3.stl
new file mode 100644
index 0000000..f05174e
Binary files /dev/null and b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_prox_link_convex_decomposition_p3.stl differ
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_prox_link_fine.stl b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_prox_link_fine.stl
new file mode 100644
index 0000000..eb252d9
Binary files /dev/null and b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_prox_link_fine.stl differ
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_fine.stl b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_fine.stl
new file mode 100644
index 0000000..0a986fa
Binary files /dev/null and b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_fine.stl differ
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_link_convex_decomposition_p1.stl b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_link_convex_decomposition_p1.stl
new file mode 100644
index 0000000..c039f0d
Binary files /dev/null and b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_link_convex_decomposition_p1.stl differ
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_link_convex_decomposition_p2.stl b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_link_convex_decomposition_p2.stl
new file mode 100644
index 0000000..250acaf
Binary files /dev/null and b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_link_convex_decomposition_p2.stl differ
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_link_convex_decomposition_p3.stl b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_link_convex_decomposition_p3.stl
new file mode 100644
index 0000000..993d0f7
Binary files /dev/null and b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_link_convex_decomposition_p3.stl differ
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_link_convex_decomposition_p4.stl b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_link_convex_decomposition_p4.stl
new file mode 100644
index 0000000..8448a3f
Binary files /dev/null and b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_link_convex_decomposition_p4.stl differ
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/elbow_link_convex.stl b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/elbow_link_convex.stl
new file mode 100644
index 0000000..b34963d
Binary files /dev/null and b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/elbow_link_convex.stl differ
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/elbow_link_fine.stl b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/elbow_link_fine.stl
new file mode 100644
index 0000000..f6a1515
Binary files /dev/null and b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/elbow_link_fine.stl differ
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/forearm_link_convex_decomposition_p1.stl b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/forearm_link_convex_decomposition_p1.stl
new file mode 100644
index 0000000..e6aa6b6
Binary files /dev/null and b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/forearm_link_convex_decomposition_p1.stl differ
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/forearm_link_convex_decomposition_p2.stl b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/forearm_link_convex_decomposition_p2.stl
new file mode 100644
index 0000000..667902e
Binary files /dev/null and b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/forearm_link_convex_decomposition_p2.stl differ
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/forearm_link_fine.stl b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/forearm_link_fine.stl
new file mode 100644
index 0000000..ed66bbb
Binary files /dev/null and b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/forearm_link_fine.stl differ
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_link_convex_decomposition_p1.stl b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_link_convex_decomposition_p1.stl
new file mode 100644
index 0000000..aba957d
Binary files /dev/null and b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_link_convex_decomposition_p1.stl differ
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_link_convex_decomposition_p2.stl b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_link_convex_decomposition_p2.stl
new file mode 100644
index 0000000..5cca6a9
Binary files /dev/null and b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_link_convex_decomposition_p2.stl differ
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_link_convex_decomposition_p3.stl b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_link_convex_decomposition_p3.stl
new file mode 100644
index 0000000..3343e27
Binary files /dev/null and b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_link_convex_decomposition_p3.stl differ
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_link_fine.stl b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_link_fine.stl
new file mode 100644
index 0000000..ae505fd
Binary files /dev/null and b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_link_fine.stl differ
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_pitch_link_convex.stl b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_pitch_link_convex.stl
new file mode 100644
index 0000000..c36cfec
Binary files /dev/null and b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_pitch_link_convex.stl differ
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_pitch_link_fine.stl b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_pitch_link_fine.stl
new file mode 100644
index 0000000..dc633c4
Binary files /dev/null and b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_pitch_link_fine.stl differ
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/upper_arm_link_convex_decomposition_p1.stl b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/upper_arm_link_convex_decomposition_p1.stl
new file mode 100644
index 0000000..82d0093
Binary files /dev/null and b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/upper_arm_link_convex_decomposition_p1.stl differ
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/upper_arm_link_convex_decomposition_p2.stl b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/upper_arm_link_convex_decomposition_p2.stl
new file mode 100644
index 0000000..7fd5a55
Binary files /dev/null and b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/upper_arm_link_convex_decomposition_p2.stl differ
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/upper_arm_link_fine.stl b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/upper_arm_link_fine.stl
new file mode 100644
index 0000000..76353ae
Binary files /dev/null and b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/upper_arm_link_fine.stl differ
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/wrist_palm_link_convex.stl b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/wrist_palm_link_convex.stl
new file mode 100644
index 0000000..a0386f6
Binary files /dev/null and b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/wrist_palm_link_convex.stl differ
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/wrist_palm_link_fine.stl b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/wrist_palm_link_fine.stl
new file mode 100644
index 0000000..f6b41ad
Binary files /dev/null and b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/wrist_palm_link_fine.stl differ
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/wrist_pitch_link_convex_decomposition_p1.stl b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/wrist_pitch_link_convex_decomposition_p1.stl
new file mode 100644
index 0000000..c36f88f
Binary files /dev/null and b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/wrist_pitch_link_convex_decomposition_p1.stl differ
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/wrist_pitch_link_convex_decomposition_p2.stl b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/wrist_pitch_link_convex_decomposition_p2.stl
new file mode 100644
index 0000000..d00cac1
Binary files /dev/null and b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/wrist_pitch_link_convex_decomposition_p2.stl differ
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/wrist_pitch_link_convex_decomposition_p3.stl b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/wrist_pitch_link_convex_decomposition_p3.stl
new file mode 100644
index 0000000..34d1d8b
Binary files /dev/null and b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/wrist_pitch_link_convex_decomposition_p3.stl differ
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/wrist_pitch_link_fine.stl b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/wrist_pitch_link_fine.stl
new file mode 100644
index 0000000..13d2f73
Binary files /dev/null and b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/wrist_pitch_link_fine.stl differ
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/wrist_yaw_link_convex_decomposition_p1.stl b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/wrist_yaw_link_convex_decomposition_p1.stl
new file mode 100644
index 0000000..06e857f
Binary files /dev/null and b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/wrist_yaw_link_convex_decomposition_p1.stl differ
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/wrist_yaw_link_convex_decomposition_p2.stl b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/wrist_yaw_link_convex_decomposition_p2.stl
new file mode 100644
index 0000000..48e1bb1
Binary files /dev/null and b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/wrist_yaw_link_convex_decomposition_p2.stl differ
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/wrist_yaw_link_fine.stl b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/wrist_yaw_link_fine.stl
new file mode 100644
index 0000000..0d95239
Binary files /dev/null and b/alr_envs/mujoco/gym_table_tennis/envs/assets/meshes/wrist_yaw_link_fine.stl differ
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/right_arm_actuator.xml b/alr_envs/mujoco/gym_table_tennis/envs/assets/right_arm_actuator.xml
new file mode 100644
index 0000000..9abf102
--- /dev/null
+++ b/alr_envs/mujoco/gym_table_tennis/envs/assets/right_arm_actuator.xml
@@ -0,0 +1,19 @@
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/shared.xml b/alr_envs/mujoco/gym_table_tennis/envs/assets/shared.xml
new file mode 100644
index 0000000..dfbc37a
--- /dev/null
+++ b/alr_envs/mujoco/gym_table_tennis/envs/assets/shared.xml
@@ -0,0 +1,49 @@
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/assets/table_tennis_env.xml b/alr_envs/mujoco/gym_table_tennis/envs/assets/table_tennis_env.xml
new file mode 100644
index 0000000..f2432bb
--- /dev/null
+++ b/alr_envs/mujoco/gym_table_tennis/envs/assets/table_tennis_env.xml
@@ -0,0 +1,41 @@
diff --git a/alr_envs/mujoco/gym_table_tennis/envs/table_tennis_env.py b/alr_envs/mujoco/gym_table_tennis/envs/table_tennis_env.py
new file mode 100644
index 0000000..f155f06
--- /dev/null
+++ b/alr_envs/mujoco/gym_table_tennis/envs/table_tennis_env.py
@@ -0,0 +1,244 @@
+import numpy as np
+from gym import spaces
+from gym.envs.robotics import robot_env, utils
+# import xml.etree.ElementTree as ET
+from alr_envs.mujoco.gym_table_tennis.utils.rewards.hierarchical_reward import HierarchicalRewardTableTennis
+import glfw
+from alr_envs.mujoco.gym_table_tennis.utils.experiment import ball_initialize
+from pathlib import Path
+import os
+
+
+class TableTennisEnv(robot_env.RobotEnv):
+ """Class for Table Tennis environment.
+ """
+ def __init__(self, n_substeps=1,
+ model_path=None,
+ initial_qpos=None,
+ initial_ball_state=None,
+ config=None,
+ reward_obj=None
+ ):
+ """Initializes a new mujoco based Table Tennis environment.
+
+ Args:
+ model_path (string): path to the environments XML file
+ initial_qpos (dict): a dictionary of joint names and values that define the initial
+ n_actions: Number of joints
+ n_substeps (int): number of substeps the simulation runs on every call to step
+ scale (double): limit maximum change in position
+ initial_ball_state: to reset the ball state
+ """
+ # self.config = config.config
+ if model_path is None:
+ path_cws = Path.cwd()
+ print(path_cws)
+ current_dir = Path(os.path.split(os.path.realpath(__file__))[0])
+ table_tennis_env_xml_path = current_dir / "assets"/"table_tennis_env.xml"
+ model_path = str(table_tennis_env_xml_path)
+ self.config = config
+ action_space = True # self.config['trajectory']['args']['action_space']
+ time_step = 0.002 # self.config['mujoco_sim_env']['args']["time_step"]
+ if initial_qpos is None:
+ initial_qpos = {"wam/base_yaw_joint_right": 1.5,
+ "wam/shoulder_pitch_joint_right": 1,
+ "wam/shoulder_yaw_joint_right": 0,
+ "wam/elbow_pitch_joint_right": 1,
+ "wam/wrist_yaw_joint_right": 1,
+ "wam/wrist_pitch_joint_right": 0,
+ "wam/palm_yaw_joint_right": 0}
+ # initial_qpos = [1.5, 1, 0, 1, 1, 0, 0] # self.config['robot_config']['args']['initial_qpos']
+
+ # TODO should read all configuration in config
+ assert initial_qpos is not None, "Must initialize the initial q position of robot arm"
+ n_actions = 7
+ self.initial_qpos_value = np.array(list(initial_qpos.values())).copy()
+ # self.initial_qpos_value = np.array(initial_qpos)
+ # # change time step in .xml file
+ # tree = ET.parse(model_path)
+ # root = tree.getroot()
+ # for option in root.findall('option'):
+ # option.set("timestep", str(time_step))
+ #
+ # tree.write(model_path)
+
+ super(TableTennisEnv, self).__init__(
+ model_path=model_path, n_substeps=n_substeps, n_actions=n_actions,
+ initial_qpos=initial_qpos)
+
+ if action_space:
+ self.action_space = spaces.Box(low=np.array([-2.6, -2.0, -2.8, -0.9, -4.8, -1.6, -2.2]),
+ high=np.array([2.6, 2.0, 2.8, 3.1, 1.3, 1.6, 2.2]),
+ dtype='float64')
+ else:
+ self.action_space = spaces.Box(low=np.array([-np.inf] * 7),
+ high=np.array([np.inf] * 7),
+ dtype='float64')
+ self.scale = None
+ self.desired_pos = None
+ self.n_actions = n_actions
+ self.action = None
+ self.time_step = time_step
+ self._dt = time_step
+ self.paddle_center_pos = self.sim.data.get_site_xpos('wam/paddle_center')
+ if reward_obj is None:
+ self.reward_obj = HierarchicalRewardTableTennis()
+ else:
+ self.reward_obj = reward_obj
+
+ if initial_ball_state is not None:
+ self.initial_ball_state = initial_ball_state
+ else:
+ self.initial_ball_state = ball_initialize(random=False)
+ self.target_ball_pos = self.sim.data.get_site_xpos("target_ball")
+ self.racket_center_pos = self.sim.data.get_site_xpos("wam/paddle_center")
+
+ def close(self):
+ if self.viewer is not None:
+ glfw.destroy_window(self.viewer.window)
+ # self.viewer.window.close()
+ self.viewer = None
+ self._viewers = {}
+
+ # GoalEnv methods
+ # ----------------------------
+ def compute_reward(self, achieved_goal, goal, info):
+ # reset the reward, if action valid
+ # right_court_contact_obj = ["target_ball", "table_tennis_table_right_side"]
+ # right_court_contact_detector = self.reward_obj.contact_detection(self, right_court_contact_obj)
+ # if right_court_contact_detector:
+ # print("can detect the table ball contact")
+ self.reward_obj.total_reward = 0
+ # Stage 1 Hitting
+ self.reward_obj.hitting(self)
+ # if not hitted, return the highest reward
+ if not self.reward_obj.goal_achievement:
+ # return self.reward_obj.highest_reward
+ return self.reward_obj.total_reward
+ # # Stage 2 Right Table Contact
+ # self.reward_obj.right_table_contact(self)
+ # if not self.reward_obj.goal_achievement:
+ # return self.reward_obj.highest_reward
+ # # Stage 2 Net Contact
+ # self.reward_obj.net_contact(self)
+ # if not self.reward_obj.goal_achievement:
+ # return self.reward_obj.highest_reward
+ # Stage 3 Opponent court Contact
+ # self.reward_obj.landing_on_opponent_court(self)
+ # if not self.reward_obj.goal_achievement:
+ # print("self.reward_obj.highest_reward: ", self.reward_obj.highest_reward)
+ # TODO
+ self.reward_obj.target_achievement(self)
+ # return self.reward_obj.highest_reward
+ return self.reward_obj.total_reward
+
+ def _reset_sim(self):
+ self.sim.set_state(self.initial_state)
+ [initial_x, initial_y, initial_z, v_x, v_y, v_z] = self.initial_ball_state
+ self.sim.data.set_joint_qpos('tar:x', initial_x)
+ self.sim.data.set_joint_qpos('tar:y', initial_y)
+ self.sim.data.set_joint_qpos('tar:z', initial_z)
+ self.energy_corrected = True
+ self.give_reflection_reward = False
+
+ # velocity is positive direction
+ self.sim.data.set_joint_qvel('tar:x', v_x)
+ self.sim.data.set_joint_qvel('tar:y', v_y)
+ self.sim.data.set_joint_qvel('tar:z', v_z)
+
+ # Apply gravity compensation
+ if not np.array_equal(self.sim.data.qfrc_applied[:7], self.sim.data.qfrc_bias[:7]):
+ self.sim.data.qfrc_applied[:7] = self.sim.data.qfrc_bias[:7]
+ self.sim.forward()
+ return True
+
+ def _env_setup(self, initial_qpos):
+ for name, value in initial_qpos.items():
+ self.sim.data.set_joint_qpos(name, value)
+
+ # Apply gravity compensation
+ if not np.array_equal(self.sim.data.qfrc_applied[:7], self.sim.data.qfrc_bias[:7]):
+ self.sim.data.qfrc_applied[:7] = self.sim.data.qfrc_bias[:7]
+ self.sim.forward()
+
+ # Get the target position
+ self.initial_paddle_center_xpos = self.sim.data.get_site_xpos('wam/paddle_center').copy()
+ self.initial_paddle_center_vel = None # self.sim.get_site_
+
+ def _sample_goal(self):
+ goal = self.initial_paddle_center_xpos[:3] + self.np_random.uniform(-0.2, 0.2, size=3)
+ return goal.copy()
+
+ def _get_obs(self):
+
+ # positions of racket center
+ paddle_center_pos = self.sim.data.get_site_xpos('wam/paddle_center')
+ ball_pos = self.sim.data.get_site_xpos("target_ball")
+
+ dt = self.sim.nsubsteps * self.sim.model.opt.timestep
+ paddle_center_velp = self.sim.data.get_site_xvelp('wam/paddle_center') * dt
+ robot_qpos, robot_qvel = utils.robot_get_obs(self.sim)
+
+ wrist_state = robot_qpos[-3:]
+ wrist_vel = robot_qvel[-3:] * dt # change to a scalar if the gripper is made symmetric
+
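+ # 12-dim observation: paddle center position (3), scaled paddle center velocity (3),
+ # last three joint positions (3) and scaled joint velocities (3)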
+ # achieved_goal = paddle_body_EE_pos
+ obs = np.concatenate([
+ paddle_center_pos, paddle_center_velp, wrist_state, wrist_vel
+ ])
+
+ out_dict = {
+ 'observation': obs.copy(),
+ 'achieved_goal': paddle_center_pos.copy(),
+ 'desired_goal': self.goal.copy(),
+ 'q_pos': self.sim.data.qpos[:].copy(),
+ "ball_pos": ball_pos.copy(),
+ # "hitting_flag": self.reward_obj.hitting_flag
+ }
+
+ return out_dict
+
+ def _step_callback(self):
+ pass
+
+ def _set_action(self, action):
+ # Apply gravity compensation
+ if not np.array_equal(self.sim.data.qfrc_applied[:7], self.sim.data.qfrc_bias[:7]):
+ self.sim.data.qfrc_applied[:7] = self.sim.data.qfrc_bias[:7]
+ # print("set action process running")
+ assert action.shape == (self.n_actions,)
+ self.action = action.copy() # ensure that we don't change the action outside of this scope
+ pos_ctrl = self.action[:] # limit maximum change in position
+ pos_ctrl = np.clip(pos_ctrl, self.action_space.low, self.action_space.high)
+
+ # get desired trajectory
+ self.sim.data.qpos[:7] = pos_ctrl
+ self.sim.forward()
+ self.desired_pos = self.sim.data.get_site_xpos('wam/paddle_center').copy()
+
+ self.sim.data.ctrl[:] = pos_ctrl
+
+ def _is_success(self, achieved_goal, desired_goal):
+ pass
+
+
+if __name__ == '__main__':
+ render_mode = "human" # "human" or "partial" or "final"
+ env = TableTennisEnv()
+ env.reset()
+ # env.render(mode=render_mode)
+
+ for i in range(500):
+ # objective.load_result("/tmp/cma")
+ # test with random actions
+ ac = env.action_space.sample()
+ # ac[0] += np.pi/2
+ obs, rew, d, info = env.step(ac)
+ env.render(mode=render_mode)
+
+ print(rew)
+
+ if d:
+ break
+
+ env.close()
diff --git a/alr_envs/mujoco/gym_table_tennis/utils/__init__.py b/alr_envs/mujoco/gym_table_tennis/utils/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/alr_envs/mujoco/gym_table_tennis/utils/experiment.py b/alr_envs/mujoco/gym_table_tennis/utils/experiment.py
new file mode 100644
index 0000000..addd6c5
--- /dev/null
+++ b/alr_envs/mujoco/gym_table_tennis/utils/experiment.py
@@ -0,0 +1,83 @@
+import numpy as np
+from gym.utils import seeding
+from alr_envs.mujoco.gym_table_tennis.utils.util import read_yaml, read_json
+from pathlib import Path
+
+
+def ball_initialize(random=False, scale=False, context_range=None, scale_value=None):
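+ """Create an initial ball state [x, y, z, v_x, v_y, v_z], optionally randomized and/or scaled by a context-dependent factor."""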
+ if random:
+ if scale:
+ # if scale_value is None:
+ scale_value = context_scale_initialize(context_range)
+ v_x, v_y, v_z = np.array([2.5, 2.0, 0.5]) * scale_value
+ dx = 1
+ dy = 0
+ dz = 0.05
+ else:
+ seed = None
+ np_random, seed = seeding.np_random(seed)
+ dx = np_random.uniform(-0.1, 0.1)
+ dy = np_random.uniform(-0.1, 0.1)
+ dz = np_random.uniform(-0.1, 0.1)
+
+ v_x = np_random.uniform(1.7, 1.8)
+ v_y = np_random.uniform(0.7, 0.8)
+ v_z = np_random.uniform(0.1, 0.2)
+ # print(dx, dy, dz, v_x, v_y, v_z)
+ # else:
+ # dx = -0.1
+ # dy = 0.05
+ # dz = 0.05
+ # v_x = 1.5
+ # v_y = 0.7
+ # v_z = 0.06
+ # initial_x = -0.6 + dx
+ # initial_y = -0.3 + dy
+ # initial_z = 0.8 + dz
+ else:
+ if scale:
+ v_x, v_y, v_z = np.array([2.5, 2.0, 0.5]) * scale_value
+ else:
+ v_x = 2.5
+ v_y = 2
+ v_z = 0.5
+ dx = 1
+ dy = 0
+ dz = 0.05
+
+ initial_x = 0 + dx
+ initial_y = -0.2 + dy
+ initial_z = 0.3 + dz
+ # print("initial ball state: ", initial_x, initial_y, initial_z, v_x, v_y, v_z)
+ initial_ball_state = np.array([initial_x, initial_y, initial_z, v_x, v_y, v_z])
+ return initial_ball_state
+
+
+def context_scale_initialize(context_range):
+ """Sample a scalar scaling factor uniformly from the given context range.
+
+ Args:
+ context_range: (low, high) bounds for the scaling factor
+
+ Returns:
+ np.ndarray of shape (1,) containing the sampled scale
+ """
+ low, high = context_range
+ scale = np.random.uniform(low, high, 1)
+ return scale
+
+
+def config_handle_generation(config_file_path):
+ """Generate config handle for multiprocessing
+
+ Args:
+ config_file_path:
+
+ Returns:
+
+ """
+ cfg_fname = Path(config_file_path)
+ # .json and .yml file
+ if cfg_fname.suffix == ".json":
+ config = read_json(cfg_fname)
+ elif cfg_fname.suffix == ".yml":
+ config = read_yaml(cfg_fname)
+
+ return config
diff --git a/alr_envs/mujoco/gym_table_tennis/utils/rewards/__init__.py b/alr_envs/mujoco/gym_table_tennis/utils/rewards/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/alr_envs/mujoco/gym_table_tennis/utils/rewards/hierarchical_reward.py b/alr_envs/mujoco/gym_table_tennis/utils/rewards/hierarchical_reward.py
new file mode 100644
index 0000000..fe69104
--- /dev/null
+++ b/alr_envs/mujoco/gym_table_tennis/utils/rewards/hierarchical_reward.py
@@ -0,0 +1,402 @@
+import numpy as np
+import logging
+
+
+class HierarchicalRewardTableTennis(object):
+ """Class for hierarchical reward function for table tennis experiment.
+
+ Return Highest Reward.
+ Reward = 0
+ Step 1: Action Valid. Upper Bound 0
+ [-∞, 0]
+ Reward += -1 * |hit_duration - hit_duration_threshold| * int(hit_duration < hit_duration_threshold) * 10
+ Step 2: Hitting. Upper Bound 2
+ if hitting:
+ [0, 2]
+ Reward = 2 * (1 - tanh(|shortest_hitting_dist|))
+ if not hitting:
+ [0, 0.2]
+ Reward = 2 * (1 - tanh(|shortest_hitting_dist|))
+ Step 3: Target Point Achievement. Upper Bound 6
+ [0, 4]
+ if table_contact_detector:
+ Reward += 1
+ Reward += (1 - tanh(|shortest_hitting_dist|)) * 2
+ if contact_coordinate[0] < 0:
+ Reward += 1
+ else:
+ Reward += 0
+ else:
+ Reward += (1 - tanh(|shortest_hitting_dist|))
+ """
+
+ def __init__(self):
+ self.reward = None
+ self.goal_achievement = False
+ self.total_reward = 0
+ self.shortest_hitting_dist = 1000
+ self.highest_reward = -1000
+ self.lowest_corner_dist = 100
+ self.right_court_contact_detector = False
+ self.table_contact_detector = False
+ self.floor_contact_detector = False
+ self.radius = 0.025
+ self.min_ball_x_pos = 100
+ self.hit_contact_detector = False
+ self.net_contact_detector = False
+ self.ratio = 1
+ self.lowest_z = 100
+ self.target_flag = False
+ self.dist_target_virtual = 100
+ self.ball_z_pos_lowest = 100
+ self.hitting_flag = False
+ self.hitting_time_point = None
+ self.ctxt_dim = None
+ self.context_range_bounds = None
+ # self.ctxt_out_of_range_punishment = None
+ # self.ctxt_in_side_of_range_punishment = None
+ #
+ # def check_where_invalid(self, ctxt, context_range_bounds, set_to_valid_region=False):
+ # idx_max = []
+ # idx_min = []
+ # for dim in range(self.ctxt_dim):
+ # min_dim = context_range_bounds[0][dim]
+ # max_dim = context_range_bounds[1][dim]
+ # idx_max_c = np.where(ctxt[:, dim] > max_dim)[0]
+ # idx_min_c = np.where(ctxt[:, dim] < min_dim)[0]
+ # if set_to_valid_region:
+ # if idx_max_c.shape[0] != 0:
+ # ctxt[idx_max_c, dim] = max_dim
+ # if idx_min_c.shape[0] != 0:
+ # ctxt[idx_min_c, dim] = min_dim
+ # idx_max.append(idx_max_c)
+ # idx_min.append(idx_min_c)
+ # return idx_max, idx_min, ctxt
+
+ def check_valid(self, scale, context_range_bounds):
+
+ min_dim = context_range_bounds[0][0]
+ max_dim = context_range_bounds[1][0]
+ valid = (scale < max_dim) and (scale > min_dim)
+ return valid
+
+ @classmethod
+ def goal_distance(cls, goal_a, goal_b):
+ assert goal_a.shape == goal_b.shape
+ return np.linalg.norm(goal_a - goal_b, axis=-1)
+
+ def refresh_highest_reward(self):
+ if self.total_reward >= self.highest_reward:
+ self.highest_reward = self.total_reward
+
+ def duration_valid(self):
+ pass
+
+ def huge_value_unstable(self):
+ self.total_reward += -10
+ self.highest_reward = -1
+
+ def context_valid(self, context):
+ valid = self.check_valid(context.copy(), context_range_bounds=self.context_range_bounds)
+ # when using dirac punishments
+ if valid:
+ self.total_reward += 1 # If Action Valid and Context Valid, total_reward = 0
+ else:
+ self.total_reward += 0
+ self.refresh_highest_reward()
+
+
+
+ # If in the ctxt, add 1, otherwise, 0
+
+ def action_valid(self, durations=None):
+        """Check that the robot movement is executed with parameters from a valid domain.
+
+        The hitting duration should always be positive and the joint positions of the
+        robot should lie within [-pi, pi]. If all parameters are valid, the combined
+        score of this step is zero; otherwise a negative score proportional to the
+        violation of the valid parameter domain is added to `total_reward`.
+        """
+        assert durations.shape[0] == 2, "durations should be a np.ndarray of shape (2,)"
+ # pre_duration = durations[0]
+ hit_duration = durations[1]
+ # pre_duration_thres = 0.01
+ hit_duration_thres = 1
+ # self.goal_achievement = np.all(
+ # [(pre_duration > pre_duration_thres), (hit_duration > hit_duration_thres), (0.3 < pre_duration < 0.6)])
+ self.goal_achievement = (hit_duration > hit_duration_thres)
+ if self.goal_achievement:
+ self.total_reward = -1
+ self.goal_achievement = True
+ else:
+ # self.total_reward += -1 * ((np.abs(pre_duration - pre_duration_thres) * int(
+ # pre_duration < pre_duration_thres) + np.abs(hit_duration - hit_duration_thres) * int(
+ # hit_duration < hit_duration_thres)) * 10)
+ self.total_reward = -1 * ((np.abs(hit_duration - hit_duration_thres) * int(
+ hit_duration < hit_duration_thres)) * 10)
+ self.total_reward += -1
+ self.goal_achievement = False
+ self.refresh_highest_reward()
+
+ def motion_penalty(self, action, high_motion_penalty):
+ """Protects the robot from high acceleration and dangerous movement.
+ """
+ if not high_motion_penalty:
+ reward_ctrl = - 0.05 * np.square(action).sum()
+ else:
+ reward_ctrl = - 0.075 * np.square(action).sum()
+ self.total_reward += reward_ctrl
+ self.refresh_highest_reward()
+ self.goal_achievement = True
+
+ def hitting(self, env): # , target_ball_pos, racket_center_pos, hit_contact_detector=False
+        """Hitting reward calculation.
+
+        If the racket has hit the ball, the reward is 2 * (1 - tanh(|shortest_hitting_dist|)),
+        i.e. close to 2 when the shortest observed racket-ball distance is small.
+        If the ball has not been hit yet, the same term is scaled by 0.2, so the reward
+        stays within [0, 0.2].
+
+        Args:
+            env: environment providing `target_ball_pos` and `racket_center_pos`.
+        """
+
+ hit_contact_obj = ["target_ball", "bat"]
+ target_ball_pos = env.target_ball_pos
+ racket_center_pos = env.racket_center_pos
+ # hit contact detection
+ # Record the hitting history
+ self.hitting_flag = False
+ if not self.hit_contact_detector:
+ self.hit_contact_detector = self.contact_detection(env, hit_contact_obj)
+ if self.hit_contact_detector:
+ print("First time detect hitting")
+ self.hitting_flag = True
+ if self.hit_contact_detector:
+
+ # TODO
+ dist = self.goal_distance(target_ball_pos, racket_center_pos)
+ if dist < 0:
+ dist = 0
+ # print("goal distance is:", dist)
+ if dist <= self.shortest_hitting_dist:
+ self.shortest_hitting_dist = dist
+ # print("shortest_hitting_dist is:", self.shortest_hitting_dist)
+ # Keep the shortest hitting distance.
+ dist_reward = 2 * (1 - np.tanh(np.abs(self.shortest_hitting_dist)))
+
+ # TODO sparse
+ # dist_reward = 2
+
+ self.total_reward += dist_reward
+ self.goal_achievement = True
+
+ # if self.hitting_time_point is not None and self.hitting_time_point > 600:
+ # self.total_reward += 1
+
+ else:
+ dist = self.goal_distance(target_ball_pos, racket_center_pos)
+ if dist <= self.shortest_hitting_dist:
+ self.shortest_hitting_dist = dist
+ dist_reward = 1 - np.tanh(self.shortest_hitting_dist)
+            reward = 0.2 * dist_reward  # the ball has not been hit yet, so scale the shaping term by 0.2
+ self.total_reward += reward
+ self.goal_achievement = False
+
+ self.refresh_highest_reward()
+
+ @classmethod
+ def relu(cls, x):
+ return np.maximum(0, x)
+
+ # def right_table_contact(self, env):
+ # right_court_contact_obj = ["target_ball", "table_tennis_table_right_side"]
+ # if env.target_ball_pos[0] >= 0 and env.target_ball_pos[2] >= 0.7:
+ # # update right court contact detection
+ # if not self.right_court_contact_detector:
+ # self.right_court_contact_detector = self.contact_detection(env, right_court_contact_obj)
+ # if self.right_court_contact_detector:
+ # self.contact_x_pos = env.target_ball_pos[0]
+ # if self.right_court_contact_detector:
+ # self.total_reward += 1 - norm(0.685, 1).pdf(self.contact_x_pos) # x axis middle of right table
+ # self.goal_achievement = False
+ # else:
+ # self.total_reward += 1
+ # self.goal_achievement = True
+ # # else:
+ # # self.total_reward += 0
+ # # self.goal_achievement = False
+ # self.refresh_highest_reward()
+
+ # def net_contact(self, env):
+ # net_contact_obj = ["target_ball", "table_tennis_net"]
+ # # net_contact_detector = self.contact_detection(env, net_contact_obj)
+ # # ball_x_pos = env.target_ball_pos[0]
+ # # if self.min_ball_x_pos >= ball_x_pos:
+ # # self.min_ball_x_pos = ball_x_pos
+ # # table_left_edge_x_pos = -1.37
+ # # if np.abs(ball_x_pos) <= 0.01: # x threshold of net
+ # # if self.lowest_z >= env.target_ball_pos[2]:
+ # # self.lowest_z = env.target_ball_pos[2]
+ # # # construct a gaussian distribution of z
+ # # z_reward = 4 - norm(0, 0.1).pdf(self.lowest_z - 0.07625) # maximum 4
+ # # self.total_reward += z_reward
+ # # self.total_reward += 2 - np.minimum(1, self.relu(np.abs(self.min_ball_x_pos)))
+ # if not self.net_contact_detector:
+ # self.net_contact_detector = self.contact_detection(env, net_contact_obj)
+ # if self.net_contact_detector:
+ # self.total_reward += 0 # very high cost
+ # self.goal_achievement = False
+ # else:
+ # self.total_reward += 1
+ # self.goal_achievement = True
+ # self.refresh_highest_reward()
+
+ # def landing_on_opponent_court(self, env):
+ # # Very sparse reward
+ # # don't contact the right side court
+ # # right_court_contact_obj = ["target_ball", "table_tennis_table_right_side"]
+ # # right_court_contact_detector = self.contact_detection(env, right_court_contact_obj)
+ # left_court_contact_obj = ["target_ball", "table_tennis_table_left_side"]
+ # # left_court_contact_detector = self.contact_detection(env, left_court_contact_obj)
+ # # record the contact history
+ # # if not self.right_court_contact_detector:
+ # # self.right_court_contact_detector = self.contact_detection(env, right_court_contact_obj)
+ # if not self.table_contact_detector:
+ # self.table_contact_detector = self.contact_detection(env, left_court_contact_obj)
+ #
+ # dist_left_up_corner = self.goal_distance(env.target_ball_pos, env.sim.data.get_site_xpos("left_up_corner"))
+ # dist_middle_up_corner = self.goal_distance(env.target_ball_pos, env.sim.data.get_site_xpos("middle_up_corner"))
+ # dist_left_down_corner = self.goal_distance(env.target_ball_pos, env.sim.data.get_site_xpos("left_down_corner"))
+ # dist_middle_down_corner = self.goal_distance(env.target_ball_pos,
+ # env.sim.data.get_site_xpos("middle_down_corner"))
+ # dist_array = np.array(
+ # [dist_left_up_corner, dist_middle_up_corner, dist_left_down_corner, dist_middle_down_corner])
+ # dist_corner = np.amin(dist_array)
+ # if self.lowest_corner_dist >= dist_corner:
+ # self.lowest_corner_dist = dist_corner
+ #
+ # right_contact_cost = 1
+ # left_contact_reward = 2
+ # dist_left_table_reward = (2 - np.tanh(self.lowest_corner_dist))
+ # # TODO Try multi dimensional gaussian distribution
+ # # contact only the left side court
+ # if self.right_court_contact_detector:
+ # self.total_reward += 0
+ # self.goal_achievement = False
+ # if self.table_contact_detector:
+ # self.total_reward += left_contact_reward
+ # self.goal_achievement = False
+ # else:
+ # self.total_reward += dist_left_table_reward
+ # self.goal_achievement = False
+ # else:
+ # self.total_reward += right_contact_cost
+ # if self.table_contact_detector:
+ # self.total_reward += left_contact_reward
+ # self.goal_achievement = True
+ # else:
+ # self.total_reward += dist_left_table_reward
+ # self.goal_achievement = False
+ # self.refresh_highest_reward()
+ # # if self.left_court_contact_detector and not self.right_court_contact_detector:
+ # # self.total_reward += self.ratio * left_contact_reward
+ # # print("only left court reward return!!!!!!!!!")
+ # # print("contact only left court!!!!!!")
+ # # self.goal_achievement = True
+ # # # no contact with table
+ # # elif not self.right_court_contact_detector and not self.left_court_contact_detector:
+ # # self.total_reward += 0 + self.ratio * dist_left_table_reward
+ # # self.goal_achievement = False
+ # # # contact both side
+ # # elif self.right_court_contact_detector and self.left_court_contact_detector:
+ # # self.total_reward += self.ratio * (left_contact_reward - right_contact_cost) # cost of contact of right court
+ # # self.goal_achievement = False
+ # # # contact only the right side court
+ # # elif self.right_court_contact_detector and not self.left_court_contact_detector:
+ # # self.total_reward += 0 + self.ratio * (
+ # # dist_left_table_reward - right_contact_cost) # cost of contact of right court
+ # # self.goal_achievement = False
+
+ def target_achievement(self, env):
+ target_coordinate = np.array([-0.5, -0.5])
+ # net_contact_obj = ["target_ball", "table_tennis_net"]
+ table_contact_obj = ["target_ball", "table_tennis_table"]
+ floor_contact_obj = ["target_ball", "floor"]
+
+ if 0.78 < env.target_ball_pos[2] < 0.8:
+ dist_target_virtual = np.linalg.norm(env.target_ball_pos[:2] - target_coordinate)
+ if self.dist_target_virtual > dist_target_virtual:
+ self.dist_target_virtual = dist_target_virtual
+ if -0.07 < env.target_ball_pos[0] < 0.07 and env.sim.data.get_joint_qvel('tar:x') < 0:
+ if self.ball_z_pos_lowest > env.target_ball_pos[2]:
+ self.ball_z_pos_lowest = env.target_ball_pos[2].copy()
+ # if not self.net_contact_detector:
+ # self.net_contact_detector = self.contact_detection(env, net_contact_obj)
+ if not self.table_contact_detector:
+ self.table_contact_detector = self.contact_detection(env, table_contact_obj)
+ if not self.floor_contact_detector:
+ self.floor_contact_detector = self.contact_detection(env, floor_contact_obj)
+ if not self.target_flag:
+ # Table Contact Reward.
+ if self.table_contact_detector:
+ self.total_reward += 1
+ # only update when the first contact because of the flag
+ contact_coordinate = env.target_ball_pos[:2].copy()
+ print("contact table ball coordinate: ", env.target_ball_pos)
+ logging.info("contact table ball coordinate: {}".format(env.target_ball_pos))
+ dist_target = np.linalg.norm(contact_coordinate - target_coordinate)
+ self.total_reward += (1 - np.tanh(dist_target)) * 2
+ self.target_flag = True
+                # Left-court landing bonus. Precondition: table contact exists.
+ if contact_coordinate[0] < 0:
+ print("left table contact")
+ logging.info("~~~~~~~~~~~~~~~left table contact~~~~~~~~~~~~~~~")
+ self.total_reward += 1
+ # TODO Z coordinate reward
+ # self.total_reward += np.maximum(np.tanh(self.ball_z_pos_lowest), 0)
+ self.goal_achievement = True
+ else:
+ print("right table contact")
+ logging.info("~~~~~~~~~~~~~~~right table contact~~~~~~~~~~~~~~~")
+ self.total_reward += 0
+ self.goal_achievement = False
+ # if self.net_contact_detector:
+ # self.total_reward += 0
+ # self.goal_achievement = False
+ # else:
+ # self.total_reward += 1
+ # self.goal_achievement = False
+            # Floor contact reward. Applies when the ball reaches the floor without prior table contact.
+ elif self.floor_contact_detector:
+ self.total_reward += (1 - np.tanh(self.dist_target_virtual))
+ self.target_flag = True
+ self.goal_achievement = False
+ # No Contact of Floor or Table, flying
+ else:
+ pass
+ # else:
+ # print("Flag is True already")
+ self.refresh_highest_reward()
+
+ def distance_to_target(self):
+ pass
+
+ @classmethod
+ def contact_detection(cls, env, goal_contact):
+ for i in range(env.sim.data.ncon):
+ contact = env.sim.data.contact[i]
+ achieved_geom1_name = env.sim.model.geom_id2name(contact.geom1)
+ achieved_geom2_name = env.sim.model.geom_id2name(contact.geom2)
+            if np.all([(achieved_geom1_name in goal_contact), (achieved_geom2_name in goal_contact)]):
+                print("contact of " + achieved_geom1_name + " " + achieved_geom2_name)
+                return True
+        # only report no contact after all active contacts have been checked
+        return False
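+
+
+# Illustrative use of the hierarchical reward inside an environment's step loop.
+# `env`, `action`, `durations`, `context`, `context_range_bounds` and `episode_length`
+# are assumptions of this sketch, not part of this file:
+# reward_fn = HierarchicalRewardTableTennis()
+# reward_fn.context_range_bounds = context_range_bounds
+# reward_fn.action_valid(durations)       # step 1: validity of the movement durations
+# reward_fn.context_valid(context)        # context bonus (+1 if inside the valid range)
+# for _ in range(episode_length):
+#     env.do_simulation(action, env.frame_skip)
+#     reward_fn.hitting(env)              # step 2: racket-ball distance / hit bonus
+#     reward_fn.target_achievement(env)   # step 3: landing close to the target
+# episode_reward = reward_fn.highest_reward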
diff --git a/alr_envs/mujoco/gym_table_tennis/utils/rewards/rewards.py b/alr_envs/mujoco/gym_table_tennis/utils/rewards/rewards.py
new file mode 100644
index 0000000..6e6aa32
--- /dev/null
+++ b/alr_envs/mujoco/gym_table_tennis/utils/rewards/rewards.py
@@ -0,0 +1,136 @@
+# Copyright 2017 The dm_control Authors.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+# ============================================================================
+
+# """Soft indicator function evaluating whether a number is within bounds."""
+#
+# from __future__ import absolute_import
+# from __future__ import division
+# from __future__ import print_function
+
+# Internal dependencies.
+import numpy as np
+
+# The value returned by tolerance() at `margin` distance from `bounds` interval.
+_DEFAULT_VALUE_AT_MARGIN = 0.1
+
+
+def _sigmoids(x, value_at_1, sigmoid):
+ """Returns 1 when `x` == 0, between 0 and 1 otherwise.
+
+ Args:
+ x: A scalar or numpy array.
+ value_at_1: A float between 0 and 1 specifying the output when `x` == 1.
+ sigmoid: String, choice of sigmoid type.
+
+ Returns:
+ A numpy array with values between 0.0 and 1.0.
+
+ Raises:
+ ValueError: If not 0 < `value_at_1` < 1, except for `linear`, `cosine` and
+ `quadratic` sigmoids which allow `value_at_1` == 0.
+ ValueError: If `sigmoid` is of an unknown type.
+ """
+ if sigmoid in ('cosine', 'linear', 'quadratic'):
+ if not 0 <= value_at_1 < 1:
+ raise ValueError('`value_at_1` must be nonnegative and smaller than 1, '
+ 'got {}.'.format(value_at_1))
+ else:
+ if not 0 < value_at_1 < 1:
+ raise ValueError('`value_at_1` must be strictly between 0 and 1, '
+ 'got {}.'.format(value_at_1))
+
+ if sigmoid == 'gaussian':
+ scale = np.sqrt(-2 * np.log(value_at_1))
+ return np.exp(-0.5 * (x*scale)**2)
+
+ elif sigmoid == 'hyperbolic':
+ scale = np.arccosh(1/value_at_1)
+ return 1 / np.cosh(x*scale)
+
+ elif sigmoid == 'long_tail':
+ scale = np.sqrt(1/value_at_1 - 1)
+ return 1 / ((x*scale)**2 + 1)
+
+ elif sigmoid == 'cosine':
+ scale = np.arccos(2*value_at_1 - 1) / np.pi
+ scaled_x = x*scale
+ return np.where(abs(scaled_x) < 1, (1 + np.cos(np.pi*scaled_x))/2, 0.0)
+
+ elif sigmoid == 'linear':
+ scale = 1-value_at_1
+ scaled_x = x*scale
+ return np.where(abs(scaled_x) < 1, 1 - scaled_x, 0.0)
+
+ elif sigmoid == 'quadratic':
+ scale = np.sqrt(1-value_at_1)
+ scaled_x = x*scale
+ return np.where(abs(scaled_x) < 1, 1 - scaled_x**2, 0.0)
+
+ elif sigmoid == 'tanh_squared':
+ scale = np.arctanh(np.sqrt(1-value_at_1))
+ return 1 - np.tanh(x*scale)**2
+
+ else:
+ raise ValueError('Unknown sigmoid type {!r}.'.format(sigmoid))
+
+
+def tolerance(x, bounds=(0.0, 0.0), margin=0.0, sigmoid='gaussian',
+ value_at_margin=_DEFAULT_VALUE_AT_MARGIN):
+ """Returns 1 when `x` falls inside the bounds, between 0 and 1 otherwise.
+
+ Args:
+ x: A scalar or numpy array.
+ bounds: A tuple of floats specifying inclusive `(lower, upper)` bounds for
+ the target interval. These can be infinite if the interval is unbounded
+ at one or both ends, or they can be equal to one another if the target
+ value is exact.
+ margin: Float. Parameter that controls how steeply the output decreases as
+ `x` moves out-of-bounds.
+ * If `margin == 0` then the output will be 0 for all values of `x`
+ outside of `bounds`.
+ * If `margin > 0` then the output will decrease sigmoidally with
+ increasing distance from the nearest bound.
+ sigmoid: String, choice of sigmoid type. Valid values are: 'gaussian',
+ 'linear', 'hyperbolic', 'long_tail', 'cosine', 'tanh_squared'.
+ value_at_margin: A float between 0 and 1 specifying the output value when
+ the distance from `x` to the nearest bound is equal to `margin`. Ignored
+ if `margin == 0`.
+
+ Returns:
+ A float or numpy array with values between 0.0 and 1.0.
+
+ Raises:
+ ValueError: If `bounds[0] > bounds[1]`.
+ ValueError: If `margin` is negative.
+ """
+ lower, upper = bounds
+ if lower > upper:
+ raise ValueError('Lower bound must be <= upper bound.')
+ if margin < 0:
+ raise ValueError('`margin` must be non-negative.')
+
+ in_bounds = np.logical_and(lower <= x, x <= upper)
+ if margin == 0:
+ value = np.where(in_bounds, 1.0, 0.0)
+ else:
+ d = np.where(x < lower, lower - x, x - upper) / margin
+ value = np.where(in_bounds, 1.0, _sigmoids(d, value_at_margin, sigmoid))
+
+ return float(value) if np.isscalar(x) else value
+
+
+
+
+
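+if __name__ == "__main__":
+    # Minimal sanity-check sketch, not part of the original dm_control file. Inside the
+    # bounds the value is exactly 1, at a distance of `margin` it equals `value_at_margin`
+    # (0.1 by default), and in between it follows the chosen sigmoid.
+    print(tolerance(0.0, bounds=(0.0, 0.0), margin=1.0))  # 1.0
+    print(tolerance(1.0, bounds=(0.0, 0.0), margin=1.0))  # ~0.1
+    print(tolerance(0.5, bounds=(0.0, 0.0), margin=1.0))  # ~0.56 (gaussian sigmoid)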
diff --git a/alr_envs/mujoco/gym_table_tennis/utils/util.py b/alr_envs/mujoco/gym_table_tennis/utils/util.py
new file mode 100644
index 0000000..716b3c6
--- /dev/null
+++ b/alr_envs/mujoco/gym_table_tennis/utils/util.py
@@ -0,0 +1,49 @@
+import json
+import yaml
+import xml.etree.ElementTree as ET
+from collections import OrderedDict
+from pathlib import Path
+
+
+def read_json(fname):
+ fname = Path(fname)
+ with fname.open('rt') as handle:
+ return json.load(handle, object_hook=OrderedDict)
+
+
+def write_json(content, fname):
+ fname = Path(fname)
+ with fname.open('wt') as handle:
+ json.dump(content, handle, indent=4, sort_keys=False)
+
+
+def read_yaml(fname):
+ fname = Path(fname)
+ with fname.open('rt') as handle:
+ return yaml.load(handle, Loader=yaml.FullLoader)
+
+
+def write_yaml(content, fname):
+ fname = Path(fname)
+ with fname.open('wt') as handle:
+ yaml.dump(content, handle)
+
+
+def config_save(dir_path, config):
+ dir_path = Path(dir_path)
+ config_path_json = dir_path / "config.json"
+ config_path_yaml = dir_path / "config.yml"
+    # save the configuration both as .json and as .yml
+ write_json(config, config_path_json)
+ write_yaml(config, config_path_yaml)
+
+
+def change_kp_in_xml(kp_list,
+ model_path="/home/zhou/slow/table_tennis_rl/simulation/gymTableTennis/gym_table_tennis/envs/robotics/assets/table_tennis/right_arm_actuator.xml"):
+ tree = ET.parse(model_path)
+ root = tree.getroot()
+ # for actuator in root.find("actuator"):
+ for position, kp in zip(root.iter('position'), kp_list):
+ position.set("kp", str(kp))
+ tree.write(model_path)
+
diff --git a/alr_envs/mujoco/meshes/wam/base_link_convex.stl b/alr_envs/mujoco/meshes/wam/base_link_convex.stl
new file mode 100755
index 0000000..133b112
Binary files /dev/null and b/alr_envs/mujoco/meshes/wam/base_link_convex.stl differ
diff --git a/alr_envs/mujoco/meshes/wam/base_link_fine.stl b/alr_envs/mujoco/meshes/wam/base_link_fine.stl
new file mode 100755
index 0000000..047e9df
Binary files /dev/null and b/alr_envs/mujoco/meshes/wam/base_link_fine.stl differ
diff --git a/alr_envs/mujoco/meshes/wam/cup.stl b/alr_envs/mujoco/meshes/wam/cup.stl
new file mode 100644
index 0000000..bc34058
Binary files /dev/null and b/alr_envs/mujoco/meshes/wam/cup.stl differ
diff --git a/alr_envs/mujoco/meshes/wam/cup_split1.stl b/alr_envs/mujoco/meshes/wam/cup_split1.stl
new file mode 100644
index 0000000..c80aa61
Binary files /dev/null and b/alr_envs/mujoco/meshes/wam/cup_split1.stl differ
diff --git a/alr_envs/mujoco/meshes/wam/cup_split10.stl b/alr_envs/mujoco/meshes/wam/cup_split10.stl
new file mode 100644
index 0000000..bd5708b
Binary files /dev/null and b/alr_envs/mujoco/meshes/wam/cup_split10.stl differ
diff --git a/alr_envs/mujoco/meshes/wam/cup_split11.stl b/alr_envs/mujoco/meshes/wam/cup_split11.stl
new file mode 100644
index 0000000..ac81da2
Binary files /dev/null and b/alr_envs/mujoco/meshes/wam/cup_split11.stl differ
diff --git a/alr_envs/mujoco/meshes/wam/cup_split12.stl b/alr_envs/mujoco/meshes/wam/cup_split12.stl
new file mode 100644
index 0000000..a18e96e
Binary files /dev/null and b/alr_envs/mujoco/meshes/wam/cup_split12.stl differ
diff --git a/alr_envs/mujoco/meshes/wam/cup_split13.stl b/alr_envs/mujoco/meshes/wam/cup_split13.stl
new file mode 100644
index 0000000..f0e5832
Binary files /dev/null and b/alr_envs/mujoco/meshes/wam/cup_split13.stl differ
diff --git a/alr_envs/mujoco/meshes/wam/cup_split14.stl b/alr_envs/mujoco/meshes/wam/cup_split14.stl
new file mode 100644
index 0000000..41a3e94
Binary files /dev/null and b/alr_envs/mujoco/meshes/wam/cup_split14.stl differ
diff --git a/alr_envs/mujoco/meshes/wam/cup_split15.stl b/alr_envs/mujoco/meshes/wam/cup_split15.stl
new file mode 100644
index 0000000..7a26643
Binary files /dev/null and b/alr_envs/mujoco/meshes/wam/cup_split15.stl differ
diff --git a/alr_envs/mujoco/meshes/wam/cup_split16.stl b/alr_envs/mujoco/meshes/wam/cup_split16.stl
new file mode 100644
index 0000000..155b24e
Binary files /dev/null and b/alr_envs/mujoco/meshes/wam/cup_split16.stl differ
diff --git a/alr_envs/mujoco/meshes/wam/cup_split17.stl b/alr_envs/mujoco/meshes/wam/cup_split17.stl
new file mode 100644
index 0000000..2fe8d95
Binary files /dev/null and b/alr_envs/mujoco/meshes/wam/cup_split17.stl differ
diff --git a/alr_envs/mujoco/meshes/wam/cup_split18.stl b/alr_envs/mujoco/meshes/wam/cup_split18.stl
new file mode 100644
index 0000000..f5287b2
Binary files /dev/null and b/alr_envs/mujoco/meshes/wam/cup_split18.stl differ
diff --git a/alr_envs/mujoco/meshes/wam/cup_split2.stl b/alr_envs/mujoco/meshes/wam/cup_split2.stl
new file mode 100644
index 0000000..5c1e50c
Binary files /dev/null and b/alr_envs/mujoco/meshes/wam/cup_split2.stl differ
diff --git a/alr_envs/mujoco/meshes/wam/cup_split3.stl b/alr_envs/mujoco/meshes/wam/cup_split3.stl
new file mode 100644
index 0000000..ef6d547
Binary files /dev/null and b/alr_envs/mujoco/meshes/wam/cup_split3.stl differ
diff --git a/alr_envs/mujoco/meshes/wam/cup_split4.stl b/alr_envs/mujoco/meshes/wam/cup_split4.stl
new file mode 100644
index 0000000..5476296
Binary files /dev/null and b/alr_envs/mujoco/meshes/wam/cup_split4.stl differ
diff --git a/alr_envs/mujoco/meshes/wam/cup_split5.stl b/alr_envs/mujoco/meshes/wam/cup_split5.stl
new file mode 100644
index 0000000..ccfcd42
Binary files /dev/null and b/alr_envs/mujoco/meshes/wam/cup_split5.stl differ
diff --git a/alr_envs/mujoco/meshes/wam/cup_split6.stl b/alr_envs/mujoco/meshes/wam/cup_split6.stl
new file mode 100644
index 0000000..72d6287
Binary files /dev/null and b/alr_envs/mujoco/meshes/wam/cup_split6.stl differ
diff --git a/alr_envs/mujoco/meshes/wam/cup_split7.stl b/alr_envs/mujoco/meshes/wam/cup_split7.stl
new file mode 100644
index 0000000..d4918f2
Binary files /dev/null and b/alr_envs/mujoco/meshes/wam/cup_split7.stl differ
diff --git a/alr_envs/mujoco/meshes/wam/cup_split8.stl b/alr_envs/mujoco/meshes/wam/cup_split8.stl
new file mode 100644
index 0000000..8a0cd84
Binary files /dev/null and b/alr_envs/mujoco/meshes/wam/cup_split8.stl differ
diff --git a/alr_envs/mujoco/meshes/wam/cup_split9.stl b/alr_envs/mujoco/meshes/wam/cup_split9.stl
new file mode 100644
index 0000000..4281a69
Binary files /dev/null and b/alr_envs/mujoco/meshes/wam/cup_split9.stl differ
diff --git a/alr_envs/mujoco/meshes/wam/elbow_link_convex.stl b/alr_envs/mujoco/meshes/wam/elbow_link_convex.stl
new file mode 100755
index 0000000..b34963d
Binary files /dev/null and b/alr_envs/mujoco/meshes/wam/elbow_link_convex.stl differ
diff --git a/alr_envs/mujoco/meshes/wam/elbow_link_fine.stl b/alr_envs/mujoco/meshes/wam/elbow_link_fine.stl
new file mode 100755
index 0000000..f6a1515
Binary files /dev/null and b/alr_envs/mujoco/meshes/wam/elbow_link_fine.stl differ
diff --git a/alr_envs/mujoco/meshes/wam/forearm_link_convex_decomposition_p1.stl b/alr_envs/mujoco/meshes/wam/forearm_link_convex_decomposition_p1.stl
new file mode 100755
index 0000000..e6aa6b6
Binary files /dev/null and b/alr_envs/mujoco/meshes/wam/forearm_link_convex_decomposition_p1.stl differ
diff --git a/alr_envs/mujoco/meshes/wam/forearm_link_convex_decomposition_p2.stl b/alr_envs/mujoco/meshes/wam/forearm_link_convex_decomposition_p2.stl
new file mode 100755
index 0000000..667902e
Binary files /dev/null and b/alr_envs/mujoco/meshes/wam/forearm_link_convex_decomposition_p2.stl differ
diff --git a/alr_envs/mujoco/meshes/wam/forearm_link_fine.stl b/alr_envs/mujoco/meshes/wam/forearm_link_fine.stl
new file mode 100755
index 0000000..ed66bbb
Binary files /dev/null and b/alr_envs/mujoco/meshes/wam/forearm_link_fine.stl differ
diff --git a/alr_envs/mujoco/meshes/wam/shoulder_link_convex_decomposition_p1.stl b/alr_envs/mujoco/meshes/wam/shoulder_link_convex_decomposition_p1.stl
new file mode 100755
index 0000000..aba957d
Binary files /dev/null and b/alr_envs/mujoco/meshes/wam/shoulder_link_convex_decomposition_p1.stl differ
diff --git a/alr_envs/mujoco/meshes/wam/shoulder_link_convex_decomposition_p2.stl b/alr_envs/mujoco/meshes/wam/shoulder_link_convex_decomposition_p2.stl
new file mode 100755
index 0000000..5cca6a9
Binary files /dev/null and b/alr_envs/mujoco/meshes/wam/shoulder_link_convex_decomposition_p2.stl differ
diff --git a/alr_envs/mujoco/meshes/wam/shoulder_link_convex_decomposition_p3.stl b/alr_envs/mujoco/meshes/wam/shoulder_link_convex_decomposition_p3.stl
new file mode 100755
index 0000000..3343e27
Binary files /dev/null and b/alr_envs/mujoco/meshes/wam/shoulder_link_convex_decomposition_p3.stl differ
diff --git a/alr_envs/mujoco/meshes/wam/shoulder_link_fine.stl b/alr_envs/mujoco/meshes/wam/shoulder_link_fine.stl
new file mode 100755
index 0000000..ae505fd
Binary files /dev/null and b/alr_envs/mujoco/meshes/wam/shoulder_link_fine.stl differ
diff --git a/alr_envs/mujoco/meshes/wam/shoulder_pitch_link_convex.stl b/alr_envs/mujoco/meshes/wam/shoulder_pitch_link_convex.stl
new file mode 100755
index 0000000..c36cfec
Binary files /dev/null and b/alr_envs/mujoco/meshes/wam/shoulder_pitch_link_convex.stl differ
diff --git a/alr_envs/mujoco/meshes/wam/shoulder_pitch_link_fine.stl b/alr_envs/mujoco/meshes/wam/shoulder_pitch_link_fine.stl
new file mode 100755
index 0000000..dc633c4
Binary files /dev/null and b/alr_envs/mujoco/meshes/wam/shoulder_pitch_link_fine.stl differ
diff --git a/alr_envs/mujoco/meshes/wam/upper_arm_link_convex_decomposition_p1.stl b/alr_envs/mujoco/meshes/wam/upper_arm_link_convex_decomposition_p1.stl
new file mode 100755
index 0000000..82d0093
Binary files /dev/null and b/alr_envs/mujoco/meshes/wam/upper_arm_link_convex_decomposition_p1.stl differ
diff --git a/alr_envs/mujoco/meshes/wam/upper_arm_link_convex_decomposition_p2.stl b/alr_envs/mujoco/meshes/wam/upper_arm_link_convex_decomposition_p2.stl
new file mode 100755
index 0000000..7fd5a55
Binary files /dev/null and b/alr_envs/mujoco/meshes/wam/upper_arm_link_convex_decomposition_p2.stl differ
diff --git a/alr_envs/mujoco/meshes/wam/upper_arm_link_fine.stl b/alr_envs/mujoco/meshes/wam/upper_arm_link_fine.stl
new file mode 100755
index 0000000..76353ae
Binary files /dev/null and b/alr_envs/mujoco/meshes/wam/upper_arm_link_fine.stl differ
diff --git a/alr_envs/mujoco/meshes/wam/wrist_palm_link_convex.stl b/alr_envs/mujoco/meshes/wam/wrist_palm_link_convex.stl
new file mode 100755
index 0000000..a0386f6
Binary files /dev/null and b/alr_envs/mujoco/meshes/wam/wrist_palm_link_convex.stl differ
diff --git a/alr_envs/mujoco/meshes/wam/wrist_palm_link_fine.stl b/alr_envs/mujoco/meshes/wam/wrist_palm_link_fine.stl
new file mode 100755
index 0000000..f6b41ad
Binary files /dev/null and b/alr_envs/mujoco/meshes/wam/wrist_palm_link_fine.stl differ
diff --git a/alr_envs/mujoco/meshes/wam/wrist_pitch_link_convex_decomposition_p1.stl b/alr_envs/mujoco/meshes/wam/wrist_pitch_link_convex_decomposition_p1.stl
new file mode 100755
index 0000000..c36f88f
Binary files /dev/null and b/alr_envs/mujoco/meshes/wam/wrist_pitch_link_convex_decomposition_p1.stl differ
diff --git a/alr_envs/mujoco/meshes/wam/wrist_pitch_link_convex_decomposition_p2.stl b/alr_envs/mujoco/meshes/wam/wrist_pitch_link_convex_decomposition_p2.stl
new file mode 100755
index 0000000..d00cac1
Binary files /dev/null and b/alr_envs/mujoco/meshes/wam/wrist_pitch_link_convex_decomposition_p2.stl differ
diff --git a/alr_envs/mujoco/meshes/wam/wrist_pitch_link_convex_decomposition_p3.stl b/alr_envs/mujoco/meshes/wam/wrist_pitch_link_convex_decomposition_p3.stl
new file mode 100755
index 0000000..34d1d8b
Binary files /dev/null and b/alr_envs/mujoco/meshes/wam/wrist_pitch_link_convex_decomposition_p3.stl differ
diff --git a/alr_envs/mujoco/meshes/wam/wrist_pitch_link_fine.stl b/alr_envs/mujoco/meshes/wam/wrist_pitch_link_fine.stl
new file mode 100755
index 0000000..13d2f73
Binary files /dev/null and b/alr_envs/mujoco/meshes/wam/wrist_pitch_link_fine.stl differ
diff --git a/alr_envs/mujoco/meshes/wam/wrist_yaw_link_convex_decomposition_p1.stl b/alr_envs/mujoco/meshes/wam/wrist_yaw_link_convex_decomposition_p1.stl
new file mode 100755
index 0000000..06e857f
Binary files /dev/null and b/alr_envs/mujoco/meshes/wam/wrist_yaw_link_convex_decomposition_p1.stl differ
diff --git a/alr_envs/mujoco/meshes/wam/wrist_yaw_link_convex_decomposition_p2.stl b/alr_envs/mujoco/meshes/wam/wrist_yaw_link_convex_decomposition_p2.stl
new file mode 100755
index 0000000..48e1bb1
Binary files /dev/null and b/alr_envs/mujoco/meshes/wam/wrist_yaw_link_convex_decomposition_p2.stl differ
diff --git a/alr_envs/mujoco/meshes/wam/wrist_yaw_link_fine.stl b/alr_envs/mujoco/meshes/wam/wrist_yaw_link_fine.stl
new file mode 100755
index 0000000..0d95239
Binary files /dev/null and b/alr_envs/mujoco/meshes/wam/wrist_yaw_link_fine.stl differ
diff --git a/alr_envs/mujoco/reacher/__init__.py b/alr_envs/mujoco/reacher/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/alr_envs/mujoco/alr_reacher.py b/alr_envs/mujoco/reacher/alr_reacher.py
similarity index 92%
rename from alr_envs/mujoco/alr_reacher.py
rename to alr_envs/mujoco/reacher/alr_reacher.py
index 58e273f..c6cca16 100644
--- a/alr_envs/mujoco/alr_reacher.py
+++ b/alr_envs/mujoco/reacher/alr_reacher.py
@@ -4,11 +4,11 @@ import numpy as np
from gym import utils
from gym.envs.mujoco import mujoco_env
-from alr_envs.utils.utils import angle_normalize
+import alr_envs.utils.utils as alr_utils
class ALRReacherEnv(mujoco_env.MujocoEnv, utils.EzPickle):
- def __init__(self, steps_before_reward=200, n_links=5, balance=True, file_name=None):
+ def __init__(self, steps_before_reward=200, n_links=5, balance=False):
utils.EzPickle.__init__(**locals())
self._steps = 0
@@ -16,7 +16,7 @@ class ALRReacherEnv(mujoco_env.MujocoEnv, utils.EzPickle):
self.n_links = n_links
self.balance = balance
- self.balance_weight = 0.5
+ self.balance_weight = 1.0
self.reward_weight = 1
if steps_before_reward == 200:
@@ -47,7 +47,7 @@ class ALRReacherEnv(mujoco_env.MujocoEnv, utils.EzPickle):
if self.balance:
reward_balance -= self.balance_weight * np.abs(
- angle_normalize(np.sum(self.sim.data.qpos.flat[:self.n_links]), type="rad"))
+ alr_utils.angle_normalize(np.sum(self.sim.data.qpos.flat[:self.n_links]), type="rad"))
reward = reward_dist + reward_ctrl + angular_vel + reward_balance
self.do_simulation(a, self.frame_skip)
@@ -59,7 +59,7 @@ class ALRReacherEnv(mujoco_env.MujocoEnv, utils.EzPickle):
goal=self.goal if hasattr(self, "goal") else None)
def viewer_setup(self):
- self.viewer.cam.trackbodyid = 1
+ self.viewer.cam.trackbodyid = 0
def reset_model(self):
qpos = self.np_random.uniform(low=-0.1, high=0.1, size=self.model.nq) + self.init_qpos
diff --git a/alr_envs/mujoco/assets/reacher_5links.xml b/alr_envs/mujoco/reacher/assets/reacher_5links.xml
similarity index 100%
rename from alr_envs/mujoco/assets/reacher_5links.xml
rename to alr_envs/mujoco/reacher/assets/reacher_5links.xml
diff --git a/alr_envs/mujoco/assets/reacher_7links.xml b/alr_envs/mujoco/reacher/assets/reacher_7links.xml
similarity index 100%
rename from alr_envs/mujoco/assets/reacher_7links.xml
rename to alr_envs/mujoco/reacher/assets/reacher_7links.xml
diff --git a/alr_envs/utils/legacy/__init__.py b/alr_envs/utils/legacy/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/alr_envs/utils/legacy/detpmp_env_wrapper.py b/alr_envs/utils/legacy/detpmp_env_wrapper.py
new file mode 100644
index 0000000..c667abf
--- /dev/null
+++ b/alr_envs/utils/legacy/detpmp_env_wrapper.py
@@ -0,0 +1,88 @@
+from alr_envs.utils.policies import get_policy_class
+from mp_lib import det_promp
+import numpy as np
+import gym
+
+
+class DetPMPEnvWrapper(gym.Wrapper):
+ def __init__(self,
+ env,
+ num_dof,
+ num_basis,
+ width,
+ off=0.01,
+ start_pos=None,
+ duration=1,
+ dt=0.01,
+ post_traj_time=0.,
+ policy_type=None,
+ weights_scale=1,
+ zero_start=False,
+ zero_goal=False,
+ ):
+ super(DetPMPEnvWrapper, self).__init__(env)
+ self.num_dof = num_dof
+ self.num_basis = num_basis
+ self.dim = num_dof * num_basis
+ self.pmp = det_promp.DeterministicProMP(n_basis=num_basis, n_dof=num_dof, width=width, off=off,
+ zero_start=zero_start, zero_goal=zero_goal)
+ weights = np.zeros(shape=(num_basis, num_dof))
+ self.pmp.set_weights(duration, weights)
+ self.weights_scale = weights_scale
+
+ self.duration = duration
+ self.dt = dt
+ self.post_traj_steps = int(post_traj_time / dt)
+
+ self.start_pos = start_pos
+ self.zero_start = zero_start
+
+ policy_class = get_policy_class(policy_type)
+ self.policy = policy_class(env)
+
+ def __call__(self, params, contexts=None):
+ params = np.atleast_2d(params)
+ rewards = []
+ infos = []
+ for p, c in zip(params, contexts):
+ reward, info = self.rollout(p, c)
+ rewards.append(reward)
+ infos.append(info)
+
+ return np.array(rewards), infos
+
+ def rollout(self, params, context=None, render=False):
+        """ This function generates a trajectory based on a deterministic ProMP and then does the usual loop over reset and step"""
+ params = np.reshape(params, newshape=(self.num_basis, self.num_dof)) * self.weights_scale
+ self.pmp.set_weights(self.duration, params)
+ t, des_pos, des_vel, des_acc = self.pmp.compute_trajectory(1 / self.dt, 1.)
+ if self.zero_start:
+ des_pos += self.start_pos[None, :]
+
+ if self.post_traj_steps > 0:
+ des_pos = np.vstack([des_pos, np.tile(des_pos[-1, :], [self.post_traj_steps, 1])])
+ des_vel = np.vstack([des_vel, np.zeros(shape=(self.post_traj_steps, self.num_dof))])
+
+ self._trajectory = des_pos
+ self._velocity = des_vel
+
+ rews = []
+ infos = []
+
+ self.env.configure(context)
+ self.env.reset()
+
+ for t, pos_vel in enumerate(zip(des_pos, des_vel)):
+ ac = self.policy.get_action(pos_vel[0], pos_vel[1])
+ obs, rew, done, info = self.env.step(ac)
+ rews.append(rew)
+ infos.append(info)
+ if render:
+ self.env.render(mode="human")
+ if done:
+ break
+
+ reward = np.sum(rews)
+
+ return reward, info
+
diff --git a/alr_envs/utils/legacy/dmp_async_vec_env.py b/alr_envs/utils/legacy/dmp_async_vec_env.py
new file mode 100644
index 0000000..641e770
--- /dev/null
+++ b/alr_envs/utils/legacy/dmp_async_vec_env.py
@@ -0,0 +1,173 @@
+import gym
+from gym.error import (AlreadyPendingCallError, NoAsyncCallError)
+from gym.vector.utils import concatenate, create_empty_array
+from gym.vector.async_vector_env import AsyncState
+import numpy as np
+import multiprocessing as mp
+import sys
+
+
+def _worker(index, env_fn, pipe, parent_pipe, shared_memory, error_queue):
+ assert shared_memory is None
+ env = env_fn()
+ parent_pipe.close()
+ try:
+ while True:
+ command, data = pipe.recv()
+ if command == 'reset':
+ observation = env.reset()
+ pipe.send((observation, True))
+ elif command == 'step':
+ observation, reward, done, info = env.step(data)
+ if done:
+ observation = env.reset()
+ pipe.send(((observation, reward, done, info), True))
+ elif command == 'rollout':
+ rewards = []
+ infos = []
+ for p, c in zip(*data):
+ reward, info = env.rollout(p, c)
+ rewards.append(reward)
+ infos.append(info)
+ pipe.send(((rewards, infos), (True, ) * len(rewards)))
+ elif command == 'seed':
+ env.seed(data)
+ pipe.send((None, True))
+ elif command == 'close':
+ env.close()
+ pipe.send((None, True))
+ break
+ elif command == 'idle':
+ pipe.send((None, True))
+ elif command == '_check_observation_space':
+ pipe.send((data == env.observation_space, True))
+            else:
+                raise RuntimeError('Received unknown command `{0}`. Must '
+                                   'be one of {`reset`, `step`, `rollout`, `seed`, '
+                                   '`close`, `idle`, `_check_observation_space`}.'.format(command))
+ except (KeyboardInterrupt, Exception):
+ error_queue.put((index,) + sys.exc_info()[:2])
+ pipe.send((None, False))
+ finally:
+ env.close()
+
+
+class DmpAsyncVectorEnv(gym.vector.AsyncVectorEnv):
+ def __init__(self, env_fns, n_samples, observation_space=None, action_space=None,
+ shared_memory=False, copy=True, context="spawn", daemon=True, worker=_worker):
+ super(DmpAsyncVectorEnv, self).__init__(env_fns,
+ observation_space=observation_space,
+ action_space=action_space,
+ shared_memory=shared_memory,
+ copy=copy,
+ context=context,
+ daemon=daemon,
+ worker=worker)
+
+ # we need to overwrite the number of samples as we may sample more than num_envs
+ self.observations = create_empty_array(self.single_observation_space,
+ n=n_samples,
+ fn=np.zeros)
+
+ def __call__(self, params, contexts=None):
+ return self.rollout(params, contexts)
+
+ def rollout_async(self, params, contexts):
+        """
+        Parameters
+        ----------
+        params : iterable of samples from `action_space`
+            List of motion primitive parameter vectors, one per rollout.
+        contexts : iterable or None
+            Optional contexts, one per parameter vector.
+        """
+ self._assert_is_running()
+ if self._state != AsyncState.DEFAULT:
+ raise AlreadyPendingCallError('Calling `rollout_async` while waiting '
+ 'for a pending call to `{0}` to complete.'.format(
+ self._state.value), self._state.value)
+
+ params = np.atleast_2d(params)
+ split_params = np.array_split(params, np.minimum(len(params), self.num_envs))
+ if contexts is None:
+ split_contexts = np.array_split([None, ] * len(params), np.minimum(len(params), self.num_envs))
+ else:
+ split_contexts = np.array_split(contexts, np.minimum(len(contexts), self.num_envs))
+
+ assert np.all([len(p) == len(c) for p, c in zip(split_params, split_contexts)])
+ for pipe, param, context in zip(self.parent_pipes, split_params, split_contexts):
+ pipe.send(('rollout', (param, context)))
+ for pipe in self.parent_pipes[len(split_params):]:
+ pipe.send(('idle', None))
+ self._state = AsyncState.WAITING_ROLLOUT
+
+ def rollout_wait(self, timeout=None):
+ """
+ Parameters
+ ----------
+        timeout : int or float, optional
+            Number of seconds before the call to `rollout_wait` times out. If
+            `None`, the call to `rollout_wait` never times out.
+
+        Returns
+        -------
+        rewards : `np.ndarray` instance (dtype `np.float_`)
+            A vector of episodic rewards from the vectorized environment.
+
+        infos : list of dict
+            A list of auxiliary diagnostic information.
+        """
+ self._assert_is_running()
+ if self._state != AsyncState.WAITING_ROLLOUT:
+ raise NoAsyncCallError('Calling `rollout_wait` without any prior call '
+ 'to `rollout_async`.', AsyncState.WAITING_ROLLOUT.value)
+
+ if not self._poll(timeout):
+ self._state = AsyncState.DEFAULT
+ raise mp.TimeoutError('The call to `rollout_wait` has timed out after '
+ '{0} second{1}.'.format(timeout, 's' if timeout > 1 else ''))
+
+ results, successes = zip(*[pipe.recv() for pipe in self.parent_pipes])
+ results = [r for r in results if r is not None]
+ self._raise_if_errors(successes)
+ self._state = AsyncState.DEFAULT
+
+ rewards, infos = [_flatten_list(r) for r in zip(*results)]
+
+ # for now, we ignore the observations and only return the rewards
+
+ # if not self.shared_memory:
+ # self.observations = concatenate(observations_list, self.observations,
+ # self.single_observation_space)
+
+ # return (deepcopy(self.observations) if self.copy else self.observations,
+ # np.array(rewards), np.array(dones, dtype=np.bool_), infos)
+
+ return np.array(rewards), infos
+
+ def rollout(self, actions, contexts):
+ self.rollout_async(actions, contexts)
+ return self.rollout_wait()
+
+
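+# Illustrative use (assuming `make_env_fn(rank)` returns a callable that builds an
+# environment exposing `rollout(params, context)`, e.g. one of the legacy MP wrappers;
+# `dim` is a placeholder for the parameter dimension):
+# vec_env = DmpAsyncVectorEnv([make_env_fn(i) for i in range(4)], n_samples=16)
+# rewards, infos = vec_env(np.random.randn(16, dim))
+
+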
+def _flatten_obs(obs):
+ assert isinstance(obs, (list, tuple))
+ assert len(obs) > 0
+
+ if isinstance(obs[0], dict):
+ keys = obs[0].keys()
+ return {k: np.stack([o[k] for o in obs]) for k in keys}
+ else:
+ return np.stack(obs)
+
+
+def _flatten_list(l):
+ assert isinstance(l, (list, tuple))
+ assert len(l) > 0
+ assert all([len(l_) > 0 for l_ in l])
+
+ return [l__ for l_ in l for l__ in l_]
diff --git a/alr_envs/utils/legacy/dmp_env_wrapper.py b/alr_envs/utils/legacy/dmp_env_wrapper.py
new file mode 100644
index 0000000..6835d80
--- /dev/null
+++ b/alr_envs/utils/legacy/dmp_env_wrapper.py
@@ -0,0 +1,125 @@
+from alr_envs.utils.policies import get_policy_class
+from mp_lib.phase import ExpDecayPhaseGenerator
+from mp_lib.basis import DMPBasisGenerator
+from mp_lib import dmps
+import numpy as np
+import gym
+
+
+class DmpEnvWrapper(gym.Wrapper):
+ def __init__(self,
+ env,
+ num_dof,
+ num_basis,
+ start_pos=None,
+ final_pos=None,
+ duration=1,
+ dt=0.01,
+ alpha_phase=2,
+ bandwidth_factor=3,
+ learn_goal=False,
+ post_traj_time=0.,
+ policy_type=None,
+ weights_scale=1.,
+ goal_scale=1.,
+ ):
+ super(DmpEnvWrapper, self).__init__(env)
+ self.num_dof = num_dof
+ self.num_basis = num_basis
+ self.dim = num_dof * num_basis
+ if learn_goal:
+ self.dim += num_dof
+ self.learn_goal = learn_goal
+ self.duration = duration # seconds
+ time_steps = int(duration / dt)
+ self.t = np.linspace(0, duration, time_steps)
+ self.post_traj_steps = int(post_traj_time / dt)
+
+ phase_generator = ExpDecayPhaseGenerator(alpha_phase=alpha_phase, duration=duration)
+ basis_generator = DMPBasisGenerator(phase_generator,
+ duration=duration,
+ num_basis=self.num_basis,
+ basis_bandwidth_factor=bandwidth_factor)
+
+ self.dmp = dmps.DMP(num_dof=num_dof,
+ basis_generator=basis_generator,
+ phase_generator=phase_generator,
+ num_time_steps=time_steps,
+ dt=dt
+ )
+
+ self.dmp.dmp_start_pos = start_pos.reshape((1, num_dof))
+
+ dmp_weights = np.zeros((num_basis, num_dof))
+ if learn_goal:
+ dmp_goal_pos = np.zeros(num_dof)
+ else:
+ dmp_goal_pos = final_pos
+
+ self.dmp.set_weights(dmp_weights, dmp_goal_pos)
+ self.weights_scale = weights_scale
+ self.goal_scale = goal_scale
+
+ policy_class = get_policy_class(policy_type)
+ self.policy = policy_class(env)
+
+ def __call__(self, params, contexts=None):
+ params = np.atleast_2d(params)
+ rewards = []
+ infos = []
+ for p, c in zip(params, contexts):
+ reward, info = self.rollout(p, c)
+ rewards.append(reward)
+ infos.append(info)
+
+ return np.array(rewards), infos
+
+ def goal_and_weights(self, params):
+ if len(params.shape) > 1:
+ assert params.shape[1] == self.dim
+ else:
+ assert len(params) == self.dim
+ params = np.reshape(params, [1, self.dim])
+
+ if self.learn_goal:
+ goal_pos = params[0, -self.num_dof:]
+ weight_matrix = np.reshape(params[:, :-self.num_dof], [self.num_basis, self.num_dof])
+ else:
+ goal_pos = self.dmp.dmp_goal_pos.flatten()
+ assert goal_pos is not None
+ weight_matrix = np.reshape(params, [self.num_basis, self.num_dof])
+
+ return goal_pos * self.goal_scale, weight_matrix * self.weights_scale
+
+ def rollout(self, params, context=None, render=False):
+ """ This function generates a trajectory based on a DMP and then does the usual loop over reset and step"""
+ goal_pos, weight_matrix = self.goal_and_weights(params)
+ self.dmp.set_weights(weight_matrix, goal_pos)
+ trajectory, velocity = self.dmp.reference_trajectory(self.t)
+
+ if self.post_traj_steps > 0:
+ trajectory = np.vstack([trajectory, np.tile(trajectory[-1, :], [self.post_traj_steps, 1])])
+ velocity = np.vstack([velocity, np.zeros(shape=(self.post_traj_steps, self.num_dof))])
+
+ self._trajectory = trajectory
+ self._velocity = velocity
+
+ rews = []
+ infos = []
+
+ self.env.configure(context)
+ self.env.reset()
+
+ for t, pos_vel in enumerate(zip(trajectory, velocity)):
+ ac = self.policy.get_action(pos_vel[0], pos_vel[1])
+ obs, rew, done, info = self.env.step(ac)
+ rews.append(rew)
+ infos.append(info)
+ if render:
+ self.env.render(mode="human")
+ if done:
+ break
+
+ reward = np.sum(rews)
+
+ return reward, info
diff --git a/alr_envs/utils/legacy/dmp_env_wrapper_example.py b/alr_envs/utils/legacy/dmp_env_wrapper_example.py
new file mode 100644
index 0000000..d2edae5
--- /dev/null
+++ b/alr_envs/utils/legacy/dmp_env_wrapper_example.py
@@ -0,0 +1,28 @@
+from alr_envs.utils.legacy.utils import make_holereacher_env
+import numpy as np
+
+if __name__ == "__main__":
+ n_samples = 1
+ n_cpus = 4
+ dim = 30
+
+ # env = DmpAsyncVectorEnv([make_viapointreacher_env(i) for i in range(n_cpus)],
+ # n_samples=n_samples)
+
+ test_env = make_holereacher_env(0)()
+
+ # params = np.random.randn(n_samples, dim)
+ params = np.array([[1.386102, -3.29980525, 4.70402733, 1.3966668, 0.73774902,
+ 3.14676681, -4.98644416, 6.20303193, 1.30502127, -0.09330522,
+ 7.62656797, -5.76893033, 3.4706711, -0.6944142, -3.33442788,
+ 12.31421548, -0.72760271, -6.9090723, 7.02903814, -8.7236836,
+ 1.4805914, 0.53185824, -5.46626893, 0.69692163, 13.58472666,
+ 0.77199316, 2.02906724, -3.0203244, -1.00533159, -0.57417351]])
+
+ # params = np.hstack([50 * np.random.randn(n_samples, 25), np.tile(np.array([np.pi/2, -np.pi/4, -np.pi/4, -np.pi/4, -np.pi/4]), [n_samples, 1])])
+
+ rew, info = test_env.rollout(params, render=True)
+ print(rew)
+
+ # out = env(params)
+ # print(out)
diff --git a/alr_envs/utils/legacy/dmp_pd_control_example.py b/alr_envs/utils/legacy/dmp_pd_control_example.py
new file mode 100644
index 0000000..3b713f3
--- /dev/null
+++ b/alr_envs/utils/legacy/dmp_pd_control_example.py
@@ -0,0 +1,28 @@
+from alr_envs.mujoco.ball_in_a_cup.utils import make_simple_dmp_env
+import numpy as np
+
+if __name__ == "__main__":
+
+ dim = 15
+ n_cpus = 4
+
+ # n_samples = 10
+ #
+ # vec_env = DmpAsyncVectorEnv([make_simple_env(i) for i in range(n_cpus)],
+ # n_samples=n_samples)
+ #
+ # params = np.tile(1 * np.random.randn(n_samples, dim), (10, 1))
+ #
+ # rewards, infos = vec_env(params)
+ # print(rewards)
+ #
+ non_vec_env = make_simple_dmp_env(0, 0)()
+
+ # params = 0.5 * np.random.randn(dim)
+ params = np.array([-2.63357598, -1.04950296, -0.44330737, 0.52950017, 4.29247739,
+ 4.52473661, -0.05685977, -0.76796851, 3.71540499, 1.22631059,
+ 2.20412438, 3.91588129, -0.12652723, -3.0788211 , 0.56204464])
+
+ out2 = non_vec_env.rollout(params, render=True )
+
+ print(out2)
diff --git a/alr_envs/utils/legacy/utils.py b/alr_envs/utils/legacy/utils.py
new file mode 100644
index 0000000..c158cae
--- /dev/null
+++ b/alr_envs/utils/legacy/utils.py
@@ -0,0 +1,156 @@
+import alr_envs.classic_control.hole_reacher as hr
+import alr_envs.classic_control.viapoint_reacher as vpr
+from alr_envs.utils.wrapper.dmp_wrapper import DmpWrapper
+from alr_envs.utils.wrapper.detpmp_wrapper import DetPMPWrapper
+import numpy as np
+
+
+def make_viapointreacher_env(rank, seed=0):
+    """
+    Utility function for a multiprocessed env.
+
+    :param rank: (int) index of the subprocess
+    :param seed: (int) the initial seed for the RNG
+    :returns: a function that creates the wrapped environment
+    """
+
+ def _init():
+ _env = vpr.ViaPointReacher(n_links=5,
+ allow_self_collision=False,
+ collision_penalty=1000)
+
+ _env = DmpWrapper(_env,
+ num_dof=5,
+ num_basis=5,
+ duration=2,
+ alpha_phase=2.5,
+ dt=_env.dt,
+ start_pos=_env.start_pos,
+ learn_goal=False,
+ policy_type="velocity",
+ weights_scale=50)
+ _env.seed(seed + rank)
+ return _env
+
+ return _init
+
+
+def make_holereacher_env(rank, seed=0):
+    """
+    Utility function for a multiprocessed env.
+
+    :param rank: (int) index of the subprocess
+    :param seed: (int) the initial seed for the RNG
+    :returns: a function that creates the wrapped environment
+    """
+
+ def _init():
+ _env = hr.HoleReacher(n_links=5,
+ allow_self_collision=False,
+ allow_wall_collision=False,
+ hole_width=0.25,
+ hole_depth=1,
+ hole_x=2,
+ collision_penalty=100)
+
+ _env = DmpWrapper(_env,
+ num_dof=5,
+ num_basis=5,
+ duration=2,
+ bandwidth_factor=2,
+ dt=_env.dt,
+ learn_goal=True,
+ alpha_phase=2,
+ start_pos=_env.start_pos,
+ policy_type="velocity",
+ weights_scale=50,
+ goal_scale=0.1
+ )
+
+ _env.seed(seed + rank)
+ return _env
+
+ return _init
+
+
+def make_holereacher_fix_goal_env(rank, seed=0):
+    """
+    Utility function for a multiprocessed env.
+
+    :param rank: (int) index of the subprocess
+    :param seed: (int) the initial seed for the RNG
+    :returns: a function that creates the wrapped environment
+    """
+
+ def _init():
+ _env = hr.HoleReacher(n_links=5,
+ allow_self_collision=False,
+ allow_wall_collision=False,
+ hole_width=0.15,
+ hole_depth=1,
+ hole_x=1,
+ collision_penalty=100)
+
+ _env = DmpWrapper(_env,
+ num_dof=5,
+ num_basis=5,
+ duration=2,
+ dt=_env.dt,
+ learn_goal=False,
+ final_pos=np.array([2.02669572, -1.25966385, -1.51618198, -0.80946476, 0.02012344]),
+ alpha_phase=2,
+ start_pos=_env.start_pos,
+ policy_type="velocity",
+ weights_scale=50,
+ goal_scale=1
+ )
+
+ _env.seed(seed + rank)
+ return _env
+
+ return _init
+
+
+def make_holereacher_env_pmp(rank, seed=0):
+    """
+    Utility function for a multiprocessed env.
+
+    :param rank: (int) index of the subprocess
+    :param seed: (int) the initial seed for the RNG
+    :returns: a function that creates the wrapped environment
+    """
+
+ def _init():
+ _env = hr.HoleReacher(n_links=5,
+ allow_self_collision=False,
+ allow_wall_collision=False,
+ hole_width=0.15,
+ hole_depth=1,
+ hole_x=1,
+ collision_penalty=1000)
+
+ _env = DetPMPWrapper(_env,
+ num_dof=5,
+ num_basis=5,
+ width=0.02,
+ policy_type="velocity",
+ start_pos=_env.start_pos,
+ duration=2,
+ post_traj_time=0,
+ dt=_env.dt,
+ weights_scale=0.2,
+ zero_start=True,
+ zero_goal=False
+ )
+ _env.seed(seed + rank)
+ return _env
+
+ return _init
diff --git a/alr_envs/utils/make_env_helpers.py b/alr_envs/utils/make_env_helpers.py
new file mode 100644
index 0000000..c0e55b4
--- /dev/null
+++ b/alr_envs/utils/make_env_helpers.py
@@ -0,0 +1,29 @@
+from alr_envs.utils.wrapper.dmp_wrapper import DmpWrapper
+from alr_envs.utils.wrapper.detpmp_wrapper import DetPMPWrapper
+import gym
+
+
+def make_env(env_id, seed, rank):
+ env = gym.make(env_id)
+ env.seed(seed + rank)
+ return lambda: env
+
+
+def make_contextual_env(env_id, context, seed, rank):
+ env = gym.make(env_id, context=context)
+ env.seed(seed + rank)
+ return lambda: env
+
+
+def make_dmp_env(**kwargs):
+ name = kwargs.pop("name")
+ _env = gym.make(name)
+ return DmpWrapper(_env, **kwargs)
+
+
+def make_detpmp_env(**kwargs):
+ name = kwargs.pop("name")
+ _env = gym.make(name)
+ return DetPMPWrapper(_env, **kwargs)
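+
+
+# Illustrative usage (the env id is a placeholder; all remaining kwargs are forwarded
+# to DmpWrapper):
+# env = make_dmp_env(name="alr_envs:HoleReacher-v0", num_dof=5, num_basis=5, duration=2,
+#                    learn_goal=True, alpha_phase=2, policy_type="velocity", weights_scale=50)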
diff --git a/alr_envs/utils/mp_env_async_sampler.py b/alr_envs/utils/mp_env_async_sampler.py
new file mode 100644
index 0000000..344cb37
--- /dev/null
+++ b/alr_envs/utils/mp_env_async_sampler.py
@@ -0,0 +1,82 @@
+import gym
+from gym.vector.async_vector_env import AsyncVectorEnv
+import numpy as np
+from collections import defaultdict
+
+
+def make_env(env_id, rank, seed=0):
+ env = gym.make(env_id)
+ env.seed(seed + rank)
+ return lambda: env
+
+
+def split_array(ary, size):
+ n_samples = len(ary)
+ if n_samples < size:
+ tmp = np.zeros((size, ary.shape[1]))
+ tmp[0:n_samples] = ary
+ return [tmp]
+ elif n_samples == size:
+ return [ary]
+ else:
+ repeat = int(np.ceil(n_samples / size))
+ split = [k * size for k in range(1, repeat)]
+ sub_arys = np.split(ary, split)
+
+ if n_samples % size != 0:
+ tmp = np.zeros_like(sub_arys[0])
+ last = sub_arys[-1]
+ tmp[0: len(last)] = last
+ sub_arys[-1] = tmp
+
+ return sub_arys
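+
+# For example, split_array(np.zeros((10, 3)), 4) returns three chunks of shape (4, 3);
+# the last chunk is zero-padded because only 2 samples remain.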
+
+
+def _flatten_list(l):
+ assert isinstance(l, (list, tuple))
+ assert len(l) > 0
+ assert all([len(l_) > 0 for l_ in l])
+
+ return [l__ for l_ in l for l__ in l_]
+
+
+class AlrMpEnvSampler:
+ """
+ An asynchronous sampler for non contextual MPWrapper environments. A sampler object can be called with a set of
+ parameters and returns the corresponding final obs, rewards, dones and info dicts.
+ """
+ def __init__(self, env_id, num_envs, seed=0):
+ self.num_envs = num_envs
+ self.env = AsyncVectorEnv([make_env(env_id, seed, i) for i in range(num_envs)])
+
+ def __call__(self, params):
+ params = np.atleast_2d(params)
+ n_samples = params.shape[0]
+ split_params = split_array(params, self.num_envs)
+
+ vals = defaultdict(list)
+ for p in split_params:
+ obs, reward, done, info = self.env.step(p)
+ vals['obs'].append(obs)
+ vals['reward'].append(reward)
+ vals['done'].append(done)
+ vals['info'].append(info)
+
+        # drop the zero-padded entries so that exactly n_samples values are returned
+ return np.vstack(vals['obs'])[:n_samples], np.hstack(vals['reward'])[:n_samples],\
+ _flatten_list(vals['done'])[:n_samples], _flatten_list(vals['info'])[:n_samples]
+
+
+if __name__ == "__main__":
+ env_name = "alr_envs:ALRBallInACupSimpleDMP-v0"
+ n_cpu = 8
+ dim = 15
+ n_samples = 10
+
+ sampler = AlrMpEnvSampler(env_name, num_envs=n_cpu)
+
+ thetas = np.random.randn(n_samples, dim) # usually form a search distribution
+
+ _, rewards, __, ___ = sampler(thetas)
+
+ print(rewards)
diff --git a/alr_envs/utils/policies.py b/alr_envs/utils/policies.py
new file mode 100644
index 0000000..e8874d6
--- /dev/null
+++ b/alr_envs/utils/policies.py
@@ -0,0 +1,48 @@
+from gym import Env
+
+from alr_envs.mujoco.alr_mujoco_env import AlrMujocoEnv
+
+
+class BaseController:
+ def __init__(self, env: Env):
+ self.env = env
+
+ def get_action(self, des_pos, des_vel):
+ raise NotImplementedError
+
+
+class PosController(BaseController):
+ def get_action(self, des_pos, des_vel):
+ return des_pos
+
+
+class VelController(BaseController):
+ def get_action(self, des_pos, des_vel):
+ return des_vel
+
+
+class PDController(BaseController):
+ def __init__(self, env: AlrMujocoEnv):
+ self.p_gains = env.p_gains
+ self.d_gains = env.d_gains
+ super(PDController, self).__init__(env)
+
+ def get_action(self, des_pos, des_vel):
+ # TODO: make standardized ALRenv such that all of them have current_pos/vel attributes
+ cur_pos = self.env.current_pos
+ cur_vel = self.env.current_vel
+ if len(des_pos) != len(cur_pos):
+ des_pos = self.env.extend_des_pos(des_pos)
+ if len(des_vel) != len(cur_vel):
+ des_vel = self.env.extend_des_vel(des_vel)
+ trq = self.p_gains * (des_pos - cur_pos) + self.d_gains * (des_vel - cur_vel)
+ return trq
+
+
+def get_policy_class(policy_type):
+    if policy_type == "motor":
+        return PDController
+    elif policy_type == "velocity":
+        return VelController
+    elif policy_type == "position":
+        return PosController
+    else:
+        raise ValueError("Invalid policy type {!r}.".format(policy_type))
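+
+
+# Illustrative usage (the env passed to a "motor" controller must expose p_gains/d_gains
+# and current_pos/current_vel):
+# controller = get_policy_class("velocity")(env)
+# action = controller.get_action(des_pos, des_vel)  # a velocity controller returns des_vel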
diff --git a/alr_envs/utils/utils.py b/alr_envs/utils/utils.py
index 2c15bcb..89205bd 100644
--- a/alr_envs/utils/utils.py
+++ b/alr_envs/utils/utils.py
@@ -19,3 +19,4 @@ def angle_normalize(x, type="deg"):
two_pi = 2 * np.pi
return x - two_pi * np.floor((x + np.pi) / two_pi)
+
diff --git a/alr_envs/utils/wrapper/__init__.py b/alr_envs/utils/wrapper/__init__.py
new file mode 100644
index 0000000..e69de29
diff --git a/alr_envs/utils/wrapper/detpmp_wrapper.py b/alr_envs/utils/wrapper/detpmp_wrapper.py
new file mode 100644
index 0000000..62b93d5
--- /dev/null
+++ b/alr_envs/utils/wrapper/detpmp_wrapper.py
@@ -0,0 +1,40 @@
+import gym
+import numpy as np
+from mp_lib import det_promp
+
+from alr_envs.utils.wrapper.mp_wrapper import MPWrapper
+
+
+class DetPMPWrapper(MPWrapper):
+ def __init__(self, env, num_dof, num_basis, width, start_pos=None, duration=1, dt=0.01, post_traj_time=0.,
+ policy_type=None, weights_scale=1, zero_start=False, zero_goal=False, **mp_kwargs):
+ # self.duration = duration # seconds
+
+ super().__init__(env, num_dof, duration, dt, post_traj_time, policy_type, weights_scale,
+ num_basis=num_basis, width=width, start_pos=start_pos, zero_start=zero_start,
+ zero_goal=zero_goal)
+
+ action_bounds = np.inf * np.ones((self.mp.n_basis * self.mp.n_dof))
+ self.action_space = gym.spaces.Box(low=-action_bounds, high=action_bounds, dtype=np.float32)
+
+ self.start_pos = start_pos
+ self.dt = dt
+
+ def initialize_mp(self, num_dof: int, duration: int, dt: float, num_basis: int = 5, width: float = None,
+ start_pos: np.ndarray = None, zero_start: bool = False, zero_goal: bool = False):
+ pmp = det_promp.DeterministicProMP(n_basis=num_basis, n_dof=num_dof, width=width, off=0.01,
+ zero_start=zero_start, zero_goal=zero_goal)
+
+ weights = np.zeros(shape=(num_basis, num_dof))
+ pmp.set_weights(duration, weights)
+
+ return pmp
+
+ def mp_rollout(self, action):
+ params = np.reshape(action, (self.mp.n_basis, self.mp.n_dof)) * self.weights_scale
+ self.mp.set_weights(self.duration, params)
+ _, des_pos, des_vel, _ = self.mp.compute_trajectory(1 / self.dt, 1.)
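+        # with zero_start the ProMP trajectory begins at zero, so shift it to the actual start position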
+ if self.mp.zero_start:
+ des_pos += self.start_pos[None, :]
+
+ return des_pos, des_vel
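+
+
+# Usage sketch (env id and parameter values are illustrative, not tuned): wrap a step-based env so
+# that a single step() call rolls out one full ProMP trajectory.
+#
+#   env = DetPMPWrapper(gym.make("YourEnv-v0"), num_dof=5, num_basis=5, width=0.025,
+#                       duration=2, dt=0.01, policy_type="velocity",
+#                       start_pos=np.zeros(5), zero_start=True)
+#   obs, reward, done, info = env.step(env.action_space.sample())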
diff --git a/alr_envs/utils/wrapper/dmp_wrapper.py b/alr_envs/utils/wrapper/dmp_wrapper.py
new file mode 100644
index 0000000..d6c8f92
--- /dev/null
+++ b/alr_envs/utils/wrapper/dmp_wrapper.py
@@ -0,0 +1,92 @@
+from mp_lib.phase import ExpDecayPhaseGenerator
+from mp_lib.basis import DMPBasisGenerator
+from mp_lib import dmps
+import numpy as np
+import gym
+
+from alr_envs.utils.wrapper.mp_wrapper import MPWrapper
+
+
+class DmpWrapper(MPWrapper):
+
+ def __init__(self, env: gym.Env, num_dof: int, num_basis: int, start_pos: np.ndarray = None,
+ final_pos: np.ndarray = None, duration: int = 1, alpha_phase: float = 2., dt: float = None,
+ learn_goal: bool = False, return_to_start: bool = False, post_traj_time: float = 0.,
+ weights_scale: float = 1., goal_scale: float = 1., bandwidth_factor: float = 3.,
+ policy_type: str = None):
+
+ """
+ This Wrapper generates a trajectory based on a DMP and will only return episodic performances.
+ Args:
+ env:
+ num_dof:
+ num_basis:
+ start_pos:
+ final_pos:
+ duration:
+ alpha_phase:
+ dt:
+ learn_goal:
+ post_traj_time:
+ policy_type:
+ weights_scale:
+ goal_scale:
+ """
+ self.learn_goal = learn_goal
+ dt = env.dt if hasattr(env, "dt") else dt
+ assert dt is not None
+ start_pos = start_pos if start_pos is not None else env.start_pos if hasattr(env, "start_pos") else None
+ assert start_pos is not None
+ if learn_goal:
+ final_pos = np.zeros_like(start_pos) # arbitrary, will be learned
+ else:
+ final_pos = final_pos if final_pos is not None else start_pos if return_to_start else None
+ assert final_pos is not None
+ self.t = np.linspace(0, duration, int(duration / dt))
+ self.goal_scale = goal_scale
+
+ super().__init__(env, num_dof, duration, dt, post_traj_time, policy_type, weights_scale,
+ num_basis=num_basis, start_pos=start_pos, final_pos=final_pos, alpha_phase=alpha_phase,
+ bandwidth_factor=bandwidth_factor)
+
+ action_bounds = np.inf * np.ones((np.prod(self.mp.dmp_weights.shape) + (num_dof if learn_goal else 0)))
+ self.action_space = gym.spaces.Box(low=-action_bounds, high=action_bounds, dtype=np.float32)
+
+ def initialize_mp(self, num_dof: int, duration: int, dt: float, num_basis: int = 5, start_pos: np.ndarray = None,
+ final_pos: np.ndarray = None, alpha_phase: float = 2., bandwidth_factor: float = 3.):
+
+ phase_generator = ExpDecayPhaseGenerator(alpha_phase=alpha_phase, duration=duration)
+ basis_generator = DMPBasisGenerator(phase_generator, duration=duration, num_basis=num_basis,
+ basis_bandwidth_factor=bandwidth_factor)
+
+ dmp = dmps.DMP(num_dof=num_dof, basis_generator=basis_generator, phase_generator=phase_generator,
+ num_time_steps=int(duration / dt), dt=dt)
+
+ dmp.dmp_start_pos = start_pos.reshape((1, num_dof))
+
+ weights = np.zeros((num_basis, num_dof))
+ goal_pos = np.zeros(num_dof) if self.learn_goal else final_pos
+
+ dmp.set_weights(weights, goal_pos)
+ return dmp
+
+ def goal_and_weights(self, params):
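+        # split the flat parameter vector into DMP basis weights and, if learn_goal is set, the goal position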
+ assert params.shape[-1] == self.action_space.shape[0]
+ params = np.atleast_2d(params)
+
+ if self.learn_goal:
+            goal_pos = params[0, -self.mp.num_dimensions:]  # [num_dof]
+            params = params[:, :-self.mp.num_dimensions]  # [1, num_basis * num_dof]
+ # weight_matrix = np.reshape(params[:, :-self.num_dof], [self.num_basis, self.num_dof])
+ else:
+ goal_pos = self.mp.dmp_goal_pos.flatten()
+ assert goal_pos is not None
+ # weight_matrix = np.reshape(params, [self.num_basis, self.num_dof])
+
+ weight_matrix = np.reshape(params, self.mp.dmp_weights.shape)
+ return goal_pos * self.goal_scale, weight_matrix * self.weights_scale
+
+ def mp_rollout(self, action):
+ goal_pos, weight_matrix = self.goal_and_weights(action)
+ self.mp.set_weights(weight_matrix, goal_pos)
+ return self.mp.reference_trajectory(self.t)
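+
+
+# Usage sketch (env id and values are illustrative): turn a step-based gym env into a DMP task as
+# described in the README; one step() call then rolls out a full DMP trajectory. The env must
+# either expose start_pos or have it passed explicitly.
+#
+#   env = DmpWrapper(gym.make("YourEnv-v0"), num_dof=5, num_basis=5, duration=2, dt=0.01,
+#                    start_pos=np.zeros(5), learn_goal=True, policy_type="velocity")
+#   obs, reward, done, info = env.step(env.action_space.sample())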
diff --git a/alr_envs/utils/wrapper/mp_wrapper.py b/alr_envs/utils/wrapper/mp_wrapper.py
new file mode 100644
index 0000000..e9ab5f1
--- /dev/null
+++ b/alr_envs/utils/wrapper/mp_wrapper.py
@@ -0,0 +1,133 @@
+from abc import ABC, abstractmethod
+from collections import defaultdict
+
+import gym
+import numpy as np
+
+from alr_envs.utils.policies import get_policy_class
+
+
+class MPWrapper(gym.Wrapper, ABC):
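+    """
+    Base class for motion primitive (MP) wrappers. A single step() call rolls out one full
+    trajectory generated by the MP, tracked by the configured policy/controller, and returns the
+    accumulated reward, i.e. the wrapped environment behaves episodically.
+    """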
+
+ def __init__(self,
+ env: gym.Env,
+ num_dof: int,
+ duration: int = 1,
+ dt: float = None,
+ post_traj_time: float = 0.,
+ policy_type: str = None,
+ weights_scale: float = 1.,
+ **mp_kwargs
+ ):
+ super().__init__(env)
+
+ # self.num_dof = num_dof
+ # self.num_basis = num_basis
+ # self.duration = duration # seconds
+
+ # dt = env.dt if hasattr(env, "dt") else dt
+        assert dt is not None  # subclasses must resolve dt (e.g. from the wrapped env) before calling super().__init__()
+ self.post_traj_steps = int(post_traj_time / dt)
+
+ self.mp = self.initialize_mp(num_dof, duration, dt, **mp_kwargs)
+ self.weights_scale = weights_scale
+
+ policy_class = get_policy_class(policy_type)
+ self.policy = policy_class(env)
+
+ # rendering
+ self.render_mode = None
+ self.render_kwargs = None
+
+ # TODO: not yet final
+ def __call__(self, params, contexts=None):
+ params = np.atleast_2d(params)
+ obs = []
+ rewards = []
+ dones = []
+ infos = []
+ # for p, c in zip(params, contexts):
+ for p in params:
+ # self.configure(c)
+ ob, reward, done, info = self.step(p)
+ obs.append(ob)
+ rewards.append(reward)
+ dones.append(done)
+ infos.append(info)
+
+ return obs, np.array(rewards), dones, infos
+
+ def configure(self, context):
+ self.env.configure(context)
+
+ def step(self, action: np.ndarray):
+ """ This function generates a trajectory based on a DMP and then does the usual loop over reset and step"""
+ trajectory, velocity = self.mp_rollout(action)
+
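+        # optionally hold the final position (with zero velocity) for post_traj_steps after the MP trajectory ends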
+ if self.post_traj_steps > 0:
+ trajectory = np.vstack([trajectory, np.tile(trajectory[-1, :], [self.post_traj_steps, 1])])
+ velocity = np.vstack([velocity, np.zeros(shape=(self.post_traj_steps, self.mp.num_dimensions))])
+
+ # self._trajectory = trajectory
+ # self._velocity = velocity
+
+ rewards = 0
+ # infos = defaultdict(list)
+
+ # TODO: @Max Why do we need this configure, states should be part of the model
+ # TODO: Ask Onur if the context distribution needs to be outside the environment
+ # TODO: For now create a new env with each context
+ # self.env.configure(context)
+ obs = self.env.reset()
+ info = {}
+
+ for t, pos_vel in enumerate(zip(trajectory, velocity)):
+ ac = self.policy.get_action(pos_vel[0], pos_vel[1])
+ obs, rew, done, info = self.env.step(ac)
+ rewards += rew
+ # TODO return all dicts?
+ # [infos[k].append(v) for k, v in info.items()]
+ if self.render_mode:
+ self.env.render(mode=self.render_mode, **self.render_kwargs)
+ if done:
+ break
+
+ done = True
+ return obs, rewards, done, info
+
+ def render(self, mode='human', **kwargs):
+ """Only set render options here, such that they can be used during the rollout.
+ This only needs to be called once"""
+ self.render_mode = mode
+ self.render_kwargs = kwargs
+
+ # def __call__(self, actions):
+ # return self.step(actions)
+ # params = np.atleast_2d(params)
+ # rewards = []
+ # infos = []
+ # for p, c in zip(params, contexts):
+ # reward, info = self.rollout(p, c)
+ # rewards.append(reward)
+ # infos.append(info)
+ #
+ # return np.array(rewards), infos
+
+ @abstractmethod
+ def mp_rollout(self, action):
+ """
+ Generate trajectory and velocity based on the MP
+ Returns:
+ trajectory/positions, velocity
+ """
+ raise NotImplementedError()
+
+ @abstractmethod
+ def initialize_mp(self, num_dof: int, duration: int, dt: float, **kwargs):
+ """
+ Create respective instance of MP
+ Returns:
+ MP instance
+ """
+
+ raise NotImplementedError
diff --git a/example.py b/example.py
index fa2b92e..94da23c 100644
--- a/example.py
+++ b/example.py
@@ -1,22 +1,87 @@
-import time
-
+from collections import defaultdict
import gym
+import numpy as np
-if __name__ == '__main__':
- # env = gym.make('alr_envs:SimpleReacher-v0')
+def example_mujoco():
env = gym.make('alr_envs:ALRReacher-v0')
- # env = gym.make('alr_envs:ALRReacher7-v0')
- env = gym.make('alr_envs:Balancing-v0')
- state = env.reset()
+ rewards = 0
+ obs = env.reset()
+ # number of environment steps
for i in range(10000):
- state, reward, done, info = env.step(env.action_space.sample())
- print(reward)
+ obs, reward, done, info = env.step(env.action_space.sample())
+ rewards += reward
+
if i % 1 == 0:
env.render()
if done:
- state = env.reset()
+ print(rewards)
+ rewards = 0
+ obs = env.reset()
- # time.sleep(0.5)
+
+def example_dmp():
+ # env = gym.make("alr_envs:ViaPointReacherDMP-v0")
+ env = gym.make("alr_envs:HoleReacherDMP-v0")
+ rewards = 0
+ # env.render(mode=None)
+ obs = env.reset()
+
+ # number of samples/full trajectories (multiple environment steps)
+ for i in range(10):
+ obs, reward, done, info = env.step(env.action_space.sample())
+ rewards += reward
+
+ if i % 1 == 0:
+ # render full DMP trajectory
+            # render() can also be called once before the loop, which would then render every trajectory.
+            # Calling it after every trajectory instead allows changing the mode; mode=None disables rendering.
+ env.render(mode="human")
+
+ if done:
+ print(rewards)
+ rewards = 0
+ obs = env.reset()
+
+
+def example_async(n_cpu=4, seed=int('533D', 16)):
+    def make_env(env_id, seed, rank):
+        # return a thunk so the env is only created inside the worker process
+        def _init():
+            env = gym.make(env_id)
+            env.seed(seed + rank)
+            return env
+        return _init
+
+    def sample(env: gym.vector.VectorEnv, n_samples=100):
+        # accumulated reward per worker, for printing
+        rewards = np.zeros(env.num_envs)
+
+        # this would generate more samples than requested if n_samples % num_envs != 0
+        repeat = int(np.ceil(n_samples / env.num_envs))
+        vals = defaultdict(list)
+        for i in range(repeat):
+            obs, reward, done, info = env.step(env.action_space.sample())
+ vals['obs'].append(obs)
+ vals['reward'].append(reward)
+ vals['done'].append(done)
+ vals['info'].append(info)
+ rewards += reward
+ if np.any(done):
+ print(rewards[done])
+ rewards[done] = 0
+
+        # do not return more samples than requested
+ return (*map(lambda v: np.stack(v)[:n_samples], vals.values()),)
+
+ envs = gym.vector.AsyncVectorEnv([make_env("alr_envs:HoleReacherDMP-v0", seed, i) for i in range(n_cpu)])
+
+ obs = envs.reset()
+ print(sample(envs, 16))
+
+
+if __name__ == '__main__':
+    # example_mujoco()
+    # example_async()
+    # environments can also be created with additional kwargs, e.g. a context:
+    # env = gym.make("alr_envs:HoleReacherDMP-v0", context=0.1)
+    example_dmp()
\ No newline at end of file
diff --git a/setup.py b/setup.py
index 81f949d..22e1d1c 100644
--- a/setup.py
+++ b/setup.py
@@ -2,5 +2,9 @@ from setuptools import setup
setup(name='alr_envs',
version='0.0.1',
- install_requires=['gym', 'PyQt5'] # And any other dependencies foo needs
+ install_requires=['gym',
+ 'PyQt5',
+ 'matplotlib',
+ 'mp_lib @ git+https://git@github.com/maxhuettenrauch/mp_lib@master#egg=mp_lib',
+                        'mujoco_py'],  # and any other dependencies alr_envs needs
)