diff --git a/README.md b/README.md
index edd1aac..25cb6d8 100644
--- a/README.md
+++ b/README.md
@@ -1,199 +1,218 @@
-## ALR Robotics Control Environments
+# Fancy Gym

-This project offers a large variety of reinforcement learning environments under the unifying interface of [OpenAI gym](https://gym.openai.com/).
-Besides, we also provide support (under the OpenAI interface) for the benchmark suites
+`fancy_gym` offers a large variety of reinforcement learning environments under the unifying interface
+of [OpenAI gym](https://gym.openai.com/). We provide support (under the OpenAI gym interface) for the benchmark suites
 [DeepMind Control](https://deepmind.com/research/publications/2020/dm-control-Software-and-Tasks-for-Continuous-Control)
-(DMC) and [Metaworld](https://meta-world.github.io/). Custom (Mujoco) gym environments can be created according
-to [this guide](https://github.com/openai/gym/blob/master/docs/creating-environments.md). Unlike existing libraries, we
-additionally support to control agents with Dynamic Movement Primitives (DMPs) and Probabilistic Movement Primitives (ProMP,
-we only consider the mean usually).
+(DMC) and [Metaworld](https://meta-world.github.io/). If those are not sufficient and you want to create your own
+custom gym environments, use [this guide](https://www.gymlibrary.ml/content/environment_creation/). We highly
+appreciate it if you then submit a PR so that this environment can become part of `fancy_gym`.
+In comparison to existing libraries, we additionally support controlling agents with movement primitives, such as
+Dynamic Movement Primitives (DMPs) and Probabilistic Movement Primitives (ProMP).

-## Motion Primitive Environments (Episodic environments)
+## Movement Primitive Environments (Episode-Based/Black-Box Environments)

-Unlike step-based environments, motion primitive (MP) environments are closer related to stochastic search, black box
-optimization, and methods that are often used in robotics. MP environments are trajectory-based and always execute a full
-trajectory, which is generated by a Dynamic Motion Primitive (DMP) or a Probabilistic Motion Primitive (ProMP). The
-generated trajectory is translated into individual step-wise actions by a controller. The exact choice of controller is,
-however, dependent on the type of environment. We currently support position, velocity, and PD-Controllers for position,
-velocity, and torque control, respectively. The goal of all MP environments is still to learn a policy. Yet, an action
-represents the parametrization of the motion primitives to generate a suitable trajectory. Additionally, in this
-framework we support all of this also for the contextual setting, for which we expose all changing substates of the
-task as a single observation in the beginning. This requires to predict a new action/MP parametrization for each
-trajectory. All environments provide next to the cumulative episode reward all collected information from each
-step as part of the info dictionary. This information should, however, mainly be used for debugging and logging.
-
-|Key| Description|
-|---|---|
-`trajectory`| Generated trajectory from MP
-`step_actions`| Step-wise executed action based on controller output
-`step_observations`| Step-wise intermediate observations
-`step_rewards`| Step-wise rewards
-`trajectory_length`| Total number of environment interactions
-`other`| All other information from the underlying environment are returned as a list with length `trajectory_length` maintaining the original key. In case some information are not provided every time step, the missing values are filled with `None`.
+Unlike step-based environments, movement primitive (MP) environments are more closely related to stochastic search,
+black-box optimization, and methods that are often used in traditional robotics and control. MP environments are
+typically episode-based and execute a full trajectory, which is generated by a trajectory generator, such as a Dynamic
+Movement Primitive (DMP) or a Probabilistic Movement Primitive (ProMP). The generated trajectory is translated into
+individual step-wise actions by a trajectory tracking controller. The exact choice of controller is, however,
+dependent on the type of environment. We currently support position, velocity, and PD-Controllers for position,
+velocity, and torque control, respectively, as well as a special controller for the MetaWorld control suite.
+The goal of all MP environments is still to learn an optimal policy. Yet, an action represents the parametrization of
+the motion primitives to generate a suitable trajectory. Additionally, in this framework we support all of this also
+for the contextual setting, i.e. we expose the context space - a subset of the observation space - at the beginning of
+the episode. This requires predicting a new action/MP parametrization for each context.
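+
+To make this pipeline more concrete, the following is a schematic sketch (not the actual `fancy_gym` implementation) of
+what conceptually happens during one episode of such a black-box environment: the episode-level action parametrizes the
+trajectory generator, and the resulting trajectory is tracked step by step. The sketch assumes a wrapped environment
+that exposes `current_pos` and `current_vel` as described further below.
+
+```python
+def track_trajectory(env, des_positions, des_velocities, p_gain=1.0, d_gain=0.1):
+    """Translate a generated trajectory into step-wise actions via a simple PD controller."""
+    episode_reward = 0.0
+    env.reset()
+    for des_pos, des_vel in zip(des_positions, des_velocities):
+        # PD tracking: the controller output is executed as the step-based action
+        action = p_gain * (des_pos - env.current_pos) + d_gain * (des_vel - env.current_vel)
+        obs, reward, done, info = env.step(action)
+        episode_reward += reward
+    return episode_reward
+```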

## Installation

1. Clone the repository

```bash
-git clone git@github.com:ALRhub/alr_envs.git
+git clone git@github.com:ALRhub/fancy_gym.git
```

2. Go to the folder

```bash
-cd alr_envs
+cd fancy_gym
```

3. Install with

```bash
-pip install -e .
+pip install -e .
```

-## Using the framework
+In case you want to use dm_control or metaworld, you can install them by specifying extras:

-We prepared [multiple examples](alr_envs/examples/), please have a look there for more specific examples.
+```bash
+pip install -e .[dmc,metaworld]
+```

-### Step-wise environments
+> **Note:**
+> While our library already fully supports the new mujoco bindings, metaworld still relies on
+> [mujoco_py](https://github.com/openai/mujoco-py), hence make sure to have mujoco 2.1 installed beforehand.
+
+## How to use Fancy Gym
+
+We will only show the basics here and have prepared [multiple examples](fancy_gym/examples/) for a more detailed look.
+
+### Step-wise Environments

```python
-import alr_envs
+import fancy_gym

-env = alr_envs.make('HoleReacher-v0', seed=1)
-state = env.reset()
+env = fancy_gym.make('Reacher5d-v0', seed=1)
+obs = env.reset()

for i in range(1000):
-    state, reward, done, info = env.step(env.action_space.sample())
+    action = env.action_space.sample()
+    obs, reward, done, info = env.step(action)
    if i % 5 == 0:
        env.render()

    if done:
-        state = env.reset()
+        obs = env.reset()
```

-For Deepmind control tasks we expect the `env_id` to be specified as `domain_name-task_name` or for manipulation tasks
-as `manipulation-environment_name`. All other environments can be created based on their original name.
+When using `dm_control` tasks we expect the `env_id` to be specified as `dmc:domain_name-task_name` or, for manipulation
+tasks, as `dmc:manipulation-environment_name`. For `metaworld` tasks, we require the structure `metaworld:env_id-v2`. Our
+custom tasks and standard gym environments can be created without a prefix.
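+
+As an illustration of this naming scheme, the following sketch creates one environment from each suite. The task names
+below are examples (and assume the corresponding extras are installed); any other supported id works the same way.
+
+```python
+import fancy_gym
+
+# DeepMind Control task, prefixed with `dmc:`
+env_dmc = fancy_gym.make('dmc:ball_in_cup-catch', seed=1)
+# MetaWorld task, prefixed with `metaworld:` and versioned with `-v2`
+env_meta = fancy_gym.make('metaworld:button-press-v2', seed=1)
+# Custom fancy_gym task and standard gym task, no prefix required
+env_fancy = fancy_gym.make('Reacher5d-v0', seed=1)
+env_gym = fancy_gym.make('Reacher-v2', seed=1)
+```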

-Existing MP tasks can be created the same way as above. Just keep in mind, calling `step()` always executs a full
-trajectory.
+### Black-box Environments
+
+All environments provide the cumulative episode reward by default; this can, however, be changed if necessary. Optionally,
+each environment returns all collected information from each step as part of the infos. This information is, however,
+mainly meant for debugging as well as logging and not for training.
+
+|Key| Description|Type
+|---|---|---|
+`positions`| Generated position trajectory from the MP | Optional
+`velocities`| Generated velocity trajectory from the MP | Optional
+`step_actions`| Step-wise executed action based on controller output | Optional
+`step_observations`| Step-wise intermediate observations | Optional
+`step_rewards`| Step-wise rewards | Optional
+`trajectory_length`| Total number of environment interactions | Always
+`other`| All other information from the underlying environment are returned as a list with length `trajectory_length` maintaining the original key. In case some information are not provided every time step, the missing values are filled with `None`. | Always
+
+Existing MP tasks can be created the same way as above. Just keep in mind that calling `step()` executes a full trajectory.
+
+> **Note:**
+> Currently, we are also in the process of enabling replanning as well as learning of sub-trajectories.
+> This allows splitting the episode into multiple trajectories and is a hybrid setting between step-based and
+> black-box learning.
+> While this is already implemented, it is still in beta and requires further testing.
+> Feel free to try it and open an issue with any problems that occur.

```python
-import alr_envs
+import fancy_gym

-env = alr_envs.make('HoleReacherProMP-v0', seed=1)
-# render() can be called once in the beginning with all necessary arguments. To turn it of again just call render(None).
-env.render()
+env = fancy_gym.make('Reacher5dProMP-v0', seed=1)
+# render() can be called once in the beginning with all necessary arguments.
+# To turn it off again, just call render() without any arguments.
+env.render(mode='human')

-state = env.reset()
+# This returns the context information, not the full state observation
+obs = env.reset()

for i in range(5):
-    state, reward, done, info = env.step(env.action_space.sample())
+    action = env.action_space.sample()
+    obs, reward, done, info = env.step(action)

-    # Not really necessary as the environments resets itself after each trajectory anyway.
-    state = env.reset()
+    # Done is always True as we are working on the episode level, hence we always reset()
+    obs = env.reset()
```

-To show all available environments, we provide some additional convenience. Each value will return a dictionary with two
-keys `DMP` and `ProMP` that store a list of available environment names.
+To show all available environments, we provide some additional convenience variables. All of them return a dictionary
+with two keys `DMP` and `ProMP` that store a list of available environment ids.

```python
-import alr_envs
+import fancy_gym

-print("Custom MP tasks:")
-print(alr_envs.ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS)
+print("Fancy Black-box tasks:")
+print(fancy_gym.ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS)

-print("OpenAI Gym MP tasks:")
-print(alr_envs.ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS)
+print("OpenAI Gym Black-box tasks:")
+print(fancy_gym.ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS)

-print("Deepmind Control MP tasks:")
-print(alr_envs.ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS)
+print("Deepmind Control Black-box tasks:")
+print(fancy_gym.ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS)

-print("MetaWorld MP tasks:")
-print(alr_envs.ALL_METAWORLD_MOTION_PRIMITIVE_ENVIRONMENTS)
+print("MetaWorld Black-box tasks:")
+print(fancy_gym.ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS)
```
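+
+For instance, these can be used to quickly instantiate and sanity-check every available ProMP task (a small sketch;
+remember that a single `step()` call executes a whole trajectory):
+
+```python
+import fancy_gym
+
+for env_id in fancy_gym.ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS['ProMP']:
+    env = fancy_gym.make(env_id, seed=1)
+    env.reset()
+    obs, reward, done, info = env.step(env.action_space.sample())
+    print(env_id, reward, info['trajectory_length'])
+```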

### How to create a new MP task

In case a required task is not yet supported in the MP framework, it can be created relatively easily. For the task at
-hand, the following interface needs to be implemented.
+hand, the following [interface](fancy_gym/black_box/raw_interface_wrapper.py) needs to be implemented.

```python
+from abc import abstractmethod
+from typing import Union, Tuple
+
+import gym
import numpy as np

-from mp_env_api import MPEnvWrapper

-class MPWrapper(MPEnvWrapper):
+class RawInterfaceWrapper(gym.Wrapper):

    @property
-    def active_obs(self):
+    def context_mask(self) -> np.ndarray:
        """
-        Returns boolean mask for each substate in the full observation.
-        It determines whether the observation is returned for the contextual case or not.
-        This effectively allows to filter unwanted or unnecessary observations from the full step-based case.
-        E.g. Velocities starting at 0 are only changing after the first action. Given we only receive the first
-        observation, the velocities are not necessary in the observation for the MP task.
+        Returns a boolean mask of the same shape as the observation space.
+        It determines whether the observation is returned for the contextual case or not.
+        This effectively allows filtering unwanted or unnecessary observations from the full step-based case.
+        E.g. Velocities starting at 0 are only changing after the first action. Given we only receive the
+        context/part of the first observation, the velocities are not necessary in the observation for the task.
+        Returns:
+            boolean array indicating which entries of the observation are part of the context
+
        """
-        return np.ones(self.observation_space.shape, dtype=bool)
+        return np.ones(self.env.observation_space.shape[0], dtype=bool)

    @property
-    def current_vel(self):
+    @abstractmethod
+    def current_pos(self) -> Union[float, int, np.ndarray, Tuple]:
        """
-        Returns the current velocity of the action/control dimension.
+        Returns the current position of the action/control dimension.
        The dimensionality has to match the action/control dimension.
-        This is not required when exclusively using position control,
-        it should, however, be implemented regardless.
-        E.g. The joint velocities that are directly or indirectly controlled by the action.
-        """
-        raise NotImplementedError()
-
-    @property
-    def current_pos(self):
-        """
-        Returns the current position of the action/control dimension.
-        The dimensionality has to match the action/control dimension.
-        This is not required when exclusively using velocity control,
+        This is not required when exclusively using velocity control,
        it should, however, be implemented regardless.
        E.g. The joint positions that are directly or indirectly controlled by the action.
        """
        raise NotImplementedError()

    @property
-    def goal_pos(self):
+    @abstractmethod
+    def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
        """
-        Returns a predefined final position of the action/control dimension.
-        This is only required for the DMP and is most of the time learned instead.
-        """
-        raise NotImplementedError()
-
-    @property
-    def dt(self):
-        """
-        Returns the time between two simulated steps of the environment
+        Returns the current velocity of the action/control dimension.
+        The dimensionality has to match the action/control dimension.
+        This is not required when exclusively using position control,
+        it should, however, be implemented regardless.
+        E.g. The joint velocities that are directly or indirectly controlled by the action.
        """
        raise NotImplementedError()
```

-If you created a new task wrapper, feel free to open a PR, so we can integrate it for others to use as well.
-Without the integration the task can still be used. A rough outline can be shown here, for more details we recommend
-having a look at the [examples](alr_envs/examples/).
+If you created a new task wrapper, feel free to open a PR so we can integrate it for others to use as well. Without the
+integration the task can still be used. A rough outline is shown here; for more details, we recommend having a look
+at the [examples](fancy_gym/examples/).

```python
-import alr_envs
+import fancy_gym

# Base environment name, according to structure of above example
base_env_id = "ball_in_cup-catch"

-# Replace this wrapper with the custom wrapper for your environment by inheriting from the MPEnvWrapper.
+# Replace this wrapper with the custom wrapper for your environment by inheriting from the RawInterfaceWrapper.
# You can also add other gym.Wrappers in case they are needed,
# e.g. gym.wrappers.FlattenObservation for dict observations
-wrappers = [alr_envs.dmc.suite.ball_in_cup.MPWrapper]
-mp_kwargs = {...}
+wrappers = [fancy_gym.dmc.suite.ball_in_cup.MPWrapper]
kwargs = {...}
-env = alr_envs.make_dmp_env(base_env_id, wrappers=wrappers, seed=1, mp_kwargs=mp_kwargs, **kwargs)
-# OR for a deterministic ProMP (other mp_kwargs are required):
-# env = alr_envs.make_promp_env(base_env, wrappers=wrappers, seed=seed, mp_kwargs=mp_args)
+env = fancy_gym.make_bb(base_env_id, wrappers=wrappers, seed=0, **kwargs)

rewards = 0
obs = env.reset()
diff --git a/alr_envs/__init__.py b/alr_envs/__init__.py
deleted file mode 100644
index 858a66c..0000000
--- a/alr_envs/__init__.py
+++ /dev/null
@@ -1,15 +0,0 @@
-from alr_envs import dmc, meta, open_ai
-from alr_envs.utils.make_env_helpers import make, make_dmp_env, make_promp_env, make_rank
-from alr_envs.utils import make_dmc
-
-# Convenience function for all MP environments
-from .alr import ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS
-from .dmc import ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS
-from .meta import ALL_METAWORLD_MOTION_PRIMITIVE_ENVIRONMENTS
-from .open_ai import ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS
-
-ALL_MOTION_PRIMITIVE_ENVIRONMENTS = {
-    key: value + ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS[key] +
-         ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS[key] +
-         ALL_METAWORLD_MOTION_PRIMITIVE_ENVIRONMENTS[key]
-    for key, value in ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS.items()}
diff --git a/alr_envs/alr/__init__.py b/alr_envs/alr/__init__.py
deleted file mode 100644
index 8a7140d..0000000
--- a/alr_envs/alr/__init__.py
+++ /dev/null
@@ -1,499 +0,0 @@
-import numpy as np
-from gym import register
-
-from . 
import classic_control, mujoco -from .classic_control.hole_reacher.hole_reacher import HoleReacherEnv -from .classic_control.simple_reacher.simple_reacher import SimpleReacherEnv -from .classic_control.viapoint_reacher.viapoint_reacher import ViaPointReacherEnv -from .mujoco.ball_in_a_cup.ball_in_a_cup import ALRBallInACupEnv -from .mujoco.ball_in_a_cup.biac_pd import ALRBallInACupPDEnv -from .mujoco.reacher.alr_reacher import ALRReacherEnv -from .mujoco.reacher.balancing import BalancingEnv - -from .mujoco.table_tennis.tt_gym import MAX_EPISODE_STEPS - -ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS = {"DMP": [], "ProMP": []} - -# Classic Control -## Simple Reacher -register( - id='SimpleReacher-v0', - entry_point='alr_envs.alr.classic_control:SimpleReacherEnv', - max_episode_steps=200, - kwargs={ - "n_links": 2, - } -) - -register( - id='SimpleReacher-v1', - entry_point='alr_envs.alr.classic_control:SimpleReacherEnv', - max_episode_steps=200, - kwargs={ - "n_links": 2, - "random_start": False - } -) - -register( - id='LongSimpleReacher-v0', - entry_point='alr_envs.alr.classic_control:SimpleReacherEnv', - max_episode_steps=200, - kwargs={ - "n_links": 5, - } -) - -register( - id='LongSimpleReacher-v1', - entry_point='alr_envs.alr.classic_control:SimpleReacherEnv', - max_episode_steps=200, - kwargs={ - "n_links": 5, - "random_start": False - } -) - -## Viapoint Reacher - -register( - id='ViaPointReacher-v0', - entry_point='alr_envs.alr.classic_control:ViaPointReacherEnv', - max_episode_steps=200, - kwargs={ - "n_links": 5, - "allow_self_collision": False, - "collision_penalty": 1000 - } -) - -## Hole Reacher -register( - id='HoleReacher-v0', - entry_point='alr_envs.alr.classic_control:HoleReacherEnv', - max_episode_steps=200, - kwargs={ - "n_links": 5, - "random_start": True, - "allow_self_collision": False, - "allow_wall_collision": False, - "hole_width": None, - "hole_depth": 1, - "hole_x": None, - "collision_penalty": 100, - } -) - -register( - id='HoleReacher-v1', - entry_point='alr_envs.alr.classic_control:HoleReacherEnv', - max_episode_steps=200, - kwargs={ - "n_links": 5, - "random_start": False, - "allow_self_collision": False, - "allow_wall_collision": False, - "hole_width": 0.25, - "hole_depth": 1, - "hole_x": None, - "collision_penalty": 100, - } -) - -register( - id='HoleReacher-v2', - entry_point='alr_envs.alr.classic_control:HoleReacherEnv', - max_episode_steps=200, - kwargs={ - "n_links": 5, - "random_start": False, - "allow_self_collision": False, - "allow_wall_collision": False, - "hole_width": 0.25, - "hole_depth": 1, - "hole_x": 2, - "collision_penalty": 1, - } -) - -# Mujoco - -## Reacher -register( - id='ALRReacher-v0', - entry_point='alr_envs.alr.mujoco:ALRReacherEnv', - max_episode_steps=200, - kwargs={ - "steps_before_reward": 0, - "n_links": 5, - "balance": False, - } -) - -register( - id='ALRReacherSparse-v0', - entry_point='alr_envs.alr.mujoco:ALRReacherEnv', - max_episode_steps=200, - kwargs={ - "steps_before_reward": 200, - "n_links": 5, - "balance": False, - } -) - -register( - id='ALRReacherSparseBalanced-v0', - entry_point='alr_envs.alr.mujoco:ALRReacherEnv', - max_episode_steps=200, - kwargs={ - "steps_before_reward": 200, - "n_links": 5, - "balance": True, - } -) - -register( - id='ALRLongReacher-v0', - entry_point='alr_envs.alr.mujoco:ALRReacherEnv', - max_episode_steps=200, - kwargs={ - "steps_before_reward": 0, - "n_links": 7, - "balance": False, - } -) - -register( - id='ALRLongReacherSparse-v0', - entry_point='alr_envs.alr.mujoco:ALRReacherEnv', - 
max_episode_steps=200, - kwargs={ - "steps_before_reward": 200, - "n_links": 7, - "balance": False, - } -) - -register( - id='ALRLongReacherSparseBalanced-v0', - entry_point='alr_envs.alr.mujoco:ALRReacherEnv', - max_episode_steps=200, - kwargs={ - "steps_before_reward": 200, - "n_links": 7, - "balance": True, - } -) - -## Balancing Reacher - -register( - id='Balancing-v0', - entry_point='alr_envs.alr.mujoco:BalancingEnv', - max_episode_steps=200, - kwargs={ - "n_links": 5, - } -) - -## Table Tennis -register(id='TableTennis2DCtxt-v0', - entry_point='alr_envs.alr.mujoco:TTEnvGym', - max_episode_steps=MAX_EPISODE_STEPS, - kwargs={'ctxt_dim': 2}) - -register(id='TableTennis2DCtxt-v1', - entry_point='alr_envs.alr.mujoco:TTEnvGym', - max_episode_steps=MAX_EPISODE_STEPS, - kwargs={'ctxt_dim': 2, 'fixed_goal': True}) - -register(id='TableTennis4DCtxt-v0', - entry_point='alr_envs.alr.mujoco:TTEnvGym', - max_episode_steps=MAX_EPISODE_STEPS, - kwargs={'ctxt_dim': 4}) - -## BeerPong -difficulties = ["simple", "intermediate", "hard", "hardest"] - -for v, difficulty in enumerate(difficulties): - register( - id='ALRBeerPong-v{}'.format(v), - entry_point='alr_envs.alr.mujoco:ALRBeerBongEnv', - max_episode_steps=600, - kwargs={ - "difficulty": difficulty, - "reward_type": "staged", - } - ) - -# Motion Primitive Environments - -## Simple Reacher -_versions = ["SimpleReacher-v0", "SimpleReacher-v1", "LongSimpleReacher-v0", "LongSimpleReacher-v1"] -for _v in _versions: - _name = _v.split("-") - _env_id = f'{_name[0]}DMP-{_name[1]}' - register( - id=_env_id, - entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper', - # max_episode_steps=1, - kwargs={ - "name": f"alr_envs:{_v}", - "wrappers": [classic_control.simple_reacher.MPWrapper], - "mp_kwargs": { - "num_dof": 2 if "long" not in _v.lower() else 5, - "num_basis": 5, - "duration": 2, - "alpha_phase": 2, - "learn_goal": True, - "policy_type": "motor", - "weights_scale": 50, - "policy_kwargs": { - "p_gains": .6, - "d_gains": .075 - } - } - } - ) - ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id) - - _env_id = f'{_name[0]}ProMP-{_name[1]}' - register( - id=_env_id, - entry_point='alr_envs.utils.make_env_helpers:make_promp_env_helper', - kwargs={ - "name": f"alr_envs:{_v}", - "wrappers": [classic_control.simple_reacher.MPWrapper], - "mp_kwargs": { - "num_dof": 2 if "long" not in _v.lower() else 5, - "num_basis": 5, - "duration": 2, - "policy_type": "motor", - "weights_scale": 1, - "zero_start": True, - "policy_kwargs": { - "p_gains": .6, - "d_gains": .075 - } - } - } - ) - ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id) - -# Viapoint reacher -register( - id='ViaPointReacherDMP-v0', - entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper', - # max_episode_steps=1, - kwargs={ - "name": "alr_envs:ViaPointReacher-v0", - "wrappers": [classic_control.viapoint_reacher.MPWrapper], - "mp_kwargs": { - "num_dof": 5, - "num_basis": 5, - "duration": 2, - "learn_goal": True, - "alpha_phase": 2, - "policy_type": "velocity", - "weights_scale": 50, - } - } -) -ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"].append("ViaPointReacherDMP-v0") - -register( - id="ViaPointReacherProMP-v0", - entry_point='alr_envs.utils.make_env_helpers:make_promp_env_helper', - kwargs={ - "name": f"alr_envs:ViaPointReacher-v0", - "wrappers": [classic_control.viapoint_reacher.MPWrapper], - "mp_kwargs": { - "num_dof": 5, - "num_basis": 5, - "duration": 2, - "policy_type": "velocity", - "weights_scale": 1, - "zero_start": True - } - } -) 
-ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["ProMP"].append("ViaPointReacherProMP-v0") - -## Hole Reacher -_versions = ["v0", "v1", "v2"] -for _v in _versions: - _env_id = f'HoleReacherDMP-{_v}' - register( - id=_env_id, - entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper', - # max_episode_steps=1, - kwargs={ - "name": f"alr_envs:HoleReacher-{_v}", - "wrappers": [classic_control.hole_reacher.MPWrapper], - "mp_kwargs": { - "num_dof": 5, - "num_basis": 5, - "duration": 2, - "learn_goal": True, - "alpha_phase": 2.5, - "bandwidth_factor": 2, - "policy_type": "velocity", - "weights_scale": 50, - "goal_scale": 0.1 - } - } - ) - ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id) - - _env_id = f'HoleReacherProMP-{_v}' - register( - id=_env_id, - entry_point='alr_envs.utils.make_env_helpers:make_promp_env_helper', - kwargs={ - "name": f"alr_envs:HoleReacher-{_v}", - "wrappers": [classic_control.hole_reacher.MPWrapper], - "mp_kwargs": { - "num_dof": 5, - "num_basis": 5, - "duration": 2, - "policy_type": "velocity", - "weights_scale": 5, - "zero_start": True - } - } - ) - ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id) - -## ALRReacher -_versions = ["ALRReacher-v0", "ALRLongReacher-v0", "ALRReacherSparse-v0", "ALRLongReacherSparse-v0"] -for _v in _versions: - _name = _v.split("-") - _env_id = f'{_name[0]}DMP-{_name[1]}' - register( - id=_env_id, - entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper', - # max_episode_steps=1, - kwargs={ - "name": f"alr_envs:{_v}", - "wrappers": [mujoco.reacher.MPWrapper], - "mp_kwargs": { - "num_dof": 5 if "long" not in _v.lower() else 7, - "num_basis": 2, - "duration": 4, - "alpha_phase": 2, - "learn_goal": True, - "policy_type": "motor", - "weights_scale": 5, - "policy_kwargs": { - "p_gains": 1, - "d_gains": 0.1 - } - } - } - ) - ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id) - - _env_id = f'{_name[0]}ProMP-{_name[1]}' - register( - id=_env_id, - entry_point='alr_envs.utils.make_env_helpers:make_promp_env_helper', - kwargs={ - "name": f"alr_envs:{_v}", - "wrappers": [mujoco.reacher.MPWrapper], - "mp_kwargs": { - "num_dof": 5 if "long" not in _v.lower() else 7, - "num_basis": 1, - "duration": 4, - "policy_type": "motor", - "weights_scale": 5, - "zero_start": True, - "policy_kwargs": { - "p_gains": 1, - "d_gains": 0.1 - } - } - } - ) - ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id) - -## Beerpong -_versions = ["v0", "v1", "v2", "v3"] -for _v in _versions: - _env_id = f'BeerpongProMP-{_v}' - register( - id=_env_id, - entry_point='alr_envs.utils.make_env_helpers:make_promp_env_helper', - kwargs={ - "name": f"alr_envs:ALRBeerPong-{_v}", - "wrappers": [mujoco.beerpong.MPWrapper], - "mp_kwargs": { - "num_dof": 7, - "num_basis": 2, - "duration": 1, - "post_traj_time": 2, - "policy_type": "motor", - "weights_scale": 1, - "zero_start": True, - "zero_goal": False, - "policy_kwargs": { - "p_gains": np.array([ 1.5, 5, 2.55, 3, 2., 2, 1.25]), - "d_gains": np.array([0.02333333, 0.1, 0.0625, 0.08, 0.03, 0.03, 0.0125]) - } - } - } - ) - ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id) - -## Table Tennis -ctxt_dim = [2, 4] -for _v, cd in enumerate(ctxt_dim): - _env_id = f'TableTennisProMP-v{_v}' - register( - id=_env_id, - entry_point='alr_envs.utils.make_env_helpers:make_promp_env_helper', - kwargs={ - "name": "alr_envs:TableTennis{}DCtxt-v0".format(cd), - "wrappers": [mujoco.table_tennis.MPWrapper], - "mp_kwargs": { - "num_dof": 7, - "num_basis": 2, - "duration": 1.25, - "post_traj_time": 
4.5, - "policy_type": "motor", - "weights_scale": 1.0, - "zero_start": True, - "zero_goal": False, - "policy_kwargs": { - "p_gains": 0.5*np.array([1.0, 4.0, 2.0, 4.0, 1.0, 4.0, 1.0]), - "d_gains": 0.5*np.array([0.1, 0.4, 0.2, 0.4, 0.1, 0.4, 0.1]) - } - } - } - ) - ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id) - -register( - id='TableTennisProMP-v2', - entry_point='alr_envs.utils.make_env_helpers:make_promp_env_helper', - kwargs={ - "name": "alr_envs:TableTennis2DCtxt-v1", - "wrappers": [mujoco.table_tennis.MPWrapper], - "mp_kwargs": { - "num_dof": 7, - "num_basis": 2, - "duration": 1., - "post_traj_time": 2.5, - "policy_type": "motor", - "weights_scale": 1, - "off": -0.05, - "bandwidth_factor": 3.5, - "zero_start": True, - "zero_goal": False, - "policy_kwargs": { - "p_gains": 0.5*np.array([1.0, 4.0, 2.0, 4.0, 1.0, 4.0, 1.0]), - "d_gains": 0.5*np.array([0.1, 0.4, 0.2, 0.4, 0.1, 0.4, 0.1]) - } - } - } -) -ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["ProMP"].append("TableTennisProMP-v2") diff --git a/alr_envs/alr/classic_control/hole_reacher/mp_wrapper.py b/alr_envs/alr/classic_control/hole_reacher/mp_wrapper.py deleted file mode 100644 index feb545f..0000000 --- a/alr_envs/alr/classic_control/hole_reacher/mp_wrapper.py +++ /dev/null @@ -1,43 +0,0 @@ -from typing import Tuple, Union - -import numpy as np - -from mp_env_api import MPEnvWrapper - - -class MPWrapper(MPEnvWrapper): - @property - def active_obs(self): - return np.hstack([ - [self.env.random_start] * self.env.n_links, # cos - [self.env.random_start] * self.env.n_links, # sin - [self.env.random_start] * self.env.n_links, # velocity - [self.env.initial_width is None], # hole width - # [self.env.hole_depth is None], # hole depth - [True] * 2, # x-y coordinates of target distance - [False] # env steps - ]) - - # @property - # def current_pos(self) -> Union[float, int, np.ndarray, Tuple]: - # return self._joint_angles.copy() - # - # @property - # def current_vel(self) -> Union[float, int, np.ndarray, Tuple]: - # return self._angle_velocity.copy() - - @property - def current_pos(self) -> Union[float, int, np.ndarray, Tuple]: - return self.env.current_pos - - @property - def current_vel(self) -> Union[float, int, np.ndarray, Tuple]: - return self.env.current_vel - - @property - def goal_pos(self) -> Union[float, int, np.ndarray, Tuple]: - raise ValueError("Goal position is not available and has to be learnt based on the environment.") - - @property - def dt(self) -> Union[float, int]: - return self.env.dt diff --git a/alr_envs/alr/mujoco/__init__.py b/alr_envs/alr/mujoco/__init__.py deleted file mode 100644 index cdb3cde..0000000 --- a/alr_envs/alr/mujoco/__init__.py +++ /dev/null @@ -1,6 +0,0 @@ -from .reacher.alr_reacher import ALRReacherEnv -from .reacher.balancing import BalancingEnv -from .ball_in_a_cup.ball_in_a_cup import ALRBallInACupEnv -from .ball_in_a_cup.biac_pd import ALRBallInACupPDEnv -from .table_tennis.tt_gym import TTEnvGym -from .beerpong.beerpong import ALRBeerBongEnv \ No newline at end of file diff --git a/alr_envs/alr/mujoco/alr_reward_fct.py b/alr_envs/alr/mujoco/alr_reward_fct.py deleted file mode 100644 index c96dd07..0000000 --- a/alr_envs/alr/mujoco/alr_reward_fct.py +++ /dev/null @@ -1,21 +0,0 @@ -class AlrReward: - """ - A base class for non-Markovian reward functions which may need trajectory information to calculate an episodic - reward. Call the methods in reset() and step() of the environment. 
- """ - - # methods to override: - # ---------------------------- - def reset(self, *args, **kwargs): - """ - Reset the reward function, empty state buffers before an episode, set contexts that influence reward, etc. - """ - raise NotImplementedError - - def compute_reward(self, *args, **kwargs): - """ - - Returns: Useful things to return are reward values, success flags or crash flags - - """ - raise NotImplementedError diff --git a/alr_envs/alr/mujoco/ball_in_a_cup/assets/biac_base.xml b/alr_envs/alr/mujoco/ball_in_a_cup/assets/biac_base.xml deleted file mode 100644 index 9229ad5..0000000 --- a/alr_envs/alr/mujoco/ball_in_a_cup/assets/biac_base.xml +++ /dev/null @@ -1,361 +0,0 @@ - - - diff --git a/alr_envs/alr/mujoco/ball_in_a_cup/ball_in_a_cup.py b/alr_envs/alr/mujoco/ball_in_a_cup/ball_in_a_cup.py deleted file mode 100644 index 7cff670..0000000 --- a/alr_envs/alr/mujoco/ball_in_a_cup/ball_in_a_cup.py +++ /dev/null @@ -1,196 +0,0 @@ -from gym import utils -import os -import numpy as np -from gym.envs.mujoco import MujocoEnv - - - -class ALRBallInACupEnv(MujocoEnv, utils.EzPickle): - def __init__(self, n_substeps=4, apply_gravity_comp=True, simplified: bool = False, - reward_type: str = None, context: np.ndarray = None): - utils.EzPickle.__init__(**locals()) - self._steps = 0 - - self.xml_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "assets", "biac_base.xml") - - self._q_pos = [] - self._q_vel = [] - # self.weight_matrix_scale = 50 - self.max_ctrl = np.array([150., 125., 40., 60., 5., 5., 2.]) - - self.j_min = np.array([-2.6, -1.985, -2.8, -0.9, -4.55, -1.5707, -2.7]) - self.j_max = np.array([2.6, 1.985, 2.8, 3.14159, 1.25, 1.5707, 2.7]) - - self.context = context - - alr_mujoco_env.AlrMujocoEnv.__init__(self, - self.xml_path, - apply_gravity_comp=apply_gravity_comp, - n_substeps=n_substeps) - self._start_pos = np.array([0.0, 0.58760536, 0.0, 1.36004913, 0.0, -0.32072943, -1.57]) - self._start_vel = np.zeros(7) - - self.simplified = simplified - - self.sim_time = 8 # seconds - self.sim_steps = int(self.sim_time / self.dt) - if reward_type == "no_context": - from alr_envs.alr.mujoco.ball_in_a_cup.ball_in_a_cup_reward_simple import BallInACupReward - reward_function = BallInACupReward - elif reward_type == "contextual_goal": - from alr_envs.alr.mujoco.ball_in_a_cup.ball_in_a_cup_reward import BallInACupReward - reward_function = BallInACupReward - else: - raise ValueError("Unknown reward type: {}".format(reward_type)) - self.reward_function = reward_function(self.sim_steps) - - @property - def start_pos(self): - if self.simplified: - return self._start_pos[1::2] - else: - return self._start_pos - - @property - def start_vel(self): - if self.simplified: - return self._start_vel[1::2] - else: - return self._start_vel - - @property - def current_pos(self): - return self.sim.data.qpos[0:7].copy() - - @property - def current_vel(self): - return self.sim.data.qvel[0:7].copy() - - def reset(self): - self.reward_function.reset(None) - return super().reset() - - def reset_model(self): - init_pos_all = self.init_qpos.copy() - init_pos_robot = self._start_pos - init_vel = np.zeros_like(init_pos_all) - - self._steps = 0 - self._q_pos = [] - self._q_vel = [] - - start_pos = init_pos_all - start_pos[0:7] = init_pos_robot - - self.set_state(start_pos, init_vel) - - return self._get_obs() - - def step(self, a): - reward_dist = 0.0 - angular_vel = 0.0 - reward_ctrl = - np.square(a).sum() - - crash = self.do_simulation(a) - # joint_cons_viol = self.check_traj_in_joint_limits() - - 
self._q_pos.append(self.sim.data.qpos[0:7].ravel().copy()) - self._q_vel.append(self.sim.data.qvel[0:7].ravel().copy()) - - ob = self._get_obs() - - if not crash: - reward, success, is_collided = self.reward_function.compute_reward(a, self) - done = success or self._steps == self.sim_steps - 1 or is_collided - self._steps += 1 - else: - reward = -2000 - success = False - is_collided = False - done = True - return ob, reward, done, dict(reward_dist=reward_dist, - reward_ctrl=reward_ctrl, - velocity=angular_vel, - # traj=self._q_pos, - action=a, - q_pos=self.sim.data.qpos[0:7].ravel().copy(), - q_vel=self.sim.data.qvel[0:7].ravel().copy(), - is_success=success, - is_collided=is_collided, sim_crash=crash) - - def check_traj_in_joint_limits(self): - return any(self.current_pos > self.j_max) or any(self.current_pos < self.j_min) - - # TODO: extend observation space - def _get_obs(self): - theta = self.sim.data.qpos.flat[:7] - return np.concatenate([ - np.cos(theta), - np.sin(theta), - # self.get_body_com("target"), # only return target to make problem harder - [self._steps], - ]) - - # TODO - @property - def active_obs(self): - return np.hstack([ - [False] * 7, # cos - [False] * 7, # sin - # [True] * 2, # x-y coordinates of target distance - [False] # env steps - ]) - - # These functions are for the task with 3 joint actuations - def extend_des_pos(self, des_pos): - des_pos_full = self._start_pos.copy() - des_pos_full[1] = des_pos[0] - des_pos_full[3] = des_pos[1] - des_pos_full[5] = des_pos[2] - return des_pos_full - - def extend_des_vel(self, des_vel): - des_vel_full = self._start_vel.copy() - des_vel_full[1] = des_vel[0] - des_vel_full[3] = des_vel[1] - des_vel_full[5] = des_vel[2] - return des_vel_full - - def render(self, render_mode, **render_kwargs): - if render_mode == "plot_trajectory": - if self._steps == 1: - import matplotlib.pyplot as plt - # plt.ion() - self.fig, self.axs = plt.subplots(3, 1) - - if self._steps <= 1750: - for ax, cp in zip(self.axs, self.current_pos[1::2]): - ax.scatter(self._steps, cp, s=2, marker=".") - - # self.fig.show() - - else: - super().render(render_mode, **render_kwargs) - - -if __name__ == "__main__": - env = ALRBallInACupEnv() - ctxt = np.array([-0.20869846, -0.66376693, 1.18088501]) - - env.configure(ctxt) - env.reset() - # env.render() - for i in range(16000): - # test with random actions - ac = 0.001 * env.action_space.sample()[0:7] - # ac = env.start_pos - # ac[0] += np.pi/2 - obs, rew, d, info = env.step(ac) - # env.render() - - print(rew) - - if d: - break - - env.close() diff --git a/alr_envs/alr/mujoco/ball_in_a_cup/ball_in_a_cup_mp_wrapper.py b/alr_envs/alr/mujoco/ball_in_a_cup/ball_in_a_cup_mp_wrapper.py deleted file mode 100644 index 945fa8d..0000000 --- a/alr_envs/alr/mujoco/ball_in_a_cup/ball_in_a_cup_mp_wrapper.py +++ /dev/null @@ -1,42 +0,0 @@ -from typing import Tuple, Union - -import numpy as np - -from mp_env_api import MPEnvWrapper - - -class BallInACupMPWrapper(MPEnvWrapper): - - @property - def active_obs(self): - # TODO: @Max Filter observations correctly - return np.hstack([ - [False] * 7, # cos - [False] * 7, # sin - # [True] * 2, # x-y coordinates of target distance - [False] # env steps - ]) - - @property - def start_pos(self): - if self.simplified: - return self._start_pos[1::2] - else: - return self._start_pos - - @property - def current_pos(self) -> Union[float, int, np.ndarray, Tuple]: - return self.sim.data.qpos[0:7].copy() - - @property - def current_vel(self) -> Union[float, int, np.ndarray, Tuple]: - return 
self.sim.data.qvel[0:7].copy() - - @property - def goal_pos(self): - # TODO: @Max I think the default value of returning to the start is reasonable here - raise ValueError("Goal position is not available and has to be learnt based on the environment.") - - @property - def dt(self) -> Union[float, int]: - return self.env.dt diff --git a/alr_envs/alr/mujoco/ball_in_a_cup/ball_in_a_cup_reward.py b/alr_envs/alr/mujoco/ball_in_a_cup/ball_in_a_cup_reward.py deleted file mode 100644 index 4ea4381..0000000 --- a/alr_envs/alr/mujoco/ball_in_a_cup/ball_in_a_cup_reward.py +++ /dev/null @@ -1,142 +0,0 @@ -import numpy as np -from alr_envs.alr.mujoco import alr_reward_fct - - -class BallInACupReward(alr_reward_fct.AlrReward): - def __init__(self, sim_time): - self.sim_time = sim_time - - self.collision_objects = ["cup_geom1", "cup_geom2", "wrist_palm_link_convex_geom", - "wrist_pitch_link_convex_decomposition_p1_geom", - "wrist_pitch_link_convex_decomposition_p2_geom", - "wrist_pitch_link_convex_decomposition_p3_geom", - "wrist_yaw_link_convex_decomposition_p1_geom", - "wrist_yaw_link_convex_decomposition_p2_geom", - "forearm_link_convex_decomposition_p1_geom", - "forearm_link_convex_decomposition_p2_geom"] - - self.ball_id = None - self.ball_collision_id = None - self.goal_id = None - self.goal_final_id = None - self.collision_ids = None - - self.ball_traj = None - self.dists = None - self.dists_ctxt = None - self.dists_final = None - self.costs = None - - self.reset(None) - - def reset(self, context): - self.ball_traj = np.zeros(shape=(self.sim_time, 3)) - self.cup_traj = np.zeros(shape=(self.sim_time, 3)) - self.dists = [] - self.dists_ctxt = [] - self.dists_final = [] - self.costs = [] - self.context = context - self.ball_in_cup = False - self.ball_above_threshold = False - self.dist_ctxt = 3 - self.action_costs = [] - self.cup_angles = [] - - def compute_reward(self, action, sim, step): - action_cost = np.sum(np.square(action)) - self.action_costs.append(action_cost) - - stop_sim = False - success = False - - self.ball_id = sim.model._body_name2id["ball"] - self.ball_collision_id = sim.model._geom_name2id["ball_geom"] - self.goal_id = sim.model._site_name2id["cup_goal"] - self.goal_final_id = sim.model._site_name2id["cup_goal_final"] - self.collision_ids = [sim.model._geom_name2id[name] for name in self.collision_objects] - - if self.check_collision(sim): - reward = - 1e-3 * action_cost - 1000 - stop_sim = True - return reward, success, stop_sim - - # Compute the current distance from the ball to the inner part of the cup - goal_pos = sim.data.site_xpos[self.goal_id] - ball_pos = sim.data.body_xpos[self.ball_id] - goal_final_pos = sim.data.site_xpos[self.goal_final_id] - self.dists.append(np.linalg.norm(goal_pos - ball_pos)) - self.dists_final.append(np.linalg.norm(goal_final_pos - ball_pos)) - self.dists_ctxt.append(np.linalg.norm(ball_pos - self.context)) - self.ball_traj[step, :] = np.copy(ball_pos) - self.cup_traj[step, :] = np.copy(goal_pos) # ? 
- cup_quat = np.copy(sim.data.body_xquat[sim.model._body_name2id["cup"]]) - self.cup_angles.append(np.arctan2(2 * (cup_quat[0] * cup_quat[1] + cup_quat[2] * cup_quat[3]), - 1 - 2 * (cup_quat[1] ** 2 + cup_quat[2] ** 2))) - - # Determine the first time when ball is in cup - if not self.ball_in_cup: - ball_in_cup = self.check_ball_in_cup(sim, self.ball_collision_id) - self.ball_in_cup = ball_in_cup - if ball_in_cup: - dist_to_ctxt = np.linalg.norm(ball_pos - self.context) - self.dist_ctxt = dist_to_ctxt - - if step == self.sim_time - 1: - t_min_dist = np.argmin(self.dists) - angle_min_dist = self.cup_angles[t_min_dist] - cost_angle = (angle_min_dist - np.pi / 2) ** 2 - - min_dist = np.min(self.dists) - dist_final = self.dists_final[-1] - # dist_ctxt = self.dists_ctxt[-1] - - # # max distance between ball and cup and cup height at that time - # ball_to_cup_diff = self.ball_traj[:, 2] - self.cup_traj[:, 2] - # t_max_diff = np.argmax(ball_to_cup_diff) - # t_max_ball_height = np.argmax(self.ball_traj[:, 2]) - # max_ball_height = np.max(self.ball_traj[:, 2]) - - # cost = self._get_stage_wise_cost(ball_in_cup, min_dist, dist_final, dist_ctxt) - cost = 0.5 * min_dist + 0.5 * dist_final + 0.3 * np.minimum(self.dist_ctxt, 3) + 0.01 * cost_angle - reward = np.exp(-2 * cost) - 1e-3 * action_cost - # if max_ball_height < self.context[2] or ball_to_cup_diff[t_max_ball_height] < 0: - # reward -= 1 - - success = dist_final < 0.05 and self.dist_ctxt < 0.05 - else: - reward = - 1e-3 * action_cost - success = False - - return reward, success, stop_sim - - def _get_stage_wise_cost(self, ball_in_cup, min_dist, dist_final, dist_to_ctxt): - if not ball_in_cup: - cost = 3 + 2*(0.5 * min_dist**2 + 0.5 * dist_final**2) - else: - cost = 2 * dist_to_ctxt ** 2 - print('Context Distance:', dist_to_ctxt) - return cost - - def check_ball_in_cup(self, sim, ball_collision_id): - cup_base_collision_id = sim.model._geom_name2id["cup_base_contact"] - for coni in range(0, sim.data.ncon): - con = sim.data.contact[coni] - - collision = con.geom1 == cup_base_collision_id and con.geom2 == ball_collision_id - collision_trans = con.geom1 == ball_collision_id and con.geom2 == cup_base_collision_id - - if collision or collision_trans: - return True - return False - - def check_collision(self, sim): - for coni in range(0, sim.data.ncon): - con = sim.data.contact[coni] - - collision = con.geom1 in self.collision_ids and con.geom2 == self.ball_collision_id - collision_trans = con.geom1 == self.ball_collision_id and con.geom2 in self.collision_ids - - if collision or collision_trans: - return True - return False diff --git a/alr_envs/alr/mujoco/ball_in_a_cup/ball_in_a_cup_reward_simple.py b/alr_envs/alr/mujoco/ball_in_a_cup/ball_in_a_cup_reward_simple.py deleted file mode 100644 index a147d89..0000000 --- a/alr_envs/alr/mujoco/ball_in_a_cup/ball_in_a_cup_reward_simple.py +++ /dev/null @@ -1,116 +0,0 @@ -import numpy as np -from alr_envs.alr.mujoco import alr_reward_fct - - -class BallInACupReward(alr_reward_fct.AlrReward): - def __init__(self, env): - self.env = env - self.collision_objects = ["cup_geom1", "cup_geom2", "cup_base_contact_below", - "wrist_palm_link_convex_geom", - "wrist_pitch_link_convex_decomposition_p1_geom", - "wrist_pitch_link_convex_decomposition_p2_geom", - "wrist_pitch_link_convex_decomposition_p3_geom", - "wrist_yaw_link_convex_decomposition_p1_geom", - "wrist_yaw_link_convex_decomposition_p2_geom", - "forearm_link_convex_decomposition_p1_geom", - "forearm_link_convex_decomposition_p2_geom"] - - self.ball_id = None 
- self.ball_collision_id = None - self.goal_id = None - self.goal_final_id = None - self.collision_ids = None - self._is_collided = False - self.collision_penalty = 1000 - - self.ball_traj = None - self.dists = None - self.dists_final = None - self.costs = None - - self.reset(None) - - def reset(self, context): - # self.sim_time = self.env.sim.dtsim_time - self.ball_traj = [] # np.zeros(shape=(self.sim_time, 3)) - self.dists = [] - self.dists_final = [] - self.costs = [] - self.action_costs = [] - self.angle_costs = [] - self.cup_angles = [] - - def compute_reward(self, action): - self.ball_id = self.env.sim.model._body_name2id["ball"] - self.ball_collision_id = self.env.sim.model._geom_name2id["ball_geom"] - self.goal_id = self.env.sim.model._site_name2id["cup_goal"] - self.goal_final_id = self.env.sim.model._site_name2id["cup_goal_final"] - self.collision_ids = [self.env.sim.model._geom_name2id[name] for name in self.collision_objects] - - ball_in_cup = self.check_ball_in_cup(self.env.sim, self.ball_collision_id) - - # Compute the current distance from the ball to the inner part of the cup - goal_pos = self.env.sim.data.site_xpos[self.goal_id] - ball_pos = self.env.sim.data.body_xpos[self.ball_id] - goal_final_pos = self.env.sim.data.site_xpos[self.goal_final_id] - self.dists.append(np.linalg.norm(goal_pos - ball_pos)) - self.dists_final.append(np.linalg.norm(goal_final_pos - ball_pos)) - # self.ball_traj[self.env._steps, :] = ball_pos - self.ball_traj.append(ball_pos) - cup_quat = np.copy(self.env.sim.data.body_xquat[self.env.sim.model._body_name2id["cup"]]) - cup_angle = np.arctan2(2 * (cup_quat[0] * cup_quat[1] + cup_quat[2] * cup_quat[3]), - 1 - 2 * (cup_quat[1]**2 + cup_quat[2]**2)) - cost_angle = (cup_angle - np.pi / 2) ** 2 - self.angle_costs.append(cost_angle) - self.cup_angles.append(cup_angle) - - action_cost = np.sum(np.square(action)) - self.action_costs.append(action_cost) - - self._is_collided = self.check_collision(self.env.sim) or self.env.check_traj_in_joint_limits() - - if self.env._steps == self.env.ep_length - 1 or self._is_collided: - t_min_dist = np.argmin(self.dists) - angle_min_dist = self.cup_angles[t_min_dist] - # cost_angle = (angle_min_dist - np.pi / 2)**2 - - - # min_dist = self.dists[t_min_dist] - dist_final = self.dists_final[-1] - min_dist_final = np.min(self.dists_final) - - # cost = 0.5 * dist_final + 0.05 * cost_angle # TODO: Increase cost_angle weight # 0.5 * min_dist + - # reward = np.exp(-2 * cost) - 1e-2 * action_cost - self.collision_penalty * int(self._is_collided) - # reward = - dist_final**2 - 1e-4 * cost_angle - 1e-5 * action_cost - self.collision_penalty * int(self._is_collided) - reward = - dist_final**2 - min_dist_final**2 - 1e-4 * cost_angle - 1e-3 * action_cost - self.collision_penalty * int(self._is_collided) - success = dist_final < 0.05 and ball_in_cup and not self._is_collided - crash = self._is_collided - else: - reward = - 1e-3 * action_cost - 1e-4 * cost_angle # TODO: increase action_cost weight - success = False - crash = False - - return reward, success, crash - - def check_ball_in_cup(self, sim, ball_collision_id): - cup_base_collision_id = sim.model._geom_name2id["cup_base_contact"] - for coni in range(0, sim.data.ncon): - con = sim.data.contact[coni] - - collision = con.geom1 == cup_base_collision_id and con.geom2 == ball_collision_id - collision_trans = con.geom1 == ball_collision_id and con.geom2 == cup_base_collision_id - - if collision or collision_trans: - return True - return False - - def check_collision(self, sim): - 
for coni in range(0, sim.data.ncon): - con = sim.data.contact[coni] - - collision = con.geom1 in self.collision_ids and con.geom2 == self.ball_collision_id - collision_trans = con.geom1 == self.ball_collision_id and con.geom2 in self.collision_ids - - if collision or collision_trans: - return True - return False diff --git a/alr_envs/alr/mujoco/ball_in_a_cup/biac_pd.py b/alr_envs/alr/mujoco/ball_in_a_cup/biac_pd.py deleted file mode 100644 index b047c54..0000000 --- a/alr_envs/alr/mujoco/ball_in_a_cup/biac_pd.py +++ /dev/null @@ -1,205 +0,0 @@ -import os - -import gym.envs.mujoco -import gym.envs.mujoco as mujoco_env -import mujoco_py.builder -import numpy as np -from gym import utils - -from mp_env_api.mp_wrappers.detpmp_wrapper import DetPMPWrapper -from mp_env_api.utils.policies import PDControllerExtend - - -def make_detpmp_env(**kwargs): - name = kwargs.pop("name") - _env = gym.make(name) - policy = PDControllerExtend(_env, p_gains=kwargs.pop('p_gains'), d_gains=kwargs.pop('d_gains')) - kwargs['policy_type'] = policy - return DetPMPWrapper(_env, **kwargs) - - -class ALRBallInACupPDEnv(mujoco_env.MujocoEnv, utils.EzPickle): - def __init__(self, frame_skip=4, apply_gravity_comp=True, simplified: bool = False, - reward_type: str = None, context: np.ndarray = None): - utils.EzPickle.__init__(**locals()) - self._steps = 0 - - self.xml_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "assets", "biac_base.xml") - - self.max_ctrl = np.array([150., 125., 40., 60., 5., 5., 2.]) - - self.j_min = np.array([-2.6, -1.985, -2.8, -0.9, -4.55, -1.5707, -2.7]) - self.j_max = np.array([2.6, 1.985, 2.8, 3.14159, 1.25, 1.5707, 2.7]) - - self.context = context - self.apply_gravity_comp = apply_gravity_comp - self.simplified = simplified - - self._start_pos = np.array([0.0, 0.58760536, 0.0, 1.36004913, 0.0, -0.32072943, -1.57]) - self._start_vel = np.zeros(7) - - self.sim_time = 8 # seconds - self._dt = 0.02 - self.ep_length = 4000 # based on 8 seconds with dt = 0.02 int(self.sim_time / self.dt) - if reward_type == "no_context": - from alr_envs.alr.mujoco.ball_in_a_cup.ball_in_a_cup_reward_simple import BallInACupReward - reward_function = BallInACupReward - elif reward_type == "contextual_goal": - from alr_envs.alr.mujoco.ball_in_a_cup.ball_in_a_cup_reward import BallInACupReward - reward_function = BallInACupReward - else: - raise ValueError("Unknown reward type: {}".format(reward_type)) - self.reward_function = reward_function(self) - - mujoco_env.MujocoEnv.__init__(self, self.xml_path, frame_skip) - - @property - def dt(self): - return self._dt - - # TODO: @Max is this even needed? 
- @property - def start_vel(self): - if self.simplified: - return self._start_vel[1::2] - else: - return self._start_vel - - # def _set_action_space(self): - # if self.simplified: - # bounds = self.model.actuator_ctrlrange.copy().astype(np.float32)[1::2] - # else: - # bounds = self.model.actuator_ctrlrange.copy().astype(np.float32) - # low, high = bounds.T - # self.action_space = spaces.Box(low=low, high=high, dtype=np.float32) - # return self.action_space - - def reset(self): - self.reward_function.reset(None) - return super().reset() - - def reset_model(self): - init_pos_all = self.init_qpos.copy() - init_pos_robot = self._start_pos - init_vel = np.zeros_like(init_pos_all) - - self._steps = 0 - self._q_pos = [] - self._q_vel = [] - - start_pos = init_pos_all - start_pos[0:7] = init_pos_robot - - self.set_state(start_pos, init_vel) - - return self._get_obs() - - def step(self, a): - reward_dist = 0.0 - angular_vel = 0.0 - reward_ctrl = - np.square(a).sum() - - # if self.simplified: - # tmp = np.zeros(7) - # tmp[1::2] = a - # a = tmp - - if self.apply_gravity_comp: - a += self.sim.data.qfrc_bias[:len(a)].copy() / self.model.actuator_gear[:, 0] - - crash = False - try: - self.do_simulation(a, self.frame_skip) - except mujoco_py.builder.MujocoException: - crash = True - # joint_cons_viol = self.check_traj_in_joint_limits() - - ob = self._get_obs() - - if not crash: - reward, success, is_collided = self.reward_function.compute_reward(a) - done = success or is_collided # self._steps == self.sim_steps - 1 - self._steps += 1 - else: - reward = -2000 - success = False - is_collided = False - done = True - - return ob, reward, done, dict(reward_dist=reward_dist, - reward_ctrl=reward_ctrl, - velocity=angular_vel, - # traj=self._q_pos, - action=a, - q_pos=self.sim.data.qpos[0:7].ravel().copy(), - q_vel=self.sim.data.qvel[0:7].ravel().copy(), - is_success=success, - is_collided=is_collided, sim_crash=crash) - - def check_traj_in_joint_limits(self): - return any(self.current_pos > self.j_max) or any(self.current_pos < self.j_min) - - # TODO: extend observation space - def _get_obs(self): - theta = self.sim.data.qpos.flat[:7] - return np.concatenate([ - np.cos(theta), - np.sin(theta), - # self.get_body_com("target"), # only return target to make problem harder - [self._steps], - ]) - - # These functions are for the task with 3 joint actuations - def extend_des_pos(self, des_pos): - des_pos_full = self._start_pos.copy() - des_pos_full[1] = des_pos[0] - des_pos_full[3] = des_pos[1] - des_pos_full[5] = des_pos[2] - return des_pos_full - - def extend_des_vel(self, des_vel): - des_vel_full = self._start_vel.copy() - des_vel_full[1] = des_vel[0] - des_vel_full[3] = des_vel[1] - des_vel_full[5] = des_vel[2] - return des_vel_full - - def render(self, render_mode, **render_kwargs): - if render_mode == "plot_trajectory": - if self._steps == 1: - import matplotlib.pyplot as plt - # plt.ion() - self.fig, self.axs = plt.subplots(3, 1) - - if self._steps <= 1750: - for ax, cp in zip(self.axs, self.current_pos[1::2]): - ax.scatter(self._steps, cp, s=2, marker=".") - - # self.fig.show() - - else: - super().render(render_mode, **render_kwargs) - - -if __name__ == "__main__": - env = ALRBallInACupPDEnv(reward_type="no_context", simplified=True) - # env = gym.make("alr_envs:ALRBallInACupPDSimpleDetPMP-v0") - # ctxt = np.array([-0.20869846, -0.66376693, 1.18088501]) - - # env.configure(ctxt) - env.reset() - env.render("human") - for i in range(16000): - # test with random actions - ac = 0.02 * 
env.action_space.sample()[0:7] - # ac = env.start_pos - # ac[0] += np.pi/2 - obs, rew, d, info = env.step(ac) - env.render("human") - - print(rew) - - if d: - break - - env.close() diff --git a/alr_envs/alr/mujoco/beerpong/beerpong.py b/alr_envs/alr/mujoco/beerpong/beerpong.py deleted file mode 100644 index 755710a..0000000 --- a/alr_envs/alr/mujoco/beerpong/beerpong.py +++ /dev/null @@ -1,193 +0,0 @@ -import mujoco_py.builder -import os - -import numpy as np -from gym import utils -from gym.envs.mujoco import MujocoEnv - - -class ALRBeerBongEnv(MujocoEnv, utils.EzPickle): - def __init__(self, frame_skip=1, apply_gravity_comp=True, reward_type: str = "staged", noisy=False, - context: np.ndarray = None, difficulty='simple'): - self._steps = 0 - - self.xml_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "assets", - "beerpong_wo_cup" + ".xml") - - self.j_min = np.array([-2.6, -1.985, -2.8, -0.9, -4.55, -1.5707, -2.7]) - self.j_max = np.array([2.6, 1.985, 2.8, 3.14159, 1.25, 1.5707, 2.7]) - - self.context = context - self.apply_gravity_comp = apply_gravity_comp - self.add_noise = noisy - - self._start_pos = np.array([0.0, 1.35, 0.0, 1.18, 0.0, -0.786, -1.59]) - self._start_vel = np.zeros(7) - - self.ball_site_id = 0 - self.ball_id = 11 - - self._release_step = 175 # time step of ball release - - self.sim_time = 3 # seconds - self.ep_length = 600 # based on 3 seconds with dt = 0.005 int(self.sim_time / self.dt) - self.cup_table_id = 10 - - if noisy: - self.noise_std = 0.01 - else: - self.noise_std = 0 - - if difficulty == 'simple': - self.cup_goal_pos = np.array([0, -1.7, 0.840]) - elif difficulty == 'intermediate': - self.cup_goal_pos = np.array([0.3, -1.5, 0.840]) - elif difficulty == 'hard': - self.cup_goal_pos = np.array([-0.3, -2.2, 0.840]) - elif difficulty == 'hardest': - self.cup_goal_pos = np.array([-0.3, -1.2, 0.840]) - - if reward_type == "no_context": - from alr_envs.alr.mujoco.beerpong.beerpong_reward import BeerPongReward - reward_function = BeerPongReward - elif reward_type == "staged": - from alr_envs.alr.mujoco.beerpong.beerpong_reward_staged import BeerPongReward - reward_function = BeerPongReward - else: - raise ValueError("Unknown reward type: {}".format(reward_type)) - self.reward_function = reward_function() - - MujocoEnv.__init__(self, self.xml_path, frame_skip) - utils.EzPickle.__init__(self) - - @property - def start_pos(self): - return self._start_pos - - @property - def start_vel(self): - return self._start_vel - - @property - def current_pos(self): - return self.sim.data.qpos[0:7].copy() - - @property - def current_vel(self): - return self.sim.data.qvel[0:7].copy() - - def reset(self): - self.reward_function.reset(self.add_noise) - return super().reset() - - def reset_model(self): - init_pos_all = self.init_qpos.copy() - init_pos_robot = self.start_pos - init_vel = np.zeros_like(init_pos_all) - - self._steps = 0 - - start_pos = init_pos_all - start_pos[0:7] = init_pos_robot - - self.set_state(start_pos, init_vel) - self.sim.model.body_pos[self.cup_table_id] = self.cup_goal_pos - start_pos[7::] = self.sim.data.site_xpos[self.ball_site_id, :].copy() - self.set_state(start_pos, init_vel) - return self._get_obs() - - def step(self, a): - reward_dist = 0.0 - angular_vel = 0.0 - reward_ctrl = - np.square(a).sum() - - if self.apply_gravity_comp: - a = a + self.sim.data.qfrc_bias[:len(a)].copy() / self.model.actuator_gear[:, 0] - try: - self.do_simulation(a, self.frame_skip) - if self._steps < self._release_step: - self.sim.data.qpos[7::] = 
self.sim.data.site_xpos[self.ball_site_id, :].copy() - self.sim.data.qvel[7::] = self.sim.data.site_xvelp[self.ball_site_id, :].copy() - elif self._steps == self._release_step and self.add_noise: - self.sim.data.qvel[7::] += self.noise_std * np.random.randn(3) - crash = False - except mujoco_py.builder.MujocoException: - crash = True - # joint_cons_viol = self.check_traj_in_joint_limits() - - ob = self._get_obs() - - if not crash: - reward, reward_infos = self.reward_function.compute_reward(self, a) - success = reward_infos['success'] - is_collided = reward_infos['is_collided'] - ball_pos = reward_infos['ball_pos'] - ball_vel = reward_infos['ball_vel'] - done = is_collided or self._steps == self.ep_length - 1 - self._steps += 1 - else: - reward = -30 - reward_infos = dict() - success = False - is_collided = False - done = True - ball_pos = np.zeros(3) - ball_vel = np.zeros(3) - - infos = dict(reward_dist=reward_dist, - reward_ctrl=reward_ctrl, - reward=reward, - velocity=angular_vel, - # traj=self._q_pos, - action=a, - q_pos=self.sim.data.qpos[0:7].ravel().copy(), - q_vel=self.sim.data.qvel[0:7].ravel().copy(), - ball_pos=ball_pos, - ball_vel=ball_vel, - success=success, - is_collided=is_collided, sim_crash=crash) - infos.update(reward_infos) - - return ob, reward, done, infos - - def check_traj_in_joint_limits(self): - return any(self.current_pos > self.j_max) or any(self.current_pos < self.j_min) - - # TODO: extend observation space - def _get_obs(self): - theta = self.sim.data.qpos.flat[:7] - return np.concatenate([ - np.cos(theta), - np.sin(theta), - # self.get_body_com("target"), # only return target to make problem harder - [self._steps], - ]) - - # TODO - @property - def active_obs(self): - return np.hstack([ - [False] * 7, # cos - [False] * 7, # sin - # [True] * 2, # x-y coordinates of target distance - [False] # env steps - ]) - - -if __name__ == "__main__": - env = ALRBeerBongEnv(reward_type="staged", difficulty='hardest') - - # env.configure(ctxt) - env.reset() - env.render("human") - for i in range(800): - ac = 10 * env.action_space.sample()[0:7] - obs, rew, d, info = env.step(ac) - env.render("human") - - print(rew) - - if d: - break - - env.close() diff --git a/alr_envs/alr/mujoco/beerpong/beerpong_reward.py b/alr_envs/alr/mujoco/beerpong/beerpong_reward.py deleted file mode 100644 index dc39ca8..0000000 --- a/alr_envs/alr/mujoco/beerpong/beerpong_reward.py +++ /dev/null @@ -1,171 +0,0 @@ -import numpy as np - - -class BeerPongReward: - def __init__(self): - - self.robot_collision_objects = ["wrist_palm_link_convex_geom", - "wrist_pitch_link_convex_decomposition_p1_geom", - "wrist_pitch_link_convex_decomposition_p2_geom", - "wrist_pitch_link_convex_decomposition_p3_geom", - "wrist_yaw_link_convex_decomposition_p1_geom", - "wrist_yaw_link_convex_decomposition_p2_geom", - "forearm_link_convex_decomposition_p1_geom", - "forearm_link_convex_decomposition_p2_geom", - "upper_arm_link_convex_decomposition_p1_geom", - "upper_arm_link_convex_decomposition_p2_geom", - "shoulder_link_convex_decomposition_p1_geom", - "shoulder_link_convex_decomposition_p2_geom", - "shoulder_link_convex_decomposition_p3_geom", - "base_link_convex_geom", "table_contact_geom"] - - self.cup_collision_objects = ["cup_geom_table3", "cup_geom_table4", "cup_geom_table5", "cup_geom_table6", - "cup_geom_table7", "cup_geom_table8", "cup_geom_table9", "cup_geom_table10", - # "cup_base_table", "cup_base_table_contact", - "cup_geom_table15", - "cup_geom_table16", - "cup_geom_table17", "cup_geom1_table8", - # 
"cup_base_table_contact", - # "cup_base_table" - ] - - - self.ball_traj = None - self.dists = None - self.dists_final = None - self.costs = None - self.action_costs = None - self.angle_rewards = None - self.cup_angles = None - self.cup_z_axes = None - self.collision_penalty = 500 - self.reset(None) - - def reset(self, context): - self.ball_traj = [] - self.dists = [] - self.dists_final = [] - self.costs = [] - self.action_costs = [] - self.angle_rewards = [] - self.cup_angles = [] - self.cup_z_axes = [] - self.ball_ground_contact = False - self.ball_table_contact = False - self.ball_wall_contact = False - self.ball_cup_contact = False - - def compute_reward(self, env, action): - self.ball_id = env.sim.model._body_name2id["ball"] - self.ball_collision_id = env.sim.model._geom_name2id["ball_geom"] - self.goal_id = env.sim.model._site_name2id["cup_goal_table"] - self.goal_final_id = env.sim.model._site_name2id["cup_goal_final_table"] - self.cup_collision_ids = [env.sim.model._geom_name2id[name] for name in self.cup_collision_objects] - self.cup_table_id = env.sim.model._body_name2id["cup_table"] - self.table_collision_id = env.sim.model._geom_name2id["table_contact_geom"] - self.wall_collision_id = env.sim.model._geom_name2id["wall"] - self.cup_table_collision_id = env.sim.model._geom_name2id["cup_base_table_contact"] - self.init_ball_pos_site_id = env.sim.model._site_name2id["init_ball_pos_site"] - self.ground_collision_id = env.sim.model._geom_name2id["ground"] - self.robot_collision_ids = [env.sim.model._geom_name2id[name] for name in self.robot_collision_objects] - - goal_pos = env.sim.data.site_xpos[self.goal_id] - ball_pos = env.sim.data.body_xpos[self.ball_id] - ball_vel = env.sim.data.body_xvelp[self.ball_id] - goal_final_pos = env.sim.data.site_xpos[self.goal_final_id] - self.dists.append(np.linalg.norm(goal_pos - ball_pos)) - self.dists_final.append(np.linalg.norm(goal_final_pos - ball_pos)) - - action_cost = np.sum(np.square(action)) - self.action_costs.append(action_cost) - - ball_table_bounce = self._check_collision_single_objects(env.sim, self.ball_collision_id, - self.table_collision_id) - - if ball_table_bounce: # or ball_cup_table_cont or ball_wall_con - self.ball_table_contact = True - - ball_cup_cont = self._check_collision_with_set_of_objects(env.sim, self.ball_collision_id, - self.cup_collision_ids) - if ball_cup_cont: - self.ball_cup_contact = True - - ball_wall_cont = self._check_collision_single_objects(env.sim, self.ball_collision_id, self.wall_collision_id) - if ball_wall_cont and not self.ball_table_contact: - self.ball_wall_contact = True - - ball_ground_contact = self._check_collision_single_objects(env.sim, self.ball_collision_id, - self.ground_collision_id) - if ball_ground_contact and not self.ball_table_contact: - self.ball_ground_contact = True - - self._is_collided = self._check_collision_with_itself(env.sim, self.robot_collision_ids) - if env._steps == env.ep_length - 1 or self._is_collided: - - min_dist = np.min(self.dists) - - ball_in_cup = self._check_collision_single_objects(env.sim, self.ball_collision_id, self.cup_table_collision_id) - - cost_offset = 0 - - if self.ball_ground_contact: # or self.ball_wall_contact: - cost_offset += 2 - - if not self.ball_table_contact: - cost_offset += 2 - - if not ball_in_cup: - cost_offset += 2 - cost = cost_offset + min_dist ** 2 + 0.5 * self.dists_final[-1] ** 2 + 1e-4 * action_cost # + min_dist ** 2 - else: - if self.ball_cup_contact: - cost_offset += 1 - cost = cost_offset + self.dists_final[-1] ** 2 + 1e-4 * 
action_cost - - reward = - 1*cost - self.collision_penalty * int(self._is_collided) - success = ball_in_cup and not self.ball_ground_contact and not self.ball_wall_contact and not self.ball_cup_contact - else: - reward = - 1e-4 * action_cost - success = False - - infos = {} - infos["success"] = success - infos["is_collided"] = self._is_collided - infos["ball_pos"] = ball_pos.copy() - infos["ball_vel"] = ball_vel.copy() - infos["action_cost"] = 5e-4 * action_cost - - return reward, infos - - def _check_collision_single_objects(self, sim, id_1, id_2): - for coni in range(0, sim.data.ncon): - con = sim.data.contact[coni] - - collision = con.geom1 == id_1 and con.geom2 == id_2 - collision_trans = con.geom1 == id_2 and con.geom2 == id_1 - - if collision or collision_trans: - return True - return False - - def _check_collision_with_itself(self, sim, collision_ids): - col_1, col_2 = False, False - for j, id in enumerate(collision_ids): - col_1 = self._check_collision_with_set_of_objects(sim, id, collision_ids[:j]) - if j != len(collision_ids) - 1: - col_2 = self._check_collision_with_set_of_objects(sim, id, collision_ids[j + 1:]) - else: - col_2 = False - collision = True if col_1 or col_2 else False - return collision - - def _check_collision_with_set_of_objects(self, sim, id_1, id_list): - for coni in range(0, sim.data.ncon): - con = sim.data.contact[coni] - - collision = con.geom1 in id_list and con.geom2 == id_1 - collision_trans = con.geom1 == id_1 and con.geom2 in id_list - - if collision or collision_trans: - return True - return False \ No newline at end of file diff --git a/alr_envs/alr/mujoco/beerpong/beerpong_reward_simple.py b/alr_envs/alr/mujoco/beerpong/beerpong_reward_simple.py deleted file mode 100644 index fbe2163..0000000 --- a/alr_envs/alr/mujoco/beerpong/beerpong_reward_simple.py +++ /dev/null @@ -1,141 +0,0 @@ -import numpy as np -from alr_envs.alr.mujoco import alr_reward_fct - - -class BeerpongReward(alr_reward_fct.AlrReward): - def __init__(self, sim, sim_time): - - self.sim = sim - self.sim_time = sim_time - - self.collision_objects = ["cup_geom1", "cup_geom2", "wrist_palm_link_convex_geom", - "wrist_pitch_link_convex_decomposition_p1_geom", - "wrist_pitch_link_convex_decomposition_p2_geom", - "wrist_pitch_link_convex_decomposition_p3_geom", - "wrist_yaw_link_convex_decomposition_p1_geom", - "wrist_yaw_link_convex_decomposition_p2_geom", - "forearm_link_convex_decomposition_p1_geom", - "forearm_link_convex_decomposition_p2_geom"] - - self.ball_id = None - self.ball_collision_id = None - self.goal_id = None - self.goal_final_id = None - self.collision_ids = None - - self.ball_traj = None - self.dists = None - self.dists_ctxt = None - self.dists_final = None - self.costs = None - - self.reset(None) - - def reset(self, context): - self.ball_traj = np.zeros(shape=(self.sim_time, 3)) - self.dists = [] - self.dists_ctxt = [] - self.dists_final = [] - self.costs = [] - self.action_costs = [] - self.context = context - self.ball_in_cup = False - self.dist_ctxt = 5 - self.bounce_dist = 2 - self.min_dist = 2 - self.dist_final = 2 - self.table_contact = False - - self.ball_id = self.sim.model._body_name2id["ball"] - self.ball_collision_id = self.sim.model._geom_name2id["ball_geom"] - self.cup_robot_id = self.sim.model._site_name2id["cup_robot_final"] - self.goal_id = self.sim.model._site_name2id["cup_goal_table"] - self.goal_final_id = self.sim.model._site_name2id["cup_goal_final_table"] - self.collision_ids = [self.sim.model._geom_name2id[name] for name in self.collision_objects] - 
self.cup_table_id = self.sim.model._body_name2id["cup_table"] - self.bounce_table_id = self.sim.model._site_name2id["bounce_table"] - - def compute_reward(self, action, sim, step): - action_cost = np.sum(np.square(action)) - self.action_costs.append(action_cost) - - stop_sim = False - success = False - - if self.check_collision(sim): - reward = - 1e-2 * action_cost - 10 - stop_sim = True - return reward, success, stop_sim - - # Compute the current distance from the ball to the inner part of the cup - goal_pos = sim.data.site_xpos[self.goal_id] - ball_pos = sim.data.body_xpos[self.ball_id] - bounce_pos = sim.data.site_xpos[self.bounce_table_id] - goal_final_pos = sim.data.site_xpos[self.goal_final_id] - self.dists.append(np.linalg.norm(goal_pos - ball_pos)) - self.dists_final.append(np.linalg.norm(goal_final_pos - ball_pos)) - self.ball_traj[step, :] = ball_pos - - ball_in_cup = self.check_ball_in_cup(sim, self.ball_collision_id) - table_contact = self.check_ball_table_contact(sim, self.ball_collision_id) - - if table_contact and not self.table_contact: - self.bounce_dist = np.minimum((np.linalg.norm(bounce_pos - ball_pos)), 2) - self.table_contact = True - - if step == self.sim_time - 1: - min_dist = np.min(self.dists) - self.min_dist = min_dist - dist_final = self.dists_final[-1] - self.dist_final = dist_final - - cost = 0.33 * min_dist + 0.33 * dist_final + 0.33 * self.bounce_dist - reward = np.exp(-2 * cost) - 1e-2 * action_cost - success = self.bounce_dist < 0.05 and dist_final < 0.05 and ball_in_cup - else: - reward = - 1e-2 * action_cost - success = False - - return reward, success, stop_sim - - def _get_stage_wise_cost(self, ball_in_cup, min_dist, dist_final, dist_to_ctxt): - if not ball_in_cup: - cost = 3 + 2*(0.5 * min_dist**2 + 0.5 * dist_final**2) - else: - cost = 2 * dist_to_ctxt ** 2 - print('Context Distance:', dist_to_ctxt) - return cost - - def check_ball_table_contact(self, sim, ball_collision_id): - table_collision_id = sim.model._geom_name2id["table_contact_geom"] - for coni in range(0, sim.data.ncon): - con = sim.data.contact[coni] - collision = con.geom1 == table_collision_id and con.geom2 == ball_collision_id - collision_trans = con.geom1 == ball_collision_id and con.geom2 == table_collision_id - - if collision or collision_trans: - return True - return False - - def check_ball_in_cup(self, sim, ball_collision_id): - cup_base_collision_id = sim.model._geom_name2id["cup_base_table_contact"] - for coni in range(0, sim.data.ncon): - con = sim.data.contact[coni] - - collision = con.geom1 == cup_base_collision_id and con.geom2 == ball_collision_id - collision_trans = con.geom1 == ball_collision_id and con.geom2 == cup_base_collision_id - - if collision or collision_trans: - return True - return False - - def check_collision(self, sim): - for coni in range(0, sim.data.ncon): - con = sim.data.contact[coni] - - collision = con.geom1 in self.collision_ids and con.geom2 == self.ball_collision_id - collision_trans = con.geom1 == self.ball_collision_id and con.geom2 in self.collision_ids - - if collision or collision_trans: - return True - return False diff --git a/alr_envs/alr/mujoco/beerpong/beerpong_reward_staged.py b/alr_envs/alr/mujoco/beerpong/beerpong_reward_staged.py deleted file mode 100644 index e94b470..0000000 --- a/alr_envs/alr/mujoco/beerpong/beerpong_reward_staged.py +++ /dev/null @@ -1,158 +0,0 @@ -import numpy as np - - -class BeerPongReward: - def __init__(self): - - self.robot_collision_objects = ["wrist_palm_link_convex_geom", - 
"wrist_pitch_link_convex_decomposition_p1_geom", - "wrist_pitch_link_convex_decomposition_p2_geom", - "wrist_pitch_link_convex_decomposition_p3_geom", - "wrist_yaw_link_convex_decomposition_p1_geom", - "wrist_yaw_link_convex_decomposition_p2_geom", - "forearm_link_convex_decomposition_p1_geom", - "forearm_link_convex_decomposition_p2_geom", - "upper_arm_link_convex_decomposition_p1_geom", - "upper_arm_link_convex_decomposition_p2_geom", - "shoulder_link_convex_decomposition_p1_geom", - "shoulder_link_convex_decomposition_p2_geom", - "shoulder_link_convex_decomposition_p3_geom", - "base_link_convex_geom", "table_contact_geom"] - - self.cup_collision_objects = ["cup_geom_table3", "cup_geom_table4", "cup_geom_table5", "cup_geom_table6", - "cup_geom_table7", "cup_geom_table8", "cup_geom_table9", "cup_geom_table10", - # "cup_base_table", "cup_base_table_contact", - "cup_geom_table15", - "cup_geom_table16", - "cup_geom_table17", "cup_geom1_table8", - # "cup_base_table_contact", - # "cup_base_table" - ] - - - self.ball_traj = None - self.dists = None - self.dists_final = None - self.costs = None - self.action_costs = None - self.angle_rewards = None - self.cup_angles = None - self.cup_z_axes = None - self.collision_penalty = 500 - self.reset(None) - - def reset(self, noisy): - self.ball_traj = [] - self.dists = [] - self.dists_final = [] - self.costs = [] - self.action_costs = [] - self.angle_rewards = [] - self.cup_angles = [] - self.cup_z_axes = [] - self.ball_ground_contact = False - self.ball_table_contact = False - self.ball_wall_contact = False - self.ball_cup_contact = False - self.noisy_bp = noisy - self._t_min_final_dist = -1 - - def compute_reward(self, env, action): - self.ball_id = env.sim.model._body_name2id["ball"] - self.ball_collision_id = env.sim.model._geom_name2id["ball_geom"] - self.goal_id = env.sim.model._site_name2id["cup_goal_table"] - self.goal_final_id = env.sim.model._site_name2id["cup_goal_final_table"] - self.cup_collision_ids = [env.sim.model._geom_name2id[name] for name in self.cup_collision_objects] - self.cup_table_id = env.sim.model._body_name2id["cup_table"] - self.table_collision_id = env.sim.model._geom_name2id["table_contact_geom"] - self.wall_collision_id = env.sim.model._geom_name2id["wall"] - self.cup_table_collision_id = env.sim.model._geom_name2id["cup_base_table_contact"] - self.init_ball_pos_site_id = env.sim.model._site_name2id["init_ball_pos_site"] - self.ground_collision_id = env.sim.model._geom_name2id["ground"] - self.robot_collision_ids = [env.sim.model._geom_name2id[name] for name in self.robot_collision_objects] - - goal_pos = env.sim.data.site_xpos[self.goal_id] - ball_pos = env.sim.data.body_xpos[self.ball_id] - ball_vel = env.sim.data.body_xvelp[self.ball_id] - goal_final_pos = env.sim.data.site_xpos[self.goal_final_id] - self.dists.append(np.linalg.norm(goal_pos - ball_pos)) - self.dists_final.append(np.linalg.norm(goal_final_pos - ball_pos)) - - action_cost = np.sum(np.square(action)) - self.action_costs.append(action_cost) - - if not self.ball_table_contact: - self.ball_table_contact = self._check_collision_single_objects(env.sim, self.ball_collision_id, - self.table_collision_id) - - self._is_collided = self._check_collision_with_itself(env.sim, self.robot_collision_ids) - if env._steps == env.ep_length - 1 or self._is_collided: - - min_dist = np.min(self.dists) - final_dist = self.dists_final[-1] - - ball_in_cup = self._check_collision_single_objects(env.sim, self.ball_collision_id, - self.cup_table_collision_id) - - # encourage bounce 
before falling into cup - if not ball_in_cup: - if not self.ball_table_contact: - reward = 0.2 * (1 - np.tanh(min_dist ** 2)) + 0.1 * (1 - np.tanh(final_dist ** 2)) - else: - reward = (1 - np.tanh(min_dist ** 2)) + 0.5 * (1 - np.tanh(final_dist ** 2)) - else: - if not self.ball_table_contact: - reward = 2 * (1 - np.tanh(final_dist ** 2)) + 1 * (1 - np.tanh(min_dist ** 2)) + 1 - else: - reward = 2 * (1 - np.tanh(final_dist ** 2)) + 1 * (1 - np.tanh(min_dist ** 2)) + 3 - - # reward = - 1 * cost - self.collision_penalty * int(self._is_collided) - success = ball_in_cup - crash = self._is_collided - else: - reward = - 1e-2 * action_cost - success = False - crash = False - - infos = {} - infos["success"] = success - infos["is_collided"] = self._is_collided - infos["ball_pos"] = ball_pos.copy() - infos["ball_vel"] = ball_vel.copy() - infos["action_cost"] = action_cost - infos["task_reward"] = reward - - return reward, infos - - def _check_collision_single_objects(self, sim, id_1, id_2): - for coni in range(0, sim.data.ncon): - con = sim.data.contact[coni] - - collision = con.geom1 == id_1 and con.geom2 == id_2 - collision_trans = con.geom1 == id_2 and con.geom2 == id_1 - - if collision or collision_trans: - return True - return False - - def _check_collision_with_itself(self, sim, collision_ids): - col_1, col_2 = False, False - for j, id in enumerate(collision_ids): - col_1 = self._check_collision_with_set_of_objects(sim, id, collision_ids[:j]) - if j != len(collision_ids) - 1: - col_2 = self._check_collision_with_set_of_objects(sim, id, collision_ids[j + 1:]) - else: - col_2 = False - collision = True if col_1 or col_2 else False - return collision - - def _check_collision_with_set_of_objects(self, sim, id_1, id_list): - for coni in range(0, sim.data.ncon): - con = sim.data.contact[coni] - - collision = con.geom1 in id_list and con.geom2 == id_1 - collision_trans = con.geom1 == id_1 and con.geom2 in id_list - - if collision or collision_trans: - return True - return False \ No newline at end of file diff --git a/alr_envs/alr/mujoco/beerpong/beerpong_simple.py b/alr_envs/alr/mujoco/beerpong/beerpong_simple.py deleted file mode 100644 index 1708d38..0000000 --- a/alr_envs/alr/mujoco/beerpong/beerpong_simple.py +++ /dev/null @@ -1,166 +0,0 @@ -from gym import utils -import os -import numpy as np -from gym.envs.mujoco import MujocoEnv - - -class ALRBeerpongEnv(MujocoEnv, utils.EzPickle): - def __init__(self, n_substeps=4, apply_gravity_comp=True, reward_function=None): - self._steps = 0 - - self.xml_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "assets", - "beerpong" + ".xml") - - self.start_pos = np.array([0.0, 1.35, 0.0, 1.18, 0.0, -0.786, -1.59]) - self.start_vel = np.zeros(7) - - self._q_pos = [] - self._q_vel = [] - # self.weight_matrix_scale = 50 - self.max_ctrl = np.array([150., 125., 40., 60., 5., 5., 2.]) - self.p_gains = 1 / self.max_ctrl * np.array([200, 300, 100, 100, 10, 10, 2.5]) - self.d_gains = 1 / self.max_ctrl * np.array([7, 15, 5, 2.5, 0.3, 0.3, 0.05]) - - self.j_min = np.array([-2.6, -1.985, -2.8, -0.9, -4.55, -1.5707, -2.7]) - self.j_max = np.array([2.6, 1.985, 2.8, 3.14159, 1.25, 1.5707, 2.7]) - - self.context = None - - # alr_mujoco_env.AlrMujocoEnv.__init__(self, - # self.xml_path, - # apply_gravity_comp=apply_gravity_comp, - # n_substeps=n_substeps) - - self.sim_time = 8 # seconds - # self.sim_steps = int(self.sim_time / self.dt) - if reward_function is None: - from alr_envs.alr.mujoco.beerpong.beerpong_reward_simple import BeerpongReward - reward_function 
= BeerpongReward - self.reward_function = reward_function(self.sim, self.sim_steps) - self.cup_robot_id = self.sim.model._site_name2id["cup_robot_final"] - self.ball_id = self.sim.model._body_name2id["ball"] - self.cup_table_id = self.sim.model._body_name2id["cup_table"] - # self.bounce_table_id = self.sim.model._body_name2id["bounce_table"] - - MujocoEnv.__init__(self, model_path=self.xml_path, frame_skip=n_substeps) - utils.EzPickle.__init__(self) - - @property - def current_pos(self): - return self.sim.data.qpos[0:7].copy() - - @property - def current_vel(self): - return self.sim.data.qvel[0:7].copy() - - def configure(self, context): - if context is None: - context = np.array([0, -2, 0.840]) - self.context = context - self.reward_function.reset(context) - - def reset_model(self): - init_pos_all = self.init_qpos.copy() - init_pos_robot = self.start_pos - init_vel = np.zeros_like(init_pos_all) - - self._steps = 0 - self._q_pos = [] - self._q_vel = [] - - start_pos = init_pos_all - start_pos[0:7] = init_pos_robot - # start_pos[7:] = np.copy(self.sim.data.site_xpos[self.cup_robot_id, :]) + np.array([0., 0.0, 0.05]) - - self.set_state(start_pos, init_vel) - - ball_pos = np.copy(self.sim.data.site_xpos[self.cup_robot_id, :]) + np.array([0., 0.0, 0.05]) - self.sim.model.body_pos[self.ball_id] = ball_pos.copy() - self.sim.model.body_pos[self.cup_table_id] = self.context.copy() - # self.sim.model.body_pos[self.bounce_table_id] = self.context.copy() - - self.sim.forward() - - return self._get_obs() - - def step(self, a): - reward_dist = 0.0 - angular_vel = 0.0 - reward_ctrl = - np.square(a).sum() - action_cost = np.sum(np.square(a)) - - crash = self.do_simulation(a, self.frame_skip) - joint_cons_viol = self.check_traj_in_joint_limits() - - self._q_pos.append(self.sim.data.qpos[0:7].ravel().copy()) - self._q_vel.append(self.sim.data.qvel[0:7].ravel().copy()) - - ob = self._get_obs() - - if not crash and not joint_cons_viol: - reward, success, stop_sim = self.reward_function.compute_reward(a, self.sim, self._steps) - done = success or self._steps == self.sim_steps - 1 or stop_sim - self._steps += 1 - else: - reward = -10 - 1e-2 * action_cost - success = False - done = True - return ob, reward, done, dict(reward_dist=reward_dist, - reward_ctrl=reward_ctrl, - velocity=angular_vel, - traj=self._q_pos, is_success=success, - is_collided=crash or joint_cons_viol) - - def check_traj_in_joint_limits(self): - return any(self.current_pos > self.j_max) or any(self.current_pos < self.j_min) - - def extend_des_pos(self, des_pos): - des_pos_full = self.start_pos.copy() - des_pos_full[1] = des_pos[0] - des_pos_full[3] = des_pos[1] - des_pos_full[5] = des_pos[2] - return des_pos_full - - def extend_des_vel(self, des_vel): - des_vel_full = self.start_vel.copy() - des_vel_full[1] = des_vel[0] - des_vel_full[3] = des_vel[1] - des_vel_full[5] = des_vel[2] - return des_vel_full - - def _get_obs(self): - theta = self.sim.data.qpos.flat[:7] - return np.concatenate([ - np.cos(theta), - np.sin(theta), - # self.get_body_com("target"), # only return target to make problem harder - [self._steps], - ]) - - - -if __name__ == "__main__": - env = ALRBeerpongEnv() - ctxt = np.array([0, -2, 0.840]) # initial - - env.configure(ctxt) - env.reset() - env.render() - for i in range(16000): - # test with random actions - ac = 0.0 * env.action_space.sample()[0:7] - ac[1] = -0.01 - ac[3] = - 0.01 - ac[5] = -0.01 - # ac = env.start_pos - # ac[0] += np.pi/2 - obs, rew, d, info = env.step(ac) - env.render() - - print(rew) - - if d: - break - 
- env.close() - diff --git a/alr_envs/alr/mujoco/beerpong/mp_wrapper.py b/alr_envs/alr/mujoco/beerpong/mp_wrapper.py deleted file mode 100644 index 87c6b7e..0000000 --- a/alr_envs/alr/mujoco/beerpong/mp_wrapper.py +++ /dev/null @@ -1,39 +0,0 @@ -from typing import Tuple, Union - -import numpy as np - -from mp_env_api.interface_wrappers.mp_env_wrapper import MPEnvWrapper - - -class MPWrapper(MPEnvWrapper): - - @property - def active_obs(self): - # TODO: @Max Filter observations correctly - return np.hstack([ - [False] * 7, # cos - [False] * 7, # sin - # [True] * 2, # x-y coordinates of target distance - [False] # env steps - ]) - - @property - def start_pos(self): - return self._start_pos - - @property - def current_pos(self) -> Union[float, int, np.ndarray, Tuple]: - return self.sim.data.qpos[0:7].copy() - - @property - def current_vel(self) -> Union[float, int, np.ndarray, Tuple]: - return self.sim.data.qvel[0:7].copy() - - @property - def goal_pos(self): - # TODO: @Max I think the default value of returning to the start is reasonable here - raise ValueError("Goal position is not available and has to be learnt based on the environment.") - - @property - def dt(self) -> Union[float, int]: - return self.env.dt diff --git a/alr_envs/alr/mujoco/gym_table_tennis/__init__.py b/alr_envs/alr/mujoco/gym_table_tennis/__init__.py deleted file mode 100644 index 8b13789..0000000 --- a/alr_envs/alr/mujoco/gym_table_tennis/__init__.py +++ /dev/null @@ -1 +0,0 @@ - diff --git a/alr_envs/alr/mujoco/gym_table_tennis/envs/__init__.py b/alr_envs/alr/mujoco/gym_table_tennis/envs/__init__.py deleted file mode 100644 index 8b13789..0000000 --- a/alr_envs/alr/mujoco/gym_table_tennis/envs/__init__.py +++ /dev/null @@ -1 +0,0 @@ - diff --git a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/include_7_motor_actuator.xml b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/include_7_motor_actuator.xml deleted file mode 100644 index 7772d14..0000000 --- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/include_7_motor_actuator.xml +++ /dev/null @@ -1,12 +0,0 @@ - - - - - - - - - - - - diff --git a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/include_barrett_wam_7dof_left.xml b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/include_barrett_wam_7dof_left.xml deleted file mode 100644 index a8df915..0000000 --- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/include_barrett_wam_7dof_left.xml +++ /dev/null @@ -1,76 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/include_barrett_wam_7dof_right.xml b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/include_barrett_wam_7dof_right.xml deleted file mode 100644 index 011b95a..0000000 --- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/include_barrett_wam_7dof_right.xml +++ /dev/null @@ -1,95 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/include_table.xml b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/include_table.xml deleted file mode 100644 index ad1ae35..0000000 --- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/include_table.xml +++ /dev/null @@ -1,38 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git 
a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/include_target_ball.xml b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/include_target_ball.xml deleted file mode 100644 index eb2b347..0000000 --- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/include_target_ball.xml +++ /dev/null @@ -1,10 +0,0 @@ - - - - - - - - - \ No newline at end of file diff --git a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/include_test_balls.xml b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/include_test_balls.xml deleted file mode 100644 index 29a21e1..0000000 --- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/include_test_balls.xml +++ /dev/null @@ -1,80 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/right_arm_actuator.xml b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/right_arm_actuator.xml deleted file mode 100644 index 9abf102..0000000 --- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/right_arm_actuator.xml +++ /dev/null @@ -1,19 +0,0 @@ - - - - - - - - - - - - - - - - - - - diff --git a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/shared.xml b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/shared.xml deleted file mode 100644 index dfbc37a..0000000 --- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/shared.xml +++ /dev/null @@ -1,49 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/table_tennis_env.xml b/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/table_tennis_env.xml deleted file mode 100644 index f2432bb..0000000 --- a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/table_tennis_env.xml +++ /dev/null @@ -1,41 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/alr_envs/alr/mujoco/gym_table_tennis/envs/table_tennis_env.py b/alr_envs/alr/mujoco/gym_table_tennis/envs/table_tennis_env.py deleted file mode 100644 index 9b16ae1..0000000 --- a/alr_envs/alr/mujoco/gym_table_tennis/envs/table_tennis_env.py +++ /dev/null @@ -1,244 +0,0 @@ -import numpy as np -from gym import spaces -from gym.envs.robotics import robot_env, utils -# import xml.etree.ElementTree as ET -from alr_envs.alr.mujoco.gym_table_tennis.utils.rewards.hierarchical_reward import HierarchicalRewardTableTennis -import glfw -from alr_envs.alr.mujoco.gym_table_tennis.utils.experiment import ball_initialize -from pathlib import Path -import os - - -class TableTennisEnv(robot_env.RobotEnv): - """Class for Table Tennis environment. - """ - def __init__(self, n_substeps=1, - model_path=None, - initial_qpos=None, - initial_ball_state=None, - config=None, - reward_obj=None - ): - """Initializes a new mujoco based Table Tennis environment. 
- - Args: - model_path (string): path to the environments XML file - initial_qpos (dict): a dictionary of joint names and values that define the initial - n_actions: Number of joints - n_substeps (int): number of substeps the simulation runs on every call to step - scale (double): limit maximum change in position - initial_ball_state: to reset the ball state - """ - # self.config = config.config - if model_path is None: - path_cws = Path.cwd() - print(path_cws) - current_dir = Path(os.path.split(os.path.realpath(__file__))[0]) - table_tennis_env_xml_path = current_dir / "assets"/"table_tennis_env.xml" - model_path = str(table_tennis_env_xml_path) - self.config = config - action_space = True # self.config['trajectory']['args']['action_space'] - time_step = 0.002 # self.config['mujoco_sim_env']['args']["time_step"] - if initial_qpos is None: - initial_qpos = {"wam/base_yaw_joint_right": 1.5, - "wam/shoulder_pitch_joint_right": 1, - "wam/shoulder_yaw_joint_right": 0, - "wam/elbow_pitch_joint_right": 1, - "wam/wrist_yaw_joint_right": 1, - "wam/wrist_pitch_joint_right": 0, - "wam/palm_yaw_joint_right": 0} - # initial_qpos = [1.5, 1, 0, 1, 1, 0, 0] # self.config['robot_config']['args']['initial_qpos'] - - # TODO should read all configuration in config - assert initial_qpos is not None, "Must initialize the initial q position of robot arm" - n_actions = 7 - self.initial_qpos_value = np.array(list(initial_qpos.values())).copy() - # self.initial_qpos_value = np.array(initial_qpos) - # # change time step in .xml file - # tree = ET.parse(model_path) - # root = tree.getroot() - # for option in root.findall('option'): - # option.set("timestep", str(time_step)) - # - # tree.write(model_path) - - super(TableTennisEnv, self).__init__( - model_path=model_path, n_substeps=n_substeps, n_actions=n_actions, - initial_qpos=initial_qpos) - - if action_space: - self.action_space = spaces.Box(low=np.array([-2.6, -2.0, -2.8, -0.9, -4.8, -1.6, -2.2]), - high=np.array([2.6, 2.0, 2.8, 3.1, 1.3, 1.6, 2.2]), - dtype='float64') - else: - self.action_space = spaces.Box(low=np.array([-np.inf] * 7), - high=np.array([-np.inf] * 7), - dtype='float64') - self.scale = None - self.desired_pos = None - self.n_actions = n_actions - self.action = None - self.time_step = time_step - self._dt = time_step - self.paddle_center_pos = self.sim.data.get_site_xpos('wam/paddle_center') - if reward_obj is None: - self.reward_obj = HierarchicalRewardTableTennis() - else: - self.reward_obj = reward_obj - - if initial_ball_state is not None: - self.initial_ball_state = initial_ball_state - else: - self.initial_ball_state = ball_initialize(random=False) - self.target_ball_pos = self.sim.data.get_site_xpos("target_ball") - self.racket_center_pos = self.sim.data.get_site_xpos("wam/paddle_center") - - def close(self): - if self.viewer is not None: - glfw.destroy_window(self.viewer.window) - # self.viewer.window.close() - self.viewer = None - self._viewers = {} - - # GoalEnv methods - # ---------------------------- - def compute_reward(self, achieved_goal, goal, info): - # reset the reward, if action valid - # right_court_contact_obj = ["target_ball", "table_tennis_table_right_side"] - # right_court_contact_detector = self.reward_obj.contact_detection(self, right_court_contact_obj) - # if right_court_contact_detector: - # print("can detect the table ball contact") - self.reward_obj.total_reward = 0 - # Stage 1 Hitting - self.reward_obj.hitting(self) - # if not hitted, return the highest reward - if not self.reward_obj.goal_achievement: - # return 
self.reward_obj.highest_reward - return self.reward_obj.total_reward - # # Stage 2 Right Table Contact - # self.reward_obj.right_table_contact(self) - # if not self.reward_obj.goal_achievement: - # return self.reward_obj.highest_reward - # # Stage 2 Net Contact - # self.reward_obj.net_contact(self) - # if not self.reward_obj.goal_achievement: - # return self.reward_obj.highest_reward - # Stage 3 Opponent court Contact - # self.reward_obj.landing_on_opponent_court(self) - # if not self.reward_obj.goal_achievement: - # print("self.reward_obj.highest_reward: ", self.reward_obj.highest_reward) - # TODO - self.reward_obj.target_achievement(self) - # return self.reward_obj.highest_reward - return self.reward_obj.total_reward - - def _reset_sim(self): - self.sim.set_state(self.initial_state) - [initial_x, initial_y, initial_z, v_x, v_y, v_z] = self.initial_ball_state - self.sim.data.set_joint_qpos('tar:x', initial_x) - self.sim.data.set_joint_qpos('tar:y', initial_y) - self.sim.data.set_joint_qpos('tar:z', initial_z) - self.energy_corrected = True - self.give_reflection_reward = False - - # velocity is positive direction - self.sim.data.set_joint_qvel('tar:x', v_x) - self.sim.data.set_joint_qvel('tar:y', v_y) - self.sim.data.set_joint_qvel('tar:z', v_z) - - # Apply gravity compensation - if self.sim.data.qfrc_applied[:7] is not self.sim.data.qfrc_bias[:7]: - self.sim.data.qfrc_applied[:7] = self.sim.data.qfrc_bias[:7] - self.sim.forward() - return True - - def _env_setup(self, initial_qpos): - for name, value in initial_qpos.items(): - self.sim.data.set_joint_qpos(name, value) - - # Apply gravity compensation - if self.sim.data.qfrc_applied[:7] is not self.sim.data.qfrc_bias[:7]: - self.sim.data.qfrc_applied[:7] = self.sim.data.qfrc_bias[:7] - self.sim.forward() - - # Get the target position - self.initial_paddle_center_xpos = self.sim.data.get_site_xpos('wam/paddle_center').copy() - self.initial_paddle_center_vel = None # self.sim.get_site_ - - def _sample_goal(self): - goal = self.initial_paddle_center_xpos[:3] + self.np_random.uniform(-0.2, 0.2, size=3) - return goal.copy() - - def _get_obs(self): - - # positions of racket center - paddle_center_pos = self.sim.data.get_site_xpos('wam/paddle_center') - ball_pos = self.sim.data.get_site_xpos("target_ball") - - dt = self.sim.nsubsteps * self.sim.model.opt.timestep - paddle_center_velp = self.sim.data.get_site_xvelp('wam/paddle_center') * dt - robot_qpos, robot_qvel = utils.robot_get_obs(self.sim) - - wrist_state = robot_qpos[-3:] - wrist_vel = robot_qvel[-3:] * dt # change to a scalar if the gripper is made symmetric - - # achieved_goal = paddle_body_EE_pos - obs = np.concatenate([ - paddle_center_pos, paddle_center_velp, wrist_state, wrist_vel - ]) - - out_dict = { - 'observation': obs.copy(), - 'achieved_goal': paddle_center_pos.copy(), - 'desired_goal': self.goal.copy(), - 'q_pos': self.sim.data.qpos[:].copy(), - "ball_pos": ball_pos.copy(), - # "hitting_flag": self.reward_obj.hitting_flag - } - - return out_dict - - def _step_callback(self): - pass - - def _set_action(self, action): - # Apply gravity compensation - if self.sim.data.qfrc_applied[:7] is not self.sim.data.qfrc_bias[:7]: - self.sim.data.qfrc_applied[:7] = self.sim.data.qfrc_bias[:7] - # print("set action process running") - assert action.shape == (self.n_actions,) - self.action = action.copy() # ensure that we don't change the action outside of this scope - pos_ctrl = self.action[:] # limit maximum change in position - pos_ctrl = np.clip(pos_ctrl, self.action_space.low, 
self.action_space.high) - - # get desired trajectory - self.sim.data.qpos[:7] = pos_ctrl - self.sim.forward() - self.desired_pos = self.sim.data.get_site_xpos('wam/paddle_center').copy() - - self.sim.data.ctrl[:] = pos_ctrl - - def _is_success(self, achieved_goal, desired_goal): - pass - - -if __name__ == '__main__': - render_mode = "human" # "human" or "partial" or "final" - env = TableTennisEnv() - env.reset() - # env.render(mode=render_mode) - - for i in range(500): - # objective.load_result("/tmp/cma") - # test with random actions - ac = env.action_space.sample() - # ac[0] += np.pi/2 - obs, rew, d, info = env.step(ac) - env.render(mode=render_mode) - - print(rew) - - if d: - break - - env.close() diff --git a/alr_envs/alr/mujoco/gym_table_tennis/utils/experiment.py b/alr_envs/alr/mujoco/gym_table_tennis/utils/experiment.py deleted file mode 100644 index a106d68..0000000 --- a/alr_envs/alr/mujoco/gym_table_tennis/utils/experiment.py +++ /dev/null @@ -1,83 +0,0 @@ -import numpy as np -from gym.utils import seeding -from alr_envs.alr.mujoco.gym_table_tennis.utils.util import read_yaml, read_json -from pathlib import Path - - -def ball_initialize(random=False, scale=False, context_range=None, scale_value=None): - if random: - if scale: - # if scale_value is None: - scale_value = context_scale_initialize(context_range) - v_x, v_y, v_z = [2.5, 2, 0.5] * scale_value - dx = 1 - dy = 0 - dz = 0.05 - else: - seed = None - np_random, seed = seeding.np_random(seed) - dx = np_random.uniform(-0.1, 0.1) - dy = np_random.uniform(-0.1, 0.1) - dz = np_random.uniform(-0.1, 0.1) - - v_x = np_random.uniform(1.7, 1.8) - v_y = np_random.uniform(0.7, 0.8) - v_z = np_random.uniform(0.1, 0.2) - # print(dx, dy, dz, v_x, v_y, v_z) - # else: - # dx = -0.1 - # dy = 0.05 - # dz = 0.05 - # v_x = 1.5 - # v_y = 0.7 - # v_z = 0.06 - # initial_x = -0.6 + dx - # initial_y = -0.3 + dy - # initial_z = 0.8 + dz - else: - if scale: - v_x, v_y, v_z = [2.5, 2, 0.5] * scale_value - else: - v_x = 2.5 - v_y = 2 - v_z = 0.5 - dx = 1 - dy = 0 - dz = 0.05 - - initial_x = 0 + dx - initial_y = -0.2 + dy - initial_z = 0.3 + dz - # print("initial ball state: ", initial_x, initial_y, initial_z, v_x, v_y, v_z) - initial_ball_state = np.array([initial_x, initial_y, initial_z, v_x, v_y, v_z]) - return initial_ball_state - - -def context_scale_initialize(range): - """ - - Returns: - - """ - low, high = range - scale = np.random.uniform(low, high, 1) - return scale - - -def config_handle_generation(config_file_path): - """Generate config handle for multiprocessing - - Args: - config_file_path: - - Returns: - - """ - cfg_fname = Path(config_file_path) - # .json and .yml file - if cfg_fname.suffix == ".json": - config = read_json(cfg_fname) - elif cfg_fname.suffix == ".yml": - config = read_yaml(cfg_fname) - - return config diff --git a/alr_envs/alr/mujoco/gym_table_tennis/utils/rewards/hierarchical_reward.py b/alr_envs/alr/mujoco/gym_table_tennis/utils/rewards/hierarchical_reward.py deleted file mode 100644 index fe69104..0000000 --- a/alr_envs/alr/mujoco/gym_table_tennis/utils/rewards/hierarchical_reward.py +++ /dev/null @@ -1,402 +0,0 @@ -import numpy as np -import logging - - -class HierarchicalRewardTableTennis(object): - """Class for hierarchical reward function for table tennis experiment. - - Return Highest Reward. - Reward = 0 - Step 1: Action Valid. Upper Bound 0 - [-∞, 0] - Reward += -1 * |hit_duration - hit_duration_threshold| * |hit_duration < hit_duration_threshold| * 10 - Step 2: Hitting. 
Upper Bound 2 - if hitting: - [0, 2] - Reward = 2 * (1 - tanh(|shortest_hitting_dist|)) - if not hitting: - [0, 0.2] - Reward = 2 * (1 - tanh(|shortest_hitting_dist|)) - Step 3: Target Point Achievement. Upper Bound 6 - [0, 4] - if table_contact_detector: - Reward += 1 - Reward += (1 - tanh(|shortest_hitting_dist|)) * 2 - if contact_coordinate[0] < 0: - Reward += 1 - else: - Reward += 0 - elif: - Reward += (1 - tanh(|shortest_hitting_dist|)) - """ - - def __init__(self): - self.reward = None - self.goal_achievement = False - self.total_reward = 0 - self.shortest_hitting_dist = 1000 - self.highest_reward = -1000 - self.lowest_corner_dist = 100 - self.right_court_contact_detector = False - self.table_contact_detector = False - self.floor_contact_detector = False - self.radius = 0.025 - self.min_ball_x_pos = 100 - self.hit_contact_detector = False - self.net_contact_detector = False - self.ratio = 1 - self.lowest_z = 100 - self.target_flag = False - self.dist_target_virtual = 100 - self.ball_z_pos_lowest = 100 - self.hitting_flag = False - self.hitting_time_point = None - self.ctxt_dim = None - self.context_range_bounds = None - # self.ctxt_out_of_range_punishment = None - # self.ctxt_in_side_of_range_punishment = None - # - # def check_where_invalid(self, ctxt, context_range_bounds, set_to_valid_region=False): - # idx_max = [] - # idx_min = [] - # for dim in range(self.ctxt_dim): - # min_dim = context_range_bounds[0][dim] - # max_dim = context_range_bounds[1][dim] - # idx_max_c = np.where(ctxt[:, dim] > max_dim)[0] - # idx_min_c = np.where(ctxt[:, dim] < min_dim)[0] - # if set_to_valid_region: - # if idx_max_c.shape[0] != 0: - # ctxt[idx_max_c, dim] = max_dim - # if idx_min_c.shape[0] != 0: - # ctxt[idx_min_c, dim] = min_dim - # idx_max.append(idx_max_c) - # idx_min.append(idx_min_c) - # return idx_max, idx_min, ctxt - - def check_valid(self, scale, context_range_bounds): - - min_dim = context_range_bounds[0][0] - max_dim = context_range_bounds[1][0] - valid = (scale < max_dim) and (scale > min_dim) - return valid - - @classmethod - def goal_distance(cls, goal_a, goal_b): - assert goal_a.shape == goal_b.shape - return np.linalg.norm(goal_a - goal_b, axis=-1) - - def refresh_highest_reward(self): - if self.total_reward >= self.highest_reward: - self.highest_reward = self.total_reward - - def duration_valid(self): - pass - - def huge_value_unstable(self): - self.total_reward += -10 - self.highest_reward = -1 - - def context_valid(self, context): - valid = self.check_valid(context.copy(), context_range_bounds=self.context_range_bounds) - # when using dirac punishments - if valid: - self.total_reward += 1 # If Action Valid and Context Valid, total_reward = 0 - else: - self.total_reward += 0 - self.refresh_highest_reward() - - - - # If in the ctxt, add 1, otherwise, 0 - - def action_valid(self, durations=None): - """Ensure the execution of the robot movement with parameters which are in a valid domain. - - Time should always be positive, - the joint position of the robot should be a subset of [−π, π]. - if all parameters are valid, the robot gets a zero score, - otherwise it gets a negative score proportional to how much it is beyond the valid parameter domain. - - Returns: - rewards: if valid, reward is equal to 0. 
- if not valid, reward is negative and proportional to the distance beyond the valid parameter domain - """ - assert durations.shape[0] == 2, "durations type should be np.array and the shape should be 2" - # pre_duration = durations[0] - hit_duration = durations[1] - # pre_duration_thres = 0.01 - hit_duration_thres = 1 - # self.goal_achievement = np.all( - # [(pre_duration > pre_duration_thres), (hit_duration > hit_duration_thres), (0.3 < pre_duration < 0.6)]) - self.goal_achievement = (hit_duration > hit_duration_thres) - if self.goal_achievement: - self.total_reward = -1 - self.goal_achievement = True - else: - # self.total_reward += -1 * ((np.abs(pre_duration - pre_duration_thres) * int( - # pre_duration < pre_duration_thres) + np.abs(hit_duration - hit_duration_thres) * int( - # hit_duration < hit_duration_thres)) * 10) - self.total_reward = -1 * ((np.abs(hit_duration - hit_duration_thres) * int( - hit_duration < hit_duration_thres)) * 10) - self.total_reward += -1 - self.goal_achievement = False - self.refresh_highest_reward() - - def motion_penalty(self, action, high_motion_penalty): - """Protects the robot from high acceleration and dangerous movement. - """ - if not high_motion_penalty: - reward_ctrl = - 0.05 * np.square(action).sum() - else: - reward_ctrl = - 0.075 * np.square(action).sum() - self.total_reward += reward_ctrl - self.refresh_highest_reward() - self.goal_achievement = True - - def hitting(self, env): # , target_ball_pos, racket_center_pos, hit_contact_detector=False - """Hitting reward calculation - - If racket successfully hit the ball, the reward +1 - Otherwise calculate the distance between the center of racket and the center of ball, - reward = tanh(r/dist) if dist<1 reward almost 2 , if dist >= 1 reward is between [0, 0.2] - - - Args: - env: - - Returns: - - """ - - hit_contact_obj = ["target_ball", "bat"] - target_ball_pos = env.target_ball_pos - racket_center_pos = env.racket_center_pos - # hit contact detection - # Record the hitting history - self.hitting_flag = False - if not self.hit_contact_detector: - self.hit_contact_detector = self.contact_detection(env, hit_contact_obj) - if self.hit_contact_detector: - print("First time detect hitting") - self.hitting_flag = True - if self.hit_contact_detector: - - # TODO - dist = self.goal_distance(target_ball_pos, racket_center_pos) - if dist < 0: - dist = 0 - # print("goal distance is:", dist) - if dist <= self.shortest_hitting_dist: - self.shortest_hitting_dist = dist - # print("shortest_hitting_dist is:", self.shortest_hitting_dist) - # Keep the shortest hitting distance. 
- dist_reward = 2 * (1 - np.tanh(np.abs(self.shortest_hitting_dist))) - - # TODO sparse - # dist_reward = 2 - - self.total_reward += dist_reward - self.goal_achievement = True - - # if self.hitting_time_point is not None and self.hitting_time_point > 600: - # self.total_reward += 1 - - else: - dist = self.goal_distance(target_ball_pos, racket_center_pos) - if dist <= self.shortest_hitting_dist: - self.shortest_hitting_dist = dist - dist_reward = 1 - np.tanh(self.shortest_hitting_dist) - reward = 0.2 * dist_reward # because it does not hit the ball, so multiply 0.2 - self.total_reward += reward - self.goal_achievement = False - - self.refresh_highest_reward() - - @classmethod - def relu(cls, x): - return np.maximum(0, x) - - # def right_table_contact(self, env): - # right_court_contact_obj = ["target_ball", "table_tennis_table_right_side"] - # if env.target_ball_pos[0] >= 0 and env.target_ball_pos[2] >= 0.7: - # # update right court contact detection - # if not self.right_court_contact_detector: - # self.right_court_contact_detector = self.contact_detection(env, right_court_contact_obj) - # if self.right_court_contact_detector: - # self.contact_x_pos = env.target_ball_pos[0] - # if self.right_court_contact_detector: - # self.total_reward += 1 - norm(0.685, 1).pdf(self.contact_x_pos) # x axis middle of right table - # self.goal_achievement = False - # else: - # self.total_reward += 1 - # self.goal_achievement = True - # # else: - # # self.total_reward += 0 - # # self.goal_achievement = False - # self.refresh_highest_reward() - - # def net_contact(self, env): - # net_contact_obj = ["target_ball", "table_tennis_net"] - # # net_contact_detector = self.contact_detection(env, net_contact_obj) - # # ball_x_pos = env.target_ball_pos[0] - # # if self.min_ball_x_pos >= ball_x_pos: - # # self.min_ball_x_pos = ball_x_pos - # # table_left_edge_x_pos = -1.37 - # # if np.abs(ball_x_pos) <= 0.01: # x threshold of net - # # if self.lowest_z >= env.target_ball_pos[2]: - # # self.lowest_z = env.target_ball_pos[2] - # # # construct a gaussian distribution of z - # # z_reward = 4 - norm(0, 0.1).pdf(self.lowest_z - 0.07625) # maximum 4 - # # self.total_reward += z_reward - # # self.total_reward += 2 - np.minimum(1, self.relu(np.abs(self.min_ball_x_pos))) - # if not self.net_contact_detector: - # self.net_contact_detector = self.contact_detection(env, net_contact_obj) - # if self.net_contact_detector: - # self.total_reward += 0 # very high cost - # self.goal_achievement = False - # else: - # self.total_reward += 1 - # self.goal_achievement = True - # self.refresh_highest_reward() - - # def landing_on_opponent_court(self, env): - # # Very sparse reward - # # don't contact the right side court - # # right_court_contact_obj = ["target_ball", "table_tennis_table_right_side"] - # # right_court_contact_detector = self.contact_detection(env, right_court_contact_obj) - # left_court_contact_obj = ["target_ball", "table_tennis_table_left_side"] - # # left_court_contact_detector = self.contact_detection(env, left_court_contact_obj) - # # record the contact history - # # if not self.right_court_contact_detector: - # # self.right_court_contact_detector = self.contact_detection(env, right_court_contact_obj) - # if not self.table_contact_detector: - # self.table_contact_detector = self.contact_detection(env, left_court_contact_obj) - # - # dist_left_up_corner = self.goal_distance(env.target_ball_pos, env.sim.data.get_site_xpos("left_up_corner")) - # dist_middle_up_corner = self.goal_distance(env.target_ball_pos, 
env.sim.data.get_site_xpos("middle_up_corner")) - # dist_left_down_corner = self.goal_distance(env.target_ball_pos, env.sim.data.get_site_xpos("left_down_corner")) - # dist_middle_down_corner = self.goal_distance(env.target_ball_pos, - # env.sim.data.get_site_xpos("middle_down_corner")) - # dist_array = np.array( - # [dist_left_up_corner, dist_middle_up_corner, dist_left_down_corner, dist_middle_down_corner]) - # dist_corner = np.amin(dist_array) - # if self.lowest_corner_dist >= dist_corner: - # self.lowest_corner_dist = dist_corner - # - # right_contact_cost = 1 - # left_contact_reward = 2 - # dist_left_table_reward = (2 - np.tanh(self.lowest_corner_dist)) - # # TODO Try multi dimensional gaussian distribution - # # contact only the left side court - # if self.right_court_contact_detector: - # self.total_reward += 0 - # self.goal_achievement = False - # if self.table_contact_detector: - # self.total_reward += left_contact_reward - # self.goal_achievement = False - # else: - # self.total_reward += dist_left_table_reward - # self.goal_achievement = False - # else: - # self.total_reward += right_contact_cost - # if self.table_contact_detector: - # self.total_reward += left_contact_reward - # self.goal_achievement = True - # else: - # self.total_reward += dist_left_table_reward - # self.goal_achievement = False - # self.refresh_highest_reward() - # # if self.left_court_contact_detector and not self.right_court_contact_detector: - # # self.total_reward += self.ratio * left_contact_reward - # # print("only left court reward return!!!!!!!!!") - # # print("contact only left court!!!!!!") - # # self.goal_achievement = True - # # # no contact with table - # # elif not self.right_court_contact_detector and not self.left_court_contact_detector: - # # self.total_reward += 0 + self.ratio * dist_left_table_reward - # # self.goal_achievement = False - # # # contact both side - # # elif self.right_court_contact_detector and self.left_court_contact_detector: - # # self.total_reward += self.ratio * (left_contact_reward - right_contact_cost) # cost of contact of right court - # # self.goal_achievement = False - # # # contact only the right side court - # # elif self.right_court_contact_detector and not self.left_court_contact_detector: - # # self.total_reward += 0 + self.ratio * ( - # # dist_left_table_reward - right_contact_cost) # cost of contact of right court - # # self.goal_achievement = False - - def target_achievement(self, env): - target_coordinate = np.array([-0.5, -0.5]) - # net_contact_obj = ["target_ball", "table_tennis_net"] - table_contact_obj = ["target_ball", "table_tennis_table"] - floor_contact_obj = ["target_ball", "floor"] - - if 0.78 < env.target_ball_pos[2] < 0.8: - dist_target_virtual = np.linalg.norm(env.target_ball_pos[:2] - target_coordinate) - if self.dist_target_virtual > dist_target_virtual: - self.dist_target_virtual = dist_target_virtual - if -0.07 < env.target_ball_pos[0] < 0.07 and env.sim.data.get_joint_qvel('tar:x') < 0: - if self.ball_z_pos_lowest > env.target_ball_pos[2]: - self.ball_z_pos_lowest = env.target_ball_pos[2].copy() - # if not self.net_contact_detector: - # self.net_contact_detector = self.contact_detection(env, net_contact_obj) - if not self.table_contact_detector: - self.table_contact_detector = self.contact_detection(env, table_contact_obj) - if not self.floor_contact_detector: - self.floor_contact_detector = self.contact_detection(env, floor_contact_obj) - if not self.target_flag: - # Table Contact Reward. 
- if self.table_contact_detector: - self.total_reward += 1 - # only update when the first contact because of the flag - contact_coordinate = env.target_ball_pos[:2].copy() - print("contact table ball coordinate: ", env.target_ball_pos) - logging.info("contact table ball coordinate: {}".format(env.target_ball_pos)) - dist_target = np.linalg.norm(contact_coordinate - target_coordinate) - self.total_reward += (1 - np.tanh(dist_target)) * 2 - self.target_flag = True - # Net Contact Reward. Precondition: Table Contact exits. - if contact_coordinate[0] < 0: - print("left table contact") - logging.info("~~~~~~~~~~~~~~~left table contact~~~~~~~~~~~~~~~") - self.total_reward += 1 - # TODO Z coordinate reward - # self.total_reward += np.maximum(np.tanh(self.ball_z_pos_lowest), 0) - self.goal_achievement = True - else: - print("right table contact") - logging.info("~~~~~~~~~~~~~~~right table contact~~~~~~~~~~~~~~~") - self.total_reward += 0 - self.goal_achievement = False - # if self.net_contact_detector: - # self.total_reward += 0 - # self.goal_achievement = False - # else: - # self.total_reward += 1 - # self.goal_achievement = False - # Floor Contact Reward. Precondition: Table Contact exits. - elif self.floor_contact_detector: - self.total_reward += (1 - np.tanh(self.dist_target_virtual)) - self.target_flag = True - self.goal_achievement = False - # No Contact of Floor or Table, flying - else: - pass - # else: - # print("Flag is True already") - self.refresh_highest_reward() - - def distance_to_target(self): - pass - - @classmethod - def contact_detection(cls, env, goal_contact): - for i in range(env.sim.data.ncon): - contact = env.sim.data.contact[i] - achieved_geom1_name = env.sim.model.geom_id2name(contact.geom1) - achieved_geom2_name = env.sim.model.geom_id2name(contact.geom2) - if np.all([(achieved_geom1_name in goal_contact), (achieved_geom2_name in goal_contact)]): - print("contact of " + achieved_geom1_name + " " + achieved_geom2_name) - return True - else: - return False diff --git a/alr_envs/alr/mujoco/gym_table_tennis/utils/rewards/rewards.py b/alr_envs/alr/mujoco/gym_table_tennis/utils/rewards/rewards.py deleted file mode 100644 index 6e6aa32..0000000 --- a/alr_envs/alr/mujoco/gym_table_tennis/utils/rewards/rewards.py +++ /dev/null @@ -1,136 +0,0 @@ -# Copyright 2017 The dm_control Authors. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# ============================================================================ - -# """Soft indicator function evaluating whether a number is within bounds.""" -# -# from __future__ import absolute_import -# from __future__ import division -# from __future__ import print_function - -# Internal dependencies. -import numpy as np - -# The value returned by tolerance() at `margin` distance from `bounds` interval. -_DEFAULT_VALUE_AT_MARGIN = 0.1 - - -def _sigmoids(x, value_at_1, sigmoid): - """Returns 1 when `x` == 0, between 0 and 1 otherwise. - - Args: - x: A scalar or numpy array. - value_at_1: A float between 0 and 1 specifying the output when `x` == 1. 
- sigmoid: String, choice of sigmoid type. - - Returns: - A numpy array with values between 0.0 and 1.0. - - Raises: - ValueError: If not 0 < `value_at_1` < 1, except for `linear`, `cosine` and - `quadratic` sigmoids which allow `value_at_1` == 0. - ValueError: If `sigmoid` is of an unknown type. - """ - if sigmoid in ('cosine', 'linear', 'quadratic'): - if not 0 <= value_at_1 < 1: - raise ValueError('`value_at_1` must be nonnegative and smaller than 1, ' - 'got {}.'.format(value_at_1)) - else: - if not 0 < value_at_1 < 1: - raise ValueError('`value_at_1` must be strictly between 0 and 1, ' - 'got {}.'.format(value_at_1)) - - if sigmoid == 'gaussian': - scale = np.sqrt(-2 * np.log(value_at_1)) - return np.exp(-0.5 * (x*scale)**2) - - elif sigmoid == 'hyperbolic': - scale = np.arccosh(1/value_at_1) - return 1 / np.cosh(x*scale) - - elif sigmoid == 'long_tail': - scale = np.sqrt(1/value_at_1 - 1) - return 1 / ((x*scale)**2 + 1) - - elif sigmoid == 'cosine': - scale = np.arccos(2*value_at_1 - 1) / np.pi - scaled_x = x*scale - return np.where(abs(scaled_x) < 1, (1 + np.cos(np.pi*scaled_x))/2, 0.0) - - elif sigmoid == 'linear': - scale = 1-value_at_1 - scaled_x = x*scale - return np.where(abs(scaled_x) < 1, 1 - scaled_x, 0.0) - - elif sigmoid == 'quadratic': - scale = np.sqrt(1-value_at_1) - scaled_x = x*scale - return np.where(abs(scaled_x) < 1, 1 - scaled_x**2, 0.0) - - elif sigmoid == 'tanh_squared': - scale = np.arctanh(np.sqrt(1-value_at_1)) - return 1 - np.tanh(x*scale)**2 - - else: - raise ValueError('Unknown sigmoid type {!r}.'.format(sigmoid)) - - -def tolerance(x, bounds=(0.0, 0.0), margin=0.0, sigmoid='gaussian', - value_at_margin=_DEFAULT_VALUE_AT_MARGIN): - """Returns 1 when `x` falls inside the bounds, between 0 and 1 otherwise. - - Args: - x: A scalar or numpy array. - bounds: A tuple of floats specifying inclusive `(lower, upper)` bounds for - the target interval. These can be infinite if the interval is unbounded - at one or both ends, or they can be equal to one another if the target - value is exact. - margin: Float. Parameter that controls how steeply the output decreases as - `x` moves out-of-bounds. - * If `margin == 0` then the output will be 0 for all values of `x` - outside of `bounds`. - * If `margin > 0` then the output will decrease sigmoidally with - increasing distance from the nearest bound. - sigmoid: String, choice of sigmoid type. Valid values are: 'gaussian', - 'linear', 'hyperbolic', 'long_tail', 'cosine', 'tanh_squared'. - value_at_margin: A float between 0 and 1 specifying the output value when - the distance from `x` to the nearest bound is equal to `margin`. Ignored - if `margin == 0`. - - Returns: - A float or numpy array with values between 0.0 and 1.0. - - Raises: - ValueError: If `bounds[0] > bounds[1]`. - ValueError: If `margin` is negative. 
- """ - lower, upper = bounds - if lower > upper: - raise ValueError('Lower bound must be <= upper bound.') - if margin < 0: - raise ValueError('`margin` must be non-negative.') - - in_bounds = np.logical_and(lower <= x, x <= upper) - if margin == 0: - value = np.where(in_bounds, 1.0, 0.0) - else: - d = np.where(x < lower, lower - x, x - upper) / margin - value = np.where(in_bounds, 1.0, _sigmoids(d, value_at_margin, sigmoid)) - - return float(value) if np.isscalar(x) else value - - - - - diff --git a/alr_envs/alr/mujoco/gym_table_tennis/utils/util.py b/alr_envs/alr/mujoco/gym_table_tennis/utils/util.py deleted file mode 100644 index fa308e3..0000000 --- a/alr_envs/alr/mujoco/gym_table_tennis/utils/util.py +++ /dev/null @@ -1,49 +0,0 @@ -import json -import yaml -import xml.etree.ElementTree as ET -from collections import OrderedDict -from pathlib import Path - - -def read_json(fname): - fname = Path(fname) - with fname.open('rt') as handle: - return json.load(handle, object_hook=OrderedDict) - - -def write_json(content, fname): - fname = Path(fname) - with fname.open('wt') as handle: - json.dump(content, handle, indent=4, sort_keys=False) - - -def read_yaml(fname): - fname = Path(fname) - with fname.open('rt') as handle: - return yaml.load(handle, Loader=yaml.FullLoader) - - -def write_yaml(content, fname): - fname = Path(fname) - with fname.open('wt') as handle: - yaml.dump(content, handle) - - -def config_save(dir_path, config): - dir_path = Path(dir_path) - config_path_json = dir_path / "config.json" - config_path_yaml = dir_path / "config.yml" - # .json and .yml file,save 2 version of configuration. - write_json(config, config_path_json) - write_yaml(config, config_path_yaml) - - -def change_kp_in_xml(kp_list, - model_path="/home/zhou/slow/table_tennis_rl/simulation/gymTableTennis/gym_table_tennis/simple_reacher/robotics/assets/table_tennis/right_arm_actuator.xml"): - tree = ET.parse(model_path) - root = tree.getroot() - # for actuator in root.find("actuator"): - for position, kp in zip(root.iter('position'), kp_list): - position.set("kp", str(kp)) - tree.write(model_path) - diff --git a/alr_envs/alr/mujoco/meshes/wam/base_link_convex.stl b/alr_envs/alr/mujoco/meshes/wam/base_link_convex.stl deleted file mode 100755 index 133b112..0000000 Binary files a/alr_envs/alr/mujoco/meshes/wam/base_link_convex.stl and /dev/null differ diff --git a/alr_envs/alr/mujoco/meshes/wam/base_link_fine.stl b/alr_envs/alr/mujoco/meshes/wam/base_link_fine.stl deleted file mode 100755 index 047e9df..0000000 Binary files a/alr_envs/alr/mujoco/meshes/wam/base_link_fine.stl and /dev/null differ diff --git a/alr_envs/alr/mujoco/meshes/wam/bhand_finger_dist_link_fine.stl b/alr_envs/alr/mujoco/meshes/wam/bhand_finger_dist_link_fine.stl deleted file mode 100644 index 5ff94a2..0000000 Binary files a/alr_envs/alr/mujoco/meshes/wam/bhand_finger_dist_link_fine.stl and /dev/null differ diff --git a/alr_envs/alr/mujoco/meshes/wam/bhand_finger_med_link_convex.stl b/alr_envs/alr/mujoco/meshes/wam/bhand_finger_med_link_convex.stl deleted file mode 100644 index c548448..0000000 Binary files a/alr_envs/alr/mujoco/meshes/wam/bhand_finger_med_link_convex.stl and /dev/null differ diff --git a/alr_envs/alr/mujoco/meshes/wam/bhand_finger_med_link_fine.stl b/alr_envs/alr/mujoco/meshes/wam/bhand_finger_med_link_fine.stl deleted file mode 100644 index 495160d..0000000 Binary files a/alr_envs/alr/mujoco/meshes/wam/bhand_finger_med_link_fine.stl and /dev/null differ diff --git 
a/alr_envs/alr/mujoco/meshes/wam/bhand_finger_prox_link_convex_decomposition_p1.stl b/alr_envs/alr/mujoco/meshes/wam/bhand_finger_prox_link_convex_decomposition_p1.stl deleted file mode 100644 index b4bb322..0000000 Binary files a/alr_envs/alr/mujoco/meshes/wam/bhand_finger_prox_link_convex_decomposition_p1.stl and /dev/null differ diff --git a/alr_envs/alr/mujoco/meshes/wam/bhand_finger_prox_link_convex_decomposition_p3.stl b/alr_envs/alr/mujoco/meshes/wam/bhand_finger_prox_link_convex_decomposition_p3.stl deleted file mode 100644 index f05174e..0000000 Binary files a/alr_envs/alr/mujoco/meshes/wam/bhand_finger_prox_link_convex_decomposition_p3.stl and /dev/null differ diff --git a/alr_envs/alr/mujoco/meshes/wam/bhand_finger_prox_link_fine.stl b/alr_envs/alr/mujoco/meshes/wam/bhand_finger_prox_link_fine.stl deleted file mode 100644 index eb252d9..0000000 Binary files a/alr_envs/alr/mujoco/meshes/wam/bhand_finger_prox_link_fine.stl and /dev/null differ diff --git a/alr_envs/alr/mujoco/meshes/wam/bhand_palm_link_convex_decomposition_p1.stl b/alr_envs/alr/mujoco/meshes/wam/bhand_palm_link_convex_decomposition_p1.stl deleted file mode 100644 index c039f0d..0000000 Binary files a/alr_envs/alr/mujoco/meshes/wam/bhand_palm_link_convex_decomposition_p1.stl and /dev/null differ diff --git a/alr_envs/alr/mujoco/meshes/wam/bhand_palm_link_convex_decomposition_p2.stl b/alr_envs/alr/mujoco/meshes/wam/bhand_palm_link_convex_decomposition_p2.stl deleted file mode 100644 index 250acaf..0000000 Binary files a/alr_envs/alr/mujoco/meshes/wam/bhand_palm_link_convex_decomposition_p2.stl and /dev/null differ diff --git a/alr_envs/alr/mujoco/meshes/wam/bhand_palm_link_convex_decomposition_p3.stl b/alr_envs/alr/mujoco/meshes/wam/bhand_palm_link_convex_decomposition_p3.stl deleted file mode 100644 index 993d0f7..0000000 Binary files a/alr_envs/alr/mujoco/meshes/wam/bhand_palm_link_convex_decomposition_p3.stl and /dev/null differ diff --git a/alr_envs/alr/mujoco/meshes/wam/bhand_palm_link_convex_decomposition_p4.stl b/alr_envs/alr/mujoco/meshes/wam/bhand_palm_link_convex_decomposition_p4.stl deleted file mode 100644 index 8448a3f..0000000 Binary files a/alr_envs/alr/mujoco/meshes/wam/bhand_palm_link_convex_decomposition_p4.stl and /dev/null differ diff --git a/alr_envs/alr/mujoco/meshes/wam/elbow_link_convex.stl b/alr_envs/alr/mujoco/meshes/wam/elbow_link_convex.stl deleted file mode 100755 index b34963d..0000000 Binary files a/alr_envs/alr/mujoco/meshes/wam/elbow_link_convex.stl and /dev/null differ diff --git a/alr_envs/alr/mujoco/meshes/wam/forearm_link_convex_decomposition_p1.stl b/alr_envs/alr/mujoco/meshes/wam/forearm_link_convex_decomposition_p1.stl deleted file mode 100755 index e6aa6b6..0000000 Binary files a/alr_envs/alr/mujoco/meshes/wam/forearm_link_convex_decomposition_p1.stl and /dev/null differ diff --git a/alr_envs/alr/mujoco/meshes/wam/forearm_link_convex_decomposition_p2.stl b/alr_envs/alr/mujoco/meshes/wam/forearm_link_convex_decomposition_p2.stl deleted file mode 100755 index 667902e..0000000 Binary files a/alr_envs/alr/mujoco/meshes/wam/forearm_link_convex_decomposition_p2.stl and /dev/null differ diff --git a/alr_envs/alr/mujoco/meshes/wam/forearm_link_fine.stl b/alr_envs/alr/mujoco/meshes/wam/forearm_link_fine.stl deleted file mode 100755 index ed66bbb..0000000 Binary files a/alr_envs/alr/mujoco/meshes/wam/forearm_link_fine.stl and /dev/null differ diff --git a/alr_envs/alr/mujoco/meshes/wam/shoulder_link_convex_decomposition_p1.stl 
b/alr_envs/alr/mujoco/meshes/wam/shoulder_link_convex_decomposition_p1.stl deleted file mode 100755 index aba957d..0000000 Binary files a/alr_envs/alr/mujoco/meshes/wam/shoulder_link_convex_decomposition_p1.stl and /dev/null differ diff --git a/alr_envs/alr/mujoco/meshes/wam/shoulder_link_convex_decomposition_p2.stl b/alr_envs/alr/mujoco/meshes/wam/shoulder_link_convex_decomposition_p2.stl deleted file mode 100755 index 5cca6a9..0000000 Binary files a/alr_envs/alr/mujoco/meshes/wam/shoulder_link_convex_decomposition_p2.stl and /dev/null differ diff --git a/alr_envs/alr/mujoco/meshes/wam/shoulder_link_convex_decomposition_p3.stl b/alr_envs/alr/mujoco/meshes/wam/shoulder_link_convex_decomposition_p3.stl deleted file mode 100755 index 3343e27..0000000 Binary files a/alr_envs/alr/mujoco/meshes/wam/shoulder_link_convex_decomposition_p3.stl and /dev/null differ diff --git a/alr_envs/alr/mujoco/meshes/wam/shoulder_link_fine.stl b/alr_envs/alr/mujoco/meshes/wam/shoulder_link_fine.stl deleted file mode 100755 index ae505fd..0000000 Binary files a/alr_envs/alr/mujoco/meshes/wam/shoulder_link_fine.stl and /dev/null differ diff --git a/alr_envs/alr/mujoco/meshes/wam/shoulder_pitch_link_convex.stl b/alr_envs/alr/mujoco/meshes/wam/shoulder_pitch_link_convex.stl deleted file mode 100755 index c36cfec..0000000 Binary files a/alr_envs/alr/mujoco/meshes/wam/shoulder_pitch_link_convex.stl and /dev/null differ diff --git a/alr_envs/alr/mujoco/meshes/wam/shoulder_pitch_link_fine.stl b/alr_envs/alr/mujoco/meshes/wam/shoulder_pitch_link_fine.stl deleted file mode 100755 index dc633c4..0000000 Binary files a/alr_envs/alr/mujoco/meshes/wam/shoulder_pitch_link_fine.stl and /dev/null differ diff --git a/alr_envs/alr/mujoco/meshes/wam/upper_arm_link_convex_decomposition_p1.stl b/alr_envs/alr/mujoco/meshes/wam/upper_arm_link_convex_decomposition_p1.stl deleted file mode 100755 index 82d0093..0000000 Binary files a/alr_envs/alr/mujoco/meshes/wam/upper_arm_link_convex_decomposition_p1.stl and /dev/null differ diff --git a/alr_envs/alr/mujoco/meshes/wam/upper_arm_link_convex_decomposition_p2.stl b/alr_envs/alr/mujoco/meshes/wam/upper_arm_link_convex_decomposition_p2.stl deleted file mode 100755 index 7fd5a55..0000000 Binary files a/alr_envs/alr/mujoco/meshes/wam/upper_arm_link_convex_decomposition_p2.stl and /dev/null differ diff --git a/alr_envs/alr/mujoco/meshes/wam/upper_arm_link_fine.stl b/alr_envs/alr/mujoco/meshes/wam/upper_arm_link_fine.stl deleted file mode 100755 index 76353ae..0000000 Binary files a/alr_envs/alr/mujoco/meshes/wam/upper_arm_link_fine.stl and /dev/null differ diff --git a/alr_envs/alr/mujoco/meshes/wam/wrist_palm_link_convex.stl b/alr_envs/alr/mujoco/meshes/wam/wrist_palm_link_convex.stl deleted file mode 100755 index a0386f6..0000000 Binary files a/alr_envs/alr/mujoco/meshes/wam/wrist_palm_link_convex.stl and /dev/null differ diff --git a/alr_envs/alr/mujoco/meshes/wam/wrist_pitch_link_convex_decomposition_p1.stl b/alr_envs/alr/mujoco/meshes/wam/wrist_pitch_link_convex_decomposition_p1.stl deleted file mode 100644 index c36f88f..0000000 Binary files a/alr_envs/alr/mujoco/meshes/wam/wrist_pitch_link_convex_decomposition_p1.stl and /dev/null differ diff --git a/alr_envs/alr/mujoco/meshes/wam/wrist_pitch_link_convex_decomposition_p2.stl b/alr_envs/alr/mujoco/meshes/wam/wrist_pitch_link_convex_decomposition_p2.stl deleted file mode 100644 index d00cac1..0000000 Binary files a/alr_envs/alr/mujoco/meshes/wam/wrist_pitch_link_convex_decomposition_p2.stl and /dev/null differ diff --git 
a/alr_envs/alr/mujoco/meshes/wam/wrist_pitch_link_convex_decomposition_p3.stl b/alr_envs/alr/mujoco/meshes/wam/wrist_pitch_link_convex_decomposition_p3.stl deleted file mode 100755 index 34d1d8b..0000000 Binary files a/alr_envs/alr/mujoco/meshes/wam/wrist_pitch_link_convex_decomposition_p3.stl and /dev/null differ diff --git a/alr_envs/alr/mujoco/meshes/wam/wrist_pitch_link_fine.stl b/alr_envs/alr/mujoco/meshes/wam/wrist_pitch_link_fine.stl deleted file mode 100644 index 13d2f73..0000000 Binary files a/alr_envs/alr/mujoco/meshes/wam/wrist_pitch_link_fine.stl and /dev/null differ diff --git a/alr_envs/alr/mujoco/meshes/wam/wrist_yaw_link_convex_decomposition_p1.stl b/alr_envs/alr/mujoco/meshes/wam/wrist_yaw_link_convex_decomposition_p1.stl deleted file mode 100755 index 06e857f..0000000 Binary files a/alr_envs/alr/mujoco/meshes/wam/wrist_yaw_link_convex_decomposition_p1.stl and /dev/null differ diff --git a/alr_envs/alr/mujoco/meshes/wam/wrist_yaw_link_convex_decomposition_p2.stl b/alr_envs/alr/mujoco/meshes/wam/wrist_yaw_link_convex_decomposition_p2.stl deleted file mode 100755 index 48e1bb1..0000000 Binary files a/alr_envs/alr/mujoco/meshes/wam/wrist_yaw_link_convex_decomposition_p2.stl and /dev/null differ diff --git a/alr_envs/alr/mujoco/meshes/wam/wrist_yaw_link_fine.stl b/alr_envs/alr/mujoco/meshes/wam/wrist_yaw_link_fine.stl deleted file mode 100644 index 0d95239..0000000 Binary files a/alr_envs/alr/mujoco/meshes/wam/wrist_yaw_link_fine.stl and /dev/null differ diff --git a/alr_envs/alr/mujoco/reacher/alr_reacher.py b/alr_envs/alr/mujoco/reacher/alr_reacher.py deleted file mode 100644 index c2b5f18..0000000 --- a/alr_envs/alr/mujoco/reacher/alr_reacher.py +++ /dev/null @@ -1,139 +0,0 @@ -import os - -import numpy as np -from gym import utils -from gym.envs.mujoco import MujocoEnv - -import alr_envs.utils.utils as alr_utils - - -class ALRReacherEnv(MujocoEnv, utils.EzPickle): - def __init__(self, steps_before_reward=200, n_links=5, balance=False): - utils.EzPickle.__init__(**locals()) - - self._steps = 0 - self.steps_before_reward = steps_before_reward - self.n_links = n_links - - self.balance = balance - self.balance_weight = 1.0 - - self.reward_weight = 1 - if steps_before_reward == 200: - self.reward_weight = 200 - elif steps_before_reward == 50: - self.reward_weight = 50 - - if n_links == 5: - file_name = 'reacher_5links.xml' - elif n_links == 7: - file_name = 'reacher_7links.xml' - else: - raise ValueError(f"Invalid number of links {n_links}, only 5 or 7 allowed.") - - MujocoEnv.__init__(self, os.path.join(os.path.dirname(__file__), "assets", file_name), 2) - - def step(self, a): - self._steps += 1 - - reward_dist = 0.0 - angular_vel = 0.0 - reward_balance = 0.0 - if self._steps >= self.steps_before_reward: - vec = self.get_body_com("fingertip") - self.get_body_com("target") - reward_dist -= self.reward_weight * np.linalg.norm(vec) - if self.steps_before_reward > 0: - # avoid giving this penalty for normal step based case - # angular_vel -= 10 * np.linalg.norm(self.sim.data.qvel.flat[:self.n_links]) - angular_vel -= 10 * np.square(self.sim.data.qvel.flat[:self.n_links]).sum() - reward_ctrl = - 10 * np.square(a).sum() - - if self.balance: - reward_balance -= self.balance_weight * np.abs( - alr_utils.angle_normalize(np.sum(self.sim.data.qpos.flat[:self.n_links]), type="rad")) - - reward = reward_dist + reward_ctrl + angular_vel + reward_balance - self.do_simulation(a, self.frame_skip) - ob = self._get_obs() - done = False - return ob, reward, done, dict(reward_dist=reward_dist, 
reward_ctrl=reward_ctrl, - velocity=angular_vel, reward_balance=reward_balance, - end_effector=self.get_body_com("fingertip").copy(), - goal=self.goal if hasattr(self, "goal") else None) - - def viewer_setup(self): - self.viewer.cam.trackbodyid = 0 - - # def reset_model(self): - # qpos = self.init_qpos - # if not hasattr(self, "goal"): - # self.goal = np.array([-0.25, 0.25]) - # # self.goal = self.init_qpos.copy()[:2] + 0.05 - # qpos[-2:] = self.goal - # qvel = self.init_qvel - # qvel[-2:] = 0 - # self.set_state(qpos, qvel) - # self._steps = 0 - # - # return self._get_obs() - - def reset_model(self): - qpos = self.init_qpos.copy() - while True: - self.goal = self.np_random.uniform(low=-self.n_links / 10, high=self.n_links / 10, size=2) - # self.goal = self.np_random.uniform(low=0, high=self.n_links / 10, size=2) - # self.goal = np.random.uniform(low=[-self.n_links / 10, 0], high=[0, self.n_links / 10], size=2) - if np.linalg.norm(self.goal) < self.n_links / 10: - break - qpos[-2:] = self.goal - qvel = self.init_qvel.copy() - qvel[-2:] = 0 - self.set_state(qpos, qvel) - self._steps = 0 - - return self._get_obs() - - # def reset_model(self): - # qpos = self.np_random.uniform(low=-0.1, high=0.1, size=self.model.nq) + self.init_qpos - # while True: - # self.goal = self.np_random.uniform(low=-self.n_links / 10, high=self.n_links / 10, size=2) - # if np.linalg.norm(self.goal) < self.n_links / 10: - # break - # qpos[-2:] = self.goal - # qvel = self.init_qvel + self.np_random.uniform(low=-.005, high=.005, size=self.model.nv) - # qvel[-2:] = 0 - # self.set_state(qpos, qvel) - # self._steps = 0 - # - # return self._get_obs() - - def _get_obs(self): - theta = self.sim.data.qpos.flat[:self.n_links] - target = self.get_body_com("target") - return np.concatenate([ - np.cos(theta), - np.sin(theta), - target[:2], # x-y of goal position - self.sim.data.qvel.flat[:self.n_links], # angular velocity - self.get_body_com("fingertip") - target, # goal distance - [self._steps], - ]) - - -if __name__ == '__main__': - nl = 5 - render_mode = "human" # "human" or "partial" or "final" - env = ALRReacherEnv(n_links=nl) - obs = env.reset() - - for i in range(2000): - # objective.load_result("/tmp/cma") - # test with random actions - ac = env.action_space.sample() - obs, rew, d, info = env.step(ac) - if i % 10 == 0: - env.render(mode=render_mode) - if d: - env.reset() - - env.close() diff --git a/alr_envs/alr/mujoco/reacher/balancing.py b/alr_envs/alr/mujoco/reacher/balancing.py deleted file mode 100644 index 3e34298..0000000 --- a/alr_envs/alr/mujoco/reacher/balancing.py +++ /dev/null @@ -1,53 +0,0 @@ -import os - -import numpy as np -from gym import utils -from gym.envs.mujoco import mujoco_env - -import alr_envs.utils.utils as alr_utils - - -class BalancingEnv(mujoco_env.MujocoEnv, utils.EzPickle): - def __init__(self, n_links=5): - utils.EzPickle.__init__(**locals()) - - self.n_links = n_links - - if n_links == 5: - file_name = 'reacher_5links.xml' - elif n_links == 7: - file_name = 'reacher_7links.xml' - else: - raise ValueError(f"Invalid number of links {n_links}, only 5 or 7 allowed.") - - mujoco_env.MujocoEnv.__init__(self, os.path.join(os.path.dirname(__file__), "assets", file_name), 2) - - def step(self, a): - angle = alr_utils.angle_normalize(np.sum(self.sim.data.qpos.flat[:self.n_links]), type="rad") - reward = - np.abs(angle) - - self.do_simulation(a, self.frame_skip) - ob = self._get_obs() - done = False - return ob, reward, done, dict(angle=angle, end_effector=self.get_body_com("fingertip").copy()) - - def 
viewer_setup(self): - self.viewer.cam.trackbodyid = 1 - - def reset_model(self): - # This also generates a goal, we however do not need/use it - qpos = self.np_random.uniform(low=-0.1, high=0.1, size=self.model.nq) + self.init_qpos - qpos[-2:] = 0 - qvel = self.init_qvel + self.np_random.uniform(low=-.005, high=.005, size=self.model.nv) - qvel[-2:] = 0 - self.set_state(qpos, qvel) - - return self._get_obs() - - def _get_obs(self): - theta = self.sim.data.qpos.flat[:self.n_links] - return np.concatenate([ - np.cos(theta), - np.sin(theta), - self.sim.data.qvel.flat[:self.n_links], # this is angular velocity - ]) diff --git a/alr_envs/alr/mujoco/reacher/mp_wrapper.py b/alr_envs/alr/mujoco/reacher/mp_wrapper.py deleted file mode 100644 index abcdc50..0000000 --- a/alr_envs/alr/mujoco/reacher/mp_wrapper.py +++ /dev/null @@ -1,43 +0,0 @@ -from typing import Union - -import numpy as np -from mp_env_api import MPEnvWrapper - - -class MPWrapper(MPEnvWrapper): - - @property - def active_obs(self): - return np.concatenate([ - [False] * self.n_links, # cos - [False] * self.n_links, # sin - [True] * 2, # goal position - [False] * self.n_links, # angular velocity - [False] * 3, # goal distance - # self.get_body_com("target"), # only return target to make problem harder - [False], # step - ]) - - # @property - # def active_obs(self): - # return np.concatenate([ - # [True] * self.n_links, # cos, True - # [True] * self.n_links, # sin, True - # [True] * 2, # goal position - # [True] * self.n_links, # angular velocity, True - # [True] * 3, # goal distance - # # self.get_body_com("target"), # only return target to make problem harder - # [False], # step - # ]) - - @property - def current_vel(self) -> Union[float, int, np.ndarray]: - return self.sim.data.qvel.flat[:self.n_links] - - @property - def current_pos(self) -> Union[float, int, np.ndarray]: - return self.sim.data.qpos.flat[:self.n_links] - - @property - def dt(self) -> Union[float, int]: - return self.env.dt diff --git a/alr_envs/alr/mujoco/table_tennis/mp_wrapper.py b/alr_envs/alr/mujoco/table_tennis/mp_wrapper.py deleted file mode 100644 index 86d5a00..0000000 --- a/alr_envs/alr/mujoco/table_tennis/mp_wrapper.py +++ /dev/null @@ -1,38 +0,0 @@ -from typing import Tuple, Union - -import numpy as np - -from mp_env_api.interface_wrappers.mp_env_wrapper import MPEnvWrapper - - -class MPWrapper(MPEnvWrapper): - - @property - def active_obs(self): - # TODO: @Max Filter observations correctly - return np.hstack([ - [True] * 7, # Joint Pos - [True] * 3, # Ball pos - [True] * 3 # goal pos - ]) - - @property - def start_pos(self): - return self.self.init_qpos_tt - - @property - def current_pos(self) -> Union[float, int, np.ndarray, Tuple]: - return self.sim.data.qpos[:7].copy() - - @property - def current_vel(self) -> Union[float, int, np.ndarray, Tuple]: - return self.sim.data.qvel[:7].copy() - - @property - def goal_pos(self): - # TODO: @Max I think the default value of returning to the start is reasonable here - raise ValueError("Goal position is not available and has to be learnt based on the environment.") - - @property - def dt(self) -> Union[float, int]: - return self.env.dt diff --git a/alr_envs/alr/mujoco/table_tennis/tt_gym.py b/alr_envs/alr/mujoco/table_tennis/tt_gym.py deleted file mode 100644 index d1c2dc3..0000000 --- a/alr_envs/alr/mujoco/table_tennis/tt_gym.py +++ /dev/null @@ -1,180 +0,0 @@ -import os - -import numpy as np -import mujoco_py -from gym import utils, spaces -from gym.envs.mujoco import MujocoEnv - -from 
alr_envs.alr.mujoco.table_tennis.tt_utils import ball_init -from alr_envs.alr.mujoco.table_tennis.tt_reward import TT_Reward - -#TODO: Check for simulation stability. Make sure the code runs even for sim crash - -MAX_EPISODE_STEPS = 1750 -BALL_NAME_CONTACT = "target_ball_contact" -BALL_NAME = "target_ball" -TABLE_NAME = 'table_tennis_table' -FLOOR_NAME = 'floor' -PADDLE_CONTACT_1_NAME = 'bat' -PADDLE_CONTACT_2_NAME = 'bat_back' -RACKET_NAME = 'bat' -# CONTEXT_RANGE_BOUNDS_2DIM = np.array([[-1.2, -0.6], [-0.2, 0.6]]) -CONTEXT_RANGE_BOUNDS_2DIM = np.array([[-1.2, -0.6], [-0.2, 0.0]]) -CONTEXT_RANGE_BOUNDS_4DIM = np.array([[-1.35, -0.75, -1.25, -0.75], [-0.1, 0.75, -0.1, 0.75]]) - - -class TTEnvGym(MujocoEnv, utils.EzPickle): - - def __init__(self, ctxt_dim=2, fixed_goal=False): - model_path = os.path.join(os.path.dirname(__file__), "xml", 'table_tennis_env.xml') - - self.ctxt_dim = ctxt_dim - self.fixed_goal = fixed_goal - if ctxt_dim == 2: - self.context_range_bounds = CONTEXT_RANGE_BOUNDS_2DIM - if self.fixed_goal: - self.goal = np.array([-1, -0.1, 0]) - else: - self.goal = np.zeros(3) # 2 x,y + 1z - elif ctxt_dim == 4: - self.context_range_bounds = CONTEXT_RANGE_BOUNDS_4DIM - self.goal = np.zeros(3) - else: - raise ValueError("either 2 or 4 dimensional Contexts available") - - # has no effect as it is overwritten in init of super - # action_space_low = np.array([-2.6, -2.0, -2.8, -0.9, -4.8, -1.6, -2.2]) - # action_space_high = np.array([2.6, 2.0, 2.8, 3.1, 1.3, 1.6, 2.2]) - # self.action_space = spaces.Box(low=action_space_low, high=action_space_high, dtype='float64') - - self.time_steps = 0 - self.init_qpos_tt = np.array([0, 0, 0, 1.5, 0, 0, 1.5, 0, 0, 0]) - self.init_qvel_tt = np.zeros(10) - - self.reward_func = TT_Reward(self.ctxt_dim) - self.ball_landing_pos = None - self.hit_ball = False - self.ball_contact_after_hit = False - self._ids_set = False - super(TTEnvGym, self).__init__(model_path=model_path, frame_skip=1) - self.ball_id = self.sim.model._body_name2id[BALL_NAME] # find the proper -> not protected func. - self.ball_contact_id = self.sim.model._geom_name2id[BALL_NAME_CONTACT] - self.table_contact_id = self.sim.model._geom_name2id[TABLE_NAME] - self.floor_contact_id = self.sim.model._geom_name2id[FLOOR_NAME] - self.paddle_contact_id_1 = self.sim.model._geom_name2id[PADDLE_CONTACT_1_NAME] # check if we need both or only this - self.paddle_contact_id_2 = self.sim.model._geom_name2id[PADDLE_CONTACT_2_NAME] # check if we need both or only this - self.racket_id = self.sim.model._geom_name2id[RACKET_NAME] - - def _set_ids(self): - self.ball_id = self.sim.model._body_name2id[BALL_NAME] # find the proper -> not protected func. 
- self.table_contact_id = self.sim.model._geom_name2id[TABLE_NAME] - self.floor_contact_id = self.sim.model._geom_name2id[FLOOR_NAME] - self.paddle_contact_id_1 = self.sim.model._geom_name2id[PADDLE_CONTACT_1_NAME] # check if we need both or only this - self.paddle_contact_id_2 = self.sim.model._geom_name2id[PADDLE_CONTACT_2_NAME] # check if we need both or only this - self.racket_id = self.sim.model._geom_name2id[RACKET_NAME] - self.ball_contact_id = self.sim.model._geom_name2id[BALL_NAME_CONTACT] - self._ids_set = True - - def _get_obs(self): - ball_pos = self.sim.data.body_xpos[self.ball_id] - obs = np.concatenate([self.sim.data.qpos[:7].copy(), # 7 joint positions - ball_pos, - self.goal.copy()]) - return obs - - def sample_context(self): - return self.np_random.uniform(self.context_range_bounds[0], self.context_range_bounds[1], size=self.ctxt_dim) - - def reset_model(self): - self.set_state(self.init_qpos_tt, self.init_qvel_tt) # reset to initial sim state - self.time_steps = 0 - self.ball_landing_pos = None - self.hit_ball = False - self.ball_contact_after_hit = False - if self.fixed_goal: - self.goal = self.goal[:2] - else: - self.goal = self.sample_context()[:2] - if self.ctxt_dim == 2: - initial_ball_state = ball_init(random=False) # fixed velocity, fixed position - elif self.ctxt_dim == 4: - initial_ball_state = ball_init(random=False)#raise NotImplementedError - - self.sim.data.set_joint_qpos('tar:x', initial_ball_state[0]) - self.sim.data.set_joint_qpos('tar:y', initial_ball_state[1]) - self.sim.data.set_joint_qpos('tar:z', initial_ball_state[2]) - - self.sim.data.set_joint_qvel('tar:x', initial_ball_state[3]) - self.sim.data.set_joint_qvel('tar:y', initial_ball_state[4]) - self.sim.data.set_joint_qvel('tar:z', initial_ball_state[5]) - - z_extended_goal_pos = np.concatenate((self.goal[:2], [0.77])) - self.goal = z_extended_goal_pos - self.sim.model.body_pos[5] = self.goal[:3] # Desired Landing Position, Yellow - self.sim.model.body_pos[3] = np.array([0, 0, 0.5]) # Outgoing Ball Landing Position, Green - self.sim.model.body_pos[4] = np.array([0, 0, 0.5]) # Incoming Ball Landing Position, Red - self.sim.forward() - - self.reward_func.reset(self.goal) # reset the reward function - return self._get_obs() - - def _contact_checker(self, id_1, id_2): - for coni in range(0, self.sim.data.ncon): - con = self.sim.data.contact[coni] - collision = con.geom1 == id_1 and con.geom2 == id_2 - collision_trans = con.geom1 == id_2 and con.geom2 == id_1 - if collision or collision_trans: - return True - return False - - def step(self, action): - if not self._ids_set: - self._set_ids() - done = False - episode_end = False if self.time_steps + 1 < MAX_EPISODE_STEPS else True - if not self.hit_ball: - self.hit_ball = self._contact_checker(self.ball_contact_id, self.paddle_contact_id_1) # check for one side - if not self.hit_ball: - self.hit_ball = self._contact_checker(self.ball_contact_id, self.paddle_contact_id_2) # check for other side - if self.hit_ball: - if not self.ball_contact_after_hit: - if self._contact_checker(self.ball_contact_id, self.floor_contact_id): # first check contact with floor - self.ball_contact_after_hit = True - self.ball_landing_pos = self.sim.data.body_xpos[self.ball_id] - elif self._contact_checker(self.ball_contact_id, self.table_contact_id): # second check contact with table - self.ball_contact_after_hit = True - self.ball_landing_pos = self.sim.data.body_xpos[self.ball_id] - c_ball_pos = self.sim.data.body_xpos[self.ball_id] - racket_pos = 
self.sim.data.geom_xpos[self.racket_id] # TODO: use this to reach out the position of the paddle? - if self.ball_landing_pos is not None: - done = True - episode_end =True - reward = self.reward_func.get_reward(episode_end, c_ball_pos, racket_pos, self.hit_ball, self.ball_landing_pos) - self.time_steps += 1 - # gravity compensation on joints: - #action += self.sim.data.qfrc_bias[:7].copy() - try: - self.do_simulation(action, self.frame_skip) - except mujoco_py.MujocoException as e: - print('Simulation got unstable returning') - done = True - reward = -25 - ob = self._get_obs() - info = {"hit_ball": self.hit_ball, - "q_pos": np.copy(self.sim.data.qpos[:7]), - "ball_pos": np.copy(self.sim.data.qpos[7:])} - return ob, reward, done, info # might add some information here .... - - def set_context(self, context): - old_state = self.sim.get_state() - qpos = old_state.qpos.copy() - qvel = old_state.qvel.copy() - self.set_state(qpos, qvel) - self.goal = context - z_extended_goal_pos = np.concatenate((self.goal[:self.ctxt_dim], [0.77])) - if self.ctxt_dim == 4: - z_extended_goal_pos = np.concatenate((z_extended_goal_pos, [0.77])) - self.goal = z_extended_goal_pos - self.sim.model.body_pos[5] = self.goal[:3] # TODO: Missing: Setting the desired incomoing landing position - self.sim.forward() - return self._get_obs() diff --git a/alr_envs/alr/mujoco/table_tennis/tt_reward.py b/alr_envs/alr/mujoco/table_tennis/tt_reward.py deleted file mode 100644 index 0e1bebf..0000000 --- a/alr_envs/alr/mujoco/table_tennis/tt_reward.py +++ /dev/null @@ -1,48 +0,0 @@ -import numpy as np - - -class TT_Reward: - - def __init__(self, ctxt_dim): - self.ctxt_dim = ctxt_dim - self.c_goal = None # current desired landing point - self.c_ball_traj = [] - self.c_racket_traj = [] - self.constant = 8 - - def get_reward(self, episode_end, ball_position, racket_pos, hited_ball, ball_landing_pos): - self.c_ball_traj.append(ball_position.copy()) - self.c_racket_traj.append(racket_pos.copy()) - if not episode_end: - return 0 - else: - # # seems to work for episodic case - min_r_b_dist = np.min(np.linalg.norm(np.array(self.c_ball_traj) - np.array(self.c_racket_traj), axis=1)) - if not hited_ball: - return 0.2 * (1 - np.tanh(min_r_b_dist**2)) - else: - if ball_landing_pos is None: - min_b_des_b_dist = np.min(np.linalg.norm(np.array(self.c_ball_traj)[:,:2] - self.c_goal[:2], axis=1)) - return 2 * (1 - np.tanh(min_r_b_dist ** 2)) + (1 - np.tanh(min_b_des_b_dist**2)) - else: - min_b_des_b_land_dist = np.linalg.norm(self.c_goal[:2] - ball_landing_pos[:2]) - over_net_bonus = int(ball_landing_pos[0] < 0) - return 2 * (1 - np.tanh(min_r_b_dist ** 2)) + 4 * (1 - np.tanh(min_b_des_b_land_dist ** 2)) + over_net_bonus - - - # if not hited_ball: - # min_r_b_dist = 1 + np.min(np.linalg.norm(np.array(self.c_ball_traj) - np.array(self.c_racket_traj), axis=1)) - # return -min_r_b_dist - # else: - # if ball_landing_pos is None: - # dist_to_des_pos = 1-np.power(np.linalg.norm(self.c_goal - ball_position), 0.75)/self.constant - # else: - # dist_to_des_pos = 1-np.power(np.linalg.norm(self.c_goal - ball_landing_pos), 0.75)/self.constant - # if dist_to_des_pos < -0.2: - # dist_to_des_pos = -0.2 - # return -dist_to_des_pos - - def reset(self, goal): - self.c_goal = goal.copy() - self.c_ball_traj = [] - self.c_racket_traj = [] \ No newline at end of file diff --git a/alr_envs/alr/mujoco/table_tennis/tt_utils.py b/alr_envs/alr/mujoco/table_tennis/tt_utils.py deleted file mode 100644 index 04a1b97..0000000 --- a/alr_envs/alr/mujoco/table_tennis/tt_utils.py +++ 
/dev/null @@ -1,26 +0,0 @@ -import numpy as np - - -def ball_init(random=False, context_range=None): - if random: - dx = np.random.uniform(-0.1, 0.1) # TODO: clarify these numbers? - dy = np.random.uniform(-0.1, 0.1) # TODO: clarify these numbers? - dz = np.random.uniform(-0.1, 0.1) # TODO: clarify these numbers? - - v_x = np.random.uniform(1.7, 1.8) - v_y = np.random.uniform(0.7, 0.8) - v_z = np.random.uniform(0.1, 0.2) - else: - dx = 1 - dy = 0 - dz = 0.05 - - v_x = 2.5 - v_y = 2 - v_z = 0.5 - - initial_x = 0 + dx - 1.2 - initial_y = -0.2 + dy - 0.6 - initial_z = 0.3 + dz + 1.5 - initial_ball_state = np.array([initial_x, initial_y, initial_z, v_x, v_y, v_z]) - return initial_ball_state diff --git a/alr_envs/alr/mujoco/table_tennis/xml/include_7_motor_actuator.xml b/alr_envs/alr/mujoco/table_tennis/xml/include_7_motor_actuator.xml deleted file mode 100644 index ec94d96..0000000 --- a/alr_envs/alr/mujoco/table_tennis/xml/include_7_motor_actuator.xml +++ /dev/null @@ -1,12 +0,0 @@ - - - - - - - - - - - - diff --git a/alr_envs/alr/mujoco/table_tennis/xml/include_barrett_wam_7dof_right.xml b/alr_envs/alr/mujoco/table_tennis/xml/include_barrett_wam_7dof_right.xml deleted file mode 100644 index 7e04c28..0000000 --- a/alr_envs/alr/mujoco/table_tennis/xml/include_barrett_wam_7dof_right.xml +++ /dev/null @@ -1,103 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/alr_envs/alr/mujoco/table_tennis/xml/include_table.xml b/alr_envs/alr/mujoco/table_tennis/xml/include_table.xml deleted file mode 100644 index c313489..0000000 --- a/alr_envs/alr/mujoco/table_tennis/xml/include_table.xml +++ /dev/null @@ -1,30 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/alr_envs/alr/mujoco/table_tennis/xml/include_target_ball.xml b/alr_envs/alr/mujoco/table_tennis/xml/include_target_ball.xml deleted file mode 100644 index 19543e2..0000000 --- a/alr_envs/alr/mujoco/table_tennis/xml/include_target_ball.xml +++ /dev/null @@ -1,10 +0,0 @@ - - - - - - - - - \ No newline at end of file diff --git a/alr_envs/alr/mujoco/table_tennis/xml/right_arm_actuator.xml b/alr_envs/alr/mujoco/table_tennis/xml/right_arm_actuator.xml deleted file mode 100644 index 25d0366..0000000 --- a/alr_envs/alr/mujoco/table_tennis/xml/right_arm_actuator.xml +++ /dev/null @@ -1,47 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - diff --git a/alr_envs/alr/mujoco/table_tennis/xml/shared.xml b/alr_envs/alr/mujoco/table_tennis/xml/shared.xml deleted file mode 100644 index e349992..0000000 --- a/alr_envs/alr/mujoco/table_tennis/xml/shared.xml +++ /dev/null @@ -1,46 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/alr_envs/alr/mujoco/table_tennis/xml/table_tennis_env.xml b/alr_envs/alr/mujoco/table_tennis/xml/table_tennis_env.xml deleted file mode 100644 index 9609a6e..0000000 --- a/alr_envs/alr/mujoco/table_tennis/xml/table_tennis_env.xml +++ /dev/null @@ -1,18 +0,0 @@ - - - - - - - - - - - - - - - - \ No newline at end of file diff --git a/alr_envs/dmc/__init__.py b/alr_envs/dmc/__init__.py deleted file mode 100644 index ac34415..0000000 --- a/alr_envs/dmc/__init__.py +++ /dev/null @@ -1,378 +0,0 @@ -from . 
import manipulation, suite - -ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS = {"DMP": [], "ProMP": []} - -from gym.envs.registration import register - -# DeepMind Control Suite (DMC) - -register( - id=f'dmc_ball_in_cup-catch_dmp-v0', - entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper', - # max_episode_steps=1, - kwargs={ - "name": f"ball_in_cup-catch", - "time_limit": 20, - "episode_length": 1000, - "wrappers": [suite.ball_in_cup.MPWrapper], - "mp_kwargs": { - "num_dof": 2, - "num_basis": 5, - "duration": 20, - "learn_goal": True, - "alpha_phase": 2, - "bandwidth_factor": 2, - "policy_type": "motor", - "goal_scale": 0.1, - "policy_kwargs": { - "p_gains": 50, - "d_gains": 1 - } - } - } -) -ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"].append("dmc_ball_in_cup-catch_dmp-v0") - -register( - id=f'dmc_ball_in_cup-catch_promp-v0', - entry_point='alr_envs.utils.make_env_helpers:make_promp_env_helper', - kwargs={ - "name": f"ball_in_cup-catch", - "time_limit": 20, - "episode_length": 1000, - "wrappers": [suite.ball_in_cup.MPWrapper], - "mp_kwargs": { - "num_dof": 2, - "num_basis": 5, - "duration": 20, - "policy_type": "motor", - "zero_start": True, - "policy_kwargs": { - "p_gains": 50, - "d_gains": 1 - } - } - } -) -ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS["ProMP"].append("dmc_ball_in_cup-catch_promp-v0") - -register( - id=f'dmc_reacher-easy_dmp-v0', - entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper', - # max_episode_steps=1, - kwargs={ - "name": f"reacher-easy", - "time_limit": 20, - "episode_length": 1000, - "wrappers": [suite.reacher.MPWrapper], - "mp_kwargs": { - "num_dof": 2, - "num_basis": 5, - "duration": 20, - "learn_goal": True, - "alpha_phase": 2, - "bandwidth_factor": 2, - "policy_type": "motor", - "weights_scale": 50, - "goal_scale": 0.1, - "policy_kwargs": { - "p_gains": 50, - "d_gains": 1 - } - } - } -) -ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"].append("dmc_reacher-easy_dmp-v0") - -register( - id=f'dmc_reacher-easy_promp-v0', - entry_point='alr_envs.utils.make_env_helpers:make_promp_env_helper', - kwargs={ - "name": f"reacher-easy", - "time_limit": 20, - "episode_length": 1000, - "wrappers": [suite.reacher.MPWrapper], - "mp_kwargs": { - "num_dof": 2, - "num_basis": 5, - "duration": 20, - "policy_type": "motor", - "weights_scale": 0.2, - "zero_start": True, - "policy_kwargs": { - "p_gains": 50, - "d_gains": 1 - } - } - } -) -ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS["ProMP"].append("dmc_reacher-easy_promp-v0") - -register( - id=f'dmc_reacher-hard_dmp-v0', - entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper', - # max_episode_steps=1, - kwargs={ - "name": f"reacher-hard", - "time_limit": 20, - "episode_length": 1000, - "wrappers": [suite.reacher.MPWrapper], - "mp_kwargs": { - "num_dof": 2, - "num_basis": 5, - "duration": 20, - "learn_goal": True, - "alpha_phase": 2, - "bandwidth_factor": 2, - "policy_type": "motor", - "weights_scale": 50, - "goal_scale": 0.1, - "policy_kwargs": { - "p_gains": 50, - "d_gains": 1 - } - } - } -) -ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"].append("dmc_reacher-hard_dmp-v0") - -register( - id=f'dmc_reacher-hard_promp-v0', - entry_point='alr_envs.utils.make_env_helpers:make_promp_env_helper', - kwargs={ - "name": f"reacher-hard", - "time_limit": 20, - "episode_length": 1000, - "wrappers": [suite.reacher.MPWrapper], - "mp_kwargs": { - "num_dof": 2, - "num_basis": 5, - "duration": 20, - "policy_type": "motor", - "weights_scale": 0.2, - "zero_start": True, - "policy_kwargs": { - "p_gains": 50, - 
"d_gains": 1 - } - } - } -) -ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS["ProMP"].append("dmc_reacher-hard_promp-v0") - -_dmc_cartpole_tasks = ["balance", "balance_sparse", "swingup", "swingup_sparse"] - -for _task in _dmc_cartpole_tasks: - _env_id = f'dmc_cartpole-{_task}_dmp-v0' - register( - id=_env_id, - entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper', - # max_episode_steps=1, - kwargs={ - "name": f"cartpole-{_task}", - # "time_limit": 1, - "camera_id": 0, - "episode_length": 1000, - "wrappers": [suite.cartpole.MPWrapper], - "mp_kwargs": { - "num_dof": 1, - "num_basis": 5, - "duration": 10, - "learn_goal": True, - "alpha_phase": 2, - "bandwidth_factor": 2, - "policy_type": "motor", - "weights_scale": 50, - "goal_scale": 0.1, - "policy_kwargs": { - "p_gains": 10, - "d_gains": 10 - } - } - } - ) - ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id) - - _env_id = f'dmc_cartpole-{_task}_promp-v0' - register( - id=_env_id, - entry_point='alr_envs.utils.make_env_helpers:make_promp_env_helper', - kwargs={ - "name": f"cartpole-{_task}", - # "time_limit": 1, - "camera_id": 0, - "episode_length": 1000, - "wrappers": [suite.cartpole.MPWrapper], - "mp_kwargs": { - "num_dof": 1, - "num_basis": 5, - "duration": 10, - "policy_type": "motor", - "weights_scale": 0.2, - "zero_start": True, - "policy_kwargs": { - "p_gains": 10, - "d_gains": 10 - } - } - } - ) - ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id) - -_env_id = f'dmc_cartpole-two_poles_dmp-v0' -register( - id=_env_id, - entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper', - # max_episode_steps=1, - kwargs={ - "name": f"cartpole-two_poles", - # "time_limit": 1, - "camera_id": 0, - "episode_length": 1000, - "wrappers": [suite.cartpole.TwoPolesMPWrapper], - "mp_kwargs": { - "num_dof": 1, - "num_basis": 5, - "duration": 10, - "learn_goal": True, - "alpha_phase": 2, - "bandwidth_factor": 2, - "policy_type": "motor", - "weights_scale": 50, - "goal_scale": 0.1, - "policy_kwargs": { - "p_gains": 10, - "d_gains": 10 - } - } - } -) -ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id) - -_env_id = f'dmc_cartpole-two_poles_promp-v0' -register( - id=_env_id, - entry_point='alr_envs.utils.make_env_helpers:make_promp_env_helper', - kwargs={ - "name": f"cartpole-two_poles", - # "time_limit": 1, - "camera_id": 0, - "episode_length": 1000, - "wrappers": [suite.cartpole.TwoPolesMPWrapper], - "mp_kwargs": { - "num_dof": 1, - "num_basis": 5, - "duration": 10, - "policy_type": "motor", - "weights_scale": 0.2, - "zero_start": True, - "policy_kwargs": { - "p_gains": 10, - "d_gains": 10 - } - } - } -) -ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id) - -_env_id = f'dmc_cartpole-three_poles_dmp-v0' -register( - id=_env_id, - entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper', - # max_episode_steps=1, - kwargs={ - "name": f"cartpole-three_poles", - # "time_limit": 1, - "camera_id": 0, - "episode_length": 1000, - "wrappers": [suite.cartpole.ThreePolesMPWrapper], - "mp_kwargs": { - "num_dof": 1, - "num_basis": 5, - "duration": 10, - "learn_goal": True, - "alpha_phase": 2, - "bandwidth_factor": 2, - "policy_type": "motor", - "weights_scale": 50, - "goal_scale": 0.1, - "policy_kwargs": { - "p_gains": 10, - "d_gains": 10 - } - } - } -) -ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id) - -_env_id = f'dmc_cartpole-three_poles_promp-v0' -register( - id=_env_id, - entry_point='alr_envs.utils.make_env_helpers:make_promp_env_helper', - kwargs={ - "name": 
f"cartpole-three_poles", - # "time_limit": 1, - "camera_id": 0, - "episode_length": 1000, - "wrappers": [suite.cartpole.ThreePolesMPWrapper], - "mp_kwargs": { - "num_dof": 1, - "num_basis": 5, - "duration": 10, - "policy_type": "motor", - "weights_scale": 0.2, - "zero_start": True, - "policy_kwargs": { - "p_gains": 10, - "d_gains": 10 - } - } - } -) -ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id) - -# DeepMind Manipulation - -register( - id=f'dmc_manipulation-reach_site_dmp-v0', - entry_point='alr_envs.utils.make_env_helpers:make_dmp_env_helper', - # max_episode_steps=1, - kwargs={ - "name": f"manipulation-reach_site_features", - # "time_limit": 1, - "episode_length": 250, - "wrappers": [manipulation.reach_site.MPWrapper], - "mp_kwargs": { - "num_dof": 9, - "num_basis": 5, - "duration": 10, - "learn_goal": True, - "alpha_phase": 2, - "bandwidth_factor": 2, - "policy_type": "velocity", - "weights_scale": 50, - "goal_scale": 0.1, - } - } -) -ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"].append("dmc_manipulation-reach_site_dmp-v0") - -register( - id=f'dmc_manipulation-reach_site_promp-v0', - entry_point='alr_envs.utils.make_env_helpers:make_promp_env_helper', - kwargs={ - "name": f"manipulation-reach_site_features", - # "time_limit": 1, - "episode_length": 250, - "wrappers": [manipulation.reach_site.MPWrapper], - "mp_kwargs": { - "num_dof": 9, - "num_basis": 5, - "duration": 10, - "policy_type": "velocity", - "weights_scale": 0.2, - "zero_start": True, - } - } -) -ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS["ProMP"].append("dmc_manipulation-reach_site_promp-v0") diff --git a/alr_envs/dmc/dmc_wrapper.py b/alr_envs/dmc/dmc_wrapper.py deleted file mode 100644 index aa6c7aa..0000000 --- a/alr_envs/dmc/dmc_wrapper.py +++ /dev/null @@ -1,206 +0,0 @@ -# Adopted from: https://github.com/denisyarats/dmc2gym/blob/master/dmc2gym/wrappers.py -# License: MIT -# Copyright (c) 2020 Denis Yarats -import collections -from typing import Any, Dict, Tuple - -import numpy as np -from dm_control import manipulation, suite -from dm_env import specs -from gym import core, spaces - - -def _spec_to_box(spec): - def extract_min_max(s): - assert s.dtype == np.float64 or s.dtype == np.float32, f"Only float64 and float32 types are allowed, instead {s.dtype} was found" - dim = int(np.prod(s.shape)) - if type(s) == specs.Array: - bound = np.inf * np.ones(dim, dtype=s.dtype) - return -bound, bound - elif type(s) == specs.BoundedArray: - zeros = np.zeros(dim, dtype=s.dtype) - return s.minimum + zeros, s.maximum + zeros - - mins, maxs = [], [] - for s in spec: - mn, mx = extract_min_max(s) - mins.append(mn) - maxs.append(mx) - low = np.concatenate(mins, axis=0) - high = np.concatenate(maxs, axis=0) - assert low.shape == high.shape - return spaces.Box(low, high, dtype=s.dtype) - - -def _flatten_obs(obs: collections.MutableMapping): - """ - Flattens an observation of type MutableMapping, e.g. a dict to a 1D array. - Args: - obs: observation to flatten - - Returns: 1D array of observation - - """ - - if not isinstance(obs, collections.MutableMapping): - raise ValueError(f'Requires dict-like observations structure. 
{type(obs)} found.') - - # Keep key order consistent for non OrderedDicts - keys = obs.keys() if isinstance(obs, collections.OrderedDict) else sorted(obs.keys()) - - obs_vals = [np.array([obs[key]]) if np.isscalar(obs[key]) else obs[key].ravel() for key in keys] - return np.concatenate(obs_vals) - - -class DMCWrapper(core.Env): - def __init__( - self, - domain_name: str, - task_name: str, - task_kwargs: dict = {}, - visualize_reward: bool = True, - from_pixels: bool = False, - height: int = 84, - width: int = 84, - camera_id: int = 0, - frame_skip: int = 1, - environment_kwargs: dict = None, - channels_first: bool = True - ): - assert 'random' in task_kwargs, 'Please specify a seed for deterministic behavior.' - self._from_pixels = from_pixels - self._height = height - self._width = width - self._camera_id = camera_id - self._frame_skip = frame_skip - self._channels_first = channels_first - - # create task - if domain_name == "manipulation": - assert not from_pixels and not task_name.endswith("_vision"), \ - "TODO: Vision interface for manipulation is different to suite and needs to be implemented" - self._env = manipulation.load(environment_name=task_name, seed=task_kwargs['random']) - else: - self._env = suite.load(domain_name=domain_name, task_name=task_name, task_kwargs=task_kwargs, - visualize_reward=visualize_reward, environment_kwargs=environment_kwargs) - - # action and observation space - self._action_space = _spec_to_box([self._env.action_spec()]) - self._observation_space = _spec_to_box(self._env.observation_spec().values()) - - self._last_state = None - self.viewer = None - - # set seed - self.seed(seed=task_kwargs.get('random', 1)) - - def __getattr__(self, item): - """Propagate only non-existent properties to wrapped env.""" - if item.startswith('_'): - raise AttributeError("attempted to get missing private attribute '{}'".format(item)) - if item in self.__dict__: - return getattr(self, item) - return getattr(self._env, item) - - def _get_obs(self, time_step): - if self._from_pixels: - obs = self.render( - mode="rgb_array", - height=self._height, - width=self._width, - camera_id=self._camera_id - ) - if self._channels_first: - obs = obs.transpose(2, 0, 1).copy() - else: - obs = _flatten_obs(time_step.observation).astype(self.observation_space.dtype) - return obs - - @property - def observation_space(self): - return self._observation_space - - @property - def action_space(self): - return self._action_space - - @property - def dt(self): - return self._env.control_timestep() * self._frame_skip - - @property - def base_step_limit(self): - """ - Returns: max_episode_steps of the underlying DMC env - - """ - # Accessing private attribute because DMC does not expose time_limit or step_limit. - # Only the current time_step/time as well as the control_timestep can be accessed. - try: - return (self._env._step_limit + self._frame_skip - 1) // self._frame_skip - except AttributeError as e: - return self._env._time_limit / self.dt - - def seed(self, seed=None): - self._action_space.seed(seed) - self._observation_space.seed(seed) - - def step(self, action) -> Tuple[np.ndarray, float, bool, Dict[str, Any]]: - assert self._action_space.contains(action) - reward = 0 - extra = {'internal_state': self._env.physics.get_state().copy()} - - for _ in range(self._frame_skip): - time_step = self._env.step(action) - reward += time_step.reward or 0. 
- done = time_step.last() - if done: - break - - self._last_state = _flatten_obs(time_step.observation) - obs = self._get_obs(time_step) - extra['discount'] = time_step.discount - return obs, reward, done, extra - - def reset(self) -> np.ndarray: - time_step = self._env.reset() - self._last_state = _flatten_obs(time_step.observation) - obs = self._get_obs(time_step) - return obs - - def render(self, mode='rgb_array', height=None, width=None, camera_id=0): - if self._last_state is None: - raise ValueError('Environment not ready to render. Call reset() first.') - - camera_id = camera_id or self._camera_id - - # assert mode == 'rgb_array', 'only support rgb_array mode, given %s' % mode - if mode == "rgb_array": - height = height or self._height - width = width or self._width - return self._env.physics.render(height=height, width=width, camera_id=camera_id) - - elif mode == 'human': - if self.viewer is None: - # pylint: disable=import-outside-toplevel - # pylint: disable=g-import-not-at-top - from gym.envs.classic_control import rendering - self.viewer = rendering.SimpleImageViewer() - # Render max available buffer size. Larger is only possible by altering the XML. - img = self._env.physics.render(height=self._env.physics.model.vis.global_.offheight, - width=self._env.physics.model.vis.global_.offwidth, - camera_id=camera_id) - self.viewer.imshow(img) - return self.viewer.isopen - - def close(self): - super().close() - if self.viewer is not None and self.viewer.isopen: - self.viewer.close() - - @property - def reward_range(self) -> Tuple[float, float]: - reward_spec = self._env.reward_spec() - if isinstance(reward_spec, specs.BoundedArray): - return reward_spec.minimum, reward_spec.maximum - return -float('inf'), float('inf') diff --git a/alr_envs/examples/examples_motion_primitives.py b/alr_envs/examples/examples_motion_primitives.py deleted file mode 100644 index 1a679df..0000000 --- a/alr_envs/examples/examples_motion_primitives.py +++ /dev/null @@ -1,164 +0,0 @@ -import alr_envs - - -def example_mp(env_name="alr_envs:HoleReacherDMP-v1", seed=1, iterations=1, render=True): - """ - Example for running a motion primitive based environment, which is already registered - Args: - env_name: DMP env_id - seed: seed for deterministic behaviour - iterations: Number of rollout steps to run - render: Render the episode - - Returns: - - """ - # While in this case gym.make() is possible to use as well, we recommend our custom make env function. - # First, it already takes care of seeding and second enables the use of DMC tasks within the gym interface. - env = alr_envs.make(env_name, seed) - - rewards = 0 - # env.render(mode=None) - obs = env.reset() - - # number of samples/full trajectories (multiple environment steps) - for i in range(iterations): - - if render and i % 2 == 0: - # This renders the full MP trajectory - # It is only required to call render() once in the beginning, which renders every consecutive trajectory. - # Resetting to no rendering, can be achieved by render(mode=None). - # It is also possible to change the mode multiple times when - # e.g. only every second trajectory should be displayed, such as here - # Just make sure the correct mode is set before executing the step. 
- env.render(mode="human") - else: - env.render(mode=None) - - ac = env.action_space.sample() - obs, reward, done, info = env.step(ac) - rewards += reward - - if done: - print(rewards) - rewards = 0 - obs = env.reset() - - -def example_custom_mp(env_name="alr_envs:HoleReacherDMP-v1", seed=1, iterations=1, render=True): - """ - Example for running a motion primitive based environment, which is already registered - Args: - env_name: DMP env_id - seed: seed for deterministic behaviour - iterations: Number of rollout steps to run - render: Render the episode - - Returns: - - """ - # Changing the mp_kwargs is possible by providing them to gym. - # E.g. here by providing way to many basis functions - mp_kwargs = { - "num_dof": 5, - "num_basis": 1000, - "duration": 2, - "learn_goal": True, - "alpha_phase": 2, - "bandwidth_factor": 2, - "policy_type": "velocity", - "weights_scale": 50, - "goal_scale": 0.1 - } - env = alr_envs.make(env_name, seed, mp_kwargs=mp_kwargs) - - # This time rendering every trajectory - if render: - env.render(mode="human") - - rewards = 0 - obs = env.reset() - - # number of samples/full trajectories (multiple environment steps) - for i in range(iterations): - ac = env.action_space.sample() - obs, reward, done, info = env.step(ac) - rewards += reward - - if done: - print(rewards) - rewards = 0 - obs = env.reset() - - -def example_fully_custom_mp(seed=1, iterations=1, render=True): - """ - Example for running a custom motion primitive based environments. - Our already registered environments follow the same structure. - Hence, this also allows to adjust hyperparameters of the motion primitives. - Yet, we recommend the method above if you are just interested in chaining those parameters for existing tasks. - We appreciate PRs for custom environments (especially MP wrappers of existing tasks) - for our repo: https://github.com/ALRhub/alr_envs/ - Args: - seed: seed - iterations: Number of rollout steps to run - render: Render the episode - - Returns: - - """ - - base_env = "alr_envs:HoleReacher-v1" - - # Replace this wrapper with the custom wrapper for your environment by inheriting from the MPEnvWrapper. - # You can also add other gym.Wrappers in case they are needed. 
- wrappers = [alr_envs.alr.classic_control.hole_reacher.MPWrapper] - mp_kwargs = { - "num_dof": 5, - "num_basis": 5, - "duration": 2, - "learn_goal": True, - "alpha_phase": 2, - "bandwidth_factor": 2, - "policy_type": "velocity", - "weights_scale": 50, - "goal_scale": 0.1 - } - env = alr_envs.make_dmp_env(base_env, wrappers=wrappers, seed=seed, mp_kwargs=mp_kwargs) - # OR for a deterministic ProMP: - # env = make_promp_env(base_env, wrappers=wrappers, seed=seed, mp_kwargs=mp_kwargs) - - if render: - env.render(mode="human") - - rewards = 0 - obs = env.reset() - - # number of samples/full trajectories (multiple environment steps) - for i in range(iterations): - ac = env.action_space.sample() - obs, reward, done, info = env.step(ac) - rewards += reward - - if done: - print(rewards) - rewards = 0 - obs = env.reset() - - -if __name__ == '__main__': - render = False - # DMP - example_mp("alr_envs:HoleReacherDMP-v1", seed=10, iterations=1, render=render) - - # ProMP - example_mp("alr_envs:HoleReacherProMP-v1", seed=10, iterations=1, render=render) - - # DetProMP - example_mp("alr_envs:HoleReacherDetPMP-v1", seed=10, iterations=1, render=render) - - # Altered basis functions - example_custom_mp("alr_envs:HoleReacherDMP-v1", seed=10, iterations=1, render=render) - - # Custom MP - example_fully_custom_mp(seed=10, iterations=1, render=render) diff --git a/alr_envs/examples/examples_open_ai.py b/alr_envs/examples/examples_open_ai.py deleted file mode 100644 index dc0c558..0000000 --- a/alr_envs/examples/examples_open_ai.py +++ /dev/null @@ -1,41 +0,0 @@ -import alr_envs - - -def example_mp(env_name, seed=1): - """ - Example for running a motion primitive based version of a OpenAI-gym environment, which is already registered. - For more information on motion primitive specific stuff, look at the mp examples. - Args: - env_name: ProMP env_id - seed: seed - - Returns: - - """ - # While in this case gym.make() is possible to use as well, we recommend our custom make env function. 
- env = alr_envs.make(env_name, seed) - - rewards = 0 - obs = env.reset() - - # number of samples/full trajectories (multiple environment steps) - for i in range(10): - ac = env.action_space.sample() - obs, reward, done, info = env.step(ac) - rewards += reward - - if done: - print(rewards) - rewards = 0 - obs = env.reset() - - -if __name__ == '__main__': - # DMP - not supported yet - # example_mp("ReacherDMP-v2") - - # DetProMP - example_mp("ContinuousMountainCarProMP-v0") - example_mp("ReacherProMP-v2") - example_mp("FetchReachDenseProMP-v1") - example_mp("FetchSlideDenseProMP-v1") diff --git a/alr_envs/examples/pd_control_gain_tuning.py b/alr_envs/examples/pd_control_gain_tuning.py deleted file mode 100644 index eff432a..0000000 --- a/alr_envs/examples/pd_control_gain_tuning.py +++ /dev/null @@ -1,100 +0,0 @@ -import numpy as np -from matplotlib import pyplot as plt - -from alr_envs import dmc, meta -from alr_envs.alr import mujoco -from alr_envs.utils.make_env_helpers import make_promp_env - - -def visualize(env): - t = env.t - pos_features = env.mp.basis_generator.basis(t) - plt.plot(t, pos_features) - plt.show() - - -# This might work for some environments, however, please verify either way the correct trajectory information -# for your environment are extracted below -SEED = 1 -# env_id = "ball_in_cup-catch" -env_id = "ALRReacherSparse-v0" -env_id = "button-press-v2" -wrappers = [mujoco.reacher.MPWrapper] -wrappers = [meta.goal_object_change_mp_wrapper.MPWrapper] - -mp_kwargs = { - "num_dof": 4, - "num_basis": 5, - "duration": 6.25, - "policy_type": "metaworld", - "weights_scale": 10, - "zero_start": True, - # "policy_kwargs": { - # "p_gains": 1, - # "d_gains": 0.1 - # } -} - -# kwargs = dict(time_limit=4, episode_length=200) -kwargs = {} - -env = make_promp_env(env_id, wrappers, seed=SEED, mp_kwargs=mp_kwargs, **kwargs) -env.action_space.seed(SEED) - -# Plot difference between real trajectory and target MP trajectory -env.reset() -w = env.action_space.sample() # N(0,1) -visualize(env) -pos, vel = env.mp_rollout(w) - -base_shape = env.full_action_space.shape -actual_pos = np.zeros((len(pos), *base_shape)) -actual_vel = np.zeros((len(pos), *base_shape)) -act = np.zeros((len(pos), *base_shape)) - -plt.ion() -fig = plt.figure() -ax = fig.add_subplot(1, 1, 1) -img = ax.imshow(env.env.render("rgb_array")) -fig.show() - -for t, pos_vel in enumerate(zip(pos, vel)): - actions = env.policy.get_action(pos_vel[0], pos_vel[1]) - actions = np.clip(actions, env.full_action_space.low, env.full_action_space.high) - _, _, _, _ = env.env.step(actions) - if t % 15 == 0: - img.set_data(env.env.render("rgb_array")) - fig.canvas.draw() - fig.canvas.flush_events() - act[t, :] = actions - # TODO verify for your environment - actual_pos[t, :] = env.current_pos - actual_vel[t, :] = 0 # env.current_vel - -plt.figure(figsize=(15, 5)) - -plt.subplot(131) -plt.title("Position") -p1 = plt.plot(actual_pos, c='C0', label="true") -# plt.plot(actual_pos_ball, label="true pos ball") -p2 = plt.plot(pos, c='C1', label="MP") # , label=["MP" if i == 0 else None for i in range(np.prod(base_shape))]) -plt.xlabel("Episode steps") -# plt.legend() -handles, labels = plt.gca().get_legend_handles_labels() -from collections import OrderedDict - -by_label = OrderedDict(zip(labels, handles)) -plt.legend(by_label.values(), by_label.keys()) - -plt.subplot(132) -plt.title("Velocity") -plt.plot(actual_vel, c='C0', label="true") -plt.plot(vel, c='C1', label="MP") -plt.xlabel("Episode steps") - -plt.subplot(133) -plt.title(f"Actions 
{np.std(act, axis=0)}") -plt.plot(act, c="C0"), # label=[f"actions" if i == 0 else "" for i in range(np.prod(base_action_shape))]) -plt.xlabel("Episode steps") -# plt.legend() -plt.show() diff --git a/alr_envs/open_ai/__init__.py b/alr_envs/open_ai/__init__.py deleted file mode 100644 index 41b770f..0000000 --- a/alr_envs/open_ai/__init__.py +++ /dev/null @@ -1,154 +0,0 @@ -from gym import register -from gym.wrappers import FlattenObservation - -from . import classic_control, mujoco, robotics - -ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS = {"DMP": [], "ProMP": []} - -# Short Continuous Mountain Car -register( - id="MountainCarContinuous-v1", - entry_point="gym.envs.classic_control:Continuous_MountainCarEnv", - max_episode_steps=100, - reward_threshold=90.0, -) - -# Open AI -# Classic Control -register( - id='ContinuousMountainCarProMP-v1', - entry_point='alr_envs.utils.make_env_helpers:make_promp_env_helper', - kwargs={ - "name": "alr_envs:MountainCarContinuous-v1", - "wrappers": [classic_control.continuous_mountain_car.MPWrapper], - "mp_kwargs": { - "num_dof": 1, - "num_basis": 4, - "duration": 2, - "post_traj_time": 0, - "zero_start": True, - "policy_type": "motor", - "policy_kwargs": { - "p_gains": 1., - "d_gains": 1. - } - } - } -) -ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS["ProMP"].append("ContinuousMountainCarProMP-v1") - -register( - id='ContinuousMountainCarProMP-v0', - entry_point='alr_envs.utils.make_env_helpers:make_promp_env_helper', - kwargs={ - "name": "gym.envs.classic_control:MountainCarContinuous-v0", - "wrappers": [classic_control.continuous_mountain_car.MPWrapper], - "mp_kwargs": { - "num_dof": 1, - "num_basis": 4, - "duration": 19.98, - "post_traj_time": 0, - "zero_start": True, - "policy_type": "motor", - "policy_kwargs": { - "p_gains": 1., - "d_gains": 1. 
- } - } - } -) -ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS["ProMP"].append("ContinuousMountainCarProMP-v0") - -register( - id='ReacherProMP-v2', - entry_point='alr_envs.utils.make_env_helpers:make_promp_env_helper', - kwargs={ - "name": "gym.envs.mujoco:Reacher-v2", - "wrappers": [mujoco.reacher_v2.MPWrapper], - "mp_kwargs": { - "num_dof": 2, - "num_basis": 6, - "duration": 1, - "post_traj_time": 0, - "zero_start": True, - "policy_type": "motor", - "policy_kwargs": { - "p_gains": .6, - "d_gains": .075 - } - } - } -) -ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS["ProMP"].append("ReacherProMP-v2") - -register( - id='FetchSlideDenseProMP-v1', - entry_point='alr_envs.utils.make_env_helpers:make_promp_env_helper', - kwargs={ - "name": "gym.envs.robotics:FetchSlideDense-v1", - "wrappers": [FlattenObservation, robotics.fetch.MPWrapper], - "mp_kwargs": { - "num_dof": 4, - "num_basis": 5, - "duration": 2, - "post_traj_time": 0, - "zero_start": True, - "policy_type": "position" - } - } -) -ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS["ProMP"].append("FetchSlideDenseProMP-v1") - -register( - id='FetchSlideProMP-v1', - entry_point='alr_envs.utils.make_env_helpers:make_promp_env_helper', - kwargs={ - "name": "gym.envs.robotics:FetchSlide-v1", - "wrappers": [FlattenObservation, robotics.fetch.MPWrapper], - "mp_kwargs": { - "num_dof": 4, - "num_basis": 5, - "duration": 2, - "post_traj_time": 0, - "zero_start": True, - "policy_type": "position" - } - } -) -ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS["ProMP"].append("FetchSlideProMP-v1") - -register( - id='FetchReachDenseProMP-v1', - entry_point='alr_envs.utils.make_env_helpers:make_promp_env_helper', - kwargs={ - "name": "gym.envs.robotics:FetchReachDense-v1", - "wrappers": [FlattenObservation, robotics.fetch.MPWrapper], - "mp_kwargs": { - "num_dof": 4, - "num_basis": 5, - "duration": 2, - "post_traj_time": 0, - "zero_start": True, - "policy_type": "position" - } - } -) -ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS["ProMP"].append("FetchReachDenseProMP-v1") - -register( - id='FetchReachProMP-v1', - entry_point='alr_envs.utils.make_env_helpers:make_promp_env_helper', - kwargs={ - "name": "gym.envs.robotics:FetchReach-v1", - "wrappers": [FlattenObservation, robotics.fetch.MPWrapper], - "mp_kwargs": { - "num_dof": 4, - "num_basis": 5, - "duration": 2, - "post_traj_time": 0, - "zero_start": True, - "policy_type": "position" - } - } -) -ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS["ProMP"].append("FetchReachProMP-v1") diff --git a/alr_envs/open_ai/classic_control/__init__.py b/alr_envs/open_ai/classic_control/__init__.py deleted file mode 100644 index b998494..0000000 --- a/alr_envs/open_ai/classic_control/__init__.py +++ /dev/null @@ -1 +0,0 @@ -from . 
import continuous_mountain_car \ No newline at end of file diff --git a/alr_envs/open_ai/classic_control/continuous_mountain_car/__init__.py b/alr_envs/open_ai/classic_control/continuous_mountain_car/__init__.py deleted file mode 100644 index 989b5a9..0000000 --- a/alr_envs/open_ai/classic_control/continuous_mountain_car/__init__.py +++ /dev/null @@ -1 +0,0 @@ -from .mp_wrapper import MPWrapper \ No newline at end of file diff --git a/alr_envs/open_ai/classic_control/continuous_mountain_car/mp_wrapper.py b/alr_envs/open_ai/classic_control/continuous_mountain_car/mp_wrapper.py deleted file mode 100644 index 2a2357a..0000000 --- a/alr_envs/open_ai/classic_control/continuous_mountain_car/mp_wrapper.py +++ /dev/null @@ -1,22 +0,0 @@ -from typing import Union - -import numpy as np -from mp_env_api import MPEnvWrapper - - -class MPWrapper(MPEnvWrapper): - @property - def current_vel(self) -> Union[float, int, np.ndarray]: - return np.array([self.state[1]]) - - @property - def current_pos(self) -> Union[float, int, np.ndarray]: - return np.array([self.state[0]]) - - @property - def goal_pos(self): - raise ValueError("Goal position is not available and has to be learnt based on the environment.") - - @property - def dt(self) -> Union[float, int]: - return 0.02 \ No newline at end of file diff --git a/alr_envs/open_ai/mujoco/reacher_v2/__init__.py b/alr_envs/open_ai/mujoco/reacher_v2/__init__.py deleted file mode 100644 index 989b5a9..0000000 --- a/alr_envs/open_ai/mujoco/reacher_v2/__init__.py +++ /dev/null @@ -1 +0,0 @@ -from .mp_wrapper import MPWrapper \ No newline at end of file diff --git a/alr_envs/open_ai/mujoco/reacher_v2/mp_wrapper.py b/alr_envs/open_ai/mujoco/reacher_v2/mp_wrapper.py deleted file mode 100644 index 16202e5..0000000 --- a/alr_envs/open_ai/mujoco/reacher_v2/mp_wrapper.py +++ /dev/null @@ -1,19 +0,0 @@ -from typing import Union - -import numpy as np -from mp_env_api import MPEnvWrapper - - -class MPWrapper(MPEnvWrapper): - - @property - def current_vel(self) -> Union[float, int, np.ndarray]: - return self.sim.data.qvel[:2] - - @property - def current_pos(self) -> Union[float, int, np.ndarray]: - return self.sim.data.qpos[:2] - - @property - def dt(self) -> Union[float, int]: - return self.env.dt \ No newline at end of file diff --git a/alr_envs/open_ai/robotics/fetch/__init__.py b/alr_envs/open_ai/robotics/fetch/__init__.py deleted file mode 100644 index 989b5a9..0000000 --- a/alr_envs/open_ai/robotics/fetch/__init__.py +++ /dev/null @@ -1 +0,0 @@ -from .mp_wrapper import MPWrapper \ No newline at end of file diff --git a/alr_envs/utils/__init__.py b/alr_envs/utils/__init__.py deleted file mode 100644 index bb4b0bc..0000000 --- a/alr_envs/utils/__init__.py +++ /dev/null @@ -1,66 +0,0 @@ -import re -from typing import Union - -import gym -from gym.envs.registration import register - -from alr_envs.utils.make_env_helpers import make - - -def make_dmc( - id: str, - seed: int = 1, - visualize_reward: bool = True, - from_pixels: bool = False, - height: int = 84, - width: int = 84, - camera_id: int = 0, - frame_skip: int = 1, - episode_length: Union[None, int] = None, - environment_kwargs: dict = {}, - time_limit: Union[None, float] = None, - channels_first: bool = True -): - # Adopted from: https://github.com/denisyarats/dmc2gym/blob/master/dmc2gym/__init__.py - # License: MIT - # Copyright (c) 2020 Denis Yarats - - assert re.match(r"\w+-\w+", id), "env_id does not have the following structure: 'domain_name-task_name'" - domain_name, task_name = id.split("-") - - env_id = 
f'dmc_{domain_name}_{task_name}_{seed}-v1' - - if from_pixels: - assert not visualize_reward, 'cannot use visualize reward when learning from pixels' - - # shorten episode length - if episode_length is None: - # Default lengths for benchmarking suite is 1000 and for manipulation tasks 250 - episode_length = 250 if domain_name == "manipulation" else 1000 - - max_episode_steps = (episode_length + frame_skip - 1) // frame_skip - if env_id not in gym.envs.registry.env_specs: - task_kwargs = {'random': seed} - # if seed is not None: - # task_kwargs['random'] = seed - if time_limit is not None: - task_kwargs['time_limit'] = time_limit - register( - id=env_id, - entry_point='alr_envs.dmc.dmc_wrapper:DMCWrapper', - kwargs=dict( - domain_name=domain_name, - task_name=task_name, - task_kwargs=task_kwargs, - environment_kwargs=environment_kwargs, - visualize_reward=visualize_reward, - from_pixels=from_pixels, - height=height, - width=width, - camera_id=camera_id, - frame_skip=frame_skip, - channels_first=channels_first, - ), - max_episode_steps=max_episode_steps, - ) - return gym.make(env_id) diff --git a/alr_envs/utils/make_env_helpers.py b/alr_envs/utils/make_env_helpers.py deleted file mode 100644 index 4300439..0000000 --- a/alr_envs/utils/make_env_helpers.py +++ /dev/null @@ -1,224 +0,0 @@ -import warnings -from typing import Iterable, Type, Union - -import gym -import numpy as np -from gym.envs.registration import EnvSpec - -from mp_env_api import MPEnvWrapper -from mp_env_api.mp_wrappers.dmp_wrapper import DmpWrapper -from mp_env_api.mp_wrappers.promp_wrapper import ProMPWrapper - - -def make_rank(env_id: str, seed: int, rank: int = 0, return_callable=True, **kwargs): - """ - TODO: Do we need this? - Generate a callable to create a new gym environment with a given seed. - The rank is added to the seed and can be used for example when using vector environments. - E.g. [make_rank("my_env_name-v0", 123, i) for i in range(8)] creates a list of 8 environments - with seeds 123 through 130. - Hence, testing environments should be seeded with a value which is offset by the number of training environments. - Here e.g. [make_rank("my_env_name-v0", 123 + 8, i) for i in range(5)] for 5 testing environmetns - - Args: - env_id: name of the environment - seed: seed for deterministic behaviour - rank: environment rank for deterministic over multiple seeds behaviour - return_callable: If True returns a callable to create the environment instead of the environment itself. - - Returns: - - """ - - def f(): - return make(env_id, seed + rank, **kwargs) - - return f if return_callable else f() - - -def make(env_id: str, seed, **kwargs): - """ - Converts an env_id to an environment with the gym API. - This also works for DeepMind Control Suite interface_wrappers - for which domain name and task name are expected to be separated by "-". - Args: - env_id: gym name or env_id of the form "domain_name-task_name" for DMC tasks - **kwargs: Additional kwargs for the constructor such as pixel observations, etc. - - Returns: Gym environment - - """ - if any([det_pmp in env_id for det_pmp in ["DetPMP", "detpmp"]]): - warnings.warn("DetPMP is deprecated and converted to ProMP") - env_id = env_id.replace("DetPMP", "ProMP") - env_id = env_id.replace("detpmp", "promp") - - try: - # Add seed to kwargs in case it is a predefined gym+dmc hybrid environment. 
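For reference while reading this helper: the `make_rank` function defined at the top of this file is the intended entry point for vectorized setups. A small usage sketch following its own docstring; the env id and worker counts are illustrative only:

```python
from alr_envs.utils.make_env_helpers import make_rank

# Eight training workers with seeds 123..130, evaluation workers offset by 8
# so that no seed is shared between training and evaluation.
train_fns = [make_rank("HoleReacher-v0", seed=123, rank=i) for i in range(8)]
eval_fns = [make_rank("HoleReacher-v0", seed=123 + 8, rank=i) for i in range(2)]

# Each entry is a callable producing a freshly seeded env, e.g. for a vectorized wrapper.
train_envs = [fn() for fn in train_fns]
```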
- if env_id.startswith("dmc"): - kwargs.update({"seed": seed}) - - # Gym - env = gym.make(env_id, **kwargs) - env.seed(seed) - env.action_space.seed(seed) - env.observation_space.seed(seed) - except gym.error.Error: - - # MetaWorld env - import metaworld - if env_id in metaworld.ML1.ENV_NAMES: - env = metaworld.envs.ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE[env_id + "-goal-observable"](seed=seed, **kwargs) - # setting this avoids generating the same initialization after each reset - env._freeze_rand_vec = False - # Manually set spec, as metaworld environments are not registered via gym - env.unwrapped.spec = EnvSpec(env_id) - # Set Timelimit based on the maximum allowed path length of the environment - env = gym.wrappers.TimeLimit(env, max_episode_steps=env.max_path_length) - env.seed(seed) - env.action_space.seed(seed) - env.observation_space.seed(seed) - env.goal_space.seed(seed) - - else: - # DMC - from alr_envs import make_dmc - env = make_dmc(env_id, seed=seed, **kwargs) - - assert env.base_step_limit == env.spec.max_episode_steps, \ - f"The specified 'episode_length' of {env.spec.max_episode_steps} steps for gym is different from " \ - f"the DMC environment specification of {env.base_step_limit} steps." - - return env - - -def _make_wrapped_env(env_id: str, wrappers: Iterable[Type[gym.Wrapper]], seed=1, **kwargs): - """ - Helper function for creating a wrapped gym environment using MPs. - It adds all provided wrappers to the specified environment and verifies at least one MPEnvWrapper is - provided to expose the interface for MPs. - - Args: - env_id: name of the environment - wrappers: list of wrappers (at least an MPEnvWrapper), - seed: seed of environment - - Returns: gym environment with all specified wrappers applied - - """ - # _env = gym.make(env_id) - _env = make(env_id, seed, **kwargs) - - assert any(issubclass(w, MPEnvWrapper) for w in wrappers), \ - "At least one MPEnvWrapper is required in order to leverage motion primitive environments." - for w in wrappers: - _env = w(_env) - - return _env - - -def make_dmp_env(env_id: str, wrappers: Iterable, seed=1, mp_kwargs={}, **kwargs): - """ - This can also be used standalone for manually building a custom DMP environment. - Args: - env_id: base_env_name, - wrappers: list of wrappers (at least an MPEnvWrapper), - seed: seed of environment - mp_kwargs: dict of at least {num_dof: int, num_basis: int} for DMP - - Returns: DMP wrapped gym env - - """ - _verify_time_limit(mp_kwargs.get("duration", None), kwargs.get("time_limit", None)) - - _env = _make_wrapped_env(env_id=env_id, wrappers=wrappers, seed=seed, **kwargs) - - _verify_dof(_env, mp_kwargs.get("num_dof")) - - return DmpWrapper(_env, **mp_kwargs) - - -def make_promp_env(env_id: str, wrappers: Iterable, seed=1, mp_kwargs={}, **kwargs): - """ - This can also be used standalone for manually building a custom ProMP environment. - Args: - env_id: base_env_name, - wrappers: list of wrappers (at least an MPEnvWrapper), - mp_kwargs: dict of at least {num_dof: int, num_basis: int, width: int} - - Returns: ProMP wrapped gym env - - """ - _verify_time_limit(mp_kwargs.get("duration", None), kwargs.get("time_limit", None)) - - _env = _make_wrapped_env(env_id=env_id, wrappers=wrappers, seed=seed, **kwargs) - - _verify_dof(_env, mp_kwargs.get("num_dof")) - - return ProMPWrapper(_env, **mp_kwargs) - - -def make_dmp_env_helper(**kwargs): - """ - Helper function for registering a DMP gym environments. 
- Args: - **kwargs: expects at least the following: - { - "name": base_env_name, - "wrappers": list of wrappers (at least an MPEnvWrapper), - "mp_kwargs": dict of at least {num_dof: int, num_basis: int} for DMP - } - - Returns: DMP wrapped gym env - - """ - seed = kwargs.pop("seed", None) - return make_dmp_env(env_id=kwargs.pop("name"), wrappers=kwargs.pop("wrappers"), seed=seed, - mp_kwargs=kwargs.pop("mp_kwargs"), **kwargs) - - -def make_promp_env_helper(**kwargs): - """ - Helper function for registering ProMP gym environments. - This can also be used standalone for manually building a custom ProMP environment. - Args: - **kwargs: expects at least the following: - { - "name": base_env_name, - "wrappers": list of wrappers (at least an MPEnvWrapper), - "mp_kwargs": dict of at least {num_dof: int, num_basis: int, width: int} - } - - Returns: ProMP wrapped gym env - - """ - seed = kwargs.pop("seed", None) - return make_promp_env(env_id=kwargs.pop("name"), wrappers=kwargs.pop("wrappers"), seed=seed, - mp_kwargs=kwargs.pop("mp_kwargs"), **kwargs) - - -def _verify_time_limit(mp_time_limit: Union[None, float], env_time_limit: Union[None, float]): - """ - When using DMC check if a manually specified time limit matches the trajectory duration the MP receives. - Mostly, the time_limit for DMC is not specified and the default values from DMC are taken. - This check, however, can only been done after instantiating the environment. - It can be found in the BaseMP class. - - Args: - mp_time_limit: max trajectory length of mp in seconds - env_time_limit: max trajectory length of DMC environment in seconds - - Returns: - - """ - if mp_time_limit is not None and env_time_limit is not None: - assert mp_time_limit == env_time_limit, \ - f"The specified 'time_limit' of {env_time_limit}s does not match " \ - f"the duration of {mp_time_limit}s for the MP." - - -def _verify_dof(base_env: gym.Env, dof: int): - action_shape = np.prod(base_env.action_space.shape) - assert dof == action_shape, \ - f"The specified degrees of freedom ('num_dof') {dof} do not match " \ - f"the action space of {action_shape} the base environments" diff --git a/alr_envs/utils/utils.py b/alr_envs/utils/utils.py deleted file mode 100644 index 3354db3..0000000 --- a/alr_envs/utils/utils.py +++ /dev/null @@ -1,21 +0,0 @@ -import numpy as np - - -def angle_normalize(x, type="deg"): - """ - normalize angle x to [-pi,pi]. - Args: - x: Angle in either degrees or radians - type: one of "deg" or "rad" for x being in degrees or radians - - Returns: - - """ - - if type not in ["deg", "rad"]: raise ValueError(f"Invalid type {type}. 
Choose one of 'deg' or 'rad'.") - - if type == "deg": - x = np.deg2rad(x) # x * pi / 180 - - two_pi = 2 * np.pi - return x - two_pi * np.floor((x + np.pi) / two_pi) diff --git a/fancy_gym/__init__.py b/fancy_gym/__init__.py new file mode 100644 index 0000000..f6f690a --- /dev/null +++ b/fancy_gym/__init__.py @@ -0,0 +1,13 @@ +from fancy_gym import dmc, meta, open_ai +from fancy_gym.utils.make_env_helpers import make, make_bb, make_rank +from .dmc import ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS +# Convenience function for all MP environments +from .envs import ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS +from .meta import ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS +from .open_ai import ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS + +ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS = { + key: value + ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS[key] + + ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS[key] + + ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS[key] + for key, value in ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS.items()} diff --git a/__init__.py b/fancy_gym/black_box/__init__.py similarity index 100% rename from __init__.py rename to fancy_gym/black_box/__init__.py diff --git a/fancy_gym/black_box/black_box_wrapper.py b/fancy_gym/black_box/black_box_wrapper.py new file mode 100644 index 0000000..4e3b160 --- /dev/null +++ b/fancy_gym/black_box/black_box_wrapper.py @@ -0,0 +1,182 @@ +from typing import Tuple, Optional + +import gym +import numpy as np +from gym import spaces +from mp_pytorch.mp.mp_interfaces import MPInterface + +from fancy_gym.black_box.controller.base_controller import BaseController +from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper +from fancy_gym.utils.utils import get_numpy + + +class BlackBoxWrapper(gym.ObservationWrapper): + + def __init__(self, + env: RawInterfaceWrapper, + trajectory_generator: MPInterface, + tracking_controller: BaseController, + duration: float, + verbose: int = 1, + learn_sub_trajectories: bool = False, + replanning_schedule: Optional[callable] = None, + reward_aggregation: callable = np.sum + ): + """ + gym.Wrapper for leveraging a black box approach with a trajectory generator. + + Args: + env: The (wrapped) environment this wrapper is applied on + trajectory_generator: Generates the full or partial trajectory + tracking_controller: Translates the desired trajectory to raw action sequences + duration: Length of the trajectory of the movement primitive in seconds + verbose: level of detail for returned values in info dict. + learn_sub_trajectories: Transforms full episode learning into learning sub-trajectories, similar to + step-based learning + replanning_schedule: callable that receives + reward_aggregation: function that takes the np.ndarray of step rewards as input and returns the trajectory + reward, default summation over all values. 
+ """ + super().__init__(env) + + self.duration = duration + self.learn_sub_trajectories = learn_sub_trajectories + self.do_replanning = replanning_schedule is not None + self.replanning_schedule = replanning_schedule or (lambda *x: False) + self.current_traj_steps = 0 + + # trajectory generation + self.traj_gen = trajectory_generator + self.tracking_controller = tracking_controller + # self.time_steps = np.linspace(0, self.duration, self.traj_steps) + # self.traj_gen.set_mp_times(self.time_steps) + self.traj_gen.set_duration(self.duration - self.dt, self.dt) + + # reward computation + self.reward_aggregation = reward_aggregation + + # spaces + self.return_context_observation = not (learn_sub_trajectories or self.do_replanning) + self.traj_gen_action_space = self._get_traj_gen_action_space() + self.action_space = self._get_action_space() + self.observation_space = self._get_observation_space() + + # rendering + self.render_kwargs = {} + self.verbose = verbose + + def observation(self, observation): + # return context space if we are + if self.return_context_observation: + observation = observation[self.env.context_mask] + # cast dtype because metaworld returns incorrect that throws gym error + return observation.astype(self.observation_space.dtype) + + def get_trajectory(self, action: np.ndarray) -> Tuple: + clipped_params = np.clip(action, self.traj_gen_action_space.low, self.traj_gen_action_space.high) + self.traj_gen.set_params(clipped_params) + bc_time = np.array(0 if not self.do_replanning else self.current_traj_steps * self.dt) + # TODO we could think about initializing with the previous desired value in order to have a smooth transition + # at least from the planning point of view. + self.traj_gen.set_boundary_conditions(bc_time, self.current_pos, self.current_vel) + duration = None if self.learn_sub_trajectories else self.duration + self.traj_gen.set_duration(duration, self.dt) + # traj_dict = self.traj_gen.get_trajs(get_pos=True, get_vel=True) + trajectory = get_numpy(self.traj_gen.get_traj_pos()) + velocity = get_numpy(self.traj_gen.get_traj_vel()) + + # Remove first element of trajectory as this is the current position and velocity + # trajectory = trajectory[1:] + # velocity = velocity[1:] + + return trajectory, velocity + + def _get_traj_gen_action_space(self): + """This function can be used to set up an individual space for the parameters of the traj_gen.""" + min_action_bounds, max_action_bounds = self.traj_gen.get_params_bounds() + action_space = gym.spaces.Box(low=min_action_bounds.numpy(), high=max_action_bounds.numpy(), + dtype=self.env.action_space.dtype) + return action_space + + def _get_action_space(self): + """ + This function can be used to modify the action space for considering actions which are not learned via movement + primitives. E.g. ball releasing time for the beer pong task. By default, it is the parameter space of the + movement primitive. + Only needs to be overwritten if the action space needs to be modified. 
+ """ + try: + return self.traj_gen_action_space + except AttributeError: + return self._get_traj_gen_action_space() + + def _get_observation_space(self): + if self.return_context_observation: + mask = self.env.context_mask + # return full observation + min_obs_bound = self.env.observation_space.low[mask] + max_obs_bound = self.env.observation_space.high[mask] + return spaces.Box(low=min_obs_bound, high=max_obs_bound, dtype=self.env.observation_space.dtype) + return self.env.observation_space + + def step(self, action: np.ndarray): + """ This function generates a trajectory based on a MP and then does the usual loop over reset and step""" + + # TODO remove this part, right now only needed for beer pong + mp_params, env_spec_params = self.env.episode_callback(action, self.traj_gen) + trajectory, velocity = self.get_trajectory(mp_params) + + trajectory_length = len(trajectory) + rewards = np.zeros(shape=(trajectory_length,)) + if self.verbose >= 2: + actions = np.zeros(shape=(trajectory_length,) + self.env.action_space.shape) + observations = np.zeros(shape=(trajectory_length,) + self.env.observation_space.shape, + dtype=self.env.observation_space.dtype) + + infos = dict() + done = False + + for t, (pos, vel) in enumerate(zip(trajectory, velocity)): + step_action = self.tracking_controller.get_action(pos, vel, self.current_pos, self.current_vel) + c_action = np.clip(step_action, self.env.action_space.low, self.env.action_space.high) + obs, c_reward, done, info = self.env.step(c_action) + rewards[t] = c_reward + + if self.verbose >= 2: + actions[t, :] = c_action + observations[t, :] = obs + + for k, v in info.items(): + elems = infos.get(k, [None] * trajectory_length) + elems[t] = v + infos[k] = elems + + if self.render_kwargs: + self.env.render(**self.render_kwargs) + + if done or self.replanning_schedule(self.current_pos, self.current_vel, obs, c_action, + t + 1 + self.current_traj_steps): + break + + infos.update({k: v[:t] for k, v in infos.items()}) + self.current_traj_steps += t + 1 + + if self.verbose >= 2: + infos['positions'] = trajectory + infos['velocities'] = velocity + infos['step_actions'] = actions[:t + 1] + infos['step_observations'] = observations[:t + 1] + infos['step_rewards'] = rewards[:t + 1] + + infos['trajectory_length'] = t + 1 + trajectory_return = self.reward_aggregation(rewards[:t + 1]) + return self.observation(obs), trajectory_return, done, infos + + def render(self, **kwargs): + """Only set render options here, such that they can be used during the rollout. 
+ This only needs to be called once""" + self.render_kwargs = kwargs + + def reset(self, *, seed: Optional[int] = None, return_info: bool = False, options: Optional[dict] = None): + self.current_traj_steps = 0 + return super(BlackBoxWrapper, self).reset() diff --git a/alr_envs/alr/classic_control/base_reacher/__init__.py b/fancy_gym/black_box/controller/__init__.py similarity index 100% rename from alr_envs/alr/classic_control/base_reacher/__init__.py rename to fancy_gym/black_box/controller/__init__.py diff --git a/fancy_gym/black_box/controller/base_controller.py b/fancy_gym/black_box/controller/base_controller.py new file mode 100644 index 0000000..1ac1522 --- /dev/null +++ b/fancy_gym/black_box/controller/base_controller.py @@ -0,0 +1,4 @@ +class BaseController: + + def get_action(self, des_pos, des_vel, c_pos, c_vel): + raise NotImplementedError diff --git a/fancy_gym/black_box/controller/meta_world_controller.py b/fancy_gym/black_box/controller/meta_world_controller.py new file mode 100644 index 0000000..efd8983 --- /dev/null +++ b/fancy_gym/black_box/controller/meta_world_controller.py @@ -0,0 +1,24 @@ +import numpy as np + +from fancy_gym.black_box.controller.base_controller import BaseController + + +class MetaWorldController(BaseController): + """ + A Metaworld Controller. Using position and velocity information from a provided environment, + the tracking_controller calculates a response based on the desired position and velocity. + Unlike the other Controllers, this is a special tracking_controller for MetaWorld environments. + They use a position delta for the xyz coordinates and a raw position for the gripper opening. + + """ + + def get_action(self, des_pos, des_vel, c_pos, c_vel): + gripper_pos = des_pos[-1] + + cur_pos = c_pos[:-1] + xyz_pos = des_pos[:-1] + + assert xyz_pos.shape == cur_pos.shape, \ + f"Mismatch in dimension between desired position {xyz_pos.shape} and current position {cur_pos.shape}" + trq = np.hstack([(xyz_pos - cur_pos), gripper_pos]) + return trq diff --git a/fancy_gym/black_box/controller/pd_controller.py b/fancy_gym/black_box/controller/pd_controller.py new file mode 100644 index 0000000..35203d8 --- /dev/null +++ b/fancy_gym/black_box/controller/pd_controller.py @@ -0,0 +1,28 @@ +from typing import Union, Tuple + +from fancy_gym.black_box.controller.base_controller import BaseController + + +class PDController(BaseController): + """ + A PD-Controller. 
Using position and velocity information from a provided environment, + the tracking_controller calculates a response based on the desired position and velocity + + :param env: A position environment + :param p_gains: Factors for the proportional gains + :param d_gains: Factors for the differential gains + """ + + def __init__(self, + p_gains: Union[float, Tuple] = 1, + d_gains: Union[float, Tuple] = 0.5): + self.p_gains = p_gains + self.d_gains = d_gains + + def get_action(self, des_pos, des_vel, c_pos, c_vel): + assert des_pos.shape == c_pos.shape, \ + f"Mismatch in dimension between desired position {des_pos.shape} and current position {c_pos.shape}" + assert des_vel.shape == c_vel.shape, \ + f"Mismatch in dimension between desired velocity {des_vel.shape} and current velocity {c_vel.shape}" + trq = self.p_gains * (des_pos - c_pos) + self.d_gains * (des_vel - c_vel) + return trq diff --git a/fancy_gym/black_box/controller/pos_controller.py b/fancy_gym/black_box/controller/pos_controller.py new file mode 100644 index 0000000..517187d --- /dev/null +++ b/fancy_gym/black_box/controller/pos_controller.py @@ -0,0 +1,9 @@ +from fancy_gym.black_box.controller.base_controller import BaseController + + +class PosController(BaseController): + """ + A Position Controller. The tracking_controller calculates a response only based on the desired position. + """ + def get_action(self, des_pos, des_vel, c_pos, c_vel): + return des_pos diff --git a/fancy_gym/black_box/controller/vel_controller.py b/fancy_gym/black_box/controller/vel_controller.py new file mode 100644 index 0000000..b26d9ba --- /dev/null +++ b/fancy_gym/black_box/controller/vel_controller.py @@ -0,0 +1,9 @@ +from fancy_gym.black_box.controller.base_controller import BaseController + + +class VelController(BaseController): + """ + A Velocity Controller. The tracking_controller calculates a response only based on the desired velocity. 
+ """ + def get_action(self, des_pos, des_vel, c_pos, c_vel): + return des_vel diff --git a/alr_envs/alr/mujoco/gym_table_tennis/utils/__init__.py b/fancy_gym/black_box/factory/__init__.py similarity index 100% rename from alr_envs/alr/mujoco/gym_table_tennis/utils/__init__.py rename to fancy_gym/black_box/factory/__init__.py diff --git a/fancy_gym/black_box/factory/basis_generator_factory.py b/fancy_gym/black_box/factory/basis_generator_factory.py new file mode 100644 index 0000000..4601953 --- /dev/null +++ b/fancy_gym/black_box/factory/basis_generator_factory.py @@ -0,0 +1,18 @@ +from mp_pytorch.basis_gn import NormalizedRBFBasisGenerator, ZeroPaddingNormalizedRBFBasisGenerator +from mp_pytorch.phase_gn import PhaseGenerator + +ALL_TYPES = ["rbf", "zero_rbf", "rhythmic"] + + +def get_basis_generator(basis_generator_type: str, phase_generator: PhaseGenerator, **kwargs): + basis_generator_type = basis_generator_type.lower() + if basis_generator_type == "rbf": + return NormalizedRBFBasisGenerator(phase_generator, **kwargs) + elif basis_generator_type == "zero_rbf": + return ZeroPaddingNormalizedRBFBasisGenerator(phase_generator, **kwargs) + elif basis_generator_type == "rhythmic": + raise NotImplementedError() + # return RhythmicBasisGenerator(phase_generator, **kwargs) + else: + raise ValueError(f"Specified basis generator type {basis_generator_type} not supported, " + f"please choose one of {ALL_TYPES}.") diff --git a/fancy_gym/black_box/factory/controller_factory.py b/fancy_gym/black_box/factory/controller_factory.py new file mode 100644 index 0000000..8b2d865 --- /dev/null +++ b/fancy_gym/black_box/factory/controller_factory.py @@ -0,0 +1,21 @@ +from fancy_gym.black_box.controller.meta_world_controller import MetaWorldController +from fancy_gym.black_box.controller.pd_controller import PDController +from fancy_gym.black_box.controller.pos_controller import PosController +from fancy_gym.black_box.controller.vel_controller import VelController + +ALL_TYPES = ["motor", "velocity", "position", "metaworld"] + + +def get_controller(controller_type: str, **kwargs): + controller_type = controller_type.lower() + if controller_type == "motor": + return PDController(**kwargs) + elif controller_type == "velocity": + return VelController() + elif controller_type == "position": + return PosController() + elif controller_type == "metaworld": + return MetaWorldController() + else: + raise ValueError(f"Specified controller type {controller_type} not supported, " + f"please choose one of {ALL_TYPES}.") diff --git a/fancy_gym/black_box/factory/phase_generator_factory.py b/fancy_gym/black_box/factory/phase_generator_factory.py new file mode 100644 index 0000000..67e17f8 --- /dev/null +++ b/fancy_gym/black_box/factory/phase_generator_factory.py @@ -0,0 +1,23 @@ +from mp_pytorch.phase_gn import LinearPhaseGenerator, ExpDecayPhaseGenerator + +# from mp_pytorch.phase_gn.rhythmic_phase_generator import RhythmicPhaseGenerator +# from mp_pytorch.phase_gn.smooth_phase_generator import SmoothPhaseGenerator + +ALL_TYPES = ["linear", "exp", "rhythmic", "smooth"] + + +def get_phase_generator(phase_generator_type, **kwargs): + phase_generator_type = phase_generator_type.lower() + if phase_generator_type == "linear": + return LinearPhaseGenerator(**kwargs) + elif phase_generator_type == "exp": + return ExpDecayPhaseGenerator(**kwargs) + elif phase_generator_type == "rhythmic": + raise NotImplementedError() + # return RhythmicPhaseGenerator(**kwargs) + elif phase_generator_type == "smooth": + raise NotImplementedError() 
+ # return SmoothPhaseGenerator(**kwargs) + else: + raise ValueError(f"Specified phase generator type {phase_generator_type} not supported, " + f"please choose one of {ALL_TYPES}.") diff --git a/fancy_gym/black_box/factory/trajectory_generator_factory.py b/fancy_gym/black_box/factory/trajectory_generator_factory.py new file mode 100644 index 0000000..5a36fdd --- /dev/null +++ b/fancy_gym/black_box/factory/trajectory_generator_factory.py @@ -0,0 +1,21 @@ +from mp_pytorch.basis_gn import BasisGenerator +from mp_pytorch.mp import ProDMP, DMP, ProMP + +ALL_TYPES = ["promp", "dmp", "idmp"] + + +def get_trajectory_generator( + trajectory_generator_type: str, action_dim: int, basis_generator: BasisGenerator, **kwargs +): + trajectory_generator_type = trajectory_generator_type.lower() + if trajectory_generator_type == "promp": + return ProMP(basis_generator, action_dim, **kwargs) + elif trajectory_generator_type == "dmp": + return DMP(basis_generator, action_dim, **kwargs) + elif trajectory_generator_type == 'prodmp': + from mp_pytorch.basis_gn import ProDMPBasisGenerator + assert isinstance(basis_generator, ProDMPBasisGenerator) + return ProDMP(basis_generator, action_dim, **kwargs) + else: + raise ValueError(f"Specified movement primitive type {trajectory_generator_type} not supported, " + f"please choose one of {ALL_TYPES}.") diff --git a/fancy_gym/black_box/raw_interface_wrapper.py b/fancy_gym/black_box/raw_interface_wrapper.py new file mode 100644 index 0000000..02945a1 --- /dev/null +++ b/fancy_gym/black_box/raw_interface_wrapper.py @@ -0,0 +1,68 @@ +from typing import Union, Tuple + +import gym +import numpy as np +from mp_pytorch.mp.mp_interfaces import MPInterface + + +class RawInterfaceWrapper(gym.Wrapper): + + @property + def context_mask(self) -> np.ndarray: + """ + Returns boolean mask of the same shape as the observation space. + It determines whether the observation is returned for the contextual case or not. + This effectively allows to filter unwanted or unnecessary observations from the full step-based case. + E.g. Velocities starting at 0 are only changing after the first action. Given we only receive the + context/part of the first observation, the velocities are not necessary in the observation for the task. + Returns: + bool array representing the indices of the observations + + """ + return np.ones(self.env.observation_space.shape[0], dtype=bool) + + @property + def current_pos(self) -> Union[float, int, np.ndarray, Tuple]: + """ + Returns the current position of the action/control dimension. + The dimensionality has to match the action/control dimension. + This is not required when exclusively using velocity control, + it should, however, be implemented regardless. + E.g. The joint positions that are directly or indirectly controlled by the action. + """ + raise NotImplementedError + + @property + def current_vel(self) -> Union[float, int, np.ndarray, Tuple]: + """ + Returns the current velocity of the action/control dimension. + The dimensionality has to match the action/control dimension. + This is not required when exclusively using position control, + it should, however, be implemented regardless. + E.g. The joint velocities that are directly or indirectly controlled by the action. 
+ """ + raise NotImplementedError + + @property + def dt(self) -> float: + """ + Control frequency of the environment + Returns: float + + """ + return self.env.dt + + def episode_callback(self, action: np.ndarray, traj_gen: MPInterface) -> Tuple[ + np.ndarray, Union[np.ndarray, None]]: + """ + Used to extract the parameters for the movement primitive and other parameters from an action array which might + include other actions like ball releasing time for the beer pong environment. + This only needs to be overwritten if the action space is modified. + Args: + action: a vector instance of the whole action space, includes traj_gen parameters and additional parameters if + specified, else only traj_gen parameters + + Returns: + Tuple: mp_arguments and other arguments + """ + return action, None diff --git a/alr_envs/dmc/README.MD b/fancy_gym/dmc/README.MD similarity index 100% rename from alr_envs/dmc/README.MD rename to fancy_gym/dmc/README.MD diff --git a/fancy_gym/dmc/__init__.py b/fancy_gym/dmc/__init__.py new file mode 100644 index 0000000..22ae47f --- /dev/null +++ b/fancy_gym/dmc/__init__.py @@ -0,0 +1,245 @@ +from copy import deepcopy + +from . import manipulation, suite + +ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS = {"DMP": [], "ProMP": []} + +from gym.envs.registration import register + +DEFAULT_BB_DICT_ProMP = { + "name": 'EnvName', + "wrappers": [], + "trajectory_generator_kwargs": { + 'trajectory_generator_type': 'promp' + }, + "phase_generator_kwargs": { + 'phase_generator_type': 'linear' + }, + "controller_kwargs": { + 'controller_type': 'motor', + "p_gains": 50., + "d_gains": 1., + }, + "basis_generator_kwargs": { + 'basis_generator_type': 'zero_rbf', + 'num_basis': 5, + 'num_basis_zero_start': 1 + } +} + +DEFAULT_BB_DICT_DMP = { + "name": 'EnvName', + "wrappers": [], + "trajectory_generator_kwargs": { + 'trajectory_generator_type': 'dmp' + }, + "phase_generator_kwargs": { + 'phase_generator_type': 'exp' + }, + "controller_kwargs": { + 'controller_type': 'motor', + "p_gains": 50., + "d_gains": 1., + }, + "basis_generator_kwargs": { + 'basis_generator_type': 'rbf', + 'num_basis': 5 + } +} + +# DeepMind Control Suite (DMC) +kwargs_dict_bic_dmp = deepcopy(DEFAULT_BB_DICT_DMP) +kwargs_dict_bic_dmp['name'] = f"dmc:ball_in_cup-catch" +kwargs_dict_bic_dmp['wrappers'].append(suite.ball_in_cup.MPWrapper) +# bandwidth_factor=2 +kwargs_dict_bic_dmp['phase_generator_kwargs']['alpha_phase'] = 2 +kwargs_dict_bic_dmp['trajectory_generator_kwargs']['weight_scale'] = 10 # TODO: weight scale 1, but goal scale 0.1 +register( + id=f'dmc_ball_in_cup-catch_dmp-v0', + entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper', + kwargs=kwargs_dict_bic_dmp +) +ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append("dmc_ball_in_cup-catch_dmp-v0") + +kwargs_dict_bic_promp = deepcopy(DEFAULT_BB_DICT_DMP) +kwargs_dict_bic_promp['name'] = f"dmc:ball_in_cup-catch" +kwargs_dict_bic_promp['wrappers'].append(suite.ball_in_cup.MPWrapper) +register( + id=f'dmc_ball_in_cup-catch_promp-v0', + entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper', + kwargs=kwargs_dict_bic_promp +) +ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append("dmc_ball_in_cup-catch_promp-v0") + +kwargs_dict_reacher_easy_dmp = deepcopy(DEFAULT_BB_DICT_DMP) +kwargs_dict_reacher_easy_dmp['name'] = f"dmc:reacher-easy" +kwargs_dict_reacher_easy_dmp['wrappers'].append(suite.reacher.MPWrapper) +# bandwidth_factor=2 +kwargs_dict_reacher_easy_dmp['phase_generator_kwargs']['alpha_phase'] = 2 +# TODO: weight scale 50, but goal 
scale 0.1 +kwargs_dict_reacher_easy_dmp['trajectory_generator_kwargs']['weight_scale'] = 500 +register( + id=f'dmc_reacher-easy_dmp-v0', + entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper', + kwargs=kwargs_dict_bic_dmp +) +ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append("dmc_reacher-easy_dmp-v0") + +kwargs_dict_reacher_easy_promp = deepcopy(DEFAULT_BB_DICT_DMP) +kwargs_dict_reacher_easy_promp['name'] = f"dmc:reacher-easy" +kwargs_dict_reacher_easy_promp['wrappers'].append(suite.reacher.MPWrapper) +kwargs_dict_reacher_easy_promp['trajectory_generator_kwargs']['weight_scale'] = 0.2 +register( + id=f'dmc_reacher-easy_promp-v0', + entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper', + kwargs=kwargs_dict_reacher_easy_promp +) +ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append("dmc_reacher-easy_promp-v0") + +kwargs_dict_reacher_hard_dmp = deepcopy(DEFAULT_BB_DICT_DMP) +kwargs_dict_reacher_hard_dmp['name'] = f"dmc:reacher-hard" +kwargs_dict_reacher_hard_dmp['wrappers'].append(suite.reacher.MPWrapper) +# bandwidth_factor = 2 +kwargs_dict_reacher_hard_dmp['phase_generator_kwargs']['alpha_phase'] = 2 +# TODO: weight scale 50, but goal scale 0.1 +kwargs_dict_reacher_hard_dmp['trajectory_generator_kwargs']['weight_scale'] = 500 +register( + id=f'dmc_reacher-hard_dmp-v0', + entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper', + kwargs=kwargs_dict_reacher_hard_dmp +) +ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append("dmc_reacher-hard_dmp-v0") + +kwargs_dict_reacher_hard_promp = deepcopy(DEFAULT_BB_DICT_DMP) +kwargs_dict_reacher_hard_promp['name'] = f"dmc:reacher-hard" +kwargs_dict_reacher_hard_promp['wrappers'].append(suite.reacher.MPWrapper) +kwargs_dict_reacher_hard_promp['trajectory_generator_kwargs']['weight_scale'] = 0.2 +register( + id=f'dmc_reacher-hard_promp-v0', + entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper', + kwargs=kwargs_dict_reacher_hard_promp +) +ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append("dmc_reacher-hard_promp-v0") + +_dmc_cartpole_tasks = ["balance", "balance_sparse", "swingup", "swingup_sparse"] + +for _task in _dmc_cartpole_tasks: + _env_id = f'dmc_cartpole-{_task}_dmp-v0' + kwargs_dict_cartpole_dmp = deepcopy(DEFAULT_BB_DICT_DMP) + kwargs_dict_cartpole_dmp['name'] = f"dmc:cartpole-{_task}" + kwargs_dict_cartpole_dmp['wrappers'].append(suite.cartpole.MPWrapper) + # bandwidth_factor = 2 + kwargs_dict_cartpole_dmp['phase_generator_kwargs']['alpha_phase'] = 2 + # TODO: weight scale 50, but goal scale 0.1 + kwargs_dict_cartpole_dmp['trajectory_generator_kwargs']['weight_scale'] = 500 + kwargs_dict_cartpole_dmp['controller_kwargs']['p_gains'] = 10 + kwargs_dict_cartpole_dmp['controller_kwargs']['d_gains'] = 10 + register( + id=_env_id, + entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper', + kwargs=kwargs_dict_cartpole_dmp + ) + ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id) + + _env_id = f'dmc_cartpole-{_task}_promp-v0' + kwargs_dict_cartpole_promp = deepcopy(DEFAULT_BB_DICT_DMP) + kwargs_dict_cartpole_promp['name'] = f"dmc:cartpole-{_task}" + kwargs_dict_cartpole_promp['wrappers'].append(suite.cartpole.MPWrapper) + kwargs_dict_cartpole_promp['controller_kwargs']['p_gains'] = 10 + kwargs_dict_cartpole_promp['controller_kwargs']['d_gains'] = 10 + kwargs_dict_cartpole_promp['trajectory_generator_kwargs']['weight_scale'] = 0.2 + register( + id=_env_id, + entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper', + kwargs=kwargs_dict_cartpole_promp + ) + 
ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id) + +kwargs_dict_cartpole2poles_dmp = deepcopy(DEFAULT_BB_DICT_DMP) +kwargs_dict_cartpole2poles_dmp['name'] = f"dmc:cartpole-two_poles" +kwargs_dict_cartpole2poles_dmp['wrappers'].append(suite.cartpole.TwoPolesMPWrapper) +# bandwidth_factor = 2 +kwargs_dict_cartpole2poles_dmp['phase_generator_kwargs']['alpha_phase'] = 2 +# TODO: weight scale 50, but goal scale 0.1 +kwargs_dict_cartpole2poles_dmp['trajectory_generator_kwargs']['weight_scale'] = 500 +kwargs_dict_cartpole2poles_dmp['controller_kwargs']['p_gains'] = 10 +kwargs_dict_cartpole2poles_dmp['controller_kwargs']['d_gains'] = 10 +_env_id = f'dmc_cartpole-two_poles_dmp-v0' +register( + id=_env_id, + entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper', + kwargs=kwargs_dict_cartpole2poles_dmp +) +ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id) + +kwargs_dict_cartpole2poles_promp = deepcopy(DEFAULT_BB_DICT_DMP) +kwargs_dict_cartpole2poles_promp['name'] = f"dmc:cartpole-two_poles" +kwargs_dict_cartpole2poles_promp['wrappers'].append(suite.cartpole.TwoPolesMPWrapper) +kwargs_dict_cartpole2poles_promp['controller_kwargs']['p_gains'] = 10 +kwargs_dict_cartpole2poles_promp['controller_kwargs']['d_gains'] = 10 +kwargs_dict_cartpole2poles_promp['trajectory_generator_kwargs']['weight_scale'] = 0.2 +_env_id = f'dmc_cartpole-two_poles_promp-v0' +register( + id=_env_id, + entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper', + kwargs=kwargs_dict_cartpole2poles_promp +) +ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id) + +kwargs_dict_cartpole3poles_dmp = deepcopy(DEFAULT_BB_DICT_DMP) +kwargs_dict_cartpole3poles_dmp['name'] = f"dmc:cartpole-three_poles" +kwargs_dict_cartpole3poles_dmp['wrappers'].append(suite.cartpole.ThreePolesMPWrapper) +# bandwidth_factor = 2 +kwargs_dict_cartpole3poles_dmp['phase_generator_kwargs']['alpha_phase'] = 2 +# TODO: weight scale 50, but goal scale 0.1 +kwargs_dict_cartpole3poles_dmp['trajectory_generator_kwargs']['weight_scale'] = 500 +kwargs_dict_cartpole3poles_dmp['controller_kwargs']['p_gains'] = 10 +kwargs_dict_cartpole3poles_dmp['controller_kwargs']['d_gains'] = 10 +_env_id = f'dmc_cartpole-three_poles_dmp-v0' +register( + id=_env_id, + entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper', + kwargs=kwargs_dict_cartpole3poles_dmp +) +ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id) + +kwargs_dict_cartpole3poles_promp = deepcopy(DEFAULT_BB_DICT_DMP) +kwargs_dict_cartpole3poles_promp['name'] = f"dmc:cartpole-three_poles" +kwargs_dict_cartpole3poles_promp['wrappers'].append(suite.cartpole.ThreePolesMPWrapper) +kwargs_dict_cartpole3poles_promp['controller_kwargs']['p_gains'] = 10 +kwargs_dict_cartpole3poles_promp['controller_kwargs']['d_gains'] = 10 +kwargs_dict_cartpole3poles_promp['trajectory_generator_kwargs']['weight_scale'] = 0.2 +_env_id = f'dmc_cartpole-three_poles_promp-v0' +register( + id=_env_id, + entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper', + kwargs=kwargs_dict_cartpole3poles_promp +) +ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id) + +# DeepMind Manipulation +kwargs_dict_mani_reach_site_features_dmp = deepcopy(DEFAULT_BB_DICT_DMP) +kwargs_dict_mani_reach_site_features_dmp['name'] = f"dmc:manipulation-reach_site_features" +kwargs_dict_mani_reach_site_features_dmp['wrappers'].append(manipulation.reach_site.MPWrapper) +kwargs_dict_mani_reach_site_features_dmp['phase_generator_kwargs']['alpha_phase'] = 2 +# TODO: weight scale 50, 
but goal scale 0.1 +kwargs_dict_mani_reach_site_features_dmp['trajectory_generator_kwargs']['weight_scale'] = 500 +kwargs_dict_mani_reach_site_features_dmp['controller_kwargs']['controller_type'] = 'velocity' +register( + id=f'dmc_manipulation-reach_site_dmp-v0', + entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper', + kwargs=kwargs_dict_mani_reach_site_features_dmp +) +ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append("dmc_manipulation-reach_site_dmp-v0") + +kwargs_dict_mani_reach_site_features_promp = deepcopy(DEFAULT_BB_DICT_DMP) +kwargs_dict_mani_reach_site_features_promp['name'] = f"dmc:manipulation-reach_site_features" +kwargs_dict_mani_reach_site_features_promp['wrappers'].append(manipulation.reach_site.MPWrapper) +kwargs_dict_mani_reach_site_features_promp['trajectory_generator_kwargs']['weight_scale'] = 0.2 +kwargs_dict_mani_reach_site_features_promp['controller_kwargs']['controller_type'] = 'velocity' +register( + id=f'dmc_manipulation-reach_site_promp-v0', + entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper', + kwargs=kwargs_dict_mani_reach_site_features_promp +) +ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append("dmc_manipulation-reach_site_promp-v0") diff --git a/fancy_gym/dmc/dmc_wrapper.py b/fancy_gym/dmc/dmc_wrapper.py new file mode 100644 index 0000000..b1522c3 --- /dev/null +++ b/fancy_gym/dmc/dmc_wrapper.py @@ -0,0 +1,186 @@ +# Adopted from: https://github.com/denisyarats/dmc2gym/blob/master/dmc2gym/wrappers.py +# License: MIT +# Copyright (c) 2020 Denis Yarats +import collections +from collections.abc import MutableMapping +from typing import Any, Dict, Tuple, Optional, Union, Callable + +import gym +import numpy as np +from dm_control import composer +from dm_control.rl import control +from dm_env import specs +from gym import spaces +from gym.core import ObsType + + +def _spec_to_box(spec): + def extract_min_max(s): + assert s.dtype == np.float64 or s.dtype == np.float32, \ + f"Only float64 and float32 types are allowed, instead {s.dtype} was found" + dim = int(np.prod(s.shape)) + if type(s) == specs.Array: + bound = np.inf * np.ones(dim, dtype=s.dtype) + return -bound, bound + elif type(s) == specs.BoundedArray: + zeros = np.zeros(dim, dtype=s.dtype) + return s.minimum + zeros, s.maximum + zeros + + mins, maxs = [], [] + for s in spec: + mn, mx = extract_min_max(s) + mins.append(mn) + maxs.append(mx) + low = np.concatenate(mins, axis=0) + high = np.concatenate(maxs, axis=0) + assert low.shape == high.shape + return spaces.Box(low, high, dtype=s.dtype) + + +def _flatten_obs(obs: MutableMapping): + """ + Flattens an observation of type MutableMapping, e.g. a dict to a 1D array. + Args: + obs: observation to flatten + + Returns: 1D array of observation + + """ + + if not isinstance(obs, MutableMapping): + raise ValueError(f'Requires dict-like observations structure. 
{type(obs)} found.') + + # Keep key order consistent for non OrderedDicts + keys = obs.keys() if isinstance(obs, collections.OrderedDict) else sorted(obs.keys()) + + obs_vals = [np.array([obs[key]]) if np.isscalar(obs[key]) else obs[key].ravel() for key in keys] + return np.concatenate(obs_vals) + + +class DMCWrapper(gym.Env): + def __init__(self, + env: Callable[[], Union[composer.Environment, control.Environment]], + ): + + # TODO: Currently this is required to be a function because dmc does not allow to copy composers environments + self._env = env() + + # action and observation space + self._action_space = _spec_to_box([self._env.action_spec()]) + self._observation_space = _spec_to_box(self._env.observation_spec().values()) + + self._window = None + self.id = 'dmc' + + def __getattr__(self, item): + """Propagate only non-existent properties to wrapped env.""" + if item.startswith('_'): + raise AttributeError("attempted to get missing private attribute '{}'".format(item)) + if item in self.__dict__: + return getattr(self, item) + return getattr(self._env, item) + + def _get_obs(self, time_step): + obs = _flatten_obs(time_step.observation).astype(self.observation_space.dtype) + return obs + + @property + def observation_space(self): + return self._observation_space + + @property + def action_space(self): + return self._action_space + + @property + def dt(self): + return self._env.control_timestep() + + def seed(self, seed=None): + self._action_space.seed(seed) + self._observation_space.seed(seed) + + def step(self, action) -> Tuple[np.ndarray, float, bool, Dict[str, Any]]: + assert self._action_space.contains(action) + extra = {'internal_state': self._env.physics.get_state().copy()} + + time_step = self._env.step(action) + reward = time_step.reward or 0. + done = time_step.last() + obs = self._get_obs(time_step) + extra['discount'] = time_step.discount + + return obs, reward, done, extra + + def reset(self, *, seed: Optional[int] = None, return_info: bool = False, + options: Optional[dict] = None, ) -> Union[ObsType, Tuple[ObsType, dict]]: + time_step = self._env.reset() + obs = self._get_obs(time_step) + return obs + + def render(self, mode='rgb_array', height=240, width=320, camera_id=-1, overlays=(), depth=False, + segmentation=False, scene_option=None, render_flag_overrides=None): + + # assert mode == 'rgb_array', 'only support rgb_array mode, given %s' % mode + if mode == "rgb_array": + return self._env.physics.render(height=height, width=width, camera_id=camera_id, overlays=overlays, + depth=depth, segmentation=segmentation, scene_option=scene_option, + render_flag_overrides=render_flag_overrides) + + # Render max available buffer size. Larger is only possible by altering the XML. + img = self._env.physics.render(height=self._env.physics.model.vis.global_.offheight, + width=self._env.physics.model.vis.global_.offwidth, + camera_id=camera_id, overlays=overlays, depth=depth, segmentation=segmentation, + scene_option=scene_option, render_flag_overrides=render_flag_overrides) + + if depth: + img = np.dstack([img.astype(np.uint8)] * 3) + + if mode == 'human': + try: + import cv2 + if self._window is None: + self._window = cv2.namedWindow(self.id, cv2.WINDOW_AUTOSIZE) + cv2.imshow(self.id, img[..., ::-1]) # Image in BGR + cv2.waitKey(1) + except ImportError: + raise gym.error.DependencyNotInstalled("Rendering requires opencv. 
Run `pip install opencv-python`") + # PYGAME seems to destroy some global rendering configs from the physics render + # except ImportError: + # import pygame + # img_copy = img.copy().transpose((1, 0, 2)) + # if self._window is None: + # pygame.init() + # pygame.display.init() + # self._window = pygame.display.set_mode(img_copy.shape[:2]) + # self.clock = pygame.time.Clock() + # + # surf = pygame.surfarray.make_surface(img_copy) + # self._window.blit(surf, (0, 0)) + # pygame.event.pump() + # self.clock.tick(30) + # pygame.display.flip() + + def close(self): + super().close() + if self._window is not None: + try: + import cv2 + cv2.destroyWindow(self.id) + except ImportError: + import pygame + + pygame.display.quit() + pygame.quit() + + @property + def reward_range(self) -> Tuple[float, float]: + reward_spec = self._env.reward_spec() + if isinstance(reward_spec, specs.BoundedArray): + return reward_spec.minimum, reward_spec.maximum + return -float('inf'), float('inf') + + @property + def metadata(self): + return {'render.modes': ['human', 'rgb_array'], + 'video.frames_per_second': round(1.0 / self._env.control_timestep())} diff --git a/alr_envs/dmc/manipulation/__init__.py b/fancy_gym/dmc/manipulation/__init__.py similarity index 100% rename from alr_envs/dmc/manipulation/__init__.py rename to fancy_gym/dmc/manipulation/__init__.py diff --git a/alr_envs/alr/classic_control/simple_reacher/__init__.py b/fancy_gym/dmc/manipulation/reach_site/__init__.py similarity index 100% rename from alr_envs/alr/classic_control/simple_reacher/__init__.py rename to fancy_gym/dmc/manipulation/reach_site/__init__.py diff --git a/alr_envs/dmc/manipulation/reach_site/mp_wrapper.py b/fancy_gym/dmc/manipulation/reach_site/mp_wrapper.py similarity index 87% rename from alr_envs/dmc/manipulation/reach_site/mp_wrapper.py rename to fancy_gym/dmc/manipulation/reach_site/mp_wrapper.py index 2d03f7b..f64ac4a 100644 --- a/alr_envs/dmc/manipulation/reach_site/mp_wrapper.py +++ b/fancy_gym/dmc/manipulation/reach_site/mp_wrapper.py @@ -2,13 +2,13 @@ from typing import Tuple, Union import numpy as np -from mp_env_api import MPEnvWrapper +from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper -class MPWrapper(MPEnvWrapper): +class MPWrapper(RawInterfaceWrapper): @property - def active_obs(self): + def context_mask(self) -> np.ndarray: # Joint and target positions are randomized, velocities are always set to 0. 
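+        # True entries below select the observation dimensions that form the context,
+        # i.e. the goal information exposed to the black-box policy at the start of the episode.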
return np.hstack([ [True] * 3, # target position diff --git a/alr_envs/dmc/suite/__init__.py b/fancy_gym/dmc/suite/__init__.py similarity index 100% rename from alr_envs/dmc/suite/__init__.py rename to fancy_gym/dmc/suite/__init__.py diff --git a/alr_envs/alr/classic_control/viapoint_reacher/__init__.py b/fancy_gym/dmc/suite/ball_in_cup/__init__.py similarity index 100% rename from alr_envs/alr/classic_control/viapoint_reacher/__init__.py rename to fancy_gym/dmc/suite/ball_in_cup/__init__.py diff --git a/alr_envs/dmc/suite/ball_in_cup/mp_wrapper.py b/fancy_gym/dmc/suite/ball_in_cup/mp_wrapper.py similarity index 86% rename from alr_envs/dmc/suite/ball_in_cup/mp_wrapper.py rename to fancy_gym/dmc/suite/ball_in_cup/mp_wrapper.py index fb068b3..dc6a539 100644 --- a/alr_envs/dmc/suite/ball_in_cup/mp_wrapper.py +++ b/fancy_gym/dmc/suite/ball_in_cup/mp_wrapper.py @@ -2,13 +2,13 @@ from typing import Tuple, Union import numpy as np -from mp_env_api import MPEnvWrapper +from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper -class MPWrapper(MPEnvWrapper): +class MPWrapper(RawInterfaceWrapper): @property - def active_obs(self): + def context_mask(self) -> np.ndarray: # Besides the ball position, the environment is always set to 0. return np.hstack([ [False] * 2, # cup position diff --git a/alr_envs/dmc/suite/cartpole/__init__.py b/fancy_gym/dmc/suite/cartpole/__init__.py similarity index 100% rename from alr_envs/dmc/suite/cartpole/__init__.py rename to fancy_gym/dmc/suite/cartpole/__init__.py index c5f9bee..8867457 100644 --- a/alr_envs/dmc/suite/cartpole/__init__.py +++ b/fancy_gym/dmc/suite/cartpole/__init__.py @@ -1,3 +1,3 @@ from .mp_wrapper import MPWrapper -from .mp_wrapper import TwoPolesMPWrapper from .mp_wrapper import ThreePolesMPWrapper +from .mp_wrapper import TwoPolesMPWrapper diff --git a/alr_envs/dmc/suite/cartpole/mp_wrapper.py b/fancy_gym/dmc/suite/cartpole/mp_wrapper.py similarity index 88% rename from alr_envs/dmc/suite/cartpole/mp_wrapper.py rename to fancy_gym/dmc/suite/cartpole/mp_wrapper.py index 1ca99f5..7edd51f 100644 --- a/alr_envs/dmc/suite/cartpole/mp_wrapper.py +++ b/fancy_gym/dmc/suite/cartpole/mp_wrapper.py @@ -2,18 +2,17 @@ from typing import Tuple, Union import numpy as np -from mp_env_api import MPEnvWrapper +from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper -class MPWrapper(MPEnvWrapper): +class MPWrapper(RawInterfaceWrapper): def __init__(self, env, n_poles: int = 1): self.n_poles = n_poles super().__init__(env) - @property - def active_obs(self): + def context_mask(self) -> np.ndarray: # Besides the ball position, the environment is always set to 0. 
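+        # TODO: comment above was copied from the ball_in_cup wrapper; update it to describe the cartpole observation.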
return np.hstack([ [True], # slider position diff --git a/alr_envs/alr/mujoco/beerpong/__init__.py b/fancy_gym/dmc/suite/reacher/__init__.py similarity index 100% rename from alr_envs/alr/mujoco/beerpong/__init__.py rename to fancy_gym/dmc/suite/reacher/__init__.py diff --git a/alr_envs/dmc/suite/reacher/mp_wrapper.py b/fancy_gym/dmc/suite/reacher/mp_wrapper.py similarity index 84% rename from alr_envs/dmc/suite/reacher/mp_wrapper.py rename to fancy_gym/dmc/suite/reacher/mp_wrapper.py index 86bc992..5ac52e5 100644 --- a/alr_envs/dmc/suite/reacher/mp_wrapper.py +++ b/fancy_gym/dmc/suite/reacher/mp_wrapper.py @@ -2,13 +2,13 @@ from typing import Tuple, Union import numpy as np -from mp_env_api import MPEnvWrapper +from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper -class MPWrapper(MPEnvWrapper): +class MPWrapper(RawInterfaceWrapper): @property - def active_obs(self): + def context_mask(self) -> np.ndarray: # Joint and target positions are randomized, velocities are always set to 0. return np.hstack([ [True] * 2, # joint position diff --git a/fancy_gym/envs/__init__.py b/fancy_gym/envs/__init__.py new file mode 100644 index 0000000..9f0299e --- /dev/null +++ b/fancy_gym/envs/__init__.py @@ -0,0 +1,664 @@ +from copy import deepcopy + +import numpy as np +from gym import register + +from . import classic_control, mujoco +from .classic_control.hole_reacher.hole_reacher import HoleReacherEnv +from .classic_control.simple_reacher.simple_reacher import SimpleReacherEnv +from .classic_control.viapoint_reacher.viapoint_reacher import ViaPointReacherEnv +from .mujoco.ant_jump.ant_jump import MAX_EPISODE_STEPS_ANTJUMP +from .mujoco.beerpong.beerpong import MAX_EPISODE_STEPS_BEERPONG, FIXED_RELEASE_STEP +from .mujoco.half_cheetah_jump.half_cheetah_jump import MAX_EPISODE_STEPS_HALFCHEETAHJUMP +from .mujoco.hopper_jump.hopper_jump import MAX_EPISODE_STEPS_HOPPERJUMP +from .mujoco.hopper_jump.hopper_jump_on_box import MAX_EPISODE_STEPS_HOPPERJUMPONBOX +from .mujoco.hopper_throw.hopper_throw import MAX_EPISODE_STEPS_HOPPERTHROW +from .mujoco.hopper_throw.hopper_throw_in_basket import MAX_EPISODE_STEPS_HOPPERTHROWINBASKET +from .mujoco.reacher.reacher import ReacherEnv, MAX_EPISODE_STEPS_REACHER +from .mujoco.walker_2d_jump.walker_2d_jump import MAX_EPISODE_STEPS_WALKERJUMP + +ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS = {"DMP": [], "ProMP": []} + +DEFAULT_BB_DICT_ProMP = { + "name": 'EnvName', + "wrappers": [], + "trajectory_generator_kwargs": { + 'trajectory_generator_type': 'promp' + }, + "phase_generator_kwargs": { + 'phase_generator_type': 'linear' + }, + "controller_kwargs": { + 'controller_type': 'motor', + "p_gains": 1.0, + "d_gains": 0.1, + }, + "basis_generator_kwargs": { + 'basis_generator_type': 'zero_rbf', + 'num_basis': 5, + 'num_basis_zero_start': 1 + } +} + +DEFAULT_BB_DICT_DMP = { + "name": 'EnvName', + "wrappers": [], + "trajectory_generator_kwargs": { + 'trajectory_generator_type': 'dmp' + }, + "phase_generator_kwargs": { + 'phase_generator_type': 'exp' + }, + "controller_kwargs": { + 'controller_type': 'motor', + "p_gains": 1.0, + "d_gains": 0.1, + }, + "basis_generator_kwargs": { + 'basis_generator_type': 'rbf', + 'num_basis': 5 + } +} + +# Classic Control +## Simple Reacher +register( + id='SimpleReacher-v0', + entry_point='fancy_gym.envs.classic_control:SimpleReacherEnv', + max_episode_steps=200, + kwargs={ + "n_links": 2, + } +) + +register( + id='LongSimpleReacher-v0', + entry_point='fancy_gym.envs.classic_control:SimpleReacherEnv', + max_episode_steps=200, + 
kwargs={ + "n_links": 5, + } +) + +## Viapoint Reacher + +register( + id='ViaPointReacher-v0', + entry_point='fancy_gym.envs.classic_control:ViaPointReacherEnv', + max_episode_steps=200, + kwargs={ + "n_links": 5, + "allow_self_collision": False, + "collision_penalty": 1000 + } +) + +## Hole Reacher +register( + id='HoleReacher-v0', + entry_point='fancy_gym.envs.classic_control:HoleReacherEnv', + max_episode_steps=200, + kwargs={ + "n_links": 5, + "random_start": True, + "allow_self_collision": False, + "allow_wall_collision": False, + "hole_width": None, + "hole_depth": 1, + "hole_x": None, + "collision_penalty": 100, + } +) + +# Mujoco + +## Mujoco Reacher +for _dims in [5, 7]: + register( + id=f'Reacher{_dims}d-v0', + entry_point='fancy_gym.envs.mujoco:ReacherEnv', + max_episode_steps=MAX_EPISODE_STEPS_REACHER, + kwargs={ + "n_links": _dims, + } + ) + + register( + id=f'Reacher{_dims}dSparse-v0', + entry_point='fancy_gym.envs.mujoco:ReacherEnv', + max_episode_steps=MAX_EPISODE_STEPS_REACHER, + kwargs={ + "sparse": True, + 'reward_weight': 200, + "n_links": _dims, + } + ) + +register( + id='HopperJumpSparse-v0', + entry_point='fancy_gym.envs.mujoco:HopperJumpEnv', + max_episode_steps=MAX_EPISODE_STEPS_HOPPERJUMP, + kwargs={ + "sparse": True, + } +) + +register( + id='HopperJump-v0', + entry_point='fancy_gym.envs.mujoco:HopperJumpEnv', + max_episode_steps=MAX_EPISODE_STEPS_HOPPERJUMP, + kwargs={ + "sparse": False, + "healthy_reward": 1.0, + "contact_weight": 0.0, + "height_weight": 3.0, + } +) + +register( + id='AntJump-v0', + entry_point='fancy_gym.envs.mujoco:AntJumpEnv', + max_episode_steps=MAX_EPISODE_STEPS_ANTJUMP, +) + +register( + id='HalfCheetahJump-v0', + entry_point='fancy_gym.envs.mujoco:HalfCheetahJumpEnv', + max_episode_steps=MAX_EPISODE_STEPS_HALFCHEETAHJUMP, +) + +register( + id='HopperJumpOnBox-v0', + entry_point='fancy_gym.envs.mujoco:HopperJumpOnBoxEnv', + max_episode_steps=MAX_EPISODE_STEPS_HOPPERJUMPONBOX, +) + +register( + id='HopperThrow-v0', + entry_point='fancy_gym.envs.mujoco:HopperThrowEnv', + max_episode_steps=MAX_EPISODE_STEPS_HOPPERTHROW, +) + +register( + id='HopperThrowInBasket-v0', + entry_point='fancy_gym.envs.mujoco:HopperThrowInBasketEnv', + max_episode_steps=MAX_EPISODE_STEPS_HOPPERTHROWINBASKET, +) + +register( + id='Walker2DJump-v0', + entry_point='fancy_gym.envs.mujoco:Walker2dJumpEnv', + max_episode_steps=MAX_EPISODE_STEPS_WALKERJUMP, +) + +register( + id='BeerPong-v0', + entry_point='fancy_gym.envs.mujoco:BeerPongEnv', + max_episode_steps=MAX_EPISODE_STEPS_BEERPONG, +) + +# Here we use the same reward as in BeerPong-v0, but now consider after the release, +# only one time step, i.e. 
we simulate until the end of the episode
+register(
+    id='BeerPongStepBased-v0',
+    entry_point='fancy_gym.envs.mujoco:BeerPongEnvStepBasedEpisodicReward',
+    max_episode_steps=FIXED_RELEASE_STEP,
+)
+
+
+# Movement Primitive Environments
+
+## Simple Reacher
+_versions = ["SimpleReacher-v0", "LongSimpleReacher-v0"]
+for _v in _versions:
+    _name = _v.split("-")
+    _env_id = f'{_name[0]}DMP-{_name[1]}'
+    kwargs_dict_simple_reacher_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
+    kwargs_dict_simple_reacher_dmp['wrappers'].append(classic_control.simple_reacher.MPWrapper)
+    kwargs_dict_simple_reacher_dmp['controller_kwargs']['p_gains'] = 0.6
+    kwargs_dict_simple_reacher_dmp['controller_kwargs']['d_gains'] = 0.075
+    kwargs_dict_simple_reacher_dmp['trajectory_generator_kwargs']['weight_scale'] = 50
+    kwargs_dict_simple_reacher_dmp['phase_generator_kwargs']['alpha_phase'] = 2
+    kwargs_dict_simple_reacher_dmp['name'] = f"{_v}"
+    register(
+        id=_env_id,
+        entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
+        kwargs=kwargs_dict_simple_reacher_dmp
+    )
+    ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id)
+
+    _env_id = f'{_name[0]}ProMP-{_name[1]}'
+    kwargs_dict_simple_reacher_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
+    kwargs_dict_simple_reacher_promp['wrappers'].append(classic_control.simple_reacher.MPWrapper)
+    kwargs_dict_simple_reacher_promp['controller_kwargs']['p_gains'] = 0.6
+    kwargs_dict_simple_reacher_promp['controller_kwargs']['d_gains'] = 0.075
+    kwargs_dict_simple_reacher_promp['name'] = _v
+    register(
+        id=_env_id,
+        entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
+        kwargs=kwargs_dict_simple_reacher_promp
+    )
+    ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id)
+
+## Viapoint Reacher
+kwargs_dict_via_point_reacher_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
+kwargs_dict_via_point_reacher_dmp['wrappers'].append(classic_control.viapoint_reacher.MPWrapper)
+kwargs_dict_via_point_reacher_dmp['controller_kwargs']['controller_type'] = 'velocity'
+kwargs_dict_via_point_reacher_dmp['trajectory_generator_kwargs']['weight_scale'] = 50
+kwargs_dict_via_point_reacher_dmp['phase_generator_kwargs']['alpha_phase'] = 2
+kwargs_dict_via_point_reacher_dmp['name'] = "ViaPointReacher-v0"
+register(
+    id='ViaPointReacherDMP-v0',
+    entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
+    # max_episode_steps=1,
+    kwargs=kwargs_dict_via_point_reacher_dmp
+)
+ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append("ViaPointReacherDMP-v0")
+
+kwargs_dict_via_point_reacher_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
+kwargs_dict_via_point_reacher_promp['wrappers'].append(classic_control.viapoint_reacher.MPWrapper)
+kwargs_dict_via_point_reacher_promp['controller_kwargs']['controller_type'] = 'velocity'
+kwargs_dict_via_point_reacher_promp['name'] = "ViaPointReacher-v0"
+register(
+    id="ViaPointReacherProMP-v0",
+    entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
+    kwargs=kwargs_dict_via_point_reacher_promp
+)
+ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append("ViaPointReacherProMP-v0")
+
+## Hole Reacher
+_versions = ["HoleReacher-v0"]
+for _v in _versions:
+    _name = _v.split("-")
+    _env_id = f'{_name[0]}DMP-{_name[1]}'
+    kwargs_dict_hole_reacher_dmp = deepcopy(DEFAULT_BB_DICT_DMP)
+    kwargs_dict_hole_reacher_dmp['wrappers'].append(classic_control.hole_reacher.MPWrapper)
+    kwargs_dict_hole_reacher_dmp['controller_kwargs']['controller_type'] = 'velocity'
+    # TODO: Before it was weight scale 50 and goal scale 0.1.
We now only have weight scale and thus set it to 500. Check + kwargs_dict_hole_reacher_dmp['trajectory_generator_kwargs']['weight_scale'] = 500 + kwargs_dict_hole_reacher_dmp['phase_generator_kwargs']['alpha_phase'] = 2.5 + kwargs_dict_hole_reacher_dmp['name'] = _v + register( + id=_env_id, + entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper', + # max_episode_steps=1, + kwargs=kwargs_dict_hole_reacher_dmp + ) + ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id) + + _env_id = f'{_name[0]}ProMP-{_name[1]}' + kwargs_dict_hole_reacher_promp = deepcopy(DEFAULT_BB_DICT_ProMP) + kwargs_dict_hole_reacher_promp['wrappers'].append(classic_control.hole_reacher.MPWrapper) + kwargs_dict_hole_reacher_promp['trajectory_generator_kwargs']['weight_scale'] = 2 + kwargs_dict_hole_reacher_promp['controller_kwargs']['controller_type'] = 'velocity' + kwargs_dict_hole_reacher_promp['name'] = f"{_v}" + register( + id=_env_id, + entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper', + kwargs=kwargs_dict_hole_reacher_promp + ) + ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id) + +## ReacherNd +_versions = ["Reacher5d-v0", "Reacher7d-v0", "Reacher5dSparse-v0", "Reacher7dSparse-v0"] +for _v in _versions: + _name = _v.split("-") + _env_id = f'{_name[0]}DMP-{_name[1]}' + kwargs_dict_reacher_dmp = deepcopy(DEFAULT_BB_DICT_DMP) + kwargs_dict_reacher_dmp['wrappers'].append(mujoco.reacher.MPWrapper) + kwargs_dict_reacher_dmp['phase_generator_kwargs']['alpha_phase'] = 2 + kwargs_dict_reacher_dmp['name'] = _v + register( + id=_env_id, + entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper', + # max_episode_steps=1, + kwargs=kwargs_dict_reacher_dmp + ) + ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["DMP"].append(_env_id) + + _env_id = f'{_name[0]}ProMP-{_name[1]}' + kwargs_dict_reacher_promp = deepcopy(DEFAULT_BB_DICT_ProMP) + kwargs_dict_reacher_promp['wrappers'].append(mujoco.reacher.MPWrapper) + kwargs_dict_reacher_promp['name'] = _v + register( + id=_env_id, + entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper', + kwargs=kwargs_dict_reacher_promp + ) + ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id) + +######################################################################################################################## +## Beerpong ProMP +_versions = ['BeerPong-v0'] +for _v in _versions: + _name = _v.split("-") + _env_id = f'{_name[0]}ProMP-{_name[1]}' + kwargs_dict_bp_promp = deepcopy(DEFAULT_BB_DICT_ProMP) + kwargs_dict_bp_promp['wrappers'].append(mujoco.beerpong.MPWrapper) + kwargs_dict_bp_promp['phase_generator_kwargs']['learn_tau'] = True + kwargs_dict_bp_promp['controller_kwargs']['p_gains'] = np.array([1.5, 5, 2.55, 3, 2., 2, 1.25]) + kwargs_dict_bp_promp['controller_kwargs']['d_gains'] = np.array([0.02333333, 0.1, 0.0625, 0.08, 0.03, 0.03, 0.0125]) + kwargs_dict_bp_promp['basis_generator_kwargs']['num_basis'] = 2 + kwargs_dict_bp_promp['basis_generator_kwargs']['num_basis_zero_start'] = 2 + kwargs_dict_bp_promp['name'] = _v + register( + id=_env_id, + entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper', + kwargs=kwargs_dict_bp_promp + ) + ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id) + +### BP with Fixed release +_versions = ["BeerPongStepBased-v0", 'BeerPong-v0'] +for _v in _versions: + if _v != 'BeerPong-v0': + _name = _v.split("-") + _env_id = f'{_name[0]}ProMP-{_name[1]}' + else: + _env_id = 'BeerPongFixedReleaseProMP-v0' + kwargs_dict_bp_promp = 
deepcopy(DEFAULT_BB_DICT_ProMP) + kwargs_dict_bp_promp['wrappers'].append(mujoco.beerpong.MPWrapper) + kwargs_dict_bp_promp['phase_generator_kwargs']['tau'] = 0.62 + kwargs_dict_bp_promp['controller_kwargs']['p_gains'] = np.array([1.5, 5, 2.55, 3, 2., 2, 1.25]) + kwargs_dict_bp_promp['controller_kwargs']['d_gains'] = np.array([0.02333333, 0.1, 0.0625, 0.08, 0.03, 0.03, 0.0125]) + kwargs_dict_bp_promp['basis_generator_kwargs']['num_basis'] = 2 + kwargs_dict_bp_promp['basis_generator_kwargs']['num_basis_zero_start'] = 2 + kwargs_dict_bp_promp['name'] = _v + register( + id=_env_id, + entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper', + kwargs=kwargs_dict_bp_promp + ) + ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id) +######################################################################################################################## + +## Table Tennis needs to be fixed according to Zhou's implementation + +# TODO: Add later when finished +# ######################################################################################################################## +# +# ## AntJump +# _versions = ['AntJump-v0'] +# for _v in _versions: +# _name = _v.split("-") +# _env_id = f'{_name[0]}ProMP-{_name[1]}' +# kwargs_dict_ant_jump_promp = deepcopy(DEFAULT_BB_DICT_ProMP) +# kwargs_dict_ant_jump_promp['wrappers'].append(mujoco.ant_jump.MPWrapper) +# kwargs_dict_ant_jump_promp['name'] = _v +# register( +# id=_env_id, +# entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper', +# kwargs=kwargs_dict_ant_jump_promp +# ) +# ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id) +# +# ######################################################################################################################## +# +# ## HalfCheetahJump +# _versions = ['HalfCheetahJump-v0'] +# for _v in _versions: +# _name = _v.split("-") +# _env_id = f'{_name[0]}ProMP-{_name[1]}' +# kwargs_dict_halfcheetah_jump_promp = deepcopy(DEFAULT_BB_DICT_ProMP) +# kwargs_dict_halfcheetah_jump_promp['wrappers'].append(mujoco.half_cheetah_jump.MPWrapper) +# kwargs_dict_halfcheetah_jump_promp['name'] = _v +# register( +# id=_env_id, +# entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper', +# kwargs=kwargs_dict_halfcheetah_jump_promp +# ) +# ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id) +# +# ######################################################################################################################## + + +## HopperJump +_versions = ['HopperJump-v0', 'HopperJumpSparse-v0', + # 'HopperJumpOnBox-v0', 'HopperThrow-v0', 'HopperThrowInBasket-v0' + ] +# TODO: Check if all environments work with the same MPWrapper +for _v in _versions: + _name = _v.split("-") + _env_id = f'{_name[0]}ProMP-{_name[1]}' + kwargs_dict_hopper_jump_promp = deepcopy(DEFAULT_BB_DICT_ProMP) + kwargs_dict_hopper_jump_promp['wrappers'].append(mujoco.hopper_jump.MPWrapper) + kwargs_dict_hopper_jump_promp['name'] = _v + register( + id=_env_id, + entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper', + kwargs=kwargs_dict_hopper_jump_promp + ) + ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id) + +# ######################################################################################################################## +# +# +# ## Walker2DJump +# _versions = ['Walker2DJump-v0'] +# for _v in _versions: +# _name = _v.split("-") +# _env_id = f'{_name[0]}ProMP-{_name[1]}' +# kwargs_dict_walker2d_jump_promp = deepcopy(DEFAULT_BB_DICT_ProMP) +# 
kwargs_dict_walker2d_jump_promp['wrappers'].append(mujoco.walker_2d_jump.MPWrapper) +# kwargs_dict_walker2d_jump_promp['name'] = _v +# register( +# id=_env_id, +# entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper', +# kwargs=kwargs_dict_walker2d_jump_promp +# ) +# ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id) + +### Depricated, we will not provide non random starts anymore +""" +register( + id='SimpleReacher-v1', + entry_point='fancy_gym.envs.classic_control:SimpleReacherEnv', + max_episode_steps=200, + kwargs={ + "n_links": 2, + "random_start": False + } +) + +register( + id='LongSimpleReacher-v1', + entry_point='fancy_gym.envs.classic_control:SimpleReacherEnv', + max_episode_steps=200, + kwargs={ + "n_links": 5, + "random_start": False + } +) +register( + id='HoleReacher-v1', + entry_point='fancy_gym.envs.classic_control:HoleReacherEnv', + max_episode_steps=200, + kwargs={ + "n_links": 5, + "random_start": False, + "allow_self_collision": False, + "allow_wall_collision": False, + "hole_width": 0.25, + "hole_depth": 1, + "hole_x": None, + "collision_penalty": 100, + } +) +register( + id='HoleReacher-v2', + entry_point='fancy_gym.envs.classic_control:HoleReacherEnv', + max_episode_steps=200, + kwargs={ + "n_links": 5, + "random_start": False, + "allow_self_collision": False, + "allow_wall_collision": False, + "hole_width": 0.25, + "hole_depth": 1, + "hole_x": 2, + "collision_penalty": 1, + } +) + +# CtxtFree are v0, Contextual are v1 +register( + id='AntJump-v0', + entry_point='fancy_gym.envs.mujoco:AntJumpEnv', + max_episode_steps=MAX_EPISODE_STEPS_ANTJUMP, + kwargs={ + "max_episode_steps": MAX_EPISODE_STEPS_ANTJUMP, + "context": False + } +) +# CtxtFree are v0, Contextual are v1 +register( + id='HalfCheetahJump-v0', + entry_point='fancy_gym.envs.mujoco:HalfCheetahJumpEnv', + max_episode_steps=MAX_EPISODE_STEPS_HALFCHEETAHJUMP, + kwargs={ + "max_episode_steps": MAX_EPISODE_STEPS_HALFCHEETAHJUMP, + "context": False + } +) +register( + id='HopperJump-v0', + entry_point='fancy_gym.envs.mujoco:HopperJumpEnv', + max_episode_steps=MAX_EPISODE_STEPS_HOPPERJUMP, + kwargs={ + "max_episode_steps": MAX_EPISODE_STEPS_HOPPERJUMP, + "context": False, + "healthy_reward": 1.0 + } +) + +""" + +### Deprecated used for CorL paper +""" +_vs = np.arange(101).tolist() + [1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3, 1e-2, 5e-2, 1e-1, 5e-1] +for i in _vs: + _env_id = f'ALRReacher{i}-v0' + register( + id=_env_id, + entry_point='fancy_gym.envs.mujoco:ReacherEnv', + max_episode_steps=200, + kwargs={ + "steps_before_reward": 0, + "n_links": 5, + "balance": False, + '_ctrl_cost_weight': i + } + ) + + _env_id = f'ALRReacherSparse{i}-v0' + register( + id=_env_id, + entry_point='fancy_gym.envs.mujoco:ReacherEnv', + max_episode_steps=200, + kwargs={ + "steps_before_reward": 200, + "n_links": 5, + "balance": False, + '_ctrl_cost_weight': i + } + ) + _vs = np.arange(101).tolist() + [1e-5, 5e-5, 1e-4, 5e-4, 1e-3, 5e-3, 1e-2, 5e-2, 1e-1, 5e-1] +for i in _vs: + _env_id = f'ALRReacher{i}ProMP-v0' + register( + id=_env_id, + entry_point='fancy_gym.utils.make_env_helpers:make_promp_env_helper', + kwargs={ + "name": f"{_env_id.replace('ProMP', '')}", + "wrappers": [mujoco.reacher.MPWrapper], + "mp_kwargs": { + "num_dof": 5, + "num_basis": 5, + "duration": 4, + "policy_type": "motor", + # "weights_scale": 5, + "n_zero_basis": 1, + "zero_start": True, + "policy_kwargs": { + "p_gains": 1, + "d_gains": 0.1 + } + } + } + ) + + _env_id = f'ALRReacherSparse{i}ProMP-v0' + register( + id=_env_id, + 
entry_point='fancy_gym.utils.make_env_helpers:make_promp_env_helper', + kwargs={ + "name": f"{_env_id.replace('ProMP', '')}", + "wrappers": [mujoco.reacher.MPWrapper], + "mp_kwargs": { + "num_dof": 5, + "num_basis": 5, + "duration": 4, + "policy_type": "motor", + # "weights_scale": 5, + "n_zero_basis": 1, + "zero_start": True, + "policy_kwargs": { + "p_gains": 1, + "d_gains": 0.1 + } + } + } + ) + + register( + id='HopperJumpOnBox-v0', + entry_point='fancy_gym.envs.mujoco:HopperJumpOnBoxEnv', + max_episode_steps=MAX_EPISODE_STEPS_HOPPERJUMPONBOX, + kwargs={ + "max_episode_steps": MAX_EPISODE_STEPS_HOPPERJUMPONBOX, + "context": False + } + ) + register( + id='HopperThrow-v0', + entry_point='fancy_gym.envs.mujoco:HopperThrowEnv', + max_episode_steps=MAX_EPISODE_STEPS_HOPPERTHROW, + kwargs={ + "max_episode_steps": MAX_EPISODE_STEPS_HOPPERTHROW, + "context": False + } + ) + register( + id='HopperThrowInBasket-v0', + entry_point='fancy_gym.envs.mujoco:HopperThrowInBasketEnv', + max_episode_steps=MAX_EPISODE_STEPS_HOPPERTHROWINBASKET, + kwargs={ + "max_episode_steps": MAX_EPISODE_STEPS_HOPPERTHROWINBASKET, + "context": False + } + ) + register( + id='Walker2DJump-v0', + entry_point='fancy_gym.envs.mujoco:Walker2dJumpEnv', + max_episode_steps=MAX_EPISODE_STEPS_WALKERJUMP, + kwargs={ + "max_episode_steps": MAX_EPISODE_STEPS_WALKERJUMP, + "context": False + } + ) + register(id='TableTennis2DCtxt-v1', + entry_point='fancy_gym.envs.mujoco:TTEnvGym', + max_episode_steps=MAX_EPISODE_STEPS, + kwargs={'ctxt_dim': 2, 'fixed_goal': True}) + + register( + id='BeerPong-v0', + entry_point='fancy_gym.envs.mujoco:BeerBongEnv', + max_episode_steps=300, + kwargs={ + "rndm_goal": False, + "cup_goal_pos": [0.1, -2.0], + "frame_skip": 2 + } + ) +""" diff --git a/alr_envs/alr/classic_control/README.MD b/fancy_gym/envs/classic_control/README.MD similarity index 100% rename from alr_envs/alr/classic_control/README.MD rename to fancy_gym/envs/classic_control/README.MD diff --git a/alr_envs/alr/classic_control/__init__.py b/fancy_gym/envs/classic_control/__init__.py similarity index 100% rename from alr_envs/alr/classic_control/__init__.py rename to fancy_gym/envs/classic_control/__init__.py diff --git a/alr_envs/alr/mujoco/gym_table_tennis/utils/rewards/__init__.py b/fancy_gym/envs/classic_control/base_reacher/__init__.py similarity index 100% rename from alr_envs/alr/mujoco/gym_table_tennis/utils/rewards/__init__.py rename to fancy_gym/envs/classic_control/base_reacher/__init__.py diff --git a/alr_envs/alr/classic_control/base_reacher/base_reacher.py b/fancy_gym/envs/classic_control/base_reacher/base_reacher.py similarity index 85% rename from alr_envs/alr/classic_control/base_reacher/base_reacher.py rename to fancy_gym/envs/classic_control/base_reacher/base_reacher.py index 1b1ad19..f2ba135 100644 --- a/alr_envs/alr/classic_control/base_reacher/base_reacher.py +++ b/fancy_gym/envs/classic_control/base_reacher/base_reacher.py @@ -1,20 +1,20 @@ -from typing import Iterable, Union -from abc import ABC, abstractmethod +from typing import Union, Tuple, Optional + import gym -import matplotlib.pyplot as plt import numpy as np from gym import spaces +from gym.core import ObsType from gym.utils import seeding -from alr_envs.alr.classic_control.utils import intersect + +from fancy_gym.envs.classic_control.utils import intersect -class BaseReacherEnv(gym.Env, ABC): +class BaseReacherEnv(gym.Env): """ Base class for all reaching environments. 
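+    Subclasses provide the concrete action interface (direct velocity or torque control) and
+    implement the reward, observation, collision check, and termination logic.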
""" - def __init__(self, n_links: int, random_start: bool = True, - allow_self_collision: bool = False): + def __init__(self, n_links: int, random_start: bool = True, allow_self_collision: bool = False): super().__init__() self.link_lengths = np.ones(n_links) self.n_links = n_links @@ -69,7 +69,8 @@ class BaseReacherEnv(gym.Env, ABC): def current_vel(self): return self._angle_velocity.copy() - def reset(self): + def reset(self, *, seed: Optional[int] = None, return_info: bool = False, + options: Optional[dict] = None, ) -> Union[ObsType, Tuple[ObsType, dict]]: # Sample only orientation of first link, i.e. the arm is always straight. if self.random_start: first_joint = self.np_random.uniform(np.pi / 4, 3 * np.pi / 4) @@ -85,13 +86,6 @@ class BaseReacherEnv(gym.Env, ABC): return self._get_obs().copy() - @abstractmethod - def step(self, action: np.ndarray): - """ - A single step with action in angular velocity space - """ - raise NotImplementedError - def _update_joints(self): """ update joints to get new end-effector position. The other links are only required for rendering. @@ -118,27 +112,24 @@ class BaseReacherEnv(gym.Env, ABC): return True return False - @abstractmethod def _get_reward(self, action: np.ndarray) -> (float, dict): - pass + raise NotImplementedError - @abstractmethod def _get_obs(self) -> np.ndarray: - pass + raise NotImplementedError - @abstractmethod def _check_collisions(self) -> bool: - pass + raise NotImplementedError - @abstractmethod def _terminate(self, info) -> bool: - return False + raise NotImplementedError def seed(self, seed=None): self.np_random, seed = seeding.np_random(seed) return [seed] def close(self): + super(BaseReacherEnv, self).close() del self.fig @property diff --git a/alr_envs/alr/classic_control/base_reacher/base_reacher_direct.py b/fancy_gym/envs/classic_control/base_reacher/base_reacher_direct.py similarity index 87% rename from alr_envs/alr/classic_control/base_reacher/base_reacher_direct.py rename to fancy_gym/envs/classic_control/base_reacher/base_reacher_direct.py index dc79827..ab21b39 100644 --- a/alr_envs/alr/classic_control/base_reacher/base_reacher_direct.py +++ b/fancy_gym/envs/classic_control/base_reacher/base_reacher_direct.py @@ -1,14 +1,14 @@ -from abc import ABC - -from gym import spaces import numpy as np -from alr_envs.alr.classic_control.base_reacher.base_reacher import BaseReacherEnv +from gym import spaces + +from fancy_gym.envs.classic_control.base_reacher.base_reacher import BaseReacherEnv -class BaseReacherDirectEnv(BaseReacherEnv, ABC): +class BaseReacherDirectEnv(BaseReacherEnv): """ Base class for directly controlled reaching environments """ + def __init__(self, n_links: int, random_start: bool = True, allow_self_collision: bool = False): super().__init__(n_links, random_start, allow_self_collision) diff --git a/alr_envs/alr/classic_control/base_reacher/base_reacher_torque.py b/fancy_gym/envs/classic_control/base_reacher/base_reacher_torque.py similarity index 87% rename from alr_envs/alr/classic_control/base_reacher/base_reacher_torque.py rename to fancy_gym/envs/classic_control/base_reacher/base_reacher_torque.py index 094f632..7364948 100644 --- a/alr_envs/alr/classic_control/base_reacher/base_reacher_torque.py +++ b/fancy_gym/envs/classic_control/base_reacher/base_reacher_torque.py @@ -1,14 +1,14 @@ -from abc import ABC - -from gym import spaces import numpy as np -from alr_envs.alr.classic_control.base_reacher.base_reacher import BaseReacherEnv +from gym import spaces + +from 
fancy_gym.envs.classic_control.base_reacher.base_reacher import BaseReacherEnv -class BaseReacherTorqueEnv(BaseReacherEnv, ABC): +class BaseReacherTorqueEnv(BaseReacherEnv): """ Base class for torque controlled reaching environments """ + def __init__(self, n_links: int, random_start: bool = True, allow_self_collision: bool = False): super().__init__(n_links, random_start, allow_self_collision) diff --git a/alr_envs/alr/classic_control/hole_reacher/__init__.py b/fancy_gym/envs/classic_control/hole_reacher/__init__.py similarity index 100% rename from alr_envs/alr/classic_control/hole_reacher/__init__.py rename to fancy_gym/envs/classic_control/hole_reacher/__init__.py diff --git a/alr_envs/alr/classic_control/hole_reacher/hole_reacher.py b/fancy_gym/envs/classic_control/hole_reacher/hole_reacher.py similarity index 91% rename from alr_envs/alr/classic_control/hole_reacher/hole_reacher.py rename to fancy_gym/envs/classic_control/hole_reacher/hole_reacher.py index 208f005..5563ea6 100644 --- a/alr_envs/alr/classic_control/hole_reacher/hole_reacher.py +++ b/fancy_gym/envs/classic_control/hole_reacher/hole_reacher.py @@ -1,11 +1,14 @@ -from typing import Union +from typing import Union, Optional, Tuple import gym import matplotlib.pyplot as plt import numpy as np +from gym.core import ObsType from matplotlib import patches -from alr_envs.alr.classic_control.base_reacher.base_reacher_direct import BaseReacherDirectEnv +from fancy_gym.envs.classic_control.base_reacher.base_reacher_direct import BaseReacherDirectEnv + +MAX_EPISODE_STEPS_HOLEREACHER = 200 class HoleReacherEnv(BaseReacherDirectEnv): @@ -40,15 +43,19 @@ class HoleReacherEnv(BaseReacherDirectEnv): self.observation_space = gym.spaces.Box(low=-state_bound, high=state_bound, shape=state_bound.shape) if rew_fct == "simple": - from alr_envs.alr.classic_control.hole_reacher.hr_simple_reward import HolereacherReward + from fancy_gym.envs.classic_control.hole_reacher.hr_simple_reward import HolereacherReward self.reward_function = HolereacherReward(allow_self_collision, allow_wall_collision, collision_penalty) elif rew_fct == "vel_acc": - from alr_envs.alr.classic_control.hole_reacher.hr_dist_vel_acc_reward import HolereacherReward + from fancy_gym.envs.classic_control.hole_reacher.hr_dist_vel_acc_reward import HolereacherReward self.reward_function = HolereacherReward(allow_self_collision, allow_wall_collision, collision_penalty) + elif rew_fct == "unbounded": + from fancy_gym.envs.classic_control.hole_reacher.hr_unbounded_reward import HolereacherReward + self.reward_function = HolereacherReward(allow_self_collision, allow_wall_collision) else: raise ValueError("Unknown reward function {}".format(rew_fct)) - def reset(self): + def reset(self, *, seed: Optional[int] = None, return_info: bool = False, + options: Optional[dict] = None, ) -> Union[ObsType, Tuple[ObsType, dict]]: self._generate_hole() self._set_patches() self.reward_function.reset() @@ -219,7 +226,7 @@ class HoleReacherEnv(BaseReacherDirectEnv): if __name__ == "__main__": - import time + env = HoleReacherEnv(5) env.reset() diff --git a/alr_envs/alr/classic_control/hole_reacher/hr_dist_vel_acc_reward.py b/fancy_gym/envs/classic_control/hole_reacher/hr_dist_vel_acc_reward.py similarity index 100% rename from alr_envs/alr/classic_control/hole_reacher/hr_dist_vel_acc_reward.py rename to fancy_gym/envs/classic_control/hole_reacher/hr_dist_vel_acc_reward.py diff --git a/alr_envs/alr/classic_control/hole_reacher/hr_simple_reward.py 
b/fancy_gym/envs/classic_control/hole_reacher/hr_simple_reward.py similarity index 100% rename from alr_envs/alr/classic_control/hole_reacher/hr_simple_reward.py rename to fancy_gym/envs/classic_control/hole_reacher/hr_simple_reward.py diff --git a/fancy_gym/envs/classic_control/hole_reacher/hr_unbounded_reward.py b/fancy_gym/envs/classic_control/hole_reacher/hr_unbounded_reward.py new file mode 100644 index 0000000..7ed13a1 --- /dev/null +++ b/fancy_gym/envs/classic_control/hole_reacher/hr_unbounded_reward.py @@ -0,0 +1,60 @@ +import numpy as np + + +class HolereacherReward: + def __init__(self, allow_self_collision, allow_wall_collision): + + # collision + self.allow_self_collision = allow_self_collision + self.allow_wall_collision = allow_wall_collision + self._is_collided = False + + self.reward_factors = np.array((1, -5e-6)) + + def reset(self): + self._is_collided = False + + def get_reward(self, env): + dist_reward = 0 + success = False + + self_collision = False + wall_collision = False + + if not self.allow_self_collision: + self_collision = env._check_self_collision() + + if not self.allow_wall_collision: + wall_collision = env.check_wall_collision() + + self._is_collided = self_collision or wall_collision + + if env._steps == 180 or self._is_collided: + self.end_eff_pos = np.copy(env.end_effector) + + if env._steps == 199 or self._is_collided: + # return reward only in last time step + # Episode also terminates when colliding, hence return reward + dist = np.linalg.norm(self.end_eff_pos - env._goal) + + if self._is_collided: + dist_reward = 0.25 * np.exp(- dist) + else: + if env.end_effector[1] > 0: + dist_reward = np.exp(- dist) + else: + dist_reward = 1 - self.end_eff_pos[1] + + success = not self._is_collided + + info = {"is_success": success, + "is_collided": self._is_collided, + "end_effector": np.copy(env.end_effector), + "joints": np.copy(env.current_pos)} + + acc_cost = np.sum(env._acc ** 2) + + reward_features = np.array((dist_reward, acc_cost)) + reward = np.dot(reward_features, self.reward_factors) + + return reward, info \ No newline at end of file diff --git a/alr_envs/alr/classic_control/simple_reacher/mp_wrapper.py b/fancy_gym/envs/classic_control/hole_reacher/mp_wrapper.py similarity index 63% rename from alr_envs/alr/classic_control/simple_reacher/mp_wrapper.py rename to fancy_gym/envs/classic_control/hole_reacher/mp_wrapper.py index 4b71e3a..d160b5c 100644 --- a/alr_envs/alr/classic_control/simple_reacher/mp_wrapper.py +++ b/fancy_gym/envs/classic_control/hole_reacher/mp_wrapper.py @@ -2,16 +2,19 @@ from typing import Tuple, Union import numpy as np -from mp_env_api import MPEnvWrapper +from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper -class MPWrapper(MPEnvWrapper): +class MPWrapper(RawInterfaceWrapper): + @property - def active_obs(self): + def context_mask(self): return np.hstack([ [self.env.random_start] * self.env.n_links, # cos [self.env.random_start] * self.env.n_links, # sin [self.env.random_start] * self.env.n_links, # velocity + [self.env.initial_width is None], # hole width + # [self.env.hole_depth is None], # hole depth [True] * 2, # x-y coordinates of target distance [False] # env steps ]) @@ -23,11 +26,3 @@ class MPWrapper(MPEnvWrapper): @property def current_vel(self) -> Union[float, int, np.ndarray, Tuple]: return self.env.current_vel - - @property - def goal_pos(self) -> Union[float, int, np.ndarray, Tuple]: - raise ValueError("Goal position is not available and has to be learnt based on the environment.") - - @property 
- def dt(self) -> Union[float, int]: - return self.env.dt diff --git a/alr_envs/alr/mujoco/reacher/__init__.py b/fancy_gym/envs/classic_control/simple_reacher/__init__.py similarity index 100% rename from alr_envs/alr/mujoco/reacher/__init__.py rename to fancy_gym/envs/classic_control/simple_reacher/__init__.py diff --git a/fancy_gym/envs/classic_control/simple_reacher/mp_wrapper.py b/fancy_gym/envs/classic_control/simple_reacher/mp_wrapper.py new file mode 100644 index 0000000..6d1fda1 --- /dev/null +++ b/fancy_gym/envs/classic_control/simple_reacher/mp_wrapper.py @@ -0,0 +1,26 @@ +from typing import Tuple, Union + +import numpy as np + +from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper + + +class MPWrapper(RawInterfaceWrapper): + + @property + def context_mask(self): + return np.hstack([ + [self.env.random_start] * self.env.n_links, # cos + [self.env.random_start] * self.env.n_links, # sin + [self.env.random_start] * self.env.n_links, # velocity + [True] * 2, # x-y coordinates of target distance + [False] # env steps + ]) + + @property + def current_pos(self) -> Union[float, int, np.ndarray, Tuple]: + return self.env.current_pos + + @property + def current_vel(self) -> Union[float, int, np.ndarray, Tuple]: + return self.env.current_vel diff --git a/alr_envs/alr/classic_control/simple_reacher/simple_reacher.py b/fancy_gym/envs/classic_control/simple_reacher/simple_reacher.py similarity index 91% rename from alr_envs/alr/classic_control/simple_reacher/simple_reacher.py rename to fancy_gym/envs/classic_control/simple_reacher/simple_reacher.py index dac06a3..9b03147 100644 --- a/alr_envs/alr/classic_control/simple_reacher/simple_reacher.py +++ b/fancy_gym/envs/classic_control/simple_reacher/simple_reacher.py @@ -1,10 +1,11 @@ -from typing import Iterable, Union +from typing import Iterable, Union, Optional, Tuple import matplotlib.pyplot as plt import numpy as np from gym import spaces +from gym.core import ObsType -from alr_envs.alr.classic_control.base_reacher.base_reacher_torque import BaseReacherTorqueEnv +from fancy_gym.envs.classic_control.base_reacher.base_reacher_torque import BaseReacherTorqueEnv class SimpleReacherEnv(BaseReacherTorqueEnv): @@ -15,7 +16,7 @@ class SimpleReacherEnv(BaseReacherTorqueEnv): """ def __init__(self, n_links: int, target: Union[None, Iterable] = None, random_start: bool = True, - allow_self_collision: bool = False,): + allow_self_collision: bool = False, ): super().__init__(n_links, random_start, allow_self_collision) # provided initial parameters @@ -41,7 +42,8 @@ class SimpleReacherEnv(BaseReacherTorqueEnv): # def start_pos(self): # return self._start_pos - def reset(self): + def reset(self, *, seed: Optional[int] = None, return_info: bool = False, + options: Optional[dict] = None, ) -> Union[ObsType, Tuple[ObsType, dict]]: self._generate_goal() return super().reset() diff --git a/alr_envs/alr/classic_control/utils.py b/fancy_gym/envs/classic_control/utils.py similarity index 96% rename from alr_envs/alr/classic_control/utils.py rename to fancy_gym/envs/classic_control/utils.py index b557a0a..ea378d1 100644 --- a/alr_envs/alr/classic_control/utils.py +++ b/fancy_gym/envs/classic_control/utils.py @@ -1,6 +1,3 @@ -import numpy as np - - def ccw(A, B, C): return (C[1] - A[1]) * (B[0] - A[0]) - (B[1] - A[1]) * (C[0] - A[0]) > 1e-12 diff --git a/alr_envs/alr/mujoco/table_tennis/__init__.py b/fancy_gym/envs/classic_control/viapoint_reacher/__init__.py similarity index 100% rename from alr_envs/alr/mujoco/table_tennis/__init__.py rename 
to fancy_gym/envs/classic_control/viapoint_reacher/__init__.py diff --git a/alr_envs/alr/classic_control/viapoint_reacher/mp_wrapper.py b/fancy_gym/envs/classic_control/viapoint_reacher/mp_wrapper.py similarity index 67% rename from alr_envs/alr/classic_control/viapoint_reacher/mp_wrapper.py rename to fancy_gym/envs/classic_control/viapoint_reacher/mp_wrapper.py index 6b3e85d..47da749 100644 --- a/alr_envs/alr/classic_control/viapoint_reacher/mp_wrapper.py +++ b/fancy_gym/envs/classic_control/viapoint_reacher/mp_wrapper.py @@ -2,12 +2,13 @@ from typing import Tuple, Union import numpy as np -from mp_env_api import MPEnvWrapper +from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper -class MPWrapper(MPEnvWrapper): +class MPWrapper(RawInterfaceWrapper): + @property - def active_obs(self): + def context_mask(self): return np.hstack([ [self.env.random_start] * self.env.n_links, # cos [self.env.random_start] * self.env.n_links, # sin @@ -24,11 +25,3 @@ class MPWrapper(MPEnvWrapper): @property def current_vel(self) -> Union[float, int, np.ndarray, Tuple]: return self.env.current_vel - - @property - def goal_pos(self) -> Union[float, int, np.ndarray, Tuple]: - raise ValueError("Goal position is not available and has to be learnt based on the environment.") - - @property - def dt(self) -> Union[float, int]: - return self.env.dt diff --git a/alr_envs/alr/classic_control/viapoint_reacher/viapoint_reacher.py b/fancy_gym/envs/classic_control/viapoint_reacher/viapoint_reacher.py similarity index 95% rename from alr_envs/alr/classic_control/viapoint_reacher/viapoint_reacher.py rename to fancy_gym/envs/classic_control/viapoint_reacher/viapoint_reacher.py index 292e40a..f3412ac 100644 --- a/alr_envs/alr/classic_control/viapoint_reacher/viapoint_reacher.py +++ b/fancy_gym/envs/classic_control/viapoint_reacher/viapoint_reacher.py @@ -1,11 +1,11 @@ -from typing import Iterable, Union +from typing import Iterable, Union, Tuple, Optional import gym import matplotlib.pyplot as plt import numpy as np -from gym.utils import seeding +from gym.core import ObsType -from alr_envs.alr.classic_control.base_reacher.base_reacher_direct import BaseReacherDirectEnv +from fancy_gym.envs.classic_control.base_reacher.base_reacher_direct import BaseReacherDirectEnv class ViaPointReacherEnv(BaseReacherDirectEnv): @@ -40,7 +40,8 @@ class ViaPointReacherEnv(BaseReacherDirectEnv): # def start_pos(self): # return self._start_pos - def reset(self): + def reset(self, *, seed: Optional[int] = None, return_info: bool = False, + options: Optional[dict] = None, ) -> Union[ObsType, Tuple[ObsType, dict]]: self._generate_goal() return super().reset() @@ -183,8 +184,9 @@ class ViaPointReacherEnv(BaseReacherDirectEnv): plt.pause(0.01) + if __name__ == "__main__": - import time + env = ViaPointReacherEnv(5) env.reset() diff --git a/alr_envs/alr/mujoco/README.MD b/fancy_gym/envs/mujoco/README.MD similarity index 100% rename from alr_envs/alr/mujoco/README.MD rename to fancy_gym/envs/mujoco/README.MD diff --git a/fancy_gym/envs/mujoco/__init__.py b/fancy_gym/envs/mujoco/__init__.py new file mode 100644 index 0000000..840691f --- /dev/null +++ b/fancy_gym/envs/mujoco/__init__.py @@ -0,0 +1,9 @@ +from .ant_jump.ant_jump import AntJumpEnv +from .beerpong.beerpong import BeerPongEnv, BeerPongEnvStepBasedEpisodicReward +from .half_cheetah_jump.half_cheetah_jump import HalfCheetahJumpEnv +from .hopper_jump.hopper_jump import HopperJumpEnv +from .hopper_jump.hopper_jump_on_box import HopperJumpOnBoxEnv +from 
.hopper_throw.hopper_throw import HopperThrowEnv +from .hopper_throw.hopper_throw_in_basket import HopperThrowInBasketEnv +from .reacher.reacher import ReacherEnv +from .walker_2d_jump.walker_2d_jump import Walker2dJumpEnv diff --git a/fancy_gym/envs/mujoco/ant_jump/__init__.py b/fancy_gym/envs/mujoco/ant_jump/__init__.py new file mode 100644 index 0000000..c5e6d2f --- /dev/null +++ b/fancy_gym/envs/mujoco/ant_jump/__init__.py @@ -0,0 +1 @@ +from .mp_wrapper import MPWrapper diff --git a/fancy_gym/envs/mujoco/ant_jump/ant_jump.py b/fancy_gym/envs/mujoco/ant_jump/ant_jump.py new file mode 100644 index 0000000..9311ae1 --- /dev/null +++ b/fancy_gym/envs/mujoco/ant_jump/ant_jump.py @@ -0,0 +1,109 @@ +from typing import Tuple, Union, Optional + +import numpy as np +from gym.core import ObsType +from gym.envs.mujoco.ant_v4 import AntEnv + +MAX_EPISODE_STEPS_ANTJUMP = 200 + + +# TODO: This environment was not tested yet. Do the following todos and test it. +# TODO: Right now this environment only considers jumping to a specific height, which is not nice. It should be extended +# to the same structure as the Hopper, where the angles are randomized (->contexts) and the agent should jump as heigh +# as possible, while landing at a specific target position + + +class AntJumpEnv(AntEnv): + """ + Initialization changes to normal Ant: + - healthy_reward: 1.0 -> 0.01 -> 0.0 no healthy reward needed - Paul and Marc + - _ctrl_cost_weight 0.5 -> 0.0 + - contact_cost_weight: 5e-4 -> 0.0 + - healthy_z_range: (0.2, 1.0) -> (0.3, float('inf')) !!!!! Does that make sense, limiting height? + """ + + def __init__(self, + xml_file='ant.xml', + ctrl_cost_weight=0.0, + contact_cost_weight=0.0, + healthy_reward=0.0, + terminate_when_unhealthy=True, + healthy_z_range=(0.3, float('inf')), + contact_force_range=(-1.0, 1.0), + reset_noise_scale=0.1, + exclude_current_positions_from_observation=True, + ): + self.current_step = 0 + self.max_height = 0 + self.goal = 0 + super().__init__(xml_file=xml_file, + ctrl_cost_weight=ctrl_cost_weight, + contact_cost_weight=contact_cost_weight, + healthy_reward=healthy_reward, + terminate_when_unhealthy=terminate_when_unhealthy, + healthy_z_range=healthy_z_range, + contact_force_range=contact_force_range, + reset_noise_scale=reset_noise_scale, + exclude_current_positions_from_observation=exclude_current_positions_from_observation) + + def step(self, action): + self.current_step += 1 + self.do_simulation(action, self.frame_skip) + + height = self.get_body_com("torso")[2].copy() + + self.max_height = max(height, self.max_height) + + rewards = 0 + + ctrl_cost = self.control_cost(action) + contact_cost = self.contact_cost + + costs = ctrl_cost + contact_cost + + done = bool(height < 0.3) # fall over -> is the 0.3 value from healthy_z_range? 
TODO change 0.3 to the value of healthy z angle + + if self.current_step == MAX_EPISODE_STEPS_ANTJUMP or done: + # -10 for scaling the value of the distance between the max_height and the goal height; only used when context is enabled + # height_reward = -10 * (np.linalg.norm(self.max_height - self.goal)) + height_reward = -10 * np.linalg.norm(self.max_height - self.goal) + # no healthy reward when using context, because we optimize a negative value + healthy_reward = 0 + + rewards = height_reward + healthy_reward + + obs = self._get_obs() + reward = rewards - costs + + info = { + 'height': height, + 'max_height': self.max_height, + 'goal': self.goal + } + + return obs, reward, done, info + + def _get_obs(self): + return np.append(super()._get_obs(), self.goal) + + def reset(self, *, seed: Optional[int] = None, return_info: bool = False, + options: Optional[dict] = None, ) -> Union[ObsType, Tuple[ObsType, dict]]: + self.current_step = 0 + self.max_height = 0 + # goal heights from 1.0 to 2.5; can be increased, but didnt work well with CMORE + self.goal = self.np_random.uniform(1.0, 2.5, 1) + return super().reset() + + # reset_model had to be implemented in every env to make it deterministic + def reset_model(self): + # Todo remove if not needed + noise_low = -self._reset_noise_scale + noise_high = self._reset_noise_scale + + qpos = self.init_qpos # + self.np_random.uniform(low=noise_low, high=noise_high, size=self.model.nq) + qvel = self.init_qvel # + self.np_random.uniform(low=noise_low, high=noise_high, size=self.model.nv) + + self.set_state(qpos, qvel) + + observation = self._get_obs() + return observation diff --git a/fancy_gym/envs/mujoco/ant_jump/assets/ant.xml b/fancy_gym/envs/mujoco/ant_jump/assets/ant.xml new file mode 100644 index 0000000..ee4d679 --- /dev/null +++ b/fancy_gym/envs/mujoco/ant_jump/assets/ant.xml @@ -0,0 +1,81 @@ + + + diff --git a/fancy_gym/envs/mujoco/ant_jump/mp_wrapper.py b/fancy_gym/envs/mujoco/ant_jump/mp_wrapper.py new file mode 100644 index 0000000..2f2eb76 --- /dev/null +++ b/fancy_gym/envs/mujoco/ant_jump/mp_wrapper.py @@ -0,0 +1,23 @@ +from typing import Union, Tuple + +import numpy as np + +from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper + + +class MPWrapper(RawInterfaceWrapper): + + @property + def context_mask(self): + return np.hstack([ + [False] * 111, # ant has 111 dimensional observation space !! 
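+            # only the goal height appended in _get_obs is used as context for the black-box policy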
+ [True] # goal height + ]) + + @property + def current_pos(self) -> Union[float, int, np.ndarray]: + return self.data.qpos[7:15].copy() + + @property + def current_vel(self) -> Union[float, int, np.ndarray, Tuple]: + return self.data.qvel[6:14].copy() diff --git a/fancy_gym/envs/mujoco/beerpong/__init__.py b/fancy_gym/envs/mujoco/beerpong/__init__.py new file mode 100644 index 0000000..c5e6d2f --- /dev/null +++ b/fancy_gym/envs/mujoco/beerpong/__init__.py @@ -0,0 +1 @@ +from .mp_wrapper import MPWrapper diff --git a/alr_envs/alr/mujoco/beerpong/assets/beerpong.xml b/fancy_gym/envs/mujoco/beerpong/assets/beerpong.xml similarity index 100% rename from alr_envs/alr/mujoco/beerpong/assets/beerpong.xml rename to fancy_gym/envs/mujoco/beerpong/assets/beerpong.xml diff --git a/alr_envs/alr/mujoco/beerpong/assets/beerpong_wo_cup.xml b/fancy_gym/envs/mujoco/beerpong/assets/beerpong_wo_cup.xml similarity index 99% rename from alr_envs/alr/mujoco/beerpong/assets/beerpong_wo_cup.xml rename to fancy_gym/envs/mujoco/beerpong/assets/beerpong_wo_cup.xml index e96d2bc..6786ecf 100644 --- a/alr_envs/alr/mujoco/beerpong/assets/beerpong_wo_cup.xml +++ b/fancy_gym/envs/mujoco/beerpong/assets/beerpong_wo_cup.xml @@ -123,7 +123,7 @@ - + diff --git a/fancy_gym/envs/mujoco/beerpong/assets/beerpong_wo_cup_big_table.xml b/fancy_gym/envs/mujoco/beerpong/assets/beerpong_wo_cup_big_table.xml new file mode 100644 index 0000000..4497818 --- /dev/null +++ b/fancy_gym/envs/mujoco/beerpong/assets/beerpong_wo_cup_big_table.xml @@ -0,0 +1,193 @@ + + + diff --git a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/base_link_convex.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/base_link_convex.stl old mode 100644 new mode 100755 similarity index 100% rename from alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/base_link_convex.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/base_link_convex.stl diff --git a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/base_link_fine.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/base_link_fine.stl old mode 100644 new mode 100755 similarity index 100% rename from alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/base_link_fine.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/base_link_fine.stl diff --git a/alr_envs/alr/mujoco/meshes/wam/bhand_finger_dist_link_convex.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_finger_dist_link_convex.stl similarity index 100% rename from alr_envs/alr/mujoco/meshes/wam/bhand_finger_dist_link_convex.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_finger_dist_link_convex.stl diff --git a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_dist_link_fine.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_finger_dist_link_fine.stl similarity index 100% rename from alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_dist_link_fine.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_finger_dist_link_fine.stl diff --git a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_med_link_convex.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_finger_med_link_convex.stl similarity index 100% rename from alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_med_link_convex.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_finger_med_link_convex.stl diff --git 
a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_med_link_fine.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_finger_med_link_fine.stl similarity index 100% rename from alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_med_link_fine.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_finger_med_link_fine.stl diff --git a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_prox_link_convex_decomposition_p1.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_finger_prox_link_convex_decomposition_p1.stl similarity index 100% rename from alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_prox_link_convex_decomposition_p1.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_finger_prox_link_convex_decomposition_p1.stl diff --git a/alr_envs/alr/mujoco/meshes/wam/bhand_finger_prox_link_convex_decomposition_p2.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_finger_prox_link_convex_decomposition_p2.stl similarity index 100% rename from alr_envs/alr/mujoco/meshes/wam/bhand_finger_prox_link_convex_decomposition_p2.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_finger_prox_link_convex_decomposition_p2.stl diff --git a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_prox_link_convex_decomposition_p3.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_finger_prox_link_convex_decomposition_p3.stl similarity index 100% rename from alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_prox_link_convex_decomposition_p3.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_finger_prox_link_convex_decomposition_p3.stl diff --git a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_prox_link_fine.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_finger_prox_link_fine.stl similarity index 100% rename from alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_finger_prox_link_fine.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_finger_prox_link_fine.stl diff --git a/alr_envs/alr/mujoco/meshes/wam/bhand_palm_fine.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_palm_fine.stl similarity index 100% rename from alr_envs/alr/mujoco/meshes/wam/bhand_palm_fine.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_palm_fine.stl diff --git a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_link_convex_decomposition_p1.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_palm_link_convex_decomposition_p1.stl similarity index 100% rename from alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_link_convex_decomposition_p1.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_palm_link_convex_decomposition_p1.stl diff --git a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_link_convex_decomposition_p2.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_palm_link_convex_decomposition_p2.stl similarity index 100% rename from alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_link_convex_decomposition_p2.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_palm_link_convex_decomposition_p2.stl diff --git a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_link_convex_decomposition_p3.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_palm_link_convex_decomposition_p3.stl 
similarity index 100% rename from alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_link_convex_decomposition_p3.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_palm_link_convex_decomposition_p3.stl diff --git a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_link_convex_decomposition_p4.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_palm_link_convex_decomposition_p4.stl similarity index 100% rename from alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/bhand_palm_link_convex_decomposition_p4.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/bhand_palm_link_convex_decomposition_p4.stl diff --git a/alr_envs/alr/mujoco/meshes/wam/cup.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup.stl similarity index 100% rename from alr_envs/alr/mujoco/meshes/wam/cup.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup.stl diff --git a/alr_envs/alr/mujoco/meshes/wam/cup_split1.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split1.stl similarity index 100% rename from alr_envs/alr/mujoco/meshes/wam/cup_split1.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split1.stl diff --git a/alr_envs/alr/mujoco/meshes/wam/cup_split10.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split10.stl similarity index 100% rename from alr_envs/alr/mujoco/meshes/wam/cup_split10.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split10.stl diff --git a/alr_envs/alr/mujoco/meshes/wam/cup_split11.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split11.stl similarity index 100% rename from alr_envs/alr/mujoco/meshes/wam/cup_split11.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split11.stl diff --git a/alr_envs/alr/mujoco/meshes/wam/cup_split12.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split12.stl similarity index 100% rename from alr_envs/alr/mujoco/meshes/wam/cup_split12.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split12.stl diff --git a/alr_envs/alr/mujoco/meshes/wam/cup_split13.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split13.stl similarity index 100% rename from alr_envs/alr/mujoco/meshes/wam/cup_split13.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split13.stl diff --git a/alr_envs/alr/mujoco/meshes/wam/cup_split14.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split14.stl similarity index 100% rename from alr_envs/alr/mujoco/meshes/wam/cup_split14.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split14.stl diff --git a/alr_envs/alr/mujoco/meshes/wam/cup_split15.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split15.stl similarity index 100% rename from alr_envs/alr/mujoco/meshes/wam/cup_split15.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split15.stl diff --git a/alr_envs/alr/mujoco/meshes/wam/cup_split16.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split16.stl similarity index 100% rename from alr_envs/alr/mujoco/meshes/wam/cup_split16.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split16.stl diff --git a/alr_envs/alr/mujoco/meshes/wam/cup_split17.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split17.stl similarity index 100% rename from alr_envs/alr/mujoco/meshes/wam/cup_split17.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split17.stl diff --git a/alr_envs/alr/mujoco/meshes/wam/cup_split18.stl 
b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split18.stl similarity index 100% rename from alr_envs/alr/mujoco/meshes/wam/cup_split18.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split18.stl diff --git a/alr_envs/alr/mujoco/meshes/wam/cup_split2.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split2.stl similarity index 100% rename from alr_envs/alr/mujoco/meshes/wam/cup_split2.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split2.stl diff --git a/alr_envs/alr/mujoco/meshes/wam/cup_split3.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split3.stl similarity index 100% rename from alr_envs/alr/mujoco/meshes/wam/cup_split3.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split3.stl diff --git a/alr_envs/alr/mujoco/meshes/wam/cup_split4.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split4.stl similarity index 100% rename from alr_envs/alr/mujoco/meshes/wam/cup_split4.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split4.stl diff --git a/alr_envs/alr/mujoco/meshes/wam/cup_split5.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split5.stl similarity index 100% rename from alr_envs/alr/mujoco/meshes/wam/cup_split5.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split5.stl diff --git a/alr_envs/alr/mujoco/meshes/wam/cup_split6.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split6.stl similarity index 100% rename from alr_envs/alr/mujoco/meshes/wam/cup_split6.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split6.stl diff --git a/alr_envs/alr/mujoco/meshes/wam/cup_split7.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split7.stl similarity index 100% rename from alr_envs/alr/mujoco/meshes/wam/cup_split7.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split7.stl diff --git a/alr_envs/alr/mujoco/meshes/wam/cup_split8.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split8.stl similarity index 100% rename from alr_envs/alr/mujoco/meshes/wam/cup_split8.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split8.stl diff --git a/alr_envs/alr/mujoco/meshes/wam/cup_split9.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split9.stl similarity index 100% rename from alr_envs/alr/mujoco/meshes/wam/cup_split9.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/cup_split9.stl diff --git a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/elbow_link_convex.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/elbow_link_convex.stl old mode 100644 new mode 100755 similarity index 100% rename from alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/elbow_link_convex.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/elbow_link_convex.stl diff --git a/alr_envs/alr/mujoco/meshes/wam/elbow_link_fine.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/elbow_link_fine.stl similarity index 100% rename from alr_envs/alr/mujoco/meshes/wam/elbow_link_fine.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/elbow_link_fine.stl diff --git a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/forearm_link_convex_decomposition_p1.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/forearm_link_convex_decomposition_p1.stl old mode 100644 new mode 100755 similarity index 100% rename from alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/forearm_link_convex_decomposition_p1.stl rename to 
fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/forearm_link_convex_decomposition_p1.stl diff --git a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/forearm_link_convex_decomposition_p2.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/forearm_link_convex_decomposition_p2.stl old mode 100644 new mode 100755 similarity index 100% rename from alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/forearm_link_convex_decomposition_p2.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/forearm_link_convex_decomposition_p2.stl diff --git a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/forearm_link_fine.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/forearm_link_fine.stl old mode 100644 new mode 100755 similarity index 100% rename from alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/forearm_link_fine.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/forearm_link_fine.stl diff --git a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_link_convex_decomposition_p1.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/shoulder_link_convex_decomposition_p1.stl old mode 100644 new mode 100755 similarity index 100% rename from alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_link_convex_decomposition_p1.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/shoulder_link_convex_decomposition_p1.stl diff --git a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_link_convex_decomposition_p2.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/shoulder_link_convex_decomposition_p2.stl old mode 100644 new mode 100755 similarity index 100% rename from alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_link_convex_decomposition_p2.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/shoulder_link_convex_decomposition_p2.stl diff --git a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_link_convex_decomposition_p3.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/shoulder_link_convex_decomposition_p3.stl old mode 100644 new mode 100755 similarity index 100% rename from alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_link_convex_decomposition_p3.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/shoulder_link_convex_decomposition_p3.stl diff --git a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_link_fine.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/shoulder_link_fine.stl old mode 100644 new mode 100755 similarity index 100% rename from alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_link_fine.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/shoulder_link_fine.stl diff --git a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_pitch_link_convex.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/shoulder_pitch_link_convex.stl old mode 100644 new mode 100755 similarity index 100% rename from alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_pitch_link_convex.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/shoulder_pitch_link_convex.stl diff --git a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_pitch_link_fine.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/shoulder_pitch_link_fine.stl old mode 100644 new mode 100755 similarity index 100% rename from alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/shoulder_pitch_link_fine.stl rename to 
fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/shoulder_pitch_link_fine.stl diff --git a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/upper_arm_link_convex_decomposition_p1.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/upper_arm_link_convex_decomposition_p1.stl old mode 100644 new mode 100755 similarity index 100% rename from alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/upper_arm_link_convex_decomposition_p1.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/upper_arm_link_convex_decomposition_p1.stl diff --git a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/upper_arm_link_convex_decomposition_p2.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/upper_arm_link_convex_decomposition_p2.stl old mode 100644 new mode 100755 similarity index 100% rename from alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/upper_arm_link_convex_decomposition_p2.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/upper_arm_link_convex_decomposition_p2.stl diff --git a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/upper_arm_link_fine.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/upper_arm_link_fine.stl old mode 100644 new mode 100755 similarity index 100% rename from alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/upper_arm_link_fine.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/upper_arm_link_fine.stl diff --git a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_palm_link_convex.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/wrist_palm_link_convex.stl old mode 100644 new mode 100755 similarity index 100% rename from alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_palm_link_convex.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/wrist_palm_link_convex.stl diff --git a/alr_envs/alr/mujoco/meshes/wam/wrist_palm_link_fine.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/wrist_palm_link_fine.stl similarity index 100% rename from alr_envs/alr/mujoco/meshes/wam/wrist_palm_link_fine.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/wrist_palm_link_fine.stl diff --git a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_pitch_link_convex_decomposition_p1.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/wrist_pitch_link_convex_decomposition_p1.stl similarity index 100% rename from alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_pitch_link_convex_decomposition_p1.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/wrist_pitch_link_convex_decomposition_p1.stl diff --git a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_pitch_link_convex_decomposition_p2.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/wrist_pitch_link_convex_decomposition_p2.stl similarity index 100% rename from alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_pitch_link_convex_decomposition_p2.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/wrist_pitch_link_convex_decomposition_p2.stl diff --git a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_pitch_link_convex_decomposition_p3.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/wrist_pitch_link_convex_decomposition_p3.stl old mode 100644 new mode 100755 similarity index 100% rename from alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_pitch_link_convex_decomposition_p3.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/wrist_pitch_link_convex_decomposition_p3.stl diff --git 
a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_pitch_link_fine.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/wrist_pitch_link_fine.stl similarity index 100% rename from alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_pitch_link_fine.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/wrist_pitch_link_fine.stl diff --git a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_yaw_link_convex_decomposition_p1.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/wrist_yaw_link_convex_decomposition_p1.stl old mode 100644 new mode 100755 similarity index 100% rename from alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_yaw_link_convex_decomposition_p1.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/wrist_yaw_link_convex_decomposition_p1.stl diff --git a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_yaw_link_convex_decomposition_p2.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/wrist_yaw_link_convex_decomposition_p2.stl old mode 100644 new mode 100755 similarity index 100% rename from alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_yaw_link_convex_decomposition_p2.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/wrist_yaw_link_convex_decomposition_p2.stl diff --git a/alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_yaw_link_fine.stl b/fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/wrist_yaw_link_fine.stl similarity index 100% rename from alr_envs/alr/mujoco/gym_table_tennis/envs/assets/meshes/wrist_yaw_link_fine.stl rename to fancy_gym/envs/mujoco/beerpong/assets/meshes/wam/wrist_yaw_link_fine.stl diff --git a/fancy_gym/envs/mujoco/beerpong/beerpong.py b/fancy_gym/envs/mujoco/beerpong/beerpong.py new file mode 100644 index 0000000..368425d --- /dev/null +++ b/fancy_gym/envs/mujoco/beerpong/beerpong.py @@ -0,0 +1,266 @@ +import os +from typing import Optional + +import numpy as np +from gym import utils +from gym.envs.mujoco import MujocoEnv + +MAX_EPISODE_STEPS_BEERPONG = 300 +FIXED_RELEASE_STEP = 62 # empirically evaluated for frame_skip=2! + +# XML Variables +ROBOT_COLLISION_OBJ = ["wrist_palm_link_convex_geom", + "wrist_pitch_link_convex_decomposition_p1_geom", + "wrist_pitch_link_convex_decomposition_p2_geom", + "wrist_pitch_link_convex_decomposition_p3_geom", + "wrist_yaw_link_convex_decomposition_p1_geom", + "wrist_yaw_link_convex_decomposition_p2_geom", + "forearm_link_convex_decomposition_p1_geom", + "forearm_link_convex_decomposition_p2_geom", + "upper_arm_link_convex_decomposition_p1_geom", + "upper_arm_link_convex_decomposition_p2_geom", + "shoulder_link_convex_decomposition_p1_geom", + "shoulder_link_convex_decomposition_p2_geom", + "shoulder_link_convex_decomposition_p3_geom", + "base_link_convex_geom", "table_contact_geom"] + +CUP_COLLISION_OBJ = ["cup_geom_table3", "cup_geom_table4", "cup_geom_table5", "cup_geom_table6", + "cup_geom_table7", "cup_geom_table8", "cup_geom_table9", "cup_geom_table10", + "cup_geom_table15", "cup_geom_table16", "cup_geom_table17", "cup_geom1_table8"] + + +class BeerPongEnv(MujocoEnv, utils.EzPickle): + def __init__(self): + self._steps = 0 + # Small Context -> Easier. Todo: Should we do different versions? 
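+ # The commented-out block below is the smaller-context variant (beerpong_wo_cup.xml); the big-table XML configured further down is used by default.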
+ # self.xml_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "assets", "beerpong_wo_cup.xml") + # self._cup_pos_min = np.array([-0.32, -2.2]) + # self._cup_pos_max = np.array([0.32, -1.2]) + + self.xml_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "assets", + "beerpong_wo_cup_big_table.xml") + self._cup_pos_min = np.array([-1.42, -4.05]) + self._cup_pos_max = np.array([1.42, -1.25]) + + self._start_pos = np.array([0.0, 1.35, 0.0, 1.18, 0.0, -0.786, -1.59]) + self._start_vel = np.zeros(7) + + self.release_step = FIXED_RELEASE_STEP + + self.repeat_action = 2 + # TODO: If accessing IDs is easier in the (new) official mujoco bindings, remove this + self.model = None + self.geom_id = lambda x: self._mujoco_bindings.mj_name2id(self.model, + self._mujoco_bindings.mjtObj.mjOBJ_GEOM, + x) + + # for reward calculation + self.dists = [] + self.dists_final = [] + self.action_costs = [] + self.ball_ground_contact_first = False + self.ball_table_contact = False + self.ball_wall_contact = False + self.ball_cup_contact = False + self.ball_in_cup = False + self.dist_ground_cup = -1 # distance floor to cup if first floor contact + + MujocoEnv.__init__(self, model_path=self.xml_path, frame_skip=1, mujoco_bindings="mujoco") + utils.EzPickle.__init__(self) + + @property + def start_pos(self): + return self._start_pos + + @property + def start_vel(self): + return self._start_vel + + def reset(self, *, seed: Optional[int] = None, return_info: bool = False, options: Optional[dict] = None): + self.dists = [] + self.dists_final = [] + self.action_costs = [] + self.ball_ground_contact_first = False + self.ball_table_contact = False + self.ball_wall_contact = False + self.ball_cup_contact = False + self.ball_in_cup = False + self.dist_ground_cup = -1 # distance floor to cup if first floor contact + return super().reset() + + def reset_model(self): + init_pos_all = self.init_qpos.copy() + init_pos_robot = self.start_pos + init_vel = np.zeros_like(init_pos_all) + + self._steps = 0 + + start_pos = init_pos_all + start_pos[0:7] = init_pos_robot + + # TODO: Ask Max why we need to set the state twice. + self.set_state(start_pos, init_vel) + start_pos[7::] = self.data.site("init_ball_pos").xpos.copy() + self.set_state(start_pos, init_vel) + xy = self.np_random.uniform(self._cup_pos_min, self._cup_pos_max) + xyz = np.zeros(3) + xyz[:2] = xy + xyz[-1] = 0.840 + self.model.body("cup_table").pos[:] = xyz + return self._get_obs() + + def step(self, a): + crash = False + for _ in range(self.repeat_action): + applied_action = a + self.data.qfrc_bias[:len(a)].copy() / self.model.actuator_gear[:, 0] + try: + self.do_simulation(applied_action, self.frame_skip) + # self.reward_function.check_contacts(self.sim) # I assume this is not important? 
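+ # Until the release step, the ball is kept attached to the init_ball_pos site by overwriting its qpos/qvel every simulation step; afterwards it is free to fly.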
+ if self._steps < self.release_step: + self.data.qpos[7::] = self.data.site('init_ball_pos').xpos.copy() + self.data.qvel[7::] = self.data.sensor('init_ball_vel').data.copy() + crash = False + except Exception as e: + crash = True + + ob = self._get_obs() + + if not crash: + reward, reward_infos = self._get_reward(applied_action) + is_collided = reward_infos['is_collided'] # TODO: Remove if self collision does not make a difference + done = is_collided + self._steps += 1 + else: + reward = -30 + done = True + reward_infos = {"success": False, "ball_pos": np.zeros(3), "ball_vel": np.zeros(3), "is_collided": False} + + infos = dict( + reward=reward, + action=a, + q_pos=self.data.qpos[0:7].ravel().copy(), + q_vel=self.data.qvel[0:7].ravel().copy(), sim_crash=crash, + ) + infos.update(reward_infos) + return ob, reward, done, infos + + def _get_obs(self): + theta = self.data.qpos.flat[:7].copy() + theta_dot = self.data.qvel.flat[:7].copy() + ball_pos = self.data.qpos.flat[7:].copy() + cup_goal_diff_final = ball_pos - self.data.site("cup_goal_final_table").xpos.copy() + cup_goal_diff_top = ball_pos - self.data.site("cup_goal_table").xpos.copy() + return np.concatenate([ + np.cos(theta), + np.sin(theta), + theta_dot, + cup_goal_diff_final, + cup_goal_diff_top, + self.model.body("cup_table").pos[:2].copy(), + # [self._steps], # Use TimeAwareObservation Wrapper instead .... + ]) + + @property + def dt(self): + return super(BeerPongEnv, self).dt * self.repeat_action + + def _get_reward(self, action): + goal_pos = self.data.site("cup_goal_table").xpos + goal_final_pos = self.data.site("cup_goal_final_table").xpos + ball_pos = self.data.qpos.flat[7:].copy() + ball_vel = self.data.qvel.flat[7:].copy() + + self._check_contacts() + self.dists.append(np.linalg.norm(goal_pos - ball_pos)) + self.dists_final.append(np.linalg.norm(goal_final_pos - ball_pos)) + self.dist_ground_cup = np.linalg.norm(ball_pos - goal_pos) \ + if self.ball_ground_contact_first and self.dist_ground_cup == -1 else self.dist_ground_cup + action_cost = np.sum(np.square(action)) + self.action_costs.append(np.copy(action_cost)) + # # ##################### Reward function which does not force to bounce once on the table (quad dist) ######### + + # Is this needed? 
+ # self._is_collided = self._check_collision_with_itself([self.geom_id(name) for name in CUP_COLLISION_OBJ]) + + if self._steps == MAX_EPISODE_STEPS_BEERPONG - 1: # or self._is_collided: + min_dist = np.min(self.dists) + final_dist = self.dists_final[-1] + if self.ball_ground_contact_first: + min_dist_coeff, final_dist_coeff, ground_contact_dist_coeff, rew_offset = 1, 0.5, 2, -4 + else: + if not self.ball_in_cup: + if not self.ball_table_contact and not self.ball_cup_contact and not self.ball_wall_contact: + min_dist_coeff, final_dist_coeff, ground_contact_dist_coeff, rew_offset = 1, 0.5, 0, -4 + else: + min_dist_coeff, final_dist_coeff, ground_contact_dist_coeff, rew_offset = 1, 0.5, 0, -2 + else: + min_dist_coeff, final_dist_coeff, ground_contact_dist_coeff, rew_offset = 0, 1, 0, 0 + action_cost = 1e-4 * np.mean(action_cost) + reward = rew_offset - min_dist_coeff * min_dist ** 2 - final_dist_coeff * final_dist ** 2 - \ + action_cost - ground_contact_dist_coeff * self.dist_ground_cup ** 2 + # release step punishment + min_time_bound = 0.1 + max_time_bound = 1.0 + release_time = self.release_step * self.dt + release_time_rew = int(release_time < min_time_bound) * (-30 - 10 * (release_time - min_time_bound) ** 2) + \ + int(release_time > max_time_bound) * (-30 - 10 * (release_time - max_time_bound) ** 2) + reward += release_time_rew + success = self.ball_in_cup + else: + action_cost = 1e-2 * action_cost + reward = - action_cost + success = False + # ############################################################################################################## + infos = {"success": success, "ball_pos": ball_pos.copy(), + "ball_vel": ball_vel.copy(), "action_cost": action_cost, "task_reward": reward, + "table_contact_first": int(not self.ball_ground_contact_first), + "is_collided": False} # TODO: Check if is collided is needed + return reward, infos + + def _check_contacts(self): + # TODO proper contact detection using sensors? + if not self.ball_table_contact: + self.ball_table_contact = self._check_collision({self.geom_id("ball_geom")}, + {self.geom_id("table_contact_geom")}) + if not self.ball_cup_contact: + self.ball_cup_contact = self._check_collision({self.geom_id("ball_geom")}, + {self.geom_id(name) for name in CUP_COLLISION_OBJ}) + if not self.ball_wall_contact: + self.ball_wall_contact = self._check_collision({self.geom_id("ball_geom")}, + {self.geom_id("wall")}) + if not self.ball_in_cup: + self.ball_in_cup = self._check_collision({self.geom_id("ball_geom")}, + {self.geom_id("cup_base_table_contact")}) + if not self.ball_ground_contact_first: + if not self.ball_table_contact and not self.ball_cup_contact and not self.ball_wall_contact \ + and not self.ball_in_cup: + self.ball_ground_contact_first = self._check_collision({self.geom_id("ball_geom")}, + {self.geom_id("ground")}) + + def _check_collision(self, id_set_1, id_set_2=None): + """ + Checks if id_set1 has a collision with id_set2. + If id_set_2 is set to None, it will check for a collision with itself (id_set_1). 
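+ Geoms contained in both sets are removed from id_set_2 first, so shared geoms are not reported as contacts.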
+ """ + collision_id_set = id_set_2 - id_set_1 if id_set_2 is not None else id_set_1 + for coni in range(self.data.ncon): + con = self.data.contact[coni] + if ((con.geom1 in id_set_1 and con.geom2 in collision_id_set) or + (con.geom2 in id_set_1 and con.geom1 in collision_id_set)): + return True + return False + + +class BeerPongEnvStepBasedEpisodicReward(BeerPongEnv): + + def step(self, a): + if self._steps < FIXED_RELEASE_STEP: + return super(BeerPongEnvStepBasedEpisodicReward, self).step(a) + else: + reward = 0 + done = True + while self._steps < MAX_EPISODE_STEPS_BEERPONG: + obs, sub_reward, done, infos = super(BeerPongEnvStepBasedEpisodicReward, self).step( + np.zeros(a.shape)) + reward += sub_reward + return obs, reward, done, infos diff --git a/alr_envs/examples/__init__.py b/fancy_gym/envs/mujoco/beerpong/deprecated/__init__.py similarity index 100% rename from alr_envs/examples/__init__.py rename to fancy_gym/envs/mujoco/beerpong/deprecated/__init__.py diff --git a/fancy_gym/envs/mujoco/beerpong/deprecated/beerpong.py b/fancy_gym/envs/mujoco/beerpong/deprecated/beerpong.py new file mode 100644 index 0000000..015e887 --- /dev/null +++ b/fancy_gym/envs/mujoco/beerpong/deprecated/beerpong.py @@ -0,0 +1,212 @@ +import os + +import mujoco_py.builder +import numpy as np +from gym import utils +from gym.envs.mujoco import MujocoEnv + +from fancy_gym.envs.mujoco.beerpong.deprecated.beerpong_reward_staged import BeerPongReward + + +class BeerPongEnv(MujocoEnv, utils.EzPickle): + def __init__(self, frame_skip=2): + self._steps = 0 + # Small Context -> Easier. Todo: Should we do different versions? + # self.xml_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "assets", + # "beerpong_wo_cup" + ".xml") + # self._cup_pos_min = np.array([-0.32, -2.2]) + # self._cup_pos_max = np.array([0.32, -1.2]) + + self.xml_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "../assets", + "beerpong_wo_cup_big_table" + ".xml") + self._cup_pos_min = np.array([-1.42, -4.05]) + self._cup_pos_max = np.array([1.42, -1.25]) + + self._start_pos = np.array([0.0, 1.35, 0.0, 1.18, 0.0, -0.786, -1.59]) + self._start_vel = np.zeros(7) + + self.release_step = 100 # time step of ball release + self.ep_length = 600 // frame_skip + + self.reward_function = BeerPongReward() + self.repeat_action = frame_skip + self.model = None + self.site_id = lambda x: self.model.site_name2id(x) + self.body_id = lambda x: self.model.body_name2id(x) + + MujocoEnv.__init__(self, self.xml_path, frame_skip=1) + utils.EzPickle.__init__(self) + + @property + def start_pos(self): + return self._start_pos + + @property + def start_vel(self): + return self._start_vel + + def reset(self): + self.reward_function.reset() + return super().reset() + + def reset_model(self): + init_pos_all = self.init_qpos.copy() + init_pos_robot = self.start_pos + init_vel = np.zeros_like(init_pos_all) + + self._steps = 0 + + start_pos = init_pos_all + start_pos[0:7] = init_pos_robot + + # TODO: Ask Max why we need to set the state twice. 
+ self.set_state(start_pos, init_vel) + start_pos[7::] = self.sim.data.site_xpos[self.site_id("init_ball_pos"), :].copy() + self.set_state(start_pos, init_vel) + xy = self.np_random.uniform(self._cup_pos_min, self._cup_pos_max) + xyz = np.zeros(3) + xyz[:2] = xy + xyz[-1] = 0.840 + self.sim.model.body_pos[self.body_id("cup_table")] = xyz + return self._get_obs() + + def step(self, a): + crash = False + for _ in range(self.repeat_action): + applied_action = a + self.sim.data.qfrc_bias[:len(a)].copy() / self.model.actuator_gear[:, 0] + try: + self.do_simulation(applied_action, self.frame_skip) + self.reward_function.initialize(self) + # self.reward_function.check_contacts(self.sim) # I assume this is not important? + if self._steps < self.release_step: + self.sim.data.qpos[7::] = self.sim.data.site_xpos[self.site_id("init_ball_pos"), :].copy() + self.sim.data.qvel[7::] = self.sim.data.site_xvelp[self.site_id("init_ball_pos"), :].copy() + crash = False + except mujoco_py.builder.MujocoException: + crash = True + + ob = self._get_obs() + + if not crash: + reward, reward_infos = self.reward_function.compute_reward(self, applied_action) + is_collided = reward_infos['is_collided'] + done = is_collided or self._steps == self.ep_length - 1 + self._steps += 1 + else: + reward = -30 + done = True + reward_infos = {"success": False, "ball_pos": np.zeros(3), "ball_vel": np.zeros(3), "is_collided": False} + + infos = dict( + reward=reward, + action=a, + q_pos=self.sim.data.qpos[0:7].ravel().copy(), + q_vel=self.sim.data.qvel[0:7].ravel().copy(), sim_crash=crash, + ) + infos.update(reward_infos) + return ob, reward, done, infos + + def _get_obs(self): + theta = self.sim.data.qpos.flat[:7] + theta_dot = self.sim.data.qvel.flat[:7] + ball_pos = self.data.get_body_xpos("ball").copy() + cup_goal_diff_final = ball_pos - self.data.get_site_xpos("cup_goal_final_table").copy() + cup_goal_diff_top = ball_pos - self.data.get_site_xpos("cup_goal_table").copy() + return np.concatenate([ + np.cos(theta), + np.sin(theta), + theta_dot, + cup_goal_diff_final, + cup_goal_diff_top, + self.sim.model.body_pos[self.body_id("cup_table")][:2].copy(), + [self._steps], + ]) + + @property + def dt(self): + return super(BeerPongEnv, self).dt * self.repeat_action + + +class BeerPongEnvFixedReleaseStep(BeerPongEnv): + def __init__(self, frame_skip=2): + super().__init__(frame_skip) + self.release_step = 62 # empirically evaluated for frame_skip=2! + + +class BeerPongEnvStepBasedEpisodicReward(BeerPongEnv): + def __init__(self, frame_skip=2): + super().__init__(frame_skip) + self.release_step = 62 # empirically evaluated for frame_skip=2! + + def step(self, a): + if self._steps < self.release_step: + return super(BeerPongEnvStepBasedEpisodicReward, self).step(a) + else: + reward = 0 + done = False + while not done: + sub_ob, sub_reward, done, sub_infos = super(BeerPongEnvStepBasedEpisodicReward, self).step( + np.zeros(a.shape)) + reward += sub_reward + infos = sub_infos + ob = sub_ob + ob[-1] = self.release_step + 1 # Since we simulate until the end of the episode, PPO does not see the + # internal steps and thus, the observation also needs to be set correctly + return ob, reward, done, infos + + +# class BeerBongEnvStepBased(BeerBongEnv): +# def __init__(self, frame_skip=1, apply_gravity_comp=True, noisy=False, rndm_goal=False, cup_goal_pos=None): +# super().__init__(frame_skip, apply_gravity_comp, noisy, rndm_goal, cup_goal_pos) +# self.release_step = 62 # empirically evaluated for frame_skip=2! 
+# +# def step(self, a): +# if self._steps < self.release_step: +# return super(BeerBongEnvStepBased, self).step(a) +# else: +# reward = 0 +# done = False +# while not done: +# sub_ob, sub_reward, done, sub_infos = super(BeerBongEnvStepBased, self).step(np.zeros(a.shape)) +# if not done or sub_infos['sim_crash']: +# reward += sub_reward +# else: +# ball_pos = self.sim.data.body_xpos[self.sim.model._body_name2id["ball"]].copy() +# cup_goal_dist_final = np.linalg.norm(ball_pos - self.sim.data.site_xpos[ +# self.sim.model._site_name2id["cup_goal_final_table"]].copy()) +# cup_goal_dist_top = np.linalg.norm(ball_pos - self.sim.data.site_xpos[ +# self.sim.model._site_name2id["cup_goal_table"]].copy()) +# if sub_infos['success']: +# dist_rew = -cup_goal_dist_final ** 2 +# else: +# dist_rew = -0.5 * cup_goal_dist_final ** 2 - cup_goal_dist_top ** 2 +# reward = reward - sub_infos['action_cost'] + dist_rew +# infos = sub_infos +# ob = sub_ob +# ob[-1] = self.release_step + 1 # Since we simulate until the end of the episode, PPO does not see the +# # internal steps and thus, the observation also needs to be set correctly +# return ob, reward, done, infos + + +if __name__ == "__main__": + env = BeerPongEnv(frame_skip=2) + env.seed(0) + # env = BeerBongEnvStepBased(frame_skip=2) + # env = BeerBongEnvStepBasedEpisodicReward(frame_skip=2) + # env = BeerBongEnvFixedReleaseStep(frame_skip=2) + import time + + env.reset() + env.render("human") + for i in range(600): + # ac = 10 * env.action_space.sample() + ac = 0.05 * np.ones(7) + obs, rew, d, info = env.step(ac) + env.render("human") + + if d: + print('reward:', rew) + print('RESETTING') + env.reset() + time.sleep(1) + env.close() diff --git a/fancy_gym/envs/mujoco/beerpong/deprecated/beerpong_reward_staged.py b/fancy_gym/envs/mujoco/beerpong/deprecated/beerpong_reward_staged.py new file mode 100644 index 0000000..4308921 --- /dev/null +++ b/fancy_gym/envs/mujoco/beerpong/deprecated/beerpong_reward_staged.py @@ -0,0 +1,183 @@ +import numpy as np + + +class BeerPongReward: + def __init__(self): + + self.robot_collision_objects = ["wrist_palm_link_convex_geom", + "wrist_pitch_link_convex_decomposition_p1_geom", + "wrist_pitch_link_convex_decomposition_p2_geom", + "wrist_pitch_link_convex_decomposition_p3_geom", + "wrist_yaw_link_convex_decomposition_p1_geom", + "wrist_yaw_link_convex_decomposition_p2_geom", + "forearm_link_convex_decomposition_p1_geom", + "forearm_link_convex_decomposition_p2_geom", + "upper_arm_link_convex_decomposition_p1_geom", + "upper_arm_link_convex_decomposition_p2_geom", + "shoulder_link_convex_decomposition_p1_geom", + "shoulder_link_convex_decomposition_p2_geom", + "shoulder_link_convex_decomposition_p3_geom", + "base_link_convex_geom", "table_contact_geom"] + + self.cup_collision_objects = ["cup_geom_table3", "cup_geom_table4", "cup_geom_table5", "cup_geom_table6", + "cup_geom_table7", "cup_geom_table8", "cup_geom_table9", "cup_geom_table10", + "cup_geom_table15", + "cup_geom_table16", + "cup_geom_table17", "cup_geom1_table8", + ] + + self.dists = None + self.dists_final = None + self.action_costs = None + self.ball_ground_contact_first = False + self.ball_table_contact = False + self.ball_wall_contact = False + self.ball_cup_contact = False + self.ball_in_cup = False + self.dist_ground_cup = -1 # distance floor to cup if first floor contact + + # IDs + self.ball_collision_id = None + self.table_collision_id = None + self.wall_collision_id = None + self.cup_table_collision_id = None + self.ground_collision_id = None + 
self.cup_collision_ids = None + self.robot_collision_ids = None + self.reset() + self.is_initialized = False + + def reset(self): + self.dists = [] + self.dists_final = [] + self.action_costs = [] + self.ball_ground_contact_first = False + self.ball_table_contact = False + self.ball_wall_contact = False + self.ball_cup_contact = False + self.ball_in_cup = False + self.dist_ground_cup = -1 # distance floor to cup if first floor contact + + def initialize(self, env): + + if not self.is_initialized: + self.is_initialized = True + # TODO: Find a more elegant way to acces to the geom ids in each step -> less code + self.ball_collision_id = {env.model.geom_name2id("ball_geom")} + # self.ball_collision_id = env.model.geom_name2id("ball_geom") + self.table_collision_id = {env.model.geom_name2id("table_contact_geom")} + # self.table_collision_id = env.model.geom_name2id("table_contact_geom") + self.wall_collision_id = {env.model.geom_name2id("wall")} + # self.wall_collision_id = env.model.geom_name2id("wall") + self.cup_table_collision_id = {env.model.geom_name2id("cup_base_table_contact")} + # self.cup_table_collision_id = env.model.geom_name2id("cup_base_table_contact") + self.ground_collision_id = {env.model.geom_name2id("ground")} + # self.ground_collision_id = env.model.geom_name2id("ground") + self.cup_collision_ids = {env.model.geom_name2id(name) for name in self.cup_collision_objects} + # self.cup_collision_ids = [env.model.geom_name2id(name) for name in self.cup_collision_objects] + self.robot_collision_ids = [env.model.geom_name2id(name) for name in self.robot_collision_objects] + + def compute_reward(self, env, action): + + goal_pos = env.data.get_site_xpos("cup_goal_table") + ball_pos = env.data.get_body_xpos("ball") + ball_vel = env.data.get_body_xvelp("ball") + goal_final_pos = env.data.get_site_xpos("cup_goal_final_table") + + self.check_contacts(env.sim) + self.dists.append(np.linalg.norm(goal_pos - ball_pos)) + self.dists_final.append(np.linalg.norm(goal_final_pos - ball_pos)) + self.dist_ground_cup = np.linalg.norm(ball_pos - goal_pos) \ + if self.ball_ground_contact_first and self.dist_ground_cup == -1 else self.dist_ground_cup + action_cost = np.sum(np.square(action)) + self.action_costs.append(np.copy(action_cost)) + # # ##################### Reward function which does not force to bounce once on the table (quad dist) ######### + + # Is this needed? 
+ # self._is_collided = self._check_collision_with_itself(env.sim, self.robot_collision_ids) + + if env._steps == env.ep_length - 1: # or self._is_collided: + min_dist = np.min(self.dists) + final_dist = self.dists_final[-1] + if self.ball_ground_contact_first: + min_dist_coeff, final_dist_coeff, ground_contact_dist_coeff, rew_offset = 1, 0.5, 2, -4 + else: + if not self.ball_in_cup: + if not self.ball_table_contact and not self.ball_cup_contact and not self.ball_wall_contact: + min_dist_coeff, final_dist_coeff, ground_contact_dist_coeff, rew_offset = 1, 0.5, 0, -4 + else: + min_dist_coeff, final_dist_coeff, ground_contact_dist_coeff, rew_offset = 1, 0.5, 0, -2 + else: + min_dist_coeff, final_dist_coeff, ground_contact_dist_coeff, rew_offset = 0, 1, 0, 0 + action_cost = 1e-4 * np.mean(action_cost) + reward = rew_offset - min_dist_coeff * min_dist ** 2 - final_dist_coeff * final_dist ** 2 - \ + action_cost - ground_contact_dist_coeff * self.dist_ground_cup ** 2 + # release step punishment + min_time_bound = 0.1 + max_time_bound = 1.0 + release_time = env.release_step * env.dt + release_time_rew = int(release_time < min_time_bound) * (-30 - 10 * (release_time - min_time_bound) ** 2) + \ + int(release_time > max_time_bound) * (-30 - 10 * (release_time - max_time_bound) ** 2) + reward += release_time_rew + success = self.ball_in_cup + else: + action_cost = 1e-2 * action_cost + reward = - action_cost + success = False + # ############################################################################################################## + infos = {"success": success, "ball_pos": ball_pos.copy(), + "ball_vel": ball_vel.copy(), "action_cost": action_cost, "task_reward": reward, + "table_contact_first": int(not self.ball_ground_contact_first), + "is_collided": False} # TODO: Check if is collided is needed + return reward, infos + + def check_contacts(self, sim): + if not self.ball_table_contact: + self.ball_table_contact = self._check_collision(sim, self.ball_collision_id, self.table_collision_id) + if not self.ball_cup_contact: + self.ball_cup_contact = self._check_collision(sim, self.ball_collision_id, self.cup_collision_ids) + if not self.ball_wall_contact: + self.ball_wall_contact = self._check_collision(sim, self.ball_collision_id, self.wall_collision_id) + if not self.ball_in_cup: + self.ball_in_cup = self._check_collision(sim, self.ball_collision_id, self.cup_table_collision_id) + if not self.ball_ground_contact_first: + if not self.ball_table_contact and not self.ball_cup_contact and not self.ball_wall_contact \ + and not self.ball_in_cup: + self.ball_ground_contact_first = self._check_collision(sim, self.ball_collision_id, + self.ground_collision_id) + + # Checks if id_set1 has a collision with id_set2 + def _check_collision(self, sim, id_set_1, id_set_2): + """ + If id_set_2 is set to None, it will check for a collision with itself (id_set_1). 
+ """ + collision_id_set = id_set_2 - id_set_1 if id_set_2 is not None else id_set_1 + for coni in range(sim.data.ncon): + con = sim.data.contact[coni] + if ((con.geom1 in id_set_1 and con.geom2 in collision_id_set) or + (con.geom2 in id_set_1 and con.geom1 in collision_id_set)): + return True + return False + + # def _check_collision_with_itself(self, sim, collision_ids): + # col_1, col_2 = False, False + # for j, id in enumerate(collision_ids): + # col_1 = self._check_collision_with_set_of_objects(sim, id, collision_ids[:j]) + # if j != len(collision_ids) - 1: + # col_2 = self._check_collision_with_set_of_objects(sim, id, collision_ids[j + 1:]) + # else: + # col_2 = False + # collision = True if col_1 or col_2 else False + # return collision + + ### This function will not be needed if we really do not need to check for collision with itself + # def _check_collision_with_set_of_objects(self, sim, id_1, id_list): + # for coni in range(0, sim.data.ncon): + # con = sim.data.contact[coni] + # + # collision = con.geom1 in id_list and con.geom2 == id_1 + # collision_trans = con.geom1 == id_1 and con.geom2 in id_list + # + # if collision or collision_trans: + # return True + # return False diff --git a/fancy_gym/envs/mujoco/beerpong/mp_wrapper.py b/fancy_gym/envs/mujoco/beerpong/mp_wrapper.py new file mode 100644 index 0000000..8988f5a --- /dev/null +++ b/fancy_gym/envs/mujoco/beerpong/mp_wrapper.py @@ -0,0 +1,41 @@ +from typing import Union, Tuple + +import numpy as np + +from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper + + +class MPWrapper(RawInterfaceWrapper): + + @property + def context_mask(self) -> np.ndarray: + return np.hstack([ + [False] * 7, # cos + [False] * 7, # sin + [False] * 7, # joint velocities + [False] * 3, # cup_goal_diff_final + [False] * 3, # cup_goal_diff_top + [True] * 2, # xy position of cup + # [False] # env steps + ]) + + @property + def current_pos(self) -> Union[float, int, np.ndarray, Tuple]: + return self.data.qpos[0:7].copy() + + @property + def current_vel(self) -> Union[float, int, np.ndarray, Tuple]: + return self.data.qvel[0:7].copy() + + # TODO: Fix this + def episode_callback(self, action: np.ndarray, mp) -> Tuple[np.ndarray, Union[np.ndarray, None]]: + if mp.learn_tau: + self.release_step = action[0] / self.dt # Tau value + return action, None + + def set_context(self, context): + xyz = np.zeros(3) + xyz[:2] = context + xyz[-1] = 0.840 + self.model.body_pos[self.cup_table_id] = xyz + return self.get_observation_from_step(self.get_obs()) diff --git a/fancy_gym/envs/mujoco/half_cheetah_jump/__init__.py b/fancy_gym/envs/mujoco/half_cheetah_jump/__init__.py new file mode 100644 index 0000000..c5e6d2f --- /dev/null +++ b/fancy_gym/envs/mujoco/half_cheetah_jump/__init__.py @@ -0,0 +1 @@ +from .mp_wrapper import MPWrapper diff --git a/fancy_gym/envs/mujoco/half_cheetah_jump/assets/cheetah.xml b/fancy_gym/envs/mujoco/half_cheetah_jump/assets/cheetah.xml new file mode 100644 index 0000000..41daadf --- /dev/null +++ b/fancy_gym/envs/mujoco/half_cheetah_jump/assets/cheetah.xml @@ -0,0 +1,62 @@ + + + + + + + + + diff --git a/fancy_gym/envs/mujoco/half_cheetah_jump/half_cheetah_jump.py b/fancy_gym/envs/mujoco/half_cheetah_jump/half_cheetah_jump.py new file mode 100644 index 0000000..e0a5982 --- /dev/null +++ b/fancy_gym/envs/mujoco/half_cheetah_jump/half_cheetah_jump.py @@ -0,0 +1,90 @@ +import os +from typing import Tuple, Union, Optional + +import numpy as np +from gym.core import ObsType +from gym.envs.mujoco.half_cheetah_v4 import 
HalfCheetahEnv + +MAX_EPISODE_STEPS_HALFCHEETAHJUMP = 100 + + +class HalfCheetahJumpEnv(HalfCheetahEnv): + """ + Initialization changes to normal HalfCheetah: ctrl_cost_weight: 0.1 -> 0.0 + """ + + def __init__(self, + xml_file='cheetah.xml', + forward_reward_weight=1.0, + ctrl_cost_weight=0.0, + reset_noise_scale=0.1, + context=True, + exclude_current_positions_from_observation=True, + max_episode_steps=100): + self.current_step = 0 + self.max_height = 0 + # self.max_episode_steps = max_episode_steps + self.goal = 0 + self.context = context + xml_file = os.path.join(os.path.dirname(__file__), "assets", xml_file) + super().__init__(xml_file=xml_file, + forward_reward_weight=forward_reward_weight, + ctrl_cost_weight=ctrl_cost_weight, + reset_noise_scale=reset_noise_scale, + exclude_current_positions_from_observation=exclude_current_positions_from_observation) + + def step(self, action): + + self.current_step += 1 + self.do_simulation(action, self.frame_skip) + + height_after = self.get_body_com("torso")[2] + self.max_height = max(height_after, self.max_height) + + ## Didn't use fell_over, because the base env also has no done condition - Paul and Marc + # fell_over = abs(self.sim.data.qpos[2]) > 2.5 # how to figure out if the cheetah fell over? -> is 2.5 ok? + # TODO: Should a fall over be checked here? + done = False + + ctrl_cost = self.control_cost(action) + costs = ctrl_cost + + if self.current_step == MAX_EPISODE_STEPS_HALFCHEETAHJUMP: + height_goal_distance = -10 * np.linalg.norm(self.max_height - self.goal) + 1e-8 if self.context \ + else self.max_height + rewards = self._forward_reward_weight * height_goal_distance + else: + rewards = 0 + + observation = self._get_obs() + reward = rewards - costs + info = { + 'height': height_after, + 'max_height': self.max_height + } + + return observation, reward, done, info + + def _get_obs(self): + return np.append(super()._get_obs(), self.goal) + + def reset(self, *, seed: Optional[int] = None, return_info: bool = False, + options: Optional[dict] = None, ) -> Union[ObsType, Tuple[ObsType, dict]]: + self.max_height = 0 + self.current_step = 0 + self.goal = self.np_random.uniform(1.1, 1.6, 1) # 1.1 1.6 + return super().reset() + + # overwrite reset_model to make it deterministic + def reset_model(self): + # TODO remove if not needed! 
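+ # The noise bounds below are currently unused, since the noise terms are commented out; the reset is therefore deterministic.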
+ noise_low = -self._reset_noise_scale + noise_high = self._reset_noise_scale + + qpos = self.init_qpos # + self.np_random.uniform(low=noise_low, high=noise_high, size=self.model.nq) + qvel = self.init_qvel # + self.np_random.uniform(low=noise_low, high=noise_high, size=self.model.nv) + + self.set_state(qpos, qvel) + + observation = self._get_obs() + return observation diff --git a/fancy_gym/envs/mujoco/half_cheetah_jump/mp_wrapper.py b/fancy_gym/envs/mujoco/half_cheetah_jump/mp_wrapper.py new file mode 100644 index 0000000..11b169b --- /dev/null +++ b/fancy_gym/envs/mujoco/half_cheetah_jump/mp_wrapper.py @@ -0,0 +1,22 @@ +from typing import Tuple, Union + +import numpy as np + +from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper + + +class MPWrapper(RawInterfaceWrapper): + @property + def context_mask(self) -> np.ndarray: + return np.hstack([ + [False] * 17, + [True] # goal height + ]) + + @property + def current_pos(self) -> Union[float, int, np.ndarray]: + return self.data.qpos[3:9].copy() + + @property + def current_vel(self) -> Union[float, int, np.ndarray, Tuple]: + return self.data.qvel[3:9].copy() diff --git a/fancy_gym/envs/mujoco/hopper_jump/__init__.py b/fancy_gym/envs/mujoco/hopper_jump/__init__.py new file mode 100644 index 0000000..c5e6d2f --- /dev/null +++ b/fancy_gym/envs/mujoco/hopper_jump/__init__.py @@ -0,0 +1 @@ +from .mp_wrapper import MPWrapper diff --git a/fancy_gym/envs/mujoco/hopper_jump/assets/hopper_jump.xml b/fancy_gym/envs/mujoco/hopper_jump/assets/hopper_jump.xml new file mode 100644 index 0000000..3348bab --- /dev/null +++ b/fancy_gym/envs/mujoco/hopper_jump/assets/hopper_jump.xml @@ -0,0 +1,52 @@ + + + + + + + + diff --git a/fancy_gym/envs/mujoco/hopper_jump/assets/hopper_jump_on_box.xml b/fancy_gym/envs/mujoco/hopper_jump/assets/hopper_jump_on_box.xml new file mode 100644 index 0000000..69d78ff --- /dev/null +++ b/fancy_gym/envs/mujoco/hopper_jump/assets/hopper_jump_on_box.xml @@ -0,0 +1,51 @@ + + + + + + + + \ No newline at end of file diff --git a/fancy_gym/envs/mujoco/hopper_jump/hopper_jump.py b/fancy_gym/envs/mujoco/hopper_jump/hopper_jump.py new file mode 100644 index 0000000..da9ac4d --- /dev/null +++ b/fancy_gym/envs/mujoco/hopper_jump/hopper_jump.py @@ -0,0 +1,251 @@ +import os + +import numpy as np +from gym.envs.mujoco.hopper_v4 import HopperEnv + +MAX_EPISODE_STEPS_HOPPERJUMP = 250 + + +class HopperJumpEnv(HopperEnv): + """ + Initialization changes to normal Hopper: + - terminate_when_unhealthy: True -> False + - healthy_reward: 1.0 -> 2.0 + - healthy_z_range: (0.7, float('inf')) -> (0.5, float('inf')) + - healthy_angle_range: (-0.2, 0.2) -> (-float('inf'), float('inf')) + - exclude_current_positions_from_observation: True -> False + """ + + def __init__( + self, + xml_file='hopper_jump.xml', + forward_reward_weight=1.0, + ctrl_cost_weight=1e-3, + healthy_reward=2.0, + contact_weight=2.0, + height_weight=10.0, + dist_weight=3.0, + terminate_when_unhealthy=False, + healthy_state_range=(-100.0, 100.0), + healthy_z_range=(0.5, float('inf')), + healthy_angle_range=(-float('inf'), float('inf')), + reset_noise_scale=5e-3, + exclude_current_positions_from_observation=False, + sparse=False, + ): + + self.sparse = sparse + self._height_weight = height_weight + self._dist_weight = dist_weight + self._contact_weight = contact_weight + + self.max_height = 0 + self.goal = np.zeros(3, ) + + self._steps = 0 + self.contact_with_floor = False + self.init_floor_contact = False + self.has_left_floor = False + self.contact_dist = None + + 
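+ # Resolve the modified hopper XML shipped with fancy_gym relative to this file instead of using gym's default asset.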
xml_file = os.path.join(os.path.dirname(__file__), "assets", xml_file) + super().__init__(xml_file=xml_file, + forward_reward_weight=forward_reward_weight, + ctrl_cost_weight=ctrl_cost_weight, + healthy_reward=healthy_reward, + terminate_when_unhealthy=terminate_when_unhealthy, + healthy_state_range=healthy_state_range, + healthy_z_range=healthy_z_range, + healthy_angle_range=healthy_angle_range, + reset_noise_scale=reset_noise_scale, + exclude_current_positions_from_observation=exclude_current_positions_from_observation) + + # increase initial height + self.init_qpos[1] = 1.5 + + @property + def exclude_current_positions_from_observation(self): + return self._exclude_current_positions_from_observation + + def step(self, action): + self._steps += 1 + + self.do_simulation(action, self.frame_skip) + + height_after = self.get_body_com("torso")[2] + #site_pos_after = self.data.get_site_xpos('foot_site') + site_pos_after = self.data.site('foot_site').xpos + self.max_height = max(height_after, self.max_height) + + has_floor_contact = self._is_floor_foot_contact() if not self.contact_with_floor else False + + if not self.init_floor_contact: + self.init_floor_contact = has_floor_contact + if self.init_floor_contact and not self.has_left_floor: + self.has_left_floor = not has_floor_contact + if not self.contact_with_floor and self.has_left_floor: + self.contact_with_floor = has_floor_contact + + ctrl_cost = self.control_cost(action) + costs = ctrl_cost + done = False + + goal_dist = np.linalg.norm(site_pos_after - self.goal) + if self.contact_dist is None and self.contact_with_floor: + self.contact_dist = goal_dist + + rewards = 0 + if not self.sparse or (self.sparse and self._steps >= MAX_EPISODE_STEPS_HOPPERJUMP): + healthy_reward = self.healthy_reward + distance_reward = -goal_dist * self._dist_weight + height_reward = (self.max_height if self.sparse else height_after) * self._height_weight + contact_reward = -(self.contact_dist or 5) * self._contact_weight + rewards = self._forward_reward_weight * (distance_reward + height_reward + contact_reward + healthy_reward) + + observation = self._get_obs() + reward = rewards - costs + info = dict( + height=height_after, + x_pos=site_pos_after, + max_height=self.max_height, + goal=self.goal[:1], + goal_dist=goal_dist, + height_rew=self.max_height, + healthy_reward=self.healthy_reward, + healthy=self.is_healthy, + contact_dist=self.contact_dist or 0 + ) + return observation, reward, done, info + + def _get_obs(self): + # goal_dist = self.data.get_site_xpos('foot_site') - self.goal + goal_dist = self.data.site('foot_site').xpos - self.goal + return np.concatenate((super(HopperJumpEnv, self)._get_obs(), goal_dist.copy(), self.goal[:1])) + + def reset_model(self): + # super(HopperJumpEnv, self).reset_model() + + # self.goal = self.np_random.uniform(0.3, 1.35, 1)[0] + self.goal = np.concatenate([self.np_random.uniform(0.3, 1.35, 1), np.zeros(2, )]) + # self.sim.model.body_pos[self.sim.model.body_name2id('goal_site_body')] = self.goal + self.model.body('goal_site_body').pos[:] = np.copy(self.goal) + self.max_height = 0 + self._steps = 0 + + noise_low = np.zeros(self.model.nq) + noise_low[3] = -0.5 + noise_low[4] = -0.2 + + noise_high = np.zeros(self.model.nq) + noise_high[5] = 0.785 + + qpos = ( + self.np_random.uniform(low=noise_low, high=noise_high, size=self.model.nq) + + self.init_qpos + ) + qvel = ( + # self.np_random.uniform(low=noise_low, high=noise_high, size=self.model.nv) + + self.init_qvel + ) + + self.set_state(qpos, qvel) + + observation = 
self._get_obs() + self.has_left_floor = False + self.contact_with_floor = False + self.init_floor_contact = False + self.contact_dist = None + + return observation + + def _is_floor_foot_contact(self): + # floor_geom_id = self.model.geom_name2id('floor') + # foot_geom_id = self.model.geom_name2id('foot_geom') + # TODO: do this properly over a sensor in the xml file, see dmc hopper + floor_geom_id = self._mujoco_bindings.mj_name2id(self.model, + self._mujoco_bindings.mjtObj.mjOBJ_GEOM, + 'floor') + foot_geom_id = self._mujoco_bindings.mj_name2id(self.model, + self._mujoco_bindings.mjtObj.mjOBJ_GEOM, + 'foot_geom') + for i in range(self.data.ncon): + contact = self.data.contact[i] + collision = contact.geom1 == floor_geom_id and contact.geom2 == foot_geom_id + collision_trans = contact.geom1 == foot_geom_id and contact.geom2 == floor_geom_id + if collision or collision_trans: + return True + return False + +# # TODO is that needed? if so test it +# class HopperJumpStepEnv(HopperJumpEnv): +# +# def __init__(self, +# xml_file='hopper_jump.xml', +# forward_reward_weight=1.0, +# ctrl_cost_weight=1e-3, +# healthy_reward=1.0, +# height_weight=3, +# dist_weight=3, +# terminate_when_unhealthy=False, +# healthy_state_range=(-100.0, 100.0), +# healthy_z_range=(0.5, float('inf')), +# healthy_angle_range=(-float('inf'), float('inf')), +# reset_noise_scale=5e-3, +# exclude_current_positions_from_observation=False +# ): +# +# self._height_weight = height_weight +# self._dist_weight = dist_weight +# super().__init__(xml_file, forward_reward_weight, ctrl_cost_weight, healthy_reward, terminate_when_unhealthy, +# healthy_state_range, healthy_z_range, healthy_angle_range, reset_noise_scale, +# exclude_current_positions_from_observation) +# +# def step(self, action): +# self._steps += 1 +# +# self.do_simulation(action, self.frame_skip) +# +# height_after = self.get_body_com("torso")[2] +# site_pos_after = self.data.site('foot_site').xpos.copy() +# self.max_height = max(height_after, self.max_height) +# +# ctrl_cost = self.control_cost(action) +# healthy_reward = self.healthy_reward +# height_reward = self._height_weight * height_after +# goal_dist = np.linalg.norm(site_pos_after - np.array([self.goal, 0, 0])) +# goal_dist_reward = -self._dist_weight * goal_dist +# dist_reward = self._forward_reward_weight * (goal_dist_reward + height_reward) +# +# rewards = dist_reward + healthy_reward +# costs = ctrl_cost +# done = False +# +# # This is only for logging the distance to goal when first having the contact +# has_floor_contact = self._is_floor_foot_contact() if not self.contact_with_floor else False +# +# if not self.init_floor_contact: +# self.init_floor_contact = has_floor_contact +# if self.init_floor_contact and not self.has_left_floor: +# self.has_left_floor = not has_floor_contact +# if not self.contact_with_floor and self.has_left_floor: +# self.contact_with_floor = has_floor_contact +# +# if self.contact_dist is None and self.contact_with_floor: +# self.contact_dist = goal_dist +# +# ############################################################## +# +# observation = self._get_obs() +# reward = rewards - costs +# info = { +# 'height': height_after, +# 'x_pos': site_pos_after, +# 'max_height': copy.copy(self.max_height), +# 'goal': copy.copy(self.goal), +# 'goal_dist': goal_dist, +# 'height_rew': height_reward, +# 'healthy_reward': healthy_reward, +# 'healthy': copy.copy(self.is_healthy), +# 'contact_dist': copy.copy(self.contact_dist) or 0 +# } +# return observation, reward, done, info diff --git 
a/fancy_gym/envs/mujoco/hopper_jump/hopper_jump_on_box.py b/fancy_gym/envs/mujoco/hopper_jump/hopper_jump_on_box.py new file mode 100644 index 0000000..f9834bd --- /dev/null +++ b/fancy_gym/envs/mujoco/hopper_jump/hopper_jump_on_box.py @@ -0,0 +1,170 @@ +import os + +import numpy as np +from gym.envs.mujoco.hopper_v4 import HopperEnv + +MAX_EPISODE_STEPS_HOPPERJUMPONBOX = 250 + + +class HopperJumpOnBoxEnv(HopperEnv): + """ + Initialization changes to normal Hopper: + - healthy_reward: 1.0 -> 0.01 -> 0.001 + - healthy_angle_range: (-0.2, 0.2) -> (-float('inf'), float('inf')) + """ + + def __init__(self, + xml_file='hopper_jump_on_box.xml', + forward_reward_weight=1.0, + ctrl_cost_weight=1e-3, + healthy_reward=0.001, + terminate_when_unhealthy=True, + healthy_state_range=(-100.0, 100.0), + healthy_z_range=(0.7, float('inf')), + healthy_angle_range=(-float('inf'), float('inf')), + reset_noise_scale=5e-3, + context=True, + exclude_current_positions_from_observation=True, + max_episode_steps=250): + self.current_step = 0 + self.max_height = 0 + self.max_episode_steps = max_episode_steps + self.min_distance = 5000 # what value? + self.hopper_on_box = False + self.context = context + self.box_x = 1 + xml_file = os.path.join(os.path.dirname(__file__), "assets", xml_file) + super().__init__(xml_file, forward_reward_weight, ctrl_cost_weight, healthy_reward, terminate_when_unhealthy, + healthy_state_range, healthy_z_range, healthy_angle_range, reset_noise_scale, + exclude_current_positions_from_observation) + + def step(self, action): + + self.current_step += 1 + self.do_simulation(action, self.frame_skip) + height_after = self.get_body_com("torso")[2] + foot_pos = self.get_body_com("foot") + self.max_height = max(height_after, self.max_height) + + vx, vz, vangle = self.data.qvel[0:3] + + s = self.state_vector() + fell_over = not (np.isfinite(s).all() and (np.abs(s[2:]) < 100).all() and (height_after > .7)) + + box_pos = self.get_body_com("box") + box_size = 0.3 + box_height = 0.3 + box_center = (box_pos[0] + (box_size / 2), box_pos[1], box_height) + foot_length = 0.3 + foot_center = foot_pos[0] - (foot_length / 2) + + dist = np.linalg.norm(foot_pos - box_center) + + self.min_distance = min(dist, self.min_distance) + + # check if foot is on box + is_on_box_x = box_pos[0] <= foot_center <= box_pos[0] + box_size + is_on_box_y = True # is y always true because he can only move in x and z direction? + is_on_box_z = box_height - 0.02 <= foot_pos[2] <= box_height + 0.02 + is_on_box = is_on_box_x and is_on_box_y and is_on_box_z + if is_on_box: + self.hopper_on_box = True + + ctrl_cost = self.control_cost(action) + + costs = ctrl_cost + + done = fell_over or self.hopper_on_box + + if self.current_step >= self.max_episode_steps or done: + done = False + + max_height = self.max_height.copy() + min_distance = self.min_distance.copy() + + alive_bonus = self._healthy_reward * self.current_step + box_bonus = 0 + rewards = 0 + + # TODO explain what we did here for the calculation of the reward + if is_on_box: + if self.context: + rewards -= 100 * vx ** 2 if 100 * vx ** 2 < 1 else 1 + else: + box_bonus = 10 + rewards += box_bonus + # rewards -= dist * dist ???? why when already on box? 
+ # reward -= 90 - abs(angle) + rewards -= 100 * vx ** 2 if 100 * vx ** 2 < 1 else 1 + rewards += max_height * 3 + rewards += alive_bonus + + else: + if self.context: + rewards = -10 - min_distance + rewards += max_height * 3 + else: + # reward -= (dist*dist) + rewards -= min_distance * min_distance + # rewards -= dist / self.max_distance + rewards += max_height + rewards += alive_bonus + + else: + rewards = 0 + + observation = self._get_obs() + reward = rewards - costs + info = { + 'height': height_after, + 'max_height': self.max_height.copy(), + 'min_distance': self.min_distance, + 'goal': self.box_x, + } + + return observation, reward, done, info + + def _get_obs(self): + return np.append(super()._get_obs(), self.box_x) + + def reset(self): + + self.max_height = 0 + self.min_distance = 5000 + self.current_step = 0 + self.hopper_on_box = False + if self.context: + self.box_x = self.np_random.uniform(1, 3, 1) + self.model.body("box").pos = [self.box_x[0], 0, 0] + return super().reset() + + # overwrite reset_model to make it deterministic + def reset_model(self): + noise_low = -self._reset_noise_scale + noise_high = self._reset_noise_scale + + qpos = self.init_qpos # + self.np_random.uniform(low=noise_low, high=noise_high, size=self.model.nq) + qvel = self.init_qvel # + self.np_random.uniform(low=noise_low, high=noise_high, size=self.model.nv) + + self.set_state(qpos, qvel) + + observation = self._get_obs() + return observation + +if __name__ == '__main__': + render_mode = "human" # "human" or "partial" or "final" + env = HopperJumpOnBoxEnv() + obs = env.reset() + + for i in range(2000): + # objective.load_result("/tmp/cma") + # test with random actions + ac = env.action_space.sample() + obs, rew, d, info = env.step(ac) + if i % 10 == 0: + env.render(mode=render_mode) + if d: + print('After ', i, ' steps, done: ', d) + env.reset() + + env.close() \ No newline at end of file diff --git a/fancy_gym/envs/mujoco/hopper_jump/mp_wrapper.py b/fancy_gym/envs/mujoco/hopper_jump/mp_wrapper.py new file mode 100644 index 0000000..ed95b3d --- /dev/null +++ b/fancy_gym/envs/mujoco/hopper_jump/mp_wrapper.py @@ -0,0 +1,27 @@ +from typing import Union, Tuple + +import numpy as np + +from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper + + +class MPWrapper(RawInterfaceWrapper): + + # Random x goal + random init pos + @property + def context_mask(self): + return np.hstack([ + [False] * (2 + int(not self.exclude_current_positions_from_observation)), # position + [True] * 3, # set to true if randomize initial pos + [False] * 6, # velocity + [False] * 3, # goal distance + [True] # goal + ]) + + @property + def current_pos(self) -> Union[float, int, np.ndarray, Tuple]: + return self.data.qpos[3:6].copy() + + @property + def current_vel(self) -> Union[float, int, np.ndarray, Tuple]: + return self.data.qvel[3:6].copy() diff --git a/alr_envs/dmc/manipulation/reach_site/__init__.py b/fancy_gym/envs/mujoco/hopper_throw/__init__.py similarity index 100% rename from alr_envs/dmc/manipulation/reach_site/__init__.py rename to fancy_gym/envs/mujoco/hopper_throw/__init__.py diff --git a/fancy_gym/envs/mujoco/hopper_throw/assets/hopper_throw.xml b/fancy_gym/envs/mujoco/hopper_throw/assets/hopper_throw.xml new file mode 100644 index 0000000..1c39602 --- /dev/null +++ b/fancy_gym/envs/mujoco/hopper_throw/assets/hopper_throw.xml @@ -0,0 +1,56 @@ + + + + + + + + diff --git a/fancy_gym/envs/mujoco/hopper_throw/assets/hopper_throw_in_basket.xml 
b/fancy_gym/envs/mujoco/hopper_throw/assets/hopper_throw_in_basket.xml new file mode 100644 index 0000000..b4f0342 --- /dev/null +++ b/fancy_gym/envs/mujoco/hopper_throw/assets/hopper_throw_in_basket.xml @@ -0,0 +1,132 @@ + + + + + + + + \ No newline at end of file diff --git a/fancy_gym/envs/mujoco/hopper_throw/hopper_throw.py b/fancy_gym/envs/mujoco/hopper_throw/hopper_throw.py new file mode 100644 index 0000000..e69cea6 --- /dev/null +++ b/fancy_gym/envs/mujoco/hopper_throw/hopper_throw.py @@ -0,0 +1,122 @@ +import os +from typing import Optional + +import numpy as np +from gym.envs.mujoco.hopper_v4 import HopperEnv + +MAX_EPISODE_STEPS_HOPPERTHROW = 250 + + +class HopperThrowEnv(HopperEnv): + """ + Initialization changes to normal Hopper: + - healthy_reward: 1.0 -> 0.0 -> 0.1 + - forward_reward_weight -> 5.0 + - healthy_angle_range: (-0.2, 0.2) -> (-float('inf'), float('inf')) + + Reward changes to normal Hopper: + - velocity: (x_position_after - x_position_before) -> self.get_body_com("ball")[0] + """ + + def __init__(self, + xml_file='hopper_throw.xml', + forward_reward_weight=5.0, + ctrl_cost_weight=1e-3, + healthy_reward=0.1, + terminate_when_unhealthy=True, + healthy_state_range=(-100.0, 100.0), + healthy_z_range=(0.7, float('inf')), + healthy_angle_range=(-float('inf'), float('inf')), + reset_noise_scale=5e-3, + context=True, + exclude_current_positions_from_observation=True, + max_episode_steps=250): + xml_file = os.path.join(os.path.dirname(__file__), "assets", xml_file) + self.current_step = 0 + self.max_episode_steps = max_episode_steps + self.context = context + self.goal = 0 + super().__init__(xml_file=xml_file, + forward_reward_weight=forward_reward_weight, + ctrl_cost_weight=ctrl_cost_weight, + healthy_reward=healthy_reward, + terminate_when_unhealthy=terminate_when_unhealthy, + healthy_angle_range=healthy_state_range, + healthy_z_range=healthy_z_range, + healthy_state_range=healthy_angle_range, + reset_noise_scale=reset_noise_scale, + exclude_current_positions_from_observation=exclude_current_positions_from_observation) + + def step(self, action): + self.current_step += 1 + self.do_simulation(action, self.frame_skip) + ball_pos_after = self.get_body_com("ball")[ + 0] # abs(self.get_body_com("ball")[0]) # use x and y to get point and use euclid distance as reward? + ball_pos_after_y = self.get_body_com("ball")[2] + + # done = self.done TODO We should use this, not sure why there is no other termination; ball_landed should be enough, because we only look at the throw itself? 
- Paul and Marc + ball_landed = bool(self.get_body_com("ball")[2] <= 0.05) + done = ball_landed + + ctrl_cost = self.control_cost(action) + costs = ctrl_cost + + rewards = 0 + + if self.current_step >= self.max_episode_steps or done: + distance_reward = -np.linalg.norm(ball_pos_after - self.goal) if self.context else \ + self._forward_reward_weight * ball_pos_after + healthy_reward = 0 if self.context else self.healthy_reward * self.current_step + + rewards = distance_reward + healthy_reward + + observation = self._get_obs() + reward = rewards - costs + info = { + 'ball_pos': ball_pos_after, + 'ball_pos_y': ball_pos_after_y, + '_steps': self.current_step, + 'goal': self.goal, + } + + return observation, reward, done, info + + def _get_obs(self): + return np.append(super()._get_obs(), self.goal) + + def reset(self, *, seed: Optional[int] = None, return_info: bool = False, options: Optional[dict] = None): + self.current_step = 0 + self.goal = self.goal = self.np_random.uniform(2.0, 6.0, 1) # 0.5 8.0 + return super().reset() + + # overwrite reset_model to make it deterministic + def reset_model(self): + noise_low = -self._reset_noise_scale + noise_high = self._reset_noise_scale + + qpos = self.init_qpos # + self.np_random.uniform(low=noise_low, high=noise_high, size=self.model.nq) + qvel = self.init_qvel # + self.np_random.uniform(low=noise_low, high=noise_high, size=self.model.nv) + + self.set_state(qpos, qvel) + + observation = self._get_obs() + return observation + + +if __name__ == '__main__': + render_mode = "human" # "human" or "partial" or "final" + env = HopperThrowEnv() + obs = env.reset() + + for i in range(2000): + # objective.load_result("/tmp/cma") + # test with random actions + ac = env.action_space.sample() + obs, rew, d, info = env.step(ac) + if i % 10 == 0: + env.render(mode=render_mode) + if d: + print('After ', i, ' steps, done: ', d) + env.reset() + + env.close() diff --git a/fancy_gym/envs/mujoco/hopper_throw/hopper_throw_in_basket.py b/fancy_gym/envs/mujoco/hopper_throw/hopper_throw_in_basket.py new file mode 100644 index 0000000..76ef861 --- /dev/null +++ b/fancy_gym/envs/mujoco/hopper_throw/hopper_throw_in_basket.py @@ -0,0 +1,153 @@ +import os +from typing import Optional + +import numpy as np +from gym.envs.mujoco.hopper_v4 import HopperEnv + +MAX_EPISODE_STEPS_HOPPERTHROWINBASKET = 250 + + +class HopperThrowInBasketEnv(HopperEnv): + """ + Initialization changes to normal Hopper: + - healthy_reward: 1.0 -> 0.0 + - healthy_angle_range: (-0.2, 0.2) -> (-float('inf'), float('inf')) + - hit_basket_reward: - -> 10 + + Reward changes to normal Hopper: + - velocity: (x_position_after - x_position_before) -> (ball_position_after - ball_position_before) + """ + + def __init__(self, + xml_file='hopper_throw_in_basket.xml', + forward_reward_weight=1.0, + ctrl_cost_weight=1e-3, + healthy_reward=0.0, + hit_basket_reward=10, + basket_size=0.3, + terminate_when_unhealthy=True, + healthy_state_range=(-100.0, 100.0), + healthy_z_range=(0.7, float('inf')), + healthy_angle_range=(-float('inf'), float('inf')), + reset_noise_scale=5e-3, + context=True, + penalty=0.0, + exclude_current_positions_from_observation=True, + max_episode_steps=250): + self.hit_basket_reward = hit_basket_reward + self.current_step = 0 + self.max_episode_steps = max_episode_steps + self.ball_in_basket = False + self.basket_size = basket_size + self.context = context + self.penalty = penalty + self.basket_x = 5 + xml_file = os.path.join(os.path.dirname(__file__), "assets", xml_file) + 
super().__init__(xml_file=xml_file, + forward_reward_weight=forward_reward_weight, + ctrl_cost_weight=ctrl_cost_weight, + healthy_reward=healthy_reward, + terminate_when_unhealthy=terminate_when_unhealthy, + healthy_state_range=healthy_state_range, + healthy_z_range=healthy_z_range, + healthy_angle_range=healthy_angle_range, + reset_noise_scale=reset_noise_scale, + exclude_current_positions_from_observation=exclude_current_positions_from_observation) + + def step(self, action): + + self.current_step += 1 + self.do_simulation(action, self.frame_skip) + ball_pos = self.get_body_com("ball") + + basket_pos = self.get_body_com("basket_ground") + basket_center = (basket_pos[0] + 0.5, basket_pos[1], basket_pos[2]) + + is_in_basket_x = ball_pos[0] >= basket_pos[0] and ball_pos[0] <= basket_pos[0] + self.basket_size + is_in_basket_y = ball_pos[1] >= basket_pos[1] - (self.basket_size / 2) and ball_pos[1] <= basket_pos[1] + ( + self.basket_size / 2) + is_in_basket_z = ball_pos[2] < 0.1 + is_in_basket = is_in_basket_x and is_in_basket_y and is_in_basket_z + if is_in_basket: + self.ball_in_basket = True + + ball_landed = self.get_body_com("ball")[2] <= 0.05 + done = bool(ball_landed or is_in_basket) + + rewards = 0 + + ctrl_cost = self.control_cost(action) + + costs = ctrl_cost + + if self.current_step >= self.max_episode_steps or done: + + if is_in_basket: + if not self.context: + rewards += self.hit_basket_reward + else: + dist = np.linalg.norm(ball_pos - basket_center) + if self.context: + rewards = -10 * dist + else: + rewards -= (dist * dist) + else: + # penalty not needed + rewards += ((action[ + :2] > 0) * self.penalty).sum() if self.current_step < 10 else 0 # too much of a penalty? + + observation = self._get_obs() + reward = rewards - costs + info = { + 'ball_pos': ball_pos[0], + } + + return observation, reward, done, info + + def _get_obs(self): + return np.append(super()._get_obs(), self.basket_x) + + def reset(self, *, seed: Optional[int] = None, return_info: bool = False, options: Optional[dict] = None): + if self.max_episode_steps == 10: + # We have to initialize this here, because the spec is only added after creating the env. + self.max_episode_steps = self.spec.max_episode_steps + + self.current_step = 0 + self.ball_in_basket = False + if self.context: + self.basket_x = self.np_random.uniform(low=3, high=7, size=1) + self.model.body("basket_ground").pos[:] = [self.basket_x[0], 0, 0] + return super().reset() + + # overwrite reset_model to make it deterministic + def reset_model(self): + # Todo remove if not needed! 
+ noise_low = -self._reset_noise_scale + noise_high = self._reset_noise_scale + + qpos = self.init_qpos # + self.np_random.uniform(low=noise_low, high=noise_high, size=self.model.nq) + qvel = self.init_qvel # + self.np_random.uniform(low=noise_low, high=noise_high, size=self.model.nv) + + self.set_state(qpos, qvel) + + observation = self._get_obs() + return observation + + +if __name__ == '__main__': + render_mode = "human" # "human" or "partial" or "final" + env = HopperThrowInBasketEnv() + obs = env.reset() + + for i in range(2000): + # objective.load_result("/tmp/cma") + # test with random actions + ac = env.action_space.sample() + obs, rew, d, info = env.step(ac) + if i % 10 == 0: + env.render(mode=render_mode) + if d: + print('After ', i, ' steps, done: ', d) + env.reset() + + env.close() diff --git a/fancy_gym/envs/mujoco/hopper_throw/mp_wrapper.py b/fancy_gym/envs/mujoco/hopper_throw/mp_wrapper.py new file mode 100644 index 0000000..cad680a --- /dev/null +++ b/fancy_gym/envs/mujoco/hopper_throw/mp_wrapper.py @@ -0,0 +1,23 @@ +from typing import Tuple, Union + +import numpy as np + +from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper + + +class MPWrapper(RawInterfaceWrapper): + + @property + def context_mask(self): + return np.hstack([ + [False] * 17, + [True] # goal pos + ]) + + @property + def current_pos(self) -> Union[float, int, np.ndarray]: + return self.data.qpos[3:6].copy() + + @property + def current_vel(self) -> Union[float, int, np.ndarray, Tuple]: + return self.data.qvel[3:6].copy() diff --git a/fancy_gym/envs/mujoco/reacher/__init__.py b/fancy_gym/envs/mujoco/reacher/__init__.py new file mode 100644 index 0000000..c5e6d2f --- /dev/null +++ b/fancy_gym/envs/mujoco/reacher/__init__.py @@ -0,0 +1 @@ +from .mp_wrapper import MPWrapper diff --git a/alr_envs/alr/mujoco/reacher/assets/reacher_5links.xml b/fancy_gym/envs/mujoco/reacher/assets/reacher_5links.xml similarity index 95% rename from alr_envs/alr/mujoco/reacher/assets/reacher_5links.xml rename to fancy_gym/envs/mujoco/reacher/assets/reacher_5links.xml index 25a3208..742f45c 100644 --- a/alr_envs/alr/mujoco/reacher/assets/reacher_5links.xml +++ b/fancy_gym/envs/mujoco/reacher/assets/reacher_5links.xml @@ -41,7 +41,7 @@ - + diff --git a/alr_envs/alr/mujoco/reacher/assets/reacher_7links.xml b/fancy_gym/envs/mujoco/reacher/assets/reacher_7links.xml similarity index 100% rename from alr_envs/alr/mujoco/reacher/assets/reacher_7links.xml rename to fancy_gym/envs/mujoco/reacher/assets/reacher_7links.xml diff --git a/fancy_gym/envs/mujoco/reacher/mp_wrapper.py b/fancy_gym/envs/mujoco/reacher/mp_wrapper.py new file mode 100644 index 0000000..0464640 --- /dev/null +++ b/fancy_gym/envs/mujoco/reacher/mp_wrapper.py @@ -0,0 +1,26 @@ +from typing import Union, Tuple + +import numpy as np + +from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper + + +class MPWrapper(RawInterfaceWrapper): + + @property + def context_mask(self): + return np.concatenate([[False] * self.n_links, # cos + [False] * self.n_links, # sin + [True] * 2, # goal position + [False] * self.n_links, # angular velocity + [False] * 3, # goal distance + # [False], # step + ]) + + @property + def current_pos(self) -> Union[float, int, np.ndarray, Tuple]: + return self.data.qpos.flat[:self.n_links] + + @property + def current_vel(self) -> Union[float, int, np.ndarray, Tuple]: + return self.data.qvel.flat[:self.n_links] diff --git a/fancy_gym/envs/mujoco/reacher/reacher.py b/fancy_gym/envs/mujoco/reacher/reacher.py new file mode 
100644 index 0000000..ccd0073 --- /dev/null +++ b/fancy_gym/envs/mujoco/reacher/reacher.py @@ -0,0 +1,110 @@ +import os + +import numpy as np +from gym import utils +from gym.envs.mujoco import MujocoEnv + +MAX_EPISODE_STEPS_REACHER = 200 + + +class ReacherEnv(MujocoEnv, utils.EzPickle): + """ + More general version of the gym mujoco Reacher environment + """ + + def __init__(self, sparse: bool = False, n_links: int = 5, reward_weight: float = 1, ctrl_cost_weight: float = 1): + utils.EzPickle.__init__(**locals()) + + self._steps = 0 + self.n_links = n_links + + self.sparse = sparse + + self._ctrl_cost_weight = ctrl_cost_weight + self._reward_weight = reward_weight + + file_name = f'reacher_{n_links}links.xml' + + MujocoEnv.__init__(self, + model_path=os.path.join(os.path.dirname(__file__), "assets", file_name), + frame_skip=2, + mujoco_bindings="mujoco") + + def step(self, action): + self._steps += 1 + + is_reward = not self.sparse or (self.sparse and self._steps == MAX_EPISODE_STEPS_REACHER) + + reward_dist = 0.0 + angular_vel = 0.0 + if is_reward: + reward_dist = self.distance_reward() + angular_vel = self.velocity_reward() + + reward_ctrl = -self._ctrl_cost_weight * np.square(action).sum() + + reward = reward_dist + reward_ctrl + angular_vel + self.do_simulation(action, self.frame_skip) + ob = self._get_obs() + done = False + + infos = dict( + reward_dist=reward_dist, + reward_ctrl=reward_ctrl, + velocity=angular_vel, + end_effector=self.get_body_com("fingertip").copy(), + goal=self.goal if hasattr(self, "goal") else None + ) + + return ob, reward, done, infos + + def distance_reward(self): + vec = self.get_body_com("fingertip") - self.get_body_com("target") + return -self._reward_weight * np.linalg.norm(vec) + + def velocity_reward(self): + return -10 * np.square(self.data.qvel.flat[:self.n_links]).sum() if self.sparse else 0.0 + + def viewer_setup(self): + self.viewer.cam.trackbodyid = 0 + + def reset_model(self): + qpos = ( + # self.np_random.uniform(low=-0.1, high=0.1, size=self.model.nq) + + self.init_qpos.copy() + ) + while True: + # full space + self.goal = self.np_random.uniform(low=-self.n_links / 10, high=self.n_links / 10, size=2) + # I Quadrant + # self.goal = self.np_random.uniform(low=0, high=self.n_links / 10, size=2) + # II Quadrant + # self.goal = np.random.uniform(low=[-self.n_links / 10, 0], high=[0, self.n_links / 10], size=2) + # II + III Quadrant + # self.goal = np.random.uniform(low=-self.n_links / 10, high=[0, self.n_links / 10], size=2) + # I + II Quadrant + # self.goal = np.random.uniform(low=[-self.n_links / 10, 0], high=self.n_links, size=2) + if np.linalg.norm(self.goal) < self.n_links / 10: + break + + qpos[-2:] = self.goal + qvel = ( + # self.np_random.uniform(low=-0.005, high=0.005, size=self.model.nv) + + self.init_qvel.copy() + ) + qvel[-2:] = 0 + self.set_state(qpos, qvel) + self._steps = 0 + + return self._get_obs() + + def _get_obs(self): + theta = self.data.qpos.flat[:self.n_links] + target = self.get_body_com("target") + return np.concatenate([ + np.cos(theta), + np.sin(theta), + target[:2], # x-y of goal position + self.data.qvel.flat[:self.n_links], # angular velocity + self.get_body_com("fingertip") - target, # goal distance + ]) diff --git a/fancy_gym/envs/mujoco/walker_2d_jump/__init__.py b/fancy_gym/envs/mujoco/walker_2d_jump/__init__.py new file mode 100644 index 0000000..c5e6d2f --- /dev/null +++ b/fancy_gym/envs/mujoco/walker_2d_jump/__init__.py @@ -0,0 +1 @@ +from .mp_wrapper import MPWrapper diff --git 
a/fancy_gym/envs/mujoco/walker_2d_jump/assets/walker2d.xml b/fancy_gym/envs/mujoco/walker_2d_jump/assets/walker2d.xml new file mode 100644 index 0000000..f3bcbd1 --- /dev/null +++ b/fancy_gym/envs/mujoco/walker_2d_jump/assets/walker2d.xml @@ -0,0 +1,64 @@ + + + + + + + diff --git a/fancy_gym/envs/mujoco/walker_2d_jump/mp_wrapper.py b/fancy_gym/envs/mujoco/walker_2d_jump/mp_wrapper.py new file mode 100644 index 0000000..d55e9d2 --- /dev/null +++ b/fancy_gym/envs/mujoco/walker_2d_jump/mp_wrapper.py @@ -0,0 +1,23 @@ +from typing import Tuple, Union + +import numpy as np + +from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper + + +class MPWrapper(RawInterfaceWrapper): + + @property + def context_mask(self): + return np.hstack([ + [False] * 17, + [True] # goal pos + ]) + + @property + def current_pos(self) -> Union[float, int, np.ndarray]: + return self.data.qpos[3:9].copy() + + @property + def current_vel(self) -> Union[float, int, np.ndarray, Tuple]: + return self.data.qvel[3:9].copy() diff --git a/fancy_gym/envs/mujoco/walker_2d_jump/walker_2d_jump.py b/fancy_gym/envs/mujoco/walker_2d_jump/walker_2d_jump.py new file mode 100644 index 0000000..ed663d2 --- /dev/null +++ b/fancy_gym/envs/mujoco/walker_2d_jump/walker_2d_jump.py @@ -0,0 +1,117 @@ +import os +from typing import Optional + +import numpy as np +from gym.envs.mujoco.walker2d_v4 import Walker2dEnv + +MAX_EPISODE_STEPS_WALKERJUMP = 300 + + +# TODO: Right now this environment only considers jumping to a specific height, which is not nice. It should be extended +# to the same structure as the Hopper, where the angles are randomized (->contexts) and the agent should jump as height +# as possible, while landing at a specific target position + + +class Walker2dJumpEnv(Walker2dEnv): + """ + healthy reward 1.0 -> 0.005 -> 0.0025 not from alex + penalty 10 -> 0 not from alex + """ + + def __init__(self, + xml_file='walker2d.xml', + forward_reward_weight=1.0, + ctrl_cost_weight=1e-3, + healthy_reward=0.0025, + terminate_when_unhealthy=True, + healthy_z_range=(0.8, 2.0), + healthy_angle_range=(-1.0, 1.0), + reset_noise_scale=5e-3, + penalty=0, + exclude_current_positions_from_observation=True, + max_episode_steps=300): + self.current_step = 0 + self.max_episode_steps = max_episode_steps + self.max_height = 0 + self._penalty = penalty + self.goal = 0 + xml_file = os.path.join(os.path.dirname(__file__), "assets", xml_file) + super().__init__(xml_file=xml_file, + forward_reward_weight=forward_reward_weight, + ctrl_cost_weight=ctrl_cost_weight, + healthy_reward=healthy_reward, + terminate_when_unhealthy=terminate_when_unhealthy, + healthy_z_range=healthy_z_range, + healthy_angle_range=healthy_angle_range, + reset_noise_scale=reset_noise_scale, + exclude_current_positions_from_observation=exclude_current_positions_from_observation) + + def step(self, action): + self.current_step += 1 + self.do_simulation(action, self.frame_skip) + # pos_after = self.get_body_com("torso")[0] + height = self.get_body_com("torso")[2] + + self.max_height = max(height, self.max_height) + + done = bool(height < 0.2) + + ctrl_cost = self.control_cost(action) + costs = ctrl_cost + rewards = 0 + if self.current_step >= self.max_episode_steps or done: + done = True + height_goal_distance = -10 * (np.linalg.norm(self.max_height - self.goal)) + healthy_reward = self.healthy_reward * self.current_step + + rewards = height_goal_distance + healthy_reward + + observation = self._get_obs() + reward = rewards - costs + info = { + 'height': height, + 'max_height': 
self.max_height, + 'goal': self.goal, + } + + return observation, reward, done, info + + def _get_obs(self): + return np.append(super()._get_obs(), self.goal) + + def reset(self, *, seed: Optional[int] = None, return_info: bool = False, options: Optional[dict] = None): + self.current_step = 0 + self.max_height = 0 + self.goal = self.np_random.uniform(1.5, 2.5, 1) # 1.5 3.0 + return super().reset() + + # overwrite reset_model to make it deterministic + def reset_model(self): + noise_low = -self._reset_noise_scale + noise_high = self._reset_noise_scale + + qpos = self.init_qpos # + self.np_random.uniform(low=noise_low, high=noise_high, size=self.model.nq) + qvel = self.init_qvel # + self.np_random.uniform(low=noise_low, high=noise_high, size=self.model.nv) + + self.set_state(qpos, qvel) + + observation = self._get_obs() + return observation + + +if __name__ == '__main__': + render_mode = "human" # "human" or "partial" or "final" + env = Walker2dJumpEnv() + obs = env.reset() + + for i in range(6000): + # test with random actions + ac = env.action_space.sample() + obs, rew, d, info = env.step(ac) + if i % 10 == 0: + env.render(mode=render_mode) + if d: + print('After ', i, ' steps, done: ', d) + env.reset() + + env.close() diff --git a/fancy_gym/examples/__init__.py b/fancy_gym/examples/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/alr_envs/examples/examples_dmc.py b/fancy_gym/examples/examples_dmc.py similarity index 51% rename from alr_envs/examples/examples_dmc.py rename to fancy_gym/examples/examples_dmc.py index d223d3c..75648b7 100644 --- a/alr_envs/examples/examples_dmc.py +++ b/fancy_gym/examples/examples_dmc.py @@ -1,11 +1,11 @@ -import alr_envs +import fancy_gym -def example_dmc(env_id="fish-swim", seed=1, iterations=1000, render=True): +def example_dmc(env_id="dmc:fish-swim", seed=1, iterations=1000, render=True): """ Example for running a DMC based env in the step based setting. - The env_id has to be specified as `domain_name-task_name` or - for manipulation tasks as `manipulation-environment_name` + The env_id has to be specified as `domain_name:task_name` or + for manipulation tasks as `domain_name:manipulation-environment_name` Args: env_id: Either `domain_name-task_name` or `manipulation-environment_name` @@ -16,7 +16,7 @@ def example_dmc(env_id="fish-swim", seed=1, iterations=1000, render=True): Returns: """ - env = alr_envs.make(env_id, seed) + env = fancy_gym.make(env_id, seed) rewards = 0 obs = env.reset() print("observation shape:", env.observation_space.shape) @@ -24,12 +24,11 @@ def example_dmc(env_id="fish-swim", seed=1, iterations=1000, render=True): for i in range(iterations): ac = env.action_space.sample() + if render: + env.render(mode="human") obs, reward, done, info = env.step(ac) rewards += reward - if render: - env.render("human") - if done: print(env_id, rewards) rewards = 0 @@ -41,12 +40,12 @@ def example_dmc(env_id="fish-swim", seed=1, iterations=1000, render=True): def example_custom_dmc_and_mp(seed=1, iterations=1, render=True): """ - Example for running a custom motion primitive based environments. + Example for running a custom movement primitive based environments. Our already registered environments follow the same structure. - Hence, this also allows to adjust hyperparameters of the motion primitives. + Hence, this also allows to adjust hyperparameters of the movement primitives. Yet, we recommend the method above if you are just interested in chaining those parameters for existing tasks. 
We appreciate PRs for custom environments (especially MP wrappers of existing tasks) - for our repo: https://github.com/ALRhub/alr_envs/ + for our repo: https://github.com/ALRhub/fancy_gym/ Args: seed: seed for deterministic behaviour iterations: Number of rollout steps to run @@ -57,34 +56,37 @@ def example_custom_dmc_and_mp(seed=1, iterations=1, render=True): """ # Base DMC name, according to structure of above example - base_env = "ball_in_cup-catch" + base_env_id = "dmc:ball_in_cup-catch" - # Replace this wrapper with the custom wrapper for your environment by inheriting from the MPEnvWrapper. + # Replace this wrapper with the custom wrapper for your environment by inheriting from the RawInterfaceWrapper. # You can also add other gym.Wrappers in case they are needed. - wrappers = [alr_envs.dmc.suite.ball_in_cup.MPWrapper] - mp_kwargs = { - "num_dof": 2, # degrees of fredom a.k.a. the old action space dimensionality - "num_basis": 5, # number of basis functions, the new action space has size num_dof x num_basis - "duration": 20, # length of trajectory in s, number of steps = duration / dt - "learn_goal": True, # learn the goal position (recommended) - "alpha_phase": 2, - "bandwidth_factor": 2, - "policy_type": "motor", # controller type, 'velocity', 'position', and 'motor' (torque control) - "weights_scale": 1, # scaling of MP weights - "goal_scale": 1, # scaling of learned goal position - "policy_kwargs": { # only required for torque control/PD-Controller - "p_gains": 0.2, - "d_gains": 0.05 - } - } - kwargs = { - "time_limit": 20, # same as duration value but as max horizon for underlying DMC environment - "episode_length": 1000, # corresponding number of episode steps - # "frame_skip": 1 - } - env = alr_envs.make_dmp_env(base_env, wrappers=wrappers, seed=seed, mp_kwargs=mp_kwargs, **kwargs) - # OR for a deterministic ProMP (other mp_kwargs are required, see metaworld_examples): - # env = alr_envs.make_promp_env(base_env, wrappers=wrappers, seed=seed, mp_kwargs=mp_args) + wrappers = [fancy_gym.dmc.suite.ball_in_cup.MPWrapper] + # # For a ProMP + trajectory_generator_kwargs = {'trajectory_generator_type': 'promp'} + phase_generator_kwargs = {'phase_generator_type': 'linear'} + controller_kwargs = {'controller_type': 'motor', + "p_gains": 1.0, + "d_gains": 0.1,} + basis_generator_kwargs = {'basis_generator_type': 'zero_rbf', + 'num_basis': 5, + 'num_basis_zero_start': 1 + } + + # For a DMP + # trajectory_generator_kwargs = {'trajectory_generator_type': 'dmp'} + # phase_generator_kwargs = {'phase_generator_type': 'exp', + # 'alpha_phase': 2} + # controller_kwargs = {'controller_type': 'motor', + # "p_gains": 1.0, + # "d_gains": 0.1, + # } + # basis_generator_kwargs = {'basis_generator_type': 'rbf', + # 'num_basis': 5 + # } + env = fancy_gym.make_bb(env_id=base_env_id, wrappers=wrappers, black_box_kwargs={}, + traj_gen_kwargs=trajectory_generator_kwargs, controller_kwargs=controller_kwargs, + phase_kwargs=phase_generator_kwargs, basis_kwargs=basis_generator_kwargs, + seed=seed) # This renders the full MP trajectory # It is only required to call render() once in the beginning, which renders every consecutive trajectory. 
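For intuition, the `'motor'` controller configured in the `controller_kwargs` above is a PD tracking controller: at every simulation step it converts the desired position/velocity of the generated trajectory into torques. The snippet below is only a minimal sketch of that control law under the gains used in the example; the helper name `pd_torques` is ours and not part of the library, whose actual tracking controller is created internally by `make_bb`.

```python
import numpy as np

def pd_torques(des_pos, des_vel, cur_pos, cur_vel, p_gains=1.0, d_gains=0.1):
    # Proportional term on the position error + derivative term on the velocity error.
    return p_gains * (np.asarray(des_pos) - np.asarray(cur_pos)) \
        + d_gains * (np.asarray(des_vel) - np.asarray(cur_vel))

# e.g. one time step of a 2-DoF trajectory:
torque = pd_torques(des_pos=[0.1, -0.2], des_vel=[0.0, 0.0],
                    cur_pos=[0.05, -0.1], cur_vel=[0.0, 0.1])
```

The `pd_control_gain_tuning.py` example further down in this diff shows how to visualize how well the executed trajectory tracks the MP target when tuning `p_gains` and `d_gains`.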
@@ -104,7 +106,7 @@ def example_custom_dmc_and_mp(seed=1, iterations=1, render=True): rewards += reward if done: - print(base_env, rewards) + print(base_env_id, rewards) rewards = 0 obs = env.reset() @@ -118,18 +120,18 @@ if __name__ == '__main__': # For rendering DMC # export MUJOCO_GL="osmesa" - render = False + render = True # # Standard DMC Suite tasks - example_dmc("fish-swim", seed=10, iterations=1000, render=render) - - # Manipulation tasks - # Disclaimer: The vision versions are currently not integrated and yield an error - example_dmc("manipulation-reach_site_features", seed=10, iterations=250, render=render) - - # Gym + DMC hybrid task provided in the MP framework + example_dmc("dmc:fish-swim", seed=10, iterations=1000, render=render) + # + # # Manipulation tasks + # # Disclaimer: The vision versions are currently not integrated and yield an error + example_dmc("dmc:manipulation-reach_site_features", seed=10, iterations=250, render=render) + # + # # Gym + DMC hybrid task provided in the MP framework example_dmc("dmc_ball_in_cup-catch_promp-v0", seed=10, iterations=1, render=render) - # Custom DMC task - # Different seed, because the episode is longer for this example and the name+seed combo is already registered above + # Custom DMC task # Different seed, because the episode is longer for this example and the name+seed combo is + # already registered above example_custom_dmc_and_mp(seed=11, iterations=1, render=render) diff --git a/alr_envs/examples/examples_general.py b/fancy_gym/examples/examples_general.py similarity index 77% rename from alr_envs/examples/examples_general.py rename to fancy_gym/examples/examples_general.py index d043d6d..1a89e30 100644 --- a/alr_envs/examples/examples_general.py +++ b/fancy_gym/examples/examples_general.py @@ -3,10 +3,10 @@ from collections import defaultdict import gym import numpy as np -import alr_envs +import fancy_gym -def example_general(env_id="Pendulum-v0", seed=1, iterations=1000, render=True): +def example_general(env_id="Pendulum-v1", seed=1, iterations=1000, render=True): """ Example for running any env in the step based setting. This also includes DMC environments when leveraging our custom make_env function. @@ -21,7 +21,7 @@ def example_general(env_id="Pendulum-v0", seed=1, iterations=1000, render=True): """ - env = alr_envs.make(env_id, seed) + env = fancy_gym.make(env_id, seed) rewards = 0 obs = env.reset() print("Observation shape: ", env.observation_space.shape) @@ -41,7 +41,7 @@ def example_general(env_id="Pendulum-v0", seed=1, iterations=1000, render=True): obs = env.reset() -def example_async(env_id="alr_envs:HoleReacher-v0", n_cpu=4, seed=int('533D', 16), n_samples=800): +def example_async(env_id="HoleReacher-v0", n_cpu=4, seed=int('533D', 16), n_samples=800): """ Example for running any env in a vectorized multiprocessing setting to generate more samples faster. This also includes DMC and DMP environments when leveraging our custom make_env function. 
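For parallel sample collection, `example_async` below wraps `fancy_gym.make_rank` in a `gym.vector.AsyncVectorEnv`. The following is a condensed, self-contained sketch of that pattern; it assumes the positional `make_rank(env_id, seed, rank)` call used in the example, where each worker's seed is offset by its rank so the workers sample differently.

```python
import gym
import fancy_gym

n_cpu = 4
# make_rank returns a zero-argument constructor suitable for AsyncVectorEnv.
envs = gym.vector.AsyncVectorEnv(
    [fancy_gym.make_rank("Reacher5d-v0", 0, i) for i in range(n_cpu)]
)

obs = envs.reset()
for _ in range(100):
    # The batched action space samples one action per worker.
    obs, rewards, dones, infos = envs.step(envs.action_space.sample())
envs.close()
```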
@@ -56,7 +56,7 @@ def example_async(env_id="alr_envs:HoleReacher-v0", n_cpu=4, seed=int('533D', 16 Returns: Tuple of (obs, reward, done, info) with type np.ndarray """ - env = gym.vector.AsyncVectorEnv([alr_envs.make_rank(env_id, seed, i) for i in range(n_cpu)]) + env = gym.vector.AsyncVectorEnv([fancy_gym.make_rank(env_id, seed, i) for i in range(n_cpu)]) # OR # envs = gym.vector.AsyncVectorEnv([make_env(env_id, seed + i) for i in range(n_cpu)]) @@ -80,23 +80,21 @@ def example_async(env_id="alr_envs:HoleReacher-v0", n_cpu=4, seed=int('533D', 16 rewards[done] = 0 # do not return values above threshold - return (*map(lambda v: np.stack(v)[:n_samples], buffer.values()),) + return *map(lambda v: np.stack(v)[:n_samples], buffer.values()), if __name__ == '__main__': render = True # Basic gym task - example_general("Pendulum-v0", seed=10, iterations=200, render=render) + example_general("Pendulum-v1", seed=10, iterations=200, render=render) - # # Basis task from framework - example_general("alr_envs:HoleReacher-v0", seed=10, iterations=200, render=render) + # Mujoco task from framework + example_general("Reacher5d-v0", seed=10, iterations=200, render=render) # # OpenAI Mujoco task example_general("HalfCheetah-v2", seed=10, render=render) - # # Mujoco task from framework - example_general("alr_envs:ALRReacher-v0", seed=10, iterations=200, render=render) - # Vectorized multiprocessing environments - example_async(env_id="alr_envs:HoleReacher-v0", n_cpu=2, seed=int('533D', 16), n_samples=2 * 200) + # example_async(env_id="HoleReacher-v0", n_cpu=2, seed=int('533D', 16), n_samples=2 * 200) + diff --git a/alr_envs/examples/examples_metaworld.py b/fancy_gym/examples/examples_metaworld.py similarity index 67% rename from alr_envs/examples/examples_metaworld.py rename to fancy_gym/examples/examples_metaworld.py index 9ead50c..0fa7066 100644 --- a/alr_envs/examples/examples_metaworld.py +++ b/fancy_gym/examples/examples_metaworld.py @@ -1,4 +1,4 @@ -import alr_envs +import fancy_gym def example_dmc(env_id="fish-swim", seed=1, iterations=1000, render=True): @@ -17,7 +17,7 @@ def example_dmc(env_id="fish-swim", seed=1, iterations=1000, render=True): Returns: """ - env = alr_envs.make(env_id, seed) + env = fancy_gym.make(env_id, seed) rewards = 0 obs = env.reset() print("observation shape:", env.observation_space.shape) @@ -25,14 +25,12 @@ def example_dmc(env_id="fish-swim", seed=1, iterations=1000, render=True): for i in range(iterations): ac = env.action_space.sample() - obs, reward, done, info = env.step(ac) - rewards += reward - if render: # THIS NEEDS TO BE SET TO FALSE FOR NOW, BECAUSE THE INTERFACE FOR RENDERING IS DIFFERENT TO BASIC GYM # TODO: Remove this, when Metaworld fixes its interface. env.render(False) - + obs, reward, done, info = env.step(ac) + rewards += reward if done: print(env_id, rewards) rewards = 0 @@ -44,12 +42,12 @@ def example_dmc(env_id="fish-swim", seed=1, iterations=1000, render=True): def example_custom_dmc_and_mp(seed=1, iterations=1, render=True): """ - Example for running a custom motion primitive based environments. + Example for running a custom movement primitive based environments. Our already registered environments follow the same structure. - Hence, this also allows to adjust hyperparameters of the motion primitives. + Hence, this also allows to adjust hyperparameters of the movement primitives. Yet, we recommend the method above if you are just interested in chaining those parameters for existing tasks. 
We appreciate PRs for custom environments (especially MP wrappers of existing tasks) - for our repo: https://github.com/ALRhub/alr_envs/ + for our repo: https://github.com/ALRhub/fancy_gym/ Args: seed: seed for deterministic behaviour (TODO: currently not working due to an issue in MetaWorld code) iterations: Number of rollout steps to run @@ -60,25 +58,32 @@ def example_custom_dmc_and_mp(seed=1, iterations=1, render=True): """ # Base MetaWorld name, according to structure of above example - base_env = "button-press-v2" + base_env_id = "metaworld:button-press-v2" - # Replace this wrapper with the custom wrapper for your environment by inheriting from the MPEnvWrapper. + # Replace this wrapper with the custom wrapper for your environment by inheriting from the RawInterfaceWrapper. # You can also add other gym.Wrappers in case they are needed. - wrappers = [alr_envs.meta.goal_and_object_change.MPWrapper] - mp_kwargs = { - "num_dof": 4, # degrees of fredom a.k.a. the old action space dimensionality - "num_basis": 5, # number of basis functions, the new action space has size num_dof x num_basis - "duration": 6.25, # length of trajectory in s, number of steps = duration / dt - "post_traj_time": 0, # pad trajectory with additional zeros at the end (recommended: 0) - "width": 0.025, # width of the basis functions - "zero_start": True, # start from current environment position if True - "weights_scale": 1, # scaling of MP weights - "policy_type": "metaworld", # custom controller type for metaworld environments - } + wrappers = [fancy_gym.meta.goal_object_change_mp_wrapper.MPWrapper] + # # For a ProMP + # trajectory_generator_kwargs = {'trajectory_generator_type': 'promp'} + # phase_generator_kwargs = {'phase_generator_type': 'linear'} + # controller_kwargs = {'controller_type': 'metaworld'} + # basis_generator_kwargs = {'basis_generator_type': 'zero_rbf', + # 'num_basis': 5, + # 'num_basis_zero_start': 1 + # } - env = alr_envs.make_promp_env(base_env, wrappers=wrappers, seed=seed, mp_kwargs=mp_kwargs) - # OR for a DMP (other mp_kwargs are required, see dmc_examples): - # env = alr_envs.make_dmp_env(base_env, wrappers=wrappers, seed=seed, mp_kwargs=mp_kwargs, **kwargs) + # For a DMP + trajectory_generator_kwargs = {'trajectory_generator_type': 'dmp'} + phase_generator_kwargs = {'phase_generator_type': 'exp', + 'alpha_phase': 2} + controller_kwargs = {'controller_type': 'metaworld'} + basis_generator_kwargs = {'basis_generator_type': 'rbf', + 'num_basis': 5 + } + env = fancy_gym.make_bb(env_id=base_env_id, wrappers=wrappers, black_box_kwargs={}, + traj_gen_kwargs=trajectory_generator_kwargs, controller_kwargs=controller_kwargs, + phase_kwargs=phase_generator_kwargs, basis_kwargs=basis_generator_kwargs, + seed=seed) # This renders the full MP trajectory # It is only required to call render() once in the beginning, which renders every consecutive trajectory. 
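Since the diff only shows fragments of the remaining rollout loop, here is a compact, self-contained sketch of how such a registered MetaWorld ProMP environment is driven. It uses `ButtonPressProMP-v2`, the id used further down in this example; as with all black-box environments, a single `step()` executes the entire generated trajectory.

```python
import fancy_gym

env = fancy_gym.make("ButtonPressProMP-v2", seed=10)
obs = env.reset()
for _ in range(10):
    mp_params = env.action_space.sample()  # ProMP parameters, not step-wise raw actions
    # `episode_return` already aggregates the step-wise rewards of the full trajectory.
    obs, episode_return, done, info = env.step(mp_params)
    if done:  # episode-based envs are done after each executed trajectory
        obs = env.reset()
```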
@@ -102,7 +107,7 @@ def example_custom_dmc_and_mp(seed=1, iterations=1, render=True): rewards += reward if done: - print(base_env, rewards) + print(base_env_id, rewards) rewards = 0 obs = env.reset() @@ -118,11 +123,12 @@ if __name__ == '__main__': # export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libGLEW.so render = False - # # Standard DMC Suite tasks - example_dmc("button-press-v2", seed=10, iterations=500, render=render) + # # Standard Meta world tasks + example_dmc("metaworld:button-press-v2", seed=10, iterations=500, render=render) - # MP + MetaWorld hybrid task provided in the our framework + # # MP + MetaWorld hybrid task provided in the our framework example_dmc("ButtonPressProMP-v2", seed=10, iterations=1, render=render) - - # Custom MetaWorld task + # + # # Custom MetaWorld task example_custom_dmc_and_mp(seed=10, iterations=1, render=render) + diff --git a/fancy_gym/examples/examples_movement_primitives.py b/fancy_gym/examples/examples_movement_primitives.py new file mode 100644 index 0000000..22e95ac --- /dev/null +++ b/fancy_gym/examples/examples_movement_primitives.py @@ -0,0 +1,169 @@ +import fancy_gym + + +def example_mp(env_name="HoleReacherProMP-v0", seed=1, iterations=1, render=True): + """ + Example for running a black box based environment, which is already registered + Args: + env_name: Black box env_id + seed: seed for deterministic behaviour + iterations: Number of rollout steps to run + render: Render the episode + + Returns: + + """ + # Equivalent to gym, we have a make function which can be used to create environments. + # It takes care of seeding and enables the use of a variety of external environments using the gym interface. + env = fancy_gym.make(env_name, seed) + + returns = 0 + # env.render(mode=None) + obs = env.reset() + + # number of samples/full trajectories (multiple environment steps) + for i in range(iterations): + + if render and i % 2 == 0: + # This renders the full MP trajectory + # It is only required to call render() once in the beginning, which renders every consecutive trajectory. + # Resetting to no rendering, can be achieved by render(mode=None). + # It is also possible to change the mode multiple times when + # e.g. only every second trajectory should be displayed, such as here + # Just make sure the correct mode is set before executing the step. + env.render(mode="human") + else: + env.render(mode=None) + + # Now the action space is not the raw action but the parametrization of the trajectory generator, + # such as a ProMP + ac = env.action_space.sample() + # This executes a full trajectory and gives back the context (obs) of the last step in the trajectory, or the + # full observation space of the last step, if replanning/sub-trajectory learning is used. The 'reward' is equal + # to the return of a trajectory. Default is the sum over the step-wise rewards. + obs, reward, done, info = env.step(ac) + # Aggregated returns + returns += reward + + if done: + print(reward) + obs = env.reset() + + +def example_custom_mp(env_name="Reacher5dProMP-v0", seed=1, iterations=1, render=True): + """ + Example for running a movement primitive based environment, which is already registered + Args: + env_name: DMP env_id + seed: seed for deterministic behaviour + iterations: Number of rollout steps to run + render: Render the episode + + Returns: + + """ + # Changing the arguments of the black box env is possible by providing them to gym as with all kwargs. + # E.g. 
here for way to many basis functions + env = fancy_gym.make(env_name, seed, basis_generator_kwargs={'num_basis': 1000}) + # env = fancy_gym.make(env_name, seed) + # mp_dict.update({'black_box_kwargs': {'learn_sub_trajectories': True}}) + # mp_dict.update({'black_box_kwargs': {'do_replanning': lambda pos, vel, t: lambda t: t % 100}}) + + returns = 0 + obs = env.reset() + + # This time rendering every trajectory + if render: + env.render(mode="human") + + # number of samples/full trajectories (multiple environment steps) + for i in range(iterations): + ac = env.action_space.sample() + obs, reward, done, info = env.step(ac) + returns += reward + + if done: + print(i, reward) + obs = env.reset() + + return obs + + +def example_fully_custom_mp(seed=1, iterations=1, render=True): + """ + Example for running a custom movement primitive based environments. + Our already registered environments follow the same structure. + Hence, this also allows to adjust hyperparameters of the movement primitives. + Yet, we recommend the method above if you are just interested in changing those parameters for existing tasks. + We appreciate PRs for custom environments (especially MP wrappers of existing tasks) + for our repo: https://github.com/ALRhub/fancy_gym/ + Args: + seed: seed + iterations: Number of rollout steps to run + render: Render the episode + + Returns: + + """ + + base_env_id = "HoleReacher-v0" + + # Replace this wrapper with the custom wrapper for your environment by inheriting from the RawInterfaceWrapper. + # You can also add other gym.Wrappers in case they are needed. + wrappers = [fancy_gym.envs.classic_control.hole_reacher.MPWrapper] + + # # For a ProMP + # trajectory_generator_kwargs = {'trajectory_generator_type': 'promp', + # 'weight_scale': 2} + # phase_generator_kwargs = {'phase_generator_type': 'linear'} + # controller_kwargs = {'controller_type': 'velocity'} + # basis_generator_kwargs = {'basis_generator_type': 'zero_rbf', + # 'num_basis': 5, + # 'num_basis_zero_start': 1 + # } + + # For a DMP + trajectory_generator_kwargs = {'trajectory_generator_type': 'dmp', + 'weight_scale': 500} + phase_generator_kwargs = {'phase_generator_type': 'exp', + 'alpha_phase': 2.5} + controller_kwargs = {'controller_type': 'velocity'} + basis_generator_kwargs = {'basis_generator_type': 'rbf', + 'num_basis': 5 + } + env = fancy_gym.make_bb(env_id=base_env_id, wrappers=wrappers, black_box_kwargs={}, + traj_gen_kwargs=trajectory_generator_kwargs, controller_kwargs=controller_kwargs, + phase_kwargs=phase_generator_kwargs, basis_kwargs=basis_generator_kwargs, + seed=seed) + + if render: + env.render(mode="human") + + rewards = 0 + obs = env.reset() + + # number of samples/full trajectories (multiple environment steps) + for i in range(iterations): + ac = env.action_space.sample() + obs, reward, done, info = env.step(ac) + rewards += reward + + if done: + print(rewards) + rewards = 0 + obs = env.reset() + + +if __name__ == '__main__': + render = True + # DMP + example_mp("HoleReacherDMP-v0", seed=10, iterations=5, render=render) + # + # # ProMP + example_mp("HoleReacherProMP-v0", seed=10, iterations=5, render=render) + + # Altered basis functions + obs1 = example_custom_mp("Reacher5dProMP-v0", seed=10, iterations=5, render=render) + + # Custom MP + example_fully_custom_mp(seed=10, iterations=1, render=render) diff --git a/fancy_gym/examples/examples_open_ai.py b/fancy_gym/examples/examples_open_ai.py new file mode 100644 index 0000000..a4a162d --- /dev/null +++ b/fancy_gym/examples/examples_open_ai.py @@ -0,0 
+1,37 @@ +import fancy_gym + + +def example_mp(env_name, seed=1, render=True): + """ + Example for running a movement primitive based version of a OpenAI-gym environment, which is already registered. + For more information on movement primitive specific stuff, look at the traj_gen examples. + Args: + env_name: ProMP env_id + seed: seed + render: boolean + Returns: + + """ + # While in this case gym.make() is possible to use as well, we recommend our custom make env function. + env = fancy_gym.make(env_name, seed) + + returns = 0 + obs = env.reset() + # number of samples/full trajectories (multiple environment steps) + for i in range(10): + if render and i % 2 == 0: + env.render(mode="human") + else: + env.render(mode=None) + ac = env.action_space.sample() + obs, reward, done, info = env.step(ac) + returns += reward + + if done: + print(returns) + obs = env.reset() + + +if __name__ == '__main__': + example_mp("ReacherProMP-v2") + diff --git a/fancy_gym/examples/pd_control_gain_tuning.py b/fancy_gym/examples/pd_control_gain_tuning.py new file mode 100644 index 0000000..407bfa1 --- /dev/null +++ b/fancy_gym/examples/pd_control_gain_tuning.py @@ -0,0 +1,69 @@ +from collections import OrderedDict + +import numpy as np +from matplotlib import pyplot as plt + +import fancy_gym + +# This might work for some environments, however, please verify either way the correct trajectory information +# for your environment are extracted below +SEED = 1 + +env_id = "Reacher5dProMP-v0" + +env = fancy_gym.make(env_id, seed=SEED, controller_kwargs={'p_gains': 0.05, 'd_gains': 0.05}).env +env.action_space.seed(SEED) + +# Plot difference between real trajectory and target MP trajectory +env.reset() +w = env.action_space.sample() +pos, vel = env.get_trajectory(w) + +base_shape = env.env.action_space.shape +actual_pos = np.zeros((len(pos), *base_shape)) +actual_vel = np.zeros((len(pos), *base_shape)) +act = np.zeros((len(pos), *base_shape)) + +plt.ion() +fig = plt.figure() +ax = fig.add_subplot(1, 1, 1) + +img = ax.imshow(env.env.render(mode="rgb_array")) +fig.show() + +for t, pos_vel in enumerate(zip(pos, vel)): + actions = env.tracking_controller.get_action(pos_vel[0], pos_vel[1], env.current_vel, env.current_pos) + actions = np.clip(actions, env.env.action_space.low, env.env.action_space.high) + _, _, _, _ = env.env.step(actions) + if t % 15 == 0: + img.set_data(env.env.render(mode="rgb_array")) + fig.canvas.draw() + fig.canvas.flush_events() + act[t, :] = actions + # TODO verify for your environment + actual_pos[t, :] = env.current_pos + actual_vel[t, :] = env.current_vel + +plt.figure(figsize=(15, 5)) + +plt.subplot(131) +plt.title("Position") +p1 = plt.plot(actual_pos, c='C0', label="true") +p2 = plt.plot(pos, c='C1', label="MP") +plt.xlabel("Episode steps") +handles, labels = plt.gca().get_legend_handles_labels() + +by_label = OrderedDict(zip(labels, handles)) +plt.legend(by_label.values(), by_label.keys()) + +plt.subplot(132) +plt.title("Velocity") +plt.plot(actual_vel, c='C0', label="true") +plt.plot(vel, c='C1', label="MP") +plt.xlabel("Episode steps") + +plt.subplot(133) +plt.title(f"Actions {np.std(act, axis=0)}") +plt.plot(act, c="C0"), +plt.xlabel("Episode steps") +plt.show() diff --git a/alr_envs/meta/README.MD b/fancy_gym/meta/README.MD similarity index 100% rename from alr_envs/meta/README.MD rename to fancy_gym/meta/README.MD diff --git a/alr_envs/meta/__init__.py b/fancy_gym/meta/__init__.py similarity index 51% rename from alr_envs/meta/__init__.py rename to fancy_gym/meta/__init__.py index 
5651224..04438f4 100644 --- a/alr_envs/meta/__init__.py +++ b/fancy_gym/meta/__init__.py @@ -1,58 +1,64 @@ +from copy import deepcopy + from gym import register from . import goal_object_change_mp_wrapper, goal_change_mp_wrapper, goal_endeffector_change_mp_wrapper, \ object_change_mp_wrapper -ALL_METAWORLD_MOTION_PRIMITIVE_ENVIRONMENTS = {"DMP": [], "ProMP": []} +ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS = {"DMP": [], "ProMP": []} # MetaWorld +DEFAULT_BB_DICT_ProMP = { + "name": 'EnvName', + "wrappers": [], + "trajectory_generator_kwargs": { + 'trajectory_generator_type': 'promp' + }, + "phase_generator_kwargs": { + 'phase_generator_type': 'linear' + }, + "controller_kwargs": { + 'controller_type': 'metaworld', + }, + "basis_generator_kwargs": { + 'basis_generator_type': 'zero_rbf', + 'num_basis': 5, + 'num_basis_zero_start': 1 + } +} + _goal_change_envs = ["assembly-v2", "pick-out-of-hole-v2", "plate-slide-v2", "plate-slide-back-v2", "plate-slide-side-v2", "plate-slide-back-side-v2"] for _task in _goal_change_envs: task_id_split = _task.split("-") name = "".join([s.capitalize() for s in task_id_split[:-1]]) _env_id = f'{name}ProMP-{task_id_split[-1]}' + kwargs_dict_goal_change_promp = deepcopy(DEFAULT_BB_DICT_ProMP) + kwargs_dict_goal_change_promp['wrappers'].append(goal_change_mp_wrapper.MPWrapper) + kwargs_dict_goal_change_promp['name'] = f'metaworld:{_task}' + register( id=_env_id, - entry_point='alr_envs.utils.make_env_helpers:make_promp_env_helper', - kwargs={ - "name": _task, - "wrappers": [goal_change_mp_wrapper.MPWrapper], - "mp_kwargs": { - "num_dof": 4, - "num_basis": 5, - "duration": 6.25, - "post_traj_time": 0, - "zero_start": True, - "policy_type": "metaworld", - } - } + entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper', + kwargs=kwargs_dict_goal_change_promp ) - ALL_METAWORLD_MOTION_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id) + ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id) _object_change_envs = ["bin-picking-v2", "hammer-v2", "sweep-into-v2"] for _task in _object_change_envs: task_id_split = _task.split("-") name = "".join([s.capitalize() for s in task_id_split[:-1]]) _env_id = f'{name}ProMP-{task_id_split[-1]}' + kwargs_dict_object_change_promp = deepcopy(DEFAULT_BB_DICT_ProMP) + kwargs_dict_object_change_promp['wrappers'].append(object_change_mp_wrapper.MPWrapper) + kwargs_dict_object_change_promp['name'] = f'metaworld:{_task}' register( id=_env_id, - entry_point='alr_envs.utils.make_env_helpers:make_promp_env_helper', - kwargs={ - "name": _task, - "wrappers": [object_change_mp_wrapper.MPWrapper], - "mp_kwargs": { - "num_dof": 4, - "num_basis": 5, - "duration": 6.25, - "post_traj_time": 0, - "zero_start": True, - "policy_type": "metaworld", - } - } + entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper', + kwargs=kwargs_dict_object_change_promp ) - ALL_METAWORLD_MOTION_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id) + ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id) _goal_and_object_change_envs = ["box-close-v2", "button-press-v2", "button-press-wall-v2", "button-press-topdown-v2", "button-press-topdown-wall-v2", "coffee-button-v2", "coffee-pull-v2", @@ -69,43 +75,29 @@ for _task in _goal_and_object_change_envs: task_id_split = _task.split("-") name = "".join([s.capitalize() for s in task_id_split[:-1]]) _env_id = f'{name}ProMP-{task_id_split[-1]}' + kwargs_dict_goal_and_object_change_promp = deepcopy(DEFAULT_BB_DICT_ProMP) + 
kwargs_dict_goal_and_object_change_promp['wrappers'].append(goal_object_change_mp_wrapper.MPWrapper) + kwargs_dict_goal_and_object_change_promp['name'] = f'metaworld:{_task}' + register( id=_env_id, - entry_point='alr_envs.utils.make_env_helpers:make_promp_env_helper', - kwargs={ - "name": _task, - "wrappers": [goal_object_change_mp_wrapper.MPWrapper], - "mp_kwargs": { - "num_dof": 4, - "num_basis": 5, - "duration": 6.25, - "post_traj_time": 0, - "zero_start": True, - "policy_type": "metaworld", - } - } + entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper', + kwargs=kwargs_dict_goal_and_object_change_promp ) - ALL_METAWORLD_MOTION_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id) + ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id) _goal_and_endeffector_change_envs = ["basketball-v2"] for _task in _goal_and_endeffector_change_envs: task_id_split = _task.split("-") name = "".join([s.capitalize() for s in task_id_split[:-1]]) _env_id = f'{name}ProMP-{task_id_split[-1]}' + kwargs_dict_goal_and_endeffector_change_promp = deepcopy(DEFAULT_BB_DICT_ProMP) + kwargs_dict_goal_and_endeffector_change_promp['wrappers'].append(goal_endeffector_change_mp_wrapper.MPWrapper) + kwargs_dict_goal_and_endeffector_change_promp['name'] = f'metaworld:{_task}' + register( id=_env_id, - entry_point='alr_envs.utils.make_env_helpers:make_promp_env_helper', - kwargs={ - "name": _task, - "wrappers": [goal_endeffector_change_mp_wrapper.MPWrapper], - "mp_kwargs": { - "num_dof": 4, - "num_basis": 5, - "duration": 6.25, - "post_traj_time": 0, - "zero_start": True, - "policy_type": "metaworld", - } - } + entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper', + kwargs=kwargs_dict_goal_and_endeffector_change_promp ) - ALL_METAWORLD_MOTION_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id) + ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append(_env_id) diff --git a/fancy_gym/meta/base_metaworld_mp_wrapper.py b/fancy_gym/meta/base_metaworld_mp_wrapper.py new file mode 100644 index 0000000..4029e28 --- /dev/null +++ b/fancy_gym/meta/base_metaworld_mp_wrapper.py @@ -0,0 +1,20 @@ +from typing import Tuple, Union + +import numpy as np + +from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper + + +class BaseMetaworldMPWrapper(RawInterfaceWrapper): + @property + def current_pos(self) -> Union[float, int, np.ndarray]: + r_close = self.env.data.get_joint_qpos("r_close") + # TODO check if this is correct + # return np.hstack([self.env.data.get_body_xpos('hand').flatten() / self.env.action_scale, r_close]) + return np.hstack([self.env.data.mocap_pos.flatten() / self.env.action_scale, r_close]) + + @property + def current_vel(self) -> Union[float, int, np.ndarray, Tuple]: + # TODO check if this is correct + return np.zeros(4, ) + # raise NotImplementedError("Velocity cannot be retrieved.") diff --git a/alr_envs/meta/goal_change_mp_wrapper.py b/fancy_gym/meta/goal_change_mp_wrapper.py similarity index 71% rename from alr_envs/meta/goal_change_mp_wrapper.py rename to fancy_gym/meta/goal_change_mp_wrapper.py index a558365..a8eabb5 100644 --- a/alr_envs/meta/goal_change_mp_wrapper.py +++ b/fancy_gym/meta/goal_change_mp_wrapper.py @@ -1,19 +1,17 @@ -from typing import Tuple, Union - import numpy as np -from mp_env_api import MPEnvWrapper +from fancy_gym.meta.base_metaworld_mp_wrapper import BaseMetaworldMPWrapper -class MPWrapper(MPEnvWrapper): +class MPWrapper(BaseMetaworldMPWrapper): """ This Wrapper is for environments where merely the goal changes in the beginning and no 
secondary objects or end effectors are altered at the start of an episode. You can verify this by executing the code below for your environment id and check if the output is non-zero at the same indices. ```python - import alr_envs - env = alr_envs.make(env_id, 1) + import fancy_gym + env = fancy_gym.make(env_id, 1) print(env.reset() - env.reset()) array([ 0. , 0. , 0. , 0. , 0, 0 , 0 , 0. , 0. , 0. , @@ -27,7 +25,7 @@ class MPWrapper(MPEnvWrapper): """ @property - def active_obs(self): + def context_mask(self) -> np.ndarray: # This structure is the same for all metaworld environments. # Only the observations which change could differ return np.hstack([ @@ -49,20 +47,3 @@ class MPWrapper(MPEnvWrapper): # Goal [True] * 3, # goal position ]) - - @property - def current_pos(self) -> Union[float, int, np.ndarray]: - r_close = self.env.data.get_joint_qpos("r_close") - return np.hstack([self.env.data.mocap_pos.flatten() / self.env.action_scale, r_close]) - - @property - def current_vel(self) -> Union[float, int, np.ndarray, Tuple]: - raise NotImplementedError("Velocity cannot be retrieved.") - - @property - def goal_pos(self) -> Union[float, int, np.ndarray, Tuple]: - raise ValueError("Goal position is not available and has to be learnt based on the environment.") - - @property - def dt(self) -> Union[float, int]: - return self.env.dt diff --git a/alr_envs/meta/goal_endeffector_change_mp_wrapper.py b/fancy_gym/meta/goal_endeffector_change_mp_wrapper.py similarity index 71% rename from alr_envs/meta/goal_endeffector_change_mp_wrapper.py rename to fancy_gym/meta/goal_endeffector_change_mp_wrapper.py index 8912a72..c299597 100644 --- a/alr_envs/meta/goal_endeffector_change_mp_wrapper.py +++ b/fancy_gym/meta/goal_endeffector_change_mp_wrapper.py @@ -1,19 +1,17 @@ -from typing import Tuple, Union - import numpy as np -from mp_env_api import MPEnvWrapper +from fancy_gym.meta.base_metaworld_mp_wrapper import BaseMetaworldMPWrapper -class MPWrapper(MPEnvWrapper): +class MPWrapper(BaseMetaworldMPWrapper): """ This Wrapper is for environments where merely the goal changes in the beginning and no secondary objects or end effectors are altered at the start of an episode. You can verify this by executing the code below for your environment id and check if the output is non-zero at the same indices. ```python - import alr_envs - env = alr_envs.make(env_id, 1) + import fancy_gym + env = fancy_gym.make(env_id, 1) print(env.reset() - env.reset()) array([ !=0 , !=0 , !=0 , 0. , 0., 0. , 0. , 0. , 0. , 0. , @@ -27,7 +25,7 @@ class MPWrapper(MPEnvWrapper): """ @property - def active_obs(self): + def context_mask(self) -> np.ndarray: # This structure is the same for all metaworld environments. 
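As an aside to the `context_mask` above: it is a plain boolean array over the flattened observation, which the black-box machinery presumably uses to select the context, i.e. the observation entries that actually change between episodes. A minimal sketch of that masking, with illustrative sizes only (not part of the diff):

```python
import numpy as np

# Stand-in for a flat MetaWorld observation; the real vector is longer.
full_obs = np.arange(10.0)
# Hypothetical mask: only the last three entries (e.g. the goal position) form the context.
context_mask = np.hstack([[False] * 7, [True] * 3])
context = full_obs[context_mask]  # array([7., 8., 9.])
```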
# Only the observations which change could differ return np.hstack([ @@ -49,20 +47,3 @@ class MPWrapper(MPEnvWrapper): # Goal [True] * 3, # goal position ]) - - @property - def current_pos(self) -> Union[float, int, np.ndarray]: - r_close = self.env.data.get_joint_qpos("r_close") - return np.hstack([self.env.data.mocap_pos.flatten() / self.env.action_scale, r_close]) - - @property - def current_vel(self) -> Union[float, int, np.ndarray, Tuple]: - raise NotImplementedError("Velocity cannot be retrieved.") - - @property - def goal_pos(self) -> Union[float, int, np.ndarray, Tuple]: - raise ValueError("Goal position is not available and has to be learnt based on the environment.") - - @property - def dt(self) -> Union[float, int]: - return self.env.dt diff --git a/alr_envs/meta/goal_object_change_mp_wrapper.py b/fancy_gym/meta/goal_object_change_mp_wrapper.py similarity index 71% rename from alr_envs/meta/goal_object_change_mp_wrapper.py rename to fancy_gym/meta/goal_object_change_mp_wrapper.py index 63e16b7..ae667a6 100644 --- a/alr_envs/meta/goal_object_change_mp_wrapper.py +++ b/fancy_gym/meta/goal_object_change_mp_wrapper.py @@ -1,19 +1,17 @@ -from typing import Tuple, Union - import numpy as np -from mp_env_api import MPEnvWrapper +from fancy_gym.meta.base_metaworld_mp_wrapper import BaseMetaworldMPWrapper -class MPWrapper(MPEnvWrapper): +class MPWrapper(BaseMetaworldMPWrapper): """ This Wrapper is for environments where merely the goal changes in the beginning and no secondary objects or end effectors are altered at the start of an episode. You can verify this by executing the code below for your environment id and check if the output is non-zero at the same indices. ```python - import alr_envs - env = alr_envs.make(env_id, 1) + import fancy_gym + env = fancy_gym.make(env_id, 1) print(env.reset() - env.reset()) array([ 0. , 0. , 0. , 0. , !=0, !=0 , !=0 , 0. , 0. , 0. , @@ -27,7 +25,7 @@ class MPWrapper(MPEnvWrapper): """ @property - def active_obs(self): + def context_mask(self) -> np.ndarray: # This structure is the same for all metaworld environments. 
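These wrappers are what the registration code at the top of this section appends to its kwargs dicts before pointing gym at `make_bb_env_helper`. A sketch of that pattern for a hypothetical id, mirroring the `ReacherProMP-v2` registration in `fancy_gym/open_ai/__init__.py` further down in this diff (the id and the tuned gain below are illustrative assumptions):

```python
from copy import deepcopy

from gym import register

# Module-level names from fancy_gym/open_ai/__init__.py in this diff.
from fancy_gym.open_ai import DEFAULT_BB_DICT_ProMP, mujoco

kwargs_dict_my_promp = deepcopy(DEFAULT_BB_DICT_ProMP)
kwargs_dict_my_promp['name'] = "Reacher-v2"  # underlying step-based environment
kwargs_dict_my_promp['wrappers'].append(mujoco.reacher_v2.MPWrapper)
kwargs_dict_my_promp['controller_kwargs']['p_gains'] = 0.6  # tune the tracking controller

register(
    id='MyReacherProMP-v2',  # hypothetical id
    entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper',
    kwargs=kwargs_dict_my_promp,
)
```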
# Only the observations which change could differ return np.hstack([ @@ -49,20 +47,3 @@ class MPWrapper(MPEnvWrapper): # Goal [True] * 3, # goal position ]) - - @property - def current_pos(self) -> Union[float, int, np.ndarray]: - r_close = self.env.data.get_joint_qpos("r_close") - return np.hstack([self.env.data.mocap_pos.flatten() / self.env.action_scale, r_close]) - - @property - def current_vel(self) -> Union[float, int, np.ndarray, Tuple]: - raise NotImplementedError("Velocity cannot be retrieved.") - - @property - def goal_pos(self) -> Union[float, int, np.ndarray, Tuple]: - raise ValueError("Goal position is not available and has to be learnt based on the environment.") - - @property - def dt(self) -> Union[float, int]: - return self.env.dt diff --git a/alr_envs/meta/object_change_mp_wrapper.py b/fancy_gym/meta/object_change_mp_wrapper.py similarity index 71% rename from alr_envs/meta/object_change_mp_wrapper.py rename to fancy_gym/meta/object_change_mp_wrapper.py index 4293148..6faecc9 100644 --- a/alr_envs/meta/object_change_mp_wrapper.py +++ b/fancy_gym/meta/object_change_mp_wrapper.py @@ -1,19 +1,17 @@ -from typing import Tuple, Union - import numpy as np -from mp_env_api import MPEnvWrapper +from fancy_gym.meta.base_metaworld_mp_wrapper import BaseMetaworldMPWrapper -class MPWrapper(MPEnvWrapper): +class MPWrapper(BaseMetaworldMPWrapper): """ This Wrapper is for environments where merely the goal changes in the beginning and no secondary objects or end effectors are altered at the start of an episode. You can verify this by executing the code below for your environment id and check if the output is non-zero at the same indices. ```python - import alr_envs - env = alr_envs.make(env_id, 1) + import fancy_gym + env = fancy_gym.make(env_id, 1) print(env.reset() - env.reset()) array([ 0. , 0. , 0. , 0. , !=0 , !=0 , !=0 , 0. , 0. , 0. , @@ -27,7 +25,7 @@ class MPWrapper(MPEnvWrapper): """ @property - def active_obs(self): + def context_mask(self) -> np.ndarray: # This structure is the same for all metaworld environments. # Only the observations which change could differ return np.hstack([ @@ -49,20 +47,3 @@ class MPWrapper(MPEnvWrapper): # Goal [True] * 3, # goal position ]) - - @property - def current_pos(self) -> Union[float, int, np.ndarray]: - r_close = self.env.data.get_joint_qpos("r_close") - return np.hstack([self.env.data.mocap_pos.flatten() / self.env.action_scale, r_close]) - - @property - def current_vel(self) -> Union[float, int, np.ndarray, Tuple]: - raise NotImplementedError("Velocity cannot be retrieved.") - - @property - def goal_pos(self) -> Union[float, int, np.ndarray, Tuple]: - raise ValueError("Goal position is not available and has to be learnt based on the environment.") - - @property - def dt(self) -> Union[float, int]: - return self.env.dt diff --git a/alr_envs/open_ai/README.MD b/fancy_gym/open_ai/README.MD similarity index 100% rename from alr_envs/open_ai/README.MD rename to fancy_gym/open_ai/README.MD diff --git a/fancy_gym/open_ai/__init__.py b/fancy_gym/open_ai/__init__.py new file mode 100644 index 0000000..3e0b770 --- /dev/null +++ b/fancy_gym/open_ai/__init__.py @@ -0,0 +1,118 @@ +from copy import deepcopy + +from gym import register + +from . 
import mujoco +from .deprecated_needs_gym_robotics import robotics + +ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS = {"DMP": [], "ProMP": []} + +DEFAULT_BB_DICT_ProMP = { + "name": 'EnvName', + "wrappers": [], + "trajectory_generator_kwargs": { + 'trajectory_generator_type': 'promp' + }, + "phase_generator_kwargs": { + 'phase_generator_type': 'linear' + }, + "controller_kwargs": { + 'controller_type': 'motor', + "p_gains": 1.0, + "d_gains": 0.1, + }, + "basis_generator_kwargs": { + 'basis_generator_type': 'zero_rbf', + 'num_basis': 5, + 'num_basis_zero_start': 1 + } +} + +kwargs_dict_reacher_promp = deepcopy(DEFAULT_BB_DICT_ProMP) +kwargs_dict_reacher_promp['controller_kwargs']['p_gains'] = 0.6 +kwargs_dict_reacher_promp['controller_kwargs']['d_gains'] = 0.075 +kwargs_dict_reacher_promp['basis_generator_kwargs']['num_basis'] = 6 +kwargs_dict_reacher_promp['name'] = "Reacher-v2" +kwargs_dict_reacher_promp['wrappers'].append(mujoco.reacher_v2.MPWrapper) +register( + id='ReacherProMP-v2', + entry_point='fancy_gym.utils.make_env_helpers:make_bb_env_helper', + kwargs=kwargs_dict_reacher_promp +) +ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append("ReacherProMP-v2") +""" +The Fetch environments are not supported by gym anymore. A new repository (gym_robotics) is supporting the environments. +However, the usage and so on needs to be checked + +register( + id='FetchSlideDenseProMP-v1', + entry_point='fancy_gym.utils.make_env_helpers:make_promp_env_helper', + kwargs={ + "name": "gym.envs.robotics:FetchSlideDense-v1", + "wrappers": [FlattenObservation, robotics.fetch.MPWrapper], + "traj_gen_kwargs": { + "num_dof": 4, + "num_basis": 5, + "duration": 2, + "post_traj_time": 0, + "zero_start": True, + "policy_type": "position" + } + } +) +ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append("FetchSlideDenseProMP-v1") + +register( + id='FetchSlideProMP-v1', + entry_point='fancy_gym.utils.make_env_helpers:make_promp_env_helper', + kwargs={ + "name": "gym.envs.robotics:FetchSlide-v1", + "wrappers": [FlattenObservation, robotics.fetch.MPWrapper], + "traj_gen_kwargs": { + "num_dof": 4, + "num_basis": 5, + "duration": 2, + "post_traj_time": 0, + "zero_start": True, + "policy_type": "position" + } + } +) +ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append("FetchSlideProMP-v1") + +register( + id='FetchReachDenseProMP-v1', + entry_point='fancy_gym.utils.make_env_helpers:make_promp_env_helper', + kwargs={ + "name": "gym.envs.robotics:FetchReachDense-v1", + "wrappers": [FlattenObservation, robotics.fetch.MPWrapper], + "traj_gen_kwargs": { + "num_dof": 4, + "num_basis": 5, + "duration": 2, + "post_traj_time": 0, + "zero_start": True, + "policy_type": "position" + } + } +) +ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append("FetchReachDenseProMP-v1") + +register( + id='FetchReachProMP-v1', + entry_point='fancy_gym.utils.make_env_helpers:make_promp_env_helper', + kwargs={ + "name": "gym.envs.robotics:FetchReach-v1", + "wrappers": [FlattenObservation, robotics.fetch.MPWrapper], + "traj_gen_kwargs": { + "num_dof": 4, + "num_basis": 5, + "duration": 2, + "post_traj_time": 0, + "zero_start": True, + "policy_type": "position" + } + } +) +ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"].append("FetchReachProMP-v1") +""" diff --git a/fancy_gym/open_ai/deprecated_needs_gym_robotics/__init__.py b/fancy_gym/open_ai/deprecated_needs_gym_robotics/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/alr_envs/open_ai/robotics/__init__.py 
b/fancy_gym/open_ai/deprecated_needs_gym_robotics/robotics/__init__.py similarity index 100% rename from alr_envs/open_ai/robotics/__init__.py rename to fancy_gym/open_ai/deprecated_needs_gym_robotics/robotics/__init__.py diff --git a/alr_envs/dmc/suite/ball_in_cup/__init__.py b/fancy_gym/open_ai/deprecated_needs_gym_robotics/robotics/fetch/__init__.py similarity index 100% rename from alr_envs/dmc/suite/ball_in_cup/__init__.py rename to fancy_gym/open_ai/deprecated_needs_gym_robotics/robotics/fetch/__init__.py diff --git a/alr_envs/open_ai/robotics/fetch/mp_wrapper.py b/fancy_gym/open_ai/deprecated_needs_gym_robotics/robotics/fetch/mp_wrapper.py similarity index 94% rename from alr_envs/open_ai/robotics/fetch/mp_wrapper.py rename to fancy_gym/open_ai/deprecated_needs_gym_robotics/robotics/fetch/mp_wrapper.py index 218e175..e96698a 100644 --- a/alr_envs/open_ai/robotics/fetch/mp_wrapper.py +++ b/fancy_gym/open_ai/deprecated_needs_gym_robotics/robotics/fetch/mp_wrapper.py @@ -2,10 +2,10 @@ from typing import Union import numpy as np -from mp_env_api import MPEnvWrapper +from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper -class MPWrapper(MPEnvWrapper): +class MPWrapper(RawInterfaceWrapper): @property def active_obs(self): diff --git a/alr_envs/open_ai/mujoco/__init__.py b/fancy_gym/open_ai/mujoco/__init__.py similarity index 100% rename from alr_envs/open_ai/mujoco/__init__.py rename to fancy_gym/open_ai/mujoco/__init__.py diff --git a/alr_envs/dmc/suite/reacher/__init__.py b/fancy_gym/open_ai/mujoco/reacher_v2/__init__.py similarity index 100% rename from alr_envs/dmc/suite/reacher/__init__.py rename to fancy_gym/open_ai/mujoco/reacher_v2/__init__.py diff --git a/fancy_gym/open_ai/mujoco/reacher_v2/mp_wrapper.py b/fancy_gym/open_ai/mujoco/reacher_v2/mp_wrapper.py new file mode 100644 index 0000000..b2fa04c --- /dev/null +++ b/fancy_gym/open_ai/mujoco/reacher_v2/mp_wrapper.py @@ -0,0 +1,26 @@ +from typing import Union + +import numpy as np + +from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper + + +class MPWrapper(RawInterfaceWrapper): + + @property + def current_vel(self) -> Union[float, int, np.ndarray]: + return self.sim.data.qvel[:2] + + @property + def current_pos(self) -> Union[float, int, np.ndarray]: + return self.sim.data.qpos[:2] + + @property + def context_mask(self): + return np.concatenate([ + [False] * 2, # cos of two links + [False] * 2, # sin of two links + [True] * 2, # goal position + [False] * 2, # angular velocity + [False] * 3, # goal distance + ]) diff --git a/alr_envs/alr/mujoco/ball_in_a_cup/__init__.py b/fancy_gym/utils/__init__.py similarity index 100% rename from alr_envs/alr/mujoco/ball_in_a_cup/__init__.py rename to fancy_gym/utils/__init__.py diff --git a/fancy_gym/utils/make_env_helpers.py b/fancy_gym/utils/make_env_helpers.py new file mode 100644 index 0000000..5221423 --- /dev/null +++ b/fancy_gym/utils/make_env_helpers.py @@ -0,0 +1,356 @@ +import logging +import re +import uuid +from collections.abc import MutableMapping +from copy import deepcopy +from math import ceil +from typing import Iterable, Type, Union + +import gym +import numpy as np +from gym.envs.registration import register, registry + +try: + from dm_control import suite, manipulation +except ImportError: + pass + +try: + import metaworld +except Exception: + # catch Exception as Import error does not catch missing mujoco-py + pass + +import fancy_gym +from fancy_gym.black_box.black_box_wrapper import BlackBoxWrapper +from 
fancy_gym.black_box.factory.basis_generator_factory import get_basis_generator +from fancy_gym.black_box.factory.controller_factory import get_controller +from fancy_gym.black_box.factory.phase_generator_factory import get_phase_generator +from fancy_gym.black_box.factory.trajectory_generator_factory import get_trajectory_generator +from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper +from fancy_gym.utils.time_aware_observation import TimeAwareObservation +from fancy_gym.utils.utils import nested_update + + +def make_rank(env_id: str, seed: int, rank: int = 0, return_callable=True, **kwargs): + """ + TODO: Do we need this? + Generate a callable to create a new gym environment with a given seed. + The rank is added to the seed and can be used for example when using vector environments. + E.g. [make_rank("my_env_name-v0", 123, i) for i in range(8)] creates a list of 8 environments + with seeds 123 through 130. + Hence, testing environments should be seeded with a value which is offset by the number of training environments. + Here e.g. [make_rank("my_env_name-v0", 123 + 8, i) for i in range(5)] for 5 testing environmetns + + Args: + env_id: name of the environment + seed: seed for deterministic behaviour + rank: environment rank for deterministic over multiple seeds behaviour + return_callable: If True returns a callable to create the environment instead of the environment itself. + + Returns: + + """ + + def f(): + return make(env_id, seed + rank, **kwargs) + + return f if return_callable else f() + + +def make(env_id: str, seed: int, **kwargs): + """ + Converts an env_id to an environment with the gym API. + This also works for DeepMind Control Suite environments that are wrapped using the DMCWrapper, they can be + specified with "dmc:domain_name-task_name" + Analogously, metaworld tasks can be created as "metaworld:env_id-v2". + + Args: + env_id: spec or env_id for gym tasks, external environments require a domain specification + **kwargs: Additional kwargs for the constructor such as pixel observations, etc. + + Returns: Gym environment + + """ + + if ':' in env_id: + split_id = env_id.split(':') + framework, env_id = split_id[-2:] + else: + framework = None + + if framework == 'metaworld': + # MetaWorld environment + env = make_metaworld(env_id, seed, **kwargs) + elif framework == 'dmc': + # DeepMind Control environment + env = make_dmc(env_id, seed, **kwargs) + else: + env = make_gym(env_id, seed, **kwargs) + + env.seed(seed) + env.action_space.seed(seed) + env.observation_space.seed(seed) + + return env + + +def _make_wrapped_env(env_id: str, wrappers: Iterable[Type[gym.Wrapper]], seed=1, **kwargs): + """ + Helper function for creating a wrapped gym environment using MPs. + It adds all provided wrappers to the specified environment and verifies at least one RawInterfaceWrapper is + provided to expose the interface for MPs. + + Args: + env_id: name of the environment + wrappers: list of wrappers (at least an RawInterfaceWrapper), + seed: seed of environment + + Returns: gym environment with all specified wrappers applied + + """ + # _env = gym.make(env_id) + _env = make(env_id, seed, **kwargs) + has_black_box_wrapper = False + for w in wrappers: + # only wrap the environment if not BlackBoxWrapper, e.g. 
for vision + if issubclass(w, RawInterfaceWrapper): + has_black_box_wrapper = True + _env = w(_env) + if not has_black_box_wrapper: + raise ValueError("A RawInterfaceWrapper is required in order to leverage movement primitive environments.") + return _env + + +def make_bb( + env_id: str, wrappers: Iterable, black_box_kwargs: MutableMapping, traj_gen_kwargs: MutableMapping, + controller_kwargs: MutableMapping, phase_kwargs: MutableMapping, basis_kwargs: MutableMapping, seed: int = 1, + **kwargs): + """ + This can also be used standalone for manually building a custom movement primitive environment. + Args: + black_box_kwargs: kwargs for the black-box wrapper + basis_kwargs: kwargs for the basis generator + phase_kwargs: kwargs for the phase generator + controller_kwargs: kwargs for the tracking controller + env_id: base_env_name, + wrappers: list of wrappers (at least a RawInterfaceWrapper), + seed: seed of environment + traj_gen_kwargs: kwargs for the trajectory generator, e.g. the trajectory_generator_type + + Returns: Black-box wrapped gym env + + """ + _verify_time_limit(traj_gen_kwargs.get("duration", None), kwargs.get("time_limit", None)) + + learn_sub_trajs = black_box_kwargs.get('learn_sub_trajectories') + do_replanning = black_box_kwargs.get('replanning_schedule') + if learn_sub_trajs and do_replanning: + raise ValueError('Cannot use sub-trajectory learning and replanning together.') + + # add time_step observation when replanning + if (learn_sub_trajs or do_replanning) and not any(issubclass(w, TimeAwareObservation) for w in wrappers): + # Add as first wrapper in order to alter observation + wrappers.insert(0, TimeAwareObservation) + + env = _make_wrapped_env(env_id=env_id, wrappers=wrappers, seed=seed, **kwargs) + + traj_gen_kwargs['action_dim'] = traj_gen_kwargs.get('action_dim', np.prod(env.action_space.shape).item()) + + if black_box_kwargs.get('duration') is None: + black_box_kwargs['duration'] = env.spec.max_episode_steps * env.dt + if phase_kwargs.get('tau') is None: + phase_kwargs['tau'] = black_box_kwargs['duration'] + + if learn_sub_trajs is not None: + # We have to learn the trajectory length when learning sub-trajectories + phase_kwargs['learn_tau'] = True + + phase_gen = get_phase_generator(**phase_kwargs) + basis_gen = get_basis_generator(phase_generator=phase_gen, **basis_kwargs) + controller = get_controller(**controller_kwargs) + traj_gen = get_trajectory_generator(basis_generator=basis_gen, **traj_gen_kwargs) + + bb_env = BlackBoxWrapper(env, trajectory_generator=traj_gen, tracking_controller=controller, + **black_box_kwargs) + + return bb_env + + +def make_bb_env_helper(**kwargs): + """ + Helper function for registering a black box gym environment. + Args: + **kwargs: expects at least the following: + { + "name": base environment name. + "wrappers": list of wrappers (at least a RawInterfaceWrapper is required), + "traj_gen_kwargs": { + "trajectory_generator_type": type_of_your_movement_primitive, + non default arguments for the movement primitive instance + ... + } + "controller_kwargs": { + "controller_type": type_of_your_controller, + non default arguments for the tracking_controller instance + ... + }, + "basis_generator_kwargs": { + "basis_generator_type": type_of_your_basis_generator, + non default arguments for the basis generator instance + ... + }, + "phase_generator_kwargs": { + "phase_generator_type": type_of_your_phase_generator, + non default arguments for the phase generator instance + ...
+ }, + } + + Returns: MP wrapped gym env + + """ + seed = kwargs.pop("seed", None) + wrappers = kwargs.pop("wrappers") + + traj_gen_kwargs = kwargs.pop("trajectory_generator_kwargs", {}) + black_box_kwargs = kwargs.pop('black_box_kwargs', {}) + contr_kwargs = kwargs.pop("controller_kwargs", {}) + phase_kwargs = kwargs.pop("phase_generator_kwargs", {}) + basis_kwargs = kwargs.pop("basis_generator_kwargs", {}) + + return make_bb(env_id=kwargs.pop("name"), wrappers=wrappers, + black_box_kwargs=black_box_kwargs, + traj_gen_kwargs=traj_gen_kwargs, controller_kwargs=contr_kwargs, + phase_kwargs=phase_kwargs, + basis_kwargs=basis_kwargs, **kwargs, seed=seed) + + +def make_dmc( + env_id: str, + seed: int = None, + visualize_reward: bool = True, + time_limit: Union[None, float] = None, + **kwargs +): + if not re.match(r"\w+-\w+", env_id): + raise ValueError("env_id does not have the following structure: 'domain_name-task_name'") + domain_name, task_name = env_id.split("-") + + if task_name.endswith("_vision"): + # TODO + raise ValueError("The vision interface for manipulation tasks is currently not supported.") + + if (domain_name, task_name) not in suite.ALL_TASKS and task_name not in manipulation.ALL: + raise ValueError(f'Specified domain "{domain_name}" and task "{task_name}" combination does not exist.') + + # env_id = f'dmc_{domain_name}_{task_name}_{seed}-v1' + gym_id = uuid.uuid4().hex + '-v1' + + task_kwargs = {'random': seed} + if time_limit is not None: + task_kwargs['time_limit'] = time_limit + + # create task + # Accessing private attribute because DMC does not expose time_limit or step_limit. + # Only the current time_step/time as well as the control_timestep can be accessed. + if domain_name == "manipulation": + env = manipulation.load(environment_name=task_name, seed=seed) + max_episode_steps = ceil(env._time_limit / env.control_timestep()) + else: + env = suite.load(domain_name=domain_name, task_name=task_name, task_kwargs=task_kwargs, + visualize_reward=visualize_reward, environment_kwargs=kwargs) + max_episode_steps = int(env._step_limit) + + register( + id=gym_id, + entry_point='fancy_gym.dmc.dmc_wrapper:DMCWrapper', + kwargs={'env': lambda: env}, + max_episode_steps=max_episode_steps, + ) + + env = gym.make(gym_id) + env.seed(seed) + return env + + +def make_metaworld(env_id: str, seed: int, **kwargs): + if env_id not in metaworld.ML1.ENV_NAMES: + raise ValueError(f'Specified environment "{env_id}" not present in metaworld ML1.') + + _env = metaworld.envs.ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE[env_id + "-goal-observable"](seed=seed, **kwargs) + + # setting this avoids generating the same initialization after each reset + _env._freeze_rand_vec = False + # New argument to use global seeding + _env.seeded_rand_vec = True + + gym_id = uuid.uuid4().hex + '-v1' + + register( + id=gym_id, + entry_point=lambda: _env, + max_episode_steps=_env.max_path_length, + ) + + # TODO enable checker when the incorrect dtype of obs and observation space are fixed by metaworld + env = gym.make(gym_id, disable_env_checker=True) + return env + + +def make_gym(env_id, seed, **kwargs): + """ + Create + Args: + env_id: + seed: + **kwargs: + + Returns: + + """ + # Getting the existing keywords to allow for nested dict updates for BB envs + # gym only allows for non nested updates. 
+ try: + all_kwargs = deepcopy(registry.get(env_id).kwargs) + except AttributeError as e: + logging.error(f'The gym environment with id {env_id} could not be found.') + raise e + nested_update(all_kwargs, kwargs) + kwargs = all_kwargs + + # Add seed to kwargs for bb environments to pass seed to step environments + all_bb_envs = sum(fancy_gym.ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS.values(), []) + if env_id in all_bb_envs: + kwargs.update({"seed": seed}) + + # Gym + env = gym.make(env_id, **kwargs) + return env + + +def _verify_time_limit(mp_time_limit: Union[None, float], env_time_limit: Union[None, float]): + """ + When using DMC, check if a manually specified time limit matches the trajectory duration the MP receives. + Usually, the time_limit for DMC is not specified and the default values from DMC are taken. + This check, however, can only be done after instantiating the environment. + It can be found in the BaseMP class. + + Args: + mp_time_limit: max trajectory length of traj_gen in seconds + env_time_limit: max trajectory length of DMC environment in seconds + + Returns: + + """ + if mp_time_limit is not None and env_time_limit is not None: + assert mp_time_limit == env_time_limit, \ + f"The specified 'time_limit' of {env_time_limit}s does not match " \ + f"the duration of {mp_time_limit}s for the MP." + + +def _verify_dof(base_env: gym.Env, dof: int): + action_shape = np.prod(base_env.action_space.shape) + assert dof == action_shape, \ + f"The specified degrees of freedom ('num_dof') {dof} do not match " \ + f"the action space dimensionality {action_shape} of the base environment." diff --git a/fancy_gym/utils/time_aware_observation.py b/fancy_gym/utils/time_aware_observation.py new file mode 100644 index 0000000..b2cbc78 --- /dev/null +++ b/fancy_gym/utils/time_aware_observation.py @@ -0,0 +1,78 @@ +""" +Adapted from: https://github.com/openai/gym/blob/907b1b20dd9ac0cba5803225059b9c6673702467/gym/wrappers/time_aware_observation.py +License: MIT +Copyright (c) 2016 OpenAI (https://openai.com) + +Wrapper for adding time aware observations to environment observations. +""" +import gym +import numpy as np +from gym.spaces import Box + + +class TimeAwareObservation(gym.ObservationWrapper): + """Augment the observation with the current time step in the episode. + + The observation space of the wrapped environment is assumed to be a flat :class:`Box`. + In particular, pixel observations are not supported. This wrapper will append the current timestep + within the current episode to the observation. + + Example: + >>> import gym + >>> env = gym.make('CartPole-v1') + >>> env = TimeAwareObservation(env) + >>> env.reset() + array([ 0.03810719, 0.03522411, 0.02231044, -0.01088205, 0. ]) + >>> env.step(env.action_space.sample())[0] + array([ 0.03881167, -0.16021058, 0.0220928 , 0.28875574, 1. ]) + """ + + def __init__(self, env: gym.Env): + """Initialize :class:`TimeAwareObservation` that requires an environment with a flat :class:`Box` + observation space. + + Args: + env: The environment to apply the wrapper + """ + super().__init__(env) + assert isinstance(env.observation_space, Box) + low = np.append(self.observation_space.low, 0.0) + high = np.append(self.observation_space.high, 1.0) + self.observation_space = Box(low, high, dtype=self.observation_space.dtype) + self.t = 0 + self._max_episode_steps = env.spec.max_episode_steps + + def observation(self, observation): + """Appends the current time step, normalized by the maximum number of episode steps, to the observation.
+ + Args: + observation: The observation to add the time step to + + Returns: + The observation with the time step appended to + """ + return np.append(observation, self.t / self._max_episode_steps) + + def step(self, action): + """Steps through the environment, incrementing the time step. + + Args: + action: The action to take + + Returns: + The environment's step using the action. + """ + self.t += 1 + return super().step(action) + + def reset(self, **kwargs): + """Reset the environment setting the time to zero. + + Args: + **kwargs: Kwargs to apply to env.reset() + + Returns: + The reset environment + """ + self.t = 0 + return super().reset(**kwargs) diff --git a/fancy_gym/utils/utils.py b/fancy_gym/utils/utils.py new file mode 100644 index 0000000..f4fea74 --- /dev/null +++ b/fancy_gym/utils/utils.py @@ -0,0 +1,50 @@ +from collections.abc import Mapping, MutableMapping + +import numpy as np +import torch as ch + + +def angle_normalize(x, type="deg"): + """ + normalize angle x to [-pi,pi]. + Args: + x: Angle in either degrees or radians + type: one of "deg" or "rad" for x being in degrees or radians + + Returns: + + """ + + if type not in ["deg", "rad"]: raise ValueError(f"Invalid type {type}. Choose one of 'deg' or 'rad'.") + + if type == "deg": + x = np.deg2rad(x) # x * pi / 180 + + two_pi = 2 * np.pi + return x - two_pi * np.floor((x + np.pi) / two_pi) + + +def get_numpy(x: ch.Tensor): + """ + Returns numpy array from torch tensor + Args: + x: + + Returns: + + """ + return x.detach().cpu().numpy() + + +def nested_update(base: MutableMapping, update): + """ + Updated method for nested Mappings + Args: + base: main Mapping to be updated + update: updated values for base Mapping + + """ + for k, v in update.items(): + base[k] = nested_update(base.get(k, {}), v) if isinstance(v, Mapping) else v + return base + diff --git a/setup.py b/setup.py index b5aa424..1148e85 100644 --- a/setup.py +++ b/setup.py @@ -1,24 +1,46 @@ -from setuptools import setup +import itertools + +from setuptools import setup, find_packages + +# Environment-specific dependencies for dmc and metaworld +extras = { + "dmc": ["dm_control>=1.0.1"], + "metaworld": ["metaworld @ git+https://github.com/rlworkgroup/metaworld.git@master#egg=metaworld", + 'mujoco-py<2.2,>=2.1', + 'scipy' + ], +} + +# All dependencies +all_groups = set(extras.keys()) +extras["all"] = list(set(itertools.chain.from_iterable(map(lambda group: extras[group], all_groups)))) setup( - name='alr_envs', - version='0.0.1', - packages=['alr_envs', 'alr_envs.alr', 'alr_envs.open_ai', 'alr_envs.dmc', 'alr_envs.meta', 'alr_envs.utils'], - install_requires=[ - 'gym', - 'PyQt5', - 'matplotlib', - 'mp_env_api @ git+https://github.com/ALRhub/motion_primitive_env_api.git', -# 'mp_env_api @ git+ssh://git@github.com/ALRhub/motion_primitive_env_api.git', - 'mujoco-py<2.1,>=2.0', - 'dm_control', - 'metaworld @ git+https://github.com/rlworkgroup/metaworld.git@master#egg=metaworld', + author='Fabian Otto, Onur Celik', + name='fancy_gym', + version='0.2', + classifiers=[ + # Python 3.7 is minimally supported + "Programming Language :: Python :: 3", + "Programming Language :: Python :: 3.7", + "Programming Language :: Python :: 3.8", + "Programming Language :: Python :: 3.9", + "Programming Language :: Python :: 3.10", ], - - url='https://github.com/ALRhub/alr_envs/', - license='MIT', - author='Fabian Otto, Marcel Sandermann, Maximilian Huettenrauch', + extras_require=extras, + install_requires=[ + 'gym[mujoco]<0.25.0,>=0.24.0', + 'mp_pytorch @ 
git+https://github.com/ALRhub/MP_PyTorch.git@main' + ], + packages=[package for package in find_packages() if package.startswith("fancy_gym")], + package_data={ + "fancy_gym": [ + "envs/mujoco/*/assets/*.xml", + ] + }, + python_requires=">=3.7", + url='https://github.com/ALRhub/fancy_gym/', + # license='AGPL-3.0 license', author_email='', - description='Custom Gym environments for various (robotics) tasks. integration of DMC environments into the' - 'gym interface, and support for using motion primitives with gym environments.' + description='Fancy Gym: Unifying interface for various RL benchmarks with support for Black Box approaches.' ) diff --git a/test/test_dmc_envs.py b/test/test_dmc.py similarity index 51% rename from test/test_dmc_envs.py rename to test/test_dmc.py index a7cd9be..0800a67 100644 --- a/test/test_dmc_envs.py +++ b/test/test_dmc.py @@ -2,52 +2,49 @@ import unittest import gym import numpy as np - from dm_control import suite, manipulation -from alr_envs import make +import fancy_gym +from fancy_gym import make -DMC_ENVS = [f'{env}-{task}' for env, task in suite.ALL_TASKS if env != "lqr"] -MANIPULATION_SPECS = [f'manipulation-{task}' for task in manipulation.ALL if task.endswith('_features')] +SUITE_IDS = [f'dmc:{env}-{task}' for env, task in suite.ALL_TASKS if env != "lqr"] +MANIPULATION_IDS = [f'dmc:manipulation-{task}' for task in manipulation.ALL if task.endswith('_features')] SEED = 1 -class TestStepDMCEnvironments(unittest.TestCase): +class TestDMCEnvironments(unittest.TestCase): def _run_env(self, env_id, iterations=None, seed=SEED, render=False): """ Example for running a DMC based env in the step based setting. - The env_id has to be specified as `domain_name-task_name` or + The env_id has to be specified as `dmc:domain_name-task_name` or for manipulation tasks as `manipulation-environment_name` Args: - env_id: Either `domain_name-task_name` or `manipulation-environment_name` + env_id: Either `dmc:domain_name-task_name` or `dmc:manipulation-environment_name` iterations: Number of rollout steps to run - seed= random seeding + seed: random seeding render: Render the episode - Returns: + Returns: observations, rewards, dones, actions """ env: gym.Env = make(env_id, seed=seed) rewards = [] observations = [] + actions = [] dones = [] obs = env.reset() self._verify_observations(obs, env.observation_space, "reset()") - length = env.spec.max_episode_steps - if iterations is None: - if length is None: - iterations = 1 - else: - iterations = length + iterations = iterations or (env.spec.max_episode_steps or 1) # number of samples(multiple environment steps) for i in range(iterations): observations.append(obs) ac = env.action_space.sample() + actions.append(ac) # ac = np.random.uniform(env.action_space.low, env.action_space.high, env.action_space.shape) obs, reward, done, info = env.step(ac) @@ -62,13 +59,26 @@ class TestStepDMCEnvironments(unittest.TestCase): env.render("human") if done: - obs = env.reset() + break - assert done, "Done flag is not True after max episode length." + assert done, "Done flag is not True after end of episode." 
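The `nested_update` helper added in `fancy_gym/utils/utils.py` above is what lets `make_gym` merge user-supplied kwargs into the nested dictionaries a black-box environment was registered with; a flat `dict.update` would replace whole sub-dicts. A small illustration, assuming the package and its dependencies are importable:

```python
from fancy_gym.utils.utils import nested_update

registered = {"controller_kwargs": {"p_gains": 1.0, "d_gains": 0.1}, "name": "Reacher-v2"}
override = {"controller_kwargs": {"p_gains": 0.6}}

nested_update(registered, override)
# Only the nested key changes; sibling keys survive:
# {'controller_kwargs': {'p_gains': 0.6, 'd_gains': 0.1}, 'name': 'Reacher-v2'}
```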
observations.append(obs) env.close() del env - return np.array(observations), np.array(rewards), np.array(dones) + return np.array(observations), np.array(rewards), np.array(dones), np.array(actions) + + def _run_env_determinism(self, ids): + seed = 0 + for env_id in ids: + with self.subTest(msg=env_id): + traj1 = self._run_env(env_id, seed=seed) + traj2 = self._run_env(env_id, seed=seed) + for i, time_step in enumerate(zip(*traj1, *traj2)): + obs1, rwd1, done1, ac1, obs2, rwd2, done2, ac2 = time_step + self.assertTrue(np.array_equal(obs1, obs2), f"Observations [{i}] delta {obs1 - obs2} is not zero.") + self.assertTrue(np.array_equal(ac1, ac2), f"Actions [{i}] delta {ac1 - ac2} is not zero.") + self.assertEqual(done1, done2, f"Dones [{i}] {done1} and {done2} do not match.") + self.assertEqual(rwd1, rwd2, f"Rewards [{i}] {rwd1} and {rwd2} do not match.") def _verify_observations(self, obs, observation_space, obs_type="reset()"): self.assertTrue(observation_space.contains(obs), @@ -76,51 +86,44 @@ class TestStepDMCEnvironments(unittest.TestCase): f"not contained in observation space {observation_space}.") def _verify_reward(self, reward): - self.assertIsInstance(reward, float, f"Returned {reward} as reward, expected float.") + self.assertIsInstance(reward, (float, int), f"Returned type {type(reward)} as reward, expected float or int.") def _verify_done(self, done): self.assertIsInstance(done, bool, f"Returned {done} as done flag, expected bool.") - def test_dmc_functionality(self): - """Tests that environments runs without errors using random actions.""" - for env_id in DMC_ENVS: + def test_suite_functionality(self): + """Tests that suite step environments run without errors using random actions.""" + for env_id in SUITE_IDS: with self.subTest(msg=env_id): self._run_env(env_id) - def test_dmc_determinism(self): - """Tests that identical seeds produce identical trajectories.""" - seed = 0 - # Iterate over two trajectories, which should have the same state and action sequence - for env_id in DMC_ENVS: - with self.subTest(msg=env_id): - traj1 = self._run_env(env_id, seed=seed) - traj2 = self._run_env(env_id, seed=seed) - for i, time_step in enumerate(zip(*traj1, *traj2)): - obs1, rwd1, done1, obs2, rwd2, done2 = time_step - self.assertTrue(np.array_equal(obs1, obs2), f"Observations [{i}] {obs1} and {obs2} do not match.") - self.assertEqual(rwd1, rwd2, f"Rewards [{i}] {rwd1} and {rwd2} do not match.") - self.assertEqual(done1, done2, f"Dones [{i}] {done1} and {done2} do not match.") + def test_suite_determinism(self): + """Tests that for step environments identical seeds produce identical trajectories.""" + self._run_env_determinism(SUITE_IDS) def test_manipulation_functionality(self): - """Tests that environments runs without errors using random actions.""" - for env_id in MANIPULATION_SPECS: + """Tests that manipulation step environments run without errors using random actions.""" + for env_id in MANIPULATION_IDS: with self.subTest(msg=env_id): self._run_env(env_id) def test_manipulation_determinism(self): - """Tests that identical seeds produce identical trajectories.""" - seed = 0 - # Iterate over two trajectories, which should have the same state and action sequence - for env_id in MANIPULATION_SPECS: - with self.subTest(msg=env_id): - traj1 = self._run_env(env_id, seed=seed) - traj2 = self._run_env(env_id, seed=seed) - for i, time_step in enumerate(zip(*traj1, *traj2)): - obs1, rwd1, done1, obs2, rwd2, done2 = time_step - self.assertTrue(np.array_equal(obs1, obs2), f"Observations [{i}] 
{obs1} and {obs2} do not match.") - self.assertEqual(rwd1, rwd2, f"Rewards [{i}] {rwd1} and {rwd2} do not match.") - self.assertEqual(done1, done2, f"Dones [{i}] {done1} and {done2} do not match.") - self.assertEqual(done1, done2, f"Dones [{i}] {done1} and {done2} do not match.") + """Tests that for step environments identical seeds produce identical trajectories.""" + self._run_env_determinism(MANIPULATION_IDS) + + def test_bb_functionality(self): + """Tests that black box environments run without errors using random actions.""" + for traj_gen, env_ids in fancy_gym.ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS.items(): + with self.subTest(msg=traj_gen): + for id in env_ids: + with self.subTest(msg=id): + self._run_env(id) + + def test_bb_determinism(self): + """Tests that for black box environment identical seeds produce identical trajectories.""" + for traj_gen, env_ids in fancy_gym.ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS.items(): + with self.subTest(msg=traj_gen): + self._run_env_determinism(env_ids) if __name__ == '__main__': diff --git a/test/test_envs.py b/test/test_envs.py deleted file mode 100644 index b3263ba..0000000 --- a/test/test_envs.py +++ /dev/null @@ -1,172 +0,0 @@ -import unittest - -import gym -import numpy as np - -import alr_envs # noqa -from alr_envs.utils.make_env_helpers import make - -ALL_SPECS = list(spec for spec in gym.envs.registry.all() if "alr_envs" in spec.entry_point) -SEED = 1 - - -class TestMPEnvironments(unittest.TestCase): - - def _run_env(self, env_id, iterations=None, seed=SEED, render=False): - """ - Example for running a DMC based env in the step based setting. - The env_id has to be specified as `domain_name-task_name` or - for manipulation tasks as `manipulation-environment_name` - - Args: - env_id: Either `domain_name-task_name` or `manipulation-environment_name` - iterations: Number of rollout steps to run - seed= random seeding - render: Render the episode - - Returns: - - """ - env: gym.Env = make(env_id, seed=seed) - rewards = [] - observations = [] - dones = [] - obs = env.reset() - self._verify_observations(obs, env.observation_space, "reset()") - - length = env.spec.max_episode_steps - if iterations is None: - if length is None: - iterations = 1 - else: - iterations = length - - # number of samples(multiple environment steps) - for i in range(iterations): - observations.append(obs) - - ac = env.action_space.sample() - # ac = np.random.uniform(env.action_space.low, env.action_space.high, env.action_space.shape) - obs, reward, done, info = env.step(ac) - - self._verify_observations(obs, env.observation_space, "step()") - self._verify_reward(reward) - self._verify_done(done) - - rewards.append(reward) - dones.append(done) - - if render: - env.render("human") - - if done: - obs = env.reset() - - assert done, "Done flag is not True after max episode length." 
- observations.append(obs) - env.close() - del env - return np.array(observations), np.array(rewards), np.array(dones) - - def _run_env_determinism(self, ids): - seed = 0 - for env_id in ids: - with self.subTest(msg=env_id): - traj1 = self._run_env(env_id, seed=seed) - traj2 = self._run_env(env_id, seed=seed) - for i, time_step in enumerate(zip(*traj1, *traj2)): - obs1, rwd1, done1, obs2, rwd2, done2 = time_step - self.assertTrue(np.array_equal(obs1, obs2), f"Observations [{i}] {obs1} and {obs2} do not match.") - self.assertEqual(rwd1, rwd2, f"Rewards [{i}] {rwd1} and {rwd2} do not match.") - self.assertEqual(done1, done2, f"Dones [{i}] {done1} and {done2} do not match.") - - def _verify_observations(self, obs, observation_space, obs_type="reset()"): - self.assertTrue(observation_space.contains(obs), - f"Observation {obs} received from {obs_type} " - f"not contained in observation space {observation_space}.") - - def _verify_reward(self, reward): - self.assertIsInstance(reward, float, f"Returned {reward} as reward, expected float.") - - def _verify_done(self, done): - self.assertIsInstance(done, bool, f"Returned {done} as done flag, expected bool.") - - def test_alr_environment_functionality(self): - """Tests that environments runs without errors using random actions for ALR MP envs.""" - with self.subTest(msg="DMP"): - for env_id in alr_envs.ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS['DMP']: - with self.subTest(msg=env_id): - self._run_env(env_id) - - with self.subTest(msg="ProMP"): - for env_id in alr_envs.ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS['ProMP']: - with self.subTest(msg=env_id): - self._run_env(env_id) - - def test_openai_environment_functionality(self): - """Tests that environments runs without errors using random actions for OpenAI gym MP envs.""" - with self.subTest(msg="DMP"): - for env_id in alr_envs.ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS['DMP']: - with self.subTest(msg=env_id): - self._run_env(env_id) - - with self.subTest(msg="ProMP"): - for env_id in alr_envs.ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS['ProMP']: - with self.subTest(msg=env_id): - self._run_env(env_id) - - def test_dmc_environment_functionality(self): - """Tests that environments runs without errors using random actions for DMC MP envs.""" - with self.subTest(msg="DMP"): - for env_id in alr_envs.ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS['DMP']: - with self.subTest(msg=env_id): - self._run_env(env_id) - - with self.subTest(msg="ProMP"): - for env_id in alr_envs.ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS['ProMP']: - with self.subTest(msg=env_id): - self._run_env(env_id) - - def test_metaworld_environment_functionality(self): - """Tests that environments runs without errors using random actions for Metaworld MP envs.""" - with self.subTest(msg="DMP"): - for env_id in alr_envs.ALL_METAWORLD_MOTION_PRIMITIVE_ENVIRONMENTS['DMP']: - with self.subTest(msg=env_id): - self._run_env(env_id) - - with self.subTest(msg="ProMP"): - for env_id in alr_envs.ALL_METAWORLD_MOTION_PRIMITIVE_ENVIRONMENTS['ProMP']: - with self.subTest(msg=env_id): - self._run_env(env_id) - - def test_alr_environment_determinism(self): - """Tests that identical seeds produce identical trajectories for ALR MP Envs.""" - with self.subTest(msg="DMP"): - self._run_env_determinism(alr_envs.ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"]) - with self.subTest(msg="ProMP"): - self._run_env_determinism(alr_envs.ALL_ALR_MOTION_PRIMITIVE_ENVIRONMENTS["ProMP"]) - - def test_openai_environment_determinism(self): - """Tests that identical seeds produce identical trajectories 
for OpenAI gym MP Envs.""" - with self.subTest(msg="DMP"): - self._run_env_determinism(alr_envs.ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"]) - with self.subTest(msg="ProMP"): - self._run_env_determinism(alr_envs.ALL_GYM_MOTION_PRIMITIVE_ENVIRONMENTS["ProMP"]) - - def test_dmc_environment_determinism(self): - """Tests that identical seeds produce identical trajectories for DMC MP Envs.""" - with self.subTest(msg="DMP"): - self._run_env_determinism(alr_envs.ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"]) - with self.subTest(msg="ProMP"): - self._run_env_determinism(alr_envs.ALL_DEEPMIND_MOTION_PRIMITIVE_ENVIRONMENTS["ProMP"]) - - def test_metaworld_environment_determinism(self): - """Tests that identical seeds produce identical trajectories for Metaworld MP Envs.""" - with self.subTest(msg="DMP"): - self._run_env_determinism(alr_envs.ALL_METAWORLD_MOTION_PRIMITIVE_ENVIRONMENTS["DMP"]) - with self.subTest(msg="ProMP"): - self._run_env_determinism(alr_envs.ALL_METAWORLD_MOTION_PRIMITIVE_ENVIRONMENTS["ProMP"]) - - -if __name__ == '__main__': - unittest.main() diff --git a/test/test_metaworld_envs.py b/test/test_fancy.py similarity index 59% rename from test/test_metaworld_envs.py rename to test/test_fancy.py index b84ba3c..d4890cc 100644 --- a/test/test_metaworld_envs.py +++ b/test/test_fancy.py @@ -3,14 +3,15 @@ import unittest import gym import numpy as np -from alr_envs import make -from metaworld.envs import ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE +import fancy_gym # noqa +from fancy_gym.utils.make_env_helpers import make -ALL_ENVS = [env.split("-goal-observable")[0] for env, _ in ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE.items()] +CUSTOM_IDS = [spec.id for spec in gym.envs.registry.all() if + "fancy_gym" in spec.entry_point and 'make_bb_env_helper' not in spec.entry_point] SEED = 1 -class TestStepMetaWorlEnvironments(unittest.TestCase): +class TestCustomEnvironments(unittest.TestCase): def _run_env(self, env_id, iterations=None, seed=SEED, render=False): """ @@ -21,26 +22,21 @@ class TestStepMetaWorlEnvironments(unittest.TestCase): Args: env_id: Either `domain_name-task_name` or `manipulation-environment_name` iterations: Number of rollout steps to run - seed= random seeding + seed: random seeding render: Render the episode - Returns: + Returns: observations, rewards, dones, actions """ env: gym.Env = make(env_id, seed=seed) rewards = [] - observations = [] actions = [] + observations = [] dones = [] obs = env.reset() self._verify_observations(obs, env.observation_space, "reset()") - length = env.max_path_length - if iterations is None: - if length is None: - iterations = 1 - else: - iterations = length + iterations = iterations or (env.spec.max_episode_steps or 1) # number of samples(multiple environment steps) for i in range(iterations): @@ -48,7 +44,6 @@ class TestStepMetaWorlEnvironments(unittest.TestCase): ac = env.action_space.sample() actions.append(ac) - # ac = np.random.uniform(env.action_space.low, env.action_space.high, env.action_space.shape) obs, reward, done, info = env.step(ac) self._verify_observations(obs, env.observation_space, "step()") @@ -62,36 +57,17 @@ class TestStepMetaWorlEnvironments(unittest.TestCase): env.render("human") if done: - obs = env.reset() + break - assert done, "Done flag is not True after max episode length." + assert done, "Done flag is not True after end of episode." 
observations.append(obs) env.close() del env return np.array(observations), np.array(rewards), np.array(dones), np.array(actions) - def _verify_observations(self, obs, observation_space, obs_type="reset()"): - self.assertTrue(observation_space.contains(obs), - f"Observation {obs} received from {obs_type} " - f"not contained in observation space {observation_space}.") - - def _verify_reward(self, reward): - self.assertIsInstance(reward, float, f"Returned {reward} as reward, expected float.") - - def _verify_done(self, done): - self.assertIsInstance(done, bool, f"Returned {done} as done flag, expected bool.") - - def test_metaworld_functionality(self): - """Tests that environments runs without errors using random actions.""" - for env_id in ALL_ENVS: - with self.subTest(msg=env_id): - self._run_env(env_id) - - def test_metaworld_determinism(self): - """Tests that identical seeds produce identical trajectories.""" + def _run_env_determinism(self, ids): seed = 0 - # Iterate over two trajectories, which should have the same state and action sequence - for env_id in ALL_ENVS: + for env_id in ids: with self.subTest(msg=env_id): traj1 = self._run_env(env_id, seed=seed) traj2 = self._run_env(env_id, seed=seed) @@ -99,9 +75,44 @@ class TestStepMetaWorlEnvironments(unittest.TestCase): obs1, rwd1, done1, ac1, obs2, rwd2, done2, ac2 = time_step self.assertTrue(np.array_equal(ac1, ac2), f"Actions [{i}] delta {ac1 - ac2} is not zero.") self.assertTrue(np.array_equal(obs1, obs2), f"Observations [{i}] delta {obs1 - obs2} is not zero.") - self.assertAlmostEqual(rwd1, rwd2, f"Rewards [{i}] {rwd1} and {rwd2} do not match.") + self.assertEqual(rwd1, rwd2, f"Rewards [{i}] {rwd1} and {rwd2} do not match.") self.assertEqual(done1, done2, f"Dones [{i}] {done1} and {done2} do not match.") + def _verify_observations(self, obs, observation_space, obs_type="reset()"): + self.assertTrue(observation_space.contains(obs), + f"Observation {obs} received from {obs_type} " + f"not contained in observation space {observation_space}.") + + def _verify_reward(self, reward): + self.assertIsInstance(reward, (float, int), f"Returned type {type(reward)} as reward, expected float or int.") + + def _verify_done(self, done): + self.assertIsInstance(done, bool, f"Returned {done} as done flag, expected bool.") + + def test_step_functionality(self): + """Tests that step environments run without errors using random actions.""" + for env_id in CUSTOM_IDS: + with self.subTest(msg=env_id): + self._run_env(env_id) + + def test_step_determinism(self): + """Tests that for step environments identical seeds produce identical trajectories.""" + self._run_env_determinism(CUSTOM_IDS) + + def test_bb_functionality(self): + """Tests that black box environments run without errors using random actions.""" + for traj_gen, env_ids in fancy_gym.ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS.items(): + with self.subTest(msg=traj_gen): + for id in env_ids: + with self.subTest(msg=id): + self._run_env(id) + + def test_bb_determinism(self): + """Tests that for black box environment identical seeds produce identical trajectories.""" + for traj_gen, env_ids in fancy_gym.ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS.items(): + with self.subTest(msg=traj_gen): + self._run_env_determinism(env_ids) + if __name__ == '__main__': unittest.main() diff --git a/test/test_gym.py b/test/test_gym.py new file mode 100644 index 0000000..c8252d0 --- /dev/null +++ b/test/test_gym.py @@ -0,0 +1,118 @@ +import unittest + +import gym +import numpy as np + +import fancy_gym +from fancy_gym 
import make + +GYM_IDS = [spec.id for spec in gym.envs.registry.all() if + "fancy_gym" not in spec.entry_point and 'make_bb_env_helper' not in spec.entry_point] +SEED = 1 + + +class TestGymEnvironments(unittest.TestCase): + + def _run_env(self, env_id, iterations=None, seed=SEED, render=False): + """ + Example for running a openai gym env in the step based setting. + The env_id has to be specified as `env_id-vX`. + + Args: + env_id: env id in the form `env_id-vX` + iterations: Number of rollout steps to run + seed: random seeding + render: Render the episode + + Returns: + + """ + env: gym.Env = make(env_id, seed=seed) + rewards = [] + observations = [] + actions = [] + dones = [] + obs = env.reset() + self._verify_observations(obs, env.observation_space, "reset()") + + iterations = iterations or (env.spec.max_episode_steps or 1) + + # number of samples(multiple environment steps) + for i in range(iterations): + observations.append(obs) + + ac = env.action_space.sample() + actions.append(ac) + # ac = np.random.uniform(env.action_space.low, env.action_space.high, env.action_space.shape) + obs, reward, done, info = env.step(ac) + + self._verify_observations(obs, env.observation_space, "step()") + self._verify_reward(reward) + self._verify_done(done) + + rewards.append(reward) + dones.append(done) + + if render: + env.render("human") + + if done: + break + + assert done or env.spec.max_episode_steps is None, "Done flag is not True after end of episode." + observations.append(obs) + env.close() + del env + return np.array(observations), np.array(rewards), np.array(dones), np.array(actions) + + def _run_env_determinism(self, ids): + seed = 0 + for env_id in ids: + with self.subTest(msg=env_id): + traj1 = self._run_env(env_id, seed=seed) + traj2 = self._run_env(env_id, seed=seed) + for i, time_step in enumerate(zip(*traj1, *traj2)): + obs1, rwd1, done1, ac1, obs2, rwd2, done2, ac2 = time_step + self.assertTrue(np.array_equal(ac1, ac2), f"Actions [{i}] delta {ac1 - ac2} is not zero.") + self.assertTrue(np.array_equal(obs1, obs2), f"Observations [{i}] delta {obs1 - obs2} is not zero.") + self.assertEqual(rwd1, rwd2, f"Rewards [{i}] {rwd1} and {rwd2} do not match.") + self.assertEqual(done1, done2, f"Dones [{i}] {done1} and {done2} do not match.") + + def _verify_observations(self, obs, observation_space, obs_type="reset()"): + self.assertTrue(observation_space.contains(obs), + f"Observation {obs} received from {obs_type} " + f"not contained in observation space {observation_space}.") + + def _verify_reward(self, reward): + self.assertIsInstance(reward, (float, int), f"Returned type {type(reward)} as reward, expected float or int.") + + def _verify_done(self, done): + self.assertIsInstance(done, bool, f"Returned {done} as done flag, expected bool.") + + def test_step_functionality(self): + """Tests that step environments run without errors using random actions.""" + for env_id in GYM_IDS: + with self.subTest(msg=env_id): + self._run_env(env_id) + + def test_step_determinism(self): + """Tests that for step environments identical seeds produce identical trajectories.""" + self._run_env_determinism(GYM_IDS) + + def test_bb_functionality(self): + """Tests that black box environments run without errors using random actions.""" + for traj_gen, env_ids in fancy_gym.ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS.items(): + with self.subTest(msg=traj_gen): + for id in env_ids: + with self.subTest(msg=id): + self._run_env(id) + + def test_bb_determinism(self): + """Tests that for black box environment identical 
seeds produce identical trajectories.""" + for traj_gen, env_ids in fancy_gym.ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS.items(): + with self.subTest(msg=traj_gen): + self._run_env_determinism(env_ids) + + +if __name__ == '__main__': + unittest.main() diff --git a/test/test_metaworld.py b/test/test_metaworld.py new file mode 100644 index 0000000..f10d54a --- /dev/null +++ b/test/test_metaworld.py @@ -0,0 +1,119 @@ +import unittest + +import gym +import numpy as np +from metaworld.envs import ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE + +import fancy_gym +from fancy_gym import make + +METAWORLD_IDS = [f'metaworld:{env.split("-goal-observable")[0]}' for env, _ in + ALL_V2_ENVIRONMENTS_GOAL_OBSERVABLE.items()] +SEED = 1 + + +class TestMetaWorldEnvironments(unittest.TestCase): + + def _run_env(self, env_id, iterations=None, seed=SEED, render=False): + """ + Example for running a metaworld based env in the step based setting. + The env_id has to be specified as `metaworld:env_id-vX`. + + Args: + env_id: env id in the form `metaworld:env_id-vX` + iterations: Number of rollout steps to run + seed: random seeding + render: Render the episode + + Returns: + + """ + env: gym.Env = make(env_id, seed=seed) + rewards = [] + observations = [] + actions = [] + dones = [] + obs = env.reset() + self._verify_observations(obs, env.observation_space, "reset()") + + iterations = iterations or (env.spec.max_episode_steps or 1) + + # number of samples(multiple environment steps) + for i in range(iterations): + observations.append(obs) + + ac = env.action_space.sample() + actions.append(ac) + # ac = np.random.uniform(env.action_space.low, env.action_space.high, env.action_space.shape) + obs, reward, done, info = env.step(ac) + + self._verify_observations(obs, env.observation_space, "step()") + self._verify_reward(reward) + self._verify_done(done) + + rewards.append(reward) + dones.append(done) + + if render: + env.render("human") + + if done: + break + + assert done, "Done flag is not True after end of episode." 
+ observations.append(obs) + env.close() + del env + return np.array(observations), np.array(rewards), np.array(dones), np.array(actions) + + def _run_env_determinism(self, ids): + seed = 0 + for env_id in ids: + with self.subTest(msg=env_id): + traj1 = self._run_env(env_id, seed=seed) + traj2 = self._run_env(env_id, seed=seed) + for i, time_step in enumerate(zip(*traj1, *traj2)): + obs1, rwd1, done1, ac1, obs2, rwd2, done2, ac2 = time_step + self.assertTrue(np.array_equal(ac1, ac2), f"Actions [{i}] delta {ac1 - ac2} is not zero.") + self.assertTrue(np.array_equal(obs1, obs2), f"Observations [{i}] delta {obs1 - obs2} is not zero.") + self.assertEqual(rwd1, rwd2, f"Rewards [{i}] {rwd1} and {rwd2} do not match.") + self.assertEqual(done1, done2, f"Dones [{i}] {done1} and {done2} do not match.") + + def _verify_observations(self, obs, observation_space, obs_type="reset()"): + self.assertTrue(observation_space.contains(obs), + f"Observation {obs} received from {obs_type} " + f"not contained in observation space {observation_space}.") + + def _verify_reward(self, reward): + self.assertIsInstance(reward, (float, int), f"Returned type {type(reward)} as reward, expected float or int.") + + def _verify_done(self, done): + self.assertIsInstance(done, bool, f"Returned {done} as done flag, expected bool.") + + def test_step_functionality(self): + """Tests that step environments run without errors using random actions.""" + for env_id in METAWORLD_IDS: + with self.subTest(msg=env_id): + self._run_env(env_id) + + def test_step_determinism(self): + """Tests that for step environments identical seeds produce identical trajectories.""" + self._run_env_determinism(METAWORLD_IDS) + + def test_bb_functionality(self): + """Tests that black box environments run without errors using random actions.""" + for traj_gen, env_ids in fancy_gym.ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS.items(): + with self.subTest(msg=traj_gen): + for id in env_ids: + with self.subTest(msg=id): + self._run_env(id) + + def test_bb_determinism(self): + """Tests that for black box environment identical seeds produce identical trajectories.""" + for traj_gen, env_ids in fancy_gym.ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS.items(): + with self.subTest(msg=traj_gen): + self._run_env_determinism(env_ids) + + +if __name__ == '__main__': + unittest.main()
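The tests above all follow the same recipe: build each environment of a registry with a fixed seed, roll it out with random actions, and compare two identically seeded runs for determinism. Outside the test suite, the same registries can be used to sweep over all registered black-box variants; a short sketch, assuming `fancy_gym` and the metaworld extra are installed:

```python
import fancy_gym

# Roll out one episode per registered MetaWorld ProMP environment.
for env_id in fancy_gym.ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS["ProMP"]:
    env = fancy_gym.make(env_id, seed=1)
    env.reset()
    done = False
    while not done:
        # For black-box environments a single step() executes a full trajectory.
        obs, reward, done, info = env.step(env.action_space.sample())
    env.close()
```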