Create 'Guide' from README

parent 81dbdc5745
commit 268b74e5bd

docs/source/guide/basic_usage.rst (new file, 127 lines)
@@ -0,0 +1,127 @@

Basic Usage
-----------

We will only show the basics here and have prepared `multiple
examples <https://github.com/ALRhub/fancy_gym/tree/master/fancy_gym/examples/>`__
for a more detailed look.

Step-Based Environments
~~~~~~~~~~~~~~~~~~~~~~~

Regular step-based environments added by Fancy Gym are registered in the
``fancy/`` namespace.

.. note::
   Legacy versions of Fancy Gym used ``fancy_gym.make(...)``. This is no
   longer supported and will raise an exception in newer versions.

.. code:: python

   import gymnasium as gym
   import fancy_gym

   env = gym.make('fancy/Reacher5d-v0')
   # or env = gym.make('metaworld/reach-v2')  # fancy_gym exposes all metaworld ML1 tasks via the metaworld/ namespace
   # or env = gym.make('dm_control/ball_in_cup-catch-v0')
   # or env = gym.make('Reacher-v2')
   observation, info = env.reset(seed=1)

   for i in range(1000):
       action = env.action_space.sample()
       observation, reward, terminated, truncated, info = env.step(action)
       if i % 5 == 0:
           env.render()

       if terminated or truncated:
           observation, info = env.reset()

Black-box Environments
~~~~~~~~~~~~~~~~~~~~~~

All environments provide the cumulative episode reward by default; this
can be changed if necessary. Optionally, each environment also returns
all information collected at every step as part of the infos. This
information is mainly meant for debugging and logging, not for training.

+---------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------+
| Key | Description | Type |
+=====================+============================================================================================================================================+==========+
| `positions` | Generated trajectory from MP | Optional |
+---------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------+
| `velocities` | Generated trajectory from MP | Optional |
+---------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------+
| `step_actions` | Step-wise executed action based on controller output | Optional |
+---------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------+
| `step_observations` | Step-wise intermediate observations | Optional |
+---------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------+
| `step_rewards` | Step-wise rewards | Optional |
+---------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------+
| `trajectory_length` | Total number of environment interactions | Always |
+---------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------+
| `other` | All other information from the underlying environment are returned as a list with length `trajectory_length` maintaining the original key. | Always |
| | In case some information are not provided every time step, the missing values are filled with `None`. | |
+---------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------+

Existing MP tasks can be created the same way as above. The namespace of
an MP-variant of an environment is given by
``<original namespace>_<MP name>/``. Just keep in mind that calling
``step()`` executes a full trajectory.

.. note::
   Currently, we are also in the process of enabling replanning as well
   as learning of sub-trajectories. This allows splitting the episode
   into multiple trajectories and is a hybrid setting between step-based
   and black-box learning. While this is already implemented, it is
   still in beta and requires further testing. Feel free to try it and
   open an issue with any problems that occur.

.. code:: python

   import gymnasium as gym
   import fancy_gym

   env = gym.make('fancy_ProMP/Reacher5d-v0')
   # or env = gym.make('metaworld_ProDMP/reach-v2')
   # or env = gym.make('dm_control_DMP/ball_in_cup-catch-v0')
   # or env = gym.make('gym_ProMP/Reacher-v2')  # MP versions of envs added directly by gymnasium live in the gym_<MP-type> namespace

   # render() can be called once in the beginning with all necessary arguments.
   # To turn it off again, just call render() without any arguments.
   env.render(mode='human')

   # This returns the context information, not the full state observation.
   observation, info = env.reset(seed=1)

   for i in range(5):
       action = env.action_space.sample()
       observation, reward, terminated, truncated, info = env.step(action)

       # terminated or truncated is always True as we are working on the episode level, hence we always reset()
       observation, info = env.reset()
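
The table above describes the structure of the ``info`` dictionary
returned by a black-box step. As a minimal sketch (assuming the optional
``step_rewards`` entry is provided by the chosen environment), it can be
inspected like this:

.. code:: python

   import gymnasium as gym
   import fancy_gym

   env = gym.make('fancy_ProMP/Reacher5d-v0')
   observation, info = env.reset(seed=1)
   observation, reward, terminated, truncated, info = env.step(env.action_space.sample())

   # 'trajectory_length' is always present
   print(info['trajectory_length'])

   # the step-wise entries are optional and may be missing
   if 'step_rewards' in info:
       print(len(info['step_rewards']))  # one reward per environment interaction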

To show all available environments, we provide some additional
convenience variables. All of them return a dictionary with the keys
``DMP``, ``ProMP``, ``ProDMP`` and ``all`` that store a list of
available environment ids.

.. code:: python

   import fancy_gym

   print("All Black-box tasks:")
   print(fancy_gym.ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS)

   print("Fancy Black-box tasks:")
   print(fancy_gym.ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS)

   print("OpenAI Gym Black-box tasks:")
   print(fancy_gym.ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS)

   print("Deepmind Control Black-box tasks:")
   print(fancy_gym.ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS)

   print("MetaWorld Black-box tasks:")
   print(fancy_gym.ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS)

   print("If you add custom envs, their MP versions will be found in:")
   print(fancy_gym.MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS['<my_custom_namespace>'])
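
For example, assuming the dictionary structure described above, the
number of available ids per MP type can be listed like this:

.. code:: python

   import fancy_gym

   for mp_type, env_ids in fancy_gym.ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS.items():
       # keys are 'DMP', 'ProMP', 'ProDMP' and 'all'
       print(mp_type, len(env_ids))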

docs/source/guide/episodic_rl.rst (new file, 50 lines)
@@ -0,0 +1,50 @@

What is Episodic RL?
--------------------

Movement primitive (MP) environments differ from traditional step-based
environments. They align more with concepts from stochastic search,
black-box optimization, and methods commonly found in classical robotics
and control. Instead of individual steps, MP environments operate on an
episode basis, executing complete trajectories. These trajectories are
produced by trajectory generators such as Dynamic Movement Primitives
(DMP), Probabilistic Movement Primitives (ProMP) or Probabilistic
Dynamic Movement Primitives (ProDMP).

Once generated, these trajectories are converted into step-by-step
actions using a trajectory tracking controller. The specific controller
chosen depends on the environment's requirements. Currently, we support
position, velocity, and PD-controllers tailored for position, velocity,
and torque control. Additionally, we have a specialized controller
designed for the MetaWorld control suite.
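
To make the role of the tracking controller concrete, the following is a
purely illustrative PD control law, a minimal sketch rather than the
actual fancy_gym controller implementation:

.. code:: python

   import numpy as np

   def pd_control(q_des, qd_des, q, qd, p_gains, d_gains):
       """Illustrative PD tracking law: map one desired trajectory point to torques."""
       return p_gains * (q_des - q) + d_gains * (qd_des - qd)

   # hypothetical values for a single step of a 2-DoF trajectory
   tau = pd_control(q_des=np.array([0.5, -0.2]), qd_des=np.zeros(2),
                    q=np.array([0.3, -0.1]), qd=np.array([0.1, 0.0]),
                    p_gains=np.array([1.0, 1.0]), d_gains=np.array([0.1, 0.1]))
   print(tau)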

While the overarching objective of MP environments remains the learning
of an optimal policy, the actions here represent the parametrization of
motion primitives used to craft the right trajectory. Our framework
further supports a contextual setting: at the onset of each episode, we
present the context, a subset of the observation space. This requires
predicting a new action, i.e. a new MP parametrization, for every unique
context.

docs/source/guide/installation.rst (new file, 72 lines)
@@ -0,0 +1,72 @@

Installation
------------

We recommend installing ``fancy_gym`` into a virtual environment as
provided by `venv <https://docs.python.org/3/library/venv.html>`__.
Third-party alternatives to venv such as `Poetry <https://python-poetry.org/>`__
or `Conda <https://docs.conda.io/en/latest/>`__ can also be used.
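
For example, a venv-based setup could look like this (the environment
name ``.venv`` is just a placeholder):

.. code:: bash

   python3 -m venv .venv
   source .venv/bin/activate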
|
||||||
|
|
||||||
|
Installation from PyPI (recommended)
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
Install ``fancy_gym`` via
|
||||||
|
|
||||||
|
.. code:: bash
|
||||||
|
|
||||||
|
pip install fancy_gym
|
||||||
|
|
||||||
|
We have a few optional dependencies. If you also want to install those
|
||||||
|
use
|
||||||
|
|
||||||
|
.. code:: bash
|
||||||
|
|
||||||
|
# to install all optional dependencies
|
||||||
|
pip install 'fancy_gym[all]'
|
||||||
|
|
||||||
|
# or choose only those you want
|
||||||
|
pip install 'fancy_gym[dmc,box2d,mujoco-legacy,jax,testing]'
|
||||||
|
|
||||||
|
Pip can not automatically install up-to-date versions of metaworld,
|
||||||
|
since they are not avaible on PyPI yet. Install metaworld via
|
||||||
|
|
||||||
|
.. code:: bash
|
||||||
|
|
||||||
|
pip install metaworld@git+https://github.com/Farama-Foundation/Metaworld.git@d155d0051630bb365ea6a824e02c66c068947439#egg=metaworld
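
As a quick check that the installation works, a minimal smoke test based
on the Basic Usage example could look like this:

.. code:: python

   import gymnasium as gym
   import fancy_gym  # importing fancy_gym registers the fancy/ namespaces with gymnasium

   env = gym.make('fancy/Reacher5d-v0')
   observation, info = env.reset(seed=1)
   observation, reward, terminated, truncated, info = env.step(env.action_space.sample())
   print("fancy_gym is installed and working")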

Installation from master
~~~~~~~~~~~~~~~~~~~~~~~~

1. Clone the repository

   .. code:: bash

      git clone git@github.com:ALRhub/fancy_gym.git

2. Go to the folder

   .. code:: bash

      cd fancy_gym

3. Install with

   .. code:: bash

      pip install -e .

We have a few optional dependencies. If you also want to install those,
use

.. code:: bash

   # to install all optional dependencies
   pip install -e '.[all]'

   # or choose only those you want
   pip install -e '.[dmc,box2d,mujoco-legacy,jax,testing]'

Metaworld has to be installed manually with

.. code:: bash

   pip install metaworld@git+https://github.com/Farama-Foundation/Metaworld.git@d155d0051630bb365ea6a824e02c66c068947439#egg=metaworld

docs/source/guide/upgrading_envs.rst (new file, 136 lines)
@@ -0,0 +1,136 @@

Creating new MP Environments
----------------------------

In case a required task is not yet supported in the MP framework, it can
be created relatively easily. For the task at hand, the following
`interface <https://github.com/ALRhub/fancy_gym/tree/master/fancy_gym/black_box/raw_interface_wrapper.py>`__
needs to be implemented.

.. code:: python

   from abc import abstractmethod
   from typing import Union, Tuple

   import gymnasium as gym
   import numpy as np


   class RawInterfaceWrapper(gym.Wrapper):
       mp_config = {
           'ProMP': {},
           'DMP': {},
           'ProDMP': {},
       }

       @property
       def context_mask(self) -> np.ndarray:
           """
           Returns a boolean mask of the same shape as the observation space.
           It determines whether the observation is returned for the contextual case or not.
           This effectively allows filtering unwanted or unnecessary observations from the full step-based case.
           E.g. velocities starting at 0 only change after the first action. Given that we only receive the
           context/part of the first observation, the velocities are not necessary in the observation for the task.
           Returns:
               bool array representing the indices of the observations
           """
           return np.ones(self.env.observation_space.shape[0], dtype=bool)

       @property
       @abstractmethod
       def current_pos(self) -> Union[float, int, np.ndarray, Tuple]:
           """
           Returns the current position of the action/control dimension.
           The dimensionality has to match the action/control dimension.
           This is not required when exclusively using velocity control;
           it should, however, be implemented regardless.
           E.g. the joint positions that are directly or indirectly controlled by the action.
           """
           raise NotImplementedError()

       @property
       @abstractmethod
       def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
           """
           Returns the current velocity of the action/control dimension.
           The dimensionality has to match the action/control dimension.
           This is not required when exclusively using position control;
           it should, however, be implemented regardless.
           E.g. the joint velocities that are directly or indirectly controlled by the action.
           """
           raise NotImplementedError()
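
For illustration, a minimal (hypothetical) wrapper for a MuJoCo-based
environment could look roughly like this; the attribute access via
``data.qpos``/``data.qvel`` and the number of controlled joints are
assumptions about the underlying environment, not part of the actual
interface:

.. code:: python

   import numpy as np

   from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper


   class MyCustomMPWrapper(RawInterfaceWrapper):
       @property
       def context_mask(self) -> np.ndarray:
           # expose only the first five observation entries (e.g. joint positions) as context
           mask = np.zeros(self.env.observation_space.shape[0], dtype=bool)
           mask[:5] = True
           return mask

       @property
       def current_pos(self) -> np.ndarray:
           # joint positions of the controlled joints (assumed MuJoCo state layout)
           return self.env.unwrapped.data.qpos[:5].copy()

       @property
       def current_vel(self) -> np.ndarray:
           # joint velocities of the controlled joints (assumed MuJoCo state layout)
           return self.env.unwrapped.data.qvel[:5].copy()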

Default configurations for the MPs can be overwritten by defining
attributes in ``mp_config``. Available parameters are documented in the
`MP_PyTorch Userguide <https://github.com/ALRhub/MP_PyTorch/blob/main/doc/README.md>`__.

.. code:: python

   class RawInterfaceWrapper(gym.Wrapper):
       mp_config = {
           'ProMP': {
               'phase_generator_kwargs': {
                   'phase_generator_type': 'linear'
                   # When selecting another generator type, the default configuration will not be merged for the attribute.
               },
               'controller_kwargs': {
                   'p_gains': 0.5 * np.array([1.0, 4.0, 2.0, 4.0, 1.0, 4.0, 1.0]),
                   'd_gains': 0.5 * np.array([0.1, 0.4, 0.2, 0.4, 0.1, 0.4, 0.1]),
               },
               'basis_generator_kwargs': {
                   'num_basis': 3,
                   'num_basis_zero_start': 1,
                   'num_basis_zero_goal': 1,
               },
           },
           'DMP': {},
           'ProDMP': {},
       }

       [...]

If you create a new task wrapper, feel free to open a PR so we can
integrate it for others to use as well. Without the integration, the
task can still be used. A rough outline is shown here; for more details
we recommend having a look at the
`examples <https://github.com/ALRhub/fancy_gym/tree/master/fancy_gym/examples/>`__.

If the step-based environment is already registered with gym, you can
simply do the following:

.. code:: python

   fancy_gym.upgrade(
       id='custom/cool_new_env-v0',
       mp_wrapper=my_custom_MPWrapper
   )

If the step-based environment is not yet registered with gym, we can add
both the step-based and MP-versions via

.. code:: python

   fancy_gym.register(
       id='custom/cool_new_env-v0',
       entry_point=my_custom_env,
       mp_wrapper=my_custom_MPWrapper
   )

From this point on, you can access the MP-version of your environment via

.. code:: python

   env = gym.make('custom_ProDMP/cool_new_env-v0')

   rewards = 0
   observation, info = env.reset()

   # number of samples/full trajectories (multiple environment steps)
   for i in range(5):
       ac = env.action_space.sample()
       observation, reward, terminated, truncated, info = env.step(ac)
       rewards += reward

       if terminated or truncated:
           print(rewards)
           rewards = 0
           observation, info = env.reset()