Create 'Guide' from README

parent 81dbdc5745
commit 268b74e5bd

127 docs/source/guide/basic_usage.rst Normal file
@@ -0,0 +1,127 @@

Basic Usage
-----------

We only show the basics here and have prepared `multiple
examples <https://github.com/ALRhub/fancy_gym/tree/master/fancy_gym/examples/>`__
for a more detailed look.

Step-Based Environments
~~~~~~~~~~~~~~~~~~~~~~~

Regular step-based environments added by Fancy Gym live in the
``fancy/`` namespace.

.. note::
   Legacy versions of Fancy Gym used ``fancy_gym.make(...)``. This is no longer supported and will raise an Exception in newer versions.

.. code:: python

   import gymnasium as gym
   import fancy_gym

   env = gym.make('fancy/Reacher5d-v0')
   # or env = gym.make('metaworld/reach-v2')  # fancy_gym allows access to all metaworld ML1 tasks via the metaworld/ NS
   # or env = gym.make('dm_control/ball_in_cup-catch-v0')
   # or env = gym.make('Reacher-v2')
   observation, info = env.reset(seed=1)

   for i in range(1000):
       action = env.action_space.sample()
       observation, reward, terminated, truncated, info = env.step(action)
       if i % 5 == 0:
           env.render()

       if terminated or truncated:
           observation, info = env.reset()

Black-box Environments
~~~~~~~~~~~~~~~~~~~~~~

By default, all environments provide the cumulative episode reward; this
can, however, be changed if necessary. Optionally, each environment
returns all information collected during each step as part of the info
dict. This information is mainly meant for debugging and logging, not
for training.

+-----------------------+------------------------------------------------------------+----------+
| Key                   | Description                                                | Type     |
+=======================+============================================================+==========+
| ``positions``         | Generated trajectory from MP                               | Optional |
+-----------------------+------------------------------------------------------------+----------+
| ``velocities``        | Generated trajectory from MP                               | Optional |
+-----------------------+------------------------------------------------------------+----------+
| ``step_actions``      | Step-wise executed action based on controller output       | Optional |
+-----------------------+------------------------------------------------------------+----------+
| ``step_observations`` | Step-wise intermediate observations                        | Optional |
+-----------------------+------------------------------------------------------------+----------+
| ``step_rewards``      | Step-wise rewards                                          | Optional |
+-----------------------+------------------------------------------------------------+----------+
| ``trajectory_length`` | Total number of environment interactions                   | Always   |
+-----------------------+------------------------------------------------------------+----------+
| ``other``             | All other information from the underlying environment is  | Always   |
|                       | returned as a list with length ``trajectory_length``,     |          |
|                       | maintaining the original key. If some information is not  |          |
|                       | provided at every time step, the missing values are       |          |
|                       | filled with ``None``.                                      |          |
+-----------------------+------------------------------------------------------------+----------+
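
For example, the info dict of a black-box rollout can be inspected as in
the following minimal sketch (it uses one of the MP variants introduced
below; which of the optional keys are present depends on how the
environment is configured):

.. code:: python

   import gymnasium as gym
   import fancy_gym

   env = gym.make('fancy_ProMP/Reacher5d-v0')
   observation, info = env.reset(seed=1)

   # A single step on the episode level executes a full trajectory.
   observation, reward, terminated, truncated, info = env.step(env.action_space.sample())

   print(info['trajectory_length'])  # always present: number of environment interactions
   print(info.get('step_rewards'))   # optional: step-wise rewards, if the environment provides them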

Existing MP tasks can be created the same way as above. The namespace of
an MP-variant of an environment is given by
``<original namespace>_<MP name>/``. Just keep in mind that calling
``step()`` executes a full trajectory.

.. note::
   We are currently also in the process of enabling replanning as well as
   learning of sub-trajectories. This allows splitting the episode into
   multiple trajectories and is a hybrid setting between step-based and
   black-box learning. While this is already implemented, it is still in
   beta and requires further testing. Feel free to try it and open an
   issue with any problems that occur.

.. code:: python

   import gymnasium as gym
   import fancy_gym

   env = gym.make('fancy_ProMP/Reacher5d-v0')
   # or env = gym.make('metaworld_ProDMP/reach-v2')
   # or env = gym.make('dm_control_DMP/ball_in_cup-catch-v0')
   # or env = gym.make('gym_ProMP/Reacher-v2')  # MP versions of envs added directly by gymnasium are in the gym_<MP-type> NS

   # render() can be called once in the beginning with all necessary arguments.
   # To turn it off again, just call render() without any arguments.
   env.render(mode='human')

   # This returns the context information, not the full state observation
   observation, info = env.reset(seed=1)

   for i in range(5):
       action = env.action_space.sample()
       observation, reward, terminated, truncated, info = env.step(action)

       # terminated or truncated is always True as we are working on the episode level, hence we always reset()
       observation, info = env.reset()

To show all available environments, we provide some additional
convenience variables. All of them return a dictionary with the keys
``DMP``, ``ProMP``, ``ProDMP`` and ``all`` that store a list of
available environment ids.

.. code:: python

   import fancy_gym

   print("All Black-box tasks:")
   print(fancy_gym.ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS)

   print("Fancy Black-box tasks:")
   print(fancy_gym.ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS)

   print("OpenAI Gym Black-box tasks:")
   print(fancy_gym.ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS)

   print("Deepmind Control Black-box tasks:")
   print(fancy_gym.ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS)

   print("MetaWorld Black-box tasks:")
   print(fancy_gym.ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS)

   print("If you add custom envs, their mp versions will be found in:")
   print(fancy_gym.MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS['<my_custom_namespace>'])

50 docs/source/guide/episodic_rl.rst Normal file
@@ -0,0 +1,50 @@

What is Episodic RL?
--------------------

.. raw:: html

   <p align="justify">

Movement primitive (MP) environments differ from traditional step-based
environments. They align more with concepts from stochastic search,
black-box optimization, and methods commonly found in classical robotics
and control. Instead of individual steps, MP environments operate on an
episode basis, executing complete trajectories. These trajectories are
produced by trajectory generators like Dynamic Movement Primitives
(DMP), Probabilistic Movement Primitives (ProMP) or Probabilistic
Dynamic Movement Primitives (ProDMP).

.. raw:: html

   </p>

.. raw:: html

   <p align="justify">

Once generated, these trajectories are converted into step-by-step
actions using a trajectory tracking controller. The specific controller
chosen depends on the environment's requirements. Currently, we support
position, velocity, and PD-Controllers tailored for position, velocity,
and torque control. Additionally, we have a specialized controller
designed for the MetaWorld control suite.

.. raw:: html

   </p>
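
To illustrate the idea of trajectory tracking, the following is a minimal
sketch of a PD control law that turns one point of a generated trajectory
into a torque command. It only illustrates the concept; it is not the
controller implementation used inside fancy_gym, and the gains are made up.

.. code:: python

   import numpy as np

   def pd_torque(desired_pos, desired_vel, current_pos, current_vel, p_gains, d_gains):
       """Track one point of the generated trajectory with a PD law."""
       return p_gains * (desired_pos - current_pos) + d_gains * (desired_vel - current_vel)

   # Hypothetical single joint: drive it from position 0.0 towards the desired 0.5.
   torque = pd_torque(desired_pos=np.array([0.5]), desired_vel=np.array([0.0]),
                      current_pos=np.array([0.0]), current_vel=np.array([0.0]),
                      p_gains=np.array([1.0]), d_gains=np.array([0.1]))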

.. raw:: html

   <p align="justify">

While the overarching objective of MP environments remains the learning
of an optimal policy, the actions here represent the parametrization of
motion primitives to craft the right trajectory. Our framework further
enhances this by accommodating a contextual setting. At the episode's
onset, we present the context space, a subset of the observation space.
This demands the prediction of a new action or MP parametrization for
every unique context.

.. raw:: html

   </p>

72 docs/source/guide/installation.rst Normal file
@@ -0,0 +1,72 @@

Installation
------------

We recommend installing ``fancy_gym`` into a virtual environment as
provided by `venv <https://docs.python.org/3/library/venv.html>`__.
Third-party alternatives to venv like `Poetry <https://python-poetry.org/>`__
or `Conda <https://docs.conda.io/en/latest/>`__ can also be used.

Installation from PyPI (recommended)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Install ``fancy_gym`` via

.. code:: bash

   pip install fancy_gym

We have a few optional dependencies. If you also want to install those, use

.. code:: bash

   # to install all optional dependencies
   pip install 'fancy_gym[all]'

   # or choose only those you want
   pip install 'fancy_gym[dmc,box2d,mujoco-legacy,jax,testing]'

Pip cannot automatically install an up-to-date version of metaworld,
since it is not available on PyPI yet. Install metaworld via

.. code:: bash

   pip install metaworld@git+https://github.com/Farama-Foundation/Metaworld.git@d155d0051630bb365ea6a824e02c66c068947439#egg=metaworld
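
To check that the installation worked, a quick sanity check along these
lines can be run (a minimal sketch; any registered task id works, and
some tasks require the corresponding optional dependencies):

.. code:: python

   import gymnasium as gym
   import fancy_gym

   env = gym.make('fancy/Reacher5d-v0')
   observation, info = env.reset(seed=1)
   print(env.action_space, env.observation_space)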

Installation from master
~~~~~~~~~~~~~~~~~~~~~~~~

1. Clone the repository

   .. code:: bash

      git clone git@github.com:ALRhub/fancy_gym.git

2. Go to the folder

   .. code:: bash

      cd fancy_gym

3. Install with

   .. code:: bash

      pip install -e .

We have a few optional dependencies. If you also want to install those, use

.. code:: bash

   # to install all optional dependencies
   pip install -e '.[all]'

   # or choose only those you want
   pip install -e '.[dmc,box2d,mujoco-legacy,jax,testing]'

Metaworld has to be installed manually with

.. code:: bash

   pip install metaworld@git+https://github.com/Farama-Foundation/Metaworld.git@d155d0051630bb365ea6a824e02c66c068947439#egg=metaworld

136 docs/source/guide/upgrading_envs.rst Normal file
@@ -0,0 +1,136 @@

Creating new MP Environments
----------------------------

In case a required task is not yet supported in the MP framework, it can
be created relatively easily. For the task at hand, the following
`interface <https://github.com/ALRhub/fancy_gym/tree/master/fancy_gym/black_box/raw_interface_wrapper.py>`__
needs to be implemented.

.. code:: python

   from abc import abstractmethod
   from typing import Union, Tuple

   import gymnasium as gym
   import numpy as np


   class RawInterfaceWrapper(gym.Wrapper):
       mp_config = {
           'ProMP': {},
           'DMP': {},
           'ProDMP': {},
       }

       @property
       def context_mask(self) -> np.ndarray:
           """
           Returns a boolean mask of the same shape as the observation space.
           It determines whether the observation is returned for the contextual case or not.
           This effectively allows filtering unwanted or unnecessary observations from the full step-based case.
           E.g. velocities starting at 0 only change after the first action. Given we only receive the
           context/part of the first observation, the velocities are not necessary in the observation for the task.
           Returns:
               bool array representing the indices of the observations
           """
           return np.ones(self.env.observation_space.shape[0], dtype=bool)

       @property
       @abstractmethod
       def current_pos(self) -> Union[float, int, np.ndarray, Tuple]:
           """
           Returns the current position of the action/control dimension.
           The dimensionality has to match the action/control dimension.
           This is not required when exclusively using velocity control;
           it should, however, be implemented regardless.
           E.g. the joint positions that are directly or indirectly controlled by the action.
           """
           raise NotImplementedError()

       @property
       @abstractmethod
       def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
           """
           Returns the current velocity of the action/control dimension.
           The dimensionality has to match the action/control dimension.
           This is not required when exclusively using position control;
           it should, however, be implemented regardless.
           E.g. the joint velocities that are directly or indirectly controlled by the action.
           """
           raise NotImplementedError()
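
As an illustration, a wrapper for a hypothetical MuJoCo-based task whose
first ``n_joints`` entries of ``qpos``/``qvel`` belong to the controlled
joints might look like the following sketch. The attribute names
(``data``, ``n_joints``) and the chosen context mask are assumptions made
for this example, not part of the interface.

.. code:: python

   import numpy as np
   from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper


   class MyCustomMPWrapper(RawInterfaceWrapper):

       @property
       def context_mask(self) -> np.ndarray:
           # Expose only the task-relevant part of the first observation as context,
           # e.g. hide velocity entries that are always zero at reset (hypothetical layout).
           mask = np.zeros(self.env.observation_space.shape[0], dtype=bool)
           mask[:3] = True
           return mask

       @property
       def current_pos(self) -> np.ndarray:
           # Joint positions directly controlled by the action (hypothetical attributes).
           return self.env.data.qpos[:self.env.n_joints].copy()

       @property
       def current_vel(self) -> np.ndarray:
           # Joint velocities of the controlled joints (hypothetical attributes).
           return self.env.data.qvel[:self.env.n_joints].copy()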

Default configurations for MPs can be overwritten by defining attributes
in ``mp_config``. Available parameters are documented in the `MP_PyTorch
Userguide <https://github.com/ALRhub/MP_PyTorch/blob/main/doc/README.md>`__.

.. code:: python

   import gymnasium as gym
   import numpy as np


   class RawInterfaceWrapper(gym.Wrapper):
       mp_config = {
           'ProMP': {
               'phase_generator_kwargs': {
                   'phase_generator_type': 'linear'
                   # When selecting another generator type, the default configuration will not be merged for the attribute.
               },
               'controller_kwargs': {
                   'p_gains': 0.5 * np.array([1.0, 4.0, 2.0, 4.0, 1.0, 4.0, 1.0]),
                   'd_gains': 0.5 * np.array([0.1, 0.4, 0.2, 0.4, 0.1, 0.4, 0.1]),
               },
               'basis_generator_kwargs': {
                   'num_basis': 3,
                   'num_basis_zero_start': 1,
                   'num_basis_zero_goal': 1,
               },
           },
           'DMP': {},
           'ProDMP': {},
       }

       [...]

If you created a new task wrapper, feel free to open a PR so we can
integrate it for others to use as well. Even without the integration, the
task can still be used. A rough outline is shown here; for more details
we recommend having a look at the
`examples <https://github.com/ALRhub/fancy_gym/tree/master/fancy_gym/examples/>`__.

If the step-based environment is already registered with gym, you can
simply do the following:

.. code:: python

   fancy_gym.upgrade(
       id='custom/cool_new_env-v0',
       mp_wrapper=my_custom_MPWrapper
   )

If the step-based environment is not yet registered with gym, we can add
both the step-based and MP-versions via

.. code:: python

   fancy_gym.register(
       id='custom/cool_new_env-v0',
       entry_point=my_custom_env,
       mp_wrapper=my_custom_MPWrapper
   )

From this point on, you can access the MP-versions of your environments via

.. code:: python

   env = gym.make('custom_ProDMP/cool_new_env-v0')

   rewards = 0
   observation, info = env.reset()

   # number of samples/full trajectories (multiple environment steps)
   for i in range(5):
       ac = env.action_space.sample()
       observation, reward, terminated, truncated, info = env.step(ac)
       rewards += reward

       if terminated or truncated:
           print(rewards)
           rewards = 0
           observation, info = env.reset()