Create 'Guide' from README

parent 81dbdc5745
commit 268b74e5bd

docs/source/guide/basic_usage.rst (new file, 127 lines)
@@ -0,0 +1,127 @@

Basic Usage
-----------

We will only show the basics here and have prepared `multiple
examples <https://github.com/ALRhub/fancy_gym/tree/master/fancy_gym/examples/>`__
for a more detailed look.

Step-Based Environments
~~~~~~~~~~~~~~~~~~~~~~~

Regular step-based environments added by Fancy Gym are registered in the
``fancy/`` namespace.

.. note::
   Legacy versions of Fancy Gym used ``fancy_gym.make(...)``. This is no
   longer supported and will raise an exception in newer versions.

.. code:: python

   import gymnasium as gym
   import fancy_gym

   env = gym.make('fancy/Reacher5d-v0')
   # or env = gym.make('metaworld/reach-v2')  # fancy_gym exposes all metaworld ML1 tasks via the metaworld/ namespace
   # or env = gym.make('dm_control/ball_in_cup-catch-v0')
   # or env = gym.make('Reacher-v2')
   observation, info = env.reset(seed=1)

   for i in range(1000):
       action = env.action_space.sample()
       observation, reward, terminated, truncated, info = env.step(action)
       if i % 5 == 0:
           env.render()

       if terminated or truncated:
           observation, info = env.reset()

Black-box Environments
~~~~~~~~~~~~~~~~~~~~~~

All environments provide the cumulative episode reward by default; this
can be changed if necessary. Optionally, each environment also returns
all information collected at every step as part of the infos. This
information is mainly meant for debugging and logging, not for training.

+---------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------+
| Key | Description | Type |
+=====================+============================================================================================================================================+==========+
| `positions` | Generated trajectory from MP | Optional |
+---------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------+
| `velocities` | Generated trajectory from MP | Optional |
+---------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------+
| `step_actions` | Step-wise executed action based on controller output | Optional |
+---------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------+
| `step_observations` | Step-wise intermediate observations | Optional |
+---------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------+
| `step_rewards` | Step-wise rewards | Optional |
+---------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------+
| `trajectory_length` | Total number of environment interactions | Always |
+---------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------+
| `other` | All other information from the underlying environment are returned as a list with length `trajectory_length` maintaining the original key. | Always |
| | In case some information are not provided every time step, the missing values are filled with `None`. | |
+---------------------+--------------------------------------------------------------------------------------------------------------------------------------------+----------+

Existing MP tasks can be created the same way as above. The namespace of
an MP-variant of an environment is given by
``<original namespace>_<MP name>/``. Just keep in mind that calling
``step()`` executes a full trajectory.

.. note::
   Currently, we are also in the process of enabling replanning as well
   as learning of sub-trajectories. This allows splitting the episode
   into multiple trajectories and is a hybrid setting between step-based
   and black-box learning. While this is already implemented, it is
   still in beta and requires further testing. Feel free to try it and
   open an issue with any problems that occur.

.. code:: python

   import gymnasium as gym
   import fancy_gym

   env = gym.make('fancy_ProMP/Reacher5d-v0')
   # or env = gym.make('metaworld_ProDMP/reach-v2')
   # or env = gym.make('dm_control_DMP/ball_in_cup-catch-v0')
   # or env = gym.make('gym_ProMP/Reacher-v2')  # MP versions of envs added directly by gymnasium live in the gym_<MP-type> namespace

   # render() can be called once in the beginning with all necessary arguments.
   # To turn it off again, just call render() without any arguments.
   env.render(mode='human')

   # This returns the context information, not the full state observation.
   observation, info = env.reset(seed=1)

   for i in range(5):
       action = env.action_space.sample()
       observation, reward, terminated, truncated, info = env.step(action)

       # terminated or truncated is always True as we are working on the episode level, hence we always reset()
       observation, info = env.reset()
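
The table above describes the structure of the ``info`` dictionary
returned by a black-box step. As a minimal sketch (assuming the optional
``step_rewards`` entry is provided by the chosen environment), it can be
inspected like this:

.. code:: python

   import gymnasium as gym
   import fancy_gym

   env = gym.make('fancy_ProMP/Reacher5d-v0')
   observation, info = env.reset(seed=1)
   observation, reward, terminated, truncated, info = env.step(env.action_space.sample())

   # 'trajectory_length' is always present
   print(info['trajectory_length'])

   # the step-wise entries are optional and may be missing
   if 'step_rewards' in info:
       print(len(info['step_rewards']))  # one reward per environment interaction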

To show all available environments, we provide some additional
convenience variables. All of them return a dictionary with the keys
``DMP``, ``ProMP``, ``ProDMP`` and ``all`` that store a list of
available environment ids.

.. code:: python

   import fancy_gym

   print("All Black-box tasks:")
   print(fancy_gym.ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS)

   print("Fancy Black-box tasks:")
   print(fancy_gym.ALL_FANCY_MOVEMENT_PRIMITIVE_ENVIRONMENTS)

   print("OpenAI Gym Black-box tasks:")
   print(fancy_gym.ALL_GYM_MOVEMENT_PRIMITIVE_ENVIRONMENTS)

   print("Deepmind Control Black-box tasks:")
   print(fancy_gym.ALL_DMC_MOVEMENT_PRIMITIVE_ENVIRONMENTS)

   print("MetaWorld Black-box tasks:")
   print(fancy_gym.ALL_METAWORLD_MOVEMENT_PRIMITIVE_ENVIRONMENTS)

   print("If you add custom envs, their MP versions will be found in:")
   print(fancy_gym.MOVEMENT_PRIMITIVE_ENVIRONMENTS_FOR_NS['<my_custom_namespace>'])
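
For example, assuming the dictionary structure described above, the
number of available ids per MP type can be listed like this:

.. code:: python

   import fancy_gym

   for mp_type, env_ids in fancy_gym.ALL_MOVEMENT_PRIMITIVE_ENVIRONMENTS.items():
       # keys are 'DMP', 'ProMP', 'ProDMP' and 'all'
       print(mp_type, len(env_ids))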

docs/source/guide/episodic_rl.rst (new file, 50 lines)
@@ -0,0 +1,50 @@

What is Episodic RL?
--------------------

Movement primitive (MP) environments differ from traditional step-based
environments. They align more with concepts from stochastic search,
black-box optimization, and methods commonly found in classical robotics
and control. Instead of individual steps, MP environments operate on an
episode basis, executing complete trajectories. These trajectories are
produced by trajectory generators such as Dynamic Movement Primitives
(DMP), Probabilistic Movement Primitives (ProMP) or Probabilistic
Dynamic Movement Primitives (ProDMP).

Once generated, these trajectories are converted into step-by-step
actions using a trajectory tracking controller. The specific controller
chosen depends on the environment's requirements. Currently, we support
position, velocity, and PD-controllers tailored for position, velocity,
and torque control. Additionally, we have a specialized controller
designed for the MetaWorld control suite.
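
To make the role of the tracking controller concrete, the following is a
purely illustrative PD control law, a minimal sketch rather than the
actual fancy_gym controller implementation:

.. code:: python

   import numpy as np

   def pd_control(q_des, qd_des, q, qd, p_gains, d_gains):
       """Illustrative PD tracking law: map one desired trajectory point to torques."""
       return p_gains * (q_des - q) + d_gains * (qd_des - qd)

   # hypothetical values for a single step of a 2-DoF trajectory
   tau = pd_control(q_des=np.array([0.5, -0.2]), qd_des=np.zeros(2),
                    q=np.array([0.3, -0.1]), qd=np.array([0.1, 0.0]),
                    p_gains=np.array([1.0, 1.0]), d_gains=np.array([0.1, 0.1]))
   print(tau)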

While the overarching objective of MP environments remains the learning
of an optimal policy, the actions here represent the parametrization of
motion primitives used to craft the right trajectory. Our framework
further supports a contextual setting: at the onset of each episode, we
present the context, a subset of the observation space. This requires
predicting a new action, i.e. a new MP parametrization, for every unique
context.

docs/source/guide/installation.rst (new file, 72 lines)
@@ -0,0 +1,72 @@

Installation
------------

We recommend installing ``fancy_gym`` into a virtual environment as
provided by `venv <https://docs.python.org/3/library/venv.html>`__.
Third-party alternatives to venv such as `Poetry <https://python-poetry.org/>`__
or `Conda <https://docs.conda.io/en/latest/>`__ can also be used.
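
For example, a venv-based setup could look like this (the environment
name ``.venv`` is just a placeholder):

.. code:: bash

   python3 -m venv .venv
   source .venv/bin/activate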
|
||||||
|
|
||||||
|
Installation from PyPI (recommended)
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
Install ``fancy_gym`` via
|
||||||
|
|
||||||
|
.. code:: bash
|
||||||
|
|
||||||
|
pip install fancy_gym
|
||||||
|
|
||||||
|
We have a few optional dependencies. If you also want to install those
|
||||||
|
use
|
||||||
|
|
||||||
|
.. code:: bash
|
||||||
|
|
||||||
|
# to install all optional dependencies
|
||||||
|
pip install 'fancy_gym[all]'
|
||||||
|
|
||||||
|
# or choose only those you want
|
||||||
|
pip install 'fancy_gym[dmc,box2d,mujoco-legacy,jax,testing]'
|
||||||
|
|
||||||
|
Pip can not automatically install up-to-date versions of metaworld,
|
||||||
|
since they are not avaible on PyPI yet. Install metaworld via
|
||||||
|
|
||||||
|
.. code:: bash
|
||||||
|
|
||||||
|
pip install metaworld@git+https://github.com/Farama-Foundation/Metaworld.git@d155d0051630bb365ea6a824e02c66c068947439#egg=metaworld
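
As a quick check that the installation works, a minimal smoke test based
on the Basic Usage example could look like this:

.. code:: python

   import gymnasium as gym
   import fancy_gym  # importing fancy_gym registers the fancy/ namespaces with gymnasium

   env = gym.make('fancy/Reacher5d-v0')
   observation, info = env.reset(seed=1)
   observation, reward, terminated, truncated, info = env.step(env.action_space.sample())
   print("fancy_gym is installed and working")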

Installation from master
~~~~~~~~~~~~~~~~~~~~~~~~

1. Clone the repository

   .. code:: bash

      git clone git@github.com:ALRhub/fancy_gym.git

2. Go to the folder

   .. code:: bash

      cd fancy_gym

3. Install with

   .. code:: bash

      pip install -e .

We have a few optional dependencies. If you also want to install those,
use

.. code:: bash

   # to install all optional dependencies
   pip install -e '.[all]'

   # or choose only those you want
   pip install -e '.[dmc,box2d,mujoco-legacy,jax,testing]'

Metaworld has to be installed manually with

.. code:: bash

   pip install metaworld@git+https://github.com/Farama-Foundation/Metaworld.git@d155d0051630bb365ea6a824e02c66c068947439#egg=metaworld

docs/source/guide/upgrading_envs.rst (new file, 136 lines)
@@ -0,0 +1,136 @@

Creating new MP Environments
----------------------------

In case a required task is not yet supported in the MP framework, it can
be created relatively easily. For the task at hand, the following
`interface <https://github.com/ALRhub/fancy_gym/tree/master/fancy_gym/black_box/raw_interface_wrapper.py>`__
needs to be implemented.

.. code:: python

   from abc import abstractmethod
   from typing import Union, Tuple

   import gymnasium as gym
   import numpy as np


   class RawInterfaceWrapper(gym.Wrapper):
       mp_config = {
           'ProMP': {},
           'DMP': {},
           'ProDMP': {},
       }

       @property
       def context_mask(self) -> np.ndarray:
           """
           Returns a boolean mask of the same shape as the observation space.
           It determines whether the observation is returned for the contextual case or not.
           This effectively allows filtering unwanted or unnecessary observations from the full step-based case.
           E.g. velocities starting at 0 only change after the first action. Given that we only receive the
           context/part of the first observation, the velocities are not necessary in the observation for the task.
           Returns:
               bool array representing the indices of the observations
           """
           return np.ones(self.env.observation_space.shape[0], dtype=bool)

       @property
       @abstractmethod
       def current_pos(self) -> Union[float, int, np.ndarray, Tuple]:
           """
           Returns the current position of the action/control dimension.
           The dimensionality has to match the action/control dimension.
           This is not required when exclusively using velocity control;
           it should, however, be implemented regardless.
           E.g. the joint positions that are directly or indirectly controlled by the action.
           """
           raise NotImplementedError()

       @property
       @abstractmethod
       def current_vel(self) -> Union[float, int, np.ndarray, Tuple]:
           """
           Returns the current velocity of the action/control dimension.
           The dimensionality has to match the action/control dimension.
           This is not required when exclusively using position control;
           it should, however, be implemented regardless.
           E.g. the joint velocities that are directly or indirectly controlled by the action.
           """
           raise NotImplementedError()
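
For illustration, a minimal (hypothetical) wrapper for a MuJoCo-based
environment could look roughly like this; the attribute access via
``data.qpos``/``data.qvel`` and the number of controlled joints are
assumptions about the underlying environment, not part of the actual
interface:

.. code:: python

   import numpy as np

   from fancy_gym.black_box.raw_interface_wrapper import RawInterfaceWrapper


   class MyCustomMPWrapper(RawInterfaceWrapper):
       @property
       def context_mask(self) -> np.ndarray:
           # expose only the first five observation entries (e.g. joint positions) as context
           mask = np.zeros(self.env.observation_space.shape[0], dtype=bool)
           mask[:5] = True
           return mask

       @property
       def current_pos(self) -> np.ndarray:
           # joint positions of the controlled joints (assumed MuJoCo state layout)
           return self.env.unwrapped.data.qpos[:5].copy()

       @property
       def current_vel(self) -> np.ndarray:
           # joint velocities of the controlled joints (assumed MuJoCo state layout)
           return self.env.unwrapped.data.qvel[:5].copy()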

Default configurations for the MPs can be overwritten by defining
attributes in ``mp_config``. Available parameters are documented in the
`MP_PyTorch Userguide <https://github.com/ALRhub/MP_PyTorch/blob/main/doc/README.md>`__.

.. code:: python

   class RawInterfaceWrapper(gym.Wrapper):
       mp_config = {
           'ProMP': {
               'phase_generator_kwargs': {
                   'phase_generator_type': 'linear'
                   # When selecting another generator type, the default configuration will not be merged for the attribute.
               },
               'controller_kwargs': {
                   'p_gains': 0.5 * np.array([1.0, 4.0, 2.0, 4.0, 1.0, 4.0, 1.0]),
                   'd_gains': 0.5 * np.array([0.1, 0.4, 0.2, 0.4, 0.1, 0.4, 0.1]),
               },
               'basis_generator_kwargs': {
                   'num_basis': 3,
                   'num_basis_zero_start': 1,
                   'num_basis_zero_goal': 1,
               },
           },
           'DMP': {},
           'ProDMP': {},
       }

       [...]

If you create a new task wrapper, feel free to open a PR so we can
integrate it for others to use as well. Without the integration, the
task can still be used. A rough outline is shown here; for more details
we recommend having a look at the
`examples <https://github.com/ALRhub/fancy_gym/tree/master/fancy_gym/examples/>`__.

If the step-based environment is already registered with gym, you can
simply do the following:

.. code:: python

   fancy_gym.upgrade(
       id='custom/cool_new_env-v0',
       mp_wrapper=my_custom_MPWrapper
   )

If the step-based environment is not yet registered with gym, we can add
both the step-based and MP-versions via

.. code:: python

   fancy_gym.register(
       id='custom/cool_new_env-v0',
       entry_point=my_custom_env,
       mp_wrapper=my_custom_MPWrapper
   )

From this point on, you can access the MP-version of your environment via

.. code:: python

   env = gym.make('custom_ProDMP/cool_new_env-v0')

   rewards = 0
   observation, info = env.reset()

   # number of samples/full trajectories (multiple environment steps)
   for i in range(5):
       ac = env.action_space.sample()
       observation, reward, terminated, truncated, info = env.step(ac)
       rewards += reward

       if terminated or truncated:
           print(rewards)
           rewards = 0
           observation, info = env.reset()