Extended README
parent
8d4f57a59d
commit
85e9e1033d
@@ -6,13 +6,10 @@
An extension to Stable Baselines 3. Based on Metastable Baselines 1.

During training of an RL agent, we follow the gradient of the loss, which leads us to a minimum. If that minimum is merely a local one, it can be seen as a _false vacuum_ in our loss space. Exploration mechanisms try to let the training procedure escape these _stable states_, making them _metastable_.

In order to achieve this, this repo contains some extensions for [Stable Baselines 3 by DLR-RM](https://github.com/DLR-RM/stable-baselines3).
These extensions include:
This repo provides:

- An implementation of ["Differentiable Trust Region Layers for Deep Reinforcement Learning" by Fabian Otto et al. (TRPL)](https://arxiv.org/abs/2101.09207)
- Support for Prior Conditioned Annealing
- Support for Prior Conditioned Annealing (WIP)
- Support for Contextual Covariances (Planned)
- Support for Full Covariances (Planned)
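
To give a feel for what a trust region projection does, here is a minimal sketch of the mean part only. It assumes a diagonal covariance and a bound on the squared Mahalanobis distance between the new and old mean, in the spirit of the TRPL paper. The function names are hypothetical and this is not the code used by this repo; `mean_bound` merely mirrors the key of the same name passed via `projection_kwargs`.

```python
import math


def mahalanobis_sq(mean, old_mean, old_var):
    """Squared Mahalanobis distance between two means under a diagonal old covariance."""
    return sum((m - o) ** 2 / v for m, o, v in zip(mean, old_mean, old_var))


def project_mean(mean, old_mean, old_var, mean_bound):
    """Pull `mean` back toward `old_mean` until it sits on the trust region boundary.

    Hypothetical helper for illustration; assumes a diagonal covariance `old_var`.
    """
    dist = mahalanobis_sq(mean, old_mean, old_var)
    if dist <= mean_bound:
        return list(mean)  # already inside the trust region, nothing to do
    # Interpolate between the new and the old mean so that the projected
    # mean has a squared Mahalanobis distance of exactly `mean_bound`.
    omega = math.sqrt(dist / mean_bound) - 1.0
    return [(m + omega * o) / (1.0 + omega) for m, o in zip(mean, old_mean)]


old_mean = [0.0, 0.0]
old_var = [1.0, 4.0]
new_mean = [3.0, 4.0]  # far outside the trust region
projected = project_mean(new_mean, old_mean, old_var, mean_bound=0.5)
```

A mean that already satisfies the bound is returned unchanged, so the projection only intervenes when an update would leave the trust region. The covariance part of the projection is more involved and differs between the Wasserstein, Frobenius and KL variants; it is not covered by this sketch.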
@@ -33,6 +30,23 @@ Then install this repo as a package:
pip install -e .
```
## Usage

TRPL can be used just like SB3's PPO:

```
import gymnasium as gym
from metastable_baselines2 import TRPL

# Placeholder values for illustration only (assumptions, not recommendations).
env_id = 'Pendulum-v1'
mean_bound, cov_bound = 0.03, 1e-3

projection = 'Wasserstein' # or Frobenius or KL

model = TRPL("MlpPolicy", env_id, n_steps=128, seed=0, policy_kwargs=dict(net_arch=[16]), projection_class=projection, projection_kwargs={'mean_bound': mean_bound, 'cov_bound': cov_bound}, verbose=1)
model.learn(total_timesteps=100)
```
For available `projection_kwargs`, have a look at [Metastable Projections](https://git.dominik-roth.eu/dodox/metastable-projections).

## License

Since this repo is an extension to [Stable Baselines 3 by DLR-RM](https://github.com/DLR-RM/stable-baselines3), it contains some of its code. SB3 is licensed under the [MIT-License](https://github.com/DLR-RM/stable-baselines3/blob/master/LICENSE).