Some additions for Stable Baselines 3, e.g. Contextual Covariance and Differentiable Trust Region Layers.

Metastable Baselines

While training an RL agent, we follow the gradient of the loss, which leads us to a minimum. If that minimum is merely a local one, it can be seen as a false vacuum in our loss space. Exploration mechanisms try to let the training procedure escape these stable states, making them metastable.

To achieve this, this repo contains some extensions for Stable Baselines 3 by DLR-RM.
These extensions include:

The resulting algorithms can then be tested for their ability to explore in the environments provided by Project Columbus.

This repo was created as part of my bachelor's thesis at ALR (KIT).

Installation

(optional) Columbus for test.py and replay.py

Install Project Columbus by following the instructions in the repo.

Install dependency: Metastable Projections

Follow the instructions for the Public Version (GitHub Mirror) or the Private Version (GitHub Mirror). The private version also requires ALR's ITPAL as a dependency. Only the private version supports KL Projections.

Install as a package

Then install this repo as a package:

pip install -e .
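
To check that the editable install worked, a small script along the following lines may help. It is only a sketch: the helper `is_installed` is a hypothetical name introduced here, and the import name `metastable_baselines` is an assumption based on the repository's package directory, not something this README states.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if the import system can locate the top-level package."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # Assumption: this repo installs under the import name `metastable_baselines`.
    # `stable_baselines3` is the import name of Stable Baselines 3.
    for name in ("stable_baselines3", "metastable_baselines"):
        status = "found" if is_installed(name) else "missing"
        print(f"{name}: {status}")
```

If either package is reported as missing, re-run the installation steps above in the same Python environment.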

License

Since this repo is an extension to Stable Baselines 3 by DLR-RM, it contains some of its code. SB3 is licensed under the MIT License.