Fancy RL provides minimalistic and efficient implementations of PPO and TRPL for torchrl.

Fancy RL

Fancy RL provides minimalistic and efficient implementations of Proximal Policy Optimization (PPO) and Trust Region Policy Layers (TRPL) using primitives from torchrl. This library focuses on providing clean, understandable code and reusable modules while leveraging the powerful functionality of torchrl.

This project is still a work in progress and not ready to be used. (Problems with tensordict routing are a pain to debug...)

Installation

Fancy RL requires Python 3.7-3.11. (TorchRL currently does not support Python 3.12)

From the root of a local clone of the repository:

pip install -e .

Usage

Fancy RL provides two main components:

  1. Ready-to-use Classes for PPO / TRPL: These classes allow you to quickly get started with reinforcement learning algorithms, enjoying the performance and hackability that come with using TorchRL.

    from fancy_rl import PPO, TRPL

    # Create an agent directly from a Gymnasium environment ID
    # (PPO is constructed and trained the same way)
    model = TRPL("CartPole-v1")

    # Run the training loop
    model.train()
    

    For environments, you can pass any gymnasium or Fancy Gym environment ID as a string, a function returning a gymnasium or torchrl environment, an already instantiated gymnasium or torchrl environment, or a dict that will be passed to gymnasium.make (see the sketch after this list). Check 'example/example.py' for a more complete usage example.

  2. Additional Modules for TRPL: Designed to integrate with torchrl's primitives-first approach, these modules are ideal for building custom algorithms with precise trust region projections. Oh, you want documentation for these? Too bad... (TODO)
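
The snippet below sketches the four ways of specifying an environment mentioned in point 1. It is an illustration rather than documentation: the variable names are made up, and the only assumption beyond the text above is that the dict form holds keyword arguments for gymnasium.make (such as id).

    import gymnasium as gym
    from fancy_rl import PPO

    # By Gymnasium / Fancy Gym environment ID
    ppo_from_id = PPO("CartPole-v1")

    # By a function returning a gymnasium (or torchrl) environment
    ppo_from_fn = PPO(lambda: gym.make("CartPole-v1"))

    # By an already instantiated environment
    ppo_from_env = PPO(gym.make("CartPole-v1"))

    # By a dict forwarded to gymnasium.make
    # (assumed to contain gymnasium.make keyword arguments, e.g. "id")
    ppo_from_dict = PPO({"id": "CartPole-v1"})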

Background on Trust Region Policy Layers (TRPL)

Trust region methods are essential in reinforcement learning for ensuring robust policy updates. Traditional methods like TRPO and PPO use approximations, which can sometimes violate constraints or fail to find optimal solutions. To address these issues, TRPL provides differentiable neural network layers that enforce trust regions through closed-form projections for deep Gaussian policies. These layers formalize trust regions individually for each state and complement existing reinforcement learning algorithms.
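
As a rough sketch of the mechanism (the notation here is ours, and the paper actually splits the bound into separate constraints for the mean and the covariance part): for a Gaussian policy whose network outputs a mean and covariance per state, the projection layer solves, for every state s,

    \begin{aligned}
    (\tilde{\mu}(s), \tilde{\Sigma}(s)) = \arg\min_{\mu', \Sigma'} \;& d\big(\mathcal{N}(\mu', \Sigma'),\, \mathcal{N}(\mu_\theta(s), \Sigma_\theta(s))\big) \\
    \text{s.t.} \;& d\big(\mathcal{N}(\mu', \Sigma'),\, \mathcal{N}(\mu_{\text{old}}(s), \Sigma_{\text{old}}(s))\big) \le \epsilon,
    \end{aligned}

where d is one of the distances listed in the next paragraph and the minimizer has a closed form that stays differentiable with respect to the network outputs.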

The TRPL implementation in Fancy RL includes projections based on the Kullback-Leibler divergence, the Wasserstein L2 distance, and the Frobenius norm for Gaussian distributions. This approach achieves results comparable to or better than existing methods while being less sensitive to specific implementation choices.
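
For reference, the standard closed-form expressions for these quantities between two Gaussians N(mu_1, Sigma_1) and N(mu_2, Sigma_2) are given below; the Frobenius variant (mean and covariance differences measured separately) is our reading of the TRPL paper rather than something quoted from this codebase. Here k denotes the action dimension.

    \mathrm{KL}\big(\mathcal{N}_1 \,\|\, \mathcal{N}_2\big) = \tfrac{1}{2}\Big[\operatorname{tr}(\Sigma_2^{-1}\Sigma_1) + (\mu_2 - \mu_1)^\top \Sigma_2^{-1} (\mu_2 - \mu_1) - k + \ln\tfrac{\det\Sigma_2}{\det\Sigma_1}\Big]

    W_2^2 = \lVert \mu_1 - \mu_2 \rVert^2 + \operatorname{tr}\Big(\Sigma_1 + \Sigma_2 - 2\big(\Sigma_2^{1/2}\Sigma_1\Sigma_2^{1/2}\big)^{1/2}\Big)

    \text{Frobenius:}\quad \lVert \mu_1 - \mu_2 \rVert^2 + \lVert \Sigma_1 - \Sigma_2 \rVert_F^2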

Testing

To run the test suite:

pytest test/

Status

Implemented Features

  • Proximal Policy Optimization (PPO) algorithm
  • Trust Region Policy Layers (TRPL) algorithm (WIP)
  • Support for continuous and discrete action spaces
  • Multiple projection methods (Rewritten for MIT License Compatibility):
    • KL Divergence projection
    • Frobenius norm projection
    • Wasserstein distance projection
    • Identity projection (equivalent to PPO)
  • Configurable neural network architectures for actor and critic
  • Logging support (Terminal and WandB, extendable)

TODO

  • All PPO Tests green
  • Better / more logging
  • Test / Benchmark PPO
  • Refactor Modules for TRPL
  • Get TRPL working
  • All TRPL Tests green
  • Make contextual covariance optional
  • Allow full covariance via Cholesky parameterization
  • Test / Benchmark TRPL
  • Write docs / extend README
  • Test functionality of non-gym envs
  • Implement SAC
  • Implement VLEARN

Contributing

Contributions are welcome! Feel free to open issues or submit pull requests to enhance the library.

License

This project is licensed under the MIT License.