
<h1 align="center">
  <br>
  <img src='./fancy_rl.svg' width="250px">
  <br><br>
  <b>Fancy RL</b>
  <br><br>
</h1>

Fancy RL provides a minimal, efficient implementation of Proximal Policy Optimization (PPO) and Trust Region Policy Layers (TRPL), built on primitives from [torchrl](https://pypi.org/project/torchrl/). The library aims for clean, understandable code and reusable modules while building on torchrl's functionality.

## Installation

Fancy RL requires Python 3.7-3.11 (TorchRL currently does not support Python 3.12).

```bash
pip install -e .
```

## Usage

Fancy RL provides two main components:

1. **Ready-to-use Classes for PPO / TRPL**: These classes let you get started with reinforcement learning algorithms quickly, with the performance and hackability that come with using TorchRL.

   ```python
   from fancy_rl import PPO, TRPL

   model = TRPL("CartPole-v1")

   model.train()
   ```

   For the environment, you can pass a [gymnasium](https://gymnasium.farama.org/) or [Fancy Gym](https://alrhub.github.io/fancy_gym/) environment ID as a string, a function returning a gymnasium or torchrl environment, an already instantiated gymnasium or torchrl environment, or a dict that will be passed to `gymnasium.make`. See `example/example.py` for a more complete usage example.

2. **Additional Modules for TRPL**: Designed to integrate with torchrl's primitives-first approach, these modules are ideal for building custom algorithms with precise trust region projections.

## Background on Trust Region Policy Layers (TRPL)

Trust region methods are essential in reinforcement learning because they keep policy updates robust by limiting how far the new policy may move from the old one. Traditional methods like TRPO and PPO enforce this limit only approximately, which can violate the constraint or fail to find the optimal solution. To address these issues, TRPL provides differentiable neural network layers that enforce trust regions through closed-form projections for deep Gaussian policies. These layers formalize trust regions individually for each state and complement existing reinforcement learning algorithms.
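To make the idea of a per-state, closed-form projection concrete, here is a minimal sketch in plain Python. It is not Fancy RL's implementation: the function names `project_mean` and `project_batch` are hypothetical, only the mean is projected, and the trust region is assumed to be a bound `eps` on the squared Euclidean distance between the new and old mean. If the bound is violated, the new mean is pulled back along the straight line toward the old mean until the distance equals `eps`.

```python
import math


def project_mean(mean, old_mean, eps):
    """Project `mean` so that its squared Euclidean distance to `old_mean` is at most `eps`.

    Hypothetical helper for illustration only, not part of Fancy RL.
    """
    diff = [m - o for m, o in zip(mean, old_mean)]
    dist = sum(d * d for d in diff)
    if dist <= eps:
        # Already inside the trust region: leave the mean unchanged.
        return list(mean)
    # Shrink the step so the squared distance is exactly eps.
    scale = math.sqrt(eps / dist)
    return [o + scale * d for o, d in zip(old_mean, diff)]


def project_batch(means, old_means, eps):
    """Apply the projection independently to every state in a batch."""
    return [project_mean(m, o, eps) for m, o in zip(means, old_means)]


if __name__ == "__main__":
    new = [[3.0, 4.0], [0.1, 0.0]]   # predicted means for two states
    old = [[0.0, 0.0], [0.0, 0.0]]   # means of the old policy
    print(project_batch(new, old, eps=1.0))
```

The first state violates the bound and is scaled back onto it, while the second is already inside the trust region and passes through unchanged. This is what enforcing the constraint individually for each state looks like in the simplest case.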

The TRPL implementation in Fancy RL includes projections based on the Kullback-Leibler divergence, the Wasserstein L2 distance, and the Frobenius norm for Gaussian distributions. This approach achieves similar or better results than existing methods while being less sensitive to specific implementation choices.
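For reference, the three dissimilarity measures can be written down in a few lines for Gaussians with diagonal covariance. The sketch below uses the textbook definitions and hypothetical function names. The exact variants used inside TRPL (for example, any scaling by the old covariance) may differ, so treat this as an illustration of the quantities involved rather than the library's code.

```python
import math


def kl_divergence(mu_p, sigma_p, mu_q, sigma_q):
    """KL(p || q) for diagonal Gaussians, given means and standard deviations."""
    total = 0.0
    for mp, sp, mq, sq in zip(mu_p, sigma_p, mu_q, sigma_q):
        total += math.log(sq ** 2 / sp ** 2) + (sp ** 2 + (mp - mq) ** 2) / sq ** 2 - 1.0
    return 0.5 * total


def wasserstein2_squared(mu_p, sigma_p, mu_q, sigma_q):
    """Squared Wasserstein-2 distance for diagonal Gaussians."""
    return sum(
        (mp - mq) ** 2 + (sp - sq) ** 2
        for mp, sp, mq, sq in zip(mu_p, sigma_p, mu_q, sigma_q)
    )


def frobenius_squared(mu_p, sigma_p, mu_q, sigma_q):
    """Squared mean distance plus squared Frobenius norm of the covariance difference."""
    return sum(
        (mp - mq) ** 2 + (sp ** 2 - sq ** 2) ** 2
        for mp, sp, mq, sq in zip(mu_p, sigma_p, mu_q, sigma_q)
    )


if __name__ == "__main__":
    p = ([0.0], [1.0])  # mean, standard deviation
    q = ([1.0], [2.0])
    print(kl_divergence(*p, *q))
    print(wasserstein2_squared(*p, *q))
    print(frobenius_squared(*p, *q))
```

All three measures are zero when the two Gaussians coincide, and each splits into a mean term and a covariance term. That split is what makes it natural to bound the mean and the covariance separately.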

## Testing

To run the test suite:

```bash
pytest test/test_ppo.py
```

## Contributing

Contributions are welcome! Feel free to open issues or submit pull requests to enhance the library.

## License

This project is licensed under the MIT License.