Fancy RL

Fancy RL is a minimalistic and efficient implementation of Proximal Policy Optimization (PPO) and Trust Region Policy Layers (TRPL) using primitives from torchrl. Future plans include implementing Soft Actor-Critic (SAC). This library focuses on providing clean and understandable code while leveraging the powerful functionalities of torchrl. We provide optional integration with wandb.

Installation

Fancy RL requires Python 3.7-3.11. (TorchRL currently does not support Python 3.12)

pip install -e .

Usage

Here's a basic example of how to train a PPO agent with Fancy RL:

from fancy_rl.ppo import PPO
from fancy_rl.policy import Policy
import gymnasium as gym

def env_fn():
    return gym.make("CartPole-v1")

# Create policy
env = env_fn()
policy = Policy(env.observation_space, env.action_space)

# Create PPO instance with default config
ppo = PPO(policy=policy, env_fn=env_fn)

# Train the agent
ppo.train()

For a more complete function description and advanced usage, refer to example/example.py.

Testing

To run the test suite:

pytest test/test_ppo.py

Contributing

Contributions are welcome! Feel free to open issues or submit pull requests to enhance the library.

License

This project is licensed under the MIT License.

1.4 KiB Raw Blame History