Add PPO example to README

parent 5dfd85a5af
commit e4c9f047d0

README.md (+38)
@@ -138,6 +138,44 @@ env.close()
`objectives` takes either strings naming predefined objectives, or lambda functions that take an observation and return a scalar reward. Final rewards are the (weighted) sum across all objectives. `info['objectives']` contains all objectives and their values.
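
For illustration, here is a minimal sketch mixing a predefined objective with a custom lambda; the observation field `core_temp`, the dict-style observation access, and the setpoint value are assumptions for the example, not guaranteed parts of NuconEnv:

```python
from nucon.rl import NuconEnv

# Sketch only: combine a predefined objective (by name) with a custom lambda.
# 'core_temp' and dict-style observations are assumptions for illustration.
env = NuconEnv(
    objectives=[
        'max_power',                               # predefined objective, by name
        lambda obs: -abs(obs['core_temp'] - 350),  # custom scalar reward from the observation
    ],
    seconds_per_step=5,
)

obs, info = env.reset()
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
print(info['objectives'])  # per-objective values that were (weighted-)summed into `reward`
```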

You can, for example, train a PPO agent using the [sb3](https://github.com/DLR-RM/stable-baselines3) implementation:
```python
from nucon.rl import NuconEnv
from stable_baselines3 import PPO

env = NuconEnv(objectives=['max_power'], seconds_per_step=5)

# Create the PPO (Proximal Policy Optimization) model
model = PPO(
    "MlpPolicy",
    env,
    verbose=1,
    learning_rate=3e-4,  # You can adjust hyperparameters as needed
    n_steps=2048,
    batch_size=64,
    n_epochs=10,
    gamma=0.99,
    gae_lambda=0.95,
    clip_range=0.2,
    ent_coef=0.01
)

# Train the model
model.learn(total_timesteps=100000)  # Adjust total_timesteps as needed

# Test the trained model
obs, info = env.reset()
for _ in range(1000):
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)

    if terminated or truncated:
        obs, info = env.reset()

# Close the environment
env.close()
```
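
Because training against the live game is slow (see below), it can be worth checkpointing the trained policy between sessions. A minimal sketch using stable-baselines3's standard save/load API (the file name is a placeholder):

```python
# Save the trained policy to disk (file name is a placeholder)
model.save("ppo_nucon")

# Later, or in a fresh process, restore it to keep training or evaluating
from stable_baselines3 import PPO
model = PPO.load("ppo_nucon", env=env)
```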

But there's a problem: RL algorithms require a huge number of training steps to reach passable policies, and Nucleares is a very slow simulation that cannot be trivially parallelized. That's why NuCon also provides a

## Simulator (Work in Progress)