diff --git a/README.md b/README.md
index e627fcf..65edde2 100644
--- a/README.md
+++ b/README.md
@@ -356,6 +356,165 @@ knn_learner.save_model('reactor_knn.pkl')
 The trained models can be integrated into the NuconSimulator to provide accurate dynamics based on real game data.
 
+## Full Training Loop
+
+The recommended end-to-end workflow for training an RL operator is an iterative cycle of real-game data collection, model fitting, and simulated training. The real game is slow and cannot be parallelised, so the bulk of RL training happens in the simulator — the game is used only as an oracle for data collection and evaluation.
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│ 1. Human dataset collection                                 │
+│    Play the game: start up the reactor, operate it across   │
+│    a range of states. NuCon records state transitions.      │
+└───────────────────────┬─────────────────────────────────────┘
+                        │
+                        ▼
+┌─────────────────────────────────────────────────────────────┐
+│ 2. Initial model fitting                                    │
+│    Fit NN or kNN dynamics model to the collected dataset.   │
+│    kNN is instant; NN needs gradient steps but generalises  │
+│    better with more data.                                   │
+└───────────────────────┬─────────────────────────────────────┘
+                        │
+              ┌─────────▼──────────┐
+              │ 3. Train RL        │◄──────────────────────┐
+              │    in simulator    │                       │
+              │    (fast, many     │                       │
+              │    trajectories)   │                       │
+              └─────────┬──────────┘                       │
+                        │                                  │
+                        ▼                                  │
+              ┌─────────────────────┐                      │
+              │ 4. Eval in game     │                      │
+              │  + collect new data │                      │
+              │  (merge & prune     │                      │
+              │   dataset)          │                      │
+              └─────────┬───────────┘                      │
+                        │                                  │
+                        ▼                                  │
+              ┌─────────────────────┐    model improved?   │
+              │ 5. Refit model      ├──────── yes ─────────┘
+              │    on expanded data │
+              └─────────────────────┘
+```
+
+### Step 1 — Human dataset collection
+
+Start `NuconModelLearner` before or during your play session. Try to cover a wide range of reactor states — startup from cold, ramping power up and down, adjusting individual rod banks, pump speed changes.
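A quick way to verify coverage after a session is to look at the spread of each recorded state variable. A minimal, self-contained sketch of such a check (the parameter names and values below are made up for illustration and are not NuCon's API):

```python
import numpy as np

# Hypothetical recorded snapshots: rows are samples, columns are parameters.
states = np.array([
    [350.0, 12.0, 0.80],   # cold startup
    [355.0, 30.0, 0.82],
    [600.0, 55.0, 1.00],   # near full power
    [610.0, 60.0, 1.00],
])
params = ['CORE_TEMP', 'RODS_POS', 'PUMP_SPEED']  # illustrative names

# Span of each variable: a narrow span means that region was barely explored.
spans = {name: float(col.max() - col.min()) for name, col in zip(params, states.T)}
print(spans)
```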
+Diversity in the dataset directly determines how accurate the simulator will be.
+
+```python
+from nucon.model import NuconModelLearner
+
+learner = NuconModelLearner(
+    dataset_path='reactor_dataset.pkl',
+    time_delta=10.0,  # 10 game-seconds per sample
+)
+learner.collect_data(num_steps=500, save_every=10)
+```
+
+The collector checkpoints the dataset every 10 samples (`save_every=10`), retries automatically if the game crashes, and scales its wall-clock sleep with `GAME_SIM_SPEED` so that samples stay 10 game-seconds apart regardless of simulation speed.
+
+### Step 2 — Initial model fitting
+
+```python
+from nucon.model import NuconModelLearner
+
+learner = NuconModelLearner(dataset_path='reactor_dataset.pkl')
+
+# Option A: kNN + GP (instant fit, built-in uncertainty estimation)
+learner.drop_redundant(min_state_distance=0.1, min_output_distance=0.05)
+learner.fit_knn(k=10)
+learner.save_model('reactor_knn.pkl')
+
+# Option B: Neural network (better extrapolation with larger datasets)
+learner.train_model(batch_size=32, num_epochs=50)
+learner.drop_well_fitted(error_threshold=1.0)  # keep hard samples for next round
+learner.save_model('reactor_nn.pth')
+```
+
+### Step 3 — Train RL in simulator
+
+Load the fitted model into the simulator and train with SAC + HER. The simulator runs orders of magnitude faster than the real game, allowing millions of steps in reasonable time.
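The `'future'` goal-selection strategy used below relabels stored transitions with goals that were actually achieved later in the same episode, so even failed episodes yield successful, reward-bearing training signal. A minimal, self-contained sketch of the idea (simplified transitions, not Stable-Baselines3's internal implementation):

```python
import random

# A hypothetical failed episode: the desired 900 kW was never reached,
# but intermediate outputs were achieved along the way.
episode = [
    {'achieved_goal': 100.0, 'desired_goal': 900.0},
    {'achieved_goal': 250.0, 'desired_goal': 900.0},
    {'achieved_goal': 420.0, 'desired_goal': 900.0},
]

def relabel_future(episode, t, rng):
    """Swap transition t's desired goal for a goal achieved at step >= t."""
    future = rng.randrange(t, len(episode))
    new = dict(episode[t])
    new['desired_goal'] = episode[future]['achieved_goal']
    # Sparse goal-reaching reward: 0 on success, -1 otherwise.
    new['reward'] = 0.0 if abs(new['achieved_goal'] - new['desired_goal']) < 1e-6 else -1.0
    return new

relabelled = relabel_future(episode, 1, random.Random(0))
```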
+
+```python
+from nucon.sim import NuconSimulator, OperatingState
+from nucon.rl import NuconGoalEnv
+from stable_baselines3 import SAC
+from stable_baselines3.common.buffers import HerReplayBuffer
+
+simulator = NuconSimulator()
+simulator.load_model('reactor_knn.pkl')
+simulator.set_state(OperatingState.NOMINAL)
+
+env = NuconGoalEnv(
+    goal_params=['GENERATOR_0_KW', 'GENERATOR_1_KW', 'GENERATOR_2_KW'],
+    goal_range={'GENERATOR_0_KW': (0, 1200), 'GENERATOR_1_KW': (0, 1200), 'GENERATOR_2_KW': (0, 1200)},
+    tolerance=0.05,
+    simulator=simulator,
+    seconds_per_step=10,
+)
+
+model = SAC(
+    'MultiInputPolicy', env,
+    replay_buffer_class=HerReplayBuffer,
+    replay_buffer_kwargs={'n_sampled_goal': 4, 'goal_selection_strategy': 'future'},
+    verbose=1,
+)
+model.learn(total_timesteps=500_000)
+model.save('rl_policy.zip')
+```
+
+### Step 4 — Eval in game + collect new data
+
+Run the trained policy against the real game. This validates whether the simulator was accurate enough, and simultaneously collects new data covering states the policy visits — which may be regions the original dataset missed.
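How much genuinely new territory an evaluation run covers can be estimated from each new state's distance to its nearest neighbour in the existing dataset: states far from everything already recorded are the valuable ones. A self-contained sketch with toy 2-D states (not NuCon's API):

```python
import numpy as np

old_states = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # existing dataset
new_states = np.array([[0.1, 0.0], [5.0, 5.0]])              # from the eval run

# Pairwise distances (new x old), then the closest existing sample per new state.
dists = np.linalg.norm(new_states[:, None, :] - old_states[None, :, :], axis=-1)
novelty = dists.min(axis=1)
print(novelty)  # the first sample sits in known territory, the second far outside
```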
+
+```python
+from nucon.rl import NuconGoalEnv
+from nucon.model import NuconModelLearner
+from stable_baselines3 import SAC
+
+# Load policy and run in real game
+env = NuconGoalEnv(
+    goal_params=['GENERATOR_0_KW', 'GENERATOR_1_KW', 'GENERATOR_2_KW'],
+    goal_range={'GENERATOR_0_KW': (0, 1200), 'GENERATOR_1_KW': (0, 1200), 'GENERATOR_2_KW': (0, 1200)},
+    seconds_per_step=10,
+)
+policy = SAC.load('rl_policy.zip')
+
+# Collect new transitions into a separate dataset while the policy runs
+new_data_learner = NuconModelLearner(
+    dataset_path='reactor_dataset_new.pkl',
+    time_delta=10.0,
+)
+
+obs, _ = env.reset()
+for _ in range(200):
+    action, _ = policy.predict(obs, deterministic=True)
+    obs, reward, terminated, truncated, _ = env.step(action)
+    # Record each transition into new_data_learner's dataset here,
+    # using NuconModelLearner's recording API.
+    if terminated or truncated:
+        obs, _ = env.reset()
+```
+
+### Step 5 — Refit model on expanded data
+
+Merge the new data into the original dataset and refit:
+
+```python
+from nucon.model import NuconModelLearner
+
+learner = NuconModelLearner(dataset_path='reactor_dataset.pkl')
+learner.merge_datasets('reactor_dataset_new.pkl')
+
+# Prune redundant samples before refitting
+learner.drop_redundant(min_state_distance=0.1, min_output_distance=0.05)
+print(f"Dataset size after pruning: {len(learner.dataset)}")
+
+learner.fit_knn(k=10)
+learner.save_model('reactor_knn.pkl')
+```
+
+Then go back to Step 3 with the improved model. With each iteration, the simulator gets more accurate, the policy gets better, and the new data collection explores increasingly interesting regions of the state space.
+
+**When to stop**: when the policy performs well in the real game and the kNN uncertainty stays low throughout an episode (indicating the policy stays within the known data distribution).
+
 ## Testing
 
 NuCon includes a test suite to verify its functionality and compatibility with the Nucleares game.
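The stopping criterion in the training loop above relies on the kNN model's uncertainty staying low throughout an episode. A simple proxy for that signal is the mean distance from each visited state to its k nearest training samples; a self-contained sketch with toy 1-D states (not NuCon's actual uncertainty estimate):

```python
import numpy as np

train_states = np.array([0.0, 1.0, 2.0, 3.0])  # states the model was fitted on
episode = [0.5, 1.2, 9.0]                      # states visited by the policy

def knn_uncertainty(x, train, k=2):
    """Mean distance to the k nearest training samples; low means in-distribution."""
    return float(np.sort(np.abs(train - x))[:k].mean())

scores = [knn_uncertainty(s, train_states) for s in episode]
print(scores)  # the last state is far outside the training distribution
```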