<div align="center">
<img src='./logo.png' width="250px">
<h2>NuCon</h2>
<br>
</div>
NuCon (Nucleares Controller) is a Python library designed to interface with and control parameters in [Nucleares](https://store.steampowered.com/app/1428420/Nucleares/), a nuclear reactor simulation game. It provides a robust, type-safe foundation for reading and writing game parameters, allowing users to easily create their own automations and control systems.
NuCon also provides a work-in-progress implementation of a reinforcement learning environment for training control policies, and a simulator based on model learning.
> [!NOTE]
> NuCon is compatible with Nucleares v2.2.25.213. The game exposes a rich set of writable parameters including individual rod bank positions (`ROD_BANK_POS_{0-8}_ORDERED`), pump speeds, MSCV and turbine bypass setpoints, and various switches. Core chemistry parameters (e.g. Xenon concentration) are still read-only. Development on the advanced features (Reinforcement / Model Learning) is ongoing.
## Features
- Enum-based parameter system for type safety and code clarity
- Support for various parameter types including floats, integers, booleans, strings, and custom enums
- Read and write capabilities for game parameters
- Reinforcement learning environment for training control policies
- Built-in simulator for rapid prototyping and testing
- Model learning for dynamics prediction
## Installation
To install NuCon, clone this repository and install via pip:
```bash
git clone https://git.dominik-roth.eu/dodox/NuCon
cd NuCon
pip install -e .
```
## Usage
Here's a basic example of how to use NuCon:
```python
from nucon import Nucon
nucon = Nucon()
# or nucon = Nucon(host='localhost', port=8786)
# Enable dummy mode for testing (optional)
nucon.set_dummy_mode(True)
# Read a parameter
core_temp = nucon.CORE_TEMP.value
print(f"Core Temperature: {core_temp}")
# >> Core Temperature: 500.0
# Read a parameter with an enum type
pump_status = nucon.COOLANT_CORE_CIRCULATION_PUMP_0_STATUS.value
print(f"Pump 0 Status: {pump_status}")
# >> Pump 0 Status: PumpStatus.INACTIVE
if nucon.COOLANT_CORE_CIRCULATION_PUMP_0_STATUS.value:
    print('Pump 0 is active.')
# Write to a parameter (has no effect in dummy mode)
nucon.RODS_POS_ORDERED.value = 50
print(f"Rods Position Ordered: {nucon.RODS_POS_ORDERED.value}")
# >> Rods Position Ordered: 50.0
# The repr of an attribute contains all contained info
nucon.CORE_TEMP
# >> NuconParameter(id='CORE_TEMP', value=500.0, param_type=float, is_writable=False)
```
## API Reference
The `nucon` instance contains all available parameters.
Parameter properties:
- `nucon.<PARAMETER>.value`: Get or set the current value of the parameter. Assigning a new value will write it to the game.
- `nucon.<PARAMETER>.param_type`: Get the type of the parameter
- `nucon.<PARAMETER>.is_writable`: Check if the parameter is writable
- `nucon.<PARAMETER>.is_readable`: `False` for write-only parameters (e.g. VALVE_OPEN, CORE_SCRAM_BUTTON). Reading raises `AttributeError`.
- `nucon.<PARAMETER>.is_cheat`: `True` for game-event triggers (all `FUN_*`). Writing raises `ValueError` unless `cheat_mode=True`.
- `nucon.<PARAMETER>.enum_type`: Get the enum type of the parameter if it's an enum, otherwise None
- `nucon.<PARAMETER>.unit`: Unit string if defined (e.g. `'°C'`, `'bar'`, `'%'`)
Parameter methods:
- `nucon.<PARAMETER>.read()`: Get the current value of the parameter (alias for `value`)
- `nucon.<PARAMETER>.write(new_value, force=False)`: Write a new value to the parameter. `force` will try to write even if the parameter is known to be non-writable or the value is outside its known allowed range.
Class methods:
- `nucon.get(parameter)`: Get the value of a specific parameter. Also accepts string parameter names.
- `nucon.set(parameter, value, force=False)`: Set the value of a specific parameter. Also accepts string parameter names. `force` bypasses writable/range/cheat checks.
- `nucon.get_all_readable()`: Get a dict of all readable parameters.
- `nucon.get_all_writable()`: Get a dict of all writable parameters (includes write-only params).
- `nucon.get_all()`: Get all readable parameter values as a dictionary.
- `nucon.get_all_iter()`: Get all readable parameter values as a generator.
- `nucon.get_multiple(params)`: Get values for multiple specified parameters.
- `nucon.get_multiple_iter(params)`: Get values for multiple specified parameters as a generator.
- `nucon.get_game_variable_names()`: Query the game for all exposed variable names (GET and POST), excluding special endpoints.
- `nucon.set_dummy_mode(dummy_mode)`: In dummy mode, returns sensible values without connecting to the game and silently ignores writes.
- `nucon.set_cheat_mode(cheat_mode)`: Enable writing to cheat parameters (`FUN_*` event triggers). Default `False`.
Valve API (motorized actuators: OPEN/CLOSE powers the motor, OFF holds current position):
- `nucon.get_valve(name)`: Get state dict for a single valve (`Value`, `IsOpened`, `IsClosed`, `Stuck`, …).
- `nucon.get_valves()`: Get state dict for all 53 valves.
- `nucon.open_valve(name)` / `nucon.open_valves(names)`: Power actuator toward open.
- `nucon.close_valve(name)` / `nucon.close_valves(names)`: Power actuator toward closed.
- `nucon.off_valve(name)` / `nucon.off_valves(names)`: Cut actuator power, hold current position (normal resting state).
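The OPEN/CLOSE/OFF semantics can be illustrated with a toy actuator model (pure Python, independent of NuCon; the class name and movement rate are invented for illustration):

```python
class ToyValveActuator:
    """Toy model of a motorized valve: OPEN/CLOSE power the motor,
    OFF cuts power and the valve holds its current position."""

    def __init__(self, position=0.0, rate=10.0):
        self.position = position  # percent open, 0..100
        self.rate = rate          # percent moved per tick while powered
        self.command = 'OFF'

    def tick(self):
        if self.command == 'OPEN':
            self.position = min(100.0, self.position + self.rate)
        elif self.command == 'CLOSE':
            self.position = max(0.0, self.position - self.rate)
        # 'OFF': motor unpowered, position is held
        return self.position

valve = ToyValveActuator()
valve.command = 'OPEN'
for _ in range(3):
    valve.tick()
valve.command = 'OFF'   # cut power: valve rests at 30% open
valve.tick()
print(valve.position)   # → 30.0
```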
Custom Enum Types:
- `PumpStatus`: Enum for pump status (INACTIVE, ACTIVE_NO_SPEED_REACHED\*, ACTIVE_SPEED_REACHED\*, REQUIRES_MAINTENANCE, NOT_INSTALLED, INSUFFICIENT_ENERGY)
- `PumpDryStatus`: Enum for pump dry status (ACTIVE_WITHOUT_FLUID\*, INACTIVE_OR_ACTIVE_WITH_FLUID)
- `PumpOverloadStatus`: Enum for pump overload status (ACTIVE_AND_OVERLOAD\*, INACTIVE_OR_ACTIVE_NO_OVERLOAD)
- `BreakerStatus`: Enum for breaker status (OPEN\*, CLOSED)
\*: Truthy value (will be treated as true in e.g. if statements).
So if you're not in the mood to play the game manually, this API can be used to easily create your own automations and control systems. Maybe a little PID controller for the rods? Or, if you wanna go crazy, why not try some
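For instance, a minimal PID sketch (the `PID` class and its gains are illustrative, not a tuned controller; the commented loop shows how it might hook into the API above):

```python
class PID:
    """Toy PID controller (illustrative gains, not a tuned controller)."""

    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.integral = 0.0
        self.prev_error = None

    def update(self, measurement, dt):
        error = self.setpoint - measurement
        self.integral += error * dt
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

pid = PID(kp=0.5, ki=0.01, kd=0.1, setpoint=350.0)  # hold core temp at 350 °C
print(pid.update(340.0, dt=1.0))  # → 5.1

# Against the running game, the loop would look something like (not executed here):
# nucon = Nucon()
# while True:
#     u = pid.update(nucon.CORE_TEMP.value, dt=1.0)
#     nucon.RODS_POS_ORDERED.value = min(100, max(0, nucon.RODS_POS_ORDERED.value - u))
#     time.sleep(1.0)
```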
## Reinforcement Learning
NuCon includes a Reinforcement Learning (RL) environment based on the Gymnasium (formerly OpenAI Gym) interface. This allows you to train control policies for the Nucleares game instead of writing them yourself. Requires additional dependencies.
### Additional Dependencies
To use the RL environment you'll need to install `gymnasium` and `numpy`. You can do so via
```bash
pip install -e '.[rl]'
```
### Environments
Two environment classes are provided in `nucon/rl.py`:
**`NuconEnv`**: classic fixed-objective environment. You define one or more objectives at construction time (e.g. maximise power output, keep temperature in range). The agent always trains toward the same goal.
- Observation space: all readable numeric parameters (~290 dims).
- Action space: all readable-back writable parameters (~30 dims): 9 individual rod bank positions, 3 MSCVs, 3 turbine bypass valves, 6 coolant pump speeds, condenser pump, freight/vent switches, resistor banks, and more.
- Objectives: predefined strings (`'max_power'`, `'episode_time'`) or arbitrary callables `(obs) -> float`. Multiple objectives are weighted-summed.
**`NuconGoalEnv`**: goal-conditioned environment. The desired goal (e.g. target generator output) is sampled at the start of each episode and provided as part of the observation. A single policy learns to reach *any* goal in the specified range, making it far more useful than a fixed-objective agent. Designed for training with [Hindsight Experience Replay (HER)](https://arxiv.org/abs/1707.01495), which makes sparse-reward goal-conditioned training tractable.
- Observation space: `Dict` with keys `observation` (non-goal params), `achieved_goal` (current goal param values, normalised to [0,1]), `desired_goal` (target, normalised to [0,1]).
- Goals are sampled uniformly from the specified `goal_range` each episode.
- Reward defaults to negative L2 distance in normalised goal space (dense). Pass `tolerance` for a sparse `{0, -1}` reward; this works particularly well with HER.
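The reward logic just described can be sketched as follows (illustrative, not NuCon's actual implementation):

```python
import numpy as np

# Dense: negative L2 distance in normalised goal space.
# Sparse (with a tolerance): 0 within tolerance, -1 otherwise — HER-friendly.
def goal_reward(achieved, desired, tolerance=None):
    dist = np.linalg.norm(np.asarray(achieved) - np.asarray(desired))
    if tolerance is None:
        return -dist                               # dense
    return 0.0 if dist <= tolerance else -1.0      # sparse {0, -1}

print(goal_reward([0.5, 0.5], [0.5, 0.75]))                 # → -0.25 (dense)
print(goal_reward([0.5, 0.5], [0.5, 0.75], tolerance=0.05)) # → -1.0 (sparse)
```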
### NuconEnv Usage
```python
from nucon.rl import NuconEnv, Parameterized_Objectives
env = NuconEnv(objectives=['max_power'], seconds_per_step=5)
# env2 = gym.make('Nucon-max_power-v0')
# env3 = NuconEnv(objectives=[Parameterized_Objectives['target_temperature'](goal_temp=350)], objective_weights=[1.0], seconds_per_step=5)
obs, info = env.reset()
for _ in range(1000):
    action = env.action_space.sample()  # Your agent here (instead of random)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```
`objectives` takes either the names of predefined objectives as strings, or callables which take an observation and return a scalar reward. The final reward is the weighted sum across all objectives. `info['objectives']` contains all objectives and their values.
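As a sketch of how such callables and weights combine (the dict observation here is purely illustrative; the real env passes its own observation format):

```python
# Illustrative: two custom objectives combined by weighted sum.
objectives = [
    lambda obs: obs.get('GENERATOR_0_KW', 0.0) / 1000.0,   # maximise power
    lambda obs: -abs(obs.get('CORE_TEMP', 0.0) - 350.0),   # keep temp near 350 °C
]
weights = [1.0, 0.1]

def total_reward(obs):
    return sum(w * f(obs) for w, f in zip(weights, objectives))

print(total_reward({'GENERATOR_0_KW': 500.0, 'CORE_TEMP': 360.0}))  # → -0.5
```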
You can e.g. train a PPO agent using the [sb3](https://github.com/DLR-RM/stable-baselines3) implementation:
```python
from nucon.rl import NuconEnv
from stable_baselines3 import PPO
env = NuconEnv(objectives=['max_power'], seconds_per_step=5)
model = PPO(
    "MlpPolicy",
    env,
    verbose=1,
    learning_rate=3e-4,
    n_steps=2048,
    batch_size=64,
    n_epochs=10,
    gamma=0.99,
    gae_lambda=0.95,
    clip_range=0.2,
    ent_coef=0.01,
)
model.learn(total_timesteps=100_000)
obs, info = env.reset()
for _ in range(1000):
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```
### NuconGoalEnv + HER Usage
HER works by relabelling past trajectories with the goal that was *actually achieved*, turning every episode into useful training signal even when the agent never reaches the intended target. This makes it much more sample-efficient than standard RL for goal-reaching tasks. This matters a lot given how slow the real game is.
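The relabelling trick itself fits in a few lines (a conceptual sketch of the `'future'` strategy, not sb3's `HerReplayBuffer`):

```python
import numpy as np

# Conceptual sketch of HER's 'future' goal-selection strategy.
def her_relabel(episode, reward_fn, n_sampled_goal=4, rng=np.random):
    """episode: list of (obs, achieved_goal, action) tuples.
    Returns extra transitions whose desired goal is a goal that was
    actually achieved later in the same episode."""
    extra = []
    for t in range(len(episode) - 1):
        obs, achieved, action = episode[t]
        for _ in range(n_sampled_goal):
            future = rng.randint(t + 1, len(episode))  # sample a future step
            new_goal = episode[future][1]              # the goal achieved there
            reward = reward_fn(achieved, new_goal)     # recompute sparse reward
            extra.append((obs, action, new_goal, reward))
    return extra

# Every episode yields (len - 1) * n_sampled_goal additional transitions,
# many of them "successful", even if the original goal was never reached.
```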
```python
from nucon.rl import NuconGoalEnv, UncertaintyPenalty, UncertaintyAbort
from stable_baselines3 import SAC
from stable_baselines3.common.buffers import HerReplayBuffer
env = NuconGoalEnv(
    goal_params=['GENERATOR_0_KW', 'GENERATOR_1_KW', 'GENERATOR_2_KW'],
    goal_range={
        'GENERATOR_0_KW': (0.0, 1200.0),
        'GENERATOR_1_KW': (0.0, 1200.0),
        'GENERATOR_2_KW': (0.0, 1200.0),
    },
    tolerance=0.05,       # sparse: within 5% of range counts as success (recommended with HER)
    seconds_per_step=5,
    simulator=simulator,  # use a pre-trained simulator for fast pre-training
    # Keep policy within the simulator's known data distribution.
    # SIM_UNCERTAINTY (kNN-GP posterior std) is injected into obs when a simulator is active.
    # Tune start/scale/threshold to taste.
    objectives=[UncertaintyPenalty(start=0.3, scale=1.0)],  # L2 penalty above soft threshold
    terminators=[UncertaintyAbort(threshold=0.7)],          # abort episode at hard threshold
)
# Or use a preset: env = gym.make('Nucon-goal_power-v0', simulator=simulator)
model = SAC(
    'MultiInputPolicy',
    env,
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs={'n_sampled_goal': 4, 'goal_selection_strategy': 'future'},
    verbose=1,
    learning_rate=1e-3,
    batch_size=256,
    tau=0.005,
    gamma=0.98,
    train_freq=1,
    gradient_steps=1,
)
model.learn(total_timesteps=500_000)
```
At inference time, inject any target by constructing the observation manually:
```python
import numpy as np
obs, _ = env.reset()
# Override the desired goal (values are normalised to [0,1] within goal_range)
obs['desired_goal'] = np.array([0.8, 0.8, 0.8], dtype=np.float32) # ~960 kW per generator
action, _ = model.predict(obs, deterministic=True)
```
Predefined goal environments:
- `Nucon-goal_power-v0`: target total generator output (3 × 0–1200 kW)
- `Nucon-goal_temp-v0`: target core temperature (280–380 °C)
RL algorithms require a huge number of training steps, and Nucleares is slow and cannot be trivially parallelised. That's why NuCon provides a built-in simulator.
## Simulator
NuCon provides a built-in simulator to address the challenge of slow training times in the actual Nucleares game. This simulator allows for rapid prototyping and testing of control policies without the need for the full game environment. Key features include:
- Mimics the behavior of the Nucleares game API
- Configurable initial states and operating modes
- Faster than real-time simulation
- Supports parallel execution for increased training throughput
### Additional Dependencies
To use the simulator you'll need to install `torch` and `flask`. You can do so via
```bash
pip install -e '.[sim]'
```
### Usage
To use the NuCon simulator:
```python
from nucon import Nucon
from nucon.sim import NuconSimulator, OperatingState
# Create a simulator instance
simulator = NuconSimulator()
# Load a dynamics model (explained later)
simulator.load_model('path/to/model.pth')
# Set initial state (optional)
simulator.set_state(OperatingState.NOMINAL)
# Run the simulator, will start the web server
simulator.run()
# Access via nucon by using the simulator's port
nucon = Nucon(port=simulator.port)
# Or use the simulator with NuconEnv
from nucon.rl import NuconEnv
env = NuconEnv(simulator=simulator)  # With a simulator, each step tells the simulator to skip forward instead of waiting on the game
# Train your RL agent using the simulator
# ...
```
The simulator needs an accurate dynamics model of the game. NuCon provides tools to learn one from real gameplay data.
## Model Learning
To address the challenge of unknown game dynamics, NuCon provides tools for collecting data, creating datasets, and training models to learn the reactor dynamics. Key features include:
- **Data Collection**: Gathers state transitions from human play or automated agents. `time_delta` is specified in game-time seconds; wall-clock sleep is automatically adjusted for `GAME_SIM_SPEED` so collected deltas are uniform regardless of simulation speed.
- **Automatic param filtering**: Junk params (GAME_VERSION, TIME, ALARMS_ACTIVE, …) and params from uninstalled subsystems (returns `None`) are automatically excluded from model inputs/outputs.
- **Two model backends**: Neural network (NN) or a local Gaussian Process approximated via k-Nearest Neighbours (kNN-GP).
- **Uncertainty estimation**: The kNN-GP backend returns a GP posterior standard deviation alongside each prediction; 0 means the query lies on known data, ~1 means it is out of distribution.
- **Dataset management**: Tools for saving, loading, merging, and pruning datasets.
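The game-time/wall-clock relationship from the first bullet amounts to (illustrative sketch, not NuCon's collection code):

```python
# A step of `time_delta` game-seconds takes time_delta / GAME_SIM_SPEED
# wall-clock seconds, so collected deltas stay uniform at any sim speed.
def wall_clock_sleep(time_delta, game_sim_speed):
    return time_delta / game_sim_speed

print(wall_clock_sleep(10.0, 1.0))  # → 10.0 (real-time)
print(wall_clock_sleep(10.0, 4.0))  # → 2.5  (game running at 4x speed)
```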
### Additional Dependencies
```bash
pip install -e '.[model]'
```
### Model selection
**kNN-GP** (the `ReactorKNNModel` backend) is a local Gaussian Process: it finds the `k` nearest neighbours in the training set, fits an RBF kernel on them, and returns a prediction plus a GP posterior std as uncertainty. It works well from a few hundred samples and requires no training. **NN** needs input normalisation and several thousand samples to generalise; use it once you have a large dataset. For initial experiments, start with kNN-GP (`k=10`).
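The kNN-GP idea can be sketched in plain numpy (illustrative, not `ReactorKNNModel` itself):

```python
import numpy as np

# Local GP: fit an RBF-kernel GP on the k nearest neighbours of the query,
# return the posterior mean and std (std ≈ 0 on known data, ≈ 1 far from it).
def knn_gp_predict(X, y, query, k=10, lengthscale=1.0, noise=1e-6):
    idx = np.argsort(np.linalg.norm(X - query, axis=1))[:k]
    Xk, yk = X[idx], y[idx]

    def rbf(A, B):
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * sq / lengthscale**2)

    K = rbf(Xk, Xk) + noise * np.eye(len(Xk))        # kernel on the neighbours
    k_star = rbf(Xk, query[None, :])[:, 0]           # kernel to the query
    mean = k_star @ np.linalg.solve(K, yk)
    var = 1.0 - k_star @ np.linalg.solve(K, k_star)  # prior variance is 1
    return mean, np.sqrt(max(var, 0.0))

# Querying at a training point gives near-zero uncertainty; a far-away
# query returns std ≈ 1, i.e. out of distribution.
```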
### Usage
```python
from nucon.model import NuconModelLearner
# --- Data collection ---
learner = NuconModelLearner(
    time_delta=10.0,             # 10 game-seconds per step (wall sleep auto-scales with sim speed)
    include_valve_states=False,  # set True to include all 53 valve positions as model inputs
)
learner.collect_data(num_steps=1000)
learner.save_dataset('reactor_dataset.pkl')
# Merge datasets collected across multiple sessions
learner.merge_datasets('other_session.pkl')
# --- Neural network backend ---
nn_learner = NuconModelLearner(dataset_path='reactor_dataset.pkl')
nn_learner.train_model(batch_size=32, num_epochs=50) # creates NN model on first call
# Drop samples the NN already predicts well (keep hard cases for further training)
nn_learner.drop_well_fitted(error_threshold=1.0)
nn_learner.save_model('reactor_nn.pth')
# --- kNN-GP backend ---
knn_learner = NuconModelLearner(dataset_path='reactor_dataset.pkl')
# Drop near-duplicate samples before fitting (keeps diverse coverage).
# A sample is dropped only if BOTH its input state AND output transition
# are within the given distances of an already-kept sample.
knn_learner.drop_redundant(min_state_distance=0.1, min_output_distance=0.05)
knn_learner.fit_knn(k=10) # creates kNN-GP model on first call
# Point prediction
state = knn_learner._get_state()
pred = knn_learner.model.forward(state, time_delta=10.0)
# Prediction with uncertainty
pred, uncertainty = knn_learner.predict_with_uncertainty(state, time_delta=10.0)
print(f"CORE_TEMP: {pred['CORE_TEMP']:.1f} ± {uncertainty:.3f} (std, GP posterior)")
# uncertainty ≈ 0: confident (query near known data)
# uncertainty ≈ 1: out of distribution
knn_learner.save_model('reactor_knn.pkl')
```
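The `drop_redundant` rule can be sketched as follows (illustrative, not NuCon's implementation):

```python
import numpy as np

# A sample is dropped only if BOTH its input state AND its output transition
# are close to an already-kept sample; returns indices of kept samples.
def drop_redundant(states, outputs, min_state_distance, min_output_distance):
    kept = []
    for i in range(len(states)):
        redundant = any(
            np.linalg.norm(states[i] - states[j]) < min_state_distance
            and np.linalg.norm(outputs[i] - outputs[j]) < min_output_distance
            for j in kept
        )
        if not redundant:
            kept.append(i)
    return kept
```

A sample with a familiar state but a surprising transition is therefore kept: it carries new dynamics information even though its input looks redundant.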
The trained models can be integrated into the NuconSimulator to provide accurate dynamics based on real game data.
## Full Training Loop
The recommended end-to-end workflow for training an RL operator is an iterative cycle of real-game data collection, model fitting, and simulated training. The real game is slow and cannot be parallelised, so the bulk of RL training happens in the simulator. The game is used only as an oracle for data and evaluation.
```
┌─────────────────────────────────────────────────────────────┐
│ 1. Human dataset collection                                 │
│    Play the game: start up the reactor, operate it across   │
│    a range of states. NuCon records state transitions.      │
└───────────────────────┬─────────────────────────────────────┘
                        ▼
┌─────────────────────────────────────────────────────────────┐
│ 2. Initial model fitting                                    │
│    Fit NN or kNN dynamics model to the collected dataset.   │
│    kNN is instant; NN needs gradient steps but generalises  │
│    better with more data.                                   │
└───────────────────────┬─────────────────────────────────────┘
                        │
              ┌─────────▼──────────┐
              │ 3. Train RL        │◄───────────────────────┐
              │    in simulator    │                        │
              │    (fast, many     │                        │
              │    trajectories)   │                        │
              └─────────┬──────────┘                        │
                        │                                   │
                        ▼                                   │
              ┌─────────────────────┐                       │
              │ 4. Eval in game     │                       │
              │ + collect new data  │                       │
              │ (merge & prune      │                       │
              │ dataset)            │                       │
              └─────────┬───────────┘                       │
                        │                                   │
                        ▼                                   │
              ┌─────────────────────┐   model improved?     │
              │ 5. Refit model      ├──────── yes ──────────┘
              │ on expanded data    │
              └─────────────────────┘
```
**Step 1 — Human dataset collection**: Run `NuconModelLearner.collect_data()` during your play session. Cover a wide range of states: startup from cold, ramping power, individual rod bank adjustments. Diversity in the dataset directly determines simulator accuracy. See [Model Learning](#model-learning) for collection details.
**Step 2 — Initial model fitting**: Fit a kNN-GP model (instant) or NN (better extrapolation with larger datasets) using `fit_knn()` or `train_model()`. Prune near-duplicate samples with `drop_redundant()` before fitting. See [Model Learning](#model-learning).
**Step 3 — Train RL in simulator**: Load the fitted model into `NuconSimulator`, then train a `NuconGoalEnv` policy with SAC + HER. The simulator runs far faster than the real game, allowing many trajectories in reasonable time. Pass `UncertaintyPenalty` and `UncertaintyAbort` as objectives/terminators to discourage the policy from wandering into regions the model hasn't seen; `SIM_UNCERTAINTY` is automatically injected into the obs dict when a simulator is active. See [NuconGoalEnv + HER Usage](#nucongoalenv--her-usage).
**Step 4 — Eval in game + collect new data**: Run the trained policy against the real game. This validates simulator accuracy and simultaneously collects new data from states the policy visits, which may be regions the original dataset missed. Run a second `NuconModelLearner` in a background thread to collect concurrently.
**Step 5 — Refit model on expanded data**: Merge new data into the original dataset with `merge_datasets()`, prune with `drop_redundant()`, and refit. Then return to Step 3 with the improved model. Each iteration the simulator gets more accurate and the policy improves.
Stop when the policy performs well in the real game and kNN-GP uncertainty stays low throughout an episode, indicating the policy stays within the known data distribution.
## Testing
NuCon includes a test suite to verify its functionality and compatibility with the Nucleares game.
### Running Tests
To run the tests:
1. Ensure the Nucleares game is running and accessible at http://localhost:8785/ (or update the URL in the test setup).
2. Install pytest: `pip install pytest` (or `pip install -e .[dev]`)
3. Run the tests:
```bash
pytest test/test_core.py
pytest test/test_sim.py
```
### Test Coverage
The tests verify:
- Parameter types match their definitions in NuCon
- Writable parameters can be written to
- Non-writable parameters cannot be written to, even when force-writing
- Enum parameters and their custom truthy values behave correctly
- Simulator functionality and consistency
---
![NuCon Meme](README_meme.jpg)
The meme above was generated with NuCon's built-in Drake meme generator. To use it you'll need to install `pillow`. You can do so via
```bash
pip install -e '.[drake]'
```
### Usage
```python
from nucon.drake import create_drake_meme
items = [
    (False, "Play Nucleares manually"),
    (True, "Automate it with a script"),
    (False, "But the web interface is tedious to use"),
    (True, "Write an elegant library to interface with the game and then use that to write the script"),
    (False, "But I would still need to write the control policy by hand"),
    (True, "Let's extend the library such that it trains a policy via Reinforcement Learning"),
    (False, "But RL takes a huge number of training samples"),
    (True, "Extend the library to also include an efficient simulator"),
    (False, "But I don't know what the actual internal dynamics are"),
    (True, "Extend the library once more to also include a neural network dynamics model"),
    (True, "And I'm gonna put a drake meme on the README"),
    (False, "Online meme generators only support a single yes/no pair"),
    (True, "Let's also add a drake meme generator to the library"),
]
meme = create_drake_meme(items)
meme.save("README_meme.jpg")
```
## Disclaimer
NuCon is an unofficial tool and is not affiliated with or endorsed by the creators of Nucleares.
## Citing
What? Why would you wanna cite it? What are you even doing?
```
@misc{nucon,
  title    = {NuCon},
  author   = {Dominik Roth},
  abstract = {NuCon is a Python library to interface with and control Nucleares, a nuclear reactor simulation game. Includes gymnasium bindings for Reinforcement Learning.},
  url      = {https://git.dominik-roth.eu/dodox/NuCon},
  year     = {2024},
}
```