sim.py:
- simulator.update(return_uncertainty=True) calls forward_with_uncertainty
on kNN models and returns the GP std; it returns None for NN models or
when uncertainty is not requested (no extra cost if unused)
- No state stored on simulator; caller decides what to do with the value
rl.py (NuconEnv and NuconGoalEnv):
- uncertainty_penalty_start: above this GP std, subtract a linear penalty
from the reward (scaled by uncertainty_penalty_scale, default 1.0)
- uncertainty_abort: at or above this GP std, set truncated=True
- Only calls update(return_uncertainty=True) when either threshold is set
- Uncertainty handling only applies when the env runs against a simulator
with a kNN model; it is ignored otherwise
Example:
    simulator = NuconSimulator()
    simulator.load_model('reactor_knn.pkl')
    env = NuconGoalEnv(..., simulator=simulator,
                       uncertainty_penalty_start=0.3,
                       uncertainty_abort=0.7,
                       uncertainty_penalty_scale=2.0)
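A minimal sketch of how the two thresholds could combine in one step. The helper name shape_reward and the exact penalty formula, scale * (std - start), are assumptions for illustration; the commit only says the penalty is linear above the start threshold.

```python
def shape_reward(reward, gp_std, penalty_start=None, abort=None, penalty_scale=1.0):
    """Return (shaped_reward, truncated) for one step (hypothetical helper)."""
    if gp_std is None:
        # NN model, or uncertainty was not requested: nothing to apply.
        return reward, False
    if penalty_start is not None and gp_std > penalty_start:
        # Assumed linear form: penalty grows with the excess over the threshold.
        reward -= penalty_scale * (gp_std - penalty_start)
    # "At or above" the abort threshold ends the episode via truncation.
    truncated = abort is not None and gp_std >= abort
    return reward, truncated
```

With the values from the example above, a GP std of 0.5 costs 2.0 * (0.5 - 0.3) reward and does not truncate, while a std of 0.7 truncates the episode.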
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
NuconSimulator now accepts uncertainty_threshold (default None = disabled).
When set and using a kNN model, _update_reactor_state() calls
forward_with_uncertainty() and raises HighUncertaintyError if the GP
posterior std exceeds the threshold.
NuconEnv and NuconGoalEnv catch HighUncertaintyError in step() and
return truncated=True, so SB3 bootstraps the value rather than treating
OOD regions as terminal states.
Usage:
    simulator = NuconSimulator(uncertainty_threshold=0.3)
    # episodes are cut short when the policy wanders OOD
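The raise-and-catch flow described above can be sketched with stand-in classes. ToySimulator and the free function step are hypothetical, and the GP std is passed in directly instead of coming from a model; only the name HighUncertaintyError and the truncated=True behaviour are taken from this commit.

```python
class HighUncertaintyError(Exception):
    """Stand-in for the exception named in this commit."""


class ToySimulator:
    """Hypothetical simulator that only implements the threshold check."""

    def __init__(self, uncertainty_threshold=None):
        self.uncertainty_threshold = uncertainty_threshold

    def update(self, gp_std):
        # A threshold of None disables the check entirely.
        if self.uncertainty_threshold is not None and gp_std > self.uncertainty_threshold:
            raise HighUncertaintyError(
                f"GP std {gp_std:.2f} exceeds {self.uncertainty_threshold:.2f}"
            )


def step(simulator, obs, gp_std):
    """Gymnasium-style 5-tuple; an OOD step truncates but never terminates."""
    terminated, truncated = False, False
    try:
        simulator.update(gp_std)
    except HighUncertaintyError:
        truncated = True
    return obs, 0.0, terminated, truncated, {}
```

Reporting truncated rather than terminated matters because the learner then still bootstraps from the value of the last state.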
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The prior sections already have full code examples; the training loop
section now just describes each step concisely and links back to them.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Documents the iterative sim-to-real workflow:
1. Human data collection during gameplay
2. Initial model fitting (kNN or NN)
3. RL training in simulator (SAC + HER)
4. Eval in game while collecting new data
5. Refit model, repeat
Includes ASCII flow diagram, code for each step, and a convergence
criterion (low kNN uncertainty throughout episode).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The None-param filtering probe at init also needs to wait for the game
to be reachable, not just the collection loop.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Save dataset every N steps (default 10) so a disconnect loses at most
one checkpoint's worth of samples instead of everything
- Retry _get_state() on ConnectionError/Timeout rather than crashing,
resuming automatically once the game comes back up
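A rough sketch of the checkpoint-and-retry loop. The function collect and its parameters are hypothetical, and Python's built-in ConnectionError and TimeoutError stand in for whatever exceptions the real HTTP client raises.

```python
import time


def collect(get_state, save, steps, save_every=10, retry_delay=1.0, sleep=time.sleep):
    """Gather `steps` samples, checkpointing periodically and retrying on disconnects."""
    samples = []
    while len(samples) < steps:
        try:
            state = get_state()
        except (ConnectionError, TimeoutError):
            # Game unreachable: wait and try again instead of crashing.
            sleep(retry_delay)
            continue
        samples.append(state)
        if len(samples) % save_every == 0:
            # Checkpoint so a later failure loses at most save_every samples.
            save(list(samples))
    save(list(samples))  # final save covers any remainder
    return samples
```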
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Model type is irrelevant during data collection. Models are now created
lazily on first use: train_model() creates a ReactorDynamicsModel,
fit_knn(k) creates a ReactorKNNModel. load_model() detects type by
file extension as before. drop_well_fitted() now checks that a model
exists before using it.
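A minimal sketch of the lazy-creation pattern, using empty stand-in model classes. The container class LazyLearner, the ValueError, and the return values are assumptions; only the method and model class names come from this commit.

```python
class ReactorDynamicsModel:
    """Stand-in for the NN model class."""


class ReactorKNNModel:
    """Stand-in for the kNN model class."""

    def __init__(self, k):
        self.k = k


class LazyLearner:
    """Hypothetical owner of the dataset; no model exists until one is needed."""

    def __init__(self):
        self.model = None  # data collection never touches this
        self.dataset = []

    def train_model(self):
        # Create the NN model on first use.
        if not isinstance(self.model, ReactorDynamicsModel):
            self.model = ReactorDynamicsModel()
        return self.model

    def fit_knn(self, k):
        self.model = ReactorKNNModel(k)
        return self.model

    def drop_well_fitted(self, error_threshold):
        if self.model is None:
            raise ValueError("no model yet: call train_model() or fit_knn(k) first")
        return len(self.dataset)  # the real filtering is omitted in this sketch
```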
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Describe both NuconEnv and NuconGoalEnv with their obs/action spaces
- Explain goal-conditioned approach and why HER is appropriate
- Add SAC + HerReplayBuffer usage example with recommended hyperparams
- Show how to inject a custom goal at inference time
- List registered goal env presets
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
rl.py:
- Add missing `from enum import Enum`
- Skip str-typed params in obs/action space construction (was crashing)
- Guard action space: exclude write-only (is_readable=False) and cheat params
- Fix step() param lookup (no longer iterates over Nucon; uses the _parameters dict directly)
- Correct sim-speed time dilation in real-game sleep
- Extract _build_param_space() helper shared by NuconEnv and NuconGoalEnv
- Add NuconGoalEnv: goal-conditioned env with normalised achieved/desired goal
vectors, compatible with SB3 HerReplayBuffer; goals sampled per episode
- Register Nucon-goal_power-v0 and Nucon-goal_temp-v0 presets
- Enum params now map to a scalar index in the obs/action space (not one-hot)
sim.py:
- Store self.port and self.host on NuconSimulator
- Add set_model() to accept a pre-loaded model directly
- load_model() detects type by extension (.pkl → kNN, anything else → torch NN)
and reads the new checkpoint format with embedded input/output param lists
- _update_reactor_state() uses model.input_params (not all readable params),
calls .forward() directly for both NN and kNN, guards torch.no_grad per type
- Import ReactorKNNModel and pickle
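The extension-based dispatch in load_model() could look roughly like this. The helper name detect_model_type and the case-insensitive comparison are assumptions; the commit only states the .pkl versus everything-else rule.

```python
from pathlib import Path


def detect_model_type(path):
    """Return 'knn' for pickle files and 'nn' for every other extension."""
    return "knn" if Path(path).suffix.lower() == ".pkl" else "nn"
```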
model.py:
- save_model() embeds input_params/output_params in NN checkpoint dict
- load_model() handles new checkpoint format (state_dict key) with fallback
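A sketch of the checkpoint fallback, using plain dicts in place of real torch checkpoints. The function name unpack_checkpoint and the default-list arguments are hypothetical; the state_dict, input_params and output_params keys are the ones named in this commit.

```python
def unpack_checkpoint(checkpoint, default_inputs, default_outputs):
    """Return (state_dict, input_params, output_params) for either format."""
    if isinstance(checkpoint, dict) and "state_dict" in checkpoint:
        # New format: weights plus the parameter lists they were trained on.
        return (
            checkpoint["state_dict"],
            checkpoint.get("input_params", default_inputs),
            checkpoint.get("output_params", default_outputs),
        )
    # Legacy format: the object is the bare state dict, so use the defaults.
    return checkpoint, default_inputs, default_outputs
```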
README.md:
- Update note: RODS_POS_ORDERED is no longer the only writable param;
game v2.2.25.213 exposes rod banks, pumps, MSCVs, switches and more
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Data collection:
- time_delta is now target game-time; wall sleep = game_delta / sim_speed
so stored deltas are uniform regardless of GAME_SIM_SPEED setting
- Auto-exclude junk params (GAME_VERSION, TIME, ALARMS_ACTIVE, …) and
params returning None (uninstalled subsystems)
- Optional include_valve_states=True adds all 53 valve positions as inputs
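The timing rule and the parameter filter can be illustrated as follows. Both helper names are hypothetical, and JUNK_PARAMS holds only the three names spelled out above (the elided rest of the list is not filled in).

```python
# Only the names listed in this commit; the real exclusion list is longer.
JUNK_PARAMS = {"GAME_VERSION", "TIME", "ALARMS_ACTIVE"}


def wall_sleep_seconds(game_delta, sim_speed):
    """Wall-clock sleep that advances the game by game_delta seconds."""
    if sim_speed <= 0:
        raise ValueError("sim_speed must be positive")
    return game_delta / sim_speed


def usable_params(state):
    """Drop junk params and params that returned None."""
    return {
        name: value
        for name, value in state.items()
        if name not in JUNK_PARAMS and value is not None
    }
```

For example, a 2 s game-time delta at 4x sim speed needs a 0.5 s wall-clock sleep.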
Model backends (model_type='nn' or 'knn'):
- ReactorKNNModel: k-nearest neighbours with GP interpolation
- Finds k nearest states, computes per-second transition rates,
linearly scales to requested game_delta (linear-in-time assumption)
- forward_with_uncertainty() returns (prediction_dict, gp_std)
where std≈0 = on known data, std≈1 = out of distribution
- NN training fixed: loss computed in tensor space, mse_loss per batch
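A simplified sketch of the rate-scaling idea behind the kNN backend. It swaps the GP interpolation for a plain average over the k neighbours and returns no uncertainty; the function name, the sample layout, and the assumption that inputs and outputs share the same dimensions are all illustrative.

```python
import math


def knn_predict(query, samples, game_delta, k=3):
    """Predict the state after game_delta seconds from the k nearest samples.

    samples: list of (state, next_state, dt) with equal-length float vectors.
    """
    nearest = sorted(samples, key=lambda s: math.dist(query, s[0]))[:k]
    dims = len(query)
    rate = [0.0] * dims
    for state, next_state, dt in nearest:
        for i in range(dims):
            # Per-second transition rate, averaged over the neighbours.
            rate[i] += (next_state[i] - state[i]) / dt / len(nearest)
    # Linear-in-time assumption: scale the rate to the requested delta.
    return [query[i] + rate[i] * game_delta for i in range(dims)]
```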
Dataset management:
- drop_well_fitted(error_threshold): drop samples model predicts well,
keep hard cases (useful for NN curriculum)
- drop_redundant(min_state_distance, min_output_distance): drop samples
that are close in BOTH input state AND output transition space, keeping
genuinely different dynamics even at the same input state
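A sketch of the two-distance criterion in drop_redundant. The greedy first-seen-wins scan and the (state, output) sample layout are assumptions; the commit only specifies that a sample must be close in both spaces to count as redundant.

```python
import math


def drop_redundant(samples, min_state_distance, min_output_distance):
    """Keep a sample unless a kept one is close in BOTH state and output space.

    samples: list of (state, output) pairs of float vectors.
    """
    kept = []
    for state, output in samples:
        redundant = any(
            math.dist(state, kept_state) < min_state_distance
            and math.dist(output, kept_output) < min_output_distance
            for kept_state, kept_output in kept
        )
        if not redundant:
            kept.append((state, output))
    return kept
```

A sample with a near-identical state but a very different output survives, which is what preserves distinct dynamics at the same input state.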
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Rename is_admin/admin_mode -> is_cheat/cheat_mode (only FUN_* event
triggers are cheat params, not operational commands like SCRAM)
- Fix steam ejector valve write commands: int 0-100, not bool
- Move SCRAM, EMERGENCY_STOP, bay hatches, turbine trip etc. to normal
write-only (not cheat-gated)
- Add FUN_IS_ENABLED to readable params (it appears in GET list)
- Add get_valve/get_valves, open/close/off_valve(s) methods with correct
actuator semantics: OPEN/CLOSE powers motor, OFF holds position
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Parameters like CORE_SCRAM_BUTTON, CORE_EMERGENCY_STOP, bay hatch/fuel
loading, VALVE_OPEN/CLOSE/OFF, STEAM_TURBINE_TRIP, and all FUN_* event
triggers are now marked is_admin=True. Writing to them is blocked unless
the Nucon instance has admin_mode=True or force=True is used.
Normal control setpoints (MSCV_*, STEAM_TURBINE_*_BYPASS_ORDERED,
CHEM_BORON_*) remain write-only but are not admin-gated.
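The gating rule reduces to a single boolean check. The function name can_write is hypothetical; the is_admin, admin_mode and force names come from this commit.

```python
def can_write(is_admin, admin_mode=False, force=False):
    """A write is allowed unless the param is admin-gated and no override is set."""
    return (not is_admin) or admin_mode or force
```

An admin-gated parameter such as CORE_SCRAM_BUTTON is writable only with admin_mode=True or force=True, while a normal setpoint is always writable.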
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Export ParameterEnum from __init__
- Add flask and numpy to dev dependencies
- Fix sim: remove run() call from test fixture, handle WEBSERVER_LIST_VARIABLES and WEBSERVER_BATCH_GET, normalize variable names to uppercase
- Remove RODS params from sim state (no longer part of sim model)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>