Commit Graph

11 Commits

Author SHA1 Message Date
2c1bbc1a31 refactor: move UncertaintyPenalty/Abort into Parameterized_Objectives/Terminators dicts
- uncertainty_penalty -> Parameterized_Objectives['uncertainty_penalty']
- uncertainty_abort   -> Parameterized_Terminators['uncertainty_abort']
- Add Parameterized_Terminators dict (same pattern as Parameterized_Objectives)
- Keep UncertaintyPenalty / UncertaintyAbort as convenience aliases
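
A minimal sketch of the registry-plus-alias pattern this commit describes, shown for the terminator side only. Only the names `Parameterized_Terminators`, `uncertainty_abort`, `UncertaintyAbort` and `SIM_UNCERTAINTY` come from the commit messages; the factory signature, the `threshold` argument and the `terminator(obs) -> bool` call shape are assumptions made for illustration.

```python
# Hypothetical sketch: a name -> factory registry with a convenience alias.
# The factory and callable signatures are assumed, not taken from the repo.

def uncertainty_abort(threshold=0.7):
    """Build a terminator that fires once the uncertainty reaches `threshold`."""
    def terminator(obs):
        u = obs.get("SIM_UNCERTAINTY")  # absent when no simulator is active
        return u is not None and u >= threshold
    return terminator

# Registry: string key -> factory that returns a configured callable.
Parameterized_Terminators = {"uncertainty_abort": uncertainty_abort}

# Convenience alias pointing at the same factory as the registry entry.
UncertaintyAbort = Parameterized_Terminators["uncertainty_abort"]
```

Under these assumptions, `UncertaintyAbort(0.7)` and `Parameterized_Terminators['uncertainty_abort'](0.7)` build equivalent terminators, so string-based configs and direct imports stay interchangeable.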

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 18:57:44 +01:00
041e0ec1bd rename: objectives -> additional_objectives in NuconGoalEnv
Clarifies that the goal reward is the primary built-in objective;
additional_objectives are additive on top of it.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 18:56:34 +01:00
36a33e74e5 fix: add objectives support to NuconGoalEnv; fix README uncertainty example
- NuconGoalEnv now accepts objectives/objective_weights; additive on top
  of the goal reward, same interface as NuconEnv
- README: use UncertaintyPenalty/UncertaintyAbort correctly (via objectives
  and terminators, not as constructor params that don't exist)
- Step 3 prose updated to reference composable callables

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 18:55:16 +01:00
f4d45d3cfd feat: NuconGoalEnv, composable uncertainty helpers, kNN-GP naming
- Add NuconGoalEnv for goal-conditioned HER training (SAC + HER)
- Add UncertaintyPenalty and UncertaintyAbort composable callables;
  SIM_UNCERTAINTY injected into obs dict when simulator is active
- Fix rl.py: str-typed params crash, missing Enum import, write-only
  params in action space, broken step() iteration order
- Remove uncertainty state from sim (return value from update() instead)
- Rename kNN -> kNN-GP throughout README; add model selection note

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 18:51:13 +01:00
65190dffea feat: uncertainty-aware training with penalty and abort
sim.py:
- simulator.update(return_uncertainty=True) calls forward_with_uncertainty
  on kNN models and returns the GP std; returns None for NN or when not
  requested (no extra cost if unused)
- No state stored on simulator; caller decides what to do with the value

rl.py (NuconEnv and NuconGoalEnv):
- uncertainty_penalty_start: above this GP std, subtract a linear penalty
  from the reward (scaled by uncertainty_penalty_scale, default 1.0)
- uncertainty_abort: at or above this GP std, set truncated=True
- Only calls update(return_uncertainty=True) when either threshold is set
- Uncertainty only applies when using a simulator (kNN model); ignored otherwise

Example:
    simulator = NuconSimulator()
    simulator.load_model('reactor_knn.pkl')
    env = NuconGoalEnv(..., simulator=simulator,
                       uncertainty_penalty_start=0.3,
                       uncertainty_abort=0.7,
                       uncertainty_penalty_scale=2.0)
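
The penalty and abort rules above can be sketched as a standalone function. This is an illustration under stated assumptions, not the code in rl.py: the helper name `apply_uncertainty` is hypothetical, and the penalty is assumed to be `scale * (std - start)`, which is one reading of "linear penalty" in this message.

```python
from typing import Optional, Tuple

def apply_uncertainty(
    reward: float,
    gp_std: Optional[float],
    penalty_start: Optional[float] = None,
    abort: Optional[float] = None,
    penalty_scale: float = 1.0,
) -> Tuple[float, bool]:
    """Return (adjusted_reward, truncated) for one step (hypothetical helper)."""
    truncated = False
    if gp_std is None:
        # NN model or uncertainty not requested: nothing to apply.
        return reward, truncated
    if penalty_start is not None and gp_std > penalty_start:
        # Assumed linear form: grows with the excess over the start threshold.
        reward -= penalty_scale * (gp_std - penalty_start)
    if abort is not None and gp_std >= abort:
        # "At or above" the abort threshold ends the episode via truncation.
        truncated = True
    return reward, truncated
```

With the thresholds from the example above (0.3, 0.7, scale 2.0), a GP std of 0.5 would reduce a reward of 1.0 by about 0.4 without truncating, under the assumed formula.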

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 18:37:09 +01:00
6cb93ad56d feat: abort trajectory on high kNN uncertainty in simulator
NuconSimulator now accepts uncertainty_threshold (default None = disabled).
When set and using a kNN model, _update_reactor_state() calls
forward_with_uncertainty() and raises HighUncertaintyError if the GP
posterior std exceeds the threshold.

NuconEnv and NuconGoalEnv catch HighUncertaintyError in step() and
return truncated=True, so SB3 bootstraps the value rather than treating
OOD regions as terminal states.

Usage:
    simulator = NuconSimulator(uncertainty_threshold=0.3)
    # episodes are cut short when the policy wanders OOD
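
The catch-and-truncate flow described above can be sketched with stand-in classes. Everything here except the names `HighUncertaintyError` and `uncertainty_threshold` is hypothetical: `StubSimulator`, its `update(gp_std)` signature, and the zero reward on abort are assumptions made so the example runs on its own.

```python
class HighUncertaintyError(Exception):
    """Stand-in for the exception named in the commit message."""

class StubSimulator:
    """Hypothetical simulator; the GP std is passed in instead of computed."""
    def __init__(self, uncertainty_threshold=None):
        self.uncertainty_threshold = uncertainty_threshold

    def update(self, gp_std):
        # None disables the check, matching the documented default.
        if self.uncertainty_threshold is not None and gp_std > self.uncertainty_threshold:
            raise HighUncertaintyError(f"GP std {gp_std} exceeds threshold")

def step(simulator, gp_std, reward=1.0):
    """Return (reward, terminated, truncated) for one environment step."""
    try:
        simulator.update(gp_std)
    except HighUncertaintyError:
        # truncated=True (not terminated) so the learner can still bootstrap
        # the value of the last state. Reward 0.0 here is an assumption.
        return 0.0, False, True
    return reward, False, False
```

Reporting truncation rather than termination is what keeps out-of-distribution regions from being learned as terminal states.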

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 18:29:54 +01:00
0dab7a6cec Fix rl/sim blockers and add NuconGoalEnv for HER training
rl.py:
- Add missing `from enum import Enum`
- Skip str-typed params in obs/action space construction (was crashing)
- Guard action space: exclude write-only (is_readable=False) and cheat params
- Fix step() param lookup (no longer iterates Nucon, uses _parameters dict directly)
- Correct sim-speed time dilation in real-game sleep
- Extract _build_param_space() helper shared by NuconEnv and NuconGoalEnv
- Add NuconGoalEnv: goal-conditioned env with normalised achieved/desired goal
  vectors, compatible with SB3 HerReplayBuffer; goals sampled per episode
- Register Nucon-goal_power-v0 and Nucon-goal_temp-v0 presets
- Enum obs/action space now scalar index (not one-hot)

sim.py:
- Store self.port and self.host on NuconSimulator
- Add set_model() to accept a pre-loaded model directly
- load_model() detects type by extension (.pkl → kNN, else → NN torch)
  and reads new checkpoint format with embedded input/output param lists
- _update_reactor_state() uses model.input_params (not all readable params),
  calls .forward() directly for both NN and kNN, guards torch.no_grad per type
- Import ReactorKNNModel and pickle
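
The extension-based dispatch that load_model() is described as using can be sketched as below. The function name `detect_model_kind`, the returned labels, and the case-sensitive suffix match are assumptions; the commit only states that `.pkl` selects the kNN model and anything else is treated as a torch NN checkpoint.

```python
from pathlib import Path

def detect_model_kind(path: str) -> str:
    """Classify a model file by extension (hypothetical helper)."""
    # ".pkl" -> pickled kNN model; every other extension -> NN checkpoint.
    return "knn" if Path(path).suffix == ".pkl" else "nn"
```

A caller would branch on the result to choose between unpickling the kNN model and loading the NN checkpoint with its embedded input/output parameter lists.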

model.py:
- save_model() embeds input_params/output_params in NN checkpoint dict
- load_model() handles new checkpoint format (state_dict key) with fallback

README.md:
- Update note: RODS_POS_ORDERED is no longer the only writable param;
  game v2.2.25.213 exposes rod banks, pumps, MSCVs, switches and more

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-12 17:37:16 +01:00
4c3ad983fc More objectives and fixes 2024-10-03 21:56:27 +02:00
d580f77fce Allowed weighted objectives 2024-10-02 19:31:19 +02:00
dc59173fe7 Better parameterized objectives and gym bindings 2024-10-02 19:22:23 +02:00
08828e2dec RL oh yeah 2024-10-02 18:45:06 +02:00