From f0cc7ba9c42d6ff60850217cb1872a94091562df Mon Sep 17 00:00:00 2001
From: Dominik Roth
Date: Thu, 12 Mar 2026 18:19:04 +0100
Subject: [PATCH] docs: replace em-dashes in body text with natural punctuation

Keep em-dashes in step headings, replace in prose with ;/:/./,

Co-Authored-By: Claude Sonnet 4.6
---
 README.md | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/README.md b/README.md
index ada423a..d19e81a 100644
--- a/README.md
+++ b/README.md
@@ -127,17 +127,17 @@ pip install -e '.[rl]'
 
 Two environment classes are provided in `nucon/rl.py`:
 
-**`NuconEnv`** — classic fixed-objective environment. You define one or more objectives at construction time (e.g. maximise power output, keep temperature in range). The agent always trains toward the same goal.
+**`NuconEnv`**: classic fixed-objective environment. You define one or more objectives at construction time (e.g. maximise power output, keep temperature in range). The agent always trains toward the same goal.
 
 - Observation space: all readable numeric parameters (~290 dims).
 - Action space: all readable-back writable parameters (~30 dims): 9 individual rod bank positions, 3 MSCVs, 3 turbine bypass valves, 6 coolant pump speeds, condenser pump, freight/vent switches, resistor banks, and more.
 - Objectives: predefined strings (`'max_power'`, `'episode_time'`) or arbitrary callables `(obs) -> float`. Multiple objectives are weighted-summed.
 
-**`NuconGoalEnv`** — goal-conditioned environment. The desired goal (e.g. target generator output) is sampled at the start of each episode and provided as part of the observation. A single policy learns to reach *any* goal in the specified range, making it far more useful than a fixed-objective agent. Designed for training with [Hindsight Experience Replay (HER)](https://arxiv.org/abs/1707.01495), which makes sparse-reward goal-conditioned training tractable.
+**`NuconGoalEnv`**: goal-conditioned environment. The desired goal (e.g. target generator output) is sampled at the start of each episode and provided as part of the observation. A single policy learns to reach *any* goal in the specified range, making it far more useful than a fixed-objective agent. Designed for training with [Hindsight Experience Replay (HER)](https://arxiv.org/abs/1707.01495), which makes sparse-reward goal-conditioned training tractable.
 
 - Observation space: `Dict` with keys `observation` (non-goal params), `achieved_goal` (current goal param values, normalised to [0,1]), `desired_goal` (target, normalised to [0,1]).
 - Goals are sampled uniformly from the specified `goal_range` each episode.
-- Reward defaults to negative L2 distance in normalised goal space (dense). Pass `tolerance` for a sparse `{0, -1}` reward — this works particularly well with HER.
+- Reward defaults to negative L2 distance in normalised goal space (dense). Pass `tolerance` for a sparse `{0, -1}` reward; this works particularly well with HER.
 
 ### NuconEnv Usage
 
@@ -193,7 +193,7 @@ env.close()
 
 ### NuconGoalEnv + HER Usage
 
-HER works by relabelling past trajectories with the goal that was *actually achieved*, turning every episode into useful training signal even when the agent never reaches the intended target. This makes it much more sample-efficient than standard RL for goal-reaching tasks — important given how slow the real game is.
+HER works by relabelling past trajectories with the goal that was *actually achieved*, turning every episode into useful training signal even when the agent never reaches the intended target. This makes it much more sample-efficient than standard RL for goal-reaching tasks. This matters a lot given how slow the real game is.
 
 ```python
 from nucon.rl import NuconGoalEnv
@@ -301,7 +301,7 @@ To address the challenge of unknown game dynamics, NuCon provides tools for coll
 - **Data Collection**: Gathers state transitions from human play or automated agents. `time_delta` is specified in game-time seconds; wall-clock sleep is automatically adjusted for `GAME_SIM_SPEED` so collected deltas are uniform regardless of simulation speed.
 - **Automatic param filtering**: Junk params (GAME_VERSION, TIME, ALARMS_ACTIVE, …) and params from uninstalled subsystems (returns `None`) are automatically excluded from model inputs/outputs.
 - **Two model backends**: Neural network (NN) or k-Nearest Neighbours with GP interpolation (kNN).
-- **Uncertainty estimation**: The kNN backend returns a GP posterior standard deviation alongside each prediction — 0 means the query lies on known data, ~1 means it is out of distribution.
+- **Uncertainty estimation**: The kNN backend returns a GP posterior standard deviation alongside each prediction; 0 means the query lies on known data, ~1 means it is out of distribution.
 - **Dataset management**: Tools for saving, loading, merging, and pruning datasets.
 
 ### Additional Dependencies
@@ -358,7 +358,7 @@ The trained models can be integrated into the NuconSimulator to provide accurate
 
 ## Full Training Loop
 
-The recommended end-to-end workflow for training an RL operator is an iterative cycle of real-game data collection, model fitting, and simulated training. The real game is slow and cannot be parallelised, so the bulk of RL training happens in the simulator — the game is used only as an oracle for data and evaluation.
+The recommended end-to-end workflow for training an RL operator is an iterative cycle of real-game data collection, model fitting, and simulated training. The real game is slow and cannot be parallelised, so the bulk of RL training happens in the simulator. The game is used only as an oracle for data and evaluation.
 
 ```
 ┌─────────────────────────────────────────────────────────────┐
@@ -399,7 +399,7 @@ The recommended end-to-end workflow for training an RL operator is an iterative
 
 ### Step 1 — Human dataset collection
 
-Start `NuconModelLearner` before or during your play session. Try to cover a wide range of reactor states — startup from cold, ramping power up and down, adjusting individual rod banks, pump speed changes. Diversity in the dataset directly determines how accurate the simulator will be.
+Start `NuconModelLearner` before or during your play session. Try to cover a wide range of reactor states: startup from cold, ramping power up and down, adjusting individual rod banks, pump speed changes. Diversity in the dataset directly determines how accurate the simulator will be.
 
 ```python
 from nucon.model import NuconModelLearner
@@ -465,7 +465,7 @@ model.save('rl_policy.zip')
 
 ### Step 4 — Eval in game + collect new data
 
-Run the trained policy against the real game. This validates whether the simulator was accurate enough, and simultaneously collects new data covering states the policy visits — which may be regions the original dataset missed.
+Run the trained policy against the real game. This validates whether the simulator was accurate enough, and simultaneously collects new data covering states the policy visits, which may be regions the original dataset missed.
 
 ```python
 from nucon.rl import NuconGoalEnv