Fix Hydra config error in dev script

- Change train.n_iters to train.n_epochs (correct DPPO parameter)
- Update experiment tracking with failed job details
- Ready for corrected dev test
2025-08-27 12:07:38 +02:00
commit 4adf67694a
parent f88a5be4fe
2 changed files with 9 additions and 6 deletions
--- a/EXPERIMENT_PLAN.md
+++ b/EXPERIMENT_PLAN.md
@@ -71,7 +71,7 @@ TASK=hopper MODE=pretrain sbatch slurm/run_dppo_gym.sh

 | Job ID | Type | Task | Mode | Status | Duration | Results |
 |--------|------|------|------|---------|----------|---------|
-| 3445081 | dev test | hopper | pretrain | PENDING | 30min | - |
+| 3445081 | dev test | hopper | pretrain | ❌ FAILED | 33sec | Hydra config error |

 ## Configuration Notes

@@ -87,7 +87,11 @@ export WANDB_ENTITY=<your_username>

 ## Issues Encountered

-None so far - installation completed without code modifications.
+### Fixed Issues
+1. **Hydra Configuration Error** (Job 3445081)
+   - Issue: Wrong parameter names in dev script (`train.n_iters` instead of `train.n_epochs`)
+   - Fix: Updated to use correct DPPO config parameters
+   - Status: Fixed in commit

 ## Next Steps

--- a/slurm/run_dppo_dev.sh
+++ b/slurm/run_dppo_dev.sh
@@ -41,12 +41,11 @@ echo "PyTorch version: $(python -c 'import torch; print(torch.__version__)')"
 echo "CUDA available: $(python -c 'import torch; print(torch.cuda.is_available())')"
 echo ""

-# Run a quick pre-training test with reduced steps
+# Run a quick pre-training test with reduced epochs
 python script/run.py --config-name=pre_diffusion_mlp \
    --config-dir=cfg/gym/pretrain/hopper-medium-v2 \
-    train.n_iters=10 \
-    train.log_interval=5 \
-    train.checkpoint_interval=10 \
+    train.n_epochs=2 \
+    train.save_model_freq=1 \
    wandb=${WANDB_MODE:-null}

 echo "Dev test completed!"
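
Note on the failure mode: Hydra generally rejects a command-line override whose key is not present in the composed config, unless the key is prefixed with `+` to add it. A pre-submission check along the following lines could catch that kind of mismatch before a job is queued. This is a minimal sketch in plain Python and is not part of this commit: `EXAMPLE_CONFIG`, `key_exists`, and `unknown_override_keys` are hypothetical names, and the config values are placeholders rather than the real contents of `pre_diffusion_mlp`.

```python
from typing import Any, List, Mapping


def key_exists(config: Mapping[str, Any], dotted_key: str) -> bool:
    """Return True if a dotted key such as 'train.n_epochs' resolves in config."""
    node: Any = config
    for part in dotted_key.split("."):
        if not isinstance(node, Mapping) or part not in node:
            return False
        node = node[part]
    return True


def unknown_override_keys(config: Mapping[str, Any], overrides: List[str]) -> List[str]:
    """List override keys that are absent from config.

    Overrides prefixed with '+' are skipped, since they are meant to add a
    new key rather than change an existing one.
    """
    unknown = []
    for override in overrides:
        key = override.split("=", 1)[0]
        if key.startswith("+"):
            continue
        if not key_exists(config, key):
            unknown.append(key)
    return unknown


# Hypothetical stand-in for the composed config; values are placeholders.
EXAMPLE_CONFIG = {
    "train": {"n_epochs": 200, "save_model_freq": 100},
    "wandb": None,
}

# Overrides as they appear in the dev script before and after this commit.
OLD_OVERRIDES = [
    "train.n_iters=10",
    "train.log_interval=5",
    "train.checkpoint_interval=10",
    "wandb=null",
]
NEW_OVERRIDES = ["train.n_epochs=2", "train.save_model_freq=1", "wandb=null"]

if __name__ == "__main__":
    print("old:", unknown_override_keys(EXAMPLE_CONFIG, OLD_OVERRIDES))
    print("new:", unknown_override_keys(EXAMPLE_CONFIG, NEW_OVERRIDES))
```

In practice the dictionary would be loaded from the actual config file rather than written by hand, so the check reflects whatever keys the pretraining config really defines.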