Fix Hydra config error in dev script

- Change train.n_iters to train.n_epochs (correct DPPO parameter)
- Update experiment tracking with failed job details
- Ready for corrected dev test
ys1087@partner.kit.edu 2025-08-27 12:07:38 +02:00
parent f88a5be4fe
commit 4adf67694a
2 changed files with 9 additions and 6 deletions

@@ -71,7 +71,7 @@ TASK=hopper MODE=pretrain sbatch slurm/run_dppo_gym.sh
| Job ID | Type | Task | Mode | Status | Duration | Results |
|--------|------|------|------|---------|----------|---------|
-| 3445081 | dev test | hopper | pretrain | PENDING | 30min | - |
+| 3445081 | dev test | hopper | pretrain | ❌ FAILED | 33sec | Hydra config error |
## Configuration Notes
@@ -87,7 +87,11 @@ export WANDB_ENTITY=<your_username>
## Issues Encountered
-None so far - installation completed without code modifications.
+### Fixed Issues
+1. **Hydra Configuration Error** (Job 3445081)
+   - Issue: Wrong parameter names in dev script (`train.n_iters` instead of `train.n_epochs`)
+   - Fix: Updated to use correct DPPO config parameters
+   - Status: Fixed in commit
## Next Steps

@@ -41,12 +41,11 @@ echo "PyTorch version: $(python -c 'import torch; print(torch.__version__)')"
echo "CUDA available: $(python -c 'import torch; print(torch.cuda.is_available())')"
echo ""
-# Run a quick pre-training test with reduced steps
+# Run a quick pre-training test with reduced epochs
python script/run.py --config-name=pre_diffusion_mlp \
--config-dir=cfg/gym/pretrain/hopper-medium-v2 \
-train.n_iters=10 \
-train.log_interval=5 \
-train.checkpoint_interval=10 \
+train.n_epochs=2 \
+train.save_model_freq=1 \
 wandb=${WANDB_MODE:-null}
wandb=${WANDB_MODE:-null}
echo "Dev test completed!"
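
Note on catching this class of error earlier: job 3445081 failed after 33 seconds because the dev script overrode keys that are presumably absent from the pretraining config. A minimal sketch of a pre-submit check follows. It walks a nested config dict and reports override keys that do not resolve. The helper name `find_unknown_overrides` and the values in `example_config` are hypothetical; only the key names `train.n_epochs`, `train.save_model_freq`, and `wandb` come from the diff above. It handles plain `key=value` overrides only, not Hydra's `+key`, `++key`, or `~key` forms.

```python
def find_unknown_overrides(config, overrides):
    """Return the dotted keys in `overrides` that do not exist in `config`.

    `config` is a nested dict; `overrides` is a list of "a.b.c=value" strings.
    """
    unknown = []
    for item in overrides:
        # Everything before the first "=" is the dotted key path.
        key, _, _ = item.partition("=")
        node = config
        for part in key.split("."):
            if not isinstance(node, dict) or part not in node:
                unknown.append(key)
                break
            node = node[part]
    return unknown


# Hypothetical excerpt of the config; the values are placeholders,
# not the real DPPO defaults.
example_config = {
    "train": {"n_epochs": 200, "save_model_freq": 100},
    "wandb": None,
}

old_overrides = ["train.n_iters=10", "train.log_interval=5", "wandb=null"]
new_overrides = ["train.n_epochs=2", "train.save_model_freq=1", "wandb=null"]

print(find_unknown_overrides(example_config, old_overrides))
print(find_unknown_overrides(example_config, new_overrides))
```

If `script/run.py` uses the standard Hydra entry point, appending `--cfg job` to the command should print the composed config without starting training, which gives the same check against the real config files before a Slurm submission.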