diff --git a/EXPERIMENT_PLAN.md b/EXPERIMENT_PLAN.md
index 0cbbfdf..b2b2b6f 100644
--- a/EXPERIMENT_PLAN.md
+++ b/EXPERIMENT_PLAN.md
@@ -8,25 +8,26 @@
 - All dependencies installed including PyTorch, d4rl, dm-control
 
 ### Initial Testing
-✅ **DPPO Confirmed Working on HoReKa**
-- Successfully completed dev test (Job ID 3445106)
-- Pre-training working: 2 epochs, loss reduction 0.2494→0.2010
-- Model checkpoints saved correctly
-- Ready for full experiments
+✅ **DPPO Confirmed Working on HoReKa with WandB**
+- Successfully completed dev test (Job ID 3445117)
+- Quick verification: 2 epochs only (not full training), loss reduction 0.2494→0.2010
+- WandB logging working: https://wandb.ai/dominik_roth/gym-hopper-medium-v2-pretrain/runs/rztwqutf
+- Model checkpoints and logging fully functional
+- Ready for full 200-epoch production runs
 
 ## Experiments To Run
 
 ### 1. Reproduce Paper Results - Gym Tasks
 
-**Pre-training Phase**:
-- hopper-medium-v2
-- walker2d-medium-v2
-- halfcheetah-medium-v2
+**Pre-training Phase** (Train diffusion model on offline D4RL datasets):
+- hopper-medium-v2 → diffusion model trained on offline data (200 epochs)
+- walker2d-medium-v2 → diffusion model trained on offline data (200 epochs)
+- halfcheetah-medium-v2 → diffusion model trained on offline data (200 epochs)
 
-**Fine-tuning Phase**:
-- hopper-v2
-- walker2d-v2
-- halfcheetah-v2
+**Fine-tuning Phase** (PPO fine-tune diffusion model with online interaction):
+- hopper-v2 → fine-tune pre-trained hopper model with PPO + online env
+- walker2d-v2 → fine-tune pre-trained walker2d model with PPO + online env
+- halfcheetah-v2 → fine-tune pre-trained halfcheetah model with PPO + online env
 
 **Settings**: Paper hyperparameters, 3 seeds each
 
@@ -92,6 +93,18 @@ No issues with the DPPO repository - installation and setup completed successful
 
 ## Next Steps
 
-1. Run corrected dev test
-2. Begin systematic pre-training experiments
-3. Document successful runs and results
\ No newline at end of file
+### Immediate Tasks (To Verify All Environments Work)
+
+1. **Test remaining Gym environments**:
+   - [ ] walker2d-medium-v2 (2 epochs dev test)
+   - [ ] halfcheetah-medium-v2 (2 epochs dev test)
+
+2. **Test other environment types**:
+   - [ ] Robomimic: can task (basic test)
+   - [ ] D3IL: avoid_m1 (basic test)
+
+3. **Full production runs** (after confirming all work):
+   - [ ] Full pre-training: hopper, walker2d, halfcheetah (200 epochs each)
+   - [ ] Fine-tuning experiments
+
+**Status**: Only hopper-medium-v2 confirmed working. Need to verify other environments before production runs.
\ No newline at end of file
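
Review note: the Gym run matrix implied by this plan (three D4RL datasets, a pre-training and a fine-tuning phase, 3 seeds each) can be enumerated as below. This is a minimal sketch, not part of the DPPO repository: the `Run` record, the helper names, and the seed values `0, 1, 2` are assumptions made for illustration (the plan only says "3 seeds each"), and the fine-tuning epoch count is left unset because the plan does not state one.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Optional

# Dataset names and the 200-epoch figure are taken from the plan.
PRETRAIN_DATASETS = ["hopper-medium-v2", "walker2d-medium-v2", "halfcheetah-medium-v2"]
PRETRAIN_EPOCHS = 200
# Assumption: the plan says "3 seeds each" but does not name the seeds.
SEEDS = [0, 1, 2]


@dataclass(frozen=True)
class Run:
    """Hypothetical record describing one planned job."""
    phase: str             # "pretrain" or "finetune"
    task: str              # D4RL dataset name or Gym environment name
    seed: int
    epochs: Optional[int]  # None where the plan gives no number


def finetune_env(dataset: str) -> str:
    """Map a pre-training dataset to its fine-tuning environment,
    e.g. "hopper-medium-v2" to "hopper-v2", as paired in the plan."""
    return dataset.replace("-medium", "")


def build_run_matrix() -> List[Run]:
    """List every pre-training run first, then every fine-tuning run."""
    runs = [
        Run("pretrain", dataset, seed, PRETRAIN_EPOCHS)
        for dataset, seed in product(PRETRAIN_DATASETS, SEEDS)
    ]
    runs += [
        Run("finetune", finetune_env(dataset), seed, None)
        for dataset, seed in product(PRETRAIN_DATASETS, SEEDS)
    ]
    return runs


if __name__ == "__main__":
    for run in build_run_matrix():
        print(run)
```

Listing the matrix explicitly makes the production-run cost visible before any job is submitted: 9 pre-training runs plus 9 fine-tuning runs, each fine-tuning run depending on the checkpoint of the matching pre-training run.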