# DPPO Experiment Plan

## Current Status

### Setup Complete

- [x] Installation successful on HoReKa with Python 3.10 venv
- [x] SLURM scripts created for automated job submission
- [x] All dependencies installed including PyTorch, d4rl, dm-control
- [x] WandB integration configured with dppo- project prefix

### Initial Testing Status

- [x] DPPO confirmed working on HoReKa with WandB
- [x] Dev test completed successfully (Job ID 3445117)
- [x] Loss reduction verified: 0.2494→0.2010 over 2 epochs
- [x] WandB logging functional: [View Run](https://wandb.ai/dominik_roth/gym-hopper-medium-v2-pretrain/runs/rztwqutf)
- [x] Model checkpoints and logging operational
- [ ] All environments validated on dev partition
- [ ] Ready for production runs

## Experiments To Run

### 1. Reproduce Paper Results - Gym Tasks

**Pre-training Phase** (Behavior cloning on offline datasets):

- hopper-medium-v2 → Diffusion Policy trained via supervised learning on D4RL data (200 epochs)
- walker2d-medium-v2 → Diffusion Policy trained via supervised learning on D4RL data (200 epochs)
- halfcheetah-medium-v2 → Diffusion Policy trained via supervised learning on D4RL data (200 epochs)

**Fine-tuning Phase** (DPPO: Policy gradient on diffusion denoising process):

- hopper-v2 → DPPO fine-tunes pre-trained model using PPO on 2-layer "Diffusion MDP"
- walker2d-v2 → DPPO fine-tunes pre-trained model using PPO on 2-layer "Diffusion MDP"
- halfcheetah-v2 → DPPO fine-tunes pre-trained model using PPO on 2-layer "Diffusion MDP"

**Settings**: Paper hyperparameters, 3 seeds each

### 2. Additional Environments (Future)

**Robomimic Suite**:

- lift, can, square, transport

**D3IL Suite**:

- avoid_m1, avoid_m2, avoid_m3

**Furniture-Bench Suite**:

- one_leg, lamp, round_table (low/med difficulty)

## Running Experiments

### Quick Development Test

```bash
./submit_job.sh dev
```

### Gym Pre-training

```bash
./submit_job.sh gym hopper pretrain
./submit_job.sh gym walker2d pretrain
./submit_job.sh gym halfcheetah pretrain
```

### Gym Fine-tuning (after pre-training completes)

```bash
./submit_job.sh gym hopper finetune
./submit_job.sh gym walker2d finetune
./submit_job.sh gym halfcheetah finetune
```

### Manual SLURM Submission

```bash
# With environment variables
TASK=hopper MODE=pretrain sbatch slurm/run_dppo_gym.sh
```

## Job Tracking

| Job ID | Type | Task | Mode | Status | Duration | Results |
|--------|------|------|------|--------|----------|---------|
| 3445117 | dev test | hopper | pretrain | ✅ SUCCESS | 2m17s | [WandB](https://wandb.ai/dominik_roth/gym-hopper-medium-v2-pretrain/runs/rztwqutf) |
| 3445154 | dev test | walker2d | pretrain | ✅ SUCCESS | ~2m | Completed |
| 3445155 | dev test | halfcheetah | pretrain | 🔄 RUNNING | ~2m | SLURM: 3445155 |
| 3445158 | dev test | hopper | finetune | 🔄 QUEUED | 30m | SLURM: 3445158 |

**Note**:

- Production job 3445123 cancelled (cluster policy: no prod jobs while dev running)
- WandB project names updated to start with "dppo-" prefix
- Focused on Phase 1 validation before production runs

## Configuration Notes

### WandB Setup Required

```bash
export WANDB_API_KEY=
export WANDB_ENTITY=
```

### Resource Requirements

- **Dev jobs**: 30min, 24GB RAM, 8 CPUs, dev_accelerated
- **Production**: 8h, 32GB RAM, 40 CPUs, accelerated

## Issues Encountered

No issues with the DPPO repository - installation and setup completed successfully.

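
Because every run depends on the two WandB variables listed under Configuration Notes, a small pre-flight check could catch an unset value before a job reaches the queue. The following is a minimal sketch, not part of the DPPO repository; the function name `check_wandb_env` is hypothetical.

```bash
# Hypothetical pre-flight helper (an assumption, not shipped with DPPO).
# Reports each required WandB variable that is unset or empty.
# Returns 0 when both are set, 1 otherwise.
check_wandb_env() {
    local missing=0 name value
    for name in WANDB_API_KEY WANDB_ENTITY; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

One possible use is to gate a submission on it, for example `check_wandb_env && ./submit_job.sh dev`, so that nothing is submitted while a variable is missing.
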
## Paper Reproduction Progress

### Full Paper Results (Target: All experiments in WandB)

**Goal**: Complete reproduction of DPPO paper results with all runs logged to dominik_roth WandB account.

#### Gym Tasks (Core Paper Results)

- [ ] **hopper-medium-v2 → hopper-v2**: Pre-train (200 epochs) + Fine-tune
- [ ] **walker2d-medium-v2 → walker2d-v2**: Pre-train (200 epochs) + Fine-tune
- [ ] **halfcheetah-medium-v2 → halfcheetah-v2**: Pre-train (200 epochs) + Fine-tune

#### Additional Environment Suites (Extended Results)

- [ ] **Robomimic Tasks**: lift, can, square, transport (pre-train + fine-tune)
- [ ] **D3IL Tasks**: avoid_m1, avoid_m2, avoid_m3 (pre-train + fine-tune)
- [ ] **Furniture-Bench Tasks**: one_leg, lamp, round_table (low/med difficulty)

#### Success Criteria

- [ ] All pre-training runs complete successfully (loss convergence)
- [ ] All fine-tuning runs complete successfully (performance improvement)
- [ ] All experiments logged with proper WandB tracking
- [ ] Results comparable to paper benchmarks
- [ ] Complete documentation of hyperparameters and settings

## Next Steps

### Phase 1: Validation on Dev Partition (Current Priority)

**Goal**: Test all environments and modes on dev partition to validate installation and document any issues.

#### Dev Validation Todo List (In Order):

1. - [ ] Test walker2d pretrain on dev (retry with flexible script) - Job 3445167 [IN PROGRESS]
2. - [ ] Monitor halfcheetah pretrain dev test (Job 3445155) [IN PROGRESS]
3. - [ ] Monitor hopper finetune dev test (Job 3445158) [PENDING]
4. - [ ] Test walker2d finetune on dev
5. - [ ] Test halfcheetah finetune on dev
6. - [ ] Test Robomimic lift pretrain on dev
7. - [ ] Test Robomimic lift finetune on dev
8. - [ ] Test Robomimic can pretrain on dev
9. - [ ] Test Robomimic can finetune on dev
10. - [ ] Test Robomimic square pretrain on dev
11. - [ ] Test Robomimic square finetune on dev
12. - [ ] Test Robomimic transport pretrain on dev
13. - [ ] Test Robomimic transport finetune on dev
14. - [ ] Test D3IL avoid_m1 pretrain on dev
15. - [ ] Test D3IL avoid_m1 finetune on dev
16. - [ ] Test D3IL avoid_m2 pretrain on dev
17. - [ ] Test D3IL avoid_m2 finetune on dev
18. - [ ] Test D3IL avoid_m3 pretrain on dev
19. - [ ] Test D3IL avoid_m3 finetune on dev
20. - [ ] Test Furniture one_leg_low pretrain on dev
21. - [ ] Test Furniture one_leg_low finetune on dev
22. - [ ] Test Furniture lamp_low pretrain on dev
23. - [ ] Test Furniture lamp_low finetune on dev
24. - [ ] Document any issues found in README
25. - [ ] Verify all WandB logging works with dppo- prefix

**Total: 25 checklist items (23 dev tests across 4 environment suites: Gym, Robomimic, D3IL, Furniture; plus 2 documentation and WandB verification items)**

### Phase 2: Production Runs (After Dev Validation)

**Only proceed after Phase 1 is complete and all issues are resolved**

#### 2.1 Full Gym Pipeline

- [ ] hopper: pre-train (200 epochs) → fine-tune
- [ ] walker2d: pre-train (200 epochs) → fine-tune
- [ ] halfcheetah: pre-train (200 epochs) → fine-tune

#### 2.2 Extended Environments

- [ ] All validated environments from Phase 1

**Current Status**: Phase 1 in progress. Jobs 3445154 (walker2d dev) running, 3445155 (halfcheetah dev) queued. Production run 3445123 on hold until validation complete.
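
The pre-train → fine-tune ordering in the Gym pipeline could be enforced by SLURM itself rather than by manual monitoring. The sketch below is one possible approach under stated assumptions: it reuses the `TASK`/`MODE` variables and `slurm/run_dppo_gym.sh` from the manual submission example, relies on the standard `sbatch` options `--parsable` and `--dependency=afterok:<jobid>`, and assumes the fine-tuning job can locate the pre-trained checkpoint on its own. The function name `submit_gym_pipeline` and the `SBATCH_CMD` override are hypothetical.

```bash
# Sketch only: chain pre-training and fine-tuning for one Gym task.
# SBATCH_CMD is a hypothetical override so the chain can be dry-run off-cluster.
SBATCH_CMD="${SBATCH_CMD:-sbatch}"

submit_gym_pipeline() {
    local task="$1" pre_id fine_id
    # --parsable makes sbatch print only the job ID (optionally "id;cluster").
    pre_id=$(TASK="$task" MODE=pretrain "$SBATCH_CMD" --parsable \
        slurm/run_dppo_gym.sh) || return 1
    pre_id="${pre_id%%;*}"   # drop the ";cluster" suffix if present
    # afterok: fine-tuning becomes eligible only if pre-training exits with code 0.
    fine_id=$(TASK="$task" MODE=finetune "$SBATCH_CMD" --parsable \
        --dependency="afterok:${pre_id}" slurm/run_dppo_gym.sh) || return 1
    echo "$task pretrain=${pre_id} finetune=${fine_id%%;*}"
}
```

Calling `submit_gym_pipeline hopper` would then submit both stages at once and print the two job IDs, which could be copied into the Job Tracking table.
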