DPPO Experiment Plan
Current Status
Setup Complete
- Installation successful on HoReKa with Python 3.10 venv
- SLURM scripts created for automated job submission
- All dependencies installed including PyTorch, d4rl, dm-control
- WandB integration configured with dppo- project prefix
Initial Testing Status
- DPPO confirmed working on HoReKa with WandB
- Dev test completed successfully (Job ID 3445117)
- Loss reduction verified: 0.2494→0.2010 over 2 epochs
- WandB logging functional: View Run
- Model checkpoints and logging operational
- All environments validated on dev partition
- Ready for production runs
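To put the loss figures from the dev test in perspective, the relative drop can be computed directly. This is a minimal sketch; the helper name `relative_drop` is hypothetical and not part of the repository.

```shell
# Hypothetical helper: percentage drop from a start value ($1) to an end value ($2).
relative_drop() {
  # LC_ALL=C keeps the decimal separator a dot regardless of locale.
  LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a - b) / a * 100 }'
}

# Usage with the values reported for job 3445117:
#   relative_drop 0.2494 0.2010
```

For the reported values this is a drop of roughly 19% over the 2 epochs, which suggests the training loop is learning but says nothing yet about convergence.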
Experiments To Run
1. Reproduce Paper Results - Gym Tasks
Pre-training Phase (Behavior cloning on offline datasets):
- hopper-medium-v2 → Diffusion Policy trained via supervised learning on D4RL data (200 epochs)
- walker2d-medium-v2 → Diffusion Policy trained via supervised learning on D4RL data (200 epochs)
- halfcheetah-medium-v2 → Diffusion Policy trained via supervised learning on D4RL data (200 epochs)
Fine-tuning Phase (DPPO: Policy gradient on diffusion denoising process):
- hopper-v2 → DPPO fine-tunes pre-trained model using PPO on 2-layer "Diffusion MDP"
- walker2d-v2 → DPPO fine-tunes pre-trained model using PPO on 2-layer "Diffusion MDP"
- halfcheetah-v2 → DPPO fine-tunes pre-trained model using PPO on 2-layer "Diffusion MDP"
Settings: Paper hyperparameters, 3 seeds each
2. Additional Environments (Future)
Robomimic Suite:
- lift, can, square, transport
D3IL Suite:
- avoid_m1, avoid_m2, avoid_m3
Furniture-Bench Suite:
- one_leg, lamp, round_table (low/med difficulty)
Running Experiments
Quick Development Test
./submit_job.sh dev
Gym Pre-training
./submit_job.sh gym hopper pretrain
./submit_job.sh gym walker2d pretrain
./submit_job.sh gym halfcheetah pretrain
Gym Fine-tuning (after pre-training completes)
./submit_job.sh gym hopper finetune
./submit_job.sh gym walker2d finetune
./submit_job.sh gym halfcheetah finetune
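The three per-task commands for each phase can also be issued in a loop. The sketch below assumes only the `./submit_job.sh gym <task> <mode>` form shown above; the function name `submit_gym_all` and the `SUBMIT` override are hypothetical conveniences.

```shell
# Command used for submission; set SUBMIT=echo for a dry run.
SUBMIT="${SUBMIT:-./submit_job.sh}"

# Hypothetical wrapper: submit one phase for all three Gym tasks.
submit_gym_all() {
  mode="$1"   # pretrain or finetune
  case "$mode" in
    pretrain|finetune) ;;
    *) echo "usage: submit_gym_all pretrain|finetune" >&2; return 2 ;;
  esac
  for task in hopper walker2d halfcheetah; do
    # Stop at the first failed submission.
    "$SUBMIT" gym "$task" "$mode" || return 1
  done
}

# Usage:
#   SUBMIT=echo submit_gym_all pretrain   # print the arguments only
#   submit_gym_all finetune               # submit for real
```

Running with `SUBMIT=echo` first is a cheap way to check the argument order before anything reaches the queue.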
Manual SLURM Submission
# With environment variables
TASK=hopper MODE=pretrain sbatch slurm/run_dppo_gym.sh
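Because fine-tuning must wait for pre-training, the two manual submissions can be chained with a SLURM job dependency. The following is a sketch under assumptions: it relies on `slurm/run_dppo_gym.sh` reading `TASK` and `MODE` from the environment as shown above, and passes them via `--export` instead of a shell prefix. The function name `submit_chain` and the `SBATCH` override are hypothetical.

```shell
# sbatch binary; can be replaced by a stub for a dry run.
SBATCH="${SBATCH:-sbatch}"

# Hypothetical helper: submit pre-training, then fine-tuning that starts
# only if the pre-training job finishes successfully.
submit_chain() {
  task="$1"
  # --parsable prints just the job ID (optionally followed by ";cluster").
  pre_id=$("$SBATCH" --parsable \
    --export=ALL,TASK="$task",MODE=pretrain slurm/run_dppo_gym.sh) || return 1
  pre_id="${pre_id%%;*}"   # strip the optional ";cluster" suffix
  # afterok: run only after the listed job exits with code 0.
  "$SBATCH" --parsable --dependency=afterok:"$pre_id" \
    --export=ALL,TASK="$task",MODE=finetune slurm/run_dppo_gym.sh
}

# Usage:
#   submit_chain hopper    # prints the fine-tuning job ID
```

If the pre-training job fails, SLURM leaves the dependent job pending with a dependency that can never be satisfied, so it should be cancelled manually with `scancel`.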
Job Tracking
| Job ID | Type | Task | Mode | Status | Duration | Results |
|---|---|---|---|---|---|---|
| 3445117 | dev test | hopper | pretrain | ✅ SUCCESS | 2m17s | WandB |
| 3445154 | dev test | walker2d | pretrain | ✅ SUCCESS | ~2m | Completed |
| 3445155 | dev test | halfcheetah | pretrain | 🔄 RUNNING | ~2m | SLURM: 3445155 |
| 3445158 | dev test | hopper | finetune | 🔄 QUEUED | 30m | SLURM: 3445158 |
Note:
- Production job 3445123 was cancelled (cluster policy: no production jobs while dev jobs are running)
- WandB project names now start with the "dppo-" prefix
- Work is focused on Phase 1 validation before any production runs
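The Status and Duration columns can be refreshed from SLURM accounting instead of by hand. This is a minimal sketch using standard `sacct` options; the function name `job_status` and the `SACCT` override are hypothetical, and it assumes job accounting is enabled on the cluster.

```shell
# sacct binary; can be replaced by a stub for testing.
SACCT="${SACCT:-sacct}"

# Hypothetical helper: print "JobID|State|Elapsed" for one job.
job_status() {
  # -X: allocation only (no job steps), -n: no header, -P: pipe-delimited output
  "$SACCT" -X -n -P -j "$1" --format=JobID,State,Elapsed
}

# Usage:
#   job_status 3445155
```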
Configuration Notes
WandB Setup Required
export WANDB_API_KEY=<your_api_key>
export WANDB_ENTITY=<your_username>
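A job submitted without these variables only fails once it reaches WandB initialisation, so it can help to check them up front. The function below is a sketch; the name `check_wandb_env` is hypothetical and not part of the repository.

```shell
# Hypothetical pre-submission check: returns 0 only if both variables are non-empty.
check_wandb_env() {
  missing=0
  for var in WANDB_API_KEY WANDB_ENTITY; do
    # Indirect lookup via eval keeps this portable across POSIX shells.
    eval "val=\${$var:-}"
    if [ -z "$val" ]; then
      echo "missing: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   check_wandb_env && ./submit_job.sh dev
```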
Resource Requirements
- Dev jobs: 30 min, 24 GB RAM, 8 CPUs, partition dev_accelerated
- Production: 8 h, 32 GB RAM, 40 CPUs, partition accelerated
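Expressed as `sbatch` options, the two tiers might look as follows. This is a sketch, not the content of the repository's SLURM scripts: it assumes a single task per job (so the CPU count maps to `--cpus-per-task`), leaves out GPU requests because none are stated here, and the function name `resource_flags` is hypothetical.

```shell
# Hypothetical helper: print sbatch resource options for a tier ("dev" or "prod").
resource_flags() {
  case "$1" in
    dev)  echo "--partition=dev_accelerated --time=00:30:00 --mem=24G --cpus-per-task=8" ;;
    prod) echo "--partition=accelerated --time=08:00:00 --mem=32G --cpus-per-task=40" ;;
    *)    echo "unknown tier: $1" >&2; return 1 ;;
  esac
}

# Usage (word splitting of the flags is intended, hence no quotes):
#   sbatch $(resource_flags dev) slurm/run_dppo_gym.sh
```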
Issues Encountered
No issues with the DPPO repository - installation and setup completed successfully.
Paper Reproduction Progress
Full Paper Results (Target: All experiments in WandB)
Goal: Complete reproduction of the DPPO paper results, with all runs logged to the dominik_roth WandB account.
Gym Tasks (Core Paper Results)
- hopper-medium-v2 → hopper-v2: Pre-train (200 epochs) + Fine-tune
- walker2d-medium-v2 → walker2d-v2: Pre-train (200 epochs) + Fine-tune
- halfcheetah-medium-v2 → halfcheetah-v2: Pre-train (200 epochs) + Fine-tune
Additional Environment Suites (Extended Results)
- Robomimic Tasks: lift, can, square, transport (pre-train + fine-tune)
- D3IL Tasks: avoid_m1, avoid_m2, avoid_m3 (pre-train + fine-tune)
- Furniture-Bench Tasks: one_leg, lamp, round_table (low/med difficulty)
Success Criteria
- All pre-training runs complete successfully (loss convergence)
- All fine-tuning runs complete successfully (performance improvement)
- All experiments logged with proper WandB tracking
- Results comparable to paper benchmarks
- Complete documentation of hyperparameters and settings
Next Steps
Phase 1: Validation on Dev Partition (Current Priority)
Goal: Test all environments and modes on dev partition to validate installation and document any issues.
Dev Validation Todo List (In Order):
- Test walker2d pretrain on dev (retry with flexible script) - Job 3445167 [IN PROGRESS]
- Monitor halfcheetah pretrain dev test (Job 3445155) [IN PROGRESS]
- Monitor hopper finetune dev test (Job 3445158) [PENDING]
- Test walker2d finetune on dev
- Test halfcheetah finetune on dev
- Test Robomimic lift pretrain on dev
- Test Robomimic lift finetune on dev
- Test Robomimic can pretrain on dev
- Test Robomimic can finetune on dev
- Test Robomimic square pretrain on dev
- Test Robomimic square finetune on dev
- Test Robomimic transport pretrain on dev
- Test Robomimic transport finetune on dev
- Test D3IL avoid_m1 pretrain on dev
- Test D3IL avoid_m1 finetune on dev
- Test D3IL avoid_m2 pretrain on dev
- Test D3IL avoid_m2 finetune on dev
- Test D3IL avoid_m3 pretrain on dev
- Test D3IL avoid_m3 finetune on dev
- Test Furniture one_leg_low pretrain on dev
- Test Furniture one_leg_low finetune on dev
- Test Furniture lamp_low pretrain on dev
- Test Furniture lamp_low finetune on dev
- Document any issues found in README
- Verify all WandB logging works with dppo- prefix
Total checklist items: 25 (23 validation tests across 4 environment suites: Gym, Robomimic, D3IL, Furniture; plus 2 documentation and logging checks)
Phase 2: Production Runs (After Dev Validation)
Proceed only after Phase 1 is complete and all issues are resolved.
2.1 Full Gym Pipeline
- hopper: pre-train (200 epochs) → fine-tune
- walker2d: pre-train (200 epochs) → fine-tune
- halfcheetah: pre-train (200 epochs) → fine-tune
2.2 Extended Environments
- All validated environments from Phase 1
Current Status: Phase 1 in progress. Dev jobs 3445154 (walker2d) and 3445155 (halfcheetah) are tracked in the Job Tracking table above, which holds their latest state. Production runs remain on hold until validation is complete (job 3445123 was cancelled).