# DPPO Experiment Plan

## Current Status
### Setup Complete ✅

- Installation successful on HoReKa with a Python 3.10 venv
- SLURM scripts created for automated job submission
- All dependencies installed, including PyTorch, d4rl, and dm-control
### Initial Testing

**✅ DPPO confirmed working on HoReKa with WandB**

- Successfully completed the dev test (Job ID 3445117)
- Quick verification only: 2 epochs (not full training), loss reduced from 0.2494 to 0.2010
- WandB logging working: https://wandb.ai/dominik_roth/gym-hopper-medium-v2-pretrain/runs/rztwqutf
- Model checkpoints and logging fully functional
- Ready for full 200-epoch production runs
## Experiments To Run

### 1. Reproduce Paper Results - Gym Tasks

**Pre-training phase** (train the diffusion model on offline D4RL datasets):

- hopper-medium-v2 → diffusion model trained on offline data (200 epochs)
- walker2d-medium-v2 → diffusion model trained on offline data (200 epochs)
- halfcheetah-medium-v2 → diffusion model trained on offline data (200 epochs)

**Fine-tuning phase** (PPO fine-tuning of the diffusion model with online interaction):

- hopper-v2 → fine-tune the pre-trained hopper model with PPO + online env
- walker2d-v2 → fine-tune the pre-trained walker2d model with PPO + online env
- halfcheetah-v2 → fine-tune the pre-trained halfcheetah model with PPO + online env

Settings: paper hyperparameters, 3 seeds each.
### 2. Additional Environments (Future)

- Robomimic suite: lift, can, square, transport
- D3IL suite: avoid_m1, avoid_m2, avoid_m3
- Furniture-Bench suite: one_leg, lamp, round_table (low/med difficulty)
## Running Experiments

### Quick Development Test

```shell
./submit_job.sh dev
```
### Gym Pre-training

```shell
./submit_job.sh gym hopper pretrain
./submit_job.sh gym walker2d pretrain
./submit_job.sh gym halfcheetah pretrain
```
### Gym Fine-tuning (after pre-training completes)

```shell
./submit_job.sh gym hopper finetune
./submit_job.sh gym walker2d finetune
./submit_job.sh gym halfcheetah finetune
```
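The six per-task submissions above can be scripted as a small loop. This is a dry-run sketch that only prints the commands (it assumes the `./submit_job.sh gym <task> <mode>` interface shown above and nothing more):

```shell
# Dry-run sketch: print the submit command for each Gym task and mode.
# Remove the `echo` to actually submit (fine-tuning should only be
# submitted after the matching pre-training job has finished).
for task in hopper walker2d halfcheetah; do
  for mode in pretrain finetune; do
    echo "./submit_job.sh gym ${task} ${mode}"
  done
done
```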
### Manual SLURM Submission

```shell
# With environment variables
TASK=hopper MODE=pretrain sbatch slurm/run_dppo_gym.sh
```
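How `slurm/run_dppo_gym.sh` consumes these variables is not shown here; a minimal sketch, assuming the script falls back to the hopper pre-training configuration when nothing is set, could look like:

```shell
# Hypothetical top of slurm/run_dppo_gym.sh: read TASK/MODE from the
# environment, with defaults so a bare `sbatch` still runs something sane.
TASK="${TASK:-hopper}"
MODE="${MODE:-pretrain}"
echo "Launching DPPO ${MODE} for ${TASK}"
```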
## Job Tracking

| Job ID | Type | Task | Mode | Status | Duration | Results |
|---|---|---|---|---|---|---|
| 3445117 | dev test | hopper | pretrain | ✅ SUCCESS | 2m17s | WandB |
| 3445123 | production | hopper | pretrain | 🔄 QUEUED | 8h | SLURM: 3445123 |
## Configuration Notes

### WandB Setup Required

```shell
export WANDB_API_KEY=<your_api_key>
export WANDB_ENTITY=<your_username>
```
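Jobs that reach the cluster without these variables fail only after queueing, so a pre-flight guard before submission can help. This is a sketch (not part of the DPPO repo) using the shell's `${var:?}` expansion:

```shell
# Hypothetical pre-flight check: abort with a message if the WandB
# credentials are not exported.
require_wandb_env() {
  : "${WANDB_API_KEY:?export WANDB_API_KEY before submitting jobs}"
  : "${WANDB_ENTITY:?export WANDB_ENTITY before submitting jobs}"
  echo "WandB configured for entity ${WANDB_ENTITY}"
}

# Demo call with placeholder credentials so the snippet runs standalone.
WANDB_API_KEY=dummy WANDB_ENTITY=example_user require_wandb_env
```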
### Resource Requirements

- Dev jobs: 30 min, 24 GB RAM, 8 CPUs, `dev_accelerated` partition
- Production jobs: 8 h, 32 GB RAM, 40 CPUs, `accelerated` partition
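The production profile would map to `#SBATCH` directives roughly as follows; the partition name comes from the list above, and everything else is a sketch to adapt, not the repo's actual script header:

```shell
#!/bin/bash
# Hypothetical #SBATCH header for a production run, mirroring the
# resource profile above (8 h, 32 GB RAM, 40 CPUs, `accelerated`).
#SBATCH --partition=accelerated
#SBATCH --time=08:00:00
#SBATCH --mem=32G
#SBATCH --cpus-per-task=40
```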
## Issues Encountered

No issues with the DPPO repository: installation and setup completed successfully.
## Next Steps

### Immediate Tasks (To Verify All Environments Work)

- Test remaining Gym environments:
  - walker2d-medium-v2 (2 epochs dev test)
  - halfcheetah-medium-v2 (2 epochs dev test)
- Test other environment types:
  - Robomimic: can task (basic test)
  - D3IL: avoid_m1 (basic test)
- Full production runs (after confirming all work):
  - Full pre-training: hopper, walker2d, halfcheetah (200 epochs each)
  - Fine-tuning experiments
**Status:** Only hopper-medium-v2 is confirmed working; verify the other environments before launching production runs.