# DPPO Experiment Plan

## Current Status

### Setup Complete ✅

- Installation successful on HoReKa with a Python 3.10 venv
- SLURM scripts created for automated job submission
- All dependencies installed, including PyTorch, d4rl, and dm-control

### Initial Testing ✅

**DPPO confirmed working on HoReKa with WandB:**

- Successfully completed the dev test (Job ID 3445117)
- Quick verification only: 2 epochs (not full training), loss reduced 0.2494 → 0.2010
- WandB logging working: https://wandb.ai/dominik_roth/gym-hopper-medium-v2-pretrain/runs/rztwqutf
- Model checkpoints and logging fully functional
- Ready for full 200-epoch production runs

## Experiments To Run

### 1. Reproduce Paper Results - Gym Tasks

**Pre-training phase** (train the diffusion model on offline D4RL datasets):

- hopper-medium-v2 → diffusion model trained on offline data (200 epochs)
- walker2d-medium-v2 → diffusion model trained on offline data (200 epochs)
- halfcheetah-medium-v2 → diffusion model trained on offline data (200 epochs)

**Fine-tuning phase** (PPO fine-tune the diffusion model with online interaction):

- hopper-v2 → fine-tune the pre-trained hopper model with PPO + online env
- walker2d-v2 → fine-tune the pre-trained walker2d model with PPO + online env
- halfcheetah-v2 → fine-tune the pre-trained halfcheetah model with PPO + online env

**Settings**: paper hyperparameters, 3 seeds each

### 2. Additional Environments (Future)

**Robomimic suite**: lift, can, square, transport

**D3IL suite**: avoid_m1, avoid_m2, avoid_m3

**Furniture-Bench suite**: one_leg, lamp, round_table (low/med difficulty)

## Running Experiments

### Quick Development Test

```bash
./submit_job.sh dev
```

### Gym Pre-training

```bash
./submit_job.sh gym hopper pretrain
./submit_job.sh gym walker2d pretrain
./submit_job.sh gym halfcheetah pretrain
```

### Gym Fine-tuning (after pre-training completes)

```bash
./submit_job.sh gym hopper finetune
./submit_job.sh gym walker2d finetune
./submit_job.sh gym halfcheetah finetune
```

### Manual SLURM Submission

```bash
# With environment variables
TASK=hopper MODE=pretrain sbatch slurm/run_dppo_gym.sh
```

## Job Tracking

| Job ID | Type | Task | Mode | Status | Duration | Results |
|---------|------------|--------|----------|------------|----------|---------|
| 3445117 | dev test   | hopper | pretrain | ✅ SUCCESS | 2m17s | [WandB](https://wandb.ai/dominik_roth/gym-hopper-medium-v2-pretrain/runs/rztwqutf) |
| 3445123 | production | hopper | pretrain | 🔄 QUEUED  | 8h (requested) | SLURM: 3445123 |

## Configuration Notes

### WandB Setup Required

```bash
export WANDB_API_KEY=
export WANDB_ENTITY=
```

### Resource Requirements

- **Dev jobs**: 30 min, 24 GB RAM, 8 CPUs, `dev_accelerated` partition
- **Production**: 8 h, 32 GB RAM, 40 CPUs, `accelerated` partition

## Issues Encountered

No issues with the DPPO repository: installation and setup completed successfully.

## Next Steps

### Immediate Tasks (Verify All Environments Work)

1. **Test remaining Gym environments**:
   - [ ] walker2d-medium-v2 (2-epoch dev test)
   - [ ] halfcheetah-medium-v2 (2-epoch dev test)
2. **Test other environment types**:
   - [ ] Robomimic: can task (basic test)
   - [ ] D3IL: avoid_m1 (basic test)
3. **Full production runs** (after confirming all work):
   - [ ] Full pre-training: hopper, walker2d, halfcheetah (200 epochs each)
   - [ ] Fine-tuning experiments

**Status**: Only hopper-medium-v2 confirmed working.
Need to verify other environments before production runs.
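
Once the remaining environments are verified, the manual-submission interface from the Running Experiments section can be scripted. A minimal dry-run sketch is below; the `dppo_cmd` helper name is made up for illustration, and it only prints the `TASK=`/`MODE=` commands documented above instead of calling `sbatch`:

```bash
#!/bin/sh
# Hypothetical dry-run helper (not part of the DPPO repo): prints the
# manual-submission command for a given suite/task/mode without submitting.
dppo_cmd() {
    suite="$1"; task="$2"; mode="$3"
    case "$mode" in
        pretrain|finetune) ;;
        *) echo "usage: dppo_cmd <suite> <task> <pretrain|finetune>" >&2; return 1 ;;
    esac
    echo "TASK=$task MODE=$mode sbatch slurm/run_dppo_${suite}.sh"
}

# Print the submission commands for all three gym pre-training runs.
for task in hopper walker2d halfcheetah; do
    dppo_cmd gym "$task" pretrain
done
# prints: TASK=hopper MODE=pretrain sbatch slurm/run_dppo_gym.sh  (etc.)
```

Dropping the `echo` in front of the final command line would turn the dry run into real submissions.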
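
For reference, the production resource requirements listed under Configuration Notes would correspond to a SLURM header roughly like the sketch below. The actual directives in `slurm/run_dppo_gym.sh` may differ, and the GPU line in particular is an assumption:

```bash
#!/bin/bash
#SBATCH --partition=accelerated   # production partition (dev jobs: dev_accelerated)
#SBATCH --time=08:00:00           # 8 h walltime (dev jobs: 30 min)
#SBATCH --mem=32G                 # 32 GB RAM (dev jobs: 24G)
#SBATCH --cpus-per-task=40        # 40 CPUs (dev jobs: 8)
#SBATCH --gres=gpu:1              # assumption: one GPU per job on this partition
```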