# DPPO Experiment Plan

## Current Status

### Setup Complete ✅

- Installation successful on HoReKa with Python 3.10 venv
- SLURM scripts created for automated job submission
- All dependencies installed including PyTorch, d4rl, dm-control

### Initial Testing ✅

**DPPO Confirmed Working on HoReKa**

- Successfully completed dev test (Job ID 3445106)
- Pre-training working: 2 epochs, loss reduction 0.2494→0.2010
- Model checkpoints saved correctly
- Ready for full experiments

## Experiments To Run

### 1. Reproduce Paper Results - Gym Tasks

**Pre-training Phase**:

- hopper-medium-v2
- walker2d-medium-v2
- halfcheetah-medium-v2

**Fine-tuning Phase**:

- hopper-v2
- walker2d-v2
- halfcheetah-v2

**Settings**: Paper hyperparameters, 3 seeds each

### 2. Additional Environments (Future)

**Robomimic Suite**:

- lift, can, square, transport

**D3IL Suite**:

- avoid_m1, avoid_m2, avoid_m3

**Furniture-Bench Suite**:

- one_leg, lamp, round_table (low/med difficulty)

## Running Experiments

### Quick Development Test

```bash
./submit_job.sh dev
```

### Gym Pre-training

```bash
./submit_job.sh gym hopper pretrain
./submit_job.sh gym walker2d pretrain
./submit_job.sh gym halfcheetah pretrain
```

### Gym Fine-tuning (after pre-training completes)

```bash
./submit_job.sh gym hopper finetune
./submit_job.sh gym walker2d finetune
./submit_job.sh gym halfcheetah finetune
```

### Manual SLURM Submission

```bash
# With environment variables
TASK=hopper MODE=pretrain sbatch slurm/run_dppo_gym.sh
```

## Job Tracking

| Job ID | Type | Task | Mode | Status | Duration | Results |
|--------|------|------|------|---------|----------|---------|
| 3445117 | dev test | hopper | pretrain | ✅ SUCCESS | 2m17s | [WandB](https://wandb.ai/dominik_roth/gym-hopper-medium-v2-pretrain/runs/rztwqutf) |

## Configuration Notes

### WandB Setup Required

```bash
export WANDB_API_KEY=
export WANDB_ENTITY=
```

### Resource Requirements

- **Dev jobs**: 30min, 24GB RAM, 8 CPUs, dev_accelerated
- **Production**: 8h, 32GB RAM, 40 CPUs, accelerated

## Issues Encountered

No issues with the DPPO repository - installation and setup completed successfully.

## Next Steps

1. Run corrected dev test
2. Begin systematic pre-training experiments
3. Document successful runs and results
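
### Sketch: Chaining Fine-tuning Behind Pre-training

Because fine-tuning may only start after pre-training completes, the two phases listed under Running Experiments could be queued together using SLURM job dependencies. The script below is a minimal sketch, not part of the repository. It assumes that `slurm/run_dppo_gym.sh` reads `TASK` and `MODE` from the environment, as in the manual submission example. The `DRY_RUN` switch, the `submit` helper, and the `dryrun-...` placeholder IDs are hypothetical names introduced only for illustration. `--parsable` and `--dependency=afterok:<jobid>` are standard `sbatch` options. Seeds are not handled here, since this plan does not show how they are passed.

```bash
#!/usr/bin/env bash
# Sketch: queue pre-training for each Gym task and make fine-tuning wait for it.
# Assumption: slurm/run_dppo_gym.sh picks up TASK and MODE from the environment.
set -eu

# Hypothetical switch: 1 = only print the commands, 0 = really call sbatch.
DRY_RUN="${DRY_RUN:-1}"

# submit TASK MODE [JOB_ID_TO_WAIT_FOR]
# Prints the ID of the submitted job (or a placeholder in dry-run mode) on stdout.
submit() {
  job_task="$1"
  job_mode="$2"
  job_dep="${3:-}"

  # Build the sbatch call in the positional parameters.
  set -- sbatch --parsable
  if [ -n "$job_dep" ]; then
    # afterok: start only if the referenced job finished with exit code 0.
    set -- "$@" "--dependency=afterok:${job_dep}"
  fi
  set -- env "TASK=${job_task}" "MODE=${job_mode}" "$@" slurm/run_dppo_gym.sh

  if [ "$DRY_RUN" = "1" ]; then
    echo "[dry-run] $*" >&2
    echo "dryrun-${job_task}-${job_mode}"
  else
    "$@"
  fi
}

for task in hopper walker2d halfcheetah; do
  pre_id="$(submit "$task" pretrain)"
  pre_id="${pre_id%%;*}"   # --parsable may print "jobid;cluster"; keep the ID only
  fine_id="$(submit "$task" finetune "$pre_id")"
  fine_id="${fine_id%%;*}"
  echo "${task}: pretrain=${pre_id} finetune=${fine_id}"
done
```

Running the script as-is only prints the `sbatch` commands it would issue; setting `DRY_RUN=0` on a login node submits them. The resulting job IDs can then be recorded in the Job Tracking table.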