dppo/EXPERIMENT_PLAN.md
ys1087@partner.kit.edu 0424a080c1 feat: HoReKa cluster adaptation and validation
- Updated all WandB project names to use dppo- prefix for organization
- Added flexible dev testing script for all environments
- Created organized dev_tests directory for test scripts
- Fixed MuJoCo compilation issues (added GCC compiler flags)
- Documented Python 3.10 compatibility and Furniture-Bench limitation
- Validated pre-training for Gym, Robomimic, D3IL environments
- Updated experiment tracking with validation results
- Enhanced README with troubleshooting and setup instructions
2025-08-27 14:01:51 +02:00


DPPO Experiment Plan

Current Status

Setup Complete

  • Installation successful on HoReKa with Python 3.10 venv
  • SLURM scripts created for automated job submission
  • All dependencies installed including PyTorch, d4rl, dm-control
  • WandB integration configured with dppo- project prefix

Initial Testing Status

  • DPPO confirmed working on HoReKa with WandB
  • Dev test completed successfully (Job ID 3445117)
  • Loss reduction verified: 0.2494→0.2010 over 2 epochs
  • WandB logging functional: View Run
  • Model checkpoints and logging operational
  • All environments validated on dev partition
  • Ready for production runs
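As a quick sanity check on the loss figures above, the relative reduction can be computed directly. The helper name `relative_reduction` is illustrative and not part of the DPPO repository.

```python
def relative_reduction(start: float, end: float) -> float:
    """Return the fractional decrease from start to end."""
    if start <= 0:
        raise ValueError("start loss must be positive")
    return (start - end) / start


# Loss values reported by dev job 3445117.
reduction = relative_reduction(0.2494, 0.2010)
print(f"Loss fell by {reduction:.1%} over 2 epochs")  # Loss fell by 19.4% over 2 epochs
```

Two epochs only show that the training loop is wired up correctly; they say nothing about convergence over the planned 200 epochs.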

Experiments To Run

1. Reproduce Paper Results - Gym Tasks

Pre-training Phase (Behavior cloning on offline datasets):

  • hopper-medium-v2 → Diffusion Policy trained via supervised learning on D4RL data (200 epochs)
  • walker2d-medium-v2 → Diffusion Policy trained via supervised learning on D4RL data (200 epochs)
  • halfcheetah-medium-v2 → Diffusion Policy trained via supervised learning on D4RL data (200 epochs)

Fine-tuning Phase (DPPO: Policy gradient on diffusion denoising process):

  • hopper-v2 → DPPO fine-tunes pre-trained model using PPO on 2-layer "Diffusion MDP"
  • walker2d-v2 → DPPO fine-tunes pre-trained model using PPO on 2-layer "Diffusion MDP"
  • halfcheetah-v2 → DPPO fine-tunes pre-trained model using PPO on 2-layer "Diffusion MDP"

Settings: Paper hyperparameters, 3 seeds each
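The size of the Gym reproduction can be estimated by enumerating the run matrix. This is a rough sketch: the seed values are placeholders (the plan fixes only the count, 3), and whether "3 seeds each" also applies to pre-training is an assumption exposed as a parameter.

```python
TASKS = ["hopper", "walker2d", "halfcheetah"]
SEEDS = [0, 1, 2]  # hypothetical values; only the count (3) comes from the plan


def build_runs(seed_pretraining: bool = True) -> list[tuple[str, str, int]]:
    """List (task, phase, seed) tuples for the Gym experiments."""
    runs = []
    for task in TASKS:
        # Assumption: pre-training may be run once per task instead of per seed.
        pretrain_seeds = SEEDS if seed_pretraining else SEEDS[:1]
        runs += [(task, "pretrain", seed) for seed in pretrain_seeds]
        runs += [(task, "finetune", seed) for seed in SEEDS]
    return runs


print(len(build_runs(True)))   # 18 runs if both phases are seeded
print(len(build_runs(False)))  # 12 runs if only fine-tuning is seeded
```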

2. Additional Environments (Future)

Robomimic Suite:

  • lift, can, square, transport

D3IL Suite:

  • avoid_m1, avoid_m2, avoid_m3

Furniture-Bench Suite:

  • one_leg, lamp, round_table (low/med difficulty)

Running Experiments

Quick Development Test

./submit_job.sh dev

Gym Pre-training

./submit_job.sh gym hopper pretrain
./submit_job.sh gym walker2d pretrain
./submit_job.sh gym halfcheetah pretrain

Gym Fine-tuning (after pre-training completes)

./submit_job.sh gym hopper finetune
./submit_job.sh gym walker2d finetune
./submit_job.sh gym halfcheetah finetune

Manual SLURM Submission

# With environment variables
TASK=hopper MODE=pretrain sbatch slurm/run_dppo_gym.sh
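The requirement that fine-tuning starts only after pre-training completes can be delegated to SLURM with a job dependency instead of being enforced by hand. The sketch below only builds the commands. It assumes `slurm/run_dppo_gym.sh` accepts `MODE=finetune` the same way it accepts `MODE=pretrain` (the plan shows only the latter), and the helper names are illustrative.

```python
import shlex

SCRIPT = "slurm/run_dppo_gym.sh"  # path taken from the manual submission example


def pretrain_cmd(task: str) -> list[str]:
    # --parsable makes sbatch print only the job ID, which is easy to capture.
    return ["sbatch", "--parsable", f"--export=ALL,TASK={task},MODE=pretrain", SCRIPT]


def finetune_cmd(task: str, pretrain_job_id: str) -> list[str]:
    # afterok: the job starts only if the pre-training job exits successfully.
    return [
        "sbatch",
        "--parsable",
        f"--dependency=afterok:{pretrain_job_id}",
        f"--export=ALL,TASK={task},MODE=finetune",  # assumed to be a supported mode
        SCRIPT,
    ]


print(shlex.join(pretrain_cmd("hopper")))
print(shlex.join(finetune_cmd("hopper", "<pretrain_job_id>")))
```

To submit for real, pass each list to `subprocess.run(cmd, capture_output=True, text=True, check=True)` and use `stdout.strip().split(";")[0]` as the job ID for the dependent job.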

Job Tracking

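| Job ID  | Type     | Task        | Mode     | Status     | Duration | Results        |
|---------|----------|-------------|----------|------------|----------|----------------|
| 3445117 | dev test | hopper      | pretrain | SUCCESS    | 2m17s    | WandB          |
| 3445154 | dev test | walker2d    | pretrain | SUCCESS    | ~2m      | Completed      |
| 3445155 | dev test | halfcheetah | pretrain | 🔄 RUNNING | ~2m      | SLURM: 3445155 |
| 3445158 | dev test | hopper      | finetune | 🔄 QUEUED  | 30m      | SLURM: 3445158 |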
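The Status and Duration columns can be refreshed from SLURM accounting rather than by hand. The sketch below parses the output of `sacct -X -n -P --format=JobID,State,Elapsed -j <ids>`. The function name is illustrative, and the sample text is made up for demonstration, not captured from HoReKa.

```python
def parse_sacct(output: str) -> dict[str, tuple[str, str]]:
    """Map job ID -> (state, elapsed) from pipe-separated sacct output."""
    jobs = {}
    for line in output.strip().splitlines():
        job_id, state, elapsed = line.split("|")
        jobs[job_id] = (state, elapsed)
    return jobs


# Illustrative sample only; real output comes from running sacct on the cluster.
sample = "3445117|COMPLETED|00:02:17\n3445155|RUNNING|00:01:05\n"
statuses = parse_sacct(sample)
for job_id, (state, elapsed) in statuses.items():
    print(job_id, state, elapsed)
```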

Note:

  • Production job 3445123 was cancelled (cluster policy: no production jobs while dev jobs are running)
  • WandB project names now start with the "dppo-" prefix
  • Focus is on Phase 1 validation before production runs

Configuration Notes

WandB Setup Required

export WANDB_API_KEY=<your_api_key>
export WANDB_ENTITY=<your_username>
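A job submitted without these variables may only fail once it is already running on the cluster, so a check before submission can save queue time. A minimal sketch follows; the function name is illustrative and not part of the repository.

```python
import os

REQUIRED_VARS = ("WANDB_API_KEY", "WANDB_ENTITY")


def missing_wandb_vars(env=None) -> list[str]:
    """Return the required WandB variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_wandb_vars()
if missing:
    print("Set before submitting:", ", ".join(missing))
else:
    print("WandB environment looks complete")
```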

Resource Requirements

  • Dev jobs: 30 min wall time, 24 GB RAM, 8 CPUs, partition dev_accelerated
  • Production: 8 h wall time, 32 GB RAM, 40 CPUs, partition accelerated
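The requirements above translate to standard `sbatch` options roughly as follows. This is a sketch: mapping "CPUs" to `--cpus-per-task` is an assumption, and no GPU request is emitted because the plan does not specify one.

```python
RESOURCES = {
    "dev": {"time": "00:30:00", "mem": "24G", "cpus": 8, "partition": "dev_accelerated"},
    "production": {"time": "08:00:00", "mem": "32G", "cpus": 40, "partition": "accelerated"},
}


def sbatch_flags(profile: str) -> list[str]:
    """Build sbatch resource flags for the 'dev' or 'production' profile."""
    r = RESOURCES[profile]
    return [
        f"--partition={r['partition']}",
        f"--time={r['time']}",
        f"--mem={r['mem']}",
        f"--cpus-per-task={r['cpus']}",  # assumption: CPUs are per task
    ]


print(" ".join(sbatch_flags("dev")))
```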

Issues Encountered

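No issues with the DPPO repository itself: installation and setup completed successfully. Cluster-side items from the commit log (the MuJoCo compilation fix via GCC compiler flags and the Furniture-Bench limitation) are covered in the README troubleshooting notes.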

Paper Reproduction Progress

Full Paper Results (Target: All experiments in WandB)

Goal: Reproduce the DPPO paper results in full, with all runs logged to the dominik_roth WandB account.

Gym Tasks (Core Paper Results)

  • hopper-medium-v2 → hopper-v2: Pre-train (200 epochs) + Fine-tune
  • walker2d-medium-v2 → walker2d-v2: Pre-train (200 epochs) + Fine-tune
  • halfcheetah-medium-v2 → halfcheetah-v2: Pre-train (200 epochs) + Fine-tune

Additional Environment Suites (Extended Results)

  • Robomimic Tasks: lift, can, square, transport (pre-train + fine-tune)
  • D3IL Tasks: avoid_m1, avoid_m2, avoid_m3 (pre-train + fine-tune)
  • Furniture-Bench Tasks: one_leg, lamp, round_table (low/med difficulty)

Success Criteria

  • All pre-training runs complete successfully (loss convergence)
  • All fine-tuning runs complete successfully (performance improvement)
  • All experiments logged with proper WandB tracking
  • Results comparable to paper benchmarks
  • Complete documentation of hyperparameters and settings

Next Steps

Phase 1: Validation on Dev Partition (Current Priority)

Goal: Test all environments and modes on the dev partition to validate the installation and document any issues.

Dev Validation Todo List (In Order):

    • Test walker2d pretrain on dev (retry with flexible script) - Job 3445167 [IN PROGRESS]
    • Monitor halfcheetah pretrain dev test (Job 3445155) [IN PROGRESS]
    • Monitor hopper finetune dev test (Job 3445158) [PENDING]
    • Test walker2d finetune on dev
    • Test halfcheetah finetune on dev
    • Test Robomimic lift pretrain on dev
    • Test Robomimic lift finetune on dev
    • Test Robomimic can pretrain on dev
    • Test Robomimic can finetune on dev
    • Test Robomimic square pretrain on dev
    • Test Robomimic square finetune on dev
    • Test Robomimic transport pretrain on dev
    • Test Robomimic transport finetune on dev
    • Test D3IL avoid_m1 pretrain on dev
    • Test D3IL avoid_m1 finetune on dev
    • Test D3IL avoid_m2 pretrain on dev
    • Test D3IL avoid_m2 finetune on dev
    • Test D3IL avoid_m3 pretrain on dev
    • Test D3IL avoid_m3 finetune on dev
    • Test Furniture one_leg_low pretrain on dev
    • Test Furniture one_leg_low finetune on dev
    • Test Furniture lamp_low pretrain on dev
    • Test Furniture lamp_low finetune on dev
    • Document any issues found in README
    • Verify all WandB logging works with dppo- prefix

Total: 25 items, of which 23 are validation tests across 4 environment suites (Gym, Robomimic, D3IL, Furniture) and 2 are documentation and logging checks.
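The test entries in the list above follow a suite × task × mode pattern, so they can be generated rather than maintained by hand. The sketch below uses the task names from the todo list; the list also holds two housekeeping items (README documentation and the WandB prefix check) that are not tests and are left out here.

```python
SUITES = {
    "Gym": ["hopper", "walker2d", "halfcheetah"],
    "Robomimic": ["lift", "can", "square", "transport"],
    "D3IL": ["avoid_m1", "avoid_m2", "avoid_m3"],
    "Furniture": ["one_leg_low", "lamp_low"],
}
MODES = ["pretrain", "finetune"]

# hopper pretrain already passed as dev job 3445117, so it is not on the todo list.
DONE = {("Gym", "hopper", "pretrain")}

all_tests = [
    (suite, task, mode)
    for suite, tasks in SUITES.items()
    for task in tasks
    for mode in MODES
]
remaining = [test for test in all_tests if test not in DONE]

print(len(all_tests), len(remaining))  # 24 23
```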

Phase 2: Production Runs (After Dev Validation)

Proceed only after Phase 1 is complete and all issues are resolved.

2.1 Full Gym Pipeline

  • hopper: pre-train (200 epochs) → fine-tune
  • walker2d: pre-train (200 epochs) → fine-tune
  • halfcheetah: pre-train (200 epochs) → fine-tune

2.2 Extended Environments

  • All validated environments from Phase 1

Current Status: Phase 1 in progress; the Job Tracking table holds the per-job state. Production runs are on hold until validation is complete (job 3445123 was cancelled).