dodox/dppo

ys1087@partner.kit.edu 0424a080c1 feat: HoReKa cluster adaptation and validation

- Updated all WandB project names to use dppo- prefix for organization
- Added flexible dev testing script for all environments
- Created organized dev_tests directory for test scripts
- Fixed MuJoCo compilation issues (added GCC compiler flags)
- Documented Python 3.10 compatibility and Furniture-Bench limitation
- Validated pre-training for Gym, Robomimic, D3IL environments
- Updated experiment tracking with validation results
- Enhanced README with troubleshooting and setup instructions

2025-08-27 14:01:51 +02:00


DPPO Experiment Plan

Current Status

Setup Complete

Installation successful on HoReKa with Python 3.10 venv
SLURM scripts created for automated job submission
All dependencies installed including PyTorch, d4rl, dm-control
WandB integration configured with dppo- project prefix

Initial Testing Status

DPPO confirmed working on HoReKa with WandB
Dev test completed successfully (Job ID 3445117)
Loss reduction verified: 0.2494→0.2010 over 2 epochs
WandB logging functional: View Run
Model checkpoints and logging operational
All environments validated on dev partition
Ready for production runs
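
As a quick sanity check on the loss figures above, the relative reduction can be computed directly. This is a minimal sketch that uses only the two values reported for job 3445117; the helper name `loss_reduction_pct` is illustrative and not part of the repository.

```shell
# Relative loss reduction in percent, given a start and an end loss.
# loss_reduction_pct is a hypothetical helper, not part of the DPPO repo.
loss_reduction_pct() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a - b) / a * 100 }'
}

loss_reduction_pct 0.2494 0.2010   # loss values from the dev test above
```

For the reported values this works out to roughly a 19.4% drop. Two epochs only confirm that training runs and the loss moves in the right direction; they say nothing about convergence.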

Experiments To Run

1. Reproduce Paper Results - Gym Tasks

Pre-training Phase (Behavior cloning on offline datasets):

hopper-medium-v2 → Diffusion Policy trained via supervised learning on D4RL data (200 epochs)
walker2d-medium-v2 → Diffusion Policy trained via supervised learning on D4RL data (200 epochs)
halfcheetah-medium-v2 → Diffusion Policy trained via supervised learning on D4RL data (200 epochs)

Fine-tuning Phase (DPPO: Policy gradient on diffusion denoising process):

hopper-v2 → DPPO fine-tunes pre-trained model using PPO on 2-layer "Diffusion MDP"
walker2d-v2 → DPPO fine-tunes pre-trained model using PPO on 2-layer "Diffusion MDP"
halfcheetah-v2 → DPPO fine-tunes pre-trained model using PPO on 2-layer "Diffusion MDP"

Settings: Paper hyperparameters, 3 seeds each

2. Additional Environments (Future)

Robomimic Suite:

lift, can, square, transport

D3IL Suite:

avoid_m1, avoid_m2, avoid_m3

Furniture-Bench Suite:

one_leg, lamp, round_table (low/med difficulty)

Running Experiments

Quick Development Test

./submit_job.sh dev

Gym Pre-training

./submit_job.sh gym hopper pretrain
./submit_job.sh gym walker2d pretrain
./submit_job.sh gym halfcheetah pretrain

Gym Fine-tuning (after pre-training completes)

./submit_job.sh gym hopper finetune
./submit_job.sh gym walker2d finetune
./submit_job.sh gym halfcheetah finetune

Manual SLURM Submission

# With environment variables
TASK=hopper MODE=pretrain sbatch slurm/run_dppo_gym.sh
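
Because fine-tuning must wait for pre-training to finish, the two jobs can be chained with a SLURM dependency instead of being submitted by hand. The following is a sketch, assuming `slurm/run_dppo_gym.sh` reads `TASK` and `MODE` from the environment as in the manual submission above. The helper names `run_sbatch` and `submit_chain` and the `DRY_RUN` switch are hypothetical and not part of the repository.

```shell
# Sketch: chain pretrain -> finetune with a SLURM job dependency.
# run_sbatch, submit_chain and DRY_RUN are hypothetical, not part of the repo.
SCRIPT="slurm/run_dppo_gym.sh"   # script path as used in the plan

run_sbatch() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "sbatch $*" >&2   # dry run: show the command only
    echo "DRYRUN"          # placeholder job ID
  else
    sbatch "$@"            # with --parsable this prints "jobid[;cluster]"
  fi
}

submit_chain() {
  local task="$1" pre_id
  pre_id=$(run_sbatch --parsable \
    --export=ALL,TASK="$task",MODE=pretrain "$SCRIPT")
  pre_id=${pre_id%%;*}     # drop the optional ";cluster" suffix
  # afterok: start fine-tuning only if pre-training exits successfully
  run_sbatch --parsable --dependency=afterok:"$pre_id" \
    --export=ALL,TASK="$task",MODE=finetune "$SCRIPT"
}

submit_chain hopper        # dry run by default; set DRY_RUN=0 to submit
```

With `DRY_RUN=0` the function prints the ID of the fine-tuning job, which can then go into the Job Tracking table.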

Job Tracking

| Job ID | Type | Task | Mode | Status | Duration | Results |
|---|---|---|---|---|---|---|
| 3445117 | dev test | hopper | pretrain | ✅ SUCCESS | 2m17s | WandB |
| 3445154 | dev test | walker2d | pretrain | ✅ SUCCESS | ~2m | Completed |
| 3445155 | dev test | halfcheetah | pretrain | 🔄 RUNNING | ~2m | SLURM: 3445155 |
| 3445158 | dev test | hopper | finetune | 🔄 QUEUED | 30m | SLURM: 3445158 |

Note:

Production job 3445123 was cancelled (cluster policy: no production jobs while dev jobs are running).
WandB project names now start with the "dppo-" prefix.
Current focus is Phase 1 validation; production runs follow afterwards.
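
To keep the table above in sync with the scheduler, job states can be pulled from SLURM accounting rather than copied by hand. This is a minimal sketch, assuming `sacct` is available on the login node; `count_states` is a hypothetical helper that only post-processes `sacct` output.

```shell
# Count jobs per state from pipe-separated sacct output on stdin
# (one "JobID|State|Elapsed" line per job).
# count_states is a hypothetical helper, not part of the DPPO repo.
count_states() {
  awk -F'|' 'NF >= 2 { n[$2]++ } END { for (s in n) print s, n[s] }' | sort
}

# Intended use on the cluster (not executed here):
#   sacct -j 3445117,3445154,3445155,3445158 -X -n -P \
#         --format=JobID,State,Elapsed | count_states
```

Here `-X` restricts the output to one line per job allocation, `-n` drops the header, and `-P` selects pipe-separated output.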

Configuration Notes

WandB Setup Required

export WANDB_API_KEY=<your_api_key>
export WANDB_ENTITY=<your_username>
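
A job that starts without these variables typically only fails once logging is initialised, so it can help to check them before submitting. The following is a sketch; `check_wandb_env` is a hypothetical helper and not part of the repository.

```shell
# Fail early if the WandB variables are not exported or are empty.
# check_wandb_env is a hypothetical helper, not part of the DPPO repo.
check_wandb_env() {
  local name missing=0
  for name in WANDB_API_KEY WANDB_ENTITY; do
    # printenv only sees exported variables, which is what a job inherits
    if [ -z "$(printenv "$name")" ]; then
      echo "error: $name is not exported or is empty" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Possible use before a submission:
#   check_wandb_env && ./submit_job.sh dev
```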

Resource Requirements

Dev jobs: 30 min wall time, 24 GB RAM, 8 CPUs, partition dev_accelerated
Production: 8 h wall time, 32 GB RAM, 40 CPUs, partition accelerated
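
The two resource profiles above can be expressed as `sbatch` flags. This is a sketch under two assumptions: the CPU counts are requested with `--cpus-per-task`, and GPU requests are left out because the plan does not state them. `resource_flags` is a hypothetical helper and not part of the repository.

```shell
# Map a job class to sbatch resource flags, using the limits listed above.
# resource_flags is a hypothetical helper; --cpus-per-task is an assumption
# and GPU flags are deliberately omitted (not stated in the plan).
resource_flags() {
  case "$1" in
    dev)
      echo "--partition=dev_accelerated --time=00:30:00 --mem=24G --cpus-per-task=8" ;;
    prod)
      echo "--partition=accelerated --time=08:00:00 --mem=32G --cpus-per-task=40" ;;
    *)
      echo "unknown job class: $1" >&2
      return 1 ;;
  esac
}

# Possible use (flags on the command line override #SBATCH lines in the script):
#   TASK=hopper MODE=pretrain sbatch $(resource_flags dev) slurm/run_dppo_gym.sh
```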

Issues Encountered

No issues with the DPPO repository - installation and setup completed successfully.

Paper Reproduction Progress

Full Paper Results (Target: All experiments in WandB)

Goal: Complete reproduction of DPPO paper results with all runs logged to dominik_roth WandB account.

Gym Tasks (Core Paper Results)

hopper-medium-v2 → hopper-v2: Pre-train (200 epochs) + Fine-tune
walker2d-medium-v2 → walker2d-v2: Pre-train (200 epochs) + Fine-tune
halfcheetah-medium-v2 → halfcheetah-v2: Pre-train (200 epochs) + Fine-tune

Additional Environment Suites (Extended Results)

Robomimic Tasks: lift, can, square, transport (pre-train + fine-tune)
D3IL Tasks: avoid_m1, avoid_m2, avoid_m3 (pre-train + fine-tune)
Furniture-Bench Tasks: one_leg, lamp, round_table (low/med difficulty)

Success Criteria

All pre-training runs complete successfully (loss convergence)
All fine-tuning runs complete successfully (performance improvement)
All experiments logged with proper WandB tracking
Results comparable to paper benchmarks
Complete documentation of hyperparameters and settings

Next Steps

Phase 1: Validation on Dev Partition (Current Priority)

Goal: Test all environments and modes on dev partition to validate installation and document any issues.

Dev Validation Todo List (In Order):

- Test walker2d pretrain on dev (retry with flexible script) - Job 3445167 [IN PROGRESS]
- Monitor halfcheetah pretrain dev test (Job 3445155) [IN PROGRESS]
- Monitor hopper finetune dev test (Job 3445158) [PENDING]
- Test walker2d finetune on dev
- Test halfcheetah finetune on dev
- Test Robomimic lift pretrain on dev
- Test Robomimic lift finetune on dev
- Test Robomimic can pretrain on dev
- Test Robomimic can finetune on dev
- Test Robomimic square pretrain on dev
- Test Robomimic square finetune on dev
- Test Robomimic transport pretrain on dev
- Test Robomimic transport finetune on dev
- Test D3IL avoid_m1 pretrain on dev
- Test D3IL avoid_m1 finetune on dev
- Test D3IL avoid_m2 pretrain on dev
- Test D3IL avoid_m2 finetune on dev
- Test D3IL avoid_m3 pretrain on dev
- Test D3IL avoid_m3 finetune on dev
- Test Furniture one_leg_low pretrain on dev
- Test Furniture one_leg_low finetune on dev
- Test Furniture lamp_low pretrain on dev
- Test Furniture lamp_low finetune on dev
- Document any issues found in README
- Verify all WandB logging works with dppo- prefix

Total: 25 todo items (23 dev tests plus 2 documentation and verification tasks) across 4 environment suites (Gym, Robomimic, D3IL, Furniture)
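
The test list above is every task in each suite combined with both modes, so it can be generated rather than maintained by hand. This is a minimal sketch; `validation_matrix` is a hypothetical helper, and the lowercase suite labels other than `gym` are assumptions, since the plan only shows `./submit_job.sh` being called for Gym.

```shell
# Print the dev validation matrix as "suite task mode" lines.
# validation_matrix is a hypothetical helper; task names follow the list
# above, suite labels other than "gym" are assumed.
validation_matrix() {
  local entry suite task mode
  for entry in \
    "gym:hopper walker2d halfcheetah" \
    "robomimic:lift can square transport" \
    "d3il:avoid_m1 avoid_m2 avoid_m3" \
    "furniture:one_leg_low lamp_low"; do
    suite=${entry%%:*}               # text before the colon
    for task in ${entry#*:}; do      # space-separated tasks after the colon
      for mode in pretrain finetune; do
        echo "$suite $task $mode"
      done
    done
  done
}

validation_matrix | wc -l
```

The matrix has 24 entries: the 23 tests in the list plus the hopper pretrain test already completed as job 3445117.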

Phase 2: Production Runs (After Dev Validation)

Proceed only after Phase 1 is complete and all issues are resolved.

2.1 Full Gym Pipeline

hopper: pre-train (200 epochs) → fine-tune
walker2d: pre-train (200 epochs) → fine-tune
halfcheetah: pre-train (200 epochs) → fine-tune

2.2 Extended Environments

All validated environments from Phase 1

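Current Status: Phase 1 in progress. Per the job tracking table and todo list, jobs 3445167 (walker2d pretrain retry) and 3445155 (halfcheetah pretrain) are running, and job 3445158 (hopper finetune) is queued. Production job 3445123 was cancelled; production runs wait until validation is complete.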
