# DPPO Experiment Plan

## Current Status
### Setup Complete ✅

- Installation successful on HoReKa with a Python 3.10 venv
- SLURM scripts created for automated job submission
- All dependencies installed, including PyTorch, d4rl, and dm-control
### Initial Testing

**✅ DPPO confirmed working on HoReKa with WandB**

- Successfully completed the dev test (Job ID 3445117)
- Quick verification only: 2 epochs (not full training), loss reduced from 0.2494 to 0.2010
- WandB logging working: https://wandb.ai/dominik_roth/gym-hopper-medium-v2-pretrain/runs/rztwqutf
- Model checkpoints and logging fully functional
- Ready for full 200-epoch production runs
## Experiments To Run

### 1. Reproduce Paper Results - Gym Tasks

**Pre-training phase** (train the diffusion model on offline D4RL datasets):

- hopper-medium-v2 → diffusion model trained on offline data (200 epochs)
- walker2d-medium-v2 → diffusion model trained on offline data (200 epochs)
- halfcheetah-medium-v2 → diffusion model trained on offline data (200 epochs)

**Fine-tuning phase** (PPO fine-tuning of the diffusion model with online interaction):

- hopper-v2 → fine-tune the pre-trained hopper model with PPO + online env
- walker2d-v2 → fine-tune the pre-trained walker2d model with PPO + online env
- halfcheetah-v2 → fine-tune the pre-trained halfcheetah model with PPO + online env

Settings: paper hyperparameters, 3 seeds each.
### 2. Additional Environments (Future)

- Robomimic suite: lift, can, square, transport
- D3IL suite: avoid_m1, avoid_m2, avoid_m3
- Furniture-Bench suite: one_leg, lamp, round_table (low/med difficulty)
## Running Experiments

### Quick Development Test

```shell
./submit_job.sh dev
```
### Gym Pre-training

```shell
./submit_job.sh gym hopper pretrain
./submit_job.sh gym walker2d pretrain
./submit_job.sh gym halfcheetah pretrain
```
### Gym Fine-tuning (after pre-training completes)

```shell
./submit_job.sh gym hopper finetune
./submit_job.sh gym walker2d finetune
./submit_job.sh gym halfcheetah finetune
```
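The six per-task submissions above can be scripted as a small loop. This is a dry-run sketch that only prints the commands (it assumes the `./submit_job.sh gym <task> <mode>` interface shown above and nothing more):

```shell
# Dry-run sketch: print the submit command for each Gym task and mode.
# Remove the `echo` to actually submit (fine-tuning should only be
# submitted after the matching pre-training job has finished).
for task in hopper walker2d halfcheetah; do
  for mode in pretrain finetune; do
    echo "./submit_job.sh gym ${task} ${mode}"
  done
done
```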
### Manual SLURM Submission

```shell
# With environment variables
TASK=hopper MODE=pretrain sbatch slurm/run_dppo_gym.sh
```
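How `slurm/run_dppo_gym.sh` consumes these variables is not shown here; a minimal sketch, assuming the script falls back to the hopper pre-training configuration when nothing is set, could look like:

```shell
# Hypothetical top of slurm/run_dppo_gym.sh: read TASK/MODE from the
# environment, with defaults so a bare `sbatch` still runs something sane.
TASK="${TASK:-hopper}"
MODE="${MODE:-pretrain}"
echo "Launching DPPO ${MODE} for ${TASK}"
```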
## Job Tracking

| Job ID | Type | Task | Mode | Status | Duration | Results |
|---|---|---|---|---|---|---|
| 3445117 | dev test | hopper | pretrain | ✅ SUCCESS | 2m17s | WandB |
| 3445123 | production | hopper | pretrain | 🔄 QUEUED | 8h | SLURM: 3445123 |
## Configuration Notes

### WandB Setup Required

```shell
export WANDB_API_KEY=<your_api_key>
export WANDB_ENTITY=<your_username>
```
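Jobs that reach the cluster without these variables fail only after queueing, so a pre-flight guard before submission can help. This is a sketch (not part of the DPPO repo) using the shell's `${var:?}` expansion:

```shell
# Hypothetical pre-flight check: abort with a message if the WandB
# credentials are not exported.
require_wandb_env() {
  : "${WANDB_API_KEY:?export WANDB_API_KEY before submitting jobs}"
  : "${WANDB_ENTITY:?export WANDB_ENTITY before submitting jobs}"
  echo "WandB configured for entity ${WANDB_ENTITY}"
}

# Demo call with placeholder credentials so the snippet runs standalone.
WANDB_API_KEY=dummy WANDB_ENTITY=example_user require_wandb_env
```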
### Resource Requirements

- Dev jobs: 30 min, 24 GB RAM, 8 CPUs, `dev_accelerated` partition
- Production jobs: 8 h, 32 GB RAM, 40 CPUs, `accelerated` partition
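The production profile would map to `#SBATCH` directives roughly as follows; the partition name comes from the list above, and everything else is a sketch to adapt, not the repo's actual script header:

```shell
#!/bin/bash
# Hypothetical #SBATCH header for a production run, mirroring the
# resource profile above (8 h, 32 GB RAM, 40 CPUs, `accelerated`).
#SBATCH --partition=accelerated
#SBATCH --time=08:00:00
#SBATCH --mem=32G
#SBATCH --cpus-per-task=40
```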
## Issues Encountered

No issues with the DPPO repository: installation and setup completed successfully.
## Next Steps

### Immediate Tasks (To Verify All Environments Work)

- Test remaining Gym environments:
  - walker2d-medium-v2 (2 epochs dev test)
  - halfcheetah-medium-v2 (2 epochs dev test)
- Test other environment types:
  - Robomimic: can task (basic test)
  - D3IL: avoid_m1 (basic test)
- Full production runs (after confirming all work):
  - Full pre-training: hopper, walker2d, halfcheetah (200 epochs each)
  - Fine-tuning experiments
**Status:** Only hopper-medium-v2 is confirmed working; verify the other environments before launching production runs.