dppo/EXPERIMENT_PLAN.md
ys1087@partner.kit.edu e8e7233d98 Fix WandB config issue and achieve working DPPO setup
- Disable WandB in dev script to avoid config object vs string error
- Successfully completed development test (Job 3445106)
- Confirmed: pre-training works, loss reduces, checkpoints save
- Update experiment tracking with successful results
2025-08-27 12:19:38 +02:00


DPPO Experiment Plan

Current Status

Setup Complete

  • Installation successful on HoReKa with Python 3.10 venv
  • SLURM scripts created for automated job submission
  • All dependencies installed including PyTorch, d4rl, dm-control

Initial Testing

DPPO Confirmed Working on HoReKa

  • Successfully completed dev test (Job ID 3445106)
  • Pre-training works: over 2 epochs, training loss fell from 0.2494 to 0.2010
  • Model checkpoints saved correctly
  • Ready for full experiments
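
To put the dev-test loss drop in relative terms, the following minimal sketch computes the percentage reduction from the two logged values. The helper name `loss_reduction_pct` is hypothetical and not part of the DPPO repository.

```shell
#!/bin/sh
# Hypothetical helper: relative loss reduction in percent, one decimal place.
# Usage: loss_reduction_pct <initial_loss> <final_loss>
loss_reduction_pct() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a - b) / a * 100 }'
}

# Values logged by dev job 3445106
loss_reduction_pct 0.2494 0.2010
```

For the dev run this gives a reduction of roughly 19% over the two epochs, which is enough to confirm that training is making progress but says nothing about final performance.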

Experiments To Run

1. Reproduce Paper Results - Gym Tasks

Pre-training Phase:

  • hopper-medium-v2
  • walker2d-medium-v2
  • halfcheetah-medium-v2

Fine-tuning Phase:

  • hopper-v2
  • walker2d-v2
  • halfcheetah-v2

Settings: Paper hyperparameters, 3 seeds each
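
The run matrix implied by these settings (3 tasks, 3 seeds, per phase) can be enumerated with a small dry-run loop. This is only a sketch under stated assumptions: the plan does not say how seeds are passed to `slurm/run_dppo_gym.sh`, so the `SEED` variable and the seed values 0, 1, 2 are assumptions, and the function name `print_gym_matrix` is hypothetical. It prints the commands instead of submitting them.

```shell
#!/bin/sh
# Print (do not run) one sbatch command per task/seed pair for a given mode.
# ASSUMPTION: the SLURM script reads SEED from the environment.
# ASSUMPTION: seeds 0, 1, 2 stand in for "3 seeds each".
print_gym_matrix() {
    mode="$1"
    for task in hopper walker2d halfcheetah; do
        for seed in 0 1 2; do
            echo "TASK=$task MODE=$mode SEED=$seed sbatch slurm/run_dppo_gym.sh"
        done
    done
}

print_gym_matrix pretrain
```

Each phase yields 9 jobs, so reproducing the Gym results amounts to 18 submissions in total.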

2. Additional Environments (Future)

Robomimic Suite:

  • lift, can, square, transport

D3IL Suite:

  • avoid_m1, avoid_m2, avoid_m3

Furniture-Bench Suite:

  • one_leg, lamp, round_table (low/med difficulty)

Running Experiments

Quick Development Test

./submit_job.sh dev

Gym Pre-training

./submit_job.sh gym hopper pretrain
./submit_job.sh gym walker2d pretrain  
./submit_job.sh gym halfcheetah pretrain

Gym Fine-tuning (after pre-training completes)

./submit_job.sh gym hopper finetune
./submit_job.sh gym walker2d finetune
./submit_job.sh gym halfcheetah finetune
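
Because fine-tuning has to wait for pre-training, the two jobs could be chained with SLURM job dependencies instead of being submitted by hand. The sketch below only builds the command string. `--parsable` and `--dependency=afterok:<jobid>` are standard `sbatch` options, but the helper name `finetune_after` is hypothetical, and whether `submit_job.sh` can pass such options through is not known from this plan.

```shell
#!/bin/sh
# Build the sbatch command for a fine-tuning job that starts only after
# the given pre-training job has finished successfully (afterok).
finetune_after() {
    task="$1"
    pretrain_job_id="$2"
    echo "TASK=$task MODE=finetune sbatch --dependency=afterok:$pretrain_job_id slurm/run_dppo_gym.sh"
}

# On the cluster, the pre-training job ID could be captured like this
# (not executed here):
#   jid=$(TASK=hopper MODE=pretrain sbatch --parsable slurm/run_dppo_gym.sh)
#   TASK=hopper MODE=finetune sbatch --dependency=afterok:"$jid" slurm/run_dppo_gym.sh

# Dry run with a placeholder job ID
finetune_after hopper 1234567
```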

Manual SLURM Submission

# With environment variables
TASK=hopper MODE=pretrain sbatch slurm/run_dppo_gym.sh
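
When submitting manually, a typo in `TASK` or `MODE` would only surface after the job starts. A guard along the following lines could reject bad values early. This is a hypothetical sketch: the actual contents of `slurm/run_dppo_gym.sh` are not shown in this plan, and the function name `check_task_mode` and the `hopper`/`pretrain` defaults are assumptions.

```shell
#!/bin/sh
# Hypothetical argument check; the real slurm/run_dppo_gym.sh may differ.
# Accepts only the tasks and modes listed in this plan.
check_task_mode() {
    case "$1" in
        hopper|walker2d|halfcheetah) ;;
        *) echo "unknown TASK: $1" >&2; return 1 ;;
    esac
    case "$2" in
        pretrain|finetune) ;;
        *) echo "unknown MODE: $2" >&2; return 1 ;;
    esac
}

# Fall back to assumed defaults when the variables are unset
check_task_mode "${TASK:-hopper}" "${MODE:-pretrain}" && echo "ok"
```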

Job Tracking

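| Job ID  | Type     | Task   | Mode     | Status  | Duration | Results                   |
|---------|----------|--------|----------|---------|----------|---------------------------|
| 3445106 | dev test | hopper | pretrain | SUCCESS | 2m11s    | Train loss: 0.2494→0.2010 |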

Configuration Notes

WandB Setup Required

export WANDB_API_KEY=<your_api_key>
export WANDB_ENTITY=<your_username>
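
A missing credential is easy to overlook until a job fails, so it may help to check both variables before submitting. The following is a minimal sketch; the function name `check_wandb_env` is hypothetical and not part of the repository.

```shell
#!/bin/sh
# Return non-zero and name the first missing variable if the WandB
# credentials are not set in the current environment.
check_wandb_env() {
    for var in WANDB_API_KEY WANDB_ENTITY; do
        # Indirect lookup of a fixed, known variable name
        eval "val=\${$var:-}"
        if [ -z "$val" ]; then
            echo "missing $var" >&2
            return 1
        fi
    done
}

check_wandb_env || echo "WandB is not configured; export the variables first"
```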

Resource Requirements

  • Dev jobs: 30 min wall time, 24 GB RAM, 8 CPUs, partition dev_accelerated
  • Production: 8 h wall time, 32 GB RAM, 40 CPUs, partition accelerated
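
These requirements map onto standard `sbatch` options roughly as sketched below. The helper name `sbatch_flags` and the `dev`/`prod` labels are hypothetical. GPU request flags are deliberately left out because the plan does not list them.

```shell
#!/bin/sh
# Map a job class to sbatch resource flags, using the values listed above.
# No GPU flags are included: the plan does not specify them.
sbatch_flags() {
    case "$1" in
        dev)  echo "--partition=dev_accelerated --time=00:30:00 --mem=24G --cpus-per-task=8" ;;
        prod) echo "--partition=accelerated --time=08:00:00 --mem=32G --cpus-per-task=40" ;;
        *)    echo "unknown job class: $1" >&2; return 1 ;;
    esac
}

sbatch_flags dev
```

The same values could equally be written as `#SBATCH` directives at the top of the SLURM scripts.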

Issues Encountered

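Installation and setup of the DPPO repository completed without issues. The only problem so far was a WandB config error (a config object passed where a string was expected), worked around by disabling WandB in the dev script.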

Next Steps

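  1. Re-enable WandB logging in the dev script once the config object vs. string error is fixed (the dev test itself passed as Job 3445106)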
  2. Begin systematic pre-training experiments
  3. Document successful runs and results