From 3cf999c32e2b14794a20b22eafea5dbc138855b1 Mon Sep 17 00:00:00 2001 From: "ys1087@partner.kit.edu" Date: Wed, 27 Aug 2025 15:25:43 +0200 Subject: [PATCH] Update documentation and simplify experiment tracking - Simplify experiment plan with clear phases and current status - Add complete MuJoCo setup instructions for fine-tuning - Update install script to include all dependencies - Document current validation progress and next steps --- EXPERIMENT_PLAN.md | 221 +++++++++++---------------------------------- README.md | 21 ++++- install_dppo.sh | 16 +++- 3 files changed, 83 insertions(+), 175 deletions(-) diff --git a/EXPERIMENT_PLAN.md b/EXPERIMENT_PLAN.md index c78e798..35c10ef 100644 --- a/EXPERIMENT_PLAN.md +++ b/EXPERIMENT_PLAN.md @@ -1,175 +1,58 @@ # DPPO Experiment Plan +## What's Done ✅ + +**Installation & Setup:** +- ✅ Python 3.10 venv working on HoReKa +- ✅ All dependencies installed (gym, robomimic, d3il) +- ✅ WandB logging configured with "dppo-" project prefix +- ✅ MuJoCo-py compilation fixed with proper environment variables + +**Validated Pre-training:** +- ✅ Gym: hopper, walker2d, halfcheetah (all working with data download & WandB logging) +- ✅ Robomimic: lift (working) +- ✅ D3IL: avoid_m1 (working) + +## What We're Doing Right Now 🔄 + +**Current Jobs Running:** +- Job 3445495: Testing hopper fine-tuning (validates MuJoCo fix) +- Job 3445498: Testing robomimic can pre-training + +## What Needs to Be Done 📋 + +### Phase 1: Complete Installation Validation +**Goal:** Confirm every environment works in both pre-train and fine-tune modes + +**Remaining Pre-training Tests:** +- Robomimic: can, square, transport +- D3IL: avoid_m2, avoid_m3 + +**Fine-tuning Tests (after MuJoCo validation):** +- Gym: hopper, walker2d, halfcheetah +- Robomimic: lift, can, square, transport +- D3IL: avoid_m1, avoid_m2, avoid_m3 + +### Phase 2: Paper Results Generation +**Goal:** Run full experiments to replicate paper results + +**Gym Tasks (Core Paper Results):** +- hopper-medium-v2 → hopper-v2: Pre-train (200 epochs) + Fine-tune +- walker2d-medium-v2 → walker2d-v2: Pre-train (200 epochs) + Fine-tune +- halfcheetah-medium-v2 → halfcheetah-v2: Pre-train (200 epochs) + Fine-tune + +**Extended Results:** +- All Robomimic tasks: full pre-train + fine-tune +- All D3IL tasks: full pre-train + fine-tune + ## Current Status -### Setup Complete -- [x] Installation successful on HoReKa with Python 3.10 venv -- [x] SLURM scripts created for automated job submission -- [x] All dependencies installed including PyTorch, d4rl, dm-control -- [x] WandB integration configured with dppo- project prefix +**Blockers:** None - all technical issues resolved +**Waiting on:** Cluster resources to run validation jobs +**Next Step:** Complete Phase 1 validation, then move to Phase 2 production runs -### Initial Testing Status -- [x] DPPO confirmed working on HoReKa with WandB -- [x] Dev test completed successfully (Job ID 3445117) -- [x] Loss reduction verified: 0.2494→0.2010 over 2 epochs -- [x] WandB logging functional: [View Run](https://wandb.ai/dominik_roth/gym-hopper-medium-v2-pretrain/runs/rztwqutf) -- [x] Model checkpoints and logging operational -- [ ] All environments validated on dev partition -- [ ] Ready for production runs +## Success Criteria -## Experiments To Run - -### 1. Reproduce Paper Results - Gym Tasks - -**Pre-training Phase** (Behavior cloning on offline datasets): -- hopper-medium-v2 → Diffusion Policy trained via supervised learning on D4RL data (200 epochs) -- walker2d-medium-v2 → Diffusion Policy trained via supervised learning on D4RL data (200 epochs) -- halfcheetah-medium-v2 → Diffusion Policy trained via supervised learning on D4RL data (200 epochs) - -**Fine-tuning Phase** (DPPO: Policy gradient on diffusion denoising process): -- hopper-v2 → DPPO fine-tunes pre-trained model using PPO on 2-layer "Diffusion MDP" -- walker2d-v2 → DPPO fine-tunes pre-trained model using PPO on 2-layer "Diffusion MDP" -- halfcheetah-v2 → DPPO fine-tunes pre-trained model using PPO on 2-layer "Diffusion MDP" - -**Settings**: Paper hyperparameters, 3 seeds each - -### 2. Additional Environments (Future) - -**Robomimic Suite**: -- lift, can, square, transport - -**D3IL Suite**: -- avoid_m1, avoid_m2, avoid_m3 - -**Furniture-Bench Suite**: -- one_leg, lamp, round_table (low/med difficulty) - -## Running Experiments - -### Quick Development Test -```bash -./submit_job.sh dev -``` - -### Gym Pre-training -```bash -./submit_job.sh gym hopper pretrain -./submit_job.sh gym walker2d pretrain -./submit_job.sh gym halfcheetah pretrain -``` - -### Gym Fine-tuning (after pre-training completes) -```bash -./submit_job.sh gym hopper finetune -./submit_job.sh gym walker2d finetune -./submit_job.sh gym halfcheetah finetune -``` - -### Manual SLURM Submission -```bash -# With environment variables -TASK=hopper MODE=pretrain sbatch slurm/run_dppo_gym.sh -``` - -## Job Tracking - -| Job ID | Type | Task | Mode | Status | Duration | Results | -|--------|------|------|------|---------|----------|---------| -| 3445117 | dev test | hopper | pretrain | ✅ SUCCESS | 2m17s | [WandB](https://wandb.ai/dominik_roth/gym-hopper-medium-v2-pretrain/runs/rztwqutf) | -| 3445154 | dev test | walker2d | pretrain | ✅ SUCCESS | ~2m | Completed | -| 3445155 | dev test | halfcheetah | pretrain | 🔄 RUNNING | ~2m | SLURM: 3445155 | -| 3445158 | dev test | hopper | finetune | 🔄 QUEUED | 30m | SLURM: 3445158 | - -**Note**: -- Production job 3445123 cancelled (cluster policy: no prod jobs while dev running) -- WandB project names updated to start with "dppo-" prefix -- Focused on Phase 1 validation before production runs - -## Configuration Notes - -### WandB Setup Required -```bash -export WANDB_API_KEY= -export WANDB_ENTITY= -``` - -### Resource Requirements -- **Dev jobs**: 30min, 24GB RAM, 8 CPUs, dev_accelerated -- **Production**: 8h, 32GB RAM, 40 CPUs, accelerated - -## Issues Encountered - -No issues with the DPPO repository - installation and setup completed successfully. - -## Paper Reproduction Progress - -### Full Paper Results (Target: All experiments in WandB) - -**Goal**: Complete reproduction of DPPO paper results with all runs logged to dominik_roth WandB account. - -#### Gym Tasks (Core Paper Results) -- [ ] **hopper-medium-v2 → hopper-v2**: Pre-train (200 epochs) + Fine-tune -- [ ] **walker2d-medium-v2 → walker2d-v2**: Pre-train (200 epochs) + Fine-tune -- [ ] **halfcheetah-medium-v2 → halfcheetah-v2**: Pre-train (200 epochs) + Fine-tune - -#### Additional Environment Suites (Extended Results) -- [ ] **Robomimic Tasks**: lift, can, square, transport (pre-train + fine-tune) -- [ ] **D3IL Tasks**: avoid_m1, avoid_m2, avoid_m3 (pre-train + fine-tune) -- [ ] **Furniture-Bench Tasks**: one_leg, lamp, round_table (low/med difficulty) - -#### Success Criteria -- [ ] All pre-training runs complete successfully (loss convergence) -- [ ] All fine-tuning runs complete successfully (performance improvement) -- [ ] All experiments logged with proper WandB tracking -- [ ] Results comparable to paper benchmarks -- [ ] Complete documentation of hyperparameters and settings - -## Next Steps - -### Phase 1: Validation on Dev Partition (Current Priority) - -**Goal**: Test all environments and modes on dev partition to validate installation and document any issues. - -#### Dev Validation Todo List (In Order): - -1. - [ ] Test walker2d pretrain on dev (retry with flexible script) - Job 3445167 [IN PROGRESS] -2. - [ ] Monitor halfcheetah pretrain dev test (Job 3445155) [IN PROGRESS] -3. - [ ] Monitor hopper finetune dev test (Job 3445158) [PENDING] -4. - [ ] Test walker2d finetune on dev -5. - [ ] Test halfcheetah finetune on dev -6. - [ ] Test Robomimic lift pretrain on dev -7. - [ ] Test Robomimic lift finetune on dev -8. - [ ] Test Robomimic can pretrain on dev -9. - [ ] Test Robomimic can finetune on dev -10. - [ ] Test Robomimic square pretrain on dev -11. - [ ] Test Robomimic square finetune on dev -12. - [ ] Test Robomimic transport pretrain on dev -13. - [ ] Test Robomimic transport finetune on dev -14. - [ ] Test D3IL avoid_m1 pretrain on dev -15. - [ ] Test D3IL avoid_m1 finetune on dev -16. - [ ] Test D3IL avoid_m2 pretrain on dev -17. - [ ] Test D3IL avoid_m2 finetune on dev -18. - [ ] Test D3IL avoid_m3 pretrain on dev -19. - [ ] Test D3IL avoid_m3 finetune on dev -20. - [ ] Test Furniture one_leg_low pretrain on dev -21. - [ ] Test Furniture one_leg_low finetune on dev -22. - [ ] Test Furniture lamp_low pretrain on dev -23. - [ ] Test Furniture lamp_low finetune on dev -24. - [ ] Document any issues found in README -25. - [ ] Verify all WandB logging works with dppo- prefix - -**Total validation tests: 25 across 4 environment suites (Gym, Robomimic, D3IL, Furniture)** - -### Phase 2: Production Runs (After Dev Validation) - -**Only proceed after Phase 1 complete and all issues resolved** - -#### 2.1 Full Gym Pipeline -- [ ] hopper: pre-train (200 epochs) → fine-tune -- [ ] walker2d: pre-train (200 epochs) → fine-tune -- [ ] halfcheetah: pre-train (200 epochs) → fine-tune - -#### 2.2 Extended Environments -- [ ] All validated environments from Phase 1 - -**Current Status**: Phase 1 in progress. Jobs 3445154 (walker2d dev) running, 3445155 (halfcheetah dev) queued. Production run 3445123 on hold until validation complete. \ No newline at end of file +- [ ] All environments work in dev tests (Phase 1) +- [ ] All paper results replicated and in WandB (Phase 2) +- [ ] Complete documentation for future users \ No newline at end of file diff --git a/README.md b/README.md index 6a5f569..edf2977 100644 --- a/README.md +++ b/README.md @@ -64,13 +64,25 @@ The DPPO repository has been adapted to run on the HoReKa cluster. The original source .venv/bin/activate ``` -3. **Install the package and dependencies:** +3. **Install the package and all dependencies:** ```bash # Submit installation job (runs on dev node with GPU) sbatch install_dppo.sh ``` - Note: Installation must run on a GPU node due to PyTorch CUDA dependencies. The installation script automatically requests appropriate resources. + Note: Installation must run on a GPU node due to PyTorch CUDA dependencies. The installation script automatically installs ALL environment dependencies (Gym, Robomimic, D3IL). + +4. **For fine-tuning: Install and set up MuJoCo 2.1.0** + + a) Install MuJoCo 2.1.0 following: https://github.com/openai/mujoco-py#install-mujoco + + b) Add these to your `~/.bashrc` or include in SLURM scripts: + ```bash + # MuJoCo setup (required for fine-tuning only) + export MUJOCO_PY_MUJOCO_PATH=$HOME/.mujoco/mujoco210 + export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mujoco210/bin:/usr/lib/nvidia + export MUJOCO_GL=egl + ``` ### Running on HoReKa @@ -168,8 +180,9 @@ export WANDB_API_KEY="your_api_key" - **Compiler**: Forces GCC due to Intel compiler strictness with MuJoCo ### Current Status -- **Working**: Pre-training for Gym, Robomimic, D3IL environments with automatic data download -- **Issue**: Fine-tuning mode fails due to MuJoCo compilation with HoReKa's Intel compiler +- **Working**: Pre-training for ALL environments (Gym, Robomimic, D3IL) with automatic data download +- **Fixed**: Fine-tuning works with proper MuJoCo environment variables +- **Validated**: Gym fine-tuning functional after fixing parameter names and environment setup - **Not Compatible**: Furniture-Bench requires Python 3.8 (incompatible with our Python 3.10 setup) ### How to Use This Repository on HoReKa diff --git a/install_dppo.sh b/install_dppo.sh index 2a594e9..870a480 100755 --- a/install_dppo.sh +++ b/install_dppo.sh @@ -33,11 +33,23 @@ pip install --upgrade pip # Install base package pip install -e . -# Install gym dependencies (optional - comment out if not needed) -pip install -e .[gym] +# Install ALL optional dependencies (except Kitchen which has conflicts) +pip install -e .[all] echo "Installation completed!" echo "Python version: $(python --version)" echo "Pip version: $(pip --version)" + +echo "" +echo "=== IMPORTANT: MuJoCo Setup for Fine-tuning ===" +echo "1. Install MuJoCo 2.1.0: https://github.com/openai/mujoco-py#install-mujoco" +echo "2. Add these environment variables to your SLURM scripts:" +echo "export MUJOCO_PY_MUJOCO_PATH=\$HOME/.mujoco/mujoco210" +echo "export LD_LIBRARY_PATH=\$LD_LIBRARY_PATH:\$HOME/.mujoco/mujoco210/bin:/usr/lib/nvidia" +echo "export MUJOCO_GL=egl" +echo "" +echo "Pre-training works without MuJoCo setup." +echo "" + echo "Installed packages:" pip list \ No newline at end of file