Update documentation and simplify experiment tracking

- Simplify experiment plan with clear phases and current status
- Add complete MuJoCo setup instructions for fine-tuning
- Update install script to include all dependencies
- Document current validation progress and next steps

parent 0424a080c1
commit 3cf999c32e

@@ -1,175 +1,58 @@
# DPPO Experiment Plan

## What's Done ✅

**Installation & Setup:**
- ✅ Python 3.10 venv working on HoReKa
- ✅ All dependencies installed (gym, robomimic, d3il)
- ✅ WandB logging configured with "dppo-" project prefix
- ✅ mujoco-py compilation fixed with proper environment variables

**Validated Pre-training:**
- ✅ Gym: hopper, walker2d, halfcheetah (all working with data download & WandB logging)
- ✅ Robomimic: lift (working)
- ✅ D3IL: avoid_m1 (working)

## What We're Doing Right Now 🔄

**Current Jobs Running:**
- Job 3445495: Testing hopper fine-tuning (validates MuJoCo fix)
- Job 3445498: Testing robomimic can pre-training

## What Needs to Be Done 📋

### Phase 1: Complete Installation Validation

**Goal:** Confirm every environment works in both pre-train and fine-tune modes

**Remaining Pre-training Tests:**
- Robomimic: can, square, transport
- D3IL: avoid_m2, avoid_m3

**Fine-tuning Tests (after MuJoCo validation):**
- Gym: hopper, walker2d, halfcheetah
- Robomimic: lift, can, square, transport
- D3IL: avoid_m1, avoid_m2, avoid_m3

### Phase 2: Paper Results Generation

**Goal:** Run full experiments to replicate paper results

**Gym Tasks (Core Paper Results):**
- hopper-medium-v2 → hopper-v2: Pre-train (200 epochs) + Fine-tune
- walker2d-medium-v2 → walker2d-v2: Pre-train (200 epochs) + Fine-tune
- halfcheetah-medium-v2 → halfcheetah-v2: Pre-train (200 epochs) + Fine-tune

**Extended Results:**
- All Robomimic tasks: full pre-train + fine-tune
- All D3IL tasks: full pre-train + fine-tune
## Current Status

**Blockers:** None - all technical issues resolved

**Waiting on:** Cluster resources to run validation jobs

**Next Step:** Complete Phase 1 validation, then move to Phase 2 production runs

## Success Criteria

- [ ] All environments work in dev tests (Phase 1)
- [ ] All paper results replicated and in WandB (Phase 2)
- [ ] Complete documentation for future users

---

The sections below are the previous version of the plan, removed in this commit:

## Current Status

### Setup Complete
- [x] Installation successful on HoReKa with Python 3.10 venv
- [x] SLURM scripts created for automated job submission
- [x] All dependencies installed including PyTorch, d4rl, dm-control
- [x] WandB integration configured with dppo- project prefix

### Initial Testing Status
- [x] DPPO confirmed working on HoReKa with WandB
- [x] Dev test completed successfully (Job ID 3445117)
- [x] Loss reduction verified: 0.2494 → 0.2010 over 2 epochs
- [x] WandB logging functional: [View Run](https://wandb.ai/dominik_roth/gym-hopper-medium-v2-pretrain/runs/rztwqutf)
- [x] Model checkpoints and logging operational
- [ ] All environments validated on dev partition
- [ ] Ready for production runs

## Experiments To Run

### 1. Reproduce Paper Results - Gym Tasks
**Pre-training Phase** (Behavior cloning on offline datasets):
- hopper-medium-v2 → Diffusion Policy trained via supervised learning on D4RL data (200 epochs)
- walker2d-medium-v2 → Diffusion Policy trained via supervised learning on D4RL data (200 epochs)
- halfcheetah-medium-v2 → Diffusion Policy trained via supervised learning on D4RL data (200 epochs)

**Fine-tuning Phase** (DPPO: policy gradient on the diffusion denoising process):
- hopper-v2 → DPPO fine-tunes the pre-trained model using PPO on a 2-layer "Diffusion MDP"
- walker2d-v2 → DPPO fine-tunes the pre-trained model using PPO on a 2-layer "Diffusion MDP"
- halfcheetah-v2 → DPPO fine-tunes the pre-trained model using PPO on a 2-layer "Diffusion MDP"

**Settings**: Paper hyperparameters, 3 seeds each

### 2. Additional Environments (Future)

**Robomimic Suite**:
- lift, can, square, transport

**D3IL Suite**:
- avoid_m1, avoid_m2, avoid_m3

**Furniture-Bench Suite**:
- one_leg, lamp, round_table (low/med difficulty)
## Running Experiments

### Quick Development Test
```bash
./submit_job.sh dev
```

### Gym Pre-training
```bash
./submit_job.sh gym hopper pretrain
./submit_job.sh gym walker2d pretrain
./submit_job.sh gym halfcheetah pretrain
```

### Gym Fine-tuning (after pre-training completes)
```bash
./submit_job.sh gym hopper finetune
./submit_job.sh gym walker2d finetune
./submit_job.sh gym halfcheetah finetune
```
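The per-task commands above form a small task/mode matrix; a sketch of generating them in a loop (assuming the repo's `submit_job.sh` wrapper, here only printed rather than executed):

```bash
# Build the list of submit commands for the Gym pre-train/fine-tune matrix.
# submit_job.sh is assumed to be the repo's wrapper script; we only print here.
CMDS=""
for task in hopper walker2d halfcheetah; do
  for mode in pretrain finetune; do
    CMDS="${CMDS}./submit_job.sh gym $task $mode
"
  done
done
printf '%s' "$CMDS"
```

Dropping the `printf` and calling the wrapper directly inside the loop would submit all six jobs in one go.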
### Manual SLURM Submission
```bash
# With environment variables
TASK=hopper MODE=pretrain sbatch slurm/run_dppo_gym.sh
```
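Since `TASK` and `MODE` are free-form environment variables, a typo silently submits a broken job; a hedged sketch of the validation one might add before the `sbatch` call (defaults shown are illustrative, not from the repo):

```bash
# Validate TASK/MODE before handing off to sbatch.
# The accepted values mirror the tasks and modes documented above.
TASK="${TASK:-hopper}"
MODE="${MODE:-pretrain}"
case "$TASK" in
  hopper|walker2d|halfcheetah) ;;
  *) echo "Unknown task: $TASK" >&2; exit 1 ;;
esac
case "$MODE" in
  pretrain|finetune) ;;
  *) echo "Unknown mode: $MODE" >&2; exit 1 ;;
esac
echo "Would run: TASK=$TASK MODE=$MODE sbatch slurm/run_dppo_gym.sh"
```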
## Job Tracking

| Job ID | Type | Task | Mode | Status | Duration | Results |
|---------|----------|-------------|----------|------------|----------|---------|
| 3445117 | dev test | hopper | pretrain | ✅ SUCCESS | 2m17s | [WandB](https://wandb.ai/dominik_roth/gym-hopper-medium-v2-pretrain/runs/rztwqutf) |
| 3445154 | dev test | walker2d | pretrain | ✅ SUCCESS | ~2m | Completed |
| 3445155 | dev test | halfcheetah | pretrain | 🔄 RUNNING | ~2m | SLURM: 3445155 |
| 3445158 | dev test | hopper | finetune | 🔄 QUEUED | 30m | SLURM: 3445158 |

**Note**:
- Production job 3445123 cancelled (cluster policy: no prod jobs while dev running)
- WandB project names updated to start with "dppo-" prefix
- Focused on Phase 1 validation before production runs
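As the table grows, tallying job states by hand gets tedious; a sketch of summarizing a plain-text export of the table (the `jobs.log` format below is hypothetical, one whitespace-separated row per job with the status in the fifth column):

```bash
# Create a hypothetical plain-text export of the job-tracking table.
cat > jobs.log <<'EOF'
3445117 dev hopper pretrain SUCCESS
3445154 dev walker2d pretrain SUCCESS
3445155 dev halfcheetah pretrain RUNNING
3445158 dev hopper finetune QUEUED
EOF
# Tally jobs by status (5th column) and print "STATUS count" lines.
SUMMARY=$(awk '{count[$5]++} END {for (s in count) print s, count[s]}' jobs.log | sort)
echo "$SUMMARY"
```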
## Configuration Notes

### WandB Setup Required
```bash
export WANDB_API_KEY=<your_api_key>
export WANDB_ENTITY=<your_username>
```
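A job submitted without these variables only fails once it reaches WandB login, wasting a queue slot; a small sketch of checking them up front (the function name is illustrative):

```bash
# Return 0 only if both WandB variables are set and non-empty.
check_wandb_env() {
  [ -n "$WANDB_API_KEY" ] && [ -n "$WANDB_ENTITY" ]
}

if check_wandb_env; then
  echo "WandB configured for entity: $WANDB_ENTITY"
else
  echo "WARNING: WANDB_API_KEY or WANDB_ENTITY not set" >&2
fi
```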
### Resource Requirements
- **Dev jobs**: 30 min, 24 GB RAM, 8 CPUs, dev_accelerated partition
- **Production**: 8 h, 32 GB RAM, 40 CPUs, accelerated partition
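Expressed as an `#SBATCH` header, the dev-job row above would look roughly like this (a sketch only; the actual `slurm/run_dppo_gym.sh` header may differ, e.g. in GPU and account directives, which are omitted here):

```bash
#!/bin/bash
# Hypothetical SLURM header matching the dev-job resources listed above.
#SBATCH --partition=dev_accelerated
#SBATCH --time=00:30:00
#SBATCH --mem=24G
#SBATCH --cpus-per-task=8
MSG="dev job resources requested"
echo "$MSG"
```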
## Issues Encountered

No issues with the DPPO repository - installation and setup completed successfully.
## Paper Reproduction Progress

### Full Paper Results (Target: All experiments in WandB)

**Goal**: Complete reproduction of DPPO paper results with all runs logged to the dominik_roth WandB account.

#### Gym Tasks (Core Paper Results)
- [ ] **hopper-medium-v2 → hopper-v2**: Pre-train (200 epochs) + Fine-tune
- [ ] **walker2d-medium-v2 → walker2d-v2**: Pre-train (200 epochs) + Fine-tune
- [ ] **halfcheetah-medium-v2 → halfcheetah-v2**: Pre-train (200 epochs) + Fine-tune

#### Additional Environment Suites (Extended Results)
- [ ] **Robomimic Tasks**: lift, can, square, transport (pre-train + fine-tune)
- [ ] **D3IL Tasks**: avoid_m1, avoid_m2, avoid_m3 (pre-train + fine-tune)
- [ ] **Furniture-Bench Tasks**: one_leg, lamp, round_table (low/med difficulty)

#### Success Criteria
- [ ] All pre-training runs complete successfully (loss convergence)
- [ ] All fine-tuning runs complete successfully (performance improvement)
- [ ] All experiments logged with proper WandB tracking
- [ ] Results comparable to paper benchmarks
- [ ] Complete documentation of hyperparameters and settings
## Next Steps

### Phase 1: Validation on Dev Partition (Current Priority)

**Goal**: Test all environments and modes on the dev partition to validate the installation and document any issues.

#### Dev Validation Todo List (In Order)

1. - [ ] Test walker2d pretrain on dev (retry with flexible script) - Job 3445167 [IN PROGRESS]
2. - [ ] Monitor halfcheetah pretrain dev test (Job 3445155) [IN PROGRESS]
3. - [ ] Monitor hopper finetune dev test (Job 3445158) [PENDING]
4. - [ ] Test walker2d finetune on dev
5. - [ ] Test halfcheetah finetune on dev
6. - [ ] Test Robomimic lift pretrain on dev
7. - [ ] Test Robomimic lift finetune on dev
8. - [ ] Test Robomimic can pretrain on dev
9. - [ ] Test Robomimic can finetune on dev
10. - [ ] Test Robomimic square pretrain on dev
11. - [ ] Test Robomimic square finetune on dev
12. - [ ] Test Robomimic transport pretrain on dev
13. - [ ] Test Robomimic transport finetune on dev
14. - [ ] Test D3IL avoid_m1 pretrain on dev
15. - [ ] Test D3IL avoid_m1 finetune on dev
16. - [ ] Test D3IL avoid_m2 pretrain on dev
17. - [ ] Test D3IL avoid_m2 finetune on dev
18. - [ ] Test D3IL avoid_m3 pretrain on dev
19. - [ ] Test D3IL avoid_m3 finetune on dev
20. - [ ] Test Furniture one_leg_low pretrain on dev
21. - [ ] Test Furniture one_leg_low finetune on dev
22. - [ ] Test Furniture lamp_low pretrain on dev
23. - [ ] Test Furniture lamp_low finetune on dev
24. - [ ] Document any issues found in README
25. - [ ] Verify all WandB logging works with the dppo- prefix

**Total validation tests: 25 across 4 environment suites (Gym, Robomimic, D3IL, Furniture)**

### Phase 2: Production Runs (After Dev Validation)

**Only proceed after Phase 1 is complete and all issues are resolved.**

#### 2.1 Full Gym Pipeline
- [ ] hopper: pre-train (200 epochs) → fine-tune
- [ ] walker2d: pre-train (200 epochs) → fine-tune
- [ ] halfcheetah: pre-train (200 epochs) → fine-tune

#### 2.2 Extended Environments
- [ ] All validated environments from Phase 1

**Current Status**: Phase 1 in progress. Job 3445154 (walker2d dev) is running, 3445155 (halfcheetah dev) is queued. Production run 3445123 is on hold until validation completes.
README.md (21 changed lines)

@@ -64,13 +64,25 @@ The DPPO repository has been adapted to run on the HoReKa cluster. The original
   source .venv/bin/activate
   ```

3. **Install the package and all dependencies:**

   ```bash
   # Submit installation job (runs on dev node with GPU)
   sbatch install_dppo.sh
   ```

   Note: Installation must run on a GPU node due to PyTorch CUDA dependencies. The installation script automatically installs ALL environment dependencies (Gym, Robomimic, D3IL).

4. **For fine-tuning: Install and set up MuJoCo 2.1.0**

   a) Install MuJoCo 2.1.0 following: https://github.com/openai/mujoco-py#install-mujoco

   b) Add these to your `~/.bashrc` or include them in SLURM scripts:

   ```bash
   # MuJoCo setup (required for fine-tuning only)
   export MUJOCO_PY_MUJOCO_PATH=$HOME/.mujoco/mujoco210
   export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mujoco210/bin:/usr/lib/nvidia
   export MUJOCO_GL=egl
   ```
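Before submitting a fine-tuning job, it can help to confirm MuJoCo is actually unpacked where mujoco-py will look; a small sketch using the paths from step 4 (the check itself is illustrative, not part of the repo):

```bash
# Verify the MuJoCo 2.1.0 install directory exists before queuing fine-tune jobs.
MUJOCO_DIR="${MUJOCO_PY_MUJOCO_PATH:-$HOME/.mujoco/mujoco210}"
if [ -d "$MUJOCO_DIR/bin" ]; then
  echo "MuJoCo found: $MUJOCO_DIR"
  MUJOCO_OK=1
else
  echo "MuJoCo missing: $MUJOCO_DIR (fine-tuning will fail at 'import mujoco_py')" >&2
  MUJOCO_OK=0
fi
```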
### Running on HoReKa

@@ -168,8 +180,9 @@ export WANDB_API_KEY="your_api_key"

- **Compiler**: Forces GCC due to Intel compiler strictness with MuJoCo

### Current Status
- **Working**: Pre-training for ALL environments (Gym, Robomimic, D3IL) with automatic data download
- **Fixed**: Fine-tuning works with proper MuJoCo environment variables
- **Validated**: Gym fine-tuning functional after fixing parameter names and environment setup
- **Not Compatible**: Furniture-Bench requires Python 3.8 (incompatible with our Python 3.10 setup)

### How to Use This Repository on HoReKa
@@ -33,11 +33,23 @@ pip install --upgrade pip

# Install base package
pip install -e .

# Install ALL optional dependencies (except Kitchen, which has conflicts)
pip install -e .[all]

echo "Installation completed!"
echo "Python version: $(python --version)"
echo "Pip version: $(pip --version)"

echo ""
echo "=== IMPORTANT: MuJoCo Setup for Fine-tuning ==="
echo "1. Install MuJoCo 2.1.0: https://github.com/openai/mujoco-py#install-mujoco"
echo "2. Add these environment variables to your SLURM scripts:"
echo "export MUJOCO_PY_MUJOCO_PATH=\$HOME/.mujoco/mujoco210"
echo "export LD_LIBRARY_PATH=\$LD_LIBRARY_PATH:\$HOME/.mujoco/mujoco210/bin:/usr/lib/nvidia"
echo "export MUJOCO_GL=egl"
echo ""
echo "Pre-training works without MuJoCo setup."
echo ""

echo "Installed packages:"
pip list