Update documentation and simplify experiment tracking

- Simplify experiment plan with clear phases and current status
- Add complete MuJoCo setup instructions for fine-tuning
- Update install script to include all dependencies
- Document current validation progress and next steps

parent 0424a080c1
commit 3cf999c32e

@@ -1,175 +1,58 @@
# DPPO Experiment Plan

## What's Done ✅

**Installation & Setup:**
- ✅ Python 3.10 venv working on HoReKa
- ✅ All dependencies installed (gym, robomimic, d3il)
- ✅ WandB logging configured with "dppo-" project prefix
- ✅ mujoco-py compilation fixed with proper environment variables

**Validated Pre-training:**
- ✅ Gym: hopper, walker2d, halfcheetah (all working with data download & WandB logging)
- ✅ Robomimic: lift (working)
- ✅ D3IL: avoid_m1 (working)

## What We're Doing Right Now 🔄

**Current Jobs Running:**
- Job 3445495: Testing hopper fine-tuning (validates MuJoCo fix)
- Job 3445498: Testing robomimic can pre-training

## What Needs to Be Done 📋

### Phase 1: Complete Installation Validation

**Goal:** Confirm every environment works in both pre-train and fine-tune modes

**Remaining Pre-training Tests:**
- Robomimic: can, square, transport
- D3IL: avoid_m2, avoid_m3

**Fine-tuning Tests (after MuJoCo validation):**
- Gym: hopper, walker2d, halfcheetah
- Robomimic: lift, can, square, transport
- D3IL: avoid_m1, avoid_m2, avoid_m3

### Phase 2: Paper Results Generation

**Goal:** Run full experiments to replicate paper results

**Gym Tasks (Core Paper Results):**
- hopper-medium-v2 → hopper-v2: Pre-train (200 epochs) + Fine-tune
- walker2d-medium-v2 → walker2d-v2: Pre-train (200 epochs) + Fine-tune
- halfcheetah-medium-v2 → halfcheetah-v2: Pre-train (200 epochs) + Fine-tune

**Extended Results:**
- All Robomimic tasks: full pre-train + fine-tune
- All D3IL tasks: full pre-train + fine-tune
## Current Status

**Blockers:** None - all technical issues resolved

**Waiting on:** Cluster resources to run validation jobs

**Next Step:** Complete Phase 1 validation, then move to Phase 2 production runs

## Success Criteria

- [ ] All environments work in dev tests (Phase 1)
- [ ] All paper results replicated and in WandB (Phase 2)
- [ ] Complete documentation for future users

---

The sections below are the previous version of the plan, removed in this commit:

## Current Status

### Setup Complete
- [x] Installation successful on HoReKa with Python 3.10 venv
- [x] SLURM scripts created for automated job submission
- [x] All dependencies installed including PyTorch, d4rl, dm-control
- [x] WandB integration configured with dppo- project prefix

### Initial Testing Status
- [x] DPPO confirmed working on HoReKa with WandB
- [x] Dev test completed successfully (Job ID 3445117)
- [x] Loss reduction verified: 0.2494 → 0.2010 over 2 epochs
- [x] WandB logging functional: [View Run](https://wandb.ai/dominik_roth/gym-hopper-medium-v2-pretrain/runs/rztwqutf)
- [x] Model checkpoints and logging operational
- [ ] All environments validated on dev partition
- [ ] Ready for production runs

## Experiments To Run

### 1. Reproduce Paper Results - Gym Tasks
**Pre-training Phase** (Behavior cloning on offline datasets):
- hopper-medium-v2 → Diffusion Policy trained via supervised learning on D4RL data (200 epochs)
- walker2d-medium-v2 → Diffusion Policy trained via supervised learning on D4RL data (200 epochs)
- halfcheetah-medium-v2 → Diffusion Policy trained via supervised learning on D4RL data (200 epochs)

**Fine-tuning Phase** (DPPO: policy gradient on the diffusion denoising process):
- hopper-v2 → DPPO fine-tunes the pre-trained model using PPO on a 2-layer "Diffusion MDP"
- walker2d-v2 → DPPO fine-tunes the pre-trained model using PPO on a 2-layer "Diffusion MDP"
- halfcheetah-v2 → DPPO fine-tunes the pre-trained model using PPO on a 2-layer "Diffusion MDP"

**Settings**: Paper hyperparameters, 3 seeds each

### 2. Additional Environments (Future)

**Robomimic Suite**:
- lift, can, square, transport

**D3IL Suite**:
- avoid_m1, avoid_m2, avoid_m3

**Furniture-Bench Suite**:
- one_leg, lamp, round_table (low/med difficulty)
## Running Experiments

### Quick Development Test
```bash
./submit_job.sh dev
```

### Gym Pre-training
```bash
./submit_job.sh gym hopper pretrain
./submit_job.sh gym walker2d pretrain
./submit_job.sh gym halfcheetah pretrain
```

### Gym Fine-tuning (after pre-training completes)
```bash
./submit_job.sh gym hopper finetune
./submit_job.sh gym walker2d finetune
./submit_job.sh gym halfcheetah finetune
```
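The per-task commands above form a small task/mode matrix; a sketch of generating them in a loop (assuming the repo's `submit_job.sh` wrapper, here only printed rather than executed):

```bash
# Build the list of submit commands for the Gym pre-train/fine-tune matrix.
# submit_job.sh is assumed to be the repo's wrapper script; we only print here.
CMDS=""
for task in hopper walker2d halfcheetah; do
  for mode in pretrain finetune; do
    CMDS="${CMDS}./submit_job.sh gym $task $mode
"
  done
done
printf '%s' "$CMDS"
```

Dropping the `printf` and calling the wrapper directly inside the loop would submit all six jobs in one go.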
### Manual SLURM Submission
```bash
# With environment variables
TASK=hopper MODE=pretrain sbatch slurm/run_dppo_gym.sh
```
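Since `TASK` and `MODE` are free-form environment variables, a typo silently submits a broken job; a hedged sketch of the validation one might add before the `sbatch` call (defaults shown are illustrative, not from the repo):

```bash
# Validate TASK/MODE before handing off to sbatch.
# The accepted values mirror the tasks and modes documented above.
TASK="${TASK:-hopper}"
MODE="${MODE:-pretrain}"
case "$TASK" in
  hopper|walker2d|halfcheetah) ;;
  *) echo "Unknown task: $TASK" >&2; exit 1 ;;
esac
case "$MODE" in
  pretrain|finetune) ;;
  *) echo "Unknown mode: $MODE" >&2; exit 1 ;;
esac
echo "Would run: TASK=$TASK MODE=$MODE sbatch slurm/run_dppo_gym.sh"
```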
## Job Tracking

| Job ID | Type | Task | Mode | Status | Duration | Results |
|---------|----------|-------------|----------|------------|----------|---------|
| 3445117 | dev test | hopper | pretrain | ✅ SUCCESS | 2m17s | [WandB](https://wandb.ai/dominik_roth/gym-hopper-medium-v2-pretrain/runs/rztwqutf) |
| 3445154 | dev test | walker2d | pretrain | ✅ SUCCESS | ~2m | Completed |
| 3445155 | dev test | halfcheetah | pretrain | 🔄 RUNNING | ~2m | SLURM: 3445155 |
| 3445158 | dev test | hopper | finetune | 🔄 QUEUED | 30m | SLURM: 3445158 |

**Note**:
- Production job 3445123 cancelled (cluster policy: no prod jobs while dev running)
- WandB project names updated to start with "dppo-" prefix
- Focused on Phase 1 validation before production runs
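As the table grows, tallying job states by hand gets tedious; a sketch of summarizing a plain-text export of the table (the `jobs.log` format below is hypothetical, one whitespace-separated row per job with the status in the fifth column):

```bash
# Create a hypothetical plain-text export of the job-tracking table.
cat > jobs.log <<'EOF'
3445117 dev hopper pretrain SUCCESS
3445154 dev walker2d pretrain SUCCESS
3445155 dev halfcheetah pretrain RUNNING
3445158 dev hopper finetune QUEUED
EOF
# Tally jobs by status (5th column) and print "STATUS count" lines.
SUMMARY=$(awk '{count[$5]++} END {for (s in count) print s, count[s]}' jobs.log | sort)
echo "$SUMMARY"
```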
## Configuration Notes

### WandB Setup Required
```bash
export WANDB_API_KEY=<your_api_key>
export WANDB_ENTITY=<your_username>
```
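A job submitted without these variables only fails once it reaches WandB login, wasting a queue slot; a small sketch of checking them up front (the function name is illustrative):

```bash
# Return 0 only if both WandB variables are set and non-empty.
check_wandb_env() {
  [ -n "$WANDB_API_KEY" ] && [ -n "$WANDB_ENTITY" ]
}

if check_wandb_env; then
  echo "WandB configured for entity: $WANDB_ENTITY"
else
  echo "WARNING: WANDB_API_KEY or WANDB_ENTITY not set" >&2
fi
```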
### Resource Requirements
- **Dev jobs**: 30 min, 24 GB RAM, 8 CPUs, dev_accelerated partition
- **Production**: 8 h, 32 GB RAM, 40 CPUs, accelerated partition
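Expressed as an `#SBATCH` header, the dev-job row above would look roughly like this (a sketch only; the actual `slurm/run_dppo_gym.sh` header may differ, e.g. in GPU and account directives, which are omitted here):

```bash
#!/bin/bash
# Hypothetical SLURM header matching the dev-job resources listed above.
#SBATCH --partition=dev_accelerated
#SBATCH --time=00:30:00
#SBATCH --mem=24G
#SBATCH --cpus-per-task=8
MSG="dev job resources requested"
echo "$MSG"
```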
## Issues Encountered

No issues with the DPPO repository - installation and setup completed successfully.
## Paper Reproduction Progress

### Full Paper Results (Target: All experiments in WandB)

**Goal**: Complete reproduction of DPPO paper results with all runs logged to the dominik_roth WandB account.

#### Gym Tasks (Core Paper Results)
- [ ] **hopper-medium-v2 → hopper-v2**: Pre-train (200 epochs) + Fine-tune
- [ ] **walker2d-medium-v2 → walker2d-v2**: Pre-train (200 epochs) + Fine-tune
- [ ] **halfcheetah-medium-v2 → halfcheetah-v2**: Pre-train (200 epochs) + Fine-tune

#### Additional Environment Suites (Extended Results)
- [ ] **Robomimic Tasks**: lift, can, square, transport (pre-train + fine-tune)
- [ ] **D3IL Tasks**: avoid_m1, avoid_m2, avoid_m3 (pre-train + fine-tune)
- [ ] **Furniture-Bench Tasks**: one_leg, lamp, round_table (low/med difficulty)

#### Success Criteria
- [ ] All pre-training runs complete successfully (loss convergence)
- [ ] All fine-tuning runs complete successfully (performance improvement)
- [ ] All experiments logged with proper WandB tracking
- [ ] Results comparable to paper benchmarks
- [ ] Complete documentation of hyperparameters and settings
## Next Steps

### Phase 1: Validation on Dev Partition (Current Priority)

**Goal**: Test all environments and modes on the dev partition to validate the installation and document any issues.

#### Dev Validation Todo List (In Order)

1. - [ ] Test walker2d pretrain on dev (retry with flexible script) - Job 3445167 [IN PROGRESS]
2. - [ ] Monitor halfcheetah pretrain dev test (Job 3445155) [IN PROGRESS]
3. - [ ] Monitor hopper finetune dev test (Job 3445158) [PENDING]
4. - [ ] Test walker2d finetune on dev
5. - [ ] Test halfcheetah finetune on dev
6. - [ ] Test Robomimic lift pretrain on dev
7. - [ ] Test Robomimic lift finetune on dev
8. - [ ] Test Robomimic can pretrain on dev
9. - [ ] Test Robomimic can finetune on dev
10. - [ ] Test Robomimic square pretrain on dev
11. - [ ] Test Robomimic square finetune on dev
12. - [ ] Test Robomimic transport pretrain on dev
13. - [ ] Test Robomimic transport finetune on dev
14. - [ ] Test D3IL avoid_m1 pretrain on dev
15. - [ ] Test D3IL avoid_m1 finetune on dev
16. - [ ] Test D3IL avoid_m2 pretrain on dev
17. - [ ] Test D3IL avoid_m2 finetune on dev
18. - [ ] Test D3IL avoid_m3 pretrain on dev
19. - [ ] Test D3IL avoid_m3 finetune on dev
20. - [ ] Test Furniture one_leg_low pretrain on dev
21. - [ ] Test Furniture one_leg_low finetune on dev
22. - [ ] Test Furniture lamp_low pretrain on dev
23. - [ ] Test Furniture lamp_low finetune on dev
24. - [ ] Document any issues found in README
25. - [ ] Verify all WandB logging works with the dppo- prefix

**Total validation tests: 25 across 4 environment suites (Gym, Robomimic, D3IL, Furniture)**

### Phase 2: Production Runs (After Dev Validation)

**Only proceed after Phase 1 is complete and all issues are resolved.**

#### 2.1 Full Gym Pipeline
- [ ] hopper: pre-train (200 epochs) → fine-tune
- [ ] walker2d: pre-train (200 epochs) → fine-tune
- [ ] halfcheetah: pre-train (200 epochs) → fine-tune

#### 2.2 Extended Environments
- [ ] All validated environments from Phase 1

**Current Status**: Phase 1 in progress. Job 3445154 (walker2d dev) is running, 3445155 (halfcheetah dev) is queued. Production run 3445123 is on hold until validation completes.
README.md (21 changed lines)

@@ -64,13 +64,25 @@ The DPPO repository has been adapted to run on the HoReKa cluster. The original
   source .venv/bin/activate
   ```

3. **Install the package and all dependencies:**

   ```bash
   # Submit installation job (runs on dev node with GPU)
   sbatch install_dppo.sh
   ```

   Note: Installation must run on a GPU node due to PyTorch CUDA dependencies. The installation script automatically installs ALL environment dependencies (Gym, Robomimic, D3IL).

4. **For fine-tuning: Install and set up MuJoCo 2.1.0**

   a) Install MuJoCo 2.1.0 following: https://github.com/openai/mujoco-py#install-mujoco

   b) Add these to your `~/.bashrc` or include them in SLURM scripts:

   ```bash
   # MuJoCo setup (required for fine-tuning only)
   export MUJOCO_PY_MUJOCO_PATH=$HOME/.mujoco/mujoco210
   export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/.mujoco/mujoco210/bin:/usr/lib/nvidia
   export MUJOCO_GL=egl
   ```
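Before submitting a fine-tuning job, it can help to confirm MuJoCo is actually unpacked where mujoco-py will look; a small sketch using the paths from step 4 (the check itself is illustrative, not part of the repo):

```bash
# Verify the MuJoCo 2.1.0 install directory exists before queuing fine-tune jobs.
MUJOCO_DIR="${MUJOCO_PY_MUJOCO_PATH:-$HOME/.mujoco/mujoco210}"
if [ -d "$MUJOCO_DIR/bin" ]; then
  echo "MuJoCo found: $MUJOCO_DIR"
  MUJOCO_OK=1
else
  echo "MuJoCo missing: $MUJOCO_DIR (fine-tuning will fail at 'import mujoco_py')" >&2
  MUJOCO_OK=0
fi
```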
### Running on HoReKa

@@ -168,8 +180,9 @@ export WANDB_API_KEY="your_api_key"

- **Compiler**: Forces GCC due to Intel compiler strictness with MuJoCo

### Current Status
- **Working**: Pre-training for ALL environments (Gym, Robomimic, D3IL) with automatic data download
- **Fixed**: Fine-tuning works with proper MuJoCo environment variables
- **Validated**: Gym fine-tuning functional after fixing parameter names and environment setup
- **Not Compatible**: Furniture-Bench requires Python 3.8 (incompatible with our Python 3.10 setup)

### How to Use This Repository on HoReKa
@@ -33,11 +33,23 @@ pip install --upgrade pip

# Install base package
pip install -e .

# Install ALL optional dependencies (except Kitchen, which has conflicts)
pip install -e .[all]

echo "Installation completed!"
echo "Python version: $(python --version)"
echo "Pip version: $(pip --version)"

echo ""
echo "=== IMPORTANT: MuJoCo Setup for Fine-tuning ==="
echo "1. Install MuJoCo 2.1.0: https://github.com/openai/mujoco-py#install-mujoco"
echo "2. Add these environment variables to your SLURM scripts:"
echo "export MUJOCO_PY_MUJOCO_PATH=\$HOME/.mujoco/mujoco210"
echo "export LD_LIBRARY_PATH=\$LD_LIBRARY_PATH:\$HOME/.mujoco/mujoco210/bin:/usr/lib/nvidia"
echo "export MUJOCO_GL=egl"
echo ""
echo "Pre-training works without MuJoCo setup."
echo ""

echo "Installed packages:"
pip list