dppo/EXPERIMENT_PLAN.md
ys1087@partner.kit.edu d739fa5e5e Add robomimic transport test and update experiment plan
- Create robomimic transport pre-training test script
- Update EXPERIMENT_PLAN.md with square success
- Add WandB URLs for completed robomimic tests
- Track progress on remaining validation tests
2025-08-27 16:21:06 +02:00

# DPPO Experiment Plan
## What's Done ✅
**Installation & Setup:**
- ✅ Python 3.10 venv working on HoReKa
- ✅ All dependencies installed (gym, robomimic, d3il)
- ✅ WandB logging configured with "dppo-" project prefix
- ✅ HoReKa Intel compiler fix for mujoco-py integrated into install script
- ✅ Cython version pinned to 0.29.37 for mujoco-py compatibility
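
The Cython pin above can be verified from inside the venv before mujoco-py is built. The following is a minimal sketch: `installed_version` and `pin_satisfied` are hypothetical helper names and are not part of the install script described here.

```python
from importlib import metadata
from typing import Optional

REQUIRED_CYTHON = "0.29.37"  # pin stated in this plan


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def pin_satisfied(installed: Optional[str], required: str = REQUIRED_CYTHON) -> bool:
    """True only when the installed version matches the pin exactly."""
    return installed == required


if __name__ == "__main__":
    found = installed_version("Cython")
    status = "OK" if pin_satisfied(found) else "MISMATCH"
    print(f"Cython installed={found} required={REQUIRED_CYTHON} -> {status}")
```

Running the file in the venv prints one status line; an exact string match is used because the plan pins a single version rather than a range.
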
**Validated Pre-training:**
- ✅ Gym: hopper, walker2d, halfcheetah (all working with data download & WandB logging)
- ✅ Robomimic: lift, can, square
  - WandB (can): https://wandb.ai/dominik_roth/robomimic-can-pretrain/runs/xwpzcssw
  - WandB (square): https://wandb.ai/dominik_roth/robomimic-square-pretrain/runs/hty80o7z
- ✅ D3IL: avoid_m1 (working)
## What We're Doing Right Now 🔄
**Current Jobs:**
- 🔄 Job 3445594: Running updated installer with integrated MuJoCo fix
- 🔄 Job 3445604: Testing robomimic square (new job)
- 🔄 Job 3445606: Testing robomimic transport
**Latest Success:**
- ✅ Job 3445550: Robomimic square pre-training SUCCESS with WandB logging!
**Progress on MuJoCo Fix:**
- ✅ Identified root cause: Intel compiler flags are incompatible with GCC when building mujoco-py
- ✅ Developed sysconfig patch to override Intel flags
- ✅ Integrated fix into install script and README
- 🔄 Waiting for the installer job to finish so the fix can be validated
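
The idea behind overriding compiler settings in the Python build configuration can be sketched as follows. This is an illustration under stated assumptions, not the patch that was integrated into the install script: the function name `patched_config_vars`, the choice of keys, and the example flag `-xHost` are all hypothetical, since the exact flags involved on HoReKa are not listed in this plan.

```python
import shlex
from typing import Dict, Iterable

# Keys whose first token is the compiler executable (assumed selection).
COMPILER_KEYS = ("CC", "LDSHARED")
# Keys that hold plain flag strings (assumed selection).
FLAG_KEYS = ("CFLAGS", "OPT")


def patched_config_vars(
    config_vars: Dict[str, object],
    compiler: str = "gcc",
    drop_flags: Iterable[str] = (),
) -> Dict[str, object]:
    """Return a copy of `config_vars` with the compiler swapped and flags removed."""
    drop = set(drop_flags)
    patched = dict(config_vars)

    for key in COMPILER_KEYS:
        value = patched.get(key)
        if isinstance(value, str) and value.strip():
            tokens = shlex.split(value)
            tokens[0] = compiler  # replace only the executable name
            patched[key] = shlex.join(tokens)

    for key in FLAG_KEYS:
        value = patched.get(key)
        if isinstance(value, str):
            kept = [tok for tok in shlex.split(value) if tok not in drop]
            patched[key] = shlex.join(kept)

    return patched


if __name__ == "__main__":
    # Made-up input for illustration; not the real HoReKa configuration.
    example = {"CC": "icc -pthread", "CFLAGS": "-O2 -xHost -fPIC"}
    print(patched_config_vars(example, drop_flags=["-xHost"]))
```

The function works on a plain dictionary so it can be tested without touching the interpreter. Applying the result to the values returned by `sysconfig.get_config_vars()` before the build would be one possible way to use it; whether the integrated fix does this is not stated here.
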
## What Needs to Be Done 📋
### Phase 1: Complete Installation Validation
**Goal:** Confirm every environment works in both pre-train and fine-tune modes
**Remaining Pre-training Tests:**
- Robomimic: transport (in progress)
- D3IL: avoid_m2, avoid_m3 (waiting for the full installer run to finish)
**Fine-tuning Tests (after MuJoCo validation):**
- Gym: hopper, walker2d, halfcheetah
- Robomimic: lift, can, square, transport
- D3IL: avoid_m1, avoid_m2, avoid_m3
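
The Phase 1 checklist above amounts to a matrix of environment suite, task, and mode. A minimal sketch of enumerating that matrix and subtracting the pre-training runs already listed as validated is shown below; the names `TASKS`, `MODES`, `VALIDATED`, `all_cases`, and `remaining_cases` are hypothetical and only mirror the lists in this plan.

```python
from itertools import product
from typing import List, Set, Tuple

Case = Tuple[str, str, str]  # (suite, task, mode)

TASKS = {
    "gym": ["hopper", "walker2d", "halfcheetah"],
    "robomimic": ["lift", "can", "square", "transport"],
    "d3il": ["avoid_m1", "avoid_m2", "avoid_m3"],
}
MODES = ["pretrain", "finetune"]

# Pre-training runs listed under "Validated Pre-training" in this plan.
VALIDATED: Set[Case] = {
    ("gym", "hopper", "pretrain"),
    ("gym", "walker2d", "pretrain"),
    ("gym", "halfcheetah", "pretrain"),
    ("robomimic", "lift", "pretrain"),
    ("robomimic", "can", "pretrain"),
    ("robomimic", "square", "pretrain"),
    ("d3il", "avoid_m1", "pretrain"),
}


def all_cases() -> List[Case]:
    """Every (suite, task, mode) combination that Phase 1 must cover."""
    return [
        (suite, task, mode)
        for suite, tasks in TASKS.items()
        for task, mode in product(tasks, MODES)
    ]


def remaining_cases(validated: Set[Case] = VALIDATED) -> List[Case]:
    """Combinations that have not been validated yet."""
    return [case for case in all_cases() if case not in validated]


if __name__ == "__main__":
    for suite, task, mode in remaining_cases():
        print(f"{suite:10s} {task:12s} {mode}")
```

Keeping the matrix in one place makes it easy to see that the remaining work consists of three pre-training tests plus all ten fine-tuning tests.
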
### Phase 2: Paper Results Generation
**Goal:** Run full experiments to replicate paper results
**Gym Tasks (Core Paper Results):**
- hopper-medium-v2 → hopper-v2: Pre-train (200 epochs) + Fine-tune
- walker2d-medium-v2 → walker2d-v2: Pre-train (200 epochs) + Fine-tune
- halfcheetah-medium-v2 → halfcheetah-v2: Pre-train (200 epochs) + Fine-tune
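
The three Gym entries follow one pattern: a `<task>-medium-v2` dataset for pre-training and the matching `<task>-v2` environment for fine-tuning. A small sketch of recording that as data is given below; `GymExperiment` and `env_for_dataset` are hypothetical names, and only the dataset names, environment names, and the 200-epoch figure come from this plan.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GymExperiment:
    dataset: str                 # offline dataset used for pre-training
    env: str                     # environment used for fine-tuning
    pretrain_epochs: int = 200   # value stated in this plan


def env_for_dataset(dataset: str) -> str:
    """Map e.g. 'hopper-medium-v2' to 'hopper-v2' by dropping the quality tag."""
    task, _quality, version = dataset.rsplit("-", 2)
    return f"{task}-{version}"


GYM_EXPERIMENTS: List[GymExperiment] = [
    GymExperiment(dataset=name, env=env_for_dataset(name))
    for name in ("hopper-medium-v2", "walker2d-medium-v2", "halfcheetah-medium-v2")
]

if __name__ == "__main__":
    for exp in GYM_EXPERIMENTS:
        print(f"{exp.dataset} -> {exp.env} ({exp.pretrain_epochs} pre-train epochs)")
```
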
**Extended Results:**
- All Robomimic tasks: full pre-train + fine-tune
- All D3IL tasks: full pre-train + fine-tune
## Current Status
**Blockers:** None known. The MuJoCo fix is integrated; its validation is pending on the installer job (3445594)
**Waiting on:** Cluster resources to run validation jobs
**Next Step:** Complete Phase 1 validation, then move to Phase 2 production runs
## Success Criteria
- [ ] All environments pass dev tests in both pre-train and fine-tune modes (Phase 1)
- [ ] All paper results replicated and in WandB (Phase 2)
- [ ] Documentation is complete for future users