# DPPO Experiment Plan

## What's Done ✅

**Installation & Setup:**
- ✅ Python 3.10 venv working on HoReKa
- ✅ All dependencies installed (gym, robomimic, d3il)
- ✅ WandB logging configured with "dppo-" project prefix
- ✅ MuJoCo-py compilation fixed with proper environment variables

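The MuJoCo item above depends on environment variables being set before `mujoco_py` is first imported, because that import triggers the extension build. The sketch below shows one way to assemble them. The helper name `mujoco_env` and the install location `~/.mujoco/mujoco210` are assumptions for illustration, not taken from this repo's scripts, and the variables actually needed on HoReKa may differ.

```python
import os
from pathlib import Path
from typing import Dict, Mapping


def mujoco_env(mujoco_root: Path, base_env: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of base_env with mujoco-py build variables added.

    mujoco_root is an ASSUMED install location, e.g. ~/.mujoco/mujoco210.
    """
    env = dict(base_env)
    bin_dir = str(mujoco_root / "bin")
    # mujoco-py locates the MuJoCo install through this variable.
    env["MUJOCO_PY_MUJOCO_PATH"] = str(mujoco_root)
    # The compiled extension links against the shared libraries in bin/.
    parts = [p for p in env.get("LD_LIBRARY_PATH", "").split(os.pathsep) if p]
    if bin_dir not in parts:
        parts.append(bin_dir)
    env["LD_LIBRARY_PATH"] = os.pathsep.join(parts)
    return env


if __name__ == "__main__":
    root = Path.home() / ".mujoco" / "mujoco210"  # assumed location
    env = mujoco_env(root, os.environ)
    for key in ("MUJOCO_PY_MUJOCO_PATH", "LD_LIBRARY_PATH"):
        print(f"export {key}={env[key]}")
```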
**Validated Pre-training:**
- ✅ Gym: hopper, walker2d, halfcheetah (all working with data download & WandB logging)
- ✅ Robomimic: lift, can (working with WandB: https://wandb.ai/dominik_roth/robomimic-can-pretrain/runs/xwpzcssw)
- ✅ D3IL: avoid_m1 (working)

## What We're Doing Right Now 🔄

**Latest Test Results:**
- ✅ Job 3445498: Robomimic can pre-training succeeded
- ⚠️ Job 3445495: Hopper fine-tuning started but hit a MuJoCo `stdio.h` compilation error
- 🔄 Researching a more robust MuJoCo compilation fix

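A missing `stdio.h` during the build typically points at the C compiler not finding the system headers, rather than at MuJoCo itself. The probe below is a minimal sketch for checking that on a compute node before submitting a job. The function name `can_compile_stdio` is hypothetical, and which compiler the mujoco-py build actually invokes on HoReKa is an assumption.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

C_SOURCE = "#include <stdio.h>\nint main(void) { return 0; }\n"


def can_compile_stdio(compiler: str = "gcc") -> bool:
    """Return True if `compiler` finds <stdio.h> and parses a trivial program."""
    if shutil.which(compiler) is None:
        return False  # compiler is not on PATH at all
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "probe.c"
        src.write_text(C_SOURCE)
        # -fsyntax-only: preprocess and parse without writing an object file.
        result = subprocess.run(
            [compiler, "-fsyntax-only", str(src)],
            capture_output=True,
            text=True,
        )
    return result.returncode == 0


if __name__ == "__main__":
    for cc in ("gcc", "cc"):
        print(f"{cc}: stdio.h usable = {can_compile_stdio(cc)}")
```

If the probe fails inside the job environment but passes on the login node, the difference in loaded modules or include paths is a likely place to look.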
## What Needs to Be Done 📋

### Phase 1: Complete Installation Validation
**Goal:** Confirm every environment works in both pre-train and fine-tune modes

**Remaining Pre-training Tests:**
- Robomimic: square, transport
- D3IL: avoid_m2, avoid_m3

**Fine-tuning Tests (after MuJoCo validation):**
- Gym: hopper, walker2d, halfcheetah
- Robomimic: lift, can, square, transport
- D3IL: avoid_m1, avoid_m2, avoid_m3

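The Phase 1 checklist can also be derived from the task lists instead of being maintained by hand. The following is a sketch only: the names `TASKS`, `VALIDATED`, and `remaining_tests` are illustrative and not part of the DPPO codebase, and the validated set mirrors the "What's Done" section of this plan.

```python
from itertools import product

# Suites and tasks exactly as listed in this plan.
TASKS = {
    "gym": ["hopper", "walker2d", "halfcheetah"],
    "robomimic": ["lift", "can", "square", "transport"],
    "d3il": ["avoid_m1", "avoid_m2", "avoid_m3"],
}
MODES = ("pretrain", "finetune")

# Pre-training runs marked as validated under "What's Done".
VALIDATED = {
    (suite, task, "pretrain")
    for suite, tasks in {
        "gym": ["hopper", "walker2d", "halfcheetah"],
        "robomimic": ["lift", "can"],
        "d3il": ["avoid_m1"],
    }.items()
    for task in tasks
}


def remaining_tests():
    """Return every (suite, task, mode) combination not yet validated."""
    every = [
        (suite, task, mode)
        for suite, tasks in TASKS.items()
        for task, mode in product(tasks, MODES)
    ]
    return [combo for combo in every if combo not in VALIDATED]


if __name__ == "__main__":
    for suite, task, mode in remaining_tests():
        print(f"[ ] {suite}/{task}: {mode}")
```

Adding a tuple to `VALIDATED` after each successful job keeps the remaining list in sync with the results.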
### Phase 2: Paper Results Generation
**Goal:** Run full experiments to replicate paper results

**Gym Tasks (Core Paper Results):**
- hopper-medium-v2 → hopper-v2: Pre-train (200 epochs) + Fine-tune
- walker2d-medium-v2 → walker2d-v2: Pre-train (200 epochs) + Fine-tune
- halfcheetah-medium-v2 → halfcheetah-v2: Pre-train (200 epochs) + Fine-tune

**Extended Results:**
- All Robomimic tasks: full pre-train + fine-tune
- All D3IL tasks: full pre-train + fine-tune

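The Gym runs above all follow the same dataset-to-environment pattern, so they can be written down once as data. This is a sketch with illustrative names (`GymRun`, `GYM_RUNS`, `stages`); it is not DPPO's configuration schema, and only the dataset names, environment names, and the 200-epoch figure come from this plan.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GymRun:
    dataset: str                # offline dataset used for pre-training
    env: str                    # environment used for fine-tuning
    pretrain_epochs: int = 200  # epoch count stated in this plan


GYM_RUNS = [
    GymRun(dataset=f"{name}-medium-v2", env=f"{name}-v2")
    for name in ("hopper", "walker2d", "halfcheetah")
]


def stages(run: GymRun) -> List[str]:
    """Describe the two stages of one run, in execution order."""
    return [
        f"pretrain on {run.dataset} ({run.pretrain_epochs} epochs)",
        f"finetune on {run.env}",
    ]


if __name__ == "__main__":
    for run in GYM_RUNS:
        print(" -> ".join(stages(run)))
```

The same structure would extend to the Robomimic and D3IL tasks once their epoch counts are fixed.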
## Current Status

- **Blockers:** MuJoCo `stdio.h` compilation error in fine-tuning (Job 3445495); pre-training runs are unaffected
- **Waiting on:** Cluster resources to run validation jobs
- **Next Step:** Complete Phase 1 validation, then move to Phase 2 production runs

## Success Criteria

- [ ] All environments work in dev tests (Phase 1)
- [ ] All paper results replicated and in WandB (Phase 2)
- [ ] Complete documentation for future users