- Simplify experiment plan with clear phases and current status
- Add complete MuJoCo setup instructions for fine-tuning
- Update install script to include all dependencies
- Document current validation progress and next steps
DPPO Experiment Plan
What's Done ✅
Installation & Setup:
- ✅ Python 3.10 venv working on HoReKa
- ✅ All dependencies installed (gym, robomimic, d3il)
- ✅ WandB logging configured with "dppo-" project prefix
- ✅ mujoco-py compilation fixed by setting the required environment variables
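To illustrate the "dppo-" project prefix, here is a minimal sketch of how project names could be built. Only the prefix comes from this plan. The helper `wandb_project` and the `<suite>-<mode>` suffix scheme are hypothetical and may not match the actual configuration.

```python
# Prefix stated in this plan; everything after it is an assumed naming scheme.
WANDB_PROJECT_PREFIX = "dppo-"


def wandb_project(suite: str, mode: str) -> str:
    """Build a project name from the prefix, suite, and mode (hypothetical scheme)."""
    return f"{WANDB_PROJECT_PREFIX}{suite}-{mode}"


if __name__ == "__main__":
    print(wandb_project("gym", "pretrain"))
```

Keeping the prefix in one constant would make it easy to filter all runs of this replication in the WandB UI.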
Validated Pre-training:
- ✅ Gym: hopper, walker2d, halfcheetah (all working with data download & WandB logging)
- ✅ Robomimic: lift (working)
- ✅ D3IL: avoid_m1 (working)
What We're Doing Right Now 🔄
Current Jobs Running:
- Job 3445495: Testing hopper fine-tuning (validates MuJoCo fix)
- Job 3445498: Testing robomimic can pre-training
What Needs to Be Done 📋
Phase 1: Complete Installation Validation
Goal: Confirm every environment works in both pre-train and fine-tune modes
Remaining Pre-training Tests:
- Robomimic: can, square, transport
- D3IL: avoid_m2, avoid_m3
Fine-tuning Tests (after MuJoCo validation):
- Gym: hopper, walker2d, halfcheetah
- Robomimic: lift, can, square, transport
- D3IL: avoid_m1, avoid_m2, avoid_m3
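The Phase 1 test matrix above can be sketched as a small script that lists what is still open. The task names and the validated pre-training runs are taken from this plan. The data layout and the function `remaining_tests` are illustrative assumptions, not part of the DPPO codebase. Jobs that are still running count as not yet validated.

```python
# Task lists copied from the plan; the structure is only illustrative.
TASKS = {
    "gym": ["hopper", "walker2d", "halfcheetah"],
    "robomimic": ["lift", "can", "square", "transport"],
    "d3il": ["avoid_m1", "avoid_m2", "avoid_m3"],
}
MODES = ["pretrain", "finetune"]

# Combinations reported as working under "What's Done".
VALIDATED = {
    ("gym", "hopper", "pretrain"),
    ("gym", "walker2d", "pretrain"),
    ("gym", "halfcheetah", "pretrain"),
    ("robomimic", "lift", "pretrain"),
    ("d3il", "avoid_m1", "pretrain"),
}


def remaining_tests():
    """Return every (suite, task, mode) combination not yet validated."""
    all_tests = [
        (suite, task, mode)
        for suite, tasks in TASKS.items()
        for task in tasks
        for mode in MODES
    ]
    return [test for test in all_tests if test not in VALIDATED]


if __name__ == "__main__":
    for suite, task, mode in remaining_tests():
        print(f"{suite:10s} {task:12s} {mode}")
```

Once a job finishes successfully, adding its tuple to `VALIDATED` removes it from the list. Phase 1 is complete when the script prints nothing.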
Phase 2: Paper Results Generation
Goal: Run full experiments to replicate paper results
Gym Tasks (Core Paper Results):
- hopper-medium-v2 → hopper-v2: Pre-train (200 epochs) + Fine-tune
- walker2d-medium-v2 → walker2d-v2: Pre-train (200 epochs) + Fine-tune
- halfcheetah-medium-v2 → halfcheetah-v2: Pre-train (200 epochs) + Fine-tune
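As a rough sketch, the three Gym runs above could be described by one small data structure so that dataset names, environment names, and the 200-epoch pre-training budget stay consistent. The class `GymRun` and the function `gym_runs` are hypothetical names used for illustration only. The actual experiments are driven by the repository's own configs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GymRun:
    dataset: str                # offline dataset used for pre-training
    env: str                    # environment used for fine-tuning
    pretrain_epochs: int = 200  # epoch count stated in the plan


GYM_TASKS = ["hopper", "walker2d", "halfcheetah"]


def gym_runs(tasks=GYM_TASKS):
    """Derive the dataset and environment names from each task name."""
    return [GymRun(dataset=f"{t}-medium-v2", env=f"{t}-v2") for t in tasks]


if __name__ == "__main__":
    for run in gym_runs():
        print(f"{run.dataset} -> {run.env}: "
              f"pre-train {run.pretrain_epochs} epochs, then fine-tune")
```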
Extended Results:
- All Robomimic tasks: full pre-train + fine-tune
- All D3IL tasks: full pre-train + fine-tune
Current Status
- Blockers: None - all technical issues resolved
- Waiting on: Cluster resources to run validation jobs
- Next Step: Complete Phase 1 validation, then move to Phase 2 production runs
Success Criteria
- All environments pass dev tests in both pre-train and fine-tune modes (Phase 1)
- All paper results replicated and logged to WandB (Phase 2)
- Complete documentation for future users