Clarify pre-training vs fine-tuning phases and dev test purpose
- Pre-training: diffusion model on offline D4RL data (200 epochs)
- Fine-tuning: PPO fine-tune with online environment interaction
- Dev test: 2 epochs only for quick verification, not full training
parent 80339cad52
commit a67f474fc0
@@ -8,25 +8,26 @@
 - All dependencies installed including PyTorch, d4rl, dm-control
 
 ### Initial Testing
-✅ **DPPO Confirmed Working on HoReKa**
-- Successfully completed dev test (Job ID 3445106)
-- Pre-training working: 2 epochs, loss reduction 0.2494→0.2010
-- Model checkpoints saved correctly
-- Ready for full experiments
+✅ **DPPO Confirmed Working on HoReKa with WandB**
+- Successfully completed dev test (Job ID 3445117)
+- Quick verification: 2 epochs only (not full training), loss reduction 0.2494→0.2010
+- WandB logging working: https://wandb.ai/dominik_roth/gym-hopper-medium-v2-pretrain/runs/rztwqutf
+- Model checkpoints and logging fully functional
+- Ready for full 200-epoch production runs
 
 ## Experiments To Run
 
 ### 1. Reproduce Paper Results - Gym Tasks
 
-**Pre-training Phase**:
-- hopper-medium-v2
-- walker2d-medium-v2
-- halfcheetah-medium-v2
+**Pre-training Phase** (Train diffusion model on offline D4RL datasets):
+- hopper-medium-v2 → diffusion model trained on offline data (200 epochs)
+- walker2d-medium-v2 → diffusion model trained on offline data (200 epochs)
+- halfcheetah-medium-v2 → diffusion model trained on offline data (200 epochs)
 
-**Fine-tuning Phase**:
-- hopper-v2
-- walker2d-v2
-- halfcheetah-v2
+**Fine-tuning Phase** (PPO fine-tune diffusion model with online interaction):
+- hopper-v2 → fine-tune pre-trained hopper model with PPO + online env
+- walker2d-v2 → fine-tune pre-trained walker2d model with PPO + online env
+- halfcheetah-v2 → fine-tune pre-trained halfcheetah model with PPO + online env
 
 **Settings**: Paper hyperparameters, 3 seeds each
 
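The experiment plan in this hunk (three D4RL datasets, 200 pre-training epochs, 3 seeds each, then PPO fine-tuning on the matching environment) can be sketched as a small job matrix. This is an illustrative sketch only, not code from the DPPO repository: the `TASKS` mapping, the `build_plan` helper, the job dictionary keys, and the concrete seed values are all assumptions made for the example.

```python
from itertools import product

# Offline dataset -> online environment pairs, as listed in the README diff.
TASKS = {
    "hopper-medium-v2": "hopper-v2",
    "walker2d-medium-v2": "walker2d-v2",
    "halfcheetah-medium-v2": "halfcheetah-v2",
}
PRETRAIN_EPOCHS = 200  # full pre-training run, per the README
DEV_TEST_EPOCHS = 2    # quick verification only, not full training
SEEDS = (0, 1, 2)      # hypothetical values; the README only says "3 seeds each"


def build_plan(dev_test: bool = False) -> list[dict]:
    """Return one job description per (task, seed, phase)."""
    epochs = DEV_TEST_EPOCHS if dev_test else PRETRAIN_EPOCHS
    plan = []
    for (dataset, env), seed in product(TASKS.items(), SEEDS):
        # Phase 1: train the diffusion model on the offline dataset.
        plan.append(
            {"phase": "pretrain", "task": dataset, "seed": seed, "epochs": epochs}
        )
        # Phase 2: PPO fine-tuning starts from the matching pre-trained model.
        plan.append(
            {"phase": "finetune", "task": env, "seed": seed, "init_from": dataset}
        )
    return plan


if __name__ == "__main__":
    for job in build_plan():
        print(job)
```

Calling `build_plan(dev_test=True)` yields the same matrix with 2-epoch pre-training jobs, which mirrors the quick verification run described in the commit message.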
@@ -92,6 +93,18 @@ No issues with the DPPO repository - installation and setup completed successful
 
 ## Next Steps
 
-1. Run corrected dev test
-2. Begin systematic pre-training experiments
-3. Document successful runs and results
+### Immediate Tasks (To Verify All Environments Work)
+
+1. **Test remaining Gym environments**:
+- [ ] walker2d-medium-v2 (2 epochs dev test)
+- [ ] halfcheetah-medium-v2 (2 epochs dev test)
+
+2. **Test other environment types**:
+- [ ] Robomimic: can task (basic test)
+- [ ] D3IL: avoid_m1 (basic test)
+
+3. **Full production runs** (after confirming all work):
+- [ ] Full pre-training: hopper, walker2d, halfcheetah (200 epochs each)
+- [ ] Fine-tuning experiments
+
+**Status**: Only hopper-medium-v2 confirmed working. Need to verify other environments before production runs.
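The status line in this hunk implies a simple gate: production runs start only once every dev test has passed. A minimal sketch of that gate follows, assuming a hand-maintained status table. The dictionary keys are plain labels chosen for this example (not DPPO config names or CLI arguments), and both helper functions are hypothetical.

```python
# Dev-test status as stated in the README diff: only hopper-medium-v2 is confirmed.
# Keys are illustrative labels, not identifiers from the DPPO repository.
DEV_TEST_PASSED = {
    "hopper-medium-v2": True,
    "walker2d-medium-v2": False,
    "halfcheetah-medium-v2": False,
    "robomimic-can": False,
    "d3il-avoid_m1": False,
}


def pending_checks(status: dict[str, bool]) -> list[str]:
    """Return the names of environments whose dev test has not passed yet."""
    return sorted(name for name, passed in status.items() if not passed)


def ready_for_production(status: dict[str, bool]) -> bool:
    """Production runs are allowed only when no dev test is pending."""
    return not pending_checks(status)


if __name__ == "__main__":
    if ready_for_production(DEV_TEST_PASSED):
        print("All dev tests passed; full 200-epoch runs can start.")
    else:
        print("Still to verify:", ", ".join(pending_checks(DEV_TEST_PASSED)))
```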