# DPPO Experiment Plan

## Current Status

### Setup Complete ✅

- Installation successful on HoReKa with a Python 3.10 venv
- SLURM scripts created for automated job submission
- All dependencies installed, including PyTorch, d4rl, and dm-control

### Initial Testing

✅ **DPPO Confirmed Working on HoReKa**

- Successfully completed dev test (Job ID 3445106)
- Pre-training works: over 2 epochs, training loss fell from 0.2494 to 0.2010
- Model checkpoints saved correctly
- Ready for full experiments

## Experiments To Run

### 1. Reproduce Paper Results: Gym Tasks

**Pre-training Phase**:

- hopper-medium-v2
- walker2d-medium-v2
- halfcheetah-medium-v2

**Fine-tuning Phase**:

- hopper-v2
- walker2d-v2
- halfcheetah-v2

**Settings**: Paper hyperparameters, 3 seeds each

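
The full Gym plan has 3 tasks, 2 phases, and 3 seeds. If the seeds apply to both phases, that comes to 18 runs. The sketch below is a dry run: it only prints one manual `sbatch` command per run. Two parts of it are hypothetical. First, the `SEED` variable: `slurm/run_dppo_gym.sh` is not known to read it and would have to be extended. Second, the seed values 0, 1, 2: the paper's exact seeds are not recorded here.

```bash
#!/usr/bin/env bash
# Dry run: print one sbatch command per (phase, task, seed) of the Gym plan.
# SEED and the values 0 1 2 are hypothetical; run_dppo_gym.sh would have to be
# extended to read SEED before these commands mean anything.
print_gym_matrix() {
  local phase task seed
  for phase in pretrain finetune; do
    for task in hopper walker2d halfcheetah; do
      for seed in 0 1 2; do
        echo "TASK=${task} MODE=${phase} SEED=${seed} sbatch slurm/run_dppo_gym.sh"
      done
    done
  done
}

print_gym_matrix
```

Piping this output straight into a shell would queue fine-tuning before pre-training has finished. Submitting the two phases separately, or chaining them with a job dependency, avoids that.
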
### 2. Additional Environments (Future)

**Robomimic Suite**:

- lift, can, square, transport

**D3IL Suite**:

- avoid_m1, avoid_m2, avoid_m3

**Furniture-Bench Suite**:

- one_leg, lamp, round_table (low and medium difficulty)

## Running Experiments

### Quick Development Test

```bash
./submit_job.sh dev
```

### Gym Pre-training

```bash
./submit_job.sh gym hopper pretrain
./submit_job.sh gym walker2d pretrain
./submit_job.sh gym halfcheetah pretrain
```

### Gym Fine-tuning (after pre-training completes)

```bash
./submit_job.sh gym hopper finetune
./submit_job.sh gym walker2d finetune
./submit_job.sh gym halfcheetah finetune
```

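
Before launching a fine-tuning run, it helps to confirm that the matching pre-training job actually finished. Below is a minimal sketch using SLURM's `sacct`. The helper name `job_succeeded` is hypothetical. The sketch assumes job accounting is enabled on HoReKa, so that `sacct` can report a job's final state.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if SLURM recorded the job as COMPLETED.
# -X: allocation line only (skip job steps), -n: no header,
# -P: '|'-delimited output without column padding.
job_succeeded() {
  local state
  state=$(sacct -j "$1" -X -n -P --format=State)
  [[ "$state" == "COMPLETED" ]]
}

# Usage (PRETRAIN_JOB_ID = the ID printed when pre-training was submitted):
#   job_succeeded "$PRETRAIN_JOB_ID" && ./submit_job.sh gym hopper finetune
```
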
### Manual SLURM Submission

```bash
# Pass the task and mode to the job script as environment variables
TASK=hopper MODE=pretrain sbatch slurm/run_dppo_gym.sh
```

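
Instead of waiting for pre-training by hand, SLURM can queue fine-tuning with a dependency, so that it starts only after pre-training exits successfully. The sketch below builds on the manual `TASK=... MODE=... sbatch` form. It assumes `slurm/run_dppo_gym.sh` locates the pre-trained checkpoint on its own when `MODE=finetune`, which is not verified here. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: submit pre-training, then fine-tuning that waits on it.
# sbatch --parsable prints only the job ID (or "ID;cluster" on multi-cluster
# setups), so any ";cluster" suffix is stripped.
# sbatch exports the caller's environment by default, so TASK/MODE reach the job.
submit_pretrain_then_finetune() {
  local task=$1 pre_id ft_id
  pre_id=$(TASK="$task" MODE=pretrain sbatch --parsable slurm/run_dppo_gym.sh) || return 1
  pre_id=${pre_id%%;*}
  # afterok: start only if the pre-training job exits with code 0.
  ft_id=$(TASK="$task" MODE=finetune sbatch --parsable \
    --dependency="afterok:${pre_id}" slurm/run_dppo_gym.sh) || return 1
  echo "${task}: pretrain=${pre_id} finetune=${ft_id%%;*}"
}

# Usage:
#   for t in hopper walker2d halfcheetah; do submit_pretrain_then_finetune "$t"; done
```

If pre-training fails, the dependent job stays pending with reason `DependencyNeverSatisfied`. Adding `--kill-on-invalid-dep=yes` to the second `sbatch` call makes SLURM cancel it instead.
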
## Job Tracking

| Job ID | Type | Task | Mode | Status | Duration | Results |
|--------|------|------|------|--------|----------|---------|
| 3445106 | dev test | hopper | pretrain | ✅ SUCCESS | 2m11s | Train loss: 0.2494→0.2010 |

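
`sacct` can supply the Status and Duration columns. Its `Elapsed` field is documented as `[DD-[HH:]]MM:SS`, while this table uses a compact form like `2m11s`. Below is a small converter sketch; the function name `fmt_elapsed` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert sacct's Elapsed ([D-][HH:]MM:SS) to e.g. "2m11s".
fmt_elapsed() {
  local t=$1 days=0
  if [[ $t == *-* ]]; then
    days=${t%%-*}
    t=${t#*-}
  fi
  local -a p
  IFS=: read -r -a p <<< "$t"
  # Left-pad to [H, M, S] when the hour field is omitted.
  while (( ${#p[@]} < 3 )); do p=(0 "${p[@]}"); done
  # 10# forces base 10 so fields like "08" are not parsed as octal.
  local h=$((10#${p[0]} + 24 * 10#$days)) m=$((10#${p[1]})) s=$((10#${p[2]}))
  if (( h > 0 )); then
    printf '%dh%dm%ds\n' "$h" "$m" "$s"
  else
    printf '%dm%ds\n' "$m" "$s"
  fi
}

# Usage:
#   sacct -j 3445106 -X -n -P --format=JobID,State,Elapsed
#   fmt_elapsed "$(sacct -j 3445106 -X -n -P --format=Elapsed)"
```
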
## Configuration Notes

### WandB Setup Required

```bash
export WANDB_API_KEY=<your_api_key>
export WANDB_ENTITY=<your_username>
```

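
A quick pre-submission check catches a missing export before a job is queued. Below is a minimal sketch; the function name is hypothetical. The fallback shown in the usage comment uses `WANDB_MODE=disabled`, a standard wandb environment variable that turns logging off. This assumes the training code does not set the WandB mode explicitly.

```bash
#!/usr/bin/env bash
# Hypothetical pre-submission check: report which WandB variables are unset.
check_wandb_env() {
  local var missing=0
  for var in WANDB_API_KEY WANDB_ENTITY; do
    # ${!var:-} expands the variable whose *name* is stored in $var.
    if [[ -z "${!var:-}" ]]; then
      echo "missing: ${var}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: check before submitting; otherwise fall back to disabling WandB.
#   check_wandb_env || export WANDB_MODE=disabled
```
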
### Resource Requirements

- **Dev jobs**: 30 min walltime, 24 GB RAM, 8 CPUs, `dev_accelerated` partition
- **Production**: 8 h walltime, 32 GB RAM, 40 CPUs, `accelerated` partition

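
For reference, the dev profile maps roughly onto these `#SBATCH` directives. This is a sketch, not the contents of `slurm/run_dppo_gym.sh`. It assumes a single task, so "CPUs" becomes `--cpus-per-task`. It omits the GPU request because the GPU count is not recorded here.

```bash
#!/bin/bash
# Dev profile. For production, use --partition=accelerated, --time=08:00:00,
# --mem=32G, --cpus-per-task=40.
#SBATCH --partition=dev_accelerated
#SBATCH --time=00:30:00
#SBATCH --mem=24G
#SBATCH --cpus-per-task=8
```

Options passed on the `sbatch` command line override the matching `#SBATCH` lines, so one script can serve both profiles.
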
## Issues Encountered

No issues with the DPPO repository itself; installation and setup completed successfully. WandB is disabled in the dev script to avoid a "config object vs string" error.

## Next Steps

1. Run corrected dev test ✅ (Job 3445106)
2. Begin systematic pre-training experiments
3. Document successful runs and results