- Updated all WandB project names to use dppo- prefix for organization
- Added flexible dev testing script for all environments
- Created organized dev_tests directory for test scripts
- Fixed MuJoCo compilation issues (added GCC compiler flags)
- Documented Python 3.10 compatibility and Furniture-Bench limitation
- Validated pre-training for Gym, Robomimic, D3IL environments
- Updated experiment tracking with validation results
- Enhanced README with troubleshooting and setup instructions
- Pre-training: diffusion model on offline D4RL data (200 epochs)
- Fine-tuning: PPO with online environment interaction
- Dev test: 2 epochs only for quick verification, not full training
- Disabled WandB in the dev script to avoid the config-object vs. string error
- Successfully completed development test (Job 3445106)
- Confirmed: pre-training runs, loss decreases, checkpoints are saved
- Updated experiment tracking with the successful results
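
The dev-test settings listed above (2 epochs instead of 200, WandB disabled) could be applied as a config override along these lines. This is only a minimal sketch: the config layout, the key names (`train`, `n_epochs`, `wandb`), the helper `make_dev_config`, and the project name are hypothetical and are not taken from the repository.

```python
import copy


def make_dev_config(cfg, dev_epochs=2):
    """Return a shortened copy of cfg for a quick dev test (hypothetical helper)."""
    dev = copy.deepcopy(cfg)               # leave the full-training config untouched
    dev["train"]["n_epochs"] = dev_epochs  # quick verification only, not full training
    dev["wandb"] = None                    # assumed convention: None means "do not log to WandB"
    return dev


# Hypothetical full pre-training config; key names and project name are illustrative.
full_cfg = {
    "train": {"n_epochs": 200},
    "wandb": {"project": "dppo-gym-pretrain"},
}

dev_cfg = make_dev_config(full_cfg)
```

Copying the config rather than editing it in place keeps the full 200-epoch settings available for the real pre-training run.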