- Submit all 10 full replication runs on the accelerated partition
- Update experiment plan with complete validation results and full run status
- Add comprehensive full run scripts for robomimic and D3IL environments
- All validated environments now running full paper-quality experiments
- Total queue: 3 Gym + 4 Robomimic + 3 D3IL fine-tuning runs
- Complete validation status table with results for all environments
- Add WandB tracking URLs for completed fine-tuning runs
- Document technical fixes and current job queue status
- Add test scripts for remaining D3IL avoid_m3 and robomimic transport validation
- Complete SLURM test scripts for all environment types
- Gym fine-tuning: walker2d, halfcheetah validation tests
- Robomimic fine-tuning: lift validation test with scheduler fix
- D3IL validation: avoid_m1 pre-training and fine-tuning tests
- Update experiment plan with current validation status
- All major environments now have automated testing pipeline
- Simplify experiment plan with clear phases and current status
- Add complete MuJoCo setup instructions for fine-tuning
- Update install script to include all dependencies
- Document current validation progress and next steps
- Update all WandB project names to use the dppo- prefix for organization
- Add flexible dev testing script for all environments
- Create organized dev_tests directory for test scripts
- Fix MuJoCo compilation issues (add GCC compiler flags)
- Document Python 3.10 compatibility and Furniture-Bench limitation
- Validate pre-training for Gym, Robomimic, D3IL environments
- Update experiment tracking with validation results
- Enhance README with troubleshooting and setup instructions
- Pre-training: diffusion model on offline D4RL data (200 epochs)
- Fine-tuning: PPO fine-tune with online environment interaction
- Dev test: 2 epochs only for quick verification, not full training
- Configure DPPO_WANDB_ENTITY environment variable in dev script
- Update README with clear WandB setup instructions
- Remove wandb=null to enable logging when credentials are set
- Disable WandB in dev script to avoid config object vs string error
- Successfully completed development test (Job 3445106)
- Confirmed: pre-training works, loss decreases, checkpoints are saved
- Update experiment tracking with successful results
- Document installation process using Python 3.10 venv
- Add usage examples for SLURM job submission
- Document available environments and resource allocations
- Add WandB configuration instructions
- List all repository changes made for HoReKa compatibility
* move ema update within pretraining epoch
* update pretraining ema configs
* add lift and can epoch 8000 checkpoint url
* add note about EMA issue in pretraining instruction
* Sampling over both env and denoising steps in DPPO updates (#13)
* sample one from each chain
* full random sampling
* Add Proficient Human (PH) Configs and Pipeline (#16)
* fix missing cfg
* add ph config
* fix how terminated flags are added to buffer in ibrl
* offline calql for 1M gradient updates
* bug fix: set the number of calql online gradient steps to the number of new transitions collected
* add sample config for DPPO with ta=1
* fix diffusion loss when predicting initial noise
* fix dppo inds
* fix typo
* remove print statement
---------
Co-authored-by: Justin M. Lidard <jlidard@neuronic.cs.princeton.edu>
Co-authored-by: allenzren <allen.ren@princeton.edu>
* update robomimic configs
* better calql formulation
* optimize calql and ibrl training
* optimize data transfer in ppo agents
* add kitchen configs
* re-organize config folders, rerun calql and rlpd
* add scratch gym locomotion configs
* add kitchen installation dependencies
* use truncated for termination in furniture env
* update furniture and gym configs
* update README and dependencies with kitchen
* add url for new data and checkpoints
* update demo RL configs
* update batch sizes for furniture unet configs
* raise error about dropout in residual mlp
* fix observation bug in bc loss
---------
Co-authored-by: Justin Lidard <60638575+jlidard@users.noreply.github.com>
Co-authored-by: Justin M. Lidard <jlidard@neuronic.cs.princeton.edu>