* Sampling over both env and denoising steps in DPPO updates (#13)
* sample one from each chain
* full random sampling
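The two sampling schemes named above can be sketched as follows. This is a minimal illustration, not the repo's implementation: the grid sizes `B`, `E`, `K` (env steps, parallel envs, denoising steps) and the batch size are hypothetical.

```python
import random

B, E, K = 8, 4, 10      # env steps, parallel envs, denoising steps (hypothetical sizes)
batch_size = 16
rng = random.Random(0)

# "full random sampling": draw (env step, env, denoising step) triples
# uniformly over the whole B x E x K grid.
flat = rng.sample(range(B * E * K), batch_size)
triples = [(i // (E * K), (i // K) % E, i % K) for i in flat]

# "sample one from each chain": pick exactly one denoising step
# per (env step, env) denoising chain instead.
one_per_chain = {(b, e): rng.randrange(K) for b in range(B) for e in range(E)}
```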
* Add Proficient Human (PH) Configs and Pipeline (#16)
* fix missing cfg
* add ph config
* fix how terminated flags are added to buffer in ibrl
* add ph config
* offline calql for 1M gradient updates
* bug fix: set the number of calql online gradient steps to the number of newly collected transitions
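The rule behind that fix can be stated in two lines. The function name and rollout shape are hypothetical; the point is that the online update count tracks the fresh data, not a fixed constant.

```python
# After each environment rollout, take one gradient step per newly
# collected transition (update-to-data ratio of 1), rather than a
# fixed number of steps per rollout.
def num_online_gradient_steps(new_transitions):
    return len(new_transitions)

rollout = [("s", "a", 0.0, "s_next", False)] * 37   # 37 new transitions
steps = num_online_gradient_steps(rollout)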
* add sample config for DPPO with ta=1
* fix diffusion loss when predicting initial noise
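For context on the epsilon-prediction loss referenced above: when the network predicts the initial noise, the regression target is the sampled noise itself, not the clean sample. A minimal sketch (the schedule value, shapes, and names are hypothetical; the exact bug in the repo is not specified here):

```python
import math
import random

rng = random.Random(0)
alphabar_t = 0.5                       # cumulative noise schedule value (hypothetical)
x0 = [0.3, -0.7, 1.1]                  # clean sample
eps = [rng.gauss(0.0, 1.0) for _ in x0]

# Forward-noised input to the network.
x_t = [math.sqrt(alphabar_t) * x + math.sqrt(1.0 - alphabar_t) * e
       for x, e in zip(x0, eps)]

# Epsilon-prediction loss: regress the network output against eps, not x0.
def eps_prediction_loss(eps_pred, eps):
    return sum((p - e) ** 2 for p, e in zip(eps_pred, eps)) / len(eps)

loss_oracle = eps_prediction_loss(eps, eps)   # a perfect predictor gives zero loss
```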
* fix dppo inds
* fix typo
* remove print statement
---------
Co-authored-by: Justin M. Lidard <jlidard@neuronic.cs.princeton.edu>
Co-authored-by: allenzren <allen.ren@princeton.edu>
* update robomimic configs
* better calql formulation
* optimize calql and ibrl training
* optimize data transfer in ppo agents
* add kitchen configs
* re-organize config folders, rerun calql and rlpd
* add scratch gym locomotion configs
* add kitchen installation dependencies
* use truncated for termination in furniture env
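The terminated/truncated distinction (Gymnasium-style step semantics) matters for value bootstrapping: only a true terminal state should zero the bootstrap, while a time-limit truncation should still bootstrap from the next state's value. A minimal sketch of that rule (function name hypothetical):

```python
# Returns the factor applied to the next-state value in the TD target:
# 1.0 keeps the bootstrap, 0.0 drops it. Truncation alone keeps it.
def bootstrap_mask(terminated, truncated):
    return 0.0 if terminated else 1.0

mask_truncated = bootstrap_mask(terminated=False, truncated=True)   # time limit hit
mask_terminal = bootstrap_mask(terminated=True, truncated=False)    # true terminal
```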
* update furniture and gym configs
* update README and dependencies with kitchen
* add url for new data and checkpoints
* update demo RL configs
* update batch sizes for furniture unet configs
* raise error about dropout in residual mlp
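A guard like the one described above can be sketched as follows. The class name and constructor signature are hypothetical; the idea is to fail loudly on an unsupported dropout setting instead of silently accepting it.

```python
class ResidualMLP:
    """Toy stand-in for a residual MLP that does not support dropout."""

    def __init__(self, hidden_sizes, dropout=0.0):
        if dropout > 0.0:
            raise ValueError(
                f"dropout is not supported in ResidualMLP; got dropout={dropout}"
            )
        self.hidden_sizes = hidden_sizes
```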
* fix observation bug in bc loss
---------
Co-authored-by: Justin Lidard <60638575+jlidard@users.noreply.github.com>
Co-authored-by: Justin M. Lidard <jlidard@neuronic.cs.princeton.edu>