
# Fine-tuning experiments

**Update, Nov 20 2024:** In v0.7 we updated the fine-tuning configs, as we found that sample efficiency can be improved with a higher actor learning rate and other hyperparameter changes. If you would like to replicate the original experimental results from the paper, please use the configs from v0.6. Otherwise we recommend starting with the v0.7 configs for your applications.

## Comparing diffusion-based RL algorithms (Sec. 5.1)

Gym configs are under `cfg/gym/finetune/<env_name>/`, and the naming follows `ft_<alg_name>_diffusion_mlp`, e.g., `ft_awr_diffusion_mlp`. `<alg_name>` is one of `rwr`, `awr`, `dipo`, `idql`, `dql`, `qsm`, `ppo` (DPPO), and `ppo_exact` (exact likelihood). All algorithms share the same pre-trained checkpoint in each environment.
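Assuming the repo's usual Hydra-style entry point (`script/run.py`; adjust if your checkout differs), one of these algorithms can be launched by pointing at the config directory and name, e.g. AWR on Hopper:

```bash
# Fine-tune a pre-trained diffusion policy with AWR on hopper-v2;
# config dir and name follow the convention described above
python script/run.py --config-name=ft_awr_diffusion_mlp \
    --config-dir=cfg/gym/finetune/hopper-v2
```

Swapping `awr` for any of the other algorithm names runs the corresponding baseline against the same pre-trained checkpoint.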

Robomimic configs are under `cfg/robomimic/finetune/<env_name>/`, and the naming follows the same convention.

## Comparing policy parameterizations (Sec. 5.2, 5.3)

Robomimic configs are under `cfg/robomimic/finetune/<env_name>/`, and the naming follows `ft_ppo_<diffusion/gaussian/gmm>_<mlp/unet/transformer>_<img?>`. For pixel experiments, we choose pre-trained checkpoints such that the pre-training performance is similar between DPPO and the Gaussian baseline.
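Following the naming convention above, a pixel-based DPPO run in the Can task might look like the following (the entry point is an assumption; check your checkout):

```bash
# Fine-tune a pixel-based (img) diffusion MLP policy with PPO in Robomimic Can
python script/run.py --config-name=ft_ppo_diffusion_mlp_img \
    --config-dir=cfg/robomimic/finetune/can
```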

**Note:** For Can and Lift in Robomimic with DPPO, you need to manually download the final checkpoints (epoch 8000). The default checkpoints in the configs are from epoch 5000 (leaving more room for fine-tuning improvement) and are used for comparing diffusion-based RL algorithms (Sec. 5.1).

Furniture-Bench configs are under `cfg/furniture/finetune/<env_name>/`, and the naming follows `ft_<diffusion/gaussian>_<mlp/unet>`. In the paper we did not show the results of `ft_diffusion_mlp`. Running IsaacGym for the first time may take a while to set up the meshes. If you encounter an error about `libpython`, see the instructions here.

## D3IL (Sec. 6)

D3IL configs are under `cfg/d3il/finetune/avoid_<mode>/`, and the naming follows `ft_ppo_<diffusion/gaussian/gmm>_mlp`. The number of fine-tuned denoising steps can be specified with `ft_denoising_steps`.
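Since `ft_denoising_steps` is a regular config field, it can be overridden on the Hydra command line; a sketch (the avoidance mode `m1` and the entry point are assumptions):

```bash
# Fine-tune only the last 10 denoising steps in the D3IL avoidance task
python script/run.py --config-name=ft_ppo_diffusion_mlp \
    --config-dir=cfg/d3il/finetune/avoid_m1 \
    ft_denoising_steps=10
```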

## Training from scratch (App. B.2)

`ppo_diffusion_mlp` and `ppo_gaussian_mlp` under `cfg/gym/finetune/<env_name>/` are for training DPPO or a Gaussian policy from scratch.
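A from-scratch run uses the same launch pattern, just with the `ppo_*` (rather than `ft_*`) config names; a sketch assuming the `script/run.py` entry point:

```bash
# Train a Gaussian policy from scratch with PPO on hopper-v2 (no pre-trained checkpoint)
python script/run.py --config-name=ppo_gaussian_mlp \
    --config-dir=cfg/gym/finetune/hopper-v2
```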

## Comparing to exact likelihood policy gradient (App. B.5)

`ft_ppo_exact_diffusion_mlp` under `cfg/gym/finetune/hopper-v2`, `cfg/gym/finetune/halfcheetah-v2`, and `cfg/robomimic/finetune/can` are for training the diffusion policy gradient with exact likelihood. The `torchdiffeq` package needs to be installed first with `pip install -e .[exact]`.
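Putting the two steps together, an exact-likelihood run might look like this (entry point assumed, as above):

```bash
# Install the extra dependency (torchdiffeq), then launch the exact-likelihood variant
pip install -e .[exact]
python script/run.py --config-name=ft_ppo_exact_diffusion_mlp \
    --config-dir=cfg/gym/finetune/hopper-v2
```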