Guide for Sim2Real Training & Deployment

This guide provides guide to run sim-to-real experiments using FastTD3 and BoosterGym.

⚠️ Warning: Deploying RL policies to real hardware can be sometimes very dangerous. Please make sure that you understand everything, check your policies work well in simulation, set every robot configuration correct (e.g., damping, stiffness, torque limits, etc), and follow proper safety protocols. Simply copy-pasting commands in this README is not safe.

⚙️ Prerequisites

Install dependencies for Playground experiments (see README.md)

Then, install fast_td3 package with pip install -e . so you can import its classes in BoosterGym (see fast_td3_deploy.py).

⚠️ Note: Our sim-to-real experiments depend on our customized MuJoCo Playground that supports T1LowDimJoystick tasks for 12-DOF T1 control instead of 23-DOF T1 control in T1Joystick tasks.

🚀 Training in simulation

Users can train deployable policies for Booster T1 with FastTD3 using the below script:

python fast_td3/train.py --env_name T1LowDimJoystickRoughTerrain --exp_name FastTD3 --use_domain_randomization --use_push_randomization --total_timesteps 1000000 --render_interval 0 --seed 2

⚠️ Note: There is no 'guaranteed' number of training steps that can ensure safe real-world deployment. Usually, the gait becomes more stable with longer training. Please check the quality of gaits via sim-to-sim transfer, and fine-tune the policy to fix the issues. Use the checkpoints in models directory for sim-to-sim or sim-to-real transfer.

⚠️ Note: We set render_interval to 0 to avoid dumping a lot of videos into wandb. Make sure to set it to non-zero values if you want to render videos during training.

(Optional) 2-Stage Training

For faster convergence, users can consider introducing curriculum to the training -- so that the robot first learns to walk in a flat terrain without push perturbations. For this, train policies with the below script:

STAGE1_STEPS = 100000
STAGE2_STEPS = 300000  # Effective steps: 300000 - 200000 = 100000
SEED = 2
CHECKPOINT_PATH = T1LowDimJoystickFlatTerrain__FastTD3-Stage1__${SEED}_final.pt

conda activate fasttd3_playground

# Stage 1 training
python fast_td3/train.py --env_name T1LowDimJoystickFlatTerrain --exp_name FastTD3-Stage1 --use_domain_randomization --no_use_push_randomization --total_timesteps ${STAGE1_STEPS} --render_interval 0 --seed ${SEED}

# Stage 2 training
python fast_td3/train.py --env_name T1LowDimJoystickRoughTerrain --exp_name FastTD3-Stage2 --use_domain_randomization --use_push_randomization --total_timesteps ${STAGE2_STEPS} --render_interval 0 --checkpoint_path ${CHECKPOINT_PATH} --seed ${SEED}

Again, 100K and 200K steps do not guarantee safe real-world deployment. Please check the quality of gaits via sim-to-sim transfer, and fine-tune the policy to fix the issues. Use the final checkpoint (models/T1LowDimJoystickRoughTerrain__FastTD3-Stage2__${SEED}_final.pt) for sim-to-sim or sim-to-real transfer.

🛝 Deployment with BoosterGym

We use the customized version of BoosterGym for deployment with FastTD3.

First, clone our fork of BoosterGym.

git clone https://github.com/carlosferrazza/booster_gym.git

Then, follow the guide to install dependencies for BoosterGym.

Sim-to-Sim Transfer

You can check whether the trained policy transfers to non-MJX version of MuJoCo. Use the following commands in a machine that supports rendering to test sim-to-sim transfer:

cd <YOUR_WORKSPACE>/booster_gym

# Activate your BoosterGym virtual environemnt

# Launch MuJoCo simulation
python play_mujoco.py --task=T1 --checkpoint=<CHECKPOINT_PATH>
mjpython play_mujoco.py --task=T1 --checkpoint=<CHECKPOINT_PATH>  # for Mac

Sim-to-Real Transfer

First, prepare a JIT-scripted checkpoint

# Python snippets for JIT-scripting checkpoints
import torch
from fast_td3 import load_policy
policy = load_policy(<CHECKPOINT_PATH>)
scripted_policy = torch.jit.script(policy)
scripted_policy.save(<JIT_CHECKPOINT_PATH>)

Then, deploy this JIT-scripted checkpoint by following the guide on Booster T1 Deployment.

⚠️ Warning: Please double-check every value in robot configuration (booster_gym/deploy/configs/T1.yaml) is correctly set! If values for position control such as damping or stiffness are set differently, your robot may perform dangerous behaviors.

⚠️ Warning: You may want to use different configuration (e.g., damping and stiffness, etc) for your own experiments. Just make sure to thoroughly test it in simulation and make sure to set the values correctly.

🚀 That's it! Hope everything went smoothly, and be aware of your safety.

4.8 KiB Raw Permalink Blame History