# FastTD3 HoReKa Experiment Plan

*Added by Dominik - Paper Replication Study*

## ✅ Proof of Concept Results

**Initial Success**: [HoReKa Dev Run](https://wandb.ai/rl-network-scaling/FastTD3_HoReKa_Dev?nw=nwuserdominik_roth)

- **Task**: T1JoystickFlatTerrain
- **Duration**: 7 minutes (5000 timesteps)
- **Performance**: Successfully training at ~29 it/s
- **Key Achievement**: Fixed JAX/PyTorch dtype mismatch issue (removed JAX_ENABLE_X64)
- **Status**: ✅ Environment working, ready for full-scale experiments

## Experiments to Replicate

### Phase 1: MuJoCo Playground (Figure 11 from paper)

- `T1JoystickFlatTerrain` (3600s)
- `T1JoystickRoughTerrain` (3600s)
- `G1JoystickFlatTerrain` (3600s)
- `G1JoystickRoughTerrain` (3600s)

**Hyperparameters (from paper):**

- total_timesteps: 500000
- num_envs: 2048
- batch_size: 32768
- buffer_size: 102400 (50K per env)
- eval_interval: 25000

### Phase 2: IsaacLab (Figure 10 from paper)

- `Isaac-Velocity-Flat-G1-v0` (3600s)
- `Isaac-Velocity-Rough-G1-v0` (3600s)
- `Isaac-Repose-Cube-Allegro-Direct-v0` (3600s)
- `Isaac-Repose-Cube-Shadow-Direct-v0` (3600s)
- `Isaac-Velocity-Flat-H1-v0` (3600s)
- `Isaac-Velocity-Rough-H1-v0` (3600s)

**Hyperparameters:**

- total_timesteps: 1000000
- num_envs: 1024
- batch_size: 32768
- buffer_size: 51200
- eval_interval: 50000

### Phase 3: HumanoidBench (Figure 9 from paper - subset)

- `h1hand-walk` (10800s)
- `h1hand-run` (10800s)
- `h1hand-hurdle` (10800s)
- `h1hand-stair` (10800s)
- `h1hand-slide` (10800s)

**Hyperparameters:**

- total_timesteps: 2000000
- num_envs: 256
- batch_size: 16384
- buffer_size: 12800
- eval_interval: 100000

## Usage

Submit Phase 1:

```bash
python submit_experiment_batch.py --phase 1 --seeds 3
```

Monitor progress:

```bash
python monitor_experiments.py --watch
```
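
Before submitting, it can help to estimate the compute budget each phase implies. The sketch below copies the task lists and hyperparameters from the plan above into a small Python table and derives the number of runs, an upper bound on GPU-hours, and the number of evaluations per run. It rests on two assumptions that the plan does not state explicitly: that the value in parentheses after each task is a per-run wall-clock limit in seconds, and that each run occupies one GPU. The names `Phase`, `PHASES`, `gpu_hours`, and `num_evals` are hypothetical helpers for illustration and are not part of `submit_experiment_batch.py`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    """One experiment phase, with values copied from the plan."""
    name: str
    tasks: tuple[str, ...]
    time_limit_s: int  # ASSUMPTION: per-run wall-clock limit in seconds
    total_timesteps: int
    num_envs: int
    batch_size: int
    buffer_size: int
    eval_interval: int


PHASES = {
    1: Phase(
        name="MuJoCo Playground",
        tasks=("T1JoystickFlatTerrain", "T1JoystickRoughTerrain",
               "G1JoystickFlatTerrain", "G1JoystickRoughTerrain"),
        time_limit_s=3600, total_timesteps=500_000, num_envs=2048,
        batch_size=32768, buffer_size=102400, eval_interval=25_000,
    ),
    2: Phase(
        name="IsaacLab",
        tasks=("Isaac-Velocity-Flat-G1-v0", "Isaac-Velocity-Rough-G1-v0",
               "Isaac-Repose-Cube-Allegro-Direct-v0",
               "Isaac-Repose-Cube-Shadow-Direct-v0",
               "Isaac-Velocity-Flat-H1-v0", "Isaac-Velocity-Rough-H1-v0"),
        time_limit_s=3600, total_timesteps=1_000_000, num_envs=1024,
        batch_size=32768, buffer_size=51200, eval_interval=50_000,
    ),
    3: Phase(
        name="HumanoidBench",
        tasks=("h1hand-walk", "h1hand-run", "h1hand-hurdle",
               "h1hand-stair", "h1hand-slide"),
        time_limit_s=10800, total_timesteps=2_000_000, num_envs=256,
        batch_size=16384, buffer_size=12800, eval_interval=100_000,
    ),
}


def gpu_hours(phase: Phase, seeds: int) -> float:
    """Upper bound on GPU-hours, ASSUMING one GPU per run and that
    every run uses its full time limit."""
    return len(phase.tasks) * seeds * phase.time_limit_s / 3600


def num_evals(phase: Phase) -> int:
    """Number of evaluation points implied by eval_interval."""
    return phase.total_timesteps // phase.eval_interval


if __name__ == "__main__":
    seeds = 3  # matches the --seeds 3 example in the Usage section
    for idx, phase in PHASES.items():
        print(f"Phase {idx} ({phase.name}): "
              f"{len(phase.tasks) * seeds} runs, "
              f"<= {gpu_hours(phase, seeds):.0f} GPU-hours, "
              f"{num_evals(phase)} evals per run")
```

Under these assumptions and with 3 seeds, the upper bounds come to 12 GPU-hours for Phase 1, 18 for Phase 2, and 45 for Phase 3, and every phase evaluates 20 times per run. Actual usage will be lower if runs finish before their time limit.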