
# FastTD3 HoReKa Experiment Plan
*Added by Dominik - Paper Replication Study*

## ✅ Proof of Concept Results
**Initial Success**: [HoReKa Dev Run](https://wandb.ai/rl-network-scaling/FastTD3_HoReKa_Dev?nw=nwuserdominik_roth)

- **Task**: T1JoystickFlatTerrain
- **Duration**: 7 minutes (5000 timesteps)
- **Performance**: Successfully training at ~29 it/s
- **Key Achievement**: Fixed the JAX/PyTorch dtype mismatch by removing the `JAX_ENABLE_X64` setting
- **Status**: ✅ Environment working, ready for full-scale experiments
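The dtype fix above can be guarded against regressions with a small helper. This is a minimal sketch, not the change that was actually made in this repo: the function name `disable_jax_x64` is hypothetical, and it assumes the variable was being set in the process environment (for example by a job script). JAX reads `JAX_ENABLE_X64` when it is imported, so the helper has to run before `import jax`.

```python
import os


def disable_jax_x64(env=os.environ):
    """Drop JAX_ENABLE_X64 so JAX keeps its default 32-bit dtypes.

    Hypothetical helper: call it before importing jax.
    Returns the previous value, or None if the variable was not set.
    """
    return env.pop("JAX_ENABLE_X64", None)


previous = disable_jax_x64()
if previous is not None:
    print(f"Removed JAX_ENABLE_X64={previous} from the environment")
```

With the variable unset, JAX arrays default to 32-bit floats, which lines up with PyTorch's default `float32` tensors.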

## 🚧 Currently Running Jobs

### Phase 1: MuJoCo Playground - SUBMITTED ✅
**SLURM Job IDs**: 3367710-3367723 (12 jobs total; 3367714 and 3367715 fall in this range but are not part of the batch)
- ⏳ T1JoystickFlatTerrain (seeds 1,2,3) - Jobs: 3367710, 3367711, 3367712
- ⏳ T1JoystickRoughTerrain (seeds 1,2,3) - Jobs: 3367713, 3367716, 3367717
- ⏳ G1JoystickFlatTerrain (seeds 1,2,3) - Jobs: 3367718, 3367719, 3367720
- ⏳ G1JoystickRoughTerrain (seeds 1,2,3) - Jobs: 3367721, 3367722, 3367723
- **Status**: All jobs pending in queue
- **Monitor**: `python monitor_experiments.py logs/experiment_tracking_1753196960.yaml --watch`
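As a quick sanity check on the job list above, the following sketch collects the listed IDs, confirms the count, and reports which IDs inside the overall range do not belong to this batch. The dictionary simply restates the IDs from the list; the helper names are illustrative and not part of the repo's scripts.

```python
# Job IDs as listed in this plan, keyed by task (seeds 1, 2, 3 in order).
PHASE1_JOBS = {
    "T1JoystickFlatTerrain": [3367710, 3367711, 3367712],
    "T1JoystickRoughTerrain": [3367713, 3367716, 3367717],
    "G1JoystickFlatTerrain": [3367718, 3367719, 3367720],
    "G1JoystickRoughTerrain": [3367721, 3367722, 3367723],
}


def all_job_ids(jobs):
    """Flatten the per-task ID lists into one sorted list."""
    return sorted(job_id for ids in jobs.values() for job_id in ids)


def ids_missing_from_range(ids):
    """Return IDs between min(ids) and max(ids) that are not in ids."""
    present = set(ids)
    return [i for i in range(min(ids), max(ids) + 1) if i not in present]


ids = all_job_ids(PHASE1_JOBS)
print(len(ids))                     # 12
print(ids_missing_from_range(ids))  # [3367714, 3367715]
```

The comma-joined form of `ids` can be passed to `squeue -j` to query only these jobs.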

## 📋 TODO List

### Phase 1: MuJoCo Playground
- [x] Set up MuJoCo Playground environment
- [x] Test 5000-step run successfully
- [x] Submit full batch (4 tasks × 3 seeds)
- [ ] Wait for jobs to complete (~1 hour each)
- [ ] Verify results match paper Figure 11
- [ ] Download wandb data for analysis

### Phase 2: IsaacLab
- [ ] **INSTALL ISAACLAB ENVIRONMENT FIRST**
- [ ] Test single IsaacLab task
- [ ] Submit batch: `python submit_experiment_batch.py --phase 2 --seeds 3`
- [ ] Monitor 6 tasks × 3 seeds (18 jobs total)
- [ ] Verify results match paper Figure 10

### Phase 3: HumanoidBench
- [ ] **INSTALL HUMANOIDBENCH ENVIRONMENT FIRST**
- [ ] Test single HumanoidBench task
- [ ] Submit batch: `python submit_experiment_batch.py --phase 3 --seeds 3`
- [ ] Monitor 5 tasks × 3 seeds (15 jobs total)
- [ ] Verify results match paper Figure 9

### Analysis & Completion
- [ ] Collect all results from wandb
- [ ] Generate comparison plots vs paper
- [ ] Document findings and performance
- [ ] Create final report

## 📊 Experiment Details

### Phase 1: MuJoCo Playground (Figure 11 from paper)
- **Tasks**: `T1JoystickFlatTerrain`, `T1JoystickRoughTerrain`, `G1JoystickFlatTerrain`, `G1JoystickRoughTerrain`
- **Duration**: 3600s each
- **Hyperparameters**: total_timesteps=500000, num_envs=2048, batch_size=32768, buffer_size=102400, eval_interval=25000

### Phase 2: IsaacLab (Figure 10 from paper)
- **Tasks**: `Isaac-Velocity-Flat-G1-v0`, `Isaac-Velocity-Rough-G1-v0`, `Isaac-Repose-Cube-Allegro-Direct-v0`, `Isaac-Repose-Cube-Shadow-Direct-v0`, `Isaac-Velocity-Flat-H1-v0`, `Isaac-Velocity-Rough-H1-v0`
- **Duration**: 3600s each
- **Hyperparameters**: total_timesteps=1000000, num_envs=1024, batch_size=32768, buffer_size=51200, eval_interval=50000

### Phase 3: HumanoidBench (Figure 9 from paper - subset)
- **Tasks**: `h1hand-walk`, `h1hand-run`, `h1hand-hurdle`, `h1hand-stair`, `h1hand-slide`
- **Duration**: 10800s each
- **Hyperparameters**: total_timesteps=2000000, num_envs=256, batch_size=16384, buffer_size=12800, eval_interval=100000
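The job counts and time budget implied by the three phases can be tallied with a short script. This is a rough sketch that assumes each "Duration" is a per-job wall-clock budget and that every task runs with 3 seeds, as in the TODO list; the totals are job-hours, not elapsed time, since jobs may run in parallel.

```python
# Task counts, seeds, and per-job durations taken from the phase descriptions.
PHASES = {
    "MuJoCo Playground": {"tasks": 4, "seeds": 3, "duration_s": 3600},
    "IsaacLab": {"tasks": 6, "seeds": 3, "duration_s": 3600},
    "HumanoidBench": {"tasks": 5, "seeds": 3, "duration_s": 10800},
}


def summarize(phases):
    """Return {phase: (number of jobs, total job-hours)}."""
    summary = {}
    for name, p in phases.items():
        jobs = p["tasks"] * p["seeds"]
        summary[name] = (jobs, jobs * p["duration_s"] / 3600)
    return summary


summary = summarize(PHASES)
total_jobs = sum(jobs for jobs, _ in summary.values())
total_hours = sum(hours for _, hours in summary.values())

for name, (jobs, hours) in summary.items():
    print(f"{name}: {jobs} jobs, {hours:.0f} job-hours")
print(f"Total: {total_jobs} jobs, {total_hours:.0f} job-hours")
# Total: 45 jobs, 75 job-hours
```

HumanoidBench accounts for 45 of the 75 job-hours, so it dominates the budget despite having fewer jobs than IsaacLab.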

## 🔧 Commands

Monitor jobs:
```bash
squeue -u $USER
python monitor_experiments.py logs/experiment_tracking_1753196960.yaml --watch
```
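For a compact view of queue state, the `squeue` output can be reduced to a count per job state. The sketch below is an illustration and not part of `monitor_experiments.py`: the function names are hypothetical, the sample text uses made-up job IDs, and it relies on `squeue`'s `-h` (no header) and `-o "%i %T"` (job ID and state) options.

```python
import getpass
import subprocess
from collections import Counter


def count_job_states(squeue_output):
    """Count jobs per state from `squeue -h -o "%i %T"` output."""
    states = Counter()
    for line in squeue_output.splitlines():
        parts = line.split()
        if len(parts) == 2:  # expected shape: "<job_id> <STATE>"
            states[parts[1]] += 1
    return states


def query_job_states(user=None):
    """Run squeue for the given user (default: current user) and count states."""
    user = user or getpass.getuser()
    result = subprocess.run(
        ["squeue", "-u", user, "-h", "-o", "%i %T"],
        capture_output=True, text=True, check=True,
    )
    return count_job_states(result.stdout)


# Made-up sample output, for illustration only.
sample = "1001 PENDING\n1002 PENDING\n1003 RUNNING\n"
print(dict(count_job_states(sample)))  # {'PENDING': 2, 'RUNNING': 1}
```

On the cluster, `query_job_states()` returns the same kind of counter for the live queue; an empty result means no jobs are queued or running for that user.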

Submit next phases:
```bash
python submit_experiment_batch.py --phase 2 --seeds 3
python submit_experiment_batch.py --phase 3 --seeds 3
```