FastTD3/experiment_plan.md
ys1087@partner.kit.edu b7b5a59803 Upd experiment_plan.md
2025-07-24 01:11:30 +02:00

# FastTD3 HoReKa Experiment Plan
*Added by Dominik - Paper Replication Study*
## ✅ Proof of Concept Results
**Initial Success**: [HoReKa Dev Run](https://wandb.ai/rl-network-scaling/FastTD3_HoReKa_Dev?nw=nwuserdominik_roth)
- **Task**: T1JoystickFlatTerrain
- **Duration**: 7 minutes (5000 timesteps)
- **Performance**: Successfully training at ~29 it/s
- **Key Achievement**: Fixed a JAX/PyTorch dtype mismatch by removing `JAX_ENABLE_X64`
- **Status**: ✅ Environment working, ready for full-scale experiments
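The dtype fix comes down to not enabling JAX's 64-bit mode. With `JAX_ENABLE_X64` set, JAX creates float64 arrays by default, while PyTorch tensors default to float32, so environment outputs handed to the networks can arrive with mismatched dtypes. Below is a minimal sketch of a guard a launcher could run before JAX is first imported. The helper name `disable_jax_x64` is hypothetical and not taken from the FastTD3 code. In a process where JAX is already loaded, `jax.config.update("jax_enable_x64", False)` is the runtime equivalent.

```python
import os
import sys
import warnings


def disable_jax_x64() -> None:
    """Remove JAX_ENABLE_X64 from this process's environment (hypothetical helper).

    JAX reads the variable when it is imported, so this only has an effect
    if it runs before the first `import jax`, including indirect imports
    triggered by other libraries.
    """
    if "jax" in sys.modules:
        warnings.warn(
            "jax is already imported; changing JAX_ENABLE_X64 now has no effect",
            RuntimeWarning,
        )
    os.environ.pop("JAX_ENABLE_X64", None)
```

Calling it twice is harmless, because `os.environ.pop` with a default does nothing when the variable is already absent.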
## 🚧 Currently Running Jobs
### Phase 1: MuJoCo Playground - RESUBMITTED TO H100 ✅
**New SLURM job IDs**: 3371681-3371692 (12 jobs total), on the `accelerated-h100` partition (94 GB GPU memory)
- ⏳ T1JoystickFlatTerrain (seeds 1,2,3) - Jobs: 3371681, 3371682, 3371683
- ⏳ T1JoystickRoughTerrain (seeds 1,2,3) - Jobs: 3371684, 3371685, 3371686
- ⏳ G1JoystickFlatTerrain (seeds 1,2,3) - Jobs: 3371687, 3371688, 3371689
- ⏳ G1JoystickRoughTerrain (seeds 1,2,3) - Jobs: 3371690, 3371691, 3371692
- **Status**: All jobs pending in the `accelerated-h100` queue
- **Monitor**: `python monitor_experiments.py experiment_tracking_1753312228.yaml --watch`
- **Note**: The previous jobs (3367710-3367723) crashed because the GPUs on the standard partition did not have enough memory
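The job IDs above follow the order of the task list, with seeds varying fastest and one consecutive ID per run. Below is a minimal sketch that rebuilds that mapping and the matching `squeue -j` query. It assumes the IDs stay consecutive in this task-major, seed-minor order. `job_table` is a hypothetical helper, not part of `monitor_experiments.py`.

```python
from itertools import product

# Task order and seeds as listed for the H100 resubmission.
TASKS = [
    "T1JoystickFlatTerrain",
    "T1JoystickRoughTerrain",
    "G1JoystickFlatTerrain",
    "G1JoystickRoughTerrain",
]
SEEDS = [1, 2, 3]
FIRST_JOB_ID = 3371681


def job_table(first_id: int, tasks: list[str], seeds: list[int]) -> dict[int, tuple[str, int]]:
    """Assign consecutive job IDs to (task, seed) pairs, iterating seeds fastest."""
    return {first_id + i: run for i, run in enumerate(product(tasks, seeds))}


jobs = job_table(FIRST_JOB_ID, TASKS, SEEDS)
# squeue -j takes a comma-separated list of job IDs.
squeue_cmd = "squeue -j " + ",".join(str(job_id) for job_id in sorted(jobs))
print(squeue_cmd)
```

Looking up a single ID, for example `jobs[3371690]`, gives `("G1JoystickRoughTerrain", 1)`, which matches the list above.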
## 📋 TODO List
### Phase 1: MuJoCo Playground
- [x] Set up MuJoCo Playground environment
- [x] Test 5000-step run successfully
- [x] Submit full batch (4 tasks × 3 seeds)
- [ ] Wait for jobs to complete (~1 hour each)
- [ ] Verify results match paper Figure 11
- [ ] Download wandb data for analysis
### Phase 2: IsaacLab
- [ ] **INSTALL ISAACLAB ENVIRONMENT FIRST**
- [ ] Test single IsaacLab task
- [ ] Submit batch: `python submit_experiment_batch.py --phase 2 --seeds 3`
- [ ] Monitor 6 tasks × 3 seeds (18 jobs total)
- [ ] Verify results match paper Figure 10
### Phase 3: HumanoidBench
- [ ] **INSTALL HUMANOIDBENCH ENVIRONMENT FIRST**
- [ ] Test single HumanoidBench task
- [ ] Submit batch: `python submit_experiment_batch.py --phase 3 --seeds 3`
- [ ] Monitor 5 tasks × 3 seeds (15 jobs total)
- [ ] Verify results match paper Figure 9
### Analysis & Completion
- [ ] Collect all results from wandb
- [ ] Generate comparison plots vs paper
- [ ] Document findings and performance
- [ ] Create final report
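Before submitting Phases 2 and 3, a rough compute estimate is useful. The following back-of-the-envelope sketch takes the task and seed counts from the TODO list above and the per-run durations from Experiment Details. It assumes that each job holds one GPU for its full duration and that queueing and startup overhead are negligible. Neither assumption is stated in the plan.

```python
# Run counts (tasks, seeds) from the TODO list; durations in seconds from Experiment Details.
PHASES = {
    "MuJoCo Playground": (4, 3, 3600),
    "IsaacLab": (6, 3, 3600),
    "HumanoidBench": (5, 3, 10800),
}


def budget(phases: dict[str, tuple[int, int, int]]) -> dict[str, tuple[int, float]]:
    """Return (job count, GPU-hours) per phase, assuming one GPU per job."""
    rows = {}
    for name, (n_tasks, n_seeds, seconds) in phases.items():
        n_jobs = n_tasks * n_seeds
        rows[name] = (n_jobs, n_jobs * seconds / 3600)
    return rows


rows = budget(PHASES)
total_jobs = sum(n for n, _ in rows.values())
total_gpu_hours = sum(h for _, h in rows.values())
for name, (n_jobs, hours) in rows.items():
    print(f"{name:18s} {n_jobs:3d} jobs {hours:6.1f} GPU-h")
print(f"{'Total':18s} {total_jobs:3d} jobs {total_gpu_hours:6.1f} GPU-h")
```

Under these assumptions the three phases total 45 jobs and 75 GPU-hours. Phase 3 alone accounts for 45 of those hours because of its 3-hour runs.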
## 📊 Experiment Details
### Phase 1: MuJoCo Playground (Figure 11 from paper)
- **Tasks**: `T1JoystickFlatTerrain`, `T1JoystickRoughTerrain`, `G1JoystickFlatTerrain`, `G1JoystickRoughTerrain`
- **Duration**: 3600 s (1 h) per run
- **Hyperparameters**: `total_timesteps=500000`, `num_envs=2048`, `batch_size=32768`, `buffer_size=102400`, `eval_interval=25000`
### Phase 2: IsaacLab (Figure 10 from paper)
- **Tasks**: `Isaac-Velocity-Flat-G1-v0`, `Isaac-Velocity-Rough-G1-v0`, `Isaac-Repose-Cube-Allegro-Direct-v0`, `Isaac-Repose-Cube-Shadow-Direct-v0`, `Isaac-Velocity-Flat-H1-v0`, `Isaac-Velocity-Rough-H1-v0`
- **Duration**: 3600 s (1 h) per run
- **Hyperparameters**: `total_timesteps=1000000`, `num_envs=1024`, `batch_size=32768`, `buffer_size=51200`, `eval_interval=50000`
### Phase 3: HumanoidBench (Figure 9 from paper, subset)
- **Tasks**: `h1hand-walk`, `h1hand-run`, `h1hand-hurdle`, `h1hand-stair`, `h1hand-slide`
- **Duration**: 10800 s (3 h) per run
- **Hyperparameters**: `total_timesteps=2000000`, `num_envs=256`, `batch_size=16384`, `buffer_size=12800`, `eval_interval=100000`
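The three hyperparameter sets follow two common patterns. In each phase, `buffer_size` is 50 × `num_envs`, and `total_timesteps / eval_interval` gives 20 evaluations per run. The sketch below checks both patterns. It also computes the iteration rate each phase would need to reach `total_timesteps` within its duration, assuming that one training iteration advances the timestep counter by one. That is an assumption about how the FastTD3 scripts count steps. `PhaseConfig` is a hypothetical container, not a FastTD3 class. For comparison, the ~29 it/s measured in the dev run would cover roughly 104,000 iterations in 3600 s. That short run may not reflect H100 throughput at these settings, though.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhaseConfig:
    """Hypothetical container for one phase's settings from this plan."""
    total_timesteps: int
    num_envs: int
    batch_size: int
    buffer_size: int
    eval_interval: int
    duration_s: int

    @property
    def buffer_per_env(self) -> float:
        # Plain ratio; the plan does not say whether buffer_size is per env or total.
        return self.buffer_size / self.num_envs

    @property
    def num_evals(self) -> float:
        return self.total_timesteps / self.eval_interval

    @property
    def required_it_per_s(self) -> float:
        # Assumes one iteration per counted timestep.
        return self.total_timesteps / self.duration_s


PHASES = {
    "MuJoCo Playground": PhaseConfig(500_000, 2048, 32768, 102_400, 25_000, 3600),
    "IsaacLab": PhaseConfig(1_000_000, 1024, 32768, 51_200, 50_000, 3600),
    "HumanoidBench": PhaseConfig(2_000_000, 256, 16384, 12_800, 100_000, 10_800),
}

for name, cfg in PHASES.items():
    print(
        f"{name:18s} buffer/envs={cfg.buffer_per_env:.0f} "
        f"evals={cfg.num_evals:.0f} needs {cfg.required_it_per_s:.1f} it/s"
    )
```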
## 🔧 Commands
Monitor jobs:
```bash
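# List your queued and running SLURM jobs
squeue -u $USER
# Watch a tracked batch; the resubmitted Phase 1 batch uses experiment_tracking_1753312228.yaml instead
python monitor_experiments.py logs/experiment_tracking_1753196960.yaml --watch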
```
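For a per-state count instead of the full `squeue` table, you can parse the listing. Below is a minimal sketch that assumes SLURM's `squeue` is on `PATH`. The `-h` flag suppresses the header, and `-o "%i %T"` prints each job's ID and its state in long form (for example `PENDING` or `RUNNING`). Both helpers are hypothetical and independent of `monitor_experiments.py`.

```python
import subprocess
from collections import Counter


def parse_squeue(output: str) -> Counter:
    """Count jobs per state from `squeue -h -o "%i %T"` output ('JOBID STATE' per line)."""
    states = Counter()
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2:
            states[parts[1]] += 1
    return states


def job_states(user: str) -> Counter:
    """Run squeue for one user and return a Counter keyed by job state."""
    out = subprocess.run(
        ["squeue", "-u", user, "-h", "-o", "%i %T"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_squeue(out)
```

On a login node, `job_states(os.environ["USER"])` returns a `Counter` keyed by state name. Keeping the parsing separate from the subprocess call makes it possible to test the parser without a SLURM installation.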
Submit next phases:
```bash
# Phase 2: IsaacLab (6 tasks × 3 seeds)
python submit_experiment_batch.py --phase 2 --seeds 3
# Phase 3: HumanoidBench (5 tasks × 3 seeds)
python submit_experiment_batch.py --phase 3 --seeds 3
```