Complete HoReKa README with experiment management tools
- Added reference to experiment_plan.md for current progress
- Updated running instructions with batch experiment commands
- Added monitoring tools and paper replication workflow
- Listed all available scripts and their purposes
- Complete install and run instructions for HoReKa users
parent e95c2c4e11
commit 69502c8911
52 README.md
@@ -15,6 +15,8 @@ For more information, please see our [project webpage](https://younggyo.me/fast_
 
 This repository includes optimized scripts for running FastTD3 on the HoReKa supercomputer cluster with SLURM job scheduling and wandb logging.
 
+**📋 Current Progress**: See [experiment_plan.md](experiment_plan.md) for ongoing paper replication experiments and job tracking.
+
 ### Installation on HoReKa
 
 ```bash
@@ -58,37 +60,55 @@ python test_setup.py
 
 ### Running on HoReKa
 
-**Easy submission:**
+**Single job submission:**
 ```bash
 python submit_job.py
 ```
 
+**Batch experiments (paper replication):**
+```bash
+# Submit Phase 1: MuJoCo Playground (4 tasks × 3 seeds)
+python submit_experiment_batch.py --phase 1 --seeds 3
+
+# Submit Phase 2: IsaacLab (6 tasks × 3 seeds)
+python submit_experiment_batch.py --phase 2 --seeds 3
+
+# Submit Phase 3: HumanoidBench (5 tasks × 3 seeds)
+python submit_experiment_batch.py --phase 3 --seeds 3
+```
+
+**Monitor experiments:**
+```bash
+# Check job status
+squeue -u $USER
+
+# Monitor experiments with tracking
+python monitor_experiments.py --watch
+
+# View specific job output
+tail -f logs/fasttd3_*.out
+
+# Cancel job if needed
+scancel <job_id>
+```
+
 **Manual submission:**
 ```bash
 sbatch run_fasttd3.slurm
 ```
 
-**Monitor jobs:**
-```bash
-# Check job status
-squeue -u $USER
-
-# View output
-tail -f fasttd3_<job_id>.out
-
-# Cancel job if needed
-scancel <job_id>
-```
-
 ### Configuration
 
 The setup includes:
-- **SLURM script** (`run_fasttd3.slurm`) configured for accelerated partition with GPU
-- **Job helper** (`submit_job.py`) for easy job submission with wandb setup
+- **SLURM scripts** (`run_fasttd3.slurm`, `run_fasttd3_full.slurm`) configured for accelerated partition with GPU
+- **Job helpers** (`submit_job.py`, `submit_experiment_batch.py`) for single/batch job submission
+- **Monitoring tool** (`monitor_experiments.py`) for real-time experiment tracking
 - **Test script** (`test_setup.py`) for environment verification
-- **MuJoCo Playground environment** (`T1JoystickFlatTerrain`) for humanoid control
+- **Experiment plan** (`experiment_plan.md`) with current progress and TODO tracking
+- **MuJoCo Playground environment** (`T1JoystickFlatTerrain`) working and tested
 - **Automatic GPU detection** and CUDA 12.4 compatibility
 - **Wandb logging** with online mode by default
+- **Paper-accurate hyperparameters** for systematic replication
 
 ### Wandb Integration
 
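Review note: the batch commands added to the README take `--phase` and `--seeds`, and the comments describe each phase as a tasks × seeds grid (4 × 3, 6 × 3, 5 × 3). This commit does not show how `submit_experiment_batch.py` is implemented, so the following is only a minimal sketch of how such a grid could be expanded into `sbatch` invocations. The placeholder task names, the exported variable names `TASK` and `SEED`, and the `build_commands` helper are all hypothetical; only `run_fasttd3.slurm` and the per-phase task counts come from the README.

```python
from itertools import product

# Tasks per phase: the counts match the README comments, the names are placeholders.
PHASE_TASK_COUNTS = {1: 4, 2: 6, 3: 5}


def build_commands(phase, seeds, script="run_fasttd3.slurm"):
    """Return one sbatch argument list per (task, seed) pair, without running anything."""
    tasks = [f"phase{phase}_task{i}" for i in range(PHASE_TASK_COUNTS[phase])]
    commands = []
    for task, seed in product(tasks, range(seeds)):
        # --export=ALL,NAME=value passes extra environment variables to the job script.
        # TASK and SEED are hypothetical names; the real script may expect others.
        commands.append(["sbatch", f"--export=ALL,TASK={task},SEED={seed}", script])
    return commands


if __name__ == "__main__":
    for phase in sorted(PHASE_TASK_COUNTS):
        print(f"phase {phase}: {len(build_commands(phase, seeds=3))} jobs")
```

Building the argument lists separately from submitting them makes it possible to print or review the full grid as a dry run before any job reaches the scheduler.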
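Review note: the new monitoring block starts with `squeue -u $USER`. As a rough illustration of the kind of summary a tool like `monitor_experiments.py` might produce (its actual behaviour is not shown in this commit), the sketch below counts jobs per state. It assumes the text was produced with `squeue -u $USER -h -o "%i %T"`, which prints the job ID and the long-form state with no header line. The `summarize_states` helper and the sample text are made up for the example.

```python
from collections import Counter


def summarize_states(squeue_text):
    """Count jobs per state from lines of the form '<job_id> <STATE>'."""
    counts = Counter()
    for line in squeue_text.splitlines():
        parts = line.split()
        if len(parts) >= 2:  # skip blank or malformed lines
            counts[parts[1]] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample; in practice this text would come from the squeue command above.
    sample = "1001 RUNNING\n1002 RUNNING\n1003 PENDING\n"
    for state, n in sorted(summarize_states(sample).items()):
        print(state, n)
```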