Complete HoReKa README with experiment management tools

- Added reference to experiment_plan.md for current progress
- Updated running instructions with batch experiment commands
- Added monitoring tools and paper replication workflow
- Listed all available scripts and their purposes
- Complete install and run instructions for HoReKa users
2025-07-22 17:18:27 +02:00
commit 69502c8911
parent e95c2c4e11
1 changed file with 37 additions and 17 deletions
--- a/README.md
+++ b/README.md
@@ -15,6 +15,8 @@ For more information, please see our [project webpage](https://younggyo.me/fast_
 This repository includes optimized scripts for running FastTD3 on the HoReKa supercomputer cluster with SLURM job scheduling and wandb logging.
+**📋 Current Progress**: See [experiment_plan.md](experiment_plan.md) for ongoing paper replication experiments and job tracking.
 ### Installation on HoReKa
 ```bash
@@ -58,37 +60,55 @@ python test_setup.py
 ### Running on HoReKa
-**Easy submission:**
+**Single job submission:**
 ```bash
 python submit_job.py
 ```
+**Batch experiments (paper replication):**
+```bash
+# Submit Phase 1: MuJoCo Playground (4 tasks × 3 seeds)
+python submit_experiment_batch.py --phase 1 --seeds 3
+# Submit Phase 2: IsaacLab (6 tasks × 3 seeds) 
+python submit_experiment_batch.py --phase 2 --seeds 3
+# Submit Phase 3: HumanoidBench (5 tasks × 3 seeds)
+python submit_experiment_batch.py --phase 3 --seeds 3
+```
+**Monitor experiments:**
+```bash
+# Check job status
+squeue -u $USER
+# Monitor experiments with tracking
+python monitor_experiments.py --watch
+# View specific job output
+tail -f logs/fasttd3_*.out
+# Cancel job if needed
+scancel <job_id>
+```
 **Manual submission:**
 ```bash
 sbatch run_fasttd3.slurm
 ```
-**Monitor jobs:**
-```bash
-# Check job status
-squeue -u $USER
-# View output
-tail -f fasttd3_<job_id>.out
-# Cancel job if needed
-scancel <job_id>
-```
 ### Configuration
 The setup includes:
-- **SLURM script** (`run_fasttd3.slurm`) configured for accelerated partition with GPU
+- **SLURM scripts** (`run_fasttd3.slurm`, `run_fasttd3_full.slurm`) configured for accelerated partition with GPU
-- **Job helper** (`submit_job.py`) for easy job submission with wandb setup
+- **Job helpers** (`submit_job.py`, `submit_experiment_batch.py`) for single/batch job submission
 - **Monitoring tool** (`monitor_experiments.py`) for real-time experiment tracking
 - **Test script** (`test_setup.py`) for environment verification
-- **MuJoCo Playground environment** (`T1JoystickFlatTerrain`) for humanoid control
+- **Experiment plan** (`experiment_plan.md`) with current progress and TODO tracking
-- **Automatic GPU detection** and CUDA 12.4 compatibility
+- **MuJoCo Playground environment** (`T1JoystickFlatTerrain`) working and tested
 - **Automatic GPU detection** and CUDA 12.4 compatibility  
 - **Wandb logging** with online mode by default
 - **Paper-accurate hyperparameters** for systematic replication
 ### Wandb Integration