From 69502c891185c878d4f14246a77fe32d720b14cb Mon Sep 17 00:00:00 2001
From: "ys1087@partner.kit.edu"
Date: Tue, 22 Jul 2025 17:18:27 +0200
Subject: [PATCH] Complete HoReKa README with experiment management tools

- Added reference to experiment_plan.md for current progress
- Updated running instructions with batch experiment commands
- Added monitoring tools and paper replication workflow
- Listed all available scripts and their purposes
- Complete install and run instructions for HoReKa users
---
 README.md | 54 +++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 37 insertions(+), 17 deletions(-)

diff --git a/README.md b/README.md
index 9a4b510..7b6717f 100644
--- a/README.md
+++ b/README.md
@@ -15,6 +15,8 @@ For more information, please see our [project webpage](https://younggyo.me/fast_
 
 This repository includes optimized scripts for running FastTD3 on the HoReKa supercomputer cluster with SLURM job scheduling and wandb logging.
 
+**📋 Current Progress**: See [experiment_plan.md](experiment_plan.md) for ongoing paper replication experiments and job tracking.
+
 ### Installation on HoReKa
 
 ```bash
@@ -58,37 +60,55 @@ python test_setup.py
 
 ### Running on HoReKa
 
-**Easy submission:**
+**Single job submission:**
 ```bash
 python submit_job.py
 ```
 
+**Batch experiments (paper replication):**
+```bash
+# Submit Phase 1: MuJoCo Playground (4 tasks × 3 seeds)
+python submit_experiment_batch.py --phase 1 --seeds 3
+
+# Submit Phase 2: IsaacLab (6 tasks × 3 seeds)
+python submit_experiment_batch.py --phase 2 --seeds 3
+
+# Submit Phase 3: HumanoidBench (5 tasks × 3 seeds)
+python submit_experiment_batch.py --phase 3 --seeds 3
+```
+
+**Monitor experiments:**
+```bash
+# Check job status
+squeue -u $USER
+
+# Monitor experiments with tracking
+python monitor_experiments.py --watch
+
+# View specific job output
+tail -f logs/fasttd3_*.out
+
+# Cancel job if needed
+scancel
+```
+
 **Manual submission:**
 ```bash
 sbatch run_fasttd3.slurm
 ```
 
-**Monitor jobs:**
-```bash
-# Check job status
-squeue -u $USER
-
-# View output
-tail -f fasttd3_.out
-
-# Cancel job if needed
-scancel
-```
-
 ### Configuration
 
 The setup includes:
-- **SLURM script** (`run_fasttd3.slurm`) configured for accelerated partition with GPU
-- **Job helper** (`submit_job.py`) for easy job submission with wandb setup
+- **SLURM scripts** (`run_fasttd3.slurm`, `run_fasttd3_full.slurm`) configured for accelerated partition with GPU
+- **Job helpers** (`submit_job.py`, `submit_experiment_batch.py`) for single/batch job submission
+- **Monitoring tool** (`monitor_experiments.py`) for real-time experiment tracking
 - **Test script** (`test_setup.py`) for environment verification
-- **MuJoCo Playground environment** (`T1JoystickFlatTerrain`) for humanoid control
-- **Automatic GPU detection** and CUDA 12.4 compatibility
+- **Experiment plan** (`experiment_plan.md`) with current progress and TODO tracking
+- **MuJoCo Playground environment** (`T1JoystickFlatTerrain`) working and tested
+- **Automatic GPU detection** and CUDA 12.4 compatibility
 - **Wandb logging** with online mode by default
+- **Paper-accurate hyperparameters** for systematic replication
 
 ### Wandb Integration
 
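
As a rough illustration of the job counts implied by the batch commands in this patch (4, 6, and 5 tasks at 3 seeds each), the sketch below enumerates a task × seed grid per phase. It is not taken from `submit_experiment_batch.py`, whose internals the patch does not show; `build_job_grid`, `TASKS_PER_PHASE`, and the placeholder task names are assumptions made purely for illustration.

```python
from itertools import product

# Task counts per phase, as stated in the comments of the batch commands.
# The real task names are not given in the patch, so placeholders are used.
TASKS_PER_PHASE = {1: 4, 2: 6, 3: 5}


def build_job_grid(phase: int, seeds: int) -> list[tuple[str, int]]:
    """Return one (task, seed) pair per job for the given phase (hypothetical helper)."""
    tasks = [f"phase{phase}_task{i}" for i in range(TASKS_PER_PHASE[phase])]
    return list(product(tasks, range(seeds)))


if __name__ == "__main__":
    total = 0
    for phase in sorted(TASKS_PER_PHASE):
        jobs = build_job_grid(phase, seeds=3)
        total += len(jobs)
        print(f"phase {phase}: {len(jobs)} jobs")
    print(f"total: {total} jobs")
```

Under these assumptions, each phase submits one job per (task, seed) pair, so the three commands together would queue 12 + 18 + 15 = 45 jobs.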