Update install instructions to support EPYC nodes like HoReKa Teal

ys1087@partner.kit.edu 2025-07-29 19:19:27 +02:00
parent 22dfaa82dd
commit 13cd2e5b60

@@ -24,25 +24,38 @@ This repository includes optimized scripts for running FastTD3 on the HoReKa sup
git clone https://github.com/younggyoseo/FastTD3.git
cd FastTD3
# Install Python 3.10 locally (HoReKa doesn't provide conda)
# Install Python 3.10 locally with cross-CPU compatibility
# IMPORTANT: Use generic x86-64 architecture for compatibility with both Intel and AMD nodes
mkdir -p $HOME/.local/python-3.10
cd /tmp
curl -O https://www.python.org/ftp/python/3.10.14/Python-3.10.14.tgz
tar -xzf Python-3.10.14.tgz
cd Python-3.10.14
./configure --prefix=$HOME/.local/python-3.10 --enable-optimizations --with-ensurepip=install
# Configure without Intel-specific optimizations for AMD EPYC compatibility
export EXTRA_CFLAGS="-march=x86-64 -mtune=generic"
./configure --prefix=$HOME/.local/python-3.10 \
--with-ensurepip=install \
--enable-shared \
CFLAGS="$EXTRA_CFLAGS" \
CPPFLAGS="$EXTRA_CFLAGS"
make -j$(nproc)
make install
# Add to PATH
# Add to PATH and set library path
echo 'export PATH="$HOME/.local/python-3.10/bin:$PATH"' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH="$HOME/.local/python-3.10/lib:$LD_LIBRARY_PATH"' >> ~/.bashrc
echo 'export PATH="$HOME/.local/python-3.10/bin:$PATH"' >> ~/.zshrc
echo 'export LD_LIBRARY_PATH="$HOME/.local/python-3.10/lib:$LD_LIBRARY_PATH"' >> ~/.zshrc
export PATH="$HOME/.local/python-3.10/bin:$PATH"
export LD_LIBRARY_PATH="$HOME/.local/python-3.10/lib:$LD_LIBRARY_PATH"
# Go back to FastTD3 directory
cd $HOME/path/to/FastTD3
# Create virtual environment and install dependencies
# NOTE: If python3.10 fails with "error while loading shared libraries: libpython3.10.so.1.0", make sure LD_LIBRARY_PATH includes $HOME/.local/python-3.10/lib
source ~/.bashrc # Load PATH and LD_LIBRARY_PATH
$HOME/.local/python-3.10/bin/python3.10 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
@@ -100,15 +113,17 @@ sbatch run_fasttd3.slurm
### Configuration
The setup includes:
- **Cross-CPU compatible Python 3.10** with generic x86-64 architecture (works on both Intel Xeon and AMD EPYC nodes)
- **SLURM scripts** (`run_fasttd3.slurm`, `run_fasttd3_full.slurm`) configured for accelerated partition with GPU
- **Job helpers** (`submit_job.py`, `submit_experiment_batch.py`) for single/batch job submission
- **Monitoring tool** (`monitor_experiments.py`) for real-time experiment tracking
- **Test script** (`test_setup.py`) for environment verification
- **Experiment plan** (`experiment_plan.md`) with current progress and TODO tracking
- **MuJoCo Playground environment** (`T1JoystickFlatTerrain`) working and tested
- **MuJoCo Playground environment** (`T1JoystickFlatTerrain`) working and tested on all node types
- **Automatic GPU detection** and CUDA 12.4 compatibility
- **Wandb logging** with online mode by default
- **Paper-accurate hyperparameters** for systematic replication
- **LD_LIBRARY_PATH configuration** for shared Python libraries
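
The portability claims in the list above can be spot-checked from inside the interpreter. The following is a minimal sketch, not part of this repository: `check_portable_build` is a hypothetical helper, and it assumes a POSIX build where `sysconfig` exposes the `CONFIGURE_CFLAGS`, `Py_ENABLE_SHARED`, and `LIBDIR` build variables. It only inspects recorded build flags and the environment, so an empty result suggests, but does not prove, that the interpreter will run on every node type.

```python
import os
import sysconfig


def check_portable_build(cflags, enable_shared, libdir, ld_library_path):
    """Return a list of warnings; an empty list means no issue was found.

    Hypothetical helper for illustration only.
    """
    warnings = []
    flags = cflags.split()

    # -march=native ties the binary to the CPU of the build host.
    if "-march=native" in flags:
        warnings.append("CFLAGS contain -march=native; the build may not run on other CPU types")
    elif "-march=x86-64" not in flags:
        warnings.append("CFLAGS do not contain -march=x86-64")

    # A --enable-shared build needs libpython to be found at run time.
    if enable_shared:
        search_dirs = [os.path.normpath(p) for p in ld_library_path.split(":") if p]
        if os.path.normpath(libdir) not in search_dirs:
            warnings.append(f"{libdir} is not listed in LD_LIBRARY_PATH")

    return warnings


if __name__ == "__main__":
    # Read the values recorded when this interpreter was configured and built.
    problems = check_portable_build(
        cflags=sysconfig.get_config_var("CONFIGURE_CFLAGS") or "",
        enable_shared=bool(sysconfig.get_config_var("Py_ENABLE_SHARED")),
        libdir=sysconfig.get_config_var("LIBDIR") or "",
        ld_library_path=os.environ.get("LD_LIBRARY_PATH", ""),
    )
    for line in problems or ["no portability issues detected"]:
        print(line)
```

Running the script with the locally built `python3.10` on a login node and again inside a job on each partition gives a quick comparison. The `LD_LIBRARY_PATH` warning can be a false positive if the library is found through another mechanism such as an rpath.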
### Wandb Integration