Add HoReKa cluster documentation to README

- Document installation process using Python 3.10 venv
- Add usage examples for SLURM job submission
- Document available environments and resource allocations
- Add WandB configuration instructions
- List all repository changes made for HoReKa compatibility
Parent: 05dddfa10c → Commit: 30f59aaa9b (README.md, +98 lines)
source script/set_path.sh
```

## HoReKa Cluster Setup

### Installation on HoReKa

The DPPO repository has been adapted to run on the HoReKa cluster. The original repository recommends conda, but we use vanilla Python with venv for consistency with cluster policies.

1. **Clone the repository and navigate to it:**

```bash
git clone git@dominik-roth.eu:dodox/dppo.git
cd dppo
```

2. **Create virtual environment with Python 3.10:**

```bash
python3.10 -m venv .venv
source .venv/bin/activate
```

3. **Install the package and dependencies:**

```bash
# Use the provided installation script
sbatch install_dppo.sh

# Or install manually:
pip install --upgrade pip
pip install -e .
pip install -e .[gym]  # For Gym environments
```

### Running on HoReKa

The repository includes pre-configured SLURM scripts for job submission:

#### Quick Start

```bash
# Run a development test (30 minutes, 24GB RAM)
./submit_job.sh dev

# Run Gym pre-training
./submit_job.sh gym hopper pretrain

# Run Gym fine-tuning
./submit_job.sh gym walker2d finetune
```

#### Manual Job Submission

```bash
# Submit development test
sbatch slurm/run_dppo_dev.sh

# Submit Gym experiments with parameters
TASK=hopper MODE=pretrain sbatch slurm/run_dppo_gym.sh
```

#### Supported Tasks

**Gym environments:**
- `hopper`, `walker2d`, `halfcheetah`

**Modes:**
- `pretrain` - Pre-train diffusion policy
- `finetune` - Fine-tune with PPO

#### Resource Allocations

- **Development**: 30 minutes, 24GB RAM, 8 CPUs, `dev_accelerated` partition
- **Production**: 8 hours, 32GB RAM, 40 CPUs, `accelerated` partition

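For reference, the production allocation above maps onto SLURM header directives roughly as follows. This is a sketch, not the contents of the actual scripts in `slurm/` — the real scripts may request GPUs or use different output paths:

```shell
#!/bin/bash
#SBATCH --partition=accelerated    # production partition on HoReKa
#SBATCH --time=08:00:00            # 8 hours wall time
#SBATCH --mem=32G                  # 32GB RAM
#SBATCH --cpus-per-task=40         # 40 CPUs
#SBATCH --output=logs/dppo_%j.out  # %j expands to the SLURM job ID
```

The development scripts would use the same structure with `--partition=dev_accelerated`, `--time=00:30:00`, `--mem=24G`, and `--cpus-per-task=8`.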
#### Monitoring Jobs

```bash
# Check job status
squeue -u $USER

# View logs
tail -f logs/dppo_<job_id>.out
```

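After a job finishes, SLURM's accounting tool can summarize what the run actually used, which helps tune the allocations above (the job ID shown is a placeholder):

```shell
# Show state, elapsed time, and peak memory for a finished job
# (replace 1234567 with your job ID from squeue)
sacct -j 1234567 --format=JobID,State,Elapsed,MaxRSS,AllocCPUS
```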
### Configuration

Before running experiments, set your WandB credentials:

```bash
export WANDB_API_KEY=<your_api_key>
export WANDB_ENTITY=<your_username_or_team>
```

Or disable WandB by adding `wandb=null` to your Python command.

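For example, assuming the upstream DPPO repository's Hydra-style entry point (the script path and config names below are illustrative and depend on the actual repository layout):

```shell
# Illustrative: run pre-training with WandB logging disabled
python script/run.py --config-name=pre_diffusion_mlp \
    --config-dir=cfg/gym/pretrain/hopper-medium-v2 \
    wandb=null
```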
### Repository Changes

This fork includes the following additions for HoReKa compatibility:

- `install_dppo.sh` - Automated installation script for SLURM
- `submit_job.sh` - Convenient job submission wrapper
- `slurm/` directory with job scripts for different experiment types
- Updated `.gitignore` to allow shell scripts (removed `*.sh` exclusion)
- Git remotes configured: `upstream` (original repository) and `origin` (this fork)

Note: The installation was successful without any code modifications. All dependencies installed correctly with Python 3.10.

## Usage - Pre-training

**Note**: You may skip pre-training if you would like to use the default checkpoint (available for download) for fine-tuning.