Add HoReKa cluster documentation to README

- Document installation process using Python 3.10 venv
- Add usage examples for SLURM job submission
- Document available environments and resource allocations
- Add WandB configuration instructions
- List all repository changes made for HoReKa compatibility
parent 05dddfa10c
commit 30f59aaa9b

README.md (98 lines changed)
@@ -44,6 +44,104 @@ pip install -e .[all] # except for Kitchen
source script/set_path.sh
```

## HoReKa Cluster Setup

### Installation on HoReKa

The DPPO repository has been adapted to run on the HoReKa cluster. The original repository recommends conda, but this fork uses plain Python with `venv` for consistency with cluster policies.

1. **Clone the repository and navigate to it:**
   ```bash
   git clone git@dominik-roth.eu:dodox/dppo.git
   cd dppo
   ```

2. **Create a virtual environment with Python 3.10:**
   ```bash
   python3.10 -m venv .venv
   source .venv/bin/activate
   ```

3. **Install the package and dependencies:**
   ```bash
   # Use the provided installation script
   sbatch install_dppo.sh

   # Or install manually:
   pip install --upgrade pip
   pip install -e .
   pip install -e .[gym]  # For Gym environments
   ```
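
Since the setup depends on Python 3.10, it can be worth confirming that the activated interpreter really is a 3.10.x release before installing. The following is a minimal sketch; `is_python_310` is a hypothetical helper and not part of the repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository):
# succeeds only for version strings of the form 3.10 or 3.10.x.
is_python_310() {
    case "$1" in
        3.10|3.10.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage sketch: inspect the interpreter on PATH (the venv's, once activated).
# `python --version` prints e.g. "Python 3.10.12"; cut keeps the version number.
if command -v python >/dev/null 2>&1; then
    current="$(python --version 2>&1 | cut -d' ' -f2)"
    if is_python_310 "$current"; then
        echo "OK: Python $current"
    else
        echo "Warning: expected Python 3.10.x, found $current" >&2
    fi
fi
```
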

### Running on HoReKa

The repository includes pre-configured SLURM scripts for job submission.

#### Quick Start
```bash
# Run a development test (30 minutes, 24 GB RAM)
./submit_job.sh dev

# Run Gym pre-training
./submit_job.sh gym hopper pretrain

# Run Gym fine-tuning
./submit_job.sh gym walker2d finetune
```

#### Manual Job Submission
```bash
# Submit development test
sbatch slurm/run_dppo_dev.sh

# Submit Gym experiments with parameters
TASK=hopper MODE=pretrain sbatch slurm/run_dppo_gym.sh
```
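
The wrapper and the manual commands appear to correspond one to one. The sketch below shows one way such a mapping could work. It is an assumption about the behaviour, not the actual contents of `submit_job.sh`: the function name `build_submit_command` is hypothetical, and it only prints the `sbatch` command rather than running it. The accepted task and mode names are the ones this README lists as supported.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: translate wrapper-style arguments into an sbatch command.
# Prints the command instead of executing it, so it is safe to try anywhere.
build_submit_command() {
    local kind="${1:-}" task="${2:-}" mode="${3:-}"
    case "$kind" in
        dev)
            echo "sbatch slurm/run_dppo_dev.sh"
            ;;
        gym)
            # Reject tasks and modes that the README does not list as supported.
            case "$task" in
                hopper|walker2d|halfcheetah) ;;
                *) echo "unsupported task: $task" >&2; return 1 ;;
            esac
            case "$mode" in
                pretrain|finetune) ;;
                *) echo "unsupported mode: $mode" >&2; return 1 ;;
            esac
            echo "TASK=$task MODE=$mode sbatch slurm/run_dppo_gym.sh"
            ;;
        *)
            echo "usage: build_submit_command dev | gym <task> <mode>" >&2
            return 1
            ;;
    esac
}

build_submit_command gym hopper pretrain
# prints: TASK=hopper MODE=pretrain sbatch slurm/run_dppo_gym.sh
```
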

#### Supported Tasks

**Gym environments:**
- `hopper`, `walker2d`, `halfcheetah`

**Modes:**
- `pretrain` - Pre-train diffusion policy
- `finetune` - Fine-tune with PPO

#### Resource Allocations
- **Development**: 30 minutes, 24 GB RAM, 8 CPUs, `dev_accelerated` partition
- **Production**: 8 hours, 32 GB RAM, 40 CPUs, `accelerated` partition
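
For orientation, the production allocation could be expressed as an `sbatch` header along the following lines. This is an illustrative sketch, not the contents of the scripts in `slurm/`. The job name, the mapping of "40 CPUs" to `--cpus-per-task`, the single-GPU request, and the output path (chosen to match the `logs/dppo_<job_id>.out` naming used in this README) are all assumptions.

```bash
#!/bin/bash
# Illustrative header only; the actual scripts in slurm/ may differ.
#SBATCH --job-name=dppo
#SBATCH --partition=accelerated
#SBATCH --time=08:00:00
#SBATCH --mem=32G
#SBATCH --cpus-per-task=40
# Assumption: one GPU per job.
#SBATCH --gres=gpu:1
# %j expands to the SLURM job ID.
#SBATCH --output=logs/dppo_%j.out
```

For the development allocation, the same sketch would use `--partition=dev_accelerated`, `--time=00:30:00`, `--mem=24G`, and `--cpus-per-task=8`.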

#### Monitoring Jobs
```bash
# Check job status
squeue -u $USER

# View logs
tail -f logs/dppo_<job_id>.out
```
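
If the job ID is not at hand, the most recently modified log can be located instead. The helper below is a small sketch; `latest_log` is a hypothetical name and assumes logs follow the `dppo_<job_id>.out` pattern shown above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the most recently modified dppo_*.out file
# in the given directory (defaults to logs/). Prints nothing if none exist.
latest_log() {
    local dir="${1:-logs}"
    ls -t "$dir"/dppo_*.out 2>/dev/null | head -n 1
}

# Usage sketch (left commented out because tail -f does not return):
# tail -f "$(latest_log)"
```
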

### Configuration

Before running experiments, set your WandB credentials:

```bash
export WANDB_API_KEY=<your_api_key>
export WANDB_ENTITY=<your_username_or_team>
```

Alternatively, disable WandB logging by appending `wandb=null` to your Python command.
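
The two options can be combined so that a run falls back to `wandb=null` whenever no API key is set. The following is a minimal sketch; `wandb_override` is a hypothetical helper and not part of the repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper: emit the wandb=null override only when no API key is set.
wandb_override() {
    if [ -z "${WANDB_API_KEY:-}" ]; then
        echo "wandb=null"
    fi
}

# Usage sketch: append the result to whatever Python command you normally run.
# <your python command> $(wandb_override)
```
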

### Repository Changes

This fork includes the following additions for HoReKa compatibility:
- `install_dppo.sh` - Automated installation script for SLURM
- `submit_job.sh` - Convenient job submission wrapper
- `slurm/` directory with job scripts for different experiment types
- Updated `.gitignore` to allow shell scripts (removed `*.sh` exclusion)
- Git remotes configured: `upstream` (original repository) and `origin` (this fork)

Note: Installation succeeded without any code modifications; all dependencies installed correctly under Python 3.10.

## Usage - Pre-training

**Note**: You may skip pre-training if you would like to use the default checkpoint (available for download) for fine-tuning.