- Added reference to experiment_plan.md for current progress
- Updated running instructions with batch experiment commands
- Added monitoring tools and paper replication workflow
- Listed all available scripts and their purposes
- Complete install and run instructions for HoReKa users
- Fixed JAX/PyTorch dtype mismatch for successful training
- Added experiment plan with paper-accurate hyperparameters
- Created batch submission and monitoring scripts
- Cleaned up log files and updated gitignore
- Ready for systematic paper replication
- Update SLURM scripts to use correct CUDA modules (devel/cuda/12.4, intel compiler)
- Downgrade JAX to 0.4.35 for cuDNN 9.5.1 compatibility
- Fix the `JAX_PLATFORMS` environment variable (use `cuda` instead of `gpu,cpu`)
- Update README with cluster-specific JAX installation steps
- Tested successfully: both PyTorch and JAX run on the GPU with full training
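Taken together, the cluster setup described above amounts to a sequence roughly like the following sketch. The exact module name and wheel spec are assumptions inferred from the notes, not verified commands; adjust to what `module avail` reports on HoReKa:

```shell
# Load the CUDA toolchain mentioned above (module name is an assumption
# from these notes; check `module avail` on your cluster).
module load devel/cuda/12.4

# Pin JAX to 0.4.35 for cuDNN 9.5.1 compatibility.
pip install "jax[cuda12]==0.4.35"

# Point JAX at the CUDA backend explicitly ("cuda", not "gpu,cpu").
export JAX_PLATFORMS=cuda
```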
- Add complete HoReKa installation guide without conda dependency
- Include SLURM job script with GPU configuration and account setup
- Add helper scripts for job submission and environment testing
- Integrate wandb logging with both online and offline modes
- Support MuJoCo Playground environments for humanoid control
- Update README with clear separation of added vs original content
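A SLURM job script with GPU configuration and account setup, as described above, could look like this sketch. The partition, account, resource values, and the `train.py` flags are placeholders, not the repository's actual script:

```shell
#!/bin/bash
# Hypothetical SLURM job sketch; all values below are placeholders.
#SBATCH --job-name=fasttd3
#SBATCH --partition=accelerated
#SBATCH --gres=gpu:1
#SBATCH --time=04:00:00
#SBATCH --account=<your-account>

module load devel/cuda/12.4
source venv/bin/activate

# Flags are illustrative; see train.py for the real argument names.
python train.py --env_name <env> --wandb_mode offline
```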
- Change the `isaaclab_env` wrapper to explicitly specify the GPU for each simulation
- Remove the JAX cache to support multi-GPU environment launches in MuJoCo Playground
- Remove `.train()` and `.eval()` calls during evaluation and rendering to avoid deadlocks in multi-GPU training
- Support synchronized normalization for multi-GPU training
- Modify the code to be compatible with `torch.compile`
- Modify the empirical normalizer to use in-place operations, avoiding costly `__setattr__` calls
- Parallelize the soft Q-target update
- Disable gradient norm clipping by default, as it is quite expensive
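The in-place normalizer change can be sketched as follows. This is a minimal NumPy version of the idea with hypothetical names; the repository's normalizer operates on torch buffers, where rebinding an attribute (`self.mean = ...`) on an `nn.Module` goes through a costly `__setattr__`, while in-place updates do not:

```python
import numpy as np

class EmpiricalNormalizer:
    """Running mean/variance normalizer updated with in-place ops.

    Rebinding attributes on an nn.Module triggers a costly __setattr__;
    updating the buffers in place (+=, [...] =) avoids it.
    """

    def __init__(self, shape, eps=1e-8):
        self.mean = np.zeros(shape)
        self.var = np.ones(shape)
        self.count = 0
        self.eps = eps

    def update(self, batch):
        batch_mean = batch.mean(axis=0)
        batch_var = batch.var(axis=0)
        batch_count = batch.shape[0]
        total = self.count + batch_count
        delta = batch_mean - self.mean
        # In-place Welford-style merge of batch statistics.
        self.mean += delta * (batch_count / total)
        m2 = (self.var * self.count + batch_var * batch_count
              + delta**2 * (self.count * batch_count / total))
        self.var[...] = m2 / total
        self.count = total

    def __call__(self, x):
        return (x - self.mean) / np.sqrt(self.var + self.eps)
```

The merge formulas are exact, so two incremental updates reproduce the statistics of the full batch.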
This PR includes these changes:
- Fix a bug in MTBench evaluation
- Add a missing `critic_cls` in `train.py` (resolving https://github.com/younggyoseo/FastTD3/issues/17)
- Update hyperparameters for MTBench
This PR incorporates MTBench into the current codebase as a demonstration of how to use FastTD3 in a multi-task setup.
- Add support for MTBench along with its wrapper
- Add support for per-task reward normalizer useful for multi-task RL, motivated by BRC paper (https://arxiv.org/abs/2505.23150v1)
- Support hyperspherical normalization
- Support loading FastTD3 + SimbaV2 for both training and inference
- Support (experimental) reward normalization using SimbaV2's formulation, though it does not work particularly well
- Updated README for FastTD3 + SimbaV2
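A per-task reward normalizer in the spirit of the BRC-motivated change above might look like the sketch below. The class name is hypothetical, and for brevity it scales rewards by a per-task running RMS of rewards rather than a statistic of returns; the intent in either case is that no single task's reward magnitude dominates the multi-task critic loss:

```python
import numpy as np

class PerTaskRewardNormalizer:
    """Scales each reward by a running RMS tracked separately per task,
    so tasks with very different reward magnitudes contribute comparably
    to the multi-task critic loss. Sketch only; names are assumptions.
    """

    def __init__(self, num_tasks, eps=1e-8):
        self.count = np.zeros(num_tasks)
        self.sumsq = np.zeros(num_tasks)
        self.eps = eps

    def __call__(self, rewards, task_ids):
        # Accumulate per-task running second moments for this batch.
        np.add.at(self.count, task_ids, 1.0)
        np.add.at(self.sumsq, task_ids, rewards**2)
        # Divide each reward by its own task's running RMS.
        rms = np.sqrt(self.sumsq[task_ids] / np.maximum(self.count[task_ids], 1.0))
        return rewards / (rms + self.eps)
```

After a few updates, a task whose raw rewards are ~10 and a task whose raw rewards are ~0.1 both produce normalized rewards of roughly unit scale.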
- Fix an issue where the n-step reward is not properly computed for end-of-episode transitions when `n_step > 1`
- Fix an issue where `observations` and `next_observations` are sampled across different episodes when `n_step > 1` and the buffer is full
- Fix an issue where the discount is not properly computed when `n_step > 1`
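The fixes above amount to stopping reward accumulation at episode boundaries and shrinking the bootstrap discount to the number of steps actually used. A minimal sketch of a correct n-step computation, with a hypothetical flat-buffer layout rather than the repository's actual replay buffer:

```python
def compute_n_step_transition(rewards, dones, start, n_step, gamma):
    """Accumulate up to n_step discounted rewards from index `start`,
    truncating at episode ends so rewards never leak across episodes.

    Returns (n_step_reward, bootstrap_discount, steps_used), where
    bootstrap_discount is gamma ** steps_used, or 0.0 if the episode
    terminated inside the window (no bootstrapping past the boundary).
    """
    n_reward = 0.0
    discount = 1.0
    steps = 0
    for i in range(start, min(start + n_step, len(rewards))):
        n_reward += discount * rewards[i]
        steps += 1
        discount *= gamma
        if dones[i]:
            # Episode ended: do not bootstrap across the boundary.
            return n_reward, 0.0, steps
    return n_reward, gamma ** steps, steps
```

For example, with `rewards = [1, 1, 1, 1]`, a `done` at index 2, `n_step = 3`, and `gamma = 0.9`, starting at index 0 accumulates three rewards (`1 + 0.9 + 0.81`) and returns a zero bootstrap discount, while starting at index 3 uses one reward and bootstraps with `0.9`.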