This PR incorporates MTBench into the current codebase, as a good demonstration that shows how to use FastTD3 for multi-task setup.
- Add support for MTBench along with its wrapper
- Add support for per-task reward normalizer useful for multi-task RL, motivated by BRC paper (https://arxiv.org/abs/2505.23150v1)