- Fix an issue where the n-step reward is not properly computed for end-of-episode transitions when using n_step > 1.
- Fix an issue where the observation and next_observations are sampled across different episodes when using n_step > 1 and the buffer is full.
- Fix an issue where the discount is not properly computed when n_step > 1.

| Name |
|---|
| environments |
| `__init__.py` |
| `fast_td3_deploy.py` |
| `fast_td3_utils.py` |
| `fast_td3.py` |
| `hyperparams.py` |
| `train.py` |
| `training_notebook.ipynb` |
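
The n-step fixes described in the commit message can be illustrated with a minimal sketch. This is not the repository's code: the function name `n_step_return`, its arguments, and the flat (non-circular) list layout are all assumptions made for illustration. It shows the three properties the fixes refer to: rewards stop accumulating at the end of an episode, the transition never reaches into the next episode, and the bootstrap discount reflects the number of steps actually taken.

```python
def n_step_return(rewards, dones, start, n_step, gamma):
    """Hypothetical helper: accumulate up to n_step discounted rewards.

    Returns (reward_sum, bootstrap_discount, last_index, terminated).
    Assumes a flat, non-circular buffer; a real replay buffer would also
    need to handle wrap-around when it is full.
    """
    total = 0.0
    discount = 1.0          # gamma ** k for the current step k
    last = start
    terminated = False
    for k in range(n_step):
        idx = start + k
        if idx >= len(rewards):
            break           # ran out of stored transitions
        total += discount * rewards[idx]
        discount *= gamma   # after the loop this is gamma ** steps_taken
        last = idx
        if dones[idx]:
            terminated = True
            break           # do not cross the episode boundary
    return total, discount, last, terminated


rewards = [1.0, 1.0, 1.0, 1.0, 1.0]
dones = [False, True, False, False, False]

# Episode ends at index 1, so only two rewards are summed.
print(n_step_return(rewards, dones, start=0, n_step=3, gamma=0.5))
# (1.5, 0.25, 1, True)

# No episode end within three steps: full 3-step return.
print(n_step_return(rewards, dones, start=2, n_step=3, gamma=0.5))
# (1.75, 0.125, 4, False)
```

Under these assumptions, a TD target would be `reward_sum + bootstrap_discount * Q(next_obs[last_index])`, with the bootstrap term dropped when `terminated` is true.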