11 Commits

Author SHA1 Message Date
Younggyo Seo
6e890eebd2
Support FastTD3 + SimbaV2 (#13)
- Support hyperspherical normalization
- Support loading FastTD3 + SimbaV2 for both training and inference
- Support (experimental) reward normalization that uses SimbaV2's formulation -- not working that well though
- Updated README for FastTD3 + SimbaV2
2025-06-15 12:49:59 -07:00
Younggyo Seo
1014bf7e82 [hotfix] fix issue when using n-step==1 2025-06-10 08:26:27 +00:00
Younggyo Seo
85cb1c65c7
Fix replay buffer issues when n_steps > 1 (#7)
- Fix an issue where the n-step reward is not properly computed for end-of-episode transitions when using n_step > 1.
- Fix an issue where the observation and next_observations are sampled across different episodes when using n_step > 1 and the buffer is full
- Fix an issue where the discount is not properly computed when n_step > 1
2025-06-07 01:20:48 -04:00
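
The three fixes listed in commit 85cb1c65c7 all concern how an n-step return is cut off at an episode boundary. The following is a minimal sketch of that idea in plain Python, not the repository's actual replay buffer code: the function name `n_step_return` and its arguments are hypothetical and chosen only for illustration. It accumulates discounted rewards for at most `n_step` transitions, stops at the first terminal transition so that rewards from the next episode are never mixed in, and returns the discount that a bootstrapped value would need to be multiplied by.

```python
def n_step_return(rewards, dones, gamma, n_step):
    """Illustrative n-step return that respects episode boundaries.

    rewards, dones: per-step rewards and terminal flags, starting at the
    sampled transition (hypothetical layout, not the repository's buffer).
    Returns (n_step_reward, bootstrap_discount, steps_used, terminated).
    """
    total = 0.0
    discount = 1.0
    steps = 0
    terminated = False
    for reward, done in zip(rewards[:n_step], dones[:n_step]):
        total += discount * reward  # reward at offset k is weighted by gamma**k
        discount *= gamma           # after k steps the discount is gamma**k
        steps += 1
        if done:
            # Stop here: later entries belong to a different episode.
            terminated = True
            break
    return total, discount, steps, terminated


# Episode ends on the second step, so the third reward is ignored.
truncated = n_step_return([1.0, 1.0, 1.0], [False, True, False], gamma=0.5, n_step=3)

# No episode end within the window, so all three steps are used.
full = n_step_return([1.0, 2.0, 4.0], [False, False, False], gamma=0.5, n_step=3)
```

In a TD target of this kind, the bootstrapped value would typically be multiplied by the returned discount and dropped entirely when `terminated` is true. With `n_step=1` the function reduces to the ordinary one-step case.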
Younggyo Seo
fe028b578f update README and gitignore 2025-06-01 22:50:02 +00:00
Younggyo Seo
544adac2b4 fix typo 2025-05-29 17:53:35 +00:00
Younggyo Seo
3f22046fa8
Merge pull request #3 from younggyoseo/minor_updates_dev1
Update tuned_reward for T1
2025-05-29 01:30:33 -07:00
Younggyo Seo
c156ba93fb black formatting and update tuned_reward for T1 2025-05-29 08:29:44 +00:00
Younggyo Seo
65a55433fc
Merge pull request #2 from younggyoseo/memory_optimization_for_playground
memory optimization for playground
2025-05-29 00:00:57 -07:00
Younggyo Seo
5725eba3b8 memory optimization for playground 2025-05-29 06:58:28 +00:00
Younggyo Seo
5db18c2de2 update citations to include blog posts 2025-05-29 02:40:43 +00:00
Younggyo Seo
258bfe67dd Initial Public Release 2025-05-29 01:49:23 +00:00