From dd6c6b616503c04c33239f70f5653916cef99cc1 Mon Sep 17 00:00:00 2001
From: Dominik Roth
Date: Sun, 2 Jun 2024 11:59:26 +0200
Subject: [PATCH] Why was that h3?

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index db1fc87..f6e17a6 100644
--- a/README.md
+++ b/README.md
@@ -34,7 +34,7 @@ Fancy RL provides two main components:
 
 2. **Additional Modules for TRPL**: Designed to integrate with torchrl's primitives-first approach, these modules are ideal for building custom algorithms with precise trust region projections.
 
-### Background on Trust Region Policy Layers (TRPL)
+## Background on Trust Region Policy Layers (TRPL)
 
 Trust region methods are essential in reinforcement learning for ensuring robust policy updates. Traditional methods like TRPO and PPO use approximations, which can sometimes violate constraints or fail to find optimal solutions. To address these issues, TRPL provides differentiable neural network layers that enforce trust regions through closed-form projections for deep Gaussian policies. These layers formalize trust regions individually for each state and complement existing reinforcement learning algorithms.