Machine Learning — Hard
Key points
- TRPO enforces a hard KL-divergence trust-region constraint on each policy update, solved approximately with the conjugate gradient method and a line search
- PPO replaces that constraint with a clipped surrogate objective that discourages large policy-ratio changes
- TRPO's derivation gives a monotonic-improvement guarantee in theory, but the second-order machinery makes it computationally expensive
- PPO is simpler to implement, uses only first-order optimization, and is computationally efficient in practice
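The clipped surrogate objective in the second point can be sketched in a few lines. This is a minimal NumPy illustration, not a full training loop; the function name `ppo_clip_loss` and the default `eps=0.2` are illustrative choices, though 0.2 is a commonly used clip range.

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Clipped surrogate loss (to minimize) for a batch of samples.

    ratio:     pi_new(a|s) / pi_old(a|s), the probability ratio per sample
    advantage: estimated advantage A(s, a) per sample
    eps:       clip range; ratios outside [1 - eps, 1 + eps] stop
               contributing extra gradient signal
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # PPO maximizes the elementwise minimum (a pessimistic bound on the
    # unclipped objective), so the loss to minimize is its negative.
    return -np.minimum(unclipped, clipped).mean()

# A ratio of 1.5 is clipped to 1.2 when the advantage is positive,
# capping how much a single update can move the policy.
loss = ppo_clip_loss(np.array([1.5]), np.array([1.0]))
```

Taking the minimum means clipping only removes the incentive to push the ratio further in the direction the advantage rewards; it never hides a worsening objective, which is what makes this a cheap stand-in for TRPO's explicit trust region.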