What is the difference between proximal policy optimization (PPO) and trust region policy optimization (TRPO)?

Machine Learning Hard

Machine Learning — Hard

What is the difference between proximal policy optimization (PPO) and trust region policy optimization (TRPO)?

Key points

  • TRPO uses a hard KL divergence constraint with conjugate gradient
  • PPO approximates this constraint with a clipped surrogate objective
  • TRPO guarantees monotonic improvement but is computationally expensive
  • PPO is simpler to implement and computationally efficient

Ready to go further?

Related questions