What is the difference between actor-critic and pure policy gradient methods in RL?

Machine Learning Hard

Key points

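  • Actor-critic methods learn both a policy (the actor) and a value function (the critic).
  • Pure policy gradient methods learn only a policy and weight its updates by the sampled returns from complete trajectories.
  • The critic's value estimate acts as a baseline or bootstrapped target, which reduces the variance of the gradient estimate, usually at the cost of some bias.
  • Pure policy gradient methods suffer from high variance because each Monte Carlo return accumulates the randomness of every later action and reward.
  • REINFORCE is the standard example of a pure policy gradient method.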
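
The contrast in the points above can be sketched in a few lines. In both families the policy update scales the gradient of log π(a_t | s_t) by a per-step weight; what differs is where that weight comes from. The sketch below is illustrative only: the function names, the reward sequence, the critic values, and the discount factor are all made-up assumptions, and the one-step TD error is just one common choice of actor-critic weight.

```python
def discounted_returns(rewards, gamma):
    """Monte Carlo return G_t for every step, the weight REINFORCE uses."""
    returns = []
    g = 0.0
    # Walk backwards so each return reuses the one after it.
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def td_advantages(rewards, values, gamma):
    """One-step TD errors r_t + gamma * V(s_{t+1}) - V(s_t).

    `values` holds one more entry than `rewards`: the last entry is the
    critic's value for the final state (0.0 if that state is terminal).
    """
    return [
        r + gamma * values[t + 1] - values[t]
        for t, r in enumerate(rewards)
    ]


# Hypothetical three-step episode (all numbers are illustrative).
rewards = [1.0, 0.0, 2.0]
values = [2.5, 1.8, 2.0, 0.0]  # assumed critic estimates V(s_0) .. V(s_3)
gamma = 0.9

# Pure policy gradient: weights come only from the sampled trajectory.
reinforce_weights = discounted_returns(rewards, gamma)

# Actor-critic: weights come from the critic's bootstrapped estimates.
actor_critic_weights = td_advantages(rewards, values, gamma)

print("REINFORCE weights:   ", reinforce_weights)
print("Actor-critic weights:", actor_critic_weights)
```

In this toy episode the REINFORCE weights depend on every reward that follows a step, while each actor-critic weight depends on a single reward plus the critic's estimates, which is the source of the lower variance. A real implementation would also train the critic, for example by regressing V(s_t) toward the TD target.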
