Machine Learning — Hard
Key points
- Actor-critic uses both policy and value function
- Pure policy gradient methods only use returns from trajectories
- Critic in actor-critic reduces variance
- Pure policy gradient methods suffer from high variance
- REINFORCE is an example of pure policy gradient method
Ready to go further?
Related questions
