What is the purpose of the REINFORCE algorithm in reinforcement learning?

Machine Learning Hard

Key points

  • REINFORCE is a policy gradient algorithm that optimizes the policy directly
  • It estimates the gradient of expected return by sampling trajectories
  • Each update weights the gradient of the log-probability of every sampled action by the return actually observed after it, so actions followed by higher returns become more likely
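  • Plain REINFORCE does not learn a value function; it uses sampled Monte Carlo returns directly, although a baseline can optionally be subtracted to reduce variance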
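
The key points above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the policy is a state-independent softmax over action preferences `theta`, the environment is a made-up two-armed bandit in which arm 1 always pays 1 and arm 0 pays 0, and the names `softmax`, `discounted_returns`, `reinforce_update`, and `train_bandit` are hypothetical helpers chosen for this example.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * r_{t+1} + ... for every step t."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def reinforce_update(theta, episode, gamma=1.0, lr=0.1):
    """One REINFORCE step from a sampled episode of (action, reward) pairs.

    Assumes a state-independent softmax policy, for which
    d/d theta_a log pi(action) = 1[a == action] - pi(a).
    """
    returns = discounted_returns([r for _, r in episode], gamma)
    probs = softmax(theta)
    grad = [0.0] * len(theta)
    for (action, _), g in zip(episode, returns):
        for a in range(len(theta)):
            indicator = 1.0 if a == action else 0.0
            # Log-probability gradient weighted by the observed return.
            grad[a] += (indicator - probs[a]) * g
    # Gradient ascent on expected return; no value function is involved.
    return [t + lr * gr for t, gr in zip(theta, grad)]


def train_bandit(episodes=500, seed=0):
    """Train on a hypothetical bandit: arm 1 pays 1.0, arm 0 pays 0.0."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    for _ in range(episodes):
        probs = softmax(theta)
        action = rng.choices([0, 1], weights=probs)[0]  # sample from the policy
        reward = 1.0 if action == 1 else 0.0
        theta = reinforce_update(theta, [(action, reward)])
    return softmax(theta)


if __name__ == "__main__":
    print(train_bandit())  # probability mass should shift toward arm 1
```

Each episode here is a single step, so the return equals the immediate reward. With longer episodes the same update applies, and the variance of the sampled returns is what usually motivates subtracting a baseline.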
