Machine Learning — Hard
Key points
- REINFORCE is a policy gradient algorithm that optimizes the policy directly
- It estimates the gradient of expected return by sampling trajectories
- It computes the gradient of log-probability of actions weighted by the actual return
- REINFORCE does not use a value function for optimization
Ready to go further?
Related questions
