Machine Learning — Medium
Key points
- Adam adapts the learning rate for each parameter individually, which often speeds convergence
- Vanilla SGD applies a single fixed learning rate to all parameters
- Adam maintains exponential moving averages of the gradient (first moment) and the squared gradient (second moment)
- Vanilla SGD has no adaptive learning-rate mechanism, so it is more sensitive to the choice of learning rate
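The contrast above can be sketched as a single update step for each optimizer. This is a minimal illustration, not a production implementation: the hyperparameter defaults (`beta1`, `beta2`, `eps`) follow common conventions, and the toy gradient is an assumption chosen to show per-parameter adaptation.

```python
import numpy as np

def sgd_step(w, grad, lr=0.1):
    # One fixed learning rate shared by every parameter.
    return w - lr * grad

def adam_step(w, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad       # first moment: EMA of gradients
    v = beta2 * v + (1 - beta2) * grad**2    # second moment: EMA of squared gradients
    m_hat = m / (1 - beta1**t)               # bias correction for zero initialization
    v_hat = v / (1 - beta2**t)
    # Per-parameter effective step: lr * m_hat / (sqrt(v_hat) + eps)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

w = np.array([1.0, 100.0])   # two coordinates with gradients differing by 100x
grad = w                     # e.g. the gradient of the toy loss 0.5 * ||w||^2
print(sgd_step(w, grad))     # SGD's step scales with the raw gradient
w_new, m, v = adam_step(w, grad, np.zeros_like(w), np.zeros_like(w), t=1)
print(w_new)                 # Adam moves both coordinates by roughly lr, despite the 100x gap
```

Note how the fixed-rate SGD step shrinks each coordinate in proportion to its gradient, while Adam's second-moment normalization makes the first step roughly the same size for both coordinates.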