What is the purpose of the Adam optimizer compared to vanilla stochastic gradient descent?

Machine Learning — Medium

Key points

  • Adam adapts the learning rate for each parameter individually, which typically speeds up convergence
  • Vanilla SGD applies one fixed learning rate to all parameters
  • Adam maintains exponentially decaying estimates of the first moment (mean) and second moment (uncentered variance) of the gradients, and uses them to scale each update (see the sketch below)
  • Vanilla SGD has no adaptive mechanism, so a single poorly chosen learning rate affects every parameter equally

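To make the difference concrete, here is a minimal NumPy sketch of one vanilla SGD update next to one Adam update. The hyperparameter defaults (beta1=0.9, beta2=0.999, eps=1e-8) are the values proposed in the original Adam paper; the function names and the toy quadratic objective are illustrative, not taken from any particular library.

    import numpy as np

    def sgd_step(params, grads, lr=0.01):
        """Vanilla SGD: one fixed learning rate for every parameter."""
        return params - lr * grads

    def adam_step(params, grads, m, v, t, lr=0.001,
                  beta1=0.9, beta2=0.999, eps=1e-8):
        """One Adam update. m and v are running estimates of the first
        moment (mean) and second moment (uncentered variance) of the
        gradients; t is the 1-based step count used for bias correction."""
        m = beta1 * m + (1 - beta1) * grads       # first-moment estimate
        v = beta2 * v + (1 - beta2) * grads ** 2  # second-moment estimate
        m_hat = m / (1 - beta1 ** t)              # correct startup bias
        v_hat = v / (1 - beta2 ** t)
        # Per-parameter step size: a large v_hat (steep or noisy direction)
        # shrinks the effective learning rate for that parameter.
        params = params - lr * m_hat / (np.sqrt(v_hat) + eps)
        return params, m, v

    # Toy usage: minimize f(x) = x^2 elementwise, whose gradient is 2x.
    params = np.array([5.0, -3.0])
    m = np.zeros_like(params)
    v = np.zeros_like(params)
    for t in range(1, 201):
        grads = 2 * params
        params, m, v = adam_step(params, grads, m, v, t, lr=0.1)
    print(params)  # both entries approach 0

The division by sqrt(v_hat) is the key line: it gives every parameter its own effective step size, which is exactly the capability vanilla SGD lacks.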