What is the purpose of the Adam optimizer compared to vanilla stochastic gradient descent?

Machine Learning — Medium

Key points

  • Adam adapts the learning rate for each parameter individually, which typically speeds up convergence
  • Vanilla SGD applies one fixed learning rate to all parameters
  • Adam maintains exponentially decaying estimates of the first moment (mean) and second moment (uncentered variance) of the gradients, and uses them to scale each update (see the sketch below)
  • Vanilla SGD has no adaptive mechanism, so a single poorly chosen learning rate affects every parameter equally

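To make the difference concrete, here is a minimal NumPy sketch of one vanilla SGD update next to one Adam update. The hyperparameter defaults (beta1=0.9, beta2=0.999, eps=1e-8) are the values proposed in the original Adam paper; the function names and the toy quadratic objective are illustrative, not taken from any particular library.

    import numpy as np

    def sgd_step(params, grads, lr=0.01):
        """Vanilla SGD: one fixed learning rate for every parameter."""
        return params - lr * grads

    def adam_step(params, grads, m, v, t, lr=0.001,
                  beta1=0.9, beta2=0.999, eps=1e-8):
        """One Adam update. m and v are running estimates of the first
        moment (mean) and second moment (uncentered variance) of the
        gradients; t is the 1-based step count used for bias correction."""
        m = beta1 * m + (1 - beta1) * grads       # first-moment estimate
        v = beta2 * v + (1 - beta2) * grads ** 2  # second-moment estimate
        m_hat = m / (1 - beta1 ** t)              # correct startup bias
        v_hat = v / (1 - beta2 ** t)
        # Per-parameter step size: a large v_hat (steep or noisy direction)
        # shrinks the effective learning rate for that parameter.
        params = params - lr * m_hat / (np.sqrt(v_hat) + eps)
        return params, m, v

    # Toy usage: minimize f(x) = x^2 elementwise, whose gradient is 2x.
    params = np.array([5.0, -3.0])
    m = np.zeros_like(params)
    v = np.zeros_like(params)
    for t in range(1, 201):
        grads = 2 * params
        params, m, v = adam_step(params, grads, m, v, t, lr=0.1)
    print(params)  # both entries approach 0

The division by sqrt(v_hat) is the key line: it gives every parameter its own effective step size, which is exactly the capability vanilla SGD lacks.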