What is the difference between gradient descent, stochastic gradient descent (SGD), and mini-batch gradient descent?

Data Science with Python — Hard

Key points

  • Gradient descent computes the gradient over the full dataset for each update; SGD uses a single randomly chosen sample; mini-batch gradient descent uses a small random subset
  • SGD updates are cheap but noisy, so it can make progress long before a full pass over the data completes; mini-batch balances gradient noise against computational (and vectorization) efficiency
  • All three apply the same update rule; they differ only in how much data each gradient estimate is computed from (see the sketch below)
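A minimal NumPy sketch of the three variants on a toy linear-regression problem. The synthetic data, learning rates, and function names here are illustrative assumptions, not part of any particular library; the point is that the only difference between the methods is the size of the batch each gradient is computed on.

```python
import numpy as np

# Illustrative synthetic data: y = X @ true_w + noise
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=1000)

def mse_gradient(w, X_batch, y_batch):
    """Gradient of mean squared error w.r.t. w on the given batch."""
    residual = X_batch @ w - y_batch
    return 2 * X_batch.T @ residual / len(y_batch)

def batch_gd(w, lr=0.1):
    """Batch gradient descent: one update per epoch, using all samples."""
    return w - lr * mse_gradient(w, X, y)

def sgd(w, lr=0.01):
    """Stochastic gradient descent: one update per sample, in random order."""
    for i in rng.permutation(len(y)):
        w = w - lr * mse_gradient(w, X[i:i + 1], y[i:i + 1])
    return w

def minibatch_gd(w, lr=0.05, batch_size=32):
    """Mini-batch gradient descent: one update per random subset of samples."""
    idx = rng.permutation(len(y))
    for start in range(0, len(y), batch_size):
        batch = idx[start:start + batch_size]
        w = w - lr * mse_gradient(w, X[batch], y[batch])
    return w

w = np.zeros(3)
for _ in range(20):          # 20 epochs of mini-batch updates
    w = minibatch_gd(w)
print(w)                     # should approach true_w = [2.0, -1.0, 0.5]
```

Swapping `minibatch_gd` for `batch_gd` or `sgd` in the loop shows the trade-off directly: batch gradient descent takes smooth but expensive steps, SGD takes many cheap noisy steps, and mini-batch sits between the two while keeping the matrix operations vectorized.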
