Data Science with Python — Hard
Key points
- Gradient descent (batch) uses the full dataset; SGD uses one random sample; mini-batch uses a small random subset
- SGD's updates are noisy but cheap, so it often makes faster initial progress; mini-batch trades gradient noise against per-step efficiency
- Each method computes its gradient from a different amount of data per update step: the whole dataset, a single sample, or a small batch (see the sketch below)
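A minimal NumPy sketch contrasting the three update rules on a toy linear-regression problem. The data, learning rate, epoch count, and batch sizes here are illustrative assumptions, not part of the original material; the only thing that changes between methods is how many rows feed each gradient step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: y = 3x + 2 + noise.
X = rng.uniform(-1, 1, size=(200, 1))
y = 3 * X[:, 0] + 2 + rng.normal(scale=0.1, size=200)
Xb = np.hstack([X, np.ones((200, 1))])  # append a bias column

def gradient(w, Xs, ys):
    """Mean-squared-error gradient over the rows in Xs, ys."""
    return 2 * Xs.T @ (Xs @ w - ys) / len(ys)

def train(batch_size, lr=0.1, epochs=50):
    """batch_size == len(Xb) -> full-batch GD, 1 -> SGD, else mini-batch."""
    w = np.zeros(2)
    n = len(Xb)
    for _ in range(epochs):
        idx = rng.permutation(n)              # reshuffle each epoch
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            w -= lr * gradient(w, Xb[batch], y[batch])
    return w

print("full batch :", train(batch_size=len(Xb)))
print("SGD        :", train(batch_size=1))
print("mini-batch :", train(batch_size=32))
```

All three runs should land near the true weights [3, 2]; full-batch takes one smooth step per epoch, SGD takes 200 noisy ones, and mini-batch sits in between, which is exactly the trade-off the key points describe.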