What is the difference between SMOTE and random oversampling for handling imbalanced datasets?

Data Science with Python Hard

Key points

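  • SMOTE (Synthetic Minority Over-sampling Technique) creates new synthetic samples by interpolating between an existing minority instance and one of its nearest minority-class neighbors
  • Random oversampling duplicates existing minority class samples, drawn at random with replacement
  • SMOTE can reduce overfitting risk compared with duplication because its synthetic samples are more varied, though it may create unrealistic points where the classes overlap
  • Random oversampling may lead to overfitting because the model sees identical instances repeatedly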
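
The contrast in the key points can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a reference implementation: the function names `random_oversample` and `smote_like`, the toy `minority` points, and the choice of `k=2` are all assumptions made for the example.

```python
import math
import random


def random_oversample(minority, n_new, rng):
    """Return n_new exact copies drawn with replacement from minority."""
    return [rng.choice(minority) for _ in range(n_new)]


def smote_like(minority, n_new, k, rng):
    """Return n_new synthetic points built by SMOTE-style interpolation."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Find the k nearest minority neighbors of base (excluding base itself).
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        # Pick a random point on the segment between base and the neighbor.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


rng = random.Random(0)
# Hypothetical 2-D minority-class points, for illustration only.
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.2), (3.0, 3.0)]

duplicates = random_oversample(minority, 6, rng)
synthetic = smote_like(minority, 6, k=2, rng=rng)

print("duplicates are all existing points:",
      all(p in minority for p in duplicates))
print("synthetic points:", synthetic)
```

Every point in `duplicates` is an exact copy of an original, while each point in `synthetic` lies on a line segment between two original minority points. In practice, the imbalanced-learn library provides `RandomOverSampler` and `SMOTE` in `imblearn.over_sampling`, which would normally be used instead of hand-written code like this, and resampling should be applied to the training split only.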
