What is the difference between BERT and GPT in terms of architecture and pre-training objectives?

Machine Learning · Hard


Key points

  • BERT uses a bidirectional Transformer encoder; GPT uses a unidirectional (left-to-right) Transformer decoder
  • BERT is pre-trained with masked language modeling (MLM): random tokens are masked and predicted from both left and right context
  • GPT is pre-trained with causal language modeling: each token is predicted from the preceding tokens only
  • The original BERT adds a next sentence prediction (NSP) objective; GPT has no such auxiliary task
  • These choices make BERT a natural fit for understanding tasks (classification, QA, NER) and GPT a natural fit for text generation
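The two differences above can be sketched in a few lines of NumPy. This is an illustrative toy, not either model's real implementation: the masks show what each position may attend to (full visibility for a BERT-style encoder, lower-triangular for a GPT-style decoder), and the target construction shows what each objective trains the model to predict. The token list and mask count are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
tokens = ["the", "cat", "sat", "on", "the", "mat"]
n = len(tokens)

# Architecture: what each position may attend to.
bidirectional_mask = np.ones((n, n), dtype=int)    # BERT encoder: every token sees every token
causal_mask = np.tril(np.ones((n, n), dtype=int))  # GPT decoder: position i sees positions <= i

# Pre-training objective: what the model is trained to predict.
# BERT (MLM): hide a random subset of tokens, predict them from both sides.
mlm_positions = sorted(rng.choice(n, size=2, replace=False))
mlm_input = ["[MASK]" if i in mlm_positions else t for i, t in enumerate(tokens)]
mlm_targets = {i: tokens[i] for i in mlm_positions}

# GPT (CLM): at every position, predict the next token from the prefix so far.
clm_inputs = tokens[:-1]
clm_targets = tokens[1:]
```

Note how the causal mask is what forces GPT's objective: since position i never sees position i+1, predicting the next token is well-defined at training time, whereas BERT's full-visibility encoder would trivially "cheat" on next-token prediction and therefore needs masking to create a prediction task.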
