What is the difference between RLHF (Reinforcement Learning from Human Feedback) and standard supervised fine-tuning of language models?

Machine Learning — Hard

Key points

  • RLHF first trains a reward model on human preference comparisons, then optimizes the language model against that reward with reinforcement learning (commonly PPO)
  • Supervised fine-tuning trains the model directly on human-written demonstrations with a standard next-token cross-entropy loss
  • RLHF can capture nuanced, hard-to-specify preferences such as helpfulness and harmlessness, since annotators only rank candidate outputs rather than write ideal ones
  • Supervised fine-tuning requires a complete labeled demonstration for every training example, whereas RLHF's preference comparisons are cheaper to collect
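The difference in training signal can be sketched with a toy example (pure Python, hypothetical numbers, not a real training loop): supervised fine-tuning minimizes per-token negative log-likelihood on a human demonstration, while the RLHF pipeline first fits a reward model with a Bradley-Terry pairwise loss over preference comparisons.

```python
import math

# Toy sketch only: illustrates the two loss functions, not full training.

def sft_loss(token_probs):
    """Supervised fine-tuning signal: average negative log-likelihood of the
    tokens in a human-written demonstration (cross-entropy)."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

def preference_loss(reward_chosen, reward_rejected):
    """Reward-model training signal in RLHF: Bradley-Terry loss that pushes
    the chosen response's scalar reward above the rejected one's."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# SFT needs a full labeled demonstration (a target probability per token) ...
demo_probs = [0.9, 0.8, 0.95]          # hypothetical model probabilities
print(round(sft_loss(demo_probs), 4))

# ... while the RLHF reward model needs only a comparison: A preferred over B.
print(round(preference_loss(reward_chosen=2.0, reward_rejected=0.5), 4))
```

In the full RLHF pipeline, the fitted reward model then scores fresh samples from the policy, and an RL algorithm such as PPO updates the language model to increase that reward, usually with a KL penalty that keeps it close to the supervised starting point.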
