Machine Learning — Hard
Key points
- RLHF optimizes a language model against a reward model trained on human preference comparisons between candidate responses
- Supervised fine-tuning trains the model directly on human-provided demonstrations with a standard cross-entropy loss
- RLHF can capture nuanced preferences such as helpfulness and harmlessness that are hard to specify through demonstrations alone
- Supervised fine-tuning needs a labeled demonstration for every training example, whereas RLHF learns from relative comparisons (see the sketch after this list)
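The practical difference comes down to the objective each approach optimizes. Below is a minimal sketch in PyTorch, using toy random tensors in place of a real model, of the three objectives involved: the SFT cross-entropy loss on demonstrations, the Bradley-Terry loss commonly used to train the reward model from preference pairs, and a simplified KL-penalized RLHF policy objective. All tensor shapes, the random inputs, and the `beta` value are illustrative assumptions, not part of any specific implementation.

```python
# Toy contrast of SFT vs. the RLHF pipeline (assumed shapes, random tensors).
import torch
import torch.nn.functional as F

vocab, seq_len, batch = 100, 8, 4

# --- Supervised fine-tuning: cross-entropy on human demonstrations --------
# Model logits and the demonstrated target tokens (both random here).
sft_logits = torch.randn(batch, seq_len, vocab)
demo_tokens = torch.randint(0, vocab, (batch, seq_len))
sft_loss = F.cross_entropy(sft_logits.reshape(-1, vocab), demo_tokens.reshape(-1))

# --- Reward model: Bradley-Terry loss on preference pairs -----------------
# Scalar scores for the preferred ("chosen") and dispreferred ("rejected")
# responses; training pushes chosen scores above rejected ones.
reward_chosen = torch.randn(batch)
reward_rejected = torch.randn(batch)
rm_loss = -F.logsigmoid(reward_chosen - reward_rejected).mean()

# --- RLHF policy step: maximize reward with a KL penalty ------------------
# Per-token log-probs from the policy and the frozen reference (SFT) model;
# summing the log-ratio gives an approximate per-response KL estimate, and
# beta controls how far the policy may drift from the reference.
policy_logprobs = torch.randn(batch, seq_len)
reference_logprobs = torch.randn(batch, seq_len)
rewards = torch.randn(batch)  # reward-model score per sampled response
beta = 0.1
kl_per_response = (policy_logprobs - reference_logprobs).sum(dim=-1)
rlhf_objective = (rewards - beta * kl_per_response).mean()  # to be maximized

print(f"SFT loss: {sft_loss.item():.3f}  "
      f"RM loss: {rm_loss.item():.3f}  "
      f"RLHF objective: {rlhf_objective.item():.3f}")
```

Note how the SFT loss needs a full target sequence per example, while the reward-model loss only needs a relative judgment between two responses; the policy objective then uses that learned reward rather than any labeled output.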
