AI Fundamentals — Hard
Key points
- Multi-head attention runs attention multiple times in parallel
- Different learned projections are used for each attention head
- Lets the model jointly attend to information from different representation subspaces at different positions
- Head outputs are concatenated and linearly projected, improving the model's ability to capture complex relationships
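The points above can be sketched as a minimal NumPy implementation. This is an illustrative sketch, not a production implementation: the function name `multi_head_attention`, the weight matrices `W_q`, `W_k`, `W_v`, `W_o`, and the toy dimensions are all assumed for the example, and in a real model the projections would be learned rather than randomly initialized.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, W_q, W_k, W_v, W_o, num_heads):
    # x: (seq_len, d_model). One set of learned projections produces Q, K, V;
    # slicing the result per head gives each head its own learned subspace.
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    Q, K, V = x @ W_q, x @ W_k, x @ W_v  # each (seq_len, d_model)

    # Split into heads: (num_heads, seq_len, d_head) — heads run in parallel
    def split(t):
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    Qh, Kh, Vh = split(Q), split(K), split(V)

    # Scaled dot-product attention, computed for all heads at once
    scores = Qh @ Kh.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)          # rows sum to 1 per head
    heads = weights @ Vh                        # (num_heads, seq_len, d_head)

    # Concatenate head outputs and apply the output projection
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ W_o

# Toy example with assumed sizes
rng = np.random.default_rng(0)
seq_len, d_model, num_heads = 4, 8, 2
x = rng.standard_normal((seq_len, d_model))
W_q, W_k, W_v, W_o = (0.1 * rng.standard_normal((d_model, d_model))
                      for _ in range(4))
out = multi_head_attention(x, W_q, W_k, W_v, W_o, num_heads)
print(out.shape)  # output keeps the input shape: (seq_len, d_model)
```

Note the design choice: rather than running `num_heads` separate attention functions in a loop, the heads are stacked into one batched tensor so a single matrix multiply serves all of them, which is how frameworks implement the parallelism in practice.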