What is the purpose of knowledge distillation in model compression?

Machine Learning — Hard

Key points

  • Knowledge distillation transfers knowledge from a large teacher model to a smaller student model through the teacher's soft probability outputs
  • The goal is to compress the model while preserving most of its predictive performance
  • The student model is trained to mimic the teacher model's behavior
  • Soft labels (the teacher's full output distribution, typically softened with a temperature-scaled softmax) are used alongside one-hot hard labels; they convey the teacher's relative confidences across classes, which hard labels discard
  • This makes distillation an efficient way to compress models without a significant loss in performance; see the sketch after this list
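As a concrete illustration, here is a minimal sketch of a standard distillation loss in PyTorch: a KL-divergence term between temperature-softened teacher and student distributions, mixed with ordinary cross-entropy on the ground-truth labels. The temperature T and mixing weight alpha are assumed hyperparameters chosen for the example, not values given in this question.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft targets: the teacher's temperature-scaled probability distribution.
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    student_log_probs = F.log_softmax(student_logits / T, dim=-1)
    # KL divergence pulls the student's distribution toward the teacher's.
    # The T**2 factor keeps gradient magnitudes comparable across temperatures.
    soft_loss = F.kl_div(student_log_probs, soft_targets,
                         reduction="batchmean") * (T * T)
    # Standard cross-entropy on the hard (one-hot) labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    # Blend the two objectives; alpha controls how much the student
    # relies on the teacher versus the ground truth.
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```

A higher temperature flattens the teacher's distribution, exposing more of the "dark knowledge" in the small probabilities assigned to incorrect classes; alpha is usually tuned so the soft term dominates early in training.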
