8 Innovative BERT Knowledge Distillation Papers That Have Changed The Landscape of NLP

Published 2023-10-08 08:44:36 · Author: emanlee


Contemporary state-of-the-art NLP models are difficult to deploy in production. Knowledge distillation offers tools for tackling this issue, along with several others, but it has its quirks.

 

BERT’s inefficiency has not gone unnoticed, and many researchers have pursued ways to reduce its cost and size. Some of the most active research is in model compression techniques such as smaller architectures (structured pruning), knowledge distillation, quantization, and unstructured pruning. A few of the more impactful papers include:

  • DistilBERT used knowledge distillation to transfer knowledge from a BERT base model to a 6-layer version (a minimal sketch of this kind of distillation loss follows the list).
  • TinyBERT implemented a more complicated distillation setup to better transfer the knowledge from the baseline model into a 4-layer version.
  • The Lottery Ticket Hypothesis applied magnitude pruning during pre-training of a BERT model to create a sparse architecture that generalized well across fine-tuning tasks (a one-shot version of magnitude pruning is sketched below).
  • Movement Pruning combined magnitude and gradient information to remove redundant parameters while fine-tuning with distillation.
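
To make the distillation idea concrete, here is a minimal sketch of a DistilBERT-style soft-target loss, assuming teacher and student logits are already available. The function name, temperature, and weighting are illustrative choices, not the exact settings from any of the papers above.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Blend a soft-target KL term (teacher -> student) with the usual hard-label loss.

    `temperature` softens both distributions; `alpha` weights the two terms.
    Both values are illustrative, not settings from a specific paper.
    """
    # Soft targets: KL divergence between temperature-scaled distributions.
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Hard targets: standard cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)

    return alpha * soft + (1.0 - alpha) * hard

# Toy usage with random tensors standing in for model outputs.
batch, num_classes = 8, 3
teacher_logits = torch.randn(batch, num_classes)
student_logits = torch.randn(batch, num_classes, requires_grad=True)
labels = torch.randint(0, num_classes, (batch,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
print(loss.item())
```

The student is trained to match the teacher's softened output distribution while still fitting the true labels, which is the core mechanism shared by DistilBERT and TinyBERT (TinyBERT additionally matches intermediate representations, which is not shown here).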

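For the pruning side, the following is a minimal sketch of one-shot magnitude pruning on a single weight matrix. It is not the iterative, pre-training-time schedule from the Lottery Ticket work; the sparsity level and helper name are illustrative.

```python
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
    """Zero out the fraction `sparsity` of weights with the smallest absolute value.

    A layer-local, one-shot illustration; iterative magnitude pruning repeats this
    with retraining in between, which is not shown here.
    """
    k = int(weight.numel() * sparsity)
    if k == 0:
        return torch.ones_like(weight, dtype=torch.bool)
    # Threshold = k-th smallest absolute value; weights at or below it are pruned.
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = weight.abs() > threshold
    weight.data *= mask  # apply the binary mask in place
    return mask

# Toy usage on a random linear layer's weight matrix.
w = torch.nn.Linear(16, 16).weight
mask = magnitude_prune(w, sparsity=0.5)
print(f"kept {mask.float().mean().item():.0%} of weights")
```
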
 

https://towardsdatascience.com/https-medium-com-chaturangarajapakshe-text-classification-with-transformer-models-d370944b50ca

 

The post linked above covers text classification for problems with a limited number of samples.