Mexican NLP 2024 Summer School Tutorial on Knowledge Distillation and Parameter Efficient Fine-tuning
The slides can be found here.
- Baseline Training: This notebook shows an example of fine-tuning BERT for a standard classification task (a minimal setup is sketched after this list).
- Small Model Training: Similar to Baseline Training, but we now train a significantly smaller model from scratch (see the small-configuration sketch below).
- Small Model Training + KD: We train the small model from scratch with knowledge distillation, using the baseline model as the teacher (the distillation loss is sketched below).
- Small Model Training + Optimized KD: We speed up knowledge distillation by computing the teacher's logits once and caching them, so the teacher does not need a forward pass on every training step (see the caching sketch below).
- Baseline Training with LoRA: We fine-tune the baseline model again, this time with LoRA, updating only a small set of low-rank adapter weights (see the LoRA sketch below).
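
The baseline notebook fine-tunes BERT for classification. A minimal sketch of that kind of setup with the Hugging Face `Trainer` is below; the checkpoint, dataset, and hyperparameters are illustrative assumptions, not necessarily what the notebook uses.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Assumed choices for illustration: checkpoint, dataset, and hyperparameters.
checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

dataset = load_dataset("imdb")  # any standard text-classification dataset works here

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="baseline-bert",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)
trainer = Trainer(model=model, args=args,
                  train_dataset=dataset["train"], eval_dataset=dataset["test"])
trainer.train()
```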
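
For the small model, one way to obtain a much smaller architecture is to build a BERT configuration with fewer layers and a narrower hidden size and initialize it randomly. The specific sizes below are assumptions for illustration.

```python
from transformers import BertConfig, BertForSequenceClassification

# A deliberately small configuration, trained from scratch (random init)
# rather than loaded from a pretrained checkpoint. Sizes are assumed values.
small_config = BertConfig(
    num_hidden_layers=4,
    hidden_size=256,
    num_attention_heads=4,
    intermediate_size=1024,
    num_labels=2,
)
small_model = BertForSequenceClassification(small_config)
```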
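
Knowledge distillation typically combines a soft-target loss, where the student matches the teacher's temperature-softened distribution, with the usual hard-label cross-entropy. A minimal PyTorch sketch, with the temperature and mixing weight as assumed hyperparameters:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """Mix the soft-target KL term with the hard-label cross-entropy term."""
    # Soften both distributions with the temperature and compare them with KL divergence.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    kd_loss = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * temperature ** 2
    # Standard cross-entropy against the ground-truth labels.
    ce_loss = F.cross_entropy(student_logits, labels)
    return alpha * kd_loss + (1 - alpha) * ce_loss
```

During student training, the teacher runs in eval mode under `torch.no_grad()` to produce `teacher_logits` for each batch.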
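
Because the teacher's logits do not change while the student trains, they can be computed once and reused every epoch. A sketch assuming Hugging Face-style models and batches keyed by `input_ids`/`attention_mask`; the dataloader must iterate in a fixed order (no shuffling) so the cached rows line up with the examples.

```python
import torch

@torch.no_grad()
def cache_teacher_logits(teacher, dataloader, device="cpu"):
    """Run the teacher once over the dataset and store its logits for reuse."""
    teacher.eval()
    teacher.to(device)
    cached = []
    for batch in dataloader:  # dataloader must not shuffle, so order is stable
        logits = teacher(
            input_ids=batch["input_ids"].to(device),
            attention_mask=batch["attention_mask"].to(device),
        ).logits
        cached.append(logits.cpu())
    return torch.cat(cached, dim=0)
```

The cached tensor can then be stored alongside the dataset (for example as an extra column) and looked up per batch instead of calling the teacher.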
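
LoRA freezes the pretrained weights and learns small low-rank update matrices inside selected layers. A sketch using the `peft` library; the rank, scaling factor, and target modules below are assumed values.

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSequenceClassification

base_model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,                                # rank of the low-rank update matrices (assumed)
    lora_alpha=16,                      # scaling factor for the LoRA updates (assumed)
    lora_dropout=0.1,
    target_modules=["query", "value"],  # attention projections to adapt in BERT (assumed)
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters and the classifier head train
```

The wrapped `model` can be passed to the same `Trainer` setup as the baseline; only the adapter parameters receive gradient updates.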