A simple PyTorch implementation of several knowledge-distillation methods.
For KD (knowledge distillation) and AT (attention transfer), ResNet20 is the student network and ResNet56 is the teacher network.
For DML (deep mutual learning), both networks are ResNet20 students trained jointly.
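The KD and AT objectives can be sketched as below. This is a minimal illustration of the standard formulations (temperature-scaled soft targets for KD, normalized activation-based attention maps for AT), not necessarily this repo's exact implementation; the function names and the hyperparameters `T` and `alpha` are assumptions.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """KD: KL between temperature-softened teacher/student distributions,
    blended with the usual hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale gradients after dividing logits by T
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

def at_map(feat):
    """Attention map: channel-wise mean of squared activations,
    flattened and L2-normalized per sample."""
    a = feat.pow(2).mean(dim=1).flatten(1)
    return F.normalize(a, dim=1)

def at_loss(student_feats, teacher_feats):
    """AT: MSE between paired student/teacher attention maps."""
    return sum((at_map(s) - at_map(t)).pow(2).mean()
               for s, t in zip(student_feats, teacher_feats))
```

In practice the AT term is computed on intermediate feature maps taken after each residual stage and added to the cross-entropy loss with a weighting factor.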
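The DML objective can be sketched as follows: each of the two students is trained with cross-entropy on the labels plus a KL term pulling it toward the other student's predictions. This is a generic sketch of the technique, not this repo's exact code; the function name and the symmetric two-call usage are assumptions.

```python
import torch
import torch.nn.functional as F

def dml_loss(logits_self, logits_peer, labels):
    """One student's DML loss: hard-label CE plus KL toward its peer.
    The peer's logits are detached so each network only updates itself."""
    ce = F.cross_entropy(logits_self, labels)
    kl = F.kl_div(
        F.log_softmax(logits_self, dim=1),
        F.softmax(logits_peer.detach(), dim=1),
        reduction="batchmean",
    )
    return ce + kl

# Each step computes the loss twice, once per student:
#   loss_a = dml_loss(logits_a, logits_b, labels)
#   loss_b = dml_loss(logits_b, logits_a, labels)
```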
Train the student network:

```shell
python train.py -m student -gpu 1
```
| Metric | Raw ResNet20 | Raw ResNet56 | KD | AT | DML |
|---|---|---|---|---|---|
| Top-1 (%) | 91.030 | 92.257 | 91.723 | 91.822 | 91.574 |