Hello, your work on knowledge distillation is great!
However, I have a question about the FitNets code.
I found that you simply sum the losses for the backward pass; specifically, `loss_feat` and `loss_ce` are passed to the trainer together. However, according to the original paper, the intermediate layers should first be initialized by training with the feature (hint) loss, and only then should the whole student model be trained with the CE loss. Have I misunderstood the code or the process? Looking forward to your reply.
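For concreteness, here is a minimal sketch of the single-stage update I am describing; `loss_feat` and `loss_ce` follow the names in the code, while the surrounding function and its arguments are my paraphrase, not the actual trainer interface:

```python
import torch

def train_step(student, teacher, x, y, optimizer, criterion_ce, criterion_feat):
    # Teacher is frozen; no gradients flow through it.
    with torch.no_grad():
        t_feat, t_logits = teacher(x)
    s_feat, s_logits = student(x)
    loss_feat = criterion_feat(s_feat, t_feat)  # intermediate-feature (hint) loss
    loss_ce = criterion_ce(s_logits, y)         # cross-entropy on labels
    loss = loss_feat + loss_ce                  # summed and backpropagated together
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```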
Thanks for your attention. We checked the code against the original paper. FitNets is indeed a two-stage distillation method, but our implementation simply combines the feature loss and the logit loss, following CRD's codebase. We will correct this when we update the code.
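For reference, a rough sketch of the two-stage schedule from the FitNets paper, as discussed above; the `hint_features`/`guided_features` accessors and the `regressor` (which maps the student's guided-layer features to the teacher's hint dimensions) are hypothetical names for illustration, not our actual API:

```python
import torch

def fitnets_two_stage(student, teacher, regressor, loader,
                      criterion_feat, criterion_ce,
                      hint_epochs, kd_epochs, lr=0.01):
    # Stage 1: initialize the student up to the guided layer by matching
    # the teacher's hint features through a small regressor.
    opt1 = torch.optim.SGD(
        list(student.parameters()) + list(regressor.parameters()), lr=lr)
    for _ in range(hint_epochs):
        for x, _ in loader:
            with torch.no_grad():
                t_feat = teacher.hint_features(x)            # hypothetical accessor
            s_feat = regressor(student.guided_features(x))   # hypothetical accessor
            loss_feat = criterion_feat(s_feat, t_feat)
            opt1.zero_grad()
            loss_feat.backward()
            opt1.step()

    # Stage 2: train the whole student with the classification loss,
    # starting from the stage-1 initialization.
    opt2 = torch.optim.SGD(student.parameters(), lr=lr)
    for _ in range(kd_epochs):
        for x, y in loader:
            loss_ce = criterion_ce(student(x), y)
            opt2.zero_grad()
            loss_ce.backward()
            opt2.step()
```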