Hello, your work on knowledge distillation is great!
However, I have a question about the FitNets code.
I found that you simply sum the losses for the backward pass; specifically, `loss_feat` and `loss_ce` are passed to the trainer together. However, according to the original paper, the intermediate layers should first be initialized by training with the feature (hint) loss, and only then should the whole student model be trained with the CE loss. Have I misunderstood the code or the process? Looking forward to your reply.
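For concreteness, here is a minimal sketch of the single-stage update I am describing; `loss_feat` and `loss_ce` follow the names in the code, while the surrounding function and its arguments are my paraphrase, not the actual trainer interface:

```python
import torch

def train_step(student, teacher, x, y, optimizer, criterion_ce, criterion_feat):
    # Teacher is frozen; no gradients flow through it.
    with torch.no_grad():
        t_feat, t_logits = teacher(x)
    s_feat, s_logits = student(x)
    loss_feat = criterion_feat(s_feat, t_feat)  # intermediate-feature (hint) loss
    loss_ce = criterion_ce(s_logits, y)         # cross-entropy on labels
    loss = loss_feat + loss_ce                  # summed and backpropagated together
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```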
Thanks for your attention. We checked the code against the original paper. FitNets is indeed a two-stage distillation method, but our implementation simply combines the feature loss and the logit loss, following CRD's codebase. We will correct this when we update the code.
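For reference, a rough sketch of the two-stage schedule from the FitNets paper, as discussed above; the `hint_features`/`guided_features` accessors and the `regressor` (which maps the student's guided-layer features to the teacher's hint dimensions) are hypothetical names for illustration, not our actual API:

```python
import torch

def fitnets_two_stage(student, teacher, regressor, loader,
                      criterion_feat, criterion_ce,
                      hint_epochs, kd_epochs, lr=0.01):
    # Stage 1: initialize the student up to the guided layer by matching
    # the teacher's hint features through a small regressor.
    opt1 = torch.optim.SGD(
        list(student.parameters()) + list(regressor.parameters()), lr=lr)
    for _ in range(hint_epochs):
        for x, _ in loader:
            with torch.no_grad():
                t_feat = teacher.hint_features(x)            # hypothetical accessor
            s_feat = regressor(student.guided_features(x))   # hypothetical accessor
            loss_feat = criterion_feat(s_feat, t_feat)
            opt1.zero_grad()
            loss_feat.backward()
            opt1.step()

    # Stage 2: train the whole student with the classification loss,
    # starting from the stage-1 initialization.
    opt2 = torch.optim.SGD(student.parameters(), lr=lr)
    for _ in range(kd_epochs):
        for x, y in loader:
            loss_ce = criterion_ce(student(x), y)
            opt2.zero_grad()
            loss_ce.backward()
            opt2.step()
```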