The first term of “grad” seems to be wrong #2
In the code, the first term of "grad" is:
gamma * (power_prob_data[ind_i] / (1 - prob_data[ind_i])) * log_prob_data[ind_i]
However, if (i == j), multiplying by the (prob_data[ind_i] - 1) factor should turn it into:
-gamma * (power_prob_data[ind_i] / (1 - prob_data[ind_i])) * log_prob_data[ind_i]
Otherwise, it becomes a gradient ascent optimization.
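For reference, here is the sign worked out explicitly. A minimal sketch, assuming the usual per-example focal loss L = -(1 - p_t)^gamma * log(p_t), with p_t the softmax probability of the target class (the alpha weight is omitted since it only scales both terms):

```latex
% Derivative of the focal loss w.r.t. the target probability:
%   L = -(1 - p_t)^\gamma \log p_t
\frac{\partial L}{\partial p_t}
  = \gamma (1 - p_t)^{\gamma - 1} \log p_t - \frac{(1 - p_t)^{\gamma}}{p_t}

% Chain through the softmax, \partial p_t / \partial z_t = p_t (1 - p_t):
\frac{\partial L}{\partial z_t}
  = \gamma\, p_t (1 - p_t)^{\gamma} \log p_t - (1 - p_t)^{\gamma + 1}
```

Since log p_t < 0, both terms of dL/dz_t are non-positive, so descent pushes the target logit up, as it should. A Caffe-style backward pass written in terms of (prob_data[ind_i] - 1) rather than (1 - prob_data[ind_i]) flips the sign of the whole product, which appears to be exactly the point of this issue.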
Hi @PhyscalX, here I just ignore the sign, because it can be dismissed by multiplying the …
Your mathematical reasoning is right, the sign is dismissed. But taking log(p) directly is dangerous, because the lower bound of the softmax outputs can be very small. Besides, your loss is wrong as well.
Hi @PhyscalX, you are right, the loss was computed wrongly; thanks for the reminder. For the first term you mention, I need to double-check. Thanks again.
I have verified my idea on cifar10-quick; it is right, and I got similar validation accuracy as the original loss (alpha = 1.0/0.75/0.5/0.25, gamma = 2.0). eps is very important in focal loss; all the divisions in your code are dangerous when alpha > 0.25. I preset eps as 1e-10 in my framework Dragon (op_kernel.h, line 336, the declaration of kernel::SparseSoftmaxFocalLoss).
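For concreteness, here is a minimal sketch of the eps guard being suggested. The function name and layout are hypothetical (not taken from either repo), but the clamping mirrors the advice above: clamp p before the log, and clamp (1 - p) before dividing by it:

```cpp
#include <algorithm>
#include <cmath>

// Per-element focal-loss gradient for the target logit, guarded with eps.
// Assumptions, following the thread: prob is the softmax output p_t,
// gamma is the focusing parameter, and eps ~ 1e-10 as PhyscalX suggests.
inline float focal_grad_target(float prob, float gamma, float eps = 1e-10f) {
  // Clamp before the log: softmax outputs can underflow toward 0,
  // where log(p) diverges to -inf.
  const float log_prob = std::log(std::max(prob, eps));
  // Clamp (1 - p) before any division or pow: p -> 1 otherwise blows up
  // the (power_prob / (1 - p)) style terms quoted in this thread.
  const float one_minus_p = std::max(1.0f - prob, eps);
  const float power_prob = std::pow(one_minus_p, gamma);  // (1 - p)^gamma
  // d/dz_t of -(1 - p_t)^gamma * log(p_t), chained through the softmax:
  //   gamma * p * (1 - p)^gamma * log(p) - (1 - p)^(gamma + 1)
  return gamma * prob * power_prob * log_prob - power_prob * one_minus_p;
}
```

With a guard like this in place, the alpha = 0.25…1.0, gamma = 2.0 settings mentioned above should no longer produce inf/NaN even for saturated probabilities.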
Hi @PhyscalX, thanks a lot. I have fixed the problems you told me about. For the gradient, I forgot to differentiate the (1 - p_t) term, which is why I ignored the sign; now I have added it back. Thanks again.
Hi @PhyscalX, you're right, eps is very important; I have added it to solve the numerical problem. Thanks for pointing out my error and giving such useful suggestions.
And lastly, I recommend you multiply "grad" by prob_data[ind_i].
Hi @PhyscalX, I have updated it.
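Regarding that last recommendation (multiplying "grad" by prob_data[ind_i]): the extra factor plausibly comes from the softmax Jacobian, since every path from the loss to a logit passes through p_t. A sketch of the chain-rule step:

```latex
% Softmax Jacobian: \partial p_t / \partial z_i = p_t (\delta_{ti} - p_i).
% Each logit gradient therefore carries a factor of p_t:
\frac{\partial L}{\partial z_i}
  = \frac{\partial L}{\partial p_t} \cdot p_t \,(\delta_{ti} - p_i)
```

Differentiating only with respect to p_t, as the earlier code apparently did, drops this p_t factor, so the update magnitudes would be off even once the sign is fixed.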