The first term of “grad” seems to be wrong #2

PhyscalX opened this issue Aug 17, 2017 · 8 comments

PhyscalX commented Aug 17, 2017

In the code, the first term of the gradient is
`gamma * (power_prob_data[ind_i] / (1 - prob_data[ind_i])) * log_prob_data[ind_i]`.
However, when i == j, the factor `(prob_data[ind_i] - 1)` should turn it into
`-gamma * (power_prob_data[ind_i] / (1 - prob_data[ind_i])) * log_prob_data[ind_i]`;
otherwise it becomes a gradient ascent optimization.
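
For reference, here is the derivation I am assuming (a sketch in my own notation: write p_t for `prob_data[ind_i]` of the target class, so `power_prob_data[ind_i]` = (1 - p_t)^gamma and `log_prob_data[ind_i]` = log p_t; alpha is omitted):

```latex
% Focal loss for the target class t:
\mathrm{FL}(p_t) = -(1 - p_t)^{\gamma}\log p_t
% Derivative w.r.t. the probability:
\frac{\partial \mathrm{FL}}{\partial p_t}
  = \gamma (1 - p_t)^{\gamma-1}\log p_t - \frac{(1 - p_t)^{\gamma}}{p_t}
% Chain rule with the softmax Jacobian \partial p_t / \partial z_t = p_t (1 - p_t):
\frac{\partial \mathrm{FL}}{\partial z_t}
  = \gamma\, p_t (1 - p_t)^{\gamma}\log p_t - (1 - p_t)^{\gamma+1}
```

If I am not mistaken, whichever way the softmax factor is folded in, the i == j branch should reduce to the last line above.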

@PhyscalX changed the title from "The first term 哦分" to "The first term of “grad” seems to be wrong" on Aug 17, 2017
@zimenglan-sysu-512 (Owner) commented:

hi @PhyscalX,

Here I simply ignore the sign, because it is cancelled when multiplying by `-1 * p_i * (p_i - 1)` or `-1 * p_j * p_i`, which come from the derivative of the normal cross-entropy loss with a softmax function. You can see lines 141 and 144 in the .cu file.
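
For readers following the thread: the factors referred to here are, I assume, the entries of the softmax Jacobian, which is where the sign cancellation comes from:

```latex
\frac{\partial p_i}{\partial z_j} =
\begin{cases}
  p_i (1 - p_i) \;=\; -1 \cdot p_i (p_i - 1), & i = j,\\
  -\,p_i\, p_j  \;=\; -1 \cdot p_j \cdot p_i, & i \neq j.
\end{cases}
```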

PhyscalX commented Aug 17, 2017

Your mathematical reasoning is right, the sign does cancel.
But `(power_prob_data[ind_i] / (1 - prob_data[ind_i]))` multiplied by `(prob_data[ind_i] - 1)`
also gives `-power_prob_data[ind_i]`, when it should actually be `power_prob_data[ind_i]`.

Besides, taking `log(p)` directly is dangerous, because the softmax outputs can get extremely close to zero.
Prefer to add an eps (e.g. 1e-10) first.

Also, your loss is wrong as well.
In the code it is
`loss[index] = -log(max(power_prob_data[ind] * log_prob_data[ind], Dtype(FLT_MIN)));`
however, it should be
`loss[index] = -power_prob_data[ind] * log_prob_data[ind];`

@zimenglan-sysu-512 (Owner) commented:

hi @PhyscalX,

You are right, the loss is computed incorrectly, and thanks for the reminder about the log operation. I will update my code tomorrow and run some tests.

For the first term you mention, I need to double-check.

Thanks again.

@PhyscalX (Author) commented:

I have verified my fix on cifar10-quick: it works and reaches validation accuracy similar to the original loss at (alpha = 1.0/0.75/0.5/0.25, gamma = 2.0).

eps is very important in focal loss; all of the divisions in your code are dangerous. With alpha > 0.25, training tends to hit NaN on cifar10-quick towards the end of convergence.

I set eps to 1e-10 in my framework Dragon.
If you are interested, check the following code:

(op_kernel.h, line 336), the declaration of kernel::SparseSoftmaxFocalLoss
(op_kernel.cc, line 777), the CPU implementation of kernel::SparseSoftmaxFocalLoss
(op_kernel.cu, line 1417), the CUDA implementation of kernel::SparseSoftmaxFocalLoss
(sparse_softmax_focal_loss_op.h), the declaration of SparseSoftmaxFocalLossOp
(sparse_softmax_focal_loss_op.cc), the implementation of SparseSoftmaxFocalLossOp
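
As an illustration of the same point about the divisions, one simple guard is to clamp every denominator by eps before dividing (my own sketch, not code from Dragon or from this repository):

```cpp
#include <algorithm>

using Dtype = float;

// Clamp the denominator away from zero so that nearly saturated
// probabilities (p -> 0 or p -> 1) cannot turn a division into inf/NaN.
inline Dtype safe_div(Dtype numerator, Dtype denominator,
                      Dtype eps = Dtype(1e-10)) {
  return numerator / std::max(denominator, eps);
}

// e.g. the quoted first term would become
//   gamma * safe_div(power_prob_data[ind_i],
//                    Dtype(1) - prob_data[ind_i]) * log_prob_data[ind_i];
```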

zimenglan-sysu-512 commented Aug 18, 2017

hi @PhyscalX,

Thanks a lot. I have fixed the problems you pointed out. For the gradient, I had forgotten to differentiate the (1 - p_t) term when I ignored the sign; I have added it back now.

Thanks again.

zimenglan-sysu-512 commented Aug 18, 2017

hi @PhyscalX,

You're right, eps is very important; I added it to solve the NaN problem. It now runs normally, you can have a look.

Thanks for pointing out my errors and giving such useful suggestions.

PhyscalX commented Aug 18, 2017

And lastly, I recommend multiplying `grad` by `prob_data[ind_i]`.
Dividing directly may still lead to numerical issues.
Formulate in ONE way, and implement in ANOTHER.
There are plenty of tricks in turning mathematical formulations into code.
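
If I read the suggestion correctly, the point is to simplify the expression algebraically before coding it, so the division disappears entirely. A sketch under the derivation earlier in the thread (hypothetical names, alpha omitted):

```cpp
#include <cmath>

using Dtype = float;

// Gradient w.r.t. the target-class logit, written without any division:
//   dFL/dz_t = gamma * p * (1 - p)^gamma * log(p) - (1 - p)^(gamma + 1)
// The (1 - p)^gamma / p term of dFL/dp cancels against the p in the softmax
// Jacobian p * (1 - p), so no 1/p or 1/(1 - p) is ever evaluated.
Dtype focal_grad_target(Dtype prob, Dtype gamma,
                        Dtype eps = Dtype(1e-10)) {
  const Dtype one_minus_p = Dtype(1) - prob;
  const Dtype power_prob  = std::pow(one_minus_p, gamma);  // (1 - p)^gamma
  const Dtype log_prob    = std::log(prob + eps);          // guarded log(p)
  return gamma * prob * power_prob * log_prob - power_prob * one_minus_p;
}
```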

@zimenglan-sysu-512 (Owner) commented:

hi @PhyscalX,

I have updated the code.
Thanks a lot.
