How do we compute the eigenvalues of the Hessian for each weight matrix or each module in a model, rather than a single eigenvalue for the whole model? #10
Comments
I also encountered the same problem. Have you solved it? Or can we discuss it?
We can discuss it.
To calculate the largest eigenvalue of the Hessian (the second derivative with respect to the weights), you first need the weights and their first partial derivatives. The function get_params_grad(model) returns all the weights and the corresponding gradients. My method is therefore to change this function so that it returns the weights and gradients of each block separately, and then to calculate the largest eigenvalue of the corresponding second derivative for each block.
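For reference, the per-block idea described above can be sketched roughly as follows. This is a minimal sketch and not PyHessian's code: `blockwise_top_eigenvalues` is a hypothetical helper, it assumes a "block" is a single parameter tensor, and it assumes every parameter passed in contributes to the loss. It runs power iteration on each parameter's own diagonal Hessian block, so curvature shared between layers is ignored.

```python
import torch

def blockwise_top_eigenvalues(loss, named_params, iters=100, tol=1e-6):
    """Hypothetical helper: largest-magnitude eigenvalue of each parameter's
    diagonal Hessian block, estimated by power iteration."""
    named = [(n, p) for n, p in named_params if p.requires_grad]
    params = [p for _, p in named]
    # First-order gradients, kept differentiable so they can be differentiated again.
    grads = torch.autograd.grad(loss, params, create_graph=True)

    result = {}
    for (name, p), g in zip(named, grads):
        v = torch.randn_like(p)
        v = v / (v.norm() + 1e-12)
        eig = 0.0
        for i in range(iters):
            if not g.requires_grad:
                break  # gradient is constant, so this block of the Hessian is zero
            # Differentiating g only with respect to p gives H_pp @ v (the diagonal block).
            (hv,) = torch.autograd.grad(
                g, p, grad_outputs=v, retain_graph=True, allow_unused=True
            )
            if hv is None or hv.norm().item() == 0.0:
                eig = 0.0
                break
            new_eig = torch.sum(hv * v).item()  # Rayleigh quotient, since ||v|| = 1
            v = hv / hv.norm()
            converged = i > 0 and abs(new_eig - eig) <= tol * (abs(new_eig) + 1e-12)
            eig = new_eig
            if converged:
                break
        result[name] = eig
    return result

# Toy usage on a small made-up model (not from the thread).
model = torch.nn.Sequential(
    torch.nn.Linear(4, 3), torch.nn.Tanh(), torch.nn.Linear(3, 1)
)
x, y = torch.randn(8, 4), torch.randn(8, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
for name, eig in blockwise_top_eigenvalues(loss, model.named_parameters()).items():
    print(name, eig)
```

Keying the result by parameter name makes it easy to group values per module afterwards, for example by the prefix before `.weight` or `.bias`.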
Wow, that sounds like a complicated solution. Here is how I solved it. If you look at the eigenvalues() function, you will see that the final eigenvalue is a single value, because group_product() returns the sum over the whole list rather than the list itself. This makes me think they have already calculated the eigenvalue for each weight matrix and simply chose to sum those values to get the eigenvalue for the whole model. Also, eigenvalues() returns the eigenvectors as a list of lists of tensors, where each element of the outer list holds the n-th eigenvector for every weight matrix: the first element of the outer list holds the 1st eigenvector for each weight matrix, and so on. Note that modifying group_product() also affects normalization(), so you have to change that function as well, or introduce a new one. I didn't have to change get_params_grad(model).
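The change to group_product() and normalization() described above might look something like the following. The names `group_product_per_tensor` and `normalization_per_tensor` are hypothetical, and this is only a sketch of the idea, not the library's implementation. One caveat worth keeping in mind: if the Hessian-vector product is still taken over the whole model, each per-tensor value includes cross-layer terms, so it is not necessarily an eigenvalue of that layer's own Hessian block.

```python
import torch

def group_product_per_tensor(xs, ys):
    """Hypothetical variant: one inner product per tensor pair, returned as a
    list instead of being summed into a single scalar."""
    return [torch.sum(x * y) for x, y in zip(xs, ys)]

def normalization_per_tensor(vs, eps=1e-6):
    """Hypothetical variant: scale each tensor in the list to unit norm
    independently, rather than normalizing the concatenated vector."""
    norms = [torch.sqrt(s) for s in group_product_per_tensor(vs, vs)]
    return [v / (n + eps) for v, n in zip(vs, norms)]
```

Summing the list returned by `group_product_per_tensor` gives back the single whole-model scalar, so the original behaviour stays available.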
Can we exchange our calculations?
I also ran into the same question. Have you solved it? @345308394 @CharlesLeeeee
Hi! I also find this question important! I just want to see the layer-wise eigenvalues of a specific model. |
@345308394 @CharlesLee Did your code solve this issue? Can we exchange the calculation?
I know how to calculate the Hessian trace for each layer:
Did you solve the issue? The proposed solution does not work for me: my ViT model contains 75 layers, each with a weight and a bias, but your code returns a list with only 52 traces. Thanks.
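The thread does not show why the counts differ. One way to make them checkable is to key each trace by the name from named_parameters(), so a missing weight or bias is easy to spot. Below is a minimal Hutchinson-style sketch under that assumption. `per_parameter_hessian_trace` is a hypothetical helper; it is not the snippet referred to above and not part of PyHessian. It assumes every parameter passed in is used by the loss and that each gradient still depends on the parameters (a loss that is purely linear in some parameter would need extra handling).

```python
import torch

def per_parameter_hessian_trace(loss, named_params, n_samples=50):
    """Hypothetical helper: Hutchinson estimate of the trace of each
    parameter's diagonal Hessian block, keyed by parameter name."""
    named = [(n, p) for n, p in named_params if p.requires_grad]
    names = [n for n, _ in named]
    params = [p for _, p in named]
    grads = torch.autograd.grad(loss, params, create_graph=True)

    totals = {n: 0.0 for n in names}
    for _ in range(n_samples):
        # Rademacher probe vectors with entries in {-1, +1}.
        vs = [torch.randint_like(p, high=2) * 2 - 1 for p in params]
        # Full-model Hessian-vector product. Cross-block terms average out
        # because the probes for different tensors are independent and zero-mean.
        hvs = torch.autograd.grad(grads, params, grad_outputs=vs, retain_graph=True)
        for n, hv, v in zip(names, hvs, vs):
            totals[n] += torch.sum(hv * v).item()
    return {n: t / n_samples for n, t in totals.items()}
```

Comparing the keys of the returned dictionary against `[n for n, _ in model.named_parameters()]` would show which entries, if any, are being skipped.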
It seems that the function that calculates the eigenvalues returns only one eigenvalue for the whole model.