How do we compute the eigenvalues of the Hessian for each weight matrix or each module in a model, rather than a single eigenvalue for the whole model? #10

Phuoc-Hoan-Le · 2021-11-09T21:57:16Z

It seems that the function that calculates the eigenvalues returns only one eigenvalue for the whole model.

345308394 · 2021-12-02T06:59:01Z

I ran into the same problem. Have you solved it, or can we discuss it?

Phuoc-Hoan-Le · 2021-12-02T11:45:45Z

We can discuss it.

345308394 · 2021-12-02T13:23:42Z

To calculate the maximum eigenvalue of the Hessian (the second derivative of the loss with respect to the weights), you first need the parameters and their first partial derivatives. The function get_params_grad(model) returns all the weights and their corresponding gradients. So my approach is to change this function to return the weights and gradients of each block separately, and then compute the maximum eigenvalue of the corresponding second derivative for each block.
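A minimal sketch of the block-wise idea, written in plain PyTorch rather than by patching PyHessian's get_params_grad(). It assumes you have a model and a scalar loss computed on one batch; it takes the gradient of every trainable parameter tensor with create_graph=True and then runs power iteration on that tensor's own Hessian block H_ii, ignoring cross-layer blocks. The function name per_param_top_eigenvalue is hypothetical. Plain power iteration converges to the eigenvalue of largest magnitude, which is not necessarily the most positive one.

```python
import torch
import torch.nn as nn


def per_param_top_eigenvalue(model, loss, max_iter=100, tol=1e-3):
    """Power iteration on each parameter tensor's own Hessian block H_ii.

    Hypothetical helper. Returns {parameter_name: estimated eigenvalue of
    largest magnitude}. Only d^2 loss / d(param)^2 is used; the off-diagonal
    (cross-layer) Hessian blocks are ignored.
    """
    named = [(n, p) for n, p in model.named_parameters() if p.requires_grad]
    params = [p for _, p in named]
    # create_graph=True so each gradient can be differentiated a second time.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    results = {}
    for (name, p), g in zip(named, grads):
        v = torch.randn_like(p)
        v = v / v.norm()
        eig = None
        for _ in range(max_iter):
            # Block Hessian-vector product: d(g . v)/dp = H_ii v
            (hv,) = torch.autograd.grad(g, p, grad_outputs=v, retain_graph=True)
            new_eig = torch.sum(hv * v).item()  # Rayleigh quotient v^T H_ii v
            v = hv / (hv.norm() + 1e-12)
            converged = eig is not None and abs(new_eig - eig) / (abs(eig) + 1e-6) < tol
            eig = new_eig
            if converged:
                break
        results[name] = eig
    return results
```

For a single nn.Linear layer trained with MSE, the weight block is (2/N)XᵀX and the bias block is 2, so the output can be checked against torch.linalg.eigvalsh.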

Phuoc-Hoan-Le · 2021-12-02T16:10:56Z

Wow, that sounds like a complicated solution.

The way I solved it: in the eigenvalues() function, the final eigenvalue is a single value because group_product() returns the sum over the whole list rather than a list of per-tensor values. This makes me think they already calculate the eigenvalue for each weight matrix but choose to sum them to get the eigenvalue for the whole model. Also, eigenvalues() returns the eigenvectors as a list of lists of tensors: each element of the outer list is one eigenvector, stored as a list with one tensor per weight matrix. So the first element of the outer list holds the components of the 1st eigenvector for every weight matrix, and so on.

Note that modifying group_product() also affects normalization(), which relies on it, so you will need to adapt normalization() as well or introduce a new function. I didn't have to change get_params_grad(model).
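The approach above can be sketched in plain PyTorch. Here group_product mirrors the summing behavior described in the comment, while group_product_per_tensor and the rewritten normalization are hypothetical helpers, not part of PyHessian. The toy example also exposes a caveat worth checking before relying on the per-tensor split: the i-th piece is v_i·(Hv)_i, and Hv still mixes in the off-diagonal (cross-layer) Hessian blocks. For the top eigenvector this piece equals λ‖v_i‖², the whole-model eigenvalue weighted by how much of the eigenvector lies in that tensor, so it is not in general the top eigenvalue of that layer's own block.

```python
import torch


def group_product(xs, ys):
    # Summing behavior described above: one scalar for the whole model.
    return sum(torch.sum(x * y) for x, y in zip(xs, ys))


def group_product_per_tensor(xs, ys):
    # Hypothetical variant: keep one inner product per parameter tensor.
    return [torch.sum(x * y) for x, y in zip(xs, ys)]


def normalization(v):
    # Hypothetical rewrite: a list-valued product cannot be square-rooted
    # directly, so sum the per-tensor pieces before taking the norm.
    s = sum(group_product_per_tensor(v, v)) ** 0.5
    return [vi / (s.item() + 1e-6) for vi in v]


# Toy check: H = [[2, 1], [1, 3]], with each coordinate treated as its own "layer".
H = torch.tensor([[2.0, 1.0], [1.0, 3.0]])
evals, evecs = torch.linalg.eigh(H)  # eigenvalues in ascending order
lam, top = evals[-1], evecs[:, -1]
v = normalization([top[:1].clone(), top[1:].clone()])
hv_full = H @ torch.cat(v)
hv = [hv_full[:1], hv_full[1:]]
parts = group_product_per_tensor(hv, v)  # per-"layer" pieces of v^T H v
total = group_product(hv, v)             # what the summed version returns
```

Here the pieces come out near 1.0 and 2.618 and sum to λ ≈ 3.618, while the 1×1 diagonal blocks have eigenvalues 2 and 3.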

345308394 · 2021-12-03T08:37:08Z

Can we exchange our calculations?

huanmei9 · 2022-01-07T13:40:22Z

I have the same question. Have you solved it? @345308394 @CharlesLeeeee

xchuwenbo · 2022-01-27T09:48:55Z

Hi! I also find this question important!

I just want to see the layer-wise eigenvalues of a specific model.

katayoun-cadence · 2022-02-03T01:36:30Z

@345308394 @CharlesLee Did your code solve this issue? Could we compare calculations?

BiaoFangAIA · 2022-11-08T03:02:27Z

I know how to calculate the Hessian trace of each layer:
import numpy as np
import torch
from pyhessian.utils import group_product, hessian_vector_product


def get_trace(self, maxIter=100, tol=1e-3):
    """Estimate the Hessian trace of each parameter tensor (Hutchinson's method).

    Written as a method of pyhessian's hessian class, which provides
    self.model, self.params, self.gradsH and self.device.
    """
    device = self.device
    traces_vhv = []  # results for all layers
    seed = 1
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    # self.params skips parameters with requires_grad=False, so take the names
    # from named_parameters() with the same filter; named_modules() does not line up.
    names = [name for name, p in self.model.named_parameters() if p.requires_grad]
    for module_name, i_grad, i_param in zip(names, self.gradsH, self.params):
        trace_vhv = []
        trace = 0.
        self.model.zero_grad()
        for _ in range(maxIter):
            # Draw a fresh Rademacher (+1/-1) probe vector at every iteration.
            v = [torch.randint_like(i_param, high=2, device=device)]
            for v_i in v:
                v_i[v_i == 0] = -1
            # Differentiating this tensor's gradient w.r.t. this tensor gives
            # H_ii v, the diagonal Hessian block times v.
            hv = hessian_vector_product(i_grad, i_param, v)
            trace_vhv.append(group_product(hv, v).cpu().item())
            if abs(np.mean(trace_vhv) - trace) / (abs(trace) + 1e-6) < tol:
                break
            trace = np.mean(trace_vhv)
        # Record every tensor (weights and biases), even if tol was not reached.
        traces_vhv.append({"layer_name": module_name, "trace": np.mean(trace_vhv)})
    return traces_vhv
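For experimenting without PyHessian, the same per-layer Hutchinson estimate can be sketched in plain PyTorch. This illustrative version assumes you have a model and a scalar loss on one batch; per_param_hutchinson_trace is a hypothetical name. It draws a fresh Rademacher vector on every iteration, since reusing one vector only repeats the same single-sample estimate. Each entry estimates the trace of that tensor's diagonal Hessian block, and because a trace only involves diagonal entries, the entries sum to the full-model Hessian trace.

```python
import torch
import torch.nn as nn


def per_param_hutchinson_trace(model, loss, max_iter=100, tol=1e-3):
    """Hutchinson estimate of tr(H_ii) for every trainable parameter tensor."""
    named = [(n, p) for n, p in model.named_parameters() if p.requires_grad]
    # create_graph=True keeps the graph so each gradient can be differentiated again.
    grads = torch.autograd.grad(loss, [p for _, p in named], create_graph=True)
    traces = []
    for (name, p), g in zip(named, grads):
        samples, trace = [], 0.0
        for _ in range(max_iter):
            # Fresh Rademacher probe: entries are +1 or -1 with equal probability.
            v = torch.randint_like(p, high=2) * 2 - 1
            (hv,) = torch.autograd.grad(g, p, grad_outputs=v, retain_graph=True)
            samples.append(torch.sum(hv * v).item())  # one sample of v^T H_ii v
            mean = sum(samples) / len(samples)
            if abs(mean - trace) / (abs(trace) + 1e-6) < tol:
                break
            trace = mean
        # Append for every tensor, whether or not tol was reached.
        traces.append({"layer_name": name, "trace": sum(samples) / len(samples)})
    return traces
```

For an nn.Linear layer trained with MSE on N samples, the exact weight-block trace is (2/N)·ΣX², which makes a convenient sanity check.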

EdenBelouadah · 2023-12-15T16:28:14Z

@BiaoFangAIA

Did you solve the issue?

The proposed solution does not work: my ViT model contains 75 layers, each with a weight and a bias, but your code returns a list of only 52 traces.

thanks
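One plausible contributor to the count mismatch: the get_trace() snippet as originally posted pairs self.params with model.named_modules() via zip(). named_modules() yields the root module and parameter-free modules such as activations, and a single module can own several parameter tensors, so the two sequences neither line up nor have the same length, and zip() silently stops at the shorter one. That version also skips 1-D tensors (if len(i_grad.shape)>1) and only appends a trace once the tolerance test passes. The toy model below is an assumption chosen only to show the pairing problem.

```python
import torch.nn as nn

# Toy stand-in for a real network: two Linear layers, a ReLU, and a LayerNorm.
model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2), nn.LayerNorm(2))

# named_modules() includes the root ("") and the parameter-free ReLU ("1").
module_names = [name for name, _ in model.named_modules()]
# One entry per trainable tensor, in the same order as model.parameters().
param_names = [name for name, p in model.named_parameters() if p.requires_grad]

# zip() pairs items positionally and stops at the shorter list.
pairs = list(zip(param_names, module_names))
```

Here six parameter tensors are paired with five module names, so "0.weight" gets labeled with the root module and "3.bias" is dropped entirely. Taking names from named_parameters() with the same requires_grad filter keeps the labels aligned with self.params.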
