Questions on Fine-Tuning Parameter Percentage Calculation #2

Open

mrclovvn opened this issue Dec 12, 2024 · 2 comments

@mrclovvn

Dear authors,

Thank you for your great work and the publicly available code! I noticed in Table 3 of the paper that the fine-tuning parameter percentage for PYRA in ViT-B/16 is listed as 0.35%. Could you please clarify how this percentage is calculated?

When I try to replicate the results using the experiments/LoRA/ViT-B_prompt_lora_8.yaml configuration from the public code, I get the following parameter information:

total training parameters: 399652 adapter 0 LoRA 294912 prompt 0 prefix 0 PYRA 27840 head 76900
total parameters in model: 86198308

However, when I calculate the percentage as (LoRA + PYRA) / total, I get (294912 + 27840) / 86198308 = 0.3744%. Alternatively, when I calculate it as (LoRA + PYRA) / (total - head), I get (294912 + 27840) / (86198308 - 76900) = 0.3748%. Neither matches the value of 0.35% in Table 3.

Similarly, when I use the experiments/LoRA/ViT-L_prompt_lora_12.yaml configuration to fine-tune ViT-L, the parameter information is as follows:

total training parameters: 1356068 adapter 0 LoRA 1179648 prompt 0 prefix 0 PYRA 73920 head 102500
total parameters in model: 304657700

The percentages I calculate are (1179648 + 73920) / 304657700 = 0.4115% and (1179648 + 73920) / (304657700 - 102500) = 0.4116%, which again do not match the value of 0.40% in Table 3.
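
To make the calculation explicit, here is a minimal sketch of exactly what I am computing (all counts taken directly from the logs above):

```python
# Sketch of the percentage computation (counts copied from the logs above).
def tuned_percentage(tuned, total):
    return 100.0 * tuned / total

# ViT-B/16, experiments/LoRA/ViT-B_prompt_lora_8.yaml
lora, pyra, head, total = 294912, 27840, 76900, 86198308
print(tuned_percentage(lora + pyra, total))         # ~0.3744
print(tuned_percentage(lora + pyra, total - head))  # ~0.3748

# ViT-L, experiments/LoRA/ViT-L_prompt_lora_12.yaml
lora, pyra, head, total = 1179648, 73920, 102500, 304657700
print(tuned_percentage(lora + pyra, total))         # ~0.4115
print(tuned_percentage(lora + pyra, total - head))  # ~0.4116
```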

Could you please explain how the fine-tuning parameter percentage is computed in the paper? Am I misunderstanding the calculation process?

Thank you for your time and assistance!

@Bostoncake
Collaborator

Hello! Sorry for the confusion between our codebase and the paper. When we were working on the draft, we used a version of PYRA without the LayerNorm, so the parameter counts reported in the paper correspond to the LayerNorm-free version.
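
As a rough sanity check, if you subtract the per-block LayerNorm parameters (approximately 2·d per block, i.e. about 1536 for ViT-B and 2048 for ViT-L; treat these per-block figures as an approximation) from the PYRA count, you land close to the Table 3 values:

```python
# Back-of-the-envelope check: remove the (assumed) per-block LayerNorm
# parameters from the PYRA count before computing the percentage.
def layernorm_free_percentage(lora, pyra, total, blocks, dim):
    ln_params = blocks * 2 * dim  # assumed: one LayerNorm (weight + bias) per block
    return 100.0 * (lora + pyra - ln_params) / total

print(layernorm_free_percentage(294912, 27840, 86198308, blocks=12, dim=768))     # ~0.35
print(layernorm_free_percentage(1179648, 73920, 304657700, blocks=24, dim=1024))  # ~0.40
```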

You are correct and that was a good catch. We will update the draft on arXiv accordingly.

@mrclovvn
Author

Thank you for your prompt response and clarification! I look forward to the revised version on arXiv.
