
lora : error message if new token is added in the adapter #9948

Merged
merged 1 commit into ggerganov:master on Oct 22, 2024

Conversation

ngxson
Collaborator

@ngxson ngxson commented Oct 18, 2024

Related to #9778

Explanation from TRL team member:

There can be situations where PEFT will automatically save the embedding layer when it detects that it has been changed (e.g. new tokens being added). However, this should save the full embedding layer, not lora_A and lora_B.

In llama.cpp, we simply can't support this case because it would break adapter hot-reload (we cannot switch lm_head or embd_tokens at runtime). Also, if multiple adapters each have their own set of additional tokens, we can't mix and match these adapters.
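
For reference, one rough way to tell whether an adapter ended up in this state is to look at the tensor names it saved: a clean LoRA contains only lora_A / lora_B (and lora_embedding_A / lora_embedding_B) weights, while an adapter with added tokens also carries full embed_tokens / lm_head tensors via modules_to_save. The sketch below is only an illustration, not the converter's actual code; the file name adapter_model.safetensors and the exact key patterns are assumptions that may vary across PEFT versions.

```python
# Rough sketch: detect whether a PEFT adapter contains full (non-LoRA) tensors,
# which is the case this PR now rejects with an explicit error message.
# Assumes the adapter is stored as adapter_model.safetensors; exact key names
# can vary between PEFT versions.
from safetensors import safe_open

def find_full_tensors(adapter_path: str) -> list[str]:
    suspicious = []
    with safe_open(adapter_path, framework="pt") as f:
        for name in f.keys():
            # A LoRA-only adapter should contain nothing but lora_A / lora_B
            # (plus optional lora_embedding_A / lora_embedding_B) weights.
            if ".lora_A." in name or ".lora_B." in name:
                continue
            if ".lora_embedding_A" in name or ".lora_embedding_B" in name:
                continue
            # Anything else (e.g. embed_tokens / lm_head saved via
            # modules_to_save) is a full tensor the converter cannot handle.
            suspicious.append(name)
    return suspicious

if __name__ == "__main__":
    bad = find_full_tensors("adapter_model.safetensors")
    if bad:
        print("Adapter contains full tensors (likely new tokens were added):")
        for n in bad:
            print("  ", n)
```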

Solution

When fine-tuning your model using TRL, please make sure (a minimal setup sketch follows this list):

  • Not to add any new tokens
  • Not to call setup_chat_format()
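
Below is a minimal sketch of a TRL fine-tuning setup that respects these constraints, so the resulting adapter contains only lora_A / lora_B tensors. The model name, dataset, and hyperparameters are placeholders, and some parameter names (e.g. processing_class vs. tokenizer) differ between TRL releases, so treat this as an assumption-laden example rather than a definitive recipe.

```python
# Sketch of an SFT setup compatible with llama.cpp LoRA export: the tokenizer
# is left untouched (no new tokens, no setup_chat_format), so PEFT saves only
# lora_A / lora_B tensors. Model name and dataset are placeholders.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer

model_name = "meta-llama/Llama-3.2-1B"                      # placeholder
dataset = load_dataset("trl-lib/Capybara", split="train")   # placeholder

model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Do NOT call setup_chat_format(model, tokenizer) and do NOT call
# tokenizer.add_tokens(...) / model.resize_token_embeddings(...): either one
# makes PEFT save the full embed_tokens / lm_head tensors into the adapter.

peft_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")

trainer = SFTTrainer(
    model=model,
    args=SFTConfig(output_dir="lora-out"),
    train_dataset=dataset,
    processing_class=tokenizer,   # older TRL versions call this `tokenizer=`
    peft_config=peft_config,
)
trainer.train()
trainer.save_model("lora-out")    # adapter can then go through convert_lora_to_gguf.py
```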

@github-actions github-actions bot added the python python script changes label Oct 18, 2024
@ngxson ngxson changed the title lora : warn user if new token is added in the adapter lora : error message if new token is added in the adapter Oct 18, 2024
@slaren
Collaborator

slaren commented Oct 18, 2024

In llama.cpp, we simply can't support this case because it will break adapter hot-reload

I think we can support this by loading the full tensor from the LoRA and replacing it in the model; it shouldn't be a problem. Keep a copy of the previous tensor pointer to restore it when unloading the LoRA, and reject applying another LoRA if it also has a full copy of the same tensor. Or better, just modify the LoRA-applying functions like llm_build_lora_mm to use the replacement tensor. Dealing with the changes to the tokenizer may be a more difficult problem, however.
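
To make the replace-and-restore idea concrete, here is a conceptual sketch in plain Python (not llama.cpp's actual C++ internals, and the class and method names are invented for illustration): an adapter that ships a full tensor replaces the model's tensor, the original is kept so it can be restored on unload, and a second adapter that wants to replace the same tensor is rejected.

```python
# Conceptual sketch of the scheme described above, using Python objects as
# stand-ins for ggml tensors. All names here are hypothetical.
class Model:
    def __init__(self, tensors: dict[str, object]):
        self.tensors = tensors  # name -> tensor
        # name -> (original tensor, id of the adapter that replaced it)
        self.replaced: dict[str, tuple[object, str]] = {}

    def apply_adapter(self, adapter_id: str, full_tensors: dict[str, object]) -> None:
        for name, tensor in full_tensors.items():
            if name in self.replaced:
                raise RuntimeError(
                    f"{name} is already replaced by adapter {self.replaced[name][1]}"
                )
            self.replaced[name] = (self.tensors[name], adapter_id)
            self.tensors[name] = tensor  # hot-swap in the adapter's full tensor

    def remove_adapter(self, adapter_id: str) -> None:
        for name in [n for n, (_, a) in self.replaced.items() if a == adapter_id]:
            original, _ = self.replaced.pop(name)
            self.tensors[name] = original  # restore the original tensor on unload
```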

@ngxson
Collaborator Author

ngxson commented Oct 18, 2024

In fact, I'm thinking of the case where each adapter has its own set of added tokens. This would be impossible to support, because each adapter would have its own definition of the added tokens.

In any case, I don't think users should add new tokens to a LoRA, since their embeddings will not be trained. In fact, these embeddings will be initialized to random vectors.
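
As a small illustration of that point (placeholder model name; how the new rows are initialized — randomly or from the mean of existing rows — depends on the transformers version), resizing the embedding matrix after adding a token only allocates and initializes the new rows; nothing trains them, and they exist only in that adapter's copy of embed_tokens / lm_head.

```python
# Adding a token and resizing the embeddings creates rows that are merely
# initialized, not trained, and not covered by the LoRA update.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.2-1B"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

old_vocab = model.get_input_embeddings().weight.shape[0]
tokenizer.add_tokens(["<my_new_token>"])
model.resize_token_embeddings(len(tokenizer))
new_vocab = model.get_input_embeddings().weight.shape[0]

print(f"embedding rows: {old_vocab} -> {new_vocab}")
```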

(I'm merging this PR once @Victoran0 confirms that removing setup_chat_format fixes the problem)

@Victoran0

Victoran0 commented Oct 19, 2024

Yes @ngxson, removing setup_chat_format resolved the error; the LoRA adapter's size is now less than 100 MB.
This commit is optimal as it tells the user where the problem is coming from and what to fix. Great job!

@ngxson ngxson merged commit c421ac0 into ggerganov:master Oct 22, 2024
9 checks passed
dsx1986 pushed a commit to dsx1986/llama.cpp that referenced this pull request Oct 29, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024