
Add InternLM2 model and tests #29667

Closed · wants to merge 6 commits

Conversation

x54-729

@x54-729 x54-729 commented Mar 15, 2024

What does this PR do?

  • Add tokenizer, fast tokenizer, configuration and model of InternLM2
  • Add tests for the InternLM2 tokenizer and model
  • Complete the README and model doc of InternLM2 (the InternLM2 technical report will be released next week, and I will update the paper link in this PR)

All the tests have passed locally except for some tokenizer issues (#29617, #29626). These failing tests are skipped temporarily.

I'm not entirely sure that my code format and tests are proper enough to merge; if any changes are needed, please let me know and I will fix them asap!

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@ArthurZucker

@vansin
Contributor

vansin commented Mar 15, 2024

amazing!!!

Collaborator

@ArthurZucker ArthurZucker left a comment


Hey! Thanks for porting this model. Is the only difference from the Llama arch the non-split QKV? If so, there is the Persimmon architecture, which should be compatible out of the box with weight renaming!
What about the tokenizer? Pretty sure the first InternLM used a LlamaTokenizer converted by @Rocketknight1, no?

@x54-729
Author

x54-729 commented Mar 27, 2024

@ArthurZucker

Thanks for your reply!

  1. Yes, the QKV of InternLM2 are merged. According to section 2.2 (Model Structure) of our tech report (https://arxiv.org/pdf/2403.17297.pdf), this arrangement of merged QKV can accelerate inference. The merged QKV is also convenient for tensor parallelism.

  2. Our tokenizer is different from Llama's since our add_dummy_prefix is False, so using the Llama tokenizer for InternLM previously caused some problems. Additionally, the InternLM2 chat model uses some special tokens, so I have to use InternLM2Converter instead of LlamaConverter.
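For reference, splitting a fused QKV weight back into separate projections can be sketched as below. The sizes and the per-KV-group interleaved layout are assumptions for illustration, not the exact InternLM2 checkpoint format:

```python
import torch

# Hypothetical sizes for illustration (not the real InternLM2 config).
hidden_size = 64
num_heads = 8          # query heads
num_kv_heads = 2       # grouped-query attention: fewer K/V heads
head_dim = hidden_size // num_heads
q_per_group = num_heads // num_kv_heads  # query heads per K/V group

# An assumed fused layout: for each KV group, the rows hold q_per_group
# query heads, then one key head, then one value head.
rows_per_group = (q_per_group + 2) * head_dim
wqkv = torch.randn(num_kv_heads * rows_per_group, hidden_size)

# Split the fused weight back into separate q/k/v projection matrices.
grouped = wqkv.view(num_kv_heads, q_per_group + 2, head_dim, hidden_size)
wq = grouped[:, :q_per_group].reshape(-1, hidden_size)
wk = grouped[:, -2].reshape(-1, hidden_size)
wv = grouped[:, -1].reshape(-1, hidden_size)

assert wq.shape == (num_heads * head_dim, hidden_size)
assert wk.shape == (num_kv_heads * head_dim, hidden_size)
assert wv.shape == (num_kv_heads * head_dim, hidden_size)
```

Keeping the three projections in one matrix means a single GEMM per attention layer instead of three, which is where the inference speedup mentioned above comes from.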

@ArthurZucker
Collaborator

ArthurZucker commented Mar 27, 2024

Regarding 1., there is a Llama-like model with fused QKV in transformers: Persimmon! If it's not an exact match, I recommend starting from that arch with Copied from!
2. If that is the only difference, GemmaTokenizer is the closest. You probably don't even need that and can just use AutoTokenizer with a PreTrainedTokenizerFast, relying only on the tokenizer.json. Something similar to the Gemma tokenizer should do the trick? I can probably help you with that if you want? 🤗
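The tokenizer.json-only route can be sketched as below. The toy vocab is invented purely for illustration; a real model repo would ship the converted InternLM2 tokenizer.json instead:

```python
from tokenizers import Tokenizer, models, pre_tokenizers
from transformers import PreTrainedTokenizerFast

# Build a throwaway tokenizer.json as a stand-in for the real converted file.
vocab = {"<unk>": 0, "hello": 1, "world": 2}
toy = Tokenizer(models.WordLevel(vocab, unk_token="<unk>"))
toy.pre_tokenizer = pre_tokenizers.Whitespace()
toy.save("tokenizer.json")

# With a tokenizer.json shipped in the model repo, no custom tokenizer class
# is needed: PreTrainedTokenizerFast loads the file directly (AutoTokenizer
# does the same when the repo is mapped to PreTrainedTokenizerFast).
fast = PreTrainedTokenizerFast(tokenizer_file="tokenizer.json", unk_token="<unk>")
print(fast.tokenize("hello world"))  # -> ['hello', 'world']
```

Settings like add_dummy_prefix live inside the serialized tokenizer.json, which is why this route sidesteps the Llama-tokenizer mismatch described above.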

@x54-729
Author

x54-729 commented Mar 28, 2024

So what you recommended was:

  1. Use the Copied from mechanism to write our model based on Persimmon if InternLM2's architecture is not exactly the same as Persimmon's.
  2. Upload a tokenizer.json file to our model repositories to deal with add_dummy_prefix and the special chat tokens, so there is no need to submit a new tokenizer in this PR for InternLM2.

Is that right?
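For context, the Copied from mechanism discussed here is a comment marker that transformers' repo-consistency CI enforces: the marked code must stay identical to the source class, modulo the rename. A minimal sketch (the class body below is invented for illustration, not the real PersimmonMLP):

```python
import torch
from torch import nn

# Copied from transformers.models.persimmon.modeling_persimmon.PersimmonMLP with Persimmon->InternLM2
class InternLM2MLP(nn.Module):
    # Invented body for illustration; in the real repo, `make repo-consistency`
    # verifies the marked code matches the source class after the rename.
    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.up_proj = nn.Linear(hidden_size, intermediate_size)
        self.down_proj = nn.Linear(intermediate_size, hidden_size)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(self.act(self.up_proj(x)))

out = InternLM2MLP(8, 16)(torch.randn(2, 8))
```

The marker is what lets maintainers fix a bug once in the source model and have CI flag every copy that needs the same fix.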

@ArthurZucker
Collaborator

ArthurZucker commented Mar 28, 2024

Yes!

("falcon", (None, "PreTrainedTokenizerFast" if is_tokenizers_available() else None)),

falcon only relies on a tokenizer.json! If this does not work, then let's add InternLM2TokenizerFast, using Copied from as well.

For 1., I can have another look to check which model is the closest to help; pretty sure it's Persimmon (merged QKV) and Phi (split QKV).

@x54-729
Author

x54-729 commented Mar 29, 2024

Thanks! I will try tokenizer.json later!

As for the Persimmon model, it does not support GQA, and its QKV arrangement seems to be different from InternLM2's. In addition, the state dict names are not the same. So writing InternLM2 based on LLaMA seems more convenient? I'm not entirely sure about this.

@ArthurZucker
Collaborator

Gemma or Phi should be the closest! @SunMarc will do the next round of review! Ping him if you need any help in the meantime! 🤗

@SunMarc
Member

SunMarc commented Apr 17, 2024

Hi @x54-729, just checking in to see if you are still planning to finish this PR. Feel free to ask me any questions =)

@huggingface huggingface deleted a comment from github-actions bot May 13, 2024

github-actions bot commented Jun 7, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot closed this Jun 15, 2024
@SunMarc SunMarc reopened this Jun 17, 2024
@github-actions github-actions bot closed this Jun 26, 2024
@SunMarc SunMarc reopened this Jun 26, 2024
@github-actions github-actions bot closed this Jul 5, 2024