
Is the support for the Mistral model inconsistent or incomplete compared with the official implementation? #29533

Closed
MrYxJ opened this issue Mar 8, 2024 · 6 comments


MrYxJ commented Mar 8, 2024

Feature request

Hello, thank you very much for creating the transformers library, which provides a very convenient way to use different models.

Recently, when I used the Mistral model with the transformers library (4.38.2) for inference, the execution did not seem to be completely consistent with the official Mistral paper and the code they provide.

Specifically, first, Mistral uses a pre-fill and chunking mechanism when encoding the prompt input (as shown in the following figure). Is this method included in the transformers library's implementation of Mistral's generate function?

[Figure: pre-fill and chunking diagram from the Mistral paper]
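
To make the question concrete, here is a minimal sketch of what chunked pre-fill looks like at the API level, assuming a standard causal-LM loop that carries `past_key_values` between calls; this illustrates the idea from the paper and is not the internals of `generate` (the checkpoint name and `long_prompt` are placeholders):

```python
# Illustrative sketch only (not the transformers internals): pre-fill a long
# prompt in window-sized chunks, carrying the KV cache across chunks.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-v0.1"  # placeholder checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

long_prompt = "..."  # placeholder for a prompt longer than the sliding window
window = model.config.sliding_window  # 4096 for Mistral-7B
input_ids = tokenizer(long_prompt, return_tensors="pt").input_ids.to(model.device)

past_key_values = None
with torch.no_grad():
    for start in range(0, input_ids.shape[1], window):
        chunk = input_ids[:, start : start + window]
        out = model(chunk, past_key_values=past_key_values, use_cache=True)
        past_key_values = out.past_key_values  # reused by the next chunk
```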

Secondly, Mistral uses a rolling buffer cache mechanism so that keys and values in the cache are overwritten as the sliding window moves (as shown in the following figure).
[Figure: rolling buffer cache diagram from the Mistral paper]

When I read the corresponding transformers source code, this mechanism appears to be implemented via the DynamicCache update method, which simply concatenates all the past keys and values with the current keys and values: https://github.com/huggingface/transformers/blob/b338a6c3b8eda29610d4d472cad8cd87cbfdaaed/src/transformers/cache_utils.py#L126C1-L132C105
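
For context, here is a rough conceptual contrast between the two update strategies, written as hypothetical helper functions (a tensor shape of `[batch, heads, seq_len, head_dim]` is assumed; neither function is the library's actual code):

```python
# Hypothetical helpers for illustration only (not the library's classes).
import torch

def dynamic_cache_style_update(past_k: torch.Tensor, new_k: torch.Tensor) -> torch.Tensor:
    # What the linked DynamicCache.update does conceptually: concatenate along the
    # sequence dimension, so the cache grows with the generated sequence.
    return torch.cat([past_k, new_k], dim=-2)

def rolling_buffer_style_update(buffer_k: torch.Tensor, new_k: torch.Tensor,
                                pos: int, window: int) -> torch.Tensor:
    # What the paper's rolling buffer does conceptually: write each new key into
    # slot (pos + i) % window, overwriting entries older than the window, so the
    # cache never grows beyond `window` positions.
    for i in range(new_k.shape[-2]):
        buffer_k[..., (pos + i) % window, :] = new_k[..., i, :]
    return buffer_k
```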

Am I misreading the logic, or is the implementation not fully consistent with the Mistral paper? In addition, I read the official Mistral source code and found that they release two kinds of inference code, one of which is a simple version. Is that what your current implementation corresponds to?

Motivation

I want to know how to use the transformers library with the Mistral model to reproduce the behavior described in the Mistral paper.
Did I misread the code, or do I need to make additional changes on top of transformers?

Your contribution

I can try to fix this piece of code so that the transformers implementation matches the Mistral paper.

amyeroberts (Collaborator)

Possibly related: #29519

cc @ArthurZucker @gante

gante (Member) commented Mar 8, 2024

Hi @MrYxJ 👋

The sliding window logic from the pre-fill and chunking mechanism is implemented differently depending on the attention layer:

  • eager (i.e. vanilla) attention: the attention mask with sliding window is created here (a minimal sketch of such a mask follows this list)
  • flash attention 2: the sliding window is applied in the forward pass of the attention layer itself.
  • SDPA: it is not enabled on main, this PR enables it
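
A minimal sketch of the windowing idea (not the library's exact mask-building code), assuming a boolean convention where `True` means "query may attend to key"; the library builds an additive float mask internally, but the constraint is the same:

```python
import torch

def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    idx = torch.arange(seq_len)
    causal = idx[None, :] <= idx[:, None]                # key j must not be in the future of query i
    in_window = (idx[:, None] - idx[None, :]) < window   # ...and at most `window - 1` positions back
    return causal & in_window

print(sliding_window_causal_mask(seq_len=6, window=3).int())
```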

Regarding the rolling buffer cache mechanism, we can see that it is not needed for the correctness of the model -- the sliding window discards the same older values, i.e. the two mechanisms are redundant. The rolling buffer cache is, however, extremely beneficial for memory purposes. We are working on it (for all models), although currently only Llama and Gemma support it -- search for "static cache".
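
As a usage sketch for the models that already support it, assuming a transformers version in which `generate` accepts `cache_implementation="static"` (check the docs of your installed version; the checkpoint name is a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder: a model with static-cache support
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
# Request a pre-allocated, fixed-size cache instead of the growing dynamic one.
out = model.generate(**inputs, max_new_tokens=32, cache_implementation="static")
print(tokenizer.decode(out[0], skip_special_tokens=True))
```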

(@ArthurZucker please correct if what I wrote above is not correct :) )


github-actions bot commented Apr 8, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

MrYxJ (Author) commented Apr 9, 2024

Hi @gante, thank you very much for your detailed answer. It has cleared things up for me. Thank you very much.

MrYxJ closed this as completed Apr 9, 2024
cyr0930 commented May 8, 2024

But StaticCache is still not implemented for Mistral yet, right?

ArthurZucker (Collaborator)

#30642 is coming 😉
