
common: ensure token addition to batch does not exceed llama_batch size #9668

Merged
merged 1 commit on Sep 29, 2024

Conversation

@matiaslin (Contributor):

fix #9667

A crash was observed when the number of tokens added to a batch exceeds the context size. Assertions have been added to ensure that the number of tokens added to the batch stays within the bounds of the context size.
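For illustration, a minimal sketch of how the overflow can happen (the capacity value and the wrapper usage below are assumptions for the example, not code from this PR):

```cpp
#include <vector>

#include "common.h"   // llama_batch_add helper
#include "llama.h"

// Illustrative only: a batch allocated for 512 tokens being fed more than that.
void overflow_example(const std::vector<llama_token> & tokens) {
    const int32_t n_batch = 512;                          // capacity passed to llama_batch_init (assumed value)
    llama_batch batch = llama_batch_init(n_batch, 0, 1);  // no embeddings, 1 seq id per token

    // If tokens.size() > n_batch, each extra llama_batch_add writes past the
    // arrays allocated by llama_batch_init -- the crash reported in #9667.
    for (size_t i = 0; i < tokens.size(); ++i) {
        llama_batch_add(batch, tokens[i], (llama_pos) i, { 0 }, false);
    }

    llama_batch_free(batch);
}
```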

@ggerganov (Owner) left a comment:

A more elegant solution is to use the fact that llama_batch.seq_id is null-terminated:

```cpp
batch.seq_id[n_tokens_alloc] = nullptr;
```

We can add an assert directly in llama_batch_add:

```cpp
GGML_ASSERT(batch.seq_id[batch.n_tokens] && "llama_batch size exceeded");
```
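For readers following along: the null terminator referred to here comes from llama_batch_init, which allocates one extra seq_id pointer slot. Roughly (a simplified fragment, not a verbatim copy of llama.cpp):

```cpp
// Inside llama_batch_init(n_tokens_alloc, embd, n_seq_max), simplified:
// the seq_id array gets n_tokens_alloc valid pointers plus one trailing nullptr,
// so batch.seq_id[i] == nullptr exactly when i is past the allocated capacity.
batch.seq_id = (llama_seq_id **) malloc(sizeof(llama_seq_id *) * (n_tokens_alloc + 1));
for (int32_t i = 0; i < n_tokens_alloc; ++i) {
    batch.seq_id[i] = (llama_seq_id *) malloc(sizeof(llama_seq_id) * n_seq_max);
}
batch.seq_id[n_tokens_alloc] = nullptr; // sentinel that the proposed assert relies on
```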

@matiaslin (Contributor, Author):

That's great @ggerganov. I'm uploading the new patch soon.

@github-actions bot added the build, testing, Vulkan, python, devops, server, and ggml labels on Sep 28, 2024
Commit message:

A crash was observed when the number of tokens added to a batch exceeds
llama_batch size. An assertion in llama_batch_add was added to protect
against llama_batch size overflow.
@matiaslin (Contributor, Author):

Done. Now the issue mentioned in #9667 displays the following message if we attempt to add more tokens than the batch size:

```
common.cpp:1435: GGML_ASSERT(batch.seq_id[batch.n_tokens] && "llama_batch size exceeded") failed
```
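For reference, the patched helper in common.cpp then looks roughly like this (a paraphrased sketch of llama_batch_add with the new assert, not a verbatim copy of the merged code):

```cpp
void llama_batch_add(
                 struct llama_batch & batch,
                        llama_token   id,
                          llama_pos   pos,
    const std::vector<llama_seq_id> & seq_ids,
                               bool   logits) {
    // seq_id[] is null-terminated by llama_batch_init, so a null entry at
    // index n_tokens means the batch is already at full capacity.
    GGML_ASSERT(batch.seq_id[batch.n_tokens] && "llama_batch size exceeded");

    batch.token   [batch.n_tokens] = id;
    batch.pos     [batch.n_tokens] = pos;
    batch.n_seq_id[batch.n_tokens] = seq_ids.size();
    for (size_t i = 0; i < seq_ids.size(); ++i) {
        batch.seq_id[batch.n_tokens][i] = seq_ids[i];
    }
    batch.logits  [batch.n_tokens] = logits;

    batch.n_tokens++;
}
```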

@matiaslin changed the title from "parallel: fix adding tokens to batch" to "common: ensure token addition to batch does not exceed llama_batch size" on Sep 28, 2024
@ggerganov added the merge ready label on Sep 29, 2024
@ggerganov ggerganov merged commit faac0ba into ggerganov:master Sep 29, 2024
52 checks passed
@WilliamTambellini (Contributor):

Tks @matiaslin and @ggerganov

dsx1986 pushed a commit to dsx1986/llama.cpp that referenced this pull request Oct 29, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 15, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Nov 18, 2024
Labels: build, devops, examples, ggml, merge ready, python, server, testing, Vulkan
Successfully merging this pull request may close these issues.

Bug: llama-parallel crashes when adding more tokens to llama_batch than context size (#9667)