Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Avoid unnecessarily disabling CUDA graphs (ggerganov#7302)
As discussed in PR ggerganov#6766, CUDA graphs were being disabled in the presence of long prompts. This fixes the issue by avoiding the consective update counter from incrementing unnecessarily for tokens in which cuda graphs are disabled due to batch size > 1.
- Loading branch information