server: use chunked inputs #1985
Conversation
Force-pushed from 544ddf5 to 64bea24
Force-pushed from 64bea24 to 94affb7
from text_generation_server.pb import generate_pb2

def concat_text_chunks(chunks: Iterable[generate_pb2.InputChunk]) -> str:
Is this method only meant to be future-proof, or is there a way today to have multiple text chunks?
AFAIK we can currently only have multiple text chunks in VLM models, so this was indeed only to future-proof.
Then maybe we should take [0] and crash with an unreachable if len > 1?
Oh, we do need to iterate over chunks, because we are sending image chunks unconditionally during warmup, even for text-only models:
if n_tokens == 0 {
The current approach seems more robust? What do you think about logging a warning when len(texts) > 1?
Updated to:
- Fail when there is more than one text chunk.
- Fail when there is no text chunk.
- Log at debug level when there is a non-text chunk (only logging because e.g. warmup sends an image chunk).
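
For reference, a minimal sketch of what the updated helper could look like under these rules. The oneof field name ("chunk"), the exception type, and the logging setup are assumptions for illustration, not necessarily the PR's exact code:

import logging
from typing import Iterable

from text_generation_server.pb import generate_pb2

logger = logging.getLogger(__name__)


def concat_text_chunks(chunks: Iterable[generate_pb2.InputChunk]) -> str:
    """Return the text of the single text chunk in `chunks`.

    Fails on zero or multiple text chunks; non-text chunks are only
    logged, since warmup sends an image chunk even to text-only models.
    """
    text = None
    for chunk in chunks:
        # "chunk" as the oneof name is an assumption about the proto layout.
        chunk_type = chunk.WhichOneof("chunk")
        if chunk_type == "text":
            if text is not None:
                # Fail when there is more than one text chunk.
                raise ValueError("request contained more than one text chunk")
            text = chunk.text
        else:
            # Only log: e.g. warmup sends an image chunk unconditionally.
            logger.debug("ignoring non-text chunk of type %s", chunk_type)
    if text is None:
        # Fail when there is no text chunk.
        raise ValueError("request contained no text chunk")
    return text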
Force-pushed from 94affb7 to 77a6805
Force-pushed from 77a6805 to e1dd4ae
Force-pushed from e1dd4ae to 3181443
Force-pushed from 3181443 to 633e2bd
What does this PR do?
The router now sends the input as chunks in addition to a single string. This change modifies the server to process chunked input rather than strings, which also lets us remove the image-extraction code from the server.
Draft note: mostly checking whether all models pass; this also needs to be rebased on #1981 once that PR is merged.
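
As an illustration of what removing the image-extraction code enables, here is a hypothetical sketch of the server dispatching on chunk types instead of parsing ![](...) image references out of a single input string. The Input message layout, the split_chunks name, and the image.data/image.mimetype fields are assumptions, not the PR's actual API:

from text_generation_server.pb import generate_pb2


def split_chunks(input_: generate_pb2.Input):
    """Separate text and image chunks from a chunked input (hypothetical)."""
    texts, images = [], []
    for chunk in input_.chunks:  # the `chunks` field name is an assumption
        chunk_type = chunk.WhichOneof("chunk")
        if chunk_type == "text":
            texts.append(chunk.text)
        elif chunk_type == "image":
            # Image bytes arrive in the chunk itself; no string parsing needed.
            images.append((chunk.image.data, chunk.image.mimetype))
    return texts, images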
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.