feature: triton generate support #675
The classes in this file should be renamed since they aren't specific to Chat Completions.
In the end, I think "HTTP with SSE support" is what it really is, but I'm not sure what the best name would be.
I'd really like to see the classes refactored. We shouldn't need two independent, full HTTP clients. Either one goes away, or we get a base class with some really thin implementation classes on top. We already have stories for this (TMA-1644), so it's no big deal if this is ignored in this PR.
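To make the base-class option concrete, here is a rough sketch, not the code in this PR. All class and method names (`SseHttpClientBase`, `ChatCompletionsClient`, `TritonGenerateClient`, `request_path`, `parse_sse`) are hypothetical, and it is written in Python purely for illustration. It assumes the only endpoint-specific piece is the request path, which is likely a simplification.

```python
import json
from abc import ABC, abstractmethod


class SseHttpClientBase(ABC):
    """Hypothetical shared base: holds everything that is not endpoint-specific."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url.rstrip("/")

    @abstractmethod
    def request_path(self, model: str) -> str:
        """Return the endpoint-specific URL path."""

    def url_for(self, model: str) -> str:
        return f"{self.base_url}{self.request_path(model)}"

    @staticmethod
    def parse_sse(raw: str) -> list:
        """Decode the JSON payload of each `data:` line in an SSE body."""
        events = []
        for line in raw.splitlines():
            if not line.startswith("data:"):
                continue
            payload = line[len("data:"):].strip()
            # "[DONE]" is the end-of-stream sentinel used by OpenAI-style streams.
            if payload and payload != "[DONE]":
                events.append(json.loads(payload))
        return events


class ChatCompletionsClient(SseHttpClientBase):
    """Thin implementation: only the path differs."""

    def request_path(self, model: str) -> str:
        return "/v1/chat/completions"


class TritonGenerateClient(SseHttpClientBase):
    """Thin implementation for the Triton generate extension (streaming path)."""

    def request_path(self, model: str) -> str:
        return f"/v2/models/{model}/generate_stream"
```

With this shape, adding another SSE-capable endpoint would mean one more small subclass rather than a third full client.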
We should change the name of the function, since it's no longer just for OpenAI.
Should the code be changed to check endpoint_type != kserve? I know that with the current code it is the same result, but it introduces an assumption (endpoint kserve -> service_kind triton) that could trip up a future developer.
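A small sketch of the concern, under stated assumptions: `Args` is a hypothetical stand-in for the parsed arguments, the "existing" check is assumed to look like `service_kind != "triton"` (the actual condition in the PR may differ), and the `"other"` service kind is invented to show the divergence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Args:
    """Hypothetical stand-in for the parsed CLI arguments."""
    service_kind: str
    endpoint_type: str


def check_via_service_kind(args: Args) -> bool:
    # Assumed shape of the existing check: infers the endpoint from the service kind.
    return args.service_kind != "triton"


def check_via_endpoint_type(args: Args) -> bool:
    # Suggested shape: asks about the endpoint directly.
    return args.endpoint_type != "kserve"


# Combinations where both checks agree, as described in the comment above.
today = [Args("triton", "kserve"), Args("openai", "chat")]

# Invented future combination that breaks "endpoint kserve -> service_kind triton".
future = Args(service_kind="other", endpoint_type="kserve")
```

For the `today` combinations both functions return the same value; for `future` they disagree, which is the case the explicit `endpoint_type` check guards against.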
Can't the generate endpoint be used with either trtllm or vllm?
It can - I haven't added any different behavior for the different backends. Actually, it has only been tested against vllm so far, so this is a fair point.
Let me move this back to draft - I plan to test trt-llm in the next week or so.
Finally one less CLI option 😄 Can we also update the README to reflect the changes in CLI options?