
feat: supports openai chat completions API #1408

Closed
drbh wants to merge 9 commits

Conversation

drbh (Collaborator) commented Jan 5, 2024

This PR makes TGI a drop-in replacement for OpenAI clients by exposing the same HTTP interface.

Notes

  • TGI initializes a single model at startup, so the model field in HTTP requests is ignored.
  • max_tokens and stream should work as expected, but other parameters may be unimplemented or unsupported.

General approach

  • fetch the tokenizer_config from the Hub at startup
  • pass the tokenizer_config into Infer so it is available at request time
  • use the chat_template from the config to format the chat request
  • parse the Jinja template and render the chat string (see the sketch after this list)
  • pass the rendered inputs into the existing generate function
  • wrap the generation output in the expected response structure before returning
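
The router does this rendering in Rust; as a rough illustration only, here is the same idea in Python with jinja2. The template string below is a hypothetical stand-in for whatever chat_template a model actually ships in its tokenizer_config.json.

from jinja2 import Template

# hypothetical chat_template; real ones come from tokenizer_config.json
chat_template = (
    "{% for message in messages %}"
    "<|{{ message.role }}|>\n{{ message.content }}\n"
    "{% endfor %}"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is deep learning?"},
]

# render the chat messages into a single prompt string;
# this string is what gets passed to the existing generate function
prompt = Template(chat_template).render(messages=messages)
print(prompt)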

How to test

Streaming curl

curl localhost:3000/v1/chat/completions \
    -X POST \
    -d '{
  "model": "tgi",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "What is deep learning?"
    }
  ],
  "stream": true,
  "max_tokens": 20
}' \
    -H 'Content-Type: application/json'

It is also possible to use the openai Python library and point its base_url at TGI:

🌊 STREAMING REQUEST

from openai import OpenAI

# init the client but point it to TGI
client = OpenAI(
    base_url="http://localhost:3000/v1",
    api_key="not needed for a local LLM"
)

chat_completion = client.chat.completions.create(
    model="tgi",
    messages=[
        {"role": "system", "content": "You are a helpful assistant." },
        {"role": "user", "content": "What is deep learning?"}
    ],
    stream=True
)

# iterate and print stream
for message in chat_completion:
    print(message)

# ChatCompletionChunk(id='', choices=[Choice(delta=ChoiceDelta(content=' that', function_call=None, role='assistant', tool_calls=None), finish_reason=None, index=2, logprobs=None)], created=1704486761, model='', object='text_completion', system_fingerprint='')
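
As a follow-up to the loop above, the streamed deltas can also be accumulated into the full reply. This is a minimal sketch that reuses the client from above, opens a fresh stream (the previous one has already been consumed), and assumes the chunks follow the OpenAI shape shown in the sample output (choices[0].delta.content):

# open a fresh stream with the same client
stream = client.chat.completions.create(
    model="tgi",
    messages=[{"role": "user", "content": "What is deep learning?"}],
    stream=True
)

# accumulate the incremental content deltas into the complete reply
reply = ""
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta is not None:
        reply += delta
print(reply)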

🚗 SYNCHRONOUS REQUEST

from openai import OpenAI

# init the client but point it to TGI
client = OpenAI(
    base_url="http://localhost:3000/v1",
    api_key="not needed for a local LLM"
)

chat_completion = client.chat.completions.create(
    model="tgi",
    messages=[
        {"role": "system", "content": "You are a helpful assistant." },
        {"role": "user", "content": "What is deep learning?"}
    ],
    stream=False
)

print(chat_completion)
# ChatCompletion(id='', choices=[Choice(finish_reason=None, index=0, logprobs=None, message=ChatCompletionMessage(content='\nDeep learning is a new field of research that has been gaining traction in the last ...', role='assistant', function_call=None, tool_calls=None))], created=1704486762, model='', object='text_completion', system_fingerprint='', usage=CompletionUsage(completion_tokens=100, prompt_tokens=76, total_tokens=176))
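
To pull out just the generated text, read chat_completion.choices[0].message.content (visible in the repr above).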

drbh commented Jan 5, 2024

@Narsil and @OlivierDehaene please let me know if any changes should be made!

drbh requested a review from OlivierDehaene on January 8, 2024
drbh requested a review from OlivierDehaene on January 9, 2024
drbh added a commit that referenced this pull request Jan 10, 2024
prefer PR from original repo rather than fork to run CI #1408
Michellehbn (Member) commented:

Noting we can probably close #735 when this is done!

drbh commented Jan 10, 2024

Closing in favor of the CI-enabled PR #1427.

drbh closed this on Jan 10, 2024
drbh added a commit that referenced this pull request Jan 11, 2024
prefer PR from original repo rather than fork to run CI #1408