The README shows that when sending a request to the API server, you should use this format:
```
curl http://localhost:18888/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openchat_3.5",
    "messages": [{"role": "user", "content": "You are a large language model named OpenChat. Write a poem to describe yourself"}]
  }'
```
It also says that when using transformers for inference, the conversation should be formatted like this:
```
GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant: Hi<|end_of_turn|>GPT4 Correct User: How are you today?<|end_of_turn|>GPT4 Correct Assistant:
```
It seems that the `tokenize_conversations` method in `conversation_template.py` only appends the `<|end_of_turn|>` token to messages, but does not add the "GPT4 Correct User" and "GPT4 Correct Assistant" prefixes (it only uses the `user` and `assistant` roles from the messages). As a result, is the same input processed differently by the API server and by inference with transformers? If so, would a model that I tested via inference with transformers behave differently once it is deployed with the API server?
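One way to compare the two paths is to print the prompt string that transformers builds from the role-based messages and check whether the "GPT4 Correct" prefixes appear. A minimal sketch, assuming the Hugging Face model id `openchat/openchat_3.5` and that its tokenizer config ships a chat template (both are assumptions, not taken from this issue):

```python
# Minimal sketch: inspect the prompt that transformers constructs from
# role-based messages. Assumes the model id "openchat/openchat_3.5" and
# that its tokenizer includes a chat template (assumptions, not confirmed
# by this issue).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openchat/openchat_3.5")

messages = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi"},
    {"role": "user", "content": "How are you today?"},
]

# tokenize=False returns the raw prompt string instead of token ids;
# add_generation_prompt=True appends the trailing assistant prefix.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
# If the template matches the README, this should print:
# GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant: Hi<|end_of_turn|>GPT4 Correct User: How are you today?<|end_of_turn|>GPT4 Correct Assistant:
```

If this output matches the README format, the role prefixes are being applied by the chat template rather than inside `tokenize_conversations` itself, which would explain why the method appears to add only the end-of-turn token.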