Hi HF folks,
We noticed that the deployment success rate of TEI models with the HF text-embeddings-inference container is very low for the recent top trending TEI models in our deployment verification pipeline: only 23 out of 500 deployments succeeded.
Example deployment
IMAGE=us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-embeddings-inference-cu122.1-4.ubuntu2204
MODEL_ID=ncbi/MedCPT-Article-Encoder
docker run --gpus all -p 7080:80 -e MODEL_ID=${MODEL_ID} --pull always $IMAGE
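For context, once the container is up we probe it with an embedding request roughly like the following (a sketch assuming the default TEI /embed route and the port mapping above; the input text is just an example):
# Minimal smoke test against the container started above.
curl 127.0.0.1:7080/embed -X POST -d '{"inputs": "What is the mechanism of action of aspirin?"}' -H 'Content-Type: application/json'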
We looked into the root causes and found the following two common failures:
Error: The --pooling arg is not set and we could not find a pooling configuration (1_Pooling/config.json) for this model.
My understanding is that the container expects a 1_Pooling/config.json file in the repo, but the model owner did not provide one. In this case, what would you suggest we do?
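For reference, the file TEI looks for is the standard sentence-transformers pooling config. A mean-pooling variant looks roughly like this (illustrative values only, not taken from ncbi/MedCPT-Article-Encoder; the embedding dimension depends on the model):
# Illustrative 1_Pooling/config.json in the sentence-transformers format.
mkdir -p 1_Pooling
cat > 1_Pooling/config.json <<'EOF'
{
  "word_embedding_dimension": 768,
  "pooling_mode_cls_token": false,
  "pooling_mode_mean_tokens": true,
  "pooling_mode_max_tokens": false,
  "pooling_mode_mean_sqrt_len_tokens": false
}
EOF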
We tried adding a default value of pooling=mean as an environment variable on the container and saw a significant improvement in the deployment success rate: 23/500 -> 243/500.
We know the pooling parameter is required for TEI models. However, we are not sure whether applying a default value to all TEI models is the correct approach, and we are concerned it could negatively impact model quality. Can you advise on how we should handle this?
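For clarity, the fallback we experimented with is roughly equivalent to launching the container with the pooling strategy passed explicitly (a sketch; our pipeline sets it through an environment variable rather than the CLI flag):
# Same image/model as above, but forcing mean pooling when the repo ships no 1_Pooling/config.json.
docker run --gpus all -p 7080:80 -e MODEL_ID=${MODEL_ID} --pull always $IMAGE --pooling mean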
0: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/facebook/MEXMA/resolve/main/tokenizer.json)
1: HTTP status client error (404 Not Found) for url (https://huggingface.co/facebook/MEXMA/resolve/main/tokenizer.json)
My understanding is that the container expects a tokenizer.json file in the repo, but the model owner did not provide one. Looking at the model card, the model is an XLM-RoBERTa model; I guess that is why the repo does not have a tokenizer.json file, and to use the model the user also needs to load the tokenizer separately first.
Many models fail with the same error. Can you advise whether there is anything we can do here? We are also concerned that providing a default tokenizer would negatively impact model quality.
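For concreteness, providing such a default tokenizer would look roughly like the sketch below. It assumes the model reuses the stock XLM-RoBERTa tokenizer (here taken from FacebookAI/xlm-roberta-large, which is an assumption on our side and exactly the point we are unsure is safe):
# Sketch: export a fast tokenizer.json from a base XLM-RoBERTa checkpoint so it could be
# added next to the model weights. The correct base tokenizer depends on how facebook/MEXMA
# was actually trained.
pip install transformers sentencepiece
python - <<'EOF'
from transformers import AutoTokenizer

# Loading with use_fast=True converts the sentencepiece model to the tokenizers format.
tok = AutoTokenizer.from_pretrained("FacebookAI/xlm-roberta-large", use_fast=True)
tok.save_pretrained("./mexma-default-tokenizer")  # writes tokenizer.json among other files
EOF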
Thanks!
weigary changed the title from "Low deployment success rate using the HF text-embedding-inference container" to "Low deployment success rate using the HF text-embedding-inference container, with missing pooling and tokenizer" on Oct 16, 2024.
Regarding the second one: unfortunately, TEI relies on this file (tokenizer.json) for tokenization. We will update the Hub tag to make sure that these models are removed.