[refactor] model loading - no more unnecessary file downloads #2345
Conversation
Deprecated arguments are not listed in docstrings
Left some very minor comments. Do you think it makes sense, at some point, to refactor the tests into pytest? I personally find it much more effective than unittest.
I also prefer
Somebody, for the love of god, please merge this and update PyPI.

THANK YOU
@Sirri69 I'm on it 😉 Give it a few days. I made updates to introduce better support if Internet is unavailable. Now, we can run the following script under various settings:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
embeddings = model.encode("This is a test sentence", normalize_embeddings=True)
print(embeddings.shape)
```

These are now the outputs under the various settings:
This is exactly what I would hope to get. cc: @nreimers as we discussed this.
Hi, I appreciate this update to support model loading without an internet connection. However, I find that loading the model is very slow without an internet connection. My testing code is as follows:

```python
import time

start = time.time()
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True, device='cpu')
emb = model.encode(["hello world"])
print(emb.shape)
print('time:', time.time() - start)
```

The output is as follows:

Additionally, I found that adding `local_files_only=True` does not speed it up:

```python
import time

start = time.time()
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True, device='cpu', local_files_only=True)
emb = model.encode(["hello world"])
print(emb.shape)
print('time:', time.time() - start)
# output:
# <All keys matched successfully>
# (1, 768)
# time: 145.69492316246033
```
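Note that the script above lumps import time, model loading, and encoding into one measurement. To see where the 145 s actually go, each phase can be timed separately; here is a minimal sketch using a small helper (the lambdas are placeholders, which in a real measurement would wrap the actual import, `SentenceTransformer(...)` construction, and `model.encode(...)` calls):

```python
import time

def timed(label, fn):
    """Run fn(), print how long it took, and return its result."""
    start = time.perf_counter()
    result = fn()
    print(f"{label}: {time.perf_counter() - start:.2f}s")
    return result

# Placeholders standing in for the slow phases of the script above.
model = timed("load", lambda: "model")
emb = timed("encode", lambda: [0.0] * 768)
print(len(emb))  # -> 768
```

Splitting the measurement this way makes it clear whether the bottleneck is the import, the (possibly network-bound) model resolution, or inference itself.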
Hello!
Pull Request overview

- Use `hf_hub_download` instead of the deprecated `cached_download`.
- Deprecate `use_auth_token` in favor of `token`, as required by recent `transformers`/`huggingface_hub` versions.

Details
In short, model downloading has moved from greedy full-repository downloading to lazy per-module downloading, where no files are downloaded for Transformers modules.

Original model loading steps

1. Download the full model repository.
2. Load as a Sentence Transformer model if `modules.json` exists.
3. Initialize each module, e.g. `Transformer` using the local files downloaded in the last step + `Pooling`.

New model loading steps
1. Load as a Sentence Transformer model if `modules.json` exists locally or on the Hub.
   a. Download the ST configuration files (`config_sentence_transformers.json`, `README.md`, `modules.json`) if they're remote.
   b. For each module, if it is not transformers, then download (if necessary) the directory with configuration/weights for that module. If it is transformers, then do not download, and load the model using the `model_name_or_path`.
2. Initialize each module, e.g. `Transformer` using the `model_name_or_path` + `Pooling`.
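The per-module branch in step 1b can be sketched as follows. Note that `plan_downloads` is a hypothetical helper for illustration, not the library's actual API, and the module type strings are simplified examples of what `modules.json` contains:

```python
# Sketch of the lazy per-module download decision described above.

def plan_downloads(modules):
    """Given (module_name, module_type) pairs from modules.json,
    return which module directories must be downloaded eagerly.

    Transformer modules are skipped: `transformers` resolves its own
    files (config + a single weights file) lazily at load time."""
    to_download = []
    for name, module_type in modules:
        if module_type == "sentence_transformers.models.Transformer":
            continue  # deferred to `transformers` -> no eager download
        to_download.append(name)
    return to_download

modules = [
    ("0", "sentence_transformers.models.Transformer"),
    ("1", "sentence_transformers.models.Pooling"),
    ("2", "sentence_transformers.models.Normalize"),
]
print(plan_downloads(modules))  # -> ['1', '2']
```

Only the small non-Transformer module directories are fetched eagerly; the heavy weights download is left entirely to `transformers`.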
With this changed setup, we defer downloading any `transformers` data to `transformers` itself. In a test model that I uploaded with both `pytorch_model.bin` and `model.safetensors`, only the safetensors file is loaded. This is verified in the attached test case.

Additional changes
As required by `huggingface_hub`, we now use `token` instead of `use_auth_token`. If `use_auth_token` is still provided, then `token = use_auth_token` is set and a warning is given, i.e. a soft deprecation.
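The soft deprecation can be sketched as a small argument-resolution shim. `resolve_token` is illustrative rather than the library's actual function, and the warning category and conflict handling here are assumptions not stated in the PR text:

```python
import warnings

def resolve_token(token=None, use_auth_token=None):
    """Accept the deprecated `use_auth_token`, but warn and fold it
    into `token` -- a soft deprecation as described above."""
    if use_auth_token is not None:
        warnings.warn(
            "`use_auth_token` is deprecated; use `token` instead.",
            FutureWarning,  # assumption: category not specified in the PR text
        )
        if token is None:  # assumption: an explicit `token` wins on conflict
            token = use_auth_token
    return token

print(resolve_token(use_auth_token="hf_example"))  # warns, prints hf_example
```

Callers passing the old argument keep working and see a warning, which gives downstream code time to migrate before the argument is removed outright.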