Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[refactor] model loading - no more unnecessary file downloads #2345

Merged
merged 9 commits into from
Dec 12, 2023

Conversation

tomaarsen
Copy link
Collaborator

Hello!

Pull Request overview

  • Refactor the model loading;
    • No longer download the full model repository.
    • Update cache format to git style via hf_hub_download.
    • No longer use deprecated cached_download.
    • Soft deprecation of use_auth_token in favor of token as required by recent transformers/huggingface_hub versions.
  • Add test to ensure that correct/appropriate files are downloaded.

Details

In short, model downloading has moved from greedy full repository downloading to lazy per-module downloading, where no files are downloaded for Transformers modules.

Original model loading steps

  1. Greedily load the full model repository to the cache folder.
  2. Check if modules.json exists.
  3. If so, load all modules individually using the local files downloaded in the last step.
  4. If not, load Transformer using the local files downloaded in the last step + Pooling.
  5. Done

New model loading steps

  1. Check if modules.json exists locally or on the Hub.
  2. If so,
    a. Download the ST configuration files ('config_sentence_transformers.json', 'README.md', 'modules.json') if they're remote.
    b. For each module, if it is not transformers, then download (if necessary) the directory with configuration/weights for that module. If it is transformers, then do not download & load the model using the model_name_or_path.
  3. If not, load Transformer using the model_name_or_path + Pooling.
  4. Done

With this changed setup, we defer downloading any transformers data to transformers itself. In a test model that I uploaded with both pytorch_model.bin and model.safetensors, only the safetensors file is loaded. This is verified in the attached test case.

Additional changes

As required by huggingface_hub, we now use token instead of use_auth_token. If use_auth_token is still provided, then token = use_auth_token is set, and a warning is given. I.e. a soft deprecation.

  • Tom Aarsen

@tomaarsen tomaarsen changed the title Refactor model loading - no more unnecessary file downloads [refactor] model loading - no more unnecessary file downloads Nov 7, 2023
Copy link
Contributor

@bwanglzu bwanglzu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left some very minor comments, do you think it make sense, at some point to refactor tests in pytests? i personally find it much more effective than unittest

@tomaarsen
Copy link
Collaborator Author

I also prefer pytest. I would indeed like to fully refactor the tests and heavily improve them. The current coverage is quite low for my tastes! Thanks for the review by the way!

  • Tom Aarsen

@Sirri69
Copy link

Sirri69 commented Dec 11, 2023

Somebody for the love of god, please merge this and update pypi

@Sirri69
Copy link

Sirri69 commented Dec 12, 2023

THANK YOU

@tomaarsen
Copy link
Collaborator Author

@Sirri69 I'm on it 😉 Give it a few days.

I made updates to introduce better support if Internet is unavailable. Now, we can run the following script under various settings:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
embeddings = model.encode("This is a test sentence", normalize_embeddings=True)
print(embeddings.shape)

These are now the outputs under the various settings:

Internet No Internet
Cache (384,) (384,)
No Cache modules.json: 100%|█████████████████████████████████| 349/349 [00:00<?, ?B/s]
config_sentence_transformers.json: 100%|████████████| 116/116 [00:00<?, ?B/s]
README.md: 100%|████████████████████████████████| 10.6k/10.6k [00:00<?, ?B/s]
sentence_bert_config.json: 100%|██████████████████| 53.0/53.0 [00:00<?, ?B/s]
config.json: 100%|██████████████████████████████████| 612/612 [00:00<?, ?B/s]
pytorch_model.bin: 100%|████████████████| 90.9M/90.9M [00:06<00:00, 14.9MB/s]
tokenizer_config.json: 100%|████████████████████████| 350/350 [00:00<?, ?B/s]
vocab.txt: 100%|██████████████████████████| 232k/232k [00:00<00:00, 1.36MB/s]
tokenizer.json: 100%|█████████████████████| 466k/466k [00:00<00:00, 4.97MB/s]
special_tokens_map.json: 100%|██████████████| 112/112 [00:00<00:00, 90.1kB/s]
1_Pooling/config.json: 100%|████████████████████████| 190/190 [00:00<?, ?B/s]
(384,)
OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like sentence-transformers/all-MiniLM-L6-v2 is not the path to a directory containing a file named config.json. Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.

This is exactly what I would hope to get.

cc: @nreimers as we discussed this.

  • Tom Aarsen

@peiyangL
Copy link

@tomaarsen

Hi, I appreciate this update to support model loading without an internet connection.

However, I find that loading the model is very slow without an internet connection. My testing code is as follows:

import time
start = time.time()
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True, device='cpu')
emb = model.encode(["hello world"])
print(emb.shape)
print('time:', time.time()-start)

The output is as follows:

# without internet
<All keys matched successfully>
(1, 768)
time: 376.90756702423096

# with internet
<All keys matched successfully>
(1, 768)
time: 15.75501823425293

Additionally, I found that adding the local_files_only=True parameter speeds up model loading without an internet connection, but it is still quite slow.

import time
start = time.time()
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1", trust_remote_code=True, device='cpu', local_files_only=True)
emb = model.encode(["hello world"])
print(emb.shape)
print('time:', time.time()-start)

# output:
# <All keys matched successfully>
# (1, 768)
# time: 145.69492316246033

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants