
I am not able to load a model from my local disk after downloading from Hugging Face #2626

Closed
wassimsuleiman opened this issue Feb 9, 2022 · 15 comments
Labels
bug (Something isn't working), wontfix (This will not be worked on)

Comments

@wassimsuleiman

I downloaded the model from Hugging Face:
https://huggingface.co/flair/ner-english-ontonotes-large/tree/main

I am trying to load the SequenceTagger:
tagger = SequenceTagger.load('./pytorch_model.bin')

I got:
ValueError: Connection error ?
Any idea?

@wassimsuleiman added the bug label on Feb 9, 2022
@wassimsuleiman
Author

I managed to unzip the .bin file; it contained a .pkl file and data directories. How do I go on from here? I tried to load
tagger = SequenceTagger.load('archive/data.pkl')

UnpicklingError: A load persistent id instruction was encountered, but no persistent_load function was specified.

Any idea?

@pg020196

Hi @wassimsuleiman,

I recently faced a similar problem on a machine without internet access and found the following workaround (a consolidated sketch follows after the list):

  1. Find a machine that has unrestricted internet access and download the model using the following code. You can find this command on the Hugging Face hub website by clicking the button </> Use in flair:
from flair.models import SequenceTagger 
tagger = SequenceTagger.load("flair/ner-english-ontonotes-large")
  2. Next, find the directory in which the flair models are stored on your machine. You can do so by using the following code. The default is ~/.flair:
import flair
print(flair.cache_root)
  3. Navigate to that folder and copy the files to the same directory on the machine without internet access.
  4. In my case, additional cached files (tokenizer, sentencepiece model, ...) from the underlying transformer model were required to finally load the SequenceTagger, so I copied them as well. The files were located at ~/.cache/huggingface/ on my computer with internet access.
  5. On the machine without internet access, set the following environment variables before trying to load the model:
import os

os.environ['TRANSFORMERS_OFFLINE'] = '1'
os.environ['HF_DATASETS_OFFLINE'] = '1'

  6. Load the model using the code from step 1.

Note: If you simply want to store the model for later offline usage on the same computer, you can skip steps 2 to 4.

Maybe somebody can contribute a more elegant solution, but for now this at least works.
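
Putting the steps together, here is a minimal sketch of the whole flow; it simply restates the steps above as one short script per machine, using the same model name and cache paths (adjust them for your setup):

# Script for the machine WITH internet access: download once so the files land in the local caches.
import flair
from flair.models import SequenceTagger

tagger = SequenceTagger.load("flair/ner-english-ontonotes-large")
print(flair.cache_root)  # copy this directory (and ~/.cache/huggingface/) to the offline machine

# Separate script for the machine WITHOUT internet access, run after copying the cached files:
import os
os.environ['TRANSFORMERS_OFFLINE'] = '1'
os.environ['HF_DATASETS_OFFLINE'] = '1'

from flair.models import SequenceTagger
tagger = SequenceTagger.load("flair/ner-english-ontonotes-large")  # resolved from the local cache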

@helpmefindaname
Copy link
Collaborator

Hey @pg020196, you can use the current master branch; then it will work out of the box if you redownload the model.

@pg020196

Hi @helpmefindaname, thank you for your reply! Sorry for needing another clarification on my side: if I understand your comment correctly, you are referring to the master branch of the flair project and not to the master branch of the Hugging Face model, aren't you? Currently, I am using the latest package available through pip (flair 0.10).

Do I still have to initially download the model using SequenceTagger.load(), or does this allow me to download the pytorch_model.bin directly from the Hugging Face hub (https://huggingface.co/flair/ner-english-ontonotes-large/tree/main)? If I understand correctly, the current .load() method also creates a config.json file which is missing from the Hugging Face hub. Is this file still required when using the latest version from your master branch?

In my case, I cannot use the SequenceTagger.load() function on my target device to initially download the model since it does not have any internet access and, therefore, I have to download all the required files beforehand and copy them to a local directory on my target device.

Thank you!

@helpmefindaname
Collaborator

Hi @pg020196, yes, I am referring to the master branch of flair.
It doesn't matter how you download the model; however, you have to load it and save it again once. When saving, flair will internally store a zip file containing the config/vocab/... files and use them when loading the model the next time.

As you cannot do that on your target device, you need to install flair on a different device and run the load/save once there.
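
For illustration, a minimal sketch of that load-once-then-save flow, run on a machine with internet access (the file name local_tagger.pt is just an example):

from flair.models import SequenceTagger

# Load once from the hub (requires internet), then save a self-contained local copy.
tagger = SequenceTagger.load("flair/ner-english-ontonotes-large")
tagger.save("local_tagger.pt")

# Later (e.g. after copying local_tagger.pt to the target device), load it straight from disk:
tagger = SequenceTagger.load("local_tagger.pt")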

@pg020196

Hi @helpmefindaname,
thank you for clarifying and answering my further questions. I just tried your approach and it works like a charm!

@stale

stale bot commented Jul 30, 2022

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

The stale bot added the wontfix label on Jul 30, 2022
The stale bot closed this as completed on Sep 9, 2022
@ksachdeva11

Hi @pg020196, yes, I am referring to the master branch of flair. It doesn't matter how you download the model; however, you have to load it and save it again once. When saving, flair will internally store a zip file containing the config/vocab/... files and use them when loading the model the next time.

As you cannot do that on your target device, you need to install flair on a different device and run the load/save once there.

Hello @helpmefindaname @pg020196, could you please describe the load and save steps that need to be followed once the model is downloaded?

@pg020196

Hi @ksachdeva11,

To save the model, you can simply use the .save() function on the model instance. Looking at the example here, you can call tagger.save(filepath).

Loading a model is shown here; e.g. loading a SequenceTagger can be done by calling:
SequenceTagger.load(filepath)

@ksachdeva11

ksachdeva11 commented Sep 29, 2022

Hi @pg020196,

I am following the steps below:

  1. From a machine with internet access:
from flair.models import SequenceTagger
tagger = SequenceTagger.load("flair/ner-english-ontonotes-large")
  2. Then trying to save the model on the same machine using:

tagger.save('path/to/directory/')

but getting this error: IsADirectoryError: [Errno 21] Is a directory: 'path/to/directory/'

Is this not the right way to load and save?

@pg020196

Hi @ksachdeva11,
when loading or saving the model locally, I think you have to specify the path to the file, not to the directory, e.g.
tagger.save('path/to/directory/tagger_model.pt')
tagger = SequenceTagger.load('path/to/directory/tagger_model.pt')

When loading the model with SequenceTagger.load("flair/ner-english-ontonotes-large"), the string value is used as an identifier for the model on the model hub, not as a directory. See here

@ksachdeva11

ksachdeva11 commented Sep 29, 2022

Thanks @pg020196 ... getting this error now while saving the model:

tagger.save('path/to/directory/tagger_model.pt')

~/anaconda3/envs/amazonei_pytorch_latest_p37/lib/python3.7/site-packages/flair/embeddings/base.py in _tokenizer_bytes(self)
    344             files = list(self.tokenizer.save_pretrained(temp_dir))
    345             if self.tokenizer.is_fast:
--> 346                 vocab_files = self.tokenizer.slow_tokenizer_class.vocab_files_names.values()
    347                 files = [f for f in files if all(v not in f for v in vocab_files)]
    348             zip_data = BytesIO()

AttributeError: 'NoneType' object has no attribute 'vocab_files_names'

@pg020196

@ksachdeva11 Sorry, unfortunately I can't help you with this issue. Did you make sure that the model was fully and correctly loaded, meaning that you can make predictions? If not, I would recommend checking out the examples; they are really useful. Otherwise, the only thing I can point out is that I tried all of the above steps with Python 3.9, torch 1.11.0 and flair 0.11. Maybe there are some dependency issues with other versions. Additionally, you could try another model and see if the issue still exists to narrow down the options.
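
For reference, a quick way to check that a loaded tagger actually makes predictions (the example sentence is arbitrary):

from flair.data import Sentence

sentence = Sentence("George Washington went to Washington.")
tagger.predict(sentence)  # tagger is the SequenceTagger loaded above
print(sentence)           # prints the sentence together with the predicted NER spans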

@ksachdeva11

@pg020196 updating the flair version to 0.11 helped. Thanks a lot!

@skwskwskwskw

Hi @wassimsuleiman,

I recently faced a similar problem on a machine without internet access and found the following workaround:

  1. Find a machine that has unrestricted internet access and download the model using the following code. You can find this command on the Hugging Face hub website by clicking the button </> Use in flair:
from flair.models import SequenceTagger 
tagger = SequenceTagger.load("flair/ner-english-ontonotes-large")
  2. Next, find the directory in which the flair models are stored on your machine. You can do so by using the following code. The default is ~/.flair:
import flair
print(flair.cache_root)
  3. Navigate to that folder and copy the files to the same directory on the machine without internet access.
  4. In my case, additional cached files (tokenizer, sentencepiece model, ...) from the underlying transformer model were required to finally load the SequenceTagger, so I copied them as well. The files were located at ~/.cache/huggingface/ on my computer with internet access.
  5. On the machine without internet access, set the following environment variables before trying to load the model:
import os

os.environ['TRANSFORMERS_OFFLINE'] = '1'
os.environ['HF_DATASETS_OFFLINE'] = '1'
  6. Load the model using the code from step 1.

Note: If you simply want to store the model for later offline usage on the same computer, you can skip steps 2 to 4.

Maybe somebody can contribute a more elegant solution, but for now this at least works.

It's still not working; it keeps trying to connect even though I have all the necessary files on the PC.
