As the title says, I load the tokenizer with load_multiprocess_safe; the dataset is just a collection of plain-text files that are loaded and tokenized. I have tested each stage of loading and there are no problems until I wrap the dataset in a DataLoader with num_workers > 0, at which point it hangs forever.
OS: Ubuntu 22.04
Python version: 3.11.8
PyTorch version: 2.2.1
Tokenmonster package version: 1.1.12
Other libraries:
lightning==2.2.1, datasets==2.18.0
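For anyone trying to reproduce this, here is a minimal sketch of the setup as described, not the reporter's actual code. The class name TextFileDataset, the function name reproduce, the data/*.txt path, and the vocabulary name are all assumptions for illustration. Whether it hangs may depend on the environment listed above.

```python
import glob


class TextFileDataset:
    """Hypothetical map-style dataset: one plain-text file per item, tokenized on access."""

    def __init__(self, paths, vocab):
        self.paths = sorted(paths)
        self.vocab = vocab  # any object exposing tokenize(text)

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        with open(self.paths[idx], encoding="utf-8") as f:
            text = f.read()
        # Convert to plain Python ints so the result does not depend on the token dtype.
        return [int(t) for t in self.vocab.tokenize(text)]


def reproduce(num_workers=2):
    # Imported here so the dataset class can be exercised without these packages.
    import tokenmonster
    from torch.utils.data import DataLoader

    # Assumed vocabulary name and data directory; substitute your own.
    vocab = tokenmonster.load_multiprocess_safe("english-32000-balanced-v1")
    dataset = TextFileDataset(glob.glob("data/*.txt"), vocab)

    # batch_size=None disables collation, so variable-length token lists pass through.
    loader = DataLoader(dataset, batch_size=None, num_workers=num_workers)
    for i, tokens in enumerate(loader):
        print(i, len(tokens))
```

Calling reproduce(num_workers=0) and then reproduce(num_workers=2) from under an if __name__ == "__main__": guard would show whether the worker processes are what triggers the hang.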