Hangs with PyTorch data loaders when `num_workers > 0` #34

ntoxeg · 2024-03-22T15:07:59Z

OS: Ubuntu 22.04
Python version: 3.11.8
PyTorch version: 2.2.1
Tokenmonster package version: 1.1.12
Other libraries: lightning==2.2.1, datasets==2.18.0

As the title says: I load the tokenizer with `load_multiprocess_safe`, and the dataset is just a set of plain text files to load and tokenize. I have tested each stage of loading and there are no problems until I wrap the dataset in a `DataLoader` with `num_workers > 0`; at that point it hangs forever.
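
A minimal sketch of the setup described above may help with reproduction. It is an illustration under assumptions, not the exact code from the affected project: `TextFileDataset`, `identity_collate`, `main`, and the `vocab_name` and `data_dir` parameters are hypothetical names, and the sketch assumes the tokenizer object exposes a `tokenize(text)` method that returns token IDs.

```python
from pathlib import Path


class TextFileDataset:
    """Map-style dataset: one plain-text file per item, tokenized on access."""

    def __init__(self, paths, tokenizer):
        self.paths = [Path(p) for p in paths]
        # The tokenizer is created once in the parent process and then
        # reused by whichever process calls __getitem__.
        self.tokenizer = tokenizer

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        text = self.paths[idx].read_text(encoding="utf-8")
        # With num_workers > 0 this call runs inside a worker process.
        return [int(t) for t in self.tokenizer.tokenize(text)]


def identity_collate(batch):
    """Return the batch unchanged, since token lists differ in length."""
    return batch


def main(vocab_name, data_dir):
    # Imported here so the dataset class stays importable without them.
    import tokenmonster
    from torch.utils.data import DataLoader

    # vocab_name is a placeholder for whichever vocabulary is in use.
    vocab = tokenmonster.load_multiprocess_safe(vocab_name)
    dataset = TextFileDataset(sorted(Path(data_dir).glob("*.txt")), vocab)

    # Single-process loading: the case reported to work.
    for batch in DataLoader(dataset, batch_size=1, collate_fn=identity_collate):
        print("num_workers=0:", len(batch[0]), "tokens")

    # Multi-process loading: the case reported to hang.
    loader = DataLoader(
        dataset, batch_size=1, num_workers=2, collate_fn=identity_collate
    )
    for batch in loader:
        print("num_workers=2:", len(batch[0]), "tokens")
```

Calling `main` with a vocabulary name and a directory of `.txt` files would exercise both paths; in the situation described here, the second loop is the one expected never to yield a batch.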
