rxnorm linker doesn't work with multiprocessing? #345

Open
kpich opened this issue Apr 5, 2021 · 3 comments
Labels
bug Something isn't working

Comments


kpich commented Apr 5, 2021

Hi, I'm getting an error when trying to run nlp.pipe with n_process > 1, I think because the pickling that multiprocessing does under the hood interacts poorly with nmslib.dist.FloatIndex, which the rxnorm entity linker requires and which doesn't seem to be picklable.

Minimal code:

import spacy
import scispacy
from scispacy.linking import EntityLinker  # importing this registers the "scispacy_linker" factory

TEXTS = ["Hello! This is document 1.", "And here's doc 2."]

if __name__ == '__main__':
  nlp = spacy.load("en_core_sci_sm")
  nlp.add_pipe("scispacy_linker", config={"resolve_abbreviations": True,
                                          "linker_name": "rxnorm"})
  for doc in nlp.pipe(TEXTS, n_process=2):
    print(doc)

Running with Python 3.8.5 gives me:

Traceback (most recent call last):
  File "./mwerror.py", line 13, in <module>
    for doc in nlp.pipe(TEXTS, n_process=2):
  File ".../python3.8/site-packages/spacy/language.py", line 1479, in pipe
    for doc in docs:
  File ".../python3.8/site-packages/spacy/language.py", line 1515, in _multiprocessing_pipe
    proc.start()
  File ".../python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File ".../python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File ".../python3.8/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File ".../python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File ".../python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File ".../python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File ".../python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle 'nmslib.dist.FloatIndex' object

Note that I don't get an error with n_process=1, presumably because multiprocessing isn't invoked.

I also do not get this error if I don't include the linker pipe (i.e. comment out the add_pipe() line above).

Thanks! This lib is great!


kpich commented Apr 6, 2021

Hey, it seems like it works as expected (i.e. doesn't crash) on Linux? The error above was from running on OSX 10.14.6.

(FYI, I suspect it might have something to do with multiprocessing using spawn rather than fork by default on OSX as of py3.8 [doc link], but IDK.)
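
If that's the cause, forcing the old start method before loading the pipeline might work around it. This is an untested sketch (and I know fork on macOS has its own caveats):

import multiprocessing

import spacy
from scispacy.linking import EntityLinker  # registers the "scispacy_linker" factory

if __name__ == '__main__':
  # Untested: revert to the pre-3.8 default so workers inherit the loaded
  # pipeline via fork() instead of pickling it.
  multiprocessing.set_start_method("fork", force=True)

  nlp = spacy.load("en_core_sci_sm")
  nlp.add_pipe("scispacy_linker", config={"resolve_abbreviations": True,
                                          "linker_name": "rxnorm"})
  for doc in nlp.pipe(["Hello! This is document 1.", "And here's doc 2."], n_process=2):
    print(doc)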

@dakinggg
Collaborator

Interesting, not sure off the top of my head. Leaving this open for now, let me know if you happen to resolve anything. At a minimum, you could do the parallelization yourself, but ideally it would work with spacy's parallelization.
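
Something like this might do as a stopgap; it's a rough, untested sketch that loads the model inside each worker so the nmslib index never has to cross a process boundary:

import multiprocessing as mp

import spacy
from scispacy.linking import EntityLinker  # registers the "scispacy_linker" factory

nlp = None

def init_worker():
  # Each worker builds its own pipeline instead of receiving a pickled copy.
  global nlp
  nlp = spacy.load("en_core_sci_sm")
  nlp.add_pipe("scispacy_linker", config={"resolve_abbreviations": True,
                                          "linker_name": "rxnorm"})

def process(text):
  # Return plain (entity text, kb_ents) tuples rather than Doc objects so
  # nothing unpicklable has to travel back to the parent process.
  doc = nlp(text)
  return [(ent.text, ent._.kb_ents) for ent in doc.ents]

if __name__ == '__main__':
  texts = ["Hello! This is document 1.", "And here's doc 2."]
  with mp.Pool(processes=2, initializer=init_worker) as pool:
    for ents in pool.map(process, texts):
      print(ents)

Each worker loads its own copy of the model, so it's heavier on memory, but it avoids pickling the linker entirely.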


kpich commented Apr 10, 2021

I actually initially tried doing the parallelization myself with joblib, calling nlp() inside the parallelized code, and it gave me the same error as the spacy nlp.pipe snippet I posted.
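
Roughly, the shape of what I tried was this (a minimal sketch, not the exact code):

import spacy
from joblib import Parallel, delayed
from scispacy.linking import EntityLinker  # registers the "scispacy_linker" factory

TEXTS = ["Hello! This is document 1.", "And here's doc 2."]

if __name__ == '__main__':
  nlp = spacy.load("en_core_sci_sm")
  nlp.add_pipe("scispacy_linker", config={"resolve_abbreviations": True,
                                          "linker_name": "rxnorm"})
  # joblib also has to ship nlp (and the nmslib index inside the linker) to
  # its worker processes, so it hits the same pickling error.
  docs = Parallel(n_jobs=2)(delayed(nlp)(text) for text in TEXTS)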

Will let you know if I come across anything, but it seems to work fine on Linux FWIW.

@dakinggg added the bug (Something isn't working) label on Sep 7, 2022