Hi, I'm getting an error when running nlp.pipe with n_process > 1. I think the pickling that multiprocessing does under the hood interacts poorly with nmslib.dist.FloatIndex, which the rxnorm entity linker requires and which does not seem to be picklable.
Minimal code:
import spacy
import scispacy
from scispacy.linking import EntityLinker

TEXTS = ["Hello! This is document 1.", "And here's doc 2."]

if __name__ == '__main__':
    nlp = spacy.load("en_core_sci_sm")
    nlp.add_pipe("scispacy_linker", config={"resolve_abbreviations": True,
                                            "linker_name": "rxnorm"})
    for doc in nlp.pipe(TEXTS, n_process=2):
        print(doc)
Running with Python 3.8.5 gives me:
Traceback (most recent call last):
  File "./mwerror.py", line 13, in <module>
    for doc in nlp.pipe(TEXTS, n_process=2):
  File ".../python3.8/site-packages/spacy/language.py", line 1479, in pipe
    for doc in docs:
  File ".../python3.8/site-packages/spacy/language.py", line 1515, in _multiprocessing_pipe
    proc.start()
  File ".../python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File ".../python3.8/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File ".../python3.8/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File ".../python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File ".../python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File ".../python3.8/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File ".../python3.8/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle 'nmslib.dist.FloatIndex' object
Note I don't get an error with n_process=1, presumably because multiprocessing is not invoked.
I also do not get this error if I don't include the linker pipe (i.e. comment out the add_pipe() line above).
Thanks! This lib is great!
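The failure mode in the traceback can be reproduced without spaCy or nmslib. The sketch below is only an illustration under assumptions: FakeLinker and can_pickle are hypothetical names, and a threading.Lock stands in for the native index, since it is another object that pickle refuses to serialize. It shows that any object holding such a handle fails at pickle time, which is the step reduction.dump performs when a worker process is started.

```python
import pickle
import threading


class FakeLinker:
    """Hypothetical stand-in for a pipe that owns a native, unpicklable handle."""

    def __init__(self):
        # A lock is used here only because, like the index in the traceback,
        # it has no pickle support. It is not related to nmslib itself.
        self.index = threading.Lock()


def can_pickle(obj):
    """Return True if pickle.dumps succeeds on obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


print(can_pickle({"text": "plain data"}))  # plain containers pickle fine
print(can_pickle(FakeLinker()))            # the wrapped handle does not
```

With n_process=1 nothing has to be serialized, which would be consistent with the error only appearing for n_process=2.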
Interesting, not sure off the top of my head. Leaving this open for now, let me know if you happen to resolve anything. At a minimum, you could do the parallelization yourself, but ideally it would work with spacy's parallelization.
I actually initially tried doing the parallelization myself with joblib, calling nlp() inside the parallelized code, and it gave me the same error as the spacy nlp.pipe snippet I posted.
Will let you know if I come across anything, but it seems to work fine on linux fwiw.
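The Linux observation fits how multiprocessing start methods differ. The traceback goes through popen_spawn_posix.py, so the failing run was using the spawn method, which pickles the process object before launching the child. Since Python 3.8 spawn is the default on macOS, while Linux has traditionally defaulted to fork, where the child inherits the parent's memory and the pipeline is not pickled at start-up. A quick way to check what a given machine uses (describe_start_method is a hypothetical helper name):

```python
import multiprocessing as mp
import sys


def describe_start_method():
    """Return the active start method and whether children inherit parent memory."""
    method = mp.get_start_method()
    # Only "fork" copies the parent's memory into the child; "spawn" and
    # "forkserver" pickle the process object and send it to the child.
    inherits_parent_memory = method == "fork"
    return method, inherits_parent_memory


method, inherits = describe_start_method()
print(f"platform={sys.platform} start_method={method} "
      f"inherits_parent_memory={inherits}")
```

If this reports spawn or forkserver, anything handed to the worker processes has to be picklable. Forcing fork with multiprocessing.set_start_method is possible on POSIX systems, but the Python docs describe fork as potentially unsafe on macOS, so that is probably a last resort rather than a fix.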