I'm playing around with spacyr as a potential replacement for cleanNLP. I've never really done anything directly with spaCy (i.e., in Python).
I'm not seeing a big performance difference between `multithread = TRUE` (43 sec on a test dataset of 2k journal abstracts) and `multithread = FALSE` (53 sec). Is there some additional configuration I need to do to take advantage of multithreading?
I've also had this problem. As far as I can tell, multithreading in spacyr_1.2.1 doesn't work: `top` does not show any additional processes being spawned, whether the setting is `TRUE` or `FALSE`.
I did succeed in building a parallelized workaround by setting `multithread = FALSE` and adding a doParallel/foreach framework on top: https://github.com/SeanFobbe/R-fobbe-proto-package/blob/main/f.dopar.spacyparse.R The same approach with a future frontend/backend fails because of non-exportable objects; I'm not sure why this doesn't affect the doParallel approach.
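The workaround above can be sketched roughly as follows. This is a minimal illustration, not the code in the linked `f.dopar.spacyparse.R`. It makes several assumptions. First, the input is a *named* character vector: `spacy_parse()` uses the names as `doc_id`, and unnamed chunks would probably each restart their default ids. Second, spaCy and the chosen model are already installed where spacyr can find them. Third, the helpers `chunk_texts()` and `parse_parallel()` and the default model name are hypothetical choices for illustration. They are not part of spacyr. Each worker is a separate R process with its own spaCy session, so memory use grows with the number of workers.

```r
# Sketch: chunk-level parallelism around spacy_parse(multithread = FALSE).
# chunk_texts() and parse_parallel() are hypothetical helpers, not spacyr API.

# Split a named character vector into at most n_chunks contiguous chunks.
# Names are kept, since spacy_parse() uses them as doc_id values.
chunk_texts <- function(texts, n_chunks) {
  n_chunks <- max(1L, min(as.integer(n_chunks), length(texts)))
  group <- ceiling(seq_along(texts) * n_chunks / length(texts))
  split(texts, group)
}

# Parse each chunk in its own worker process; each worker starts and stops
# its own spaCy session, so each holds its own copy of the model in memory.
parse_parallel <- function(texts,
                           n_workers = max(1L, parallel::detectCores() - 1L, na.rm = TRUE),
                           model = "en_core_web_sm") {
  # Unique names are needed so doc_id values don't collide across chunks.
  stopifnot(!is.null(names(texts)), !anyDuplicated(names(texts)))
  `%dopar%` <- foreach::`%dopar%`

  cl <- parallel::makeCluster(n_workers)
  doParallel::registerDoParallel(cl)
  on.exit({
    parallel::stopCluster(cl)
    foreach::registerDoSEQ()  # don't leave a stopped cluster registered
  }, add = TRUE)

  # foreach returns results in chunk order by default, so rbind keeps
  # documents in their original order.
  foreach::foreach(chunk = chunk_texts(texts, n_workers),
                   .combine = rbind, .packages = "spacyr") %dopar% {
    spacyr::spacy_initialize(model = model)
    parsed <- spacyr::spacy_parse(chunk, multithread = FALSE)
    spacyr::spacy_finalize()
    parsed
  }
}
```

For the German corpus linked below, a call might look like `parse_parallel(texts, n_workers = 4, model = "de_core_news_sm")`, assuming that model is installed. Contiguous chunks are used so that each worker gets one large batch rather than many small tasks.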
My setup is Fedora 34 running on an AMD Ryzen 7 3700X, using spacyr_1.2.1. I'm happy to supply smaller and larger corpora to test this, but I believe this is a spacyr issue, not a data issue. A good test corpus (not too large) might be this one (it is in German, though): https://doi.org/10.5281/zenodo.3902658
I'm fairly certain this is related to #202, as `multithread = TRUE` drastically increases RAM usage without a detectable speed boost.