Using multithreading #206

Closed
dhicks opened this issue May 25, 2021 · 2 comments

Comments

@dhicks

dhicks commented May 25, 2021

I'm playing around with spacyr as a potential replacement for cleanNLP. I've never really done anything directly with spaCy (i.e., in Python).

I'm not seeing a big performance difference between multithread = TRUE (43 sec on a test dataset of 2k journal abstracts) and multithread = FALSE (53 sec). Is there some additional configuration I need to do to take advantage of multithreading?
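For reference, here's essentially what I'm timing (a minimal sketch: `abstracts` stands in for my character vector of 2k abstracts, and the model name is just an example):

```r
library(spacyr)
spacy_initialize(model = "en_core_web_sm")  # model name is an example

# `abstracts` is a placeholder for the 2k-document character vector
system.time(spacy_parse(abstracts, multithread = TRUE))   # ~43 sec for me
system.time(spacy_parse(abstracts, multithread = FALSE))  # ~53 sec for me

spacy_finalize()
```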

@SeanFobbe

I've also had this problem. As far as I can tell, multithreading in spacyr_1.2.1 doesn't work: top does not show any additional processes being spawned, whether the setting is TRUE or FALSE.

I did succeed in building a parallelized workaround by setting multithread = FALSE and adding a doParallel/foreach framework on top: https://github.com/SeanFobbe/R-fobbe-proto-package/blob/main/f.dopar.spacyparse.R (sketched below). The same approach with a future frontend/backend fails because of non-exportable objects; I'm not sure why this doesn't affect the doParallel approach.
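In outline, the workaround looks like this (a simplified sketch, not the exact code from the linked repo; `texts`, the worker count, and the model name are illustrative):

```r
library(spacyr)
library(foreach)
library(doParallel)

n_workers <- 4                      # illustrative; match your core count
cl <- makeCluster(n_workers)
registerDoParallel(cl)

# `texts` is a placeholder for your character vector of documents
chunks <- split(texts, cut(seq_along(texts), n_workers, labels = FALSE))

parsed <- foreach(chunk = chunks,
                  .packages = "spacyr",
                  .combine = rbind) %dopar% {
  # the spaCy connection can't be exported to workers, so each worker
  # initializes (and tears down) its own instance
  spacy_initialize(model = "en_core_web_sm")  # model name is an example
  out <- spacy_parse(chunk, multithread = FALSE)
  spacy_finalize()
  out
}

stopCluster(cl)
```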

My setup is Fedora 34 on an AMD Ryzen 7 3700X, using spacyr_1.2.1. I'm happy to supply smaller and larger corpora for testing, but I believe this is a spacyr issue, not a data issue. A good testing corpus (not too large) might be this one (it's in German, though): https://doi.org/10.5281/zenodo.3902658

I'm fairly certain this is related to #202, as multithread = TRUE drastically increases RAM usage without a detectable speed boost.

@kbenoit
Collaborator

kbenoit commented Sep 1, 2022

We're aware and working on it... Moving to #185.

kbenoit closed this as completed Sep 1, 2022