WISH: Support also SnowParam(type = "PSOCK") #231
Comments
Buried in the help page ?SnowParam is this note:
But naive testing suggests this is no longer the case (either because of changes in parallel or BiocParallel), so I have started a 'PSOCK' branch. Is there an easy way to generate the socket connection error?
I missed that note. I don't think I've ever seen argument Looking at snow, it looks like
'/path/to/lib/R/bin/Rscript' --default-packages=datasets,utils,grDevices,graphics,stats,methods -e 'parallel:::.workRSOCK()' MASTER=localhost PORT=11312 OUT=/dev/null TIMEOUT=2592000 XDR=FALSE SETUPTIMEOUT=120 SETUPSTRATEGY=sequential
Excellent.
I don't think so. It's a race condition that appears when many R processes try to create a cluster using the same port. Given that the default is to pick a random port from 11000:11999, it only happens once in a while, but if you check enough things in parallel you end up hitting it often enough for it to add friction. Before R 4.0.0, I did see it happen once in a while to the future package on the CRAN servers, because I do tons of testing there. It disappeared at the next round of checks. BTW, I'm not sure, but I think the race condition could also happen when launching parallel workers in one
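To get a feel for how often such a clash could occur, here is a rough back-of-the-envelope sketch. It assumes that k R processes each draw a port independently and uniformly from the 1000 ports in 11000:11999, and that all of them set up their clusters at the same time; real clashes also depend on timing, so treat the number as an upper-bound style estimate. The function name `clash_prob` is made up for this illustration.

```r
# Hypothetical helper (not part of parallel or snow): probability that at
# least two of 'k' processes draw the same port out of 'n_ports' ports,
# assuming independent, uniform draws (the classic birthday problem).
clash_prob <- function(k, n_ports = 1000L) {
  # P(all k ports distinct) = n/n * (n-1)/n * ... * (n-k+1)/n
  p_distinct <- prod((n_ports - seq_len(k) + 1) / n_ports)
  1 - p_distinct
}

# Clash probability for a few levels of concurrency
probs <- vapply(c(2, 10, 40), clash_prob, numeric(1))
print(probs)
```

Under these assumptions the probability grows roughly quadratically with the number of concurrent processes, which matches the observation that the error is rare for a single check but shows up regularly when many checks run at once.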
Yes, parallel's implementation doesn't allow customization of the worker startup script, whereas snow's (and therefore those for SOCK, MPI, FORK) can be customized (and are, by BiocParallel). Looking a little more deeply makes it seem likely that BiocParallel's
versus
You can probably use FWIW, I've made some of these things easier and more robust in
Background
SnowParam() supports type = "SOCK" (default), type = "MPI", and type = "FORK". The former two stem from the days of the snow package and the latter was introduced with the parallel package. The type argument is passed to parallel::makeCluster() as-is.
Wish
Please add support also for type = "PSOCK", which is the default for parallel::makeCluster() [since day one, back in R 2.14.0 (2011), I think]. It looks like it would be quite straightforward to do this.
Why add this? Because PSOCK clusters have undergone lots of improvements since snow was incorporated into parallel. For example, in R (>= 4.0.0), the nodes ("workers") of a PSOCK cluster are set up in parallel, instead of sequentially. This makes the setup much faster, e.g.
Source: https://www.jottr.org/2021/06/10/parallelly-1.26.0/
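For readers who have not used PSOCK clusters directly, the following is a minimal sketch of what type = "PSOCK" does at the parallel level. It only uses functions from the parallel package; the helper name `square_on_psock` and the choice of two workers are assumptions made for this illustration, and none of this is BiocParallel code.

```r
library(parallel)

# Hypothetical helper, for illustration only (not part of BiocParallel):
# start a PSOCK cluster, square each element of 'x' on the workers,
# and always shut the cluster down again.
square_on_psock <- function(x, n_workers = 2L) {
  cl <- makeCluster(n_workers, type = "PSOCK")
  on.exit(stopCluster(cl), add = TRUE)
  unlist(parLapply(cl, x, function(v) v^2))
}

squares <- square_on_psock(1:4)
print(squares)
```

Timing the makeCluster() call with system.time() for a larger number of workers is one way to compare setup cost across R versions.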
In addition, this parallel setup strategy avoids port clashes that we saw in parallel (< 4.0.0), and still in snow (since it's deprecated and not improved on), e.g.
FYI, I haven't seen those types of errors since R (< 4.0.0), except when revdep checking packages that rely on snow. Most recently, I saw them while revdep checking the Bioconductor package DMCFB, which uses SnowParam in its package tests.