usage of sockets by MulticoreParam #85

Open
lawremi opened this issue Oct 24, 2018 · 11 comments

@lawremi

lawremi commented Oct 24, 2018

It appears that on our cluster the port usage is too restrictive for the socket-based multicore backend. I think base R's multicore support uses raw connections or some sort of shared memory to communicate between the R process and its children, but I could be wrong. Is there any way to get back to a pure multicore implementation of BiocParallel? Why is the current one using sockets?

@HenrikBengtsson
Contributor

If it's of any help: as a quick workaround, you can use a 'multicore' future backend, which uses parallel::mcparallel()/mccollect()-style forked processing without sockets. You can achieve this using:

library(BiocParallel)
register(DoparParam())
library(doFuture)
registerDoFuture()
plan(multicore)  ## this is where you control the actual backend

mu <- 1.0
sigma <- 2.0
x <- bplapply(1:3, mu = mu, sigma = sigma, function(i, mu, sigma) {
  rnorm(i, mean = mu, sd = sigma)
})

Or, you can use the more direct BiocParallel.FutureParam (only on GitHub):

## remotes::install_github("HenrikBengtsson/BiocParallel.FutureParam")
library("BiocParallel.FutureParam")
register(FutureParam())
plan(multicore)  ## this is where you control the actual backend

mu <- 1.0
sigma <- 2.0
x <- bplapply(1:3, mu = mu, sigma = sigma, function(i, mu, sigma) {
  rnorm(i, mean = mu, sd = sigma)
})

@mtmorgan
Collaborator

I think base R creates a 'write once' pipe shared with the forked process and written to by the forked process before it exits. We'd like bi-directional communication persisting across multiple exchanges between the forked process and manager. The socket solution was adopted because it could be implemented in R, shared across several back-ends, and re-use existing code. But I agree that sockets cause problems, and I'd be up for exploring either pipe-based or shared memory solutions...this would take me a little time to work through.
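The one-shot pattern described above can be sketched with the `parallel` package's fork helpers. This is only an illustration under the assumption of a Unix-alike system (forking is not available on Windows); it says nothing about BiocParallel's internals, and the expression evaluated in the child is made up for the example.

```r
library(parallel)  # fork-based helpers; Unix-alikes only

# Fork a child process that evaluates the expression. The value is
# sent back to the parent once, when the child finishes.
job <- mcparallel({
  x <- rnorm(5, mean = 1, sd = 2)
  length(x)
})

# Collect the result of the finished child. mccollect() returns a list
# with one element per job.
res <- mccollect(job)
str(res)
```

There is no further exchange with the child after the result is collected, which is the limitation that motivates persistent, bi-directional workers.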

@lawremi
Author

lawremi commented Oct 25, 2018

I guess BiocParallel is a lot more complicated than it used to be. It's too bad there isn't some minimal abstraction that could be backed by mcmapply().
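For illustration, a minimal wrapper of the kind mentioned here might look like the sketch below. The name `simple_mclapply` and its `workers` argument are hypothetical and not part of BiocParallel or `parallel`; the sketch assumes a Unix-alike system and gives up persistent workers, logging, and error handling.

```r
library(parallel)  # provides mcmapply(); forking works on Unix-alikes only

# Hypothetical helper: an lapply()-like interface backed by mcmapply().
# Extra arguments in `...` are passed unchanged to every call of FUN.
simple_mclapply <- function(X, FUN, ..., workers = 2L) {
  mcmapply(FUN, X,
           MoreArgs = list(...),
           SIMPLIFY = FALSE,   # always return a list, like lapply()
           mc.cores = workers)
}

a <- list(A = 1:10, B = 2:200)
res <- simple_mclapply(a, mean)
str(res)
```

Each call forks fresh children and collects their results once, so no sockets or ports are involved.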

@mtmorgan
Collaborator

It's always been implemented to support persistent workers on all back-ends.

@MPIIB-Department-TFMeyer

We have also been having problems with the multicore backend in BiocParallel for the past few weeks. bplapply worked fine before but stopped working in both R-3.4.3 (v1.12.0) and R-3.5.1 (v1.16.0). This seems to be related to socket use, since a simple

library(BiocParallel)
a <- list(A = 1:10, B = 2:200)
# BPPARAM follows `...` in bplapply()'s signature, so it must be named
bplapply(a, mean, BPPARAM = MulticoreParam(workers = 2))

hangs, and GDB reveals that it is blocking in sock_open(). Setting workers = 1 works fine.

I would try to reconfigure our server (Ubuntu 16.04, Dell 48-core) if only I knew what to change. Is there any documentation on how sockets should be configured to make BiocParallel work with the multicore backend?

Thanks!

@mtmorgan
Collaborator

mtmorgan commented Nov 26, 2018

There are two additional parameters, manager.hostname and manager.port, for which you could try to find valid values: the hostname of the computer that you're running on, and an open port.

I am working on an alternative that uses local sockets, which do not require open ports.

@MPIIB-Department-TFMeyer

Thanks, that solved it. For some strange reason, traffic from my machine to localhost was routed through an external firewall, while traffic to 127.0.0.1 or to the machine name was not. Setting manager.hostname to either of the latter two worked.
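For anyone hitting the same hang, the workaround can be sketched as follows. The address "127.0.0.1" is the value that worked in this particular setup and may differ on other machines; manager.port is left at its default here, and any specific port passed instead would have to be one that is known to be open.

```r
library(BiocParallel)

# Point the workers at the loopback address explicitly, rather than
# relying on how the default hostname resolves on this machine.
param <- MulticoreParam(workers = 2, manager.hostname = "127.0.0.1")

a <- list(A = 1:10, B = 2:200)
res <- bplapply(a, mean, BPPARAM = param)
str(res)
```

If this still blocks, the hostname or port is probably not reachable from the forked workers, and the firewall or routing configuration is the next thing to check.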

@mtmorgan
Collaborator

The LocalParam branch implements the LocalParam() back-end to use 'local' (disk-based) sockets that do not require a port. This is intended to be a drop-in replacement for MulticoreParam.

BiocManager::install("Bioconductor/BiocParallel", ref = "LocalParam")

I'm aware of a speed regression in some circumstances, but would welcome any other comments.

@lawremi
Author

lawremi commented Jan 25, 2019

Are file sockets really implemented on disk across the board? It would be great for it to be an in-memory stream.

@mtmorgan
Collaborator

mtmorgan commented Jan 26, 2019

Across which board? These are so-called 'local' sockets, and are file-based in the abstract Unix sense; my understanding is that the file system is used as a namespace rather than as the communication medium.

@lawremi
Author

lawremi commented Jan 26, 2019

You had said the sockets were "disk-based", which made me a little worried, but it sounds like they are just file-based, so this is great news.


4 participants