Add bpexport functionality #24

Open
mtmorgan opened this issue Nov 6, 2013 · 9 comments

@mtmorgan
Collaborator

mtmorgan commented Nov 6, 2013

Add a bpexport function that makes local variables available to remote computations. This request comes from the mailing list.

@ghost ghost assigned mtmorgan Nov 6, 2013
@DarwinAwardWinner

I'm taking a stab at this here: https://github.com/DarwinAwardWinner/BiocParallel/tree/bpexport

So far I've added stubs for all the params, and I've added a clusterExport-based implementation for SnowParam. But thinking about it, that will only work if the cluster is running when clusterExport is called, so even that is not fully implemented.
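For readers unfamiliar with clusterExport, here is a minimal sketch of what a clusterExport-based export looks like with the base parallel package. The helper name export_to_cluster is hypothetical and is not taken from the branch linked above; note that the cluster has to exist before the export can happen, which is the limitation described in this comment.

```r
library(parallel)

# Hypothetical helper (not BiocParallel code): copy the named variables
# from `envir` into the global environment of every worker in `cl`.
export_to_cluster <- function(cl, vars, envir = parent.frame()) {
  force(envir)
  clusterExport(cl, varlist = vars, envir = envir)
  invisible(cl)
}

cl <- makeCluster(2)              # the cluster must already be running
offset <- 10
export_to_cluster(cl, "offset")   # workers now have their own copy of `offset`
res <- parLapply(cl, 1:3, function(i) i + offset)
stopCluster(cl)
```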

@DarwinAwardWinner

One issue to consider is, what if we call bpexport on a SerialParam or a MulticoreParam? They already have access to all the parent's variables, including any changes to those variables' values that occur after the call to bpexport. Should we make an attempt to have these params match the behavior of e.g. SnowParam by storing a snapshot of the variables when bpexport is called and then using that snapshot in place of the current value when the param is used?

Also, what should happen when you call bpexport on a stopped cluster? What should happen when you stop a cluster after exporting a variable?
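The difference between "live" and "snapshot" semantics raised in this comment can be seen without any parallel backend. The sketch below uses only base R; the variable names are illustrative assumptions. A closure evaluated in the parent process sees the current value of x, whereas a snapshot taken at export time keeps the old one.

```r
x <- 1
f <- function(i) i + x

# Snapshot of the value at "export" time, as a named list.
snapshot <- mget("x", envir = environment())

x <- 100                   # the variable changes after the export

live <- sapply(1:2, f)     # serial/forked evaluation sees the new value of x
frozen_x <- snapshot$x     # the snapshot still holds the exported value
```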

@mllg
Collaborator

mllg commented Nov 8, 2013

I'd like to suggest creating a simple class/list storing objects exported via bpexport. As soon as bplapply/bpmapply is called the objects can then be put into the function's environment. Something like

exported = list(x = 12, y = rnorm(10))
mapply(assign, x = names(exported), value = exported, MoreArgs = list(envir = environment(FUN)))

You would just have to check whether environmentName(environment(FUN)) is "R_GlobalEnv", and in that case give the function a new environment with the GlobalEnv as parent instead of assigning into the GlobalEnv itself.
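A minimal sketch of this suggestion, assuming the exported objects are kept in a plain named list. The helper name inject_exports is hypothetical. When FUN lives in the global environment, the helper gives it a fresh child environment so that the user's workspace is not modified.

```r
# Hypothetical helper: place exported objects where FUN will find them.
inject_exports <- function(FUN, exported) {
  env <- environment(FUN)
  if (identical(env, globalenv())) {
    # Do not write into the global environment; use a new child of it.
    env <- new.env(parent = globalenv())
    environment(FUN) <- env
  }
  for (nm in names(exported)) {
    assign(nm, exported[[nm]], envir = env)
  }
  FUN
}

exported <- list(x = 12, y = c(1, 2, 3))
FUN <- function(i) i * x + sum(y)
environment(FUN) <- globalenv()   # mimic a function defined at top level
g <- inject_exports(FUN, exported)
out <- g(2)                       # 2 * 12 + (1 + 2 + 3)
```

One caveat of this sketch: when FUN does not live in the global environment, the assignments modify FUN's existing environment in place.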

@DarwinAwardWinner

I think it's probably a good idea to always give the function a new environment with the exported values and with the function's previous environment as parent. Are you suggesting this for the SerialParam and MulticoreParam classes?
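A sketch of the "always wrap" variant, again with a hypothetical helper name (wrap_with_exports). The exported values go into a new environment whose parent is the function's previous environment, so the original environment is left untouched.

```r
# Hypothetical helper: wrap FUN's environment instead of modifying it.
wrap_with_exports <- function(FUN, exported) {
  environment(FUN) <- list2env(exported, parent = environment(FUN))
  FUN
}

make_adder <- function() {
  base <- 100
  function(i) base + i + extra   # `extra` is not defined anywhere yet
}

f <- make_adder()
g <- wrap_with_exports(f, list(extra = 5))
res <- g(1)                      # base (100) + i (1) + extra (5)

# The original closure environment did not gain an `extra` binding.
leaked <- exists("extra", envir = environment(f), inherits = FALSE)
```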

@mllg
Collaborator

mllg commented Nov 8, 2013

Yes, Serial and Multicore. For BatchJobs, I also see no drawbacks compared with its internal export mechanism. I don't know if this is applicable to DoPar. You could pass the objects to .export in foreach, but I was unable to find a way to turn the heuristic auto-export off.

One more thing to consider is the expected behavior if a variable is explicitly exported and also defined in the function's environment. Variables in the function's env take precedence in the lookup, which deviates from the lookup when using parallel/clusterExport (which assigns to the GlobalEnv on the slaves).
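The precedence question can be illustrated in a single process. In the sketch below, worker_global is a stand-in (an assumption, not a real worker) for the global environment that clusterExport assigns into. An export placed there sits behind the closure's own binding, whereas an export placed in a wrapping child environment sits in front of it.

```r
# Stand-in for a worker's global environment holding the exported value.
worker_global <- new.env()
assign("x", "exported", envir = worker_global)

# The function's own environment also defines x.
closure_env <- new.env(parent = worker_global)
assign("x", "closure", envir = closure_env)

f <- function() x
environment(f) <- closure_env
global_style <- f()     # lookup hits closure_env first

# Wrapping approach: the export shadows the closure's binding.
g <- f
environment(g) <- list2env(list(x = "exported"), parent = closure_env)
wrapped_style <- g()
```

So the two strategies disagree exactly when a name is both exported and defined in the function's environment.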

@DarwinAwardWinner

Well, I think the goal would be in all cases to keep the behavior consistent across all param classes. So to answer what happens when you export a variable and the same variable is defined in the function's environment, we ask what happens naturally in the case of SnowParam where you use clusterExport to implement bpexport, and then make sure we do the same thing for the other params, right? I actually don't know how (or if) function environments get transferred between processes by snow and others.
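On the question of whether function environments travel between processes: base R's serialize() copies a closure's enclosing environment along with the function, but stores the global environment only as a reference. Assuming the backend ships functions via serialize(), which is an assumption here and not something verified against snow's source, the round trip below mimics what a worker would receive.

```r
make_fun <- function() {
  x <- 42
  function() x
}
f <- make_fun()

# Round trip through serialization, as when sending FUN to a worker.
f_copy <- unserialize(serialize(f, NULL))
copied_value <- f_copy()   # x travelled with the function
same_env <- identical(environment(f_copy), environment(f))   # a copy, not the original

# A function whose environment is the global environment carries no variables:
# after the round trip it points at whatever global environment is current.
h <- function() x
environment(h) <- globalenv()
h_copy <- unserialize(serialize(h, NULL))
global_by_ref <- identical(environment(h_copy), globalenv())
```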

@DarwinAwardWinner

Actually, to be honest, I'm probably not the best person to implement this, because the vast majority of the time I want to do parallel stuff in R, I use multicore, so I never have to worry about exporting variables and I have no real idea how to do it.

@DarwinAwardWinner

Thinking about it, we should probably take this same "just-in-time export" approach for SnowParam as well. This would solve the problem of the cluster not running when bpexport is called.
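A rough sketch of what "just-in-time export" could look like, using the base parallel package. The param object and the functions new_param, bpexport_sketch, and lapply_sketch are all hypothetical stand-ins, not BiocParallel's classes or API. Exports are recorded as values on the param and pushed to the workers only when the apply call starts the cluster.

```r
library(parallel)

# Hypothetical param object: an environment holding settings and exports.
new_param <- function(workers = 2) {
  p <- new.env()
  p$workers <- workers
  p$exports <- list()
  p
}

# Record a snapshot of the values; no cluster is needed at this point.
bpexport_sketch <- function(param, ...) {
  vals <- list(...)
  param$exports[names(vals)] <- vals
  invisible(param)
}

# Start the cluster, push the recorded exports, then run the computation.
lapply_sketch <- function(X, FUN, param) {
  cl <- makeCluster(param$workers)
  on.exit(stopCluster(cl))
  if (length(param$exports) > 0) {
    clusterExport(cl, names(param$exports), envir = list2env(param$exports))
  }
  parLapply(cl, X, FUN)
}

p <- new_param(2)
scale <- 3
bpexport_sketch(p, scale = scale)   # export while no cluster is running
scale <- 99                         # a later change does not affect the export

fun <- function(i) i * scale
environment(fun) <- globalenv()     # resolve `scale` in the worker's global env
res <- lapply_sketch(1:3, fun, p)
```

Because the values are stored on the param, stopping and restarting the cluster would not lose them in this design.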

@DarwinAwardWinner

Ok, I am finding myself using BatchJobsParam a lot and wanting export functionality, so I will try to work on this some time soon.
