High memory usage when dispatching many targets #1220
For the first approach, each …
Just chanced across this, seems timely! Would like to confirm - this memory usage is at the dispatcher? i.e. the messages have been received by this process and are sitting in a buffer somewhere, building up because they haven't been processed yet.
I'm asking because, if you keep a backlog at the controller, you are still making a copy of the data, just not sending it to the dispatcher yet. The dispatcher should just manage memory through R garbage collection (all external pointers have finalizers attached), so usage may seem high but should not cause problems. So perhaps the solution should indeed be …
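As an illustrative aside (this is not crew's or mirai's actual code), the mechanism described above can be sketched with a finalizer on an environment; like an external pointer, its memory becomes reclaimable once the garbage collector runs after the last reference is dropped:

```r
# Minimal sketch of finalizers plus garbage collection (illustrative only).
e <- new.env()
e$payload <- raw(1e7)  # about 10 MB held alive by this environment

# The finalizer runs when the object is garbage collected (or at exit).
reg.finalizer(e, function(x) message("payload released"), onexit = TRUE)

rm(e)  # drop the last reference
gc()   # collector runs, the finalizer fires, and the memory is reclaimable
```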
#1221 seems to have reduced memory consumption almost by a factor of 10 in my case.
I remember thinking so yesterday after examining …
I thought about doing it that way, but your point is a good one and I realized it early on. Which is why development …
A couple more thoughts on this: …
Hmm...

```r
# _targets.R file:
library(targets)
mirai::serialization(list(qs::qserialize, qs::qdeserialize))
tar_option_set(controller = crew::crew_controller_local())
list(tar_target(x, 1))
```

I think the reason is that … Is there a way to register refhooks without submitting any tasks? If there is, and if I could manually supply the refhooks separately for …
Right, I think something like that would work for targets as it has the DAG and so can ensure the correct data is created just as it's needed. In the more general case there's no guarantee that objects won't be modified by any subsequent evaluation. I'll have to take a look at strategies to reduce memory usage at some point. Not a straightforward one.
Works for persistent workers, can't think of a good way for auto-scaling. You'd seemingly need some sort of handshake when new instances connect, which is not ideal.
May be an idea if I enable it for environments. Custom serialization is limited to handling external pointer objects at the moment to keep the implementation simple.
Closing in favor of shikokuchuo/mirai#97.
I am running a pipeline which dispatched a couple thousand targets with "worker" storage/retrieval, and memory usage for the `crew` dispatcher is still in the GB range. As of the last couple of versions, `targets` dispatches all the targets it can in the moment, regardless of the saturation level of the `crew` controller. I think we might want to go back to withholding tasks from saturated controllers, but in a more efficient way than before.

And maybe there could be an option to compress the data with `qs::qserialize()` (if the user has `qs` installed). `qserialize()` with default settings looks to be slightly more compact than a custom fit-for-purpose serialization method might be. Of course there is a speed penalty, but that might be offset if the data to ship to workers is lighter.
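For a rough sense of the size difference being described (the example object is made up and `qs` is assumed to be installed), one way to compare is to measure the raw vectors produced by base `serialize()` and `qs::qserialize()` with default settings:

```r
x <- data.frame(a = rnorm(1e5), b = sample(letters, 1e5, replace = TRUE))

base_bytes <- length(serialize(x, connection = NULL))  # uncompressed base serialization
qs_bytes   <- length(qs::qserialize(x))                # default qs preset compresses

c(base = base_bytes, qs = qs_bytes)
identical(qs::qdeserialize(qs::qserialize(x)), x)  # the round trip should preserve the object
```

The speed penalty mentioned above comes on top of this, so whether it pays off depends on how large the shipped objects are relative to the time spent compressing them.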