-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Core: Limit ParallelIterable memory consumption by yielding in tasks #10787
Conversation
…(backport apache#10691) ParallelIterable schedules 2 * WORKER_THREAD_POOL_SIZE tasks for processing input iterables. This defaults to 2 * # CPU cores. When one or some of the input iterables are considerable in size and the ParallelIterable consumer is not quick enough, this could result in unbounded allocation inside `ParallelIterator.queue`. This commit bounds the queue. When queue is full, the tasks yield and get removed from the executor. They are resumed when consumer catches up. (cherry picked from commit 7831a8d)
I think first we need to conclude whether are we having 1.6.1 release just for this fix? (mailing list discussion can be a good place for this) |
Good question, @ajantha-bhat . |
1.7.0 may take another quarter. I think 1.6.1 is fine. But just need to notify in the mailing list and then raise a PR for backport. |
Hey there, yes I already created a 1.6.1 milestone. Of course, all official communication will go over the mailing list, and that's the place to see if there are more fixes that people want to backport. It would be good to start a |
Since this fix exists on main & just after 1.6.0 release, i believe merging it into 1.6.x is non-controvercial. In the worst case it will just sit there. |
I can start a mail thread for a discussion. What exactly we want to be talking about? I think we need to let people know about 1.6.1 being planned, so that any other fixes that are urgent can be likewise backported. Anything else? |
That's exactly the goal of the |
I will merge this PR, there is no controversy about the change itself. Thank you all! |
(backport #10691)
ParallelIterable schedules 2 * WORKER_THREAD_POOL_SIZE tasks for processing input iterables. This defaults to 2 * # CPU cores. When one or some of the input iterables are considerable in size and the ParallelIterable consumer is not quick enough, this could result in unbounded allocation inside
ParallelIterator.queue
. This commit bounds the queue. When queue is full, the tasks yield and get removed from the executor. They are resumed when consumer catches up.(cherry picked from commit 7831a8d)