-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Core: Fix ParallelIterable memory leak where queue continues to be populated even after iterator close #9402
Conversation
see #7844 for whole discuss |
9793b4a
to
5e40a9e
Compare
Any plans for when this will be released? Thank you |
5e40a9e
to
27f74d1
Compare
core/src/test/java/org/apache/iceberg/util/TestParallelIterable.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/util/ParallelIterable.java
Outdated
Show resolved
Hide resolved
Overall the change makes sense, just a few small comments to address |
27f74d1
to
52c19ae
Compare
…added even if iterator exited
52c19ae
to
bb2888d
Compare
@nastra @raunaqmorarka Thanks for review and attention about this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I verified that this fixes the underlying issue. I'll leave the PR open for a bit to give others a chance to review this as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Heltman for fixing this leak, I also agree with this solution since the population of the queue can't really be "interrupted". I'll go ahead and merge.
…pulated even after iterator close (apache#9402)
* Core: Fix ParallelIterable memory leak where queue continues to be populated even after iterator close (apache#9402) (cherry picked from commit d3cb1b6) * Core: Limit ParallelIterable memory consumption by yielding in tasks (apache#10691) ParallelIterable schedules 2 * WORKER_THREAD_POOL_SIZE tasks for processing input iterables. This defaults to 2 * # CPU cores. When one or some of the input iterables are considerable in size and the ParallelIterable consumer is not quick enough, this could result in unbounded allocation inside `ParallelIterator.queue`. This commit bounds the queue. When queue is full, the tasks yield and get removed from the executor. They are resumed when consumer catches up. (cherry picked from commit 7831a8d) * Drop ParallelIterable's queue low water mark (apache#10978) As part of the change in commit 7831a8d, queue low water mark was introduced. However, it resulted in increased number of manifests being read when planning LIMIT queries in Trino Iceberg connector. To avoid increased I/O, back out the change for now. (cherry picked from commit bcb3281) --------- Co-authored-by: Helt <[email protected]> Co-authored-by: Piotr Findeisen <[email protected]>
…pulated even after iterator close (apache#9402)
This patch is a part for fix #7843 , we will need to add
BlockingParallelIterable
to fix the whole problem.