Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core: Limit ParallelIterable memory consumption by yielding in tasks #10787

Merged
merged 1 commit into from
Jul 26, 2024

Conversation

raunaqmorarka
Copy link

(backport #10691)

ParallelIterable schedules 2 * WORKER_THREAD_POOL_SIZE tasks for processing input iterables. This defaults to 2 * # CPU cores. When one or some of the input iterables are considerable in size and the ParallelIterable consumer is not quick enough, this could result in unbounded allocation inside ParallelIterator.queue. This commit bounds the queue. When queue is full, the tasks yield and get removed from the executor. They are resumed when consumer catches up.

(cherry picked from commit 7831a8d)

…(backport apache#10691)

ParallelIterable schedules 2 * WORKER_THREAD_POOL_SIZE tasks for
processing input iterables. This defaults to 2 * # CPU cores.  When one
or some of the input iterables are considerable in size and the
ParallelIterable consumer is not quick enough, this could result in
unbounded allocation inside `ParallelIterator.queue`. This commit bounds
the queue. When queue is full, the tasks yield and get removed from the
executor. They are resumed when consumer catches up.

(cherry picked from commit 7831a8d)
@github-actions github-actions bot added the core label Jul 26, 2024
@ajantha-bhat
Copy link
Member

I think first we need to conclude whether are we having 1.6.1 release just for this fix? (mailing list discussion can be a good place for this)
If not, it is not worth back porting as 1.7.0 release will ship this fix.

@findepi
Copy link
Member

findepi commented Jul 26, 2024

Good question, @ajantha-bhat .
For context, his was discussed on slack here: https://apache-iceberg.slack.com/archives/C03LG1D563F/p1720791069289639?thread_ts=1720787223.468779&cid=C03LG1D563F and also with @Fokko on a DM.
Trino needs this fix, so it would be great to release it as possible. Do you think 1.7.0 is the quickest way forward?

@findepi findepi requested review from jbonofre and Fokko July 26, 2024 11:19
@ajantha-bhat
Copy link
Member

1.7.0 may take another quarter. I think 1.6.1 is fine. But just need to notify in the mailing list and then raise a PR for backport.
And if people have other issue fixes they can also chime in for the release.

@Fokko Fokko added this to the Iceberg 1.6.1 milestone Jul 26, 2024
@Fokko
Copy link
Contributor

Fokko commented Jul 26, 2024

Hey there, yes I already created a 1.6.1 milestone.

Of course, all official communication will go over the mailing list, and that's the place to see if there are more fixes that people want to backport. It would be good to start a [DISCUSS] thread there. Remember that we backport bugs, or things that block adoption and no new features. I do think it is good to see if any issues arise when down streaming 1.6.0 in the ecosystem so we can solve them in 1.6.1.

@findepi
Copy link
Member

findepi commented Jul 26, 2024

Since this fix exists on main & just after 1.6.0 release, i believe merging it into 1.6.x is non-controvercial. In the worst case it will just sit there.
Since it is actively awaited for, it would be awesome to do a minor release at earliest convenience. I volunteer to do the release, if possible. Of course it would be awesome to have more content for the release, but I personally don't mind releasing with just one change. If 1.6.0 is as solid a release as we believe it is, I don't expect many fixes to be backported.

@findepi
Copy link
Member

findepi commented Jul 26, 2024

I can start a mail thread for a discussion. What exactly we want to be talking about? I think we need to let people know about 1.6.1 being planned, so that any other fixes that are urgent can be likewise backported. Anything else?

@Fokko
Copy link
Contributor

Fokko commented Jul 26, 2024

Of course it would be awesome to have more content for the release, but I personally don't mind releasing with just one change. If 1.6.0 is as solid a release as we believe it is, I don't expect many fixes to be backported.

That's exactly the goal of the [DISCUSS] on the mailing list. 👍

@findepi
Copy link
Member

findepi commented Jul 26, 2024

I will merge this PR, there is no controversy about the change itself.
I sent [DISCUSS] Iceberg 1.6.1 release mail to the dev mailing listing so that we can have discussion about what else should go in before we cut the release.

Thank you all!

@findepi findepi merged commit e18a2fe into apache:1.6.x Jul 26, 2024
49 checks passed
@raunaqmorarka raunaqmorarka deleted the port-10691 branch July 29, 2024 12:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants