Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[VL] Fix sort based shuffle oom in spill when compress was disabled #7553

Merged
merged 1 commit into from
Dec 13, 2024

Conversation

clay4megtr
Copy link
Contributor

What changes were proposed in this pull request?

Fix oom when compress was disabled in spill.

image

@@ -85,7 +85,7 @@ arrow::Status RssPartitionWriter::doEvict(
inMemoryPayload->toBlockPayload(
payloadType, payloadPool_.get(), codec_ ? codec_.get() : nullptr, std::move(compressed)));
// Copy payload to arrow buffered os.
ARROW_ASSIGN_OR_RAISE(auto rssBufferOs, arrow::io::BufferOutputStream::Create(options_.pushBufferMaxSize, pool_));
ARROW_ASSIGN_OR_RAISE(auto rssBufferOs, arrow::io::BufferOutputStream::Create(options_.pushBufferMaxSize));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on a fix. Though this could turn the memory buffer into untracked?

cc @marin-ma

Copy link
Contributor Author

@clay4megtr clay4megtr Oct 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is, actually I'm a little confused about delegate the timing of spill trigger to spark framework, which means we can't make any other memory allocation when spilling.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should have a threshold to indicate whether trigger spill actively or not?

@github-actions github-actions bot added the VELOX label Oct 16, 2024
Copy link

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/apache/incubator-gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

@marin-ma
Copy link
Contributor

Thanks. Can we pre-allocate a buffer of size options_.pushBufferMaxSize, and make rssBufferOs reuse the memory each time?

@clay4megtr
Copy link
Contributor Author

clay4megtr commented Oct 16, 2024

Thanks. Can we pre-allocate a buffer of size options_.pushBufferMaxSize, and make rssBufferOs reuse the memory each time?

That's a better solution, I'll update later. btw, I have another question, is it possible that sortTime which defined here include cost time of other spill type? because the trigger root is nativeShuffleWriter.write(...) ?
or spill was triggered in other code path, so the shuffleWallTime will not include the cost spent on shuffle push for that instance, which finally lead to sortTime is inaccurate, I'm not sure about this.

@zhouyuan
Copy link
Contributor

related
#6896

Copy link

github-actions bot commented Dec 2, 2024

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions github-actions bot added the stale stale label Dec 2, 2024
Copy link

This PR was auto-closed because it has been stalled for 10 days with no activity. Please feel free to reopen if it is still valid. Thanks.

@github-actions github-actions bot closed this Dec 13, 2024
@marin-ma marin-ma reopened this Dec 13, 2024
@marin-ma
Copy link
Contributor

Thanks. Can we pre-allocate a buffer of size options_.pushBufferMaxSize, and make rssBufferOs reuse the memory each time?

@clay4megtr Let's merge this one first. Perhaps consider this change in the future. Thanks!

@marin-ma marin-ma merged commit 67b6831 into apache:main Dec 13, 2024
89 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants