[VL] Move ColumnarBuildSideRelation's memory occupation to Spark off-heap #7750
Comments
Do you mean we can allocate the batch in Spark's off-heap memory? If so, it's a good solution.
Correct, I assume it could be straightforward to simply change the allocation to off-heap. It's unlikely to be as tricky as Gazelle's case used to be.
Hi, I would like to do this ticket to learn more about Gluten. Could you please point me to where an off-heap allocated container is used in the code? Thank you.
Thank you in advance for helping, @Zand100. You can refer to vanilla Spark's code UnsafeHashedRelation where So I can assign this ticket to you, I suppose?
@Zand100 You may also consider upstreaming the PR to Spark. It's another solution to the off-heap/on-heap conflict: we move all of Spark's large memory allocations to off-heap once off-heap is enabled.
Thank you! Just to check I'm on the right track, I'm trying to use the Yes, you can assign the ticket to me.
Hi, how should I handle constructing a
If
Could you please review the draft #7885?
Not necessarily to use
Thank you! Could you please review the draft of the new binary container? #7902
Could you please review #7944? Should
So far ColumnarBuildSideRelation is allocated on Spark JVM heap memory. It appears that we can replace batches: Array[Array[Byte]] with an off-heap allocated container to move the memory usage off-heap. There should be a simple solution that doesn't require too much refactoring. This could avoid some of the heap OOM issues.
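To make the idea concrete, here is a minimal sketch of what replacing batches: Array[Array[Byte]] with an off-heap container could look like. The class name OffHeapBatches and its methods are hypothetical and are not Gluten's actual API. It uses a plain java.nio direct buffer as a stand-in for the allocation; a direct buffer lives outside the JVM heap but is not accounted against Spark's off-heap memory pool, so a real change would presumably allocate through Spark's memory manager instead.

```scala
import java.nio.ByteBuffer

// Hypothetical container, for illustration only (not Gluten's class).
// Stores all serialized batches back to back in one direct (off-heap)
// buffer instead of keeping an Array[Array[Byte]] on the JVM heap.
// Assumption: the total size fits in an Int (under 2 GiB).
final class OffHeapBatches(batches: Array[Array[Byte]]) {
  // offsets(i) is the start of batch i; offsets.last is the total size.
  private val offsets: Array[Int] = batches.scanLeft(0)(_ + _.length)

  // Stand-in allocation: off the JVM heap, but NOT tracked by Spark's
  // off-heap memory accounting.
  private val buf: ByteBuffer = ByteBuffer.allocateDirect(offsets.last)
  batches.foreach(b => buf.put(b))

  def numBatches: Int = offsets.length - 1

  def isOffHeap: Boolean = buf.isDirect

  // Copies batch i back onto the heap on demand.
  def batch(i: Int): Array[Byte] = {
    val len = offsets(i + 1) - offsets(i)
    val out = new Array[Byte](len)
    val view = buf.duplicate() // own position/limit, same memory
    view.position(offsets(i))
    view.get(out, 0, len)
    out
  }
}
```

With this shape, only small offset metadata stays on the heap, and the input arrays become garbage-collectable once the container is built, as long as the caller drops its own reference to them.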