forked from apache/arrow
-
Notifications
You must be signed in to change notification settings - Fork 23
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
ARROW-1943: [JAVA] handle setInitialCapacity for deeply nested lists
The current implementation of setInitialCapacity() uses a factor of 5 for every level we go into list: So if the schema is LIST (LIST (LIST (LIST (LIST (LIST (LIST (BIGINT)))))) and we start with an initial capacity of 128, we end up throwing OversizedAllocationException from the BigIntVector because at every level we increased the capacity by 5 and by the time we reached inner scalar that actually stores the data, we were well over max size limit per vector (1MB). We saw this problem downstream when we failed to read deeply nested JSON data. The potential fix is to use the factor of 5 only when we are down to the leaf vector. As the depth increases and we are still working with complex/list, we don't use the factor of 5. cc @jacques-n , @BryanCutler , @icexelloss Author: siddharth <[email protected]> Closes apache#1439 from siddharthteotia/ARROW-1943 and squashes the following commits: d0adbad [siddharth] unit tests e2f21a8 [siddharth] fix imports d103436 [siddharth] ARROW-1943: handle setInitialCapacity for deeply nested lists
- Loading branch information
1 parent
41b11bf
commit 9d84f84
Showing
2 changed files
with
135 additions
and
1 deletion.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters