Selective reader for the deduplicated array type (#100)
Summary:
X-link: facebookincubator/velox#11427

Extend the selective reader to support the deduplicated array type. The deduplicated array streams have internal dictionary semantics, which leads to several noteworthy differences:

1) The offset stream holds indices and matches the cardinality of the rows in the read range, whereas the lengths stream has the cardinality of the unique runs.
2) Read ranges can easily start or end in the middle of a dictionary run. Hence, we need to maintain state that lets us either cache and copy the values of the last loaded run (alphabet entry), or include them in the current read if they were previously skipped.
3) Instead of managing state with index semantics, we manage it with dictionary semantics, which saves a significant amount of memory. For example, instead of translating state by maintaining the array indices per row, we record the start of each run and iterate through the runs with the sorted row set.
4) Some read ranges are no-ops on the dictionary state.

Reviewed By: Yuhta

Differential Revision: D64754886