selective reader deduplicated array type (#100)
Summary:
X-link: facebookincubator/velox#11427


Extend the selective reader to support the deduplicated array type. The deduplicated array streams have internal dictionary semantics, which bring some noteworthy differences:

1) The offset stream represents indices, matching the cardinality of the rows in the read range. However, the lengths stream has the cardinality of the unique runs.
2) Read ranges can easily fall in the middle of dictionary runs. Hence, we need to manage state that lets us either cache and copy the values of the last loaded run/alphabet entry, or include them in the current read if they were previously skipped.
3) Instead of managing state with index semantics, we manage state with dictionary semantics to save a significant amount of memory. E.g., instead of translating state by maintaining the array indices per row, we record the start of the runs and iterate through them with the sorted row set.
4) Some read ranges are no-ops on the dictionary state.
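
As a rough illustration of point 3, the sketch below maps a sorted row set to run indices by walking a list of run start rows with a single cursor, rather than storing one index per row. It is a simplified model and not the code in this change: `runStartRows`, `sortedRows`, and `runsForRows` are hypothetical names, and the sketch assumes the first run starts at or before the first selected row.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of Nimble or Velox.
// runStartRows: first row of each run, ascending and non-empty (assumption).
// sortedRows: selected rows, ascending, all >= runStartRows[0] (assumption).
// Returns, for each selected row, the index of the run that contains it.
std::vector<uint32_t> runsForRows(
    const std::vector<uint32_t>& runStartRows,
    const std::vector<uint32_t>& sortedRows) {
  std::vector<uint32_t> runs;
  runs.reserve(sortedRows.size());
  std::size_t run = 0; // cursor over runs; only ever moves forward
  for (uint32_t row : sortedRows) {
    // Advance while the next run starts at or before this row.
    while (run + 1 < runStartRows.size() && runStartRows[run + 1] <= row) {
      ++run;
    }
    runs.push_back(static_cast<uint32_t>(run));
  }
  return runs;
}
```

Because both inputs are sorted, the cursor never moves backwards, so the walk is linear in the number of runs plus the number of selected rows, and the state kept between reads is proportional to the number of runs rather than the number of rows.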

Reviewed By: Yuhta

Differential Revision: D64754886
Huameng (Michael) Jiang authored and facebook-github-bot committed Nov 6, 2024
1 parent a417531 commit 222b8d0
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion dwio/nimble/encodings/Encoding.h
@@ -74,7 +74,7 @@ using vector_size_t = velox::vector_size_t;
struct ReadWithVisitorParams {
// Create the reader nulls buffer if not already exists and return pointer to
// the raw buffer. When it is created, it is created with the full length
- // across potential mutliple chunks.
+ // across potential multiple chunks.
std::function<uint64_t*()> makeReaderNulls;

// Initialize `SelectiveColumnReader::returnReaderNulls_' field. Need to be