PECO-1054 Expose Arrow batches to users, part two (databricks#163)
Updated the FetchableItems interface to return an instance of OutputType instead of a slice of OutputType.

Created interfaces SparkArrowBatch and SparkArrowRecord, with an implementation of each. This also removes the 1:1 ratio of batch instances to Arrow records: a SparkArrowBatch can now contain multiple Arrow records.

Created the BatchIterator interface and implementation, and switched arrowRowScanner to use BatchIterator instead of BatchLoader.

Created the RowValues interface and implementation as a container for the currently loaded values for a set of rows.

Updated the behaviour of the fetchable items cloudURL and localBatch to de-serialize the Arrow records as part of fetching, rather than carrying the raw bytes around for later de-serialization. Also eliminated the cloud fetch code that de-serialized the Arrow batch and then serialized each record individually to create one batch instance per record.

Removed chunkedByteReader and replaced it with io.MultiReader.

Normalized the use of row numbers so that there is no need to track the index of the row within the current batch.