Batch Handling Upgrades

Paul Rogers edited this page Jan 9, 2018 · 10 revisions

Outline

  1. Materialized field. Column metadata. Vector structures. Repeated lists. List vector. Union vector.

  2. Row set level. Build batch from schema. Unit testing framework.

  3. Vector readers. Object categories. Vector indexes. Vector accessors. Array accessors. Generated code. Array wrappers for nullable and repeated vectors.

  4. Row set writers. Top-level writers. Structure. Writing to arrays. Events. Offset vector updates.

  5. Row set loader. Concept of overflow. Column states. Vector states. Overflow processing. Vector allocation. Vector cache and multi-reader model.

  6. Operator framework. Split of concerns. Protocol adapter. Schema change detection.

  7. Projection framework. Concepts. Project lists. Null columns. Implicit columns. Assembling the output batch. Column information in projection list. Recursive projection in maps. Schema smoothing and persistence.

  8. Mock reader. CSV reader. Easy format plugin. Concept of Parquet support.

  9. JSON concepts. JSON issues. Revised JSON parser. JSON semantics. Open issues. Possible opportunities.

  10. Future opportunities. Code generation. Plugin APIs. Reader retrofits. Fixed-size buffers.

Clone this wiki locally