Batch Handling Upgrades

Paul Rogers edited this page Jan 14, 2018 · 10 revisions

The batch handling framework consists of several layers that together let Drill control the size of each record batch. Bounded batch sizes, in turn, allow Drill to implement effective memory management and admission control.
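To make the idea of a size-controlled batch concrete, here is a minimal sketch in Python. It is not Drill's code or API: the function `split_into_batches` and the parameter `max_batch_bytes` are hypothetical names chosen for illustration, and rows are modeled as plain byte strings rather than value vectors. It only shows the general principle that closing a batch before it would exceed a byte limit gives every batch a known upper bound on size.

```python
from typing import Iterable, Iterator, List


def split_into_batches(rows: Iterable[bytes],
                       max_batch_bytes: int) -> Iterator[List[bytes]]:
    """Group rows into batches whose total size never exceeds max_batch_bytes.

    Hypothetical helper for illustration only; not part of Drill.
    """
    batch: List[bytes] = []
    batch_bytes = 0
    for row in rows:
        if len(row) > max_batch_bytes:
            # A real system needs a policy for oversized rows; this sketch
            # simply rejects them.
            raise ValueError("single row exceeds the batch size limit")
        # Close the current batch before adding a row that would overflow it.
        if batch and batch_bytes + len(row) > max_batch_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(row)
        batch_bytes += len(row)
    if batch:
        # Emit the final, possibly partial, batch.
        yield batch


# Five rows of varying width, limited to 100 bytes per batch.
rows = [b"x" * n for n in (40, 40, 40, 100, 10)]
batches = list(split_into_batches(rows, max_batch_bytes=100))
batch_sizes = [sum(len(r) for r in b) for b in batches]
```

Because no batch can exceed the limit, the memory needed to hold a given number of in-flight batches can be bounded in advance, which is the property that memory management and admission control rely on.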

The material here starts with concepts, then provides a tour of the various components. Each component is heavily commented, so after reading this material, you should be able to get the details from the code itself.

  1. Mock reader. CSV reader. Easy format plugin. Concept of Parquet support.

  2. JSON concepts. JSON issues. Revised JSON parser. JSON semantics. Open issues. Possible opportunities.

  3. Future opportunities. Code generation. Plugin APIs. Reader retrofits. Fixed-size buffers.
