Batch Handling Upgrades
Paul Rogers edited this page Jan 14, 2018 · 10 revisions
The batch handling framework consists of several layers that together let Drill control the size of each record batch, which in turn allows Drill to implement effective memory management and admission control.
The material here starts with concepts, then provides a tour of the various components. Each component is heavily commented, so after reading this material, you should be able to get the details from the code itself.
- Conceptual Overview
- Components
- Code
- Metadata
- Row Set Mechanism
- Column Accessors
- Column Readers
- Column Writers
- Result Set Loader
- Operator Framework
- Projection Framework
- Scan Framework
- Mock Reader
- Easy Format Plugin
- CSV (Compliant Text) Reader
- JSON Reader
- Conclusion and Future Work
- Mock reader. CSV reader. Easy format plugin. Concept of Parquet support.
- JSON concepts. JSON issues. Revised JSON parser. JSON semantics. Open issues. Possible opportunities.
- Future opportunities. Code generation. Plugin APIs. Reader retrofits. Fixed-size buffers.