Commit d775693
Bug parquet reader memory consumption (#1757)
* Split FlatColumnData to Read and Write
* Use Read/Write FlatColumnData in reader/writer
* Updated ReadFlatColumnValues constructor
* Make ParquetFile yield rows instead of buffering them in array
* Refactored Read/WriteFlatColumnValues iterator method
* Yield pages from column chunk reader instead of buffering them
* Optimized Dremel Assembler null level processing
* Make assemblyFlat yield rows instead of buffering them
* Fix yielding one page at a time from ColumnChunkReader
* Optimized microseconds to date time conversion in parquet
* Optimize DremelAssembler assemblyList method
* Optimize DremelAssembler assemblyList, assemblyMap and assemblyStructure methods
* Optimize using stack for parquet flat columns
* Optimize BinaryReader to yield values
* Make PlainValueUnpacker return generator
* Fix bug in write column values and read even deeply nested columns page by page
* Cleaned up ParquetFile
* Move to a flat number of rows per page instead of estimations based on rows in memory
* Small refactoring of ReadFlatColumnValues
* Fixed skipping rows in Read/Write flat column values
* Simplified read flat column values
* Make ReadFlatColumnValues take values as generator
* Updated dsl definitions
* CS Fixes
* Regenerate data for parquet extractor benchmark
* Added missing file
* Added a failing test related to data page sizes
* Temporarily save nested column children in one page
* Restored data page size option
* PoC of new Optimized Row Group Writer
* Fixed bug when writing empty pages
* Fix unequal row distribution across pages
* Fixed bug related to inverting booleans
* More performance optimizations
* Regenerated parquet fixtures in order to fix extractors benchmarks
* Removed legacy row group builder
* Reorganized parquet library namespaces
* Covered column Page Builders with unit tests
* Fixed namespace for ChunkColumnBuilder implementations
* Covered RowGroupBuilder and dependencies with unit tests
* Added Benchmarks for Parquet Library
* Updated github workflow with benchmarks
* Added snappy extension to parquet benchmarks
1 parent 1ca7f29 commit d775693
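Several of the items above replace "buffer everything in an array" with "yield one page or row at a time". The following is a minimal, language-neutral sketch of that idea in Python. It is not the library's PHP code: `read_pages`, `rows_buffered` and `rows_streamed` are hypothetical names, and a "page" is assumed to be a plain list of integers.

```python
from typing import Iterable, Iterator, List

# Assumption for illustration only: a "page" is a plain list of values.
Page = List[int]


def read_pages(num_pages: int, rows_per_page: int) -> Iterator[Page]:
    """Yield one page at a time instead of building a list of all pages."""
    for p in range(num_pages):
        start = p * rows_per_page
        yield list(range(start, start + rows_per_page))


def rows_buffered(pages: Iterable[Page]) -> List[int]:
    """Buffering approach: every row is collected before anything is returned."""
    rows: List[int] = []
    for page in pages:
        rows.extend(page)
    return rows


def rows_streamed(pages: Iterable[Page]) -> Iterator[int]:
    """Streaming approach: only the current page is held while its rows are yielded."""
    for page in pages:
        yield from page
```

Both functions produce the same rows in the same order. The difference is peak memory: the buffered variant grows with the total row count, while the streamed variant is bounded by the size of a single page, which is why a fixed number of rows per page makes the reader's memory use predictable.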
119 files changed
Lines changed: 10086 additions & 2475 deletions
File tree
- .github/workflows
- src
- adapter/etl-adapter-parquet
- src/Flow/ETL/Adapter/Parquet
- tests/Flow/ETL/Adapter/Parquet/Tests
- Benchmark/Fixtures
- Integration
- core/etl/tests/Flow/ETL/Tests/Double
- lib/parquet
- src/Flow/Parquet
- BinaryReader
- Data
- Converter
- Dremel
- ColumnData
- Statistics
- Validator
- ParquetFile
- Data
- Page
- Header
- RowGroupBuilder
- PageBuilder
- RowGroup
- Reader
- Writer
- ColumnChunkBuilder
- PageBuilder
- DictionaryBuilder
- ValueStorage
- tests/Flow/Parquet/Tests
- Benchmark
- Fixtures
- Integration
- Binary
- Data
- IO
- ParquetFile
- RowGroupBuilder
- Writer/PageBuilder
- Unit
- Data
- Dremel
- ColumnData
- ParquetFile
- Data
- RowGroupBuilder
- ColumnData
- Writer
- ColumnChunkBuilder
- PageBuilder
- DictionaryBuilder
- web/landing/resources