Commit fa255df

Correct fixed_length_byte_array.parquet (#111)
* Change fixed_length_byte_array.parquet flba_field to optional
* Add previous fixed_length_byte_array.parquet file to bad_data
1 parent 5a6cf84 commit fa255df

4 files changed

Lines changed: 16 additions & 14 deletions

bad_data/ARROW-GH-47662.parquet

4.23 KB
Binary file not shown.

bad_data/README.md

Lines changed: 2 additions & 0 deletions
@@ -33,3 +33,5 @@ These are files used for reproducing various bugs that have been reported.
 * ARROW-GH-43605.parquet: dictionary index page uses rle encoding but 0 as rle bit-width.
 * ARROW-GH-45185.parquet: test case of https://github.com/apache/arrow/issues/45185
 where repetition levels start with a 1 instead of 0.
+* ARROW-GH-47662.parquet: test case identified in https://github.com/apache/arrow/issues/47662
+where a required column contains null values (an incorrect version of data/fixed_length_byte_array.parquet).
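
The inconsistency described in this README entry (a column declared `required` whose statistics still report nulls) can be illustrated with a small check. This is a minimal sketch, not part of the commit: `ColumnSummary` and `required_columns_with_nulls` are hypothetical names, and the values are copied by hand from the schema and statistics shown in data/fixed_length_byte_array.md rather than read from the Parquet footer.

```python
from dataclasses import dataclass


@dataclass
class ColumnSummary:
    """Hypothetical container for footer facts about one column."""
    name: str
    repetition: str  # "required", "optional" or "repeated"
    null_count: int  # null count reported by the column statistics


def required_columns_with_nulls(columns):
    """Return names of columns declared required that still report nulls."""
    return [c.name for c in columns
            if c.repetition == "required" and c.null_count > 0]


# Values taken from the schema and statistics shown in this commit.
before = [ColumnSummary("flba_field", "required", 105)]
after = [ColumnSummary("flba_field", "optional", 105)]

print(required_columns_with_nulls(before))  # ['flba_field']
print(required_columns_with_nulls(after))   # []
```

With the old schema the column is flagged; after the change to `optional`, the 105 nulls are consistent with the schema.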

data/fixed_length_byte_array.md

Lines changed: 14 additions & 14 deletions
@@ -31,14 +31,14 @@ Properties:
 writer.model.name: example
 Schema:
 message schema {
-required fixed_len_byte_array(4) flba_field;
+optional fixed_len_byte_array(4) flba_field;
 }
 
 
-Row group 0: count: 1000 3.84 B records start: 4 total(compressed): 3.749 kB total(uncompressed):3.749 kB
+Row group 0: count: 1000 3,94 B records start: 4 total(compressed): 3,848 kB total(uncompressed):3,848 kB
 --------------------------------------------------------------------------------
 type encodings count avg size nulls min / max
-flba_field FIXED[4] _ _ 1000 3.84 B 105 "0x00000001" / "0x000003E8"
+flba_field FIXED[4] _ _ 1000 3,94 B 105 "0x00000001" / "0x000003E8"
 ```
 
 # Column Index (from parquet-cli column-index command)
@@ -59,15 +59,15 @@ page-8 9 0x00000065 0x00
 page-9 6 0x00000001 0x00000064
 
 offset index for column flba_field:
-offset compressed size first row index
-page-0 4 390 0
-page-1 394 390 100
-page-2 784 350 200
-page-3 1134 386 300
-page-4 1520 373 400
-page-5 1893 382 500
-page-6 2275 382 600
-page-7 2657 394 700
-page-8 3051 390 800
-page-9 3441 402 900
+offset compressed size first row index unencoded bytes
+page-0 4 400 0 -
+page-1 404 400 100 -
+page-2 804 361 200 -
+page-3 1165 396 300 -
+page-4 1561 384 400 -
+page-5 1945 392 500 -
+page-6 2337 392 600 -
+page-7 2729 404 700 -
+page-8 3133 400 800 -
+page-9 3533 411 900 -
 ```
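
The updated offset index can be cross-checked against the row group summary line with a little arithmetic. The sketch below is illustrative only and not part of the commit. It assumes that pages are laid out back to back (true for this file, not guaranteed in general) and that parquet-cli's `kB` figure is bytes divided by 1024, which is an inference from the numbers rather than something the source states.

```python
# (offset, compressed size, first row index) per page, copied from the
# new offset index for flba_field.
PAGES = [
    (4, 400, 0), (404, 400, 100), (804, 361, 200), (1165, 396, 300),
    (1561, 384, 400), (1945, 392, 500), (2337, 392, 600), (2729, 404, 700),
    (3133, 400, 800), (3533, 411, 900),
]
ROW_COUNT = 1000  # "count: 1000" in the row group summary


def check_contiguous(pages):
    """True if every page starts exactly where the previous one ends."""
    return all(offset + size == next_offset
               for (offset, size, _), (next_offset, _, _)
               in zip(pages, pages[1:]))


def total_compressed(pages):
    """Sum of the compressed page sizes in bytes."""
    return sum(size for _, size, _ in pages)


total = total_compressed(PAGES)
print(check_contiguous(PAGES))       # True
print(total)                         # 3940
print(round(total / ROW_COUNT, 2))   # 3.94
print(round(total / 1024, 3))        # 3.848
```

Under those assumptions the 3940 bytes of page data match the `3,94 B` average per record and the `3,848 kB` total reported for row group 0 (the comma is a locale-specific decimal separator in the regenerated output).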

data/fixed_length_byte_array.parquet

102 Bytes
Binary file not shown.
