|`variant/int_overflow_in_bounds_check.parquet`| Triggers an overflow if 32-bit multiplication is used to calculate ranges. |
|`variant/out_of_range_dictionary_size.parquet`| The dictionary is declared as larger than the data |
|`variant/malformed_child_inside_well_formed_parent.parquet`| Parent is well formed; child is malformed |
|`variant/out_of_range_child_offset.parquet`| The offset of a child element is out of range |
|`variant/out_of_range_element_count.parquet`| The number of declared array elements is larger than the data |
|`variant/bad_data/variants/over_deep_nested_children.parquet`| The hierarchy is excessively deep |

The first of these is the most critical, as it can trigger a memory allocation of many GiB, which may affect the operation of other worker threads in a shared process. An oversized dictionary may also trigger excessive memory allocation.
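
The overflow case can be illustrated with a minimal Java sketch. The class and method names (`RangeMath`, `byteLength32`, `checkedByteLength`) and the sample values are hypothetical and not part of any Parquet API; the sketch only shows why a range computed with 32-bit multiplication can slip past a naive bounds check, and how widening to 64 bits before comparing avoids that.

```java
final class RangeMath {
    private RangeMath() {}

    /** Naive version: 32-bit multiplication silently wraps around on overflow. */
    static int byteLength32(int elementCount, int elementSize) {
        return elementCount * elementSize;
    }

    /**
     * Safer version: widen to 64 bits before multiplying, then compare the
     * result against the number of bytes actually available.
     */
    static long checkedByteLength(int elementCount, int elementSize, long available) {
        if (elementCount < 0 || elementSize < 0) {
            throw new IllegalArgumentException("negative count or size");
        }
        // Two non-negative ints always fit in a long once multiplied.
        long length = (long) elementCount * elementSize;
        if (length > available) {
            throw new IllegalArgumentException(
                "declared length " + length + " exceeds available " + available);
        }
        return length;
    }
}
```

With a declared count of `0x40000001` and an element size of 4, `byteLength32` wraps to 4, which would pass a check against a 16-byte buffer, whereas `checkedByteLength` rejects the same input.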
The out-of-range child offset and element count files contain metadata that refers to content past the end of the actual data field.
In languages with strict range checks, this will fail on read; extra verification simply changes when the failure is detected.
For languages where range checks are not performed automatically, there is a risk of variant data referencing other data on the stack or in the heap.
As this data is read-only, there's no _direct_ threat to the integrity of the process, but it is still highly dangerous.
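
An up-front range check of the kind discussed above might look like the following Java sketch. `ChildSlice` and `sliceChild` are hypothetical names, not an existing Parquet API, and the layout (a child described by an offset and a length relative to the current buffer position) is an assumption made for illustration. Java would eventually raise an exception on an out-of-range read anyway; validating first only moves the failure earlier, while in a language without automatic checks the same test is what stops a read from straying into adjacent memory.

```java
import java.nio.ByteBuffer;

final class ChildSlice {
    private ChildSlice() {}

    /** Returns a view of the child's bytes, or throws if the declared range is out of bounds. */
    static ByteBuffer sliceChild(ByteBuffer value, long offset, long length) {
        long available = value.remaining();
        // Compare via subtraction so that offset + length can never overflow.
        if (offset < 0 || length < 0 || length > available || offset > available - length) {
            throw new IndexOutOfBoundsException(
                "child range [" + offset + ", +" + length + ") exceeds " + available + " bytes");
        }
        // The casts are safe: both values were just shown to be <= available.
        ByteBuffer child = value.duplicate();
        int start = value.position() + (int) offset;
        child.position(start);
        child.limit(start + (int) length);
        return child.slice();
    }
}
```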
One notable file is `bad_data/variants/over_deep_nested_children.parquet`, which verifies that nested variant children over 500 levels deep are rejected. This number is subjective; it was chosen to be consistent with the JSON parser `org.apache.parquet.variant.VariantJsonParser`.
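
A depth limit of this kind can be sketched in Java as a recursive walk that carries its current depth. This is an illustration only: `DepthLimit` and `maxDepth` are hypothetical names, nested `List` values stand in for nested variant children, and counting the root as level 1 is an assumption that the text above does not settle.

```java
import java.util.List;

final class DepthLimit {
    /** The limit quoted above; how levels are counted is an assumption. */
    static final int MAX_DEPTH = 500;

    private DepthLimit() {}

    /** Returns the deepest level reached, or throws once MAX_DEPTH is exceeded. */
    static int maxDepth(Object node) {
        return maxDepth(node, 1);
    }

    private static int maxDepth(Object node, int depth) {
        if (depth > MAX_DEPTH) {
            throw new IllegalArgumentException("nesting exceeds " + MAX_DEPTH + " levels");
        }
        int deepest = depth;
        if (node instanceof List<?>) {
            // Each list is treated as one level of nested children.
            for (Object child : (List<?>) node) {
                deepest = Math.max(deepest, maxDepth(child, depth + 1));
            }
        }
        return deepest;
    }
}
```

Failing as soon as the limit is crossed keeps the recursion itself bounded, so a hostile file cannot exhaust the stack before the check fires.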
These tests currently exclude any case that depends on an explicit limit on the size of a variant.
Apache Spark places a limit of 128 MiB on each of the metadata and value fields.