docs/source/user-guide/latest/compatibility
@@ -57,6 +57,15 @@ The following shared limitation may produce incorrect results without falling ba
 written using the Proleptic Gregorian calendar. This may produce incorrect results for dates before
 October 15, 1582.
 
+The following shared limitation raises an error at scan time rather than falling back to Spark:
+
+- Invalid UTF-8 bytes in `STRING` columns. Spark permits arbitrary byte sequences in a `STRING`
+  column (for example from `CAST(X'C1' AS STRING)`), but Comet's native execution path is built on
+  Arrow, whose string type is strictly UTF-8. Reading a Parquet file whose `STRING` column contains
+  non-UTF-8 bytes fails with `Parquet error: encountered non UTF-8 data`. Disable Comet for the
+  query, or cast the column to `BINARY` before persisting, if you need to preserve non-UTF-8 bytes.
+  See [#4121](https://github.com/apache/datafusion-comet/issues/4121).
+
 ## `native_datafusion` Limitations
 
 The `native_datafusion` scan has some additional limitations, mostly related to Parquet metadata. All of these
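
Review note on the `STRING` limitation in this hunk: a minimal Python sketch, independent of Spark and Comet, of why the `X'C1'` byte used in the example cannot pass a strict UTF-8 check. The helper name `is_valid_utf8` and the sample values are hypothetical and only illustrate the kind of validation a strictly UTF-8 string type implies; this is not Comet's or Arrow's implementation.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes under strict UTF-8 rules."""
    try:
        data.decode("utf-8")  # "strict" error handling is the default
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical sample values. 0xC1 is the byte from CAST(X'C1' AS STRING);
# it is never a legal byte in well-formed UTF-8.
samples = {
    "c1": b"\xc1",
    "ascii": b"comet",
    "e_acute": "é".encode("utf-8"),
}

results = {name: is_valid_utf8(raw) for name, raw in samples.items()}
print(results)  # {'c1': False, 'ascii': True, 'e_acute': True}
```

A check of this kind could be run over raw column bytes before persisting, to decide whether the column needs the `BINARY` cast suggested in the added text.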
@@ -51,6 +51,17 @@ Spark 4.1 support is experimental and intended for development and testing only.
 in production.
 ```
 
+### Known Limitations
+
+- **`NullType` columns in Parquet files**
+  ([#4199](https://github.com/apache/datafusion-comet/issues/4199)): Spark encodes a `NullType`
+  column as a Parquet `BOOLEAN` physical type annotated with `LogicalType::Unknown`. The Rust
+  `parquet` crate that Comet depends on accepts `Unknown` only when paired with `INT32` and rejects
+  any other physical type with `Parquet error: Cannot annotate Unknown from BOOLEAN for field '<name>'`.
+  Any attempt to read a Parquet file that contains a `NullType` column fails at decode time before
+  Comet's scan runs. Workaround: project the column away, cast it to a concrete type before
+  persisting, or read the file with Comet disabled for that query.
+
 ## Spark 4.2 (Experimental)
 
 Spark 4.2.0-preview4 is provided as experimental support with Java 17 and Scala 2.13.