@@ -30,3 +30,30 @@ This guide documents areas where Comet's behavior is known to differ from Spark.
 - **Expressions**: per-expression compatibility notes, including cast.
 - **JSON**: choosing between the native and Spark-compatible engines for JSON expressions.
 - **Spark versions**: version-specific known issues and limitations.
+
+## Native and codegen-dispatch implementations
+
+Some Spark expressions have two implementations in Comet:
+
+- A **codegen-dispatch** implementation that runs Spark's own generated code for the
+  expression inside Comet's native pipeline (via the Arrow-direct codegen dispatcher). This
+  produces byte-exact Spark results at the cost of one JNI round-trip per batch. It is gated
+  globally by `spark.comet.exec.scalaUDF.codegen.enabled` (enabled by default); when the
+  dispatcher is disabled, these expressions fall back to Spark.
+- A **native** (Rust / DataFusion) implementation that is faster, with no JNI overhead, but
+  has known semantic differences from Spark for some inputs or patterns.
+
+Because the codegen-dispatch path matches Spark exactly, Comet uses it by **default**. The
+faster native path is **opt-in per expression** via that expression's
+`spark.comet.expression.<ExprClassName>.allowIncompatible=true` flag, which declares that you
+accept its differences from Spark. There is no global opt-in. When the native path is enabled
+but a specific input or pattern has no native implementation, Comet routes that case back
+through the codegen dispatcher rather than running something incompatible.
+
+This is the model behind the [regular expression](regex.md) and [JSON](json.md) families,
+which document their per-expression configs and the specific differences to expect.
+
+This is distinct from expressions that have **no** codegen-dispatch path: there, the
+incompatible cases fall back to Spark by default, and `allowIncompatible=true` runs the native
+(incompatible) path instead. `cast` is the main example; see the
+[expression reference](../expressions.md) for which expressions have incompatible cases.
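
The routing rules added in this hunk can be read as a small decision function. The Python sketch below is a toy model of that reading, not Comet's code. The two config names come from the text above. Everything else is an assumption for illustration: the function names, the `has_codegen_dispatch` and `native_supports_case` parameters, and the use of `RegExpReplace` and `Cast` as `<ExprClassName>` values. It also assumes that a case routed back to the dispatcher falls back to Spark when the dispatcher is disabled, which the text implies but does not state for that exact combination.

```python
# Toy model of the routing described in the added section.
# Config names are from the documentation text; all function and
# parameter names here are hypothetical.

DISPATCH_FLAG = "spark.comet.exec.scalaUDF.codegen.enabled"


def allow_incompatible_key(expr_class_name: str) -> str:
    """Build the per-expression opt-in key for a given expression class name."""
    return f"spark.comet.expression.{expr_class_name}.allowIncompatible"


def is_true(conf: dict, key: str, default: bool) -> bool:
    """Read a boolean-like setting from a plain dict of config values."""
    return str(conf.get(key, default)).strip().lower() == "true"


def choose_path(
    conf: dict,
    expr_class_name: str,
    has_codegen_dispatch: bool,
    native_supports_case: bool = True,
) -> str:
    """Return 'native', 'codegen-dispatch', or 'spark' for an incompatible case."""
    allow = is_true(conf, allow_incompatible_key(expr_class_name), False)
    dispatcher_on = is_true(conf, DISPATCH_FLAG, True)  # enabled by default

    if has_codegen_dispatch:
        # Native runs only when opted in AND the case has a native implementation.
        if allow and native_supports_case:
            return "native"
        # Otherwise use the Spark-exact dispatcher, or Spark if it is disabled
        # (the second half is an assumption for the opted-in, unsupported case).
        return "codegen-dispatch" if dispatcher_on else "spark"

    # No codegen-dispatch path (cast is the documented example):
    # incompatible cases fall back to Spark unless the user opts in.
    return "native" if allow else "spark"
```

For example, with an empty config `choose_path({}, "RegExpReplace", True)` returns `"codegen-dispatch"`, and adding `{allow_incompatible_key("RegExpReplace"): "true"}` switches the same call to `"native"`. The exact class names accepted in the config key should be taken from the per-expression pages, not from this sketch.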