chore: drop leftover JVM Parquet helpers and the native_datafusion scan name #4385
Merged
…scan name

After the JVM Parquet reader paths were removed, three Spark-derived helpers in `spark.sql.comet.parquet` and `parquet.filter2.predicate` were left behind with no production callers. The scan they used to support is also the only scan we still have, so the `native_datafusion` qualifier in code, comments, and test names is now redundant.

Removed:
- `CometParquetReadSupport.scala`: copy of Spark's `ParquetReadSupport`
- `CometSparkToParquetSchemaConverter.scala`: only referenced by the file above
- `SparkFilterApi.java`: copy of Spark 3.2 filter API, never referenced

Renamed:
- `CometScanRule.nativeDataFusionScan` -> `nativeScan`
- Test cases and comments referring to `native_datafusion` now say "native scan" / "native Parquet scan"

Also dropped the stale `native_iceberg_compat` mention from `parquet_exec.rs` and the bug-triage skill's area-indicator list.
mbutrovich (Contributor) approved these changes on May 22, 2026 and left a comment:
Thanks for finding the dead code, @andygrove!
Which issue does this PR close?
Closes #.
Rationale for this change
After the JVM Parquet reader paths (`native_comet` and `native_iceberg_compat`) were retired, three Spark-derived helpers were left behind with no production callers. They were originally needed for the JVM vectorized reader to clip Spark request schemas against Parquet file schemas and to build predicate column descriptors; the native Parquet scan does all of that in Rust now.
The same cleanup pass surfaced something else: every reference to `native_datafusion` in code, comments, and test names is now misleading. There is only one scan, so naming it adds no information and makes the code read as if other options still exist.
What changes are included in this PR?
- Removed `spark/src/main/scala/org/apache/spark/sql/comet/parquet/CometParquetReadSupport.scala`.
- Removed `spark/src/main/scala/org/apache/spark/sql/comet/parquet/CometSparkToParquetSchemaConverter.scala` (only referenced by the file above).
- Removed `spark/src/main/java/org/apache/parquet/filter2/predicate/SparkFilterApi.java` (untouched since the initial PR; no callers).
- Renamed `CometScanRule.nativeDataFusionScan` to `nativeScan`.
- Renamed test cases that mentioned `native_datafusion` to just say `native scan` (5 tests in `ParquetReadV1Suite`, 1 in `CometTaskMetricsSuite`).
- Updated comments in `ParquetReadSuite`, `ParquetEncryptionITCase`, `CometNativeScan`, `parquet_exec.rs`, and the bug-triage skill that referred to `native_datafusion` or `native_iceberg_compat`.
Net diff: 10 files, +17 / -767.
How are these changes tested?
Covered by existing tests; the renames are mechanical and the deleted files have no callers.
- `ParquetReadV1Suite` "native scan rejects": 5/5 pass (the renamed type-mismatch regression tests).
- `make` and `cargo clippy --all-targets --workspace -- -D warnings` both green.
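The claim that the deleted files have no callers and that the old scan names are gone can be spot-checked with a plain text search. The following is a minimal sketch, not something included in this PR: the function name `leftover_refs` is hypothetical, and the pattern list is simply the set of identifiers named in the description above. It assumes a POSIX shell with `grep` available.

```shell
# Hypothetical helper (not part of this PR): search a source tree for any
# remaining mention of the removed classes or the retired scan names.
# Prints matching lines as path:line:text and returns 0 if anything is found,
# 1 if the tree is clean (grep's normal exit status).
leftover_refs() {
  root="$1"
  grep -rnE \
    'CometParquetReadSupport|CometSparkToParquetSchemaConverter|SparkFilterApi|nativeDataFusionScan|native_datafusion|native_iceberg_compat' \
    "$root"
}
```

Running `leftover_refs spark/src` from a checkout would list any surviving references under that directory. A hit is not automatically a bug, since changelogs or historical docs may mention the old names on purpose, so the output is best treated as a review list.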