Skip to content

chore: drop leftover JVM Parquet helpers and the native_datafusion scan name#4385

Merged
andygrove merged 2 commits into
apache:mainfrom
andygrove:parquet-reader-audit
May 22, 2026
Merged

chore: drop leftover JVM Parquet helpers and the native_datafusion scan name#4385
andygrove merged 2 commits into
apache:mainfrom
andygrove:parquet-reader-audit

Conversation

@andygrove
Copy link
Copy Markdown
Member

Which issue does this PR close?

Closes #.

Rationale for this change

After the JVM Parquet reader paths (native_comet and native_iceberg_compat) were retired, three Spark-derived helpers were left behind with no production callers. They were originally needed for the JVM vectorized reader to clip Spark request schemas against Parquet file schemas and to build predicate column descriptors; the native Parquet scan does all of that in Rust now.

The same cleanup pass surfaced something else: every reference to native_datafusion in code, comments, and test names is now misleading. There is only one scan, so naming it adds no information and makes the code read as if other options still exist.

What changes are included in this PR?

  • Delete three dead files:
    • spark/src/main/scala/org/apache/spark/sql/comet/parquet/CometParquetReadSupport.scala
    • spark/src/main/scala/org/apache/spark/sql/comet/parquet/CometSparkToParquetSchemaConverter.scala (only referenced by the file above)
    • spark/src/main/java/org/apache/parquet/filter2/predicate/SparkFilterApi.java (untouched since the initial PR; no callers)
  • Rename CometScanRule.nativeDataFusionScan to nativeScan.
  • Update test names that contained native_datafusion to just say native scan (5 tests in ParquetReadV1Suite, 1 in CometTaskMetricsSuite).
  • Update comments in ParquetReadSuite, ParquetEncryptionITCase, CometNativeScan, parquet_exec.rs, and the bug-triage skill that referred to native_datafusion or native_iceberg_compat.

Net diff: 10 files, +17 / -767.

How are these changes tested?

Covered by existing tests; the renames are mechanical and the deleted files have no callers.

  • ParquetReadV1Suite "native scan rejects": 5/5 pass (the renamed type-mismatch regression tests).
  • make and cargo clippy --all-targets --workspace -- -D warnings both green.

andygrove and others added 2 commits May 21, 2026 09:43
…scan name

After the JVM Parquet reader paths were removed, three Spark-derived
helpers in `spark.sql.comet.parquet` and `parquet.filter2.predicate`
were left behind with no production callers. The scan they used to
support is also the only scan we still have, so the `native_datafusion`
qualifier in code, comments, and test names is now redundant.

Removed:
- `CometParquetReadSupport.scala`: copy of Spark's `ParquetReadSupport`
- `CometSparkToParquetSchemaConverter.scala`: only referenced by the file
  above
- `SparkFilterApi.java`: copy of Spark 3.2 filter API, never referenced

Renamed:
- `CometScanRule.nativeDataFusionScan` -> `nativeScan`
- Test cases and comments referring to `native_datafusion` now say
  "native scan" / "native Parquet scan"

Also dropped the stale `native_iceberg_compat` mention from
`parquet_exec.rs` and the bug-triage skill's area-indicator list.
Copy link
Copy Markdown
Contributor

@mbutrovich mbutrovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for finding the dead code, @andygrove!

@andygrove andygrove merged commit 354ad46 into apache:main May 22, 2026
125 checks passed
@andygrove andygrove deleted the parquet-reader-audit branch May 22, 2026 01:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants