refactor(parquet): split opener.rs into a module #22156

Draft
adriangb wants to merge 1 commit into apache:main from adriangb:refactor/parquet-opener-modules-and-optimizers



@adriangb adriangb commented May 13, 2026

Which issue does this PR close?

Relates to the discussion in #22024 about the Parquet opener becoming a tangled state machine. Does not close that issue — this is a small, mechanical step that makes future hook/extension work easier.

Rationale for this change

opener.rs is ~2,700 LOC on main, and several pieces in it have nothing to do with the open-file state machine — they're self-contained types sharing a file with their caller. Moving them to siblings drops the cognitive cost of reading the state machine without changing any public API or behavior.

This is intentionally minimal so it can land quickly. The hook/extension trait that addresses #22024's broader concern ("the opener is becoming a mini planner") is being prepared as a separate PR so reviewers can evaluate each change on its own merits.

What changes are included in this PR?

Module split, pure code motion:

  • opener.rs → opener/mod.rs
  • New opener/early_stop.rs: EarlyStoppingStream (~100 LOC), the dynamic-filter early-termination wrapper used at the end of build_stream.
  • New opener/encryption.rs: EncryptionContext and the ParquetMorselizer::get_encryption_context helpers. Isolates the #[cfg(feature = "parquet_encryption")] gating that previously bled through the main file.

The state machine, the pruning logic, and all public APIs are unchanged.
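For reviewers who want a picture of the resulting layout, here is a rough single-file sketch. It is not the code moved by this PR: inline modules stand in for the sibling files, `EarlyStopping` is a hypothetical iterator-based simplification of `EarlyStoppingStream`, and `encryption_enabled` is a made-up helper that only illustrates how the `parquet_encryption` gating can be confined to one module.

```rust
// Sketch only: inline modules stand in for opener/mod.rs and its two new
// sibling files so the example compiles as a single file. All bodies are
// hypothetical simplifications, not the code moved by this PR.
mod opener {
    // Real layout: `mod early_stop;` backed by opener/early_stop.rs.
    mod early_stop {
        /// Simplified stand-in for `EarlyStoppingStream`: an iterator adapter
        /// (the real type wraps a stream) that ends once `should_stop` is true.
        pub struct EarlyStopping<I, F> {
            inner: I,
            should_stop: F,
        }

        impl<I, F> EarlyStopping<I, F> {
            pub fn new(inner: I, should_stop: F) -> Self {
                Self { inner, should_stop }
            }
        }

        impl<I: Iterator, F: Fn() -> bool> Iterator for EarlyStopping<I, F> {
            type Item = I::Item;

            fn next(&mut self) -> Option<Self::Item> {
                // Check the stop condition before pulling the next item.
                if (self.should_stop)() {
                    return None;
                }
                self.inner.next()
            }
        }
    }

    // Real layout: `mod encryption;` backed by opener/encryption.rs.
    // The feature gate lives here instead of being scattered through mod.rs.
    #[allow(unexpected_cfgs)]
    mod encryption {
        /// Hypothetical helper; compiled when the feature is enabled.
        #[cfg(feature = "parquet_encryption")]
        pub fn encryption_enabled() -> bool {
            true
        }

        /// Hypothetical helper; fallback when the feature is disabled.
        #[cfg(not(feature = "parquet_encryption"))]
        pub fn encryption_enabled() -> bool {
            false
        }
    }

    // Re-exports keep the paths seen by the rest of the crate stable,
    // which is one way a split like this can avoid touching callers.
    pub use early_stop::EarlyStopping;
    pub use encryption::encryption_enabled;
}
```

In this sketch, code outside `opener` would still write `opener::EarlyStopping::new(batches, || filter_is_exhausted())` regardless of which file the type lives in; `batches` and `filter_is_exhausted` are placeholder names.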

Are these changes tested?

Yes:

  • All 99 existing datafusion-datasource-parquet unit tests pass.
  • cargo fmt --all, ./dev/rust_lint.sh, cargo clippy -p datafusion-datasource-parquet --all-targets --all-features -- -D warnings all pass.
  • Downstream datafusion core builds clean.

Are there any user-facing changes?

No. Public API is unchanged. The renamed file is module-private. No breaking changes.

@github-actions github-actions Bot added the datasource Changes to the datasource crate label May 13, 2026

alamb commented May 13, 2026

We could potentially split this into module moves too

Convert `opener.rs` (~2,700 LOC) into an `opener/` module with two
sibling files extracted from the state machine:

- `opener/early_stop.rs` — `EarlyStoppingStream`, the dynamic-filter
  early-termination wrapper used at the end of `build_stream`. Pure
  code motion.
- `opener/encryption.rs` — `EncryptionContext` and the
  `ParquetMorselizer::get_encryption_context` helpers. Isolates the
  `#[cfg(feature = "parquet_encryption")]` gating that previously
  bled through the main file.

No public API change, no behavior change. The state machine and all
pruning logic stay in `opener/mod.rs`.
@adriangb adriangb force-pushed the refactor/parquet-opener-modules-and-optimizers branch from b044e05 to 3efc367 Compare May 14, 2026 00:29
@adriangb adriangb changed the title refactor(parquet): split opener.rs into module + add ParquetAccessPlanOptimizer trait refactor(parquet): split opener.rs into a module May 14, 2026

Labels

datasource Changes to the datasource crate


2 participants