Extract parquet push decoder module #22289

Merged
adriangb merged 4 commits into apache:main from xudong963:xudong963/extract-push-decoder-module
May 17, 2026

@xudong963 xudong963 commented May 17, 2026

Which issue does this PR close?

Follow-up to #22191.

Rationale for this change

This is a code organization follow-up suggested during review of #22191. The push decoder setup and stream-driving state now live outside opener.rs, making the opener focus on orchestration.

What changes are included in this PR?

  • Move RowFilterGenerator to row_filter.rs, next to build_row_filter.
  • Add push_decoder.rs for DecoderBuilderConfig and PushDecoderStreamState.
  • Register the new parquet datasource module from mod.rs.

No behavior change intended.

Are these changes tested?

Existing tests cover the behavior.

Are there any user-facing changes?

No. This is an internal refactor.

@github-actions github-actions Bot added the datasource Changes to the datasource crate label May 17, 2026
@xudong963
Member Author

It all looks good to me, so I didn't make any changes to your commits. haha cc @adriangb

@adriangb adriangb left a comment

Nice, thank you!

@adriangb adriangb enabled auto-merge May 17, 2026 05:06
@adriangb adriangb added this pull request to the merge queue May 17, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to a conflict with the base branch May 17, 2026
adriangb and others added 4 commits May 16, 2026 22:23
This decouples the helper from opener.rs internals: it now takes its
inputs directly (predicate, schema, metadata, reorder flag, metrics)
rather than a `&PreparedParquetOpen`, so it lives alongside the
underlying `build_row_filter` it wraps. opener.rs constructs it from
the prepared state.

No behavior change.
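
As a rough illustration of the decoupling this commit message describes, the sketch below shows a helper that takes only the inputs it needs instead of borrowing the opener's whole prepared state. Every type and field here (`PreparedOpenSketch`, `RowFilterGeneratorSketch`, the string-typed predicate and schema) is a simplified stand-in invented for the example and is not DataFusion's actual definition.

```rust
// Hypothetical, simplified stand-ins; NOT the real DataFusion types.
#[allow(dead_code)]
struct PreparedOpenSketch {
    predicate: Option<String>,
    schema: Vec<String>,
    reorder_predicates: bool,
    // Stands in for the many opener-only fields the helper never needs.
    batch_size: usize,
}

/// Helper that receives its inputs directly, so it has no dependency on
/// the opener's prepared-state type and can live in another module.
struct RowFilterGeneratorSketch<'a> {
    predicate: Option<&'a str>,
    schema: &'a [String],
    reorder_predicates: bool,
}

impl<'a> RowFilterGeneratorSketch<'a> {
    fn new(predicate: Option<&'a str>, schema: &'a [String], reorder_predicates: bool) -> Self {
        Self { predicate, schema, reorder_predicates }
    }

    /// Returns a textual description of the filter, or `None` when there
    /// is no predicate to push down.
    fn generate(&self) -> Option<String> {
        let predicate = self.predicate?;
        Some(format!(
            "filter({predicate}) over {} columns, reorder={}",
            self.schema.len(),
            self.reorder_predicates
        ))
    }
}

/// The opener side: construct the helper from the prepared state.
fn describe_filter(prepared: &PreparedOpenSketch) -> Option<String> {
    RowFilterGeneratorSketch::new(
        prepared.predicate.as_deref(),
        &prepared.schema,
        prepared.reorder_predicates,
    )
    .generate()
}
```

The point of the shape is that only `describe_filter` knows about the prepared-state struct; the helper itself can be moved next to whatever it wraps without dragging opener internals along.
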
Move the configuration struct and its builder construction out of
opener.rs into a new `push_decoder` module. The helper becomes a
`build(prepared_access_plan, metadata)` method on the config, which
reads more naturally at the call site.

No behavior change.
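
A minimal sketch of the "config with a `build` method" shape mentioned above, assuming the config holds settings shared across decoders and `build` combines them with per-plan inputs. The names `DecoderConfigSketch` and `DecoderSketch`, their fields, and the row-group validation are all assumptions made for illustration; the real `DecoderBuilderConfig` may look quite different.

```rust
// Hypothetical stand-ins; field names and types are assumptions.
#[derive(Debug, Clone, PartialEq)]
struct DecoderSketch {
    batch_size: usize,
    limit: Option<usize>,
    row_groups: Vec<usize>,
}

/// Settings shared by every decoder built for one file.
#[derive(Debug, Clone)]
struct DecoderConfigSketch {
    batch_size: usize,
    limit: Option<usize>,
}

impl DecoderConfigSketch {
    /// Combine the shared settings with per-plan inputs: the row groups
    /// to read and the number of row groups the file metadata reports.
    fn build(
        &self,
        row_groups: Vec<usize>,
        total_row_groups: usize,
    ) -> Result<DecoderSketch, String> {
        // Reject plans that reference row groups the file does not have.
        if let Some(&bad) = row_groups.iter().find(|&&rg| rg >= total_row_groups) {
            return Err(format!(
                "row group {bad} out of range (file has {total_row_groups})"
            ));
        }
        Ok(DecoderSketch {
            batch_size: self.batch_size,
            limit: self.limit,
            row_groups,
        })
    }
}
```

At the call site this reads as `config.build(plan, metadata)`, with the shared settings on the left and the per-decoder inputs as arguments.
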
Co-locate the per-file stream driver with the builder configuration it
consumes. Add `into_stream` so the opener doesn't need to name the
unfold/fuse type; it just hands off the state and gets back a
`BoxStream`.

After this commit, push_decoder.rs owns the full push-decoder
lifecycle (builder setup + stream driving), and build_stream in
opener.rs reads as orchestration: prepare access plans, build
decoders, hand off to the stream, optionally wrap in EarlyStoppingStream.

No behavior change.
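
The `into_stream` idea can be sketched with a synchronous analogue. The commit refers to an unfold/fuse stream returned as a `BoxStream`; to stay dependency-free, the example below swaps in `std::iter::from_fn` plus `fuse` and returns a boxed iterator instead. `StreamStateSketch`, `next_batch`, and `into_iter_boxed` are hypothetical names, not the PR's API.

```rust
use std::collections::VecDeque;

/// Hypothetical per-file state: batches that are still waiting to be yielded.
struct StreamStateSketch {
    pending: VecDeque<Vec<i32>>,
}

impl StreamStateSketch {
    /// Advance the state by one step, yielding the next batch if any remain.
    fn next_batch(&mut self) -> Option<Vec<i32>> {
        self.pending.pop_front()
    }

    /// Consume the state and hand back a boxed iterator, so the caller
    /// never has to name the concrete `Fuse<FromFn<...>>` type.
    fn into_iter_boxed(mut self) -> Box<dyn Iterator<Item = Vec<i32>>> {
        Box::new(std::iter::from_fn(move || self.next_batch()).fuse())
    }
}
```

The caller only sees `Box<dyn Iterator<...>>`, which mirrors how the opener is described as handing off the state and getting back a boxed stream without knowing how it is driven.
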
@adriangb adriangb force-pushed the xudong963/extract-push-decoder-module branch from 88ad15f to 960a9fe Compare May 17, 2026 05:24
@adriangb adriangb enabled auto-merge May 17, 2026 05:31
@adriangb adriangb added this pull request to the merge queue May 17, 2026
Merged via the queue into apache:main with commit b4a6eb1 May 17, 2026
35 checks passed
Labels

datasource Changes to the datasource crate

2 participants