
[branch-0.9] Cherry pick feat(datafusion): expose PartitionKeysKind getter on IcebergTableScan #21

Merged: toutane merged 1 commit into branch-0.9 from branch-0.9-cherry-pick-11 on May 19, 2026.

toutane commented on May 19, 2026

Cherry pick 35741b7
Jira https://datadoghq.atlassian.net/browse/QECO-1260

Which issue does this PR close?

  • Closes #.

What changes are included in this PR?

Exposes to callers which partition transform family (identity or bucket) produced an IcebergTableScan's Partitioning::Hash declaration. Today the scan's partitioning surface is opaque: properties().partitioning carries only the Column expressions. Any consumer that needs to tell whether a Hash is identity-backed or bucket-backed therefore has to re-walk the table metadata and re-run the bucketing detection logic itself.

Public API, re-exported from crate::table:

```rust
#[non_exhaustive]
pub enum PartitionKeysKind { Identity, Bucket }
impl IcebergTableScan {
    pub fn partition_keys_kind(&self) -> Option<PartitionKeysKind>;
}
```
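As a rough illustration of the consumer side, the sketch below branches on the getter's return value. It redeclares PartitionKeysKind locally so it compiles standalone, and hash_source is a hypothetical helper, not part of the crate. Inside the defining crate the wildcard arm is unreachable. In a downstream crate, #[non_exhaustive] makes that arm mandatory.

```rust
// Hypothetical local copy of the enum so this sketch compiles on its own;
// real code would import it from the crate's `table` module instead.
#[non_exhaustive]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PartitionKeysKind {
    Identity,
    Bucket,
}

/// Hypothetical consumer helper: classify a scan's partitioning from the
/// value that `partition_keys_kind()` returns.
#[allow(unreachable_patterns)] // `Some(_)` is only reachable from other crates
fn hash_source(kind: Option<PartitionKeysKind>) -> &'static str {
    match kind {
        Some(PartitionKeysKind::Identity) => "identity-backed Hash",
        Some(PartitionKeysKind::Bucket) => "bucket-backed Hash",
        // Downstream crates must keep this arm because the enum is
        // #[non_exhaustive], so adding a variant later is not a breaking change.
        Some(_) => "Hash from an unrecognized transform family",
        // None corresponds to a scan that declares UnknownPartitioning.
        None => "UnknownPartitioning",
    }
}
```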

A return value of None means the scan declares UnknownPartitioning. The value is wired through a crate-internal with_partition_keys_kind builder setter on IcebergTableScan, which IcebergTableProvider::scan calls right after choosing Partitioning::Hash. The public constructors (IcebergTableScan::new, new_with_tasks) are unchanged.
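The wiring can be sketched as follows, assuming heavily simplified stand-ins. Scan, plan_scan, and this Partitioning enum are hypothetical: DataFusion's real Partitioning::Hash holds physical expressions rather than column names, and the real setter's exact signature may differ. The sketch sets the kind only on the Hash path and leaves it as None otherwise, mirroring the two paths the tests cover.

```rust
// Hypothetical stand-in for DataFusion's Partitioning (column names
// instead of physical expressions, for brevity).
#[derive(Debug, Clone, PartialEq)]
enum Partitioning {
    Hash(Vec<String>, usize),
    UnknownPartitioning(usize),
}

#[non_exhaustive]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PartitionKeysKind {
    Identity,
    Bucket,
}

/// Hypothetical, simplified analogue of IcebergTableScan.
#[derive(Debug)]
struct Scan {
    partitioning: Partitioning,
    // Private field: only the builder setter below can fill it in.
    partition_keys_kind: Option<PartitionKeysKind>,
}

impl Scan {
    /// Constructor leaves the kind unset, like the unchanged public constructors.
    fn new(partitioning: Partitioning) -> Self {
        Self { partitioning, partition_keys_kind: None }
    }

    /// Builder-style setter (crate-internal in the real code).
    fn with_partition_keys_kind(mut self, kind: PartitionKeysKind) -> Self {
        self.partition_keys_kind = Some(kind);
        self
    }

    /// Public getter; None means UnknownPartitioning.
    fn partition_keys_kind(&self) -> Option<PartitionKeysKind> {
        self.partition_keys_kind
    }
}

/// Hypothetical analogue of the provider's scan(): `keys` holds the detected
/// Hash columns and their kind, or None when no usable partition keys exist.
fn plan_scan(keys: Option<(Vec<String>, PartitionKeysKind)>, target_partitions: usize) -> Scan {
    match keys {
        Some((columns, kind)) => Scan::new(Partitioning::Hash(columns, target_partitions))
            .with_partition_keys_kind(kind),
        None => Scan::new(Partitioning::UnknownPartitioning(target_partitions)),
    }
}
```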

  • Single source of truth. PartitionKeysKind is derived via PartitionKeys::kind(), i.e. from the same descriptor that drives both task bucketing and the Hash expression list, so the reported kind and the Partitioning::Hash expressions cannot drift apart.
  • #[non_exhaustive]. Future transform families (e.g. truncate, mixed) can be added without breaking downstream matches.
  • Builder setter rather than a new constructor. Avoids growing the already-wide new_with_tasks signature and keeps the field private.
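The single-source-of-truth point can be sketched as follows, assuming a hypothetical PartitionKeys descriptor that holds (column, transform) pairs. The Hash column list and the kind are both read from the same fields, so they cannot disagree. Returning None for mixed or truncate transforms is an assumption made for illustration, not the crate's documented behavior, and the real kind() may have a different return type.

```rust
/// Hypothetical transform tag for one partition field.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Transform {
    Identity,
    Bucket(u32),
    Truncate(u32),
}

#[non_exhaustive]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PartitionKeysKind {
    Identity,
    Bucket,
}

/// Hypothetical descriptor: one list of (column, transform) pairs feeds
/// both the Hash expression list and the reported kind.
struct PartitionKeys {
    fields: Vec<(String, Transform)>,
}

impl PartitionKeys {
    /// Column names that would become the Hash expressions.
    fn hash_columns(&self) -> Vec<&str> {
        self.fields.iter().map(|(c, _)| c.as_str()).collect()
    }

    /// Kind derived from the same fields. Assumption: empty, mixed, or
    /// truncate-based field lists yield None.
    fn kind(&self) -> Option<PartitionKeysKind> {
        if self.fields.is_empty() {
            return None;
        }
        if self.fields.iter().all(|(_, t)| *t == Transform::Identity) {
            Some(PartitionKeysKind::Identity)
        } else if self.fields.iter().all(|(_, t)| matches!(t, Transform::Bucket(_))) {
            Some(PartitionKeysKind::Bucket)
        } else {
            None
        }
    }
}
```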

Are these changes tested?

The existing identity and bucket tests in table/mod.rs are extended with partition_keys_kind() assertions covering both the Hash and UnknownPartitioning paths.

@toutane toutane marked this pull request as ready for review May 19, 2026 09:21
Base automatically changed from branch-0.9-cherry-pick-10 to branch-0.9 May 19, 2026 14:35
@toutane toutane force-pushed the branch-0.9-cherry-pick-11 branch from 1ce2eb9 to 35741b7
Add `PartitionKeysKind` (#[non_exhaustive] enum: Identity | Bucket) and
a public `partition_keys_kind() -> Option<PartitionKeysKind>` getter on
`IcebergTableScan`, so callers can distinguish identity-backed from
bucket-backed `Partitioning::Hash` without re-inspecting table metadata.

Wired through `IcebergTableProvider::scan` via a crate-internal
`with_partition_keys_kind` setter; public constructor signatures are
unchanged. Existing bucket/identity tests extended with
`partition_keys_kind()` assertions.

(cherry picked from commit 28d117f)
@toutane toutane force-pushed the branch-0.9-cherry-pick-11 branch from 35741b7 to 02ac860
@toutane toutane merged commit c10a519 into branch-0.9 May 19, 2026
2 checks passed
@toutane toutane deleted the branch-0.9-cherry-pick-11 branch May 19, 2026 14:42