Commit 4c2af81
catalog: query-aware statistics requests via ScanArgs / ScanResult
Adds an opt-in handshake that lets callers ask a `TableProvider` for
specific stats by name and receive only what the provider can answer
cheaply, instead of the all-or-nothing dense `Statistics` we have today.
## What's new
* `datafusion-common::stats::StatisticsRequest` — enum of stat kinds
that mirror `Statistics` / `ColumnStatistics` (Min, Max, NullCount,
DistinctCount, Sum, ByteSize, RowCount, TotalByteSize). `Hash + Eq`
so it can key a `HashMap`.
* `datafusion-common::stats::StatisticsValue` — `Scalar(Precision<...>)
| Distribution(Arc<dyn Any>) | Sketch(Arc<dyn Any>) | Absent`. Whether
a value is exact or estimated travels in the `Precision` wrapper, not
the variant.
* `ScanArgs::with_statistics_requests` / `statistics_requests()` — the
caller's question.
* `ScanResult::with_statistics` / `statistics()` / `into_parts()` — the
provider's answer, paired 1:1 with the requests slice.
* `PartitionedFile::satisfied_stats` is an optional, sparse
`Arc<HashMap<StatisticsRequest, StatisticsValue>>` of per-file
answers. Memory scales with what was asked, not with table width.
Providers that store stats out-of-band (Delta/Iceberg/Hudi manifests,
Hive Metastore, custom catalogs) can populate this directly without
rebuilding a full dense `Statistics`.
* `FilePruner` learns to consume the sparse map. Internally,
`file_stats_pruning` is now `Box<dyn PruningStatistics + Send + Sync>`
so we can dispatch between the existing `PrunableStatistics` (dense)
and a new `SparseFilePruningStats` adapter (sparse). The sparse
adapter looks up each `StatisticsRequest` directly in the map and
materializes single-row arrays only for the columns the pruning
predicate touches — no densify-then-throw-away.
* `ListingTable::scan_with_args` populates `ScanResult.statistics` when
`args.statistics_requests()` is set and `collect_statistics=true`,
reusing the merged dense `Statistics` it has already computed. When
`collect_statistics=false` it returns `Absent` for everything (the
contract is "answer what's free"). `DistinctCount`/`Sum`/`ByteSize`
are likewise `Absent` for Parquet because those aren't in the Thrift
footers; layered helpers (or richer providers) can fill the gaps.
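A minimal sketch of the request/answer handshake described above, using only the standard library. The types here are simplified stand-ins, not the DataFusion definitions: the `String` column payload on the per-column variants, the `i64` scalar, and the `answer` helper are all assumptions made for illustration. It shows the two properties the list calls out: answers are paired 1:1 with the requests slice, and exactness lives in `Precision` rather than in the `StatisticsValue` variant.

```rust
use std::collections::HashMap;

// Hypothetical stand-in: the real enum has more kinds and its payload may differ.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum StatisticsRequest {
    RowCount,
    Min(String), // column name (assumed payload)
    Max(String),
    DistinctCount(String),
}

// Exact vs. estimated is carried here, not in `StatisticsValue`.
#[derive(Debug, Clone, PartialEq)]
enum Precision<T> {
    Exact(T),
    Inexact(T),
}

// Trimmed to the two variants this sketch needs.
#[derive(Debug, Clone, PartialEq)]
enum StatisticsValue {
    Scalar(Precision<i64>),
    Absent,
}

/// Answers each request from the stats the provider already has on hand.
/// The output has the same length and order as `requests`; anything the
/// provider cannot answer for free becomes `Absent`.
fn answer(
    requests: &[StatisticsRequest],
    known: &HashMap<StatisticsRequest, StatisticsValue>,
) -> Vec<StatisticsValue> {
    requests
        .iter()
        .map(|r| known.get(r).cloned().unwrap_or(StatisticsValue::Absent))
        .collect()
}

fn main() {
    let mut known = HashMap::new();
    known.insert(
        StatisticsRequest::RowCount,
        StatisticsValue::Scalar(Precision::Exact(1_000)),
    );
    known.insert(
        StatisticsRequest::Min("x".to_string()),
        StatisticsValue::Scalar(Precision::Exact(10)),
    );
    known.insert(
        StatisticsRequest::Max("x".to_string()),
        StatisticsValue::Scalar(Precision::Inexact(20)),
    );

    let requests = vec![
        StatisticsRequest::Max("x".to_string()),
        StatisticsRequest::DistinctCount("x".to_string()), // not known
        StatisticsRequest::RowCount,
    ];
    let answers = answer(&requests, &known);

    // One answer per request, in request order.
    for (req, ans) in requests.iter().zip(&answers) {
        println!("{req:?} -> {ans:?}");
    }
}
```

Returning `Absent` instead of an error keeps the call cheap for providers that have nothing to offer, which matches the "answer what's free" contract.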
## Backwards compat
All additions are opt-in:
* `ScanArgs` / `ScanResult` gain new fields with `Default`-friendly
initializers; existing callers that don't use the new builders see
no change.
* `FilePruner`'s field-type change is internal (private field).
* The only minor source-level break is a new pub field on
`PartitionedFile` (`satisfied_stats`). Callers using
`PartitionedFile::new` / `From<ObjectMeta>` / the existing builders
are unaffected. Direct struct literals — uncommon, none in-tree —
need to add `satisfied_stats: None` (or use the new
`with_satisfied_stats` builder).
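To illustrate the source-level break, here is a sketch with a heavily trimmed, hypothetical stand-in for `PartitionedFile`. The field set, the `i64` value type, and the exact signatures of `new` and `with_satisfied_stats` are assumptions; only the field name `satisfied_stats` and the `None` default come from the notes above.

```rust
use std::collections::HashMap;
use std::sync::Arc;

// Hypothetical stand-in; the real struct has many more fields.
#[derive(Debug)]
struct PartitionedFile {
    path: String,
    size: u64,
    // The new public field (map key and value types simplified here).
    satisfied_stats: Option<Arc<HashMap<String, i64>>>,
}

impl PartitionedFile {
    // Constructors fill in the new field, so their callers compile unchanged.
    fn new(path: impl Into<String>, size: u64) -> Self {
        Self { path: path.into(), size, satisfied_stats: None }
    }

    // Assumed builder shape: attach per-file answers after construction.
    fn with_satisfied_stats(mut self, stats: Arc<HashMap<String, i64>>) -> Self {
        self.satisfied_stats = Some(stats);
        self
    }
}

fn main() {
    // Unaffected: goes through the constructor.
    let a = PartitionedFile::new("a.parquet", 1024);

    // Affected: a struct literal must now name the new field explicitly.
    let b = PartitionedFile {
        path: "b.parquet".to_string(),
        size: 2048,
        satisfied_stats: None,
    };

    // Opting in through the builder.
    let stats = Arc::new(HashMap::from([("max(x)".to_string(), 20_i64)]));
    let c = PartitionedFile::new("c.parquet", 512).with_satisfied_stats(stats);

    for f in [&a, &b, &c] {
        println!("{} ({} bytes): {:?}", f.path, f.size, f.satisfied_stats);
    }
}
```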
## Tests
* `datafusion-common::stats::tests::statistics_request_is_hashable_keyable`
— round-trip a `StatisticsRequest` through a `HashMap`.
* `datafusion-pruning::file_pruner::tests` — three tests demonstrating
end-to-end pruning against a sparse-only `PartitionedFile` (`x > 100`
prunes a `[10, 20]` file, `x > 15` doesn't, no stats at all → no
pruner).
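The three pruning cases above can be sketched without DataFusion as follows. This is not the `SparseFilePruningStats` adapter itself: the map holds plain `i64` values that are assumed exact, the `String` column payload is an assumption, and `can_prune_gt` is a hypothetical helper that handles only predicates of the form `column > threshold`. It also models "no stats at all" as "never prune", whereas the notes above say that no pruner is constructed in that case.

```rust
use std::collections::HashMap;

// Hypothetical stand-in limited to the two kinds this predicate needs.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum StatisticsRequest {
    Min(String), // column name (assumed payload)
    Max(String),
}

/// Returns true when a file can be skipped for `column > threshold`.
/// The file is prunable only if its known maximum is <= threshold,
/// because then no row can satisfy the predicate. A missing map or a
/// missing max entry means the file must be kept.
fn can_prune_gt(
    stats: Option<&HashMap<StatisticsRequest, i64>>,
    column: &str,
    threshold: i64,
) -> bool {
    stats
        .and_then(|m| m.get(&StatisticsRequest::Max(column.to_string())))
        .map_or(false, |max| *max <= threshold)
}

fn main() {
    // Sparse per-file stats for a file whose `x` values span [10, 20].
    let mut stats = HashMap::new();
    stats.insert(StatisticsRequest::Min("x".to_string()), 10);
    stats.insert(StatisticsRequest::Max("x".to_string()), 20);

    // `x > 100`: max is 20, so the file is pruned.
    assert!(can_prune_gt(Some(&stats), "x", 100));
    // `x > 15`: some rows may match, so the file is kept.
    assert!(!can_prune_gt(Some(&stats), "x", 15));
    // No stats at all: nothing can be proven, so the file is kept.
    assert!(!can_prune_gt(None, "x", 100));

    println!("all three pruning cases behave as described");
}
```

Only the `Max` entry is looked up for a `>` predicate, which mirrors the idea of touching just the stats the predicate needs instead of building a dense structure first.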
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 9a29e33 commit 4c2af81
4 files changed
Lines changed: 408 additions & 7 deletions
File tree
- datafusion
- catalog/src
- datasource/src
- expr-common/src
- pruning/src