Commit 9d92944
fix: preserve Inexact precision in Statistics (#22146)
## Which issue does this PR close?
No issue reported
## Rationale for this change
`Statistics::with_fetch` unconditionally returned `Precision::Exact(0)`
when the input had `nr <= skip`, even when the input was `Inexact(nr)`.
This promoted an estimated upper bound into an exact zero. The
exactness flag then misled downstream consumers (notably
`AggregateStatistics` via `Count::value_from_stats`) into trusting a
derived "0" and folding the count subtree to a literal.
Concrete user-visible symptom reported on TPC-H Q22:
```rust
let df = ctx.sql(q22_sql).await?;
df.clone().show().await?; // prints 7 rows (correct)
df.count().await?; // returns 0 (wrong)
```
`EXPLAIN` for the count plan shows the outer count aggregate collapsed
to `ProjectionExec([lit(0)]) -> PlaceholderRowExec`.
After PR #21240 left uncorrelated scalar subqueries in the filter rather
than rewriting them to joins, `FilterExec` can't use interval analysis
on `ScalarSubqueryExpr`, falls back to the 20% default selectivity, and
produces a small `Inexact` row estimate. A `LeftAnti` join whose
estimated semi-overlap covers the outer estimate then yields
`Inexact(0)`. That zero propagates through grouped aggregates whose
`estimate_num_rows` returns the child stats unchanged when `value == 0`.
The pre-existing `with_fetch` bug on a downstream `SortExec` finally
promotes it to `Exact(0)`, which `AggregateStatistics` trusts.
The root cause is the precision promotion in `with_fetch`. The PR fixes
that; the surrounding plan-shape changes after #21240 just made it
reachable.
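
The chain above can be sketched with a toy model. This is only an illustration under stated assumptions: `Precision` here is a simplified stand-in for the DataFusion type, and `filter_estimate`, `anti_join_estimate`, and `fold_count` are hypothetical helpers that mimic the described behaviour, not the real operators.

```rust
/// Simplified stand-in for DataFusion's `Precision<usize>`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Precision {
    Exact(usize),
    Inexact(usize),
    Absent,
}

/// Hypothetical filter estimate: without interval analysis, fall back to
/// a 20% default selectivity and mark the result as an estimate.
fn filter_estimate(input: Precision) -> Precision {
    match input {
        Precision::Exact(n) | Precision::Inexact(n) => Precision::Inexact(n / 5),
        Precision::Absent => Precision::Absent,
    }
}

/// Hypothetical anti-join estimate: subtract the estimated number of
/// matching rows from the outer estimate, saturating at zero.
fn anti_join_estimate(outer: Precision, matched: usize) -> Precision {
    match outer {
        Precision::Exact(n) | Precision::Inexact(n) => {
            Precision::Inexact(n.saturating_sub(matched))
        }
        Precision::Absent => Precision::Absent,
    }
}

/// Hypothetical consumer: fold `count(*)` to a literal only when the
/// row count is exact. An inexact zero must not be trusted.
fn fold_count(num_rows: Precision) -> Option<usize> {
    match num_rows {
        Precision::Exact(n) => Some(n),
        Precision::Inexact(_) | Precision::Absent => None,
    }
}
```

In this model, `anti_join_estimate(filter_estimate(Precision::Exact(50)), 12)` yields `Inexact(0)`, which `fold_count` refuses to fold. Folding only becomes possible once some later step relabels that zero as `Exact(0)`, which is the promotion this PR removes.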
## What changes are included in this PR?
- `Statistics::with_fetch`: when `nr <= skip`, preserve the exactness of
the input via `check_num_rows(Some(0),
self.num_rows.is_exact().unwrap())` instead of always returning
`Exact(0)`.
- `datafusion-common`: new unit test
`test_with_fetch_skip_all_rows_inexact` pinning the new behaviour.
- `datafusion-physical-plan`: update the existing
`test_row_number_statistics_for_global_limit` expectation, which encoded
the old (incorrect) promotion, so that it now expects `Inexact(0)`.
- `datafusion/sqllogictest/test_files/subquery.slt`: SLT regression test
reproducing the user-visible `count(*)` symptom over a query that
contains a scalar subquery, `not exists`, and a group-by on a derived
column, backed by parquet sources so the data sources report Exact
statistics.
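
For illustration, the precision-preserving rule can be modelled in a few lines. This is a minimal sketch and not the actual `with_fetch` implementation: the `Absent` variant is omitted for brevity, and `with_kind` and `rows_after_fetch` are hypothetical names standing in for `check_num_rows` and the row-count branch of `with_fetch`.

```rust
/// Simplified stand-in for DataFusion's `Precision<usize>`
/// (the `Absent` variant is left out of this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Precision {
    Exact(usize),
    Inexact(usize),
}

/// Hypothetical helper: wrap a row count in the same exactness as the input.
fn with_kind(rows: usize, exact: bool) -> Precision {
    if exact {
        Precision::Exact(rows)
    } else {
        Precision::Inexact(rows)
    }
}

/// Row count after applying `skip` and an optional `fetch`.
fn rows_after_fetch(input: Precision, skip: usize, fetch: Option<usize>) -> Precision {
    let (nr, exact) = match input {
        Precision::Exact(n) => (n, true),
        Precision::Inexact(n) => (n, false),
    };
    if nr <= skip {
        // The old behaviour returned `Precision::Exact(0)` here regardless
        // of `exact`; the fix keeps the exactness of the input.
        return with_kind(0, exact);
    }
    let remaining = nr - skip;
    // Cap the remaining rows at `fetch` when a limit is present.
    let rows = fetch.map_or(remaining, |f| remaining.min(f));
    with_kind(rows, exact)
}
```

With this rule, skipping past an estimated row count (`rows_after_fetch(Precision::Inexact(3), 5, Some(10))`) stays `Inexact(0)`, while the same call on `Precision::Exact(3)` still produces `Exact(0)`.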
## Are these changes tested?
Yes:
- Unit test in `datafusion-common` for the precision-preserving
behaviour.
- Updated unit test in `datafusion-physical-plan/limit.rs`.
- SLT regression test in `subquery.slt` that fails without the fix
(`count(*)` returns `0` instead of `2`) and passes with it.
## Are there any user-facing changes?
Only the bug fix. No public API changes.
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

1 parent 89391b5 commit 9d92944
3 files changed
Lines changed: 109 additions & 4 deletions