Commit ebcee69
reorder_join: guard QueryNode::rank against cost=0, improve Aggregate/Inner-Join cardinality
- Mirror the cost==0 guard from PrecedenceTreeNode::rank to QueryNode::rank.
  Without this, root nodes (which are hardcoded cost=0) combined with a
  selectivity=1.0 fallback produced (1-1)/0 = NaN, which then panicked
  inside denormalize's `partial_cmp(...).unwrap()`. Q18 in IN-subquery form
  hit this and crashed the optimizer.
- Port duckdb's ExtractAggregationStats heuristic
  (relation_statistics_helper.cpp:359-437) into estimate_cardinality::Aggregate:
  - Ungrouped aggregate → 1 row.
  - With per-group-key NDV: product * 0.95^(n-1) correction, then the
    Occupancy-Problem formula `product * (1 - exp(-input/product))`,
    clamped to `[1, input]`.
  - Without NDV: fall back to `max(input/2, 1)` instead of `0.1*input`.
  This makes the aggregate's estimated cardinality reflect the actual
  number of distinct group keys when they can be sized, instead of
  passing input rows straight through.
- Relax is_group_key matching to compare by column name only. A
  SubqueryAlias wrapping an aggregate rewrites the relation prefix on the
  way back up, so strict relation-equality dropped legitimate group-key
  references (e.g. `t.l_orderkey` failing to match `lineitem.l_orderkey`).
- For non-group columns asked of an Aggregate, return the post-aggregate
  row count as a loose NDV upper bound instead of erroring. The error
  used to bubble up and force callers (Filter, Projection,
  SubqueryAlias) into the multi-input catch-all, leaving the join's
  ndv lookup empty and the selectivity stuck at the 0.1 fallback.
- Add an explicit Inner-Join arm to estimate_cardinality. Without it,
  any caller asking for the cardinality of a join subtree the
  flattener absorbed as an opaque node (e.g. when a Projection sits
  between two Inner Joins, as optimize_projections inserts) errored
  with "Cannot estimate cardinality for plan with multiple inputs",
  again degrading the cost model to constants.
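
The rank guard and the aggregate heuristic described above can be sketched
as follows. This is a minimal illustration, not the code from this commit:
the function names `guarded_rank` and `estimate_aggregate_rows`, their
signatures, the choice to return 0.0 for a zero-cost node, and the extra
`max(1.0)` floor on the NDV product are all assumptions made so the example
is self-contained. Only the `(selectivity - 1) / cost` shape, the 0.95^(n-1)
correction, the occupancy formula, the `[1, input]` clamp, and the
`max(input/2, 1)` fallback come from the message.

```rust
/// Hypothetical stand-in for a rank function of the shape the message implies:
/// (selectivity - 1) / cost. With cost = 0 and selectivity = 1 this is 0/0 = NaN.
fn guarded_rank(selectivity: f64, cost: f64) -> f64 {
    if cost == 0.0 {
        // Assumption: a zero-cost (root) node gets rank 0.0. The value the real
        // guard returns is not shown on this page.
        return 0.0;
    }
    (selectivity - 1.0) / cost
}

/// Hypothetical sketch of the aggregate row estimate described in the message.
/// `key_ndvs` holds one distinct-value count per group key when all are known.
fn estimate_aggregate_rows(
    input_rows: f64,
    num_group_keys: usize,
    key_ndvs: Option<&[f64]>,
) -> f64 {
    // Ungrouped aggregate: exactly one output row.
    if num_group_keys == 0 {
        return 1.0;
    }
    match key_ndvs {
        Some(ndvs) if !ndvs.is_empty() => {
            let n = ndvs.len() as i32;
            // Product of per-key NDVs, damped by 0.95^(n-1).
            // Assumption: floor at 1.0 so the division below is never 0/0.
            let product = (ndvs.iter().product::<f64>() * 0.95_f64.powi(n - 1)).max(1.0);
            // Occupancy-problem estimate of how many of `product` possible
            // groups are hit when `input_rows` rows are thrown at them.
            let occupied = product * (1.0 - (-input_rows / product).exp());
            // Clamp to [1, input]; the upper bound is kept >= 1 so clamp is valid.
            occupied.clamp(1.0, input_rows.max(1.0))
        }
        // No usable NDV: fall back to half the input, at least one row.
        _ => (input_rows / 2.0).max(1.0),
    }
}

fn main() {
    let unguarded = (1.0_f64 - 1.0) / 0.0;
    println!("unguarded rank is NaN: {}", unguarded.is_nan());
    println!("guarded rank: {}", guarded_rank(1.0, 0.0));

    let ndvs: &[f64] = &[10.0];
    println!(
        "grouped estimate: {}",
        estimate_aggregate_rows(1_000_000.0, 1, Some(ndvs))
    );
    println!("fallback estimate: {}", estimate_aggregate_rows(100.0, 2, None));
}
```

A NaN rank matters because `partial_cmp` returns `None` when either side is
NaN, so a following `.unwrap()` panics; keeping NaN out of the rank avoids
that without changing the comparison code.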
End-to-end on TPC-H Q18 against an iceberg FileCatalog at SF=100:
- IN-subquery form (q18.sql, threshold 313) no longer panics. The
  reorder rule flips the LeftSemi over (customer x lineitem x orders)
  to a RightSemi with the aggregated subquery on the build side.
- CTE form (q18s.sql, threshold 300): the reorder rule swaps the top
  Inner Join's children so the aggregated CTE is the logical LEFT
  (intended build) side. The physical join_selection rule may still
  re-pick based on physical Statistics, which is upstream of this
  optimizer.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 1340df0 commit ebcee69
2 files changed
Lines changed: 91 additions & 8 deletions
Lines changed: 5 additions & 1 deletion