@@ -1366,14 +1366,29 @@ impl RowGroupsPrunedParquetOpen {
13661366 // projection and **every** predicate conjunct's columns, regardless
13671367 // of whether each conjunct is currently row-level or post-scan.
13681368 //
1369- // Why all conjuncts (not just post-scan): a mid-stream
1370- // `maybe_swap_strategy` call can demote a row-level filter to
1371- // post-scan when its measured throughput drops below
1372- // `min_bytes_per_sec`. The decoder's projection mask is fixed for
1373- // the file (we don't grow it on swap), so any column that *might*
1374- // be referenced by a post-scan filter at some point during the
1375- // file must already be in the mask — otherwise the post-scan
1376- // rebase fails with a schema-lookup error.
1369+ // The arrow-rs decoder *does* support swapping projection masks
1370+ // between row groups (`StrategySwap::with_projection`), so this
1371+ // could in principle be dynamic. We don't do that because:
1372+ //
1373+ // 1. Correctness — a mid-stream `maybe_swap_strategy` call can
1374+ // demote a row-level filter to post-scan when its measured
1375+ // throughput drops below `min_bytes_per_sec`. The post-scan
1376+ // rebase resolves the filter against the decoder's output
1377+ // schema, so any column that *might* be referenced by a
1378+ // post-scan filter at some point during the file must already
1379+ // be in the mask — otherwise the rebase fails with a
1380+ // schema-lookup error.
1381+ //
1382+ // 2. Empirically operational, not defensive — the byte-ratio
1383+ // placement heuristic (`filter_collecting_byte_ratio_threshold`)
1384+ // routes filters with out-of-projection heavy columns to
1385+ // post-scan from file-open, so those columns are needed at
1386+ // every row group anyway. A `DF_MASK_PROBE` instrumentation
1387+ // run on TPC-DS / TPC-H / ClickBench smoke measured
1388+ // < 0.001 % per-RG mask waste under default config — i.e. the
1389+ // static union mask matches the per-RG optimum at essentially
1390+ // every swap point. A dynamic-shrink scheme would gain nothing
1391+ // in these workloads.
13771392 //
13781393 // Filter-only columns are stripped when the projector runs after
13791394 // post-scan filters, so the user-visible output schema is
0 commit comments