apache
diff --git a/benchmarks/README.md b/benchmarks/README.md
Lines changed: 44 additions & 0 deletions
diff --git a/benchmarks/bench.sh b/benchmarks/bench.sh
Lines changed: 76 additions & 0 deletions
diff --git a/benchmarks/sql_benchmarks/README.md b/benchmarks/sql_benchmarks/README.md
Lines changed: 2 additions & 0 deletions
diff --git a/benchmarks/sql_benchmarks/narrow_schema/benchmarks/q01.benchmark b/benchmarks/sql_benchmarks/narrow_schema/benchmarks/q01.benchmark
Lines changed: 26 additions & 0 deletions
diff --git a/benchmarks/sql_benchmarks/narrow_schema/benchmarks/q03.benchmark b/benchmarks/sql_benchmarks/narrow_schema/benchmarks/q03.benchmark
Lines changed: 23 additions & 0 deletions
diff --git a/benchmarks/sql_benchmarks/narrow_schema/benchmarks/q10.benchmark b/benchmarks/sql_benchmarks/narrow_schema/benchmarks/q10.benchmark
Lines changed: 23 additions & 0 deletions
diff --git a/benchmarks/sql_benchmarks/narrow_schema/benchmarks/q13.benchmark b/benchmarks/sql_benchmarks/narrow_schema/benchmarks/q13.benchmark
Lines changed: 24 additions & 0 deletions
diff --git a/benchmarks/sql_benchmarks/wide_schema/benchmarks/q01.benchmark b/benchmarks/sql_benchmarks/wide_schema/benchmarks/q01.benchmark
Lines changed: 25 additions & 0 deletions
diff --git a/benchmarks/sql_benchmarks/wide_schema/benchmarks/q02.benchmark b/benchmarks/sql_benchmarks/wide_schema/benchmarks/q02.benchmark
Lines changed: 26 additions & 0 deletions
diff --git a/benchmarks/sql_benchmarks/wide_schema/benchmarks/q03.benchmark b/benchmarks/sql_benchmarks/wide_schema/benchmarks/q03.benchmark
Lines changed: 22 additions & 0 deletions
@@ -620,6 +620,50 @@ This benchmarks is derived from the [TPC-H][1] version
 [2]: https://github.com/databricks/tpch-dbgen.git,
 [2.17.1]: https://www.tpc.org/tpc_documents_current_versions/pdf/tpc-h_v2.17.1.pdf
 
+## Wide-schema benchmark
+
+A pair of benchmark suites that measure the per-file metadata
+overhead of a wide schema in selective Parquet scans. In this regime
+most of the work is loading footers and column-chunk metadata
+rather than reading row data, and the cost scales linearly with the
+number of column chunks in the dataset rather than with the number of
+columns the query references.
+
+The data preparation step (`gen_wide_data`) synthesizes a generic
+8-column base schema (`id`, `value`, `count`, `ts`, `category`,
+`flag`, `status`, `text`) with deterministic data, then replicates it
+128× via suffix renaming (`_2`, `_3`, …) for 1024 columns in total,
+written across 256 files at 50,000 rows per file, with one row group
+per file and ZSTD(1) compression. Copies 2..128 are zero-filled
+arrays, so the schema is wide (every column still has its own footer,
+page-index, and column-chunk metadata) but the on-disk size stays
+around 225 MB. A companion narrow dataset (about 110 MB) is written
+the same way without the suffix copies: same row count, same file
+count, same per-file row-group shape, just the 8 base columns. The
+only variable between wide and narrow is schema width, which is what
+controls the per-file metadata overhead.
+
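The shape described above implies the following totals. This is a minimal sketch that only redoes the arithmetic from the stated constants; the variable names are illustrative and are not taken from `gen_wide_data`.

```shell
#!/usr/bin/env bash
# Constants as stated in the dataset description.
BASE_COLS=8
WIDTH_FACTOR=128
NUM_FILES=256
ROWS_PER_FILE=50000

WIDE_COLS=$(( BASE_COLS * WIDTH_FACTOR ))     # 1024 columns
TOTAL_ROWS=$(( NUM_FILES * ROWS_PER_FILE ))   # 12800000 rows in each dataset

# One row group per file, so each file holds one column chunk per column.
WIDE_CHUNKS=$(( WIDE_COLS * NUM_FILES ))      # 262144 column chunks
NARROW_CHUNKS=$(( BASE_COLS * NUM_FILES ))    # 2048 column chunks

echo "wide columns:         ${WIDE_COLS}"
echo "rows per dataset:     ${TOTAL_ROWS}"
echo "wide column chunks:   ${WIDE_CHUNKS}"
echo "narrow column chunks: ${NARROW_CHUNKS}"
echo "chunk ratio:          $(( WIDE_CHUNKS / NARROW_CHUNKS ))x"
```

The 128x gap in column-chunk count, at identical row and file counts, is the quantity the wide-versus-narrow comparison isolates.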
+```shell
+./benchmarks/bench.sh data wide_schema    # synthesizes wide (1024 cols × 256 files) + narrow (8 cols × 256 files), ~60 s, ~335 MB
+./benchmarks/bench.sh run  wide_schema    # small-projection queries on the wide dataset
+./benchmarks/bench.sh run  narrow_schema  # same SQL on the narrow dataset (control)
+```
+
+The queries are deliberately small-projection (each projects ≤ 4
+columns) so the wide-schema overhead is the dominant signal. The headline
+query is `Q13`: a filter on two low-cardinality string columns plus a
+non-stat-prunable modulo predicate for tight selectivity, projecting
+two columns with no `LIMIT` or `ORDER BY`. Other queries cover predicates
+on duplicated columns far from the start of the schema (`Q02`), TopK
+(`Q01`/`Q11`), tight selectivity with limit (`Q03`/`Q10`), and an
+`expect_plan` regression guard for projection pushdown (`Q12`).
+
+Compare `wide_schema` and `narrow_schema` Criterion outputs
+query-by-query for the slowdown ratio. For cold-start measurements
+that include planner setup (the regime where this overhead is most
+visible), invoke `datafusion-cli` directly against
+`data/wide_schema/{wide,narrow}/`.
+
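One way the cold-start comparison could be scripted is sketched below. The `CREATE EXTERNAL TABLE` statement is an assumption about what the `load_*.sql` files do (they are not shown in this change), `cold_start_sql` is a hypothetical helper, and the settings from `set_config.sql` are not applied. The query text is `Q13` from the benchmark files.

```shell
#!/usr/bin/env bash
# Hypothetical helper: emits the SQL for one cold-start probe against the
# dataset directory given as $1. The table definition is an assumption.
cold_start_sql() {
    local dataset_dir="$1"
    cat <<SQL
CREATE EXTERNAL TABLE events STORED AS PARQUET LOCATION '${dataset_dir}';
SELECT id, ts FROM events WHERE category = 'c0' AND flag = 'f0' AND id % 1000 = 0;
SQL
}

for variant in wide narrow; do
    sql_file="$(mktemp)"
    cold_start_sql "data/wide_schema/${variant}/" > "${sql_file}"
    if command -v datafusion-cli >/dev/null 2>&1; then
        # A fresh process per run, so planner setup is part of the timing.
        time datafusion-cli -f "${sql_file}"
    else
        echo "datafusion-cli not found; would run (${variant}):"
        cat "${sql_file}"
    fi
    rm -f "${sql_file}"
done
```

Because each invocation starts a new process, the timings include table registration and planning, which the in-process Criterion runs amortize.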
 ## TPCDS
 
 Run the tpcds benchmark.
 
@@ -100,6 +100,8 @@ sort_tpch:              Benchmark of sorting speed for end-to-end sort queries o
 sort_tpch10:            Benchmark of sorting speed for end-to-end sort queries on TPC-H dataset (SF=10)
 topk_tpch:              Benchmark of top-k (sorting with limit) queries on TPC-H dataset (SF=1)
 external_aggr:          External aggregation benchmark on TPC-H dataset (SF=1)
+wide_schema:            Small-projection queries on a wide synthetic dataset (1024 cols × 256 files) — measures per-file metadata overhead
+narrow_schema:          The same queries against an 8-col narrow dataset (same row/file/group shape) — wide-vs-narrow control
 
 # ClickBench Benchmarks
 clickbench_1:           ClickBench queries against a single parquet file
@@ -239,6 +241,9 @@ main() {
                 tpch_csv10)
                     data_tpch "10" "csv"
                     ;;
+                wide_schema|narrow_schema)
+                    data_wide_schema
+                    ;;
                 tpcds)
                     data_tpcds
                     ;;
@@ -444,6 +449,12 @@ main() {
                 tpch_mem10)
                     run_tpch_mem "10"
                     ;;
+                wide_schema)
+                    run_wide_schema
+                    ;;
+                narrow_schema)
+                    run_narrow_schema
+                    ;;
                 tpcds)
                     run_tpcds
                     ;;
@@ -698,6 +709,71 @@ run_tpch() {
       bash -c "$SQL_CARGO_COMMAND"
 }
 
+# Synthesizes two parquet datasets used to measure per-file metadata
+# overhead of a wide schema:
+#
+#   - data/wide_schema/wide/    1024-col events × 256 files (~225 MB)
+#   - data/wide_schema/narrow/    8-col events × 256 files (~110 MB)
+#
+# Both share row count, file count, and per-file row-group shape; only
+# schema width differs. No external data source required — gen_wide_data
+# synthesizes everything from scratch in ~60 s.
+data_wide_schema() {
+    local NUM_FILES=256
+    local ROWS_PER_FILE=50000
+    local WIDTH_FACTOR=128
+
+    local DST_DIR="${DATA_DIR}/wide_schema"
+    local WIDE_DIR="${DST_DIR}/wide"
+    local NARROW_DIR="${DST_DIR}/narrow"
+
+    if [ -d "${WIDE_DIR}" ] && [ "$(ls -A "${WIDE_DIR}" 2>/dev/null | wc -l)" -ge ${NUM_FILES} ]; then
+        echo " wide parquet exists (${WIDE_DIR})."
+    else
+        mkdir -p "${WIDE_DIR}"
+        echo " synthesizing wide -> ${WIDE_DIR} (factor ${WIDTH_FACTOR}, ${NUM_FILES} files × ${ROWS_PER_FILE} rows) ..."
+        debug_run $CARGO_COMMAND --bin gen_wide_data -- \
+            --dst-dir "${WIDE_DIR}" \
+            --width-factor ${WIDTH_FACTOR} \
+            --num-files ${NUM_FILES} \
+            --rows-per-file ${ROWS_PER_FILE}
+    fi
+
+    if [ -d "${NARROW_DIR}" ] && [ "$(ls -A "${NARROW_DIR}" 2>/dev/null | wc -l)" -ge ${NUM_FILES} ]; then
+        echo " narrow parquet exists (${NARROW_DIR})."
+    else
+        mkdir -p "${NARROW_DIR}"
+        echo " synthesizing narrow -> ${NARROW_DIR} (8 base cols, ${NUM_FILES} files × ${ROWS_PER_FILE} rows) ..."
+        debug_run $CARGO_COMMAND --bin gen_wide_data -- \
+            --dst-dir "${NARROW_DIR}" \
+            --width-factor 1 \
+            --num-files ${NUM_FILES} \
+            --rows-per-file ${ROWS_PER_FILE}
+    fi
+}
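
The skip-if-present guard used for both datasets in `data_wide_schema` can be read in isolation as follows. This is a sketch: `has_enough_files` is a hypothetical helper that does not exist in `bench.sh`, and the file names are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when directory $1 exists and contains at
# least $2 entries, mirroring the `ls -A | wc -l` check in data_wide_schema.
has_enough_files() {
    local dir="$1" min="$2"
    [ -d "${dir}" ] && [ "$(ls -A "${dir}" 2>/dev/null | wc -l)" -ge "${min}" ]
}

demo_dir="$(mktemp -d)"
touch "${demo_dir}/part-0.parquet" "${demo_dir}/part-1.parquet"

if has_enough_files "${demo_dir}" 2; then
    echo "at least 2 entries: generation would be skipped"
fi
if ! has_enough_files "${demo_dir}" 3; then
    echo "fewer than 3 entries: generation would run"
fi

rm -rf "${demo_dir}"
```

The check counts directory entries only. It does not validate file contents, so a directory that already holds enough entries is treated as complete regardless of how they got there.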
+
+# Runs the wide_schema benchmark (small-projection queries on the wide dataset).
+run_wide_schema() {
+    echo "Running wide_schema benchmark..."
+
+    debug_run env BENCH_NAME=wide_schema \
+      PREFER_HASH_JOIN="${PREFER_HASH_JOIN}" \
+      SIMULATE_LATENCY="${SIMULATE_LATENCY}" \
+      ${QUERY:+BENCH_QUERY="${QUERY}"}  \
+      bash -c "$SQL_CARGO_COMMAND"
+}
+
+# Runs the same SQL against the narrow dataset — wide-vs-narrow control.
+run_narrow_schema() {
+    echo "Running narrow_schema (baseline) benchmark..."
+
+    debug_run env BENCH_NAME=narrow_schema \
+      PREFER_HASH_JOIN="${PREFER_HASH_JOIN}" \
+      SIMULATE_LATENCY="${SIMULATE_LATENCY}" \
+      ${QUERY:+BENCH_QUERY="${QUERY}"}  \
+      bash -c "$SQL_CARGO_COMMAND"
+}
+
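Both run functions pass `${QUERY:+BENCH_QUERY="${QUERY}"}` to `env`. That expansion produces the assignment only when `QUERY` is set and non-empty, and otherwise expands to nothing. A small sketch of the behaviour follows; `show_bench_query` is a hypothetical helper, and `env -u` is added here only to keep the demonstration independent of the caller's environment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the BENCH_QUERY value a child process sees
# for the current value of QUERY.
show_bench_query() {
    # env -u drops any inherited BENCH_QUERY so only the expansion decides.
    env -u BENCH_QUERY ${QUERY:+BENCH_QUERY="${QUERY}"} \
        bash -c 'echo "BENCH_QUERY=${BENCH_QUERY:-<unset>}"'
}

QUERY=""
show_bench_query    # prints BENCH_QUERY=<unset>

QUERY="13"
show_bench_query    # prints BENCH_QUERY=13
```

With `QUERY` empty, the child never receives a `BENCH_QUERY` variable at all, rather than receiving an empty one.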
 # Runs the tpch in memory (needs tpch parquet data)
 run_tpch_mem() {
     SCALE_FACTOR=$1
 
@@ -41,6 +41,8 @@ in the community:
 | `taxi`                | NYC taxi dataset benchmark                                         |
 | `tpcds`               | TPC‑DS queries                                                     |
 | `tpch`                | TPC‑H queries                                                      |
+| `wide_schema`         | Small-projection queries on a wide (1024-col, 256-file) synthetic dataset |
+| `narrow_schema`       | Same queries on an 8-col, 256-file dataset — wide-vs-narrow control       |
 
 # Running Benchmarks
 
 
@@ -0,0 +1,26 @@
+-- Companion to wide_schema/Q01 — same SQL against the narrow events
+-- dataset (8 cols vs 1024). Apples-to-apples baseline for measuring
+-- wide-schema metadata overhead.
+
+name Q01
+group narrow_schema
+
+init sql_benchmarks/wide_schema/init/set_config.sql
+
+load sql_benchmarks/wide_schema/init/load_narrow.sql
+
+assert I
+SELECT COUNT(*) > 0 from events;
+----
+true
+
+run
+SELECT id, ts, value, text
+FROM events
+WHERE category = 'c0'
+  AND flag     = 'f0'
+  AND status   = 's0'
+ORDER BY ts DESC
+LIMIT 100;
+
+cleanup sql_benchmarks/wide_schema/init/cleanup.sql
@@ -0,0 +1,23 @@
+-- Companion to wide_schema/Q03 — same SQL against the narrow events
+-- dataset (8 cols vs 1024). Apples-to-apples baseline for measuring
+-- wide-schema metadata overhead.
+
+name Q03
+group narrow_schema
+
+init sql_benchmarks/wide_schema/init/set_config.sql
+
+load sql_benchmarks/wide_schema/init/load_narrow.sql
+
+assert I
+SELECT COUNT(*) > 0 from events;
+----
+true
+
+run
+SELECT value
+FROM events
+WHERE id = 12345
+LIMIT 1;
+
+cleanup sql_benchmarks/wide_schema/init/cleanup.sql
@@ -0,0 +1,23 @@
+-- Companion to wide_schema/Q10 — same SQL against the narrow events
+-- dataset (8 cols vs 1024). Apples-to-apples baseline for measuring
+-- wide-schema metadata overhead.
+
+name Q10
+group narrow_schema
+
+init sql_benchmarks/wide_schema/init/set_config.sql
+
+load sql_benchmarks/wide_schema/init/load_narrow.sql
+
+assert I
+SELECT COUNT(*) > 0 from events;
+----
+true
+
+run
+SELECT id, ts, value, text
+FROM events
+WHERE id = 12345
+  AND category = 'c0';
+
+cleanup sql_benchmarks/wide_schema/init/cleanup.sql
@@ -0,0 +1,24 @@
+-- Companion to wide_schema/Q13: same SQL, same row/file/row-group
+-- shape, only the per-file schema width differs. Apples-to-apples
+-- baseline for measuring wide-schema metadata overhead.
+
+name Q13
+group narrow_schema
+
+init sql_benchmarks/wide_schema/init/set_config.sql
+
+load sql_benchmarks/wide_schema/init/load_narrow.sql
+
+assert I
+SELECT COUNT(*) > 0 from events;
+----
+true
+
+run
+SELECT id, ts
+FROM events
+WHERE category = 'c0'
+  AND flag     = 'f0'
+  AND id % 1000 = 0;
+
+cleanup sql_benchmarks/wide_schema/init/cleanup.sql
@@ -0,0 +1,25 @@
+-- Filter on three low-cardinality columns, project four columns,
+-- ORDER BY + LIMIT (TopK shortcut).
+
+name Q01
+group wide_schema
+
+init sql_benchmarks/wide_schema/init/set_config.sql
+
+load sql_benchmarks/wide_schema/init/load_wide.sql
+
+assert I
+SELECT COUNT(*) > 0 from events;
+----
+true
+
+run
+SELECT id, ts, value, text
+FROM events
+WHERE category = 'c0'
+  AND flag     = 'f0'
+  AND status   = 's0'
+ORDER BY ts DESC
+LIMIT 100;
+
+cleanup sql_benchmarks/wide_schema/init/cleanup.sql
@@ -0,0 +1,26 @@
+-- Same shape as Q01 but filtering on duplicated columns far from the
+-- start of the schema (suffix copies _10). Tests whether planning /
+-- pruning cost depends on column position.
+
+name Q02
+group wide_schema
+
+init sql_benchmarks/wide_schema/init/set_config.sql
+
+load sql_benchmarks/wide_schema/init/load_wide.sql
+
+assert I
+SELECT COUNT(*) > 0 from events;
+----
+true
+
+run
+SELECT id, ts, value, text
+FROM events
+WHERE category_10 = 'c0'
+  AND flag_10     = 'f0'
+  AND status_10   = 's0'
+ORDER BY ts DESC
+LIMIT 100;
+
+cleanup sql_benchmarks/wide_schema/init/cleanup.sql
@@ -0,0 +1,22 @@
+-- Project 1 column with a very tight filter. Stresses minimum-
+-- projection pushdown over a wide schema.
+
+name Q03
+group wide_schema
+
+init sql_benchmarks/wide_schema/init/set_config.sql
+
+load sql_benchmarks/wide_schema/init/load_wide.sql
+
+assert I
+SELECT COUNT(*) > 0 from events;
+----
+true
+
+run
+SELECT value
+FROM events
+WHERE id = 12345
+LIMIT 1;
+
+cleanup sql_benchmarks/wide_schema/init/cleanup.sql