Commit 74914e0
feat: SamplePushdown rule + Sample logical/physical nodes for parquet
Adds the infrastructure for pushing TABLESAMPLE-shaped sampling into
file sources, with parquet as the first absorbing source. There is
no SQL surface yet; this commit only ships the primitives. Wiring a
RelationPlanner / ExtensionPlanner so it works out of the box from
SQL is a follow-up.
- `Sample` `UserDefinedLogicalNodeCore` extension node in
`datafusion-expr` (`logical_plan/sample.rs`). Schema-preserving;
validates `fraction ∈ (0, 1]`. Currently encodes
`SampleMethod::System` only.
- `SampleExec` placeholder in `datafusion-physical-plan`. Errors at
`execute` (it's a marker — the `SamplePushdown` rule is expected
to remove it). Implements filter / sort pushdown passthrough so
unrelated optimizer rules see straight through it.
- New `try_push_sample` method on `ExecutionPlan` and `FileSource`,
returning `Absorbed { inner }` / `Passthrough` / `Unsupported
{ reason }`. Default is `Unsupported`; per-node `Passthrough`
overrides on filter, projection, coalesce_batches,
coalesce_partitions, repartition, and non-fetch sort.
- `ParquetSource::try_push_sample` runs the (intentionally private)
hierarchical block-level reduction across files / row groups /
rows, with adaptive collapse when an axis can't reduce. Coordinates
with the opener via a `pub(crate)` `system_target_remaining` field
on `ParquetSampling`. Single-file, single-row-group inputs yield
~p × N rows instead of under-reducing to p^(1/3) × N rows (more than
requested, since p^(1/3) ≥ p for p in (0, 1]).
- `SamplePushdown` optimizer rule (between `PushdownSort` and
`EnsureCooperative`) walks top-down. On `Absorbed` it replaces
`SampleExec` with the rebuilt source; on `Passthrough` it pushes
through the single-child node and recurses; on `Unsupported` it
errors at planning time with `"TABLESAMPLE is not supported for
this source"`. There is intentionally no generic post-scan
`SampleExec` yet.
- EXPLAIN visibility: `ParquetSource::fmt_extra` surfaces
`sample_system_target_remaining` when set.
- `optimizer_rule_reference.md` updated to list `SamplePushdown` in
the documented rule order.
- `explain.slt` updated with `physical_plan after SamplePushdown SAME
TEXT AS ABOVE` lines under each verbose-explain test.
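The three-way contract and the top-down walk described in the list above can be sketched with a toy plan tree. This is a minimal illustration, not the code from this commit: `Plan`, `PushResult`, `push_into`, and `sample_pushdown` are hypothetical stand-ins for `ExecutionPlan`, the `try_push_sample` return type, and the `SamplePushdown` rule, and only `Filter` is modeled as a `Passthrough` node.

```rust
/// Toy plan tree (hypothetical stand-in for `Arc<dyn ExecutionPlan>`).
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// Marker node; the rewrite below is expected to remove it.
    Sample { fraction: f64, input: Box<Plan> },
    /// One of the single-child nodes the commit lists as `Passthrough`.
    Filter { input: Box<Plan> },
    /// A source that can absorb the sample.
    ParquetScan { sample: Option<f64> },
    /// A source that cannot absorb the sample.
    CsvScan,
}

/// Simplified version of the three outcomes named in the commit message.
#[derive(Debug, Clone, PartialEq)]
enum PushResult {
    Absorbed { inner: Plan },
    Passthrough,
    Unsupported { reason: String },
}

impl Plan {
    /// Default is `Unsupported`; specific nodes override it.
    fn try_push_sample(&self, fraction: f64) -> PushResult {
        match self {
            Plan::ParquetScan { .. } => PushResult::Absorbed {
                inner: Plan::ParquetScan { sample: Some(fraction) },
            },
            Plan::Filter { .. } => PushResult::Passthrough,
            _ => PushResult::Unsupported {
                reason: "TABLESAMPLE is not supported for this source".to_string(),
            },
        }
    }
}

/// Push a sample into `node`, returning the rewritten subtree or an error.
fn push_into(node: &Plan, fraction: f64) -> Result<Plan, String> {
    match node.try_push_sample(fraction) {
        // The source rebuilt itself with the sample applied.
        PushResult::Absorbed { inner } => Ok(inner),
        // Keep the node, push the sample into its single child.
        PushResult::Passthrough => match node {
            Plan::Filter { input } => Ok(Plan::Filter {
                input: Box::new(push_into(input, fraction)?),
            }),
            _ => Err("Passthrough from a node without a single child".to_string()),
        },
        // No generic fallback: fail at planning time.
        PushResult::Unsupported { reason } => Err(reason),
    }
}

/// Top-down walk: replace each `Sample` marker with its pushed-down input.
fn sample_pushdown(plan: &Plan) -> Result<Plan, String> {
    match plan {
        Plan::Sample { fraction, input } => push_into(input, *fraction),
        Plan::Filter { input } => Ok(Plan::Filter {
            input: Box::new(sample_pushdown(input)?),
        }),
        leaf => Ok(leaf.clone()),
    }
}
```

In this sketch, `Sample` over `Filter` over `ParquetScan` rewrites to `Filter` over a `ParquetScan` carrying the fraction, while `Sample` over `CsvScan` returns the planning-time error string quoted above.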
Tests: 7 unit tests on `ParquetSource::try_push_sample` covering the
pushdown contract (full / single-file / multi-file / target clamping
/ REPEATABLE determinism / multi-file rounding compensation), and 2
opener end-to-end tests covering the adaptive split for single vs
multi row group inputs.
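The arithmetic behind the adaptive split can be illustrated as follows. The real reduction is intentionally private, so this is only a sketch under an assumed model: the fraction p is spread evenly over the axes (files, row groups, rows) that have more than one unit, and an axis with a single unit keeps everything. `fixed_split`, `adaptive_split`, and `effective_fraction` are hypothetical names.

```rust
/// Fixed split: every axis is asked to keep p^(1/3), whether or not it can reduce.
fn fixed_split(p: f64) -> [f64; 3] {
    [p.cbrt(); 3]
}

/// Adaptive split (assumed model): only axes with more than one unit share
/// the reduction, each keeping p^(1/k) where k is the number of such axes.
fn adaptive_split(p: f64, units: [usize; 3]) -> [f64; 3] {
    let k = units.iter().filter(|n| **n > 1).count();
    let mut out = [1.0; 3];
    if k == 0 {
        return out; // nothing can be reduced
    }
    let share = p.powf(1.0 / k as f64);
    for (slot, n) in out.iter_mut().zip(units.iter()) {
        if *n > 1 {
            *slot = share;
        }
    }
    out
}

/// Fraction of rows that survives: an axis with a single unit cannot drop
/// anything, so its requested keep fraction is ignored.
fn effective_fraction(split: [f64; 3], units: [usize; 3]) -> f64 {
    split
        .iter()
        .zip(units.iter())
        .map(|(f, n)| if *n > 1 { *f } else { 1.0 })
        .product()
}
```

With one file, one row group, 1000 rows, and p = 0.125, the fixed split keeps about 0.5 of the rows (500), while the adaptive split keeps 0.125 (125), which matches the ~p × N behavior described for single-file, single-row-group inputs.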
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 3d0dc4a commit 74914e0
21 files changed
Lines changed: 1414 additions & 30 deletions
File tree
- datafusion
  - core/src
  - datasource-parquet/src
  - datasource/src
  - expr/src/logical_plan
  - physical-optimizer/src
  - physical-plan/src
    - repartition
    - sorts
  - sqllogictest/test_files