Commit 1c7630c
feat: SamplePushdown rule + Sample logical/physical nodes
Adds the cross-cutting infrastructure for pushing TABLESAMPLE-shaped
sampling into file sources, with parquet as the first absorbing
source. There is no SQL surface yet; this commit only ships the
primitives. Wiring a RelationPlanner / ExtensionPlanner so it works
out of the box from SQL is the next commit in this stack.
- `Sample` `UserDefinedLogicalNodeCore` extension node in
`datafusion-expr` (`logical_plan/sample.rs`). Schema-preserving;
validates `fraction ∈ (0, 1]`. Currently encodes
`SampleMethod::System` only.
- `SampleExec` placeholder in `datafusion-physical-plan`. Errors at
`execute` (it's a marker — the `SamplePushdown` rule is expected
to remove it). Implements filter / sort pushdown passthrough so
unrelated optimizer rules see straight through it.
- New `try_push_sample` method on `ExecutionPlan` and `FileSource`,
returning `Absorbed { inner }` / `Passthrough` / `Unsupported
{ reason }`. Default is `Unsupported`; per-node `Passthrough`
overrides on filter, projection, coalesce_batches,
coalesce_partitions, repartition, and non-fetch sort.
- `ParquetSource::try_push_sample` runs the (intentionally private)
hierarchical block-level reduction across files / row groups /
rows, with adaptive collapse when an axis can't reduce. Coordinates
with the opener via `pub(crate)` `system_target_remaining` and
`seed` fields on `ParquetSampling`. Single-file, single-row-group
inputs hit ~p × N rows instead of p^(1/3) × N, which exceeds the
p × N target whenever p < 1.
- `REPEATABLE(seed)` is plumbed all the way through: when set,
`ParquetSampling::apply_row_group_sampling` and
`apply_row_fraction_sampling` key only on `(seed, ...)` and ignore
the file path, so the same query is reproducible across
environments.
- `SamplePushdown` optimizer rule (between `PushdownSort` and
`EnsureCooperative`) walks top-down. On `Absorbed` it replaces
`SampleExec` with the rebuilt source; on `Passthrough` it pushes
through the single-child node and recurses; on `Unsupported` it
errors at planning time with `"TABLESAMPLE is not supported for
this source"`. There is intentionally no generic post-scan
`SampleExec` yet.
- EXPLAIN visibility: `ParquetSource::fmt_extra` surfaces
`sample_system_target_remaining` when set.
- `optimizer_rule_reference.md` updated to list `SamplePushdown` in
the documented rule order.
- `explain.slt` updated with `physical_plan after SamplePushdown SAME
TEXT AS ABOVE` lines under each verbose-explain test.
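
For reviewers, the three-way `try_push_sample` contract and the top-down walk described in the bullets above can be sketched on a toy plan tree. Everything here is a hypothetical stand-in: the `Plan` enum, `PushResult`, `push_into`, and `sample_pushdown` are illustrative names, not the types or signatures added in this commit. The sketch only mirrors the Absorbed / Passthrough / Unsupported behaviour, assuming a single passthrough node kind (a filter).

```rust
// Toy model only: hypothetical stand-ins, not DataFusion's actual API.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// Marker node carrying the sampling fraction; must not survive planning.
    Sample { fraction: f64, input: Box<Plan> },
    /// Single-child node that lets a sample move below it.
    Filter { input: Box<Plan> },
    /// Source that can absorb a sample (think: a parquet scan).
    Scan { sample: Option<f64> },
    /// Source with no sampling support.
    OpaqueScan,
}

/// Three-way answer a node gives when asked to take a sample.
enum PushResult {
    Absorbed { inner: Plan },
    Passthrough,
    Unsupported { reason: String },
}

/// Per-node decision; anything not listed falls back to `Unsupported`.
fn try_push_sample(node: &Plan, fraction: f64) -> PushResult {
    match node {
        Plan::Scan { sample: None } => PushResult::Absorbed {
            inner: Plan::Scan { sample: Some(fraction) },
        },
        Plan::Filter { .. } => PushResult::Passthrough,
        _ => PushResult::Unsupported {
            reason: "TABLESAMPLE is not supported for this source".to_string(),
        },
    }
}

/// Push a sample of `fraction` into `node`, returning the rewritten subtree.
fn push_into(node: Plan, fraction: f64) -> Result<Plan, String> {
    match try_push_sample(&node, fraction) {
        // The source rebuilt itself with the sample folded in.
        PushResult::Absorbed { inner } => Ok(inner),
        // Keep the node, keep pushing into its only child.
        PushResult::Passthrough => match node {
            Plan::Filter { input } => Ok(Plan::Filter {
                input: Box::new(push_into(*input, fraction)?),
            }),
            other => Err(format!("passthrough node has no single child: {other:?}")),
        },
        // No generic fallback: fail at planning time.
        PushResult::Unsupported { reason } => Err(reason),
    }
}

/// Top-down rewrite: each `Sample` marker is replaced by whatever absorbs it.
fn sample_pushdown(plan: Plan) -> Result<Plan, String> {
    match plan {
        Plan::Sample { fraction, input } => push_into(*input, fraction),
        Plan::Filter { input } => Ok(Plan::Filter {
            input: Box::new(sample_pushdown(*input)?),
        }),
        leaf => Ok(leaf),
    }
}
```

In this toy, `Sample(0.1) -> Filter -> Scan` rewrites to `Filter -> Scan { sample: Some(0.1) }`, while `Sample(0.1) -> OpaqueScan` returns the planning-time error.
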
Tests: 7 unit tests on `ParquetSource::try_push_sample` covering the
pushdown contract (full / single-file / multi-file / target clamping
/ REPEATABLE determinism / multi-file rounding compensation), and 3
opener end-to-end tests covering the adaptive split for single vs
multi row group inputs and REPEATABLE-seed reproducibility across
file paths.
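
A rough sketch of the two properties the opener tests exercise, under loudly stated assumptions. The actual reduction is intentionally private, so the rule below is a guess for illustration only: an axis with a single block cannot reduce and keeps everything, the remaining axes each take `p^(1/k)`, and the row axis is assumed always reducible. The keep decision uses a SplitMix64-style mixer and an FNV-1a path hash picked for this example; `split_fraction`, `keep_row_group`, `mix`, and `hash_path` are hypothetical names, not functions from this commit.

```rust
/// Per-axis keep fractions; their product equals the requested fraction.
#[derive(Debug, Clone, Copy, PartialEq)]
struct AxisFractions {
    files: f64,
    row_groups: f64,
    rows: f64,
}

/// Split `p` across the axes that can actually reduce (assumed rule).
fn split_fraction(p: f64, n_files: usize, n_row_groups: usize) -> Result<AxisFractions, String> {
    // Same domain the `Sample` node validates: fraction in (0, 1].
    if !(p > 0.0 && p <= 1.0) {
        return Err(format!("fraction must be in (0, 1], got {p}"));
    }
    let files_can_reduce = n_files > 1;
    let groups_can_reduce = n_row_groups > 1;
    // Rows always count as one reducible axis in this sketch.
    let axes = 1 + files_can_reduce as i32 + groups_can_reduce as i32;
    let share = p.powf(1.0 / axes as f64);
    Ok(AxisFractions {
        files: if files_can_reduce { share } else { 1.0 },
        row_groups: if groups_can_reduce { share } else { 1.0 },
        rows: share,
    })
}

/// SplitMix64-style bit mixer: fixed constants, no external crates.
fn mix(mut x: u64) -> u64 {
    x = x.wrapping_add(0x9E3779B97F4A7C15);
    x = (x ^ (x >> 30)).wrapping_mul(0xBF58476D1CE4E5B9);
    x = (x ^ (x >> 27)).wrapping_mul(0x94D049BB133111EB);
    x ^ (x >> 31)
}

/// FNV-1a over the path bytes; only consulted when no seed is given.
fn hash_path(path: &str) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for b in path.bytes() {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Decide whether to keep row group `index`.
/// With a seed, the key is `(seed, index)` and the path is ignored, so the
/// same decision is made wherever the file lives. Without a seed, this
/// sketch assumes the path takes the seed's place in the key.
fn keep_row_group(seed: Option<u64>, path: &str, index: u64, fraction: f64) -> bool {
    let base = match seed {
        Some(s) => s,
        None => hash_path(path),
    };
    let key = mix(base ^ mix(index));
    // Map the top 53 bits to a value in [0, 1).
    let u = (key >> 11) as f64 / (1u64 << 53) as f64;
    u < fraction
}
```

With one file and one row group the whole fraction lands on the row axis (`rows` is `p`), and with several files and row groups each axis gets roughly `p^(1/3)`. Calling `keep_row_group` with the same `Some(seed)` and index gives the same answer for any path.
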
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 2026412 commit 1c7630c
22 files changed
Lines changed: 1570 additions & 26 deletions
File tree
- datafusion
  - core/src
  - datasource-parquet/src
  - datasource/src
  - expr/src/logical_plan
  - physical-optimizer/src
  - physical-plan/src
    - repartition
    - sorts
  - sqllogictest/test_files