Commit b2ce070
perf(upsert): prune destination scan via df partition-column ranges and project join_cols only
Two complementary optimizations to ``Transaction.upsert`` for tables
whose partition spec sources from columns NOT in ``join_cols`` (a
common pattern for append-only event logs partitioned by time but
keyed by composite IDs):
1. Partition-range augmentation: ``upsert_util.augment_filter_with_partition_ranges``
derives ``[min, max]`` predicates from ``df`` for every partition
source column present in the frame and ANDs them into the row
filter built by ``create_match_filter``. ``inclusive_projection``
then projects each range through the partition transform at scan
plan time, enabling manifest- and file-level pruning that the
key-only filter can't trigger.
2. Column-projection for the insert-only path: when
``when_matched_update_all=False`` the consumer loop only reads
``join_cols`` off each destination batch. Passing
``selected_fields=tuple(join_cols)`` to ``DataScan`` lets the
parquet reader prune wide non-key columns. The existing
``_projected_field_ids`` auto-union with row-filter columns keeps
the partition-range predicate's data accessible.
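To make the pruning argument in item 1 concrete, here is a minimal, self-contained sketch in plain Python. It is not the library's code: ``day_transform``, ``project_range``, ``prune_files`` and the ``(path, partition_day)`` file tuples are hypothetical stand-ins, and the table is assumed to be partitioned by day on a timestamp column. It only illustrates why a ``[min, max]`` range on the source column can be projected through a monotonic partition transform and used to skip files that a key-only filter would have to scan.

```python
from datetime import date, datetime

def day_transform(ts: datetime) -> date:
    """Hypothetical stand-in for a day partition transform."""
    return ts.date()

def project_range(lo: datetime, hi: datetime) -> tuple[date, date]:
    """Project an inclusive [lo, hi] source-column range to partition values.

    The day transform is monotonic, so every row in [lo, hi] lands in a
    partition between day(lo) and day(hi), inclusive.
    """
    return day_transform(lo), day_transform(hi)

def prune_files(files: list[tuple[str, date]], lo: datetime, hi: datetime) -> list[str]:
    """Keep only the files whose partition day can contain rows in [lo, hi]."""
    part_lo, part_hi = project_range(lo, hi)
    return [path for path, part_day in files if part_lo <= part_day <= part_hi]

# Assumed destination layout: one data file per day partition.
files = [
    ("a.parquet", date(2024, 1, 1)),
    ("b.parquet", date(2024, 1, 2)),
    ("c.parquet", date(2024, 1, 3)),
]

# Timestamps of the incoming (source) rows; all fall on 2024-01-02.
source_ts = [datetime(2024, 1, 2, 8, 0), datetime(2024, 1, 2, 23, 30)]

kept = prune_files(files, min(source_ts), max(source_ts))
print(kept)
```

In this toy layout only the file for 2024-01-02 survives, while a filter on the composite key alone carries no information about the day partition and would leave all three files in the scan.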
Correctness guards skip the augmentation for an individual column when
that column is absent from ``df``, entirely null, or partially null. In
the partially null case a non-null range predicate would exclude
NULL-partition destination rows whose keys may collide with the
null-partition source rows.
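The guards can be sketched as follows. This is an illustration under stated assumptions, not the body of ``upsert_util.augment_filter_with_partition_ranges``: the frame is modelled as a plain dict of column name to list, predicates are modelled as tuples, and ``partition_range`` and ``augment_filter`` are hypothetical names chosen for the example.

```python
from typing import Any, Optional

def partition_range(df: dict[str, list], column: str) -> Optional[tuple[Any, Any]]:
    """Return (min, max) for a column, or None when a guard applies."""
    if column not in df:
        return None  # guard: source column absent from the frame
    values = df[column]
    non_null = [v for v in values if v is not None]
    if not non_null:
        return None  # guard: column is empty or entirely null
    if len(non_null) != len(values):
        # guard: partially null; a range predicate would drop
        # NULL-partition destination rows that may share keys with
        # the null-partition source rows
        return None
    return min(non_null), max(non_null)

def augment_filter(match_filter: tuple, df: dict[str, list], source_columns: list[str]) -> list[tuple]:
    """Return the predicates to AND together: key filter plus safe ranges."""
    predicates = [match_filter]
    for column in source_columns:
        bounds = partition_range(df, column)
        if bounds is None:
            continue  # skip this column only; other columns still apply
        lo, hi = bounds
        predicates.append((">=", column, lo))
        predicates.append(("<=", column, hi))
    return predicates

df = {
    "id": [1, 2],
    "ts": [10, 20],          # fully populated: range is usable
    "region": ["eu", None],  # partially null: skipped
    "empty": [None, None],   # entirely null: skipped
}
key_filter = ("in", "id", [1, 2])
result = augment_filter(key_filter, df, ["ts", "region", "empty", "missing"])
print(result)
```

Skipping per column rather than abandoning the whole augmentation means a single nullable partition source does not forfeit the pruning available from the others.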
Related to #2138, #2159, #3129. Complementary to (closed-stale) #2943's
"coarse match filter" approach: that PR shrinks the row predicate
itself; this one adds partition pruning the row predicate can't
trigger on its own.
Co-authored-by: Cursor <cursoragent@cursor.com>
1 parent d339391 commit b2ce070
3 files changed
Lines changed: 795 additions & 10 deletions