Commit 9d647dc
perf(parquet): force-inline ByteBudgetChunker hot path, split cold path out
The previous `#[inline]` hint stopped being enough once
`pick_sub_batch_size` grew the `ValueCountStrategy` match: LLVM
silently stopped inlining it, and the most recent GKE bench bounced
`string_dictionary/*` back to +46–86% (`default` +81%, `parquet_2`
+86%, `bloom_filter` +46%).
Fix:
1. Mark `pick_sub_batch_size` `#[inline(always)]`. The hot path is
just `if static_always_fits || has_dictionary || chunk_size == 0 {
return chunk_size; }` — one struct-field load + one virtual call —
so unconditional inlining is the right call, not a heuristic
suggestion.
2. Pull the byte-budget computation out into a separate
`byte_budget_sub_batch_size` method marked `#[inline(never)]`. This
keeps the inlined fast path small even as the slow path grows; the
slow path is paid for explicitly when bypasses don't fire, not
smuggled into every chunk's inline body.
Same behavior, just compiler-friendlier code layout.
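The hot/cold split described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the `Encoder` trait, the `FixedWidth` type, the `byte_budget` field, and the budget arithmetic are all assumptions made so the example compiles on its own. Only the two method names, the bypass condition, and the inline attributes come from the message.

```rust
/// Hypothetical stand-in for whatever the chunker queries per column.
trait Encoder {
    fn has_dictionary(&self) -> bool;
    fn estimated_bytes_per_value(&self) -> usize;
}

struct ByteBudgetChunker {
    static_always_fits: bool,
    /// Assumed field: maximum bytes allowed per sub-batch.
    byte_budget: usize,
}

impl ByteBudgetChunker {
    /// Hot path: cheap bypass checks only, forced into every caller.
    #[inline(always)]
    fn pick_sub_batch_size(&self, encoder: &dyn Encoder, chunk_size: usize) -> usize {
        if self.static_always_fits || encoder.has_dictionary() || chunk_size == 0 {
            return chunk_size;
        }
        self.byte_budget_sub_batch_size(encoder, chunk_size)
    }

    /// Cold path: kept out of line so its growth cannot bloat the callers.
    /// The arithmetic here is a placeholder, not the real computation.
    #[inline(never)]
    fn byte_budget_sub_batch_size(&self, encoder: &dyn Encoder, chunk_size: usize) -> usize {
        let per_value = encoder.estimated_bytes_per_value().max(1);
        // At least one value per sub-batch, never more than the chunk holds.
        (self.byte_budget / per_value).max(1).min(chunk_size)
    }
}

/// Hypothetical encoder used only to exercise the sketch.
struct FixedWidth {
    dict: bool,
    bytes_per_value: usize,
}

impl Encoder for FixedWidth {
    fn has_dictionary(&self) -> bool {
        self.dict
    }
    fn estimated_bytes_per_value(&self) -> usize {
        self.bytes_per_value
    }
}
```

With a 1024-byte budget and 16-byte values, a 1000-value chunk would be cut to 64 values in this sketch, while a dictionary-encoded column returns the full chunk size without ever calling the out-of-line method. Whether the compiler honors the layout in a given build is best confirmed by inspecting the generated assembly.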
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
1 parent bb19d3e commit 9d647dc
1 file changed
Lines changed: 34 additions & 5 deletions
Diff hunks (line contents not preserved):
@@ -134,12 +134,14 @@ old lines 137-140 and 142 removed; new lines 137-142 and 144 added
@@ -154,6 +156,33 @@ new lines 159-185 added