Commit 0b13cb9
perf(parquet): mark cold paths #[cold] so they move out of hot icache
The GKE bench shows `string_dictionary/*` regressed by roughly 80% on
every commit of this branch, even though the chunker's fast path just
returns `chunk_size` after a single struct-field load whenever
`has_dictionary()` is true. That holds for the entire
`string_dictionary` bench, since `create_random_batch` produces a
low-cardinality dictionary that never spills the writer's encoder.
Working hypothesis: the regression is icache pressure from the new
code's mere presence. The cold path (`byte_budget_sub_batch_size`,
`write_granular_chunk`) is never executed for `string_dictionary` but
sits inline near the encoder's hot path and pushes hot bytes out of
L1i.
Mark both cold paths `#[cold]` so LLVM treats them as unlikely and can
place them away from the hot code (typically in a separate text
section). The hot encoder loop should then stay tighter in icache.
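
A minimal sketch of the pattern, not the actual diff: `Chunker`, its fields, `next_chunk_len` and `byte_budget_len` are hypothetical stand-ins for the real chunker and for `byte_budget_sub_batch_size`. `#[cold]` is only a hint to the optimizer, and the extra `#[inline(never)]` is an assumption added here to keep the slow body out of the caller; the commit message mentions only `#[cold]`.

```rust
/// Hypothetical stand-in for the chunker; all names are illustrative.
pub struct Chunker {
    chunk_size: usize,
    has_dictionary: bool,
    byte_budget: usize,
}

impl Chunker {
    pub fn new(chunk_size: usize, has_dictionary: bool, byte_budget: usize) -> Self {
        Self { chunk_size, has_dictionary, byte_budget }
    }

    /// Hot path: one branch and one field load while the dictionary is active.
    #[inline]
    pub fn next_chunk_len(&self, value_lens: &[usize]) -> usize {
        if self.has_dictionary {
            return self.chunk_size;
        }
        self.byte_budget_len(value_lens)
    }

    /// Cold path: `#[cold]` hints that calls are unlikely, and
    /// `#[inline(never)]` (an assumption, not in the commit) keeps the
    /// body from being inlined next to the hot code.
    #[cold]
    #[inline(never)]
    fn byte_budget_len(&self, value_lens: &[usize]) -> usize {
        let mut used = 0usize;
        let mut n = 0usize;
        for &len in value_lens.iter().take(self.chunk_size) {
            // Always accept at least one value, then stop at the budget.
            if n > 0 && used + len > self.byte_budget {
                break;
            }
            used += len;
            n += 1;
        }
        n
    }
}
```

With the dictionary active, `Chunker::new(3, true, 8).next_chunk_len(&[4, 4, 4, 4])` returns `chunk_size` without touching the cold function; with it inactive, the same call falls through to the byte-budget scan.
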
This is a hypothesis-driven attempt: if the GKE numbers don't move,
that tells us the regression source is elsewhere and we keep digging.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
1 parent 9d647dc commit 0b13cb9
2 files changed
Lines changed: 11 additions & 1 deletion
File 1: hunk at original lines 170-178 (1 deletion, 4 additions); diff content not captured.
File 2: hunk at original lines 774-780 (7 additions); diff content not captured.