Skip to content

Commit 0b13cb9

Browse files
adriangbclaude
andcommitted
perf(parquet): mark cold paths #[cold] so they move out of hot icache
The GKE bench shows `string_dictionary/*` consistently ~+80% across every branch commit, even though the chunker's fast path returns `chunk_size` with a single struct-field load while `has_dictionary()` is true (which it is for the entire `string_dictionary` bench since `create_random_batch` produces a low-cardinality dict that doesn't spill the writer's encoder). Working hypothesis: the regression is icache pressure from the new code's mere presence. The cold path (`byte_budget_sub_batch_size`, `write_granular_chunk`) is never executed for `string_dictionary` but sits inline near the encoder's hot path and pushes hot bytes out of L1i. Mark both cold paths `#[cold]` so LLVM places them in a separate text section. The hot encoder loop should stay tighter in icache. This is a hypothesis-driven attempt; if GKE doesn't move it tells us the regression source is somewhere else and we keep digging. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
1 parent 9d647dc commit 0b13cb9

2 files changed

Lines changed: 11 additions & 1 deletion

File tree

parquet/src/column/writer/byte_budget_chunker.rs

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -170,9 +170,12 @@ impl ByteBudgetChunker {
170170
/// Cold path: the encoder is plain-encoding and the bypass conditions
171171
/// didn't fire, so we have to look at value sizes to decide whether
172172
/// the chunk fits. Pulled out of `pick_sub_batch_size` and marked
173-
/// `#[inline(never)]` so the inlined fast path stays small.
173+
/// `#[inline(never)]` + `#[cold]` so the inlined fast path stays
174+
/// small and the dead-code placement signal pushes this body
175+
/// physically away from the hot encoder loop's icache footprint.
174176
#[allow(clippy::too_many_arguments)]
175177
#[inline(never)]
178+
#[cold]
176179
fn byte_budget_sub_batch_size<E: ColumnValueEncoder>(
177180
&self,
178181
values: &E::Values,

parquet/src/column/writer/mod.rs

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -774,7 +774,14 @@ impl<'a, E: ColumnValueEncoder> GenericColumnWriter<'a, E> {
774774
/// record never spans data pages, matching the parquet format rule.
775775
///
776776
/// Returns the total number of values consumed across all sub-batches.
777+
///
778+
/// Marked `#[cold]` because the byte-budget path that calls this
779+
/// fires only on columns whose values are individually larger than
780+
/// `data_page_size_limit / write_batch_size` (e.g. multi-MiB
781+
/// BYTE_ARRAY blobs). Keeping it out of the hot section lets the
782+
/// hot `write_mini_batch` path keep its icache locality.
777783
#[allow(clippy::too_many_arguments)]
784+
#[cold]
778785
fn write_granular_chunk(
779786
&mut self,
780787
values: &E::Values,

0 commit comments

Comments
 (0)