Commit 29121fe

docs(perf_logger): note pad_to_multiple_of / cu_seq_lens_q collapse (#1561)
When the collator's pad_to_multiple_of option is set (FP8/FP4 alignment), cu_seq_lens_q is mutated in place to include an appended mock pad sequence, and no cu_seq_lens_q_padded key is written (that key is reserved for TE's per-sequence CP padding). In that path the unpadded and padded MFU metrics collapse to the same value, inflated by at most pad_to_multiple_of² on top of the real Σ(Lᵢ²). The resulting relative error is typically <10⁻⁵, below measurement noise.

Documented as a known limitation in _attn_work_from_batch's docstring in all four MFU-tracking recipes (esm2, llama3, opengenome2_llama, codonfm), with a pointer to issue #1561 for the full analysis and proposed fixes. No behavior change.

Signed-off-by: Gagan Kaushik <gkaushik@nvidia.com>
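To make the size of the inflation concrete, here is a minimal pure-Python sketch of the Σ(Lᵢ²) term computed from cumulative sequence offsets, with and without an appended mock pad sequence. The helper names `attn_work` and `append_mock_pad` and the example lengths are assumptions for illustration only. They are not the recipes' actual collator or `_attn_work_from_batch` code.

```python
def attn_work(cu_seq_lens):
    """Return the sum of squared per-sequence lengths from cumulative offsets."""
    lens = [end - start for start, end in zip(cu_seq_lens, cu_seq_lens[1:])]
    return sum(length * length for length in lens)


def append_mock_pad(cu_seq_lens, pad_to_multiple_of):
    """Hypothetical stand-in for the collator's padding step.

    Appends one pad sequence so the total token count becomes a multiple
    of pad_to_multiple_of; returns an unchanged copy if already aligned.
    """
    total = cu_seq_lens[-1]
    pad = -total % pad_to_multiple_of
    if pad == 0:
        return list(cu_seq_lens)
    return list(cu_seq_lens) + [total + pad]


# Illustrative batch: three sequences of 3000, 4000 and 1187 tokens.
cu_real = [0, 3000, 7000, 8187]
cu_padded = append_mock_pad(cu_real, 16)  # total 8187 is padded up to 8192

real = attn_work(cu_real)
inflated = attn_work(cu_padded)
rel_err = (inflated - real) / real

print(cu_padded, inflated - real, rel_err)
```

In this example the pad sequence has 5 tokens, so it adds 25 to Σ(Lᵢ²), well under the 16² = 256 bound, and the relative error is on the order of 10⁻⁶.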
1 parent 44172ae commit 29121fe

4 files changed

Lines changed: 33 additions & 0 deletions

bionemo-recipes/recipes/codonfm_native_te/perf_logger.py

Lines changed: 9 additions & 0 deletions
@@ -132,6 +132,15 @@ def _attn_work_from_batch(
     CodonFM currently runs FSDP without CP (cp_size=1), but the formula stays correct
     if CP is added later.
     Int32 lens cast to int64 BEFORE squaring (overflow at L ≈ 46k otherwise).
+
+    NOTE: With the collator's ``pad_to_multiple_of`` option (FP8/FP4 alignment, inlined
+    in ``CodonTHDCollator.__call__`` in dataset.py), the cu_seq_lens_q tensor is mutated
+    in place to include one or more appended mock pad sequences and no
+    ``cu_seq_lens_q_padded`` key is written (that key is reserved for TE's per-sequence
+    CP padding). In that path the unpadded and padded metrics collapse, inflated by
+    ≤``pad_to_multiple_of²`` relative to the real Σ(Lᵢ²), typically <10⁻⁵ and below
+    measurement noise. Known limitation; see
+    https://github.com/NVIDIA/bionemo-framework/issues/1561.
     """
     if include_padding:
         cu = batch.get("cu_seq_lens_q_padded")
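The overflow threshold mentioned in the docstring can be checked directly: the largest length whose square still fits in a signed 32-bit integer is 46340. The sketch below emulates int32 wraparound with `ctypes.c_int32` as a stand-in for an int32 tensor dtype. The helper name `square_as_int32` is hypothetical, and the recipes themselves operate on tensors rather than Python ints.

```python
import ctypes

INT32_MAX = 2**31 - 1


def square_as_int32(length):
    """Square a length and wrap the result to signed 32 bits (emulated int32)."""
    return ctypes.c_int32(length * length).value


safe = square_as_int32(46340)     # still fits in int32
wrapped = square_as_int32(46341)  # exceeds INT32_MAX and wraps negative
exact = 46341 * 46341             # exact value, as a wide-enough integer type would give

print(safe, wrapped, exact, exact > INT32_MAX)
```

Casting the lengths to int64 before squaring avoids the wraparound, since the square of any int32 length fits in 64 bits.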

bionemo-recipes/recipes/esm2_native_te/perf_logger.py

Lines changed: 8 additions & 0 deletions
@@ -129,6 +129,14 @@ def _attn_work_from_batch(
     * BSHD: uses full ``input_ids.shape``, scaled by ``cp_size²``.
 
     Int32 lens cast to int64 BEFORE squaring (overflow at L ≈ 46k otherwise).
+
+    NOTE: With the collator's ``pad_to_multiple_of`` option (FP8/FP4 alignment), the
+    cu_seq_lens_q tensor is mutated in place to include an appended mock pad sequence
+    and no ``cu_seq_lens_q_padded`` key is written (that key is reserved for TE's
+    per-sequence CP padding). In that path the unpadded and padded metrics collapse,
+    inflated by ≤``pad_to_multiple_of²`` relative to the real Σ(Lᵢ²), typically
+    <10⁻⁵ and below measurement noise. Known limitation; see
+    https://github.com/NVIDIA/bionemo-framework/issues/1561.
     """
     if include_padding:
         cu = batch.get("cu_seq_lens_q_padded")

bionemo-recipes/recipes/llama3_native_te/perf_logger.py

Lines changed: 8 additions & 0 deletions
@@ -131,6 +131,14 @@ def _attn_work_from_batch(
       scaled by ``cp_size²``.
 
     Int32 lens cast to int64 BEFORE squaring (overflow at L ≈ 46k otherwise).
+
+    NOTE: With the collator's ``pad_to_multiple_of`` option (FP8/FP4 alignment), the
+    cu_seq_lens_q tensor is mutated in place to include an appended mock pad sequence
+    and no ``cu_seq_lens_q_padded`` key is written (that key is reserved for TE's
+    per-sequence CP padding). In that path the unpadded and padded metrics collapse,
+    inflated by ≤``pad_to_multiple_of²`` relative to the real Σ(Lᵢ²), typically
+    <10⁻⁵ and below measurement noise. Known limitation; see
+    https://github.com/NVIDIA/bionemo-framework/issues/1561.
     """
     if include_padding:
         cu = batch.get("cu_seq_lens_q_padded")

bionemo-recipes/recipes/opengenome2_llama_native_te/perf_logger.py

Lines changed: 8 additions & 0 deletions
@@ -136,6 +136,14 @@ def _attn_work_from_batch(
     * BSHD: uses full ``input_ids.shape``, scaled by ``cp_size²``.
 
     Int32 lens cast to int64 BEFORE squaring (overflow at L ≈ 46k otherwise).
+
+    NOTE: With the collator's ``pad_to_multiple_of`` option (FP8/FP4 alignment), the
+    cu_seq_lens_q tensor is mutated in place to include an appended mock pad sequence
+    and no ``cu_seq_lens_q_padded`` key is written (that key is reserved for TE's
+    per-sequence CP padding). In that path the unpadded and padded metrics collapse,
+    inflated by ≤``pad_to_multiple_of²`` relative to the real Σ(Lᵢ²), typically
+    <10⁻⁵ and below measurement noise. Known limitation; see
+    https://github.com/NVIDIA/bionemo-framework/issues/1561.
     """
     if include_padding:
         cu = batch.get("cu_seq_lens_q_padded")
