@@ -132,6 +132,15 @@ def _attn_work_from_batch(
     CodonFM currently runs FSDP without CP (cp_size=1), but the formula stays correct
     if CP is added later.
     Int32 lens cast to int64 BEFORE squaring (overflow at L ≈ 46k otherwise).
+
+    NOTE: With the collator's ``pad_to_multiple_of`` option (FP8/FP4 alignment, inlined
+    in ``CodonTHDCollator.__call__`` in dataset.py), the cu_seq_lens_q tensor is mutated
+    in place to include one or more appended mock pad sequences and no
+    ``cu_seq_lens_q_padded`` key is written (that key is reserved for TE's per-sequence
+    CP padding). In that path the unpadded and padded metrics collapse, inflated by
+    ≤``pad_to_multiple_of²`` relative to the real Σ(Lᵢ²) — typically <10⁻⁵ and below
+    measurement noise. Known limitation; see
+    https://github.com/NVIDIA/bionemo-framework/issues/1561.
     """
     if include_padding:
         cu = batch.get("cu_seq_lens_q_padded")
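The int64-before-squaring remark in the docstring can be checked with a small sketch. This is not the code under review: `seq_lens_from_cu`, `sum_sq_lens`, and `square_as_int32` are hypothetical helper names, plain Python lists stand in for the `cu_seq_lens_q` tensor, and `ctypes.c_int32` is used only to emulate 32-bit wraparound.

```python
import ctypes


def seq_lens_from_cu(cu_seq_lens):
    """Per-sequence lengths from cumulative boundaries, e.g. [0, 3, 7] -> [3, 4]."""
    return [end - start for start, end in zip(cu_seq_lens, cu_seq_lens[1:])]


def sum_sq_lens(cu_seq_lens):
    """Sum of squared lengths, computed with Python ints.

    Python ints are arbitrary precision, so this stands in for the
    "cast to int64 first" path described in the docstring.
    """
    return sum(length * length for length in seq_lens_from_cu(cu_seq_lens))


def square_as_int32(length):
    """Emulate squaring in signed 32-bit arithmetic (wraps silently on overflow)."""
    return ctypes.c_int32(length * length).value


# 46340**2 still fits in int32; 46341**2 exceeds 2**31 - 1 and wraps negative.
fits = square_as_int32(46340)
wraps = square_as_int32(46341)
```

The threshold follows from sqrt(2³¹) ≈ 46341, which appears to be where the docstring's "L ≈ 46k" figure comes from.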
@@ -129,6 +129,14 @@ def _attn_work_from_batch(
     * BSHD: uses full ``input_ids.shape``, scaled by ``cp_size²``.

     Int32 lens cast to int64 BEFORE squaring (overflow at L ≈ 46k otherwise).
+
+    NOTE: With the collator's ``pad_to_multiple_of`` option (FP8/FP4 alignment), the
+    cu_seq_lens_q tensor is mutated in place to include an appended mock pad sequence
+    and no ``cu_seq_lens_q_padded`` key is written (that key is reserved for TE's
+    per-sequence CP padding). In that path the unpadded and padded metrics collapse,
+    inflated by ≤``pad_to_multiple_of²`` relative to the real Σ(Lᵢ²) — typically
+    <10⁻⁵ and below measurement noise. Known limitation; see
+    https://github.com/NVIDIA/bionemo-framework/issues/1561.
     """
     if include_padding:
         cu = batch.get("cu_seq_lens_q_padded")
@@ -131,6 +131,14 @@ def _attn_work_from_batch(
     scaled by ``cp_size²``.

     Int32 lens cast to int64 BEFORE squaring (overflow at L ≈ 46k otherwise).
+
+    NOTE: With the collator's ``pad_to_multiple_of`` option (FP8/FP4 alignment), the
+    cu_seq_lens_q tensor is mutated in place to include an appended mock pad sequence
+    and no ``cu_seq_lens_q_padded`` key is written (that key is reserved for TE's
+    per-sequence CP padding). In that path the unpadded and padded metrics collapse,
+    inflated by ≤``pad_to_multiple_of²`` relative to the real Σ(Lᵢ²) — typically
+    <10⁻⁵ and below measurement noise. Known limitation; see
+    https://github.com/NVIDIA/bionemo-framework/issues/1561.
     """
     if include_padding:
         cu = batch.get("cu_seq_lens_q_padded")
@@ -136,6 +136,14 @@ def _attn_work_from_batch(
     * BSHD: uses full ``input_ids.shape``, scaled by ``cp_size²``.

     Int32 lens cast to int64 BEFORE squaring (overflow at L ≈ 46k otherwise).
+
+    NOTE: With the collator's ``pad_to_multiple_of`` option (FP8/FP4 alignment), the
+    cu_seq_lens_q tensor is mutated in place to include an appended mock pad sequence
+    and no ``cu_seq_lens_q_padded`` key is written (that key is reserved for TE's
+    per-sequence CP padding). In that path the unpadded and padded metrics collapse,
+    inflated by ≤``pad_to_multiple_of²`` relative to the real Σ(Lᵢ²) — typically
+    <10⁻⁵ and below measurement noise. Known limitation; see
+    https://github.com/NVIDIA/bionemo-framework/issues/1561.
     """
     if include_padding:
         cu = batch.get("cu_seq_lens_q_padded")
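The inflation bound quoted in the NOTE can be illustrated with a rough sketch. It assumes the collator appends a single mock sequence that rounds the total token count up to the next multiple; `append_mock_pad` and `sum_sq_lens` are hypothetical names, not functions from the repository, and plain lists stand in for the `cu_seq_lens_q` tensor.

```python
def append_mock_pad(cu_seq_lens, pad_to_multiple_of):
    """Return cumulative boundaries with one mock pad sequence appended, if needed.

    Assumption: the pad rounds the total token count up to the next multiple.
    """
    total = cu_seq_lens[-1]
    pad = (-total) % pad_to_multiple_of  # tokens needed to reach the next multiple
    if pad == 0:
        return list(cu_seq_lens)
    return list(cu_seq_lens) + [total + pad]


def sum_sq_lens(cu_seq_lens):
    """Sum of squared per-sequence lengths from cumulative boundaries."""
    return sum((end - start) ** 2 for start, end in zip(cu_seq_lens, cu_seq_lens[1:]))


# Illustrative values only: three sequences totalling 20001 tokens, aligned to 16.
real_cu = [0, 5000, 12000, 20001]
padded_cu = append_mock_pad(real_cu, 16)

real_work = sum_sq_lens(real_cu)
inflation = sum_sq_lens(padded_cu) - real_work  # square of the mock pad length
relative_inflation = inflation / real_work
```

Under this assumption the mock sequence is shorter than `pad_to_multiple_of`, so the absolute inflation stays below `pad_to_multiple_of²`. Splitting the same pad across several mock sequences can only lower the sum of squares, so the bound would also cover the "one or more" case described for the CodonFM collator.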