Commit 0883c09
Fix Float8CurrentScaling NaN for CodonFM: init TE layers on CUDA (#1539)
## Summary
Fix nightly CI failure in `unit-tests-recipes.yml` ([run
#23790357242](https://github.com/NVIDIA/bionemo-framework/actions/runs/23790357242)).
### Root Cause
`CodonFMEncoder` and `CodonFMLMHead` initialized TransformerEngine
layers on `"cpu"` instead of `"cuda"` (unlike ESM2 and all other
models). In `test_legacy_quantized_model_init_forward_and_backward`, the
model is created inside a `quantized_model_init(Float8CurrentScaling)`
context and then moved with `model.to("cuda")`. Moving FP8-quantized tensors
from CPU to CUDA corrupts `Float8CurrentScaling`'s scale metadata,
producing a NaN loss.
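
To make the failure mode concrete, here is a minimal pure-Python sketch of per-tensor scaling. It only illustrates how invalid scale metadata can propagate into a NaN loss. It is not TransformerEngine's implementation, and it does not reproduce the CPU→CUDA move itself. The helper names (`current_scale`, `quantize`, `dequantize`) are hypothetical, and rounding to actual FP8 values is omitted.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite magnitude in the FP8 E4M3 format


def current_scale(values):
    """Hypothetical helper: derive a scale from the tensor's current amax."""
    amax = max(abs(v) for v in values)
    return FP8_E4M3_MAX / amax


def quantize(values, scale):
    """Scale into the FP8 range and clamp (rounding to FP8 is omitted)."""
    return [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v * scale)) for v in values]


def dequantize(q_values, scale_inv):
    """Recover approximate values from quantized data and the stored inverse scale."""
    return [q * scale_inv for q in q_values]


weights = [0.5, -2.0, 1.25]
scale = current_scale(weights)
q_weights = quantize(weights, scale)

# With intact metadata, the round trip recovers the weights.
restored = dequantize(q_weights, 1.0 / scale)

# Assumption for illustration: the stored inverse scale became invalid.
# Every dequantized value, and any loss computed from them, is then NaN.
corrupted = dequantize(q_weights, float("nan"))
loss = sum(v * v for v in corrupted)
```

The quantized payload is unchanged in both cases; only the metadata differs, which is why the symptom shows up as a NaN loss rather than as an error during the device move.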
### Fix
Changed the device on which CodonFM initializes its TE layers from `"cpu"`
to `"cuda"` (matching ESM2). This is a 2-line change in
`modeling_codonfm_te.py`. The initial xfail approach (commit 1) was too
broad: only codonfm was affected.
### Files Changed
- `bionemo-recipes/models/codonfm/modeling_codonfm_te.py` — Fix device
init (`"cpu"` → `"cuda"`)
- `bionemo-recipes/recipes/codonfm_native_te/modeling_codonfm_te.py` —
Synced copy
- 5× `test_modeling_common.py` — Removed unnecessary xfail (net -1 line
each)
---
*Automated fix by OpenClaw + Claude Code*
Signed-off-by: svc-bionemo <267129667+svc-bionemo@users.noreply.github.com>
Co-authored-by: svc-bionemo <267129667+svc-bionemo@users.noreply.github.com>
1 parent f0d4bfd commit 0883c09
2 files changed
Lines changed: 4 additions & 4 deletions
File tree
- bionemo-recipes
- models/codonfm
- recipes/codonfm_native_te
Changed lines: 227 and 365 (one line removed and one added at each location; diff content not captured).
Lines changed: 2 additions & 2 deletions
Changed lines: 233 and 371 (one line removed and one added at each location; diff content not captured).