fix(data): preserve Nemotron Omni packed sequence metadata by cuichenx · Pull Request #4400 · NVIDIA-NeMo/Megatron-Bridge

cuichenx · 2026-06-16T22:57:10Z

Summary

Add cu_seqlens_unpadded_argmin to the Nemotron Omni Energon batch metadata.
Populate it for in-batch packed sequences and forward it through encode_batch().
Extend the packed Energon unit test to cover the forwarded packed-sequence metadata.

Root Cause

NemotronOmniTaskEncoder.batch() builds packed cu_seqlens metadata, but the unpadded argmin was not carried through the Energon batch dict. When get_packed_seq_params() receives cu_seqlens_unpadded without cu_seqlens_unpadded_argmin, it falls back to torch.argmin(cu_seqlens_unpadded). For valid packed sequences the first entry is 0 and is also the minimum, so the fallback returns index 0 and the unpadded cu-seqlens are truncated to an empty tensor, which can surface as a TE/cuDNN fused-attention bad-parameter error.
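
The failure mode can be reproduced in a few lines. This is a minimal pure-Python sketch, not the actual get_packed_seq_params() code: trim_cu_seqlens and argmin are hypothetical stand-ins for the tensor slicing and torch.argmin, and the trailing -1 filler is an assumption about the layout the fallback was designed for.

```python
def argmin(values):
    """Index of the first minimum, mirroring torch.argmin on a 1-D tensor."""
    return min(range(len(values)), key=values.__getitem__)


def trim_cu_seqlens(cu_seqlens, explicit_argmin=None):
    """Hypothetical stand-in for the trimming step in get_packed_seq_params().

    If the caller supplies the argmin, slice with it; otherwise fall back to
    computing the argmin from the values themselves.
    """
    end = explicit_argmin if explicit_argmin is not None else argmin(cu_seqlens)
    return cu_seqlens[:end]


# Assumed layout where the fallback works: trailing -1 filler is the minimum,
# so the argmin points at the first filler entry.
padded = [0, 5, 9, 12, -1, -1]
trimmed_padded = trim_cu_seqlens(padded)

# Valid packed cu_seqlens with no filler: the minimum is the leading 0,
# so the fallback slices [:0] and nothing is left.
valid = [0, 5, 9, 12]
trimmed_valid_fallback = trim_cu_seqlens(valid)

# Passing the argmin explicitly as len(cu_seqlens) makes the slice a no-op.
trimmed_valid_explicit = trim_cu_seqlens(valid, explicit_argmin=len(valid))
```

With the explicit argmin the full list survives; with the fallback the same input collapses to an empty list, which is the condition the PR describes reaching the attention kernel.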

Validation

pre-commit run --all-files
python3 -m py_compile src/megatron/bridge/data/energon/nemotron_omni_task_encoder.py tests/unit_tests/data/energon/test_nemotron_omni_task_encoder.py
DFW rc8 packed Valor smoke, MBS=2, TP=2, EP=8, CP=1, real Energon data: job 12869551 completed with exit 0 and rank 0 logged Step Time : 114.68s. Logs: /lustre/fs1/portfolios/coreai/projects/coreai_dlalgo_llm/users/chcui/vera-wandb/valor32k-energon-20260616/logs/valor32k-rc8-packdiag3_12869551.{out,err}.

Not run locally:

uv run python -m pytest tests/unit_tests/data/energon/test_nemotron_omni_task_encoder.py -q because this workstation resolves as manylinux_2_31_x86_64, while nvidia-resiliency-ext==0.6.0 only publishes compatible wheels for manylinux_2_39 on x86_64.

Signed-off-by: Chen Cui <chcui@nvidia.com>

copy-pr-bot · 2026-06-16T22:57:13Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

claude · 2026-06-16T23:02:34Z

Light Code Review

Clean, well-scoped bug fix. The root cause analysis in the PR description is solid -- missing cu_seqlens_unpadded_argmin causes get_packed_seq_params() to fall back to torch.argmin(cu_seqlens_unpadded), which returns 0 (the first entry is always 0), truncating the tensor to empty and triggering TE/cuDNN errors.

Code change is correct. The new field is added to the dataclass, populated identically to cu_seqlens_argmin in the packing path (both set to len(cu_seqlens) for the no-op slice), left as None in the non-packing path, and forwarded through encode_batch. All consistent.
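
The shape of the change described above can be sketched as follows. This is an illustrative approximation only: PackedBatchSketch, build_batch, and encode_batch_sketch are hypothetical names, plain lists stand in for tensors, and the sketch assumes the padded and unpadded cu_seqlens coincide, which may not hold in the real encoder.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class PackedBatchSketch:
    """Hypothetical stand-in for the packed-sequence part of the batch dataclass."""

    cu_seqlens: Optional[List[int]] = None
    cu_seqlens_argmin: Optional[int] = None
    cu_seqlens_unpadded: Optional[List[int]] = None
    cu_seqlens_unpadded_argmin: Optional[int] = None  # the field this PR adds


def build_batch(seq_lengths: List[int], packing: bool) -> PackedBatchSketch:
    """Build cumulative sequence lengths; leave every field None when not packing."""
    if not packing:
        return PackedBatchSketch()
    cu = [0]
    for n in seq_lengths:
        cu.append(cu[-1] + n)
    return PackedBatchSketch(
        cu_seqlens=cu,
        cu_seqlens_argmin=len(cu),  # no-op slice bound
        cu_seqlens_unpadded=list(cu),  # assumption: no per-sequence padding
        cu_seqlens_unpadded_argmin=len(cu),  # populated the same way
    )


def encode_batch_sketch(batch: PackedBatchSketch) -> dict:
    """Forward every metadata field into the encoded dict, including the new one."""
    return asdict(batch)


packed = encode_batch_sketch(build_batch([5, 4, 3], packing=True))
unpacked = encode_batch_sketch(build_batch([5, 4, 3], packing=False))
```

Slicing packed["cu_seqlens_unpadded"] with packed["cu_seqlens_unpadded_argmin"] returns the whole list, so a downstream consumer never needs the argmin fallback.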

Test coverage is good. The existing test now verifies the new cu_seqlens_unpadded_argmin batch field value and all five packed-sequence metadata fields through encode_batch (closing a pre-existing coverage gap where only cu_seqlens was checked in the encoded dict).

No bugs, typos, or documentation issues found.

LGTM

Suggested test cases: No perf tests impacted.

fix(data): preserve Nemotron Omni packed metadata

2f0d6e4

Signed-off-by: Chen Cui <chcui@nvidia.com>

cuichenx marked this pull request as ready for review June 16, 2026 22:59

yaoyu-33 added the area:data, bug, and needs-review labels Jun 16, 2026

yaoyu-33 approved these changes Jun 25, 2026

cuichenx merged commit 8b5b732 into main Jun 25, 2026
108 checks passed

cuichenx deleted the codex/fix-nomni-packed-metadata branch June 25, 2026 16:51
