Skip to content

fix(data): preserve Nemotron Omni packed sequence metadata#4400

Merged
cuichenx merged 1 commit into
mainfrom
codex/fix-nomni-packed-metadata
Jun 25, 2026
Merged

fix(data): preserve Nemotron Omni packed sequence metadata#4400
cuichenx merged 1 commit into
mainfrom
codex/fix-nomni-packed-metadata

Conversation

@cuichenx

@cuichenx cuichenx commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Add cu_seqlens_unpadded_argmin to the Nemotron Omni Energon batch metadata.
  • Populate it for in-batch packed sequences and forward it through encode_batch().
  • Extend the packed Energon unit test to cover the forwarded packed-sequence metadata.

Root Cause

NemotronOmniTaskEncoder.batch() builds packed cu_seqlens metadata, but the unpadded argmin was not carried through the Energon batch dict. When get_packed_seq_params() receives cu_seqlens_unpadded without cu_seqlens_unpadded_argmin, it falls back to torch.argmin(cu_seqlens_unpadded). For valid packed sequences the first entry is 0, so the fallback truncates the unpadded cu-seqlens to an empty tensor, which can surface as a TE/cuDNN fused-attention bad parameter.

Validation

  • pre-commit run --all-files
  • python3 -m py_compile src/megatron/bridge/data/energon/nemotron_omni_task_encoder.py tests/unit_tests/data/energon/test_nemotron_omni_task_encoder.py
  • DFW rc8 packed Valor smoke, MBS=2, TP=2, EP=8, CP=1, real Energon data: job 12869551 completed with exit 0 and rank 0 logged Step Time : 114.68s. Logs: /lustre/fs1/portfolios/coreai/projects/coreai_dlalgo_llm/users/chcui/vera-wandb/valor32k-energon-20260616/logs/valor32k-rc8-packdiag3_12869551.{out,err}.

Not run locally:

  • uv run python -m pytest tests/unit_tests/data/energon/test_nemotron_omni_task_encoder.py -q because this workstation resolves as manylinux_2_31_x86_64, while nvidia-resiliency-ext==0.6.0 only publishes compatible wheels for manylinux_2_39 on x86_64.

Signed-off-by: Chen Cui <chcui@nvidia.com>
@copy-pr-bot

copy-pr-bot Bot commented Jun 16, 2026

Copy link
Copy Markdown

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@cuichenx cuichenx marked this pull request as ready for review June 16, 2026 22:59
@claude

claude Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Light Code Review

Clean, well-scoped bug fix. The root cause analysis in the PR description is solid -- missing cu_seqlens_unpadded_argmin causes get_packed_seq_params() to fall back to torch.argmin(cu_seqlens_unpadded), which returns 0 (the first entry is always 0), truncating the tensor to empty and triggering TE/cuDNN errors.

Code change is correct. The new field is added to the dataclass, populated identically to cu_seqlens_argmin in the packing path (both set to len(cu_seqlens) for the no-op slice), left as None in the non-packing path, and forwarded through encode_batch. All consistent.

Test coverage is good. The existing test now verifies the new cu_seqlens_unpadded_argmin batch field value and all five packed-sequence metadata fields through encode_batch (closing a pre-existing coverage gap where only cu_seqlens was checked in the encoded dict).

No bugs, typos, or documentation issues found.

LGTM

Suggested test cases: No perf tests impacted.

@yaoyu-33 yaoyu-33 added area:data Dataset builders, preprocessing, and samplers bug Something isn't working needs-review PR is ready for code review and waiting on a reviewer labels Jun 16, 2026
@cuichenx cuichenx merged commit 8b5b732 into main Jun 25, 2026
108 checks passed
@cuichenx cuichenx deleted the codex/fix-nomni-packed-metadata branch June 25, 2026 16:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:data Dataset builders, preprocessing, and samplers bug Something isn't working needs-review PR is ready for code review and waiting on a reviewer

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants