fix(data): preserve Nemotron Omni packed sequence metadata #4400
Signed-off-by: Chen Cui <chcui@nvidia.com>
Light Code Review

Clean, well-scoped bug fix. The root cause analysis in the PR description is solid: a missing `cu_seqlens_unpadded_argmin` causes `get_packed_seq_params()` to fall back to `torch.argmin(cu_seqlens_unpadded)`, which returns 0 (the first entry is always 0), truncating the tensor to empty and triggering TE/cuDNN errors.

Code change is correct. The new field is added to the dataclass, populated identically to `cu_seqlens_argmin` in the packing path (both set to `len(cu_seqlens)` for the no-op slice), left as `None` in the non-packing path, and forwarded through `encode_batch`. All consistent.

Test coverage is good. The existing test now verifies the new `cu_seqlens_unpadded_argmin` batch field value and all five packed-sequence metadata fields through `encode_batch` (closing a pre-existing coverage gap where only `cu_seqlens` was checked in the encoded dict).

No bugs, typos, or documentation issues found. LGTM

Suggested test cases: No perf tests impacted.
Summary
- Add `cu_seqlens_unpadded_argmin` to the Nemotron Omni Energon batch metadata.
- Forward it through `encode_batch()`.

Root Cause
`NemotronOmniTaskEncoder.batch()` builds packed `cu_seqlens` metadata, but the unpadded argmin was not carried through the Energon batch dict. When `get_packed_seq_params()` receives `cu_seqlens_unpadded` without `cu_seqlens_unpadded_argmin`, it falls back to `torch.argmin(cu_seqlens_unpadded)`. For valid packed sequences the first entry is `0`, so the argmin is index 0 and the fallback truncates the unpadded cu-seqlens to an empty tensor, which can surface as a TE/cuDNN fused-attention bad parameter.

Validation
- `pre-commit run --all-files`
- `python3 -m py_compile src/megatron/bridge/data/energon/nemotron_omni_task_encoder.py tests/unit_tests/data/energon/test_nemotron_omni_task_encoder.py`
- `12869551` completed with exit `0` and rank 0 logged `Step Time : 114.68s`. Logs: `/lustre/fs1/portfolios/coreai/projects/coreai_dlalgo_llm/users/chcui/vera-wandb/valor32k-energon-20260616/logs/valor32k-rc8-packdiag3_12869551.{out,err}`.

Not run locally:
- `uv run python -m pytest tests/unit_tests/data/energon/test_nemotron_omni_task_encoder.py -q`, because this workstation resolves as `manylinux_2_31_x86_64`, while `nvidia-resiliency-ext==0.6.0` only publishes compatible wheels for `manylinux_2_39` on x86_64.
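Illustration of the Root Cause: the failure mode can be reproduced in a minimal sketch. It assumes the fallback slices the cu-seqlens at the argmin index, as the description implies. `trim_cu_seqlens` is a hypothetical stand-in, not the actual `get_packed_seq_params()` code, and plain Python lists replace tensors so the snippet runs without PyTorch.

```python
from typing import List, Optional


def trim_cu_seqlens(cu_seqlens: List[int], argmin: Optional[int] = None) -> List[int]:
    """Hypothetical stand-in for the trimming step described in Root Cause.

    Assumption: the real code slices the tensor as cu_seqlens[:argmin].
    """
    if argmin is None:
        # Fallback path: index of the smallest entry, mimicking torch.argmin.
        argmin = min(range(len(cu_seqlens)), key=cu_seqlens.__getitem__)
    return cu_seqlens[:argmin]


# Cumulative lengths for three packed sequences of 5, 4, and 3 tokens.
cu_seqlens_unpadded = [0, 5, 9, 12]

# Without the argmin field, the fallback picks index 0 (the leading 0 entry)
# and the slice comes back empty.
without_argmin = trim_cu_seqlens(cu_seqlens_unpadded)

# With the field set to len(cu_seqlens), the slice is a no-op.
with_argmin = trim_cu_seqlens(cu_seqlens_unpadded, argmin=len(cu_seqlens_unpadded))

print(without_argmin)
print(with_argmin)
```

Passing the length explicitly matches the reviewed behavior, where both argmin fields are set to `len(cu_seqlens)` in the packing path so that the slice leaves the metadata intact.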