feat(nemotron3): add Nemotron-3 Ultra chat-template variant #77
Merged
Approvability
Verdict: Needs human review
This PR adds support for a new Nemotron-3 Ultra chat-template variant with new configuration, auto-detection logic, and conditional rendering paths that format messages differently from the existing Nano/Super variants. New feature capability warrants human review.
snimu
previously approved these changes
Jun 4, 2026
Nemotron-3 Ultra uses a chat-template variant distinct from Nano/Super:
the reasoning block is glued as `<think>\n{reasoning}</think>{content}`
(no `\n` around `</think>`), truncated historical turns collapse to
`<think></think>{content}` (no `\n`), and the thinking-truncation
boundary follows the template's `loop.index0 < last_user_idx` rule (drop
thinking on every assistant turn before the last user message).
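For reviewers, the three rules above can be sketched in plain Python. This is a minimal illustration, not the renderer's actual code: the helper names are hypothetical, the Nano/Super form is the one quoted in the PR description, the `truncate_history_thinking` toggle is ignored, and the behavior when no user message exists is an assumption.

```python
from typing import Dict, List


def glue_reasoning(reasoning: str, content: str, *, ultra: bool) -> str:
    """Format a kept reasoning block followed by the assistant content."""
    if ultra:
        # Ultra: no newline on either side of </think>.
        return f"<think>\n{reasoning}</think>{content}"
    # Nano/Super form as quoted in the PR description.
    return f"<think>\n{reasoning}\n</think>\n{content}"


def collapse_truncated_ultra(content: str) -> str:
    """Ultra form of a historical turn whose thinking was dropped."""
    return f"<think></think>{content}"


def ultra_drops_thinking(messages: List[Dict[str, str]]) -> List[bool]:
    """Per message: True if it is an assistant turn before the last user message."""
    # Assumption: with no user message, last_user_idx stays -1 and nothing is dropped.
    last_user_idx = -1
    for i, message in enumerate(messages):
        if message["role"] == "user":
            last_user_idx = i
    return [
        message["role"] == "assistant" and i < last_user_idx
        for i, message in enumerate(messages)
    ]
```

For a `user, assistant, user, assistant` conversation, `ultra_drops_thinking` marks only the first assistant turn, since it is the only assistant turn that precedes the last user message.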
- Add `Nemotron3RendererConfig.ultra` (`bool | None`, default `None` =
auto-detect by model name). Marked `_internal_fields` since it selects
a template variant rather than mapping to a Jinja kwarg, so the parity
matrix doesn't cross it as a template field.
- Auto-select the variant from `tokenizer.name_or_path` via
`_ULTRA_DEFAULTS` + `_default_ultra`, materialized in `__init__`
(mirrors Qwen3.5's `_ENABLE_THINKING_DEFAULTS`). Unknown / fine-tuned /
local-path checkpoints fall back to the Nano/Super template; pass an
explicit `ultra=` to override.
- Map the Ultra BF16 + FP8 checkpoints to the nemotron-3 renderer.
- Cover Ultra in the config-parity, conftest barrage, and roundtrip
matrices (BF16 representative), plus an offline test pinning the
name-based selection and the model->renderer mapping.
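The resolution order described in the list above (explicit `ultra=` first, then a name lookup, then the Nano/Super fallback) might look roughly like the sketch below. The table name, the function name, and the exact-match lookup are assumptions for illustration; the real `_ULTRA_DEFAULTS` and `_default_ultra` are not shown on this page. The two checkpoint names are the ones listed in the Macroscope summary.

```python
from typing import Dict, Optional

# Hypothetical stand-in for _ULTRA_DEFAULTS; the real table's contents are not shown here.
ULTRA_DEFAULTS_SKETCH: Dict[str, bool] = {
    "nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16": True,
    "nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-FP8": True,
}


def resolve_ultra(name_or_path: str, ultra: Optional[bool] = None) -> bool:
    """Resolve the template variant: explicit value wins, else look up the model name."""
    if ultra is not None:
        # An explicit ultra= always overrides name-based detection.
        return ultra
    # Unknown, fine-tuned, or local-path checkpoints fall back to Nano/Super (False).
    return ULTRA_DEFAULTS_SKETCH.get(name_or_path, False)
```

A local fine-tune of Ultra would therefore resolve to `False` unless the caller passes `ultra=True`, which is the override case the description calls out.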
Token-parity verified against the canonical Ultra `apply_chat_template`
across system / no-system, reasoning, multi-turn truncation, tool-call,
and generation-prompt shapes.
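When a parity check like this fails, it helps to locate the first token where the renderer and the reference template diverge. The helper below is a generic sketch with a hypothetical name, not part of this PR's test suite; it assumes both sides are already tokenized into integer id sequences.

```python
from typing import Optional, Sequence


def first_mismatch(expected: Sequence[int], actual: Sequence[int]) -> Optional[int]:
    """Return the index of the first differing token id, or None if the sequences match."""
    for i, (e, a) in enumerate(zip(expected, actual)):
        if e != a:
            return i
    if len(expected) != len(actual):
        # One sequence is a strict prefix of the other; they diverge where the shorter ends.
        return min(len(expected), len(actual))
    return None
```

Decoding a few tokens on either side of the returned index usually shows whether the difference is a stray `\n` around `</think>` or a truncation-boundary disagreement.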
a76412c to
f63fa30
snimu
approved these changes
Jun 4, 2026
Summary
Adds the Nemotron-3 Ultra chat-template variant to the `nemotron-3` renderer. Ultra's template differs from Nano/Super in three ways:
- The reasoning block is glued as `<think>\n{reasoning}</think>{content}`, with no `\n` around `</think>` (Nano/Super use `<think>\n{reasoning}\n</think>\n{content}`).
- Truncated historical turns collapse to `<think></think>{content}` with no separating `\n`.
- Thinking is dropped on every assistant turn before the last user message (the template's `loop.index0 < last_user_idx` rule), rather than preserving only the last plain assistant.
Changes
- `Nemotron3RendererConfig.ultra: bool | None = None`: `None` auto-detects the variant from the model name. Marked `_internal_fields` (it selects a template variant, not a Jinja kwarg), so the parity matrix doesn't cross it as a template field.
- Auto-selects the variant from `tokenizer.name_or_path` via `_ULTRA_DEFAULTS` + `_default_ultra`, materialized in `__init__`, mirroring Qwen3.5's `_ENABLE_THINKING_DEFAULTS`. Unknown / fine-tuned / local-path checkpoints fall back to the Nano/Super template; pass an explicit `ultra=` to override.
- Maps the Ultra BF16 + FP8 checkpoints to the `nemotron-3` renderer in `MODEL_RENDERER_MAP`.
- Covers Ultra in the config-parity, barrage (`conftest`), and roundtrip test matrices (BF16 as the representative), plus an offline test pinning the name-based selection and the model→renderer mapping.
Validation
Token-parity verified against the canonical Ultra `apply_chat_template` across system / no-system, reasoning, multi-turn truncation, tool-call (including consecutive tool results and varied argument types), and generation-prompt shapes, with `enable_thinking` and `truncate_history_thinking` both toggled. Nano/Super behavior is unchanged: the Ultra branches are gated on the resolved flag, which is `False` for them.
CI uses BF16 as the Ultra representative; BF16 and FP8 share the same tokenizer and template.
Note
Medium Risk
Changes tokenization and history-thinking rules for Ultra checkpoints; incorrect `ultra` resolution would break HF template parity and round-trips, though Nano/Super paths stay gated on `ultra=False`.
Overview
Adds Nemotron-3 Ultra as a second chat-template variant on the existing `nemotron-3` renderer, selected by `Nemotron3RendererConfig.ultra` (auto from `tokenizer.name_or_path` via `_ULTRA_DEFAULTS`, overridable explicitly).
Ultra vs Nano/Super behavior when `ultra=True`: historical thinking is dropped on assistant turns before the last user message (not only before the last plain assistant); reasoning blocks glue as `<think>\n{reasoning}</think>{content}` without separating newlines around `</think>` and on collapsed `<think></think>` turns.
Registers Ultra BF16 and FP8 in `MODEL_RENDERER_MAP`, excludes `ultra` from template parity fields (`_internal_fields`), and extends the config-parity, conftest, and roundtrip (BF16) tests, plus offline tests for name-based selection and FP8 mapping.
Reviewed by Cursor Bugbot for commit f63fa30.
Note
Add Nemotron-3 Ultra chat-template variant to `Nemotron3Renderer`
- Adds an `ultra` boolean field to `Nemotron3RendererConfig`; when `None` (default), it is auto-resolved at renderer init time by matching `tokenizer.name_or_path` against a new `_ULTRA_DEFAULTS` lookup table.
- Maps the `nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16` and `nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-FP8` checkpoints to the `nemotron-3` renderer in `MODEL_RENDERER_MAP`.
- In the Ultra variant, `</think>` is emitted directly adjacent to content (no surrounding newlines), and thinking is truncated on every assistant turn before the last user message rather than only at the final plain assistant turn.
- `ultra` is marked as an internal field via `_internal_fields` so it is not exposed as a template kwarg.
- The Ultra checkpoints previously resolved to `DefaultRenderer` and now use the nemotron-3 renderer.
Macroscope summarized f63fa30.