feat(nemotron3): add Nemotron-3 Ultra chat-template variant #77

Merged
hallerite merged 1 commit into main from feat/nemotron3-ultra-renderer
Jun 4, 2026

@hallerite hallerite commented Jun 4, 2026

Summary

Adds the Nemotron-3 Ultra chat-template variant to the nemotron-3 renderer. Ultra's template differs from Nano/Super in three ways:

  • Reasoning glue: <think>\n{reasoning}</think>{content} — no \n around </think> (Nano/Super use <think>\n{reasoning}\n</think>\n{content}).
  • Historical truncation: dropped-reasoning turns collapse to <think></think>{content} with no separating \n.
  • Truncation boundary: thinking is dropped on every assistant turn before the last user message (the template's loop.index0 < last_user_idx rule), rather than preserving only the last plain assistant.
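
The three differences above can be illustrated with a minimal sketch. The helper names (`glue_reasoning`, `ultra_collapsed`, `ultra_truncated_turns`) are hypothetical and are not the renderer's actual functions; the strings are taken from the summary, and returning an empty list when the conversation has no user message is an assumption rather than confirmed template behavior.

```python
def glue_reasoning(reasoning: str, content: str, ultra: bool) -> str:
    """Join a reasoning block and the visible content for one assistant turn."""
    if ultra:
        # Ultra: no "\n" on either side of </think>.
        return f"<think>\n{reasoning}</think>{content}"
    # Nano/Super: "\n" before and after </think>.
    return f"<think>\n{reasoning}\n</think>\n{content}"


def ultra_collapsed(content: str) -> str:
    """Ultra form of a historical turn whose reasoning was dropped."""
    return f"<think></think>{content}"


def ultra_truncated_turns(roles: list[str]) -> list[int]:
    """Indices of assistant turns that lose their thinking under the Ultra rule.

    Mirrors the `loop.index0 < last_user_idx` condition: every assistant turn
    positioned before the last user message is truncated.
    """
    user_positions = [i for i, role in enumerate(roles) if role == "user"]
    if not user_positions:
        # Assumption: with no user message, nothing is truncated.
        return []
    last_user_idx = user_positions[-1]
    return [
        i for i, role in enumerate(roles)
        if role == "assistant" and i < last_user_idx
    ]
```

For a conversation with roles `system, user, assistant, tool, assistant, user, assistant`, the last user message sits at index 5, so only the assistant turns at indices 2 and 4 are collapsed and the final assistant turn keeps its reasoning.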

Changes

  • Nemotron3RendererConfig.ultra: bool | None = None. None auto-detects the variant from the model name. Marked _internal_fields (it selects a template variant, not a Jinja kwarg), so the parity matrix doesn't cross it as a template field.
  • Variant auto-selected from tokenizer.name_or_path via _ULTRA_DEFAULTS + _default_ultra, materialized in __init__ — mirrors Qwen3.5's _ENABLE_THINKING_DEFAULTS. Unknown / fine-tuned / local-path checkpoints fall back to the Nano/Super template; pass an explicit ultra= to override.
  • Ultra BF16 + FP8 checkpoints mapped to the nemotron-3 renderer in MODEL_RENDERER_MAP.
  • Ultra added to the config-parity, shared-barrage (conftest), and roundtrip test matrices (BF16 as the representative), plus an offline test pinning the name-based selection and the model→renderer mapping.
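
A rough sketch of the name-based selection described in this list, for readers who want to see the fallback and override behavior in one place. `ULTRA_BY_NAME` and `resolve_ultra` are hypothetical stand-ins; the real `_ULTRA_DEFAULTS` and `_default_ultra` may be shaped differently (for example, they may not use exact-match lookup). The two checkpoint names are the ones registered in this PR.

```python
from typing import Optional

# Hypothetical stand-in for the PR's _ULTRA_DEFAULTS lookup table.
ULTRA_BY_NAME = {
    "nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16": True,
    "nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-FP8": True,
}


def resolve_ultra(name_or_path: str, ultra: Optional[bool] = None) -> bool:
    """Resolve the template variant once, as the renderer would at init time."""
    if ultra is not None:
        # An explicit ultra= always wins over name-based detection.
        return ultra
    # Unknown, fine-tuned, or local-path checkpoints fall back to Nano/Super.
    return ULTRA_BY_NAME.get(name_or_path, False)
```

Under these assumptions, a fine-tuned Ultra checkpoint stored at a local path would resolve to `False` and render with the Nano/Super template unless the caller passes `ultra=True`.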

Validation

Token-parity verified against the canonical Ultra apply_chat_template across system / no-system, reasoning, multi-turn truncation, tool-call (including consecutive tool results and varied argument types), and generation-prompt shapes, with enable_thinking and truncate_history_thinking both toggled. Nano/Super behavior is unchanged — the Ultra branches are gated on the resolved flag, which is False for them.
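
As an illustration only, a parity check of this kind could be organized as below. This is not the PR's test code: `kwarg_grid` and `first_mismatch` are hypothetical helpers, and producing the two token sequences (the reference from `apply_chat_template`, the candidate from the renderer) is left to the caller.

```python
from itertools import product
from typing import Optional, Sequence


def kwarg_grid() -> list[dict[str, bool]]:
    """All four combinations of the two toggles named in the validation notes."""
    return [
        {"enable_thinking": et, "truncate_history_thinking": th}
        for et, th in product((True, False), repeat=2)
    ]


def first_mismatch(expected: Sequence[int], actual: Sequence[int]) -> Optional[int]:
    """Index of the first differing token, or None when the sequences match."""
    for i, (e, a) in enumerate(zip(expected, actual)):
        if e != a:
            return i
    if len(expected) != len(actual):
        # One sequence is a strict prefix of the other.
        return min(len(expected), len(actual))
    return None
```

Reporting the first mismatching index, rather than only a pass/fail result, makes whitespace-level differences such as a stray `\n` next to `</think>` easier to locate.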

CI uses BF16 as the Ultra representative; BF16 and FP8 share the same tokenizer and template.

Pre-existing renderer content-normalization behaviors (e.g. inline <think>…</think> inside content rather than reasoning_content, and whitespace stripping of message content) are unchanged by this PR and apply identically to Nano/Super — out of scope here.


Note

Medium Risk
Changes tokenization and history-thinking rules for Ultra checkpoints; incorrect ultra resolution would break HF template parity and round-trips, though Nano/Super paths stay gated on ultra=False.

Overview
Adds Nemotron-3 Ultra as a second chat-template variant on the existing nemotron-3 renderer, selected by Nemotron3RendererConfig.ultra (auto from tokenizer.name_or_path via _ULTRA_DEFAULTS, overridable explicitly).

Ultra vs Nano/Super behavior when ultra=True: historical thinking is dropped on assistant turns before the last user message (not only before the last plain assistant); reasoning blocks glue as <think>\n{reasoning}</think>{content} without separating newlines around </think>, and collapsed turns render as <think></think>{content}.

Registers Ultra BF16 and FP8 in MODEL_RENDERER_MAP, excludes ultra from template parity fields (_internal_fields), extends the config-parity, conftest, and roundtrip (BF16) matrices, and adds offline tests for name-based selection and FP8 mapping.

Reviewed by Cursor Bugbot for commit f63fa30.

Note

Add Nemotron-3 Ultra chat-template variant to Nemotron3Renderer

  • Adds an ultra boolean field to Nemotron3RendererConfig; when None (default), it is auto-resolved at renderer init time by matching tokenizer.name_or_path against a new _ULTRA_DEFAULTS lookup table.
  • Maps the nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16 and nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-FP8 checkpoints to the nemotron-3 renderer in MODEL_RENDERER_MAP.
  • Ultra rendering differs from Nano/Super in two ways: </think> is emitted directly adjacent to content (no surrounding newlines), and thinking is truncated on every assistant turn before the last user message rather than only at the final plain assistant turn.
  • ultra is marked as an internal field via _internal_fields so it is not exposed as a template kwarg.
  • Behavioral Change: existing callers using Nano/Super Nemotron-3 checkpoints are unaffected; Ultra checkpoints previously fell back to DefaultRenderer and now use the nemotron-3 renderer.

Macroscope summarized f63fa30.

macroscopeapp Bot commented Jun 4, 2026

Approvability

Verdict: Needs human review

This PR adds support for a new Nemotron-3 Ultra chat-template variant with new configuration, auto-detection logic, and conditional rendering paths that affect how messages are formatted differently from the existing Nano/Super variants. New feature capability warrants human review.

snimu previously approved these changes Jun 4, 2026

Nemotron-3 Ultra uses a chat-template variant distinct from Nano/Super:
the reasoning block is glued as `<think>\n{reasoning}</think>{content}`
(no `\n` around `</think>`), truncated historical turns collapse to
`<think></think>{content}` (no `\n`), and the thinking-truncation
boundary follows the template's `loop.index0 < last_user_idx` rule (drop
thinking on every assistant turn before the last user message).

- Add `Nemotron3RendererConfig.ultra` (`bool | None`, default `None` =
  auto-detect by model name). Marked `_internal_fields` since it selects
  a template variant rather than mapping to a Jinja kwarg, so the parity
  matrix doesn't cross it as a template field.
- Auto-select the variant from `tokenizer.name_or_path` via
  `_ULTRA_DEFAULTS` + `_default_ultra`, materialized in `__init__`
  (mirrors Qwen3.5's `_ENABLE_THINKING_DEFAULTS`). Unknown / fine-tuned /
  local-path checkpoints fall back to the Nano/Super template; pass an
  explicit `ultra=` to override.
- Map the Ultra BF16 + FP8 checkpoints to the nemotron-3 renderer.
- Cover Ultra in the config-parity, conftest barrage, and roundtrip
  matrices (BF16 representative), plus an offline test pinning the
  name-based selection and the model->renderer mapping.

Token-parity verified against the canonical Ultra `apply_chat_template`
across system / no-system, reasoning, multi-turn truncation, tool-call,
and generation-prompt shapes.

@hallerite hallerite force-pushed the feat/nemotron3-ultra-renderer branch from a76412c to f63fa30 June 4, 2026 14:19
@hallerite hallerite merged commit 596c15f into main Jun 4, 2026
11 checks passed
@hallerite hallerite deleted the feat/nemotron3-ultra-renderer branch June 4, 2026 14:46

2 participants