feat(nemotron3): add Nemotron-3 Ultra chat-template variant by hallerite · Pull Request #77 · PrimeIntellect-ai/renderers

hallerite · 2026-06-04T13:55:05Z

Summary

Adds the Nemotron-3 Ultra chat-template variant to the nemotron-3 renderer. Ultra's template differs from Nano/Super in three ways:

Reasoning glue: <think>\n{reasoning}</think>{content} — no \n around </think> (Nano/Super use <think>\n{reasoning}\n</think>\n{content}).
Historical truncation: dropped-reasoning turns collapse to <think></think>{content} with no separating \n.
Truncation boundary: thinking is dropped on every assistant turn before the last user message (the template's loop.index0 < last_user_idx rule), rather than preserving only the last plain assistant.

Changes

Nemotron3RendererConfig.ultra: bool | None = None — None auto-detects the variant from the model name. Marked _internal_fields (it selects a template variant, not a Jinja kwarg), so the parity matrix doesn't cross it as a template field.
Variant auto-selected from tokenizer.name_or_path via _ULTRA_DEFAULTS + _default_ultra, materialized in __init__ — mirrors Qwen3.5's _ENABLE_THINKING_DEFAULTS. Unknown / fine-tuned / local-path checkpoints fall back to the Nano/Super template; pass an explicit ultra= to override.
Ultra BF16 + FP8 checkpoints mapped to the nemotron-3 renderer in MODEL_RENDERER_MAP.
Ultra added to the config-parity, shared-barrage (conftest), and roundtrip test matrices (BF16 as the representative), plus an offline test pinning the name-based selection and the model→renderer mapping.

Validation

Token-parity verified against the canonical Ultra apply_chat_template across system / no-system, reasoning, multi-turn truncation, tool-call (including consecutive tool results and varied argument types), and generation-prompt shapes, with enable_thinking and truncate_history_thinking both toggled. Nano/Super behavior is unchanged — the Ultra branches are gated on the resolved flag, which is False for them.

CI uses BF16 as the Ultra representative; BF16 and FP8 share the same tokenizer and template.

Pre-existing renderer content-normalization behaviors (e.g. inline <think>…</think> inside content rather than reasoning_content, and whitespace stripping of message content) are unchanged by this PR and apply identically to Nano/Super — out of scope here.

Note

Medium Risk
Changes tokenization and history-thinking rules for Ultra checkpoints; incorrect ultra resolution would break HF template parity and round-trips, though Nano/Super paths stay gated on ultra=False.

Overview
Adds Nemotron-3 Ultra as a second chat-template variant on the existing nemotron-3 renderer, selected by Nemotron3RendererConfig.ultra (auto from tokenizer.name_or_path via _ULTRA_DEFAULTS, overridable explicitly).

Ultra vs Nano/Super behavior when ultra=True: historical thinking is dropped on assistant turns before the last user message (not only before the last plain assistant); reasoning blocks glue as {content} without separating newlines around and on collapsed turns.

Registers Ultra BF16 and FP8 in MODEL_RENDERER_MAP, excludes ultra from template parity fields (_internal_fields), and extends config-parity, conftest, roundtrip (BF16), plus offline tests for name-based selection and FP8 mapping.

^{Reviewed by Cursor Bugbot for commit f63fa30. Bugbot is set up for automated code reviews on this repo. Configure here.}

Note

Add Nemotron-3 Ultra chat-template variant to `Nemotron3Renderer`

Adds an ultra boolean field to Nemotron3RendererConfig; when None (default), it is auto-resolved at renderer init time by matching tokenizer.name_or_path against a new _ULTRA_DEFAULTS lookup table.
Maps the nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16 and nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-FP8 checkpoints to the nemotron-3 renderer in MODEL_RENDERER_MAP.
Ultra rendering differs from Nano/Super in two ways: </think> is emitted directly adjacent to content (no surrounding newlines), and thinking is truncated on every assistant turn before the last user message rather than only at the final plain assistant turn.
ultra is marked as an internal field via _internal_fields so it is not exposed as a template kwarg.
Behavioral Change: existing callers using Nano/Super Nemotron-3 checkpoints are unaffected; Ultra checkpoints previously fell back to DefaultRenderer and now use the nemotron-3 renderer.

^{Macroscope summarized f63fa30.}

macroscopeapp · 2026-06-04T13:57:03Z

Approvability

Verdict: Needs human review

This PR adds support for a new Nemotron-3 Ultra chat-template variant with new configuration, auto-detection logic, and conditional rendering paths that affect how messages are formatted differently from the existing Nano/Super variants. New feature capability warrants human review.

^{You can customize Macroscope's approvability policy. Learn more.}

Nemotron-3 Ultra uses a chat-template variant distinct from Nano/Super: the reasoning block is glued as `<think>\n{reasoning}</think>{content}` (no `\n` around `</think>`), truncated historical turns collapse to `<think></think>{content}` (no `\n`), and the thinking-truncation boundary follows the template's `loop.index0 < last_user_idx` rule (drop thinking on every assistant turn before the last user message). - Add `Nemotron3RendererConfig.ultra` (`bool | None`, default `None` = auto-detect by model name). Marked `_internal_fields` since it selects a template variant rather than mapping to a Jinja kwarg, so the parity matrix doesn't cross it as a template field. - Auto-select the variant from `tokenizer.name_or_path` via `_ULTRA_DEFAULTS` + `_default_ultra`, materialized in `__init__` (mirrors Qwen3.5's `_ENABLE_THINKING_DEFAULTS`). Unknown / fine-tuned / local-path checkpoints fall back to the Nano/Super template; pass an explicit `ultra=` to override. - Map the Ultra BF16 + FP8 checkpoints to the nemotron-3 renderer. - Cover Ultra in the config-parity, conftest barrage, and roundtrip matrices (BF16 representative), plus an offline test pinning the name-based selection and the model->renderer mapping. Token-parity verified against the canonical Ultra `apply_chat_template` across system / no-system, reasoning, multi-turn truncation, tool-call, and generation-prompt shapes.

snimu previously approved these changes Jun 4, 2026

View reviewed changes

hallerite dismissed snimu’s stale review via f63fa30 June 4, 2026 14:19

hallerite force-pushed the feat/nemotron3-ultra-renderer branch from a76412c to f63fa30 Compare June 4, 2026 14:19

snimu approved these changes Jun 4, 2026

View reviewed changes

hallerite merged commit 596c15f into main Jun 4, 2026
11 checks passed

hallerite deleted the feat/nemotron3-ultra-renderer branch June 4, 2026 14:46

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(nemotron3): add Nemotron-3 Ultra chat-template variant#77

feat(nemotron3): add Nemotron-3 Ultra chat-template variant#77
hallerite merged 1 commit into
mainfrom
feat/nemotron3-ultra-renderer

hallerite commented Jun 4, 2026 •

edited by macroscopeapp Bot

Loading

Uh oh!

macroscopeapp Bot commented Jun 4, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

hallerite commented Jun 4, 2026 • edited by macroscopeapp Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Validation

Add Nemotron-3 Ultra chat-template variant to Nemotron3Renderer

Uh oh!

macroscopeapp Bot commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Approvability

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

hallerite commented Jun 4, 2026 •

edited by macroscopeapp Bot

Loading

Add Nemotron-3 Ultra chat-template variant to `Nemotron3Renderer`

macroscopeapp Bot commented Jun 4, 2026 •

edited

Loading