Merged
Changes from 2 commits
2 changes: 1 addition & 1 deletion docs/concepts/models/default-model-settings.md
@@ -44,7 +44,7 @@ The following model configurations are automatically available when `NVIDIA_API_
| Alias | Model | Use Case | Inference Parameters |
|-------|-------|----------|---------------------|
| `nvidia-text` | `nvidia/nemotron-3-nano-30b-a3b` | General text generation | `temperature=1.0, top_p=1.0` |
-| `nvidia-reasoning` | `openai/gpt-oss-20b` | Reasoning and analysis tasks | `temperature=0.35, top_p=0.95` |
+| `nvidia-reasoning` | `nvidia/nemotron-3-super-120b-a12b` | Reasoning and analysis tasks | `temperature=1.0, top_p=0.95` |
| `nvidia-vision` | `nvidia/nemotron-nano-12b-v2-vl` | Vision and image understanding | `temperature=0.85, top_p=0.95` |
| `nvidia-embedding` | `nvidia/llama-3.2-nv-embedqa-1b-v2` | Text embeddings | `encoding_format="float", extra_body={"input_type": "query"}` |

@@ -336,6 +336,7 @@ class NordColor(Enum):
DEFAULT_VISION_INFERENCE_PARAMS = {"temperature": 0.85, "top_p": 0.95}
DEFAULT_EMBEDDING_INFERENCE_PARAMS = {"encoding_format": "float"}
NEMOTRON_3_NANO_30B_A3B_INFERENCE_PARAMS = {"temperature": 1.0, "top_p": 1.0}
+NEMOTRON_3_SUPER_120B_A12B_INFERENCE_PARAMS = {"temperature": 1.0, "top_p": 0.95}
Contributor

Nemotron Super supports `reasoning_effort` as a request parameter (as gpt-oss did). It might be worth setting a default here: without it, the model runs full chain-of-thought on every call, which adds up in bulk generation. We don't set it for gpt-oss either, so this is not blocking, but since we're touching this anyway:

NEMOTRON_3_SUPER_120B_A12B_INFERENCE_PARAMS = {"temperature": 1.0, "top_p": 0.95, "extra_body": {"reasoning_effort": "medium"}}

Mirrors what we do for GPT-5 on line 340.
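
A minimal sketch of the suggestion above, assuming the inference-parameter constants are plain dicts and that `extra_body` is passed through to the request unchanged. The helper `with_reasoning_effort` is hypothetical and only illustrates the merge; it is not part of the codebase.

```python
from copy import deepcopy

# Value copied from the diff above.
NEMOTRON_3_SUPER_120B_A12B_INFERENCE_PARAMS = {"temperature": 1.0, "top_p": 0.95}


def with_reasoning_effort(params: dict, effort: str = "medium") -> dict:
    """Return a copy of `params` with extra_body["reasoning_effort"] set.

    Hypothetical helper: the input dict is left untouched, and any
    existing extra_body keys are preserved.
    """
    merged = deepcopy(params)
    merged.setdefault("extra_body", {})["reasoning_effort"] = effort
    return merged


suggested = with_reasoning_effort(NEMOTRON_3_SUPER_120B_A12B_INFERENCE_PARAMS)
print(suggested)
# {'temperature': 1.0, 'top_p': 0.95, 'extra_body': {'reasoning_effort': 'medium'}}
```

Copying before merging keeps the module-level constant unchanged, so a caller that wants a different effort level for one bulk job does not alter the default for everyone else.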

GPT5_INFERENCE_PARAMS = {"extra_body": {"reasoning_effort": "medium"}}

PREDEFINED_PROVIDERS_MODEL_MAP = {
@@ -344,7 +345,10 @@ class NordColor(Enum):
             "model": "nvidia/nemotron-3-nano-30b-a3b",
             "inference_parameters": NEMOTRON_3_NANO_30B_A3B_INFERENCE_PARAMS,
         },
-        "reasoning": {"model": "openai/gpt-oss-20b", "inference_parameters": DEFAULT_REASONING_INFERENCE_PARAMS},
+        "reasoning": {
+            "model": "nvidia/nemotron-3-super-120b-a12b",
+            "inference_parameters": NEMOTRON_3_SUPER_120B_A12B_INFERENCE_PARAMS,
+        },
         "vision": {"model": "nvidia/nemotron-nano-12b-v2-vl", "inference_parameters": DEFAULT_VISION_INFERENCE_PARAMS},
         "embedding": {
             "model": "nvidia/llama-3.2-nv-embedqa-1b-v2",
@@ -30,9 +30,9 @@ def test_get_default_inference_parameters():
         top_p=0.95,
     )
     assert get_default_inference_parameters(
-        "reasoning", {"temperature": 0.35, "top_p": 0.95}
+        "reasoning", {"temperature": 1.0, "top_p": 0.95}
     ) == ChatCompletionInferenceParams(
-        temperature=0.35,
+        temperature=1.0,
         top_p=0.95,
     )
     assert get_default_inference_parameters(
@@ -59,7 +59,7 @@ def test_get_builtin_model_configs():
assert builtin_model_configs[0].model == "nvidia/nemotron-3-nano-30b-a3b"
assert builtin_model_configs[0].provider == "nvidia"
assert builtin_model_configs[1].alias == "nvidia-reasoning"
-assert builtin_model_configs[1].model == "openai/gpt-oss-20b"
+assert builtin_model_configs[1].model == "nvidia/nemotron-3-super-120b-a12b"
assert builtin_model_configs[1].provider == "nvidia"
assert builtin_model_configs[2].alias == "nvidia-vision"
assert builtin_model_configs[2].model == "nvidia/nemotron-nano-12b-v2-vl"
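
The assertions above imply that each alias is the provider name joined to the role key (`nvidia` + `reasoning` gives `nvidia-reasoning`). A rough sketch of that relationship follows, assuming the map is keyed by provider and then by role. `ModelConfigSketch`, `PROVIDERS_MODEL_MAP_SKETCH`, and `build_model_configs` are hypothetical stand-ins, not the project's actual classes or functions.

```python
from dataclasses import dataclass, field

# Values copied from the diff above.
NEMOTRON_3_NANO_30B_A3B_INFERENCE_PARAMS = {"temperature": 1.0, "top_p": 1.0}
NEMOTRON_3_SUPER_120B_A12B_INFERENCE_PARAMS = {"temperature": 1.0, "top_p": 0.95}

# Assumed shape: provider name -> role -> model entry.
PROVIDERS_MODEL_MAP_SKETCH = {
    "nvidia": {
        "text": {
            "model": "nvidia/nemotron-3-nano-30b-a3b",
            "inference_parameters": NEMOTRON_3_NANO_30B_A3B_INFERENCE_PARAMS,
        },
        "reasoning": {
            "model": "nvidia/nemotron-3-super-120b-a12b",
            "inference_parameters": NEMOTRON_3_SUPER_120B_A12B_INFERENCE_PARAMS,
        },
    }
}


@dataclass
class ModelConfigSketch:
    """Hypothetical stand-in for the project's model config type."""

    alias: str
    model: str
    provider: str
    inference_parameters: dict = field(default_factory=dict)


def build_model_configs(model_map: dict) -> list:
    """Flatten the nested map into configs, one per (provider, role) pair."""
    configs = []
    for provider, roles in model_map.items():
        for role, entry in roles.items():
            configs.append(
                ModelConfigSketch(
                    alias=f"{provider}-{role}",
                    model=entry["model"],
                    provider=provider,
                    # Copy so later edits to a config do not leak into the defaults.
                    inference_parameters=dict(entry["inference_parameters"]),
                )
            )
    return configs


builtin_model_configs = build_model_configs(PROVIDERS_MODEL_MAP_SKETCH)
```

Because dicts preserve insertion order, index 1 here is the reasoning entry, matching the positional checks in the test.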