feat: Support deployment-level thinking defaults for chat templates by indrajit96 · Pull Request #11047 · ai-dynamo/dynamo

indrajit96 · 2026-06-29T16:57:44Z

Overview:

Adds a deployment-level default thinking toggle for chat-template rendering.
This lets operators set whether a deployment should default to thinking enabled or disabled, while preserving request-level override priority.

Details:

Adds --dyn-default-thinking-mode enabled|disabled / DYN_DEFAULT_THINKING_MODE.
Publishes the configured value through model runtime metadata as default_thinking_mode.
Wires metadata publication through vLLM, SGLang, and TRTLLM worker registration paths.
Applies the deployment default in the Python frontend processors and Rust OpenAI preprocessor.
Injects compatible chat-template args when no request-level thinking control is present:
- thinking
- enable_thinking
- thinking_mode
Preserves request precedence for root-level thinking, chat_template_args, and chat_template_kwargs.

Validation:

Unit coverage for Rust preprocessor default injection.
Python processor unit coverage for vLLM/SGLang paths.
E2E validated with Dynamo frontend + vLLM using Qwen3 and SmolLM3:
- deployment default disabled
- deployment default enabled
- per-request override via chat_template_kwargs.enable_thinking

Summary by CodeRabbit

New Features
- Added support for a deployment-level default “thinking mode” that can be set via runtime configuration or CLI.
- Chat request preprocessing now applies this default when requests do not already specify thinking-related options.
- The setting is carried through worker startup and engine metadata so it is available across supported runtimes.
Tests
- Added coverage for enabled/disabled defaults and for preserving user-provided thinking settings.

Signed-off-by: Indrajit Bhosale <iamindrajitb@gmail.com>

devin-ai-integration

Devin Review found 1 potential issue.

devin-ai-integration · 2026-06-29T17:04:11Z

+    chat_template_kwargs = apply_default_thinking_mode_to_template_kwargs(
+        chat_template_kwargs,
+        default_thinking_mode,
+        request_has_root_thinking=(isinstance(request, dict) and "thinking" in request),
+    )
    chat_template_kwargs["reasoning_effort"] = request_for_sampling.reasoning_effort


🚩 Behavioral difference between SGLang and vLLM/Rust in how reasoning_effort interacts with default_thinking_mode

In the SGLang path, _normalize_openai_thinking_template_kwargs (components/src/dynamo/frontend/sglang_prepost.py:366-401) processes reasoning_effort: "none" into thinking: False / enable_thinking: False via setdefault_reasoning(False) BEFORE apply_default_thinking_mode_to_template_kwargs runs. This means reasoning_effort: "none" effectively blocks any deployment default from being applied.

In the vLLM path (components/src/dynamo/frontend/prepost.py:139-145) and the Rust preprocessor (lib/llm/src/preprocessor.rs:402-442), reasoning_effort is NOT checked before the default is applied — it's simply added as a separate template kwarg afterward. So a request with reasoning_effort: "none" + default_thinking_mode: "enabled" would end up with both enable_thinking: true (from the default) and reasoning_effort: "none" in the template kwargs. Whether this is a problem depends on the chat template's precedence rules for these fields.

This is likely intentional — SGLang normalizes OpenAI thinking controls into template flags, while vLLM/Rust let the template interpret reasoning_effort directly — but it's worth confirming the desired behavior for mixed-signal scenarios.

Was this helpful? React with 👍 or 👎 to provide feedback.

coderabbitai · 2026-06-29T17:07:21Z

Walkthrough

Adds a deployment-level default_thinking_mode option (enabled/disabled) surfaced via --dyn-default-thinking-mode CLI flag. The value flows from CLI config through Rust WorkerConfig, engine runtime metadata (vLLM, SGLang, TRT-LLM), Python/Rust preprocessors, and frontend processors, injecting thinking/enable_thinking/thinking_mode template kwargs only when the request does not already specify them.

Changes

Default Thinking Mode Feature

Layer / File(s)	Summary
thinking.py helpers and Rust preprocessor logic `components/src/dynamo/frontend/thinking.py`, `lib/llm/src/preprocessor.rs`	New `thinking.py` defines `DEFAULT_THINKING_MODE_RUNTIME_KEY`, `THINKING_CONTROL_KEYS`, `runtime_default_thinking_mode`, and `apply_default_thinking_mode_to_template_kwargs`. Rust `preprocessor.rs` adds parallel helpers and calls `apply_default_thinking_mode` on inbound chat-completion requests before preprocessing.
CLI config and Rust WorkerConfig field `components/src/dynamo/common/configuration/groups/runtime_args.py`, `lib/backend-common/src/worker.rs`	`DynamoRuntimeConfig` adds `dyn_default_thinking_mode` and the corresponding `--dyn-default-thinking-mode` CLI arg. Rust `WorkerConfig` adds the `default_thinking_mode` field and `build_local_model` injects it into `runtime_data`.
Engine runtime metadata publication `components/src/dynamo/vllm/main.py`, `components/src/dynamo/sglang/register.py`, `components/src/dynamo/trtllm/workers/llm_worker.py`	Each engine worker conditionally JSON-serializes `dyn_default_thinking_mode` and publishes it as an engine-specific `runtime_config` key.
Python WorkerConfig binding and PyO3 stub `components/src/dynamo/common/backend/worker.py`, `lib/bindings/python/rust/backend.rs`, `lib/bindings/python/src/dynamo/_core.pyi`	Python `WorkerConfig` dataclass adds `default_thinking_mode`, `from_runtime_config` reads it, `Worker.run()` forwards it. The PyO3 constructor and `.pyi` stub expose the new parameter.
Frontend processors wiring `components/src/dynamo/frontend/prepost.py`, `components/src/dynamo/frontend/sglang_prepost.py`, `components/src/dynamo/frontend/vllm_processor.py`, `components/src/dynamo/frontend/sglang_processor.py`	All four frontend modules accept `default_thinking_mode` from runtime config and thread it into `preprocess_chat_request` / `apply_default_thinking_mode_to_template_kwargs`. SGLang processor also propagates it into worker pool `initargs`.
Tests `components/src/dynamo/common/backend/tests/test_backend_bindings.py`, `lib/backend-common/src/worker.rs`, `lib/llm/src/preprocessor.rs`, `components/src/dynamo/frontend/tests/test_sglang_processor_unit.py`, `components/src/dynamo/frontend/tests/test_vllm_processor_unit.py`	Tests cover WorkerConfig binding constructor/defaults, Rust `build_local_model` runtime_data injection, Rust preprocessor injection/non-override cases, and Python frontend disabled/non-override behavior.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The description covers overview and details, but it omits the required "Where should the reviewer start?" and Related Issues sections.	Add the missing reviewer-start guidance and complete the required Related Issues section with the issue link or the no-issue confirmation.
Docstring Coverage	⚠️ Warning	Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly summarizes the main change: deployment-level thinking defaults for chat templates.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

lib/bindings/python/rust/backend.rs (1)

297-345: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Append the new backend.WorkerConfig parameter instead of inserting it mid-signature.

This changes the positional constructor contract for backend.WorkerConfig(...): every argument after reasoning_parser now shifts one slot. Please move default_thinking_mode to the end of the PyO3 signature/parameter list and update lib/bindings/python/src/dynamo/_core.pyi to match.

Suggested fix

     #[pyo3(signature = (
         namespace,
         component = "backend".to_string(),
         endpoint = "generate".to_string(),
         model_name = String::new(),
         served_model_name = None,
         model_input = ModelInput::Tokens,
         endpoint_types = "chat,completions".to_string(),
         custom_jinja_template = None,
         tool_call_parser = None,
         reasoning_parser = None,
-        default_thinking_mode = None,
         exclude_tools_when_tool_choice_none = true,
         enable_local_indexer = true,
         enable_kv_routing = true,
         metrics_labels = Vec::new(),
         runtime = None,
         disaggregation_mode = DisaggregationMode::Aggregated,
         health_check_payload = None,
         structural_tag_mode = "off".to_string(),
         structural_tag_scope = "auto".to_string(),
         structural_tag_schema = "auto".to_string(),
         route_to_encoder = false,
+        default_thinking_mode = None,
     ))]
@@
         custom_jinja_template: Option<String>,
         tool_call_parser: Option<String>,
         reasoning_parser: Option<String>,
-        default_thinking_mode: Option<String>,
         exclude_tools_when_tool_choice_none: bool,
         enable_local_indexer: bool,
         enable_kv_routing: bool,
         metrics_labels: Vec<(String, String)>,
         runtime: Option<RuntimeConfig>,
         disaggregation_mode: DisaggregationMode,
         health_check_payload: Option<PyObject>,
         structural_tag_mode: String,
         structural_tag_scope: String,
         structural_tag_schema: String,
         route_to_encoder: bool,
+        default_thinking_mode: Option<String>,
     ) -> PyResult<Self> {

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/bindings/python/rust/backend.rs` around lines 297 - 345, The
backend.WorkerConfig constructor signature changed the positional argument order
by inserting a new parameter in the middle, which breaks existing callers.
Update the PyO3 constructor in backend.rs so default_thinking_mode is appended
at the end of the signature and parameter list after route_to_encoder, and
mirror the same ordering in the backend.WorkerConfig type stub in
lib/bindings/python/src/dynamo/_core.pyi to keep the Python API contract stable.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@components/src/dynamo/common/backend/worker.py`:
- Line 120: Move the new WorkerConfig field default_thinking_mode to the end of
the class so the dataclass stays append-only and existing positional callers
keep binding correctly. Update the WorkerConfig definition by appending
default_thinking_mode after the current optional fields, rather than inserting
it before exclude_tools_when_tool_choice_none or any existing parameters.

---

Outside diff comments:
In `@lib/bindings/python/rust/backend.rs`:
- Around line 297-345: The backend.WorkerConfig constructor signature changed
the positional argument order by inserting a new parameter in the middle, which
breaks existing callers. Update the PyO3 constructor in backend.rs so
default_thinking_mode is appended at the end of the signature and parameter list
after route_to_encoder, and mirror the same ordering in the backend.WorkerConfig
type stub in lib/bindings/python/src/dynamo/_core.pyi to keep the Python API
contract stable.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b0055b16-3d34-4118-ad8d-6ab9157ae694

📥 Commits

Reviewing files that changed from the base of the PR and between 1e05725 and 3443e08.

📒 Files selected for processing (17)

components/src/dynamo/common/backend/tests/test_backend_bindings.py
components/src/dynamo/common/backend/worker.py
components/src/dynamo/common/configuration/groups/runtime_args.py
components/src/dynamo/frontend/prepost.py
components/src/dynamo/frontend/sglang_prepost.py
components/src/dynamo/frontend/sglang_processor.py
components/src/dynamo/frontend/tests/test_sglang_processor_unit.py
components/src/dynamo/frontend/tests/test_vllm_processor_unit.py
components/src/dynamo/frontend/thinking.py
components/src/dynamo/frontend/vllm_processor.py
components/src/dynamo/sglang/register.py
components/src/dynamo/trtllm/workers/llm_worker.py
components/src/dynamo/vllm/main.py
lib/backend-common/src/worker.rs
lib/bindings/python/rust/backend.rs
lib/bindings/python/src/dynamo/_core.pyi
lib/llm/src/preprocessor.rs

coderabbitai · 2026-06-29T17:07:24Z

    custom_jinja_template: Optional[str] = None
    tool_call_parser: Optional[str] = None
    reasoning_parser: Optional[str] = None
+    default_thinking_mode: Optional[str] = None


🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Keep WorkerConfig append-only.

Line 120 inserts default_thinking_mode before existing optional fields, but this class already documents that new fields must be appended to preserve positional callers. Any positional WorkerConfig(...) call that used exclude_tools_when_tool_choice_none or anything after it will now silently bind the wrong values.

Suggested fix

- default_thinking_mode: Optional[str] = None exclude_tools_when_tool_choice_none: bool = True enable_local_indexer: bool = True @@ # Appended at the END of the dataclass to keep positional callers # working -- inserting mid-class would silently shift downstream args. route_to_encoder: bool = False + default_thinking_mode: Optional[str] = None

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

default_thinking_mode: Optional[str] = None

exclude_tools_when_tool_choice_none: bool = True

enable_local_indexer: bool = True

# Appended at the END of the dataclass to keep positional callers

# working -- inserting mid-class would silently shift downstream args.

route_to_encoder: bool = False

default_thinking_mode: Optional[str] = None

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@components/src/dynamo/common/backend/worker.py` at line 120, Move the new WorkerConfig field default_thinking_mode to the end of the class so the dataclass stays append-only and existing positional callers keep binding correctly. Update the WorkerConfig definition by appending default_thinking_mode after the current optional fields, rather than inserting it before exclude_tools_when_tool_choice_none or any existing parameters.

datadog-official · 2026-06-29T17:31:23Z

⚠️ Warnings

🚦 3 Pipeline jobs failed

PR | deploy-cleanup

PR | deploy-operator

PR | deploy-status-check

_{This comment will be updated automatically if new data arrives.

🔗 Commit SHA: 3443e08 | Docs | Give us feedback!}

…namo#11047) Adapt and apply ai-dynamo/dynamo PR ai-dynamo#11047 ("Support deployment-level thinking defaults for chat templates") onto feat/glm51-v12-20260610. Adds --dyn-default-thinking-mode enabled|disabled / DYN_DEFAULT_THINKING_MODE, published through model runtime metadata (runtime_data.default_thinking_mode) and applied in the SGLang/vLLM Python frontends and the Rust OpenAI preprocessor when a request carries no thinking control of its own. Request-level thinking/chat_template_args/chat_template_kwargs always win. Adaptations for this branch's older base (~855 commits behind the PR base): - preprocessor.rs: this branch's NvCreateChatCompletionRequest has no first-class `thinking` field, so request-level precedence checks `unsupported_fields` (where a root-level `thinking` lands) instead. - worker.rs: this branch's EngineConfig has no `runtime_data`, so the default-thinking runtime_data map is built fresh rather than cloned. - sglang_prepost.py: this branch lacks the newer-main `_normalize_openai_thinking_template_kwargs` helper; the deployment default is injected at the top of preprocess_chat_request (before reasoning gating and rendering) so both observe the same thinking state. - vllm/main.py: add missing `import json` for the set_engine_specific call. - The PR's frontend Python unit tests were not ported (written against newer-main fixtures); the Rust preprocessor unit tests are included. Original-author: Indrajit Bhosale <iamindrajitb@gmail.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

indrajit96 added 3 commits June 26, 2026 00:31

Thinking toggle at deployemnt level

f733cd1

Signed-off-by: Indrajit Bhosale <iamindrajitb@gmail.com>

Thinking toggle at deployemnt level

e5b2bc8

Signed-off-by: Indrajit Bhosale <iamindrajitb@gmail.com>

Merge branch 'main' into ibhosale/deployment-thinking-toggle

3443e08

Signed-off-by: Indrajit Bhosale <iamindrajitb@gmail.com>

indrajit96 requested review from a team as code owners June 29, 2026 16:57

pull-request-size Bot added the size/L label Jun 29, 2026

github-actions Bot added feat backend::vllm Relates to the vllm backend backend::sglang Relates to the sglang backend backend::trtllm Relates to the trtllm backend frontend `python -m dynamo.frontend` and `dynamo-run in=http|text|grpc` labels Jun 29, 2026

devin-ai-integration Bot reviewed Jun 29, 2026

View reviewed changes

coderabbitai Bot reviewed Jun 29, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: Support deployment-level thinking defaults for chat templates#11047

feat: Support deployment-level thinking defaults for chat templates#11047
indrajit96 wants to merge 3 commits into
mainfrom
ibhosale/deployment-thinking-toggle

indrajit96 commented Jun 29, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

devin-ai-integration Bot left a comment

Uh oh!

devin-ai-integration Bot Jun 29, 2026

Uh oh!

coderabbitai Bot commented Jun 29, 2026

❌ Failed checks (2 warnings)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Jun 29, 2026

Uh oh!

datadog-official Bot commented Jun 29, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

-    default_thinking_mode: Optional[str] = None
+    exclude_tools_when_tool_choice_none: bool = True
+    enable_local_indexer: bool = True
+    # Appended at the END of the dataclass to keep positional callers
+    # working -- inserting mid-class would silently shift downstream args.
+    route_to_encoder: bool = False
+    default_thinking_mode: Optional[str] = None

Uh oh!

Conversation

indrajit96 commented Jun 29, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview:

Details:

Validation:

Summary by CodeRabbit

Uh oh!

devin-ai-integration Bot left a comment

Choose a reason for hiding this comment

Uh oh!

devin-ai-integration Bot Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot commented Jun 29, 2026

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (2 warnings)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 29, 2026

Choose a reason for hiding this comment

Uh oh!

datadog-official Bot commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

⚠️ Warnings

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

indrajit96 commented Jun 29, 2026 •

edited by coderabbitai Bot

Loading

datadog-official Bot commented Jun 29, 2026 •

edited

Loading