feat: support multiple model names in --served_model_name #12746

nvyutwu wants to merge 1 commit into NVIDIA:main
Conversation
📝 Walkthrough

The PR adds support for multiple served model names (aliases). The CLI option `--served_model_name` can now be specified multiple times.

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant OpenAIServer
    participant ModelResolver as Model Resolver
    Client->>OpenAIServer: POST /v1/chat/completions<br/>model="alias_name"
    activate OpenAIServer
    OpenAIServer->>ModelResolver: _resolve_model_name("alias_name")
    activate ModelResolver
    ModelResolver-->>OpenAIServer: resolved_model_name
    deactivate ModelResolver
    OpenAIServer->>OpenAIServer: Generate response with<br/>model=resolved_model_name
    OpenAIServer-->>Client: Response with<br/>model field matching alias
    deactivate OpenAIServer
    Client->>OpenAIServer: GET /v1/models
    activate OpenAIServer
    OpenAIServer-->>Client: List all served_model_names<br/>from ModelCard entries
    deactivate OpenAIServer
```
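The `_resolve_model_name` step in the diagram can be sketched roughly as follows. This is a minimal illustration, not the actual implementation in `openai_server.py`; the class name and the convention that the first entry is the primary name are assumptions based on the PR description.

```python
from typing import List, Optional

class ModelResolver:
    """Minimal sketch of the alias-resolution step in the diagram above."""

    def __init__(self, served_model_names: List[str]):
        # Assumption: the first entry is the primary name, the rest are aliases.
        self.served_model_names = served_model_names

    def resolve(self, requested: Optional[str]) -> str:
        # Echo the client-requested name back if it is a known alias;
        # otherwise fall back to the primary name.
        if requested in self.served_model_names:
            return requested
        return self.served_model_names[0]

resolver = ModelResolver(["my-model", "alias1"])
print(resolver.resolve("alias1"))   # a known alias is echoed back
print(resolver.resolve("unknown"))  # unknown names fall back to the primary
```

This matches the observable behavior described above: responses echo the client-requested alias, and anything unrecognized resolves to the primary name.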
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
✅ Passed checks (2 passed)
🧹 Nitpick comments (2)
tensorrt_llm/serve/openai_server.py (1)
**197-210: Duplicate aliases are not fully deduplicated.** The current logic only filters out the primary name from subsequent aliases. If a user provides `["model", "alias1", "alias1"]`, the duplicate `alias1` will appear twice in `served_model_names`. Consider using a set or ordered deduplication:

```diff
- self.served_model_names: List[str] = [primary] + [
-     n for n in names[1:] if n != primary
- ]
+ seen = {primary}
+ self.served_model_names: List[str] = [primary]
+ for n in names[1:]:
+     if n not in seen:
+         seen.add(n)
+         self.served_model_names.append(n)
```

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `tensorrt_llm/serve/openai_server.py` around lines 197-210, the deduplication logic for `served_model_names` currently only removes occurrences equal to `primary` but allows duplicate aliases. Update the construction of `self.served_model_names` to perform ordered deduplication: iterate over `names` (skipping the primary entry), track seen names in a set, and append only the first occurrence of each alias so duplicates are eliminated while preserving order and still excluding `primary`; keep the existing primary/model_dir handling and assign `self.model = primary` as before.

tensorrt_llm/commands/serve.py (1)
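The suggested fix above is plain ordered deduplication. As a standalone sketch (the helper name is hypothetical, not part of the PR):

```python
from typing import List

def dedup_served_names(names: List[str]) -> List[str]:
    """Ordered deduplication: keep the first occurrence of every name."""
    primary = names[0]
    seen = {primary}
    result = [primary]
    for n in names[1:]:
        if n not in seen:
            seen.add(n)
            result.append(n)
    return result

print(dedup_served_names(["model", "alias1", "alias1"]))  # ['model', 'alias1']
```

Unlike the original list-comprehension, this removes repeated aliases as well as repeats of the primary name, while preserving the order the user supplied.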
**324-357: Document the gRPC server's single-model limitation.** The gRPC server uses only the first served model name (line 357) and passes it to `TrtllmServiceServicer`, which returns a single `model_id` in `GetModelInfo`. The HTTP server, by contrast, supports multiple model aliases via `OpenAIServer`. Since this is an intentional design (gRPC targets external routers that handle aliasing themselves), consider clarifying this behavior in the CLI help text or PR description to avoid confusion.

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `tensorrt_llm/commands/serve.py` around lines 324-357, the gRPC server implemented in `launch_grpc_server` only uses the first entry of `served_model_name` (variable `_names` -> `model_path`) and `TrtllmServiceServicer`/its `GetModelInfo` returns a single `model_id`, unlike the HTTP `OpenAIServer` which supports multiple aliases; update the CLI help text or the PR description to explicitly document this single-model limitation for the gRPC interface so users know gRPC expects external routers to handle aliasing.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: c733621e-35b3-4d4b-a473-ea337f779734
📒 Files selected for processing (2)
- tensorrt_llm/commands/serve.py
- tensorrt_llm/serve/openai_server.py
Force-pushed 294549f to a2b661b
@coderabbitai review

✅ Actions performed: Review triggered.
🧹 Nitpick comments (1)
tensorrt_llm/serve/openai_server.py (1)
**197-213: Edge case: empty sequence produces empty primary model name.** If an empty list/tuple is passed, `primary` becomes an empty string. While this is unlikely in practice (the CLI defaults to the model path when no `--served_model_name` is provided), consider adding a guard or assertion to fail fast.

🛡️ Optional defensive check:

```diff
  if isinstance(model, (list, tuple)):
      names = list(model)
  else:
      names = [model]
+ if not names or not names[0]:
+     raise ValueError("At least one model name must be provided")
  primary = names[0] if names else ""
```

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `tensorrt_llm/serve/openai_server.py` around lines 197-213, the code normalizes model names into `names`/`primary` and can set `primary` to an empty string if an empty list/tuple is passed; add a defensive check at the start of that block to fail fast (e.g., `if not names: raise ValueError("served_model_name cannot be empty")`) so `primary` is never empty, and ensure subsequent assignments to `self.model` and `self.served_model_names` only run after this validation.
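The normalize-then-validate pattern suggested above can be sketched as a free function; the function name is illustrative, not the actual code in `openai_server.py`:

```python
from typing import List, Sequence, Union

def normalize_model_names(model: Union[str, Sequence[str]]) -> List[str]:
    """Normalize a name-or-names argument to a non-empty list, failing fast."""
    names = list(model) if isinstance(model, (list, tuple)) else [model]
    # Guard before anything derives a primary name from names[0].
    if not names or not names[0]:
        raise ValueError("At least one model name must be provided")
    return names

print(normalize_model_names("my-model"))  # ['my-model']
print(normalize_model_names(["a", "b"]))  # ['a', 'b']
# normalize_model_names([]) raises ValueError instead of yielding primary == ""
```

Failing fast here means a misconfigured server dies at startup with a clear message, rather than registering a model whose name is the empty string.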
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 88be5c4b-56c9-4278-a708-97ff1a6ed60a
📒 Files selected for processing (2)
- tensorrt_llm/commands/serve.py
- tensorrt_llm/serve/openai_server.py
Allow specifying multiple served model names so that requests using any alias are accepted and the /v1/models endpoint returns all names.

Changes:
- serve.py: --served_model_name is now multiple=True (specify flag multiple times); launch_server/launch_grpc_server accept Sequence[str]; passes list to OpenAIServer
- openai_server.py: __init__ accepts Union[str, Sequence[str]]; stores self.model (primary) and self.served_model_names (all aliases); get_model() returns a ModelCard for each name; added _resolve_model_name() to echo back the client-requested name in responses if it matches a known alias
- Fully deduplicates aliases using ordered set logic
- Documents gRPC single-model limitation in docstring

Usage: trtllm-serve model --served_model_name my-model --served_model_name alias1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: nvyutwu <yutwu@nvidia.com>
Force-pushed a2b661b to 298d284
Summary
- Allow `--served_model_name` to be specified multiple times for model name aliases
- `/v1/models` endpoint returns all registered names; responses echo back the client-requested alias

Changes

- `serve.py`: `--served_model_name` gains `multiple=True` (click); `launch_server`/`launch_grpc_server` accept `Sequence[str]`; normalizes to list internally
- `openai_server.py`: `__init__` accepts `Union[str, Sequence[str]]`; stores `self.model` (primary) and `self.served_model_names` (all); `get_model()` returns all names; added `_resolve_model_name()` to echo the client-requested alias in responses

Usage

trtllm-serve model --served_model_name my-model --served_model_name alias1

Test plan

- `--served_model_name my-model --served_model_name alias1` starts server with both names
- `/v1/models` returns both model names
- Alias requests echo the requested name in the `model` field
- Single `--served_model_name` still works as before (backward compatible)
- Omitting `--served_model_name` defaults to model path (backward compatible)

🤖 Generated with Claude Code
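The repeated-flag behavior described above (click's `multiple=True`) can be approximated with the standard library. This argparse sketch is illustrative only and is not the actual trtllm-serve option definition:

```python
import argparse

parser = argparse.ArgumentParser(prog="serve-sketch")
parser.add_argument("model")
# action="append" mirrors click's multiple=True: every occurrence of the
# flag appends one value, and the attribute is None if the flag is absent.
parser.add_argument("--served_model_name", action="append")

args = parser.parse_args(
    ["my-model",
     "--served_model_name", "my-model",
     "--served_model_name", "alias1"])
# Fall back to the model path when the flag was never given,
# matching the backward-compatible default described in the test plan.
names = args.served_model_name or [args.model]
print(names)  # ['my-model', 'alias1']
```

The fallback on the last line is what keeps the single-name and no-name invocations working unchanged.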