
feat: support multiple model names in --served_model_name#12746

Open
nvyutwu wants to merge 1 commit into NVIDIA:main from nvyutwu:yutwu/multi-model-name

Conversation

@nvyutwu

@nvyutwu nvyutwu commented Apr 3, 2026

Summary

  • Allow --served_model_name to be specified multiple times for model name aliases
  • The first name is the primary; additional names are aliases the server also accepts
  • /v1/models endpoint returns all registered names; responses echo back the client-requested alias
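For illustration, the /v1/models payload for two registered names might look like the following; `models_list` is a hypothetical builder following the OpenAI models-list schema, not code from this PR:

```python
# Hypothetical sketch of the /v1/models response shape once aliases are
# registered. Field names follow the OpenAI models-list schema; the
# "owned_by" value and the builder itself are illustrative assumptions.
import time
from typing import List


def models_list(served_model_names: List[str]) -> dict:
    return {
        "object": "list",
        "data": [
            {
                "id": name,               # each registered name gets its own card
                "object": "model",
                "created": int(time.time()),
                "owned_by": "organization",
            }
            for name in served_model_names
        ],
    }


# e.g. models_list(["my-model", "alias1"]) lists both names
```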

Changes

  • serve.py: --served_model_name gains multiple=True (click); launch_server/launch_grpc_server accept Sequence[str]; normalizes to list internally
  • openai_server.py: __init__ accepts Union[str, Sequence[str]]; stores self.model (primary) and self.served_model_names (all); get_model() returns all names; added _resolve_model_name() to echo the client-requested alias in responses
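The repeatable-flag behavior described above can be sketched with stdlib argparse, whose `action="append"` mirrors click's `multiple=True`; `parse_served_names` is a hypothetical helper, not the actual serve.py code:

```python
# Illustrative sketch (not the actual serve.py): a repeatable CLI flag
# collected into a list, falling back to the model path when omitted.
import argparse
from typing import List, Optional


def parse_served_names(argv: List[str]) -> List[str]:
    parser = argparse.ArgumentParser()
    parser.add_argument("model")
    # action="append" lets the flag be given multiple times, like click's
    # multiple=True; the result is None when the flag is never used.
    parser.add_argument("--served_model_name", action="append", default=None)
    args = parser.parse_args(argv)
    names: Optional[List[str]] = args.served_model_name
    # Backward compatible: default to the model path when no name is given.
    return names if names else [args.model]


# The first returned name is the primary; the rest are aliases.
```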

Usage

trtllm-serve model --served_model_name my-model --served_model_name alias1

Test plan

  • Verify --served_model_name my-model --served_model_name alias1 starts server with both names
  • Verify /v1/models returns both model names
  • Verify chat/completion requests with alias name return the alias in the response model field
  • Verify single --served_model_name still works as before (backward compatible)
  • Verify omitting --served_model_name defaults to model path (backward compatible)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features
    • Serve a model under multiple name aliases via repeated --served_model_name; server accepts and normalizes multiple names.
    • The /v1/models endpoint lists all configured aliases; requests may use any alias and responses reflect the requested alias.
  • Behavior Change
    • gRPC serving uses the primary name (first alias) while HTTP/OpenAI endpoints honor aliases.

@nvyutwu nvyutwu requested a review from a team as a code owner April 3, 2026 20:41
@nvyutwu nvyutwu requested a review from hchings April 3, 2026 20:41
@coderabbitai
Contributor

coderabbitai bot commented Apr 3, 2026

📝 Walkthrough

Walkthrough

The PR adds support for multiple served model names (aliases). The CLI option --served_model_name now accepts multiple values. launch_server and launch_grpc_server accept Optional[Sequence[str]]. OpenAIServer accepts a model list, resolves request model names to aliases, and exposes all aliases via /v1/models.

Changes

Cohort / File(s) Summary
CLI Command & Launch Functions
tensorrt_llm/commands/serve.py
Changed --served_model_name from a single string to a multi-valued Click option (multiple=True). serve() now receives a Tuple[str, ...]. launch_server() and launch_grpc_server() signatures accept and normalize Optional[Sequence[str]]. gRPC path derives model_path from the first name; HTTP/OpenAI uses the full list when provided.
OpenAI Server Model Handling
tensorrt_llm/serve/openai_server.py
OpenAIServer.__init__ now accepts model: Union[str, Sequence[str]] and normalizes primary name plus unique aliases into served_model_names. Added OpenAIServer._resolve_model_name(requested: Optional[str]) -> str. /v1/models returns all aliases; chat/completions/embeddings and Responses API use resolved model names in responses (health generation uses primary).
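The normalization and resolution described above can be sketched as follows; the class and method names mirror the PR, but the bodies are illustrative assumptions, not the actual openai_server.py:

```python
# Hypothetical sketch of alias normalization and request-name resolution.
# It folds in the two reviewer suggestions below: ordered deduplication of
# aliases and a fail-fast guard against an empty name list.
from typing import List, Optional, Sequence, Union


class OpenAIServerSketch:
    def __init__(self, model: Union[str, Sequence[str]]):
        names = list(model) if isinstance(model, (list, tuple)) else [model]
        if not names or not names[0]:
            raise ValueError("At least one model name must be provided")
        primary = names[0]
        # Ordered deduplication: keep the first occurrence of each alias.
        seen = {primary}
        self.model = primary
        self.served_model_names: List[str] = [primary]
        for n in names[1:]:
            if n not in seen:
                seen.add(n)
                self.served_model_names.append(n)

    def _resolve_model_name(self, requested: Optional[str]) -> str:
        # Echo the requested alias when it is known; otherwise fall back
        # to the primary name.
        if requested in self.served_model_names:
            return requested
        return self.model
```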

Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant OpenAIServer
    participant ModelResolver as Model Resolver

    Client->>OpenAIServer: POST /v1/chat/completions<br/>model="alias_name"
    activate OpenAIServer
    OpenAIServer->>ModelResolver: _resolve_model_name("alias_name")
    activate ModelResolver
    ModelResolver-->>OpenAIServer: resolved_model_name
    deactivate ModelResolver
    OpenAIServer->>OpenAIServer: Generate response with<br/>model=resolved_model_name
    OpenAIServer-->>Client: Response with<br/>model field matching alias
    deactivate OpenAIServer

    Client->>OpenAIServer: GET /v1/models
    activate OpenAIServer
    OpenAIServer-->>Client: List all served_model_names<br/>from ModelCard entries
    deactivate OpenAIServer
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 25.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)
  • Title check ✅ — The title accurately and concisely describes the main feature: support for multiple model names in the --served_model_name option.
  • Description check ✅ — The description is well-structured with Summary, Changes, Usage, and Test plan sections, though it deviates from the repository's required template, which also calls for PR Checklist, CODEOWNERS updates, and documentation updates.



Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (2)
tensorrt_llm/serve/openai_server.py (1)

197-210: Duplicate aliases are not fully deduplicated.

The current logic only filters out the primary name from subsequent aliases. If a user provides ["model", "alias1", "alias1"], the duplicate alias1 will appear twice in served_model_names. Consider using a set or ordered deduplication:

```diff
-        self.served_model_names: List[str] = [primary] + [
-            n for n in names[1:] if n != primary
-        ]
+        seen = {primary}
+        self.served_model_names: List[str] = [primary]
+        for n in names[1:]:
+            if n not in seen:
+                seen.add(n)
+                self.served_model_names.append(n)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/serve/openai_server.py` around lines 197 - 210, The
deduplication logic for served_model_names currently only removes occurrences
equal to primary but allows duplicate aliases (in the block handling model ->
names and setting self.model and self.served_model_names). Update the
construction of self.served_model_names to perform ordered deduplication:
iterate over names (skipping the primary entry), track seen names in a set, and
append only the first occurrence of each alias so duplicates are eliminated
while preserving order and still excluding primary; keep the existing
primary/model_dir handling and assign self.model = primary as before.
tensorrt_llm/commands/serve.py (1)

324-357: Document the gRPC server's single-model limitation.

The gRPC server uses only the first served model name (line 357) and passes it to TrtllmServiceServicer, which returns a single model_id in GetModelInfo. The HTTP server, by contrast, supports multiple model aliases via OpenAIServer. Since this is an intentional design (gRPC targets external routers that handle aliasing themselves), consider clarifying this behavior in the CLI help text or PR description to avoid confusion.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/commands/serve.py` around lines 324 - 357, The gRPC server
implemented in launch_grpc_server only uses the first entry of served_model_name
(variable _names -> model_path) and TrtllmServiceServicer/its GetModelInfo
returns a single model_id, unlike the HTTP OpenAIServer which supports multiple
aliases; update the CLI help text or the PR description to explicitly document
this single-model limitation for the gRPC interface (mentioning
launch_grpc_server, served_model_name, TrtllmServiceServicer, GetModelInfo and
OpenAIServer) so users know gRPC expects external routers to handle aliasing.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c733621e-35b3-4d4b-a473-ea337f779734

📥 Commits

Reviewing files that changed from the base of the PR and between 7ee9e8b and cdda8d8.

📒 Files selected for processing (2)
  • tensorrt_llm/commands/serve.py
  • tensorrt_llm/serve/openai_server.py

@svc-trtllm-gh-bot svc-trtllm-gh-bot added the Community want to contribute PRs initiated from Community label Apr 3, 2026
@nvyutwu nvyutwu force-pushed the yutwu/multi-model-name branch from 294549f to a2b661b on April 3, 2026 at 22:04
@nvyutwu
Author

nvyutwu commented Apr 3, 2026

@coderabbitai review

@coderabbitai
Contributor

coderabbitai bot commented Apr 3, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Contributor

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
tensorrt_llm/serve/openai_server.py (1)

197-213: Edge case: empty sequence produces empty primary model name.

If an empty list/tuple is passed, primary becomes an empty string. While this is unlikely in practice (the CLI defaults to the model path when no --served_model_name is provided), consider adding a guard or assertion to fail fast.

🛡️ Optional defensive check
```diff
         if isinstance(model, (list, tuple)):
             names = list(model)
         else:
             names = [model]
+        if not names or not names[0]:
+            raise ValueError("At least one model name must be provided")
         primary = names[0] if names else ""
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tensorrt_llm/serve/openai_server.py` around lines 197 - 213, The code
normalizes model names into names/primary and can set primary to an empty string
if an empty list/tuple is passed; add a defensive check at the start of that
block to fail fast (e.g., if not names: raise ValueError("served_model_name
cannot be empty") or assert names) so primary is never empty, and ensure
subsequent assignments to self.model and self.served_model_names only run after
this validation; reference the variables names, primary, model, self.model, and
self.served_model_names when making the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 88be5c4b-56c9-4278-a708-97ff1a6ed60a

📥 Commits

Reviewing files that changed from the base of the PR and between cdda8d8 and a2b661b.

📒 Files selected for processing (2)
  • tensorrt_llm/commands/serve.py
  • tensorrt_llm/serve/openai_server.py

Allow specifying multiple served model names so that requests using any
alias are accepted and the /v1/models endpoint returns all names.

Changes:
- serve.py: --served_model_name is now multiple=True (specify flag
  multiple times); launch_server/launch_grpc_server accept Sequence[str];
  passes list to OpenAIServer
- openai_server.py: __init__ accepts Union[str, Sequence[str]]; stores
  self.model (primary) and self.served_model_names (all aliases);
  get_model() returns a ModelCard for each name; added
  _resolve_model_name() to echo back the client-requested name in
  responses if it matches a known alias
- Fully deduplicates aliases using ordered set logic
- Documents gRPC single-model limitation in docstring

Usage:
  trtllm-serve model --served_model_name my-model --served_model_name alias1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: nvyutwu <yutwu@nvidia.com>
@nvyutwu nvyutwu force-pushed the yutwu/multi-model-name branch from a2b661b to 298d284 on April 3, 2026 at 22:17

Labels

Community want to contribute PRs initiated from Community
