feat: support multiple model names in --served_model_name #12746

nvyutwu wants to merge 1 commit into NVIDIA:main
Conversation
📝 Walkthrough

The PR adds support for multiple served model names (aliases). The CLI option `--served_model_name` can now be specified multiple times.

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant OpenAIServer
    participant ModelResolver as Model Resolver
    Client->>OpenAIServer: POST /v1/chat/completions<br/>model="alias_name"
    activate OpenAIServer
    OpenAIServer->>ModelResolver: _resolve_model_name("alias_name")
    activate ModelResolver
    ModelResolver-->>OpenAIServer: resolved_model_name
    deactivate ModelResolver
    OpenAIServer->>OpenAIServer: Generate response with<br/>model=resolved_model_name
    OpenAIServer-->>Client: Response with<br/>model field matching alias
    deactivate OpenAIServer
    Client->>OpenAIServer: GET /v1/models
    activate OpenAIServer
    OpenAIServer-->>Client: List all served_model_names<br/>from ModelCard entries
    deactivate OpenAIServer
```
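The `_resolve_model_name` step in the diagram can be sketched roughly as follows. This is a minimal illustration, not the actual implementation in `openai_server.py`; the class name and the convention that the first entry is the primary name are assumptions based on the PR description.

```python
from typing import List, Optional

class ModelResolver:
    """Minimal sketch of the alias-resolution step in the diagram above."""

    def __init__(self, served_model_names: List[str]):
        # Assumption: the first entry is the primary name, the rest are aliases.
        self.served_model_names = served_model_names

    def resolve(self, requested: Optional[str]) -> str:
        # Echo the client-requested name back if it is a known alias;
        # otherwise fall back to the primary name.
        if requested in self.served_model_names:
            return requested
        return self.served_model_names[0]

resolver = ModelResolver(["my-model", "alias1"])
print(resolver.resolve("alias1"))   # a known alias is echoed back
print(resolver.resolve("unknown"))  # unknown names fall back to the primary
```

This matches the observable behavior described above: responses echo the client-requested alias, and anything unrecognized resolves to the primary name.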
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
✅ Passed checks (2 passed)
🧹 Nitpick comments (2)
tensorrt_llm/serve/openai_server.py (1)
**197-210: Duplicate aliases are not fully deduplicated.** The current logic only filters out the primary name from subsequent aliases. If a user provides `["model", "alias1", "alias1"]`, the duplicate `alias1` will appear twice in `served_model_names`. Consider using a set or ordered deduplication:

```diff
- self.served_model_names: List[str] = [primary] + [
-     n for n in names[1:] if n != primary
- ]
+ seen = {primary}
+ self.served_model_names: List[str] = [primary]
+ for n in names[1:]:
+     if n not in seen:
+         seen.add(n)
+         self.served_model_names.append(n)
```

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `tensorrt_llm/serve/openai_server.py` around lines 197-210, the deduplication logic for `served_model_names` currently only removes occurrences equal to `primary` but allows duplicate aliases. Update the construction of `self.served_model_names` to perform ordered deduplication: iterate over `names` (skipping the primary entry), track seen names in a set, and append only the first occurrence of each alias so duplicates are eliminated while preserving order and still excluding `primary`; keep the existing primary/model_dir handling and assign `self.model = primary` as before.

tensorrt_llm/commands/serve.py (1)
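The suggested fix above is plain ordered deduplication. As a standalone sketch (the helper name is hypothetical, not part of the PR):

```python
from typing import List

def dedup_served_names(names: List[str]) -> List[str]:
    """Ordered deduplication: keep the first occurrence of every name."""
    primary = names[0]
    seen = {primary}
    result = [primary]
    for n in names[1:]:
        if n not in seen:
            seen.add(n)
            result.append(n)
    return result

print(dedup_served_names(["model", "alias1", "alias1"]))  # ['model', 'alias1']
```

Unlike the original list-comprehension, this removes repeated aliases as well as repeats of the primary name, while preserving the order the user supplied.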
**324-357: Document the gRPC server's single-model limitation.** The gRPC server uses only the first served model name (line 357) and passes it to `TrtllmServiceServicer`, which returns a single `model_id` in `GetModelInfo`. The HTTP server, by contrast, supports multiple model aliases via `OpenAIServer`. Since this is an intentional design (gRPC targets external routers that handle aliasing themselves), consider clarifying this behavior in the CLI help text or PR description to avoid confusion.

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `tensorrt_llm/commands/serve.py` around lines 324-357, the gRPC server implemented in `launch_grpc_server` only uses the first entry of `served_model_name` (variable `_names` -> `model_path`) and `TrtllmServiceServicer`/its `GetModelInfo` returns a single `model_id`, unlike the HTTP `OpenAIServer` which supports multiple aliases; update the CLI help text or the PR description to explicitly document this single-model limitation for the gRPC interface so users know gRPC expects external routers to handle aliasing.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: c733621e-35b3-4d4b-a473-ea337f779734
📒 Files selected for processing (2)
- tensorrt_llm/commands/serve.py
- tensorrt_llm/serve/openai_server.py
Force-pushed 294549f to a2b661b
@coderabbitai review

✅ Actions performed: Review triggered.
🧹 Nitpick comments (1)
tensorrt_llm/serve/openai_server.py (1)
**197-213: Edge case: empty sequence produces empty primary model name.** If an empty list/tuple is passed, `primary` becomes an empty string. While this is unlikely in practice (the CLI defaults to the model path when no `--served_model_name` is provided), consider adding a guard or assertion to fail fast.

🛡️ Optional defensive check:

```diff
  if isinstance(model, (list, tuple)):
      names = list(model)
  else:
      names = [model]
+ if not names or not names[0]:
+     raise ValueError("At least one model name must be provided")
  primary = names[0] if names else ""
```

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `tensorrt_llm/serve/openai_server.py` around lines 197-213, the code normalizes model names into `names`/`primary` and can set `primary` to an empty string if an empty list/tuple is passed; add a defensive check at the start of that block to fail fast (e.g., `if not names: raise ValueError("served_model_name cannot be empty")`) so `primary` is never empty, and ensure subsequent assignments to `self.model` and `self.served_model_names` only run after this validation.
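The normalize-then-validate pattern suggested above can be sketched as a free function; the function name is illustrative, not the actual code in `openai_server.py`:

```python
from typing import List, Sequence, Union

def normalize_model_names(model: Union[str, Sequence[str]]) -> List[str]:
    """Normalize a name-or-names argument to a non-empty list, failing fast."""
    names = list(model) if isinstance(model, (list, tuple)) else [model]
    # Guard before anything derives a primary name from names[0].
    if not names or not names[0]:
        raise ValueError("At least one model name must be provided")
    return names

print(normalize_model_names("my-model"))  # ['my-model']
print(normalize_model_names(["a", "b"]))  # ['a', 'b']
# normalize_model_names([]) raises ValueError instead of yielding primary == ""
```

Failing fast here means a misconfigured server dies at startup with a clear message, rather than registering a model whose name is the empty string.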
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 88be5c4b-56c9-4278-a708-97ff1a6ed60a
📒 Files selected for processing (2)
- tensorrt_llm/commands/serve.py
- tensorrt_llm/serve/openai_server.py
Allow specifying multiple served model names so that requests using any alias are accepted and the /v1/models endpoint returns all names.

Changes:
- serve.py: --served_model_name is now multiple=True (specify flag multiple times); launch_server/launch_grpc_server accept Sequence[str]; passes list to OpenAIServer
- openai_server.py: __init__ accepts Union[str, Sequence[str]]; stores self.model (primary) and self.served_model_names (all aliases); get_model() returns a ModelCard for each name; added _resolve_model_name() to echo back the client-requested name in responses if it matches a known alias
- Fully deduplicates aliases using ordered set logic
- Documents gRPC single-model limitation in docstring

Usage: trtllm-serve model --served_model_name my-model --served_model_name alias1

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: nvyutwu <yutwu@nvidia.com>
Force-pushed a2b661b to 298d284
Summary
- Allow `--served_model_name` to be specified multiple times for model name aliases
- `/v1/models` endpoint returns all registered names; responses echo back the client-requested alias

Changes

- `serve.py`: `--served_model_name` gains `multiple=True` (click); `launch_server`/`launch_grpc_server` accept `Sequence[str]`; normalizes to list internally
- `openai_server.py`: `__init__` accepts `Union[str, Sequence[str]]`; stores `self.model` (primary) and `self.served_model_names` (all); `get_model()` returns all names; added `_resolve_model_name()` to echo the client-requested alias in responses

Usage

trtllm-serve model --served_model_name my-model --served_model_name alias1

Test plan

- `--served_model_name my-model --served_model_name alias1` starts server with both names
- `/v1/models` returns both model names
- Alias requests echo the requested name in the `model` field
- Single `--served_model_name` still works as before (backward compatible)
- Omitting `--served_model_name` defaults to model path (backward compatible)

🤖 Generated with Claude Code
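The repeated-flag behavior described above (click's `multiple=True`) can be approximated with the standard library. This argparse sketch is illustrative only and is not the actual trtllm-serve option definition:

```python
import argparse

parser = argparse.ArgumentParser(prog="serve-sketch")
parser.add_argument("model")
# action="append" mirrors click's multiple=True: every occurrence of the
# flag appends one value, and the attribute is None if the flag is absent.
parser.add_argument("--served_model_name", action="append")

args = parser.parse_args(
    ["my-model",
     "--served_model_name", "my-model",
     "--served_model_name", "alias1"])
# Fall back to the model path when the flag was never given,
# matching the backward-compatible default described in the test plan.
names = args.served_model_name or [args.model]
print(names)  # ['my-model', 'alias1']
```

The fallback on the last line is what keeps the single-name and no-name invocations working unchanged.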