[None][perf] offload chat template rendering into async#15284
[None][perf] offload chat template rendering into async#15284yechank-nvidia wants to merge 2 commits into
Conversation
|
/bot run |
📝 WalkthroughWalkthroughIntroduces ChangesAsync chat-template rendering and integration
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/unittest/inputs/test_chat_template_dispatch.py (1)
332-359: 🧹 Nitpick | 🔵 Trivial | 🏗️ Heavy liftCoverage is insufficient for the full async integration surface.
This test validates the wrapper contract well, but it does not cover the new
asyncio.gatherintegration paths. Please add follow-up tests for:
tests/unittest/serve/test_openai_server.py(bothprompt_token_idsand non-prompt_token_idsbranches),tests/unittest/serve/test_resource_governor.py(_convert_messagestokenization path),tests/unittest/serve/test_responses_utils.py(_create_input_tokensgather + multimodal unpack path).As per coding guidelines, reviews under
tests/**should explicitly assess whether coverage is sufficient and call out concrete follow-up files when it is not.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unittest/inputs/test_chat_template_dispatch.py` around lines 332 - 359, The test class TestAsyncApplyChatTemplate currently only validates the wrapper contract through test_runs_in_worker_thread but does not cover the asyncio.gather integration paths used in the actual application. Add follow-up test cases in three additional files: test_openai_server.py covering both prompt_token_ids and non-prompt_token_ids branches, test_resource_governor.py covering the _convert_messages tokenization path, and test_responses_utils.py covering the _create_input_tokens gather operation combined with multimodal unpacking. Each test should verify that async_apply_chat_template integrates correctly with its respective calling context and that asyncio.gather properly coordinates the async operations.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tensorrt_llm/serve/resource_governor.py`:
- Around line 104-106: Replace the use of self.model_config.model_type in the
async_apply_chat_template call with a call to resolve_top_level_model_type to
ensure KV-truncation tokenization uses the same model-type resolver as the
serving paths and maintains consistency for aliased or top-level configs.
In `@tensorrt_llm/serve/responses_utils.py`:
- Line 838: The asyncio.gather assignment at line 838 incorrectly unpacks the
result from mm_coroutines. The mm_coroutines awaitable returns a tuple of
(mm_data, mm_embeddings), but the current code assigns this entire tuple
directly to mm_data, causing _create_input_tokens to return a tuple instead of
just the multimodal data dict as documented. Fix this by properly unpacking the
tuple result in the asyncio.gather assignment so that both mm_data and
mm_embeddings are correctly extracted, rather than assigning the tuple to
mm_data.
---
Nitpick comments:
In `@tests/unittest/inputs/test_chat_template_dispatch.py`:
- Around line 332-359: The test class TestAsyncApplyChatTemplate currently only
validates the wrapper contract through test_runs_in_worker_thread but does not
cover the asyncio.gather integration paths used in the actual application. Add
follow-up test cases in three additional files: test_openai_server.py covering
both prompt_token_ids and non-prompt_token_ids branches,
test_resource_governor.py covering the _convert_messages tokenization path, and
test_responses_utils.py covering the _create_input_tokens gather operation
combined with multimodal unpacking. Each test should verify that
async_apply_chat_template integrates correctly with its respective calling
context and that asyncio.gather properly coordinates the async operations.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: fd1178bd-f85c-45f3-b9e7-0112fb8b878a
📒 Files selected for processing (5)
tensorrt_llm/inputs/utils.pytensorrt_llm/serve/openai_server.pytensorrt_llm/serve/resource_governor.pytensorrt_llm/serve/responses_utils.pytests/unittest/inputs/test_chat_template_dispatch.py
|
PR_Github #55020 [ run ] triggered by Bot. Commit: |
5c5024f to
45efcca
Compare
|
PR_Github #55020 [ run ] completed with state
|
|
/bot run |
|
PR_Github #55028 [ run ] triggered by Bot. Commit: |
|
PR_Github #55028 [ run ] completed with state
|
|
/bot run |
|
PR_Github #55250 [ run ] triggered by Bot. Commit: |
|
PR_Github #55250 [ run ] completed with state
|
|
/bot run --disable-fail-fast |
|
PR_Github #55356 [ run ] triggered by Bot. Commit: |
|
PR_Github #55356 [ run ] completed with state
|
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
- resource_governor: resolve the top-level model type (resolve_top_level_ model_type) in _convert_messages, matching the serving call sites, instead of the raw model_config.model_type. - responses_utils: unpack the (mm_data, mm_embeddings) tuple from the asyncio.gather result so _create_input_tokens returns mm_data (not the whole tuple) as its contract states. - tests: add async regression coverage for both gather paths (ResourceGovernor._convert_messages and _create_input_tokens). Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
45efcca to
644bddc
Compare
|
/bot run --disable-fail-fast |
|
PR_Github #55585 [ run ] triggered by Bot. Commit: |
|
PR_Github #55585 [ run ] completed with state
|
|
|
||
| @pytest.mark.asyncio | ||
| async def test_resource_governor_convert_messages(self, monkeypatch): | ||
| from unittest.mock import Mock |
There was a problem hiding this comment.
Nit: move imports to module-level.
Summary by CodeRabbit
Performance Improvements
Tests