Add multimodal and preemption metrics by CUHKSZzxy · Pull Request #4640 · InternLM/lmdeploy

CUHKSZzxy · 2026-06-01T07:54:18Z

Summary

Add multimodal preprocessing metrics for VLM requests, including request/item counters, total preprocessing latency, per-stage latency, per-request item count, and preprocessing failure counters.
Record multimodal metrics through the existing metrics processor and Prometheus logger path.
Export multimodal detailed metrics by default whenever the metrics system is enabled; no extra CLI flag is required.
Track PyTorch engine preemption events when requests are evicted under KV-cache pressure.
Export preemption visibility in both terminal logging and Prometheus via lmdeploy:num_preemptions_total.
Keep TPOT as the existing per-iteration inter-token metric. Request decode/inference histograms now measure up to the latest generated token, so preemption gaps are included but API-side finalization time is not.
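The timing distinction in the last item can be illustrated with a small sketch. The `RequestTimeline` class and its fields are hypothetical and are not lmdeploy's actual data structures. The sketch only shows why ending the decode interval at the latest token timestamp keeps a preemption gap inside the measurement while leaving out any time spent after the last token.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RequestTimeline:
    """Hypothetical per-request timing record (illustration only)."""

    first_token_time: float
    later_token_times: List[float] = field(default_factory=list)

    def on_token(self, now: float) -> None:
        # Record the arrival time of each token after the first one.
        self.later_token_times.append(now)

    def inter_token_gaps(self) -> List[float]:
        # Gaps between consecutive token timestamps.
        times = [self.first_token_time] + self.later_token_times
        return [b - a for a, b in zip(times, times[1:])]

    def decode_time(self) -> float:
        # Measured from the first token through the latest generated token,
        # so anything that happens after the last token is not counted.
        if not self.later_token_times:
            return 0.0
        return self.later_token_times[-1] - self.first_token_time


# Made-up timestamps: the request is preempted between t=1.5 and t=3.5,
# and the API layer finishes its own bookkeeping at t=4.25.
timeline = RequestTimeline(first_token_time=1.0)
for t in (1.5, 3.5, 4.0):
    timeline.on_token(t)

finalize_time = 4.25
print(timeline.inter_token_gaps())                # [0.5, 2.0, 0.5]
print(timeline.decode_time())                     # 3.0
print(finalize_time - timeline.first_token_time)  # 3.25
```

With these made-up timestamps the 2.0 s preemption gap is part of the 3.0 s decode time, while the 0.25 s between the last token and finalization is not.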

Validation

Focused multimodal metrics tests passed.
Existing metrics logger tests passed.
Focused scheduler tests for preemption event emission passed.
Pre-commit passed on the touched metrics, scheduler, message, and test files.
Real OpenAI-compatible API smoke testing triggered preemptions under KV-cache pressure and confirmed matching terminal and Prometheus output.

Benchmark

Local OpenAI-compatible API macrobenchmarks compared this branch with main on a VLM image workload.
No throughput regression was observed in the large image payload run; the branch was within expected run-to-run variance.
A real PyTorch serving pressure run completed 32 long-output requests successfully and reported 8 preemptions in both terminal logging and /metrics.
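A check like the one above (terminal count versus `/metrics`) can be scripted by reading the counter out of the Prometheus text exposition. The following is a minimal sketch using only the standard library. The sample text, the help string, and the `model_name` label are illustrative assumptions, not captured server output; only the metric name `lmdeploy:num_preemptions_total` comes from this PR.

```python
def sum_samples(exposition_text: str, metric_name: str) -> float:
    """Sum all samples of one metric in Prometheus text exposition format.

    Assumes sample lines carry no trailing timestamp, so the last
    whitespace-separated field on each line is the sample value.
    """
    total = 0.0
    for raw_line in exposition_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name_and_labels, _, value = line.rpartition(" ")
        name = name_and_labels.split("{", 1)[0]
        if name == metric_name:
            total += float(value)
    return total


# Illustrative sample only; real label sets and help text may differ.
SAMPLE = """\
# HELP lmdeploy:num_preemptions_total Illustrative help text.
# TYPE lmdeploy:num_preemptions_total counter
lmdeploy:num_preemptions_total{model_name="demo"} 8.0
"""

print(sum_samples(SAMPLE, "lmdeploy:num_preemptions_total"))  # 8.0
```

In practice the text would come from an HTTP GET on the server's `/metrics` endpoint, and the result would be compared with the `Preemptions:` value printed in the terminal stats line.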

Terminal Log

When a logging interval includes multimodal preprocessing, the terminal stats line now appends the average multimodal preprocessing latency:

[2026-06-01 12:18:04 Engine 000] Avg thr (in/out): 0.0 / 0.0 tokens/s, Server (succeeded/failed/routed/waiting): 0 / 1 / 0 / 0, Engine (running/waiting): 0 / 0, KV cache: 0.0%, Avg MM preprocess: 0.051 s/req,

When a logging interval includes request preemptions, the terminal stats line appends the preemption count:

[2026-06-16 14:40:20 Engine 000] Avg thr (in/out): 0.0 / 2187.7 tokens/s, Server (succeeded/failed/routed/waiting): 32 / 0 / 0 / 0, Engine (running/waiting): 0 / 0, KV cache: 0.1%, Preemptions: 8,
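The conditional suffixes shown in the two log lines can be sketched as follows. This is a simplified illustration, not the logger code from this PR: `format_stats_line` and its parameters are hypothetical names, and the real logger may apply additional rules about when a line is emitted at all.

```python
from typing import Sequence


def format_stats_line(
    base_fields: Sequence[str],
    mm_latency_sum: float = 0.0,
    mm_requests: int = 0,
    preemptions: int = 0,
) -> str:
    """Join the base fields and append optional fields only when they have data."""
    parts = list(base_fields)
    if mm_requests > 0:
        # Average multimodal preprocessing latency over the interval.
        avg = mm_latency_sum / mm_requests
        parts.append(f"Avg MM preprocess: {avg:.3f} s/req")
    if preemptions > 0:
        parts.append(f"Preemptions: {preemptions}")
    # The sample log lines end with a trailing comma, mirrored here.
    return ", ".join(parts) + ","


print(format_stats_line(["KV cache: 0.1%"], preemptions=8))
# KV cache: 0.1%, Preemptions: 8,
print(format_stats_line(["KV cache: 0.0%"], mm_latency_sum=0.102, mm_requests=2))
# KV cache: 0.0%, Avg MM preprocess: 0.051 s/req,
```

An interval with neither multimodal requests nor preemptions produces the base line unchanged, which matches the "when a logging interval includes" wording above.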

Assistance

Assisted with Codex + GPT-5.5 xHigh Fast

Copilot

Pull request overview

This PR adds end-to-end metrics for multimodal (VLM) prompt preprocessing, wiring new stats collection from request processing through the existing metrics processor and Prometheus/console loggers, and documenting the exported metrics.

Changes:

Introduces MultimodalStats to track multimodal item counts, per-stage/total preprocessing latency, and failures.
Instruments MultimodalProcessor and AsyncEngine.generate() to collect and emit multimodal preprocessing stats via metrics_processor.
Extends metrics loggers (Prometheus + periodic console logger) and updates EN/ZH metrics documentation; adds targeted tests for the new multimodal metrics.
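The kind of accounting described for `MultimodalStats` (item counts, per-stage and total latency, failures, thread-safe updates) can be sketched with a lock-protected container. The class below is a hypothetical stand-in named `MultimodalStatsSketch`; its fields, the stage names, and the modality keys are assumptions and do not reproduce the class added in this PR.

```python
import threading
import time
from collections import defaultdict
from contextlib import contextmanager


class MultimodalStatsSketch:
    """Hypothetical per-request accumulator (illustration only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.item_counts = defaultdict(int)      # modality -> number of items
        self.stage_latency = defaultdict(float)  # stage name -> seconds
        self.failures = defaultdict(int)         # stage name -> failure count

    def add_items(self, modality: str, n: int = 1) -> None:
        with self._lock:
            self.item_counts[modality] += n

    @contextmanager
    def time_stage(self, stage: str):
        # Time one preprocessing stage; count a failure if it raises.
        start = time.perf_counter()
        try:
            yield
        except Exception:
            with self._lock:
                self.failures[stage] += 1
            raise
        finally:
            elapsed = time.perf_counter() - start
            with self._lock:
                self.stage_latency[stage] += elapsed

    @property
    def total_latency(self) -> float:
        with self._lock:
            return sum(self.stage_latency.values())


stats = MultimodalStatsSketch()
stats.add_items("image", 2)
with stats.time_stage("parse"):
    pass  # stand-in for parsing multimodal inputs
try:
    with stats.time_stage("preprocess"):
        raise ValueError("bad image")  # simulated preprocessing failure
except ValueError:
    pass

print(dict(stats.item_counts))  # {'image': 2}
print(dict(stats.failures))     # {'preprocess': 1}
```

Recording the elapsed time in `finally` means a failed stage still contributes its latency, which is one possible design choice; whether the PR counts latency for failed stages is not stated here.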

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.


File	Description
tests/test_lmdeploy/test_metrics_multimodal.py	Adds unit tests covering multimodal stats behavior and Prometheus emission.
lmdeploy/serve/processors/multimodal.py	Instruments multimodal parsing and VLM preprocessing stages with thread-safe stats updates.
lmdeploy/serve/core/async_engine.py	Creates/records per-request multimodal stats in the request path.
lmdeploy/metrics/stats.py	Adds `MultimodalStats` for multimodal preprocessing accounting.
lmdeploy/metrics/metrics_processor.py	Adds `record_multimodal()` to emit multimodal stats through configured loggers.
lmdeploy/metrics/loggers.py	Implements multimodal metric export for Prometheus and aggregates for console logging.
docs/zh_cn/advance/metrics.md	Documents newly exported multimodal preprocessing metrics (CN).
docs/en/advance/metrics.md	Documents newly exported multimodal preprocessing metrics (EN).



Copilot

Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.

CUHKSZzxy added 2 commits June 1, 2026 15:10

feat: add multimodal metrics

5ae54bb

refactor: enable multimodal metrics by default

bafee8b

Copilot AI review requested due to automatic review settings June 1, 2026 07:54

Copilot started reviewing on behalf of CUHKSZzxy June 1, 2026 07:54

Copilot AI reviewed Jun 1, 2026


Comment thread lmdeploy/serve/processors/multimodal.py

Comment thread lmdeploy/serve/processors/multimodal.py

Comment thread lmdeploy/serve/processors/multimodal.py Outdated

CUHKSZzxy added 7 commits June 1, 2026 16:00

style: fix metrics lint

4265ff9

fix: record multimodal parse failures

18b8be3

refactor: simplify multimodal metrics defaults

c6df63d

Merge remote-tracking branch 'origin/main' into feat/multimodal-metrics

4f76787

# Conflicts: # lmdeploy/serve/processors/multimodal.py

Merge remote-tracking branch 'origin/main' into feat/multimodal-metrics

7cd5160

Merge branch 'main' into feat/multimodal-metrics

ff8a801

feat: track preempted requests in metrics

ca157a9

CUHKSZzxy changed the title ~~Add multimodal preprocessing metrics~~ Add multimodal and preemption metrics Jun 16, 2026

Merge remote-tracking branch 'upstream/main' into feat/multimodal-met…

831da18

…rics # Conflicts: # tests/pytorch/paging/test_scheduler.py

CUHKSZzxy requested a review from Copilot June 16, 2026 08:17

Copilot started reviewing on behalf of CUHKSZzxy June 16, 2026 08:17

fix: avoid preemption-only terminal logs

27f4dcd

Copilot AI reviewed Jun 16, 2026


Comment thread lmdeploy/serve/processors/multimodal.py

Comment thread docs/en/advance/metrics.md Outdated

Comment thread docs/zh_cn/advance/metrics.md Outdated

fix: address multimodal metrics review comments

978032b

CUHKSZzxy wants to merge 12 commits into InternLM:main from CUHKSZzxy:feat/multimodal-metrics


2 participants
