
Add multimodal and preemption metrics #4640

Open
CUHKSZzxy wants to merge 12 commits into InternLM:main from CUHKSZzxy:feat/multimodal-metrics

Conversation


@CUHKSZzxy CUHKSZzxy commented Jun 1, 2026

Collaborator

Summary

  • Add multimodal preprocessing metrics for VLM requests, including request/item counters, total preprocessing latency, per-stage latency, per-request item count, and preprocessing failure counters.
  • Record multimodal metrics through the existing metrics processor and Prometheus logger path.
  • Export detailed multimodal metrics by default whenever the metrics system is enabled; no extra CLI flag is required.
  • Track PyTorch engine preemption events when requests are evicted under KV-cache pressure.
  • Export preemption visibility in both terminal logging and Prometheus via lmdeploy:num_preemptions_total.
  • Keep TPOT as the existing per-iteration inter-token metric. Request decode/inference histograms now measure up to the latest generated token, so preemption gaps are included but API-side finalization time is not.
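
To illustrate the last point, here is a minimal sketch of the difference between measuring decode time up to the latest generated token and measuring it up to request finalization. `RequestTimeline` and its fields are hypothetical names chosen for this example, not types from the PR, and the timestamps are made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RequestTimeline:
    """Hypothetical per-request timestamps in seconds (not lmdeploy's actual type)."""
    first_token_time: float
    # Arrival time of every token generated after the first one.
    token_times: List[float] = field(default_factory=list)
    # Time at which the API layer finished finalizing the response.
    finish_time: float = 0.0

    def decode_through_latest_token(self) -> float:
        """First token to latest generated token; excludes API-side finalization."""
        if not self.token_times:
            return 0.0
        return self.token_times[-1] - self.first_token_time

    def decode_through_finish(self) -> float:
        """First token to finalization; includes API-side finalization time."""
        return self.finish_time - self.first_token_time


# The jump from 2.0 to 6.0 stands in for a gap caused by preemption.
timeline = RequestTimeline(
    first_token_time=1.0,
    token_times=[1.5, 2.0, 6.0, 6.5],
    finish_time=7.5,
)
through_latest = timeline.decode_through_latest_token()
through_finish = timeline.decode_through_finish()
```

In this sketch the 4-second gap is part of `through_latest`, while the final second between the last token and `finish_time` only shows up in `through_finish`.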

Validation

  • Focused multimodal metrics tests passed.
  • Existing metrics logger tests passed.
  • Focused scheduler tests for preemption event emission passed.
  • Pre-commit passed on the touched metrics, scheduler, message, and test files.
  • Real OpenAI-compatible API smoke testing triggered preemptions under KV-cache pressure and confirmed matching terminal and Prometheus output.
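
A check like the last one can be approximated by reading the counter out of the `/metrics` response body. The sketch below parses Prometheus text exposition output using only the standard library. The sample text, its HELP string, and the `model_name` label are assumptions made for illustration; only the metric name `lmdeploy:num_preemptions_total` comes from this PR. It also assumes samples carry no trailing timestamp.

```python
def parse_counter_total(metrics_text: str, metric_name: str) -> float:
    """Sum every sample of `metric_name` in Prometheus text exposition output.

    Assumes each sample line is `name{labels} value` with no trailing timestamp.
    """
    total = 0.0
    for raw_line in metrics_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and HELP/TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field on the line.
        name_and_labels, _, value = line.rpartition(" ")
        name = name_and_labels.split("{", 1)[0]
        if name == metric_name:
            total += float(value)
    return total


# Hypothetical scrape body; the label set is invented for this example.
sample_metrics = """\
# HELP lmdeploy:num_preemptions_total Number of preemptions.
# TYPE lmdeploy:num_preemptions_total counter
lmdeploy:num_preemptions_total{model_name="demo"} 8.0
"""

preemptions = parse_counter_total(sample_metrics, "lmdeploy:num_preemptions_total")
```

The resulting value can then be compared with the `Preemptions:` field printed in the terminal stats line for the same run.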

Benchmark

  • Local OpenAI-compatible API macrobenchmarks compared this branch with main on a VLM image workload.
  • No throughput regression was observed in the large image payload run; the branch was within expected run-to-run variance.
  • A real PyTorch serving pressure run completed 32 long-output requests successfully and reported 8 preemptions in both terminal logging and /metrics.

Terminal Log

When a logging interval includes multimodal preprocessing, the terminal stats line now appends the average multimodal preprocessing latency:

[2026-06-01 12:18:04 Engine 000] Avg thr (in/out): 0.0 / 0.0 tokens/s, Server (succeeded/failed/routed/waiting): 0 / 1 / 0 / 0, Engine (running/waiting): 0 / 0, KV cache: 0.0%, Avg MM preprocess: 0.051 s/req,

When a logging interval includes request preemptions, the terminal stats line appends the preemption count:

[2026-06-16 14:40:20 Engine 000] Avg thr (in/out): 0.0 / 2187.7 tokens/s, Server (succeeded/failed/routed/waiting): 32 / 0 / 0 / 0, Engine (running/waiting): 0 / 0, KV cache: 0.1%, Preemptions: 8,
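
The conditional appending described above can be sketched as follows. This is an assumption about the shape of the logic rather than the logger code from the PR; `format_optional_fields` and its parameters are hypothetical names, and only the field labels are copied from the sample log lines.

```python
def format_optional_fields(
    mm_preprocess_total_s: float,
    mm_requests: int,
    num_preemptions: int,
) -> str:
    """Build the optional tail of a stats line for one logging interval.

    A field is emitted only when the interval actually saw that kind of event.
    """
    parts = []
    if mm_requests > 0:
        avg = mm_preprocess_total_s / mm_requests
        parts.append(f"Avg MM preprocess: {avg:.3f} s/req")
    if num_preemptions > 0:
        parts.append(f"Preemptions: {num_preemptions}")
    # Each emitted field is followed by ", " in this sketch.
    return "".join(f"{part}, " for part in parts)


mm_only = format_optional_fields(0.051, 1, 0)
preempt_only = format_optional_fields(0.0, 0, 8)
quiet_interval = format_optional_fields(0.0, 0, 0)
```

An interval with neither multimodal preprocessing nor preemptions yields an empty string, so the base stats line is left unchanged.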

Assistance

Assisted with Codex + GPT-5.5 xHigh Fast

Copilot AI review requested due to automatic review settings June 1, 2026 07:54

Copilot AI left a comment

Contributor


Pull request overview

This PR adds end-to-end metrics for multimodal (VLM) prompt preprocessing, wiring new stats collection from request processing through the existing metrics processor and Prometheus/console loggers, and documenting the exported metrics.

Changes:

  • Introduces MultimodalStats to track multimodal item counts, per-stage/total preprocessing latency, and failures.
  • Instruments MultimodalProcessor and AsyncEngine.generate() to collect and emit multimodal preprocessing stats via metrics_processor.
  • Extends metrics loggers (Prometheus + periodic console logger) and updates EN/ZH metrics documentation; adds targeted tests for the new multimodal metrics.
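
As a rough illustration of the accounting the first two bullets describe, the sketch below tracks item counts, per-stage latency, total latency, and failures behind a lock. `MultimodalStatsSketch`, its fields, and the stage names are hypothetical; the real `MultimodalStats` in `lmdeploy/metrics/stats.py` may be structured differently.

```python
import threading
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, Iterator


@dataclass
class MultimodalStatsSketch:
    """Hypothetical stand-in; field names are not taken from the PR."""
    num_items: int = 0
    num_failures: int = 0
    stage_latency_s: Dict[str, float] = field(default_factory=dict)
    _lock: threading.Lock = field(default_factory=threading.Lock, repr=False)

    def add_items(self, count: int) -> None:
        """Record how many multimodal items the request carried."""
        with self._lock:
            self.num_items += count

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        """Time one preprocessing stage and count a failure if it raises."""
        start = time.perf_counter()
        try:
            yield
        except Exception:
            with self._lock:
                self.num_failures += 1
            raise
        finally:
            # Latency is recorded whether the stage succeeded or failed.
            elapsed = time.perf_counter() - start
            with self._lock:
                previous = self.stage_latency_s.get(name, 0.0)
                self.stage_latency_s[name] = previous + elapsed

    @property
    def total_latency_s(self) -> float:
        """Total preprocessing latency as the sum over all stages."""
        with self._lock:
            return sum(self.stage_latency_s.values())


stats = MultimodalStatsSketch()
stats.add_items(2)
with stats.stage("parse"):
    pass
try:
    with stats.stage("preprocess"):
        raise ValueError("simulated preprocessing failure")
except ValueError:
    pass
```

Using a context manager keeps the timing and failure counting next to the instrumented stage, and the lock guards updates if stages run on different threads.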

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Show a summary per file
| File | Description |
| --- | --- |
| tests/test_lmdeploy/test_metrics_multimodal.py | Adds unit tests covering multimodal stats behavior and Prometheus emission. |
| lmdeploy/serve/processors/multimodal.py | Instruments multimodal parsing + VLM preprocessing stages and threadsafe stats updates. |
| lmdeploy/serve/core/async_engine.py | Creates/records per-request multimodal stats in the request path. |
| lmdeploy/metrics/stats.py | Adds MultimodalStats for multimodal preprocessing accounting. |
| lmdeploy/metrics/metrics_processor.py | Adds record_multimodal() to emit multimodal stats through configured loggers. |
| lmdeploy/metrics/loggers.py | Implements multimodal metric export for Prometheus and aggregates for console logging. |
| docs/zh_cn/advance/metrics.md | Documents newly exported multimodal preprocessing metrics (CN). |
| docs/en/advance/metrics.md | Documents newly exported multimodal preprocessing metrics (EN). |


Comment thread lmdeploy/serve/processors/multimodal.py
Comment thread lmdeploy/serve/processors/multimodal.py
Comment thread lmdeploy/serve/processors/multimodal.py Outdated
@CUHKSZzxy CUHKSZzxy changed the title from "Add multimodal preprocessing metrics" to "Add multimodal and preemption metrics" on Jun 16, 2026
…rics

# Conflicts:
#	tests/pytorch/paging/test_scheduler.py

Copilot AI left a comment

Contributor


Pull request overview

Copilot reviewed 14 out of 14 changed files in this pull request and generated 3 comments.

Comment thread lmdeploy/serve/processors/multimodal.py
Comment thread docs/en/advance/metrics.md Outdated
Comment thread docs/zh_cn/advance/metrics.md Outdated
2 participants