[Feature]: KV cache config metrics unavailable until first inference request #12595

Description

@BenjaminBraunDev

🚀 The feature, motivation and pitch

trtllm_cache_config_info (added in #12564) is only populated after the first inference request. The iteration stats pipeline's initializer, _maybe_initialize_iteration_results(), is only called on request submission, so get_stats_async() returns nothing until then.
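To make the failure mode concrete, here is a toy model of the lazy-initialization pattern described above. It is only a sketch: `LazyStatsPipeline`, `submit_request`, and `get_stats` are hypothetical stand-ins, not the actual TensorRT-LLM classes or signatures, and the real pipeline is asynchronous.

```python
class LazyStatsPipeline:
    """Toy stand-in (hypothetical) for an executor whose stats pipeline is lazy."""

    def __init__(self, max_num_blocks, tokens_per_block):
        # The KV cache configuration is known as soon as the object is built...
        self._kv_config = {
            "max_num_blocks": max_num_blocks,
            "tokens_per_block": tokens_per_block,
        }
        # ...but the container that stats are read from does not exist yet.
        self._iteration_results = None

    def _maybe_initialize_iteration_results(self):
        if self._iteration_results is None:
            self._iteration_results = []

    def submit_request(self, prompt):
        # Initialization only happens on the request path.
        self._maybe_initialize_iteration_results()
        self._iteration_results.append(dict(self._kv_config))

    def get_stats(self):
        # Before the first request there is nothing to return.
        return list(self._iteration_results or [])
```

In this model, a scraper that calls `get_stats()` right after construction sees an empty list, even though the configuration values were available the whole time.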

External scrapers like the Kubernetes Inference Gateway EPP need cache_config_info (with block_size and num_gpu_blocks labels) at startup to make routing decisions. Without it, pods that haven't received traffic get lower scores and are therefore never routed to, so they never receive the first request that would populate the metric.
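For illustration, this is roughly what a scraper has to do with the metric. The sample text below is an assumed Prometheus-style exposition of trtllm_cache_config_info; the label values, the HELP text, and the helper name `parse_cache_config` are made up for this sketch, and the real output may differ.

```python
import re

# Hypothetical sample of the metrics output; real labels and values may differ.
SAMPLE_METRICS = """\
# HELP trtllm_cache_config_info KV cache configuration
# TYPE trtllm_cache_config_info gauge
trtllm_cache_config_info{block_size="32",num_gpu_blocks="4096"} 1.0
"""

_LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_cache_config(metrics_text, metric="trtllm_cache_config_info"):
    """Return the metric's labels as a dict, or None if the metric is absent."""
    for line in metrics_text.splitlines():
        if line.startswith(metric + "{"):
            # Take the text between the braces and split it into label pairs.
            label_part = line[len(metric) + 1 : line.index("}")]
            return dict(_LABEL_RE.findall(label_part))
    return None
```

A pod that has not served a request yet would hit the `None` branch, which is the case a scorer currently has no good way to handle.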

Proposed fix: emit a one-time stats snapshot from PyExecutor.__init__ (after the KV cache manager is initialized) containing max_num_blocks and tokens_per_block, and wake the stats collector loop on server startup so it processes the initial stats immediately. The data is already available via kv_cache_manager.get_kv_cache_stats() at init time; it just isn't surfaced through the stats pipeline until the first request.
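A minimal sketch of the shape of this fix, using a plain thread and queue rather than the real asynchronous pipeline. `ToyExecutor`, `collect_stats`, and `demo` are hypothetical names for illustration only; nothing here reflects the actual PyExecutor internals.

```python
import queue
import threading


class ToyExecutor:
    """Hypothetical stand-in for the executor; not the real PyExecutor."""

    def __init__(self, max_num_blocks, tokens_per_block):
        self.stats_queue = queue.Queue()
        # Proposed change: publish one snapshot as soon as the KV cache
        # configuration is known, instead of waiting for the first request.
        self.stats_queue.put(
            {"max_num_blocks": max_num_blocks, "tokens_per_block": tokens_per_block}
        )


def collect_stats(executor, gauges, stop):
    """Collector loop: the blocking get() returns as soon as an item is queued."""
    while not stop.is_set():
        try:
            snapshot = executor.stats_queue.get(timeout=0.1)
        except queue.Empty:
            continue
        gauges.update(snapshot)
        executor.stats_queue.task_done()


def demo():
    executor = ToyExecutor(max_num_blocks=4096, tokens_per_block=32)
    gauges = {}
    stop = threading.Event()
    worker = threading.Thread(target=collect_stats, args=(executor, gauges, stop))
    worker.start()
    executor.stats_queue.join()  # wait until the initial snapshot is processed
    stop.set()
    worker.join()
    return gauges
```

Because the snapshot is queued during construction, the collector has something to process the moment it starts, with no inference request involved. In the real server the equivalent step would be whatever mechanism wakes the stats collector loop at startup.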

Alternatives

The alternative is to have external systems (e.g. the Inference Gateway EPP) send dummy warmup requests to each pod on discovery. This works but is fragile: it doesn't survive pod restarts, adds negligible but unnecessary inference load, and is somewhat hacky. Emitting the stats at init is cleaner since the data is already available.

Additional context

A comparison of trtllm-serve vs other model server metrics and the full gap analysis can be found in kubernetes-sigs/gateway-api-inference-extension#2596.

Metadata

Labels

KV-Cache Management, feature request