[Feature]: KV cache config metrics unavailable until first inference request

### 🚀 The feature, motivation and pitch

`trtllm_cache_config_info` (added in #12564) is only populated after the first inference request. The iteration stats pipeline's `_maybe_initialize_iteration_results()` is only called on request submission, so `get_stats_async()` returns nothing until then.

External scrapers such as the [Kubernetes Inference Gateway EPP](https://github.com/kubernetes-sigs/gateway-api-inference-extension) need `cache_config_info` (with its `block_size` and `num_gpu_blocks` labels) at startup to make routing decisions. Without it, pods that haven't received traffic yet get lower scores, so they are never routed to and therefore never populate the metric.
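To illustrate what a scraper is looking for, here is a minimal sketch of reading the two labels from a Prometheus text exposition. The sample payload and its values are made up for illustration, the helper name `read_cache_config` is hypothetical, and the real endpoint may expose additional labels. The EPP itself is not written in Python; this only shows the shape of the data.

```python
import re

# Invented sample of what the metrics endpoint might return once the metric
# is populated. The label values here are placeholders, not real output.
SAMPLE = '''\
# TYPE trtllm_cache_config_info gauge
trtllm_cache_config_info{block_size="32",num_gpu_blocks="1024"} 1.0
'''

# Matches key="value" pairs inside the braces of a sample line.
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def read_cache_config(metrics_text, metric="trtllm_cache_config_info"):
    """Return the metric's labels as a dict, or None if the metric is absent."""
    for line in metrics_text.splitlines():
        if line.startswith(metric + "{"):
            labels = line[len(metric) + 1 : line.index("}")]
            return dict(LABEL_RE.findall(labels))
    return None


print(read_cache_config(SAMPLE))
print(read_cache_config(""))  # what a scraper sees before the first request
```

The second call models the current behavior: before the first request the metric is missing entirely, so the scraper has nothing to score the pod with.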

Proposed fix: emit a one-time stats snapshot from `PyExecutor.__init__` (after the KV cache manager is initialized) containing `max_num_blocks` and `tokens_per_block`, and wake the stats collector loop on server startup so it processes the initial stats immediately. The data is already available via `kv_cache_manager.get_kv_cache_stats()` at init time; it just isn't surfaced through the stats pipeline until the first request.
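A rough sketch of the first half of the proposal (the one-time snapshot) is below. It is a toy model, not TensorRT-LLM code: `ToyExecutor`, `ToyKvCacheManager`, `_emit_initial_stats`, `stats_queue`, and the dict layout are all hypothetical stand-ins, and only the names `get_kv_cache_stats`, `max_num_blocks`, and `tokens_per_block` come from the description above. Waking the collector loop at server startup is not modeled here.

```python
import queue
from dataclasses import dataclass


@dataclass
class KvCacheStats:
    # Field names follow the issue text; the real stats object may differ.
    max_num_blocks: int
    tokens_per_block: int


class ToyKvCacheManager:
    """Hypothetical stand-in for the KV cache manager."""

    def __init__(self, max_num_blocks: int, tokens_per_block: int):
        self._stats = KvCacheStats(max_num_blocks, tokens_per_block)

    def get_kv_cache_stats(self) -> KvCacheStats:
        return self._stats


class ToyExecutor:
    """Hypothetical stand-in for PyExecutor, reduced to the stats path."""

    def __init__(self, kv_cache_manager: ToyKvCacheManager):
        self.kv_cache_manager = kv_cache_manager
        self.stats_queue: queue.Queue = queue.Queue()
        # Proposed change: publish one snapshot at init instead of waiting
        # for the first request to start the stats pipeline.
        self._emit_initial_stats()

    def _emit_initial_stats(self) -> None:
        stats = self.kv_cache_manager.get_kv_cache_stats()
        self.stats_queue.put({
            "max_num_blocks": stats.max_num_blocks,
            "tokens_per_block": stats.tokens_per_block,
        })

    def get_stats(self) -> list:
        """Drain whatever the stats queue holds right now."""
        drained = []
        while True:
            try:
                drained.append(self.stats_queue.get_nowait())
            except queue.Empty:
                return drained


executor = ToyExecutor(ToyKvCacheManager(max_num_blocks=1024, tokens_per_block=32))
initial = executor.get_stats()
print(initial)  # non-empty even though no request was submitted
```

The point of the sketch is only that the snapshot can be queued as soon as the cache manager exists, so the first stats poll has something to return before any traffic arrives.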

### Alternatives

The alternative is to have external systems (e.g. the Inference Gateway EPP) send dummy warmup requests to each pod on discovery. This works but is fragile: it doesn't survive pod restarts, adds a small but unnecessary inference load, and is somewhat hacky. Emitting the stats at init is cleaner since the data is already available.

### Additional context

A comparison of trtllm-serve vs other model server metrics and the full gap analysis can be found in [kubernetes-sigs/gateway-api-inference-extension#2596](https://github.com/kubernetes-sigs/gateway-api-inference-extension/issues/2596).

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and checked the [documentation](https://nvidia.github.io/TensorRT-LLM/) and [examples](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples) for answers to frequently asked questions.

[Feature]: KV cache config metrics unavailable until first inference request #12595
