[Metric] Support custom metric labels #7865
Thanks for your contribution!
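CI report generated from the following code (refreshed every 30 minutes):
1 Task overview
2 Required tasks are still failing: the main coverage task is below its threshold, and the Approval task is waiting for manual approval. Merging is not recommended until both are resolved.
2 Task status summary
Note on the log column: failed tasks link directly to the URL pre-generated by the CI tool; running tasks are linked to the Job manually.
2.1 Required tasks: 8/10 passed
2.2 Optional tasks: 29/32 passed
3 Failure details (required only)
Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: coverage threshold not met (confidence: high)
Failed cases:
Root cause details: Code context review:
Suggested fix:
Fix summary: add coverage tests for metrics/common_engine
Approval: manual approval required (confidence: high)
This Job requires manual approval; CI continues only after the approval is granted.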
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
## develop #7865 +/- ##
==========================================
Coverage ? 64.06%
==========================================
Files ? 468
Lines ? 65107
Branches ? 9984
==========================================
Hits ? 41709
Misses ? 20565
Partials ? 2833
…e interface Introduce MetricsManagerInterface with unified set_value/inc_value/dec_value/obs_value methods. When FD_DEFAULT_METRIC_LABEL_VALUES is set to a valid non-empty JSON dict, metric labels (e.g. model_id) are automatically applied. Otherwise, operations fall back to the raw prometheus_client calls. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
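The dispatch this commit message describes can be sketched roughly as follows. This is a minimal illustration under assumptions, not the PR's code: `SketchMetricsManager`, `FakeMetric`, the method signatures, and the metric name `num_requests_running` are hypothetical, and a stub replaces the real `prometheus_client` metric so the snippet runs without dependencies.

```python
from abc import ABC, abstractmethod


class MetricsManagerInterface(ABC):
    """The four update methods named in the PR; these signatures are assumed."""

    @abstractmethod
    def set_value(self, name, value, labelvalues=None): ...

    @abstractmethod
    def inc_value(self, name, value=1, labelvalues=None): ...

    @abstractmethod
    def dec_value(self, name, value=1, labelvalues=None): ...

    @abstractmethod
    def obs_value(self, name, value, labelvalues=None): ...


class FakeMetric:
    """Hypothetical stand-in for a prometheus_client metric; records calls."""

    def __init__(self):
        self.calls = []

    def labels(self, **labelvalues):
        self.calls.append(("labels", labelvalues))
        return self

    def set(self, value):
        self.calls.append(("set", value))

    def inc(self, amount=1):
        self.calls.append(("inc", amount))

    def dec(self, amount=1):
        self.calls.append(("dec", amount))

    def observe(self, amount):
        self.calls.append(("observe", amount))


class SketchMetricsManager(MetricsManagerInterface):
    """Hypothetical manager: applies labels when any exist, else calls the metric directly."""

    def __init__(self, metrics, default_labelvalues=None):
        self._metrics = metrics
        self._default_labelvalues = dict(default_labelvalues or {})

    def _resolve(self, name, labelvalues):
        metric = self._metrics[name]
        # Per-call label values are merged over the defaults (assumed precedence).
        merged = {**self._default_labelvalues, **(labelvalues or {})}
        return metric.labels(**merged) if merged else metric

    def set_value(self, name, value, labelvalues=None):
        self._resolve(name, labelvalues).set(value)

    def inc_value(self, name, value=1, labelvalues=None):
        self._resolve(name, labelvalues).inc(value)

    def dec_value(self, name, value=1, labelvalues=None):
        self._resolve(name, labelvalues).dec(value)

    def obs_value(self, name, value, labelvalues=None):
        self._resolve(name, labelvalues).observe(value)


gauge = FakeMetric()
manager = SketchMetricsManager({"num_requests_running": gauge}, {"model_id": "qwen3-30b"})
manager.set_value("num_requests_running", 3)
print(gauge.calls)  # [('labels', {'model_id': 'qwen3-30b'}), ('set', 3)]
```

Routing every update through one resolver keeps call sites unaware of whether labels are enabled, which is the point of the unified interface.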
ba1a9fc to fa2885a
fa2885a to c7f13de
PaddlePaddle-bot
left a comment
🤖 Paddle-CI-Agent | pr_review |
2026-05-26 20:59:12
📋 Review summary
PR overview: Adds a MetricsManagerInterface abstraction layer that lets the FD_DEFAULT_METRIC_LABEL_VALUES environment variable attach custom labels (e.g. model_id) to all Prometheus metrics.
Scope of changes: fastdeploy/metrics/, fastdeploy/envs.py, 14 call-site files, test files
Impact tags: [Feature] [Engine] [APIServer] [KVCache] [DataProcessor] [PD Disaggregation]
Issues
| Severity | File | Summary |
|---|---|---|
| 🔴 Compatibility | fastdeploy/metrics/metrics.py:819 | Breaking change to the spec_decode_draft_single_head_acceptance_rate metric name |
📝 PR convention check
The title tag [Metric] is not in the official tag list; it is a self-invented tag.
Suggested title (can be copied as-is):
[Feature] Support custom metric labels
Suggested PR description (can be copied as-is):
## Motivation
Re-implement PR #4480 on current develop branch. The original PR introduced `MetricsManagerInterface` to support custom labels (e.g., `model_id`) on Prometheus metrics, but the codebase has changed significantly since then (`WorkMetricsManager` removed, new `v1/serving_chat.py` added, `internal_adapter_utils.py` no longer imports metrics, etc.).
## Modifications
1. **New file `fastdeploy/metrics/interface.py`**: Define `MetricsManagerInterface` with 4 abstract methods: `set_value`, `inc_value`, `dec_value`, `obs_value`.
2. **`fastdeploy/metrics/metrics.py`**:
- `MetricsManager` inherits from `MetricsManagerInterface`
- Parse `FD_DEFAULT_METRIC_LABEL_VALUES` env var; when set to a valid non-empty JSON dict, enable metric labels
- `_patch_labelnames()`: add label keys from `_default_labelvalues` to all metrics' `labelnames`
- Implement the 4 interface methods: when labels enabled, call `metric.labels(**merged).set()/inc()/dec()/observe()`; otherwise, call `metric.set()/inc()/dec()/observe()` directly
- Handle `set_cache_config_info()`, `record_zmq_stats()`, `init_zmq_metrics()`, `_init_speculative_metrics()` with label support
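   - **Breaking Change**: `spec_decode_draft_single_head_acceptance_rate` changes from multiple independent Gauges (with `_0`, `_1`, etc. suffixes) to a single Gauge with a `head` label. The old metric names no longer exist; update any dashboards and alerting rules that depend on them.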
3. **`fastdeploy/envs.py`**: Add `FD_DEFAULT_METRIC_LABEL_VALUES` environment variable
4. **14 call-site files**: Migrate all `main_process_metrics.<metric>.set()/inc()/dec()/observe()` calls to `set_value()/inc_value()/dec_value()/obs_value()`
5. **`fastdeploy/metrics/metrics_middleware.py`**: Migrate HTTP metric `.labels().inc()/.observe()` to `inc_value()/obs_value()` with `labelvalues` parameter
## Usage or Command
```bash
# Enable custom labels on all metrics
export FD_DEFAULT_METRIC_LABEL_VALUES='{"model_id":"qwen3-30b"}'
# Or with multiple labels
export FD_DEFAULT_METRIC_LABEL_VALUES='{"model_id":"qwen3-30b","version":"v2"}'
```
When not set (default `{}`), behavior is identical to current code — no labels are added.
## Accuracy Tests
N/A — This only affects Prometheus metric formatting, no model output changes.
## Checklist
- [x] Add at least a tag in the PR title.
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Unit tests for the metrics interface are not included in this PR and will be added as a follow-up.
- [ ] Provide accuracy results. N/A — no model output changes.
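- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

Overall assessment
The implementation approach is clear, the abstraction layer is well designed, and the default behavior is fully compatible with the existing code. The main concerns are that the breaking change to the spec_decode_draft_single_head_acceptance_rate metric name needs to be stated explicitly in the description, and that the title tag should be replaced with an official tag.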
),
)

patched_spec_metrics = self._patch_labelnames(self.SPECULATIVE_METRICS)
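🔴 Compatibility: breaking change to the spec_decode_draft_single_head_acceptance_rate metric name
The old implementation creates multiple independent Gauges named fastdeploy:spec_decode_draft_single_head_acceptance_rate_0, _1, and so on. The new implementation uses a single Gauge with a head label, so the series become fastdeploy:spec_decode_draft_single_head_acceptance_rate{head="0"}.
Any existing Prometheus alerting rules or Grafana dashboards that depend on the old metric names will stop working, which makes this a backward-incompatible breaking change.
Please state this breaking change explicitly in the PR description and provide migration guidance (for example, old metric name → new PromQL query).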
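As one possible shape for that migration guidance, the sketch below rewrites an old per-head metric name into the new label-based selector. The helper name `migrate_selector` is hypothetical; only the two naming schemes come from the review comment above.

```python
import re

# Old scheme: one Gauge per head, with the head index as a name suffix.
OLD_NAME = re.compile(
    r"^(fastdeploy:spec_decode_draft_single_head_acceptance_rate)_(\d+)$"
)


def migrate_selector(old_name: str) -> str:
    """Map an old per-head metric name to the new selector with a head label.

    Names that do not match the old scheme are returned unchanged.
    """
    match = OLD_NAME.match(old_name)
    if match is None:
        return old_name
    base, head = match.groups()
    return f'{base}{{head="{head}"}}'


print(migrate_selector("fastdeploy:spec_decode_draft_single_head_acceptance_rate_0"))
# fastdeploy:spec_decode_draft_single_head_acceptance_rate{head="0"}
```

A dashboard or alert expression that referenced the old name would use the returned selector in its place; when default label values are enabled, the matched series carry those labels as well.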
* [Metric] Support model_id as metric labels by redefining metric update interface

  Introduce MetricsManagerInterface with unified set_value/inc_value/dec_value/obs_value methods. When FD_DEFAULT_METRIC_LABEL_VALUES is set to a valid non-empty JSON dict, metric labels (e.g. model_id) are automatically applied. Otherwise, operations fall back to the raw prometheus_client calls.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* [chore] add logger
* [fix] fix spec metrics and cache info
* [refactor] reimplement spec_decode_draft_single_head_acceptance_rate
* [chore] fix pre-commit
* [fix] fix spec labels
* [test] fix test
* [update] update cache_info and zmq_labels

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Motivation
Re-implement PR #4480 on current develop branch. The original PR introduced `MetricsManagerInterface` to support custom labels (e.g., `model_id`) on Prometheus metrics, but the codebase has changed significantly since then (`WorkMetricsManager` removed, new `v1/serving_chat.py` added, `internal_adapter_utils.py` no longer imports metrics, etc.).
Modifications
1. New file `fastdeploy/metrics/interface.py`: Define `MetricsManagerInterface` with 4 abstract methods: `set_value`, `inc_value`, `dec_value`, `obs_value`.
2. `fastdeploy/metrics/metrics.py`:
   - `MetricsManager` inherits from `MetricsManagerInterface`
   - Parse `FD_DEFAULT_METRIC_LABEL_VALUES` env var; when set to a valid non-empty JSON dict, enable metric labels
   - `_patch_labelnames()`: add label keys from `_default_labelvalues` to all metrics' `labelnames`
   - Implement the 4 interface methods: when labels are enabled, call `metric.labels(**merged).set()/inc()/dec()/observe()`; otherwise, call `metric.set()/inc()/dec()/observe()` directly
   - Handle `set_cache_config_info()`, `record_zmq_stats()`, `init_zmq_metrics()`, `_init_speculative_metrics()` with label support
3. `fastdeploy/envs.py`: Add `FD_DEFAULT_METRIC_LABEL_VALUES` environment variable
4. 14 call-site files: Migrate all `main_process_metrics.<metric>.set()/inc()/dec()/observe()` calls to `set_value()/inc_value()/dec_value()/obs_value()`
5. `fastdeploy/metrics/metrics_middleware.py`: Migrate HTTP metric `.labels().inc()/.observe()` to `inc_value()/obs_value()` with `labelvalues` parameter
Usage or Command
When not set (default `{}`), behavior is identical to current code: no labels are added.
An example of metrics text when default label values are enabled:
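As a rough illustration only, not output captured from FastDeploy, the sketch below parses the environment variable the way the description states (a valid non-empty JSON dict enables labels, anything else leaves them off) and renders one sample line by hand in the Prometheus text format. The helpers `parse_default_labelvalues` and `render_sample` and the metric name `fastdeploy:example_gauge` are hypothetical.

```python
import json
import os


def parse_default_labelvalues(raw):
    """Return label values from a JSON string, or {} when the input is unusable."""
    try:
        parsed = json.loads(raw) if raw else {}
    except json.JSONDecodeError:
        return {}
    if not isinstance(parsed, dict):
        return {}
    return {str(key): str(value) for key, value in parsed.items()}


def render_sample(name, value, labelvalues):
    """Render one sample line; escaping of label values is omitted for brevity."""
    if not labelvalues:
        return f"{name} {value}"
    pairs = ",".join(f'{key}="{val}"' for key, val in labelvalues.items())
    return f"{name}{{{pairs}}} {value}"


labels = parse_default_labelvalues(os.environ.get("FD_DEFAULT_METRIC_LABEL_VALUES", "{}"))
print(render_sample("fastdeploy:example_gauge", 1.0, labels))
```

With the variable set to `{"model_id":"qwen3-30b"}`, `render_sample` produces `fastdeploy:example_gauge{model_id="qwen3-30b"} 1.0`; with it unset, the line carries no label block.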
Accuracy Tests
No model output changes. This only affects Prometheus metric formatting.
Checklist
- Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- Format your code, run `pre-commit` before commit.
- If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.