[Metrics] move prompt_tokens_total report to main process #7982
liyonghua0910 wants to merge 2 commits into
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## develop #7982 +/- ##
==========================================
Coverage ? 67.73%
==========================================
Files ? 468
Lines ? 65989
Branches ? 10186
==========================================
Hits ? 44700
Misses ? 18441
Partials ? 2848
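CI report generated from the following code (refreshed every 30 minutes):
1 Task overview
Required tasks currently have 2 failed, 0 running, and 0 pending, so merging is not recommended yet. The main test failure is caused by the PR code; the XPU 8-card failure shows up as a Decode node health-check timeout and looks more like an environment or service startup problem.
2 Task status summary
Log column note: failed tasks directly use
2.1 Required tasks: 8/10 passed
2.2 Optional tasks: 28/31 passed
3 Failure details (required only)
Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: test failure (confidence: high)
Failed cases:
Root cause details: Key logs: Fix suggestions:
Fix summary: check sampling_params for None before reporting. Related changes:
xpu_8cards_case_test / run_xpu_8cards_cases: timeout (confidence: medium)
Failed cases:
Root cause details: Key logs: Fix suggestions:
Fix summary: environment issue, please rerun. Related changes: no direct link to the files changed in this PR was found.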
PaddlePaddle-bot
left a comment
🤖 Paddle-CI-Agent | pr_review |
2026-06-03 20:00:34
📋 Review summary
PR overview: moves the reporting point for three metrics, prompt_tokens_total, request_prompt_tokens, and request_params_max_tokens, from engine_client.py (API process) to common_engine.py (main process).
Scope of changes: fastdeploy/engine/, fastdeploy/entrypoints/, tests/engine/
Impact tags: [Engine] [APIServer]
Issues
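| Severity | File | Summary |
|---|---|---|
| 🟡 Suggestion | tests/engine/test_common_engine.py | New mock attributes were added without matching assertions, so the tests do not verify the new metric reporting |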
🟡 Suggestion, tests/engine/test_common_engine.py: both test cases (with and without trace_carrier) add the prompt_token_ids_len and sampling_params mock attributes, but the test bodies only assert on trace_set_proc_propagate_context. They do not verify that prompt_tokens_total, request_prompt_tokens, and request_params_max_tokens are called correctly, so the tests only confirm that nothing crashes and do not guard against regressions.
Add assertions after the eng._insert_zmq_task_to_scheduler() call (needed in both cases):
# Verify the newly added metric reporting
eng.metrics.prompt_tokens_total.inc.assert_called_once_with(2)
eng.metrics.request_prompt_tokens.observe.assert_called_once_with(2)
eng.metrics.request_params_max_tokens.observe.assert_called_once_with(16)
Also consider adding a boundary case with sampling_params=None to verify that request_params_max_tokens is not called when sampling_params is absent.
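The suggested assertions and the sampling_params=None boundary case could be sketched roughly as follows. This is a self-contained illustration, not the PR's code: `report_prompt_metrics` is a hypothetical stand-in for the reporting step inside `_insert_zmq_task_to_scheduler` (whose body is not shown in this thread), and reading the limit from `sampling_params.max_tokens` is an assumption based on the metric name and the value 16 in the suggested assertion.

```python
from types import SimpleNamespace
from unittest.mock import MagicMock


def report_prompt_metrics(metrics, task):
    """Hypothetical stand-in for the metric reporting done when a task is enqueued."""
    metrics.prompt_tokens_total.inc(task.prompt_token_ids_len)
    metrics.request_prompt_tokens.observe(task.prompt_token_ids_len)
    # Guard: only report max_tokens when sampling_params is present.
    # The attribute name max_tokens is an assumption.
    if task.sampling_params is not None:
        metrics.request_params_max_tokens.observe(task.sampling_params.max_tokens)


def check_reports_all_three_metrics():
    metrics = MagicMock()
    task = SimpleNamespace(
        prompt_token_ids_len=2,
        sampling_params=SimpleNamespace(max_tokens=16),
    )
    report_prompt_metrics(metrics, task)
    # Same assertions as suggested in the review, against a mocked metrics object.
    metrics.prompt_tokens_total.inc.assert_called_once_with(2)
    metrics.request_prompt_tokens.observe.assert_called_once_with(2)
    metrics.request_params_max_tokens.observe.assert_called_once_with(16)
    return metrics


def check_skips_max_tokens_without_sampling_params():
    metrics = MagicMock()
    task = SimpleNamespace(prompt_token_ids_len=2, sampling_params=None)
    report_prompt_metrics(metrics, task)
    # Prompt-token metrics are still reported; max_tokens is skipped.
    metrics.prompt_tokens_total.inc.assert_called_once_with(2)
    metrics.request_prompt_tokens.observe.assert_called_once_with(2)
    metrics.request_params_max_tokens.observe.assert_not_called()
    return metrics
```

In the real test file the same assertions would target `eng.metrics` after calling `eng._insert_zmq_task_to_scheduler()`, with the mocked task carrying either a populated or a `None` `sampling_params`.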
Status of previous findings
| Finding | Issue | Status |
|---|---|---|
| F1 | PR title uses the unofficial tag [Metrics] | |
| F2 | PR description sections are empty | ✅ Fixed |
📝 PR convention check
The PR title uses [Metrics], which is not in FastDeploy's official tag list; use [Engine] instead.
Suggested title (ready to copy):
[Engine] Move prompt_tokens_total metrics report to main process
Suggested PR description (ready to copy)
## Motivation
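Move the reporting point for the three metrics `prompt_tokens_total`, `request_prompt_tokens`, and `request_params_max_tokens` from `engine_client.py` (API process) to `_insert_zmq_task_to_scheduler` in the main process (`common_engine.py`), so that the metrics are recorded only when a request actually enters the scheduler. This makes their semantics more accurate and removes the cross-process dependency of writing to the main-process metrics object from the API process.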
## Modifications
- `fastdeploy/engine/common_engine.py`: report the three metrics `prompt_tokens_total`, `request_prompt_tokens`, and `request_params_max_tokens` at the point where `_insert_zmq_task_to_scheduler` enqueues the request
- `fastdeploy/entrypoints/engine_client.py`: remove the reporting of these three metrics and the corresponding `main_process_metrics` import
## Usage or Command
pytest tests/engine/test_common_engine.py tests/pooling/test_Qwen3-Embedding_serving.py tests/pooling/test_Ernie4_5_reward_serving.py
## Accuracy Tests
N/A. This PR only moves where the metrics are reported and does not affect model output logic.
## Checklist
- [x] Add at least a tag in the PR title.
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- You can add new tags based on the PR content, but the semantics must be clear.
- [x] Format your code, run `pre-commit` before commit.
- [x] Add unit tests. Please write the reason in this PR if no unit tests. (Tests updated in test_common_engine.py)
- [x] Provide accuracy results. (N/A, no change to model output)
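- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.
Overall assessment
The code change is concise, the direction of the metric migration is right, and the logic is correct. The tests add the necessary mock attributes but lack matching assertions; adding them would make the tests more effective. The PR title tag should be changed to the official tag [Engine].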
Motivation
Move prompt-token-related metrics reporting from the API-side EngineClient to the engine main process, so that `prompt_tokens_total` is reported from the process that owns the main metrics collector.
Modifications
- Move `prompt_tokens_total`, `request_prompt_tokens`, and `request_params_max_tokens` reporting from `fastdeploy/entrypoints/engine_client.py` to `fastdeploy/engine/common_engine.py`.
- Remove the `main_process_metrics` import from `fastdeploy/entrypoints/engine_client.py`.
Usage or Command
pytest tests/engine/test_common_engine.py tests/pooling/test_Qwen3-Embedding_serving.py tests/pooling/test_Ernie4_5_reward_serving.py
Accuracy Tests
N/A. This PR only changes metrics reporting location and does not change model output logic.
Checklist
- Tag list: [`[FDConfig]`, `[APIServer]`, `[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- Format your code, run `pre-commit` before commit.
- If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.