[benchmark] Enhance benchmark metrics with ITL aggregation details by ZhangYulongg · Pull Request #8063 · PaddlePaddle/FastDeploy

ZhangYulongg · 2026-06-17T08:28:06Z

Added metrics for cleaned ITL aggregation including decode speed and counts for different ITL categories.

Motivation

💡 If this PR is a Cherry Pick, the PR title needs to follow the format by adding the [Cherry-Pick] label at the very beginning and appending the original PR ID at the end. For example, [Cherry-Pick][CI] Add check trigger and logic(#5191)

Modifications

Usage or Command

Accuracy Tests

Checklist

Add at least a tag in the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

PaddlePaddle-bot

🤖 Paddle-CI-Agent | pr_review | 2026-06-17 16:38:18

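📋 Review Summary

PR overview: adds the cleaned ITL global aggregate decode speed and the associated sample counts to the output of benchmark_serving.py.
Scope of changes: benchmarks/benchmark_serving.py
Affected tag: [Benchmark]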

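Issues

Severity	File	Summary
🔴 Bug	`benchmarks/benchmark_serving.py:304`	The clean ITL aggregation uses a different valid-sample criterion from the main statistics, so responses with `output_tokens == 0` may be counted in the new metrics
🔴 Security	`benchmarks/benchmark_serving.py:471`	The warmup success path unconditionally prints the full response object, which may leak generated content and token ids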

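📝 PR Convention Check

The capitalization of the title tag does not match the official tag, and the PR description still contains template placeholder text, with key sections left unfilled.

Suggested title (can be copied directly):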

[Benchmark] Add cleaned ITL aggregation metrics

Suggested PR description (can be copied directly):

## Motivation
Add benchmark metrics that report cleaned global ITL aggregation, so decode speed can be inspected after filtering burst and preemption intervals.

## Modifications
- Updated `benchmarks/benchmark_serving.py` to collect global ITL samples, count burst/preemption/clean intervals, and report clean decode speed.
- Added the clean ITL aggregate fields to benchmark JSON results and console output.

## Usage or Command
N/A

## Accuracy Tests
N/A

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

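Overall Assessment

The new metrics are a reasonable direction, but the current implementation has inconsistent sample criteria and a log exposure risk. Fixing both before merging is recommended.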

PaddlePaddle-bot · 2026-06-17T08:42:16Z

+    PREEMPT_THRESHOLD_S = 0.5  # 500 ms
+    all_itls_flat: list[float] = []
+    for o in outputs:
+        if o.success:


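🔴 Bug The new clean ITL aggregation collects samples based only on o.success, so it re-counts responses with output_tokens == 0 that the main statistics have already skipped.

The main loop earlier in calculate_metrics() reads output_len = outputs[i].output_tokens and then hits continue on empty output, so those requests never enter completed, total_output, or the regular itl/s_decode metrics. Iterating over outputs again here and checking only o.success means that s_decode_clean and n_itls_* use a different sample set from the main metrics in the same result. With some streaming backends, a request that successfully returns text but carries no usage/completion_tokens takes exactly this skip path in the existing code.

Suggested fix: reuse the itls samples that the main loop has already validated, for example by building _arr directly from itls, or move the clean aggregation after if not output_len: continue, so that it uses the same sample criterion as completed, total_output, and the regular ITL metrics.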
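The second option in the suggested fix can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in benchmark_serving.py: `Output` is a hypothetical stand-in for RequestFuncOutput, `BURST_THRESHOLD_S` and its value are invented for the example (only `PREEMPT_THRESHOLD_S = 0.5` appears in the diff), and the clean decode speed is assumed to be the number of clean intervals divided by their total duration.

```python
from dataclasses import dataclass, field

PREEMPT_THRESHOLD_S = 0.5  # 500 ms, as in the diff
BURST_THRESHOLD_S = 0.001  # hypothetical lower cutoff; the real value is not shown in the excerpt


@dataclass
class Output:  # hypothetical stand-in for RequestFuncOutput
    success: bool
    output_tokens: int
    itl: list[float] = field(default_factory=list)


def aggregate_clean_itls(outputs):
    """Classify ITL samples, using the same request filter as the main loop."""
    counts = {"burst": 0, "preempt": 0, "clean": 0}
    clean_sum = 0.0
    for o in outputs:
        # Skip failed requests AND requests with no output tokens,
        # mirroring the `if not output_len: continue` path.
        if not o.success or not o.output_tokens:
            continue
        for gap in o.itl:
            if gap < BURST_THRESHOLD_S:
                counts["burst"] += 1
            elif gap > PREEMPT_THRESHOLD_S:
                counts["preempt"] += 1
            else:
                counts["clean"] += 1
                clean_sum += gap
    # Assumed definition: clean intervals per second of clean decode time.
    s_decode_clean = counts["clean"] / clean_sum if clean_sum > 0 else 0.0
    return s_decode_clean, counts
```

With this filter, a successful response that reports zero output tokens contributes nothing to the counts, so the clean aggregate and the main metrics describe the same set of requests.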

PaddlePaddle-bot · 2026-06-17T08:42:16Z

            out_list, metrics = test_output
            test_output = out_list[0]

+        print("test_output:", test_output, flush=True)


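🔴 Security This also prints successful warmup responses to the log unconditionally, which leaks the full RequestFuncOutput.

The repr of RequestFuncOutput includes fields such as generated_text, reasoning_content, output_ids, and tool_calls. Previously it was printed only when warmup failed, mainly for troubleshooting. Now successful requests also go to stdout, which exposes generated content and token ids whenever benchmark logs are collected or shared.

Suggested fix: go back to printing the full test_output only on failure. If the success path really needs debug information, put it under if debug: or print only non-content fields such as request id, latency, and output_tokens.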
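One possible shape for that fix is sketched below. The names here are assumptions made for illustration: `WarmupOutput` is a hypothetical stand-in for RequestFuncOutput with only a few of its fields, `describe_warmup` is not a function in the benchmark, and the `latency` and `error` fields are assumed rather than confirmed by the excerpt.

```python
from dataclasses import dataclass, field


@dataclass
class WarmupOutput:  # hypothetical stand-in for RequestFuncOutput
    success: bool = False
    latency: float = 0.0
    output_tokens: int = 0
    generated_text: str = ""
    output_ids: list = field(default_factory=list)
    error: str = ""


def describe_warmup(out, debug=False):
    """Return the log line for a warmup result without leaking content on success."""
    if not out.success or debug:
        # Failure (or explicit debug mode): the full repr helps troubleshooting.
        return f"test_output: {out!r}"
    # Success: report only non-content fields.
    return f"warmup ok: latency={out.latency:.3f}s output_tokens={out.output_tokens}"


ok = WarmupOutput(success=True, latency=1.25, output_tokens=7,
                  generated_text="secret", output_ids=[1, 2])
print(describe_warmup(ok), flush=True)
```

The full object still reaches the log when warmup fails or when the caller opts in with `debug=True`, which keeps the troubleshooting behavior the earlier code had.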

[benchmark] Enhance benchmark metrics with ITL aggregation details

20e675a

Added metrics for cleaned ITL aggregation including decode speed and counts for different ITL categories.

ZhangYulongg temporarily deployed to Metax_ci June 17, 2026 08:28 — with GitHub Actions Inactive

EmmonsCurse added the skip-ci: all label Jun 17, 2026

EmmonsCurse approved these changes Jun 17, 2026

EmmonsCurse merged commit 5372fe5 into develop Jun 17, 2026
41 of 43 checks passed

EmmonsCurse deleted the ZhangYulongg-patch-1 branch June 17, 2026 08:32

PaddlePaddle-bot suggested changes Jun 17, 2026
