
[Feature] Support computing entropy with fastdeploy runner #7954

Open
rain7996 wants to merge 3 commits into PaddlePaddle:develop from rain7996:develop



@rain7996
Contributor

Motivation

Support entropy calculation for the FastDeploy runner (fd-runner). The previous implementation had three bugs in the fd-runner + MTP (multi-token prediction) scenario:

  1. ENTROPY-DONE never triggered: when accept_num=0 for a finishing slot, the code skipped the stop_flags check entirely, so that request's entropy was never summarized or cleared.
  2. Incorrect logits indexing: fd-runner's logits have shape [sum(seq_lens_this_time), vocab] (one row per position, including rejected draft tokens), but the code treated them as [total_accepted_num, vocab] (accepted tokens only, which is the ernie5_runner layout).
  3. Warmup pollution: CUDA Graph warmup sends dummy requests with an empty req_id. Their entropy values accumulated in entropy_list and were never cleared, contaminating subsequent real requests.
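To make the layout difference in bug 2 concrete, here is a minimal sketch of how per-slot row indices could be derived for each layout. It assumes, as a simplification, that slot i owns a contiguous block of seq_lens_this_time[i] rows in the fd-runner logits and that its accepted tokens are the first accept_num[i] rows of that block. The function names are hypothetical; this is not the code in entropy_utils.py.

```python
from typing import List


def accepted_rows_fd_runner(seq_lens_this_time: List[int], accept_num: List[int]) -> List[List[int]]:
    """Per-slot row indices into fd-runner logits of shape [sum(seq_lens_this_time), vocab].

    Assumption (hypothetical): slot i owns a contiguous block of
    seq_lens_this_time[i] rows, and its accepted tokens are the first
    accept_num[i] rows of that block; the remaining rows are rejected drafts.
    """
    rows = []
    start = 0
    for seq_len, n_acc in zip(seq_lens_this_time, accept_num):
        assert n_acc <= seq_len, "cannot accept more tokens than were scored"
        rows.append(list(range(start, start + n_acc)))
        start += seq_len  # skip the whole block, including rejected rows
    return rows


def accepted_rows_ernie5(accept_num: List[int]) -> List[List[int]]:
    """Per-slot row indices into ernie5_runner-style logits of shape [total_accepted_num, vocab].

    Rows are already accepted-only, so slot i starts at sum(accept_num[:i]).
    """
    rows = []
    start = 0
    for n_acc in accept_num:
        rows.append(list(range(start, start + n_acc)))
        start += n_acc
    return rows


# Three slots: 3 positions scored / 2 accepted, 1 scored / 0 accepted, 3 scored / 3 accepted.
print(accepted_rows_fd_runner([3, 1, 3], [2, 0, 3]))  # [[0, 1], [], [4, 5, 6]]
print(accepted_rows_ernie5([2, 0, 3]))                # [[0, 1], [], [2, 3, 4]]
```

Reusing the ernie5-style offsets on fd-runner logits would read rows 2-4 for the third slot instead of rows 4-6, picking up rejected draft rows from other slots, which is the kind of misalignment bug 2 describes.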

Modifications

fastdeploy/model_executor/entropy_utils.py:

  • Add dual-path logic in speculate_calculate_logits_entropy: fd-runner uses accepted_idx to extract the accepted rows from the full logits; ernie5_runner uses the pre-filtered logits directly.
  • Move the stop_flags check outside the if accept_count > 0 block so that ENTROPY-DONE fires even when no tokens are accepted in the final step.
  • Add an is_valid_req guard: skip entropy accumulation for warmup requests (empty or whitespace-only req_id).
  • Remove verbose per-step debug logging; only emit [ENTROPY-DONE] at request completion.
  • Remove the unused import time and _mtp_step_counter.
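The three fixes can be modeled together in plain Python. This is a hypothetical, simplified sketch, not the PR's implementation: the helper names (softmax_entropy, update_entropy), the dict-of-lists entropy_list, and the per-slot row_indices input are assumptions. Only the order of the checks (warmup guard first, stop_flags checked regardless of how many tokens were accepted) mirrors the description above.

```python
import math
from typing import Dict, List, Optional


def softmax_entropy(logits_row: List[float]) -> float:
    """Shannon entropy (in nats) of softmax(logits_row), computed via log-sum-exp."""
    m = max(logits_row)
    exps = [math.exp(x - m) for x in logits_row]
    z = sum(exps)
    log_z = m + math.log(z)
    # H = -sum(p * log p) = log Z - sum(p * x), since log p = x - log Z
    return log_z - sum((e / z) * x for e, x in zip(exps, logits_row))


def is_valid_req(req_id: Optional[str]) -> bool:
    # Assumption: warmup/dummy requests carry an empty or whitespace-only req_id.
    return bool(req_id and req_id.strip())


def update_entropy(
    entropy_list: Dict[str, List[float]],
    req_ids: List[Optional[str]],
    row_indices: List[List[int]],
    logits: List[List[float]],
    stop_flags: List[bool],
) -> Dict[str, float]:
    """Process one decode step; return {req_id: avg_entropy} for requests that finished."""
    done: Dict[str, float] = {}
    for slot, req_id in enumerate(req_ids):
        if not is_valid_req(req_id):
            continue  # warmup slot: never accumulate, so nothing leaks into real requests
        # row_indices[slot] is empty when accept_num == 0 for this slot.
        for row in row_indices[slot]:
            entropy_list.setdefault(req_id, []).append(softmax_entropy(logits[row]))
        # Checked outside any "accepted tokens > 0" branch, so a finishing slot
        # that accepts nothing in its last step is still summarized and cleared.
        if stop_flags[slot]:
            values = entropy_list.pop(req_id, [])
            if values:
                done[req_id] = sum(values) / len(values)
    return done


history = {"req-b": [0.5, 1.5]}
finished = update_entropy(
    history,
    req_ids=["req-a", "", "req-b"],  # "" models a warmup slot
    row_indices=[[0], [1], []],      # req-b accepts nothing in its final step
    logits=[[0.0, 0.0], [0.0, 0.0]],
    stop_flags=[False, False, True],
)
print(finished)  # {'req-b': 1.0}
```

Here req-b still reports its average even though its last step accepted no tokens, and the warmup slot never creates an entry in history.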

Accuracy Tests

Test configuration: ERNIE5 TP1, block_wise_fp8, fd_runner, no-prefix-cache, temperature=0

fd_runner with overlap enabled vs. disabled. Prompt: "水的化学式是什么?" (What is the chemical formula of water?), max_tokens=10

| Configuration | all_values | avg_entropy |
|---|---|---|
| Overlap enabled | [0.0, 0.001631, 0.20335, 0.058157, 0.293438, 0.00377, 0.498297, 1.209875, 0.765423, 0.605906] | 0.363945 |
| Overlap disabled | [0.0, 0.001631, 0.20335, 0.058157, 0.293438, 0.00377, 0.498297, 1.209875, 0.765423, 0.605906] | 0.363945 |
| Comparison | Identical across all 10 steps | Identical |

Checklist

  • Add at least one tag to the PR title: [BugFix], [Feature]
  • Format your code and run pre-commit before committing.
  • Add unit tests.
  • Provide accuracy results.

@paddle-bot

paddle-bot Bot commented May 28, 2026

Thanks for your contribution!

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

@codecov-commenter

codecov-commenter commented May 28, 2026

Codecov Report

❌ Patch coverage is 79.31034% with 18 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@60e6223).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| fastdeploy/model_executor/entropy_utils.py | 86.84% | 5 Missing and 5 partials ⚠️ |
| fastdeploy/model_executor/pre_and_post_process.py | 20.00% | 7 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #7954   +/-   ##
==========================================
  Coverage           ?   67.81%           
==========================================
  Files              ?      467           
  Lines              ?    65259           
  Branches           ?    10033           
==========================================
  Hits               ?    44255           
  Misses             ?    18156           
  Partials           ?     2848           
| Flag | Coverage Δ |
|---|---|
| GPU | 78.08% <79.31%> (?) |
| XPU | 7.07% <0.00%> (?) |

Flags with carried forward coverage won't be shown.


@PaddlePaddle-bot

PaddlePaddle-bot commented May 28, 2026

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-05-29 05:50:16

The CI report is generated from the following code (updated every 30 minutes):


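1 Job overview

Not all required jobs have passed: 1 required job has failed (Approval, which needs manual approval) and 0 required jobs are pending or running. The main test, Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage, has passed. In addition, 3 optional jobs have failed and 1 optional job is still running; these are for reference only.

| Total runs (reruns) | Total jobs | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Pending | Skipped |
|---|---|---|---|---|---|---|
| 41 (0) | 41 | 36 | 4 | 1 | 0 | 0 |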

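2 Job status summary

Log column: failed jobs link directly to their CI logs. Optional job failures do not block merging and are for reference only.

2.1 Required jobs: 9/10 passed

Required jobs block merging; failures should be addressed first.

| Status | Job | Duration | Root cause | Suggested fix | Log | Rerun |
|---|---|---|---|---|---|---|
| | Approval | 19s | Approval required | Please grant manual approval | Job | - |
| | The remaining 9 required jobs passed | - | - | - | - | - |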

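2.2 Optional jobs: 27/31 passed

Optional jobs do not block merging; failures are for reference only.

| Status | Job | Duration | Log | Rerun |
|---|---|---|---|---|
| | Run iluvatar Tests / run_iluvatar_cases | 1m51s | Job | - |
| | Check PR Template | 21s | Job | - |
| | Trigger Jenkins for PR | 14s | Job | - |
| | CI_HPU | - | - | - |
| | The remaining 27 optional jobs passed | - | - | - |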

3 Failure details (required jobs only)

Approval: manual approval required (confidence: high)

This job requires manual approval; CI will continue only after approval is granted.
