[Feature] Support computing entropy with fastdeploy runner #7954
rain7996 wants to merge 3 commits into
Thanks for your contribution!
Codecov Report
❌ Patch coverage is …
@@ Coverage Diff @@
## develop #7954 +/- ##
==========================================
Coverage ? 67.81%
==========================================
Files ? 467
Lines ? 65259
Branches ? 10033
==========================================
Hits ? 44255
Misses ? 18156
Partials ? 2848
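CI report generated from the following code (updated every 30 minutes):
1 Task overview
Not all required tasks have passed: 1 required task has failed (Approval, which needs manual approval) and 0 required tasks are pending or running. Main tests
2 Task status summary
Note on the log column: failed tasks link directly to the CI log; failures in optional tasks do not block merging and are for reference only.
2.1 Required tasks: 9/10 passed
2.2 Optional tasks: 27/31 passed
3 Failure details (required only)
Approval: manual approval required (confidence: high)
This job requires manual approval; CI continues once the approval is complete.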
Motivation
Support entropy calculation for the fastdeploy runner (fd-runner). The previous implementation had three bugs in the fd-runner + MTP scenario:
- When `accept_num=0` for a finishing slot, the code skipped the `stop_flags` check entirely, so entropy was never summarized or cleared.
- With fd-runner, the logits have shape `[sum(seq_lens_this_time), vocab]` (all positions, including rejected ones), but the code treated them as `[total_accepted_num, vocab]` (accepted-only, which is the ernie5_runner layout).
- Warmup requests have an empty or whitespace-only `req_id`. Their entropy values accumulated in `entropy_list` and were never cleared, contaminating subsequent real requests.

Modifications
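To make the three fixes concrete, here is a minimal pure-Python sketch of the intended control flow. It is not the code in `entropy_utils.py`: the function names `row_entropy` and `accumulate_entropy`, the dict-based `entropy_lists`, and the use of a plain mean as the "summary" are all hypothetical. It also assumes that the accepted tokens of a slot are the first `accept_count` rows of that slot in the full logits layout.

```python
import math


def row_entropy(logits_row):
    """Shannon entropy (in nats) of softmax(logits_row), computed stably."""
    m = max(logits_row)
    exps = [math.exp(x - m) for x in logits_row]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def is_valid_req(req_id):
    """Assumption: warmup requests carry an empty or whitespace-only req_id."""
    return bool(req_id and req_id.strip())


def accumulate_entropy(logits, seq_lens_this_time, accept_num,
                       stop_flags, req_ids, entropy_lists):
    """Hypothetical sketch of the fd-runner path.

    logits has one row per drafted position (full layout), so the rows of
    each slot start at a running offset over seq_lens_this_time.
    Returns {req_id: mean entropy} for the slots that finished this step.
    """
    done = {}
    offset = 0
    for slot, seq_len in enumerate(seq_lens_this_time):
        req_id = req_ids[slot]
        accept_count = accept_num[slot]
        if is_valid_req(req_id):
            if accept_count > 0:
                # Assumed layout: accepted rows are a prefix of the slot's rows.
                accepted_idx = range(offset, offset + accept_count)
                bucket = entropy_lists.setdefault(req_id, [])
                for i in accepted_idx:
                    bucket.append(row_entropy(logits[i]))
            # The stop check sits outside the accept_count branch, so a slot
            # that finishes with zero accepted tokens is still summarized.
            if stop_flags[slot]:
                values = entropy_lists.pop(req_id, [])
                if values:
                    done[req_id] = sum(values) / len(values)
        offset += seq_len
    return done
```

In this sketch a finished request is removed from `entropy_lists` as soon as it is summarized, and warmup slots never write into it, so nothing can leak into a later request that reuses the slot.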
fastdeploy/model_executor/entropy_utils.py:
- `speculate_calculate_logits_entropy`: fd-runner uses `accepted_idx` to extract the correct rows from the full logits; ernie5_runner uses the pre-filtered logits directly.
- Move the `stop_flags` check outside the `if accept_count > 0` block so that ENTROPY-DONE fires even when no tokens are accepted in the final step.
- Add an `is_valid_req` guard: skip entropy accumulation for warmup requests (empty/whitespace `req_id`).
- … `[ENTROPY-DONE]` at request completion.
- … `import time` and `_mtp_step_counter`.

Accuracy Tests
Test configuration: ERNIE5 TP1, block_wise_fp8, fd_runner, no-prefix-cache, temperature=0
fd_runner overlap enabled vs. disabled (prompt: 水的化学式是什么? ("What is the chemical formula of water?"), max_tokens=10)
- Overlap enabled: [0.0, 0.001631, 0.20335, 0.058157, 0.293438, 0.00377, 0.498297, 1.209875, 0.765423, 0.605906]
- Overlap disabled: [0.0, 0.001631, 0.20335, 0.058157, 0.293438, 0.00377, 0.498297, 1.209875, 0.765423, 0.605906]

Checklist
- PR title tags: `[BugFix]`, `[Feature]`
- Run `pre-commit` before commit.