[Feature] Support sampling_threshold #7875
Thanks for your contribution!
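CI report generated from the following code (updated every 30 minutes):
1 Task overview
9 of the 10 Required tasks have passed; 1 failed Required task is currently blocking the merge:
2 Task status summary
About the log column: failed tasks link directly to the tool-generated log; running tasks show the Job link.
2.1 Required tasks: 9/10 passed
2.2 Optional tasks: 30/32 passed
3 Failure details (Required only)
Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: test failure (coverage check did not pass; confidence: high)
Failed cases: none (all unit tests passed; the failure occurred at the coverage check stage)
Root cause details:
Key logs:
Fix suggestion:
Fix suggestion summary:
Related changes: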
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #7875 +/- ##
==========================================
Coverage ? 63.57%
==========================================
Files ? 462
Lines ? 64518
Branches ? 9891
==========================================
Hits ? 41016
Misses ? 20721
Partials ? 2781
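The figures in the coverage table are internally consistent. As a rough check, and assuming the percentage is computed as hits divided by the sum of hits, misses, and partials, the arithmetic can be sketched as follows (variable names are illustrative only):

```python
# Numbers copied from the Coverage Diff table above.
hits, misses, partials = 41016, 20721, 2781

# Assumption: total lines = hits + misses + partials.
lines = hits + misses + partials

# Assumption: coverage percentage = hits / lines, rounded to two decimals.
coverage = round(100 * hits / lines, 2)

print(lines)     # 64518
print(coverage)  # 63.57
```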
Flags with carried forward coverage won't be shown.
PaddlePaddle-bot
left a comment
🤖 Paddle-CI-Agent | pr_review |
2026-05-21 16:08:58
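📋 Review summary
PR overview: Adds support for the sampling_threshold parameter, which filters out tokens whose absolute probability is below the threshold before sampling, with the aim of improving generation quality.
Change scope: engine/sampling_params.py, entrypoints/openai/protocol.py, model_executor/layers/sample/, worker/gpu_model_runner.py, worker/xpu_model_runner.py, worker/input_batch.py, documentation
Impact tags: [Feature] [APIServer] [Engine] [OP] [XPU]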
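To make the filtering step concrete, here is a minimal pure-Python sketch of absolute-probability thresholding. It is not the PR's kernel. The function names, the renormalization step, and the fallback to the most likely token when every token is filtered are assumptions made for illustration.

```python
import random


def apply_sampling_threshold(probs, threshold):
    """Zero out tokens with probability below `threshold`, then renormalize.

    Assumption: surviving probabilities are renormalized to sum to 1.
    """
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    if total == 0.0:
        # Hypothetical fallback: if everything was filtered, keep the top token.
        best = max(range(len(probs)), key=probs.__getitem__)
        kept = [0.0] * len(probs)
        kept[best] = 1.0
        return kept
    return [p / total for p in kept]


def sample_token(probs, threshold, rng):
    """Draw one token index from the thresholded distribution."""
    filtered = apply_sampling_threshold(probs, threshold)
    return rng.choices(range(len(filtered)), weights=filtered)[0]


probs = [0.5, 0.3, 0.15, 0.05]
filtered = apply_sampling_threshold(probs, threshold=0.1)
token = sample_token(probs, threshold=0.1, rng=random.Random(0))
```

With a threshold of 0.1, the last token (probability 0.05) can never be drawn, while a threshold of 0.0 leaves the distribution unchanged.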
Issues
| Severity | File | Summary |
|---|---|---|
| 🔴 Bug | fastdeploy/worker/xpu_model_runner.py:762 | A comma-separated expression is used instead of an assignment, so sampling_threshold is never written to shared memory on XPU |
| 🟡 Suggestion | fastdeploy/worker/ | dcu_model_runner.py / iluvatar_model_runner.py / metax_model_runner.py / hpu_model_runner.py / gcu_model_runner.py have not been updated with sampling_threshold support |
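📝 PR convention check
The title [Feature] Support sampling_threshold follows the required format, and the tag is used correctly.
The PR description largely follows the template, but the ## Accuracy Tests section is empty (only a placeholder comment) and none of the ## Checklist items are checked. Please fill these in as appropriate.
Suggested PR description (can be copied directly):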
## Motivation
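Allow a request to specify `sampling_threshold`, which filters out every token whose probability is below `sampling_threshold` before sampling, so that sampling draws only from tokens with high absolute probability and generation quality improves.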
## Modifications
- Add the `SamplingParams.sampling_threshold` field (default 0.0) and validate its range [0.0, 1.0) in `_verify_args`
- Add a `sampling_threshold` field to `CompletionRequest` / `ChatCompletionRequest` (an extension to the OpenAI protocol; ge=0.0, lt=1.0)
- Add a `sampling_threshold` tensor field to `SamplingMetadata`
- Pass the `threshold` argument through `_normal_sample`, `_verify_and_sample`, `forward_cuda`, `_normal_sample_xpu`, and `_verify_and_sample_xpu` in `sampler.py`
- Add `sampling_threshold` and `sampling_threshold_list` to the initialization/reset/swap logic in `input_batch.py`
- Add matching support to `insert_tasks_v1` and `_prepare_inputs` in `gpu_model_runner.py` / `xpu_model_runner.py`
- Update the documentation (Chinese and English): sampling strategy description, API parameter description, and usage examples
## Usage or Command
Sampling backends that support `sampling_threshold`:
```
Default (export FD_SAMPLING_CLASS=base)
or export FD_SAMPLING_CLASS=base_non_truncated
or export FD_SAMPLING_CLASS=air
```
Test script:
```shell
curl -X POST "http://0.0.0.0:8112/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
"messages": [{"role": "user", "content": "What are the three laws of motion formulated by Newton?"}],
"top_p": 1.0,
"temperature": 1.0,
"sampling_threshold": 0.00002,
"stream": false,
"max_tokens": 3
}'
```
## Accuracy Tests
N/A (this PR adds a new sampling filter parameter and does not affect the accuracy of existing sampling paths)
## Checklist
- [x] Add at least a tag in the PR title.
- Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
- You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [x] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
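- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.
Overall assessment
The PR's overall approach is clear, and the sampling_threshold feature path is complete (parameter definition → protocol parsing → passing the parameter to sampling → documentation). The XPU runner contains a typo (a comma in place of an assignment) that must be fixed before merging. In addition, support should be added to the other hardware runners (DCU / Iluvatar / Metax / HPU / GCU) so that the feature does not silently fail on those platforms.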
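The first link in that chain, the range check on [0.0, 1.0), might look roughly like the sketch below. `SamplingParamsSketch` is a hypothetical stand-in and not FastDeploy's `SamplingParams`. Only the field name, its default of 0.0, the range, and the `_verify_args` method name come from the PR description; the error message is invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class SamplingParamsSketch:
    """Hypothetical stand-in that models only the new field."""

    sampling_threshold: float = 0.0

    def __post_init__(self):
        self._verify_args()

    def _verify_args(self):
        # Half-open range: 0.0 is allowed (no filtering), 1.0 is rejected.
        if not 0.0 <= self.sampling_threshold < 1.0:
            raise ValueError(
                "sampling_threshold must be in [0.0, 1.0), "
                f"got {self.sampling_threshold}."
            )


default_params = SamplingParamsSketch()
custom_params = SamplingParamsSketch(sampling_threshold=0.00002)
```

The upper bound is presumably excluded because no token can have a probability above 1.0, so a threshold of 1.0 would filter out almost every distribution entirely.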
self.share_inputs["min_p"][idx : idx + 1] = request.get("min_p", 0.0)
self.share_inputs["min_p_list"][idx] = request.get("min_p", 0.0)
self.share_inputs["sampling_threshold_list"][idx] = request.get("sampling_threshold", 0.0)
self.share_inputs["sampling_threshold"][idx : idx + 1], request.get("sampling_threshold", 0.0)
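🔴 Bug: The assignment was written as a comma-separated expression, so the sampling_threshold value is never written to shared memory.
Current code:
self.share_inputs["sampling_threshold"][idx : idx + 1], request.get("sampling_threshold", 0.0)
This is a tuple expression, not an assignment, so on XPU sampling_threshold always keeps its initial value of 0.0 and the feature has no effect.
Suggested fix (consistent with lines 1053-1055 of gpu_model_runner.py):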
self.share_inputs["sampling_threshold"][idx : idx + 1] = request.get("sampling_threshold", 0.0)
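The difference between the two lines can be reproduced outside the runner. The sketch below uses a plain dict of Python lists as a hypothetical stand-in for `share_inputs` (the real buffers are tensors), so the fixed line assigns a one-element list instead of a scalar.

```python
# Hypothetical stand-ins for the runner's share_inputs buffers and the request.
share_inputs = {"sampling_threshold": [0.0, 0.0, 0.0, 0.0]}
request = {"sampling_threshold": 0.25}
idx = 2

# Buggy form: evaluates a (slice, value) tuple and discards it; nothing is stored.
share_inputs["sampling_threshold"][idx : idx + 1], request.get("sampling_threshold", 0.0)
after_buggy = list(share_inputs["sampling_threshold"])

# Fixed form: slice assignment. A Python list needs a one-element list on the
# right-hand side; the suggested fix assigns the scalar to a tensor slice directly.
share_inputs["sampling_threshold"][idx : idx + 1] = [request.get("sampling_threshold", 0.0)]
after_fixed = list(share_inputs["sampling_threshold"])

print(after_buggy)  # [0.0, 0.0, 0.0, 0.0]
print(after_fixed)  # [0.0, 0.0, 0.25, 0.0]
```

Because the buggy line is valid Python and raises no error, a unit test that asserts the stored value after `insert_tasks_v1` is one way to catch this kind of mistake.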
Motivation
Allow a request to specify sampling_threshold, which filters out all tokens whose probs are below sampling_threshold before sampling.
Modifications
Add the sampling_threshold field
Usage or Command
Sampling backends that support sampling_threshold
Test script
Accuracy Tests
Checklist
Tag list: [[FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
Format your code, run pre-commit before commit.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.