Add SPEED-Bench reference synthetic AL values for DeepSeek-V4-Pro MTP 1-8 #1592
qiching wants to merge 2 commits
Measured with SPEED-Bench coding dataset, temperature=1.0, thinking=true. Values used for synthetic acceptance rate configuration in MTP benchmarks.
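One way an AL value could be turned into a synthetic acceptance rate is sketched below. It rests on a simplifying assumption that this PR does not state: each draft token is accepted independently with probability p, and AL counts the bonus token, so AL = 1 + p + ... + p^k for k speculative tokens. The helper names are hypothetical, and the benchmark's actual synthetic-acceptance model may differ.

```python
def expected_al(p: float, k: int) -> float:
    """Expected acceptance length for k draft tokens under i.i.d. acceptance p.

    Assumption: AL counts the bonus token, so AL = sum of p**i for i in 0..k.
    """
    return sum(p**i for i in range(k + 1))


def acceptance_rate_from_al(al: float, k: int, tol: float = 1e-9) -> float:
    """Invert expected_al by bisection; valid for 1 <= al <= k + 1."""
    if not 1.0 <= al <= k + 1:
        raise ValueError(f"AL {al} is outside [1, {k + 1}] for k={k}")
    lo, hi = 0.0, 1.0
    # expected_al is non-decreasing in p on [0, 1], so bisection converges.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if expected_al(mid, k) < al:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Under this model an AL that stays roughly flat as k grows implies a falling per-token rate, which is one reason a per-position model might be preferred in practice.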
@qiching, a few recommendations
Great work so far, cc @benchislett @functionstackx

These numbers seem a bit high to me. Let's double check that thinking is on and temperature is being set properly.

thanks for this PR, what is the script that you used to generate this? can you create a PR that has the GitHub Actions for running this AL distribution collection script?
The previous values were measured with reasoning disabled but labeled thinking=true. Restructure the reference into an explicit matrix:
thinking_on - reasoning enabled (production config; golden reference for synthetic-acceptance modeling)
thinking_off - reasoning disabled (comparison only)
Values measured on SPEED-Bench coding, temperature=1.0, output_len=4096.
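The matrix described in this commit message could be sanity-checked along the following lines. This is a minimal sketch: the in-memory layout (a dict keyed by block name, then by MTP depth) and the function name are assumptions, since the YAML schema is not shown in full here. The range check also assumes AL counts the bonus token, so it lies in [1, k + 1].

```python
def validate_reference(reference: dict) -> list[str]:
    """Return a list of problems found in a thinking_on/thinking_off AL matrix."""
    problems = []
    for mode in ("thinking_on", "thinking_off"):
        block = reference.get(mode)
        if block is None:
            problems.append(f"missing block: {mode}")
            continue
        for k in range(1, 9):  # MTP num_speculative_tokens 1-8
            al = block.get(k)
            if al is None:
                problems.append(f"{mode}: no AL for MTP {k}")
            elif not 1.0 <= al <= k + 1:
                # Assumption: AL counts the bonus token, so it lies in [1, k + 1].
                problems.append(f"{mode}: AL {al} out of range for MTP {k}")
    return problems
```

An empty list means both blocks are present, cover MTP 1-8, and contain plausible values. It does not tell you whether the blocks carry the right labels.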
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 1f0d9dd.
5: 3.13
6: 3.08
7: 3.13
8: 3.12
Golden reference wrong thinking block
Medium Severity
Comments designate thinking_on as the production golden reference for synthetic-acceptance modeling, but the PR’s thinking=true AL table (MTP 1–8) aligns with the thinking_off entries, not thinking_on. Anything reading the golden block gets different AL than the values partners are asked to align on.
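The mismatch described in this finding is the kind of thing a small check could surface automatically. The sketch below is hypothetical and not part of the PR: it assumes the description table and each reference block are available as dicts mapping MTP depth to AL, and it reports which blocks agree with the description within a rounding tolerance.

```python
def matching_blocks(description: dict, reference: dict, tol: float = 0.005) -> list[str]:
    """Names of reference blocks whose AL values all match the description table."""
    return [
        name
        for name, block in reference.items()
        # A block matches only if it covers the same MTP depths
        # and every AL agrees within tol.
        if block.keys() == description.keys()
        and all(abs(block[k] - description[k]) <= tol for k in description)
    ]
```

A review step could flag the PR whenever "thinking_on" is absent from the returned list, which is the situation Bugbot describes here.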
Updated. Please check the latest AL numbers with thinking mode on and off.
functionstackx left a comment
thanks for the contribution! shouldn't the GitHub Action generate the YAML that gets checked into the codebase, instead of having it human generated?
i.e. close this PR and then have the other PR generate this YAML? #1650
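As a rough illustration of the machine-generated approach suggested here, a collection script could render its measurements directly to YAML. This is a sketch under assumptions: the function name, the header comment, and the exact file layout are hypothetical, and the script in #1650 may look quite different. It uses only the standard library.

```python
def render_reference_yaml(measurements: dict[str, dict[int, float]]) -> str:
    """Render {mode: {num_speculative_tokens: AL}} as a small YAML document."""
    lines = ["# Generated file; do not edit by hand."]
    for mode in sorted(measurements):
        lines.append(f"{mode}:")
        # Sort MTP depths so the output is stable across runs.
        for k in sorted(measurements[mode]):
            lines.append(f"  {k}: {measurements[mode][k]:.2f}")
    return "\n".join(lines) + "\n"
```

Deterministic ordering and fixed two-decimal formatting keep diffs small when the workflow regenerates the file.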


Note
Low Risk
Documentation-only benchmark reference data with no application or infrastructure code changes.
Overview
Adds benchmarks/speedbench-reference-al.yaml, a new SPEED-Bench reference for acceptance length (AL) on DeepSeek-V4-Pro with vLLM MTP for num_speculative_tokens 1–8 (coding dataset, temperature 1.0, output_len 4096).
The file documents two AL curves: thinking_on (marked as the production golden reference for synthetic-acceptance modeling) and thinking_off (comparison only). The numeric tables differ from the PR description’s single table, which aligns with the thinking_off values rather than production thinking_on.