Add SPEED-Bench reference synthetic AL values for DeepSeek-V4-Pro MTP 1-8 by qiching · Pull Request #1592 · SemiAnalysisAI/InferenceX

qiching · 2026-05-30T07:11:06Z

Measured with SPEED-Bench coding dataset, temperature=1.0, thinking=true. Values used for synthetic acceptance rate configuration in MTP benchmarks.

Note

Low Risk
Documentation-only benchmark reference data with no application or infrastructure code changes.

Overview
Adds benchmarks/speedbench-reference-al.yaml, a new SPEED-Bench reference for acceptance length (AL) on DeepSeek-V4-Pro with vLLM MTP for num_speculative_tokens 1–8 (coding dataset, temperature 1.0, output_len 4096).

The file documents two AL curves: thinking_on (marked as the production golden reference for synthetic-acceptance modeling) and thinking_off (comparison only). The numeric tables differ from the PR description’s single table, which aligns with the thinking_off values rather than production thinking_on.
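A consumer of this file has to pick the right curve explicitly. The sketch below shows one way to do that, assuming the YAML has already been parsed into a mapping with `thinking_on` and `thinking_off` blocks keyed by `num_speculative_tokens`. The function name `select_golden_curve` and the bound check are illustrative assumptions, not code from this PR. The check assumes AL counts the bonus token, so it lies between 1 and k + 1 for k speculative tokens.

```python
from typing import Dict, Mapping

# Hypothetical helper: not part of the PR. The real key names and AL values
# live in benchmarks/speedbench-reference-al.yaml and are not reproduced here.
def select_golden_curve(
    reference: Mapping[str, Mapping[int, float]],
    thinking: bool = True,
) -> Dict[int, float]:
    """Return the AL curve for the requested thinking mode, validated."""
    block = "thinking_on" if thinking else "thinking_off"
    if block not in reference:
        raise KeyError(f"reference has no '{block}' block")
    curve = reference[block]

    # The PR covers num_speculative_tokens 1 through 8.
    expected_keys = set(range(1, 9))
    if set(curve) != expected_keys:
        raise ValueError(f"{block}: expected keys 1..8, got {sorted(curve)}")

    for k, al in curve.items():
        # Assumption: AL includes the bonus token, so 1 <= AL <= k + 1.
        if not 1.0 <= al <= k + 1:
            raise ValueError(f"{block}: AL {al} out of range for MTP {k}")

    return dict(sorted(curve.items()))
```

Defaulting to `thinking=True` mirrors the file's statement that `thinking_on` is the production golden reference, and `thinking_off` has to be requested explicitly.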

Reviewed by Cursor Bugbot for commit 1f0d9dd. Bugbot is set up for automated code reviews on this repo.

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

xinli-sw · 2026-05-30T07:18:28Z

@qiching, a few recommendations:

- Retitle the PR: "[1/N] Synthetic MTP - Add SPEED-Bench reference synthetic AL values for DeepSeek-V4-Pro MTP 1-8".
- In the PR description, mention that SPEED-Bench will also run as part of the GitHub workflows. For this first iteration, though, we'd like to get alignment so that all partners feel equally confident about these AL values.
- Attach a full repro for the numbers (serve command, installation, SPEED-Bench command, etc.).
- Attach the full results of your runs (the JSONL file) for auditability.

Great work so far, cc @benchislett @functionstackx

benchislett · 2026-05-30T15:06:13Z

These numbers seem a bit high to me. Let's double check that thinking is on and temperature is being set properly.

functionstackx · 2026-05-30T20:58:51Z

Thanks for this PR. What scripts did you use to generate this? Can you create a PR that adds the GitHub Actions workflow for running this AL distribution collection script?

The previous values were measured with reasoning disabled but labeled thinking=true. Restructure the reference into an explicit matrix:

- thinking_on: reasoning enabled (production config; golden reference for synthetic-acceptance modeling)
- thinking_off: reasoning disabled (comparison only)

Values measured on SPEED-Bench coding, temperature=1.0, output_len=4096.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

cursor · 2026-06-01T17:02:00Z

+    5: 3.13
+    6: 3.08
+    7: 3.13
+    8: 3.12


Golden reference wrong thinking block

Medium Severity

Comments designate thinking_on as the production golden reference for synthetic-acceptance modeling, but the PR’s thinking=true AL table (MTP 1–8) aligns with the thinking_off entries, not thinking_on. Anything reading the golden block gets different AL than the values partners are asked to align on.
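The mismatch described here can be checked mechanically. The following is a minimal sketch, assuming both the PR-description table and the reference blocks are available as plain mappings from MTP depth to AL. The function name `matching_blocks` and the tolerance are hypothetical, and the numbers in the usage lines are placeholders rather than measured values.

```python
from typing import Dict, List, Mapping

# Hypothetical helper: reports which reference blocks agree with a table
# (for example, the one pasted into the PR description).
def matching_blocks(
    table: Mapping[int, float],
    reference: Mapping[str, Mapping[int, float]],
    tol: float = 0.005,
) -> List[str]:
    """Return the names of blocks whose AL values all match `table` within `tol`."""
    matches = []
    for name, curve in reference.items():
        same_keys = set(curve) == set(table)
        if same_keys and all(abs(curve[k] - table[k]) <= tol for k in table):
            matches.append(name)
    return matches


# Placeholder numbers, not values from the PR.
reference: Dict[str, Dict[int, float]] = {
    "thinking_on": {1: 1.5, 2: 2.0},
    "thinking_off": {1: 1.75, 2: 2.5},
}
description_table = {1: 1.75, 2: 2.5}
print(matching_blocks(description_table, reference))  # ['thinking_off']
```

A result that names only `thinking_off` for a table labeled thinking=true is the situation this review comment flags.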

qiching · 2026-06-01T18:42:43Z

> These numbers seem a bit high to me. Let's double check that thinking is on and temperature is being set properly.

Updated. Please check the latest AL numbers with thinking mode on and off.

functionstackx

Thanks for the contribution! Shouldn't the GitHub Action generate the YAML that gets checked into the codebase, instead of having it generated by hand?

I.e., close this PR and have the other PR (#1650) generate this YAML?

Add SPEED-Bench reference AL values for DeepSeek-V4-Pro MTP 1-8

1de285d

qiching requested a review from a team May 30, 2026 07:11

github-project-automation Bot added this to InferenceMAX Board May 30, 2026

claude Bot reviewed May 30, 2026

cursor Bot reviewed Jun 1, 2026

xinli-sw mentioned this pull request Jun 2, 2026

[Tracking Issue] Synthetic Acceptance for MTP Benchmarks #1651

Open

3 tasks

functionstackx requested changes Jun 5, 2026

qiching wants to merge 2 commits into SemiAnalysisAI:main from qiching:albecheng/add-dsv4-reference-al

4 participants
