Add SPEED-Bench reference synthetic AL values for DeepSeek-V4-Pro MTP 1-8 #1592

Open
qiching wants to merge 2 commits into
SemiAnalysisAI:mainfrom
qiching:albecheng/add-dsv4-reference-al

@qiching qiching commented May 30, 2026

Measured with SPEED-Bench coding dataset, temperature=1.0, thinking=true. Values used for synthetic acceptance rate configuration in MTP benchmarks.


Note

Low Risk
Documentation-only benchmark reference data with no application or infrastructure code changes.

Overview
Adds benchmarks/speedbench-reference-al.yaml, a new SPEED-Bench reference for acceptance length (AL) on DeepSeek-V4-Pro with vLLM MTP for num_speculative_tokens 1–8 (coding dataset, temperature 1.0, output_len 4096).

The file documents two AL curves: thinking_on (marked as the production golden reference for synthetic-acceptance modeling) and thinking_off (comparison only). The numeric tables differ from the PR description’s single table, which aligns with the thinking_off values rather than production thinking_on.

Reviewed by Cursor Bugbot for commit 1f0d9dd. Bugbot is set up for automated code reviews on this repo.
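
The layout described in the overview (two named curves, each mapping `num_speculative_tokens` to AL) can be sanity-checked with a few lines of Python. The sketch below is an illustration only: the real schema of `benchmarks/speedbench-reference-al.yaml` is not shown in this thread, so the reference is modeled as a plain dict, and the helper names `validate_curve` and `golden_curve` are hypothetical. It also assumes AL counts the bonus token, so a step with `k` draft tokens yields between 1 and `k + 1` tokens.

```python
# Sketch only: the file's real schema is not shown in this thread, so the
# reference is modeled as a plain dict {curve_name: {k: AL}} (hypothetical layout).


def validate_curve(curve):
    """Return a list of problems found in one AL curve.

    `curve` maps num_speculative_tokens (int) to measured AL (float).
    Assumption: AL includes the bonus token, so 1 <= AL <= k + 1.
    """
    problems = []
    for k, al in sorted(curve.items()):
        if not 1.0 <= al <= k + 1:
            problems.append(f"k={k}: AL {al} outside [1, {k + 1}]")
    return problems


def golden_curve(reference, golden_key="thinking_on"):
    """Return the curve marked as golden, refusing to hand back invalid data."""
    if golden_key not in reference:
        raise KeyError(f"reference has no '{golden_key}' block")
    curve = reference[golden_key]
    problems = validate_curve(curve)
    if problems:
        raise ValueError("; ".join(problems))
    return curve


# The only values visible in this thread (MTP 5-8, from the review excerpt).
# Which curve they belong to is not shown, so they serve purely as sample input.
EXCERPT = {5: 3.13, 6: 3.08, 7: 3.13, 8: 3.12}
```

A bound check like this cannot say whether a value is "a bit high" for a given model, but it does catch values that are impossible for the configured number of draft tokens.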

Measured with SPEED-Bench coding dataset, temperature=1.0, thinking=true.
Values used for synthetic acceptance rate configuration in MTP benchmarks.
@qiching qiching requested a review from a team May 30, 2026 07:11
Contributor

@claude claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@xinli-sw
Collaborator

@qiching, a few recommendations:

  1. Title: [1/N] Synthetic MTP - Add SPEED-Bench reference synthetic AL values for DeepSeek-V4-Pro MTP 1-8

  2. In the PR description, mention that we will also run SPEED-Bench as part of the GitHub workflows. However, for this first iteration, we'd like to get alignment so that all partners feel equally confident about these AL values.

  3. Attach a full repro for the numbers (serve command, installation, SPEED-Bench command, etc.).

  4. Attach the full results of your runs (the JSONL file) for auditability.

Great work so far, cc @benchislett @functionstackx

@benchislett

These numbers seem a bit high to me. Let's double check that thinking is on and temperature is being set properly.

@functionstackx
Collaborator

Thanks for this PR. What scripts did you use to generate this? Can you create a PR that has the GitHub Actions workflow for running this AL distribution collection script?

The previous values were measured with reasoning disabled but labeled
thinking=true. Restructure the reference into an explicit matrix:

  thinking_on  - reasoning enabled (production config; golden reference
                 for synthetic-acceptance modeling)
  thinking_off - reasoning disabled (comparison only)

Values measured on SPEED-Bench coding, temperature=1.0, output_len=4096.
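
One common way to turn a measured AL into a synthetic acceptance setting is to assume each draft token is accepted independently with a fixed probability `p`, which gives AL = 1 + p + p^2 + ... + p^k for `k` draft tokens. The sketch below inverts that relation by bisection. It is an assumption-laden illustration, not the method this repository uses: the thread does not say how the benchmarks consume these values, and the function names are hypothetical.

```python
def expected_al(p: float, k: int) -> float:
    """Expected tokens per step with k draft tokens.

    Assumption: every draft token is accepted independently with probability p,
    and the step always emits one extra token, so AL = sum_{i=0..k} p**i.
    """
    return sum(p ** i for i in range(k + 1))


def acceptance_rate_for(al: float, k: int, iters: int = 60) -> float:
    """Find p in [0, 1] such that expected_al(p, k) == al, by bisection."""
    if not 1.0 <= al <= k + 1:
        raise ValueError(f"AL {al} is not reachable with k={k}")
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        # expected_al is increasing in p, so move the bracket toward the target.
        if expected_al(mid, k) < al:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Under this model AL strictly increases with `k` for any fixed `p > 0`, so a curve that stays nearly flat across several values of `k` cannot be reproduced with one shared `p`; each `k` would need its own fitted value.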

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 1f0d9dd.

5: 3.13
6: 3.08
7: 3.13
8: 3.12
Golden reference wrong thinking block

Medium Severity

Comments designate thinking_on as the production golden reference for synthetic-acceptance modeling, but the PR’s thinking=true AL table (MTP 1–8) aligns with the thinking_off entries, not thinking_on. Anything reading the golden block gets different AL than the values partners are asked to align on.
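
The mismatch reported here is the kind of thing a small consistency check could flag automatically. The sketch below compares a table of AL values (for example, the one in a PR description) against each named block and reports which block it actually matches. All names are hypothetical, the reference is again modeled as a plain dict rather than the real file, and the numbers in the example are made up purely to trigger a mismatch.

```python
import math


def max_abs_diff(described, block):
    """Largest |described[k] - block[k]| over the described keys.

    A key missing from `block` counts as an infinite difference.
    """
    return max(
        (abs(al - block[k]) if k in block else math.inf
         for k, al in described.items()),
        default=0.0,
    )


def closest_block(described, reference):
    """Name of the reference block whose values are nearest to `described`."""
    return min(reference, key=lambda name: max_abs_diff(described, reference[name]))


def check_golden(described, reference, golden_key="thinking_on", tol=0.005):
    """Return (matches_golden, name_of_closest_block)."""
    ok = max_abs_diff(described, reference[golden_key]) <= tol
    return ok, closest_block(described, reference)


# Made-up numbers, NOT measurements from this PR.
described = {1: 1.5, 2: 2.0}
reference = {
    "thinking_on": {1: 1.4, 2: 1.8},
    "thinking_off": {1: 1.5, 2: 2.0},
}
result = check_golden(described, reference)  # (False, "thinking_off")
```

Returning the closest block alongside the pass/fail flag makes the failure actionable: it shows not just that the table disagrees with the golden block, but which block it was likely copied from.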

@qiching
Author

qiching commented Jun 1, 2026

> These numbers seem a bit high to me. Let's double check that thinking is on and temperature is being set properly.

Updated. Please check the latest AL numbers with thinking mode on and off.

Collaborator

@functionstackx functionstackx left a comment

Thanks for the contribution! Shouldn't the GitHub Action generate the YAML that gets checked into the codebase, instead of having it human-generated?

I.e., close this PR and then have the other PR (#1650) generate this YAML?
