[Models] Optimize DeepSeek V3 MLA projection layout#8083

Open
chang-wenbin wants to merge 1 commit into PaddlePaddle:develop from chang-wenbin:dskv3

chang-wenbin (Collaborator) commented Jun 28, 2026

Motivation

Optimize DeepSeek V3 MLA K_b/V_b projection layout handling and reduce intermediate transpose/padding work in the attention forward path.

Modifications

  • Added KVBatchLinear.forward_k_b_thd() for token-head-dim K_b projection.
  • Added KVBatchLinear.forward_v_b_htr() for token-head-rank V_b projection.
  • Updated DeepSeek V3 MLA, sliding-window MLA, and DSA attention paths to use the new helpers.
  • Removed prefill value padding/slicing so the value path keeps v_head_dim directly.
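
To illustrate the kind of layout change described above, here is a minimal NumPy sketch, not the FastDeploy implementation. It assumes the K_b weight has shape `[heads, dim, rank]` and the input is `[tokens, heads, dim]`; the function names `k_b_old_path` and `k_b_thd` are hypothetical. It compares a transpose + batched matmul + transpose path with a single einsum that consumes the token-head-dim layout directly.

```python
import numpy as np


def k_b_old_path(x_thd, w_hdr):
    """Transpose to head-major, run a batched matmul, transpose back."""
    x_htd = x_thd.transpose(1, 0, 2)      # [tokens, heads, dim] -> [heads, tokens, dim]
    out_htr = np.matmul(x_htd, w_hdr)     # batched over heads -> [heads, tokens, rank]
    return out_htr.transpose(1, 0, 2)     # back to [tokens, heads, rank]


def k_b_thd(x_thd, w_hdr):
    """Single einsum on the token-head-dim layout, no explicit transposes."""
    return np.einsum("thd,hdr->thr", x_thd, w_hdr)


# Assumed toy sizes, chosen only for illustration.
rng = np.random.default_rng(0)
tokens, heads, dim, rank = 5, 4, 8, 16
x = rng.standard_normal((tokens, heads, dim))
w = rng.standard_normal((heads, dim, rank))

old = k_b_old_path(x, w)
new = k_b_thd(x, w)
assert old.shape == new.shape == (tokens, heads, rank)
assert np.allclose(old, new)
```

Both paths compute the same per-head projection; the einsum form only avoids materializing the intermediate transposed tensors. Whether the real helpers use these exact shapes is an assumption here.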

Usage or Command

N/A

Accuracy Tests

N/A. This PR changes DeepSeek V3 attention forward/output shape handling, but no accuracy or logits-alignment result is provided yet.

Checklist

  • Add at least a tag in the PR title.
    • Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
    • You can add new tags based on the PR content, but the semantics must be clear.
  • Format your code, run pre-commit before commit.
  • Add unit tests. Please write the reason in this PR if no unit tests.
  • Provide accuracy results.
  • If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

PaddlePaddle-bot left a comment

🤖 Paddle-CI-Agent | pr_review | 2026-06-29 00:03:12

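📋 Review Summary

PR overview: Adjusts K_b/V_b projection layout handling in the DeepSeek V3/DeepSeek V3.2 MLA paths, reduces intermediate transforms of the form transpose + bmm + transpose, and removes prefill value padding/slicing.

Change scope: fastdeploy/model_executor/layers/linear.py, fastdeploy/model_executor/models/deepseek_v3.py

Impact tags: [Models] [OP]

Issues

No blocking issues found. PR convention issues are reported in the section below and are not repeated here.

📝 PR Convention Check

The current PR title lacks an official tag, and the PR description keeps the template but leaves Motivation, Modifications, Usage or Command, and Accuracy Tests all empty. This change touches the model forward/attention output-shape path, yet no accuracy or logits-alignment results are provided. We recommend filling these in as follows.

Suggested title (can be copied directly):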

  • [Models] Optimize DeepSeek V3 MLA projection layout
Suggested PR description (can be copied directly):
## Motivation
Optimize DeepSeek V3 MLA K_b/V_b projection layout handling and reduce intermediate transpose/padding work in the attention forward path.

## Modifications
- Added `KVBatchLinear.forward_k_b_thd()` for token-head-dim K_b projection.
- Added `KVBatchLinear.forward_v_b_htr()` for token-head-rank V_b projection.
- Updated DeepSeek V3 MLA, sliding-window MLA, and DSA attention paths to use the new helpers.
- Removed prefill value padding/slicing so the value path keeps `v_head_dim` directly.

## Usage or Command
N/A

## Accuracy Tests
N/A. This PR changes DeepSeek V3 attention forward/output shape handling, but no accuracy or logits-alignment result is provided yet.

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

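Overall assessment

This round was a risk-first review of the new KVBatchLinear projection helpers and of the shape chain for the K/V dimensions, latent rank, and merge output across the DeepSeek V3 MLA/DSA/sliding-window paths. No issue that would block merging was confirmed at the code level. However, because this PR modifies the model forward hot path, we recommend adding at least one set of DeepSeek V3/DeepSeek V3.2 logits or accuracy alignment results and stating whether they cover the prefill, decode, mixed, and DSA/sliding-window paths.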

codecov-commenter commented Jun 28, 2026

Codecov Report

❌ Patch coverage is 11.76471% with 15 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@f4eda5a). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| fastdeploy/model_executor/models/deepseek_v3.py | 0.00% | 13 Missing ⚠️ |
| fastdeploy/model_executor/layers/linear.py | 50.00% | 2 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #8083   +/-   ##
==========================================
  Coverage           ?   67.55%           
==========================================
  Files              ?      475           
  Lines              ?    66912           
  Branches           ?    10320           
==========================================
  Hits               ?    45204           
  Misses             ?    18833           
  Partials           ?     2875           

| Flag | Coverage Δ |
|---|---|
| GPU | 77.58% <11.76%> (?) |
| XPU | 6.95% <0.00%> (?) |


PaddlePaddle-bot commented Jun 29, 2026

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-06-29 18:00:44 UTC+08:00

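The CI report is generated from the following code (updated every 30 minutes):
PR commit: 97c30e5 | Merge base: f4eda5a (branch: develop)


1 Required jobs: 9/10 passed

| Total runs (rerun count) | Total jobs | ✅ Passed | ❌ Failed | ⏳ Running | ⏸️ Waiting | Skipped |
|---|---|---|---|---|---|---|
| 113(72) | 41 | 37 | 4 | 0 | 0 | 0 |

| Job | Error type | Confidence | Log |
|---|---|---|---|
| Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage | PR issue: incremental coverage is only 11%, below 80% | | Job |

2 Failure details

🔴 Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: PR issue (confidence: high)

Analyzer: generic analysis (fallback)

Failed case: coverage threshold check

| Case | Error summary |
|---|---|
| Verify Code Coverage Threshold (80%) | Incremental coverage is 11%, below the 80% threshold; the step failed with exit code 9 |

Key log:

TEST_EXIT_CODE=0
COVERAGE_EXIT_CODE=9
Verify Code Coverage Threshold (80%)
total_num_lines=17, total_num_violations=15, total_percent_covered=11
violations: fastdeploy/model_executor/layers/linear.py:1142, 1154
Process completed with exit code 9.
  • Root cause summary: the new KVBatchLinear helpers have no unit tests

The PR adds KVBatchLinear.forward_k_b_thd() and forward_v_b_htr() and switches the DeepSeek V3 attention paths to these two helpers. The CI unit test stage passed, but the coverage check shows that 15 of the 17 coverable lines in this change are uncovered, so overall incremental coverage is only 11%. Of these, linear.py:1142 and linear.py:1154 are the two new einsum return statements. The existing tests/model_executor/test_linear.py::test_kvbatch_paths only calls the old forward_k_b() / forward_v_b() and does not cover the new THD/HTR layout helpers.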
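
The coverage figures in the log can be cross-checked with a few lines of arithmetic. This is only a sketch: the variable names mirror the log fields, and the assumption that the CI step reports the percentage as a truncated integer is mine, not something the log states.

```python
# Values copied from the key log above.
total_num_lines = 17
total_num_violations = 15
threshold_percent = 80

covered_lines = total_num_lines - total_num_violations   # lines hit by tests
exact_percent = 100 * covered_lines / total_num_lines    # unrounded patch coverage

# Assumption: the CI step reports an integer percentage (truncation shown here).
reported_percent = int(exact_percent)
passes_threshold = exact_percent >= threshold_percent

print(f"covered={covered_lines}, exact={exact_percent:.5f}%, "
      f"reported={reported_percent}%, passes={passes_threshold}")
```

The unrounded value agrees with the 11.76471% patch coverage in the Codecov report, and the 15 uncovered lines match the 13 + 2 missing lines listed per file.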

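Suggested fixes:

  1. In tests/model_executor/test_linear.py::test_kvbatch_paths, add assertions for forward_k_b_thd() and forward_v_b_htr() that cover output shape and numerical equivalence.
  2. Using the same weights, compare against the old path: forward_k_b(x.transpose([1, 0, 2])).transpose([1, 0, 2]) versus forward_k_b_thd(x), and forward_v_b(x.transpose([1, 0, 2])).transpose([1, 0, 2]) versus forward_v_b_htr(x).
  3. If the DeepSeek V3 changes also hit uncovered branches, add tests for the corresponding attention forward paths, or factor out projection logic that can be unit tested.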
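
The equivalence check suggested in the list above could look roughly like the following. This is a hedged sketch that runs on NumPy only: `StandInKVBatchLinear` is a hypothetical stand-in, not the FastDeploy class, and the weight shapes (`[heads, dim, rank]` for K_b, `[heads, rank, v_dim]` for V_b) and einsum strings are assumptions. A real test would build the actual `KVBatchLinear` layer and use Paddle tensors, where the transpose takes a permutation list such as `x.transpose([1, 0, 2])`.

```python
import numpy as np


class StandInKVBatchLinear:
    """Hypothetical NumPy stand-in used only to show the test structure."""

    def __init__(self, k_w, v_w):
        self.k_w = k_w  # assumed shape [heads, dim, rank]
        self.v_w = v_w  # assumed shape [heads, rank, v_dim]

    def forward_k_b(self, x):       # old path, x: [heads, tokens, dim]
        return np.matmul(x, self.k_w)

    def forward_v_b(self, x):       # old path, x: [heads, tokens, rank]
        return np.matmul(x, self.v_w)

    def forward_k_b_thd(self, x):   # new path, x: [tokens, heads, dim]
        return np.einsum("thd,hdr->thr", x, self.k_w)

    def forward_v_b_htr(self, x):   # new path, x assumed [tokens, heads, rank]
        return np.einsum("thr,hrv->thv", x, self.v_w)


def check_kvbatch_layout_helpers(layer, x_k, x_v):
    """Compare each new helper against the old path wrapped in transposes."""
    ref_k = layer.forward_k_b(x_k.transpose(1, 0, 2)).transpose(1, 0, 2)
    ref_v = layer.forward_v_b(x_v.transpose(1, 0, 2)).transpose(1, 0, 2)
    out_k = layer.forward_k_b_thd(x_k)
    out_v = layer.forward_v_b_htr(x_v)
    assert out_k.shape == ref_k.shape and out_v.shape == ref_v.shape
    np.testing.assert_allclose(out_k, ref_k, rtol=1e-6, atol=1e-8)
    np.testing.assert_allclose(out_v, ref_v, rtol=1e-6, atol=1e-8)
    return out_k.shape, out_v.shape


rng = np.random.default_rng(0)
tokens, heads, dim, rank, v_dim = 3, 2, 4, 6, 5
layer = StandInKVBatchLinear(
    rng.standard_normal((heads, dim, rank)),
    rng.standard_normal((heads, rank, v_dim)),
)
shapes = check_kvbatch_layout_helpers(
    layer,
    rng.standard_normal((tokens, heads, dim)),
    rng.standard_normal((tokens, heads, rank)),
)
```

Keeping the comparison in one helper means the same assertions can be reused across several shape combinations if the real test is parametrized.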

Related changes: fastdeploy/model_executor/layers/linear.py:1132, fastdeploy/model_executor/layers/linear.py:1144, fastdeploy/model_executor/models/deepseek_v3.py:457, fastdeploy/model_executor/models/deepseek_v3.py:622, fastdeploy/model_executor/models/deepseek_v3.py:1027

chang-wenbin changed the title from "Update dsk-v3" to "[Models] Optimize DeepSeek V3 MLA projection layout" on Jun 29, 2026