[Models] Optimize DeepSeek V3 MLA projection layout by chang-wenbin · Pull Request #8083 · PaddlePaddle/FastDeploy

chang-wenbin · 2026-06-28T15:31:35Z

Motivation

Optimize DeepSeek V3 MLA K_b/V_b projection layout handling and reduce intermediate transpose/padding work in the attention forward path.

Modifications

- Added KVBatchLinear.forward_k_b_thd() for token-head-dim K_b projection.
- Added KVBatchLinear.forward_v_b_htr() for token-head-rank V_b projection.
- Updated DeepSeek V3 MLA, sliding-window MLA, and DSA attention paths to use the new helpers.
- Removed prefill value padding/slicing so the value path keeps v_head_dim directly.
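The two new helpers can be pictured as a single per-head contraction over a token-major tensor, in place of a transpose + bmm + transpose sequence. The following is a minimal NumPy sketch, not the FastDeploy code. It assumes the K_b weight is stored per head as [heads, head_dim, rank] and the V_b weight as [heads, rank, v_head_dim], and that forward_k_b_thd maps head_dim to the latent rank, as in the usual weight-absorbed MLA query path. `np.einsum` stands in for the Paddle op (the CI log only confirms that the new return paths use einsum). All sizes and variable names are hypothetical.

```python
import numpy as np

# Hypothetical tiny sizes: tokens, heads, nope head dim, latent rank, value head dim.
T, H, D, R, V = 5, 4, 8, 16, 8
rng = np.random.default_rng(0)

# Assumed per-head weight layouts; FastDeploy's KVBatchLinear may store them differently.
w_k_b = rng.standard_normal((H, D, R)).astype(np.float32)  # K_b: head dim -> latent rank
w_v_b = rng.standard_normal((H, R, V)).astype(np.float32)  # V_b: latent rank -> value dim


def forward_k_b_thd(x_thd):
    """[tokens, heads, head_dim] -> [tokens, heads, rank], no transpose of the input."""
    return np.einsum("thd,hdr->thr", x_thd, w_k_b)


def forward_v_b_htr(x_thr):
    """[tokens, heads, rank] -> [tokens, heads, v_head_dim], no transpose of the output."""
    return np.einsum("thr,hrv->thv", x_thr, w_v_b)


# Absorbed-MLA style flow (assumed): project q_nope into the latent space, attend there
# (a random stand-in below), then project the latent attention output back to V.
q_nope = rng.standard_normal((T, H, D)).astype(np.float32)
q_latent = forward_k_b_thd(q_nope)                  # shape (5, 4, 16)
attn_latent = rng.standard_normal((T, H, R)).astype(np.float32)
out = forward_v_b_htr(attn_latent)                  # shape (5, 4, 8)
out_merged = out.reshape(T, H * V)                  # heads merged for o_proj: (5, 32)
```

Because the einsum contracts per head directly on the [tokens, heads, ...] layout, neither the input nor the output has to be moved into the [heads, tokens, ...] layout that a batched matmul expects.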

Usage or Command

N/A

Accuracy Tests

N/A. This PR changes DeepSeek V3 attention forward/output shape handling, but no accuracy or logits-alignment result is provided yet.

Checklist

- Add at least a tag in the PR title.
  - Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
  - You can add new tags based on the PR content, but the semantics must be clear.
- Format your code, run pre-commit before commit.
- Add unit tests. Please write the reason in this PR if no unit tests.
- Provide accuracy results.
- If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

PaddlePaddle-bot

🤖 Paddle-CI-Agent | pr_review | 2026-06-29 00:03:12

📋 Review Summary

PR overview: Adjusts the K_b/V_b projection layout handling in the DeepSeek V3/DeepSeek V3.2 MLA paths, cutting intermediate transpose + bmm + transpose steps, and removes the prefill value padding/slice.
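Why the prefill value padding can be dropped: the output width of attention follows the value tensor, so zero-padding V from v_head_dim up to the q/k head dim only adds zero columns that are sliced off afterwards. The NumPy sketch below checks this with plain causal softmax attention. It assumes DeepSeek V3's published head dims (qk_nope_head_dim 128 + qk_rope_head_dim 64 = 192 for q/k, v_head_dim 128) and that the attention kernel used in the PR accepts a value head dim different from the q/k head dim; the sketch does not verify the latter. The `attention` function is a hypothetical reference, not a FastDeploy kernel.

```python
import numpy as np

# Assumed DeepSeek V3 head dims: q/k head = 128 nope + 64 rope = 192, value head = 128.
QK_HEAD_DIM, V_HEAD_DIM = 192, 128
T, H = 6, 2  # tiny token/head counts for illustration

rng = np.random.default_rng(0)
q = rng.standard_normal((H, T, QK_HEAD_DIM))
k = rng.standard_normal((H, T, QK_HEAD_DIM))
v = rng.standard_normal((H, T, V_HEAD_DIM))


def attention(q, k, v):
    """Plain causal softmax attention; the output's last dim follows v, not q/k."""
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    mask = np.triu(np.ones((q.shape[1], k.shape[1]), dtype=bool), k=1)  # hide future tokens
    scores = np.where(mask, -np.inf, scores)
    probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    return probs @ v


# Old-style path: zero-pad value up to the q/k head dim, run attention, slice back.
v_padded = np.pad(v, ((0, 0), (0, 0), (0, QK_HEAD_DIM - V_HEAD_DIM)))
out_padded = attention(q, k, v_padded)[..., :V_HEAD_DIM]

# New-style path: keep v_head_dim end to end.
out_direct = attention(q, k, v)
```

Both paths give the same result; the direct one simply skips the pad, the wider value read, and the slice.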

Scope of changes: fastdeploy/model_executor/layers/linear.py, fastdeploy/model_executor/models/deepseek_v3.py

Affected tags: [Models] [OP]

Issues

No blocking issues found. PR convention issues are reported in the section below rather than repeated here.

📝 PR Convention Check

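The current PR title lacks an official tag, and the PR description keeps the template but leaves Motivation, Modifications, Usage or Command, and Accuracy Tests empty. This change touches the model forward/attention output-shape path, yet no accuracy or logits-alignment results are provided. Please fill these in as suggested below.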

Suggested title (ready to copy):

[Models] Optimize DeepSeek V3 MLA projection layout

Suggested PR description (ready to copy):

## Motivation
Optimize DeepSeek V3 MLA K_b/V_b projection layout handling and reduce intermediate transpose/padding work in the attention forward path.

## Modifications
- Added `KVBatchLinear.forward_k_b_thd()` for token-head-dim K_b projection.
- Added `KVBatchLinear.forward_v_b_htr()` for token-head-rank V_b projection.
- Updated DeepSeek V3 MLA, sliding-window MLA, and DSA attention paths to use the new helpers.
- Removed prefill value padding/slicing so the value path keeps `v_head_dim` directly.

## Usage or Command
N/A

## Accuracy Tests
N/A. This PR changes DeepSeek V3 attention forward/output shape handling, but no accuracy or logits-alignment result is provided yet.

## Checklist

- [x] Add at least a tag in the PR title.
  - Tag list: [`[FDConfig]`,`[APIServer]`,`[Engine]`, `[Scheduler]`, `[PD Disaggregation]`, `[Executor]`, `[Graph Optimization]`, `[Speculative Decoding]`, `[RL]`, `[Models]`, `[Quantization]`, `[Loader]`, `[OP]`, `[KVCache]`, `[DataProcessor]`, `[BugFix]`, `[Docs]`, `[CI]`, `[Optimization]`, `[Feature]`, `[Benchmark]`, `[Others]`, `[XPU]`, `[HPU]`, `[GCU]`, `[DCU]`, `[Iluvatar]`, `[Metax]`]
  - You can add new tags based on the PR content, but the semantics must be clear.
- [ ] Format your code, run `pre-commit` before commit.
- [ ] Add unit tests. Please write the reason in this PR if no unit tests.
- [ ] Provide accuracy results.
- [ ] If the current PR is submitting to the `release` branch, make sure the PR has been submitted to the `develop` branch, then cherry-pick it to the `release` branch with the `[Cherry-Pick]` PR tag.

Overall Assessment

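This round reviewed, in order of risk, the new KVBatchLinear projection helpers and the shape chain (K/V dims, latent rank, and merged output) in the DeepSeek V3 MLA/DSA/sliding-window paths. No merge-blocking issue has been confirmed at the code level so far. However, because this PR modifies the model forward hot path, please add at least one set of DeepSeek V3/DeepSeek V3.2 logits or accuracy alignment results, and state whether they cover the prefill, decode, mixed, and DSA/sliding-window paths.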
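One way to produce the requested alignment numbers is to dump logits from the merge base and from the PR commit on the same prompts and compare them. The helper below is hypothetical (no such utility is claimed to exist in FastDeploy), and the chosen metrics and default tolerance are assumptions, not project acceptance criteria.

```python
import numpy as np


def compare_logits(ref, new, atol=1e-2):
    """Summarize how closely two logits arrays of shape [tokens, vocab] agree.

    Hypothetical helper: `ref` would come from the baseline build and `new` from the PR build.
    Rows with all-zero logits would make the cosine undefined; real dumps should not have them.
    """
    ref = np.asarray(ref, dtype=np.float64)
    new = np.asarray(new, dtype=np.float64)
    assert ref.shape == new.shape, f"shape mismatch: {ref.shape} vs {new.shape}"
    diff = np.abs(ref - new)
    cos = np.sum(ref * new, axis=-1) / (
        np.linalg.norm(ref, axis=-1) * np.linalg.norm(new, axis=-1)
    )
    return {
        "max_abs_diff": float(diff.max()),
        "mean_abs_diff": float(diff.mean()),
        "min_cosine": float(cos.min()),                                   # worst token
        "top1_match_rate": float(np.mean(ref.argmax(-1) == new.argmax(-1))),
        "within_atol": bool(diff.max() <= atol),
    }
```

Running it separately on prefill, decode, mixed, and DSA/sliding-window batches would answer the path-coverage question above; no results are implied here.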

codecov-commenter · 2026-06-28T16:08:08Z

Codecov Report

❌ Patch coverage is 11.76471% with 15 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@f4eda5a).

Files with missing lines	Patch %	Lines
fastdeploy/model_executor/models/deepseek_v3.py	0.00%	13 Missing ⚠️
fastdeploy/model_executor/layers/linear.py	50.00%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #8083   +/-   ##
==========================================
  Coverage           ?   67.55%           
==========================================
  Files              ?      475           
  Lines              ?    66912           
  Branches           ?    10320           
==========================================
  Hits               ?    45204           
  Misses             ?    18833           
  Partials           ?     2875

Flag	Coverage Δ
GPU	`77.58% <11.76%> (?)`
XPU	`6.95% <0.00%> (?)`

Flags with carried forward coverage won't be shown.

PaddlePaddle-bot · 2026-06-29T03:00:31Z

🤖 Paddle-CI-Agent | ci_status_monitor | 2026-06-29 18:00:44 UTC+08:00

CI report generated from the following code (updated every 30 minutes):
PR commit: 97c30e5 | Merge base: f4eda5a (branch: develop)

1 Required jobs: 9/10 passed

Total runs (reruns)	Total jobs	✅ Passed	❌ Failed	⏳ Running	⏸️ Pending	Skipped
113(72)	41	37	4	0	0	0

Job	Error type	Confidence	Log
`Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage`	PR issue: incremental coverage only 11%, below 80%	High	Job

2 Failure Details

🔴 Run FastDeploy Unit Tests and Coverage / run_tests_with_coverage: PR issue (confidence: high)

Analyzer: generic analysis (fallback)

Failed check: coverage threshold verification

Check	Error summary
`Verify Code Coverage Threshold (80%)`	Incremental coverage is 11%, below the 80% threshold; the step failed with exit code 9

Key log:

TEST_EXIT_CODE=0
COVERAGE_EXIT_CODE=9
Verify Code Coverage Threshold (80%)
total_num_lines=17, total_num_violations=15, total_percent_covered=11
violations: fastdeploy/model_executor/layers/linear.py:1142, 1154
Process completed with exit code 9.
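The numbers in the log and in the Codecov report agree: 2 of 17 patch lines covered gives Codecov's 11.76471%, and the CI check appears to truncate that to 11 before comparing it with 80. The truncation is inferred from the log, not from the checker's source, and the "lines still needed" figure assumes the 17-line denominator does not change when tests are added. A short Python check:

```python
import math

# Numbers taken from the "Verify Code Coverage Threshold (80%)" log.
total_num_lines = 17
total_num_violations = 15
threshold = 80

covered = total_num_lines - total_num_violations        # 2 covered lines
patch_pct = 100 * covered / total_num_lines             # 11.7647..., Codecov's 11.76471%
reported_pct = math.floor(patch_pct)                    # 11, matching total_percent_covered
passes = reported_pct >= threshold                      # False, so the step fails

# Covered lines needed to clear 80%, assuming the denominator stays at 17.
needed_total = math.ceil(threshold * total_num_lines / 100)  # ceil(13.6) = 14
still_missing = needed_total - covered                       # 12 more lines
```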

Root cause: the new KVBatchLinear helpers have no unit tests

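The PR adds KVBatchLinear.forward_k_b_thd() and forward_v_b_htr() and switches the DeepSeek V3 attention path to these two helpers. The CI unit-test stage passed, but the coverage check shows that 15 of the 17 coverable lines in this change are uncovered, so incremental coverage is only 11%; linear.py:1142 and linear.py:1154 are the two new einsum return statements. The existing tests/model_executor/test_linear.py::test_kvbatch_paths only calls the old forward_k_b() / forward_v_b() and does not exercise the new THD/HTR layout helpers.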

Suggested fix:

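- Add assertions for forward_k_b_thd() and forward_v_b_htr() to tests/model_executor/test_linear.py::test_kvbatch_paths, covering output shape and numerical equivalence.
- Using the same weights, compare against the old path: forward_k_b(x.transpose([1, 0, 2])).transpose([1, 0, 2]) vs. forward_k_b_thd(x), and forward_v_b(x.transpose([1, 0, 2])).transpose([1, 0, 2]) vs. forward_v_b_htr(x).
- If the DeepSeek V3 changes also hit uncovered branches, add tests for the corresponding attention forward paths, or factor the projection logic out so it can be unit-tested on its own.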
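The equivalence test suggested above can be prototyped with NumPy before porting it to Paddle. In this sketch the legacy helpers are modeled as a per-head batched matmul on [heads, tokens, ...] inputs and the new helpers as a single einsum on [tokens, heads, ...] inputs; both bodies, the weight layouts, and the test name `test_kvbatch_layout_helpers` are assumptions, not FastDeploy's code. NumPy's `transpose` takes axes as separate arguments (`transpose(1, 0, 2)`), whereas Paddle takes a perm list (`transpose([1, 0, 2])`).

```python
import numpy as np

rng = np.random.default_rng(0)
T, H, D, R, V = 7, 3, 8, 16, 8  # tiny hypothetical sizes
w_k = rng.standard_normal((H, D, R)).astype(np.float32)
w_v = rng.standard_normal((H, R, V)).astype(np.float32)


# Legacy-style helpers (modeled): inputs in [heads, tokens, ...] layout, batched matmul per head.
def forward_k_b(x_htd):
    return np.matmul(x_htd, w_k)  # [H, T, D] @ [H, D, R] -> [H, T, R]


def forward_v_b(x_htr):
    return np.matmul(x_htr, w_v)  # [H, T, R] @ [H, R, V] -> [H, T, V]


# New-style helpers (modeled): inputs stay in [tokens, heads, ...] layout, single einsum.
def forward_k_b_thd(x_thd):
    return np.einsum("thd,hdr->thr", x_thd, w_k)


def forward_v_b_htr(x_thr):
    return np.einsum("thr,hrv->thv", x_thr, w_v)


def test_kvbatch_layout_helpers():
    x_k = rng.standard_normal((T, H, D)).astype(np.float32)
    x_v = rng.standard_normal((T, H, R)).astype(np.float32)

    # Reference: transpose to head-major, run the legacy helper, transpose back.
    ref_k = forward_k_b(x_k.transpose(1, 0, 2)).transpose(1, 0, 2)
    ref_v = forward_v_b(x_v.transpose(1, 0, 2)).transpose(1, 0, 2)
    new_k = forward_k_b_thd(x_k)
    new_v = forward_v_b_htr(x_v)

    assert new_k.shape == (T, H, R) and new_v.shape == (T, H, V)
    np.testing.assert_allclose(new_k, ref_k, rtol=1e-5, atol=1e-5)
    np.testing.assert_allclose(new_v, ref_v, rtol=1e-5, atol=1e-5)


test_kvbatch_layout_helpers()
```

In the real test, the same structure would use Paddle tensors and the weights held by an actual KVBatchLinear instance, so the two new einsum return lines flagged by the coverage check are executed.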

Related changes: fastdeploy/model_executor/layers/linear.py:1132, fastdeploy/model_executor/layers/linear.py:1144, fastdeploy/model_executor/models/deepseek_v3.py:457, fastdeploy/model_executor/models/deepseek_v3.py:622, fastdeploy/model_executor/models/deepseek_v3.py:1027

update dsk-v3

97c30e5

chang-wenbin had a problem deploying to Metax_ci June 28, 2026 15:31 — with GitHub Actions Failure

PaddlePaddle-bot reviewed Jun 28, 2026


chang-wenbin changed the title ~~Update dsk-v3~~ [Models] Optimize DeepSeek V3 MLA projection layout Jun 29, 2026

zhoutianzi666 approved these changes Jun 29, 2026


chang-wenbin wants to merge 1 commit into PaddlePaddle:develop from chang-wenbin:dskv3

4 participants
