[Cherry-Pick][Optimization] merge matmul and add (#6986) #7184
zoooo0820 merged 2 commits into PaddlePaddle:release/2.5
Thanks for your contribution!
Codecov Report ✅ All modified and coverable lines are covered by tests.

```
@@ Coverage Diff @@
##        release/2.5    #7184   +/-   ##
==============================================
  Coverage          ?   69.19%
==============================================
  Files             ?      390
  Lines             ?    54384
  Branches          ?     8574
==============================================
  Hits              ?    37631
  Misses            ?    14041
  Partials          ?     2712
```
fastdeploy-bot left a comment
🤖 AI Code Review | 2026-04-03 17:30 CST
📋 Review Summary
PR overview: Replaces the matmul + add in UnquantizedLinearMethod.apply() with paddle.nn.functional.linear, improving performance in the with-bias case.
Scope of changes: model_executor/layers/linear.py (core optimization), tests/e2e/utils/ (test baseline path update)
Impact Tag: OP
Issues

| Level | File | Summary |
|---|---|---|
| 🟡 Suggestion | rollout_routing_replay_test_utils.py:160 | The baseline path hardcodes the date 0403; consider a semantic name |

Overall assessment
The core optimization logic is correct, and the new bias-dimension assert improves robustness. The baseline path change in the test file should use a more semantic, versioned naming scheme.
```diff
 if model_path:
-    baseline_path = os.path.join(model_path, f"R3_BaseLine_25_uint8/routing_replay_output_baseline_{model_name}")
+    baseline_path = os.path.join(
+        model_path, f"R3_BaseLine_dev_uint8_0403/routing_replay_output_baseline_{model_name}"
+    )
```
🟡 Suggestion: The baseline path hardcodes the date 0403.
This naming scheme may make future maintenance harder:
- Every baseline update requires editing the path in code
- The meaning of the date is unclear: what 0403 in dev_uint8_0403 refers to is not obvious
Suggestions:
- Use a more semantic version name, such as R3_BaseLine_v2.5_uint8
- Or make the baseline path configurable, managed via an environment variable or a config file
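A minimal sketch of the environment-variable approach suggested above. `FD_BASELINE_VERSION` and `resolve_baseline_path` are hypothetical names for illustration; the default mirrors the path currently hardcoded in the test utility:

```python
import os


def resolve_baseline_path(model_path: str, model_name: str) -> str:
    """Resolve the routing-replay baseline path, preferring an env override."""
    # FD_BASELINE_VERSION is a hypothetical variable name; the default
    # keeps the value currently hardcoded in the test code.
    version = os.environ.get("FD_BASELINE_VERSION", "R3_BaseLine_dev_uint8_0403")
    return os.path.join(
        model_path, f"{version}/routing_replay_output_baseline_{model_name}"
    )
```

With this, updating the baseline only requires exporting `FD_BASELINE_VERSION` instead of editing the test code.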
fastdeploy-bot left a comment
🤖 AI Code Review | 2026-04-08 16:57 CST
📋 Review Summary
PR overview: Fuses the matmul + add in UnquantizedLinearMethod into a single paddle.nn.functional.linear call to improve inference performance.
Scope of changes: model_executor/layers/linear.py, test files
Impact Tag: [Optimization] [OP]
Issues

| Level | File | Summary |
|---|---|---|
| 🔴 Bug | linear.py:87 | The assert runs on every forward, adding per-call overhead |

Overall assessment
The optimization direction is sound: fusing matmul and add via paddle.nn.functional.linear is a reasonable strategy. However, placing an assert on the hot path (the apply method) runs validation on every inference call, which works against the optimization's goal. Consider moving the shape validation to initialization or the weight-loading stage. The expected-output updates and baseline path changes in the test files are routine maintenance.
```diff
-    linear_out = paddle.add(linear_out, layer.bias)
-    return linear_out
+    bias = layer.bias
+    assert bias.dim() == 1 and bias.shape[-1] == layer.weight.shape[-1], (
```
🔴 Bug: The assert checking the bias shape runs on every forward call in the apply method, adding unnecessary overhead that works against the goal of this performance optimization.
Impact analysis:
- The assert executes on every forward_cuda call, adding Python-level overhead
- The bias shape is already fixed at initialization, so per-call validation is unnecessary
- This partially offsets the performance gain from using paddle.nn.functional.linear
Suggested fix:
Move the shape check to the weight-loading stage, e.g. validate it in process_loaded_weights:

```python
def process_loaded_weights(self, layer, weights) -> None:
    if layer.weight.dtype != weights.dtype:
        weights = weights.cast(layer.weight.dtype)
    layer.weight.set_value(weights)
    # Validate that bias and weight shapes match here (runs once, at weight load)
    if layer.with_bias:
        assert layer.bias.dim() == 1 and layer.bias.shape[0] == layer.weight.shape[-1], (
            f"bias must be 1D with size equal to the last dim of weight, "
            f"but got bias.shape={layer.bias.shape}, weight.shape[-1]={layer.weight.shape[-1]}"
        )
```

This way the validation runs only once at model load time, not on every inference call.
Motivation
Performance optimization.
Modifications
Replace the matmul and add in UnquantizedLinearMethod with linear.

The with-bias case is generally faster; the no-bias case shows a slight regression at small shapes (mainly Python-level dispatch overhead such as the added if; linear's internal implementation is still a matmul).
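For illustration of the equivalence being relied on (in NumPy rather than Paddle, so the sketch is self-contained; `linear` here is a hypothetical reference, not the Paddle implementation), a fused linear call computes exactly what the separate matmul and add produced:

```python
import numpy as np


def linear(x, weight, bias=None):
    """Reference fused-style linear: one call replacing separate matmul + add."""
    out = x @ weight          # matmul
    if bias is not None:
        out = out + bias      # bias add, folded into the same call
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))
w = rng.standard_normal((16, 32))
b = rng.standard_normal(32)

# Old path (two ops) and new path (one call) agree exactly.
separate = np.add(np.matmul(x, w), b)
fused = linear(x, w, b)
assert np.allclose(separate, fused)
```

Since the results match, the only difference is dispatch cost, which is why the gain shows up mainly when a bias add is eliminated.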
Usage or Command
None
Accuracy Tests
Accuracy remains consistent.
Checklist
- Tags: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
- Run pre-commit before commit.
- For a release branch PR, make sure the PR has been submitted to the develop branch first, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.